Please note: This seminar will take place in DC 1304.
Ana Klimovic, Assistant Professor
Systems Group, Department of Computer Science, ETH Zürich
Resource elasticity is fundamental to cloud computing. The more quickly a cloud platform can allocate resources to match the demand of each user request as it arrives, the fewer resources need to be pre-provisioned to meet performance requirements. However, even serverless platforms, which can boot sandboxes in tens to hundreds of milliseconds, are not sufficiently elastic to avoid over-provisioning expensive resources (e.g., warm sandboxes to avoid cold starts). A key obstacle to true elasticity is that today's cloud platforms are stuck retrofitting system software designed for a more traditional execution model of cloud computing, based on long-running virtual machines that provide each user application with a POSIX-like interface. While providing a POSIX interface was important in the early days of cloud computing to ease migration from on-premises clusters, today's developers design cloud-native applications, in which user-defined computations interact with a variety of cloud services (e.g., storage, AI inference, data analytics engines) over REST APIs.
In this talk, I will propose a declarative programming model tailored to cloud-native applications, which makes it possible to co-design a much more efficient and elastic underlying execution system. I will present Dandelion, a new elastic cloud platform that implements this declarative programming model. Dandelion applications are expressed as DAGs of pure compute functions and HTTP-based communication functions. This enables Dandelion to securely execute user-defined compute functions in lightweight sandboxes that cold start in hundreds of microseconds, since executing pure functions does not require initializing a POSIX environment. Dandelion makes it practical to boot a sandbox on demand for every compute function invocation, decreasing performance variability by two to three orders of magnitude compared to Firecracker and reducing committed memory by 96% on average when running the Azure Functions trace. I will discuss the implications of true elasticity for cloud applications such as interactive data analytics and emerging agentic AI workflows.
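To make the idea of an application expressed as a DAG of pure compute functions and HTTP-based communication functions more concrete, here is a minimal illustrative sketch in Python. It is not Dandelion's actual interface: the names Node, run_dag, and fake_http_get, the "compute"/"http" labels, and the example URL are all hypothetical, and the communication step is a stub that performs no network access.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Node:
    """One vertex of the application DAG (hypothetical structure)."""
    kind: str                      # "compute" or "http"; labels are assumptions
    fn: Callable[..., object]      # function applied to the outputs of deps
    deps: Tuple[str, ...] = ()     # names of upstream nodes or external inputs


def fake_http_get(url: str) -> str:
    # Stand-in for an HTTP-based communication function.
    # It ignores the URL and returns a fixed body; no network access occurs.
    return "3,1,2"


def parse(body: str) -> list:
    # Pure compute function: output depends only on its input.
    return [int(x) for x in body.split(",")]


def total(values: list) -> int:
    # Pure compute function: sums the parsed values.
    return sum(values)


def run_dag(nodes: Dict[str, Node], inputs: Dict[str, object]) -> Dict[str, object]:
    """Evaluate every node once, in dependency order."""
    graph = {name: node.deps for name, node in nodes.items()}
    results = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        if name in results:
            continue  # external input supplied by the caller
        node = nodes[name]
        results[name] = node.fn(*(results[d] for d in node.deps))
    return results


app = {
    "fetch": Node("http", fake_http_get, ("url",)),
    "parse": Node("compute", parse, ("fetch",)),
    "total": Node("compute", total, ("parse",)),
}

results = run_dag(app, {"url": "https://example.invalid/data"})
print(results["total"])
```

Because the compute nodes in this sketch touch nothing but their arguments, a runtime could in principle run each one in a fresh, minimal sandbox, which is the property the abstract attributes to pure functions.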
Bio: Ana Klimovic is an Assistant Professor in the Systems Group of the Computer Science Department at ETH Zürich. Her research interests span operating systems, computer architecture, and their intersection with machine learning. Ana's work focuses on computer system design for large-scale applications such as cloud computing services, data analytics, and machine learning. Before joining ETH in August 2020, Ana was a Research Scientist at Google Brain and completed her Ph.D. in Electrical Engineering at Stanford University.