Master’s Thesis Presentation • Data Systems — Serverless Data Analytics with Flint

Monday, August 13, 2018 10:00 am - 10:00 am EDT (GMT -04:00)

Youngbin Kim, Master’s candidate
David R. Cheriton School of Computer Science

Abstract: Serverless architectures organized around loosely-coupled function invocations represent an emerging design for many applications. Recent work mostly focuses on user-facing products and event-driven processing pipelines.

In this thesis, we explore a completely different part of the application space and examine the feasibility of analytical processing on big data using a serverless architecture. We present Flint, a prototype Spark execution engine that takes advantage of AWS Lambda to provide a pure pay-as-you-go cost model. With Flint, a developer uses PySpark exactly as before, but without needing a Spark cluster, and pays only for the execution of individual Spark programs. We describe the design, implementation, and performance of Flint, along with the challenges associated with serverless analytics.