Making Big Data Interactive with Spark
CS alumnus Matei Zaharia returns to the University of Waterloo to discuss Spark, a single programming model for big data sets, and its industry applications as the most active project in the Apache big data ecosystem.
The rapid growth in data volumes requires new computer systems that scale out across hundreds of machines. Early programming models such as MapReduce handled large-scale batch processing, but the demands on these systems have grown as well: users quickly needed to run (1) more interactive ad-hoc queries, (2) more complex multi-pass algorithms (e.g., machine learning), and (3) real-time processing on large data streams.