Weekly Schedule

The following is the weekly schedule for the course. The schedule will be firmed up as students choose their presentation slots.

Week 1 - January 8, 2019

Introduction

Introductory lecture notes

Week 2 - January 15, 2019

Distributed Data Stores

Presentation 1: Tuhin Tiwari

Paper: N. Bronson, et al., TAO: Facebook's Distributed Data Store For The Social Graph, Proc. USENIX Annual Technical Conference, pages 49-60, 2013.

Presentation slides

Presentation 2: Noshin Nawar Sadat

Paper: S. Ghemawat, H. Gobioff, S.-T. Leung. The Google file system. Proc. 19th ACM Symposium on Operating Systems Principles, pages 29-43, 2003.

Presentation slides

Presentation 3: Ruoxi Zhang

Paper: K. Shvachko, H. Kuang, S. Radia, R. Chansler, The Hadoop Distributed File System, IEEE 26th Symposium on Mass Storage Systems and Technologies, 2010.

Presentation slides

Week 3 - January 22, 2019

Main Memory Systems

Presentation 1: Michael Abebe

Paper: Cristian Diaconu, Craig Freedman, Erik Ismert, Per-Åke Larson, Pravin Mittal, Ryan Stonecipher, Nitin Verma, Mike Zwilling. Hekaton: SQL Server's memory-optimized OLTP engine. Proc. ACM SIGMOD International Conference on Management of Data, pages 1243-1254, 2013.

Presentation slides

Presentation 2: Brad Glasbergen

Paper: Alfons Kemper, Thomas Neumann. HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. Proc. IEEE 27th International Conference on Data Engineering, pages 195-206, 2011.

Presentation slides

Presentation 3: Varshanth Rao

Paper: Franz Färber, Norman May, Wolfgang Lehner, Philipp Große, Ingo Müller, Hannes Rauhe, Jonathan Dees. The SAP HANA Database -- An Architecture Overview. IEEE Data Eng. Bull., 35(1): 28-33, 2012.

Presentation slides

Week 4 - January 29, 2019

MapReduce-based data management

Presentation 1: Huanyi Chen

Paper: M. Zaharia, R. S. Xin, P. Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, J. Rosen, S. Venkataraman, M. J. Franklin, A. Ghodsi, J. Gonzalez, S. Shenker, and I. Stoica. Apache Spark: a unified engine for big data processing. Commun. ACM 59(11): 56-65, 2016.

Presentation slides

Presentation 2: Camilo Munoz

Paper: R. Sumbaly, J. Kreps, and S. Shah. The big data ecosystem at LinkedIn. Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1125-1134, 2013.

Presentation slides

Presentation 3: Manoj Sharma

Paper: A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, and R. Murthy. Hive - a petabyte scale data warehouse using Hadoop. Proc. IEEE 26th International Conference on Data Engineering, pages 996-1005, 2010.

Presentation slides

Week 5 - February 5, 2019

Stream processing systems

Presentation 1: Sushant Raikar

Paper: Jean-François Im, Kishore Gopalakrishna, Subbu Subramaniam, Mayank Shrivastava, Adwait Tumbde, Xiaotian Jiang, Jennifer Dai, Seunghyun Lee, Neha Pawar, Jialiang Li, and Ravi Aringunram. Pinot: Realtime OLAP for 530 Million Users. In Proc. ACM International Conference on Management of Data, pages 583-594, 2018.

Presentation slides

Presentation 2: Sidharth Singla

Paper: L. Abraham, J. Allen, O. Barykin, V. Borkar, B. Chopra, C. Gerea, D. Merl, J. Metzler, D. Reiss, S. Subramanian, J. L. Wiener, and O. Zed. Scuba: diving into data at Facebook. Proc. VLDB Endow., 6(11): 1057-1067, 2013.

Presentation slides

Presentation 3: Aida Sheshbolouki

Paper: Pramod Bhatotia, Umut A. Acar, Flavio P. Junqueira, and Rodrigo Rodrigues. Slider: incremental sliding window analytics. In Proc. 15th International Middleware Conference, pages 61-72, 2014.

Presentation slides

Week 6 - February 12, 2019

Graph databases

Presentation 1: Ishank Jain

Paper: Ayush Dubey, Greg D. Hill, Robert Escriva, and Emin Gün Sirer. Weaver: a high-performance, transactional graph database based on refinable timestamps. Proc. VLDB Endow. 9(11): 852-863, 2016.

Presentation slides

Presentation 2: Camilo Munoz

Paper: Norbert Martínez-Bazan, M. Ángel Águila-Lorente, Victor Muntés-Mulero, David Dominguez-Sal, Sergio Gómez-Villamor, and Josep-L. Larriba-Pey. Efficient graph management based on bitmap indices. Proc. 16th International Database Engineering & Applications Symposium, pages 110-119, 2012.

Presentation slides

Presentation 3: Tuhin Tiwari

Paper: Anurag Khandelwal, Zongheng Yang, Evan Ye, Rachit Agarwal, Ion Stoica. ZipG: A Memory-efficient Graph Store for Interactive Queries. Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1149-1164, 2017.

Presentation slides

Week 7 - February 19, 2019 (Reading week, no class)

Week 8 - February 26, 2019

Graph analytics

Presentation 1: Noshin Nawar Sadat

Paper: Shiv Verma, Luke M. Leslie, Yosub Shin, Indranil Gupta. An Experimental Comparison of Partitioning Strategies in Distributed Graph Processing. Proc. VLDB Endow., 10(5): 493-504, 2017.

Presentation slides

Presentation 2: Ruoxi Zhang

Paper: B. Shao, H. Wang, Y. Li. Trinity: a distributed graph engine on a memory cloud. Proc. ACM SIGMOD International Conference on Management of Data, pages 505-516, 2013.

Presentation slides

Presentation 3: Brad Glasbergen

Paper: Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. Schism: a workload-driven approach to database replication and partitioning. Proc. VLDB Endow. 3(1-2): 48-57, 2010.

Presentation slides

Week 9 - March 5, 2019

Machine learning for big data analytics

Presentation 1: Aida Sheshbolouki

Paper: William Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems (Proc. 31st Conference on Neural Information Processing Systems), pages 1024-1034, 2017.

Presentation slides

Presentation 2: Juan Carrillo

Paper: Matthias Boehm, Berthold Reinwald, Dylan Hutchison, Prithviraj Sen, Alexandre V. Evfimievski, Niketan Pansare. On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML. Proc. VLDB Endow., 11(12): 1755-1768, 2018.

Presentation slides

Presentation 3: Huanyi Chen

Paper: Yongjoo Park, Ahmad Shahab Tajik, Michael J. Cafarella, Barzan Mozafari. Database Learning: Toward a Database that Becomes Smarter Every Time. Proc. ACM SIGMOD International Conference on Management of Data, pages 587-602, 2017.

Presentation slides

Week 10 - March 12, 2019

RDF data processing

Presentation 1: Juan Carrillo

Paper: Gensheng Zhang, Damian Jimenez, Chengkai Li. Maverick: Discovering Exceptional Facts from Knowledge Graphs. Proc. ACM SIGMOD International Conference on Management of Data, pages 1317-1332, 2018.

Presentation slides

Presentation 2: Michael Abebe

Paper: J. Huang, D. J. Abadi, K. Ren. Scalable SPARQL Querying of Large RDF Graphs. Proc. VLDB Endow., 4(11): 1123-1134, 2011.

Presentation slides

Presentation 3: Sidharth Singla

Paper: J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A generic architecture for storing and querying RDF and RDF schema. Proc. 1st International Semantic Web Conference, pages 54-68, 2002.

Presentation slides

Week 11 - March 19, 2019

Open topic -- based on your choice

Presentation 1: Varshanth Rao

Paper: W. Wang, J. Gao, M. Zhang, G. Chen, T.K. Ng, B.C. Ooi, J. Shao, M. Reyad. Rafiki: Machine Learning as an Analytics Service System. Proc. VLDB Endow., 12(2): 128-140, 2018.

Presentation slides

Presentation 2: Sushant Raikar

Paper: Michael Stonebraker and Ariel Weisberg. The VoltDB Main Memory DBMS. IEEE Data Eng. Bull., 36(2): 21-27, 2013.

Presentation slides

Presentation 3: Ishank Jain

Paper: Maaz Bin Safeer Ahmad, Alvin Cheung. Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications. Proc. ACM SIGMOD International Conference on Management of Data, pages 1205-1220, 2018.

Presentation slides

Presentation 4: Manoj Sharma

Paper: Viktor Leis, Michael Haubenschild, Alfons Kemper, Thomas Neumann. LeanStore: In-Memory Data Management beyond Main Memory. Proc. 34th IEEE International Conference on Data Engineering, pages 185-196, 2018.

Presentation slides

Week 12 - March 26, 2019

No class

Week 13 - April 2, 2019

Project presentations

Presentation 1: Noshin Nawar Sadat & Ishank Jain

Presentation slides

Presentation 2: Varshanth Rao & Sidharth Singla

Presentation slides

Presentation 3: Camilo Munoz & Juan Manuel Carrillo

Presentation slides

Presentation 4: Michael Abebe & Brad Glasbergen

Presentation slides

April 4, 2019

Project presentations

Presentation 5: Huanyi Chen & Ruoxi Zhang

Presentation slides

Presentation 6: Aida Sheshbolouki & Manoj Sharma

Presentation slides

Presentation 7: Sushant Raikar & Tuhin Tiwari

Presentation slides