Introduction
Distributed Data Stores
- Presentation 1: Tuhin Tiwari
- Paper: N. Bronson, et al., TAO: Facebook's Distributed Data Store for the Social Graph, Proc. USENIX Annual Technical Conference, pages 49-60, 2013.
- Presentation slides
- Presentation 2: Noshin Nawar Sadat
- Paper: S. Ghemawat, H. Gobioff, S.-T. Leung. The Google file system. Proc. 19th ACM Symposium on Operating Systems Principles, pages 29-43, 2003.
- Presentation slides
- Presentation 3: Ruoxi Zhang
- Paper: K. Shvachko, H. Kuang, S. Radia, R. Chansler, The Hadoop Distributed File System, IEEE 26th Symposium on Mass Storage Systems and Technologies, 2010.
- Presentation slides
Main Memory Systems
- Presentation 1: Michael Abebe
- Paper: Cristian Diaconu, Craig Freedman, Erik Ismert, Per-Åke Larson, Pravin Mittal, Ryan Stonecipher, Nitin Verma, Mike Zwilling. Hekaton: SQL Server's memory-optimized OLTP engine. Proc. ACM SIGMOD International Conference on Management of Data, pages 1243-1254, 2013.
- Presentation slides
- Presentation 2: Brad Glasbergen
- Paper: Alfons Kemper, Thomas Neumann. HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. Proc. IEEE 27th International Conference on Data Engineering, pages 195-206, 2011.
- Presentation slides
- Presentation 3: Varshanth Rao
- Paper: Franz Färber, Norman May, Wolfgang Lehner, Philipp Große, Ingo Müller, Hannes Rauhe, Jonathan Dees. The SAP HANA Database -- An Architecture Overview. IEEE Data Eng. Bull., 35(1): 28-33, 2012.
- Presentation slides
MapReduce-based data management
- Presentation 1: Huanyi Chen
- Paper: M. Zaharia, R. S. Xin, P. Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, J. Rosen, S. Venkataraman, M. J. Franklin, A. Ghodsi, J. Gonzalez, S. Shenker, and I. Stoica. Apache Spark: a unified engine for big data processing. Commun. ACM 59(11): 56-65, 2016.
- Presentation slides
- Presentation 2: Camilo Munoz
- Paper: R. Sumbaly, J. Kreps, and S. Shah. The big data ecosystem at LinkedIn. Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1125-1134, 2013.
- Presentation slides
- Presentation 3: Manoj Sharma
- Paper: A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, and R. Murthy. Hive - a petabyte scale data warehouse using Hadoop. Proc. IEEE 26th International Conference on Data Engineering, pages 996-1005, 2010.
- Presentation slides
Stream processing systems
- Presentation 1: Sushant Raikar
- Paper: Jean-François Im, Kishore Gopalakrishna, Subbu Subramaniam, Mayank Shrivastava, Adwait Tumbde, Xiaotian Jiang, Jennifer Dai, Seunghyun Lee, Neha Pawar, Jialiang Li, and Ravi Aringunram. Pinot: Realtime OLAP for 530 Million Users. In Proc. ACM International Conference on Management of Data, pages 583-594, 2018.
- Presentation slides
- Presentation 2: Sidharth Singla
- Paper: L. Abraham, J. Allen, O. Barykin, V. Borkar, B. Chopra, C. Gerea, D. Merl, J. Metzler, D. Reiss, S. Subramanian, J. L. Wiener, and O. Zed. Scuba: diving into data at Facebook. Proc. VLDB Endow., 6(11): 1057-1067, 2013.
- Presentation slides
- Presentation 3: Aida Sheshbolouki
- Paper: Pramod Bhatotia, Umut A. Acar, Flavio P. Junqueira, and Rodrigo Rodrigues. Slider: incremental sliding window analytics. In Proc. 15th International Middleware Conference, pages 61-72, 2014.
Graph databases
- Presentation 1: Ishank Jain
- Paper: Ayush Dubey, Greg D. Hill, Robert Escriva, and Emin Gün Sirer. Weaver: a high-performance, transactional graph database based on refinable timestamps. Proc. VLDB Endow. 9(11): 852-863, 2016.
- Presentation slides
- Presentation 2: Camilo Munoz
- Paper: Norbert Martínez-Bazan, M. Ángel Águila-Lorente, Victor Muntés-Mulero, David Dominguez-Sal, Sergio Gómez-Villamor, and Josep-L. Larriba-Pey. Efficient graph management based on bitmap indices. Proc. 16th International Database Engineering & Applications Symposium, pages 110-119, 2012.
- Presentation slides
- Presentation 3: Tuhin Tiwari
- Paper: Anurag Khandelwal, Zongheng Yang, Evan Ye, Rachit Agarwal, Ion Stoica. ZipG: A Memory-efficient Graph Store for Interactive Queries. Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1149-1164, 2017.
- Presentation slides
Graph analytics
- Presentation 1: Noshin Nawar Sadat
- Paper: Shiv Verma, Luke M. Leslie, Yosub Shin, Indranil Gupta. An Experimental Comparison of Partitioning Strategies in Distributed Graph Processing. Proc. VLDB Endow., 10(5): 493-504, 2017.
- Presentation slides
- Presentation 2: Ruoxi Zhang
- Paper: B. Shao, H. Wang, Y. Li. Trinity: a distributed graph engine on a memory cloud, Proc. ACM SIGMOD International Conference on Management of Data, pages 505-516, 2013.
- Presentation slides
- Presentation 3: Brad Glasbergen
- Paper: Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. Schism: a workload-driven approach to database replication and partitioning. Proc. VLDB Endow. 3(1-2): 48-57, 2010.
- Presentation slides
Machine learning for big data analytics
- Presentation 1: Aida Sheshbolouki
- Paper: William Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems (Proc. 31st Conference on Neural Information Processing Systems), pages 1024-1034, 2017.
- Presentation slides
- Presentation 2: Juan Carrillo
- Paper: Matthias Boehm, Berthold Reinwald, Dylan Hutchison, Prithviraj Sen, Alexandre V. Evfimievski, Niketan Pansare. On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML. Proc. VLDB Endow., 11(12): 1755-1768, 2018.
- Presentation slides
- Presentation 3: Huanyi Chen
- Paper: Yongjoo Park, Ahmad Shahab Tajik, Michael J. Cafarella, Barzan Mozafari. Database Learning: Toward a Database that Becomes Smarter Every Time. Proc. ACM SIGMOD International Conference on Management of Data, pages 587-602, 2017.
- Presentation slides
RDF data processing
- Presentation 1: Juan Carrillo
- Paper: Gensheng Zhang, Damian Jimenez, Chengkai Li. Maverick: Discovering Exceptional Facts from Knowledge Graphs. Proc. ACM SIGMOD International Conference on Management of Data, pages 1317-1332, 2018.
- Presentation slides
- Presentation 2: Michael Abebe
- Paper: J. Huang, D. J. Abadi, K. Ren. Scalable SPARQL Querying of Large RDF Graphs. Proc. VLDB Endow., 4(11): 1123-1134, 2011.
- Presentation slides
- Presentation 3: Sidharth Singla
- Paper: J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A generic architecture for storing and querying RDF and RDF schema. Proc. 1st International Semantic Web Conference, pages 54-68, 2002.
- Presentation slides
Open topic -- based on your choice
- Presentation 1: Varshanth Rao
- Paper: W. Wang, J. Gao, M. Zhang, G. Chen, T.K. Ng, B.C. Ooi, J. Shao, M. Reyad. Rafiki: Machine Learning as an Analytics Service System. Proc. VLDB Endow., 12(2): 128-140, 2018.
- Presentation slides
- Presentation 2: Sushant Raikar
- Paper: Michael Stonebraker and Ariel Weisberg. The VoltDB Main Memory DBMS. IEEE Data Eng. Bull., 36(2): 21-27, 2013.
- Presentation slides
- Presentation 3: Ishank Jain
- Paper: Maaz Bin Safeer Ahmad, Alvin Cheung. Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications. Proc. ACM SIGMOD International Conference on Management of Data, pages 1205-1220, 2018.
- Presentation slides
- Presentation 4: Manoj Sharma
- Paper: Viktor Leis, Michael Haubenschild, Alfons Kemper, Thomas Neumann. LeanStore: In-Memory Data Management beyond Main Memory. Proc. 34th IEEE International Conference on Data Engineering, pages 185-196, 2018.
- Presentation slides
No class
Project presentations
- Presentation 1: Noshin Nawar Sadat & Ishank Jain
- Presentation 2: Varshanth Rao & Sidharth Singla
- Presentation 3: Camilo Munoz & Juan Manuel Carrillo
- Presentation 4: Michael Abebe & Brad Glasbergen
April 4, 2019
Project presentations
- Presentation 5: Huanyi Chen & Ruoxi Zhang
- Presentation 6: Aida Sheshbolouki & Manoj Sharma
- Presentation slides
- Presentation 7: Sushant Raikar & Tuhin Tiwari