[DSG Logo]

CS 848: Graph Data Management

(Fall 2018)

Semih Salihoglu
firname [dot] lastname [at] uwaterloo [dot] ca
DC 3351
Meeting times: Tue 10:00am-1:00pm
Meeting location: DC 2568

Announcements

Overview

This seminar covers topic in graph data management and processing, starting from the very early graph data models and systems to the current day research on these topics. The most widespread data model underlying existing systems is the relational model, originally invented by Edgar F. Codd in 1970. At a high-level, data in the relational model is modeled as tuples in structured tables and data manipulation and retrieval is expressed using relation operators, such as selection, joins, and projection. In contrast, in graph-based models (at a high-level) data is often represented as a set of nodes or objects and pointers between the nodes. Data manipulation and retrieval is expressed as path finding in the underlying graph or traversals through the nodes. Although systems with graph-based models are not as in widespread use as relational systems in today's world, they have existed since the birth of data management systems. Many people have heard of Edgar Codd and the relational model. Yet, fewer people have heard of Charles Bachman, who is credited as the "inventor of database systems" for leading the development of the first data management system IDS that is based on a graph-based model called the network model.

Through the years, driven by different popular applications in different eras, graph-based systems have become popular. The first half of the seminar will cover the historical waves that made graph data models popular, such as the web and semantic web. The second half of the seminar will cover recent topics about graph data management & processing that are particularly popular in the database research community. Example topics include modern graph databases, graph data processing software based on Hadoop and Spark-like software, knowledge graphs, and example machine learning applications on graphs. The seminar is project-based and each student will do a term-long mini research project preferably on existing systems.

Prerequisites

Students are expected to have taken a course covering the fundamentals of databases at the level of an introductory course. If you are not familiar with database systems, it will be very difficult to enjoy the readings. Having systems background is not necessary but helpful in appreciating the material too.

Workload & Mark Breakdown

The main workload will consist of a term-long project that students will do related to one of the topics covered in the seminar. The other workload includes paper reviews and one presentation. Detailed information for each piece is below. Typically, I will be lecturing or giving a presentation, and leading a discussion in the first half of the seminar. The latter half will have a 25 minute presentation by a student followed by a discussion. This is a seminar, so your participation in the class is very critical to everyone's learning. That is why it will be a significant part of your grade. Please ask questions and make comments throughout the discussions. The workload pieces and mark breakdown is as follows:

Projects

Each student will do a separate project. If you want to work in pairs, please talk to me. Students can come up with their own projects but I have several projects that I am happy to suggest. The projects are intended to be mini-research projects in a topic related to graph management and processing. Example projects can be extending or replacing a component of an existing graph database, RDF engine, or a graph processing system, building a prototype system demonstrating a new design for a component of a system (e.g., a new storage format), experimenting with different possible designs of a component, experimenting with multiple algorithms for evaluating graph query, etc. If you are interested in a theoretical project, please talk to me. The most important thing is to pick a project on a topic you really like! You will meet with me several times during the term to discuss your progress. You have about 3 weeks to decide on the project. By September 28th you need to give a 1 page proposal of your project in double-column paper format.

Project Deliverables

There are three main deliverables of your project, a 6-page paper, source code of your project, and refereeing duty. We will simulate a workshop submission cycle. You will submit your papers on a workhshop site that we will setup. Each student will be on the PC of the workshop and will get to referee three of their peers' papers. We will have an extra PC meeting to accept or reject the papers. The paper and source code are due November 27th. From 27th until 30th you'll review your peers' papers offline. On November 30th, we'll have a face-to-face PC meeting.

Paper Reviews

For each class we will be writing two reviews for two of the papers assigned to that day (except the first seminar). If there are more than two papers assigned, you can pick either one. You are allowed to skip 2 reviews throughout the term. The reviews will be at most 1 page long and roughly 2 paragraphs. You have to finish your review with one question to start a discussion in the seminar. The reviews are due the Monday at 6pm before the seminar. You are expected to (very very) briefly answer the following 6 questions and finish your reviews with a question that can start a discussion in class:

The first 4 of these questions are from Jennifer Widom's tips for writing introductions to technical papers. I strongly recommend that each one of you read this entire document very very carefully (probably multiple times) at some point in your graduate studies. There is no fixed format for the reviews but I recommend: Single column, 1.5 space, 12 pt, in Latex.

Presentations

Each student will be doing 1 presentation in the term. The presentation will be about 25 minutes long. Here are the important points summarizing what you have to do for your presentations.

Schedule and Papers

To access some of the papers below, you might have to log in to your UWaterloo library account at UWaterloo. NR below means "no review", that is do not write a review for that paper.
DateTopicReadingsSlides
11/09 Overview, Network and Hierarchical Models, IDS, IBM IMS Optional
  • Bachman's talk at Computer History Museum. Be amused by his story of how he did not really know who Alan Turing was when he was given the Turing award and him meeting Alan Turing's mother.
18/09 Relational Data Model and Systems Optional
25/09 Object-oriented and Object-relational Database Systems Make the "Modeling Concepts for VLSI CAD Objects" the first reading. Optional
Read O2 to see how object-oriented data is modeled as graphs.
Read GENESIS and EXODUS for ambitious attempts to design systems that can be "generate" database systems.
02/10 Web, Data Integration, and XML Lore-Shahid
16/10 RDF and Semantic Web Optional
Jena-Michael
23/10 Applications of Knowledge Graphs Optional
Jeremy-DBpedia
30/10 Modern Graph Databases: Native Systems Optional
Semih-Survey
06/11 Distributed Graph Processing Systems Nafisa-Blogel, Semih-Pregel
13/11 Late Graph Data Research (1): Hybrid Systems Optional
Sara-Vertica
20/10 Late Graph Data Research (2): Query Processing Research (2): Query Processing Characteristic-Sets-Pranjal
27/10 Other Topics Node2vec-Dharvi