[DSG Logo]

CS 848: Graph Analytics and Data Management

Winter 2020

Semih Salihoglu
firname [dot] lastname [at] uwaterloo [dot] ca
DC 3351
Meeting times: Tuesdays, 9am-12pm
Meeting location: DC 2568


Graphs are one of the most naturals data structures to represent connected entities in the real world. As such, many real world applications model their data in the form of a graph and analyze or query these graphs. Several of the numerous examples include neuro-scientists using brain networks to identify the functionality and importance of neurons, search engines, such as Google, using knowledge graphs for question answering, banks using transaction graphs to detect fraudulent money flow patterns, social networking applications, such as Facebook and LinkedIn, using their social networks to do recommendations. This seminar covers topics in graph analytics and data management that has enabled such applications. We will study the software systems for graph analytics and data management and cover algorithmic work focusing on several machine learning tasks, such as community detection, clustering, link prediction, and influence maximization. We will read a wide range of papers from different communities both within computer science, such as databases, machine learning, systems, and social network analysis, as well as outside of computer science such as biology and physics.


Having a background on systems, such as having taken courses in distributed systems, computer architecture, is not strictly necessary but helpful. Similarly, having a background on network analytics and graph algorithms is also not necessary but will be helpful for appreciating the papers on graph analytics.

Workload & Mark Breakdown

The main workload will consist of a term-long project that students will do related to one of the topics covered in the seminar. The other workload includes paper reviews and two in-class presentations. In a typical seminar, we will have two paper presentations, roughly 25 minutes each, followed by about an hour of open discussions. Each week a different student will be leading the discussion. This is a seminar, so your participation in the class is very critical to everyone's learning. That is why it will be a significant part of your grade. Please ask questions and make comments throughout the discussions. The workload pieces and mark breakdown is as follows:


Students can work individually or in pairs on a project. Students are encouraged to come up with their own projects. If you would like to brainstorm about a project, please talk to me. The projects are intended to be mini-research projects in a topic related to graph analytics and data management. Projects can be on systems or algorithmic. Example algorithmic projects can include studying the behaviour of an existing graph analytics algorithm through experiments, comparing the performances of several algorithms that are designed to solve the same problem, designing a new algorithm to solve a problem. Example systems projects can extend or replace a component of an existing graph database, RDF engine, or a graph analytics system or building a prototype system demonstrating a new design for a component of a system (e.g., a new storage format). You have 3 weeks to decide on the project. By January 21st you need to give a 1 page proposal of your project in double-column paper format.

Project Deliverables

There are two main deliverables of your project, a 6-page paper and the source code of your project with instructions to run your code.

Paper Reviews

For each class we will be writing two reviews for two of the papers assigned to that day (except the first seminar). If there are more than two papers assigned, you can pick any two of the assigned papers. You are allowed to skip 3 reviews throughout the term. The reviews will be written in the form of peer-reviewed conference reviews. So, they should be critiques that describe the strengths and weaknesses of the paper and contain suggestions to make the paper stronger. Each review must be written in the following template:

You will submit your reviews in the seminar. Finally, the format for the reviews is single column, 1.5 space, 12 pt, in Latex.


Each student will be doing 2 presentations in the term. Each presentation will be about 25 minutes long. Here are the important points summarizing what you have to do for your presentations.

Schedule and Papers

To access some of the papers below, you might have to log in to your UWaterloo library account at UWaterloo. Topics in graph processing is widely varied and the exact topics for the last three weeks will be determined by the interests of the students. We will have an in class poll in the middle of the term where students will rank their interests in the following topics: (i) Research in Computer Architecture: FPGAs, accelerators; (ii) More on Linked/Open Data and Semantic Web; (iii) Graph Visualization; (iv) Advanced Query Processing Techniques: Recursive path queries; (v) Graph Stream Processing; (vi) More machine Learning Algorithms: Clustering and/or community detection algorithms; (vii) Applications in Sciences: Genome assembly using de brujin graphs.
W1: Jan 7 Overview, Historic Graph Data Management Systems Optional
  • Bachman's talk at Computer History Museum. Be amused by his story of how he did not know who Alan Turing was when he was given the Turing award and him meeting Alan Turing's mother.
W2: Jan 14 Modern Graph Database Management Systems Optional
W3: Jan 21 Parallel Graph Analytics Systems 1: Vertex-centric Systems
W4: Jan 28 Parallel Graph Analytics Systems 2: Other Frontends
W5: Feb 4 Semantic Web and Knowledge Graph Management Optional
W6: Feb 11 Linked Data and Knowledge Graphs Optional
W7: Feb 25 Structure and Properties of Real World Graphs Optional
W8: March 3 Analytics Algorithms
W9: March 10 Graphs in Machine Learning
W10: March 17 Selected by students from the above list
W11: March 24 Selected by students from the above list
W12: March 31 Selected by students from the above list