Please note: This PhD defence will take place online.
Khaled Ammar, PhD candidate
David R. Cheriton School of Computer Science
Supervisors: Professors M. Tamer Özsu, Semih Salihoglu
Data generated from human and systems interactions could be naturally represented as graph data. Several emerging applications rely on graph data, such as, semantic web, social networks, bioinformatics, finance, and trading among others. These applications require graph querying capabilities which are often implemented in a graph database management systems (GDBMS). Many GDBMSs have capabilities to evaluate one-time versions of recursive or subgraph queries over static graphs — graphs that do not change or a single snapshot of a changing graph. They generally do not support incrementally maintaining them. In fact, most applications that employ graphs are dynamic in nature resulting in graphs that change over time, also known as dynamic graphs.
This thesis investigates how to build a generic and scalable incremental computation solution that is oblivious to graph workloads. It focuses on two fundamental computations performed by many applications: recursive queries and subgraph queries. Specifically, for subgraph queries, this thesis presents the first approach that (i) performs joins with worst case optimal computation and communication costs; and (ii) maintains a total memory footprint almost linear in the number of input edges. For recursive queries, this thesis studies optimizations for using differential computation (DC). DC is a general incremental computation that can maintain the output of a recursive dataflow computation upon changes. However, it requires a prohibitively large amount of memory. The thesis proposes a suite of optimizations that are based on dropping the differences of operators, both completely or partially, and recomputing these differences when necessary, as a technique to reduce the memory overhead of DC. The techniques and optimizations in this thesis represent a proposal for how to build a state-of-the-art generic and scalable GDBMS for dynamic graph data management.