Please note: This PhD seminar will take place in DC 1304.
Khaled
Ammar,
PhD
candidate
David
R.
Cheriton
School
of
Computer
Science
Supervisors: Professors Tamer Özsu, Semih Salihoglu
Data generated from human and systems interactions could be naturally represented as graph data. Several emerging applications rely on graph data, such as, the semantic web, social networks, bioinformatics, finance, and trading, among others. These applications require graph querying capabilities, often implemented in graph database management systems (GDBMS). Many GDBMSs have capabilities to evaluate one-time versions of recursive or subgraph queries over static graphs — graphs that do not change or a single snapshot of a changing graph. They generally do not support incrementally maintaining them. Most applications that employ graphs are dynamic, resulting in graphs that change over time, also known as dynamic graphs.
In this presentation, I will discuss the systems and algorithms approaches needed for building a generic and scalable incremental computation solution oblivious to graph workloads. The focus is on two fundamental computations performed by many applications: subgraph queries and recursive queries. Specifically, for subgraph queries, I will present the first approach that (i) performs joins with worst-case optimal computation and communication costs; and (ii) maintains a total memory footprint almost linear in the number of input edges. For recursive queries, I will discuss different optimizations on top of differential computation (DC). DC is a general incremental computation that can maintain the output of a recursive dataflow computation upon changes. However, it requires a prohibitively large amount of memory, limiting its scalability. My presentation shows a suite of optimizations based on dropping the operators’ differences, completely or partially, and recomputing these differences, when necessary, as a technique to reduce the memory overhead of DC. Overall, the presented methods and optimizations represent a proposal for how to build a state-of-the-art generic and scalable GDBMS for dynamic graph data management.