Chathura Kankanamge, Master’s candidate
David R. Cheriton School of Computer Science
This thesis studies the problem of optimizing and evaluating multiple directed structural subgraph queries, i.e., those without highly selective predicates on the edges or vertices, continuously in a changing graph. Existing techniques focus on queries with highly selective predicates and are designed for evaluating a single query. As such, these techniques do not scale when evaluating multiple structural queries, either because their computations become prohibitively inefficient or because they rely on prohibitively large auxiliary data structures.
We build upon the {\em delta subgraph query (DSQ)} framework that was introduced in prior work. This framework decomposes queries into multiple delta queries, which are then evaluated one query vertex at a time in an evolving graph starting with the newly added or deleted edges.
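As a rough illustration of the idea, the sketch below applies a delta decomposition to a single directed triangle query a->b, b->c, c->a under edge insertions. It is a minimal sketch and not the implementation studied in this thesis: the names `Graph`, `insert_edge_and_emit`, and `all_triangle_matches` are hypothetical, the decomposition follows the standard delta rule from incremental view maintenance (one delta query per query edge), and it assumes single-edge insertions with no self-loops, in which case the adjacency lists read by each delta query are the same before and after the update.

```python
class Graph:
    """Directed graph with forward and backward adjacency sets (hypothetical helper)."""

    def __init__(self):
        self.out = {}  # vertex -> set of out-neighbours
        self.inc = {}  # vertex -> set of in-neighbours

    def add_edge(self, u, v):
        self.out.setdefault(u, set()).add(v)
        self.inc.setdefault(v, set()).add(u)

    def out_nbrs(self, x):
        return self.out.get(x, set())

    def in_nbrs(self, x):
        return self.inc.get(x, set())


def insert_edge_and_emit(g, u, v):
    """Insert edge u->v and return the newly created matches (a, b, c)
    of the triangle query a->b, b->c, c->a.

    Assumption: single-edge insertions and no self-loops.
    """
    if u == v or v in g.out_nbrs(u):
        return set()  # self-loop or duplicate edge: nothing new
    g.add_edge(u, v)

    new_matches = set()
    # Delta query 1: the new edge plays a->b, so a=u, b=v.
    # Extend by query vertex c: need v->c and c->u.
    for c in g.out_nbrs(v) & g.in_nbrs(u):
        new_matches.add((u, v, c))
    # Delta query 2: the new edge plays b->c, so b=u, c=v.
    # Extend by query vertex a: need a->u and v->a.
    for a in g.in_nbrs(u) & g.out_nbrs(v):
        new_matches.add((a, u, v))
    # Delta query 3: the new edge plays c->a, so c=u, a=v.
    # Extend by query vertex b: need v->b and b->u.
    for b in g.out_nbrs(v) & g.in_nbrs(u):
        new_matches.add((v, b, u))
    return new_matches


def all_triangle_matches(g):
    """Brute-force evaluation of the full query, used only as a reference."""
    return {
        (a, b, c)
        for a in g.out
        for b in g.out_nbrs(a)
        for c in g.out_nbrs(b)
        if a in g.out_nbrs(c)
    }


g = Graph()
g.add_edge(1, 2)
g.add_edge(2, 3)
closing = insert_edge_and_emit(g, 3, 1)  # closes the triangle 1->2->3->1
```

In this toy case each delta query extends the new edge by exactly one query vertex, using one intersection of adjacency sets, and all three delta queries happen to intersect the same two sets. Redundancy of this kind is what a combined plan over several delta queries can try to exploit.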
We study the problem of picking good query vertex orderings for a set of DSQs collectively, so that computation is shared across different DSQs and run times are efficient in practice. We describe a generic greedy cost-based optimizer that takes as input a set of DSQs and a {\em subgraph extension catalogue}, and generates a single low-cost combined plan that evaluates all of the DSQs together. As our cost metric we adopt a new metric called the intersection cost (i-cost), which we show is a good estimate of the actual work performed during query evaluation. We further describe an optimization called the expanded DSQ optimization, which algebraically expands DSQs into more DSQs in order to share even more computation than is possible with the original compact DSQs. On small query sets, we demonstrate that our cost-based greedy optimizer finds combined plans whose run times are close to optimal. On larger query sets, we demonstrate that our optimizer can yield significant performance improvements over several baselines. Compared to existing techniques, our approach handles multiple structural queries efficiently and does not use expensive indices or auxiliary data structures.