I do both systems and theoretical research in data management and processing. My current systems research focuses on Graphflow, a new graph database we are building from scratch. We study the fundamental components of graph databases, such as the query optimizer, storage layer, and transaction manager, and build each one ourselves.
My theoretical research focuses on understanding the computational complexity of distributed algorithms that evaluate database queries. Many modern distributed systems are based on the bulk synchronous parallel (BSP) model of computation. Parallel algorithms running on these systems use three main resources: (1) number of rounds, i.e., synchronizations; (2) communication; and (3) memory, i.e., space. My work focuses on understanding the relationship between these three resources and designing algorithms (specifically query processing algorithms) that are optimal in terms of one or more of them.
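To make the three resources concrete, here is a minimal sketch, not taken from any of my papers, that simulates a one-round hash-partitioned join of two relations R(a, b) and S(b, c) and reports rounds, communication, and the largest per-machine memory load. The function name, the tuple layout, and the choice to count cost in tuples are assumptions made only for illustration.

```python
def hash_join_one_round(r, s, num_machines):
    """Join R(a, b) with S(b, c) on b in a single simulated BSP round.

    Hypothetical illustration: costs are counted in tuples, not bytes.
    """
    # Shuffle phase: every tuple is sent to the machine chosen by hashing b.
    machines = [{"r": [], "s": []} for _ in range(num_machines)]
    communication = 0
    for a, b in r:
        machines[hash(b) % num_machines]["r"].append((a, b))
        communication += 1
    for b, c in s:
        machines[hash(b) % num_machines]["s"].append((b, c))
        communication += 1

    # Local phase: each machine joins only the tuples it received.
    output = []
    for machine in machines:
        a_values_by_b = {}
        for a, b in machine["r"]:
            a_values_by_b.setdefault(b, []).append(a)
        for b, c in machine["s"]:
            for a in a_values_by_b.get(b, []):
                output.append((a, b, c))

    # Memory is the largest number of input tuples held by any one machine.
    max_memory = max(len(m["r"]) + len(m["s"]) for m in machines)
    costs = {"rounds": 1, "communication": communication, "max_memory": max_memory}
    return sorted(output), costs
```

In this sketch the algorithm uses one round and communicates every input tuple once, but the memory of the most loaded machine depends on how skewed the join attribute is. Trading such costs against one another is the kind of question the theory studies.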
I have one two-year postdoc position available to work broadly on the internals of graph databases. If you are interested, please email me with your CV and a description of the work you have done on relational or graph database systems.
I have one co-op position available for a UWaterloo undergrad in the Summer 2019 term to work on a project involving our prototype graph database, Graphflow. If you are interested, please contact me with your CV and transcript.
Graphflow is a prototype active graph database. It evaluates general one-time and continuous subgraph queries and supports the property graph model. The database is implemented in Java and provides a Cypher-like interface, which extends the openCypher query language with subgraph-condition-action triggers. At the core of Graphflow’s query processor are two worst-case optimal join algorithms: Generic Join for one-time subgraph queries and our new Delta Generic Join for continuous subgraph queries.
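As a rough illustration of the idea behind Generic Join, the sketch below evaluates a directed triangle query (edges a->b, b->c, a->c) one query vertex at a time, intersecting adjacency lists to find the last vertex. It is a simplified Python sketch, not Graphflow's Java implementation; the function name and the edge-list input format are assumptions.

```python
from collections import defaultdict


def triangles_generic_join(edges):
    """Enumerate (a, b, c) such that a->b, b->c and a->c are all edges.

    Hypothetical sketch: matches one query vertex at a time and extends
    partial matches by intersecting adjacency sets.
    """
    forward = defaultdict(set)
    for u, v in edges:
        forward[u].add(v)

    results = []
    for a in sorted(forward):              # candidates for a
        for b in sorted(forward[a]):       # candidates for b given a
            # c must be an out-neighbour of both a and b.
            out_a = forward[a]
            out_b = forward.get(b, set())
            # Iterate over the smaller set and probe the larger one, so the
            # intersection costs time proportional to the smaller set.
            small, large = (out_a, out_b) if len(out_a) <= len(out_b) else (out_b, out_a)
            for c in sorted(x for x in small if x in large):
                results.append((a, b, c))
    return results
```

The point of the design is that the query is never decomposed into binary joins that materialize all two-edge paths. Each partial match is extended only through an intersection, which is what bounds the work in the worst case.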
GPS (A Graph Processing System) is an open-source system for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. GPS is similar to Google’s proprietary Pregel system and to Apache Giraph. It is a distributed system designed to run on a cluster of machines, such as Amazon EC2 instances.
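For readers unfamiliar with Pregel-style systems, the following single-process sketch shows the vertex-centric, superstep-based style of computation such systems expose, using connected components as the example. It does not use the GPS API. The function name is hypothetical, and the input is assumed to be an undirected graph given as a dictionary in which every vertex appears as a key and every edge is listed in both directions.

```python
def pregel_connected_components(adjacency):
    """Label each vertex with the smallest vertex id in its component.

    Hypothetical single-process simulation of Pregel-style supersteps.
    """
    # Every vertex starts with its own id as its label.
    value = {v: v for v in adjacency}

    # Superstep 0: every vertex sends its label to all of its neighbours.
    inbox = {v: [] for v in adjacency}
    for v, neighbours in adjacency.items():
        for n in neighbours:
            inbox[n].append(value[v])
    supersteps = 1

    # Later supersteps: a vertex wakes up only if it received messages, and
    # sends again only if its label decreased. Messages sent in one
    # superstep are delivered at the start of the next.
    while any(inbox.values()):
        outbox = {v: [] for v in adjacency}
        for v, messages in inbox.items():
            if not messages:
                continue  # vertex stays halted
            smallest = min(messages)
            if smallest < value[v]:
                value[v] = smallest
                for n in adjacency[v]:
                    outbox[n].append(smallest)
        inbox = outbox
        supersteps += 1
    return value, supersteps
```

In a real distributed system the vertices are partitioned across machines and the inbox and outbox exchange happens over the network at each synchronization barrier. The sketch keeps everything in one process to show only the programming model.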
The seminar covered the historical waves that made graph data models popular, such as the web and the semantic web. It also covered recent topics popular in the database research community, e.g., modern graph databases, graph processing software built on Hadoop and Spark-like systems, knowledge graphs, and example machine learning applications on graphs.
The study of efficient algorithms and effective algorithm design techniques. Topics include divide and conquer algorithms, recurrences, greedy algorithms, dynamic programming, graph search and backtrack, problems without algorithms, NP-completeness and its implications.
The seminar surveyed the models that underlie modern large-scale data processing systems, e.g., MapReduce, Spark, Pregel, Flink, Storm, Timely, and others. The goal was to identify the fundamental advantages and limitations of different models and to demonstrate the systems and applications that are built on each model.