I do both systems and theoretical research in data management and processing. My systems work focuses on building systems for managing, querying, and analyzing graph-structured data. My main ongoing systems projects include Graphflow, a new graph database we are building from scratch, and GraphWrangler, a system designed to give an immediate graph view of relational data. My theoretical work focuses on the theory of distributed algorithms for query processing.
I have one postdoc position available for two years to work broadly on the internals of graph databases. If you are interested, please email me with your CV and a description of the work you have done on relational or graph database systems.
Graphflow is a prototype active graph database. Graphflow evaluates general one-time and continuous subgraph queries and supports the property graph model. The database is implemented in Java and provides a Cypher-like interface that extends the openCypher query language with subgraph-condition-action triggers. At the core of Graphflow's query processor are two worst-case optimal join algorithms: Generic Join for one-time subgraph queries and our new Delta Generic Join algorithm for continuous subgraph queries.
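To give a flavor of the worst-case optimal join idea, here is a minimal single-machine sketch of Generic Join for the triangle query (a)->(b)->(c)->(a), assuming a directed graph stored as forward and backward adjacency sets. The names and data layout are illustrative only, not Graphflow's actual internals.

```python
def triangles(edges):
    """Enumerate directed triangles (a, b, c) with edges a->b, b->c, c->a."""
    fwd = {}  # u -> set of v such that the edge u->v exists
    bwd = {}  # v -> set of u such that the edge u->v exists
    for u, v in edges:
        fwd.setdefault(u, set()).add(v)
        bwd.setdefault(v, set()).add(u)
    result = []
    # Generic Join extends one query variable at a time, intersecting the
    # adjacency sets constrained by the already-bound variables.
    for a in fwd:                  # bind a
        for b in fwd[a]:           # bind b: requires edge a->b
            # c must satisfy b->c and c->a: intersect fwd[b] with bwd[a].
            for c in fwd.get(b, set()) & bwd.get(a, set()):
                result.append((a, b, c))
    return result
```

The intersection in the innermost loop is what bounds the running time by the worst-case output size of the query; a traditional pairwise join plan can produce asymptotically larger intermediate results.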
GPS, a Graph Processing System, is an open-source system for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. GPS is similar to Google's proprietary Pregel system and to Apache Giraph. GPS is a distributed system designed to run on a cluster of machines, such as Amazon's EC2.
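The Pregel-style programming model that GPS implements is vertex-centric: computation proceeds in synchronous supersteps in which each vertex reads its incoming messages, updates its state, and sends messages to its neighbors. Below is a hypothetical single-machine simulation of that model, computing connected components by minimum-label propagation; real GPS runs each vertex's compute step in parallel across a cluster.

```python
def connected_components(vertices, edges):
    """Label each vertex with the smallest vertex id in its component."""
    nbrs = {v: set() for v in vertices}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    label = {v: v for v in vertices}  # each vertex starts as its own component
    # Superstep 0: every vertex sends its own label to its neighbors.
    messages = {v: [label[u] for u in nbrs[v]] for v in vertices}
    while any(messages.values()):
        # One superstep: vertices with mail process it and, if their label
        # shrank, notify their neighbors in the next superstep.
        new_messages = {v: [] for v in vertices}
        for v, inbox in messages.items():
            if inbox and min(inbox) < label[v]:
                label[v] = min(inbox)
                for u in nbrs[v]:
                    new_messages[u].append(label[v])
        messages = new_messages
    return label
```

The computation halts when no messages are in flight, mirroring Pregel's "vote to halt" termination condition.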
The seminar covered the historical waves that made graph data models popular, such as the web and the semantic web. The seminar also covered recent topics popular in the database research community, e.g., modern graph databases, graph processing software built on Hadoop- and Spark-like systems, knowledge graphs, and example machine learning applications on graphs.
The study of efficient algorithms and effective algorithm design techniques. Topics include divide-and-conquer algorithms, recurrences, greedy algorithms, dynamic programming, graph search and backtracking, problems without algorithms, and NP-completeness and its implications.
The seminar surveyed the models that underlie modern large-scale data processing systems, e.g., MapReduce, Spark, Pregel, Flink, Storm, and Timely, among others. The goal was to identify the fundamental advantages and limitations of each model and to examine the systems and applications built on it.
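As a concrete reference point for the simplest of these models, here is a minimal single-machine simulation of MapReduce on the classic word-count example. The three phases (map, shuffle-by-key, reduce) are the model's essence; the function names are illustrative, not any particular system's API.

```python
from collections import defaultdict

def map_fn(line):
    """Map phase: emit a (word, 1) pair for every word occurrence."""
    for word in line.split():
        yield (word, 1)

def reduce_fn(word, counts):
    """Reduce phase: fold all values emitted for one key."""
    return (word, sum(counts))

def map_reduce(lines):
    shuffled = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):  # map each input record
            shuffled[key].append(value)  # shuffle: group values by key
    return dict(reduce_fn(k, vs) for k, vs in shuffled.items())
```

In a real MapReduce system the map and reduce calls run on different machines and the shuffle moves data over the network, which is exactly where the model's limitations (e.g., for iterative graph algorithms) show up.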