Semih Salihoğlu

Semih Salihoğlu, Associate Professor

Email: first.last@uwaterloo.ca

Office: DC 3351

I am an Associate Professor and a current David R. Cheriton Faculty Fellow at the University of Waterloo's Cheriton School of Computer Science. I am a member of the Data Systems Research Group. I am also a co-founder of Kùzu Inc., the spinoff company from my research project Kùzu (see below).

Research Interests

I do research on the architectures of data management and processing systems. My work focuses on developing systems for managing, querying, or doing analytics on graph-structured data. My main ongoing systems project is Kùzu, which is a new embeddable graph database management system (GDBMS) that is designed for high scalability and very fast querying. See this blog post that describes the vision of the system and these talks (1 and 2). Kùzu is based on our earlier system GraphflowDB (see this talk I gave at Pinterest for an overview of GraphflowDB).

Here are a few links related to Kùzu:

Website
Github repo
Discord Channel
Twitter account
Youtube account (where we put user meetings and talks)

For prospective postdoc, PhD, and MMath students: If you are interested in doing a postdoc or graduate studies developing large-scale data management, integration, and processing systems, reach out to me indicating your interests.

Publications [Google Scholar] [DBLP]

2024

Modern Techniques For Querying Graph-structured Databases
Amine Mhedhbi, Amol Deshpande, Semih Salihoğlu
Foundations and Trends in Databases, Volume 14, Issue 2, October 2024

Analysis of Open Government Datasets From a Data Design and Integration Perspective
Arif Usta, Chang Liu, Semih Salihoğlu
International Conference on Extending Database Technology (EDBT), March 2024

2023

Kùzu: A Database Management System For “Beyond Relational” Workloads
Semih Salihoğlu
DBrainstorming column of SIGMOD Record, Sep 2023

Kùzu: Graph Learning Applications Need a Modern Graph Database Management System
Ziyi Chen, Xiyang Feng, Guodong Jin, Chang Liu, Semih Salihoğlu
Learning on Graphs Conference (LOG), Nov 2023

Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs (SIGMOD Research Highlight Award; version for a broader audience)
Foreword/Technical perspective by Prof. Dan Suciu
Jeremy Chen, Yuqing Huang, Mushi Wang, Semih Salihoğlu, Ken Salem
SIGMOD Record, June 2023

Kùzu Graph Database Management System
Xiyang Feng, Guodong Jin, Ziyi Chen, Chang Liu, Semih Salihoğlu
The Conference on Innovative Data Systems Research (CIDR), January 2023

Governor: Turning Open Government Data Portals into Interactive Databases
Chang Liu, Arif Usta, Semih Salihoğlu, Jian Zhao
ACM Conference on Human Factors in Computing Systems (SIGCHI), April, 2023

2022

Optimizing Differentially-Maintained Recursive Queries on Dynamic Graphs
Khaled Ammar, Siddhartha Sahu, Semih Salihoğlu, Tamer Özsu
International Conference on Very Large Data Bases (VLDB), September 2022

Modern Techniques for Querying Graph-Structured Relations: Foundations, System Implementations, and Open Challenges [Tutorial] (slides-1, slides-2)
Amine Mhedhbi, Semih Salihoğlu
International Conference on Very Large Data Bases (VLDB), September 2022

Making RDBMSs Efficient on Graph Workloads Through Predefined Joins [Experiment, Analysis & Benchmark]
Guodong Jin, Semih Salihoğlu
International Conference on Very Large Data Bases (VLDB), September 2022

GRainDB: A Relational-core Graph-Relational DBMS
Guodong Jin, Nafisa Anzum, Semih Salihoğlu
The Conference on Innovative Data Systems Research (CIDR), January, 2022

Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs [Experiment, Analysis & Benchmark]
Jeremy Chen, Yuqing Huang, Mushi Wang, Semih Salihoğlu, Ken Salem
International Conference on Very Large Data Bases (VLDB), September 2022
(Best Experiments and Analysis Paper Award)

2021

R2GSync and Edge Views: Practical RDBMS to GDBMS Synchronization
Nafisa Anzum, Semih Salihoğlu
GRADES-NDA Workshop on Graph Data Management Experiences & Systems and Network Data Analytics, June, 2021

Integrating Column-Oriented Storage and Query Processing Techniques Into Graph Database Management Systems
Pranjal Gupta, Amine Mhedhbi, Semih Salihoğlu
International Conference on Very Large Data Bases (VLDB), August 2021

Optimizing One-time and Continuous Subgraph Queries using Worst-Case Optimal Joins
Amine Mhedhbi, Chathura Kankanamge, Semih Salihoğlu
Transactions on Database Systems (TODS), May, 2021

KTabulator: Interactive Ad hoc Table Creation using Knowledge Graphs
Steven Xia, Nafisa Anzum, Semih Salihoğlu, Jian Zhao
ACM Conference on Human Factors in Computing Systems (SIGCHI), May, 2021

2020

Graphsurge: Graph Analytics on View Collections Using Differential Computation
Siddhartha Sahu, Semih Salihoğlu
ACM International Conference on Management of Data (SIGMOD), June, 2021

A+ Indexes: Lightweight and Highly Flexible Adjacency Lists for Graph Database Management Systems
Amine Mhedhbi, Pranjal Gupta, Shahid Khaliq, Semih Salihoğlu
International Conference on Data Engineering (ICDE), April, 2021

2019

Box Covers and Domain Orderings for Beyond Worst-Case Join Processing
Kaleb Alway, Eric Blais, and Semih Salihoğlu
International Conference on Database Theory (ICDT), Cyprus, March 2021

The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: Extended Survey
Siddhartha Sahu, Amine Mhedhbi, Semih Salihoğlu, Jimmy Lin, and M. Tamer Özsu
The VLDB Journal, June 2019

GraphWrangler: An Interactive Graph View on Relational Data
Nafisa Anzum, Semih Salihoğlu, Daniel Vogel
ACM International Conference on Management of Data (SIGMOD) (Demonstration Track), June 2019

Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins
Amine Mhedhbi, Semih Salihoğlu
International Conference on Very Large Data Bases (VLDB), August 2019

2018

Spectral Measures of Distortion for Change Detection in Dynamic Graphs
Luca Castelli Aleardi, Semih Salihoğlu, Gurprit Singh, Maks Ovsjanikov
International Conference on Complex Networks and Their Applications, December, 2018

Algorithmic Aspects of Parallel Data Processing (SIGMOD 2018 Tutorial Slides)
Paris Koutris, Semih Salihoğlu, Dan Suciu
Foundations and Trends in Databases, Volume 8, Issue 4, February 2018

Distributed Evaluation of Subgraph Queries Using Worst-case Optimal Low-Memory Dataflows
Khaled Ammar, Frank McSherry, Semih Salihoğlu
International Conference on Very Large Data Bases (VLDB), Rio de Janeiro, August 2018

The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing
Siddhartha Sahu, Amine Mhedhbi, Semih Salihoğlu, Jimmy Lin, and M. Tamer Özsu
International Conference on Very Large Data Bases (VLDB), Rio de Janeiro, August 2018
(Best Paper Award)

Workload-Aware CPU Performance Scaling for Transactional Database Systems
Mustafa Korkmaz, Martin Karsten, Kenneth Salem, and Semih Salihoğlu
ACM International Conference on Management of Data (SIGMOD), June 2018

2017

Combining Vertex-centric Graph Processing with SPARQL for Large-scale RDF Data Analytics
Ibrahim Abdelaziz, Mohammad Razen Al-Harbi, Semih Salihoğlu, and Panos Kalnis
IEEE Transactions on Parallel and Distributed Systems (TPDS), June 2017

Graphflow: An Active Graph Database
Chathura Kankanamge, Siddhartha Sahu, Amine Mhedhbi, Jeremy Chen, and Semih Salihoğlu
ACM International Conference on Management of Data (SIGMOD) (Demonstration Track), May 2017

GYM: A Multiround Join Algorithm In MapReduce
Foto Afrati, Manas Joglekar, Chris Re, Semih Salihoğlu, and Jeffrey D. Ullman
International Conference on Database Theory (ICDT), Venice, Italy, March 2017

2015

SPARTex: A Vertex-Centric Framework for RDF Data Analytics
Ibrahim Abdelaziz, Razen Harbi, Semih Salihoğlu, Panos Kalnis and Nikos Mamoulis
International Conference on Very Large Data Bases (VLDB) (Demonstration Track), Hawaii, USA, September 2015

Graft: A Debugging Tool For Apache Giraph
Semih Salihoğlu, Jaeho Shin, Vikesh Khanna, Ba Quan Truong and Jennifer Widom
ACM International Conference on Management of Data (SIGMOD) (Demonstration Track), June 2015

2014

Optimizing Graph Algorithms on Pregel-like Systems
Semih Salihoğlu and Jennifer Widom
International Conference on Very Large Data Bases (VLDB), Hangzhou, China, September 2014

Anchor Points Algorithms for Hamming and Edit Distance
Foto Afrati, Anish Das Sarma, Anand Rajaraman, Pokey Rule, Semih Salihoğlu, and Jeffrey D. Ullman
International Conference on Database Theory (ICDT), Athens, Greece, March 2014

HelP: High-level Primitives for Large-Scale Graph Processing
Semih Salihoğlu and Jennifer Widom
Graph Data-management Experiences and Systems Workshop (GRADES), Snowbird, Utah, June 2014

Simplifying Scalable Graph Processing with a Domain-Specific Language
Sungpack Hong, Semih Salihoğlu, Jennifer Widom, and Kunle Olukotun
International Symposium on Code Generation and Optimization (CGO), Orlando, FL, February 2014

2013

Upper and Lower Bounds on the Cost of a MapReduce Computation
Foto Afrati, Anish Das Sarma, Semih Salihoğlu, and Jeffrey D. Ullman
International Conference on Very Large Data Bases (VLDB), Trento, Italy, August 2013

GPS: A Graph Processing System
Semih Salihoğlu and Jennifer Widom
International Conference on Scientific and Statistical Database Management (SSDBM), July 2013
(Best Paper Runner-up)

2012

Provenance-Based Debugging and Drill-Down in Data-Oriented Workflows
Robert Ikeda, Junsang Cho, Charlie Fang, Semih Salihoğlu, Satoshi Torikai, and Jennifer Widom
International Conference on Data Engineering (ICDE), Washington, DC, April 2012 (Demonstration Paper)

2011

Provenance-Based Refresh in Data-Oriented Workflows
Robert Ikeda, Semih Salihoğlu, and Jennifer Widom
Conference on Information and Knowledge Management (CIKM), Glasgow, Scotland, October 2011

Current Projects

Kùzu is a new embeddable graph database management system (GDBMS) that is designed for high scalability and very fast querying. Kùzu is actively being developed and is acquiring more and more features. Kùzu's core architecture is informed by our insights from our previous GraphflowDB project, but Kùzu is disk-based and, more importantly, aims to be a fully functional user-facing system. Our research on Kùzu focuses on techniques for making GDBMSs competent at knowledge graph management and in graph data science pipelines. Specifically, we are studying how to integrate GDBMSs better into the Python graph data science ecosystem and how to make them performant on highly heterogeneous and string/URI-heavy knowledge graphs.

Previous Projects

GraphflowDB is a graph database management system (GDBMS) we are building from scratch. The system is implemented in Java and supports the openCypher language. Our research focuses on rethinking each core database component for contemporary GDBMSs, including core query optimization and processing techniques such as join optimization, indexes, and cardinality estimation.

Graphsurge is a new system built on top of Timely Dataflow and Differential Dataflow for performing analytical computations on multiple snapshots or views of large-scale static property graphs. Graphsurge allows users to create view collections, which are sets of related views of a graph created by applying filter predicates on node and edge properties, and to run analytical computations on all the views of a collection efficiently. The system is designed to support contingency, perturbation, or temporal analysis applications that require running computations on thousands of similar graph snapshots at a time.

GraphWrangler is a system that is designed to streamline the manual and often tedious ETL pipeline of extracting tabular data into a graph format for processing in graph-specific software. GraphWrangler allows users to connect to an RDBMS, MySQL in our current implementation, and within a few clicks extract graphs out of their tabular data, visualize and explore these graphs, and automatically generate scripts for their ETL pipelines. Watch our demonstration video here!

GPS, A Graph Processing System, is an open-source system for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. GPS is similar to Google’s proprietary Pregel system and Apache Giraph. GPS is a distributed system designed to run on a cluster of machines, such as Amazon's EC2.

Students

Ph.D. Students:

M.Math. Students:

Alumni:

Siddhartha Sahu, PhD
Thesis: Optimizing Differential Computation for Large-Scale Graph Processing
Last known position: Software Engineer at Materialize Inc.
Ziyi Chen, Research Associate
Last known position: Co-founder and Software Engineer at Kùzu Inc., Waterloo, Canada
Guodong Jin, Postdoc
Arif Usta, Postdoc
Last known position: Senior AI Engineer at St. Jude Children's Research Hospital, Memphis, USA
Amine Mhedhbi, PhD
Thesis: GraphflowDB: Scalable Query Processing on Graph-Structured Relations (Cheriton SCS Dissertation Award)
Last known position: Assistant Professor at Polytechnique Montréal
Khaled Ammar, PhD (co-advised with Tamer Özsu)
Thesis: Systems and Algorithms for Dynamic Graph Processing
Last known position: Head of Data at Borealis AI
Chang Liu, MMath
Last known position: Co-founder and DevOps Engineer at Kùzu Inc., Waterloo, Canada
Xiyang Feng (MMath), 2019-2021
Last known position: Co-founder and Software Engineer at Kùzu Inc., Waterloo, Canada
Tamal Adhikary (co-advised with Khuzaima Daudjee) (MMath), 2019-2022
Last known position: Engineer at Huawei, Markham, Canada
Jeremy Chen (MMath), 2019-2020
Thesis: Join Cardinality Estimation Graphs: Analyzing Pessimistic and Optimistic Estimators Through a Common Lens
Last known position: Software Engineer at Snowflake, USA.
Nafisa Anzum, 2018-2020
Thesis: Systems for Graph Extraction from Tabular Data
Last known position: Software Engineer, Presto Inc., Waterloo, CA.
Pranjal Gupta (MMath), 2018-2020
Thesis: Integrating Column-Oriented Storage and Query Processing Techniques into Graph Database Management Systems
Last known position: Software Engineer, Apple Inc., London, UK.
Chathura Kankanamge (MMath), 2016-2018
Thesis: Multiple Continuous Subgraph Query Optimization Using Delta Subgraph Queries
Last known position: Software Engineer at Amazon AWS, Vancouver, Canada.
Kaleb Alway (co-supervised with Eric Blais) (MMath), 2017-2019
Thesis: Domain Ordering and Box Cover Problems for Beyond Worst-Case Join Processing
Last known position: Software Engineer at SAP, Waterloo, Canada.
Shahid Khaliq (MMath), 2017-2019
Thesis: Highly Flexible Adjacency Lists in Graph Database Management Systems
Last known position: Software Engineer at Top Hat, Toronto, Canada.
Arman Naeimian (co-supervised with Mei Nagappan) (MMath) 2017-2019
Thesis: Parallel Paths Analysis Using Function Call Graphs
Last known position: Software Engineer at Sandvine, Waterloo, Canada.
Azin Nazari (co-supervised with Ian Munro) (MMath) 2017-2019
Thesis: Majority in the Three-Way Comparison Model

Teaching

CS 348: Introduction to Database Systems (Winter 2025, Winter 2022, Fall 2021)

An introduction to database management systems. The course covers topics in three areas: (1) fundamental features of relational database management systems: the relational data model and its query languages, integrity constraints, indexes and views, and transactions; (2) database design methodology; and (3) core topics about the internals and architectures of DBMSs, such as physical record design, query planning and optimization, indexes, and transaction protocols.

CS 848: Beyond Relational Systems (Fall 2024)

This seminar covers three important classes of database management systems that offer a different data model and/or different query and computational capabilities than relational systems. When possible, we focus on applications of these systems to AI.

CS 848: Knowledge Graphs (Fall 2022)

This seminar covers seminal work in the space of knowledge graph representation, querying, and management, as well as past and, primarily, modern applications that are powered by knowledge graphs. Topics include knowledge models, ontologies, query languages, graph data management systems, public knowledge graphs, knowledge graph embeddings, and popular, successful past and present applications.

CS 341: Algorithms (Summer 2016, Winter 2017, Winter 2018, Winter 2019, Spring 2020)

The study of efficient algorithms and effective algorithm design techniques. Topics include divide and conquer algorithms, recurrences, greedy algorithms, dynamic programming, graph search and backtracking, and the theory of NP-completeness and its implications.

CS 848: Graph Analytics and Data Management (Winter 2020)

The seminar is an updated version of a previous seminar that focused primarily on graph data management in Fall 2018. The current offering covers fewer database topics and instead surveys a wider range of topics in graph analytics. The goal of this offering is to showcase the specific interests of a very wide range of scientific communities that work on graphs, including communities within computer science, such as databases, semantic web, HPC and computer architecture, data mining, machine learning, and HCI, as well as communities outside of computer science, such as physics and neuroscience. The seminar's reading list is particularly tailored for students who are interested in doing graduate studies in graph processing and analytics, as it tries to cover some of the seminal work on graph processing from multiple communities.

CS 848: Graph Data Management (Fall 2018)

The seminar covered the historical waves that made graph data models popular, such as the web and semantic web. The seminar also covered recent topics popular in the database research community, e.g., modern graph databases, graph data processing software based on Hadoop and Spark-like software, knowledge graphs, and example machine learning applications on graphs.

CS 848/858: Modern Data Processing Systems (Fall 2016)

The seminar surveyed the models that underlie modern large-scale data processing systems, e.g., MapReduce, Spark, Pregel, Flink, Storm, Timely, and others. The goal was to identify the fundamental advantages and limitations of different models and demonstrate the systems and applications that are built on each model.