semih.png

I am an assistant professor at the University of Waterloo's Cheriton School of Computer Science. I am a member of the Data Systems Research Group.


Research Interests

I do both systems and theoretical research in data management and processing. My current systems research focuses on Graphflow, a new graph database we are building from scratch. We study the fundamental components of graph databases, such as the query optimizer, storage layer, and transaction manager, and build each one ourselves.

My theoretical research focuses on understanding the computational complexity of distributed algorithms that evaluate database queries. Many modern distributed systems are based on the bulk synchronous parallel (BSP) model of computation. Parallel algorithms running on these systems use three main resources: (1) number of rounds, i.e., synchronizations; (2) communication; and (3) memory, i.e., space. My work focuses on understanding the relationship between these three resources and designing algorithms (specifically query processing algorithms) that are optimal in terms of one or more of them.


Positions Available

I have one 2-year postdoc position available to work broadly on the internals of graph databases. If you are interested, please email me with your CV and a description of work you have done on relational or graph database systems.

I have one co-op position available for a UWaterloo undergrad in the Winter 2019 term to work on a project on our prototype graph database Graphflow. If you are interested, please contact me with your CV and transcript.


Publications [Google Scholar] [DBLP]




2018

Spectral Measures of Distortion for Change Detection in Dynamic Graphs
Luca Castelli Aleardi, Semih Salihoglu, Gurprit Singh, Maks Ovsjanikov
Complex Networks, December, 2018

Algorithmic Aspects of Parallel Data Processing (Sigmod 2018 Tutorial Slides)
Paris Koutris, Semih Salihoglu, Dan Suciu
Foundations and Trends in Databases, Volume 8, Issue 4, February 2018

Distributed Evaluation of Subgraph Queries Using Worst-case Optimal Low-Memory Dataflows
Khaled Ammar, Frank McSherry, Semih Salihoglu
Proc. International Conference on Very Large Data Bases (VLDB), Rio de Janeiro, August 2018

The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing
Siddhartha Sahu, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin, and M. Tamer Özsu
Proc. International Conference on Very Large Data Bases (VLDB), Rio de Janeiro, August 2018
(Best Paper Award)

Workload-Aware CPU Performance Scaling for Transactional Database Systems
Mustafa Korkmaz, Martin Karsten, Kenneth Salem, and Semih Salihoglu
Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD), June 2018

2017

Combining Vertex-centric Graph Processing with SPARQL for Large-scale RDF Data Analytics
Ibrahim Abdelaziz, Mohammad Razen Al-Harbi, Semih Salihoglu, and Panos Kalnis
IEEE Transactions on Parallel and Distributed Systems, June 2017

Graphflow: An Active Graph Database
Chathura Kankanamge, Siddhartha Sahu, Amine Mhedhbi, Jeremy Chen, and Semih Salihoglu
Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD) (Demonstration Track), May 2017

GYM: A Multiround Join Algorithm In MapReduce
Foto Afrati, Manas Joglekar, Chris Re, Semih Salihoglu, and Jeffrey D. Ullman
Proc. International Conference on Database Theory (ICDT), Venice, Italy, March 2017

2015

SPARTex: A Vertex-Centric Framework for RDF Data Analytics
Ibrahim Abdelaziz, Razen Harbi, Semih Salihoglu, Panos Kalnis and Nikos Mamoulis
Proc. International Conference on Very Large Data Bases (VLDB) (Demonstration Track), Hawaii, USA, September 2015

Graft: A Debugging Tool For Apache Giraph
Semih Salihoglu, Jaeho Shin, Vikesh Khanna, Ba Quan Truong and Jennifer Widom
Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD) (Demonstration Track), June 2015

2014

Optimizing Graph Algorithms on Pregel-like Systems
Semih Salihoglu and Jennifer Widom
Proc. International Conference on Very Large Data Bases (VLDB), Hangzhou, China, September 2014

Anchor Points Algorithms for Hamming and Edit Distance
Foto Afrati, Anish Das Sarma, Anand Rajaraman, Pokey Rule, Semih Salihoglu, and Jeffrey D. Ullman
Proc. International Conference on Database Theory (ICDT), Athens, Greece, March 2014

HelP: High-level Primitives for Large-Scale Graph Processing
Semih Salihoglu and Jennifer Widom
Graph Data-management Experiences and Systems Workshop (GRADES), Snowbird, Utah, June 2014

Simplifying Scalable Graph Processing with a Domain-Specific Language
Sungpack Hong, Semih Salihoglu, Jennifer Widom, and Kunle Olukotun
Proc. International Symposium on Code Generation and Optimization (CGO), Orlando, FL, February 2014

2013

Upper and Lower Bounds on the Cost of a MapReduce Computation
Foto Afrati, Anish Das Sarma, Semih Salihoglu, and Jeffrey D. Ullman
Proc. International Conference on Very Large Data Bases (VLDB), Trento, Italy, August 2013

GPS: A Graph Processing System
Semih Salihoglu and Jennifer Widom
Proc. International Conference on Scientific and Statistical Database Management (SSDBM), July 2013
(Best Paper Runner-Up)

2012

Provenance-Based Debugging and Drill-Down in Data-Oriented Workflows
Robert Ikeda, Junsang Cho, Charlie Fang, Semih Salihoglu, Satoshi Torikai, and Jennifer Widom
Proc. International Conference on Data Engineering (ICDE), Washington, DC, April 2012 (Demonstration Paper)

2011

Provenance-Based Refresh in Data-Oriented Workflows
Robert Ikeda, Semih Salihoglu, and Jennifer Widom
Proc. Conference on Information and Knowledge Management (CIKM), Glasgow, Scotland, October 2011


Current Projects



Graphflow is a prototype active graph database. Graphflow evaluates general one-time and continuous subgraph queries and supports the property graph model. The database is implemented in Java and provides a Cypher-like interface, which extends the openCypher query language with subgraph-condition-action triggers. At the core of Graphflow’s query processor are two worst-case optimal join algorithms called Generic Join and our new Delta Generic Join algorithm for one-time and continuous subgraph queries, respectively.

Previous Projects



GPS, A Graph Processing System, is an open-source system for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. GPS is similar to Google’s proprietary Pregel system and to Apache Giraph. It is a distributed system designed to run on a cluster of machines, such as Amazon's EC2.


Students


Current Ph.D. Students:

Current M.Math. Students:

Alumni:


Teaching


CS 848: Graph Data Management (Fall 2018)

The seminar covered the historical waves that made graph data models popular, such as the web and the semantic web. It also covered recent topics popular in the database research community, e.g., modern graph databases, graph data processing software based on Hadoop- and Spark-like systems, knowledge graphs, and example machine learning applications on graphs.


CS 341: Algorithms (Summer 2016, Winter 2017, Winter 2018)

The study of efficient algorithms and effective algorithm design techniques. Topics include divide and conquer algorithms, recurrences, greedy algorithms, dynamic programming, graph search and backtrack, problems without algorithms, NP-completeness and its implications.


CS 848/858: Modern Data Processing Systems (Fall 2016)

The seminar surveyed the models that underlie modern large-scale data processing systems, e.g., MapReduce, Spark, Pregel, Flink, Storm, Timely, and others. The goal was to identify the fundamental advantages and limitations of different models and to demonstrate the systems and applications that are built on each model.