Anil Pacaci, PhD candidate
David R. Cheriton School of Computer Science
Graph partitioning in streaming setting is the problem of splitting graph structured data over a cluster of machines in the context of distributed graph processing systems. Streaming algorithms for graph partitioning has recently gained attention due to its ability to scale very large graphs with limited resources. This study first characterizes streaming algorithms for graph partitioning based on the distribution model. We identify edge-cut and vertex-cut model and provide analysis of existing algorithms in literature.
The main objective is to understand how the choice of graph partitioning algorithm affects the system performance, resource usage and scalability. To this end, we provide systematic empirical evaluation of existing algorithms on large, real-world graphs using two classes of workloads: offline graph analytics and online graph queries. In our study, we argue that "no one size fits all" and choice of graph partitioning algorithm depends on: (i) type and degree distribution of the graph, (ii) characteristics of the workloads and (iii) requirements of the application.