PhD Defence • Machine Learning • Perspectives of Graph Diffusion: Computation, Local Partitioning, Statistical Recovery, and Applications

Friday, January 10, 2025 12:00 pm - 3:00 pm EST (GMT -05:00)

Please note: This PhD defence will take place online.

Shenghao Yang, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Kimon Fountoulakis

Graph diffusion is a generic process where mass spreads from some nodes to nearby nodes along the edges in the graph. This includes a number of stochastic or deterministic processes such as random walk, heat diffusion, and electrical flow on graphs. It is an important computational primitive and has been a building block in a wide range of applications covering both theory and practice.

This thesis introduces several new perspectives of graph diffusion algorithms. First, we introduce a new class of diffusion methods that are provably better at capturing the local clustering structure of a graph. In particular, we show that the new diffusion algorithms lead to improved approximation guarantees for the local graph partitioning problem. In addition, we generalize the methods to hypergraphs, and consequently, obtain the first local hypergraph partitioning algorithm for a fairly general class of hypergraphs which received a lot of interests recently.  Second, we introduce a weighted diffusion mechanism to allow any existing graph diffusion method to work with attributed graphs. The mechanism weighs the edges of the graph based on additional node information, and thus allows to take into account node attributes by employing diffusion in the weighted graph. We provide statistical analyses and show that the weighted diffusion mechanism helps diffusion algorithm recover meaningful clusters from random graph models. Lastly, we provide a comprehensive set of experiments. For semi-supervised community detection, node ranking and classification problems, we show that the new diffusion methods lead to significantly better result compared with existing algorithms, without even requiring to process the entire graph dataset.


Attend this PhD defence on Zoom