Graph Sparsification
Motivation
Oftentimes, the running time of a graph algorithm on a graph $G = (V, E)$ with $n = |V|$ vertices and $m = |E|$ edges depends on the number of edges $m$. If the graph is dense, i.e., $m = \Theta(n^2)$, such algorithms can be prohibitively slow. It is therefore natural to first sparsify the graph: replace $G$ by a graph with far fewer edges, and run the algorithm on the sparse graph instead.
When sparsifying a graph, we may lose some information about it, so we will settle for approximately preserving some properties of the graph. That is, we will settle for approximate answers.
Graph sparsification is used as a primitive in many graph algorithms, such as max-flow and sparsest cut, among others.
In this lecture, we will see how to sparsify a graph while approximately preserving the value of every cut in the graph.
But before we do the above, let us think about a warm-up problem: approximating the minimum cut of a graph. Here we will only care about (approximately) preserving the value of the minimum cut.
Warm-up: Approximating the Minimum Cut
Throughout this section, we will work with an undirected, unweighted graph $G = (V, E)$. We will denote $n = |V|$ and $m = |E|$.
Definition 1 (Cut): A cut of $G$ is a partition of the vertices into two non-empty sets $(S, V \setminus S)$. The value of the cut, denoted $\delta_G(S)$, is the number of edges with exactly one endpoint in $S$.
A minimum cut is a cut of minimum value.
One useful operation on graphs is the contraction of an edge $e = (u, v)$.
Definition 2 (Edge contraction): Let $e = (u, v) \in E$. The contraction of $e$, denoted $G/e$, is the graph obtained from $G$ by merging $u$ and $v$ into a single vertex, whose incident edges are all the edges of $G$ incident to $u$ or $v$, except for the edges between $u$ and $v$ themselves.
Note that the contraction of an edge may create parallel edges, so $G/e$ is in general a multigraph; we keep parallel edges but discard self-loops.
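To make the definition concrete, here is a minimal sketch of edge contraction on a multigraph stored as a list of edges. The representation and the function name `contract` are our own choices for illustration, not part of the lecture.

```python
def contract(edges, u, v):
    """Contract the edge (u, v): merge v into u, keeping parallel edges
    and dropping self-loops.

    `edges` is a list of undirected edges (pairs), so parallel edges are allowed.
    Returns the edge list of the contracted multigraph.
    """
    new_edges = []
    for (a, b) in edges:
        # Redirect every endpoint equal to v to the merged vertex u.
        a = u if a == v else a
        b = u if b == v else b
        if a != b:  # drop self-loops created by the contraction
            new_edges.append((a, b))
    return new_edges

# Contracting (1, 2) in a triangle leaves two parallel edges between 1 and 3.
print(contract([(1, 2), (2, 3), (1, 3)], 1, 2))  # [(1, 3), (1, 3)]
```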
The following lemma shows that the value of the minimum cut does not decrease when we contract an edge.
Lemma 1: Let $e \in E$. Then the value of the minimum cut of $G/e$ is at least the value of the minimum cut of $G$.
The proof is left as a practice problem.
Now, we will see how to use contractions to compute the minimum cut, with high probability.
Randomized Minimum Cut Algorithm
Input: undirected, unweighted graph $G = (V, E)$
Output: a minimum cut of $G$ (with some probability)
Algorithm:
- While $|V| > 2$:
  - Pick an edge $e \in E$ uniformly at random
  - Contract $e$, i.e., replace $G$ by $G/e$
- When $|V| = 2$, return the cut induced by the two vertices in $V$
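The following is a minimal, self-contained sketch of the algorithm in Python, including the independent repetitions discussed below. The edge-list representation and the function names are our own choices for illustration.

```python
import random
from math import ceil, comb, log

def karger_once(edges):
    """One run of the contraction algorithm on a connected multigraph.

    `edges` is a list of undirected edges (pairs of hashable vertices).
    Returns the value of the cut found (number of crossing edges).
    """
    edges = list(edges)
    vertices = {x for e in edges for x in e}
    while len(vertices) > 2:
        u, v = random.choice(edges)  # uniform over the remaining (multi)edges
        vertices.discard(v)
        # Merge v into u and drop self-loops, as in the contract() sketch above.
        edges = [(u if a == v else a, u if b == v else b)
                 for (a, b) in edges
                 if (u if a == v else a) != (u if b == v else b)]
    return len(edges)  # edges crossing between the two remaining supervertices

def karger_min_cut(edges):
    """Repeat karger_once about C(n,2) * ln(n) times; return the best cut value."""
    n = len({x for e in edges for x in e})
    runs = ceil(comb(n, 2) * log(n))
    return min(karger_once(edges) for _ in range(runs))

# Example: a 4-cycle has minimum cut value 2.
print(karger_min_cut([(0, 1), (1, 2), (2, 3), (3, 0)]))  # 2 (w.h.p.)
```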
An intuitive way to see why this algorithm works is that a uniformly random edge is much more likely to come from a large cut than from a minimum cut, so contractions tend to destroy large cuts while preserving small ones.
Let’s now put this intuition into a more formal proof.
Theorem 1 (Karger): The above algorithm outputs a fixed minimum cut with probability at least $\frac{2}{n(n-1)} = \binom{n}{2}^{-1}$.
Proof: Let $(S, V \setminus S)$ be a minimum cut of $G$, of value $k$, and let $E(S, V \setminus S)$ be the set of edges crossing it. The algorithm outputs this cut exactly when it never contracts an edge of $E(S, V \setminus S)$.
Let us compute the probability that an edge from $E(S, V \setminus S)$ is never contracted. Consider the $i$-th iteration, in which $n - i + 1$ vertices remain:
- Each vertex induces a cut (that vertex versus the rest), so each vertex has degree at least $k$. Hence, we know that at least $\frac{k(n-i+1)}{2}$ edges remain.
- The probability that we contract an edge from $E(S, V \setminus S)$ in this iteration is therefore at most $\frac{k}{k(n-i+1)/2} = \frac{2}{n-i+1}$.
- Hence, the probability that we never contract an edge from $E(S, V \setminus S)$ is at least
$$\prod_{i=1}^{n-2} \left(1 - \frac{2}{n-i+1}\right) = \prod_{j=3}^{n} \frac{j-2}{j} = \frac{2}{n(n-1)}.$$
Hmmmm, the above probability is not that great.
However, to improve the probability, we can run the algorithm multiple times independently, and output the minimum cut over all runs.
If we repeat the algorithm $t = \binom{n}{2} \ln n$ times, the probability that no run outputs the minimum cut is at most $\left(1 - \binom{n}{2}^{-1}\right)^{t} \leq e^{-\ln n} = \frac{1}{n}$.
Running time: Each run of the algorithm performs $n - 2$ contractions, and each contraction can be implemented in $O(n)$ time, so a single run takes $O(n^2)$ time. With $O(n^2 \log n)$ repetitions, the total running time is $O(n^4 \log n)$.
That doesn’t look great, but we can do better; you will see how in the homework!
A really neat combinatorial conclusion of the above algorithm is the following structural result:
Corollary 1 (Karger): Any graph on $n$ vertices has at most $\binom{n}{2}$ minimum cuts.
Proof: Each minimum cut is output by a single run of the algorithm with probability at least $\binom{n}{2}^{-1}$, and the events of outputting distinct cuts are disjoint. Since these probabilities sum to at most $1$, there can be at most $\binom{n}{2}$ minimum cuts.
By generalizing the argument above, we can bound the number of small cuts in $G$:
Lemma 2: If the minimum cut of $G$ has value $k$, then for every $\alpha \geq 1$, the number of cuts of value at most $\alpha k$ is at most $n^{2\alpha}$.
Practice problem: Prove the above lemma.
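To get a feel for Lemma 2 (not a proof!), here is a small brute-force check, entirely our own illustration: it enumerates all cuts of a small graph and compares the number of cuts of value at most $\alpha k$ against the $n^{2\alpha}$ bound.

```python
from itertools import combinations

def cut_value(edges, S):
    """Number of edges with exactly one endpoint in S."""
    return sum((a in S) != (b in S) for (a, b) in edges)

def count_small_cuts(vertices, edges, alpha):
    """Count cuts (S, V \\ S) of value at most alpha * k by brute force."""
    vertices = list(vertices)
    values = []
    # Enumerate each cut once: consider only subsets S containing vertices[0].
    rest = vertices[1:]
    for r in range(len(rest) + 1):
        for T in combinations(rest, r):
            S = {vertices[0], *T}
            if len(S) < len(vertices):  # S must be a proper subset
                values.append(cut_value(edges, S))
    k = min(values)  # minimum cut value
    return sum(v <= alpha * k for v in values), k

# Example: the 5-cycle has k = 2; compare the count against n^(2*alpha).
edges = [(i, (i + 1) % 5) for i in range(5)]
for alpha in (1, 1.5, 2):
    count, k = count_small_cuts(range(5), edges, alpha)
    print(alpha, count, 5 ** (2 * alpha))
```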
Graph Sparsification
We will be interested in the following problem: given a graph $G = (V, E)$ and an accuracy parameter $\varepsilon \in (0, 1)$, construct a weighted graph $H = (V, E', w)$ with $|E'| \ll |E|$ such that the value of every cut in $H$ is within a factor of $(1 \pm \varepsilon)$ of the value of the corresponding cut in $G$.
Note that $H$ must be allowed to carry edge weights: an unweighted graph with few edges cannot match the large cut values of a dense graph.
For this lecture, we will assume that $G$ is unweighted and that its minimum cut value $k$ is large, say $k = \Omega\!\left(\frac{\log n}{\varepsilon^2}\right)$ (the exact requirement will emerge in the proof).
However, the results we will see in this lecture can be extended to weighted graphs, and to graphs with small minimum cuts. (See references at the end of the lecture slides)
Randomized Sparsification Algorithm
Input: undirected, unweighted graph $G = (V, E)$
Output: a sparse weighted graph $H = (V, E', w)$
Algorithm:
- Let $p \in (0, 1]$ be a sampling probability (to be determined later)
- For each edge $e \in E$ independently, include $e$ in $E'$ with probability $p$, and if included, set $w(e) = \frac{1}{p}$.
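Here is a minimal sketch of the sampling step, together with a quick sanity check that the sampled graph gives an unbiased estimate of a cut's value. The function names and the parameters in the example ($n = 200$, $p = 0.1$) are our own arbitrary choices for illustration.

```python
import random

def sparsify(edges, p, seed=None):
    """Uniformly sample each edge with probability p; kept edges get weight 1/p.

    Returns a list of (edge, weight) pairs; every cut value in the result is
    an unbiased estimator of the corresponding cut value in the input graph.
    """
    rng = random.Random(seed)
    return [((a, b), 1.0 / p) for (a, b) in edges if rng.random() < p]

def weighted_cut_value(weighted_edges, S):
    """Weighted value of the cut (S, V \\ S) in the sampled graph H."""
    return sum(w for ((a, b), w) in weighted_edges if (a in S) != (b in S))

# Sanity check: on a dense random graph, the sampled (weighted) cut value
# should concentrate around the true cut value.
n, p = 200, 0.1
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if random.random() < 0.5]
S = set(range(n // 2))
true_value = sum((a in S) != (b in S) for (a, b) in edges)
H = sparsify(edges, p)
print(true_value, weighted_cut_value(H, S))  # the two numbers should be close
```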
Main idea:
- We need to set $p$ so that both the number of edges in $H$ and the value of each cut in $H$ have the correct expected values.
- After that, we need to prove concentration bounds for the values of all cuts in $H$, simultaneously!
- We will do this by using Chernoff-Hoeffding bounds.
- Then we will show that there are not too many small cuts in $G$, and thus the probability that some cut in $H$ deviates badly is small.
- We will then use the union bound to prove that all cuts are concentrated simultaneously.
We will prove the following theorem:
Theorem 2 (Karger): Let $G = (V, E)$ be a graph whose minimum cut has value $k$, let $\varepsilon \in (0, 1)$, and set $p = \frac{c \log n}{\varepsilon^2 k}$ for a sufficiently large constant $c$. Then, with high probability, $H$ has $O(pm) = O\!\left(\frac{m \log n}{\varepsilon^2 k}\right)$ edges, and every cut in $H$ has value within a factor of $(1 \pm \varepsilon)$ of the value of the corresponding cut in $G$.
Proof: Let $(S, V \setminus S)$ be a cut of $G$ of value $\delta_G(S)$, and let $\delta_H(S)$ denote the weighted value of the corresponding cut in $H$.
Then, each crossing edge appears in $H$ with probability $p$ and, when it does, contributes weight $\frac{1}{p}$ to $\delta_H(S)$.
Hence, the expected values are $\mathbb{E}[\,|E'|\,] = pm$ and $\mathbb{E}[\delta_H(S)] = \delta_G(S)$, exactly as desired.
Now, let us compute the concentration bounds for $|E'|$ and $\delta_H(S)$:
- For $|E'|$, we have a sum of $m$ independent indicator variables with mean $pm$, so by Chernoff bounds $|E'| = O(pm)$ with high probability.
- For $\delta_H(S)$, note that $p \cdot \delta_H(S)$ is a sum of $\delta_G(S)$ independent random variables with values in $\{0, 1\}$. Hence, we can use Chernoff bounds to get
$$\Pr\left[\,|\delta_H(S) - \delta_G(S)| > \varepsilon\, \delta_G(S)\,\right] \leq 2 \exp\left(-\frac{\varepsilon^2 p\, \delta_G(S)}{3}\right).$$
Note that $\delta_G(S) \geq k$ for every cut, so with $p = \frac{c \log n}{\varepsilon^2 k}$ the bound above is at most $2 \exp\left(-\frac{c \log n}{3} \cdot \frac{\delta_G(S)}{k}\right) = 2\, n^{-\frac{c}{3} \cdot \frac{\delta_G(S)}{k}}$.
Remark: the probability that a large cut is violated is very small (since the bound decays exponentially in $\delta_G(S)/k$). This is crucial, since by Lemma 2 the number of cuts of value around $\alpha k$ can grow like $n^{2\alpha}$.
Let us work out the union bound. For $\alpha \geq 1$, call a cut an $\alpha$-cut if its value lies in $[\alpha k, 2\alpha k)$. By Lemma 2, there are at most $n^{4\alpha}$ such cuts, and each deviates by more than $\varepsilon$ with probability at most $2\, n^{-\frac{c\alpha}{3}}$. Summing over $\alpha = 1, 2, 4, 8, \ldots$, the probability that any cut deviates is at most $\sum_{\alpha} n^{4\alpha} \cdot 2\, n^{-\frac{c\alpha}{3}} \leq \sum_{\alpha} 2\, n^{-\alpha} \leq \frac{4}{n}$, where the first inequality holds once $c \geq 15$, say.
This completes the proof.
Where did we use the assumption that the minimum cut in $G$ is large? In the choice $p = \frac{c \log n}{\varepsilon^2 k}$: if $k$ is small, then $p$ is close to $1$, and $H$ is not actually sparse.
How do we remove the assumption that we have a large minimum cut? Benczúr and Karger showed that removing the assumption requires non-uniform sampling of edges. If we choose this non-uniform sampling carefully, then we can get a sparse graph which approximates all cuts with high probability!
If you are interested in seeing how they do it, see references in the lecture slides.