# Graph Sparsification

## Motivation

Many algorithms on a graph $G(V, E)$ have running times that depend on the number of edges $|E|$. For example, Dijkstra’s algorithm runs in $O(|E| + |V| \log |V|)$ time (with a Fibonacci heap), and the Ford-Fulkerson algorithm runs in $O(|E| \cdot f^*)$ time on graphs with integer capacities, where $f^*$ is the value of the maximum flow.

If the graph is *dense*, i.e. $|E| = \omega(|V|^{1 + \gamma})$ for some $\gamma \in (0,1)$, then these running times may be too slow for practical purposes.
Thus, we would like to *sparsify* the graph, i.e. reduce the number of edges, while preserving some properties of the graph.
By a sparse graph, we mean a graph with $|E| = O(|V| \cdot \mathsf{poly} \log |V|)$.

When sparsifying a graph we may lose some information about it, so we will settle for *approximately* preserving some of its properties, i.e. for approximate answers.

Graph sparsification is used as a primitive in many graph algorithms, such as algorithms for max-flow and sparsest cut.

In this lecture, we will see how to sparsify a graph while approximately preserving the value of *every* cut in the graph.

Before doing so, let us warm up with a simpler problem: approximating the minimum cut of a graph. Here we only care about (approximately) preserving the value of the minimum cut.

## Warm-up: Approximating the Minimum Cut

Throughout this section, we will work with $G(V, E, w)$, which is an undirected graph with non-negative edge weights $w_e \geq 0$. Whenever we work with unweighted graphs (i.e., $w_e = 1$ for all $e \in E$), we will omit the weights from the notation.

We will denote $n := |V|$ and $m := |E|$.

**Definition 1 (Cut):** A *cut* $(S, \bar{S})$ is a partition of the vertices $V$ into two sets $S$ and $\bar{S}$.
That is, $V = S \cup \bar{S}$ and $S \cap \bar{S} = \emptyset$.
The *value* of a cut is the sum of the weights of the edges that cross the cut, i.e. edges which have one endpoint in $S$ and the other endpoint in $\bar{S}$.
If we denote the edges that cross the cut by $E(S, \bar{S})$, then the value of the cut is
$$w(S, \bar{S}) = \sum_{e \in E(S, \bar{S})} w_e$$

A *minimum cut* is a cut of minimum value.
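To make Definition 1 concrete, here is a minimal Python sketch. It assumes the graph is stored as a list of `(u, v, w)` triples (our own choice of representation), and the helper names `cut_value` and `brute_force_min_cut` are hypothetical. The brute-force search tries all $2^{n-1} - 1$ cuts, so it is only meant for tiny examples.

```python
from itertools import combinations

def cut_value(edges, S):
    """w(S, S-bar): total weight of edges with exactly one endpoint in S."""
    return sum(w for u, v, w in edges if (u in S) != (v in S))

def brute_force_min_cut(vertices, edges):
    """Return (value, S) for a minimum cut by trying every cut (exponential time)."""
    vertices = list(vertices)
    first, rest = vertices[0], vertices[1:]
    best = None
    # Keeping `first` in S lists each cut (S, S-bar) exactly once;
    # r < len(rest) keeps S-bar nonempty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            S = {first, *extra}
            val = cut_value(edges, S)
            if best is None or val < best[0]:
                best = (val, S)
    return best

path = [("a", "b", 3), ("b", "c", 1)]
print(brute_force_min_cut(["a", "b", "c"], path))
```

On the weighted path above, the sketch should report value 1 with $S = \{a, b\}$, i.e. the cheaper of the two edges is cut.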

One useful operation on graphs is the *contraction* of an edge $e = \{u, v\}$, which we now define.

**Definition 2 (Edge contraction):** Let $e = \{u, v\}$ be an edge in $G(V, E, w)$.
The *contraction* of $e$ is a new graph $H((V \setminus \{u,v\}) \cup \{z\}, F, w')$, obtained by replacing $u$ and $v$ with a new vertex $z$: every edge $\{u,x\}$ with $x \neq v$ is replaced by $\{z,x\}$ with $w'(z,x) = w(u,x)$, every edge $\{v,x\}$ with $x \neq u$ is replaced by $\{z,x\}$ with $w'(z,x) = w(v,x)$, and $e$ itself (together with any edges parallel to it) is removed.

Note that the contraction of an edge $e$ yields a new graph with *one fewer* vertex than the original graph.
Also note that the contracted graph $H$ is not necessarily a simple graph, i.e. it may have parallel edges (for weighted graphs, parallel edges can be combined into a single edge whose weight is the sum of their weights).
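A sketch of Definition 2 for the unweighted case, assuming an unweighted multigraph stored as a list of `(a, b)` pairs in which parallel edges simply appear more than once. The function name `contract` and the label `"z"` are illustrative only.

```python
def contract(edges, u, v, z):
    """Contract the edge {u, v} of a multigraph given as a list of (a, b) pairs.

    u and v are replaced by the new vertex z. Every copy of {u, v} becomes a
    self-loop and is dropped; all other edges at u or v are redirected to z,
    and any parallel edges this creates are kept.
    """
    def relabel(x):
        return z if x in (u, v) else x

    result = []
    for a, b in edges:
        a, b = relabel(a), relabel(b)
        if a != b:  # drop self-loops, in particular all former copies of {u, v}
            result.append((a, b))
    return result

print(contract([(1, 2), (1, 2), (2, 3), (1, 3)], 1, 2, "z"))  # [('z', 3), ('z', 3)]
```

The example shows both effects noted above: the graph loses one vertex, and the two edges to vertex 3 become parallel edges.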

The following lemma shows that the value of the minimum cut does not decrease when we contract an edge.

**Lemma 1:** Let $e = \{u, v\}$ be an edge in $G(V, E, w)$, and let $H((V \setminus \{u,v\}) \cup \{z\}, F, w')$ be the graph obtained by contracting $e$.
Then, the value of the minimum cut in $H$ is at least the value of the minimum cut in $G$.

The proof is left as a practice problem.

Now, we will see how to use contractions to compute a minimum cut with a success probability that can be boosted by repetition.

### Randomized Minimum Cut Algorithm

**Input:** undirected, unweighted graph $G(V, E)$

**Output:** a minimum cut $(S, \bar{S})$ of $G$

**Algorithm:**

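- While the current graph has more than 2 vertices:
  - Pick an edge $e = \{u, v\}$ uniformly at random from the current edge set
  - Contract $e$
- When 2 vertices remain, each of them corresponds to the set of original vertices that were merged into it; return the cut $(S, \bar{S})$ given by these two sets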
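The contraction algorithm can be sketched in Python as follows. This is a minimal sketch assuming a connected, loopless multigraph given as a list of `(u, v)` pairs; the function names `karger_run` and `karger_min_cut` are hypothetical. Picking `rng.choice(edges)` from the current edge list counts parallel edges with multiplicity, which matches "uniformly at random" in the contracted multigraph, and the merged vertex reuses the label `u` in place of $z$. Rebuilding the edge list after every contraction costs $O(m)$ per contraction, so this version favors readability over speed. `karger_min_cut` repeats the run and keeps the smallest cut found.

```python
import random

def karger_run(vertices, edges, rng):
    """One run of the contraction algorithm on a connected, loopless multigraph.

    Returns (value, S): the number of edges crossing the final cut and the set
    S of original vertices merged into one of the two remaining super-vertices.
    """
    # Each current super-vertex label maps to the original vertices it contains.
    groups = {x: {x} for x in vertices}
    edges = list(edges)
    while len(groups) > 2:
        # Uniform over current edges; parallel edges count with multiplicity.
        u, v = rng.choice(edges)
        # Merge v into u (u plays the role of the new vertex z).
        groups[u] |= groups.pop(v)
        # Redirect v's edges to u, then drop the self-loops this creates.
        edges = [(u if a == v else a, u if b == v else b) for a, b in edges]
        edges = [(a, b) for a, b in edges if a != b]
    S = next(iter(groups.values()))
    return len(edges), S

def karger_min_cut(vertices, edges, trials, seed=0):
    """Best cut over `trials` runs (smallest value wins)."""
    rng = random.Random(seed)
    vertices = list(vertices)
    runs = (karger_run(vertices, edges, rng) for _ in range(trials))
    return min(runs, key=lambda result: result[0])

two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
value, S = karger_min_cut(range(6), two_triangles, trials=2 * 6 * 5)
print(value, sorted(S))
```

For two triangles joined by a single bridge edge, the only cut of value 1 separates the two triangles, and with $2n(n-1) = 60$ runs the sketch should find it.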

Intuitively, the algorithm works because a small cut contains few edges, so a uniformly random edge is unlikely to belong to it. Most contractions therefore merge vertices lying on the same side of a minimum cut, which leaves that cut intact.

Let’s now put this intuition into a more formal proof.

**Theorem 1 (Karger):** The above algorithm outputs a minimum cut with probability at least $1 / \binom{n}{2}$.

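**Proof:** Let $(S, \bar{S})$ be a minimum cut in $G$, and let $c := w(S, \bar{S})$.
If the algorithm never contracts an edge from $E(S, \bar{S})$, then it succeeds: every contraction merges vertices on the same side of the cut, so the two final vertices correspond exactly to $S$ and $\bar{S}$, and the algorithm outputs $(S, \bar{S})$.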

Consider the $i$-th iteration, in which $n - i + 1$ vertices remain, and condition on no edge of $E(S, \bar{S})$ having been contracted so far (i.e., the cut is still alive). We bound the probability that an edge from $E(S, \bar{S})$ is contracted in this iteration.

- Each single vertex $v$ of the current graph defines a cut $(\{v\}, V \setminus \{v\})$, which is also a cut of $G$ (each side is a union of original vertices) with value equal to the degree of $v$. Hence each vertex has degree at least $c$, and at least $(n-i+1) \cdot c/2$ edges remain
- The probability that we contract an edge from $E(S, \bar{S})$ is $$ \dfrac{w(S, \bar{S})}{\text{number of remaining edges}} \leq \dfrac{c}{(n-i+1) \cdot c/2} = \dfrac{2}{n-i+1}$$
- Hence, the probability that we never contract an edge from $E(S, \bar{S})$ is at least $$\prod_{i=1}^{n-2} \left(1 - \dfrac{2}{n-i+1}\right) = \prod_{j=3}^{n} \left(1 - \dfrac{2}{j}\right) = \prod_{j=3}^{n} \dfrac{j-2}{j} = \dfrac{2}{n(n-1)} = \dfrac{1}{\binom{n}{2}}$$
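The telescoping product in the last step can be checked exactly with rational arithmetic. The small sketch below (the function name `survival_lower_bound` is hypothetical) multiplies the per-iteration survival factors $1 - 2/(n-i+1)$ for $i = 1, \ldots, n-2$ and compares the result with $1/\binom{n}{2}$.

```python
from fractions import Fraction
from math import comb

def survival_lower_bound(n):
    """Exact product over i = 1..n-2 of (1 - 2/(n - i + 1))."""
    prob = Fraction(1)
    for i in range(1, n - 1):
        prob *= 1 - Fraction(2, n - i + 1)
    return prob

for n in range(3, 12):
    assert survival_lower_bound(n) == Fraction(1, comb(n, 2))

print(survival_lower_bound(5))  # 1/10
```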

Hmm, the above probability is not that great. To improve it, we can run the algorithm multiple times independently and output the smallest cut found over all runs. If we repeat the algorithm $t$ times, then the failure probability is at most $\left(1 - \frac{2}{n(n-1)}\right)^t$. If we set $t = 2n(n-1)$ and use $1 - x \leq e^{-x}$, the failure probability is at most $$\left(1 - \dfrac{2}{n(n-1)}\right)^{2n(n-1)} = \left(\left(1 - \dfrac{2}{n(n-1)}\right)^{n(n-1)/2} \right)^4 \leq \dfrac{1}{e^4}$$
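A quick numerical sanity check of this boosting bound with $t = 2n(n-1)$ runs; the function name `failure_bound` is hypothetical.

```python
import math

def failure_bound(n, t):
    """(1 - 2/(n(n-1)))^t: chance that t independent runs all miss a fixed minimum cut."""
    return (1 - 2 / (n * (n - 1))) ** t

for n in (3, 10, 100, 1000):
    t = 2 * n * (n - 1)
    # Each value should stay below e^{-4}, roughly 0.0183.
    assert failure_bound(n, t) <= math.exp(-4)
    print(n, failure_bound(n, t))
```

The printed values approach $e^{-4}$ from below as $n$ grows, as the inequality $1 - x \leq e^{-x}$ predicts.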

**Running time:** Each run of the algorithm takes $O(m)$ time, and we run the algorithm $O(n^2)$ times, so the total running time is $O(n^2 m)$.

That doesn’t look great, but it can be improved. You will see how in the homework!

A really neat combinatorial consequence of the above analysis is the following structural result:

**Corollary 1 (Karger):** There are at most $\binom{n}{2}$ minimum cuts in a graph.

**Proof:** By Theorem 1, each fixed minimum cut is output with probability at least $1 / \binom{n}{2}$. Since the algorithm outputs exactly one cut, the events "the algorithm outputs this cut" are disjoint for distinct minimum cuts, so their probabilities sum to at most $1$. Hence there can be at most $\binom{n}{2}$ minimum cuts.
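Corollary 1 is tight. On the cycle $C_n$ the minimum cut has value 2, and every pair of cycle edges splits the cycle into two arcs, giving a distinct minimum cut, so there are exactly $\binom{n}{2}$ of them. A brute-force sketch confirming this for small $n$ (exponential time; the helper names are hypothetical):

```python
from itertools import combinations
from math import comb

def count_min_cuts(n, edges):
    """Count the minimum cuts of an unweighted graph on vertices 0..n-1.

    Each cut (S, S-bar) is enumerated once by requiring vertex 0 to lie in S.
    """
    values = []
    for r in range(n - 1):  # |S| = r + 1 ranges over 1..n-1
        for extra in combinations(range(1, n), r):
            S = {0, *extra}
            values.append(sum(1 for u, v in edges if (u in S) != (v in S)))
    c = min(values)
    return values.count(c)

def cycle(n):
    return [(i, (i + 1) % n) for i in range(n)]

for n in range(3, 9):
    assert count_min_cuts(n, cycle(n)) == comb(n, 2)
```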

By generalizing the argument above, we can bound the number of small cuts in $G$.

**Lemma 2:** If $c$ is the value of the minimum cut in $G$, then there are at most $n^{2 \alpha}$ cuts of value $k \leq \alpha \cdot c$ in $G$.

**Practice problem:** Prove the above lemma.
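Lemma 2 can be sanity-checked (not proved) by brute force on small graphs. The sketch below enumerates every cut of the 8-vertex cycle and of the complete graph $K_8$, and compares the number of cuts of value at most $\alpha c$ with $n^{2\alpha}$; the helper names are hypothetical.

```python
from itertools import combinations

def cut_values(n, edges):
    """Values of all cuts of an unweighted graph on 0..n-1, each cut listed once."""
    values = []
    for r in range(n - 1):
        for extra in combinations(range(1, n), r):
            S = {0, *extra}
            values.append(sum(1 for u, v in edges if (u in S) != (v in S)))
    return values

def small_cut_count(n, edges, alpha):
    """Number of cuts whose value is at most alpha times the minimum cut value."""
    values = cut_values(n, edges)
    c = min(values)
    return sum(1 for k in values if k <= alpha * c)

n = 8
cycle_edges = [(i, (i + 1) % n) for i in range(n)]
complete_edges = list(combinations(range(n), 2))
for edges in (cycle_edges, complete_edges):
    for alpha in (1, 1.5, 2, 3):
        assert small_cut_count(n, edges, alpha) <= n ** (2 * alpha)
```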

## Graph Sparsification

We will be interested in the following problem: given a graph $G(V, E, w)$, we want to find a sparse graph $H(V, F, w')$ such that for every cut $(S, \bar{S})$ in $G$, the value of the cut in $H$ is within a $(1 \pm \epsilon)$ factor of its value in $G$.

Since $H$ should be sparse, we need $|F| = O(n \cdot \mathsf{poly} \log n)$.

For this lecture, we will assume that $G$ is an undirected, unweighted graph, i.e. $w_e = 1$ for all $e \in E$. Moreover, we will assume that the minimum cut in $G$ is large-ish, namely that it has value $\Omega(\log n)$.

However, the results we will see in this lecture can be extended to weighted graphs, and to graphs with small minimum cuts. (See references at the end of the lecture slides)

### Randomized Sparsification Algorithm

**Input:** undirected, unweighted graph $G(V, E)$, and a parameter $\epsilon > 0$

**Output:** a sparse (weighted) graph $H(V, F, w)$, such that for *every cut* $(S, \bar{S})$, we have
$$(1 -\epsilon) \cdot w(S, \bar{S}) \leq w_H(S, \bar{S}) \leq (1 + \epsilon) \cdot w(S, \bar{S})$$

**Algorithm:**

- Let $p \in (0,1)$ be a parameter (to be determined later)
- For each edge $e \in E$, independently, include $e$ in $F$ with probability $p$, and if it is included, set $w_H(e) = 1/p$.
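The sampling step is short enough to sketch directly. This is a minimal sketch assuming the unweighted input graph is a list of `(u, v)` pairs; `sparsify` is a hypothetical name, and the optional `seed` exists only for reproducibility.

```python
import random

def sparsify(edges, p, seed=None):
    """Keep each edge independently with probability p, reweighted to 1/p.

    edges: list of (u, v) pairs of an unweighted graph.
    Returns a list of (u, v, weight) triples describing H.
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so each test succeeds with probability p.
    return [(u, v, 1 / p) for u, v in edges if rng.random() < p]

H = sparsify([(i, i + 1) for i in range(10000)], p=0.5, seed=1)
print(len(H))
```

With $p = 1/2$ on $10^4$ edges, the output should keep about half of the edges, each with weight 2.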

Main idea:

- Reweighting each sampled edge by $1/p$ gives every cut of $H$ the correct expected value, while the expected number of edges of $H$ is $p \cdot |E|$. So $p$ should be small enough that $H$ is sparse, yet large enough for the cut values to concentrate.
- After that, we need to prove concentration bounds for the values of *all* cuts in $H$, simultaneously! We will do this by using Chernoff-Hoeffding bounds.
- Then we will use the fact that there are not too many small cuts in $G$ (Lemma 2), and thus the probability that we have a bad cut in $H$ is small.
- We will then use the union bound to prove that all cuts are concentrated simultaneously.

We will prove the following theorem:

**Theorem 2 (Karger):** Let $c$ be the value of the minimum cut in $G$.
Set
$$p = \dfrac{15 \log n}{\epsilon^2 \cdot c}$$
Then, with probability $\geq 1 - 4/n$, the above algorithm outputs a graph $H(V, F, w_H)$ with $|F| = O(p \cdot |E|)$, such that for every cut $(S, \bar{S})$ in $G$:
$$(1 -\epsilon) \cdot w(S, \bar{S}) \leq w_H(S, \bar{S}) \leq (1 + \epsilon) \cdot w(S, \bar{S})$$

**Proof:** Let $H(V, F, w_H)$ be the graph output by the algorithm.
Take a cut $(S, \bar{S})$.
Denote $k := w(S, \bar{S})$.
Let $X_e$ be the indicator random variable for the event that $e \in F$.

Then, $w_H(S, \bar{S}) = \sum_{e \in E(S, \bar{S})} X_e / p$ and $|F| = \sum_{e \in E} X_e$.

Hence, the expected values are

- $\mathbb{E}[|F|] = \sum_{e \in E} \mathbb{E}[X_e] = p \cdot m$
- $\mathbb{E}[w_H(S, \bar{S})] = \sum_{e \in E(S, \bar{S})} \mathbb{E}[w_H(e)] = \sum_{e \in E(S, \bar{S})} \mathbb{E}[X_e/ p] = k$
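The unbiasedness computation above can be illustrated with a small Monte Carlo experiment. The sketch assumes $G = K_{10}$ and the cut $S = \{0, 1, 2\}$, so $k = 3 \cdot 7 = 21$. It samples only the edges crossing the cut, which does not change the distribution of $w_H(S, \bar{S})$ because edges are sampled independently. The helper names are hypothetical.

```python
import random
from itertools import combinations

def sampled_cut_value(edges, S, p, rng):
    """One sample of w_H(S, S-bar): each crossing edge kept w.p. p, weighted 1/p."""
    return sum(1 / p for u, v in edges
               if (u in S) != (v in S) and rng.random() < p)

def average_sampled_cut(edges, S, p, trials, seed=0):
    """Empirical mean of w_H(S, S-bar) over independent samples of H."""
    rng = random.Random(seed)
    return sum(sampled_cut_value(edges, S, p, rng) for _ in range(trials)) / trials

K10 = list(combinations(range(10), 2))
print(average_sampled_cut(K10, {0, 1, 2}, p=0.3, trials=20000))
```

The printed average should be close to $k = 21$: a single sample has variance $k(1-p)/p = 49$, so the mean of 20000 samples has standard deviation about $0.05$.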

Now, let us compute the concentration bounds for $|F|$ and $w_H(S, \bar{S})$.

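- For $|F|$, the Chernoff bound (for $0 < \epsilon \leq 1$) gives $$\Pr[|F| \geq (1 + \epsilon) \cdot p \cdot m] \leq e^{-\epsilon^2 \cdot \mathbb{E}[|F|] / 3} = e^{-\epsilon^2 \cdot p \cdot m / 3} \leq \dfrac{1}{n^2},$$ where the last step uses $m \geq nc/2$ (every vertex has degree at least $c$), so that $\epsilon^2 p m / 3 = 5 m \log n / c \geq (5n/2) \log n \geq 2 \log n$. Here and below, $\log$ denotes the natural logarithm.
- For $w_H(S, \bar{S})$, note that $p \cdot w_H(S, \bar{S}) = \sum_{e \in E(S, \bar{S})} X_e$ is a sum of $k$ independent random variables with values in $\{0, 1\}$ and mean $kp$. Hence, we can use Chernoff bounds to get $$\Pr[|w_H(S, \bar{S}) - k| \geq \epsilon \cdot k] \leq 2 \cdot e^{-\epsilon^2 \cdot k p/3} = 2 \cdot n^{- 5 k/c}$$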
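The last equality follows by plugging in $p = 15 \log n / (\epsilon^2 c)$, which turns the exponent into $\epsilon^2 k p / 3 = 5 k \log n / c$. A numerical check of this arithmetic, assuming natural logarithms and the illustrative values $n = 10^4$, $\epsilon = 1/2$, $c = 2000$ (chosen so that $p < 1$); the function names are hypothetical.

```python
import math

def theorem_p(n, eps, c):
    """Sampling probability p = 15 ln(n) / (eps^2 * c) from Theorem 2."""
    return 15 * math.log(n) / (eps ** 2 * c)

def chernoff_two_sided(k, p, eps):
    """2 exp(-eps^2 * mu / 3) for a sum of {0,1} variables with mean mu = k * p."""
    return 2 * math.exp(-eps ** 2 * k * p / 3)

n, eps, c = 10_000, 0.5, 2000
p = theorem_p(n, eps, c)
assert 0 < p < 1
for k in (c, 2 * c, 5 * c):
    # The Chernoff bound should coincide with 2 n^(-5k/c).
    assert math.isclose(chernoff_two_sided(k, p, eps), 2 * n ** (-5 * k / c))
```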

Note that $k \geq c$, as $c$ is the value of the minimum cut in $G$.
The above probability is the probability that *a single cut* deviates from its expectation.
How can we handle *all* (the exponentially many) cuts simultaneously?

**Remark:** the probability that a large cut deviates is very small, since the bound $2 \cdot n^{-5k/c}$ decays rapidly as $k$ grows, and by Lemma 2, there are not too many small cuts in $G$.
So we will be able to do a clever union bound to get the desired result.

Let us work out the union bound.

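Group the cuts by value: for $\alpha = 1, 2, 4, 8, \ldots$, consider the cuts with $\alpha c \leq k < 2 \alpha c$, where $k := w(S, \bar{S})$ is the value of the cut in $G$. By Lemma 2 (applied with $2\alpha$), there are at most $n^{4\alpha}$ such cuts, and since $k \geq \alpha c$, each of them deviates with probability at most $2 \cdot n^{-5\alpha}$. Therefore

$$\Pr[\text{some cut deviates from its expected value}] \leq \sum_{\text{cuts } (S, \bar{S})} \Pr\left[|w_H(S, \bar{S}) - k| \geq \epsilon \cdot k\right]$$ $$ = \sum_{\alpha = 1, 2, 4, 8, \ldots} \ \sum_{\substack{\text{cuts } (S, \bar{S}) \\ \alpha \cdot c \leq k < 2 \alpha \cdot c}} \Pr\left[|w_H(S, \bar{S}) - k| \geq \epsilon \cdot k\right] $$ $$ \leq \sum_{\alpha = 1, 2, 4, 8, \ldots} n^{4\alpha} \cdot 2 \cdot n^{- 5 \alpha} = 2 \sum_{\alpha = 1, 2, 4, 8, \ldots} n^{- \alpha} \leq 2 \sum_{j \geq 1} n^{-j} = \dfrac{2}{n-1} \leq \dfrac{3}{n},$$

where the last inequality holds for $n \geq 3$.

Combining this with $\Pr[|F| \geq (1 + \epsilon) \cdot p \cdot m] \leq 1/n^2$ via one more union bound, with probability at least $1 - 3/n - 1/n^2 \geq 1 - 4/n$ every cut is preserved and $|F| = O(p \cdot m)$. This completes the proof.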
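A numerical check of the final arithmetic. The function name `union_bound` is hypothetical, and the sum over $\alpha = 1, 2, 4, 8, \ldots$ is truncated after six terms since the remaining terms are negligible.

```python
def union_bound(n, terms=6):
    """Truncated sum of 2 * n^(-alpha) over alpha = 1, 2, 4, 8, ..."""
    return sum(2 * n ** -(2 ** j) for j in range(terms))

for n in (3, 10, 100, 1000):
    total = union_bound(n)
    # Dominated by the full geometric series 2/(n-1), which is at most 3/n for n >= 3.
    assert total <= 2 / (n - 1) <= 3 / n
    # Adding the 1/n^2 failure probability for |F| stays within 4/n.
    assert total + 1 / n ** 2 <= 4 / n
```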

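Where did we use the assumption that the minimum cut in $G$ is large (i.e., $c = \Omega(\log n)$)? It was used to guarantee $p \leq 1$, which requires $c \geq 15 \log n / \epsilon^2$; for constant $\epsilon$ this holds once $c$ is a large enough multiple of $\log n$. Without this assumption, $p = 15 \log n / (\epsilon^2 c)$ could exceed $1$, and the algorithm would not make sense.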
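To see how large the minimum cut must be, here is a small sketch computing the threshold $15 \log n / \epsilon^2$, assuming natural logarithms (function names hypothetical). For $n = 10^4$ and $\epsilon = 1/2$ it is a little above 552, so uniform sampling with these constants needs a fairly large minimum cut.

```python
import math

def theorem_p(n, eps, c):
    """p = 15 ln(n) / (eps^2 * c), as in Theorem 2."""
    return 15 * math.log(n) / (eps ** 2 * c)

def min_cut_threshold(n, eps):
    """Value of c at which theorem_p(n, eps, c) equals 1; larger c gives p < 1."""
    return 15 * math.log(n) / eps ** 2

t = min_cut_threshold(10_000, 0.5)
print(t, theorem_p(10_000, 0.5, math.ceil(t)))
```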

How do we remove the assumption that the minimum cut is large? Benczúr and Karger show that the assumption can be removed by sampling edges non-uniformly. If we choose this non-uniform sampling carefully, then we can get a sparse graph which approximates all cuts with high probability!

If you are interested in seeing how they do it, see references in the lecture slides.