Solutions to Homework 2
Problem 1
Part 1
Since we are inserting $n$ elements into a hash table, we can w.l.o.g. (to ease the notation) assume that the keys are $[n]$, and that we insert key $k$ at the $k^{th}$ step.
Let $h_1 : [n] \rightarrow [n^{3/2}]$ and $h_2 : [n] \rightarrow [n^{3/2}]$ be our random functions, where $h_1$ maps the keys to the first array, and $h_2$ maps the keys to the second array. For each pair of keys $i < j$, let $X_{ij}$ be the indicator random variable that keys $i$ and $j$ collide.
Now, we can compute the probability that the keys $i < k$ collide as follows: when we insert key $k$ into the hash table, there are $k-1$ keys already in the table, the key $i$ being one of them. We can assume w.l.o.g. that the key $i$ is in the first array. Then, the probability that key $k$ collides with key $i$ is the probability that $h_1(k) = h_1(i)$, and that $h_2(k)$ maps to an entry in the second hasht table which has more elements than the bucket where $i$ is. Since $\Pr[h_1(k) = h_1(i)] = 1/n^{3/2}$, as $h_1$ is a random function, we are left with computing the probability that $h_2(k)$ maps to a bucket with more elements than the bucket where $i$ is. We can upper bound the probability of this event by the probability that $h_2(k)$ maps to a bucket with at least $1$ element. Since there are $k-1$ elements in both arrays prior to inserting key $k$, the probability that $h_2(k)$ maps to a bucket with at least $1$ element is at most $(k-1)/n^{3/2}$.
Putting everything together, we get: $$ \Pr[X_{ik} = 1] \leq \frac{k-1}{n^{3/2}} \cdot \frac{1}{n^{3/2}} = \frac{k-1}{n^3}.$$
Now, we can bound the expected number of collisions as follows: $$ \mathbb{E}[X] = \sum_{i < j} \Pr[X_{ij} = 1] = \sum_{k=2}^n \sum_{i < k} \Pr[X_{ik} = 1] \leq \sum_{k=2}^n \frac{(k-1)^2}{n^3} = \dfrac{(n-1)n(2n-1)}{6n^3} \leq 1/3 $$
Part 2
Since in the above argument we only used the fact that the probability of a collision is at most $1/n^{3/2}$, the arugment above works for any random choices $h_1, h_2$ coming from a $2$-universal hash function family.
Part 3
Take a prime $p$ such that $n^{3/2} \leq p \leq 2n^{3/2}$. Let $\mathcal{H}$ be the family of hash functions $h_{a,b} : \mathbb{F}_p \rightarrow [n^{3/2}]$ defined as $h_{a,b}(x) = ((ax + b) \mod p) \mod n^{3/2}$, where $a,b \in \mathbb{F}_p$ are chosen uniformly at random, with $a \neq 0$. As we saw in class, $\mathcal{H}$ is $2$-universal.
Now, we choose $h_1, h_2 \in \mathcal{H}$ uniformly at random, and we use them to hash the keys into the two arrays. Since the expected number of collisions is $\leq 1/3$, by Markov’s inequality, the probability that the number of collisions is $\geq 1$ is $\leq 1/3$. Thus, with probability $\geq 2/3$, our choice of $h_1, h_2$ will result in a perfect hashing scheme.
Note that we can check whether the number of collisions is $\geq 1$ in $O(n)$ time, by simply iterating over the keys and checking whether any two of them collide.
If the procedure above fails, we can simply repeat it until it succeeds. The expected number of repetitions to succeed is $\leq 3/2$, so the expected running time of the algorithm is $O(n)$.
Problem 2
Part 1
Not $2$-universal. Take $p = n+2$ (for instance $n=3, p=5$). Consider the numbers $x = 1, y = n+1$. If we pick $a = 1, n+1$, then $h_a(x) = h_a(y)$, so we have $$ \Pr_{h \in \mathcal{H}}[h(x) = h(y)] \geq 2/(p-1) > 1/n.$$
Part 2
Not $2$-universal. Take $p = n+2$ (for instance $n=3, p=5$). Consider the numbers $x = 0, y = n$. If we pick $b = 0, 1$ then they will map to the same location. Hence $$ \Pr_{h \in \mathcal{H}}[h(x) = h(y)] \geq 2/p > 1/n.$$
Part 3
Not $2$-universal. Take $m = 2n$. Consider the numbers $x = 0, y = n$. Since $2n \equiv 0 \bmod m$, we know that if $a$ is even then any choice for $b$ will cause $x, y$ to collide. This gives us at least $m(m-2)/2$ pairs $(a,b)$ which cause $x,y$ to collide. Hence, $$ \Pr_{h \in \mathcal{H}}[h(x) = h(y)] \geq m(m-2)/2m(m-1) > 1/n.$$
Part 4
This function is $2$-universal (as noted in class). To see this, let $x \neq y \in \mathbb{F}_p$. Now note that because $a \neq 0$, we have that the pair $(ax + b \bmod p, ay + b \bmod p) =: (r, s)$ is such that $r \neq s$, for any choice of $a \neq 0, b \in \mathbb{F}_p$. Hence, we have that the two sets $\{ (ax + b \bmod p, ay + b \bmod p) \}_{a \neq 0, b \in \mathbb{F}_p}$ and ${(r, s)}_{r \neq s \in \mathbb{F}_p}$ are the same set (both with $p(p-1)$ elements).
Thus, the probability that $h_{a,b}(x) = h_{a,b}(y)$ is equal to the fraction of pairs $(r,s)$ above such that $r \neq s$ and $r \equiv s \bmod n$.
To compute this fraction, note that for each $r \in \mathbb{F}_p$, letting $\alpha_r := r \bmod n$, there are at most $\lfloor (p - \alpha_r)/n \rfloor$ choices for $s$ such that $r \equiv s \bmod n$ and $s \neq r$.
Thus, the fraction of pairs is upper bounded by:
$$\Pr_{h \in \mathcal{H}}[h(x) = h(y)] \leq \frac{p(\lfloor p/n \rfloor)}{p(p-1)} \leq \frac{p(p - 1)}{np(p-1)} = 1/n.$$
Bonus remark: To see that it is not $2$-strongly universal, note that in the above calculation we did not tightly bound the number of pairs $(r,s)$ such that $r \equiv s \bmod n$. For many values of $r$, we will have that $\lfloor (p - \alpha_r)/n \rfloor < (p - \alpha_r)/n < (p-1)/n$. Thus, the inequality above is actually a strict inequality.
Hence, we have: $$ 1 = \sum_{0 \leq \alpha, \beta < n} \Pr_{h \in \mathcal{H}}[h(x) = \alpha \wedge h(y) = \beta] = $$ $$ = \sum_{\alpha} \Pr_{h \in \mathcal{H}}[h(x) = \alpha \wedge h(y) = \alpha] + \sum_{\alpha \neq \beta} \Pr_{h \in \mathcal{H}}[h(x) = \alpha \wedge h(y) = \beta] <$$ $$ < 1/n + \sum_{\alpha \neq \beta} \Pr_{h \in \mathcal{H}}[h(x) = \alpha \wedge h(y) = \beta] \leq 1/n + n(n-1) \cdot \Pr_{h \in \mathcal{H}}[h(x) = \alpha^* \wedge h(y) = \beta^*]$$ where $\alpha^*, \beta^*$ are the values such that $\Pr_{h \in \mathcal{H}}[h(x) = \alpha^* \wedge h(y) = \beta^*]$ is maximized. Hence, we conclude that $$ \Pr_{h \in \mathcal{H}}[h(x) = \alpha^* \wedge h(y) = \beta^*] > \dfrac{1-1/n}{n(n-1)} = 1/n^2.$$
Problem 3
Part 1
Note that in a tree, any cut $(S, \bar{S})$ where the nodes in $S$ form a connected component (and thus $\bar{S}$ is also a connected component) is a minimum cut, as there can only be one edge between $S$ and $\bar{S}$. If that were not the case, and such a cut $(S, \bar{S})$ had 2 edges between $S$ and $\bar{S}$, then we would have a cycle in our graph, which is a contradiction. To see that we have a cycle, let ${u, v}$ and ${u’, v’}$ be two edges between $S$ and $\bar{S}$, where $u, u’ \in S$. Then, we have that the path $u \to v \to v’ \to u’ \to u$ is a cycle, as $u, v, u’, v’$ are all distinct and $u, u’$ and $v, v’$ are connected within $S$ and $\bar{S}$, respectively.
Since Karger’s algorithm only picks cuts $(S, \bar{S})$ where $S$ and $\bar{S}$ are connected components, it will always pick a minimum cut.
Part 2
For simplicity, let $n = 2t$ be the number of vertices in our graph. Let $G$ be the dumbbell graph with $t$ vertices on each side. More precisely, $G$ will have the two cliques among the vertices ${1, \ldots, t}$ and ${t+1, \ldots, 2t}$, and a single edge between $t$ and $t+1$.
Note that the only minimum cut of $G$ is the cut $({1, \ldots, t}, {t+1, \ldots, 2t})$, which is of weight $1$.
Now, we will prove that this modification of the algorithm will have exponentially small probability of finding the minimum cut, even if we only consider the first $n/4$ steps of this algorithm. To see this, if the algorithm has not yet failed after $k$ steps, then at the $k+1$-th step, one click of the graph has $n-j$ vertices and the other has $n-k+j$ vertices, for some $j \leq k$. Thus, the probability of not failing in this step is given by $$\dfrac{\binom{n-j}{2} + \binom{n-k+j}{2}}{\binom{2n-k}{2}} \leq \dfrac{\binom{n}{2} + \binom{n-k}{2}}{\binom{2n-k}{2}} \leq 2/3$$ where the first inequality is by convexity of the binomial coefficient function and the second inequality follows for any value of $k \leq n/4$.
Hence, the probability of success in the first $n/4$ steps is at most $(2/3)^{n/4}$.
Remark: there are many ways to argue that the probability of success is exponentially small, some of which are simpler than what we discussed above. For instance, depending on how small you take $k$ (say $k\leq n/4$ still), you can lower bound the denominator by $\binom{7n/4}{2} \sim 49n^2/32$ whereas the numerator is at most $n^2$. Another way to argue is to use symmetry of the process to show that the probability of ending up with the cut is the same as the probability to find any other cut where both sides have size $n/2$. Then you can check that this probability is exponentially small. But the latter solution, you would also need to argue that such probabilities are the same.
Part 3
Based on the above calculation, we would have to run the algorithm for at least $2 \cdot (3/2)^{n/4}$ steps to have a probability of success of at least $1/2$.
Part 4
A cycle of length $n$ is such that each choice of two endpoints yields a minimum cut (take the two intervals that are defined by these two endpoints). Thus, there are $\binom{n}{2}$ minimum cuts in a cycle of length $n$.
Problem 4
See PDF on piazza
Problem 5
Part 1
Let $S_1 := {a_1, \ldots, a_n}$ and $S_2 := {b_1, \ldots, b_n}$ be our multisets of integers. Let $P_1(x) = \prod_{i=1}^n (x - a_i)$ and $P_2(x) = \prod_{i=1}^n (x - b_i)$ be the polynomials with roots $S_1$ and $S_2$, respectively. Note that $S_1 = S_2$ if, and only if, $P_1(x) \equiv P_2(x)$.
Now, let $P(x) := P_1(x) - P_2(x)$. So we want to check if $P(x) \equiv 0$. Note that $P(x)$ is a polynomial with integer coefficients of degree $\leq n$. Hence, if we pick a prime $p$ such that $2^{w+2} < p < 2^{w+3}$, we know that the polynomial $P(x) \bmod p$ is well-defined, and has degree $\leq n$. Moreover, since the roots of $P_1, P_2$ are all smaller than $p$, we know that $P(x) \not\equiv 0 \Leftrightarrow P(x) \bmod p \not\equiv 0$, as $P(x) \bmod p \equiv 0 \Leftrightarrow P_1(x) \equiv P_2(x) \bmod p \Leftrightarrow P_1(x) \equiv P_2(x)$.
Now, we will use the algorithm from class, and pick a random value $\alpha \in \mathbb{F}_p$, and return $S_1 = S_2$ if $P(\alpha) \equiv 0 \bmod p$ and return $S_1 \neq S_2$ otherwise. Note that if $S_1 = S_2$, then the algorithm will always return $S_1 = S_2$. Now, if $S_1 \neq S_2$, then we have that $\Pr[P(\alpha) \equiv 0 \bmod p] \leq n/p \leq 2^{w}/2^{w+2} = 1/4$.
Now to make sure that our algorithm runs in $O(n)$ time, we simply compute the evaluation $P(\alpha) \bmod p$ by first evaluating $P_1(\alpha)$ and $P_2(\alpha)$ over $\mathbb{F}_p$ (which takes $O(n)$ time, since we have to do $n$ multiplications and $n$ additions over $\mathbb{F}_p$), and then subtracting the two values (which takes $O(1)$ time. Thus, our algorithm runs in $O(n)$ time.
Part 2
For this part, we can use the two-level hashing algorithm from class, which runs in expected running time $O(n)$, and constructs a perfect hash table for the set of distinct elements of $S_1 \cup S_2$. Once we have this perfect hash table, we iterate over the elements of $S_1$ and at each entry of the hash table we keep a counter of how many times we have seen that element while hashing $S_1$. Then, we iterate over the elements of $S_2$ and at each entry of the hash table we keep a counter of how many times we have seen that element while hashing $S_2$. Finally, we iterate over the elements of the hash table, and if the counters for $S_1$ and $S_2$ are different, then we return $S_1 \neq S_2$.
Since the hashing scheme is perfect, this algorithm will always return the correct answer. Moreover, since the hashing scheme is two-level, we know that the expected running time is $O(n)$, as each operation on an element of $S_1$ or $S_2$ will take $O(1)$ time.
Problem 6
Before we proceed to find a perfect matching in $G$, we will first amplify the success probability of our algorithm $\mathcal{A}$ which decides whether a graph has a perfect matching or not.
For that, we devise the following algorithm $\mathcal{B}$:
- Let $G$ be the input graph.
- Run $n$ independent iterations of $\mathcal{A}$ on $G$, and output the majority answer.
Let $X_i$ be the random variable that is $1$ if $\mathcal{A}$ outputs the wrong answer on the $i$-th iteration, and $0$ otherwise. Then, we have that $\Pr[X_i = 1] \leq 1/n$ and thus the expected number of wrong answers that algorithm $\mathcal{A}$ outputs is $\mathbb{E}[\sum_{i=1}^n X_i] \leq 1$.
Hence, by the Chernoff bound, we have that $$\Pr[\sum_{i=1}^n X_i \geq n/2] \leq e^{-n^2/6}$$ Thus, $\Pr[\mathcal{B} \text{ outputs the wrong answer}] \leq e^{-n^2/6}$.
We will use the following algorithm to find a perfect matching in $G$, if it has one.
- Let $2n$ be the number of vertices in $G$ and $\mathcal{M}$ be the set of edges in our matching. So we start with $\mathcal{M} = \emptyset$. Let $G_1 := G$ be our input graph.
- For $i = 1, \ldots, n$:
- Pick a vertex from $G_i$ uniformly at random, and call it $u_i$.
- pick an edge $e_i = {u_i, v_i}$ uniformly at random from $G_i$ that is incident to our chosen vertex $u_i$. Let $H_i$ be the graph we get after deleting $u_i$ and $v_i$ from $G_i$.
- If $\mathcal{B}(H_i) = 1$, then let $\mathcal{M} := \mathcal{M} \cup {e_i}$ and let $G_{i+1} := H_i$. Iterate in the for loop.
- Otherwise, delete edge $e_i$ from $G_i$. If $u_i$ still has edges incident to it, go back to step 1. Else, return that $G$ has no perfect matching.
- Pick a vertex from $G_i$ uniformly at random, and call it $u_i$.
Runtime: note that the algorithm $\mathcal{B}$ runs in $O(n \cdot T)$ time, as it invokes $\mathcal{A}$ $n$ times, and each invocation of $\mathcal{A}$ takes $O(T)$ time. Now, our final algorithm will invoke $\mathcal{B}$ at most $m$ times, where $m$ is the number of edges in $G$. This is because we invoke $\mathcal{B}$ at most once per edge in $G$, and each invocation of $\mathcal{B}$ will delete at least one edge from $G$. Thus, our algorithm runs in $O(m \cdot n \cdot T)$ time.
Correctness: we will first prove that if $\mathcal{B}$ always outputs the correct answer, then our algorithm will always output the correct answer. Then we will prove that $\mathcal{B}$ always outputs the correct answer in $m$ iterations with high probability.
Note that if $\mathcal{B}$ always outputs the correct answer, then our algorithm will always output the correct answer (i.e., either find a matching or show that none exist). This is because if $G$ has a perfect matching, then for an edge ${u,v}$ if we run $\mathcal{B}$ on $G - {u,v}$, then $\mathcal{B}$ tells us whether ${u,v}$ is an edge in a perfect matching or not. If ${u,v}$ is an edge in a perfect matching, then we add it to our matching and delete it from $G$. By repeating this process, we will eventually find a perfect matching in $G$, since the new graph certainly has a perfect matching.
Now, in case $G$ has no perfect matching, then for any edge we pick, $\mathcal{B}$ will tell us that $G - {u,v}$ has no perfect matching, and thus our algorithm will return that $G$ has no perfect matching.
Thus, we have proven that if $\mathcal{B}$ always outputs the correct answer, then our algorithm will always output the correct answer. What is the probability that $\mathcal{B}$ outputs the correct answer? By our analysis above, we have that $\Pr[\mathcal{B} \text{ outputs the wrong answer}] \leq e^{-n^2/6}$. Our algorithm runs $\mathcal{B}$ at most $m$ times, and thus the probability that $\mathcal{B}$ outputs the wrong answer at least once is $\leq m e^{-n^2/6}$ (by the union bound).
Thus the success probability of our algorithm is $\geq 1 - m e^{-n^2/6}$.