Homework 2
This homework covers the material from lectures 5 to 7
Due date: June 2nd, 11:59pm Waterloo time.
LaTeX template, in case you want to write it in LaTeX.
Required Exercises
Remember that you are only required to turn in 5 out of the 6 exercises below.
Problem 1 - Bashing (20 points)
Perfect hashing is nice, but does have the drawback that the perfect hash function has a lengthy description (since you have to describe the second-level hash function for each bucket). Consider the following alternative approach to producing a perfect hash function with a small description. Define bi-bucket hashing, or bashing, as follows. Given $n$ items, allocate two arrays of size $n^{3/2}$. When inserting an item, map it to one bucket in each array, and place it in the emptier of the two buckets.
- Suppose a random function (i.e., all function values are uniformly random and mutually independent) is used to map each item to buckets. Give a good upper bound on the expected number of collisions (i.e., the number of pairs of items that are placed in the same bucket).
Hint: what is the probablility that the $k^{th}$ inserted item collides with some previously inserted item?
-
Argue that bashing can be implemented efficiently, with the same expected outcome, using the ideas from 2-universal hashing.
-
Conclude an algorithm with linear expected time (ignoring array initialization) for identifying a perfect bash function for a set of $n$ items. How large is the description of the resulting function?
Problem 2 - More hash functions (20 points)
Consider the following examples of hash families. For each one, prove that it is 2-universal or give a counterexample. The universe for the hash functions is $U = \mathbb{F}_p$ and the image is $\{0, \ldots, n-1 \}$.
- Let $p$ be a prime number and $n \leq p$ be an integer. Let $$\mathcal{H} := \{ h_a(x) = (ax \bmod p) \bmod n \ \mid \ a \in \{ 1, \ldots, p-1 \} \} $$
- Let $p$ be a prime number and $n \leq p$ be an integer. Let $$\mathcal{H} := \{ h_b(x) = (x + b \bmod p) \bmod n \ \mid \ b \in \{ 0,1, \ldots, p-1 \} \} $$
- Let $m$ be an integer multiple of $n$, where $m \geq n$ and the universe is $\{0,\ldots, m-1\}$. Let $$\mathcal{H} := \{ h_{a, b}(x) = (ax + b \bmod m) \bmod n \ \mid \ a \in \{ 1, \ldots, m-1 \}, b \in \{ 0, 1, \ldots, m-1 \} \} $$
- Let $p$ be a prime number and $n \leq p$ be an integer. Let $$\mathcal{H} := \{ h_{a, b}(x) = (ax + b \bmod p) \bmod n \ \mid \ a, b \in \{ 0,1, \ldots, p-1 \}, a \neq 0 \} $$
Problem 3 - Karger’s algorithm: the cut menace (20 points)
Another problem about Karger’s randomized algorithm for minimum cut:
- Suppose Karger’s algorithm is applied to a tree. Show that it finds a minimum cut in the tree with probability 1.
- Consider the following modification of Karger’s algorithm: instead of choosing an edge uniformly at random and merging the endpoints, the algorithm chooses any two distinct vertices uniformly at random and merges them. Show that for any $n$ there is a graph $G_n$ with $n$ vertices such that when the modified algorithm is run on $G_n$, the probability that it finds a minimum cut is exponentially small in $n$.
- How many times would you have to repeat the modified algorithm of the previous part to have a reasonable chance of finding a minimum cut? What does this tell us about the practicality of the modified algorithm?
- Show that for any $n \geq 3$ there is a connected graph $G_n$ with $n$ vertices that has $n(n-1)/2$ distinct minimum cuts.
Problem 4 - Karger strikes back (20 points)
To improve the probability of success of the randomized min-cut algorithm, it can be run multiple times.
-
Consider running the algorithm twice. Determine the number of edge contractions and bound the probability of finding a min-cut.
-
Consider the following variation. Starting with a graph with $n$ vertices, first contract the graph down to $k$ vertices using the randomized min-cut algorithm. Make $\ell$ copies of the graph with $k$ vertices, and now run the randomized algorithm on these reduced graphs independently. Determine the number of edge contractions and bound the probability of finding a min-cut.
-
Find optimal (or at least near-optimal) values of $k$ and $\ell$ for the variation in the previous part that maximizes the probability of finding a min-cut while using the same number of edge contractions as running the original algorithm twice.
Problem 5 - Fingerprinting (20 points)
Consider the problem of deciding whether two integer multisets $S_1$ and $S_2$ are identical (that is, each integer occurs the same number of times in both sets). This problem can be solved by sorting the two sets in $O(n \log n)$ time, where $n = |S_1| = |S_2|$. In this question, you will devise 2 faster randomized algorithms for this problem.
You can assume that the multisets $S_i$ only have integers of bit complexity $w$, that $n \leq 2^w$, that integer operations of $O(w)$-bit integers can be executed in $O(1)$ time (RAM model), and that a prime with $O(w)$-bits can be found in $O(n)$ time.
-
Use polynomial identity testing to give a $O(n)$ time algorithm for the problem above.
-
Use hashing to give an algorithm which runs in expected $O(n)$ time for the problem above.
Your algorithms for both parts should succeed with probability $\geq 2/3$.
Problem 6 - Randomized Matching (20 points)
Given a randomized algorithm $\mathcal{A}$ for testing the existence of a perfect matching in a graph $G$, give a randomized algorithm for finding a perfect matching in $G$. Analyze the running time and success probability of your algorithm on graphs with $n$ vertices and $m$ edges, given that the algorithm $\mathcal{A}$ runs in time $T(n,m)$, and has failure probability $\leq 1/n$.
Your algorithm should have success probability $\geq 2/3$, and it should definitely be guaranteed to stop after $\textsf{poly}(n,m) T(n,m)$ time.
Practice Problems
You are not required to submit the solution to these problems, but highly encouraged to attempt them for your own knowledge sake. :)
Problem 1 - From Lecture 5
- Check that the hash family we constructed in slide 51 is strongly 2-universal. Is it strongly 3-universal?
- Prove that the hash family from slide 54 is 2-universal. Prove also that it is not strongly 2-universal.
- Prove proposition from slide 55
- Construct a family of 3-wise independent hash functions, which is not 4-wise independent. Prove that this family is 3-wise independent and not 4-wise independent.
(Hint: see slide 59.)
- Prove that the number of collisions is an upper bound on the square of the maximum load of a hash table.
- Use Markov’s inequality and the part 6 to prove the second lemma from slide 72.
Problem 2 - Matching
Let $G(U, V, E)$ be a bipartite graph, and let $X$ be the matrix of indeterminates given from the adjacency matrix of $G$, as we saw in lecture. Show that the size of the maximum matching in $G$ is equal to the rank of $X$ over the field $\mathbb{F}(X)$.
Hint: you can use the fact that the rank of $X$ as above is the same as the max rank of any evaluation of $X$ over $\mathbb{F}$.