Homework 2

This homework covers the material from lectures 5 to 8

Due date: October 4th, 11:59pm Waterloo time.

PDF version of Homework 2.

LaTeX template, in case you want to write it in LaTeX.

Required Exercises

Remember that you are only required to turn in 5 out of the 6 exercises below.

Problem 1 - Bashing (20 points)

Perfect hashing is nice, but does have the drawback that the perfect hash function has a lengthy description (since you have to describe the second-level hash function for each bucket). Consider the following alternative approach to producing a perfect hash function with a small description. Define bi-bucket hashing, or bashing, as follows. Given $n$ items, allocate two arrays of size $n^{3/2}$. When inserting an item, map it to one bucket in each array, and place it in the emptier of the two buckets.

  1. Suppose a random function (i.e., all function values are uniformly random and mutually independent) is used to map each item to buckets. Give a good upper bound on the expected number of collisions (i.e., the number of pairs of items that are placed in the same bucket).

Hint: what is the probablility that the $k^{th}$ inserted item collides with some previously inserted item?

  1. Argue that bashing can be implemented efficiently, with the same expected outcome, using the ideas from 2-universal hashing.

  2. Conclude an algorithm with linear expected time (ignoring array initialization) for identifying a perfect bash function for a set of $n$ items. How large is the description of the resulting function?

Problem 2 - More hash functions (20 points)

Consider the following examples of hash families. For each one, prove that it is 2-universal or give a counterexample. The universe for the hash functions is $U = \mathbb{F}_p$ and the image is $\{0, \ldots, n-1\}$.

  1. Let $p$ be a prime number and $n \leq p$ be an integer. Let $$\mathcal{H} := \{ h_a(x) = (ax \bmod p) \bmod n \ \mid \ a \in \{ 1, \ldots, p-1 \} \} $$
  2. Let $p$ be a prime number and $n \leq p$ be an integer. Let $$\mathcal{H} := \{ h_b(x) = (x + b \bmod p) \bmod n \ \mid \ b \in \{ 0,1, \ldots, p-1 \} \} $$
  3. Let $m$ be an integer multiple of $n$, where $m \geq n$ and the universe is $\{0,\ldots, m-1\}$. Let $$\mathcal{H} := \{ h_{a, b}(x) = (ax + b \bmod m) \bmod n \ \mid \ a \in \{ 1, \ldots, m-1 \}, b \in \{ 0, 1, \ldots, m-1 \} \} $$
  4. Let $p$ be a prime number and $n \leq p$ be an integer. Let $$\mathcal{H} := \{ h_{a, b}(x) = (ax + b \bmod p) \bmod n \ \mid \ a, b \in \{ 0,1, \ldots, p-1 \}, a \neq 0 \} $$

Problem 3 - Fingerprinting (20 points)

Consider the problem of deciding whether two integer multisets $S_1$ and $S_2$ are identical (that is, each integer occurs the same number of times in both sets). This problem can be solved by sorting the two sets in $O(n \log n)$ time, where $n = |S_1| = |S_2|$. In this question, you will devise 2 faster randomized algorithms for this problem.

You can assume that the multisets $S_i$ only have integers of bit complexity $w$, that $n \leq 2^w$, that integer operations of $O(w)$-bit integers can be executed in $O(1)$ time (RAM model), and that a prime with $O(w)$-bits can be found in $O(n)$ time.

  1. Use polynomial identity testing to give a $O(n)$ time algorithm for the problem above.

  2. Use hashing to give a $O(n)$ time algorithm for the problem above.

Your algorithms for both parts should succeed with probability $\geq 2/3$.

Problem 4 - Karger’s algorithm: the cut menace (20 points)

Another problem about Karger’s randomized algorithm for minimum cut:

  1. Suppose Karger’s algorithm is applied to a tree. Show that it finds a minimum cut in the tree with probability 1.
  2. Consider the following modification of Karger’s algorithm: instead of choosing an edge uniformly at random and merging the endpoints, the algorithm chooses any two distinct vertices uniformly at random and merges them. Show that for any $n$ there is a graph $G_n$ with $n$ vertices such that when the modified algorithm is run on $G_n$, the probability that it finds a minimum cut is exponentially small in $n$.
  3. How many times would you have to repeat the modified algorithm of the previous part to have a reasonable chance of finding a minimum cut? What does this tell us about the practicality of the modified algorithm?
  4. Show that for any $n \geq 3$ there is a connected graph $G_n$ with $n$ vertices that has $n(n-1)/2$ distinct minimum cuts.

Problem 5 - Karger strikes back (20 points)

To improve the probability of success of the randomized min-cut algorithm, it can be run multiple times.

  1. Consider running the algorithm twice. Determine the number of edge contractions and bound the probability of finding a min-cut.

  2. Consider the following variation. Starting with a graph with $n$ vertices, first contract the graph down to $k$ vertices using the randomized min-cut algorithm. Make $\ell$ copies of the graph with $k$ vertices, and now run the randomized algorithm on these reduced graphs independently. Determine the number of edge contractions and bound the probability of finding a min-cut.

  3. Find optimal (or at least near-optimal) values of $k$ and $\ell$ for the variation in the previous part that maximizes the probability of finding a min-cut while using the same number of edge contractions as running the original algorithm twice.

Problem 6 - Sublinear-time Algorithms (20 points)

Sublinear-time algorithms for connectedness in graphs with bounded degree.

Given a graph $G$ of maximum degree $d$ (as adjacency list), and a parameter $\epsilon > 0$, give an algorithm which has the following behavior: if $G$ is connected, then the algorithm should output PASS with probability $1$, and if $G$ is $\epsilon$-far from connected (at least $\epsilon \cdot n \cdot d$ edges must be added to connect $G$), then the algorithm should output FAIL with probability at least $3/4$. Your algorithm should look at a number of edges that is independent of $n$, and polynomial in $d, \epsilon$.

For this problem, when proving the correctness of your algorithm, it is ok to show that if the input graph $G$ is likely to be passed, then it is $\epsilon$-close to a graph $G_0$ which is connected, without requiring that $G_0$ has degree at most $d$.

Practice Problems

You are not required to submit the solution to these problems, but highly encouraged to attempt them for your own knowledge sake. :)

Problem 1 - From Lecture 5

  1. Check that the hash family we constructed in slide 51 is strongly 2-universal. Is it strongly 3-universal?
  2. Prove that the hash family from slide 54 is 2-universal. Prove also that it is not strongly 2-universal.
  3. Prove proposition from slide 55
  4. Construct a family of 3-wise independent hash functions, which is not 4-wise independent. Prove that this family is 3-wise independent and not 4-wise independent.

(Hint: see slide 59.)

  1. Prove that the number of collisions is an upper bound on the square of the maximum load of a hash table.
  2. Use Markov’s inequality and the part 6 to prove the second lemma from slide 72.

Problem 2 - Testing Sortedness in Sublinear Time (courtesy of Nathan Harms)

In this question we mean that an array is sorted if it is sorted in increasing order.

Let $A$ be an array of $n$ integers, and let $\varepsilon \in (0,1)$. We say that $A$ is $\varepsilon$-far from sorted if every sorted array $B$ has at least $\varepsilon n$ entries that are different from $A$. In other words, at least $\varepsilon n$ entries of $A$ must be modified to turn it into a sorted array.

An $\varepsilon$-tester for sortedness with query complexity $q$ is a randomized algorithm that queries a sequence $i_1, \ldots, i_q$ of indices from $A$ and outputs YES or NO as follows:

  1. if $A$ is sorted, the algorithm output YES with probability 1
  2. if $A$ is $\varepsilon$-far from sorted, the algorithm outputs NO with probability at least $2/3$.

Question 1: the simplest idea for a tester is the random adjacent pair tester, which chooses a random index $i \in \{ 1, \ldots, n-1 \}$, queries $A[i]$ and $A[i+1]$ and rejects if $A[i] > A[i+1]$. Repeat this procedure $q$ times and reject if at any query we output reject.

Give an example array $A$ that is $1/2$-far from sorted, where this algorithm with $q=1$ repetitions outputs no with probability at most $1/(n-1)$. Conclude that the tester must repeat at least $q = \Omega(n)$ times, so the query complexity in this case is $\Omega(n)$

Question 2: Another idea is to choose $q$ independently, uniformly random indices $i_1, \ldots, i_q$ and output NO if the subsequence $A[i_1], A[i_2], \ldots, A[i_q]$ is not sorted.

Give an example of an array that is $1/2$-far from sorted where this algorithm requires $q = \Omega(\sqrt{n})$ queries.

Hint: remember the birthday paradox

Question 3: Assume each entry of $A$ is distinct. Consider the set $G$ of good indices of $A$, where index $i$ is good if using binary search on $A$ to find $A[i]$ will successfully find the index $i$.

  1. For any set $G = \{ i_1, \ldots, i_g \}$ of good indices, show that the subsequence $A[i_1], \ldots, A[i_g]$ is sorted.
  2. Show that, if $A$ is $\varepsilon$-far from sorted, then $|G| \leq (1-\varepsilon)n$.
  3. Show that there is an $\varepsilon$-tester for sortedness that uses $O(\frac{1}{\varepsilon} \log n)$ queries.

Hint: consider choosing a random index $i ~ \{1, \ldots, n\}$, querying $A[i]$, then trying to use binary search to find element $A[i]$.

Previous
Next