Homework 2

This homework covers the material from lectures 5 to 8

Due date: October 4th, 11:59pm Waterloo time.

PDF version of Homework 2.

LaTeX template, in case you want to write it in LaTeX.

Required Exercises

Remember that you are only required to turn in 5 out of the 6 exercises below.

Problem 1 - Bashing (20 points)

Perfect hashing is nice, but does have the drawback that the perfect hash function has a lengthy description (since you have to describe the second-level hash function for each bucket). Consider the following alternative approach to producing a perfect hash function with a small description. Define bi-bucket hashing, or bashing, as follows. Given $n$ items, allocate two arrays of size $n^{3 / 2}$ . When inserting an item, map it to one bucket in each array, and place it in the emptier of the two buckets.

Suppose a random function (i.e., all function values are uniformly random and mutually independent) is used to map each item to buckets. Give a good upper bound on the expected number of collisions (i.e., the number of pairs of items that are placed in the same bucket).

Hint: what is the probablility that the $k^{t h}$ inserted item collides with some previously inserted item?

Argue that bashing can be implemented efficiently, with the same expected outcome, using the ideas from 2-universal hashing.
Conclude an algorithm with linear expected time (ignoring array initialization) for identifying a perfect bash function for a set of $n$ items. How large is the description of the resulting function?

Problem 2 - More hash functions (20 points)

Consider the following examples of hash families. For each one, prove that it is 2-universal or give a counterexample. The universe for the hash functions is $U = F_{p}$ and the image is ${0, \dots, n - 1}$ .

Let $p$ be a prime number and $n \leq p$ be an integer. Let $H := {h_{a} (x) = (a x mod p) mod n ∣ a \in {1, \dots, p - 1}}$
Let $p$ be a prime number and $n \leq p$ be an integer. Let $H := {h_{b} (x) = (x + b mod p) mod n ∣ b \in {0, 1, \dots, p - 1}}$
Let $m$ be an integer multiple of $n$ , where $m \geq n$ and the universe is ${0, \dots, m - 1}$ . Let $H := {h_{a, b} (x) = (a x + b mod m) mod n ∣ a \in {1, \dots, m - 1}, b \in {0, 1, \dots, m - 1}}$
Let $p$ be a prime number and $n \leq p$ be an integer. Let $H := {h_{a, b} (x) = (a x + b mod p) mod n ∣ a, b \in {0, 1, \dots, p - 1}, a \neq 0}$

Problem 3 - Fingerprinting (20 points)

Consider the problem of deciding whether two integer multisets $S_{1}$ and $S_{2}$ are identical (that is, each integer occurs the same number of times in both sets). This problem can be solved by sorting the two sets in $O (n \log n)$ time, where $n = | S_{1} | = | S_{2} |$ . In this question, you will devise 2 faster randomized algorithms for this problem.

You can assume that the multisets $S_{i}$ only have integers of bit complexity $w$ , that $n \leq 2^{w}$ , that integer operations of $O (w)$ -bit integers can be executed in $O (1)$ time (RAM model), and that a prime with $O (w)$ -bits can be found in $O (n)$ time.

Use polynomial identity testing to give a $O (n)$ time algorithm for the problem above.
Use hashing to give a $O (n)$ time algorithm for the problem above.

Your algorithms for both parts should succeed with probability $\geq 2 / 3$ .

Problem 4 - Karger’s algorithm: the cut menace (20 points)

Another problem about Karger’s randomized algorithm for minimum cut:

Suppose Karger’s algorithm is applied to a tree. Show that it finds a minimum cut in the tree with probability 1.
Consider the following modification of Karger’s algorithm: instead of choosing an edge uniformly at random and merging the endpoints, the algorithm chooses any two distinct vertices uniformly at random and merges them. Show that for any $n$ there is a graph $G_{n}$ with $n$ vertices such that when the modified algorithm is run on $G_{n}$ , the probability that it finds a minimum cut is exponentially small in $n$ .
How many times would you have to repeat the modified algorithm of the previous part to have a reasonable chance of finding a minimum cut? What does this tell us about the practicality of the modified algorithm?
Show that for any $n \geq 3$ there is a connected graph $G_{n}$ with $n$ vertices that has $n (n - 1) / 2$ distinct minimum cuts.

Problem 5 - Karger strikes back (20 points)

To improve the probability of success of the randomized min-cut algorithm, it can be run multiple times.

Consider running the algorithm twice. Determine the number of edge contractions and bound the probability of finding a min-cut.
Consider the following variation. Starting with a graph with $n$ vertices, first contract the graph down to $k$ vertices using the randomized min-cut algorithm. Make $ℓ$ copies of the graph with $k$ vertices, and now run the randomized algorithm on these reduced graphs independently. Determine the number of edge contractions and bound the probability of finding a min-cut.
Find optimal (or at least near-optimal) values of $k$ and $ℓ$ for the variation in the previous part that maximizes the probability of finding a min-cut while using the same number of edge contractions as running the original algorithm twice.

Problem 6 - Sublinear-time Algorithms (20 points)

Sublinear-time algorithms for connectedness in graphs with bounded degree.

Given a graph $G$ of maximum degree $d$ (as adjacency list), and a parameter $ϵ > 0$ , give an algorithm which has the following behavior: if $G$ is connected, then the algorithm should output PASS with probability $1$ , and if $G$ is $ϵ$ -far from connected (at least $ϵ \cdot n \cdot d$ edges must be added to connect $G$ ), then the algorithm should output FAIL with probability at least $3 / 4$ . Your algorithm should look at a number of edges that is independent of $n$ , and polynomial in $d, ϵ$ .

For this problem, when proving the correctness of your algorithm, it is ok to show that if the input graph $G$ is likely to be passed, then it is $ϵ$ -close to a graph $G_{0}$ which is connected, without requiring that $G_{0}$ has degree at most $d$ .

Practice Problems

You are not required to submit the solution to these problems, but highly encouraged to attempt them for your own knowledge sake. :)

Problem 1 - From Lecture 5

Check that the hash family we constructed in slide 51 is strongly 2-universal. Is it strongly 3-universal?
Prove that the hash family from slide 54 is 2-universal. Prove also that it is not strongly 2-universal.
Prove proposition from slide 55
Construct a family of 3-wise independent hash functions, which is not 4-wise independent. Prove that this family is 3-wise independent and not 4-wise independent.

(Hint: see slide 59.)

Prove that the number of collisions is an upper bound on the square of the maximum load of a hash table.
Use Markov’s inequality and the part 6 to prove the second lemma from slide 72.

Problem 2 - Testing Sortedness in Sublinear Time (courtesy of Nathan Harms)

In this question we mean that an array is sorted if it is sorted in increasing order.

Let $A$ be an array of $n$ integers, and let $ε \in (0, 1)$ . We say that $A$ is $ε$ -far from sorted if every sorted array $B$ has at least $ε n$ entries that are different from $A$ . In other words, at least $ε n$ entries of $A$ must be modified to turn it into a sorted array.

An $ε$ -tester for sortedness with query complexity $q$ is a randomized algorithm that queries a sequence $i_{1}, \dots, i_{q}$ of indices from $A$ and outputs YES or NO as follows:

if $A$ is sorted, the algorithm output YES with probability 1
if $A$ is $ε$ -far from sorted, the algorithm outputs NO with probability at least $2 / 3$ .

Question 1: the simplest idea for a tester is the random adjacent pair tester, which chooses a random index $i \in {1, \dots, n - 1}$ , queries $A [i]$ and $A [i + 1]$ and rejects if $A [i] > A [i + 1]$ . Repeat this procedure $q$ times and reject if at any query we output reject.

Give an example array $A$ that is $1 / 2$ -far from sorted, where this algorithm with $q = 1$ repetitions outputs no with probability at most $1 / (n - 1)$ . Conclude that the tester must repeat at least $q = Ω (n)$ times, so the query complexity in this case is $Ω (n)$

Question 2: Another idea is to choose $q$ independently, uniformly random indices $i_{1}, \dots, i_{q}$ and output NO if the subsequence $A [i_{1}], A [i_{2}], \dots, A [i_{q}]$ is not sorted.

Give an example of an array that is $1 / 2$ -far from sorted where this algorithm requires $q = Ω (\sqrt{n})$ queries.

Hint: remember the birthday paradox

Question 3: Assume each entry of $A$ is distinct. Consider the set $G$ of good indices of $A$ , where index $i$ is good if using binary search on $A$ to find $A [i]$ will successfully find the index $i$ .

For any set $G = {i_{1}, \dots, i_{g}}$ of good indices, show that the subsequence $A [i_{1}], \dots, A [i_{g}]$ is sorted.
Show that, if $A$ is $ε$ -far from sorted, then $| G | \leq (1 - ε) n$ .
Show that there is an $ε$ -tester for sortedness that uses $O (\frac{1}{ε} \log n)$ queries.

Hint: consider choosing a random index $i {1, \dots, n}$ , querying $A [i]$ , then trying to use binary search to find element $A [i]$ .

Last updated on Sep 20, 2021

Edit this page