Homework 2

This homework covers the material from lectures 5 to 8

Due date: October 4th, 11:59pm Waterloo time.

PDF version of Homework 2.

LaTeX template, in case you want to write it in LaTeX.

Required Exercises

Remember that you are only required to turn in 5 out of the 6 exercises below.

Problem 1 - Bashing (20 points)

Perfect hashing is nice, but does have the drawback that the perfect hash function has a lengthy description (since you have to describe the second-level hash function for each bucket). Consider the following alternative approach to producing a perfect hash function with a small description. Define bi-bucket hashing, or bashing, as follows. Given n items, allocate two arrays of size n3/2. When inserting an item, map it to one bucket in each array, and place it in the emptier of the two buckets.

  1. Suppose a random function (i.e., all function values are uniformly random and mutually independent) is used to map each item to buckets. Give a good upper bound on the expected number of collisions (i.e., the number of pairs of items that are placed in the same bucket).

Hint: what is the probablility that the kth inserted item collides with some previously inserted item?

  1. Argue that bashing can be implemented efficiently, with the same expected outcome, using the ideas from 2-universal hashing.

  2. Conclude an algorithm with linear expected time (ignoring array initialization) for identifying a perfect bash function for a set of n items. How large is the description of the resulting function?

Problem 2 - More hash functions (20 points)

Consider the following examples of hash families. For each one, prove that it is 2-universal or give a counterexample. The universe for the hash functions is U=Fp and the image is {0,,n1}.

  1. Let p be a prime number and np be an integer. Let H:={ha(x)=(axmodp)modn  a{1,,p1}}
  2. Let p be a prime number and np be an integer. Let H:={hb(x)=(x+bmodp)modn  b{0,1,,p1}}
  3. Let m be an integer multiple of n, where mn and the universe is {0,,m1}. Let H:={ha,b(x)=(ax+bmodm)modn  a{1,,m1},b{0,1,,m1}}
  4. Let p be a prime number and np be an integer. Let H:={ha,b(x)=(ax+bmodp)modn  a,b{0,1,,p1},a0}

Problem 3 - Fingerprinting (20 points)

Consider the problem of deciding whether two integer multisets S1 and S2 are identical (that is, each integer occurs the same number of times in both sets). This problem can be solved by sorting the two sets in O(nlogn) time, where n=|S1|=|S2|. In this question, you will devise 2 faster randomized algorithms for this problem.

You can assume that the multisets Si only have integers of bit complexity w, that n2w, that integer operations of O(w)-bit integers can be executed in O(1) time (RAM model), and that a prime with O(w)-bits can be found in O(n) time.

  1. Use polynomial identity testing to give a O(n) time algorithm for the problem above.

  2. Use hashing to give a O(n) time algorithm for the problem above.

Your algorithms for both parts should succeed with probability 2/3.

Problem 4 - Karger’s algorithm: the cut menace (20 points)

Another problem about Karger’s randomized algorithm for minimum cut:

  1. Suppose Karger’s algorithm is applied to a tree. Show that it finds a minimum cut in the tree with probability 1.
  2. Consider the following modification of Karger’s algorithm: instead of choosing an edge uniformly at random and merging the endpoints, the algorithm chooses any two distinct vertices uniformly at random and merges them. Show that for any n there is a graph Gn with n vertices such that when the modified algorithm is run on Gn, the probability that it finds a minimum cut is exponentially small in n.
  3. How many times would you have to repeat the modified algorithm of the previous part to have a reasonable chance of finding a minimum cut? What does this tell us about the practicality of the modified algorithm?
  4. Show that for any n3 there is a connected graph Gn with n vertices that has n(n1)/2 distinct minimum cuts.

Problem 5 - Karger strikes back (20 points)

To improve the probability of success of the randomized min-cut algorithm, it can be run multiple times.

  1. Consider running the algorithm twice. Determine the number of edge contractions and bound the probability of finding a min-cut.

  2. Consider the following variation. Starting with a graph with n vertices, first contract the graph down to k vertices using the randomized min-cut algorithm. Make copies of the graph with k vertices, and now run the randomized algorithm on these reduced graphs independently. Determine the number of edge contractions and bound the probability of finding a min-cut.

  3. Find optimal (or at least near-optimal) values of k and for the variation in the previous part that maximizes the probability of finding a min-cut while using the same number of edge contractions as running the original algorithm twice.

Problem 6 - Sublinear-time Algorithms (20 points)

Sublinear-time algorithms for connectedness in graphs with bounded degree.

Given a graph G of maximum degree d (as adjacency list), and a parameter ϵ>0, give an algorithm which has the following behavior: if G is connected, then the algorithm should output PASS with probability 1, and if G is ϵ-far from connected (at least ϵnd edges must be added to connect G), then the algorithm should output FAIL with probability at least 3/4. Your algorithm should look at a number of edges that is independent of n, and polynomial in d,ϵ.

For this problem, when proving the correctness of your algorithm, it is ok to show that if the input graph G is likely to be passed, then it is ϵ-close to a graph G0 which is connected, without requiring that G0 has degree at most d.

Practice Problems

You are not required to submit the solution to these problems, but highly encouraged to attempt them for your own knowledge sake. :)

Problem 1 - From Lecture 5

  1. Check that the hash family we constructed in slide 51 is strongly 2-universal. Is it strongly 3-universal?
  2. Prove that the hash family from slide 54 is 2-universal. Prove also that it is not strongly 2-universal.
  3. Prove proposition from slide 55
  4. Construct a family of 3-wise independent hash functions, which is not 4-wise independent. Prove that this family is 3-wise independent and not 4-wise independent.

(Hint: see slide 59.)

  1. Prove that the number of collisions is an upper bound on the square of the maximum load of a hash table.
  2. Use Markov’s inequality and the part 6 to prove the second lemma from slide 72.

Problem 2 - Testing Sortedness in Sublinear Time (courtesy of Nathan Harms)

In this question we mean that an array is sorted if it is sorted in increasing order.

Let A be an array of n integers, and let ε(0,1). We say that A is ε-far from sorted if every sorted array B has at least εn entries that are different from A. In other words, at least εn entries of A must be modified to turn it into a sorted array.

An ε-tester for sortedness with query complexity q is a randomized algorithm that queries a sequence i1,,iq of indices from A and outputs YES or NO as follows:

  1. if A is sorted, the algorithm output YES with probability 1
  2. if A is ε-far from sorted, the algorithm outputs NO with probability at least 2/3.

Question 1: the simplest idea for a tester is the random adjacent pair tester, which chooses a random index i{1,,n1}, queries A[i] and A[i+1] and rejects if A[i]>A[i+1]. Repeat this procedure q times and reject if at any query we output reject.

Give an example array A that is 1/2-far from sorted, where this algorithm with q=1 repetitions outputs no with probability at most 1/(n1). Conclude that the tester must repeat at least q=Ω(n) times, so the query complexity in this case is Ω(n)

Question 2: Another idea is to choose q independently, uniformly random indices i1,,iq and output NO if the subsequence A[i1],A[i2],,A[iq] is not sorted.

Give an example of an array that is 1/2-far from sorted where this algorithm requires q=Ω(n) queries.

Hint: remember the birthday paradox

Question 3: Assume each entry of A is distinct. Consider the set G of good indices of A, where index i is good if using binary search on A to find A[i] will successfully find the index i.

  1. For any set G={i1,,ig} of good indices, show that the subsequence A[i1],,A[ig] is sorted.
  2. Show that, if A is ε-far from sorted, then |G|(1ε)n.
  3. Show that there is an ε-tester for sortedness that uses O(1εlogn) queries.

Hint: consider choosing a random index i {1,,n}, querying A[i], then trying to use binary search to find element A[i].

Previous
Next