Homework 1 - Spring 2024
This homework covers the material from lectures 1 to 6.
Due date: May 21st, 10pm Waterloo time.
Required Exercises
Problem 1 - Amortized Analysis (10 points)
Design a data structure to support the following two operations for a dynamic multiset $S$ of numbers:

- INSERT($S, x$) inserts $x$ into $S$.
- DELETE-LARGER-HALF($S$) deletes the largest $\lceil |S|/2 \rceil$ elements from $S$.

Show how to implement this data structure so that any sequence of $m$ INSERT and DELETE-LARGER-HALF operations runs in $O(m)$ time.
Hint: to delete larger half, need to find the median. You can use the result that there is a linear time algorithm for finding the median in an unordered array (if you don’t know this result, you should learn about it - it is pretty cool).
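As a sanity check on the intended behavior (not the required amortized analysis), here is a minimal Python sketch; the class shape is illustrative, and sorting stands in for the linear-time selection from the hint, which is what the claimed bound would actually rely on.

```python
class Multiset:
    """Sketch of the Problem 1 data structure (illustrative names)."""

    def __init__(self):
        self.items = []

    def insert(self, x):
        # O(1) append
        self.items.append(x)

    def delete_larger_half(self):
        # Delete the largest ceil(|S|/2) elements, keeping the smallest
        # floor(|S|/2). Sorting is used here for clarity; a linear-time
        # median selection would give the O(m) total bound.
        if not self.items:
            return
        s = sorted(self.items)
        self.items = s[: len(s) // 2]
```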
Problem 2 - Amortized Analysis (10 points)
An ordered stack is a data structure that stores a sequence of items and supports the following operations:
- OrderedPush($x$) removes all items smaller than $x$ from the beginning of the sequence and then adds $x$ to the beginning of the sequence.
- Pop() deletes and returns the first item in the sequence, or NULL if the sequence is empty.
Suppose we implement an ordered stack with a simple linked list, using the obvious OrderedPush and Pop algorithms.
Prove that if we start with an empty data structure, the amortized cost of each OrderedPush or Pop operation is $O(1)$.
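The "obvious" list-based implementation can be sketched as follows (a simplified stand-in for the linked list; the front of the Python list plays the role of the head of the sequence):

```python
class OrderedStack:
    """Sketch of an ordered stack; index 0 is the head of the sequence."""

    def __init__(self):
        self.seq = []

    def ordered_push(self, x):
        # Remove all items smaller than x from the beginning,
        # then add x to the beginning.
        while self.seq and self.seq[0] < x:
            self.seq.pop(0)
        self.seq.insert(0, x)

    def pop(self):
        # Delete and return the first item, or None if empty.
        return self.seq.pop(0) if self.seq else None
```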
Problem 3 - Concentration Inequalities (15 points)
The simplest model for a random graph consists of $n$ vertices, where each of the $\binom{n}{2}$ possible edges is present independently with probability $p$. Let $X$ be the number of triangles in such a graph.
- What is the expected number of triangles?
- What is the variance?
- Give an expression for the bound on the decay of the probability, and show that the number of triangles is concentrated around the expectation.
That is, if $\mu$ is the expectation of your random variable $X$, show that $\Pr[\,|X - \mu| \geq t\,] \leq f(t)$ for some function $f$. To show that it is concentrated around the expectation, for what values of $t$ does $f(t) \to 0$?
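For experimenting with the first part, here is a Monte Carlo sketch; the closed form $\binom{n}{3}p^3$ used for comparison follows from linearity of expectation over vertex triples.

```python
import itertools
import math
import random

def triangle_count(n, p, rng):
    # Sample one G(n, p) graph and count triangles by brute force.
    adj = [[False] * n for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u][v] = adj[v][u] = True
    return sum(1 for a, b, c in itertools.combinations(range(n), 3)
               if adj[a][b] and adj[b][c] and adj[a][c])

rng = random.Random(0)
n, p, trials = 10, 0.5, 2000
avg = sum(triangle_count(n, p, rng) for _ in range(trials)) / trials
# Each of the C(n, 3) triples is a triangle with probability p^3.
expected = math.comb(n, 3) * p ** 3
```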
Problem 4 - Concentration Inequalities (15 points)
In this problem, we are in the setting where given a set
Part 1 (5 points)
Show that given a set
Hint: to show an impossibility result you need to show two distinct sets of
Part 2 (10 points)
Given a set
Problem 5 - Balls and Bins (10 points)
Consider again the experiment in which we toss $m$ balls independently and uniformly at random into $n$ bins, and let $X$ be the number of empty bins.
- By writing $X = \sum_i X_i$ for suitable indicator random variables $X_i$, show how to derive the following exact formula for the variance of $X$:
  $$\mathrm{Var}[X] = n\left(1 - \frac{1}{n}\right)^m + n(n-1)\left(1 - \frac{2}{n}\right)^m - n^2\left(1 - \frac{1}{n}\right)^{2m}$$
- What is the asymptotic value (as $n \to \infty$) of $\mathrm{Var}[X]$ in the cases $m = n$ and $m = n \ln n$?
  Hint: you may use the approximations $(1 - 1/n)^m \approx e^{-m/n}$ and $(1 - 2/n)^m \approx e^{-2m/n}$.
- When throwing $n$ balls into $n$ bins, what is the expected number of bins with exactly one ball? Compute an exact formula and its asymptotic approximation.
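A simulation sketch for the last part (helper names are illustrative); the comparison value $m(1-1/n)^{m-1}$ comes from linearity of expectation over bins.

```python
import random

def bins_with_one_ball(n, m, rng):
    # Throw m balls uniformly into n bins; count bins holding exactly one.
    counts = [0] * n
    for _ in range(m):
        counts[rng.randrange(n)] += 1
    return sum(1 for c in counts if c == 1)

rng = random.Random(0)
n = m = 50
trials = 4000
avg = sum(bins_with_one_ball(n, m, rng) for _ in range(trials)) / trials
# A given bin holds exactly one ball with probability
# m * (1/n) * (1 - 1/n)^(m-1); sum over the n bins.
expected = m * (1 - 1 / n) ** (m - 1)
```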
Problem 6 - Coupon Collector (10 points)
Part 1 (5 points)
Let us consider the coupon collector problem: we toss balls independently and uniformly at random into $n$ bins, and let $T$ be the number of tosses needed until every bin contains at least one ball.

Prove that $\Pr[T > t] = \sum_{j=1}^{n} (-1)^{j+1} \binom{n}{j} \left(1 - \frac{j}{n}\right)^{t}$.
Hint: Inclusion-Exclusion.
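A quick simulation can sanity-check whichever formula you derive; here it compares the empirical mean of $T$ against the classical coupon-collector expectation $n H_n$.

```python
import random

def tosses_until_full(n, rng):
    # Toss balls uniformly into n bins until every bin is non-empty.
    seen, tosses = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        tosses += 1
    return tosses

rng = random.Random(0)
n, trials = 20, 3000
avg = sum(tosses_until_full(n, rng) for _ in range(trials)) / trials
harmonic = sum(1 / k for k in range(1, n + 1))
expected = n * harmonic  # classical E[T] = n * H_n
```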
Part 2 (5 points)
Consider the following variation of the coupon collector’s problem. Each box of cereal contains one of $2n$ different coupons. The coupons are organized into $n$ pairs, so that coupons 1 and 2 are a pair, coupons 3 and 4 are a pair, and so on. Once you obtain one coupon from every pair, you can obtain a prize. Assuming that the coupon in each box is chosen independently and uniformly at random from the $2n$ possibilities, what is the expected number of boxes you must buy before you can claim the prize?
Problem 7 - More hash functions (15 points)
Consider the following examples of hash families.
For each one, prove that it is 2-universal or give a counterexample.
The universe for the hash functions is
Part 1 (5 points)
Let
Part 2 (5 points)
Let
Part 3 (5 points)
Let
Problem 8 - Karger strikes back (15 points)
To improve the probability of success of the randomized min-cut algorithm, it can be run multiple times.
- Consider running the algorithm twice. Determine the number of edge contractions and bound the probability of finding a min-cut.
- Consider the following variation. Starting with a graph with $n$ vertices, first contract the graph down to $k$ vertices using the randomized min-cut algorithm. Make $\ell$ copies of the graph with $k$ vertices, and now run the randomized algorithm on each of these reduced graphs independently. Determine the number of edge contractions and bound the probability of finding a min-cut.
- Find optimal (or at least near-optimal) values of $k$ and $\ell$ for the variation in the previous part that maximize the probability of finding a min-cut while using the same number of edge contractions as running the original algorithm twice.
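For experimentation, here is a simplified implementation of the contraction algorithm with independent repetitions (the union-find bookkeeping is an implementation choice, not part of the problem statement):

```python
import random

def karger_cut(n, edges, rng):
    # One run of the contraction algorithm on a multigraph given as a
    # list of (u, v) edges over vertices 0..n-1.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    remaining = n
    while remaining > 2:
        u, v = edges[rng.randrange(len(edges))]
        ru, rv = find(u), find(v)
        if ru != rv:          # skip edges that became self-loops
            parent[ru] = rv
            remaining -= 1
    # Edges crossing the two remaining super-vertices form the cut.
    return sum(1 for u, v in edges if find(u) != find(v))

def repeated_karger(n, edges, runs, rng):
    # Run the algorithm several times and keep the best cut found.
    return min(karger_cut(n, edges, rng) for _ in range(runs))

rng = random.Random(0)
cycle = [(i, (i + 1) % 6) for i in range(6)]  # 6-cycle: min cut is 2
best = repeated_karger(6, cycle, runs=10, rng=rng)
```

On a cycle, every contraction outcome leaves two arcs, so each run already returns the minimum cut; denser graphs are where repetition matters.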
Practice Problems
You are not required to submit solutions to these problems, but you are highly encouraged to attempt them for your own benefit. :)
Problem 1 - Amortized Analysis - from CLRS chapter 17
A stack $S$ supports the following operations:

- PUSH($S, x$) pushes object $x$ onto the top of stack $S$.
- POP($S$) pops the top of the stack and returns the popped object. Calling POP on an empty stack returns an error.
We will now modify the stack structure to support the following operation:
- MULTIPOP($S, k$) removes the top $k$ objects of stack $S$, popping the entire stack if the number of elements in $S$ is less than $k$.
First Question: what is the worst-case performance per operation of a sequence of $n$ PUSH, POP, and MULTIPOP operations?
Second Question: what is the amortized cost of each operation in such a sequence?
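The three operations can be sketched directly on a Python list (a minimal model to play with, not the analysis itself):

```python
class Stack:
    """PUSH / POP / MULTIPOP on a Python list; the end is the top."""

    def __init__(self):
        self._data = []

    def push(self, x):
        self._data.append(x)

    def pop(self):
        if not self._data:
            raise IndexError("pop from empty stack")
        return self._data.pop()

    def multipop(self, k):
        # Pop up to k objects, emptying the stack if it has fewer than k.
        popped = []
        while self._data and len(popped) < k:
            popped.append(self._data.pop())
        return popped
```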
Problem 2 - Amortized Analysis - from Jeff Erickson’s book
Suppose we can insert or delete an element into a hash table in $O(1)$ time. In order to ensure that our hash table is always big enough, without wasting a lot of memory, we will use the following global rebuilding rules:

- After an insertion, if the table is more than $3/4$ full, we allocate a new table twice as big as our current table, insert everything into the new table, and then free the old table.
- After a deletion, if the table is less than $1/4$ full, we allocate a new table half as big as our current table, insert everything into the new table, and then free the old table.

Show that, starting from an empty hash table, for any sequence of insertions and deletions, the amortized time per operation is still $O(1)$.
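A toy model of such a grow/shrink policy, assuming $3/4$ and $1/4$ thresholds (as in the standard version of this exercise); it tracks only sizes and the total number of elements moved during rebuilds, which is the quantity the amortized argument must bound.

```python
class ResizingTable:
    """Counts rebuild work for a doubling/halving policy (model only)."""

    def __init__(self):
        self.capacity = 4
        self.size = 0
        self.copies = 0  # total elements moved during rebuilds

    def insert(self):
        self.size += 1
        if self.size > 3 * self.capacity / 4:   # more than 3/4 full
            self.capacity *= 2
            self.copies += self.size            # reinsert everything

    def delete(self):
        assert self.size > 0
        self.size -= 1
        if self.size < self.capacity / 4 and self.capacity > 4:
            self.capacity //= 2                 # less than 1/4 full
            self.copies += self.size
```

Running a long sequence of inserts shows the total rebuild work staying linear in the number of operations, consistent with $O(1)$ amortized cost.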
Problem 3 - Splay Trees
In this problem we will work out some splaying counterexamples.
- Single rotations don’t work for splay trees. To show this, consider a degenerate $n$-node linked-list-shaped binary tree where each node’s right child is empty. Suppose that the (only) leaf is splayed to the root by single rotations. Show the structure of the tree after this splay. Generalize this to argue that there is a sequence of $n$ splays (each splay using only single rotations) that each take at least $\Omega(n)$ work.
- Now from the same starting tree, show the final structure after splaying the leaf with (zig-zig) double rotations. Explain how this splay has made much more progress than single rotations in “improving” the tree.
- Given the theorem about access time in splay trees, it is tempting to conjecture that splaying does not create trees in which it would take a long time to find an item. Show this conjecture is false by showing that for large enough $n$, it is possible to restructure any binary tree on $n$ nodes into any other binary tree on $n$ nodes by a sequence of splay requests. Conclude that it is possible to make a sequence of requests that cause the splay tree to achieve any desired shape.
Hint: start by showing how you can use splay operations to make a specified node into a leaf, then recurse.
Problem 4 - Probability - from MU'05 chapter 1
- Consider the set $N = \{1, 2, \ldots, n\}$. We generate a subset $X$ of this set as follows: a fair coin is flipped independently for each element of the set; if the coin lands heads then the element is added to $X$, and otherwise it is not. Argue that the resulting set $X$ is equally likely to be any one of the $2^n$ possible subsets.
- Suppose that two sets $X$ and $Y$ are chosen independently and uniformly at random from all the $2^n$ subsets of $N$. Determine $\Pr[X \subseteq Y]$ and $\Pr[X \cup Y = N]$.
Hint: use part 1 of the problem.
Problem 5 - Probability
- Consider the following probabilistic process:
  - start with $x = 0$.
  - Repeat $n$ times: toss a fair coin, and if it comes up heads, make $x = x + 1$. Otherwise, make $x = x - 1$.

Let $X$ be the value of $x$ at the end of the process.

- Compute $E[X]$ and $\mathrm{Var}[X]$.
- For even $n$, show that $\Pr[X = 0] = \binom{n}{n/2} \, 2^{-n}$. What happens for odd $n$?
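A quick simulation of the $\pm 1$ process (variable names are illustrative); it checks the empirical mean against 0 and the empirical return probability against the even-$n$ binomial count.

```python
import math
import random

def walk(n, rng):
    # n steps of +1 (heads) or -1 (tails) with a fair coin, from 0.
    x = 0
    for _ in range(n):
        x += 1 if rng.random() < 0.5 else -1
    return x

rng = random.Random(0)
n, trials = 10, 20000
samples = [walk(n, rng) for _ in range(trials)]
mean = sum(samples) / trials
frac_zero = sum(1 for x in samples if x == 0) / trials
# Returning to 0 after n steps requires exactly n/2 heads (even n only).
exact_zero = math.comb(n, n // 2) / 2 ** n
```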
Problem 6 - Covariance and Variance of Correlated Random Variables
In this question we will explore how to use covariance to compute the variance of correlated random variables.

- Let $X_1, X_2, \ldots, X_n$ be random variables such that $X_1 = X_2 = \cdots = X_n$; thus, we are in the case where the random variables are all the same random variable.

Let $X = \sum_{i=1}^{n} X_i$. Compute $\mathrm{Var}[X]$.
- Let
be random variables such that and for we have
Let
Problem 7 - Martingales
Consider a fair game in a casino: on each play, you may stake any amount of money. With probability $1/2$ you win, gaining an amount equal to your stake; with probability $1/2$ you lose your stake. Plays are independent.
- What is the expected number of plays until you first win?
- The following gambling strategy, known as the “martingale,” was popular in European casinos in the 18th century: on the first play, stake 1 dollar; on the second play, stake 2 dollars; on the $j$th play, stake $2^{j-1}$ dollars. Stop playing and leave the casino when you first win! Show that if you follow the martingale strategy, you will leave the casino 1 dollar richer with probability 1.
- Let us now discuss the catch in the strategy above: let $Z$ be the random variable that measures your maximum loss before winning. Show that $E[Z] = \infty$. What does this imply about your ability to play the martingale strategy in practice?
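A simulation of the doubling strategy illustrates both claims (a sketch, not part of the required proof): the net profit is always exactly 1, while the loss accumulated before the final win can be large.

```python
import random

def martingale(rng):
    # Double the stake after each loss; stop at the first win.
    stake, loss = 1, 0
    while rng.random() >= 0.5:   # this play is a loss
        loss += stake
        stake *= 2
    # On the winning play we gain `stake`, which covers all losses plus 1.
    return stake - loss, loss    # (net profit, loss incurred before winning)

rng = random.Random(0)
results = [martingale(rng) for _ in range(2000)]
```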
Problem 8 - Generating random strings
Alice and Bob have a fair coin and they are trying to generate a random sequence of $2n$ coin tosses containing exactly $n$ heads and $n$ tails, with all such sequences equally likely.
- Alice suggests to flip the coin $2n$ times. If we get exactly $n$ heads, output the sequence; otherwise, try again. How many tosses do you expect Alice will have to make under this scheme?
- Bob claims that he has a much more efficient scheme: flip the coin until you have either $n$ heads or $n$ tails. Output this sequence, padded at the end with tails or heads respectively to make the total length $2n$. Clearly this scheme requires at most $2n - 1$ tosses. Is this a good scheme? Justify your answer with a proof.
- Suggest a simple scheme for solving this problem that is better than the previous two schemes. How many tosses does your scheme require in expectation?
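To get a feel for Alice’s scheme, here is a small simulation; the comparison value uses the standard success probability $\binom{2n}{n}/4^n$ per attempt.

```python
import math
import random

def alice_tosses(n, rng):
    # Flip 2n coins per attempt; retry until exactly n of them are heads.
    tosses = 0
    while True:
        tosses += 2 * n
        heads = sum(1 for _ in range(2 * n) if rng.random() < 0.5)
        if heads == n:
            return tosses

rng = random.Random(0)
n, trials = 5, 3000
avg = sum(alice_tosses(n, rng) for _ in range(trials)) / trials
# Each attempt succeeds with probability C(2n, n) / 4^n, so the expected
# number of tosses is 2n divided by that probability.
p = math.comb(2 * n, n) / 4 ** n
expected = 2 * n / p
```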
Problem 9 - Concentration Inequalities
Questions from lecture 3:
- Is Markov’s inequality tight? If it is, give examples of random variables for which it is tight.
- Give examples of discrete probability distributions which do not have a $k$th moment. Also give examples of discrete probability distributions which do not have a $k$th central moment.
- Can you generalize Chebyshev’s inequality to higher order moments?
- Work out the proof of the lower tail bound for Chernoff’s bound.
- Work out the proof of Hoeffding’s generalization of Chernoff’s inequality.
Problem 10 - Reservoir Sampling - from Mu'05 chapter 2
Suppose we have a sequence of items passing by one at a time. We want to maintain a sample of one item with the property that it is uniformly distributed over all the items that we have seen at each step. Moreover, we want to accomplish this without knowing the total number of items in advance or storing all of the items that we see.
- Consider the following algorithm, which stores just one item in memory at all times.
When the first item appears, it is stored in memory. When the $k$th item appears, it replaces the item in memory with probability $1/k$. Show why this algorithm solves the problem.
Note: the above algorithm is called reservoir sampling.
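The first part’s algorithm is short enough to test directly; a sketch with a fixed seed, checking that each of the stream’s items is kept roughly equally often:

```python
import random

def reservoir_sample(stream, rng):
    # Keep one item; the k-th item replaces it with probability 1/k.
    sample = None
    for k, item in enumerate(stream, start=1):
        if rng.random() < 1.0 / k:
            sample = item
    return sample

rng = random.Random(0)
counts = [0] * 5
for _ in range(20000):
    counts[reservoir_sample(range(5), rng)] += 1
```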
- Suppose that we modify the reservoir sampling algorithm from the previous part so that, when the $k$th item appears, it replaces the item in memory with probability $1/2$. Describe the distribution of the item in memory.
Problem 11 - Exploring Chernoff
In this question, we will think about the following: when are Chernoff bounds actually giving us any non-trivial information?
- Let
and be the random variable which takes value with probability , and takes the value with probability . Let .
  - What is the expectation of ?
  - Why can we use Chernoff here, since as defined is not a Poisson trial? What do we have to do to use Chernoff here?
  - What does Chernoff say about ?
- One may be tempted to think in the example above that the reason why Chernoff didn’t give any non-trivial information was that the expectation is small compared to
. However, consider the following random variable: , where is the random variable of part 1 and can be any positive real number.
Prove that even though the expectation can now be large compared to , Chernoff still does not say anything non-trivial.
Hint: for Chernoff we need Poisson trials.
- What is the minimum number of independent copies of needed if we want Chernoff to say anything meaningful for , where ?
Problem 12 - Bashing
Perfect hashing is nice, but does have the drawback that the perfect hash function has a lengthy description (since you have to describe the second-level hash function for each bucket). Consider the following alternative approach to producing a perfect hash function with a small description. Define bi-bucket hashing, or bashing, as follows. Given
- Suppose a random function (i.e., all function values are uniformly random and mutually independent) is used to map each item to two buckets. Give a good upper bound on the expected number of collisions (i.e., the number of pairs of items that are placed in the same bucket).
Hint: what is the probability that the
- Argue that bashing can be implemented efficiently, with the same expected outcome, using the ideas from 2-universal hashing.
- Conclude an algorithm with linear expected time (ignoring array initialization) for identifying a perfect bash function for a set of $n$ items. How large is the description of the resulting function?
Problem 13 - From Lecture 5
- Check that the hash family we constructed in slide 51 is strongly 2-universal. Is it strongly 3-universal?
- Prove that the hash family from slide 54 is 2-universal. Prove also that it is not strongly 2-universal.
- Prove the proposition from slide 55.
- Construct a family of 3-wise independent hash functions, which is not 4-wise independent. Prove that this family is 3-wise independent and not 4-wise independent.
(Hint: see slide 59.)
- Prove that the number of collisions is an upper bound on the square of the maximum load of a hash table.
- Use Markov’s inequality and the previous part to prove the second lemma from slide 72.
Problem 14 - Karger’s algorithm: the cut menace
Another problem about Karger’s randomized algorithm for minimum cut:
- Suppose Karger’s algorithm is applied to a tree. Show that it finds a minimum cut in the tree with probability 1.
- Consider the following modification of Karger’s algorithm: instead of choosing an edge uniformly at random and merging its endpoints, the algorithm chooses any two distinct vertices uniformly at random and merges them. Show that for any $n$ there is a graph $G_n$ with $n$ vertices such that when the modified algorithm is run on $G_n$, the probability that it finds a minimum cut is exponentially small in $n$.
- How many times would you have to repeat the modified algorithm of the previous part to have a reasonable chance of finding a minimum cut? What does this tell us about the practicality of the modified algorithm?
- Show that for any $n \geq 2$ there is a connected graph with $n$ vertices that has $\binom{n}{2}$ distinct minimum cuts.