Homework 1
This homework covers the material from lectures 1 to 4.
Due date: September 20th, 11:59pm Waterloo time.
PDF version of Homework 1.
LaTeX template, in case you want to write it in LaTeX.
Required Exercises
Remember that you are only required to turn in 5 out of the 6 exercises below.
Problem 1 - Amortized Analysis (20 points)
Part 1 (10 points)
Design a data structure to support the following two operations for a dynamic multiset $S$ (duplicate values allowed):

- INSERT($S, x$) inserts $x$ into $S$.
- DELETE-LARGER-HALF($S$) deletes the largest $\lceil |S|/2 \rceil$ elements from $S$.

Show how to implement this data structure so that any sequence of $m$ INSERT and DELETE-LARGER-HALF operations runs in $O(m)$ time.

Hint: to delete the larger half, you need to find the median.
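For concreteness, here is a minimal Python sketch of the intended semantics of the two operations, using a plain sorted list. This naive version is of course not the required efficient implementation, and all names are illustrative.

```python
import bisect

class NaiveMultiset:
    # Reference semantics only: keeps the elements in a sorted Python list,
    # so INSERT costs O(|S|) here rather than the amortized O(1) you are asked to achieve.
    def __init__(self):
        self.items = []  # always kept in non-decreasing order

    def insert(self, x):
        # INSERT(S, x): add x to the multiset (duplicates allowed).
        bisect.insort(self.items, x)

    def delete_larger_half(self):
        # DELETE-LARGER-HALF(S): remove the ceil(|S|/2) largest elements.
        keep = len(self.items) // 2          # the floor(|S|/2) smallest survive
        self.items = self.items[:keep]

# Example: after inserting 1..6 and one DELETE-LARGER-HALF, {1, 2, 3} remain.
ms = NaiveMultiset()
for v in [5, 1, 4, 2, 6, 3]:
    ms.insert(v)
ms.delete_larger_half()
print(ms.items)   # -> [1, 2, 3]
```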
Part 2 (10 points)
An ordered stack is a data structure that stores a sequence of items and supports the following operations:
- OrderedPush($x$) removes all items smaller than $x$ from the beginning of the sequence and then adds $x$ to the beginning of the sequence.
- Pop() deletes and returns the first item in the sequence, or NULL if the sequence is empty.
Suppose we implement an ordered stack with a simple linked list, using the obvious OrderedPush and Pop algorithms.
Prove that if we start with an empty data structure, the amortized cost of each OrderedPush or Pop operation is $O(1)$.
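The “obvious” linked-list implementation that the problem refers to might look like the following sketch (the names are mine); note that a single OrderedPush may remove many items, which is exactly why the amortized analysis is interesting.

```python
class OrderedStack:
    # Singly linked list; the head of the list is the "beginning" of the sequence.
    class Node:
        def __init__(self, value, nxt=None):
            self.value, self.next = value, nxt

    def __init__(self):
        self.head = None

    def ordered_push(self, x):
        # Remove every item smaller than x from the front, then put x at the front.
        while self.head is not None and self.head.value < x:
            self.head = self.head.next
        self.head = self.Node(x, self.head)

    def pop(self):
        # Return and remove the first item, or None if the sequence is empty.
        if self.head is None:
            return None
        value, self.head = self.head.value, self.head.next
        return value
```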
Problem 2 - Splay Trees (20 points)
In this problem we will work out some splaying counterexamples:

- Single rotations don’t work for splay trees. To show this, consider a degenerate $n$-node linked-list shaped binary tree where each node’s right child is empty. Suppose that the (only) leaf is splayed to the root by single rotations. Show the structure of the tree after this splay. Generalize this to argue that there is a sequence of splays (each splay using only single rotations) that each take at least $\Omega(n)$ work.
- Now from the same starting tree, show the final structure after splaying the leaf with (zig-zig) double rotations. Explain how this splay has made much more progress than single rotations in “improving” the tree.
- Given the theorem about access time in splay trees, it is tempting to conjecture that splaying does not create trees in which it would take a long time to find an item. Show this conjecture is false by showing that for large enough $n$, it is possible to restructure any binary tree on $n$ nodes into any other binary tree on $n$ nodes by a sequence of splay requests. Conclude that it is possible to make a sequence of requests that cause the splay tree to achieve any desired shape.
Hint: start by showing how you can use splay operations to make a specified node into a leaf, then recurse.
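If you want to experiment with the first two parts before writing your answer, here is a small Python sketch (entirely my own, and specialized to this left-path example rather than a general splay tree) that splays the leaf both ways and compares the heights of the resulting trees.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    return x

def left_path(n):
    # The degenerate tree: keys n, n-1, ..., 1 down the left spine; every right child is empty.
    root = None
    for k in range(1, n + 1):
        root = Node(k, left=root)
    return root

def move_min_to_root(t):
    # Splay the leaf (the minimum key) to the root using SINGLE rotations only.
    if t.left is None:
        return t
    t.left = move_min_to_root(t.left)
    return rotate_right(t)

def splay_min(t):
    # Splay the leaf to the root using zig-zig double rotations
    # (with one final single rotation if the leaf ends up a child of the root).
    if t.left is None:
        return t
    if t.left.left is None:
        return rotate_right(t)                 # zig
    t.left.left = splay_min(t.left.left)       # zig-zig: rotate grandparent, then parent
    return rotate_right(rotate_right(t))

def height(t):
    return -1 if t is None else 1 + max(height(t.left), height(t.right))

n = 16
print(height(move_min_to_root(left_path(n))))   # single rotations: still a long path
print(height(splay_min(left_path(n))))          # zig-zig: height roughly halved
```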
Problem 3 - Concentration Inequalities (20 points)
The simplest model for a random graph consists of $n$ vertices, where each of the $\binom{n}{2}$ possible edges is present independently with probability $p$. Consider the number of triangles (triples of vertices with all three edges between them present) in such a random graph.
- What is the expected number of triangles?
- What is the variance?
- Show that the number of triangles is concentrated around the expectation and give an expression for the bound on the decay of the probability.
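If you want to sanity-check your answers numerically, here is a small Monte Carlo sketch of the model above (the parameter choices are arbitrary assumptions of mine):

```python
import random
from itertools import combinations

def triangle_count(n, p):
    # Sample the random graph: each of the C(n, 2) possible edges appears
    # independently with probability p.
    adj = [[False] * n for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if random.random() < p:
            adj[u][v] = adj[v][u] = True
    # Count triangles by brute force over all C(n, 3) vertex triples.
    return sum(1 for a, b, c in combinations(range(n), 3)
               if adj[a][b] and adj[b][c] and adj[a][c])

n, p, trials = 30, 0.2, 1000
samples = [triangle_count(n, p) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, var)   # compare against your closed-form expectation and variance
```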
Problem 4 - Concentration Inequalities (20 points)
In this problem, we are in the setting where, given a set $S$ of $n$ numbers, we want to estimate the median of $S$ by looking only at a small random sample of its elements.
- Show that given a set $S$ of $n$ numbers in $[0,1]$, it is impossible to estimate the value of the median within a multiplicative approximation factor using $o(n)$ samples.

Hint: to show an impossibility result, you exhibit two different sets of $n$ numbers whose medians are multiplicatively far apart, yet which cannot be told apart from only $o(n)$ samples.
- Given a set $S$ of $n$ numbers as in part 1, now calculate the number of samples needed (as a function of $\varepsilon$ and $n$) so that the following is true: with high probability, the median of the sample has at least $\left(\frac{1}{2} - \varepsilon\right)n$ numbers from $S$ less than it and at least $\left(\frac{1}{2} - \varepsilon\right)n$ numbers of $S$ more than it.
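To build intuition for the second part, here is a small simulation sketch (the sample sizes, the tolerance $\varepsilon$, and the choice of $S$ are all arbitrary assumptions of mine): it draws samples with replacement, takes the sample median, and records its rank within $S$.

```python
import random

def sample_median_rank(S, k):
    # Sample k elements of S with replacement, take the sample median, and
    # return its rank in S (the fraction of elements of S strictly below it).
    sample = sorted(random.choice(S) for _ in range(k))
    med = sample[len(sample) // 2]
    return sum(1 for x in S if x < med) / len(S)

n, eps = 10_000, 0.05
S = [random.random() for _ in range(n)]
for k in (10, 100, 1000):
    ranks = [sample_median_rank(S, k) for _ in range(200)]
    good = sum(1 for r in ranks if 0.5 - eps <= r <= 0.5 + eps)
    print(k, good / len(ranks))   # fraction of trials with an eps-approximate-rank median
```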
Problem 5 - Balls and Bins (20 points)
Consider again the experiment in which we toss $m$ balls into $n$ bins, with each ball landing in a bin chosen independently and uniformly at random. Let $X$ denote the number of empty bins after all $m$ balls have been tossed.
- By writing $X = \sum_{i=1}^{n} X_i$ for suitable indicator random variables $X_i$, show how to derive the following exact formula for the variance of $X$:
$$\mathrm{Var}[X] \;=\; n\left(1-\tfrac{1}{n}\right)^m \;+\; n(n-1)\left(1-\tfrac{2}{n}\right)^m \;-\; n^2\left(1-\tfrac{1}{n}\right)^{2m}.$$
- What is the asymptotic value (as $n \to \infty$) of this variance in the cases $m = n$ and $m = n \ln n$?

Hint: you may use the approximation $\left(1 - \frac{x}{n}\right)^n \approx e^{-x}$.
- When throwing $n$ balls into $n$ bins, what is the expected number of bins with exactly one ball? Compute an exact formula and its asymptotic approximation.
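A short simulation sketch (parameters arbitrary) that you can use to check both the empty-bin calculations and the exactly-one-ball calculation:

```python
import random
from collections import Counter

def toss(m, n):
    # Throw m balls into n bins uniformly at random;
    # return (number of empty bins, number of bins with exactly one ball).
    counts = Counter(random.randrange(n) for _ in range(m))
    empty = n - len(counts)
    singles = sum(1 for c in counts.values() if c == 1)
    return empty, singles

m = n = 1000
trials = [toss(m, n) for _ in range(500)]
print(sum(e for e, _ in trials) / len(trials))   # average number of empty bins
print(sum(s for _, s in trials) / len(trials))   # average number of bins with exactly one ball
```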
Problem 6 - Coupon Collector (20 points)
Part 1 (10 points)
Let us consider the coupon collector problem: we toss balls, one at a time, into $n$ bins, with each ball landing in a bin chosen independently and uniformly at random. Let $T$ be the number of tosses needed until every bin contains at least one ball.
Prove that
$$\Pr[T > t] \;=\; \sum_{i=1}^{n} (-1)^{i+1} \binom{n}{i} \left(1 - \frac{i}{n}\right)^{t}.$$
Hint: Inclusion-Exclusion.
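A quick simulation sketch (parameters arbitrary) for sanity-checking whatever exact expression you derive for the tail of $T$:

```python
import random

def coupon_collector_T(n):
    # Toss balls into n bins uniformly at random; return the number of tosses
    # needed until every bin has received at least one ball.
    seen, tosses = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        tosses += 1
    return tosses

n, trials, t = 20, 50_000, 60
samples = [coupon_collector_T(n) for _ in range(trials)]
print(sum(1 for s in samples if s > t) / trials)   # empirical Pr[T > t], to compare with your formula
```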
Part 2 (10 points)
Consider the following variation of the coupon collector’s problem. Each box of cereal contains one of $2n$ different coupons. The coupons are organized into $n$ pairs, so that coupons $1$ and $2$ are a pair, coupons $3$ and $4$ are a pair, and so on. Once you obtain one coupon from every pair, you can obtain a prize. Assuming that the coupon in each box is chosen independently and uniformly at random from the $2n$ possibilities, what is the expected number of boxes you must buy before you can claim the prize?
Practice Problems
You are not required to submit the solution to these problems, but you are highly encouraged to attempt them for your own knowledge's sake. :)
Problem 1 - Amortized Analysis - from CLRS chapter 17
A stack $S$ supports the following operations:

- PUSH($S, x$) pushes object $x$ onto the top of stack $S$.
- POP($S$) pops the top of the stack $S$ and returns the popped object. If one calls POP on an empty stack, it returns an error.
We will now modify the stack structure to support the following operation:
- MULTIPOP($S, k$) removes the top $k$ objects of stack $S$, popping the entire stack if the number of elements in $S$ is less than $k$.
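For reference, a direct Python sketch of the three operations (the method names mirror the problem; everything else is illustrative):

```python
class Stack:
    def __init__(self):
        self._items = []

    def push(self, x):
        # PUSH(S, x): put x on top of the stack.
        self._items.append(x)

    def pop(self):
        # POP(S): error on an empty stack, otherwise return the top object.
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def multipop(self, k):
        # MULTIPOP(S, k): pop min(k, |S|) objects, returned top-first.
        popped = []
        while self._items and len(popped) < k:
            popped.append(self._items.pop())
        return popped
```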
First Question: what is the worst-case performance per operation of a sequence of $n$ PUSH, POP, and MULTIPOP operations?
Second Question: what is the amortized cost of each operation over such a sequence of $n$ operations?
Problem 2 - Amortized Analysis - from Jeff Erickson’s book
Suppose we can insert or delete an element into a hash table in $O(1)$ time. In order to keep the table size proportional to the number of elements, we use the following rebuilding rules:
- After an insertion, if the table is more than $3/4$ full, we allocate a new table twice as big as our current table, insert everything into the new table, and then free the old table.
- After a deletion, if the table is less than $1/4$ full, we allocate a new table half as big as our current table, insert everything into the new table, and then free the old table.
Show that, starting from an empty hash table, for any sequence of insertions and deletions, the amortized time per operation is still $O(1)$.
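Here is a minimal Python sketch of the rebuilding behaviour; it only tracks size, capacity, and the total rebuilding work (each rebuild is charged its current size, standing in for re-inserting every element), and the fill thresholds are written as parameters. All names and the experiment at the end are my own.

```python
import random

class ResizingTable:
    def __init__(self, grow_at=0.75, shrink_at=0.25):
        self.capacity, self.size = 1, 0
        self.grow_at, self.shrink_at = grow_at, shrink_at
        self.rebuild_work = 0   # total work spent on rebuilds so far

    def insert(self):
        self.size += 1
        if self.size > self.grow_at * self.capacity:
            self.rebuild_work += self.size   # move everything into a table twice as big
            self.capacity *= 2

    def delete(self):
        if self.size == 0:
            return
        self.size -= 1
        if self.size < self.shrink_at * self.capacity and self.capacity > 1:
            self.rebuild_work += self.size   # move everything into a table half as big
            self.capacity //= 2

# Rough experiment: the total rebuilding work should grow only linearly in the number of operations.
t = ResizingTable()
ops = 100_000
for _ in range(ops):
    t.insert() if random.random() < 0.5 else t.delete()
print(t.rebuild_work / ops)
```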
Problem 3 - Probability - from MU’05 chapter 1
- Consider the set $S = \{1, 2, \ldots, n\}$. We generate a subset $X$ of this set as follows: a fair coin is flipped independently for each element of the set; if the coin lands heads then the element is added to $X$, and otherwise it is not. Argue that the resulting set $X$ is equally likely to be any one of the $2^n$ possible subsets.
- Suppose that two sets $X$ and $Y$ are chosen independently and uniformly at random from all the $2^n$ subsets of $\{1, 2, \ldots, n\}$. Determine $\Pr[X \subseteq Y]$ and $\Pr[X \cup Y = \{1, 2, \ldots, n\}]$.
Hint: use part 1 of the problem.
Problem 4 - Probability
Consider the following probabilistic process:

- start with $x = 0$;
- repeat $n$ times: toss a fair coin, and if it comes up heads, make $x = x + 1$; otherwise, make $x = x - 1$.

If $X$ denotes the value of $x$ at the end of this process:

- Compute $\mathbb{E}[X]$ and $\mathrm{Var}[X]$.
- For even $n$, show that $\Pr[X = 0] = \binom{n}{n/2}\, 2^{-n} = \Theta(1/\sqrt{n})$. What happens for odd $n$?
Problem 5 - Covariance and Variance of Correlated Random Variables
In this question we will explore how to use covariance to compute the variance of a sum of correlated random variables.

- Let $X_1, \ldots, X_n$ be random variables such that $X_1 = X_2 = \cdots = X_n$; thus, we are in the case where the random variables are all the same random variable. Let $X = X_1 + \cdots + X_n$. Compute $\mathrm{Var}[X]$ in terms of $\mathrm{Var}[X_1]$.
- Let $X_1, \ldots, X_n$ be random variables such that $\mathrm{Var}[X_i] = \sigma^2$ for every $i$, and for $i \neq j$ we have $\mathrm{Cov}(X_i, X_j) = c$. Let $X = X_1 + \cdots + X_n$. Compute $\mathrm{Var}[X]$.
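For reference, the identity relating the variance of a sum to pairwise covariances, which is presumably the intended tool for both parts, is
$$\mathrm{Var}\!\left[\sum_{i=1}^{n} X_i\right] \;=\; \sum_{i=1}^{n} \mathrm{Var}[X_i] \;+\; \sum_{i \neq j} \mathrm{Cov}(X_i, X_j).$$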
Problem 6 - Martingales
Consider a fair game in a casino: on each play, you may stake any amount $S$ of money. With probability $1/2$ you win the play and gain $S$; with probability $1/2$ you lose your stake. The plays are independent.
- What is the expected number of plays until you first win?
- The following gambling strategy, known as the “martingale,” was popular in European casinos in the 18th century: on the first play, stake $1$ dollar; on the second play, stake $2$ dollars; and, in general, on the $k$-th play, stake $2^{k-1}$ dollars. Stop playing and leave the casino when you first win!
Show that if you follow the martingale strategy, you will leave the casino $1$ dollar richer with probability $1$.
- Let us now discuss the catch in the strategy above: let $L$ be the random variable that measures your maximum loss before winning (that is, the total amount you are down just before the winning play). Show that $\mathbb{E}[L] = \infty$. What does this imply about your ability to play the martingale strategy in practice?
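A tiny simulation sketch of the doubling strategy (entirely illustrative): it confirms that every visit ends one dollar up, while also recording how far down you were just before the win.

```python
import random

def play_martingale():
    # One casino visit with the doubling strategy.
    # Returns (net winnings, amount you were down just before the winning play).
    stake, losses = 1, 0
    while True:
        if random.random() < 0.5:       # win: gain the current stake and stop
            return stake - losses, losses
        losses += stake                  # lose: forfeit the stake and double it
        stake *= 2

results = [play_martingale() for _ in range(100_000)]
print(all(net == 1 for net, _ in results))   # you always leave exactly 1 dollar richer...
print(max(loss for _, loss in results))      # ...but the loss you must survive first can be enormous
```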
Problem 7 - Generating random strings
Alice and Bob have a fair coin and they are trying to generate a random sequence of $2n$ coin tosses containing exactly $n$ heads and $n$ tails; the output should be uniformly distributed over all such sequences.
- Alice suggests to flip the coin $2n$ times. If we get exactly $n$ heads, output the sequence; otherwise, try again. How many tosses do you expect Alice will have to make under this scheme?
- Bob claims that he has a much more efficient scheme: flip the coin until you have either $n$ heads or $n$ tails. Output this sequence, padded at the end with tails or heads respectively to make the total length $2n$. Clearly this scheme requires at most $2n$ tosses. Is this a good scheme? Justify your answer with a proof.
- Suggest a simple scheme for solving this problem that is better than the previous two schemes. How many tosses does your scheme require in expectation?
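If it helps to experiment before proving anything, here is a rough simulation sketch (the function names and parameters are mine): it estimates the number of tosses Alice's scheme uses and tabulates the sequences Bob's scheme outputs, so you can judge their distribution for yourself.

```python
import random
from collections import Counter

def alice_tosses(n):
    # Alice's scheme: toss 2n coins; accept only if exactly n heads, otherwise retry.
    # Returns the total number of tosses used until acceptance.
    total = 0
    while True:
        total += 2 * n
        if sum(random.randrange(2) for _ in range(2 * n)) == n:
            return total

def bob_sequence(n):
    # Bob's scheme: toss until n heads or n tails appear, then pad to length 2n.
    seq = []
    while seq.count(1) < n and seq.count(0) < n:
        seq.append(random.randrange(2))
    pad = 0 if seq.count(1) == n else 1
    return tuple(seq + [pad] * (2 * n - len(seq)))

n = 2
print(sum(alice_tosses(n) for _ in range(10_000)) / 10_000)  # average tosses for Alice
print(Counter(bob_sequence(n) for _ in range(10_000)))       # how often Bob outputs each sequence
```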
Problem 8 - Concentration Inequalities
Questions from lecture 3:
- Is Markov’s inequality tight? If it is, give examples of random variables for which it is tight.
- Give examples of discrete probability distributions which do not have a $k$-th moment. Also give examples of discrete probability distributions which do not have a $k$-th central moment.
- Can you generalize Chebyshev’s inequality to higher order moments?
- Work out the proof of the lower tail bound for Chernoff’s bound.
- Work out the proof of Hoeffding’s generalization of Chernoff’s inequality.
Problem 9 - Reservoir Sampling - from Mu’05 chapter 2
Suppose we have a sequence of items passing by one at a time. We want to maintain a sample of one item with the property that it is uniformly distributed over all the items that we have seen at each step. Moreover, we want to accomplish this without knowing the total number of items in advance or storing all of the items that we see.
- Consider the following algorithm, which stores just one item in memory at all times.
When the first item appears, it is stored in memory. When the $k$-th item appears, it replaces the item in memory with probability $1/k$. Show why this algorithm solves the problem.
Note: the above algorithm is called reservoir sampling.
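A compact Python sketch of the algorithm just described (the function name is mine):

```python
import random

def reservoir_sample(stream):
    # Keep one item in memory; the k-th item replaces it with probability 1/k.
    # At every step, the stored item is uniform over the items seen so far.
    chosen = None
    for k, item in enumerate(stream, start=1):
        if random.randrange(k) == 0:      # happens with probability 1/k
            chosen = item
    return chosen

# Example: sampling from a stream whose length is never used explicitly.
print(reservoir_sample(iter(range(1000))))
```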
- Suppose that we modify the reservoir sampling algorithm from the previous part so that, when the $k$-th item appears, it replaces the item in memory with probability $1/2$. Describe the distribution of the item in memory.
Problem 10 - Exploring Chernoff
In this question, we will think about the following: when are Chernoff bounds actually giving us any non-trivial information?
- Let $n$ be a large positive integer and let $X$ be the random variable which takes the value $n$ with probability $\frac{1}{n}$, and takes the value $0$ with probability $1 - \frac{1}{n}$.
  - what is the expectation of $X$?
  - why can we use Chernoff here, since $X$ as defined is not a Poisson trial? What do we have to do to use Chernoff here?
  - what does Chernoff say about $\Pr\!\left[X \geq 2\,\mathbb{E}[X]\right]$?
- One may be tempted to think, in the example above, that the reason why Chernoff didn’t give any non-trivial information was because the expectation is small compared to $n$. However, consider the following random variable: $Y = X + c$, where $X$ is the random variable of part 1 and $c$ can be any positive real number. Prove that even though the expectation of $Y$ can now be large compared to $n$, Chernoff still does not say anything non-trivial.
Hint: for Chernoff we need Poisson trials.
- What is the minimum number $m$ of independent copies $X_1, \ldots, X_m$ of $X$ needed if we want Chernoff to say anything meaningful about $X_1 + \cdots + X_m$, where $X$ is the random variable of part 1?