Solutions to Exam 1

Problem 1

For this problem we assume that our data structure is initially empty (this does not change anything in the analysis).

Data structure: we will maintain two arrays $A$ and $B$ (my arrays will have initial index $1$), with extra information about the number of elements in each array, which we denote by $n_A, n_B$.

Initialization: start with two arrays $A$ and $B$ of size $10$ each, and set the number of elements in each array to $0$.

Before we describe our operations, let's describe a subroutine $DOUBLE(A, B)$, which, given two arrays $A, B$, constructs arrays $A', B'$ of size $2|A|$ and $2|B|$ respectively, containing the same elements as $A$ and $B$ respectively (in the same order).

  • $DOUBLE(A, B)$:
    • create two new arrays $A', B'$ of size $2|A|, 2|B|$ respectively
    • for $i = 1, \ldots, n_A$:
      • $A'[i] = A[i]$
    • for $i = 1, \ldots, n_B$:
      • $B'[i] = B[i]$
    • return $A', B'$

Now we can describe our operations:

  • $ADDTOFRONT(x)$:

    • if $n_A + n_B + 1 > (|A| + |B|)/4$:
      • $A, B = DOUBLE(A, B)$
    • $A[n_A + 1] = x$
    • $n_A = n_A + 1$
  • $ADDTOBACK(x)$:

    • if $n_A + n_B + 1 > (|A| + |B|)/4$:
      • $A, B = DOUBLE(A, B)$
    • $B[n_B + 1] = x$
    • $n_B = n_B + 1$
  • $LOOKUP(k)$:

    • if $k > n_A + n_B$:
      • return $\bot$
    • if $k \leq n_A$:
      • return $A[n_A + 1 - k]$
    • else:
      • return $B[k - n_A]$

Note that $LOOKUP$ takes worst-case time $O(1)$, since we just query one entry of one of the arrays, and it takes constant time to check whether $k > n_A + n_B$. Moreover, note that we never overflow our arrays (recall that $|A| = |B|$ at all times, since both start at size $10$ and are always doubled together): to overflow $A$ we would need $n_A = |A|$ prior to an insertion, but that would imply $n_A + n_B \geq n_A = |A| = (|A| + |B|)/2 > (|A| + |B|)/4$, which cannot happen, as we would have doubled our arrays before that point. The analysis for $B$ is similar.
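As a sanity check, the structure above can be sketched in Python. The class and method names are mine, and the sketch uses 0-based arrays instead of the 1-based indexing of the write-up; $\bot$ is rendered as `None`.

```python
class TwoArrayDeque:
    """Sketch of the two-array structure: A holds front elements, B holds
    back elements; both arrays are doubled together once the structure is
    about a quarter full, matching n_A + n_B + 1 > (|A| + |B|)/4."""

    def __init__(self):
        self.A = [None] * 10
        self.B = [None] * 10
        self.nA = 0
        self.nB = 0

    def _double(self):
        # DOUBLE(A, B): copy both arrays into arrays of twice the size
        newA = [None] * (2 * len(self.A))
        newB = [None] * (2 * len(self.B))
        newA[:self.nA] = self.A[:self.nA]
        newB[:self.nB] = self.B[:self.nB]
        self.A, self.B = newA, newB

    def _maybe_double(self):
        if self.nA + self.nB + 1 > (len(self.A) + len(self.B)) / 4:
            self._double()

    def add_to_front(self, x):
        self._maybe_double()
        self.A[self.nA] = x
        self.nA += 1

    def add_to_back(self, x):
        self._maybe_double()
        self.B[self.nB] = x
        self.nB += 1

    def lookup(self, k):
        # k is 1-based: k = 1 is the current front element
        if k > self.nA + self.nB:
            return None  # stands in for "bottom"
        if k <= self.nA:
            return self.A[self.nA - k]
        return self.B[k - self.nA - 1]

# exercise the structure: 0 at the front, then 1..50 at the back
d = TwoArrayDeque()
for i in range(1, 51):
    d.add_to_back(i)
d.add_to_front(0)
```

After these insertions the logical contents are $0, 1, \ldots, 50$, and the arrays have been doubled several times without overflowing.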

Now we analyze the amortized cost of each of the other operations. To do that we will use the accounting method. Assume that the cost of each basic operation (comparisons, insertion in array, etc) is $1$. We will charge $20$ to each $ADDTOFRONT$ and $ADDTOBACK$ operation, and we will use this charge to pay for the $DOUBLE$ operation. Thus, so long as we prove that this is a valid charging scheme, we will have shown that the amortized cost of each operation is $O(1)$.

Note that $DOUBLE$ takes time $\leq 3 \cdot (|A| + |B|)$, and so we need to charge $3 \cdot (|A| + |B|)$ to pay for it.

However, for us to invoke $DOUBLE$, we must have inserted at least $(|A| + |B|)/4$ elements into our data structure. Since we charged $20$ to each of the newly added elements, we have collected at least $(|A| + |B|)/4 \cdot 20 = 5 \cdot (|A| + |B|)$, which covers the cost of $DOUBLE$. Thus, the above is a valid charging scheme.

Space requirement: note that, since we only increase the size of our arrays when they are (collectively) at least a quarter full, we always have $10 + n_A + n_B \geq (|A| + |B|)/8$. Thus, our data structure uses $O(n)$ space after $n$ insertions.

Problem 2

Note that, for all methods which require repetition, to get a uniform probability distribution we need to make sure that, within a single trial, each element is chosen with the same probability.

Remark: in the exam you were required to compute these probabilities and the parameters of the geometric random variables.

Part 1

For Alice's method to work, we need $p^2 = p(1-p)$, whose only solution with $0 < p < 1$ is $p = 1/2$.

Part 2

For Bob's method to work, we need $p^2 = 2p(1-p) = (1-p)^2$. The first equality forces $p = 2/3$ while the second forces $p = 1/3$, so there is no solution, and Bob's method never works.

Part 3

In Charlie’s method, all probabilities are the same, so it works all the time.
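A simulation makes this concrete. The exact protocol is assumed here (not restated in these solutions): a trial tosses a $p$-biased coin $3$ times, is accepted iff it contains exactly one heads, and outputs the position of that heads; otherwise we retry.

```python
import random

def charlie_sample(p, rng):
    """One trial-and-retry run of (an assumed reading of) Charlie's method:
    toss a p-biased coin 3 times; on exactly one heads, output its position."""
    while True:
        tosses = [rng.random() < p for _ in range(3)]
        if sum(tosses) == 1:
            return tosses.index(True) + 1  # 1, 2, or 3

rng = random.Random(0)
counts = {1: 0, 2: 0, 3: 0}
for _ in range(30000):
    counts[charlie_sample(0.3, rng)] += 1
# by symmetry each outcome occurs with probability exactly 1/3,
# so each count should be close to 10000
```

Note that the bias $p = 0.3$ is arbitrary: the three accepted outcomes are equiprobable for every $p \in (0,1)$, which is why Charlie's method always works.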

Part 4

Let $X$ be the random variable such that $X = k$ iff Alice’s method succeeded in sampling one of $1, 2, 3$ in the $k$-th trial. Since Alice does 2 coin tosses per trial, the random variable we want is $Z = 2X$.

Since $X$ is a geometric random variable with success parameter $q = 1 - (1-p)^2$ (as this is the probability that she will not sample $TT$), we have: $$ \mathbb{E}[Z] = 2 \mathbb{E}[X] = 2 \cdot \frac{1}{q} = \frac{2}{1 - (1-p)^2}$$
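The formula $\mathbb{E}[Z] = 2/(1-(1-p)^2)$ can be checked by simulation, under the assumed protocol that a trial of two tosses fails exactly when it comes up $TT$:

```python
import random

def alice_tosses(p, rng):
    # count coin tosses until a trial is not TT; Alice uses 2 tosses
    # per trial, so Z = 2 * (number of trials)
    z = 0
    while True:
        a = rng.random() < p  # True = heads
        b = rng.random() < p
        z += 2
        if a or b:  # trial succeeded (not TT)
            return z

p = 0.3
q = 1 - (1 - p) ** 2  # success probability per trial
rng = random.Random(1)
n = 100000
mean_z = sum(alice_tosses(p, rng) for _ in range(n)) / n
# mean_z should be close to 2/q
```

For $p = 0.3$ we have $q = 0.51$, so the empirical mean should be close to $2/0.51 \approx 3.92$.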

Part 5

Let $X$ be the random variable such that $X = k$ iff Charlie’s method succeeded in sampling one of $1, 2, 3$ in the $k$-th trial. Since Charlie does 3 coin tosses per trial, the random variable we want is $Z = 3X$.

Since $X$ is a geometric random variable with success parameter $q = 3p(1-p)^2$ (as this is the probability that he will sample exactly one $H$), we have: $$ Var[Z] = 9 Var[X] = 9 \cdot \frac{1-q}{q^2} = \frac{1 - 3p(1-p)^2}{p^2(1-p)^4} $$
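The variance formula can likewise be checked empirically, again under the assumed protocol that a three-toss trial succeeds iff it has exactly one heads:

```python
import random

def charlie_tosses(p, rng):
    # total tosses Z = 3 * (number of trials) until a trial has exactly
    # one heads
    z = 0
    while True:
        heads = sum(rng.random() < p for _ in range(3))
        z += 3
        if heads == 1:
            return z

p = 0.5
q = 3 * p * (1 - p) ** 2  # success probability per trial; 0.375 for p = 1/2
rng = random.Random(3)
n = 200000
samples = [charlie_tosses(p, rng) for _ in range(n)]
mean = sum(samples) / n
var = sum((z - mean) ** 2 for z in samples) / n
# for p = 1/2: E[Z] = 3/q = 8 and Var[Z] = 9(1-q)/q^2 = 40
```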

Problem 3

Let $t$ be a parameter that we will set later.

Algorithm $\mathcal{B}$: run $\mathcal{A}$ on input $G$ independently $t$ times, and output the majority answer.

Let $X_i$ be the indicator random variable that $\mathcal{A}$ outputs the wrong answer on the $i$-th trial. Let $X = \sum_{i=1}^t X_i$ be the random variable that counts the number of times $\mathcal{A}$ outputs the wrong answer. Note that $\mathcal{B}$ outputs the wrong answer iff $X \geq t/2$.

Since $\mathcal{A}$ is correct with probability $2/3$, we have that $\mathbb{E}[X_i] = 1/3$. Thus, by linearity of expectation, we have that $\mathbb{E}[X] = t/3$.

Now, we want to bound the probability that $\mathcal{B}$ outputs the wrong answer. By Chernoff, we have that: $$ \Pr[X \geq t/2] = \Pr[X \geq (3/2) \cdot \mathbb{E}[X]] \leq e^{-\delta^2 \mathbb{E}[X]/3} $$ where $\delta = 1/2$. Plugging in the values, we have that: $$ \Pr[X \geq t/2] \leq e^{-t/36} $$ Thus, if we set $t = 36n$, we have failure probability $\leq e^{-n} \leq 2^{-n}$, as we wanted.
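The amplification scheme can be sketched as follows, with a mock coin-flip standing in for $\mathcal{A}$ (there is no actual graph algorithm here; only the majority-vote wrapper is the point):

```python
import random

def amplify(run_a, t):
    """Algorithm B: majority vote over t independent runs of run_a(),
    which returns a yes/no answer as a bool."""
    yes = sum(1 for _ in range(t) if run_a())
    return yes > t / 2

# mock randomized algorithm: the true answer is True, and it answers
# correctly with probability 2/3 (a stand-in for A)
rng = random.Random(42)
def noisy_a():
    return rng.random() < 2 / 3

n = 8
t = 36 * n  # by the Chernoff bound, failure probability is at most e^{-n}
results = [amplify(noisy_a, t) for _ in range(50)]
# with overwhelming probability every amplified run returns True
```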

Problem 4

Part 1

$$\Pr[\text{first bin has no red balls} ] = \left( \frac{n-1}{n} \right)^n$$ $$\Pr[\text{first bin has } 3 \text{ blue balls}] = \binom{n}{3} \left( \frac{1}{n} \right)^3 \left( \frac{n-1}{n} \right)^{n-3}$$ as we choose the 3 blue balls that will go to the first bin, and then we choose the remaining balls to go to the other bins.

As these two events are independent (the red balls and the blue balls are thrown independently), the probability that both happen is the product of the probabilities: $$ \left( \frac{n-1}{n} \right)^n \binom{n}{3} \left( \frac{1}{n} \right)^3 \left( \frac{n-1}{n} \right)^{n-3} \sim \frac{1}{6e^{2}} $$ where we use $(1-1/n)^n \sim 1/e$.

Part 2

This probability is the probability that the balls all fall into the other $n-2$ bins, which is: $\left( \frac{n-2}{n} \right)^n \sim 1/e^2$.

Part 3

This probability is given by choosing the ball that falls in the first bin, and then making all other balls fall outside of the first two bins: $\binom{n}{1} \left( \frac{1}{n} \right) \left( \frac{n-2}{n} \right)^{n-1} \sim 1/e^2$.
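All three exact formulas converge quickly to their stated limits, which is easy to verify numerically:

```python
import math

e = math.e

def part1(n):
    # P[first bin gets no red ball] * P[first bin gets exactly 3 blue balls]
    return (((n - 1) / n) ** n
            * math.comb(n, 3) * (1 / n) ** 3 * ((n - 1) / n) ** (n - 3))

def part2(n):
    # P[all n balls miss the first two bins]
    return ((n - 2) / n) ** n

def part3(n):
    # P[exactly one ball in the first bin, none in the second]
    return n * (1 / n) * ((n - 2) / n) ** (n - 1)

n = 10 ** 6
# limits: part1 -> 1/(6 e^2), part2 -> 1/e^2, part3 -> 1/e^2
```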

Problem 5

Part 1

By the definition of 2-universal hash functions, for a fixed pair of distinct keys the collision probability over the choice of hash function is $\leq 1/n$. Since there are $|\mathcal{H}|$ hash functions, we get the bound $|\mathcal{H}|/n$ on the number of hash functions under which the pair collides.

Part 2

Note that, for $x \not\equiv y \pmod p$, we have $a(x-y) \equiv 0 \pmod p$ iff $a = 0$. Thus, $$\Pr_{a \in \mathbb{F}_p}[h_a(x) = h_a(y)] = 1/p.$$ Since a choice of $a$ determines a unique hash function in $\mathcal{H}$, the above proves that $\mathcal{H}$ is 2-universal.
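For a small prime the argument can be verified exhaustively. The family is taken to be $h_a(x) = ax \bmod p$ with $a \in \mathbb{F}_p$, as the notation above suggests:

```python
def collisions(p, x, y):
    """Count the values a in F_p for which h_a(x) = h_a(y), where
    h_a(z) = a*z mod p. For x != y (mod p), a*(x - y) = 0 (mod p)
    forces a = 0, so the count should always be exactly 1."""
    return sum(1 for a in range(p) if (a * x) % p == (a * y) % p)

p = 101  # a prime
# every pair x != y (mod p) collides under exactly one of the p functions,
# i.e. with probability 1/p over a uniform choice of a
```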

Problem 6

Part 1

The height is the length of the longest path from the root to a leaf; thus, for height $h$, this path has $h+1$ nodes (one of which is the root and one of which is a leaf). Note that at each step along this path we introduce a new variable, and this variable is not canceled out, since the polynomial $P_v$ is monic in the new variable. Thus, the number of variables is $h+1$.

We will prove that the degree of the polynomial is the number of leaves, that is, $m$. There are many ways of doing this, and here we will prove it by induction on the height.

For the base case, note that for height $0$ we have a single node, which is a leaf, and the polynomial is $P_v = x_0$. Thus, the degree is $1$ and the base case holds.

For the inductive step, note that if a root $r$ has children $v_1, \ldots, v_k$, then the polynomial $P_r$ is given by: $$ P_r = \prod_{i=1}^k (x_h - P_{v_i}) $$ where $h$ is the height of $r$. Hence, we have $$ \deg(P_r) = \sum_{i=1}^k \deg(x_h - P_{v_i}) = \sum_{i=1}^k \max(\deg(x_h), \deg(P_{v_i})) = \sum_{i=1}^k \deg(P_{v_i}) $$ where the last equality follows from the inductive hypothesis. By the inductive hypothesis, we have that $\deg(P_{v_i}) = $ # of leaves in the subtree rooted at $v_i$. Thus, we have that $\deg(P_r) = $ # of leaves in the subtree rooted at $r$, which is what we wanted to prove.

Part 2

Base case: for height $0$ each tree is a single node, and both polynomials are $x_0$. Thus, the claim holds.

Inductive step: suppose that the claim holds for height $h-1$. Let $r_1$ and $r_2$ be the roots of the two trees. Since the trees are isomorphic, with isomorphism $f$, we have that if the children of $r_1$ are $v_1, \ldots, v_k$, then the children of $r_2$ are $f(v_1), \ldots, f(v_k)$. By induction, we have $P_{v_i} = P_{f(v_i)}$ for all $i$. Thus, we have $$ P_{r_1} = \prod_{i=1}^k (x_h - P_{v_i}) = \prod_{i=1}^k (x_h - P_{f(v_i)}) = P_{r_2} $$ and we are done.

Part 3

Base case: for height $0$ each tree is a single node, and both polynomials are $x_0$. Thus, the claim holds.

Inductive step: suppose that the claim holds for height $h-1$. Let $r_1$ and $r_2$ be the roots of the two trees such that $P_{r_1} \equiv P_{r_2}$. Now, by definition of $P_{r_1}$ and $P_{r_2}$, we have that these are polynomials of the form $\prod_{i=1}^k (x_h - P_{v_i})$, where the polynomials $P_{v_i}$ don’t depend on $x_h$. Hence, we can use the unique factorization from the hint to conclude that $P_{r_1}$ and $P_{r_2}$ have the same number of factors, and that these factors are the same up to a permutation.

Then, we can write $P_{r_1} = \prod_{i=1}^k (x_h - P_{v_i})$ and $P_{r_2} = \prod_{i=1}^k (x_h - P_{u_i})$, with $P_{v_i} \equiv P_{u_i}$ for all $i$. By the inductive hypothesis, we have an isomorphism $g_i$ between the subtrees rooted at $v_i$ and $u_i$, for each $i$. Hence, we can construct the isomorphism $f$ for the whole tree by setting $f(v_i) = u_i$ and letting $f$ agree with the appropriate $g_i$ on each subtree.

Part 4

By parts 2 and 3, we know that two trees are isomorphic if and only if their polynomials are identical. So we want to use the randomized PIT algorithm (Ore–DeMillo–Lipton–Schwartz–Zippel) for this.

Let $r_1, r_2$ be the roots of the two trees, and let $P_{r_1}, P_{r_2}$ be their polynomials. An easy induction also shows that if we evaluate our polynomials at points coming from $[-t, t]^{h+1}$ (remember from part 1 that our polynomials have $h+1$ variables), then we will have $|P_{r_i}(\alpha) | \leq ((h+1)\cdot t)^m$. The induction is on the height and number of leaves of the subtrees.

Let $Q = P_{r_1} - P_{r_2}$. By the above, we have that $|Q(\alpha)| \leq 2 \cdot ((h+1)\cdot t)^m$ whenever $\alpha \in [-t, t]^{h+1}$. We can evaluate $Q$ by first evaluating $P_{r_1}$ and $P_{r_2}$ at $\alpha$, and then subtracting the results.

With the above in mind,¹ we have a couple of ways to proceed here. One way is this: we pick a prime $t \leq p \leq (h+1)^2 \cdot t^2$ at random, pick a uniformly random point $\alpha \in [-t, t]^{h+1}$, and evaluate $Q(\alpha) \bmod p$. If $P_{r_1} \equiv P_{r_2}$, then the two polynomials always evaluate to the same value; if they are not equal, then by Schwartz–Zippel $Q(\alpha)$ evaluates (over $\mathbb{Z}$) to zero with probability at most $m/t$.

Provided $Q(\alpha) \neq 0$, this value lies in $[-2 \cdot ((h+1)\cdot t)^m, 2 \cdot ((h+1)\cdot t)^m]$, and hence has at most $m \log((h+1) \cdot t) + 1$ distinct prime factors. Thus, picking a random prime in the above range, this nonzero value will be zero $\bmod p$ with probability at most $$\dfrac{m \log((h+1) \cdot t) + 1}{\#\text{ primes in the above range}} \leq \dfrac{m \log((h+1) \cdot t) + 1}{(h+1) \cdot t}. $$

Setting $t = m^3$, the above probability is $\tilde{O}(1/m^2)$. Hence, the probability of failure of our algorithm (by union bound) is $\tilde{O}(\dfrac{m}{t} + 1/m^2) = \tilde{O}(1/m^2)$.

Our algorithm is efficient since we can evaluate $P_{r_1}$ and $P_{r_2}$ at $\alpha$ (over $\mathbb{F}_p$) by our recursion in time $O(m)$.
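The recursive evaluation can be sketched as follows. For simplicity this sketch evaluates over a single fixed Mersenne prime field rather than the random-prime scheme above (good enough for illustration, with a correspondingly weaker error guarantee), and the nested-list tree encoding is mine:

```python
import random

# a tree is a nested list: a leaf is [], an internal node is the list
# of its children (an encoding assumed here, not from the exam)

def height(tree):
    return 0 if not tree else 1 + max(height(c) for c in tree)

def eval_poly(tree, point, p):
    """Evaluate the tree's polynomial at `point` (indexed by height)
    modulo the prime p, via the recursion P_r = prod_i (x_h - P_{v_i});
    a leaf has polynomial x_0."""
    h = height(tree)
    if not tree:
        return point[0] % p
    prod = 1
    for child in tree:
        prod = prod * (point[h] - eval_poly(child, point, p)) % p
    return prod

def probably_isomorphic(t1, t2, p=(1 << 61) - 1, trials=5, seed=0):
    """Randomized test in the spirit of Part 4: isomorphic trees always
    agree; non-isomorphic trees are caught with high probability."""
    if height(t1) != height(t2):
        return False
    rng = random.Random(seed)
    for _ in range(trials):
        point = [rng.randrange(p) for _ in range(height(t1) + 1)]
        if eval_poly(t1, point, p) != eval_poly(t2, point, p):
            return False
    return True

t1 = [[], [[], []]]
t2 = [[[], []], []]  # t1 with children reordered: isomorphic to t1
t3 = [[], [[]]]
t4 = [[[]], [[]]]    # same height and leaf count as t3, but not isomorphic
```

Since the factors of $P_r$ commute, reordering children never changes the polynomial, so `t1` and `t2` always agree; `t3` and `t4` have different polynomials and are separated with probability at least $1 - 2/p$ per trial.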


  1. I did not require this much precision, but I am giving you a full formal solution just so you can see it.
