Homework 3
This homework covers the material from lectures 10 to 13
Due date: June 28th.
PDF version of Homework 3.
LaTeX template, in case you want to write it in LaTeX.
Required Exercises
Remember that you are only required to turn in 5 out of the 6 exercises below.
Problem 1 - Fingerprinting (20 points)
Consider the problem of deciding whether two integer multisets $S_1$ and $S_2$ are identical (that is, each integer occurs the same number of times in both sets). This problem can be solved by sorting the two sets in $O(n \log n)$ time, where $n = |S_1| = |S_2|$. In this question, you will devise 2 faster randomized algorithms for this problem.
You can assume that the multisets $S_i$ only have integers of bit complexity $w$, and that integer operations of $O(w)$-bit integers can be executed in $O(1)$ time (RAM model), and that a prime with $O(w)$-bits can be found in $O(n)$ time.
-
Use polynomial identity testing to give a $O(n)$ time algorithm for the problem above.
-
Use hashing to give a $O(n)$ time algorithm for the problem above.
Your algorithms for both parts should succeed with probability $\geq 2/3$.
Problem 2 - Mixing Time of Random Walks (20 points)
Compute the mixing time (both upper bounds and lower bounds) of the dumbbell graph: the graph on $2n$ nodes that consists of two complete graphs on $n$ nodes joined by a single edge, where $n \geq 3$.
Hint: you can solve this by using elementary probability calculations
Another Hint: if you want to solve this via the methods from class, try breaking up the graph into 4 pieces which converge to the stationary distribution in the appropriate time, and then show that any other probability distribution converges quickly to one of these pieces.
Problem 3 - Pagerank (20 points)
Suppose someone searches a keyword (like “car”) and we would like to identify the webpages that are the most relevant for this keyword and the webpages that are the most reliable sources for this keyword (a page is a reliable source if it points to many of the most relevant pages).
First we identify the pages with this keyword and ignore all other pages. Then we run the following ranking algorithm on the remaining pages.
- Each vertex corresponds to a remaining page, and there is a directed edge from page $i$ to page $j$ if there is a link from page $i$ to page $j$. Call this directed graph $G(V, E)$, where $n = |V|$.
- For each vertex $i$, we have two values $s(i)$ and $r(i)$, where intendedly $r(i)$ represents how relevant is this page and $s(i)$ represents how reliable it is as a source (the larger the values the better).
- We start from some arbitrary initial values, say $s(i) = 1/n$ for all $i$, as we have no idea of their relevance at the beginning.
- At each step, we update $s$ and $r$ (where $s$ and $r$ are vectors whose $i^{th}$ entries are $s(i)$ and $r(i)$) as follows:
- First we update $$r(i) = \sum_{j: (j,i) \in E} s(j)$$ for all $i$, as a page is more relevant if it is linked by many reliable sources.
- Then we update $$s(i) = \sum_{j: (i,j) \in E} r(j)$$ for all $i$ (using the just updated values $r(j)$), as a page is a more reliable source if it points to many relevant pages.
- To keep the values small, we let $R = \sum_{i=1}^n r(i)$ and $S = \sum_{i=1}^n s(i)$, and divide each $s(i)$ by $S$ and each $r(i)$ by $R$.
- We repeat this step for many times to refine the values.
Let $s, r \in \mathbb{R}^n$ be the vectors of the $s$ and $r$ values.
- Give a matrix formulation for computing $s$ and $r$.
- Suppose $G$ is weakly connected (that is, when we ignore the direction of the edges, the underlying undirected graph is connected) and there is a self-loop at each vertex. Prove that there is a unique limiting $s$ and a unique limiting $r$ for any initial $s$ as long as $s \geq 0$ and $s \neq 0$.
Hint: You may use the Perron-Frobenius theorem which states that for any irreducible matrix, there is a unique positive eigenvalue with maximum absolute value and the entries of the corresponding eigenvector are all positive.
Problem 4 - Linear Programming (20 points)
Let $S = \{ x \in \mathbb{R}^n \mid A x \leq b \}$ and $T = \{ x \in \mathbb{R}^n \mid B x \leq c \}$, where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $B \in \mathbb{R}^{t \times n}$ and $c \in \mathbb{R}^t$. Given $A, B, b, c$ as an input, give a polynomial time algorithm for the problem of checking whether $S \subset T$.
Hint: Linear Programming
Problem 5 - Multiplicative Weights Update (20 points)
Here is a variation on the deterministic Weighted-Majority algorithm, designed to make it more adaptive.
- Each expert begins, on the 1st day, with weight $1$.
- We predict the result of a weighted-majority vote of the experts.
- If an expert makes a mistake, we penalize it by dividing its weight by $2$, but only if its weight was at least $1/4$ of the average weight of experts.
Prove that in any contiguous block of trials (e.g., the block from 51st day through the 77th day), the number of mistakes made by the algorithm is at most $O(m + \log n)$, where $m$ is the number of mistakes made by the best expert in that block, and $n$ is the total number of experts.
Problem 6 - Multiplicative Weights and Max-Flow (20 points)
Consider the maximum flow problem from $s$ to $t$ on a directed graph, where each edge has capacity one. A fractional $s$-$t$ flow solution with value $k$ is an assignment of each edge $e$ to a fractional value $x(e) \in \mathbb{R}$, satisfying
\begin{align*} \sum_{e \in \delta^{out}(s)} x(e) &= \sum_{e \in \delta^{in}(t)} x(e) = k \\ \sum_{e \in \delta^{in}(s)} x(e) &= \sum_{e \in \delta^{out}(t)} x(e) = 0 \\ \sum_{e \in \delta^{in}(v)} x(e) &= \sum_{e \in \delta^{out}(v)} x(e) \ \ \forall v \in V \setminus \{ s, t \} \\ 0 &\leq x(e) \leq 1, \ \ \forall e \in E \end{align*}
Use the multiplicative weights update method to solve this Linear Program by reducing the flow problem to the problem of finding shortest paths between $s$ and $t$. Analyze the convergence rate and total complexity of your algorithm to compute a flow of value $k \cdot (1-\epsilon)$ for any $\epsilon > 0$.
Practice Problems
You are not required to submit the solution to these problems, but highly encouraged to attempt them for your own knowledge sake. :)
Problem 1 (Problem 7.3 from [MR'07])
Given a randomized algorithm for testing the existence of a perfect matching in a graph $G$, which has $n$ vertices, which runs in time $O(T(n))$, describe and analyze an algorithm for the following problems:
-
Construct a perfect matching for $G$
-
Construct a maximum matching for $G$
Your algorithm for both parts should use the given randomized algorithm as a black-box.
Problem 2 (Problem 7.9 from [MR'07])
Consider the Tutte matrix of a (non-bipartite) graph $G(V,E)$ defined in lecture. Show that the rank of the Tutte matrix of $G$ is twice the size of a maximum matching in $G$.
Hint: If $A$ is an $n \times n$ skew-symmetric matrix of rank $r$ and $S, T \subset [n]$ are such that $|S| = |T| = r$, then $$ det(A_{SS}) \cdot det(A_{TT}) = det(A_{ST}) \cdot det(A_{TS}), $$ where $A_{ST}$ is the submatrix of $A$ obtained by only considering the entries $A_{i,j}$ where $i \in S, j \in T$.
Problem 3
There are $n$ terminals on a circuit board. Each terminal is either positive or negative. A wire will be attached to each terminal. It must be the case that for every pair of terminals of opposite polarity the two wires must be long enough so that they can be made to touch.
E.g., if there’s a positive terminal at point $a = (0, 0)$, and a negative terminal at $b = (0, 10)$, and another negative terminal at $c = (10, 0)$ then one solution is to put a wires of length $5$ at all three terminals. Or to put a wire of length $10$ at $a$ and length zero at the others.
Or, for any $x \in [0, 10]$, wires of length $\geq 10 - x$ at $a$, and length $\geq x$ at both $b$ and $c$.
So given the locations of the terminals (and their polarities), the problem is to compute the shortest total length of wires needed to satisfy the stated requirement.
- Consider the following specific instance of the problem. All the terminals are along the $x$-axis. There are positive terminals at $0, 110, 111, 112$. And there are $2$ negative terminals at $99$ and $100$.
What’s the optimal solution to the problem?
(You don’t need to prove your solution is optimal.)
- Formulate this problem as an LP. (Illustrate your LP by writing this for an example with two positive and two negative terminals.)
- Write the dual of the LP in part (2.). (Again, write down this dual for the two terminal example in part (2.).)
- Show how to compute an optimal solution to the dual problem of part (3.), without using a general LP solver.
Hint: see Seidel’s 2D algorithm for LPs