Homework 2 - Spring 2024

This homework covers the material from lectures 7 to 11

Due date: June 1st, 10pm Waterloo time.

PDF version of Homework 2.

LaTeX template, in case you want to write it in LaTeX.

Required Exercises

Problem 1 - Fingerprinting (20 points)

Consider the problem of deciding whether two integer multisets $S_1$ and $S_2$ are identical (that is, each integer occurs the same number of times in both sets). This problem can be solved by sorting the two sets in $O(n \log n)$ time, where $n = |S_1| = |S_2|$. In this question, you will devise 2 faster randomized algorithms for this problem.

You can assume that the multisets $S_i$ only have integers of bit complexity $w$, that $n \leq 2^w$, that integer operations of $O(w)$-bit integers can be executed in $O(1)$ time (RAM model), and that a prime with $O(w)$-bits can be found in $O(n)$ time.

  1. Use polynomial identity testing to give a $O(n)$ time algorithm for the problem above.

  2. Use hashing to give an algorithm which runs in expected $O(n)$ time for the problem above.

Your algorithms for both parts should succeed with probability $\geq 2/3$.

Problem 2 - Randomized Matching (10 points)

Given a randomized algorithm $\mathcal{A}$ for testing the existence of a perfect matching in a graph $G$, give a randomized algorithm for finding a perfect matching in $G$. Analyze the running time and success probability of your algorithm on graphs with $n$ vertices and $m$ edges, given that the algorithm $\mathcal{A}$ runs in time $T(n,m)$, and has failure probability $\leq 1/n$.

Your algorithm should have success probability $\geq 2/3$, and it should definitely be guaranteed to stop after $\textsf{poly}(n,m) T(n,m)$ time.

Problem 3 - Sublinear-time Algorithms (15 points)

Sublinear-time algorithms for connectedness in graphs with bounded degree.

Given a graph $G$ of maximum degree $d$ (as adjacency list), and a parameter $\epsilon > 0$, give an algorithm which has the following behavior: if $G$ is connected, then the algorithm should output PASS with probability $1$, and if $G$ is $\epsilon$-far from connected (at least $\epsilon \cdot n \cdot d$ edges must be added to connect $G$), then the algorithm should output FAIL with probability at least $3/4$. Your algorithm should look at a number of edges that is independent of $n$, and polynomial in $d, \epsilon$.

For this problem, when proving the correctness of your algorithm, it is ok to show that if the input graph $G$ is likely to be passed, then it is $\epsilon$-close to a graph $G_0$ which is connected, without requiring that $G_0$ has degree at most $d$.

Problem 4 - Markov Chains (15 points)

  1. Show that if a Markov chain with transition matrix $P$ is irreducible and has a state $i$ such that $P_{i,i} > 0$, then it is also aperiodic.
  2. Let $a, b$ be positive integers and consider the Markov chain with state space $$ \{ (i,j) \mid 0 \leq i \leq a-1, \ \ 0 \leq j \leq b-1 \} $$ where $i,j$ are integers, and the following transition mechanism: if the chain is in state $(i,j)$ at time $t$, then at time $t+1$ it moves to $( (i+1) \bmod a, j)$ with probability $1/2$ or to $( i, (j+1) \bmod b)$ with probability $1/2$.

Show that this Markov chain is irreducible, and show that it is aperiodic if, and only if, $\gcd(a,b) = 1$.

  1. Consider a chessboard with a lone white king making random (king) moves, meaning that at each move, he picks one of the possible squares to move to, uniformly at random. Is the corresponding Markov chain irreducible and/or aperiodic? If so, what is the stationary distribution?

Problem 5 - Random Walks (10 points)

  1. Consider a Markov chain on the vertices of a triangle: the chain moves from a vertex to one of the other two vertices with probability $1/2$ each. Find the stationary distribution and $\varepsilon$-mixing time of this Markov chain. Also, find the probability that the chain is at the starting vertex after $n$ steps, for $n \geq 1$.

  2. Suppose that we alter the transition probabilities of the Markov chain in the previous question as follows: $$ p_{12} = p_{23} = p_{31} = 2/3, \ \ \ \ p_{21} = p_{32} = p_{13} = 1/3.$$ Find the stationary distribution and the $\varepsilon$-mixing time of this Markov chain. Also, find the probability that the chain is at the starting vertex after $n$ steps, for $n \geq 1$.

Problem 6 - Linear Programming (15 points)

Let $S = \{ x \in \mathbb{R}^n \mid A x \leq b \}$ and $T = \{ x \in \mathbb{R}^n \mid B x \leq c \}$, where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $B \in \mathbb{R}^{t \times n}$ and $c \in \mathbb{R}^t$. Given $A, B, b, c$ as an input, give a polynomial time algorithm for the problem of checking whether $S \subset T$.

Problem 7 - LP Duality (15 points)

For a directed graph $G(V, E)$, with $2$ special vertices $s, t \in V$, let $c_e$ be the capacity of edge $e \in E$ and consider the max-flox LP: $$ \max \sum_{P \ s\text{-}t \text{ path of } G} f_P $$ subject to $$ \sum_{e \in P} f_P \leq c_e \quad \forall e \in E, $$ $$ 0 \leq f_P \quad \text{ for all paths } P. $$ The first constraint says that the total flow on all paths from $s$ to $t$ is at most the capacity of each edge, and the second constraint says that the flow on each path is non-negative.

Give the dual of this LP and explain how to interpret the dual LP as a minimum cut problem.

Practice Problems

You are not required to submit the solution to these problems, but highly encouraged to attempt them for your own knowledge sake. :)

Problem 1 - Matching

Let $G(U, V, E)$ be a bipartite graph, and let $X$ be the matrix of indeterminates given from the adjacency matrix of $G$, as we saw in lecture. Show that the size of the maximum matching in $G$ is equal to the rank of $X$ over the field $\mathbb{F}(X)$.

Hint: you can use the fact that the rank of $X$ as above is the same as the max rank of any evaluation of $X$ over $\mathbb{F}$.

Problem 2 - Probability Amplification

In the distinct element algorithm we saw in lecture, we showed that with probability at least $2/3$ we have $(1-\varepsilon) D \leq Y \leq (1+\varepsilon) D$, where $Y$ is the answer that our algorithm returns. Suppose we run $k$ independent copies of the algorithm in parallel and obtain the estimates $Y_1, \ldots, Y_k$ and return the median $Y’$ of $Y_1, \ldots, Y_k$ as our answer. Prove that $(1-\varepsilon) D \leq Y’ \leq (1+\varepsilon) D$ with probability $1-\delta$ when $k = O(\log 1/\delta)$

This technique of amplifying the success probability is very general and useful when one would like to have an algorithm with arbitrarily high success rate, given only an algorithm with success probability being a constant strictly greater than 1/2.

Problem 3 - Karger’s Algorithm

One advantage of Karger’s random contraction algorithm for the minimum cut problem is that it can be used to output all minimum cuts. In this question, we assume Karger’s algorithm as a black box, which when given a graph $G$ as an input, for any minimum cut $(S, \overline{S})$, can be used to output the minimum cut $(S, \overline{S})$ with probability at least $2/n(n - 1)$ in time $O(n^2),$ where $n$ is the number of vertices in the input graph. Explain how Karger’s algorithm can be used to output all minimum cuts and analyze its running time to output all minimum cuts with success probability at least $0.9999.$

Problem 4 - Testing Sortedness in Sublinear Time (courtesy of Nathan Harms)

In this question we mean that an array is sorted if it is sorted in increasing order.

Let $A$ be an array of $n$ integers, and let $\varepsilon \in (0,1)$. We say that $A$ is $\varepsilon$-far from sorted if every sorted array $B$ has at least $\varepsilon n$ entries that are different from $A$. In other words, at least $\varepsilon n$ entries of $A$ must be modified to turn it into a sorted array.

An $\varepsilon$-tester for sortedness with query complexity $q$ is a randomized algorithm that queries a sequence $i_1, \ldots, i_q$ of indices from $A$ and outputs YES or NO as follows:

  1. if $A$ is sorted, the algorithm output YES with probability 1
  2. if $A$ is $\varepsilon$-far from sorted, the algorithm outputs NO with probability at least $2/3$.

Question 1: the simplest idea for a tester is the random adjacent pair tester, which chooses a random index $i \in \{ 1, \ldots, n-1 \}$, queries $A[i]$ and $A[i+1]$ and rejects if $A[i] > A[i+1]$. Repeat this procedure $q$ times and reject if at any query we output reject.

Give an example array $A$ that is $1/2$-far from sorted, where this algorithm with $q=1$ repetitions outputs no with probability at most $1/(n-1)$. Conclude that the tester must repeat at least $q = \Omega(n)$ times, so the query complexity in this case is $\Omega(n)$

Question 2: Another idea is to choose $q$ independently, uniformly random indices $i_1, \ldots, i_q$ and output NO if the subsequence $A[i_1], A[i_2], \ldots, A[i_q]$ is not sorted.

Give an example of an array that is $1/2$-far from sorted where this algorithm requires $q = \Omega(\sqrt{n})$ queries.

Hint: remember the birthday paradox

Question 3: Assume each entry of $A$ is distinct. Consider the set $G$ of good indices of $A$, where index $i$ is good if using binary search on $A$ to find $A[i]$ will successfully find the index $i$.

  1. For any set $G = \{ i_1, \ldots, i_g \}$ of good indices, show that the subsequence $A[i_1], \ldots, A[i_g]$ is sorted.
  2. Show that, if $A$ is $\varepsilon$-far from sorted, then $|G| \leq (1-\varepsilon)n$.
  3. Show that there is an $\varepsilon$-tester for sortedness that uses $O(\frac{1}{\varepsilon} \log n)$ queries.

Hint: consider choosing a random index $i ~ \{1, \ldots, n\}$, querying $A[i]$, then trying to use binary search to find element $A[i]$.

Problem 5 - Mixing Time of Random Walks

Compute the mixing time (both upper bounds and lower bounds) of the dumbbell graph: the graph on $2n$ nodes that consists of two complete graphs on $n$ nodes joined by a single edge, where $n \geq 3$.

Hint: try breaking up the graph into 4 pieces which converge to the stationary distribution in the appropriate time, and then show that any other probability distribution converges quickly to one of these pieces.

Alternative Hint: you can solve this by using elementary probability calculations

Problem 6 - Random Walks (more advanced material)

A cat and a mouse each independently take a random walk on a connected, undirected, non-bipartite graph $G$. They start at the same time on different nodes, and each makes one transition at each time step. The cat catches the mouse if they are ever at the same node at some time step. Show an upper bound of $O(m^2 n)$ on the expected time before the cat catches the mouse, where $n$ is the number of vertices and $m$ is the number of edges of $G$.

Problem 7 - Pagerank (20 points)

Suppose someone searches a keyword (like “car”) and we would like to identify the webpages that are the most relevant for this keyword and the webpages that are the most reliable sources for this keyword (a page is a reliable source if it points to many of the most relevant pages).

First we identify the pages with this keyword and ignore all other pages. Then we run the following ranking algorithm on the remaining pages.

  • Each vertex corresponds to a remaining page, and there is a directed edge from page $i$ to page $j$ if there is a link from page $i$ to page $j$. Call this directed graph $G(V, E)$, where $n = |V|$.
  • For each vertex $i$, we have two values $s(i)$ and $r(i)$, where intendedly $r(i)$ represents how relevant is this page and $s(i)$ represents how reliable it is as a source (the larger the values the better).
  • We start from some arbitrary initial values, say $s(i) = 1/n$ for all $i$, as we have no idea of their relevance at the beginning.
  • At each step, we update $s$ and $r$ (where $s$ and $r$ are vectors whose $i^{th}$ entries are $s(i)$ and $r(i)$) as follows:
  • First we update $$r(i) = \sum_{j: (j,i) \in E} s(j)$$ for all $i$, as a page is more relevant if it is linked by many reliable sources.
  • Then we update $$s(i) = \sum_{j: (i,j) \in E} r(j)$$ for all $i$ (using the just updated values $r(j)$), as a page is a more reliable source if it points to many relevant pages.
  • To keep the values small, we let $R = \sum_{i=1}^n r(i)$ and $S = \sum_{i=1}^n s(i)$, and divide each $s(i)$ by $S$ and each $r(i)$ by $R$.
  • We repeat this step for many times to refine the values.

Let $s, r \in \mathbb{R}^n$ be the vectors of the $s$ and $r$ values.

  1. Give a matrix formulation for computing $s$ and $r$.
  2. Suppose $G$ is weakly connected (that is, when we ignore the direction of the edges, the underlying undirected graph is connected) and there is a self-loop at each vertex. Prove that there is a unique limiting $s$ and a unique limiting $r$ for any initial $s$ as long as $s \geq 0$ and $s \neq 0$.

Hint: use the Perron-Frobenius theorem.

Problem 8 - Linear Programming

There are $n$ terminals on a circuit board. Each terminal is either positive or negative. A wire will be attached to each terminal. It must be the case that for every pair of terminals of opposite polarity the two wires must be long enough so that they can be made to touch.

E.g., if there’s a positive terminal at point $a = (0, 0)$, and a negative terminal at $b = (0, 10)$, and another negative terminal at $c = (10, 0)$ then one solution is to put wires of length $5$ at all three terminals. Or to put a wire of length $10$ at $a$ and length zero at the others.

Or, for any $x \in [0, 10]$, wires of length $\geq 10 - x$ at $a$, and length $\geq x$ at both $b$ and $c$.

So given the locations of the terminals (and their polarities), the problem is to compute the shortest total length of wires needed to satisfy the stated requirement.

  1. Consider the following specific instance of the problem. All the terminals are along the $x$-axis. There are positive terminals at $0, 110, 111, 112$. And there are $2$ negative terminals at $99$ and $100$.

What’s the optimal solution to the problem?

(You don’t need to prove your solution is optimal.)

  1. Formulate this problem as an LP. (Illustrate your LP by writing this for an example with two positive and two negative terminals.)
  2. Write the dual of the LP in part (2.). (Again, write down this dual for the two terminal example in part (2.).)
  3. Show how to compute an optimal solution to the dual problem of part (3.), without using a general LP solver.

Hint: see Seidel’s 2D algorithm for LPs

Problem 9 - Linear Programming

Suppose we are given a sequence of $n$ linear inequalities of the form $a_i x + b_i y \leq c_i$, where $x, y$ are variables and $a_i, b_i, c_i \in \mathbb{R}$. Collectively, these $n$ inequalities describe a convex polygon $P$ in the plane.

  1. Describe a linear program whose solution describes the largest axis-aligned square that lies entirely inside $P$
  2. Describe a linear program whose solution describes the maximum-perimeter axis-aligned rectangle that lies entirely inside $P$
  3. Describe a linear program whose solution describes the largest circle that lies entirely inside $P$

Problem 10 - Linear Programming

Suppose we are given a sequence of $n$ linear inequalities of the form $a_i x + b_i y \leq c_i$, where $x, y$ are variables and $a_i, b_i, c_i \in \mathbb{R}$. Collectively, these $n$ inequalities describe a convex polygon $P$ in the plane.

  1. Describe a polynomial time algorithm to compute two interior-disjoint axis-aligned squares with maximum total perimeter that lie entirely inside $P$
  2. Describe a polynomial time algorithm to compute two interior-disjoint axis-aligned rectangles with maximum total perimeter that lie entirely inside $P$

Problem 11 - Linear Programming Duality (basic Helly’s theorem)

Given a finite set of halfspaces $H_i := \{ x \in \mathbb{R}^d \mid a_i^T x \leq b_i, \ a_i \in \mathbb{R}^d, \ b_i \in \mathbb{R} \}$, where $1 \leq i \leq n$ such that any set of $d+1$ halfspaces have a non-empty intersection, prove that there is a point lying in the intersection of all these halfspaces.

Previous
Next