10. Discrepancy

The last lecture ended with the challenge of finding a communication protocol for the inner product function with sublinear communication complexity. As it turns out, there is no such protocol. But how can we prove that? This and other similar lower bound questions are what we explore in the upcoming lectures.

Discrepancy and Communication Complexity

One good way to get some intuition for why the inner product function has large randomized communication complexity is to look at the matrix for this function:

Communication matrix of Inner Product

For a function to have low randomized communication complexity, the matrix should contain fairly large nearly monochromatic rectangles: rectangles for which almost all the entries have the same value. That's because the leaves of a randomized communication protocol, if it has low error, correspond to nearly monochromatic rectangles. But even in this small example, we see that the structure of the matrix makes it impossible to find large rectangles that are close to monochromatic if we exclude the first row and the first column. The $4 \times 4$ submatrices, for example, all contain at least a $\tfrac14$ and at most a $\tfrac34$ fraction of 1s.
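This picture can be checked with a short brute-force computation. The sketch below is a minimal illustration, assuming the pictured matrix is the one for $\mathrm{IP}_3$ (an $8 \times 8$ matrix); it builds the communication matrix and reports the most and least balanced $4 \times 4$ combinatorial rectangles that avoid the all-zeros row and column.

```python
from itertools import combinations, product

n = 3        # assumed size of the pictured matrix; any small n works
N = 1 << n

def ip(x, y):
    """Inner product of x and y over GF(2), with x, y given as n-bit integers."""
    return bin(x & y).count("1") % 2

# Communication matrix M[x][y] = <x, y> mod 2.
M = [[ip(x, y) for y in range(N)] for x in range(N)]

# Fraction of 1-entries in every 4x4 combinatorial rectangle A x B
# that avoids the all-zeros row and column.
nonzero = range(1, N)
fractions = []
for A in combinations(nonzero, 4):
    for B in combinations(nonzero, 4):
        ones = sum(M[x][y] for x, y in product(A, B))
        fractions.append(ones / 16)

print("extreme fractions of 1s:", min(fractions), max(fractions))
```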

This intuition can indeed lead to formal lower bounds on the communication complexity of Boolean functions. We formalize it using the notion of discrepancy.

As a first step, let us change the range of the Boolean functions that we consider so that they are $\{-1,1\}$-valued. For example, we can redefine the inner product function to take the value

$$\mathrm{IP}_n(x,y) = (-1)^{\langle x, y \rangle}.$$

With this change, the notion of balance or bias of a combinatorial rectangle can be expressed as an expectation. Specifically, let us fix a function $f \colon \{0,1\}^n \times \{0,1\}^n \to \{-1,1\}$ and a distribution $\mu$ over the domain of $f$. The discrepancy of a rectangle $R = A \times B$ for some $A, B \subseteq \{0,1\}^n$ with respect to $f$ and $\mu$ is

$$\mathrm{disc}_\mu(f, R) = \left| \mathbb{E}_{(x,y) \sim \mu}\big[ \mathbf{1}_R(x,y) \, f(x,y) \big] \right|,$$

where $\mathbf{1}_R$ is the indicator function for $R$ that takes the value 1 on $(x,y)$ if and only if $(x,y) \in R$ or, equivalently, $x \in A$ and $y \in B$.

The discrepancy of the function f with respect to μ is

$$\mathrm{disc}_\mu(f) = \max_{R} \mathrm{disc}_\mu(f, R)$$

where the maximum is taken over all rectangles. Note that a function has large discrepancy with respect to $\mu$ if and only if its communication matrix contains a rectangle of large probability mass under $\mu$ that is nearly monochromatic.
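To make the definition concrete, here is a small sketch that evaluates $\mathrm{disc}_\mu(f, R)$ directly from the definition for a rectangle $R = A \times B$ and a distribution $\mu$ given as a table of probabilities. The example function, rectangle, and value of $n$ are only illustrations.

```python
from itertools import product

def ip_pm1(x, y):
    """The {-1,+1}-valued inner product function on n-bit integers."""
    return -1 if bin(x & y).count("1") % 2 else 1

def disc_rect(f, mu, A, B):
    """disc_mu(f, A x B) = | E_{(x,y)~mu}[ 1_R(x,y) * f(x,y) ] |."""
    return abs(sum(p * f(x, y) for (x, y), p in mu.items() if x in A and y in B))

# Example: IP_2 under the uniform distribution on {0,1}^2 x {0,1}^2.
n = 2
N = 1 << n
uniform = {(x, y): 1 / N**2 for x, y in product(range(N), repeat=2)}

A = {0b00, 0b01}            # an arbitrary rectangle, chosen only for illustration
B = {0b00, 0b01, 0b10}
print(disc_rect(ip_pm1, uniform, A, B))
```

Maximizing this quantity over all $2^{2^n} \times 2^{2^n}$ choices of $A$ and $B$ gives $\mathrm{disc}_\mu(f)$; this is only feasible for very small $n$.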

The discrepancy of a function can be used to give a lower bound on its randomized communication complexity.

Theorem 1. For any function f, distribution μ, and parameter ϵ>0,

$$R_\epsilon(f) \ge \log\!\left( \frac{1 - 2\epsilon}{\mathrm{disc}_\mu(f)} \right).$$

Proof. Let $\pi$ be a deterministic communication protocol that computes $f$ with cost $c$ and error at most $\epsilon$ with respect to $\mu$. The leaves of the protocol tree for $\pi$ partition the communication matrix for $f$ into $t \le 2^c$ rectangles $R_1, \ldots, R_t$. Note that for any input $(x,y)$,

$$\pi(x,y) \, f(x,y) = \begin{cases} 1 & \text{if } \pi(x,y) = f(x,y) \\ -1 & \text{if } \pi(x,y) \ne f(x,y). \end{cases}$$

So the success guarantee on π implies that

$$1 - 2\epsilon \le \mathbb{E}_{\mu}\big[ \pi(x,y) \, f(x,y) \big].$$

We can expand the expression on the right and use linearity of expectation to obtain

$$\mathbb{E}_{\mu}\big[ \pi(x,y) f(x,y) \big] = \sum_{i=1}^{t} \mathbb{E}_{\mu}\big[ \mathbf{1}_{R_i}(x,y) \, \pi(x,y) \, f(x,y) \big] \le \sum_{i=1}^{t} \Big| \mathbb{E}_{\mu}\big[ \mathbf{1}_{R_i}(x,y) \, f(x,y) \big] \Big|$$

with the last inequality holding because $\pi(x,y) \in \{-1,1\}$ takes the same value for all $(x,y) \in R_i$. But each term in the final sum is the discrepancy of a rectangle with respect to $f$ and $\mu$, so each of them is at most $\mathrm{disc}_\mu(f)$. Combining everything, we then get

$$1 - 2\epsilon \le t \cdot \mathrm{disc}_\mu(f) \le 2^c \cdot \mathrm{disc}_\mu(f)$$

and rearranging gives the lower bound $c \ge \log\!\left( \frac{1 - 2\epsilon}{\mathrm{disc}_\mu(f)} \right)$ on the cost of $\pi$. The conclusion of the theorem then follows from the easy direction of Yao's Minimax Principle.
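As a concrete instance of how the bound is typically applied, take the standard error parameter $\epsilon = 1/3$ (an arbitrary choice, not fixed by the theorem):

$$R_{1/3}(f) \ge \log\!\left( \frac{1 - 2/3}{\mathrm{disc}_\mu(f)} \right) = \log\frac{1}{\mathrm{disc}_\mu(f)} - \log 3.$$

So exhibiting any distribution $\mu$ under which every rectangle has discrepancy at most $2^{-d}$ immediately gives a lower bound of $d - O(1)$ on the bounded-error randomized communication complexity.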

Inner Product

We can use the last theorem to obtain a linear lower bound on the randomized communication complexity of the inner product function. We do this by proving the following upper bound on the discrepancy of the inner product function with respect to the uniform distribution.

Lemma 2. Let $\mu$ denote the uniform distribution. Then $\mathrm{disc}_\mu(\mathrm{IP}_n) \le 2^{-n/2}$.

Proof. Fix any rectangle R=A×B. We can write the square of the discrepancy of R with respect to IP and the uniform distribution as

$$\mathrm{disc}_\mu(\mathrm{IP}_n, R)^2 = \Big( \mathbb{E}\big[ \mathbf{1}_A(x) \mathbf{1}_B(y) (-1)^{\langle x,y\rangle} \big] \Big)^2 = \Big( \mathbb{E}_x\Big[ \mathbf{1}_A(x) \, \mathbb{E}_y\big[ \mathbf{1}_B(y) (-1)^{\langle x,y\rangle} \big] \Big] \Big)^2 \le \mathbb{E}_x\Big[ \mathbf{1}_A(x)^2 \, \mathbb{E}_y\big[ \mathbf{1}_B(y) (-1)^{\langle x,y\rangle} \big]^2 \Big] \le \mathbb{E}_x\Big[ \mathbb{E}_y\big[ \mathbf{1}_B(y) (-1)^{\langle x,y\rangle} \big]^2 \Big].$$

The first upper bound above is by Jensen's inequality and the second by the observation that $\mathbf{1}_A(x)^2 \le 1$ and $\mathbb{E}_y\big[ \mathbf{1}_B(y) (-1)^{\langle x,y\rangle} \big]^2$ is non-negative. By expanding the square, the last expression equals

$$\mathbb{E}_{x,y,y'}\big[ \mathbf{1}_B(y) \mathbf{1}_B(y') (-1)^{\langle x,y\rangle + \langle x,y'\rangle} \big] \le \mathbb{E}_{y,y'}\Big[ \big| \mathbb{E}_x\big[ (-1)^{\langle x, y+y'\rangle} \big] \big| \Big].$$

The key observation now is that

$$\mathbb{E}_x\big[ (-1)^{\langle x, y+y'\rangle} \big] = \begin{cases} 1 & \text{when } y + y' = 0 \\ 0 & \text{otherwise.} \end{cases}$$

And $y + y' = 0$ if and only if $y = y'$. This occurs with probability $2^{-n}$ when $y$ and $y'$ are drawn independently and uniformly at random. Therefore, combining the above inequalities we end up with

$$\mathrm{disc}_\mu(\mathrm{IP}_n, R)^2 \le 2^{-n},$$

and since the rectangle $R$ was arbitrary, this gives the desired conclusion.
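The lemma can also be checked by brute force for very small $n$. The sketch below exhaustively maximizes the discrepancy over all rectangles under the uniform distribution and compares it with $2^{-n/2}$; it is feasible only up to about $n = 3$ and is a sanity check, not part of the proof.

```python
def ip_pm1(x, y):
    """The {-1,+1}-valued inner product function on n-bit integers."""
    return -1 if bin(x & y).count("1") % 2 else 1

def disc_uniform(n):
    """Maximum discrepancy over all rectangles, under the uniform distribution."""
    N = 1 << n
    F = [[ip_pm1(x, y) for y in range(N)] for x in range(N)]
    best = 0.0
    for maskA in range(1 << N):                    # subsets of rows, as bitmasks
        rows = [x for x in range(N) if maskA >> x & 1]
        col = [sum(F[x][y] for x in rows) for y in range(N)]   # column sums over chosen rows
        for maskB in range(1 << N):                # subsets of columns
            total = sum(col[y] for y in range(N) if maskB >> y & 1)
            best = max(best, abs(total) / N**2)
    return best

for n in (1, 2, 3):
    print(n, disc_uniform(n), "<=", 2 ** (-n / 2))
```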

Combining the lemma with the theorem from the last section, we obtain tight bounds on the randomized communication complexity of the inner product function.

Theorem 3. $R(\mathrm{IP}_n) = \Theta(n)$.
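The lower bound is exactly the calculation from Theorem 1 and Lemma 2, written here for the standard error parameter $\epsilon = 1/3$:

$$R_{1/3}(\mathrm{IP}_n) \ge \log\!\left( \frac{1 - 2/3}{2^{-n/2}} \right) = \frac{n}{2} - \log 3 = \Omega(n).$$

The matching upper bound is trivial: Alice can always send $x$ to Bob, who computes the answer with $n+1$ bits of communication.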

Set Disjointness

We can try bounding the discrepancy of the disjointness function against the uniform distribution as well, but a quick look at the matrix representation of this function shows why this is not a promising approach.

Communication matrix of Disjointness

There is a monochromatic rectangle that covers the bottom right corner of the matrix. And with larger values of $n$, the communication matrix always includes such a large monochromatic rectangle: for every pair of inputs $x, y \in \{0,1\}^n$ that satisfy $|x|, |y| > \frac{n}{2}$, by the pigeonhole principle there must be some index $i \in [n]$ for which $x_i = y_i = 1$. So the discrepancy of this function with respect to the uniform distribution is constant and only gives a trivial lower bound on the communication complexity of the function.

But we can use the discrepancy method to obtain a non-trivial lower bound on the communication complexity of the disjointness function by considering a different distribution on the inputs. One natural candidate to consider is the $\frac{1}{\sqrt{n}}$-biased product distribution on the inputs: we draw $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ independently at random, with each of these $2n$ bits taking the value 1 with probability $p = \frac{1}{\sqrt{n}}$. This choice of $p$ guarantees that the resulting strings correspond to disjoint sets with a constant probability that is bounded away from 0 and 1.
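A quick calculation explains this choice of bias: under a $p$-biased product distribution each coordinate has $x_i = y_i = 1$ with probability $p^2$, so the two sets are disjoint with probability $(1 - p^2)^n$. The short sketch below evaluates this quantity for $p = 1/\sqrt{n}$ (where it tends to $1/e$) and, for comparison, for a much smaller bias such as $p = 1/n$ (where it tends to 1).

```python
from math import sqrt

def prob_disjoint(p, n):
    """P[x and y are disjoint] when all 2n bits are independent and equal 1 with probability p."""
    return (1 - p * p) ** n

for n in (100, 10_000, 1_000_000):
    print(n,
          round(prob_disjoint(1 / sqrt(n), n), 4),   # tends to 1/e ~ 0.3679
          round(prob_disjoint(1 / n, n), 4))         # tends to 1
```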

The disjointness function has a non-trivial discrepancy bound under this distribution.

Lemma 4. When $\mu$ is the $\frac{1}{\sqrt{n}}$-biased product distribution,

$$\mathrm{disc}_\mu(\mathrm{Disj}_n) = O(1/n).$$

This bound, however, only gives a weak lower bound of

$$R(\mathrm{Disj}_n) = \Omega(\log n)$$

on the randomized communication complexity of the disjointness function. And yet, this is the best bound that we can hope to achieve using the discrepancy method directly.

Theorem 5. For any distribution μ,

$$\mathrm{disc}_\mu(\mathrm{Disj}_n) = \Omega(1/n).$$

Proof. Consider the following trivial protocol for computing disjointness: Alice and Bob sample an index $i \in [n]$ uniformly at random using their shared public randomness and exchange $x_i$ and $y_i$. If $x_i = y_i = 1$, they output 0. Otherwise, they output 1 with probability $\frac12 + \frac{1}{4n}$ and 0 with the remaining probability.

The protocol has constant cost. It also has error at most $\frac12 - \frac{1}{4n}$ on all inputs: whenever the inputs correspond to disjoint sets, the protocol outputs 1 with probability $\frac12 + \frac{1}{4n}$, so it errs with probability $\frac12 - \frac{1}{4n}$. And when the strings are not disjoint, the probability that the protocol outputs 0 is at least $\frac{1}{n} + \left(1 - \frac{1}{n}\right)\left(\frac12 - \frac{1}{4n}\right) \ge \frac12 + \frac{1}{4n}$, so again it errs with probability at most $\frac12 - \frac{1}{4n}$.

This protocol shows that $R_{\frac12 - \frac{1}{4n}}(\mathrm{Disj}_n) = O(1)$. By Theorem 1 above, this means that the discrepancy of $\mathrm{Disj}_n$ is bounded below by $\Omega\!\left(1 - 2\left(\tfrac12 - \tfrac{1}{4n}\right)\right) = \Omega(1/n)$.
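The error analysis can also be checked numerically. The sketch below computes the exact success probability of this protocol (no sampling is needed, since the only randomness is the public index and the final biased coin) on two representative inputs: a disjoint pair of sets, and a pair intersecting in exactly one element, which is the worst case for the protocol.

```python
from fractions import Fraction

def success_probability(x, y, n):
    """Exact probability that the constant-cost protocol answers Disj(x, y) correctly."""
    k = len(x & y)                              # number of indices with x_i = y_i = 1
    q = Fraction(1, 2) + Fraction(1, 4 * n)     # probability of outputting 1 when no collision is seen
    if k == 0:                                  # disjoint: correct answer is 1
        return q
    return Fraction(k, n) + (1 - Fraction(k, n)) * (1 - q)   # intersecting: correct answer is 0

n = 50
disjoint = (set(range(0, 10)), set(range(10, 20)))
one_common = (set(range(0, 10)), set(range(9, 20)))   # intersect exactly at index 9

for x, y in (disjoint, one_common):
    p = success_probability(x, y, n)
    print(float(p), ">=", 0.5 + 1 / (4 * n))
```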

Discrepancy and PP complexity

We will see in the next lectures that the randomized communication complexity of the disjointness function is linear in n. This is exponentially larger than the bound we get with the discrepancy method. In other words, that result will show that while discrepancy can be a useful method for proving lower bounds in communication complexity, it does not characterize it.

But this raises the question: does discrepancy characterize some other measure of communication complexity of functions? The answer is yes, and the notion of complexity that comes up in the answer is both useful and interesting in its own right.

Define the PP-cost of a communication protocol $\pi$ with (standard) communication cost $c$ and error at most $\frac{1-\delta}{2}$ to be

$$\mathrm{PPcost}(\pi) := c + \log(1/\delta).$$

Note that the PP-cost of bounded-error communication protocols is the same as their standard cost up to an additive constant. But with this notion, we can now compare these bounded-error communication protocols with small-bias protocols like the one we introduced in the proof of Theorem 5. This lets us define a new measure of communication complexity of functions.
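For example, the constant-cost protocol from the proof of Theorem 5 has error $\frac12 - \frac{1}{4n} = \frac{1 - 1/(2n)}{2}$, so it corresponds to $\delta = \frac{1}{2n}$ and its PP-cost is

$$\mathrm{PPcost}(\pi) = O(1) + \log(2n) = \log n + O(1).$$

So the PP complexity of disjointness, defined next, is $O(\log n)$ even though, as we will see, its bounded-error randomized communication complexity is linear in $n$.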

Definition. The PP complexity of the function $f \colon \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$ is

$$\mathrm{PP}(f) = \inf_{\pi} \mathrm{PPcost}(\pi)$$

where the infimum is taken over all communication protocols and the error of each protocol is measured with respect to f.

PP complexity provides a lower bound on the bounded-error randomized communication complexity of Boolean functions: for every $f$, $\mathrm{PP}(f) \le R(f) + O(1)$. And in some cases (like the disjointness function), the PP complexity of a function can be much smaller than its randomized communication complexity.

PP complexity is also closely related to discrepancy. In fact, PP complexity is exactly what is characterized by discrepancy. Writing $\mathrm{disc}(f) = \inf_\mu \mathrm{disc}_\mu(f)$, we obtain the following result:

Theorem 6. For every function $f$, $\mathrm{PP}(f) = \Theta\big(\log(1/\mathrm{disc}(f))\big)$.