10. Discrepancy
The last lecture ended with the challenge of finding a communication protocol for the inner product function with sublinear communication complexity. As it turns out, there is no such protocol. But how can we prove that? This and other similar lower bound questions are what we explore in the upcoming lectures.
Discrepancy and Communication Complexity
One good way to get some intuition for why the inner product function has large randomized communication complexity is to look at the matrix for this function:
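For example, for $n = 2$, writing the inputs as strings $x, y \in \{0,1\}^2$ with rows indexed by Alice's input $x$ and columns by Bob's input $y$, the matrix of $\mathrm{IP}_2(x,y) = \langle x, y \rangle \bmod 2$ is:
\[
\begin{array}{c|cccc}
 & 00 & 01 & 10 & 11 \\ \hline
00 & 0 & 0 & 0 & 0 \\
01 & 0 & 1 & 0 & 1 \\
10 & 0 & 0 & 1 & 1 \\
11 & 0 & 1 & 1 & 0
\end{array}
\]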
For a function to have low randomized communication complexity, the matrix should contain fairly large nearly monochromatic rectangles: rectangles for which almost all the entries have the same value.
That’s because the leaves of a randomized communication protocol, if they have low error, correspond to nearly monochromatic rectangles.
But even in this small example, we see that the structure of the matrix makes it impossible to find large rectangles that are close to monochromatic once we exclude the first row and the first column. The same obstruction persists, and only becomes more pronounced, for larger values of $n$.
This intuition can indeed lead to formal lower bounds on the communication complexity of Boolean functions. We formalize it using the notion of discrepancy.
As a first step, let us change the range of the Boolean functions that we consider so that they are of the form $f \colon \mathcal{X} \times \mathcal{Y} \to \{-1,1\}$, identifying each output $b \in \{0,1\}$ of the original function with $(-1)^b$.
With this change, the notion of balance or bias of a combinatorial rectangle can be expressed as an expectation.
Specifically, let us fix a function $f \colon \mathcal{X} \times \mathcal{Y} \to \{-1,1\}$ and a distribution $\mu$ on $\mathcal{X} \times \mathcal{Y}$. The discrepancy of a rectangle $R = A \times B$ with respect to $f$ and $\mu$ is
\[
\mathrm{disc}_\mu(f, R) = \Big| \mathbb{E}_{(x,y) \sim \mu} \big[ f(x,y) \cdot \mathbf{1}_R(x,y) \big] \Big|,
\]
where $\mathbf{1}_R$ denotes the indicator function of the rectangle $R$.
The discrepancy of the function $f$ with respect to $\mu$ is
\[
\mathrm{disc}_\mu(f) = \max_R \, \mathrm{disc}_\mu(f, R),
\]
where the maximum is taken over all rectangles. Note that a function has large discrepancy with respect to $\mu$ if and only if it contains a rectangle that carries a lot of probability mass under $\mu$ and is nearly monochromatic.
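To see the definition in action, here is a minimal brute-force computation of $\mathrm{disc}_U(f)$ under the uniform distribution $U$. This sketch is only an illustration (the helper names are ours, and the enumeration over all subsets of rows confines it to very small $n$):

from itertools import product

def inner_product_pm1(n):
    # The +-1 communication matrix of IP_n: entry (x, y) equals (-1)^<x,y>.
    N = 2 ** n
    return [[(-1) ** bin(x & y).count("1") for y in range(N)]
            for x in range(N)]

def uniform_discrepancy(M):
    # disc_U(f): maximize |E[f(x,y) * 1_R(x,y)]| over rectangles R = A x B.
    # For a fixed row subset A, the best column subset B keeps exactly the
    # columns whose A-restricted sums share one sign.
    rows, cols = len(M), len(M[0])
    best = 0.0
    for A in product([0, 1], repeat=rows):
        col_sums = [sum(M[x][y] for x in range(rows) if A[x])
                    for y in range(cols)]
        pos = sum(s for s in col_sums if s > 0)
        neg = -sum(s for s in col_sums if s < 0)
        best = max(best, pos / (rows * cols), neg / (rows * cols))
    return best

for n in (1, 2, 3):
    print(n, uniform_discrepancy(inner_product_pm1(n)), 2 ** (-n / 2))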
The discrepancy of a function can be used to give a lower bound on its randomized communication complexity.
Theorem 1. For any function $f \colon \mathcal{X} \times \mathcal{Y} \to \{-1,1\}$, distribution $\mu$ on $\mathcal{X} \times \mathcal{Y}$, and parameter $\epsilon > 0$,
\[
R_\epsilon(f) \ge \log_2 \frac{1 - 2\epsilon}{\mathrm{disc}_\mu(f)}.
\]
Proof. Let $\pi$ be a deterministic communication protocol that computes $f$ with cost $c$ and error at most $\epsilon$ with respect to $\mu$. The leaves of the protocol tree for $\pi$ partition the communication matrix for $f$ into rectangles $R_1, \ldots, R_{2^c}$. Note that for any input $(x,y)$,
\[
\pi(x,y) \cdot f(x,y) =
\begin{cases}
1 & \text{if } \pi(x,y) = f(x,y), \\
-1 & \text{otherwise.}
\end{cases}
\]
So the success guarantee on $\pi$ implies that
\[
1 - 2\epsilon \le \mathbb{E}_{(x,y) \sim \mu}\big[\pi(x,y) \cdot f(x,y)\big].
\]
We can expand the expression on the right and use linearity of expectation to obtain
\[
\mathbb{E}_\mu\big[\pi(x,y) f(x,y)\big]
= \sum_{\ell = 1}^{2^c} \mathbb{E}_\mu\big[\pi(x,y) f(x,y) \mathbf{1}_{R_\ell}(x,y)\big]
\le \sum_{\ell = 1}^{2^c} \Big| \mathbb{E}_\mu\big[f(x,y) \mathbf{1}_{R_\ell}(x,y)\big] \Big|,
\]
with the last inequality holding because $\pi$ takes the same value in $\{-1,1\}$ for all $(x,y) \in R_\ell$. But each term in the final sum is the discrepancy of a rectangle with respect to $f$ and $\mu$, so each of them is at most $\mathrm{disc}_\mu(f)$. Combining everything, we then get
\[
1 - 2\epsilon \le 2^c \cdot \mathrm{disc}_\mu(f),
\]
and rearranging gives the lower bound
\[
c \ge \log_2 \frac{1 - 2\epsilon}{\mathrm{disc}_\mu(f)}
\]
on the cost of $\pi$. The conclusion of the theorem then follows from the easy direction of Yao's Minimax Principle.
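For instance, with error parameter $\epsilon = 1/3$ the theorem reads $R_{1/3}(f) \ge \log_2 \frac{1}{\mathrm{disc}_\mu(f)} - \log_2 3$: any function whose discrepancy is exponentially small under some distribution has linear randomized communication complexity.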
Inner Product
We can use the last theorem to obtain a linear lower bound on the randomized communication complexity of the inner product function. We do this by proving the following upper bound on the discrepancy of the inner product function with respect to the uniform distribution.
Lemma 2. Let $U$ denote the uniform distribution on $\{0,1\}^n \times \{0,1\}^n$. Then
\[
\mathrm{disc}_U(\mathrm{IP}_n) \le 2^{-n/2}.
\]
Proof. Fix any rectangle $R = A \times B$. We can write the square of the discrepancy of $R$ with respect to $\mathrm{IP}_n$ and the uniform distribution as
\[
\mathrm{disc}_U(\mathrm{IP}_n, R)^2
= \Big( \mathbb{E}_{x,y}\big[(-1)^{\langle x, y \rangle} \mathbf{1}_A(x) \mathbf{1}_B(y)\big] \Big)^2
\le \mathbb{E}_x\Big[\mathbf{1}_A(x)^2 \Big(\mathbb{E}_y\big[(-1)^{\langle x, y \rangle} \mathbf{1}_B(y)\big]\Big)^2\Big]
\le \mathbb{E}_x\Big[\Big(\mathbb{E}_y\big[(-1)^{\langle x, y \rangle} \mathbf{1}_B(y)\big]\Big)^2\Big].
\]
The first upper bound above is by Jensen's inequality and the second by the observation that $\mathbf{1}_A(x)^2 \le 1$ and the squared inner expectation is non-negative. By expanding the square, the last expression equals
\[
\mathbb{E}_{y, y'}\Big[\mathbf{1}_B(y) \mathbf{1}_B(y') \, \mathbb{E}_x\big[(-1)^{\langle x, y \rangle + \langle x, y' \rangle}\big]\Big]
\le \mathbb{E}_{y, y'}\Big[\big|\mathbb{E}_x\big[(-1)^{\langle x, y \rangle + \langle x, y' \rangle}\big]\big|\Big].
\]
The key observation now is that $\langle x, y \rangle + \langle x, y' \rangle \equiv \langle x, y \oplus y' \rangle \pmod 2$. And
\[
\mathbb{E}_x\big[(-1)^{\langle x, y \oplus y' \rangle}\big] \ne 0 \quad\text{if and only if}\quad y = y',
\]
in which case it equals 1. This occurs with probability $2^{-n}$ when $y$ and $y'$ are drawn independently and uniformly at random. Therefore, combining the above inequalities we end up with
\[
\mathrm{disc}_U(\mathrm{IP}_n, R)^2 \le 2^{-n},
\]
giving the desired conclusion.
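It is worth noting that the $2^n \times 2^n$ matrix with entries $(-1)^{\langle x, y \rangle}$ is a Hadamard matrix: its rows are pairwise orthogonal, and the key observation in the proof above is exactly this orthogonality.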
Combining the lemma with the theorem from the last section, we obtain tight bounds on the randomized communication complexity of the inner product function.
Theorem 3. $R_{1/3}(\mathrm{IP}_n) = \Theta(n)$.
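Indeed, the lower bound follows by instantiating Theorem 1 with the uniform distribution and Lemma 2: $R_{1/3}(\mathrm{IP}_n) \ge \log_2\big(\tfrac13 \cdot 2^{n/2}\big) = \tfrac{n}{2} - \log_2 3$. The matching upper bound is trivial: Alice sends her whole input $x$ to Bob, who replies with the value $\mathrm{IP}_n(x,y)$, for a cost of $n + 1$ bits.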
Set Disjointness
We can try bounding the discrepancy of the disjointness function against the uniform distribution as well, but a quick look at the matrix representation of this function shows why this is not a promising approach.
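For example, for $n = 2$, with entry 1 when the inputs $x$ and $y$ represent disjoint sets and 0 otherwise, the matrix is:
\[
\begin{array}{c|cccc}
 & 00 & 01 & 10 & 11 \\ \hline
00 & 1 & 1 & 1 & 1 \\
01 & 1 & 0 & 1 & 0 \\
10 & 1 & 1 & 0 & 0 \\
11 & 1 & 0 & 0 & 0
\end{array}
\]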
There is a monochromatic rectangle that covers the bottom right corner of the matrix.
And with larger values of $n$, almost every entry of the matrix corresponds to an intersecting pair: a uniformly random pair $(x, y)$ is disjoint with probability only $(3/4)^n$, which is below $0.4\%$ already for $n = 20$. The matrix is therefore itself nearly monochromatic, so the discrepancy of disjointness with respect to the uniform distribution approaches 1 and Theorem 1 yields no meaningful lower bound.
But we can use the discrepancy method to obtain a non-trivial lower bound on the communication complexity of the disjointness function by considering a different distribution on the inputs.
One natural candidate to consider is the $p$-biased product distribution, in which each of the $2n$ coordinates of $x$ and $y$ is set to 1 independently with probability $p$. A natural setting is $p = 1/\sqrt{n}$, for which a random pair of inputs is disjoint with constant probability.
The disjointness function has a non-trivial discrepancy bound under this distribution.
Lemma 4. When $\mu$ is the $p$-biased product distribution with $p = 1/\sqrt{n}$,
\[
\mathrm{disc}_\mu(\mathrm{Disj}_n) = O\big(1/\sqrt{n}\big).
\]
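Combining Lemma 4 with Theorem 1 (with $\epsilon = 1/3$, say) gives $R_{1/3}(\mathrm{Disj}_n) \ge \log_2 \frac{1/3}{\mathrm{disc}_\mu(\mathrm{Disj}_n)} = \tfrac12 \log_2 n - O(1)$.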
This bound, however, only gives a weak lower bound of $\Omega(\log n)$ on the randomized communication complexity of the disjointness function. And yet, this is essentially the best bound that we can hope to achieve using the discrepancy method directly.
Theorem 5. For any distribution $\mu$ on $\{0,1\}^n \times \{0,1\}^n$,
\[
\mathrm{disc}_\mu(\mathrm{Disj}_n) = \Omega(1/n).
\]
Proof. Consider the following trivial protocol for computing disjointness: Alice and Bob sample an index $i \in [n]$ uniformly at random using their shared public randomness. They exchange $x_i$ and $y_i$. If $x_i = y_i = 1$, they output 0. Otherwise, they output 0 with probability $\frac12 - \frac{1}{4n}$ and 1 with the remaining probability.

The protocol has constant cost. It also has error at most $\frac12 - \frac{1}{4n}$ on all inputs: whenever the inputs correspond to disjoint sets, they output 1 with probability $\frac12 + \frac{1}{4n}$. And when the strings are not disjoint, the probability that the protocol outputs 0 is at least
\[
\frac1n + \Big(1 - \frac1n\Big)\Big(\frac12 - \frac{1}{4n}\Big) \ge \frac12 + \frac{1}{4n}.
\]
This protocol shows that $R_{\frac12 - \frac{1}{4n}}(\mathrm{Disj}_n) = O(1)$. By Theorem 1 above, this means that the discrepancy of $\mathrm{Disj}_n$ with respect to $\mu$ is bounded below by
\[
\mathrm{disc}_\mu(\mathrm{Disj}_n) \ge \Big(1 - 2\Big(\frac12 - \frac{1}{4n}\Big)\Big) \cdot 2^{-O(1)} = \Omega(1/n).
\]
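As a sanity check on the error analysis, here is a small simulation of this protocol (an illustrative sketch; the helper names are ours). It estimates the error on a hardest intersecting pair, one with a single common element:

import random

def disjointness_protocol(x, y, n):
    # One run of the constant-cost public-coin protocol:
    # output 1 for "disjoint" and 0 for "not disjoint".
    i = random.randrange(n)          # the shared public random index
    if x[i] == 1 and y[i] == 1:      # witnessed a common element
        return 0
    # No intersection seen at coordinate i: answer with a slight bias.
    return 0 if random.random() < 0.5 - 1 / (4 * n) else 1

n, trials = 20, 500_000
x = [1] * n                  # an intersecting pair that is hardest to
y = [0] * (n - 1) + [1]      # detect: exactly one common element
errors = sum(disjointness_protocol(x, y, n) != 0 for _ in range(trials))
print("empirical error:", errors / trials)     # roughly 0.4869 for n = 20
print("guaranteed bound:", 0.5 - 1 / (4 * n))  # 0.4875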
Discrepancy and PP Complexity
We will see in the next lectures that the randomized communication complexity of the disjointness function is in fact linear in $n$, so discrepancy falls exponentially short of characterizing bounded-error randomized communication complexity.
But this raises the question: does discrepancy characterize some other measure of communication complexity of functions? The answer is yes, and the notion of complexity that comes up in the answer is both useful and interesting in its own right.
Define the PP-cost of a communication protocol $\pi$ that computes $f$ with error $\epsilon < \frac12$ to be
\[
\mathrm{cost}(\pi) + \log_2 \frac{1}{\frac12 - \epsilon}.
\]
Note that the PP-cost of bounded-error communication protocols is the same as their standard cost up to an additive constant. But with this notion, we can now compare these bounded-error communication protocols with small-bias protocols like the one we introduced in the proof of Theorem 5. This lets us define a new measure of communication complexity of functions.
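For example, a protocol with error $\frac13$ and cost $c$ has PP-cost $c + \log_2 6 = c + O(1)$, while the constant-cost protocol from the proof of Theorem 5 has error $\frac12 - \frac{1}{4n}$ and hence PP-cost $O(1) + \log_2(4n) = O(\log n)$.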
Definition. The PP complexity of the function $f \colon \mathcal{X} \times \mathcal{Y} \to \{-1,1\}$ is
\[
\mathrm{PP}(f) = \inf_\pi \left( \mathrm{cost}(\pi) + \log_2 \frac{1}{\frac12 - \epsilon(\pi)} \right),
\]
where the infimum is taken over all public-coin communication protocols $\pi$ that compute $f$ with some error $\epsilon(\pi) < \frac12$, and the error of each protocol is measured with respect to the worst-case input.
PP complexity provides a lower bound on the bounded-error randomized communication complexity of Boolean functions: for every $f$,
\[
\mathrm{PP}(f) \le R_{1/3}(f) + \log_2 6 = R_{1/3}(f) + O(1).
\]
PP complexity is also closely related to discrepancy.
In fact, PP complexity is exactly the measure of communication complexity that discrepancy characterizes.
Writing $\mathrm{disc}(f) = \min_\mu \mathrm{disc}_\mu(f)$ for the discrepancy of $f$ minimized over all distributions $\mu$ on $\mathcal{X} \times \mathcal{Y}$, we can state this characterization as follows.
Theorem 6. For every function $f \colon \mathcal{X} \times \mathcal{Y} \to \{-1,1\}$,
\[
\mathrm{PP}(f) = \Theta\left( \log \frac{1}{\mathrm{disc}(f)} \right).
\]
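One direction of this equivalence follows from the argument behind Theorem 1: a protocol with PP-cost $c$ certifies that $\mathrm{disc}(f) \ge 2^{-O(c)}$. For the other direction, one shows, via a minimax argument, that a function with $\mathrm{disc}(f) = \gamma$ admits a constant-cost protocol with bias $\Omega(\gamma)$, in the spirit of the protocol from the proof of Theorem 5, and such a protocol has PP-cost $O(\log(1/\gamma))$. In particular, Theorem 6 together with Lemma 2 gives $\mathrm{PP}(\mathrm{IP}_n) = \Theta(n)$, and together with Lemma 4 and Theorem 5 it pins down $\mathrm{PP}(\mathrm{Disj}_n) = \Theta(\log n)$.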