11. Set Disjointness

The set disjointness function plays a central role in the study of communication complexity. We saw in the last lecture that the randomized communication complexity of this function is bounded below by R(Disjn)=Ω(logn). The main goal of this and the next lecture will be to significantly strengthen this bound and show that the trivial protocol in which Alice simply sends her input to Bob is asymptotically optimal.

Theorem 1. R(Disjn)=Θ(n).

We should remark that there are many other questions and results related to the communication complexity of the set disjointness function that are quite interesting but that we won’t be able to cover in the course. See for example the surveys of Chattopadhyay and Pitassi (2010) and Sherstov (2014) for more on this topic.

Applications

Before we dive into the task of proving Theorem 1, let us first take a detour to see how this communication complexity lower bound also implies lower bounds in other models of computation as well.

Streaming

A (one-pass) streaming algorithm is a randomized algorithm whose input consists of a sequence of elements x1,,xm that are processed one by one in order. The constraint of particular interest in this model is the amount of space that a streaming algorithm requires to compute some function on the input.

There are some functions that can easily be computed with a reasonably small amount of memory. For example, we can consider the frequency moments estimation task. Here, the elements x1,,xm are positive integers in the range {1,2,,n}, we define mi=|{j:xj=i}|, and the kth moment of the sequence is Fk=i=1nmik. The first moment F1=m represents the total number of elements in the sequence. We can compute it exactly with O(logm) bits of memory since that is sufficient to store the number of elements seen so far.

What if we consider the problem of approximating F1? Here, the requirement for a good multiplicative 13-approximation algorithm is that it returns an estimate y~ that satisfies |y~F1|13F1 with probability, say, .99. Can this be done with even less memory? Yes! Using a Morris counter, it’s possible to approximate F1 with space complexity only O(loglogm).

Can we also approximate the other moments of a stream of numbers as efficiently? This is the question that Alon, Matias, and Szegedy (1999) considered. They showed that some frequency moments can be approximated with little space but others cannot. In particular, the maximum frequency F=maxi[n]mi cannot be approximated with a sublinear amount of space.

Theorem 2. Any single-pass streaming algorithm that provides a 13-approximation to F on a stream of at most 2n numbers in {1,2,,n} has space complexity Ω(n).

Proof. Let A be a streaming algorithm that computes a 13 approximation to F and uses c bits of memory. Alice and Bob can use A to compute the set disjointness function in the following way. They simulate A on the sequence that includes the elements in Alice’s set followed by the elements in Bob’s set. Alice can simulate the algorithm on her own for the first part of the sequence. Then she send the c bits in memory after A has finished processing her part of the sequence to Bob. Bob then finishes simulating A on his part of the sequence. He then outputs 1 if and only if the estimate to F is at most 43.

This protocol uses c+1 bits of communication. When Alice and Bob’s sets are disjoint, the correct value of F is 1. And when their sets are not disjoint, then at least one element appears twice so that F is 2. In both cases, they output the correct value whenever A outputs a 13 approximation to F.

Property testing

The Set Disjointness function can also be used to prove strong lower bounds for some problems in property testing. In the k-junta testing problem, a randomized algorithm is given query access to some unknown function f:{0,1}n{0,1}. Its task is to distinguish with bounded error between the the cases where

(i) the function f is a k-junta in the sense that there is a set J={j1,,jk}[n] of at most k coordinates such that for any input x{0,1}n, the value f(x) is completely determined by the value of the inputs xj1,,xjk; and (ii) the function is far from being a k-junta in the sense that for any k-junta g, |{x{0,1}n:f(x)g(x)}|11002n.

There is an algorithm that can test k-juntas while querying the value of f on only O(klogk) inputs in {0,1}n. And the kDisjn function shows that this bound is tight up to a logarithmic factor.

Theorem 3. Any algorithms for testing k-juntas has query complexity Ω(k).

Proof. Let A be a (2k1)-junta tester with query complexity q. We can design an efficient protocol π for kDisjn using A in the following way.

Let S and T denote Alice and Bob’s inputs to the kDisjn function. Alice and Bob use their public randomness to simulate A on the function f=ST=ST. For every query that A makes on input x{0,1}n, Alice and Bob exchange the values ax=S(x) and bx=T(x) and continue simulating A when it receives the value f(x)=axbx to its query. Once A terminates, Alice and Bob output 1 if and only if A claims that f is not a k-junta.

Each query of A is answered with 2 bits of communication, so the total communication cost of π is O(q). And the correctness of the protocol follows from two facts. First, when S and T are disjoint k-subsets of [n], then |ST|=|S|+|T|=2k and a parity function on 2k coordinates is far from being a (2k1)-junta. Second, when S and T intersect, then |ST||S|+|T|2|ST|2k2<2k1 and the parity function ST is a (2k1)-junta.

Therefore, the argument above shows that R(kDisjn)=O(q). And with Theorem 1 we can also show that R(kDisjn)=Ω(k). So any k-junta tester must have query complexity at least q=Ω(k).

One-sided Discrepancy

Recall that the matrix representation of the set disjointness function looks like this:

Communication matrix of Disjointness

This matrix has a large 0-monochromatic rectangle. But it does not appear to contain any large 1-monochromatic rectangle. And indeed we can prove that even for larger values of n, there is a natural distribution μ on inputs for which any nearly 1-monochromatic rectangle must have very small measure with respect to μ. This is the method that Babai, Frankl, and Simon (1986) used to obtain the first polynomial lower bound on the randomized communication complexity of set disjointness.

In the rest of this section, let X denote the set of subsets of [n] of cardinality exactly n. And let μ denote the distribution on pairs (S,T) where S and T are drawn independently and uniformly at random from X.

The one-sided discrepancy result at the heart of the argument is the following.

Lemma 4. Let R=A×B be any rectangle with A,BX for which at most an ϵ fraction of the pairs in R are intersecting. Then there is a constant c>0 such that

min{|A|,|B|}2cn+1|X|.

Proof. Let |A|>2cn+1|X|. Define A1A to be the set of elements in A that intersect with at most a 2ϵ fraction of the sets in B. By Markov’s inequality, |A1||A|2>2cn|X|.

I claim that there are k=n/3 sets S1,,Sk in A1 such that each set S, 1k satisfies

|S(i<Si)|n2.

Indeed, the size of the sets in X guarantees that |i<Si|nn3. And the number of sets in X for which |S(i<Si)|>n2 is at most

m=n/2n(n/3m)(2n/3nm)<2cn|X|.

Since this number is less than |A1|, there must exist a set S in A1 that satisfies the required condition.

The condition on the sets S1,,Sk implies that their union has cardinality at least

|i=1kSi|kn2=n6.

And the fact that they are all in A1 implies that they each intersect at most an 2ϵ fraction of the sets in B. By Markov’s inequality, this means that at least half of the sets in B intersect at most 4ϵk of the k sets S1,,Sk. But this means that the cardinality of |B| is bounded above by

|B|2(k4ϵk)(5n/6n)<2cn|X|.

With Lemma 4, we can easily obtain our lower bound on the communication complexity of disjointness.

Theorem 5. Rϵ(Disjn)=Ω(n).

Proof. By the easy direction of Yao’s Minimax Principle, it suffices to show that for the distribution μ defined above, Dϵμ(Disjn)=Ω(n).

Fix any deterministic protocol π that computes Disjn with error at most ϵ on inputs (S,T) with S and T drawn independently and uniformly at random from X. (With constant probability, the leaf reached by π must correspond to a nearly 1-monochromatic rectangle.) By Lemma 4, however, each such rectangle has measure at most 2cn+1 with respect to μ. This means that there must be at least Ω(2cn1) such leaves and so the depth of the communication tree for π must be at least Ω(n).

The bound of Ω(n) in Theorem 5 is tight if we consider only product distributions where Alice and Bob’s inputs are drawn independently from each other. To obtain the stronger linear lower bound, we must consider a different distribution μ on the inputs where S and T are no longer drawn independently. This is what we will examine in the next lecture.