11. Set Disjointness

The set disjointness function plays a central role in the study of communication complexity. We saw in the last lecture that its randomized communication complexity satisfies \(\mathrm{R}(\mathsf{Disj}_n) = \Omega(\log n)\). The main goal of this and the next lecture is to significantly strengthen this bound and show that the trivial protocol, in which Alice simply sends her input to Bob, is asymptotically optimal.

Theorem 1. \(\mathrm{R}(\mathsf{Disj}_n) = \Theta(n).\)

We should remark that there are many other questions and results related to the communication complexity of the set disjointness function that are quite interesting but that we won’t be able to cover in the course. See for example the surveys of Chattopadhyay and Pitassi (2010) and Sherstov (2014) for more on this topic.

Applications

Before we dive into the task of proving Theorem 1, let us first take a detour to see how this communication complexity lower bound implies lower bounds in other models of computation as well.

Streaming

A (one-pass) streaming algorithm is a randomized algorithm whose input consists of a sequence of elements \(x_1,\ldots,x_m\) that are processed one by one in order. The constraint of particular interest in this model is the amount of space that a streaming algorithm requires to compute some function on the input.

There are some functions that can easily be computed with a reasonably small amount of memory. For example, we can consider the frequency moments estimation task. Here, the elements \(x_1,\ldots,x_m\) are positive integers in the range \(\{1,2,\ldots,n\}\), we define \(m_i = |\{j : x_j = i\}|\), and the \(k\)th moment of the sequence is \(F_k = \sum_{i=1}^n m_i^k\). The first moment \(F_1 = m\) represents the total number of elements in the sequence. We can compute it exactly with \(O(\log m)\) bits of memory since that is sufficient to store the number of elements seen so far.
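For example, the stream \(1, 2, 2, 3, 3, 3\) over the range \(\{1,2,3\}\) has frequencies \(m_1 = 1\), \(m_2 = 2\), \(m_3 = 3\), so that

\[F_1 = 1 + 2 + 3 = 6 = m \qquad \text{and} \qquad F_2 = 1^2 + 2^2 + 3^2 = 14.\]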

What if we consider the problem of approximating \(F_1\)? Here, the requirement for a good multiplicative \(\frac13\)-approximation algorithm is that it returns an estimate \(\tilde{y}\) that satisfies \(|\tilde{y} - F_1| \le \frac13 F_1\) with probability, say, \(.99\). Can this be done with even less memory? Yes! Using a Morris counter, it’s possible to approximate \(F_1\) with space complexity only \(O(\log \log m)\).
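To give a sense of how sub-logarithmic space is possible, here is a minimal sketch of a Morris counter in Python. The class name and the estimator \(2^X - 1\) are one standard presentation chosen for illustration; a single counter only guarantees a constant-factor estimate with constant probability, and averaging independent copies (or using a base smaller than 2) yields the \(\frac13\)-approximation with probability \(.99\).

```python
import random

class MorrisCounter:
    """Approximate counting: the only state is X, which is O(log log m) bits long."""

    def __init__(self):
        self.x = 0

    def increment(self):
        # Increment X with probability 2^{-X}.
        if random.random() < 2.0 ** (-self.x):
            self.x += 1

    def estimate(self):
        # E[2^X - 1] equals the true count, so this is an unbiased estimate of F_1.
        return 2 ** self.x - 1

# Count a stream of 100000 items while storing only the value X.
counter = MorrisCounter()
for _ in range(100_000):
    counter.increment()
print(counter.estimate())  # within a constant factor of 100000 with constant probability
```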

Can we also approximate the other moments of a stream of numbers as efficiently? This is the question that Alon, Matias, and Szegedy (1999) considered. They showed that some frequency moments can be approximated with little space but others cannot. In particular, the maximum frequency \(F_\infty = \max_{i \in [n]} m_i\) cannot be approximated with a sublinear amount of space.

Theorem 2. Any single-pass streaming algorithm that provides a \(\frac13\)-approximation to \(F_\infty\) on a stream of at most \(2n\) numbers in \(\{1,2,\ldots,n\}\) has space complexity \(\Omega(n)\).

Proof. Let \(A\) be a streaming algorithm that computes a \(\frac13\)-approximation to \(F_\infty\) and uses \(c\) bits of memory. Alice and Bob can use \(A\) to compute the set disjointness function in the following way. They simulate \(A\) on the sequence that consists of the elements of Alice's set followed by the elements of Bob's set. Alice can simulate the algorithm on her own for the first part of the sequence. She then sends to Bob the \(c\) bits of memory that \(A\) holds after processing her part of the sequence. Bob finishes simulating \(A\) on his part of the sequence. He then outputs \(1\) if and only if the estimate of \(F_\infty\) is at most \(\frac43\).

This protocol uses \(c+1\) bits of communication. When Alice and Bob's sets are disjoint, the correct value of \(F_\infty\) is 1. And when their sets are not disjoint, then at least one element appears twice so that \(F_\infty\) is 2. In both cases, they output the correct value whenever \(A\) outputs a \(\frac13\)-approximation to \(F_\infty\). By Theorem 1, this means that \(c + 1 = \Omega(n)\), and so the space complexity of \(A\) is \(\Omega(n)\).
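The reduction in this proof can be summarized by the following sketch. The streaming-algorithm interface used here (process, memory, estimate) is hypothetical and only serves to illustrate how the \(c\) bits of memory become the single message of the protocol.

```python
def disjointness_protocol(alice_set, bob_set, make_streamer):
    """Decide disjointness with any streaming 1/3-approximator for F_infinity.

    make_streamer() is assumed to return an object with .process(item),
    .memory (its c bits of state), and .estimate(); this interface is
    hypothetical and only illustrates the reduction.
    """
    # Alice simulates the algorithm on the first half of the stream: her elements.
    streamer = make_streamer()
    for item in alice_set:
        streamer.process(item)
    message = streamer.memory            # the c bits that Alice sends to Bob

    # Bob resumes the simulation from Alice's memory state on his elements.
    streamer = make_streamer()
    streamer.memory = message
    for item in bob_set:
        streamer.process(item)

    # Disjoint sets give F_infinity = 1 and intersecting sets give F_infinity = 2,
    # so Bob outputs 1 (disjoint) exactly when the estimate is at most 4/3.
    return 1 if streamer.estimate() <= 4 / 3 else 0
```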

Property testing

The set disjointness function can also be used to prove strong lower bounds for some problems in property testing. In the \(k\)-junta testing problem, a randomized algorithm is given query access to some unknown function \(f \colon \{0,1\}^n \to \{0,1\}\). Its task is to distinguish with bounded error between the cases where

(i) the function \(f\) is a \(k\)-junta in the sense that there is a set \(J = \{j_1, \ldots, j_k\} \subseteq [n]\) of at most \(k\) coordinates such that for any input \(x \in \{0,1\}^n\), the value \(f(x)\) is completely determined by the values of the inputs \(x_{j_1},\ldots,x_{j_k}\); and

(ii) the function is far from being a \(k\)-junta in the sense that for any \(k\)-junta \(g\), \(|\{x \in \{0,1\}^n : f(x) \neq g(x) \}| \ge \frac1{100} 2^n.\)
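To make the two cases concrete, the following brute-force sketch (feasible only for very small \(n\)) computes the distance from \(f\) to the nearest \(k\)-junta; the function name and the exhaustive search over coordinate sets are our own illustration, not the efficient tester discussed next.

```python
from itertools import combinations, product

def distance_to_k_junta(f, n, k):
    """Return the minimum of |{x : f(x) != g(x)}| over all k-juntas g, by brute force."""
    best = 2 ** n
    for J in combinations(range(n), k):
        # For a fixed coordinate set J, the best junta answers each restriction
        # x|_J with the majority value of f on that fiber, so the distance is
        # the total number of minority values.
        fibers = {}
        for x in product((0, 1), repeat=n):
            key = tuple(x[j] for j in J)
            fibers.setdefault(key, []).append(f(x))
        disagreements = sum(min(sum(v), len(v) - sum(v)) for v in fibers.values())
        best = min(best, disagreements)
    return best

# The parity of all 4 bits disagrees with every 3-junta on half of the 16 inputs.
parity = lambda x: sum(x) % 2
print(distance_to_k_junta(parity, n=4, k=3))  # prints 8
```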

There is an algorithm that can test \(k\)-juntas while querying the value of \(f\) on only \(O(k \log k)\) inputs in \(\{0,1\}^n\). And the \(k\mathsf{Disj}_n\) function, the restriction of set disjointness in which Alice's and Bob's inputs are subsets of \([n]\) of cardinality exactly \(k\), can be used to show that this bound is tight up to a logarithmic factor.

Theorem 3. Any algorithm for testing \(k\)-juntas has query complexity \(\Omega(k)\).

Proof. Let \(A\) be a \((2k-1)\)-junta tester with query complexity \(q\). We can design an efficient protocol \(\pi\) for \(k\mathsf{Disj}_n\) using \(A\) in the following way.

Let \(S\) and \(T\) denote Alice and Bob's inputs to the \(k\mathsf{Disj}_n\) function. Alice and Bob use their public randomness to simulate \(A\) on the function \(f = \bigoplus_{S \triangle T} = \bigoplus_S \oplus \bigoplus_T\). For every query that \(A\) makes on an input \(x \in \{0,1\}^n\), Alice and Bob exchange the values \(a_x = \bigoplus_S(x)\) and \(b_x = \bigoplus_T(x)\) and continue simulating \(A\) as if it had received the value \(f(x) = a_x \oplus b_x\) in response to its query. Once \(A\) terminates, Alice and Bob output 1 if and only if \(A\) claims that \(f\) is not a \((2k-1)\)-junta.

Each query of \(A\) is answered with 2 bits of communication, so the total communication cost of \(\pi\) is \(O(q)\). And the correctness of the protocol follows from two facts. First, when \(S\) and \(T\) are disjoint \(k\)-subsets of \([n]\), then \(|S \triangle T| = |S| + |T| = 2k\) and a parity function on \(2k\) coordinates is far from being a \((2k-1)\)-junta: it disagrees with any function that does not depend on all \(2k\) of these coordinates on exactly half of all inputs. Second, when \(S\) and \(T\) intersect, then \(|S \triangle T| = |S| + |T| - 2|S \cap T| \le 2k-2 < 2k-1\) and the parity function \(\bigoplus_{S \triangle T}\) is a \((2k-1)\)-junta.

Therefore, the argument above shows that \(\mathrm{R}(k\mathsf{Disj}_n) = O(q)\). And Theorem 1 also shows that \(\mathrm{R}(k\mathsf{Disj}_n) = \Omega(k)\), since an instance of set disjointness on a universe of size \(\Theta(k)\) embeds into \(k\mathsf{Disj}_n\) by padding Alice's and Bob's sets with dummy elements from disjoint parts of the universe. So any \(k\)-junta tester must have query complexity at least \(q = \Omega(k)\).
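Concretely, the two bits exchanged for each query in the protocol above can be computed as in this short sketch (the names parity_on and answer_query are ours, chosen for illustration):

```python
def parity_on(coords, x):
    """One bit: the parity of the input x restricted to the given coordinate set."""
    return sum(x[i] for i in coords) % 2

def answer_query(S, T, x):
    """Answer one query of the junta tester to f, the parity on the symmetric difference of S and T."""
    a_x = parity_on(S, x)   # Alice's bit
    b_x = parity_on(T, x)   # Bob's bit
    return a_x ^ b_x        # coordinates in the intersection cancel, leaving the parity on the symmetric difference

# With S = {0, 1} and T = {1, 2}, f is the parity of coordinates 0 and 2.
print(answer_query({0, 1}, {1, 2}, (1, 1, 0)))  # prints 1
```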

One-sided Discrepancy

Recall that the matrix representation of the set disjointness function looks like this:

[Figure: Communication matrix of Disjointness]
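For \(n = 2\), with rows indexed by Alice's set \(S\), columns by Bob's set \(T\), and an entry \(1\) exactly when \(S \cap T = \emptyset\), the matrix is:

\[\begin{array}{c|cccc}
 & \emptyset & \{1\} & \{2\} & \{1,2\} \\ \hline
\emptyset & 1 & 1 & 1 & 1 \\
\{1\} & 1 & 0 & 1 & 0 \\
\{2\} & 1 & 1 & 0 & 0 \\
\{1,2\} & 1 & 0 & 0 & 0
\end{array}\]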

This matrix has a large 0-monochromatic rectangle. But it does not appear to contain any large 1-monochromatic rectangle. And indeed we can prove that even for larger values of \(n\), there is a natural distribution \(\mu\) on inputs for which any nearly 1-monochromatic rectangle must have very small measure with respect to \(\mu\). This is the method that Babai, Frankl, and Simon (1986) used to obtain the first polynomial lower bound on the randomized communication complexity of set disjointness.

In the rest of this section, let \(X\) denote the set of subsets of \([n]\) of cardinality exactly \(\sqrt{n}\) (assume for simplicity that \(\sqrt{n}\) is an integer). And let \(\mu\) denote the distribution on pairs \((S,T)\) where \(S\) and \(T\) are drawn independently and uniformly at random from \(X\).

The one-sided discrepancy result at the heart of the argument is the following.

Lemma 4. Let \(R = A \times B\) be any rectangle with \(A, B \subseteq X\) for which at most an \(\epsilon\) fraction of the pairs in \(R\) are intersecting. Then there is a constant \(c > 0\), depending only on \(\epsilon\), such that

\[\min\{|A|, |B|\} \le 2^{-c \sqrt{n} + 1} |X|.\]

Proof. Suppose that \(|A| > 2^{-c \sqrt{n} + 1} |X|\); we will show that \(|B| \le 2^{-c \sqrt{n} + 1} |X|\). Define \(A_1 \subseteq A\) to be the collection of sets in \(A\) that intersect at most a \(2\epsilon\) fraction of the sets in \(B\). By Markov's inequality, \(|A_1| \ge \frac{|A|}2 > 2^{-c \sqrt{n}} |X|.\)
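Spelled out, this application of Markov's inequality uses the fact that the average, over \(S \in A\), of the fraction of sets in \(B\) that \(S\) intersects is at most \(\epsilon\):

\[\frac{1}{|A|} \sum_{S \in A} \frac{|\{T \in B : S \cap T \neq \emptyset\}|}{|B|} \le \epsilon,\]

so at most half of the sets in \(A\) can intersect more than a \(2\epsilon\) fraction of the sets in \(B\).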

We claim that there are \(k = \sqrt{n}/3\) sets \(S_1,\ldots,S_k\) in \(A_1\) such that each set \(S_\ell\), \(1 \le \ell \le k\), satisfies

\[\left| S_\ell \setminus \left( \bigcup_{i < \ell} S_i \right) \right| \ge \frac{\sqrt{n}}2.\]

Indeed, suppose that \(S_1,\ldots,S_{\ell-1}\) have already been chosen. The size of the sets in \(X\) guarantees that \(| \bigcup_{i < \ell} S_i | \le \ell \sqrt{n} \le \frac{n}{3}\). And the number of sets \(S\) in \(X\) for which \(|S \cap (\bigcup_{i < \ell} S_i)| > \frac{\sqrt{n}}2\) is at most

\[\sum_{m=\sqrt{n}/2}^{\sqrt{n}} \binom{n/3}{m}\binom{2n/3}{\sqrt{n} - m} < 2^{-c \sqrt{n}} |X|.\]

Since this number is less than \(|A_1|\), there must exist a set \(S_\ell\) in \(A_1\) that satisfies the required condition.

The condition on the sets \(S_1,\ldots,S_k\) implies that their union has cardinality at least

\[\left| \bigcup_{i=1}^{k} S_i \right| \ge k \frac{\sqrt{n}}2 = \frac{n}{6}.\]

And the fact that they are all in \(A_1\) implies that they each intersect at most a \(2\epsilon\) fraction of the sets in \(B\). By Markov's inequality, this means that at least half of the sets in \(B\) intersect at most \(4\epsilon k\) of the \(k\) sets \(S_1,\ldots,S_k\). It follows that \(|B|\) is bounded above by

\[\frac{|B|}2 \le \binom{k}{4\epsilon k} \binom{5n/6}{\sqrt{n}} < 2^{-c \sqrt{n}} |X|,\]

and so \(|B| < 2^{-c \sqrt{n} + 1} |X|\), as required.

With Lemma 4, we can easily obtain our lower bound on the communication complexity of disjointness.

Theorem 5. \(\mathrm{R}_\epsilon(\mathsf{Disj}_n) = \Omega(\sqrt{n}).\)

Proof. By the easy direction of Yao’s Minimax Principle, it suffices to show that for the distribution \(\mu\) defined above, \(\mathrm{D}_\epsilon^\mu(\mathsf{Disj}_n) = \Omega(\sqrt{n})\).

Fix any deterministic protocol \(\pi\) that computes \(\mathsf{Disj}_n\) with error at most \(\epsilon\) on inputs \((S,T)\) with \(S\) and \(T\) drawn independently and uniformly at random from \(X\). With constant probability the sets \(S\) and \(T\) are disjoint, and since \(\pi\) errs with probability at most \(\epsilon\), with constant probability the leaf reached by \(\pi\) corresponds to a nearly 1-monochromatic rectangle. By Lemma 4, however, each such rectangle has measure at most \(2^{-c \sqrt{n} + 1}\) with respect to \(\mu\). This means that there must be at least \(\Omega(2^{c \sqrt{n} - 1})\) such leaves, and so the depth of the communication tree for \(\pi\) must be at least \(\Omega(\sqrt{n})\).
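In more detail, the measure bound used above follows from Lemma 4: if \(R = A \times B\) is a nearly 1-monochromatic rectangle, then

\[\mu(R) = \frac{|A| \cdot |B|}{|X|^2} \le \frac{\min\{|A|, |B|\} \cdot |X|}{|X|^2} \le 2^{-c\sqrt{n}+1}.\]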

The bound of \(\Omega(\sqrt{n})\) in Theorem 5 is tight if we consider only product distributions, where Alice and Bob's inputs are drawn independently of each other. To obtain the stronger linear lower bound, we must consider a different distribution \(\mu\) on the inputs where \(S\) and \(T\) are no longer drawn independently. This is what we will examine in the next lecture.