7. Query Hierarchy Theorems

The last two lectures showed that even the most basic questions “Does randomness help reduce query complexity?” and “If so, by how much?” lead to surprisingly rich lines of research. Let us now continue along this line by asking another deceptively simple question:

What values can $R(f)$ take?

Every non-constant function $f \colon \{0,1\}^n \to \{0,1\}$ has query complexity bounded below by $R(f) \ge 1$ and above by $R(f) \le n$. Are there functions with each of the possible query complexities in between those two extremes? And what about when we restrict our attention to various classes of functions? Such questions give rise to Query Hierarchy Theorems (and some associated Gap Theorems).

But before we dive into these questions, let’s first take a detour and examine the randomized query complexity of two special functions.

Parity and Gap Majority

The parity (or XOR) function on $n$ bits is the total Boolean function $\mathsf{Parity}_n \colon \{0,1\}^n \to \{0,1\}$ defined by

$$\mathsf{Parity}_n(x) = x_1 \oplus x_2 \oplus \cdots \oplus x_n = \sum_{i=1}^n x_i \pmod 2.$$

With an adversary argument, we can show that $D(\mathsf{Parity}_n) = n$. (In fact, the adversary argument shows the even stronger claim that every leaf of a decision tree that computes the parity function must have depth $n$.) We can also show that the bounded-error randomized query complexity of the parity function is $R_\epsilon(\mathsf{Parity}_n) = \Omega(n)$ for every $\epsilon$ bounded away from $\frac{1}{2}$.

And even in the small-bias regime, where we consider $\epsilon = \frac{1}{2} - \delta$ for some $\delta = \delta(n)$ (think, for example, of $\delta = 1/\sqrt{n}$), we still have a large lower bound on the query complexity of the function.

Proposition. For every bias function $\delta \colon \mathbb{N} \to (0, \frac{1}{2}]$,

$$R_{\frac{1}{2} - \delta(n)}(\mathsf{Parity}_n) = \Theta(\delta(n) \cdot n).$$

In particular, $R_{\frac{1}{2} - \frac{1}{\sqrt{n}}}(\mathsf{Parity}_n) = \Theta(\sqrt{n})$.
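
To make the upper bound concrete, one natural strategy is to query all $n$ bits with probability $2\delta$ and answer a uniformly random coin otherwise; this achieves bias exactly $\delta$ with $2\delta(n) \cdot n$ queries in expectation. The sketch below is an illustrative simulation of this strategy (the helper name is mine, and measuring the query cost in expectation rather than in the worst case is an assumption of the sketch):

```python
import random

def small_bias_parity(x, delta):
    """Solve-or-guess strategy for Parity_n with bias delta.

    With probability 2*delta, query every bit and answer exactly;
    otherwise answer a uniform coin flip without querying anything.
    Returns (answer, number of queries made).
    """
    n = len(x)
    if random.random() < 2 * delta:
        return sum(x) % 2, n       # queried all n bits, always correct
    return random.randrange(2), 0  # blind guess, zero queries

# Empirical check: success probability ~ 1/2 + delta and
# expected number of queries ~ 2 * delta * n.
n, delta, trials = 100, 0.1, 100_000
x = [random.randrange(2) for _ in range(n)]
target = sum(x) % 2
wins = queries = 0
for _ in range(trials):
    answer, q = small_bias_parity(x, delta)
    wins += (answer == target)
    queries += q
print(f"success rate ~ {wins / trials:.3f} (expect {0.5 + delta})")
print(f"avg queries  ~ {queries / trials:.1f} (expect {2 * delta * n})")
```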

As we will see, the parity function is useful in establishing a number of complexity-theoretic results. Another function that is particularly useful in query complexity is the gap majority function on $n$ bits. It is the partial function defined over $\{0,1\}^n$ by

$$\mathsf{GapMaj}_n(x) = \begin{cases} 1 & \text{if } |x|_1 \ge \frac{n}{2} + \sqrt{n}, \\ 0 & \text{if } |x|_1 \le \frac{n}{2} - \sqrt{n}, \\ \ast & \text{otherwise.} \end{cases}$$

In other words, the gap majority function is the usual Majority function restricted to the domain

$$X = \left\{ x \in \{0,1\}^n : \left| |x|_1 - \tfrac{n}{2} \right| \ge \sqrt{n} \right\}.$$

The restriction on the domain makes the function a little easier to compute than the standard majority function because it guarantees that the Hamming weight of every input in the domain is at least $\sqrt{n}$ away from $\frac{n}{2}$. This gap is not large enough, however, to affect the bounded-error randomized query complexity of the function: when $\epsilon$ is bounded away from $\frac{1}{2}$, then $R_\epsilon(\mathsf{GapMaj}_n) = \Omega(n)$.

In the small bias regime, however, the gap does make a meaningful difference.

Proposition. For every bias function $\delta \colon \mathbb{N} \to (0, \frac{1}{2}]$,

$$R_{\frac{1}{2} - \delta(n)}(\mathsf{GapMaj}_n) = \Theta(\delta(n)^2 \cdot n).$$

In particular, $R_{\frac{1}{2} - \frac{1}{\sqrt{n}}}(\mathsf{GapMaj}_n) = O(1)$.

Note that this bound is much stronger in terms of $\delta$ than the corresponding bound for the parity function. In particular, this bound says that $O(1)$ queries suffice to compute the gap majority function with bias $\delta(n) = 1/\sqrt{n}$. (Indeed, querying a single bit chosen uniformly at random and returning its value gives a randomized algorithm for gap majority with success probability at least $\frac{1}{2} + \frac{1}{\sqrt{n}}$.) By contrast, our bound on the parity function implies that any randomized algorithm with the same bias must have query complexity $\Omega(\sqrt{n})$.
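
The single-query algorithm is easy to simulate, and taking a majority vote over $k$ independently sampled bits illustrates where the $\Theta(\delta(n)^2 \cdot n)$ upper bound comes from: each sampled bit has bias $1/\sqrt{n}$, so a majority of $k$ samples has bias roughly $\sqrt{k}/\sqrt{n}$, and $k = O(\delta^2 n)$ samples give bias $\delta$. A minimal illustrative sketch (the helper name and parameters are mine):

```python
import math
import random

def gap_maj_estimate(x, k):
    """Guess GapMaj(x) by majority vote over k uniformly random bit queries."""
    samples = [x[random.randrange(len(x))] for _ in range(k)]
    return 1 if sum(samples) > k / 2 else 0

n = 10_000
gap = int(math.sqrt(n))
# A 1-input of GapMaj_n: Hamming weight exactly n/2 + sqrt(n).
x = [1] * (n // 2 + gap) + [0] * (n // 2 - gap)
random.shuffle(x)

trials = 20_000
for k in (1, 25, 101):  # odd k avoids ties in the majority vote
    wins = sum(gap_maj_estimate(x, k) == 1 for _ in range(trials))
    print(f"k = {k:3d} queries: success rate ~ {wins / trials:.3f}")
```

With one query the success rate hovers near $\frac{1}{2} + \frac{1}{\sqrt{n}}$, and increasing $k$ boosts the bias as predicted.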

A Hierarchy Gap Theorem

Using the parity function on a subset of the $n$ bits in the domain, we can easily show that for every function $q \colon \mathbb{N} \to \mathbb{N}$ with $q(n) \le n$ and every $n \in \mathbb{N}$, there is a function $f \colon \{0,1\}^n \to \{0,1\}$ with query complexity $R_{1/3}(f) = \Theta(q(n))$.
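
Concretely, the construction simply applies parity to the first $q(n)$ bits and ignores the rest. As an illustrative sketch:

```python
def parity_junta(x, q):
    """f(x) = parity of the first q bits of x; all other bits are ignored.

    Restricted to its q relevant bits, f is just Parity_q, so
    R(f) = Theta(q): q queries always suffice, and the parity
    lower bound shows that Omega(q) queries are necessary.
    """
    return sum(x[:q]) % 2
```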

But the observation above is not entirely satisfactory because the functions used in the construction depend on only a subset of the $n$ bits of the input. Such functions are called juntas, and what we would really like a hierarchy theorem to show is that even if we ignore all junta functions, we can still achieve all possible query complexities.

Definition. A total function $f \colon \{0,1\}^n \to \{0,1\}$ is truly $n$-variate if for every index $i \in [n]$, there exists an input $x \in \{0,1\}^n$ such that $f(x) \neq f(x^{(i)})$, where $x^{(i)}$ denotes $x$ with its $i$th bit flipped.

Can truly $n$-variate functions achieve all possible query complexities? In the deterministic setting, the answer is no.

Proposition. Every truly $n$-variate Boolean function $f \colon \{0,1\}^n \to \{0,1\}$ has deterministic query complexity $D(f) \ge \log n$. This bound is asymptotically tight: there exist such functions $f$ with $D(f) = O(\log n)$.

Proof. The lower bound on $D(f)$ follows immediately from two observations: a binary tree of height $h$ contains at most $2^h$ nodes, and if $f$ is truly $n$-variate, any decision tree that computes it must query each of the $n$ indices in at least one of its nodes. A tree of height $h$ computing $f$ must therefore satisfy $2^h \ge n$.

Many different functions can be constructed to show that the $\log n$ bound is asymptotically tight. One construction that is particularly useful because of its versatility is the Index function. This function on $n = k + 2^k$ bits interprets an input $x$ as a tuple $(a, y)$ with $a \in \{0,1\}^k$ and $y \in \{0,1\}^{2^k}$, and is defined by $\mathsf{Index}(a, y) = y_a$. In other words, it returns the value of the coordinate of $y$ indexed by the address $a$. As we can verify directly, the function is truly $n$-variate and it can be computed by a deterministic decision tree of depth $k + 1 = O(\log n)$.
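
That decision tree is easy to make explicit: read the $k$ address bits, then read the single coordinate of $y$ they point to. An illustrative sketch (the helper name is mine; the address $a$ is identified with the integer it encodes in binary):

```python
import random

def index_query(x, k):
    """Evaluate Index on x = (a, y) with |a| = k and |y| = 2**k,
    counting the number of bits queried."""
    queries = 0

    def query(i):
        nonlocal queries
        queries += 1
        return x[i]

    a = 0
    for i in range(k):            # k queries learn the address a
        a = 2 * a + query(i)
    return query(k + a), queries  # one more query reads y_a

# Example: k = 3, so n = 3 + 2**3 = 11 bits.
k = 3
x = [random.randrange(2) for _ in range(k + 2**k)]
print(index_query(x, k))  # (value of y_a, 4)
```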

What about randomized query complexity? The simple argument that we used for deterministic decision trees no longer holds when we consider randomized decision trees, because the trees in the support of the distribution that corresponds to a randomized decision tree do not all have to query every index of the input. Yet the same lower bound on the query complexity of truly $n$-variate functions still holds.

Theorem. Every truly $n$-variate Boolean function $f \colon \{0,1\}^n \to \{0,1\}$ has randomized query complexity $R_{1/3}(f) = \Omega(\log n)$.

Proof. The lower bound follows from a matching lower bound on the maximum sensitivity of truly $n$-variate functions: the result that $s(f) = \Omega(\log n)$ for every truly $n$-variate function $f$ is due to Simon (1982), and combining it with the bound $R_{1/3}(f) = \Omega(s(f))$ gives the theorem.

The bound on the minimum randomized query complexity of truly $n$-variate functions is again asymptotically tight, as witnessed by the Index function.

And the set of all truly $n$-variate functions can be restricted even further. For example, we can consider the set of monotone truly $n$-variate functions or the set of all symmetric functions. In some cases, these further restrictions give much stronger gap theorems:

Theorem. Every non-constant symmetric function $f \colon \{0,1\}^n \to \{0,1\}$ has randomized query complexity $R_{1/3}(f) = \Theta(n)$.

And in other cases, the additional restrictions do not affect the minimum query complexity of functions:

Theorem. Every monotone truly $n$-variate function $f \colon \{0,1\}^n \to \{0,1\}$ has randomized query complexity $R_{1/3}(f) = \Omega(\log n)$. This bound is tight for some monotone truly $n$-variate functions.

Proof. The lower bound is the one that holds for all truly $n$-variate functions.

For the matching upper bound, consider the monotone variant of the Index function. This function is defined on $n = k + \binom{k}{k/2}$ bits for some even $k$. We again consider the first $k$ bits to represent an address, but in this case only the strings $a \in \{0,1\}^k$ with Hamming weight exactly $k/2$ represent valid addresses, each associated with one of the $\binom{k}{k/2}$ coordinates of $y$. The MonotoneIndex function is defined as the standard Index function on valid addresses. On all other addresses, the function evaluates to 1 if the address has more 1s than 0s and to 0 otherwise. We can again verify directly that the function is monotone and truly $n$-variate, and that it can be computed by a deterministic decision tree of depth $k + 1 = O(\log n)$.
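
A sketch of that decision tree appears below (illustrative code; in particular, matching valid addresses to coordinates of $y$ by lexicographic rank is one arbitrary choice of bijection, and any fixed bijection works):

```python
from itertools import combinations

def monotone_index(x, k):
    """Evaluate MonotoneIndex on x = (a, y), where a is the first k bits
    (k even) and y has one bit per weight-k/2 string in {0,1}^k.
    Returns (value, number of bits queried); at most k + 1 bits are read."""
    a = x[:k]                     # k queries read the address
    weight = sum(a)
    if weight > k // 2:           # invalid heavy address: output 1
        return 1, k
    if weight < k // 2:           # invalid light address: output 0
        return 0, k
    # Valid address: find its lexicographic rank among weight-k/2 strings.
    support = tuple(i for i, bit in enumerate(a) if bit == 1)
    valid = list(combinations(range(k), k // 2))
    rank = valid.index(support)
    return x[k + rank], k + 1     # one more query reads that coordinate of y

# Example: k = 4, so n = 4 + C(4, 2) = 10 bits.
x = [1, 0, 1, 0] + [0, 1, 0, 1, 1, 0]
print(monotone_index(x, 4))  # address (1, 0, 1, 0) has rank 1, so output x[5]
```

Increasing any address bit can only move the output from 0 to a $y$-coordinate, or from a $y$-coordinate to 1, which is why the function is monotone.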

What about other classes of functions? One particularly interesting special case is the set of all transitive-symmetric Boolean functions: a function $f \colon \{0,1\}^n \to \{0,1\}$ is transitive-symmetric if for every pair of indices $i, j \in [n]$, there exists a permutation $\pi$ of $[n]$ with $\pi(i) = j$ such that for every $x \in \{0,1\}^n$, the string $y = x_{\pi(1)} x_{\pi(2)} \cdots x_{\pi(n)}$ satisfies $f(y) = f(x)$. The randomized evasiveness conjecture for monotone graph properties discussed in the first lecture is a special case of the following conjectured Gap Theorem for transitive-symmetric functions:

Randomized Evasiveness Conjecture. Every non-constant monotone transitive-symmetric function $f \colon \{0,1\}^n \to \{0,1\}$ has randomized query complexity $R_{1/3}(f) = \Theta(n)$.

A Hierarchy Theorem for Partial Functions

If we turn our attention to partial Boolean functions, we can again easily obtain a hierarchy theorem for randomized query complexity. This time, we can start with the Gap Majority function and vary the size of the gap in Hamming weights in the definition of the domain $X$ of the function. As we saw above, when the gap has size $\sqrt{n}$, the Gap Majority function has randomized query complexity $\Omega(n)$. At the other extreme, when the gap has size linear in $n$, say $\frac{n}{4}$, the resulting (Large) Gap Majority function has query complexity $O(1)$. And analyzing the possible gap sizes between those two extremes gives functions with query complexity everywhere between $O(1)$ and $\Omega(n)$.

But again this answer is not completely satisfying. Or, perhaps a better way to put it is that this observation raises an immediate follow-up question: is it just the size of the gap (appropriately defined and generalized) that determines the randomized query complexity of all partial functions? We can make this question precise by introducing the notion of the margin of a partial function.

Definition. The margin of a partial function $f \colon X \to \{0,1\}$ for some $X \subseteq \{0,1\}^n$ is the minimum Hamming distance between inputs in $X$ that take different values under $f$:

$$\mathrm{margin}(f) = \min\{\, |x \oplus y|_1 : x, y \in X,\ f(x) \neq f(y) \,\}.$$
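
For small domains, the margin can be computed directly from the definition. A brute-force illustrative sketch, with $f$ represented as a dictionary over its domain $X$:

```python
def margin(f):
    """Minimum Hamming distance between differently-labelled inputs.

    f maps inputs (tuples of bits) to labels 0/1; inputs outside
    the domain X simply do not appear in the dictionary."""
    points = list(f.items())
    return min(
        sum(a != b for a, b in zip(x, y))   # Hamming distance
        for i, (x, fx) in enumerate(points)
        for y, fy in points[i + 1:]
        if fx != fy
    )

# A few points of GapMaj_4 (gap 1): weight >= 3 maps to 1, weight <= 1 to 0.
f = {(1, 1, 1, 0): 1, (1, 1, 1, 1): 1, (1, 0, 0, 0): 0, (0, 0, 0, 0): 0}
print(margin(f))  # 2, e.g. (1, 1, 1, 0) vs (1, 0, 0, 0)
```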

Let us say that a partial function $f \colon X \to \{0,1\}$, $X \subseteq \{0,1\}^n$, has linear margin when $\mathrm{margin}(f) = \Omega(n)$. As we have seen with the large gap majority function, there exist some non-constant partial functions with linear margin that have constant randomized query complexity. Is that the case for all linear-margin functions? Or is the randomized query complexity of all these functions bounded by some other sublinear function?

The answer is no: having a linear margin is not sufficient to guarantee having small randomized query complexity.

Theorem. There exists a partial function $f \colon X \to \{0,1\}$, $X \subseteq \{0,1\}^n$, with $\mathrm{margin}(f) = \Omega(n)$ and randomized query complexity $R_{1/3}(f) = \Omega(n)$.

Proof sketch. Following Goldreich, Goldwasser, and Ron (1998), one way to prove the theorem is via error-correcting codes.

We know that there is a set $X \subseteq \{0,1\}^n$ of $|X| = 2^{\Omega(n)}$ points in the hypercube such that every pair of points in $X$ has Hamming distance $\Omega(n)$ between them. (Taking $X$ to be a random set of points and removing any pair that does not satisfy the condition suffices to establish this claim; there are also more explicit constructions that can be used.)
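
This probabilistic construction is simple enough to run. The sketch below is an illustration of the standard greedy variant of the argument (the parameters are arbitrary small choices of mine): it keeps a random point only when it is far from every point kept so far, and for suitable constants the surviving set has size $2^{\Omega(n)}$.

```python
import random

def random_code(n, min_dist, num_samples):
    """Greedily collect random points of {0,1}^n that lie at Hamming
    distance >= min_dist from every point collected so far."""
    code = []
    for _ in range(num_samples):
        x = tuple(random.randrange(2) for _ in range(n))
        if all(sum(a != b for a, b in zip(x, y)) >= min_dist for y in code):
            code.append(x)
    return code

n = 20
X = random_code(n, min_dist=n // 4, num_samples=5_000)
print(len(X))  # many points, pairwise at distance >= n/4
```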

Let us define $f \colon X \to \{0,1\}$ by setting its value independently at random for each point in $X$. The minimum distance between points in $X$ guarantees that $f$ has linear margin. And we can show that for any decision tree with query complexity $o(n)$, the uniform distribution on points in $X$ leads to leaves that have almost the same probability of being reached by points in $f^{-1}(0)$ and in $f^{-1}(1)$, so that no such decision tree can compute $f$ with bounded error.

Building on this result, we can obtain a full (asymptotic) hierarchy theorem for partial functions with linear margin.

Once again, we can ask about particular sets of partial functions with linear margin. Doing so leads to property testing, which considers the partial functions that arise from properties of combinatorial objects (such as graphs or functions) and the number of queries required to distinguish the objects that have the property from those that are far from having it.