5. Query Complexity I

Let us now change our focus from the last few lectures and dive into complexity theoretic questions related to specific models of computation. We begin with randomized query complexity.

Query Complexity

Query complexity considers the simple computational model where an algorithm gets some input $x$ and aims to compute the value $f(x)$ for some function $f$. More generally, in some cases the goal is to compute a relation instead, but to keep the discussion simple, we focus on functions throughout this lecture. In fact, we focus only on Boolean functions, where every input $x$ is a binary string in $\{0,1\}^n$ and the value of the function is either 0 or 1.

In the query model, the input $x \in \{0,1\}^n$ to the algorithm is initially unknown. The algorithm can query any bit of the input by specifying an index $i \in [n]$ and observing the value $x_i$. It can then perform any computation it wants (using the values uncovered so far) and then either decide to query another bit or, when it is done, to output the value of the function $f$. The cost $\mathrm{cost}(D,x)$ of a deterministic algorithm $D$ on input $x$ in this model is the number of queries that $D$ makes to $x$ before it halts.

Query complexity is also called decision tree complexity because query algorithms have a natural representation as decision trees. In our setting of Boolean values, decision trees are rooted binary trees whose internal nodes are each labelled with an index in $[n]$ and where the two edges from an internal node to its children are labelled with 0 and 1, respectively. The leaves of the decision tree are labelled with either 0 or 1. An input $x \in \{0,1\}^n$ induces a unique path to a leaf in this tree when we start at the root and, at each internal node labelled with $i$, follow the edge to its child labelled with the value $x_i$. The value that the algorithm outputs is the label of the leaf reached in the decision tree by the given input.

The cost of a deterministic algorithm on a given input corresponds to the depth of the leaf in its tree reached by that input. And so the worst-case cost of a deterministic algorithm corresponds to the height (or maximum depth) of the corresponding decision tree. A randomized algorithm in the query complexity setting corresponds to a distribution over decision trees. The worst-case and average-case costs of randomized algorithms and the corresponding query complexity measures of functions follow the definitions introduced in the Definitions lecture.
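To make the decision tree view concrete, here is a minimal Python sketch (our own illustration; the names `Node` and `evaluate` are not standard) of a decision tree together with an evaluation routine that reports the depth of the leaf that is reached:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    """A decision tree node: a leaf carries an output bit, an internal node
    carries the index it queries and two children indexed by the bit read."""
    output: Optional[int] = None                       # label of a leaf
    index: Optional[int] = None                        # coordinate queried at an internal node
    children: Optional[Tuple["Node", "Node"]] = None   # (child followed on 0, child followed on 1)

def evaluate(tree, x):
    """Follow the unique root-to-leaf path that x induces in the tree and
    return the leaf's label along with the number of queries made (its depth)."""
    node, queries = tree, 0
    while node.output is None:
        queries += 1                                   # query the bit at node.index
        node = node.children[x[node.index]]
    return node.output, queries

# Example: a height-2 tree computing AND(x_0, x_1).
and_tree = Node(index=0, children=(
    Node(output=0),                                    # x_0 = 0: output 0 immediately
    Node(index=1, children=(Node(output=0), Node(output=1))),
))
assert evaluate(and_tree, [1, 1]) == (1, 2)
assert evaluate(and_tree, [0, 1]) == (0, 1)
```

The example tree computes the AND of $x_0$ and $x_1$; its height is 2, matching the worst-case cost of the obvious deterministic algorithm for AND on two bits.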

Our ultimate goal in studying randomized query complexity is a complete understanding of the query complexity of all functions. There are many other interesting questions we can ask along the way: Which functions have “small” query complexity? (Here “small” can be interpreted as constant, logarithmic in the length n of the inputs, sublinear in n, or anything else in between.) What properties of functions are necessary or sufficient to guarantee small or large randomized query complexity? How does randomized query complexity relate to other measures of complexity? As a first step, we start with what should be a very basic question:

Does randomness even help in the query complexity setting?

As we will see, even this preliminary question is far from trivial; it corresponds to the task of proving (the existence or absence of) separations between deterministic and randomized query complexity.

Separations between $D$ and $R_0$

There are two ways that randomness might help reduce the query complexity of specific functions. First, it might be that the average cost of randomized decision trees is less than the worst-case cost for some functions. Second, it might be that allowing for some bounded error probability makes the computation of some functions much easier. We examine both of these possible sources of efficiency improvement separately.

If we consider partial functions $f : X \to \{0,1\}$ for some subset $X \subseteq \{0,1\}^n$, we can establish the largest possible gap between deterministic and zero-error randomized query complexity.

Theorem 1. There exists a partial function $f : X \to \{0,1\}$, $X \subseteq \{0,1\}^n$, for which

$$D(f) = \Omega(n) \qquad \text{and} \qquad R_0(f) = O(1).$$

One way to design a partial function to prove Theorem 1 is to construct a natural extension of a search problem where there are many copies of the element we’re trying to find. Specifically, partition the $n$ bits into $n/2$ blocks of 2 bits each. Let’s say that a block is a key block when the first of its 2 bits is a 1, and that it is a dummy block otherwise. Let $X \subseteq \{0,1\}^n$ be the set of strings where exactly half of the blocks are key blocks, and where each of those key blocks has the same value for its second bit. (There is no restriction on the second bit of dummy blocks.) The value of the function on inputs in $X$ is the value of the second bit in the key blocks.
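Here is a minimal Python sketch of the zero-error randomized algorithm hiding behind Theorem 1 (the function name `find_key_block` is ours, and the code assumes the promise that the input lies in $X$):

```python
import random

def find_key_block(x):
    """Zero-error randomized algorithm for the partial function above (a sketch).
    Assumes the promise that x lies in X: half of the 2-bit blocks are key blocks
    (first bit 1), and every key block's second bit equals the value of the
    function.  Returns that value and the number of bits queried."""
    queries = 0
    while True:
        b = random.randrange(len(x) // 2)   # pick one of the n/2 blocks uniformly at random
        queries += 1
        if x[2 * b] == 1:                   # query the block's first bit: is it a key block?
            queries += 1
            return x[2 * b + 1], queries    # query its second bit and output it
```

Since exactly half of the blocks are key blocks, each iteration of the loop finds one with probability 1/2, so the expected number of queries is $O(1)$. A deterministic algorithm, in contrast, can be forced by an adversary to probe a constant fraction of the blocks before it ever sees a key block, which is where the $\Omega(n)$ lower bound comes from.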

If we consider total functions, however, the situation is very different: in this case, there can be at most a quadratic gap between the deterministic and zero-error randomized query complexity.

Theorem 2. For every Boolean function $f : \{0,1\}^n \to \{0,1\}$,

$$D(f) \le R_0(f)^2.$$

The easiest way to prove this theorem, somewhat counter-intuitively, is to prove an even stronger bound. A certificate for an input $x$ with respect to the function $f$ is a set $I \subseteq [n]$ of coordinates such that for every $y \in \{0,1\}^n$ that satisfies $y_I = x_I$ (which is shorthand for the statement that $y_i = x_i$ for every $i \in I$), we have $f(y) = f(x)$. The certificate complexity of a function $f$ is

$$C(f) = \max_{x \in \{0,1\}^n} \min\{|I| : I \subseteq [n] \text{ is a valid certificate for } x\}.$$
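As a sanity check on the definition, here is a brute-force Python sketch that computes $C(f)$ for tiny $n$ (the helper names are ours, and nothing here is meant to be efficient):

```python
from itertools import combinations, product

def is_certificate(f, n, x, I):
    """True if fixing the coordinates in I to x's values forces the value of f:
    every y that agrees with x on I satisfies f(y) == f(x)."""
    return all(f(y) == f(x)
               for y in product((0, 1), repeat=n)
               if all(y[i] == x[i] for i in I))

def min_certificate(f, n, x):
    """A smallest certificate for x, found by brute force over all subsets of [n]."""
    for size in range(n + 1):
        for I in combinations(range(n), size):
            if is_certificate(f, n, x, set(I)):
                return set(I)

def certificate_complexity(f, n):
    """C(f): the largest, over all inputs x, of the size of a smallest certificate for x."""
    return max(len(min_certificate(f, n, x)) for x in product((0, 1), repeat=n))

# Example: C(OR_3) = 3, since the all-zeros input needs every coordinate revealed.
assert certificate_complexity(lambda y: int(any(y)), 3) == 3
```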

Note that an algorithm (deterministic or randomized) that computes f with zero error can only halt on input x after it has found a valid certificate for x. As a result, we have the following relation.

Lemma 3. For every function $f$, $R_0(f) \le C(f)$.

We can also further refine the notion of certificate complexity by considering certificates for 0-valued and 1-valued inputs separately:

$$C_1(f) = \max_{x \in f^{-1}(1)} \min\{|I| : I \subseteq [n] \text{ is a valid certificate for } x\};$$
$$C_0(f) = \max_{x \in f^{-1}(0)} \min\{|I| : I \subseteq [n] \text{ is a valid certificate for } x\}.$$

These measures correspond to the DNF width and CNF width complexity of f, respectively. And they upper bound deterministic query complexity in the following way.

Theorem 4. For every function $f : \{0,1\}^n \to \{0,1\}$,

$$D(f) \le C_0(f) \cdot C_1(f).$$

The key observation at the heart of the proof of this theorem is the following: for any 0-certificate $I_0$ and 1-certificate $I_1$, we must have $I_0 \cap I_1 \neq \emptyset$. (Why?) This means that if we query all the coordinates in any one of the 0-certificates, we eliminate at least one coordinate from each of the 1-certificates. So we can complete at most $C_1(f)$ rounds (each making at most $C_0(f)$ queries) before the revealed bits of the input must be consistent with some certificate.
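The proof can be phrased as an explicit (and wildly inefficient) query algorithm. The sketch below reuses `min_certificate` from the earlier code block and assumes $f$ is given as a function we can evaluate on every input; it is only meant to illustrate the round structure of the argument, not to be an efficient procedure.

```python
from itertools import product

def evaluate_via_certificates(f, n, hidden):
    """Deterministic algorithm behind Theorem 4 (a sketch).  In each round it
    finds a 0-input consistent with the bits revealed so far and queries the
    coordinates of one of its smallest 0-certificates; it halts after at most
    C_1(f) rounds of at most C_0(f) queries each."""
    revealed = {}                                       # queried coordinates of the hidden input
    while True:
        # Find a 0-input that agrees with every coordinate revealed so far.
        witness = next((y for y in product((0, 1), repeat=n)
                        if f(y) == 0 and all(y[i] == v for i, v in revealed.items())),
                       None)
        if witness is None:                             # revealed bits already force the value 1
            return 1
        cert = min_certificate(f, n, witness)
        for i in cert:                                  # query the 0-certificate's coordinates
            revealed.setdefault(i, hidden[i])
        if all(hidden[i] == witness[i] for i in cert):  # hidden input matches the 0-certificate
            return 0
```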

So we know that the deterministic and zero-error randomized query complexities of total functions cannot be arbitrarily far apart from each other… but could we strengthen the lower bound even further, or is there a polynomial separation between the two measures for some functions?

Recursive NAND function

If you stop and really try to answer the last question from scratch, you may find that it is not as easy as one might think to find a total function for which a separation exists. For a very large number of Boolean functions (like the Majority function, AND or OR functions, dictator functions, XOR function, etc.) there is in fact at most a constant-factor gap between their deterministic and zero-error randomized query complexities. To find an asymptotic separation, it appears that we have to design a function specifically with our goal in mind. As it turns out, binary trees are a great way to do that.

Consider the binary NAND gate: it takes in two binary inputs, and outputs 1 if and only if the two inputs are not both 1. We can obtain a Boolean function on $n = 2^h$ variables by constructing a formula in the shape of a complete binary tree of depth $h$, where each node is a NAND gate and each input variable is associated with one of the leaves of the tree. We call this function the recursive NAND function (of depth $h$) and denote it by $\mathrm{NAND}_h$. The function looks like this:

[Figure: the recursive NAND function $\mathrm{NAND}_h$, drawn as a complete binary tree of NAND gates with the input variables at its leaves.]

We can see how randomness might help reduce our query complexity if we start to evaluate this function: to compute the value of the output of the function, we must first compute the value of one of the two NAND gates below the root gate. If this lower NAND gate evaluates to 1, then we need to evaluate the other second-level NAND gate because its value will determine the value of the function. But if the first lower NAND gate evaluates to 0, then there is no need to evaluate its sibling gate: we already know that the output of the root gate will be 1.

If we have a deterministic algorithm, the above observation does not help, because an adversary can fix the values of the input bits in a way that will always force the value of the first of two sibling NAND gates to evaluate to 1. But with randomness, for every NAND gate that evaluates to 1, if we choose uniformly at random which of its children gates to evaluate first, we have probability at least 1/2 of choosing one that evaluates to 0 and saves us from having to evaluate its sibling gate. Using this strategy recursively, we obtain the following separation result.
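Here is a short Python sketch of this randomized strategy (our own illustration); it evaluates the recursive NAND formula on a list of $2^h$ bits and reports how many of them were queried along the way.

```python
import random

def eval_nand(x, lo, hi, counter):
    """Zero-error randomized evaluation of the recursive NAND formula on the
    subtree whose leaves are x[lo:hi].  counter[0] counts queried input bits."""
    if hi - lo == 1:                      # a leaf: query the input bit
        counter[0] += 1
        return x[lo]
    mid = (lo + hi) // 2
    halves = [(lo, mid), (mid, hi)]
    random.shuffle(halves)                # evaluate a uniformly random child first
    first = eval_nand(x, halves[0][0], halves[0][1], counter)
    if first == 0:                        # NAND(0, anything) = 1: skip the sibling
        return 1
    second = eval_nand(x, halves[1][0], halves[1][1], counter)
    return 1 - (first & second)           # NAND of the two children's values

def nand_h(x):
    """Evaluate the recursive NAND function on x (len(x) must be a power of 2)
    and return (value, number of queries made)."""
    counter = [0]
    return eval_nand(x, 0, len(x), counter), counter[0]
```

Averaging the query counts of `nand_h` over repeated runs on worst-case inputs is a quick way to observe the sublinear behaviour promised by the theorem below.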

Theorem 5. For every $h \ge 1$, the $\mathrm{NAND}_h$ function on $n = 2^h$ variables satisfies

$$D(\mathrm{NAND}_h) = 2^h = n$$

and

$$R_0(\mathrm{NAND}_h) = O\!\left(\left(\frac{1+\sqrt{33}}{4}\right)^{\!h}\right) = O\big(n^{0.753\ldots}\big).$$
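Where does the unusual exponent come from? Here is a sketch of the standard recurrence analysis, glossing over the care needed to justify that the cases below really are the worst ones. Let $T_0(h)$ and $T_1(h)$ denote the worst-case expected number of queries the strategy above makes on a height-$h$ subtree whose value is 0 or 1, respectively. A gate evaluates to 0 only when both of its children evaluate to 1, so both children must be evaluated:

$$T_0(h) \le 2\,T_1(h-1).$$

A gate evaluates to 1 when at least one child evaluates to 0; in the worst case exactly one child is 0, and with probability 1/2 that child is evaluated first and its sibling is skipped:

$$T_1(h) \le \tfrac{1}{2}\,T_0(h-1) + \tfrac{1}{2}\big(T_1(h-1) + T_0(h-1)\big) = T_0(h-1) + \tfrac{1}{2}\,T_1(h-1).$$

The largest eigenvalue of the matrix $\begin{pmatrix} 0 & 2 \\ 1 & \tfrac{1}{2} \end{pmatrix}$ governing this pair of recurrences is $\frac{1+\sqrt{33}}{4} \approx 1.686$, which gives $R_0(\mathrm{NAND}_h) = O\big((\tfrac{1+\sqrt{33}}{4})^h\big) = O\big(n^{\log_2 \frac{1+\sqrt{33}}{4}}\big) = O(n^{0.753\ldots})$, as stated in the theorem.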

Saks and Wigderson (1986) showed that the above upper bound on the zero-error randomized query complexity of the recursive NAND function is tight. They also conjectured that this separation between deterministic and zero-error randomized query complexity was optimal. This conjecture remained open for nearly 30 years.

Shortcut functions

This conjecture was eventually refuted: the optimal separation (up to logarithmic factors) was established by Ambainis, Balodis, Belovs, Lee, Santha, and Smotrovs (2017). They showed that (ignoring once again logarithmic factors) a quadratic separation between deterministic and zero-error randomized query complexity does indeed occur for some total Boolean functions.

One of the main insights that led to the breakthrough was the observation that to construct functions with a very large gap between deterministic and zero-error randomized query complexity, we should move away from recursively-defined functions like the NAND function above and instead look for functions that include “shortcuts” that can only be used effectively by randomized algorithms.

This idea of using “shortcuts” to separate two measures of complexity arises naturally in some settings. For example, if we want to obtain a separation between $D(f)$ and the standard 1-certificate complexity $C_1(f)$, there is a simple function that shows a quadratic gap: think of the input $x \in \{0,1\}^n$ as representing an $m \times m$ matrix for $m = \sqrt{n}$. Then define $\mathrm{OnesCol}(x) = 1$ if and only if there is exactly one column of the matrix that consists of all 1’s. For example,

$$\mathrm{OnesCol}\begin{pmatrix} 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix} = 1$$

but

$$\mathrm{OnesCol}\begin{pmatrix} 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix} = \mathrm{OnesCol}\begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix} = 0$$

because the first matrix has more than one column with all 1’s while the second one has none. With an adversary argument, we can show that $D(\mathrm{OnesCol}) = n$. And since, for each 1-input to the function, we can reveal the all-ones column and one of the zeros in each of the other columns, $C_1(\mathrm{OnesCol}) \le m + (m-1) = O(\sqrt{n})$.
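For concreteness, here is a short Python sketch of the OnesCol function (our own illustration), reading the input row by row as an $m \times m$ matrix:

```python
def ones_col(x):
    """OnesCol: 1 iff exactly one column of the m x m matrix (read row by row
    from the n = m*m bits of x) consists entirely of ones."""
    m = int(round(len(x) ** 0.5))
    full_columns = sum(
        all(x[row * m + col] == 1 for row in range(m))
        for col in range(m)
    )
    return 1 if full_columns == 1 else 0

# The two 4x4 examples above, flattened row by row:
assert ones_col([0,1,0,1, 0,1,0,1, 1,1,0,1, 0,1,0,0]) == 1
assert ones_col([0,1,0,1, 0,1,0,1, 1,1,0,1, 0,1,0,1]) == 0  # two all-ones columns
assert ones_col([0,0,0,1, 0,1,0,1, 1,1,0,1, 0,1,0,0]) == 0  # no all-ones column
```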

Göös, Pitassi, and Watson (2015) were the first to show the real power of “shortcut” functions. They were interested in separating deterministic query complexity from unambiguous 1-certificate complexity, a variant of the 1-certificate complexity measure described above that is also known as partition complexity. For this complexity measure, the function above does not give a reasonable separation, but the key observation that they made was that we don’t need to restrict the entries of the table to hold only Boolean values. That is, we can consider tables whose entries contain a value along with additional information like pointers to other cells in the table. This idea is remarkably powerful.

For randomized query complexity, we can implement the shortcut idea by taking a function that evaluates to 1 if and only if the table contains exactly one column whose values are all ones and where exactly one of the entries in this column is the root of a binary tree whose nodes are 0-entries of all of the other columns. With this construction (and after handling all the related technical issues), we obtain the following separation result.

Theorem 6. There is a total function $f : \{0,1\}^n \to \{0,1\}$ that satisfies

$$D(f) = \widetilde{\Omega}\big(R_0(f)^2\big).$$

See the original paper for all the technical details along with a more detailed discussion of the intuition behind the construction.

Theorems 2 and 6 match up to logarithmic factors, giving us an essentially complete picture of the maximum separation possible between the deterministic and zero-error randomized query complexities of total Boolean functions. But we should emphasize that our understanding of deterministic vs. zero-error randomized query complexity is still far from complete. For example, we don’t even know the exact zero-error randomized query complexity of the recursive 3-Majority function, the function that we obtain when we replace the binary tree of NAND gates in the recursive NAND function with a ternary tree of Majority-of-3 gates.

But instead of diving deeper in this direction, let us explore another equally fundamental question: does giving randomized algorithms the ability to err with some bounded probability also help reduce query complexity, or are all the savings obtainable by randomness strictly due to the average-case vs. worst-case cost model?