8. Query Composition Theorems
The composition of two functions \(f \colon \{0,1\}^n \to \{0,1\}\) and \(g \colon \{0,1\}^m \to \{0,1\}\) is the function \(f \circ g \colon \{0,1\}^{n \times m} \to \{0,1\}\) defined by
\[(f \circ g)(x) = f \big( g(x^1), g(x^2), \dots, g(x^n) \big)\] for all \(x = (x^i_j)_{i \in [n], j \in [m]}\). The same definition extends to partial functions as well, with \(f \circ g\) defined on the inputs \(x\) for which every \(g(x^i)\) is defined and the resulting string lies in the domain of \(f\).
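It can help to keep a concrete computational picture of this definition in mind. Here is a minimal sketch in Python, with partial functions modelled as returning None on inputs outside their promise; the encoding of inputs as tuples of blocks is our own convention, not anything imposed by the definition.

```python
def compose(f, g):
    """Return f∘g, taking its input as a tuple of n blocks of m bits each."""
    def fg(x):
        inner = tuple(g(block) for block in x)
        if None in inner:
            return None  # some block violates g's promise
        return f(inner)  # may itself be None if inner violates f's promise
    return fg
```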
The composition of Boolean functions comes up in the definition of some functions we have already seen (such as the recursive NAND function) and in the study of hardness amplification. It’s also one of the most basic operations we can apply to Boolean functions. So it’s natural to ask how randomized query complexity behaves with respect to composition.
Towards a Composition Conjecture
In the deterministic setting, query complexity behaves multiplicatively under the composition of Boolean functions:
Theorem. For all total or partial Boolean functions \(f\) and \(g\),
\[\mathrm{D}(f \circ g) = \mathrm{D}(f) \cdot \mathrm{D}(g).\]
The upper bound is obtained with the natural approach: if we have a decision tree \(T_f\) that computes \(f\) and a decision tree \(T_g\) that computes \(g\), we can obtain a decision tree that computes \(f \circ g\) by replacing each node of \(T_f\) that queries the \(i\)th bit of its input with a copy of \(T_g\) that computes the value of \(g(x^i)\).
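For concreteness, here is a sketch of this construction on explicit decision trees, using our own tuple encoding where a tree is either ('leaf', value) or ('query', i, t0, t1), and block \(i\) of the composed input occupies positions \(im, \dots, im+m-1\):

```python
def compose_trees(tf, tg, m):
    """Build a tree for f∘g from trees for f and g, of multiplied depth."""
    if tf[0] == 'leaf':
        return tf
    _, i, t0, t1 = tf
    # Compose the two children of tf's root first.
    c0, c1 = compose_trees(t0, tg, m), compose_trees(t1, tg, m)

    def expand(t):
        # Replace tf's query of bit i by a copy of tg reading block i;
        # each leaf of tg holds the value of g(x^i), so it is replaced
        # by the corresponding child of tf's root.
        if t[0] == 'leaf':
            return c0 if t[1] == 0 else c1
        _, j, s0, s1 = t
        return ('query', i * m + j, expand(s0), expand(s1))

    return expand(tg)
```

The depth of the resulting tree is the product of the depths of \(T_f\) and \(T_g\), matching the upper bound in the theorem.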
If we try to use the same type of construction in the randomized setting, we cannot use randomized decision trees that err with probability \(\frac13\) for both \(f\) and \(g\) (unless we have additional robustness promises on the tree for \(f\)). As a result, the simple upper bound for composition that we obtain from the natural construction is the following.
Proposition. For all total or partial Boolean functions \(f\) and \(g\),
\[\mathrm{R}_{\frac13}(f \circ g) \le \mathrm{R}_{\frac16}(f) \cdot \mathrm{R}_{\frac1{6\mathrm{R}_{1/6}(f)}}(g) = O \big( \mathrm{R}_{\frac13}(f) \log \mathrm{R}_{\frac13}(f) \cdot \mathrm{R}_{\frac13}(g) \big).\]
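The proposition follows the same construction: amplify the error of the algorithm for \(f\) down to \(\frac16\), answer each of its queries with a run of the algorithm for \(g\) amplified to error \(\frac{1}{6\mathrm{R}_{1/6}(f)}\), and apply a union bound over the at most \(\mathrm{R}_{1/6}(f)\) inner runs so that the total error is at most \(\frac16 + \frac16 \le \frac13\). Here is a minimal sketch of the majority-vote amplification responsible for the logarithmic factor; the constant 18 is an illustrative Chernoff-style choice, not optimized.

```python
from math import ceil, log

def amplify(alg, delta):
    """Turn a 1/3-error randomized algorithm into a delta-error one."""
    k = ceil(18 * log(1 / delta)) | 1     # odd number of repetitions
    def amplified(x):
        votes = sum(alg(x) for _ in range(k))
        return 1 if 2 * votes > k else 0  # majority vote of the k runs
    return amplified
```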
But the additional logarithmic term is not necessary for all choices of \(f\) or \(g\). When \(g\) is the parity function, for example, \(\mathsf{Parity}_m\) can be computed exactly with \(m\) queries, so \(\mathrm{R}_{\epsilon}(\mathsf{Parity}_m) \le m\) for every \(\epsilon\); and since \(\mathrm{R}_{\epsilon}(\mathsf{Parity}_m) = \Omega(m)\) for every \(\epsilon \le 1/3\), in this case
\[\mathrm{R}_{\frac13}(f \circ \mathsf{Parity}_m) = O \big( \mathrm{R}_{\frac13}(f) \cdot \mathrm{R}_{\frac13}(\mathsf{Parity}_m) \big).\] Is it possible to obtain even smaller bounds on the randomized query complexity of composed functions? For total functions, this is an open problem.
Conjecture. For every pair of total functions \(f\) and \(g\),
\[\mathrm{R}_{\frac13}(f \circ g) = \Omega \big( \mathrm{R}_{\frac13}(f) \cdot \mathrm{R}_{\frac13}(g) \big).\]
For partial functions, the analogous conjecture is false: there are functions for which randomized query complexity behaves sub-multiplicatively under function composition.
A Composition Counter-Example
We can show the following counter-example to the extension of the conjecture above to partial functions:
Theorem. There exist partial functions \(f\) and \(g\) for which
\[\mathrm{R}_{\frac13}(f \circ g) = O \Big( \big( \mathrm{R}_{\frac13}(f) \cdot \mathrm{R}_{\frac13}(g) \big)^{2/3} \Big).\]
Proof idea. We want two ingredients to prove the theorem:
- A function \(f\) that can be computed even when it only has access to very slightly biased predictions of the values of each of its bits, and
- A function \(g\) that can be computed to small bias with few queries.
We have already seen a good choice for \(g\) in the last lecture: the gap majority function, which can be computed to bias \(1/\sqrt{n}\) with only 1 query.
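As a quick sanity check, here is a toy simulation of that one-query algorithm, under one standard formulation of the promise: the majority value appears in at least \(n/2 + \sqrt{n}\) of the coordinates. (The exact normalization of the gap is an assumption on our part.)

```python
import random
from math import isqrt

def one_query_guess(x):
    return random.choice(x)  # a single query to a random coordinate

n = 10_000
x = [1] * (n // 2 + isqrt(n)) + [0] * (n // 2 - isqrt(n))  # GapMaj(x) = 1
agreement = sum(one_query_guess(x) for _ in range(100_000)) / 100_000
print(agreement)  # concentrates around 1/2 + 1/sqrt(n) = 0.51
```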
Now what about a function \(f\) that can use small-bias queries to its bits to obtain its value with bounded error? Here, the freedom to consider partial functions lets us design another variant of the \(\mathsf{Index}\) function. Define the \(\mathsf{ApproxIndex}\) function to be like the original \(\mathsf{Index}\) function, with the extra promise that all the bits of the input whose indices are close to the true address have the same value.
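Here is a sketch of one way to formalize this promise; the Hamming-ball notion of “close” and the radius parameter \(d\) below are our illustrative choices, not the exact parameters of the actual counter-example.

```python
def approx_index(z, y, d):
    """ApproxIndex on a k-bit address z and a 2^k-bit array y."""
    k = len(z)
    addr = int(''.join(map(str, z)), 2)
    # All array cells within Hamming distance d of the true address.
    ball = [i for i in range(2 ** k) if bin(i ^ addr).count('1') <= d]
    values = {y[i] for i in ball}
    if len(values) != 1:
        return None  # the promise is violated: the input is off-domain
    return values.pop()
```

The point of the promise is that an algorithm that only knows each address bit with small bias can still guess an index that lands inside the ball, and then read off the answer there.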
Note that the counter-example in the proof sketch has very low query complexity. As a result, it’s possible that the sub-multiplicative behaviour of randomized query complexity with respect to composition is only logarithmic in the domain size of \(f\).
Question. Does every partial function \(f \colon X \to \{0,1\}\), \(X \subseteq \{0,1\}^n\) and every partial function \(g\) satisfy
\[\mathrm{R}_{\frac13}(f \circ g) = \Omega \left( \frac{ \mathrm{R}_{\frac13}(f) \cdot \mathrm{R}_{\frac13}(g) }{\log n} \right)?\]
A Composition Theorem
The counter-example for composition shows that there are functions like \(\mathsf{ApproxIndex}\) that can be computed more efficiently if they can (cheaply) obtain small-bias queries of some of their bits instead of relying only on (full price) standard queries. So a natural way to repair the conjecture is to level the playing field: compare \(\mathrm{R}(f \circ g)\) not with the product of the standard randomized query complexities of \(f\) and \(g\), but with \(\mathrm{R}(g)\) times a smaller measure of \(f\) that allows “partial queries” to be made to \(f\)’s input.
We can formalize the idea above using the notion of noisy queries. In a noisy decision tree, each internal node of the tree specifies the index \(i \in [n]\) of one of the bits of the input and a bias parameter \(\gamma\). The oracle being queried knows the true value of the queried bit \(x_i\), but instead of returning it directly, it generates a random bit
\[b = \begin{cases} x_i & \mbox{with probability } \frac{1+\gamma}2 \\ 1- x_i & \mbox{with probability } \frac{1-\gamma}2. \end{cases}\] The noisy decision tree follows the edge labelled with \(b\) and continues this process until it reaches a leaf. The random bits generated for the queries are independent of each other, so in general a noisy decision tree may choose to query the same bit \(x_i\) multiple times and apply success amplification to determine its value with greater accuracy.
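In code, a single noisy query is just the following biased oracle (a minimal sketch of the model above):

```python
import random

def noisy_query(x_i, gamma):
    """Return x_i with probability (1+gamma)/2, its negation otherwise."""
    return x_i if random.random() < (1 + gamma) / 2 else 1 - x_i
```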
Finally, the cost of noisy queries needs to be defined in a non-trivial way. One natural choice is to set the cost of a \(\gamma\)-biased query to be \(\gamma^2\). This is the model we fix. The overall cost of a noisy decision tree is the expected sum of the costs of its queries. And this notion lets us define a new measure of randomized query complexity. Let
\[\mathrm{noisyR}_\epsilon(f)\] denote the minimum cost of a noisy randomized decision tree that computes \(f\) with error at most \(\epsilon\).
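One sanity check on this pricing: about \(1/\gamma^2\) independent \(\gamma\)-biased queries have total cost \(O(1)\), and their majority vote recovers \(x_i\) with constant error, so an amplified noisy read costs about the same as one standard (full price) query. A minimal sketch, where the constant 9 is an illustrative choice:

```python
import random

def noisy_query(x_i, gamma):  # as in the earlier sketch
    return x_i if random.random() < (1 + gamma) / 2 else 1 - x_i

def amplified_noisy_read(x_i, gamma):
    k = int(9 / gamma ** 2) | 1      # odd number of repetitions
    total_cost = k * gamma ** 2      # about 9, independent of gamma
    votes = sum(noisy_query(x_i, gamma) for _ in range(k))
    return (1 if 2 * votes > k else 0), total_cost
```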
Every function \(f\) satisfies \(\mathrm{noisyR}_\epsilon(f) \le \mathrm{R}_\epsilon(f)\). For some functions, like the parity and gap majority functions, the two measures are equal. And for others, like the approximate index function, the noisy complexity can be much smaller than the standard complexity.
Theorem. For every pair of partial functions \(f\) and \(g\),
\[\mathrm{R}(f \circ g) = \Omega \big( \mathrm{noisyR}(f) \cdot \mathrm{R}(g) \big).\]
The proof of the theorem is obtained by re-arranging the target inequality as an upper bound. We want to show that
\[\mathrm{noisyR}(f) = O \left( \frac{\mathrm{R}(f \circ g)}{\mathrm{R}(g)} \right).\] We can establish this upper bound by showing how an algorithm \(B\) for the composed function \(f \circ g\) can be simulated efficiently to obtain a randomized noisy decision tree for \(f\). That is, on input \(x \in X\) in the domain of \(f\), we want to generate an input \(y = (y^1,\ldots,y^n)\) to \(f \circ g\) such that \(g(y^i) = x_i\) for each \(i \in [n]\). If we can do this, simulating \(B\) on \(y\) gives the value of \(f(x)\) that we want to compute.
The problem, of course, is that we don’t know \(x\), so we don’t know explicitly what string \(y\) to simulate \(B\) on. A naive solution to this problem is to build \(y\) “on demand”: the first time \(B\) queries a bit of \(y^i\), we query \(x_i\), sample some \(y^i \in g^{-1}(x_i)\), and use this string to answer this and all subsequent queries of \(B\) to the \(i\)th block. This approach guarantees correctness, but in the worst case we query one bit of \(x\) for each block of \(y\) that \(B\) examines, so it only gives the trivial bound
\[\mathrm{R}(f) \le \mathrm{R}(f \circ g).\] But we can also hope to answer some of \(B\)’s queries without querying the corresponding bit of \(x\). Let \(\mu_0\) and \(\mu_1\) be hard distributions for \(g\) that are supported on \(g^{-1}(0)\) and \(g^{-1}(1)\), respectively. And let’s say that the first bit that \(B\) queries in the \(i\)th input \(y^i\) has the same marginal distribution under both \(\mu_0\) and \(\mu_1\). In this case, we can answer the query without knowing the value of \(x_i\): just output a bit with the bias of that common distribution on the queried bit!
This approach also works when the marginal distributions of \(\mu_0\) and \(\mu_1\) are close but not identical. Let’s say the queried bit has value 1 with probability \(p_0\) when \(y^i \sim \mu_0\) and \(p_1\) when \(y^i \sim \mu_1\), and assume that \(p_1 \ge p_0\). We can draw a value \(a\) uniformly at random from \([0,1]\) and do the following (see the sketch after the list):
- If \(a \le \min\{p_0,p_1\}\), output 1.
- If \(a > \max\{p_0,p_1\}\), output 0.
- Else, query the value of \(x_i\) and output 1 if and only if \(x_i = 1\).
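Here is the procedure as a self-contained sketch; query_x stands for the full-price query of \(x_i\) and is our own naming.

```python
import random

def sample_bit(p0, p1, query_x):
    """Output 1 with probability p_{x_i}, querying x_i only when needed.

    Assumes p1 >= p0, as in the discussion above."""
    a = random.random()                # uniform draw from [0, 1]
    if a <= min(p0, p1):
        return 1                       # both distributions agree here
    if a > max(p0, p1):
        return 0                       # and here as well
    return 1 if query_x() == 1 else 0  # reached with probability p1 - p0
```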
This procedure generates a random bit that equals 1 with probability exactly \(p_{x_i}\), but it queries the value of \(x_i\) itself with probability only \(p_1 - p_0\). These savings are enough to obtain a composition theorem. By introducing the notion of max conflict complexity, Gavinsky, Lee, Santha, and Sanyal (2019) showed that this approach gives the upper bound
\[\mathrm{R}(f) \le O \left( \frac{\mathrm{R}(f \circ g)}{\sqrt{\mathrm{R}(g)}} \right).\] The heart of the proof of the theorem above is the observation that if we have access to noisy queries, we can do better. Instead of incurring expected query cost \(|p_1 - p_0|\) when simulating a query to a bit with biases \(p_0\) and \(p_1\) under \(\mu_0\) and \(\mu_1\), we can use a noisy query with expected cost \((p_1 - p_0)^2/(p_1 + p_0)\). For example, when \(p_0 = \frac{1-\gamma}2\) and \(p_1 = \frac{1+\gamma}2\), a single \(\gamma\)-biased noisy query with cost \(\gamma^2\) suffices to generate a bit with the appropriate bias, whereas the procedure listed above queries \(x_i\) with probability \(\gamma\) and so has expected cost \(\gamma\) for simulating the query.
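To see the quadratic saving numerically: at \(p_0 = \frac{1-\gamma}2\) and \(p_1 = \frac{1+\gamma}2\) we have \(p_0 + p_1 = 1\), so the noisy-query cost \((p_1-p_0)^2/(p_1+p_0)\) is exactly \(\gamma^2\), against expected cost \(\gamma\) for the correlated-sampling procedure.

```python
for gamma in (0.1, 0.01, 0.001):
    p0, p1 = (1 - gamma) / 2, (1 + gamma) / 2
    naive_cost = p1 - p0                     # probability of querying x_i
    noisy_cost = (p1 - p0) ** 2 / (p1 + p0)  # equals gamma ** 2 here
    print(gamma, naive_cost, noisy_cost)
```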
For the remainder of the proof of the theorem, see this paper with Shalev Ben-David (2020). And for another, incomparable composition theorem, see also this paper with Shalev Ben-David, Mika Göös, and Gilbert Maystre (2022).