4. Minimax Principle
One of the other foundational results in the study of randomized complexity theory is Yao's Minimax Principle. Before we can introduce it, let's examine distributional complexity measures in a bit more detail.
Distributional Complexity
We have already seen that there are distributional complexity measures for randomized algorithms. Similar complexity measures can also be defined for deterministic algorithms. Let's define one such measure now.
Recall that the expected distributional error of a randomized algorithm \(R\) with respect to a distribution \(\mu\) on the domain of \(f\) is
\[
\mathrm{err}_\mu(R) = \Pr_{x \sim \mu}[R(x) \neq f(x)],
\]
where the probability is over both the draw of \(x\) and the internal randomness of \(R\).

The distributional error of a deterministic algorithm \(A\) with respect to \(\mu\) is defined analogously:
\[
\mathrm{err}_\mu(A) = \Pr_{x \sim \mu}[A(x) \neq f(x)].
\]

The worst-case distributional complexity of a function \(f\) with respect to \(\mu\) and error parameter \(\epsilon\) is
\[
D^\mu_\epsilon(f) = \min\{\, \mathrm{cost}(A) : A \text{ is deterministic and } \mathrm{err}_\mu(A) \le \epsilon \,\},
\]
where \(\mathrm{cost}(A) = \max_x \mathrm{cost}(A, x)\) denotes the worst-case cost of \(A\).

Note that in this definition, the error of \(A\) is measured only on inputs drawn from \(\mu\), but its cost is still measured on the worst-case input.
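To make these definitions concrete, here is a minimal Python sketch (not part of the notes themselves) that computes the distributional error of a deterministic procedure in a toy setting; the function `maj3`, the distribution `mu`, and the rule `first_bit` are illustrative choices rather than anything fixed by the definitions above.

```python
from itertools import product

def maj3(x):
    """The target function f: majority of three bits."""
    return int(sum(x) >= 2)

def first_bit(x):
    """A cheap deterministic rule A that looks only at x[0]."""
    return x[0]

# A distribution mu over the domain {0,1}^3, biased toward inputs with more 1s.
domain = list(product([0, 1], repeat=3))
weights = [1 + sum(x) for x in domain]
total = sum(weights)
mu = {x: w / total for x, w in zip(domain, weights)}

# Distributional error: err_mu(A) = Pr_{x ~ mu}[A(x) != f(x)].
err = sum(p for x, p in mu.items() if first_bit(x) != maj3(x))
print(f"err_mu(A) = {err:.3f}")
```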
The advantage of working with distributional complexity is that it is often easier to prove lower bounds on the cost of deterministic algorithms against random inputs than it is to argue about randomized algorithms. And yet, there is a very useful relation between randomized and distributional complexity which shows that distributional lower bounds also give lower bounds for randomized algorithms:
Lemma 1. For every function \(f\) and every \(\epsilon \ge 0\),
\[
R_\epsilon(f) \;\ge\; \max_\mu D^\mu_\epsilon(f),
\]
where the maximum is over all distributions \(\mu\) on the domain of \(f\). (Here \(R_\epsilon(f)\) denotes the minimum worst-case cost of any randomized algorithm that computes \(f\) with error at most \(\epsilon\) on every input.)
Proof. Fix any distribution \(\mu\). Since the set of randomized algorithms \(R\) with \(\mathrm{err}_\mu(R) \le \epsilon\) is a superset of the set of randomized algorithms with error at most \(\epsilon\) on all inputs, the minimum worst-case cost over the first set is at most \(R_\epsilon(f)\). Now \(\mathrm{err}_\mu(R)\) is the average, over the deterministic algorithms \(A\) in the support of \(R\), of \(\mathrm{err}_\mu(A)\), so there must always be a deterministic algorithm in the support of \(R\) with error at most that of \(R\). And the worst-case cost of \(R\) equals the maximum cost of any deterministic algorithm in its support. So every randomized algorithm \(R\) with \(\mathrm{err}_\mu(R) \le \epsilon\) has in its support a deterministic algorithm \(A\) with \(\mathrm{err}_\mu(A) \le \epsilon\) and \(\mathrm{cost}(A) \le \mathrm{cost}(R)\), and therefore \(D^\mu_\epsilon(f) \le R_\epsilon(f)\). Since \(\mu\) was arbitrary, the lemma follows.
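The averaging step in this proof can be checked numerically. The following sketch (an illustration under assumed toy choices, not the notes' formalism) represents a randomized algorithm as an explicit distribution over three deterministic rules and verifies that its distributional error is the weighted average of theirs, so the best rule in the support can only do better.

```python
from itertools import product

def f(x):
    """Target function: majority of three bits."""
    return int(sum(x) >= 2)

# Uniform distribution mu over {0,1}^3.
domain = list(product([0, 1], repeat=3))
mu = {x: 1 / len(domain) for x in domain}

def err_mu(alg):
    """Distributional error of a deterministic rule with respect to mu."""
    return sum(p for x, p in mu.items() if alg(x) != f(x))

# A "randomized algorithm" R: a distribution over deterministic rules.
support = [
    (lambda x: x[0], 0.5),                 # output the first bit
    (lambda x: x[1], 0.25),                # output the second bit
    (lambda x: x[0] & x[1] & x[2], 0.25),  # output the AND of all bits
]

# err_mu(R) is the convex combination of the errors of the rules in its support,
# so at least one deterministic rule has error no larger than err_mu(R).
err_R = sum(w * err_mu(alg) for alg, w in support)
best_det = min(err_mu(alg) for alg, _ in support)
print(f"err_mu(R) = {err_R:.3f}, best deterministic in support = {best_det:.3f}")
assert best_det <= err_R
```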
The use of this lemma and other related variants is often justified under the name of "Yao's Minimax Principle", but that's not quite accurate; before we see some example uses of the lemma, let's first clarify what the Minimax Principle actually says.
Yao’s Minimax Principle
The lemma above is often called "Yao's Minimax Principle", but it is more accurate to call it the "easy (or perhaps even trivial) direction of Yao's Minimax Principle".
The full Minimax Principle states something much stronger: that the inequality in the last lemma can be replaced with equality. Or, in other words, that the method of proving lower bounds on randomized complexity measures by bounding the distributional complexity of the same function is complete in that there will always be a hard distribution for which the distributional complexity matches the randomized complexity of the function.
Minimax Principle. (Worst-case randomized complexity variant) For every function \(f\) and every \(\epsilon \ge 0\),
\[
R_\epsilon(f) = \max_\mu D^\mu_\epsilon(f).
\]
This Minimax Principle is only one of many (many!) examples of minimax theorems in different areas of mathematics. Generally, a minimax theorem establishes the identity
\[
\min_{a \in A} \max_{b \in B} g(a, b) = \max_{b \in B} \min_{a \in A} g(a, b)
\]
for some sets \(A\) and \(B\) and some function \(g : A \times B \to \mathbb{R}\). (The inequality \(\max_{b} \min_{a} g(a,b) \le \min_{a} \max_{b} g(a,b)\) always holds; the content of a minimax theorem is the reverse inequality.)
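As a concrete (and entirely illustrative) instance of such an identity, the sketch below solves a small finite matrix game with linear programming: \(A\) and \(B\) are the sets of mixed strategies over the rows and columns of a payoff matrix, \(g\) is the expected payoff, and the two linear programs return the same value. The payoff matrix and the use of `scipy.optimize.linprog` are assumptions of the example, not anything taken from the notes.

```python
import numpy as np
from scipy.optimize import linprog

# Payoff matrix g(i, j): the row player picks a distribution p over rows to
# minimize the expected payoff, the column player picks a distribution q over
# columns to maximize it.
M = np.array([[3.0, 1.0, 4.0],
              [1.0, 5.0, 2.0],
              [2.0, 3.0, 3.0]])
m, n = M.shape

# min_p max_q p^T M q.  The inner max is attained at a pure column, so we solve:
# minimize v subject to (p^T M)_j <= v for every column j, sum(p) = 1, p >= 0.
c = np.r_[np.zeros(m), 1.0]
A_ub = np.c_[M.T, -np.ones(n)]                 # (M^T p)_j - v <= 0
b_ub = np.zeros(n)
A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)   # sum(p) = 1
b_eq = [1.0]
bounds = [(0, None)] * m + [(None, None)]
res_min = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)

# max_q min_p p^T M q.  The inner min is attained at a pure row, so we solve:
# maximize w (i.e. minimize -w) subject to (M q)_i >= w, sum(q) = 1, q >= 0.
c2 = np.r_[np.zeros(n), -1.0]
A_ub2 = np.c_[-M, np.ones(m)]                  # w - (M q)_i <= 0
b_ub2 = np.zeros(m)
A_eq2 = np.r_[np.ones(n), 0.0].reshape(1, -1)  # sum(q) = 1
b_eq2 = [1.0]
bounds2 = [(0, None)] * n + [(None, None)]
res_max = linprog(c2, A_ub=A_ub2, b_ub=b_ub2, A_eq=A_eq2, b_eq=b_eq2, bounds=bounds2)

print(f"min_p max_q = {res_min.fun:.4f}")
print(f"max_q min_p = {-res_max.fun:.4f}")     # the two values coincide
```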
There are other variants of the Minimax Principle for randomized complexity theory. The following one is the original Minimax Principle, the one that Andrew Yao first established. It is concerned with the average complexity of zero-error randomized algorithms.
Minimax Principle. (Average-case zero-error randomized complexity variant) For every function \(f\),
\[
\overline{R}_0(f) = \max_\mu \overline{D}^\mu(f),
\]
where \(\overline{R}_0(f)\) is the minimum over zero-error randomized algorithms \(R\) of \(\max_x \mathbb{E}[\mathrm{cost}(R, x)]\), and \(\overline{D}^\mu(f)\) is the minimum over deterministic algorithms \(A\) that compute \(f\) exactly of \(\mathbb{E}_{x \sim \mu}[\mathrm{cost}(A, x)]\).
Let us see how this variant of the Minimax Principle is a minimax theorem.
Let \(\mathrm{cost}(A, x)\) denote the cost of the deterministic algorithm \(A\) on input \(x\), and recall that a randomized algorithm \(R\) is a distribution over deterministic algorithms. Then
\[
\overline{R}_0(f) = \min_{R} \max_{x} \mathbb{E}_{A \sim R}[\mathrm{cost}(A, x)]
                  = \min_{R} \max_{\mu} \mathbb{E}_{x \sim \mu}\,\mathbb{E}_{A \sim R}[\mathrm{cost}(A, x)],
\]
where the last max is over all distributions \(\mu\) on the domain of \(f\) (the maximum of an average over \(x \sim \mu\) is always attained by a point-mass distribution). Similarly,
\[
\max_{\mu} \overline{D}^{\mu}(f) = \max_{\mu} \min_{A} \mathbb{E}_{x \sim \mu}[\mathrm{cost}(A, x)]
                                 = \max_{\mu} \min_{R} \mathbb{E}_{x \sim \mu}\,\mathbb{E}_{A \sim R}[\mathrm{cost}(A, x)].
\]
Therefore, the Minimax Principle for average complexity of zero-error algorithms can be expressed as the minimax theorem
\[
\min_{R} \max_{\mu} \mathbb{E}_{x \sim \mu}\,\mathbb{E}_{A \sim R}[\mathrm{cost}(A, x)]
= \max_{\mu} \min_{R} \mathbb{E}_{x \sim \mu}\,\mathbb{E}_{A \sim R}[\mathrm{cost}(A, x)].
\]
The last identity in the expression for \(\max_\mu \overline{D}^\mu(f)\) follows from the fact that the average cost of any randomized algorithm with respect to \(\mu\) is a convex combination of the average costs of the deterministic algorithms in its support, so the inner minimum is always attained by a deterministic algorithm.
Note, however, that the Minimax Principle does not hold in all randomized models of computation. For example, we have the following “non-minimax” result.
Lemma 2. For every function \(f\) and every \(\epsilon > 0\), the corresponding inequality between the randomized and distributional complexity measures holds in one direction only, and there are choices of \(f\) and \(\epsilon\) for which the inequality is strict.
Using the (Easy Direction of the) Minimax Principle
We can see some examples of how to use Lemma 1 to obtain lower bounds on the randomized complexity of functions by revisiting the evasiveness conjecture from the first lecture.
Recall that in this setting, the cost of a deterministic algorithm on a graph \(G\) is the number of pairs of vertices it queries (to learn whether or not they are joined by an edge in \(G\)) before it outputs its answer.
Theorem. For every constant \(\epsilon < \frac12\), every randomized algorithm that determines with error at most \(\epsilon\) whether an \(n\)-vertex graph is bipartite must query \(\Omega(n^2)\) pairs of vertices in the worst case.
Proof. By Lemma 1, it suffices to construct a distribution \(\mu\) on graphs for which every deterministic algorithm that queries at most \(q\) pairs of vertices, for any \(q < (1 - 2\epsilon) \cdot 2\binom{n/2}{2} = \Omega(n^2)\), must err with probability greater than \(\epsilon\) on graphs drawn from \(\mu\). Take \(\mu\) to be the distribution where

- With probability \(\frac12\), it returns the complete bipartite graph \(K_{\frac n2, \frac n2}\) with parts \(\{1, \ldots, \frac n2\}\) and \(\{\frac n2 + 1, \ldots, n\}\).
- With probability \(\frac12\), it returns the graph obtained by taking \(K_{\frac n2, \frac n2}\) and adding one more edge to the graph uniformly at random among the pairs of vertices that lie within one of the two parts.

In the first case, the graph drawn from \(\mu\) is bipartite. In the second case, it's not. But consider any deterministic algorithm that queries at most \(q\) vertex pairs before terminating. By the union bound, the probability in the second case that it queries the additional edge added within one of the two parts is at most
\[
\frac{q}{2\binom{n/2}{2}} < 1 - 2\epsilon.
\]
Let's say that it accepts (or outputs "bipartite") with probability \(p\) when it does not query any additional edge outside of the ones in the base graph; in this situation its view is identical to its view of \(K_{\frac n2, \frac n2}\), so for a deterministic algorithm \(p\) is either \(0\) or \(1\). The error of the algorithm with respect to \(\mu\) is then at least
\[
\frac{1-p}{2} + \frac{p}{2}\left(1 - \frac{q}{2\binom{n/2}{2}}\right)
= \frac12\left(1 - p \cdot \frac{q}{2\binom{n/2}{2}}\right)
\ge \frac12\left(1 - \frac{q}{2\binom{n/2}{2}}\right)
> \epsilon.
\]
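The union-bound calculation in this proof can also be sanity-checked by simulation. The following sketch (with illustrative parameters; none of the specific numbers come from the notes) samples graphs from the hard distribution and estimates the error of a strategy that probes \(q\) intra-part pairs and answers "bipartite" unless one of them is present; the empirical error closely tracks \(\frac12\bigl(1 - q/(2\binom{n/2}{2})\bigr)\).

```python
import random
from itertools import combinations

# Illustrative parameters for the hard distribution over n-vertex graphs.
n = 40
half = n // 2
left = list(range(half))
right = list(range(half, n))

# All vertex pairs that lie inside one of the two parts: the only places the
# planted edge can appear.  M = 2 * C(n/2, 2).
intra_pairs = list(combinations(left, 2)) + list(combinations(right, 2))
M = len(intra_pairs)

q = M // 4            # query budget of the strategy being simulated
trials = 20000
errors = 0

for _ in range(trials):
    bipartite = random.random() < 0.5
    planted = None if bipartite else random.choice(intra_pairs)

    # Probe q intra-part pairs chosen at random each trial; by symmetry of the
    # planted edge, the error matches that of any fixed set of q intra-part probes.
    probes = set(random.sample(intra_pairs, q))
    found_edge = planted is not None and planted in probes
    answer_bipartite = not found_edge

    if answer_bipartite != bipartite:
        errors += 1

print(f"empirical error            = {errors / trials:.3f}")
print(f"predicted (1/2)*(1 - q/M)  = {0.5 * (1 - q / M):.3f}")
```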
Here’s another example of the use of Lemma 1.
Theorem.
Proof. We again construct a hard distribution \(\mu\). This time,

- With probability \(\frac12\), it returns the empty graph.
- With probability \(\frac12\), it returns a random embedding of the graph in the empty graph.

A deterministic algorithm that queries at most \(q\) vertex pairs will observe an edge in the second case with probability at most \(q \cdot m / \binom{n}{2}\), where \(m\) is the number of edges of the embedded graph. By the same reasoning as in the last proof, any such algorithm must have error greater than \(\epsilon\) on \(\mu\) whenever this probability is less than \(1 - 2\epsilon\).
There are many other examples of the use of Lemma 1 or similar variants in other models of computation, as we will see throughout the rest of the course.