3. Foundational Inequalities

The Inequalities

There are a number of different inequalities that are extremely useful in the study of randomized complexity classes. Three fundamental inequalities, however, really stand out for their ubiquity.

The first is so basic that it almost seems unnecessary to list it here. But it is called upon so often that it does deserve highlighting:

Union Bound. For any two events $A$ and $B$,

$$\Pr[A \cup B] \le \Pr[A] + \Pr[B].$$

More generally, for any finite or countable set of events $A_1, A_2, A_3, \ldots$,

$$\Pr\Big[\bigcup_{n \ge 1} A_n\Big] \le \sum_{n \ge 1} \Pr[A_n].$$

Proof. The first inequality follows from the identity

$$\Pr[A \cup B] = \Pr[A] + \Pr[B \setminus A]$$

and the inequality

$$\Pr[B \setminus A] = \Pr[B \cap \overline{A}] \le \Pr[B].$$

Both of these facts follow directly from the axioms of probability, and the same argument extends to unions of countably many events, which gives the second inequality.

Note that there are no conditions whatsoever on the events $A_1, A_2, A_3, \ldots$ in the statement of the union bound. This is part of what makes it such a useful inequality; we can use it even when there are some complicated (and sometimes unknown) dependencies between these events.
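As a quick sanity check, here is a toy Monte Carlo sketch of the union bound for three deliberately overlapping events (the events themselves, and the sample space $\{1,\ldots,100\}$, are arbitrary choices for illustration):

```python
import random

# Monte Carlo check of the union bound for overlapping (dependent) events:
# draw X uniformly from {1, ..., 100} and consider the events
#   A1 = "X is divisible by 2",  A2 = "X is divisible by 3",  A3 = "X > 90".
TRIALS = 200_000
union_count = 0
individual_counts = [0, 0, 0]

for _ in range(TRIALS):
    x = random.randint(1, 100)
    events = [x % 2 == 0, x % 3 == 0, x > 90]
    for i, happened in enumerate(events):
        individual_counts[i] += happened
    union_count += any(events)

lhs = union_count / TRIALS                        # estimate of Pr[A1 ∪ A2 ∪ A3]
rhs = sum(c / TRIALS for c in individual_counts)  # estimate of Pr[A1] + Pr[A2] + Pr[A3]
print(f"Pr[union] ~= {lhs:.3f}  <=  sum of Pr[A_i] ~= {rhs:.3f}")
```

The exact values here are $\Pr[A_1 \cup A_2 \cup A_3] = 0.70$ and $\Pr[A_1] + \Pr[A_2] + \Pr[A_3] = 0.93$: the bound is not tight, but it holds despite the dependencies between the events.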

The second main tool is Markov’s inequality.

Markov’s Inequality. For any non-negative random variable $X$ and any parameter $t > 0$,

$$\Pr[X \ge t] \le \frac{E[X]}{t}.$$

Proof. The expected value of $X$ can be expressed as

$$E[X] = \Pr[X \ge t] \cdot E[X \mid X \ge t] + \Pr[X < t] \cdot E[X \mid X < t].$$

The conditional expectation $E[X \mid X \ge t]$ is at least $t$, since the expectation of a random variable is at least as large as its minimum value. And when $X$ is non-negative, then $X \ge 0$ always holds and so $E[X \mid X < t] \ge 0$ for the same reason. Therefore,

$$E[X] \ge \Pr[X \ge t] \cdot t + \Pr[X < t] \cdot 0 = \Pr[X \ge t] \cdot t.$$

Re-arranging the terms yields Markov’s inequality.

Markov’s inequality is also extremely useful because it only requires non-negativity of $X$. It works particularly well with linearity of expectation: for any random variables $X_1, X_2, \ldots, X_n$, we always have that

$$E\Big[\sum_{i=1}^n X_i\Big] = \sum_{i=1}^n E[X_i].$$

So when $X_1, \ldots, X_n$ are all non-negative, the probability that the sum $X_1 + \cdots + X_n$ is much larger than its expected value can always be bounded by Markov’s inequality, even when the variables $X_1, \ldots, X_n$ have some complicated dependency between them.
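Here is a small sketch of this combination on a toy example (the specific distribution is an arbitrary choice): the variables below are perfectly correlated, yet linearity of expectation pins down the expected sum exactly and Markov’s inequality still bounds the upper tail.

```python
import random

# Toy example: X_i = i * Z for a single uniform Z in [0, 1], so the X_i are
# non-negative and perfectly correlated.  Linearity of expectation gives
# E[X_1 + ... + X_n] = sum_i i/2, and Markov's inequality bounds the tail
# Pr[sum >= c * E[sum]] <= 1/c for any c > 1.
TRIALS = 200_000
n = 10
c = 1.5

expected_sum = sum(i / 2 for i in range(1, n + 1))   # = n(n + 1)/4

exceed = 0
for _ in range(TRIALS):
    z = random.random()
    total = sum(i * z for i in range(1, n + 1))
    exceed += total >= c * expected_sum

print(f"empirical Pr[sum >= {c} * E[sum]] ~= {exceed / TRIALS:.3f}")
print(f"Markov bound 1/c                  = {1 / c:.3f}")
```

In this example the true tail probability is $0.25$, comfortably below the Markov bound of $1/1.5 \approx 0.67$.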

When we are considering the sum of independent random variables, another tool gives us much stronger concentration bounds. This tool has numerous variants and a few different names, including Hoeffding’s inequality and Chernoff bounds. Here is one standard variant of the inequality.

Hoeffding’s inequality. Let $X_1, X_2, \ldots, X_n$ be independent random variables that each take values in $[0,1]$. Define $X = \sum_{i=1}^n X_i$ and let $\mu = E[X]$ denote the expected value of this sum. Then for any $t > 0$,

$$\Pr[X > \mu + t] \le e^{-2t^2/n}$$

and

$$\Pr[X < \mu - t] \le e^{-2t^2/n}.$$
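To get a feel for how strong this concentration is, here is a simulation sketch (uniform $[0,1]$ variables and the parameters $n = 100$, $t = 10$ are arbitrary choices) comparing an empirical tail probability with the Hoeffding bound:

```python
import math
import random

# Sum of n independent uniform [0, 1] variables, so mu = n / 2.  We estimate
# Pr[X > mu + t] empirically and compare it with Hoeffding's bound.
n = 100
t = 10.0
mu = n / 2
TRIALS = 100_000

exceed = 0
for _ in range(TRIALS):
    total = sum(random.random() for _ in range(n))
    exceed += total > mu + t

print(f"empirical Pr[X > mu + t]     ~= {exceed / TRIALS:.5f}")
print(f"Hoeffding bound exp(-2t^2/n)  = {math.exp(-2 * t * t / n):.5f}")
```

For comparison, applying Markov’s inequality to the same tail only gives $\Pr[X \ge \mu + t] \le \frac{\mu}{\mu + t} \approx 0.83$, while Hoeffding’s bound is $e^{-2} \approx 0.14$ and the true probability is smaller still.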

Truncation

Let’s start to use our foundational inequalities to examine some basic questions in randomized complexity theory. As a first step, let’s compare the worst-case and average-case randomized complexity of Boolean functions. We have already seen that there are randomized algorithms for which the worst-case cost of the algorithm is much larger than the average cost. Can there be a similar separation for the complexity of functions? In other words, does there exist a function $f$ for which $\overline{R}(f) \ll R(f)$, where $R(f)$ and $\overline{R}(f)$ denote the worst-case and average-case randomized complexity of $f$?

The answer, it turns out, is no for most natural models of computation. The only restriction is that the result applies to models in which, at each step of an algorithm’s execution, there is a well-defined notion of the cost the algorithm has incurred so far. Let us call models of computation where this condition holds odometered.

Theorem. In any odometered model of computation, for every function $f$ and all parameters $\epsilon \ge 0$ and $\delta > 0$,

$$R_{\epsilon+\delta}(f) \le \frac{1}{\delta}\, \overline{R}_\epsilon(f).$$

Proof. Let $R$ be a randomized algorithm that has error at most $\epsilon$ and whose expected cost satisfies $\mathrm{cost}(R) = \overline{R}_\epsilon(f)$. Let $R'$ be the randomized algorithm obtained by truncating $R$ whenever its cost reaches $\frac{1}{\delta}\,\mathrm{cost}(R)$: at any step where $R$ reaches this cost and has not yet halted, we abort the computation and guess the answer. By construction, $R'$ has worst-case cost at most $\frac{1}{\delta}\,\mathrm{cost}(R)$. It remains to show that it computes $f$ with error at most $\epsilon + \delta$.

Fix any input $x$. Let $E$ denote the event that $R'$ aborts and guesses the value of $f(x)$. The algorithm $R'$ can err only if $R$ errs or the event $E$ occurs, so by the union bound the probability that $R'$ errs is at most

$$\mathrm{err}_f(R,x) + \Pr[E] \le \epsilon + \Pr[E].$$

And by Markov’s inequality, the probability that $R'$ aborts is

$$\Pr[E] = \Pr_{D \sim R}\Big[\mathrm{cost}(D,x) \ge \tfrac{1}{\delta}\,\mathrm{cost}(R)\Big] \le \frac{\mathrm{cost}(R,x)}{\tfrac{1}{\delta}\,\mathrm{cost}(R)} \le \delta,$$

where the last inequality uses $\mathrm{cost}(R,x) \le \mathrm{cost}(R)$. Therefore $R'$ errs with probability at most $\epsilon + \delta$, as required.
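The truncation construction can be sketched concretely in a toy odometered model (everything below is a hypothetical stand-in: an algorithm is modelled as a Python generator that yields once per unit of cost and then returns its answer, and `toy_algorithm` and its cost distribution are arbitrary choices):

```python
import random

def toy_algorithm(x):
    """Toy randomized algorithm: cheap most of the time, expensive occasionally.

    Yields once per unit of cost, then returns its (always correct) answer."""
    steps = 1 if random.random() < 0.9 else 100   # expected cost = 10.9
    for _ in range(steps):
        yield                                     # pay one unit of cost
    return x % 2

def truncate(algorithm, x, budget):
    """Run algorithm(x), but abort and guess once its cost reaches `budget`."""
    run = algorithm(x)
    cost = 0
    while True:
        try:
            next(run)                    # advance by one unit-cost step
        except StopIteration as halt:
            return halt.value            # the algorithm halted with an answer
        cost += 1
        if cost >= budget:
            return random.randint(0, 1)  # truncated: guess the answer

# With delta = 0.25 the truncated algorithm never costs more than
# (1/delta) * 10.9 ~= 43 steps, and by Markov's inequality it aborts
# (and so risks one extra error) with probability at most delta.
delta = 0.25
budget = int((1 / delta) * 10.9)
print([truncate(toy_algorithm, x, budget) for x in range(6)])
```

Choosing the budget as $\frac{1}{\delta}$ times the expected cost mirrors the proof: the worst-case cost of the truncated algorithm is bounded by construction, and Markov’s inequality bounds the probability of the abort-and-guess step by $\delta$.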

Success Amplification

With one-sided error, it’s easy to reduce error probability exponentially fast.

As an example, consider the EQ function.

Theorem. $R^{\mathrm{pub}}_\epsilon(\mathrm{EQ}) = O(\log\frac{1}{\epsilon})$ and $R^{\mathrm{pri}}_\epsilon(\mathrm{EQ}) = O(\log n \cdot \log\frac{1}{\epsilon})$.

Proof. Repeat the 1-sided error protocol $\log\frac{1}{\epsilon}$ times. Output 1 if and only if all instances of the protocol output 1. This protocol still has 1-sided error: it can err only by incorrectly outputting 1 when $x \ne y$. But since the instances are independent, the probability that this occurs is at most $\underbrace{\tfrac{1}{2} \cdot \tfrac{1}{2} \cdots \tfrac{1}{2}}_{\log 1/\epsilon \text{ times}} = 2^{-\log \frac{1}{\epsilon}} = \epsilon$.
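A minimal sketch of this amplification step, using a hypothetical stand-in for the 1-sided error protocol (`base_test` below never errs when $x = y$ and wrongly outputs 1 with probability $\frac{1}{2}$ when $x \ne y$):

```python
import random

def base_test(x, y):
    """Stand-in for a 1-sided error protocol: never errs when x == y,
    errs (outputs 1) with probability 1/2 when x != y."""
    if x == y:
        return 1
    return 1 if random.random() < 0.5 else 0

def amplified_test(x, y, k):
    """Output 1 iff all k independent runs output 1: still 1-sided error,
    now with error probability at most 2^{-k} on unequal inputs."""
    return int(all(base_test(x, y) for _ in range(k)))

k = 10   # k = log2(1/epsilon) runs give error at most epsilon = 2^{-10}
trials = 100_000
errors = sum(amplified_test("abc", "abd", k) for _ in range(trials))
print(f"observed error rate on unequal inputs: {errors / trials:.5f}")
```

With $k = \log_2\frac{1}{\epsilon} = 10$ runs, the observed error rate should be around $2^{-10} \approx 0.001$.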

This success amplification of algorithms with 1-sided error is also what is used in the analysis of Karger’s algorithm. In that setting, the original error of $1 - \binom{n}{2}^{-1}$ is very close to 1, but by repeating the algorithm enough times we still get the desired error reduction. The success amplification theorem in this case looks like this:

Theorem. For every $\delta > 0$, $R_{1/2,0}(f) \le \frac{1}{\delta}\, R_{1-\delta,0}(f)$.

Proof. Run $1/\delta$ independent instances of the algorithm; since the error is 1-sided, the combined algorithm errs only when every instance errs. The probability of observing an error in each of the $1/\delta$ instances is at most

$$\underbrace{(1-\delta) \cdots (1-\delta)}_{1/\delta \text{ times}} = (1-\delta)^{1/\delta} \le \big(e^{-\delta}\big)^{1/\delta} = 1/e \le \tfrac{1}{2},$$

where we used the remarkably useful inequality $1 - x \le e^{-x}$ that holds for all $x \in \mathbb{R}$.
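A quick numeric check of this calculation (the values of $\delta$ below are arbitrary): even when each run errs with probability $1 - \delta$ very close to 1, repeating $\lceil 1/\delta \rceil$ times leaves the probability that every run errs below $1/e$.

```python
import math

# (1 - delta)^(1/delta) <= e^{-1} for every delta in (0, 1]: the probability
# that all ceil(1/delta) independent runs err never exceeds 1/e ~ 0.368.
for delta in [0.5, 0.1, 0.01, 0.001]:
    reps = math.ceil(1 / delta)
    all_fail = (1 - delta) ** reps
    print(f"delta = {delta:<6}  all runs err w.p. {all_fail:.4f}  <=  1/e = {1 / math.e:.4f}")
```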

What about the setting where algorithms have 2-sided error? We can obtain the same qualitative success amplification results, using Hoeffding’s inequality.

Theorem. For every $\epsilon > 0$, $R_\epsilon(f) \le O\big(\log\tfrac{1}{\epsilon} \cdot R_{1/3}(f)\big)$.

Proof. Let $A$ be a randomized algorithm with error at most $\frac{1}{3}$ and cost $R_{1/3}(f)$. Consider the algorithm $B$ that runs $k$ independent instances of $A$ and outputs the majority answer, for some value of $k$ that we will fix later. Let $X_1, \ldots, X_k$ be random variables where $X_i = 1$ if the $i$th instance of $A$ errs and $X_i = 0$ otherwise. The variables are independent and all take values in the range $[0,1]$. By linearity of expectation, the expected value of their sum $X = X_1 + \cdots + X_k$ is

$$\mu := E[X] = \sum_{i=1}^k E[X_i] \le \frac{k}{3}.$$

The algorithm $B$ is wrong only if $X > \frac{k}{2}$. Since $\mu \le \frac{k}{3}$, this can happen only when $X - \mu > \frac{k}{6}$, so by Hoeffding’s inequality it occurs with probability

$$\Pr\Big[X > \frac{k}{2}\Big] \le \Pr\Big[X - \mu > \frac{k}{6}\Big] \le e^{-2(k/6)^2/k} = e^{-k/18}.$$

When $k = 18 \ln\frac{1}{\epsilon} = \Theta(\log\frac{1}{\epsilon})$, this error probability is at most $\epsilon$.
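Here is a sketch of the majority-vote construction run against a toy base algorithm (`base_alg` below is a stand-in that returns the correct bit with probability $2/3$; the choice $k = \lceil 18 \ln\frac{1}{\epsilon} \rceil$ follows the calculation above):

```python
import math
import random

def base_alg(correct_answer):
    """Toy 2-sided-error algorithm: returns the correct bit with probability 2/3."""
    return correct_answer if random.random() < 2 / 3 else 1 - correct_answer

def majority_amplified(correct_answer, epsilon):
    """Run k = ceil(18 ln(1/epsilon)) independent copies and output the majority bit."""
    k = math.ceil(18 * math.log(1 / epsilon))
    votes = sum(base_alg(correct_answer) for _ in range(k))
    return int(votes > k / 2)

epsilon = 0.01
trials = 20_000
wrong = sum(majority_amplified(1, epsilon) != 1 for _ in range(trials))
print(f"target error: {epsilon},  observed error: {wrong / trials:.4f}")
```

The observed error is typically far below the target $\epsilon$, since the Hoeffding bound used in the proof is not tight for this toy base algorithm.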