2. Definitions

Our goal throughout the rest of these lectures will be to understand better when randomness helps (and perhaps why) with computational efficiency in various models of computation like the ones above. To do so, we need to introduce some definitions.

But an important challenge in the study of randomized algorithms is that even when we fix a model of computation, there are many different ways that we can define randomized algorithms, and not all of these definitions are equivalent to each other. So with this warning in place, here are some of the most basic definitions that we will need:

Randomized Algorithms

Even at the very start of our exploration, we already face a non-trivial challenge: how exactly do we define what a randomized algorithm is? There are at least five possible (reasonable) definitions:

A randomized algorithm is…

  1. (Online randomness I) An algorithm that can flip a fair coin at any step of its computation.
  2. (Online randomness II) An algorithm that can flip a $p$-biased coin at any step of its computation, for any choice of $p \in [0,1]$.
  3. (Offline randomness) An algorithm that has access to a random string $r$ where each bit $r_i \in \{0,1\}$ is generated uniformly at random upon initialization of the algorithm.
  4. (Distribution I) A distribution over deterministic algorithms.
  5. (Distribution II) A convex combination of deterministic algorithms.

All of these definitions are essentially equivalent: in the settings we will consider, any algorithm that satisfies one of these conditions can be converted into an equivalent algorithm of any of the other four types. This means that we will switch between the different definitions of randomized algorithms to use whichever one is most convenient at the moment.
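To make the correspondence concrete, here is a minimal Python sketch; the sampling task and the function names are illustrative choices, not anything from the lecture. The same procedure is written once with online randomness and once with offline randomness, and fixing the string $r$ in the second version yields a deterministic algorithm, which is the view taken in Definitions 4 and 5.

```python
import random

def frac_ones_online(x, samples=100):
    # Online randomness (Definition 1): fresh random bits are drawn during
    # the computation to choose which positions of x to inspect.
    hits = 0
    for _ in range(samples):
        i = random.randrange(len(x))
        hits += x[i]
    return hits / samples

def frac_ones_offline(x, r, samples=100):
    # Offline randomness (Definition 3): every random choice is read off the
    # pre-generated string r, so for each fixed r this is a deterministic
    # algorithm -- the view behind Definitions 4 and 5.
    hits = 0
    for t in range(samples):
        i = r[t] % len(x)   # derive the t-th choice from r (ignoring the small modulo bias)
        hits += x[i]
    return hits / samples

x = [1, 0, 1, 1, 0, 0, 1, 0]
r = [random.getrandbits(32) for _ in range(100)]   # draw all randomness up front
print(frac_ones_online(x), frac_ones_offline(x, r))
```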

Roughly speaking, when we design an explicit algorithm for a task, we will most often use Definition 2 or 3. And when we want to prove lower bounds or general statements about randomized algorithms, Definition 4 will often be the most convenient to work with, so that is the one we will use in those settings.

Cost

The cost of the algorithms we consider in this course will depend on the computational model. For a deterministic algorithm $D$ and an input $x$, let us write $\mathrm{cost}(D, x)$ for the cost of $D$ on $x$. (This may be the communication, time, or query cost of the algorithm, as in the examples above, or sometimes another cost measure entirely.)

Given a cost defined by cost(D,x), there are at least three different ways that we can define the overall cost of a randomized algorithm:

Worst-case cost. The worst-case cost of the randomized algorithm R is the maximum cost of any deterministic algorithm in the support of R over any of its inputs:

$$\mathrm{cost}(R) = \max_{D \in \mathrm{supp}(R)} \; \max_{x} \; \mathrm{cost}(D, x).$$

Average cost. The average cost of $R$ on input $x$ is its expected cost, which we can denote by $\overline{\mathrm{cost}}(R, x) = \mathbb{E}_{D \sim R}[\mathrm{cost}(D, x)]$. The overall average cost of $R$ is its maximum average cost over all inputs:

$$\overline{\mathrm{cost}}(R) = \max_{x} \overline{\mathrm{cost}}(R, x).$$

Distributional cost. If we have a distribution μ over the inputs of R, the distributional cost of R with respect to μ is

$$\mathrm{cost}^{\mu}(R) = \mathbb{E}_{x \sim \mu}\big[\overline{\mathrm{cost}}(R, x)\big].$$
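As a toy illustration of the three cost measures, the following Python sketch computes each of them for a randomized algorithm given as a uniform distribution over two deterministic algorithms with a hypothetical per-input cost table; the numbers are made up purely for this example.

```python
# A toy randomized algorithm R, given as a distribution over two deterministic
# algorithms (Definition 4), together with a hypothetical cost table cost(D, x).
inputs = ["x1", "x2", "x3"]
cost = {
    "D1": {"x1": 2, "x2": 8, "x3": 4},
    "D2": {"x1": 6, "x2": 2, "x3": 4},
}
R = {"D1": 0.5, "D2": 0.5}   # probability that R runs each deterministic algorithm

# Worst-case cost: maximum over the support of R and over all inputs.
worst_case = max(cost[D][x] for D in R for x in inputs)

# Average cost: maximum over inputs of the expected cost on that input.
avg_on = {x: sum(R[D] * cost[D][x] for D in R) for x in inputs}
average = max(avg_on.values())

# Distributional cost with respect to a distribution mu over the inputs.
mu = {"x1": 0.25, "x2": 0.25, "x3": 0.5}
distributional = sum(mu[x] * avg_on[x] for x in inputs)

print(worst_case, average, distributional)   # 8  5.0  4.25
```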

The different cost measures relate to each other in the following way.

Proposition. For every randomized algorithm R,

$$\max_{\mu} \mathrm{cost}^{\mu}(R) = \overline{\mathrm{cost}}(R) \le \mathrm{cost}(R),$$

and the inequality can be strict.

Proof. For every distribution $\mu$, $\mathrm{cost}^{\mu}(R) \le \overline{\mathrm{cost}}(R)$. And we have equality when $\mu$ is the point mass distribution on the input $x$ that maximizes $\overline{\mathrm{cost}}(R, x)$.

The second inequality follows directly from the definitions. To see that the worst-case and average-case costs of an algorithm can have an arbitrarily large gap between them, consider the algorithm $R$ that flips a fair coin at each step, does one unit of work if the coin lands heads, and halts if it lands tails. This algorithm has expected cost 1 on every input but unbounded worst-case cost.
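A quick simulation of this example (a Python sketch, assuming the obvious encoding of the algorithm) makes the gap visible: the empirical average cost concentrates near 1, while the maximum cost observed keeps growing as more trials are run.

```python
import random

def run_once():
    # One run of the algorithm from the proof: flip a fair coin at each step,
    # do one unit of work on heads, and halt on tails.
    work = 0
    while random.random() < 0.5:
        work += 1
    return work

trials = [run_once() for _ in range(100_000)]
print("empirical average cost:", sum(trials) / len(trials))   # close to 1
print("largest cost observed: ", max(trials))                  # unbounded in the worst case
```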

Success / Error

Randomized algorithms are particularly interesting to study when we allow them to err with some probability. To do so, we need to define what we mean by the success or error rate of an algorithm.

Aside. There are also other ways to measure the correctness of an algorithm that go beyond measuring error probabilities as we do below. We will see these alternative correctness measures in later lectures.

In all of the following definitions, we assume that the target function that the randomized algorithm is attempting to compute is a Boolean function $f \colon \{0,1\}^n \to \{0,1\}$. Later, we will see how the same concepts generalize to functions with other ranges, to partial functions, and to relations.

For a deterministic algorithm D and some input x to the algorithm, let us write D(x) to denote the output of D on input x.

The error probability of the randomized algorithm $R$ with respect to the function $f \colon \{0,1\}^n \to \{0,1\}$ on input $x \in \{0,1\}^n$ is

$$\mathrm{err}_f(R, x) = \Pr_{D \sim R}\big[D(x) \ne f(x)\big].$$

We can now define two notions of overall error of R.

Worst-case error. The worst-case error of R with respect to f is its maximum error over all inputs:

$$\mathrm{err}_f(R) = \max_{x} \mathrm{err}_f(R, x).$$

Distributional error. When μ is a distribution over the domain of f, the distributional error of R with respect to f and μ is

$$\mathrm{err}^{\mu}_f(R) = \mathbb{E}_{x \sim \mu}\big[\mathrm{err}_f(R, x)\big].$$
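As a small worked example (the function and algorithm here are chosen only for illustration), take $f$ to be the OR of $n = 3$ bits and let $R$ be the toy randomized algorithm that outputs a uniformly random coordinate of its input. The sketch below computes both error measures directly from the definitions.

```python
from itertools import product

n = 3
def f(x):
    return int(any(x))   # f is the OR of the n input bits

def err_on(x):
    # err_f(R, x) for the toy R that outputs a uniformly random coordinate of x:
    # the probability, over the random coordinate, that the output differs from f(x).
    return sum(1 for i in range(n) if x[i] != f(x)) / n

inputs = list(product([0, 1], repeat=n))

# Worst-case error: the maximum error probability over all inputs.
worst_case_err = max(err_on(x) for x in inputs)

# Distributional error with respect to the uniform distribution mu on {0,1}^n.
uniform_err = sum(err_on(x) for x in inputs) / len(inputs)

print(worst_case_err, uniform_err)   # 2/3 (on inputs with a single 1) vs. 0.375
```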

As with measures of cost, there is a close relation between the two different measures of error.

Proposition. For every randomized algorithm R and function f,

$$\mathrm{err}_f(R) = \max_{\mu} \mathrm{err}^{\mu}_f(R),$$

where the maximum is taken over all distributions on $\{0,1\}^n$.

Complexity Measures

The cost and error measures above apply to individual randomized algorithms. But most often we will be interested not in individual algorithms, but in the inherent complexity of their target functions.

Informally, the randomized complexity of f is the minimum cost of a randomized algorithm that achieves some target error (or success probability) with respect to f. But, as was the case with cost and error, there are multiple ways that we can make this informal notion precise.

Fix an error parameter ϵ. There are two main notions of randomized complexity of f that are most commonly studied.

Worst-case complexity. The worst-case complexity of f at error ϵ is the minimum worst-case cost of a randomized algorithm that achieves worst-case error at most ϵ. We write this as:

$$R_\epsilon(f) = \min_{R \,:\, \mathrm{err}_f(R) \le \epsilon} \mathrm{cost}(R).$$

Average-case complexity. The average-case complexity of f at error ϵ is the minimum average cost of a randomized algorithm with worst-case error at most ϵ. Here we write:

$$\overline{R}_\epsilon(f) = \min_{R \,:\, \mathrm{err}_f(R) \le \epsilon} \overline{\mathrm{cost}}(R).$$

These are not the only two complexity measures we can study. For example, we can also study the following one:

Maximum distributional complexity. Define the distributional complexity of $f$ with respect to a distribution $\mu$ on its domain to be $R^{\mu}_\epsilon(f) = \min_{R \,:\, \mathrm{err}^{\mu}_f(R) \le \epsilon} \mathrm{cost}^{\mu}(R)$. The maximum distributional complexity of $f$ is

$$\max_{\mu} R^{\mu}_\epsilon(f).$$

These three complexity measures are comparable to each other.

Proposition. For every function $f$, $\max_{\mu} R^{\mu}_\epsilon(f) \le \overline{R}_\epsilon(f) \le R_\epsilon(f)$.
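One way to see these inequalities (a brief sketch of the argument): consider any randomized algorithm $R$ with $\mathrm{err}_f(R) \le \epsilon$. By the two propositions above, for every distribution $\mu$ we have

$$\mathrm{err}^{\mu}_f(R) \le \mathrm{err}_f(R) \le \epsilon \qquad \text{and} \qquad \mathrm{cost}^{\mu}(R) \le \overline{\mathrm{cost}}(R) \le \mathrm{cost}(R),$$

so every algorithm that is feasible for the worst-case definition is also feasible for the average-case and distributional ones, at no greater cost. Taking the minimum over such $R$ and then the maximum over $\mu$ yields the stated chain.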

It’s a good exercise to determine when these inequalities can be strict and how large a gap there can be between the different measures.