7. Time Complexity
The set of decidable languages tells us what decision problems can be solved by computers that don’t have any restrictions on the amount of resources that they require to complete the computation. We can also ask about which languages can be computed “efficiently” by adding some restrictions to the amount of resources that computers can use during their execution. Doing so leads us to complexity theory.
In this first lecture on complexity theory, we consider what is possibly the most natural resource that we might want to restrict: time.
Time Complexity Classes
The time cost of the Turing machine \(M\) on input \(x \in \{0,1\}^*\) is the number of computational steps that \(M\) performs on that input before it halts. When \(M\) does not halt on input \(x\), its time cost can be defined to be \(\infty\), but from now on we will only consider Turing machines that halt on all inputs.
From the time cost of a Turing machine on individual inputs, we obtain a global measure of its time cost.
Definition. The (worst-case) time cost of the Turing machine \(M\) is the function \(t_M : \mathbb{N} \to \mathbb{N}\) where \(t_M(n)\) is the maximum time cost of \(M\) on any input \(x \in \{0,1\}^n\) of length \(n\).
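To make the definition concrete, here is a small Python sketch under a toy cost model (one "step" per character comparison in a naive palindrome check, not a faithful single-tape TM); the function names are ours, purely for illustration. The worst-case cost \(t_M(n)\) is computed by brute force over all \(2^n\) inputs of length \(n\):

```python
from itertools import product

def time_cost(x):
    """Toy cost model (not a faithful single-tape TM): charge one step
    per character comparison in a naive palindrome check of x."""
    steps, i, j = 0, 0, len(x) - 1
    while i < j:
        steps += 1                 # one comparison = one step
        if x[i] != x[j]:
            break                  # reject early on a mismatch
        i, j = i + 1, j - 1
    return steps

def worst_case_cost(n):
    """t_M(n): the maximum time cost over all inputs x in {0,1}^n."""
    return max(time_cost("".join(bits)) for bits in product("01", repeat=n))
```

In this toy model the maximum is attained by palindromes, which force all \(\lfloor n/2 \rfloor\) comparisons, while mismatching inputs halt early.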
Ideally, we would like this time cost measure to represent the amount of time that any (single-processor, classical) model of computation takes to decide a given language. But that’s not the case: it’s possible to show that the simple \(\textsf{Palindrome}\) language can be decided in time \(O(n)\) in the random-access machine (RAM) model but has time cost \(\Omega(n^2)\) for single-tape Turing machines. And that’s for a rather uninteresting reason: it takes linear time just to go from one end of the tape to the other. Because this barrier is an artifact of the limitations of single-tape Turing machines and not representative of inherent limits of computation, when moving to time complexity we will consider a slight extension of Turing machines in which they have access to more than one (standard) tape.
A multi-tape Turing machine is defined in the same way as the (single-tape) Turing machines we have considered so far, except that they have \(\ell\) tapes for some constant \(\ell \ge 1\). It also has \(\ell\) tape heads, one on each tape, and can move them independently of each other. And each transition now depends on the current state of the TM as well as the \(\ell\) symbols currently visible under the tape heads. (As it turns out, \(2\)-tape Turing machines can efficiently simulate \(\ell\)-tape Turing machines for any constant \(\ell > 2\), so if you prefer, you can always consider multi-tape Turing machines that have exactly \(2\) tapes.)
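With two tapes, the \(\textsf{Palindrome}\) language mentioned above becomes decidable in linear time: copy the input onto the second tape, then scan the two tapes in opposite directions. A Python sketch of this idea, with lists standing in for tapes and a toy step counter (the names are ours):

```python
def palindrome_two_tapes(x):
    """Sketch of the two-tape idea: copy x onto a second tape (n steps),
    then move the two heads in opposite directions and compare (n steps),
    for about 2n steps total, versus Omega(n^2) on a single tape."""
    steps, tape2 = 0, []
    for c in x:                     # pass 1: copy input to tape 2
        tape2.append(c)
        steps += 1
    for i, c in enumerate(x):       # pass 2: head 1 scans left-to-right,
        steps += 1                  # head 2 scans right-to-left
        if c != tape2[len(x) - 1 - i]:
            return False, steps
    return True, steps
```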
The notion of time cost applies to multi-tape Turing machines as well, and we can use it to refine the class of all decidable languages into various time complexity classes.
Definition. For every function \(f : \mathbb{N} \to \mathbb{N}\), the time complexity class \(\mathbf{TIME}(f)\) is the set of all languages that can be decided by a multi-tape Turing machine \(M\) with worst-case time cost \(t_M = O(f)\).
By considering various functions, we obtain a number of natural time complexity classes:
Time Class | Definition |
---|---|
Constant time | \(\mathbf{TIME}(1)\) |
Linear time | \(\mathbf{LIN} = \mathbf{TIME}(n)\) |
Quasi-linear time | \(\bigcup_{k \ge 0} \mathbf{TIME}(n \log^k(n))\) |
Quadratic time | \(\mathbf{TIME}(n^2)\) |
Cubic time | \(\mathbf{TIME}(n^3)\) |
Polynomial time | \(\mathbf{P} = \bigcup_{k \ge 0} \mathbf{TIME}(n^k)\) |
Linear-exponential time | \(\mathbf{E} = \mathbf{TIME}(2^{O(n)})\) |
Exponential time | \(\mathbf{EXP} = \bigcup_{k \ge 0} \mathbf{TIME}(2^{n^k})\) |
Double exponential time | \(\mathbf{EEXP} = \bigcup_{k \ge 0} \mathbf{TIME}(2^{2^{n^k}})\) |
Iterated exponential time | \(\mathbf{ELEMENTARY} = \bigcup_{k \ge 0} \mathbf{TIME}(\underbrace{2^{2^{\cdots^{2^n}}}}_{k})\) |
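To get a feel for how quickly these classes separate, one can compare representative cost functions numerically. A toy illustration (the dictionary and its labels are ours, not part of the definitions):

```python
import math

# One representative cost function per row of the table above.
costs = {
    "linear":       lambda n: n,
    "quasi-linear": lambda n: n * math.log2(n),
    "quadratic":    lambda n: n ** 2,
    "cubic":        lambda n: n ** 3,
    "linear-exp":   lambda n: 2 ** n,
    "exponential":  lambda n: 2 ** (n ** 2),
}

# Even at a tiny input length like n = 20, the gaps are already vast.
for name, f in costs.items():
    print(f"{name:>13}: f(20) = {f(20):.0f}")
```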
Time Hierarchy Theorem
Once we have defined these time complexity classes, it’s natural to ask whether they are all distinct from each other. In other words, does giving Turing machines more time let them decide more languages?
In the case of linear and cubic time, we can use the Universal Turing Machine and the Recursion Theorem to show that the two classes are indeed distinct.
Theorem 1. \(\mathbf{TIME}(n) \subsetneq \mathbf{TIME}(n^3)\).
Proof.
Consider the language \(A_{TM}^{n^{2}} = \{ \left< M \right> x : M \mbox{ accepts } x \mbox{ in } \le |x|^{2} \mbox{ steps}\}.\)
We first want to show that the language \(A_{TM}^{n^{2}}\) is in \(\mathbf{TIME}(n^3)\). We already know that the Universal Turing Machine \(U\) can simulate \(M\) on \(x\). We can modify \(U\) to augment it with a timer initialized with the value \(|x|^2\) so that it only simulates that number of steps of \(M\) before halting the simulation. This variant of \(U\) accepts if and only if \(\left< M \right> x \in A_{TM}^{n^2}\). And if we design this variant of \(U\) carefully and then analyze the number of steps that it needs to use to complete the simulation, we find that it can simulate each step of \(M\)’s computation in time at most \(O(n)\); overall it has worst-case time cost bounded by \(O(n^3)\). Therefore, \(A_{TM}^{n^2} \in \mathbf{TIME}(n^3)\).
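A minimal sketch of the "\(U\) with a timer" idea, assuming a single-tape machine given as a Python transition table (our own encoding, chosen for illustration; this ignores the per-step simulation overhead that the \(O(n^3)\) analysis accounts for):

```python
def timed_simulate(delta, x, budget):
    """Sketch of U with a timer: run the machine given by transition
    table `delta` on input x for at most `budget` steps, and accept
    only if it reaches the accept state within the budget."""
    tape = dict(enumerate(x))
    state, head = "start", 0
    for _ in range(budget):
        if state == "accept":
            return True
        if state == "reject":
            return False
        symbol = tape.get(head, "_")               # '_' is the blank symbol
        state, write, move = delta[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return state == "accept"                       # timer expired

# A toy machine that accepts exactly the strings starting with '1':
delta = {
    ("start", "1"): ("accept", "1", "R"),
    ("start", "0"): ("reject", "0", "R"),
    ("start", "_"): ("reject", "_", "R"),
}
```

Running it with budget \(|x|^2\) mirrors the definition of \(A_{TM}^{n^2}\): accept \(\left< M \right> x\) exactly when \(M\) accepts \(x\) within \(|x|^2\) steps.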
We now want to show that \(A_{TM}^{n^{2}} \notin \mathbf{TIME}(n)\). Assume for the sake of contradiction that it is. Let \(T\) decide \(A_{TM}^{n^2}\) in linear time.
We define the Turing machine \(D\) that on input \(x\):
- Gets its own description \(\left< D \right>\) (by the Recursion Theorem).
- Calls \(T\) on the input \(\left< D \right> x\).
- Accepts if \(T\) rejects, and rejects if \(T\) accepts.
By examining the proof of the Recursion Theorem, we see that the first step takes constant time (i.e., its time cost is independent of the length of \(x\)). And by assumption the second step takes \(O(n)\) time. So the overall time cost of \(D\) is linear. This means that for every large enough \(n\), the machine \(D\) halts after fewer than \(n^2\) steps on every input \(x\) of length \(n\). But then \(D\) accepts such an \(x\) within \(|x|^2\) steps if and only if \(T\) rejects \(\left< D \right> x\), i.e., if and only if \(\left< D \right> x \notin A_{TM}^{n^2}\), contradicting the assumption that \(T\) decides \(A_{TM}^{n^2}\).
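The diagonal structure of \(D\) can be sketched abstractly in Python. Here we simply pass \(D\) its own description as an argument, standing in for the Recursion Theorem, and `T` is the hypothetical linear-time decider (all names are ours):

```python
def D(own_description, x, T):
    """Sketch of the diagonalizing machine. By the Recursion Theorem,
    D can obtain its own description <D>; here we simply pass it in.
    T is the hypothetical linear-time decider for A_TM^{n^2}."""
    return not T(own_description, x)   # do the opposite of what T predicts

# A stand-in for one of T's two possible answers on (<D>, x):
def T_claims_accept(description, x):
    return True

# Whatever T answers, D behaves oppositely, so no such T can be correct.
```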
At a high level, the above proof uses the two fundamental observations that we established in the first part of the course: Turing machines can simulate any Turing machine given its description \(\left< M \right>\), but they can’t predict what \(M\) will do given its description without essentially simulating it. The same idea gives a much more general result known as the Time Hierarchy Theorem. But the statement of this theorem requires one more definition.
Definition. The function \(t : \mathbb{N} \to \mathbb{N}\) is time-constructible if there is a Turing machine that on every input of the form \(0^n\) for some \(n \ge 1\) writes \(t(n)\) on the tape and halts in time \(O(t(n))\).
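For instance, \(t(n) = n^2\) is time-constructible: on input \(0^n\), a machine can produce \(n^2\) using roughly \(n^2\) elementary operations. A toy Python sketch that counts its own "steps" (our stand-in for TM steps, for illustration only):

```python
def write_t_of_n(input_string):
    """Toy sketch: on input 0^n, produce t(n) = n^2 while counting the
    elementary operations used. The count is exactly n^2, i.e., within
    O(t(n)), so n^2 is time-constructible in this toy cost model."""
    n = len(input_string)
    steps, value = 0, 0
    for _ in range(n):          # compute n^2 as n additions of n ones
        for _ in range(n):
            value += 1
            steps += 1
    return value, steps
```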
All the functions in the table of complexity classes above are time-constructible. And they are all distinct, as the Time Hierarchy Theorem shows.
Time Hierarchy Theorem. For every pair of functions \(f, g \colon \mathbb{N} \to \mathbb{N}\) where \(f \log f = o(g)\) and \(g\) is time-constructible, \(\mathbf{TIME}(f) \subsetneq \mathbf{TIME}(g)\).
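As a quick instance of the theorem: take \(f(n) = n\) and \(g(n) = n^2\). Then

\[ f(n) \log f(n) = n \log n = o(n^2) = o(g(n)), \]

and \(g(n) = n^2\) is time-constructible, so the theorem gives \(\mathbf{TIME}(n) \subsetneq \mathbf{TIME}(n^2)\), a sharper separation than the one we proved directly in Theorem 1.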
The proof of this version of the Time Hierarchy Theorem is essentially the same as the one we used for the special case above. But it does require two additional technical components:
- To obtain this version of the time hierarchy theorem, we need a more efficient version of the Universal Turing machine (with timer) which incurs only a logarithmic overhead in simulating each step of \(M\) (instead of a linear overhead, as we used in the proof of the special case above).
- And, more importantly, we also need to make sure that given the length \(n\) of the input \(x\), the machine \(U\) can initialize its timer with the value \(h(n) = g(n)/\log g(n)\). This is where time-constructibility comes in.
Is the time-constructibility condition in the theorem statement necessary? Perhaps, with a slightly more clever approach, we could generalize the Time Hierarchy Theorem even further so that it holds for all pairs of functions \(f,g\) that satisfy \(f \log f = o(g)\)? No, we can’t! Time-constructibility is a necessary condition in the Time Hierarchy Theorem. In fact, the Gap Theorem shows that it is possible to design a function \(f\) for which \(\mathbf{TIME}(f(n)) = \mathbf{TIME}(2^{f(n)})\). (Needless to say, this function \(f\) is not time-constructible; it also happens to be extremely fast-growing.)
The Class P
The class \(\mathbf{P} = \bigcup_{k \ge 0} \mathbf{TIME}(n^k)\) of languages that can be decided by polynomial-time Turing machines plays a particularly important role in computer science. In fact, the class \(\mathbf{P}\) is the one that is often used to represent the class of decision problems that are “efficiently” computable. But why this class? In part, it’s because the class \(\mathbf{P}\) has two important properties that make it particularly well-suited for study: its closure properties, and its robustness.
The most important closure property of the class \(\mathbf{P}\) is that it is closed under subroutine calls. Let’s say that \(M\) is a Turing machine that decides a language but, to do so, simulates some (subroutine) Turing machines \(N_1, N_2, \ldots\) to solve some intermediate subproblems. Let’s say further that (1) when we only consider each call to one of the \(N_i\) machines as taking a single time step then \(M\) has polynomial time cost, and (2) each Turing machine \(N_i\) has polynomial time cost. Then \(M\) is a polynomial-time Turing machine.
What this closure property means is that we can use high-level descriptions of algorithms to define Turing machines (as we have already done on multiple occasions). As long as the algorithm itself has polynomial time cost and each instruction of the algorithm can be computed in polynomial time, the result is a polynomial-time Turing machine. Note that the same property does not hold for \(\mathbf{LIN}\) (or for many other natural time classes): if an algorithm has a linear number of instructions and each instruction can be completed in linear time, the overall time complexity of the algorithm can be as large as quadratic.
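The quadratic blowup for \(\mathbf{LIN}\) can be seen with a toy operation count (hypothetical names, with Python standing in for the machine model):

```python
def linear_subroutine(data):
    """Toy subroutine: one elementary operation per element (linear time)."""
    ops = 0
    for _ in data:
        ops += 1
    return ops

def algorithm(data):
    """An algorithm with a linear number of instructions, each calling the
    linear-time subroutine. Counting each call as one step, the algorithm
    looks linear; charging the subroutine's true cost gives n * n = n^2
    operations in total."""
    total = 0
    for _ in data:                         # n instructions...
        total += linear_subroutine(data)   # ...each costing n operations
    return total
```

For \(\mathbf{P}\) the analogous composition stays inside the class: a polynomial number of calls, each with polynomial cost on polynomially long inputs, multiplies out to a polynomial.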
The robustness of the class \(\mathbf{P}\) lies in the fact that the Universal Turing Machine theorem can be extended to show that there is a Turing machine that can efficiently simulate (up to polynomial overhead) many other machine models as well. As a result, the set of languages that can be computed in polynomial time within those machine models are all contained in \(\mathbf{P}\). In fact, it is believed that this robustness property holds not only for some other machine models, but for all reasonable models of computation.
Cobham-Edmonds Thesis. Any decision problem that can be solved in polynomial time by an algorithm on any physically-realizable computer corresponds to a language that is in \(\mathbf{P}\).
(This thesis is quite a bit stronger than the Church-Turing thesis. For this reason it is also sometimes known as the strong Church-Turing thesis.)