
Compared to its standing in many other disciplines, the function of time in traditional software modeling seems modest. Undergirding software theory and practice is the notion of algorithm – the most abstract view of a computational process, in terms of elementary abstract operations. An algorithm is an effective procedure to solve a computational task that can be carried out with precisely defined resources; each algorithm defines a way to compute a function from some input to some output. In this very abstract view, time is mostly irrelevant, and the models that combine individual algorithms into larger pieces of software focus on purely functional behavior. On the other hand, computational complexity considers an abstract notion of running time of each algorithm as its time complexity, and also considers the algorithm’s usage of other computational resources (such as memory). Time complexity analysis “counts” the number of elementary operations performed when executing the algorithm on an abstract processor; time is still an uncomplicated entity compared to its complex attributes in other disciplines such as dynamical system theory (discussed in Chap. 4). Representing a significant departure from the unified view of dynamical systems, the traditional models of algorithms are twofold, as they complement a functional view with aspects of time complexity analysis.

This twofold view of time in software – functional and complexity-theoretical – has historical origins and a theoretical basis. Historically, the functional/complexity-theoretical duality mirrors, and further abstracts, the macro/micro views used in hardware synchronous models and described in Chap. 5: the macro behavior of a hardware circuit is the sequence of actions performed at every tick of the clock, but each macro-step is just a summary of the behavior of individual signals that propagate through the elementary circuit elements with their asynchronous delays. The functional/complexity-theoretical duality and its elementary notion of time are also justified by the theoretical results described in Sects. 6.2.6 and 6.2.7, which suggest that, within the universally used framework of algorithmic computation, more complex models of time would not affect the results of the analyses performed.

This chapter introduces the fundamental notions used in the computational complexity analysis of algorithms, and illustrates some of the most important results. The purpose of this chapter is twofold. First, it completes the historical account of Part I of the book by presenting the characteristics of time in the traditional models used to describe software and computational processes. The presentation focuses on the aspects that are most relevant for the formalisms introduced, whereas other dimensions mentioned in Chap. 3 (such as concurrency and composition) are discussed in Part II of the book with reference to more specific notations derived from the basic models of the present chapter. Second, it introduces some notions of computational complexity that will be referenced in Part II of the book to discuss analysis and verification algorithms for the various models and notations presented. In this way, the book is self-contained also for readers with no prior knowledge of computational complexity.

1 Models of Algorithms and Computational Complexity

Algorithms run on digital machines, and every digital machine is, in principle, just a finite-state machine – with a possibly huge but finite number of states. Therefore, it seems natural to found the notion of algorithm and computational process on abstract operational models such as finite state machines – in the synchronous interpretation with discrete time that abstracts away from implementation details. Chapter 5, however, already discussed how a finite-state model is impractical when the number of states involved is too large, as in the case of modern digital computers with large memories and complex processors.

In addition to having this practical limitation, the general notion of algorithm calls for a more abstract view of computation, which finite-state models fail to capture satisfactorily even in principle. An algorithm describes, in an abstract yet implementable setting, a general procedure to solve a certain computational problem; sorting a list of elements, computing the product of two integer numbers in their decimal representation, and finding the shortest path between two given nodes on a graph are all examples of computational problems with well-known algorithmic solutions. The definition of computational problem – and of the algorithms that solve it – must encompass inputs of any size: we are not interested in sorting a list of elements of length up to 100 (or even 100100), but of any length; we want to devise algorithms that compute the product of two integers of any size, and that find the shortest path on a graph with any number of nodes. Correspondingly, it is natural to express the complexity of an algorithm – an abstract measure of the “time” taken by the computation – as a function of the input size. This setting requires models that extend abstract finite-state machines with unbounded resources, so that they can perform computations involving arbitrarily large numbers of steps and amounts of memory, and can summarize the behavior of any computing device irrespective of its configuration and resources. Such infinite-state abstractions capture the notion of algorithm and have yielded general results that are directly applicable to the design and analysis of real programs running on digital computers with finite resources.

The rest of the present chapter reviews the fundamental notions of computational complexity in three contexts:

  • Section 6.2 deals with automata models. “Automata” is the standard name given to abstract machines in the context of algorithmic analysis. The section focuses on two fundamental automata classes: finite-state automata and Turing machines, which directly extend finite-state machine models with unbounded computational resources. (Other, more specialized, automata classes are out of this book’s scope; interested readers can find references in the bibliographic remarks at the end of the chapter). Automata models underpin the most fundamental results of computability and computational complexity theory; they introduce a very abstract view of computational processes that enables the analysis of the fundamental capabilities and limits of computation, such as the properties shared by every conceivable algorithm that solves a certain computational problem.

  • Section 6.3 presents computational models based on computing architectures à la von Neumann, and in particular the “random access machine”. These models introduce, still in an abstract setting, some finer-grained details of real computer architectures and organization. They support the analysis of properties of specific algorithms and of some of their implementation details on real computers.

  • Section 6.4 discusses extensions of the fundamental abstract models of Sect. 6.2 with probabilistic features.

The presentation of this chapter follows the historical approach of the current Part I; hence it mainly focuses on the traditional models of computation as a strictly sequential process that starts from the input, performs a sequence of elementary computational steps, and produces, if it terminates, the output. In these models, the time domain is DISCRETE (the natural numbers) and so are all the domains included in the computational model (such as the state). The evolution over time is sequential and DETERMINISTIC. However, the present chapter will also discuss the role of nondeterministic and probabilistic models of computation when relevant and instrumental to the general themes of the chapter.

2 Computational Complexity and Automata Models

The theory of computational complexity primarily uses models derived from the finite-state machines used in hardware design (see Chap. 5), called finite-state automata in complexity theory parlance. The next Sect. 6.2.1 introduces the general approach to defining computational complexity measures; then, Sect. 6.2.2 discusses the complexity analysis of finite-state automata. The rest of Sect. 6.2 introduces Turing machines, an extension of the finite-state model with unbounded resources. Turing machines support the most general definition of computational process (Sects. 6.2.3 and 6.2.4) and allow for a uniform and abstract analysis of the computational complexity of algorithmic problems (Sects. 6.2.5 and 6.2.7). The closing Sects. 6.2.8–6.2.10 discuss the role of nondeterminism in the context of computational complexity, and present some fundamental results based on the comparison of these models with the standard deterministic ones.

2.1 Measures of Computational Complexity

In the functional view, an algorithm A transforms an input x into an output \(f_{A}(x)\). Computational complexity measures the resources used in A’s computation as a function of the input size (also called “length”). More precisely, it associates two complexity measures with every algorithm A: the time complexity \(T_{A}\) and the space complexity \(S_{A}\); both are functions from \(\mathrm{I}\!\mathbb{N}\) to \(\mathrm{I}\!\mathbb{N}\), following an inherently DISCRETE presentation of time and other state domains. Different inputs of the same size will, in general, produce computations of different length or memory usage; the most common definition of complexity – and the one we use throughout the chapter – assumes a worst-case scenario: take the maximum number of steps and the maximum amount of memory used over all possible inputs of a given size. Then, for every n, the time complexity \(T_{A}(n)\) measures the maximum number of elementary operations (“steps”) performed by any computation with input of size n; and the space complexity \(S_{A}(n)\) measures the maximum number of elementary memory elements (“cells”) used during any computation with input of size n. For example, a time complexity 2n is associated with an algorithm that takes up to twice as many steps as the input size; a space complexity \({n}^{2} + {3}^{n}\) characterizes an algorithm that may use up to 13 (that is, \({2}^{2} + {3}^{2}\)) memory cells for an input of size 2.
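The worst-case convention can be made concrete with a few lines of code. The following sketch (in Python, with hypothetical helper names) instruments a toy linear-search algorithm with a step counter and computes T(n) by brute force, taking the maximum over all binary inputs of a given size:

```python
from itertools import product

def linear_search_steps(xs, target):
    """Count the elementary comparisons made by a linear search."""
    steps = 0
    for x in xs:
        steps += 1          # one comparison counts as one elementary step
        if x == target:
            break
    return steps

def worst_case_T(n, alphabet=(0, 1), target=1):
    """T(n): the maximum step count over all inputs of size n."""
    return max(linear_search_steps(xs, target)
               for xs in product(alphabet, repeat=n))

# The worst case is an input with no occurrence of the target:
# the search scans all n cells, so T(n) = n for this algorithm.
```

Different inputs of size n yield anywhere from 1 to n comparisons; the worst-case definition keeps only the maximum.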

The time and space complexities are in general partial functions; we write \(T_{A}(n) = \infty \) if A does not terminate for some input of length n, and \(S_{A}(n) = \infty \) if some input of size n generates a computation that uses an unbounded amount of memory.

The intuition behind the notions of time and space complexity should be clear, but their precise definition seems to require fixing many subtle details, such as the encoding of the input (and how we measure its size) and the elementary steps allowed by the computational model. Somewhat surprisingly, Sect. 6.2.6 will show that these details are ultimately negligible with respect to other factors, and a sound and robust definition of complexity is possible at a very high level of abstraction, independently of most implementation details. Until then, the presentation relies on the readers’ intuition to deploy the complexity measures appropriately. The presentation focuses on time complexity, given the general theme of the book, and discusses space complexity occasionally, mostly to show its relationship with time complexity.

2.2 Finite-State Automata

Let us apply the notion of complexity measures to the abstract computational model defined by finite-state automata in the perfect synchrony abstraction (Chap. 5). The input of a finite-state automaton is the sequence of input events; an input of n events has length n. It is irrelevant whether the automaton has an explicit output (such as in Moore and Mealy machines) or the sequence of states visited implicitly determines a trace of the computation. In both cases, the computation on an input of length n consists of exactly n computational steps, because each transition consumes exactly one input event. Therefore, \(T_{A}(n) = n\) for every finite-state automaton A. The memory of a finite-state automaton is its finite set of states. Hence, \(S_{A}(n) = \vert Q\vert \) for every finite-state automaton A with \(\vert Q\vert \) states, and the space complexity of finite-state automata does not depend on the input size.
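This analysis can be observed directly on a small simulator. The following Python sketch (with hypothetical names; a two-state parity checker stands in for a generic automaton A) performs exactly one transition per input event, so the step count always equals the input length, while the memory – the state set – stays fixed:

```python
def run_fsa(delta, q0, inputs):
    """Run a finite-state automaton, counting one step per input event."""
    state, steps = q0, 0
    for event in inputs:
        state = delta[(state, event)]
        steps += 1          # exactly one transition per input event
    return state, steps

# Parity automaton over {0, 1}: |Q| = 2, so S(n) = 2 regardless of input size.
PARITY = {('even', 0): 'even', ('even', 1): 'odd',
          ('odd', 0): 'odd', ('odd', 1): 'even'}

state, steps = run_fsa(PARITY, 'even', [1, 0, 1, 1])
# steps equals the input length (4); the final state records the parity of 1s
```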

The literature on automata and formal language theory usually calls any automaton that requires no more steps than the input length to process the input a real-time machine. Finite-state automata are real-time machines according to this terminology. The correspondence between this formal definition of “real time” and the informal usage denoting “predictable timing behavior” – discussed in Chaps. 1 and 3 – is loose. More precisely, the real time of automata implies the “informal” real time, but the latter is usually meant to include a broader class of behaviors and systems.

2.3 Turing Machines

Section 6.1 argued for a formal definition of algorithm and computational process based on an abstract computing machine more general than simple finite-state automata. This section presents “Turing machines”, a model that extends finite-state automata with unbounded scratch memory to perform arbitrarily long computations. Turing machines are named after the British mathematician Alan M. Turing, who introduced them in a groundbreaking article published in 1936.

Definition 6.1 (Turing machine). 

A k-tape Turing machine, pictured in Fig. 6.1, consists of:

  • A control unit, which is essentially a finite-state machine with a finite set Q of states managing the access to the input, output, and memory devices.

  • k + 2 tapes: one for the input, one for the output, and k for scratch memory. Each tape consists of a sequence of cells, bounded to the left and unbounded (infinite) to the right; cells on each tape are numbered with natural numbers. Each cell stores an element from a finite set Σ of symbols (the alphabet). Σ includes a special blank symbol “ □ ”. The memory tapes are rewritable, whereas the input is read-only and the output write-only.

  • k + 2 moving heads, one for each tape; the control unit accesses the tapes through the heads.

  • A transition function δ, whose details are described below, which defines how the control unit operates on the other components of the machine and on its own state.

Fig. 6.1

A k-tape Turing machine

At the beginning of every computation:

  • The input tape stores an encoding of the input.

  • All memory cells in the tapes other than the input are blank.

  • Each head occupies position 0 on its tape.

  • The control unit is in an initial state \(q_{0} \in Q\).

A computation consists of a sequence of steps from the initial setup. After every step during the computation:

  • The control unit is in one of the control states.

  • The tapes store only a finite number of non-blank symbols.

  • Every head occupies exactly one cell on its tape (the “current cell”).

The state of the control unit, the non-blank portions of the tapes, and the positions of the heads determine a configuration of the Turing machine.

The transition function

$$\delta : Q \times {\Sigma }^{k+1} \rightarrow Q \times {\Sigma }^{k+1} \times {\{ R,L,S\}}^{k+1} \times \{ R,S\}$$

defines every step (also called “move” or “transition”) from a configuration to the next one in a computation as follows. Whenever

  • The control unit is in state q ∈ Q;

  • The current cell under the input head stores the symbol i ∈ Σ;

  • For every 1 ≤ j ≤ k, the current cell under the jth tape head stores the symbol \({m}_{j} \in \Sigma \);

  • \(\delta (q,i,{m}_{1},\ldots ,{m}_{k})\) is defined and equal to

    $$\langle q^{\prime},o,{m}_{1}^{\prime},\ldots ,{m}_{k}^{\prime},{h}_{I},{h}_{1},\ldots ,{h}_{k},{h}_{O}\rangle$$

    for some q′ ∈ Q, o ∈ Σ, \({m}_{1}^{\prime},\ldots ,{m}_{k}^{\prime} \in \Sigma \), \({h}_{I} \in \{ R,L,S\}\), \({h}_{1},\ldots ,{h}_{k} \in \{ R,L,S\}\), h O  ∈ { R, S},

the Turing machine changes configuration in the following way:

  • The control unit switches to the state q′;

  • The current cell under the output head is written with the symbol o;

  • For every 1 ≤ j ≤ k, the current cell under the jth tape head is rewritten with the symbol \({m}_{j}^{\prime}\);

  • The input head moves right (by one position) if \({h}_{I}\) is R, moves left if \({h}_{I}\) is L, and does not move (“stays”) if \({h}_{I}\) is S;

  • For every 1 ≤ j ≤ k, the jth tape head moves right (by one position) if \({h}_{j}\) is R, moves left if \({h}_{j}\) is L, and does not move if \({h}_{j}\) is S;

  • The output head moves right (by one position) if \({h}_{O}\) is R, and does not move if \({h}_{O}\) is S (it cannot move left).

A computation stops when no move is possible; this occurs when \(\delta (q,i,{m}_{1},\ldots ,{m}_{k})\) is undefined, or when \(\delta (q,i,{m}_{1},\ldots ,{m}_{k})\) requires the head of the input tape or of one of the memory tapes to move left when it is already at the beginning of the tape.

It is customary to extend the graphical representation of finite-state automata to Turing machines: a graph with nodes corresponding to the control states Q and an arc between every pair of nodes q, q′ with label

$$i,{m}_{1},\ldots ,{m}_{k}/o,{m}_{1}^{\prime},\ldots ,{m}_{k}^{\prime}/{h}_{I},{h}_{1},\ldots ,{h}_{k},{h}_{O}$$

whenever the transition function defines

$$\delta (q,i,{m}_{1},\ldots ,{m}_{k})\ =\ \langle q^{\prime},o,{m}_{1}^{\prime},\ldots ,{m}_{k}^{\prime},{h}_{I},{h}_{1},\ldots ,{h}_{k},{h}_{O}\rangle.$$

\(\blacksquare\)
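The definition above can be made concrete with a small simulator. The following Python sketch (the function name, the `max_steps` safeguard, and the sparse-dictionary tape representation are our own choices, not part of Definition 6.1) follows the conventions just given: a read-only input tape, a write-only output tape, k rewritable memory tapes, and halting when δ is undefined or a head would fall off the left end of a tape:

```python
BLANK = '□'

def run_tm(delta, q0, input_word, k=1, max_steps=10_000):
    """Simulate the k-tape Turing machine of Definition 6.1.

    delta maps (q, i, m1, ..., mk) to (q', o, m1', ..., mk', hI, h1, ..., hk, hO),
    with head moves in {'R', 'L', 'S'} ('R'/'S' only for the output head).
    Returns the output tape contents and the number of moves performed.
    """
    in_tape = list(input_word)
    mem = [{} for _ in range(k)]            # sparse memory tapes: cell -> symbol
    out, out_pos = {}, 0
    q, in_pos, mem_pos, steps = q0, 0, [0] * k, 0
    while steps < max_steps:
        i = in_tape[in_pos] if in_pos < len(in_tape) else BLANK
        ms = tuple(mem[j].get(mem_pos[j], BLANK) for j in range(k))
        move = delta.get((q, i) + ms)
        if move is None:
            break                           # delta undefined: the machine halts
        q, o = move[0], move[1]
        new_ms, hI = move[2:2 + k], move[2 + k]
        hs, hO = move[3 + k:3 + 2 * k], move[3 + 2 * k]
        # a move that drives a head left of cell 0 is impossible: halt
        if (hI == 'L' and in_pos == 0) or any(
                h == 'L' and p == 0 for h, p in zip(hs, mem_pos)):
            break
        for j in range(k):                  # rewrite the memory tapes
            mem[j][mem_pos[j]] = new_ms[j]
        out[out_pos] = o                    # write the current output cell
        in_pos += {'R': 1, 'L': -1, 'S': 0}[hI]
        for j in range(k):
            mem_pos[j] += {'R': 1, 'L': -1, 'S': 0}[hs[j]]
        if hO == 'R':
            out_pos += 1
        steps += 1
    result = ''.join(out.get(p, BLANK)
                     for p in range(max(out) + 1)) if out else ''
    return result, steps

# A tiny example machine (k = 1) that copies its binary input to the output:
COPY = {('q', b, BLANK): ('q', b, BLANK, 'R', 'S', 'R') for b in '01'}
```

On input “101”, the COPY machine performs three moves – one per input symbol – and halts when the input head reads the first blank.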

The k-tape Turing machine in Definition 6.1 is different from Turing’s original model, which has a single tape used for input, output, and scratch memory. We preferred a slightly more complex model, because it better fits the traditional dynamical system view (Chap. 4), with a clear separation between input, output, and state. The rest of the current Sect. 6.2 will also show how the k-tape machine supports a more realistic complexity analysis than the single-tape machine, where significant time may be spent simply accessing the remote input or output portions of the tape.

Example 6.2.

Consider the problem of computing the successor \(\mathit{succ}(x) = x + 1\) of natural numbers encoded in binary; encode the input as a sequence of 0/1 characters, one per cell, with the most significant digit occupying the second position of the input tape, after a leading blank. With this setup, a Turing machine M succ with one memory tape that solves the problem scans the input backward (from right to left) while writing on the memory tape (from left to right):

  1. Before detecting the first 0 digit, it rewrites every 1 digit as 0 on the memory tape with an implicit “carry”;

  2. When it detects the first 0 digit, it rewrites it as 1 on the memory tape;

  3. After detecting the first 0 digit, it copies all the remaining input characters to the memory tape;

  4. If there are no 0 digits, the machine adds an extra 1 digit after writing to the memory tape as many 0’s as there are 1’s in the input;

  5. After processing all the input (that is, when the input head reads a blank “ □ ”), the machine reverses the content of the memory tape onto the output and stops.

Fig. 6.2

A 1-tape Turing machine M succ that computes the successor of a binary number (Multiple transitions between the same pairs of states are represented by multiple labels attached to the same edge, a customary convention that we will use whenever convenient)

Figure 6.2 pictures such a Turing machine for the successor function, where \(q_{0}\) is the initial state, \(q_{1}\) is reached when the input head is on the rightmost input cell, \(q_{2}\) denotes that the first 0 (if any) has been detected, and \(q_{3}\) is the state where the machine halts after writing the output. \(\blacksquare\)
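The five steps of M succ can also be mirrored procedurally. The following Python sketch (hypothetical names; a Boolean carry flag stands in for the control states) scans the input backward, builds the memory tape, and reverses it onto the output, exactly as the machine does:

```python
def succ_binary(bits):
    """Mirror M_succ: scan the input right-to-left, writing the memory
    tape left-to-right, then reverse it to produce the output."""
    memory = []
    carry = True                      # the machine starts with an implicit carry
    for b in reversed(bits):          # backward scan of the input tape
        if carry and b == '1':
            memory.append('0')        # step 1: 1 becomes 0 while carrying
        elif carry and b == '0':
            memory.append('1')        # step 2: the first 0 absorbs the carry
            carry = False
        else:
            memory.append(b)          # step 3: copy the remaining digits
    if carry:
        memory.append('1')            # step 4: no 0 found, add a leading 1
    return ''.join(reversed(memory))  # step 5: reverse memory onto the output
```

For example, the input “1011” (eleven) yields “1100” (twelve), and the all-ones input “111” triggers step 4 and yields “1000”.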

Exercise 6.3.

Define a Turing machine M double that computes the function double(x) = 2x of the input encoded in binary as in Example 6.2.

(Hint: doubling amounts to a shifting of the digits in binary). \(\blacksquare\)

Example 6.4.

A 2-tape Turing machine M square that computes the function \(\mathit{square}(x) = {x}^{2}\) of natural numbers encoded in unary (that is, the encoding of x is a sequence of x symbols 1) works as follows:

  • Scan the input and copy it on both memory tapes; after copying all the input, the input head does not move anymore.

  • For each of the x cells of the first memory tape that stores a 1:

    • Scan and copy to the output the whole non-blank content of the second memory tape;

    • Move the head on the second memory tape back to the first position. \(\blacksquare\)

Exercise 6.5.

Complete the formalization of the Turing machine M square in Example 6.4. \(\blacksquare\)

2.4 Universal Computation and the Church-Turing Thesis

When computing, a Turing machine goes through a sequence of configurations. The sequence may be finite – the machine halts because it reaches a configuration where no further steps are defined – or infinite. The configuration completely determines the future behavior of a Turing machine: it is the equivalent of the state in dynamical system models (Chap. 4), but the standard terminology of automata models reserves the word “state” for the state of the control unit, which varies over a finite domain, whereas the domain of all possible configurations is infinite.

An infinite configuration space endows Turing machines with conspicuous computational (expressive) power, their primitive memory model notwithstanding. In particular, it is clear that Turing machines can define computations inexpressible with finite-state automata: a finite-state automaton can only process the input “on line” as it reads it, and it has no long-term memory other than its finite set of states. Turing machines, in contrast, can process the input “off line” after storing it on a memory tape. For example, no finite-state automaton can compute the function \({x}^{2}\) (encoded in unary as in Example 6.4), because an output of length \({x}^{2}\) cannot be generated “in real time”: finite-state automata produce the output synchronously with every input character, and only a finite number of characters can be output with every input event.

The realization that Turing machines achieve greater computational power than finite-state automata immediately prompts the question of whether there exist computing devices even more powerful than Turing machines. More precisely, we are interested in computational models that are implementable, that is, models that abstract away only inessential details of real physical processes that can be engineered in actual computing devices. The Turing machine model is implementable: the unbounded memory tapes are only an abstraction for resources that are not fixed a priori but can grow as the computation progresses.

Even with this qualification about implementations, the best answer to the question about the existence of “the most powerful” computing device is only a conjecture known as the “Church-Turing thesis”:

Every implementable computational process can be computed by a Turing machine.

The evidence corroborating the validity of the Church-Turing thesis is overwhelming, as very diverse computational models, defined in extremely different contexts and with heterogeneous scopes, have always been proved to be no more powerful than the simple Turing machine. Section 6.3 explains, in particular, that the RAM – a model much closer to the architectures of digital computers – has the very same computational power as the Turing machine.

The Church-Turing thesis and its experimental validation justify the choice of the Turing machine model to present general results about the properties of computational processes. It also implies that there exist “programmable” Turing machines that can simulate every other Turing machine by encoding its transition function in the input, similarly to what is done in general-purpose digital computers where instructions and data are both stored in the central memory (Sect. 6.3 discusses these models in more detail). This property is called universality: some Turing machines can simulate every other Turing machine.

The computational power of Turing machines also poses ultimate limits on their analyzability. There are many questions about the behavior of Turing machines that, while perfectly well defined, cannot be answered by any Turing machine and therefore, according to the Church-Turing thesis, by any algorithm or computing device. In particular, every question about the long-term dynamics of a generic Turing machine, seen as an operational model of a dynamical system, is undecidable: there exists no algorithm that can answer such a question reliably and in finite time. Computational complexity and timing analysis cannot, therefore, be completely automated: with great power comes great undecidability.

Example 6.6.

The following questions about the long-term dynamics of a generic Turing machine M are undecidable:

  • The halting problem: does M halt for every input?

  • What is the maximum number of memory cells used by M during any computation? Is it always finite?

  • What is the maximum k such that M halts for every input of size less than or equal to k?

  • The busy-beaver problem: what is the maximum number of moves made by M in a halting computation? \(\blacksquare\)
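The undecidability of such questions rests on a classic diagonalization, which the following Python sketch illustrates purely for intuition (`claimed_halts` is a hypothetical stand-in for any total candidate decider; here, one that always answers True). The diagonal program does the opposite of whatever the decider predicts about it, so the decider must be wrong on that input – and the same construction defeats every candidate decider:

```python
def claimed_halts(f):
    """A hypothetical halting decider; any total implementation would do
    for the argument -- this one simply always answers True."""
    return True

def trouble():
    """Diagonal program: do the opposite of what the decider predicts."""
    if claimed_halts(trouble):
        while True:              # decider said "halts": loop forever instead
            pass
    # decider said "loops": halt immediately instead

# claimed_halts(trouble) returns True, yet calling trouble() would loop
# forever: the decider is wrong on its own diagonal input.  Since the
# construction works against any candidate, no correct decider can exist.
verdict = claimed_halts(trouble)
```

Note that `trouble()` is never actually called: the contradiction is established by inspecting what it *would* do given the decider’s answer.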

2.5 Complexity of Algorithms and Problems

According to the Church-Turing thesis, the Turing machine provides the most general notion of computational process. This section introduces the timing analysis of Turing machines with the approach of computational complexity discussed in Sect. 6.2.1.

A computation consists of a sequence of configurations; the timing details of the configuration changes are abstracted away, and the only measure of time is the count of steps, each considered atomic. The time complexity \(T_{M}(n)\) of a Turing machine M is then the function that counts the maximum number of M’s moves on any input of size n.

Example 6.7.

Consider the 2-tape Turing machine M square – outlined in Example 6.4 – that computes the square of the input encoded in unary form. For an input of size n, M square performs the following moves:

  • n + 1 steps to scan the input until the input head reaches the first blank, while copying it over the two memory tapes in parallel;

  • \(1 + 2(n + 1)\) steps for each input character: one step to move the head on the first tape to the next cell, n + 1 steps to scan the characters stored on the second tape until the first blank character, and another n + 1 steps to move the head back to the beginning of the tape just before the first position with a stored character.

In total, M square performs

$$(n + 1) + n \cdot (1 + 2(n + 1)) = 2{n}^{2} + 4n + 1 = {T}_{{ M}_{\mathit{square}}}(n)$$

moves for every input of size n. \(\blacksquare\)
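The step-by-step count above can be checked mechanically. The following sketch (a hypothetical Python helper) tallies the moves exactly as in the analysis and compares the total with the closed form:

```python
def square_moves(n):
    """Tally the moves of M_square (Example 6.4) on a unary input of size n,
    following the step-by-step analysis of Example 6.7."""
    moves = n + 1            # scan and copy the input, up to the first blank
    for _ in range(n):       # for each of the n cells storing a 1:
        moves += 1           #   advance the head on the first memory tape
        moves += n + 1       #   copy the second tape's content to the output
        moves += n + 1       #   rewind the second-tape head
    return moves

# The tally matches the closed form T_Msquare(n) = 2n^2 + 4n + 1.
assert all(square_moves(n) == 2 * n * n + 4 * n + 1 for n in range(100))
```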

There are two different, but tightly connected, branches that deal with computational complexity along the lines of Example 6.7. On the one hand, the “analysis of algorithms” is concerned with studying the complexity of individual Turing machines that formalize a certain algorithm; Example 6.7 is an instance of analysis of an algorithm to compute the square of numbers in unary form. On the other hand, “computational complexity theory” analyzes the inherent complexity of every algorithm for a certain computational problem. Example 6.7 considers the complexity of a particular solution to the problem of “computing the function square”, whereas computational complexity theory would investigate the complexity that any algorithm that correctly computes the function square must have, no matter how different the algorithms that compute the function are. The algorithmic analysis in Example 6.7, in particular, shows that it is possible to compute square with a time complexity \({T}_{{M}_{\mathit{square}}}\), but it leaves open the possibility of devising more ingenious algorithms that achieve the same goal with fewer computational steps.

Computational complexity theory classifies computational problems into complexity classes according to their inherent complexities: two problems in the same class can be solved with a similar amount of resources. The rest of the present section gives a more rigorous definition of complexity classes, and briefly presents some of the most important ones. As for every other topic discussed in Part I, providing a comprehensive overview of such a complex and broad topic as computational complexity theory is impossible in this book – and out of its scope. The presentation focuses instead on the essential notions, with as few technical details as possible, and it privileges the results that are directly applicable to the presentation of the modern computational models that include a notion of time, introduced in Part II.

2.6 The Linear Speed-Up Theorem

Consider again the simple time complexity analysis of the Turing machine in Example 6.7. The exact form of \({T}_{{M}_{\mathit{square}}}\) depends on details of how the Turing machine works; for example, if the machine could detect when a head is on the first character written at the left end of a tape, it would perform n fewer steps for an input of size n. Remarkably, the linear speed-up theorem shows that these details are ultimately unimportant, and seemingly brittle analyses such as the one for \({T}_{{M}_{\mathit{square}}}\) turn out to be the basis of a robust and representative characterization of the time complexity of real algorithms, irrespective of their implementation details. Even more interestingly, a similar speed-up theorem holds for other models of computation, such as the RAM model of von Neumann architectures discussed in Sect. 6.3, in which the linear speed-ups correspond to an increase in the bit size of memory words in a real computer. This wide applicability of the notion of speed-up validates the generality of time complexity analysis.

Theorem 6.8 (Linear speed-up). 

Given any Turing machine M solving a problem with time complexity \(T_{M}(n)\) and any rational number c > 0, it is possible to build another Turing machine M′ that, after reading the whole input string, solves the same problem as M with time complexity \(c \cdot T_{M}(n)\), that is,

$${T}_{M^{\prime}}(n)\ =\ \max \left (n,c \cdot {T}_{M}(n)\right ).$$

\(\blacksquare\)

The fundamental idea behind the proof of the linear speed-up theorem is the trade-off between complexity of the input (and memory) encoding and the resources needed to solve a problem: if the alphabet of M′ has an element for every k-tuple of characters of the alphabet of M (for a suitable choice of k according to c’s value), M′ can coalesce multiple sequential steps of M into one of its steps, thus achieving a faster running time for the same problem.
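The recoding at the heart of the proof can be illustrated with a few lines of Python (a sketch with hypothetical names): packing each group of k consecutive symbols into a single cell over the tuple alphabet \(\Sigma^{k}\) shrinks the tape – and any scan of it – by a factor of k:

```python
from itertools import zip_longest

BLANK = '□'

def compress(tape, k):
    """Recode a tape over alphabet Sigma into one over Sigma^k: each new
    cell packs k consecutive symbols, padding the last group with blanks."""
    groups = zip_longest(*[iter(tape)] * k, fillvalue=BLANK)
    return [tuple(g) for g in groups]

tape = list('abcdef')
packed = compress(tape, 3)
# packed holds 2 cells instead of 6, so a full scan costs 2 steps
# instead of 6 -- a linear (here, threefold) speed-up of the scan.
```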

The linear speed-up theorem entails the robustness of the complexity analysis of Turing machines – according to the principles of Sect. 6.2.1 – with respect to changes in the details of how the machines work. More precisely, every detail that does not affect the asymptotic value of the complexity measures, but accounts for at most a constant factor, can be overlooked. This suggests ignoring multiplicative constants completely and partitioning complexity measures according to their asymptotic behavior, using the following notation.

Definition 6.9 (Asymptotic notation). 

Consider two functions \(f,g : \mathrm{I}\!\mathbb{N} \rightarrow \mathrm{I}\!\mathbb{N}\).

  • f is O(g) (“big oh of g”) if there exist positive constants c, k (k integer) such that \(f(n) \leq c \cdot g(n)\) for all n > k;

  • f is \(\Omega (g)\) (“big omega of g”) if there exist positive constants c, k (k integer) such that \(f(n) \geq c \cdot g(n)\) for all n > k;

  • f is \(\Theta (g)\) (“big theta of g”) if f is O(g) and \(\Omega (g)\). \(\blacksquare\)

Example 6.10.

The function \(2{n}^{2} + 4n + 1 = {T}_{{M}_{\mathit{square}}}(n)\) is:

  • \(\mathrm{O}({n}^{2})\), \(\Omega ({n}^{2})\), and \(\Theta ({n}^{2})\);

  • \(\Theta ({n}^{2} + n + 1)\), \(\Theta ({n}^{2} + n)\), \(\Theta (3{n}^{2})\), \(\Theta (5{n}^{2})\), \(\Theta (10{0}^{1000}{n}^{2})\);

  • \(\mathrm{O}({n}^{4})\), \(\mathrm{O}({4}^{n})\), \(\mathrm{O}(\exp (\exp (\exp (\exp ({n}^{1/2})))))\). \(\blacksquare\)

Exercise 6.11.

Show the following relation between exponential functions: for every 1 < b < c, \({b}^{n}\) is \(\mathrm{O}({c}^{n})\). \(\blacksquare\)

The \(\Theta \) relation is an equivalence relation among complexity functions. The equivalence classes in the partition induced by the equivalence relation are robust with respect to constant multiplicative factors and additive factors with asymptotically slower growth; machines with complexity measures in the same class have the same asymptotic complexity. Correspondingly, we can overload the notation and let \(\Theta (g)\) denote the set of all functions f that are \(\Theta (g)\). Customarily, \(\Theta (g)\) is presented with g in the simplest form with unit constants. For example, \({T}_{{M}_{\mathit{square}}}(n)\) in Example 6.4 induces the set of functions \(\Theta ({n}^{2})\), and M square has asymptotic complexity in \(\Theta ({n}^{2})\).

The asymptotic notation supports the analysis of the computational complexity of algorithms, and the invention of new ones, based on their asymptotic complexity: improvements in the hardware can achieve a linear speed-up of the algorithms currently available, but a new algorithm with asymptotically faster behavior will overwhelmingly outperform the other algorithms for inputs of increasingly large size, irrespective of how optimized the machines that run them are.

Example 6.12.

Consider the two well-known sorting algorithms implemented in a high-level language on a von Neumann architecture:

  • Bubble Sort, with asymptotic time complexity in \(\Theta ({n}^{2})\);

  • Merge Sort, with asymptotic time complexity in \(\Theta (n\log n)\).

The performance of Merge Sort on a slow computer will inevitably surpass the performance of Bubble Sort on a much faster computer as the input size becomes sufficiently large. \(\blacksquare\)
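The gap of Example 6.12 can be made concrete by counting comparisons, the elementary steps of comparison-based sorting. This is an illustrative sketch: the step counters stand in for running time on an abstract machine.

```python
import random

# Counting comparisons of Bubble Sort vs. Merge Sort to make the
# Theta(n^2) vs. Theta(n log n) gap of Example 6.12 concrete.

def bubble_sort_steps(a):
    a, steps = list(a), 0
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            steps += 1                      # one comparison = one step
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a, steps

def merge_sort_steps(a):
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, sl = merge_sort_steps(a[:mid])
    right, sr = merge_sort_steps(a[mid:])
    merged, steps = [], sl + sr
    i = j = 0
    while i < len(left) and j < len(right):
        steps += 1                          # one comparison = one step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged += left[i:] + right[j:]
    return merged, steps

random.seed(0)
data = [random.randrange(10**6) for _ in range(2048)]
_, b = bubble_sort_steps(data)
_, m = merge_sort_steps(data)
print(b, m)  # roughly n^2/2 comparisons vs. roughly n*log2(n)
```

Even a large constant-factor speed-up of Bubble Sort (a "faster computer") cannot compensate for the asymptotic gap as n grows.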

2.7 Complexity Classes and Polynomial Correlation

Let us now move our focus from the analysis of individual algorithms to the classification of computational problems according to the complexity of their algorithmic solutions. A complexity measure induces the class of problems that have solutions with that complexity.

Definition 6.13 (TIME and SPACE complexity classes). 

A time complexity measure T(n) defines the complexity class TIME(T(n)) of all problems that can be solved by some Turing machine with time complexity in O(T(n)); a space complexity measure S(n) defines the complexity class SPACE(S(n)) of all problems that can be solved by some Turing machine with space complexity in O(S(n)). (DTIME and DSPACE are other names used for TIME and SPACE that emphasize the deterministic nature of the computational models). \(\blacksquare\)

Since complexity classes are sets of problems, it is natural to compare them by means of set-theoretic relations, such as “⊆”, “≠”, and “⊃”. When each of m > 1 classes \({C}_{1},\ldots ,{C}_{m}\) is included in the following one, \({C}_{1} \subseteq \cdots \subseteq {C}_{m}\), we say that they define a “hierarchy”. Exercise 6.14 presents a simple hierarchy between pairs of time complexity classes.

Exercise 6.14.

Show that if f(n) is O(g(n)) then \(\mathrm{TIME}(f(n)) \subseteq \mathrm{TIME}(g(n))\). \(\blacksquare\)

Example 6.15.

Consider the following computational problems.

PALINDROME::

determine if the input sequence is a palindrome. (A sequence is a palindrome if it reads the same left to right and right to left; for example, a, bb, abccba, and abcba are palindromes, but ab and abbca are not).

PALINDROME is in TIME(n) for a 1-tape Turing machine that works as follows: copy the input on the memory tape, and compare the input with itself read backward.

SORT::

sort a sequence of n integer elements.

Example 6.12 implies that SORT, when implemented on a von Neumann architecture, is in \(\mathrm{TIME}(n\log n)\); for algorithms based on comparison, \(\Theta (n\log n)\) is also the best asymptotic time complexity possible.

MATRIX_MULTIPLY::

compute the product of two \(n \times n\) matrices.

The algorithm that implements the definition of matrix multiplication on a von Neumann architecture shows that MATRIX_MULTIPLY is in \(\mathrm{TIME}({n}^{3})\). Every algorithm for MATRIX_MULTIPLY necessarily takes at least \({n}^{2}\) steps because it must generate the \({n}^{2}\) elements of the result matrix; hence MATRIX_MULTIPLY is not in \(\mathrm{TIME}({n}^{c})\) for any c < 2. At the time of this writing, the asymptotically fastest algorithm known for MATRIX_MULTIPLY runs in time \(\Theta ({n}^{2.373})\); hence MATRIX_MULTIPLY is in \(\mathrm{TIME}({n}^{2.373})\).

ENUMERATE::

generate all n-digit numbers in base b.

ENUMERATE when implemented on k-tape Turing machines with alphabet of cardinality greater than or equal to b is in \(\mathrm{TIME}(n \cdot {b}^{n})\): there are exactly \({b}^{n}\) n-digit base-b numbers, and generating an n-digit number with a finite alphabet takes time proportional to n. \(\blacksquare\)
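The counting argument for ENUMERATE can be illustrated directly: there are bⁿ outputs, each of length n. The following sketch uses Python's standard library to enumerate them; the function name and digit alphabet are ours.

```python
from itertools import product

# Generating all b^n n-digit base-b numbers, as in ENUMERATE: writing
# each number takes time proportional to n, for n * b^n total work.

def enumerate_base_b(n, b):
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"[:b]
    return ["".join(t) for t in product(digits, repeat=n)]

nums = enumerate_base_b(n=3, b=2)
print(len(nums))  # 2^3 = 8
print(nums)       # ['000', '001', '010', '011', '100', '101', '110', '111']
```

No algorithm can do asymptotically better than the bⁿ factor, since merely writing the output already requires n · bⁿ symbols.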

With the linear speed-up theorem, the notion of asymptotic behavior fosters an analysis of algorithms that is robust with respect to the details of the particular machine, or program, chosen to implement the given algorithm. Can we similarly define complexity classes that are robust with respect to the choice of computational model?

In general, whether a problem is in a class TIME(f(n)) may in fact depend on the choice of computational model. Consider again the problem PALINDROME: determine if the input string is a palindrome. Example 6.15 outlines a 1-tape Turing machine solving PALINDROME that runs in time \(\Theta (n)\). If, however, we adopt the original and simpler variant of Turing machines with a single tape – used for input, output, and scratch memory – then we can prove that even the best algorithm for PALINDROME requires time \(\Theta ({n}^{2})\). In all, the problem of computing PALINDROME is certainly in \(\mathrm{TIME}({n}^{2})\), but it is in \(\mathrm{TIME}(n)\) for some (universal) computational models and not for others.
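The linear-time strategy of Example 6.15 – copy the input onto a memory tape, then compare the input with the copy read backward – can be sketched as follows, using a Python list as a stand-in for the memory tape.

```python
# Sketch of the two-tape strategy for PALINDROME: each of the two phases
# scans the input once, hence O(n) steps overall. A Python list plays the
# role of the memory tape; popping from its end reads the copy backward.

def is_palindrome(s):
    memory = []
    for ch in s:           # phase 1: copy the input onto the memory tape
        memory.append(ch)
    for ch in s:           # phase 2: compare input with memory read backward
        if ch != memory.pop():
            return False
    return True

print(is_palindrome("abccba"))  # True
print(is_palindrome("abbca"))   # False
```

A single-tape machine, by contrast, must shuttle back and forth across the tape to compare the two ends, which is where the Θ(n²) lower bound comes from.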

Exercise 6.16.

Consider the problem of sorting with a Turing machine.

  1. Describe a k-tape Turing machine B that sorts a sequence of natural numbers, represented in binary form and separated by “#” symbols, using the Bubble Sort algorithm; what is the complexity \({T}_{B}(n)\) of the machine (with n the length of the whole input string)? Compare it with the complexity of a corresponding program running on a von Neumann architecture.

  2. Consider the sorting problem of (1), but this time assume that the natural numbers considered have at most K binary digits (with K a given constant). Describe a Turing machine M that uses the Merge Sort algorithm to sort the sequence of input numbers, and determine its complexity \({T}_{M}(n)\), with n again the length of the whole input string. \(\blacksquare\)

It is possible to introduce the notion of a complexity class being robust with respect to the choice of computational model. It rests on the notion of polynomial correlation, which extends the \(\Theta \) notation to accommodate polynomial transformations.

Definition 6.17 (Polynomial correlation). 

Two functions \(f,g\,:\,\mathrm{I}\!\mathbb{N} \rightarrow \mathrm{I}\!\mathbb{N}\) are polynomially correlated if there exist two polynomial functions \(p,q\,:\,\mathrm{I}\!\mathbb{N} \rightarrow \mathrm{I}\!\mathbb{N}\) of any degree such that f is O(p(g(n))) and g is O(q(f(n))). \(\blacksquare\)

Example 6.18.

  1. The functions n and \({n}^{2}\) are polynomially correlated: \({n}^{2} = n \cdot n\).

  2. The functions n and \(n\log n\) are polynomially correlated: n is \(\mathrm{O}(n\log n)\) and \(n\log n\) is \(\mathrm{O}({n}^{2})\).

  3. The functions \({2}^{n}\) and \({n}^{n}\) are not polynomially correlated:

    $$\frac{{n}^{n}} {{\left ({2}^{n}\right )}^{k}} = \frac{{n}^{n}} {{2}^{k\cdot n}} = \frac{{n}^{n}} {{\left ({2}^{k}\right )}^{n}}\mathop{\longrightarrow}\limits_{}^{n \rightarrow \infty }\infty $$

    for every constant k. \(\blacksquare\)
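Item 3 of the example can be checked numerically: for any fixed exponent k, the ratio nⁿ / (2ⁿ)ᵏ keeps growing. This is only a finite illustration of the limit; the function name is ours.

```python
# Numeric illustration of item 3 of Example 6.18: n^n / (2^n)^k grows
# without bound for any fixed k, so no polynomial in 2^n dominates n^n.
# Exact integer arithmetic avoids floating-point overflow.

def ratio(n, k):
    return n**n // (2**n)**k

for n in (16, 32, 64):
    print(ratio(n, k=3))   # 2^16, 2^64, 2^192: strictly growing
```

For n = 2^m the ratio is exactly 2^{(m−k)·n}, which diverges for every fixed k.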

Consider two classes \({C}_{1}\), \({C}_{2}\) of abstract machines as powerful as Turing machines; for example, \({C}_{1}\) and \({C}_{2}\) are two variants of Turing machines, or a Turing machine and a markedly different model based on hardware architectures (such as the RAM described in Sect. 6.3). We say that \({C}_{1}\) can efficiently simulate \({C}_{2}\) if for every machine \({M}_{2}\) in \({C}_{2}\) that runs in time \({T}_{2}(n)\) there exists a machine \({M}_{1}\) in \({C}_{1}\) that simulates \({M}_{2}\) and runs in time \({T}_{1}(n)\) such that \({T}_{1}(n)\) and \({T}_{2}(n)\) are polynomially correlated. The notion of simulation refers to a way of “reproducing” the results of every algorithm defined in a computational model within another computational model, with at most a polynomial slowdown.

Of course, a polynomial slowdown may be quite conspicuous in practice, and in fact the analysis of individual algorithms is finer grained and does not group algorithms together only because they have polynomially correlated running times. The identification of “efficient” with “polynomial” is, however, apt for the study of the computational complexity of problems, where it provides a way to define classes of computational problems independently of the computational model chosen. This practice underlies a refinement of the Church-Turing thesis that takes computational complexity into account. This refinement usually goes by the name of “strong Church-Turing thesis”, although it is not due to Church or Turing:

Every implementable computational process can be computed by a Turing machine with at most a polynomial slowdown with respect to the complexity of any other abstract machine computing the same process.

As in the original Church-Turing thesis, the qualification “implementable” means that “unreasonable” computational models, which fail to capture features of real implementations, are ruled out.

We are finally ready to define a few standard complexity classes. They are robust with respect to any computational model that is efficiently (i.e., polynomially) simulated by a Turing machine (for space complexity classes, the polynomial correlation is between space complexity measures).

Definition 6.19 (Deterministic complexity classes). 

  • P (also PTIME) is the class of problems that can be solved in polynomial time (and unlimited space), that is,

    $$\mathrm{P}\ =\ { \bigcup }_{k\in \mathrm{I}\!\mathbb{N}}\mathrm{TIME}({n}^{k}).$$

    P is usually considered the class of problems that are computationally tractable, according to the intuition developed above.

  • EXP (also EXPTIME) is the class of problems that can be solved in exponential time (and unlimited space), that is,

    $$\mathrm{EXP}\ =\ { \bigcup }_{k\in \mathrm{I}\!\mathbb{N}}\mathrm{TIME}(\exp ({n}^{k})).$$
  • PSPACE is the class of problems that can be solved in polynomial space (and unlimited time), that is,

    $$\mathrm{PSPACE}\ =\ { \bigcup }_{k\in \mathrm{I}\!\mathbb{N}}\mathrm{SPACE}({n}^{k}).$$
  • EXPSPACE is the class of problems that can be solved in exponential space (and unlimited time), that is,

    $$\mathrm{EXPSPACE}\ =\ { \bigcup }_{k\in \mathrm{I}\!\mathbb{N}}\mathrm{SPACE}(\exp ({n}^{k})).$$
  • ELEMENTARY is the class of problems that can be solved in iterated exponential time (and unlimited space), that is,

    $$\mathrm{ELEMENTARY}\ =\ { \bigcup }_{k\in \mathrm{I}\!\mathbb{N}}\mathrm{TIME}(\underbrace{\exp (\exp (\cdots \exp (}_{k\ \mathrm{times}}n)\cdots ))).$$

    Correspondingly, a decidable problem is nonelementary if its time complexity grows faster than any iterated exponential function. Chapter 9 will mention an algorithm with nonelementary complexity. \(\blacksquare\)

A computation cannot use more space than time, because writing a memory cell requires at least one step and each cell can be rewritten multiple times. Therefore, \(\mathrm{TIME}(f(n)) \subseteq \mathrm{SPACE}(f(n))\) for every complexity measure f(n). The next section introduces other fundamental complexity classes, for nondeterministic models of computation.

2.8 Nondeterministic Models of Computation

All the models of computation discussed so far in the present chapter are deterministic. Chapter 3, however, mentioned the usefulness (or necessity) of nondeterministic and probabilistic models for the formalization of certain classes of processes and systems. Section 6.3 will introduce probabilistic computational models, whereas the remaining parts of Sect. 6.2 will introduce nondeterministic models of computation and present their features in the context of computational complexity analysis.

A nondeterministic model of computation can produce several distinct computations starting from the same initial input. Hence, a nondeterministic process in the functional model associates with every input a set of outputs – all those that the computation can produce starting from the input given. Correspondingly, whereas the “state” uniquely determines the next step taken in a deterministic computation, a nondeterministic process allows multiple steps to be taken from the same “state” – for an appropriate notion of state. Concretely, the transition function of a nondeterministic automaton defines a set of transitions for each current “state” and input. The next definition details this idea for finite-state automata and Turing machines; Chaps. 7 and 8 will present different extensions of the finite-state automaton model that feature a notion of nondeterminism.

Definition 6.20 (Nondeterministic finite-state automaton and Turing machine). 

A nondeterministic finite-state automaton consists of the same components as a deterministic automaton (Sect. 5.2.2), but the transition function has signature

$$\delta : Q \times I \rightarrow \wp \left (Q\right ),$$

where \(\wp \left (Q\right )\) is the powerset of Q. For current state q ∈ Q and input i ∈ I, the nondeterministic automaton can change its state to any of the states in the set δ(q, i).

A nondeterministic Turing machine consists of the same components as a deterministic Turing machine (Definition 6.1), but the transition function has signature

$$\delta : Q \times {\Sigma }^{k+1} \rightarrow \wp \left (Q \times {\Sigma }^{k+1} \times \{ R,L,{S\}}^{k+1} \times \{ R,S\}\right ).$$

For current state q ∈ Q, input symbol i ∈ Σ, and memory symbols \({m}_{1},\ldots ,{m}_{k} \in \Sigma \), the nondeterministic machine can change its configuration according to any of the tuples in \(\delta (q,i,{m}_{1},\ldots ,{m}_{k})\). \(\blacksquare\)

The interpretation of nondeterministic features with respect to the process modeled depends on the kinds of questions that we want the model to address. Automata, in particular, model the computation of some function of the input. Given that nondeterministic automata express a set of possible behaviors for a given input and initial state, which of these behaviors is expected to produce the correct output? If an external “hostile” environment can influence the nondeterministic alternatives, the computational process has to guarantee a correct output for every nondeterministic behavior, of any length. This assumption is called “external” or “demonic” nondeterminism, and is adopted in certain notations such as process algebras (see Chap. 10); it corresponds to the general notion of UNIVERSAL NONDETERMINISM discussed in Sect. 3.4.1. If, instead, the nondeterministic choices represent the possibility of selecting the “best” alternative whenever convenient, it is sufficient that one of the nondeterministic behaviors yields a correct output. This semantics is called “internal” or “angelic” nondeterminism; it corresponds to the notion of EXISTENTIAL NONDETERMINISM of Sect. 3.4.1, and it is the standard assumption in the context of computational complexity analysis.

The rest of the chapter adopts the latter existential view of nondeterminism, of which there are two intuitive interpretations in the context of computational complexity. In the first view, a nondeterministic machine has a somehow uncanny power to “choose” the best alternative at each step – the choices leading to a successful computation, if one exists, with as few steps as possible. In the other view of nondeterminism the machine spawns parallel computations at every step; each computation examines the effects of a particular choice from among the alternatives, and the machine combines the results returned by the parallel threads. This notion of parallelism is unbounded, because even if the choices available at every step are finite, the length of computations is, in general, unbounded, and the machine may spawn parallel processes at every step. In both views – nondeterministic choice and unbounded parallelism – a BRANCHING-TIME domain often is the natural model of nondeterministic computations: all the possible computations originating from the same initial state are represented as a tree (see Chap. 3).

Remark 6.21.

Definition 6.20 conservatively considers automata and Turing machines with a unique initial state; but nondeterminism can easily accommodate multiple initial states. We discuss the issue with reference to finite-state automata; extending the same concepts to Turing machines is straightforward. If \({Q}_{0} \subseteq Q\) is a set of initial states of automaton A, every computation of A starts from a nondeterministically chosen state q ∈ Q 0 and then continues as in Definition 6.20. The additional source of nondeterminism is entirely reducible to the case of unique initial states, because the initial nondeterministic choice is expressible by adding transitions that exit a unique initial state: let us build an automaton A′ with unique initial state q 0 equivalent to A. A′ includes all states and transitions in A, plus a fresh state \({q}_{0}\not\in Q\) that is its unique initial state. For every initial state q ∈ Q 0 of A, input symbol i ∈ I, and state \(q^{\prime} \in \delta (q,i)\) directly reachable from the initial state, A′ also includes a transition from q 0 to q′ with input i. A′ can make precisely the same first transitions as A, and it behaves identically to A after leaving q 0; hence the two automata are equivalent. Figure 6.3 shows the equivalence construction on a simple automaton with two initial states. \(\blacksquare\)
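The construction of Remark 6.21 is mechanical enough to be sketched directly. In the following illustration (the representation and function name are ours) an NFA transition function is a dictionary mapping (state, symbol) pairs to sets of target states; the fresh initial state copies every transition that exits an original initial state.

```python
# Sketch of the construction in Remark 6.21: given an NFA with a set of
# initial states, build an equivalent NFA with a single fresh initial
# state q0 by copying every transition that exits an initial state.

def single_initial_state(delta, initial_states, fresh="q0"):
    assert fresh not in {q for q, _ in delta}, "fresh state must be new"
    new_delta = {key: set(targets) for key, targets in delta.items()}
    for (q, symbol), targets in delta.items():
        if q in initial_states:
            # A' can make from q0 exactly the moves A can make from any
            # of its initial states.
            new_delta.setdefault((fresh, symbol), set()).update(targets)
    return new_delta, fresh

# An automaton with two initial states a and b over alphabet {0, 1}:
delta = {("a", "0"): {"a"}, ("a", "1"): {"b"}, ("b", "0"): {"a", "b"}}
new_delta, q0 = single_initial_state(delta, initial_states={"a", "b"})
print(sorted(new_delta[(q0, "0")]))  # ['a', 'b']: union of a's and b's moves on 0
```

After the first transition, A′ is in some original state of A and behaves identically from then on, which is why the two automata accept the same inputs.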

The introduction of nondeterministic models brings forth the question of their impact on computational power and resource usage. The following sections investigate these fundamental questions, focusing on Turing machines (Chap. 7 will discuss nondeterministic extensions of finite-state automata in more detail).

  • Are nondeterministic computational models more powerful, in terms of expressible computations, than deterministic models?

  • Within models with the same expressive power, do nondeterministic models achieve a better (that is, lower) computational complexity? In other words, is nondeterministic computation more efficient?

  • Is the nondeterministic abstraction a suitable model of “real” computational processes? Is it readily implementable? What is the impact of nondeterministic computation on the Church-Turing thesis (Sect. 6.2.4) and on the strong Church-Turing thesis (Sect. 6.2.7)?

Fig. 6.3

Two equivalent nondeterministic finite-state automata

To approach these questions rigorously, we first introduce complexity measures and classes for nondeterministic models.

Definition 6.22 (Nondeterministic complexity classes). 

A nondeterministic Turing machine N runs in \({\#}_{N}(x)\) steps for an input x if the shortest sequence of steps that is allowed by the nondeterministic choices and correctly computes the result for input x has length \({\#}_{N}(x)\). Correspondingly:

  • A nondeterministic Turing machine N has time complexity T(n) if the maximum of \({\#}_{N}(x)\) over all inputs x of size n is T(n).

  • A time complexity measure T(n) defines the complexity class NTIME(T(n)) of all problems that can be solved by some nondeterministic Turing machine with time complexity in O(T(n)).

  • NP (also NPTIME) is the class of problems that can be solved in polynomial time (and unlimited space) using nondeterminism, that is,

    $$\mathrm{NP}\ =\ { \bigcup }_{k\in \mathrm{I}\!\mathbb{N}}\mathrm{NTIME}({n}^{k}).$$
  • NEXP (also NEXPTIME) is the class of problems that can be solved in exponential time (and unlimited space) using nondeterminism, that is,

    $$\mathrm{NEXP}\ =\ { \bigcup }_{k\in \mathrm{I}\!\mathbb{N}}\mathrm{NTIME}(\exp ({n}^{k})).$$

    \(\blacksquare\)

To simplify the presentation, and to focus on the central issues, we do not analyze the consequences of nondeterminism for space complexity. It is useful to remark, however, that PSPACE and EXPSPACE define classes of complexity even higher than NP: it is known that

$$\begin{array}{rcl} \mathrm{P} \subseteq \mathrm{NP} \subseteq \mathrm{PSPACE}& =& \mathrm{NPSPACE} \subseteq \mathrm{EXP} \subseteq \mathrm{NEXP} \subseteq \mathrm{EXPSPACE} \\ & =& \mathrm{NEXPSPACE}.\end{array}$$

Also, notice that each deterministic class is included in its nondeterministic counterpart (e.g., P is included in NP) because determinism is a special case of nondeterminism where there is a unique choice at every step.

2.9 Nondeterministic Turing Machines

Nondeterminism does not increase the computational power of Turing machines. Therefore, the Church-Turing thesis applies to nondeterministic models too:

Every (implementable) nondeterministic computation can be computed by a deterministic Turing machine.

The intuition behind a deterministic simulation of a nondeterministic Turing machine rests on the idea of nondeterministic choice: whenever the nondeterministic computation can “choose” from among multiple next steps, the deterministic Turing machine will sequentially try each alternative, exhaustively enumerating all possible computations originating from the current configuration (the complete branching-time tree). This simulation scheme is not difficult to implement, but it introduces in general an exponential blowup in the running time. Precisely, consider a nondeterministic Turing machine N that runs in time T N (n). The nondeterministic complexity measure factors in the power of choice, that is, it does not consider the alternative steps that are possible but useless for the computation. A deterministic Turing machine D simulating N by enumerating the computations will, however, have to include the “unfruitful” choices as well in the enumeration; if N’s transition function allows for up to b alternatives at every step, D runs in time polynomially correlated to \({b}^{{T}_{N}(n)}\) in the worst case.
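The deterministic simulation described above can be sketched as a depth-first exploration of the branching-time tree. In this illustration, `choices` and `accepting` are hypothetical stand-ins for the nondeterministic transition function and the acceptance test; the names are ours.

```python
# Sketch of the deterministic simulation of nondeterminism: exhaustively
# explore the branching-time tree of computations by depth-first search.
# choices(config) returns the possible next configurations (the
# nondeterministic alternatives); accepting(config) is the success test.

def simulate(config, choices, accepting, depth_limit):
    """Return True if some computation of length <= depth_limit accepts."""
    if accepting(config):
        return True
    if depth_limit == 0:
        return False
    return any(simulate(nxt, choices, accepting, depth_limit - 1)
               for nxt in choices(config))

# Toy example: from integer state s, nondeterministically add 1 or 3;
# accept on reaching exactly 7. With b = 2 choices per step and depth d,
# the search visits up to 2^d configurations in the worst case.
print(simulate(0, lambda s: [s + 1, s + 3], lambda s: s == 7, 5))  # True
```

The nondeterministic machine "finds" 7 in three magic steps (3 + 3 + 1), while the deterministic search may examine every branch of the tree, which is the source of the exponential blowup.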

Example 6.23.

The “map 3-coloring” problem (3-COL) is the problem of deciding whether each region of a generic map can be colored green, blue, or red in such a way that no two adjacent regions have the same color.

A nondeterministic algorithm – expressible with a nondeterministic Turing machine – can solve 3-COL as follows: pick a color; use it to color one of the regions not colored yet; repeat until all regions are colored; when the coloring is complete, check if no two adjacent regions have the same color and report success or failure. The time complexity of this nondeterministic algorithm is \(\mathrm{O}({n}^{3})\) in the number of regions: n steps to nondeterministically produce a coloring, and then \(\mathrm{O}({n}^{3})\) steps to check if the coloring satisfies the adjacency property.

To this end, a memory tape stores the regions that are adjacent as sequences with separators; for example, with reference to US states, the sequence AZ NV UT CO NM CA encodes the fact that Arizona (AZ) is adjacent to Nevada (NV), Utah (UT), Colorado (CO), New Mexico (NM), and California (CA). If US states are encoded in unary notation respecting the alphabetic order (Alabama = 1, Alaska = 11, Arizona = 111, etc.), the adjacency list has length \(\mathrm{O}({n}^{3})\). Another tape encodes the colors given to each state in a fixed order, say alphabetical; for instance, a list beginning with \(\mathsf{red}\ \mathsf{green}\ \mathsf{blue}\) means that Alabama is colored red, Alaska green, and Arizona blue. With this setting, since every state has \(\mathrm{O}(n)\) neighbors, and checking the color of each neighbor takes \(\mathrm{O}(n)\) steps to look up the list of colors, the complete check has cubic complexity.

A deterministic Turing machine simulating the nondeterministic algorithm must enumerate, in the worst case, all possible \({3}^{n}\) colorings. Therefore, it runs in time \(\mathrm{O}({n}^{3} \cdot {3}^{n})\), exponential in the nondeterministic time complexity. \(\blacksquare\)
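The brute-force enumeration of Example 6.23 can be sketched compactly: try all 3ⁿ colorings and run the polynomial-time adjacency check on each. The adjacency encoding below (a set of index pairs) replaces the tape encoding of the example for brevity; the function name is ours.

```python
from itertools import product

# The deterministic brute force of Example 6.23: try all 3^n colorings of
# n regions and check the adjacency constraint for each. The check is
# polynomial; the enumeration is what makes the total time exponential.

def three_colorable(n, adjacent):
    """adjacent: set of pairs (i, j) of adjacent regions, 0 <= i < j < n."""
    for coloring in product(("red", "green", "blue"), repeat=n):
        if all(coloring[i] != coloring[j] for i, j in adjacent):
            return True
    return False

# A triangle of three mutually adjacent regions is 3-colorable...
print(three_colorable(3, {(0, 1), (1, 2), (0, 2)}))                          # True
# ...but four mutually adjacent regions are not (pigeonhole on 3 colors).
print(three_colorable(4, {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}))  # False
```

A nondeterministic machine would instead "guess" one coloring and run only the check, which is the O(n³) bound of the example.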

2.10 NP-Completeness

Is the exponential blowup that deterministic Turing machines incur in simulating nondeterministic computations unavoidable in the worst case? In other words, can we efficiently implement the nondeterministic model of computation? These are outstanding open questions, whose complete answers have so far eluded the best scientists – and not for lack of trying!

If the blowup is unavoidable, it means that nondeterminism is not a model of efficiently implementable computational processes; the strong Church-Turing thesis does not apply to nondeterministic processes; unbounded parallelism is incommensurable with the bounded amount of parallelism that deterministic machines with multiple processors can achieve; and the complexity class P is a strict subset of the class NP.

The prevailing conjecture, supported by overwhelming theoretical and practical evidence, is that nondeterminism cannot be efficiently implemented in the general case: every attempt at designing an efficient (i.e., polynomially correlated) deterministic simulation of nondeterministic computations has failed, but it is extremely hard to prove that such an achievement is impossible, even in principle.

If we set aside the theoretical aspects and focus on practical evidence, it seems that nondeterminism is not a reasonable abstraction of “physical” computational processes. Nonetheless, it is still a convenient abstraction because it captures the essence shared by a huge family of heterogeneous problems with enormous practical relevance, known as NP-complete problems. A precise definition of NP-complete problem appears below. Intuitively, an NP-complete problem is one for which checking if a candidate solution is correct is computationally tractable, but constructing a correct solution from scratch seems intractable. As explained in the previous sections, the notion of “tractable” means that there exists a polynomial-time algorithm; hence a tractable problem is in the class P. In contrast, an intractable problem has only solutions with super-polynomial (typically, exponential) running times. Nondeterministic computations can use the power of “magic choice” and construct the correct solution in the same running time needed to check it, and hence efficiently with respect to the use of nondeterministic resources; or, equivalently, they can explore in parallel all the exponentially many possible solutions and select the correct ones.

The complexity class NP includes all the problems that are “easy to check and (seemingly) hard to solve” for deterministic algorithms, but which nondeterministic algorithms can process efficiently. The NP-complete problems are then the “hardest problems in NP”. Some examples of NP-complete problems:

  • The “map 3-coloring” problem (3-COL) (introduced in Example 6.23).

  • The “traveling salesman” problem (TSP) (also called the “shortest Hamiltonian cycle” problem): given a list of cities and their pairwise distances, find the shortest tour that visits every city exactly once and returns to the starting city.

  • The “Boolean satisfiability” problem (SAT): given a combinatorial circuit of Boolean logic gates (see Chap. 5) with a single output, determine whether there is an input that makes the output evaluate to true. An equivalent presentation of the same problem is in terms of Boolean formulae in propositional logic.

  • The “register allocation problem”: given a program in a high-level language, find an assignment of the program variables to a minimum number of registers in such a way that no two variables use the same register at the same time.

All these problems have enormous practical relevance. What is remarkable is that, in spite of their widely different domains, they all are intimately connected by being NP-complete: the existence of an efficient implementable algorithm for any NP-complete problem would imply that every NP-complete problem has a similarly efficient solution. In complexity-theoretical terms, if there exists an NP-complete problem which is also in P, then P = NP and every NP problem has a deterministic tractable solution. Conversely, if we succeeded in proving that a specific NP-complete problem has no efficient solution, then no NP-complete problem would have an efficient solution, and P ⊂ NP.

Definition 6.24 (NP-completeness). 

For a problem p, P(x) denotes the solution of p for input x (so, in particular, C(x) denotes the solution of c for input x); similarly, M(x) denotes the (unique) output of the deterministic Turing machine M with input x. Then, a problem c in NP is “NP-complete” if, for any other problem p in NP, there exist two deterministic Turing machines \({R}_{p\rightarrow c}\) and \({R}_{c\rightarrow p}\) with polynomial time complexities such that \({R}_{c\rightarrow p}(C({R}_{p\rightarrow c}(x))) = P(x)\) holds for every input x. \(\blacksquare\)

Thus, a problem c is NP-complete if we can use it to solve every other problem p in NP with the same time complexity up to a polynomial factor: transform the input to p into an “equivalent” input to c in polynomial time with \({R}_{p\rightarrow c}\); solve c for the transformed input; transform the solution back in p’s form in polynomial time with \({R}_{c\rightarrow p}\). The bottleneck in this process is solving c: if it can be done efficiently, then every other problem in NP is solvable efficiently (with at most a polynomial slowdown) by piggybacking on c’s solution.

Conventional wisdom, supported by the results outlined above, considers the border between tractable and intractable problems in any “reasonable” computational model to lie close to the P/ NP (or, more precisely, P/NP-complete) frontier. However, the failure to come up with efficient solutions for NP-complete problems has not prevented the development of algorithms that use sophisticated heuristics and highly optimized data structures to achieve performances that are acceptable in practice for inputs of reasonable size. Part II of the book will mention several examples of verification tools that exploit these results to analyze complex properties of systems automatically with acceptable performance, in spite of the problems being NP-complete or of even harder complexity classes.

3 Computational Complexity and Architecture Models

The automata-based models analyzed in Sect. 6.2 are suitable for investigating fundamental questions about the limits of computing and the inherent computational complexities of problems. Part II will show that automata are also appropriate abstractions for domain-specific modeling, especially when they include a notion of time. Automata-based models usually are, however, of limited usefulness for modeling general-purpose digital computing devices. The von Neumann hardware architecture typically underlies the internal organization of these devices. Therefore, the fine-grained analysis of the running time (and memory used) by digital computer programs must use formal models that represent the principles of the von Neumann architecture accurately, with suitable abstractions.

3.1 Random Access Machines

We already discussed the inadequacy of finite-state automata for providing a reasonable abstraction of general-purpose computers. Turing machines provide the “right” expressive power for representing general-purpose computing, but they are imprecise models of the computational complexities of real programs. In particular, the assumption that every move of a Turing machine counts for one unit of time is an oversimplification for general-purpose computers, where different elementary operations performed by the CPU can have quite different running times. For example, memory access operations normally take more clock cycles than operations accessing only internal CPU registers, and input/output operations are even orders of magnitude slower.

There is a simple partial solution to this inadequacy of the Turing machine model: assign different costs to different moves of the Turing machine. This complicates the complexity analysis somewhat in exchange for a more detailed model. The added detail, however, would be nullified by linear speed-up results, and it would still fail to capture a more fundamental discrepancy between automata-based models and architectures à la von Neumann: the memory model. Turing machines access their memory tapes strictly sequentially; therefore, the position of a datum affects the number of steps required to access it. Computers with von Neumann architectures have direct access memory, where an elementary operation such as load or store can atomically transfer a block of bits from one memory location to any other location, regardless of their relative positions in the memory.

This section develops computational models that, while still abstract compared to the architectures of real computers, include memory with direct access, and therefore support an accurate analysis of algorithmic complexity. The analysis results largely extend to general-purpose computers with similar architectures. One classic abstract model of a computer with direct access memory is the “Random Access Machine” (RAM).

Definition 6.25 (Random access machine). 

A random access machine (RAM), pictured in Fig. 6.4, consists of:

  • Input and output tapes, each a sequence of cells accessed sequentially with one head (similarly to Turing machines).

  • A direct access memory of unbounded size; the memory is a sequence of cells, each addressed by a natural number.

  • A control unit with:

    • A hardwired program consisting of a sequence of instructions, each numbered by a natural number;

    • A register called the “accumulator” (ACC);

    • The “program counter” (PC), another register storing the address of the next instruction to be executed;

    • A unit capable of performing arithmetic operations.

Fig. 6.4
figure 4

A random access machine (RAM)

Every memory cell, including the registers and those on input/output tapes and in the direct access memory, can store any integer value, or a character from a finite alphabet.

The RAM executes the instructions programmed sequentially, except for jump instructions that explicitly modify the PC and change the instruction to be executed next. Figure 6.5 lists the instructions available and their semantics; M[n] denotes the content of memory at address n. \(\blacksquare\)

Example 6.26.

Figure 6.6 shows a RAM program that checks if the input is a prime number; it implements the trivial algorithm that tries all possible divisors. In the comments, n denotes the input value read. \(\blacksquare\)

3.2 Algorithmic Complexity Analysis with Random Access Machines

Let us analyze the (time) complexity of the RAM program in Fig. 6.6 with the same principles used for Turing machines: count the maximum number of instructions executed in a run with input n. The program tests up to n candidate divisors of the input n, and each test executes a loop of 12 instructions; this gives a time complexity function \(\Theta (n)\). It is clear that the implementation of the same algorithm with a Turing machine would not have the same asymptotic complexity: just multiplying two numbers requires a number of steps that increases with the number of digits of the numbers.
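The trial-division algorithm implemented by the RAM program of Fig. 6.6 can be sketched in Python (a hypothetical rendering for illustration; the actual program is given as RAM instructions in the figure):

```python
def is_prime_trial_division(n):
    """Trial division, mirroring the RAM program of Fig. 6.6:
    try every candidate divisor d = 2, 3, ..., n - 1."""
    if n < 2:
        return False
    d = 2
    while d < n:            # at most n - 2 iterations: Theta(n) overall
        if n % d == 0:      # d divides n: n is composite
            return False
        d += 1
    return True
```

Under the uniform cost criterion, each loop iteration counts as a constant number of RAM instructions, which gives the \(\Theta (n)\) bound relative to the input value.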

Fig. 6.5
figure 5

The instruction set of the RAM

Fig. 6.6
figure 6

A RAM program for primality testing

This significant discrepancy suggests the need for more detailed inspection to understand the scope and limits of the RAM’s abstraction. The assumption that every memory cell in the RAM can store an integer implies that a single instruction can process an unbounded amount of information: there is no a priori bound on the maximum integer storable. On the one hand, the architectures of digital computers do offer powerful instruction sets, which can manipulate and perform arithmetic on the content of any register within a fixed number of clock cycles. On the other hand, every architecture has memory cells and registers of finite size, usually measured in bits. In a 64-bit architecture, for example, the RAM’s abstraction of unbounded integers storable in every cell is appropriate as long as the program deals only with integers between \(-{2}^{63}\) and \(+{2}^{63} - 1\) that fit a “real” memory cell. When this hypothesis does not hold, the complexity analysis on the RAM may give results that are not generalizable to a real architecture. The following Example 6.27 makes a compelling case.

Example 6.27.

Figure 6.7 shows a RAM program that computes the double exponential \({2}^{{2}^{n} }\) of the input n by successive squaring: starting from 2, squaring n times yields \({\left (\cdots {\left ({2}^{2}\right )}^{2}\cdots \right )}^{2} = {2}^{{2}^{n} }\). The loop is executed n times, which gives again a time complexity \(\Theta (n)\) relative to the input value. \(\blacksquare\)

The time complexity analysis of Example 6.27 is clearly unacceptable: in a k-bit architecture, the intermediate result would overflow a single cell after about \({\log }_{2}k\) loop iterations, and it would occupy \(\lceil {2}^{i}/k\rceil \) k-bit cells at the ith iteration of the loop. Any reasonable complexity analysis must take into account the extra elementary operations used to manipulate multi-cell numbers.
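A sketch of the successive-squaring computation illustrates how quickly the intermediate value outgrows any fixed cell width (the function name is ours, not from the figure):

```python
def double_exponential(n):
    """Successive squaring as in the RAM program of Fig. 6.7:
    the loop body runs n times, but the intermediate value
    doubles its bit length at every iteration."""
    x = 2
    for _ in range(n):
        x = x * x           # squaring doubles the number of bits
    return x
```

On a k-bit architecture the value no longer fits a single cell as soon as its bit length exceeds k; Python’s `int.bit_length` makes the exponential growth visible (after i iterations the value occupies \(2^{i} + 1\) bits).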

Fig. 6.7
figure 7

A RAM program that computes the double exponential of the input

We combine these observations in the “logarithmic cost criterion”, which evaluates the time complexity of a RAM program according to the following principles.

  • An integer value n occupies \({\log }_{2}n\) bits, split into \({(\log }_{2}n)/k\) cells in a real k-bit architecture; hence, accessing the content of a memory cell that stores an integer n requires \({(\log }_{2}n)/k\) elementary operations. The logarithmic cost criterion incorporates these estimates by modeling the access to a memory cell storing a number n as taking time \(\Theta (\log n)\); the precise value of k is abstracted away in the asymptotic notation.

  • Similarly, accessing a memory cell at address n requires \(\Theta (\log n)\) elementary resources in a real architecture, to send the address over the memory bus and to receive back the content of the cell. The logarithmic cost criterion thus counts \(\Theta (\log n)\) time units for every access to memory at address n.

  • The measure of the input length is also proportional to the value of the input cells: an input consisting of the sequence of integer values \({i}_{1}\,{i}_{2}\,\ldots \,{i}_{n}\) has length \(\Theta (\log {i}_{1} +\log {i}_{2} + \cdots +\log {i}_{n})\), the parameter to be used in the complexity measures.

  • Characters are from a finite alphabet; hence accessing a cell storing a character only costs constant \(\Theta (1)\) time units.

The base of the logarithms is irrelevant because \({\log }_{a}(n) = {\log }_{b}(n)/{\log }_{b}(a) = \Theta ({\log }_{b}(n))\): every encoding other than unary (in base 2, 3, or more) achieves the same asymptotic complexity.

The simplistic approach to complexity measures used before is called the uniform cost criterion, in contrast with the logarithmic cost criterion. The uniform cost criterion is perfectly adequate for algorithms that manipulate only integers of bounded size and use a bounded amount of memory (that is, bounded for every input). Otherwise, only the logarithmic cost criterion achieves a reliable complexity analysis.

Figure 6.8 shows the cost of the basic RAM instructions computed according to the logarithmic cost criterion; in the table, \(\ell (x)\) is a shorthand for the expression

$$\ell (x)\ =\ \left \{\begin{array}{@{}l@{\quad }l@{}} 1 \quad &x = 0\text{ or }x\text{ is a character},\\ {\lfloor \log }_{ 2}\vert x\vert \rfloor + 1\quad &\text{ otherwise}. \end{array} \right.$$

Intuitively, each operation “costs” something proportional to the logarithmic size of its operands and result. For example, the sum of two integers \({n}_{1}\) and \({n}_{2}\) requires us to manipulate approximately \({\lfloor \log }_{2}({n}_{1})\rfloor \) and \({\lfloor \log }_{2}({n}_{2})\rfloor \) bits for the operands and \({\lfloor \log }_{2}({n}_{1} + {n}_{2})\rfloor \) bits for the result; this affects the memory cells used and is also proportional to the number of truly elementary sum operations required by the RAM’s addition.
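These definitions translate directly into a small Python sketch: `ell` mirrors the function \(\ell (x)\) above, and `sum_cost` is an illustrative (not normative) estimate of the logarithmic cost of an addition:

```python
import math

def ell(x):
    """The logarithmic size l(x): 1 if x is 0 (a character, from a
    finite alphabet, also costs 1), floor(log2 |x|) + 1 otherwise."""
    return 1 if x == 0 else math.floor(math.log2(abs(x))) + 1

def sum_cost(n1, n2):
    """Illustrative logarithmic cost of adding n1 and n2: proportional
    to the bits manipulated for both operands and for the result."""
    return ell(n1) + ell(n2) + ell(n1 + n2)
```

For instance, adding 3 and 5 costs \(\ell (3) + \ell (5) + \ell (8) = 2 + 3 + 4 = 9\) units under this estimate.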

Fig. 6.8
figure 8

The cost of some RAM instructions with the logarithmic criterion

For example, each iteration of the loop of the program in Fig. 6.6 corresponds to a number of elementary steps proportional to the logarithms of the input n and of the current divisor, that is, \(\Theta (\log n)\). Repeated for n iterations, this gives a total number of steps \(\Theta (n\log n)\). Finally, we express this complexity with respect to the input size \(m = \ell (n)\) – which is \(\Theta (\log n)\) – to get a time complexity measure \(\Theta (m \cdot {2}^{m})\).
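The contrast between the two criteria on the trial-division loop can be tallied concretely; the per-iteration logarithmic charge below is a simplified assumption (it counts only the sizes of the operands n and d, not the exact instruction costs of Fig. 6.8):

```python
import math

def ell(x):
    # logarithmic size of x, as defined for the logarithmic cost criterion
    return 1 if x == 0 else math.floor(math.log2(abs(x))) + 1

def primality_costs(n):
    """Tally the trial-division loop under both criteria: uniform cost
    charges 1 per iteration (Theta(n) overall); the simplified
    logarithmic charge counts the sizes of the operands n and d
    (Theta(n log n) overall, i.e. Theta(m * 2**m) for m = ell(n))."""
    uniform = logarithmic = 0
    for d in range(2, max(n, 2)):
        uniform += 1
        logarithmic += ell(n) + ell(d)
    return uniform, logarithmic
```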

Exercise 6.28.

Complete Fig. 6.8 with the logarithmic cost of the missing instructions.

  • How would you measure the cost of mult? Does it affect the complexity result for the program in Fig. 6.6?

  • Determine the time complexity of the RAM program in Fig. 6.7 using the logarithmic cost criterion. \(\blacksquare\)

Exercise 6.29.

Consider the algorithm that performs binary search of an integer in a sequence of integers.

  • Write a RAM program that performs binary search of a sequence stored in memory.

  • Determine the time complexity of the program developed in (6.29), both with the uniform cost and with the logarithmic cost criteria; compare the two measures.

  • Outline (without details) a Turing machine that performs binary search; determine its time complexity measure; compare it to the RAM’s in (6.29). What is the reason for the difference? \(\blacksquare\)

Exercise 6.30.

Consider the problem PALINDROME (Example 6.15), determining if a sequence of characters is a palindrome or not.

  • Write a RAM program that determines if the input sequence is a palindrome; determine its time complexity (both with uniform and with logarithmic cost criteria).

  • Write a Turing machine that determines if the input sequence is a palindrome; determine its time complexity.

  • Compare the time complexities of the RAM and Turing machine models. \(\blacksquare\)

Exercise 6.31.

The “Random Access Stored Program machine” (RASP) is similar to the RAM, but stores its program in the direct access memory together with the data. It is the equivalent of a universal Turing machine for RAMs. The RASP execution model consists of fetching each instruction and copying it in a special register before executing it; the value of the program counter determines the next instruction to fetch. Extend the cost measures in Fig. 6.8 for the RASP machine model.

3.3 Random Access Machines and Complexity Classes

The examples and exercises in the previous parts of Sect. 6.3 show that the same problems may have different complexities in the RAM model and the Turing machine model. In most cases, the RAM’s direct access memory allows it to achieve asymptotically faster running times than the Turing machine, but there are problems for which the humble Turing machine model can be faster.

Consider, for example, the problem of generating an output sequence that equals the concatenation of the input with itself (e.g., if the input sequence is 17, the output is 1717), and assume that the elements in the sequence are from a finite set. The Turing machine can solve the task in \(\Theta (n)\) steps, by copying the input to a memory tape and then copying it twice onto the output tape. The RAM must copy the elements of the sequence in memory using \(\Theta (n)\) memory addresses; this causes a logarithmic slowdown with the logarithmic cost criterion and a time complexity \(\Theta (n\log n)\) asymptotically worse than the Turing machine’s.

Section 6.2.7 anticipated the differences in analyzing single algorithms versus problems and justified them. After presenting the RAM model, we can better understand the different scope of complexity analysis for algorithms and for problems.

We base the analysis of the optimal implementation of specific algorithms on a machine model that, while abstract, captures the fundamental characteristics of the architecture of the real computers used. The RAM is a suitable model for investigating the complexities of algorithms for sequential programs running on architectures à la von Neumann. The bibliographic remarks at the end of the chapter mention other models of different architectures (e.g., parallel architectures) with the same level of abstraction as the RAM. The linear speed-up theorem applied to RAMs guarantees that details such as the bit size of memory cells or the execution speed of individual instructions do not affect the algorithmic complexity analysis performed with RAM models.

In the more detailed abstraction of RAMs and similar machines, it is sometimes useful to consider finer-grained complexity classes that are not robust with respect to polynomial correlations but are robust with respect to constant-factor (i.e., linear) speed-up. In particular, the two deterministic complexity classes LIN (problems solvable in “linear” time O(n)), also called LTIME, and L (problems solvable in logarithmic space O(log n)), also called LSPACE, characterize problems with very efficient solutions that easily scale up to very large input sizes.

On the other hand, we use Turing machines (and other very abstract models) to determine the inherent complexities of computational problems. The results are valid beyond Turing machines, because every algorithm formalized in a “reasonable” abstract (or real) machine will have complexity polynomially correlated with the Turing machine’s; hence the complexity class of a problem is universal with respect to any “implementable” computational model. In particular, a Turing machine can simulate the computation of every RAM with a polynomial slowdown – with respect to the “realistic” logarithmic cost criterion; and a RAM can simulate the computation of every k-tape Turing machine with a polynomial (actually, logarithmic) slowdown – measured with the logarithmic cost criterion.

4 Randomized Models of Computation

A computational process is randomized if its evolution over time depends on chance. Whereas the behavior of a deterministic process is completely determined by the initial state and the input, and a nondeterministic process can “choose” among possible next states and will explore all choices, a randomized (or stochastic) process can “flip a coin” or “throw a die” to select the next state from the current one. Chapter 3 outlined the basic motivation for introducing randomness in computational models: stochastic behavior is common in many physical phenomena. More precisely, it is inherent in some (as in the microscopic world of quantum mechanics) and an extremely convenient abstraction of others (such as in statistical mechanics for modeling the collective behavior of large populations of simple particles). The theme of the present chapter gives an additional justification for considering randomized models: since randomized models are clearly implementable – because random processes exist in nature – does using randomness as a resource increase computational power?

The present section discusses stochastic extensions of finite-state automata (Sect. 6.4.1) and Turing machines (Sect. 6.4.2), their impact on the results discussed in the previous sections, and their connections with some of the models presented in Part II of the book. In accordance with the outline of Sect. 3.4, the rest of this section presents notations incorporating randomness with two different roles: in some cases, the choice of which transition to make is randomized; in others, the timing of transitions is subject to chance. Whereas it is obvious that the latter case – of randomized timing – has a direct and EXPLICIT impact on timed behavior, the former case – of randomized transition choices – has an indirect impact, since different transitions may produce different response times.

4.1 Probabilistic Finite-State Automata

Finite-state automata extended with a notion of probability are usually called “Markov chains”. The name comes from Andrey Markov, who first suggested using the word “chain” to designate a family of discrete-time stochastic processes with memory. Even though the origins of Markov chains and finite-state automata are quite different, it is natural to present Markov chains as probabilistic extensions of finite-state automata. More precisely, Markov chains coincide with probabilistic finite-state automata when they possess a finite number of states, but Markov chains over a denumerable number of states are also possible. This section uses the two nomenclatures interchangeably, and presents three variants of the probabilistic finite-state automaton model: vanilla discrete-time Markov chains (Sect. 6.4.1.1), with probabilities associated with transitions; Markov decision processes (Sect. 6.4.1.2), which include input events; and continuous-time Markov chains (Sect. 6.4.1.3), where the timing of transitions is randomized.

4.1.1 Discrete-Time Markov Chains

Probabilistic finite-state automata (also, “discrete-time Markov chains”) associate a probability with each transition that determines the chance that the transition is taken.

Definition 6.32 (Probabilistic finite-state automaton/Discrete-time Markov chain). 

A probabilistic finite-state automaton is a finite-state automaton without an input alphabet, extended with a probability function π : Q ×Q → [0, 1] which determines which transitions are taken. The probability function is normalized:

$${\sum }_{q^{\prime}\in Q}\pi (q,q^{\prime}) = 1$$

for every q ∈ Q. When the automaton is in some state q ∈ Q, it can make a transition to any state q′ ∈ Q with probability π(q, q′). At every step, the automaton moves to the drawn state q′, where it gets ready for a new transition.

Discrete-time Markov chains generalize discrete-time probabilistic finite-state automata to any countable set Q of states. \(\blacksquare\)

Probabilistic automata include no explicit notion of input, as transitions are chosen at random solely according to the probability distribution defined by π for the present state. This feature makes probabilistic automata suitable models of systems where a finite number of “events” can happen with certain probability but without direct control.

Example 6.33.

Figure 6.9 shows a probabilistic finite-state automaton that models the following behavior of a student who takes a series of exams: when the student passes an exam (state P), there is a 90 % chance that she will pass the next exam as well (transition to P); when the student fails an exam (state F), there is a 60 % chance that she will fail the next exam (transition to F). The figure shows the probabilities associated with each transition and the corresponding next states. \(\blacksquare\)
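The chain of Example 6.33 can be simulated directly; the dictionary encoding of the states and of the probability function π below is an illustrative assumption of this sketch:

```python
import random

# The two-state student chain of Fig. 6.9; each entry lists the
# possible next states with their probabilities pi(state, -).
TRANSITIONS = {
    "P": [("P", 0.9), ("F", 0.1)],
    "F": [("P", 0.4), ("F", 0.6)],
}

def step(state, rng):
    """Draw the next state according to pi(state, -)."""
    r, acc = rng.random(), 0.0
    for nxt, p in TRANSITIONS[state]:
        acc += p
        if r < acc:
            return nxt
    return TRANSITIONS[state][-1][0]    # guard against rounding

def run(start, k, rng):
    """A run of k transitions from state 'start'."""
    states = [start]
    for _ in range(k):
        states.append(step(states[-1], rng))
    return states
```

Long simulated runs make the average behavior visible: the empirical fraction of exams passed approaches the steady-state value discussed below.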

The analysis of deterministic models focuses on the properties of individual runs with a given input; that of nondeterministic models captures the behavior of every nondeterministic run on a given input. Analyses of probabilistic finite-state models typically take a different angle, and target the probability of certain runs or the average long-term behavior of the automaton.

Example 6.34.

Let us go back to Example 6.33 and analyze a few properties of the model, under the assumption that the probability of passing the very first exam is 0.9 (that is, P is the initial state).

  • The probability that the student passes the first k consecutive exams is \(0.{9}^{k}\).

  • The probability that the student passes every other exam for 2k exams is \({(0.1 \cdot 0.4)}^{k} = 0.0{4}^{k}\).

  • The probability that the student always fails goes to 0 as the number of exams increases. \(\blacksquare\)

Computing the probability of specific runs is simple because the probabilistic choices depend only on the current state and are otherwise independent of the outcomes of the previous stochastic events. This property – which characterizes Markov processes – is in fact called “Markov property”.

Fig. 6.9
figure 9

A probabilistic finite-state automaton modeling a student taking exams

Another significant feature of probabilistic models à la Markov is that the probability of every specific behavior decreases as the length of the behavior increases: whereas in nondeterministic models every nondeterministic choice is possible and must be considered, assigning probabilities to the different choices entails that the long-term behavior is more and more likely to approach the average behavior asymptotically. Precisely, the steady-state probability gives the likelihood that an automaton will be in a certain state after an arbitrarily long number of steps. To compute the steady-state probability, represent the probability function π as a \(\vert Q\vert \times \vert Q\vert \) probability matrix M, whose element in row i, column j is the probability of transitioning from the ith to the jth state. The steady-state probability has the characterizing property of being independent of the initial and current states: a row vector \(\vec{p} = {p}_{1}\,\ldots \,{p}_{\vert Q\vert }\) of nonnegative elements, whose ith element denotes the probability of being in the ith state, is the steady-state probability if it satisfies \(\vec{p} \cdot M =\vec{ p}\) (it does not change after one iteration) and \({\sum }_{1\leq i\leq \vert Q\vert }{p}_{i} = 1\) (it is a probability distribution on states). In other words, the steady-state probability is a left eigenvector of the probability matrix with unit eigenvalue, whose elements are nonnegative and sum to 1.

Example 6.35.

The probability function in Example 6.33 determines the probability matrix

$$M\ =\ \left [\begin{array}{cc} 0.9&0.1\\ 0.4 &0.6 \end{array} \right ],$$

with left eigenvector

$$\left [{p}_{1}\;{p}_{2}\right ]\ =\ \left [0.8\;\;0.2\right ]$$

that satisfies \([{p}_{1}{p}_{2}] \cdot M = [{p}_{1}{p}_{2}]\). This means that, in the long term, the student passes 80 % of the exams she attempts. \(\blacksquare\)
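The steady-state computation can be checked numerically by power iteration (a sketch without external libraries; convergence here is fast because the second eigenvalue of M is 0.5):

```python
# Steady state of Example 6.35 by power iteration: start from any
# distribution and repeatedly apply p <- p . M.
M = [[0.9, 0.1],
     [0.4, 0.6]]

def left_multiply(p, matrix):
    """Row vector times matrix: (p . M)_j = sum_i p_i * M[i][j]."""
    return [sum(p[i] * matrix[i][j] for i in range(len(p)))
            for j in range(len(matrix[0]))]

p = [0.5, 0.5]                 # arbitrary initial distribution
for _ in range(100):
    p = left_multiply(p, M)    # converges to the steady state [0.8, 0.2]
```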

Exercise 6.36.

A mono-dimensional “random walk” is a process that visits the sequence of integer numbers as follows. The process starts from the position 0; at any step, it flips an unbiased coin, increases its position by 1 if the coin lands on tails, and decreases it by 1 otherwise.

  • Model the random walk with an infinite-state Markov chain.

  • Compute the probability \({P}_{k,t}\) that the process is at integer position k after t steps, for every \(k \in \mathbb{Z}\) and \(t \in \mathbb{N}\).

  • Compute the average position after k steps. Does it depend on k? \(\blacksquare\)
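A Monte Carlo sketch of the walk can be used to sanity-check answers to the exercise (it is not a substitute for the closed-form expressions the exercise asks for):

```python
import random

def random_walk(t, rng):
    """Position after t steps of the unbiased walk started at 0:
    each coin flip moves the process by +1 or -1."""
    pos = 0
    for _ in range(t):
        pos += 1 if rng.random() < 0.5 else -1
    return pos
```

Two facts the simulation exhibits: the position after t steps always has the same parity as t and magnitude at most t, and the empirical mean over many runs stays close to 0, by symmetry.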

Time is DISCRETE, SYNCHRONOUS, and METRIC in finite-state automata regardless of whether they are in their deterministic, nondeterministic, or probabilistic version. Probabilities, in particular, only determine the transition taken at every step, but time progresses uniformly and synchronously with every transition. Time is also INFINITE in probabilistic finite-state automata: the time domain is isomorphic to the natural numbers, and all runs are indefinitely long because the probability that some transition is taken is 1 in every state. This notion of infinite behavior further justifies the focus on long-term steady-state behavior rather than on properties of individual finite runs.

4.1.2 Markov Decision Processes

Probabilistic finite-state automata with input (also, “Markov decision processes”) extend probabilistic automata (and standard Markov chains) with input events.

Definition 6.37 (Probabilistic finite-state automaton with input/Discrete-time Markov decision process). 

A probabilistic finite-state automaton with input is a probabilistic finite-state automaton with probability function π : Q ×I ×Q → [0, 1] over transitions. The probability function is normalized with respect to the next states: for every q ∈ Q and i ∈ I,

$${\sum }_{q^{\prime}\in Q}\pi (q,i,q^{\prime}) = 1.$$

When the automaton is in state q ∈ Q and inputs an event i ∈ I, it can make a transition to any state q′ ∈ Q with probability π(q, i, q′). At every step, the automaton moves to the drawn state q′, where it gets ready for a new transition.

Discrete-time Markov decision processes generalize discrete-time probabilistic finite-state automata with input to any countable sets Q and I of states and input events. \(\blacksquare\)

Input events are useful for modeling external influences on the system, such as control actions or changes in the environment; input does not, in general, deterministically control the system evolution – which remains stochastic – but it does affect the probability of individual transitions. According to the real-world entity it models, the input of Markov decision processes is identified by one of many names: “environment”, “scheduler”, “controller”, “adversary”, or “policy”. These terms present decision processes as OPEN systems, to which an external abstract entity supplies input (see also Sect. 3.7). The composition of decision process and environment is a CLOSED system that characterizes an embedded process operating in specific conditions.

Example 6.38.

Figure 6.10 shows a probabilistic finite-state automaton with input that models a student taking a series of exams; it is an extension of Example 6.33. While preparing for an exam, the student may attend classes on the exam’s topic (event a) or skip them (event s). While attending classes is no guarantee of passing, it significantly affects the probability of success: after the student has passed an exam (state P), there is a 90 % chance that she will also pass the next one if she attends classes (transition to P with event a); if she skips them, the probability shrinks to only 20 % (transition to P with event s). Conversely, after the student has failed an exam (state F), there is a 70 % chance that she will pass the next one if she attends classes (transition to P with event a), but only 10 % if she skips them (transition to P with event s). The figure shows the probabilities associated with each transition and input event. \(\blacksquare\)

As with Markov chains, the properties of interest with probabilistic finite-state models with input include the probabilities of sets of runs or average long-term behavior. Probabilities depend, however, on the supplied input sequences, which corresponds to the notion of conditional probability (mentioned in Sect. 3.4.2), as demonstrated in the following example.

Example 6.39.

Let us derive some probabilistic properties of the model of Example 6.38, generalizing Example 6.34 for the presence of input events. Assume that P is the initial state.

  • The probability that the student passes the first k consecutive exams when she always attends classes is \(0.{9}^{k}\).

  • The probability that she passes the first 2k consecutive exams when she attends classes for every other exam is \({(0.9 \cdot 0.2)}^{k} = 0.1{8}^{k}\).

  • The probability \({\mathit{pass}}_{k}\) that the student passes the first k consecutive exams is, in general, a function of the input sequence \({i}_{1}\,{i}_{2}\,\ldots \,{i}_{k}\):

    $${\mathit{pass}}_{k}({i}_{1},{i}_{2},\ldots ,{i}_{k})\ =\ { \prod }_{1\leq j\leq k}p({i}_{j}),$$

    where

    $$p({i}_{j})\ =\ \left \{\begin{array}{@{}l@{\quad }l@{}} 0.9\quad &{i}_{j} = a, \\ 0.2\quad &{i}_{j} = s.\end{array} \right.$$

    \(\blacksquare\)
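The formula for \({\mathit{pass}}_{k}\) translates directly into code (a sketch; the probabilities of remaining in state P are those of Example 6.38, and the string encoding of event sequences is our own):

```python
# pass_k as a product over the input sequence: staying in P has
# probability 0.9 under event a and 0.2 under event s.
P_STAY = {"a": 0.9, "s": 0.2}

def pass_probability(events):
    """Probability of passing the first len(events) exams, given the
    sequence of input events (each 'a' or 's')."""
    prob = 1.0
    for e in events:
        prob *= P_STAY[e]
    return prob
```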

Since the input modifies the behavior of Markov decision processes, some interesting analysis problems consist of computing the input that ensures behaviors with certain properties. Using the terminology introduced before, these problems are presented as “controller (or scheduler) synthesis”. A synthesis problem for Example 6.38 is finding the input sequence that maximizes the probability of passing all the exams; it is clear that the solution is the input consisting of only events a. The following example illustrates less trivial synthesis problems.

Example 6.40.

Consider the following synthesis problem for the model of Example 6.38: “determine the input that maximizes the probability of alternating between passing and failing exams”.

Fig. 6.10
figure 10

A probabilistic finite-state automaton with input modeling a student taking exams

The obvious candidate – the input sequence \(a\,s\,a\,s\,\ldots \) that strictly alternates a and s – does not make for a satisfactory solution: if, for example, the student fails the first exam even though a was selected, we should input a again as the second event, to maximize the probability that she passes the second exam, hence alternating between passing and failing. The proper solution to the synthesis problem is thus not simply a fixed input sequence, or even a set of inputs. It is, instead, a computational process that responds to the probabilistic choices of the Markov decision process with suitable inputs, according to the strategy: “when the student passes an exam, generate input s; otherwise, generate input a”.

We can formalize this controller process as the deterministic Mealy machine in Fig. 6.11 (Sect. 5.2.3 defines Mealy machines), with input symbols p and f corresponding to “passing” and “failing” an exam, and output symbols a and s to be fed to the Markov decision process. \(\blacksquare\)
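The controller strategy can be sketched as a simple function from observed outcomes to inputs; the choice of a as the very first input, before any outcome is observed, is an assumption of this sketch (the example leaves it unspecified):

```python
# The alternation-maximizing strategy of Example 6.40: on observing
# outcome 'p' (passed) emit s, on 'f' (failed) emit a.
def controller_output(outcome):
    return {"p": "s", "f": "a"}[outcome]

def control_run(outcomes):
    """Sequence of inputs fed to the decision process along a run,
    given the outcomes observed so far."""
    inputs = ["a"]                  # assumed first input, before any outcome
    for o in outcomes:
        inputs.append(controller_output(o))
    return inputs
```

This is exactly the behavior of the Mealy machine of Fig. 6.11: the output depends on both the (single) control state and the observed input symbol.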

Exercise 6.41.

For the probabilistic finite-state automaton of Example 6.38, informally describe controllers that:

  1. 1.

    Minimize the probability of alternating between passing and failing exams.

  2. 2.

    Maximize the probability of passing exactly two exams consecutively after every failed exam, and otherwise alternating between passing and failing.

  3. 3.

    (\(\blacklozenge\)) Maximize the probability of passing exactly k + 1 exams consecutively after the kth sequence of consecutive failed exams, and otherwise alternating between passing and failing. For example, if the first exam is failing, the probability of having exactly two passing exams next should be maximized; if the first exams are fpfffpfpff, the probability of having exactly five passing exams next should be maximized (because there have been four sequences, of lengths one, three, one, and two, each consisting of consecutive failed exams).

    Fig. 6.11
    figure 11

    A Mealy machine controlling the probabilistic finite-state automaton with input of Fig. 6.10

    Formalize the controllers described in items (1–3):

  4. 4.

    Formalize the controllers in (1) and (2) as finite-state Mealy machines.

  5. 5.

    (\(\blacklozenge\)) Argue that the controller in (3) is not expressible as a finite-state Mealy machine, and suggest a more expressive computational model that can formalize the controller. \(\blacksquare\)

4.1.3 Continuous-Time Markov Chains

In contrast with the discrete-time probabilistic automata presented in the two previous subsections, CONTINUOUS-TIME probabilistic finite-state automata (also, continuous-time Markov chains) model time as a continuous set (the real numbers) and assign a probability distribution to the residence (also, “sojourn”) time in every state. This produces ASYNCHRONOUS behavior, where an automaton sits in a state and moves to a different state after an amount of time drawn from a probability distribution that may depend on the current state. In this respect, continuous-time probabilistic finite-state automata extend with probabilities the asynchronous interpretation of finite-state machines discussed in Sect. 5.2.2. For continuous-time probabilistic finite-state automata (and Markov chains), the probability distributions of residence times ensure that the probability of remaining in the current state for the next t time units does not depend on the previous states traversed by the automaton, but only on the current one, in the same way as the probability of making a transition only depends on the current state in discrete-time probabilistic automata (that is, the Markov property). The only probability distribution that satisfies the Markov property is the exponential one: the probability that an automaton waits for t time units decreases exponentially with t. Chapter 7 will present more general examples of finite-state automata with probabilities on time that allow for probability distributions other than the exponential.

Definition 6.42 (Continuous-time probabilistic finite-state automaton/Continuous-time Markov chain). 

A continuous-time probabilistic finite-state automaton extends a (discrete-time) probabilistic finite-state automaton with a rate function \(\rho : Q \rightarrow {\mathbb{R}}_{>0}\). Whenever the automaton enters state q ∈ Q, it waits for a time drawn from an exponential distribution with parameter ρ(q) and probability density function

$$p(t)\ =\ \left \{\begin{array}{@{}l@{\quad }l@{}} \rho (q){e}^{-\rho (q)t}\quad &t \geq 0, \\ 0 \quad &t < 0, \end{array} \right.$$

whose corresponding cumulative distribution function is

$$P(t)\ =\ { \int }_{-\infty }^{t}p(x)\mathrm{d}x\ =\ \left \{\begin{array}{@{}l@{\quad }l@{}} 1 - {e}^{-\rho (q)t}\quad &t \geq 0, \\ 0 \quad &t < 0. \end{array} \right.$$

When it leaves q, the next state is determined as in the underlying discrete-time probabilistic finite-state automaton.

Continuous-time Markov chains generalize continuous-time probabilistic finite-state automata to any countable set Q of states. \(\blacksquare\)
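The operational reading of Definition 6.42 can be sketched in code: draw the residence time from an exponential distribution with the current state's rate, then pick the next state as in the underlying discrete-time automaton. The following is a minimal simulation sketch, not from the text; the dictionary-based encoding of states, rates, and transition probabilities is an assumption made for illustration.

```python
import random

def simulate_ctmc(rho, trans, state, horizon):
    """Simulate a state-rate continuous-time Markov chain (Definition 6.42).

    rho:   dict mapping each state q to its exponential rate rho(q)
    trans: dict mapping each state to a list of (next_state, probability)
    Returns the list of (state, residence_time) pairs until time `horizon`.
    """
    t, trace = 0.0, []
    while t < horizon:
        stay = random.expovariate(rho[state])  # residence time ~ Exp(rho(q))
        trace.append((state, stay))
        t += stay
        # Next state as in the underlying discrete-time automaton.
        r, acc = random.random(), 0.0
        for nxt, p in trans[state]:
            acc += p
            if r < acc:
                state = nxt
                break
        else:
            state = trans[state][-1][0]  # guard against float round-off
    return trace
```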

Example 6.43.

A lamp with a lightbulb can be in one of three states: \(\mathsf{on}\), \(\mathsf{off}\), and \(\mathsf{broken}\). When it is off, it is turned on after 100 s on average; while turning on, the lightbulb breaks in 1 % of the cases. When the lamp is on, it is turned off after 60 s on average; while turning off, the lightbulb breaks in 5 % of the cases. Finally, it takes 500 s on average before a broken lightbulb is replaced with a new one. Figure 6.12 models the behavior of this lamp with a continuous-time probabilistic finite-state automaton, assuming seconds as time unit. Every state q is labeled with its rate ρ(q): since an exponential distribution with rate r has mean 1 ∕ r, the rates are the reciprocals of the average residence times. \(\blacksquare\)
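The relation between rates and average residence times in Example 6.43 can be checked numerically. The snippet below is an illustrative sketch (variable names are ours): it records the rates as reciprocals of the stated averages and verifies by sampling that an exponential distribution with rate r indeed has mean 1 ∕ r.

```python
import random

random.seed(42)

# Rates are the reciprocals of the average residence times (in seconds).
rates = {"off": 1 / 100, "on": 1 / 60, "broken": 1 / 500}

# Empirical check: the sample mean of Exp(rate) draws approaches 1/rate.
n = 100_000
mean_off = sum(random.expovariate(rates["off"]) for _ in range(n)) / n
# mean_off is close to the 100 s average residence time in state `off`
```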

Probabilistic automata underpin several computational models that include a distinctive notion of continuous time. Chapter 8 will present, in particular, stochastic Petri nets, whose semantics is based on continuous-time Markov chains and variants thereof. Chapter 7 will present other variants and generalizations of probabilistic finite-state automata.

Fig. 6.12

A continuous-time probabilistic finite-state automaton modeling a lamp that may break

Some of those applications naturally refer to an alternative definition of continuous-time Markov chains, which we first illustrate in Example 6.44, then present formally in Definition 6.45, and finally claim equivalent to Definition 6.42 in Exercise 6.46.

Example 6.44.

Figure 6.13 shows a continuous-time probabilistic finite-state automaton with the same states and transitions as that in Fig. 6.12, but with a positive rate associated with every transition rather than with every state. The rates are the parameters of the exponential distributions associated with the transitions, which determine the time before each transition is taken.

Fig. 6.13

An equivalent presentation of the continuous-time probabilistic finite-state automaton of Fig. 6.12

Suppose, for example, that the automaton enters state \(\mathsf{off}\); it will stay there until it moves either to state \(\mathsf{on}\) or to state \(\mathsf{broken}\). Transitions from \(\mathsf{off}\) to \(\mathsf{on}\) happen after time that follows an exponential distribution with rate \({p}_{1,2}{r}_{1}\); those from \(\mathsf{off}\) to \(\mathsf{broken}\) follow instead an exponential distribution with rate \({p}_{1,3}{r}_{1}\). Correspondingly, the overall residence time in \(\mathsf{off}\) follows the distribution of the minimum of the times before a transition to \(\mathsf{on}\) or to \(\mathsf{broken}\): whichever transition happens first preempts the other one. The minimum of two stochastic variables with exponential distribution is also exponentially distributed with rate given by the sum of the distributions’ rates; hence, state \(\mathsf{off}\) has rate \({p}_{1,2}{r}_{1} + {p}_{1,3}{r}_{1} = {r}_{1}\), as for the automaton in Fig. 6.12. Finally, a transition outgoing from \(\mathsf{off}\) reaches \(\mathsf{on}\) in \({p}_{1,2}{r}_{1}/({p}_{1,2}{r}_{1} + {p}_{1,3}{r}_{1}) = {p}_{1,2}\) of the cases, and \(\mathsf{broken}\) in the other \({p}_{1,3}{r}_{1}/({p}_{1,2}{r}_{1} + {p}_{1,3}{r}_{1}) = {p}_{1,3} = 1 - {p}_{1,2}\) fraction of the cases. These numbers show that the models in Figs. 6.12 and 6.13 have identical behavior. \(\blacksquare\)
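The race between competing exponential transitions described above can be checked empirically. The sketch below uses illustrative values \({r}_{1} = 1/100\) and \({p}_{1,2} = 0.99\) (so \({p}_{1,3} = 0.01\)), which are our assumptions for the sake of the experiment: the minimum of the two drawn times should have mean \(1/{r}_{1}\), and the transition to \(\mathsf{on}\) should win in \({p}_{1,2}\) of the cases.

```python
import random

random.seed(1)
r1, p12, p13 = 1 / 100, 0.99, 0.01   # illustrative values; p13 = 1 - p12

n = 200_000
wins_on, total_time = 0, 0.0
for _ in range(n):
    t_on = random.expovariate(p12 * r1)       # time before moving to `on`
    t_broken = random.expovariate(p13 * r1)   # time before moving to `broken`
    total_time += min(t_on, t_broken)         # whichever fires first preempts
    wins_on += t_on < t_broken

mean_residence = total_time / n   # approaches 1 / r1 = 100
frac_on = wins_on / n             # approaches p12 = 0.99
```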

Definition 6.45 (Continuous-time probabilistic finite-state automaton with transition rates/Continuous-time Markov chain with transition rates). 

A continuous-time probabilistic finite-state automaton (also, “continuous-time Markov chain”) with transition rates includes a transition function \(\delta : Q \rightarrow \wp \left (Q\right )\) and a rate function \(\rho : Q \times Q \rightarrow {\mathbb{R}}_{>0}\) for transitions between different states. Whenever the automaton enters state q ∈ Q, it draws, for every state \({q}_{k} \in \delta (q)\) directly reachable from q, a time \({t}_{k}\) exponentially distributed with parameter \(\rho (q,{q}_{k})\). It will then make a transition from q to q′ after exactly t′ time units, where \(t^{\prime} = {\min }_{k}\{{t}_{k}\}\) is the shortest drawn time and \(q^{\prime} = {q}_{h}\) is the corresponding state with \(h = {\arg \min }_{k}\{{t}_{k}\}\). \(\blacksquare\)

Exercise 6.46 (\(\blacklozenge\)). 

Following Example 6.44, show that Definitions 6.42 and 6.45 are equivalent. Namely, show how to construct a continuous-time probabilistic finite-state automaton with transition rates with behavior identical to that of any given continuous-time probabilistic finite-state automaton (with state rates), and, conversely, how to construct a continuous-time probabilistic finite-state automaton (with state rates) with behavior identical to that of any given continuous-time probabilistic finite-state automaton with transition rates. \(\blacksquare\)

Continuous-time probabilistic automata augmented with input are the continuous-time counterparts of Markov decision processes, where input determines the transition rates of states or transitions. The following exercise asks for a more precise definition.

Exercise 6.47.

Define continuous-time Markov decision processes by combining Definitions 6.45 and 6.37. \(\blacksquare\)

4.2 Probabilistic Turing Machines and Complexity Classes

A probabilistic Turing machine can randomly choose which transition to take from among those offered by the transition function. Therefore, the transition function of a probabilistic Turing machine has the same signature as that of a nondeterministic Turing machine (Sect. 6.2.8). An alternative, but equivalent, model of a probabilistic Turing machine consists of a deterministic machine augmented with an unbounded tape of random bits; the transition function can take the value of the current bit into account to determine the next configuration.
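The random-tape view can be sketched as follows; this toy model (the names and the particular transition function are our assumptions) shows how a purely deterministic transition function, fed one bit at a time from an unbounded random tape, reproduces probabilistic behavior:

```python
import itertools
import random

def random_tape(seed):
    """An unbounded stream of random bits, standing in for the extra tape."""
    rng = random.Random(seed)
    while True:
        yield rng.randrange(2)

def randomized_step(state, bit):
    """A deterministic transition function that also reads one random bit:
    the pair (current state, bit) fully determines the next state."""
    return state + 1 if bit else state * 2

# Drive the deterministic machine with ten bits from the random tape.
tape = random_tape(7)
state = 1
for bit in itertools.islice(tape, 10):
    state = randomized_step(state, bit)
```

Fixing the tape's content fixes the whole computation; the randomness of the output is entirely inherited from the randomness of the tape.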

It should be apparent that the use of randomness does not increase expressive power: everything computable by a probabilistic Turing machine is computable by a nondeterministic Turing machine that simply includes every transition that has nonzero probability of taking place. A deterministic Turing machine can then simulate the probabilistic Turing machine, thus confirming the Church-Turing thesis for probabilistic models.

The computational complexity analysis is more interesting – and much more challenging. The output of a Turing machine is in fact a random variable: on a given input, the machine can return completely different output values at random, or it can even randomly fail to terminate. This makes it difficult to compare the behavior of probabilistic Turing machines with that of deterministic Turing machines – described in purely functional terms – as well as that of nondeterministic machines – where the analysis simply focuses on the “best” possible outcome from among all those possible. To avoid these problems, the computational complexity analysis of probabilistic Turing machines usually focuses on bounded-error computations.

Definition 6.48 (Bounded-error probabilistic Turing machine). 

A “bounded-error probabilistic Turing machine” is a probabilistic Turing machine M that computes a function F(x) of the input. For all inputs x, M halts; upon termination, it outputs the correct value F(x) with probability greater than or equal to 2 ∕ 3. \(\blacksquare\)

The intuition behind Definition 6.48 is that the probability of a bounded-error Turing machine returning inconsistent results by chance can be made as small as needed with repeated runs of the machine on the same input. More precisely, consider n consecutive runs of a bounded-error probabilistic Turing machine M on an input x. Since the random choices made in every run are independent of those of the previous runs, one can derive, by applying suitable results of probability theory, an upper bound on the probability that a majority of them are incorrect as

$$\exp \left (-2n \cdot \frac{1} {{6}^{2}}\right ),$$

which decreases exponentially with the number of runs n. Therefore, it is sufficient to perform, say, 100 runs of the Turing machine: the majority outcome is then almost certain to be the correct one.
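The amplification argument can be sketched in code. The toy machine `noisy_run` and the helper names below are our assumptions, standing in for a bounded-error probabilistic Turing machine with per-run success probability 2 ∕ 3:

```python
import math
import random

def majority_vote(run, x, n):
    """Return the most frequent output over n independent runs of `run` on x."""
    counts = {}
    for _ in range(n):
        y = run(x)
        counts[y] = counts.get(y, 0) + 1
    return max(counts, key=counts.get)

def error_bound(n):
    """Hoeffding-style bound exp(-2n/6^2) on the probability that the
    majority of n runs errs, for per-run success probability 2/3
    (the gap to 1/2 is 2/3 - 1/2 = 1/6)."""
    return math.exp(-2 * n * (1 / 6) ** 2)

# A toy bounded-error "machine": answers correctly with probability 2/3.
def noisy_run(x):
    return x if random.random() < 2 / 3 else 1 - x
```

With 100 runs the bound is already below 0.4 %, consistent with the claim that a moderate number of repetitions makes the majority outcome almost certain.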

Exercise 6.49 (\(\blacklozenge\)). 

Show that any probability strictly greater than 1 ∕ 2 can replace 2 ∕ 3 in Definition 6.48 without changing the asymptotic behavior of bounded-error probabilistic Turing machines. \(\blacksquare\)

The notion of bounded error makes for a sound definition of computational complexity in probabilistic models based on average running time. Take any bounded-error probabilistic Turing machine, and let \({T}_{1}(x),{T}_{2}(x),\ldots ,{T}_{j}(x)\) be the j independent, identically distributed random variables counting the number of steps in each of j runs on the same input x. Their average

$$\frac{{T}_{1}(x) + {T}_{2}(x) + \cdots + {T}_{j}(x)} {j}$$

may either converge to a constant μ(T(x)) for increasing j or diverge to \(+\infty \). When it converges, μ(T(x)) equals the expected value \(\mathbb{E}[T(x)]\) of the random variables’ distribution. The average k-run time complexity of the machine is then the expected value of the total running time:

$$\begin{array}{rcl} \mu (T(x),k)& =& \mathbb{E}[{T}_{1}(x) + {T}_{2}(x) + \cdots + {T}_{k}(x)] \\ & =& \mathbb{E}[{T}_{1}(x)] + \mathbb{E}[{T}_{2}(x)] + \cdots + \mathbb{E}[{T}_{k}(x)] \\ & =& k \cdot \mu (T(x)), \\ \end{array}$$

which is \(\Theta (\mu (T(x)))\) for every constant k. Thus, the asymptotic average running time is independent of how many finite runs are needed to achieve the desired probability of error and only depends on the probability distribution induced by the probabilistic model. Finally, define T(n) as the maximum μ(T(x)) from among all inputs x of length n.

T(n) is the time complexity function for bounded-error probabilistic Turing machines; a space complexity function can be defined similarly. Such complexity functions are on a par with those used for deterministic and nondeterministic Turing machines; hence we can compare the (time) complexity across computational models. To this end, we introduce the complexity class BPP, which represents all problems that are tractable for probabilistic computational models with bounded error, in the same way as the class P collects all tractable problems for deterministic models.

Definition 6.50 (Probabilistic complexity classes). 

  • A time complexity measure T(n) defines the complexity class BPTIME(T(n)) of all problems that can be solved by some bounded-error probabilistic Turing machine with time complexity in O(T(n)).

  • BPP (“bounded-error probabilistic polynomial”) is the class of problems that can be solved in polynomial time (and unlimited space) by a bounded-error probabilistic Turing machine, that is,

    $$\mathrm{BPP}\ =\ { \bigcup }_{k\in \mathbb{N}}\mathrm{BPTIME}({n}^{k}).$$

    \(\blacksquare\)

Using the complexity classes we know, the question of whether deterministic Turing machines can efficiently simulate (bounded-error) probabilistic Turing machines can be expressed as: “does P equal BPP?” If P = BPP, then randomization does not make intractable problems tractable, and the strong Church-Turing thesis extends to probabilistic Turing machines, which can be simulated by deterministic Turing machines with at most a polynomial slowdown. If instead P ⊂ BPP, then there exist problems that can be solved in polynomial time only with the aid of randomization.

The P/BPP question is currently open, as is the other outstanding question on the relation between the classes P and NP. Unlike for the latter, for the former there is substantial evidence, both theoretical and experimental, suggesting that P equals BPP. In particular, efficient deterministic algorithms have been found for various important problems whose initial solutions required randomization. Even if the consensus is that “derandomizing” polynomial-time algorithms is always possible, it may be arduous to devise deterministic algorithms for certain problems first approached with probabilistic computational models. The most famous example is the problem PRIME of determining whether an integer is prime or composite (without necessarily computing a factorization). Polynomial-time probabilistic algorithms for PRIME have been known since the 1970s, but the first deterministic polynomial-time algorithm was developed only about 30 years later.

On the basis of the likely equivalence between P and BPP, nontechnical presentations usually do not detail the role of randomization and simply assume a probabilistic model whenever it makes for a practically convenient implementation by means of pseudo-random number generators.

5 Bibliographic Remarks

There is a rich choice of textbooks on the general topics of this chapter, such as Hopcroft et al. [30], Sipser [68], Lewis and Papadimitriou [43], Manna [46], Linz [44], Rosenberg [60], Jones [32], and Mandrioli and Ghezzi [45].

The chapter discussed the difference in scope between computational complexity theory, which focuses on complexity classes and their relations, and analysis of algorithms, which is concerned with devising efficient algorithms for specific problems. There are several comprehensive references in both areas: Papadimitriou [54], Arora and Barak [8], Bovet and Crescenzi [12], Goldreich [25], and Kozen [40] take the complexity theory viewpoint; Cormen et al. [16], Aho et al. [5], Kleinberg and Tardos [37], Mehlhorn and Sanders [48], Sedgewick and Wayne [63], Skiena [69], Papadimitriou and Steiglitz [55], and the encyclopedic Knuth [39] focus on algorithm design.

The theory of computation does not stop at Turing machines; books on computability theory – such as Rogers [59], Cutland [18], Soare [72], and Odifreddi [53] – explore the rich mathematics behind universal computation. At the other end of the spectrum, the theory of finite automata and formal languages studies computational models of lesser power than Turing machines, such as finite-state and pushdown automata. For example, deterministic pushdown automata have the same \(\Theta (n)\) asymptotic time complexity as finite-state automata, whereas nondeterministic pushdown automata can produce computation trees of size \(\Theta ({2}^{kn})\) for inputs of length n. Anderson [7], Lawson [41], Simon [66], Sakarovitch [62], Shallit [64], Ito [31], Khoussainov and Nerode [35], and Perrin and Pin [56] all present elements of automata theory, with different levels of mathematical sophistication.

Hartmanis and Stearns’s paper [28] is often credited as the foundational paper of modern computational complexity, although other computer science pioneers introduced similar methods even earlier. Fortnow and Homer [22] give a concise history of the origins of computational complexity.

Alan Turing introduced the Turing machine in his revolutionary 1936 paper “On computable numbers, with an application to the Entscheidungsproblem” [78, 79], which gives a foundation to all modern computer science. Turing’s original machine used a single tape for input, output, and scratch memory; proofs of equivalence between Turing’s model and k-tape Turing machines can be found in basically every textbook mentioned at the beginning of these notes. Van Emde Boas [80] gives a comprehensive treatment of the equivalence of many models of computation.

The Church-Turing thesis is sometimes referred to with only the name of Church, who first formulated it, but Robert Soare makes a convincing historical reconstruction [71, 73] showing that it was Turing who gave solid arguments for the thesis and showcased its importance; the name “Church-Turing thesis” seems therefore more historically accurate. Hofstadter [29] discusses the implications of the Church-Turing thesis in a somewhat informal setting. It was recently suggested by authors such as Deutsch [19] that, since the Church-Turing thesis talks about computations that are physically implementable, it should be provable from the laws of physics; this line of thought highlighted interesting connections between physics and computational models [2, 11].

Hartmanis and Stearns’s seminal paper [28] contains the first proof of the linear speed-up theorem. The asymptotic notation was first introduced in mathematics by Bachmann and popularized by Landau. Knuth established the notation in computer science in [38], where he also traces its historical origins; Graham et al. [26] provide a comprehensive mathematical treatment of the asymptotic notation.

The strong Church-Turing thesis gradually developed as “folk knowledge” in complexity theory; Slot and van Emde Boas give one of the earliest explicit statements of the thesis [70]. As with the standard Church-Turing thesis, the strong Church-Turing thesis should be interpretable as a consequence of the limits imposed by the laws of physics on (efficient) computational processes [1]. Under this view, Feynman first observed that deterministic or probabilistic Turing machines seem incapable of efficiently simulating certain quantum-mechanical phenomena [20]; and Shor presented an algorithm [34, 36, 49, 52, 65] that factors an n-bit integer in time O(n 3) on a quantum computer, whereas no polynomial-time deterministic or even probabilistic algorithm for factoring is known. If large-scale computers based on quantum mechanics are implementable, the strong Church-Turing thesis will have to refer to quantum Turing machines [10] rather than ordinary Turing machines.

Strassen initiated the development of matrix multiplication algorithms asymptotically faster than the trivial \(\Theta ({n}^{3})\) algorithm. Strassen’s algorithm [76] runs in time O(n 2. 807) for n ×n matrices; for many years, the asymptotically fastest algorithm was Coppersmith-Winograd’s [15], which runs in time O(n 2. 376). A recent breakthrough improved it to O(n 2. 373) [75, 81]; related theoretical results suggest that there is still room for improvement [13, 58].

In a letter [21, 67] sent to von Neumann in 1956 and rediscovered in 1988, Gödel discussed the essence of the P/NP problem long before the birth of modern computational complexity. Cook [14] and Levin [42] independently introduced the notion of NP-completeness and produced the first NP-complete problems. Karp [33] showed how to prove that many other problems are NP-complete by polynomial-time reductions. There are several other types of reduction, with stricter bounds on the resources used, which are needed to study smaller classes than P and NP; any textbook on computational complexity discusses them at length. Garey and Johnson’s survey [23] of NP-complete problems is still valuable but somewhat outdated; Crescenzi and Kann maintain a more up-to-date list [17]. The paramount importance of the P/NP problem is testified by many facts, including its inclusion among the mathematical problems of the millennium by the Clay Mathematics Institute. Gasarch surveyed [24] the opinion of computer scientists and showed that a large majority of them believe that P and NP are not the same; Aaronson [3] adds an informal, sanguine, and persuasive argument for P≠NP.

Von Neumann introduced the von Neumann architecture [83]; RAM and RASP machines [80] are modeled after it. There are many variants of abstract machines modeled after different computer architectures; for example, PRAM (Parallel RAM) is a family of machines to simulate parallel computation [80], and Knuth’s MMIX [39] is a detailed model of RISC processors.

Markov introduced Markov chains and applied them to natural language processing [47]; a number of textbooks present Markov chains and their variants in detail [27, 61, 77]. Example 6.43 is adapted from Ajmone Marsan [6].

Motwani and Raghavan [51], Mitzenmacher and Upfal [50], Vazirani [82], and Ausiello et al. [9] discuss randomized (and approximate) algorithms in detail.

Solovay and Strassen gave the first randomized polynomial-time algorithm for primality testing [74]. Miller-Rabin [57] is another well-known randomized algorithm. Agrawal et al. [4] provide the first deterministic algorithm for primality testing.