Ergodicity and large deviations in physical systems with stochastic dynamics

In ergodic physical systems, time-averaged quantities converge (for large times) to their ensemble-averaged values. Large deviation theory describes rare events where these time averages differ significantly from the corresponding ensemble averages. It allows estimation of the probabilities of these events, and their mechanisms. This theory has been applied to a range of physical systems, where it has yielded new insights into entropy production, current fluctuations, metastability, transport processes, and glassy behaviour. We review some of these developments, identifying general principles. We discuss a selection of dynamical phase transitions, and we highlight some connections between large-deviation theory and optimal control theory.


Introduction
In statistical mechanics, many properties of equilibrium systems can be calculated using free-energy methods, and the underlying Boltzmann distribution. However, this approach has two important restrictions: it only applies in equilibrium, and it is restricted to static properties. For example, the Boltzmann distribution has very little to say about dynamical quantities like viscosity and thermal conductivity, nor can it predict the time required for a protein to fold. Predicting such quantities requires some knowledge of the equations of motion of a system: the relevant statistical mechanical theories must include dynamical information. Such theories are useful in many contexts, which include non-equilibrium steady states [1][2][3] as well as dynamical aspects of the equilibrium state (for example in glassy materials [4]). Other physical phenomena also involve transient relaxation to equilibrium, for example nucleation [5] and self-assembly [6].
For complex systems (with many strongly-interacting components), dynamical theories often assume that the behaviour is ergodic. That is, the systems have steady states in which time-averaged measurements converge (for long times) to corresponding ensemble averages. Many important physical systems have this property, which motivates several questions. For example: (i) How long does it take for the time-averaged measurements to converge? (ii) What is the probability that the time-average does not converge to the ensemble average, given some long time τ?
a e-mail: rlj22@cam.ac.uk
This article outlines large deviation theory as it applies to time-averaged quantities, and it describes some of the results and insights that have been obtained for physical systems. By considering a range of applications, the aim is to complement other papers that focus primarily on the general structure of the theory [29] or on specific classes of system [2,16]. The remainder of this section lays out some general principles and describes the theoretical context in more detail. Later sections are devoted to general aspects of the theory and to application areas including phase transitions, glassy systems, entropy production, and exclusion processes. A few examples are discussed in detail. The choice of applications and examples is biased towards the author's own work; they are presented within the broader context of the field.

Fluctuations of time-averaged quantities
This section introduces the main question that will be considered below. Consider a system with stochastic dynamics, whose configuration at time $t$ is $C_t$. Define an observable quantity $b = b(C)$ and a time interval $[0,\tau]$; then the time-average of $b$ is

$$ b_\tau = \frac{1}{\tau}\int_0^\tau b(C_t)\,\mathrm{d}t . \tag{1} $$

As a simple example one may consider an Ising model with $N$ spins, as in [30,31]. Then $C = (\sigma_1, \sigma_2, \dots, \sigma_N)$ where each spin $\sigma_i = \pm 1$. Take $b(C)$ to be the energy of this configuration, so $b_\tau$ is the time-averaged energy. Clearly $b_\tau$ is a random variable: different trajectories of the system have different values for this quantity. However, in ergodic systems the typical situation is that $b_\tau$ obeys a central limit theorem at large times: its distribution is Gaussian with a variance that decays as $\tau^{-1}$. Motivated by questions (i) and (ii) above, this article considers fluctuations that are not covered by the central limit theorem: large deviation theory is used to characterize rare events where $b_\tau$ differs significantly from its mean value, even as $\tau\to\infty$. We will see below that these are exponentially rare, in the sense that their log-probability is negative and proportional to $\tau$. Since these events are very rare, one might wonder what relevance they have for practical physical systems. In response to this question, we make two general points, which will be clarified below. First, large-deviation theory has a rich structure and enables sharp statements about the dynamical behaviour of complex systems. As such, it can be viewed as an idealised theoretical starting point for studies of dynamical behaviour in non-equilibrium systems, which enables general insight. An important example is the analysis of fluctuation theorems [12].
Second, the theory has already proven useful for understanding the behaviour of physical systems, for example through analysis of metastable states in glassy systems [18,32] and biomolecules [25], and through uncertainty bounds on fluctuations of the current [33], which are relevant for rare events and for typical fluctuations.
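The central-limit behaviour described above can be illustrated with a minimal sketch. It simulates a two-state continuous-time Markov chain and estimates the variance of the time-averaged occupation of one state. The model, the rates, the function names and the sample sizes are all assumptions chosen for illustration; none of them is taken from the works cited here. For equal rates of 1, the product of $\tau$ and the variance should approach $1/4$ at large $\tau$.

```python
import random

def time_average(tau, rate_up, rate_down, rng):
    """Time-averaged occupation of state 1 over [0, tau] for a two-state chain."""
    # Draw the initial state from the steady state, so there is no transient.
    state = 1 if rng.random() < rate_up / (rate_up + rate_down) else 0
    t, time_in_1 = 0.0, 0.0
    while t < tau:
        rate = rate_down if state == 1 else rate_up
        # Exponential waiting time, truncated at the end of the trajectory.
        dwell = min(rng.expovariate(rate), tau - t)
        if state == 1:
            time_in_1 += dwell
        t += dwell
        state = 1 - state
    return time_in_1 / tau

def scaled_variance(tau, n_traj=2000, seed=0):
    """Return (sample mean, tau * sample variance) of the time average."""
    rng = random.Random(seed)
    samples = [time_average(tau, 1.0, 1.0, rng) for _ in range(n_traj)]
    mean = sum(samples) / n_traj
    var = sum((x - mean) ** 2 for x in samples) / (n_traj - 1)
    return mean, tau * var

if __name__ == "__main__":
    for tau in (10.0, 40.0):
        print(tau, scaled_variance(tau))
```

Comparing the output for the two values of $\tau$ shows that the variance itself shrinks roughly as $\tau^{-1}$ while the scaled variance stays close to a constant. Sampling of this kind only probes typical fluctuations; the exponentially rare events discussed in the text are not accessible this way.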

Theoretical context
The mathematical theory for large deviations of time-averaged quantities in stochastic processes was formulated by Donsker and Varadhan in the 1970s [34][35][36][37]. A clear presentation of the general (mathematical) theory of large deviations is given in the book of den Hollander [10]. An alternative mathematical approach to these problems is discussed in the book of Dupuis and Ellis [38], including a connection to ideas of optimal control theory, as discussed below. In physical studies of non-equilibrium systems, work by Derrida and Lebowitz [13] and Lebowitz and Spohn [12] laid the foundations for the work described here, building on earlier studies [11,39,40]. As mentioned in the introduction, theories of ergodicity and time-averages in deterministic systems also have a long history [7][8][9][41], and large deviation theory is also relevant in these cases [8,9,42]. This article is restricted to stochastic systems; analysis of deterministic systems requires a different set of methods and assumptions.
A separate strand of mathematical work applied large deviation theory to hydrodynamic limits ([43], Ch. 10), and underlies the macroscopic fluctuation theory of Bertini, de Sole, Gabrielli, Jona-Lasinio and Landim [2,14], which can also be used to analyse fluctuations of time-averaged quantities. Yet another direction is the connection between large deviation theory and the theory of equilibrium statistical mechanics, as discussed by Ellis [44]; see also [42,45].
A useful resource from the physics literature is the review of Touchette [29] which gives a clear presentation of large-deviation theory as it applies to equilibrium statistical mechanics and to time-averaged quantities, see also [16,46]. Two recent papers by Chétrite and Touchette [47,48] provide a comprehensive summary of the large deviation theory of time-averaged quantities, as it applies to physical systems.

Outline
The remainder of this article is structured as follows. Section 2 gives an overview of the large deviation theory for time-averaged quantities. It focuses on finite systems, which simplifies the analysis. Section 3 discusses some of the dynamical phase transitions that can occur in infinite systems, including an example calculation for the 1d Glauber-Ising model and a discussion of dynamical phase coexistence. In Section 4 we discuss the behaviour of glassy systems, including dynamical phase transitions in kinetically constrained models. Section 5 discusses the role of time-reversal symmetries and large deviations of the entropy production, including an example from active matter. We give a short discussion of exclusion processes and hydrodynamic behaviour in Section 6 before ending in Section 7 with an outlook and a discussion of some possible future directions.

General theory
This section outlines the general theory of large deviations of time-averaged quantities. This presentation is not at all complete; the aim is to highlight useful facts, in order to provide physical insight and intuition. Nevertheless, some mathematical precision is required, in order to understand the scope and applicability of the theory; some technical details are provided in footnotes. A more comprehensive presentation of similar material is given by Chétrite and Touchette [47,48].

Definitions
The central quantities that appear in large deviation theory are probability distributions, rate functions, and cumulant generating functions. These are introduced in a general context; some of the systems to which the theory can be applied are discussed in Section 2.2. We consider models that converge at long times to unique steady states, and angled brackets $\langle\cdot\rangle$ indicate steady-state averages.
Recalling (1), the probability density for $b_\tau$ is denoted by $p(b|\tau)$. The cumulant generating function (CGF) for $b_\tau$ is

$$ G(s,\tau) = \log\left\langle e^{-s\tau b_\tau}\right\rangle . \tag{2} $$

One sees that $G(0,\tau) = 0$ and $(\partial G/\partial s)_{s=0} = -\tau\langle b_\tau\rangle$.$^{1}$ To analyse large deviations, we consider the limit of large $\tau$, defining

$$ I(b) = -\lim_{\tau\to\infty}\frac{1}{\tau}\log p(b|\tau) . \tag{3} $$

As anticipated in Section 1.1, the interesting case is where this limit is finite (and non-zero), so the relevant fluctuations occur with probabilities that decay exponentially with $\tau$. In this case we say that $b_\tau$ obeys a large deviation principle (LDP) and $I$ is called the rate function.$^{2}$ The rate function is non-negative, $I(b)\geq 0$ for all $b$. In cases where an LDP holds with rate function $I$, we write

$$ p(b|\tau) \asymp e^{-\tau I(b)} . \tag{4} $$

The meaning of the asymptotic equality symbol $\asymp$ is that (4) is equivalent to (3), see [50]. It is a general property of LDPs that the argument of the exponential in (4) is the product of the rate function and a large parameter that is called the speed of the LDP. In (3,4) the speed is $\tau$, which is an assumption of the theory presented so far. There are physical systems where time-averaged quantities obey LDPs with other speeds (for example [51-54]) but we focus here on LDPs with speed $\tau$, which is the most common situation.
In simple cases (see Sect. 2.2 for examples), the rate function $I$ is analytic and strictly convex, with a unique minimum at $b = \langle b\rangle$, and $I(\langle b\rangle) = 0$. In this case [10,29], $b_\tau$ obeys a central limit theorem (as in [55]), with a variance $\sigma_b^2/\tau$ that is related to the curvature of the rate function as $\sigma_b^2 = 1/I''(\langle b\rangle)$. The next step is to define the scaled cumulant generating function (SCGF),

$$ \psi(s) = \lim_{\tau\to\infty}\frac{1}{\tau}G(s,\tau) . \tag{5} $$

One sees from (2) that $\psi'(0) = -\langle b\rangle$ and $\psi''(0) = \sigma_b^2$.

Fig. 1. Sketch of a rate function and an SCGF for a positive quantity $b$. In this example, both functions are convex and related by Legendre transformation; the rate function has a single minimum at the ensemble average $\langle b\rangle$; the SCGF has $\psi(0) = 0$ and $\psi'(0) = -\langle b\rangle$.

The rate function and the SCGF are related by a Legendre transformation:

$$ \psi(s) = \sup_b\left[-sb - I(b)\right] . \tag{6} $$

This is (a particular case of) Varadhan's lemma [10]. It can be motivated by writing (2) as

$$ e^{G(s,\tau)} = \int p(b|\tau)\,e^{-s\tau b}\,\mathrm{d}b \tag{7} $$

and substituting (4), then doing the integral by the saddle-point method. If the function $\psi$ is analytic then one has also

$$ I(b) = \sup_s\left[-sb - \psi(s)\right] . \tag{8} $$

An important result in large deviation theory is the Gärtner-Ellis theorem [10,29]: it allows large deviation results like (4) to be proved, as long as the SCGF $\psi$ obeys certain conditions. In such cases the rate function can then be derived from (8).
A schematic illustration of two functions $I$ and $\psi$ related by Legendre transform is shown in Figure 1. Note that the minus sign in (2) means that the behaviour of the SCGF for $s > 0$ is relevant for the rate function for $b < \langle b\rangle$, and vice versa.
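The Legendre transform from the SCGF to the rate function can be checked numerically in a case where both functions are known in closed form. The sketch below assumes an example that is not taken from the text: $b_\tau$ is the number of events per unit time in a Poisson process with rate $\lambda$, for which $\psi(s) = \lambda(e^{-s}-1)$ and $I(b) = b\log(b/\lambda) - b + \lambda$. The function names and the grid parameters are hypothetical.

```python
import math

def scgf_poisson(s, rate):
    """SCGF psi(s) for the event count per unit time of a Poisson process."""
    return rate * (math.exp(-s) - 1.0)

def rate_function_numeric(b, rate, s_min=-5.0, s_max=5.0, n_grid=20001):
    """Evaluate I(b) = sup_s [-s b - psi(s)] by brute force on a grid of s."""
    best = -math.inf
    for k in range(n_grid):
        s = s_min + (s_max - s_min) * k / (n_grid - 1)
        best = max(best, -s * b - scgf_poisson(s, rate))
    return best

def rate_function_exact(b, rate):
    """Closed-form Poisson rate function, valid for b > 0."""
    return b * math.log(b / rate) - b + rate

if __name__ == "__main__":
    for b in (0.5, 1.0, 2.0):
        print(b, rate_function_numeric(b, 1.0), rate_function_exact(b, 1.0))
```

The supremum is attained at $s^* = \log(\lambda/b)$, so values $b < \lambda$ correspond to $s > 0$, in line with the sign convention noted above. The grid search is only reliable when $s^*$ lies well inside the chosen interval.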

Applicability of the theory
The theory of Section 2.1 can be applied to a wide range of models, but some assumptions are required in order to ensure that the limit in (3) is finite and non-zero. Some results including (8) also rely on analytic properties of ψ. In this article, much of our analysis is based on two main classes of system, which are Markov chains and diffusion processes. We make several assumptions, which ensure that the models are ergodic, the limit (3) is well-behaved, and the functions I and ψ are analytic and strictly convex, as discussed in [47]. These cases are useful to illustrate the theory. However, the tools of large-deviation theory are not at all restricted to these cases; this will become clear in later sections.

Finite-state Markov chains
We consider finite Markov chains, so the configurations $C$ come from a finite set. They may evolve in either continuous or discrete time. In continuous time, a model is defined by specifying the transition rates between the configurations, which are denoted by $W(C\to C')$. Prominent examples in this case include exclusion processes and Ising-like models on finite lattices. In this case $b_\tau$ may be defined as a time integral as in (1), or one may take the more general form [49]

$$ b_\tau = \frac{1}{\tau}\sum_{\text{jumps } C\to C'}\alpha(C,C') + \frac{1}{\tau}\int_0^\tau h(C_t)\,\mathrm{d}t \tag{9} $$

where the functions $\alpha, h$ correspond to observable quantities similar to $b$ in (1), and the sum runs over the transitions that take place in the trajectory. This type of time-averaged quantity is particularly useful when considering time-averaged currents: for example, if the model involves particles hopping on a 1d lattice with periodic boundaries, one may take $\alpha = 1$ for jumps to the right and $\alpha = -1$ for jumps to the left [16], with $h = 0$. If the continuous-time Markov chain is finite and irreducible and $\alpha, h$ are finite then the limits in (3,5) certainly exist, and the functions $I$ and $\psi$ are analytic and strictly convex [47]. In discrete time the situation is the same, as long as the Markov chain is also aperiodic.
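The time-averaged current described above, with $\alpha = \pm 1$ and $h = 0$, can be computed from a recorded trajectory as in the following minimal sketch. The representation of the trajectory as an ordered list of visited sites, and the function name, are assumptions made for illustration; a single particle on a ring of at least three sites is assumed so that left and right jumps can be distinguished.

```python
def time_averaged_current(sites, tau, n_sites):
    """Time-averaged current of one particle hopping on a periodic 1d lattice.

    sites   : lattice positions visited, in order, during the interval [0, tau]
    tau     : length of the time interval
    n_sites : number of lattice sites (must be at least 3)
    """
    if n_sites < 3:
        raise ValueError("need at least 3 sites to tell left from right")
    total = 0
    for old, new in zip(sites, sites[1:]):
        step = (new - old) % n_sites
        if step == 1:
            total += 1            # jump to the right: alpha = +1
        elif step == n_sites - 1:
            total -= 1            # jump to the left: alpha = -1
        else:
            raise ValueError("only nearest-neighbour jumps are allowed")
    return total / tau

if __name__ == "__main__":
    # Five jumps to the right on a ring of 4 sites, over a time interval of 5.
    print(time_averaged_current([0, 1, 2, 3, 0, 1], 5.0, 4))
```

Only the sequence of jumps matters for this observable, not the jump times, because $h = 0$. A nonzero $h$ would require the waiting times in each configuration as well.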

Diffusion processes
We also consider models defined by stochastic differential equations (or Langevin equations). In this case the configurations $C$ are vectors in $d$-dimensional space and they evolve by

$$ \mathrm{d}C_t = v(C_t)\,\mathrm{d}t + \sigma(C_t)\circ\mathrm{d}W_t \tag{10} $$

where the circle indicates a Stratonovich product. Here, $v$ is a vector-valued drift, $\sigma$ is a matrix-valued noise strength, and $W_t$ is a $d$-dimensional standard Brownian motion. For models in this class, some technical restrictions are needed on the functions $v, \sigma$, in order to establish existence of the limits in (3,5) and convexity of the rate function. For simplicity, we restrict to systems defined on finite domains, with periodic boundary conditions. In this case it is sufficient that $v, \sigma$ should be finite and the matrix $\sigma\sigma^\dagger$ should not have any zero eigenvalues. In this case $b_\tau$ may again be defined as in (1), or one may consider [47]

$$ b_\tau = \frac{1}{\tau}\int_0^\tau a(C_t)\circ\mathrm{d}C_t + \frac{1}{\tau}\int_0^\tau h(C_t)\,\mathrm{d}t \tag{11} $$

where now $a, h$ are vector-valued and scalar functions respectively. A simple case takes $a$ to be constant and $h = 0$, in which case $b_\tau$ is a time-averaged current in the direction $a$. (In systems with closed boundaries such time-averaged currents must vanish as $\tau\to\infty$, but periodic systems can support trajectories with sustained non-zero current.) The functions $b, a, h$ are all assumed to be finite.

Analogy between τ → ∞ and thermodynamic limit
This article focuses on large deviations of time-averaged quantities, but there are other situations where large deviation theory is relevant in physics. The most prominent example is the theory of the thermodynamic limit [42,44]. We briefly outline the analysis of this limit within large deviation theory, which motivates an analogy between large-time limits and thermodynamic limits. A more detailed discussion of this analogy is given in the review of Touchette [29], see also [9,46,49]. The analogy is useful for two reasons. First, it provides valuable intuition about dynamical large deviations, since thermodynamic theories may be more familiar than dynamical ones. Second, it provides a route whereby established methods from thermodynamics can be generalised, in order to address dynamical problems. Within the analogy, the CGF in (2) corresponds to a difference in free-energy between two states. Specifically, consider a thermodynamic system of volume $V$, where the energy of configuration $C$ is $E(C)$. Define $\beta = (k_{\rm B}T)^{-1}$ where $k_{\rm B}$ is Boltzmann's constant, so the Boltzmann distribution of this system is $p_\beta(C) = e^{-\beta E(C)}/Z_\beta$, where $Z_\beta$ is the partition function. We denote averages with respect to $p_\beta$ by $\langle\cdot\rangle_{\rm Boltz}$. Now consider a perturbation to this system where the (extensive) energy is modified by $\Delta E(C) = -hVm_V(C)$. For example, $m_V$ might be the (intensive) magnetisation of an Ising model, and $h$ its conjugate (magnetic) field. The free energy difference between the original system and this new state is $\Delta F$. It satisfies

$$ -\beta\Delta F(\beta,h,V) = \log\left\langle e^{\beta hVm_V(C)}\right\rangle_{\rm Boltz} \tag{12} $$

which is analogous to (2) with $(-\beta h, V, m_V)\to(s,\tau,b_\tau)$ and $(-\beta\Delta F)\to G$. For the purposes of this analogy we consider $\beta$ to be a fixed number; it is the field $h$ that is the analogue of the parameter $s$ from Sect. 2.1. In that section we considered the limit $\tau\to\infty$; here we consider $V\to\infty$.
Application of large-deviation theory to the fluctuations of $m_V$ requires that this quantity is intensive, which means that it can be expressed as an average over the (large) system, analogous to the time-average in (1). Then standard thermodynamic arguments for large systems imply that $\Delta F$ is extensive: $\Delta f(\beta,h) = \lim_{V\to\infty}\Delta F(\beta,h,V)/V$ is a difference in free-energy density. Comparing with (5), the dynamical SCGF $\psi(s)$ is analogous to $-\beta\Delta f(\beta,h)$. Continuing the analogy shows that $m_V$ for the unperturbed system has a probability distribution

$$ p(m|V,\beta) \asymp e^{-VI(m,\beta)} \tag{13} $$

with $I(m,\beta) = \sup_h\left[\beta hm + \beta\Delta f(\beta,h)\right]$, similar to (4,8).
Just like the dynamical case, some care is required with this analysis in cases where ∆f (β, h) is not analytic. These cases correspond to thermodynamic phase transitions, for which there is a well-developed theory: see for example [44,45]. Section 3 discusses some ways that the thermodynamic theory of phase transitions can be generalised to the dynamical context.
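A simple worked case may help to fix the signs in this analogy. The following sketch assumes a system that is not discussed in the text: $V$ non-interacting spins $\sigma_i = \pm 1$ with $E(C) = 0$ and $m_V = V^{-1}\sum_i\sigma_i$, so that every spin is equally likely to point up or down in the unperturbed state.

```latex
\begin{align}
  -\beta\Delta F(\beta,h,V)
    &= \log\Big\langle \prod_{i=1}^{V} e^{\beta h\sigma_i} \Big\rangle_{\rm Boltz}
     = V\log\cosh(\beta h), \\
  -\beta\Delta f(\beta,h) &= \log\cosh(\beta h), \\
  I(m,\beta) &= \sup_h\big[\beta h m - \log\cosh(\beta h)\big]
     = \frac{1+m}{2}\log(1+m) + \frac{1-m}{2}\log(1-m).
\end{align}
```

The supremum is attained where $m = \tanh(\beta h)$. The resulting rate function vanishes at $m = 0$ and has unit curvature there, consistent with Gaussian fluctuations of $m_V$ whose variance is $1/V$. Here $-\beta\Delta f$ is analytic in $h$, so no phase transition occurs.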

Biased ensembles of trajectories (the s-ensemble)
In the thermodynamic setting, it is natural to consider a family of Boltzmann distributions, parameterised by $h$. We now introduce corresponding distributions for trajectories, which we refer to as s-ensembles [49] or biased ensembles [47]. Let $\mathcal{C}$ indicate a trajectory of the system of interest, where the time $t$ runs from $0$ to $\tau$. This trajectory has a probability density $P_\tau(\mathcal{C})$, which has the property that $\langle F\rangle = \int F(\mathcal{C})P_\tau(\mathcal{C})\,\mathrm{d}\mathcal{C}$.$^{3}$ Note that the probability of the initial state $C_0$ is included in $P_\tau(\mathcal{C})$. The probability density for trajectory $\mathcal{C}$ in the biased ensemble is

$$ P^s_\tau(\mathcal{C}) = P_\tau(\mathcal{C})\,e^{-s\tau b_\tau(\mathcal{C}) - G(s,\tau)} \tag{14} $$

which is normalised, by (2). The average of any trajectory-dependent observable $F$ within this ensemble is

$$ \langle F\rangle_s = \left\langle F\,e^{-s\tau b_\tau}\right\rangle e^{-G(s,\tau)} . \tag{15} $$

Note that these averages depend implicitly on the trajectory length $\tau$. In the analogy with thermodynamics, (14) corresponds to a Boltzmann distribution, in the canonical ensemble. As discussed in [49], standard thermodynamic arguments for equivalence of ensembles then indicate that typical trajectories of (14) should be similar to typical trajectories from an associated microcanonical ensemble, where the value of $b_\tau$ is constrained to a specific value. A precise characterisation of this ensemble-equivalence is given in references [47,56].
An important observation is that the initial and final conditions of the trajectory are analogous to boundaries of thermodynamic systems, where the behaviour may differ from the bulk. Thermodynamic equivalence of ensembles applies to observable quantities that are evaluated in finite regions, within the bulk of a large system. In biased ensembles, these correspond to observables that are well-separated (in time) from the initial and final conditions at $t = 0, \tau$.
Bearing this in mind, it is useful to consider a one-time dynamical observable $a(C_t)$, such as the instantaneous energy of the system $E(C_t)$. This quantity is associated with two different probability distributions, depending on the time $t$ [30,49,57]. The bulk is characterised by a distribution which we define by evaluating the observable at a randomly-chosen time:

$$ P_{\rm ave}(a|s) = \lim_{\tau\to\infty}\left\langle\frac{1}{\tau}\int_0^\tau\delta[a - a(C_t)]\,\mathrm{d}t\right\rangle_s . \tag{16} $$

Alternatively one may evaluate the same observable at the final time $\tau$ to obtain

$$ P_{\rm end}(a|s) = \lim_{\tau\to\infty}\left\langle\delta[a - a(C_\tau)]\right\rangle_s . \tag{17} $$

The presence of boundaries means that $P_{\rm ave}\neq P_{\rm end}$ in general. In the cases that we consider, the bulk of the s-ensemble is time-translation invariant (similar to homogeneity of thermodynamic systems), which means that $P_{\rm ave}$ can also be evaluated as $P_{\rm ave}(a|s) = \lim_{\tau\to\infty}\langle\delta[a - a(C_{u\tau})]\rangle_s$ for any $u$ with $0 < u < 1$.$^{4}$

$^{3}$ It is not trivial to define the integration measure $\mathrm{d}\mathcal{C}$, but see [49] for an explicit construction for finite Markov chains. A more rigorous mathematical approach would sidestep this problem by working directly with probability measures for trajectories. The analysis of this work can be reformulated in that way: one should replace integration measures $P_\tau(\mathcal{C})\,\mathrm{d}\mathcal{C}$ by $\mathrm{d}P_\tau(\mathcal{C})$ and ratios of probability densities $P(\mathcal{C})/Q(\mathcal{C})$ by Radon-Nikodym derivatives $\mathrm{d}P/\mathrm{d}Q$. All conclusions remain unchanged.
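Averages in the biased ensemble can be estimated, at least in principle, by reweighting trajectories sampled from the unbiased dynamics with the factor $e^{-s\tau b_\tau}$. The sketch below shows this for a list of sampled values; the function name and the input format are assumptions made for illustration. Direct reweighting of this kind is only practical when $|s|\tau$ is small, because a few samples dominate the sum otherwise.

```python
import math

def biased_average(f_values, b_values, s, tau):
    """Estimate <F>_s = <F e^{-s tau b}> / <e^{-s tau b}> from unbiased samples.

    f_values : value of the observable F on each sampled trajectory
    b_values : value of the time average b_tau on each sampled trajectory
    """
    log_w = [-s * tau * b for b in b_values]
    shift = max(log_w)                       # subtract the maximum for stability
    weights = [math.exp(lw - shift) for lw in log_w]
    return sum(f * w for f, w in zip(f_values, weights)) / sum(weights)

if __name__ == "__main__":
    b_samples = [0.2, 0.5, 0.8]              # hypothetical sampled values of b_tau
    for s in (-1.0, 0.0, 1.0):
        print(s, biased_average(b_samples, b_samples, s, tau=10.0))
```

Positive $s$ favours trajectories with small $b_\tau$ and negative $s$ favours large $b_\tau$, which mirrors the role of the field $h$ in the thermodynamic analogy. The ratio of the two sums plays the role of the normalisation $e^{-G(s,\tau)}$.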

Formulation as eigenproblem (operator approach)
We describe two types of theoretical approach by which results for large deviations can be obtained. This section describes the first method, which is to characterise ψ(s) as the largest eigenvalue of an operator (or matrix), which is called a tilted generator or a biased master operator.
In the physical context, this was the approach applied (for stochastic models) in [12,13], see also [11,39,40]. To explain it, define

$$ \rho(C|s,\tau) = \left\langle e^{-s\tau b_\tau}\,\delta(C - C_\tau)\right\rangle \tag{18} $$

where the delta function restricts the average to trajectories that end in state $C$. Comparing with (2), one sees that $G(s,\tau) = \log\int\rho(C|s,\tau)\,\mathrm{d}C$. The time derivative of $\rho$ behaves as

$$ \frac{\partial}{\partial\tau}\rho(C|s,\tau) = W_s\,\rho(C|s,\tau) \tag{19} $$

where $W_s$ is an $s$-dependent linear operator.$^{5}$ For example, in finite-state Markov chains (with $n$ states),

$$ W_s\,\rho(C|s,\tau) = \sum_{C'}M_s(C,C')\,\rho(C'|s,\tau) \tag{20} $$

where $M_s$ is a matrix of size $n\times n$ that depends on the transition rates of the model and on the observable $b_\tau$ [16,47,49]. For diffusion processes $W_s$ is an operator that involves first and second derivatives with respect to $C$; an example is given in (31), below. The large-time behaviour of the solution of (19) can be deduced by considering the largest eigenvalue of $W_s$. [In the example of (20), this is simply the largest eigenvalue of $M_s$.] Anticipating the answer, we assume that this largest eigenvalue is unique and we denote it by $\psi(s)$. The associated eigenvector (or eigenfunction) is $P_{\rm end}(C|s)$, which we define to be normalised as a probability distribution, $\int P_{\rm end}(C|s)\,\mathrm{d}C = 1$. So the eigenproblem is

$$ W_s P_{\rm end} = \psi(s)\,P_{\rm end} \tag{21} $$

and the solution of (19) is

$$ \rho(C|s,\tau) = A_s\,P_{\rm end}(C|s)\,e^{\tau\psi(s)}\left[1 + O(e^{-\tau\Delta})\right] \tag{22} $$

for some constant $A_s$ (independent of $\tau$). In the correction term, $\Delta$ is the gap between the largest and second-largest eigenvalues of $W_s$.$^{6}$ Integrating over $C$ one sees that $G(s,\tau)\approx\tau\psi(s) + \log A_s$, consistent with (5). By (15), we also identify $\rho(C|s,\tau)\,e^{-G(s,\tau)}$ with $\langle\delta(C - C_\tau)\rangle_s$. Taking $\tau\to\infty$ one sees from (22,5) that it is consistent to identify the eigenvector of $W_s$ with $P_{\rm end}$ as defined in (17). Note that the operator $W_s$ is not generally Hermitian (self-adjoint). The eigenvector that we identified here as $P_{\rm end}$ is the right eigenvector. The role of the left eigenvector will be discussed in the next section.
To summarise, large deviations of b τ can be characterised by analysing the properties of the tilted operator W s , as in [12,13]. This approach is valuable as a tool for explicit computations (especially in finite-state Markov chains where the matrix M is finite). In addition, it establishes a connection between large-deviation problems and eigenproblems that are familiar from quantum mechanics. Like the analogy with thermodynamics discussed above, this connection with quantum mechanics is useful in practice because it means that methods from that field can be generalised in order to analyse large deviations [30,39,58,59].
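For a concrete instance of the operator approach, the sketch below treats a two-state Markov chain where $b_\tau$ is the fraction of time spent in state 1. The model and the names are assumptions made for illustration. With rate $a$ for $0\to 1$ and rate $c$ for $1\to 0$, the tilted matrix is taken to be the master-equation matrix minus $s$ times the diagonal matrix of $b$, which is the standard construction for observables of the form (1).

```python
import math

def scgf_two_state(s, a, c):
    """Largest eigenvalue of the tilted matrix M_s = [[-a, c], [a, -c - s]].

    Columns label the departure state, so column sums vanish at s = 0.
    """
    trace = -(a + c + s)
    det = a * s                                  # (-a)(-c - s) - a c
    return 0.5 * (trace + math.sqrt(trace * trace - 4.0 * det))

def p_end(s, a, c):
    """Right eigenvector of M_s for the largest eigenvalue, normalised to sum 1."""
    psi = scgf_two_state(s, a, c)
    ratio = (psi + a) / c                        # P_end(1) / P_end(0), from row 0
    return 1.0 / (1.0 + ratio), ratio / (1.0 + ratio)

if __name__ == "__main__":
    a, c = 1.0, 2.0
    for s in (-1.0, 0.0, 1.0):
        print(s, scgf_two_state(s, a, c), p_end(s, a, c))
```

At $s = 0$ the eigenvalue vanishes and the eigenvector reduces to the steady state, with occupation $a/(a+c)$ of state 1. The slope of the eigenvalue at $s = 0$ is minus this occupation, in agreement with $\psi'(0) = -\langle b\rangle$.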

Control representation and auxiliary process
This section describes a second method for analysis of large deviations, based on optimal control theory [60]. One advantage of this method is that it is built on a variational formula, which can be very useful for deriving approximate results in situations where diagonalisation of $W_s$ is not possible. The method has a transparent physical interpretation, which is that (rare) large deviation events can be characterised by deriving a new physical model whose typical trajectories closely resemble the rare events of interest. This new model is called here the optimally controlled process, following earlier work by Fleming [61] and (more generally) the book of Dupuis and Ellis [38]. In previous work it has been called a driven process [47,48] or an auxiliary process [30,62], see also [39][63][64][65].
Consider first a general controlled process (not necessarily optimal). Let $\langle F\rangle_{\rm con}$ denote the average of a path-dependent quantity $F$ in this process. The probability density for trajectories in the controlled process is $P^{\rm con}_\tau(\mathcal{C})$. Then a useful general formula is

$$ G(s,\tau) \geq -s\tau\langle b_\tau\rangle_{\rm con} - D(P^{\rm con}_\tau\|P_\tau) \tag{23} $$

where

$$ D(P^{\rm con}_\tau\|P_\tau) = \left\langle\log\frac{P^{\rm con}_\tau(\mathcal{C})}{P_\tau(\mathcal{C})}\right\rangle_{\rm con} \tag{24} $$

is the Kullback-Leibler (KL) divergence between $P^{\rm con}$ and $P$.$^{7}$ The KL divergence is non-negative and is equal to zero only if $P^{\rm con}_\tau = P_\tau$. In the thermodynamic setting of Section 2.3, equation (23) is the Gibbs-Bogoliubov inequality, see [66], in particular their equation (25). It is possible to find a controlled process where (23) becomes an equality. To see this, use (15,24) to rewrite the right hand side of (23):

$$ -s\tau\langle b_\tau\rangle_{\rm con} - D(P^{\rm con}_\tau\|P_\tau) = G(s,\tau) - D(P^{\rm con}_\tau\|P^s_\tau) . \tag{25} $$

Hence equality is possible in (23) only if $P^{\rm con}_\tau = P^s_\tau$: the controlled process must reproduce the probability distribution of the s-ensemble.$^{8}$ The bound (23) can be analysed using tools from stochastic optimal control theory [60], see also [48]. The general aim of this theory is to find (controlled) Markov processes that maximise (or minimise) quantities like the right hand side of (23), which are interpreted as cost functions. For example, the process $P^{\rm con}$ might consist of requests which arrive randomly in a queue, and a stochastic rule for dealing with these requests. In this case a suitable cost would be some combination of the mean waiting time in the queue and the resource required to implement the policy. One seeks the policy that minimises the cost. Such problems have been studied in detail; they are obviously applicable in practical settings and they are also mathematically tractable [60].

$^{6}$ For the cases described in Section 2.2, the gap $\Delta$ is strictly positive. Models (and limits) where $\Delta$ vanishes are often associated with anomalous fluctuations, including dynamical phase transitions, see Section 3.
Returning to the large-deviation context, observe that computation of the large-deviation rate function does not require a full characterisation of $G(s,\tau)$ but only of $\psi(s)$, which is related to $G(s,\tau)$ by (5). Hence

$$ \psi(s) \geq \lim_{\tau\to\infty}\left[-s\langle b_\tau\rangle_{\rm con} - \frac{1}{\tau}D(P^{\rm con}_\tau\|P_\tau)\right] . \tag{26} $$

A key observation is that for the standard cases of Section 2.2, equality can be achieved in this formula by an (optimally)-controlled process that is Markovian and stationary [30,47,67]. This is a very useful simplification. From a comparison with (6), one may expect that

$$ I(b) \leq \lim_{\tau\to\infty}\frac{1}{\tau}D(P^{\rm con}_\tau\|P_\tau) $$

where $P^{\rm con}$ is a controlled process with $\langle b_\tau\rangle_{\rm con} = b$. In this case,

$$ I(b) = \min_{P^{\rm con}}\,\lim_{\tau\to\infty}\frac{1}{\tau}D(P^{\rm con}_\tau\|P_\tau) \tag{27} $$

where the minimisation is over stationary Markovian controlled processes for which $\langle b_\tau\rangle_{\rm con} = b$.$^{9}$ The final result (27) has an intuitive interpretation: it states that the least unlikely mechanism for achieving a rare event with $b_\tau = b$ can be reproduced by a controlled process that minimises the KL divergence. A central idea of large-deviation theory [10] is that this least unlikely mechanism is sufficient to characterise the rare event. The variational principle means that the controlled process differs as little as possible from the original process; the size of the difference is quantified via the KL divergence.

$^{7}$ In a more rigorous approach, the ratio of probability densities in this definition would be replaced by a Radon-Nikodym derivative.

$^{8}$ It is not trivial to construct a stochastic process whose probability distribution of trajectories achieves $P^{\rm con}_\tau = P^s_\tau$; this is related to the theory of dynamic programming [60]. The construction of such a process is possible for all examples considered here, although the controlled process may be complicated. For example, its transition rates may depend on time, see for example [47].
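The variational bound on the SCGF can be tested numerically in a small example. The sketch below assumes the same kind of two-state chain as a toy model (rates $a$ for $0\to 1$ and $c$ for $1\to 0$, with $b_\tau$ the fraction of time in state 1) and takes controlled processes to be two-state chains with modified rates. The expression used for the KL divergence per unit time between two stationary jump processes is the standard one; the names and the grid are hypothetical.

```python
import math

def scgf_two_state(s, a, c):
    """Largest eigenvalue of the tilted matrix [[-a, c], [a, -c - s]]."""
    trace = -(a + c + s)
    return 0.5 * (trace + math.sqrt(trace * trace - 4.0 * a * s))

def control_bound(s, a, c, a_con, c_con):
    """Lower bound -s <b>_con - (KL divergence per unit time) on psi(s)."""
    mu1 = a_con / (a_con + c_con)          # controlled steady-state occupation of 1
    mu0 = 1.0 - mu1
    kl_rate = (mu0 * (a_con * math.log(a_con / a) - a_con + a)
               + mu1 * (c_con * math.log(c_con / c) - c_con + c))
    return -s * mu1 - kl_rate

def best_bound_on_grid(s, a, c, n=60, step=0.05):
    """Maximise the bound over a grid of controlled rates (a_con, c_con)."""
    best = -math.inf
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            best = max(best, control_bound(s, a, c, i * step, j * step))
    return best

if __name__ == "__main__":
    for s in (-0.5, 1.0):
        print(s, scgf_two_state(s, 1.0, 1.0), best_bound_on_grid(s, 1.0, 1.0))
```

Every controlled process gives a value below the exact SCGF, and the best value on the grid comes close to it, which illustrates how a restricted family of controls can give useful approximations when the eigenproblem cannot be solved.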

Equivalence of different large deviation problems
An interesting aspect of the theory presented here is that the same optimally-controlled process may appear as the solution to several different large deviation problems. In the operator formalism, this happens because the same operator $W_s$ may appear in several different contexts. In fact, this is a very common situation. To see the reason, we define

$$ g_\tau(\mathcal{C}) = \frac{1}{\tau}\log\frac{P^{\rm con}_\tau(\mathcal{C})}{P_\tau(\mathcal{C})} . \tag{28} $$

For models in the scope of Section 2.2, the quantity $g_\tau$ has a representation as either (9) or (11). Hence one sees that the biased ensemble $P^s_\tau$ of (14) can also be characterised as a biased ensemble for the controlled process:

$$ P^s_\tau(\mathcal{C}) = P^{\rm con}_\tau(\mathcal{C})\,\exp\left\{-\tau\left[s\,b_\tau(\mathcal{C}) + g_\tau(\mathcal{C})\right] - G(s,\tau)\right\} . \tag{29} $$

Given a biased ensemble of interest, one may choose the controlled process (and hence $g$) in order to transform the problem into a form that is more tractable. This is very useful for numerical work [68][69][70][71][72]. It also enables analytic progress. For example, in biased ensembles where $b_\tau$ is of the form given in (9) or (11), it is simple to construct a controlled process such that the quantity $s\,b_\tau(\mathcal{C}) + g_\tau(\mathcal{C})$ that appears in (29) reduces to a simple time-integral as in (1). Hence biased ensembles $P^s$ with $b_\tau$ as in (9) have alternative formulations where the dynamics is modified but the bias has the (simpler) form (1). This observation was used in [49] to relate large deviations of the dynamical activity in spin models to large deviations of the time-integrated escape rate; see also [73], which discusses some relationships between large deviations of currents and dynamical activities.

Connection of operator and optimal-control approaches
There is a deep connection between the optimal control approach of Section 2.6 and the operator approach of Section 2.5. A similar connection appears in quantum mechanics, where one may use either an operator approach or an approach based on path integrals.

$^{9}$ The results (23,26) are extremely general but (27) is similar to (8) in that it requires assumptions related to analyticity and convexity of $\psi$ and $I$. These assumptions are valid for models within the scope of Section 2.2.
A general method to connect operator equations and controlled processes is to maximise the right hand side of (26) over some class of controlled processes, in order to find an optimally-controlled model. This variational problem is equivalent to solving for the largest eigenvalue of an operator $W_s^\dagger$, which is the Hermitian conjugate (adjoint) of the operator $W_s$ discussed above. (The eigenvalue appears as the value of a Lagrange multiplier.) We present an example calculation for a simple diffusion process, after which we summarise the resulting general picture.
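Consider large deviations of $b_\tau$ as in (1), for a diffusion problem described by a stochastic differential equation with additive noise:

$$ \mathrm{d}C_t = v(C_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t \tag{30} $$

where $C_t$ is a $d$-dimensional vector and $W_t$ a $d$-dimensional standard Brownian motion. Using the operator method, the SCGF can be obtained for this process by solving the eigenvalue problem

$$ \begin{aligned} \psi(s)\,P_{\rm end} &= W_s P_{\rm end} , \\ W_s P_{\rm end} &= -\nabla\cdot(vP_{\rm end}) + \nabla^2P_{\rm end} - s\,b\,P_{\rm end} . \end{aligned} \tag{31} $$

(The second line is an explicit formula for $W_s$; the derivatives are with respect to $C$.) The controlled process is obtained from (30) by replacing the drift $v$ with $v - \nabla\phi$, where $\phi$ is a control potential:

$$ \mathrm{d}C_t = \left[v(C_t) - \nabla\phi(C_t)\right]\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t . \tag{32} $$

Similarly to [48,67], we show in Appendix A that if this control potential is used with (26), maximising the resulting bound on $\psi$ is equivalent to solving the eigenproblem (31). In particular, the optimal control may be expressed as

$$ \phi = -2\log F , \qquad W_s^\dagger F = \psi(s)\,F \tag{33} $$

in which $W_s^\dagger$ is the Hermitian conjugate (adjoint) of the operator $W_s$ given in (31). Its form is given in (A.7). Equation (33) is an eigenproblem for $W_s^\dagger$, whose largest eigenvalue was already shown to be the SCGF $\psi$.$^{10}$ Constructing the controlled process from the corresponding eigenfunction $F$ achieves equality in (26); hence this is an optimally-controlled process.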
The conclusion of this analysis is that solving the eigenvalue problem (A.3) is equivalent to optimising (26) over controlled processes of the form (32). Also, the optimal control potential and the eigenvector are related as $F = e^{-\phi/2}$. So the same information is available by the operator and optimal-control approaches.
We have analysed the simple model (30) but this structure is very general, see also [38]. Analogous steps can be applied to all the models of Section 2.2. Taking $b_\tau$ as in (1) it is sufficient in these cases to consider controlled processes that are obtained by adding conservative control forces, as the derivative of a potential. For Markov chains with transition rates $W(C \to C')$, the appropriate controlled dynamics is [30,63] For $b_\tau$ as in (9,11) one should first use the method of Section 2.7 to transform the problem to a form where $b_\tau$ has the form given in (1): this may require a nonconservative control force. One then adds an additional conservative control force, as the gradient of φ. It is sufficient to optimise over this φ.
We end this section by observing that for time-reversal symmetric systems, both the eigenvalue problem and the optimal-control problem can be simplified. If (30) represents an equilibrium (time-reversal symmetric) system then $v = -\nabla U$ for some potential U, so the controlled system (32) is also time-reversal symmetric (with potential U + φ). In this case the steady state of the controlled system is a Boltzmann distribution $\mu \propto e^{-(U+\phi)}$. Then (23) yields a simple variational result which is equivalent to the Rayleigh-Ritz formula for the largest eigenvalue of a self-adjoint operator, see also [11]. The physical origin of this simplification is the time-reversal symmetry of the biased ensemble (15). The F that maximises the right hand side of (35) is the eigenfunction of $W_s^\dagger$ and gives the optimal control potential as $\phi = -2\log F$.
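The equivalence between the eigenproblem and the optimally-controlled process can be checked numerically for a small Markov chain. The sketch below is an illustration under stated assumptions, not the construction used in the text: it assumes a state-dependent observable b(C) as in (1), a bias of the form e^{-sτb}, and a controlled process whose rates are W(C → C') F(C')/F(C), with F the dominant eigenvector of the adjoint tilted operator. The function names and the three-state rates are hypothetical.

```python
import numpy as np

def tilted_operator(W, b, s):
    """Adjoint (backward) tilted operator for rates W[i, j] (i -> j) and observable b."""
    rates = np.array(W, dtype=float)
    np.fill_diagonal(rates, 0.0)
    escape = rates.sum(axis=1)
    return rates - np.diag(escape) - s * np.diag(b)

def dominant_eigenpair(A):
    """Largest real eigenvalue and the matching eigenvector, normalised to be positive."""
    vals, vecs = np.linalg.eig(A)
    k = np.argmax(vals.real)
    F = vecs[:, k].real
    return vals[k].real, F / F.sum()

def stationary_distribution(rates):
    """Solve mu L = 0 with sum(mu) = 1 for a jump process with the given rates."""
    n = rates.shape[0]
    L = rates - np.diag(rates.sum(axis=1))
    A = np.vstack([L.T, np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return mu

def control_bound(W, b, s):
    """Return (psi, bound): the SCGF and the variational bound from the controlled process."""
    rates = np.array(W, dtype=float)
    np.fill_diagonal(rates, 0.0)
    b = np.asarray(b, dtype=float)
    psi, F = dominant_eigenpair(tilted_operator(rates, b, s))
    # Controlled rates: W(i -> j) F_j / F_i, i.e. a conservative control force with F = exp(-phi/2)
    controlled = rates * F[None, :] / F[:, None]
    mu = stationary_distribution(controlled)
    # Relative entropy per unit time between controlled and original jump processes
    mask = rates > 0
    terms = np.zeros_like(rates)
    terms[mask] = (controlled[mask] * np.log(controlled[mask] / rates[mask])
                   - controlled[mask] + rates[mask])
    kl_rate = mu @ terms.sum(axis=1)
    bound = -s * (mu @ b) - kl_rate
    return psi, bound

# Hypothetical three-state example
W_example = np.array([[0.0, 1.0, 0.5],
                      [2.0, 0.0, 1.0],
                      [0.3, 1.5, 0.0]])
b_example = np.array([0.0, 1.0, 3.0])
print(control_bound(W_example, b_example, 0.7))
```

For this choice of control the two returned numbers should agree to numerical precision, which is the statement that the bound is saturated by the optimally-controlled process. Any other positive vector used in place of F would give a strictly smaller bound.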

Dynamical phase transitions
We emphasised in Section 2.2 that finite systems are typically associated with analytic rate functions and SCGFs. However, there are many examples of rate functions that have singularities. For example, this can occur in Markov chains with infinite state spaces [17,30,74,75], which are not covered by Section 2.2. Motivated by the analogy with thermodynamics discussed in Section 2.3, these singularities can be identified as phase transitions.
Physically, the key feature is that singularities are (usually) associated with a qualitative difference in mechanism between rare events with different values of b τ . There are several situations in which such behaviour can arise. We focus here on one broad class of phase transitions, which we describe as space-time phase transitions, sometimes called trajectory phase transitions [17], see also [76]. These occur in large systems where the observable b in (1) is an intensive variable in the spatial (thermodynamic) sense, see below.
Other kinds of dynamical phase transition have also been discussed in the context of dynamical large deviations [74,75,77-80]. Those results show that singular rate functions can occur for a variety of different reasons. They also show that systems outside the scope of Section 2.2 cannot be assumed to have analytic rate functions, even if the models appear very simple.
Fig. 2. Space-time mapping [49,81,82]: dynamical trajectories of d-dimensional models (left) can be analysed by mapping them to configurations of (d + 1)-dimensional thermodynamic models (right). The thermodynamic system has size N × τ and one analyses its behaviour in the joint limit N, τ → ∞.

Thermodynamics in space-time
In Section 2.3 we described an analogy between large deviations of time-averaged quantities and the thermodynamic limit. In this section we are concerned with large deviations of time-averaged quantities in large systems. As a guiding example, we consider the one-dimensional Ising model with periodic boundaries, evolving by Glauber dynamics, as in [30]. There are N spins and the state of the ith spin at time t is σ i,t = ±1. We consider a joint limit of large time τ → ∞ and large system size N → ∞.
To analyse this situation it is useful to make a mapping between trajectories of a d-dimensional model and configurations of a corresponding (d + 1)-dimensional thermodynamic system [39,46,49,[81][82][83]. The key idea is that the time t in the dynamical model is interpreted as an additional spatial co-ordinate in the thermodynamic system. Figure 2 illustrates this mapping for the 1d Glauber-Ising model, for which the corresponding thermodynamic system is a variant of the 2d Ising model. 11 In the general case, we use the same symbol C to indicate a trajectory of the dynamical model (as in Sect. 2.4) and the corresponding configuration of the (d + 1)-dimensional thermodynamic model. We define a Boltzmann distribution for the (d + 1)-dimensional model by assigning probability P τ (C) to configuration C. This means that fluctuations in the dynamical model can (in principle) be analysed by applying methods of equilibrium statistical mechanics to the Boltzmann distribution of the (d + 1)-dimensional system.
Consider large deviations of some dynamical quantity u that corresponds to an intensive variable in the (d + 1)dimensional system. 12 For the example of the Ising model, we consider the time-averaged energy per spin: For a general dynamical model (with finite N ) that falls in the scope of Section 2.2, large deviations of u can be analysed following Section 2. It is convenient to perform this analysis by setting b τ = N u. Then the biased ensemble of (15) is with analogous to (2). Recalling that P τ (C) is a Boltzmann distribution for the (d + 1)-dimensional model, we identify P s τ (C) in (37) as a Boltzmann distribution where the energy has been perturbed by the extensive quantity sN τ u. 13 Also G N is the difference in free energy between the perturbed and unperturbed models. Since u was assumed to be intensive, this (d + 1)-dimensional system has an extensive energy function. On general thermodynamic grounds [45] one therefore expects for N, τ → ∞ that where G is the bulk free-energy density. We recall from thermodynamics that there are no phase transitions in finite systems: in the present context this means that G N (s, τ ) should always be an analytic function of s. However, the limiting function G(s) may have singularities, which correspond to thermodynamic phase transitions in the (d + 1)-dimensional model. In the dynamical context, we refer to these as space-time phase transitions.

Space-time phase transitions
To analyse these phase transitions, it is convenient to first take τ → ∞ at fixed N , and then later N → ∞. At fixed N , we define a SCGF by analogy with (5) and a rate function by analogy with (3) (The factors of N are included for later convenience.) The assumptions of Section 2.2 are sufficient to ensure that ψ N and I N are analytic and strictly convex. The analogue of (6) is ψ N (s) = sup u [−su − I N (u)]. For large systems we are motivated by (39) to define and also These functions may not be analytic. However, the convexity of I N means that As an example of a dynamical phase transition, Figure 3 shows the large-deviation behaviour of the time-integrated energy in the 1d Glauber-Ising model. Exact results are available for this model, see [30] and also Appendix B. We show results at inverse temperature β = 1 but the qualitative behaviour is the same for all positive β [30]. There is a critical point at s = s c where G is singular, and there is a corresponding singularity in I. This critical point separates a paramagnetic regime for small s and a ferromagnetic regime for s > s c , as might be anticipated by the correspondence with the 2d Ising-like model shown in Figure 2. The transition may also be analysed via a mapping to a quantum phase transition [84], see [30].
In finite systems the function $\psi_N(s)$ is analytic, as is $I_N(u)$. However the second derivative $\psi_N''(s_c)$ diverges logarithmically with N: this is the (weak) specific-heat singularity of the 2d Ising universality class [84]. The singularity is illustrated in Figure 3c by plots of $\psi_N'(s)$ close to $s_c$; its gradient $\psi_N''(s_c)$ grows (slowly) with N.
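For small N, the function ψ_N(s) can be obtained by exact diagonalisation, which gives a direct way to inspect its derivatives. The sketch below is an illustration under assumptions that are not taken from the text: Glauber flip rates 1/(1 + e^{βΔE}) with unit coupling, the time-averaged energy per spin as the observable, and the normalisation ψ_N(s) = λ_max/N, where λ_max is the largest eigenvalue of the generator tilted by −s times the total energy. The function names are hypothetical.

```python
import itertools
import numpy as np

def build_glauber_ising(N, beta):
    """Generator (rows = initial state) and energies for the periodic 1d Glauber-Ising chain."""
    states = list(itertools.product([-1, 1], repeat=N))
    index = {cfg: k for k, cfg in enumerate(states)}
    energy = np.array([-sum(cfg[i] * cfg[(i + 1) % N] for i in range(N))
                       for cfg in states], dtype=float)
    L = np.zeros((len(states), len(states)))
    for k, cfg in enumerate(states):
        for i in range(N):
            flipped = list(cfg)
            flipped[i] = -cfg[i]
            j = index[tuple(flipped)]
            # Assumed Glauber rate; it obeys detailed balance with respect to exp(-beta * E)
            rate = 1.0 / (1.0 + np.exp(beta * (energy[j] - energy[k])))
            L[k, j] += rate
            L[k, k] -= rate
    return L, energy

def scgf(L, energy, N, s):
    """psi_N(s): largest eigenvalue of the tilted generator, divided by N."""
    tilted = L - s * np.diag(energy)
    return np.max(np.linalg.eigvals(tilted).real) / N

def curvature(L, energy, N, s, h=1e-3):
    """Central finite-difference estimate of the second derivative of psi_N at s."""
    return (scgf(L, energy, N, s + h) - 2.0 * scgf(L, energy, N, s)
            + scgf(L, energy, N, s - h)) / h**2

for N in (4, 6, 8):
    L, energy = build_glauber_ising(N, beta=1.0)
    print(N, scgf(L, energy, N, 0.5), curvature(L, energy, N, 0.5))
```

The state space grows as 2^N, so this approach only reaches small systems; the logarithmic growth of the curvature at s_c described above is a large-N statement that such small sizes cannot establish on their own.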

First order phase transitions and dynamical phase coexistence
In thermodynamics, first-order phase transitions are associated with phase coexistence phenomena. The same situation holds at first-order space-time phase transitions. However, the manifestation of this phenomenon may differ between thermodynamic and dynamical transitions. This can be illustrated by the finite-size scaling behaviour at these transitions [17,58,69,82]. We summarise the associated behaviour here; a more detailed analysis can be found in [69,85].
Applying the thermodynamic analogy of Section 3.2, note that the associated thermodynamic model is anisotropic because the horizontal (time-like) and vertical (space-like) axes in Figure 3 are not equivalent. To reflect this, consider a d-dimensional system with N = L d so that G N (s, τ ) depends separately on L and τ . In the analogy with thermodynamics, L d τ corresponds to the volume of the thermodynamic model, and τ /L to its aspect ratio.
In large deviation analysis, a natural approach is to first take τ → ∞ at fixed L as in (40), and then take L → ∞ as in (42). This means that the aspect ratio (τ/L) → ∞. However, in thermodynamic finite-size scaling analyses, it is more common to consider isotropic systems where the aspect ratio is fixed at unity [86]; this corresponds to taking L, τ → ∞ together. Nevertheless, thermodynamic systems with diverging aspect ratio have been analysed [87]: they provide a suitable comparison point for large-deviation analyses [69,85]. Limits where L, τ → ∞ together have also been considered in numerical studies of large deviations [18,58].
The key fact is that physical behaviour at phase coexistence depends on the aspect ratio of the system. The situation is summarised in Figure 4. For τ/L = O(1) one observes the familiar behaviour of thermodynamic phase coexistence, which means that the probability density $p_N(b|\tau)$ is bimodal with two peaks corresponding to the coexisting phases, see Figure 4c and also [18,58]. The trough between the peaks corresponds to coexistence, where macroscopic domains of the phases are separated by an interface. On the other hand, if one takes instead a very large aspect ratio (τ → ∞ before L → ∞) then $I_N(b|\tau)$ in (41) is strictly convex so $p_N$ is unimodal. In this case typical trajectories include many large domains of each phase, which are arranged along the time-like axis, see Figure 4d and also [69,85].
Fig. 4. (a) [start of caption missing] (43) for a system exhibiting dynamical phase coexistence between two phases in which the order parameter has values $k_i$ and $k_a$. (b) A trajectory exhibiting phase coexistence contains domains of both dynamical phases, which are labelled as inactive ($k \approx k_i$) and active ($k \approx k_a$), see for example [81,82]. (c) Sketch of the probability distribution of k in a finite system where τ is comparable to L, with sample trajectories that correspond to different values of k. Dynamical phase coexistence similar to (b) is associated with a local minimum of the probability, which is a local maximum in this plot. (d) Sketch of the same probability distribution in a finite system where τ is very large, compared to L. In this case the distribution is unimodal and phase coexistence involves multiple domains arranged along the time-like axis. See [86,87] and [69,85].
To summarise the central message of space-time thermodynamics: large-deviation theory can be applied to time-averages of (spatially) intensive quantities. The results can be understood by analogy with (d + 1)-dimensional thermodynamic systems. A natural approach to this limit is to consider the behaviour of $G_N$ and $I_N$ as N → ∞, which means that we take a limit of large time before any limit of large N. In this case $G_N$ and $I_N$ are both analytic convex functions that converge to non-analytic limits as N → ∞. This signals that a space-time phase transition is taking place.

Glassy systems and metastability
Interesting examples of space-time phase transitions appear in glassy systems, including supercooled liquids [18]. The dynamical behaviour of these systems continues to challenge theoretical understanding [88,89]. The structural relaxation time of a liquid is the time required for a molecule to diffuse a distance comparable with its (microscopic) diameter. In a simple liquid at a moderate temperature, this time might be a few picoseconds. On cooling through the glass transition, the structural relaxation time increases rapidly and eventually exceeds the (macroscopic) experimental time scale, which might be seconds or hours. For practical purposes, the system is no longer ergodic. The spatial correlations between molecules change only slightly as the system approaches its glass transition, but the system's dynamical properties change dramatically.

Dynamical phase transitions in glasses
Observing that the glass transition is a dynamical phenomenon, Merolle et al. [81] applied thermodynamic methods to the statistics of (d + 1)-dimensional trajectories, similarly to Section 3.2 above, see also [82]. Their idea was that this methodology might capture information that is not available from standard thermodynamic methods. Early studies [81,82] focussed on simple kinetically-constrained lattice models (KCMs), which capture many of the dynamical features of glassy systems [89]. They considered fluctuations of the time-averaged dynamical activity, which in spin models is defined by counting the total number of configuration changes in a trajectory. This is a proxy for the extent to which molecules in a supercooled liquid are able to move around and explore their environment [18].
The connection of [81,82] to large deviation theory was realised shortly afterwards, and it was shown that dynamical phase transitions occur generically in KCMs [17,49]. This result is discussed in Section 4.2, below. It is notable because KCMs do not exhibit thermodynamic phase transitions, raising the possibility that the experimental glass transition might be related to an underlying dynamical phase transition, even in a system with simple thermodynamic properties [89].
Following this work on KCMs, numerical studies of atomistic models of liquids have shown evidence for dynamical phase transitions [18-20,90-92]. Large deviations have been analysed for a variety of time-averaged quantities including several different definitions of dynamical activity [19,90,91], and measures of liquid structure [20,92]. There is also evidence for dynamical phase transitions in experiments on glassy colloidal systems [21,93]. Some glassy spin models have thermodynamic glass transitions, and numerical and analytic arguments indicate that these models should also support dynamical transitions [94]. Together, these works show that glassy systems generically exhibit large fluctuations, which can be probed by a variety of time-averaged quantities, and can be characterised via rate functions.
To explain the dynamical phase transition that takes place in KCMs, we discuss the prototypical example of the Fredrickson-Andersen (FA) model [95] in one dimension. This was one of the first glassy systems [81] for which large deviations were analysed. The existence of the phase transition can be proved by a very simple argument [17,49]. More recent work has characterised this transition in detail [59,69,96,97], as well as other large-deviation properties of this model [59,98,99].

Dynamical phase transition in the FA model
The FA model (in one dimension) consists of N spins in a linear chain with periodic boundaries. The state of the ith spin is $n_i = 0, 1$ and a configuration of the system is $C = (n_1, n_2, \dots, n_N)$. Spins with $n_i = 1$ are active and indicate excitations, which are regions of a glassy system where particles are moving more than is typical. Spins with $n_i = 0$ are inactive. The kinetic constraint is that spin i can change its state only if at least one of its neighbours $n_{i\pm 1}$ is active. If this constraint is satisfied then spin i flips from state 0 to state 1 with rate c, while the reverse process happens with rate 1 − c.
The behaviour of the model depends on the parameter c. In particular, for a system at equilibrium the fraction of spins that are in state 1 is $\langle n_i \rangle = c$. The dependence of the model on temperature T is captured by identifying $c = e^{-J/(k_B T)}$ where J is the characteristic energy of an active site (excitation). 14 Now let $k_{i,\tau}$ be the number of times that spin i changes its state, between time zero and time τ. Summing over all spins, a time-averaged (intensive) measure of dynamical activity is
$$k_\tau = \frac{1}{N\tau} \sum_{i=1}^{N} k_{i,\tau} .$$
This corresponds to (9) with $\alpha(C, C') = (1/N)$ for all C, C'. We analyse the large deviations of this activity by following Section 3.2 with $u \to k_\tau$ (this is similar to Section 2.1, replacing $b_\tau \to N k_\tau$). The following very simple argument shows that the functions G and I have singularities that correspond to first-order phase transitions. Consider the configuration with $n_1 = 1$ and $n_i = 0$ for all other sites. The rate of transitions out of this configuration is 2c; the probability that it occurs as initial condition is denoted by $\pi_1$. Now define a very simplistic controlled process where the system begins in this configuration and never leaves it. For this trajectory one has Using this result with (23,39,42) and noting that $\pi_1$ is independent of τ, one obtains $\psi_N > -2c/N$ and hence
$$G(s) \geq 0 . \qquad (47)$$
Since the activity $k_\tau \geq 0$, it follows from (2,47) that G(s) = 0 for s ≥ 0. Figure 5 illustrates the result: there is a discontinuity in the first derivative of G at s = 0, which corresponds to a first-order space-time phase transition. 15 Applying (8), it follows that I(k) = 0 for all $k < \langle k_\tau \rangle$. This means that for large N, τ, rare events where $k_\tau$ is smaller than its average have log-probabilities that do not scale as Nτ. In fact, these log-probabilities are much smaller in magnitude: they are either proportional to N or τ, depending on the relative magnitudes of these two quantities [82].
We note that the bound (47) is very general in KCMs, and establishes that these phase transitions occur in many different models [17,49]. However, it does rely on the existence of a "hard" kinetic constraint, which means that for a typical configuration C, there are spins which cannot flip. This is a strong assumption and leaves open the question as to whether similar phase transitions are possible in models with softened constraints as in [58], where every spin flips with a non-zero rate. In fact similar (first-order) dynamical phase transitions still occur in the softened FA model [58], although in this case the singularity in G(s) occurs at $s^* > 0$, and the only zero of I(k) is at $k = \langle k_\tau \rangle$.
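The lower bound on ψ_N obtained from the single-excitation configuration can be checked by exact diagonalisation of a small FA chain. The sketch below is an illustration under assumptions: jumps are biased by a factor e^{-s} per spin flip (so that the biased quantity is the total number of flips, N τ k_τ), ψ_N is the largest eigenvalue of the resulting tilted operator divided by N, and the isolated all-inactive configuration is removed from the state space. The function names are hypothetical.

```python
import itertools
import numpy as np

def fa_tilted_operator(N, c, s):
    """Tilted operator for the periodic 1d FA model, biasing the total number of spin flips."""
    # The all-zero configuration is disconnected from the rest, so it is excluded
    states = [cfg for cfg in itertools.product([0, 1], repeat=N) if any(cfg)]
    index = {cfg: k for k, cfg in enumerate(states)}
    A = np.zeros((len(states), len(states)))
    for k, cfg in enumerate(states):
        for i in range(N):
            # Kinetic constraint: at least one neighbour of spin i must be active
            if cfg[i - 1] or cfg[(i + 1) % N]:
                rate = c if cfg[i] == 0 else 1.0 - c
                new = list(cfg)
                new[i] = 1 - cfg[i]
                j = index[tuple(new)]
                A[k, j] += rate * np.exp(-s)  # each flip contributes a factor exp(-s)
                A[k, k] -= rate               # escape rate is not biased
    return A

def fa_scgf(N, c, s):
    """psi_N(s) for the activity, from the largest eigenvalue of the tilted operator."""
    return np.max(np.linalg.eigvals(fa_tilted_operator(N, c, s)).real) / N

N, c = 6, 0.3
for s in (0.0, 0.1, 1.0, 5.0, 20.0):
    print(s, fa_scgf(N, c, s), -2.0 * c / N)
```

In this setting ψ_N(s) stays above −2c/N for every s and approaches that value as s becomes large, which is the finite-N counterpart of the statement that G(s) vanishes for s ≥ 0 in the limit N → ∞.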

Large deviations and metastable states
These results for kinetically constrained models show that glassy systems with simple thermodynamic properties can still exhibit dynamical phase transitions. However, other theories of the glass transition assert that slow relaxation in liquids is linked to long-lived metastable states that can be analysed thermodynamically. This theoretical paradigm is certainly valid in a class of mean-field spin glasses, 16 while research continues into the question of whether it applies in physical (three-dimensional) liquids [100]. Some mean-field spin-glass models exhibit firstorder dynamical phase transitions [94], similar to those in kinetically constrained models. The operator approach of Section 2.5 has been used to show that long-lived metastable states lead naturally to such transitions [94]. Here we give a brief explanation as to how the same conclusions can be reached (perhaps more intuitively) by an optimal-control argument.
Metastability is associated with a separation of time scales. The physical idea (which can be applied in non-equilibrium systems as well as in equilibrium [101-103]) is that if a system is initialised in a metastable state then it equilibrates quickly within that state, on a time scale $\tau_f = O(1)$, before eventually relaxing to some other state on a much longer time scale $\tau_s \gg 1$. Consider a system with n ≥ 2 states, labelled by α = 1, 2, . . . , n. This includes the case where one state is stable and the others are metastable (for example a mean-field ferromagnet in a field). It also includes systems at thermodynamic phase coexistence, which have two or more stable states. 17 Let $\pi_\alpha$ be the probability that a steady-state configuration belongs to state α. We analyse large deviations of an intensive observable u that has different average values in each state: we denote these averages by $u_\alpha$. In cases where the time scales are well-separated and the metastable states are well-defined then $\sum_\alpha \pi_\alpha \approx 1$ and the steady-state average of u is $\langle u \rangle \approx \sum_\alpha \pi_\alpha u_\alpha$. These approximate equalities are accurate if $\tau_s \gg \tau_f$. Following (46) as well as [32,94] we consider a controlled process that starts in state α and remains there for the entire trajectory. Its behaviour within state α matches the natural dynamics of the model within that state. Since relaxation is fast within the metastable state, the time for the original (uncontrolled) model to leave this state is exponentially distributed with a mean that we denote by $\tau^s_\alpha$. By analogy with (46), we deduce that As usual we consider large systems, N → ∞. In idealised cases such as mean-field ferromagnets, the slow relaxation between states occurs on time scale $\tau_s \sim e^{\kappa N}$ where N is the system size and κ = O(1). If state α is metastable then $\pi_\alpha \sim e^{-N \Delta f}$ where ∆f = O(1) is a difference in (intensive) free energy; if α is stable then $\pi_\alpha = O(1)$.
Using (48) with (23) and b = N u shows that Taking τ → ∞ at fixed N and using (40,42) gives If $\tau^s_\alpha \gg 1$ is a slow time scale then one sees that $G(s) \geq -s u_\alpha$. Using also that $\psi_N'(0) = -\langle u \rangle$ and $\langle u \rangle \neq u_\alpha$, this implies that $G'(s)$ has a discontinuity at s = 0, which corresponds to a first-order space-time phase transition, similar to the case of kinetically constrained models. A more detailed analysis of this case can be found in [94], using the operator approach.
We emphasise that such first-order transitions are generic for systems where $(\tau_s/\tau_f) \to \infty$, which includes mean-field systems with metastable states, and finite-dimensional systems at phase coexistence. For finite-dimensional systems away from phase coexistence, all metastable states have finite lifetimes, and one expects $(\tau^s_\alpha)^{-1} \propto N$. (For example, recall that nucleation rates for systems close to phase coexistence are proportional to the system size N [5].) In such cases, (50) gives a bound on G that is not sufficient to establish the existence of a phase transition, but can be used to relate crossovers in G(s) and $G_N(s, \tau)$ to properties of metastable states, particularly $u_\alpha$ and $\tau^s_\alpha$ [32,94]. These arguments establish strong connections between metastability and large deviations, which (we argue) are very useful when interpreting large-deviation computations for glassy systems [18,20,92].

Fluctuation theorems and time's arrow
Glassy systems have slow dynamics but their equilibrium states are time-reversal symmetric. We now turn to models of non-equilibrium steady states. Early work in this area [8,12] demonstrated the usefulness of large deviation studies of time-averaged quantities in physics, by exploiting connections between dissipation and irreversibility. In this section we set Boltzmann's constant k B = 1, so that entropy is a dimensionless quantity.
We write $C^R$ for the trajectory that is obtained by reversing the arrow of time in trajectory C. In the simplest case, this means that $C^R_{\tau-t} = C_t$. More generally the time-reversal operation might involve a change in some system variables, such as reversal of molecular velocities, as in [104,105]. Then, a (time-integrated) measure of irreversibility for trajectory C within a given model can be identified as Recall that P includes the probability of the initial configuration of the system $C_0$ and that these initial conditions are taken from the steady state of the system. It follows that for equilibrium systems, $\Sigma_\tau(C) = 0$ exactly, for every time τ and every trajectory C. It is useful to define and to identify this $P^*$ as the probability distribution for trajectories under a particular controlled process which we refer to as the adjoint process, following [2]. One sees that The mean entropy production can never be negative; it is zero only for time-reversal symmetric (equilibrium) systems, since $P = P^*$ in that case. One drawback of the irreversibility measure $\Sigma_\tau$ is that the quantity $P_\tau(C)$ appearing in (51) cannot usually be evaluated, because it depends on the probability of the initial state of the trajectory, which is typically not known (except in equilibrium systems where it only depends on the energy). However, one may define a time-averaged rate of entropy production as where we recall that $\pi(C_0)$ is the probability density for the initial condition of the trajectory. 18 In non-equilibrium systems, the usual situation is that $\Sigma_\tau$ grows with τ while $\log[\pi(C_0)/\pi(C^R_0)]$ remains finite. 19 In this case, the large deviations of $\sigma_\tau$ are the same as the large deviations of $\Sigma_\tau/\tau$, even if these quantities have different values when τ is finite.
Fig. 6. Sketch of the rate function and SCGF for the entropy production in a generic system that obeys the fluctuation theorem. The dashed line in the left panel has gradient −1/2; it intersects the rate function at $\sigma = \pm\langle\sigma\rangle$, consistent with (61). The SCGF has reflection symmetry through s = 1/2, consistent with (59).
In many physical systems, closed formulae for $\sigma_\tau$ are available. For example, consider a simple model for particle motion (in d dimensions) similar to (30). The natural physical interpretation of this model is that a particle moves through a viscous fluid with friction constant γ at temperature T, and feels a nonconservative external force $f(C_t)$. Then it may be shown from (54) that We identify $\tau T \sigma_\tau$ as the total work done by the force f which coincides (in this simple situation) with the heat dissipated in the fluid. Dividing the dissipated heat by the temperature gives the entropy production, so the probabilistic definition of $\sigma_\tau$ in (54) coincides with the time-averaged rate of (physical) entropy production. Note that heat and work coincide in this example system because all forces were assumed to be external: hence there is no internal energy. To separate the definitions of heat and work one should formulate the first law of thermodynamics by defining an internal energy U and a corresponding force −∇U. Then write $f = F_{\rm ext} - \nabla U$ in (55), where $F_{\rm ext}$ is an external force [3]. The work is then F ext (C t ) • dC t , and the heat transferred to the fluid is (54). The difference of these quantities is the change in internal energy: this is the first law of thermodynamics.
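The identification of the entropy production with work divided by temperature can be illustrated with a short simulation. The sketch below rests on assumptions that are not taken from the text: a single co-ordinate on a ring (so that a constant force is nonconservative), overdamped dynamics of the form γ dC = f(C) dt + sqrt(2γT) dW, and a midpoint (Stratonovich) discretisation of the work integral. For a constant force f0 the mean rate is m = f0²/(γT) and the displacement is Gaussian, which gives the rate function (σ − m)²/(4m); this can be checked against the fluctuation theorem directly. All function names are hypothetical.

```python
import numpy as np

def entropy_production_rate(force, x0, T, gamma, tau, n_steps, n_paths, rng):
    """Sample sigma_tau = (work done by the force) / (T * tau) for overdamped Langevin paths."""
    dt = tau / n_steps
    x = np.full(n_paths, float(x0))
    work = np.zeros(n_paths)
    for _ in range(n_steps):
        noise = rng.normal(0.0, np.sqrt(2.0 * T * dt / gamma), size=n_paths)
        x_new = x + force(x) * dt / gamma + noise
        # Stratonovich convention: evaluate the force at the midpoint of each step
        work += force(0.5 * (x + x_new)) * (x_new - x)
        x = x_new
    return work / (T * tau)

def gaussian_rate(sigma, mean_rate):
    """Rate function for a constant force, for which sigma_tau is exactly Gaussian."""
    return (sigma - mean_rate) ** 2 / (4.0 * mean_rate)

rng = np.random.default_rng(0)
f0, T, gamma, tau = 1.0, 1.0, 1.0, 10.0
samples = entropy_production_rate(lambda x: f0 * np.ones_like(x),
                                  0.0, T, gamma, tau, 200, 20000, rng)
m = f0 ** 2 / (gamma * T)
print("sample mean:", samples.mean(), "expected:", m)
print("I(-0.4) - I(0.4) =", gaussian_rate(-0.4, m) - gaussian_rate(0.4, m))
```

The constant-force case is chosen because it is exactly solvable; for a position-dependent force the same sampling routine applies, but the rate function is no longer quadratic and would have to be estimated by other means.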
For Markov chains with jump rates $W(C \to C')$, the analogue of (56) is given by (9) with $\alpha(C, C') = \log[W(C \to C')/W(C' \to C)]$, which requires the assumption that $W(C' \to C)$ is non-zero whenever $W(C \to C')$ is non-zero. (This property is sometimes called weak reversibility.) Returning to the main argument, it follows from the explicit formula (54) that large deviations of $\sigma_\tau$ can be analysed within the class of models discussed in Section 2.2. The connection of the entropy production $\sigma_\tau$ with the irreversibility measure $\Sigma_\tau$ means that large deviations of $\sigma_\tau$ have interesting symmetry properties, as we now discuss.

Fluctuation theorem of Gallavotti-Cohen
We discuss fluctuation theorems for the entropy production in non-equilibrium steady states [3,8,12,83,106-108]. Consider first the CGF for $\Sigma_\tau$: where the first line is the definition of $G_\Sigma$, the second line uses (51), and the third simply rearranges various terms. Changing integration variable from $C^R$ to C, and using again (51) one finds the symmetry relation
$$G_\Sigma(s, \tau) = G_\Sigma(1 - s, \tau) . \qquad (58)$$
See for example [12], where the quantity $\Sigma_\tau$ was denoted by W. Now consider large deviations of the entropy production $\sigma_\tau$ whose SCGF is denoted here by ψ(s). Recalling (54), one may expect that the large deviations of $\sigma_\tau$ are the same as those of $\Sigma_\tau/\tau$, in which case one would have $\psi(s) = \lim_{\tau\to\infty} \tau^{-1} G_\Sigma(s, \tau)$. The relationship between $\sigma_\tau$ and $\Sigma_\tau$ is discussed in [12], which showed (for several broad classes of stochastic model) that
$$\psi(s) = \psi(1 - s) . \qquad (59)$$
This is the symmetry that was identified by Gallavotti and Cohen [8]. It is closely related to (58) but we note that (59) is a statement about large deviations of $\sigma_\tau$ as τ → ∞, in contrast to (58) which is a statement about $\Sigma_\tau$ that is valid for all τ. See also [3,108]. Now assume convexity of ψ and use (8) with (59) to write
$$I(\sigma) = \sup_s [-s\sigma - \psi(s)] = \sup_s [-s\sigma - \psi(1 - s)] . \qquad (60)$$
Relabelling the dummy variable s = 1 − x and using (8) one obtains a fluctuation theorem [12,83]
$$I(-\sigma) = I(\sigma) + \sigma . \qquad (61)$$
Taking σ > 0, one sees that I(−σ) determines the log-probability of trajectories with negative entropy production. Since I(−σ) > I(σ), these trajectories are exponentially rarer than trajectories with positive entropy production. (This can be interpreted as a statistical form of the second law of thermodynamics.) In addition, the difference in log-probability is given quantitatively by (61), so the fluctuation theorem (which is an equality) contains more information than the second law (which is an inequality). For equilibrium systems we recall that $\Sigma_\tau = 0$ exactly, so the methods of large-deviation theory are not relevant. At a formal level then I(σ) = ∞ whenever σ ≠ 0, and ψ(s) = 0 for all s. For non-equilibrium systems, it is notable that the optimally-controlled process at s = 1/2 is time-reversal symmetric; also the optimally-controlled process at s = 1 is the adjoint process, which corresponds to the original process running backwards in time, see for example [109].
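The symmetry of the SCGF under s → 1 − s can be verified numerically for a small Markov chain. The sketch below is an illustration under assumptions: the entropy production is counted jump by jump with α(C, C') = log[W(C → C')/W(C' → C)], the bias is e^{-sτσ}, and the rates are weakly reversible. With these choices the off-diagonal elements of the tilted operator are W(C → C')^{1−s} W(C' → C)^{s}, so the operator at 1 − s is the transpose of the operator at s. The three-state rates and function names are hypothetical.

```python
import numpy as np

def entropy_scgf(W, s):
    """SCGF of the entropy production for a jump process with rates W[i, j] (i -> j)."""
    rates = np.array(W, dtype=float)
    np.fill_diagonal(rates, 0.0)
    mask = rates > 0
    # Weak reversibility: a jump is allowed if and only if its reverse is allowed
    assert np.array_equal(mask, mask.T)
    tilted = np.zeros_like(rates)
    tilted[mask] = rates[mask] ** (1.0 - s) * rates.T[mask] ** s
    tilted -= np.diag(rates.sum(axis=1))
    return np.max(np.linalg.eigvals(tilted).real)

# Hypothetical driven three-state ring: clockwise jumps are favoured
W_ring = np.array([[0.0, 2.0, 0.5],
                   [0.4, 0.0, 3.0],
                   [1.5, 0.2, 0.0]])

for s in (-0.5, 0.0, 0.2, 0.5):
    print(s, entropy_scgf(W_ring, s), entropy_scgf(W_ring, 1.0 - s))

h = 1e-5
mean_sigma = -(entropy_scgf(W_ring, h) - entropy_scgf(W_ring, -h)) / (2.0 * h)
print("mean entropy production rate:", mean_sigma)
```

Because a matrix and its transpose share the same spectrum, the two printed columns agree for every s, including values outside the interval [0, 1]. The finite-difference slope at s = 0 gives the mean entropy production rate, which is positive for this driven example.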

Example: active Brownian particles
Fluctuation theorems such as (59) are very general results. However, the analysis of the fluctuations of the entropy production in specific systems can reveal additional rich structure. An interesting example is the behaviour of active-matter systems [72,110-112]. As an example we consider a system of active Brownian particles [113,114], as considered in [72]. It consists of N circular particles in a two-dimensional system of size $L^2$. Particle i has an orientation, which is represented by a unit vector $e_i$. The particles interact by repulsive forces and they undergo thermal diffusion with diffusion constant $D_0$. In addition, they feel non-conservative propulsive forces of fixed strength which act along their orientation vectors. The propulsive forces are such that a single isolated particle moves with average speed $v_0$. Each orientation vector undergoes rotational diffusion, independent of all other co-ordinates.
Fig. 7. Illustration of the behaviour of the entropy production σ in a system of active Brownian particles, based on [72]. We show the limiting rate function I and the corresponding free energy G, which obey a Gallavotti-Cohen symmetry as in Figure 6. The free energy G is singular at s = 0, and I has two linear segments, see the text for a discussion. For s < 0 (high entropy production), typical configurations involve spontaneous particle alignment and exhibit collective motion. For 0 < s < 1 (low entropy production) the system enters a dynamically-arrested phase-separated state.
Reference [72] considered large deviations of a quantity called the active work, which has a corresponding (intensive) measure of entropy production: where r i is the position of particle i, and the integral is evaluated using the Stratonovich convention. The physical interpretation of σ τ is that there is a force on particle i, acting in direction e i (with constant magnitude). The integral in (62) is the work done by this force, normalised by the temperature. We refer to σ τ as the entropy production, although other definitions of the entropy production are possible in such systems [115][116][117][118]. Large deviations of σ τ obey the fluctuation theorems (59,61). The resulting large-deviation phenomenology is illustrated in Figure 7, following [72]. We focus on large systems and we consider I and G as defined in (42,43). Note that I obeys the fluctuation theorem (61) but its behaviour is quite different from the illustration in Figure 6. The reason is that this system exhibits several space-time phase transitions which appear in the limit of large system size. This corresponds to N → ∞ at a fixed overall density ρ 0 = N/L 2 . In the following, it is sufficient to consider only σ > 0: the behaviour for σ < 0 follows from the fluctuation theorem.
A first observation is that the behaviour for $\sigma > 0$ and $s < \frac{1}{2}$ in Figure 7 somewhat resembles Figure 5: there is a discontinuity in $G'(s)$ at $s = 0$ and a range of $\sigma$ over which $I(\sigma) = 0$. This was explained in [72] by an optimal-control argument: they proposed a controlled process that can be used with (27) to show that $I_N(\sigma) \le O(1/L)$ for a finite range of $\sigma$ between $0$ and $\langle\sigma\rangle$. Hence $I(\sigma) = 0$ in this regime, by (43). The behaviour of the controlled system in this case is that the particles form a high-density cluster where particle motion is strongly reduced, and $\sigma$ is small. Hence this state was called "phase-separated and arrested". The associated reduction in particle motion is analogous to the transition to the inactive phase in Figure 5, which explains the similarity to that case; see also Section 6.
For large deviations with $\sigma > \langle\sigma\rangle$, the numerical results of [72] show spontaneous symmetry breaking, in that particles align their orientations with each other (Fig. 7). In this case they also move collectively through the system. For an intuitive understanding of this transition, it is useful to consider a controlled system where the particles' orientation vectors feel forces (or torques) that tend to align them. Reference [72] considered a mean-field (infinite-ranged) interaction. If this interaction is strong enough to create long-ranged (ferromagnetic) order of the orientations, it clearly reduces the number of interparticle collisions, and this increases $\sigma$. This controlled process provides a bound on $I$ via (27) and numerical tests indicate that this bound is close to the true value of $I$. The conclusion is that particle alignment is an effective mechanism for fluctuations of the entropy production.
The understanding of large deviations in this system is not yet complete, but it is clear from [72] that fluctuations with $\sigma < \langle\sigma\rangle$ are strongly coupled to density fluctuations and the arrest of particle motion, while fluctuations with $\sigma > \langle\sigma\rangle$ are associated with spontaneous symmetry breaking and particle alignment. This illustrates the rich large-deviation behaviour of these non-equilibrium systems.
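As a rough illustration of how a time-averaged active work can be measured along trajectories, the following sketch simulates non-interacting active Brownian particles. It is not the model of [72]: the repulsive forces are omitted, the rotational diffusion constant `d_rot` is a hypothetical parameter, and the normalisation $\sigma_\tau = (v_0/(D_0 N \tau)) \sum_i \int_0^\tau e_i \circ dr_i$ is an assumed form chosen to match the verbal description of (62). Under these assumptions the typical value is close to $v_0^2/D_0$.

```python
import math
import random


def free_abp_active_work(n_particles, v0, d0, d_rot, tau, dt=0.01, seed=0):
    """Estimate (v0 / (D0 N tau)) * sum_i int e_i o dr_i for free ABPs.

    Assumptions (not taken from the source): no interparticle forces,
    rotational diffusion constant d_rot, Euler-Maruyama time stepping,
    and the normalisation of the active work stated in the lead-in.
    """
    rng = random.Random(seed)
    n_steps = int(round(tau / dt))
    sigma_r = math.sqrt(2.0 * d0 * dt)       # translational noise per step
    sigma_th = math.sqrt(2.0 * d_rot * dt)   # rotational noise per step
    work = 0.0
    for _ in range(n_particles):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        for _ in range(n_steps):
            ex, ey = math.cos(theta), math.sin(theta)
            # Displacement: self-propulsion along e plus thermal noise.
            dx = v0 * ex * dt + sigma_r * rng.gauss(0.0, 1.0)
            dy = v0 * ey * dt + sigma_r * rng.gauss(0.0, 1.0)
            theta += sigma_th * rng.gauss(0.0, 1.0)
            # Stratonovich product: use the midpoint value of e.
            mx = 0.5 * (ex + math.cos(theta))
            my = 0.5 * (ey + math.sin(theta))
            work += mx * dx + my * dy
    return v0 * work / (d0 * n_particles * n_steps * dt)
```

Positions are not stored because free particles do not need them; adding interactions would require tracking $r_i$ and including the forces in the displacement.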

Exclusion processes and hydrodynamic behaviour
A very active area of large-deviation research is the behaviour of interacting-particle systems including exclusion processes and zero-range processes [2,13-16,74,119-121]. This section gives a brief overview of some of the relevant phenomena, focussing on the similarities and differences between these systems and those analysed in previous sections.

Activity fluctuations in the simple symmetric exclusion process
As a concrete example, we focus on the symmetric simple exclusion process (SSEP) with periodic boundaries, as considered in [122] as well as [71,73,123]. In this case, $N$ particles move on the $L$ sites of a one-dimensional periodic lattice, with at most one particle per site. Suppose that the particle hop rate is $\gamma$ and the lattice spacing is $a_0$, so that the diffusion constant for a single particle is $D_0 = \gamma a_0^2/2$. In this system, exact results are available for large deviations of the (time-averaged, intensive) particle current $j_\tau$ and the activity $k_\tau$ [122]. The current is defined using (9) with $\alpha = 1/L$ when a particle hops to the right and $\alpha = -1/L$ for hops to the left. Similarly the activity is defined by taking $\alpha = 1/L$ for all hops. We focus here on large deviations of the activity.
This process is a finite Markov chain, satisfying the conditions of Section 2.2. Hence the rate functions for finite systems are analytic and convex. The interesting behaviour occurs in the limit of large system size, which means that L → ∞ with a fixed mean density ρ = N/L. This suggests that the space-time thermodynamic theory of Section 3.1 should be applicable. However, the number of particles is a conserved quantity in the SSEP, which means that the corresponding (d + 1)-dimensional thermodynamic model has some unusual features, from the thermodynamic perspective.
To understand dynamical large deviations, note first that if all the particles in the SSEP form a single cluster by occupying adjacent sites, then there are only two particle hops that are possible (at the edges of the cluster). In this case one may apply exactly the same argument as Section 4.2 to obtain

$$\psi_N(s) \ge -2\gamma/N , \qquad (63)$$

where $\psi_N$ is the SCGF [as in (40)] for the activity $k_\tau$. The activity $k_\tau \ge 0$ so $\psi_N(s) \le 0$ for positive $s$. This establishes that $\psi_N(s) = O(1/N)$ for positive $s$ (low activity).
On the other hand $\psi_N(s)$ is of order unity for negative $s$ (high activity). Defining $G$ as in (42) one arrives at a situation similar to Figure 5, with $G(s) = 0$ for $s \ge 0$ while $G$ is of order unity for $s < 0$. The resulting situation is shown in Figure 8. This result is correct but it misses some important properties of exclusion processes, for which one requires a more detailed analysis [122,123]. The SSEP has a slow diffusive time scale associated with large-scale density fluctuations, $\tau_L \sim L^2/D_0$. These slow (hydrodynamic) fluctuations hinder ergodicity and tend to enhance the variance of time-averaged quantities. For example, it may be verified from [122] that the variance of $k_\tau$ behaves for large $N, \tau$ as $\mathrm{Var}(k_\tau) \propto 1/\tau$, independent of $N$, and hence $\psi_N''(0) = O(N)$. This is in contrast to dynamical phase coexistence as it occurs in the FA model, where $\psi_N''(0)$ is of order unity.
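The activity $k_\tau$ discussed above is straightforward to measure in a direct simulation. The sketch below is a minimal continuous-time Monte Carlo scheme, assuming that each particle attempts a hop at rate $\gamma$ in a uniformly chosen direction and that attempts onto occupied sites are rejected; the function name and arguments are illustrative. It samples typical trajectories only, so it gives access to the mean and variance of $k_\tau$ rather than to the rare events described by the rate function.

```python
import random


def simulate_ssep_activity(L, N, gamma, tau, seed=0):
    """Return (k_tau, occupancy) for the SSEP on a ring of L sites.

    k_tau is the number of accepted hops divided by L * tau.
    Assumed dynamics: each particle attempts a hop at rate gamma,
    left or right with probability 1/2, rejected if the target is occupied.
    """
    rng = random.Random(seed)
    positions = rng.sample(range(L), N)   # uniform random initial condition
    occupied = [False] * L
    for x in positions:
        occupied[x] = True
    t = 0.0
    hops = 0
    while True:
        # Waiting time between hop attempts (total attempt rate N * gamma).
        t += rng.expovariate(N * gamma)
        if t > tau:
            break
        i = rng.randrange(N)
        step = 1 if rng.random() < 0.5 else -1
        target = (positions[i] + step) % L
        if not occupied[target]:
            occupied[positions[i]] = False
            occupied[target] = True
            positions[i] = target
            hops += 1
    return hops / (L * tau), occupied
```

For the uniform stationary state of this scheme the mean activity is $\gamma N (L-N)/[L(L-1)]$, which tends to $\gamma\rho(1-\rho)$ at large $L$; a fully occupied lattice has $k_\tau = 0$.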
The activity fluctuations responsible for $\psi_N''(0) \to \infty$ in the SSEP can be captured by macroscopic fluctuation theory [2]. It is convenient to rescale time by $\tau_L$: let

$$\tilde t = D_0 t/L^2 . \qquad (64)$$

Similarly $\tilde\tau = D_0\tau/L^2$. The space-time thermodynamics approach of Section 3.1 focusses on large deviations with

$$\mathrm{Prob}(k_\tau \approx k) \sim \exp[-\tau N I(k)] , \qquad (65)$$

where $I(k)$ takes values of order unity. This is an LDP with speed $\tau$, where the rate function is proportional to $N$. By contrast, macroscopic fluctuation theory is a theory for large deviations with

$$\mathrm{Prob}(k_\tau \approx k) \sim \exp[-\tilde\tau N \tilde I(k)] , \qquad (66)$$

with $\tilde I(k)$ of order unity. From a physical perspective, the interpretation of this formula is that the log-probability of the large deviation is proportional to the system size $N$ and to the time $\tilde\tau$, which is measured on the hydrodynamic scale. Just like (65), we interpret (66) as an LDP with speed $\tau$, but now with a rate function proportional to $N/L^2$. For this one-dimensional system, $L \propto N$ as $N \to \infty$, so the rate function in (66) goes to zero with system size; this may be contrasted with (65), where the rate function diverges.

In general, the question of whether (65) or (66) is applicable depends on whether the fluctuation of interest is governed by hydrodynamic (slow) variables or microscopic (fast) variables. For the SSEP, the macroscopic fluctuation theory gives a quantitative description of fluctuations on the hydrodynamic scale. They can be analysed by considering a suitable SCGF, for small values of the biasing parameter $s = O(N^{-2})$; see [123] for details. The result is that (66) is applicable for large deviations throughout the range $0 < k_\tau < \langle k \rangle$. In this case the fluctuation mechanism is that the SSEP becomes macroscopically inhomogeneous: it forms dense and dilute regions that suppress the activity.
However, for fluctuations where the intensive activity is significantly larger than $\langle k_\tau \rangle$, the probability scales as in (65) and the macroscopic fluctuation theory is not applicable. Specifically, for small negative $s$, reference [122] gives an expression for $G(s)$ in which the constant $A = O(1)$ can be obtained by adapting ([122], Eq. 57) to the current notation. From (44) one obtains the corresponding behaviour of $I(k)$ for $k \ge \langle k_\tau \rangle$. The second derivative $G''(s)$ diverges as $s \to 0^-$, while $I''(k)$ vanishes as $k \to \langle k_\tau \rangle$ from above. This is consistent with the scaling of the variance of $k_\tau$ (inversely proportional to $\tau$ and independent of $N$) and its link to hydrodynamic density fluctuations. In fact, these fluctuations have a strong dependence on $s$: for any $s < 0$ the system is hyperuniform [73], which means that density fluctuations on large scales are very strongly suppressed [124].
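To make the link between a divergent $G''$ and a vanishing $I''$ concrete, the following worked example assumes a specific form that is an illustration rather than a quotation from [122]: the leading singular term of $G$ for small negative $s$ is taken to be a $3/2$ power with coefficient $A > 0$, and $I$ and $G$ are taken to be related by $I(k) = \sup_s[-sk - G(s)]$ (this sign convention is also an assumption).

```latex
% Assumed small-|s| form for s < 0, with A > 0 of order unity:
G(s) \simeq -s\langle k\rangle + A(-s)^{3/2},
\qquad
G''(s) \simeq \tfrac{3}{4} A (-s)^{-1/2} \to \infty \quad (s \to 0^-).

% Legendre transform: the stationarity condition is
% k - \langle k\rangle = \tfrac{3}{2} A (-s)^{1/2}, which gives
I(k) = \sup_{s<0}\,[-sk - G(s)]
     \simeq \frac{4\,(k-\langle k\rangle)^3}{27 A^2},
\qquad
I''(k) \simeq \frac{8\,(k-\langle k\rangle)}{9 A^2} \to 0
\quad (k \to \langle k\rangle^+).
```

Under these assumptions the rate function is cubic just above the mean, so Gaussian fluctuation theory with a finite variance on the scale of (65) does not apply there.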

General implications of hydrodynamic modes
We have explained that for the SSEP with periodic boundaries, high-activity fluctuations follow (65) and low-activity fluctuations follow (66). The low-activity regime may be analysed within macroscopic fluctuation theory, which can also be applied to large deviations in other interacting-particle systems, including (weakly) asymmetric exclusion processes and zero-range processes [2,16,75,120]. Similar results can also be found in off-lattice models [125,126]. An important general question in this area is whether slow (hydrodynamic) modes lead to fluctuations governed by (66). Macroscopic fluctuation theory provides a partial answer. We use the language of activity fluctuations but the argument is general. We introduce a notion of local equilibration within a spatial region of size $\ell$, with $1 \ll \ell \ll L$. A system is at local equilibrium [2] if the distribution of particles within that region resembles the natural (unbiased) system at the same (local) density. In this case the hydrodynamic behaviour can be analysed by considering the (smooth) density field, and an associated current.
Consider a system in $d$ spatial dimensions, so $N \propto L^d$, and suppose that one can construct a macroscopically inhomogeneous state where the (total) activity differs from $\langle k \rangle$ but the system is everywhere in local equilibrium. (Such states may also be time-dependent, for example travelling waves [75], and there may be hydrodynamic flow of particles.) In this case, macroscopic fluctuation theory explains that the log-probability of fluctuations with this activity obeys (66). However, we now have $N \propto L^d$ so the rate function scales as $L^{d-2}$. The physical interpretation is that local equilibrium states have densities that vary slowly in space: these smooth (hydrodynamic) profiles relax slowly towards the steady state and can therefore be stabilised by adding very weak control forces to the system [125,126]. This leads to small values of the KL divergence in (23,27) and hence to small values of the rate function.
In fact, the nomenclature of local equilibrium may be slightly misleading in this context, in that the same argument may be applied to systems with non-equilibrium steady states, such as the active-matter system of Figure 7. In that case, the same hydrodynamic argument shows that $I(\sigma) = 0$ for a finite range of $\sigma$ between $0$ and $\langle\sigma\rangle$; see [72]. However, this argument relies on the existence of a hydrodynamic theory for this active system where the only relevant field is the density; general conditions for this to hold in fluids with non-equilibrium steady states have not yet been established.

Outlook
This article has illustrated some aspects of the rich phenomenology of large deviations of time-averaged quantities. The focus has been on the behaviour in large systems, with many interacting degrees of freedom. In particular, on taking the system size $N \to \infty$, rate functions can develop singular behaviour. These singularities, which can be interpreted as dynamical phase transitions, happen when the mechanism for large fluctuations differs qualitatively from the typical behaviour. The main examples that we have considered are (i) the appearance of ferromagnetic order in a 1d Ising model [30]; (ii) the existence of an inactive state in the FA model [17,69,96]; (iii) collective motion and arrested phases in an active matter system [72]; (iv) phase separation and hyperuniformity in the SSEP [73,122].
We emphasise that these examples are illustrative and we have not attempted a comprehensive review. Among the things that have not been discussed are the recent development of large deviations at level-2.5 [127,128], which can be interpreted as a more detailed fluctuation theory from which the main results of Section 2.1 can be derived by the contraction principle, see [47]. This theory also allows derivation of thermodynamic uncertainty principles, which are general bounds on the fluctuations of currents, including variances and large deviations [33,129]. In a similar vein, there are some indications that large deviation principles are built on an underlying geometrical structure [63,130,131], which has consequences for optimally-controlled processes.
Looking forward, we mention a few directions of ongoing research. This review has concentrated on theoretical results and their implications for qualitative behaviour (such as how rate functions scale with system size $N$). However, numerical results have also contributed strongly to large-deviation research. Building on earlier studies [18,81,132,133], recent years have seen renewed interest in efficient and accurate computation of rate functions and SCGFs [68-71,134-137].
The theoretical ideas presented here are also being adapted to new settings. For example, large deviations of time-averaged quantities are increasingly discussed in open quantum systems [28], mostly using operator approaches applied to density matrices. Generalisations of the level-2.5 and optimal control approaches are also being explored in that context [138]. Another direction of interest is non-Markovian models [51,53,139,140], which can be even richer than the Markovian cases considered here [80,141]. Overall, the field has many interesting open questions, and new methods are becoming available to address them. This makes us optimistic about future progress.
My understanding of large deviation theory and its applications has been shaped by many discussions and collaborations, and I am grateful to many people for their encouragement, advice, and patient explanations. Special thanks go to Juan P. Garrahan

Publisher's Note: The EPJ Publishers remain neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Appendix A: Optimal control calculation

A.1 Equivalence of optimal-control problem and eigenproblems for a stochastic differential equation

For large deviations of $b_\tau$ in the model of (30), we show that using the controlled model (32) with (26) and maximising over $\phi$ is equivalent to solving the eigenproblem (31). Using the theory of path integrals with Stratonovich convention, we write the path probability$^{23}$ for (30) as in (A.1), where $\pi$ is the probability of the initial condition. A similar expression holds for the controlled process (32): we assume for simplicity that this process has the same initial distribution $\pi$ as the original process, although this assumption is easily relaxed. In order to apply (26) we compute $\log[P^{\rm con}_\tau(C)/P_\tau(C)]$, which is given in (A.2), where the symbol $\circ$ indicates a Stratonovich product. This gives an explicit expression for $g_\tau$ in (28). The integral in the first line of (A.2) can be evaluated explicitly. To apply (26) we require the average of (A.2), as $\tau \to \infty$. Ergodicity of the controlled process allows us to replace averages of time integrals by averages with respect to the steady-state distribution, which we denote by $\mu$. So (26) becomes (A.3). It is not necessary to compute $\mu$ explicitly: one uses instead the Fokker-Planck equation for the controlled process to show that it solves (A.4), and one also has $\int \mu(C)\,dC = 1$. These are two constraints that can be implemented by Lagrange multipliers: we are left to find an extremum of the functional (A.5), where the functional Lagrange multiplier $\lambda$ enforces (A.4) while $\gamma$ enforces normalisation of $\mu$.

$^{23}$ The integral $\int_0^\tau |\partial_t C_t|^2\,dt$ that appears in (A.1) is not mathematically well-defined for processes like (30). This may be resolved by defining $P_\tau(C)$ as a density with respect to the path-measure for a Brownian motion (see for example [105]), or by considering $P(C)$ to be the probability density of an explicitly time-discretised trajectory. In either case one arrives at the same result in (A.2), which is well-defined and unambiguous.
A short calculation shows that the extremum occurs for $\lambda = 0$, and is characterised by (A.6). This is an example of a Hamilton-Jacobi equation (or a Hamilton-Jacobi-Bellman equation). Using it with (A.3) shows that solutions of the variational problem have $\gamma \le \psi(s)$. Moreover, writing $\phi = -2\log F$ yields (A.7), where the second line follows from the expression for $W_s$ given in (31). This is an eigenfunction equation for the operator $W_s^\dagger$, and $\gamma$ is the associated eigenvalue. The optimal bound on $\psi$ is obtained by taking the largest available solution for $\gamma$, which is therefore the largest eigenvalue of $W_s^\dagger$; this is equal to $\psi(s)$, by (31). It follows that (A.3) is an equality if one takes the (optimal) control potential $\phi = -2\log F$ where $F$ is the relevant eigenfunction of $W_s^\dagger$.
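The change of variables $\phi = -2\log F$ can be sketched explicitly. The block below assumes, consistently with the Ornstein-Uhlenbeck example of Appendix A.2 but not quoted from (30)-(32), that the uncontrolled model is $dC_t = v(C_t)\,dt + \sqrt{2}\,dW_t$, that the control adds a drift $-\nabla\phi$, and that the tilted generator is $W_s P = -\nabla\cdot(vP) + \nabla^2 P - s\,b\,P$.

```latex
% Adjoint of the assumed tilted generator:
W_s^\dagger F = v\cdot\nabla F + \nabla^2 F - s\,b\,F .

% Substituting F = e^{-\phi/2}:
\nabla F = -\tfrac{1}{2}(\nabla\phi)\,F ,
\qquad
\nabla^2 F = \bigl[\tfrac{1}{4}|\nabla\phi|^2 - \tfrac{1}{2}\nabla^2\phi\bigr] F ,

% so the eigenproblem W_s^\dagger F = \gamma F is equivalent to
\gamma = \tfrac{1}{4}|\nabla\phi|^2 - \tfrac{1}{2}\nabla^2\phi
         - \tfrac{1}{2}\,v\cdot\nabla\phi - s\,b .
```

As a check under the same assumptions, taking $v = -\omega x$, $b = x^2$ and $\phi = -\psi x^2$ gives $\gamma = \psi + (\psi^2 - \omega\psi - s)x^2$, which is constant in $x$ only if $\psi^2 - \omega\psi - s = 0$; the root $\psi = \omega/2 - \sqrt{s + \omega^2/4}$ corresponds to a normalisable eigenfunction.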

A.2 Example: large deviations of squared displacement in an Ornstein-Uhlenbeck process
To illustrate this general discussion, we analyse the specific case of a one-dimensional Ornstein-Uhlenbeck process, which is

$$dx_t = -\omega x_t\,dt + \sqrt{2}\,dW_t , \qquad (A.8)$$

where $x_t$ is a real number. The force $-\omega x$ derives from the potential $\frac{1}{2}\omega x^2$. Large deviations for this process have been discussed previously in several contexts, for example [54,142,143]. We consider large deviations of $b_\tau = \tau^{-1}\int_0^\tau x_t^2\,dt$, so $b(x_t) = x_t^2$. The tilted generator of (31) is $W_s$, which acts on probability density functions $P$ as

$$W_s P = \nabla\cdot(\omega x P + \nabla P) - s x^2 P . \qquad (A.9)$$

Its largest eigenvalue is the SCGF

$$\psi(s) = \frac{\omega}{2} - \sqrt{s + \frac{\omega^2}{4}} . \qquad (A.10)$$
This result is valid for $s > -\omega^2/4$; otherwise the spectrum of $W_s$ is not bounded from above, which is linked to the behaviour of the rate function, as we discuss below. The associated eigenfunction is

$$P_{\rm end}(x|s) = \sqrt{\frac{\omega - \psi(s)}{2\pi}}\, \exp\left[-\frac{x^2(\omega - \psi(s))}{2}\right] . \qquad (A.11)$$

Note that $\psi(s) \le \omega/2$ so this is a normalised probability density function; for $\psi(s) = 0$ it reduces to the steady-state distribution of (A.8). Using (A.10) with (8) gives the rate function

$$I(b) = \frac{(1 - \omega b)^2}{4b} , \qquad (A.12)$$

for $b > 0$. The corresponding eigenfunction of $W_s^\dagger$ is

$$F(x) \propto \exp\left[\psi(s)x^2/2\right] . \qquad (A.13)$$

From this we infer (using $\phi = -2\log F$) that the optimal control potential is $\phi = -\psi(s)x^2$. For $s > 0$, $\psi(s) < 0$ and the control potential results in an additional confining force that reduces the typical value of $x_t^2$. To verify that these results are consistent with the optimal-control theory, recall that the controlled process in this case is

$$dx_t = -(\omega x_t + \nabla\phi(x_t))\,dt + \sqrt{2}\,dW_t . \qquad (A.14)$$

Identifying $v = -\omega x$, (A.6) becomes (A.15). This equation can be solved by taking $\gamma = \psi(s)$ [as given in (A.10)] and $\phi = -\psi(s)x^2$. This is indeed the optimal control potential $\phi$ that maximises the bound in (A.3), which then becomes an equality.
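The relation between the SCGF and the rate function for this example can be checked numerically. The sketch below assumes that the SCGF is $\psi(s) = \omega/2 - \sqrt{s + \omega^2/4}$ and that (8) is the Legendre transform $I(b) = \sup_s[-sb - \psi(s)]$; this sign convention is an assumption, chosen so that $s > 0$ favours small $b$. It compares a brute-force supremum over a grid of $s$ with the closed form $(1-\omega b)^2/(4b)$. The function names are illustrative.

```python
import math


def scgf(s, omega):
    """psi(s) = omega/2 - sqrt(s + omega^2/4), for s >= -omega^2/4."""
    return 0.5 * omega - math.sqrt(s + 0.25 * omega ** 2)


def rate_function_closed_form(b, omega):
    """Closed-form Legendre transform of scgf, valid for b > 0."""
    return (1.0 - omega * b) ** 2 / (4.0 * b)


def rate_function_legendre(b, omega, n_grid=200001, s_max=50.0):
    """Brute-force sup over s of [-s*b - psi(s)] on a uniform grid.

    The grid starts at the edge of the domain, s = -omega^2/4, and the
    upper cutoff s_max is an arbitrary choice that must exceed the
    maximiser 1/(4 b^2) - omega^2/4.
    """
    s_min = -0.25 * omega ** 2
    best = -math.inf
    for k in range(n_grid):
        s = s_min + (s_max - s_min) * k / (n_grid - 1)
        best = max(best, -s * b - scgf(s, omega))
    return best
```

For example, with $\omega = 1$ and $b = 0.5$ both routes give $I = 0.125$, and $I$ vanishes at the typical value $b = 1/\omega$.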
To understand this result in a more intuitive way, it is useful to restrict to a controlled process with a quadratic control potential $\phi(x) = \frac{1}{2}ux^2$, where $u$ is a variational parameter. Then (A.2) becomes (A.16). The controlled process describes motion in a potential $\frac{1}{2}(\omega + u)x^2$, so $\langle x_t^2 \rangle_{\rm con} = 1/(\omega + u)$ and one obtains by (24)

$$\lim_{\tau\to\infty} \frac{1}{\tau} D(P^{\rm con}_\tau \| P_\tau) = \frac{u^2}{4(\omega + u)} . \qquad (A.17)$$
This quantity measures how different the controlled process is from the original OU process. For $u = 0$ the two processes coincide and the KL divergence is zero. Moreover, this controlled process has $\langle b \rangle_{\rm con} = 1/(\omega + u)$ so it can be used with (27), as long as one takes $b = \langle b \rangle_{\rm con}$, that is $u = b^{-1} - \omega$. The result is a bound on the rate function

$$I(b) \le \frac{(1 - \omega b)^2}{4b} . \qquad (A.18)$$

Comparison with (A.12) shows that this variational result holds as an equality. This occurs because the ansatz of a quadratic control potential is sufficient to capture the optimal control. It follows that the optimally-controlled dynamics for fluctuations with $b_\tau = b$ is simply

$$dx_t = -(x_t/b)\,dt + \sqrt{2}\,dW_t . \qquad (A.19)$$

The SCGF can also be obtained by using (A.17) with (26), or equivalently by using (A.18) with (6). The conclusion of this analysis for the OU process (A.8) is that large deviations of the time-average of $x_t^2$ occur by trajectories that are representative of a similar (controlled) OU process, in which only the parameter $\omega$ is modified. This parameter governs the size of the restoring force in (A.8) and hence the typical value of $b_\tau$.
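The controlled process with a quadratic control potential is easy to simulate, which gives a direct check of its typical value of $b_\tau$ and of the cost of the control. The sketch below uses an Euler-Maruyama discretisation (an assumption of this example, with a small bias of relative order $(\omega + u)\,dt$) of $dx_t = -(\omega + u)x_t\,dt + \sqrt{2}\,dW_t$, and estimates the KL divergence rate as $(u^2/4)$ times the time-average of $x_t^2$, which follows from the drift difference $-ux$ between the controlled and original processes.

```python
import math
import random


def simulate_controlled_ou(omega, u, tau, dt=0.01, seed=0):
    """Return (b_avg, kl_rate) for the OU process with stiffness omega + u.

    b_avg is the time average of x^2 over [0, tau]; kl_rate is the
    estimate (u^2 / 4) * b_avg of the KL divergence per unit time
    between the controlled process and the original one (stiffness omega).
    """
    rng = random.Random(seed)
    k = omega + u                         # stiffness of the controlled process
    x = rng.gauss(0.0, 1.0 / math.sqrt(k))  # start in the steady state
    n_steps = int(round(tau / dt))
    noise = math.sqrt(2.0 * dt)
    acc = 0.0
    for _ in range(n_steps):
        acc += x * x * dt
        x += -k * x * dt + noise * rng.gauss(0.0, 1.0)
    b_avg = acc / (n_steps * dt)
    return b_avg, 0.25 * u * u * b_avg
```

With $\omega = u = 1$ the estimates should be close to $1/(\omega+u) = 0.5$ and $u^2/[4(\omega+u)] = 0.125$ for long trajectories.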

Appendix B: Glauber-Ising model
This section summarises some results for large deviations of the time-integrated energy in the one-dimensional Glauber-Ising model. This situation was analysed in [30]. Here we summarise the results and we also correct two small errors in that analysis. Details of the corrected analysis are given in [144].
The Glauber-Ising chain has $N$ spins $\sigma_i = \pm 1$. The energy is $E = -\frac{1}{2}\sum_i \sigma_i\sigma_{i+1}$ with periodic boundaries. Spin $i$ flips with rate $1/(1 + e^{\beta h_i \sigma_i})$ where $\beta$ is the inverse temperature, and $h_i = (\sigma_{i-1} + \sigma_{i+1})$ is the local field on site $i$, such that $h_i\sigma_i$ is the change in energy on flipping spin $i$. The model can be analysed by writing $\tau_i = \sigma_i\sigma_{i+1}$ so that $\tau_i = -1$ if there is a domain wall between spin $i$ and spin $i+1$, and $\tau_i = +1$ otherwise. Then the energy is $-\sum_i \tau_i/2$. The domain walls can be interpreted as particles that evolve according to a reaction-diffusion dynamics. Since the system has periodic boundaries, it is important to note that the number of these domain walls is always even.
In the domain-wall representation, the operator $W_s$ can be represented in terms of Pauli matrices. The dependence of the model on temperature is incorporated through a parameter $\lambda = 2/(1 + e^{2\beta}) \le 1$. As stated in [73], the operator $W_s$ can be diagonalised using a Jordan-Wigner transformation. Some details of a similar computation are given in [145], where it is emphasised that the Jordan-Wigner transformation requires some care with the periodic boundary conditions. Using the fact that the number of domain walls is always even, the relevant operator can be diagonalised in a Fourier basis as in (B.1), where the sum runs over wavevectors $q$ in the first Brillouin zone (see below), $\beta_q$ and $\beta_q^\dagger$ are fermionic creation and annihilation operators, and

$$\Omega_q = \sqrt{(1 - \cos q + s - \lambda)^2 + \lambda(2 - \lambda)\sin^2 q} . \qquad (B.2)$$

Compared with [30], we have corrected a factor of two in (B.1). In addition, careful treatment of the interplay between periodic boundaries and the Jordan-Wigner transformation requires that the wavevectors $q$ in (B.1) are $q = (2m + 1)\pi/N$ with $m = 0, 1, 2, \dots, N - 1$, so that $qN/\pi$ is an odd integer [145], contrary to [30]. Deriving this result uses explicitly that the number of domain walls is even. The largest eigenvalue of $W_s$ is then given by (B.3). The corresponding eigenvector $|0\rangle$ obeys $\beta_q^\dagger|0\rangle = 0$ for all $q$. For $s = 0$ it can be checked from (B.2) that $\Omega_q = [1 - (1 - \lambda)\cos q]$ so $\psi(0) = 0$, as required.
The function $\psi_N(s)$ is analytic and convex in $s$, as it must be because the model is in the class of Section 2.2. Taking the large-$N$ limit, the free energy per site of (42) is given by (B.4). This function has a singularity in its second derivative at the phase transition point $s = \lambda$, at which $\Omega_q \to 0$ for small $q$. This is a dynamical phase transition, and the system is ferromagnetic for $s > \lambda$.
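The closing of the gap in $\Omega_q$ at $s = \lambda$ can be checked numerically. The sketch below assumes that $\Omega_q$ is the square root of the expression in (B.2) and that the allowed wavevectors are $q = (2m+1)\pi/N$ with $m = 0, \dots, N-1$; the function names are illustrative. Evaluating the expression directly at $s = 0$ gives $1 - (1-\lambda)\cos q$, which is bounded below by $\lambda$, whereas at $s = \lambda$ the smallest value of $\Omega_q$ decreases as $1/N$.

```python
import math


def omega_q(q, s, lam):
    """Dispersion Omega_q, taken as the square root of the form in (B.2)."""
    return math.sqrt((1.0 - math.cos(q) + s - lam) ** 2
                     + lam * (2.0 - lam) * math.sin(q) ** 2)


def allowed_wavevectors(n_spins):
    """Wavevectors q = (2m + 1) pi / N, so that q N / pi is an odd integer."""
    return [(2 * m + 1) * math.pi / n_spins for m in range(n_spins)]


def min_gap(n_spins, s, lam):
    """Smallest Omega_q over the allowed wavevectors for N = n_spins."""
    return min(omega_q(q, s, lam) for q in allowed_wavevectors(n_spins))
```

For $\lambda = 0.5$, `min_gap(N, 0.5, 0.5)` behaves as $\sqrt{\lambda(2-\lambda)}\,\pi/N$ for large $N$, while `min_gap(N, 0.0, 0.5)` stays above $0.5$.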