The Semi-infinite Asymmetric Exclusion Process: Large Deviations via Matrix Products

We study the totally asymmetric exclusion process on the positive integers with a single particle source at the origin. Liggett (Trans. Am. Math. Soc. 213, 237–261, 1975) has shown that the long term behaviour of this process has a phase transition: If the particle production rate at the source and the original density are below a critical value, the stationary measure is a product measure, otherwise the stationary measure is spatially correlated. Following the approach of Derrida et al. (J. Phys. A 26(7), 1493, 1993) it was shown by Grosskinsky (2004) that these correlations can be described by means of a matrix product representation. In this paper we derive a large deviation principle with explicit rate function for the particle density in a macroscopic box based on this representation. The novel and rigorous technique we develop for this problem combines spectral theoretical and combinatorial ideas and is potentially applicable to other models described by matrix products.


Introduction
Many natural systems are not in thermodynamic equilibrium, which loosely speaking means that there is a permanent exchange of energy or matter of the system with its surroundings or within the system itself. In statistical physics, the asymmetric exclusion process is often considered the paradigm of such a system out of equilibrium. In the absence of a general theory for systems out of equilibrium, it has been argued that large deviation rate functions play an important role as a replacement for the thermodynamical potential [4]. The principal aim of the present paper is to develop a rigorous mathematical technique to derive such rate functions from a particular type of representation of the stationary state of the system, the matrix products, which twenty years after the pioneering work of Derrida et al. [11] is available for a wide range of particle systems out of equilibrium, see for example Blythe and Evans [6] for a survey.
We present our method in the case of the totally asymmetric exclusion process (TASEP) on the positive integers with a single particle source at the origin, a case which has apparently not been treated in the literature so far. In this Markovian model, particles are positioned on the sites of the semi-infinite lattice N = {1, 2, . . . } in such a way that no site carries more than one particle. The dynamics of the model can be informally described as follows: A particle source carries a Poisson clock with intensity α > 0. If this clock rings, the source attempts to inject a particle at site one. If this site is vacant the injection takes place, otherwise it is suppressed and nothing happens. Also, every particle in the system carries an independent Poisson clock with rate one, and when the clock rings the particle tries to jump to the neighbouring site on its right. If this site is vacant the jump takes place, otherwise it is suppressed. Note that the exclusion interaction originating from the suppression of jumps and injections ensures that no site ever carries more than one particle.
The exclusion interaction in this model has a profound effect on the behaviour of the system. Most notably the detailed balance equations for this Markov chain have no nontrivial solution. Hence the system is not reversible, in other words it is out of equilibrium. The long-term behaviour of the process shows local convergence to a stationary measure which depends on the initial configuration of the system. Assuming that initially particles are iid Bernoulli with density ρ, this stationary measure has an interesting phase transition described by Liggett [16]. If the injection rate α satisfies α ≤ 1 2 and ρ ≤ 1 − α the system does not feel the interaction and the stationary measure is the product measure with density α. If however α > 1 2 or ρ > 1 − α, the exclusion of particles leads to spatial correlations in the stationary measure, which is no longer a product measure. In this case, the overall particle density at stationarity is the maximum of 1/2 and the initial density ρ, independently of the injection rate α.
There have been considerable efforts to describe the long range correlations of the stationary measures and the microscopic transition kernels in the exclusion process explicitly. For instance, Sasamoto and Williams [19] and Tracy and Widom [21] derive explicit formulas from combinatorial identities, and Sasamoto [18] uses an ansatz based on orthogonal polynomials. A particularly successful approach to describe spatial correlations is the matrix product ansatz first suggested in 1993 by Derrida, Evans, Hakim and Pasquier [11] and refined and extended in a large number of papers, see [10,13,15] for a few further examples. Large deviation principles have been derived for the hydrodynamical limits of a range of boundary driven exclusion processes by Bertini and coauthors [3,5] and the method should be extendable to our case. In principle, large deviation principles for the particle density in a macroscopic box then follow from these results by contraction, see [7]. However, the optimisation in path space, which is required to get an explicit rate function, is often unwieldy and technical as Bahadoran's paper [2] readily testifies.
In the light of these difficulties it is a natural idea to try and derive large deviation principles directly from the matrix product ansatz. This plan was carried out by Derrida et al. [12] in the case of an asymmetric exclusion process on a finite interval of sites. Key to their method is a saddle point argument, which allows to derive an additivity formula which compares the stationary measure on the interval with stationary measures on complementary subintervals. From this formula an explicit rate function for the particle density is derived. The paper [12] was a spectacular success, but we have not been able to implement this method in the case of a semi-infinite lattice. In a different development, Angeletti et al. [1] show that already for matrix product representations with finite matrices the large deviation principles that arise from this exhibit a rich phenomenology. Finite matrix representations have the advantage that they can be studied using the Perron-Frobenius theory, which is unavailable for infinite matrices. Physical examples, however, are almost always based on representations by infinite matrices.
In this paper we present a rigorous and novel approach to calculate large deviations for the macroscopic particle density in the semi-infinite totally asymmetric exclusion process. We use the matrix product representation as a starting point, and base the analysis on the Gärtner-Ellis theorem. To study the asymptotics of the cumulant generating function of the particle density, we use quite different approaches for the lower and upper bounds. The lower bound is based on the spectral theory of Toeplitz operators in a suitable weighted sequence space, while the upper bound exploits combinatorial identities coming directly from the matrix product ansatz. As our method is not too technical, we believe that it is very promising to deal with a wide range of other particle systems whose stationary measure can be described by a matrix product representation.
The paper is organised as follows. In Section 2 we give a rigorous definition of the model, state background results and formulate and interpret our main result. Section 3 discusses the matrix product representation in this case and describes our approach to the large deviation problem. The proof of the upper bound for the cumulant generating function is carried out in Section 4, while the lower bound is derived in Section 5. The proof is completed in Section 6, in which we also provide some further comments on our technique.

Background
To give a formal definition of the model, we first define the auxiliary switching and swapping functions σ x , σ x,y : A semi-infinite totally asymmetric exclusion process (TASEP) with injection rate α ∈ (0, 1) is a Markov process {ξ(t)} t≥0 in continuous time with state space {0, 1} N and semigroup S(t) identified by its infinitesimal generator G defined by where f : {0, 1} N → R is a function that depends only on a finite number of sites. Denote by ν α the product measure with constant density α, that is ν α {η ∈ {0, 1} N : η j 1 = 1, η j 2 = 1, . . . , η j n = 1} = α n for all distinct choices of j 1 , j 2 , . . . , j n ∈ N and all n ∈ N. We say a measure μ ρ on {0, 1} N is asymptotically product with density ρ if lim k→∞ μ ρ {η ∈ {0, 1} N : η j 1 +k = 1, η j 2 +k = 1, . . . , η j n +k = 1} = ρ n for all distinct choices of j 1 , j 2 , . . . , j n ∈ N and all n ∈ N. It is known that the semi-infinite TASEP is not an ergodic process and there is no uniqueness of the stationary measure as proved in Theorem 1.8 of [16].
Observe the notation we adopt throughout this paper: The initial particle density for the TASEP is denoted ρ, whereas is the stationary particle density under μ α . Convergence in Theorem 2.1 is not uniform and slows down the further sites are from the origin. The initial density ρ determines the stationary measure the process will converge to. In particular, starting from all sites empty, if the injection rate satisfies α ≤ 1 2 , the distribution of the process converges to the product measure with constant density α. If α > 1 2 , the distribution of the process converges to μ α 1 2 , which has spatial correlations and an asymptotic density equal to 1 2 . Observe that the injection mechanism is not able to produce a stationary particle density larger than 1 2 unless there is initially a high density of particles in the system. We will explore matrix representations of the measures μ α in Section 3.

Main Result
Our problem at hand is to find the rate function for a large deviation principle of the empirical density on the first n sites under the stationary measure, as n goes to infinity. Recall from Theorem 2.1 that the answer to this problem depends in a subtle way on the spatial correlations occurring in the case α > 1 2 . The theory of large deviations analyses the exponential decay of probabilities of increasingly unlikely events. Formally, a sequence of random variables {Z n } n∈N taking values in The main result of this paper is the following large deviation principle. (a) If α ≤ 1 2 and ρ < 1 − α, then (c) If α > 1 2 and 1 2 < ρ < α, then Recall that part (a) is well-known and included for completeness. It implies the weak law of large numbers, saying that the empirical density converges in probability to α if α ≤ 1 2 , see Fig. 1. The unique zero of the rate function moves from 0 to 1 2 with the value of α, at the same time it is getting easier to achieve any given density larger than 1 2 . Part (b) shows that the empirical density converges to 1 2 if α > 1 2 and ρ < 1 2 . Looking at Fig. 1 we see that the rate function is still convex and its zero stays fixed at 1 2 . Now reaching high densities has always the same cost regardless of the value of α, but low densities become increasingly expensive as the value of α increases. Note that the rate function is non-analytic at the value z = 1 − α, which reveals a dynamical phase transition in the sense of [17]. For part (c), the minimum of the rate function is now at ρ. Low densities still become increasingly expensive as α increases; yet, high densities now become cheaper, see the left diagram of Fig. 2. In this case, we observe that both α and ρ play a role in the rate function, see the right diagram of Fig. 2 to appreciate the joint effect. The phase transitions are seen at z = 1 − α, as in the previous case, and z = 1 − ρ. As ρ → α we recover the rate function of Bernoulli product measures.
In parts (b) and (c), the cost in the first regime, when z ≤ 1 − α, is up to a shift by a negative constant equal to the cost of changing the boundary density. The rates in the third regime, when 1 − z is smaller than the typical density, represent the cost of replacing the typical density in the asymptotic Bernoulli product measure by the desired value, so the cost in this regime is bulk dominated. In the second regime the cost is larger than in the third, indicating that both a bulk and a boundary cost have to be paid.
The regimes when α ≤ 1 2 and ρ ≥ 1 − α, or when α > 1 2 and ρ ≥ α are not covered by the techniques of this paper and remain open, see Fig. 3. In the former case we have no matrix product representation.  If α > 1 2 and ρ < α it is worth comparing our large deviation result for the semi-infinite TASEP with that for the finite TASEP studied in [12]. The stationary measure on the semiinfinite TASEP can be obtained as a limit of the stationary measures on the finite TASEP with n sites and boundary densities chosen as α on the left boundary, and min{1 − ρ, 1 2 } on the right boundary, see [16,Section 3]. However, the transition rates across bonds in the semi-infinite TASEP are not equal to min{1 − ρ, 1 2 }. It is therefore somewhat surprising that the large deviation rate of the sequence of densities in the finite TASEP with increasing system size obtained in [12, (3.12)] still agrees with the rate we have obtained for the average density over increasing blocks in the infinite TASEP at stationarity.

Matrix Product Ansatz
Grosskinsky [14], following the seminal work of [11], has given a characterisation of the long range dependence in μ α with a matrix product ansatz.
for some c > 0. Then (a) the probability measureν α c defined by is invariant for the generator (1) if and only if 1 2 , and otherwise it is asymptotically product with density given as the solution of c = (1 − ) which satisfies ≥ 1 2 .
Very often, to apply Theorem 3.1, no explicit solution of Eq. 3 is needed. Below we only use the recursive structure of these equations to show that the measuresν α c and μ α agree under certain conditions. Note that this does not follow directly from Theorem 2.1 as this result does not describe the long-term behaviour of the TASEP started inν α c , which is not necessarily a product measure. Proof By part (e) in [16, Theorem 3.10] the measure μ α is uniquely determined by the following two properties, numbered as in [16], (d) If n > 1 and η ∈ {0, 1} n with η 1 = 0, then We show thatν α c satisfies these properties. Under the assumptions of (c) we get from properties (3a) in the second equality and Eq. 3c in the third onē Under the assumptions of (d) we get from conditions (3b) in the second equality and Eq. 3c in the third one, We now explain our approach to find a large deviation principle of the empirical density under this measure. We will approach this via the Gärtner-Ellis theorem, see Theorem V.6 in [9]. Here we state the conditions specific for our case.
To calculate the moment generating function M n (θ ) of Z n we use Theorem 3.1 and Proposition 3.2 in the third equality and condition (3c) in the fifth one, to get If D and E were finite matrices, we could identify this limit using the Perron-Frobenius theorem as the spectral radius of the matrix e θ D + E. However in our example (and in almost all physically interesting examples) the matrices solving Eq. 3 are necessarily infinite. A first idea would be to truncate the matrices to finite size, calculate the spectral radius and taking a limit, but this turns out to lead to a wrong result, as it neglects the important information contained in the vectors v and w.
We will look at lower and upper bounds in Eq. 5 separately. For the upper bound we exploit that matrices D and E, as well as the vectors v and w, solving Eq. 3 are explicitly known. We introduce weighted 2 spaces, denoted 2 s , and interpret the matrix e θ D + E as an operator on these spaces. If the weights are such that v is an element of 2 s , and w T an element of its dual, we can get a bound on Eq. 5 from the spectral radius of the operator, which can be optimised by minimising the bound over all admissible weights. In order to obtain the spectral radius we use a simple isomorphism between weighted and unweighted 2 spaces. Acting on the unweighted spaces, the operator has a Toeplitz structure and from the general theory of Toeplitz operators on 2 an explicit formula for the spectral radius is available. This estimate will be carried out in detail in Section 4.
The technique for the lower bound only relies on the structure of the equations (3). These provide an algorithmic way to reduce arbitrary products of D and E to linear combinations of monomials of the form E p−j D j with 0 ≤ j ≤ p. We expand the product (e θ D + E) n into a polynomial with nonnegative coefficients f n p,j (θ ), and use Eq. 3 to derive a recursion formula for the coefficients. While it seems to be too complicated to fully resolve this recursion, we focus on selected key terms which can be derived explicitly. Note that for a lower bound we can drop all inaccessible terms in the expansion. In our case, to obtain the growth rate it suffices to include the fastest growing terms f n p,p D p and f n p,0 E p corresponding to all summands in the product which can be reduced to monomials of just one variable. It is quite a typical phenomenon that only a small number of boundary summands contribute to the growth of the sum, and that these coefficients can be identified without solving the entire system of equations. This estimate will be carried out explicitly in Section 5.

Weighted l 2 Spaces
To find an upper bound for the cumulant generating function we consider the weighted spaces Proof We can define the inverse T −1 s : 2 s → 2 by (T −1 s x) k = x k s k/2 and hence T s is bijective. We just need to prove it is an isometry, so let x∈ 2 s and calculate Analogously, for x ∈ 2 we have |T s x| 2 s = |x| 2 .

Lemma 4.2
The dual space 2 * s can be identified with 2 s −1 .
Proof Define the dual product ·, · D : 2 s −1 × 2 s → R by y, x D = T −1 s −1 y, T −1 s x , where ·, · is the usual inner product in 2 . We first prove that for each vector y ∈ 2 s −1 there exists a function f y ∈ 2 * s such that f y (x) = y, x D . To this end, let y ∈ 2 s −1 and define f y : 2 s → R by f y (x) = y, x D = k∈N x k y k .
The linearity of f y follows easily from the definition; the Cauchy-Schwarz inequality in 2 shows it is also bounded, Conversely, let f ∈ 2 * s . Define g : 2 → R by g(x) = (f • T s )(x). Since f and T s are both linear, so is g, and since f is bounded, Hence, g ∈ 2 * and by the Riesz Representation theorem there exists a uniqueỹ ∈ 2 such that g( We now need an explicit solution for Eq. 3, so we first define the values Elementary calculations show that the matrices D, E and the vectors v and w defined by , · · · and v = 1 satisfy the matrix product conditions (3). In the boundary case c = 1 4 we have λ 1 = 1 = λ 2 and we take v T = (1, 2, 3, . . .).
To simplify notation, define for fixed θ ∈ R the operator A(θ ) : 2 s → 2 s with the infinite matrix representation and then the k-th component of the vector A(θ )x is

Proposition 4.3 The operator A(θ )
Proof Let x∈ 2 s . Using Cauchy-Schwarz in R 2 and R 3 for each term of A(θ )x gives where C s > 0 is a constant independent of x and hence we see that A(θ ) is a bounded linear operator.

Lemma 4.4
Let L ∈ L( 2 s ), that is a bounded linear operator from 2 s to itself. The operator Proof Take x ∈ 2 . Then by Lemma 4.1, The tilde operator commutes with exponentiation. Proof We proceed by induction over n. For n = 1, the proposition is a tautology. We assume the proposition true for n, let x ∈ 2 and calculatẽ Recall from Eq. 8 the explicit form of w and v and note that if s ∈ (0, 1) On the other hand, if s > 1 Therefore, for s ∈ ( 1 α − 1) 2 , 1 we have that v∈ 2 s and w ∈ 2 s −1 .

Toeplitz Operators
Before stating the main result of this section, we need to review some properties of Toeplitz operators. Let a = {a k } k∈Z ∈ 2 (C), that is, a double sequence of complex numbers such that k∈Z |a k | 2 < ∞. A Toeplitz operator A defined by the double sequence a ∈ 2 (C) is an infinite matrix with the structure The symbol κ : {z ∈ C : |z| = 1} → C of a Toeplitz operator is defined by We recall Theorem 7.1 in [22] that deals with spectra of Toeplitz operators. For fixed θ ∈ R, the operator A(θ ) defined by Eq. 9 is by Proposition 4.3 in L( 2 s ). By Lemma 4.4 the operatorÃ(θ ) is a Toeplitz operator in 2 with its symbol κ given by Writing ζ = e iϕ as an element of the unit circle, which we recognise as a parametrised ellipse centred at 1 + e θ , with major axis of length √ s + e θ √ s along the real line, and minor axis of length | √ s − e θ √ s |. Therefore, the spectral radius is found when z = 1 and We now state the main result of this section: the upper bound for the cumulant generating function .

Proposition 4.7 For defined by Eq. 5, an upper bound is
Proof Recall from Eq. 10 that v∈ 2 s ; by Proposition 4.3 A(θ ) ∈ L( 2 s ) and therefore A(θ )v∈ 2 s . Also w ∈ 2 s −1 from Eq. 11. Hence, by Eq. 5 and Cauchy-Schwarz, The norm of w does not contribute to the limit since it does not depend on n. By Lemma 4.4, Lemma 4.1, and Lemma 4.5 we can continue the previous estimate once again, the norm of T −1 s v does not contribute to the limit since it does not depend on n. We insert the factor 1 n to the logarithm by continuity and use the definition of spectral radius We now use Eq. 12 to find the spectral radius Since the left hand side does not depend on s, it is a lower bound on the right hand side for s, so we take the infimum over the interval Given θ , the value of s that reaches the infimum of this function is given by Plugging s * into the formula gives the result of the lemma.

Lower Bound: A Combinatorial Approach
In order to find the lower bound we use a completely different approach. We will focus on the powers of e θ D + E.

Proposition 5.1
There exists a sequence of polynomials f n p,j (θ ) in e θ such that and they can be defined recursively in two ways: Starting with (14) and the second characterisation is We now state an auxiliary result.

Lemma 5.2
For n ≥ 2 and 1 ≤ p ≤ r ≤ n − 1, we have the identity changing the order of summation and using Lemma 5.2 gives We now use Stirling's formula and a change of variables ε = r n and δ = p n to obtain The inner problem, when ε is fixed, is solved by choosing δ max = 0, if ε ≤ 1 1+λ 1 , and This problem is solved by choosing 1+λ 1 e θ if θ > −2 log λ 1 . Plugging this value of ε max yields the result of the proposition.

Proposition 5.7
For the cumulant generating function defined by Eq. 5, Proof We follow the same technique as in the previous proposition, but now consider only those values for which j = 0 in Eq. 13. Hence, using Corollary 5.5 With the same change of variables as before, ε = r n and δ = p n , we have Splitting the problem in two, for a fixed ε, we find the optimal δ as δ max = 0, if 1 − α ≤ ε ≤ 1, and δ max = 1 − ε 1−α , if 0 ≤ ε < 1 − α. And the remaining problem is solved by choosing the optimum ε as With these values of ε we get the desired result.
Proof The bounds from Propositions 5.6 and 5.7 are the same in the interval 2 log 1 α − 1 ≤ θ ≤ −2 log λ 1 . In the other intervals, a comparison of the bounds establishes the claim.

The Rate Function
Summarising, we have the following result. Corollary 6.1 For the cumulant generating function defined by Eq. 5, Proof This follows from the fact that the upper and lower bounds from Proposition 4.7 and Corollary 5.8, respectively, are the same.
Finally we have the necessary tools to prove Theorem 2.2.
An alternative approach to the problems studied here is based on translation into a random walk problem. Rewriting the infinite matrix e θ D + E as 2(1 + e θ )Q θ , one can see that Q θ can be interpreted as the probability transition matrix of a discrete time random walk {X n } n≥0 on N 0 killed at the origin. By the form of the vector w T , shown in Eq. 8, the product w T Q n θ is, up to a normalising constant, the sub-probability distribution of the n-th step of the random walk starting from a geometric distribution with parameter 2 − 1 α . If T denotes the killing time of the random walk, then Eq. 23 becomes n log E (λ X n 1 − λ X n 2 )1 {T >n} , using the form of v given in Eq. 8. The latter rate may be evaluated using a functional large deviation principle for the random walk. This alternative method hinges on the availability of the large deviation result and the resolvability of the variational problems that come out of its application, which may be more complicated than our original approach. In principle, however, the method seems suited to not only reveal large deviations for the asymptotic density, but also for a density profile depending on a macroscopic space variable, as done by Derrida et al. in [12] in the case of the finite TASEP. By contrast, the technique presented in this paper is more elementary and direct. It therefore seems to be more flexible, giving hope to produce large deviation principles for a range of other systems with spatially correlated distributions given by a matrix product representation. In particular there are several natural variants even of the example of a semi-infinite TASEP considered in this paper: For example, we would be interested in identifying a large deviation principle for the semi-infinite TASEP process starting from a configuration given by a value of ρ in Theorem 2.1 with ρ > α > 1 2 , which was so far excluded for technical reasons. It also seems feasible to generalise the large deviation principle for the overall density to a large deviation principle for a density profile. Finally, it would be interesting to analyse the non-stationary TASEP using a time dependent matrix representation, as given by Stinchcombe and Schütz in [20].
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.