Quadratic and rate-independent limits for a large-deviations functional

We construct a stochastic model showing the relationship between noise, gradient flows and rate-independent systems. The model consists of a one-dimensional birth-death process on a lattice, with rates derived from Kramers' law as an approximation of a Brownian motion on a wiggly energy landscape. Taking various limits we show how to obtain a whole family of generalized gradient flows, ranging from quadratic to rate-independent ones, connected via '$L \log L$' gradient flows. This is achieved via Mosco-convergence of the renormalized large-deviations rate functional of the stochastic process.

1. Introduction
1.1. Variational evolution. Two of the most studied types of variational evolution, 'gradient-flow evolution' and 'rate-independent evolution', differ in quite a few aspects. Although both are driven by the variation in space and time of an energy, gradient flows are in fact driven by energy gradients, while in practice rate-independent systems are driven by changes in the external loading (represented by the time variation of the energy). As a result, gradient-flow systems have an intrinsic time scale, while rate-independent systems (as the name signals) do not, and the mathematical definitions of solutions of the two are rather different [Mie05,AGS08].
Despite this they share a common structure. Both can be written, at least formally, as
$0 \in \partial\psi(\dot x(t)) + D_x E(x(t), t)$. (1)
Here E is the energy that drives the system, and the convex function ψ is a dissipation potential, with subdifferential ∂ψ. For gradient flows, ψ typically is quadratic, and ∂ψ single-valued and linear; for rate-independent systems, ψ is 1-homogeneous, and ∂ψ is a degenerate monotone graph. Rate-independent systems have some unusual properties. Solutions are expected to be discontinuous, and therefore the concept of smooth solutions is meaningless. Currently two rigorous definitions of weak solutions are used, which we refer to as 'energetic solutions' [MM05] and 'BV solutions' [MRS12a]. Heuristically, the first corresponds to the principle 'jump whenever it lowers the energy', while the second can be characterized as 'don't jump until you become unstable'. For time-dependent convex energies the two definitions coincide, but in the non-convex case they need not.
Various rigorous justifications of rate-independent evolutions have been constructed, which underpin the rate-independent nature by obtaining it through upscaling from a 'microscopic' underlying system (e.g. [ACJ96,Cag08,DMDM06,DMDMM08,Fia09,Mie12,PT05,Sul09]). The general approach in these results is to choose a microscopic model with a component of gradient-flow type (quadratic dissipation, deemed more 'natural'), and then take a limit which induces the vanishing of the quadratic behaviour and the appearance of the rate-independent behaviour.
While these results give a convincing explanation of the rate-independent nature, they all are based on deterministic microscopic models. Other arguments suggest that the rate-independence may arise through the interplay between thermal noise and a rough energy landscape. A well-studied example of this is the non-trivial temperature-dependence of the yield stress in metals, which shows that the process is thermally driven (e.g. [Bas59]), together with many classical non-rigorous derivations of rate-independent behaviour [Bec25,Oro40,KE75].
Recently, stochasticity has also been shown to play a role in understanding the origin of various gradient-flow systems, such as those with Wasserstein-type metrics [ADPZ11,DLZ12,ADPZ13,Ren13,MPR13]. In this paper we ask the question whether these different roles of noise can be related: What is the relationship between noise, gradient flows, and rate-independent systems? We will provide a partial answer to this question by studying a simple stochastic model below. By taking various limits in this model, we obtain a full continuum of behaviours, among which rate-independence and quadratic gradient flow can be considered extreme cases. In this sense both rate-independent and quadratic dissipation arise naturally from the same stochastic model in different limits.
1.2. The model. The model of this paper is a continuous-time Markov jump process $t \mapsto X^n_t$ on a one-dimensional lattice, sketched in Figure 1. Denoting by 1/n the lattice spacing, we will be interested in the continuum limit as n → ∞.
The evolution of the process can be described as follows. Assume that a smooth function $(x, t) \mapsto E(x, t)$ is given and fix the origin as initial point. If the process is at the position x at time t, then it jumps in continuous time to its neighbours x − 1/n and x + 1/n with rates $n r^-$ and $n r^+$ respectively, where $r^\pm(x) = \alpha \exp(\mp\beta\nabla E(x, t))$ (throughout we use ∇E(x, t) for the derivative with respect to x).
Figure 1. The one-dimensional lattice $\frac{1}{n}\mathbb{Z}$ with spacing 1/n. The jump rates $r^+$ and $r^-$ depend on two parameters α and β and on the derivative of the function E.
The choice of this stochastic process is inspired by the noisy evolution of a particle in a wiggly energy landscape. An example could be that of a Brownian particle in an energy landscape of the form $E_n(x, t) = E(x, t) + n^{-1} e(nx)$, where E is the smooth energy introduced above, and e is a fixed periodic function. If the noise is small with respect to the variation of e (max e − min e), then this Brownian particle will spend most of its time near the wells of $E_n$, which are close to the wells of $e(n\,\cdot)$. Kramers' formula [Kra40,Ber11] provides an estimate of the rate at which the particle jumps from one well to the next; in Section 1.6 below we show how some approximations lead to the jump rates $r^\pm$ above.
The jump process of Figure 1 has a bias in the direction −∇E of magnitude
$r^+ - r^- = \alpha\bigl(e^{-\beta\nabla E} - e^{\beta\nabla E}\bigr) = -2\alpha\sinh(\beta\nabla E)$, (2)
and we will see this expression return as a drift term in the limit problem. The parameter α characterizes the rate of jumps, and thus fixes the global time scale of the process; the parameter β should be thought of as the inverse of temperature, and characterizes the size of the noise.
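To make the description of the process concrete, the following is a minimal simulation sketch of $X^n$ by the standard Gillespie (exponential waiting time) method. It is an illustration under stated assumptions, not part of the analysis: the function name `simulate_jump_process`, the test energy E(x) = (x − 1)²/2, the starting point and the parameter values are all hypothetical choices, and the rates are frozen between jumps, which is exact only for a time-independent energy.

```python
import math
import random


def simulate_jump_process(grad_E, n, alpha, beta, T, x0=0.0, seed=0):
    """Simulate the lattice jump process on (1/n)Z up to time T.

    grad_E(x, t) is the derivative of the energy with respect to x.
    Jumps to the right occur with rate n*alpha*exp(-beta*grad_E) and
    jumps to the left with rate n*alpha*exp(+beta*grad_E).
    Rates are held fixed between jumps (an approximation if E depends on t).
    """
    rng = random.Random(seed)
    k = round(x0 * n)  # lattice index; the position is k / n
    t = 0.0
    times, positions = [t], [k / n]
    while True:
        g = grad_E(k / n, t)
        rate_right = n * alpha * math.exp(-beta * g)  # n r^+
        rate_left = n * alpha * math.exp(beta * g)    # n r^-
        total = rate_right + rate_left
        t += rng.expovariate(total)  # exponential waiting time
        if t > T:
            break
        # choose the direction proportionally to the two rates
        k += 1 if rng.random() * total < rate_right else -1
        times.append(t)
        positions.append(k / n)
    return times, positions


if __name__ == "__main__":
    # Hypothetical example: E(x) = (x - 1)^2 / 2, so grad E = x - 1.
    times, positions = simulate_jump_process(
        lambda x, t: x - 1.0, n=200, alpha=1.0, beta=1.0, T=5.0
    )
    print("number of jumps:", len(times) - 1)
    print("final position:", positions[-1])
```

For large n the sampled path should stay close to the minimum of the test energy at x = 1 after an initial transient, in line with the bias in the direction −∇E described above.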
1.3. Heuristics. We now give a heuristic view of the dependence of this stochastic process on the parameters n, α, and β, and in doing so we look ahead at the rigorous results that we prove below. First, as n → ∞, the process $X^n$ becomes deterministic, as might be expected, and its limit x satisfies the differential equation suggested by (2):
$\dot x(t) = -2\alpha\sinh\bigl(\beta\nabla E(x(t), t)\bigr)$. (3)
Equation (3) is of the form (1) with $(\partial\psi)^{-1} = 2\alpha\sinh(\beta\,\cdot)$. From the viewpoint of the gradient-flow-versus-rate-independence discussion above, the salient feature of the function $\xi \mapsto 2\alpha\sinh(\beta\xi)$ is that it embodies both quadratic and rate-independent behaviour in one and the same function, in the form of limiting behaviours according to the values of the parameters α, β. This is illustrated by Figure 2, as follows. On one hand, if we construct a limit by zooming in to the origin, corresponding to β → 0, α ∼ ω/β (the left-hand figure), then we find a limit that is linear; on the other hand, if we zoom out, and rescale with β → ∞ and $\alpha \sim e^{-\beta A}$, then the exponential growth causes the limit to be the monotone graph in the right-hand figure. These two limiting cases correspond to a gradient-flow and a rate-independent behaviour respectively.
In formulas, as α → ∞ and β → 0 with αβ → ω for fixed ω > 0, equation (3) converges to
$\dot x(t) = -2\omega\nabla E(x(t), t)$, (4)
which is a gradient flow of E. The limit α → ∞ corresponds to a large rate of jumps in the underlying stochastic process, while β → 0 corresponds to a weak influence of the energy gradient.
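In the other case, as α → 0 and β → ∞ with α ∼ exp(−βA), the rate of jumps is low, but the influence of the energy becomes large. Formally, we find the limiting equation
$\dot x(t) \in \partial\psi^*_{RI}\bigl(-\nabla E(x(t), t)\bigr)$, (5)
where $\psi^*_{RI}$ is the indicator function of the interval [−A, A], so that the right-hand side is {0} for |∇E| < A, a half-line for |∇E| = A, and empty for |∇E| > A. Again formally, in this limit the system can only move while ∇E = ±A; whenever the force |∇E| is less than A, the system is frozen, while values of |∇E| larger than A should never appear. In Section 4 we obtain a rigorous version of this evolution as the limit system.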
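The two scaling regimes can be checked numerically. The sketch below evaluates the function ξ ↦ 2α sinh(βξ) along both scalings; the helper name `mobility` and the values of ω, A and ξ are illustrative assumptions, not quantities fixed by the paper.

```python
import math


def mobility(xi, alpha, beta):
    """Evaluate 2 * alpha * sinh(beta * xi)."""
    return 2.0 * alpha * math.sinh(beta * xi)


omega, A = 1.5, 1.0  # hypothetical values

# Quadratic regime: beta -> 0 with alpha = omega / beta.
# The values approach the linear law 2 * omega * xi.
xi = 0.7
for beta in (1.0, 0.1, 0.01):
    print("beta =", beta, mobility(xi, omega / beta, beta), "vs", 2 * omega * xi)

# Rate-independent regime: beta -> infinity with alpha = exp(-beta * A).
# Below the threshold A the values vanish, above it they blow up.
for beta in (5.0, 20.0, 80.0):
    alpha = math.exp(-beta * A)
    print("beta =", beta,
          "below threshold:", mobility(0.5 * A, alpha, beta),
          "above threshold:", mobility(1.5 * A, alpha, beta))
```

The second loop shows why the limit exists only as a graph: for |ξ| < A the product $e^{-\beta A}\sinh(\beta\xi)$ decays like $e^{-\beta(A-|\xi|)}$, while for |ξ| > A it grows like $e^{\beta(|\xi|-A)}$.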
1.4. Large deviations, gradient flows, and variational formulations. Before we describe the results of this paper, we comment on the methods that we use. We previously introduced the concept of gradient flows and now we introduce that of large deviations (both are defined precisely in Section 2). In the context of stochastic systems, the theory of large deviations provides a characterization of the probability of rare events, as some parameter (in our case n) tends to infinity. In the case of stochastic processes, this leads to a large-deviations rate function J that is defined on a suitable space of curves. It is now known that many gradient flows and large-deviations principles are strongly connected [ADPZ11,DLR13,ADPZ13,MPR13]. In abstract terms, the rate function J of the large-deviations principle simultaneously figures as the defining quantity of the gradient flow, in the sense that
J ≥ 0; t ↦ z(t) is a solution of the gradient flow ⇐⇒ J(z(·)) = 0.
Figure 2. The middle graph shows the function $\xi \mapsto 2\alpha\sinh(\beta\xi)$ for moderate values of α and β. The left graph shows the limit for β → 0, similar to zooming in to the region close to the origin; this limit is linear. The figure on the right shows the limiting behaviour when β → ∞, for a specific scaling of α. This second limit does not exist as a function, but only as a graph (a subset of the plane) defined in (5).
The components of the gradient flow (the energy E and dissipation potential ψ) can be recognized in J . In [MPR13] it was shown how jump processes may generate large-deviations rate functions with non-quadratic dissipation, leading to the concept of generalized gradient flows (see also [MRS09,MRS12b,DPZ13]).
The central tool in this paper is this functional J that characterizes both the large deviations of the stochastic process and the generalized gradient-flow structure of the limit. Our convergence proofs will be stated and proved using only this functional, giving a high level of coherence to the results.
1.5. Results. In this section we give a non-rigorous description of the results of this paper, with pointers towards the rigorous theorems later in the paper.
Fix an energy $E \in C^1(\mathbb{R} \times [0, T])$. We start with the large-deviations result for the jump process $X^n$ due to Wentzell [Wen77,Wen90].
Statement 1 (Large deviations). For constant α and β, $X^n$ satisfies a large-deviations principle for n → ∞, with rate function $J_{\alpha,\beta}$ given in (14). Moreover, the minimizer of $J_{\alpha,\beta}$ satisfies the generalized gradient flow equation (3).
This result is stated in Theorem 2.5 with a sketch of the proof as it is presented in the introduction of [FK06]. In accordance with the discussion above, $J_{\alpha,\beta}(x) \ge 0$ for all curves $x : [0, T] \to \mathbb{R}$, and $J_{\alpha,\beta}(x) = 0$ if and only if x is a solution of (3).
Figure 3. The figure is a schematic representation of this paper. In the top center there is the generator of the Markov process. The arrows starting from $X^n$ represent the limiting behaviour for n → ∞ in different regimes. The center arrow represents the limit with α and β fixed and ends at the rate functional $J_{\alpha,\beta}$ (Statement 1). Statement 3 is represented by the right side of the figure.
Next we prove that the functional $J_{\alpha,\beta}$ converges to two functionals, $J_Q$ and $J_{RI}$, in the sense of Mosco-convergence, defined in (19), when α and β have the limiting behaviour of Figure 2. The limiting functionals drive respectively a quadratic gradient flow and a rate-independent evolution.
B1. First let 0 < δ < 1. Then $X^n$ satisfies a large-deviations principle as n → ∞, with rate function $J_Q$; the Markov process $X^n$ has a deterministic limit (4), and this limit minimizes $J_Q$ (as we already mentioned).
B2a. In the case δ = 1, let $\alpha_n\beta_n \to \omega$, $n\beta_n \to 1/h$, for some ω, h > 0; then $X^n$ converges to the process $Y^h$ described by the SDE with gradient drift defined in Section 3 (see (21)).
Remark 1.1. In this paper we consider only the one-dimensional case. We make this choice because the main goal of the paper is to show the connection and the interplay between large deviations and gradient flows, and the one-dimensionality allows us to avoid various technical complications. For some of the proofs, the generalization to higher dimensions is just a change of notation and requires no relevant modification. The rate-independent limit in higher dimensions, however, is non-trivial and is the object of work in progress.
1.6. Modelling. We mentioned above that the rates r ± can be derived from Kramers' law; we now give some details.
In the wiggly energy E n (x, t) = E(x, t) + n −1 e(nx), assume that E varies slowly both in x and t, e has period 2 and let n be large. Then a small patch of the energy landscape of E n looks like Figure 4. The height of the energy barriers to the left and right of a well equals ∆e := (max e − min e), plus a perturbation from the smooth energy E, which to leading order has size n −1 ∇E.
We now assume that the position $Z_t$ of the system solves the SDE Here β characterizes the noise, and can be interpreted as 1/kT as usual, although in this case there is an additional scaling factor n. For sufficiently large β the rates of escape from a well, to the left and to the right, are given by Kramers' law to be approximately where the minus sign applies to the rate of leaving to the right. Here a is a constant depending on the form of e [Kra40,Ber11]. Writing we find the rates $r^\pm$ of Figure 1. In this formula for α, it appears that α and β are coupled. From a modelling point of view, this is true: if one varies the temperature while keeping all other parameters fixed, then both β and α will change. Note that the scaling regime $\alpha \sim e^{-\beta A}$ is exactly this case, with A = ∆e. On the other hand, the parameter a in (8) is still free, and this allows us to consider α and β as independent parameters when necessary.
1.7. Outline. The paper is organized as follows. In Section 2 we introduce the concepts of (generalized) gradient flows and of large deviations and we show the connection between the two concepts. In Section 3 we prove point A1 of Statement 2 and the whole of Statement 3. Then in Section 4 we introduce the space of functions of bounded variation and rate-independent systems, and we prove point A2 of Statement 2. We end the paper in Section 5 with a final discussion.

Gradient Flows & Large Deviations
In the introduction we mentioned that the methods of this paper make use of a certain unity between gradient flows and large-deviations principles: the same functional J that defines the gradient flow also appears as the rate function of a large-deviations principle. We now describe gradient flows, large deviations, and this functional J .
2.1. Gradient flows. Given a $C^1$ energy $E : \mathbb{R} \to \mathbb{R}$, we call a gradient flow of E the flow generated by the equation $\dot x = -\nabla E(x)$. The energy E decreases along a solution, since
$\frac{d}{dt} E(x(t)) = \nabla E(x(t))\,\dot x(t) = -\tfrac12 \dot x(t)^2 - \tfrac12 \nabla E(x(t))^2$.
Adopting the notation $\psi(\xi) = \xi^2/2$ and $\psi^*(\eta) = \eta^2/2$, this identity can be integrated in time to find
$E(x(T)) - E(x(0)) + \int_0^T \bigl[\psi(\dot x(t)) + \psi^*(-\nabla E(x(t)))\bigr]\,dt = 0$.
In this paper we study a generalized concept of gradient flow, considering the energy equality for a broader class of couples ψ, ψ*. We will also allow the energy to depend on time. We recall the definition of the Legendre transform: given $\psi : \mathbb{R} \to \mathbb{R}$ we define the transform ψ* as
$\psi^*(\eta) = \sup_{\xi \in \mathbb{R}} \{\eta\xi - \psi(\xi)\}$.
In the following, apart from the rate-independent case, we will assume that $\psi \in C^1(\mathbb{R})$ is symmetric, superlinear and convex, so that its Legendre transform ψ* shares the same properties.
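As a quick numerical illustration of the Legendre transform, the sketch below approximates the supremum by a maximum over a finite grid. The helper `legendre`, the grid bounds and the test values are assumptions made for the example; on a bounded grid the result is exact only when the maximizer lies inside the grid.

```python
def legendre(psi, eta, xi_grid):
    """Approximate psi*(eta) = sup_xi (eta * xi - psi(xi)) over a finite grid."""
    return max(eta * xi - psi(xi) for xi in xi_grid)


# Grid on [-10, 10] with step 0.001 (hypothetical choice).
xi_grid = [i / 1000.0 for i in range(-10000, 10001)]

# Quadratic dissipation: psi(xi) = xi^2 / 2 has psi*(eta) = eta^2 / 2.
def psi_quadratic(xi):
    return 0.5 * xi * xi

for eta in (-2.0, 0.0, 0.5, 3.0):
    print(eta, legendre(psi_quadratic, eta, xi_grid), 0.5 * eta * eta)

# 1-homogeneous dissipation psi(xi) = A |xi| with A = 1: the transform
# vanishes for |eta| <= A; for |eta| > A the grid maximum grows with the
# grid size, reflecting the value +infinity of the true transform.
def psi_one_homogeneous(xi):
    return abs(xi)

print(legendre(psi_one_homogeneous, 0.5, xi_grid))
print(legendre(psi_one_homogeneous, 1.5, xi_grid))
```

The last two lines preview the rate-independent case of Section 4, where the dual dissipation is an indicator function.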
A curve x : [0, T ] → R is an absolutely continuous curve, i.e. x ∈ AC(0, T ), if for every ε > 0, there exists a δ > 0 such that, for every finite sequence of pairwise disjoint intervals (t j ,

Definition 2.1 (Generalized gradient flow). Given an energy
Note that the left-hand side of (9) is non-negative for any function x, since, by the chain rule and the inequality $\psi(\xi) + \psi^*(\eta) \ge \xi\eta$ that follows from the definition of the Legendre transform,
$\frac{d}{dt} E(x(t), t) - \partial_t E(x(t), t) = \nabla E(x(t), t)\,\dot x(t) \ge -\psi(\dot x(t)) - \psi^*(-\nabla E(x(t), t))$.
From this inequality one deduces that equality in (9), as required by Definition 2.1, implies that for almost every t ∈ [0, T]
$\dot x(t) \in \partial\psi^*\bigl(-\nabla E(x(t), t)\bigr)$,
where ∂ψ* is the subdifferential of ψ*. We will not use this form of the equation; the arguments of this paper are based on Definition 2.1 instead. Existence and uniqueness of classical gradient-flow solutions in R (i.e. with quadratic ψ) follow from classical ODE theory; in recent years the theory has been extended to metric spaces and spaces of probability measures [AGS08].
In our case we will require the energy to satisfy the following conditions
Remark 2.2. Definition 2.1 requires ψ to be strictly convex. The rate-independent evolution (5) is formally the case of a non-strictly convex, 1-homogeneous dissipation potential ψ, and for this case there are several natural ways to define a rigorous solution concept. In Section 4 we show how generalized gradient flows for finite α and β, with strictly convex ψ, converge to a specific rigorous rate-independent solution concept, the so-called BV solutions [MRS09,MRS12a]. We define this concept in Definition 4.1.
Returning to the unity between gradient flows and large deviations, for generalized gradient flows the functional J mentioned before is the left-hand side of (9).
2.2. Large deviations. 'Large deviations' of a random variable are rare events, and large-deviations theory characterizes the rarity of certain rare events for a sequence of random variables. Let {X n } be such a sequence of random variables with values in some metric space.
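Definition 2.3 (Large-deviations principle). The sequence $\{X^n\}$ satisfies a large-deviations principle with speed $a_n \to \infty$ and rate function J if for each open set O
$\liminf_{n\to\infty} \frac{1}{a_n} \log \mathrm{Prob}(X^n \in O) \ge -\inf_{x \in O} J(x)$,
and for each closed set C
$\limsup_{n\to\infty} \frac{1}{a_n} \log \mathrm{Prob}(X^n \in C) \le -\inf_{x \in C} J(x)$.
The function J is called the rate function for the large-deviations principle.
Intuitively, the two inequalities above state that
$\mathrm{Prob}(X^n \simeq x) \sim \exp(-a_n J(x))$,
where we purposefully use the vague notations ≃ and ∼; the rigorous version of these symbols is exactly given by Definition 2.3.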
Remark 2.4. Typically, the rate function for Markov processes contains a term I 0 characterizing the large deviations of the initial state X n (0). In the following we will always assume that the starting point will be fixed, or at least that X n (0) → x 0 , so that I 0 (x) equals 0 if x = x 0 , and +∞ otherwise, so we will disregard I 0 .
2.3. The Feng-Kurtz method. Feng and Kurtz created a general method to prove large-deviations principles for Markov processes [FK06]. The method provides both a formal method to calculate the rate functional and a rigorous framework to prove the large-deviations principle. Here we present only the formal calculation. Consider a sequence of Markov processes {X n } in R, which we take time-invariant for the moment, and consider the corresponding evolution semigroups {S n (t)} defined by where Ω n is the generator of X n .
The Feng-Kurtz method then states, formally, that {X n } satisfies a large-deviations principle in D([0, T ]) with speed a n , with a rate function In the book [FK06] a general method is described to make this algorithm rigorous.
2.4. Large deviations of X n . We now apply this method to the process X n described in the introduction. It is a continuous time Markov chain, defined by its generator For f ∈ C b (R) the expected value E is defined as with, denoting by Ω T n the adjoint of Ω n , where µ n t is the law at time t of the process X n started at the position X n 0 at time t = 0. Under the condition (11), the martingale problem (13) is well-posed, since the operator Ω n is bounded in the uniform topology.
The rigorous proof of Statement 1 consists of the following theorem.
Theorem 2.5. Consider the sequence of Markov processes $\{X^n\}$ with generator $\Omega_n$ defined in (12) and with $X^n(0)$ converging to $x_0$ for n → ∞. Then the sequence $X^n$ satisfies a large-deviations principle in D([0, T]) with speed n and rate function where
The proof can be found in [FW12, Ch. 5-Th. 2.1] when the energy E is independent of time. In the general case of a time-dependent energy, the proof follows by considering a space-time process, as shown in the proof of Theorem 3.2.
In the remainder of this paper we will consider sequences in α and β; to reduce notation we will drop the double index, writing $\psi_\beta$ and $\psi^*_\beta$ for $\psi_{\alpha,\beta}$ and $\psi^*_{\alpha,\beta}$; similarly we define the rescaled functional $J_\beta$,
Remark 2.6. The theorem above shows how the rate function $J_{\alpha,\beta}$ can be interpreted as defining a generalized gradient flow. This illustrates the structure of the fairly widespread connection between gradient flows and large deviations: in many systems the rate function not only defines the gradient-flow evolution, through its zero set, but the components of the gradient flow (E and ψ) can be recognized in the rate function. This connection is explored more generally in [MPR13] as we describe in the following.
where Ψ * can be expressed in terms of the Legendre transform H(z, ξ) of L as Applying the same procedure to our case, with L = L defined in (18), we obtain after some calculations that Substituting into S(z) our choice for r + and r − it follows that S(z) = βE(z).
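2.5. Calculating the large-deviations rate functional for (12). We conclude this section by calculating the rate function for the simpler situation when the jump rates $r^\pm$ are constant in space and time, as it is shown in the introduction of [FK06]. This formally proves Theorem 2.5, substituting in the end the expression for $r^\pm$ from (12). With constant jump rates, the generator reduces to
$\Omega_n f(x) = n r^+ \bigl(f(x + \tfrac1n) - f(x)\bigr) + n r^- \bigl(f(x - \tfrac1n) - f(x)\bigr)$,
and for n → ∞ it converges to $\Omega f(x) = (r^+ - r^-)\nabla f(x)$. As we said in the introduction, the process $X^n$ has a deterministic limit, i.e. $X^n \to x$ a.s., with $\dot x = r^+ - r^-$. In order to calculate the rate functional, we compute the non-linear generator and the limiting Hamiltonian and Lagrangian. We have
$H_n f(x) = \tfrac1n e^{-n f(x)}\, \Omega_n e^{n f}(x) \to H(\nabla f(x))$, with $H(p) = r^+(e^{p} - 1) + r^-(e^{-p} - 1)$.
We then obtain by an explicit calculation the Lagrangian
$L(v) = \sup_p \{pv - H(p)\} = v \log\frac{v + \sqrt{v^2 + 4 r^+ r^-}}{2 r^+} - \sqrt{v^2 + 4 r^+ r^-} + r^+ + r^-$,
and substituting $r^+$ and $r^-$ with the corresponding ones from (12) we get
$L(x, \dot x) = \dot x \log\frac{\dot x + \sqrt{\dot x^2 + 4\alpha^2}}{2\alpha} + \beta \dot x \nabla E - \sqrt{\dot x^2 + 4\alpha^2} + 2\alpha\cosh(\beta\nabla E)$,
and we formally prove Theorem 2.5.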
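The Legendre transform in this calculation can be verified numerically. The sketch below assumes the standard Hamiltonian of a nearest-neighbour jump process with constant rates, $H(p) = r^+(e^p - 1) + r^-(e^{-p} - 1)$, and evaluates the Lagrangian at the maximizer obtained by solving $H'(p) = v$; the function names and the rate values are hypothetical.

```python
import math


def hamiltonian(p, r_plus, r_minus):
    """H(p) = r+ (e^p - 1) + r- (e^-p - 1) for constant jump rates."""
    return r_plus * (math.exp(p) - 1.0) + r_minus * (math.exp(-p) - 1.0)


def lagrangian(v, r_plus, r_minus):
    """L(v) = sup_p (p v - H(p)), evaluated at the maximizer p_star.

    The maximizer solves r+ e^p - r- e^-p = v, a quadratic equation in e^p.
    """
    s = math.sqrt(v * v + 4.0 * r_plus * r_minus)
    p_star = math.log((v + s) / (2.0 * r_plus))
    return v * p_star - hamiltonian(p_star, r_plus, r_minus)


r_plus, r_minus = 2.0, 0.5  # hypothetical constant rates

# The Lagrangian vanishes exactly at the deterministic velocity r+ - r-.
print(lagrangian(r_plus - r_minus, r_plus, r_minus))

# Any other velocity has a strictly positive cost.
for v in (-1.0, 0.0, 3.0):
    print(v, lagrangian(v, r_plus, r_minus))
```

The zero of L at $v = r^+ - r^-$ is the statement that the deterministic limit $\dot x = r^+ - r^-$ is the unique minimizer of the rate functional.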

The Quadratic Limit
In this section we precisely state and prove point A1 of Statement 2 and the whole of Statement 3. We are in the regime where β → 0, α → ∞, with αβ → ω.
First we show heuristically why the functional $J_\beta$ defined in (16) is expected to converge to $J_Q$ defined in (20). Looking at the equation that minimises the functional $J_\beta$, and doing a Taylor expansion for β ≪ 1, $\dot x = -2\alpha\sinh(\beta\nabla E) \simeq -2\alpha\beta\nabla E \to -2\omega\nabla E$. Considering the functional $J_\beta$ for β ≪ 1 it can be seen that
We now turn to the rigorous proof of the convergence to the quadratic gradient flow and therefore point A1 of Statement 2. For this we need the concept of Mosco-convergence. Given a sequence of functionals $\phi_n$ and φ defined on a space X with weak and strong topology, $\phi_n$ is said to Mosco-converge to φ if
(lim-inf inequality) for every sequence $x_n \rightharpoonup x$ weakly, $\liminf_n \phi_n(x_n) \ge \phi(x)$;
(lim-sup condition) for every x there exists a recovery sequence $x_n \to x$ strongly such that $\limsup_n \phi_n(x_n) \le \phi(x)$. (19)
The gradient-flow Definition 2.1 is based on the function space AC(0, T). We define weak and strong topologies on the space AC(0, T) by using the equivalence with $W^{1,1}(0, T)$. Let $x, x_n \in AC(0, T)$. We say that $x_n$ converges weakly to x ($x_n \rightharpoonup x$) if $x_n \to x$ strongly in $L^1(0, T)$ and $\dot x_n \rightharpoonup \dot x$ weakly in $L^1(0, T)$, i.e. in $\sigma(L^1, L^\infty)$; we say that $x_n$ converges strongly to x ($x_n \to x$) if in addition $\dot x_n \to \dot x$ strongly in $L^1(0, T)$.

Theorem 3.1 (Convergence to the quadratic limit). Given
Moreover, if a sequence {x β } is such that J β (x β ) is bounded, and then the sequence {ẋ β } is compact in the topology σ(L 1 , L ∞ ).
Proof. First we prove the Mosco-convergence. The lim-sup condition follows because, for β → 0 By the local uniform convergence we can choose the recovery sequence to be the trivial one.
Now we prove the lim-inf inequality. The uniform convergence of $x_\beta$ to x implies that we can pass to the limit in the terms $E(x_\beta(\cdot), \cdot)$ and $\int \partial_t E(x_\beta, t)\,dt$. Then applying Fatou's lemma we find where we used the inequality $2\cosh(\theta) \ge 2 + \theta^2$. The function $\psi^*_\beta$ is non-increasing for β → 0 so, for any $\beta \le \bar\beta$, we have $\psi_\beta \ge \psi_{\bar\beta}$. Then and we conclude taking the limit $\bar\beta \to 0$.
Now we prove the compactness. Let us suppose that $J_\beta(x_\beta)$ and $E(x_\beta(0), 0) + \int_0^T \partial_t E(x_\beta, t)\,dt$ are bounded. Then, by the positivity of $\psi^*_\beta$, With the choice $\bar\beta = 1$ we have
Note that this result can also be obtained by the abstract method of Mielke [Mie14,Th. 3.3]. Also note that the result can be formulated in the weak-strict convergence of BV; for the lower semicontinuity this follows since weak convergence in BV with bounded $J_\beta$ implies weak convergence in AC, and for the recovery sequence it follows from our choice of the trivial sequence.
We end this section with two theorems completing the proof of Statement 3, pictured in the right hand side of Figure 3.
First we define for each h > 0 the SDE where W t is the Brownian motion on R, Y h 0 has law δ x 0 and its generator Ω is defined as For f ∈ C b (R) the expected value E is defined as with, denoting by Ω T the adjoint of Ω, where µ t is the law at time t of the process Y h started at the position x 0 at time t = 0. Then the following theorems hold.
Theorem 3.2 (Large deviations for the processes $X^n$ and $Y^h$). Given an energy E satisfying condition (11), fix 0 < δ < 1 and consider the sequence of processes $\{X^n\}$ with generator $\Omega_n$ defined in (12) with $\beta = n^{-\delta}$ and αβ → ω for n → ∞. Then, if $X^n(0) \to x_0$, the process $X^n$ satisfies a large-deviations principle in D([0, T]) with speed $n^{1-\delta}$ and with rate function the extension of $J_Q$ in (20) to BV :
Proof. The proof of the large-deviations principle for $X^n$ relies on the fulfilment of three conditions, namely convergence of the operators $H_n$, exponential tightness for the sequence of processes $X^n$, and the comparison principle for the limiting operator H, following the steps of [FK06, Sec. 10.3]. We restrict ourselves, for the sake of simplicity, to the case α = ω/β, and let $m = n^{1-\delta}$. To treat the time dependence we use the standard procedure of converting a time-dependent process into a time-independent process by adding the time to the state variable (see e.g. [EK09,Sec. 4.7]): , and given $\Omega_n$ defined in (12), we define $Q_n$ as With $m = n^{1-\delta}$, we have, Now, with the convention u + 1/n = (x + 1/n, t), implying convergence in the uniform topology, and then the proof, mutatis mutandis, follows similarly.
Then the large-deviations principle holds in $D_{\mathbb{R}\times[0,T]}([0, T])$ with rate functional where L is the Legendre transform of H with respect to the variables $(\nabla f, \partial_t f)$. It is just a calculation to check that where $I_1$ is the indicator function of the set {1}, i.e.
It is then clear that J Q (u) = J Q (x).
The large-deviations result for $Y^h$ can be found in [FW12, Th. 1.1 of Ch. 4] in the case of a time-independent energy. The time-dependent case follows by the same modification as above.

Theorem 3.3 (Convergence to Brownian motion with gradient drift). Let E be an energy satisfying condition (11), with ∇E uniformly continuous, let $\alpha_n\beta_n \to \omega$, $n\beta_n \to 1/h$, and let $\mu^n_t$ be the law of the process $\{X^n(t)\}$ defined in (12) with $\mu^n_0 = \delta_{X^n_0}$. If $X^n_0 \to x_0$, then $\mu^n$ converges weakly to µ (in the duality with $C_b(\mathbb{R})$), where µ is the law of the Brownian motion with gradient drift (21).
Proof. This is a result of standard type, and we give a brief sketch of the proof for the case of time-independent E, using the semigroup convergence theorem of Trotter [Tro58, Th. 5.2]. The assumptions of this theorem are satisfied by the existence of a single dense set on which Ω and $\Omega_n$ are defined, pointwise convergence of $\Omega_n$ to Ω on that dense set, and a dense range of λ − Ω for sufficiently large λ. The assertion of Trotter's theorem is pointwise convergence of the corresponding semigroups at each fixed t, which implies convergence of the dual semigroups in the dual topology, which is the statement of Theorem 3.3. We set the system up as follows. Define the state space $Y := C_b(\overline{\mathbb{R}})$ with the uniform norm, where $\overline{\mathbb{R}}$ is the one-point compactification of R; define the core $D := \{f \in C^2_b(\mathbb{R}) \cap C(\overline{\mathbb{R}}) : \Delta f \text{ uniformly continuous}\}$, which is dense in Y for the uniform topology, and which will serve as the dense set of definition mentioned above for both $\Omega_n$ and Ω. For each f ∈ D, $\Omega_n f \to \Omega f$ in the uniform topology.
The density of the range of λ − Ω is the solvability in D of the equation $\lambda f - \Omega f = g$ for all g in a dense subset of Y; we choose $g \in C_c(\mathbb{R}) + \mathbb{R}$. This is a standard result from PDE theory, which can be proved for instance as follows. First note that we can assume $g \in C_c(\mathbb{R})$, by adding a constant to both g and f. Secondly, for sufficiently large λ > 0 the left-hand side generates a coercive bilinear form in $H^1(\mathbb{R})$ in the sense of the Lax-Milgram lemma, and therefore there exists a unique solution $f \in H^1(\mathbb{R})$. By bootstrap arguments, using the continuity and boundedness of ∇E, we find $f \in C^2_b(\mathbb{R})$, and since $f \in H^1(\mathbb{R}) \cap C^2_b(\mathbb{R})$, f(x) tends to zero at ±∞, implying that $f \in C^2_b(\mathbb{R}) \cap C(\overline{\mathbb{R}})$. Finally, since ∇E is uniformly continuous, the same holds for ∆f. This concludes the proof.
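The pointwise convergence of the generators used in the proof can be observed numerically. In the sketch below, the jump generator is written out from the rates $r^\pm = \alpha e^{\mp\beta\nabla E}$, and the limit operator is taken to be $-2\omega\nabla E\,f' + \omega h f''$. This limit form is our own Taylor-expansion reading of the scaling $\alpha_n\beta_n \to \omega$, $n\beta_n \to 1/h$ and should be treated as an assumption; the test function, the energy E(x) = log cosh(x) and all parameter values are hypothetical.

```python
import math


def jump_generator(f, grad_E, x, n, alpha, beta):
    """Generator of the lattice jump process applied to f at the point x."""
    g = grad_E(x)
    right = math.exp(-beta * g) * (f(x + 1.0 / n) - f(x))
    left = math.exp(beta * g) * (f(x - 1.0 / n) - f(x))
    return n * alpha * (right + left)


def limit_generator(df, d2f, grad_E, x, omega, h):
    """Assumed limit: drift -2*omega*grad_E and diffusion coefficient omega*h."""
    return -2.0 * omega * grad_E(x) * df(x) + omega * h * d2f(x)


omega, h, x = 1.0, 0.5, 0.3
grad_E = math.tanh  # derivative of E(x) = log cosh(x), bounded and smooth


def f(y):
    return math.sin(y)


def df(y):
    return math.cos(y)


def d2f(y):
    return -math.sin(y)


def generator_error(n):
    beta = 1.0 / (h * n)  # so that n * beta = 1 / h
    alpha = omega / beta  # so that alpha * beta = omega
    approx = jump_generator(f, grad_E, x, n, alpha, beta)
    return abs(approx - limit_generator(df, d2f, grad_E, x, omega, h))


for n in (10, 100, 1000):
    print(n, generator_error(n))
```

Under these assumptions the printed error should decrease as n grows, which is the behaviour that the hypothesis of Trotter's theorem requires on the core D.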

Rate-independent Limit
In this section we prove point A2 of Statement 2. We will prove point A2 with a theorem that holds in greater generality, without assuming the explicit form of the couple $\psi_\beta$, $\psi^*_\beta$, but only a few 'reasonable' assumptions and the limiting behaviour. We are therefore in the regime where β → ∞, log α = −βA for some A > 0.

4.1. Functions of bounded variation and rate-independent systems.
We now briefly recall the definition of the space BV of functions with bounded variation, following the notation of [MRS12b]. A full description of this space and its properties can be found in [AFP00]. Given a function $x : [0, T] \to \mathbb{R}$ the total variation of x in the interval [0, T] is defined by
$\mathrm{Var}(x, [0, T]) = \sup\Bigl\{\sum_{i=1}^{N} |x(t_i) - x(t_{i-1})| : 0 = t_0 < t_1 < \dots < t_N = T\Bigr\}$.
If this quantity is finite, the function x admits left and right limits $x(t^-)$ and $x(t^+)$ in every point t ∈ [0, T], and we define the jump set of x as
$J_x = \{t \in [0, T] : x(t^-) \ne x(t) \text{ or } x(t) \ne x(t^+)\}$,
and the pointwise variation in the jump set as
$\mathrm{Jmp}(x, [0, T]) = \sum_{t \in J_x} \bigl(|x(t^-) - x(t)| + |x(t) - x(t^+)|\bigr)$.
The total variation admits the representation
$\mathrm{Var}(x, [0, T]) = \int_0^T |\dot x|\,dt + \int_0^T d|Cx| + \mathrm{Jmp}(x, [0, T])$,
where $|\dot x|$ is the modulus of the absolutely continuous (a.c.) part of the distributional derivative of x; the measure |Cx| is the Cantor part and Jmp represents the contribution of the (at most countable) jumps.
Given a sequence $\{x_n\} \subset BV([0, T])$, we again define two notions of convergence. We say that $x_n$ weakly converges to x ($x_n \rightharpoonup x$) if $x_n(t)$ converges to x(t) for every t ∈ [0, T] and the variation is uniformly bounded, i.e. $\sup_n \mathrm{Var}(x_n, [0, T]) < \infty$. We say that $x_n$ strictly converges to x ($x_n \to x$) if $x_n \rightharpoonup x$ and in addition $\mathrm{Var}(x_n, [0, T])$ converges to $\mathrm{Var}(x, [0, T])$ as n → ∞.
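For a function known only through samples, the variation along the sampling partition gives a lower bound for the total variation, and for a piecewise monotone function on a fine enough grid it recovers it. The sketch below is a minimal illustration; the helper name and the test function (a unit-slope ramp with one jump of size 1) are assumptions made for the example.

```python
def pointwise_variation(values):
    """Sum of |x(t_i) - x(t_{i-1})| along the given partition."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


# Hypothetical test function on [0, 1]: slope 1 with a jump of size 1 at t = 0.5.
ts = [i / 1000.0 for i in range(1001)]
xs = [t if t < 0.5 else t + 1.0 for t in ts]

total = pointwise_variation(xs)
print("variation along the partition:", total)
```

Here the absolutely continuous part contributes 1 and the jump contributes 1, with no Cantor part, so the value is 2 up to rounding, in agreement with the representation of the total variation given above.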
According to the general setup of [MRS12a,MRS12b] we define a notion of rate-independent system based on an energy balance similar to equation (9), where now the dissipation ψ has a linear growth, i.e. ψ(η) = ψ RI (η) = A|η| with A > 0.
We first define Jmp E , which can be viewed as an energy-weighted jump term, as The relation between the definitions (25) and (24) becomes clear in the following inequality where equality can be achieved depending on the behaviour of E, e.g. trivially when |∇E(x, t)| ≤ A for every x, t. Then we can interpret Jmp E as a modified jump term, with an E-dependent weight.
In analogy with the (generalized) gradient flow Definition 2.1 we define rate-independent systems. There is no unique way to define a rate-independent system. The so-called energetic solutions have been introduced and analysed in [MT04,MT99,MTL02], and are based on the combination of a pointwise global minimality property and an energy balance. Here we concentrate on BV solutions, as defined in [MRS12a]. Our limiting system will be of this type.
Fix A > 0; the rate-independent dissipation $\psi_{RI}$ and its Legendre transform $\psi^*_{RI}$ are
$\psi_{RI}(\eta) = A|\eta|$, $\psi^*_{RI}(w) = 0$ if |w| ≤ A and $+\infty$ otherwise, (26)
with $\psi_{RI}$ and $\psi^*_{RI}$ defined in (26).
4.2. Assumptions and the main result. In the rest of this section we prove that the generalized gradient-flow evolution converges to the rate-independent one. This is point A2 of Statement 2, formulated in Theorem 4.2 showing Mosco-convergence of $J_\beta$ to $J_{RI}$, which is the left-hand side of (27). There are three main reasons why the convergence to a rate-independent system should be expected.
First, from a heuristic mathematical point of view, our choices of α and β yield pointwise convergence of ψ β and ψ * β to a one-homogeneous function and to its dual, the indicator function; this suggests a rate-independent limit. However, this argument does not explain which of the several rate-independent interpretations the limit should satisfy, nor does it explain the additional jump term.
Secondly, from a physical point of view, the underlying stochastic model mimics a rate-independent system. This can be recognized by keeping the lattice size finite but letting β → ∞; then the rates either explode or converge to zero, depending on the value of ∇E. We can interpret this in the sense that when a rate is infinite, with probability one a jump will occur to the nearest lattice point with zero jump rate.
Thirdly, from the point of view of the evolution, in the case $1 \ll \beta < \infty$ the generalized gradient flow exhibits fast transitions when $|\nabla E| > A$. By slowing down time during these fast transitions, we can capture what happens at their small time scale; in the limit these transitions become jumps. This is exactly how we construct the recovery sequence in Theorem 4.2.
The convergence will be proven in greater generality; more precisely, we do not use the explicit formulas, but we require that $\psi_\beta$ and $\psi^*_\beta$ satisfy the following conditions:
A. $\psi_\beta$ and $\psi^*_\beta$ are both symmetric, convex and $C^1$;
B. $\psi^*_\beta$ converges pointwise to $\psi^*_{RI}$, where $\psi^*_{RI}(w) = +\infty$ for $|w| > A$ and $\psi^*_{RI}(w) = 0$ for $|w| \le A$;
C. $\forall M > 0$ $\exists\, \delta_\beta \to 0$ such that as $\beta \to \infty$,
D. For each $\alpha \ge 1$ and for each $|w| \le R$ there exists $\eta_\beta(w,\alpha) \ge 0$ such that $\partial\psi^*_\beta(w + \eta_\beta(w,\alpha)) = \alpha\, \partial\psi^*_\beta(w)$, and $\eta_\beta$ is bounded uniformly in $\alpha$, $\beta$, and $|w| \le R$.
It is important to underline that conditions C-D are needed in Theorem 4.2 only for the $\Gamma$-limsup; they are not necessary for the $\Gamma$-liminf. These conditions are satisfied by a large family of pairs $\psi_\beta$-$\psi^*_\beta$. The two examples below show that our specific case and the vanishing-viscosity approach, respectively, are covered by the assumptions A-D.
Vanishing viscosity: $\psi^*_\beta(w) = \beta(|w| - A)_+^2$. Here too, conditions A and B are immediately satisfied. Again considering $w \ge A$, we verify condition C by choosing $\delta_\beta \simeq \beta^{-1/3}$ and $\lambda = 1$; it is then just a calculation to check that condition C is satisfied in this case. Condition D requires that $2\beta(w + \eta - A) = 2\alpha\beta(w - A)$, and so $\eta = (\alpha - 1)(w - A)$ satisfies the condition.
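The computation for condition D can be checked numerically. The sketch below evaluates the derivative of $\psi^*_\beta(w) = \beta(|w|-A)_+^2$ and verifies that $\eta = (\alpha-1)(w-A)$ scales it by the factor $\alpha$ for $w \ge A$. The function names and the sample parameter values are assumptions chosen for this illustration.

```python
def dpsi_star(w, beta, A):
    """Derivative of beta * (|w| - A)_+^2 with respect to w."""
    excess = max(abs(w) - A, 0.0)
    sign = 1.0 if w >= 0 else -1.0
    return 2.0 * beta * excess * sign


def eta(w, alpha, A):
    """Shift from the text, eta = (alpha - 1)(w - A), intended for w >= A."""
    return (alpha - 1.0) * (w - A)


# Sample values (hypothetical): beta = 10, A = 1, w = 1.5, alpha = 3.
beta, A, w, alpha = 10.0, 1.0, 1.5, 3.0
lhs = dpsi_star(w + eta(w, alpha, A), beta, A)  # derivative at the shifted point
rhs = alpha * dpsi_star(w, beta, A)             # alpha times the derivative at w
print(lhs, rhs)
```

Both sides equal $2\alpha\beta(w - A)$, which is the identity required in condition D; note that the shift itself does not depend on $\beta$.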
Theorem 4.2 (Convergence to the rate-independent evolution). Given an energy $E$ satisfying condition (11), a sequence of pairs $\psi_\beta$-$\psi^*_\beta$ satisfying conditions A-D, and for $x \in BV([0,T])$ consider the functional $J_\beta$ then, as $\beta \to \infty$, $J_\beta \xrightarrow{M} J_{RI}$ with respect to the weak-strict topology of $BV$, where $J_{RI}$ is given by

The proof is divided into three main steps. We first prove the compactness and the lim-inf inequality; these follow as in [MRS12b, Th. 4.1, 4.2]. We include them for completeness, adapting the proofs to our setting, where some technicalities can be avoided. To finish the proof we need to construct a recovery sequence. For minimizers, i.e. when $J_{RI} = 0$, the recovery sequence is easy to construct: we just need to take a sequence $x_\beta$ such that $J_\beta(x_\beta) = 0$ for every $\beta$. For the full Mosco-convergence, however, we need to construct a recovery sequence also for non-minimizers of $J_{RI}$. This is the last part of the proof, and it is achieved using a parametrized-solution technique.

Proof of compactness and the lower bound.
Proof of compactness. Recall that weak convergence in BV is equivalent to pointwise convergence supplemented with a global bound on the total variation (e.g. [AFP00, Prop. 3.13]).
First we show that $|x_\beta(t) - x_\beta(0)|$ is bounded uniformly in $t$ and $\beta$. We observe that and so we obtain where the constant $C$ may change from line to line. Then The inequality above and the boundedness of $x_\beta(0)$ imply that the whole sequence is bounded for every $t \in [0,T]$.

Next we show the existence of a converging subsequence. For every $0 \le t_0 \le t_1 \le T$ we recall the bound Defining the non-negative finite measures on $[0,T]$ $\nu_{\beta,A} := \big(\psi_\beta(\dot x_\beta) + \psi^*_\beta(A)\big)\,\mathcal{L}^1$, up to extracting a suitable subsequence, we can suppose that they weakly-$*$ converge to a finite measure $\nu_A$, so that −→ x for every $t \in I$ as $\beta_h \to \infty$. From now on, for simplicity, we will number the subsequence with the same index as the main sequence. Then for every $t_0, t_1 \in I$.
(32) The curve $I \ni t \mapsto x(t)$ can be uniquely extended to a continuous curve in $[0,T] \setminus J$, which we still denote by $x$. Arguing by contradiction, we show that the whole sequence $x_\beta(t)$ converges pointwise to $x(t)$. If the pointwise convergence did not hold, there would be a further subsequence and points $t_{\beta_n} \to t \in [0,T] \setminus J$ such that $x_{\beta_n}(t_{\beta_n}) \to \tilde x \neq x(t)$, but this contradicts the previous inequality. We have thus proven the pointwise convergence of $x_\beta$ to $x$; the inequality (32) then gives a uniform bound on the $BV$ norm of $x_\beta$, and so we conclude.
Proof of the lim-inf inequality. Let $\{x_\beta\} \subset AC(0,T)$ be a sequence such that $J_\beta(x_\beta)$ is bounded and which converges weakly to $x \in BV([0,T])$. By the arguments above, $x_\beta$ is bounded uniformly in $t$ and $\beta$, and every term in the functional is itself bounded by a constant independent of $\beta$. The following limits follow from the pointwise convergence and Lebesgue's dominated convergence theorem: As we said, the integral $\int_0^T \psi^*_\beta(-\nabla E(x_\beta(t),t))\,dt$ is bounded for every $\beta$ by a constant that we still denote by $C$. Because of the monotonicity of $\psi^*_\beta$ this bound implies that This proves that $|\nabla E(x(t),t)| \le A$ a.e. and therefore $\int_0^T \psi^*_{RI}(\nabla E(x(t),t))\,dt = 0$; it trivially follows that

We now prove the second part of the inequality, As in the proof of the compactness, we consider the non-negative finite measures on $[0,T]$ $\nu_\beta := \big(\psi_\beta(\dot x_\beta) + \psi^*_\beta(\nabla E(x_\beta,\cdot))\big)\,\mathcal{L}^1$; up to extracting a subsequence, we can suppose that they weakly-$*$ converge to a finite measure $\nu_0 + \psi^*_{RI}(\nabla E(x,\cdot))\,\mathcal{L}^1$. Because $|\nabla E(x,\cdot)| \le A$ $\mathcal{L}^1$-a.e. we obtain that, as in the proof of the compactness, This inequality is slightly too weak for us. The Cantor part and the Lebesgue measurable part are fine, but we need a stronger characterization of the jump part: for all $t \in J_x$,

To prove this, fix $t \in J_x$ and take two sequences $h^-_\beta < t < h^+_\beta$ converging monotonically to $t$ such that Because of the convergence of $\nu_\beta$ we have lim sup and up to extracting a subsequence we can assume that $s^\pm_\beta \to s^\pm$. Denote by $h_\beta := s_\beta^{-1}$ the inverse map of $s_\beta$; we observe that $h_\beta$ is 1-Lipschitz and monotone, and it maps [ We can then define the following Lipschitz functions and they take the special values Therefore, denoting by $I$ a compact interval containing the intervals $[s^-_\beta, s^+_\beta]$ for all $\beta$, then up to a subsequence, we have that Moreover $\theta(s^\pm) = x(t^\pm)$ and $\theta(t) = x(t)$. Then using the inequality The last term $(h^+_\beta - h^-_\beta)\,\psi^*_\beta(A)$ tends to zero as $\beta \to \infty$.
Therefore The inequality ( * ) follows from the technical Lemma [MRS12b, Lemma 4.3], and so we conclude.

Proof of the lim-sup inequality.
Proof. We assume that we are given $x \in BV([0,T])$; we will construct a sequence $x_\beta$ such that

Reparametrization. A central tool in this construction is a reparametrization of the curve $x$ (as in Figure 5), in terms of a new time-like parameter $s$ on a domain $[0,S]$. The aim is to expand the jumps in $x$ into smooth connections.
As in [MRS12a, Prop. 6.10], we define then there exists a Lipschitz parametrization (t, and such that where Moreover, it also holds that Note that $L(x,t,\dot x,\dot t) \ge |\dot x|\,\big(A \vee |\nabla E(x,t)|\big)$, since $\psi^*_{RI}(w)$ is only finite when $|w| \le A$.

Preliminary remarks. The third term in $J_\beta(x_\beta)$ (see (28)) is equal to
$$E(x_\beta(T),T) - E(x_\beta(0),0) - \int_0^T \partial_t E(x_\beta(t),t)\,dt,$$
and these three terms pass to the limit under the strict convergence $x_\beta \to x$ that we prove below. We therefore focus on the other terms in $J_\beta$ and $J_{RI}$. By (34) it is sufficient to prove that lim sup L(x(s), t(s),ẋ(s),ṫ(s)) ds.
From condition (11) we have that $\nabla E(x,t)$ is uniformly Lipschitz continuous in $t$; let $L$ be the Lipschitz constant. In order to define a time rescaling we introduce an auxiliary function. We fix and use Hypothesis C to obtain sequences $\delta_\beta, K_\beta \to 0$ for this value of $M$. We now define $= |v|\,\big(|w| \vee (A + \delta_\beta)\big)$, and the infimum is achieved by The function $\varepsilon_\beta$ can be interpreted as an optimal time rescaling of a given speed $v$ and a given force $|w| \vee (A + \delta_\beta)$.

Figure 5. Schematic representation of the time parametrization procedure. The curve x is such that x(s(t)) = x(t).
Definition of the new time $t_\beta$ and the recovery sequence $x_\beta$. For the sake of simplicity, in the following we construct a recovery sequence only for a curve $x$ with jumps at $0$ and $T$. Later in the proof we show that a recovery sequence can be constructed in a similar way for a curve $x$ with countably many jumps, with only straightforward changes in the proof.
We construct the recovery sequence by first perturbing the time variable $t$. We define $t_\beta : [0,S] \to [0,T_\beta]$ as the solution of the differential equation We can assume that |ẋ(s)| = 0 for s ∈ and define our recovery sequence as follows: We now have that

Estimates. The inequality (36) now follows from the following three estimates: We have now that Applying Lemma 4.3 in every subinterval $[s(t_i), s(t_{i+1})]$, we obtain the same bounds (39)-(41) with $C_\beta S$ replaced by $C_\beta\,|s(t_{i+1}) - s(t_i)|$. Then inequality (36) follows because The pointwise convergence $x_\beta(t) \to x(t)$ for $t \in (\Sigma_i)^\circ$ is again trivial. The following calculations show that, by construction, the convergence holds also in the points and we conclude.

Discussion
In the introduction we posed the question whether we could understand the distinction and the relationship between gradient-flow and rate-independent systems from the point of view of stochastic processes. The simple one-dimensional model of this paper gives a very clear answer, which we summarize in our own words as follows:
• The continuum limit is a generalized gradient flow, with non-quadratic, non-1-homogeneous dissipation, and the large-deviations rate functional 'is' the corresponding generalized gradient-flow structure, in the sense of Section 1.4;
• Taking further limits recovers both quadratic gradient-flow and rate-independent cases;
• At least some of the limits are robust against exchanging the order of the limits, and we conjecture that this robustness goes much further.
Therefore the quadratic and rate-independent cases are naturally embedded in the scale of systems characterized by α and β.
In addition, the details of the proofs show how the formulation in terms of J of (a) large deviations, (b) generalized gradient flows including rate-independent systems, and (c) convergence results for these systems, gives a unified view on the field and a coherent set of tools for the analysis and manipulation of the systems.
Related issues have been investigated in the case of stochastic differential equations. The two limiting processes, n → ∞ and β → {0, ∞} can be interpreted as differently scaled combinations of two limiting processes: (a) the small-noise limit, (b) the limit of vanishing microstructure. In the case of SDEs [Bal91, FS99, DS12], three regimes have been identified, corresponding to 'microstructure smaller than noise', 'noise smaller than microstructure', and the critical case. In the first of these, 'microstructure smaller than noise', a behaviour arises that resembles the quadratic limit of this paper, in which the microstructure is effectively swamped by the noise. The critical case resembles our original large-deviation result (Theorem 2.5) in that both give non-quadratic, non-one-homogeneous rate functionals. Finally, when the noise is asymptotically smaller than the microstructure, a limit similar to the rate-independent limit is obtained in [FS99,DS12], but because the authors consider time-invariant energies and a different scaling, the behaviour of the limiting system is rather different.
The one-dimensionality of the current setup may appear to be a significant restriction, but we believe (and in some cases we know) that the structure can be generalized to a wide class of other systems. For instance,
• The initial large-deviations result (Theorem 2.5) also holds in higher dimensions; other proofs of this and similar results are given in [SW95,Che96].
• The joint large-deviations-quadratic limit (Theorem 3.2) generalizes to higher dimensions with only notational changes in the proof.
• Of the proof of the convergence to a rate-independent system (Theorem 4.2), one part (the liminf-inequality) has been done in the generality of a metric space, with a specific functional form of the dissipation potential, in [MRS12b]. The other part, the construction of a recovery sequence, is the subject of current work; here the characterization of the limiting jump term depends on the particular form of the approximating ψ β -ψ * β , in a way that is not yet clear.
More generally, the results of [MPR13] show that the connection between large-deviation principles and generalized gradient flows is robust, and arises for all reversible stochastic processes and quite a few more (such as the GENERIC system in [DPZ13]).
In Figure 3 the question mark represents an open problem: the combined large-deviation-rate-independent limit. We conjecture that, as in the combined large-deviation-quadratic limit (Theorem 3.2), a large-deviation principle holds in this limit, with rate functional J RI . Unfortunately the framework provided by [FK06] does not seem to apply as-is, and the form of this functional will require a radical change in the strategy of the proof.