Non-Equilibrium Steady States for Chains of Four Rotors

We study a chain of four interacting rotors (rotators) connected at both ends to stochastic heat baths at different temperatures. We show that for non-degenerate interaction potentials the system relaxes, at a stretched exponential rate, to a non-equilibrium steady state (NESS). Rotors with high energy tend to decouple from their neighbors due to fast oscillation of the forces. Because of this, the energy of the central two rotors, which interact with the heat baths only through the external rotors, can take a very long time to dissipate. By appropriately averaging the oscillatory forces, we estimate the dissipation rate and construct a Lyapunov function. Compared to the chain of length three (considered previously by C. Poquet and the current authors), the new difficulty with four rotors is the appearance of resonances when both central rotors are fast. We deal with these resonances using the rapid thermalization of the two external rotors.

steady state (NESS). In general, the explicit expression for this invariant measure is unknown, and the convergence rate depends on the nature of the system. For the model under consideration, we obtain a stretched exponential rate.
For several examples of Hamiltonian chains, properties of the NESS (e.g., thermal conductivity, validity of the Fourier law, temperature profile, …) have been studied numerically, perturbatively, or via some effective theories. See for example [2,3,7,13,15,19,21] for chains of rotors and [1][2][3][4]16,20,21] for chains of oscillators. From a rigorous point of view however, the mere existence of an invariant measure is not evident, and has been proved only in special cases.
A lot of attention has been devoted to chains of classical oscillators with (nonlinear) nearest neighbor interactions. In such models, each oscillator has a position q i ∈ R (we take one dimension for simplicity), is attached to the reference frame with a pinning potential U (q i ), and interacts with its neighbors via some interaction potentials W (q i+1 − q i ) and W (q i − q i−1 ).
It turns out that the properties of the chain depend crucially on the relative growth of W and U at high energy. In the case of (asymptotically) polynomial potentials, and for Markovian heat baths, it has been shown [5,[9][10][11][12]27] that if W grows faster than U , the system typically relaxes exponentially fast to a NESS. The convergence is fast because, thanks to the strong interactions, the sites in the bulk of the chain "feel" the heat baths effectively even though they are separated from them by other sites.
In the strongly pinned case, i.e., when U grows faster than W , the situation is more complicated. When a given site has a lot of energy, the corresponding oscillator essentially feels only its pinning potential U (q i ) and not the interaction. Assume U (q) ∝ q 2k with k > 1. An isolated oscillator pinned with a potential U and with an energy E oscillates with a frequency that grows like E 1/2−1/2k . This scaling plays a central role, since the larger the energy at a site, the faster the corresponding q i oscillates. But then, the interaction forces with the sites i + 1 and i − 1 oscillate very rapidly and become ineffective at high energy. Therefore, a site (or a set of sites) with high energy tends to decouple from the rest of the chain, so that energy can be "trapped" in the bulk. This mechanism not only makes the convergence to the invariant measure slower, but it also makes the proof of its existence harder. The case where W is quadratic is considered in [18]. There, Hairer and Mattingly show that if U (q) ∝ q 2k with k sufficiently large, no exponential convergence to an invariant measure (if there is one) can take place. Moreover, they show that an invariant measure exists in the case of 3 oscillators when k > 3/2. The existence of a NESS for longer chains of oscillators remains an open problem when the pinning dominates the interactions.
Chains of rotors are in fact closely related to strongly pinned oscillator chains: The frequency of a rotor scales as E 1/2 , where E is its energy. This scaling corresponds to that of an oscillator in the limit k → ∞, for the pinning U (q) ∝ q 2k discussed above. In this sense, our chain of rotors behaves as a chain of oscillators in the limit of "infinite pinning", which is some kind of worst-case scenario from the point of view of the asymptotic decoupling at high energy. On the other hand, the compactness of the position-space (it is a torus) in the rotor case is technically very convenient. The problems appearing with chains of strongly pinned oscillators are very similar to those faced with chains of rotors, and so are the ideas involved to solve them.
The existence of an invariant measure for the chain of 3 rotors has been proved in [6], as well as a stretched exponential upper bound of the kind exp(−c √ t) on the convergence rate. The methods, which involve averaging the rapid oscillations of the central rotor, are inspired by those of [18] for the chain of 3 oscillators.
In the present paper, we generalize the result of [6] to the case of 4 rotors, and obtain again a bound exp(−c √ t) on the convergence rate. The main new difficulty in this generalization is the presence of resonances among the two central rotors. When they both have a large energy, there are two fast variables and some resonant terms make the averaging technique developed in [6] insufficient. A large portion of the present paper is devoted to eliminating such resonant terms by using the rapid thermalization of the external rotors.
It would be of course desirable to be able to work with a larger number of rotors. The present paper uses explicit methods to deal with the averaging phenomena. We hope that by crystallizing the essentials of our methods, longer chains can be handled in the same spirit. We expect that for longer chains, the convergence rate is of the form exp(−ct k ), for some exponent k ∈ (0, 1) which depends on the length of the chain. We formulate a conjecture and explain the main difficulties for longer chains in Remark 5. 3.
We now introduce the model and state the main results. In Sect. 2, we study the behavior of the system when one of the two central rotors is fast, and construct a Lyapunov function in this region. In Sect. 3, we do the same in the regime where both central rotors are fast. In Sect. 4 we construct a Lyapunov function that is valid across all regimes, and in Sect. 5 we provide the technicalities necessary to obtain the main result.

The model.
We study a model of 4 rotors, each given by a momentum p i ∈ R and an angle q i ∈ T = R/2π Z, i = 1, . . . , 4. We write in the sequel q = (q 1 , . . . , q 4 ) ∈ T 4 , p = ( p 1 , . . . , p 4 ) ∈ R 4 , and x = (q, p) ∈ Ω, where Ω = T 4 × R 4 is the phase space of the system. We consider the Hamiltonian where W I : T → R, I = L, C, R (standing for left, center and right) are smooth 2πperiodic interaction potentials (see Fig. 1).
Convention Unless specified otherwise, the arguments of the potentials are always as above, namely W L = W L (q 2 − q 1 ), W C = W C (q 3 − q 2 ) and W R = W R (q 3 − q 4 ). The same applies to any function with index L, C and R. Note that the argument for R is q 3 − q 4 (and not q 4 − q 3 ) since this choice will lead to more symmetrical expressions between the sites 1 and 4. To model the interaction with two heat baths, we add at each end of the chain a Langevin thermostat at temperature T b > 0, with dissipation constant γ b > 0, b = 1, 4. Introducing the derivative of the potentials w I = W I , I = L, C, R, the main object of our study is the SDE: where B 1 t , B 4 t are independent standard Brownian motions. The generator of the semigroup associated to (1.2) reads Remark 1.1. In contrast to [6], we do not allow for the presence of pinning potentials U (q i ) and of constant forces at the ends of the chain, although we believe that the main result still holds with such modifications. While constant forces would be easy to handle, the addition of a pinning potential would require some generalization of a technical result (Proposition 3.12) which we are currently unable to provide (see Remark 3.13).
We consider the measure space (Ω, B), with the Borel σ -field B over Ω. The coefficients in (1.2) are globally Lipschitz, and therefore the solutions are almost surely defined for all times and all initial conditions. We denote the transition probability of the corresponding Markov process by P t (x, · ), for all x ∈ Ω and t ≥ 0.

Main results.
We will often refer to the sites 1 and 4 as the outer (or external) rotors, and the sites 2 and 3 as the central rotors. We require the interactions from the inner rotors to the outer rotors to be non-degenerate in the following sense: Assumption 1.2. We assume that for I = L, R and for each s ∈ T, at least one of the derivatives w (k) This assumption is not very restrictive. In particular, it holds if all the potentials consist of finitely many nonconstant Fourier modes.
Our main result is a statement about the speed of convergence to a unique stationary state of the system (1.2). In order to state it, we introduce for each continuous function f : Ω → (0, ∞) the norm · f on the space of signed Borel measures on Ω: If f ≡ 1, we retrieve the total variation norm. (i) The transition probabilities P t (x, dy) have a C ∞ ((0, ∞)×Ω ×Ω) density p t (x, y). (ii) There is a unique invariant measure π , and it has a smooth density.
(iii) For all 0 ≤ θ 1 < min(1/T 1 , 1/T 4 ) and all θ 2 > θ 1 , there exist constants C, λ > 0 such that for all x = (q 1 , q 2 , . . . , p 4 ) ∈ Ω and all t ≥ 0, At thermal equilibrium, namely when T 1 = T 4 = 1/β for some β > 0, the invariant measure is the Gibbs distribution with density e −β H (x) /Z , where Z is a normalization constant. Theorem 1.3 will be proved in Sect. 5 with help of results of [8] and the existence of a Lyapunov function, the properties of which are stated in (i) There are constants c 1 > 0 and a ∈ (0, 1) such that There are a compact set K and constants c 2 , c 3 > 0 such that . (1.7) Most of the paper will be devoted to proving the existence of such a Lyapunov function.
Remark 1.5. We assume throughout that T 1 and T 4 are strictly positive. While the conclusions of Theorem 1.4 remain true for T 1 = T 4 = 0 (with any θ > 0), part of the argument has to be changed in this case, as sketched in Remark 3.17. The positivity of the temperatures is, however, essential for Theorem 1.3; at zero temperature, the system is not irreducible, and none of the conclusions of Theorem 1.3 hold.

Overview of the dynamics.
To gain some insight into the strategy of the proof, we illustrate some essential features of the dynamics (1.2). Since the exterior rotors (at sites 1 and 4) are directly damped by the −γ b p b dt terms in (1.2), we expect their energy to decrease rapidly with large probability. More specifically, for b = 1, 4, we find that Lp b is equal to −γ b p b plus some bounded terms, and thus we expect p b to decay exponentially (in expectation value) when it is large. Therefore, the external rotors recover very fast from thermal fluctuations, and will not be hard to deal with.
On the other hand, the central rotors are not damped directly, and feel the dissipative terms of (1.2) only indirectly, by interacting with the outer rotors. The interesting issue appears when the energy of the system is very large and mostly concentrated in one  20,30,40). The interaction potentials in the simulations here are W I = − cos so that the forces are w I = sin, I = L, C, R or both of the central rotors. If most of the energy is at site 2 (meaning that | p 2 | is much larger than all other momenta), the corresponding rotor spins very rapidly, i.e., q 2 moves very rapidly on T. But then, the interaction forces w L (q 2 − q 1 ) and w C (q 3 − q 2 ) oscillate rapidly, which causes the site to essentially decouple from its neighbors. The same happens when most of the energy is at site 3, when w C (q 3 − q 2 ) and w R (q 3 − q 4 ) oscillate rapidly. And when both | p 2 | and | p 3 | are large and much larger than | p 1 | and | p 4 |, the forces w L (q 2 − q 1 ) and w R (q 3 − q 4 ) are highly oscillatory, so that the central two rotors almost decouple from the outer ones (the force w C might or might not oscillate depending on p 2 and p 3 ).
This asymptotic decoupling is the interesting feature of the model: in principle, if the central rotors do not recover sufficiently fast from thermal fluctuations, the energy of the chain could grow (in expectation value) without bounds. On the other hand, when their energy is large, the decoupling phenomenon should make the central rotors less affected by the fluctuations of the heat baths. Our results imply that both effects combine in a way that prevents overheating. See [6, Remark 3.10] for a quantitative discussion of these two effects for a chain of three rotors. See also [17] for a clear exposition of the overheating problem in a related model. Figure 2 illustrates the evolution 1 of the momenta at two different time scales, starting with p(0) = (50, 20,30,40). The upper graph shows that indeed p 1 and p 4 decrease very fast, and the lower graph indicates that p 2 and p 3 remain large for a significantly longer time, but eventually also decrease. Since for this initial condition p 3 is larger than p 2 , the force w R oscillates faster than w L . Therefore, p 3 couples less effectively to the outer rotors (where the dissipation happens) than p 2 , and hence p 3 decreases more slowly.
If one were to look at these trajectories for much longer times, one would eventually observe some fluctuations of arbitrary magnitude, followed by new recovery phases. But large fluctuations are very rare.
Since the system is rapidly driven to small p 1 , p 4 , it is really the dynamics of ( p 2 , p 3 ) that plays the most important role. We will often argue in terms of the 8-dimensional dynamics projected onto the p 2 p 3 -plane. We illustrate some trajectories in this plane for several initial conditions in Fig. 3. To make the illustration readable, we used a very small temperature, so that the picture is dominated by the deterministic dynamics.
The typical trajectory is as follows. Starting with some large | p 2 | and | p 3 |, the slower of the two central rotors is damped faster than the other, so that the projection drifts rapidly towards one of the axes. This leads to a regime where only one of the central rotors is fast, while the other is essentially thermalized. The energy in this fast rotor is gradually dissipated, so that the orbit follows the axis towards the origin.
The behavior that we observe in Fig. 3 around the diagonal p 2 = p 3 far enough from the origin is easily explained: in the "center of mass frame" of the two central rotors, we simply see two interacting rotors that oscillate slowly in opposition, while being almost decoupled from the outer rotors. More precisely, introducing Q = q 3 − q 2 ∈ T and P = p 3 − p 2 ∈ R, we see that (Q, P) acts approximately as a mathematical pendulum with potential 2W C , plus some rapidly oscillating (and therefore weak) interactions with the outer rotors:Q Typically, if at first the energy in the center of mass frame is not large enough to make a "full turn," Q oscillates slowly around a minimum of W C , which corresponds to a back-and-forth exchange of momentum between 2 and 3, and explains the strips that we observe around the diagonal. The two central rotors are then gradually slowed down, until at some point the interaction with the external rotors tears them apart.
The picture in the absence of noise (that is, when T 1 = T 4 = 0, which is not covered by our assumptions) is quite different, due to some resonances. We discuss their nature in Appendix A. These resonances are washed away by the noise, and are therefore not visible here. They nevertheless play an important role in our computations, as we will see.

Strategy.
In order to obtain rigorous results about the dynamics and construct a Lyapunov function, we will apply specific methods to each regime described above. We present them here in increasing order of difficulty.
-When a significant part of the energy is contained in the outer rotors, then as discussed above, the momenta of the two outer rotors essentially decrease exponentially fast. In this region, the Lyapunov function will be e θ H , and we will show that when p 2 1 + p 2 4 is large enough and θ < min (1/T 1 , 1/T 4 ), then Le θ H −e θ H (Lemma 4.1). -When most of the energy is contained at just one of the central sites, namely at site j = 2 or j = 3, we will show that Lp j ∼ −p −3 j when averaged appropriately (Proposition 2.2). This corresponds to the neighborhood of the axes in Fig. 3. This case is essentially treated as in [6]. In this region, we use a Lyapunov function ). -When both | p 2 | and | p 3 | are large and hold most of the energy, we do not approximate the dynamics of p 2 and p 3 separately, but we consider instead the "central" We show that when averaged properly, L H c ∼ −p −2 2 − p −2 3 (Proposition 3.2). The Lyapunov function in this region is V c ∼ H c e θ H c , and we show (Proposition 3.5) that LV c −V c /H c . Showing that L H c ∼ −p −2 2 − p −2 3 is the most difficult part of our proof. The averaging of the rapidly oscillating forces will prove to be insufficient due to some resonances, which manifest themselves for some rational values of p 3 / p 2 . We will consider separately the vicinity of the p 2 = p 3 diagonal, which is easy to deal with (Lemma 3.7), and the case where | p 3 − p 2 | is large, which requires substantially more work (Sect. 3.3). In the latter case, we will use the rapid thermalization of the external rotors in order to eliminate the resonant terms. The factors 1/ p 2 2 and 1/H c in LV j −V j / p 2 j and LV c −V c /H c are the cause of the logarithmic contribution in (1.7), which leads to the subexponential convergence rate.
The final step (Sect. 4) is to combine e θ H , V 2 , V 3 and V c (which each behave nicely in a given regime) to obtain a Lyapunov function V that behaves nicely everywhere and satisfies the conclusions of Theorem 1.4.

The domains.
Following the discussion above, we decompose Ω into several sub-regions. This decomposition only involves the momenta, and not the positions. All the sets in the decomposition are defined in the complement of a ball B R of (large) radius R in p-space: For convenience, we consider only R ≥ √ 2 (see Remark 1.6). We also use (large) integers k, , and m which will be fixed in Sect. 4, and we assume throughout that 1 ≤ k < < m. (1.8) The first regions we consider are along the p 2 and p 3 axes: (1.9) The region Ω 2 (resp. Ω 3 ) corresponds to the configurations where most of the energy is concentrated at site 2 (resp. 3). The next region corresponds to the configurations where most of the energy is shared among the sites 2 and 3: (the conditions p 2 3 > p 2 2 and p 2 2 > p 2 3 ensure that both | p 2 | and | p 3 | diverge sufficiently fast when p → ∞ in Ω c ). These regions are illustrated in Figs. 4 and 5. Note that Ω 2 , Ω 3 , Ω c do intersect and do not cover Ω. However, for R large enough, the set Ω 2 ∪Ω 3 ∪Ω c ∪ B R contains the p 2 p 3 -plane (more precisely, the product of T 4 and some neighborhood of the p 2 p 3 -plane in momentum space), which is where the determining part of the dynamics lies, as discussed above.
for all m ≥ m and R ≥ R. This allows us to increase k, m and R as needed (but not ). We also observe immediately that for all k, , m, and for j = 2, 3, (1.12)

Notations.
Since averaging functions that rapidly oscillate in time will play an important role, we introduce the q i -average f i = 1 For any function f : T → R satisfying f = 0, one can find a unique integral F : T → R such that F = f and F = 0. More generally, we write f [ j] for the j th integral of f that averages to zero.
Without loss of generality, we fix the additive constants of the potentials so that We also introduce two "effective dissipation constants": where the positivity follows from Assumption 1.2. Note also that because of (1.13), there is no indeterminate additive constant in the α j . Finally, throughout the proofs, c denotes a generic positive constant that can be each time different. These constants are allowed to depend on the parameters and functions at hand, but not on the position x. We sometimes also use c to emphasize that the constant has changed.

When Only One of the Central Rotors is Fast
We consider the regime where either | p 2 | or | p 3 | (but not both) is much larger than all other momenta. The estimates for this regime are simple adaptations from [6], but we recall here the main ideas.
We start with some formal computations, thinking in terms of powers of p 2 (resp. p 3 ) only. Then, we will restrict ourselves to the set Ω 2 (k, R) (resp. Ω 3 (k, R)) for some large enough k and R, so that the other momenta are indeed "negligible" (see Lemma 2.3) compared to p 2 (resp. p 3 ).

Averaging with one fast variable.
Assume that | p 2 | is much larger than the other momenta. We think in terms of the following fast-slow decomposition: the variables q 1 , q 3 , q 4 and p evolve slowly, while q 2 evolves rapidly, sinceq 2 = p 2 , and p 2 is large. In this regime, the variable q 2 swipes through T many times before any other variable changes significantly. The dynamics for short times is We consider an observable f : Ω → R and let g be defined by Under the approximation (2.1), the quantity g(x(t)) oscillates very rapidly around its q 2 -average g 2 , which is a function of the slow variables q 1 , q 3 , q 4 and p. We therefore expect the effective equation L f ≈ g 2 to describe the evolution of f over several periods of oscillations, and we now show how to give a precise meaning to this approximation.
Although the stochastic terms (which appear as the second-order part of the differential operator L) appear in the computations, they do not play an important conceptual role in this discussion; the rapid oscillations that we average are of dynamical nature and are present regardless of the stochastic forcing exerted by the heat baths.
The generator of the dynamics (2.1) is simply Decomposing the generator L defined in (1.3) as L = L 2 + (L − L 2 ) and considering powers of p 2 , we view L 2 as large, and L − L 2 as small. Note that for all smooth h : Ω → R, we have L 2 h 2 = p 2 ∂ q 2 h 2 = 0 by periodicity, so that the image of L 2 contains only functions with zero q 2 -average. Consider next the indefinite integral G = (g− g 2 )dq 2 (we choose the integration constant C(q 1 , q 3 , q 4 , p) to our convenience). By construction, we have By subtracting the "small" counterterm G/ p 2 from f , we have managed to replace g with its q 2 -average in the right-hand side, plus some "small" correction. This procedure is what we refer to as averaging with respect to q 2 , and it makes sense only in the regime where | p 2 | is very large. If g 2 = 0 and (L 2 − L)(G/ p 2 ) is still oscillatory, the procedure must be repeated.

Application to the central momenta.
We now apply this averaging method to the observable p 2 , in the regime where | p 2 | is very large. By the definition of L, we find . Thus, in the notation above, Observe that the right-hand side of (2.6) is still oscillatory, but now with an amplitude of order 1/ p 2 , which is much smaller than the amplitude of (2.4) when | p 2 | is large. Furthermore, the right-hand side of (2.6) has zero mean, since w C 2 = w L 2 = 0 and by periodicity. In order to see a net effect, we need to average again. We consider now the observable f = p (1) 2 , and apply the same procedure. Instead of averaging the righthand side of (2.6) in one step, we first deal only with the terms of order −1 in p 2 , by introducing (2.7) We postpone further computations to the proof of Proposition 2.2 below, and explain here the main steps. We will see that Lp (2) 2 consists of terms of order −2 and −3 (by construction, the contribution of order −1 disappears). The terms of order −2 have mean zero, and will be removed by introducing a new variable p (3) 2 . We will then find that Lp (3) 2 contains terms of order −3 and −4. To replace the terms of order −3 with their average (which is finally non-zero), we will introduce a function p (4) 2 . This will complete the averaging procedure.
We illustrate in Fig. 6 the time-dependence of p 2 , p (1) 2 and p (2) 2 (slightly shifted for better readability). 2 Clearly, the oscillations of p (1) 2 are much smaller than those of p 2 , and we barely perceive the oscillations of p (2) 2 , since they are smaller than the random fluctuations.
Before we state the result of this averaging process, we introduce a convenient notation for the remainders. 2 The irregularity of the envelope of p 2 in Fig. 6 is due to the randomness of the phases of the two oscillatory forces w L and w C : they sometimes add up, and sometimes compensate each other. Note also that the trajectory of p (2) 2 is rougher than the other two, since the definition of p (2) 2 involves p 1 , which is directly affected by the stochastic force. (2.8) The analogous notation O 3 will be used when | p 3 | is large, and with a polynomial z( p 1 , p 2 , p 4 ).
This notation reflects the fact that when most of the energy is at site 2 (resp. 3), one can forget about the dependence on p 1 , p 3 , p 4 (resp. p 1 , p 2 , p 4 ), provided that it is at most polynomial (by the compactness of T 4 , the position q is irrelevant). For example, the term ( . It is easy to realize that the O j , j = 2, 3, follow the same basic rules as the usual O.
Proposition 2.2. There are functionsp 2 andp 3 of the form such that for j = 2, 3, where α 2 > 0, α 3 > 0 are defined in (1.14). Furthermore, (2.12) Proof. It suffices to consider the case j = 2. The variablep 2 is constructed as in [6]. We continue the averaging procedure started above. It is easy to check that Lp (2) 2 can be written as Since it is a total derivative, the term of order −2 has zero q 2 -average, and by introducing p One can then average the terms of order −3 in (2.13). We have again ∂ q 2 R 2 2 = 0 by periodicity, and after integration by parts we find (for the signs, recall that W L = W L (q 2 − q 1 ) and W C = W C (q 3 − q 2 )). By adding appropriate counterterms (not written explicitly), we obtain a function p The first term in the right-hand side is the one we are looking for, and we deal with the other term of order −3 (which is non-zero) as follows. We observe that since W L W C 2 is a function of q 1 , q 3 only. We then set and obtain (2.11). It is immediate by the construction ofp 2 that (2.12) holds.
We now introduce a lemma, which says that remainders of the kind O j (| p j | −r ), j = 2, 3, can be made very small on Ω j (k, R), provided that the parameters k, R are large enough.
Proof. We prove the result for j = 2. By Definition 2.1 and (1.11), there is a polynomial z such that for all large enough R and all k, where the second inequality is immediate for sufficiently large N , the third inequality comes from the definition of Ω 2 , and the fourth inequality holds because | p 2 | is bounded away from zero on Ω 2 (k, R). Recalling (1.11), we obtain the desired result when k is large enough so that 2N k − r < − r 2 . We now construct partial Lyapunov functions in the regions Ω 2 and Ω 3 . (2.14) with thep j of Proposition 2.2, and F 2 , . Then, there are constants C 1 , C 2 , C 3 > 0, independent of a ∈ (0, 1), such that for all sufficiently large k and R, we have for j = 2, 3 the following inequalities on Ω j : Proof. By symmetry, it suffices to prove the result for j = 2. In this proof, we do not allow the O 2 to depend on a ∈ (0, 1) (that is, we want the bound (2.8) to hold uniformly in a ∈ (0, 1)). We start by proving (2.15). For large enough R, we have that | p 2 | > 2 on Ω 2 . Moreover, 2 ), we have by Lemma 2.3 that for large enough k, R, it holds on Ω 2 that Moreover, since both |p 2 | and | p 2 | are > 1, (2.9) implies, for all a ∈ (0, 1), . 3 The role of the contribution |p j | a is to facilitate the patchwork that will lead to a global Lyapunov function in Sect. 4. The corrections involving F 2 and F 3 help average some W 2 L and W 2 R that appear in the computations. Without this correction, we would need a condition on θ that is more restrictive than the natural condition θ < min(1/T 1 , 1/T 4 ).
Since W L and W C are bounded, it follows from Lemma 2.3 that we can bound the right-hand side by a constant, so that we find where we have used thatp 2 2 ) obtained is indeed uniform in a. Next, one can verify that uniformly in a ∈ (0, 1) and |p 2 | > 1, Moreover, by (2.12) we have b=1, But then Using the definition of α 2 in (1.14) and the condition on θ , we find that −α 2 θ + θ 2 γ 1 T 1 W 2 L is negative. Using then Lemma 2.3 to make the O 2 ( p −1 2 ) very small, and combining the result with (2.18) completes the proof.

When Both Central Rotors are Fast
We now study the regime where both | p 2 | and | p 3 | are large (not necessarily of the same order of magnitude), and | p 1 | and | p 4 | are much smaller. We then have two fast variables: q 2 and q 3 . As we will see, this will lead to some trouble related to resonances, and averaging the rapid oscillations will not be enough. We start with some formal computations thinking in terms of powers of p 2 and p 3 , and then restrict ourselves to the set Ω c ( , m, R) for some appropriate parameters.

Averaging with two fast variables: resonances.
Now the fast-slow decomposition is as follows: q 1 , q 4 and p are the slow variables, and q 2 , q 3 are the fast variables, with the approximate dynamics (for short times) (3.1) generated by L 2 + L 3 = p 2 ∂ q 2 + p 3 ∂ q 3 , which we see as the most important contribution in L. Let again f, g : Ω → R and assume that We would like, as above, to add a correction to f in the left-hand side in order to replace g with its average in the right-hand side. However, since the fast motion of (q 2 , q 3 ) on T 2 [in the dynamics (3.1)] follows orbits that are open or closed depending on whether p 2 and p 3 are commensurable or not, there seems to be no natural notion of "average of g" that is continuous with respect to the slow variables.
Consider for example g(x) = sin(2q 2 −q 3 ). In our approximation, sin(2q 2 (t)−q 3 (t)) oscillates with frequency (2 p 2 − p 3 )/2π . The average is zero when p 3 = 2 p 2 , and sin(2q 2 (t) − q 3 (t)) remains constant when p 3 = 2 p 2 . When p 3 is close to 2 p 2 , the oscillations are slow, and one cannot simply average sin(2q 2 (t)−q 3 (t)). More generally, any smooth function g on Ω can be written as n,m∈Z a n,m sin(nq 2 + mq 3 + ϕ n,m ) for some coefficients a n,m and ϕ n,m which depend on the slow variables q 1 , q 4 and p. Each such term gives rise to problems close to the line p 3 / p 2 = −n/m in the p 2 p 3 -plane.
However, if g depends on q 2 but not on q 3 , then no problem appears. In the approximation (3.1), the quantity g(x(t)) then oscillates rapidly around g 2 , which is then a function of the slow variables q 1 , q 4 and p. Then, as in Sect. 2.1, we use G = (g − g 2 )dq 2 (we choose the integration constant independent of q 3 ), so that ( , which has the desired form. Similarly, if g depends on q 3 but not on q 2 , we use the counterterm G/ p 3 with G = (g − g 3 )dq 3 . And of course, if g can be decomposed as the sum of a function not involving q 3 and a function not involving q 2 , then we can average each part separately and sum the two counterterms.
It turns out that we will mostly encounter terms that depend only on one of the fast variables, and are therefore easy to average. We will go as far as possible averaging such terms, and then introduce a method to deal with the resonant terms (involving both q 2 and q 3 ) that appear.

Application to the central energy.
As a starting point, we use the central energy Unless explicitly stated otherwise, we take A = A * .
We state the main result of this section.

Proposition 3.2. There is a function of the form
with α j as defined in (1.14). Furthermore, In order to reduce the length of some symmetric formulae, we use the notation "+ ⇔" as a shorthand for the other half of the terms with the indices exchanged as follows: 1 ⇔ 4, 2 ⇔ 3, L ⇔ R, and the sign of w C changed (due to the asymmetry of the argument q 3 − q 2 of W C ).
In order to prepare the proof of Proposition 3.2, we proceed as follows. We first see that Since w L does not involve q 3 and w R does not involve q 2 , it easy to find appropriate counterterms: we introduce and obtain The terms of order 1/ p 2 do not depend on q 3 and have mean zero with respect to q 2 (again w L W L = ∂ q 2 W 2 L /2 has zero q 2 -average by periodicity). Similarly, the terms in 1/ p 3 do not involve q 2 and average to zero with respect to q 3 . Therefore, we introduce a next round of counterterms: The terms in the first line are easy to eliminate, since each one depends on only one of the fast variables and averages to zero. The terms γ 1 w L W [ are the ones we are looking for, since after integrating by parts, we find γ 1 w L W [1] L 2 = −γ 1 W 2 L 2 = −α 2 and γ 4 w R W [1] R 3 = −γ 4 W 2 R 3 = −α 3 . The two "resonant" terms involving W L w C and W R w C are more problematic and we leave them untouched for now. By introducing the appropriate counterterms (which we do not write explicitly), we obtain a function H In order to obtain (3.3), we must get rid of the two "mixed" terms involving W L w C and W R w C , which are of the same order as the dissipative contributions involving α 2 and α 3 . Since they each depend on both q 2 and q 3 , these terms are not easy to get rid of, due to the resonance phenomenon discussed above. In fact, as discussed in Appendix A, these resonances have a physical meaning. Their effect becomes clearly visible when T 1 = T 4 = 0 (which is not covered by our assumptions): they alter the dynamics in the p 2 p 3 -plane, but do not prevent H c from decreasing in average. We postpone to Sect. 3.3 the construction of the counterterms that will eliminate these resonant terms.
We introduce next two technical lemmata and an application of Proposition 3.2. The following lemma is analogous to Lemma 2.3.
where we choose N large enough and use the definition of Ω c . By (1.12), we conclude that the desired result holds for m large enough so that 2 N m − r < − r 2 .

Lemma 3.4. Let f
Proof. We apply Young's inequality in the form x y ≤ x a + y b with a = z 1 +z 2 This, and the definition of O c , complete the proof.
As a consequence of Proposition 3.2 we have: with the H c of Proposition 3.2 and F 2 , F 3 as in Proposition 2.4. Let > 1 be a fixed integer. Then, there are constants C 4 , C 5 , C 6 > 0 such that for all large enough m and R, the following inequalities hold on Ω c (m, , R): Proof. We first prove (3.8). By (3.2), the boundedness of the potentials, and Lemma 3.3, we have for m, R large enough that on Ω c , In addition, if m, R are large enough, p 2 2 + p 2 3 is large on Ω c , so that the first part of (3.10) implies that c( p 2 2 + p 2 3 ) < H c < c ( p 2 2 + p 2 3 ). This and (3.10) imply (3.8).
We next prove (3.9). Define f (s) = se θs . By Proposition 3.2, (3.11) Now observe that for any C ∈ R, we have But then, by (3.11) and Lemma 3.4, we find that As in the proof of Proposition 2.4, the corrections involving F 2 and F 3 replace the oscillatory terms W 2 L and W 2 R with their averages: Therefore, by the definition (1.14) of α j and the condition on θ , we have Finally, by Lemma 3.3, and using that ( We now return to the proof of Proposition 3.2. We need to find some counterterms to eliminate the mixed terms in (3.7). For this, we use a subdivision of A * = {x ∈ Ω : p 2 = 0, p 3 = 0} into 3 disjoint pieces, as shown in Fig. 7: By construction, A 1 is close to the diagonal p 2 = p 3 , A 3 is far from it, and A 2 is some transition region.

Proof. Trivially, (i) holds because on
To obtain (ii), observe that either p 2 and p 3 have the same sign and by the definition of In both cases, we have the desired bound.
We first work on A 1 ∪ A 2 . In this region, p 2 and p 3 are close to each other, and are both large in absolute value. It is then easy to find a counterterm for p 1 W L w C / p 2 2 and p 4 W R w C / p 2 3 . Indeed, W L and W R oscillate very rapidly (the respective frequencies are approximately p 2 /2π and p 3 /2π ), while w C oscillates only "moderately", with frequency ( p 3 − p 2 )/2π . One can then simply average the rapidly oscillating part, and obtain Proof. We have for the first term: where the last equality uses Lemma 3.6 (i). A similar computation for the second term completes the proof.
The counterterm R 12 works well on A 1 ∪ A 2 because | p 3 − p 2 | is small compared to p 2 and p 3 . We now have to find a counterterm R 23 that works on A 2 ∪ A 3 and then patch the two counterterms together on A 2 . We state the properties of the counterterm R 23 in the following lemma, but postpone its construction to Sect. 3.3.

Lemma 3.8. There is a function R
and Assuming that Lemma 3.8 is proved, we next join the two counterterms R 12 and R 23 by a smooth interpolation on A 2 in order to prove Proposition 3.2.

Proof of Proposition 3.2.
We introduce a smooth function : R ∪ {−∞, ∞} → [0, 1] such that (x) = 1 when |x| ≤ 1 and (x) = 0 when |x| ≥ 2. We then consider the function which is well-defined and smooth on the set Moreover, it is equal to 1 on A 1 , and 0 on A 3 . We now omit the arguments and simply write instead of (3.14). Using Lemmas 3.7 and 3.8, we obtain Observe next that But then, by (3.15) and using that R 12 We set now

Fully decoupled dynamics approximation.
We construct here the counterterm R 23 of Lemma 3.8, which eliminates the two resonant terms − p 1 W L w C / p 2 2 and p 4 W R w C / p 2 3 on A 2 ∪ A 3 when both | p 2 | and | p 3 | are large. In this regime, all three interaction forces w L , w C , w R oscillate rapidly (since | p 2 |, | p 3 − p 2 | and | p 3 | are all large) and we expect the dynamics to be well approximated by the following decoupled dynamics, where all the interaction forces are removed. Definition 3.9. We call decoupled dynamics the SDE with generatorL and denote byĒ x the corresponding expectation value with initial condition x ∈ Ω.
We will construct two functions U 1 , U 4 such thatLU 1 = p 1 W L w C andLU 4 = − p 4 W R w C . Then, we will introduce a change of variable x →x(x) such thatx approximately obeys the decoupled dynamics, so that L(U 1 (x)) ≈ p 1 W L w C and L(U 4 (x)) ≈ − p 4 W R w C in the regime of interest. Finally, we will show that the choice R 23 (x) = U 1 (x)/ p 2 2 + U 4 (x)/ p 2 3 satisfies the conclusions of Lemma 3.8. The decoupled dynamics can be integrated explicitly for any initial condition x = (q 1 , . . . , p 4 ) ∈ Ω. For the outer rotors b = 1, 4, we have (3.20) and for the central ones ( j = 2, 3) we simply have p j (t) = p j , q j (t) = q j + p j t (mod 2π), (3.21) which is deterministic. We decompose the variables between the central and external rotors as Under the decoupled dynamics, the two processes x e (t) and x c (t) are independent and x c (t) is deterministic. Moreover, under the decoupled dynamics, x e (t) has the generator and admits the invariant probability measureπ e on (T × R) 2 given by where Z is a normalization constant (recall that T 1 , T 4 > 0 by assumption). Definition 3.10. We denote by S the set of functions f ∈ C ∞ (Ω, R) for which the norm is finite. We denote by S 0 the subspace of functions f ∈ S for which We will later consider f = p 1 W L w C and f = −p 4 W R w C , which are manifestly in S 0 . Lemma 3.11. There are constants C * , c * > 0 such that for all f ∈ S 0 , all x ∈ Ω, and all t ≥ 0, Proof. As mentioned, x e (t) and x c (t) are independent under the decoupled dynamics. Introducing the expectation valueĒ e with respect to the process x e (t) under the decoupled dynamics, we obtain that for any function f on Ω, where x c (t) is (deterministically) given by (3.21).
The process x e (t) under the decoupled dynamics is exponentially ergodic, with the unique invariant measureπ e defined above. Indeed, one can check explicitly that this measure is invariant, and introducing the Lyapunov function V e (x e ) = 1 + p 2 1 + p 2 4 , we easily obtain thatL e V e ≤ c − cV e . It follows from [23, Theorem 6.1] 4 that there are two constants C * , c * > 0 such that for any function g : x e g(x e (t)) −π e (g)| |g(x e ) −π e (g)| Let now f ∈ S 0 . For any fixed v ∈ (T × R) 2 , we apply (3.25) to the function g v (x e ) = f (x e , v). Since f ∈ S 0 , we haveπ e (g v ) = 0. Therefore, for any t ≥ 0, This holds for all v, and in particular for v = x c (t). Therefore, by (3.24), we have the desired result.
(3.27) Proposition 3.12. Let f ∈ S 0 be a function such that for all multi-indices a, we have ∂ a f ∈ S 0 , and let

28)
Then: (i)K f and its derivatives of all orders are in S.
Proof. By Lemma 3.11, the integral (3.28) converges absolutely for all x and we havē K f ∈ S. We now prove the result about the derivatives. By (3.20) and (3.21), we can write where the h i j are deterministic functions of t only that grow at most linearly (namely 0, 1, e −γ b t , (1 − e −γ b t )/γ b and t). We then have For the derivatives of order n, we find by induction where the sum is taken over all (i 1 , . . . , i n ) ∈ {1, 2, . . . , 8} n . Since by assumption ∂ i 1 ,...,i n f ∈ S 0 , we have by Lemma 3.11 that But then, by (3.30), we have Since the h i j grow at most linearly, the time-integrals in the right-hand side converge. Therefore,K f is C ∞ and (i) holds. For the second statement, we observe that where we have used that lim t→∞ E x f (x(t)) = 0 by (3.26).
Remark 3.13. The proof of Proposition 3.12, and in particular (3.29), relies on the linear nature of the decoupled dynamics. If we add constant forces τ 1 and τ 4 at the ends of the chain (as in [6]), the method above applies with little modification, and with the replacements , in the invariant measureπ e . However, if we add pinning potentials of the kind U (q i ), the decoupled dynamics cannot be solved explicitly, and we do not have (3.29) for some deterministic functions h i j (t). Although we believe there exists an analog of Proposition 3.12 in that case, we are currently unable to provide it. The situation is even worse in the simultaneous presence of constant forces and pinning potentials. In that case, the expression ofπ e is not known [14], which makes it difficult to decide whether a given function is in S 0 . (Of course, although there is no difficulty there, the averaging of p 2 , p 3 and H c also needs to be adapted to accommodate for such modifications of the model.) We now have an inverse ofL on a given class of functions. We next use it to find an approximate inverse of L. The key is to introduce a change of variablesx = (q 1 ,p 1 , . . . ,q 4 ,p 4 ) such that for nice enough functions f , it holds that L( f (x)) ≈ (L f )(x) in the regime of interest. Here and in the sequel, it is always understood that x is viewed as a function of x. We compare the actions of L andL in Lemma 3.14. We state this lemma with the notation (3.27), and write generically In our case, only σ 5 and σ 8 , which correspond to the variables p 1 and p 4 , are non-zero.
Lemma 3.14. Consider a change of coordinates x →x(x) = x + s(x), defined on some set Ω 0 ⊂ Ω. Assume that for all j, for some ε j . Then, for any smooth function h, we have for all x ∈ Ω 0 that Proof. We do the computation for the case of just one variable x ∈ R. Let g(x) =x(x) = x + s(x). From the definition of L andL, and since by assumption Lg =b • g + ε, we find The desired result follows from generalizing to the multivariate case.
We consider now the following change of variables defined on A 2 ∪ A 3 : (3.33) with analogous expressions for the indices 3, 4. Here, we have used Lemma 3.6 (ii) to While one could choose a more refined change of variables by going to higher orders, the change (3.33) is good enough for our purpose.

Lemma 3.15. Let f ∈ S. Then f is O c (1). Moreover, given any function
which is indeed a O c (1) on this set. The claim about f (x) follows from the choice ξ ≡ 1.
Proof. We use again the notations x = (x 1 , . . . , x 8 ) = (q 1 , . . . , q 4 , p 1 , . . . , p 4 ) and (3.31). We apply Lemma 3.14 with the coordinate changex = x + s(x) defined by (3.33). Then, the s j are given by (3.33), and the ε j are given by (3.34). Observe then that on A 2 ∪ A 3 , all the s j and ε j and are at most The only non-zero σ i are σ 5 = γ 1 T 1 and σ 8 = γ 4 T 4 . Moreover, ∂ x 5 s j = ∂ p 1 s j = 0 for all j ∈ {1, 2, . . . , 8}, and similarly ∂ x 8 s j = ∂ p 4 s j = 0. Therefore, from (3.32) we are left with ζ(x) = j (∂ j h)(x)ε j (x). We now apply this to the function h =K f . By Proposition 3.12, we haveLh = f , so that (3.35) To obtain the desired results, it remains to make the following two observations. First, by the mean value theorem, there is for each x some ξ(x) ∈ [0, 1] such that on A 2 ∪ A 3 , where we have applied Lemma 3.15 to ∂ j f , which is in S by assumption. Secondly, using Lemma 3.15 and the fact that ∂ j h ∈ S by Proposition 3.12, we find which, together with (3.35) and (3.36), completes the proof.
We are now ready for the Proof of Lemma 3.8. Let and That U 1 depends only on (q 1 , . . . , q 3 , p 1 , . . . , p 3 ) follows from the independence of the four rotors under the decoupled dynamics. Similarly for U 4 . It is easy to check that f = p 1 W L w C satisfies the assumptions of Proposition 3.12: Since f 1 = 0, we also have ∂ a f 1 = 0 for each multi-index a. From this it follows thatπ e ( f ) = 0 and that π e (∂ a f ) = 0, sinceπ e is uniform with respect to q 1 . Since no powers of p 1 or p 4 appear upon differentiation, we indeed obtain that f and all its derivatives are in S 0 . A similar argument applies to f = −p 4 W L w C . Therefore, applying Proposition 3.16, we find that on the set A 2 ∪ A 3 , the functions U 1 (x) and U 4 (x) are O c (1), and that In (3.37), the arguments of W L , W R and W C are indeed x and notx. Finally, we have The main assertion of the lemma then follows from this, (3.37), and Lemma 3.4. The assertion (3.13) follows from the definition of R 23 and the following observation: using the explicit expression forx, Proposition 3.12 (i) and Lemma 3.15, we obtain (1), and ∂ p 4 (U 1 (x)) = (∂ p 4 U 1 )(x) = 0 (and similarly for U 4 ).
Remark 3.17. The construction above relies on the strict positivity of the temperatures (which we assume throughout). Nonetheless, it can be adapted to the case T 1 = T 4 = 0.
In this case, the external rotors are not ergodic under the decoupled dynamics: they deterministically slow down and asymptotically reach a given position that depends on the initial condition. Therefore, the conclusion of Lemma 3.11 does not hold. However, the counterterm R 23 that we obtained still produces the desired effect. Indeed, at zero temperature, the definition of U 1 becomes where x(t) is the deterministic solution given in (3.20) and (3.21) with initial condition x and T 1 = T 4 = 0. Since p 1 (t) decreases exponentially fast and W L w C is bounded, this integral still converges. A similar argument applies to U 4 .

Constructing a Global Lyapunov Function
We construct here the Lyapunov function of Theorem 1.4. We start by fixing the parameters defining the sets Ω 2 , Ω 3 , Ω c and the functions V 2 , V 3 , V c . We assume throughout this section that θ is fixed and satisfies This condition is necessary to apply Propositions 2.4 and 3.5. In addition, it guarantees that when p 2 1 + p 2 4 is large, exp(θ H ) decreases very fast: Lemma 4.1. There are constants C 7 , C 8 > 0 such that the result follows from the condition on θ .
We next choose the constants k, , a, m, and finally R. First, we fix k large enough, and require a lower bound R 0 on R so that the conclusions of Proposition 2.4 hold on Ω j (k, R), j = 2, 3. We then fix the parameters a (appearing in V 2 , V 3 ) and such that As a consequence, Ω c ( , m, R) now depends only on m and R, which we fix large enough so that Proposition 3.5 applies, and so that m > and R ≥ R 0 . This choice satisfies the condition 1 ≤ k < < m imposed in (1.8). This ensures that the sets Ω j ( j = 2, 3) and Ω c have "large" intersections, and that they indeed look as shown in Figs. 4 and 5. Moreover, condition (4.2) ensures that for large | p j |, j = 2, 3, which will be crucial. We next introduce smooth cutoff functions for the sets Ω 2 , Ω 3 , Ω c . For this, we consider for each set a thin "boundary layer" included in the set itself.

Definition 4.2.
Let P be a subset of the momentum space R 4 . We define B(P) = {p ∈ P : dist( p, P c ) < 1}. Lemma 4.3. Let P ⊂ R 4 . Then, there is a smooth function ψ : R 4 → [0, 1] with the following properties. First, ψ( p) = 1 on P\B(P) and ψ( p) = 0 on P c , with some interpolation on B(P). Secondly, ∂ a ψ is bounded on R 4 for each multi-index a.
Proof. Such a function is obtained by appropriately regularizing the characteristic function of the set { p ∈ P : dist( p, Since the definition of sets Ω c and Ω j , j = 2, 3, involves only the momenta, we can write Ω c = T 4 × P c and Ω j = T 4 × P j for some sets P c , P j ⊂ R 4 . We apply Lemma 4.3 to P c , P 2 and P 3 , and denote by ψ c , ψ 2 , and ψ 3 the functions obtained. We introduce also the sets Proof of Theorem 1.4. We show that the Lyapunov function has the necessary properties, provided that the constant M is large enough. We start by proving (1.5). From (2.15) and (3.8), we immediately obtain the bound which is slightly sharper than (1.5). We next turn to the bound on LV . We introduce the set G = {x ∈ Ω : p 2 1 + p 2 4  2 on G. Moreover, observe that for j = 2, 3, there is a polynomial z j ( p) such that where the first inequality follows from (2.16) and the second inequality holds because the derivatives of ψ j ( p) have support on B(Ω j ), because |ψ j − 1 Ω j | ≤ 1 B(Ω j ) , and because of (2.15). Similarly, using Proposition 3.5, we obtain a polynomial z c ( p) such that on Ω Combining (4.5), (4.6) and (4.7), we find The first line contains the "good" terms. We next show that these terms dominate the others. Let ε > 0. We claim that there is a (large) compact set K (which depends on ε) such that Since the constants C i do not depend on ε and M, we can make the three parentheses We now show that this implies (1.6). Observe that since V ≥ e θ H , we have log V ≥ θ H , and therefore, for j = 2, 3, Since also −e θ H ≤ −e θ H /(2 + log V ), we obtain by (4.13) that which, by the definition (1.7) of ϕ, proves (1.6).

Proof of Theorem 1.3
Now that we have a Lyapunov function (Theorem 1.4), we can prove Theorem 1.3 in the spirit of [6]. In addition to Theorem 1.4, we need a few other ingredients. We first use the result of [8] about subgeometric ergodicity. We state it here in a simplified form. For a definition of "irreducible skeleton" and "petite set", see the introduction of [8] or [6, Section 2]. Theorem 5.1 (Douc-Fort-Guillin (2009)). Assume that a skeleton of the process (1.2) is irreducible and let V : Ω → [1, ∞) be a smooth function with lim p →∞ V (q, p) = +∞. If there are a petite set K and a constant C such that L V ≤ C1 K − ϕ(V ) for some differentiable, concave and increasing function ϕ : [1, ∞) → (0, ∞), then the process admits a unique invariant measure π , and for any z ∈ [0, 1], there exists a constant C such that for all t ≥ 0 and all x ∈ Ω, Proof. (i) follows from Hörmander's condition. The proof that Hörmander's condition holds, which relies on Assumption 1.2, is very similar to that of Lemma 5.3 of [6] and is left to the reader. The proof of (ii) is exactly as in Lemma 5.6 of [6], and (iii) follows from (i), (ii), and Proposition 6.2.8 of [22].
Thus, we have proved (iii). Since (i) and the smoothness assertion in (ii) follow from Proposition 5.2, the proof is complete. Remark 5.3. It would of course be desirable to generalize Theorem 1.3 to longer chains of rotors. The proof of Proposition 5.2 carries on unchanged to chains of arbitrary length. Therefore, in order to prove the existence of a steady state and obtain a convergence rate (with Theorem 5.1), it "suffices" to find an appropriate Lyapunov function. We expect the convergence rate to be limited by the central rotor (if the length of the chain is odd) or the two central rotors (if the length is even). Preliminary studies indicate that for chains of length n, a convergence rate exp(−ct k ) with k = 1/(2 n/2 − 2) is to be expected. Obtaining such a result raises some major technical difficulties. First, the averaging procedure has to be carried to much higher orders, which quickly becomes intractable if we proceed explicitly, as we do here. Moreover, the number of regimes to consider grows very rapidly with n. And finally, some generalization of Proposition 3.12 to more general (nonlinear) decoupled systems will be needed, with the difficulties mentioned in Remark 3.13. We are trying to solve these issues by developing a inductive method which requires fewer explicit calculations, but much work remains to be done.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

A. Resonances in the Deterministic Case
In Sect. 2, two resonant terms appeared, namely p 1 W L w C / p 2 2 and − p 4 W R w C / p 2 3 . These terms have a physical meaning. We start with the case where W I (s) = − cos(s), I = L, C, R. Then, W L w C = − cos(q 2 − q 1 ) sin(q 3 − q 2 ) = sin(q 1 − q 3 ) 2 + sin(2q 2 − q 3 − q 1 ) 2 . (6.1) Consider now the regime where most of the energy is concentrated at sites 2 and 3. In the approximate dynamics (3.1), we see that sin(q 1 − q 3 ) oscillates with frequency p 3 /2π and mean zero, while sin(2q 2 −q 3 −q 1 ) oscillates with frequency (2 p 2 − p 3 )/2π . When p 3 = 2 p 2 , the second term does not oscillate. In Fig. 8, we represent some trajectories projected onto the p 2 p 3 -plane in the deterministic case (i.e., T 1 = T 4 = 0). We observe that some trajectories are "trapped" by the line p 3 / p 2 = 2, while some others just cross it. By symmetry, the same happens when p 3 / p 2 = 1/2 because of the term − p 4 W R w C / p 2 3 . This phenomenon does not occur when the same conditions are used with positive temperatures (see Fig. 3). A finer analysis (not detailed here) shows that in the resonant regime p 3 / p 2 = 2, a net momentum flux from p 3 to p 2 appears, and similarly for p 3 / p 2 = 1/2 with a flux from p 2 to p 3 . These fluxes stabilize the resonant regimes.
If we take W I (s) = − cos(n I s) for some n I ∈ Z * , I = L, C, R, we find by a decomposition similar to (6.1) some resonances at p 3 p 2 ∈ n C + n L n C , n C − n L n C , n C n C + n R , n C n C − n R . (If some of these values are 0 or ∞, we exclude them since our approximation is reasonable when both | p 2 | and | p 3 | are very large.) For example, if we choose (n L , n C , n R ) = (3, 1, 3), we obtain the ratios p 3 / p 2 = 4, 1/4, −2, −1/2, which we indeed observe in Fig. 9. Of course, a similar analysis applies to more general interaction potentials by taking their Fourier series and treating the (products of) modes separately.