1 Introduction

1.1 Description of the Model

We consider a model for heat conduction consisting of a one-dimensional chain of N coupled oscillators. The evolution is a Hamiltonian dynamics with Hamiltonian

$$\begin{aligned} H(p,q)= \sum _{1\le i \le N} \left( \frac{p_i^2}{2} + U_{\text {pin}}(q_i) \right) + \sum _{i=0}^{N}U_{\text {int}}(q_{i+1}-q_i), \end{aligned}$$

where \((p,q)\) belongs to the phase space \( {\mathbb {R}}^{2N}\) and \(q_0,q_{N+1}\) describe the boundaries, which here are considered to be fixed: \(q_0=q_{N+1}=0\). We denote by \(q=(q_1,\ldots ,q_N) \in {\mathbb {R}}^N\) the displacements of the atoms from their equilibrium positions and by \(p=(p_1,\ldots ,p_N) \in {\mathbb {R}}^N\) their momenta. Each particle is subject to its own pinning potential \(U_{\text {pin}}\) and interacts with its nearest neighbours through an interaction potential \(U_{\text {int}}\). Note that all the masses are equal and we set \(m_i=1\): we consider a homogeneous chain, where both the masses and the potentials acting on each oscillator are the same. The classical Hamiltonian dynamics is perturbed by noise and friction in the following way: the two ends of the chain are in contact with Langevin heat baths at two different temperatures \(T_L, T_R >0 \). The dynamics is then described by the following system of SDEs:

$$\begin{aligned} \begin{aligned} \text{ d }q_i(t)&=p_i(t) \text{ d }t \quad \text {for} \quad i=1,\ldots ,N, \\ \text{ d }p_i(t)&= ( - \partial _{q_i} H) \text{ d }t \quad \text {for} \quad i=2,\ldots ,N-1, \\ \text{ d }p_1(t)&=(-\partial _{q_1}H - \gamma _1 p_1)\text{ d }t + \sqrt{2 \gamma _1 T_L} \text{ d }W_1(t),\\ \text{ d }p_N(t)&=(-\partial _{q_N}H - \gamma _N p_N)\text{ d }t + \sqrt{2 \gamma _N T_R} \text{ d }W_N(t) \end{aligned} \end{aligned}$$
(1.1)

where \(\gamma _1, \gamma _N\) are the friction constants, \(T_L, T_R\) are the two temperatures and \(W_1,W_N\) are two independent standard Wiener processes.
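As an illustration, the system (1.1) can be discretised by an explicit Euler–Maruyama scheme. The sketch below assumes harmonic potentials \(U_{\text {pin}}(q)=aq^2/2\) and \(U_{\text {int}}(r)=cr^2/2\) and the fixed boundaries \(q_0=q_{N+1}=0\); the function name and all parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def simulate_chain(N=8, a=1.0, c=1.0, gamma=1.0, TL=1.2, TR=0.8,
                   dt=1e-3, n_steps=10_000, rng=None):
    """Euler-Maruyama discretisation of the SDE system (1.1).

    Illustrative choice: harmonic potentials U_pin(q) = a q^2/2 and
    U_int(r) = c r^2/2, with fixed boundaries q_0 = q_{N+1} = 0.
    """
    rng = np.random.default_rng(rng)
    q = np.zeros(N)
    p = np.zeros(N)
    for _ in range(n_steps):
        # extended positions (q_0, q_1, ..., q_N, q_{N+1}) with fixed ends
        q_ext = np.concatenate(([0.0], q, [0.0]))
        # -dH/dq_i = -a q_i + c (q_{i+1} - 2 q_i + q_{i-1})
        force = -a * q + c * (q_ext[2:] - 2 * q_ext[1:-1] + q_ext[:-2])
        p_new = p + dt * force
        # Langevin thermostats act on the first and last momentum only
        p_new[0] += dt * (-gamma * p[0]) + np.sqrt(2 * gamma * TL * dt) * rng.standard_normal()
        p_new[-1] += dt * (-gamma * p[-1]) + np.sqrt(2 * gamma * TR * dt) * rng.standard_normal()
        q = q + dt * p   # dq_i = p_i dt (explicit Euler, old momenta)
        p = p_new
    return p, q
```

Long runs of such a simulation are a standard way to probe the non-equilibrium steady state numerically, as in the references cited below.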

The dynamics (1.1) is equivalently described by the following Liouville equation on the law of the process

$$\begin{aligned} \partial _tf = {\mathcal {L}}^* f\quad \text {with}\quad f(0,p,q)=f_0(p,q) \end{aligned}$$
(1.2)

where \({\mathcal {L}}\) is the second order differential operator

$$\begin{aligned} {\mathcal {L}}= \sum _{i=1}^N ( p_i \partial _{q_i}- \partial _{q_i}H \partial _{p_i})- \gamma _1 p_1\partial _{p_1}- \gamma _N p_N\partial _{p_N}+\gamma _1T_L \partial _{p_1}^2+\gamma _NT_R \partial _{p_N}^2 \end{aligned}$$
(1.3)

which is the generator of the semigroup \(P_t\) acting on the space \(C_b^2({\mathbb {R}}^{2N})\) of bounded, real-valued \(C^2\) functions on the phase space. We denote by \({\mathcal {L}}^*\) the generator of the dual semigroup acting on probability measures.

1.1.1 State of the Art

The model described by the SDEs (1.1) was first used to describe heat diffusion and to derive Fourier’s law rigorously (for an overview see [8, 14, 27] and [17]). Since then, it has been the subject of many studies, both from a numerical and from a theoretical perspective. First, the purely harmonic case with several idealised reservoirs at different temperatures was solved explicitly in [35], where the authors found the exact form of the non-equilibrium stationary state: it is Gaussian in the positions and momenta of the system. For the anharmonic chain there are no explicit results in general. However, it has been studied numerically for many different potentials and many kinds of heat baths, including the Langevin heat baths that we consider here. See for instance [2, 20, 28] and references therein.

Two features of this model make its rigorous study very challenging: first, we do not know explicitly the form of the invariant measure of (1.1), and second, the generator is highly degenerate, with dissipation and noise acting only on the two momentum variables at the ends of the chain. It is not difficult to see, though, that in the equilibrium case, i.e. when the two temperatures are equal, \(T_L=T_R=T= \beta ^{-1}\), the stationary measure is the Gibbs–Boltzmann measure \(\text{ d }\mu (p,q)=Z^{-1}\exp (-\beta H(p,q)) \text{ d }p\text{ d }q\), where Z is the normalising constant: an explicit calculation gives \({\mathcal {L}}^* e^{-\beta H(p,q)}=0\).
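The invariance of the Gibbs measure in the equilibrium case can be checked numerically for the harmonic chain: there the dynamics is a Gaussian (Ornstein–Uhlenbeck) process in \(z=(p,q)\), the Gibbs measure has covariance \(\Sigma = T\,\text {diag}(I, B^{-1})\) (with B the tridiagonal coupling matrix introduced in (1.7) below), and invariance reduces to the stationary covariance (Lyapunov) equation \(A\Sigma + \Sigma A^T + Q = 0\). A minimal numpy sketch, with illustrative parameter values:

```python
import numpy as np

def harmonic_blocks(N, a, c, gamma):
    # B as in the tridiagonal matrix (1.7); F is the friction matrix
    B = (a + 2 * c) * np.eye(N) - c * (np.eye(N, k=1) + np.eye(N, k=-1))
    F = np.zeros((N, N)); F[0, 0] = F[-1, -1] = gamma
    return B, F

def check_gibbs_stationary(N=6, a=1.0, c=1.0, gamma=0.7, T=1.3):
    """In equilibrium (T_L = T_R = T), for the harmonic chain, the Gaussian
    Gibbs measure with covariance Sigma = T * diag(I, B^{-1}) in z = (p, q)
    solves the stationary covariance equation A Sigma + Sigma A^T + Q = 0."""
    B, F = harmonic_blocks(N, a, c, gamma)
    I, Z = np.eye(N), np.zeros((N, N))
    A = np.block([[-F, -B], [I, Z]])          # drift: dp = -Bq - Fp, dq = p
    Q = np.block([[2 * T * F, Z], [Z, Z]])    # noise enters only at p_1, p_N
    Sigma = np.block([[T * I, Z], [Z, T * np.linalg.inv(B)]])
    return np.max(np.abs(A @ Sigma + Sigma @ A.T + Q))  # residual, ~ round-off
```

The returned residual vanishes up to round-off, confirming that in equilibrium the Gaussian Gibbs measure is stationary for the harmonic dynamics.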

Since we are interested in the theoretical aspects of the model, we refer to [15, 16], which constitute the first rigorous studies of the anharmonic case. The existence of a steady state has only been obtained in some cases where the potentials behave like polynomials near infinity. In particular, the following assumptions on the potentials are imposed:

$$\begin{aligned} \lim _{\lambda \rightarrow \infty } \lambda ^{-k} U(\lambda q )=a_k |q|^k\ \text {and}\ \lim _{\lambda \rightarrow \infty } \lambda ^{1-k} U'(\lambda q )=k a_k |q|^{k-1} \text {sign}(q) \end{aligned}$$

for constants \(a_k>0\), where \(k \ge 2\) for the interaction and \(k \ge 1\) for the pinning (the exponent k for the pinning was improved in [9]). Assuming moreover that the interaction potential is at least as strong as the pinning, the existence and uniqueness of an invariant measure was first proved in [15] using functional analytic methods: in particular, it was proved that the resolvent of the generator of (1.1) is compact in a suitable weighted \(L^2\) space. Later it was proved in [34], using probabilistic tools, that the rate of convergence to the steady state is exponential. Note that in the above-mentioned papers the coupling of the chain with the heat baths is slightly different and a bit more complicated than the Langevin thermostats, with the following physical interpretation: the reservoirs are modelled by classical linear wave equations whose initial conditions are distributed according to appropriate Gibbs measures at different temperatures, see also [33, Sect. 2]. The difference with the Langevin heat baths is that the dissipation and the noise act on the momenta only indirectly, through some auxiliary variables. Later, an adaptation of a very similar probabilistic proof was provided in [9] for the Langevin thermostats. Finally, let us mention that the relaxation rates have been studied for short chains of rotors with Langevin thermostats in [11, 13].

Regarding the existence and uniqueness of a non-equilibrium stationary state, and exponential convergence towards it, in more complicated (multi-dimensional) networks of oscillators, see [12]. The proofs there are inspired by the above-mentioned works on one-dimensional chains.

There are also cases with no convergence to equilibrium, for instance when \(l > k\), i.e. when the pinning exponent l is larger than the exponent k of the coupling potential, see for example [21, 22]. In [22], under some scenarios included in \(l > k\), the resolvent of the generator fails to be compact and/or there is a lack of spectral gap. In particular, when the interaction is harmonic, 0 belongs to the essential spectrum of the generator as soon as the pinning potential is of the form \(| q |^{k}\) for \(k>3\). The conjecture is that this is true as soon as \(k>\frac{2n}{2n-1}\), where n is the center of the chain.

1.2 Notation

We denote by \(\{e_i\}_{i=1}^n\) the elements of the canonical basis in \({\mathbb {R}}^n\) and by \(| \cdot |\) the Euclidean norm on \({\mathbb {R}}^n\), induced by the usual inner product \(\langle \cdot , \cdot \rangle \). For a square matrix \(A = (a_{ij})_{1 \le i,j \le n} \in {\mathbb {R}}^{n \times n}\), we write \(\Vert A\Vert _2\) for the operator (spectral) norm induced by the Euclidean vector norm:

$$\begin{aligned} \Vert A \Vert _2 = \max _{x \in {\mathbb {R}}^n \setminus \{0\}} \frac{ | Ax|}{|x|} = ( \text {maximum eigenvalue of } A^T A)^{1/2}. \end{aligned}$$

We also write \(A^{1/2}\) for the square root of a positive definite matrix A, i.e. the unique positive definite matrix such that \(A^{1/2} A^{1/2}=A\). Moreover, by \(C_b^{\infty }({\mathbb {R}}^n)\) we denote the space of smooth and bounded functions, and by \(\nabla _z\) the gradient with respect to the z-variables. We write \({\mathcal {P}}_2({\mathbb {R}}^n)\) for the space of probability measures on \({\mathbb {R}}^n\) with finite second moment, i.e.

$$\begin{aligned} {\mathcal {P}}_2({\mathbb {R}}^n) = \Big \{ \rho \in {\mathcal {P}}({\mathbb {R}}^n): \int _{{\mathbb {R}}^n} |x|^2 d \rho (x) < \infty \Big \}. \end{aligned}$$

[N] denotes the set \(\{1,2,\ldots , N \}\) and we use the notation \(g(x) \lesssim {\mathcal {O}}\big (f(x) \big )\) to indicate that there is a dimensionless constant \(C>0\) so that \(|g(x)| \le C |f(x)|\).

1.3 Set Up and Main Results

Let us state two assumptions: one on the boundary conditions of the chain and one on the potentials.

  • (H1) Regarding the boundary conditions, we consider the oscillator chain with rigidly fixed edges: the left boundary of the chain is an oscillator labelled 0 and the right one is an oscillator labelled \(N+1\), under the hypothesis that \(q_0=q_{N+1}=0\). The first and the last particle are pinned with additional harmonic forces, corresponding to their attachment to a wall. Note that these boundary conditions, together with the heat baths modelled by two Ornstein–Uhlenbeck processes at the two ends as explained above, give the same model as in [35]; it is known as the Casher–Lebowitz model, since it is also one of the models considered in [10].

  • (H2) The chain is weakly anharmonic: both pinning and interaction potentials differ from the quadratic ones by perturbing potentials \( U_{\text {pin}}^N, U_{\text {int}}^N \in {\mathcal {C}}^2({\mathbb {R}})\) with bounded Hessians in the following sense:

    $$\begin{aligned} \sup _{\begin{array}{c} q_i \in {\mathbb {R}},\\ i=1,\ldots ,N \end{array} } \Vert \text {Hess}\ U_{\text {pin}}^N(q_i) \Vert _{2} \le C_{pin}^N \quad \text {and}\quad \sup _{\begin{array}{c} r_i \in {\mathbb {R}},\\ i=1,\ldots ,N \end{array}} \Vert \text {Hess}\ U_{\text {int}}^N (r_i)\Vert _{2} \le C_{int}^N \end{aligned}$$
    (1.4)

    where \(r_i:= q_{i+1}-q_i,\ i=1,\ldots ,N\). The positive constants \(C_{pin}^N\), \(C_{int}^N\) scale with the dimension like

    $$\begin{aligned} C_{pin}^N + C_{int}^N \le C_0 N^{-9/2} \end{aligned}$$
    (1.5)

    and \(C_0\) is a dimensionless constant.

Under Assumptions (H1) and (H2) for \(a \ge 0, c>0\), the Hamiltonian takes the form

$$\begin{aligned} H(p,q)= \sum _{i=1}^N \left( \frac{p_i^2}{2} + a \frac{ q_i^2}{2} + U_{\text {pin}}^N(q_i) \right) + \sum _{i=1}^{N-1} \left( c \frac{(q_{i+1}-q_i)^2}{2} + U_{\text {int}}^N( q_{i+1}-q_i) \right) + \frac{c q_1^2}{2}+\frac{c q_N^2}{2} \end{aligned}$$
(1.6)

and denoting by \({\mathcal {L}}\) the infinitesimal generator, we look at the Liouville equation \( \partial _tf = {\mathcal {L}}^* f\), where the generator of the dynamics now is

$$\begin{aligned} {\mathcal {L}} =&p\cdot \nabla _q - q \cdot B \nabla _p - \sum _{i=1}^N (U_{\text {pin}}^N)'(q_i)\partial _{p_i}- \gamma p_1 \partial _{p_1} -\gamma p_N \partial _{p_N} + \gamma T_L \partial _{p_1}^2 + \gamma T_R \partial _{p_N}^2 \\&+ \sum _{i=1}^{N} \Big ( (U_{\text {int}}^N)'(q_{i+1}-q_i) - (U_{\text {int}}^N)'(q_{i}-q_{i-1}) \Big ) \partial _{p_i} \end{aligned}$$

where we take all the friction constants equal, \(\gamma _1=\gamma _N=\gamma \), and we assume that the two temperatures \(T_L,T_R\) satisfy \(T_L=T+\Delta T\) and \(T_R=T- \Delta T\), for some temperature difference \(\Delta T >0\). Also, B is the symmetric tridiagonal (Jacobi) matrix

$$\begin{aligned} B:= \begin{bmatrix} a+2c &{} -c &{} &{} &{} \\ -c &{} a+2c &{} -c &{} &{} \\ &{} \ddots &{} \ddots &{} \ddots &{} \\ &{} &{} -c &{} a+2c &{} -c \\ &{} &{} &{} -c &{} a+2c \end{bmatrix}. \end{aligned}$$
(1.7)
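Since B is a tridiagonal Toeplitz matrix, its spectrum is known in closed form: \(\text {eig}_k(B) = a+2c-2c\cos \left( k\pi /(N+1)\right) \), \(k=1,\ldots ,N\), so B is positive definite whenever \(a \ge 0\) and \(c>0\). A quick numerical sanity check (illustrative values):

```python
import numpy as np

def build_B(N, a, c):
    """The tridiagonal matrix B of (1.7): a+2c on the diagonal, -c off it."""
    return (a + 2 * c) * np.eye(N) - c * (np.eye(N, k=1) + np.eye(N, k=-1))

# Closed-form spectrum of the tridiagonal Toeplitz matrix:
# eig_k(B) = a + 2c - 2c cos(k pi / (N+1)), k = 1, ..., N.
N, a, c = 10, 0.5, 2.0
eigs = np.sort(np.linalg.eigvalsh(build_B(N, a, c)))
k = np.arange(1, N + 1)
closed_form = np.sort(a + 2 * c - 2 * c * np.cos(k * np.pi / (N + 1)))
assert np.allclose(eigs, closed_form)
```

The smallest eigenvalue \(a+2c-2c\cos (\pi /(N+1))\) stays strictly positive (it tends to a as \(N \rightarrow \infty \) when \(a>0\)), which is the discrete-Laplacian information exploited in Sect. 6.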

It is convenient to see the above form of the generator in the following block-matrix form:

$$\begin{aligned} {\mathcal {L}} =- z^T M \nabla _z -\nabla _q \Phi (q) \cdot \nabla _p + \nabla _p \cdot {\mathcal {F}} \Theta \nabla _p \end{aligned}$$
(1.8)

where \(z=(p,q)^T \in {\mathbb {R}}^{2N}\), \(\Phi (q)\) corresponds to the perturbing potentials so that

$$\begin{aligned} \Phi (q)&= \sum _{i=1}^N U_{\text {pin}}^N(q_i) + \sum _{i=1}^{N-1} U_{\text {int}}^N(q_{i+1}-q_i) + U_{\text {int}}^N(q_1) + U_{\text {int}}^N(-q_{N}), \end{aligned}$$

the matrix \({\mathcal {F}}\) is the friction matrix

$$\begin{aligned} {\mathcal {F}} = \text {diag}(\gamma ,0 ,\ldots ,0, \gamma ) \end{aligned}$$

the matrix \(\Theta \) is the temperature matrix

$$\begin{aligned} \Theta = \text {diag}(T_L,0,\ldots , 0,T_R) \end{aligned}$$

and M in blocks is the following

$$\begin{aligned} M = \begin{bmatrix} {\mathcal {F}} &{} -I\\ B &{} 0 \end{bmatrix} \end{aligned}$$
(1.9)

where I is the identity matrix; M corresponds to the transport part of the operator, while B and \({\mathcal {F}}\) correspond to the harmonic part of the potentials and the friction at the two ends, respectively.
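The block matrix M can be assembled directly; a small numerical check (with illustrative parameters) confirms the stability property used below, namely that all eigenvalues of M have positive real part:

```python
import numpy as np

def build_M(N, a, c, gamma):
    """Block matrix M of (1.9) for z = (p, q)."""
    B = (a + 2 * c) * np.eye(N) - c * (np.eye(N, k=1) + np.eye(N, k=-1))
    F = np.zeros((N, N)); F[0, 0] = F[-1, -1] = gamma
    return np.block([[F, -np.eye(N)], [B, np.zeros((N, N))]])

# All eigenvalues of M have strictly positive real part (cf. [25, Lemma 5.1]);
# this is what makes the Lyapunov equation (1.10) below solvable.
M = build_M(12, 1.0, 1.0, 1.0)
assert np.min(np.linalg.eigvals(M).real) > 0
```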

Motivation. This study is motivated by a discussion opened in C. Villani’s memoir on hypocoercivity, see Sect. 9.2 in [40], concerning open questions on the heat conduction model as defined above, and how to approach them by hypocoercive techniques. This chain of coupled oscillators corresponds to a hypocoercive situation, where the diffusion, acting only at the ends of the chain, leads to exponentially fast convergence to the stationary distribution, under the following assumptions on the potentials: strict convexity of the interaction potential (which is stronger than the pinning one) and bounded Hessians for both potentials. In particular, Villani points out that it might be possible to recover the previous results of exponential convergence in the weighted \(H^1(\mu )\)-norm for this class of potentials (different from the potentials assumed in [16], for instance) by applying a generalised version of Theorem 24 in [40]. For that, one needs to know some properties of the non-explicit non-equilibrium steady state \(\mu \): for instance, whether it satisfies a Poincaré inequality, or whether the Hessian of the logarithm of its density is bounded.

Finally we note that entropic hypocoercivity has been applied in [29] in order to develop estimates and to get quantitative convergence results to the limit equation, for anharmonic chains but with thermostats in contact with all the particles along the chain.

Main results. Here, considering a perturbation of the harmonic chain (the homogeneous case), we instead follow an approach that combines hypocoercivity techniques with the Bakry–Émery theory of \(\Gamma \)-calculus and curvature conditions as in [4]. We prove the validity of the Bakry–Émery criterion in a modified setting. This is explained in more detail and implemented in Sect. 3. The idea was inspired by Baudoin [6]: using this combination, Baudoin proved exponential convergence to equilibrium for the kinetic Fokker–Planck equation in the \(H^1\)-norm and in the Kantorovich–Wasserstein distance.

Thus we show, for the dynamics (1.1) as well, exponential convergence to the stationary state in Kantorovich–Wasserstein distance and in relative entropy, and we obtain quantitative rates of convergence in these distances, i.e. information on the N-dependence of the rate. In particular, our estimates show that the convergence rate of the harmonic chain approaches 0, as N tends to infinity, at a polynomial rate of order between \(C_1 /N^{3}\) and \(C_2/N\), and that the rate is at least \(C_3 N^{-3}\) for the weakly anharmonic chain.

In order to quantify the above rates, we estimate \(\Vert b_N\Vert _2\), where \(b_N\) is a block matrix defined in Sect. 3 as the solution of the matrix equation (1.10). Since \(\Vert b_N\Vert _2\) appears in the rates in Theorems 1.4 and 1.6 and in Proposition 1.2, we start by stating this result:

Proposition 1.1

Let \(\Pi _{N} = {\text {diag}}(2T_L, 1, \ldots ,1,2T_R, 1, 1, \ldots , 1,1) \in {\mathbb {R}}^{2N \times 2N}\) and \(M \in {\mathbb {R}}^{2N \times 2N}\) given by (1.9), with pinning and interaction coefficients \(a \ge 0, c>0\). For all \( N \in {\mathbb {N}}\), there exists a unique symmetric positive definite block matrix \(b_{N} \in {\mathbb {R}}^{2N \times 2N}\) such that

$$\begin{aligned} b_{N} M +M^T b_{N} = \Pi _{N}. \end{aligned}$$
(1.10)

Moreover there exists \(C_{a,c} >0\), depending only on the coefficients a, c, such that for all \(N \in {\mathbb {N}}\), \( \Vert b_{N} \Vert _2 \le C_{a,c} N^3 \) and \( \Vert b_N^{-1}\Vert _2 \le C_{a,c}\).
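Equation (1.10) is a Lyapunov equation, so \(b_N\) can be computed with a standard solver. The sketch below, using scipy's solve_continuous_lyapunov with illustrative parameter values, verifies symmetry and positive definiteness and illustrates the growth of \(\Vert b_N\Vert _2\) with N:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def solve_bN(N, a, c, gamma, TL, TR):
    """Solve b_N M + M^T b_N = Pi_N, eq. (1.10)."""
    B = (a + 2 * c) * np.eye(N) - c * (np.eye(N, k=1) + np.eye(N, k=-1))
    F = np.zeros((N, N)); F[0, 0] = F[-1, -1] = gamma
    M = np.block([[F, -np.eye(N)], [B, np.zeros((N, N))]])
    pi = np.ones(2 * N); pi[0] = 2 * TL; pi[N - 1] = 2 * TR   # Pi_N of Prop. 1.1
    # solve_continuous_lyapunov(A, Q) solves A X + X A^T = Q; take A = M^T
    bN = solve_continuous_lyapunov(M.T, np.diag(pi))
    return bN, M, np.diag(pi)

bN, M, Pi = solve_bN(N=10, a=1.0, c=1.0, gamma=1.0, TL=1.2, TR=0.8)
assert np.allclose(bN @ M + M.T @ bN, Pi)                             # solves (1.10)
assert np.allclose(bN, bN.T) and np.min(np.linalg.eigvalsh(bN)) > 0   # b_N > 0
# ||b_N||_2 grows with N (Proposition 1.1: at most like N^3),
# while ||b_N^{-1}||_2 stays bounded
norms = [np.linalg.norm(solve_bN(n, 1.0, 1.0, 1.0, 1.2, 0.8)[0], 2) for n in (5, 10, 20)]
assert norms[0] < norms[1] < norms[2]
```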

Second, we state the following Proposition, which is restricted to the harmonic chain and provides a lower bound on the spectral gap (given the estimates on \(\Vert b_N\Vert _2\) from Proposition 1.1):

Proposition 1.2

(Lower bound on the spectral gap of the harmonic chain) Consider the spectral gap \(\rho \) of the chain described by the generator (1.8) without the perturbing potentials (the harmonic chain), which is given by

$$\begin{aligned} \rho := \sup \{ r >0: (z - {\mathcal {L}})^{-1}\ \text {exists and is bounded for all}\ z\ \text {with}\ -r \le \text {Re}(z) < 0 \}. \end{aligned}$$

Then there exists \(\kappa >0\) such that for all \(N \in {\mathbb {N}}\),

$$\begin{aligned} \rho \ge \kappa N^{-3} . \end{aligned}$$

This lower bound is in fact the optimal rate in the case of the homogeneous harmonic chain: in [7, Proposition 9.1] a matching upper bound is provided, so the scaling of \(\rho \) is exactly \(N^{-3}\). This is done by exploiting the form of the matrix M, (1.9), and more specifically using information on the spectrum of the discrete Laplacian. In [7] we also study the case of disordered chains, by considering a different pinning coefficient for each oscillator. In contrast to the homogeneous case treated here, where the decay is polynomial, in a disordered chain the spectral gap decays at an exponential rate in N. Regarding the adaptation of the generalised Bakry–Émery theory presented in this paper to a non-homogeneous scenario, we can prove the existence of a spectral gap for the weakly anharmonic chain as soon as the matrix M has a spectral gap (which is the case as soon as all the interaction coefficients \(c_i \ne 0\)). The difficulty in a non-homogeneous scenario lies in the second part of the method (as described in Sect. 2): solving the high-dimensional matrix equation (1.10) in order to estimate the spectral norm.
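For the harmonic chain the relaxation rate is governed by the spectrum of the drift matrix M, so the shrinking of the gap can be observed numerically as the distance of \(\text {eig}(M)\) from the imaginary axis (a heuristic check with illustrative parameters; the rigorous statement is Proposition 1.2):

```python
import numpy as np

def harmonic_gap(N, a=1.0, c=1.0, gamma=1.0):
    """Distance of the spectrum of M, eq. (1.9), from the imaginary axis;
    for the harmonic chain this governs the relaxation rate."""
    B = (a + 2 * c) * np.eye(N) - c * (np.eye(N, k=1) + np.eye(N, k=-1))
    F = np.zeros((N, N)); F[0, 0] = F[-1, -1] = gamma
    M = np.block([[F, -np.eye(N)], [B, np.zeros((N, N))]])
    return np.min(np.linalg.eigvals(M).real)

# the gap shrinks as the chain grows, consistent with the N^{-3} scaling
gaps = [harmonic_gap(N) for N in (4, 8, 16, 32)]
assert all(g > 0 for g in gaps)
assert gaps[0] > gaps[1] > gaps[2] > gaps[3]
```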

Remark 1.3

We expect the bound on \(\Vert b_N\Vert _2\) from Proposition 1.1 to be optimal, since the proof of Proposition 1.2 combined with [7, Proposition 9.1] gives that there exists \(c_1>0\) such that

$$\begin{aligned} c_1 N^{-3} \ge \rho \ge \frac{1}{\Vert b_N\Vert _2}. \end{aligned}$$

In the following, we consider \(b_N\) as given by Proposition 1.1. Before we state the first main Theorem, we recall the definition of the Kantorovich–Rubinstein–Wasserstein \(L^2\)-distance \(W_2(\mu , \nu )\) between two probability measures \(\mu , \nu \):

$$\begin{aligned} W_2(\mu ,\nu )^2 = \inf \int _{{\mathbb {R}}^n \times {\mathbb {R}}^n} |x-y|^2 d\pi (x,y) \end{aligned}$$

where the infimum is taken over the set of all couplings, i.e. the joint probability measures \(\pi \) on \( {\mathbb {R}}^n \times {\mathbb {R}}^n\) with first and second marginals \(\mu \) and \(\nu \) respectively.

It is easy to see that \(W_2\) is indeed a metric. We restrict ourselves to the subspace \({\mathcal {P}}_2({\mathbb {R}}^{2N})\), where \(\mu \) and \(\nu \) have finite second moments, so that their distance \(W_2(\mu ,\nu )\) is finite. For more information on this distance we refer the reader, for instance, to [41] and references therein.
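In one dimension the optimal coupling is the monotone rearrangement, so \(W_2\) between two empirical measures with equally many equal-weight atoms is computed by matching sorted samples. A minimal sketch (the helper name is ours, not from the paper):

```python
import numpy as np

def w2_empirical_1d(x, y):
    """W_2 distance between two empirical measures on R with equally many
    equal-weight atoms: in 1d the optimal coupling matches sorted samples."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    return np.sqrt(np.mean((x - y) ** 2))

# monotone (sorted) matching is optimal in one dimension:
assert w2_empirical_1d([0.0, 1.0], [2.0, 3.0]) == 2.0
```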

Theorem 1.4

We consider a chain of coupled oscillators whose dynamics are described by the system (1.1) under Assumptions (H1) and (H2). For a fixed number of particles N, there is a unique stationary state \(f_{\infty }\); in particular, for any two initial data \(f_0^1,f_0^2\) of the evolution equation, we have the following contraction property:

$$\begin{aligned} W_2(P_t^*f_0^1, P_t^* f_0^2) \le C_{a,c} N^{\frac{3}{2}} e^{- \frac{\lambda _0}{N^3}t}\ W_2 (f_0^1,f_0^2) \end{aligned}$$
(1.11)

for \(C_{a,c}, \lambda _0\) dimensionless constants.

Moreover, in the set-up of Theorem 1.4, we obtain some qualitative information about the non-equilibrium steady distribution, namely the validity of a Poincaré inequality and, even better, of a Log–Sobolev inequality:

Proposition 1.5

(Log–Sobolev inequality) Let \({\mathcal {T}}\) be the quadratic form

$$\begin{aligned} {\mathcal {T}}(f,g)= \nabla _zf^T b_N \nabla _zg+ \nabla _z g^T b_N \nabla _zf. \end{aligned}$$

Under Assumption (H2), the unique invariant measure \(\mu =f_{\infty } \) from Theorem 1.4 satisfies a Log–Sobolev inequality \((LSI(C_N))\):

$$\begin{aligned} \int _{{\mathbb {R}}^{2N}} f \log f\ d\mu - \int _{{\mathbb {R}}^{2N}} f\ d\mu \ \log \left( \int _{{\mathbb {R}}^{2N}} f\ d\mu \right) \le C_N \int _{{\mathbb {R}}^{2N}} \frac{{\mathcal {T}}(f,f)}{f} d\mu . \end{aligned}$$
(1.12)

where

$$\begin{aligned} C_N:=\frac{ \gamma T_L\Vert b_N^{-1} \Vert _{2}}{2 \left( {\text {min}}(1,2T_R) \Vert b_N \Vert _{2}^{-1} - (C_{pin}^N + C_{int}^N ) \Vert b_N\Vert _2^{1/2} \Vert b_N^{-1}\Vert _2^{1/2}\right) } \le \gamma T_L C_{a,c}\lambda _0^{-1} N^{3} \end{aligned}$$

where \(\gamma , T_L, C_{a,c}, \lambda _0:= \lambda _0(C_0)\) are all dimensionless constants, and the prefactor \(C_0\) in (1.5) is required to satisfy \(C_0 < {\text {min}}(1,2T_R)C_{a,c}^{-2}. \)

Consequently we have convergence to the non-equilibrium steady state in entropy. Let us first define the following information-theoretic functionals. For two probability measures \(\mu \) and \(\nu \) on \({\mathbb {R}}^{2N}\) with \(\nu \ll \mu \), we define the Boltzmann H functional

$$\begin{aligned} H_{\mu }(\nu )=\int _{{\mathbb {R}}^{2N}} h \log h \ d\mu ,\ \nu =h \mu \end{aligned}$$
(1.13)

and the relative Fisher information

$$\begin{aligned} I_{\mu }(\nu )= \int _{{\mathbb {R}}^{2N}} \frac{| \nabla h |^2}{h} d \mu ,\ \nu =h \mu . \end{aligned}$$
(1.14)

We have entropic convergence in the following sense, as in [40, Sect. 6]:

Theorem 1.6

We consider a chain of coupled oscillators whose dynamics are described by the system (1.1) under Assumptions (H1) and (H2). For a fixed number of particles N, assuming that (i) \(\mu \) is the invariant measure for \(P_t\) and (ii) that it satisfies a Log–Sobolev inequality with constant \(C_N>0\), for all \(f>0\) with

$$\begin{aligned} {\mathcal {E}}(f)< \infty ,\ \text {and}\ \int f d\mu =1, \end{aligned}$$

we have a convergence to the non-equilibrium steady state in the following sense:

$$\begin{aligned} H_{\mu }(P_tf \mu ) + I_{\mu }(P_tf \mu ) \le \lambda _{a,c} N^3 e^{-\lambda _0 N^{-3} t} \Big ( H_{\mu }(f \mu ) + I_{\mu }(f \mu ) \Big ) \end{aligned}$$
(1.15)

for dimensionless constants \(\lambda _{a,c}, \lambda _0\).

From Theorem 1.4 we get an exponential rate at least of order \(N^{-3}\) for the weakly anharmonic chain. In the purely harmonic case, the convergence rate lies between \(C_1 N^{-3}\) and \(C_2 N^{-1}\) for some constants \(C_1, C_2\) independent of N.

Remark 1.7

Note that a generalised version of \(\Gamma \)-calculus has been applied to a toy model of the dynamics (1.1) by Monmarché [31]: working with the unpinned, non-kinetic version, with convex interaction and with the center of mass fixed, he proves the same kind of convergence and obtains explicit and optimal N-dependent rates, of order \({\mathcal {O}}(N^{-2})\), for the overdamped dynamics.

1.4 Plan of the Paper

Sections 2, 3, 4 and 5 concern the proofs of the convergence to the steady state by hypocoercive arguments (applying the generalised Bakry–Emery criterion) while Sect. 6 is devoted to estimating the spectral norm of \(b_N\), which is crucial in the final estimate for the scaling of the spectral gap. In particular, Sect. 2 contains an introduction to Bakry–Emery theory and an explanation of the method that is used. In Sect. 3 we obtain the estimates that lead to the proof of Proposition 1.5. In Sects. 4 and 5 we give the proof of Theorems 1.6 and 1.4 respectively. Finally in Sect. 6 we prove Propositions 1.1 and 1.2.

2 Carré du Champ Operators and Curvature Condition

2.1 Introduction to Carré du Champ Operators

Consider a Markov semigroup \(P_t\) with at least one invariant measure \(\mu \) and infinitesimal generator \(L: D(L) \subset L^2(\mu ) \rightarrow L^2(\mu )\). Here we restrict ourselves to the case of diffusion operators, and we associate with the operator L a bilinear differential form \(\Gamma \), the so-called Carré du Champ operator, defined as follows: for every pair of functions \((f,g)\) in \( C^{\infty }\times C^{\infty }\),

$$\begin{aligned} \Gamma (f,g):= \frac{1}{2} \Big ( L(fg) - f Lg-g Lf \Big ). \end{aligned}$$
(2.1)

In other words, \(\Gamma \) measures the failure of L to satisfy the Leibniz rule. Then we define its iteration \(\Gamma _2\), where the product is replaced by the action of \(\Gamma \):

$$\begin{aligned} \Gamma _2(f,g):= \frac{1}{2} \Big ( L(\Gamma (f,g)) -\Gamma (f, Lg) - \Gamma (g, Lf) \Big ). \end{aligned}$$
(2.2)

From the theory of \(\Gamma \)-calculus we have that a curvature condition of the form

$$\begin{aligned} \Gamma _2(f,f) \ge \lambda \Gamma (f,f) \end{aligned}$$
(2.3)

for some \(\lambda >0\) and all f in a suitable algebra \({\mathcal {A}}\), dense in the \(L^2(\mu )\)-domain of L, is equivalent to the following gradient estimate

$$\begin{aligned} \Gamma \big ( P_t f, P_t f \big ) \le e^{-2\lambda t} P_t (\Gamma (f,f)),\quad t \ge 0 \end{aligned}$$

where \(P_t\) is the semigroup generated by L. The uniqueness of the invariant measure then follows from the contraction property in the \(W_2\) distance (which is equivalent to the gradient estimate above thanks to Kuwada’s duality, see [26] or Theorem 4.1 later on). This also implies a Log–Sobolev inequality (and thus a Poincaré inequality), see [4] or [3, Sect. 3].
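The definitions (2.1)–(2.2) and the curvature condition (2.3) can be verified symbolically on a toy example: for the one-dimensional Ornstein–Uhlenbeck generator \(L = \partial _x^2 - x\partial _x\) (not the degenerate chain generator), one finds \(\Gamma (f,f)=(f')^2\) and \(\Gamma _2(f,f)=(f'')^2+(f')^2\), so (2.3) holds with \(\lambda =1\). A sympy sketch:

```python
import sympy as sp

# 1d Ornstein-Uhlenbeck generator L = d^2/dx^2 - x d/dx as a toy example;
# here the curvature condition (2.3) holds with lambda = 1.
x = sp.symbols('x')
f = sp.Function('f')(x)

L = lambda u: sp.diff(u, x, 2) - x * sp.diff(u, x)
Gamma = lambda u, v: sp.Rational(1, 2) * (L(u * v) - u * L(v) - v * L(u))    # (2.1)
Gamma2 = lambda u, v: sp.Rational(1, 2) * (
    L(Gamma(u, v)) - Gamma(u, L(v)) - Gamma(v, L(u)))                        # (2.2)

fx, fxx = sp.diff(f, x), sp.diff(f, x, 2)
assert sp.simplify(Gamma(f, f) - fx**2) == 0
# Gamma_2 = (f'')^2 + (f')^2 >= Gamma(f, f), i.e. (2.3) with lambda = 1
assert sp.simplify(Gamma2(f, f) - (fxx**2 + fx**2)) == 0
```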

Attempt to apply the classical \(\Gamma \) theory to the generator \({\mathcal {L}}\) given by (1.8): For the generator of the dynamics (1.1), given by (1.8), we cannot bound \(\Gamma _2 \) from below by \( \Gamma \). Explicit calculations give

$$\begin{aligned} \Gamma (f,f) =2 \gamma _1 T_L (\partial _{p_1}f)^2 + 2 \gamma _N T_R (\partial _{p_N}f)^2 \end{aligned}$$

while

$$\begin{aligned} \Gamma _2(f,f)&= 2 (\gamma _1 T_L)^2 (\partial _{p_1}^2f)^2 + 2 (\gamma _N T_R)^2 (\partial _{p_N}^2f)^2 + 2 T_L \gamma _1^2 (\partial _{p_1}f) ( \partial _{q_1}f) \\&\quad + 2 T_R \gamma _N^2 (\partial _{p_N}f) ( \partial _{q_N}f) + \Gamma (f,f) . \end{aligned}$$

Since we cannot control the terms \( \partial _{p_i}f\, \partial _{q_i}f, \) we cannot bound \(\Gamma _2\) from below by \(\Gamma \). In cases like this, we say that the particle system has \( -\infty \) Bakry–Émery curvature.

2.2 Description of the Method

In order to overcome this problem, we proceed as follows:

(1) First we modify the classical \(\Gamma \) theory: we define a new quadratic form, different from, but equivalent to, \(| \nabla _z f|^2\), which will play the role of the \(\Gamma \) functional. This will spread the noise from \(p_1\) and \(p_N \) to all the other degrees of freedom as well. The general idea comes from Baudoin [6]. We make a suitable choice of a positive definite matrix \(b_N \in {\mathbb {R}}^{2N \times 2N}\) to define a new quadratic form that replaces the \(\Gamma \) functional, so that we obtain a ’twisted’ curvature condition: an estimate of the form (2.3). This implies a modified gradient estimate, and thus Poincaré and Log–Sobolev inequalities. We choose this matrix to be the unique solution of a Lyapunov equation with positive definite r.h.s.:

$$\begin{aligned} b_NM + M^Tb_N= \Pi _N>0. \end{aligned}$$

In general, in order to deal with a hypocoercive situation in the \(H^1\) setting, one can perturb the norm to an equivalent norm, so that exponential convergence can be deduced in the new norm. The idea is originally due to Talay [38] and was later generalised by Villani [40]. One then obtains convergence in the usual norm thanks to the equivalence of the two norms. Here, instead of the norm, we modify the gradient and thus the Carré du Champ \(\Gamma \), and work with a generalised \(\Gamma \)-theory.

Fig. 1: Spreading of dissipation by commutators as in Hörmander’s hypoellipticity theory

The idea of working with the matrix that solves the above-mentioned Lyapunov equation came from the fact that (i) we need to control from below the quantity \(b_NM + M^Tb_N\) and (ii) in the linear chain, the covariance matrix \(b_0 \in {\mathbb {R}}^{2N \times 2N}\) solves

$$\begin{aligned} b_0 M + M^T b_0 = \text {diag} \left( 2T_L,0,\ldots ,2T_R,0,\ldots ,0 \right) \end{aligned}$$
(2.4)

and determines the stationary solution of the corresponding Liouville equation. Therefore, tackling the hypoellipticity problem, i.e. spreading the dissipation to all the degrees of freedom, corresponds to working with a Lyapunov equation with positive definite r.h.s. A way to think of it is as a sequence of Lyapunov equations:

$$\begin{aligned} b_0 M+M^T b_0&= \text {diag} \left( 2T_L,0,\ldots ,2T_R,0,\ldots ,0 \right) \\ b_1 M + M^Tb_1&= \text {diag}\left( 2 T_L,0, \ldots , 0,2T_R,1,0, \ldots , 0,1 \right) := \Pi _1 \\ b_2M+M^Tb_2&= \text {diag}\left( 2T_L,1,0,\ldots ,0,1,2T_R,1,1,0,\ldots ,0,1,1\right) := \Pi _2\\&\vdots \\ b_{N} M+ M^T b_{N}&= \text {diag}(2T_L, 1, \ldots ,1,2T_R, 1, 1, \ldots , 1,1) := \Pi _{N} \end{aligned}$$

so that in each step we add a positive entry in the diagonal of the r.h.s. from both sides. This corresponds to spreading the noise and dissipation to the next oscillator from both ends until the center of the chain, like the commutators would do in a classical hypoelliptic setting, see also Fig. 1. So in the last step we have \(\Pi _N>0\) which corresponds to having spread the noise everywhere in the space. This allows us to prove the validity of the generalised Bakry–Emery criterion (3.4), which is the key estimate in order to have exponential convergence to the non-equilibrium steady state.

(2) In order to make our estimates quantitative, we estimate the spectral norm of the matrix \(b_N\) and its inverse. Regarding the bound on the norm of \(b_N\), we estimate its entries using that it solves the Lyapunov equation, while for the norm of \(b_N^{-1}\), we compare it to the norm of \(b_0^{-1}\) which is uniformly bounded in N. This corresponds to the proof of Proposition 1.1 which is the subject of Sect. 6.

For those familiar with Hörmander’s method, we briefly describe the similarity with the dissipation-spreading mechanism: in Hörmander’s theory the smoothing mechanism is transferred through the interacting particles inductively by the use of commutators. The generator has the form

$$\begin{aligned} {\mathcal {L}}=X_0+X_1^2+X_N^2 \end{aligned}$$

where

$$\begin{aligned} X_0=p\cdot \nabla _q-\nabla _qH \cdot \nabla _p-\gamma p_1\partial _{p_1}- \gamma p_N \partial _{p_N}\quad \text {and}\quad X_i=\sqrt{\gamma T_i}\partial _{p_i} . \end{aligned}$$

Then \([\partial _{p_1},X_0]=-\gamma \partial _{p_1}+ \partial _{q_1} \). Now we commute \(\partial _{q_1}\) with the first order terms of the generator: \([\partial _{q_1},X_0]= -\partial _{q_1q_1}H \partial _{p_1}-\partial _{q_1q_2}H \partial _{p_2}\). Given that \(\partial _{q_1q_2}H\) is non-vanishing, we have ’spread the smoothing mechanism’ to \(p_2\). Continuing in this way, commuting the ’new’ variable with the first order terms of \({\mathcal {L}}\), we inductively cover all the particles of the chain.
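These commutator computations can be checked symbolically on a 3-particle harmonic chain (with \(U_{\text {pin}}=aq^2/2\), \(U_{\text {int}}=cr^2/2\) and equal frictions \(\gamma \); we use the convention \([A,B]=AB-BA\), and all parameter names are for illustration):

```python
import sympy as sp

# symbolic check of the commutators for a 3-particle harmonic chain
p1, p2, p3, q1, q2, q3, a, c, g = sp.symbols('p1 p2 p3 q1 q2 q3 a c gamma')
f = sp.Function('f')(p1, p2, p3, q1, q2, q3)

H = (p1**2 + p2**2 + p3**2) / 2 + a * (q1**2 + q2**2 + q3**2) / 2 \
    + c * ((q2 - q1)**2 + (q3 - q2)**2 + q1**2 + q3**2) / 2

X0 = lambda u: (p1 * u.diff(q1) + p2 * u.diff(q2) + p3 * u.diff(q3)
                - H.diff(q1) * u.diff(p1) - H.diff(q2) * u.diff(p2)
                - H.diff(q3) * u.diff(p3)
                - g * p1 * u.diff(p1) - g * p3 * u.diff(p3))
comm = lambda A, B, u: A(B(u)) - B(A(u))     # [A, B] = AB - BA

# [d/dp1, X0] = d/dq1 - gamma d/dp1 : the noise direction picks up d/dq1
c1 = comm(lambda u: u.diff(p1), X0, f)
assert sp.simplify(c1 - (f.diff(q1) - g * f.diff(p1))) == 0

# [d/dq1, X0] = -H_{q1 q1} d/dp1 - H_{q1 q2} d/dp2 ; since H_{q1 q2} = -c != 0,
# the smoothing mechanism has been 'spread' to p2
c2 = comm(lambda u: u.diff(q1), X0, f)
assert sp.simplify(
    c2 - (-H.diff(q1, q1) * f.diff(p1) - H.diff(q1, q2) * f.diff(p2))) == 0
```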

3 Functional Inequalities in the Modified Setting

In order to apply a ’twisted’ Bakry–Emery machinery, introduced by Baudoin in Sect. 2.6 of [6], we work with the positive definite matrix \(b_N\) chosen to be the solution of the Lyapunov equation (1.10). The following Proposition gives us existence of such a solution.

Proposition 3.1

There exists a positive solution to (1.10) if and only if its right-hand side is positive definite and all the eigenvalues of M have positive real parts.

Proof

It is a matrix reformulation of a well known and classical result of Lyapunov that can be found for instance in [18, p. 224] or [30, Sect. 20]. \(\square \)

The eigenvalues of M have strictly positive real part ([25, Lemma 5.1]) and the right hand side of (1.10) is positive definite. Therefore there exists a positive solution of (1.10). Also, we can easily see that the solution is given by the formula

$$\begin{aligned} b_N = \int _0^{\infty } e^{-t M^T} \Pi _N e^{-tM} dt. \end{aligned}$$
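This integral representation is easy to confirm numerically. The sketch below uses an arbitrary positively stable \(6\times 6\) matrix and \(\Pi =I\) as hypothetical stand-ins for the \(M\) and \(\Pi _N\) of the text; it solves \(M^Tb+bM=\Pi \) with SciPy and compares the result with a quadrature of the integral formula.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
# A positively stable M: its symmetric part is positive definite,
# hence every eigenvalue of M has positive real part.
M = A @ A.T / 6 + np.eye(6) + 0.3 * (A - A.T)
Pi = np.eye(6)

# SciPy solves a x + x a^H = q; with a = -M^T, q = -Pi this is M^T b + b M = Pi
b = solve_continuous_lyapunov(-M.T, -Pi)

# Midpoint-rule quadrature of  b = int_0^inf e^{-t M^T} Pi e^{-t M} dt
dt, T = 0.01, 40.0
quad = sum(expm(-t * M.T) @ Pi @ expm(-t * M) * dt
           for t in np.arange(dt / 2, T, dt))

print(np.allclose(M.T @ b + b @ M, Pi))   # True: the Lyapunov equation holds
print(np.linalg.eigvalsh(b).min() > 0)    # True: the solution is positive definite
print(np.abs(quad - b).max() < 1e-2)      # True: agrees with the integral formula
```

Note the sign convention: since \(M\) is positively stable, \(-M^T\) is stable, which is the regime in which SciPy's solver (and the integral) is well posed.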

We define the following quadratic quantity for \(f,g \in C^{\infty }({\mathbb {R}}^{2N})\),

$$\begin{aligned} {\mathcal {T}}(f,g):= \nabla _z f^T b_N \nabla _z g + \nabla _z g^T b_N \nabla _z f \end{aligned}$$
(3.1)

so that

$$\begin{aligned} {\mathcal {T}}(f,f)=2 \nabla _zf^T b_N \nabla _zf. \end{aligned}$$

Then we consider the functional

$$\begin{aligned} {\mathcal {T}}_2(f,f)= \frac{1}{2} \Big ( {\mathcal {L}} {\mathcal {T}}(f,f) - 2{\mathcal {T}}(f, {\mathcal {L}} f) \Big ). \end{aligned}$$

Here \({\mathcal {T}}(f,f)\) is always nonnegative since \(b_N \ge 0\) (and in fact positive for \(\nabla _z f\ne 0\), since \(b_N>0\): this is proven in the last part of the proof of Proposition 1.1). In contrast with the original operator \(\Gamma \), our modified quadratic form \({\mathcal {T}}\) is related to \({\mathcal {L}}\) only indirectly, through the different steps of commutators.

We have an equivalence of the following form between \( {\mathcal {T}}\) and \(| \nabla _z|^2\):

$$\begin{aligned} \frac{1}{ \Vert b_N^{-1} \Vert _{2}} | \nabla _zf|^2 \le {\mathcal {T}}(f,f) \le \Vert b_N \Vert _{2} | \nabla _z f|^2. \end{aligned}$$
(3.2)

Combining this with the conclusion of Proposition 1.1, we write

$$\begin{aligned} C_{a,c}^{-1} | \nabla f|^2 \le {\mathcal {T}}(f,f) \le C_{a,c} N^{3} | \nabla f |^2 . \end{aligned}$$

Proposition 3.2

With the above notation, under Assumption (H2), for all \(N \in {\mathbb {N}}\) there exists a constant

$$\begin{aligned} \lambda _N= {\text {min}}(1,2T_R) \Vert b_N \Vert _{2}^{-1} - (C_{pin}^N + C_{int}^N ) \Vert b_N\Vert _2^{1/2} \Vert b_N^{-1}\Vert _2^{1/2} \end{aligned}$$
(3.3)

such that for \(f \in C^{\infty }({\mathbb {R}}^{2N})\),

$$\begin{aligned} {\mathcal {T}}_2(f,f) \ge \lambda _N {\mathcal {T}}(f,f). \end{aligned}$$
(3.4)

Proof

We use the form of the generator \({\mathcal {L}}\) as in (1.8):

$$\begin{aligned} {\mathcal {L}} = -z^T M \nabla _z -\nabla _q \Phi (q) \cdot \nabla _p + \gamma T_L \partial _{p_1}^2 + \gamma T_R \partial _{p_N}^2 \end{aligned}$$

where \(\Phi \) is the function that corresponds to the perturbing potentials. We write

$$\begin{aligned} 2 {\mathcal {T}}_2(f,f)&= {\mathcal {L}} {\mathcal {T}}(f,f)- 2 {\mathcal {T}}(f,{\mathcal {L}}f) = {\mathcal {L}} {\mathcal {T}}(f,f)- 2\nabla _zf^T b_N \nabla _z {\mathcal {L}}f - 2\nabla _z {\mathcal {L}} f^T b_N \nabla _z f. \end{aligned}$$

For the \((-z^TM \nabla _z)\)-part of \({\mathcal {L}}\), the last expression of the above formula gives

$$\begin{aligned} 2 \nabla _zf^T b_N M\ \nabla _zf + 2 \nabla _zf^T M^Tb_N\ \nabla _zf. \end{aligned}$$

Similarly, concerning the \((-\nabla _q \Phi (q) \cdot \nabla _p)\) -part of \({\mathcal {L}}\) we get

$$\begin{aligned} \nabla _zf^T b_N \text {Hess} (\Phi )^T\ \nabla _zf + \nabla _zf^T \text {Hess}(\Phi ) b_N\ \nabla _zf \end{aligned}$$

and finally regarding the second order terms of the generator we end up with

$$\begin{aligned}&4 \gamma T_L\ \nabla _z \partial _{p_1} f^T b_N\ \nabla _z \partial _{p_1} f + 2\gamma T_L \nabla _z \partial _{p_1}^2 f^T b_N\ \nabla _zf + 2\gamma T_L \nabla _zf^T b_N\ \nabla _z \partial _{p_1}^2 f \\&\qquad -2 \gamma T_L \nabla _z f^T b_N\ \nabla _z \partial _{p_1}^2 f - 2\gamma T_L \nabla _z \partial _{p_1}^2f^T b_N\ \nabla _zf \\&\qquad + 4 \gamma T_R\ \nabla _z \partial _{p_N} f^T b_N\ \nabla _z \partial _{p_N} f + 2\gamma T_R \nabla _z \partial _{p_N}^2f^T b_N\ \nabla _zf + 2\gamma T_R \nabla _zf^T b_N\ \nabla _z \partial _{p_N}^2 f \\&\qquad - 2 \gamma T_R \nabla _z f^T b_N\ \nabla _z \partial _{p_N}^2f - 2\gamma T_R \nabla _z \partial _{p_N}^2f^T b_N\ \nabla _zf. \end{aligned}$$

We eventually write

$$\begin{aligned} {\mathcal {T}}_2(f,f) =&\nabla _zf^T b_N M\ \nabla _zf + \nabla _zf^T M^Tb_N\ \nabla _zf + \nabla _zf^T b_N \text {Hess} (\Phi )^T\ \nabla _zf \\&+ \nabla _zf^T \text {Hess}(\Phi ) b_N\ \nabla _zf + 2 \gamma T_L {\mathcal {T}}( \partial _{p_1} f,\partial _{p_1} f) + 2\gamma T_R {\mathcal {T}}( \partial _{p_N}f,\partial _{p_N}f) \\ \ge&\nabla _zf^T (b_N M + M^Tb_N) \nabla _zf + \nabla _zf^T b_N \big ( \text {Hess}(U_{pin}^N) + \text {Hess}(U_{int}^N) \big ) \nabla _zf \\&+ \nabla _zf^T \big ( \text {Hess}(U_{pin}^N)+ \text {Hess}(U_{int}^N) \big )^T b_N \nabla _zf \\ =&\nabla _zf^T (b_N M+ M^T b_N) \nabla _zf + \nabla _zf^T ( b_N \text {Hess}(U_{pin}^N)+ \text {Hess}(U_{pin}^N)^T b_N ) \nabla _zf \\&+ \nabla _zf^T \big ( b_N \text {Hess}(U_{int}^N)+ \text {Hess}(U_{\text {int}}^N)^T b_N \big ) \nabla _zf \end{aligned}$$

where for the inequality we used that the terms \({\mathcal {T}}( \partial _{p_i} f,\partial _{p_i} f)\), \(i=1,N\), are nonnegative. We write the second and third terms of the last expression as

$$\begin{aligned} \nabla _zf^T ( b_N \text {Hess}(U_{pin}^N)) \nabla _zf&= \nabla _zf^T b_N^{1/2}b_N^{1/2} \text {Hess}(U_{pin}^N) b_N^{-1/2}b_N^{1/2} \nabla _zf \\&= (b_N^{1/2} \nabla _zf )^T \big (b_N^{1/2} \text {Hess}(U_{pin}^N) b_N^{-1/2}\big )(b_N^{1/2} \nabla _zf) \end{aligned}$$

and then from the boundedness assumption on the operator norms of the Hessians for both perturbing potentials and the Lyapunov equation (1.10), we get the following

$$\begin{aligned} {\mathcal {T}}_2(f,f) \ge&\nabla _zf^T \Pi _N \nabla _zf -\Vert b_N^{1/2} \text {Hess}(U_{pin}^N) b_N^{-1/2}\Vert _2 {\mathcal {T}}(f,f)\\&-\Vert b_N^{1/2} \text {Hess}(U_{int}^N) b_N^{-1/2}\Vert _2 {\mathcal {T}}(f,f) \\ \ge&\text {min}(1,2T_L,2T_R) | \nabla _zf|^2 -\sup _z \Vert \text {Hess}(U_{pin}^N)(z) \Vert _2 \Vert b_N \Vert _2^{1/2} \Vert b_N^{-1}\Vert _2^{1/2} {\mathcal {T}}(f,f) \\&-\sup _z\Vert \text {Hess}(U_{int}^N)(z) \Vert _2 \Vert b_N \Vert _2^{1/2} \Vert b_N^{-1}\Vert _2^{1/2} {\mathcal {T}}(f,f) \\ \ge&\text {min}(1,2T_R) \Vert b_N \Vert _2^{-1}{\mathcal {T}}(f,f) - (C_{pin}^N +C_{int}^N)\Vert b_N \Vert _2^{1/2} \Vert b_N^{-1}\Vert _2^{1/2} {\mathcal {T}}(f,f). \end{aligned}$$

We conclude by gathering the terms. \(\square \)

The assumption (H2), combined with the conclusion of Proposition 1.1, ensures that \(\lambda _N\) is positive, by choosing suitable pre-factors, as we do in the proofs of the main Theorems 1.4 and 1.6. We now state the lemma that gives the ’twisted’ gradient bound.
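To make the two competing terms in (3.3) concrete, the following numerical sketch computes the coercive term \({\text {min}}(1,2T_R)\Vert b_N\Vert _2^{-1}\), which equals \(\lambda _N\) in the purely harmonic case \(C_{pin}^N=C_{int}^N=0\). All parameter values are hypothetical, and a diagonal right-hand side \(\Pi =I\) is used in place of the paper’s specific \(\Pi _N\).

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def lam(N, a=1.0, c=1.0, gamma=1.0, TR=0.5):
    # Linear-chain drift in z = (q, p): dz = -M z dt + noise at the two ends
    B = (a + 2*c)*np.eye(N) - c*(np.eye(N, k=1) + np.eye(N, k=-1))
    G = np.zeros((N, N)); G[0, 0] = gamma; G[-1, -1] = gamma
    M = -np.block([[np.zeros((N, N)), np.eye(N)], [-B, -G]])
    b = solve_continuous_lyapunov(-M.T, -np.eye(2*N))  # solves M^T b + b M = I
    return min(1.0, 2*TR) / np.linalg.norm(b, 2)       # min(1, 2T_R) / ||b_N||_2

rates = [lam(N) for N in (4, 8, 16, 32)]
print(rates)   # all positive, shrinking as N grows
```

The computed rates are positive but shrink as N grows, in line with the scaling \(\lambda _N \gtrsim N^{-3}\) obtained from Proposition 1.1.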

Lemma 3.3

(Gradient bound) Under Assumption (H2), for all \(N \in {\mathbb {N}}\), \(t \ge 0\), \((p,q) \in {\mathbb {R}}^{2N}\) and \(f \in C_c^{\infty }({\mathbb {R}}^{2N})\), we have the following twisted gradient estimate

$$\begin{aligned} {\mathcal {T}}(P_tf,P_tf)(p,q) \le e^{-2\lambda _N t} P_t({\mathcal {T}}(f,f))(p,q) \end{aligned}$$
(3.5)

for \(\lambda _N\) given by Proposition 3.2.

Proof

We shall first present a formal derivation of the estimate (3.5). Assuming that \({\mathcal {T}}(P_t f, P_tf)\) is bounded, we consider, for fixed \(t>0\) and \((p,q) \in {\mathbb {R}}^{2N}\), the functional

$$\begin{aligned} \Psi (s)= P_s \big ( {\mathcal {T}}(P_{t-s}f,P_{t-s}f) \big )(p,q),\ s \in [0,t] \end{aligned}$$

for \(f \in C_c^{\infty }({\mathbb {R}}^{2N})\). Since from the semigroup property we have

$$\begin{aligned} \frac{d}{ds} P_s= {\mathcal {L}}P_s= P_s {\mathcal {L}}, \end{aligned}$$

by differentiating and using the curvature condition (3.4) we get

$$\begin{aligned} \frac{d}{ds} \Psi (s)=2P_s \big ( {\mathcal {T}}_2(P_{t-s}f, P_{t-s}f) \big ) \ge 2\lambda _N P_s \big ( {\mathcal {T}}(P_{t-s}f,P_{t-s}f) \big ) = 2\lambda _N \Psi (s) \end{aligned}$$

and since \(\Psi (0)={\mathcal {T}}(P_tf,P_tf)\) and \(\Psi (t)=P_t({\mathcal {T}}(f,f))\), Grönwall’s lemma gives the desired inequality for every smooth and bounded function f.

In general we need \({\mathcal {T}}(P_t f,P_tf)\) to belong to \(L^{\infty }({\mathbb {R}}^{2N})\), because then we know that \(P_s\big ({\mathcal {T}}(P_{t-s}f,P_{t-s}f)\big )\) is well defined. So we proceed as follows:

First we take \(W(p,q)= 1+ |p|^2 + |q|^2\) as a Lyapunov structure that satisfies the following conditions: \(W >1\), \( {\mathcal {L}} W \le C W\), the sets \( \{ W \le m \}\) are compact for each m, and \({\mathcal {T}}(W) \le C W^2\). This W satisfies the conditions thanks to the bounded-Hessians assumption, i.e. \( \nabla (U_{int}^N+ U_{pin}^N) \) is Lipschitz. In particular, for the inequality \( {\mathcal {L}} W \le C W \), using the Cauchy–Schwarz and Young inequalities we write

$$\begin{aligned} {\mathcal {L}} W&= 2 p \cdot q -2q \cdot B p-2 p \cdot \nabla _q \Phi - 2 \gamma _1 p_1^2 - 2 \gamma _N p_N^2 + 2T_L \gamma _1 +2 T_R \gamma _N \\&\le 2 |p||q|+ 2|Bq||p| + 2 | \nabla _q \Phi | |p| + 2T_L \gamma _1 +2 T_R \gamma _N \\&\le |p|^2 + |q|^2 + C_{ C_{lip}, \Vert B\Vert _2} (|p|^2 + |q|^2)+ 2T_L \gamma _1 +2 T_R \gamma _N \\&\le \max \big \{ \max (1, C_{ C_{lip},\Vert B \Vert _{2}}), 2T_L \gamma _1 +2 T_R \gamma _N \big \} (1+|p|^2+|q|^2)= C_1 W \end{aligned}$$

while the inequality \( {\mathcal {T}}(W) \le C_2 W^2\) obviously holds. We then take \(C:= \max \{C_1,C_2\}\), so that both inequalities hold with the same constant.

Now, using the function W combined with a localization argument as in the work of F.-Y. Wang [42, Lemma 2.1] or [5, Theorem 2.2], we prove the boundedness of \( {\mathcal {T}} (P_tf,P_tf)\). For this we approximate the generator \({\mathcal {L}}\) by truncated operators \({\mathcal {L}}_n\), so that the approximating diffusion processes remain in compact sets. Consider a decreasing \(h \in C_c^{\infty }([0,\infty ))\) such that \(h\vert _{[0,1]}=1\) and \( h\vert _{[2,\infty )}=0\), and define

$$\begin{aligned} h_n = h(W/n)\quad \text {and}\ {\mathcal {L}}_n=h_n^2 {\mathcal {L}}. \end{aligned}$$
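Such a cutoff h can be written down explicitly; a standard construction (the helper `f` and all names below are ours, not from the paper) is sketched here:

```python
import numpy as np

def f(t):
    # smooth on R, vanishing to all orders at t <= 0
    t = np.asarray(t, dtype=float)
    return np.where(t > 0, np.exp(-1.0 / np.maximum(t, 1e-12)), 0.0)

def h(x):
    # smooth and decreasing, with h = 1 on [0, 1] and h = 0 on [2, infinity)
    return f(2.0 - x) / (f(2.0 - x) + f(x - 1.0))

print(h(0.5), h(1.5), h(2.5))   # 1.0 0.5 0.0
```

On \([0,1]\) the second term of the denominator vanishes, so \(h\equiv 1\) there; on \([2,\infty )\) the numerator vanishes, so \(h\equiv 0\).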

Then \({\mathcal {L}}_n\) has compact support in \(K_n:= \{W \le 2n\}\), in the sense that it is 0 outside of it, due to the definition of \(h_n\). Let \(P_t^n\) be the semigroup generated by \({\mathcal {L}}_n\), which is given as the unique bounded solution of

$$\begin{aligned} {\mathcal {L}}_n P_t^nf =\partial _t P_t^n f\quad \text {for}\ f \in L^{\infty }({\mathbb {R}}^{2N}). \end{aligned}$$

Then we also have that for every bounded \(f \in L^{\infty }({\mathbb {R}}^{2N})\), pointwise

$$\begin{aligned} P_t^n f {\mathop {\rightarrow }\limits ^{n\rightarrow \infty }} P_tf. \end{aligned}$$

We do the ’interpolation semigroup argument’ as before for \({\mathcal {L}}_n\) and for \(f \in C_c^{\infty }({\mathbb {R}}^{2N})\) supported in \(\{W\le n\}\). Define

$$\begin{aligned} \Psi _n(s) = P_s^n({\mathcal {T}}(P_{t-s}^nf,P_{t-s}^nf))(p,q),\quad s \in [0,t] \end{aligned}$$

for fixed \(t>0\), \(n \ge 1\), evaluated at a fixed point \((p,q)\) in the set \(\{W \le n \}\).

It is true, due to the properties of W, that \({\mathcal {T}}(P_t^n f,P_t^n f) \le C_{f,t} \) with \(C_{f,t}\) independent of n and so we have a bound on \({\mathcal {T}}(P_t^nf,P_t^nf )\) uniformly on the set \(\{W\le n\}\). Indeed

$$\begin{aligned} \Psi '_n(s)&= P_s^n( {\mathcal {L}}_n {\mathcal {T}}(P_{t-s}^nf,P_{t-s}^nf) - 2 {\mathcal {T}} ({\mathcal {L}}_nP_{t-s}^nf,P_{t-s}^nf)) \\&= P_s^n ( 2h_n^2 {\mathcal {T}}_2(P_{t-s}^nf,P_{t-s}^nf) -4h_n{\mathcal {L}} P_{t-s}^nf {\mathcal {T}}(h_n,P_{t-s}^nf)) \\&\ge P_s^n ( 2h_n^2 \lambda _N {\mathcal {T}}(P_{t-s}^nf,P_{t-s}^nf)- 4h_n{\mathcal {L}} P_{t-s}^nf {\mathcal {T}}(h_n,P_{t-s}^nf)) \\&\ge P_s^n ( 2h_n^2 \lambda _N {\mathcal {T}}(P_{t-s}^nf,P_{t-s}^nf)-4P_{t-s}^n{\mathcal {L}}_n f {\mathcal {T}}(\log h_n,P_{t-s}^nf)) \\&\ge P_s^n \big ( 2h_n^2 \lambda _N {\mathcal {T}}(P_{t-s}^nf,P_{t-s}^nf)- 4 \Vert {\mathcal {L}} f \Vert _{\infty } \sqrt{{\mathcal {T}}(\log h_n,\log h_n)} \sqrt{{\mathcal {T}}(P_{t-s}^nf,P_{t-s}^nf)} \big ) \\&{\mathop {\ge }\limits ^{\text {Young's ineq.}}} P_s^n \big ( -(2| \lambda _N | + 2) {\mathcal {T}}(P_{t-s}^nf,P_{t-s}^nf) -C_1 {\mathcal {T}}(\log h_n,\log h_n) \big ) \end{aligned}$$

with \(C_1\) constant independent of n. About the last term:

$$\begin{aligned} {\mathcal {T}}(\log h_n, \log h_n) = \frac{1}{n^2h_n^2} h'(W/n)^2 {\mathcal {T}}(W) \le \frac{C}{h_n^2} \end{aligned}$$

with C independent of n. Now calculate

$$\begin{aligned} {\mathcal {L}}_n \left( \frac{1}{h_n^2} \right) = -\frac{2h'(W/n){\mathcal {L}}W}{nh_n} - \frac{2h''(W/n)\Gamma (W)}{n^2h_n} + \frac{6h'(W/n)^2 \Gamma (W)}{n^2h_n^2} \le \frac{C_2}{h_n^2} \end{aligned}$$

with \(C_2>0\) some constant again independent of n (from the assumptions on the Lyapunov functional W). Therefore

$$\begin{aligned} P_s^n \left( \frac{1}{h_n^2} \right) \le \frac{e^{sC_2}}{h_n^2}. \end{aligned}$$

Combining this last estimate with the above bounds we end up with the differential inequality

$$\begin{aligned} \Psi _n'(s) \ge -(2| \lambda _N | + 2) \Psi _n(s)- C_3 \end{aligned}$$

and \(C_3=C_3(f,t) \) is again independent of n. We multiply both sides with \(e^{(2| \lambda _N | + 2)s}\) so that the above inequality implies

$$\begin{aligned} (e^{(2| \lambda _N | + 2)s} \Psi _n(s) )' \ge - C_3e^{(2| \lambda _N | + 2)s} \end{aligned}$$

or equivalently, after integrating both sides in time from 0 to t, that

$$\begin{aligned} \Psi _n(0) \le e^{ (2| \lambda _N | + 2) t} \Psi _n(t) +\bar{C_3}(f,t) \le e^{ (2| \lambda _N | + 2) t} \Vert {\mathcal {T}}(f,f) \Vert _{\infty } +\bar{C_3}(f,t) \end{aligned}$$

which gives the boundedness of \( {\mathcal {T}}(P_t^nf,P_t^nf) = \Psi _n(0) \) uniformly in n, on the set \(\{ W \le n\}\).

Now if \(d'\) is the intrinsic distance induced by \({\mathcal {T}}\)

$$\begin{aligned} d'(x,y)=\sup _{{\mathcal {T}}(f,f) \le 1} |f(x)-f(y)|, \end{aligned}$$

from the above bound we have that

$$\begin{aligned} |P_t^nf(x) - P_t^nf(y)| \le C d'(x,y) \end{aligned}$$

for n large enough with \(x,y \in \{W \le n\}\) and \(f \in C_c^{\infty }({\mathbb {R}}^{2N})\) with support in \(\{W \le n\}\). This comes from the formula

$$\begin{aligned} P_t^nf(y)-P_t^nf(x)= \int _0^1 \nabla P_t^nf(x+s(y-x)) \cdot (y-x)\, ds . \end{aligned}$$

Now C does not depend on n (from before), so passing to the limit we have

$$\begin{aligned} |P_t f(x) - P_t f(y)| \le C d'(x,y) \end{aligned}$$

and so \({\mathcal {T}}(P_tf, P_tf) \) is also bounded. Now we can repeat the standard Bakry–Emery calculations as in the beginning of the proof. \(\square \)

Remark 3.4

Note that using the equivalence of \( {\mathcal {T}}\) and \(| \nabla _z|^2\):

$$\begin{aligned} \frac{1}{ \Vert b_N^{-1} \Vert _{2}} | \nabla _zf|^2 \le {\mathcal {T}}(f,f) \le \Vert b_N \Vert _{2} | \nabla _z f|^2, \end{aligned}$$

we get the following \(L^2\)- gradient estimate

$$\begin{aligned} | \nabla _zP_tf|^2&\le \Vert b_N \Vert _{2} \Vert b_N^{-1} \Vert _{2}\ e^{-2\lambda _N t} P_t \big ( |\nabla _zf|^2 \big ) \end{aligned}$$
(3.6)

Once we have a curvature condition of the form (3.4) we are also able to show that the stationary measure satisfies a Poincaré inequality.

Proposition 3.5

Let \({\mathcal {L}} \) be the generator of the dynamics described by the SDEs (1.1) and \({\mathcal {T}}\) the perturbed quadratic form defined in (3.1). Under Assumption (H2), for all \(N \in {\mathbb {N}}\) and \(f \in C^{\infty }({\mathbb {R}}^{2N})\), the invariant measure \(\mu \) satisfies the Poincaré inequality

$$\begin{aligned} \text {Var}_{\mu } (f) \le C_N \int _{{\mathbb {R}}^{2N}} {\mathcal {T}}(f,f) d\mu , \end{aligned}$$

where \(C_N = \frac{ \gamma T_L \Vert b_N^{-1} \Vert _{2}}{\lambda _N}\), with \(\lambda _N\) defined in Proposition 3.2.

Proof

For \(f \in C^{\infty }({\mathbb {R}}^{2N})\), we consider the functional

$$\begin{aligned} \Psi (s) = P_s( (P_{t-s}f)^2),\ s \in [0,t]. \end{aligned}$$

We denote by \(\Gamma \) the Carré du Champ operator defined in (2.1). By differentiating we have

$$\begin{aligned} \Psi '(s) = {\mathcal {L}} P_s ((P_{t-s} f)^2)-2 P_s (P_{t-s}f {\mathcal {L}} P_{t-s}f) =2P_s \big ( \Gamma (P_{t-s}f, P_{t-s}f) \big ). \end{aligned}$$

Now by integrating from 0 to t

$$\begin{aligned} P_t(f^2)- (P_tf)^2&= 2 \int _0^t P_s ( \Gamma (P_{t-s}f,P_{t-s}f)) ds \le 2\gamma T_L \int _0^t P_s ( | \nabla P_{t-s} f |^2 ) ds \\&\le 2\gamma T_L \Vert b_N^{-1} \Vert _{2} \int _0^t P_s ( {\mathcal {T}}(P_{t-s}f,P_{t-s}f)) ds \\&\le 2\gamma T_L \Vert b_N^{-1} \Vert _{2} \int _0^t P_s( e^{-2\lambda _N(t-s)} P_{t-s} {\mathcal {T}}(f,f) ) ds \\&= 2 \gamma T_L \Vert b_N^{-1} \Vert _{2}\ e^{-2 \lambda _N t} P_t {\mathcal {T}}(f,f) \int _0^t e^{2 \lambda _N s} ds \\&= 2\gamma T_L \Vert b_N^{-1} \Vert _{2}\ e^{-2 \lambda _N t} P_t {\mathcal {T}}(f,f) \left( \frac{e^{2 \lambda _N t}-1}{2 \lambda _N} \right) \end{aligned}$$

where in the first inequality we used that

$$\begin{aligned} \Gamma (f, f) =\gamma T_L (\partial _{p_1}f)^2 + \gamma T_R (\partial _{p_N}f)^2 \le \gamma T_L | \nabla f |^2, \end{aligned}$$

for the second we used the gradient bound from Lemma 3.3 and, right after it, the semigroup property. The last line can be rewritten as

$$\begin{aligned} P_t(f^2)- (P_tf)^2 \le \gamma T_L \Vert b_N^{-1} \Vert _{2}\ \frac{1-e^{-2\lambda _N t}}{\lambda _N} P_t {\mathcal {T}}(f,f). \end{aligned}$$

Now, letting t go to \(\infty \), thanks to the ergodicity we obtain the desired inequality. \(\square \)

In fact it is possible to show a stronger pointwise gradient bound, which we exploit in the proof of a Log–Sobolev inequality for the invariant measure of the dynamics.

Proposition 3.6

(Strong gradient bound) For all \(f \in C_c^{\infty }({\mathbb {R}}^{2N})\), \(t\ge 0\) and \((p,q) \in {\mathbb {R}}^{2N}\),

$$\begin{aligned} {\mathcal {T}}(P_{t} f,P_{t} f)(p,q) \le \Big ( P_t ( \sqrt{{\mathcal {T}}(f,f)}) \Big )^2(p,q) e^{-2 \lambda _N t}. \end{aligned}$$
(3.7)

Remark 3.7

This is a stronger estimate than (3.5) in Lemma 3.3, by the Cauchy–Schwarz inequality.

Proof

The rigorous justification of the following formal calculations, i.e. the boundedness of \( \sqrt{{\mathcal {T}}(P_{t-s}f,P_{t-s}f)}\), is exactly as in the proof of Lemma 3.3.

Here for \(f \in C_c^{\infty }({\mathbb {R}}^{2N}) \), and for fixed \(t \ge 0, (p,q) \in {\mathbb {R}}^{2N}\), instead we define

$$\begin{aligned} \Phi (s) = P_s \Big ( \sqrt{{\mathcal {T}}(P_{t-s}f,P_{t-s}f)}\Big )(p,q),\ s \in [0,t]. \end{aligned}$$

Writing \(g=P_{t-s}f \), differentiating, and performing the standard calculations, we have

$$\begin{aligned} \Phi '(s)&= P_s \Big ( {\mathcal {L}} (\sqrt{{\mathcal {T}}(g,g)}) - \frac{\nabla {\mathcal {L}}g^T b_N \nabla g + \nabla g^T b_N \nabla {\mathcal {L}} g }{2 \sqrt{{\mathcal {T}}(g,g)}} \Big ) \\&= P_s \Big ( {\mathcal {L}} (\sqrt{{\mathcal {T}}(g,g)}) + \frac{ 2{\mathcal {T}}_2(g,g)-{\mathcal {L}} {\mathcal {T}}(g,g)}{2 \sqrt{{\mathcal {T}}(g,g)}} \Big ) \\&= P_s \Bigg ( \frac{1}{\sqrt{{\mathcal {T}}(g,g)}} \Big ( -\Gamma \big (\sqrt{{\mathcal {T}}(g,g)},\sqrt{{\mathcal {T}}(g,g)}\big ) + {\mathcal {T}}_2(g,g) \Big ) \Bigg ) \\&= P_s \Bigg ( \frac{1}{\sqrt{{\mathcal {T}}(g,g)}} \Big ( {\mathcal {T}}_2(g,g) - \frac{\gamma T_L\big (\partial _{p_1}{\mathcal {T}}(g,g)\big )^2 + \gamma T_R\big (\partial _{p_N}{\mathcal {T}}(g,g)\big )^2 }{4 {\mathcal {T}}(g,g)} \Big ) \Bigg ) \\&\ge P_s \Bigg ( \frac{1}{4 {\mathcal {T}}(g,g)^{3/2}} \bigg ( 4 \lambda _N ({\mathcal {T}}(g,g))^2 + 4\gamma T_L\, {\mathcal {T}}(\partial _{p_1} g,\partial _{p_1} g)\, {\mathcal {T}}(g,g) \\&\quad + 4 \gamma T_R\, {\mathcal {T}}(\partial _{p_N} g,\partial _{p_N} g)\, {\mathcal {T}}(g,g) -\gamma T_L \big (\partial _{p_1} {\mathcal {T}}(g,g) \big )^2 -\gamma T_R \big ( \partial _{p_N} {\mathcal {T}}(g,g)\big )^2 \bigg ) \Bigg ) \\&\ge P_s \left( \frac{4\lambda _N ({\mathcal {T}}(g,g))^2}{4 {\mathcal {T}}(g,g)^{3/2}} \right) = \lambda _N \Phi (s) \end{aligned}$$

where in the first equality we used that

$$\begin{aligned} {\mathcal {L}}(g) = \frac{{\mathcal {L}}(g^2)}{2g}- \frac{\Gamma (g,g)}{g}. \end{aligned}$$

In the first inequality we used the formula

$$\begin{aligned} {\mathcal {T}}_2(f,f) \ge \lambda _N {\mathcal {T}}(f,f)+ \gamma T_L {\mathcal {T}}(\partial _{p_1}f,\partial _{p_1}f) +\gamma T_R {\mathcal {T}}(\partial _{p_N}f,\partial _{p_N}f) \end{aligned}$$

from the proof of Proposition 3.2, that

$$\begin{aligned} \Gamma (f,g)= \gamma T_L (\partial _{p_1}f) (\partial _{p_1} g) +\gamma T_R (\partial _{p_N}f) (\partial _{p_N} g) \end{aligned}$$

where \(\Gamma \) is the Carré du Champ operator defined in (2.1). Moreover, since \(b_N\) is a constant matrix, \(\partial _{p_1}{\mathcal {T}}(g,g)=2{\mathcal {T}}(\partial _{p_1}g,g)\), so the Cauchy–Schwarz inequality for \({\mathcal {T}}\) gives \(\big (\partial _{p_1}{\mathcal {T}}(g,g)\big )^2 \le 4\,{\mathcal {T}}(\partial _{p_1}g,\partial _{p_1}g)\,{\mathcal {T}}(g,g)\), and similarly for \(\partial _{p_N}\); this yields the last inequality. Now from Grönwall’s lemma we get

$$\begin{aligned} \Phi (t) \ge e^{ \lambda _N t} \Phi (0)\ \Rightarrow \ {\mathcal {T}} ( P_{t} f, P_tf) \le e^{-2 \lambda _N t} \Big ( P_t ( \sqrt{{\mathcal {T}}(f,f)}) \Big )^2. \end{aligned}$$

\(\square \)

This pointwise, strong gradient bound implies a Log–Sobolev inequality.

Proof of Proposition 1.5

For \(f \in C_c^{\infty }({\mathbb {R}}^{2N}) \), we introduce the functional

$$\begin{aligned} H(s) = P_s \Big ( P_{t-s}f \log P_{t-s}f \Big ) \end{aligned}$$

for fixed \(s \in [0,t]\) evaluated at a fixed point in the phase space. We denote by \(\Gamma \) the Carré du Champ operator defined in (2.1) and following again Bakry’s recipes, we get

$$\begin{aligned} H'(s)&= P_s \Big ( {\mathcal {L}}(P_{t-s}f \log P_{t-s}f) - {\mathcal {L}}P_{t-s}f \log P_{t-s}f - {\mathcal {L}}(P_{t-s}f) \Big ) \\&= P_s \Big ( \Gamma (P_{t-s}f,\log P_{t-s}f )\Big ) \\&= P_s \left( \frac{\gamma T_L (\partial _{p_1}P_{t-s}f)^2}{P_{t-s}f} + \frac{\gamma T_R (\partial _{p_N}P_{t-s}f)^2}{P_{t-s}f} \right) \\&= P_s \left( \frac{\Gamma (P_{t-s}f,P_{t-s}f)}{P_{t-s}f} \right) \\&\le \gamma T_L \Vert b_N^{-1} \Vert _{2} P_s \left( \frac{{\mathcal {T}}(P_{t-s}f,P_{t-s}f)}{P_{t-s}f} \right) \\&\le \gamma T_L \Vert b_N^{-1} \Vert _{2} P_s \left( e^{-2\lambda _N (t-s)} \frac{(P_{t-s} (\sqrt{{\mathcal {T}}(f,f)}))^2}{P_{t-s}f } \right) \\&\le \gamma T_L \Vert b_N^{-1} \Vert _{2} P_t \left( \frac{{\mathcal {T}}(f,f)}{f} \right) e^{-2 \lambda _N (t-s)} \end{aligned}$$

where for the second inequality we used the bound from Proposition 3.6, while for the last inequality we applied Jensen’s inequality and the fact that the function \(y^2/x\) is convex for \(x,y>0\). Now, integrating from 0 to t, we get

$$\begin{aligned} H(t)-H(0)&\le \frac{ \gamma T_L \Vert b_N^{-1} \Vert _{2} }{2\lambda _N} (1- e^{-2\lambda _N t}) P_t \left( \frac{{\mathcal {T}}(f,f)}{f} \right) \\&\le \frac{ \gamma T_L \Vert b_N^{-1} \Vert _{2} \Vert b_N \Vert _{2}}{2 \lambda _N} (1- e^{-2\lambda _N t}) P_t \left( \frac{| \nabla _z f |^2}{f} \right) \end{aligned}$$

Letting \(t \rightarrow \infty \), thanks to the ergodicity of the semigroup, we get the LSI with constant \( \frac{\gamma T_L \Vert b_N^{-1} \Vert _{2} \Vert b_N \Vert _{2} }{2\lambda _N } \), corresponding to the constant for the non-perturbed Fisher information. Therefore, applying the estimates from Proposition 1.1 we have

$$\begin{aligned} \frac{\gamma T_L \Vert b_N^{-1} \Vert _{2} }{ 2\lambda _N}&= \frac{\gamma T_L \Vert b_N^{-1}\Vert _2}{ 2 \left( {\text {min}}(1,2T_R) \Vert b_N \Vert _{2}^{-1} - (C_{pin}^N + C_{int}^N ) \Vert b_N\Vert _2^{1/2} \Vert b_N^{-1}\Vert _2^{1/2}\right) } \\&\le \frac{\gamma T_L C_{a,c}}{N^{-3}\left( {\text {min}}(1,2T_R)C_{a,c}^{-1}-C_0C_{a,c}\right) }:= \lambda _0^{-1}\gamma T_L C_{a,c} N^3 \end{aligned}$$

where \(C_0\) is the constant in (1.5) which we choose small enough, i.e. to satisfy

$$\begin{aligned} C_0< {\text {min}}(1,2T_R) C_{a,c}^{-2}, \end{aligned}$$

so that \(\lambda _0 >0\). \(\square \)

4 Convergence to Equilibrium in Kantorovich–Wasserstein Distance

We use the fact that the gradient estimate (3.6) is equivalent to an estimate in Wasserstein distance (Kuwada’s duality [26]). More specifically, we have the following Theorem, stated here only on the Euclidean space with the Lebesgue measure \( ({\mathbb {R}}^{2N}, |\cdot |, \lambda )\) and only for the Wasserstein-2 distance:

Theorem 4.1

(Theorem 2.2 of [26]) Let \(P\) be a Markov semigroup on \({\mathbb {R}}^{2N}\) that has a continuous density with respect to the Lebesgue measure. For \(c>0\), the following are equivalent:

  1. (i)

    For all probability measures \(\mu , \nu \) we have,

    $$\begin{aligned} W_2 (P_t^* \mu , P_t^* \nu ) \le c W_2 (\mu ,\nu ). \end{aligned}$$
  2. (ii)

    For all bounded and Lipschitz functions f and \( z \in {\mathbb {R}}^{2N}\),

    $$\begin{aligned} |\nabla P_t f | (z) \le c P_t \big ( | \nabla f|^2\big )(z)^{1/2} \end{aligned}$$

    where this estimate is associated with the Lipschitz norm defined just above.

Now we are ready to prove Theorem 1.4.

Proof of Theorem 1.4

The convergence follows if we apply Kuwada’s duality from Theorem 4.1, since we have the estimate (3.6) with \(c= \Vert b_N^{-1} \Vert _{2}^{1/2} \Vert b_N \Vert _{2}^{1/2} e^{-\lambda _N t}.\) Therefore the contraction reads

$$\begin{aligned} W_2(P_t^*f_0^1, P_t^* f_0^2) \le \Vert b_N \Vert _{2}^{1/2} \Vert b_N^{-1} \Vert _{2}^{1/2} e ^{- \lambda _N t} W_2 (f_0^1,f_0^2). \end{aligned}$$
(4.1)

Since \(\lambda _N\), as defined in (3.3), is:

$$\begin{aligned} \lambda _N = {\text {min}}(1,2T_R) \Vert b_N \Vert _{2}^{-1} - (C_{pin}^N + C_{int}^N ) \Vert b_N\Vert _2^{1/2} \Vert b_N^{-1}\Vert _2^{1/2}, \end{aligned}$$

by exploiting the estimates on \(\Vert b_N\Vert _2\) and \(\Vert b_N^{-1}\Vert _2\) from the Proposition 1.1 we quantify the rate:

$$\begin{aligned} \lambda _N&\ge {\text {min}}(1,2T_R) C_{a,c}^{-1} N^{-3} - C_0 N^{-9/2}C_{a,c}N^{3/2} = ({\text {min}}(1,2T_R)C_{a,c}^{-1}-C_0 C_{a,c}) N^{-3} := \lambda _0 N^{-3} \end{aligned}$$

Choosing \(C_0 < {\text {min}}(1,2T_R) C_{a,c}^{-2}\) gives us \(\lambda _N >0\) for all N.

This gives us the statement of the Theorem:

$$\begin{aligned} W_2(P_t^*f_0^1, P_t^* f_0^2) \le C_{a,c} N^{\frac{3}{2}} e^{- \frac{\lambda _0}{N^3}t}\ W_2 (f_0^1,f_0^2). \end{aligned}$$
(4.2)

Finally, for the uniqueness of the stationary solution \(f_{\infty }\): choosing \(f_0^2= f_{\infty }\), we see that every solution \(f_t\) converges towards it. \(\square \)
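The contraction behind (4.1) can also be observed directly through a synchronous coupling: two solutions of (1.1) driven by the same Brownian increments approach each other, since the noise cancels in their difference. Below is a minimal Euler–Maruyama sketch for the purely harmonic chain, with hypothetical parameter values.

```python
import numpy as np

rng = np.random.default_rng(1)
N, gamma, a, c, TL, TR = 4, 1.0, 1.0, 1.0, 1.5, 0.5   # hypothetical parameters

# Harmonic chain: B = pinning + discrete Dirichlet Laplacian
B = (a + 2*c)*np.eye(N) - c*(np.eye(N, k=1) + np.eye(N, k=-1))
G = np.zeros((N, N)); G[0, 0] = gamma; G[-1, -1] = gamma
A = np.block([[np.zeros((N, N)), np.eye(N)], [-B, -G]])   # drift of z = (q, p)
sig = np.zeros(2*N)
sig[N], sig[2*N - 1] = np.sqrt(2*gamma*TL), np.sqrt(2*gamma*TR)  # noise on p_1, p_N

dt, steps = 0.01, 5000
z1, z2 = rng.standard_normal(2*N), rng.standard_normal(2*N)  # two initial conditions
d0 = np.linalg.norm(z1 - z2)
for _ in range(steps):
    dW = rng.standard_normal(2*N) * np.sqrt(dt)
    z1 = z1 + A @ z1 * dt + sig * dW   # the SAME increments dW drive both copies,
    z2 = z2 + A @ z2 * dt + sig * dW   # so the noise cancels in the difference
print(np.linalg.norm(z1 - z2) / d0)    # < 1: the coupled trajectories contract
```

For the linear chain the difference evolves deterministically, \(z_1-z_2\) following the stable flow \(e^{tA}\), which is exactly why the Wasserstein contraction carries no noise-dependent constant.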

5 Entropic Convergence to Equilibrium

If \(\mu \) is the invariant measure of the system, we prove here convergence to the stationary state in entropy, as stated in Theorem 1.6: first with respect to the functional

$$\begin{aligned} {\mathcal {E}}(f) := \int _{{\mathbb {R}}^{2N}} \big ( f \log f+ f {\mathcal {T}}( \log f, \log f) \big )\, d\mu \end{aligned}$$

and then using the equivalence of \({\mathcal {T}}(f,f)\) with \(|\nabla f |^2\).

Proof of Theorem 1.6

We consider the functional

$$\begin{aligned} \Lambda (s) =P_s\Big (P_{t-s}f \log P_{t-s}f \Big )+ P_s \Big ( P_{t-s}f {\mathcal {T}}( \log P_{t-s}f, \log P_{t-s}f) \Big ) \end{aligned}$$

and by differentiating and repeating the steps from Propositions 3.6 and 1.5 we end up with

$$\begin{aligned} \Lambda '(s)&= P_s \Big ( \Gamma (P_{t-s}f, \log P_{t-s}f)\Big ) + P_s {\mathcal {L}} \Big ( P_{t-s}f {\mathcal {T}}( \log P_{t-s}f, \log P_{t-s}f) \Big ) \\&\quad - 2 P_s \left( P_{t-s}f {\mathcal {T}}\left( \log P_{t-s}f , \frac{{\mathcal {L}} P_{t-s}f}{P_{t-s}f} \right) \right) -P_s\Big ( {\mathcal {L}} P_{t-s}f {\mathcal {T}}( \log P_{t-s}f, \log P_{t-s}f) \Big )\\&\ge P_s \Big (P_{t-s}f {\mathcal {L}} {\mathcal {T}}(\log P_{t-s} f) \Big ) + 2 P_s \Big ( \Gamma ( P_{t-s}f, {\mathcal {T}}(\log P_{t-s}f,\log P_{t-s}f )) \Big ) \\&\quad -2 P_s \Big ( P_{t-s}f {\mathcal {T}} \big (\log P_{t-s}f, \Gamma (\log P_{t-s}f, \log P_{t-s}f)+ {\mathcal {L}}(\log P_{t-s}f)\big ) \Big ) \\&= 2 P_s \Big ( P_{t-s}f {\mathcal {T}}_2(\log P_{t-s}f,\log P_{t-s}f) \Big ) \\&\ge 2\lambda _N P_s \Big ( P_{t-s}f {\mathcal {T}}(\log P_{t-s}f, \log P_{t-s}f) \Big ) \end{aligned}$$

where for the second inequality we have used that

$$\begin{aligned}&\Gamma (P_{t-s}f, \log P_{t-s}f) \ge 0,\ {\mathcal {L}} ( \log P_{t-s}f ) = \frac{{\mathcal {L}} P_{t-s} f }{P_{t-s}f} - \Gamma ( \log P_{t-s} f,\log P_{t-s}f )\ \text {and} \\&\quad {\mathcal {T}} \big (\log P_{t-s}f, \Gamma (\log P_{t-s}f, \log P_{t-s}f) \big ) = \Gamma \big (\log P_{t-s}f, {\mathcal {T}}(\log P_{t-s}f, \log P_{t-s}f) \big ) \end{aligned}$$

and in the last inequality we used the bound (3.4). We introduce a constant \(\eta \in (0,1)\) over which we will optimise later, integrate against the invariant measure \(\mu \), and apply the Log-Sobolev inequality from Proposition 1.5:

$$\begin{aligned} \int _{{\mathbb {R}}^{2N}} \Lambda '(s) d\mu&\ge 2 \eta \frac{\lambda _N}{C_N} \int _{{\mathbb {R}}^{2N}} P_s \Big ( P_{t-s}f \log P_{t-s}f \Big ) d\mu \\&\quad + 2(1-\eta )\lambda _N \int _{{\mathbb {R}}^{2N}} P_s \Big ( {\mathcal {T}}(\log P_{t-s}f,\log P_{t-s}f) P_{t-s}f \Big ) d\mu \\&\ge 2 \lambda _N \text {min} \left( \frac{\eta }{C_N},1-\eta \right) \int _{{\mathbb {R}}^{2N}} \Lambda (s) d\mu \end{aligned}$$

since \(\int _{{\mathbb {R}}^{2N}} P_s \Big ( P_{t-s}f \log P_{t-s}f \Big ) d\mu =\int _{{\mathbb {R}}^{2N}} P_s \Big ( P_{t-s}f \log P_{t-s}f - P_{t-s}f +1 \Big ) d\mu \) which is nonnegative. For \(\eta := \frac{C_N}{1+C_N}\) we have

$$\begin{aligned} \int _{{\mathbb {R}}^{2N}} \Lambda '(s) d\mu \ge 2 \lambda _N \frac{C_N}{1+C_N} \int _{{\mathbb {R}}^{2N}} \Lambda (s) d\mu . \end{aligned}$$

Finally, from Grönwall’s inequality we have

$$\begin{aligned} \int \Lambda (0) d\mu \le e^{-2 \lambda _N \frac{C_N}{1+C_N} t} \int \Lambda (t) d\mu \end{aligned}$$

or equivalently the desired convergence, thanks to the invariance of the measure. Since \(\lim _{N \rightarrow \infty } \lambda _N \frac{C_N}{1+C_N} = \lim _{N \rightarrow \infty } \lambda _N \), we have that the exponential rate is indeed of order \(\lambda _N\) (as in the convergence in Theorem 1.4):

$$\begin{aligned} {\mathcal {E}}(P_t f) \le e^{- \lambda _N t } {\mathcal {E}}(f) \end{aligned}$$
(5.1)

Since \( {\mathcal {T}}\) and \( | \nabla _z|^2\) are equivalent, see (3.2), we get the above convergence in the non-perturbed setting with equivalence-constant \( {\text {max}}\left( 1,\Vert b_N^{-1}\Vert _2 \right) \Vert b_N \Vert _{2}. \)

In particular, both the Boltzmann entropy \(H_{\mu }(P_tf \mu )\), given by (1.13), and the Fisher information \(I_{\mu }(P_tf \mu )\), given by (1.14), decay:

$$\begin{aligned} H_{\mu }(P_tf \mu ) + I_{\mu }(P_tf \mu ) \le \frac{{\text {max}}(1,\Vert b_N\Vert _2)}{{\text {min}}\left( 1,\Vert b_N^{-1}\Vert _2^{-1}\right) } e^{- \lambda _N t} \Big ( H_{\mu }(f \mu ) + I_{\mu }(f \mu ) \Big ) \end{aligned}$$
(5.2)

Thus, combining with the conclusion of Proposition 1.1, the denominator is of order 1 in the dimension and, as in the proof of Theorem 1.4, \(\lambda _N \ge \lambda _0 N^{-3}\); this concludes the proof. \(\square \)

Remark 5.1

  1. (i)

    The rate of convergence to the stationary state, \(\lambda _N\), does not depend on the temperature difference \(\Delta T\): under assumption (H2) we get existence of a spectral gap for all \(\Delta T\), since the twisted curvature condition from Proposition 3.2 sees only the first order terms of the generator. The scaling of \(\lambda _N\) relies on Proposition 1.1, and its proof shows that it is not affected by \(\Delta T\). Therefore, the same scaling holds in the equilibrium case \(\Delta T=0\) as well.

  2. (ii)

    Regarding the boundary conditions: Assumption (H1) is not necessary in order to obtain a spectral gap with lower bound of order \(N^{-3}\). In fact, we have a spectral gap as soon as there is a solution to the matrix equation (1.10), which is the case when \(a\ge 0\), \(c>0\) (see Proposition 3.1). Therefore, the proof of Proposition 1.1 still holds, with minor differences, when we consider the following (in a sense, free) b.c. as well: \(q_0=q_1\), \(q_N=q_{N+1}\). Note that for the harmonic chain this is also suggested by numerical simulations similar to Fig. 2. We work under assumption (H1) here in order to keep the presentation of Sect. 6 as simple as possible, since (H1) corresponds to the discrete Laplacian B with Dirichlet b.c. (i.e. B is constant along the diagonal), giving us more symmetries, whereas the above-mentioned free b.c. correspond to the discrete Laplacian B with Neumann b.c.

  (iii)

    A comment on the choice of \(\Pi _N\): we have the curvature condition from Proposition 3.2 by considering any positive definite r.h.s. of (1.10). We choose \(\Pi _N\) specifically because we can then compare \(b_N\) to the matrix \(b_0\) that solves (2.4) (\(b_0\) is the covariance matrix for the harmonic chain) and then bound \(\Vert b_N^{-1}\Vert _2\). See the end of the proof of Proposition 1.1.

  (iv)

    A convergence to equilibrium in total variation norm for a similar small perturbation of the harmonic oscillator chain has been shown recently in [32]. There, a version of Harris' ergodic Theorem was applied, making it possible to treat more general cases of the oscillator chain, with different kinds of noise as well. However, this is a non-quantitative version of Harris' Theorem, which provides no information on the dependence of the convergence rate on N.

6 Estimates on the Spectral Norm of \(b_N\)

First, let us state the following Proposition on the optimal exponential rate of convergence for the purely harmonic chain.

Proposition 6.1

(Proposition 7.1 and 7.2 (3) in [7]) We write \(\lambda _N^H\) for the spectral gap of the dynamics whose evolution is described by the generator (1.8) without the perturbing potentials, i.e. the dynamics of the linear chain, and \(\rho :=\inf \{\text {Re}(\mu ) : \mu \in \sigma (M) \}\). We have

$$\begin{aligned} \lim _{N \rightarrow \infty } \frac{\lambda _N^H}{\rho } \in {\mathbb {R}} . \end{aligned}$$

Moreover the spectral gap approaches 0 as N goes to infinity as follows:

$$\begin{aligned} \rho \le \frac{C}{2N} \end{aligned}$$
(6.1)

for some constant C independent of N.

Proof

We exploit the results by Arnold and Erb in [1] or by Monmarché in [31, Proposition 13]: working with an operator of the form

$$\begin{aligned} L f(x) = -(M^T x) \cdot \nabla _x f (x) + \text {div}({\mathcal {F}} \Theta \nabla _x f )(x) \end{aligned}$$

under the conditions that (i) no non-trivial subspace of \(\text {Ker} ({\mathcal {F}} \Theta )\) is invariant under M and (ii) the matrix M is positively stable, i.e. all its eigenvalues have real part greater than 0, the associated semigroup has a unique invariant measure, and if \(\rho >0\), then for the exponential rate \(\lambda _N^H\) of the above Ornstein–Uhlenbeck process we have

$$\begin{aligned} \rho - \epsilon \le \lambda _N^H \le \rho \end{aligned}$$

for every \(\epsilon \in (0, \rho )\). Fix such an \(\epsilon >0\) and conclude the first statement of the Proposition. In particular, when m is the maximal dimension of the Jordan block of M corresponding to the eigenvalue \(\lambda \) such that \(\text {Re}(\lambda ) = \rho \), the quantity \((1+t^{2(m-1)})e^{-2\rho t}\) is the optimal one regarding the long time behaviour, [31]. This implies that the spectral gap of the generator is \(\rho -\epsilon \), whereas the constant in front of the exponential is

$$\begin{aligned} c(\epsilon ,m):= \sup _{t}(1+t^{2(m-1)})e^{-2\epsilon t}. \end{aligned}$$

The harmonic chain satisfies conditions (i) and (ii): the first condition is equivalent to the hypoellipticity of the operator L, [23, Sect. 1], and our generator (1.8) is indeed hypoelliptic: it is proven in [16, Sect. 3, p. 667] and [9, Sect. 3], for more general classes of potentials than the quadratic ones, that the generator satisfies the rank condition of Hörmander's hypoellipticity Theorem, [24, Theorem 22.2.1]. Also, the matrix M is stable for every N, i.e. \(\text {Re}(\lambda ) > 0\) for all eigenvalues \(\lambda \); see [25, Lemma 5.1].

For the second conclusion of the Proposition, we recall that the matrix M is given by (1.9) and we write,

$$\begin{aligned} 2\gamma = \text {Tr}({\mathcal {F}}) = \text {Re}(\text {Tr}({\mathcal {F}}))= \text {Re}(\text {Tr}(M)) = \sum _{\lambda \in \sigma (M) }\text {Re}(\lambda ). \end{aligned}$$

On the r.h.s. we have a sum of 2N (counting multiplicity) positive terms, since \(\inf \{\text {Re}(\lambda ) \} \) is strictly positive, [25, Lemma 5.1(2)]. Now note that \(\text {Tr}({\mathcal {F}})\) does not depend on the number of oscillators, so the r.h.s. of the above displayed equation is uniformly bounded in N. Since

$$\begin{aligned} \sum _{\lambda \in \sigma (M) }\text {Re}(\lambda ) \ge 2N\inf \{\text {Re}(\lambda ): \lambda \in \sigma (M)\} \end{aligned}$$

we have that \(2N\inf \{\text {Re}(\lambda ): \lambda \in \sigma (M)\}\) is bounded uniformly in N, which implies the second part of the statement. \(\square \)
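Both parts of this argument can be probed numerically for moderate N. Below is a minimal sketch; the block form \(M=[[{\mathcal {F}}, -I],[B, 0]]\) is our reconstruction (inferred from the block computation in the proof of Lemma 6.3, since (1.9) is not restated here), and the parameter values are illustrative.

```python
import numpy as np

def drift_matrix(N, a=0.0, c=1.0, gamma=1.0):
    # Assumed block form of (1.9): M = [[F, -I], [B, 0]], with
    # B = -c*Delta^N + a*I (Dirichlet) and F = diag(gamma, 0, ..., 0, gamma).
    B = (a + 2*c)*np.eye(N) - c*(np.eye(N, k=1) + np.eye(N, k=-1))
    F = np.zeros((N, N)); F[0, 0] = F[-1, -1] = gamma
    return np.block([[F, -np.eye(N)], [B, np.zeros((N, N))]])

def rho(N, **kw):
    # spectral abscissa: inf of the real parts of sigma(M)
    return np.linalg.eigvals(drift_matrix(N, **kw)).real.min()

gamma = 1.0
eigs = np.linalg.eigvals(drift_matrix(20, gamma=gamma))
trace_check = eigs.real.sum()     # should reproduce Tr(F) = 2*gamma
rho10, rho40 = rho(10), rho(40)   # rho should shrink as N grows
```

The sum of the real parts of the eigenvalues reproduces \(2\gamma \) to machine precision, and \(\rho \) shrinks as N grows, in line with (6.1); since the block form of M is our reconstruction, this is a consistency check rather than a verification of (1.9).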

Remark 6.2

B can be seen as the Schrödinger operator \(B=-c\ \Delta ^N + \sum _{i=1}^N a \delta _i\), where \(c>0\), \( \Delta ^N \) is the Dirichlet Laplacian on \( l^2( \{1,\ldots ,N \})\) and \(\delta _i\) is the projection on the i-th coordinate. We give the following definition for the (discrete) Laplacian on \( l^2( \{1,\ldots ,N \})\) with Dirichlet boundary conditions:

$$\begin{aligned} -\Delta ^N := \sum _{i=0}^{N} L^{i,i+1} \end{aligned}$$

where \(L^{i,i+1}\) are uniquely determined by the quadratic form

$$\begin{aligned} \langle u, L^{i,i+1} u \rangle&= (u(i)-u(i+1))^2\quad \text {with} \\ u(0)&=u(N+1)=0\qquad \text {Dirichlet b.c}. \end{aligned}$$

We will use this information in the last part of the proof of Proposition 1.1, to bound the spectral norm of the inverse, \(\Vert b_N^{-1} \Vert _2\).
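As a sanity check of this definition, \(-\Delta ^N\) can be assembled directly from the quadratic form, with the terms \(i=0\) and \(i=N\) carrying the Dirichlet conditions \(u(0)=u(N+1)=0\); the result is the tridiagonal matrix with constant diagonal 2, which is the extra symmetry referred to in Remark 5.1. A minimal sketch:

```python
import numpy as np

def dirichlet_laplacian(N):
    # <u, -Delta^N u> = sum_{i=0}^{N} (u(i) - u(i+1))^2 with u(0) = u(N+1) = 0:
    # assemble the difference map E and return E^T E.
    E = np.zeros((N + 1, N))
    for i in range(N + 1):          # row i encodes u(i) - u(i+1)
        if i >= 1:
            E[i, i - 1] += 1.0      # +u(i)   (u(0) = 0 is built in)
        if i <= N - 1:
            E[i, i] -= 1.0          # -u(i+1) (u(N+1) = 0 is built in)
    return E.T @ E

L = dirichlet_laplacian(6)
expected = 2*np.eye(6) - np.eye(6, k=1) - np.eye(6, k=-1)
```

Summing the form only over \(1\le i\le N-1\) would instead produce the Neumann matrix, whose diagonal \((1,2,\ldots ,2,1)\) is not constant.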

The rest of this section is devoted to the study of the solution of the matrix equation (1.10). Note that [35, 36] are two other instances where a Lyapunov equation is solved explicitly in order to study thermal transport in harmonic chains of atoms. The right-hand side of the equation in the two above-mentioned cases is much simpler though, so it is easier to provide an analytical formula for the unique solution, as in [36].

Here we split the \(2N \times 2N\)-dimensional problem into 4 equal-sized blocks of dimension \(N\times N\). Then we exploit the information we get about each block from Lemma 6.3 below. In order to ease the readability of the proof, we split it into several lemmas until the end of the section.
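For small N, Eq. (1.10) can also be solved directly by vectorisation, \((M^T \otimes I + I \otimes M^T)\,\text {vec}(b)=\text {vec}(\Pi )\), and the four blocks read off; we use such solves as numerical cross-checks of the lemmas below. A sketch, where the block form of M is our reconstruction (inferred from the block computation in the proof of Lemma 6.3) and the parameter values are illustrative:

```python
import numpy as np

def chain_matrices(N, a=0.0, c=1.0, gamma=1.0):
    # Assumed block form M = [[F, -I], [B, 0]] (our reconstruction of (1.9)).
    B = (a + 2*c)*np.eye(N) - c*(np.eye(N, k=1) + np.eye(N, k=-1))
    F = np.zeros((N, N)); F[0, 0] = F[-1, -1] = gamma
    return B, F, np.block([[F, -np.eye(N)], [B, np.zeros((N, N))]])

def solve_lyapunov(M, Pi):
    # b M + M^T b = Pi  <=>  (kron(M^T, I) + kron(I, M^T)) vec(b) = vec(Pi)
    n = M.shape[0]
    K = np.kron(M.T, np.eye(n)) + np.kron(np.eye(n), M.T)
    return np.linalg.solve(K, Pi.reshape(-1)).reshape(n, n)

N, TL, TR = 7, 2.0, 1.0
B, F, M = chain_matrices(N)
# r.h.s. for m = N: diag(J, I) with J = diag(2*T_L, 1, ..., 1, 2*T_R)
J = np.diag([2*TL] + [1.0]*(N - 2) + [2*TR])
Pi = np.block([[J, np.zeros((N, N))], [np.zeros((N, N)), np.eye(N)]])
b = solve_lyapunov(M, Pi)
x, z, y = b[:N, :N], b[:N, N:], b[N:, N:]
residual = np.abs(b @ M + M.T @ b - Pi).max()
```

The unique solution is symmetric, so x and y come out symmetric, and the bottom-right block equation forces \(z+z^T=-I\) here.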

6.1 Matrix Equations from the Lyapunov Equation

Lemma 6.3

For \(0 \le m \le N\), we have the following equations for the blocks \(x_m,y_m\) and \(z_m\) of the matrix \(b_m\):

$$\begin{aligned} -z_m&=z_m^T +{\widetilde{J}}_m \end{aligned}$$
(6.2)
$$\begin{aligned} x_m&= B y_m + {\mathcal {F}} z_m \end{aligned}$$
(6.3)
$$\begin{aligned} -Bz_m+z_mB -&B {\widetilde{J}}_m = J_m^{(\Delta T)}- x_m {\mathcal {F}} - {\mathcal {F}} x_m \end{aligned}$$
(6.4)
$$\begin{aligned} y_m B - By_m&= {\mathcal {F}} + z_m {\mathcal {F}} + {\mathcal {F}} z_m\ \text {for}\ m \ge 1 \end{aligned}$$
(6.5)
$$\begin{aligned} y_m B - By_m&= z_m {\mathcal {F}} + {\mathcal {F}} z_m \quad \text {for}\ m=0 \end{aligned}$$
(6.6)

Here \({\widetilde{J}}_m = {\text {diag}}(1,1,\ldots ,1,0,\ldots ,0, 1,1,\ldots ,1)\) where the 0’s start at \((m+1,m+1)\)-entry and stop at \((N-(m+1),N-(m+1))\)-entry, and

\(J_m^{(\Delta T)} = {\text {diag}}(2T_L,1,\ldots ,1,0,\ldots ,0,1,\ldots ,1,2T_R) \) where the 0’s start at \((m+2,m+2)\)-entry and stop at \((N-(m+2),N-(m+2))\)-entry.

Proof

We consider m s.t. \(0 \le m \le N\), where \(b_m\) solves

$$\begin{aligned} b_m M +M^T b_m= \Pi _m \end{aligned}$$
(6.7)

and where

$$\begin{aligned} \Pi _m = \begin{bmatrix} J_m^{(\Delta T)} &{}0 \\ 0&{} {\widetilde{J}}_m \end{bmatrix}. \end{aligned}$$

From (6.7) and considering that \(x_m\) and \(y_m\) are symmetric matrices, we get

$$\begin{aligned} \begin{bmatrix} x_m{\mathcal {F}} + {\mathcal {F}} x_m +z_mB+Bz_m^T &{} -x_m+{\mathcal {F}} z_m + B y_m \\ -x_m+ z_m^T {\mathcal {F}} +y_m B &{} -z_m^T -z_m \end{bmatrix} = \begin{bmatrix} J_m^{(\Delta T)} &{}0 \\ 0&{} {\widetilde{J}}_m \end{bmatrix}. \end{aligned}$$

From that we get (6.2) and (6.3) directly, and also that:

$$\begin{aligned} Bz_m^T+z_mB&= J_m^{(\Delta T)}- x_m {\mathcal {F}} - {\mathcal {F}} x_m \end{aligned}$$
(6.8)

and by applying (6.2) to (6.8) we get (6.4).

Also, using that \(x_m\) and \(y_m\) are required to be symmetric matrices, from the transposed version of (6.3), we get the equation

$$\begin{aligned} x_m = y_m B -z_m {\mathcal {F}}- {\widetilde{J}}_m {\mathcal {F}} \end{aligned}$$

which, combined with (6.3), gives (6.5) for \(m\ge 1\) and (6.6) for \(m=0\). \(\square \)
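Each of these identities can be confirmed numerically on a directly solved \(b_m\); the self-contained sketch below does so for \(m=N\), where \({\widetilde{J}}_N=I\). The block form of M is our reconstruction from the displayed block computation above, and the parameter values are illustrative.

```python
import numpy as np

N, a, c, gamma, TL, TR = 7, 0.5, 1.0, 1.0, 2.0, 1.0
B = (a + 2*c)*np.eye(N) - c*(np.eye(N, k=1) + np.eye(N, k=-1))
F = np.zeros((N, N)); F[0, 0] = F[-1, -1] = gamma
M = np.block([[F, -np.eye(N)], [B, np.zeros((N, N))]])  # assumed form of (1.9)

# m = N: Pi_N = diag(J_N, I) with J_N = diag(2*T_L, 1, ..., 1, 2*T_R)
J = np.diag([2*TL] + [1.0]*(N - 2) + [2*TR])
Pi = np.block([[J, np.zeros((N, N))], [np.zeros((N, N)), np.eye(N)]])

K = np.kron(M.T, np.eye(2*N)) + np.kron(np.eye(2*N), M.T)
b = np.linalg.solve(K, Pi.reshape(-1)).reshape(2*N, 2*N)
x, z, y = b[:N, :N], b[:N, N:], b[N:, N:]

err_62 = np.abs(z + z.T + np.eye(N)).max()                    # (6.2): -z = z^T + I
err_63 = np.abs(x - (B @ y + F @ z)).max()                    # (6.3)
err_65 = np.abs((y @ B - B @ y) - (F + z @ F + F @ z)).max()  # (6.5), m >= 1
```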

From now on, we perform all the calculations when the dimension of the block matrices, N, is odd. The same calculations with minor differences hold when N is even as well.

6.2 Calculations for \(m=0,1,2\)

Before we start analysing the form of the block \(z_N\), we first present how each unit in the right-hand side of the Lyapunov equation (6.7) for \(0 \le m \le N\) (which corresponds to the spread of the noise through the system) affects the \(z_m\) block of the solution \(b_m\).

This subsection only serves to make it easier for the reader to follow how perturbing the r.h.s. of the Lyapunov equation affects the solution in each sequential step. In the next subsection we then analyse the \(z_N\) block (\(m=N\)), which is what we are interested in. Thus, the reader who is interested only in the proofs, and not in the motivation behind them, may skip this subsection.

For \(m=0\): The unique solution \(b_0\) of

$$\begin{aligned} b_0 M + M^T b_0= \text {diag}(2T_L,0,\ldots , 2T_R,0,\ldots ,0) \end{aligned}$$

has been computed in [35], where the elements of \(z_0 := (z_{ij}^{(0)})_{1\le i,j \le N}\) were found exactly, for \(a=0,c=1\), to be

$$\begin{aligned} z_{1,j}^{(0)}=\frac{{\text {sinh}}((N-j)\alpha )}{{\text {sinh}}(N\alpha )} \end{aligned}$$
(6.9)

for a constant \(\alpha \) such that \( {\text {cosh}} (\alpha ) = 1+\frac{1}{2\gamma }\). (It was done in the same manner as in [43, Sect. 11], but there the case \(\Delta T =0\) was treated.) Here we describe the steps briefly: first we notice that \(z_0\) is antisymmetric, since in (6.2) \({\widetilde{J}}_0=0\), and second, by (6.4), that it has the Toeplitz form

$$\begin{aligned} z_0 = \begin{bmatrix} 0 &{} z_{1,2}^{(0)} &{} \cdots &{} z_{1,N-1}^{(0)} &{} z_{1,N}^{(0)} \\ -z_{1,2}^{(0)} &{} 0 &{} z_{1,2}^{(0)} &{} \cdots &{} z_{1,N-1}^{(0)} \\ \vdots &{} \ddots &{} \ddots &{} \ddots &{} \vdots \\ -z_{1,N-1}^{(0)} &{} \cdots &{} -z_{1,2}^{(0)} &{} 0 &{} z_{1,2}^{(0)} \\ -z_{1,N}^{(0)} &{} -z_{1,N-1}^{(0)} &{} \cdots &{} -z_{1,2}^{(0)} &{} 0 \end{bmatrix} \end{aligned}$$
(6.10)

Indeed note that the r.h.s of (6.4) forms a bordered matrix

$$\begin{aligned} \left[ \begin{array}{c|ccc|c} *&{} *&{} \cdots &{} *&{} *\\ \hline *&{}0 &{} \quad &{} 0 &{} *\\ \vdots &{} \quad &{} \ddots &{} \quad &{} \vdots \\ *&{} 0 &{}\quad &{} 0 &{} *\\ \hline *&{} *&{} \cdots &{} *&{} *\end{array} \right] \end{aligned}$$

i.e. only the bordered elements are non zero and so the l.h.s of (6.4) should also have this bordered form. Due to the tridiagonal form of B we get a Toeplitz matrix: in particular using that \(B= -c \Delta ^N + a I\), the l.h.s of (6.4) is

$$\begin{aligned} z_0 ( -c \Delta ^N + a I)-(-c \Delta ^N + a I) z_0 = c(\Delta ^N z_0 - z_0 \Delta ^N ) =\left[ \begin{array}{c|ccc|c} *&{} *&{} \cdots &{} *&{} *\\ \hline *&{}0 &{} \quad &{} 0 &{} *\\ \vdots &{} \quad &{} \ddots &{} \quad &{} \vdots \\ *&{} 0 &{}\quad &{} 0 &{} *\\ \hline *&{} *&{} \cdots &{} *&{} *\end{array} \right] \end{aligned}$$
(6.11)

and equating the non-boundary entries, due to the symmetry of \(\Delta ^N\) and the antisymmetry of \(z_0\), we find that the elements of \(z_0\) are constant along the diagonals: indeed, for \(1<i<N\), for the diagonal entries of Eq. (6.11) we have

$$\begin{aligned} -cz_{i-1,i}^{(0)}-cz_{i+1,i}^{(0)}&+2cz_{i,i}^{(0)} -2c z_{i,i}^{(0)}+c z_{i,i-1}^{(0)} + c z_{i,i+1}^{(0)}=0 \\ \text {or}\quad 2cz_{i,i+1}^{(0)}&-2c z_{i-1,i}^{(0)} =0\quad \text {and so}\quad z_{i,i+1}^{(0)}=z_{i-1,i}^{(0)}. \end{aligned}$$

For the superdiagonal’s entries of the Eq. (6.11)

$$\begin{aligned}&-cz_{i-1,i+1}^{(0)}+2cz_{i,i+1}^{(0)}-cz_{i+1,i+1}^{(0)}+cz_{ii}^{(0)}-2cz_{i,i+1}^{(0)}+cz_{i,i+2}^{(0)}=0 \\&\text {or}\quad -cz_{i-1,i+1}^{(0)} +cz_{i,i+2}^{(0)}=0 \quad \text {and so}\quad z_{i-1,i+1}^{(0)}=z_{i,i+2}^{(0)}. \end{aligned}$$

We repeat these calculations through all the non-boundary entries of the matrix and, using the information we get from each calculation, we end up with the Toeplitz form of \(z_0\) in (6.10).

We can now see that a solution to (6.6) is a symmetric Hankel matrix which is antisymmetric about the cross diagonal and such that \(y_{1,j}^{(0)}= z_{1,j+1}^{(0)}\) for \(1 \le j \le N-1\). Then we apply (6.3) to get a formula for the entries of \(x_0\), and from the bordered entries of \(x_0\) in (6.4) we end up with the linear equation

$$\begin{aligned} K_0 \cdot \underline{z_0} =e_1. \end{aligned}$$

Here \(\underline{z_0},\ e_1 \in {\mathbb {C}}^{N-1}\) are the vectors \(\underline{z_0}=(z_{1,1}^{(0)}, \ldots , z_{1,N-1}^{(0)})^T\), \(e_1=(1,0,\ldots ,0)^T\) and \(K_0\) is an \((N-1) \times (N-1)\) symmetric Jacobi matrix whose entries depend on the (dimensionless) friction constant \(\gamma \) and the interaction constant c:

$$\begin{aligned} K_0= cB+ \gamma ^{-1} I. \end{aligned}$$

We solve the above equation using, for example, Cramer's rule and find an explicit formula for the \(z_{1,j}^{(0)}\)'s: the recurrence for the determinant of \(K_0\) is the same as the recurrence of the Chebyshev polynomials of the second kind, so using properties of these polynomials and imposing appropriate initial conditions we end up with the form (6.9).
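Both structural claims (antisymmetry and the Toeplitz form (6.10)) are easy to confirm numerically by solving the equation for \(b_0\) and extracting the z-block. A sketch with \(a=0\), \(c=1\) as in [35], illustrative temperatures, and the block form of M again being our reconstruction:

```python
import numpy as np

N, gamma, TL, TR = 9, 1.0, 2.0, 1.0   # a = 0, c = 1, as in [35]
B = 2*np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
F = np.zeros((N, N)); F[0, 0] = F[-1, -1] = gamma
M = np.block([[F, -np.eye(N)], [B, np.zeros((N, N))]])  # assumed form of (1.9)

# m = 0: r.h.s. = diag(2*T_L, 0, ..., 0, 2*T_R, 0, ..., 0), noise on x-block only
Pi = np.zeros((2*N, 2*N)); Pi[0, 0] = 2*TL; Pi[N-1, N-1] = 2*TR

K = np.kron(M.T, np.eye(2*N)) + np.kron(np.eye(2*N), M.T)
b0 = np.linalg.solve(K, Pi.reshape(-1)).reshape(2*N, 2*N)
z0 = b0[:N, N:]

antisym_err = np.abs(z0 + z0.T).max()                    # z_0 antisymmetric
toeplitz_err = np.abs(z0[1:, 1:] - z0[:-1, :-1]).max()   # constant diagonals
```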

For \(m\ge 1\) we use Eq. (6.4) again. In the first step we get the following:

For \(m=1\), i.e. for the form of the \(z_1\)-block in \(b_1\): the elements \(z_{1,1}^{(1)}, z_{N,N}^{(1)}\) on the main diagonal are \(-1/2\). The difference with the \(m=0\) step is that \(z_1\) is no longer antisymmetric, since 1/2 is added to the first and last entries of the diagonal (due to the form of \({\widetilde{J}}_1\)). So from (6.2) we write

$$\begin{aligned} -z_{i,i}^{(1)}=z_{i,i}^{(1)}+1\quad \text {or}\quad z_{i,i}^{(1)}=-1/2\quad \text {for}\ i=1,N. \end{aligned}$$

But we still have the bordered form in the r.h.s. of (6.4), so we still have a Toeplitz-form for \(z_1\).

In the next Lemma we give the form of the \(z_2\) block of \(b_2\).

Lemma 6.4

(For \(m=2\), form of \(z_2\)) For the \(z_2\)-block of \(b_2\): there exists an antisymmetric matrix \(z_2^{anti}\) such that \( z_2 = z_2^{anti} - \frac{1}{2}{\widetilde{J}}_2 \) and

$$\begin{aligned} \left\{ \begin{array}{ l l } z_{1,1}^{(2)}=z_{2,2}^{(2)}=z_{N,N}^{(2)}=z_{N-1,N-1}^{(2)}=-1/2\ \text {and}\ z_{i,i}^{(2)}=0\quad \text {otherwise} \\ z_{1,2}^{(2)}+ z_{N,N-1}^{(2)} = 2\frac{ 1 +a+2c}{4c},\quad z_{N,N-2}^{(2)}+z_{1,3}^{(2)}=1 \\ z_{N-k,N}^{(2)} =z_{1,k+1}^{(2)}\quad \text {for}\ 3\le k \le N-3 .\end{array} \right. \end{aligned}$$

The last property says that the Toeplitz form is not perturbed more than 2 diagonals away from the centre.

In the following we denote \(\mu _{a,c}:= \frac{1+a+2c}{4c}\).

Proof of Lemma 6.4

\(z_2 \) is not antisymmetric, but from (6.2) we immediately have that \( z_2=z_2^{anti}-\frac{1}{2}{\widetilde{J}}_2\), where \(z_2^{anti}\) is antisymmetric. So we work with \(z_2^{anti}\) and, due to the antisymmetry, we look only at the upper triangular part of the matrix.

Here, besides the fact that \(z_2\) is not antisymmetric, the r.h.s. of (6.4) is no longer a bordered matrix, and the matrix \(B {\widetilde{J}}_2\) affects non-boundary entries as well; in particular it adds the \((3 \times 2)\) top-left and bottom-right submatrices of B to the respective \((3 \times 2) \) submatrices of \(z_2\):

(6.12)

Equating the entries that correspond to the zero-submatrix as drawn above we will have the same calculations as in the step \(m=0\).

From (6.2) we have \(z_{1,1}^{(2)}=z_{2,2}^{(2)}=z_{N,N}^{(2)}=z_{N-1,N-1}^{(2)}=-1/2\) and \(z_{i,i}^{(2)}=0\) for \(N-1>i>2\).

Looking at the (2, 2)-entry and the (2, 3)-entry of the equation (6.12) we have respectively

$$\begin{aligned} -c z_{2,1}^{(2)} + 2c z_{2,2}^{(2)}-c z_{2,3}^{(2)}&+c z_{1,2}^{(2)} -2c z_{2,2}^{(2)}+c z_{3,2}^{(2)} -\frac{(a+2c)}{2} = \frac{1}{2} \\ -c z_{2,2}^{(2)}+2c z_{2,3}^{(2)}-c z_{2,4}^{(2)}+&c z_{1,3}^{(2)}-2c z_{2,3}^{(2)}+c z_{3,3}^{(2)}=0 \end{aligned}$$

and since \(z_{i,j}^{(2)}=-z_{j,i}^{(2)}\) for \(j \ne i\) from (6.2), and also \(z_{2,2}^{(2)}=-1/2, z_{3,3}^{(2)}=0\), we get

$$\begin{aligned} z_{2,3}^{(2)}=z_{1,2}^{(2)}- \mu _{a,c}\quad \text {and}\quad z_{2,4}^{(2)}=z_{1,3}^{(2)}+1/2. \end{aligned}$$

Now looking at the entries \((i,i)\) for \(3 \le i\le N-2\) of equation (6.12), we write (as in the step \(m=0\)):

$$\begin{aligned} -cz_{i,i-1}^{(2)} + 2c z_{i,i}^{(2)} -cz_{i,i+1}^{(2)} + cz_{i-1,i}^{(2)} -2c z_{i,i}^{(2)} + cz_{i+1,i}^{(2)} =0 \end{aligned}$$

which gives

$$\begin{aligned} z_{i-1,i}^{(2)}=z_{i,i+1}^{(2)},\quad 3 \le i\le N-2. \end{aligned}$$

In particular

$$\begin{aligned} z_{i,i+1}^{(2)}&= z_{1,2}^{(2)}-\mu _{a,c}=- z_{N,N-1}^{(2)} + \mu _{a,c}\quad \text {and}\\ z_{i,i+2}^{(2)}&= z_{1,3}^{(2)}+ \frac{1}{2} = -z_{N,N-2}^{(2)} -\frac{1}{2} \end{aligned}$$

where the second equalities in both lines are proved by looking in the reversed direction (from the bottom-right to the top-left of the matrix). Also, for \(k \ge 2\) and \( 1 \le i \le N-k\), looking at the \((i,i+k)\)-entry of Eq. (6.12) we get

$$\begin{aligned} z_{i,i+k+1}^{(2)}=z_{i-1,i+k}^{(2)}. \end{aligned}$$

This corresponds to the Toeplitz property that holds for all the diagonals apart from the 5 central ones. Remember that for \(m=0\) we end up with a Toeplitz matrix. \(\square \)

In the m-th step of this sequence of matrix equations, for the \(z_m\)-block of \(b_m\), the central \((4m-3)\) diagonals have a perturbed Toeplitz form: the elements along these diagonals are changed by constants that depend on the coefficients a, c.

The resulting matrix \(z_m\) is described in the following way, where \(\mu _{a,c}:= \frac{1+a+2c}{4c}\):

$$\begin{aligned} \left\{ \begin{array}{ l l } z_{1,j}^{(m)}+z_{N,N-(j-1)}^{(m)} = m\mu _{a,c},\quad &{}\text {for j even},\ j\le m \\ z_{1,j}^{(m)}+z_{N,N-(j-1)}^{(m)} = -m,\quad &{}\text {for j odd},\ j\le m \\ z_{N-j,N}^{(m)} = z_{1,j+1}^{(m)},\quad &{}\text {for}\ m<j<N-2,\qquad \textit{(Toeplitz form)}\\ z_{i,i}^{(m)} =-1/2, &{}\text {for}\ i \le m\ \text {or}\ i \ge N-m+1\\ z_{i,i}^{(m)}=0,\ &{}\text {for}\ m< i < N-m+1. \end{array} \right. \end{aligned}$$

The explanation is the same as in the step \(m=2\), but this holds for arbitrary \(m\le N\).

6.3 Preliminaries: Compute the Blocks \(z_N\), \(y_N, x_N\) of \(b_N\)

Lemma 6.5

(Form of \(z_N\) block) The matrix \(z_N:= (z_{i,j}^{(N)})_{1 \le i,j \le N}\) is a real \(N \times N\) matrix of the form

$$\begin{aligned} z_N = z_N^{anti}- \frac{1}{2} I \end{aligned}$$

where \(z_N^{anti}= [z_{i,j}^{(N),anti}]\) is an antisymmetric matrix. We denote by \(\mu _{a,c} := \frac{1+a+2c}{2c}\) (note that from here on this differs by a factor of 2 from the \(\mu _{a,c}\) used in Sect. 6.2). \(z_N\) has the following perturbed Toeplitz form: for \(2 \le i\le N-k\) and \(1 \le k \le N-2\),

$$\begin{aligned} \left\{ \begin{array}{ l l } z_{i,i+k}^{(N),anti} - z_{i-1,i+k-1}^{(N),anti}=-\mu _{a,c},\ &{}\text {for}\ k\ \text {odd} \\ z_{i,i+k}^{(N),anti} - z_{i-1,i+k-1}^{(N),anti}=1,\ &{}\text {for}\ k\ \text {even} \end{array} \right. \end{aligned}$$
(6.13)

and for the second and second-to-last line respectively:

$$\begin{aligned} \left\{ \begin{array}{ l l } z_{2,k}^{(N),anti} - z_{1,k-1}^{(N),anti}=-\mu _{a,c},\quad z_{N-1,k}^{(N),anti}-z_{N,k+1}^{(N),anti}= - \mu _{a,c},\ &{}\text {for}\ k\ \text {odd} \\ z_{2,k}^{(N),anti} - z_{1,k-1}^{(N),anti}=1,\quad z_{N-1,k}^{(N),anti}-z_{N,k+1}^{(N),anti}=1 &{}\text {for}\ k\ \text {even} \\ \end{array} \right. \end{aligned}$$
(6.14)

Regarding the 'cross-diagonal' we have, for \(1 \le k \le N-2\),

$$\begin{aligned} \left\{ \begin{array}{ l l} z_{i,i+k}^{(N),anti} - z_{N-k-(i-1),N-(i-1)}^{(N),anti}=(N-k-2i+1)\mu _{a,c},\ &{}\text {for}\ k\ \text {odd},\ 1\le i \le \frac{N-k}{2} \\ z_{i,i+k}^{(N),anti} - z_{N-k-(i-1),N-(i-1)}^{(N),anti}=k-N+2i-1,\ &{}\text {for}\ k\ \text {even},\ 1 \le i \le \frac{N-(k+1)}{2}. \end{array} \right. \end{aligned}$$
(6.15)

In particular,

$$\begin{aligned} \left\{ \begin{array}{ c l } z_{1,1+k}^{(N),anti} +z_{N,N-k}^{(N),anti}=(N-(k+1))\mu _{a,c},\ &{}\text {for}\ k\ \text {odd} \\ z_{1,1+k}^{(N),anti} +z_{N,N-k}^{(N),anti}=k-N+1,\ &{}\text {for}\ k\ \text {even}. \end{array} \right. \end{aligned}$$
(6.16)

This corresponds to the relation of the first row with the last row of the matrix.

From the above Lemma we conclude that \(z_N\) can be written in the general form

$$\begin{aligned} z_N=&-\frac{1}{2} I + \sum _{ \begin{array}{c} k=1 \\ k\ \text {odd} \end{array}}^{N-1} \left( z_{N,N-k}^{(N)}( {\underline{J}}^k - {\overline{J}}^k) + \sum _{j=k+1}^{N} (N-j) \mu _{a,c}({\overline{\iota }}_j + {\underline{\iota }}_{-j}) \right) \nonumber \\&+ \sum _{ \begin{array}{c} k=1 \\ k\ even \end{array}}^{N-1} \left( z_{N,N-k}^{(N)}( {\underline{J}}^k - {\overline{J}}^k) - \sum _{j=k+1}^{N} (N-j)({\overline{\iota }}_j + {\underline{\iota }}_{-j}) \right) \end{aligned}$$
(6.17)

where we write \({\overline{J}}\) for the square matrix with 1’s in the superdiagonal and \({\underline{J}}\) for the matrix with 1’s in the subdiagonal.

Also, \({\overline{\iota }}_k \) denotes the matrix with 1 in the \((k,k+1)\)-entry and \({\underline{\iota }}_{-k}\) the matrix with \(-1\) in the \((k+1,k)\)-entry.


Proof of Lemma 6.5

From (6.2) we have

$$\begin{aligned} z_N = z_N^{anti}-\frac{1}{2}I, \end{aligned}$$

where \(z_N^{anti}\) is an antisymmetric matrix. So in order to find the form of \(z_N\) we only need to study \(z_N^{anti}\) and, due to its antisymmetry, we only need to study its upper triangular part.

We look at the non-bordered entries of the upper triangular part of (6.4). That is the equation

$$\begin{aligned} c(\Delta ^N z_N^{anti} - z_N^{anti} \Delta ^N) -B = \left[ \begin{array}{c|c|ccc|c} *&{} *&{}*&{} \cdots &{} *&{} *\\ \hline *&{}1 &{}0 &{} \quad &{} 0 &{} *\\ \hline *&{} 0 &{}1 &{}\quad &{}0 &{}*\\ \vdots &{} \quad &{} \quad &{} \ddots &{} \quad &{} \vdots \\ \hline *&{} 0 &{}0 &{}\quad &{} 1&{} *\\ \hline *&{} *&{} *&{} \cdots &{} *&{} *\end{array} \right] . \end{aligned}$$
(6.18)

Looking at the diagonal entries \((i,i)\) for \(1<i<N\) of the above Eq. (6.18), we write

$$\begin{aligned} -cz_{i,i-1}^{(N),anti}+2cz_{i,i}^{(N),anti}-cz_{i,i+1}^{(N),anti}+cz_{i-1,i}^{(N),anti}-2cz_{i,i}^{(N),anti}+cz_{i+1,i}^{(N),anti}-(2c+a) =1 \end{aligned}$$

and using the antisymmetry of the elements of \(z_N^{anti}\), it gives

$$\begin{aligned} z_{i,i+1}^{(N),anti}&=z_{i-1,i}^{(N),anti}- \mu _{a,c} = z_{i-2,i-1}^{(N),anti}-2\mu _{a,c} \\&=\dots =z_{1,2}^{(N),anti} - (i-1) \mu _{a,c}. \end{aligned}$$

Therefore, inductively we get

$$\begin{aligned} z_{i,i+1}^{(N),anti}= z_{1,2}^{(N),anti}-(i-1)\mu _{a,c}. \end{aligned}$$
(6.19)

At the same time, looking from bottom-right to top-left, we can write

$$\begin{aligned} z_{i-1,i}^{(N),anti}&=z_{i,i+1}^{(N),anti}+ \mu _{a,c} =z_{i+1,i+2}^{(N),anti}+ 2\mu _{a,c} \\&=\dots =z_{N,N-1}^{(N),anti}+(i-1) \mu _{a,c}. \end{aligned}$$

Then, looking at the super-diagonal’s entries, i.e. the \((i,i+1)\)-entry, for \(1< i < N-1\), of Eq. (6.18), we write

$$\begin{aligned} -cz_{i,i}^{(N),anti} + 2cz_{i,i+1}^{(N),anti}-cz_{i,i+2}^{(N),anti}+cz_{i-1,i+1}^{(N),anti}-2cz_{i,i+1}^{(N),anti}+cz_{i+1,i+1}^{(N),anti}+c=0 \end{aligned}$$

and that gives

$$\begin{aligned} z_{i,i+2}^{(N),anti}=z_{i-1,i+1}^{(N),anti}+1= \cdots = z_{1,3}^{(N),anti}+(i-1) \end{aligned}$$

and at the same time (reversed direction, i.e. from bottom right to top left)

$$\begin{aligned} z_{i-1,i+1}^{(N),anti} =-z_{i+2,i}^{(N),anti}-1=\cdots =- z_{N,N-2}^{(N),anti}- (N-(i+1)) . \end{aligned}$$

Similarly, looking at the entries \((i,i+2)\) for \(1< i < N-2\):

$$\begin{aligned} c z_{i-1,i+2}^{(N),anti} -2c z_{i,i+2}^{(N),anti} + c z_{i+1,i+2}^{(N),anti} -c z_{i,i+1}^{(N),anti} + 2c z_{i,i+2}^{(N),anti} -c z_{i,i+3}^{(N),anti}=0. \end{aligned}$$

Apply (6.19) twice: \( z_{i+1,i+2}^{(N),anti} = z_{1,2}^{(N),anti} - i\mu _{a,c}\) and \( -z_{i,i+1}^{(N),anti} = -z_{1,2}^{(N),anti}+ (i-1)\mu _{a,c}\) and get

$$\begin{aligned} z_{i-1,i+2}^{(N),anti}-\mu _{a,c}= z_{i,i+3}^{(N),anti}. \end{aligned}$$

So inductively,

$$\begin{aligned} z_{i,i+3}^{(N),anti}= z_{1,4}^{(N),anti} - (i-1)\mu _{a,c}. \end{aligned}$$
(6.20)

Also, from the reversed direction we get inductively

$$\begin{aligned} z_{i,i+3}^{(N),anti} = z_{N,N-3}^{(N),anti}-(N-3-i). \end{aligned}$$

For the general case, as stated in the Lemma, we proceed by induction on k. For \(k=1,2,3\) it is true by the above calculations. We treat the case k odd. Suppose it holds for \(k-2\); we look at the \((i,i+k-1)\)-entry of Eq. (6.18): for \(1<i<N-(k-1)\),

$$\begin{aligned}&cz_{i-1,i+k-1}^{(N),anti}-2cz_{i,i+k-1}^{(N),anti}+cz_{i+1,i+k-1}^{(N),anti} -cz_{i,i+(k-2)}^{(N),anti} + 2cz_{i,i+k-1}^{(N),anti} -cz_{i,i+k}^{(N),anti}=0\quad \text {or} \\&z_{i-1,i+k-1}^{(N),anti}- z_{i,i+k}^{(N),anti}+ ( z_{i+1, i+1+(k-2)}^{(N),anti} - z_{i,i+(k-2)}^{(N),anti} )=0 . \end{aligned}$$

Then from the induction hypothesis we end up with (6.13). The case k even follows similarly.

Now we generalise the previous induction formulas, for k odd for example, and write:

$$\begin{aligned} z_{i,i+k}^{(N),anti}= z_{1,k+1}^{(N),anti}-(i-1)\mu _{a,c} \end{aligned}$$

and from the reversed direction

$$\begin{aligned} z_{i,i+k}^{(N),anti}=(N-k-i)\mu _{a,c} + z_{N-k,N}^{(N),anti}. \end{aligned}$$

From these two equations we obtain the special case (6.16). The case k even is proven similarly. For (6.15), we write for k odd:

$$\begin{aligned} z_{i,i+k}^{(N),anti} - z_{N-k-(i-1),N-(i-1)}^{(N),anti}&= z_{i-1,i+k-1}^{(N),anti}-\mu _{a,c} - (z_{N-k-i,N-i}^{(N),anti} + \mu _{a,c} ) \\&= z_{i-1,i+k-1}^{(N),anti} - z_{N-k-i,N-i}^{(N),anti} - 2\mu _{a,c} \\&= \cdots = z_{1,k+1}^{(N),anti} - z_{N-k,N}^{(N),anti}-2(i-1)\mu _{a,c}\\&= (N-k-2i+1)\mu _{a,c}. \end{aligned}$$

where in the last line we applied (6.16). The case k even is proven in the same way. \(\square \)

The above discussion shows that in order to understand the entries of \(z_N\), we need only to understand the vector \(\underline{z_N} = (z_{1,2}^{(N)}, z_{1,3}^{(N)},\ldots , z_{1,N}^{(N)})\).

We now state a Lemma that shows the relation between the elements of \(\underline{z_N}\) and the entries of the first row and the last column of \(x_N=[x_{i,j}^{(N)}]\), yielding a relation between \(x_{1,j}^{(N)}\) and \(x_{i,N}^{(N)}\) across the 'cross diagonal'.

Lemma 6.6

For \(3 \le k \le N\),

$$\begin{aligned} \left\{ \begin{array}{ l l } z_{1,k}^{(N),anti} = 1+\frac{\gamma }{c}x_{1,k-1}^{(N)} =- \frac{\gamma }{c}x_{N,N-k+2}^{(N)}-(N-k+1),\ &{}\text {for}\ k\ \text {odd} \\ z_{1,k}^{(N),anti}= -\mu _{a,c} +\frac{\gamma }{c}x_{1,k-1}^{(N)} =- \frac{\gamma }{c}x_{N,N-k+2}^{(N)}+(N-k+1)\mu _{a,c},\ &{}\text {for}\ k\ \text {even} \end{array} \right. \end{aligned}$$
(6.21)

and \(z_{1,2}^{(N),anti} = \frac{\gamma }{c}x_{1,1}^{(N)} - \frac{T_L+a+2c}{2c}\) and so for \(3 \le k \le N\)

$$\begin{aligned} \left\{ \begin{array}{ c l } x_{1,k-1}^{(N)} =- x_{N,N-k+2}^{(N)}-\frac{c}{\gamma }(N-k+2),\ &{}\text {for}\ k\ \text {odd} \\ x_{1,k-1}^{(N)} =- x_{N,N-k+2}^{(N)}+\frac{c}{\gamma }(N-k+2) \mu _{a,c},\ &{}\text {for}\ k\ \text {even}. \end{array} \right. \end{aligned}$$
(6.22)

Also \(x_{1,N}^{(N)} = \frac{c}{2\gamma }\mu _{a,c}\), where \(\mu _{a,c} := \frac{1+a+2c}{2c}\).

Proof

We look at the bordered entries of Eq. (6.4). Let us first look at the \((N,j)\)-entry for j even:

$$\begin{aligned} -c z_{N,j-1}^{(N),anti}+2cz_{N,j}^{(N),anti}-cz_{N,j+1}^{(N),anti}+c z_{N-1,j}^{(N),anti} -2c z_{N,j}^{(N),anti}= -\gamma x_{N,j}^{(N)}. \end{aligned}$$

Using Lemma 6.5 we write

$$\begin{aligned} c z_{1,N-j+2}^{(N),anti} + (j-2)c + c z_{1,N-j }^{(N),anti}+jc -cz_{1,N-j}^{(N),anti} - (j-1)c = - \gamma x_{N,j}^{(N)} \end{aligned}$$

and after the obvious cancellations we have for j even

$$\begin{aligned} x_{N,j}^{(N)} = -\frac{c}{\gamma }z_{1,N-j+2}^{(N),anti} - (j-1)\frac{c}{\gamma }. \end{aligned}$$
(6.23)

Similarly for j odd we have

$$\begin{aligned} x_{N,j}^{(N)} = -\frac{c}{\gamma }z_{1,N-j+2}^{(N),anti} + (j-1)\frac{c}{\gamma }\mu _{a,c}. \end{aligned}$$
(6.24)

Moreover, with exactly the same calculations, but looking at the (1, j)-entry of Eq. (6.4) we get, for \(2 \le j \le N-1\),

$$\begin{aligned} x_{1,j}^{(N)} = \frac{c}{\gamma }z_{1,j+1}^{(N),anti} - \frac{c}{\gamma }\ \text {for}\ j\ \text {even}\quad \text {and}\quad x_{1,j}^{(N)}= \frac{c}{\gamma }z_{1,j+1}^{(N),anti} + \frac{c}{\gamma } \mu _{a,c}\ \text {for}\ j\ \text {odd}. \end{aligned}$$
(6.25)

Now set \(k:=N-j+2\), so that \(3 \le k \le N\). Since N is odd, whenever j is odd, k is even, and vice versa. Solving Eqs. (6.24) and (6.23) for \(z_{1,k}^{(N),anti}\), we get the second equations in (6.21), whereas solving (6.25), with \(\lambda := j+1\), for \(z_{1,\lambda }^{(N),anti}\), we get the first equations in (6.21) as well. We conclude with (6.22) just by combining the above relations in both cases.

Finally to get this specific value for \(x_{1,N}^{(N)}\) we look at the (1, N)-entry of Eq. (6.4) and perform the same calculations as above. \(\square \)
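Two of these relations serve as convenient numerical checkpoints: the corner value \(x_{1,N}^{(N)} = \frac{c}{2\gamma }\mu _{a,c}\) and relation (6.25) for an even interior j. A sketch (reconstructed block form of M, illustrative parameters, N odd):

```python
import numpy as np

N, a, c, gamma, TL, TR = 9, 0.5, 1.0, 2.0, 2.0, 1.0   # N odd
mu = (1 + a + 2*c) / (2*c)
B = (a + 2*c)*np.eye(N) - c*(np.eye(N, k=1) + np.eye(N, k=-1))
F = np.zeros((N, N)); F[0, 0] = F[-1, -1] = gamma
M = np.block([[F, -np.eye(N)], [B, np.zeros((N, N))]])  # assumed form of (1.9)

J = np.diag([2*TL] + [1.0]*(N - 2) + [2*TR])
Pi = np.block([[J, np.zeros((N, N))], [np.zeros((N, N)), np.eye(N)]])
K = np.kron(M.T, np.eye(2*N)) + np.kron(np.eye(2*N), M.T)
b = np.linalg.solve(K, Pi.reshape(-1)).reshape(2*N, 2*N)
x, z = b[:N, :N], b[:N, N:]

corner_err = abs(x[0, N-1] - c*mu/(2*gamma))         # x_{1,N} = c*mu/(2*gamma)
# (6.25) with j = 4 (1-based, even): x_{1,4} = (c/gamma)(z_{1,5} - 1)
eq625_err = abs(x[0, 3] - (c/gamma)*(z[0, 4] - 1.0))
```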

Considering the above Lemma, the matrix \(z_N\) can also be written in closed form in terms of the constants \(\kappa _L := \frac{T_L+a+2c}{2c}\) and \(\kappa _R := \frac{T_R+a+2c}{2c}\).

In the following we state a Lemma about the symmetries that hold in the \(y_N\)-block of \(b_N\), concluding that all the entries of \(y_N\) can be written in terms of the vectors \(\underline{y_N}:= (y_{1,N}^{(N)},y_{1,N-1}^{(N)}, \ldots , y_{1,1}^{(N)})\) and \(\underline{z_N}\).

Lemma 6.7

For \(2 \le i \le N-(k+1)\) and \(1 \le k \le N-3\),

$$\begin{aligned}&y_{i-1,i+k}^{(N)} - y_{i,i+k-1}^{(N)} + (y_{i+1,i+k}^{(N)} - y_{i,i+k+1}^{(N)})=0 \end{aligned}$$
(6.26)
$$\begin{aligned}&y_{2,k}^{(N)} = y_{1,k-1}^{(N)} + y_{1,k+1}^{(N)}+ \frac{\gamma }{c} z_{1,k}^{(N)},\quad \text {for}\quad 2 \le k \le N-1, \end{aligned}$$
(6.27)
$$\begin{aligned}&\text {and}\quad y_{2,N}^{(N)} = y_{1,N-1}^{(N)} + \frac{2\gamma }{c}z_{1,N}^{(N)} \nonumber \\&\quad y_{k,N}^{(N)}= \frac{\gamma }{c}(z_{k-1,N}^{(N)} + z_{1,N-(k-2)}^{(N)})+ y_{1,N-(k-1)}^{(N)},\quad \text {for}\quad 2 \le k \le N \end{aligned}$$
(6.28)

Proof

Due to the symmetry of \(y_N\) it is enough to look at the upper triangular part. We look at the entries \((i,i+k)\) of Eq. (6.5). For \(k=1\) we have

$$\begin{aligned} -y_{i,i}^{(N)}- y_{i,i+2}^{(N)}+y_{i-1,i+1}^{(N)}+y_{i+1,i+1}^{(N)}=0 \end{aligned}$$

which is Eq. (6.26). For \(1<k< N-1\) we prove it by induction on k, as in the proof of Lemma 6.5. Let us now look at the (1, N)-entry of (6.5):

$$\begin{aligned} -c y_{1,N-1}^{(N)}+ 2cy_{1,N}^{(N)}-2cy_{1,N}^{(N)} + cy_{2,N}^{(N)} = 2 \gamma z_{1,N}^{(N),anti} \end{aligned}$$

which gives \(y_{2,N}^{(N)} = y_{1,N-1}^{(N)} + \frac{2\gamma }{c}z_{1,N}^{(N)}.\) For (6.27) we look at the (1, k)-entry:

$$\begin{aligned} -cy_{1,k-1}^{(N)}+2cy_{1,k}^{(N)}-cy_{1,k+1}^{(N)} -2cy_{1,k}^{(N)}+cy_{2,k}^{(N)}=\gamma z_{1,k}^{(N),anti} \end{aligned}$$

which is

$$\begin{aligned} -y_{1,k-1}^{(N)} - y_{1,k+1}^{(N)} + y_{2,k}^{(N)} = \frac{\gamma }{c} z_{1,k}^{(N),anti} \end{aligned}$$

and this is the desired equation. For (6.28), we look at the \((k-1,N)\)-entry of (6.5) for \(k\ge 3\). Performing the same calculations as above, we get

$$\begin{aligned} y_{k,N}^{(N)} = \frac{\gamma }{c} z_{k-1,N}^{(N),anti} - y_{k-2,N}^{(N)}+y_{k-1,N-1}^{(N)}. \end{aligned}$$

Then using the relations (6.26) and (6.27) for each of the terms above, we get the stated relation. \(\square \)
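The relations of the Lemma can again be confirmed on a directly solved \(b_N\); below, (6.27) is checked for all \(2\le k\le N-1\), together with the corner relation \(y_{2,N}^{(N)} = y_{1,N-1}^{(N)} + \frac{2\gamma }{c}z_{1,N}^{(N)}\). A sketch (the block form of M is our reconstruction of (1.9), parameters illustrative):

```python
import numpy as np

N, a, c, gamma, TL, TR = 9, 0.5, 1.0, 1.5, 2.0, 1.0
B = (a + 2*c)*np.eye(N) - c*(np.eye(N, k=1) + np.eye(N, k=-1))
F = np.zeros((N, N)); F[0, 0] = F[-1, -1] = gamma
M = np.block([[F, -np.eye(N)], [B, np.zeros((N, N))]])  # assumed form of (1.9)

J = np.diag([2*TL] + [1.0]*(N - 2) + [2*TR])
Pi = np.block([[J, np.zeros((N, N))], [np.zeros((N, N)), np.eye(N)]])
K = np.kron(M.T, np.eye(2*N)) + np.kron(np.eye(2*N), M.T)
b = np.linalg.solve(K, Pi.reshape(-1)).reshape(2*N, 2*N)
z, y = b[:N, N:], b[N:, N:]

# (6.27): y_{2,k} = y_{1,k-1} + y_{1,k+1} + (gamma/c) z_{1,k}, 2 <= k <= N-1
eq627_err = max(abs(y[1, k-1] - (y[0, k-2] + y[0, k] + (gamma/c)*z[0, k-1]))
                for k in range(2, N))
# corner relation: y_{2,N} = y_{1,N-1} + (2*gamma/c) z_{1,N}
corner_err = abs(y[1, N-1] - (y[0, N-2] + (2*gamma/c)*z[0, N-1]))
```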

With the result of the following Lemma, we relate the entries of \(\underline{y_N}\) to the entries of \(\underline{z_N}\).

Lemma 6.8

Let B be the matrix (1.7). We have

$$\begin{aligned} \underline{y_N} = B^{-1} \underline{{\tilde{z}}_N} \end{aligned}$$
(6.29)

where \(\underline{{\tilde{z}}_N}\) is the vector

$$\begin{aligned} \underline{{\tilde{z}}_N}= \begin{bmatrix} \gamma z_{1,N}^{(N)} + \frac{c}{2\gamma }\mu _{a,c} \\ \frac{c}{\gamma }z_{1,N}^{(N)} - \frac{c}{\gamma }\\ \frac{c}{\gamma }z_{1,N-1}^{(N)} + \frac{c}{\gamma } \mu _{a,c}\\ \vdots \\ \frac{c}{\gamma }z_{1,N-i}^{(N)} + \frac{c}{\gamma }\mu _{a,c}\\ \frac{c}{\gamma }z_{1,N-(i+1)}^{(N)} - \frac{c}{\gamma } \\ \vdots \\ \frac{c}{\gamma }z_{1,3}^{(N)} -\frac{c}{\gamma } \\ \frac{c}{\gamma }z_{1,2}^{(N)} +\frac{T_L+a+2c}{2\gamma }+ \frac{\gamma }{2} \end{bmatrix} \end{aligned}$$

where \(\mu _{a,c} := \frac{1+a+2c}{2c}\). In particular:

$$\begin{aligned} \Vert \underline{y_N}\Vert _2 \lesssim \Vert \underline{z_N} \Vert _2 + N^{1/2}. \end{aligned}$$
(6.30)

Proof

We combine the information on the \(x_{1,i}^{(N)}\)'s coming from two equations: first, from (6.3), which we recall reads

$$\begin{aligned} x_N=By_N+{\mathcal {F}} z_N \end{aligned}$$

and second from the bordered entries of (6.4), which is

$$\begin{aligned} -Bz_N+z_NB - B = J_N^{(\Delta T)}- x_N {\mathcal {F}} -{\mathcal {F}} x_N. \end{aligned}$$

We look at the element \(x_{1,N}^{(N)}\) and we write:

$$\begin{aligned} x_{1,N}^{(N)}&=(a+2c)y_{1,N}^{(N)}-cy_{2,N}^{(N)} +\gamma z_{1,N}^{(N),anti} = (a+2c)y_{1,N}^{(N)}-cy_{1,N-1}^{(N)} -2\gamma z_{1,N}^{(N),anti}+\gamma z_{1,N}^{(N),anti}\\&= (a+2c)y_{1,N}^{(N)}-cy_{1,N-1}^{(N)} -\gamma z_{1,N}^{(N),anti} \end{aligned}$$

and

$$\begin{aligned} x_{1,N}^{(N)}= \frac{c}{2\gamma } \mu _{a,c} \end{aligned}$$

which together give

$$\begin{aligned} (a+2c)y_{1,N}^{(N)}-cy_{1,N-1}^{(N)} = \gamma z_{1,N}^{(N),anti}+ \frac{c}{2\gamma } \mu _{a,c}. \end{aligned}$$

Moreover

$$\begin{aligned} x_{1,N-1}^{(N)}&= (a+2c) y_{1,N-1}^{(N)} -c y_{2,N-1}^{(N)}+ \gamma z_{1,N-1}^{(N),anti} \\&= (a+2c) y_{1,N-1}^{(N)} -cy_{1,N-2}^{(N)}-cy_{1,N}^{(N)} - \gamma z_{1,N-1}^{(N),anti} + \gamma z_{1,N-1}^{(N),anti} \\&= (a+2c) y_{1,N-1}^{(N)} -cy_{1,N-2}^{(N)}-cy_{1,N}^{(N)} \end{aligned}$$

and from the proof of Lemma 6.6, see relation (6.25), we have

$$\begin{aligned} x_{1,N-1}^{(N)} = \frac{c}{\gamma } z_{1,N}^{(N),anti}- \frac{c}{\gamma }. \end{aligned}$$

Together these give

$$\begin{aligned} (a+2c) y_{1,N-1}^{(N)} -cy_{1,N-2}^{(N)}-cy_{1,N}^{(N)} = \frac{c}{\gamma } z_{1,N}^{(N),anti}- \frac{c}{\gamma }. \end{aligned}$$

In general using again Lemma 6.7 and relation (6.25), we have

$$\begin{aligned} (a+2c)y_{1,N-i}^{(N)} -cy_{1,N-(i+1)}^{(N)}-cy_{1,N-(i-1)}^{(N)}&= \left\{ \begin{array}{ c l } &{}\frac{c}{\gamma } z_{1,N-(i-1)}^{(N),anti}- \frac{c}{\gamma },\ \text {if}\ i\ \text {odd} \\ &{} \frac{c}{\gamma } z_{1,N-(i-1)}^{(N),anti}+\frac{c}{\gamma }\mu _{a,c},\ \text {if}\ i\ \text {even}. \end{array} \right. \end{aligned}$$

For \(x_{1,1}^{(N)}\) we use that

$$\begin{aligned} x_{1,1}^{(N)} =\frac{c}{\gamma }z_{1,2}^{(N),anti} +\frac{c(T_L+a+2c)}{2\gamma c} \end{aligned}$$

from Lemma 6.6, and from (6.3),

$$\begin{aligned} x_{1,1}^{(N)} =(a+2c)y_{1,1}^{(N)}-cy_{1,2}^{(N)}- \frac{\gamma }{2}. \end{aligned}$$

Putting the above relations in a more compact form we have

$$\begin{aligned} B \underline{y_N} = \underline{{\tilde{z}}_N}. \end{aligned}$$

We end up with (6.30) considering that \(\Vert B^{-1}\Vert _2\) is uniformly (in N) bounded, since B has bounded spectral gap. \(\square \)
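The uniform bound on \(\Vert B^{-1}\Vert _2\) invoked here can be checked against the explicit spectrum of B: assuming B is the tridiagonal matrix of (1.7), with \(a+2c\) on the diagonal and \(-c\) on the off-diagonals, its eigenvalues are \(a+4c\sin ^2\left( \frac{k\pi }{2(N+1)}\right) \), \(k=1,\ldots ,N\), so the smallest one stays above a uniformly in N. A small numerical sketch (function names are ours):

```python
import numpy as np

def B_matrix(N, a, c):
    """Tridiagonal matrix with a+2c on the diagonal and -c off it, as in (1.7)."""
    return ((a + 2 * c) * np.eye(N)
            - c * np.eye(N, k=1) - c * np.eye(N, k=-1))

def spectrum_closed_form(N, a, c):
    """Eigenvalues a + 4c sin^2(k pi / (2(N+1))), k = 1..N, in increasing order."""
    k = np.arange(1, N + 1)
    return a + 4 * c * np.sin(k * np.pi / (2 * (N + 1))) ** 2

a, c = 0.5, 1.0
for N in (5, 20, 50):
    eigs = np.sort(np.linalg.eigvalsh(B_matrix(N, a, c)))
    assert np.allclose(eigs, spectrum_closed_form(N, a, c))
    # smallest eigenvalue is bounded below by a, so ||B^{-1}||_2 <= 1/a uniformly in N
    print(N, eigs[0])
```

When \(a=0\) the smallest eigenvalue instead behaves like \(c\pi ^2/(N+1)^2\), the familiar scaling of the Dirichlet discrete Laplacian.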

The following Lemma shows, through its proof, that the Lyapunov matrix equation has a unique solution (since one can explicitly find the entries of \(\underline{z_N}\), which determine all the rest) and eventually gives the scaling in N of the entries of \(\underline{z_N}\). For \(1 \le k \le N-2\), using all the information we have from the block equations in Lemma 6.3, we write all the \(z_{1,N-k}^{(N),anti}\) in terms of \(z_{1,N}^{(N),anti}\), which we then calculate explicitly.

Lemma 6.9

For \(1 \le k \le N-2\), the order of the entries of \(\underline{z_N}\) is given by

$$\begin{aligned} \left\{ \begin{array}{ c l } z_{1,N-k}^{(N),anti} &{}= {\mathcal {O}}\left( R^kz_{1,N}^{(N),anti} + \frac{k}{2}\mu _{a,c}\right) ,\ \text {for}\ k\ \text {odd} \\ z_{1,N-k}^{(N),anti} &{}= {\mathcal {O}}\left( R^kz_{1,N}^{(N),anti}-\frac{k}{2} \right) ,\ \text {for}\ k\ \text {even} \end{array} \right. \end{aligned}$$
(6.31)

and \(z_{1,N}^{(N),anti} = {\mathcal {O}}\left( R^{1-N} \left( \frac{ \kappa _R-\kappa _L}{2\gamma } \right) \right) \), where \(R:= \frac{c}{\gamma ^2} + \frac{a+2c}{c}\) and \(\mu _{a,c}:= \frac{1+a+2c}{2c}\). Therefore

$$\begin{aligned} |z_{1,i}^{(N),anti}| \lesssim {\mathcal {O}}\left( (\Delta T) R^{-i+1} +(N-i) \right) ,\quad \text {for}\quad 2\le i \le N \end{aligned}$$

where \(\Delta T\) is the temperature difference at the ends of the chain.

Proof

We look at the equations around \(x_{k,N}^{(N)}\) for \(2 \le k \le N\). First we look at \(x_{2,N}^{(N)}\) and from (6.23) we have

$$\begin{aligned} x_{2,N}^{(N)} = -\frac{c}{\gamma }z_{1,N}^{(N),anti}-\frac{c}{\gamma } \end{aligned}$$

while from the (2, N)-entry of (6.3) we have

$$\begin{aligned} x_{2,N}^{(N)}&= -cy_{1,N}^{(N)}+(a+2c)y_{2,N}^{(N)}-cy_{3,N}^{(N)}\\&= -cy_{1,N}^{(N)}+(a+2c)y_{1,N-1}^{(N)} + \frac{2\gamma (a+2c)}{c}z_{1,N}^{(N),anti}-\gamma (z_{2,N}^{(N),anti}+z_{1,N-1}^{(N),anti}) -cy_{1,N-2}^{(N)} \\&= x_{1,N-1}^{(N)} + \frac{2\gamma (a+2c)}{c}z_{1,N}^{(N),anti} -2\gamma z_{1,N-1}^{(N),anti}+ \gamma \mu _{a,c} \\&= \frac{c}{\gamma }z_{1,N}^{(N),anti}-\frac{c}{\gamma }+ \frac{2\gamma (a+2c)}{c}z_{1,N}^{(N),anti} -2\gamma z_{1,N-1}^{(N),anti}+ \gamma \mu _{a,c}. \end{aligned}$$

Combine them and get

$$\begin{aligned} z_{1,N-1}^{(N),anti} = R z_{1,N}^{(N),anti} + \frac{\mu _{a,c}}{2}. \end{aligned}$$
(6.32)

Then we look at \(x_{3,N}^{(N)}\): from (6.24) we have

$$\begin{aligned} x_{3,N}^{(N)} = -\frac{c}{\gamma }z_{1,N-1}^{(N),anti}+\frac{2c\mu _{a,c}}{\gamma } \end{aligned}$$

while from the (3, N)-entry of (6.3) we have similarly

$$\begin{aligned} x_{3,N}^{(N)}&= -cy_{2,N}^{(N)}+(a+2c)y_{3,N}^{(N)}-cy_{4,N}^{(N)}\\&=x_{1,N-2}^{(N)} -2\gamma z_{1,N}^{(N),anti}+ \frac{2\gamma (a+2c)}{c}z_{1,N-1}^{(N),anti} -2\gamma z_{1,N-2}^{(N),anti} - \frac{\gamma (a+2c)\mu _{a,c}}{c} -2\gamma . \end{aligned}$$

Combine them and get

$$\begin{aligned}R z_{1,N-1}^{(N),anti} = z_{1,N}^{(N),anti}+z_{1,N-2}^{(N),anti} + R \frac{\mu _{a,c}}{2} + 1. \end{aligned}$$

Then considering (6.32) as well, we have

$$\begin{aligned} z_{1,N-2}^{(N),anti}= (R^2-1)z_{1,N}^{(N),anti}-1. \end{aligned}$$
(6.33)

In the same manner, but looking around \(x_{4,N}^{(N)}\) and \(x_{5,N}^{(N)}\), we get

$$\begin{aligned} z_{1,N-3}^{(N),anti} = (R^3-2R) z_{1,N}^{(N),anti}+ \frac{3 \mu _{a,c}}{2},\quad z_{1,N-4}^{(N),anti} = (R^4-3R^2+1) z_{1,N}^{(N),anti}-2. \end{aligned}$$
(6.34)

respectively. Inductively, we have a way to write all the elements of \(\underline{z_N}\) in terms of \(z_{1,N}^{(N),anti}\), and looking at the leading order in terms of N we have the general formula (6.31) for \(1 \le k \le N-2\). In particular, for \(k=N-3\) (which is even by the assumption on N) and \(k=N-2\) (which is odd):

$$\begin{aligned} z_{1,3}^{(N),anti} \sim R^{N-3}z_{1,N}^{(N),anti} - \frac{N-3}{2},\quad z_{1,2}^{(N),anti} \sim R^{N-2}z_{1,N}^{(N),anti} + \frac{(N-2)\mu _{a,c}}{2}. \end{aligned}$$
(6.35)

respectively. Moreover, by looking at \(x_{N,N}^{(N)}\) and combining (6.3) and (6.4), we have

$$\begin{aligned} R z_{1,2}^{(N),anti} = R\frac{(N-2)\mu _{a,c}}{2}- \frac{(3-N)}{2}+ \frac{(\kappa _R- \kappa _L)}{2 \gamma } + z_{1,3}^{(N),anti}. \end{aligned}$$

Plugging in the above equation the relations from (6.35), we write

$$\begin{aligned}&(R^{N-1}+R^{N-3} ) z_{1,N}^{(N),anti} + \frac{R(N-2)\mu _{a,c}}{2} \sim \frac{R(N-2)\mu _{a,c}}{2} - \frac{(3-N)}{2} + \frac{(\kappa _R- \kappa _L)}{2 \gamma } - \frac{(N-3)}{2} \\&\quad \text {which is}\quad z_{1,N}^{(N),anti} \sim R^{1-N} \left( \frac{\kappa _R- \kappa _L}{2 \gamma }\right) . \end{aligned}$$

We conclude the last statement by combining the above estimate on \(z_{1,N}^{(N),anti}\) with (6.31). \(\square \)
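The coefficients of \(z_{1,N}^{(N),anti}\) appearing in (6.32), (6.33) and (6.34), namely \(R\), \(R^2-1\), \(R^3-2R\), \(R^4-3R^2+1\), are consistent with the three-term recursion \(p_{k+1}=Rp_k-p_{k-1}\), \(p_0=1\), \(p_1=R\) (a Chebyshev-like family, which also explains the leading order \(R^k\) in (6.31), since \(R=\frac{c}{\gamma ^2}+\frac{a+2c}{c}\ge 2\)). A quick numerical sketch of this observation, which is our reading of the displayed relations rather than a statement made in the text:

```python
def coeff(R, k):
    """Coefficient of z_{1,N}^{anti} in z_{1,N-k}^{anti}: p_{k+1} = R p_k - p_{k-1}."""
    if k == 0:
        return 1.0
    p_prev, p = 1.0, R
    for _ in range(k - 1):
        p_prev, p = p, R * p - p_prev
    return p

R = 3.0  # sample value; note R = c/gamma^2 + (a+2c)/c >= 2 for all a >= 0, c, gamma > 0
assert coeff(R, 1) == R
assert coeff(R, 2) == R ** 2 - 1          # matches (6.33)
assert coeff(R, 3) == R ** 3 - 2 * R      # matches (6.34), first relation
assert coeff(R, 4) == R ** 4 - 3 * R ** 2 + 1  # matches (6.34), second relation
```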

Now we estimate the entries \(\underline{y_N}\): from (6.30) and Lemma 6.9,

$$\begin{aligned} \Vert \underline{y_N}\Vert _2&\lesssim \left( \sum _{i=1}^N |z_{1,i}|^2 \right) ^{1/2} + N^{1/2} \lesssim N^{3/2} + N^{1/2} \lesssim N^{3/2}. \end{aligned}$$

This gives that

$$\begin{aligned} |y_{1,j}^{(N)}| \lesssim {\mathcal {O}}(N) \end{aligned}$$
(6.36)

and then also, since \( y_{k,N}^{(N)}= \frac{\gamma }{c}(z_{k-1,N}^{(N)} + z_{1,N-(k-2)}^{(N)})+ y_{1,N-(k-1)}^{(N)} \),

$$\begin{aligned} |y_{j,N}^{(N)}| \lesssim {\mathcal {O}}(N). \end{aligned}$$
(6.37)

Lemma 6.10

(Estimate on the spectral norm of \(y_N\)) For the spectral norm of \(y_N\) we have that

$$\begin{aligned} \Vert y_N\Vert _2 \lesssim {\mathcal {O}}(N^3). \end{aligned}$$

Proof

Let \(v=(v_1,v_2,\ldots ,v_N) \in {\mathbb {C}}^{N}\). We write \(L_i\) for the i-th row of the matrix \(y_N\) and then calculate, using on each row the Cauchy–Schwarz bound \(|L_i \cdot v|^2 \le N \sum _{j} |y_{i,j}^{(N)} v_j|^2\):

$$\begin{aligned} | y_N v |_2^2&= |L_1 \cdot v |^2 + \cdots + |L_N \cdot v |^2 \\&\le N \Bigg ( |y_{1,1}^{(N)} v_1|^2 + |y_{1,2}^{(N)}v_2|^2 +\cdots + |y_{1,N}^{(N)} v_{N}|^2 + \qquad \quad (\text {from}\ L_1\cdot v) \\&\quad + |y_{1,2}^{(N)}v_2 |^2 + |y_{2,2}^{(N)}v_2|^2+ \cdots + | y_{2,N}^{(N)}v_N|^2+ \qquad \quad (\text {from}\ L_2 \cdot v) \\&\quad \vdots \\&\quad + |y_{1, \lfloor \frac{N}{2}\rfloor +1}^{(N)}v_1|^2 +\cdots + |y_{\lfloor \frac{N}{2}\rfloor +1, \lfloor \frac{N}{2}\rfloor +1}^{(N)}v_{\lfloor \frac{N}{2}\rfloor +1}|^2+ \cdots + |y_{N, \lfloor \frac{N}{2}\rfloor +1}^{(N)}v_N|^2 \\&\quad + \big (\text {from}\ L_{\lfloor \frac{N}{2}\rfloor +1} \cdot v \big ) \\&\quad \vdots \\&\quad +|y_{1,N}^{(N)} v_1|^2+ | y_{2,N}^{(N)}v_2|^2+ \cdots + |y_{N,N}^{(N)} v_N |^2 \Bigg ) \qquad (\text {from}\ L_N \cdot v ) \end{aligned}$$

We estimate the terms due to the first half of the matrix, i.e. the terms until \( L_{\lfloor \frac{N}{2}\rfloor +1} \cdot v \): from Lemma 6.7 we write all the \(y_{i,j}^{(N)}\)’s in terms of the entries of \(\underline{y_N}\) and \(\underline{z_N}\) that, due to the observations above, scale at most like N. In particular for the second line

$$\begin{aligned} y_{2,k}^{(N)} = y_{1,k-1}^{(N)} + y_{1,k+1}^{(N)} + \frac{\gamma }{c} z_{1,k}^{(N),anti} \end{aligned}$$

and more generally

$$\begin{aligned} y_{i,i+k}^{(N)} = y_{1,1+k}^{(N)} + y_{1,3+k}^{(N)}+ \cdots + y_{1,2i+k-1}^{(N)} + \frac{\gamma }{c}\left( z_{1,2+k}^{(N),anti}+ \cdots + z_{1,2i+k-2}^{(N),anti}\right) . \end{aligned}$$

Then, from (6.36):

$$\begin{aligned}&|L_1 \cdot v |^2 + \cdots + \left| L_{\lfloor \frac{N}{2}\rfloor +1} \cdot v \right| ^2 \lesssim N \Bigg ( N^2 |v_1|^2+ \cdots + N^2 |v_{N}|^2 \nonumber \\&\qquad + N^2 |v_1|^2+ 3^2N^2|v_2|^2+ \cdots +3^2N^2|v_{N-1}|^2+ N^2 |v_N|^2 \nonumber \\&\qquad + N^2|v_1|^2 +3^2N^2|v_2|^2+5^2N^2|v_3|^2+5^2N^2|v_4|^2+ \cdots \nonumber \\&\qquad +5^2N^2|v_{N-2}|^2+3^2|v_{N-1}|^2+ N^2|v_N|^2 \nonumber \\&\qquad \vdots \nonumber \\&\qquad +N^2|v_1|^2 +3^2N^2|v_2|^2+\cdots + \left( 2 \Big \lfloor \frac{N}{2} \Big \rfloor +1 \right) ^2N^2 \left| v_{\lfloor \frac{N}{2}\rfloor +1}\right| ^2\nonumber \\&\qquad +\left( 2 \Big \lfloor \frac{N}{2} \Big \rfloor -1\right) ^2N^2 \left| v_{\lfloor \frac{N}{2}\rfloor +2}\right| ^2\nonumber \\&\qquad + \cdots + N^2 |v_N|^2 \Bigg ). \end{aligned}$$
(6.38)

So the highest order is due to \(\left| L_{\lfloor \frac{N}{2}\rfloor +1} \cdot v \right| ^2\) for which we estimate

$$\begin{aligned} \left| L_{\lfloor \frac{N}{2}\rfloor +1} \cdot v \right| ^2 \lesssim \Bigg ( 2N^2\sum _{i=1}^{\lfloor \frac{N}{2}\rfloor +1}(2i-1)^2 \Bigg ) |v|_2^2. \end{aligned}$$

The factors \((2i-1)\) in the sum above count the number of entries of \(\underline{y_N}, \underline{z_N}\) in terms of which each \(y_{i,j}^{(N)}\) is expressed.

Regarding the terms due to the second half of the matrix, we use again Lemma 6.7, Eq. (6.26). This way we write the elements \(y_{i,j}^{(N)}\)’s in terms of \(y_{N,j}^{(N)}\)’s and then from relation (6.28), we have all the \(y_{i,j}^{(N)}\)’s in terms of the entries of \(\underline{y_N}\) and \(\underline{z_N}\), that scale at most like N. So in the end we have

$$\begin{aligned} | y_N v |_2^2 \lesssim N \Bigg ( N^3N^2 \Bigg ) | v |_2^2 = N^6 |v|_2^2. \end{aligned}$$

Then

$$\begin{aligned} \frac{|y_N v |_2}{|v |_2} \lesssim {\mathcal {O}}(N^3)\qquad \text {and so}\qquad \Vert y_N \Vert _2 \lesssim {\mathcal {O}}(N^3) . \end{aligned}$$

Before we finish the proof, we give more details on the estimates (6.38) above:

For the first inequality we apply iteratively Lemma 6.7. Regarding the row \(L_2\):

$$\begin{aligned} y_{2,2}^{(N)} = y_{1,3}^{(N)} + y_{1,1}^{(N)}+ \frac{\gamma }{c} z_{1,2}^{(N),anti}. \end{aligned}$$

So \(y_{2,2}^{(N)}\) is given by the sum of 3 terms whose absolute value is of order not more than \({\mathcal {O}}(N)\). The same holds (from Lemma 6.7) for each \(y_{2,j}^{(N)}\) for \(j \le N-2\), i.e. until we reach the ’cross-diagonal’. After the ’cross-diagonal’: \(y_{2,N}^{(N)}= y_{1,N-1}^{(N)} + \frac{2\gamma }{c}z_{1,N}^{(N)}\), and \(|y_{1,N-1}^{(N)}|, |z_{1,N}^{(N)}|\) have order less than N.

Regarding the row \(L_3\):

$$\begin{aligned} y_{3,2}^{(N)} = y_{1,2}^{(N)} + y_{1,4}^{(N)}+ \frac{\gamma }{c}z_{1,3}^{(N),anti} \end{aligned}$$

is given by the sum of 3 terms whose absolute value has order less than N, while for \(y_{3,3}^{(N)}\), by applying Lemma 6.7 twice, i.e. until we end up only with elements of \(\underline{y_N}\) and \(\underline{z_N}\), we get

$$\begin{aligned} y_{3,3}^{(N)} = y_{1,3}^{(N)} + y_{1,1}^{(N)}+y_{1,5}^{(N)}+ \frac{\gamma }{c} \left( z_{1,2}^{(N),anti} + z_{1,4}^{(N),anti} \right) . \end{aligned}$$

So \(y_{3,3}^{(N)}\) is given by the sum of 5 terms whose absolute value has order less than N. For \(y_{3,j}^{(N)}\), \(j \le N-2\) (until the ’cross-diagonal’), apply Lemma 6.7 twice: the value of \(y_{3,j}^{(N)}\) is given by the sum of 5 such terms, while for \(N-1 \le j \le N\),

$$\begin{aligned} y_{3,N-1}^{(N)}&=y_{1,N-3}^{(N)}+ y_{1,N-1}^{(N)} + \frac{\gamma }{c} z_{1,N-2}^{(N),anti} \\ y_{3,N}^{(N)}&=\frac{\gamma }{c}\left( z_{2,N}^{(N)} + z_{1,N-1}^{(N)}\right) + y_{1,N-2}^{(N)} = \frac{2\gamma }{c}z_{1,N-1}^{(N)}- \frac{\gamma \mu _{a,c}}{c} + y_{1,N-2}^{(N)} \end{aligned}$$

and so they are given by 3 terms with absolute value of order at most N.

In general, the same holds for the row \(L_i\), \(i \le \lfloor \frac{N}{2}\rfloor +1\) from applications of Lemma 6.7 inductively. For all \(y_{i,j}^{(N)}\) we apply Lemma 6.7 until we have written each \(y_{i,j}^{(N)}\) only in terms of entries of \(\underline{y_N}\) and \(\underline{z_N}\).

For \(j \le i \), i.e. until the main diagonal, \(y_{i,j}^{(N)} \) is given by the sum of \(\nu \) terms, whose order is less than N, and

$$\begin{aligned} \nu =1,3,5, \cdots ,(2i-1)\quad \text {for}\quad y_{i,1}^{(N)}, y_{i,2}^{(N)}, \cdots , y_{i,i}^{(N)},\ \text {respectively}. \end{aligned}$$

For that we apply Lemma 6.7 and write

$$\begin{aligned} y_{i,j}^{(N)}= y_{j,i}^{(N)} = y_{1,i-j+1}^{(N)}+y_{1,i-j+3}^{(N)}+ \cdots + y_{1,j+i-1}^{(N)}+ \frac{\gamma }{c}\left( z_{1,i-j+2}^{(N),anti}+ \cdots + z_{1,i+j-2}^{(N),anti}\right) . \end{aligned}$$

This formula gives that \(y_{i,j}^{(N)}\) is the sum of \((2j-1)\) terms whose absolute value has order less than \({\mathcal {O}}(N)\).

The same holds for \(j > N-(i-1)\), i.e. after the ’cross-diagonal’, considering also (6.37). As for the remaining terms in \(L_i\), for \(i \le j\le N-(i-1)\): \(y_{i,j}^{(N)}\) is given by the sum of \((2i-1)\) terms whose order is less than \({\mathcal {O}}(N)\). \(\square \)

Now, from (6.3) we can see that the entries of \(x_N\) can be written in terms of entries of \(z_N\) as well:

$$\begin{aligned} x_{i,j}^{(N)}&= \sum _{\begin{array}{c} k=1 \end{array}}^N \beta _{i,k} y_{k,j}^{(N)} + \gamma \sum _{k} ( \delta _{(i=1,k=1)}+\delta _{(i=N,k=N)}) z_{k,j}^{(N)} \\&= \sum _{\begin{array}{c} k=1,\\ k+j \le N \end{array}}^N \beta _{i,k}z_{1,j+k}^{(N)} + \sum _{\begin{array}{c} k=1,\\ k+j > N \end{array}}^N \beta _{i,k}z_{N,j+k-N-1}^{(N)} + \gamma \sum _{k} ( \delta _{(i=1,k=1)}+\delta _{(i=N,k=N)}) z_{k,j}^{(N)} \end{aligned}$$

where \(\beta _{i,j}\) are the elements of the matrix B, (1.7), and the entries of \(y_N\) are split into two sums according to their position relative to the cross-diagonal.

We write

$$\begin{aligned} \Vert x_N\Vert _2 \le \Vert B\Vert _2 \Vert y_N\Vert _2+ \Vert {\mathcal {F}} z_N\Vert _2 \lesssim \Vert y_N\Vert _2+ N \lesssim N^3 . \end{aligned}$$

Proof of Proposition 1.1

We are now ready to bound \(\Vert b_N\Vert _2\) from above. We write, for some positive constant \(C_{a,c}^1\),

$$\begin{aligned} \Vert b_N \Vert _2 \le \Vert x_N\Vert _2 + \Vert y_N\Vert _2 \le C_{a,c}^1N^3 \end{aligned}$$

where the first inequality is obtained as follows: since \(b_N\) is positive definite, we decompose it through its square root matrices:

$$\begin{aligned} b_N = \begin{bmatrix} \chi &{} \zeta \\ \zeta ^T &{} \psi \end{bmatrix} \begin{bmatrix} \chi &{} \zeta \\ \zeta ^T &{} \psi \end{bmatrix}&= \begin{bmatrix} \chi &{} 0 \\ \zeta ^T &{} 0 \end{bmatrix} \begin{bmatrix} \chi &{} \zeta \\ 0&{} 0 \end{bmatrix} + \begin{bmatrix} 0 &{} \zeta \\ 0 &{} \psi \end{bmatrix} \begin{bmatrix} 0 &{} 0 \\ \zeta ^T &{} \psi \end{bmatrix} \\&=: X^*X + Y^*Y. \end{aligned}$$

Since \(X^{*}X\) and \(XX^{*}\) are unitarily equivalent, and the same holds for \(Y^{*}Y\) and \(YY^{*}\) (by the polar decomposition, for example), there are unitary matrices U, \(V \in {\mathbb {C}}^{N \times N}\) so that:

$$\begin{aligned} b_N = X^{*}X+ Y^{*}Y = U XX^{*} U^{*} + V YY^{*}V^{*} = U \begin{bmatrix} x_N &{} 0 \\ 0 &{} 0 \end{bmatrix} U^{*} + V \begin{bmatrix} 0 &{} 0 \\ 0 &{} y_N \end{bmatrix} V^{*}. \end{aligned}$$

Then it is clear that for the spectral norm (which is unitarily invariant):

$$\begin{aligned} \left\| \begin{bmatrix} x_N &{} z_N \\ z_N^T &{} y_N \end{bmatrix} \right\| _2 \le \Vert x_N\Vert _2 + \Vert y_N\Vert _2 . \end{aligned}$$

Regarding the last part of the statement that \(\Vert b_N^{-1}\Vert _2\) is bounded from above: Let us first state some facts about the spectrum of the matrix \(b_0\) that solves

$$\begin{aligned} b_0 M+M^T b_0 = \text {diag} \left( 2T_L,0,\ldots ,2T_R,0,\ldots ,0 \right) := {\tilde{\Theta }}. \end{aligned}$$

It is known that \(b_0\) is the covariance matrix that determines the stationary solution of the Liouville equation in the harmonic chain (and it has been found explicitly in [35], see a description of their approach in the beginning of the proof of Lemma 6.5). From [25, Lemma 5.1], we know that \(b_0\) is bounded below and above:

$$\begin{aligned} T_R \begin{bmatrix} I &{}0 \\ 0 &{} B^{-1} \end{bmatrix} \le b_0 \le T_L \begin{bmatrix} I &{}0 \\ 0&{}B^{-1} \end{bmatrix}. \end{aligned}$$

Thus \(\Vert b_0\Vert _2 \) and \(\Vert b_0^{-1} \Vert _2\) are uniformly bounded in terms of N: from Remark 6.2 we write \(B=-c\ \Delta ^N + \sum _{i=1}^N \alpha \delta _i\). Even though here we will only use that \(\Vert b_0^{-1} \Vert _2\) is finite, in fact when \(a>0\), B possesses a spectral gap uniformly in N. Moreover, \(b_N \ge b_0\): since \( \Pi _N > {\tilde{\Theta }}\), for every \(t>0\),

$$\begin{aligned} e^{-t M^T}\ \Pi _N\ e^{-tM} > e^{-t M^T}\ {\tilde{\Theta }}\ e^{-t M} \end{aligned}$$

and since \(-M\) is stable (all the characteristic roots have negative real part) we have

$$\begin{aligned} b_N = \int _0^{\infty } e^{-t M^T} \Pi _N e^{-tM} dt > \int _0^{\infty } e^{-t M^T} {\tilde{\Theta }} e^{-tM} dt = b_0. \end{aligned}$$

So \(b_N^{-1} \le b_0^{-1}\), hence \( \Vert b_{N}^{-1} \Vert _2 \le \Vert b_0^{-1} \Vert _2 \), which is bounded by a finite constant (because of the spectrum of the discrete Laplacian). Therefore there exists a positive, finite constant \(C_{a,c}^2\) so that \(\Vert b_N^{-1}\Vert _2 \le C_{a,c}^2\). Conclude the Proposition by taking \(C_{a,c}:= \text {min}(C_{a,c}^1, C_{a,c}^2)\). \(\square \)
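As a numerical sanity check on the role of \(b_0\): at equal temperatures \(T_L=T_R=T\) the stationary covariance of the harmonic chain is exactly the Gibbs covariance, which in \((q,p)\) coordinates reads \(T\,\text {diag}(B^{-1},I)\). The sketch below verifies this by solving the stationary Lyapunov equation for the linearisation of (1.1); the drift, noise and variable names are our reconstruction, not taken verbatim from the text.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

N, a, c, gamma, T = 8, 0.5, 1.0, 1.0, 2.0

# B as in (1.7): tridiagonal, a+2c on the diagonal, -c off it
B = (a + 2 * c) * np.eye(N) - c * np.eye(N, k=1) - c * np.eye(N, k=-1)
G = np.zeros((N, N)); G[0, 0] = G[-1, -1] = gamma  # boundary friction

# Linear SDE dX = A X dt + s dW in X = (q, p) coordinates, read off from (1.1)
A = np.block([[np.zeros((N, N)), np.eye(N)],
              [-B,              -G]])
Q = np.zeros((2 * N, 2 * N)); Q[N:, N:] = 2 * T * G  # noise covariance s s^T

# The stationary covariance S solves A S + S A^T + Q = 0
S = solve_continuous_lyapunov(A, -Q)
gibbs = np.block([[T * np.linalg.inv(B), np.zeros((N, N))],
                  [np.zeros((N, N)),     T * np.eye(N)]])
assert np.allclose(S, gibbs)  # S equals the Gibbs covariance T diag(B^{-1}, I)
```

In the unequal-temperature case the same solver produces the out-of-equilibrium covariance, whose norm is the object bounded in Proposition 1.1.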

To sum up: for the homogeneous weakly anharmonic chain, the method described in Sect. 3 with the modified Bakry–Émery criterion gives a lower bound on the spectral gap that is of order \(N^{-3}\) (see the exponential rate in the main Theorems). For the purely harmonic chain, since we know from Proposition 6.1 that it always decays with N, this lower bound shows that the spectral gap in this case cannot decay at an exponential rate in N; the decay is at most polynomial.

In the next Proposition, exploiting the estimates on \(\Vert b_N\Vert _2\) from the above matrix analysis, we obtain the lower bound on the spectral gap of the harmonic chain in an alternative way.

Proof of Proposition 1.2

We recall that \(\Vert b_{N} \Vert _2 \le C_{a,c} N^3 \) by Proposition 1.1, and that the ratio of the spectral gap to \(\inf \{\text {Re}(\mu ) : \mu \in \sigma (M) \}\) is bounded below and above uniformly in N, by Proposition 6.1. From [19, 39, Inequality (13)], we have an estimate for the decay of \(e^{-Mt}\):

$$\begin{aligned} \Vert e^{-Mt } \Vert ^2 \le \Vert b_{N} \Vert \Vert b_{N}^{-1} \Vert e^{-t/ \Vert b_{N}\Vert } \end{aligned}$$

So, letting u be the (normalised) eigenvector corresponding to an eigenvalue \(\mu \) of M, with \(\text {Re}(\mu )>0\), we write

$$\begin{aligned} e^{-2 \text {Re}(\mu ) t } = \Vert e^{-\mu t} u \Vert ^2 = \Vert e^{-Mt} u \Vert ^2 \le \Vert b_{N} \Vert \Vert b_{N}^{-1} \Vert e^{-t/ \Vert b_{N}\Vert } \end{aligned}$$

and therefore, comparing the exponential rates as \(t \rightarrow \infty \), we write \( -2 \text {Re}(\mu ) \le - \frac{1}{\Vert b_{N}\Vert } \), which means

$$\begin{aligned} \text {Re}(\mu ) \ge \frac{1}{2 \Vert b_{N}\Vert }. \end{aligned}$$

Taking the infimum over the real parts of the eigenvalues of M, we conclude that

$$\begin{aligned} \inf \{\text {Re}(\mu ) : \mu \in \sigma (M) \} \ge C_{a,c}^{-1} N^{-3}. \end{aligned}$$

\(\square \)

Eventually, from the whole procedure in this note we have that the scaling of the spectral gap of the homogeneous harmonic chain is in between \(N^{-3}\) and \(N^{-1}\). In [7, Proposition 9.1] it is proven that this lower bound is the sharp one, i.e. an upper bound of order \(N^{-3}\) is provided.

From a simple numerical simulation in Matlab of the spectral gap of the matrix M, the true scaling is indeed \(N^{-3}\). In particular, calculating the smallest real part among the eigenvalues of the matrix M and multiplying the result by \(N^3\), we get the behaviour in Fig. 2, which shows that the rescaled spectral gap converges for large N:

Fig. 2

Scaled spectral gap as a function of the chain size for pinning coefficient \(a=0\), interaction coefficient \(c=1\) and friction constant \(\gamma =1\). We denote by \(\rho \) the spectral gap of the harmonic chain
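The Matlab experiment described above is easy to reproduce. Here is a sketch in Python, where M is built as the drift matrix of (1.1) in \((q,p)\) coordinates so that \(-M\) is stable; the function names are ours:

```python
import numpy as np

def drift_matrix(N, a=0.0, c=1.0, gamma=1.0):
    """M such that the harmonic dynamics (1.1) reads dX = -M X dt + noise."""
    B = (a + 2 * c) * np.eye(N) - c * np.eye(N, k=1) - c * np.eye(N, k=-1)
    G = np.zeros((N, N)); G[0, 0] = G[-1, -1] = gamma  # boundary friction
    return np.block([[np.zeros((N, N)), -np.eye(N)],
                     [B,                G]])

def spectral_gap(N):
    # all eigenvalues of M have positive real part; the gap is the smallest one
    return np.linalg.eigvals(drift_matrix(N)).real.min()

# rescaled gap N^3 * rho; by the discussion above it should approach a constant
for N in (10, 20, 40, 80):
    print(N, N ** 3 * spectral_gap(N))
```

The parameters default to those of Fig. 2 (\(a=0\), \(c=1\), \(\gamma =1\)); computing eigenvalues of the dense \(2N\times 2N\) matrix limits this sketch to moderate N.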