1 Introduction

Kac’s model was introduced by Mark Kac in 1956 [15]. It is a stochastic N-particle model designed to mimic the dynamics of the velocities of particles in a spatially homogeneous dilute gas. The dynamics are those of N particles with one-dimensional velocities which interact through a Markov process in which pairs of particles “collide”, mixing their velocities. The state of the system is described by the vector of velocities of the particles. Kac derived an equation for the law of this system, usually called the Kac master equation, which is a linear integro-differential equation. Kac showed that, in a certain sense, as the number of particles goes to infinity the master equation tends to a Boltzmann-like equation. This motivates estimates on the behaviour of the marginals of solutions which are uniform in the number of particles, which could then be used to show, or at least indicate, the same behaviour for the Boltzmann equation. In general a direct study of the Boltzmann equation has proved more fruitful; however, the master equation has become an object of study in its own right. Convergence to equilibrium and spectral gaps for Kac’s master equation have been studied in both entropy [5, 10] and \(L^2\) [4, 14]. This paper studies convergence to equilibrium for solutions of the master equation coupled to a thermostat. More precisely, we study the master equation for a system of N particles which, as well as “colliding” with each other, can also “collide” with an infinite collection of other particles whose velocities follow some fixed distribution. When this fixed distribution is not a Maxwellian, this allows for the possibility of a non-equilibrium steady state. One possible more physical interpretation would be a system interacting with two heat baths at different temperatures.
Situations related to the existence of, and convergence to, non-equilibrium steady states are studied in [1, 7, 9, 12, 16]; exponential convergence in particular is studied in [8, 17].

This paper is fundamentally motivated by two others. The first, [3], studies a similar model but only in the situation where the thermal bath is a Maxwellian distribution; they show exponential convergence to equilibrium in both entropy and \(L^2\). The second, [6], studies the existence of non-equilibrium steady states in various coupled equations arising from mathematical physics, including the non-linear spatially homogeneous Boltzmann equation. The paper [3] suggests, as a further question, what would happen in the case of a non-Maxwellian reservoir, and we adapt the techniques of [6] to study this situation. We also include a study of how our estimates on the first marginal behave as the number of particles \(N \rightarrow \infty \). This allows us, in some sense, to commute the long time and \(N \rightarrow \infty \) limits. The \(N \rightarrow \infty \) limit is very similar to the equations studied in [6]: they study a coupled Boltzmann equation, whereas in our case the limit would be a coupled Boltzmann–Kac equation. The convergence, both in this paper and in the Maxwellian case studied in [3], is primarily driven by the external force and not by the Kac mixing part. However, the effect of the Kac part is more evident in this paper since it affects the form of the steady state. The work in [3] has been extended in [2, 18] to study how the thermostatted model relates to a partially thermostatted model and to the original Kac model; the second of these papers makes use of the GTW distance used in our work.

Following the strategy of [6] we study the problem of convergence to equilibrium in the Gabetta–Toscani–Wennberg (GTW) metric. This metric was introduced in [11] and is given by

$$\begin{aligned} d_{GTW, N}(f,h) = \sup _{\xi \in \mathbb {R}^N, \xi \ne 0}\frac{|\hat{f}(\xi )-\hat{h}(\xi )|}{|\xi |^2}, \end{aligned}$$

where \(\hat{f}\) represents the Fourier transform of f. This is a metric on the space of probability measures with finite second moment and the same finite first moment. We also study convergence in the metric

$$\begin{aligned} d_{T1,N}(f,h)=\sup _{\xi \in \mathbb {R}^N, \xi \ne 0} \frac{|\hat{f}(\xi )-\hat{h}(\xi )|}{|\xi |}. \end{aligned}$$

This is a metric on the space of probability distributions with finite mean.
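To see why these moment conditions arise, note that, as both f and h are probability measures, a Taylor expansion of the Fourier transforms at the origin (valid under the stated moment assumptions) gives

$$\begin{aligned} \hat{f}(\xi ) - \hat{h}(\xi ) = -i(m_f - m_h)\cdot \xi + O(|\xi |^2) \quad \text {as } \xi \rightarrow 0, \end{aligned}$$

where \(m_f, m_h\) denote the first moments. Hence \(d_{GTW,N}\) is finite only when the first moments agree and the second moments are finite, while \(d_{T1,N}\) is finite as soon as both means are finite.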

Let \(g \in L^2\) be the distribution of the particles in the thermostat: a probability density with zero mean and finite second moment \(K_g\). The master equation for the system we study is then

$$\begin{aligned} \partial _t F_N = -\lambda N(I-Q)[F_N] - \mu \sum _{j=1}^N (I-R_j)[F_N] = \mathcal {L}[F_N], \end{aligned}$$
(1)

where

$$\begin{aligned} Q[F_N](v) = \frac{2}{N(N-1)} \sum _{i<j} \int _{-\pi }^{\pi } F_N(v_{ij}(\theta )) \frac{\mathrm {d}\theta }{2\pi }, \end{aligned}$$

and

$$\begin{aligned} R_j[F_N](v) = \int _{-\pi }^{\pi } \int _{\mathbb {R}} F_N(v_j(w,\theta )) g(w_j^*) \,\mathrm {d}w \, \frac{\mathrm {d}\theta }{2\pi }. \end{aligned}$$

In these

$$\begin{aligned} v_{ij}(\theta )&=(v_1,\dots ,v_i \cos (\theta )+v_j \sin (\theta ),\dots ,-v_i \sin (\theta )+v_j \cos (\theta ),\dots ,v_N),\\ v_j(w,\theta )&=(v_1,\dots ,v_j\cos (\theta ) + w \sin (\theta ),\dots ,v_N),\\ w_j^*&=w\cos (\theta )-v_j \sin (\theta ). \end{aligned}$$
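Both interactions are rotations in a two-dimensional plane, so they conserve the corresponding quadratic quantities; explicitly,

$$\begin{aligned} (v_i \cos \theta + v_j \sin \theta )^2 + (-v_i \sin \theta + v_j \cos \theta )^2&= v_i^2 + v_j^2,\\ (v_j \cos \theta + w \sin \theta )^2 + (w \cos \theta - v_j \sin \theta )^2&= v_j^2 + w^2. \end{aligned}$$

The second identity is the one used in the proof of Lemma 1 below.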

We show that

Theorem 1

A steady state for the master equation exists, is unique and has the same moments up to order 2 as \(g^{\otimes N}\).

Theorem 2

If we start with initial data \(F^0_N\) and \(H^0_N\) which are probability distributions on \(\mathbb {R}^N\) with finite first and second moments, then we have the following possible situations:

  1.

    If \(F^0_N\) and \(H^0_N\) have the same mean then the GTW distance between the solutions is finite for all time and we get exponential convergence:

    $$\begin{aligned} d_{GTW,N} (F_N(t), H_N(t)) \le e^{-\mu t/2} d_{GTW, N}(F_N^0, H_N^0). \end{aligned}$$
  2.

    If \(F^0_N\) and \(H^0_N\) have different means then we can construct an altered distance in which the solutions still converge exponentially fast towards each other with rate \(\mu /2\). We also have the estimate

    $$\begin{aligned} d_{T1, N}(F_N(t), H_N(t)) \le e^{-\mu t/4} d_{T1,N}(F_N^0, H_N^0). \end{aligned}$$

Remark 1

The altered distance involves adding a correction term and is defined in order to deal with the fact that the GTW distance is infinite between distributions with different means. If the two solutions initially have the same mean it reduces to the GTW distance. We give the theorem in both distances, which shows we can sacrifice something either in the dependence on the initial data or in the rate. In the asymptotic study as \(N \rightarrow \infty \) the two distances give the same dependence on N through different mechanisms, which suggests that the dependence on N occurring here is in some way intrinsic to the problem.

Remark 2

Here \(\mu /2\) is the rate found in [3] to be the \(L^2\) spectral gap and the rate of convergence to equilibrium in relative entropy.

Furthermore, we wish to study how the N-particle Kac model behaves as \(N \rightarrow \infty \), in the manner originally proposed by Kac, to link it with the spatially homogeneous Boltzmann equation. In order to do this we study how the convergence results we have obtained can be translated into convergence results on the first marginal. We prove properties of the GTW metric which are similar to subadditivity. If the initial data \((F_N(0))_{N \ge 2}\) forms a chaotic family then we can control the convergence rate of the first marginals to equilibrium uniformly in N. We define the notion of a chaotic family formally below. As in [3], we can prove propagation of chaos in exactly the same manner as Kac in [15]. This means that the first marginals of the solution to the master equation converge to the solution of a Boltzmann-like equation, which motivates our proof of uniform-in-N convergence rates for the first marginal.

Theorem 3

Suppose that f and h are mean zero probability densities on \(\mathbb {R}\), and that \((F_N(0,v))_{N \ge 2}\) and \((H_N(0,v))_{N \ge 2}\) are respectively f- and h-chaotic families with respect to the Gabetta–Toscani–Wennberg metric. If, furthermore, the distances between \(F_N(0,\cdot )\) and \(f^{\otimes N}\), and between \(H_N(0,\cdot )\) and \(h^{\otimes N}\), are bounded uniformly in N, and \(F_N, H_N\) are the solutions to the N-particle coupled Kac master equation with this initial data, then there exists a constant C independent of N such that

$$\begin{aligned} d_{GTW,1} ( \Pi _1(F_N),\Pi _1(H_N)) \le (C+d_{GTW,1}(f,h)) e^{-\frac{\mu }{2}t}. \end{aligned}$$

Here we say that a family is f-chaotic with respect to a family of metrics \((d_k)\) if

$$\begin{aligned} d_{k}(\Pi _k[F_N],f^{\otimes k}) \rightarrow 0, \end{aligned}$$

as \(N \rightarrow \infty \) for every k. Here \(d_k\) is a metric on probability measures on \(\mathbb {R}^k\) and \(\Pi _k[F_N]\) is the \(k^{th}\) marginal of \(F_N\). This is the standard notion of chaoticity introduced by Kac; we write it in terms of a distance which metrizes weak convergence of measures as this is more convenient for our set up.

Remark 3

Our theorem is really designed to work in the case of tensorised initial data, though it can be extended slightly as we have shown. If we no longer wanted our estimates to depend on the first marginal of the initial data we could replace this dependence with the weaker, but harder to check, condition

$$\begin{aligned} d_N(F_N, H_N) \le C \,\,\,\forall N. \end{aligned}$$

We also have two theorems, one in each of the metrics we use to study this case, for when f and h have non-zero and unequal means.

Theorem 4

Let \(F_N^0\) and \(H_N^0\) be respectively f- and h-chaotic families for which the GTW distance between \(F_N^0\) and \(f^{\otimes N}\) (resp. \(H_N^0\) and \(h^{\otimes N}\)) is bounded uniformly in N. If, furthermore, f and h are probability densities with finite first and second moments and differentiable Fourier transforms, then we can choose a family of functions \(\chi \) (one for each N) to construct an altered distance \(\tilde{d}\) so that

$$\begin{aligned} \tilde{d} \left( \Pi _1[F_N], \Pi _1[H_N] \right) \le (C_1 + (C_2 + C_3) \sqrt{N} + \tilde{d}(f,h))e^{-\frac{\mu }{2}t}. \end{aligned}$$

Theorem 5

Suppose that f and h are probability densities on \(\mathbb {R}\) with finite means, that \((F_N(0,v))_{N \ge 2}\) and \((H_N(0,v))_{N \ge 2}\) are respectively f- and h-chaotic families with respect to the T1 metric, and that the T1 distances between \(F_N(0,\cdot )\) and \(f^{\otimes N}\), and between \(H_N(0,\cdot )\) and \(h^{\otimes N}\), are bounded uniformly in N. If \(F_N, H_N\) are the solutions to the N-particle coupled Kac master equation with this initial data, then there exists a constant C independent of N (the bound between the initial data and its tensorised form) such that

$$\begin{aligned} d_{T1,1}(\Pi _1[F_N](t), \Pi _1[H_N](t)) \le (C + \sqrt{N} d_{T1, 1}(f,h))e^{-\mu t/4}. \end{aligned}$$

We can also prove two similar theorems in Wasserstein distance on measures with finite second moment. The Wasserstein distance is given by

$$\begin{aligned} \mathcal {W}_{2,d}(\mu , \nu )= \inf _{\pi }\left( \int _{\mathbb {R}^{2d}} \Vert \mathbf {x}-\mathbf {y}\Vert ^2 \pi (\mathrm {d}\mathbf {x}, \mathrm {d}\mathbf {y}) \right) ^{1/2}, \end{aligned}$$

where \(\pi \) ranges over couplings of \(\mu \) and \(\nu \), that is, measures on \(\mathbb {R}^{2d}\) with marginals \(\mu \) and \(\nu \).
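As a standard illustration, not needed in what follows: for one-dimensional Gaussians the infimum is attained by a monotone coupling and

$$\begin{aligned} \mathcal {W}_{2,1}\left( \mathcal {N}(m_1,\sigma _1^2), \mathcal {N}(m_2,\sigma _2^2)\right) ^2 = (m_1-m_2)^2 + (\sigma _1-\sigma _2)^2. \end{aligned}$$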

Theorem 6

If \(\mu _N\) and \(\nu _N\) are two solutions to the master equation with finite second moments then

$$\begin{aligned} \mathcal {W}_2(\mu _N(t), \nu _N(t)) \le e^{-\mu t/2} \mathcal {W}_2(\mu _N(0), \nu _N(0)). \end{aligned}$$

Theorem 7

Suppose that \(\mu _N(t)\) and \(\nu _N(t)\) are solutions to the master equation at time t, with initial data \(\mu _0^{\otimes N}\) and \(\nu _0^{\otimes N}\). Then for any N,

$$\begin{aligned} \mathcal {W}_{2,1}(\Pi _1(\mu _N(t)), \Pi _1(\nu _N(t))) \le e^{-\mu t/2} \mathcal {W}_{2,1}(\mu _0, \nu _0). \end{aligned}$$

2 Behaviour of the Moments

In this section we prove some basic lemmas on how the moments of a solution behave. We recall that \(K_g\) is the second moment of g, our fixed distribution.

Lemma 1

The kinetic energy of a solution to the coupled master equation converges exponentially fast to \(NK_g\) with rate \(\mu /2\).

Proof

Let

$$\begin{aligned} K(t) = \int _{\mathbb {R}^N} \Vert v\Vert ^2 F_N(v) \mathrm {d}v. \end{aligned}$$

Differentiating under the integral, and recalling that radial functions are in the kernel of \((I-Q)\) and that \((I-Q)\) is self-adjoint, we get,

The Jacobian of the change of variables \((v_j(w,\theta ),w_j^*) \leftrightarrow (v,w)\) is 1. Also we have that \(\Vert v\Vert ^2 + w^2 = \Vert v_j(w,\theta )\Vert ^2 +w_j^{*2}\). Using these we have

\(\square \)

Lemma 2

The first moments of a solution to the coupled master equation converge to 0 with rate greater than \(\mu /2\). Also the second order moments

$$\begin{aligned} d_{k,l}=\int _{\mathbb {R}^N}F_N(v)v_k v_l \mathrm {d}v, \end{aligned}$$

converge to 0 with rate greater than \(\mu /2\).

Proof

Let \(d_k = \int \mathrm {d}v F_N(v) v_k\) then we get the equation

$$\begin{aligned} \partial _t d_k&= - N(\lambda + \mu ) d_k + \lambda (N-2)d_k + \mu (N-1)d_k,\\&=-(2\lambda + \mu )d_k. \end{aligned}$$

For the second set we can calculate

$$\begin{aligned} \partial _t d_{k,l} =\left( -4\lambda -2\mu + \frac{2\lambda }{N-1}\right) d_{k,l}. \end{aligned}$$
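Both equations are linear, so (taking \(k \ne l\) in the second)

$$\begin{aligned} d_k(t) = e^{-(2\lambda + \mu )t} d_k(0), \qquad d_{k,l}(t) = e^{-\left( 4\lambda + 2\mu - \frac{2\lambda }{N-1}\right) t} d_{k,l}(0), \end{aligned}$$

and for \(N \ge 2\) both \(2\lambda + \mu \) and \(4\lambda + 2\mu - 2\lambda /(N-1) \ge 2\lambda + 2\mu \) exceed \(\mu /2\), giving the claimed rates.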

\(\square \)

3 Existence, Uniqueness and Convergence to a Steady State

We wish to show existence and uniqueness of a steady state via the Banach fixed point theorem, in the space of probability measures with zero mean and finite second moment equipped with the GTW distance. In order to do this we write the steady state equation for \(F_N\) as a fixed point equation. We set \(\gamma = \lambda /(\lambda + \mu )\) to mirror the notation in [6].

$$\begin{aligned} F_N = \gamma Q[F_N] + (1-\gamma ) \frac{1}{N} \sum _{j=1}^N R_j[F_N]=\varPhi [F_N]. \end{aligned}$$
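This fixed point form follows from setting \(\partial _t F_N = 0\) in (1) and rearranging:

$$\begin{aligned} 0 = -\lambda N(I-Q)[F_N] - \mu \sum _{j=1}^N (I-R_j)[F_N] \Longleftrightarrow (\lambda + \mu ) N F_N = \lambda N Q[F_N] + \mu \sum _{j=1}^N R_j[F_N], \end{aligned}$$

and dividing through by \((\lambda + \mu )N\) gives the display above with \(\gamma = \lambda /(\lambda + \mu )\).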

We want to show that \(\varPhi \) is a contraction in the Gabetta–Toscani–Wennberg metric. We first need to show that \(\varPhi \) preserves the metric space that we are working in.

Lemma 3

Suppose \(F_N\) has mean zero and finite second moment; then \(\varPhi [F_N]\) has mean zero and finite second moment.

Proof

It is immediate that \(\int R_j[F_N](v)v_k \mathrm {d}v =0\) for \(j \ne k\). So it remains to look at

The fact that \(\varPhi [F_N]\) has finite second moments is clear since \(Q^*, R_j^*\) acting on \(\Vert v\Vert ^2\) or similar produce finite linear combinations of second-order polynomials. \(\square \)

Further we would like to calculate how Q and \(R_j\) act in Fourier space.

Lemma 4

$$\begin{aligned} \widehat{Q[F_N]}(\xi ) = \frac{2}{N(N-1)} \sum _{k<j} \int _{-\pi }^{\pi } \widehat{F_N}(\xi _{k,j}) \frac{\mathrm {d}\theta }{2\pi }, \end{aligned}$$

where \(\xi _{k,j} = (\xi _1, \dots ,\xi _k \cos \theta + \xi _j \sin \theta , \dots , -\xi _k \sin \theta + \xi _j \cos \theta , \dots , \xi _N)\). Also,

$$\begin{aligned} \widehat{R_j[F_N]}(\xi ) = \int _{-\pi }^{\pi } \hat{g}(\xi _j \sin \theta ) \widehat{F_N}(\xi _j(\theta )) \frac{\mathrm {d}\theta }{2\pi }, \end{aligned}$$

where \(\xi _j(\theta )=(\xi _1, \dots , \xi _j \cos \theta , \ldots ,\xi _N)\).

Proof

Where \(\xi _{k,j} = (\xi _1,\dots ,\xi _k \cos \theta + \xi _j \sin \theta ,\dots ,-\xi _k \sin \theta + \xi _j \cos \theta ,\dots , \xi _N)\).

Where \(\xi _j(\theta ) = (\xi _1,\dots ,\xi _j \cos \theta ,\dots ,\xi _N)\). \(\square \)

Now we can show existence and uniqueness.

Proof (Proof of Theorem 1)

Calculating we have

$$\begin{aligned} \widehat{\varPhi [F_N]}(\xi ) \!= \!\frac{1}{(2\pi )^{N/2}}\left( \gamma \int _{\mathbb {R}^N} Q[F_N](v)e^{-iv\cdot \xi }\mathrm {d}v + (1-\gamma )\frac{1}{N}\sum _{j=1}^N \int _{\mathbb {R}^N} R_j[F_N]e^{-iv\cdot \xi } \mathrm {d}v \right) . \end{aligned}$$

Using the results of Lemma 4 we have

Therefore

Here to go between the second and third line we used

$$\begin{aligned} \sum _{j=1}^N \hat{g}(\xi _j \sin \theta ) \frac{|\xi _j(\theta )|^2}{|\xi |^2}&\le \sum _{j=1}^N \frac{|\xi _j(\theta )|^2}{|\xi |^2} \\&= \sum _{j=1}^N \frac{|\xi |^2 - \xi _j^2 \sin ^2 \theta }{|\xi |^2} \\&= N-\sin ^2 \theta . \end{aligned}$$
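Averaging over \(\theta \) (the average of \(\sin ^2 \theta \) being \(1/2\) under the normalisation used here, consistent with the constants appearing later) and writing \(d = d_{GTW,N}(F_N, H_N)\), this gives the contraction estimate

$$\begin{aligned} d_{GTW,N}(\varPhi [F_N], \varPhi [H_N]) \le \left( \gamma + (1-\gamma )\frac{1}{N}\left( N - \frac{1}{2}\right) \right) d = \left( 1 - \frac{1-\gamma }{2N}\right) d, \end{aligned}$$

which is the constant used in the proof of Theorem 2 below.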

So we have the required contraction property for any fixed N, which shows existence and uniqueness of a steady state thanks to the contraction mapping theorem. The fact that the moments agree with those of \(g^{\otimes N}\) up to order 2 follows from the lemmas on the behaviour of moments in the previous section. \(\square \)

We also want to prove a contraction estimate in the T1 distance.

Lemma 5

$$\begin{aligned} d_{T1,N}(\varPhi [F_N], \varPhi [H_N]) \le \left( 1-\frac{1-\gamma }{4N} \right) d_{T1, N}(F_N, H_N). \end{aligned}$$

Proof

The proof is the same as for the GTW distance but here it is necessary to use

$$\begin{aligned} (1-x^2)^{1/2} \le 1-\frac{1}{2}x^2, \end{aligned}$$

when bounding \(|\xi _j(\theta )|/|\xi |\). This time we have

$$\begin{aligned} \sum _{j=1}^N \hat{g}(\xi _j \sin \theta )\frac{|\xi _j(\theta )|}{|\xi |}&\le \sum _{j=1}^N \sqrt{\frac{|\xi |^2- \xi _j^2 \sin ^2 \theta }{|\xi |^2}}\\&\le \sum _{j=1}^N \left( 1 - \frac{1}{2} \frac{\xi _j^2 \sin ^2 \theta }{|\xi |^2} \right) \\&= N - \frac{1}{2}\sin ^2 \theta . \end{aligned}$$
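The elementary inequality used here can be verified by squaring: for \(|x| \le 1\) both sides are non-negative and

$$\begin{aligned} \left( 1 - \frac{1}{2}x^2\right) ^2 = 1 - x^2 + \frac{x^4}{4} \ge 1 - x^2. \end{aligned}$$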

\(\square \)

Using these estimates we can also show convergence to equilibrium.

Proof (Proof of Theorem 2)

Suppose initially that \(F_N(t)\) and \(H_N(t)\) both have zero mean. From the above calculation we have

$$\begin{aligned} F_N(t+s)-H_N(t+s)&= (1-s(\lambda + \mu )N)(F_N(t)-H_N(t)) \\&\quad + s(\lambda + \mu ) N ( \varPhi [F_N(t)]-\varPhi [H_N(t)]) + o(s). \end{aligned}$$

Therefore

$$\begin{aligned} d_{GTW}(F_N(t+s),H_N(t+s))&\le (1-s(\lambda + \mu )N) d_{GTW}(F_N(t), H_N(t))\\&\quad + s(\lambda + \mu )N d_{GTW}(\varPhi [F_N], \varPhi [H_N]) + o(s) \\&\le (1-s(\lambda + \mu )N)d_{GTW}(F_N(t),H_N(t)) \\&\quad + s(\lambda + \mu )N\left( 1- \frac{1-\gamma }{2N} \right) d_{GTW}(F_N(t), H_N(t)) + o(s)\\&=\left( 1-\frac{\mu }{2}s\right) d_{GTW}(F_N(t), H_N(t)) + o(s). \end{aligned}$$

Hence,

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} d_{GTW}(F_N(t),H_N(t)) \le - \frac{\mu }{2} d_{GTW}(F_N(t), H_N(t)). \end{aligned}$$

So we have exponential decrease with the stated rate. Since in Lemma 2 we showed that if we start the dynamics with two distributions which have zero mean then this property is preserved, we see that if we start the dynamics with a zero mean distribution then it converges exponentially fast towards the steady state. Now we would like to add a correction term so that we can deal with a wider class of initial data, as in [6]. We define

$$\begin{aligned} \widehat{\mathcal {M}[F_N]} := \chi (\xi ) \sum _{k=1}^N \left( \int _{\mathbb {R}^N}v_k F_N(v)\mathrm {d}v \right) i \xi _k, \end{aligned}$$

where \(\chi \) is a smooth, compactly supported function which is 1 in some neighbourhood of 0. Therefore, if \(D_N = F_N - H_N - \mathcal {M}[F_N-H_N]\) we will have that

$$\begin{aligned} \widehat{D_N} = \int _{\mathbb {R}^N} \mathrm {d}v \left( F_N(v)-H_N(v) \right) \left( e^{-iv \cdot \xi } - i\chi (\xi ) \sum _{j=1}^N v_j \xi _j \right) . \end{aligned}$$

This means that

$$\begin{aligned} \sup _{\xi \ne 0} \frac{|\widehat{D_N}(\xi )|}{|\xi |^2}< \infty . \end{aligned}$$

We calculate that

$$\begin{aligned} \partial _t D_N&= \partial _t F_N - \partial _t H_N - \partial _t \mathcal {M}[F_N-H_N] \\&= -\lambda N(I-Q)[D_N] - \mu \sum _{j=1}^N (I-R_j)[D_N]\\&\quad - \lambda N(I-Q)[\mathcal {M}[F_N - H_N]] - \mu \sum _{j=1}^N(I-R_j)[\mathcal {M}[F_N-H_N]] \\&\quad - \partial _t \mathcal {M}[F_N - H_N]. \end{aligned}$$

So if we let

$$\begin{aligned} W = -\lambda N(I-Q)[\mathcal {M}[F_N-H_N]] - \mu \sum _{j=1}^N (I-R_j)[\mathcal {M}[F_N-H_N]] - \partial _t \mathcal {M}[F_N - H_N], \end{aligned}$$

then \(D_N\) is a zero momentum, zero integral function and we have the equation

$$\begin{aligned} \partial _t D_N = -(\lambda + \mu )N(D_N-\varPhi [D_N]) + W. \end{aligned}$$

So if we want to show that

$$\begin{aligned} \sup _{\xi \ne 0} \frac{|\widehat{D_N}|}{|\xi |^2}, \end{aligned}$$

converges to zero exponentially fast it is sufficient to show that,

$$\begin{aligned} \sup _{\xi \ne 0} \frac{|\widehat{W}(\xi )|}{|\xi |^2}, \end{aligned}$$

converges to zero exponentially fast. Since \(\partial _t\) commutes with the Fourier transform and \(\chi \) is compactly supported, we know that

$$\begin{aligned} \widehat{\mathcal {M}}[F_N-H_N] = \chi (\xi ) \sum _{k=1}^N (m_f(0)-m_h(0))e^{-(2\lambda + \mu )t}i\xi _k, \end{aligned}$$

where \(m_f(0)\) (resp. \(m_h(0)\)) denotes the common initial first moment of \(F_N\) (resp. \(H_N\)).

So, ignoring \(\chi \) and looking near 0, we have, after Taylor expanding and using the formulas from Lemma 4,

$$\begin{aligned}&-\lambda N \widehat{(I-Q)[\mathcal {M}]} - \mu \sum _{j=1}^N \widehat{(I-R_j)[\mathcal {M}]} \\&\quad = -(2 \lambda + \mu )(m_f(0)-m_h(0)) e^{-(2\lambda + \mu )t}\, i \sum _{k=1}^N \xi _k \\ {}&\qquad - \frac{1}{2}\mu K_g (m_f(0)-m_h(0))e^{-(2\lambda + \mu )t}\, i \sum _{k=1}^N \left( |\xi |^2 - \xi _k^2 \right) \xi _k + o(|\xi |^3). \end{aligned}$$

Therefore near \(\xi = 0\), we have

$$\begin{aligned} \frac{\widehat{W}(\xi )}{|\xi |^2} = i(m_f(0)-m_h(0))e^{-(2\lambda + \mu )t}\left( -\frac{1}{2} \mu K_g \sum _{k=1}^N \xi _k + \frac{1}{2}\mu K_g \frac{\sum _{k=1}^N \xi _k^3}{|\xi |^2} \right) + o(\xi ). \end{aligned}$$

This is because the lower order terms cancel. So in particular we have that

$$\begin{aligned} \lim _{\xi \rightarrow 0} \frac{\widehat{W}(\xi )}{|\xi |^2} = 0. \end{aligned}$$

Therefore, since \(\widehat{W}\) has compact support, we can bound

$$\begin{aligned} \frac{|\widehat{W}(\xi )|}{|\xi |^2} \le C e^{-(2 \lambda + \mu )t}, \end{aligned}$$

where C may increase with N. At 0 the gradient of

$$\begin{aligned} w(\xi )=\frac{\hat{W}(\xi )}{|\xi |^2} \end{aligned}$$

is of order \(\sqrt{N}\mu K_g /2\), so the gradient of w cannot be bounded uniformly in N. If \(\chi \) is radial we can calculate \(w(\xi )\) explicitly as

$$\begin{aligned} \mu \left( 1 - \sum _{j=1}^N(1-\alpha _j(\xi )) \right) \frac{\mathcal {M}}{|\xi |^2} \end{aligned}$$

where

This can be bounded uniformly provided we can bound the ratio of the \(\chi \)s. Therefore, under these additional assumptions, we see that w increases no faster than \(\sqrt{N}\). This will give that

$$\begin{aligned} \sup _{\xi \ne 0} \frac{|\widehat{W}(\xi )|}{|\xi |^2} \le C\sqrt{N} e^{-(2\lambda + \mu )t}. \end{aligned}$$
Therefore if we define a new distance

$$\begin{aligned} \tilde{d}_N(F_N, H_N) = \sup _{\xi \ne 0} \frac{|\widehat{D_N}|}{|\xi |^2} + \sup _{\xi \ne 0} \frac{|\widehat{W}|}{|\xi |^2}, \end{aligned}$$

we will get the inequality

$$\begin{aligned} \tilde{d}_N(F_N(t),H_N(t)) \le Ce^{-\frac{\mu }{2}t}. \end{aligned}$$

For the exponential convergence in the T1 distance we use the same argument as for the GTW distance with the same mean and the contraction estimate in Lemma 5. \(\square \)

Remark 4

If it were possible to get a bound on \(|\nabla w(\xi )|\) of order \(\sqrt{N}\), then we might in fact be able to choose \(\chi \) for each N such that we did not get the increase with N, by letting the radius of the support of \(\chi \) decrease like \(1/\sqrt{N}\). However, since the goal is to control the behaviour as \(N \rightarrow \infty \), in the case of differing means working with the correction term would introduce an error of at least \(\sqrt{N}\) when trying to control the initial data by its first marginal. In general, because a \(\chi \) has to be chosen for each N, the altered distance is not well adapted to asymptotic analysis. We include it to show that for each N we can get the rate \(\mu /2\), and to compare with the limit equation case, which is studied using this method in [6].

4 Convergence Rate of the First Marginal

It is shown in [3] that propagation of chaos holds for this type of coupled Kac model. The argument is very similar to Kac’s original argument and is therefore not repeated here. Since we have propagation of chaos, we know that the first marginal of \(F_N(t)\) converges weakly towards a solution of the Boltzmann–Kac equation. In some sense we would like to understand the two limits \(t \rightarrow \infty \) and \(N \rightarrow \infty \) simultaneously. For this reason we prove a bound on convergence to equilibrium for the first marginal which is uniform in N. Unfortunately, the GTW distance and our correction term W behave differently as \(N \rightarrow \infty \), so it was only possible to get these estimates when the initial data has zero mean.

The functions we work with will be invariant under permutations of variables so we can define the \(k^{th}\) marginal for \(k \le N\)

$$\begin{aligned} \Pi _k[F_N] := \int _{\mathbb {R}^{N-k}} F_N(v_1,\dots ,v_N)\mathrm {d}v_{i_1}\dots \mathrm {d}v_{i_{N-k}} \end{aligned}$$

for any choice of \(1 \le i_1< i_2< \dots < i_{N-k} \le N\). Many of the distances in which we could study Kac’s model, typically weighted \(L^2\) distances, do not behave well as the number of particles tends to infinity, and so do not give uniform convergence of the first marginal; for convergence to equilibrium in entropy, the subadditivity property of entropy in the number of variables is crucial. We wish to show that the GTW and related distances possess similar subadditivity properties, which will allow us to control things in a similar way.

Lemma 6

$$\begin{aligned}&\displaystyle d_{GTW,k}(\Pi _k[F_N], \Pi _k[H_N]) \le d_{GTW,N}(F_N, H_N),\\&\displaystyle \tilde{d}_k (\Pi _k[F_N], \Pi _k[H_N]) \le \tilde{d}_N(F_N, H_N), \end{aligned}$$

and

$$\begin{aligned} d_{T1,k}(\Pi _k[F_N], \Pi _k[H_N]) \le d_{T1,N}(F_N, H_N). \end{aligned}$$

Proof

The proof is the same for all the distances so we only do it in the case of GTW. We can notice that

$$\begin{aligned} \widehat{\Pi _k[F_N]}(\xi _1,\dots ,\xi _k) = \widehat{F_N}(\xi _1,\dots ,\xi _k,0,\dots ,0). \end{aligned}$$

Using this we have that

$$\begin{aligned} d_{GTW,k}(\Pi _k[F_N], \Pi _k[H_N])&= \sup _{\xi \ne 0, \xi _{k+1}=\dots =\xi _N=0} \frac{|\widehat{F_N}(\xi )-\widehat{H_N}(\xi )|}{|\xi |^2}\\&\le d_{GTW,N}(F_N,H_N). \end{aligned}$$

\(\square \)

Lemma 7

If f, h have the same first moment, then

$$\begin{aligned} d_{GTW,N}(f^{\otimes N}, h^{\otimes N}) = d_{GTW,1}(f,h) \end{aligned}$$

where \(d_{GTW,k}\) is the GTW distance on probability densities in k variables.

Proof

$$\begin{aligned} d_{GTW}(f^{\otimes N}, h^{\otimes N})&= \sup _{\xi \ne 0} \frac{|\hat{f}(\xi _1)\dots \hat{f}(\xi _N)-\hat{h}(\xi _1)\dots \hat{h}(\xi _N)|}{|\xi |^2}\\&\le \sup _{\xi \ne 0} \frac{\sum _{i=1}^N|\hat{f}(\xi _1)\dots \hat{f}(\xi _{i-1})(\hat{f}(\xi _i)-\hat{h}(\xi _i))\hat{h}(\xi _{i+1})\dots \hat{h}(\xi _N)|}{|\xi |^2}\\&\le \sup _{\xi \ne 0} \sum _{i=1}^N \frac{|\hat{f}(\xi _i)-\hat{h}(\xi _i)|}{\xi _i^2} \frac{\xi _i^2}{|\xi |^2} \\&\le \sup _{\xi \ne 0} \sum _{i=1}^N d_{GTW,1}(f,h)\frac{\xi _i^2}{|\xi |^2} = d_{GTW,1}(f,h). \end{aligned}$$

Since f, h are the first marginals of \(f^{\otimes N}, h^{\otimes N}\) respectively, we have by the earlier lemma that

$$\begin{aligned} d_{GTW,1}(f,h) \le d_{GTW,N}(f^{\otimes N}, h^{\otimes N}). \end{aligned}$$

Putting the two inequalities together gives the required result. \(\square \)

We have already seen that

$$\begin{aligned} \frac{\widehat{W}(\xi )}{|\xi |^2}, \end{aligned}$$

may increase with N, so this causes problems if we wish to control \(\tilde{d}_N(f^{\otimes N}, h^{\otimes N})\) by \(\tilde{d}_1(f,h)\). Even so, it would be desirable to extend the control by first marginals to general functions. However, the next lemma shows that this is not possible.

Lemma 8

There exist f, g with finite second moments such that f, g are symmetric and mean zero and have the same first marginals, but \(f \ne g\). This means we cannot control the GTW distance between f and g in terms of the GTW distance between their first marginals.

Proof

Let \(\phi \) be a density function on \(\mathbb {R}\) which is mean zero but not even. Define

$$\begin{aligned} f(v_1, v_2) := \frac{1}{2} (\phi (v_1)\phi (-v_2)+\phi (-v_1)\phi (v_2)), \end{aligned}$$

and

$$\begin{aligned} g(v_1,v_2)= \frac{1}{2}(\phi (v_1)\phi (v_2) + \phi (-v_1)\phi (-v_2)). \end{aligned}$$
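Integrating out the second variable, and using that \(\phi \) is a probability density, shows that f and g share the first marginal

$$\begin{aligned} \Pi _1[f](v_1) = \frac{1}{2}\left( \phi (v_1) + \phi (-v_1)\right) = \Pi _1[g](v_1). \end{aligned}$$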

Then it is easy to see that f and g have the required properties. \(\square \)

We wish to combine these lemmas in such a way as to get uniform control on the first marginal. Given the restriction shown by Lemma 8 we want to choose ‘good’ initial data in order that the distance between the initial data is controlled by the distance between the first marginals.

Proof (Proof of Theorem 3)

Since f, h have mean zero and the GTW distance between \(F_N(0)\) and \(f^{\otimes N}\) is finite, \(F_N\) and \(H_N\) have zero mean initially. By Lemma 2 this holds for all time. Therefore we have by Lemma 6

$$\begin{aligned} d_{GTW,1}(\Pi _1[F_N], \Pi _1[H_N]) \le d_{GTW,N}(F_N, H_N). \end{aligned}$$

Furthermore, by Theorem 2

$$\begin{aligned} d_{GTW,N}(F_N(t), H_N(t)) \le d_{GTW,N}(F_N(0), H_N(0))e^{-\frac{\mu }{2}t}. \end{aligned}$$

Now we use the chaoticity property and our control on tensorised functions from Lemma 7 to get

$$\begin{aligned} d_{GTW,N}(F_N(0), H_N(0))&\le d_{GTW,N}(F_N(0), f^{\otimes N}) + d_{GTW, N}(f^{\otimes N},h^{\otimes N})\\&\quad + d_{GTW,N}(h^{\otimes N}, H_N(0)) \\&\le C_1 + d_{GTW,1}(f,h). \end{aligned}$$

Here \(C_1\) only depends on how close the initial data is to tensorised. Putting this together gives

$$\begin{aligned} d_{GTW,1}(\Pi _1[F_N](t), \Pi _1[H_N](t)) \le (d_{GTW,1}(f,h) + C_1) e^{-\frac{\mu }{2}t}. \end{aligned}$$

We do not have from our conditions that \(C_1\) decreases to 0 as \(N \rightarrow \infty \), but since in this situation the real interest is just to choose some f-chaotic family, we may as well take \(F_N(0) = f^{\otimes N}\), and similarly for H, which dispenses with \(C_1\) altogether. \(\square \)

Now we would like to prove a theorem in the spirit of Theorem 3 when f and h do not have zero mean initially. We cannot recover estimates uniform in N, but we can control the growth with N. From Lemma 6 we have control of the marginals by the full function for the \(\tilde{d}\) distance, so

$$\begin{aligned} \tilde{d}_k(\Pi _k[F_N], \Pi _k[H_N]) \le \tilde{d}_N (F_N, H_N). \end{aligned}$$

Following this we would like to prove something in the spirit of Lemma 7 in order to control in the other direction.

Lemma 9

Suppose we have f and h probability distributions on \(\mathbb {R}\) with differentiable Fourier transforms. If we define

$$\begin{aligned} n_f = \int |v|f(v) \mathrm {d}v, \end{aligned}$$

and similarly \(n_h\), and let \(M= \max \left\{ \frac{n_f}{|m_f|}, \frac{n_h}{|m_h|} \right\} \), then we have the following control by the first marginals for the \(\tilde{d}\) distance on tensorised functions.

$$\begin{aligned} \tilde{d}_N(f^{\otimes N}, h^{\otimes N}) \le \tilde{d}_1(f,h) + M|m_f-m_h| \sqrt{N}. \end{aligned}$$

Proof

Using the same bridging argument as before we see that

$$\begin{aligned}&\hat{f}(\xi _1)\dots \hat{f}(\xi _N) - \hat{h}(\xi _1)\dots \hat{h}(\xi _N) - (m_f-m_h)\chi _N(\xi )\sum _k i \xi _k\\&\quad = \sum _k \hat{f}(\xi _1)\dots \hat{f}(\xi _{k-1})(\hat{f}(\xi _k)-\hat{h}(\xi _k)-\chi _1(\xi _k)(m_f-m_h)i\xi _k)\hat{h}(\xi _{k+1})\dots \hat{h}(\xi _N) \\&\qquad + \sum _k \hat{f}(\xi _1)\dots \hat{f}(\xi _{k-1})(m_f-m_h)\chi _1(\xi _k)i\xi _k \hat{h}(\xi _{k+1})\dots \hat{h}(\xi _N) \\&\qquad - \chi _N(\xi )\sum _k (m_f-m_h)i\xi _k. \end{aligned}$$

In order to complete the proof we want to bound the last term by something of the form

$$\begin{aligned} M|m_f-m_h|\sqrt{N}|\xi |^2. \end{aligned}$$

Provided the radius of the set in which the \(\chi \) are 1 is sufficiently large this will be true. So if we look at the last term where the \(\chi \) are 1, we have

$$\begin{aligned} (m_f-m_h)i \sum _k \xi _k \left( \hat{f}(\xi _1)\dots \hat{f}(\xi _{k-1})\hat{h}(\xi _{k+1})\dots \hat{h}(\xi _N)-1 \right) . \end{aligned}$$

If instead we try and bound

$$\begin{aligned} A=\frac{ \hat{f}(\xi _1)\dots \hat{f}(\xi _{k-1})\hat{h}(\xi _{k+1})\dots \hat{h}(\xi _N)-1 }{m_f \sum _{j<k} i \xi _j + m_h \sum _{k<j}i \xi _j} \le M \end{aligned}$$

then we would have the bound

$$\begin{aligned}&\left| \frac{\sum _k (\hat{f}(\xi _1)\dots \hat{f}(\xi _{k-1})\hat{h}(\xi _{k+1})\dots \hat{h}(\xi _N) -1)\xi _k (m_f-m_h)}{|\xi |^2} \right| \\&\quad \le M \frac{\left| \sum _{k=1}^N(m_f\sum _{j<k}i\xi _j + m_h \sum _{k<j}i\xi _j)\xi _k i (m_f-m_h) \right| }{|\xi |^2} \le M|m_f-m_h|\sqrt{N}. \end{aligned}$$

Therefore it remains to prove the bound on A. We first note, by Taylor expanding, that as \(|\xi | \rightarrow 0\), \(A \rightarrow 1\), and as \(|\xi | \rightarrow \infty \), \(A \rightarrow 0\); A is differentiable everywhere except possibly at 0. Differentiating, we find that at any stationary point of A, and for every \(l<k\), we have

$$\begin{aligned}&\hat{f}(\xi _1)\dots \hat{f}'(\xi _l)\dots \hat{f}(\xi _{k-1})\hat{h}(\xi _{k+1})\dots \hat{h}(\xi _N) \left( m_f \sum _{j<k} i \xi _j + m_h \sum _{k<j} i \xi _j \right) \\&= i m_f \left( \hat{f}(\xi _1) \dots \hat{f}(\xi _{k-1})\hat{h}(\xi _{k+1})\dots \hat{h}(\xi _N) -1 \right) . \end{aligned}$$

Substituting this into our expression for A shows that at a stationary point

$$\begin{aligned} A= \frac{1}{i m_f} \hat{f}(\xi _1)\dots \hat{f}'(\xi _l)\dots \hat{f}(\xi _{k-1})\hat{h}(\xi _{k+1})\dots \hat{h}(\xi _N), \qquad |A| \le M. \end{aligned}$$

This gives the claimed bound. There appears to be a problem if \(m_f=0\), but in that case we can choose to differentiate in a direction that produces \(m_h\) rather than \(m_f\), and they cannot both be 0. Here \(C_1\), in the statement, depends only on the distance between the initial data and the tensorised functions, \(C_2\) depends only on g and \(\chi \), and \(C_3\) is a constant times \(M|m_f-m_h|\), where M is the maximum of \(\int |v|f(v)\mathrm {d}v\) and the corresponding quantity for h. \(\square \)

We can now prove the theorem.

Proof (Proof of Theorem 4)

This is found by putting together the convergence theorems and lemmas on distance control in exactly the same way as Theorem 2. \(\square \)

If we move on to the T1 distance, we again have the bound from Lemma 6 on the T1 distance between marginals by the distance between the full distributions. We would like to control the distance between tensorised functions by that between their one particle marginals, in order to argue as in Theorems 3 and 4.

Lemma 10

$$\begin{aligned} d_{T1,N}(f^{\otimes N}, h^{\otimes N}) \le \sqrt{N} d_{T1,1}(f,h). \end{aligned}$$

Furthermore, the square root dependence is the best possible if \(f\) and \(h\) have different means.

Proof

This follows a similar argument to the others:

$$\begin{aligned} \sup _{\xi \ne 0}&\frac{|\hat{f}(\xi _1)\dots \hat{f}(\xi _N)-\hat{h}(\xi _1)\dots \hat{h}(\xi _N)|}{|\xi |}\\ {}&\le \sup _{\xi \ne 0} \frac{\sum _k |\hat{f}(\xi _1)\dots \hat{f}(\xi _{k-1})(\hat{f}(\xi _k)-\hat{h}(\xi _k))\hat{h}(\xi _{k+1})\dots \hat{h}(\xi _N)|}{|\xi |}\\&\le \sup _{\xi \ne 0} \sum _k \frac{|\hat{f}(\xi _k)-\hat{h}(\xi _k)|}{|\xi _k|} \frac{|\xi _k|}{|\xi |}\\&\le \sup _{\xi \ne 0} \frac{|\hat{f}(\xi )-\hat{h}(\xi )|}{|\xi |}\sum _k \frac{|\xi _k|}{|\xi |} \\&\le \sqrt{N}\sup _{\xi \ne 0} \frac{|\hat{f}(\xi )-\hat{h}(\xi )|}{|\xi |}, \end{aligned}$$

where the final step uses the Cauchy–Schwarz inequality \(\sum _k |\xi _k| \le \sqrt{N}|\xi |\).

The fact that the square root dependence is necessary for functions with different means can be seen by Taylor expanding

$$\begin{aligned} \frac{\hat{f}(\xi _1)\dots \hat{f}(\xi _N)-\hat{h}(\xi _1)\dots \hat{h}(\xi _N)}{|\xi |} \end{aligned}$$

around \(\xi = 0\): along the diagonal direction \(\xi = s(1, \dots , 1)\) the limit of this expression as \(s \rightarrow 0\) has modulus \(\sqrt{N}|m_f-m_h|\). \(\square \)
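For concreteness, the expansion behind this claim can be sketched as follows (using \(\hat{f}(\xi ) = 1 + im_f\xi + O(\xi ^2)\), and similarly for \(\hat{h}\)):

```latex
\begin{aligned}
\hat{f}(\xi_1)\dots\hat{f}(\xi_N) - \hat{h}(\xi_1)\dots\hat{h}(\xi_N)
  &= \Big(1 + im_f\textstyle\sum_k \xi_k\Big) - \Big(1 + im_h\textstyle\sum_k \xi_k\Big) + O(|\xi|^2) \\
  &= i(m_f - m_h)\textstyle\sum_k \xi_k + O(|\xi|^2),
\end{aligned}
```

so along \(\xi = s(1, \dots , 1)\), where \(|\xi | = |s|\sqrt{N}\) and \(\sum _k \xi _k = Ns\), the quotient has modulus tending to \(\sqrt{N}|m_f-m_h|\) as \(s \rightarrow 0\).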

Proof (Proof of Theorem 5)

Again we combine the convergence theorem that we have for the T1 distance with the control on distances as in Theorem 2. \(\square \)

5 Contraction in Wasserstein-2

We can also show contraction of this model in Wasserstein distances, using a simple coupling of two copies of the system. The coupling takes two of the coupled Kac models and gives them simultaneous collisions: with the same angle if the collision is internal, and with the same angle and the same external particle velocity if the collision is external. We can represent the stochastic process as an integral against several Poisson point processes; this is done in [13] to prove contraction for the energy process in Kac's model, and it is helpful here.

$$\begin{aligned} V_{i,t} = V_{i,0}&+ \lambda \sum _{j \ne i} \int _0^t \int _0^{2\pi } \left( V_{i,s^-}\cos \theta + V_{j, s^-}\sin \theta - V_{i, s^-}\right) \Pi _{i,j}(\mathrm {d}s, \mathrm {d}\theta ) \\&+ 2 \mu \int _0^t \int _{-\infty }^{\infty } \int _0^{2\pi }\left( V_{i,s^-} \cos \theta + w \sin \theta - V_{i,s^-}\right) \nu _i (\mathrm {d}s, \mathrm {d}w, \mathrm {d}\theta ). \end{aligned}$$
(2)

Here \(\Pi _{i,j}\) is a Poisson point process on \([0, \infty ) \times [0, 2\pi ]\) with intensity measure \(1/2\pi (N-1)\) times Lebesgue measure, and \(\nu _i\) is a Poisson point process on \([0, \infty ) \times \mathbb {R} \times [0, 2\pi ]\) with intensity measure \(1/2\pi \) times Lebesgue measure tensored with g. Using this representation we can prove contraction in Wasserstein-2.

Proof (Proof of Theorem 6)

Using the representation above we can write out a similar formula for the difference between two solutions coupled by giving them the same driving Poisson processes. If we call this difference in the \(i^{th}\) variable \(\Delta _{i,t}\) then we can write

$$\begin{aligned} \Delta _{i,t}^2&= \Delta _{i,0}^2 + \lambda \sum _{j \ne i} \int _0^t \int _0^{2\pi } \left( \Delta _{i,s^-}^2 (\cos ^2 \theta - 1) + \Delta _{j,s^-}^2 \sin ^2 \theta \right. \\&\left. \quad +\, 2\cos \theta \sin \theta \Delta _{i,s^-}\Delta _{j,s^-} \right) \Pi _{i,j}(\mathrm {d}s, \mathrm {d}\theta ) \\&\quad +\, 2 \mu \int _0^t \int _{-\infty }^{\infty } \int _0^{2\pi } \left( \Delta _{i,s^-}^2(\cos ^2 \theta - 1) + 2\Delta _{i,s^-} w \sin \theta \cos \theta \right) \nu _i(\mathrm {d}s, \mathrm {d}w, \mathrm {d}\theta ). \end{aligned}$$

Summing over i and taking expectations gives

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} \mathbb {E}\left( \sum _{i=1}^N \Delta _{i,t}^2 \right)&= 2 \lambda (N-1) \frac{1}{2\pi } \int _0^{2\pi } (\cos ^2 \theta + \sin ^2 \theta -1) \mathrm {d}\theta \, \mathbb {E}\left( \sum _{i=1}^N \Delta _{i,t}^2 \right) \\&\quad + 2\mu \frac{1}{2\pi }\int _0^{2\pi } \int _{-\infty }^{\infty } g(w) (\cos ^2 \theta - 1) \mathrm {d}\theta \mathrm {d}w \, \mathbb {E}\left( \sum _{i=1}^N \Delta _{i,t}^2 \right) \\&= -\mu \mathbb {E}\left( \sum _{i=1}^N \Delta _{i,t}^2 \right) . \end{aligned}$$

Taking the infimum over possible couplings then gives the result. \(\square \)
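This cancellation can be checked by simulation. The following Monte Carlo sketch is based on our interpretation of the coupling (an assumption, since the representation above leaves the pair update implicit): internal collisions act as rotations on pairs, at an arbitrary total rate since they preserve \(\sum _i \Delta _{i}^2\) exactly, while thermostat collisions occur at rate \(2\mu \) per particle and, because the common w cancels in the difference, multiply one \(\Delta _i\) by \(\cos \theta \), so that \(\mathbb {E}(\sum _i \Delta _{i,t}^2) = e^{-\mu t}\sum _i \Delta _{i,0}^2\).

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_delta_sq(N, lam, mu, t_end, n_samples):
    """Monte Carlo mean of sum_i Delta_{i,t}^2 for two copies of the
    thermostated Kac model driven by the same Poisson clocks, angles
    and thermostat velocities."""
    totals = np.empty(n_samples)
    rate_int = lam * N      # total internal collision rate (a convention)
    rate_ext = 2 * mu * N   # thermostat collisions: rate 2*mu per particle
    for s in range(n_samples):
        delta = np.ones(N)  # Delta_{i,0} = 1 for every particle
        t = rng.exponential(1.0 / (rate_int + rate_ext))
        while t < t_end:
            theta = rng.uniform(0.0, 2.0 * np.pi)
            if rng.random() < rate_int / (rate_int + rate_ext):
                # internal collision: a rotation in the (i, j) plane,
                # which leaves delta[i]**2 + delta[j]**2 unchanged
                i, j = rng.choice(N, size=2, replace=False)
                di, dj = delta[i], delta[j]
                delta[i] = di * np.cos(theta) + dj * np.sin(theta)
                delta[j] = -di * np.sin(theta) + dj * np.cos(theta)
            else:
                # thermostat collision: the common w cancels in the
                # difference, leaving delta[i] -> delta[i] * cos(theta)
                i = rng.integers(N)
                delta[i] *= np.cos(theta)
            t += rng.exponential(1.0 / (rate_int + rate_ext))
        totals[s] = np.sum(delta ** 2)
    return totals.mean()

estimate = mean_delta_sq(N=10, lam=1.0, mu=1.0, t_end=1.0, n_samples=5000)
predicted = 10 * np.exp(-1.0)   # N * exp(-mu * t) with Delta_{i,0} = 1
```

Since every event is a contraction of the vector \((\Delta _i)_i\), the samples lie in \([0, N]\), and the estimate should agree with \(Ne^{-\mu t}\) up to Monte Carlo error.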

We can also prove similar control over how Wasserstein distances behave as the dimension goes to infinity. Here we write \(\mathcal {W}_{p, d}\) for the Wasserstein-p distance associated to the Euclidean distance on \(\mathbb {R}^d\).

Lemma 11

If \(\mu , \nu \) are measures on \(\mathbb {R}\) with finite second moment then

$$\begin{aligned} \mathcal {W}_{2,N}(\mu ^{\otimes N}, \nu ^{\otimes N}) = \sqrt{N}\mathcal {W}_{2,1}(\mu , \nu ). \end{aligned}$$

Proof

We know that there exists an optimal coupling, \(\pi _1\) so that

$$\begin{aligned} \mathcal {W}_{2,1}(\mu , \nu ) = \left( \int _{\mathbb {R}^2} (x-y)^2 \pi _1(\mathrm {d}x, \mathrm {d}y) \right) ^{1/2} \end{aligned}$$

and an optimal coupling, \(\pi _N\), such that

$$\begin{aligned} \mathcal {W}_{2,N}( \mu ^{\otimes N}, \nu ^{\otimes N}) = \left( \int _{\mathbb {R}^{2N}} \Vert \mathbf {x}-\mathbf {y}\Vert ^2 \pi _N(\mathrm {d}\mathbf {x}, \mathrm {d}\mathbf {y}) \right) ^{1/2}. \end{aligned}$$

Suppose, for contradiction, that \(\pi _N\) has strictly smaller cost than \(\pi _1^{\otimes N}\), that is,

$$\begin{aligned}&\int \left( (x_1-y_1)^2 + \dots + (x_N-y_N)^2 \right) \pi _N(\mathrm {d}\mathbf {x}, \mathrm {d}\mathbf {y}) \\&\quad < \int \left( (x_1-y_1)^2 + \dots + (x_N-y_N)^2 \right) \pi _1(\mathrm {d}x_1,\mathrm {d}y_1)\dots \pi _1(\mathrm {d}x_N, \mathrm {d}y_N) \\&\quad = N \int (x-y)^2 \pi _1(\mathrm {d}x, \mathrm {d}y). \end{aligned}$$

Therefore, there exists some k such that

$$\begin{aligned} \int _{\mathbb {R}^{2N}}(x_k-y_k)^2 \pi _N(\mathrm {d}\mathbf {x}, \mathrm {d}\mathbf {y}) < \int _{\mathbb {R}^2} (x-y)^2 \pi _1(\mathrm {d}x, \mathrm {d}y). \end{aligned}$$

Since the integrand on the left hand side depends only on \(x_k, y_k\), projecting \(\pi _N\) onto the \(k^{th}\) variables induces a coupling of \(\mu \) and \(\nu \). The cost under this coupling is strictly less than the optimal cost, which is a contradiction. Hence the optimal cost is achieved by \(\pi _1^{\otimes N}\). This gives that

$$\begin{aligned}&\mathcal {W}_{2,N}(\mu ^{\otimes N}, \nu ^{\otimes N}) \\&\quad = \left( \int \left( (x_1-y_1)^2 + \dots + (x_N-y_N)^2 \right) \pi _1(\mathrm {d}x_1,\mathrm {d}y_1)\dots \pi _1(\mathrm {d}x_N, \mathrm {d}y_N) \right) ^{1/2} \\&\quad = \left( N \int (x-y)^2 \pi _1(\mathrm {d}x, \mathrm {d}y) \right) ^{1/2} \\&\quad = \sqrt{N} \mathcal {W}_{2,1}(\mu , \nu ). \end{aligned}$$

\(\square \)
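Lemma 11 can be illustrated numerically on small empirical measures (the atoms below are purely illustrative). For equal-weight discrete measures the optimal coupling is a permutation, so the \(N\)-dimensional transport problem reduces to a linear assignment problem, while in one dimension the monotone (sorted) coupling is optimal:

```python
import numpy as np
from itertools import product
from scipy.optimize import linear_sum_assignment

# equal-weight empirical measures mu, nu on R (illustrative atoms)
a = np.array([0.0, 1.0, 2.0])
b = np.array([0.5, 1.5, 3.5])

# one dimension: the monotone (sorted) coupling is optimal for W_2
w2_1_sq = np.mean((np.sort(a) - np.sort(b)) ** 2)

# product measures on R^N for N = 2: atoms are all pairs
N = 2
X = np.array(list(product(a, repeat=N)))  # 9 atoms of mu^{otimes 2}
Y = np.array(list(product(b, repeat=N)))  # 9 atoms of nu^{otimes 2}

# equal weights: optimal transport is an assignment problem (Birkhoff)
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
row, col = linear_sum_assignment(C)       # Hungarian algorithm, exact
w2_N_sq = C[row, col].mean()

# Lemma 11: W_{2,N}(mu^N, nu^N)^2 = N * W_{2,1}(mu, nu)^2
```

The assignment optimum agrees exactly with \(N\) times the one dimensional squared cost, matching the lemma.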

Lemma 12

If \(\mu _N\) and \(\nu _N\) are symmetric probability distributions on \(\mathbb {R}^N\) with finite second moment then

$$\begin{aligned} \mathcal {W}_{2,1}(\Pi _1(\mu _N), \Pi _1(\nu _N)) \le \frac{1}{\sqrt{N}} \mathcal {W}_{2,N}(\mu _N, \nu _N). \end{aligned}$$

Proof

Suppose that \(\pi _N\) is a coupling of \(\mu _N\) and \(\nu _N\); then its coordinate marginals induce couplings of the marginals of \(\mu _N\) and \(\nu _N\), and by symmetry each one couples \(\Pi _1(\mu _N)\) with \(\Pi _1(\nu _N)\). Hence

$$\begin{aligned}&\left( \int \left( (x_1-y_1)^2 + \dots + (x_N-y_N)^2 \right) \pi _N(\mathrm {d}\mathbf {x}, \mathrm {d}\mathbf {y}) \right) ^{1/2} \\&= \left( \int (x_1-y_1)^2 \pi _N(\mathrm {d}\mathbf {x}, \mathrm {d}\mathbf {y}) + \dots + \int (x_N-y_N)^2 \pi _N(\mathrm {d}\mathbf {x}, \mathrm {d}\mathbf {y}) \right) ^{1/2} \\&\ge \left( N \mathcal {W}_{2,1}(\Pi _1(\mu _N), \Pi _1(\nu _N))^2 \right) ^{1/2} = \sqrt{N} \mathcal {W}_{2,1}(\Pi _1(\mu _N), \Pi _1(\nu _N)). \end{aligned}$$

Taking the infimum over couplings \(\pi _N\) completes the proof. \(\square \)
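The inequality of Lemma 12 can be checked on a small symmetric example in the same spirit (the atoms are again hypothetical, chosen only for illustration): compute \(\mathcal {W}_{2,2}\) between two symmetric three-atom measures on \(\mathbb {R}^2\) exactly via an assignment problem, compute \(\mathcal {W}_{2,1}\) between their first marginals by sorting, and compare:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# symmetric equal-weight measures on R^2 (illustrative atoms)
mu2 = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])
nu2 = np.array([[0.5, 1.5], [1.5, 0.5], [3.0, 3.0]])

# exact W_{2,2}^2 via the assignment problem (equal weights)
C = ((mu2[:, None, :] - nu2[None, :, :]) ** 2).sum(axis=-1)
row, col = linear_sum_assignment(C)
w2_2_sq = C[row, col].mean()

# first marginals are {0, 1, 2} and {0.5, 1.5, 3.0}, weight 1/3 each;
# in one dimension the sorted coupling is optimal
w2_1_sq = np.mean((np.sort(mu2[:, 0]) - np.sort(nu2[:, 0])) ** 2)

# Lemma 12: W_{2,1}(marginals)^2 <= W_{2,2}(mu2, nu2)^2 / 2
```

For these atoms the bound happens to hold with equality, since the optimal coupling on \(\mathbb {R}^2\) projects onto an optimal coupling in each coordinate.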

As in the earlier sections, we can combine this behaviour with our contraction estimates to show uniform behaviour of the first marginal. For simplicity we consider only tensorised initial data.

Proof (Proof of Theorem 7)

$$\begin{aligned} \mathcal {W}_{2,1}( \Pi _1(\mu _N(t)), \Pi _1(\nu _N(t)))&\le \frac{1}{\sqrt{N}} \mathcal {W}_{2,N}(\mu _N(t), \nu _N(t))\\&\le \frac{1}{\sqrt{N}}e^{-\mu t/2} \mathcal {W}_{2,N}( \mu _0^{\otimes N}, \nu _0^{\otimes N}) \\&= e^{- \mu t/2} \mathcal {W}_{2,1}(\mu _0, \nu _0). \end{aligned}$$

\(\square \)

Remark 5

These estimates, uniform in N, combined with propagation of chaos mean that the limiting Boltzmann–Kac equation will also show exponential convergence to equilibrium in Wasserstein-2. This is very similar to the result shown in [6] in the Toscani distance.