Rate of convergence in the Smoluchowski-Kramers approximation for mean-field stochastic differential equations

In this paper we study a system of second-order mean-field stochastic differential equations describing the movement of a particle under the influence of a time-dependent force, friction, a mean-field interaction and a space- and time-dependent stochastic noise. Using techniques from Malliavin calculus, we establish explicit rates of convergence in the zero-mass limit (Smoluchowski-Kramers approximation). The rates are given in $L^p$-distances and in the total variation distance, for the position process, the velocity process and a rescaled velocity process, each compared with its corresponding limiting process.


Introduction
In this paper, we are interested in the following second-order mean-field stochastic differential equations (1.1). Here α, γ and κ are positive constants, g : [0, T] × R → R and σ : [0, T] × R → R are given functions, x_0, y_0 ∈ R are given points on the real line, and (W_t)_{t≥0} is the standard one-dimensional Wiener process. The notation E denotes the expectation with respect to the probability measure of the underlying probability space on which the Wiener process is defined.
System (1.1) describes the movement of a particle with position (displacement) X^α_t ∈ R and velocity Y^α_t ∈ R at time t. The particle is subject to four different forces:
- an external, possibly time-dependent and non-potential, force −g(t, X^α_t);
- a friction force −κY^α_t;
- a (McKean-Vlasov type) mean-field interaction force −γ(Y^α_t − E(Y^α_t)); note that here the mean-field term acts on the velocity rather than on the position;
- a stochastic noise σ(t, X^α_t)Ẇ_t.

Physically, α is the inverse of the mass, κ is the friction coefficient and γ is the strength of the interaction. We use the superscript α in (1.1) to emphasize the dependence on α, since in the subsequent analysis we are concerned with the asymptotic behaviour of (1.1) as α tends to +∞.
Under Assumptions 1.1 (see below), system (1.1) can also be obtained as the mean-field (hydrodynamic) limit, as N tends to +∞, of the following interacting particle system (1.2). In (1.2) the driving noises are independent one-dimensional Wiener processes. In fact, under Assumptions 1.1 the above interacting system satisfies the propagation of chaos property. That is, as N tends to infinity, the system behaves more and more like a system of independent particles, each of which evolves according to (1.1), where the interaction term in (1.2) is replaced by the expectation. For a detailed account of the propagation of chaos phenomenon, we refer the reader to the classical papers [Kac56,Szn91]. For degenerate diffusion systems like (1.1), see the more recent papers [BGM10,Duo15,JW17] and the references therein. The interacting particle system (1.2), its mean-field limit (1.1) and, more broadly, systems of these types have been used extensively in biology, chemistry and statistical physics. Applications include the modelling of molecular dynamics, chemical reactions, flocking and social interactions, to name just a few; see for instance the monographs [RF96,Pav14].
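The propagation-of-chaos picture suggests a simple way to approximate (1.1) numerically: simulate N particles and replace E(Y^α_t) by the empirical mean of their velocities. The Python sketch below is a plain Euler-Maruyama scheme for an N-particle system of the form dX^i = Y^i dt, dY^i = b(t, X^i, Y^i, Ȳ) dt + s(t, X^i) dW^i with independent W^i. The displays (1.1) and (1.2) are not reproduced in this version. For that reason the velocity drift b and the noise coefficient s are hypothetical user-supplied callables. The coefficients in the usage lines, including how α enters, are illustrative assumptions only.

```python
import numpy as np


def simulate_particles(drift_y, sigma, x0, y0, T, n_steps, n_particles, rng=None):
    """Euler-Maruyama for dX = Y dt, dY = drift_y(t, X, Y, Ybar) dt + sigma(t, X) dW.

    drift_y and sigma are hypothetical user-supplied callables. The expectation
    E(Y_t) of the mean-field equation is replaced by the empirical mean Ybar.
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    x = np.full(n_particles, float(x0))
    y = np.full(n_particles, float(y0))
    for k in range(n_steps):
        t = k * dt
        y_bar = y.mean()                                   # empirical proxy for E(Y_t)
        dw = rng.normal(0.0, np.sqrt(dt), n_particles)     # independent Brownian increments
        x_new = x + y * dt                                 # position uses the old velocity
        y = y + drift_y(t, x, y, y_bar) * dt + sigma(t, x) * dw
        x = x_new
    return x, y


# Illustrative use: these coefficients are assumptions, not the paper's (1.1)-(1.2).
alpha, kappa, gamma = 10.0, 1.0, 0.5


def g(t, x):
    return np.sin(x)  # hypothetical external force profile


x_T, y_T = simulate_particles(
    drift_y=lambda t, x, y, m: alpha * (-g(t, x) - kappa * y - gamma * (y - m)),
    sigma=lambda t, x: np.full_like(x, 0.3),
    x0=0.0, y0=1.0, T=1.0, n_steps=1_000, n_particles=2_000,
    rng=np.random.default_rng(1),
)
print(x_T.mean(), y_T.mean())
```

The position update uses the velocity from the start of each step, so this is the standard first-order Euler-Maruyama discretization. When α is large, the step size should be small compared with 1/(α(κ + γ)).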
In this paper, we are interested in the zero-mass limit of (1.1), also known as the Smoluchowski-Kramers approximation, that is, its asymptotic behaviour as α tends to +∞. By employing techniques from Malliavin calculus, we obtain explicit rates of convergence, in L^p-distances and in the total variation distance, for both the position and velocity processes.
1.1. Main results. Before stating our main results, we make the following assumptions. For random variables F and G, we denote by d_TV(F, G) the total variation distance between the laws of F and G, that is, Consider the following first-order stochastic differential equation, which will be the limiting system for the displacement process (1.3). Our first main result provides explicit rates of convergence for the displacement process. (1) (rate of convergence in L^p-distances) For all p ≥ 2, α ≥ 1 and t ∈ [0, T], where λ(t, a) = (1/a)[1 − exp(−at)] for t, a > 0 and C is a positive constant depending on {x_0, y_0, κ, γ, K, L, p, T} but not on α or t.
Theorem 1.1 combines Theorem 3.1 (for the L^p-distances) and Theorem 3.2 (for the total variation distance) in Section 3.1.
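The rates in Theorem 1.1 are expressed through λ(t, a) = (1/a)[1 − exp(−at)], evaluated at a = α(κ + γ). The small helper below, named `lam` purely for illustration, makes the elementary properties used later in the proofs easy to check:
- λ(t, a) ≤ t;
- λ(t, a) < 1/a;
- a ↦ λ(t, a) is decreasing.

Together these give λ(t, α(κ + γ)) = O(1/α) as α → ∞.

```python
import math


def lam(t: float, a: float) -> float:
    """lambda(t, a) = (1 - exp(-a t)) / a, for t >= 0 and a > 0."""
    if t < 0 or a <= 0:
        raise ValueError("expected t >= 0 and a > 0")
    # -expm1(-a*t) equals 1 - exp(-a*t) but avoids cancellation when a*t is small
    return -math.expm1(-a * t) / a


# How the factor behaves in alpha for fixed t and kappa + gamma (illustrative values)
t, kappa_plus_gamma = 1.0, 2.0
for alpha in (1, 10, 100, 1000):
    print(alpha, lam(t, alpha * kappa_plus_gamma))
```

Using `math.expm1` rather than `1 - math.exp(...)` is only a numerical-accuracy choice and does not change the function.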
We are also interested in the asymptotic behaviour, as α → ∞, of the velocity process Y^α_t of (1.1) and of a rescaled velocity process Ỹ^α_t, whose definition is recalled in (3.25) in Section 3.2.1. The rescaled process Ỹ^α_t satisfies a stochastic differential equation (see (3.26)) driven by W̃_t = √α W_{t/α}, which is again a Brownian motion. Next, we consider the following stochastic differential equation, whose solution will be the limiting process of the rescaled velocity process (1.5). We describe our result for the rescaled velocity process first. The reason is that, for this process, we work in a general setting where both g and σ may depend on both the spatial and the temporal variables. We only need the following additional condition.
In the next theorem, we provide explicit rates of convergence, both in L^p-distances and in the total variation distance, for the rescaled velocity process.
Theorem 1.2 (Quantitative rates of convergence for the rescaled velocity process). Under Assumptions 1.1 and 1.3 the following hold.
(1) (rate of convergence in L^p-distance for the rescaled velocity process) For all p ≥ 2 and α ≥ 1, where C is a positive constant depending on p and other parameters but not on α.
When g(t, x) = g(x) and σ(t, x) = δ, [Nar94, Theorem 2.3] shows that the velocity process Y^α_t converges in distribution to a normal random variable as α → ∞. The third aim of this paper is to generalize this result to the setting where g depends on both x and t while σ depends only on t, i.e. σ(t, x) = σ(t), and to obtain rates of convergence in the total variation distance. The following theorem is the content of Theorem 3.5 in Section 3.2.
Theorem 1.3 (Quantitative rate of convergence for the velocity process). Suppose that Assumptions 1.1 hold, that σ(t) is continuously differentiable on [0, T] and that σ(t) ≠ 0 for each t ∈ (0, T]. Let N be a normal random variable with mean 0 and variance σ²(t)/(2(κ + γ)). Then, for each α ≥ 1 and t ∈ (0, T], where C > 0 is a constant not depending on α and t.
We emphasize that, in the main theorems, we only use Assumptions 1.1 to obtain existence and uniqueness as well as the rates of convergence in L^p-distances. Assumptions 1.2 and 1.3 are needed to employ techniques from Malliavin calculus, in particular to derive estimates for the Malliavin derivatives.
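Theorem 1.3 measures closeness to a centred Gaussian law in the total variation distance. We assume the common normalization d_TV(F, G) = sup_A |P(F ∈ A) − P(G ∈ A)|, since the defining display in Section 1.1 is missing. Under this normalization, for laws with densities f and g the distance equals (1/2)∫|f − g| dx. The sketch below compares a midpoint-rule quadrature of that integral with the closed form for two centred normals N(0, s1²) and N(0, s2²). It can be used, for instance, to gauge how a small error in the limiting variance σ²(t)/(2(κ + γ)) translates into total variation. All function names are hypothetical.

```python
import math


def normal_pdf(x: float, s: float) -> float:
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tv_numeric(s1: float, s2: float, n: int = 20_000, width: float = 12.0) -> float:
    """(1/2) * integral of |f1 - f2| over [-L, L] by the midpoint rule."""
    L = width * max(s1, s2)
    h = 2.0 * L / n
    total = 0.0
    for i in range(n):
        x = -L + (i + 0.5) * h
        total += abs(normal_pdf(x, s1) - normal_pdf(x, s2))
    return 0.5 * total * h


def tv_closed_form(s1: float, s2: float) -> float:
    """Exact d_TV between N(0, s1^2) and N(0, s2^2)."""
    if s1 == s2:
        return 0.0
    lo, hi = sorted((s1, s2))
    # The densities cross at +/- x_star; the narrower one dominates on (-x_star, x_star).
    x_star = math.sqrt(2.0 * lo**2 * hi**2 * math.log(hi / lo) / (hi**2 - lo**2))
    return 2.0 * (std_normal_cdf(x_star / lo) - std_normal_cdf(x_star / hi))


print(tv_numeric(1.0, 1.5), tv_closed_form(1.0, 1.5))
```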
Corollary 1.1 (Rate of convergence in Wasserstein distance for the laws of the displacement and velocity processes). Let µ and ν be two probability measures with finite moments of order p; then the p-Wasserstein distance W_p(µ, ν) between them can be defined by Using this formulation, as a direct consequence of our main results, we also obtain explicit rates of convergence in p-Wasserstein distances for the laws of the displacement and the rescaled velocity processes to the corresponding limiting laws. Indeed, any two random variables F and G defined on the same probability space provide a coupling of their laws. Hence W_p(Law(F), Law(G)) ≤ (E|F − G|^p)^{1/p}, and the L^p-bounds of Theorems 1.1 and 1.2 transfer directly.
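The transfer from L^p-bounds to Wasserstein bounds rests on the fact that any joint realization of two random variables is a coupling of their laws. Consider empirical measures on the real line with the same number of equally weighted atoms. For such measures and p ≥ 1, an optimal coupling pairs the sorted samples, which gives the short sketch below (the function name is ours). The usage lines also check numerically that the sorted pairing never does worse than the given pairing. In other words, W_p ≤ (mean |x_i − y_i|^p)^{1/p}, the empirical analogue of W_p(Law(F), Law(G)) ≤ (E|F − G|^p)^{1/p}.

```python
import numpy as np


def wasserstein_p_1d(xs, ys, p: float = 2.0) -> float:
    """W_p between two empirical measures on R with equally many, equally weighted atoms."""
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # In one dimension, matching order statistics is an optimal coupling for p >= 1.
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))


rng = np.random.default_rng(0)
x = rng.normal(size=1_000)               # stand-in for samples of one random variable
y = x + 0.1 * rng.normal(size=1_000)     # a nearby random variable on the same space
p = 2.0
lp_distance = float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))  # distance under the given coupling
print(wasserstein_p_1d(x, y, p), "<=", lp_distance)
```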

1.2. Comparison with existing literature and future work. The zero-mass limit of second-order differential equations has been studied intensively in the literature. In the seminal work [Kra40], Kramers formally discusses this problem for the classical underdamped Langevin dynamics, in the context of applications to chemical reactions. These dynamics correspond to (1.1) with g = ∇V (so that the external force −g = −∇V is a gradient force), γ = 0 (no interaction force) and a constant diffusion coefficient. Owing to this seminal work, the limit has become known in the literature as the Smoluchowski-Kramers approximation. Nelson [Nel67] rigorously showed that, under suitable rescaling, the solution to the Langevin equation converges almost surely to the solution of (3.14) with ψ = 0. Since then, various generalizations and related results have been proved using different approaches, such as stochastic methods, asymptotic expansions and variational techniques; see for instance [Nar91b, Nar91a, Nar94, Fre04, CF06, HVW12, DLPS17, DLP+18, NN20]. The papers most relevant to the present one are [Nar91b, Nar91a, Nar94, DLPS17, NN20]. The main novelty of the present paper is that we consider interacting (mean-field) systems with time-dependent external forces and diffusion coefficients. Moreover, we provide explicit rates of convergence in both L^p-distances and total variation distances for both the displacement and velocity processes. Existing papers lack at least one of these features, as we now detail.
Papers that consider mean-field (interaction) systems. The papers [Nar91b, Nar91a, Nar94, DLPS17] consider second-order mean-field stochastic differential equations and establish the zero-mass limit. However, they require the much more stringent conditions g(t, x) = g(x) (time-independent force) and σ(t, x) = δ (constant diffusivity). Moreover, they do not provide a rate of convergence.
Furthermore, our approach via Malliavin calculus is also different: Narita's papers use direct arguments, while [DLPS17] employs variational methods based on Gamma-convergence and the large deviation principle.
Papers that provide a rate of convergence. The papers [NN20, DLP+18] provide a rate of convergence but only consider non-interacting systems, and they measure the error in different ways. Like our paper, [NN20] uses techniques from Malliavin calculus, whereas [DLP+18] uses a completely different, variational method. The recent paper [CT22] studies the kinetic Vlasov-Fokker-Planck equation. It is particularly interesting since it considers interacting systems and provides a rate of convergence, but it differs from ours in three respects. First, the interaction force acts on the position instead of the velocity. Second, it works with the Fokker-Planck equation and obtains a rate of convergence in the Wasserstein distance. We instead work with the stochastic differential equations and obtain error bounds in both L^p-distances and the total variation distance. Third, as mentioned, we use Malliavin calculus while [CT22] applies variational techniques as in [DLPS17, DLP+18]. We also mention the paper [Ta20], which provides rates of convergence similar to ours but considers non-mean-field stochastic differential equations driven by fractional Brownian motion.
Future work. The Lipschitz, boundedness and differentiability conditions in Assumptions 1.1, 1.2 and 1.3 are standard. They are, however, rather restrictive, since they do not cover some physically interesting singular interaction forces, such as Coulomb or Newtonian forces. It would be interesting and challenging to extend our work to non-Lipschitz and singular coefficients. Initial attempts in this direction for related models exist in the literature; see [Bre09] for non-Lipschitz coefficients and the recent papers [XY22, CT22] for singular forces. Another interesting problem for future work is to study the Smoluchowski-Kramers approximation for the N-particle system (1.2) and to obtain a rate of convergence that is independent of N.
1.3. Overview of the proofs. Several technical improvements are needed for two purposes: to prove the main theorems in the general setting with time-dependent coefficients, and to obtain rates of convergence in L^p-distances and in the total variation distance for the position and velocity processes.
On existence and uniqueness. Under Assumptions 1.1, the existence and uniqueness of solutions, as well as the boundedness of their moments, are standard results for the second-order system (1.1) and the limiting first-order equation (1.3). They follow from Hölder's inequality and the Burkholder-Davis-Gundy inequality.
On the rate of convergence in L^p-distances. Combining these inequalities with known estimates from [Nar91b], we can directly estimate E sup_{0≤s≤t} |X^α_s − X_s|^p and E sup_{0≤t≤T} |Ỹ^α_t − Ỹ_t|^p. This yields the rates of convergence in L^p-distances and proves part (1) of Theorems 1.1 and 1.2.
On the rate of convergence in total variation distance. The Malliavin differentiability of the processes follows from arguments similar to those in [Nua06]. Obtaining the rate of convergence in the total variation distance is the most technically challenging part. The key to our analysis is Lemma 2.1, which bounds the total variation distance between two random variables in terms of their Malliavin derivatives. This lemma allows us to obtain the desired rates of convergence by estimating the corresponding quantities on the right-hand side of Lemma 2.1.
1.4. Organization of the paper. The rest of the paper is organized as follows. In Section 2, we review some elements of Malliavin calculus and mean-field stochastic differential equations. The proofs of the main theorems are given in Section 3.

Preliminaries
In this section, we recall some basic facts about Malliavin calculus and mean-field stochastic differential equations that are directly relevant to this paper.
2.1. Malliavin calculus. Let us recall some elements of the stochastic calculus of variations (for more details see [Nua06]). We suppose that Let S denote the dense subset of L^2(Ω, F, P) := L^2(Ω) consisting of smooth random variables of the form If F has the form (2.1), we define its Malliavin derivative as the process DF := {D_t F, t ∈ [0, T]} given by More generally, for each k ≥ 1 we can define the iterated derivative operator on a cylindrical random variable by setting A random variable F is said to be Malliavin differentiable if it belongs to D^{1,2}. An important operator in Malliavin calculus is the divergence operator δ, which is the adjoint of the derivative operator D. The domain of δ is the set of all u ∈ L^2(Ω, H) such that where C(u) is some positive constant depending on u. In particular, if u ∈ Dom(δ), then δ(u) is characterized by the following duality relationship The following lemma provides an upper bound on the total variation distance between two random variables in terms of their Malliavin derivatives. This lemma plays an important role in the analysis of the present paper.
Lemma 2.1. Let F_1 ∈ D^{2,2} be such that ‖DF_1‖_H > 0 a.s. Then, for any random variable F_2 ∈ D^{1,2}, we have provided that the expectations exist.
Proof. From [Ta20, Lemma 2.1] we have Moreover, observing that Substituting the inequality (2.5) into (2.4) and using Hölder's inequality, one can derive that Finally, we substitute the above estimate back into (2.3) and use the elementary inequality (a + b)^{1/2} ≤ a^{1/2} + b^{1/2} for all a, b ≥ 0. This gives (2.2) and completes the proof of the lemma.
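The duality relationship between D and δ recalled in Section 2.1 can be checked by simulation in the simplest case. We assume the standard setting of [Nua06], namely H = L^2([0, T]) and W(h) = ∫_0^T h(s) dW_s. Take the deterministic element h = 1_{[0, t0]}, for which δ(h) = W(h) = W_{t0} and ‖h‖²_H = t0. Take also the smooth random variable F = exp(W(h)), whose derivative is D_s F = F h(s). Duality then reads E[F t0] = E[F W_{t0}], and both sides equal t0 exp(t0/2) by Gaussian integration by parts. The sketch below is an illustrative Monte Carlo check of this identity, not part of the paper's argument; the function name is hypothetical.

```python
import math
import random


def duality_check(t0: float = 0.5, n: int = 200_000, seed: int = 1):
    """Monte Carlo estimates of E<DF, h>_H and E[F delta(h)] for F = exp(W(h)), h = 1_[0, t0]."""
    rng = random.Random(seed)
    lhs = rhs = 0.0
    for _ in range(n):
        w = rng.gauss(0.0, math.sqrt(t0))   # W(h) = W_{t0} ~ N(0, t0)
        f = math.exp(w)                     # F = exp(W(h)), so D_s F = F * h(s)
        lhs += f * t0                       # <DF, h>_H = F * ||h||_H^2 = F * t0
        rhs += f * w                        # F * delta(h), with delta(h) = W(h) for deterministic h
    exact = t0 * math.exp(t0 / 2.0)         # E[Z e^Z] = Var(Z) E[e^Z] for centred Gaussian Z
    return lhs / n, rhs / n, exact


print(duality_check())
```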

2.2. Mean-field stochastic differential equations. Let (Ω, F, P) be a probability space with an increasing family {F_t; t ≥ 0} of sub-σ-algebras of F, and let {W_t; t ≥ 0} be a one-dimensional Brownian motion adapted to {F_t}.
The following lemma provides equivalent formulations of (1.1) and (1.3) as stochastic integral equations.
Lemma 2.2. Equations (1.1) and (1.3) are, respectively, equivalent to the following equations
Proof. First, we can rewrite the second equation of (1.1) as follows Using Itô's formula, we have the following expression which implies Second, substituting this equation into the first equation of (1.1), we get Now, we use integration by parts for the non-stochastic integral and Itô's product rule for the stochastic integral to get where the terms I^α_i(t) (i = 0, 1, 2, 3) are defined in the statement of the lemma. On the other hand, from the second equation of (1.1) we have This implies that Integrating this equation over the interval [0, t] and changing the order of integration in the double integral, we get (2.11) Substituting (2.11) back into (2.9), we obtain (2.6).
The proof for Equation (2.7) is similar.
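The integrating-factor step behind Lemma 2.2 is generic. Consider a scalar linear equation dY_t = −aY_t dt + dZ_t, where a > 0 is a constant and Z collects the remaining (possibly stochastic) terms. Itô's product rule with the deterministic factor e^{at}, which has no quadratic covariation with Y, gives the variation-of-constants formula below. Its last line shows how λ(t, a) arises. In the present setting one expects a = α(κ + γ), in line with the factor λ(t, α(κ + γ)) in the estimates. This identification is an assumption, since the displays (2.6)-(2.11) are not reproduced here.

```latex
\begin{aligned}
d\big(e^{at}Y_t\big) &= a e^{at}Y_t\,dt + e^{at}\,dY_t = e^{at}\,dZ_t,\\
Y_t &= e^{-at}y_0 + \int_0^t e^{-a(t-s)}\,dZ_s,\\
\int_0^t e^{-a(t-s)}\,ds &= \frac{1-e^{-at}}{a} = \lambda(t,a).
\end{aligned}
```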
The existence and uniqueness of solutions to (2.6) and (2.7) under Assumptions 1.1 follow from the results of [McK67].

Proof of the main results
In this section, we present the proofs of the main Theorems 1.1, 1.2 and 1.3. We start with the displacement process in Section 3.1 (Theorems 3.1 and 3.2 give Theorem 1.1). Then, in Section 3.2, we deal with the rescaled velocity process and the velocity process. There, Theorems 3.3 and 3.4 give Theorem 1.2, and Theorem 1.3 is Theorem 3.5.
3.1. Approximation of the displacement process. In this section, we give explicit bounds on the L^p-distances and the total variation distance between the solution X^α_t of (1.1) and the solution X_t of (1.3). We will repeatedly use the following fundamental inequalities.
(i) Minkowski's inequality: for p ≥ 1 and n real numbers a_1, ..., a_n, we have (3.1)
(ii) Hölder's inequality: for p ≥ 1, t > 0 and measurable functions f, we have (3.2)
(iii) The Burkholder-Davis-Gundy (BDG) inequality for Brownian stochastic integrals, see for instance [SP12, Section 17.7]: for 0 < p < ∞ and (3.3) where C_p is a positive constant depending only on p.
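The display (3.1) is missing from this version. We assume here that it is the elementary inequality usually invoked at this point: |a_1 + ... + a_n|^p ≤ n^{p−1}(|a_1|^p + ... + |a_n|^p) for p ≥ 1, a consequence of the convexity of x ↦ |x|^p. This inequality turns the decomposition of X^α_t into the terms I^α_i(t) into a sum of separate moment bounds. The randomized check below is only an illustration; the function name is hypothetical.

```python
import random


def power_mean_ratio(a, p):
    """|sum a_i|^p / (n^(p-1) * sum |a_i|^p); the assumed inequality says this is <= 1 for p >= 1."""
    n = len(a)
    bound = n ** (p - 1) * sum(abs(x) ** p for x in a)
    return 0.0 if bound == 0 else abs(sum(a)) ** p / bound


rng = random.Random(0)
largest = max(
    power_mean_ratio([rng.uniform(-5.0, 5.0) for _ in range(rng.randint(1, 6))], rng.uniform(1.0, 6.0))
    for _ in range(10_000)
)
print("largest ratio found:", largest)
```

Equality holds when all a_i are equal, so the constant n^{p−1} cannot be improved in general.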
Applying the BDG inequality (3.3) to the solutions of (2.6) and (2.7), we obtain The next lemma provides important estimates on the moments of the displacement process {X^α_t, t ∈ [0, T]}, which will be helpful in proving the main results of this section. Hereafter, C denotes a generic constant whose value may change from one appearance to the next. and for all 0 ≤ t ≤ T, where C is a positive constant depending only on {x_0, y_0, κ, γ, K, p, T}.
Proof. We first prove (3.6). We shall divide the proof into two steps.
Step 1: We estimate the moments of each I^α_i(t), i = 1, 2, 3, 4.
In the following theorem, we obtain a rate of convergence in L^p-distances in the Smoluchowski-Kramers approximation for the displacement process.

Now we consider sup_{0≤s≤t} |I^α_4(s)|^p. Using Lemma 3.1 we can derive that where C is a constant depending on {x_0, y_0, κ, γ, K, L, p}. From the above estimates, together with the fact that λ(t, α(κ + γ)) ≤ t ≤ T, one sees that where C is a constant depending only on {x_0, y_0, κ, γ, K, L, p}. Using Gronwall's inequality, we obtain the claimed estimate and complete the proof.
In the following lemma, we show the Malliavin differentiability of X^α_t and X_t.
The proof for the solution X t of (2.7) is similar.
In the next lemma, we show that the moments of the Malliavin derivatives of the solutions of (2.7) are bounded.
Taking the expectation and using Gronwall's inequality, we obtain the claimed estimate.
The following lemma provides an upper bound for the difference between the derivatives of the solutions of (2.6) and (2.7).
Estimating the last term on the right-hand side of (3.20) in the same way, we obtain ≤ Cλ(t, α(κ + γ)).
From the above estimates, together with the fact that the function a ↦ λ(t, a) is decreasing, one can derive that where C is a constant depending only on {x_0, y_0, κ, γ, K, L, M, T}.
Thus, applying Gronwall's inequality, we get where C is a constant depending only on {x_0, y_0, κ, γ, K, L, M, T}. This completes the proof of the lemma. Now, we give explicit bounds on the total variation distance between the solution X^α_t of (2.6) and the solution X_t of (2.7).
Proof. Lemma 2.1 gives us Thanks to Theorem 3.1 and Lemma 3.4, we obtain where C is a constant depending only on {x_0, y_0, κ, γ, K, L, M, T}. Now, from (3.17), one sees that Define the stochastic process M_t := for 0 ≤ t ≤ T. We observe that M_t is a martingale with bounded quadratic variation. Indeed, ⟨M⟩_t = ∫_0^t σ'_2(s, X_s)^2 ds ≤ L^2 T. So, by the Dubins-Schwarz theorem (see, e.g., Theorem 3.4.6 in [KSSS91]), there exists a Wiener process (Ŵ_t)_{t≥0} such that M_t = Ŵ_{⟨M⟩_t}. Then we arrive at the following This implies that, for each r ≥ 2 and 0 < t ≤ T, .
From Theorem 3.2, together with the fact that λ(t, a) < 1/a for all t > 0 and a > 0, we obtain the following corollary.

3.2. Approximation of the velocity and rescaled velocity processes.
In this section, we establish rates of convergence in L^p-distances and in the total variation distance for the velocity and rescaled velocity processes. We discuss the rescaled velocity process first since, in this case, our results apply to a more general setting. There, both the external force and the diffusion coefficient may depend on x and t, i.e. g = g(t, x) and σ = σ(t, x).
3.2.1. The rescaled velocity process. From the second equation of (1.1) we can see that the process (3.24) We recall the definition of the rescaled velocity process introduced in the Introduction Ỹ (3.25) Now, we put W̃_t = √α W_{t/α}; then (W̃_t)_{t≥0} is a Brownian motion and (3.25) can be rewritten in the form Our goal in this section is to study the rate of convergence in L^p-distance and in the total variation distance between Ỹ^α_t and Ỹ_t. Here, Ỹ_t is the solution of the Ornstein-Uhlenbeck equation (1.5), which is (3.27) First, we obtain the rate of convergence in L^p-distances between Ỹ^α_t and Ỹ_t in the following lemma.
T]} be, respectively, the solutions of (3.26) and (3.27) under Assumptions 1.1 and 1.3. Then, for all p ≥ 2 and α ≥ 1, where C is a positive constant depending on p but not on α.
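The limiting variance in Theorem 1.3 has a heuristic explanation in the Ornstein-Uhlenbeck structure of (1.5) and (3.27). For a scalar OU equation dZ = (b − aZ) dt + s dB with deterministic Z_0 and constants a > 0, b and s, one has Var(Z_t) = s²(1 − e^{−2at})/(2a) = s²λ(t, 2a), which tends to s²/(2a). Take the relaxation rate a = κ + γ, reflecting friction together with the interaction term −γ(Y − E(Y)), which has zero mean, and s = σ. The limit is then σ²/(2(κ + γ)), the variance of N in Theorem 1.3. The exact forms of (1.5) and (3.27) are not reproduced here, so this identification is only an assumption-based heuristic. The sketch below samples the OU process with its exact Gaussian transition (not Euler-Maruyama) and compares the empirical variance with the formula. Parameter values and function names are illustrative.

```python
import math

import numpy as np


def ou_variance(t: float, a: float, s: float) -> float:
    """Var(Z_t) for dZ = (b - a Z) dt + s dB with deterministic Z_0; it does not depend on b."""
    return s**2 * (1.0 - math.exp(-2.0 * a * t)) / (2.0 * a)


def simulate_ou_exact(z0, a, b, s, t, n_steps, n_paths, rng):
    """Sample Z_t on n_paths paths using the exact Gaussian transition over each step."""
    h = t / n_steps
    decay = math.exp(-a * h)
    step_sd = s * math.sqrt((1.0 - decay**2) / (2.0 * a))
    m = b / a                                    # long-run mean
    z = np.full(n_paths, float(z0))
    for _ in range(n_steps):
        z = m + (z - m) * decay + step_sd * rng.standard_normal(n_paths)
    return z


kappa, gamma, sigma = 1.0, 0.5, 0.8              # illustrative values
a = kappa + gamma
z = simulate_ou_exact(0.0, a, 0.3, sigma, 2.0, 50, 200_000, np.random.default_rng(0))
print(z.var(), ou_variance(2.0, a, sigma), sigma**2 / (2 * a))
```

Because each step uses the exact transition, the composed variance over the grid equals ou_variance regardless of the step size. Only Monte Carlo error remains.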
3.2.2. The velocity process. As mentioned in the Introduction, when g(t, x) = g(x) and σ(t, x) = δ, [Nar94, Theorem 2.3] shows that the velocity process Y^α_t converges in distribution to a normal random variable as α → ∞. In the rest of this section, we generalize this result to a much more general setting where g depends on both x and t while σ depends only on t, i.e. σ(t, x) = σ(t).