A new perspective on Wasserstein distances for kinetic problems

We introduce a new class of Wasserstein-type distances specifically designed to tackle questions concerning stability and convergence to equilibria for kinetic equations. Thanks to these new distances, we improve some classical estimates by Loeper and Dobrushin on Vlasov-type equations, and we present an application to quasineutral limits.

1. Introduction

1.1. General overview. Monge-Kantorovich distances, also known as Wasserstein distances, play a central role in statistical mechanics, especially in the theory of propagation of chaos and in the study of the mean behavior of large particle systems. Since the late 1970s, there have been many applications of Wasserstein distances in kinetic theory, as is beautifully described in the bibliographical notes of [60, Chapter 6]. In particular, these distances are frequently used to prove the uniqueness and stability of solutions to kinetic equations, to study singular limits, and to measure convergence to equilibrium.
The first celebrated result relying on Monge-Kantorovich-Wasserstein distances in non-collisional kinetic theory is the proof by Dobrushin [16] of the well-posedness for Vlasov equations with C^{1,1} potentials, where existence, uniqueness, and stability are proved via a fixed point argument in the bounded-Lipschitz or the 1-Wasserstein distance. As a consequence of this argument, one also obtains the validity of the mean-field limit for Vlasov equations with smooth potentials. The interested reader may refer to [20, Chapter 1.4] and [40, Chapter 3.3] for a detailed explanation of Dobrushin's stability estimate, its consequences on the mean-field limit for the Vlasov equation, and the role of Monge-Kantorovich-Wasserstein distances. Dobrushin's estimate is at the core of several kinetic theory arguments; see for example [10,11,12,13,15,18,22,29] for some applications.
In recent times, Golse and Paul in [23] introduced a quantum analog of the 2-Wasserstein distance to measure the approximation of the N-body quantum dynamics by its mean-field limit. In [21] the authors prove quantitative stability estimates that are reminiscent of Dobrushin's, and they show that, in the case of C^{1,1} potentials, the mean-field limit of the quantum mechanics of N identical particles is uniform in the classical limit.
Another fundamental stability estimate was proved by Loeper [49], who established uniqueness and stability of solutions with bounded density for the Vlasov-Poisson equation. Loeper's argument relies on the fact that the Coulomb kernel is generated by a potential solving Poisson's equation and exploits the strong connection between the 2-Wasserstein distance and the H^{-1}-norm. Besides providing the best-known uniqueness criterion for Vlasov-Poisson, this approach also gives a new proof of uniqueness à la Yudovich for 2D Euler. Loeper's result has been generalized to less singular kernels [37], and it is the cornerstone for several other stability arguments [7,14,36,42,44,48,58]. Also, Loeper's uniqueness criterion for Vlasov-Poisson has been extended to solutions whose associated density belongs to some suitable Orlicz spaces [51,38]. In the following, we will focus our attention on some applications of Loeper's stability estimate related to the quasi-neutral limit for the Vlasov-Poisson equation [28,30,34,35].

ETH Zürich, Department of Mathematics, Rämistrasse 101, 8092 Zürich, Switzerland. Email: mikaela.iacobelli@math.ethz.ch.
In general, extending Dobrushin's and Loeper's estimates is a delicate matter. A possible idea is to introduce an anisotropic metric that weights spatial and momentum coordinates differently. For example, in [43], the author considers a variant of the 2-Wasserstein distance where the cost for moving points in the x-variable is higher than for the v-variable. By suitably selecting the parameters, this allows the author to extend the validity ranges for the mean-field limit for the Vlasov-Poisson system. Also, as shown in [28,30], an analogous method provides better convergence estimates when considering combined mean-field and quasi-neutral limits in Vlasov-Poisson-type systems. At the same time as this paper was written, another variant of this idea was introduced in [56], where the author improves the trend to equilibrium for 1-D kinetic Fokker-Planck equations via estimates measured in an analog of the 2-Wasserstein metric.
This work aims to push further the idea that, when applied to kinetic problems, Wasserstein distances should be modified to reflect the natural anisotropy between position and momentum variables. Moreover, since these metrics are used to measure the distance between solutions of PDEs, we will introduce time-dependent counterparts that can vary along with the characteristic flow. Still, it is worth noticing that our method could be applied, beyond the kinetic framework, to equations where the evolution in one of the variables enjoys better regularity properties than the others. Before stating our main results, let us emphasize that the idea of finding appropriate generalised Wasserstein distances has been used successfully in other contexts in the optimal transport and evolution PDE community, see for instance [17,19,45,46,54,55] and references therein.
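1.2. Definitions and main results. Let us recall the definition of Wasserstein distances (see for instance [1,60]). In what follows, X will be either the d-dimensional torus T^d or the Euclidean space R^d.

Definition 1.1. Let µ, ν be two probability measures on X × R^d. We denote with Π(µ, ν) the set of all probability measures on (X × R^d)^2 with marginals µ and ν. More precisely, π ∈ Π(µ, ν) if
\[ \pi\big(A \times (X \times \mathbb{R}^d)\big) = \mu(A), \qquad \pi\big((X \times \mathbb{R}^d) \times A\big) = \nu(A) \qquad \text{for every Borel set } A \subset X \times \mathbb{R}^d. \]
We shall call coupling (between µ and ν) an element in Π(µ, ν).

For p ≥ 1, the p-Wasserstein distance between µ and ν is defined as
\[ W_p(\mu,\nu) := \Big( \inf_{\pi \in \Pi(\mu,\nu)} \int_{(X \times \mathbb{R}^d)^2} \big( |x-y|^p + |v-w|^p \big)\, d\pi(x,v,y,w) \Big)^{1/p}. \]

(1) A free-flow W_1-type distance for the Vlasov equations with C^{1,1} potential. Consider two solutions f_1, f_2 of the Vlasov equation on X, namely
\[ \partial_t f + v \cdot \nabla_x f + F[f] \cdot \nabla_v f = 0, \qquad F[f] := -\nabla K * \rho_f, \qquad \rho_f := \int_{\mathbb{R}^d} f \, dv, \]
and let B := \|\nabla^2 K\|_{L^\infty} denote the Lipschitz constant of ∇K. The classical Dobrushin's argument shows that
\[ W_1(f_1(t), f_2(t)) \le e^{(1+2B)t}\, W_1(f_1(0), f_2(0)). \]
In particular, when the potential K is identically zero, this bound provides an exponential stability for W_1 that is far from optimal. Indeed, since the solution is simply given by f_i(t, x, v) = f_i(0, x − tv, v), one has W_1(f_1(t), f_2(t)) ≤ (1 + t) W_1(f_1(0), f_2(0)). By introducing a W_1-type distance adapted to the free flow, we can prove that
\[ W_1(f_1(t), f_2(t)) \le (1+t)\, e^{\frac{2}{3} B ((1+t)^3 - 1)}\, W_1(f_1(0), f_2(0)). \]
This estimate gives the optimal bound when K ≡ 0. Moreover, for B ≤ 1, this provides a better estimate compared to the usual Dobrushin's bound when t ∈ [0, T_B], for a suitable time T_B depending on B.

(2) An improved W_2-stability estimate for Vlasov-Poisson with bounded density. For this second application, we focus on the case of the torus for simplicity, but a completely similar analysis works on the whole space.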
Consider two solutions f_1, f_2 of the Vlasov-Poisson equation on T^d. As shown in [49], Loeper's proof provides a stability estimate for W_2(f_1(t), f_2(t)) whenever W_2(f_1(0), f_2(0)) is sufficiently small (which is the interesting case); the estimate involves a dimensional constant c_d > 0 and a constant C that depends on the L^∞ norms of ρ_{f_1} and ρ_{f_2}. This estimate can then be applied to prove the validity of the quasi-neutral limit for Vlasov-Poisson for initial data that are double-exponentially small perturbations of analytic functions [34,35] (see Remark 3.4).

To improve this result, given (X_i, V_i) the characteristics associated to f_i, we consider a nonlinear W_2-type quantity of the form
\[ Q(t) := \int \big( \lambda(t)\, |X_1 - X_2|^2 + |V_1 - V_2|^2 \big)\, d\pi_0, \]
where π_0 is an optimal coupling, and λ(t) = |log(Q(t))|. We then prove that Q(t) is well-defined whenever Q(t) ≪ 1, and finally, comparing Q(t) to W_2, we obtain an improved stability estimate (see Theorem 3.1). To better understand the improvement of our estimate with respect to Loeper's, one can think as follows: if W_2(f_1(0), f_2(0)) = θ ≪ 1, then Loeper's estimate keeps W_2(f_1(t), f_2(t)) ≲ 1 for t ∈ [0, log |log θ|], while our estimate does so for t ∈ [0, |log θ|^{1/2}], so on a much longer time-interval.

• Note that a standard Gronwall estimate of the form W_2(f_1(t), f_2(t)) ≤ e^{Ct} W_2(f_1(0), f_2(0)) would give W_2(f_1(t), f_2(t)) ≲ 1 for t ∈ [0, |log θ|]. So, while Loeper's bound loses an extra logarithm in terms of time-scale, our bound only loses a square root. Since the electric field for a solution with bounded density is at most log-Lipschitz, an estimate of the form W_2(f_1(t), f_2(t)) ≤ e^{Ct} W_2(f_1(0), f_2(0)) is not expected to hold in this setting, and we believe our bound to be essentially sharp.
• Our improvement from log | log θ| to | log θ| 1/2 is similar to the one obtained for the W 1 distance, see [38,Remark 1.7].In that paper, the authors rely crucially on the second-order structure of the Vlasov equation, namely Ẍ = ∇U (t, X).Our proof, instead, relies only on the fact that Ẋ = a(t, X, V ), where a(t, •, •) is Lipschitz, and it can be generalized to other contexts where the second-order structure fails.
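To get a feeling for the three time-scales compared above, the following minimal sketch evaluates |log θ| (Gronwall-type bound), |log θ|^{1/2} (the new estimate) and log |log θ| (Loeper's estimate) for a few values of θ. All multiplicative constants are set to 1, which is an assumption made only for illustration, and the function name `stability_time_scales` is hypothetical.

```python
import math


def stability_time_scales(theta: float) -> dict:
    """Time horizons, up to constants (all set to 1 here as an assumption),
    on which each type of estimate keeps the distance of order one
    when the initial distance equals theta."""
    if not 0.0 < theta < math.exp(-1.0):
        raise ValueError("theta must lie in (0, 1/e)")
    abs_log = abs(math.log(theta))  # |log theta| > 1 on this range
    return {
        "gronwall": abs_log,            # from exp(C t) * theta
        "new": math.sqrt(abs_log),      # |log theta|^(1/2)
        "loeper": math.log(abs_log),    # log |log theta|
    }


if __name__ == "__main__":
    for theta in (1e-5, 1e-20, 1e-80):
        scales = stability_time_scales(theta)
        print(theta, {name: round(value, 3) for name, value in scales.items()})
```

Since log s < s^{1/2} < s for every s > 1, the ordering loeper < new < gronwall holds for every θ in (0, 1/e), and the gap between the last two horizons widens as θ decreases.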
Our new stability estimate has interesting applications for what concerns some singular limits for Vlasov-type equations.In particular, by considering the Vlasov-Poisson system in appropriate dimensionless variables that take the Debye length into account, we prove the validity of the quasi-neutral limit for Vlasov-Poisson for initial data that are an exponential perturbation of analytic functions, see also Remark 3.4.
The paper is structured as follows: in the next two sections, we will present our two main results, and then in the final section of the paper, we will discuss more generally our approach and how it leads to the introduction of a new family of Wasserstein-type distances.
2. Dobrushin's estimate revisited

2.1. The Vlasov equation. The Vlasov equation is a non-linear partial differential equation providing a statistical description for the collective behavior of large numbers of charged particles in mutual, long-range interaction. This model was first introduced by Jeans in the context of Newtonian stellar dynamics [41], and later by Vlasov in his work on plasma physics [61,62]. The unknown of the Vlasov equation f(t, x, v) is the distribution function of the system at time t, that is the number density of particles that are located at the position x and have instantaneous velocity v at time t. The Vlasov equation for the distribution function f reads as follows:
\[ \partial_t f + v \cdot \nabla_x f + F[f] \cdot \nabla_v f = 0, \]
where
\[ F[f](t,x) := -\iint \nabla K(x-y)\, f(t,y,w)\, dy\, dw. \]
In other words, the Vlasov equation for particle systems is a kinetic model where each particle is subject to the acceleration field F[f] created by all the other particles in the system. The Vlasov equation is a transport equation and, for a sufficiently regular force field, it can be described by the method of characteristics. The initial distribution f_0 is transported by a characteristic flow (X, V) generated by the mean-field force F[f]:
\[ \dot X = V, \qquad \dot V = F[f](t, X). \]
Since the vector field (v, F[f](t, x)) is divergence free, one has conservation of mass and of all L^p-norms. For an introduction to this topic we refer to the lecture notes [20].

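2.2. An improved Dobrushin's estimate. Consider the Vlasov equation with smooth kernel. More precisely,
\[ \partial_t f + v \cdot \nabla_x f + F[f] \cdot \nabla_v f = 0, \qquad F[f] := -\nabla K * \rho_f, \qquad \rho_f := \int_{\mathbb{R}^d} f \, dv, \tag{2.1} \]
where
\[ B := \|\nabla^2 K\|_{L^\infty} < +\infty. \]
As explained in the introduction, our goal is to provide a stability estimate for solutions that is optimal in the regime as B tends to zero. Here is our result:

Theorem 2.1. Let f_1, f_2 be two solutions of (2.1). Then
\[ W_1(f_1(t), f_2(t)) \le \min\Big\{ (1+t)\, e^{\frac{2}{3} B ((1+t)^3 - 1)},\; e^{(1+2B)t} \Big\}\, W_1(f_1(0), f_2(0)). \]

Note that, since ∇K is Lipschitz, the characteristic flow is well-defined thanks to Cauchy-Lipschitz theory (see [20, Chapter 2]).

To prove Theorem 2.1, we consider π_0 an optimal W_1-coupling between f_1(0) and f_2(0), and we define the quantity
\[ Q(t) := \int \Big( \big| (X_1 - tV_1) - (X_2 - tV_2) \big| + |V_1 - V_2| \Big)\, d\pi_0(x,v,y,w), \]
where (X_1, V_1) = (X_1, V_1)(t, x, v) and (X_2, V_2) = (X_2, V_2)(t, y, w) denote the characteristics associated to f_1 and f_2. Note that
\[ Q(0) = W_1(f_1(0), f_2(0)) \tag{2.2} \]
and, since |X_1 − X_2| ≤ |(X_1 − tV_1) − (X_2 − tV_2)| + t |V_1 − V_2|,
\[ \int \big( |X_1 - X_2| + |V_1 - V_2| \big)\, d\pi_0 \le (1+t)\, Q(t). \tag{2.3} \]
We now observe that d/dt (X_i − tV_i) = −t F[f_i](t, X_i) and d/dt V_i = F[f_i](t, X_i). Hence, since ∇K is B-Lipschitz, we can bound
\[ \frac{d}{dt} Q(t) \le (1+t) \Big( B \int |X_1 - X_2| \, d\pi_0 + T_1 \Big) \le (1+t) \Big( B (1+t)\, Q(t) + T_1 \Big), \qquad T_1 := \int \big| \nabla K * (\rho_{f_1} - \rho_{f_2}) \big|(t, X_2)\, d\pi_0, \]
where the second inequality follows from (2.3). For T_1, we note that
\[ T_1 \le \sup_{z} \big| \nabla K * (\rho_{f_1}(t) - \rho_{f_2}(t))(z) \big|. \]
Here, similarly to Dobrushin's argument, we use that W_1 admits the following dual formulation:
\[ W_1(\mu, \nu) = \sup \Big\{ \int \varphi \, d(\mu - \nu) \;:\; \varphi \text{ is 1-Lipschitz} \Big\}, \tag{2.4} \]
and therefore
\[ \sup_{z} \big| \nabla K * (\rho_{f_1}(t) - \rho_{f_2}(t))(z) \big| \le B\, W_1(\rho_{f_1}(t), \rho_{f_2}(t)). \]
Then, by the definition of W_1 (see Definition 1.1),
\[ W_1(\rho_{f_1}(t), \rho_{f_2}(t)) \le \int |X_1 - X_2| \, d\pi_0 \le (1+t)\, Q(t). \]
In conclusion, we proved that
\[ \frac{d}{dt} Q(t) \le 2B (1+t)^2\, Q(t). \]
Recalling (2.2) and (2.3), this yields
\[ W_1(f_1(t), f_2(t)) \le (1+t)\, Q(t) \le (1+t)\, e^{\frac{2}{3} B ((1+t)^3 - 1)}\, W_1(f_1(0), f_2(0)). \]
As noted in the introduction, this estimate is more powerful than the usual Dobrushin's estimate^1
\[ W_1(f_1(t), f_2(t)) \le e^{(1+2B)t}\, W_1(f_1(0), f_2(0)) \tag{2.5} \]
when B is small. On the other hand, for large times, the term (1 + t)^3 in our estimate provides a worse bound than (2.5). Hence, both bounds are helpful depending on the mutual sizes of B and t, and one can choose to apply whichever gives the stronger bound. In conclusion, one has
\[ W_1(f_1(t), f_2(t)) \le \min\Big\{ (1+t)\, e^{\frac{2}{3} B ((1+t)^3 - 1)},\; e^{(1+2B)t} \Big\}\, W_1(f_1(0), f_2(0)), \]
as desired.

^1 Dobrushin's argument is performed considering the so-called bounded-Lipschitz distance on probability measures, which is defined by duality against bounded Lipschitz functions. However, the same proof where one replaces the bounded-Lipschitz distance with the W_1 distance (which can be defined by duality against Lipschitz functions, as shown in (2.4)) provides this bound.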
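The role of the shift x − tv can be checked numerically in the free-transport case K ≡ 0. The sketch below is only an illustration under stated assumptions: the measures are equal-weight empirical measures on R × R (for which an optimal coupling can be taken to be a permutation, so a brute-force search suffices), the shifted cost |(x − tv) − (y − tw)| + |v − w| is our reading of the free-flow distance, and the names `w1_empirical` and `free_flow` as well as the sample points are hypothetical.

```python
from itertools import permutations


def w1_empirical(ps, qs, shift_t=0.0):
    """W_1-type distance between two equal-weight empirical measures on R x R.

    Points are (position, velocity) pairs. The cost between (x, v) and (y, w)
    is |(x - shift_t*v) - (y - shift_t*w)| + |v - w|; shift_t = 0 gives the
    usual W_1 cost. Brute force over permutations, so keep the samples small.
    """
    n = len(ps)

    def cost(p, q):
        (x, v), (y, w) = p, q
        return abs((x - shift_t * v) - (y - shift_t * w)) + abs(v - w)

    best = min(
        sum(cost(ps[i], qs[sigma[i]]) for i in range(n))
        for sigma in permutations(range(n))
    )
    return best / n


def free_flow(ps, t):
    """Free transport (K = 0): positions move with constant velocity."""
    return [(x + t * v, v) for x, v in ps]


# Hypothetical initial data, chosen only for illustration.
f1 = [(0.0, 1.0), (1.0, -0.5), (2.0, 0.3)]
f2 = [(0.1, 0.8), (1.2, -0.4), (1.9, 0.5)]
t = 3.0

w_initial = w1_empirical(f1, f2)
w_at_t = w1_empirical(free_flow(f1, t), free_flow(f2, t))
w_shifted = w1_empirical(free_flow(f1, t), free_flow(f2, t), shift_t=t)

if __name__ == "__main__":
    print("W_1 at time 0:          ", w_initial)
    print("W_1 at time t:          ", w_at_t)
    print("shifted distance at t:  ", w_shifted)
```

In this free-transport setting the shifted distance at time t coincides with W_1 at time 0, while the plain W_1 distance at time t is only controlled by (1 + t) times its initial value, which is the mechanism behind the factor (1 + t) in the estimate above.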
3. Stability estimates for Vlasov-Poisson and quasi-neutral limits

3.1. The Vlasov-Poisson system. The Vlasov-Poisson system is the classical kinetic model describing dilute, totally ionised, unmagnetized plasma. In its most common form, f is the distribution function of the electrons moving in a self-induced electrostatic field, while the ions are assumed to act as a fixed background. In this section, we consider the phase space to be T^d × R^d, for reasons that will be explained later.

The well-posedness theory of this system has been extensively studied, see, for example, the survey paper [32]. Global-in-time classical solutions have been constructed under various conditions on the initial data (see for example [4,6,47,53,57,59]), while global-in-time weak solutions were presented in [2] and [39] for L^p initial data (see also [3,5]). In this section, we will focus on an important contribution to the uniqueness theory made by Loeper [49], who proved uniqueness for solutions of (3.1) with bounded density by means of a strong-strong stability estimate in the Wasserstein distance.

3.2. Quasi-neutral limits. Since plasmas are excellent conductors of electricity, and any charges that develop are readily neutralized, they can be treated as being quasi-neutral. On the other hand, at small spatial and time scales, the quasi-neutrality is no longer verified. The distance over which quasi-neutrality may break down can be described in terms of the Debye length λ_D, and varies according to the physical characteristics of the plasma. The Debye length is usually much shorter than the typical observation scale L. Therefore, we can define the parameter ε := λ_D/L and consider the limit as ε tends to zero. This procedure is known as the quasi-neutral limit.
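To illustrate how small ε typically is, the following sketch computes the parameter ε = λ_D/L from the standard electron Debye length formula λ_D = (ε_0 k_B T / (n e^2))^{1/2}. This formula is not stated in the text above and is used here as an assumption about the physical model; the temperature, density and observation scale in the example are hypothetical values, and the function names are ours.

```python
import math

# Physical constants in SI units (CODATA values).
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
K_B = 1.380649e-23             # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19     # elementary charge, C


def debye_length(temperature_k: float, density_m3: float) -> float:
    """Electron Debye length (metres) for temperature in K and density in m^-3."""
    return math.sqrt(
        EPSILON_0 * K_B * temperature_k / (density_m3 * E_CHARGE ** 2)
    )


def quasineutral_parameter(temperature_k: float, density_m3: float,
                           length_m: float) -> float:
    """Dimensionless parameter eps = lambda_D / L."""
    return debye_length(temperature_k, density_m3) / length_m


if __name__ == "__main__":
    # Hypothetical plasma: T = 1e4 K, n = 1e18 m^-3, observed at the scale L = 1 m.
    print("Debye length (m):", debye_length(1e4, 1e18))
    print("epsilon:         ", quasineutral_parameter(1e4, 1e18, 1.0))
```

For these sample values the Debye length is a few micrometres, so ε is of order 10^{-5}, which is the regime in which the limit ε → 0 is relevant.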
When we take the Debye length into account, in appropriate dimensionless variables, the Vlasov-Poisson system becomes:
\[ (VP)_\varepsilon := \begin{cases} \partial_t f_\varepsilon + v \cdot \nabla_x f_\varepsilon + E_\varepsilon \cdot \nabla_v f_\varepsilon = 0, \\ E_\varepsilon = -\nabla_x U_\varepsilon, \\ -\varepsilon^2 \Delta_x U_\varepsilon = \rho_{f_\varepsilon} - 1, \end{cases} \tag{3.2} \]
and the energy of the rescaled system is the following:
\[ \mathcal{E}(f_\varepsilon(t)) := \frac{1}{2} \iint |v|^2 f_\varepsilon \, dx \, dv + \frac{\varepsilon^2}{2} \int |\nabla_x U_\varepsilon|^2 \, dx. \]
The quasi-neutral limit corresponds to a singular limit for the rescaled system (3.2), in which the formal limiting system is the Kinetic Incompressible Euler system:
\[ (KIE) := \begin{cases} \partial_t f + v \cdot \nabla_x f + E \cdot \nabla_v f = 0, \\ E = -\nabla_x U, \\ \rho_f = 1. \end{cases} \tag{3.3} \]
The force E = −∇_x U is defined implicitly through the incompressibility constraint ρ = 1, and may be thought of as a Lagrange multiplier associated to this constraint. In other words, electrons move under the effect of a gradient in such a way that their density remains equal to 1 everywhere. Thus (KIE) is a "kinetic" version of the incompressible Euler equations. As shown in [8], the potential U formally satisfies the Laplace equation
\[ -\Delta_x U = \sum_{i,j} \partial_{x_i} \partial_{x_j} \int v_i v_j f \, dv. \]

As discussed in [31], the justification of this limit is very delicate. In particular, it can fail even for smooth initial data. Still, a series of positive results are available. In particular, as shown in [34,35], the validity of the quasi-neutral limit for a large class of data can be obtained if one can prove some quantitative strong-strong stability at the level of the (VP)_ε system. Also, the stronger the stability estimate, the larger the class of initial data for which the quasi-neutral limit holds. In [34,35] the authors prove that the quasi-neutral limit holds for initial data that are an extremely small perturbation of an analytic function. Here, by introducing a suitable non-linear version of the Wasserstein distance, we can considerably improve those results.
Here is our main theorem, which provides us with a new W 2 stability estimate.We prove the result with a general parameter ε ≤ 1 as this is necessary for the study of the quasi-neutral limit.The reader interested in the Vlasov-Poisson case can simply apply our estimate with ε = 1.Theorem 3.1.Let ε ≤ 1, and let f 1 , f 2 be two weak solutions of the (V P ) ε system (3.2), and set

Define the function
and assume that A(t) ∈ L 1 ([0, T ]) for some T > 0. There exist a dimensional constant C d > 0 and a universal constant c 0 > 0 such that the following holds: if then The assumption (3.4) depends on the time interval [0, T ].If T is very small so that Of course this is not the relevant regime since the time interval is usually at least of size 1.In particular, since Therefore (3.4) corresponds to asking W 2 (f 1 (0), f 2 (0)) being bounded by e −Cε −2 .This requirement is very natural in this context, as also discussed in Remark 3.4.
As in [34,35], Theorem 3.1 yields the validity of the quasi-neutral limit for W 2 -perturbations of analytic data.However, our estimate is stronger with respect to the previous results and provides an almost optimal rate in the quasi-neutral limit.More broadly, we believe that our approach for proving Theorem 3.1 has its own interest and could be used in other settings.
To state our application to the quasi-neutral limit, we need to recall some notation introduced by Grenier [26] in one of the first mathematical works on this topic.In [26] the author relies on an interpretation of the plasma as a superposition of a -possibly uncountable-collection of fluids and he shows that the quasi-neutral limit holds when the sequence of initial data f 0,ε enjoys uniform analytic regularity with respect to the space variable.As explained in [34] (see the discussion after Definition 1.4), this decomposition is purely a technical tool and it does not impose any restriction on the initial datum.This result has been improved by Brenier [9], who gives a rigorous justification of the quasi-neutral limit in the so called "cold electron" case, i.e. when the initial distribution f 0,ε converges to a monokinetic profile where δ v denotes the Dirac measure in velocity, see also [9,50,25].
Let us define a suitable analytic norm, as in [26]: given δ > 0 and a function g : T d → R, we define where g(k) is the k-th Fourier coefficient of g.We define B δ as the space of functions g such that g B δ < +∞.
For all ε ∈ (0, 1), consider f ε (t) a global weak solution of (3.2) with initial condition f 0,ε , and define the filtered distribution function where (d ± ) are the correctors are defined as the solution of Then there exist T > 0, and g(t) a weak solution on [0, T ] of (3.3) with initial condition g 0 , such that Remark 3.4.Already in the one dimensional case, there is a negative result stating that an initial rate of convergence of the form W 2 (f 0,ε , g 0,ε ) ≤ ε k for some k > 0 is not sufficient to ensure the validity of the quasi-neutral limit for positive times.This is the consequence of instability mechanisms described in [27] and [33].Hence, our assumption on the size of W 2 (f 0,ε , g 0,ε ) considerably improves the results in [34,35], where a double exponential exp − exp(Kε −ζ ) was required.
Remark 3.5. In Corollary 3.3 we consider sequences of initial conditions with compact support in velocity (yet, we allow the support to grow polynomially as ε goes to zero). The reason is that we need L^∞ bounds on the density ρ_{f_ε}(t) = ∫ f_ε(t) dv, so a control on the support in velocity is needed. We have decided to impose these assumptions because they are the same as in [6] and so we can rely on some estimates proved in that paper. However, using the argument in [52] (see also [29]) one could relax the assumptions and require only a moment condition on f_{0,ε}. Providing this extension is not difficult, but it would require some work that would go beyond the main goal of this paper.
Before proving Theorem 3.1, we first show how it implies Corollary 3.3.
Proof of Corollary 3.3.Let g ε (t) denote the solution of (3.2) starting from g 0,ε .As shown in [34,Section 4], under the assumptions in the statement, the following bounds hold: 1) , Then To prove Theorem 3.1, we consider π 0 an optimal W 2 -coupling between f 1 (0) and f 2 (0), and we define the quantity Q(t) defined as the unique constant (assuming it exists) such that In other words, we are considering a quantity of the form with λ(t) depending on time, and we are assuming that actually λ(t) is a function of | is specific to this problem: the logarithm will help to compensate for the log-Lipschitz regularity of the electric fields, while ε −2 is the natural scaling in the current setting.Note that a priori is not clear that Q(t) is well-defined.This will be proved in Lemma 3.7 below.However, assuming for now that Q(t) is well-defined, we show how this quantity allows us to prove the result.We have By Cauchy-Schwartz inequality and recalling the definition of Q(t) we have: Adding and subtracting −E 2 (t, X 1 ) we obtain: where Thanks to Lemma 3.6 and by the very same argument in [34] we can bound T 1 and T 2 as follows: where we have | and we substitute this expression in the derivative of Q(t).
Notice that in this estimate we are interested in small values of Q(t) and in particular, as we will show below, we will always be in the regime ε 2 Q(t)/| log(Q(t))| ∈ (0, 1/e).Therefore we have so by equation (3.6) we have We now consider two cases, depending on the sign of Q ′ (t).If Q ′ (t) ≤ 0, then we do not do anything.If instead Q ′ (t) > 0, then the first term in the right-hand side above is negative, and therefore Since the right-hand side above is nonnegative, independently of the sign of Q ′ (t) we know that the bound above holds.We now observe that as long as Thus, where C d is a dimensional constant.Note that the two conditions Q(t) ≤ ε and | log(Q(t))| ≥ 1 are guaranteed if Q(t) ≤ ε e (recall that ε ≤ 1 by assumption).Hence, provided that we are in the regime Q(s) ≤ ε e on [0, t], this implies We observe that the bound (3.7) guarantees that sup In particular, (3.7) holds if We now compare the quantity Q to the Wasserstein distance.First of all, since ≤ Q(t). 4 We recall that T d ρ(x, t)dx = 1, that implies ρ(•, t) L ∞ (T d ) ≥ 1. Therefore A(t) ≥ 1.
On the other hand, since ε −2 | log(Q(0))| ≥ 1 and π 0 is an optimal plan, or equivalently We now observe that, near the origin, the inverse of the function s → s | log s| behaves like τ → τ | log τ |.In particular, there exists a universal small constant c 0 > 0 such that Combining these bounds with (3.7), and recalling (3.8), this implies Finally, to complete the proof, we show the following: Lemma 3.7.With the notation and assumptions of the theorem, the quantity Q(t) is well defined and it is locally Lipschitz continuous where Q(t) > 0. In particular it is differentiable a.e. Proof.Set We can assume that D(t) and E(t) are nonzero, otherwise we are in the "degenerate" situation where f 1 ≡ f 2 , in which case Q(t) is trivially 0. Also, since D(t) and E(t) are written in terms of the characteristic flow, it is standard to check that they are differentiable. 5e note that the quantity Q(t) is implicitly defined via the relation or equivalently, for each fixed time t, Q(t) is the solution of the equation F (q, D(t), E(t)) = 0 with F (q, r, s) := q + ε −2 log q r − s for q ∈ (0, 1).
Since the function q → q + ε −2 log q D(t) is strictly increasing on (0, 1) and its image covers the interval (0, 1), we deduce that the equation above has a unique solution provided E(t) < 1.Hence, this proves that Q(t) ∈ (0, 1) is well defined provided E(t) < 1.In addition, thanks to the implicit function theorem applied to the function F ∈ C 1 loc ((0, 1) × R × R), we deduce the existence of a C 1 loc function G such that Q(t) = G(D(t), E(t)).Now, differentiating the relation (3.9) with respect to t we obtain Hence, since D and E are uniformly bounded and Lipschitz, for any δ > 0 we deduce that This proves that, for any δ > 0, the function t → Q(t) is uniformly Lipschitz continuous inside the set {Q(t) > δ}.This proves that Q(t) is locally Lipschitz continuous inside the region {Q(t) > 0}.So, to conclude the proof, we need to ensure that E(t) < 1.Note that, since by assumption , by continuity we have that E(t) < 1 for t > 0 small.So Q(t) is well defined for t > 0 small.Also, as long as Q(t) is well defined, we have that Q(t) ≥ E(t).Hence, as long as Q(t) is well defined, we have that Since, by our smallness assumption on W 2 (f 1 (0), f 2 (0)), the right hand side above remain small on [0, T ], the bound above guarantees that E(t) ≪ 1 for all t ∈ [0, T ].This proves that Q(t) is well-defined on [0, T ], which concludes the proof.
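The implicit definition of Q(t) used in Lemma 3.7 can be made concrete numerically. The sketch below solves F(q, r, s) = q + ε^{-2} log(q) r − s = 0 for q ∈ (0, 1) by bisection, relying only on the monotonicity of q ↦ q + ε^{-2} log(q) r noted in the proof. The solver, its name `solve_q` and the sample values of D, E and ε are illustrative assumptions, not part of the original argument.

```python
import math


def solve_q(d: float, e: float, eps: float, iterations: int = 200) -> float:
    """Solve q + eps**-2 * log(q) * d - e = 0 for q in (0, 1) by bisection.

    Assumes d > 0, 0 < e < 1 and 0 < eps <= 1. On (0, 1) the left-hand side
    is strictly increasing in q, tends to -infinity as q -> 0+ and equals
    1 - e > 0 at q = 1, so the root exists and is unique.
    """
    if not (d > 0.0 and 0.0 < e < 1.0 and 0.0 < eps <= 1.0):
        raise ValueError("need d > 0, 0 < e < 1 and 0 < eps <= 1")

    def residual(q: float) -> float:
        return q + math.log(q) * d / eps ** 2 - e

    lo, hi = 1e-300, 1.0  # residual(lo) < 0 < residual(hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical values of D(t), E(t) and eps, chosen only for illustration.
    d_val, e_val, eps_val = 1e-6, 1e-4, 0.5
    q_val = solve_q(d_val, e_val, eps_val)
    print("Q =", q_val)
    print("eps^-2 |log Q| D + E =", abs(math.log(q_val)) * d_val / eps_val ** 2 + e_val)
```

Because log q < 0 on (0, 1), the computed root always satisfies Q ≥ E, in agreement with the inequality used at the end of the proof.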
Remark 3.8.In the previous proof we considered λ(t) = ε −2 | log(Q(t))| and in Lemma 3.7 we proved that Q(t) is well-defined provided it is small enough.This restriction is due to the fact that the function With this definition, the proof of Lemma 3.7 shows that Q(t) is always well defined (without any restriction on the size of E(t)), and it is locally Lipschitz continuous where Q(t) > 0.
Since in our setting we are interested in the case Q(t) ≪ 1, there is no advantage in using this latter definition of λ.However this observation could be useful in other situations, see also Section 4 below.

4. Summary, generalizations, and perspectives
As we have seen in the last two sections, suitably modifying Wasserstein distances can be particularly useful in a kinetic setting to take advantage of the asymmetry between x and v. More precisely, let X = T^d or X = R^d, and let µ and ν be two probability measures on X × R^d. Also, let Π(µ, ν) denote the collection of all measures on (X × R^d)^2 with marginals µ and ν on the first and second factors respectively.
The first natural generalization, given p ≥ 1 and λ ∈ R_+, is to consider
\[ W_{\lambda,p}(\mu,\nu) := \Big( \inf_{\pi \in \Pi(\mu,\nu)} \int_{(X \times \mathbb{R}^d)^2} \big( \lambda\, |x-y|^p + |v-w|^p \big)\, d\pi(x,v,y,w) \Big)^{1/p}. \]

(i) First, we considered the nonlinear version of W_{λ,p}(µ, ν) by choosing λ depending on the distance itself. We defined this along a flow, but it can also be defined in a general setting as follows: given p ≥ 1 and a decreasing function Φ : R_+ → R_+, for every π ∈ Π(µ, ν) we define D_p(π, Φ) as the unique number s such that
\[ s - \Phi(s) \int |x-y|^p \, d\pi = \int |v-w|^p \, d\pi. \]
This definition with Φ(s) = ε^{-2} |log s| for s ∈ (0, 1/e) and p = 2 essentially corresponds to the quantity used in the proof of Theorem 3.1, although there we considered the quantity D(t) where we did not take the infimum over couplings π ∈ Π(µ, ν), since it was not needed for our purpose.

(ii) In a different direction, we modified the W_1 distance by introducing a shift in position. Note that this second quantity cannot be defined as a "static" distance since the shift x − tv depends on the time t. Hence, one can generalize it only as a time-dependent quantity, with the cost evaluated at the shifted positions x − tv and y − tw.

Of course, these approaches can be further combined by mixing the different quantities defined above. Note that there is no universal "best" choice, and each problem requires its own adaptation. Still, we believe, as this paper shows, that this approach can lead to improvements of several existing results, as well as to new estimates. In addition, the approach is very general and can be useful in any situation where there is an asymmetry between the variables involved.
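As a small illustration of the anisotropic distance W_{λ,p}, the sketch below evaluates it between two equal-weight empirical measures on R × R by brute force over permutations. The cost λ|x − y|^p + |v − w|^p is our reading of the definition (an assumption), and the function name `w_lambda_p` and the sample points are hypothetical.

```python
from itertools import permutations


def w_lambda_p(ps, qs, lam=1.0, p=2):
    """Anisotropic Wasserstein-type distance between equal-weight empirical
    measures on R x R, with cost lam*|x - y|**p + |v - w|**p (assumed form).

    For equal-weight samples of the same size an optimal coupling can be
    taken to be a permutation, so a brute-force search is enough for small n.
    """
    n = len(ps)

    def cost(a, b):
        (x, v), (y, w) = a, b
        return lam * abs(x - y) ** p + abs(v - w) ** p

    best = min(
        sum(cost(ps[i], qs[sigma[i]]) for i in range(n))
        for sigma in permutations(range(n))
    )
    return (best / n) ** (1.0 / p)


# Hypothetical samples of (position, velocity) pairs.
mu = [(0.0, 1.0), (1.0, -0.5), (2.0, 0.3)]
nu = [(0.1, 0.8), (1.2, -0.4), (1.9, 0.5)]

if __name__ == "__main__":
    for lam in (0.0, 0.5, 1.0, 4.0):
        print(lam, w_lambda_p(mu, nu, lam=lam, p=2))
```

The value is nondecreasing in λ, with λ = 1 recovering the usual cost and λ = 0 measuring only the discrepancy in velocity; choosing λ large penalizes displacements in x, in the spirit of the anisotropic metrics discussed in the introduction.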
To mention some concrete applications, our ideas could also be applied in the setting of quantum systems by suitably modifying the quantum Wasserstein distances introduced in [21,24].Also, our new Loeper-type estimate may be helpful to obtain stability estimates in W 2 when the density belongs to some suitable Orlicz spaces, in analogy to [38] where stability estimates have been proved for W 1 .