Annealed quantitative estimates for the quadratic 2D-discrete random matching problem

We study a random matching problem on closed compact $2$-dimensional Riemannian manifolds (with respect to the squared Riemannian distance), with samples of random points whose common law is absolutely continuous with respect to the volume measure, with strictly positive and bounded density. We show that, given two numbers $n$ and $m=m(n)$ of points which are asymptotically equivalent as $n$ goes to infinity, the optimal transport plan between the two empirical measures $\mu^n$ and $\nu^{m}$ is quantitatively well-approximated by $\big(\mathrm{Id},\exp(\nabla h^{n})\big)_\#\mu^n$, where $h^{n}$ solves a linear elliptic PDE obtained by a regularized first-order linearization of the Monge-Amp\`ere equation. This is obtained both for samples of correlated random points whose $\alpha$-mixing coefficient decays stretched-exponentially and for a class of discrete-time Markov chains having a unique invariant measure which is absolutely continuous with respect to the volume measure.

1.1. The random matching problem and its asymptotics. The random matching problem is a popular optimization problem at the interface between analysis and probability, with applications in many different fields such as statistical physics [16,44], computer science [40] and economics [20,24]. Within the mathematical literature, it has been the subject of intense study due to its interactions with many areas, including for instance graph theory [38] and geometric probability [50]. In this paper we focus on one of its simplest versions. Let $\{X_k\}_{1\le k\le n}$ and $\{Y_k\}_{1\le k\le m}$ (with possibly $m>n$) be two families of random points on a compact Riemannian manifold $M$ (endowed with the Riemannian distance $d$). We are interested in the quadratic matching problem
(1.1) $\min_{\pi\in\Pi_{nm}} \sum_{k=1}^{n}\sum_{l=1}^{m} \pi_{kl}\, d^2(X_k,Y_l)$,
where $\Pi_{nm}$ denotes the polytope of matrices with non-negative entries, row sums $\frac1n$ and column sums $\frac1m$. Classically, (1.1) can be phrased in terms of a transport problem. Indeed, letting
(1.2) $\mu^n := \frac1n\sum_{k=1}^{n}\delta_{X_k}$ and $\nu^m := \frac1m\sum_{k=1}^{m}\delta_{Y_k}$
be the empirical measures associated to the two point clouds, the linear programming problem (1.1) amounts to determining the quadratic Wasserstein distance $W_2^2(\mu^n,\nu^m)$. In the special case $n=m$, the Birkhoff-von Neumann Theorem provides a correspondence between (1.1) and the usual bipartite matching
(1.3) $\min_{\sigma\in S_n} \frac1n\sum_{k=1}^{n} d^2(X_k,Y_{\sigma(k)})$,
where $S_n$ denotes the set of injective maps $\sigma:\{1,\dots,n\}\to\{1,\dots,n\}$. Indeed, since $\Pi_{nn}$ is a convex polytope, minimizers in (1.1) have to be sought among extremal points. By the Birkhoff-von Neumann Theorem [8, Lemma 2.1.3], the latter are nothing but permutation matrices (up to a factor $\frac1n$).
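The correspondence between the linear program (1.1) and the bipartite matching (1.3) can be made concrete numerically. The following is a minimal sketch (sample size and seed are our own illustrative choices) that solves (1.3) by brute force over permutations, which is feasible only for very small $n$; on the flat square the Riemannian distance is the Euclidean one.

```python
import itertools
import math
import random

def matching_cost(X, Y):
    """Brute-force minimizer of (1.3): min over permutations sigma of
    (1/n) sum_k |X_k - Y_{sigma(k)}|^2, with Euclidean cost on [0,1]^2."""
    n = len(X)
    best = math.inf
    for sigma in itertools.permutations(range(n)):
        c = sum((X[k][0] - Y[s][0]) ** 2 + (X[k][1] - Y[s][1]) ** 2
                for k, s in enumerate(sigma)) / n
        best = min(best, c)
    return best  # equals W_2^2(mu^n, nu^n) since n = m

random.seed(0)
n = 7
X = [(random.random(), random.random()) for _ in range(n)]
Y = [(random.random(), random.random()) for _ in range(n)]
cost = matching_cost(X, Y)
print(cost)
```

For larger clouds one would replace the factorial search by the Hungarian algorithm ($O(n^3)$); the point here is only to illustrate that the optimal plan concentrates on a permutation, as the Birkhoff-von Neumann Theorem predicts.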
A first natural question is to understand the asymptotics of (1.1) as $n,m\uparrow\infty$. For the same number of samples $n=m$, independently and identically distributed (i.i.d.) on the unit square $[0,1]^d$, the scaling of the cost (1.1) has been well understood in the mathematical and statistical physics literature. A simple heuristic argument, see for instance [44], suggests that given a point $X_i$, we can find a point $Y_j$ within a volume of order $O(n^{-1})$ with high probability.
For this reason, the typical inter-point distance is of order $O(n^{-\frac1d})$, suggesting that the scaling of (1.1) is of order $O(n^{-\frac2d})$. Although attractive, this heuristic unfortunately turns out to be false in low dimension, exhibiting a critical behavior when $d=2$. This critical case is the one on which we focus in this paper. Ajtai, Komlós and Tusnády in [1] were the first to show that, for i.i.d. uniform samples, a logarithmic correction is needed, deriving
(1.4) $\mathbb{E}\big[W_2^2(\mu^n,\nu^n)\big] \asymp \frac{\log n}{n}$,
extended later by Talagrand in [53] to clouds of i.i.d. points distributed according to more general common laws. A recent breakthrough was obtained within the physics community by Caracciolo, Lucibello, Parisi and Sicuro in [16], further developed by Caracciolo and Sicuro in [17] and by Sicuro in [49], where the asymptotics of the cost are formally derived thanks to a novel PDE approach and optimal transport theory rather than combinatorics. A couple of years later, on general $2$-dimensional compact Riemannian manifolds without boundary, the first-order asymptotics were rigorously justified by Ambrosio, Stra and Trevisan in [6] for i.i.d. uniform samples and recently extended by Ambrosio, Goldman and Trevisan in [5] to samples distributed according to more general laws which are absolutely continuous (with Hölder continuous density) w.r.t. the volume measure $\mathrm{d}m$, leading to
(1.5) $\lim_{n\uparrow\infty} \frac{n}{\log n}\,\mathbb{E}\big[W_2^2(\mu^n,\nu^n)\big] = \frac{|M|}{2\pi}$,
where $|M|$ denotes the Lebesgue measure of $M$. The case $n\neq m$ with $n,m\uparrow\infty$ at similar rates is also covered, see [5, Theorem 1.2].
The novel approach introduced in [16], later revised in [11], consists in a linearization of the Monge-Ampère equation that allows for an explicit description of the cost thanks to the linearized proxies (see Section 1.2 for more details). The aim of this work is to quantitatively justify the linearization ansatz in terms of convergence of the approximating minimizers of (1.1) towards the optimal ones. In particular, we are interested in the case where the points are identically distributed with a common law $\rho\,\mathrm{d}m$ (where we recall that $\mathrm{d}m$ denotes the volume measure) and $\rho$ satisfies, for some $\lambda,\Lambda>0$,
(1.6) $\lambda \le \rho \le \Lambda$.
To the best of our knowledge, there are only a few results on the asymptotic behavior of the transport map, and they are so far limited to the case of i.i.d. uniform samples in the study of the semi-discrete matching problem (that is, couplings between $\mu^n$ and $\mathrm{d}m$), see the work of Ambrosio, Glaudo and Trevisan in [4]. In connection with this work, quantitative estimates on the optimal map for the matching between the Lebesgue measure and Poisson clouds have been obtained by Goldman, Huesmann and Otto in [29] and Goldman and Huesmann in [28].
Our extension in this paper is fourfold: First, we look at more general distributions of points and consider general densities $\rho$ satisfying (1.6). Second, we do not assume independence and consider samples which may possess correlations. Third, we do not restrict the analysis to the semi-discrete matching problem and also investigate the ansatz for the full matching problem (1.1). Finally, we investigate the case where the points are not identically distributed and extend our result to a class of Harris positive recurrent Markov chains.
We finally mention that the effectiveness of the linearization ansatz introduced in [16] is not limited to the case of i.i.d. points on bounded domains; it can be employed in many different settings. See for instance [15] for an interesting application to matching on unbounded domains, [35,36,37] for applications to Gaussian matching, [33] for an application in stochastic matrix theory, and [58,59,60,32] for applications to a continuous instance of the matching problem, i.e. when the empirical measure is replaced by the occupation measure of a stochastic process. It is worth further mentioning that these techniques can also be employed when considering the matching problem with $p$-costs in higher dimension, see [30].
1.2. Linearization ansatz. We now briefly reproduce the linearization ansatz introduced in [16]. For simplicity, we consider the case $M=\mathbb{T}^2$, $n=m$ and i.i.d. samples with common distribution $\rho\,\mathrm{d}m$. Let $T^n$ be an optimal transport map (whose existence is ensured by Brenier's Theorem [14]) between $\mu^n$ and $\nu^n$. Based on the transport relation $T^n_\#\mu^n=\nu^n$ and a change of variables, $T^n$ solves (formally) the Monge-Ampère equation (1.7). Since the cost is quadratic, by [48, Theorem 1.25], there exists a function $h^n$ such that $T^n=\mathrm{Id}+\nabla h^n$. Applying the law of large numbers, we further have the weak convergences $\mu^n,\nu^n\rightharpoonup\rho\,\mathrm{d}m$ as $n\uparrow\infty$, so that we expect $T^n\approx\mathrm{Id}$ as $n\uparrow\infty$. This suggests that the correction $\nabla h^n$ is small as $n\uparrow\infty$, allowing to perform (formally) the Taylor expansions (1.8). Plugging (1.8) into (1.7), neglecting the higher-order terms and replacing $\mu^n$ by $\rho$ yields
(1.9) $-\nabla\cdot(\rho\nabla h^n) = \mu^n-\nu^n$.
This formal linearization suggests the two conjectures (1.10). Unfortunately, (1.10) cannot hold as it is, since the solution of (1.9) does not belong to $H^1$ due to the roughness of the source term. To overcome this, following the strategy in [6], a regularization using the heat semigroup at time $\sim\frac1n$ is made. Doing so, the first item of (1.10) turns out to be true, leading to the result (1.5) (see for instance [3] for a convergence rate).
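The linearized equation (1.9) can be solved spectrally on the flat torus. The following sketch (our own discretization choices, assuming NumPy is available) solves the heat-regularized problem $-\Delta h = P_t(\mu^n-\nu^n)$ with $\rho=1$ on a periodic grid, multiplying each Fourier mode by $e^{-t|k|^2}$ to implement the regularization at time $t\sim\frac1n$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 64, 500                 # grid size and sample size (illustrative)
t = 1.0 / n                    # heat regularization at time ~ 1/n
X, Y = rng.random((n, 2)), rng.random((n, 2))

def density(P):
    # Histogram proxy for the empirical measure, normalized to mean 1.
    H, _, _ = np.histogram2d(P[:, 0], P[:, 1], bins=N, range=[[0, 1], [0, 1]])
    return H * N * N / n

f = density(X) - density(Y)                     # source mu^n - nu^n
k = 2 * np.pi * np.fft.fftfreq(N, d=1.0 / N)    # Fourier frequencies on T^2
k2 = k[:, None] ** 2 + k[None, :] ** 2
fh = np.fft.fft2(f) * np.exp(-t * k2)           # heat semigroup P_t
hh = np.divide(fh, k2, out=np.zeros_like(fh), where=k2 > 0)  # invert -Delta
h = np.real(np.fft.ifft2(hh))                   # regularized h^{n,t}
print(float(np.abs(h).max()))
```

The gradient of `h` is then the (regularized) displacement field of the ansatz $T^n \approx \mathrm{Id} + \nabla h^n$.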
1.3. Formulation of the main results. For the remainder of the paper, $M$ denotes a $2$-dimensional connected and compact Riemannian manifold without boundary (or the square $[0,1]^2$) endowed with the Riemannian distance $d$. For $t>0$ we denote by $p_t$ the fundamental solution of the heat operator $\partial_t-\Delta$ on $M$, where $\Delta$ denotes the Laplace-Beltrami operator. We define the heat semigroup $(P_t)_{t>0}$ via its action on probability measures $\mu\in\mathcal{P}(M)$ and square integrable functions $f\in L^2(M)$:
$P_t\mu := \int_M p_t(\cdot,y)\,\mathrm{d}\mu(y)$ and $P_t f := \int_M p_t(\cdot,y)f(y)\,\mathrm{d}m(y)$.
We first introduce the class of correlated point clouds that we consider for studying the matching problem (1.1). This class concerns point clouds $\{X_i\}_i$ for which the correlations between points decay at an exponential rate, where the correlations are measured in terms of the $\alpha$-mixing coefficients defined in (1.11) and the $\beta$-mixing coefficients defined in (1.12).
Assumption 1.1 (Correlated point clouds). We consider point clouds $\{X_i\}_i\subset M$ which are identically distributed according to $\rho\,\mathrm{d}m$, where $\rho$ satisfies (1.6). We further assume decay of the correlations in the form of (1.13), and that there exist $a,b>0$ and $\eta\in(0,\infty]$ such that (1.14) holds.
Our first main result concerns the approximation of transport plans coupling $\{\mu^n\}_n$ and $\{\nu^m\}_m$ defined in (1.2). We justify the formal linearization of the Monge-Ampère equation performed in Section 1.2 in an annealed quantitative way (i.e. in expectation): We show that (a suitable regularization of) the plan $\big(\mathrm{Id},\exp(\nabla h^n)\big)_\#\mu^n$, with $h^n$ defined in (1.9), provides a good approximation when measuring the error with respect to the $W_2$-Wasserstein distance in the product space $M\times M$ endowed with the metric (1.15). The density $\rho$ will need further regularity in the form of fractional Sobolev spaces, defined for some $\varepsilon>0$ as in (1.16), where $\{\lambda_k,\varphi_k\}_k$ denote the eigenvalues and eigenfunctions of $-\Delta$ on $M$ and $\hat f(k)=\int_M f\varphi_k$ denotes the Fourier modes of $f$. Finally, we denote by $\dot H^1$ the $L^2$-based homogeneous Sobolev space (1.17).
Theorem 1.2 (Approximation of the transport plan). Let $\rho\in H^\varepsilon$ for some $\varepsilon>0$ satisfy (1.6), and let $\{\mu^n\}_n$ and $\{\nu^m\}_m$ be defined in (1.2) (for $m=m(n)$ with some given increasing map $m:\mathbb{N}\to\mathbb{N}$) with point clouds satisfying Assumption 1.1 and such that there exists $q\in[1,\infty)$ for which $\frac{m(n)}{n}\to q$ as $n\uparrow\infty$. We consider $h^{n,t}\in\dot H^1$ the weak solution of
(1.18) $-\nabla\cdot(\rho\nabla h^{n,t}) = \mu^{n,t}-\nu^{m,t}$
for any $t\in(0,1)$, with $\mu^{n,t}:=P_t\mu^n$ and $\nu^{m,t}:=P_t\nu^m$.
There exist an exponent $\kappa>0$, a deterministic constant $C$ and a random variable $\mathcal{C}_n$, both depending on $\lambda$, $\Lambda$ and $M$, for which, given $t=\frac{\log^\kappa(n)}{n}$, the estimate (1.20) holds, where the infimum runs over all optimal transport plans $\pi$ between $\mu^n$ and $\nu^m$. Furthermore, if (1.14) holds with $\eta>2$, the assumption (1.13) can be dropped and (1.21) holds, where again the infimum runs over all optimal transport plans $\pi$ between $\mu^n$ and $\nu^m$.
Our second main result concerns the particular case of the semi-discrete matching problem, i.e. the optimal coupling between the common law $\rho\,\mathrm{d}m$ and $\{\mu^n\}_n$. We know from McCann's theorem [39] that there exists a unique optimal transport map $T^n$, that is, the optimal transport plan $\pi^n$ can be written as $\pi^n=(\mathrm{Id},T^n)_\#(\rho\,\mathrm{d}m)$. We show that $T^n$ can be approximated in an annealed quantitative way in $L^2$ by (a suitable regularized version of) the solution of (1.9) with $\nu^m$ replaced by $\rho$.
Theorem 1.3. Let $\rho\in H^\varepsilon$ for some $\varepsilon>0$ satisfy (1.6) and let $\{\mu^n\}_n$ be defined in (1.2) with a point cloud satisfying Assumption 1.1. We consider $f^{n,t}\in\dot H^1$ the weak solution of $-\nabla\cdot(\rho\nabla f^{n,t})=\mu^{n,t}-\rho_t$, where, for all $t\in(0,1)$, we recall that $\mu^{n,t}=P_t\mu^n$ and $\rho_t=P_t\rho$. Finally, we denote by $T^n$ the optimal transport map from $\rho\,\mathrm{d}m$ to $\mu^n$.
There exist an exponent $\kappa>0$, a deterministic constant $C$ and a random variable $\mathcal{C}_n$, both depending on $\lambda$, $\Lambda$ and $M$, for which, given $t=\frac{\log^\kappa(n)}{n}$, the estimate (1.23) holds (where we impose additional Neumann boundary conditions in the case $M=[0,1]^2$). Furthermore, if (1.14) holds with $\eta>2$, the assumption (1.13) can be dropped and (1.24) holds. We finally mention that in the case where the eigenfunctions $\{\varphi_k\}_k$ admit a uniform bound, the conclusions (1.21) and (1.24) can be improved. We comment on the proof at the end of Subsection 3.5.
Remark 1.4. Let $\{\mu^n\}_n$ and $\{\nu^m\}_m$ be as in Theorem 1.2. We assume that the family of eigenfunctions $\{\varphi_k\}_k$ satisfies the uniform bound
(1.25) $\sup_k \|\varphi_k\|_{L^\infty} < \infty$.
Then, (1.21) and (1.24) hold true for $\eta>1$ with a convergence rate $\log(n)$ and the same stochastic integrability. Note that (1.25) typically holds when the geometry of $M$ is flat, see [54].
Theorem 1.2 and Theorem 1.3 are not restricted to the case of identically distributed point clouds, and we present in the next section and in Appendix C a possible extension, using the same techniques, to a class of regular discrete-time Markov chains.
1.4. Extension to a class of Markov chains. We first recall some basic facts on discrete-time Markov chains on $M$. Such a Markov process is described by its initial distribution $\mu_0\in\mathcal{P}(M)$ and its transition kernel $K$, that is, a measurable map from $M$ to the space of probability measures $\mathcal{P}(M)$. We recall that $K$ acts on $\mathcal{P}(M)$ via $(K\mu)(A):=\int_M K(x)(A)\,\mathrm{d}\mu(x)$ for Borel sets $A\subset M$. Given an initial distribution $\mu_0\in\mathcal{P}(M)$, we recall that the law of the Markov chain $\{X_n\}_{n\ge0}$ can be expressed with the help of the transition kernel, namely
(1.28) $X_n\sim K^n\mu_0$ for any $n\ge 0$,
where $K^n$ stands for the $n$-th iterate of the kernel $K$.
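The iteration (1.28) is easy to visualize in a toy discrete-state analogue (the 3-state kernel below is an arbitrary illustrative choice of ours): iterating $\mu_n = K^n\mu_0$ drives any initial law towards the unique invariant measure.

```python
# Row-stochastic transition kernel: K[i][j] = probability of jumping i -> j.
K = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

def apply_kernel(mu, K):
    # Action of K on measures: (K mu)(j) = sum_i mu(i) K(i, j).
    return [sum(mu[i] * K[i][j] for i in range(len(mu)))
            for j in range(len(K[0]))]

mu = [1.0, 0.0, 0.0]          # mu_0: Dirac mass at the first state
for _ in range(60):           # X_n ~ K^n mu_0
    mu = apply_kernel(mu, K)
print(mu)                     # numerically close to the invariant measure
```

After enough iterations `mu` is (up to floating-point error) a fixed point of the kernel, the discrete counterpart of $K\mu_\infty = \mu_\infty$ in Assumption 1.5.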
We now introduce the class of discrete-time Markov chains we consider.
Assumption 1.5. Let $M$ be a $2$-dimensional compact Riemannian manifold. Let $\mu_0\in\mathcal{P}(M)$ and let $\{X_n\}_{n\ge1}\subset M$ be a Markov chain with initial law $\mu_0$. We assume that the chain satisfies the two following conditions: (i) There exist a measurable function $k:M\times M\to[0,1]$ and $\lambda,\Lambda>0$ such that (1.29) holds for any Borel set $A\subset M$. (ii) There exist a unique invariant measure $\mu_\infty$, i.e. $K\mu_\infty=\mu_\infty$, a function $V:M\to[0,\infty)$ with $\int V\,\mathrm{d}\mu_0<\infty$, and $\sigma\in(0,1)$ for which (1.30) holds.
We now comment on the consequences of the above assumptions. First, the condition (1.29) ensures that the invariant measure $\mu_\infty$ is absolutely continuous with respect to the volume measure with bounded density, namely (1.31). We briefly give the argument for (1.31). Using the second item of (1.29), the kernel induces an integral operator $I$ on $L^1$. Furthermore, we note that the closed convex set $C:=\{f\in L^1\,|\,\lambda\le f\le\Lambda \text{ and } \int_M f=1\}$ is invariant under the action of $I$. Therefore, Schauder's fixed point theorem implies that $I$ admits a fixed point in $C$. Given such a fixed point $\rho$, it is clear that $\mu_\infty$ defined in (1.31) is an invariant measure according to (1.29).
Second, we note that by Harris' theorem [31, Theorem 3.7], the condition (1.30) is satisfied when the kernel satisfies the following geometric drift condition: there exist a function $V:M\to[0,\infty)$ and constants $\tau\in(0,1)$ and $C\ge0$ such that $V$ is a Lyapunov function for $K$, and the kernel $K$ is bounded on the sublevel sets of $V$, i.e. for all $R\ge0$ there exists a constant $0<a<1$ controlling $K$ on $\{V\le R\}$. Moreover, the assumption (1.30) implies the geometric decay of the $\beta$-mixing coefficient (1.12), with a constant $C$ depending on $\lambda$, $\Lambda$ and $m(M)$. Finally, the condition (1.30) quantifies the weak convergence of the law of the Markov chain to its stationary distribution, namely there exists a constant $C>0$ such that, for any $f\in L^\infty(M)$, the averages $\int f\,\mathrm{d}(K^n\mu_0)$ converge geometrically to $\int f\,\mathrm{d}\mu_\infty$. We briefly give the argument. A direct inductive argument, together with the semigroup property $K^{n_1+n_2}=K^{n_1}K^{n_2}$ for every $n_1,n_2>0$ and Fubini's theorem, gives (1.34). The combination of (1.28), (1.34) and (1.30) then yields the claim.
A classical example of a Markov chain satisfying Assumption 1.5 is given by iterated function systems with additive noise. For simplicity, let $M=\mathbb{T}^2$. Let $\{\theta_n\}_{n\ge1}$ be i.i.d. random variables with common law $h\,\mathrm{d}m$ for some $h$ satisfying $\lambda\le h\le\Lambda$. Let $F:\mathbb{T}^2\to\mathbb{T}^2$ be a contraction, i.e. there exists a constant $L<1$ such that (1.35) holds. We define the iterated function system $\{X_n\}_{n\ge1}$ according to the induction $X_{n+1}=F(X_n)+\theta_{n+1}$, whose kernel is $K(x)=h(\cdot-F(x))\,\mathrm{d}m$. Moreover, the condition (1.35) ensures the validity of (1.30), see for instance [2, Theorem 3.2]. Thus the Markov process $\{X_n\}_{n\ge1}$ satisfies Assumption 1.5.
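A minimal simulation of this example, with the illustrative choices $F(x)=x/2$ (so the contraction constant in (1.35) is $L=\frac12$) and uniform noise on $[0,1]^2$ (so $h\equiv1$ satisfies $\lambda\le h\le\Lambda$):

```python
import random

random.seed(0)

def step(x):
    # One step of the iterated function system X_{n+1} = F(X_n) + theta (mod 1)
    # on T^2, with contraction F(x) = x/2 and theta uniform on [0,1]^2.
    return tuple((xi / 2.0 + random.random()) % 1.0 for xi in x)

x = (0.0, 0.0)
samples = []
for _ in range(2000):
    x = step(x)
    samples.append(x)
mean = [sum(p[i] for p in samples) / len(samples) for i in (0, 1)]
print(mean)
```

With uniform noise the invariant measure on $\mathbb{T}^2$ is uniform (adding an independent uniform variable modulo $1$ uniformizes the law), so the empirical mean of a long trajectory is close to $(\frac12,\frac12)$.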
We show in Appendix C that the conclusions of Theorem 1.3 and Theorem 1.2 hold true for Markov chains satisfying Assumption 1.5.
1.5. Open problems. We conclude this section with open questions that arise in view of our results: (1) Sharpness of the rates. The convergence rates in (1.20) and (1.23) match the one obtained for the case of uniformly distributed samples in [4]. However, even in the latter case, it has not been shown whether the rate is optimal, and we suspect the opposite. A possible way to track the optimal rate could be to perform a second-order linearization of the Monge-Ampère equation (1.7). Following the same type of computations leading to (1.9) in the case $\rho=1$, a second-order corrector $q^n$ should solve a further linear equation, where we recall that $h^n$ solves (1.9), providing a conjectural second-order approximation. This direction of research is left for future investigations.
(2) Extension to $p$-costs. A natural question is to investigate whether our results hold for different cost functions, such as $p$-costs for $p>1$. The behavior of the cost has been optimally quantified in [1,13,9,23]. However, to the best of our knowledge, quantitative estimates on the transport plan in the setting of general $p$-costs are not known, even for uniformly distributed samples. A possible approach would be to revise the linearization ansatz of [16] for general $p$-costs. As observed in [34], if the transport cost between two points $x,y$ is given by $\frac1p|x-y|^p$, Gangbo-McCann's theorem [25, Theorem 1.2] ensures that there exists a map $h^n$ such that the optimal transport map $T^n$ takes the form $T^n=\mathrm{Id}+|\nabla h^n|^{p'-2}\nabla h^n$, where $p'$ denotes the conjugate exponent. Therefore, following the same type of computations leading to (1.9), a first-order linearization should solve a degenerate $p'$-Laplace equation, and we may expect an analogous approximation result.

Structure of the proof
This section is devoted to describing the main ideas and the organization of the proofs of Theorems 1.2 and 1.3. We mainly focus on the proof of Theorem 1.2, since the proof of Theorem 1.3 follows the same strategy.
General strategy. The proof of Theorem 1.2 follows the strategy employed in [4] to deal with independent and uniformly distributed random points. The main idea is to use the quantitative stability result for transport maps in [4, Theorem 3.2], stating that two transport maps are close in the $L^2$-topology if the target measures are close in the $W_2$-topology. We restate the result below for the reader's convenience.
Theorem 2.1 provides a constant $c>0$ depending on $M$ such that, provided the regularity condition (2.1) holds, two transport maps onto nearby targets are quantitatively close. The first step consists in using Theorem 2.1 to deduce a stability estimate (in terms of the quadratic Wasserstein distance) for transport plans in the special case $\mu_1=\nu^m$, $\mu_2=\exp(\nabla h^{n,t})_\#\mu^n$ and $\nu=\mu^n$. In this step, we immediately face the issue of the lack of regularity of $h^{n,t}$ necessary to ensure the condition (2.1): Indeed, recalling that $h^{n,t}$ solves (1.18), it does not enjoy the $C^{1,1}$-regularity required in (2.1) for the non-smooth densities $\rho$ that we consider here. We overcome this issue by introducing an additional regularization step: We smooth the operator $-\nabla\cdot\rho\nabla$, and implicitly the plan $\gamma^{n,t}$, in the form of (2.2), for a regularization parameter $\delta$ to be optimized, where $\rho_\delta:=P_\delta\rho$. Classical Schauder theory ensures that $h^{n,t}_\delta$ enjoys $C^\infty$-regularity. Doing so, we can use Theorem 2.1 to deduce, provided (2.3) holds, a stability error estimate which reads as (2.4); we refer to (3.107) for further details. Our argument then differs from [4] by splitting the error into two parts, a regularization error and a stability error, cf. (2.5).
Our proof then proceeds in two steps, controlling separately the two terms in (2.5).

Control of the regularization error.
To deal with the regularization error, we consider the difference $e^{n,t}_\delta := h^{n,t}_\delta - h^{n,t}$, which solves, according to (1.18) and (2.2), the equation (2.6). On the one hand, since $\rho\in L^\infty$, we have $\rho_\delta\to\rho$ as $\delta\downarrow0$ in every $L^q$ with $q<\infty$. On the other hand, we learn from Meyers' estimate (recalled in Proposition A.3) that there exists $q>2$, introduced in (2.7), such that $\nabla h^{n,t}\in L^q$. Consequently, we can treat (2.6) using Hölder's inequality, which provides the estimate (2.8) involving $\|\nabla h^{n,t}\|^2_{L^q}$.
The next step is to control the averaged Meyers norm $\mathbb{E}\big[\|\nabla h^{n,t}\|^2_{L^q}\big]$, which we show in Proposition 3.3 to be of the order of (2.9), where we recall that $\eta$ denotes the correlation length, see Assumption 1.1. The combination of (2.8), (2.9) and the local Lipschitzianity of the exponential map yields (2.10); we refer to (3.104) for further details.
Control of the stability error. For the stability error, we first need to ensure (2.3). Our strategy follows the idea in [3], which consists in showing that (2.3) is satisfied with very high probability.
In our case, a new difficulty comes from our regularization of $\rho$: the regularization parameter $\delta$ has to be carefully optimized. We show that if $\delta$ is taken as an inverse power of $\log(n)$, (2.3) becomes very likely as $n\uparrow\infty$. More precisely, we show in Proposition 3.4 that, given two exponents $\kappa_1$ and $\upsilon$, there exists $\kappa_2$ such that, with the choices (2.11), we have (2.12). This result is in the spirit of [3, Theorem 3.3] but, in our setting, the proof relies on Schauder theory rather than an explicit formula for $h^{n,t}_\delta$, as well as on concentration inequalities in the form of Proposition A.1 to treat the correlations. For further details on the strategy, we refer to Section 3.3. With (2.12) at hand, we can restrict the analysis to realizations for which, for $n\gg1$, (2.3) is satisfied, which puts us in the setting of (2.4). We then treat each term appearing in (2.4) separately: • The optimal control of the cost $W_2^2(\mu^n,\nu^m)$ has already been well studied; it is optimally estimated by $\frac{\log(n)}{n}$. We refer to Appendix B for a detailed statement, references and extensions to the cases of Assumption 1.1 and Assumption 1.5.
• The smoothing errors $W_2^2(\mu^n,\mu^{n,t})$ and $W_2^2(\nu^m,\nu^{m,t})$. Classical contractivity estimates, usually applied to deal with these errors (see for instance [22, Theorem 3]), bound them by $t$. However, due to the choice of $t$ in (2.11), this is of no use here, since $t$ is much larger than the magnitude of the cost, namely $t\gg\frac{\log(n)}{n}$. Instead, we follow the approach in [3], where the authors showed that, in the particular case of empirical measures in dimension $2$, one can improve the rate and obtain the bound $\frac{\log\log(n)}{n}\ll\frac{\log(n)}{n}$. We extend this result to our setting of non-constant densities and correlated points in Proposition 3.5. As opposed to [3], our approach uses Fourier analysis, together with additional care to handle the correlations and the non-constant densities.
• The error in the Moser coupling $W_2^2\big(\nu^{m,t},\exp(\nabla h^{n,t}_\delta)_\#\mu^{n,t}\big)$. We follow the strategy in [3]. This error can be related to a Moser coupling between $\mu^{n,t}$ and $\nu^{m,t}$ (see for instance [56, Appendix p. 16]): The equation (1.18) provides a natural coupling between $\mu^{n,t}$ and $\nu^{m,t}$, which can be formulated using Benamou-Brenier's theorem [10]. Then, using the transport plan $\big(\phi(1,\cdot),\exp(\nabla h^{n,t}_\delta)\big)_\#\mu^{n,t}$ as a competitor, we obtain a bound which we combine with a quantitative stability result for flows of vector fields, proved in [3, Proposition A.1], leading to an estimate in terms of the Meyers norm, where we recall that $q$ denotes the Meyers exponent introduced in (2.7). For further details, we refer to (3.115). It then remains to appeal to Meyers' estimate, see Proposition A.3, applied to (2.6), together with (2.9), which finally yields (2.15).
To conclude, we note that all bounds involve errors in terms of the approximation of $\rho$ by the heat semigroup, which we need to quantify. This is where the assumption $\rho\in H^\varepsilon$ plays a role, in the form of a quantitative estimate on $\rho_t-\rho$.

3. Proofs

3.1. Notation and preliminary results. We provide in this section some notation and preliminary results needed in the proofs of Theorem 1.2 and Theorem 1.3. We recall that, throughout the paper, we denote by $M$ a $2$-dimensional compact connected Riemannian manifold (or the square $[0,1]^2$) endowed with the Riemannian distance $d$.
Wasserstein distance. Given $\mu,\nu\in\mathcal{P}(M)$, we define the quadratic Wasserstein distance as
$W_2^2(\mu,\nu) := \inf_{\pi\in\Pi(\mu,\nu)} \int_{M\times M} d^2(x,y)\,\mathrm{d}\pi(x,y)$,
where $\Pi(\mu,\nu)$ is the set of couplings between $\mu$ and $\nu$, that is, the set of $\pi\in\mathcal{P}(M\times M)$ having $\mu$ and $\nu$ as first and second marginal, respectively. We refer the reader to the monographs [55,56] for a detailed overview of the subject of optimal transport. We recall the following simple but useful Lipschitz contraction property of the Wasserstein distance.
Lemma 3.1 (Lipschitz property of the Wasserstein metric). Let $(D,d_D)$ be a complete and separable metric space, let $\mu,\nu\in\mathcal{P}(M)$ and let $T:M\to D$ be an $L$-Lipschitz map. It holds
(3.2) $W_2(T_\#\mu,T_\#\nu)\le L\,W_2(\mu,\nu)$.
Indeed, given any coupling $\pi\in\Pi(\mu,\nu)$, the pushforward $(T\times T)_\#\pi$ is a coupling of $T_\#\mu$ and $T_\#\nu$ whose cost is bounded by $L^2$ times the cost of $\pi$. Taking the infimum among all possible couplings $\pi\in\Pi(\mu,\nu)$ leads to (3.2).
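In one dimension, where $W_2$ between empirical measures with equal atoms is computed by sorting (the monotone coupling is optimal), Lemma 3.1 can be checked numerically; the map $T$ below is an arbitrary $1$-Lipschitz choice of ours.

```python
import math
import random

def w2(a, b):
    # W_2 between two empirical measures with n equal atoms on R:
    # in 1-D the sorted (monotone) coupling is optimal for quadratic cost.
    a, b = sorted(a), sorted(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

T = lambda x: 0.5 * math.sin(2.0 * x)   # |T'| <= 1, hence 1-Lipschitz

random.seed(0)
mu = [random.random() for _ in range(200)]
nu = [random.random() for _ in range(200)]
lhs = w2([T(x) for x in mu], [T(y) for y in nu])   # W_2(T# mu, T# nu)
rhs = w2(mu, nu)                                    # L * W_2(mu, nu), L = 1
print(lhs, rhs)
```

The inequality `lhs <= rhs` is exactly (3.2) with $L=1$.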
Heat semigroup and heat kernel. We recall some basic facts on the heat semigroup and its generator; we refer the reader to [18, Chapter VI] for a more detailed overview. For $t>0$, we denote by $p_t$ the fundamental solution of the heat operator $\partial_t-\Delta$ on $M$, where $\Delta$ denotes the Laplace-Beltrami operator. Classical Schauder theory ensures that $p_t$ is smooth, and it is known, see for instance [52] and [3, Appendix B], that $p_t$ and its derivatives satisfy the Gaussian bounds (3.3) for some $C>0$ depending on $M$, for any $t>0$, $N\ge1$ and $x,y\in M$.
The kernel $p_t$ admits the spectral decomposition (3.4) in terms of the eigenvalues and eigenfunctions $\{\lambda_k,\varphi_k\}_{k\ge1}$ of $-\Delta$. We recall that $\{\varphi_k\}_{k\ge1}$ can be estimated in terms of the eigenvalues $\lambda_k$; the argument is a direct application of the Gagliardo-Nirenberg interpolation inequality [7, Theorem 3.70]. We also recall that $(P_t)_{t>0}$ satisfies the spectral gap property (3.7), with a constant $C_{\mathrm{sg}}>0$. Note that, equivalently, (3.7) can be formulated in terms of the eigenvalues $\{\lambda_k\}_{k\ge1}$, in the form (3.8), simply by testing (3.7) on $\{\varphi_k\}_{k\ge1}$. Finally, we recall the Riesz-transform bound (3.9), where $(-\Delta)^{\frac12}$ can be defined via its spectral representation (3.10), with $(\cdot,\cdot)_{L^2}$ the inner product in $L^2$. We refer to the monograph [57] for a discussion of the inequalities (3.7) and (3.9): see Chapter 1 for the case of a Riemannian manifold without boundary and Chapter 2 for the case of a Riemannian manifold with (convex) boundary. In connection with the Wasserstein metric, the heat semigroup satisfies the following classical contraction property.
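The diagonal action of the heat semigroup on eigenfunctions, $P_t\varphi_k = e^{-\lambda_k t}\varphi_k$, can be checked numerically in the simplest flat setting $\mathbb{T}^1$; the grid and time step below are our own illustrative choices.

```python
import math

# On T^1 the first nonconstant eigenfunction is phi_1 = cos(2 pi x) with
# lambda_1 = (2 pi)^2, so P_t phi_1 = exp(-lambda_1 t) phi_1. We compare this
# exact decay with an explicit finite-difference heat step (forward Euler).
t = 0.05
exact = math.exp(-((2 * math.pi) ** 2) * t)

N, steps = 100, 2000
dx, dt = 1.0 / N, t / 2000            # dt/dx^2 = 0.25 <= 1/2: stable scheme
u = [math.cos(2 * math.pi * i * dx) for i in range(N)]
for _ in range(steps):
    u = [u[i] + dt / dx ** 2 * (u[(i + 1) % N] - 2 * u[i] + u[i - 1])
         for i in range(N)]
amplitude = max(abs(v) for v in u)
print(exact, amplitude)
```

The amplitude of the numerical solution at time $t$ matches $e^{-\lambda_1 t}$ up to the (small) discretization error, illustrating the exponential contraction behind the spectral gap (3.7).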
Lemma 3.2 (Semigroup contraction for absolutely continuous measures). Let $\rho\in L^\infty$ satisfy (1.6). Given $\rho_t:=P_t\rho$, it holds $W_2^2(\rho\,\mathrm{d}m,\rho_t\,\mathrm{d}m)\le Ct$ for a constant $C$ depending on $\lambda$, $\Lambda$ and $M$.

3.2. $L^q$-type estimates. As we have seen in (2.8), we need a sharp control of the averaged Meyers norm $\mathbb{E}\big[\|\nabla h^{n,t}\|^2_{L^q}\big]$. This will be obtained as an immediate corollary of the following proposition; for more details see (3.101).
Proposition 3.3 ($L^q$-estimates). Let $\{\mu^n\}_n$ be defined in (1.2) with point clouds satisfying Assumption 1.1, and let $\bar q$ be the Meyers exponent given in Theorem A.3 for the operator $-\nabla\cdot\rho\nabla$. Then the estimate (3.13) holds, together with a random variable $\mathcal{C}_{n,t}$ satisfying the moment bound (3.14) for some generic constant $C_q$ depending on $q$. Furthermore, if (1.14) holds with $\eta\ge1$, then the assumption (1.13) can be dropped and the stochastic integrability can be improved up to losing a $\log(n)$ factor: the estimate (3.15) holds, together with a random variable $\mathcal{D}_{n,t}$ satisfying, for some generic constant $D_q$ depending on $q$, a moment bound uniform in $n$ and $t$.
Proof. We proceed in four steps. In the first step, we prove a representation formula for $(-\Delta)^{\frac12}$ that we will use as the core tool in the next steps. In the second step, we compare the two operators $-\nabla\cdot\rho\nabla$ and $-\Delta$ with the help of Meyers' estimate, recalled in Theorem A.3. Doing so, we then have to bound the $L^q$-norms ($2\le q\le\bar q$) of the gradient of the solution to the Poisson equation with right-hand side $\mu^{n,t}-\rho_t$ and Neumann boundary conditions. We control all these norms by the $L^4$-norm which, in turn, we estimate using the Riesz-transform bound (3.9), following some ideas from [6, Lemma 3.17]. In the third and fourth steps, we control the bound previously obtained in expectation, where our main tools are Assumption 1.1 and the concentration inequalities in Proposition A.1.
Step 1. A representation formula for $(-\Delta)^{\frac12}$. We show that, given $f\in C^2$ such that $n_M\cdot\nabla f=0$ on $\partial M$, we have
(3.16) $(-\Delta)^{\frac12} f = \frac{1}{\sqrt{\pi}}\int_0^\infty s^{-\frac12}\,(-\Delta)P_s f\,\mathrm{d}s$.
Note that $(-\Delta)^{\frac12}f$, defined in (3.10), is well defined in $L^2$: the truncated spectral sums form a Cauchy sequence whose tail vanishes as $N\uparrow\infty$, uniformly in $M$. We now justify (3.16). Observe that, since $n_M\cdot\nabla f=0$ on $\partial M$, the identity (3.17) holds, which is a direct consequence of two integrations by parts using the heat-kernel representation $P_s f=\int_M p_s(\cdot,y)f(y)\,\mathrm{d}m(y)$. Therefore we obtain (3.18), where the last integral is well-defined in $L^2$ thanks to (3.7). We then use the spectral decomposition of the heat semigroup (3.4) to get (3.19). The combination of (3.18) and (3.19) yields, for any $\eta\in L^2$, the identity (3.20). Using (3.8), we can exchange integration and summation in (3.20), which gives (3.16) by arbitrariness of $\eta$. Finally, note that the right-hand side integral in (3.16) is absolutely convergent thanks to the integration by parts $\Delta P_\tau f=P_\tau\Delta f$ and the bounds on the heat kernel (3.3), so that it defines a function in $C^0$.
Step 2. Comparison with the solution of the Poisson equation. We claim that the bound (3.21) holds for any $q\in[2,\min\{\bar q,4\}]$ and $p<\infty$. Let $g^{n,t}\in\dot H^1$ be the solution of the Poisson equation (3.22). Re-expressing the right-hand side of (3.12) as $\nabla\cdot\nabla g^{n,t}$, we apply Meyers' estimate, recalled in Theorem A.3, and Hölder's inequality to obtain (3.23) for any $q\in[2,\min\{\bar q,4\}]$. We now introduce the Littlewood-Paley functional $L$, defined for any $g\in L^4$ with $\int_M g=0$.
We recall that the inverse of $L$ is continuous, see [51], namely (3.24) holds. Combining the Riesz-transform bound (3.9) with (3.24) yields (3.25). We now claim (3.26), which requires special care when $M$ has a boundary. We use the definition of $P_s$ in the form of (3.27). Recalling that $n_M\cdot\nabla g^{n,t}=0$, (3.17) implies that $\Delta P_\tau g^{n,t}=P_\tau\Delta g^{n,t}$, which, combined with (3.22) and (3.16), gives (3.28). In particular, it implies that $n_M\cdot\nabla(-\Delta)^{\frac12}g^{n,t}=0$ on $\partial M$. Combining the above estimates yields (3.21).
Step 3. Proof of (3.13). The estimate (3.13) is a consequence of (3.21), applied with $p=1$, and the bound (3.30): indeed, plugging (3.30) into (3.21) directly gives the claim. We now show (3.30). First, using the definition (1.2) of $\mu^n$, we have the decomposition (3.33), so that expanding the square gives a sum of diagonal and off-diagonal contributions. The estimate (3.30) is then a consequence of the two bounds in (3.34): the first item immediately implies (3.35), while, using the assumption (1.14) and the choice of $t$, the second item is summable. It remains to prove (3.34), and we start with the first item. Here, we use the fact that
(3.36) $\big\|(-s\Delta)^{\frac12}P_{s+t}(\delta_y-\rho)\big\|_{L^2}^2 \lesssim \big\|P_{\frac s2+t}(\delta_y-\rho)\big\|_{L^2}^2$.
This can be seen from (3.10): Using that $P_{\frac s2}$ is a self-adjoint operator in $L^2$ and that from (3.19) we learn $P_{\frac s2}\varphi_n=e^{-\lambda_n\frac s2}\varphi_n$, together with the semigroup property $P_{s+t}=P_{\frac s2}P_{\frac s2+t}$, we deduce the claimed bound in terms of $\|P_{\frac s2+t}(\delta_y-\rho)\|_{L^2}^2$. We now bound $\|P_{\frac s2+t}(\delta_y-\rho)\|_{L^2}^2$ in two different ways. First, using the bounds (3.3) on the heat kernel and the triangle inequality, we obtain the first alternative in the first item of (3.34). Second, applying Poincaré's inequality yields the second alternative in the first item of (3.34).
We now turn to the second item of (3.34). Here, we make use of the representation formula (3.16) applied to $f=P_{s+t}(\delta_y-\rho)$. Using the semigroup property $P_\tau P_{s+t}=P_{\tau+s+t}$, this takes the form of an integral in $\tau$ of $(-\Delta)P_{\tau+s+t}(\delta_y-\rho)$. A direct application of the heat-kernel bounds (3.3) leads to the first alternative in the second item of (3.34); the second alternative follows similarly.
Step 4. Proof of (3.15). Following the computations of the previous step, the estimate (3.15) is a consequence of (3.21) and the moment bounds (3.37) for any $s\in(0,\infty)$, together with the second item of (3.34) and Lemma A.2, where we recall that $\zeta$ is defined in (3.31).
We now show (3.37). Using (3.33) and the assumption $\eta\ge1$, we apply the concentration inequality in Proposition A.1 to the effect of (3.38), valid for any $\lambda\ge0$. We then obtain (3.37) by combining (3.38) with (3.34) and (3.35), together with an application of the layer-cake formula.
3.3. Fluctuation estimates. This section is devoted to justifying (2.12), needed to ensure the condition (2.3) with very high probability. Our result is in the spirit of [3, Theorem 3.3]. However, our strategy differs from [3, Theorem 3.3] and is based on Schauder theory, with special care on the dependence on $\delta$. We briefly sketch the main ingredients of the fluctuation estimates (2.12). By linearity, it is enough to show (2.12) for $f^{n,t}_\delta\in\dot H^1$ being the solution of
(3.39) $-\nabla\cdot\rho_\delta\nabla f^{n,t}_\delta = \mu^{n,t}-\rho_t$.
We then make use of the chain rule to expand the equation into
(3.40) $-\Delta f^{n,t}_\delta = \frac{1}{\rho_\delta}\nabla\rho_\delta\cdot\nabla f^{n,t}_\delta + \frac{1}{\rho_\delta}(\mu^{n,t}-\rho_t)$,
and we define the auxiliary problem (3.41) for $u^{n,t}_\delta$. Doing so, on the one hand, we can control $u^{n,t}_\delta$, which can be handled using the explicit formula in terms of the heat kernel, the explicit bounds on the latter, cf. (3.3), and the regularity of $\frac{1}{\rho_\delta}$. On the other hand, we use Schauder estimates to control $v^{n,t}_\delta$ and $f^{n,t}_\delta$. Using the fact that those estimates depend polynomially on $\|\rho_\delta\|_{C^{0,\alpha}}$, we can keep track of the dependence on $\delta$, which we optimize later on. We finally mention that, in the case where $M$ has a boundary, $u^{n,t}_\delta$ cannot be directly defined by (3.41) since the r.h.s.
does not have zero mean. In order to also include this case, we add a zero-order term to the equation, see (3.42).

Proposition 3.4 (Fluctuation estimates). Let $\{\mu^n\}_n$ be defined in (1.2) with point clouds satisfying Assumption 1.1. For any parameters $\delta \in (0,1)$ and $\upsilon > 0$, we define $u^{n,t}_\delta \in H^1$ the weak solution of (3.42), and the two events $A_n$, defined in (3.43), and $B_{\delta,n} := \big\{\|(\nabla u^{n,t}_\delta, \nabla^2 u^{n,t}_\delta)\|_{L^\infty} \le \frac{1}{\log^{\upsilon}(n)}\big\}$. There exists $\kappa > 0$ such that for any $\kappa_1 > 0$ and the choice of $\delta$ in (3.44), the solution $f^{n,t}_\delta \in \dot H^1$ of (3.39) satisfies (3.45). Furthermore, there exists $\kappa_2 > 0$ depending on $\upsilon$ such that for the choice $t = t_n := \frac{\log^{\kappa_2}(n)}{n}$, we have (3.46).

Proof. The proof of Proposition 3.4 is split into two steps. In the first step, we prove (3.45); our main tools are Schauder's theory and an explicit formula for $u^{n,t}_\delta$ defined in (3.40). In the second step, we show (3.46); our main tools are the concentration inequalities in Proposition A.1 and the explicit formula for $u^{n,t}_\delta$ used in the first step.
Step 1. Proof of (3.45). We define the difference $v^{n,t}_\delta := f^{n,t}_\delta - u^{n,t}_\delta - \int_M u^{n,t}_\delta \in \dot H^1$ and note that, from (3.40) and (3.42), it solves (3.47). We claim that there exists $\kappa > 0$ such that, with the choice of $\delta$ in (3.44), we have (3.48). The estimate (3.45) then follows from (3.48) and the triangle inequality.
We now prove (3.48). We apply Schauder's estimate (see for instance [27] and [45, Chapter 10]) to (3.47) to obtain (3.49). While the second r.h.s. term is directly of order $\frac{1}{\log^{\upsilon}(n)}$ in $B_{\delta,n}$, the first r.h.s. term requires additional attention. Using (1.6) and (3.3), $\rho_\delta$ satisfies (3.50), and together with the algebra property of $\|\cdot\|_{C^{0,\alpha}}$, we can bound the first r.h.s. term by a quantity involving $\|f^{n,t}_\delta\|_{C^{0,\alpha}}$. The latter is bounded using Schauder's estimate, applied this time to (3.39) (knowing that the dependence on $\|\rho_\delta\|_{C^{0,\alpha}}$ is at most polynomial): there exists $\kappa > 0$ such that the corresponding bound holds, which yields the control (3.51) of the first r.h.s. term of (3.49), of the desired order.

Step 2. Proof of (3.46). We provide the arguments for (3.46), for some constants $C_1, C_2, C_3, C_4 > 0$, in the form of (3.52). To see this, we introduce a scale $\eta_n > 0$.
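Since Schauder's theory is the main tool of this step, we recall the shape of the estimate being applied: a global Schauder bound for the Poisson equation on a closed manifold, combined with the algebra property of Hölder norms. The precise constants and norm conventions below are assumptions, not displays from the source:

```latex
% Global Schauder estimate on a closed manifold M:
-\Delta v = f \ \text{in } M, \quad f \in C^{0,\alpha}(M)
\;\Longrightarrow\;
\|v\|_{C^{2,\alpha}(M)} \;\le\; C(\alpha, M)\,
\big(\|f\|_{C^{0,\alpha}(M)} + \|v\|_{C^{0}(M)}\big).
% Algebra property of the Hölder norm, used to absorb the coefficient rho_delta:
\|fg\|_{C^{0,\alpha}} \;\lesssim\; \|f\|_{C^{0,\alpha}}\,\|g\|_{C^{0,\alpha}} .
```

Tracking how the constant depends (polynomially) on $\|\rho_\delta\|_{C^{0,\alpha}}$ in such bounds is precisely what allows the dependence on $\delta$ to be kept explicit and optimized later.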
By compactness of $M$, we can find an $\eta_n$-net $\{x_k\}_{1\le k\le N} \subset M$ with $N \lesssim \eta_n^{-2}$. We note that
Applying $\mathbb{P}$ to (3.54) yields the bound below. Using (3.52) and $\eta_n^{-2} \lesssim \log^{2\upsilon}(n)\,t^{-3}$, which can be proven following the arguments leading to the second item of (3.59) below, this yields the claim.

We now prove (3.52). We first exploit the following explicit representation formula for $u^{n,t}_\delta$, which we expand, using the definition (1.2) of $\mu^n$, in the form of (3.58). Then, applying the concentration inequalities in Proposition A.1, we get a tail bound. The estimate (3.52) then follows from the three estimates in (3.59), which we prove separately in the next three sub-steps.
Sub-step 2.1. Proof of the first item of (3.59). Splitting the time integral into $\int_0^t + \int_t^\infty$, subtracting and adding back $\frac{1}{\rho_\delta(x)}$ in the first integral, and using the semigroup property of $\{P_s\}_s$ in the form $\int_0^t \mathrm{d}s\, e^{-s}\, P_s\big(p_t(\cdot,y)\big) = \int_0^t \mathrm{d}s\, e^{-s}\, p_{t+s}(\cdot,y)$, we decompose $\omega$ into a regular part $J_1$ and a singular part $J_2$:

(3.60)
For the regular part $J_1$, we apply directly the heat-kernel bounds (3.3) and the first item of (3.50) to obtain, from the triangle inequality and Minkowski's inequality, the bound (3.61).
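Minkowski's integral inequality, invoked here and again for the singular part below, is the elementary exchange of norm and integral; we state it in generic notation:

```latex
\Big\| \int f(\cdot,y)\,\mathrm{d}y \Big\|_{L^p(\mathrm{d}x)}
\;\le\;
\int \big\| f(\cdot,y) \big\|_{L^p(\mathrm{d}x)}\,\mathrm{d}y,
\qquad 1 \le p < \infty .
```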
The first r.h.s. term $R_1(x)$ is dominated directly using the heat-kernel bounds (3.3) and (3.50). For the second r.h.s. term $R_2(x)$, we first simplify the $y$-integral: using (3.63), we obtain a bound by Jensen's inequality and the heat-kernel bounds. The combination of (3.61), (3.62) and (3.63) then yields (3.65).

We now turn to the singular part $J_2$. We first apply Minkowski's inequality in the form (3.66). We then simplify the $y$-integral. To this aim, we bound the integrand in $L^\infty$ using the heat-kernel bounds (3.3), (3.50) and the estimate (3.67), of order $\delta^{-1} d(x,y)$ for any $x, y \in M$, in the form (3.68). This, together with (3.66), yields (3.69). To conclude, the combination of (3.60), (3.65) and (3.69) shows the first item of (3.59).
Sub-step 2.2. Proof of the second item of (3.59). We use the decomposition (3.60). For the regular part $J_1$, we argue as in (3.61) for the first term, whereas the second term is estimated using the heat-kernel bounds (3.3) in the form below. Hence, we obtain (3.70). For the singular part $J_2$, we use the bound (3.68), which directly yields (3.71). The combination of (3.60), (3.70) and (3.71) gives the second item of (3.59).
Sub-step 2.3. Proof of the third item of (3.59). According to the first item of (3.59), it suffices to give the argument for the second term in the definition (3.58) of $v_2$. We use the assumption (1.14) together with the first two items of (3.59).

3.4. Contractivity estimates. This section is devoted to the control of the smoothing errors $W_2^2(\mu^{n,t}, \mu^n)$ and $W_2^2(\nu^{m,t}, \nu^m)$ for the particular choice of $t$ given in Proposition 3.4. The first result is in the spirit of [3, Theorem 5.2], which we extend to the case of non-uniformly distributed and correlated points. This extension requires a finer analysis of the error, and the proof relies on Berry-Esseen type inequalities in the spirit of [12, Theorem 5].

Proposition 3.5 (Semigroup contraction for empirical measures). Let $\{\mu^n\}_n$ be defined in (1.2) with point clouds satisfying Assumption 1.1. Given $t$ such that Proposition 3.4 holds, we have (3.72). Furthermore, if (1.14) holds with $\eta \ge 1$, then the assumption (1.13) can be dropped and the stochastic integrability can be improved, at the price of losing a $\log(n)$ factor, namely (3.73).

Proof. According to the fluctuation estimates in Proposition 3.4, together with $W_2^2(\mu^{n,t}, \mu^n) \le (\operatorname{diam}(M))^2$, we can restrict the analysis to $A_n$ defined in (3.43). Note that for $n$ large enough, (1.6) yields (3.74). We split the proof into three steps. In the first step, we prove a Berry-Esseen type smoothing inequality for $W_2^2(\mu^{n,t}, \mu^n)$, which decomposes the error into a deterministic part involving $\rho$ and a random part involving the Fourier coefficients $\{\widehat{\mu^n}(k)\}_k$ of $\mu^n$. In the second step, we prove (3.72). In the third step, we control the fluctuations of $\{\widehat{\mu^n}(k)\}_k$ using the concentration inequalities in Proposition A.1 and deduce (3.73).
Step 1. Berry-Esseen type inequality. Recalling that we denote by $\{\lambda_k, \phi_k\}_k$ the eigenvalues and eigenfunctions of $-\Delta$, respectively, we prove (3.75). We first apply the triangle inequality and use the classical contractivity estimate in [22, Theorem 3] to get (3.76). We then apply Peyre's estimate [46] to the second r.h.s. term, which takes the form (3.77). Now, given an arbitrary $f$ such that (3.78) holds, we estimate the two r.h.s. terms of (3.79). For the first r.h.s. term of (3.79), we expand the integral using (3.4); together with the semigroup property of $\{P_t\}_{t>0}$ and the Cauchy-Schwarz inequality, using that from (3.74) we have $\mu^{n,t+\frac{1}{n}} \ge \frac{\lambda}{2}$, and recalling (3.78), we get a first bound. Furthermore, noticing that $P_t \phi_k = e^{-t\lambda_k}\phi_k$, which implies, since $P_t$ is self-adjoint, the corresponding identity for the Fourier coefficients, we obtain (3.80). For the second r.h.s. term of (3.79), we introduce $w \in \dot H^1$ solving the associated elliptic problem, so that from an integration by parts, the Cauchy-Schwarz inequality and the combination of (3.74) and (3.78), we obtain (3.81), using the explicit formula for $w$ in terms of the semigroup. The combination of (3.79), (3.80), (3.81) and (3.76) leads to (3.75).
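The version of Peyre's estimate [46] applied in this step can be sketched as follows; the precise constants are assumptions, but the structure is the standard one: whenever the target measure has a density bounded below, the quadratic Wasserstein distance is dominated by a (weighted) homogeneous $\dot H^{-1}$-norm,

```latex
\nu \;\ge\; \lambda \,>\, 0 \ \ \text{(density w.r.t. the volume measure)}
\quad\Longrightarrow\quad
W_2(\mu,\nu) \;\le\; 2\,\|\mu - \nu\|_{\dot H^{-1}(\nu)}
\;\le\; \frac{2}{\sqrt{\lambda}}\,\|\mu - \nu\|_{\dot H^{-1}} .
```

This is what converts the transport problem into the linear, spectral estimates on Fourier coefficients carried out below.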
Step 2. Proof of (3.72). According to (3.75), it remains to show the following bound; to this end, we first expand the square in the form of (3.84). For the first r.h.s. term of (3.84), we use the normalisation $\|\phi_k\|_{L^2} = 1$ together with (1.6), to the effect of (3.85), and get (3.86). For the second r.h.s. term of (3.84), we use the definition of the $\beta$-mixing coefficient (1.12) together with the assumption (1.13), in the form of (3.87). The combination of (3.84), (3.86) and (3.87) yields the claim, and it remains to show (3.88). We only treat the second l.h.s. term of (3.88); the first term is controlled the same way. For any $x, y \in M$, we expand the product and then disintegrate using the spectral decomposition of the heat kernel (3.4).

Step 3. Proof of (3.73). It is a consequence of the fluctuation estimates (3.89) together with Lemma A.2. Indeed, applying Minkowski's inequality followed by (3.89) and (3.90), the estimate (3.73) follows from a bound which is obtained in the same way as (3.88), using additionally the trace formula (3.5).
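The spectral identities behind (3.4) and (3.5), on which the above computations rely, can be summarized as follows; these are standard facts for the Laplace-Beltrami operator on a closed surface, with the $t^{-1}$ scaling coming from the two-dimensional on-diagonal heat-kernel bound:

```latex
% Spectral decomposition of the heat kernel:
p_t(x,y) \;=\; \sum_{k \ge 0} e^{-\lambda_k t}\,\phi_k(x)\,\phi_k(y),
% Trace formula, using the normalisation ||phi_k||_{L^2} = 1:
\sum_{k \ge 0} e^{-2\lambda_k t}
\;=\; \int_M p_{2t}(x,x)\,\mathrm{d}x
\;\lesssim\; t^{-1},
\qquad 0 < t \le 1 .
```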
We now prove (3.89). It follows from the estimate on the probability tails (3.91), valid for any $\lambda > 0$ and some $C > 0$, together with a simple application of the layer-cake representation.
To see (3.91), we use (3.83) together with Proposition A.1 to obtain (3.92). The estimate (3.91) is then a consequence of (3.93). The first item of (3.93) has been treated in (3.6). For the second item of (3.93), we use (3.85) and, combined with (1.14), we conclude.

3.5. Proof of Theorem 1.2: Approximation of the transport plan. We only give the arguments for (1.20); (1.21) is proved the same way, using the corresponding results (3.15), (3.73) and (B.2) in the case $\eta > 2$ (additional comments are given where necessary along the proof). We split the proof into four steps. In the first step, we display some preliminary estimates useful throughout the proof. In the second step, we deal with the approximation error that occurs in the process of regularizing $\rho$ into $\rho_\delta$. In the third step, we estimate the $W_2$-distance for the regularized quantity using the quantitative stability result in [4, Theorem 3.2], splitting the estimate into small pieces that we control in the fourth step. We finally comment on the proofs of Remark 1.4 and Theorem 1.3, which are obtained with similar techniques.
Step 1. Preliminary estimates.

Heat-kernel regularization. The assumption $\rho \in H^\varepsilon$ provides (3.94). Indeed, using the definition of the heat kernel, Minkowski's inequality and the spectral decomposition (3.4), we obtain the claim, considering $u^{n,t}_\delta$ as in (3.42), likewise $v^{n,t}_\delta$ with $\mu^{n,t}$ replaced by $\nu^{m,t}$, and defining (3.98).

$L^q$-estimates. A similar decomposition as in (3.97) of (1.18), together with (3.13) and the choice of $t$ in (3.95), yields (3.99), where $C_n$ denotes, throughout the proof, a random variable which satisfies (3.14) and may change from line to line.
Finally, since $\inf_\pi W_2^2(\pi, \gamma^{n,t}) \le (\operatorname{diam}(M))^2$ and (3.46) holds, we can restrict our analysis to $A_n \cap B_{\delta,n}$, which we do for the rest of the proof.
Step 2. Regularization error. We show that (3.103) survives when measuring the $W_2$-distance, namely (3.104). We use the coupling $\big(\exp(\nabla h^{n,t}_\delta), \exp(\nabla h^{n,t})\big)_\# \mu^{n,t}$ as a competitor in (3.1), together with the properties of $\mu^{n,t}$. We then claim (3.106).
We now justify (3.106). The difficulty arises from the fact that $\exp$ is not globally Lipschitz. To overcome this, we define the event $E_n$ below. For the second right-hand side term, we simply apply Markov's inequality.
The combination of the two previous estimates gives (3.106).
The last three terms are controlled using the contractivity estimate (3.72) and (B.1). For the first two terms, we argue that the bound below holds, which, combined with (B.1), leads to (3.113).
Let us define the curve $\eta : s \in [0,1] \mapsto \eta_s := s\,\mu^{n,t} + (1-s)\,\nu^{m,t}$ and note that from (1.18) we have $\frac{\mathrm{d}}{\mathrm{d}s}\eta_s = \mu^{n,t} - \nu^{m,t}$. Applying the Benamou-Brenier theorem [10], we learn the dynamical bound below.

We then recall the link between algebraic moments and exponential moments. The proof is a direct consequence of the Taylor expansion of the exponential function.
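The application of Benamou-Brenier's theorem to the linear interpolation $\eta_s$ defined above can be sketched as follows; this is a reconstruction of the standard argument, not the source's exact display:

```latex
% Benamou–Brenier: dynamical formulation of the quadratic Wasserstein distance
W_2^2(\mu^{n,t}, \nu^{m,t})
\;=\; \min\Big\{ \int_0^1\!\!\int_M |v_s|^2\,\mathrm{d}\eta_s\,\mathrm{d}s
\;:\; \partial_s \eta_s + \nabla\cdot(v_s\,\eta_s) = 0 \Big\}.
% Since d/ds eta_s = mu^{n,t} - nu^{m,t}, any vector field v_s with
% -div(eta_s v_s) = mu^{n,t} - nu^{m,t} is admissible, whence
W_2^2(\mu^{n,t}, \nu^{m,t})
\;\le\; \int_0^1\!\!\int_M |v_s|^2\,\mathrm{d}\eta_s\,\mathrm{d}s .
```

Choosing $v_s$ built from the potential solving the elliptic equation (1.18) is what connects the $W_2$-distance to the PDE estimates of the previous sections.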
Lemma A.2. Let $X$ be a non-negative random variable. The following two statements are equivalent: (i) there exists $C_1 > 0$ such that

We conclude this section by recalling the standard Meyers estimate for elliptic equations in divergence form; see for instance the original paper [43]. Here $g \in L^q$ with $q > 2$: there exists $2 < \tilde q < q$ such that $\nabla u \in L^{\tilde q}$ and $\|\nabla u\|_{L^{\tilde q}} \lesssim \|g\|_{L^{\tilde q}}$.
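The displayed statement of Lemma A.2 above is truncated; a standard version of the equivalence between algebraic and exponential moments, consistent with the Taylor-expansion proof mentioned, reads as follows (the constants $C_1, C_2$ and the exact normalizations are assumptions):

```latex
\text{(i)}\ \exists\, C_1 > 0:\ \mathbb{E}[X^p]^{1/p} \le C_1\, p \ \text{ for all } p \ge 1
\qquad\Longleftrightarrow\qquad
\text{(ii)}\ \exists\, C_2 > 0:\ \mathbb{E}\big[e^{X/C_2}\big] \le 2 .
% (ii) => (i): from X^p <= p! C_2^p e^{X/C_2} one gets E[X^p]^{1/p} <= C_2 (2 p!)^{1/p} \lesssim C_2 p.
% (i) => (ii): expand e^{X/C_2} in Taylor series and use E[X^p] <= (C_1 p)^p <= (C_1 e)^p p!,
%              so that the series is geometric and <= 2 for C_2 large enough.
```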
Appendix B. Matching cost for point clouds

This section is devoted to recalling the upper bounds on the matching cost, results which can be found in [12, Theorem 2] under mild $\beta$-mixing conditions. The case of Markov chains has been studied in [47, 23], where sharp upper bounds are obtained. We include a short proof for convenience. Here we recall that $\sigma \in (0,1)$. Thus, using in addition (3.5), we get (B.5). We now turn to the second term of (B.4). Expanding the square provides an expression whose r.h.s. can be estimated following the lines of the proof of (3.52). Using the definition (1.2) of $\mu^n$, the convergence to equilibrium (1.33) applied with $f = \phi_k$, and the bound on the eigenfunctions (3.6), we have a bound for any $k \le n$, so that, using the trace formula (3.5) and the heat-kernel estimates (3.3), we deduce

Contents
Introduction and statement of the main results
1.1. The random matching problem and its asymptotic
Proof of Theorem 1.2: Approximation of the transport plan
Appendix A. Probabilistic and PDE tools
Appendix B. Matching cost for point clouds
Appendix C. Proof for the class of Markov chains

(1.32) $\quad \beta_{ij} \le C\,\sigma^{\min\{i,j\}}$ for any $i, j$, which ensures that (1.13) and (1.14) (with $\eta = 1$ and $b = -\log(\sigma)$) hold, and ensures good concentration properties of the Markov chain, see Proposition A.1. The estimate (1.32) can be seen as a direct consequence of the combination of the representation for the $\beta$-mixing coefficient in [21, Proposition 1] and the assumption (1.29).

Theorem A.3 (Meyers estimate). Let $a : M \to \mathbb{R}^{2\times 2}$ be measurable and uniformly elliptic. Consider $u \in H^1$ the solution of the Neumann boundary problem
$-\nabla \cdot a\nabla u = \nabla \cdot g$ in $M$, $\quad a\nabla u \cdot n_M = 0$ on $\partial M$,
where $\kappa$ is given in Proposition 3.4. Note that, by linearity, one can decompose $h^{n,t}_\delta$, which solves $-\nabla \cdot \big(\rho_\delta \nabla h^{n,t}_\delta\big) = \mu^{n,t} - \nu^{m,t}$.
$\int_M \mathbf{1}_{E_n}\, d^2\big(\exp(\nabla h^{n,t}_\delta), \exp(\nabla h^{n,t})\big)\,\mathrm{d}m + (\operatorname{diam}(M))^2\, m(E_n^c)$. For the first right-hand side integral, note that from the choice of $C_n$ and (3.103), we can choose $\varsigma_1$ uniformly in $n$ such that in $E_n$ the quantity $|\nabla h^{n,t}_\delta - \nabla h^{n,t}|$ can be made arbitrarily small. Since $\exp$ is Lipschitz-continuous in a neighborhood of the null vector, we deduce
$\int_M d^2\big(\exp(\nabla h^{n,t}_\delta), \exp(\nabla h^{n,t})\big)\,\mathrm{d}m = \int_M \mathbf{1}_{E_n}\, d^2\big(\exp(\nabla h^{n,t}_\delta), \exp(\nabla h^{n,t})\big)\,\mathrm{d}m + \int_M \mathbf{1}_{E_n^c}\, d^2\big(\exp(\nabla h^{n,t}_\delta), \exp(\nabla h^{n,t})\big)\,\mathrm{d}m \le \cdots$

As before, we investigate the extra term coming from the deviation of $\mathbb{E}[\mu^n]$ from $\mu_\infty$. The estimate (C.3) now reads
(C.5) $\quad \int_0^\infty e^{-s}\,\big|\mathbb{E}[\mu^{n,t+s}] - \rho_{t+s}\big|(x)\,\mathrm{d}s + \int_0^\infty e^{-s}\,\Big|P_s\Big[\Big(\tfrac{1}{\rho_\delta} - \tfrac{1}{\rho_\delta(x)}\Big)\big(\mathbb{E}[\mu^{n,t}] - \rho_t\big)\Big]\Big|(x)\,\mathrm{d}s .$
To estimate the second r.h.s. integral of (C.5), we use (3.68). For the first r.h.s. integral, which we denote by $J$, we use the definition (1.2) of $\mu^n$, the convergence to equilibrium (1.33) and the heat-kernel bounds (3.3) to obtain the desired bound.

Step 3. Contractivity estimate. Here, the law of the Markov chain affects the estimate (3.82). The extra error term coming from the deviation of $\mathbb{E}[\mu^n]$ from $\mu_\infty$ reads $\sum_{k \ge 1} e^{-\frac{2}{n}\lambda_k}$