1 Introduction and results

1.1 Notation

Let \(d\ge 1\) and \({\mathcal P}({{\mathbb {R}}^d})\) stand for the set of all probability measures on \({{\mathbb {R}}^d}\). For \(\mu \in {\mathcal P}({{\mathbb {R}}^d})\), we consider an i.i.d. sequence \((X_k)_{k\ge 1}\) of \(\mu \)-distributed random variables and, for \(N \ge 1\), the empirical measure

$$\begin{aligned} \mu _N:=\frac{1}{N} \sum _{k=1}^N \delta _{X_k}. \end{aligned}$$

As is well known, by the Glivenko–Cantelli theorem, \(\mu _N\) converges weakly to \(\mu \) as \(N\rightarrow \infty \) (for example in probability; see van der Vaart and Wellner [40] for details and various modes of convergence). The aim of the paper is to quantify this convergence when the error is measured in some Wasserstein distance. Let us set, for \(p>0\) and \(\mu ,\nu \) in \({\mathcal P}({{\mathbb {R}}^d})\),

$$\begin{aligned} {\mathcal T}_p(\mu ,\nu ) = \inf \left\{ \left( \int _{{{\mathbb {R}}^d}\times {{\mathbb {R}}^d}} |x-y|^p \xi (dx,dy) \right) \; : \; \xi \in {\mathcal H}(\mu ,\nu ) \right\} , \end{aligned}$$

where \({\mathcal H}(\mu ,\nu )\) is the set of all probability measures on \({{\mathbb {R}}^d}\times {{\mathbb {R}}^d}\) with marginals \(\mu \) and \(\nu \). See Villani [41] for a detailed study of \({\mathcal T}_p\). The Wasserstein distance \({\mathcal W}_p\) on \({\mathcal P}({{\mathbb {R}}^d})\) is defined by \({\mathcal W}_p(\mu ,\nu )={\mathcal T}_p(\mu ,\nu )\) if \(p\in (0,1]\) and \({\mathcal W}_p(\mu ,\nu )=({\mathcal T}_p(\mu ,\nu ))^{1/p}\) if \(p>1\).
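To fix ideas, here is a minimal numerical sketch of these definitions (ours, not from the paper) in dimension \(d=1\), where, for two empirical measures with the same number of atoms and \(p\ge 1\), the optimal coupling is classically the monotone rearrangement of the two samples; the helper names are ours.

```python
import numpy as np

def t_p(x, y, p):
    """T_p between the empirical measures of two equal-size samples in d = 1.
    For p >= 1 (convex cost) the optimal coupling sorts both samples."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)) ** p)

def w_p(x, y, p):
    """The Wasserstein distance W_p: T_p itself if p = 1, T_p^(1/p) if p > 1."""
    t = t_p(x, y, p)
    return t ** (1.0 / p) if p > 1 else t

# Example: two independent samples from the same law; W_p shrinks as sizes grow.
rng = np.random.default_rng(0)
print(w_p(rng.standard_normal(1000), rng.standard_normal(1000), p=2.0))
```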

The present paper studies the rate of convergence to zero of \({\mathcal T}_p(\mu _N,\mu )\). This can be done in an asymptotic way, finding e.g. a sequence \(\alpha (N)\rightarrow 0\) such that \(\limsup _N \alpha (N)^{-1}{\mathcal T}_p(\mu _N,\mu ) <\infty \) a.s. or \(\limsup _N\alpha (N)^{-1} \mathbb {E}({\mathcal T}_p(\mu _N,\mu ))<\infty \). Here we will rather derive some non-asymptotic moment estimates such as

$$\begin{aligned} \mathbb {E}({\mathcal T}_p(\mu _N,\mu ))\le \alpha (N) \quad \hbox {for all }N\ge 1 \end{aligned}$$

as well as some non-asymptotic concentration estimates (also often called deviation inequalities)

$$\begin{aligned} \Pr ({\mathcal T}_p(\mu _N,\mu )\ge x)\le \alpha (N,x) \quad \hbox {for all }N\ge 1,\, \hbox { all }x>0. \end{aligned}$$

Such estimates are naturally related to moment (or exponential moment) conditions on the law \(\mu \), and we aim to describe the interplay between the dimension \(d\ge 1\), the cost parameter \(p>0\) and these moment conditions. Let us now introduce these moment conditions precisely. For \(q>0\), \(\alpha >0\), \(\gamma >0\) and \(\mu \in {\mathcal P}({{\mathbb {R}}^d})\), we define

$$\begin{aligned} M_q(\mu ):= {\int _{{{\mathbb {R}}^d}}}|x|^q \mu (dx) \quad \hbox {and} \quad {\mathcal E}_{\alpha ,\gamma }(\mu ):= {\int _{{{\mathbb {R}}^d}}}e^{\gamma |x|^\alpha } \mu (dx). \end{aligned}$$

We now present our main estimates; the comparison with existing results and methods will be developed after this presentation. Let us however mention at once that our paper relies on some recent ideas of Dereich et al. [16].

1.2 Moment estimates

We first give some \(L^p\) bounds.

Theorem 1

Let \(\mu \in {\mathcal P}({{\mathbb {R}}^d})\) and let \(p>0\). Assume that \(M_q(\mu )<\infty \) for some \(q>p\). There exists a constant \(C\) depending only on \(p,d,q\) such that, for all \(N\ge 1\),

$$\begin{aligned}&\mathbb {E}\left( {\mathcal T}_p(\mu _N,\mu )\right) \le C M_q^{p/q}(\mu )\\&\quad \times \left\{ \begin{array}{ll} N^{-1/2} +N^{-(q-p)/q}&{}\quad \hbox {if }\,p>d/2\, \quad \hbox {and}\quad \,q\ne 2p,\\ N^{-1/2} \log (1+N)+N^{-(q-p)/q} &{}\quad \hbox {if }\,p=d/2\,\quad \hbox {and}\quad \,q\ne 2p,\\ N^{-p/d}+N^{-(q-p)/q} &{}\quad \hbox {if }\,p\in (0,d/2)\,\quad \hbox {and}\quad \,q\ne dp/(d-p). \end{array}\right. \end{aligned}$$

Observe that when \(\mu \) has sufficiently many moments (namely if \(q>2p\) when \(p\ge d/2\) and \(q> dp/(d-p)\) when \(p\in (0,d/2)\)), the term \(N^{-(q-p)/q}\) is dominated by the other one and can be removed. We could easily treat, for example, the case \(p>d/2\) and \(q=2p\), but this would lead to some logarithmic terms, and the paper is technical enough.
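As a purely illustrative sanity check (ours, not from the paper), one can estimate \(\mathbb {E}({\mathcal T}_p(\mu _N,\mu ))\) by Monte Carlo for \(\mu \) the uniform distribution on \([0,1]\) (so \(d=1\) and \(p=1>d/2\)), using the classical one-dimensional identity \({\mathcal T}_p(\mu _N,\mu )=\int _0^1|F_N^{-1}(u)-F^{-1}(u)|^p\,du\), valid for \(p\ge 1\); the last printed column should roughly stabilize, reflecting the \(N^{-1/2}\) rate.

```python
import numpy as np

rng = np.random.default_rng(1)
p, trials = 1.0, 200
u = (np.arange(5000) + 0.5) / 5000        # quadrature grid on (0, 1)

for N in [50, 500, 5000]:
    est = 0.0
    for _ in range(trials):
        xs = np.sort(rng.random(N))                        # sample of mu
        q = xs[np.minimum((u * N).astype(int), N - 1)]     # F_N^{-1}(u)
        est += np.mean(np.abs(q - u) ** p) / trials        # T_p(mu_N, mu)
    print(N, est, est * np.sqrt(N))                        # last column ~ constant
```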

This generalizes [16], in which only the case \(p\in [1,d/2)\) (whence \(d\ge 3\)) and \(q>dp/(d-p)\) was treated. The argument is also slightly simplified.

To show that Theorem 1 is really sharp, let us give examples where lower bounds can be derived quite precisely.

(a) If \(a\ne b \in {{\mathbb {R}}^d}\) and \(\mu =(\delta _a+\delta _b)/2\), one easily checks (see e.g. [16, Remark 1]) that \(\mathbb {E}({\mathcal T}_p(\mu _N,\mu )) \ge c N^{-1/2}\) for all \(p\ge 1\). Indeed, we have \(\mu _N=Z_N\delta _a+(1-Z_N)\delta _b\) with \(Z_N=N^{-1}\sum _1^N {\mathbf{1}}_{\{X_i=a\}}\), so that \({\mathcal T}_p(\mu _N,\mu )=|a-b|^p|Z_N-1/2|\), whose expectation is of order \(N^{-1/2}\).

(b) Such a lower bound in \(N^{-1/2}\) can easily be extended to any \(\mu \) (possibly very smooth) of which the support is of the form \(A\cup B\) with \(d(A,B)>0\) (simply note that \({\mathcal T}_p(\mu _N,\mu )\ge d^p(A,B)|Z_N-\mu (A)|\), where \(Z_N=N^{-1}\sum _1^N {\mathbf{1}}_{\{X_i \in A\}}\)).

(c) If \(\mu \) is the uniform distribution on \([-1,1]^d\), it is well-known and not difficult to prove that for \(p>0\), \(\mathbb {E}({\mathcal T}_p(\mu _N,\mu )) \ge c N^{-p/d}\). Indeed, consider a partition of \([-1,1]^d\) into (roughly) \(N\) cubes with side length \(N^{-1/d}\). A quick computation shows that with probability greater than some \(c>0\) (uniformly in \(N\)), half of these cubes will not be charged by \(\mu _N\). But on this event, we clearly have \({\mathcal T}_p(\mu _N,\mu )\ge a N^{-p/d}\) for some \(a>0\), because each time a cube is not charged by \(\mu _N\), a (fixed) proportion of the mass of \(\mu \) (in this cube) is at distance at least \(N^{-1/d}/2\) from the support of \(\mu _N\). One easily concludes.

(d) When \(p=d/2=1\), it has been shown by Ajtai et al. [2] that for \(\mu \) the uniform measure on \([-1,1]^d\), \({\mathcal T}_1(\mu _N,\mu ) \simeq c (\log N / N)^{1/2}\) with high probability, implying that \(\mathbb {E}({\mathcal T}_1(\mu _N,\mu )) \ge c (\log N / N)^{1/2}\).

(e) Let \(\mu (dx)= c |x|^{-q-d}{\mathbf{1}}_{\{|x|\ge 1\}}dx\) for some \(q>0\). Then \(M_r(\mu )<\infty \) for all \(r\in (0,q)\) and, for all \(p\ge 1\), \(\mathbb {E}({\mathcal T}_p(\mu _N,\mu )) \ge c N^{-(q-p)/q}\). Indeed, \(\mathbb {P}(\mu _N(\{|x|\ge N^{1/q}\})=0)= (\mu (\{|x|< N^{1/q}\}))^N =(1-c/N)^N \ge c>0\) and \(\mu (\{|x|\ge 2 N^{1/q}\})\ge c/N\). One easily checks that \({\mathcal T}_p(\mu _N,\mu )\ge N^{p/q}{\mathbf{1}}_{\{\mu _N(\{|x|\ge N^{1/q}\})=0\}} \mu (\{|x|\ge 2 N^{1/q}\})\), from which the claim follows.
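For instance, example (a) can be simulated exactly, since there \({\mathcal T}_p(\mu _N,\mu )=|a-b|^p|Z_N-1/2|\) with \(NZ_N\) Binomial\((N,1/2)\)-distributed; the following sketch (ours) exhibits the \(N^{-1/2}\) behaviour numerically.

```python
import numpy as np

rng = np.random.default_rng(2)
dist, p = 1.0, 2.0                  # |a - b| and the cost exponent

for N in [100, 1000, 10000]:
    Z = rng.binomial(N, 0.5, size=5000) / N          # Z_N = mu_N({a})
    t_p = dist ** p * np.abs(Z - 0.5)                # exact value of T_p(mu_N, mu)
    print(N, t_p.mean(), t_p.mean() * np.sqrt(N))    # last column ~ 0.4
```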

As far as general laws are concerned, Theorem 1 is really sharp: the only possible improvements are the following. The first one, quite interesting, would be to replace \(\log (1+N)\) by something like \(\sqrt{\log (1+N)}\) when \(p=d/2\) (see point (d) above). It is however not clear whether this is feasible in full generality. The second one, which should be a mere (and not very interesting) refinement, would be to sharpen the bound in \(N^{-(q-p)/q}\) when \(M_q(\mu )<\infty \): point (e) only shows that there is \(\mu \) with \(M_q(\mu )<\infty \) for which we have a lower bound in \(N^{-(q-p)/q - {\varepsilon }}\) for all \({\varepsilon }>0\).

However, some improvements are possible when restricting the class of laws \(\mu \). First, when \(\mu \) is the uniform distribution on \([-1,1]^d\), the results of Talagrand [38, 39] strongly suggest that when \(d\ge 3\), \(\mathbb {E}({\mathcal T}_p(\mu _N,\mu )) \simeq N^{-p/d}\) for all \(p>0\), which is much better than \(N^{-1/2}\) when \(p\) is large. Such a result would of course immediately extend to any distribution \(\mu =\lambda \circ F^{-1}\), for \(\lambda \) the uniform distribution on \([-1,1]^d\) and \(F:[-1,1]^d\rightarrow {{\mathbb {R}}^d}\) Lipschitz continuous. In any case, a smoothness assumption on \(\mu \) cannot be sufficient; see point (b) above.

Second, for irregular laws, the convergence can be much faster than \(N^{-p/d}\) when \(p<d/2\); see point (a) above where, in an extreme case, we get \(N^{-1/2}\) for all values of \(p>0\). It is shown by Dereich et al. [16] (see also Barthe and Bordenave [3]) that indeed, for a singular law, \(\lim _N N^{p/d}\mathbb {E}({\mathcal T}_p(\mu _N,\mu ))=0\).

1.3 Concentration inequalities

We next state some concentration inequalities.

Theorem 2

Let \(\mu \in {\mathcal P}({{\mathbb {R}}^d})\) and let \(p>0\). Assume one of the three following conditions:

$$\begin{aligned}&\exists \; \alpha >p, \; \exists \;\gamma >0,\; {\mathcal E}_{\alpha ,\gamma }(\mu )<\infty ,\end{aligned}$$
(1)
$$\begin{aligned} \hbox {or} \quad&\exists \; \alpha \in (0,p), \; \exists \;\gamma >0,\; {\mathcal E}_{\alpha ,\gamma }(\mu )<\infty ,\end{aligned}$$
(2)
$$\begin{aligned} \hbox {or} \quad&\exists \; q>2p , \; M_q(\mu )<\infty . \end{aligned}$$
(3)

Then for all \(N\ge 1\), all \(x\in (0,\infty )\),

$$\begin{aligned} \mathbb {P}\left( {\mathcal T}_p(\mu _N,\mu )\ge x\right) \le a(N,x){\mathbf{1}}_{\{x\le 1\}}+b(N,x), \end{aligned}$$

where

$$\begin{aligned} a(N,x)=C \left\{ \begin{array}{ll} \exp (-cNx^2) &{}\quad \hbox {if }\,p>d/2, \\ \exp (-cN(x/\log (2+1/x))^2) &{}\quad \hbox {if }\,p=d/2, \\ \exp (-cN x^{d/p}) &{}\quad \hbox {if }\,p\in (0,d/2) \end{array}\right. \end{aligned}$$

and

$$\begin{aligned} b(N,x)=C\left\{ \begin{array}{ll} \exp (-cNx^{\alpha /p}){\mathbf{1}}_{\{x> 1\}} &{}\quad \hbox {under (1)},\\ \exp (-c(Nx)^{(\alpha -{\varepsilon })/p}){\mathbf{1}}_{\{x\le 1\}}+\exp (-c (Nx)^{\alpha /p}){\mathbf{1}}_{\{x>1\}} &{}\quad \forall \;{\varepsilon }\in (0,\alpha )\, \hbox {under (2)},\\ N (Nx)^{-(q-{\varepsilon })/p}&{}\quad \forall \;{\varepsilon }\in (0,q)\,\hbox { under (3)}. \end{array}\right. \end{aligned}$$

The positive constants \(C\) and \(c\) depend only on \(p,d\) and either on \(\alpha ,\gamma ,{\mathcal E}_{\alpha ,\gamma }(\mu )\) (under (1)) or on \(\alpha ,\gamma ,{\mathcal E}_{\alpha ,\gamma }(\mu ),{\varepsilon }\) (under (2)) or on \(q,M_q(\mu ),{\varepsilon }\) (under (3)).

We could also treat the critical case where \({\mathcal E}_{\alpha ,\gamma }(\mu )<\infty \) with \(\alpha =p\), but the result we could obtain is slightly more intricate and not very satisfying for small values of \(x\) (even if good for large ones).

Remark 3

When assuming (2) with \(\alpha \in (0,p)\), we actually also prove that

$$\begin{aligned} b(N,x)\le C\exp (-c N x^2 (\log (1+N))^{-\delta })+C\exp (-c (Nx)^{\alpha /p}), \end{aligned}$$

with \(\delta =2p/\alpha -1\), see Step 5 of the proof of Lemma 13 below. This allows us to extend the inequality \(b(N,x)\le C\exp (-c (Nx)^{\alpha /p})\) to all values of \(x\ge x_N\), for some (rather small) \(x_N\) depending on \(N,\alpha ,p\). But for very small values of \(x>0\), this formula is less interesting than that of Theorem 2. Despite much effort, we have not been able to get rid of the logarithmic term.

We believe that these estimates are quite satisfying. To see this, first observe that the scales seem to be the right ones. Recall that \(\mathbb {E}({\mathcal T}_p(\mu _N,\mu ))=\int _0^\infty \mathbb {P}( {\mathcal T}_p(\mu _N,\mu )\ge x)dx\).

(a) One easily checks that \(\int _0^\infty a(N,x)dx \le CN^{-p/d}\) if \(p<d/2\), \(CN^{-1/2}\log (1+N)\) if \(p=d/2\), and \(CN^{-1/2}\) if \(p>d/2\), as in Theorem 1.

(b) When integrating \(b(N,x)\) (or rather \(b(N,x)\wedge 1\)), we find \(N^{-(q-{\varepsilon }-p)/(q-{\varepsilon })}\) under (3) and something smaller under (1) or (2). Since we can take \(q-{\varepsilon }>2p\), this is smaller than \(N^{-1/2}\) (and thus also smaller than \(N^{-p/d}\) if \(p<d/2\) and than \(N^{-1/2}\log (1+N)\) if \(p=d/2\)).
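For instance, in point (a), the substitutions \(u=\sqrt{cN}\,x\) (when \(p>d/2\)) and \(u=(cN)^{p/d}x\) (when \(p\in (0,d/2)\)) give

$$\begin{aligned} \int _0^1 e^{-cNx^2}dx \le \frac{1}{\sqrt{cN}}\int _0^\infty e^{-u^2}du \le C N^{-1/2} \quad \hbox {and} \quad \int _0^1 e^{-cNx^{d/p}}dx \le (cN)^{-p/d}\int _0^\infty e^{-u^{d/p}}du \le C N^{-p/d}, \end{aligned}$$

while the case \(p=d/2\) produces the extra logarithmic factor.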

The rates of decrease are also satisfying in most cases. Recall that in deviation estimates, we never get something better than \(\exp (-Ng(x))\) for some function \(g\). Hence \(a(N,x)\) is probably optimal. Next, for \(\bar{Y}_N\) the empirical mean of a family of centered i.i.d. random variables, it is well-known that the good deviation inequalities are the following.

(a) If \(\mathbb {E}[\exp (a|Y_1|^\beta )]<\infty \) with \(\beta \ge 1\), then \(\Pr [|\bar{Y}_N|\ge x ] \le Ce^{-cNx^2}{\mathbf{1}}_{\{x\le 1\}} + Ce^{-cNx^{\beta }}{\mathbf{1}}_{\{x> 1\}}\), see for example Djellout et al. [18], Gozlan [24] or Ledoux [27], using transportation cost inequalities.

(b) If \(\mathbb {E}[\exp (a|Y_1|^\beta )]<\infty \) with \(\beta <1\), then \(\Pr [|\bar{Y}_N|\ge x ] \le Ce^{-cNx^2} + Ce^{-c(Nx)^{\beta }}\), see Merlevède et al. [31, Formula (1.4)] which is based on results by Borovkov [8].

(c) If \(\mathbb {E}[|Y_1|^r]<\infty \) for some \(r>2\), then \(\Pr [|\bar{Y}_N|\ge x ] \le Ce^{-cNx^2} + CN (Nx)^{-r}\), see Fuk and Nagaev [23], using usual truncation arguments.

Our result is in perfect adequacy with these facts (up to some arbitrarily small loss due to \({\varepsilon }\) under (2) and (3)), since \({\mathcal T}_p(\mu _N,\mu )\) should behave very roughly as the mean of the \(|X_i|^p\)’s, which e.g. has an exponential moment with power \(\beta :=\alpha /p\) under (1) and (2).

1.4 Comments

The control of the distance between the empirical measure of an i.i.d. sample and its true distribution is of course a long-standing problem, central in probability, statistics and computer science, with a wide range of applications: quantization (see Delattre et al. [14] and Pagès and Wilbertz [33] for recent results), optimal matching (see Ajtai et al. [2], Dobrić and Yukich [19], Talagrand [39], Barthe and Bordenave [3]), density estimation, clustering (see Biau et al. [5] and Laloë [26]), MCMC methods (see [36] for bounds on ergodic averages), particle systems and approximations of partial differential equations (see Bolley et al. [11] and Fournier and Mischler [22]). We refer to these papers for an extensive introduction to this vast topic.

While many distances can be used to consider the problem, the Wasserstein distance is quite natural, in particular in quantization or for particle approximations of P.D.E.’s. However, the depth of the problem was discovered by Ajtai et al. [2], who considered the uniform measure on the square, later investigated thoroughly by Talagrand [39]. As a review of the literature is somewhat impossible, let us just say that the approaches used so far rely on two methods inherited from the two definitions of the Wasserstein distance: the construction of an explicit coupling, or duality to control a particular empirical process.

Concerning moment estimates (as in Theorem 1), some results can be found in Horowitz and Karandikar [25], Rachev and Rüschendorf [35] and Mischler and Mouhot [32]. But these results are far from optimal, even when assuming that \(\mu \) is compactly supported. Very recently, strikingly clever alternatives were considered by Boissard and Le Gouic [7] and by Dereich et al. [16]. Unfortunately, the construction of Boissard and Le Gouic, based on iterative trees, was a little too complicated to yield sharp rates. On the contrary, the method of [16], exposed in detail in the next section, is extremely simple, robust, and leads to the almost optimal results exposed here. Some sharp moment estimates were already obtained in [16] for a limited range of parameters.

Concerning concentration estimates, only few results are available. Let us mention the work of Bolley et al. [11] and, very recently, of Boissard [6], which we considerably improve. Our assumptions are often much weaker (the reference measure \(\mu \) was often assumed to satisfy some functional inequalities, which may be difficult to verify and usually involve more “structure” than mere integrability conditions) and \(\Pr [{\mathcal T}_p (\mu _N,\mu )\ge x]\) was estimated only for rather large values of \(x\). In particular, when integrating the concentration estimates of [11], one never recovers the good moment estimates, meaning that the scales are not the right ones.

Moreover, the approach of [16] is robust enough that we can also give some good moment bounds for the Wasserstein distance between the empirical measure of a Markov chain and its invariant distribution (under some conditions). This could be useful for MCMC methods, because our results are non-asymptotic. We can also study very easily some \(\rho \)-mixing sequences (see Doukhan [20]), for which only very few results exist, see Boissard and Le Gouic [7]. Finally, we show on an example how to use Theorem 1 to study some particle systems. For all these problems, we might also obtain some concentration inequalities, but this would need further refinements which are out of the scope of the present paper, already technical enough, and left for future works.

1.5 Plan of the paper

In the next section, we state some general upper bounds on \({\mathcal T}_p(\mu ,\nu )\), for any \(\mu ,\nu \in {\mathcal P}({{\mathbb {R}}^d})\), essentially taken from [16]. Section 3 is devoted to the proof of Theorem 1. Theorem 2 is proved in three steps: in Sect. 4 we study the case where \(\mu \) is compactly supported and where \(N\) is replaced by a Poisson\((N)\)-distributed random variable, which yields some pleasant independence properties. We show how to remove the randomization in Sect. 5, concluding the case where \(\mu \) is compactly supported. The non-compact case is studied in Sect. 6. The final Sect. 7 is devoted to dependent random variables: \(\rho \)-mixing sequences, Markov chains and a particular particle system.

2 Coupling

The following notion of distance, essentially taken from [16], is the main ingredient of the paper.

Notation 4

(a) For \(\ell \ge 0\), we denote by \({\mathcal P}_\ell \) the natural partition of \((-1,1]^d\) into \(2^{d\ell }\) translates of \((-2^{-\ell },2^{-\ell }]^d\). For two probability measures \(\mu ,\nu \) on \((-1,1]^d\) and for \(p>0\), we introduce

$$\begin{aligned} {\mathcal D}_p(\mu ,\nu ):= \frac{2^p-1}{2} \sum _{\ell \ge 1} 2^{-p\ell } \sum _{F \in {\mathcal P}_\ell } |\mu (F)-\nu (F)|, \end{aligned}$$

which obviously defines a distance on \({\mathcal P}((-1,1]^d)\), always bounded by \(1\).

(b) We introduce \(B_0:=(-1,1]^d\) and, for \(n\ge 1\), \(B_n:=(-2^n,2^n]^d{\setminus }(-2^{n-1},2^{n-1}]^d\). For \(\mu \in {\mathcal P}({{\mathbb {R}}^d})\) and \(n\ge 0\), we denote by \({\mathcal R}_{B_n} \mu \) the probability measure on \((-1,1]^d\) defined as the image of \(\mu |_{B_n}/\mu (B_n)\) by the map \(x \mapsto x/2^n\). For two probability measures \(\mu ,\nu \) on \({{\mathbb {R}}^d}\) and for \(p>0\), we introduce

$$\begin{aligned} {\mathcal D}_p(\mu ,\nu ):= \sum _{n\ge 0} 2^{pn} \big ( |\mu (B_n)-\nu (B_n)| + (\mu (B_n)\wedge \nu (B_n)) {\mathcal D}_p({\mathcal R}_{B_n}\mu ,{\mathcal R}_{B_n}\nu ) \big ). \end{aligned}$$

A little study, using that \({\mathcal D}_p \le 1\) on \({\mathcal P}((-1,1]^d)\), shows that this defines a distance on \({\mathcal P}({{\mathbb {R}}^d})\).

Having a look at \({\mathcal D}_p\) in the compact case, one sees that, in some sense, it measures the distance between the two probability measures simultaneously at all scales. The optimization can thus be performed over all scales at once, which outperforms the approach based on a covering of the state space at a single fixed diameter (which is more or less the approach of Horowitz and Karandikar [25]). Moreover, one sees that the principal control is on \(|\mu (F)-\nu (F)|\), which is a quite simple quantity. The next results are slightly modified versions of estimates found in [16], see [16, Lemma 2] for the compact case and [16, proof of Theorem 3] for the non compact case. They contain the crucial remark that \({\mathcal D}_p\) is an upper bound, up to a constant, of the Wasserstein distance.
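The compact-case distance of Notation 4-(a) is also easy to evaluate numerically. Here is a short sketch (ours), truncating the series at a finite depth, the neglected tail being bounded by \(2^{-p\ell _{\max }}\) thanks to the geometric weights; by Lemma 5 below, \(\kappa _{p,d}\) times this quantity upper bounds \({\mathcal T}_p\).

```python
import numpy as np

def d_p_compact(x, y, p, max_level=8):
    """D_p of Notation 4-(a) between the empirical measures of two samples
    x, y (arrays of shape (n, d)) supported in (-1,1]^d, truncating the
    series at ell = max_level; the neglected tail is O(2**(-p*max_level)).
    numpy's bin convention [a, b) differs immaterially from (a, b]."""
    d = x.shape[1]
    total = 0.0
    for ell in range(1, max_level + 1):
        edges = np.linspace(-1.0, 1.0, 2 ** ell + 1)
        hx, _ = np.histogramdd(x, bins=[edges] * d)
        hy, _ = np.histogramdd(y, bins=[edges] * d)
        total += 2.0 ** (-p * ell) * np.abs(hx / len(x) - hy / len(y)).sum()
    return (2 ** p - 1) / 2 * total

rng = np.random.default_rng(3)
x, y = rng.uniform(-1, 1, (2000, 2)), rng.uniform(-1, 1, (2000, 2))
print(d_p_compact(x, y, p=1.0))
```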

Lemma 5

Let \(d\ge 1\) and \(p>0\). For all pairs of probability measures \(\mu ,\nu \) on \({{\mathbb {R}}^d}\), \({\mathcal T}_p(\mu ,\nu )\le \kappa _{p,d} {\mathcal D}_p(\mu ,\nu )\), with \(\kappa _{p,d}:= 2^p d^{p/2}(2^p+1)/(2^p-1)\).

Proof

We separate the proof into two steps.

Step 1. We first assume that \(\mu \) and \(\nu \) are supported in \((-1,1]^d\). We infer from [16, Lemma 2], in which the conditions \(p\ge 1\) and \(d\ge 3\) are clearly not used, that, since the diameter of \((-1,1]^d\) is \(2\sqrt{d}\),

$$\begin{aligned} {\mathcal T}_p(\mu ,\nu ) \le \frac{\left( 2\sqrt{d}\right) ^p}{2} \sum _{\ell \ge 0} 2^{-p\ell } \sum _{F \in {\mathcal P}_\ell } \mu (F) \sum _{C\; child\; of\; F}\left| \frac{\mu (C)}{\mu (F)}-\frac{\nu (C)}{\nu (F)}\right| , \end{aligned}$$

where “\(C\) child of \(F\)” means that \(C\in {\mathcal P}_{\ell +1}\) and \(C \subset F\). Consequently,

$$\begin{aligned} {\mathcal T}_p(\mu ,\nu )\, \le \,\,&2^{p-1}d^{p/2} \sum _{\ell \ge 0} 2^{-p\ell } \sum _{F \in {\mathcal P}_\ell } \sum _{C\; child\; of\; F} \left( \frac{\nu (C)}{\nu (F)}|\mu (F)\!-\nu (F)|\!+\!|\mu (C)\!-\nu (C)|\right) \\ \le \,&2^{p-1}d^{p/2} \sum _{\ell \ge 0} 2^{-p\ell } \left( \sum _{F \in {\mathcal P}_\ell } |\mu (F)-\nu (F)| + \sum _{C \in {\mathcal P}_{\ell +1}} |\mu (C)-\nu (C)|\right) \\ \le \,&2^{p-1}d^{p/2} (1+2^p) \sum _{\ell \ge 1} 2^{-p\ell } \sum _{F \in {\mathcal P}_\ell } |\mu (F)-\nu (F)|, \end{aligned}$$

which is nothing but \(\kappa _{p,d}{\mathcal D}_p(\mu ,\nu )\). We used that \(\sum _{F \in {\mathcal P}_0} |\mu (F)-\nu (F)|=0\).

Dereich et al. [16] directly use the formula with the children to study the rate of convergence of empirical measures. This leads to some (small) technical complications, and does not seem to improve the estimates.

Step 2. We next consider the general case. We consider, for each \(n\ge 0\), the optimal coupling \(\pi _n(dx,dy)\) between \({\mathcal R}_{B_n}\mu \) and \({\mathcal R}_{B_n}\nu \) for \({\mathcal T}_p\). We define \(\xi _n(dx,dy)\) as the image of \(\pi _n\) by the map \((x,y)\mapsto (2^nx,2^ny)\), which clearly belongs to \({\mathcal H}(\mu \vert _{B_n}/\mu (B_n),\nu \vert _{B_n}/\nu (B_n))\) and satisfies \(\int \!\!\int |x-y|^p \xi _n(dx,dy) = 2^{np} \int \!\!\int |x-y|^p \pi _n(dx,dy) =2^{np} {\mathcal T}_p({\mathcal R}_{B_n}\mu ,{\mathcal R}_{B_n}\nu )\).

Next, we introduce \(q:=\frac{1}{2} \sum _{n\ge 0} |\nu (B_n)-\mu (B_n)|\) and we define

$$\begin{aligned} \xi (dx,dy)=&\sum _{n\ge 0} (\mu (B_n)\wedge \nu (B_n))\xi _n(dx,dy) + \frac{\alpha (dx)\beta (dy)}{q}, \end{aligned}$$

where

$$\begin{aligned} \alpha (dx)&:= \sum _{n\ge 0}(\mu (B_n)-\nu (B_n))_+ \frac{\mu \vert _{B_n}(dx)}{\mu (B_n)}\quad \hbox { and }\\ \beta (dy)&:= \sum _{n\ge 0}(\nu (B_n)-\mu (B_n))_+ \frac{\nu \vert _{B_n}(dy)}{\nu (B_n)}. \end{aligned}$$

Using that

$$\begin{aligned} q&= \sum _{n\ge 0}(\nu (B_n)-\mu (B_n))_+=\sum _{n\ge 0}(\mu (B_n)-\nu (B_n))_+\\&= 1-\sum _{n\ge 0} (\nu (B_n)\wedge \mu (B_n)), \end{aligned}$$

it is easily checked that \(\xi \in {\mathcal H}(\mu ,\nu )\). Furthermore, we have, setting \(c_p=1\) if \(p\in (0,1]\) and \(c_p=2^{p-1}\) if \(p>1\),

$$\begin{aligned} \int \!\!\!\!\int |x-y|^p \frac{\alpha (dx)\beta (dy)}{q} \le \,\,&\frac{1}{q} \int \!\!\int c_p(|x|^p +|y|^p) \alpha (dx)\beta (dy)\\ =\,\,&c_p\int |x|^p \alpha (dx) + c_p \int |y|^p \beta (dy)\\ \le \,\,&c_p \sum _{n\ge 0} 2^{pn} [(\mu (B_n)-\nu (B_n))_+ +(\nu (B_n)-\mu (B_n))_+ ]\\ =\,\,&c_p \sum _{n\ge 0} 2^{pn} |\mu (B_n)-\nu (B_n)|. \end{aligned}$$

Recalling that \(\int \!\!\int |x-y|^p \xi _n(dx,dy) \le 2^{np} {\mathcal T}_p({\mathcal R}_{B_n}\mu ,{\mathcal R}_{B_n}\nu )\), we deduce that

$$\begin{aligned} {\mathcal T}_p(\mu ,\nu )&\le \int \!\!\int |x-y|^p \xi (dx,dy)\\&\le \sum _{n\ge 0} 2^{np} \left( c_p|\mu (B_n)-\nu (B_n)|\right. \\&\quad \left. +\,\, (\mu (B_n)\wedge \nu (B_n)){\mathcal T}_p({\mathcal R}_{B_n}\mu ,{\mathcal R}_{B_n}\nu )\right) . \end{aligned}$$

We conclude using Step 1 and that \(c_p\le \kappa _{p,d}\). \(\square \)

The proof of the concentration inequalities being very technical, it will be convenient to break it into several steps to separate the difficulties, and we will first treat the compact case. By contrast, when dealing with moment estimates, the following formula will be easier to work with.

Lemma 6

Let \(p>0\) and \(d\ge 1\). For all \(\mu ,\nu \in {\mathcal P}({{\mathbb {R}}^d})\),

$$\begin{aligned} {\mathcal D}_p(\mu ,\nu ) \le C_p \sum _{n\ge 0} 2^{pn} \sum _{\ell \ge 0} 2^{-p\ell } \sum _{F\in {\mathcal P}_\ell } \left| \mu (2^nF\cap B_n) -\nu (2^n F \cap B_n)\right| \end{aligned}$$

with the notation \(2^n F = \{2^n x\;:\; x\in F\}\) and where \(C_p=1+2^{-p}/(1-2^{-p})\).

Proof

For all \(n\ge 0\), we have \(|\mu (B_n)-\nu (B_n)|=\sum _{F\in {\mathcal P}_0}\left| \mu (2^nF \cap B_n)-\nu (2^n F \cap B_n)\right| \) and

$$\begin{aligned}&(\mu (B_n)\wedge \nu (B_n)) {\mathcal D}_p({\mathcal R}_{B_n}\mu , {\mathcal R}_{B_n}\nu )\\&\quad \le \mu (B_n)\sum _{\ell \ge 1} 2^{-p\ell }\sum _{F\in {\mathcal P}_\ell } \left| \frac{\mu (2^n F \cap B_n)}{\mu (B_n)} - \frac{\nu (2^n F \cap B_n)}{\nu (B_n)}\right| \\&\quad \le \sum _{\ell \ge 1}2^{-p\ell } \sum _{F\in {\mathcal P}_\ell } \left| \mu (2^n F \cap B_n)-\nu (2^n F \cap B_n)\right| \\&\qquad +\left| 1-\frac{\mu (B_n)}{\nu (B_n)} \right| \sum _{\ell \ge 1}2^{-p\ell } \sum _{F\in {\mathcal P}_\ell } \nu (2^n F \cap B_n). \end{aligned}$$

This last term is smaller than \(2^{-p} \left| \mu (B_n)-\nu (B_n)\right| /(1-2^{-p})\) and this ends the proof. \(\square \)

3 Moment estimates

The aim of this section is to give the

Proof of Theorem 1

We thus assume that \(\mu \in {\mathcal P}({{\mathbb {R}}^d})\) and that \(M_q(\mu )<\infty \) for some \(q>p\). By a scaling argument, we may assume that \(M_q(\mu )=1\). This implies that \(\mu (B_n)\le 2^{-q(n-1)}\) for all \(n\ge 0\). By Lemma 5, we have \({\mathcal T}_p(\mu _N,\mu )\le \kappa _{p,d} {\mathcal D}_p(\mu _N,\mu )\), so that it suffices to study \(\mathbb {E}({\mathcal D}_p(\mu _N,\mu ))\). In the whole proof, the positive constant \(C\), whose value may change from line to line, depends only on \(p,d,q\).

For a Borel subset \(A\subset {{\mathbb {R}}^d}\), since \(N\mu _N(A)\) is Binomial\((N,\mu (A))\)-distributed, we have

$$\begin{aligned} \mathbb {E}\left( |\mu _N(A)-\mu (A)|\right) \le \min \left\{ 2\mu (A),\sqrt{\mu (A)/N}\right\} . \end{aligned}$$
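Indeed, since \(\mathbb {E}(\mu _N(A))=\mu (A)\), the Cauchy–Schwarz inequality and the triangle inequality respectively give

$$\begin{aligned} \mathbb {E}\left( |\mu _N(A)-\mu (A)|\right) \le \left( \frac{\mu (A)(1-\mu (A))}{N}\right) ^{1/2}\le \sqrt{\mu (A)/N} \quad \hbox {and} \quad \mathbb {E}\left( |\mu _N(A)-\mu (A)|\right) \le \mathbb {E}(\mu _N(A))+\mu (A)=2\mu (A). \end{aligned}$$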

Using the Cauchy–Schwarz inequality and that \(\#({\mathcal P}_\ell )=2^{d\ell }\), we deduce that for all \(n\ge 0\), all \(\ell \ge 0\),

$$\begin{aligned} \sum _{F\in {\mathcal P}_\ell } \mathbb {E}\left( \left| \mu _N(2^nF \cap B_n)-\mu (2^nF \cap B_n)\right| \right) \le \min \left\{ 2\mu (B_n), 2^{d\ell /2} (\mu (B_n)/N)^{1/2}\right\} . \end{aligned}$$

Using finally Lemma 6 and that \(\mu (B_n)\le 2^{-q(n-1)}\), we find

$$\begin{aligned} \mathbb {E}({\mathcal D}_p(\mu _N,\mu ))\le C \sum _{n\ge 0} 2^{pn} \sum _{\ell \ge 0} 2^{-p\ell } \min \left\{ 2^{-qn}, 2^{d\ell /2} (2^{-qn}/N)^{1/2}\right\} . \end{aligned}$$
(4)

Step 1. Here we show that for all \({\varepsilon }\in (0,1)\), all \(N\ge 1\),

$$\begin{aligned} \sum _{\ell \ge 0} 2^{-p\ell } \min \left\{ {\varepsilon }, 2^{d\ell /2} ({\varepsilon }/N)^{1/2}\right\} \le&C \left\{ \begin{array}{lll} \min \{{\varepsilon },({\varepsilon }/N)^{1/2}\} &{}\quad \hbox {if} &{} p>d/2,\\ \min \{{\varepsilon },({\varepsilon }/N)^{1/2}\log (2+{\varepsilon }N)\}&{}\quad \hbox {if} &{} p=d/2,\\ \min \{{\varepsilon },{\varepsilon }({\varepsilon }N)^{-p/d}\} &{}\quad \hbox {if} &{} p\in (0,d/2). \end{array}\right. \end{aligned}$$

First of all, the bound by \(C{\varepsilon }\) is obvious in all cases (because \(p>0\)). Next, the case \(p>d/2\) is immediate. If \(p\le d/2\), we introduce \(\ell _{N,{\varepsilon }}:= \lfloor \log (2+{\varepsilon }N)/(d\log 2)\rfloor \), for which \(2^{d \ell _{N,{\varepsilon }}} \simeq 2+{\varepsilon }N\) and get an upper bound in

$$\begin{aligned} ({\varepsilon }/N)^{1/2} \sum _{\ell \le \ell _{N,{\varepsilon }}} 2^{(d/2-p)\ell } + {\varepsilon }\sum _{\ell \ge \ell _{N,{\varepsilon }}} 2^{-p\ell }. \end{aligned}$$

If \(p=d/2\), we find an upper bound in

$$\begin{aligned}&({\varepsilon }/N)^{1/2}\ell _{N,{\varepsilon }}+C {\varepsilon }2^{-p\ell _{N,{\varepsilon }}} \le C({\varepsilon }/N)^{1/2}\log (2+{\varepsilon }N)\\&\quad +\, C {\varepsilon }(1+{\varepsilon }N)^{-1/2}\le C({\varepsilon }/N)^{1/2}\log (2+{\varepsilon }N) \end{aligned}$$

as desired. If \(p\in (0,d/2)\), we get an upper bound in

$$\begin{aligned} C({\varepsilon }/N)^{1/2} 2^{(d/2-p)\ell _{N,{\varepsilon }}}\!\!+\!C {\varepsilon }2^{-p\ell _{N,{\varepsilon }}} \!\!\le \! C ({\varepsilon }/N)^{1/2} (2\!+{\varepsilon }N)^{1/2-p/d}\!+\! C {\varepsilon }(2\!+{\varepsilon }N)^{-p/d}. \end{aligned}$$

If \({\varepsilon }N \ge 1\), then \((2+{\varepsilon }N)^{1/2-p/d}\le (3{\varepsilon }N)^{1/2-p/d}\) and the conclusion follows. If now \({\varepsilon }N \in (0,1)\), the result is obvious because \(\min \{{\varepsilon },{\varepsilon }({\varepsilon }N)^{-p/d}\}={\varepsilon }\).

Step 2: \(p>d/2\). By (4) and Step 1 (with \({\varepsilon }=2^{-qn}\)), we find

$$\begin{aligned} \mathbb {E}({\mathcal D}_p(\mu _N,\mu ))&\le C \sum _{n\ge 0} 2^{pn} \min \left\{ 2^{-qn},(2^{-qn}/N)^{1/2}\right\} \\&\le C \left\{ \begin{array}{lll} N^{-1/2} &{} \hbox { if } &{} q>2p,\\ N^{-(q-p)/q} &{} \hbox { if } &{} q\in (p,2p). \end{array}\right. \end{aligned}$$

Indeed, this is obvious if \(q>2p\), while the case \(q\in (p,2p)\) requires to separate the sum in two parts \(n \le n_N\) and \(n>n_N\) with \(n_N=\lfloor \log N /(q\log 2)\rfloor \). This ends the proof when \(p>d/2\).
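For instance, when \(q\in (p,2p)\), since \(2^{n_N}\le N^{1/q}\le 2^{n_N+1}\), the two parts of the sum are controlled by

$$\begin{aligned} N^{-1/2}\sum _{n\le n_N} 2^{(p-q/2)n} \le C N^{-1/2} 2^{(p-q/2)n_N}\le CN^{p/q-1} \quad \hbox {and}\quad \sum _{n> n_N} 2^{(p-q)n} \le C 2^{(p-q)n_N}\le C N^{(p-q)/q}, \end{aligned}$$

both of order \(N^{-(q-p)/q}\) as claimed.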

Step 3: \(p=d/2\). By (4) and Step 1 (with \({\varepsilon }=2^{-qn}\)), we find

$$\begin{aligned} \mathbb {E}({\mathcal D}_p(\mu _N,\mu ))\le C \sum _{n\ge 0} 2^{pn} \min \left\{ 2^{-qn},(2^{-qn}/N)^{1/2}\log (2+2^{-qn} N) \right\} . \end{aligned}$$

If \(q>2p\), we immediately get a bound in

$$\begin{aligned} \mathbb {E}({\mathcal D}_p(\mu _N,\mu ))\le C \sum _{n\ge 0} 2^{(p-q/2)n} N^{-1/2} \log (2+N) \le C \log (2+N) N^{-1/2}, \end{aligned}$$

which ends the proof (when \(p=d/2\) and \(q>2p\)).

If \(q\in (p,2p)\), we easily obtain, using that \(\log (2+x)\le 2\log x\) for all \(x\ge 2\), an upper bound in

$$\begin{aligned} \mathbb {E}({\mathcal D}_p(\mu _N,\mu ))\le \,\,&C \sum _{n\ge 0} {\mathbf{1}}_{\{N < 2\cdot 2^{nq}\}} 2^{(p-q)n} + C \sum _{n\ge 0} {\mathbf{1}}_{\{N \ge 2\cdot 2^{nq}\}}2^{(p-q/2)n} N^{-1/2} \log (N2^{-nq})\\ \le \,\,&C N^{-(q-p)/q} + CN^{-1/2} \sum _{n=0}^{n_N} 2^{(p-q/2)n} (\log N - nq \log 2)\\ =:\,\,&CN^{-(q-p)/q} +CN^{-1/2} K_N, \end{aligned}$$

where \(n_N=\lfloor \log (N/2) / (q \log 2)\rfloor \). A tedious exact computation shows that

$$\begin{aligned} K_N=&\log N \frac{2^{(p-q/2)(n_N+1)}-1}{2^{(p-q/2)}-1} \\&- q\log 2 \left[ (n_N+1)\frac{2^{(p-q/2)(n_N+1)}-1}{2^{(p-q/2)}-1} + \frac{n_N+1}{2^{(p-q/2)}-1}\right. \\&\qquad \qquad \qquad \left. -\, \frac{2^{(p-q/2)(n_N+2)}-2^{(p-q/2)}}{(2^{(p-q/2)}-1)^2}\right] . \end{aligned}$$

Using that the contribution of the middle term of the second line is negative and the inequality \(\log N - (n_N+1)q\log 2 \le \log 2\) (because \((n_N+1)q\log 2 \ge \log (N/2)\)), we find

$$\begin{aligned} K_N \le C 2^{(p-q/2)n_N} \le C N^{p/q-1/2}. \end{aligned}$$

We finally have checked that \(\mathbb {E}({\mathcal D}_p(\mu _N,\mu ))\le CN^{-(q-p)/q}+CN^{-1/2}N^{p/q-1/2} \le C N^{-(q-p)/q} \), which ends the proof when \(p=d/2\).

Step 4: \(p\in (0,d/2)\). We then have, by (4) and Step 1,

$$\begin{aligned} \mathbb {E}\left( {\mathcal D}_p(\mu _N,\mu )\right) \le&C\sum _{n\ge 0} 2^{pn} \min \left\{ 2^{-qn}, 2^{-qn(1-p/d)} N^{-p/d}\right\} . \end{aligned}$$

If \(q>dp/(d-p)\), which implies that \(q(1-p/d)>p\), we immediately get an upper bound by \(C N^{-p/d}\), which ends the proof when \(p<d/2\) and \(q>dp/(d-p)\).

If finally \(q\in (p,dp/(d-p))\), we separate the sum in two parts \(n \le n_N\) and \(n>n_N\) with \(n_N=\lfloor \log N /(q\log 2)\rfloor \) and we find a bound in \(C N^{-(q-p)/q}\) as desired. \(\square \)

4 Concentration inequalities in the compact poissonized case

It is technically advantageous to first consider the case where the sample size is Poisson distributed, which provides some independence properties. Replacing \(N\) (large) by a Poisson\((N)\)-distributed random variable should be harmless, because a Poisson\((N)\)-distributed random variable is close to \(N\) with high probability.
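Concretely, \(\Pi _N\) can be sampled in two steps, as in the following sketch (ours, with \(\mu \) uniform on \((-1,1]^2\) purely for illustration); the key point is that the counts \(\Pi _N(F)\) over disjoint cells \(F\) are independent and Poisson\((N\mu (F))\)-distributed.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1000

M = rng.poisson(N)                              # total mass Pi_N(R^d)
atoms = rng.uniform(-1.0, 1.0, size=(M, 2))     # M i.i.d. mu-distributed atoms

# Counts over a 4x4 grid of disjoint cells: independent Poisson variables.
counts, _, _ = np.histogram2d(atoms[:, 0], atoms[:, 1],
                              bins=4, range=[[-1, 1], [-1, 1]])
psi = counts / max(M, 1)                        # cell masses of Psi_N
print(M, psi.round(3))
```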

Notation 7

We introduce the functions \(f\) and \(g\) defined on \((0,\infty )\) by

$$\begin{aligned} f(x)=(1+x)\log (1+x)-x \quad \hbox {and} \quad g(x)=(x \log x - x + 1) {\mathbf{1}}_{\{x\ge 1\}}. \end{aligned}$$

Observe that \(f\) is increasing, nonnegative, equivalent to \(x^2/2\) at \(0\) and to \(x\log x\) at infinity. The function \(g\) is positive and increasing on \((1,\infty )\).

The goal of this section is to check the following.

Proposition 8

Assume that \(\mu \) is supported in \((-1,1]^d\). Let \(\Pi _N\) be a Poisson measure on \({{\mathbb {R}}^d}\) with intensity measure \(N\mu \) and introduce the associated empirical measure \(\Psi _N=(\Pi _N({{\mathbb {R}}^d}))^{-1}\Pi _N\). Let \(p>0\) and \(d\ge 1\). There are some positive constants \(C,c\) (depending only on \(d,p\)) such that for all \(N\ge 1\), all \(x\in (0,\infty )\),

$$\begin{aligned} \mathbb {P}\left( \Pi _N({{\mathbb {R}}^d}){\mathcal D}_p(\Psi _N,\mu )\ge N x\right) \le C \left\{ \begin{array}{ll} \exp (-N f(c x))&{}\quad \hbox {if }\,p>d/2,\\ \exp \left( - N f(c x/\log (2+1/x))\right) &{}\quad \hbox {if }\,p=d/2,\\ \exp \left( - Nf(c x)\right) + \exp \left( -c N x^{d/p}\right) &{}\quad \hbox {if}\,p\in (0,d/2). \end{array}\right. \end{aligned}$$

We start with some easy and well-known concentration inequalities for the Poisson distribution.

Lemma 9

For \(\lambda >0\) and \(X\) a Poisson\((\lambda )\)-distributed random variable, we have

(a) \(\mathbb {E}(\exp (\theta X)) = \exp (\lambda (e^\theta -1))\) for all \(\theta \in {\mathbb {R}}\);

(b) \(\mathbb {E}(\exp (\theta |X-\lambda |)) \le 2 \exp (\lambda (e^\theta -1-\theta ) )\) for all \(\theta >0\);

(c) \(\mathbb {P}(X>\lambda x) \le \exp (-\lambda g(x))\) for all \(x>0\);

(d) \(\mathbb {P}(|X-\lambda |>\lambda x) \le 2 \exp (-\lambda f(x))\) for all \(x>0\);

(e) \(\mathbb {P}(X>\lambda x) \le \lambda \) for all \(x>0\).

Proof

Point (a) is straightforward. For point (b), write \(E(\exp (\theta |X-\lambda |))\le e^{\theta \lambda }\mathbb {E}(\exp (-\theta X))+ e^{-\theta \lambda } \mathbb {E}(\exp (\theta X))\), use (a) and that \(\lambda (e^{-\theta }-1+\theta )\le \lambda (e^\theta -1-\theta )\). For point (c), write \(\mathbb {P}(X>\lambda x)\le e^{- \theta \lambda x}\mathbb {E}[\exp (\theta X)]\), use (a) and optimize in \(\theta \). Use the same scheme to deduce (d) from (b). Finally, for \(x>0\), \(\mathbb {P}(X>\lambda x)\le \mathbb {P}(X>0)=1-e^{-\lambda }\le \lambda \). \(\square \)
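For the reader’s convenience, the optimization for point (c) reads as follows: for \(x>1\) (the case \(x\le 1\) being trivial since \(g(x)=0\) there),

$$\begin{aligned} \mathbb {P}(X>\lambda x)\le e^{-\theta \lambda x}\mathbb {E}(e^{\theta X}) = \exp (-\theta \lambda x+\lambda (e^\theta -1)), \end{aligned}$$

and the choice \(\theta =\log x\) gives exactly \(\exp (-\lambda (x\log x-x+1))=\exp (-\lambda g(x))\).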

We can now give the

Proof of Proposition 8

During the proof, the constants may only depend on \(p\) and \(d\). We fix \(x>0\) for the whole proof. Recalling Notation 4-(a), we have

$$\begin{aligned} \Pi _N({{\mathbb {R}}^d}){\mathcal D}_p(\Psi _N,\mu )&= C\sum _{\ell \ge 1} 2^{-p\ell } \sum _{F\in {\mathcal P}_\ell } |\Pi _N(F) - \Pi _N({{\mathbb {R}}^d})\mu (F)|\\&\le C|\Pi _N({{\mathbb {R}}^d})-N| + C\sum _{\ell \ge 1} 2^{-p\ell } \sum _{F\in {\mathcal P}_\ell } |\Pi _N(F) - N\mu (F)|\\&\le C|\Pi _N({{\mathbb {R}}^d})-N| + C (N+\Pi _N({{\mathbb {R}}^d}))2^{-p \ell _0}\\&\quad +C\sum _{\ell =1}^{\ell _0} 2^{-p\ell } \sum _{F\in {\mathcal P}_\ell } |\Pi _N(F) - N\mu (F)| \end{aligned}$$

for any choice of \(\ell _0\in {\mathbb {N}}\). We will choose \(\ell _0\) later, depending on the value of \(x\). For any nonnegative family \(r_\ell \) such that \(\sum _1^{\ell _0} r_\ell \le 1\), we thus have

$$\begin{aligned} {\varepsilon }(N,x):=&\mathbb {P}\left( \Pi _N({{\mathbb {R}}^d}){\mathcal D}_p(\Psi _N,\mu )\ge Nx\right) \\ \le&\mathbb {P}\left( |\Pi _N({{\mathbb {R}}^d})-N| \ge c Nx\right) + \mathbb {P}\left( \Pi _N({{\mathbb {R}}^d})\ge N(c x 2^{p \ell _0} -1)\right) \\&+\sum _{\ell =1}^{\ell _0} \mathbb {P}\left( \sum _{F\in {\mathcal P}_\ell } |\Pi _N(F) - N\mu (F)| \ge c N x 2^{p\ell }r_\ell \right) . \end{aligned}$$

By Lemma 9-(c), (d), since \(\Pi _N({{\mathbb {R}}^d})\) is Poisson\((N)\)-distributed, \(\mathbb {P}(\Pi _N({{\mathbb {R}}^d})\ge N(c x 2^{p \ell _0} -1))\le \exp (-N g(c x 2^{p \ell _0} -1))\) and \(\mathbb {P}(|\Pi _N({{\mathbb {R}}^d})-N| \ge c Nx) \le 2 \exp (-Nf(cx))\). Next, using that the random variables \((\Pi _N(F))_{F\in {\mathcal P}_\ell }\) are independent, with \(\Pi _N(F)\) Poisson\((N\mu (F))\)-distributed, we use Lemma 9-(b) and that \(\#({\mathcal P}_\ell )=2^{\ell d}\) to obtain, for any \(\theta >0\),

$$\begin{aligned} \mathbb {E}\left( \!\exp \left( \theta \sum _{F\in {\mathcal P}_\ell } |\Pi _N(F) - N\mu (F)| \right) \right) \le \prod _{F\in {\mathcal P}_\ell } 2e^{N\mu (F)(e^\theta -\theta -1)} \le 2^{2^{d\ell }} e^{N(e^\theta -\theta -1)}. \end{aligned}$$

Hence

$$\begin{aligned}&\mathbb {P}\left( \sum _{F\in {\mathcal P}_\ell } |\Pi _N(F) - N\mu (F)| \ge c N x2^{p\ell } r_\ell \right) \\&\quad \le \exp \left( -c \theta N x2^{p\ell } r_\ell \right) 2^{2^{d\ell }} \exp \left( N(e^\theta -\theta -1)\right) . \end{aligned}$$

Choosing \(\theta = \log (1+c x2^{p\ell } r_\ell )\), we find

$$\begin{aligned} \mathbb {P}\left( \sum _{F\in {\mathcal P}_\ell } |\Pi _N(F) - N\mu (F)| \ge c N x2^{p\ell } r_\ell \right) \le&2^{2^{d\ell }} \exp (-Nf(c x 2^{p\ell }r_\ell )). \end{aligned}$$

We have checked that

$$\begin{aligned}&{\varepsilon }(N,x)\le 2 \exp (-Nf(cx)) + \exp (-N g(c x 2^{p \ell _0} -1))\\&\quad + \sum _{\ell =1}^{\ell _0}2^{2^{d\ell }} \exp (-Nf(c x 2^{p\ell }r_\ell )). \end{aligned}$$

At this point, the value of \(c>0\) is not allowed to vary anymore. We introduce some other positive constants \(a\) whose value may change from line to line.

Case 1: \(cx>2\). Then we choose \(\ell _0=1\) and \(r_1=1\). We have \(cx2^{p\ell _0}-1 = 2^pcx -1 \ge (2^p-1)cx+1\) whence \(g(c x 2^{p \ell _0} -1)\ge g((2^p-1)cx+1) =f((2^p-1)cx)\). We also have \(\sum _{\ell =1}^{\ell _0}2^{2^{d\ell }} \exp (-Nf(c x 2^{p\ell }r_\ell )) =2^{2^d}\exp (-Nf(2^pc x))\). We finally get \({\varepsilon }(N,x) \le C \exp (-Nf(ax))\), which proves the statement (in the three cases, when \(cx>2\)).

Case 2: \(cx\le 2\). We choose \(\ell _0\) so that \((1+2/(cx))\le 2^{p\ell _0}\le 2^p (1+2/(cx))\), i.e.

$$\begin{aligned} \ell _0:=\lfloor \log (1+2/(cx))/(p\log 2) \rfloor +1. \end{aligned}$$

This implies that \(c x 2^{p \ell _0} \ge 2 + cx\). Hence \(g(c x 2^{p \ell _0} -1)\ge g(1+cx)=f(cx)\). Furthermore, we have \(c x 2^{p \ell }r_\ell \le c x 2^{p \ell _0}\le 2^p( 2+cx)\le 2^{p+2}\) for all \(\ell \le \ell _0\), whence \(f(c x 2^{p\ell }r_\ell ) \ge a x^2 2^{2p\ell } r_\ell ^2\) (because \(f(x)\ge a x^2\) for all \(x\in [0,2^{p+2}]\)). We thus end up with (we use that \(2^{2^{d\ell }}\le \exp (2^{d\ell })\))

$$\begin{aligned} {\varepsilon }(N,x) \le 3\exp (-Nf(cx))+ \sum _{\ell =1}^{\ell _0}\exp \left( 2^{d\ell } - N a x^2 2^{2p\ell }r^2_\ell \right) . \end{aligned}$$

Now the value of \(a>0\) is not allowed to vary anymore, and we introduce \(a^{\prime }>0\), whose value may change from line to line.

Case 2.1: \(p>d/2\). We take \(r_\ell :=(1-2^{-\eta })2^{- \eta \ell }\) for some \(\eta >0\) such that \(2(p-\eta )>d\). If \(Nx^2\ge 1\), we easily get

$$\begin{aligned} {\varepsilon }(N,x) \le \,&3\exp (-Nf(cx))+ \sum _{\ell =1}^{\ell _0}\exp (2^{d\ell } - N a^{\prime } x^2 2^{2(p-\eta )\ell })\\ \le \,&3\exp (-Nf(cx)) + C \exp (-a^{\prime }Nx^2) \\ \le \,&C \exp (-Nf(a^{\prime }x)). \end{aligned}$$

The last inequality uses that \(y^2\ge f(y)\) for all \(y>0\). If finally \(Nx^2\le 1\), we obviously have

$$\begin{aligned} {\varepsilon }(N,x)\le 1 \le \exp (1-Nx^2)\le C \exp (-Nx^2) \le C \exp (-Nf(x)). \end{aligned}$$

We thus always have \({\varepsilon }(N,x)\le C \exp (-Nf(a^{\prime }x))\) as desired.

Case 2.2: \(p=d/2\). We choose \(r_\ell :=1/\ell _0\). Thus, if \(a N(x/\ell _0)^2 \ge 2\), we easily find

$$\begin{aligned} {\varepsilon }(N,x) \le \,&3\exp (-Nf(cx))+ \sum _{\ell =1}^{\ell _0}\exp \left( 2^{d\ell }\left( 1 - a N (x/\ell _0)^2\right) \right) \\ \le \,&3\exp (-Nf(cx)) + C \exp \left( -a^{\prime }N(x/\ell _0)^2\right) \\ \le \,&3\exp (-Nf(cx)) + C \exp \left( -Nf(a^{\prime }x/\ell _0)\right) \\ \le \,&C \exp (-Nf(a^{\prime }x/\ell _0)) \end{aligned}$$

because \(\ell _0 \ge 1\) and \(f\) is increasing. If now \(a N(x/\ell _0)^2 < 2\), we just write

$$\begin{aligned} {\varepsilon }(N,x)\!\le \! 1 \!\le \exp (2-aN(x/\ell _0)^2)\!\le C \exp (-aN(x/\ell _0)^2) \!\le C \exp (-Nf(ax/\ell _0)). \end{aligned}$$

We thus always have \({\varepsilon }(N,x)\le C \exp (-Nf(a^{\prime }x/\ell _0))\). Using that \(\ell _0 \le C \log (2+1/x)\), we immediately conclude that \({\varepsilon }(N,x)\le C \exp (-Nf(a^{\prime }x/\log (2+1/x)))\) as desired.

Case 2.3: \(p\in (0,d/2)\). We choose \(r_\ell := \kappa 2^{(d/2-p)(\ell -\ell _0)}\) with \(\kappa :=1-2^{p-d/2}\), so that \(\sum _{\ell =1}^{\ell _0} r_\ell \le 1\). For all \(\ell \le \ell _0\),

$$\begin{aligned} 2^{d\ell }\!-\!a Nx^2 2^{2p\ell }r_\ell ^2 \!&= -a \kappa ^2 N x^{d/p} 2^{2p\ell }\left[ 2^{(d-2p)(\ell -\ell _0)} x^{2-d/p} - 2^{(d-2p)\ell }/(N a \kappa ^2 x^{d/p})\right] \\&\le -a\kappa ^2Nx^{d/p}2^{2p\ell }\left[ b 2^{(d-2p)\ell }- 2^{(d-2p)\ell }/(N a \kappa ^2 x^{d/p})\right] \end{aligned}$$

where the constant \(b>0\) is such that \(2^{-(d-2p)\ell _0}\ge b x^{d/p-2}\) (the existence of \(b\) is easily checked). Hence if \(N a \kappa ^2 x^{d/p}\ge 2/b\), we find

$$\begin{aligned} 2^{d\ell }-a Nx^22^{2p\ell }r_\ell ^2 \le&-a \kappa ^2 Nx^{d/p}2^{d\ell } b/2 \end{aligned}$$

and thus, still using that \(N x^{d/p}\ge 2/(ab\kappa ^2)\),

$$\begin{aligned} \sum _{\ell =1}^{\ell _0}\exp (2^{d\ell } - N a x^2 2^{2p\ell }r^2_\ell ) \le C\exp (-a^{\prime } N x^{d/p}). \end{aligned}$$

Consequently, we have \({\varepsilon }(N,x) \le 3\exp (-Nf(cx)) +C\exp (-a^{\prime } N x^{d/p})\) if \(N a \kappa ^2 x^{d/p}\ge 2/b\). As usual, the case where \(N a \kappa ^2 x^{d/p}\le 2/b\) is trivial, since then

$$\begin{aligned} {\varepsilon }(N,x)\le 1 \le \exp (2/b -N a \kappa ^2 x^{d/p}) \le C \exp (-a^{\prime } N x^{d/p}). \end{aligned}$$

This ends the proof. \(\square \)

5 Depoissonization in the compact case

We next check the following compact version of Theorem 2.

Proposition 10

Assume that \(\mu \) is supported in \((-1,1]^d\). Let \(p>0\) and \(d\ge 1\) be fixed. There are some positive constants \(C\) and \(c\) (depending only on \(p,d\)) such that for all \(N\ge 1\), all \(x\in (0,\infty )\),

$$\begin{aligned} \mathbb {P}\left[ {\mathcal D}_p(\mu _N,\mu )\ge x\right] \le {\mathbf{1}}_{\{x\le 1\}}C \left\{ \begin{array}{ll} \exp (-c N x^2) &{}\quad \hbox {if }\,p>d/2;\\ \exp \left( - c N (x/\log (2+1/x))^2\right) &{}\quad \hbox {if }\,p=d/2;\\ \exp \left( -c N x^{d/p}\right) &{}\quad \hbox {if }\,p \in (0,d/2). \end{array}\right. \end{aligned}$$

We will need the following easy remark.

Lemma 11

For all \(N\ge 1\), for \(X\) Poisson\((N)\)-distributed, for all \(k\in \{0,\dots , \lfloor \sqrt{N} \rfloor \}\),

$$\begin{aligned} \mathbb {P}[X=N+k] \ge \kappa _0 N^{-1/2} \quad \hbox { where }\, \kappa _0= e^{-2}/\sqrt{2}. \end{aligned}$$

Proof

By Perrin [34], we have \(N! \le e \sqrt{N} (N/e)^N\). Thus

$$\begin{aligned} \mathbb {P}[X=N+k]&= e^{-N} \frac{N^{N+k}}{(N+k)!} \ge e^{-N-1} \frac{N^{N+k}}{ \sqrt{N+k} ((N+k)/e)^{N+k}}\\&\ge \frac{1}{\sqrt{2N}} \left( \frac{N}{N+k} \right) ^{N+k}e^{k-1}. \end{aligned}$$

Since \(\log (1+x)\le x\) on \((0,1)\), we have \(((N+k)/N)^{N+k}\le \exp (k+k^2/N)\le \exp (k+1)\), so that \(\mathbb {P}[X=N+k] \ge e^{-2}/\sqrt{2N}\). \(\square \)
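A quick numerical check of Lemma 11 (ours, using SciPy’s Poisson probability mass function):

```python
import numpy as np
from scipy.stats import poisson

for N in [10, 100, 10000]:
    ks = np.arange(int(np.sqrt(N)) + 1)
    lhs = poisson.pmf(N + ks, N).min()      # min over k of P[X = N + k]
    rhs = np.exp(-2) / np.sqrt(2 * N)       # kappa_0 * N^{-1/2}
    print(N, lhs, rhs, lhs >= rhs)
```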

Proof of Proposition 10

The probability indeed vanishes if \(x>1\), since \({\mathcal D}_p\) is smaller than \(1\) when restricted to probability measures on \((-1,1]^d\). In the sequel, the constants may only depend on \(p\) and \(d\).

Step 1. We introduce a Poisson measure \(\Pi _N\) on \({{\mathbb {R}}^d}\) with intensity measure \(N\mu \) and the associated empirical measure \(\Psi _N=\Pi _N/\Pi _N({{\mathbb {R}}^d})\). Conditionally on \(\{\Pi _N({{\mathbb {R}}^d})=n\}\), \(\Psi _N\) has the same law as \(\mu _n\) (the empirical measure of \(n\) i.i.d. random variables with law \(\mu \)). Consequently,

$$\begin{aligned} \mathbb {P}\left[ \Pi _N({{\mathbb {R}}^d}){\mathcal D}_p(\Psi _N,\mu ) \ge Nx\right] =\sum _{n\ge 0} \mathbb {P}\left[ \Pi _N({{\mathbb {R}}^d})=n\right] \mathbb {P}\left[ n{\mathcal D}_p(\mu _n,\mu )\ge N x\right] . \end{aligned}$$

By Lemma 11 (since \(\Pi _N({{\mathbb {R}}^d})\) is Poisson\((N)\)-distributed),

$$\begin{aligned} \frac{1}{\sqrt{N}}\sum _{k=0}^{\lfloor \sqrt{N} \rfloor } \mathbb {P}\left[ (N+k){\mathcal D}_p(\mu _{N+k},\mu )\ge N x\right] \le \kappa _0^{-1} \mathbb {P}\left[ \Pi _N({{\mathbb {R}}^d}){\mathcal D}_p(\Psi _N,\mu ) \ge Nx\right] , \end{aligned}$$

which of course implies that (for all \(N\ge 1\), all \(x>0\)),

$$\begin{aligned} \frac{1}{\sqrt{N}}\sum _{k=0}^{\lfloor \sqrt{N} \rfloor } \mathbb {P}\left[ {\mathcal D}_p(\mu _{N+k},\mu )\ge x\right] \le \kappa _0^{-1} \mathbb {P}\left[ \Pi _N({{\mathbb {R}}^d}){\mathcal D}_p(\Psi _N,\mu ) \ge Nx\right] . \end{aligned}$$

Step 2. Here we prove that there is a constant \(A>0\) such that for any \(N\ge 1\), any \(k \in \{0,\ldots , \lfloor \sqrt{N} \rfloor \}\), any \(x > A N^{-1/2}\),

$$\begin{aligned} \mathbb {P}\left[ {\mathcal D}_p(\mu _{N},\mu )\ge x\right] \le \mathbb {P}\left[ {\mathcal D}_p(\mu _{N+k},\mu )\ge x/2\right] . \end{aligned}$$

Build \(\mu _n\) for all values of \(n\ge 1\) with the same i.i.d. family of \(\mu \)-distributed random variables \((X_k)_{k\ge 1}\). Then a.s.,

$$\begin{aligned} |\mu _{N+k}\!-\mu _N|_{TV}\!\le \left| \frac{k}{N(N\!+\!k)}\sum _1^N \delta _{X_j}\right| _{TV} \!+ \left| \frac{1}{N+k}\sum _{N+1}^{N+k}\delta _{X_j}\right| _{TV}\le \frac{k}{N+k}\le \frac{1}{\sqrt{N}}. \end{aligned}$$

This obviously implies (recall Notation 4-(a)) that \({\mathcal D}_p(\mu _{N},\mu _{N+k}) \le C N^{-1/2}\) a.s. (where \(C\) depends only on \(p\)). By the triangle inequality, \({\mathcal D}_p(\mu _N,\mu ) \le {\mathcal D}_p(\mu _{N+k},\mu ) + C N^{-1/2}\), whence

$$\begin{aligned} \mathbb {P}\left[ {\mathcal D}_p(\mu _{N},\mu )\!\ge \! x\right] \!\le \mathbb {P}\left[ {\mathcal D}_p(\mu _{N+k},\mu )\!\ge \! x - CN^{-1/2} \right] \le \mathbb {P}\left[ {\mathcal D}_p(\mu _{N+k},\mu )\!\ge \! x/2\right] \end{aligned}$$

if \(x- CN^{-1/2}\ge x/2\), i.e. \(x\ge 2C N^{-1/2}\).

Step 3. Gathering Steps 1 and 2, we deduce that for all \(N\ge 1\), all \(x>AN^{-1/2}\),

$$\begin{aligned} \mathbb {P}\left[ {\mathcal D}_p(\mu _{N},\mu )\ge x\right]&\le \frac{1}{\sqrt{N}} \sum _{k=0}^{\lfloor \sqrt{N} \rfloor } \mathbb {P}\left[ {\mathcal D}_p(\mu _{N+k},\mu )\ge x/2\right] \\&\le C \mathbb {P}\left[ \Pi _N({{\mathbb {R}}^d}){\mathcal D}_p(\Psi _N,\mu ) \ge Nx/2\right] . \end{aligned}$$

We next apply Proposition 8. Observing that, for \(x \in (0,1]\),

(i) \(\exp (-Nf(cx/2))\le \exp (-cNx^2)\) (case \(p>d/2\)),

(ii) \(\exp (-Nf(cx/(2\log (2+2/x))))\le \exp (-cN(x/\log (2+1/x))^2)\) (case \(p=d/2\)),

(iii) \(\exp (-Nf(cx/2)) + \exp (-c N (x/2)^{d/p})\le \exp (-cNx^{d/p})\) (case \(p\in (0,d/2)\))

concludes the proof when \(x>AN^{-1/2}\). But the other case is trivial, because for \(x \le A N^{-1/2}\),

$$\begin{aligned} \mathbb {P}[{\mathcal D}_p(\mu _{N},\mu )\ge x]\le 1 \le \exp (A^2-Nx^2)\le C\exp (-Nx^2), \end{aligned}$$

which is also smaller than \(C\exp (-N(x/\log (2+1/x))^2)\) and than \(C\exp (-Nx^{d/p})\) (if \(d>2p\)). \(\square \)

6 Concentration inequalities in the non compact case

Here we conclude the proof of Theorem 2. We will need some concentration estimates for the Binomial distribution.

Lemma 12

Let \(X\) be Binomial\((N,p)\)-distributed. Recall that \(f\) was defined in Notation 7.

(a) \(\mathbb {P}[|X-Np|\ge N p z]\le ({\mathbf{1}}_{\{p(1+z) \le 1\}}+{\mathbf{1}}_{\{z \le 1\}}) \exp (-Npf(z))\) for all \(z>0\);

(b) \(\mathbb {P}[|X-Np|\ge N p z]\le Np\) for all \(z>1\);

(c) \(\mathbb {E}(\exp (-\theta X))=(1-p+pe^{-\theta })^N \le \exp (-N p (1-e^{-\theta }))\) for all \(\theta >0\).

Proof

Point (c) is straightforward. Point (b) follows from the fact that for \(z>1\), \(\mathbb {P}[|X-Np|\ge N p z]=\mathbb {P}[X\ge Np(1+z)]\le \mathbb {P}[X\ne 0]=1-(1-p)^N \le pN\). For point (a), we use Bennett’s inequality [4], see Devroye and Lugosi [17, Exercise 2.2 page 11], together with the obvious facts that \(\mathbb {P}[X-Np\ge N p z]=0\) if \(p(1+z)>1\) and \(\mathbb {P}[X-Np\le -N p z]=0\) if \(z>1\). The following elementary but tedious computation also works: write \(\mathbb {P}[|X-Np|\ge N p z] = \mathbb {P}(X\ge Np(1+z)) + \mathbb {P}(N-X\ge N(1-p+zp)) =:\Delta (p,z)+\Delta (1-p,zp/(1-p))\), and observe that \(N-X\sim \) Binomial\((N,1-p)\). Use that \(\Delta (p,z)\le {\mathbf{1}}_{\{p(1+z)\le 1\}}\exp (-\theta N p(1+z))(1-p+pe^\theta )^N\) and choose \(\theta =\log ((1-p)(1+z)/(1-p-pz))\); this gives \(\Delta (p,z)\le {\mathbf{1}}_{\{p(1+z)\le 1\}} \exp (-N[p(1+z)\log (1+z)+(1-p-pz)\log ((1-p-pz)/(1-p)) ] )\). A tedious study shows that \(\Delta (p,z)\le {\mathbf{1}}_{\{p(1+z)\le 1\}} \exp (-Npf(z))\) and that \(\Delta (1-p,zp/(1-p))\le {\mathbf{1}}_{\{z \le 1\}}\exp (-Npf(z))\). \(\square \)
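A quick numerical check of Lemma 12-(a) (ours; we only verify the weaker bound in which both indicators are replaced by \(1\), i.e. a factor \(2\)):

```python
import numpy as np
from scipy.stats import binom

def f(z):
    return (1.0 + z) * np.log(1.0 + z) - z

N, p = 500, 0.1
for z in [0.2, 0.5, 1.0, 2.0]:
    upper = binom.sf(np.ceil(N * p * (1 + z)) - 1, N, p)   # P[X >= Np(1+z)]
    lower = binom.cdf(np.floor(N * p * (1 - z)), N, p) if z <= 1 else 0.0
    print(z, upper + lower, 2 * np.exp(-N * p * f(z)))     # lhs <= rhs
```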

We next estimate the first term when computing \({\mathcal D}_p(\mu _N,\mu )\).

Lemma 13

Let \(\mu \in {\mathcal P}({{\mathbb {R}}^d})\) and \(p>0\). Assume (1), (2) or (3). Recall Notation 4 and put \(Z_N^p:=\sum _{n\ge 0}2^{pn}|\mu _N(B_n)-\mu (B_n)|\). Let \(x_0>0\) be fixed. For all \(x>0\),

$$\begin{aligned} \mathbb {P}[Z_N^p&\ge x ] \le C \exp (-c N x^2){\mathbf{1}}_{\{x\le x_0\}}\\&+ C\left\{ \begin{array}{ll} \exp (-cNx^{\alpha /p}){\mathbf{1}}_{\{x> x_0\}} &{}\quad under \mathrm{(1)},\\ \exp (-c(Nx)^{(\alpha -{\varepsilon })/p}){\mathbf{1}}_{\{x\le x_0\}}+\exp (-c (Nx)^{\alpha /p}){\mathbf{1}}_{\{x>x_0\}} &{}\quad \forall \;{\varepsilon }\in (0,\alpha )\, under \mathrm{(2)},\\ N (Nx)^{-(q-{\varepsilon })/p}&{}\quad \forall \;{\varepsilon }\in (0,q)\, under \mathrm{(3)}. \end{array}\right. \end{aligned}$$

The positive constants \(C\) and \(c\) depend only on \(p,d,x_0\) and either on \(\alpha ,\gamma ,{\mathcal E}_{\alpha ,\gamma }(\mu )\) (under (1)) or on \(\alpha ,\gamma ,{\mathcal E}_{\alpha ,\gamma }(\mu ),{\varepsilon }\) (under (2)) or on \(q,M_q(\mu ),{\varepsilon }\) (under (3)).

Proof

During the proof, the constants are only allowed to depend on the same quantities as in the statement, unless otherwise specified. Under (1) or (2), we assume that \(\gamma =1\) without loss of generality (by scaling), whence \({\mathcal E}_{\alpha ,1}(\mu )<\infty \) and thus \(\mu (B_n)\le C e^{-2^{(n-1)\alpha }}\) for all \(n\ge 0\). Under (3), we have \(\mu (B_n)\le C 2^{-qn}\) for all \(n\ge 0\). For \(\eta >0\) to be chosen later (observe that \(\sum _{n\ge 0} (1-2^{-\eta })2^{-\eta n}=1\)), putting \(c:=1-2^{-\eta }\) and \(z_n:= c x 2^{-(p+\eta )n}/\mu (B_n)\),

$$\begin{aligned} \mathbb {P}\left( Z_N^p\ge x \right)&\le \left( \sum _{n\ge 0} {\mathbf{1}}_{\{z_n \le 2\}} \mathbb {P}\left[ |N\mu _N(B_n)-N\mu (B_n)| \ge N \mu (B_n) z_n\right] \right) \wedge 1 \\&\quad + \left( \sum _{n\ge 0} {\mathbf{1}}_{\{z_n > 2\}} \mathbb {P}\left[ |N\mu _N(B_n)-N\mu (B_n)| \ge N \mu (B_n) z_n\right] \right) \wedge 1 \\&=: \left( \sum _{n\ge 0} I_n(N,x)\right) \wedge 1 + \left( \sum _{n\ge 0} J_n(N,x)\right) \wedge 1. \end{aligned}$$

From now on, the value of \(c>0\) is not allowed to vary anymore. We introduce another positive constant \(a>0\) whose value may change from line to line.

Step 1: bound of \(I_n\). Here we show that under (3) (which is of course implied by (1) or (2)), if \(\eta \in (0,q/2-p)\), there is \(A_0>0\) such that

$$\begin{aligned} \sum _{n\ge 0} I_n(N,x) \le C\exp (-a N x^2){\mathbf{1}}_{\{x\le A_0\}} \quad \hbox {if }N x^2 \ge 1. \end{aligned}$$

This will obviously imply that for all \(N\ge 1\), all \(x>0\),

$$\begin{aligned} \left( \sum _{n\ge 0} I_n(N,x)\right) \wedge 1 \le C \exp (-a N x^2){\mathbf{1}}_{\{x\le A_0\}}. \end{aligned}$$

First, \(\sum _{n\ge 0} I_n(N,x)=0\) if \(z_n>2\) for all \(n\ge 0\). Recalling that \(\mu (B_n)\le C2^{-qn}\), this is the case if \(x\ge (2C/c)\sup _{n\ge 0}2^{(p+\eta -q)n}=2C/c=:A_0\). Next, since \(N\mu _N(B_n)\sim \) Binomial\((N,\mu (B_n))\), Lemma 12-(a) leads us to

$$\begin{aligned} I_n(N,x) \le 2 {\mathbf{1}}_{\{z_n\le 2\}} \exp (-N\mu (B_n)f(z_n)) \le 2 \exp (-N \mu (B_n)z_n^2/4), \end{aligned}$$

because \(f(x) \ge x^2/4\) for \(x\in [0,2]\). Since finally \(\mu (B_n) z_n^2/4 \ge a x^2 2^{(q-2p-2\eta )n}\), we easily conclude, since \(q-2p-2\eta >0\) and since \(Nx^2 \ge 1\), that

$$\begin{aligned} \sum _{n\ge 0} I_n(N,x) \le C \sum _{n\ge 0} \exp (-a N x^22^{(q-2p-2\eta )n}){\mathbf{1}}_{\{x\le A_0\}} \le C \exp (-a N x^2){\mathbf{1}}_{\{x\le A_0\}}. \end{aligned}$$

Step 2: bound of \(J_n\) under (1) or (2) when \(x\le A\). Here we fix \(A>0\) and prove that if \(\eta >0\) is small enough, for all \(x\in (0,A]\) such that \(Nx^2\ge 1\),

$$\begin{aligned} \sum _{n\ge 0} J_n(N,x) \le&C \left\{ \begin{array}{ll} \exp (-a N x^2) &{}\quad \hbox {under (1),}\\ \exp (-a N x^2)+\exp (- a (N x)^{(\alpha -{\varepsilon })/p}) &{}\quad \forall \;{\varepsilon }\in (0,\alpha )\,\hbox { under (2).} \end{array}\right. \end{aligned}$$

Here the positive constants \(C\) and \(a\) are allowed to depend additionally on \(A\). This will imply, as usual, that for all \(N\ge 1\), all \(x\in (0,A]\),

$$\begin{aligned} \left( \sum _{n\ge 0} J_n(N,x)\right) \wedge 1 \le&C \left\{ \begin{array}{ll} \exp (-a N x^2)&{}\quad \hbox {under (1),}\\ \exp (-a N x^2)+\exp (- a (N x)^{(\alpha -{\varepsilon })/p}) &{}\quad \forall \;{\varepsilon }\in (0,\alpha )\,\hbox { under (2).}\\ \end{array}\right. \end{aligned}$$

By Lemma 12-(a), (b) (since \(z_n> 2\) implies \({\mathbf{1}}_{\{\mu (B_n)(1+z_n)\le 1\}}+{\mathbf{1}}_{\{z_n\le 1\}} \le {\mathbf{1}}_{\{z_n\le 1/\mu (B_n)\}}\)),

$$\begin{aligned} J_n(N,x)\le \,&{\mathbf{1}}_{\{2<z_n\le 1/\mu (B_n) \}} \min \left\{ \exp (-N\mu (B_n)f(z_n)),N\mu (B_n) \right\} \\ \le \,&{\mathbf{1}}_{\{z_n \mu (B_n)\le 1\}} \min \left\{ \exp \left( -a N \mu (B_n) z_n \log [2\vee z_n]\right) , N\mu (B_n)\right\} \end{aligned}$$

because \(f(y)\ge a y \log y \ge a y \log [2\vee y]\) for \(y> 2\). Since \(\mu (B_n) \le Ce^{-2^{(n-1)\alpha }}\), we get

$$\begin{aligned} J_n(N,x)\le C \min \{\exp (-aN x 2^{-(p+\eta )n} \log [2\vee ( a x2^{-(p+\eta )n}e^{2^{(n-1)\alpha }})]),Ne^{-2^{(n-1)\alpha }}\}. \end{aligned}$$

A straightforward computation shows that there is a constant \(K\) such that for \(n \ge n_1:=\lfloor K(1+\log \log (K/x))\rfloor \), we have \(\log (a x2^{-(p+\eta )n}e^{2^{(n-1)\alpha }})\ge 2^{(n-1)\alpha }/2\). Consequently,

$$\begin{aligned}&\sum _{n\ge 0}J_n(N,x) \le \, C n_1 \exp (-aN x2^{-(p+\eta )n_1})\\&\qquad +C \sum _{n> n_1} \min \left\{ \exp (-aN x 2^{(\alpha -p-\eta )n}), e^{-2^{(n-1)\alpha }} \right\} \\&\qquad =\, C J^1(N,x)+C J^2(N,x). \end{aligned}$$

We first show that \(J^1(N,x)\le Ce^{-aNx^2}\) (here we actually could get something much better). First, since \(n_1=\lfloor K+K\log \log (K/x)\rfloor \) and \(x \in [0,A]\), we clearly have e.g. \(x2^{-(p+\eta )n_1} \ge a x^{3/2}\). Next, \(Nx^2\ge 1\) implies that \(1/x \le (Nx^{3/2})^2\). Thus

$$\begin{aligned} J^1(N,x)&\le C(1+\log \log (C(Nx^{3/2})^2))\exp (-aNx^{3/2})\\&\le C\exp (-aNx^{3/2}) \le \exp (-aNx^2). \end{aligned}$$

We now treat \(J^2(N,x)\).

Step 2.1. Under (1), we immediately get, if \(\eta \in (0, \alpha -p)\) (recall that \(x\in [0,A]\)),

$$\begin{aligned} J^2(N,x)\le \sum _{n\ge 0} \exp \left( -aN x 2^{(\alpha -p-\eta )n}\right) \le C \exp (-aNx) \le C\exp (-aNx^2), \end{aligned}$$

where we used that \(x \le A\) and \(Nx^2\ge 1\) (whence \(Nx\ge 1/A\)).

Step 2.2. Under (2), we first write

$$\begin{aligned} J^2(N,x)&\le \sum _{n\ge 0} \min \left\{ \exp (-aN x 2^{(\alpha -p-\eta )n}),e^{-2^{(n-1)\alpha }}\right\} \\&\le n_2 \exp (-cNx 2^{(\alpha -p-\eta )n_2})+ Ne^{-2^{(n_2-1)\alpha }}. \end{aligned}$$

We choose \(n_2:=\lfloor \log (Nx)/((p+\eta )\log 2) \rfloor \), which yields \(2^{(n_2-1)\alpha }\ge (Nx)^{\alpha /(p+\eta )}/2^{2\alpha }\) and \((Nx) 2^{(\alpha -p-\eta )n_2}\le (Nx)^{\alpha /(p+\eta )}\). Consequently (recall that \(x\in (0,A]\)),

$$\begin{aligned} J^2(N,x)&\le C (1+\log (Nx)+N) \exp (-a (Nx)^{\alpha /(p+\eta )})\\&\le C(1+N) \exp (-a (Nx)^{\alpha /(p+\eta )}). \end{aligned}$$

For any fixed \({\varepsilon }\in (0,\alpha )\), we choose \(\eta >0\) small enough so that \(\alpha /(p+\eta )\ge (\alpha -{\varepsilon })/p\) and we conclude that (recall that \(Nx\ge 1/A\) because \(Nx^2\ge 1\) and \(x\le A\))

$$\begin{aligned} J^2(N,x)\le&C(1+N) \exp (-a (Nx)^{(\alpha -{\varepsilon })/p}) \le C \exp (-a (Nx)^{(\alpha -{\varepsilon })/p}). \end{aligned}$$

The last inequality is easily checked, using that \(Nx^2\ge 1\) implies that \(N\le (Nx)^2\).

Step 3: bound of \(J_n\) under (3). Here we show that for all \({\varepsilon }\in (0,q)\), if \(\eta >0\) is small enough,

$$\begin{aligned} \sum _{n\ge 0}J_n(N,x)\le C N \left( \frac{1}{Nx}\right) ^{(q-{\varepsilon })/p}\quad \hbox { if }\,Nx \ge 1. \end{aligned}$$

As usual, this will imply that for all \(x>0\), all \(N\ge 1\),

$$\begin{aligned} \left( \sum _{n\ge 0}J_n(N,x)\right) \wedge 1 \le C N \left( \frac{1}{Nx}\right) ^{(q-{\varepsilon })/p}. \end{aligned}$$

Exactly as in Step 2, we get from Lemma 12-(a)–(b) that

$$\begin{aligned} J_n(N,x) \le&\min \left\{ \exp \left( -a N \mu (B_n)z_n \log [2\vee z_n]\right) , N\mu (B_n)\right\} . \end{aligned}$$

Hence for \(n_3\) to be chosen later, since \(a N \mu (B_n)z_n= a Nx2^{-(p+\eta )n}\),

$$\begin{aligned} \sum _{n\ge 0}J_n(N,x)\le \,&C\sum _{n=0}^{n_3} \exp (- a Nx2^{-(p+\eta )n}) + C N \sum _{n>n_3} 2^{-qn}\\ \le \,&C(1+n_3) \exp (-a Nx2^{-(p+\eta )n_3}) + C N 2^{-qn_3}. \end{aligned}$$

We choose \(n_3:= \lfloor (q-{\varepsilon })\log (Nx)/(pq\log 2) \rfloor \), which implies that \(2^{-qn_3} \le 2^q (Nx)^{-(q-{\varepsilon })/p}\) and that \(2^{-(p+\eta )n_3} \ge (Nx)^{-(q-{\varepsilon })(p+\eta )/(pq)}\). Hence

$$\begin{aligned} \sum _{n\ge 0}J_n(N,x)\le&C (1+\log (Nx)) \exp (-a(Nx)^{1-(q-{\varepsilon })(p+\eta )/(pq)} ) +CN (Nx)^{-(q-{\varepsilon })/p}. \end{aligned}$$
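For the reader's convenience, the two properties of \(n_3\) stated above follow from the two-sided bound

$$\begin{aligned} \tfrac{1}{2} (Nx)^{(q-{\varepsilon })/(pq)}\le 2^{n_3}\le (Nx)^{(q-{\varepsilon })/(pq)}, \end{aligned}$$

raising the lower bound to the power \(-q\) and the upper bound to the power \(-(p+\eta )\).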

If \(\eta \in (0,p{\varepsilon }/(q-{\varepsilon }))\), then \(1-(q-{\varepsilon })(p+\eta )/(pq)>0\), and thus

$$\begin{aligned} (1+\log (Nx)) \exp (-a(Nx)^{1-(q-{\varepsilon })(p+\eta )/(pq)} ) \le C (Nx)^{-(q-{\varepsilon })/p}. \end{aligned}$$

This ends the step.

Step 4. We next assume (1) and prove that for all \(x\ge A_1:=2^{p}[M_p(\mu )+(2 \log {\mathcal E}_{\alpha ,1}(\mu ))^{p/\alpha }]\),

$$\begin{aligned} \Pr [Z_N^{p} \ge x ] \le C \exp (-a N x^{\alpha /p}). \end{aligned}$$

A simple computation shows that for any \(\nu \in {\mathcal P}({{\mathbb {R}}^d})\), \(\sum _{n\ge 0} 2^{pn}\nu (B_n) \le 2^p M_p(\nu )\), whence \(Z_N^{p} \le 2^p M_p(\mu )+ 2^p N^{-1}\sum _1^N |X_i|^p\le 2^p M_p(\mu )+ 2^p [N^{-1}\sum _1^N |X_i|^\alpha ]^{p/\alpha }\). Thus

$$\begin{aligned} \Pr [Z_N^{p} \ge x ] \le \Pr \left[ N^{-1}\sum _1^N |X_i|^\alpha \ge [x2^{-p} - M_p(\mu )]^{\alpha /p}\right] . \end{aligned}$$

Next, we note that for \(y\ge 2 \log {\mathcal E}_{\alpha ,1}(\mu )\),

$$\begin{aligned} \Pr \left[ N^{-1}\sum _1^N |X_i|^\alpha \ge y\right] \le \exp (-N y + N \log {\mathcal E}_{\alpha ,1}(\mu ) ) \le \exp (-N y/2). \end{aligned}$$

The conclusion easily follows, since \(x\ge A_1\) implies that \(y:=[x2^{-p} - M_p(\mu )]^{\alpha /p}\ge 2 \log {\mathcal E}_{\alpha ,1}(\mu )\) and since \(y \ge [x2^{-p-1}]^{\alpha /p}-[M_p(\mu )]^{\alpha /p}\).

Step 5. Assume (2) and put \(\delta := 2p/\alpha -1\). Here we show that for all \(x>0\), \(N\ge 1\),

$$\begin{aligned} \Pr [Z_N^{p} \ge x ] \le C \exp (-a(Nx)^{\alpha /p})+C \exp (-a Nx^2 (\log (1+N))^{-\delta }). \end{aligned}$$

Step 5.1. For \(R>0\) (large) to be chosen later, we introduce the probability measure \(\mu ^R\) as the law of \(X_1{\mathbf{1}}_{\{|X_1|\le R\}}\). We also denote by \(\mu _N^R\) the corresponding empirical measure (coupled with \(\mu _N\) in that the \(X_i\)’s are used for \(\mu _N\) and the \(X_i{\mathbf{1}}_{\{|X_i|\le R\}}\)’s are used for \(\mu _N^R\)). We set \(Z_N^{p,R}:=\sum _{n\ge 0}2^{pn}\left| \mu _N^R(B_n)-\mu ^R(B_n)\right| \) and first observe that \(\left| Z^p_N-Z^{p,R}_N\right| \le 2^p N^{-1}\sum _1^N |X_i|^p{\mathbf{1}}_{\{|X_i|>R\}} + 2^p \int _{\{|x|>R\}}|x|^p\mu (dx)\). On the one hand, \(\int _{\{|x|>R\}}|x|^p\mu (dx)\le \exp (-R^\alpha /2) \int |x|^p e^{|x|^\alpha /2} \mu (dx) \le C \exp (-R^\alpha /2)\) by (2) (with \(\gamma =1\)). On the other hand, since \(\alpha \in (0,p]\), \(\sum _1^N |X_i|^p{\mathbf{1}}_{\{|X_i|>R\}}\le \left( \sum _1^N |X_i|^\alpha {\mathbf{1}}_{\{|X_i|>R\}}\right) ^{p/\alpha }\). Hence if \(x\ge A \exp (-R^\alpha /2)\), where \(A:=2^{p+1}C\),

$$\begin{aligned} \Pr \left( \left| Z^p_N-Z_N^{p,R}\right| \ge x\right) \le&\Pr \left( N^{-1}\sum _1^N |X_i|^p{\mathbf{1}}_{\{|X_i|>R\}} \ge x2^{-p-1} \right) \\ \le&\Pr \left( \sum _1^N |X_i|^\alpha {\mathbf{1}}_{\{|X_i|>R\}}\ge (Nx2^{-p-1})^{\alpha /p} \right) \\ \le&\exp (-(Nx2^{-p-1})^{\alpha /p}/2)\mathbb {E}\left[ \exp \left( |X_1|^\alpha {\mathbf{1}}_{\{|X_1|>R\}}/2\right) \right] ^N. \end{aligned}$$

Observing that \(\mathbb {E}[\exp (|X_1|^\alpha {\mathbf{1}}_{\{|X_1|>R\}}/2)]\le 1+ \mathbb {E}[\exp (|X_1|^\alpha /2){\mathbf{1}}_{\{|X_1|>R\}}] \le 1 + C\exp (-R^\alpha /2)\) by (2) and using that \(\log (1+u)\le u\), we deduce that for all \(x\ge 2^{p+1} C \exp (-R^\alpha /2)\),

$$\begin{aligned} \Pr \left( \left| Z^p_N-Z_N^{p,R}\right| \ge x\right) \le&\exp \left( - (Nx2^{-p-1})^{\alpha /p}/2+ CN \exp (-R^\alpha /2) \right) . \end{aligned}$$

With the choice

$$\begin{aligned} R:= (2\log (1+N))^{1/\alpha }, \end{aligned}$$
(5)

we finally find

$$\begin{aligned} \Pr \left( \left| Z^p_N-Z_N^{p,R}\right| \ge x\right) \le&\exp \left( - (Nx2^{-p-1})^{\alpha /p}/2 + C \right) \le C \exp \left( - a(Nx)^{\alpha /p}\right) \end{aligned}$$

provided \(x\ge A \exp (-R^\alpha /2)\), i.e. \((N+1)x \ge A\). As usual, this immediately extends to any value of \(x>0\), by increasing \(C\) so that the right-hand side exceeds \(1\) when \((N+1)x< A\).

Step 5.2. To study \(Z_N^{p,R}\), we first observe that since \(\mu ^R(B_n)=0\) if \(2^{n-1}\ge R\), we have \(2^{pn}\mu ^R(B_n) \le (2R)^{p-\alpha /2}2^{\alpha n/2}\mu ^R(B_n)\) for all \(n\ge 0\). Hence \(Z_N^{p,R}\le (2R)^{p-\alpha /2}Z_N^{\alpha /2,R}\). But \(\mu ^R\) satisfies \({\int _{{{\mathbb {R}}^d}}}\exp (|x|^\alpha ) \mu ^R(dx)<\infty \) uniformly in \(R\), so that we may use Steps 1, 2 and 4 (with \(p=\alpha /2<\alpha \)) to deduce that for all \(x>0\), \(\Pr \left( Z_N^{\alpha /2,R}\ge x\right) \le C \exp (-a N x^2)\). Consequently, \(\Pr \left( Z_N^{p,R}\ge x\right) \le C \exp (-a N (x/R^{p-\alpha /2})^2)\). Recalling (5) and that \(\delta = 2p/\alpha -1\), whence \(R^{2(p-\alpha /2)}=(2\log (1+N))^{(2p-\alpha )/\alpha }=(2\log (1+N))^{\delta }\), we see that \(\Pr \left( Z_N^{p,R}\ge x\right) \le C \exp \left( -a Nx^2 (\log (1+N))^{-\delta }\right) \). This ends the step.

Conclusion. Recall that \(x_0>0\) is fixed.

First assume (1). By Step 4, \(\Pr \left[ Z_N^{p} \ge x \right] \le C\exp (-aNx^{\alpha /p})\) for all \(x\ge A_1\). We deduce from Steps 1 and 2 that for \(x\in (0,A_1)\), \(\Pr \left[ Z_N^{p} \ge x \right] \le C\exp (-aNx^2)\). We easily conclude that for all \(x>0\), \(\Pr \left[ Z_N^{p} \ge x \right] \le C\exp (-aNx^2){\mathbf{1}}_{\{x\le x_0\}} + C\exp (-aNx^{\alpha /p}){\mathbf{1}}_{\{x>x_0\}}\) as desired.

Assume next (2). By Step 5, \(\Pr \left[ Z_N^{p} \ge x \right] \le C \exp (-a Nx^2 (\log (1+N))^{-\delta })+C\exp (-a(Nx)^{\alpha /p})\). But if \(x \ge x_0\), we clearly have \((Nx)^{\alpha /p} \le C Nx^2 (\log (1+N))^{-\delta }\) because \(\alpha <p\), so that \(\Pr \left[ Z_N^{p} \ge x \right] \le C\exp (-a(Nx)^{\alpha /p})\). If now \(x \le x_0\), we use Steps 1 and 2 to write \(\Pr \left[ Z_N^{p} \ge x\right] \le C\exp (-aNx^2)+C\exp (-a(Nx)^{(\alpha -{\varepsilon })/p})\).

Assume finally (3). By Steps 1 and 3, \(\Pr [Z_N^{p} \ge x ]\le C\exp (-aNx^2) + C N (Nx)^{-(q-{\varepsilon })/p}\) for all \(x>0\). But if \(x\ge x_0\), \(\exp (-aNx^2)\le \exp (-aNx)\le C (Nx)^{-(q-{\varepsilon })/p} \le C N (Nx)^{-(q-{\varepsilon })/p}\). We conclude that for all \(x>0\), \(\Pr [Z_N^{p} \ge x ]\le C\exp (-aNx^2){\mathbf{1}}_{\{x\le x_0\}} + C N (Nx)^{-(q-{\varepsilon })/p}\) as desired.

We can now give the

Proof of Theorem 2

Let us recall that the constants appearing in this proof may depend only on \(p,d\) and either on \(\alpha ,\gamma ,{\mathcal E}_{\alpha ,\gamma }(\mu )\) (under (1)) or on \(\alpha ,\gamma ,{\mathcal E}_{\alpha ,\gamma }(\mu ),{\varepsilon }\) (under (2)) or on \(q,M_q(\mu ),{\varepsilon }\) (under (3)).

Using Lemma 5, we write

$$\begin{aligned} {\mathcal T}_p(\mu _N,\mu )&\le \kappa _{p,d} {\mathcal D}_p(\mu _N,\mu )\\&\le \kappa _{p,d} \sum _{n\ge 0} 2^{pn}|\mu _N(B_n)-\mu (B_n)|\\&\quad +\kappa _{p,d}\sum _{n\ge 0} 2^{pn} \mu (B_n){\mathcal D}_p({\mathcal R}_{B_n}\mu _N,{\mathcal R}_{B_n}\mu )\\&=:\kappa _{p,d}( Z_N^p + V_N^p). \end{aligned}$$

Hence

$$\begin{aligned} \Pr ({\mathcal T}_p(\mu _N,\mu )\ge x) \le \Pr (Z_N^p \ge x/(2\kappa _{p,d})) +\Pr (V_N^p \ge x/(2\kappa _{p,d})). \end{aligned}$$

By Lemma 13 (choosing \(x_0:=1/(2\kappa _{p,d})\)), we easily find \(\Pr (Z_N^p \ge x/(2\kappa _{p,d})) \le Ce^{-cNx^2}{\mathbf{1}}_{\{x\le 1\}} +b(N,x) \le a(N,x){\mathbf{1}}_{\{x\le 1\}}+b(N,x)\), these quantities being defined in the statement of Theorem 2. We now check that there is \(A>0\) such that for all \(x>0\),

$$\begin{aligned} \Pr \left[ V_N^p \ge x/(2\kappa _{p,d}) \right] \le a(N,x){\mathbf{1}}_{\{x \le A\}}. \end{aligned}$$
(6)

This will end the proof, since one easily checks that \(a(N,x){\mathbf{1}}_{\{x \le A\}}\le a(N,x){\mathbf{1}}_{\{x \le 1\}}+b(N,x)\) (when allowing the values of the constants to change).

Let us thus check (6). For \(\eta >0\) to be chosen later, we set \(c:=(1-2^{-\eta })/(2\kappa _{p,d})\) and \(z_n:= c x 2^{-(p+\eta )n}/\mu (B_n)\). Observing that \(\sum _{n\ge 0} (1-2^{-\eta })2^{-\eta n}=1\), we write

$$\begin{aligned} \mathbb {P}\left( V_N^p \ge x/(2\kappa _{p,d}) \right)&\le \left( \sum _{n\ge 0} \mathbb {P}\left[ {\mathcal D}_p({\mathcal R}_{B_n}\mu _N,{\mathcal R}_{B_n}\mu ) \ge z_n\right] \right) \wedge 1\\&=: \left( \sum _{n\ge 0} K_n(N,x)\right) \wedge 1. \end{aligned}$$

From now on, the value of \(c>0\) is not allowed to vary anymore. We introduce another positive constant \(a>0\) whose value may change from line to line. We only assume (3) (which is implied by (1) or (2)). We now show that if \(\eta >0\) is chosen small enough, there is \(A>0\) such that

$$\begin{aligned} \sum _{n\ge 0} K_n(N,x) \le C \exp (-a N h(x)){\mathbf{1}}_{\{x \le A\}} \quad \hbox {if }\,Nh(x)\ge 1, \end{aligned}$$
(7)

where \(h(x)=x^2\) if \(p>d/2\), \(h(x)=(x/\log (2+1/x))^2\) if \(p=d/2\) and \(h(x)=x^{d/p}\) if \(p<d/2\). This will obviously imply as usual that for all \(x>0\),

$$\begin{aligned} \left( \sum _{n\ge 0} K_n(N,x)\right) \wedge 1 \le C \exp (-a N h(x)){\mathbf{1}}_{\{x\le A\}} \end{aligned}$$

and thus conclude the proof of (6). We thus only have to prove (7).

Conditionally on \(\mu _N(B_n)\), \({\mathcal R}_{B_n}\mu _N\) is the empirical measure of \(N\mu _N(B_n)\) points which are \({\mathcal R}_{B_n}\mu \)-distributed. Since \({\mathcal R}_{B_n}\mu \) is supported in \((-1,1]^d\), we may apply Proposition 10 and obtain

$$\begin{aligned} K_n(N,x)&\le C\mathbb {E}\left[ {\mathbf{1}}_{\{z_n\le 1\}} \exp \left( - a N\mu _N(B_n) h(z_n)\right) \right] \\&\le C {\mathbf{1}}_{\{z_n\le 1\}} \exp (-N\mu (B_n)(1- e^{-a h(z_n)})) \end{aligned}$$

by Lemma 12-(c). But the condition \(z_n\le 1\) implies that \(h(z_n)\) is bounded (by a constant depending only on \(p\) and \(d\)), whence

$$\begin{aligned} K_n(N,x)\le C{\mathbf{1}}_{\{z_n\le 1\}} \exp (- a N\mu (B_n)h(z_n)). \end{aligned}$$
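The bound borrowed from Lemma 12-(c) amounts to the elementary estimate on the Laplace transform of a binomial random variable: if \(S\) is Binomial\((N,\theta )\)-distributed, then for all \(\lambda \ge 0\),

$$\begin{aligned} \mathbb {E}[e^{-\lambda S}]=(1-\theta (1-e^{-\lambda }))^N\le \exp (-N\theta (1-e^{-\lambda })) \end{aligned}$$

by \(1-u\le e^{-u}\); it is applied here with \(\theta =\mu (B_n)\) and \(\lambda =a h(z_n)\), since \(N\mu _N(B_n)\) is Binomial\((N,\mu (B_n))\)-distributed.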

By (3), we have \(\mu (B_n)\le C 2^{-qn}\). Hence if \(x>A:=C/c\), we have \(z_n\ge (c/C)x2^{(q-p-\eta )n} >1\) for all \(n\ge 0\) (if \(\eta \in (0,q-p)\)) and thus \(\sum _{n\ge 0} K_n(N,x)=0\) as desired.

Next, we see that \(\theta \mapsto \theta h(x/\theta )\) is decreasing, whence for all \(x\le A\),

$$\begin{aligned} K_n(N,x)\le C \exp (- a N2^{-qn}h(c x 2^{(q-p-\eta )n}/C))\le C \exp (- a N2^{-qn}h(x 2^{(q-p-\eta )n})). \end{aligned}$$
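The monotonicity of \(\theta \mapsto \theta h(x/\theta )\) used above can be checked directly in the three cases:

$$\begin{aligned} \theta h(x/\theta )=\frac{x^2}{\theta }\;\;(p>d/2),\qquad \theta h(x/\theta )=\frac{x^2}{\theta \log ^2(2+\theta /x)}\;\;(p=d/2),\qquad \theta h(x/\theta )=x^{d/p}\theta ^{1-d/p}\;\;(p<d/2), \end{aligned}$$

each of these expressions being nonincreasing in \(\theta >0\) (recall that \(d/p>2\) in the last case).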

We now treat separately the three cases.

Step 1: case \(p>d/2\). Since \(h(x)=x^2\), we have, if \(\eta \in (0,q/2-p)\),

$$\begin{aligned} \sum _{n\ge 0} K_n(N,x) \le C \sum _{n\ge 0} \exp \left( -a N x^2 2^{n(q-2p-2\eta )}\right) \le C \exp (-a Nx^2) \end{aligned}$$

if \(Nx^2\ge 1\).

Step 2: case \(p=d/2\). Since \(h(x)=(x/\log (2+1/x))^2\), we have, if \(\eta \in (0,q/2-p)\),

$$\begin{aligned} \sum _{n\ge 0} K_n(N,x) \le \,&C \sum _{n\ge 0} \exp \left( -a N x^2 2^{(q-2p-2\eta )n}/\log ^2[2+1/(x2^{(q-p-\eta )n})]\right) \\ \le \,&C \sum _{n\ge 0} \exp (-a N h(x) 2^{n(q-2p-2\eta )})\\ \le \,&C\exp (-a N h(x)) \end{aligned}$$

if \(N h(x) \ge 1\). The second inequality only uses that \(\log ^2(2+1/(x2^{n(q-p-\eta )})) \le \log ^2(2+1/x)\).

Step 3: case \(p<d/2\). Here \(h(x)=x^{d/p}\). Since \(p<d/2\) and \(q>2p\), it holds that \(q(1-p/d)-p>0\). We thus may take \(\eta \in (0,q(1-p/d)-p)\) (so that \(q(d/p-1)-d-d\eta /p>0\)) and we get

$$\begin{aligned} \sum _{n\ge 0} K_n(N,x) \le&C \sum _{n\ge 0} \exp (-a N x^{d/p} 2^{n(q(d/p-1)-d-d\eta /p)}) \le C\exp (-a N x^{d/p}) \end{aligned}$$

if \(N x^{d/p} \ge 1\). \(\square \)

7 The dependent case

We finally study a few classes of dependent sequences of random variables. We only give some moment estimates. Concentration inequalities might also be obtained, but this seems much more delicate.

7.1 \(\rho \)-mixing stationary sequences

A stationary sequence of random variables \((X_n)_{n\ge 1}\) with common law \(\mu \) is said to be \(\rho \)-mixing, for some \(\rho :{\mathbb {N}}\rightarrow {\mathbb {R}}^+\) with \(\rho _n\rightarrow 0\), if for all \(f,g \in L^2(\mu )\) and all \(i,j\ge 1\)

$$\begin{aligned} \mathbb {C}\mathrm{ov}\,(f(X_i),g(X_j))\le \rho _{|i-j|}\sqrt{\mathbb {V}\mathrm{ar}\,(f(X_i))\mathbb {V}\mathrm{ar}\,(g(X_j))}. \end{aligned}$$

We refer for example to Rio [37], Doukhan [20] or Bradley [10].

Theorem 14

Consider a stationary sequence of random variables \((X_n)_{n\ge 1}\) with common law \(\mu \) and set \(\mu _N:=N^{-1}\sum _1^N \delta _{X_i}\). Assume that this sequence is \(\rho \)-mixing, for some \(\rho :{\mathbb {N}}\rightarrow {\mathbb {R}}^+\) satisfying \(\sum _{n\ge 0} \rho _n<\infty \). Let \(p>0\) and assume that \(M_q(\mu )<\infty \) for some \(q>p\). There exists a constant \(C\) depending only on \(p,d,q, M_q(\mu ),\rho \) such that, for all \(N\ge 1\),

$$\begin{aligned} \mathbb {E}\left( {\mathcal T}_p(\mu _N,\mu )\right) \le C\left\{ \begin{array}{l@{\quad }l} N^{-1/2} +N^{-(q-p)/q}&{} \!\!\!\quad \hbox {if }\,p>d/2 \quad \hbox { and }\quad \,q\ne 2p,\\ N^{-1/2} \log (1+N)+N^{-(q-p)/q} &{}\quad \!\!\! \hbox {if }\,p=d/2 \quad \hbox { and }\quad \,q\ne 2p,\\ N^{-p/d}+N^{-(q-p)/q} &{}\quad \!\!\!\hbox {if }\,p\in (0,d/2)\quad \hbox { and }\quad \,q\ne d/(d-p). \end{array}\right. \end{aligned}$$

This is very satisfying: we get the same estimate as in the independent case. The case \(\sum _{n\ge 0} \rho _n=\infty \) could also be treated (but then the upper bounds would be worse and would depend on the rate of decrease of \(\rho \)). Actually, the \(\rho \)-mixing condition is slightly too strong (we only need the covariance inequality when \(f=g\) is an indicator function), but it is the best adapted notion of mixing we found in the literature.

Proof

We first check that for any Borel subset \(A \subset {{\mathbb {R}}^d}\),

$$\begin{aligned} \mathbb {E}[|\mu _N(A)-\mu (A)|] \le \min \{2\mu (A),C(\mu (A)/N)^{1/2}\}. \end{aligned}$$

But this is immediate: \(\mathbb {E}[\mu _N(A)]=\mu (A)\) (whence \(\mathbb {E}[|\mu _N(A)-\mu (A)|]\le 2\mu (A)\)) and

$$\begin{aligned} \mathbb {V}\mathrm{ar}\,\mu _N(A)=&\frac{1}{N^2} \sum _{i,j\le N} \mathbb {C}\mathrm{ov}\,({\mathbf{1}}_A(X_i),{\mathbf{1}}_A(X_j))\\&\le \frac{1}{N^2} \sum _{i,j\le N} \rho _{|i-j|}\mathbb {V}\mathrm{ar}\,({\mathbf{1}}_A(X_1))\\&\le \frac{\mu (A)(1-\mu (A))}{N^2}\sum _{i,j\le N} \rho _{|i-j|}. \end{aligned}$$

This is smaller than \(C \mu (A)/N\) as desired, since \(\sum _{i,j\le N} \rho _{|i-j|}\le 2N \sum _{k\ge 0} \rho _k= C N\). By the Cauchy-Schwarz inequality, we conclude that \(\mathbb {E}[|\mu _N(A)-\mu (A)|]\le (\mathbb {V}\mathrm{ar}\,\mu _N(A))^{1/2}\le C(\mu (A)/N)^{1/2}\). Once this is done, it suffices to copy (without any changes) the proof of Theorem 1. \(\square \)

7.2 Markov chains

Here we consider an \({{\mathbb {R}}^d}\)-valued Markov chain \((X_n)_{n\ge 0}\) with transition kernel \(P\) and initial distribution \(\nu \in {\mathcal P}({{\mathbb {R}}^d})\) (the law of \(X_0\)) and we set \(\mu _N:=N^{-1}\sum _{1}^N \delta _{X_n}\). We assume that it admits a unique invariant probability measure \(\pi \) and satisfies the following \(L^2\)-decay property (usually related to a Poincaré inequality)

$$\begin{aligned} \forall \;n\ge 1,\; \forall \;f\in L^2(\pi ), \quad \Vert P^nf-\pi (f)\Vert _{L^2(\pi )}\le \rho _n \Vert f-\pi (f)\Vert _{L^2(\pi )} \end{aligned}$$
(8)

for some sequence \(\rho =(\rho _n)_{n\ge 1}\) decreasing to \(0\).
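For instance, if \(P\) is reversible with respect to \(\pi \) and admits a spectral gap \(\lambda \in (0,1]\) in \(L^2(\pi )\), then (8) holds with the geometric, and thus summable, rate

$$\begin{aligned} \rho _n=(1-\lambda )^n,\qquad \sum _{n\ge 1}\rho _n=\frac{1-\lambda }{\lambda }<\infty . \end{aligned}$$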

Theorem 15

Let \(p\ge 1\), \(d\ge 1\) and \(r> 2\) be fixed. Assume that our Markov chain \((X_n)_{n\ge 0}\) satisfies (8) with a sequence \((\rho _n)_{n\ge 1}\) satisfying \(\sum _{n\ge 1} \rho _n<\infty \). Assume also that the initial distribution \(\nu \) is absolutely continuous with respect to \(\pi \) and satisfies \(\Vert d\nu /d\pi \Vert _{L^r(\pi )}<\infty \). Assume finally that \(M_q(\pi )<\infty \) for some \(q>p r/(r-1)\). Setting \(q_r:=q(r-1)/r\) and \(d_r=d(r+1)/r\), there is a constant \(C\), depending only on \(p,d,r,q,\rho ,M_q(\pi )\) and \(\Vert d\nu /d\pi \Vert _{L^r(\pi )}\) such that for all \(N\ge 1\),

$$\begin{aligned} \mathbb {E}_\nu \left( {\mathcal T}_p(\mu _N,\pi )\right) \le C\left\{ \begin{array}{l@{\quad }l} N^{-1/2} +N^{-(q_r-p)/q_r}&{} \quad \hbox {if }\,p>d_r/2\,\quad \hbox { and }\quad \,q_r\ne 2p,\\ N^{-1/2} \log (1+N)+N^{-(q_r-p)/q_r} &{} \quad \hbox {if }\,p=d_r/2\,\quad \hbox { and }\quad \,q_r\ne 2p,\\ N^{-p/d_r}+N^{-(q_r-p)/q_r} &{}\quad \hbox {if }\,p\in (0,d_r/2)\,\quad \hbox { and }\quad \,q_r\ne d_r/(d_r-p). \end{array}\right. \end{aligned}$$

Once again, we might adapt the proof to get a complete picture corresponding to decays other than \(L^2\)-\(L^2\) and to slower mixing rates \((\rho _n)_{n\ge 1}\).

Proof

We only have to show that for any \(\ell \ge 0\), any \(n\ge 0\),

$$\begin{aligned} \Delta _{n,\ell }^N:=&\sum _{F\in {\mathcal P}_\ell }\mathbb {E}_\nu \left( |\mu _N(2^n F \cap B_n)-\pi (2^n F \cap B_n)| \right) \\ \le \,&C \min \left\{ (\pi (B_n))^{(r-1)/r}, [2^{d_r \ell }(\pi (B_n))^{(r-1)/r}/N]^{1/2} \right\} . \end{aligned}$$

Since \(M_q(\pi )<\infty \) (whence \(\pi (B_n)\le C 2^{-qn}\)), we will deduce that

$$\begin{aligned} \Delta _{n,\ell }^N \le C \min \left\{ 2^{-q_r n}, 2^{d_r \ell /2}(2^{-q_rn}/N)^{1/2} \right\} . \end{aligned}$$

Then the rest of the proof is exactly the same as that of Theorem 1, replacing everywhere \(q\) and \(d\) by \(q_r\) and \(d_r\).

We first check that \(\Delta _{n,\ell }^N \le C (\pi (B_n))^{(r-1)/r}\). Since the sets \(2^n F \cap B_n\), \(F\in {\mathcal P}_\ell \), are pairwise disjoint and included in \(B_n\), we have \(\Delta _{n,\ell }^N\le \mathbb {E}_\nu (\mu _N(B_n))+\pi (B_n)\le \mathbb {E}_\nu (\mu _N(B_n))+(\pi (B_n))^{(r-1)/r}\). Using the Hölder inequality, the stationarity of the chain under \(\mathbb {P}_\pi \) and that \(\Vert d\nu /d\pi \Vert _{L^r(\pi )}<\infty \), we write

$$\begin{aligned} \mathbb {E}_\nu (\mu _N(B_n))=\frac{1}{N}\sum _{i=1}^N\mathbb {E}_\pi \left[ \frac{d\nu }{d\pi }(X_0)1_{\{X_i\in B_n\}}\right] \le \Vert d\nu /d\pi \Vert _{L^r(\pi )} \pi (B_n)^{(r-1)/r}. \end{aligned}$$

We next consider a Borel subset \(A\) of \({{\mathbb {R}}^d}\) and check that

$$\begin{aligned} \mathbb {E}_\nu (|\mu _N(A)-\pi (A)|)\le C (\pi (A))^{(r-1)/(2r)} N^{-1/2}. \end{aligned}$$

To do so, as is usual when working with Markov chains or covariance properties (see [7]), we introduce \(f=1_A-\pi (A)\) and write

$$\begin{aligned} \mathbb {E}_\nu (|\mu _N(A)-\pi (A)|)=\frac{1}{N}\mathbb {E}_\nu \left( \left| \sum _{i=1}^N f(X_i)\right| \right) \le \frac{1}{N} \left( \sum _{i,j=1}^N\mathbb {E}_\nu (f(X_i)f(X_j))\right) ^{1/2}. \end{aligned}$$

For \(j\ge i\), it holds that

$$\begin{aligned} \mathbb {E}_\nu (f(X_i)f(X_j))=&\mathbb {E}_\nu [f(X_i)P^{j-i}f(X_i)] =\mathbb {E}_\pi \left[ \frac{d\nu }{d\pi }(X_0)f(X_i)\, P^{j-i}f(X_i)\right] . \end{aligned}$$

Using the Hölder inequality (recall that \(\Vert d\nu /d\pi \Vert _{L^r(\pi )}<\infty \) with \(r>2\)) and (8), we get

$$\begin{aligned} \mathbb {E}_\nu (f(X_i)f(X_j))&\le \Vert d\nu /d\pi \Vert _{L^r(\pi )} \Vert f\Vert _{L^{2r/(r-2)}(\pi )}\Vert P^{j-i}f\Vert _{L^2(\pi )}\\&\le C \rho _{j-i} \Vert f\Vert _{L^{2r/(r-2)}(\pi )}\Vert f\Vert _{L^2(\pi )} . \end{aligned}$$

Since, for \(s>1\), \(\Vert f\Vert _{L^s(\pi )}\le C_s(\pi (A)+(\pi (A))^s)^{1/s}\le C_s (\pi (A))^{1/s}\), we find \(\mathbb {E}_\nu (f(X_i)f(X_j))\le C \rho _{j-i}(\pi (A))^{(r-1)/r}\) and thus

$$\begin{aligned} \mathbb {E}_\nu (|\mu _N(A)-\pi (A)|)&\le \frac{C}{N} \left( \sum _{i,j=1}^N \rho _{|i-j|} (\pi (A))^{(r-1)/r} \right) ^{1/2} \\&\le C (\pi (A))^{(r-1)/(2r)} N^{-1/2} \end{aligned}$$

as desired. We used that \(\sum _{i,j=1}^N \rho _{|i-j|}\le C N\).

We can finally conclude that

$$\begin{aligned} \Delta _{n,\ell }^N \le&CN^{-1/2} \sum _{F \in {\mathcal P}_\ell } (\pi (2^n F \cap B_n))^{(r-1)/(2r)} \le CN^{-1/2} 2^{d_r \ell / 2} (\pi (B_n))^{(r-1)/(2r)} \end{aligned}$$

by the Hölder inequality (and because \(\# {\mathcal P}_\ell =2^{d\ell }\)), where \(d_r=d (r+1)/r\) as in the statement. \(\square \)
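The Hölder inequality is used above in the following elementary form: with \(s:=(r-1)/(2r)\in (0,1)\), since the sets \(2^n F \cap B_n\), \(F\in {\mathcal P}_\ell \), are pairwise disjoint and included in \(B_n\),

$$\begin{aligned} \sum _{F \in {\mathcal P}_\ell } (\pi (2^n F \cap B_n))^{s} \le (\# {\mathcal P}_\ell )^{1-s}\left( \sum _{F \in {\mathcal P}_\ell } \pi (2^n F \cap B_n)\right) ^{s} \le 2^{d\ell (r+1)/(2r)}(\pi (B_n))^{(r-1)/(2r)}. \end{aligned}$$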

7.3 McKean-Vlasov particle systems

Particle approximation of nonlinear equations has attracted a lot of attention in the past thirty years. We will focus here on the following \({{\mathbb {R}}^d}\)-valued nonlinear S.D.E.

$$\begin{aligned} dX_t=\sqrt{2}dB_t-\nabla V(X_t)dt-\nabla W*u_t(X_t)dt,\qquad X_0=x \end{aligned}$$

where \(u_t= Law(X_t)\) and \((B_t)\) is an \({{\mathbb {R}}^d}\)-valued Brownian motion. This is a probabilistic representation of the so-called McKean-Vlasov equation, which has been studied in particular by Carrillo et al. [12], Malrieu [28] and Cattiaux et al. [13], to which we refer for further motivations and for existence and uniqueness of solutions. We will mainly consider here the case where \(V\) and \(W\) are convex (and, if \(V=0\), the center of mass is fixed) and \(W\) is even. To fix ideas, let us consider only two cases:

(a) \(Hess\, V\ge \beta Id>0\), \(Hess\, W\ge 0\);

(b) \(V(x)=|x|^\alpha \) for \(\alpha >2\), \(Hess\, W\ge 0\).

The particle system introduced to approximate the nonlinear equation is the following. Let \((B^i_t)_{t\ge 0}\) be \(N\) independent Brownian motions. For \(i=1,\ldots ,N\), set \(X^{i,N}_0=x\) and

$$\begin{aligned} dX^{i,N}_t=\sqrt{2}dB^i_t-\nabla V(X^{i,N}_t)dt-\frac{1}{N}\sum _j\nabla W(X^{i,N}_t-X^{j,N}_t)dt. \end{aligned}$$
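For numerical purposes (this is not needed in what follows), this particle system is typically simulated by an Euler-Maruyama scheme with time step \(\Delta t>0\):

$$\begin{aligned} X^{i,N}_{t+\Delta t}= X^{i,N}_t+\sqrt{2}\,(B^i_{t+\Delta t}-B^i_t)-\nabla V(X^{i,N}_t)\Delta t-\frac{1}{N}\sum _j\nabla W(X^{i,N}_t-X^{j,N}_t)\Delta t, \end{aligned}$$

the increments \(B^i_{t+\Delta t}-B^i_t\) being independent centered Gaussian vectors with covariance matrix \(\Delta t\, Id\).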

The usual propagation of chaos property is concerned with the control of

$$\begin{aligned} {\mathcal T}_2(Law(X^{1,N}_t),u_t) \end{aligned}$$

uniformly (or not) in time. It is however very natural to consider instead a control of

$$\begin{aligned} {\mathcal T}_2(\hat{u}^N_t,u_t) \end{aligned}$$

where \(\hat{u}^N_t=\frac{1}{N}\sum _{i=1}^N\delta _{X^{i,N}_t}\), as in Bolley et al. [11].

To do so, and inspired by the usual proof of propagation of chaos, let us consider nonlinear independent particles

$$\begin{aligned} dX^i_t=\sqrt{2}dB^i_t-\nabla V(X_t^i)dt-\nabla W*u_t(X_t^i)dt,\qquad X^i_0=x \end{aligned}$$

(driven by the same Brownian motions as the particle system) and the corresponding empirical measure \(u^N_t=\frac{1}{N}\sum _{i=1}^N\delta _{X^{i}_t}\). We then have

$$\begin{aligned} {\mathcal T}_2\left( \hat{u}^N_t,u_t\right) \le 2{\mathcal T}_2\left( \hat{u}^N_t,u^N_t\right) +2{\mathcal T}_2\left( u^N_t,u_t\right) . \end{aligned}$$

Here we used that \({\mathcal T}_2={\mathcal W}_2^2\), the triangle inequality for \({\mathcal W}_2\) and the elementary bound \((a+b)^2\le 2a^2+2b^2\). Then following [28] in case (a) and [13] in case (b), one easily gets (for some time-independent constant \(C\))

$$\begin{aligned} \mathbb {E}\left( {\mathcal T}_2(\hat{u}^N_t,u^N_t)\right) \le \frac{1}{N}\mathbb {E}\left( \sum _{i=1}^N|X^{i,N}_t-X^i_t|^2\right) \le C\alpha (N) \end{aligned}$$

where \(\alpha (N)=N^{-1}\) in case (a) and \(\alpha (N)=N^{-1/(\alpha -1)}\) in case (b); the first inequality above simply transports each \(X^{i,N}_t\) onto \(X^i_t\). It is not hard to prove here that the nonlinear particles have finite moments of all orders (uniformly in time), so that combining Theorem 1 with the previous estimates gives

$$\begin{aligned} \sup _{t\ge 0}\mathbb {E}({\mathcal T}_2(\hat{u}^N_t,u_t))\le C(\alpha (N)+\beta (N)) \end{aligned}$$

where, applying Theorem 1 with \(p=2\) (and \(q\) arbitrarily large), \(\beta (N)=N^{-1/2}\) if \(d\le 3\), \(\beta (N)=N^{-1/2}\log (1+N)\) if \(d=4\) and \(\beta (N)=N^{-2/d}\) if \(d\ge 5\). For instance, in case (a) with \(d\le 3\), this yields \(\sup _{t\ge 0}\mathbb {E}({\mathcal T}_2(\hat{u}^N_t,u_t))\le C N^{-1/2}\).