1 Introduction

1.1 Classical transport-entropy inequalities

In transportation theory, an important achievement was the proof by Talagrand in [33] of the fact that the standard Gaussian measure \(\gamma \) in \(\mathbb{R }^n\) satisfies the transport-entropy inequality \(T_2(2)\) (named after Talagrand). We say that a probability measure \(\mu \) on \(\mathbb{R }^n\) satisfies the inequality \(T_p(C)\) for some \(C>0\) if, for any probability measure \(\nu \) on \(\mathbb{R }^n,\)

$$\begin{aligned} W_p^2(\nu , \mu ) \leqslant C H(\nu | \mu ), \end{aligned}$$

where

  • \(W_p(\mu ,\nu )\) is the Wasserstein distance of order \(p\) with respect to the Euclidean distance on \(\mathbb{R }^n\) between the two probability measures \(\mu \) and \(\nu ,\) that is

    $$\begin{aligned} W_p(\mu , \nu ) := \inf \left\{ \int _{(\mathbb{R }^n)^2} |x-y|^p d\pi (x,y); \pi _0 =\mu , \pi _1=\nu \right\} ^{\frac{1}{p}}, \end{aligned}$$

    with \(\pi _0\) and \(\pi _1\) respectively the first and second marginals of \(\pi \).

  • \(H( \cdot | \mu )\) is the (classical) relative entropy with respect to \(\mu ,\) that is

    $$\begin{aligned} H(\nu | \mu ) = \int \ln \left( \frac{d\nu }{d\mu }\right) d\nu , \end{aligned}$$

    if \(\nu \) is absolutely continuous with respect to \(\mu \) and \(+\infty \) otherwise. If \(\mu \) is the Lebesgue measure on \(\mathbb{R }^n,\) \(H(\cdot |\mu )\) is just called the classical entropy and denoted by \(H(\cdot ).\)
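In dimension 1, both quantities are easy to evaluate on examples, since the optimal coupling for \(W_p\) pairs order statistics. A minimal numerical sketch (our own illustration, not part of the original argument), using the exactly solvable case of a shifted Gaussian \(\nu =\mathcal{N}(m,1),\) for which \(W_2^2(\nu ,\gamma )=m^2\) and \(H(\nu |\gamma )=m^2/2\) give equality in \(T_2(2)\):

```python
import numpy as np

rng = np.random.default_rng(0)

def wasserstein_p(x, y, p=1):
    """W_p between two empirical measures with the same number of atoms.
    In dimension 1 the optimal coupling pairs order statistics."""
    x, y = np.sort(x), np.sort(y)
    return (np.mean(np.abs(x - y) ** p)) ** (1.0 / p)

# Check T_2(2) for the standard Gaussian against nu = N(m, 1):
# W_2^2(nu, gamma) = m^2 and H(nu | gamma) = m^2 / 2, so W_2^2 = 2 H exactly.
m, n = 1.5, 200_000
gamma = rng.standard_normal(n)
nu = gamma + m                       # translation: an optimal coupling
print(wasserstein_p(nu, gamma, p=2) ** 2)   # ≈ 2.25 = m^2
print(2 * (m**2 / 2))                       # = 2.25 = 2 H(nu|gamma)
```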

These inequalities give very important information on measures that satisfy them, since they are related to concentration properties and allow one to deduce precise deviation estimates starting from a large deviation principle (see the work of Gozlan and Léonard [22] for a discussion of these topics as well as an excellent review of the advances during the past decade in this field).

After the result of Talagrand, a lot of attention was devoted to proving similar inequalities beyond the Gaussian case; we will review only a few of them. It was proved by Otto and Villani in [30] that any probability measure \(\mu \) satisfying a log-Sobolev inequality with constant \(C\) also satisfies the inequality \(T_2(C)\). In particular, let \(\mu \) be a probability measure of the form \(d\mu (x)= e^{-V(x)}dx,\) for some potential \(V\) satisfying \({\text{ Hess } }V\geqslant \kappa {\text{ Id }}>0.\) In that case, for any probability measure \(\nu \) on \(\mathbb{R }^n,\)

$$\begin{aligned} W_2^2(\nu , \mu ) \leqslant \frac{2}{\kappa }H(\nu | \mu ) \end{aligned}$$
(1)

since, if \({\text{ Hess } }V\geqslant \kappa {\text{ Id }}>0,\) the measure \(d\mu (x)= e^{-V(x)}dx\) satisfies a log-Sobolev inequality with constant \(2/\kappa \).

There have been some attempts (e.g. [15, 16]) to generalise this result to potentials \(V\) that are no longer strictly convex, but the criteria that have been obtained are quite difficult to handle.

Furthermore, it seems that there is little room for improvement of the result of Otto and Villani, since the inequality \(T_2\) implies the Poincaré inequality for \(\mu \). Thus a measure \(\mu \) whose support is not connected cannot satisfy an inequality \(T_2\), since such measures do not satisfy the Poincaré inequality. For such measures, one can be interested in the inequality \(T_1\). Note that the inequality \(T_1(C)\) is weaker than \(T_2(C)\), by a direct application of the Cauchy-Schwarz inequality. The benefit is that the criteria for \(T_1\) are much easier to handle. In particular, in [5], Bobkov and Götze proved that a probability measure \(\mu \) satisfies \(T_1(2C)\) if and only if,

$$\begin{aligned} \int e^{f(x)} d\mu (x) \leqslant e^{C\frac{\Vert f\Vert _{Lip}^2}{2}} \end{aligned}$$

for all Lipschitz functions \(f\) such that \(\int f d\mu =0\) (with \(\Vert f\Vert _{Lip}\) denoting the Lipschitz constant of \(f\)). Later, Djellout et al. proved in [18] that this condition is equivalent to the much easier to handle condition that there exist \(\alpha >0\) and \(x_0\) such that

$$\begin{aligned} \int \exp (\alpha d(x,x_0)^2)d\mu (x)<+\infty . \end{aligned}$$

One can see from this latter criterion that compactly supported measures automatically satisfy a \(T_1\) inequality. Besides, if \(\mu \) is a measure of density proportional to \(\exp (-V(x))\) with \(V(x)\sim |x|^d\) for large \(x\), then \(\mu \) satisfies \(T_1\) if and only if \(d\geqslant 2\) (note the similar condition appearing in Hypothesis 1 below).
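As a simple illustration of the Bobkov and Götze criterion (a worked example we add here; the facts used are classical), consider the standard Gaussian measure \(\gamma \) and the centred linear functions \(f(x)=tx,\) which are \(|t|\)-Lipschitz:

$$\begin{aligned} \int e^{tx} d\gamma (x) = e^{\frac{t^2}{2}} = e^{\frac{\Vert f\Vert _{Lip}^2}{2}}. \end{aligned}$$

The criterion thus holds with \(C=1\) for these functions (and in fact for all centred Lipschitz functions, consistently with \(T_2(2)\Rightarrow T_1(2)\)), and it is saturated by them, so the constant cannot be improved for \(\gamma .\)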

1.2 Free transport-entropy inequalities

We review hereafter some results in the literature that are the analogues in the free probability context of the inequality \(T_2\) previously discussed. We assume that the reader has some minimal background in free probability, that can be found for example in [1].

In the free probability context, the semi-circle law, also called Wigner law, given by \(d\sigma (x)= \frac{1}{2\pi } \sqrt{4-x^2} \mathbf 1 _{[-2,2]}(x)dx,\) can for many reasons be seen as the free analogue of the standard Gaussian distribution. Therefore it is natural to ask whether the semi-circle law satisfies a free analogue of the transport-entropy inequality \(T_2,\) with the entropy replaced by the free entropy defined by Voiculescu (see [36] for a quick review). A positive answer to this question was given by Biane and Voiculescu in [12]: they showed that for any compactly supported probability measure \(\nu ,\)

$$\begin{aligned} W_2^2(\nu , \sigma ) \leqslant 2 \Sigma (\nu ), \end{aligned}$$

where \(\Sigma \) is the free entropy with respect to \(\sigma \) (called free entropy adapted to the free Ornstein-Uhlenbeck process in [12]).

The free entropy was introduced in full generality (even for multivariate tracial states) by Voiculescu; it is a profound and quite complicated object, but luckily, in the one-dimensional setting, one can give the following explicit expression for the free entropy with respect to \(\sigma \):

$$\begin{aligned} \Sigma (\nu ) = \frac{1}{2} \int x^2 d\nu (x) - \int \int \ln |x-y| d\nu (x)d\nu (y) - \frac{3}{4}. \end{aligned}$$
(2)
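As a sanity check on the constant \(\frac{3}{4}\) in (2), one can verify numerically that \(\Sigma (\sigma )=0,\) using \(\int x^2 d\sigma (x)=1\) and \(\int \int \ln |x-y|d\sigma (x)d\sigma (y)=-\frac{1}{4}.\) A short Monte Carlo sketch of ours, based on the standard fact that \(\sigma \) is an affine image of a \(\mathrm{Beta}(3/2,3/2)\) law:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10**6
# Semicircle samples on [-2, 2]: affine image of Beta(3/2, 3/2).
x = 4 * rng.beta(1.5, 1.5, n) - 2
y = 4 * rng.beta(1.5, 1.5, n) - 2            # independent copy

second_moment = np.mean(x**2)                 # should be close to 1
log_energy = np.mean(np.log(np.abs(x - y)))   # should be close to -1/4
sigma = 0.5 * second_moment - log_energy - 0.75
print(second_moment, log_energy, sigma)       # ≈ 1, ≈ -0.25, ≈ 0
```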

Since the semi-circle law \(\sigma \) is the analogue of the Gaussian law, one can now wonder what the free analogues \(\mu _V\) of the classical measures of the form \( e^{-V(x)}dx\) are. To define these probability measures \(\mu _V,\) we need to look at the probability measures defined on the space of \(N\) by \(N\) Hermitian matrices by:

$$\begin{aligned} d\mu ^N_V(X)\propto \exp (-NtrV(X))d^NX \end{aligned}$$

where  \(d^NX\) is the Lebesgue measure on the space of Hermitian matrices. In the sequel, we will assume that the potential \(V\) satisfies

Hypothesis 1

\(V\) is continuous and \(\liminf _{|x| \rightarrow \infty } \frac{V(x)}{x^2} >0\).

This ensures, for example, the existence of a normalising constant making \(\mu ^N_V\) a probability measure. Note that this hypothesis is a little more restrictive than the usual growth requirement for this model but seems necessary for our result (see the comments just after the statement of Theorem 3). If the matrix \(X^N\) is distributed according to the law \(\mu ^N_V\) then the joint law of the eigenvalues of \(X^N\) is the following:

$$\begin{aligned} \mathbb{P }^N_V(dx_1,\ldots ,dx_N)= \prod _{i<j}|x_i -x_j|^2\exp \left( -N\sum _{i=1}^NV(x_i)\right) \frac{\prod _{i=1}^Ndx_i}{Z^N_V}, \end{aligned}$$

with \(Z^N_V\) a normalising constant. This can be seen as the density of a Coulomb gas, that is, \(N\) particles in the potential \(NV\) with a repulsive electrostatic interaction. Under the law \(\mathbb{P }^N_V\), the particles \(x_1,\ldots ,x_N\) tend to be near the minima of \(V,\) but due to the Vandermonde determinant they cannot be too close to each other. The study of how these two effects reach an equilibrium is difficult, yet well studied. We recall hereafter a few facts about their behaviour. First, if we introduce the empirical measure \({\widehat{\mu }_N}:=\frac{1}{N}\sum _{i=1}^N\delta _{x_i}\), the density of \(\mathbb{P }^N_V\) can be written as

$$\begin{aligned} \mathbb{P }^N_V(dx_1,\ldots ,dx_N)= \exp (-N^2{\widetilde{J}_V}({\widehat{\mu }_N}))\frac{\prod _{i=1}^Ndx_i}{Z^N_V} \end{aligned}$$

with, for any probability measure \(\mu ,\)

$$\begin{aligned} \widetilde{J}_V(\mu ) = \int V(x)d\mu (x)-\int \int _{x\ne y} \ln |x-y| d\mu (x) d\mu (y). \end{aligned}$$
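For the quadratic potential \(V(x)=x^2/2,\) the law \(\mathbb{P }^N_V\) is exactly the joint eigenvalue law of a GUE matrix normalised so that its density is proportional to \(\exp (-\frac{N}{2}{\text{ tr }}X^2)\) (see Sect. 3 for the link with matrix models). This gives a cheap way to simulate the gas and observe the equilibrium phenomenon discussed next; a hedged numerical sketch of ours:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500
# GUE normalised so that the matrix density is prop. to exp(-(N/2) Tr X^2):
# E|X_ij|^2 = 1/N off the diagonal, Var(X_ii) = 1/N on the diagonal.
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
X = (G + G.conj().T) / (2 * np.sqrt(N))
eigs = np.linalg.eigvalsh(X)

# The empirical measure concentrates on the semicircle law on [-2, 2]:
hist, edges = np.histogram(eigs, bins=30, range=(-2.2, 2.2), density=True)
mids = (edges[:-1] + edges[1:]) / 2
semicircle = np.sqrt(np.clip(4 - mids**2, 0, None)) / (2 * np.pi)
print(np.max(np.abs(hist - semicircle)))  # small for large N
```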

One can expect that in the large \(N\) limit, the eigenvalues should organise according to the minimiser of this functional. We recall hereafter a result of the classical theory of logarithmic potentials which will define the family of measures \(\mu _V\) that are the free probability analogues of the probability measures of the form \( e^{-V(x)}dx.\) This result is Theorem 1.3 in Chapter 1 of [32], simplified by the use of Theorem 4.8 in the same chapter, which implies the continuity of the logarithmic potential. The books [1, 17] also present similar results, from a perspective closer to random matrix theory, but later on we will need some more involved results from the book of Saff and Totik, so we try not to drift too far from their notations. Let us denote, for \(X\) a Polish space, by \(\mathcal{P }(X)\) the set of probability measures on \(X\).

Theorem 1

(Equilibrium measure of a potential) Let \(V\) be a function satisfying Hypothesis 1. Define for \(\mu \) in \(\mathcal{P }(\mathbb{R })\),

$$\begin{aligned} J_V(\mu ) = \int _\mathbb{R }V(x)d\mu (x)- \int \int _{\mathbb{R }^2} \ln |x-y| d\mu (x) d\mu (y) \end{aligned}$$

with the convention \(J_V(\mu )=+\infty \) as soon as \(\int Vd\mu =+\infty \). Then \(c_V = \inf _{\nu \in \mathcal{P }(\mathbb{R })} J_V(\nu )\) is a finite constant and the minimum of \(J_V\) is achieved at a unique probability measure \(\mu _V,\) called the equilibrium measure, which has compact support. Besides, if we define the logarithmic potential of \(\mu _V\) as

$$\begin{aligned} U_{\mu _V}(x) = - \int \ln |x-y| d\mu _V(y), \end{aligned}$$

for all \(x \in \mathbb{C }\) then \(U_{\mu _V}\) is finite and continuous on \(\mathbb{C }\) and \(\mu _V\) is the unique probability measure on \(\mathbb{R }\) for which there exists a constant \(C_V\) such that:

$$\begin{aligned} -2U_{\mu _V}(x)+C_V&\leqslant V(x)\quad \quad \text{ for } \text{ all } x \text{ in } \mathbb{R },\\ -2U_{\mu _V}(x)+C_V&= V(x)\quad \quad \text{ for } \text{ all } x \text{ in } \text{ the } \text{ support } \text{ of } \mu _V. \end{aligned}$$

\(C_V\) is related to \(c_V\) by the formula \(C_V = 2c_V-\int V(x) d\mu _V(x)\).

This allows to define the free entropy relative to the potential \(V\) as follows: for any \(\mu \in \mathcal{P }(\mathbb{R }),\)

$$\begin{aligned} \Sigma _V(\mu )=J_V(\mu )-c_V = J_V(\mu )-J_V(\mu _V). \end{aligned}$$

This quantity is always non-negative and vanishes only at \(\mu _V\). One can check that the functional \(\Sigma \) introduced in (2) coincides with \(\Sigma _{x^2/2}.\)
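Indeed, for \(V(x)=x^2/2\) the equilibrium measure is the semi-circle law \(\sigma ,\) and, using the classical values \(\int x^2d\sigma (x)=1\) and \(\int \int \ln |x-y|d\sigma (x)d\sigma (y)=-\frac{1}{4},\) one gets \(c_{x^2/2}=J_{x^2/2}(\sigma )=\frac{1}{2}+\frac{1}{4}=\frac{3}{4},\) so that

$$\begin{aligned} \Sigma _{x^2/2}(\nu )=J_{x^2/2}(\nu )-\frac{3}{4}= \frac{1}{2} \int x^2 d\nu (x) - \int \int \ln |x-y| d\nu (x)d\nu (y) - \frac{3}{4}, \end{aligned}$$

which is exactly (2).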

Let us make a few remarks on the functional \(\Sigma _V.\) First, Theorem 1 studies the optimum of the functional \(J_V\) but not how it is related to the typical distribution of the \(x_i\)'s under the law \(\mathbb{P }^N_V\). This is the goal of the work of Ben Arous and Guionnet [2] (see also the book [1] for a slightly different point of view), from which we want to recall the following result:

Theorem 2

(Large deviations for the empirical measure) Let \(V\) be a function satisfying Hypothesis 1. Under the law \(\mu ^N_V,\) the sequence of random measures \({\widehat{\mu }_N}\) satisfies a large deviation principle at speed \(N^2\) with good rate function \(\Sigma _V\).

We refer the reader not familiar with the theory of large deviations to [20]. By comparison with Sanov's theorem, where the classical relative entropy appears as the good rate function, this result can be seen as a justification of the name “free relative entropy” for \(\Sigma _V\).

Another reason is that \(\Sigma _V\) appears as a limit of classical relative entropy. Indeed, under some additional assumptions on \(V\) and \(W,\) we have

$$\begin{aligned} \lim _N \frac{1}{N^2}H(\mu ^N_W|\mu ^N_V)=\Sigma _V(\mu _W). \end{aligned}$$

We can now state a generalisation of the result of Biane and Voiculescu, which can be seen as a free analogue of the classical result (1). It was first proved by Hiai et al. in [26] using random matrix approximations and classical inequalities. Let \(V\) be a strictly convex function with \(V^{\prime \prime }(x)\geqslant \kappa >0\) on \(\mathbb{R }\). Then, for any probability measure \(\nu \)

$$\begin{aligned} W_2^2(\nu , \mu _V) \leqslant \frac{2}{\kappa } \Sigma _V(\nu ). \end{aligned}$$

The same result was later proved in a very direct way by Ledoux and Popescu [29].

Finally, let us finish this quick review by mentioning two interesting directions that could extend these works. First, in view of this result and the one by Otto and Villani, a natural question is to ask whether a free analogue of the log-Sobolev inequality (see the work of Biane [9] for the construction of such an object) is sufficient to obtain a free transport inequality. While the methods of [12] have some similarities with the ones of [30] this remains an open problem.

Another natural extension of these results would be to look at the multivariable case. As pointed out above, in several variables, the free entropy is a much more difficult object to handle and the theory of non-commutative transport is still in its infancy. The recent paper of Guionnet and Shlyakhtenko [23] lays some foundations and highlights many pitfalls of this theory. Still, the Wasserstein distance is well defined, and in some cases, such as an \(n\)-tuple of semi-circular variables, one can define a notion of free relative entropy. In [4], Biane and Dabrowski prove a version of the free Talagrand inequality for an \(n\)-tuple of semi-circular variables.

1.3 Statement of the free \(T_1\) inequality

The problem we want to address in this work is to prove a free analogue of the result of Bobkov and Götze [5], thus providing a free transport-entropy inequality for measures \(\mu _V\) beyond the case of convex potentials, which was treated in the work of Hiai, Petz and Ueda. As pointed out above, even in the classical context, there is no reason for measures coming from non-convex potentials to satisfy \(T_2\). Thus we will prove an analogue of the inequality \(T_1.\) Our main result can be stated as follows:

Theorem 3

(Free \(T_1\) inequality) Let \(V\) be a function satisfying Hypothesis 1. Then there exists a constant \(B_V\) such that, for any probability measure \(\nu \) on \(\mathbb{R }\),

$$\begin{aligned} W_1^2(\nu ,\mu _V) \leqslant B_V \Sigma _V(\nu ). \end{aligned}$$

Let us make some comments on the role of Hypothesis 1.

Remark 1

It is well known that the equilibrium measure exists (and is compactly supported) under the hypothesis that \(\liminf V(x)(\ln |x|)^{-1}>2\) (see Theorem 1.3 in [32] or Lemma 2.6.2 in [1]). Let us check that the result is trivially false if this holds but \(V\) is negligible with respect to \(x^2\).

If \(\nu _n\) is the uniform law on \([n;n+1],\) then \(W_1(\nu _n,\mu _V)^2\) is equivalent to \(n^2\) (the cost of transporting the mass at \(x\) for the measure \(\mu _V\) to the measure \(\nu _n\) is bounded from below by \(|n-x|-1,\) which gives the result after integrating against \(\mu _V\)). But \(\Sigma _V(\nu _n)\) grows like \(\nu _n(V),\) which is less than quadratic. Thus \(\mu _V\) does not satisfy a free \(T_1\) inequality.

This argument can also be extended to tackle the case of non-compactly supported measures. Indeed, in [25], the author considers equilibrium measures for potentials with weaker growth and shows that the equilibrium measure still exists if we only assume that \(\liminf V(x)-2\ln |x|>-\infty \). Note that in that case the equilibrium measure is not necessarily compactly supported. Still, if \(V\) is negligible in front of \(x^2,\) the argument of the previous paragraph still applies.

Note that there is room for improvement, since we do not say anything about potentials which do not satisfy Hypothesis 1 but are not negligible in front of \(x^2\) (oscillating potentials for example). A guess, by analogy with the classical case, for a necessary and sufficient condition for a free \(T_1\) inequality for \(\mu _V\) would be slightly more general than Hypothesis 1, namely that for some \(\alpha >0\),

$$\begin{aligned} \int \exp (\alpha x^2)\exp (-V(x))dx<\infty . \end{aligned}$$

This remains an open question.

A natural strategy to try to prove this theorem, following the idea of [26], is to look at a finite dimensional approximation by matrix models. The issue with this approach is that, while for the classical \(T_2\) inequality the constant in front of the entropy is explicitly related to the potential and behaves nicely when the dimension increases, this is no longer the case for \(T_1.\) In [13], Bolley and Villani managed to explicitly link the constant to the potential, but when applied in this case the constant deteriorates very quickly with the dimension. Thus we will need some new tools to get our result. The main ingredient that we will use to adapt the proof of Bobkov and Götze is potential theory.

Since Theorem 3 is only stated for measures of the form \(\mu _V\), one may have the false impression that it is restricted to this particular case. In fact it is relevant for a quite large class of measures. A difficulty is that one may want to think of the functional \(\Sigma _V\) as the entropy relative to the measure \(\mu _V,\) but we must be careful, since different \(V\)'s can lead to the same equilibrium measure while defining different notions of this relative entropy.

The first step is to get rid of this dependence on the potential. Let \(\mu \) be a probability measure with compact support \(S_\mu \) in \(\mathbb{R }\) such that its logarithmic potential \( U_\mu (x) = - \int \ln |x-y| d\mu (y)\) exists and is continuous on \(\mathbb{C }\). Then the potential \(V(x)=-2U_\mu (x)+(d(x,S_\mu ))^2\) satisfies Hypothesis 1 and, using Theorem 1, it is easy to see that \(\mu _V=\mu \).

Now, if \(\nu \) is a probability measure supported in \(S_\mu \):

$$\begin{aligned} \Sigma _V(\nu )&= \int V d\nu - \int \int _{\mathbb{R }^2} \ln |x-y| d\nu (x) d\nu (y)-c_V\\&= 2\int \int _{\mathbb{R }^2} \ln |x-y| d\mu (x) d\nu (y) - \int \int _{\mathbb{R }^2} \ln |x-y| d\nu (x) d\nu (y)-c_V+C_V\\&= -\int \int _{\mathbb{R }^2} \ln |x-y| d(\nu -\mu )(x) d(\nu -\mu )(y) \end{aligned}$$

where we used Theorem 1 in the second line, and it is easy to check that there is no constant in the last line since the expression must be \(0\) for \(\nu =\mu \).

This allows to define a relative free entropy which does not depend on a potential but only on a measure:

$$\begin{aligned} \Sigma _\mu (\nu )=-\int \int _{\mathbb{R }^2} \ln |x-y| d(\nu -\mu )(x) d(\nu -\mu )(y) \end{aligned}$$

if the support of \(\nu \) is included in \(S_\mu \) and \(\Sigma _\mu (\nu )=+\infty \) otherwise. By construction we have \(\Sigma _V(\nu )\leqslant \Sigma _{\mu _V}(\nu ),\) with equality for all probability measures on \(S_{\mu _V}\). Informally, another way to express the link between the two is:

$$\begin{aligned} \Sigma _{\mu }=\sup _{V|\mu _V=\mu }\Sigma _V= \Sigma _{-2U_\mu +\infty \mathbf{1}_{S_\mu ^c}}. \end{aligned}$$

With this new quantity, Theorem 3 implies:

Theorem 4

(Free \(T_1\) inequality, version for probability measures) For any \(\mu \in \mathcal{P }(\mathbb{R })\) with compact support such that its logarithmic potential \( U_\mu (x) = - \int \ln |x-y| d\mu (y)\) exists and is continuous on \(\mathbb{C }\), there exists a constant \(B_\mu \) such that for any probability measure \(\nu \)

$$\begin{aligned} W_1^2(\nu , \mu ) \leqslant B_\mu \Sigma _\mu (\nu ). \end{aligned}$$

This version of the theorem, which can be seen as a local one (it only gives information for \(\mu \) and \(\nu \) living in a given compact set \(K\)), has since been recovered with completely different methods by Popescu in [31], with an optimal constant depending only on the size of \(K\).

Note that since \(\mu \) is compactly supported the result of Bobkov and Götze also applies and gives:

$$\begin{aligned} W_1^2(\nu , \mu ) \leqslant C_\mu H(\nu |\mu ). \end{aligned}$$

A natural question is to ask whether our free inequality is a direct consequence of the classical one. The following proposition shows that it is not:

Proposition 1

Let \(\lambda \) be the uniform law on \([0;1]\), then

$$\begin{aligned} \sup _{\nu \in \mathcal{P }([0,1]),\nu \ne \lambda } \frac{H(\nu |\lambda )}{\Sigma _\lambda (\nu )} =\infty . \end{aligned}$$

Proof

The proof of the proposition is essentially a direct calculation. Consider the uniform law \(\nu _n\) on

$$\begin{aligned} \bigcup _{i=0}^{n-1}\frac{i}{n}+\left[ 0;\frac{1}{n^2}\right] . \end{aligned}$$

Then \(H(\nu _n|\lambda )=\ln (n),\) but \(\Sigma _\lambda (\nu _n)\) remains bounded, since the double logarithmic part is equivalent to the convergent Riemann sum

$$\begin{aligned} \frac{1}{n^2}\sum _{1\leqslant i\ne j\leqslant n}\ln \left| \frac{i}{n}-\frac{j}{n}\right| . \end{aligned}$$

\(\square \)
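The blow-up in Proposition 1 can be checked numerically (a hedged sketch of ours): \(H(\nu _n|\lambda )=\ln n\) grows, while the Riemann sum above stays bounded (it converges to \(\int \int _{[0;1]^2}\ln |x-y|dxdy=-\frac{3}{2}\)).

```python
import numpy as np

def riemann_log_sum(n):
    """(1/n^2) * sum over i != j of ln|i/n - j/n|,
    the double-log part of Sigma_lambda(nu_n)."""
    i = np.arange(n, dtype=float)
    d = np.abs(i[:, None] - i[None, :]) / n
    np.fill_diagonal(d, 1.0)                 # mask the diagonal, ln(1) = 0
    return np.sum(np.log(d)) / n**2

for n in [10, 100, 1000]:
    # entropy grows like ln n, the log sum stays bounded (-> -3/2)
    print(n, np.log(n), riemann_log_sum(n))
```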

1.4 Concentration property for \(\beta \)-ensembles

As mentioned in our quick review of classical transport-entropy inequalities at the beginning of the introduction and detailed in [22], those inequalities are intimately linked with concentration properties of the measures involved. This is the problem we address in the last part of the paper.

There are many concentration results for families of Wigner or Wishart type matrices (the interested reader can consult for example [24], Chapter 4.4 in [1], Section 8.5 in [3], or [7, 14, 21, 28]), but we want to emphasise that very few results are known in other cases of interest, in particular the so-called \(\beta \)-ensembles.

For \(\beta >0,\) we consider the empirical measure \({\widehat{\mu }_N}\) of the \(x_i\)’s distributed according to the measure

$$\begin{aligned} \mathbb{P }^N_{V,\beta }(dx_1,\ldots ,dx_N)= \prod _{i<j}|x_i -x_j|^\beta \exp \left( -N\sum _{i=1}^NV(x_i)\right) \frac{\prod _{i=1}^Ndx_i}{Z^N_{V,\beta }}. \end{aligned}$$

This corresponds to a matrix model for \(\beta =1,2,4\). We look at the concentration of \({\widehat{\mu }_N}\) around its limit law \(\mu _{\frac{2V}{\beta }}.\)

For matrix models (\(\beta =1\) or 2) with strictly convex potentials, Proposition 4.4.26 in [1] shows that if \(V\) is \(\mathcal C ^\infty \) with \(V^{\prime \prime }\geqslant \kappa >0\) and \(V^\prime \) has polynomial growth at infinity, then for all \(\theta \geqslant 0\) and all \(1\)-Lipschitz functions \(f\),

$$\begin{aligned} \mathbb{P }^N_{V,\beta }\left( \left| \frac{1}{N}{\text{ tr }}f-\int \frac{1}{N}{\text{ tr }}f\,d\mathbb{P }^N_{V,\beta }\right| >\theta \right) <e^{-N^2\frac{\kappa \theta ^2}{2}}. \end{aligned}$$
(3)

To extend such estimates to non-convex potentials, we will rely on our Theorem 3. Indeed, in [8], Bolley et al. show how to deduce from Talagrand's inequalities explicit bounds on the convergence of the empirical measure of independent variables towards their common law. For example, if \(X_1,\ldots ,X_n,\ldots \) are independent variables in \(\mathbb{R }^d\) with law \(\mu \) satisfying \(T_p(C)\) with \(1\leqslant p\leqslant 2\), then for any \(d^{\prime }<d\) and any \(C^{\prime }<C\), there exists \(N_0>0\) such that for all \(N>N_0\) and all \(\theta > v(N/N_0)^{-\frac{1}{2+d^{\prime }}}\),

$$\begin{aligned} \mathbb{P }\left( W_1\left( \frac{1}{N}\sum _{i=1}^N\delta _{X_i}, \mu \right) >\theta \right) <e^{-\gamma _p \frac{C^{\prime }}{2}N\theta ^2} \end{aligned}$$

with \(\gamma _p\) an explicit constant depending on \(p\) in a very simple way. These results have been extended in [10, 11].

Similarly, in our context, as we know that, under \(\mathbb{P }^N_{V,\beta },\) the empirical measure \({\widehat{\mu }_N}\) converges almost surely to \(\mu _{\frac{2V}{\beta }},\) it is natural to ask whether we can control the tail of the distribution of the random variable \(W_1({\widehat{\mu }_N},\mu _{\frac{2V}{\beta }}).\)

In comparison with Theorem 3, we need here some additional assumptions, for technical reasons that will appear more clearly in the course of the proofs.

Hypothesis 2

  a.

    \(V\) satisfies Hypothesis 1, is locally Lipschitz, differentiable outside a compact set, and there exist \(\alpha >0\) and \(d\geqslant 2\) such that \(|V^\prime (x)| \sim _{|x|\rightarrow +\infty } \alpha |x|^{d-1}.\)

  b.

    \(V\) and \(\beta >0\) are such that the equilibrium measure \(\mu _{\frac{2V}{\beta }}\) has finite classical entropy.

Condition b. is not as restrictive as it may seem, due to a result of Deift, Kriecherbauer and McLaughlin: a direct consequence of the main result in [19] is that it is satisfied as soon as \(V\) is \(\mathcal C ^2.\) Note also that, consequently, Hypothesis 2 is satisfied in the particular case of a polynomial of even degree with positive leading coefficient.

Our concentration result around the limiting measure is as follows:

Theorem 5

(Concentration for \(\beta \)-ensembles) Let \(V\) and \(\beta >0\) satisfy Hypothesis 2. Then there exist \(u,v >0\) such that for any \(\theta >v\sqrt{\frac{\ln (1+N)}{ N}}\),

$$\begin{aligned} \mathbb{P }^N_{V,\beta }\left( W_1({\widehat{\mu }_N},\mu _{\frac{2V}{\beta }} )\geqslant \theta \right) \leqslant e^{-u N^2 \theta ^2}. \end{aligned}$$

The result above is stated for potentials which are equivalent to a power at infinity, but this hypothesis can be relaxed if we restrict the result to a compact set, as stated in Theorem 7.

In comparison to (3), the strength of our result is that it is valid for any \(\beta >0\), does not require any convexity assumption and gives a bound simultaneously on all Lipschitz functions, since \(W_1(\mu ,\nu )=\sup _{f\,\, 1-{Lip}}|\mu (f)-\nu (f)|\). On the other hand, our method does not allow us to get a bound for all \(\theta \geqslant 0,\) and the constant in the exponential decay is not explicit.
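In the exactly solvable case \(\beta =2,\) \(V(x)=x^2/2\) (the GUE eigenvalue law, with \(\mu _{\frac{2V}{\beta }}=\sigma \)), the scale \(\sqrt{\ln (1+N)/N}\) of Theorem 5 can be observed numerically. A hedged sketch of ours, using that in dimension 1 the distance \(W_1\) equals the \(L^1\) distance between cumulative distribution functions:

```python
import numpy as np

rng = np.random.default_rng(3)
grid = np.linspace(-2.5, 2.5, 4001)
dx = grid[1] - grid[0]

def semicircle_cdf(t):
    t = np.clip(t, -2.0, 2.0)
    return 0.5 + (t * np.sqrt(4 - t**2) + 4 * np.arcsin(t / 2)) / (4 * np.pi)

def w1_to_semicircle(eigs):
    # In dimension 1, W_1(mu, nu) is the L^1 distance between the two CDFs.
    F_hat = np.searchsorted(np.sort(eigs), grid, side="right") / len(eigs)
    return np.sum(np.abs(F_hat - semicircle_cdf(grid))) * dx

for N in [50, 200, 800]:
    G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    eigs = np.linalg.eigvalsh((G + G.conj().T) / (2 * np.sqrt(N)))
    print(N, w1_to_semicircle(eigs), np.sqrt(np.log(1 + N) / N))
```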

The rest of the paper is divided into two parts: the first one proves the free transport-entropy inequality of Theorem 3; the second one deduces from it the concentration estimate of Theorem 5.

2 Free \(T_1\) inequality

This section is devoted to the proof of our main result, Theorem 3. In the first part, we build some useful tools from potential theory. In the second part, we prove the result restricted to a fixed compact set. The third part extends the result to measures with arbitrary support.

2.1 Lipschitz perturbations of the potential

The first ingredient of the proof is to evaluate the distance between the equilibrium measures corresponding to two potentials obtained from one another by a Lipschitz perturbation. Propositions 2 and 3 will be particularly useful in the case when the perturbation is Lipschitz but we state them in a slightly more general context.

Before giving the statements of these propositions, we first need the following lemma, which crucially uses the properties of the Hilbert transform. This should be classical, but we did not find a proper reference, so we give its proof for the sake of completeness. We denote by \(L^2(\mathbb{R })\) the set of functions \(f\) such that \(\int f^2(x) dx < \infty . \)

Lemma 1

Let \(\mu \) be a compactly supported probability measure on \(\mathbb{R }\) whose logarithmic potential \(U_{\mu }(x)=-\int \ln |x-y|d\mu (y)\) is continuous on \(\mathbb{C }\). Then if \(g\) is a continuously differentiable function on \(\mathbb{R }\) with compact support,

$$\begin{aligned} \int _\mathbb{R }g(x)d\mu (x)=\int _\mathbb{R }(Hg)^\prime (x)U_\mu (x)dx \end{aligned}$$

where \(H\) is the Hilbert transform: for \(f \in L^2(\mathbb{R }), \forall x \in \mathbb{R },\)

$$\begin{aligned} (Hf)(x)= \mathrm{p.v.}\!\int \frac{f(y)}{x-y}\,dy := \lim _{\varepsilon \downarrow 0} \int _{\mathbb{R }\setminus [x-\varepsilon , x+\varepsilon ]} \frac{f(y)}{x-y}\,dy. \end{aligned}$$

The proof of the lemma uses properties of the Hilbert transform, in particular Titchmarsh's theorem (see Chapter 5 of [34], Theorem 93):

Property 1

  a.

    \(H\) is an isometry on \(L^2(\mathbb{R })\) and \(H^2=-id.\)

  b.

    If \(\psi \) is holomorphic on the upper half plane with

    $$\begin{aligned} \sup _{y>0}\int _\mathbb{R }|\psi (x+iy)|^2dx<+\infty \end{aligned}$$

    and \(\psi (x)=f(x)+ig(x)\) for \(x\) in \(\mathbb{R }\), for real valued functions \(f\) and \(g\), then \(H g=-f\) and \(H f = g\).

  c.

    If \(f\) is in \(L^2(\mathbb{R }),\) differentiable and such that \(f^\prime \) is in \(L^2(\mathbb{R }),\) then \(Hf\) is differentiable and \( (Hf)^\prime =H(f^\prime ).\) Moreover, if \(f\) is continuously differentiable, so is \(Hf.\)
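Property 1.a can be checked numerically on a periodic discretisation, implementing the (standard, \(1/\pi \)-normalised) transform through its Fourier multiplier \(-i\,\mathrm{sgn}(\xi )\); this is only a rough illustration of ours, as the discrete transform is an approximation:

```python
import numpy as np

# Discrete check of Property 1.a via the Fourier multiplier -i * sign(xi).
n, L = 2**14, 200.0
x = (np.arange(n) - n // 2) * (L / n)
xi = np.fft.fftfreq(n, d=L / n)

def hilbert(f):
    return np.real(np.fft.ifft(-1j * np.sign(xi) * np.fft.fft(f)))

f = np.exp(-x**2) * np.sin(3 * x)    # smooth, rapidly decaying, mean zero
Hf = hilbert(f)
print(np.linalg.norm(Hf) / np.linalg.norm(f))                 # ≈ 1 (isometry)
print(np.linalg.norm(hilbert(Hf) + f) / np.linalg.norm(f))    # ≈ 0 (H^2 = -id)
```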

We now prove Lemma 1.

Proof

First suppose that \(\mu \) has a smooth compactly supported density. For \(y>0\) and \(g\) continuously differentiable on \(\mathbb{R }\) with compact support, we define

$$\begin{aligned} \phi (y):=\operatorname{Im}\int _\mathbb{R }g(x) \int _\mathbb{R }\frac{1}{\pi (x+iy-t)}d\mu (t)\, dx. \end{aligned}$$

On one hand, if \(X\) is of law \(\mu \) and \(\Gamma \) is an independent Cauchy variable we can rewrite \(\phi \) as a convolution:

$$\begin{aligned} \phi (y)=\int \int g(x) \frac{y}{\pi ((x-t)^2+y^2)}d\mu (t)dx=\mathbb{E }[g(X+y\Gamma )]. \end{aligned}$$

Therefore, by dominated convergence, \(\phi (y)=\mathbb{E }[g(X)]+\varepsilon (y),\) with \(\varepsilon (y)\) going to zero as \(y\) goes to zero. Otherwise stated, when \(y\) goes to zero, \(\phi (y)\) converges to \(\int g(t)d\mu (t)\).

On the other hand, for any \(y>0,\) the function \(x \mapsto \operatorname{Im}\int \frac{1}{\pi (x+iy-t)}d\mu (t)\) is in \(L^2(\mathbb{R })\) and, by Property 1.a. above,

$$\begin{aligned} \phi (y)=\int (Hg)(x)H\left( \operatorname{Im}\int \frac{1}{\pi (\cdot +iy-t)}d\mu (t)\right) (x)dx. \end{aligned}$$

Indeed, for \(\mu \) with a smooth compactly supported density (twice continuously differentiable is sufficient), the map \(z \mapsto S_\mu (z)=\int \frac{1}{\pi (z-t)}d\mu (t)\) is continuous on the closed upper half plane and \(S_\mu (x+iy)=O\big((1+\Vert (x,y)\Vert _2^2)^{-1/2}\big),\) so that it is bounded in \(L^2(dx)\) independently of \(y\). By Property 1.b.,

$$\begin{aligned} \phi (y)=-\operatorname{Re}\int (Hg)(x)\left( \int \frac{1}{\pi (x+iy-t)}d\mu (t)\right) dx. \end{aligned}$$

Then, as \(U_\mu \) is continuous and \(g\) continuously differentiable, an integration by parts gives

$$\begin{aligned} \phi (y)=\operatorname{Re}\int (Hg)^\prime (x)U_\mu (x+iy)\,dx. \end{aligned}$$

As \(g\) is compactly supported, one can easily check that there exists \(K>0\) such that for \(x\) large enough, \(|(Hg)^\prime (x)|\leqslant \frac{K}{x^2}.\) As \(\mu \) is compactly supported, for \(x\) large enough and any \(y>0,\) \(|U_\mu (x+iy)| \leqslant K \ln x.\) Therefore, by dominated convergence, \(\phi (y)\) converges to \( \int (Hg)^\prime (x)U_\mu (x)dx\) as \(y\) goes to zero.

Finally, we extend this result to measures without any assumption on their density. If \(\mu \) is such that \(U_{\mu }\) is continuous, let \(X\) be a random variable with law \(\mu \) and let \(Y\) be a random variable independent of \(X\) whose law has a smooth and compactly supported density. Then we can apply our result to the law \(\mu _\epsilon \) of \(X+\epsilon Y\):

$$\begin{aligned} \int _\mathbb{R }g(x)d\mu _\epsilon (x)=\int _\mathbb{R }(Hg)^\prime (x)U_{\mu _\epsilon }(x)dx. \end{aligned}$$

Then, we let \(\epsilon \) go to \(0\). The result follows by dominated convergence since

  1.

    We can apply dominated convergence to prove that \(U_{\mu _\epsilon }(x)=\mathbb{E }[U_\mu (x-\epsilon Y)]\) goes to \(U_{\mu }(x);\)

  2.

    \((Hg)^\prime (x)U_{\mu _\epsilon }(x)=O((1+x^2)^{-1})O(\ln (1+|x|)).\)

\(\square \)

We can now state the first perturbative estimate.

Proposition 2

(Dependence of the equilibrium measure on the potential) For any \(L>0\), there exists a finite constant \(K_L\) such that, for any \(V,W\) satisfying Hypothesis 1, if \(\mu _V\) and \(\mu _W\) are probability measures on \([-L;L]\) then

$$\begin{aligned} W_1(\mu _V, \mu _W) \leqslant K_L{\text{ osc } }(V-W). \end{aligned}$$

with \({\text{ osc } }(f)=\sup _\mathbb{R }f - \inf _\mathbb{R }f\).

Proof

Our main tool for this proof is the use of the logarithmic potentials of the measures involved. We have already seen in Theorem 1 that they are closely related. Corollary I.4.2 in [32] gives us a valuable estimate:

$$\begin{aligned} \Vert U_{\mu _V} - U_{\mu _W}\Vert _\infty \leqslant \Vert V-W\Vert _\infty . \end{aligned}$$

We will also crucially use a dual formulation for the distance \(W_1.\) Indeed, the Kantorovich-Rubinstein theorem (see e.g. Theorem 1.14 in [35]) gives that

$$\begin{aligned} W_1(\mu _V, \mu _W)=\sup _g \mu _V(g)-\mu _{W}(g) \end{aligned}$$

where the supremum is taken over the set of 1-Lipschitz functions on \(\mathbb{R }.\)

Note that the quantity \(\mu _V(g)-\mu _{W}(g)\) does not change if we add a constant to \(g\) or if we change the values of \(g\) outside \([-L;L]\). This observation and a density argument show that

$$\begin{aligned} W_1(\mu _V, \mu _W)=\sup _{g\in \mathcal{G }}\mu _V(g)-\mu _{W}(g) \end{aligned}$$

with \(\mathcal{G }\) the set of \(\mathcal C ^1,\) 1-Lipschitz functions on \(\mathbb{R }\) vanishing outside of \([-2L;2L].\) Let \(g\) be in \(\mathcal{G }\); according to Lemma 1,

$$\begin{aligned} \mu _V(g)-\mu _{W}(g) =\int (H g)^\prime (x) (U_{\mu _V}(x) - U_{\mu _W}(x))dx. \end{aligned}$$

Indeed, all the assumptions of Lemma 1 are fulfilled, as we know from Theorem I.4.8 of [32] that \(U_{\mu _V}\) and \(U_{\mu _W}\) are continuous on \(\mathbb{C }\) as soon as \(V\) and \(W\) are. Now we cut this integral into two. On one hand, as \(g \in \mathcal{G },\) \(\Vert g\Vert _\infty \leqslant 2L,\) and we have

$$\begin{aligned}&\left| \,\,\int _{|x|>2L+1} (Hg)^\prime (x) (U_{\mu _V} - U_{\mu _W})(x)\right| \\&\quad \leqslant \Vert V-W\Vert _{\infty }\Vert g\Vert _\infty \int _{|x|>2L+1,|y|<2L}\frac{dxdy}{|x-y|^2} \leqslant K_L^1 \Vert V-W\Vert _{\infty } \end{aligned}$$

with \(K_L^1=2L\int _{|x|>2L+1}\int _{|y|<2L}|x-y|^{-2}\,dy\,dx.\)

On the other hand, by the Cauchy-Schwarz inequality and using Properties 1.a. and 1.c. of the Hilbert transform,

$$\begin{aligned} \left| \,\,\int _{|x|<2L+1} (Hg)^\prime (x) (U_{\mu _V} - U_{\mu _W})(x)\right|&\leqslant \Vert V-W\Vert _{\infty }(4L+2)^{1/2}\Vert H(g^{\prime })\Vert _2\\&=(4L+2)^{1/2}\Vert g^{\prime }\Vert _2\Vert V-W\Vert _{\infty }\\&\leqslant (4L+2)\Vert V-W\Vert _{\infty } \end{aligned}$$

Finally, it is easy to check that \(\mu _V\) depends on \(V\) only up to an additive constant, thus we can always translate \(V\) so that \(\Vert V-W\Vert _{\infty }\leqslant 2{\text{ osc } }(V-W)\). Thus we have proved

$$\begin{aligned} \mu _V(g)-\mu _{W}(g) \leqslant K_L{\text{ osc } }(V-W) \end{aligned}$$

with \(K_L=2(K_L^1+4L+2).\) As \(K_L\) does not depend on \(g \in \mathcal{G },\) taking the supremum over \(g\) in \(\mathcal{G }\) gives the result. \(\square \)

The next step is to show that, given a Lipschitz function \(f\) on a given interval \([-L;L],\) we can extend the function outside this interval while keeping control on the support of \(\mu _{V+f}\) independently of \(f\). This property is rather technical but crucial, since we will need to consider functions \(f\) of arbitrary Lipschitz constant and a priori there is no way to control the support of \(\mu _{V+f}\) uniformly.

Proposition 3

(Confinement Lemma) Let \(V\) be a function satisfying Hypothesis 1. For any \(L>0,\) there exists \({\widetilde{L}}>L\) depending only on \(L\) and the potential \(V\) such that for any \(u\geqslant 0\), for any \(u\)-Lipschitz function \(f\) on \([-L,L]\) one can find a function \({\widetilde{f}}\) such that

  1.

    \({\widetilde{f}}\) is a bounded \(u\)-Lipschitz function on \(\mathbb{R };\)

  2.

    for all \(|x|<L,\,{\widetilde{f}}(x)=f(x);\)

  3.

    the support of \(\mu _{V+{\widetilde{f}}}\) is included in \([-{\widetilde{L}},{\widetilde{L}}];\)

  4.

    \({\text{ osc } }({\widetilde{f}}) \leqslant 2u {\widetilde{L}}.\)

Proof

Let \(V\) and \(L\) be fixed and let \(f\) be a \(u\)-Lipschitz function defined on \([-L,L].\) Again, since \(\mu _V\) depends on \(V\) only up to an additive constant, one can always assume that \(f(0)=uL\) (so that \(f\) and the function \({\widetilde{f}}\) we are going to define both stay non-negative).

Let \({\widetilde{L}}>L\) be a constant to be chosen later. Let us define \({\widetilde{f}}\) as the biggest \(u\)-Lipschitz function which extends \(f\) and is constant on the components of \([-{\widetilde{L}},{\widetilde{L}}]^c.\) More explicitly, we have

$$\begin{aligned} {\widetilde{f}}(x)= \left\{ \begin{array}{ll} f(x) &{}\quad \text{ if } |x|\leqslant L\\ f(L)+u(x-L) &{}\quad \text{ if } L\leqslant x\leqslant {\widetilde{L}}\\ f(L)+u({\widetilde{L}}-L) &{}\quad \text{ if } x\geqslant {\widetilde{L}}\\ f(-L)-u(L+x) &{}\quad \text{ if } -{\widetilde{L}}\leqslant x\leqslant -L\\ f(-L)-u(L-{\widetilde{L}}) &{}\quad \text{ if } x\leqslant -{\widetilde{L}}. \end{array} \right. \end{aligned}$$
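For concreteness, here is a direct transcription of this extension in code (our own illustrative sketch; the argument \(f\) is assumed to be vectorised):

```python
import numpy as np

def f_tilde(f, L, L_tilde, u):
    """Extend a u-Lipschitz f given on [-L, L]: slope +/- u up to +/- L_tilde,
    constant beyond (the extension used in Proposition 3)."""
    def ext(x):
        y = np.clip(np.asarray(x, dtype=float), -L_tilde, L_tilde)
        return np.where(y >= L, f(L) + u * (y - L),
               np.where(y <= -L, f(-L) - u * (L + y),
                        f(np.clip(y, -L, L))))
    return ext

# Example: extend f(x) = |x| (1-Lipschitz on [-1, 1]) with u = 1, L_tilde = 3:
g = f_tilde(np.abs, 1.0, 3.0, 1.0)
print(g(np.array([-5.0, -2.0, 0.0, 2.0, 5.0])))   # [3. 2. 0. 2. 3.]
```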

Our goal will be to find a constant \({\widetilde{L}},\) independent of \(f\) and \(u,\) such that \({\widetilde{f}}\) (which depends on \({\widetilde{L}}\)) fulfils the requirements of Proposition 3.

Since \(V+{\widetilde{f}}\) satisfies Hypothesis 1, the equilibrium measure \(\mu _{V+{\widetilde{f}}}\) is well defined. Let us have a look at the minimal value of the entropy functional \(J_{V+{\widetilde{f}}}\) (as defined in the introduction): if \(\lambda \) denotes the Lebesgue measure on \([0;1],\)

$$\begin{aligned} c_{V+{\widetilde{f}}} =\inf _{\nu \in \mathcal{P }(\mathbb{R })}J_{V+{\widetilde{f}}}(\nu )\leqslant J_{V+{\widetilde{f}}}(\lambda ). \end{aligned}$$

Besides,

$$\begin{aligned} J_{V+{\widetilde{f}}}(\lambda )=\lambda (V)+\lambda ({\widetilde{f}})- \int \int _{[0;1]^2} \ln |x-y| dxdy\leqslant \max _{[0;1]}V+(L+1)u+ \frac{3}{2} \end{aligned}$$

Thus

$$\begin{aligned} c_{V+{\widetilde{f}}} \leqslant M_L(1+u) \end{aligned}$$

with \(M_L\) a constant only depending on \(L\) and the potential \(V.\)

This estimate will allow us to find a bound on the support \(S_{\mu _{V+{\widetilde{f}}}}\) of \(\mu _{V+{\widetilde{f}}}\). Indeed, define \(b=\sup \big \{|x| : x \in S_{\mu _{V+{\widetilde{f}}}}\big \}\). We now prove that, for a good choice of \({\widetilde{L}}\) (depending only on \(L\) and \(V\)), assuming \(b>{\widetilde{L}}\) leads to a contradiction.

Let us first assume that \({\widetilde{L}}\) is chosen and \(b>{\widetilde{L}}\). From Theorem 1, for any \(x\) in the support of \(\mu _{V+{\widetilde{f}}}\):

$$\begin{aligned} V(x) + {\widetilde{f}}(x) = -2U_{V+{\widetilde{f}}}(x) +C_{V+{\widetilde{f}}} \end{aligned}$$

and replacing \(C_{V+{\widetilde{f}}}\) by its expression given in Theorem 1,

$$\begin{aligned} V(x) + {\widetilde{f}}(x) +\mu _{V+{\widetilde{f}}}(V+{\widetilde{f}})= -2U_{V+{\widetilde{f}}}(x) + 2c_{V+{\widetilde{f}}} \end{aligned}$$

For any \(x \in \mathbb{R },\) \(-U_{V+{\widetilde{f}}}(x) \leqslant \ln (|x|+b)\). On the other hand, according to Hypothesis 1, there exist \(\alpha >0\) and \(\beta \in \mathbb{R }\) (depending on \(V\)) such that for any \(x \in \mathbb{R },\) \(V(x) \geqslant \alpha x^2 + \beta .\) Besides, as \(f(0) = uL,\) \({\widetilde{f}}(x)\geqslant u|\min (|x|,{\widetilde{L}})-L|\), thus \(\mu _{V+{\widetilde{f}}}(V+{\widetilde{f}})\geqslant \beta \). Putting these facts together, one gets for \(|x|>{\widetilde{L}}\) in the support of \(\mu _{V+{\widetilde{f}}}\):

$$\begin{aligned} \alpha x^2 + \beta +u|\min (|x|,{\widetilde{L}})-L|+\beta \leqslant 2\ln (|x|+b)+2M_L (1+u). \end{aligned}$$

Now take a sequence of points in the support of \(\mu _{V+{\widetilde{f}}}\) converging in absolute value to \(b.\) We get at the limit that:

$$\begin{aligned} \alpha b^2 + 2\beta + u({\widetilde{L}}-L)\leqslant 2\ln (2 b)+2M_L (1+u). \end{aligned}$$

There exists some \(\gamma _V>1\) such that the function \(\alpha x^2 - 2\ln (2 x)+2\beta -2M_L\) is strictly positive for \(|x|>\gamma _V\). Now choose \({\widetilde{L}}>\gamma _V\), since \(b>{\widetilde{L}}>1\), we get:

$$\begin{aligned} u({\widetilde{L}}-L)<(\alpha b^2 +2\beta - 2\ln (2 b)-2M_L)+ u({\widetilde{L}}-L) \leqslant 2 M_L u. \end{aligned}$$

Then, if we also choose \({\widetilde{L}}>L+2M_L,\) this leads to a contradiction. To sum up, for this choice of \({\widetilde{L}},\) we have proven that it is absurd to suppose that the support of \(\mu _{V+{\widetilde{f}}}\) is not contained in \([-{\widetilde{L}};{\widetilde{L}}].\) Otherwise stated, \({\widetilde{f}}\) satisfies the third point of the proposition. The other points are trivially satisfied by construction. \(\square \)

2.2 Derivation of the theorem for measures on a given compact

The next step is to show a weak version of our main theorem, in the sense that the constant in the inequality between Wasserstein distance and free entropy depends on the support of the measures under consideration.

Proposition 4

(Free \(T_1\) inequality on a compact) Let \(V\) be a function satisfying Hypothesis 1. For all \(L>0,\) there exists a constant \(B_{V,L},\) depending only on \(L\) and \(V,\) such that, for any probability measure \(\nu \) with support in \([-L,L],\)

$$\begin{aligned} W_1(\nu , \mu _V)^2 \leqslant B_{V,L} \Sigma _V(\nu ). \end{aligned}$$

Proof

We can assume without loss of generality that \(L\) is large enough for the support of \(\mu _V\) to be inside \([-L;L]\). We are going to use a duality argument. We first recall that

$$\begin{aligned} W_1(\nu , \mu _V) = \sup _{\phi \,\, 1-Lip} \mu _V(\phi ) - \nu (\phi ). \end{aligned}$$

Let \(f\) be a \(u\)-Lipschitz function on \([-L;L]\) and \(g= -\widetilde{f},\) with \( \widetilde{f}\) defined as in Proposition 3. Then

$$\begin{aligned} \nu (g)-\mu _V(g)-\Sigma _V(\nu )\leqslant \sup _{\pi \in \mathcal{P }(\mathbb{R })}(\pi (g)-\mu _V(g)-\Sigma _V(\pi )) \end{aligned}$$

Note that, since \(g\) is equal to \(-f\) on \([-L;L]\) and both \(\nu \) and \(\mu _V\) are supported in \([-L;L]\), the left hand side is just \(\mu _V(f)-\nu (f)-\Sigma _V(\nu )\).

Let us control the right hand side: for any \(\pi \in \mathcal{P }(\mathbb{R }),\)

$$\begin{aligned} \pi (g)-\mu _V(g)-\Sigma _V(\pi )=-J_{V-g}(\pi )+J_V(\mu _V)-\mu _V(g). \end{aligned}$$

But since \(J_{V-g}\) is minimal at \(\mu _{V-g}\) and \(J_V\) is minimal at \(\mu _V\), for any \(\pi \in \mathcal{P }(\mathbb{R }),\) we have

$$\begin{aligned} \pi (g)-\mu _V(g)-\Sigma _V(\pi )&\leqslant J_V(\mu _{V-g}) - J_{V-g}(\mu _{V-g}) -\mu _V(g)\\&= \mu _{V-g}(g)-\mu _V(g)\leqslant |\widetilde{f}|_{Lip}W_1(\mu _{V+\widetilde{f}},\mu _V). \end{aligned}$$

By construction the support of \(\mu _{V+\widetilde{f}}\) is contained in \([-{\widetilde{L}};{\widetilde{L}}]\), with \({\widetilde{L}}\) as defined in Proposition 3; we can then apply Proposition 2:

$$\begin{aligned} \mu _V(f)-\nu (f)-\Sigma _V(\nu )\leqslant | \widetilde{f}|_{Lip}K_{{\widetilde{L}}} {\text{ osc } }(\widetilde{f}) \leqslant 2u^2 {\widetilde{L}}K_{{\widetilde{L}}}. \end{aligned}$$

Thus, there exists a constant \(A_{V,L}\) such that for any \(1\)-Lipschitz function \(\phi \) and any \(u>0\), taking \(f=u\phi ,\) we have

$$\begin{aligned} u(\mu _V(\phi )-\nu (\phi ))-A_{V,L} u^2\leqslant \Sigma _V(\nu ). \end{aligned}$$

We can take the supremum of this expression over \(\phi \) and then optimise over \(u\) (choosing \(u=W_1(\nu ,\mu _V)/(2A_{V,L})\)) to get:

$$\begin{aligned} W_1(\nu ,\mu _V)^2\leqslant 4 A_{V,L} \Sigma _V(\nu ). \end{aligned}$$

\(\square \)

2.3 Extension to non-compactly supported measures

To deduce Theorem 3 from Proposition 4, we have to control what happens far from the support of \(\mu _V.\) The idea is that, since \(V\) grows faster than \(\alpha x^2\) for some \(\alpha >0,\) if the support of \(\mu \) is far from the support of \(\mu _V,\) then \(\Sigma _V(\mu ),\) which grows like \(V,\) should be much larger than \( W_1(\mu , \mu _V)^2, \) which grows rather like \(x^2.\) Therefore, it is enough to control what happens in a vicinity of the support of \(\mu _V,\) and this case was treated in Proposition 4.

More precisely, we have the following lemma.

Lemma 2

Let \(V\) be a function satisfying Hypothesis 1. There exist \(\gamma _V>0\) and \(R_V\) depending only on \(V\) such that for any \(\mu \in \mathcal{P }(\mathbb{R }),\) there exists \(\widetilde{\mu }\) supported in \([-R_V,R_V]\) such that

$$\begin{aligned} \Sigma _V(\widetilde{\mu })&\leqslant \Sigma _V(\mu )\\ \gamma _V W_1(\mu , \widetilde{\mu })^2&\leqslant \Sigma _V(\mu ). \end{aligned}$$

We postpone the proof of the lemma to the end of this section and we first check that we can now get our main result (Theorem 3).

Proof

Let \(\mu \in \mathcal{P }(\mathbb{R })\) and let \(\widetilde{\mu }\) correspond to \(\mu \) as in Lemma 2. Then, using the triangle inequality, \((a+b)^2\leqslant 2a^2+2b^2\) and Proposition 4,

$$\begin{aligned} W_1(\mu , \mu _V)^2&\leqslant 2 W_1(\widetilde{\mu }, \mu _V)^2 + 2 W_1(\widetilde{\mu }, \mu )^2 \\&\leqslant 2 B_{V,R_V} \Sigma _V(\widetilde{\mu }) + \frac{2}{\gamma _V}\Sigma _V(\mu )\\&\leqslant 2 \left( B_{V,R_V}+ \frac{1}{\gamma _V}\right) \Sigma _V(\mu ). \end{aligned}$$

\(\square \)

Finally, we prove Lemma 2.

Proof

Let \(R_V\) be a constant to be chosen later. There exists \(\alpha \in [0,1]\) such that \(\mu = (1-\alpha ) \mu _1 + \alpha \mu _2,\) with \(\mu _1 \in \mathcal{P }([-R_V,R_V])\) and \(\mu _2 \in \mathcal{P }([-R_V,R_V]^c).\) Then our definition for \(\widetilde{\mu }\) is:

$$\begin{aligned} \widetilde{\mu }= (1-\alpha ) \mu _1 + \alpha \lambda \end{aligned}$$

with \(\lambda \) the Lebesgue measure on \([0;1].\)

We now want to show the following statement, which implies both inequalities stated in the Lemma: there exists \(R_V\) and \(\gamma _V\) such that

$$\begin{aligned} \Sigma _V(\mu ) - \Sigma _V(\widetilde{\mu }) -\gamma _V W_1(\mu , \widetilde{\mu })^2 \geqslant 0. \end{aligned}$$

Let us first bound the Wasserstein distance. In order to transport \(\mu \) onto \(\widetilde{\mu },\) one can always choose to transport \(\mu _2\) to \(\lambda \); this may not be optimal but gives the bound:

$$\begin{aligned} W_1(\mu , \widetilde{\mu })^2\leqslant \left( \alpha \int (|x|+1) d\mu _2(x)\right) ^2 \leqslant \alpha ^2 \mu _2((1+|\cdot |)^2) \end{aligned}$$

We then bound the difference between entropies:

$$\begin{aligned} \Sigma _V(\mu ) - \Sigma _V(\widetilde{\mu })&\geqslant \alpha (\mu _2-\lambda )(V) \\&- \alpha ^2 \int \int \ln |x-y|[d\mu _2(x)d\mu _2(y) - d\lambda (x)d\lambda (y)]\\&- 2\alpha (1-\alpha ) \int \int \ln |x-y| d\mu _1(x)d(\mu _2-\lambda )(y) \end{aligned}$$

Now we can get rid of the two double integrals by using that, for all \(x,y,\) \(\ln |x-y|\leqslant \ln (1+|x|)+\ln (1+|y|),\) and that \(|\int \ln |x-y|d\lambda (y)- \ln (1+|x|)|<C\) for some \(C\) independent of \(x\). Thus,

$$\begin{aligned} \Sigma _V(\mu ) - \Sigma _V(\widetilde{\mu })&\geqslant \alpha ( \mu _2(V- 2(1+\alpha )\ln (1+|\cdot |))\\&-2\mu _1(\ln (1+|\cdot |))-\lambda (V-2(1+\alpha )\ln (1+|\cdot |))-2C). \end{aligned}$$

Finally, with \(C_V=\lambda (V-4\ln (1+|\cdot |))+2C\), and the inequality \(\mu _1(\ln (1+|\cdot |))\leqslant \ln (1+R_V)\leqslant \mu _2(\ln (1+|\cdot |))\),

$$\begin{aligned} \Sigma _V(\mu ) - \Sigma _V(\widetilde{\mu })-\gamma _V W_1(\mu , \widetilde{\mu })^2 \geqslant \alpha \mu _2\big(V- 6\ln (1+|\cdot |)-\alpha \gamma _V(1+|\cdot |)^2-C_V\big). \end{aligned}$$

We want this last expression to be positive. We first choose \(\gamma _V>0\) such that \(\liminf _{|x| \rightarrow \infty } \frac{V(x)}{\gamma _V x^2}>1.\)

Then \(V(x)- 6\ln (1+|x|)-\gamma _V(1+|x|)^2-C_V\) goes to infinity when \(|x|\) goes to infinity. In particular we can choose \(R_V>0\) such that it is positive for all \(|x|>R_V\). Since \(\mu _2\) has its support inside \([-R_V;R_V]^c\), the above expression is non-negative. Since the choices of \(\gamma _V\) and \(R_V\) depend only on \(V\), this concludes the proof. \(\square \)

3 Concentration inequality for random matrices

In this section we present an application of the free \(T_1\) inequality to a concentration result for the empirical measure of a matrix model. The concentration result holds not only for the usual matrix models defined in the introduction but also for the slightly more general family of measures usually called \(\beta \)-ensembles. We recall the definition of these models in the next subsection before proving our concentration estimates.

3.1 \(\beta \)-Ensembles

For \(\beta >0\), and \(V\) a function satisfying Hypothesis 1, the \(\beta \)-ensemble with potential \(V\) is the family of laws on \(\mathbb{R }^N\), for \(N>0,\) given by

$$\begin{aligned} \mathbb{P }^N_{V,\beta }(dx_1,\ldots ,dx_N):= \prod _{i<j}|x_i -x_j|^\beta \exp \left( -N\sum _{i=1}^NV(x_i)\right) \frac{\prod _{i=1}^Ndx_i}{Z^N_{V,\beta }}. \end{aligned}$$

with \(Z^N_{V,\beta }\) a normalising constant which always exists under Hypothesis 1. For \(\beta =1,2,4\) this corresponds to the law of the eigenvalues of a matrix model (corresponding to the measure \(\mu _V^N\) when \(\beta =2\)).

Some of the results stated in the introduction still hold for these models. In particular, we can still express \(\mathbb{P }^N_{V,\beta }\) nicely in terms of the empirical measure of the \(x_i\)'s. If \({\widehat{\mu }_N}:=\frac{1}{N}\sum _{i=1}^N\delta _{x_i}\) then

$$\begin{aligned} \mathbb{P }^N_{V,\beta }(dx_1,\ldots ,dx_N)= \exp \left( -N^2\frac{\beta }{2}{\widetilde{J}_{\frac{2V}{\beta }}}({\widehat{\mu }_N})\right) \frac{\prod _{i=1}^Ndx_i}{Z^N_{V,\beta }} \end{aligned}$$

with the functional \(\widetilde{J}_V\) whose definition we recall

$$\begin{aligned} \widetilde{J}_V(\mu ) = \int V(x)d\mu (x)-\int \int _{x\ne y} \ln |x-y| d\mu (x) d\mu (y). \end{aligned}$$

Similarly to the definition of \(\Sigma _V,\) we also define

$$\begin{aligned} \widetilde{\Sigma }_V(\mu )= \widetilde{J}_V(\mu )- c_V. \end{aligned}$$

One can expect that in the large \(N\) limit, the eigenvalues should organise this time according to the measure \(\mu _{\frac{2V}{\beta }}\). This is indeed the case and we have a result analogous to Theorem 2, also proved in [2].

Theorem 6

(Large deviation principle for \(\beta \)-ensembles) Let \(V\) be a function satisfying Hypothesis 1. Under the law \(\mathbb{P }^N_{V,\beta }\) the sequence of random measures \({\widehat{\mu }_N}\) satisfies a large deviation principle at speed \(N^2\) with good rate function \(\frac{\beta }{2}\Sigma _{\frac{2V}{\beta }}\).

3.2 Approximate free \(T_1\) inequalities for empirical measures

At fixed \(N,\) we have to work with the empirical measure \({\widehat{\mu }_N},\) which has the drawback of being discrete. This prevents us from applying the transport-entropy inequality directly, since \(\Sigma _V({\widehat{\mu }_N})=+\infty \). We settle for an approximate inequality where \(\Sigma _V\) is replaced by \({\widetilde{\Sigma }_{V}}\). We define \(\Vert f\Vert _{Lip}^A\) as the Lipschitz norm of \(f\) on a compact set \(A\):

$$\begin{aligned} \Vert f\Vert _{Lip}^A=\sup _{s,t\in A,s\ne t}\left| \frac{f(t)-f(s)}{t-s}\right| \end{aligned}$$

Proposition 5

(Approximate free \(T_1\) inequality) Let \(V\) be a locally Lipschitz function satisfying Hypothesis 1. Then, for any compact subset \(\mathcal K \) of \(\mathbb{R },\) any \(N\in \mathbb{N }^*\) and any \((x_1,\ldots ,x_N) \in \mathcal K ^N,\)

$$\begin{aligned} W_1^2({\widehat{\mu }_N},\mu _V) \leqslant 2B_V{\widetilde{\Sigma }_{V}}({\widehat{\mu }_N})+3(1+2B_V)\frac{\Vert V\Vert ^{\mathcal{K }_1}_{Lip}+B+\ln (N)}{N} \end{aligned}$$

where \(B_V\) is the same constant as in Theorem 3, \(B\) is some universal finite constant and \(\mathcal K _u\) is the set of reals at distance less than \(u\) from \(\mathcal K .\)

Proof

Let \(\mathcal K \) be a compact set of \(\mathbb{R }\) and \(x_1,\ldots ,x_N\) be in \(\mathcal K \). The idea is to replace \({\widehat{\mu }_N}\) by a measure \({\widehat{\nu }_N}\) such that \(W_1({\widehat{\mu }_N},{\widehat{\nu }_N})\) is small and \(\Sigma _V({\widehat{\nu }_N})\) is close to \({\widetilde{\Sigma }_{V}}({\widehat{\mu }_N})\).

We first spread the \(x_i\)'s so that they are at least \(N^{-2}\) apart. Let the \(x_{(i)}\)'s be the \(x_i\)'s rearranged in increasing order: \(x_{(1)} \leqslant x_{(2)} \leqslant \cdots \leqslant x_{(N)};\) then define the \(y_i\)'s by:

$$\begin{aligned} \left\{ \begin{array}{lll} y_1 = x_{(1)}\\ y_{i+1} = y_i + \max (x_{(i+1)}-x_{(i)}, \frac{1}{N^2}) \end{array} \right. \end{aligned}$$

Then we define

$$\begin{aligned} {\widehat{\rho }_N}=\frac{1}{N} \sum _{i=1}^N \delta _{y_i} \quad \text{ and }\quad {\widehat{\nu }_N}={\widehat{\rho }_N}*\lambda _{N^{-3}} \end{aligned}$$

where \(\lambda _{N^{-3}}\) is the uniform measure on \([0,N^{-3}]\) and \(*\) is the usual convolution of measures.
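The spreading step is easy to implement and to test numerically; a hedged sketch of ours (the names below are ours, not the paper's):

```python
import numpy as np

def regularise(xs):
    """Spread the atoms of mu_hat_N at least 1/N^2 apart (rho_hat_N) and return
    the convolution width 1/N^3 used to define nu_hat_N = rho_hat_N * lambda."""
    n = len(xs)
    x = np.sort(np.asarray(xs, dtype=float))
    y = x.copy()
    for i in range(1, n):
        y[i] = y[i - 1] + max(x[i] - x[i - 1], 1.0 / n**2)
    return y, 1.0 / n**3

rng = np.random.default_rng(4)
xs = rng.choice(np.linspace(-1, 1, 5), size=50)     # many coinciding points
ys, width = regularise(xs)
# W_1(mu_hat, rho_hat) <= mean |y_i - x_(i)| <= 1/(2N), as in the proof:
print(np.mean(np.abs(ys - np.sort(xs))), 1 / (2 * len(xs)))
```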

Let us see how the Wasserstein distance and the entropy change when we replace \({\widehat{\mu }_N}\) by \({\widehat{\nu }_N}\). Note that since \(|y_i-x_{(i)}|\leqslant (i-1)N^{-2}\),

$$\begin{aligned} W_1({\widehat{\mu }_N}, {\widehat{\rho }_N})\leqslant \frac{1}{N} \sum _{i=1}^N|y_i-x_{(i)}|\leqslant \frac{1}{2N} \end{aligned}$$

but

$$\begin{aligned} W_1({\widehat{\rho }_N}, {\widehat{\nu }_N})\leqslant \frac{1}{N^{3}}. \end{aligned}$$

so that

$$\begin{aligned} W_1({\widehat{\mu }_N}, {\widehat{\nu }_N})\leqslant \frac{2}{N}. \end{aligned}$$

Moreover, for any \(i \ne j,\) \(\ln |y_i-y_j| \geqslant \ln |x_{(i)} - x_{(j)}|,\) and \(y_i\in \mathcal{K }_{N^{-1}}\subset \mathcal{K }_1\), so that

$$\begin{aligned} {\widetilde{\Sigma }_{V}}({\widehat{\mu }_N})-{\widetilde{\Sigma }_{V}}({\widehat{\rho }_N}) \geqslant -\Vert V\Vert ^{\mathcal{K }_1}_{Lip}W_1({\widehat{\mu }_N},{\widehat{\rho }_N})\geqslant -\Vert V\Vert ^{\mathcal{K }_1}_{Lip}\frac{2}{N}. \end{aligned}$$

Let \((Z_i)_{i\geqslant 1}\) and \((\widetilde{Z}_i)_{i\geqslant 1}\) be two independent families of independent variables uniformly distributed on \([0,1].\) We can express the difference of entropies using these variables:

$$\begin{aligned}&{\widetilde{\Sigma }_{V}}({\widehat{\rho }_N})-\Sigma _V({\widehat{\nu }_N}) \geqslant \int V(x) d({\widehat{\rho }_N}-{\widehat{\nu }_N})(x) \\&\quad +\frac{1}{N^2} \sum _{i \ne j} \mathbb{E }\left( \ln \left( 1+ N^{-3} \frac{Z_i-Z_j}{y_i-y_j}\right) \right) +\frac{1}{N^2} \sum _{i=1}^N \mathbb{E }\left( \ln N^{-3}|Z_i - \widetilde{Z}_i|\right) \end{aligned}$$

Since for \(i\ne j,\,|y_i-y_j|\geqslant N^{-2}\), for \(N>2\),

$$\begin{aligned} \mathbb{E }\left( \ln \left( 1+ N^{-3} \frac{Z_i-Z_j}{y_i-y_j}\right) \right) \geqslant \ln \left( 1-\frac{2}{N}\right) . \end{aligned}$$

Thus,

$$\begin{aligned} {\widetilde{\Sigma }_{V}}({\widehat{\rho }_N})-\Sigma _V({\widehat{\nu }_N}) \geqslant - \Vert V\Vert ^{\mathcal{K }_1}_{Lip}N^{-3}-\frac{B+3\ln N}{N} \end{aligned}$$

with \(B>0\) a finite constant.

This leads to,

$$\begin{aligned} {\widetilde{\Sigma }_{V}}({\widehat{\mu }_N})&= ({\widetilde{\Sigma }_{V}}({\widehat{\mu }_N})-{\widetilde{\Sigma }_{V}}({\widehat{\rho }_N}))+({\widetilde{\Sigma }_{V}}({\widehat{\rho }_N})- \Sigma _V({\widehat{\nu }_N}))+\Sigma _V({\widehat{\nu }_N})\\&\geqslant -3\frac{\Vert V\Vert ^{\mathcal{K }_1}_{Lip}+B+\ln (N)}{N}+\Sigma _V({\widehat{\nu }_N}). \end{aligned}$$

Then, by applying the free transport-entropy inequality of Theorem 3 to \({\widehat{\nu }_N}\), we obtain:

$$\begin{aligned} W_1({\widehat{\mu }_N}, \mu _V)^2&\leqslant \left( W_1({\widehat{\nu }_N}, \mu _V)+\frac{2}{N}\right) ^2\\&\leqslant 2W_1({\widehat{\nu }_N}, \mu _V)^2+\frac{8}{N^2}\\&\leqslant 2B_V\Sigma _V({\widehat{\nu }_N})+\frac{8}{N^2}\\&\leqslant 2B_V {\widetilde{\Sigma }_{V}}({\widehat{\mu }_N})+6B_V\frac{\Vert V\Vert ^{\mathcal{K }_1}_{Lip}+B+\ln (N)}{N}+\frac{8}{N^2}, \end{aligned}$$

and the last quantity is bounded by \(2B_V{\widetilde{\Sigma }_{V}}({\widehat{\mu }_N})+3(1+2B_V)\frac{\Vert V\Vert ^{\mathcal{K }_1}_{Lip}+B+\ln (N)}{N},\) enlarging the universal constant \(B\) if necessary so that \(8/N^2\leqslant 3B/N.\)

\(\square \)

3.3 Tightness

The next step is to get a lower bound on the normalising constant \(Z^N_{V,\beta }.\) From the large deviation results (Theorem 6), it is easy to check that \(\frac{1}{N^2} \ln Z^N_{V,\beta }\) has a finite limit \(-c_{V,\beta }\) and that \(c_{V,\beta }=\frac{\beta }{2}c_{\frac{2V}{\beta }}\). But hereafter we seek a lower bound which is not asymptotic in \(N.\) This is the only place where condition b. of Hypothesis 2 is needed.

Lemma 3

For any function \(V\) satisfying Hypothesis 1 and any \(\beta >0\) such that the equilibrium measure \(\mu _{\frac{2V}{\beta }}\) has finite classical entropy \(H(\mu _{\frac{2V}{\beta }}),\) there exists a constant \(A_{V,\beta }\) such that for any \(N \in \mathbb N ^*,\)

$$\begin{aligned} \frac{1}{N^2} \ln Z^N_{V,\beta }+c_{V,\beta }\geqslant \frac{A_{V,\beta }}{N}. \end{aligned}$$

Proof

We follow closely a proof by Johansson in [27]. We denote by \(\rho _V\) the density of \(\mu _V.\) Note that if \(H(\mu _{\frac{2V}{\beta }})\) is finite, then in particular \(\rho _{\frac{2V}{\beta }}\) is well defined, and we introduce the following set:

$$\begin{aligned} E_N:= \left\{ (x_1, \ldots , x_N) \in \mathbb{R }^N \Bigg | \prod _{i=1}^N \rho _{\frac{2V}{\beta }}(x_i) >0\right\} . \end{aligned}$$

Then,

$$\begin{aligned} Z^N_{V,\beta }\geqslant \int _{E_N} \exp \left( -N^2\frac{\beta }{2}{\widetilde{J}_{\frac{2V}{\beta }}}({\widehat{\mu }_N})\right) \prod _{i=1}^Ne^{-\ln \rho _{\frac{2V}{\beta }}(x_i)}\prod _{i=1}^N\rho _{\frac{2V}{\beta }}(x_i)dx_i \end{aligned}$$

and using Jensen's inequality we get:

$$\begin{aligned} \ln Z^N_{V,\beta }&\geqslant -N^2\frac{\beta }{2}\int \widetilde{J}_{\frac{2V}{\beta }}({\widehat{\mu }_N})\prod _{i=1}^N\rho _{\frac{2V}{\beta }}(x_i)dx_i-N\int \ln \rho _{\frac{2V}{\beta }}(x)\rho _{\frac{2V}{\beta }}(x)dx\\&= -N(N-1)\frac{\beta }{2}J_{\frac{2V}{\beta }}(\mu _{\frac{2V}{\beta }})-N\int (V(x)+\ln \rho _{\frac{2V}{\beta }}(x))\rho _{\frac{2V}{\beta }}(x)dx \end{aligned}$$

We conclude by recalling that, by definition, \(J_{\frac{2V}{\beta }}(\mu _{\frac{2V}{\beta }})=c_{\frac{2V}{\beta }}\).
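Indeed, dividing by \(N^2\) and using \(c_{V,\beta }=\frac{\beta }{2}c_{\frac{2V}{\beta }}\), the above yields

$$\begin{aligned} \frac{1}{N^2} \ln Z^N_{V,\beta }+c_{V,\beta }\geqslant \frac{1}{N}\left( \frac{\beta }{2}c_{\frac{2V}{\beta }}-\int \left( V(x)+\ln \rho _{\frac{2V}{\beta }}(x)\right) \rho _{\frac{2V}{\beta }}(x)dx\right) , \end{aligned}$$

so that the lemma holds with \(A_{V,\beta }\) equal to the quantity between parentheses, which is finite under our assumptions since \(H(\mu _{\frac{2V}{\beta }})\) is finite and \(V\) is integrable against \(\mu _{\frac{2V}{\beta }}\). \(\square \)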

We then need to control the behaviour of the largest eigenvalue. The proof follows the ideas of the proof of Proposition 2.1 in [6].

Lemma 4

Assume that \(V\) is a continuous function such that for some \(\alpha >0\) and \(d>1,\) \(V(x)-\alpha |x|^d\) is bounded from below on \(\mathbb{R }\). Then for any \(\beta >0\) and \(0<a<\alpha /2\), there exists \(M_0>0\) such that for any \(M\geqslant M_0\) and \(N\in \mathbb{N }^*\),

$$\begin{aligned} \mathbb{P }^N_{V,\beta }\left( \max _{i=1..N}|x_i| \geqslant M\right) \leqslant e^{-a NM^d}. \end{aligned}$$

Proof

First, we need to control \({Z^{N-1}_{V,\beta }}/{Z^N_{V,\beta }}\). For all \(L>0\),

$$\begin{aligned} \frac{Z^N_{V,\beta }}{Z^{N-1}_{V,\beta }}&\geqslant \int _{|x_N|<L} \int \exp \left( -(N-1)V(x_N)+\sum _{i=1}^{N-1} \left( \ln |x_N-x_i|^\beta -V(x_i)\right) \right) \\&\qquad \qquad \qquad \qquad d\mathbb{P }^{N-1}_{V,\beta }(x_1, \ldots , x_{N-1})\,Y_{V,L}\,d\rho _{V,L}(x_N) \end{aligned}$$

with \(\rho _{V,L}\) the probability measure of density \((Y_{V,L})^{-1}\exp (-V(\cdot ))\mathbf{1}_{[-L;L]}\) and \(Y_{V,L}\) its normalising constant.
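To see this, write \(e^{-NV(x_i)}=e^{-(N-1)V(x_i)}\,e^{-V(x_i)}\) for each coordinate in the definition of \(Z^N_{V,\beta }\), recognise the density of \(\mathbb{P }^{N-1}_{V,\beta }\) in the variables \(x_1,\ldots ,x_{N-1}\), restrict the integral over \(x_N\) to \([-L;L]\) and note that there \(e^{-V(x_N)}dx_N=Y_{V,L}\,d\rho _{V,L}(x_N)\).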

By Jensen's inequality we get

$$\begin{aligned} \ln \frac{Z^N_{V,\beta }}{Z^{N-1}_{V,\beta }}&\geqslant \ln Y_{V,L} \\&+ \int \left( -(N-1)V(x_N)+\sum _{i=1}^{N-1} \left( \ln |x_N-x_i|^\beta -V(x_i)\right) \right) \\&\qquad \quad d\mathbb{P }^{N-1}_{V,\beta }(x_1, \ldots , x_{N-1})d\rho _{V,L}(x_N). \end{aligned}$$

By Chebyshev's inequality, for any \(R>0,\)

$$\begin{aligned} \mathbb{P }^N_{V,\beta }\left( \frac{1}{N}\sum _{i=1}^{N}V(x_i)>R\right) \leqslant e^{- \frac{1}{2} N^2R}\frac{Z^N_{V/2, \beta }}{Z^N_{V,\beta }}. \end{aligned}$$
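Indeed, recalling that \(d\mathbb{P }^N_{V,\beta }\) is proportional to \(\prod _{i<j}|x_i-x_j|^\beta e^{-N\sum _i V(x_i)}\prod _i dx_i\), the exponential Chebyshev bound \(\mathbb{P }(X>NR)\leqslant e^{-\frac{N^2R}{2}}\mathbb{E }(e^{\frac{N}{2}X})\) applied to \(X=\sum _i V(x_i)\) reduces the claim to the identity

$$\begin{aligned} \mathbb{E }^N_{V,\beta }\left( e^{\frac{N}{2}\sum _{i=1}^N V(x_i)}\right) =\frac{Z^N_{V/2, \beta }}{Z^N_{V,\beta }}, \end{aligned}$$

which follows by inspection of the densities.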

Now, from Theorem 6, we know that \( \frac{1}{N^2} \ln \left( \frac{Z^N_{V/2, \beta }}{Z^N_{V,\beta }}\right) \) converges, so that it is bounded. From there, we easily deduce that \(\int \frac{1}{N}\sum _{i=1}^{N}V(x_i)d\mathbb{P }^N_{V,\beta }\) is bounded uniformly in \(N.\) Since \(y\mapsto \int \ln |y-x|d\rho _{V,L}(x)\) is bounded from below, we immediately see that there exists a finite constant \(D_{V,\beta }\) such that for all \(N\),

$$\begin{aligned} \frac{1}{N}\ln \frac{Z^N_{V,\beta }}{Z^{N-1}_{V,\beta }}\geqslant D_{V,\beta }. \end{aligned}$$

With this bound, we can complete the proof of the lemma. We integrate separately over \(x_N\) and over \(x_1,\ldots ,x_{N-1}\) to get:

$$\begin{aligned} \mathbb{P }^N_{V,\beta }(|x_N|\geqslant M)=\frac{Z^{N-1}_{V,\beta }}{Z^N_{V,\beta }} \int _{|x_N|>M} e^{-NV(x_N)} \int \left( \prod _{i=1}^{N-1} |x_N-x_i|^\beta e^{-V(x_i)}\right) \\ d\mathbb{P }^{N-1}_{V,\beta }(x_1, \ldots , x_{N-1})dx_N. \end{aligned}$$

There exists \(b_{V,\beta }>0\) such that, for all \(x,y\in \mathbb{R },\)

$$\begin{aligned} |x-y|^\beta e^{-V(y)}\leqslant b_{V,\beta }e^{V(x)/2}. \end{aligned}$$
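Such a constant indeed exists under the growth assumption on \(V\): since \(|x-y|\leqslant (1+|x|)(1+|y|)\), both \(c_1:=\sup _y (1+|y|)^\beta e^{-V(y)}\) and \(c_2:=\sup _x (1+|x|)^\beta e^{-V(x)/2}\) are finite, and one may take \(b_{V,\beta }=c_1c_2\).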

Therefore,

$$\begin{aligned} \mathbb{P }^N_{V,\beta }(|x_N|\geqslant M) \leqslant e^{-ND_{V,\beta }}b_{V,\beta }^{N-1}\int _{|x_N|>M} e^{-\frac{N+1}{2}V(x_N)}dx_N. \end{aligned}$$

Let \(\gamma _V>0\) be such that for all \(x,\,V(x)-\alpha |x|^d >-\gamma _V\). If \(M>1\),

$$\begin{aligned} \mathbb{P }^N_{V,\beta }(|x_N|\geqslant M) \leqslant e^{-ND_{V,\beta }}b_{V,\beta }^{N-1}e^{\frac{N+1}{2} \gamma _V}2\frac{e^{-\frac{N+1}{2}\alpha M^d}}{\alpha \frac{N+1}{2}}. \end{aligned}$$
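Here we used the elementary tail estimate, valid for \(M>1,\,d>1\) and \(\eta =\frac{N+1}{2}\),

$$\begin{aligned} \int _{|x|>M} e^{-\eta \alpha |x|^d}dx \leqslant \frac{2}{M^{d-1}}\int _M^{+\infty } x^{d-1}e^{-\eta \alpha x^d}dx =\frac{2\,e^{-\eta \alpha M^d}}{\eta \alpha d\, M^{d-1}}\leqslant 2\frac{e^{-\eta \alpha M^d}}{\eta \alpha }, \end{aligned}$$

where we bounded \(x^{d-1}\geqslant M^{d-1}\) on the domain of integration and used \(dM^{d-1}\geqslant 1\).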

For any \(0<a< \alpha /2\) and \(M\geqslant M_0\), using the exchangeability of the coordinates under \(\mathbb{P }^N_{V,\beta }\), we obtain

$$\begin{aligned} \mathbb{P }^N_{V,\beta }\left( \max _{i}|x_i| \geqslant M\right) \leqslant N\,\mathbb{P }^N_{V,\beta }(|x_N|\geqslant M)\leqslant Ke^{-a NM^d} \end{aligned}$$

with

$$\begin{aligned} K=\sup _{N\in \mathbb{N }^*}Ne^{-ND_{V,\beta }}b_{V,\beta }^{N-1} e^{\frac{N+1}{2}\gamma _V}2\frac{e^{-(N(\frac{\alpha }{2}-a) +\frac{\alpha }{2}) M_0^d}}{\alpha \frac{N+1}{2}}. \end{aligned}$$

Now, \(a\) being fixed, we can clearly choose \(M_0\) such that \(K\) is finite and less than \(1\).

\(\square \)

3.4 Concentration results

Our goal is now to prove Theorem 5. As an intermediate step, we first establish the following result, which deals with concentration when the particles are restricted to a compact set. The proof of Theorem 5 will then combine this result with the tightness established in the preceding subsection.

Theorem 7

(Concentration inequality on a compact set) Let \(V\) be a locally Lipschitz function satisfying Hypothesis 1 and \(\beta >0\) such that the equilibrium measure \(\mu _{\frac{2V}{\beta }}\) has a finite classical entropy. Then, for all \(M>0\), there exist \(u,v >0\) such that for all \(N\in \mathbb{N }^*\) and all \(\theta >v\sqrt{\frac{\ln (1+N)}{ N}}\),

$$\begin{aligned} \mathbb{P }^N_{V,\beta }\left( W_1({\widehat{\mu }_N},\mu _{\frac{2V}{\beta }} )\geqslant \theta ,\forall i, |x_i|<M\right) \leqslant e^{-u N^2 \theta ^2}. \end{aligned}$$

Proof

We can rewrite our measure \(\mathbb{P }^N_{V,\beta }\) as follows

$$\begin{aligned} \mathbb{P }^N_{V,\beta }(dx_1, \ldots , dx_N) = \frac{e^{-N^2 c_{V,\beta }}}{Z^N_{V,\beta }}e^{- N^2 \frac{\beta }{2}{\widetilde{\Sigma }_{\frac{2}{\beta }V}}({\widehat{\mu }_N})}dx_1 \ldots dx_N. \end{aligned}$$
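Indeed, with the notation above one has \({\widetilde{\Sigma }_{\frac{2}{\beta }V}}={\widetilde{J}_{\frac{2}{\beta }V}}-c_{\frac{2V}{\beta }}\), so this is just the definition of \(\mathbb{P }^N_{V,\beta }\) with the constant \(e^{-N^2c_{V,\beta }}=e^{-N^2\frac{\beta }{2}c_{\frac{2V}{\beta }}}\) factored out.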

Thus, using Lemma 3, we get

$$\begin{aligned}&\mathbb{P }^N_{V,\beta }\left( W_1({\widehat{\mu }_N}, \mu _{\frac{2V}{\beta }}) \geqslant \theta ,\max |x_i|<M\right) \\&\quad \leqslant e^{-NA_{V,\beta }}(2M)^N \exp \left( -N^2\frac{\beta }{2} \inf \left\{ {\widetilde{\Sigma }_{\frac{2}{\beta }V}}({\widehat{\mu }_N})\left| \begin{array}{ll}\forall i, x_i\in [-M;M],\\ W_1({\widehat{\mu }_N}, \mu _{\frac{2V}{\beta }}) \geqslant \theta \end{array}\right. \right\} \right) . \end{aligned}$$
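Here the factor \((2M)^N\) simply bounds the Lebesgue measure of the integration domain \([-M;M]^N\).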

Next we apply the approximate free \(T_1\) inequality of Proposition 5 to obtain for any \(u>0\),

$$\begin{aligned}&\mathbb{P }^N_{V,\beta }\left( W_1({\widehat{\mu }_N}, \mu _{\frac{2V}{\beta }}) \geqslant \theta ,\max |x_i|<M\right) \\&\quad \leqslant e^{-NA_{V,\beta }}(2M)^N \exp \left( \frac{\beta N}{4B_V}\left( 3(\Vert V\Vert ^{[-M-1;M+1]}_{Lip}+B+\ln (N))- N\theta ^2\right) \right) \\&\quad \leqslant K(N,\theta ,u)\exp \left( -u N^2 \theta ^2\right) \end{aligned}$$

with

$$\begin{aligned}&K(N,\theta ,u) \\&\quad = \exp \left( N\left( -A_{V,\beta }+\ln (2M)+ \frac{3\beta }{4B_V}\left( \Vert V\Vert ^{[-M-1;M+1]}_ {Lip}+B+\ln (N)\right) \right. \right. \\&\qquad \left. \left. + \left( u-\frac{\beta }{4B_V}\right) N\theta ^2\right) \right) . \end{aligned}$$

Let us choose \(u<\frac{\beta }{4B_V}\) so that \(K(N,\theta ,u)\) is a decreasing function of \(\theta \). It is then easy to check that for a suitable choice of \(v\) (which may depend on \(M,\,V\) and \(\beta \)), for all \(\theta >v\sqrt{\frac{\ln (1+N)}{ N}}\),

$$\begin{aligned} K(N,\theta ,u)\leqslant K\left( N,v\sqrt{\frac{\ln (1+N)}{ N}},u\right) \leqslant 1. \end{aligned}$$
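Indeed, for \(\theta _N=v\sqrt{\frac{\ln (1+N)}{N}}\), the exponent in \(K(N,\theta _N,u)\) equals

$$\begin{aligned} N\left( -A_{V,\beta }+\ln (2M)+ \frac{3\beta }{4B_V}\left( \Vert V\Vert ^{[-M-1;M+1]}_{Lip}+B+\ln (N)\right) \right) -\left( \frac{\beta }{4B_V}-u\right) v^2N\ln (1+N), \end{aligned}$$

which is nonpositive for all \(N\) as soon as \(v^2\left( \frac{\beta }{4B_V}-u\right) \) dominates the bracket, whose growth in \(N\) is only logarithmic.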

\(\square \)

We can now complete the proof of Theorem 5.

Proof

Following the same steps as above, we get that for any \(M, \theta >0,\)

$$\begin{aligned}&\mathbb{P }^N_{V,\beta }\left( W_1({\widehat{\mu }_N}, \mu _{\frac{2V}{\beta }}) \geqslant \theta \right) \\&\quad \leqslant e^{-NA_{V,\beta }}(2M)^N \exp \left( \frac{\beta N}{4B_V} \left( 3(\Vert V\Vert ^{[-M-1;M+1]}_{Lip}+B+\ln (N))- N\theta ^2\right) \right) \\&\qquad +\,\mathbb{P }^N_{V,\beta }\left( \max |x_i|>M\right) . \end{aligned}$$

Now, from Lemma 4 above, under Hypothesis 2, we have that, for any \(0<a<\frac{\alpha }{2d}\) and \(M\) large enough,

$$\begin{aligned} \mathbb{P }^N_{V,\beta }\left( \max |x_i|>M\right) \leqslant e^{-aNM^{d}}. \end{aligned}$$

Thus, if we choose \(\theta >v\sqrt{\frac{\ln (1+N)}{ N}}\) with \(v>M_0^{\frac{d}{2}}\) and \(M=(\sqrt{N}\theta )^{\frac{2}{d}} > M_0\), we get, for any \(u>0,\)

$$\begin{aligned} \mathbb{P }^N_{V,\beta }\left( W_1({\widehat{\mu }_N}, \mu _{\frac{2V}{\beta }}) \geqslant \theta \right) \leqslant \widetilde{K}(N,\theta ,u)\exp \left( -u N^2 \theta ^2\right) \end{aligned}$$

with

$$\begin{aligned}&\widetilde{K}(N,\theta ,u) = \exp \left( N\left( -A_{V,\beta }+\ln \left( 2(\sqrt{N} \theta ) ^{\frac{2}{d}}\right) \right. \right. \\&\qquad \qquad \qquad \quad + \left. \left. \frac{3\beta }{4B_V}\left( \Vert V\Vert ^{[-(\sqrt{N} \theta )^{\frac{2}{d}}-1; (\sqrt{N} \theta )^{\frac{2}{d}}+1]}_{Lip}+B+\ln (N)\right) \right. \right. \\&\qquad \qquad \qquad \quad \left. \left. + \left( u-\frac{\beta }{4B_V}\right) (\sqrt{N}\theta )^2\right) \right) +\exp \left( -(a-u)N^2\theta ^2\right) . \end{aligned}$$

Again the result follows easily if we choose \(u < \min \big (\frac{\beta }{4B_V} ,a\big )\), since \(N\theta ^2\geqslant v^2\ln (1+N)\) tends to infinity and

$$\begin{aligned} \Vert V\Vert ^{[-(\sqrt{N} \theta )^{\frac{2}{d}}-1;(\sqrt{N} \theta )^{\frac{2}{d}}+1]}_{Lip}=O\left( \left( \sqrt{N} \theta \right) ^{\frac{2(d-1)}{d}}\right) =o(N\theta ^2). \end{aligned}$$

\(\square \)