Abstract
Let \(X_1,\ldots ,X_n\) be i.i.d. random vectors in \(\mathbb {R}^d\) with \(\Vert X_1\Vert \le \beta \). Then, we show that the normalized sum \(S_n = \frac{1}{\sqrt{n}} \sum _{i=1}^n X_i\)
converges to a Gaussian in quadratic transportation (also known as “Kantorovich” or “Wasserstein”) distance at a rate of \(O \left( \frac{\sqrt{d} \beta \log n}{\sqrt{n}} \right) \), improving a result of Valiant and Valiant. The main feature of our theorem is that the rate of convergence is within \(\log n\) of optimal for \(n, d \rightarrow \infty \).
Notes
See [25] for the full version.
It should be noted that the bounds obtained by Bubeck and Ganguly [12] are also optimal to within logarithmic factors, but they are specific to Wishart matrices. We mention also the work of Bentkus and Götze [3], which obtains optimal bounds for quadratic forms under certain somewhat specialized assumptions.
Other names appearing in the literature include “Monge–Kantorovich distance”, “Kantorovich distance”, and “Wasserstein distance”. We refer to [26] for a historical discussion of the concept.
We remark that even if the \(d^{1/4}\) in Theorem 1.3 were replaced by a constant as in Nagaev’s lower bound, it would only give \(\Delta _{CI}(S_n, Z) = o(1)\) for \(d = o(n^{1/3})\), which is still more restrictive than \(d = o(n^{2/5})\). Thus, Corollary 1.5 proves that under the assumption \(\Vert X_1\Vert = \sqrt{d}\), convergence in \(\Delta _{CI}\) is actually faster than indicated by Nagaev’s example (which does not satisfy \(\Vert X_1\Vert = \sqrt{d}\)).
The constant was later improved to \((2 \pi )^{-1/4} \approx 0.64\) by Nazarov [18], who also constructed an example with surface area of order \(d^{1/4}\).
This is also given as equation (1.4) in [2].
References
Ball, K.: The reverse isoperimetric problem for Gaussian measure. Discrete Comput. Geom. 10(4), 411–420 (1993)
Bentkus, V.: On the dependence of the Berry–Esseen bound on dimension. J. Stat. Plan. Inference 113(2), 385–402 (2003)
Bentkus, V., Götze, F.: Optimal rates of convergence in the CLT for quadratic forms. Ann. Probab. 24(1), 466–490 (1996)
Bergström, H.: On the central limit theorem in the space \(R_k\), \(k >1\). Scand. Actuar. J. 1945(1–2), 106–127 (1945)
Bhattacharya, R.N.: Refinements of the multidimensional central limit theorem and applications. Ann. Probab. 5(1), 1–27 (1977)
Bhattacharya, R., Holmes, S.: An exposition of Götze’s estimation of the rate of convergence in the multivariate central limit theorem. Preprint arXiv:1003.4254 (2010)
Bobkov, S.G.: Entropic approach to E. Rio’s central limit theorem for \({\cal{W}}_2\) transport distance. Stat. Probab. Lett. 83(7), 1644–1648 (2013)
Bobkov, S.G., Chistyakov, G., Götze, F.: Berry–Esseen bounds in the entropic central limit theorem. Probab. Theory Relat. Fields 159(3–4), 435–478 (2014)
Bobkov, S.G., Götze, F.: Exponential integrability and transportation cost related to logarithmic Sobolev inequalities. J. Funct. Anal. 163(1), 1–28 (1999)
Bonis, T.: Rates in the central limit theorem and diffusion approximation via Stein’s method. Preprint arXiv:1506.06966 (2015)
Bubeck, S., Ding, J., Eldan, R., Rácz, M.: Testing for high-dimensional geometry in random graphs. Random Struct. Algorithms 49(3), 503–532 (2016)
Bubeck, S., Ganguly, S.: Entropic CLT and phase transition in high-dimensional Wishart matrices. Preprint arXiv:1509.03258 (2015)
Chen, L.H.Y., Fang, X.: Multivariate normal approximation by Stein’s method: the concentration inequality approach. Preprint arXiv:1111.4073 (2011)
Chernozhukov, V., Chetverikov, D., Kato, K.: Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Stat. 41(6), 2786–2819 (2013)
Götze, F.: On the rate of convergence in the multivariate CLT. Ann. Probab. 19(2), 724–739 (1991)
Marton, K.: Bounding \(\bar{d}\)-distance by informational divergence: a method to prove measure concentration. Ann. Probab. 24(2), 857–866 (1996)
Nagaev, S.V.: An estimate of the remainder term in the multidimensional CLT. In: Proceedings of the third Japan-USSR Symposium on Probability Theory, pp. 419–438. Springer, Berlin (1976)
Nazarov, F.: On the maximal perimeter of a convex set in \({\mathbb{R}}^n\) with respect to a Gaussian measure. In: Geometric Aspects of Functional Analysis, pp. 169–187. Springer, Berlin (2003)
Rio, E.: Upper bounds for minimal distances in the central limit theorem. Annales de l’IHP Probabilités et Statistiques 45(3), 802–817 (2009)
Rio, E.: Asymptotic constants for minimal distance in the central limit theorem. Electron. Commun. Probab. 16, 96–103 (2011)
Sazonov, V.V.: On the multi-dimensional central limit theorem. Sankhyā Indian J. Stat. Ser. A 30(2), 181–204 (1968)
Senatov, V.V.: Uniform estimates of the rate of convergence in the multi-dimensional central limit theorem. Theory Probab. Appl. 24(4), 745–759 (1980)
Talagrand, M.: Transportation cost for Gaussian and other product measures. Geom. Funct. Anal. 6(3), 587–600 (1996)
Valiant, G., Valiant, P.: Estimating the unseen: an \(n/\log (n)\)-sample estimator for entropy and support size, shown optimal via new CLTs. In: Proceedings of the Forty-Third Annual ACM Symposium on the Theory of Computing, pp. 685–694 (2011)
Valiant, G., Valiant, P.: A CLT and tight lower bounds for estimating entropy. http://www.eccc.uni-trier.de/report/2010/179/ (2010)
Vershik, A.M.: Long history of the Monge–Kantorovich transportation problem. Math. Intell. 35(4), 1–9 (2013)
Acknowledgements
We are indebted to Jian Ding for suggesting the use of Talagrand’s transportation inequality and Amir Dembo for pointing out a hole in a preliminary version of the main argument as well as many helpful comments on the exposition. We also thank Sourav Chatterjee for helpful discussions about related work. Finally, we thank the anonymous reviewers for many good suggestions and for pointing out several references.
Appendix
1.1 Proof of Proposition 1.2
Proof
Let \(\ell _n = \frac{\beta }{\sqrt{n}}\), and consider the lattice \(L = \ell _n \mathbb {Z}^d\). For any \(x \in \mathbb {R}^d\), let \(d_L(x)\) denote the minimum Euclidean distance from x to L. Note that \(S_n\) takes values in L. Thus, letting \(\rho \) denote the density of Z, we have
To estimate the right hand side, for any \(y \in L\), let \(Q_n(y)\) denote the cube of side length \(\ell _n\) centered at y (which is also the set of points in \(\mathbb {R}^d\) closer to y than to any other point in L). We find that
Next, let M be large enough so that
and let
Note that since \(\rho \) is positive and continuous, we have \(\lim _{n \rightarrow \infty } r_n = 1\).
Assume now that n is sufficiently large so that \(\ell _n < M\). Combining (10) with (9), we have for each \(y \in L \cap [-M,M]^d\) that
Summing over all such y yields
Multiplying both sides by \(\sqrt{n}\) and taking limits gives the result. \(\square \)
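As a numerical sanity check on the lattice construction (an illustration, not part of the original argument): for small spacing \(\ell \), the coordinates of a standard Gaussian point are nearly uniform modulo \(\ell \), so the expected squared distance to the lattice \(\ell \mathbb {Z}^d\) is approximately \(d \ell ^2 / 12\). The parameter values below are chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, ell, n_samples = 10, 0.05, 200_000

# Distance from a standard Gaussian point to the lattice ell * Z^d:
# each coordinate contributes the squared distance to the nearest
# multiple of ell.
z = rng.standard_normal((n_samples, d))
frac = np.remainder(z, ell)                # offset within a lattice cell
per_coord = np.minimum(frac, ell - frac)   # distance to nearest multiple
mean_sq = np.mean(np.sum(per_coord**2, axis=1))

# For small ell the offsets are nearly uniform on [0, ell], and a
# uniform offset contributes ell^2/12 on average, so
# E d_L(Z)^2 ~ d * ell^2 / 12.
print(mean_sq, d * ell**2 / 12)
```

The two printed values agree closely, matching the heuristic that each coordinate behaves like a uniform offset in its lattice cell.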
1.2 Proof of Proposition 1.4
Proof
We prove the result with \(C = 5\). Let \(A \subset \mathbb {R}^d\) be a given convex set. For a parameter \(\epsilon \) to be specified later, define
Ball [1] showed a \(4d^{1/4}\) upper bound for the Gaussian surface area of any convex set in \(\mathbb {R}^d\). Hence,
We may regard T as being coupled to Z so that \(\mathbf {E}\Vert T - Z\Vert ^2 = \mathcal {W}_2(T, Z)^2\). Then,
Similarly,
Thus,
and taking \(\epsilon = d^{-1/12} \mathcal {W}_2(T, Z)^{2/3}\) gives the result. \(\square \)
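The choice of \(\epsilon \) balances the two competing terms. Schematically (with constants simplified relative to the proof), one is minimizing a bound of the shape \(f(\epsilon ) = d^{1/4} \epsilon + \mathcal {W}_2(T, Z)^2 / \epsilon ^2\), whose stationary point is proportional to \(d^{-1/12} \mathcal {W}_2(T, Z)^{2/3}\). A quick numeric check of this calculus step, under that simplified form:

```python
import numpy as np

# Schematic bound f(eps) = d^{1/4} * eps + W^2 / eps^2 (constants
# simplified relative to the proof).  Setting f'(eps) = 0 gives the
# exact minimizer eps* = (2 W^2 / d^{1/4})^{1/3}, which is
# proportional to d^{-1/12} * W^{2/3}.
d, W = 100.0, 0.01
a, b = d**0.25, W**2

f = lambda eps: a * eps + b / eps**2
eps_star = (2 * b / a) ** (1 / 3)          # stationary point of f
eps_grid = np.linspace(0.5 * eps_star, 2 * eps_star, 1001)

print(eps_star, f(eps_star), f(eps_grid).min())
```

The value at `eps_star` is no larger than the minimum over the grid, confirming the scaling \(\epsilon \propto d^{-1/12} \mathcal {W}_2(T, Z)^{2/3}\) used above.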
1.3 Proof of Lemma 3.3
Proof
Let \(A'\) and \(B'\) be independent copies of A and B. Then,
Expanding yields
as desired. \(\square \)
1.4 Proof of Equation (2)
Proof
We proceed by induction on the dimension d, retracing the argument of [23], section 3. The base case \(d = 1\) is immediate from Theorem 3.1.
Assume now that the inequality holds in \(d - 1\) dimensions. For the inductive step, we can follow the same argument used to prove Theorem 3.1 (see [23], section 3). The argument proceeds by first comparing Y to another \(\mathbb {R}^d\)-valued random variable \({\hat{Y}}\) sharing the first \(d - 1\) coordinates of Y, but whose last coordinate is independently drawn from \(\mathcal {N}(0, \sigma _d)\).
Fix a \((d - 1)\)-dimensional vector \({\hat{x}}\), and let \(T_{{\hat{x}}}\) denote a random variable distributed as the last coordinate of Y conditioned on the first \(d - 1\) coordinates being equal to \({\hat{x}}\). Let \({\hat{\rho }}({\hat{x}}) = \int _{-\infty }^\infty \rho ({\hat{x}}, t) { dt}\). Then, the density of \(T_{{\hat{x}}}\) at t is given by
Noting that \(\frac{\rho ({\hat{x}}, t)}{{\hat{\rho }}({\hat{x}})}\) is the density of \(\mathcal {N}(0, \sigma _d)\) at t, the one-dimensional case of Theorem 3.1 implies
Since \(T_{{\hat{x}}}\) and \(\mathcal {N}(0, \sigma _d)\) have the same distributions as Y and \({\hat{Y}}\) conditioned on \({\hat{x}}\), we may integrate (11) over \({\hat{x}}\) to obtain
Next, define \(Y_{(d)}\) and \(Z_{(d)}\) to be the projections onto the first \(d - 1\) coordinates of Y and Z, respectively. Note that the coupling of Y to \({\hat{Y}}\) changes only the d-th coordinate. Furthermore, the d-th coordinates of \({\hat{Y}}\) and Z are both distributed as \(\mathcal {N}(0, \sigma _d)\), independently of the first \(d - 1\) coordinates. Thus, a coupling of \(Y_{(d)}\) to \(Z_{(d)}\) induces a coupling of \({\hat{Y}}\) to Z in which the last coordinate does not change. Consequently,
Now, recall that the density of \(Y_{(d)}\) at a point \({\hat{x}} \in \mathbb {R}^{d - 1}\) is \(f_{(d)}({\hat{x}}, 0) \cdot {\hat{\rho }}({\hat{x}})\), and so applying the inductive hypothesis to \(\mathcal {W}_2(Y_{(d)}, Z_{(d)})^2\) yields
Substituting into (12), we obtain
completing the induction. \(\square \)
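The one-dimensional comparisons underlying the induction can be illustrated numerically (again, an illustration rather than part of the proof): in one dimension the \(\mathcal {W}_2\)-optimal coupling is the monotone (quantile) coupling, realized empirically by sorting samples, and for centered Gaussians \(\mathcal {W}_2(\mathcal {N}(0, \sigma _1^2), \mathcal {N}(0, \sigma _2^2)) = |\sigma _1 - \sigma _2|\).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
sigma1, sigma2 = 1.0, 1.5

# In one dimension the W2-optimal coupling is monotone: pair the
# sorted samples (the empirical quantile coupling).
x = np.sort(sigma1 * rng.standard_normal(n))
y = np.sort(sigma2 * rng.standard_normal(n))
w2_emp = np.sqrt(np.mean((x - y) ** 2))

# For centered Gaussians, W2(N(0, s1^2), N(0, s2^2)) = |s1 - s2|.
print(w2_emp, abs(sigma1 - sigma2))
```

The empirical value matches \(|\sigma _1 - \sigma _2| = 0.5\) up to Monte Carlo error, consistent with the exact one-dimensional Gaussian formula.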
1.5 Proof of Lemma 4.2
Proof
Let \(C_k = (2 \pi )^{-\frac{k}{2}}\). We have
\(\square \)
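The constant \(C_k = (2 \pi )^{-k/2}\) is the usual Gaussian normalization: \(\int _{\mathbb {R}^k} e^{-\Vert x\Vert ^2/2} \, dx = (2\pi )^{k/2}\), so that \(C_k e^{-\Vert x\Vert ^2/2}\) integrates to 1. Since the integral factorizes over coordinates, a one-dimensional numeric check suffices:

```python
import numpy as np

# The Gaussian integral factorizes over coordinates, so it suffices
# to check one dimension: int exp(-x^2/2) dx = sqrt(2*pi).
x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]
integral = np.sum(np.exp(-x**2 / 2)) * dx  # Riemann sum; tails beyond
                                           # |x| = 10 are negligible
print(integral, np.sqrt(2 * np.pi))
```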
Cite this article
Zhai, A. A high-dimensional CLT in \(\mathcal {W}_2\) distance with near optimal convergence rate. Probab. Theory Relat. Fields 170, 821–845 (2018). https://doi.org/10.1007/s00440-017-0771-3