A high-dimensional CLT in \(\mathcal {W}_2\) distance with near optimal convergence rate


Abstract

Let \(X_1,\ldots ,X_n\) be i.i.d. random vectors in \(\mathbb {R}^d\) with \(\Vert X_1\Vert \le \beta \). Then, we show that

$$\begin{aligned} \frac{1}{\sqrt{n}}\left( X_1 + \cdots + X_n\right) \end{aligned}$$

converges to a Gaussian in quadratic transportation (also known as “Kantorovich” or “Wasserstein”) distance at a rate of \(O \left( \frac{\sqrt{d} \beta \log n}{\sqrt{n}} \right) \), improving a result of Valiant and Valiant. The main feature of our theorem is that the rate of convergence is within a \(\log n\) factor of optimal as \(n, d \rightarrow \infty \).
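
The statement above is about growing dimension, but the \(n\)-dependence can already be seen numerically. Below is a minimal Monte Carlo sketch, not from the paper: it takes \(d = 1\) with Rademacher summands \(X_i = \pm 1\) (so \(\beta = 1\)), estimates \(\mathcal {W}_2(S_n, Z)\) through the one-dimensional quantile coupling, and prints the estimate against the \(\log n / \sqrt{n}\) envelope. The sample size \(N\) and the Rademacher choice are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def empirical_w2_to_gaussian(samples):
    # In one dimension, W_2 is realized by the quantile coupling:
    # W_2^2 = \int_0^1 (F^{-1}(u) - Phi^{-1}(u))^2 du, discretized below.
    s = np.sort(samples)
    u = (np.arange(len(s)) + 0.5) / len(s)
    return np.sqrt(np.mean((s - norm.ppf(u)) ** 2))

N = 200_000  # copies of S_n; sampling bias of order ~N^{-1/2} limits resolution
for n in [10, 100, 1_000, 10_000]:
    # A sum of n independent signs equals 2*Binomial(n, 1/2) - n.
    sn = (2.0 * rng.binomial(n, 0.5, size=N) - n) / np.sqrt(n)
    print(f"n={n:>6}: W2 estimate {empirical_w2_to_gaussian(sn):.4f}, "
          f"log(n)/sqrt(n) = {np.log(n) / np.sqrt(n):.4f}")
```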


Notes

  1. See [25] for the full version.

  2. It should be noted that the bounds obtained by Bubeck and Ganguly [12] are also optimal to within logarithmic factors, but they are specific to Wishart matrices. We mention also the work of Bentkus and Götze [3], which obtains optimal bounds for quadratic forms under certain somewhat specialized assumptions.

  3. Other names appearing in the literature include “Monge–Kantorovich distance”, “Kantorovich distance”, and “Wasserstein distance”. We refer to [26] for a historical discussion of the concept.

  4. On the other hand, convergence of the probabilities of convex sets does not in general imply convergence in \(\mathcal {W}_2\) distance, and we do not know of any easy way to derive a result similar to Theorem 1.1 from Theorem 1.3.

  5. We remark that even if the \(d^{1/4}\) in Theorem 1.3 were replaced by a constant as in Nagaev’s lower bound, it would only give \(\Delta _{{ CI}}(S_n, Z) = o(1)\) for \(d = o(n^{1/3})\), which is still more restrictive than \(d = o(n^{2/5})\). Thus, Corollary 1.5 proves that under the assumption \(\Vert X_1\Vert = \sqrt{d}\), convergence in \(\Delta _{{ CI}}\) is actually faster than indicated by Nagaev’s example (which does not satisfy \(\Vert X_1\Vert = \sqrt{d}\)).

  6. The constant was later improved to \((2 \pi )^{-1/4} \approx 0.64\) by Nazarov [18], who also constructed an example with surface area of order \(d^{1/4}\).

  7. This is also given as equation (1.4) in [2].

References

  1. Ball, K.: The reverse isoperimetric problem for Gaussian measure. Discrete Comput. Geom. 10(4), 411–420 (1993)

  2. Bentkus, V.: On the dependence of the Berry–Esseen bound on dimension. J. Stat. Plan. Inference 113(2), 385–402 (2003)

  3. Bentkus, V., Götze, F.: Optimal rates of convergence in the CLT for quadratic forms. Ann. Probab. 24(1), 466–490 (1996)

  4. Bergström, H.: On the central limit theorem in the space \(R_k\), \(k >1\). Scand. Actuar. J. 1945(1–2), 106–127 (1945)

  5. Bhattacharya, R.N.: Refinements of the multidimensional central limit theorem and applications. Ann. Probab. 5(1), 1–27 (1977)

  6. Bhattacharya, R., Holmes, S.: An exposition of Götze’s estimation of the rate of convergence in the multivariate central limit theorem. Preprint arXiv:1003.4254 (2010)

  7. Bobkov, S.G.: Entropic approach to E. Rio’s central limit theorem for \(\mathcal {W}_2\) transport distance. Stat. Probab. Lett. 83(7), 1644–1648 (2013)

  8. Bobkov, S.G., Chistyakov, G., Götze, F.: Berry–Esseen bounds in the entropic central limit theorem. Probab. Theory Relat. Fields 159(3–4), 435–478 (2014)

  9. Bobkov, S.G., Götze, F.: Exponential integrability and transportation cost related to logarithmic Sobolev inequalities. J. Funct. Anal. 163(1), 1–28 (1999)

  10. Bonis, T.: Rates in the central limit theorem and diffusion approximation via Stein’s method. Preprint arXiv:1506.06966 (2015)

  11. Bubeck, S., Ding, J., Eldan, R., Rácz, M.: Testing for high-dimensional geometry in random graphs. Random Struct. Algorithms 49(3), 503–532 (2016)

  12. Bubeck, S., Ganguly, S.: Entropic CLT and phase transition in high-dimensional Wishart matrices. Preprint arXiv:1509.03258 (2015)

  13. Chen, L.H.Y., Fang, X.: Multivariate normal approximation by Stein’s method: the concentration inequality approach. Preprint arXiv:1111.4073 (2011)

  14. Chernozhukov, V., Chetverikov, D., Kato, K.: Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Stat. 41(6), 2786–2819 (2013)

  15. Götze, F.: On the rate of convergence in the multivariate CLT. Ann. Probab. 19(2), 724–739 (1991)

  16. Marton, K.: Bounding \(\bar{d}\)-distance by informational divergence: a method to prove measure concentration. Ann. Probab. 24(2), 857–866 (1996)

  17. Nagaev, S.V.: An estimate of the remainder term in the multidimensional CLT. In: Proceedings of the Third Japan–USSR Symposium on Probability Theory, pp. 419–438. Springer, Berlin (1976)

  18. Nazarov, F.: On the maximal perimeter of a convex set in \({\mathbb{R}}^n\) with respect to a Gaussian measure. In: Geometric Aspects of Functional Analysis, pp. 169–187. Springer, Berlin (2003)

  19. Rio, E.: Upper bounds for minimal distances in the central limit theorem. Annales de l’IHP Probabilités et Statistiques 45(3), 802–817 (2009)

  20. Rio, E.: Asymptotic constants for minimal distance in the central limit theorem. Electron. Commun. Probab. 16, 96–103 (2011)

  21. Sazonov, V.V.: On the multi-dimensional central limit theorem. Sankhyā Indian J. Stat. Ser. A 30(2), 181–204 (1968)

  22. Senatov, V.V.: Uniform estimates of the rate of convergence in the multi-dimensional central limit theorem. Theory Probab. Appl. 24(4), 745–759 (1980)

  23. Talagrand, M.: Transportation cost for Gaussian and other product measures. Geom. Funct. Anal. 6(3), 587–600 (1996)

  24. Valiant, G., Valiant, P.: Estimating the unseen: an \(n/\log (n)\)-sample estimator for entropy and support size, shown optimal via new CLTs. In: Proceedings of the Forty-Third Annual ACM Symposium on the Theory of Computing, pp. 685–694 (2011)

  25. Valiant, G., Valiant, P.: A CLT and tight lower bounds for estimating entropy. http://www.eccc.uni-trier.de/report/2010/179/ (2010)

  26. Vershik, A.M.: Long history of the Monge–Kantorovich transportation problem. Math. Intell. 35(4), 1–9 (2013)

Acknowledgements

We are indebted to Jian Ding for suggesting the use of Talagrand’s transportation inequality and Amir Dembo for pointing out a hole in a preliminary version of the main argument as well as many helpful comments on the exposition. We also thank Sourav Chatterjee for helpful discussions about related work. Finally, we thank the anonymous reviewers for many good suggestions and for pointing out several references.

Author information

Correspondence to Alex Zhai.

Appendix

1.1 Proof of Proposition 1.2

Proof

Let \(\ell _n = \frac{\beta }{\sqrt{n}}\), and consider the lattice \(L = \ell _n \mathbb {Z}^d\). For any \(x \in \mathbb {R}^d\), let \(d_L(x)\) denote the minimum Euclidean distance from x to L. Note that \(S_n\) takes values in L. Thus, letting \(\rho \) denote the density of Z, we have

$$\begin{aligned} \mathcal {W}_2(S_n, Z) \ge \int \rho (x) d_L(x)~{ dx}. \end{aligned}$$

To estimate the right hand side, for any \(y \in L\), let \(Q_n(y)\) denote the cube of side length \(\ell _n\) centered at y (which is also the set of points in \(\mathbb {R}^d\) closer to y than to any other point in L). We find that

$$\begin{aligned} \frac{1}{{{\mathrm{Vol}}}Q_n(y)} \int _{Q_n(y)} d_L(x) ~{ dx}&= \frac{\ell _n}{2^d} \int _{[-1,1]^d} \Vert x\Vert ~{ dx} = \frac{\ell _n}{2^d} \int _{[-1,1]^d} \sqrt{x_1^2 + \cdots + x_d^2} ~{ dx} \nonumber \\&\ge \frac{\ell _n}{2^d} \int _{[-1,1]^d} \frac{1}{\sqrt{d}}\left( |x_1| + \cdots + |x_d|\right) ~{ dx} = \frac{1}{2} \ell _n \sqrt{d}. \end{aligned}$$
(9)

Next, let M be large enough so that

$$\begin{aligned} \int _{[-M,M]^d} \rho (x) ~{ dx} \ge \frac{1}{2}, \end{aligned}$$

and let

$$\begin{aligned} r_n = \inf _{\begin{array}{c} x, y \in [-2M, 2M]^d \\ \Vert x - y\Vert \le \sqrt{d} \ell _n \end{array}} \frac{\rho (x)}{\rho (y)}. \end{aligned}$$
(10)

Note that since \(\rho \) is positive and continuous, we have \(\lim _{n \rightarrow \infty } r_n = 1\).

Assume now that n is sufficiently large so that \(\ell _n < M\). Combining (10) with (9), we have for each \(y \in L \cap [-M,M]^d\) that

$$\begin{aligned} \int _{Q_n(y)} \rho (x) d_L(x) ~{ dx}&\ge \frac{r_n}{{{\mathrm{Vol}}}Q_n(y)} \int _{Q_n(y)} \rho (x) ~{ dx} \cdot \int _{Q_n(y)} d_L(x) ~{ dx} \\&\ge \frac{r_n \ell _n \sqrt{d}}{2} \int _{Q_n(y)} \rho (x) ~{ dx}. \end{aligned}$$

Summing over all such y yields

$$\begin{aligned} \mathcal {W}_2(S_n, Z)&\ge \int \rho (x) d_L(x) ~{ dx} \ge \int _{[-2M,2M]^d} \rho (x) d_L(x) ~{ dx} \\&\ge \sum _{y \in L \cap [-M,M]^d} \int _{Q_n(y)} \rho (x) d_L(x) ~{ dx} \\&\ge \frac{r_n \ell _n \sqrt{d}}{2} \int _{[-M,M]^d} \rho (x) ~{ dx} \ge \frac{r_n \beta \sqrt{d}}{4 \sqrt{n}}. \end{aligned}$$

Multiplying both sides by \(\sqrt{n}\) and taking limits gives the result. \(\square \)
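
As a numerical illustration of the scale appearing here (not from the paper): for small \(\ell \), a standard Gaussian point is nearly uniform within its cell of the lattice \(\ell \mathbb {Z}^d\), so \(\mathbf {E}\, d_L(Z)\) is close to \(\ell \sqrt{d/12}\), a constant multiple of \(\ell \sqrt{d}\), matching the \(\beta \sqrt{d}/\sqrt{n}\) order of the proposition. The sketch below checks this; the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_dist_to_lattice(ell, d, n_samples=20_000):
    # Monte Carlo estimate of E d_L(Z) for Z ~ N(0, I_d) and the lattice ell * Z^d.
    z = rng.standard_normal((n_samples, d))
    nearest = ell * np.round(z / ell)  # nearest lattice point, coordinatewise
    return np.linalg.norm(z - nearest, axis=1).mean()

ell = 0.01  # plays the role of ell_n = beta / sqrt(n)
for d in [1, 4, 16, 64, 256]:
    est = mean_dist_to_lattice(ell, d)
    # The ratio approaches 1/sqrt(12) ≈ 0.289 as d grows.
    print(f"d={d:>4}: estimate {est:.5f}, ratio to ell*sqrt(d) = {est / (ell * np.sqrt(d)):.3f}")
```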

1.2 Proof of Proposition 1.4

Proof

We prove the result with \(C = 5\). Let \(A \subset \mathbb {R}^d\) be a given convex set. For a parameter \(\epsilon \) to be specified later, define

$$\begin{aligned} A^\epsilon= & {} \left\{ x \in \mathbb {R}^d \mid \inf _{a \in A} \Vert x - a\Vert \le \epsilon \right\} \\ A_\epsilon= & {} \left\{ x \in \mathbb {R}^d \mid \inf _{a \in \mathbb {R}^d \setminus A} \Vert x - a\Vert \ge \epsilon \right\} . \end{aligned}$$

Ball [1] showed a \(4d^{1/4}\) upper bound (see note 6) for the Gaussian surface area of any convex set in \(\mathbb {R}^d\). Hence (see note 7),

$$\begin{aligned} \mathbf {P}\left( Z \in A^\epsilon \setminus A\right) \le 4 \epsilon d^{1/4},\quad \text {and}\quad \mathbf {P}\left( Z \in A \setminus A_\epsilon \right) \le 4 \epsilon d^{1/4}. \end{aligned}$$

We may regard T as being coupled to Z so that \(\mathbf {E}\Vert T - Z\Vert ^2 = \mathcal {W}_2(T, Z)^2\). Then,

$$\begin{aligned} \mathbf {P}(T \in A)&\le \mathbf {P}(\Vert T - Z\Vert \le \epsilon , \; T \in A) + \mathbf {P}(\Vert T - Z\Vert > \epsilon ) \\&\le \mathbf {P}(Z \in A^\epsilon ) + \epsilon ^{-2} \mathcal {W}_2(T, Z)^2 \\&\le \mathbf {P}(Z \in A) + 4 \epsilon d^{1/4} + \epsilon ^{-2} \mathcal {W}_2(T, Z)^2. \end{aligned}$$

Similarly,

$$\begin{aligned} \mathbf {P}(Z \in A)&\le \mathbf {P}(Z \in A_\epsilon ) + 4 \epsilon d^{1/4} \\&\le \mathbf {P}(\Vert T - Z\Vert \le \epsilon , \; Z \in A_\epsilon ) + \mathbf {P}(\Vert T - Z\Vert > \epsilon ) + 4 \epsilon d^{1/4} \\&\le \mathbf {P}(T \in A) + \epsilon ^{-2}\mathcal {W}_2(T, Z)^2 + 4 \epsilon d^{1/4}. \end{aligned}$$

Thus,

$$\begin{aligned} |\mathbf {P}(T \in A) - \mathbf {P}(Z \in A)| \le \epsilon ^{-2}\mathcal {W}_2(T, Z)^2 + 4 \epsilon d^{1/4}, \end{aligned}$$

and taking \(\epsilon = d^{-1/12} \mathcal {W}_2(T, Z)^{2/3}\), which makes the two terms equal to \(d^{1/6} \mathcal {W}_2(T, Z)^{2/3}\) and \(4 d^{1/6} \mathcal {W}_2(T, Z)^{2/3}\) respectively, gives the result with \(C = 5\). \(\square \)
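
The last step is arithmetic in the exponents: with this choice of \(\epsilon \), both terms become multiples of \(d^{1/6} \mathcal {W}_2(T, Z)^{2/3}\). A quick symbolic check (not from the paper; sympy is used only for bookkeeping):

```python
import sympy as sp

d, W = sp.symbols("d W", positive=True)  # W stands for W_2(T, Z)
eps = d ** sp.Rational(-1, 12) * W ** sp.Rational(2, 3)
bound = eps ** -2 * W ** 2 + 4 * eps * d ** sp.Rational(1, 4)
print(sp.simplify(bound))  # prints 5*W**(2/3)*d**(1/6)
```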

1.3 Proof of Lemma 3.3

Proof

Let \(A'\) and \(B'\) be independent copies of A and B. Then,

$$\begin{aligned} \mathbf {E}\Big ( f(A, B) + f(A', B') - f(A, B') - f(A', B) \Big )^2 \ge 0. \end{aligned}$$

Expanding yields

$$\begin{aligned} 4 \mathbf {E}f(A, B)^2 + 4 (\mathbf {E}f(A, B))^2&= 4 \mathbf {E}f(A, B)^2 + 2 \mathbf {E}f(A, B) f(A', B') \\&\quad +\,2 \mathbf {E}f(A, B') f(A', B) \\&\ge 2 \mathbf {E}f(A, B)f(A, B') + 2 \mathbf {E}f(A, B) f(A', B) \\&\quad +\,2 \mathbf {E}f(A', B') f(A, B') + 2 \mathbf {E}f(A', B') f(A', B) \\&= 2 \mathbf {E}f_B(A)^2 + 2 \mathbf {E}f_A(B)^2 \\&\quad +\,2 \mathbf {E}f_A(B)^2 + 2 \mathbf {E}f_B(A)^2 \\&= 4 \mathbf {E}f_B(A)^2 + 4 \mathbf {E}f_A(B)^2, \end{aligned}$$

as desired. \(\square \)
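
Reading \(f_B(A)\) as the conditional mean \(\mathbf {E}[f(A, B) \mid A]\) and \(f_A(B)\) as \(\mathbf {E}[f(A, B) \mid B]\), consistently with how the expansion above is used, the lemma states (after dividing by 4) that \(\mathbf {E}f(A,B)^2 + (\mathbf {E}f(A,B))^2 \ge \mathbf {E}f_B(A)^2 + \mathbf {E}f_A(B)^2\). A quick numerical check on a finite product space, not from the paper (the matrix encoding f and the uniform laws are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Encode f on a finite product space as a matrix: f(a, b) = M[a, b],
# with A and B independent and uniform over row and column indices.
M = rng.standard_normal((50, 70))

Ef2 = np.mean(M ** 2)                # E f(A,B)^2
Ef_sq = np.mean(M) ** 2              # (E f(A,B))^2
EfB2 = np.mean(M.mean(axis=1) ** 2)  # E f_B(A)^2: average out B first
EfA2 = np.mean(M.mean(axis=0) ** 2)  # E f_A(B)^2: average out A first

print(Ef2 + Ef_sq, ">=", EfB2 + EfA2)
assert Ef2 + Ef_sq >= EfB2 + EfA2
```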

1.4 Proof of Equation (2)

Proof

We proceed by induction on the dimension d, retracing the argument of [23], section 3. The base case \(d = 1\) is immediate from Theorem 3.1.

Assume now that the inequality holds in \(d - 1\) dimensions. For the inductive step, we can follow the same argument used to prove Theorem 3.1 (see [23], section 3). The argument proceeds by first comparing Y to another \(\mathbb {R}^d\)-valued random variable \({\hat{Y}}\) sharing the first \(d - 1\) coordinates of Y, but whose last coordinate is independently drawn from \(\mathcal {N}(0, \sigma _d)\).

Fix a \((d - 1)\)-dimensional vector \({\hat{x}}\), and let \(T_{{\hat{x}}}\) denote a random variable distributed as the last coordinate of Y conditioned on the first \(d - 1\) coordinates being equal to \({\hat{x}}\). Let \({\hat{\rho }}({\hat{x}}) = \int _{-\infty }^\infty \rho ({\hat{x}}, t) { dt}\). Then, the density of \(T_{{\hat{x}}}\) at t is given by

$$\begin{aligned} \frac{f({\hat{x}}, t) \cdot \rho ({\hat{x}}, t)}{f_{(d)}({\hat{x}}, 0) \cdot {\hat{\rho }}({\hat{x}})}. \end{aligned}$$

Noting that \(\frac{\rho ({\hat{x}}, t)}{{\hat{\rho }}({\hat{x}})}\) is the density of \(\mathcal {N}(0, \sigma _d)\) at t, the one-dimensional case of Theorem 3.1 implies

$$\begin{aligned} \mathcal {W}_2(T_{{\hat{x}}}, \mathcal {N}(0, \sigma _d))^2 \le 2 \sigma _d^2 \int _{-\infty }^\infty \frac{f({\hat{x}}, t)}{f_{(d)}({\hat{x}}, t)} \log \left( \frac{f({\hat{x}}, t)}{f_{(d)}({\hat{x}}, t)} \right) \frac{\rho ({\hat{x}}, t)}{{\hat{\rho }}({\hat{x}})} ~{ dt}. \end{aligned}$$
(11)

Since \(T_{{\hat{x}}}\) and \(\mathcal {N}(0, \sigma _d)\) have the same distributions as Y and \({\hat{Y}}\) conditioned on \({\hat{x}}\), we may integrate (11) over \({\hat{x}}\) to obtain

$$\begin{aligned} \mathcal {W}_2(Y, {\hat{Y}})^2&\le \int _{\mathbb {R}^{d - 1}} \mathcal {W}_2(T_{{\hat{x}}}, \mathcal {N}(0, \sigma _d))^2 \cdot f_{(d)}({\hat{x}}, 0) {\hat{\rho }}({\hat{x}}) ~d{\hat{x}} \\&\le 2 \sigma _d^2 \int _{\mathbb {R}^{d - 1}} \int _{-\infty }^\infty f({\hat{x}}, t) \log \left( \frac{f({\hat{x}}, t)}{f_{(d)}({\hat{x}}, t)} \right) \rho ({\hat{x}}, t) ~{ dt} ~d{\hat{x}} \\&= 2 \sigma _d^2 \cdot \mathbf {E}\left( f(Z) \log \frac{f(Z)}{f_{(d)}(Z)} \right) \\&= 2 \sigma _d^2 \cdot \bigg ( \mathbf {E}\left( f(Z) \log f(Z) \right) - \mathbf {E}\left( f_{(d)}(Z) \log f_{(d)}(Z) \right) \bigg ). \end{aligned}$$

Next, define \(Y_{(d)}\) and \(Z_{(d)}\) to be the projections onto the first \(d - 1\) coordinates of Y and Z, respectively. Note that the coupling of Y to \({\hat{Y}}\) changes only the d-th coordinate. Furthermore, the d-th coordinates of \({\hat{Y}}\) and Z are both distributed as \(\mathcal {N}(0, \sigma _d)\), independently of the first \(d - 1\) coordinates. Thus, a coupling of \(Y_{(d)}\) to \(Z_{(d)}\) induces a coupling of \({\hat{Y}}\) to Z in which the last coordinate does not change. Consequently,

$$\begin{aligned} \mathcal {W}_2(Y, Z)^2\le & {} 2 \sigma _d^2 \cdot \bigg ( \mathbf {E}\left( f(Z) \log f(Z) \right) - \mathbf {E}\left( f_{(d)}(Z) \log f_{(d)}(Z) \right) \bigg )\nonumber \\&+ \mathcal {W}_2(Y_{(d)}, Z_{(d)})^2. \end{aligned}$$
(12)

Now, recall that the density of \(Y_{(d)}\) at a point \({\hat{x}} \in \mathbb {R}^{d - 1}\) is \(f_{(d)}({\hat{x}}, 0) \cdot {\hat{\rho }}({\hat{x}})\), and so applying the inductive hypothesis to \(\mathcal {W}_2(Y_{(d)}, Z_{(d)})^2\) yields

$$\begin{aligned} \mathcal {W}_2(Y_{(d)}, Z_{(d)})^2\le & {} 2 \sum _{k = 1}^{d - 1} \sigma _k^2 \cdot \mathbf {E}\left( f_{[k]}(Z_{(d)}) \log f_{[k]}(Z_{(d)}) - f_{[k - 1]}(Z_{(d)}) \log f_{[k - 1]}(Z_{(d)}) \right) \\= & {} 2 \sum _{k = 1}^{d - 1} \sigma _k^2 \cdot \mathbf {E}\left( f_{[k]}(Z) \log f_{[k]}(Z) - f_{[k - 1]}(Z) \log f_{[k - 1]}(Z) \right) . \end{aligned}$$

Substituting into (12), we obtain

$$\begin{aligned} \mathcal {W}_2(Y, Z)^2 \le 2 \sum _{k = 1}^{d} \sigma _k^2 \cdot \mathbf {E}\left( f_{[k]}(Z) \log f_{[k]}(Z) - f_{[k - 1]}(Z) \log f_{[k - 1]}(Z) \right) , \end{aligned}$$

completing the induction. \(\square \)
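
The base case invokes Theorem 3.1, which, judging from the constant \(2 \sigma _d^2\) in (11) and the reference to Talagrand [23], we take to be the transportation inequality \(\mathcal {W}_2(\mu , \mathcal {N}(0, \sigma ^2))^2 \le 2 \sigma ^2 \, D(\mu \Vert \mathcal {N}(0, \sigma ^2))\). A small check, not from the paper, that the inequality is attained with equality for a pure mean shift \(\mu = \mathcal {N}(m, \sigma ^2)\), where both sides equal \(m^2\):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def check(m, s):
    # W_2 between one-dimensional laws via the (exact) quantile coupling.
    u = (np.arange(10_000) + 0.5) / 10_000
    w2_sq = np.mean((norm.ppf(u, loc=m, scale=s) - norm.ppf(u, loc=0, scale=s)) ** 2)
    # KL divergence KL(N(m, s^2) || N(0, s^2)) by numerical integration.
    kl, _ = quad(
        lambda x: norm.pdf(x, m, s) * (norm.logpdf(x, m, s) - norm.logpdf(x, 0, s)),
        -np.inf, np.inf,
    )
    print(f"m={m}, s={s}: W_2^2 = {w2_sq:.4f}, 2*s^2*KL = {2 * s**2 * kl:.4f}")

for m, s in [(0.5, 1.0), (2.0, 0.7)]:
    check(m, s)
```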

1.5 Proof of Lemma 4.2

Proof

Let \(C_k = (2 \pi )^{-\frac{k}{2}}\). We have

\(\square \)

Cite this article

Zhai, A. A high-dimensional CLT in \(\mathcal {W}_2\) distance with near optimal convergence rate. Probab. Theory Relat. Fields 170, 821–845 (2018). https://doi.org/10.1007/s00440-017-0771-3

