Abstract
Let \(X_1,\ldots ,X_n\) be i.i.d. random vectors in \(\mathbb {R}^d\) with \(\Vert X_1\Vert \le \beta \). Then, we show that the normalized sum \(S_n = \frac{1}{\sqrt{n}} \sum _{i=1}^n X_i\)
converges to a Gaussian in quadratic transportation (also known as “Kantorovich” or “Wasserstein”) distance at a rate of \(O \left( \frac{\sqrt{d} \beta \log n}{\sqrt{n}} \right) \), improving a result of Valiant and Valiant. The main feature of our theorem is that the rate of convergence is within \(\log n\) of optimal for \(n, d \rightarrow \infty \).
Notes
See [25] for the full version.
It should be noted that the bounds obtained by Bubeck and Ganguly [12] are also optimal to within logarithmic factors, but they are specific to Wishart matrices. We mention also the work of Bentkus and Götze [3], which obtains optimal bounds for quadratic forms under certain somewhat specialized assumptions.
Other names appearing in the literature include “Monge–Kantorovich distance”, “Kantorovich distance”, and “Wasserstein distance”. We refer to [26] for a historical discussion of the concept.
We remark that even if the \(d^{1/4}\) in Theorem 1.3 were replaced by a constant as in Nagaev’s lower bound, it would only give \(\Delta _{CI}(S_n, Z) = o(1)\) for \(d = o(n^{1/3})\), which is still more restrictive than \(d = o(n^{2/5})\). Thus, Corollary 1.5 proves that under the assumption \(\Vert X_1\Vert = \sqrt{d}\), convergence in \(\Delta _{CI}\) is actually faster than indicated by Nagaev’s example (which does not satisfy \(\Vert X_1\Vert = \sqrt{d}\)).
The constant was later improved to \((2 \pi )^{-1/4} \approx 0.64\) by Nazarov [18], who also constructed an example with surface area of order \(d^{1/4}\).
This is also given as equation (1.4) in [2].
References
Ball, K.: The reverse isoperimetric problem for Gaussian measure. Discrete Comput. Geom. 10(4), 411–420 (1993)
Bentkus, V.: On the dependence of the Berry–Esseen bound on dimension. J. Stat. Plan. Inference 113(2), 385–402 (2003)
Bentkus, V., Götze, F.: Optimal rates of convergence in the CLT for quadratic forms. Ann. Probab. 24(1), 466–490 (1996)
Bergström, H.: On the central limit theorem in the space \(R_k\), \(k >1\). Scand. Actuar. J. 1945(1–2), 106–127 (1945)
Bhattacharya, R.N.: Refinements of the multidimensional central limit theorem and applications. Ann. Probab. 5(1), 1–27 (1977)
Bhattacharya, R., Holmes, S.: An exposition of Götze’s estimation of the rate of convergence in the multivariate central limit theorem. Preprint arXiv:1003.4254 (2010)
Bobkov, S.G.: Entropic approach to E. Rio’s central limit theorem for \({\cal{W}}_2\) transport distance. Stat. Probab. Lett. 83(7), 1644–1648 (2013)
Bobkov, S.G., Chistyakov, G., Götze, F.: Berry–Esseen bounds in the entropic central limit theorem. Probab. Theory Relat. Fields 159(3–4), 435–478 (2014)
Bobkov, S.G., Götze, F.: Exponential integrability and transportation cost related to logarithmic Sobolev inequalities. J. Funct. Anal. 163(1), 1–28 (1999)
Bonis, T.: Rates in the central limit theorem and diffusion approximation via Stein’s method. Preprint arXiv:1506.06966 (2015)
Bubeck, S., Ding, J., Eldan, R., Rácz, M.: Testing for high-dimensional geometry in random graphs. Random Struct. Algorithms 49(3), 503–532 (2016)
Bubeck, S., Ganguly, S.: Entropic CLT and phase transition in high-dimensional Wishart matrices. Preprint arXiv:1509.03258 (2015)
Chen, L.H.Y., Fang, X.: Multivariate normal approximation by Stein’s method: the concentration inequality approach. Preprint arXiv:1111.4073 (2011)
Chernozhukov, V., Chetverikov, D., Kato, K.: Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Stat. 41(6), 2786–2819 (2013)
Götze, F.: On the rate of convergence in the multivariate CLT. Ann. Probab. 19(2), 724–739 (1991)
Marton, K.: Bounding \(\bar{d}\)-distance by informational divergence: a method to prove measure concentration. Ann. Probab. 24(2), 857–866 (1996)
Nagaev, S.V.: An estimate of the remainder term in the multidimensional CLT. In: Proceedings of the third Japan-USSR Symposium on Probability Theory, pp. 419–438. Springer, Berlin (1976)
Nazarov, F.: On the maximal perimeter of a convex set in \({\mathbb{R}}^n\) with respect to a Gaussian measure. In: Geometric Aspects of Functional Analysis, pp. 169–187. Springer, Berlin (2003)
Rio, E.: Upper bounds for minimal distances in the central limit theorem. Annales de l’IHP Probabilités et Statistiques 45(3), 802–817 (2009)
Rio, E.: Asymptotic constants for minimal distance in the central limit theorem. Electron. Commun. Probab. 16, 96–103 (2011)
Sazonov, V.V.: On the multi-dimensional central limit theorem. Sankhyā Indian J. Stat. Ser. A 30(2), 181–204 (1968)
Senatov, V.V.: Uniform estimates of the rate of convergence in the multi-dimensional central limit theorem. Theory Probab. Appl. 24(4), 745–759 (1980)
Talagrand, M.: Transportation cost for Gaussian and other product measures. Geom. Funct. Anal. 6(3), 587–600 (1996)
Valiant, G., Valiant, P.: Estimating the unseen: an \(n/\log (n)\)-sample estimator for entropy and support size, shown optimal via new CLTs. In: Proceedings of the Forty-Third Annual ACM Symposium on the Theory of Computing, pp. 685–694 (2011)
Valiant, G., Valiant, P.: A CLT and tight lower bounds for estimating entropy. http://www.eccc.uni-trier.de/report/2010/179/ (2010)
Vershik, A.M.: Long history of the Monge–Kantorovich transportation problem. Math. Intell. 35(4), 1–9 (2013)
Acknowledgements
We are indebted to Jian Ding for suggesting the use of Talagrand’s transportation inequality and Amir Dembo for pointing out a hole in a preliminary version of the main argument as well as many helpful comments on the exposition. We also thank Sourav Chatterjee for helpful discussions about related work. Finally, we thank the anonymous reviewers for many good suggestions and for pointing out several references.
Appendix
1.1 Proof of Proposition 1.2
Proof
Let \(\ell _n = \frac{\beta }{\sqrt{n}}\), and consider the lattice \(L = \ell _n \mathbb {Z}^d\). For any \(x \in \mathbb {R}^d\), let \(d_L(x)\) denote the minimum Euclidean distance from x to L. Note that \(S_n\) takes values in L. Thus, letting \(\rho \) denote the density of Z, we have
To estimate the right hand side, for any \(y \in L\), let \(Q_n(y)\) denote the cube of side length \(\ell _n\) centered at y (which is also the set of points in \(\mathbb {R}^d\) closer to y than to any other point in L). We find that
Next, let M be large enough so that
and let
Note that since \(\rho \) is positive and continuous, we have \(\lim _{n \rightarrow \infty } r_n = 1\).
Assume now that n is sufficiently large so that \(\ell _n < M\). Combining (10) with (9), we have for each \(y \in L \cap [-M,M]^d\) that
Summing over all such y yields
Multiplying both sides by \(\sqrt{n}\) and taking limits gives the result. \(\square \)
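As a numerical sanity check on the lattice construction (an illustration, not part of the original argument): for small spacing \(\ell \), the coordinates of a standard Gaussian point are nearly uniform modulo \(\ell \), so the expected squared distance to the lattice \(\ell \mathbb {Z}^d\) is approximately \(d \ell ^2 / 12\). The parameter values below are chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, ell, n_samples = 10, 0.05, 200_000

# Distance from a standard Gaussian point to the lattice ell * Z^d:
# each coordinate contributes the squared distance to the nearest
# multiple of ell.
z = rng.standard_normal((n_samples, d))
frac = np.remainder(z, ell)                # offset within a lattice cell
per_coord = np.minimum(frac, ell - frac)   # distance to nearest multiple
mean_sq = np.mean(np.sum(per_coord**2, axis=1))

# For small ell the offsets are nearly uniform on [0, ell], and a
# uniform offset contributes ell^2/12 on average, so
# E d_L(Z)^2 ~ d * ell^2 / 12.
print(mean_sq, d * ell**2 / 12)
```

The two printed values agree closely, matching the heuristic that each coordinate behaves like a uniform offset in its lattice cell.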
1.2 Proof of Proposition 1.4
Proof
We prove the result with \(C = 5\). Let \(A \subset \mathbb {R}^d\) be a given convex set. For a parameter \(\epsilon \) to be specified later, define
Ball [1] showed a \(4d^{1/4}\) upper bound for the Gaussian surface area of any convex set in \(\mathbb {R}^d\). Hence,
We may regard T as being coupled to Z so that \(\mathbf {E}\Vert T - Z\Vert ^2 = \mathcal {W}_2(T, Z)^2\). Then,
Similarly,
Thus,
and taking \(\epsilon = d^{-1/12} \mathcal {W}_2(T, Z)^{2/3}\) gives the result. \(\square \)
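The choice of \(\epsilon \) balances the two competing terms. Schematically (with constants simplified relative to the proof), one is minimizing a bound of the shape \(f(\epsilon ) = d^{1/4} \epsilon + \mathcal {W}_2(T, Z)^2 / \epsilon ^2\), whose stationary point is proportional to \(d^{-1/12} \mathcal {W}_2(T, Z)^{2/3}\). A quick numeric check of this calculus step, under that simplified form:

```python
import numpy as np

# Schematic bound f(eps) = d^{1/4} * eps + W^2 / eps^2 (constants
# simplified relative to the proof).  Setting f'(eps) = 0 gives the
# exact minimizer eps* = (2 W^2 / d^{1/4})^{1/3}, which is
# proportional to d^{-1/12} * W^{2/3}.
d, W = 100.0, 0.01
a, b = d**0.25, W**2

f = lambda eps: a * eps + b / eps**2
eps_star = (2 * b / a) ** (1 / 3)          # stationary point of f
eps_grid = np.linspace(0.5 * eps_star, 2 * eps_star, 1001)

print(eps_star, f(eps_star), f(eps_grid).min())
```

The value at `eps_star` is no larger than the minimum over the grid, confirming the scaling \(\epsilon \propto d^{-1/12} \mathcal {W}_2(T, Z)^{2/3}\) used above.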
1.3 Proof of Lemma 3.3
Proof
Let \(A'\) and \(B'\) be independent copies of A and B. Then,
Expanding yields
as desired. \(\square \)
1.4 Proof of Equation (2)
Proof
We proceed by induction on the dimension d, retracing the argument of [23], section 3. The base case \(d = 1\) is immediate from Theorem 3.1.
Assume now that the inequality holds in \(d - 1\) dimensions. For the inductive step, we can follow the same argument used to prove Theorem 3.1 (see [23], section 3). The argument proceeds by first comparing Y to another \(\mathbb {R}^d\)-valued random variable \({\hat{Y}}\) sharing the first \(d - 1\) coordinates of Y, but whose last coordinate is independently drawn from \(\mathcal {N}(0, \sigma _d)\).
Fix a \((d - 1)\)-dimensional vector \({\hat{x}}\), and let \(T_{{\hat{x}}}\) denote a random variable distributed as the last coordinate of Y conditioned on the first \(d - 1\) coordinates being equal to \({\hat{x}}\). Let \({\hat{\rho }}({\hat{x}}) = \int _{-\infty }^\infty \rho ({\hat{x}}, t) { dt}\). Then, the density of \(T_{{\hat{x}}}\) at t is given by
Noting that \(\frac{\rho ({\hat{x}}, t)}{{\hat{\rho }}({\hat{x}})}\) is the density of \(\mathcal {N}(0, \sigma _d)\) at t, the one-dimensional case of Theorem 3.1 implies
Since \(T_{{\hat{x}}}\) and \(\mathcal {N}(0, \sigma _d)\) have the same distributions as Y and \({\hat{Y}}\) conditioned on \({\hat{x}}\), we may integrate (11) over \({\hat{x}}\) to obtain
Next, define \(Y_{(d)}\) and \(Z_{(d)}\) to be the projections onto the first \(d - 1\) coordinates of Y and Z, respectively. Note that the coupling of Y to \({\hat{Y}}\) changes only the d-th coordinate. Furthermore, the d-th coordinates of \({\hat{Y}}\) and Z are both distributed as \(\mathcal {N}(0, \sigma _d)\), independently of the first \(d - 1\) coordinates. Thus, a coupling of \(Y_{(d)}\) to \(Z_{(d)}\) induces a coupling of \({\hat{Y}}\) to Z in which the last coordinate does not change. Consequently,
Now, recall that the density of \(Y_{(d)}\) at a point \({\hat{x}} \in \mathbb {R}^{d - 1}\) is \(f_{(d)}({\hat{x}}, 0) \cdot {\hat{\rho }}({\hat{x}})\), and so applying the inductive hypothesis to \(\mathcal {W}_2(Y_{(d)}, Z_{(d)})^2\) yields
Substituting into (12), we obtain
completing the induction. \(\square \)
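The one-dimensional comparisons underlying the induction can be illustrated numerically (again, an illustration rather than part of the proof): in one dimension the \(\mathcal {W}_2\)-optimal coupling is the monotone (quantile) coupling, realized empirically by sorting samples, and for centered Gaussians \(\mathcal {W}_2(\mathcal {N}(0, \sigma _1^2), \mathcal {N}(0, \sigma _2^2)) = |\sigma _1 - \sigma _2|\).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
sigma1, sigma2 = 1.0, 1.5

# In one dimension the W2-optimal coupling is monotone: pair the
# sorted samples (the empirical quantile coupling).
x = np.sort(sigma1 * rng.standard_normal(n))
y = np.sort(sigma2 * rng.standard_normal(n))
w2_emp = np.sqrt(np.mean((x - y) ** 2))

# For centered Gaussians, W2(N(0, s1^2), N(0, s2^2)) = |s1 - s2|.
print(w2_emp, abs(sigma1 - sigma2))
```

The empirical value matches \(|\sigma _1 - \sigma _2| = 0.5\) up to Monte Carlo error, consistent with the exact one-dimensional Gaussian formula.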
1.5 Proof of Lemma 4.2
Proof
Let \(C_k = (2 \pi )^{-\frac{k}{2}}\). We have
\(\square \)
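The constant \(C_k = (2 \pi )^{-k/2}\) is the usual Gaussian normalization: \(\int _{\mathbb {R}^k} e^{-\Vert x\Vert ^2/2} \, dx = (2\pi )^{k/2}\), so that \(C_k e^{-\Vert x\Vert ^2/2}\) integrates to 1. Since the integral factorizes over coordinates, a one-dimensional numeric check suffices:

```python
import numpy as np

# The Gaussian integral factorizes over coordinates, so it suffices
# to check one dimension: int exp(-x^2/2) dx = sqrt(2*pi).
x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]
integral = np.sum(np.exp(-x**2 / 2)) * dx  # Riemann sum; tails beyond
                                           # |x| = 10 are negligible
print(integral, np.sqrt(2 * np.pi))
```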
Cite this article
Zhai, A. A high-dimensional CLT in \(\mathcal {W}_2\) distance with near optimal convergence rate. Probab. Theory Relat. Fields 170, 821–845 (2018). https://doi.org/10.1007/s00440-017-0771-3