Abstract
Stein’s method has been widely used for probability approximations. However, in the multi-dimensional setting, most of the results are for multivariate normal approximation or for test functions with bounded second- or higher-order derivatives. For a class of multivariate limiting distributions, we use Bismut’s formula in Malliavin calculus to control the derivatives of the Stein equation solutions by the first derivative of the test function. Combined with Stein’s exchangeable pair approach, we obtain a general theorem for multivariate approximations with near optimal error bounds on the Wasserstein distance. We apply the theorem to the unadjusted Langevin algorithm.
Change history
11 July 2019
We write this note to correct [1, (6.9), (6.13), (7.1), (7.2)] because one term was missing in [1, (6.9)].
References
Albeverio, S., Bogachev, V., Röckner, M.: On uniqueness of invariant measures for finite-and infinite-dimensional diffusions. Commun. Pure Appl. Math. 52, 325–362 (1999)
Bentkus, V.: A Lyapunov type bound in \(\mathbb {R}^d\). Theory Probab. Appl. 49, 311–323 (2005)
Bobkov, S.G.: Berry–Esseen bounds and Edgeworth expansions in the central limit theorem for transport distances. Probab. Theory Related Fields 170, 229–262 (2018)
Bonis, T.: Rates in the central limit theorem and diffusion approximation via Stein’s method (2018). Preprint. arXiv:1506.06966
Braverman, A., Dai, J.G.: Stein’s method for steady-state diffusion approximations of \(M/Ph/n+M\) systems. Ann. Appl. Probab. 27, 550–581 (2017)
Cerrai, S.: Second Order PDE’s in Finite and Infinite Dimensions: A Probabilistic Approach. Lecture Notes in Mathematics, vol. 1762. Springer, Berlin (2001)
Chatterjee, S., Meckes, E.: Multivariate normal approximation using exchangeable pairs. ALEA Lat. Am. J. Probab. Math. Stat. 4, 257–283 (2008)
Chatterjee, S., Shao, Q.M.: Nonnormal approximation by Stein’s method of exchangeable pairs with application to the Curie–Weiss model. Ann. Appl. Probab. 21, 464–483 (2011)
Courtade, T.A., Fathi, M., Pananjady, A.: Existence of Stein kernels under a spectral gap, and discrepancy bound (2018). Preprint. arXiv:1703.07707
Dalalyan, A.S.: Theoretical guarantees for approximate sampling from smooth and log-concave densities. J. R. Stat. Soc. Ser. B. Stat. Methodol. 79, 651–676 (2017)
Da Prato, G., Goldys, B.: Elliptic operators on \(\mathbb {R}^d\) with unbounded coefficients. J. Differ. Equ. 172, 333–358 (2001)
Down, D., Meyn, S.P., Tweedie, R.L.: Exponential and uniform ergodicity of Markov processes. Ann. Probab. 23, 1671–1691 (1995)
Durmus, A., Moulines, E.: High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm (2018). Preprint. arXiv:1605.01559
Eberle, A.: Reflection couplings and contraction rates for diffusions. Probab. Theory Related Fields 166, 851–886 (2016)
Eldan, R., Mikulincer, D., Zhai, A.: The CLT in high dimensions: quantitative bounds via martingale embedding (2018). Preprint. arXiv:1806.09087
Elworthy, L., Li, X.M.: Formulae for the derivatives of heat semigroups. J. Funct. Anal. 125, 252–286 (1994)
Goldstein, L., Rinott, Y.: Multivariate normal approximation by Stein’s method and size bias couplings. J. Appl. Probab. 33, 1–17 (1996)
Gorham, J., Duncan, A.B., Vollmer, S.J., Mackey, L.: Measuring sample quality with diffusions (2017). Preprint. arXiv:1611.06972
Götze, F.: On the rate of convergence in the multivariate CLT. Ann. Probab. 19, 724–739 (1991)
Gurvich, I.: Diffusion models and steady-state approximations for exponentially ergodic Markovian queues. Ann. Appl. Probab. 24, 2527–2559 (2014)
Hairer, M., Mattingly, J.C.: Ergodicity of the 2D Navier–Stokes equations with degenerate stochastic forcing. Ann. Math. 164, 993–1032 (2006)
Kusuoka, S., Tudor, C.A.: Characterization of the convergence in total variation and extension of the fourth moment theorem to invariant measures of diffusions. Bernoulli 24, 1463–1496 (2018)
Ledoux, M., Nourdin, I., Peccati, G.: Stein’s method, logarithmic Sobolev and transport inequalities. Geom. Funct. Anal. 25, 256–306 (2015)
Mackey, L.: Private communications (2018)
Mackey, L., Gorham, J.: Multivariate Stein factors for a class of strongly log-concave distributions. Electron. Commun. Probab. 21, 1–14 (2016)
Norris, J.: Simplified Malliavin calculus. Séminaire de Probabilités, XX, 1984/85, 101–130, Lecture Notes in Mathematics. Springer, Berlin (1986)
Nourdin, I., Peccati, G.: Stein’s method on Wiener chaos. Probab. Theory Relat. Fields 145, 75–118 (2009)
Nourdin, I., Peccati, G.: Normal Approximations with Malliavin Calculus. From Stein’s Method to Universality. Cambridge Tracts in Mathematics, vol. 192. Cambridge University Press, Cambridge (2012)
Nourdin, I., Peccati, G., Réveillac, A.: Multivariate normal approximation using Stein’s method and Malliavin calculus. Ann. Inst. Henri Poincare Probab. Stat. 46, 45–58 (2010)
Partington, J.R.: Linear Operators and Linear Systems. London Mathematical Society Student Texts, vol. 60. Cambridge University Press, Cambridge (2004)
Reinert, G., Röllin, A.: Multivariate normal approximation with Stein’s method of exchangeable pairs under a general linearity condition. Ann. Probab. 37, 2150–2173 (2009)
Revuz, D., Yor, M.: Continuous Martingales and Brownian Motion. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 293, 3rd edn. Springer, Berlin (1999)
Rinott, Y., Rotar, V.: A multivariate CLT for local dependence with \(n^{-1/2} \log n\) rate and applications to multivariate graph related statistics. J. Multivar. Anal. 56, 333–350 (1996)
Roberts, G.O., Tweedie, R.L.: Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 4, 341–363 (1996)
Röllin, A.: A note on the exchangeability condition in Stein’s method. Stat. Probab. Lett. 78, 1800–1806 (2008)
Sakhanenko, A. I.: Estimates in an invariance principle. (Russian) Limit theorems of probability theory, 27–44, 175, Trudy Instituta Matematiki, 5, “Nauka” Sibirsk. Otdel., Novosibirsk. (1985)
Shao, Q.M., Zhang, Z.S.: Identifying the limiting distribution by a general approach of Stein’s method. Sci. China Math. 59, 2379–2392 (2016)
Stein, C.: A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In: Proceedings of Sixth Berkeley Symposium Mathematical Statistics and Probability, vol. 2, pp. 583–602. University of California Press. Berkeley, CA (1972)
Stein, C.: Approximate Computation of Expectations. Lecture Notes 7. Institute of Mathematical Statistics, Hayward, CA (1986)
Valiant, G., Valiant, P.: A CLT and tight lower bounds for estimating entropy. In: Electronic Colloquium on Computational Complexity. TR10-179 (2010)
Wang, F.Y., Xu, L., Zhang, X.: Gradient estimates for SDEs driven by multiplicative Levy noise. J. Funct. Anal. 269, 3195–3219 (2015)
Zhai, A.: A high-dimensional CLT in \(\mathcal {W}_2\) distance with near optimal convergence rate. Probab. Theory Related Fields 170, 821–845 (2018)
Acknowledgements
We thank Michel Ledoux for very helpful discussions. We also thank two anonymous referees for their valuable comments which have improved the manuscript considerably. Fang X. was partially supported by Hong Kong RGC ECS 24301617, a CUHK direct grant and a CUHK start-up grant. Shao Q. M. was partially supported by Hong Kong RGC GRF 14302515 and 14304917. Xu L. was partially supported by Macao S.A.R. (FDCT 038/2017/A1, FDCT 030/2016/A1, FDCT 025/2016/A1), NNSFC 11571390, University of Macau (MYRG 2016-00025-FST, MYRG 2018-00133-FST).
Appendices
Appendix A: On the ergodicity of SDE (2.2)
This appendix verifies the ergodicity of SDE (2.2) in detail. There are several ways to prove the ergodicity of SDE (2.2); here, we follow the approach of Eberle [14, Theorem 1 and Corollary 2] because it gives exponential convergence in Wasserstein distance. We verify the conditions of the theorem, adopting the notation of [14]. For any \(r>0\), define
Compared with the conditions in [14], SDE (2.2) has \(\sigma =\sqrt{2} I_d\) and \(G=\frac{1}{2} I_d\) and the associated intrinsic metric is \(\frac{1}{\sqrt{2}} |\cdot |\) with \(|\cdot |\) being the Euclidean distance. By (2.3), we have
which implies that
Therefore, we have \(\kappa (r)>0\) for \(r>0\) and thus \(\int _0^1 r \kappa (r)^{-} \mathrm {d}r=0\). Define
It is easy to check that \(R_0=0\) and \(R_1 \in (0,\infty )\).
As \(\kappa (r)>0\) for all \(r>0\), we have \(\varphi (r)=\exp \left( -\frac{1}{4} \int _0^r s \kappa (s)^{-} \mathrm {d}s\right) =1\) and thus \(\Phi (r)=\int _0^r \varphi (s)\mathrm {d}s=r\). Moreover, we have \(\alpha =1\) and
Applying Corollary 2 in [14], we have
where \(\mathcal L(X_t^x)\) denotes the probability distribution of \(X_t^x\). Note that the convergence rate \(c>0\) only depends on \(\theta _0, \theta _1\) and \(\theta _2\).
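The exponential contraction in Wasserstein distance can be illustrated numerically with a much simpler device than the reflection coupling of [14]. The following is a minimal sketch, assuming a hypothetical drift \(g(x)=-x\) (not taken from the source) and an Euler discretization of an SDE with \(\sigma =\sqrt{2} I_d\). It uses a synchronous coupling: two chains started at different points are driven by the same noise, so that the mean of \(|X_k-Y_k|\) bounds the Wasserstein distance between their laws from above. The names `euler_step`, `coupled_distances` and `linear_drift` are illustrative only.

```python
import math
import random

def euler_step(x, drift, step, noise):
    """One Euler step of dX = drift(X) dt + sqrt(2) dB with a given Gaussian increment."""
    return [xi + step * gi + math.sqrt(2.0 * step) * zi
            for xi, gi, zi in zip(x, drift(x), noise)]

def coupled_distances(x0, y0, drift, step, n_steps, rng):
    """Run two chains driven by the same noise and record |X_k - Y_k| at every step."""
    x, y = list(x0), list(y0)
    dists = [math.dist(x, y)]
    for _ in range(n_steps):
        noise = [rng.gauss(0.0, 1.0) for _ in x]
        x = euler_step(x, drift, step, noise)
        y = euler_step(y, drift, step, noise)
        dists.append(math.dist(x, y))
    return dists

def linear_drift(x):
    """Hypothetical drift g(x) = -x (Ornstein-Uhlenbeck); an assumption, not the source's g."""
    return [-xi for xi in x]

step, n_steps = 0.01, 500
dists = coupled_distances([3.0, -2.0], [-1.0, 4.0], linear_drift, step, n_steps,
                          random.Random(0))
```

For this linear drift the noise cancels in the difference, so `dists[k]` equals `dists[0] * (1 - step) ** k` up to rounding, which is a discrete analogue of the decay \(e^{-ct}\). For a general drift satisfying only a one-sided condition outside a ball, synchronous coupling need not contract, which is one reason to use the coupling of [14] instead.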
From (A.1), we see that \(\mathcal L(X_t^x) \rightarrow \mu \) weakly, in the sense that for any bounded continuous function \(f: \mathbb {R}^d \rightarrow \mathbb {R}\), we have
which immediately implies
Appendix B: Multivariate normal approximation
In this appendix, we prove the results stated in Remark 2.9 with regard to multivariate normal approximation for sums of independent, bounded random vectors.
Theorem B.1
Let \(W=\frac{1}{\sqrt{n}}\sum _{i=1}^n X_i\) where \(X_1,\ldots , X_n\) are independent d-dimensional random vectors such that \(\mathbb {E}X_i=0\), \(|X_i|\le \beta \) and \(\mathbb {E}W W^\mathrm{T}=I_d\). Then we have
and
where C is an absolute constant and Z has the standard d-dimensional normal distribution.
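To make the hypotheses of Theorem B.1 concrete, here is a minimal sketch of one family of summands that satisfies them. The distribution is a hypothetical example chosen for illustration, not one used in the source: each \(X_i\) equals \(\pm \sqrt{d}\) times a uniformly chosen coordinate vector, so that \(\mathbb {E}X_i=0\), \(|X_i|=\sqrt{d}\) and \(\mathbb {E}X_iX_i^\mathrm{T}=I_d\), hence \(\mathbb {E}W W^\mathrm{T}=I_d\) with \(\beta =\sqrt{d}\).

```python
import math
import random

def sample_x(d, rng):
    """Draw X = sqrt(d) * s * e_J with J uniform on {0, ..., d-1} and s = +/-1.

    Hypothetical example distribution: E X = 0, |X| = sqrt(d), E X X^T = I_d.
    """
    x = [0.0] * d
    j = rng.randrange(d)
    x[j] = math.sqrt(d) * rng.choice((-1.0, 1.0))
    return x

def normalized_sum(xs):
    """Return W = n^{-1/2} * sum_i X_i for a list of d-dimensional vectors."""
    n, d = len(xs), len(xs[0])
    return [sum(x[k] for x in xs) / math.sqrt(n) for k in range(d)]

def exact_covariance(d):
    """Exact E X X^T for the distribution above, by enumerating its 2d outcomes."""
    cov = [[0.0] * d for _ in range(d)]
    for j in range(d):
        for s in (-1.0, 1.0):
            # Each outcome has probability 1/(2d); only entry (j, j) is nonzero.
            cov[j][j] += (math.sqrt(d) * s) ** 2 / (2 * d)
    return cov

rng = random.Random(0)
d, n = 3, 1000
xs = [sample_x(d, rng) for _ in range(n)]
w = normalized_sum(xs)
# Smallest beta with |X_i| <= beta for this sample; here it equals sqrt(d).
beta = max(math.sqrt(sum(c * c for c in x)) for x in xs)
```

In this example the bound \(|X_i|\le \beta \) holds with the smallest value of \(\beta \) compatible with \(\mathbb {E}W W^\mathrm{T}=I_d\), since the covariance constraint forces \(\beta \ge \sqrt{d}\).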
Proof
Note that by the same smoothing and limiting arguments as in the proof of Theorem 2.5, we only need to consider test functions \(h\in \text {Lip}(\mathbb {R}^d, 1)\), which are smooth and have bounded derivatives of all orders. This is assumed throughout the proof so that the integration, differentiation, and their interchange are legitimate.
For multivariate normal approximation, the Stein equation (3.1) simplifies to
An appropriate solution to (B.3) is known to be
where \(*\) denotes the convolution and \(\phi _r(x)=(2\pi r^2)^{-d/2}e^{-|x|^2/(2r^2)}\). From (B.3), we have
with \(f:=f_h\) in (B.4) (we omit the dependence of f on h for notational convenience).
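For orientation, the Stein equation for the standard d-dimensional normal distribution, its usual solution via the Ornstein–Uhlenbeck semigroup, and the resulting identity can be written as follows. This is the standard form consistent with the notation \(\phi _r\) and \(h_s\) used in this proof; matching these three lines to (B.3), (B.4) and the identity that follows them is an assumption, as are the sign conventions.

```latex
\begin{align*}
\Delta f(w) - \langle w, \nabla f(w)\rangle &= h(w) - \mathbb{E} h(Z), \\
f_h(x) &= -\int_0^\infty \Big\{ \big(h * \phi_{\sqrt{1-e^{-2s}}}\big)(e^{-s}x) - \mathbb{E} h(Z) \Big\}\, \mathrm{d}s, \\
\mathbb{E} h(W) - \mathbb{E} h(Z) &= \mathbb{E}\big[ \Delta f(W) - \langle W, \nabla f(W)\rangle \big].
\end{align*}
```

Here \((h * \phi _{\sqrt{1-e^{-2s}}})(e^{-s}x)=\mathbb {E}h(e^{-s}x+\sqrt{1-e^{-2s}}Z)\), so the integrand is the Ornstein–Uhlenbeck semigroup applied to h, centered at its limit.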
Let C be a constant that may differ in different expressions. Denote
Let \(\{X_1',\ldots , X_n'\}\) be an independent copy of \(\{X_1,\ldots , X_n\}\). Let I be uniformly chosen from \(\{1,\ldots , n\}\) and be independent of \(\{X_1,\ldots , X_n, X_1',\ldots , X_n'\}\). Define
Then W and \(W'\) have the same distribution. Let
We have, by the independence assumption and the facts that \(\mathbb {E}X_i=0\) and \(\mathbb {E}W W^\mathrm{T}=I_d\),
and
Therefore, (2.7) and (2.8) are satisfied with
Note that this is Example 1 below Assumption 2.1 with \(\lambda _1=\cdots =\lambda _d=1\). By the boundedness condition,
We also have
As \(\beta ^2\ge \sum _{i=1}^n \mathbb {E}[\frac{|X_i|^2}{n}]=d\), we have \(\beta \ge \sqrt{d}\). Using these facts and assuming that \(\beta \le \sqrt{n}\) [otherwise (B.1) is trivial], in applying (2.10), we have
and
This proves (B.1).
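The exchangeable pair used above can be sketched in a few lines. The code below is a minimal illustration, assuming that \(W'\) is obtained from W by replacing \(X_I\) with \(X_I'\), as the construction describes; the function names and the uniform test data are hypothetical. It also computes the average of \(-X_i/\sqrt{n}\) over a uniform index, which is the conditional mean of \(W'-W\) given \(X_1,\ldots , X_n\) when \(\mathbb {E}X_i'=0\), and equals \(-W/n\). This is presumably the linearity condition that (2.7) reduces to in this setting.

```python
import math
import random

def exchangeable_pair(xs, xs_prime, index):
    """Return (W, W') where W' replaces X_index by X'_index in the normalized sum."""
    n, d = len(xs), len(xs[0])
    w = [sum(x[k] for x in xs) / math.sqrt(n) for k in range(d)]
    w_prime = [w[k] + (xs_prime[index][k] - xs[index][k]) / math.sqrt(n)
               for k in range(d)]
    return w, w_prime

def mean_increment_over_index(xs):
    """Average of -X_i / sqrt(n) over a uniform index i.

    This is E[W' - W | X_1, ..., X_n] when E X'_i = 0, and it equals -W / n.
    """
    n, d = len(xs), len(xs[0])
    return [sum(-x[k] / math.sqrt(n) for x in xs) / n for k in range(d)]

rng = random.Random(1)
n, d = 50, 2
# Hypothetical test data: coordinates uniform on [-1, 1], so the mean is zero.
xs = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
xs_prime = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
index = rng.randrange(n)
w, w_prime = exchangeable_pair(xs, xs_prime, index)
drift = mean_increment_over_index(xs)
```

Comparing `drift` with `[-wk / n for wk in w]` confirms the identity numerically for any input, since it is an algebraic consequence of the construction rather than a statistical statement.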
To prove (B.2), we modify the argument from (3.7) by using the explicit expression of f in (B.4). With \(\delta _i=(X_i'-X_i)/\sqrt{n}\) and \(h_s(x):=h(e^{-s}x)\), we have
where \(\nabla _{\delta _i}^3:=\nabla _{\delta _i}\nabla _{\delta _i}\nabla _{\delta _i}\) (cf. Sect. 2.1). We separate the integration over s into \(\int _0^{\epsilon ^2}\) and \(\int _{\epsilon ^2}^\infty \) with an \(\epsilon \) to be chosen later. For the part \(\int _0^{\epsilon ^2}\), exactly following [40, pp. 18–19], we have
where we used \(\sum _{i=1}^n \mathbb {E}|\delta _i|^2=2d\). The part \(\int _{\epsilon ^2}^\infty \) is treated differently. Using the interchangeability of convolution and differentiation, we have
Let \(\{\hat{X}_1,\ldots , \hat{X}_n\}\) be another independent copy of \(\{X_1,\ldots , X_n\}\) that is also independent of \(\{X_1',\ldots , X_n'\}\), and let \(\hat{W}_i=W-\frac{X_i}{\sqrt{n}}+\frac{\hat{X}_i}{\sqrt{n}}\). By this construction, for each i, \(\hat{W}_i\) has the same distribution as W and is independent of \(\{X_i, X_i'\}\). Let \(\hat{Z}\) be an independent standard Gaussian vector. Rewriting
the term inside the absolute value in (B.8) is separated into three terms as follows:
We bound their absolute values separately. From \(h\in \text {Lip}(\mathbb {R}^d, 1)\), \(h_s(x)=h(e^{-s}x)\) and \(|X_i|\le \beta \), we have
Moreover,
Therefore,
where we used \(|\delta _i|\le 2\beta /\sqrt{n}\) and \(\sum _{i=1}^n\mathbb {E}|\delta _i|^2=2d\).
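The identity \(\sum _{i=1}^n\mathbb {E}|\delta _i|^2=2d\) follows directly from the assumptions of Theorem B.1. As a short worked derivation, using only that \(X_i'\) is an independent copy of \(X_i\), that \(\mathbb {E}X_i=0\), and that \(\mathbb {E}W W^\mathrm{T}=I_d\):

```latex
\begin{align*}
\mathbb{E}|\delta_i|^2
  &= \frac{1}{n}\,\mathbb{E}|X_i' - X_i|^2
   = \frac{1}{n}\Big( \mathbb{E}|X_i'|^2 + \mathbb{E}|X_i|^2
     - 2\,\langle \mathbb{E}X_i', \mathbb{E}X_i\rangle \Big)
   = \frac{2}{n}\,\mathbb{E}|X_i|^2, \\
\sum_{i=1}^n \mathbb{E}|\delta_i|^2
  &= 2 \sum_{i=1}^n \frac{\mathbb{E}|X_i|^2}{n}
   = 2\,\mathbb{E}|W|^2
   = 2\,\mathrm{tr}\big(\mathbb{E} W W^{\mathrm{T}}\big)
   = 2d.
\end{align*}
```

The step \(\sum _{i=1}^n \mathbb {E}|X_i|^2/n=\mathbb {E}|W|^2\) uses independence and \(\mathbb {E}X_i=0\) to remove the cross terms. The bound \(|\delta _i|\le 2\beta /\sqrt{n}\) is the triangle inequality combined with \(|X_i|\le \beta \) and \(|X_i'|\le \beta \).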
From the definition of \(\eta \) in (B.5) and the fact that \(\hat{W}_i\) has the same distribution as W, we have
Using independence and the same argument as for \(R_{31}\), we have
Now we bound \(R_{33}\). Using integration by parts, and combining two independent Gaussians into one, we have
From (B.6), (B.7) and the bounds on \(R_{31}, R_{32}\) and \(R_{33}\), we have
The theorem is proved by choosing \(\epsilon \) to be a large multiple of \(d\beta /\sqrt{n}\) and solving the recursive inequality for \(\eta \). \(\square \)
Fang, X., Shao, QM. & Xu, L. Multivariate approximations in Wasserstein distance by Stein’s method and Bismut’s formula. Probab. Theory Relat. Fields 174, 945–979 (2019). https://doi.org/10.1007/s00440-018-0874-5
DOI: https://doi.org/10.1007/s00440-018-0874-5
Keywords
- Bismut’s formula
- Langevin algorithm
- Malliavin calculus
- Multivariate approximation
- Rate of convergence
- Stein’s method
- Wasserstein distance