
Multivariate approximations in Wasserstein distance by Stein’s method and Bismut’s formula

Published in Probability Theory and Related Fields.

A Correction to this article was published on 11 July 2019


Abstract

Stein’s method has been widely used for probability approximations. However, in the multi-dimensional setting, most of the results are for multivariate normal approximation or for test functions with bounded second- or higher-order derivatives. For a class of multivariate limiting distributions, we use Bismut’s formula in Malliavin calculus to control the derivatives of the Stein equation solutions by the first derivative of the test function. Combined with Stein’s exchangeable pair approach, we obtain a general theorem for multivariate approximations with near optimal error bounds on the Wasserstein distance. We apply the theorem to the unadjusted Langevin algorithm.
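As context for the application mentioned above, here is a minimal sketch of the unadjusted Langevin algorithm for a standard Gaussian target on \(\mathbb {R}^d\) (where \(\nabla \log \pi (x)=-x\)); the step size and iteration counts are illustrative choices, not values from the paper.

```python
import numpy as np

def ula(grad_log_pi, x0, h, n_steps, rng):
    """Run n_steps of the unadjusted Langevin algorithm on a batch of chains.

    x0 has shape (m, d); each row is one independent chain.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        # Euler-Maruyama discretization of dX = grad log pi(X) dt + sqrt(2) dB
        x = x + h * grad_log_pi(x) + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(1)
m, d, h = 2000, 2, 0.05
# Standard Gaussian target: grad log pi(x) = -x
samples = ula(lambda x: -x, np.zeros((m, d)), h, 500, rng)
print(samples.mean(axis=0), samples.var(axis=0))  # near 0 and near 1 (up to O(h) bias)
```

The chain targets a slightly biased version of \(\pi \); quantifying that bias in Wasserstein distance is exactly the kind of statement the paper's main theorem delivers.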


Change history

  • 11 July 2019

    We write this note to correct [1, (6.9), (6.13), (7.1), (7.2)] because there was one term missed in [1, (6.9)].

References

  1. Albeverio, S., Bogachev, V., Röckner, M.: On uniqueness of invariant measures for finite- and infinite-dimensional diffusions. Commun. Pure Appl. Math. 52, 325–362 (1999)


  2. Bentkus, V.: A Lyapunov type bound in \({\bf R}^d\). Theory Probab. Appl. 49, 311–323 (2005)


  3. Bobkov, S.G.: Berry–Esseen bounds and Edgeworth expansions in the central limit theorem for transport distances. Probab. Theory Related Fields 170, 229–262 (2018)


  4. Bonis, T.: Rates in the central limit theorem and diffusion approximation via Stein’s method (2018). Preprint. arXiv:1506.06966

  5. Braverman, A., Dai, J.G.: Stein’s method for steady-state diffusion approximations of \(M/Ph/n+M\) systems. Ann. Appl. Probab. 27, 550–581 (2017)


  6. Cerrai, S.: Second Order PDE’s in Finite and Infinite Dimensions: A Probabilistic Approach. Lecture Notes in Mathematics, vol. 1762. Springer, Berlin (2001)


  7. Chatterjee, S., Meckes, E.: Multivariate normal approximation using exchangeable pairs. ALEA Lat. Am. J. Probab. Math. Stat. 4, 257–283 (2008)


  8. Chatterjee, S., Shao, Q.M.: Nonnormal approximation by Stein’s method of exchangeable pairs with application to the Curie–Weiss model. Ann. Appl. Probab. 21, 464–483 (2011)


  9. Courtade, T.A., Fathi, M., Pananjady, A.: Existence of Stein kernels under a spectral gap, and discrepancy bounds (2018). Preprint. arXiv:1703.07707

  10. Dalalyan, A.S.: Theoretical guarantees for approximate sampling from smooth and log-concave densities. J. R. Stat. Soc. Ser. B. Stat. Methodol. 79, 651–676 (2017)


  11. Da Prato, G., Goldys, B.: Elliptic operators on \({\bf R}^d\) with unbounded coefficients. J. Differ. Equ. 172, 333–358 (2001)


  12. Down, D., Meyn, S.P., Tweedie, R.L.: Exponential and uniform ergodicity of Markov processes. Ann. Probab. 23, 1671–1691 (1995)


  13. Durmus, A., Moulines, E.: High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm (2018). Preprint. arXiv:1605.01559

  14. Eberle, A.: Reflection couplings and contraction rates for diffusions. Probab. Theory Related Fields 166, 851–886 (2016)


  15. Eldan, R., Mikulincer, D., Zhai, A.: The CLT in high dimensions: quantitative bounds via martingale embedding (2018). Preprint. arXiv:1806.09087

  16. Elworthy, K.D., Li, X.M.: Formulae for the derivatives of heat semigroups. J. Funct. Anal. 125, 252–286 (1994)


  17. Goldstein, L., Rinott, Y.: Multivariate normal approximation by Stein’s method and size bias couplings. J. Appl. Probab. 33, 1–17 (1996)


  18. Gorham, J., Duncan, A. B., Vollmer, S. J., Mackey, L.: Measuring sample quality with diffusions (2017). Preprint. arXiv:1611.06972

  19. Götze, F.: On the rate of convergence in the multivariate CLT. Ann. Probab. 19, 724–739 (1991)


  20. Gurvich, I.: Diffusion models and steady-state approximations for exponentially ergodic Markovian queues. Ann. Appl. Probab. 24, 2527–2559 (2014)


  21. Hairer, M., Mattingly, J.C.: Ergodicity of the 2D Navier–Stokes equations with degenerate stochastic forcing. Ann. Math. 164, 993–1032 (2006)


  22. Kusuoka, S., Tudor, C.A.: Characterization of the convergence in total variation and extension of the fourth moment theorem to invariant measures of diffusions. Bernoulli 24, 1463–1496 (2018)


  23. Ledoux, M., Nourdin, I., Peccati, G.: Stein’s method, logarithmic Sobolev and transport inequalities. Geom. Funct. Anal. 25, 256–306 (2015)


  24. Mackey, L.: Private communications (2018)

  25. Mackey, L., Gorham, J.: Multivariate Stein factors for a class of strongly log-concave distributions. Electron. Commun. Probab. 21, 1–14 (2016)


  26. Norris, J.: Simplified Malliavin calculus. In: Séminaire de Probabilités XX, 1984/85. Lecture Notes in Mathematics, pp. 101–130. Springer, Berlin (1986)


  27. Nourdin, I., Peccati, G.: Stein’s method on Wiener chaos. Probab. Theory Relat. Fields 145, 75–118 (2009)


  28. Nourdin, I., Peccati, G.: Normal Approximations with Malliavin Calculus. From Stein’s Method to Universality. Cambridge Tracts in Mathematics, vol. 192. Cambridge University Press, Cambridge (2012)


  29. Nourdin, I., Peccati, G., Réveillac, A.: Multivariate normal approximation using Stein’s method and Malliavin calculus. Ann. Inst. Henri Poincare Probab. Stat. 46, 45–58 (2010)


  30. Partington, J.R.: Linear Operators and Linear Systems. London Mathematical Society Student Texts, vol. 60. Cambridge University Press, Cambridge (2004)


  31. Reinert, G., Röllin, A.: Multivariate normal approximation with Stein’s method of exchangeable pairs under a general linearity condition. Ann. Probab. 37, 2150–2173 (2009)


  32. Revuz, D., Yor, M.: Continuous Martingales and Brownian Motion. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 293, 3rd edn. Springer, Berlin (1999)


  33. Rinott, Y., Rotar, V.: A multivariate CLT for local dependence with \(n^{-1/2} \log n\) rate and applications to multivariate graph related statistics. J. Multivar. Anal. 56, 333–350 (1996)


  34. Roberts, G.O., Tweedie, R.L.: Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2, 341–363 (1996)


  35. Röllin, A.: A note on the exchangeability condition in Stein’s method. Stat. Probab. Lett. 78, 1800–1806 (2008)


  36. Sakhanenko, A.I.: Estimates in an invariance principle (Russian). In: Limit Theorems of Probability Theory, Trudy Inst. Mat., vol. 5, pp. 27–44, 175. Nauka, Novosibirsk (1985)

  37. Shao, Q.M., Zhang, Z.S.: Identifying the limiting distribution by a general approach of Stein’s method. Sci. China Math. 59, 2379–2392 (2016)


  38. Stein, C.: A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, vol. 2, pp. 583–602. University of California Press, Berkeley, CA (1972)

  39. Stein, C.: Approximate Computation of Expectations. Lecture Notes 7. Institute of Mathematical Statistics, Hayward, CA (1986)


  40. Valiant, G., Valiant, P.: A CLT and tight lower bounds for estimating entropy. In: Electronic Colloquium on Computational Complexity. TR10-179 (2010)

  41. Wang, F.Y., Xu, L., Zhang, X.: Gradient estimates for SDEs driven by multiplicative Lévy noise. J. Funct. Anal. 269, 3195–3219 (2015)


  42. Zhai, A.: A high-dimensional CLT in \(\cal{W}_2\) distance with near optimal convergence rate. Probab. Theory Related Fields 170, 821–845 (2018)



Acknowledgements

We thank Michel Ledoux for very helpful discussions. We also thank two anonymous referees for their valuable comments which have improved the manuscript considerably. Fang X. was partially supported by Hong Kong RGC ECS 24301617, a CUHK direct grant and a CUHK start-up grant. Shao Q. M. was partially supported by Hong Kong RGC GRF 14302515 and 14304917. Xu L. was partially supported by Macao S.A.R. (FDCT 038/2017/A1, FDCT 030/2016/A1, FDCT 025/2016/A1), NNSFC 11571390, University of Macau (MYRG 2016-00025-FST, MYRG 2018-00133-FST).

Author information


Corresponding author

Correspondence to Lihu Xu.

Appendices

Appendix A: On the ergodicity of SDE (2.2)

This section verifies the ergodicity of SDE (2.2). Among several possible approaches, we follow Eberle [14, Theorem 1 and Corollary 2] because it gives exponential convergence in Wasserstein distance. We verify the conditions of that theorem, adopting the notation of [14]. For any \(r>0\), define

$$\begin{aligned} \kappa (r) = \inf \left\{ -2 \frac{\langle x-y, g(x)-g(y)\rangle }{|x-y|^2}: \ x,y \in \mathbb {R}^d \ s.t. \ |x-y|=\sqrt{2} r\right\} . \end{aligned}$$

Compared with the conditions in [14], SDE (2.2) has \(\sigma =\sqrt{2} I_d\) and \(G=\frac{1}{2} I_d\) and the associated intrinsic metric is \(\frac{1}{\sqrt{2}} |\cdot |\) with \(|\cdot |\) being the Euclidean distance. By (2.3), we have

$$\begin{aligned} \langle x-y, g(x)-g(y)\rangle&= \int _0^1 \langle x-y, \nabla _{x-y}g(sx+(1-s)y)\rangle \mathrm {d}s \\&\le \ -\theta _0\int _0^1 \left( 1+\theta _1 |sx+(1-s)y|^{\theta _2}\right) |x-y|^2 \mathrm {d}s, \end{aligned}$$

which implies that

$$\begin{aligned} \kappa (r) \ \ge \ \inf \left\{ 2\theta _0 \int _0^1 \left( 1+\theta _1 |sx+(1-s)y|^{\theta _2}\right) \mathrm {d}s: \ x,y \in \mathbb {R}^d \ s.t. \ |x-y|=\sqrt{2} r\right\} . \end{aligned}$$

Therefore, we have \(\kappa (r)>0\) for \(r>0\) and thus \(\int _0^1 r \kappa (r)^{-} \mathrm {d}r=0\). Define

$$\begin{aligned} R_0= & {} \inf \{R \ge 0: \ \kappa (r) \ge 0 \ \ \forall \ r \ge R\},\\ R_1= & {} \inf \{R \ge R_0: \ \kappa (r) R(R-R_0)>8 \ \ \forall \ r \ge R\}. \end{aligned}$$

It is easy to check that \(R_0=0\) and \(R_1 \in (0,\infty )\).

As \(\kappa (r)>0\) for all \(r>0\), we have \(\varphi (r)=\exp \left( -\frac{1}{4} \int _0^r s \kappa (s)^{-} \mathrm {d}s\right) =1\) and thus \(\Phi (r)=\int _0^r \varphi (s)\mathrm {d}s=r\). Moreover, we have \(\alpha =1\) and

$$\begin{aligned} c = \left( \alpha \int _0^{R_1} \Phi (s) \varphi (s)^{-1} \mathrm {d}s\right) ^{-1} = \frac{2}{R^2_1}. \end{aligned}$$

Applying Corollary 2 in [14], we have

$$\begin{aligned} d_{\mathcal W}(\mathcal L(X_t^x),\mu ) \ \le \ 2 e^{-ct} d_{\mathcal W}(\delta _x,\mu ), \ \ \ \ \forall \ t>0, \end{aligned}$$
(A.1)

where \(\mathcal L(X_t^x)\) denotes the probability distribution of \(X_t^x\). Note that the convergence rate \(c>0\) only depends on \(\theta _0, \theta _1\) and \(\theta _2\).
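As a numerical sanity check of (A.1), not part of the proof, the following sketch simulates the one-dimensional case with the illustrative drift \(g(x)=-x-x^3\), which satisfies (2.3) with \(\theta _0=\theta _1=1\) and \(\theta _2=2\) since \(g'(x)=-1-3x^2\le -(1+x^2)\). It approximates \(\mu \) by long-run samples and estimates the Wasserstein-1 distance via order statistics; all numerical parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
m, dt = 20_000, 0.01
g = lambda x: -x - x ** 3   # satisfies (2.3) with theta_0 = theta_1 = 1, theta_2 = 2

def evolve(x, t):
    # Euler-Maruyama scheme for dX = g(X) dt + sqrt(2) dB
    for _ in range(int(round(t / dt))):
        x = x + g(x) * dt + np.sqrt(2 * dt) * rng.standard_normal(m)
    return x

mu = evolve(np.zeros(m), 20.0)   # long-run samples as a proxy for mu

def w1(a, b):
    # empirical Wasserstein-1 distance in dimension one, via sorted samples
    return np.abs(np.sort(a) - np.sort(b)).mean()

x = np.full(m, 3.0)              # all chains start from the point x = 3
dists = []
for _ in range(3):
    x = evolve(x, 0.5)
    dists.append(w1(x, mu))
print(dists)                     # roughly geometric decay, as (A.1) predicts
```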

From (A.1), it follows that \(\mathcal L(X_t^x) \rightarrow \mu \) weakly, in the sense that for any bounded continuous function \(f: \mathbb {R}^d \rightarrow \mathbb {R}\), we have

$$\begin{aligned} \lim _{t\rightarrow \infty } \mathbb {E}f(X_t^x) = \mu (f), \end{aligned}$$

which immediately implies

$$\begin{aligned} \lim _{t\rightarrow \infty } \frac{1}{t} \int _0^t \mathbb {E}f(X_s^x) \mathrm {d}s = \mu (f). \end{aligned}$$

Appendix B: Multivariate normal approximation

In this appendix, we prove the results stated in Remark 2.9 with regard to multivariate normal approximation for sums of independent, bounded random vectors.

Theorem B.1

Let \(W=\frac{1}{\sqrt{n}}\sum _{i=1}^n X_i\) where \(X_1,\ldots , X_n\) are independent d-dimensional random vectors such that \(\mathbb {E}X_i=0\), \(|X_i|\le \beta \) and \(\mathbb {E}W W^\mathrm{T}=I_d\). Then we have

$$\begin{aligned} d_{\mathcal W}(\mathcal {L}(W), \mathcal {L}(Z))\ \le \frac{C d \beta }{\sqrt{n}}(1+\log n) \end{aligned}$$
(B.1)

and

$$\begin{aligned} d_{\mathcal W}(\mathcal {L}(W), \mathcal {L}(Z))\ \le \ \frac{Cd^2 \beta }{\sqrt{n}}, \end{aligned}$$
(B.2)

where C is an absolute constant and Z has the standard d-dimensional normal distribution.

Proof

Note that by the same smoothing and limiting arguments as in the proof of Theorem 2.5, we only need to consider test functions \(h\in \text {Lip}(\mathbb {R}^d, 1)\) that are smooth and have bounded derivatives of all orders. We assume this throughout the proof, so that the interchanges of integration and differentiation below are legitimate.

For multivariate normal approximation, the Stein equation (3.1) simplifies to

$$\begin{aligned} \Delta f(w)-\langle w, \nabla f(w) \rangle = h(w)-\mathbb {E}h(Z) . \end{aligned}$$
(B.3)

An appropriate solution to (B.3) is known to be

$$\begin{aligned} f_h(x) = -\int _0^{\infty } \{h*\phi _{\sqrt{1-e^{-2s}}}(e^{-s }x) -\mathbb {E}h(Z)\}ds, \end{aligned}$$
(B.4)

where \(*\) denotes the convolution and \(\phi _r(x)=(2\pi r^2)^{-d/2}e^{-|x|^2/2r^2}\). From (B.3), we have

$$\begin{aligned} d_{\mathcal W}(\mathcal {L}(W), \mathcal {L}(Z)) = \sup _{h\in \text {Lip}(\mathbb {R}^d, 1)}|\mathbb {E}W\cdot \nabla f(W)- \mathbb {E}\Delta f(W)| \end{aligned}$$

with \(f:=f_h\) in (B.4) (we omit the dependence of f on h for notational convenience).
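That (B.4) indeed solves (B.3) is a standard fact about the Ornstein–Uhlenbeck semigroup, recorded here for convenience: writing \(P_s h(x) := (h*\phi _{\sqrt{1-e^{-2s}}})(e^{-s}x) = \mathbb {E}h(e^{-s}x+\sqrt{1-e^{-2s}}\,Z)\), one has \(\frac{\partial }{\partial s}P_s h = \Delta P_s h - \langle x, \nabla P_s h\rangle \), \(P_0 h=h\) and \(P_s h(x)\rightarrow \mathbb {E}h(Z)\) as \(s\rightarrow \infty \). Hence, for \(h\) smooth with bounded derivatives (so that differentiation under the integral is justified),

$$\begin{aligned} \Delta f_h(x)-\langle x, \nabla f_h(x)\rangle = -\int _0^{\infty } \frac{\partial }{\partial s} P_s h(x)\, ds = P_0 h(x) - \lim _{s\rightarrow \infty } P_s h(x) = h(x)-\mathbb {E}h(Z). \end{aligned}$$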

Let C be a constant that may differ in different expressions. Denote

$$\begin{aligned} \eta : = d_{\mathcal W}(\mathcal {L}(W), \mathcal {L}(Z)). \end{aligned}$$
(B.5)

Let \(\{X_1',\ldots , X_n'\}\) be an independent copy of \(\{X_1,\ldots , X_n\}\). Let I be uniformly chosen from \(\{1,\ldots , n\}\) and be independent of \(\{X_1,\ldots , X_n, X_1',\ldots , X_n'\}\). Define

$$\begin{aligned} W' = W-\frac{X_I}{\sqrt{n}}+\frac{X_I'}{\sqrt{n}}. \end{aligned}$$

Then W and \(W'\) have the same distribution. Let

$$\begin{aligned} \delta = W'-W = \frac{X_I'}{\sqrt{n}}-\frac{X_I}{\sqrt{n}}. \end{aligned}$$

We have, by the independence assumption and the facts that \(\mathbb {E}X_i=0\) and \(\mathbb {E}W W^\mathrm{T}=I_d\),

$$\begin{aligned} \mathbb {E}[\delta |W] = \frac{1}{n}\sum _{i=1}^n \mathbb {E}[\frac{X_i'}{\sqrt{n}}-\frac{X_i}{\sqrt{n}}|W] = -\frac{1}{n}W \end{aligned}$$

and

$$\begin{aligned} \mathbb {E}[\delta \delta ^\mathrm{T}|W] = \frac{2}{n}I_d +\frac{1}{n}\{\mathbb {E}[\frac{1}{n}\sum _{i=1}^n X_i X_i^\mathrm{T}|W]-I_d\}. \end{aligned}$$

Therefore, (2.7) and (2.8) are satisfied with

$$\begin{aligned} \lambda = \frac{1}{n},\quad g(W)=-W, \quad R_1=0,\quad R_2 = \frac{1}{2}\left\{ \mathbb {E}[\frac{1}{n}\sum _{i=1}^n X_i X_i^\mathrm{T}|W]-I_d\right\} . \end{aligned}$$

Note that this is Example 1 below Assumption 2.1 with \(\lambda _1=\cdots =\lambda _d=1\). By the boundedness condition,

$$\begin{aligned} |\delta |\le \frac{2\beta }{\sqrt{n}}. \end{aligned}$$

We also have

$$\begin{aligned} \mathbb {E}[|\delta |^2]=\frac{2}{n}\sum _{i=1}^n \mathbb {E}[\frac{|X_i|^2}{n}]=\frac{2d}{n}. \end{aligned}$$
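These moment identities are easy to check numerically. The following sketch is not part of the proof; Rademacher coordinates are an illustrative choice satisfying \(\mathbb {E}X_i=0\), \(\mathbb {E}WW^\mathrm{T}=I_d\) and \(|X_i|=\sqrt{d}=\beta \). It verifies \(\mathbb {E}|\delta |^2=2d/n\) and the consequence \(\mathbb {E}\langle \delta , W\rangle =-d/n\) of \(\mathbb {E}[\delta |W]=-W/n\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, reps = 50, 3, 20_000

# X_1,...,X_n with iid Rademacher coordinates, and an independent copy X'.
X = rng.choice([-1.0, 1.0], size=(reps, n, d))
Xp = rng.choice([-1.0, 1.0], size=(reps, n, d))
I = rng.integers(n, size=reps)             # uniform index, independent of X, X'
r = np.arange(reps)

W = X.sum(axis=1) / np.sqrt(n)             # W = n^{-1/2} sum_i X_i
delta = (Xp[r, I] - X[r, I]) / np.sqrt(n)  # delta = W' - W

mean_sq = (delta ** 2).sum(axis=1).mean()      # estimates E|delta|^2 = 2d/n
mean_inner = (delta * W).sum(axis=1).mean()    # estimates E<delta, W> = -d/n
print(mean_sq, 2 * d / n)
print(mean_inner, -d / n)
```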

As \(\beta ^2\ge \sum _{i=1}^n \mathbb {E}[\frac{|X_i|^2}{n}]=d\), we have \(\beta \ge \sqrt{d}\). Using these facts and assuming that \(\beta \le \sqrt{n}\) [otherwise (B.1) is trivial], we apply (2.10) and obtain

$$\begin{aligned}&\mathbb {E}|R_1| = 0,\nonumber \\&\quad \sqrt{d}\mathbb {E}[||R_2||_\mathrm{HS}] \le C\sqrt{d}\sqrt{\sum _{j,k=1}^d \mathrm{Var} [\frac{1}{n}\sum _{i=1}^n X_{ij} X_{ik}]}\nonumber \\&\qquad = C\sqrt{d}\sqrt{\sum _{j,k=1}^d\frac{1}{n^2}\sum _{i=1}^n \mathrm{Var} [ X_{ij} X_{ik}]}\le C\sqrt{d}\sqrt{\sum _{j,k=1}^d\frac{1}{n^2}\sum _{i=1}^n \mathbb {E}[ X_{ij}^2 X_{ik}^2]}\nonumber \\&\qquad = C\sqrt{d}\sqrt{\frac{1}{n^2}\sum _{i=1}^n \mathbb {E}[ |X_{i}|^4]}\le C\sqrt{d}\beta \sqrt{\frac{1}{n^2}\sum _{i=1}^n \mathbb {E}[ |X_{i}|^2]}=\frac{Cd\beta }{\sqrt{n}}, \end{aligned}$$
(B.6)

and

$$\begin{aligned}&\frac{1}{\lambda }\mathbb {E}\left[ |\delta |^3 \left( |\log |\delta || \vee 1\right) \right] \\&\quad \le Cn \frac{\beta }{\sqrt{n}} (1+\log n) \mathbb {E}[|\delta |^2]\le \frac{Cd\beta }{\sqrt{n}}(1+\log n). \end{aligned}$$

This proves (B.1).

To prove (B.2), we modify the argument from (3.7) by using the explicit expression of f in (B.4). With \(\delta _i=(X_i'-X_i)/\sqrt{n}\) and \(h_s(x):=h(e^{-s}x)\), we have

$$\begin{aligned}&\frac{1}{\lambda } \int _0^1 |\mathbb {E}[\langle \delta \delta ^\mathrm{T}, \nabla ^2 f(W+t\delta )-\nabla ^2 f(W) \rangle _{\text {HS}} ] |dt\\&\quad =\int _0^1 |\mathbb {E}\sum _{i=1}^n \int _0^\infty \int _0^1 t \nabla _{\delta _i}^3 (h_s*\phi _{\sqrt{e^{2s}-1}})(W+ut\delta _i) du ds | dt, \end{aligned}$$

where \(\nabla _{\delta _i}^3:=\nabla _{\delta _i}\nabla _{\delta _i}\nabla _{\delta _i}\) (cf. Sect. 2.1). We separate the integration over s into \(\int _0^{\epsilon ^2}\) and \(\int _{\epsilon ^2}^\infty \) with an \(\epsilon \) to be chosen later. For the part \(\int _0^{\epsilon ^2}\), exactly following [40, pp. 18–19], we have

$$\begin{aligned} C \mathbb {E}\sum _{i=1}^n \int _0^{\epsilon ^2} e^{-s} |\delta _i|^2 \frac{1}{\sqrt{e^{2s}-1}} ds \ \le \ C d \epsilon , \end{aligned}$$
(B.7)

where we used \(\sum _{i=1}^n \mathbb {E}|\delta _i|^2=2d\). The part \(\int _{\epsilon ^2}^\infty \) is treated differently. Using the interchangeability of convolution and differentiation, we have

$$\begin{aligned}&\left| \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _0^1 \nabla ^3_{\delta _i} (h_s * \phi _{\sqrt{e^{2s}-1}})(W+ut\delta _i)duds \right| \nonumber \\&\quad = \left| \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _0^1 \int _{\mathbb {R}^d} h_s(W+ut\delta _i-x) \nabla ^3_{\delta _i} \phi _{\sqrt{e^{2s}-1}} (x) dx du ds \right| . \end{aligned}$$
(B.8)

Let \(\{\hat{X}_1,\ldots , \hat{X}_n\}\) be another independent copy of \(\{X_1,\ldots , X_n\}\) and be independent of \(\{X_1',\ldots , X_n'\}\), and let \(\hat{W}_i=W-\frac{X_i}{\sqrt{n}}+\frac{\hat{X}_i}{\sqrt{n}}\). From this construction, for each i, \(\hat{W}_i\) has the same distribution as W and is independent of \(\{X_i, X_i'\}\). Let \(\hat{Z}\) be an independent standard Gaussian vector. Rewriting

$$\begin{aligned} h_s(W+ut\delta _i-x)&= [h_s(W+ut\delta _i-x)-h_s(\hat{W}_i-x)]\\&\quad +[h_s(\hat{W}_i-x)-h_s(\hat{Z}-x)]\\&\quad +[h_s(\hat{Z}-x)], \end{aligned}$$

the term inside the absolute value in (B.8) is separated into three terms as follows:

$$\begin{aligned} R_{31}= & {} \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _0^1 \int _{\mathbb {R}^d} [h_s(W+ut\delta _i-x)-h_s(\hat{W}_i-x)] \nabla ^3_{\delta _i} \phi _{\sqrt{e^{2s}-1}} (x) dx du ds,\\ R_{32}= & {} \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _0^1 \int _{\mathbb {R}^d} [h_s(\hat{W}_i-x)-h_s(\hat{Z}-x)] \nabla ^3_{\delta _i} \phi _{\sqrt{e^{2s}-1}} (x) dx du ds,\\ R_{33}= & {} \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _0^1 \int _{\mathbb {R}^d} h_s(\hat{Z}-x) \nabla ^3_{\delta _i} \phi _{\sqrt{e^{2s}-1}} (x) dx du ds. \end{aligned}$$

We bound their absolute values separately. From \(h\in \text {Lip}(\mathbb {R}^d, 1)\), \(h_s(x)=h(e^{-s}x)\) and \(|X_i|\le \beta \), we have

$$\begin{aligned} |h_s(W+ut\delta _i-x)-h_s(\hat{W}_i-x)|\ \le \ \frac{Ce^{-s}\beta }{\sqrt{n}}. \end{aligned}$$

Moreover,

$$\begin{aligned} \int _{\mathbb {R}^d} |\nabla ^3_{\delta _i} \phi _{\sqrt{e^{2s}-1}} (x) |dx \ \le \ C|\delta _i|^3\frac{1}{(e^{2s}-1)^{3/2}}. \end{aligned}$$

Therefore,

$$\begin{aligned} |R_{31}|\ \le \ C\sum _{i=1}^n \mathbb {E}\int _{\epsilon ^2}^\infty e^{-s} \frac{\beta |\delta _i|^3}{\sqrt{n}} \frac{1}{(e^{2s}-1)^{3/2}}ds\ \le \ C \frac{d\beta ^2}{\epsilon n}, \end{aligned}$$

where we used \(|\delta _i|\le 2\beta /\sqrt{n}\) and \(\sum _{i=1}^n\mathbb {E}|\delta _i|^2=2d\).

From the definition of \(\eta \) in (B.5) and the fact that \(\hat{W}_i\) has the same distribution as W, we have

$$\begin{aligned} |\mathbb {E}[h_s(\hat{W}_i-x)-h_s(\hat{Z}-x)]|\ \le \ e^{-s} \eta . \end{aligned}$$

Using independence and the same argument as for \(R_{31}\), we have

$$\begin{aligned} |R_{32}|\ \le \ \frac{Cd\beta \eta }{\epsilon \sqrt{n}}. \end{aligned}$$

Now we bound \(R_{33}\). Using integration by parts, and combining two independent Gaussians into one, we have

$$\begin{aligned} |R_{33}| =&\left| \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _0^1 \int _{\mathbb {R}^d} h_s(\hat{Z}-x) \nabla ^3_{\delta _i} \phi _{\sqrt{e^{2s}-1}} (x) dx du ds \right| \\ =&\left| \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _{\mathbb {R}^d} \nabla ^3_{\delta _i} h_s(\hat{Z}-x) \phi _{\sqrt{e^{2s}-1}} (x) dx ds \right| \\ =&\left| \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _{\mathbb {R}^d} \nabla ^3_{\delta _i} h_s(x) \phi _{e^{s}} (x) dx ds \right| \\ =&\left| \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _{\mathbb {R}^d} \nabla _{\delta _i} h_s(x) \nabla ^2_{\delta _i} \phi _{e^{s}} (x) dx ds \right| \\ \le&\ \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty |\delta _i| e^{-s} |\delta _i|^2 \frac{C}{e^{2s}} ds \ \le \ \frac{Cd\beta }{\sqrt{n}}. \end{aligned}$$

From (B.6), (B.7) and the bounds on \(R_{31}, R_{32}\) and \(R_{33}\), we have

$$\begin{aligned} \eta \ \le \ C\left( \frac{d \beta }{\sqrt{n}}+d\epsilon +\frac{d \beta ^2}{\epsilon n}+\frac{d\beta \eta }{\epsilon \sqrt{n}}+\frac{d\beta }{\sqrt{n}}\right) . \end{aligned}$$

The theorem is proved by choosing \(\epsilon \) to be a large multiple of \(d\beta /\sqrt{n}\) and solving the recursive inequality for \(\eta \). \(\square \)
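For the reader's convenience, the absorption step in the last sentence can be made explicit (the constant \(K\) below is introduced here for illustration): taking \(\epsilon = Kd\beta /\sqrt{n}\) in the previous display gives

$$\begin{aligned} \eta \ \le \ C\left( \frac{2d\beta }{\sqrt{n}}+\frac{Kd^2\beta }{\sqrt{n}}+\frac{\beta }{K\sqrt{n}}\right) +\frac{C}{K}\eta . \end{aligned}$$

Choosing \(K=2C\) makes the coefficient of \(\eta \) equal to \(1/2\); since \(\beta \ge \sqrt{d}\ge 1\), every term in the bracket is at most a constant multiple of \(d^2\beta /\sqrt{n}\), and rearranging yields (B.2).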


About this article


Cite this article

Fang, X., Shao, QM. & Xu, L. Multivariate approximations in Wasserstein distance by Stein’s method and Bismut’s formula. Probab. Theory Relat. Fields 174, 945–979 (2019). https://doi.org/10.1007/s00440-018-0874-5
