Multivariate approximations in Wasserstein distance by Stein’s method and Bismut’s formula

Fang, Xiao; Shao, Qi-Man; Xu, Lihu

doi:10.1007/s00440-018-0874-5

Published: 28 September 2018

Probability Theory and Related Fields, volume 174, pages 945–979 (2019)

Xiao Fang¹, Qi-Man Shao¹ & Lihu Xu²,³

A Correction to this article was published on 11 July 2019

This article has been updated

Abstract

Stein’s method has been widely used for probability approximations. However, in the multi-dimensional setting, most of the results are for multivariate normal approximation or for test functions with bounded second- or higher-order derivatives. For a class of multivariate limiting distributions, we use Bismut’s formula in Malliavin calculus to control the derivatives of the Stein equation solutions by the first derivative of the test function. Combined with Stein’s exchangeable pair approach, we obtain a general theorem for multivariate approximations with near optimal error bounds on the Wasserstein distance. We apply the theorem to the unadjusted Langevin algorithm.

Change history

11 July 2019
This note corrects (6.9), (6.13), (7.1) and (7.2) of the original article, because one term was missed in (6.9).

References

1. Albeverio, S., Bogachev, V., Röckner, M.: On uniqueness of invariant measures for finite- and infinite-dimensional diffusions. Commun. Pure Appl. Math. 52, 325–362 (1999)
2. Bentkus, V.: A Lyapunov type bound in ${\bf R}^d$. Theory Probab. Appl. 49, 311–323 (2005)
3. Bobkov, S.G.: Berry–Esseen bounds and Edgeworth expansions in the central limit theorem for transport distances. Probab. Theory Related Fields 170, 229–262 (2018)
4. Bonis, T.: Rates in the central limit theorem and diffusion approximation via Stein’s method (2018). Preprint. arXiv:1506.06966
5. Braverman, A., Dai, J.G.: Stein’s method for steady-state diffusion approximations of $M/Ph/n+M$ systems. Ann. Appl. Probab. 27, 550–581 (2017)
6. Cerrai, S.: Second Order PDE’s in Finite and Infinite Dimensions: A Probabilistic Approach. Lecture Notes in Mathematics, vol. 1762. Springer, Berlin (2001)
7. Chatterjee, S., Meckes, E.: Multivariate normal approximation using exchangeable pairs. ALEA Lat. Am. J. Probab. Math. Stat. 4, 257–283 (2008)
8. Chatterjee, S., Shao, Q.M.: Nonnormal approximation by Stein’s method of exchangeable pairs with application to the Curie–Weiss model. Ann. Appl. Probab. 21, 464–483 (2011)
9. Courtade, T.A., Fathi, M., Pananjady, A.: Existence of Stein kernels under a spectral gap, and discrepancy bound (2018). Preprint. arXiv:1703.07707
10. Dalalyan, A.S.: Theoretical guarantees for approximate sampling from smooth and log-concave densities. J. R. Stat. Soc. Ser. B. Stat. Methodol. 79, 651–676 (2017)
11. Da Prato, G., Goldys, B.: Elliptic operators on ${\bf R}^d$ with unbounded coefficients. J. Differ. Equ. 172, 333–358 (2001)
12. Down, D., Meyn, S.P., Tweedie, R.L.: Exponential and uniform ergodicity of Markov processes. Ann. Probab. 23, 1671–1691 (1995)
13. Durmus, A., Moulines, E.: High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm (2018). Preprint. arXiv:1605.01559
14. Eberle, A.: Reflection couplings and contraction rates for diffusions. Probab. Theory Related Fields 166, 851–886 (2016)
15. Eldan, R., Mikulincer, D., Zhai, A.: The CLT in high dimensions: quantitative bounds via martingale embedding (2018). Preprint. arXiv:1806.09087
16. Elworthy, L., Li, X.M.: Formulae for the derivatives of heat semigroups. J. Funct. Anal. 125, 252–286 (1994)
17. Goldstein, L., Rinott, Y.: Multivariate normal approximation by Stein’s method and size bias couplings. J. Appl. Probab. 33, 1–17 (1996)
18. Gorham, J., Duncan, A.B., Vollmer, S.J., Mackey, L.: Measuring sample quality with diffusions (2017). Preprint. arXiv:1611.06972
19. Götze, F.: On the rate of convergence in the multivariate CLT. Ann. Probab. 19, 724–739 (1991)
20. Gurvich, I.: Diffusion models and steady-state approximations for exponentially ergodic Markovian queues. Ann. Appl. Probab. 24, 2527–2559 (2014)
21. Hairer, M., Mattingly, J.C.: Ergodicity of the 2D Navier–Stokes equations with degenerate stochastic forcing. Ann. Math. 164, 993–1032 (2006)
22. Kusuoka, S., Tudor, C.A.: Characterization of the convergence in total variation and extension of the fourth moment theorem to invariant measures of diffusions. Bernoulli 24, 1463–1496 (2018)
23. Ledoux, M., Nourdin, I., Peccati, G.: Stein’s method, logarithmic Sobolev and transport inequalities. Geom. Funct. Anal. 25, 256–306 (2015)
24. Mackey, L.: Private communications (2018)
25. Mackey, L., Gorham, J.: Multivariate Stein factors for a class of strongly log-concave distributions. Electron. Commun. Probab. 21, 1–14 (2016)
26. Norris, J.: Simplified Malliavin calculus. Séminaire de Probabilités, XX, 1984/85, 101–130, Lecture Notes in Mathematics. Springer, Berlin (1986)
27. Nourdin, I., Peccati, G.: Stein’s method on Wiener chaos. Probab. Theory Relat. Fields 145, 75–118 (2009)
28. Nourdin, I., Peccati, G.: Normal Approximations with Malliavin Calculus. From Stein’s Method to Universality. Cambridge Tracts in Mathematics, vol. 192. Cambridge University Press, Cambridge (2012)
29. Nourdin, I., Peccati, G., Réveillac, A.: Multivariate normal approximation using Stein’s method and Malliavin calculus. Ann. Inst. Henri Poincare Probab. Stat. 46, 45–58 (2010)
30. Partington, J.R.: Linear Operators and Linear Systems. London Mathematical Society Student Texts, vol. 60. Cambridge University Press, Cambridge (2004)
31. Reinert, G., Röllin, A.: Multivariate normal approximation with Stein’s method of exchangeable pairs under a general linearity condition. Ann. Probab. 37, 2150–2173 (2009)
32. Revuz, D., Yor, M.: Continuous Martingales and Brownian Motion. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 293, 3rd edn. Springer, Berlin (1999)
33. Rinott, Y., Rotar, V.: A multivariate CLT for local dependence with $n^{-1/2} \log n$ rate and applications to multivariate graph related statistics. J. Multivar. Anal. 56, 333–350 (1996)
34. Roberts, G.O., Tweedie, R.L.: Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 4, 341–363 (1996)
35. Röllin, A.: A note on the exchangeability condition in Stein’s method. Stat. Probab. Lett. 78, 1800–1806 (2008)
36. Sakhanenko, A.I.: Estimates in an invariance principle. (Russian) Limit theorems of probability theory, 27–44, 175, Trudy Instituta Matematiki, 5, “Nauka” Sibirsk. Otdel., Novosibirsk (1985)
37. Shao, Q.M., Zhang, Z.S.: Identifying the limiting distribution by a general approach of Stein’s method. Sci. China Math. 59, 2379–2392 (2016)
38. Stein, C.: A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In: Proceedings of Sixth Berkeley Symposium Mathematical Statistics and Probability, vol. 2, pp. 583–602. University of California Press, Berkeley, CA (1972)
39. Stein, C.: Approximate Computation of Expectations. Lecture Notes 7. Institute of Mathematical Statistics, Hayward, CA (1986)
40. Valiant, G., Valiant, P.: A CLT and tight lower bounds for estimating entropy. In: Electronic Colloquium on Computational Complexity. TR10-179 (2010)
41. Wang, F.Y., Xu, L., Zhang, X.: Gradient estimates for SDEs driven by multiplicative Lévy noise. J. Funct. Anal. 269, 3195–3219 (2015)
42. Zhai, A.: A high-dimensional CLT in $\mathcal{W}_2$ distance with near optimal convergence rate. Probab. Theory Related Fields 170, 821–845 (2018)

Acknowledgements

We thank Michel Ledoux for very helpful discussions. We also thank two anonymous referees for their valuable comments which have improved the manuscript considerably. Fang X. was partially supported by Hong Kong RGC ECS 24301617, a CUHK direct grant and a CUHK start-up grant. Shao Q. M. was partially supported by Hong Kong RGC GRF 14302515 and 14304917. Xu L. was partially supported by Macao S.A.R. (FDCT 038/2017/A1, FDCT 030/2016/A1, FDCT 025/2016/A1), NNSFC 11571390, University of Macau (MYRG 2016-00025-FST, MYRG 2018-00133-FST).

Author information

Authors and Affiliations

Department of Statistics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Xiao Fang & Qi-Man Shao
Department of Mathematics, Faculty of Science and Technology, University of Macau, Av. Padre Tomás Pereira, Taipa, Macau, China
Lihu Xu
Zhuhai UM Science and Technology Research Institute, Zhuhai, China
Lihu Xu

Corresponding author

Correspondence to Lihu Xu.

Appendices

Appendix A: On the ergodicity of SDE (2.2)

This section provides the details of the verification of the ergodicity of SDE (2.2). There are several ways to prove the ergodicity of SDE (2.2); here, we follow the approach used by Eberle [14, Theorem 1 and Corollary 2] because it gives exponential convergence in Wasserstein distance. We verify the conditions in the theorem, adopting the notation of [14]. For any $r>0$, define

$$\begin{aligned} \kappa (r) = \inf \left\{ -2 \frac{\langle x-y, g(x)-g(y)\rangle }{|x-y|^2}: \ x,y \in \mathbb {R}^d \ s.t. \ |x-y|=\sqrt{2} r\right\} . \end{aligned}$$

Compared with the conditions in [14], SDE (2.2) has $\sigma =\sqrt{2} I_d$ and $G=\frac{1}{2} I_d$ and the associated intrinsic metric is $\frac{1}{\sqrt{2}} |\cdot |$ with $|\cdot |$ being the Euclidean distance. By (2.3), we have

$$\begin{aligned} \langle x-y, g(x)-g(y)\rangle&= \int _0^1 \langle x-y, \nabla _{x-y}g(sx+(1-s)y)\rangle \mathrm {d}s \\&\le \ -\theta _0\int _0^1 \left( 1+\theta _1 |sx+(1-s)y|^{\theta _2}\right) |x-y|^2 \mathrm {d}s, \end{aligned}$$

which implies that

$$\begin{aligned} \kappa (r) \ \ge \ \inf \left\{ 2\theta _0 \int _0^1 \left( 1+\theta _1 |sx+(1-s)y|^{\theta _2}\right) \mathrm {d}s: \ x,y \in \mathbb {R}^d \ s.t. \ |x-y|=\sqrt{2} r\right\} . \end{aligned}$$

Therefore, we have $\kappa (r)>0$ for $r>0$ and thus $\int _0^1 r \kappa (r)^{-} \mathrm {d}r=0$. Define

$$\begin{aligned} R_0= & {} \inf \{R \ge 0: \ \kappa (r) \ge 0 \ \ \forall \ r \ge R\},\\ R_1= & {} \inf \{R \ge R_0: \ \kappa (r) R(R-R_0)>8 \ \ \forall \ r \ge R\}. \end{aligned}$$

It is easy to check that $R_0=0$ and $R_1 \in (0,\infty )$.

As $\kappa (r)>0$ for all $r>0$, we have $\varphi (r)=\exp \left( -\frac{1}{4} \int _0^r s \kappa (s)^{-} \mathrm {d}s\right) =1$ and thus $\Phi (r)=\int _0^r \varphi (s)\mathrm {d}s=r$. Moreover, we have $\alpha =1$ and

$$\begin{aligned} c = \left( \alpha \int _0^{R_1} \Phi (s) \varphi (s)^{-1} \mathrm {d}s\right) ^{-1} = \frac{2}{R^2_1}. \end{aligned}$$
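As a quick check of the value of $c$, together with a rough lower bound on it, one can argue as follows. This is only a sketch: it assumes $\theta _0>0$ and $\theta _1\ge 0$, so that the lower bound on $\kappa (r)$ derived above gives $\kappa (r)\ge 2\theta _0$ for all $r>0$.

$$\begin{aligned} \int _0^{R_1} \Phi (s) \varphi (s)^{-1} \mathrm {d}s&= \int _0^{R_1} s \, \mathrm {d}s = \frac{R_1^2}{2}, \qquad \text {so that } c=\frac{2}{R_1^2},\\ \kappa (r) R(R-R_0)&= \kappa (r) R^2 \ \ge \ 2\theta _0 R^2 \ >\ 8 \quad \text {for all } r\ge R, \text { whenever } R>\frac{2}{\sqrt{\theta _0}}. \end{aligned}$$

Hence $R_1\le 2/\sqrt{\theta _0}$ and $c\ge \theta _0/2$ under these assumptions.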

Applying Corollary 2 in [14], we have

$$\begin{aligned} d_{\mathcal W}(\mathcal L(X_t^x),\mu ) \ \le \ 2 e^{-ct} d_{\mathcal W}(\delta _x,\mu ), \ \ \ \ \forall \ t>0, \end{aligned}$$

(A.1)

where $\mathcal L(X_t^x)$ denotes the probability distribution of $X_t^x$. Note that the convergence rate $c>0$ depends only on $\theta _0, \theta _1$ and $\theta _2$.

It follows from (A.1) that $\mathcal L(X_t^x) \rightarrow \mu $ weakly, that is, for any bounded continuous function $f: \mathbb {R}^d \rightarrow \mathbb {R}$, we have

$$\begin{aligned} \lim _{t\rightarrow \infty } \mathbb {E}f(X_t^x) = \mu (f), \end{aligned}$$

which immediately implies

$$\begin{aligned} \lim _{t\rightarrow \infty } \frac{1}{t} \int _0^t \mathbb {E}f(X_s^x) \mathrm {d}s = \mu (f). \end{aligned}$$

Appendix B: Multivariate normal approximation

In this appendix, we prove the results stated in Remark 2.9 with regard to multivariate normal approximation for sums of independent, bounded random vectors.

Theorem B.1

Let $W=\frac{1}{\sqrt{n}}\sum _{i=1}^n X_i$ where $X_1,\ldots , X_n$ are independent d-dimensional random vectors such that $\mathbb {E}X_i=0$, $|X_i|\le \beta $ and $\mathbb {E}W W^\mathrm{T}=I_d$. Then we have

$$\begin{aligned} d_{\mathcal W}(\mathcal {L}(W), \mathcal {L}(Z))\ \le \frac{C d \beta }{\sqrt{n}}(1+\log n) \end{aligned}$$

(B.1)

and

$$\begin{aligned} d_{\mathcal W}(\mathcal {L}(W), \mathcal {L}(Z))\ \le \ \frac{Cd^2 \beta }{\sqrt{n}}, \end{aligned}$$

(B.2)

where C is an absolute constant and Z has the standard d-dimensional normal distribution.

Proof

Note that by the same smoothing and limiting arguments as in the proof of Theorem 2.5, we only need to consider test functions $h\in \text {Lip}(\mathbb {R}^d, 1)$, which are smooth and have bounded derivatives of all orders. This is assumed throughout the proof so that the integration, differentiation, and their interchange are legitimate.

For multivariate normal approximation, the Stein equation (3.1) simplifies to

$$\begin{aligned} \Delta f(w)-\langle w, \nabla f(w) \rangle = h(w)-\mathbb {E}h(Z) . \end{aligned}$$

(B.3)

An appropriate solution to (B.3) is known to be

$$\begin{aligned} f_h(x) = -\int _0^{\infty } \{h*\phi _{\sqrt{1-e^{-2s}}}(e^{-s }x) -\mathbb {E}h(Z)\}ds, \end{aligned}$$

(B.4)

where $*$ denotes the convolution and $\phi _r(x)=(2\pi r^2)^{-d/2}e^{-|x|^2/2r^2}$. From (B.3), we have

$$\begin{aligned} d_{\mathcal W}(\mathcal {L}(W), \mathcal {L}(Z)) = \sup _{h\in \text {Lip}(\mathbb {R}^d, 1)}|\mathbb {E}W\cdot \nabla f(W)- \mathbb {E}\Delta f(W)| \end{aligned}$$

with $f:=f_h$ in (B.4) (we omit the dependence of f on h for notational convenience).
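To see why the integral in (B.4) converges, it may help to rewrite the convolution as a Gaussian expectation. The following sketch assumes only that $h$ is 1-Lipschitz and that $Z$ is a standard $d$-dimensional normal vector; it is a sanity check rather than part of the original argument.

$$\begin{aligned} h*\phi _{\sqrt{1-e^{-2s}}}(e^{-s}x)&= \mathbb {E}h\left( e^{-s}x+\sqrt{1-e^{-2s}}\, Z\right) ,\\ \left| \mathbb {E}h\left( e^{-s}x+\sqrt{1-e^{-2s}}\, Z\right) -\mathbb {E}h(Z)\right|&\le e^{-s}|x|+\left( 1-\sqrt{1-e^{-2s}}\right) \mathbb {E}|Z| \ \le \ e^{-s}|x|+e^{-2s}\sqrt{d}. \end{aligned}$$

Here we used $1-\sqrt{1-a}\le a$ for $a\in [0,1]$ and $\mathbb {E}|Z|\le \sqrt{d}$. The right-hand side is integrable in $s$ over $(0,\infty )$, so $f_h(x)$ is finite for every $x$.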

Let C be a constant that may differ in different expressions. Denote

$$\begin{aligned} \eta : = d_{\mathcal W}(\mathcal {L}(W), \mathcal {L}(Z)). \end{aligned}$$

(B.5)

Let $\{X_1',\ldots , X_n'\}$ be an independent copy of $\{X_1,\ldots , X_n\}$. Let I be uniformly chosen from $\{1,\ldots , n\}$ and be independent of $\{X_1,\ldots , X_n, X_1',\ldots , X_n'\}$. Define

$$\begin{aligned} W' = W-\frac{X_I}{\sqrt{n}}+\frac{X_I'}{\sqrt{n}}. \end{aligned}$$

Then W and $W'$ have the same distribution. Let

$$\begin{aligned} \delta = W'-W = \frac{X_I'}{\sqrt{n}}-\frac{X_I}{\sqrt{n}}. \end{aligned}$$

We have, by the independence assumption and the facts that $\mathbb {E}X_i=0$ and $\mathbb {E}W W^\mathrm{T}=I_d$,

$$\begin{aligned} \mathbb {E}[\delta |W] = \frac{1}{n}\sum _{i=1}^n \mathbb {E}[\frac{X_i'}{\sqrt{n}}-\frac{X_i}{\sqrt{n}}|W] = -\frac{1}{n}W \end{aligned}$$

and

$$\begin{aligned} \mathbb {E}[\delta \delta ^\mathrm{T}|W] = \frac{2}{n}I_d +\frac{1}{n}\{\mathbb {E}[\frac{1}{n}\sum _{i=1}^n X_i X_i^\mathrm{T}|W]-I_d\}. \end{aligned}$$
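For completeness, the two conditional moments above can be traced back to the construction of $W'$ as follows. The sketch uses only that $I$ is uniform and independent of everything else, that each $X_i'$ is independent of $(X_1,\ldots , X_n)$ with $\mathbb {E}X_i'=0$ and $\mathbb {E}X_i'X_i'^\mathrm{T}=\mathbb {E}X_iX_i^\mathrm{T}$, and that $\frac{1}{n}\sum _{i=1}^n \mathbb {E}X_iX_i^\mathrm{T}=\mathbb {E}WW^\mathrm{T}=I_d$.

$$\begin{aligned} \mathbb {E}[\delta |W]&= \frac{1}{n}\sum _{i=1}^n \left( \frac{\mathbb {E}X_i'}{\sqrt{n}}-\mathbb {E}\left[ \frac{X_i}{\sqrt{n}}\Big |W\right] \right) = -\frac{1}{n}\mathbb {E}\left[ \sum _{i=1}^n\frac{X_i}{\sqrt{n}}\Big |W\right] = -\frac{1}{n}W,\\ \mathbb {E}[\delta \delta ^\mathrm{T}|W]&= \frac{1}{n^2}\sum _{i=1}^n \left( \mathbb {E}X_i'X_i'^\mathrm{T}+\mathbb {E}[X_iX_i^\mathrm{T}|W]\right) = \frac{1}{n}I_d+\frac{1}{n}\mathbb {E}\left[ \frac{1}{n}\sum _{i=1}^n X_iX_i^\mathrm{T}\Big |W\right] . \end{aligned}$$

The cross terms $\mathbb {E}[X_i'X_i^\mathrm{T}|W]$ vanish because $X_i'$ is centered and independent of $(X_i, W)$.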

Therefore, (2.7) and (2.8) are satisfied with

$$\begin{aligned} \lambda = \frac{1}{n},\quad g(W)=-W, \quad R_1=0,\quad R_2 = \frac{1}{2}\left\{ \mathbb {E}[\frac{1}{n}\sum _{i=1}^n X_i X_i^\mathrm{T}|W]-I_d\right\} . \end{aligned}$$

Note that this is Example 1 below Assumption 2.1 with $\lambda _1=\cdots =\lambda _d=1$. By the boundedness condition,

$$\begin{aligned} |\delta |\le \frac{2\beta }{\sqrt{n}}. \end{aligned}$$

We also have

$$\begin{aligned} \mathbb {E}[|\delta |^2]=\frac{2}{n}\sum _{i=1}^n \mathbb {E}[\frac{|X_i|^2}{n}]=\frac{2d}{n}. \end{aligned}$$
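One way to verify this identity is through the trace of the covariance. The sketch below uses that $X_i'$ is an independent copy of the centered vector $X_i$, so that $\mathbb {E}|X_i'-X_i|^2=2\mathbb {E}|X_i|^2$, and that $\mathbb {E}|W|^2=\mathrm{tr}(\mathbb {E}WW^\mathrm{T})=d$.

$$\begin{aligned} \mathbb {E}[|\delta |^2] = \frac{1}{n}\sum _{i=1}^n \frac{\mathbb {E}|X_i'-X_i|^2}{n} = \frac{2}{n}\sum _{i=1}^n \frac{\mathbb {E}|X_i|^2}{n} = \frac{2}{n}\mathbb {E}|W|^2 = \frac{2d}{n}. \end{aligned}$$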

As $\beta ^2\ge \sum _{i=1}^n \mathbb {E}[\frac{|X_i|^2}{n}]=d$, we have $\beta \ge \sqrt{d}$. Using these facts and assuming that $\beta \le \sqrt{n}$ [otherwise (B.1) is trivial], in applying (2.10), we have

$$\begin{aligned}&\mathbb {E}|R_1| = 0,\nonumber \\&\quad \sqrt{d}\mathbb {E}[||R_2||_\mathrm{HS}] \le C\sqrt{d}\sqrt{\sum _{j,k=1}^d \mathrm{Var} [\frac{1}{n}\sum _{i=1}^n X_{ij} X_{ik}]}\nonumber \\&\qquad = C\sqrt{d}\sqrt{\sum _{j,k=1}^d\frac{1}{n^2}\sum _{i=1}^n \mathrm{Var} [ X_{ij} X_{ik}]}\le C\sqrt{d}\sqrt{\sum _{j,k=1}^d\frac{1}{n^2}\sum _{i=1}^n \mathbb {E}[ X_{ij}^2 X_{ik}^2]}\nonumber \\&\qquad = C\sqrt{d}\sqrt{\frac{1}{n^2}\sum _{i=1}^n \mathbb {E}[ |X_{i}|^4]}\le C\sqrt{d}\beta \sqrt{\frac{1}{n^2}\sum _{i=1}^n \mathbb {E}[ |X_{i}|^2]}=\frac{Cd\beta }{\sqrt{n}}, \end{aligned}$$

(B.6)

and

$$\begin{aligned}&\frac{1}{\lambda }\mathbb {E}\left[ |\delta |^3 \left( |\log |\delta || \vee 1\right) \right] \\&\quad \le Cn \frac{\beta }{\sqrt{n}} (1+\log n) \mathbb {E}[|\delta |^2]\le \frac{Cd\beta }{\sqrt{n}}(1+\log n). \end{aligned}$$

This proves (B.1).

To prove (B.2), we modify the argument from (3.7) by using the explicit expression of f in (B.4). With $\delta _i=(X_i'-X_i)/\sqrt{n}$ and $h_s(x):=h(e^{-s}x)$, we have

$$\begin{aligned}&\frac{1}{\lambda } \int _0^1 |\mathbb {E}[\langle \delta \delta ^\mathrm{T}, \nabla ^2 f(W+t\delta )-\nabla ^2 f(W) \rangle _{\text {HS}} ] |dt\\&\quad =\int _0^1 |\mathbb {E}\sum _{i=1}^n \int _0^\infty \int _0^1 t \nabla _{\delta _i}^3 (h_s*\phi _{\sqrt{e^{2s}-1}})(W+ut\delta _i) du ds | dt, \end{aligned}$$

where $\nabla _{\delta _i}^3:=\nabla _{\delta _i}\nabla _{\delta _i}\nabla _{\delta _i}$ (cf. Sect. 2.1). We split the integral over $s$ into $\int _0^{\epsilon ^2}$ and $\int _{\epsilon ^2}^\infty $, with $\epsilon $ to be chosen later. For the part $\int _0^{\epsilon ^2}$, exactly following [40, pp. 18–19], we obtain the bound

$$\begin{aligned} C \mathbb {E}\sum _{i=1}^n \int _0^{\epsilon ^2} e^{-s} |\delta _i|^2 \frac{1}{\sqrt{e^{2s}-1}} ds \ \le \ C d \epsilon , \end{aligned}$$

(B.7)

where we used $\sum _{i=1}^n \mathbb {E}|\delta _i|^2=2d$. The part $\int _{\epsilon ^2}^\infty $ is treated differently. Using the interchangeability of convolution and differentiation, we have

$$\begin{aligned}&\left| \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _0^1 \nabla ^3_{\delta _i} (h_s * \phi _{\sqrt{e^{2s}-1}})(W+ut\delta _i)duds \right| \nonumber \\&\quad = \left| \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _0^1 \int _{\mathbb {R}^d} h_s(W+ut\delta _i-x) \nabla ^3_{\delta _i} \phi _{\sqrt{e^{2s}-1}} (x) dx du ds \right| . \end{aligned}$$

(B.8)

Let $\{\hat{X}_1,\ldots , \hat{X}_n\}$ be another independent copy of $\{X_1,\ldots , X_n\}$ that is also independent of $\{X_1',\ldots , X_n'\}$, and let $\hat{W}_i=W-\frac{X_i}{\sqrt{n}}+\frac{\hat{X}_i}{\sqrt{n}}$. By this construction, for each i, $\hat{W}_i$ has the same distribution as W and is independent of $\{X_i, X_i'\}$. Let $\hat{Z}$ be an independent standard Gaussian vector. Rewriting

$$\begin{aligned} h_s(W+ut\delta _i-x)&= [h_s(W+ut\delta _i-x)-h_s(\hat{W}_i-x)]\\&\quad +[h_s(\hat{W}_i-x)-h_s(\hat{Z}-x)]\\&\quad +[h_s(\hat{Z}-x)], \end{aligned}$$

the term inside the absolute value in (B.8) is separated into three terms as follows:

$$\begin{aligned} R_{31}= & {} \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _0^1 \int _{\mathbb {R}^d} [h_s(W+ut\delta _i-x)-h_s(\hat{W}_i-x)] \nabla ^3_{\delta _i} \phi _{\sqrt{e^{2s}-1}} (x) dx du ds,\\ R_{32}= & {} \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _0^1 \int _{\mathbb {R}^d} [h_s(\hat{W}_i-x)-h_s(\hat{Z}-x)] \nabla ^3_{\delta _i} \phi _{\sqrt{e^{2s}-1}} (x) dx du ds,\\ R_{33}= & {} \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _0^1 \int _{\mathbb {R}^d} h_s(\hat{Z}-x) \nabla ^3_{\delta _i} \phi _{\sqrt{e^{2s}-1}} (x) dx du ds. \end{aligned}$$

We bound their absolute values separately. From $h\in \text {Lip}(\mathbb {R}^d, 1)$, $h_s(x)=h(e^{-s}x)$ and $|X_i|\le \beta $, we have

$$\begin{aligned} |h_s(W+ut\delta _i-x)-h_s(\hat{W}_i-x)|\ \le \ \frac{Ce^{-s}\beta }{\sqrt{n}}. \end{aligned}$$

Moreover,

$$\begin{aligned} \int _{\mathbb {R}^d} |\nabla ^3_{\delta _i} \phi _{\sqrt{e^{2s}-1}} (x) |dx \ \le \ C|\delta _i|^3\frac{1}{(e^{2s}-1)^{3/2}}. \end{aligned}$$

Therefore,

$$\begin{aligned} |R_{31}|\ \le \ C\sum _{i=1}^n \mathbb {E}\int _{\epsilon ^2}^\infty e^{-s} \frac{\beta |\delta _i|^3}{\sqrt{n}} \frac{1}{(e^{2s}-1)^{3/2}}ds\ \le \ C \frac{d\beta ^2}{\epsilon n}, \end{aligned}$$

where we used $|\delta _i|\le 2\beta /\sqrt{n}$ and $\sum _{i=1}^n\mathbb {E}|\delta _i|^2=2d$.
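The two ingredients of the bound on $R_{31}$ can be checked directly. The sketch below uses the elementary inequality $e^{2s}-1\ge 2s$ for the $s$-integral, and the factor $\beta /\sqrt{n}$ coming from the Lipschitz estimate on $h_s$ above.

$$\begin{aligned} \int _{\epsilon ^2}^\infty \frac{e^{-s}}{(e^{2s}-1)^{3/2}}ds&\le \int _{\epsilon ^2}^\infty \frac{ds}{(2s)^{3/2}} = \frac{1}{\sqrt{2}\, \epsilon },\\ \sum _{i=1}^n \mathbb {E}\left[ \frac{\beta }{\sqrt{n}} |\delta _i|^3\right]&\le \frac{\beta }{\sqrt{n}}\cdot \frac{2\beta }{\sqrt{n}} \sum _{i=1}^n \mathbb {E}|\delta _i|^2 = \frac{4d\beta ^2}{n}. \end{aligned}$$

Multiplying the two estimates gives a bound of order $d\beta ^2/(\epsilon n)$.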

From the definition of $\eta $ in (B.5) and the fact that $\hat{W}_i$ has the same distribution as W, we have

$$\begin{aligned} |\mathbb {E}[h_s(\hat{W}_i-x)-h_s(\hat{Z}-x)]|\ \le \ e^{-s} \eta . \end{aligned}$$

Using independence and the same argument as for $R_{31}$, we have

$$\begin{aligned} |R_{32}|\ \le \ \frac{Cd\beta \eta }{\epsilon \sqrt{n}}. \end{aligned}$$

Now we bound $R_{33}$. Using integration by parts, and combining two independent Gaussians into one, we have

$$\begin{aligned} |R_{33}| =&\left| \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _0^1 \int _{\mathbb {R}^d} h_s(\hat{Z}-x) \nabla ^3_{\delta _i} \phi _{\sqrt{e^{2s}-1}} (x) dx du ds \right| \\ =&\left| \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _{\mathbb {R}^d} \nabla ^3_{\delta _i} h_s(\hat{Z}-x) \phi _{\sqrt{e^{2s}-1}} (x) dx ds \right| \\ =&\left| \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _{\mathbb {R}^d} \nabla ^3_{\delta _i} h_s(x) \phi _{\sqrt{e^{2s}}} (x) dx ds \right| \\ =&\left| \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty \int _{\mathbb {R}^d} \nabla _{\delta _i} h_s(x) \nabla ^2_{\delta _i} \phi _{\sqrt{e^{2s}}} (x) dx ds \right| \\ \ \le&\left| \mathbb {E}\sum _{i=1}^n \int _{\epsilon ^2}^\infty |\delta _i| e^{-s} |\delta _i|^2 \frac{C}{e^{2s}} ds \right| \ \le \ \frac{Cd\beta }{\sqrt{n}}. \end{aligned}$$

From (B.6), (B.7) and the bounds on $R_{31}, R_{32}$ and $R_{33}$, we have

$$\begin{aligned} \eta \ \le \ C\left( \frac{d \beta }{\sqrt{n}}+d\epsilon +\frac{d \beta ^2}{\epsilon n}+\frac{d\beta \eta }{\epsilon \sqrt{n}}+\frac{d\beta }{\sqrt{n}}\right) . \end{aligned}$$

The theorem is proved by choosing $\epsilon $ to be a large multiple of $d\beta /\sqrt{n}$ and solving the recursive inequality for $\eta $. $\square $
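The last step can be spelled out as follows. This is a sketch with a hypothetical constant $K$ (not named in the original). It uses that $\eta \le \mathbb {E}|W|+\mathbb {E}|Z|\le 2\sqrt{d}<\infty $, so that $\eta $ may be moved to the left-hand side. Taking $\epsilon =Kd\beta /\sqrt{n}$, we have

$$\begin{aligned} d\epsilon&= \frac{Kd^2\beta }{\sqrt{n}},\qquad \frac{d\beta ^2}{\epsilon n}=\frac{\beta }{K\sqrt{n}},\qquad \frac{d\beta \eta }{\epsilon \sqrt{n}}=\frac{\eta }{K},\\ \eta&\le C\left( \frac{2d\beta }{\sqrt{n}}+\frac{Kd^2\beta }{\sqrt{n}}+\frac{\beta }{K\sqrt{n}}\right) +\frac{C}{K}\eta . \end{aligned}$$

With $K=2C$ the last term equals $\eta /2$, and since $d\ge 1$ this yields $\eta \le (4C+4C^2+1)\, d^2\beta /\sqrt{n}$, which is (B.2) with a different absolute constant.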

About this article

Cite this article

Fang, X., Shao, QM. & Xu, L. Multivariate approximations in Wasserstein distance by Stein’s method and Bismut’s formula. Probab. Theory Relat. Fields 174, 945–979 (2019). https://doi.org/10.1007/s00440-018-0874-5

Received: 25 January 2018
Revised: 03 September 2018
Published: 28 September 2018
Issue Date: 01 August 2019
DOI: https://doi.org/10.1007/s00440-018-0874-5
