Multivariate normal approximation on the Wiener space: new bounds in the convex distance

We establish explicit bounds on the convex distance between the distribution of a vector of smooth functionals of a Gaussian field, and that of a normal vector with a positive definite covariance matrix. Our bounds are commensurate to the ones obtained by Nourdin, Peccati and Réveillac (2010) for the (smoother) 1-Wasserstein distance, and do not involve any additional logarithmic factor. One of the main tools exploited in our work is a recursive estimate on the convex distance recently obtained by Schulte and Yukich (2019). We illustrate our abstract results in two different situations: (i) we prove a quantitative multivariate fourth moment theorem for vectors of multiple Wiener-Itô integrals, and (ii) we characterise the rate of convergence for the finite-dimensional distributions in the functional Breuer-Major theorem.


Introduction
Fix $m \geq 1$, and consider random vectors $F$ and $G$ with values in $\mathbb{R}^m$. The convex distance between the distributions of $F$ and $G$ is defined as
$$d_c(F, G) := \sup_{h \in \mathcal{I}_m} \big| \mathbb{E}[h(F)] - \mathbb{E}[h(G)] \big|, \tag{1}$$
where the supremum runs over the class $\mathcal{I}_m$ of indicator functions of the measurable convex subsets of $\mathbb{R}^m$. For $m \geq 2$, the distance $d_c$ represents a natural counterpart to the well-known Kolmogorov distance on the class of probability distributions on the real line, and enjoys a number of desirable invariance properties that make it well adapted to applications.
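To make the definition concrete, here is a minimal sketch in the simplest case $m = 1$, where the convex sets are exactly the intervals. It computes the convex distance between two empirical distributions; the function name `convex_distance_1d` and the use of NumPy are illustrative assumptions, not part of the paper. The computation rests on the observation that the signed mass difference of an interval is a difference of two values of $D$, the difference of the two empirical distribution functions (with the convention that $D$ equals 0 to the left of every atom), so the supremum over intervals is the range of $D$.

```python
import numpy as np

def convex_distance_1d(x, y):
    """Convex distance between the empirical laws of two samples on the real line.

    In dimension 1 the convex sets are the intervals, so the supremum in the
    definition equals max(D) - min(D), where D is the difference of the two
    empirical CDFs (including the value 0 taken before the first atom).
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.union1d(x, y)  # sorted distinct atoms of both samples
    # Difference of the right-continuous empirical CDFs at every atom.
    diff = (np.searchsorted(x, grid, side="right") / x.size
            - np.searchsorted(y, grid, side="right") / y.size)
    # Prepend the value 0 taken to the left of all atoms.
    diff = np.concatenate(([0.0], diff))
    return float(diff.max() - diff.min())
```

For the samples `[0, 2]` and `[1]`, the interval reduced to the point 1 carries mass 0 under the first law and mass 1 under the second, so the convex distance is 1, whereas the Kolmogorov distance (half-lines only) is 0.5.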
The aim of the present note is to establish explicit bounds on the quantity $d_c(F, G)$, in the special case where $F$ is a vector of smooth functionals of an infinite-dimensional Gaussian field, and $G = N_\Sigma$ is an $m$-dimensional centered Gaussian vector with covariance $\Sigma > 0$. Our main tool is the so-called Malliavin-Stein method for probabilistic approximations [17], which we will combine with some powerful recursive estimates on $d_c$, recently derived in [29] in the context of multidimensional second-order Poincaré inequalities on the Poisson space (see Lemma 2.1 below).
Multidimensional normal approximations in the convex distance have been the object of intense study for several decades, mostly in connection with multivariate central limit theorems (CLTs) for sums of independent random vectors; see e.g. [4,10,13,14] for some classical references, as well as [28] for recent advances and for a discussion of further relevant literature. The specific challenge we are setting ourselves in the present work is to establish bounds on the quantity $d_c(F, N_\Sigma)$ that coincide (up to an absolute multiplicative constant) with the bounds deduced in [19] on the 1-Wasserstein distance
$$d_W(F, N_\Sigma) := \sup_{h \in \mathrm{Lip}(1)} \big| \mathbb{E}[h(F)] - \mathbb{E}[h(N_\Sigma)] \big|, \tag{2}$$
where $\mathrm{Lip}(1)$ denotes the class of $C^1$ mappings $h : \mathbb{R}^m \to \mathbb{R}$ with Lipschitz constant not exceeding 1. We will see that our estimates systematically improve the bounds that one can infer from the general inequality
$$d_c(F, N_\Sigma) \leq K \sqrt{d_W(F, N_\Sigma)}, \tag{3}$$
where $K$ is a constant depending only on $\Sigma$. For the sake of completeness, a full proof of (3) is presented in Appendix A, where one can also find more details on the constant $K$.
Remark 1.1. In order for the quantity $d_W(F, N_\Sigma)$ to be well defined, one needs that $\mathbb{E}\|F\|_{\mathbb{R}^m} < \infty$. In Appendix A we will also implicitly use the well-known representation
$$d_W(F, N_\Sigma) = \inf_{(U, V)} \mathbb{E}\|U - V\|_{\mathbb{R}^m},$$
where the infimum runs over all couplings $(U, V)$ of $F$ and $N_\Sigma$. See [30] for a proof of this fact, and for further relevant properties of Wasserstein distances.
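The coupling representation is easiest to see on the real line. For two empirical measures with the same number of atoms, the infimum over couplings is attained by the monotone coupling that pairs order statistics, a classical one-dimensional fact. The sketch below is only an illustration under that assumption (equal sample sizes, $m = 1$); the function name `wasserstein1_empirical` is hypothetical.

```python
import numpy as np

def wasserstein1_empirical(x, y):
    """1-Wasserstein distance between two empirical laws on the real line.

    Assumes both samples have the same size, so that the optimal coupling
    pairs the i-th smallest point of x with the i-th smallest point of y.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.size != y.size:
        raise ValueError("this sketch requires samples of equal size")
    # Expected distance E|U - V| under the monotone (sorted) coupling.
    return float(np.mean(np.abs(x - y)))
```

For instance, translating a sample by 1 gives distance 1, and reordering a sample leaves the distance at 0, since only the empirical law matters.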
The main contributions of our paper are described in full detail in Section 1.4 and Section 1.5. Section 1.1 contains some elements of Malliavin calculus that are necessary in order to state our findings. Section 1.2 discusses some estimates on the smooth distance $d_2$ (to be defined therein) that can be obtained by interpolation techniques, whereas Section 1.3 provides an overview of the main results of [19].

Remark on notation. From now on, every random element is assumed to be defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$, with $\mathbb{E}$ denoting expectation with respect to $\mathbb{P}$. For $p \geq 1$, we write $L^p(\Omega) := L^p(\Omega, \mathcal{F}, \mathbb{P})$.

Elements of Malliavin calculus
The reader is referred e.g. to the monographs [17,23,24] for a detailed discussion of the concepts and results presented in this subsection.
Let $H$ be a real separable Hilbert space, and write $\langle \cdot, \cdot \rangle_H$ for the corresponding inner product. In what follows, we will denote by $X = \{X(h) : h \in H\}$ an isonormal Gaussian process over $H$, that is, $X$ is a centered Gaussian family indexed by the elements of $H$ and such that $\mathbb{E}[X(h)X(g)] = \langle h, g \rangle_H$ for every $h, g \in H$. For the rest of the paper, we will assume without loss of generality that $\mathcal{F}$ coincides with the $\sigma$-field generated by $X$.
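Every $F \in L^2(\Omega)$ admits a Wiener-Itô chaos expansion of the form
$$F = \mathbb{E}[F] + \sum_{q=1}^{\infty} I_q(f_q),$$
where $f_q$ belongs to the symmetric $q$th tensor product $H^{\odot q}$ (and is uniquely determined by $F$), and $I_q(f_q)$ is the $q$th multiple Wiener-Itô integral of $f_q$ with respect to $X$. One writes $F \in \mathbb{D}^{1,2}(\Omega)$ if
$$\sum_{q \geq 1} q \, q! \, \|f_q\|^2_{H^{\otimes q}} < \infty.$$
For $F \in \mathbb{D}^{1,2}(\Omega)$, we denote by $DF$ the Malliavin derivative of $F$, and recall that $DF$ is by definition a random element with values in $H$. The operator $D$ satisfies a fundamental chain rule: if $\varphi : \mathbb{R}^m \to \mathbb{R}$ is $C^1$ and has bounded derivatives and if $F_1, \ldots, F_m \in \mathbb{D}^{1,2}(\Omega)$, then $\varphi(F_1, \ldots, F_m) \in \mathbb{D}^{1,2}(\Omega)$ and
$$D\varphi(F_1, \ldots, F_m) = \sum_{i=1}^{m} \partial_i \varphi(F_1, \ldots, F_m) \, DF_i. \tag{4}$$
The adjoint of $D$, customarily called the divergence operator or the Skorohod integral, is denoted by $\delta$ and satisfies the duality formula
$$\mathbb{E}[\delta(u) F] = \mathbb{E}[\langle u, DF \rangle_H] \tag{5}$$
for all $F \in \mathbb{D}^{1,2}(\Omega)$, whenever $u : \Omega \to H$ is in the domain $\mathrm{Dom}(\delta)$ of $\delta$.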
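The generator of the Ornstein-Uhlenbeck semigroup, written $L$, is defined by the relation $LF = -\sum_{q \geq 1} q \, I_q(f_q)$ for every $F$ such that $\sum_{q \geq 1} q^2 \, q! \, \|f_q\|^2_{H^{\otimes q}} < \infty$. Its pseudo-inverse $L^{-1}$ is defined, for every $F \in L^2(\Omega)$, by $L^{-1}F = -\sum_{q \geq 1} \frac{1}{q} I_q(f_q)$. The crucial relation that links the objects introduced above is the identity
$$F = \mathbb{E}[F] - \delta(DL^{-1}F), \tag{6}$$
which is valid for any $F \in L^2(\Omega)$ (in particular, one has that, for every $F \in L^2(\Omega)$, $DL^{-1}F \in \mathrm{Dom}(\delta)$).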
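The identity linking $D$, $\delta$ and $L^{-1}$ can be checked by hand in the simplest possible setting. The sketch below assumes $H = \mathbb{R}$, so that $X$ reduces to a single standard Gaussian variable and a functional with finite chaos expansion is a polynomial written in the probabilists' Hermite basis $He_q$. Under that assumption, $D$ acts as ordinary differentiation, $L^{-1}$ divides the $q$th coefficient by $-q$, and $\delta(g)(x) = x\,g(x) - g'(x)$. The helper names are illustrative only.

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def malliavin_derivative(c):
    """D acts as d/dx on a series sum_q c[q] * He_q(x)."""
    return He.hermeder(c)

def ou_pseudo_inverse(c):
    """L^{-1} kills the constant term and maps He_q to -(1/q) He_q for q >= 1."""
    c = np.asarray(c, dtype=float)
    out = np.zeros_like(c)
    q = np.arange(1, c.size)
    out[1:] = -c[1:] / q
    return out

def divergence(c):
    """One-dimensional divergence: delta(g)(x) = x * g(x) - g'(x)."""
    return He.hermesub(He.hermemulx(c), He.hermeder(c))

def centered_part_via_identity(c):
    """Returns the Hermite coefficients of -delta(D L^{-1} F)."""
    return -divergence(malliavin_derivative(ou_pseudo_inverse(c)))
```

Applied to the coefficients `[2, 1, 0.5, 3]`, the function returns `[0, 1, 0.5, 3]`: the constant term (the expectation) is removed and every other chaos component is recovered, in line with the identity $F - \mathbb{E}[F] = -\delta(DL^{-1}F)$.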

Bounds on the smooth distance d 2
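Fix $m \geq 1$ and assume that $F = (F_1, \ldots, F_m)$ is a centered random vector in $\mathbb{R}^m$ whose components belong to $\mathbb{D}^{1,2}(\Omega)$. Without loss of generality, we can assume that each $F_i$ is of the form $F_i = \delta(u_i)$ for some $u_i \in \mathrm{Dom}(\delta)$; indeed, by virtue of (6) one can always set $u_i = -DL^{-1}F_i$ (although this is by no means the only possible choice). Let also $N_\Sigma = (N_1, \ldots, N_m)$ be a centered Gaussian vector with invertible $m \times m$ covariance matrix $\Sigma = \{\Sigma(i,j) : i, j = 1, \ldots, m\}$. Finally, consider the so-called $d_2$ distance (between the distributions of $F$ and $N_\Sigma$) defined by
$$d_2(F, N_\Sigma) := \sup_{h} \big| \mathbb{E}[h(F)] - \mathbb{E}[h(N_\Sigma)] \big|,$$
where the supremum is taken over all $C^2$ functions $h : \mathbb{R}^m \to \mathbb{R}$ that are 1-Lipschitz and such that $\sup_{x \in \mathbb{R}^m} \|(\mathrm{Hess}\, h)(x)\|_{\mathrm{H.S.}} \leq 1$; here, $\mathrm{Hess}\, h$ stands for the Hessian matrix of $h$, whereas $\|\cdot\|_{\mathrm{H.S.}}$ denotes the Hilbert-Schmidt norm, that is,
$$\|A\|^2_{\mathrm{H.S.}} := \mathrm{Tr}(AA^T) = \sum_{1 \leq i, j \leq m} A(i,j)^2,$$
with associated inner product $\langle A, B \rangle_{\mathrm{H.S.}} := \mathrm{Tr}(AB^T)$.

For a given $h : \mathbb{R}^m \to \mathbb{R}$ of class $C^2$ with bounded partial derivatives, let us introduce its mollification at level $\sqrt{t}$, defined by
$$h_t(x) := \mathbb{E}\big[h\big(\sqrt{t}\, N_\Sigma + \sqrt{1-t}\, x\big)\big], \qquad x \in \mathbb{R}^m, \ t \in [0,1]. \tag{7}$$
Supposing in addition (and without loss of generality) that $F$ and $N_\Sigma$ are independent, one can write
$$\mathbb{E}[h(N_\Sigma)] - \mathbb{E}[h(F)] = \int_0^1 \frac{d}{dt} \mathbb{E}[h_t(F)] \, dt.$$
Combining the duality formula (5) with the chain rule (4) implies
$$\frac{d}{dt} \mathbb{E}[h_t(F)] = \frac{1}{2} \mathbb{E}\Big[ \big\langle (\mathrm{Hess}\, h)\big(\sqrt{t}\, N_\Sigma + \sqrt{1-t}\, F\big), \, \Sigma - M_F \big\rangle_{\mathrm{H.S.}} \Big],$$
where $M_F$ is the random $m \times m$ matrix given by
$$M_F(i,j) := \langle DF_i, u_j \rangle_H. \tag{8}$$
It is then immediate that
$$d_2(F, N_\Sigma) \leq \frac{1}{2} \mathbb{E}\big[\|M_F - \Sigma\|_{\mathrm{H.S.}}\big]. \tag{9}$$
Inequalities in the spirit of (9) were derived e.g. in [18] (in the context of limit theorems for homogeneous sums) and [27] (in the framework of multivariate normal approximations on the Poisson space); see also [29] and the references therein.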
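The effect of the mollification is easiest to see on a non-smooth test function. The sketch below assumes $m = 1$, a Gaussian variable $N$ with variance `sigma2`, and the standard form $h_t(x) = \mathbb{E}[h(\sqrt{t}\,N + \sqrt{1-t}\,x)]$ (stated here as an assumption); for $h$ the indicator of the half-line $(-\infty, a]$ this expectation is a Gaussian distribution function. The function names are illustrative.

```python
import math

def std_normal_cdf(z):
    """Distribution function of the standard Gaussian law."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mollified_halfline_indicator(x, a, t, sigma2=1.0):
    """h_t(x) for h = indicator of (-inf, a], with N ~ N(0, sigma2), 0 < t <= 1.

    h_t(x) = P(sqrt(t) * N + sqrt(1 - t) * x <= a).
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("t must lie in (0, 1]")
    return std_normal_cdf((a - math.sqrt(1.0 - t) * x) / math.sqrt(t * sigma2))
```

For small $t$ the function $h_t$ is a smooth approximation of the indicator, while at $t = 1$ it is the constant $\mathbb{E}[h(N)]$; this is the interpolation between $h(F)$ and $h(N_\Sigma)$ exploited in the argument above.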

Bounds on the 1-Wasserstein distance
For random vectors F and N Σ as in the previous section, we will now discuss a suitable method for assessing the quantity d W (F, N Σ ) defined in (2), that is, for uniformly bounding the absolute difference |Eh(F) − Eh(N Σ )| over all 1-Lipschitz functions h of class C 1 .
Since we do not assume h to be twice differentiable, the method presented in Section 1.2 no longer works. A preferable approach is consequently the so-called 'Malliavin-Stein method', introduced in [16] in dimension 1, and later extended to the multivariate setting in [19]. Let us briefly recall how this works (see [17,Chapter 4 and Chapter 6] for a full discussion, and [1] for a constantly updated list of references).
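Start by considering the following Stein's equation, with $h : \mathbb{R}^m \to \mathbb{R}$ given and $f : \mathbb{R}^m \to \mathbb{R}$ unknown:
$$\langle \Sigma, (\mathrm{Hess}\, f)(x) \rangle_{\mathrm{H.S.}} - \langle x, \nabla f(x) \rangle_{\mathbb{R}^m} = h(x) - \mathbb{E}[h(N_\Sigma)], \qquad x \in \mathbb{R}^m \tag{10}$$
(with $\langle A, B \rangle_{\mathrm{H.S.}} = \mathrm{Tr}(AB^T)$ the Hilbert-Schmidt inner product). When $h \in C^1$ has bounded partial derivatives, it turns out that (10) admits a solution $f = f_h$ of class $C^2$ whose second partial derivatives are bounded (see e.g. [17, Proposition 4.3.2] for a precise statement). Taking expectation with respect to the distribution of $F$ in (10) gives
$$\mathbb{E}[h(F)] - \mathbb{E}[h(N_\Sigma)] = \mathbb{E}\big[\langle \Sigma, (\mathrm{Hess}\, f_h)(F) \rangle_{\mathrm{H.S.}}\big] - \mathbb{E}\big[\langle F, \nabla f_h(F) \rangle_{\mathbb{R}^m}\big].$$
We can apply again the duality formula (5) together with the chain rule (4), to deduce that
$$\mathbb{E}[h(F)] - \mathbb{E}[h(N_\Sigma)] = \mathbb{E}\big[\langle (\mathrm{Hess}\, f_h)(F), \, \Sigma - M_F \rangle_{\mathrm{H.S.}}\big],$$
where $M_F$ is defined in (8). Taking the supremum over the set of all 1-Lipschitz functions $h$ of class $C^1$, we infer
$$d_W(F, N_\Sigma) \leq c_1 \, \mathbb{E}\big[\|M_F - \Sigma\|_{\mathrm{H.S.}}\big], \tag{11}$$
with $c_1 = \sqrt{m}\, \|\Sigma^{-1}\|_{\mathrm{op}} \|\Sigma\|_{\mathrm{op}}^{1/2}$, and $\|\cdot\|_{\mathrm{op}}$ is the operator norm for $m \times m$ matrices. The estimate (11) is the main result of [19] (see also [17]).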
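The mechanism behind Stein's equation can be illustrated numerically in dimension 1. The sketch below assumes the Stein operator takes the standard one-dimensional form $f \mapsto \sigma^2 f''(x) - x f'(x)$ and evaluates its mean under a centered Gaussian law by Gauss-Hermite quadrature (`numpy.polynomial.hermite_e.hermegauss`). The mean vanishes when the variance of the law matches the variance in the operator, and otherwise picks up the variance mismatch, the scalar analogue of the matrix $\Sigma - M_F$. The function names and the choice of 40 quadrature nodes are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_expectation(g, variance, n_nodes=40):
    """E[g(N)] for N ~ N(0, variance), by Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_nodes)  # weight function exp(-x^2 / 2)
    return float(np.sum(weights * g(np.sqrt(variance) * nodes)) / np.sqrt(2.0 * np.pi))

def stein_operator_mean(f_prime, f_second, operator_variance, law_variance):
    """E[operator_variance * f''(N) - N * f'(N)] for N ~ N(0, law_variance)."""
    integrand = lambda x: operator_variance * f_second(x) - x * f_prime(x)
    return gaussian_expectation(integrand, law_variance)
```

With $f(x) = e^{x/2}$ and both variances equal to 2, the mean is zero up to quadrature error. With $f(x) = x^2/2$, operator variance 1 and law variance 2, the mean equals $1 - 2 = -1$, that is, exactly the difference between the two variances.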

Main results: bounds on the convex distance
The principal aim of the present paper is to address the following natural question: can one obtain a bound similar to (11) where h t stands for the mollification at level √ t of h, as defined in (7). Let f t = f ht be the solution of the Stein's equation (10) associated with h = h t . In [6] (see also [29]), it is shown that with c 2 = c 2 (m, Σ) a constant depending only on m and Σ. Combining such an estimate with (7) yields the existence of a constant c 3 = c 3 (m, Σ) > 0 such that From (13), it is straightforward to deduce the existence of c 4 = c 4 (m, Σ) > 0 such that Comparing (14) with (9) and (11) shows that such a strategy yields a bound on d c (F, N Σ ) differing from those deduced above for the distances d 2 and d W by an additional logarithmic factor. See also [11,20] for more inequalities analogous to (14) -that is, displaying a multiplicative logarithmic factor -related respectively to the (multivariate) Kolmogorov and total variation distances.
In this paper, we will show that one can actually remove the redundant logarithmic factor on the right-hand side of (14), thus yielding a bound on d c (F, N Σ ) that is commensurate to (9) and (11) (with moreover an explicit multiplicative constant). Our main result is the following: with M F defined in (8).
As anticipated, to prove Theorem 1.2, we shall combine the somewhat classical smoothing estimate (13) with a remarkable bound by Schulte and Yukich [29].

Applications
We illustrate the use of Theorem 1.2 by developing two examples in full detail.
Quantitative fourth moment theorems. A fourth moment theorem (FMT) is a mathematical statement implying that a given sequence of centered and normalized random variables converges in distribution to a Gaussian limit, as soon as the corresponding sequence of fourth moments converges to 3 (that is, to the fourth moment of the standard Gaussian distribution). Distinguished examples of FMTs are e.g. de Jong's theorem for degenerate U-statistics (see [8,9]) as well as the CLTs for multiple Wiener-Itô integrals proved in [25,26]; the reader is referred to the webpage [1] for a list (composed of several hundreds of papers) of applications and extensions of such results, as well as to the lecture notes [31] for a modern discussion of their relevance in mathematical physics. Our first application of Theorem 1.2 is a quantitative multivariate fourth moment theorem for a vector of multiple Wiener-Itô integrals, considerably extending the qualitative multivariate results proved in [26]. Note that such a result was already obtained by Nourdin

In particular, for a vector $F$ of multiple Wiener-Itô integrals to be close in the convex distance to a centered Gaussian vector $N_\Sigma$ with matching covariance matrix, it is enough that

The multivariate Breuer-Major theorem. The second example concerns the convergence towards a Brownian motion occurring in the Breuer-Major theorem proved in [5]. Let us briefly recall this fundamental result (see [17, Chapter 7] for an introduction to the subject, as well as [15,7] for recent advances in a functional setting). Let $\{G_k : k \in \mathbb{Z}\}$ be a centered Gaussian stationary sequence with $\rho(j - k) = \mathbb{E}[G_j G_k]$ and $\rho(0) = 1$; in particular, $G_k \sim N(0, 1)$ for all $k$. Let $\varphi \in L^2(\mathbb{R}, \gamma)$, where $\gamma(dx) = (2\pi)^{-1/2} e^{-x^2/2}\, dx$ denotes the standard Gaussian measure on $\mathbb{R}$.
Since the Hermite polynomials $\{H_k : k \geq 0\}$ form an orthonormal basis of $L^2(\mathbb{R}, \gamma)$, one has $\varphi = \sum_{k \geq d} a_k H_k$, with $d \in \mathbb{N}$ and $a_d \neq 0$. The index $d$ is known as the Hermite rank of $\varphi \in L^2(\mathbb{R}, \gamma)$. Suppose in addition that $\int_{\mathbb{R}} \varphi \, d\gamma = \mathbb{E}[\varphi(G_0)] = 0$, that is, suppose $d \geq 1$. The Breuer-Major theorem [5] states the following: if $\sum_{k \in \mathbb{Z}} |\rho(k)|^d < \infty$, then where $W$ is a standard Brownian motion, $\longrightarrow$ indicates convergence in the sense of finite-dimensional distributions, and (That $\sigma^2$ is a well-defined positive real number is part of the conclusion.) We refer to our note [21] and references therein for results on the rate of convergence in the total variation distance for one-dimensional marginal distributions (that is, in dimension 1). We intend to apply Theorem 1.2 to address the rate of convergence for the following multivariate CLT implied by (15): Therefore, choosing $A$ appropriately, it suffices to consider the vector $F_n = (F_{n,1}, \ldots, F_{n,m})$ with and obtain the rate of convergence for The following result provides a quantitative version of this CLT with respect to the distance $d_c$.
Recall from [21] that the minimal regularity assumption on $\varphi$ for obtaining rates of convergence via the Malliavin-Stein method is that $\varphi \in \mathbb{D}^{1,4}(\mathbb{R}, \gamma)$, meaning that $\varphi$ is absolutely continuous and both $\varphi$ and its derivative $\varphi'$ belong to $L^4(\mathbb{R}, \gamma)$. We say that $\varphi$ is 2-sparse if its expansion in Hermite polynomials does not have consecutive non-zero coefficients. In particular, even functions are 2-sparse.
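The two structural notions used here, the Hermite rank and 2-sparsity, depend only on the pattern of non-zero Hermite coefficients. The sketch below assumes that a finite list `coeffs` holds the coefficients $a_0, a_1, \ldots$ of $\varphi$ in its Hermite expansion; the function names and the tolerance parameter `tol` are illustrative.

```python
def hermite_rank(coeffs, tol=0.0):
    """Index of the first Hermite coefficient whose modulus exceeds tol."""
    for k, a in enumerate(coeffs):
        if abs(a) > tol:
            return k
    raise ValueError("all coefficients vanish, so the rank is undefined")

def is_two_sparse(coeffs, tol=0.0):
    """True if no two consecutive Hermite coefficients are both non-zero."""
    return all(abs(a) <= tol or abs(b) <= tol
               for a, b in zip(coeffs, coeffs[1:]))
```

For example, the coefficient pattern `[0, 0, 1, 0, 3]` has Hermite rank 2 and is 2-sparse, whereas `[0, 1, 1]` has rank 1 and is not 2-sparse. An even function only has coefficients of even index, which is why it is automatically 2-sparse.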
Corollary 1.4. Let $F_n$ and $N_\Sigma$ be given in (16). Suppose that $\varphi \in \mathbb{D}^{1,4}(\mathbb{R}, \gamma)$ with Hermite rank $d \geq 1$. Then,
i) There exists a constant $C$ depending only on $\varphi$, $m$, $\Sigma$ such that for each $n \in \mathbb{N}$,
ii) If $d = 2$, $\varphi$ is 2-sparse and $b \in [1,2]$, then there exists a constant $C$ depending only on $\varphi$, $m$, $\Sigma$ such that for each $n \in \mathbb{N}$,
iii) If $d = 2$, $\varphi$ is 2-sparse, and $\sum_{k \in \mathbb{Z}} |\rho(k)|^2 < \infty$, then as $n \to \infty$,

The rest of the note is organized as follows. The proof of Theorem 1.2 is given in Section 2.1, that of Corollary 1.3 in Section 2.2, and that of Corollary 1.4 in Section 2.3. We use $C$ to denote a generic constant whose value may change from line to line.

Acknowledgments
I. Nourdin is supported by the FNR grant APOGee (R-AGR-3585-10) at Luxembourg University; G. Peccati is supported by the FNR grant FoRGES (R-AGR-3376-10) at Luxembourg University; X. Yang is supported by the FNR Grant MISSILe (R-AGR-3410-12-Z) at Luxembourg and Singapore Universities.

Proof of Theorem 1.2
We divide the proof into several steps.
Step 1 (smoothing). For any bounded and measurable h and t ∈ (0, 1), recall its mollification at level √ t from (7). Then it is plain that h t is C ∞ with bounded derivatives of all orders and the solution to (10) with h = h t is given by see [29, p.12]. Finally, recall from e.g. [29, Lemma 2.2] that, for any t ∈ (0, 1), Step 2 (integration by parts). An integration by parts by (5) and (4) (see [17,Chapter 4] for more details), together with Cauchy-Schwarz's inequality, implies, The following remarkable estimate is due to M. Schulte and J. Yukich.

Lemma 2.1 (Proposition 2.3 in [29]). Let Y be an R m -valued random vector and Σ be an invertible m × m covariance matrix. Then,
where the left-hand side depends on h through the function f t solving Stein's equation with test function h t given by (7).
Remark 2.2. Lemma 2.1 improves upon the uniform bound (see [6] or [29]) when some a priori estimate on d c (Y, N Σ ) is available.
As a consequence, Letting we have thus established

Step 3 (exploiting the recursive inequality). Suppose that $\gamma \leq 1/e$; otherwise the bound we intend to prove already holds (although it is not informative). Let $t = \gamma^2$. Using the fact that $\kappa \leq 1$ for the $\kappa$ on the right-hand side of (18) The proof is complete.

Proof of Corollary 1.3
We will obtain the desired conclusion as a direct application of Theorem 1.2 with u i = −DL −1 F i , see (6). Indeed, recall that by On the other hand, Plugging these estimates into Theorem 1.2 gives the result.

Proof of Corollary 1.4
We follow closely the arguments of [21] and assume without loss of generality that T = 1.
Then, standard computations using (6) lead to Applying Theorem 1.2 and the triangle inequality implies that $\mathrm{Var}(\langle DF_{n,i}, u_{n,j} \rangle) =: I_1 + I_2$.
Note that, by the chain rule and the relation D(G k ) = e k , where k ∼ t i means that the sum is taken over k ∈ { nt i−1 + 1, ..., nt i }, and similarly for the symbol ∼ t j . Hence, The variance is bounded because of the assumption that ϕ ∈ D 1,4 . Once (20) is in place, one can apply Gebelein's inequality as in [21]. In particular, one infers that (see [21,Proposition 3.4]) If, in addition, ϕ is 2-sparse, then Items i)-ii) now follow from these inequalities, as shown in [21]; we include a proof for completeness. Applying twice Young's inequality for convolutions, one has yielding Item i). Rewrite the sum of products as a sum of the product of convolutions by introducing the function 1 n (k) := 1 |k|<n . We have For b ∈ [1, 2], we have yielding Item ii). Now we move to the proof of Item iii). Notice that taking b = 2 for the right-hand side of (21), together with an application of Young's inequality, yields that Thus, To proceed, we handle the convolution involving 1 n a bit differently. Set so that ρ n = ρ n + ρ n . One has 1 n ρ n * 1 n 2 (Z) ≤ 1 n ρ n 2 (Z) 1 n 1 (Z) + 1 n ρ n 1 (Z) 1 n 2 (Z) ≤ N ≤|k|<n ρ(k) 2 1/2 + (2N + 1)n −1/2 , from which Item iii) follows. The proof is complete.

A Proof and discussion of relation (3)
Inequality (3) is a direct consequence of the following statement, whose proof exploits a strategy already adopted in [2, Proof of Theorem 3.1].
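where $\Gamma(\Sigma)$ is the isoperimetric constant defined by
$$\Gamma(\Sigma) := \sup_{Q, \, \varepsilon > 0} \frac{\mathbb{P}(N_\Sigma \in Q^{\varepsilon} \setminus Q)}{\varepsilon},$$
where $Q$ ranges over all Borel measurable convex subsets of $\mathbb{R}^m$, and $Q^{\varepsilon}$ indicates the set of all elements of $\mathbb{R}^m$ whose Euclidean distance from $Q$ does not exceed $\varepsilon$.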
Remark A.2. In [14] it is proved that, for some absolute constants 0 < c < C < ∞, where · H.S. stands as above for the Hilbert-Schmidt norm. When Σ = I m (identity matrix), one has also the well-known estimate Γ(I m ) ≤ 4m 1/4 (see [3]), as well as Nazarov [4] for related computations in the framework of the multivariate CLT.
Proof of Proposition A.1. We can assume that $F$ and $N_\Sigma$ are defined on a common probability space, and that $\mathbb{E}\|F - N_\Sigma\|_{\mathbb{R}^m} = d_W(F, N_\Sigma)$. Fix a convex set $Q$, as well as $\varepsilon > 0$. We have that On the other hand, defining $Q^{-\varepsilon}$ as the set of those $y \in Q$ such that the closed ball with radius $\varepsilon$ centered at $y$ is contained in $Q$, where we have used the inequality The conclusion follows from a standard optimisation in $\varepsilon$.
The left-hand side of the previous inequality is usually referred to as the Kolmogorov distance between the distributions of F and N . The presence of the factor (log m) 1/4 is consistent with the fact that, for the standard Gaussian measure on R m , the isoperimetric constant associated with all hyper-rectangles of R m is bounded from above by √ log m, see [3,14]. An estimate analogous to (23) is established by different methods in [12,Corollary 3.1].