1 Introduction

Let \(\varvec{X}=(X_i)_{i=1}^\infty \) be a sequence of independent centered random variables with unit variance. A homogeneous sum is a random variable of the form

$$\begin{aligned} Q(f;\varvec{X})=\sum _{i_1,\dots ,i_q=1}^Nf(i_1,\dots ,i_q)X_{i_1}\cdots X_{i_q}, \end{aligned}$$

where \(N,q\in {\mathbb {N}}\), \([N]:=\{1,\dots ,N\}\) and \(f:[N]^q\rightarrow {\mathbb {R}}\) is a symmetric function vanishing on diagonals, i.e., \(f(i_1,\dots ,i_q)=0\) unless \(i_1,\dots ,i_q\) are mutually distinct. Limit theorems for sequences of homogeneous sums have a long history in probability theory. Rotar’ [53, 54] investigated invariance principles for \(Q(f;\varvec{X})\) with respect to the law of \(\varvec{X}\). In the notable work of de Jong [22], the following striking result was established: For every \(n\in {\mathbb {N}}\), let \(f_n:[N_n]^q\rightarrow {\mathbb {R}}\) be a symmetric function vanishing on diagonals with q fixed and \(N_n\uparrow \infty \) as \(n\rightarrow \infty \). Assume \(\mathrm {E}[X_i^4]<\infty \) for all i and \(\mathrm {E}[Q(f_n;\varvec{X})^2]=1\) for all n. Then, \(Q(f_n;\varvec{X})\) converges in law to the standard normal distribution, provided that the following two conditions hold true:

  1. (i)

    \(\mathrm {E}[Q(f_n;\varvec{X})^4]\rightarrow 3\) as \(n\rightarrow \infty \).

  2. (ii)

    \(\max _{1\le i\le N_n}{{\,\mathrm{Inf}\,}}_i(f_n)\rightarrow 0\) as \(n\rightarrow \infty \), where \({{\,\mathrm{Inf}\,}}_i(f_n)\) is defined by

    $$\begin{aligned} {{\,\mathrm{Inf}\,}}_i(f_n):=\sum _{i_2,\dots ,i_q=1}^{N_n}f_n(i,i_2,\dots ,i_q)^2 \end{aligned}$$
    (1.1)

    and called the influence of the ith variable of \(f_n\).

When \(q=1\), condition (ii) says that \(\max _{1\le i\le N_n}f_n(i)^2\rightarrow 0\) as \(n\rightarrow \infty \), which is equivalent to the celebrated Lindeberg condition. In this case condition (i) is always implied by (ii) and is thus superfluous. In contrast, when \(q\ge 2\), condition (ii) is no longer sufficient for the asymptotic normality of the sequence \((Q(f_n;\varvec{X}))_{n=1}^\infty \), so an additional condition is needed. The motivation for introducing condition (i) in [22] was that one can easily check that condition (i) is equivalent to the asymptotic normality of \((Q(f_n;\varvec{X}))_{n=1}^\infty \) when \(q=2\) and \(\varvec{X}\) is Gaussian (see also [21]). Later on, this observation was significantly improved in the influential paper by Nualart & Peccati [49]: For any q, the asymptotic normality of \((Q(f_n;\varvec{X}))_{n=1}^\infty \) is implied by condition (i) alone as long as \(\varvec{X}\) is Gaussian. Results of this type are nowadays called fourth-moment theorems and have been extensively studied in the past decade. In particular, further investigation of the fourth-moment theorem in [49] has led to the introduction of the so-called Malliavin–Stein method by Nourdin & Peccati [43], which has produced one of the most active research areas in the recent probabilistic literature. We refer the reader to the monograph [44] for an introduction to this subject and the survey [3] for recent developments.
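De Jong's two conditions are straightforward to inspect by simulation. The following is a minimal Monte Carlo sketch of our own (the tridiagonal kernel, the Rademacher design and the sample sizes are illustrative assumptions, not part of [22]): for \(q=2\) and \(f(i,j)=\{2\sqrt{N-1}\}^{-1}1_{\{|i-j|=1\}}\) one has \(\mathrm {E}[Q(f;\varvec{X})^2]=1\), and both conditions (i) and (ii) are visibly satisfied as N grows.

```python
import numpy as np

rng = np.random.default_rng(0)
for N in (10, 100, 1000):
    # f(i,j) = c * 1{|i-j|=1} with c chosen so that E[Q(f;X)^2] = 2*||f||^2 = 1
    c = 1.0 / (2.0 * np.sqrt(N - 1))
    F = np.zeros((N, N))
    idx = np.arange(N - 1)
    F[idx, idx + 1] = F[idx + 1, idx] = c
    X = rng.choice([-1.0, 1.0], size=(10000, N))     # Rademacher design
    Q = ((X @ F) * X).sum(axis=1)                    # Q(f;X) = sum_{i,j} f(i,j) X_i X_j
    influences = (F ** 2).sum(axis=1)                # Inf_i(f), i = 1,...,N
    # condition (i): E[Q^4] -> 3; condition (ii): max_i Inf_i(f) -> 0
    print(N, Q.var(), (Q ** 4).mean(), influences.max())
```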

Implications of the Malliavin–Stein method for de Jong’s central limit theorem (CLT) for homogeneous sums have been investigated in the seminal work of Nourdin, Peccati & Reinert [47], where several important extensions of de Jong’s result were developed. The following three results are particularly relevant to our work:

  1. (I)

First, they established a multi-dimensional extension of de Jong’s CLT showing that vectors of homogeneous sums satisfy a CLT if de Jong’s criterion holds component-wise. More precisely, let \(d\in {\mathbb {N}}\) and, for every \(j=1,\dots ,d\), let \(q_j\in {\mathbb {N}}\) and \(f_{n,j}:[N_n]^{q_j}\rightarrow {\mathbb {R}}\) be a symmetric function vanishing on diagonals. Also, let \(\mathfrak {C}=(\mathfrak {C}_{jk})_{1\le j,k\le d}\) be a \(d\times d\) positive semidefinite symmetric matrix and suppose that

    $$\begin{aligned} \max _{1\le j,k\le d}|\mathrm {E}[Q(f_{n,j};\varvec{X})Q(f_{n,k};\varvec{X})]-\mathfrak {C}_{jk}|\rightarrow 0 \end{aligned}$$

    as \(n\rightarrow \infty \). Then, the d-dimensional random vector

    $$\begin{aligned} \varvec{Q}^{(n)}(\varvec{X}):=(Q(f_{n,1};\varvec{X}),\dots ,Q(f_{n,d};\varvec{X})) \end{aligned}$$

    converges in law to the d-dimensional normal distribution \({\mathcal {N}}_d(0,\mathfrak {C})\) with mean 0 and covariance matrix \(\mathfrak {C}\) as \(n\rightarrow \infty \) if \(\mathrm {E}[Q(f_{n,j};\varvec{X})^4]-3\mathrm {E}[Q(f_{n,j};\varvec{X})^2]^2\rightarrow 0\) and \(\max _{1\le i\le N_n}{{\,\mathrm{Inf}\,}}_i(f_{n,j})\rightarrow 0\) as \(n\rightarrow \infty \) for every \(j=1,\dots ,d\).

  2. (II)

Second, they found the following universality of Gaussian variables in the context of homogeneous sums ([47, Theorem 1.2]): Assume

$$\begin{aligned} \sup _n\sum _{i_1,\dots ,i_{q_j}=1}^{N_n}f_{n,j}(i_1,\dots ,i_{q_j})^2<\infty \end{aligned}$$

and \(\mathfrak {C}_{jj}>0\) for every j. If \(\varvec{Q}^{(n)}(\varvec{G})\) converges in law to \({\mathcal {N}}_d(0,\mathfrak {C})\) as \(n\rightarrow \infty \) for a sequence \(\varvec{G}=(G_i)_{i=1}^\infty \) of independent standard Gaussian variables, then \(\varvec{Q}^{(n)}(\varvec{X})\) converges in law to \({\mathcal {N}}_d(0,\mathfrak {C})\) as \(n\rightarrow \infty \) for any sequence \(\varvec{X}=(X_i)_{i=1}^\infty \) of independent centered random variables with unit variance such that \(\sup _i\mathrm {E}[|X_i|^3]<\infty \).

  3. (III)

    Third, they have established some quantitative versions of de Jong’s CLT for homogeneous sums; see Proposition 5.4 and Corollary 7.3 in [47] for details (see also Sect. 2.1.1).

We remark that these results have been generalized in various directions by subsequent studies. For example, the universality results analogous to (II) have also been established for Poisson variables in Peccati & Zheng [52] and i.i.d. variables with zero skewness and nonnegative excess kurtosis in Nourdin et al. [45, 46], respectively. Also, the recent work of Döbler & Peccati [28] has extended (I) and (II) to more general degenerate U-statistics which were originally treated in [22].

As the title of the paper suggests, the aim of this paper is to extend the above results to a high-dimensional setting where the dimension d depends on n and \(d=d_n\rightarrow \infty \) as \(n\rightarrow \infty \). Of course, in such a setting, the “asymptotic distribution” \(\mathcal {N}_d(0,\mathfrak {C})\) also depends on n and, even worse, it is typically no longer tight. Therefore, we need to properly reformulate the above statements in this setting. In this paper, we adopt the so-called metric approach to accomplish this purpose: We aim to show that a suitable distance between the laws of \(\varvec{Q}^{(n)}(\varvec{X})\) and \(\mathcal {N}_d(0,\mathfrak {C})\) tends to zero. Specifically, we take the Kolmogorov distance as the metric between the probability laws. Namely, letting \(Z^{(n)}\) be a \(d_n\)-dimensional centered Gaussian vector with covariance matrix \(\mathfrak {C}_n\) for each n, we aim at proving the following convergence:

$$\begin{aligned} \sup _{x\in {\mathbb {R}}^{d_n}}|P(\varvec{Q}^{(n)}(\varvec{X})\le x)-P(Z^{(n)}\le x)|\rightarrow 0\quad \text {as }n\rightarrow \infty . \end{aligned}$$

Here, for vectors \(x=(x_1,\dots ,x_{d_n})\in {\mathbb {R}}^{d_n}\) and \(y=(y_1,\dots ,y_{d_n})\in {\mathbb {R}}^{d_n}\), we write \(x\le y\) to express \(x_j\le y_j\) for every \(j=1,\dots ,d_n\). In addition, we are particularly interested in the situation where the dimension \(d=d_n\) grows much faster than the inverse of the “standard” convergence rate of the Gaussian approximation for a sequence of univariate homogeneous sums. Given that both \(\sqrt{|\mathrm {E}[Q(f_n;\varvec{X})^4]-3\mathrm {E}[Q(f_{n};\varvec{X})^2]^2|}\) and \(\max _{1\le i\le N_n}\sqrt{{{\,\mathrm{Inf}\,}}_i(f_n)}\) can be the optimal convergence rates of the Gaussian approximation of \(Q(f_{n};\varvec{X})\) in the Kolmogorov distance (see [42, Proposition 3.8] for the former and [30, Remark 1] for the latter), we might consider the quantity

$$\begin{aligned} \delta _n:=\max _{1\le j\le d_n}\sqrt{|\mathrm {E}[Q(f_{n,j};\varvec{X})^4]-3\mathrm {E}[Q(f_{n,j};\varvec{X})^2]^2|+\max _{1\le i\le N_n}{{\,\mathrm{Inf}\,}}_i(f_{n,j})} \end{aligned}$$

as an appropriate definition of the “standard” convergence rate. Then, we aim at proving

$$\begin{aligned} \sup _{x\in {\mathbb {R}}^{d_n}}|P(\varvec{Q}^{(n)}(\varvec{X})\le x)-P(Z^{(n)}\le x)|\le C(\log d_n)^a\delta _n^b \end{aligned}$$
(1.2)

for all \(n\in {\mathbb {N}}\), where \(a,b,C>0\) are constants which do not depend on n (here and below we assume \(d_n\ge 2\)). As a byproduct, results of this type enable us to extend fourth-moment theorems and universality results for homogeneous sums to a high-dimensional setting (see Theorem 2.2 for the precise statement).

Our formulation of a high-dimensional extension of CLTs for homogeneous sums is motivated by the recent path-breaking work of Chernozhukov, Chetverikov & Kato [13, 18], where results analogous to (1.2) have been established for sums of independent random vectors. More formally, let \((\xi _{n,i})_{i=1}^n\) be a sequence of independent centered \(d_n\)-dimensional random vectors. Set \(S_n:=n^{-1/2}\sum _{i=1}^n\xi _{n,i}\) and assume \(\mathfrak {C}_n=\mathrm {E}[S_nS_n^\top ]\) (\(\top \) denotes the transpose of a matrix). Then, under an appropriate assumption on moments, we have

$$\begin{aligned} \sup _{x\in {\mathbb {R}}^{d_n}}|P(S_n\le x)-P(Z^{(n)}\le x)|\le C'\left( \frac{\log ^7(d_nn)}{n}\right) ^{1/6}, \end{aligned}$$
(1.3)

where \(C'>0\) is a constant which does not depend on n (see Proposition 2.1 in [18] for the precise statement). Here, we shall remark that the bound in (1.3) depends on n through \(n^{-1/6}\), which is suboptimal when the dimension \(d_n\) is fixed. However, in [18, Remark 2.1(ii)] it is conjectured that the rate \(n^{-1/6}\) is nearly optimal in a minimax sense when \(d_n\) is much larger than n (see also [10, Remark 1]). This conjecture is motivated by the fact that the rate \(n^{-1/6}\) is minimax optimal in CLTs for sums of independent random variables taking values in an infinite-dimensional Banach space (see, e.g., [8, Theorem 2.6]). Given that high-dimensional CLTs of type (1.3) are closely related to Gaussian approximation of the suprema of empirical processes (see, e.g., [15, 17]), it is worth mentioning that a duality argument enables us to translate the minimax rate for CLTs in a Banach space into the one for Gaussian approximation of the suprema of empirical processes with a specific class of functions in the Kolmogorov distance; see [50] for details. For this reason, we also conjecture that \(b=1/3\) would give an optimal dependence on \(\delta _n\) of the bound in (1.2) (note that the rate \(n^{-1/2}\) is the standard convergence rate of CLTs for sums of independent one-dimensional random variables). In this paper, we indeed establish that a bound of type (1.2) holds true with \(b=1/3\) under a moment assumption on \(\varvec{X}\) when the \(q_j\)’s do not depend on j (see Theorem 2.1 and Remark 2.1).
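The max-rectangle special case \(x=(t,\dots ,t)\) of the left-hand side of (1.3) is easy to estimate by simulation. A rough sketch under illustrative assumptions of our own (i.i.d. centered exponential coordinates, for which \(\mathfrak {C}_n\) is the identity, and arbitrarily chosen sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, reps = 100, 200, 1000
S = np.empty((reps, d))
for r in range(reps):
    # S_n = n^{-1/2} * sum_{i=1}^n xi_{n,i} with centered exponential coordinates
    S[r] = (rng.exponential(size=(n, d)) - 1.0).sum(axis=0) / np.sqrt(n)
Z = rng.standard_normal((reps, d))   # Gaussian analog, covariance = identity
t = np.linspace(0.0, 5.0, 200)
Fs = (S.max(axis=1)[:, None] <= t).mean(axis=0)
Fz = (Z.max(axis=1)[:, None] <= t).mean(axis=0)
print(np.abs(Fs - Fz).max())         # Kolmogorov distance over rectangles (t,...,t)
```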

We remark that a number of articles extend the scope of the Chernozhukov–Chetverikov–Kato theory (CCK theory for short) in various directions. We refer the reader to the survey [6] for recent developments. Nevertheless, most studies focus on linear statistics (i.e., sums of random variables), and only a few articles are concerned with nonlinear statistics. Two exceptions are U-statistics, studied in [10–12, 56], and Wiener functionals, studied in [34, 35]. On the one hand, however, the former are mainly concerned with non-degenerate U-statistics, which are approximately linear statistics via the Hoeffding decomposition (Chen & Kato [11] also handle degenerate U-statistics, but they focus on randomized incomplete versions that are still approximately linear statistics). On the other hand, although the latter deal with essentially nonlinear statistics, they must be functionals of a (possibly infinite-dimensional) Gaussian process, except for [35, Theorem 3.2], which is a version of our result with \(q_j\equiv 2\) (see Sect. 2.1.2 for more details). In this sense, our result would be the first extension of CCK-type results to essentially nonlinear statistics based on possibly non-Gaussian variables.

Finally, we mention that the main results of this paper have potential applications to statistics. In fact, the original motivation of this paper is to improve the Gaussian approximation result for maxima of high-dimensional vectors of random quadratic forms given by [35, Theorem 3.2], which is used to ensure the validity of the bootstrap testing procedure proposed in [35, Section 4.1] (see Sect. 2.2). Another potential application is specification testing of parametric forms in nonparametric regression. In this area, to derive the null distributions of test statistics, one sometimes needs to approximate the maximum of (essentially degenerate) quadratic forms; see [25, 32, 40] for instance.

This paper is organized as follows. Section 2 presents the main results obtained in the paper, while Sects. 3–7 are devoted to the proof of the main results: Sect. 3 demonstrates a basic scheme of the CCK theory to prove high-dimensional CLTs. Subsequently, Sect. 4 presents a connection of this scheme to Stein’s method. Based on this observation, Sect. 5 develops a high-dimensional CLT of the form (1.2) for homogeneous sums based on normal and gamma variables. Then, Sect. 6 establishes a kind of invariance principle for high-dimensional homogeneous sums using a randomized version of the Lindeberg method. Finally, Sect. 7 completes the proof of the main results.

1.1 Notation

\({\mathbb {Z}}_+\) denotes the set of all nonnegative integers. For \(x=(x_1,\dots ,x_d)\in {\mathbb {R}}^d\), we define \(\Vert x\Vert _{\ell _\infty }:=\max _{1\le j\le d}|x_j|\). For \(N\in {\mathbb {N}}\), we set \([N]:=\{1,\dots ,N\}\). We set \(\sum _{i=p}^q\equiv 0\) if \(p>q\) by convention. For \(q\in {\mathbb {N}}\), we denote by \(\mathfrak {S}_q\) the set of all permutations of [q], i.e., the symmetric group of degree q. For a function \(f:[N]^q\rightarrow {\mathbb {R}}\), we set \({\mathcal {M}}(f):=\max _{1\le i\le N}{{\,\mathrm{Inf}\,}}_i(f)\) (recall that \({{\,\mathrm{Inf}\,}}_i(f)\) is defined according to (1.1)). We also set \( \Vert f\Vert _{\ell _2}:=\sqrt{\sum _{i_1,\dots ,i_q=1}^Nf(i_1,\dots ,i_q)^2}. \) For a function \(h:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\), we set \(\Vert h\Vert _\infty :=\sup _{x\in {\mathbb {R}}^d}|h(x)|\). We write \(C^m_b({\mathbb {R}}^d)\) for the set of all real-valued \(C^m\) functions on \({\mathbb {R}}^d\) all of whose partial derivatives are bounded. We write \(\partial _{j_1\dots j_m}=\frac{\partial ^m}{\partial x_{j_1}\cdots \partial x_{j_m}}\) for short. Throughout the paper, \(Z=(Z_1,\dots ,Z_d)\) denotes a d-dimensional centered Gaussian random vector with covariance matrix \({\mathfrak {C}}=({\mathfrak {C}}_{ij})_{1\le i,j\le d}\) (note that we do not assume that \(\mathfrak {C}\) is positive definite in general). Also, \((q_j)_{j=1}^\infty \) stands for a sequence of positive integers. Throughout the paper, we will regard \((q_j)_{j=1}^\infty \) as fixed, i.e., it does not vary when we consider asymptotic results. Given a probability distribution \(\mu \), we write \(X\sim \mu \) to express that X is a random variable with distribution \(\mu \). For \(\nu >0\), we write \(\gamma (\nu )\) for the gamma distribution with shape \(\nu \) and rate 1. If S is a topological space, \(\mathcal {B}(S)\) denotes the Borel \(\sigma \)-field of S.
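In computations, these quantities translate directly into array operations. A minimal NumPy sketch (the helper names are ours): for a kernel stored as an array F with q axes of length N, the influences are sums of squares over all but the first axis.

```python
import numpy as np

def influences(F):
    """Inf_i(f) = sum over i_2,...,i_q of f(i, i_2,..., i_q)^2, for F of shape (N,) * q."""
    return (F ** 2).sum(axis=tuple(range(1, F.ndim)))

def M(F):
    """M(f) = max_{1 <= i <= N} Inf_i(f)."""
    return influences(F).max()

def l2_norm(F):
    """||f||_{l2}, the Euclidean norm of all entries of f."""
    return np.sqrt((F ** 2).sum())
```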

Given a random variable X, we set \(\Vert X\Vert _p:=\{\mathrm {E}[|X|^p]\}^{1/p}\) for every \(p>0\). When X satisfies \(\mathrm {E}[X^4]<\infty \), we denote the fourth cumulant of X by \(\kappa _4(X)\). Note that \(\kappa _4(X)=\mathrm {E}[X^4]-3\mathrm {E}[X^2]^2\) if X is centered. For \(\alpha >0\), we define the \(\psi _\alpha \)-norm of X by \( \Vert X\Vert _{\psi _\alpha }:=\inf \{C>0:\mathrm {E}[\psi _\alpha (|X|/C)]\le 1\}, \) where \(\psi _\alpha (x):=\exp (x^\alpha )-1\). Note that \(\Vert \cdot \Vert _{\psi _\alpha }\) is indeed a norm (on a suitable space) if and only if \(\alpha \ge 1\). Some useful properties of the \(\psi _\alpha \)-norm are collected in Appendix A.
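As a sanity check on this definition, the \(\psi _2\)-norm of a standard Gaussian X can be found in closed form: \(\mathrm {E}[\exp (X^2/C^2)]=(1-2/C^2)^{-1/2}\) for \(C>\sqrt{2}\), so \(\mathrm {E}[\psi _2(|X|/C)]\le 1\) if and only if \(C\ge \sqrt{8/3}\), i.e., \(\Vert X\Vert _{\psi _2}=\sqrt{8/3}\). A minimal sketch locating this value by root finding (assuming SciPy is available):

```python
import numpy as np
from scipy.optimize import brentq

# E[psi_2(|X|/C)] = (1 - 2/C^2)^{-1/2} - 1 for X ~ N(0,1), C > sqrt(2);
# the psi_2-norm is the smallest C making this expectation equal to 1
f = lambda C: (1.0 - 2.0 / C ** 2) ** -0.5 - 2.0
print(brentq(f, np.sqrt(2.0) + 1e-6, 10.0), np.sqrt(8.0 / 3.0))  # both ~ 1.633
```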

2 Main Results

Our first main result is a high-dimensional version of de Jong’s CLT for homogeneous sums:

Theorem 2.1

Let \(\varvec{X}=(X_i)_{i=1}^N\) be a sequence of independent centered random variables with unit variance. Set \(w=\frac{1}{2}\) if \(\mathrm {E}[X_i^3]=0\) for every \(i\in [N]\) and \(w=1\) otherwise. For every \(j\in [d]\), let \(f_j:[N]^{q_j}\rightarrow {\mathbb {R}}\) be a symmetric function vanishing on diagonals, and set \(\varvec{Q}(\varvec{X}):=(Q(f_1;\varvec{X}),\dots ,Q(f_d;\varvec{X}))\). Suppose that \(d\ge 2\), \(\underline{\sigma }:=\min _{1\le j\le d}\Vert Z_j\Vert _2>0\) and \(\max _{1\le i\le N}\Vert X_i\Vert _{\psi _\alpha }<\infty \) for some \(\alpha \in (0,w^{-1}]\). Then,

$$\begin{aligned}&\sup _{x\in {\mathbb {R}}^d}\left| P(\varvec{Q}(\varvec{X})\le x)-P(Z\le x)\right| \nonumber \\&\quad \le C(1+\underline{\sigma }^{-1})\left\{ (\log d)^{\frac{2}{3}}\delta _0[\varvec{Q}(\varvec{X})]^{\frac{1}{3}}\right. \nonumber \\&\left. \qquad +(\log d)^{\mu +\frac{1}{2}}\delta _1[\varvec{Q}(\varvec{X})]^{\frac{1}{3}} +(\log d)^{\frac{2\overline{q}_d-1}{\alpha }+\frac{3}{2}}\max _{1\le k\le d}\overline{B}_N^{q_k}\sqrt{{\mathcal {M}}(f_k)}\right\} , \end{aligned}$$
(2.1)

where \(\overline{B}_N:=\max _{1\le i\le N}(\Vert X_i\Vert _{\psi _\alpha }\vee |\mathrm {E}[X_i^3]|)\), \(\overline{q}_d:=\max _{1\le j\le d}q_j\), \(\mu :=\max \{\frac{2}{3}w\overline{q}_d-\frac{1}{6},\frac{2(\overline{q}_d-1)}{3\alpha }+\frac{1}{3}\}\), \(C>0\) depends only on \(\alpha ,\overline{q}_d\) and

$$\begin{aligned} \delta _0[\varvec{Q}(\varvec{X})]&:=\max _{1\le j,k\le d}\left| \mathrm {E}[Q(f_j;\varvec{X})Q(f_k;\varvec{X})]-{\mathfrak {C}}_{jk}\right| ,\\ \delta _1[\varvec{Q}(\varvec{X})]&:= \overline{A}_N^{2w\overline{q}_d-1} \max _{1\le j,k\le d}\left\{ 1_{\{q_j=q_k\}}\sqrt{|\kappa _4(Q(f_k;\varvec{X}))|+\overline{A}_N^{4q_k}\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(f_k)^2}\right. \\&\quad \left. +1_{\{q_j< q_k\}}\overline{A}_N^{q_j}\Vert f_j\Vert _{\ell _2}\left( |\kappa _4(Q(f_k;\varvec{X}))|+\overline{A}_N^{4q_k}\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(f_k)^2\right) ^{1/4}\right\} \end{aligned}$$

with \(\overline{A}_N:=\max _{1\le i\le N}(|\mathrm {E}[X_i^3]|\vee {\Vert X_i\Vert _4})\).

Remark 2.1

  1. (a)

Since \(\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(f_k)^2\le \Vert f_k\Vert _{\ell _2}^2\mathcal {M}(f_k)\), Theorem 2.1 gives a bound of the form (1.2) under reasonable assumptions when \(q_1=\cdots =q_d\). For example, this is the case when \(\mathrm {E}[Q(f_j;\varvec{X})Q(f_k;\varvec{X})]={\mathfrak {C}}_{jk}\) for all \(j,k\in [d]\), \(\sup _i\Vert X_i\Vert _{\psi _\alpha }<\infty \) and \(\sup _j\Vert f_j\Vert _{\ell _2}<\infty \). Here, we keep \(\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(f_k)^2\) rather than \(\Vert f_k\Vert _{\ell _2}^2\mathcal {M}(f_k)\) for convenience in applications.

  2. (b)

    When \(q_j<q_k\) for some \(j,k\in [d]\), the exponents of \(|\kappa _4(Q(f_k;\varvec{X}))|\) and \(\mathcal {M}(f_k)\) appearing in the bound of (2.1) are 1/12, which are halves of those for the case \(q_j=q_k\). This phenomenon is not specific to the high-dimensional setting but common in fourth-moment-type theorems. See Remark 1.9(a) in [29] for more details.

  3. (c)

In Sect. 2.1, we compare Theorem 2.1 to two existing results in some detail. The comparison shows that the dependence of the bound in (2.1) on the dimension d is as sharp as (and sometimes sharper than) that of the previous results.

We can easily extend Theorem 2.1 to a high-dimensional CLT for homogeneous sums in hyperrectangles as follows. Let \({\mathcal {A}}^\mathrm {re}(d)\) be the set of all hyperrectangles in \({\mathbb {R}}^d\), i.e., \({\mathcal {A}}^\mathrm {re}(d)\) consists of all sets A of the form

$$\begin{aligned} A=\{(x_1,\dots ,x_d)\in {\mathbb {R}}^d:a_j\le x_j\le b_j\text { for all }j=1,\dots ,d\} \end{aligned}$$

for some \(-\infty \le a_j\le b_j\le \infty \), \(j=1,\dots ,d\).

Corollary 2.1

Under the assumptions of Theorem 2.1, we have

$$\begin{aligned}&\sup _{A\in {\mathcal {A}}^\mathrm {re}(d)}\left| P(\varvec{Q}(\varvec{X})\in A)-P(Z\in A)\right| \\&\quad \le C'(1+\underline{\sigma }^{-1})\left\{ (\log d)^{\frac{2}{3}}\delta _0[\varvec{Q}(\varvec{X})]^{\frac{1}{3}}\right. \\&\quad \qquad \left. +(\log d)^{\mu +\frac{1}{2}}\delta _1[\varvec{Q}(\varvec{X})]^{\frac{1}{3}} +(\log d)^{\frac{2\overline{q}_d-1}{\alpha }+\frac{3}{2}}\max _{1\le k\le d}\overline{B}_N^{q_k}\sqrt{{\mathcal {M}}(f_k)}\right\} , \end{aligned}$$

where \(C'>0\) depends only on \(\alpha ,\overline{q}_d\).

For application, it is often useful to restate Theorem 2.1 in an asymptotic form as follows.

Corollary 2.2

Let \(\varvec{X}=(X_i)_{i=1}^\infty \) be a sequence of independent centered random variables with unit variance. Set \(w=\frac{1}{2}\) if \(\mathrm {E}[X_i^3]=0\) for every \(i\in {\mathbb {N}}\) and \(w=1\) otherwise. For every \(n\in {\mathbb {N}}\), let \(N_n,d_n\in {\mathbb {N}}\setminus \{1\}\) and \(f_{n,k}:[N_n]^{q_k}\rightarrow {\mathbb {R}}\) (\(k=1,\dots ,d_n\)) be symmetric functions vanishing on diagonals, and set \(\varvec{Q}^{(n)}(\varvec{X}):=(Q(f_{n,1};\varvec{X}),\dots ,Q(f_{n,d_n};\varvec{X}))\). Moreover, for every \(n\in {\mathbb {N}}\), let \(Z^{(n)}=(Z_{n,1},\dots ,Z_{n,d_n})\) be a \(d_n\)-dimensional centered Gaussian vector with covariance matrix \(\mathfrak {C}_n=(\mathfrak {C}_{n,kl})_{1\le k,l\le d_n}\). Suppose that \(\overline{q}_\infty :=\sup _{j\in {\mathbb {N}}}q_j<\infty \), \(\inf _{n\in {\mathbb {N}}}\min _{1\le k\le d_n}\Vert Z_{n,k}\Vert _2>0\), \(\sup _{i\in {\mathbb {N}}}\Vert X_i\Vert _{\psi _\alpha }<\infty \) for some \(\alpha \in (0,w^{-1}]\) and

$$\begin{aligned} (\log d_n)^2\max _{1\le k,l\le d_n}|\mathrm {E}[Q(f_{n,k};\varvec{X})Q(f_{n,l};\varvec{X})]-\mathfrak {C}_{n,kl}|\rightarrow 0 \end{aligned}$$
(2.2)

as \(n\rightarrow \infty \). Moreover, setting \(a_1:=(4w\overline{q}_\infty -2)\vee (4\alpha ^{-1}(\overline{q}_\infty -1)+5)\) and \(a_2:=2\alpha ^{-1}(2\overline{q}_\infty -1)+3\), we suppose that either one of the following conditions is satisfied:

  1. (i)

    \((\log d_n)^{2a_1}\max _{1\le j\le d_n}|\kappa _4(Q(f_{n,j};\varvec{X}))|\rightarrow 0\) and \((\log d_n)^{2a_1\vee a_2}\max _{1\le j\le d_n}{\mathcal {M}}(f_{n,j})\rightarrow 0\) as \(n\rightarrow \infty \).

  2. (ii)

    \((\log d_n)^{a_1}\max _{1\le j\le d_n}|\kappa _4(Q(f_{n,j};\varvec{X}))|\rightarrow 0\) and \((\log d_n)^{a_1\vee a_2}\max _{1\le j\le d_n}{\mathcal {M}}(f_{n,j})\rightarrow 0\) as \(n\rightarrow \infty \) and \(q_1=q_2=\cdots \).

Then, we have \(\sup _{A\in {\mathcal {A}}^\mathrm {re}(d_n)}|P(\varvec{Q}^{(n)}(\varvec{X})\in A)-P(Z^{(n)}\in A)|\rightarrow 0\) as \(n\rightarrow \infty \).

Our second main result gives high-dimensional versions of fourth-moment theorems, universality results and Peccati–Tudor-type theorems for homogeneous sums:

Theorem 2.2

Let us keep the same notation as in Corollary 2.2. Suppose that one of the following conditions is satisfied:

  1. (A)

\(\varvec{X}\) is a sequence of independent copies of a random variable X such that \(\Vert X\Vert _{\psi _\alpha }<\infty \) for some \(\alpha >0\), \(\mathrm {E}[X^3]=0\) and \(\mathrm {E}[X^4]\ge 3\).

  2. (B)

    For every i, \(X_i\) is a standardized Poisson random variable with intensity \(\lambda _i>0\), i.e., \(\lambda _i+\sqrt{\lambda _i}X_i\) is a Poisson random variable with intensity \(\lambda _i\). Moreover, \(\inf _{i\in {\mathbb {N}}}\lambda _i>0\).

  3. (C)

    For every i, \(X_i\) is a standardized gamma random variable with shape \(\nu _i>0\) and unit rate, i.e., \(\nu _i+\sqrt{\nu _i}X_i\sim \gamma (\nu _i)\). Moreover, \(\inf _{i\in {\mathbb {N}}}\nu _i>0\).

Suppose also \(2\le \inf _{j\in {\mathbb {N}}}q_j\le \sup _{j\in {\mathbb {N}}}q_j<\infty \), \(0<\inf _{n\in {\mathbb {N}}}\min _{1\le j\le d_n}\mathfrak {C}_{n,jj}\le \sup _{n\in {\mathbb {N}}}\max _{1\le j\le d_n}\mathfrak {C}_{n,jj}<\infty \) and

$$\begin{aligned} (\log d_n)^a\max _{1\le j,k\le d_n}|\mathrm {E}[Q(f_{n,j};\varvec{X})Q(f_{n,k};\varvec{X})]-\mathfrak {C}_{n,jk}|\rightarrow 0 \end{aligned}$$

as \(n\rightarrow \infty \) for every \(a>0\). Then, we have \(\kappa _4(Q(f;\varvec{X}))\ge 0\) for any symmetric function \(f:[N]^q\rightarrow {\mathbb {R}}\) vanishing on diagonals. Moreover, the following conditions are equivalent:

  1. (i)

    \((\log d_n)^a\max _{1\le j\le d_n}\kappa _4(Q(f_{n,j};\varvec{X}))\rightarrow 0\) as \(n\rightarrow \infty \) for every \(a>0\).

  2. (ii)

    \((\log d_n)^a\max _{1\le j\le d_n}\sup _{x\in {\mathbb {R}}}|P(Q(f_{n,j};\varvec{X})\le x)-P(Z_{n,j}\le x)|\rightarrow 0\) as \(n\rightarrow \infty \) for every \(a>0\).

  3. (iii)

    \((\log d_n)^a\sup _{x\in {\mathbb {R}}^{d_n}}|P(\varvec{Q}^{(n)}(\varvec{X})\le x)-P(Z^{(n)}\le x)|\rightarrow 0\) as \(n\rightarrow \infty \) for every \(a>0\).

  4. (iv)

    \((\log d_n)^a\sup _{x\in {\mathbb {R}}^{d_n}}|P(\varvec{Q}^{(n)}(\varvec{Y})\le x)-P(Z^{(n)}\le x)|\rightarrow 0\) as \(n\rightarrow \infty \) for any \(a>0\) and sequence \(\varvec{Y}=(Y_i)_{i\in {\mathbb {N}}}\) of centered independent variables with unit variance such that \(\sup _{i\in {\mathbb {N}}}\Vert Y_i\Vert _{\psi _\alpha }<\infty \) for some \(\alpha >0\).

Remark 2.2

  1. (a)

The implications (i) \(\Rightarrow \) (iii), (iii) \(\Rightarrow \) (iv) and (ii) \(\Rightarrow \) (iii) can be viewed as high-dimensional versions of fourth-moment theorems, universality results and Peccati–Tudor-type theorems for homogeneous sums, respectively. Here, Peccati–Tudor-type theorems refer to statements asserting that component-wise CLTs imply a joint CLT (Peccati & Tudor [51] have established such a result for multiple Wiener–Itô integrals with respect to an isonormal Gaussian process).

  2. (b)

    The proof of Theorem 2.2 relies on the fact that condition (i) automatically yields \((\log d_n)^a\max _{j}\mathcal {M}(f_{n,j})\rightarrow 0\) as \(n\rightarrow \infty \) for every \(a>0\). On the one hand, this fact has already been established in the previous work for cases (A) and (B) (see the proof of Lemma 7.2). On the other hand, for case (C), this fact seems not to have appeared in the literature so far. Indeed, for case (C) we obtain it as a byproduct of the proof of Proposition 5.2 (see Lemma 5.4). As a consequence, Theorem 2.2 seems new for case (C) even in the fixed-dimensional case. We remark that the fourth-moment theorem for case (C) has been established by [1] in the univariate case, which inspired our discussions in Sect. 5 (see also [9]).

2.1 Comparison of Theorem 2.1 to Some Existing Results

2.1.1 Comparison to Corollary 7.3 in Nourdin, Peccati and Reinert [47]

First, we compare our result to the quantitative multi-dimensional CLT for homogeneous sums obtained in Nourdin et al. [47]. To state their result, we need to introduce the notion of contraction, which will also play an important role in Sect. 5.2. For two symmetric functions \(f:[N]^p\rightarrow {\mathbb {R}},g:[N]^q\rightarrow {\mathbb {R}}\) and \(r\in \{0,1\dots ,p\wedge q\}\), we define the contraction \(f\star _rg:[N]^{p+q-2r}\rightarrow {\mathbb {R}}\) by

$$\begin{aligned} f\star _rg(i_1,\dots ,i_{p+q-2r})=\sum _{k_1,\dots ,k_r=1}^Nf(i_1,\dots ,i_{p-r},k_1,\dots ,k_r)\,g(i_{p-r+1},\dots ,i_{p+q-2r},k_1,\dots ,k_r). \end{aligned}$$
(2.3)

In particular, we have

$$\begin{aligned} f\star _0g(i_1,\dots ,i_{p+q})=f\otimes g(i_1,\dots ,i_{p+q})=f(i_1,\dots ,i_p)g(i_{p+1},\dots ,i_{p+q}). \end{aligned}$$
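In array terms, the contraction is a tensordot over the last r indices of each kernel. A minimal NumPy sketch (the function name contract and the test kernel are ours, for illustration):

```python
import numpy as np

def contract(f, g, r):
    """f *_r g for arrays f of shape (N,)*p and g of shape (N,)*q, 0 <= r <= min(p, q)."""
    if r == 0:
        return np.multiply.outer(f, g)                 # f *_0 g = f (x) g
    p, q = f.ndim, g.ndim
    return np.tensordot(f, g, axes=(list(range(p - r, p)), list(range(q - r, q))))

N = 6
rng = np.random.default_rng(2)
f = rng.standard_normal((N, N))
f = (f + f.T) / 2.0
np.fill_diagonal(f, 0.0)                               # symmetric, vanishing on diagonals
assert contract(f, f, 0).shape == (N, N, N, N)
assert np.allclose(contract(f, f, 1), f @ f)           # for p = q = 2, f *_1 f = [f][f]
assert np.isclose(contract(f, f, 2), (f ** 2).sum())   # f *_q f = ||f||_{l2}^2
```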

Now we are ready to state the result of [47]. To simplify the notation, we focus only on the identity covariance matrix case and do not keep the explicit dependence of constants on \(q_j\)’s.

Proposition 2.1

(Nourdin et al. [47], Corollary 7.3) Let us keep the same notation as in Theorem 2.1. Suppose that \(\mathfrak {C}_{jk}=\mathrm {E}[Q(f_j;\varvec{X})Q(f_k;\varvec{X})]\) for all \(j,k\in [d]\) and \(\mathfrak {C}\) is the identity matrix of size d. Suppose also that \(\beta :=\max _{1\le i\le N}\mathrm {E}[|X_i|^3]<\infty \) and \(q_d\ge \cdots \ge q_1\ge 2\). Then, we have

$$\begin{aligned}&\sup _{A\in {\mathcal {C}}({\mathbb {R}}^{d})}|P(\varvec{Q}(\varvec{X})\in A)-P(Z\in A)|\nonumber \\&\quad \le Kd^{3/8}\left\{ \overline{\Delta }+{\mathsf {C}}(\beta +1)\left( \sum _{j=1}^d\beta ^{(q_j-1)/3}\right) ^3\sqrt{\max _{1\le j\le d}\mathcal {M}(f_j)}\right\} ^{1/4}, \end{aligned}$$
(2.4)

where \({\mathcal {C}}({\mathbb {R}}^{d})\) is the set of all convex Borel subsets of \({\mathbb {R}}^d\), \(K>0\) is a constant depending only on \(\overline{q}_d\), \({\mathsf {C}}:=\sum _{i=1}^N\max _{1\le j\le d}{{\,\mathrm{Inf}\,}}_i(f_j)\) and

$$\begin{aligned} \overline{\Delta }:=\sum _{1\le j\le k\le d}\left( \sum _{r=1}^{q_j-1}(\Vert f_j\star _{q_j-r}f_j\Vert _{\ell _2}+\Vert f_k\star _{q_k-r}f_k\Vert _{\ell _2})+1_{\{q_j<q_k\}}\sqrt{\Vert f_k\star _{q_k-q_j}f_k\Vert _{\ell _2}}\right) . \end{aligned}$$

To compare Proposition 2.1 to our result, we need to bound the quantity \(\overline{\Delta }\) by \(|\kappa _4(Q(f_j;\varvec{X}))|\) and \(\mathcal {M}(f_j)\), \(j\in [d]\). This can be carried out by the following lemma (proved in Sect. 7.4):

Lemma 2.1

Let \(\varvec{X}=(X_i)_{i=1}^N\) be a sequence of independent centered random variables with unit variance and such that \(M:=1+\max _{1\le i\le N}\mathrm {E}[X_i^4]<\infty \). Also, let \(q\ge 2\) be an integer and \(f:[N]^q\rightarrow {\mathbb {R}}\) be a symmetric function vanishing on diagonals. Then, we have

$$\begin{aligned} \max _{1\le r\le q-1}\Vert f\star _r f\Vert _{\ell _2}\le \sqrt{|\kappa _4(Q(f;\varvec{X}))|+CM\Vert f\Vert _{\ell _2}^2\mathcal {M}(f)}, \end{aligned}$$

where \(C>0\) depends only on q.

Remark 2.3

The bound in Lemma 2.1 is generally sharp. In fact, it is well known that \(\sqrt{|\kappa _4(Q(f;\varvec{X}))|}\) has the same order as \(\max _{1\le r\le q-1}\Vert f\star _r f\Vert _{\ell _2}\) if \(\varvec{X}\) is Gaussian (see, e.g., Eq.(5.2.6) in [44]). Moreover, if \(q=2\) and \(f(i,j)=N^{-1/2}1_{\{|i-j|=1\}}\), then both \(\Vert f\star _1f\Vert _{\ell _2}\) and \(\Vert f\Vert _{\ell _2}\sqrt{\mathcal {M}(f)}\) are of order \(N^{-1/2}\).
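Both rates in this example can be confirmed numerically, using that for \(q=2\) the contraction \(f\star _1f\) is the matrix product \([f][f]\), so \(\Vert f\star _1f\Vert _{\ell _2}\) is a Frobenius norm. A small sketch (sizes ours); both printed ratios stabilize as N grows, consistent with the common order \(N^{-1/2}\):

```python
import numpy as np

for N in (100, 400, 1600):
    F = np.zeros((N, N))
    idx = np.arange(N - 1)
    F[idx, idx + 1] = F[idx + 1, idx] = N ** -0.5      # f(i,j) = N^{-1/2} 1{|i-j|=1}
    star1 = np.linalg.norm(F @ F)                      # ||f *_1 f||_{l2}
    other = np.linalg.norm(F) * np.sqrt((F ** 2).sum(axis=1).max())  # ||f|| sqrt(M(f))
    print(N, star1 * np.sqrt(N), other * np.sqrt(N))   # ~ sqrt(6) and ~ 2, respectively
```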

With the help of Lemma 2.1, we observe that the bound in (2.4) typically has the same order as

$$\begin{aligned} d^{3/8}\left\{ d^2\max _{1\le j,k\le d}\widehat{\Delta }_{jk}+d^3{\mathsf {C}}\max _{1\le j\le d}\sqrt{\mathcal {M}(f_j)}\right\} ^{1/4}, \end{aligned}$$

where

$$\begin{aligned} \widehat{\Delta }_{jk}:=1_{\{q_j=q_k\}}\sqrt{|\kappa _4(Q(f_j;\varvec{X}))|+\mathcal {M}(f_j)} +1_{\{q_j<q_k\}}\left\{ |\kappa _4(Q(f_k;\varvec{X}))|+\mathcal {M}(f_k)\right\} ^{1/4}. \end{aligned}$$

Thus, in the bound of (2.4), the dimension appears as a power of d, while the exponent of the “standard” convergence rate \(\delta :=\max _{1\le j\le d}\sqrt{|\kappa _4(Q(f_j;\varvec{X}))|+\mathcal {M}(f_j)}\) is 1/4. Both aspects are much improved in our result: the former appears as a power of \(\log d\) and the latter is 1/3. Nevertheless, we should note that the bound in (2.4) is given for a much stronger metric than the Kolmogorov distance. In fact, to the best of the author’s knowledge, all the known bounds for this metric depend polynomially on the dimension even for sums of independent random variables; see [58, Section 1.1] and references therein.

Remark 2.4

  1. (a)

    Roughly speaking, the exponent of \(\delta \) is 1/4 in the bound of (2.4) because this bound is transferred from an analogous quantitative CLT for the Gaussian counterpart by the Lindeberg method with matching moments up to the second order. To overcome this issue, we need to match moments up to the third order and thus we can no longer rely on the result analogous to Theorem 2.1 for the Gaussian counterpart, which is obtained in [35]. For this reason, we will develop a high-dimensional CLT for homogeneous sums based on normal and gamma variables in Sect. 5.

  2. (b)

    It is worth noting that the quantity \({\mathsf {C}}=\sum _{i=1}^N\max _{1\le j\le d}{{\,\mathrm{Inf}\,}}_i(f_j)\) in the bound of (2.4) can be much larger than \(\max _{1\le j\le d}\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(f_j)=\max _{1\le j\le d}\Vert f_j\Vert _{\ell _2}^2\) in high-dimensional situations (see Remark 2.5 for a concrete example). Indeed, naïve application of the Lindeberg method produces a quantity like \({\mathsf {C}}\), which prevents us from using the Lindeberg method in its pure form (this is why Chernozhukov et al. [13, 18] rely on Stein’s method to prove their high-dimensional CLTs; see [14, Appendix L] for a detailed discussion). In Sect. 6, we will resolve this issue by randomizing the Lindeberg method as Deng & Zhang [24] have recently done in the context of sums of independent random vectors.

2.1.2 Comparison to Theorem 3.2 in Koike [35]

Next, we compare our result to the Gaussian approximation result for maxima of quadratic forms obtained in [35, Theorem 3.2]. Here, for an explicit comparison, we state this result after applying [35, Corollary 3.1]. For a function \(f:[N]^2\rightarrow {\mathbb {R}}\), we denote the \(N\times N\) matrix \((f(i,j))_{1\le i,j\le N}\) by [f].

Proposition 2.2

(Koike [35], Theorem 3.2 and Corollary 3.1) Let us keep the same notation as in Corollary 2.2. Suppose that \(\inf _{n\in {\mathbb {N}}}\min _{1\le k\le d_n}\Vert Z_{n,k}\Vert _2>0\), \(q_j=2\) for all j, \(\sup _{i\in {\mathbb {N}}}\Vert X_i\Vert _{\psi _2}<\infty \) and (2.2) holds true as \(n\rightarrow \infty \). Suppose also that

$$\begin{aligned}&(\log d_n)^3\max _{1\le k\le d_n}\sqrt{{{\,\mathrm{tr}\,}}\left( [f_{n,k}]^4\right) }\nonumber \\&\qquad +(\log d_n)^5\left( \max _{1\le k\le d_n}\sqrt{\mathcal {M}(f_{n,k})}\right) \sum _{i=1}^{N_n}\max _{1\le k\le d_n}{{\,\mathrm{Inf}\,}}_i(f_{n,k})\rightarrow 0 \end{aligned}$$
(2.5)

as \(n\rightarrow \infty \). Then, we have

$$\begin{aligned} \sup _{t\in {\mathbb {R}}}\left| P\left( \max _{1\le k\le d_n}|Q(f_{n,k};\varvec{X})|\le t\right) -P\left( \max _{1\le k\le d_n}|Z_{n,k}|\le t\right) \right| \rightarrow 0 \end{aligned}$$
(2.6)

as \(n\rightarrow \infty \).

When we apply our result to quadratic forms as above, we obtain the following result.

Proposition 2.3

Let us keep the same notation as in Corollary 2.2. Set

$$\begin{aligned} \Delta _n:=\max _{1\le k\le d_n}\sqrt{{{\,\mathrm{tr}\,}}\left( [f_{n,k}]^4\right) }+\max _{1\le k\le d_n}\sqrt{\mathcal {M}(f_{n,k})}\Vert f_{n,k}\Vert _{\ell _2} \end{aligned}$$

for every n. Assume \(q_1=q_2=\cdots =2\), \(\inf _{n\in {\mathbb {N}}}\min _{1\le k\le d_n}\Vert Z_{n,k}\Vert _2>0\) and (2.2). Assume also that either one of the following conditions is satisfied:

  1. (i)

    \(\sup _{i\in {\mathbb {N}}}\Vert X_i\Vert _{\psi _1}<\infty \) and \((\log d_n)^5\Delta _n\rightarrow 0\) as \(n\rightarrow \infty \).

  2. (ii)

    \(\sup _{i\in {\mathbb {N}}}\Vert X_i\Vert _{\psi _2}<\infty \), \(\mathrm {E}[X_i^3]=0\) for all i and \((\log d_n)^3\Delta _n\rightarrow 0\) as \(n\rightarrow \infty \).

Then, we have \(\sup _{A\in {\mathcal {A}}^\mathrm {re}(d_n)}|P(\varvec{Q}^{(n)}(\varvec{X})\in A)-P(Z^{(n)}\in A)|\rightarrow 0\) as \(n\rightarrow \infty \).

Remark 2.5

(a) Regarding the convergence rate of \(\max _{1\le k\le d_n}{{\,\mathrm{tr}\,}}\left( [f_{n,k}]^4\right) \), condition (i) in Proposition 2.3 is stronger than the one in Proposition 2.2. However, the former imposes a weaker moment condition on \(\varvec{X}\) than the latter. More importantly, the second term of \(\Delta _n\) is always smaller than or equal to the second term in (2.5), and the latter can be much larger than the former. For example, let us assume \(N_n=d_n=n\) and consider the functions \(f_{n,k}\) defined as follows:

$$\begin{aligned} f_{n,k}(i,j)=\left\{ \begin{array}{ll} n^{-1/2} &{} \text {if }|i-j|=1,i\ne k,j\ne k,\\ n^{-1/4} &{} \text {if }|i-j|=1,i=k\text { or }j=k,\\ 0 &{} \text {otherwise}. \end{array} \right. \end{aligned}$$

Then, we have \({{\,\mathrm{Inf}\,}}_i(f_{n,k})=(1+1_{\{1<i<n\}})n^{-1/2}\) if \(i\in \{k,k\pm 1\}\) and \({{\,\mathrm{Inf}\,}}_i(f_{n,k})=(1+1_{\{1<i<n\}})n^{-1}\) otherwise. Therefore, on the one hand

$$\begin{aligned} \left( \max _{1\le k\le d_n}\sqrt{\mathcal {M}(f_{n,k})}\right) \sum _{i=1}^{N_n}\max _{1\le k\le d_n}{{\,\mathrm{Inf}\,}}_i(f_{n,k}) \end{aligned}$$

does not converge to 0 as \(n\rightarrow \infty \), but on the other hand \(\max _{1\le k\le d_n}\sqrt{\mathcal {M}(f_{n,k})}\Vert f_{n,k}\Vert _{\ell _2}=O(n^{-1/4})\) as \(n\rightarrow \infty \). Note that in this case we have \(\max _{1\le k\le d_n}\sqrt{{{\,\mathrm{tr}\,}}\left( [f_{n,k}]^4\right) }=O(n^{-1/4})\) and \(\Vert f_{n,k}\Vert _{\ell _2}\rightarrow 1\) as \(n\rightarrow \infty \), so (2.6) holds true due to Proposition 2.3. (These rates are checked numerically in the sketch following this remark.)

(b) Condition (ii) in Proposition 2.3 requires the additional zero-skewness assumption, but it always imposes a weaker assumption on the functions \(f_{n,k}\) than the one in Proposition 2.2.

(c) We have \(\Delta _n\le 2\max _{k}\Vert [f_{n,k}]\Vert _{\mathrm {sp}}\Vert f_{n,k}\Vert _{\ell _2}\) with \(\Vert \cdot \Vert _{\mathrm {sp}}\) the spectral norm of matrices. So \((\log d_n)^a\Delta _n\rightarrow 0\) for some \(a>0\) is implied by \((\log d_n)^a\max _{k}\Vert [f_{n,k}]\Vert _{\mathrm {sp}}\Vert f_{n,k}\Vert _{\ell _2}\rightarrow 0\).
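As flagged in part (a), the contrast between the divergent quantity from (2.5) and the vanishing terms of \(\Delta _n\) can be checked by brute force. A rough sketch under the example of part (a) (sizes ours):

```python
import numpy as np

for n in (100, 200, 400):
    c1, c2 = n ** -0.5, n ** -0.25
    base = np.zeros((n, n))
    idx = np.arange(n - 1)
    base[idx, idx + 1] = base[idx + 1, idx] = c1
    inf_envelope = np.zeros(n)                           # i -> max_k Inf_i(f_{n,k})
    new_term, tr4 = 0.0, 0.0
    for k in range(n):
        F = base.copy()
        F[k, :] = np.where(base[k, :] != 0.0, c2, 0.0)   # row/column k upgraded to n^{-1/4}
        F[:, k] = F[k, :]
        inf_ = (F ** 2).sum(axis=1)                      # influences of f_{n,k}
        inf_envelope = np.maximum(inf_envelope, inf_)
        new_term = max(new_term, np.sqrt(inf_.max()) * np.linalg.norm(F))
        G = F @ F
        tr4 = max(tr4, (G * G).sum())                    # tr([f_{n,k}]^4), F symmetric
    old_term = np.sqrt(inf_envelope.max()) * inf_envelope.sum()   # second term in (2.5)
    print(n, old_term, new_term, np.sqrt(tr4))           # old_term grows, the others decay
```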

2.2 Statistical Application: Bootstrap Test for the Absence of Lead–Lag Relationship

Let \(W_t=(W_t^1,W_t^2)\) \((t\in {\mathbb {R}})\) be a two-sided bivariate standard Wiener process. Also let \(\rho \in (-1,1)\) and \(\vartheta \in {\mathbb {R}}\) be two (unknown) parameters. We define the bivariate process \(B_t=(B_t^1,B_t^2)\) \((t\in {\mathbb {R}})\) as \(B_t^1=W^1_t\) and \(B_t^2=\rho W^1_{t-\vartheta }+\sqrt{1-\rho ^2}W^2_t\). For each \(\nu =1,2\), we consider the process \(X^\nu =(X^\nu _t)_{t\ge 0}\) given by

$$\begin{aligned} X^\nu _t=X^\nu _0+\int _0^t\sigma _\nu (s)\mathrm{d}B^\nu _s,\qquad t\ge 0, \end{aligned}$$
(2.7)

where \(\sigma _\nu \in L^2(0,\infty )\) is nonnegative-valued and deterministic. If \(\rho \ne 0\), there is a correlation between \(X^1\) and \(X^2\) with a time lag of \(\vartheta \). We aim to test for whether such a correlation really exists or not, given (possibly asynchronous) high-frequency observations of \(X^1\) and \(X^2\). Specifically, for each \(\nu =1,2\), we observe the process \(X^\nu \) on the interval [0, T] at the deterministic sampling times \(0\le t^\nu _0<t^\nu _1<\cdots <t^\nu _{n_\nu }\le T\), which implicitly depend on the parameter \(n\in {\mathbb {N}}\) such that

$$\begin{aligned} r_n:=\max _{\nu =1,2}\max _{i=0,1,\dots ,n_\nu +1}(t^\nu _i-t^\nu _{i-1})\rightarrow 0 \end{aligned}$$

as \(n\rightarrow \infty \), where we set \(t^\nu _{-1}:=0\) and \(t^\nu _{n_\nu +1}:=T\) for each \(\nu =1,2\). To test for the null hypothesis \(H_0:\rho =0\) against the alternative \(H_1:\rho \ne 0\), Koike [35] proposed the test statistic given by \(T_n=\sqrt{n}\max _{\theta \in {\mathcal {G}}_n}|U_n(\theta )|\), where \({\mathcal {G}}_n\) is a finite subset of \({\mathbb {R}}\) and

$$\begin{aligned} U_n(\theta )=\sum _{i=1}^{n_1}\sum _{j=1}^{n_2}\Delta ^n_iX^1\Delta ^n_jX^2K^{ij}_\theta \qquad \text {with }\Delta ^n_iX^\nu =X^\nu _{t^\nu _i}-X^\nu _{t^\nu _{i-1}}\text { and } K^{ij}_\theta =1_{\{(t^1_{i-1},t^1_i]\cap (t^2_{j-1}-\theta ,t^2_j-\theta ]\ne \emptyset \}}. \end{aligned}$$

The null distribution of \(T_n\) can be approximated by its Gaussian analog as follows:

Proposition 2.4

([35], Proposition 4.1) For each \(n\in {\mathbb {N}}\), let \((Z_n(\theta ))_{\theta \in {\mathcal {G}}_n}\) be a family of centered Gaussian variables such that \(\mathrm {E}[Z_n(\theta )Z_n(\theta ')]=n{{\,\mathrm{Cov}\,}}[U_n(\theta ),U_n(\theta ')]\) for all \(\theta ,\theta '\in {\mathcal {G}}_n\). Suppose that \(\sup _{t\in [0,T]}(\sigma _1(t)+\sigma _2(t))<\infty \) and there are positive constants \({\underline{v}},{\overline{v}}\) such that

$$\begin{aligned} {\underline{v}}\le n\sum _{i=1}^{n_1}\sum _{j=1}^{n_2}\left( \int _{t_{i-1}^1}^{t_i^1}\sigma _1(t)^2\mathrm{d}t\right) \left( \int _{t_{j-1}^2}^{t_j^2}\sigma _2(t)^2\mathrm{d}t\right) K^{ij}_\theta \le {\overline{v}} \end{aligned}$$

for all \(n\in {\mathbb {N}}\) and \(\theta \in {\mathcal {G}}_n\). Then, under the null hypothesis \(\rho =0\), we have

$$\begin{aligned} \sup _{x\in {\mathbb {R}}}\left| P\left( T_n\le x\right) -P\left( \max _{\theta \in {\mathcal {G}}_n}|Z_n(\theta )|\le x\right) \right| \rightarrow 0 \end{aligned}$$

as \(n\rightarrow \infty \), provided that \(nr_n^2\log ^6(\#{\mathcal {G}}_n)\rightarrow 0\).

Since the distribution of \(\max _{\theta \in {\mathcal {G}}_n}|Z_n(\theta )|\) is analytically intractable, Koike [35] proposed a wild bootstrap procedure to approximate it. Formally, let \((w^1_i)_{i=1}^\infty \) and \((w^2_j)_{j=1}^\infty \) be mutually independent sequences of i.i.d. random variables independent of \(X^1\) and \(X^2\). Assume that \(\mathrm {E}[w^1_1]=\mathrm {E}[w^2_1]=0\), \({{\,\mathrm{Var}\,}}[w^1_1]={{\,\mathrm{Var}\,}}[w^2_1]=1\) and \(\Vert w^1_1\Vert _{\psi _2}\vee \Vert w^2_1\Vert _{\psi _2}<\infty \). Define the bootstrapped test statistic as \(T_n^*=\sqrt{n}\max _{\theta \in {\mathcal {G}}_n}|U_n^*(\theta )|\) where

$$\begin{aligned} U_n^*(\theta )= \sum _{i=1}^{n_1}\sum _{j=1}^{n_2}\left( w^1_i\Delta ^n_iX^1\right) \left( w^2_j\Delta ^n_jX^2\right) K^{ij}_\theta . \end{aligned}$$
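An end-to-end sketch of this procedure under toy assumptions of our own (unit volatilities, uniformly drawn sampling times, equal sample sizes \(n_1=n_2=n\), Gaussian multiplier weights and an arbitrary lag grid; only the overlap kernel \(K^{ij}_\theta \) and the statistics \(T_n\) and \(T_n^*\) follow the definitions above):

```python
import numpy as np

rng = np.random.default_rng(3)
T, n = 1.0, 300
t1 = np.concatenate(([0.0], np.sort(rng.uniform(0.0, T, n))))  # sampling times of X^1
t2 = np.concatenate(([0.0], np.sort(rng.uniform(0.0, T, n))))  # sampling times of X^2
dX1 = rng.normal(0.0, np.sqrt(np.diff(t1)))                    # increments under H0 (rho = 0)
dX2 = rng.normal(0.0, np.sqrt(np.diff(t2)))
grid = np.linspace(-0.1, 0.1, 21)                              # candidate lags G_n
# K^{ij}_theta = 1{(t1_{i-1}, t1_i] and (t2_{j-1}-theta, t2_j-theta] overlap}: two
# half-open intervals (a, b] and (c, d] intersect iff a < d and c < b; the kernel
# depends only on the sampling times, so precompute it for every theta on the grid
Ks = [((t1[:-1, None] < t2[None, 1:] - th) &
       (t2[None, :-1] - th < t1[1:, None])).astype(float) for th in grid]

def T_stat(dx1, dx2):
    return np.sqrt(n) * max(abs(dx1 @ K @ dx2) for K in Ks)

Tn = T_stat(dX1, dX2)
boot = [T_stat(rng.standard_normal(n) * dX1, rng.standard_normal(n) * dX2)
        for _ in range(500)]                                   # wild bootstrap draws of T_n^*
print(Tn, np.quantile(boot, 0.95))   # reject H0 if Tn exceeds the bootstrap 95% quantile
```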

In [35, Proposition B.8], it is shown that

$$\begin{aligned} \sup _{x\in {\mathbb {R}}}\left| P\left( T_n^*\le x\mid X\right) -P\left( \max _{\theta \in {\mathcal {G}}_n}|Z_n(\theta )|\le x\right) \right| \rightarrow ^p0 \end{aligned}$$
(2.8)

as \(n\rightarrow \infty \), provided that \(r_n=O(n^{-3/4-\eta })\) and \(\#{\mathcal {G}}_n=O(n^\gamma )\) for some \(\eta ,\gamma >0\) in addition to the assumptions of Proposition 2.4. Our result allows us to relax the condition on \(r_n\) as follows:

Proposition 2.5

Under the assumptions of Proposition 2.4, we have (2.8) as \(n\rightarrow \infty \), provided that \(r_n=O(n^{-1/2-\eta })\) and \(\#{\mathcal {G}}_n=O(n^\gamma )\) for some \(\eta ,\gamma >0\).

3 Chernozhukov–Chetverikov–Kato Theory

In this section we demonstrate a basic scheme of the CCK theory to establish high-dimensional CLTs. One main ingredient of the CCK theory is the following smooth approximation of the maximum function: For each \(\beta >0\), we define the function \(\Phi _\beta :{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) by

$$\begin{aligned} \Phi _\beta (x)=\beta ^{-1}\log \left( \sum _{j=1}^de^{\beta x_j}\right) ,\qquad x=(x_1,\dots ,x_d)\in {\mathbb {R}}^d. \end{aligned}$$

Eq.(1) in [16] states that

$$\begin{aligned} 0\le \Phi _\beta (x)-\max _{1\le j\le d}x_j\le \beta ^{-1}\log d \end{aligned}$$
(3.1)

for any \(x\in {\mathbb {R}}^d\). Therefore, the larger \(\beta \) is, the better \(\Phi _\beta \) approximates the maximum function. The next lemma, which is a summary of [24, Lemmas 5–6], highlights the key properties of this smooth max function:

Lemma 3.1

For any \(\beta >0\), \(m\in {\mathbb {N}}\) and \(C^m\) function \(h:{\mathbb {R}}\rightarrow {\mathbb {R}}\), there is an \(({\mathbb {R}}^d)^{\otimes m}\)-valued function \(\Upsilon _{\beta }(x)=(\Upsilon ^{j_1,\dots ,j_m}_\beta (x))_{1\le j_1,\dots ,j_m\le d}\) on \({\mathbb {R}}^d\) satisfying the following conditions:

  1. (i)

    For any \(x\in {\mathbb {R}}^d\) and \(j_1,\dots ,j_m\in [d]\), we have \( |\partial _{j_1\dots j_m}(h\circ \Phi _\beta )(x)|\le \Upsilon _\beta ^{j_1,\dots ,j_m}(x). \)

  2. (ii)

    For every \(x\in {\mathbb {R}}^d\), we have

    $$\begin{aligned} \sum _{j_1,\dots ,j_m=1}^d\Upsilon _\beta ^{j_1,\dots , j_m}(x) \le c_{m}\max _{1\le k\le m}\beta ^{m-k}\Vert h^{(k)}\Vert _\infty , \end{aligned}$$

    where \(c_{m}>0\) depends only on m.

  3. (iii)

    For any \(x,t\in {\mathbb {R}}^d\) and \(j_1,\dots ,j_m\in [d]\), we have

    $$\begin{aligned} e^{-8\Vert t\Vert _{\ell _\infty }\beta }\Upsilon _\beta ^{j_1,\dots ,j_m}(x+t)\le \Upsilon _\beta ^{j_1,\dots ,j_m}(x)\le e^{8\Vert t\Vert _{\ell _\infty }\beta }\Upsilon _\beta ^{j_1,\dots ,j_m}(x+t). \end{aligned}$$

Remark 3.1

An explicit expression of the constant \(c_m\) in Lemma 3.1 can be derived from [24, Lemma 5]. In particular, we have \(c_1=1\) and \(c_2=3\).
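The two-sided bound (3.1) is easy to verify numerically via a stable log-sum-exp evaluation of \(\Phi _\beta \); a minimal sketch (all parameter choices ours):

```python
import numpy as np

def smooth_max(x, beta):
    # Phi_beta(x) = beta^{-1} * log(sum_j exp(beta * x_j)), evaluated stably
    m = x.max()
    return m + np.log(np.exp(beta * (x - m)).sum()) / beta

rng = np.random.default_rng(4)
d, beta = 1000, 50.0
x = rng.standard_normal(d)
gap = smooth_max(x, beta) - x.max()
assert 0.0 <= gap <= np.log(d) / beta    # inequality (3.1)
print(gap, np.log(d) / beta)
```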

Another important ingredient of the CCK theory is the so-called anti-concentration inequality. For our purpose, the following one is particularly useful (see [19] for the proof):

Lemma 3.2

(Nazarov’s inequality) If \(\underline{\sigma }:=\min _{1\le j\le d}\Vert Z_j\Vert _2>0\), for any \(x\in {\mathbb {R}}^d\) and \(\varepsilon >0\) we have

$$\begin{aligned} P(Z\le x+\varepsilon )-P(Z\le x)\le \frac{\varepsilon }{\underline{\sigma }}\left( \sqrt{2\log d}+2\right) . \end{aligned}$$
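Nazarov’s inequality can be sanity-checked by simulation; a rough sketch with an equicorrelated Gaussian vector with unit variances, so that \(\underline{\sigma }=1\) (all parameters ours):

```python
import numpy as np

rng = np.random.default_rng(5)
d, eps, rho, reps = 100, 0.05, 0.5, 100000
# Z_j = sqrt(1 - rho) * xi_j + sqrt(rho) * xi_0: equicorrelated, Var(Z_j) = 1
Z = (np.sqrt(1.0 - rho) * rng.standard_normal((reps, d))
     + np.sqrt(rho) * rng.standard_normal((reps, 1)))
x = 1.0
lhs = (Z <= x + eps).all(axis=1).mean() - (Z <= x).all(axis=1).mean()
print(lhs, eps * (np.sqrt(2.0 * np.log(d)) + 2.0))   # lhs stays below the bound
```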

These tools enable us to establish the following form of smoothing inequality:

Proposition 3.1

Let \(g_{0}:{\mathbb {R}}\rightarrow [0,1]\) be a measurable function such that \(g_{0}(t)=1\) for \(t\le 0\) and \(g_{0}(t)=0\) for \(t\ge 1\). Also, let \(\varepsilon >0\) and set \(\beta :=\varepsilon ^{-1}\log d\). Suppose that \(\underline{\sigma }:=\min _{1\le j\le d}\Vert Z_j\Vert _2>0\). Then, for any d-dimensional random vector F, we have

$$\begin{aligned} \sup _{x\in {\mathbb {R}}^d}\left| P(F\le x)-P(Z\le x)\right| \le \Delta _\varepsilon (F,Z)+\frac{2\varepsilon }{\underline{\sigma }}\left( \sqrt{2\log d}+2\right) , \end{aligned}$$
(3.2)

where

$$\begin{aligned} \Delta _\varepsilon (F,Z):=\sup _{y\in {\mathbb {R}}^d}\left| \mathrm {E}[g_0(\varepsilon ^{-1}\Phi _\beta (F-y))]-\mathrm {E}[g_0(\varepsilon ^{-1}\Phi _\beta (Z-y))]\right| . \end{aligned}$$
(3.3)

Proof

This result has been essentially shown in Step 2 in the proof of [18, Lemma 5.1]. \(\square \)

Remark 3.2

Proposition 3.1 can be seen as a special version of more general smoothing inequalities such as [7, Lemma 2.1]. An important feature of bound (3.2) is that the quantity \(\Delta _\varepsilon (F,Z)\) contains only test functions of the form \(x\mapsto g_0(\Phi _\beta (x-y))\) for some \(y\in {\mathbb {R}}^d\). If \(g_0\) is sufficiently smooth, derivatives of such a test function admit good estimates with respect to the dimension d, as seen from Lemma 3.1.

It might be worth mentioning that we can use Proposition 3.1 to derive a bound for the Kolmogorov distance by the Wasserstein distance. Let us recall the definition of the Wasserstein distance.

Definition 3.1

(Wasserstein distance) For d-dimensional random vectors F and G with integrable components, the Wasserstein distance between the laws of F and G is defined by

$$\begin{aligned} {\mathcal {W}}_1(F,G):=\sup _{h\in \mathcal {H}}|\mathrm {E}[h(F)]-\mathrm {E}[h(G)]|, \end{aligned}$$

where \(\mathcal {H}\) denotes the set of all functions \(h:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) such that

$$\begin{aligned} \Vert h\Vert _{\mathrm {Lip}}:=\sup _{x,y\in {\mathbb {R}}^d:x\ne y}\frac{|h(x)-h(y)|}{\Vert x-y\Vert }\le 1. \end{aligned}$$

Here, \(\Vert \cdot \Vert \) is the usual Euclidean norm on \({\mathbb {R}}^d\).

Corollary 3.1

Under the assumptions of Proposition 3.1, we have

$$\begin{aligned} \sup _{x\in {\mathbb {R}}^d}\left| P(F\le x)-P(Z\le x)\right| \le \sqrt{\frac{2\left( \sqrt{2\log d}+2\right) }{\underline{\sigma }}{\mathcal {W}}_1(F,Z)}. \end{aligned}$$

Proof

It suffices to consider the case \({\mathcal {W}}_1(F,Z)>0\). Let us define the function \(g_0:{\mathbb {R}}\rightarrow [0,1]\) by \(g_0(x)=\min \{1,\max \{1-x,0\}\}\), \(x\in {\mathbb {R}}\). Then, for any \(x,x',y\in {\mathbb {R}}^d\) and \(\varepsilon >0\), we have \(|g_0(\varepsilon ^{-1}\Phi _\beta (x-y))-g_0(\varepsilon ^{-1}\Phi _\beta (x'-y))|\le \varepsilon ^{-1}\Vert x-x'\Vert _{\ell _\infty }\) by [13, Lemma A.3], so we obtain \(\Delta _\varepsilon (F,Z)\le \varepsilon ^{-1}{\mathcal {W}}_1(F,Z)\). Now, setting \(\varepsilon =\sqrt{\underline{\sigma }{\mathcal {W}}_1(F,Z)/(2\sqrt{2\log d}+4)}\), we infer the desired result from Proposition 3.1. \(\square \)

When \(d=1\), Corollary 3.1 recovers the standard estimate (cf. Eq.(C.2.6) in [44]). We shall remark that a bound similar to the above (with a slightly different constant) has already appeared in [4, Theorem 3.1].

Remark 3.3

It is generally impossible to derive (1.2)-type bounds from the corresponding ones for the Wasserstein distance. To see this, let \(F=(F_1,\dots ,F_d)\) be a d-dimensional random vector such that the laws of \(F_1,\dots ,F_d\) are identical (and integrable). Also, let \(G=(G_1,\dots ,G_d)\) be another d-dimensional random vector satisfying the same condition. Then, we can easily verify \({\mathcal {W}}_1(F,G)\ge \sqrt{d}\,{\mathcal {W}}_1(F_1,G_1)\) by definition: it suffices to consider test functions of the form \(h(x)=d^{-1/2}\sum _{j=1}^dh_1(x_j)\) with \(h_1:{\mathbb {R}}\rightarrow {\mathbb {R}}\) 1-Lipschitz.

4 Stein Kernels and High-Dimensional CLTs

In the rest of the paper, we fix a \(C^\infty \) function \(g_0:{\mathbb {R}}\rightarrow [0,1]\) such that \(g_0(t)=1\) for \(t\le 0\) and \(g_0(t)=0\) for \(t\ge 1\). For example, we can take \(g_0(t)=f_0(1-t)/\{f_0(t)+f_0(1-t)\}\), where the function \(f_0:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is defined by \(f_0(t)=e^{-1/t}\) if \(t>0\) and \(f_0(t)=0\) otherwise.

To make Proposition 3.1 useful, we need to obtain a “good” upper bound for the quantity \(\Delta _\varepsilon (F,Z)\). As briefly mentioned in Remark 2.4, Chernozhukov et al. [13] have pointed out that Stein’s method effectively solves this task. Moreover, discussions in [16, 35] implicitly suggest that the CCK theory would have a nice connection to Stein kernels. In this section, we illustrate this idea.

Definition 4.1

(Stein kernel) Let \(F=(F_1,\dots ,F_d)\) be a centered d-dimensional random vector. A \(d\times d\) matrix-valued measurable function \(\tau _F=(\tau _F^{ij})_{1\le i,j\le d}\) on \({\mathbb {R}}^d\) is called a Stein kernel for (the law of) F if \(\max _{1\le i,j\le d}\mathrm {E}[|\tau _F^{ij}(F)|]<\infty \) and

$$\begin{aligned} \sum _{j=1}^d\mathrm {E}[\partial _j\varphi (F)F_j]=\sum _{i,j=1}^d\mathrm {E}[\partial _{ij}\varphi (F)\tau _F^{ij}(F)] \end{aligned}$$
(4.1)

for any \(\varphi \in C^\infty _b({\mathbb {R}}^d)\).
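For orientation, in the Gaussian case a constant kernel works: the standard Gaussian integration-by-parts formula gives, for \(Z\sim {\mathcal {N}}_d(0,\mathfrak {C})\) and any \(\varphi \in C^\infty _b({\mathbb {R}}^d)\),

$$\begin{aligned} \sum _{j=1}^d\mathrm {E}[\partial _j\varphi (Z)Z_j]=\sum _{i,j=1}^d{\mathfrak {C}}_{ij}\mathrm {E}[\partial _{ij}\varphi (Z)], \end{aligned}$$

so \(\tau _Z\equiv \mathfrak {C}\) is a Stein kernel for Z. The quantity \(\Delta \) in Lemma 4.1 below thus measures how far a Stein kernel of F deviates from this constant kernel.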

Remark 4.1

In this paper, we adopt \(C^\infty _b({\mathbb {R}}^d)\) as the class of test functions for which identity (4.1) holds true because of convenience, but other classes are also used in the literature; see [20] for instance.

Lemma 4.1

Let \(F=(F_1,\dots ,F_d)\) be a centered d-dimensional random vector. Also, let \(\tau _F=(\tau _F^{ij})_{1\le i,j\le d}\) be a Stein kernel for F. Then, we have

$$\begin{aligned} \sup _{y\in {\mathbb {R}}^d}\left| {{\,\mathrm{E}\,}}\left[ h\left( \Phi _\beta (F-y)\right) \right] -{{\,\mathrm{E}\,}}\left[ h\left( \Phi _\beta (Z-y)\right) \right] \right| \le \frac{3}{2}\max \{\Vert h''\Vert _\infty ,\beta \Vert h'\Vert _\infty \}\Delta \end{aligned}$$

for any \(\beta >0\) and \(h\in C^\infty _b({\mathbb {R}})\), where

$$\begin{aligned} \Delta :={{\,\mathrm{E}\,}}\left[ \max _{1\le i,j\le d}|\tau _F^{ij}(F)-{\mathfrak {C}}_{ij}|\right] . \end{aligned}$$

Proof

The proof is essentially the same as that of [16, Theorem 1] or [35, Proposition 2.1], so we omit it. \(\square \)

Proposition 4.1

Suppose that \(d\ge 2\) and \(\underline{\sigma }:=\min _{1\le j\le d}\Vert Z_j\Vert _2>0\). Under the assumptions of Lemma 4.1, there is a universal constant \(C>0\) such that

$$\begin{aligned} \sup _{x\in {\mathbb {R}}^d}\left| P(F\le x)-P(Z\le x)\right| \le C(1+\underline{\sigma }^{-1})\Delta ^{1/3}(\log d)^{2/3}. \end{aligned}$$
(4.2)

Proof

Thanks to [44, Lemma 4.1.3], it suffices to consider the case \(\Delta >0\). By Lemma 4.1, for any \(\varepsilon >0\) we have \(\Delta _\varepsilon (F,Z)\le C'\varepsilon ^{-2}(\log d)\Delta \), where \(C'>0\) is a universal constant. Therefore, Proposition 3.1 yields

$$\begin{aligned} \sup _{x\in {\mathbb {R}}^d}\left| P(F\le x)-P(Z\le x)\right| \le C'\varepsilon ^{-2}(\log d)\Delta +\frac{2\varepsilon }{\underline{\sigma }}\left( \sqrt{2\log d}+2\right) . \end{aligned}$$

Now, setting \(\varepsilon =\Delta ^{1/3}(\log d)^{1/6}\), we obtain the desired result. \(\square \)

5 A High-Dimensional CLT for Normal-Gamma Homogeneous Sums

In view of the results in Sect. 4, we naturally seek a situation where a vector of homogeneous sums has a Stein kernel. This is the case when all the components are eigenfunctions of a Markov diffusion operator (cf. Proposition 5.1 in [39]). Moreover, as clarified in [1, 9, 38], only a certain spectral property of the Markov diffusion operator is essential for deriving a fourth-moment-type bound for the variance of the corresponding Stein kernel. This spectral property is satisfied in particular when each \(X_i\) is either a Gaussian or a (standardized) gamma variable, so this section focuses on such a situation and derives a high-dimensional CLT for this special case.

For each \(\nu >0\), we denote by \(\gamma _\pm (\nu )\) the distribution of the random variable \(\pm (X-\nu )/\sqrt{\nu }\) with \(X\sim \gamma (\nu )\). Also, for every \(q\in {\mathbb {N}}\) we set \( \mathfrak {c}_q:=\sum _{r=1}^qr!\binom{q}{r}^2. \)

Proposition 5.1

Let us keep the same notation as in Theorem 2.1 and assume \(d\ge 2\). Let \(\varvec{Y}=(Y_i)_{i=1}^N\) be a sequence of independent random variables such that the law of \(Y_i\) belongs to \(\{{\mathcal {N}}(0,1)\}\cup \{\gamma _+(\nu ):\nu>0\}\cup \{\gamma _-(\nu ):\nu >0\}\) for all i. For every i, define the constants \(v_i\) and \(\eta _i\) by

$$\begin{aligned} v_i:= \left\{ \begin{array}{ll} 2 &{} \text {if }Y_i\sim {\mathcal {N}}(0,1), \\ 2(1+\nu ^{-1}) &{} \text {if }Y_i\sim \gamma _\pm (\nu ), \end{array} \right. \qquad \eta _i:= \left\{ \begin{array}{ll} 1 &{} \text {if }Y_i\sim {\mathcal {N}}(0,1), \\ 1\wedge \sqrt{\nu } &{} \text {if }Y_i\sim \gamma _\pm (\nu ). \end{array} \right. \end{aligned}$$

We also set \(w_*=1/2\) if \(Y_i\sim {\mathcal {N}}(0,1)\) for all i and \(w_*=1\) otherwise. Then, \(\kappa _4(Q(f_j;\varvec{Y}))\ge 0\) for all j and

$$\begin{aligned}&\sup _{y\in {\mathbb {R}}^d}\left| \mathrm {E}[h\left( \Phi _\beta (\varvec{Q}(\varvec{Y})-y)\right) ]-\mathrm {E}[h\left( \Phi _\beta (Z-y)\right) ]\right| \nonumber \\&\quad \le \frac{3}{2}\max \{\Vert h''\Vert _\infty ,\beta \Vert h'\Vert _\infty \}\left( \delta _0[\varvec{Q}(\varvec{Y})]+C\delta _2[\varvec{Q}(\varvec{Y})]\right) \end{aligned}$$
(5.1)

for any \(\beta >0\) and \(h\in C^\infty _b({\mathbb {R}})\), where \(C>0\) depends only on \(\overline{q}_d\) and

$$\begin{aligned} \delta _2[\varvec{Q}(\varvec{Y})]&:=\max _{1\le j,k\le d}\{\underline{\eta }_N^{-1}(\log d)\}^{w_*(q_j+q_k)-1}\left\{ 1_{\{q_j< q_k\}}\Vert Q(f_j;\varvec{Y})\Vert _4\kappa _4(Q(f_k;\varvec{Y}))^{1/4}\right. \\ &\qquad \left. +1_{\{q_j=q_k\}}\sqrt{2\kappa _4(Q(f_j;\varvec{Y}))+\left( 2^{-q_j}\overline{v}_N^{q_j}-1\right) (2q_j)!\mathfrak {c}_{q_j}\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(f_j)^2}\right\} \end{aligned}$$

with \(\overline{v}_N:=\max _{1\le i\le N}v_i\) and \(\underline{\eta }_N:=\min _{1\le i\le N}\eta _i\).

The rest of this section is devoted to the proof of Proposition 5.1. In the remainder of this section, we assume that the probability space \((\Omega ,{\mathcal {F}},P)\) is given by the product probability space \((\prod _{i=1}^N\Omega _i,\bigotimes _{i=1}^N{\mathcal {F}}_i,\bigotimes _{i=1}^NP_i)\), where

$$\begin{aligned} (\Omega _i,{\mathcal {F}}_i,P_i):= \left\{ \begin{array}{ll} ({\mathbb {R}},{\mathcal {B}}({\mathbb {R}}),{\mathcal {N}}(0,1)) &{} \text {if }Y_i\sim {\mathcal {N}}(0,1), \\ ((0,\infty ),{\mathcal {B}}((0,\infty )),\gamma (\nu )) &{} \text {if }Y_i\sim \gamma _\pm (\nu ). \end{array} \right. \end{aligned}$$

Then, we realize the variables \(Y_1,\dots ,Y_N\) as follows: For \(\omega =(\omega _1,\dots ,\omega _N)\in \Omega \), we define

$$\begin{aligned} Y_i(\omega ):= \left\{ \begin{array}{ll} \omega _i &{} \text {if }Y_i\sim {\mathcal {N}}(0,1), \\ \pm (\omega _i-\nu )/\sqrt{\nu } &{} \text {if }Y_i\sim \gamma _\pm (\nu ). \end{array} \right. \end{aligned}$$

5.1 \(\Gamma \)-Calculus

Our first aim is to construct a suitable Markov diffusion operator whose eigenspaces contain all the components of \(\varvec{Q}(\varvec{Y})\). In the following, for an open subset U of \({\mathbb {R}}^m\), we write \(C^\infty _p(U)\) for the set of all real-valued \(C^\infty \) functions on U all of whose partial derivatives have at most polynomial growth.

First, we denote by \({{\,\mathrm{L}\,}}_\text {OU}\) the Ornstein–Uhlenbeck operator on \({\mathbb {R}}\). Next, for every \(\nu >0\), we write \({{\,\mathrm{L}\,}}_\nu \) for the Laguerre operator on \((0,\infty )\) with parameter \(\nu \). We then define the operators \({\mathcal {L}}_1,\dots ,{\mathcal {L}}_N\) by

$$\begin{aligned} {\mathcal {L}}_i:= \left\{ \begin{array}{ll} {{\,\mathrm{L}\,}}_\text {OU} &{} \text {if }Y_i\sim {\mathcal {N}}(0,1), \\ {{\,\mathrm{L}\,}}_\nu &{} \text {if }Y_i\sim \gamma _\pm (\nu ). \end{array} \right. \end{aligned}$$
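For concreteness, we recall that in the standard normalization these generators act on smooth functions by

$$\begin{aligned} {{\,\mathrm{L}\,}}_\text {OU}f(x)=f''(x)-xf'(x),\qquad {{\,\mathrm{L}\,}}_\nu f(x)=xf''(x)+(\nu -x)f'(x), \end{aligned}$$

so that \({\mathcal {N}}(0,1)\) and \(\gamma (\nu )\) are the respective invariant measures.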

Finally, we construct the densely defined symmetric operator \({{\,\mathrm{L}\,}}\) in \(L^2(P)\) by tensorization of \({\mathcal {L}}_1,\dots ,{\mathcal {L}}_N\) (see Section 2.2 of [2] for details). We will use the following properties of \({{\,\mathrm{L}\,}}\) (cf. [1] and Section 2.2 of [2]):

  1. (i)

    If F and G are eigenfunctions of \(-{{\,\mathrm{L}\,}}\) associated with eigenvalues p and q, respectively, then FG belongs to \(\bigoplus _{k=0}^{p+q}{{\,\mathrm{Ker}\,}}({{\,\mathrm{L}\,}}+k{{\,\mathrm{Id}\,}})\).

  2. (ii)

    The eigenspaces of \(-{{\,\mathrm{L}\,}}_\text {OU}\) and \(-{{\,\mathrm{L}\,}}_\nu \) associated with eigenvalue \(k\in {\mathbb {Z}}_+\) are given by \({{\,\mathrm{Ker}\,}}({{\,\mathrm{L}\,}}_\text {OU}+k{{\,\mathrm{Id}\,}})=\{aH_k:a\in {\mathbb {R}}\}\) and \({{\,\mathrm{Ker}\,}}({{\,\mathrm{L}\,}}_\nu +k{{\,\mathrm{Id}\,}})=\{aL^{(\nu -1)}_k:a\in {\mathbb {R}}\}\), respectively. Here, \(H_k\) and \(L_k^{(\alpha )}\) denote the Hermite polynomial of degree k and the Laguerre polynomial of degree k and parameter \(\alpha >-1\), respectively.

  3. (iii)

    The eigenspace of \(-{{\,\mathrm{L}\,}}\) associated with eigenvalue k is given by

    $$\begin{aligned} {{\,\mathrm{Ker}\,}}({{\,\mathrm{L}\,}}+k{{\,\mathrm{Id}\,}})=\bigoplus _{\begin{array}{c} k_1+\cdots +k_N=k\\ k_1,\dots ,k_N\in {\mathbb {Z}}_+ \end{array}} {{\,\mathrm{Ker}\,}}({\mathcal {L}}_1+k_1{{\,\mathrm{Id}\,}})\otimes \cdots \otimes {{\,\mathrm{Ker}\,}}({\mathcal {L}}_N+k_N{{\,\mathrm{Id}\,}}).\nonumber \\ \end{aligned}$$
    (5.2)
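These properties already yield the structure announced at the beginning of this subsection. Indeed, since \(H_1(y)=y\) and \(L_1^{(\nu -1)}(x)=\nu -x\), each \(Y_i\) is a scalar multiple of a first-degree Hermite or Laguerre polynomial in \(\omega _i\) and hence belongs to \({{\,\mathrm{Ker}\,}}({\mathcal {L}}_i+{{\,\mathrm{Id}\,}})\). By (5.2), for mutually distinct indices \(i_1,\dots ,i_q\) the monomial \(Y_{i_1}\cdots Y_{i_q}\) then lies in \({{\,\mathrm{Ker}\,}}({{\,\mathrm{L}\,}}+q{{\,\mathrm{Id}\,}})\), and consequently

$$\begin{aligned} Q(f_j;\varvec{Y})\in {{\,\mathrm{Ker}\,}}({{\,\mathrm{L}\,}}+q_j{{\,\mathrm{Id}\,}})\qquad \text {for every }j=1,\dots ,d. \end{aligned}$$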

Let us write \({\mathcal {S}}=C^\infty _p(\Omega )\). We define the carré du champ operator of \({{\,\mathrm{L}\,}}\) by

$$\begin{aligned} \Gamma (F,G)=\frac{1}{2}\left( {{\,\mathrm{L}\,}}(FG)-F{{\,\mathrm{L}\,}}G-G{{\,\mathrm{L}\,}}F\right) \end{aligned}$$

for all \(F,G\in {\mathcal {S}}\).
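Writing \(\Gamma _\text {OU}\) and \(\Gamma _\nu \) for the carré du champ operators of \({{\,\mathrm{L}\,}}_\text {OU}\) and \({{\,\mathrm{L}\,}}_\nu \), respectively, a direct computation from the explicit generators gives

$$\begin{aligned} \Gamma _\text {OU}(f,g)(x)=f'(x)g'(x),\qquad \Gamma _\nu (f,g)(x)=xf'(x)g'(x), \end{aligned}$$

and \(\Gamma \) acts on functions of \(\omega =(\omega _1,\dots ,\omega _N)\) by summing these one-dimensional expressions over the coordinates. The following lemma is a special case of [39, Proposition 5.1].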

Lemma 5.1

For every \((i,j)\in [d]^2\), define the function \(\tau ^{ij}:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) by

$$\begin{aligned} \tau ^{ij}(x)=\frac{1}{q_j}\mathrm {E}[\Gamma (Q(f_i;\varvec{Y}),Q(f_j;\varvec{Y}))\mid \varvec{Q}(\varvec{Y})=x],\qquad x\in {\mathbb {R}}^d. \end{aligned}$$

Then, \(\tau =(\tau ^{ij})_{1\le i,j\le d}\) is a Stein kernel for \(\varvec{Q}(\varvec{Y})\).

We refer to [5] for more details about these operators.

5.2 A Bound for the Variance of the Carré du Champ Operator

In view of Lemmas 4.1 and 5.1, we obtain (5.1) once we show that

$$\begin{aligned} {{\,\mathrm{E}\,}}\left[ \max _{1\le j,k\le d}\left| \frac{1}{q_k}\Gamma \left( Q(f_j;\varvec{Y}),Q(f_k;\varvec{Y})\right) -\mathfrak {C}_{jk}\right| \right] \le \delta _0[\varvec{Q}(\varvec{Y})]+C\delta _2[\varvec{Q}(\varvec{Y})], \end{aligned}$$
(5.3)

where \(C>0\) depends only on \(\overline{q}_d\). As a first step, we estimate \({{\,\mathrm{Var}\,}}[\Gamma ( Q(f_j;\varvec{Y}),Q(f_k;\varvec{Y}))]\) for every \((j,k)\in [d]^2\). More precisely, our aim here is to prove the following result:

Proposition 5.2

Let \(p\le q\) be two positive integers. Let \(f:[N]^p\rightarrow {\mathbb {R}}\) and \(g:[N]^q\rightarrow {\mathbb {R}}\) be symmetric functions vanishing on diagonals and set \(F:=Q(f;\varvec{Y})\) and \(G:=Q(g;\varvec{Y})\). Then, \(\kappa _4(F)\ge 0\), \(\kappa _4(G)\ge 0\) and

$$\begin{aligned}&{{\,\mathrm{Var}\,}}\left[ \frac{1}{q}\Gamma (F,G)\right] \nonumber \\&\quad \le 1_{\{p<q\}}\sqrt{\mathrm {E}[F^4]}\sqrt{\kappa _4(G)}\nonumber \\&\quad \quad +1_{\{p=q\}}\left\{ 2\sqrt{\kappa _4(F)}\sqrt{\kappa _4(G)} +\left( 2^{-p}\overline{v}_N^p-1\right) (2p)!\mathfrak {c}_p \sqrt{\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(f)^2} \sqrt{\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(g)^2} \right\} .\nonumber \\ \end{aligned}$$
(5.4)

Before starting the proof, let us remark on how this result relates to preceding studies. When \(f=g\), Azmoodeh et al. [1] have derived a better estimate than (5.4) in a more general setting. Their proof technique also applies to the case \(f\ne g\), and this has been implemented in Campese et al. [9]. However, it leads to a bound containing the quantity \({{\,\mathrm{Cov}\,}}[F^2,G^2]-2{{\,\mathrm{E}\,}}\left[ FG\right] ^2\), which would require an additional argument to estimate. For this reason, we take an alternative route, inspired by the discussions in Zheng [59] as well as [9, Proposition 3.6]. As a byproduct of this strategy, we obtain inequality (5.11), which leads to the universality of gamma variables.

We begin by introducing some notation. We write \(J_k\) for the orthogonal projection of \(L^2(P)\) onto the eigenspace \({{\,\mathrm{Ker}\,}}({{\,\mathrm{L}\,}}+k{{\,\mathrm{Id}\,}})\). For every i, we define the random variable \({\mathfrak {p}}_{2}(Y_i)\) by

$$\begin{aligned} {\mathfrak {p}}_2(Y_i):= \left\{ \begin{array}{ll} H_2(Y_i) &{} \text {if }Y_i\sim {\mathcal {N}}(0,1), \\ \frac{2}{\nu }L_2^{(\nu -1)}(\nu \pm \sqrt{\nu }Y_i) &{} \text {if }Y_i\sim \gamma _\pm (\nu ). \end{array} \right. \end{aligned}$$

(Note that \(\nu \pm \sqrt{\nu }Y_i\) is nothing but the underlying gamma variable \(\omega _i\).)

The following lemma can be proved by a straightforward computation.

Lemma 5.2

For every i, \(\mathrm {E}[{\mathfrak {p}}_2(Y_i)]=\mathrm {E}[Y_i{\mathfrak {p}}_2(Y_i)]=0\) and \(\mathrm {E}[\mathfrak {p}_2(Y_i)^2]=v_i\).
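For instance, in the Gaussian case, where \({\mathfrak {p}}_2(Y_i)=H_2(Y_i)=Y_i^2-1\) in the probabilists' normalization, the claims reduce to

$$\begin{aligned} \mathrm {E}[Y_i^2-1]=0,\qquad \mathrm {E}[Y_i(Y_i^2-1)]=\mathrm {E}[Y_i^3]=0,\qquad \mathrm {E}[(Y_i^2-1)^2]=3-2+1=2=v_i. \end{aligned}$$

The gamma case follows from a similar direct computation with \(L_2^{(\nu -1)}\), using the moment identities for \(\gamma _\pm (\nu )\) recorded at the beginning of this section.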

Next, given functions \(h,h':[N]^r\rightarrow {\mathbb {R}}\), we define

$$\begin{aligned} \langle h,h'\rangle :=\sum _{i_1,\dots ,i_r=1}^Nh(i_1,\dots ,i_r)h'(i_1,\dots ,i_r). \end{aligned}$$

Note that \(\Vert h\Vert _{\ell _2}^2=\langle h,h\rangle \). For every \(r\in \{0,1,\dots ,p\wedge q\}\), we define the function \(f\mathbin {\widehat{\star _{r}^0}}g:[N]^{p+q-r}\rightarrow {\mathbb {R}}\) by

$$\begin{aligned}&f\mathbin {\widehat{\star _{r}^0}}g(i_1,\dots ,i_{p+q-2r},k_1,\dots ,k_r)\\&\quad :=\frac{1}{(p+q-2r)!}\sum _{\sigma \in {\mathfrak {S}}_{p+q-2r}}f(i_{\sigma (1)},\dots ,i_{\sigma (p-r)},k_1,\dots ,k_{r})\\&\quad \qquad \times g(i_{\sigma (p-r+1)},\dots ,i_{\sigma (p+q-2r)},k_1,\dots ,k_r). \end{aligned}$$

Note that we have

$$\begin{aligned} \widetilde{f\star _rg}(i_1,\dots ,i_{p+q-2r})=\sum _{(k_1,\dots ,k_r)\in \Delta _r^N}f\mathbin {\widehat{\star _{r}^0}}g(i_1,\dots ,i_{p+q-2r},k_1,\dots ,k_r), \end{aligned}$$
(5.5)

where \(\widetilde{f\star _rg}\) is the symmetrization of \(f\star _rg\) (recall (2.3)) and, for every \(q\in {\mathbb {N}}\), \(\Delta _q^N:=\{(i_1,\dots ,i_q)\in [N]^q:i_j\ne i_k\text { if }j\ne k\}\).
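To illustrate this notation in the simplest case \(p=q=1\): for \(f,g:[N]\rightarrow {\mathbb {R}}\) we have

$$\begin{aligned} f\mathbin {\widehat{\star _{0}^0}}g(i_1,i_2)=\frac{1}{2}\left( f(i_1)g(i_2)+f(i_2)g(i_1)\right) =f\mathbin {{\widetilde{\otimes }}}g(i_1,i_2),\qquad f\mathbin {\widehat{\star _{1}^0}}g(k_1)=f(k_1)g(k_1), \end{aligned}$$

where \(f\mathbin {{\widetilde{\otimes }}}g\) denotes the symmetrization of \(f\otimes g\), and (5.5) with \(r=1\) reduces to \(\widetilde{f\star _1g}=\sum _{k=1}^Nf(k)g(k)=\langle f,g\rangle \).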

The next lemma is a key ingredient in our proof.

Lemma 5.3

In the setting of Proposition 5.2, we have

$$\begin{aligned} \mathrm {E}[J_{p+q}(FG)^2]\ge (p+q)!\Vert f\mathbin {{\widetilde{\otimes }}}g\Vert _{\ell _2}^2. \end{aligned}$$
(5.6)

Moreover, if \(p=q\), we have

$$\begin{aligned}&\left| \mathrm {E}[J_{2p}(F^2)J_{2p}(G^2)]-(2p)!\langle f\mathbin {{\widetilde{\otimes }}}f,g\mathbin {{\widetilde{\otimes }}}g\rangle \right| \nonumber \\&\quad \le \left( 2^{-p}\overline{v}_N^p-1\right) (2p)!\mathfrak {c}_p \sqrt{\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(f)^2} \sqrt{\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(g)^2}. \end{aligned}$$
(5.7)

Proof

From [48, Proposition 2.9] and (5.5), we can deduce that

$$\begin{aligned} J_{p+q}(FG)= & {} \sum _{r=0}^{p\wedge q}r!\left( {\begin{array}{c}p\\ r\end{array}}\right) \left( {\begin{array}{c}q\\ r\end{array}}\right) \sum _{(i_1,\dots ,i_{p+q-r})\in \Delta ^N_{p+q-r}}f\mathbin {\widehat{\star _{r}^0}}g(i_{1},\dots ,i_{p+q-r})Y_{i_1}\nonumber \\&\cdots Y_{i_{p+q-2r}}{\mathfrak {p}}_2(Y_{i_{p+q-2r+1}}) \cdots {\mathfrak {p}}_2(Y_{i_{p+q-r}}). \end{aligned}$$
(5.8)

Next, let \((i_1,\dots ,i_{p+q-r})\in \Delta ^N_{p+q-r}\) and \((j_1,\dots ,j_{p+q-l})\in \Delta ^N_{p+q-l}\). Thanks to Lemma 5.2,

$$\begin{aligned}&\mathrm {E}[Y_{i_1}\cdots Y_{i_{p+q-2r}}{\mathfrak {p}}_2(Y_{i_{p+q-2r+1}})\cdots {\mathfrak {p}}_2(Y_{i_{p+q-r}})Y_{j_1}\\&\qquad \cdots Y_{j_{p+q-2l}}{\mathfrak {p}}_2(Y_{j_{p+q-2l+1}})\cdots {\mathfrak {p}}_2(Y_{j_{p+q-l}})] \end{aligned}$$

does not vanish if and only if the following condition is satisfied:

  • (\(\star \)) \((i_1,\dots ,i_{p+q-2r})\) is a permutation of \((j_1,\dots ,j_{p+q-2l})\) and \((i_{p+q-2r+1},\dots ,i_{p+q-r})\) is a permutation of \((j_{p+q-2l+1},\dots ,j_{p+q-l})\).

Note that condition (\(\star \)) can hold only if \(r=l\), since the tuples in question must have equal lengths. Moreover, if condition (\(\star \)) is satisfied, we have

$$\begin{aligned}&\mathrm {E}[Y_{i_1}\cdots Y_{i_{p+q-2r}}{\mathfrak {p}}_2(Y_{i_{p+q-2r+1}})\cdots {\mathfrak {p}}_2(Y_{i_{p+q-r}})Y_{j_1}\\&\quad \cdots Y_{j_{p+q-2l}}{\mathfrak {p}}_2(Y_{j_{p+q-2l+1}})\cdots {\mathfrak {p}}_2(Y_{j_{p+q-l}})] =v_{i_{p+q-2r+1}}\cdots v_{i_{p+q-r}} \end{aligned}$$

by Lemma 5.2. Since there are \((p+q-2r)!\) permutations of \((i_1,\dots ,i_{p+q-2r})\) and \(r!\) permutations of \((i_{p+q-2r+1},\dots ,i_{p+q-r})\), (5.8) yields

$$\begin{aligned}&\mathrm {E}[J_{p+q}(FG)^2]\nonumber \\&\quad =\sum _{r=0}^{p\wedge q}r!^2\left( {\begin{array}{c}p\\ r\end{array}}\right) ^2\left( {\begin{array}{c}q\\ r\end{array}}\right) ^2(p+q-2r)!r!\nonumber \\&\quad \times \sum _{(i_1,\dots ,i_{p+q-r})\in \Delta ^N_{p+q-r}}f\mathbin {\widehat{\star _{r}^0}}g(i_{1},\dots ,i_{p+q-r})^2v_{i_{p+q-2r+1}}\cdots v_{i_{p+q-r}}. \end{aligned}$$
(5.9)

Now, (5.9) holds in particular when all the \(Y_i\) follow the standard normal distribution, in which case \(v_i=2\) for every i, so the weight \(v_{i_{p+q-2r+1}}\cdots v_{i_{p+q-r}}\) reduces to \(2^r\). Therefore, the product formula for multiple integrals with respect to an isonormal Gaussian process yields

$$\begin{aligned} (p+q)!\Vert f\mathbin {{\widetilde{\otimes }}}g\Vert _{\ell _2}^2=&{} \sum _{r=0}^{p\wedge q}r!^2\left( {\begin{array}{c}p\\ r\end{array}}\right) ^2\left( {\begin{array}{c}q\\ r\end{array}}\right) ^2(p+q-2r)!r!\\&\times \sum _{(i_1,\dots ,i_{p+q-r})\in \Delta ^N_{p+q-r}}f\mathbin {\widehat{\star _{r}^0}}g(i_{1},\dots ,i_{p+q-r})^2\cdot 2^r, \end{aligned}$$

where \(f\mathbin {{\widetilde{\otimes }}}g\) is the symmetrization of \(f\otimes g\). Combining this formula with (5.9), we obtain (5.6).

Next, we prove (5.7). Similar arguments to the above yield

$$\begin{aligned}&\left| \mathrm {E}[J_{2p}(F^2)J_{2p}(G^2)]-(2p)!\langle f\mathbin {{\widetilde{\otimes }}}f,g\mathbin {{\widetilde{\otimes }}}g\rangle \right| \\&\quad \le \left( 2^{-p}\overline{v}_N^p-1\right) \sum _{r=1}^{p}r!^2\left( {\begin{array}{c}p\\ r\end{array}}\right) ^4(2p-2r)!r!\\&\quad \times \sum _{(i_1,\dots ,i_{2p-r})\in \Delta ^N_{2p-r}}\left| f\mathbin {\widehat{\star _{r}^0}}f(i_{1},\dots ,i_{2p-r})g\mathbin {\widehat{\star _{r}^0}}g(i_{1},\dots ,i_{2p-r})\right| \cdot 2^r \end{aligned}$$

and

$$\begin{aligned}&\sum _{r=1}^{p}r!^2\left( {\begin{array}{c}p\\ r\end{array}}\right) ^4(2p-2r)!r!\sum _{(i_1,\dots ,i_{2p-r})\in \Delta ^N_{2p-r}}f\mathbin {\widehat{\star _{r}^0}}f(i_{1},\\&\qquad \dots ,i_{2p-r})g\mathbin {\widehat{\star _{r}^0}}g(i_{1},\dots ,i_{2p-r})\cdot 2^r\\&\quad =(2p)!\left\langle f\mathbin {{\widetilde{\otimes }}}f,g\mathbin {{\widetilde{\otimes }}}g1_{(\Delta ^N_{2p})^c}\right\rangle . \end{aligned}$$

Combining these results with the Schwarz inequality, we infer that

$$\begin{aligned}&\left| \mathrm {E}[J_{2p}(F^2)J_{2p}(G^2)]-(2p)!\langle f\mathbin {{\widetilde{\otimes }}}f,g\mathbin {{\widetilde{\otimes }}}g\rangle \right| \\&\quad \le \left( 2^{-p}\overline{v}_N^p-1\right) \sqrt{(2p)!\Vert f\mathbin {{\widetilde{\otimes }}}f1_{(\Delta ^N_{2p})^c}\Vert ^2_{\ell _2}} \sqrt{(2p)!\Vert g\mathbin {{\widetilde{\otimes }}}g1_{(\Delta ^N_{2p})^c}\Vert ^2_{\ell _2}}. \end{aligned}$$

The desired result now follows from Eq. (50) in [27] and Hölder’s inequality. \(\square \)

Lemma 5.4

In the setting of Proposition 5.2, we have

$$\begin{aligned} \sum _{k=1}^{p+q-1}\mathrm {E}[J_k(FG)^2]\le {{\,\mathrm{Cov}\,}}[F^2,G^2]-2\mathrm {E}[FG]^2 \end{aligned}$$
(5.10)

and

$$\begin{aligned} \sum _{k=1}^{2p-1}\mathrm {E}[J_k(F^2)^2]+p!^2\sum _{r=1}^{p-1}\left( {\begin{array}{c}p\\ r\end{array}}\right) ^2\Vert f\star _rf\Vert _{\ell _2}^2\le \mathrm {E}[F^4]-3\mathrm {E}[F^2]^2. \end{aligned}$$
(5.11)

Proof

The proof parallels that of [59, Lemma 3.1], using Lemma 5.3 instead of [59, Lemma 2.1]. \(\square \)

Proof of Proposition 5.2

The nonnegativity of \(\kappa _4(F)\) and \(\kappa _4(G)\) follows from (5.11). Inequality (5.4) can be shown in a similar manner to the proof of [59, Theorem 1.1], using Lemmas 5.3 and 5.4 instead of Lemmas 2.1 and 3.1 in [59], respectively. \(\square \)

5.3 Proof of Proposition 5.1

We have already established the nonnegativity of the \(\kappa _4(Q(f_j;\varvec{Y}))\) in Proposition 5.2. The remaining claim follows once we prove (5.3), which, by Proposition 5.2 and Lemma A.2, reduces to showing that, under the assumptions of Proposition 5.1,

$$\begin{aligned} \Vert \Gamma (F,G)-\mathrm {E}[\Gamma (F,G)]\Vert _{\psi _{(w_*(p+q)-1)^{-1}}}\le C_{p,q}\underline{\eta }_N^{-(w_*(p+q)-1)}\sqrt{{{\,\mathrm{Var}\,}}[\Gamma (F,G)]},\nonumber \\ \end{aligned}$$
(5.12)

where \(C_{p,q}>0\) depends only on pq. To prove (5.12), note that

$$\begin{aligned} \Gamma (F,G)= & {} \frac{1}{2}\left( {{\,\mathrm{L}\,}}(FG)+qFG+pGF\right) \nonumber \\= & {} \frac{p+q}{2}\mathrm {E}[FG]+\sum _{k=1}^{p+q-1}\frac{p+q-k}{2}J_k(FG). \end{aligned}$$
(5.13)

Hence, using Lemma A.5, we can deduce (5.12) by a hypercontractivity argument similar to those in [33, Section 5] and [41, Section 3.2]. \(\square \)

6 Randomized Lindeberg Method

For any \(\varpi \ge 0\) and \(x\ge 0\), we set

$$\begin{aligned} \chi _\varpi (x)= \left\{ \begin{array}{ll} \exp (-x^{1/\varpi }) &{} \text {if }\varpi >0,\\ 1_{[0,1)}(x) &{} \text {if }\varpi =0. \end{array}\right. \end{aligned}$$

The aim of this section is to prove the following result.

Proposition 6.1

Let \(\varvec{X}=(X_i)_{i=1}^N\) and \(\varvec{Y}=(Y_i)_{i=1}^N\) be two sequences of independent centered random variables with unit variance, and suppose that \(M_N:=\max _{1\le i\le N}(\Vert X_i\Vert _{\psi _\alpha }\vee \Vert Y_i\Vert _{\psi _\alpha })<\infty \) for some \(\alpha \in (0,2]\). Set \(\Lambda _i:=(\log d)^{(\overline{q}_d-1)/\alpha }\max _{1\le k\le d}M_N^{q_k-1}\sqrt{{{\,\mathrm{Inf}\,}}_{i}(f_k)}\) for \(i\in [N]\). Suppose also that there is an integer \(m\ge 3\) such that \(\mathrm {E}[X_i^r]=\mathrm {E}[Y_i^r]\) for all \(i\in [N]\) and \(r\in [m-1]\). Then, for any \(h\in C^m_b({\mathbb {R}})\), \(\beta >0\) and \(\tau ,\rho \ge 0\) with \(\tau \rho M_N\max _{1\le i\le N}\Lambda _i\le \beta ^{-1}\), we have

$$\begin{aligned}&\sup _{y\in {\mathbb {R}}^d}\left| {{\,\mathrm{E}\,}}\left[ h\left( \Phi _\beta (\varvec{Q}(\varvec{X})-y)\right) \right] -{{\,\mathrm{E}\,}}\left[ h\left( \Phi _\beta (\varvec{Q}(\varvec{Y})-y)\right) \right] \right| \nonumber \\&\quad \le C\left( \max _{1\le j\le m}\beta ^{m-j}\Vert h^{(j)}\Vert _\infty \right) \left\{ (\log d)^{m(\overline{q}_d-1)/\alpha }\max _{1\le k\le d}M_N^{mq_k}\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(f_k)^{m/2}\right. \nonumber \\&\qquad \left. +\left( e^{-\left( \frac{\tau }{K_1}\right) ^{\alpha }} +\chi _{(\overline{q}_d-1)/\alpha }\left( \frac{\rho }{K_2}\right) (\rho \vee 1)^m+\exp \left( -\left( \frac{\tau \rho }{K_3}\right) ^{\alpha /\overline{q}_d}\right) (\tau \rho \vee 1)^m \right) \right. \nonumber \\&\qquad \left. \times M_N^m\sum _{i=1}^N\Lambda _{i}^m\right\} , \end{aligned}$$
(6.1)

where \(C>0\) depends only on \(m,\alpha ,\overline{q}_d\), \(K_1\) depends only on \(\alpha \), and \(K_2,K_3>0\) depend only on \(\alpha ,\overline{q}_d\).

Remark 6.1

Proposition 6.1 can be viewed as a version of [47, Theorem 7.1]. Apart from the fact that we take higher-order moment matching into account, there are important differences between these two results. On the one hand, the latter takes all \(C^3\) functions with bounded third-order partial derivatives as test functions, while the former focuses only on test functions of the form \(x\mapsto h(\Phi _\beta (x-y))\) for some \(h\in C^m_b({\mathbb {R}})\) and \(y\in {\mathbb {R}}^d\). On the other hand, in the bound of (6.1), terms like \(\sum _{i=1}^N\max _{1\le k\le d}{{\,\mathrm{Inf}\,}}_i(f_k)\) always appear with exponential factors, so we can remove such terms by appropriately selecting the parameters \(\tau ,\rho \). In contrast, such a quantity appears (as the constant C) in the dominant term of the bound given by [47, Theorem 7.1]. As pointed out in Remark 2.4(b), this can be crucial in a high-dimensional setting, and this phenomenon originates from a (naïve) application of the Lindeberg method. To avoid this difficulty, we use a randomized version of the Lindeberg method, originally introduced in [24] for sums of independent random vectors.

For the proof, we need three auxiliary results. The first one is a generalization of [36, Lemma S.5.1]:

Lemma 6.1

Let \(\xi \) be a nonnegative random variable such that \(P(\xi >x)\le Ae^{-(x/B)^\alpha }\) for all \(x\ge 0\) and some constants \(A,B,\alpha >0\). Then, we have

$$\begin{aligned} {{\,\mathrm{E}\,}}\left[ \xi ^p1_{\{\xi >t\}}\right] \le A\left( 1+\frac{2p-\alpha }{p-\alpha }\right) \left( t\vee \{(2(p/\alpha -1))^{1/\alpha }B\}\right) ^pe^{-(t/B)^\alpha } \end{aligned}$$

for any \(p>\alpha \) and \(t>0\).

Proof

The proof is analogous to that of [36, Lemma S.5.1] and elementary, so we omit it. \(\square \)

The second one is a moment inequality for homogeneous sums with a sharp constant:

Lemma 6.2

Let \(\varvec{X}=(X_i)_{i=1}^N\) be a sequence of independent centered random variables. Suppose that \(M:=\max _{1\le i\le N}\Vert X_i\Vert _{\psi _\alpha }<\infty \) for some \(\alpha \in (0,2]\). Also, let \(q\in {\mathbb {N}}\) and \(f:[N]^q\rightarrow {\mathbb {R}}\) be a symmetric function vanishing on diagonals. Then,

$$\begin{aligned} \left\| Q(f;\varvec{X})\right\| _p\le K_{\alpha ,q}p^{q/\alpha }M^q\Vert f\Vert _{\ell _2} \end{aligned}$$

for any \(p\ge 2\), where \(K_{\alpha ,q}>0\) depends only on \(\alpha ,q\).

Since we need additional lemmas to prove Lemma 6.2, we postpone its proof to Appendix B.

The third one is well known and immediately follows from the commutativity of addition, but it deserves to be stated explicitly for later reference.

Lemma 6.3

Let S be a finite set and \(\varphi \) be a real-valued function on S. Also, let \(b:S\rightarrow S\) be a bijection. Then, \(\sum _{x\in A}\varphi (b(x))=\sum _{x\in b(A)}\varphi (x)\) for any \(A\subset S\).
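For instance, applying the lemma with \(S=A=[N]\) and \(b=\sigma \in {\mathfrak {S}}_N\) shows that relabeling the summation indices of a homogeneous sum by a permutation leaves its value unchanged; this is precisely how the lemma will be invoked below.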

Now we turn to the main body of the proof. Throughout the proof, we will use the standard multi-index notation. For a multi-index \(\lambda =(\lambda _1,\dots ,\lambda _d)\in {\mathbb {Z}}_+^d\), we set \(|\lambda |:=\lambda _1+\cdots +\lambda _d\), \(\lambda !:=\lambda _1!\cdots \lambda _d!\) and \(\partial ^\lambda :=\partial _1^{\lambda _1}\cdots \partial _d^{\lambda _d}\) as usual. Also, given a vector \(x=(x_1,\dots ,x_d)\in {\mathbb {R}}^d\), we write \(x^\lambda =x_1^{\lambda _1}\cdots x_d^{\lambda _d}\).

Proof of Proposition 6.1

Without loss of generality, we may assume \(\varvec{X}\) and \(\varvec{Y}\) are independent. Throughout the proof, for two real numbers a and b, the notation \(a\lesssim b\) means that \(a\le cb\) for some constant \(c>0\) which depends only on \(m,\alpha ,\overline{q}_d\).

Take a vector \(y\in {\mathbb {R}}^d\) and define the function \(\Psi :{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) by \(\Psi (x)=h(\Phi _\beta (x-y))\) for \(x\in {\mathbb {R}}^d\). For any \(i\in [N]\), \(\sigma \in {\mathfrak {S}}_N\) and \(k\in [d]\), we define

$$\begin{aligned} \varvec{W}^\sigma _i=(W_{i,1}^\sigma ,\dots ,W_{i,N}^\sigma ):=(X_{\sigma (1)},\dots ,X_{\sigma (i)},Y_{\sigma (i+1)},\dots ,Y_{\sigma (N)}) \end{aligned}$$

and

$$\begin{aligned} U_{k,i}^\sigma&:=\sum _{\begin{array}{c} i_1,\dots ,i_{q_k}=1\\ i_1\ne i,\dots ,i_{q_k}\ne i \end{array}}^N f_k(\sigma (i_1),\dots ,\sigma (i_{q_k}))W^\sigma _{i,i_1}\cdots W^\sigma _{i,i_{q_k}},\\ V_{k,i}^\sigma&:=\sum _{\begin{array}{c} i_1,\dots ,i_{q_k}=1\\ \exists j:i_j= i \end{array}}^N f_k(\sigma (i_1),\dots ,\sigma (i_{q_k}))\prod _{l:i_l\ne i}W_{i,i_l}^\sigma . \end{aligned}$$

Then, we set \(\varvec{U}^\sigma _{i}=(U_{k,i}^\sigma )_{k=1}^d\) and \(\varvec{V}^\sigma _{i}=(V^\sigma _{k,i})_{k=1}^d\). By construction, \(\varvec{U}^\sigma _{i}\) and \(\varvec{V}^\sigma _{i}\) are independent of \(X_{\sigma (i)}\) and \(Y_{\sigma (i)}\). Moreover, we have \(Q(f_k;\varvec{W}^\sigma _{i-1})=U^\sigma _{k,i}+Y_{\sigma (i)}V^\sigma _{k,i}\) and \(Q(f_k;\varvec{W}^\sigma _{i})=U^\sigma _{k,i}+X_{\sigma (i)}V^\sigma _{k,i}\) (with \(\varvec{W}^\sigma _0:=(Y_{\sigma (1)},\dots ,Y_{\sigma (N)})\)). In particular, by Lemma 6.3 it holds that \(Q(f_k;\varvec{W}^\sigma _{0})=Q(f_k;\varvec{Y})\) and \(Q(f_k;\varvec{W}^\sigma _{N})=Q(f_k;\varvec{X})\). Therefore, we obtain

$$\begin{aligned}&\left| {{\,\mathrm{E}\,}}\left[ \Psi (\varvec{Q}(\varvec{X}))\right] -{{\,\mathrm{E}\,}}\left[ \Psi (\varvec{Q}(\varvec{Y}))\right] \right| \nonumber \\&\quad =\frac{1}{N!}\sum _{\sigma \in {\mathfrak {S}}_N}\left| {{\,\mathrm{E}\,}}\left[ \Psi (\varvec{Q}(\varvec{W}^\sigma _N))\right] -{{\,\mathrm{E}\,}}\left[ \Psi (\varvec{Q}(\varvec{W}^\sigma _0))\right] \right| \nonumber \\&\quad \le \frac{1}{N!}\sum _{\sigma \in {\mathfrak {S}}_N}\sum _{i=1}^N\left| {{\,\mathrm{E}\,}}\left[ \Psi (\varvec{Q}(\varvec{W}^\sigma _i))\right] -{{\,\mathrm{E}\,}}\left[ \Psi (\varvec{Q}(\varvec{W}^\sigma _{i-1}))\right] \right| . \end{aligned}$$
(6.2)

Taylor’s theorem and the independence of \(X_{\sigma (i)}\) and \(Y_{\sigma (i)}\) from \(\varvec{U}_{i}^\sigma \) and \(\varvec{V}_{i}^\sigma \) yield

$$\begin{aligned} {{\,\mathrm{E}\,}}\left[ \Psi (\varvec{U}^\sigma _i+\xi \varvec{V}^\sigma _{i})\right] =\sum _{\lambda \in {\mathbb {Z}}_+^d:|\lambda |\le m-1}\frac{1}{\lambda !}{{\,\mathrm{E}\,}}\left[ \partial ^\lambda \Psi (\varvec{U}^\sigma _i)\left( \varvec{V}_i^\sigma \right) ^\lambda \right] {{\,\mathrm{E}\,}}\left[ \xi ^{|\lambda |}\right] +R^\sigma _i[\xi ] \end{aligned}$$

for \(\xi \in \{X_{\sigma (i)},Y_{\sigma (i)}\}\), where

$$\begin{aligned} R^\sigma _i[\xi ]:=\sum _{\lambda \in {\mathbb {Z}}_+^d:|\lambda |=m}\frac{m}{\lambda !}\int _0^1(1-t)^{m-1}{{\,\mathrm{E}\,}}\left[ \partial ^\lambda \Psi (\varvec{U}^\sigma _i+t\xi \varvec{V}^\sigma _i)\xi ^{m}(\varvec{V}^\sigma _{i})^\lambda \right] dt. \end{aligned}$$

Since \(\mathrm {E}[X_i^r]=\mathrm {E}[Y_i^r]\) for all \(i\in [N]\) and \(r\in [m-1]\) by assumption, we obtain

$$\begin{aligned} \left| {{\,\mathrm{E}\,}}\left[ \Psi (\varvec{Q}(\varvec{W}^\sigma _{i}))\right] -{{\,\mathrm{E}\,}}\left[ \Psi (\varvec{Q}(\varvec{W}^\sigma _{i-1}))\right] \right| \le |R^\sigma _i[X_{\sigma (i)}]|+|R^\sigma _i[Y_{\sigma (i)}]| \le {\mathbf {I}}_i^\sigma +\mathbf {II}_i^\sigma , \end{aligned}$$
(6.3)

where \({\mathbf {I}}_i^\sigma :={\mathbf {I}}_i^\sigma [X_{\sigma (i)}]+{\mathbf {I}}_i^\sigma [Y_{\sigma (i)}]\), \(\mathbf {II}_i^\sigma :=\mathbf {II}_i^\sigma [X_{\sigma (i)}]+\mathbf {II}_i^\sigma [Y_{\sigma (i)}]\) and

$$\begin{aligned} {\mathbf {I}}_i^\sigma [\xi ]&:=\sum _{\lambda \in {\mathbb {Z}}_+^d:|\lambda |=m}\frac{m}{\lambda !}\int _0^1(1-t)^{m-1}{{\,\mathrm{E}\,}}\left[ |\partial ^\lambda \Psi (\varvec{U}^\sigma _i+t\xi \varvec{V}^\sigma _i)||\xi |^{m}|(\varvec{V}^\sigma _{i})^\lambda |;{\mathcal {E}}_{\sigma ,i}\right] dt,\\ \mathbf {II}_i^\sigma [\xi ]&:=\sum _{\lambda \in {\mathbb {Z}}_+^d:|\lambda |=m}\frac{m}{\lambda !}\int _0^1(1-t)^{m-1}{{\,\mathrm{E}\,}}\left[ |\partial ^\lambda \Psi (\varvec{U}^\sigma _i+t\xi \varvec{V}^\sigma _i)||\xi |^{m}|(\varvec{V}^\sigma _{i})^\lambda |;{\mathcal {E}}_{\sigma ,i}^c\right] dt \end{aligned}$$

for \(\xi \in \{X_{\sigma (i)},Y_{\sigma (i)}\}\) and \({\mathcal {E}}_{\sigma ,i}:=\{(|X_{\sigma (i)}|+|Y_{\sigma (i)}|)\Vert \varvec{V}^\sigma _i\Vert _{\ell _\infty }\le \tau \rho M_N\Lambda _{\sigma (i)}\}\).

First, we consider \({\mathbf {I}}_i^\sigma \). Since \(\tau \rho M_N\max _{1\le i\le N}\Lambda _i\le \beta ^{-1}\) by assumption, Lemma 3.1 and the independence of \(X_{\sigma (i)},Y_{\sigma (i)}\) from \(\varvec{U}^\sigma _i,\varvec{V}^\sigma _i\) imply that

$$\begin{aligned} {\mathbf {I}}_i^\sigma&\le \frac{e^8}{m!}\sum _{j_1,\dots ,j_m=1}^d{{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}(\varvec{U}^\sigma _i-y)(|X_{\sigma (i)}|^{m}+|Y_{\sigma (i)}|^{m})|V^\sigma _{j_1,i}|^m\right] \nonumber \\&\le \frac{e^8}{m!}\sum _{j_1,\dots ,j_m=1}^d{{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}(\varvec{U}^\sigma _i-y)|V^\sigma _{j_1,i}|^m\right] {{\,\mathrm{E}\,}}\left[ |X_{\sigma (i)}|^{m}+|Y_{\sigma (i)}|^{m}\right] \nonumber \\&\le \frac{e^8}{m!}\sup _{1\le i\le N}\mathrm {E}[|X_{i}|^{m}+|Y_{i}|^{m}]\left\{ {\mathbf {I}}(1)_i^\sigma +{\mathbf {I}}(2)_i^\sigma +{\mathbf {I}}(3)_i^\sigma \right\} , \end{aligned}$$
(6.4)

where

$$\begin{aligned} {\mathbf {I}}(1)_i^\sigma&:=\sum _{j_1,\dots ,j_m=1}^d{{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}(\varvec{U}^\sigma _i-y)|V^\sigma _{j_1,i}|^m;{\mathcal {C}}_{\sigma ,i}\cap {\mathcal {D}}_{\sigma ,i}\right] ,\\ {\mathbf {I}}(2)_i^\sigma&:=\sum _{j_1,\dots ,j_m=1}^d{{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}(\varvec{U}^\sigma _i-y)|V^\sigma _{j_1,i}|^m;{\mathcal {C}}_{\sigma ,i}^c\right] ,\\ {\mathbf {I}}(3)_i^\sigma&:=\sum _{j_1,\dots ,j_m=1}^d{{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}(\varvec{U}^\sigma _i-y)|V^\sigma _{j_1,i}|^m;{\mathcal {D}}_{\sigma ,i}^c\right] \end{aligned}$$

and \({\mathcal {C}}_{\sigma ,i}:=\{|X_{\sigma (i)}|+|Y_{\sigma (i)}|\le \tau M_N\},{\mathcal {D}}_{\sigma ,i}:=\{\Vert \varvec{V}^\sigma _i\Vert _{\ell _\infty }\le \rho \Lambda _{\sigma (i)}\}\).

We begin by estimating \({\mathbf {I}}(1)_i^\sigma \). Let \((\delta _i)_{i=1}^N\) be a sequence of independent Bernoulli variables, independent of \(\varvec{X}\) and \(\varvec{Y}\), with \(P(\delta _i=1)=1-P(\delta _i=0)=i/(N+1)\). We set \(\zeta _{i,a}:=\delta _i X_a+(1-\delta _i)Y_a\) for all \(i,a\in [N]\). Then, since \(\Vert \zeta _{i,\sigma (i)}\varvec{V}^\sigma _i\Vert _{\ell _\infty }\le \tau \rho M_N\max _{1\le i\le N}\Lambda _i\le \beta ^{-1}\) on the set \({\mathcal {C}}_{\sigma ,i}\cap {\mathcal {D}}_{\sigma ,i}\), by Lemma 3.1 we obtain

$$\begin{aligned} {\mathbf {I}}(1)_i^\sigma \le e^8\sum _{j_1,\dots ,j_m=1}^d{{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}(\varvec{U}^\sigma _i+\zeta _{i,\sigma (i)}\varvec{V}^\sigma _i-y)|V^\sigma _{j_1,i}|^m\right] . \end{aligned}$$

The subsequent discussion is inspired by the proof of [24, Lemma 2], and we introduce some notation analogous to theirs. For any \(i,a\in [N]\), we set

$$\begin{aligned} {\mathcal {A}}_{i,a}=\{(A,B):A\subset [N],B\subset [N],A\cup B=[N]\setminus \{a\},\#A=i-1,\#B=N-i\}, \end{aligned}$$

where \(\#S\) denotes the number of elements in a set S. We also set

$$\begin{aligned} {\mathcal {A}}_{i}=\{(A,B):A\subset [N],B\subset [N],A\cup B=[N],\#A=i,\#B=N-i\} \end{aligned}$$

for every \(i\in \{0,1,\dots ,N\}\). Moreover, for any \(A,B\subset [N]\) with \(A\cap B=\emptyset \) and \(i\in A\cup B\), we define the random variable \(W^{(A,B)}_i\) by \(W^{(A,B)}_i:=X_i\) if \(i\in A\) and \(W^{(A,B)}_i:=Y_i\) if \(i\in B\). Then, we define

$$\begin{aligned} Q_k^{(A,B)}:=\sum _{i_1,\dots ,i_{q_k}=1}^Nf_k(i_1,\dots ,i_{q_k})W^{(A,B)}_{i_1}\cdots W^{(A,B)}_{i_{q_k}} \end{aligned}$$

for any \(k\in [d]\) and \((A,B)\in \bigcup _{i=0}^N{\mathcal {A}}_i\), and set \(\varvec{Q}^{(A,B)}:=(Q_k^{(A,B)})_{k=1}^d\). We also define

$$\begin{aligned} U_{k,a}^{(A,B)}&:=\sum _{\begin{array}{c} i_1,\dots ,i_{q_k}=1\\ i_1\ne a,\dots ,i_{q_k}\ne a \end{array}}^N f_k(i_1,\dots ,i_{q_k})W^{(A,B)}_{i_1}\cdots W^{(A,B)}_{i_{q_k}},\\ V_{k,a}^{(A,B)}&:=\sum _{\begin{array}{c} i_1,\dots ,i_{q_k}=1\\ \exists j:i_j= a \end{array}}^N f_k(i_1,\dots ,i_{q_k})\prod _{l:i_l\ne a}W^{(A,B)}_{i_l} \end{aligned}$$

for any \(k\in [d]\), \(a\in [N]\) and \((A,B)\in \bigcup _{j=1}^N{\mathcal {A}}_{j,a}\), and set \(\varvec{U}_a^{(A,B)}:=(U_{k,a}^{(A,B)})_{k=1}^d\) and \(\varvec{V}_a^{(A,B)}:=(V_{k,a}^{(A,B)})_{k=1}^d\). Finally, for any \(\sigma \in {\mathfrak {S}}_N\) and \(i\in [N]\) we set \(A^\sigma _i:=\{\sigma (1),\dots ,\sigma (i-1)\}\) and \(B^\sigma _i:=\{\sigma (i+1),\dots ,\sigma (N)\}\).
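We will also use the following elementary count: for fixed \(i,a\in [N]\) and \((A,B)\in {\mathcal {A}}_{i,a}\), a permutation \(\sigma \in {\mathfrak {S}}_N\) with \(\sigma (i)=a\) and \(A^\sigma _i=A\) is determined by arranging the elements of A among the first \(i-1\) slots and those of B among the last \(N-i\) slots, so that

$$\begin{aligned} \#\{\sigma \in {\mathfrak {S}}_N:A^\sigma _i=A,\ \sigma (i)=a\}=(i-1)!(N-i)!. \end{aligned}$$

This count is used in the computation below.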

Now, since we have \(W^\sigma _{i,j}=W^{(A^\sigma _i,B^\sigma _i)}_{\sigma (j)}\) for \(j\in [N]\setminus \{i\}\), it holds that \(\varvec{U}^\sigma _{i}=\varvec{U}^{(A^\sigma _i,B^\sigma _i)}_{\sigma (i)}\) and \(\varvec{V}^\sigma _{i}=\varvec{V}^{(A^\sigma _i,B^\sigma _i)}_{\sigma (i)}\) by Lemma 6.3. Therefore, we obtain

$$\begin{aligned}&\frac{1}{N!}\sum _{\sigma \in {\mathfrak {S}}_N}\sum _{i=1}^N{\mathbf {I}}(1)_i^\sigma \\ {}&\quad \le \frac{e^8}{N!}\sum _{\sigma \in {\mathfrak {S}}_N}\sum _{i=1}^N\sum _{j_1,\dots ,j_m=1}^d\\ {}&\qquad {{\,\mathrm {E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}\left( \varvec{U}^{(A^\sigma _i,B^\sigma _i)}_{\sigma (i)}+\zeta _{i,\sigma (i)}\varvec{V}^{(A^\sigma _i,B^\sigma _i)}_{\sigma (i)}-y\right) \left| V^{(A^\sigma _i,B^\sigma _i)}_{j_1,\sigma (i)}\right| ^m\right] \\ {}&\quad =\frac{e^8}{N!}\sum _{i=1}^N\sum _{a=1}^N\sum _{\sigma \in {\mathfrak {S}}_N:\sigma (i)=a}\sum _{j_1,\dots ,j_m=1}^d {{\,\mathrm {E}\,}} \left[ \Upsilon _\beta ^{j_1,\dots , j_m}\left( \varvec{U}^{(A^\sigma _i,B^\sigma _i)}_{a}\right. \right. \\ {}&\left. \left. \qquad +\zeta _{i,a}\varvec{V}^{(A^\sigma _i,B^\sigma _i)}_{a}-y\right) \left| V^{(A^\sigma _i,B^\sigma _i)}_{j_1,a}\right| ^m\right] \\ {}&\quad =\frac{e^8}{N!}\sum _{i=1}^N\sum _{a=1}^N\sum _{(A,B)\in {\mathcal {A}}_{i,a}}\#\{\sigma \in {\mathfrak {S}}_N:A^\sigma _i=A,\sigma (i)=a\}\\ {}&\quad \times \sum _{j_1,\dots ,j_m=1}^d{{\,\mathrm {E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}\left( \varvec{U}^{(A,B)}_{a}+\zeta _{i,a}\varvec{V}^{(A,B)}_{a}-y\right) \left| V^{(A,B)}_{j_1,a}\right| ^m\right] \\ {}&\quad =e^8\sum _{i=1}^N\frac{(i-1)!(N-i)!}{N!}\sum _{a=1}^N\sum _{(A,B)\in {\mathcal {A}}_{i,a}}\sum _{j_1,\dots ,j_m=1}^d\\ {}&\quad \qquad {{\,\mathrm {E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}\left( \varvec{U}^{(A,B)}_{a}+\zeta _{i,a}\varvec{V}^{(A,B)}_{a}-y\right) \left| V^{(A,B)}_{j_1,a}\right| ^m\right] . \end{aligned}$$

Now, for \((A,B)\in {\mathcal {A}}_{i,a}\) we have \(\varvec{U}^{(A,B)}_{a}+X_a\varvec{V}^{(A,B)}_{a}=\varvec{Q}^{(A\cup \{a\},B)}\) and \(\varvec{U}^{(A,B)}_{a}+Y_a\varvec{V}^{(A,B)}_{a}=\varvec{Q}^{(A,B\cup \{a\})}\), so we obtain

$$\begin{aligned}&\frac{1}{N!}\sum _{\sigma \in {\mathfrak {S}}_N}\sum _{i=1}^N{\mathbf {I}}(1)_i^\sigma \\&\quad \le e^8\sum _{i=1}^N\frac{(i-1)!(N-i)!}{N!}\sum _{a=1}^N\sum _{(A,B)\in {\mathcal {A}}_{i,a}}\sum _{j_1,\dots ,j_m=1}^d\\&\qquad \left\{ \frac{i}{N+1}{{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}\left( \varvec{Q}^{(A\cup \{a\},B)}-y\right) \left| V^{(A,B)}_{j_1,a}\right| ^m\right] \right. \\&\quad \quad \left. +\frac{N+1-i}{N+1}{{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}\left( \varvec{Q}^{(A,B\cup \{a\})}-y\right) \left| V^{(A,B)}_{j_1,a}\right| ^m\right] \right\} \\&\quad =e^8\sum _{i=1}^N\frac{i!(N-i)!}{(N+1)!}\sum _{a=1}^N\sum _{(A,B)\in {\mathcal {A}}_{i,a}}\sum _{j_1,\dots ,j_m=1}^d\\&\qquad {{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}\left( \varvec{Q}^{(A\cup \{a\},B)}-y\right) \left| V^{(A,B)}_{j_1,a}\right| ^m\right] \\&\qquad +e^8\sum _{i=1}^N\frac{(i-1)!(N+1-i)!}{(N+1)!}\sum _{a=1}^N\sum _{(A,B)\in {\mathcal {A}}_{i,a}}\sum _{j_1,\dots ,j_m=1}^d\\&\qquad {{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}\left( \varvec{Q}^{(A,B\cup \{a\})}-y\right) \left| V^{(A,B)}_{j_1,a}\right| ^m\right] \\&\quad =e^8\sum _{i=1}^N\frac{i!(N-i)!}{(N+1)!}\sum _{a=1}^N\sum _{(A,B)\in {\mathcal {A}}_{i,a}}\sum _{j_1,\dots ,j_m=1}^d\\&\qquad {{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}\left( \varvec{Q}^{(A\cup \{a\},B)}-y\right) \left| V^{(A,B)}_{j_1,a}\right| ^m\right] \\&\qquad +e^8\sum _{i=0}^{N-1}\frac{i!(N-i)!}{(N+1)!}\sum _{a=1}^N\sum _{(A,B)\in {\mathcal {A}}_{i+1,a}}\sum _{j_1,\dots ,j_m=1}^d\\&\qquad {{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}\left( \varvec{Q}^{(A,B\cup \{a\})}-y\right) \left| V^{(A,B)}_{j_1,a}\right| ^m\right] \\&\quad =e^8\sum _{i=1}^N\frac{i!(N-i)!}{(N+1)!}\sum _{a=1}^N\sum _{(A,B)\in {\mathcal {A}}_{i}:a\in A}\sum _{j_1,\dots ,j_m=1}^d\\&\qquad {{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}\left( \varvec{Q}^{(A,B)}-y\right) \left| V^{(A\setminus \{a\},B)}_{j_1,a}\right| ^m\right] \\&\qquad +e^8\sum _{i=0}^{N-1}\frac{i!(N-i)!}{(N+1)!}\sum _{a=1}^N\sum _{(A,B)\in {\mathcal {A}}_{i}:a\in B}\sum _{j_1,\dots ,j_m=1}^d\\&\qquad {{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}\left( \varvec{Q}^{(A,B)}-y\right) \left| V^{(A,B\setminus \{a\})}_{j_1,a}\right| ^m\right] \\&\quad =e^8\sum _{i=0}^N\frac{i!(N-i)!}{(N+1)!}\sum _{(A,B)\in {\mathcal {A}}_{i}}\sum _{j_1,\dots ,j_m=1}^d\\&\qquad {{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}\left( \varvec{Q}^{(A,B)}-y\right) \sum _{a=1}^N\left| V^{(A\setminus \{a\},B\setminus \{a\})}_{j_1,a}\right| ^m\right] \\&\quad \le e^8\sum _{i=0}^N\frac{i!(N-i)!}{(N+1)!}\sum _{(A,B)\in {\mathcal {A}}_{i}}\sum _{j_1,\dots ,j_m=1}^d\\&\qquad {{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}\left( \varvec{Q}^{(A,B)}-y\right) \max _{1\le k\le d}\sum _{a=1}^N\left| V^{(A\setminus \{a\},B\setminus \{a\})}_{k,a}\right| ^m\right] . \end{aligned}$$

Hence, Lemma 3.1 yields

$$\begin{aligned} \frac{1}{N!}\sum _{\sigma \in {\mathfrak {S}}_N}\sum _{i=1}^N{\mathbf {I}}(1)_i^\sigma&\lesssim \max _{1\le j\le m}\beta ^{m-j}\Vert h^{(j)}\Vert _\infty \sum _{i=0}^N\frac{i!(N-i)!}{(N+1)!}\sum _{(A,B)\in {\mathcal {A}}_{i}}\\&\quad {{\,\mathrm{E}\,}}\left[ \max _{1\le k\le d}\sum _{a=1}^N\left| V^{(A\setminus \{a\},B\setminus \{a\})}_{k,a}\right| ^m\right] . \end{aligned}$$

Now, observe that \(V^{(A\setminus \{a\},B\setminus \{a\})}_{k,a}\) is itself a homogeneous sum of degree \(q_k-1\): since \(f_k\) is symmetric and vanishes on diagonals, its kernel is \(q_kf_k(a,\cdot )\), whose \(\ell _2\)-norm equals \(q_k\sqrt{{{\,\mathrm{Inf}\,}}_a(f_k)}\). Hence, by Lemma 6.2 we have

$$\begin{aligned} \left\| V^{(A\setminus \{a\},B\setminus \{a\})}_{k,a}\right\| _{r}\le C_{\alpha ,\overline{q}_d}r^{(q_k-1)/\alpha }M_N^{q_k-1}\sqrt{{{\,\mathrm{Inf}\,}}_a(f_k)} \end{aligned}$$
(6.5)

for any \(r\ge 1\), where \(C_{\alpha ,\overline{q}_d}>0\) depends only on \(\alpha ,\overline{q}_d\). Hence, the Minkowski inequality yields

$$\begin{aligned} \left\| \sum _{a=1}^N\left| V^{(A\setminus \{a\},B\setminus \{a\})}_{k,a}\right| ^m\right\| _r \le C_{\alpha ,\overline{q}_d}(mr)^{m(q_k-1)/\alpha }M_N^{m(q_k-1)}\sum _{a=1}^N{{\,\mathrm{Inf}\,}}_a(f_k)^{m/2}.\quad \end{aligned}$$
(6.6)

Thus, if \(\overline{q}_d>1\), Lemma A.5 yields

$$\begin{aligned}&\left\| \sum _{a=1}^N\left| V^{(A\setminus \{a\},B\setminus \{a\})}_{k,a}\right| ^m\right\| _{\psi _{\alpha /\{m(\overline{q}_d-1)\}}}\\&\quad \lesssim M_N^{m(q_k-1)}\sum _{a=1}^N{{\,\mathrm{Inf}\,}}_a(f_k)^{m/2}. \end{aligned}$$

Therefore, by Lemmas A.2 and A.6 we conclude that

$$\begin{aligned} {{\,\mathrm{E}\,}}\left[ \max _{1\le k\le d}\sum _{a=1}^N\left| V^{(A\setminus \{a\},B\setminus \{a\})}_{k,a}\right| ^m\right] \lesssim (\log d)^{m(\overline{q}_d-1)/\alpha }\max _{1\le k\le d}M_N^{m(q_k-1)}\sum _{a=1}^N{{\,\mathrm{Inf}\,}}_a(f_k)^{m/2}. \end{aligned}$$

This inequality also holds when \(\overline{q}_d=1\) because in this case the \(V^{(A\setminus \{a\},B\setminus \{a\})}_{k,a}\) are non-random, so it is a direct consequence of (6.6). As a result, we obtain

$$\begin{aligned} \frac{1}{N!}\sum _{\sigma \in {\mathfrak {S}}_N}\sum _{i=1}^N{\mathbf {I}}(1)_i^\sigma \lesssim&{} \left( \max _{1\le j\le m}\beta ^{m-j}\Vert h^{(j)}\Vert _\infty \right) \nonumber \\ \times&\left( (\log d)^{m(\overline{q}_d-1)/\alpha }\max _{1\le k\le d}M_N^{m(q_k-1)}\sum _{a=1}^N{{\,\mathrm {Inf}\,}}_a(f_k)^{m/2}\right) .\nonumber \\ \end{aligned}$$
(6.7)

Next, we estimate \({\mathbf {I}}(2)^\sigma _i\). Since \(X_{\sigma (i)}\) and \(Y_{\sigma (i)}\) are independent of \(\varvec{U}^\sigma _i\) and \(\varvec{V}^\sigma _i\),

$$\begin{aligned} {\mathbf {I}}(2)_i^\sigma \le \sum _{j_1,\dots ,j_m=1}^d{{\,\mathrm{E}\,}}\left[ \Upsilon _\beta ^{j_1,\dots , j_m}(\varvec{U}^\sigma _i-y)\Vert \varvec{V}^\sigma _{i}\Vert _{\ell _\infty }^m\right] P\left( {\mathcal {C}}_{\sigma ,i}^c\right) . \end{aligned}$$

Hence, Lemma 3.1 yields

$$\begin{aligned} {\mathbf {I}}(2)_i^\sigma \lesssim \left( \max _{1\le j\le m}\beta ^{m-j}\Vert h^{(j)}\Vert _\infty \right) {{\,\mathrm{E}\,}}\left[ \Vert \varvec{V}^\sigma _{i}\Vert _{\ell _\infty }^m\right] P\left( {\mathcal {C}}_{\sigma ,i}^c\right) . \end{aligned}$$

Now, if \(\overline{q}_d>1\), (6.5) and Lemma A.5 yield

$$\begin{aligned} \Vert V^\sigma _{k,i}\Vert _{\psi _{\alpha /(\overline{q}_d-1)}}\le c_{\alpha ,\overline{q}_d}M_N^{q_k-1}\sqrt{{{\,\mathrm{Inf}\,}}_{\sigma (i)}(f_k)}, \end{aligned}$$

where \(c_{\alpha ,\overline{q}_d}>0\) depends only on \(\alpha ,\overline{q}_d\). Hence, Lemmas A.2 and A.6 yield

$$\begin{aligned} \Vert \Vert \varvec{V}^\sigma _{i}\Vert _{\ell _\infty }\Vert _{r}\le c'_{\alpha ,\overline{q}_d}r^{(\overline{q}_d-1)/\alpha }\Lambda _{\sigma (i)} \end{aligned}$$
(6.8)

for every \(r\ge 1\) with \(c'_{\alpha ,\overline{q}_d}>0\) depending only on \(\alpha ,\overline{q}_d\). This inequality also holds true when \(\overline{q}_d=1\) because in this case \(\varvec{V}^\sigma _{i}\) is non-random and thus it is a direct consequence of (6.5). Meanwhile, (A.3) and Lemma A.3 yield \( P({\mathcal {C}}_{\sigma ,i}^c)\le 2e^{-(\tau /2^{1\vee \alpha ^{-1}})^{\alpha }}. \) Consequently, we obtain

$$\begin{aligned} \frac{1}{N!}\sum _{\sigma \in {\mathfrak {S}}_N}\sum _{i=1}^N{\mathbf {I}}(2)_i^\sigma \lesssim \left( \max _{1\le j\le m}\beta ^{m-j}\Vert h^{(j)}\Vert _\infty \right) e^{-(\tau /2^{1\vee \alpha ^{-1}})^{\alpha }}\sum _{i=1}^N\Lambda _i^m. \end{aligned}$$
(6.9)

Third, we estimate \({\mathbf {I}}(3)_i^\sigma \). Lemma 3.1 yields

$$\begin{aligned} {\mathbf {I}}(3)_i^\sigma \lesssim \left( \max _{1\le j\le m}\beta ^{m-j}\Vert h^{(j)}\Vert _\infty \right) {{\,\mathrm{E}\,}}\left[ \Vert \varvec{V}^\sigma _{i}\Vert _{\ell _\infty }^m;{\mathcal {D}}_{\sigma ,i}^c\right] . \end{aligned}$$

If \(\overline{q}_d>1\), (6.8) and Lemma A.4 yield

$$\begin{aligned} P\left( \Vert \varvec{V}^\sigma _{i}\Vert _{\ell _\infty }\ge x\right) \le e^{(\overline{q}_d-1)/\alpha }\exp \left( -\left( \frac{x}{K_{\alpha ,\overline{q}_d}\Lambda _{\sigma (i)}}\right) ^{\alpha /(\overline{q}_d-1)}\right) \end{aligned}$$

for every \(x>0\) with \(K_{\alpha ,\overline{q}_d}>0\) depending only on \(\alpha ,\overline{q}_d\). Hence, Lemma 6.1 yields

$$\begin{aligned} {{\,\mathrm{E}\,}}\left[ \Vert \varvec{V}^\sigma _{i}\Vert _{\ell _\infty }^m;{\mathcal {D}}_{\sigma ,i}^c\right] \lesssim (\rho \vee 1)^m\Lambda _{\sigma (i)}^m\exp \left( -\left( \frac{\rho }{K_{\alpha ,\overline{q}_d}}\right) ^{\alpha /(\overline{q}_d-1)}\right) . \end{aligned}$$

Meanwhile, if \(\overline{q}_d=1\), \(\varvec{V}^\sigma _{i}\) is non-random, so (6.8) yields \( {{\,\mathrm{E}\,}}\left[ \Vert \varvec{V}^\sigma _{i}\Vert _{\ell _\infty }^m;{\mathcal {D}}_{\sigma ,i}^c\right] \lesssim \Lambda _{\sigma (i)}^m1_{\{c'_{\alpha ,\overline{q}_d}>\rho \}}. \) Consequently, setting \(K_{\alpha ,\overline{q}_d}':=K_{\alpha ,\overline{q}_d}\vee c'_{\alpha ,\overline{q}_d}\), we obtain

$$\begin{aligned}&\frac{1}{N!}\sum _{\sigma \in {\mathfrak {S}}_N}\sum _{i=1}^N{\mathbf {I}}(3)_i^\sigma \nonumber \\&\quad \lesssim \left( \max _{1\le j\le m}\beta ^{m-j}\Vert h^{(j)}\Vert _\infty \right) \chi _{(\overline{q}_d-1)/\alpha }\left( \frac{\rho }{K'_{\alpha ,\overline{q}_d}}\right) (\rho \vee 1)^m\sum _{i=1}^N\Lambda _{i}^m. \end{aligned}$$
(6.10)

Now, combining (6.4), (6.7), (6.9) and (6.10) with Lemma A.6, we obtain

$$\begin{aligned}&\frac{1}{N!}\sum _{\sigma \in {\mathfrak {S}}_N}\sum _{i=1}^N{\mathbf {I}}_i^\sigma \nonumber \\&\quad \lesssim \left( \max _{1\le j\le m}\beta ^{m-j}\Vert h^{(j)}\Vert _\infty \right) \left\{ (\log d)^{m(\overline{q}_d-1)/\alpha }\max _{1\le k\le d}M_N^{mq_k}\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(f_k)^{m/2}\right. \nonumber \\&\qquad \left. +e^{-(\tau /2^{1\vee \alpha ^{-1}})^{\alpha }}M_N^{m}\sum _{i=1}^N\Lambda _i^m +\chi _{(\overline{q}_d-1)/\alpha }\left( \frac{\rho }{K'_{\alpha ,\overline{q}_d}}\right) (\rho \vee 1)^mM_N^{m}\sum _{i=1}^N\Lambda _{i}^m\right\} .\nonumber \\ \end{aligned}$$
(6.11)

Next, we consider \(\mathbf {II}_i^\sigma \). Lemma 3.1 yields

$$\begin{aligned} \frac{1}{N!}\sum _{\sigma \in {\mathfrak {S}}_N}\sum _{i=1}^N\mathbf {II}_i^\sigma&\lesssim \frac{1}{N!}\left( \max _{1\le j\le m}\beta ^{m-j}\Vert h^{(j)}\Vert _\infty \right) \\&\quad \times \sum _{\sigma \in {\mathfrak {S}}_N}\sum _{i=1}^N{{\,\mathrm {E}\,}}\left[ (|X_{\sigma (i)}|^{m}+|Y_{\sigma (i)}|^{m})\max _{1\le k\le d}\left| V^\sigma _{k,i}\right| ^m;{\mathcal {E}}_{\sigma ,i}^c\right] . \end{aligned}$$

Since \(X_{\sigma (i)}\) and \(Y_{\sigma (i)}\) are independent of \(\varvec{V}^\sigma _{i}\), Lemma A.6 and (6.8) imply that

$$\begin{aligned} \Vert (|X_{\sigma (i)}|+|Y_{\sigma (i)}|)\Vert \varvec{V}^\sigma _{i}\Vert _{\ell _\infty }\Vert _{r}\le L_{\alpha ,\overline{q}_d}r^{\overline{q}_d/\alpha }M_N\Lambda _{\sigma (i)} \end{aligned}$$

for every \(r\ge 1\) with \(L_{\alpha ,\overline{q}_d}>0\) depending only on \(\alpha ,\overline{q}_d\). Thus, by Lemma A.4 we obtain

$$\begin{aligned} P\left( (|X_{\sigma (i)}|+|Y_{\sigma (i)}|)\Vert \varvec{V}^\sigma _{i}\Vert _{\ell _\infty }\ge x\right) \le e^{\overline{q}_d/\alpha }\cdot \exp \left( -\left( \frac{x}{L'_{\alpha ,\overline{q}_d}M_N\Lambda _{\sigma (i)}}\right) ^{\alpha /\overline{q}_d}\right) \end{aligned}$$

for every \(x>0\) with \(L'_{\alpha ,\overline{q}_d}>0\) depending only on \(\alpha ,\overline{q}_d\). So Lemma 6.1 yields

$$\begin{aligned}&\frac{1}{N!}\sum _{\sigma \in {\mathfrak {S}}_N}\sum _{i=1}^N\mathbf {II}_i^\sigma \\&\quad \lesssim \left( \max _{1\le j\le m}\beta ^{m-j}\Vert h^{(j)}\Vert _\infty \right) \sum _{i=1}^N\left( \tau \rho \vee 1\right) ^mM_N^m\Lambda _{i}^m \exp \left( -\left( \frac{\tau \rho }{L'_{\alpha ,\overline{q}_d}}\right) ^{\alpha /\overline{q}_d}\right) . \end{aligned}$$

Combining this inequality with (6.2), (6.3) and (6.11), we complete the proof. \(\square \)

7 Proof of the Main Results

7.1 Proof of Theorem 2.1

The following result is a version of [47, Lemma 4.3]. Since the proof is a minor modification of the latter's, we omit it.

Lemma 7.1

Let \(q\in {\mathbb {N}}\) and \(f:[N]^q\rightarrow {\mathbb {R}}\) be a symmetric function vanishing on diagonals. Also, let \(\varvec{X}=(X_i)_{i=1}^N\) and \(\varvec{Y}=(Y_i)_{i=1}^N\) be two sequences of independent centered random variables with unit variance. Suppose that there are integers \(3\le m\le l\) such that \(M_N:=\max _{1\le i\le N}(\Vert X_i\Vert _{l}\vee \Vert Y_i\Vert _{l})<\infty \) and \(\mathrm {E}[X_i^r]=\mathrm {E}[Y_i^r]\) for all \(i\in [N]\) and \(r\in [m-1]\). Then, we have \(Q(f;\varvec{X}),Q(f;\varvec{Y})\in L^l(P)\) and

$$\begin{aligned}&|\mathrm {E}[Q(f;\varvec{X})^l]-\mathrm {E}[Q(f;\varvec{Y})^l]|\\&\quad \le CM_N^{ql}(1\vee \Vert f\Vert _{\ell _2})^{l-m}\sum _{i=1}^N\max \{{{\,\mathrm{Inf}\,}}_i(f)^{\frac{m}{2}},{{\,\mathrm{Inf}\,}}_i(f)^{\frac{l}{2}}\}, \end{aligned}$$

where \(C>0\) depends only on ql.

Proof of Theorem 2.1

Throughout the proof, for two real numbers a and b, the notation \(a\lesssim b\) means that \(a\le cb\) for some constant \(c>0\) which depends only on \(\alpha ,\overline{q}_d\). Moreover, if \((\log d)^{\mu +\frac{1}{2}}\delta _1[\varvec{Q}(\varvec{X})]^{\frac{1}{3}}\ge 1\), then the claim evidently holds true with \(C=1\), so we may assume \((\log d)^{\mu +\frac{1}{2}}\delta _1[\varvec{Q}(\varvec{X})]^{\frac{1}{3}}<1\).

Set \(s_i:=\mathrm {E}[X_i^3]\) for every i. We take a sequence \(\varvec{Y}=(Y_i)_{i=1}^N\) of independent random variables such that

$$\begin{aligned} Y_i\sim \left\{ \begin{array}{cl} {\mathcal {N}}(0,1) &{} \text {if }s_i=0, \\ \gamma _+(4/s_i^2) &{} \text {if }s_i>0, \\ \gamma _-(4/s_i^2) &{} \text {if }s_i<0. \end{array} \right. \end{aligned}$$
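By the moment identities for \(\gamma _\pm (\nu )\) recorded in Section 5, this choice matches the third moments and controls the fourth: for \(s_i\ne 0\),

$$\begin{aligned} \mathrm {E}[Y_i^3]=\pm \frac{2}{\sqrt{4/s_i^2}}=\pm |s_i|=s_i,\qquad \mathrm {E}[Y_i^4]=3+\frac{6}{4/s_i^2}=3+\frac{3}{2}s_i^2, \end{aligned}$$

while the first and second moments match automatically.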

By construction, we have \(\mathrm {E}[X_i^r]=\mathrm {E}[Y_i^r]\) for any \(i\in [N]\) and \(r\in [3]\). Moreover, one can easily check that \(\Vert Y_i\Vert _r\le \overline{B}_N(r-1)^w\) for any \(i\in [N]\) and \(r\ge 2\). Hence, by Lemma A.5 we have \(\max _{1\le i\le N}\Vert Y_i\Vert _{\psi _\alpha }\le c_\alpha \overline{B}_N\) with \(c_\alpha \ge 1\) depending only on \(\alpha \). Therefore, applying Proposition 6.1 with \(m=4\), we obtain

$$\begin{aligned}&\Delta _\varepsilon (\varvec{Q}(\varvec{X}),\varvec{Q}(\varvec{Y}))\\&\quad \le C_1\varepsilon ^{-4}(\log d)^3 \left\{ (\log d)^{4(\overline{q}_d-1)/\alpha }\max _{1\le k\le d}\overline{B}_N^{4q_k}\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(f_k)^{2} \right. \\&\qquad +\left( e^{-\left( \frac{\tau }{K_1}\right) ^{\alpha }} +\chi _{(\overline{q}_d-1)/\alpha }\left( \frac{\rho }{K_2}\right) (\rho \vee 1)^4\right. \\&\left. \left. \qquad +\exp \left( -\left( \frac{\tau \rho }{K_3\overline{B}_N}\right) ^{\alpha /\overline{q}_d}\right) (\tau \rho \vee 1)^4 \right) \overline{B}_N^4\sum _{i=1}^N\Lambda _{i}^4 \right\} \\&\quad =:C_1\varepsilon ^{-4}(\log d)^3\left( {\mathbb {I}}+\mathbb {II}\right) \end{aligned}$$

for any \(\varepsilon >0\) and \(\tau ,\rho \ge 0\) with \(\tau \rho c_\alpha \overline{B}_N\max _{1\le i\le N}\Lambda _i\le \varepsilon /\log d\), where \(C_1,K_1,K_2,K_3>0\) depend only on \(\alpha ,\overline{q}_d\), and \(\Lambda _i:=(\log d)^{(\overline{q}_d-1)/\alpha }\max _{1\le k\le d}\overline{B}_N^{q_k-1}\sqrt{{{\,\mathrm{Inf}\,}}_{i}(f_k)}\). We apply this inequality with \(\tau :=(\log d^2)^{1/\alpha }\{K_1\vee (K_3/K_2)\}\), \(\rho :=(\log d^2)^{(\overline{q}_d-1)/\alpha }K_2\) and

$$\begin{aligned} \varepsilon :=(\log d)^{1/6}\delta _0[\varvec{Q}(\varvec{X})]^{1/3}+(\log d)^{\mu }\delta _1[\varvec{Q}(\varvec{X})]^{1/3}+(\log d)\tau \rho c_\alpha \overline{B}_N\max _{1\le i\le N}\Lambda _i. \end{aligned}$$

By the choice of \(\tau \) and \(\rho \), we have

$$\begin{aligned} \mathbb {II}&\lesssim d^{-1}\sum _{i=1}^N\sum _{k=1}^d\overline{B}_N^{4q_k}{{\,\mathrm{Inf}\,}}_{i}(f_k)^2 \le \max _{1\le k\le d}\overline{B}_N^{4q_k}\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_{i}(f_k)^2. \end{aligned}$$

Therefore, we obtain

$$\begin{aligned} \Delta _\varepsilon (\varvec{Q}(\varvec{X}),\varvec{Q}(\varvec{Y}))&\lesssim \varepsilon ^{-4}(\log d)^{3+4(\overline{q}_d-1)/\alpha }\max _{1\le k\le d}\overline{B}_N^{4q_k}\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(f_k)^2\\&\lesssim \varepsilon ^{-4}(\log d)^{3+4(\overline{q}_d-1)/\alpha }\delta _1[\varvec{Q}(\varvec{X})]^2 \\&\le (\log d)^{3+4\{(\overline{q}_d-1)/\alpha -\mu \}}\delta _1[\varvec{Q}(\varvec{X})]^{2/3}. \end{aligned}$$

Since \(3+4\{(\overline{q}_d-1)/\alpha -\mu \}\le \frac{4}{3\alpha }(\overline{q}_d-1)+\frac{5}{3}\le 2\mu +1\) and \((\log d)^{\mu +\frac{1}{2}}\delta _1[\varvec{Q}(\varvec{X})]^{\frac{1}{3}}<1\), we conclude that

$$\begin{aligned} \Delta _\varepsilon (\varvec{Q}(\varvec{X}),\varvec{Q}(\varvec{Y}))\lesssim (\log d)^{2\mu +1}\delta _1[\varvec{Q}(\varvec{X})]^{2/3} \le (\log d)^{\mu +\frac{1}{2}}\delta _1[\varvec{Q}(\varvec{X})]^{\frac{1}{3}}. \end{aligned}$$
(7.1)

Meanwhile, Proposition 5.1 yields

$$\begin{aligned} \Delta _{\varepsilon }(\varvec{Q}(\varvec{Y}),Z)&\lesssim \varepsilon ^{-2}(\log d)\left( \delta _0[\varvec{Q}(\varvec{Y})]+\delta _2[\varvec{Q}(\varvec{Y})]\right) . \end{aligned}$$

Now, in the present situation, the constants \(w_*\), \(\overline{v}_N\) and \(\underline{\eta }_N\) appearing in Proposition 5.1 satisfy \(w_*=w\), \(\overline{v}_N\le 2+\overline{A}_N^2/2\) and \(\underline{\eta }_N^{-1}\le \overline{A}_N/2\), so we have

$$\begin{aligned}&\delta _2[\varvec{Q}(\varvec{Y})]\\&\quad \le (\overline{A}_N/2)^{2w\overline{q}_d-1}(\log d)^{2w\overline{q}_d-1} \max _{1\le j,k\le d}\left\{ 1_{\{q_j< q_k\}}\Vert Q(f_j;\varvec{Y})\Vert _4\kappa _4(Q(f_k;\varvec{Y}))^{1/4}\right. \\&\qquad \left. +1_{\{q_j=q_k\}}\sqrt{2\kappa _4(Q(f_k;\varvec{Y}))+\left( (1+\overline{A}_N^2/4)^{q_k}-1\right) (2q_k)!\mathfrak {c}_{q_k} \sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(f_k)^2}\right\} . \end{aligned}$$

Moreover, by a standard hypercontractivity argument, we have \(\Vert Q(f_j;\varvec{Y})\Vert _4\lesssim \overline{A}_N^{q_j}\Vert Q(f_j;\varvec{Y})\Vert _2\) for every j. Also, since Lemma A.5 yields \(\max _{1\le i\le N}\Vert Y_i\Vert _{\psi _\alpha }\lesssim \underline{\eta }_N^{-1}\), by Lemma 7.1 (with \(l=m=4\)) we obtain

$$\begin{aligned} |\mathrm {E}[Q(f_k;\varvec{X})^4]-\mathrm {E}[Q(f_k;\varvec{Y})^4]| \lesssim \overline{A}_N^{4q_k}\sum _{i=1}^N{{\,\mathrm{Inf}\,}}_i(f_k)^2 \end{aligned}$$

for every k. Since we have \(\Vert Q(f_k;\varvec{Y})\Vert _2=\sqrt{q_k!}\Vert f_k\Vert _{\ell _2}=\Vert Q(f_k;\varvec{X})\Vert _2\) for every k, it holds that \( \delta _2[\varvec{Q}(\varvec{Y})]\lesssim (\log d)^{2w\overline{q}_d-1}\delta _1[\varvec{Q}(\varvec{X})]. \) Consequently, we obtain

$$\begin{aligned} \Delta _{\varepsilon }(\varvec{Q}(\varvec{Y}),Z)&\lesssim (\log d)^{2/3}\delta _0[\varvec{Q}(\varvec{X})]^{1/3}+(\log d)^{2(w\overline{q}_d-\mu )}\delta _1[\varvec{Q}(\varvec{X})]^{1/3}. \end{aligned}$$

Since \(2(w\overline{q}_d-\mu )\le \frac{2}{3}w\overline{q}_d+\frac{1}{3}\le \mu +\frac{1}{2}\), we conclude that

$$\begin{aligned} \Delta _{\varepsilon }(\varvec{Q}(\varvec{Y}),Z)\lesssim (\log d)^{2/3}\delta _0[\varvec{Q}(\varvec{X})]^{1/3}+(\log d)^{\mu +\frac{1}{2}}\delta _1[\varvec{Q}(\varvec{X})]^{1/3}. \end{aligned}$$
(7.2)

Now, (7.1)–(7.2) imply that

$$\begin{aligned} \Delta _{\varepsilon }(\varvec{Q}(\varvec{X}),Z)\le & {} \Delta _{\varepsilon }(\varvec{Q}(\varvec{X}),\varvec{Q}(\varvec{Y}))+\Delta _{\varepsilon }(\varvec{Q}(\varvec{Y}),Z)\\\lesssim & {} (\log d)^{2/3}\delta _0[\varvec{Q}(\varvec{X})]^{1/3}+(\log d)^{\mu +\frac{1}{2}}\delta _1[\varvec{Q}(\varvec{X})]^{1/3}. \end{aligned}$$

Therefore, Proposition 3.1 yields

$$\begin{aligned}&\sup _{x\in {\mathbb {R}}^d}\left| P(\varvec{Q}(\varvec{X})\le x)-P(Z\le x)\right| \\&\quad \lesssim (\log d)^{2/3}\delta _0[\varvec{Q}(\varvec{X})]^{1/3}+(\log d)^{\mu +\frac{1}{2}}\delta _1[\varvec{Q}(\varvec{X})]^{1/3}+\underline{\sigma }^{-1}\varepsilon \sqrt{\log d}\\&\quad \lesssim (1+\underline{\sigma }^{-1})\{(\log d)^{2/3}\delta _0[\varvec{Q}(\varvec{X})]^{1/3}+(\log d)^{\mu +\frac{1}{2}}\delta _1[\varvec{Q}(\varvec{X})]^{1/3}\} \\&\qquad +\underline{\sigma }^{-1}(\log d)^{(2\overline{q}_d-1)/\alpha +\frac{3}{2}}\max _{1\le k\le d}\overline{B}_N^{q_k}\sqrt{{\mathcal {M}}(f_k)}. \end{aligned}$$

This completes the proof. \(\square \)

7.2 Proof of Corollaries 2.1 and 2.2

Corollary 2.1 can be shown in an analogous manner to the proof of [18, Corollary 5.1] by applying Theorem 2.1 instead of [18, Lemma 5.1]. Corollary 2.2 then immediately follows from Corollary 2.1. \(\square \)

7.3 Proof of Theorem 2.2

Lemma 7.2

Let \(q\ge 2\) and \(f:[N]^q\rightarrow {\mathbb {R}}\) be a symmetric function vanishing on diagonals. Suppose that the sequence \(\varvec{X}\) satisfies one of conditions (A)–(C). Then, we have \(\kappa _4(Q(f;\varvec{X}))\ge 0\) and

$$\begin{aligned} {\mathcal {M}}(f)\le \max _{1\le r\le q-1}\Vert f\star _rf\Vert _{\ell _2}\le \frac{1}{q\cdot q!}\sqrt{\kappa _4(Q(f;\varvec{X}))}. \end{aligned}$$
(7.3)

Proof

The first inequality in (7.3) is a consequence of Eq. (1.9) in [47] (note that their definition of \({{\,\mathrm{Inf}\,}}_i(f)\) is ours divided by \((q-1)!\)). To prove the second inequality in (7.3), suppose first that \(\varvec{X}\) satisfies condition (A). Let \(\varvec{G}=(G_i)_{i\in {\mathbb {N}}}\) be a sequence of independent standard normal variables. Then, by [45, Proposition 3.1] we have \(\kappa _4(Q(f;\varvec{X}))\ge \kappa _4(Q(f;\varvec{G}))\), so (5.11) yields the desired result. Next, when \(\varvec{X}\) satisfies condition (B), the desired result follows from Eq. (5.3) in [29]. Finally, when \(\varvec{X}\) satisfies condition (C), the desired result follows directly from (5.11). This completes the proof. \(\square \)

Lemma 7.3

Let F, G be two random variables such that \(\Vert F\Vert _{\psi _\alpha }\vee \Vert G\Vert _{\psi _\alpha }<\infty \) for some \(\alpha >0\). Then, we have

$$\begin{aligned} |\mathrm {E}[|F|^r]-\mathrm {E}[|G|^r]| \le \frac{2r(\Vert F\Vert _{\psi _\alpha }^r+\Vert G\Vert _{\psi _\alpha }^r)}{\alpha b}\varvec{\Gamma }\left( \frac{r}{\alpha b}\right) \sup _{x\in {\mathbb {R}}}|P(F\le x)-P(G\le x)|^b \end{aligned}$$

for any \(r\ge 1\) and \(b\in (0,1)\), where \(\varvec{\Gamma }\) denotes the gamma function.

Proof

This is an easy consequence of [55, Theorem 8.16] and Lemma A.3. \(\square \)

Proof of Theorem 2.2

The inequality \(\kappa _4(Q(f;\varvec{X}))\ge 0\) is proved in Lemma 7.2. The implications (iv) \(\Rightarrow \) (iii) \(\Rightarrow \) (ii) are obvious, and the implication (i) \(\Rightarrow \) (iv) follows from Corollary 2.2 and Lemma 7.2.

It remains to prove (ii) \(\Rightarrow \) (i). In view of Lemma 7.3, it is enough to prove \(\sup _{n\in {\mathbb {N}}}\max _{1\le j\le d_n}(\Vert Q(f_{n,j};\varvec{X})\Vert _{\psi _{\beta }}+\Vert Z_{n,j}\Vert _{\psi _{\beta }})<\infty \) for some \(\beta >0\). This follows from Lemmas 6.2 and A.5. \(\square \)

7.4 Proof of Lemma 2.1

Let us define the sequence of random variables \((Y_i)_{i=1}^N\) in the same way as in the proof of Theorem 2.1. Then, since \(\mathrm {E}[Y_i^4]=3+\frac{3}{2}\mathrm {E}[X_i^3]^2\le \frac{9}{2}M\) (recall that \(\mathrm {E}[X_i^3]^2\le \mathrm {E}[X_i^4]\) by the Schwarz inequality), Lemma 7.1 yields

$$\begin{aligned} |\mathrm {E}[Q(f;\varvec{X})^4]-\mathrm {E}[Q(f;\varvec{Y})^4]|\le C_1M^{q}\mathcal {M}(f)\Vert f\Vert _{\ell _2}^2, \end{aligned}$$

where \(C_1>0\) depends only on q. Now, since \(\mathrm {E}[Q(f;\varvec{X})^2]=q!\Vert f\Vert _{\ell _2}^2=\mathrm {E}[Q(f;\varvec{Y})^2]\) and \(\sqrt{\kappa _4(Q(f;\varvec{Y}))}\ge q\cdot q!\max _{1\le r\le q-1}\Vert f\star _rf\Vert _{\ell _2}\) by Lemma 7.2, we obtain the desired result. \(\square \)

7.5 Proof of Proposition 2.3

Lemma 7.4

Let \(\varvec{X}=(X_i)_{i=1}^N\) be a sequence of independent centered random variables with unit variance and such that \(M:=\max _{1\le i\le N}\mathrm {E}[X_i^4]<\infty \). Also, let \(f:[N]^2\rightarrow {\mathbb {R}}\) be a symmetric function vanishing on diagonals. Then, we have

$$\begin{aligned} |\kappa _4(Q(f;\varvec{X}))|\le C\left\{ (1+M)\Vert f\Vert _{\ell _2}^2\mathcal {M}(f)+{{\,\mathrm{tr}\,}}([f]^4)\right\} , \end{aligned}$$

where \(C>0\) is a universal constant.

Proof

By Proposition 3.1 and Eq. (3.1) in [21], we have

$$\begin{aligned} |\mathrm {E}[Q(f;\varvec{X})^4]-6G_{\text {V}}|\le G_{\text {I}}+18G_{\text {II}}+24|G_{\text {IV}}|, \end{aligned}$$

where

$$\begin{aligned} G_{\text {I}}&= 2^3\sum _{(i,j)\in \Delta _2^N}f(i,j)^4\mathrm {E}[X_i^4]\mathrm {E}[X_j^4] , \\ G_{\text {II}}&= 2^3\sum _{(i,j,k)\in \Delta _3^N}f(i,j)^2f(i,k)^2\mathrm {E}[X_i^4],\\ G_{\text {IV}}&= 2\sum _{(i,j,k,l)\in \Delta _4^N}f(i,j)f(i,k)f(l,j)f(l,k) ,\\ G_{\text {V}}&= 2\sum _{(i,j,k,l)\in \Delta _4^N}f(i,j)^2f(k,l)^2. \end{aligned}$$
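Before proceeding, we verify an elementary estimate on \(G_{\text {V}}\) (recall that \(\mathcal {M}(f)=\max _{1\le i\le N}{{\,\mathrm{Inf}\,}}_i(f)\) denotes the maximal influence). Since f vanishes on diagonals, \(\Vert f\Vert _{\ell _2}^2=\sum _{(i,j)\in \Delta _2^N}f(i,j)^2\), so \(2\Vert f\Vert _{\ell _2}^4-G_{\text {V}}\) consists precisely of the terms \(2f(i,j)^2f(k,l)^2\) with \(\{i,j\}\cap \{k,l\}\ne \emptyset \). For fixed \((i,j)\in \Delta _2^N\), summing \(f(k,l)^2\) over such \((k,l)\in \Delta _2^N\) gives at most \(2({{\,\mathrm{Inf}\,}}_i(f)+{{\,\mathrm{Inf}\,}}_j(f))\le 4\mathcal {M}(f)\) by the symmetry of f, whence

$$\begin{aligned} 0\le 2\Vert f\Vert _{\ell _2}^4-G_{\text {V}}\le 8\Vert f\Vert _{\ell _2}^2\mathcal {M}(f). \end{aligned}$$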

Since \(\mathrm {E}[Q(f;\varvec{X})^2]=2\Vert f\Vert _{\ell _2}^2\), this estimate yields \(|\kappa _4(Q(f;\varvec{X}))|\le |\mathrm {E}[Q(f;\varvec{X})^4]-6G_{\text {V}}|+48\Vert f\Vert _{\ell _2}^2\mathcal {M}(f)\). Meanwhile, a straightforward computation yields

$$\begin{aligned} {{\,\mathrm{tr}\,}}([f]^4)&=\sum _{(i,j)\in \Delta _2^N}f(i,j)^4 +2\sum _{(i,j,k)\in \Delta _3^N}f(i,k)^2f(j,k)^2\\&\quad +\sum _{(i,j,k,l)\in \Delta _4^N}f(i,k)f(j,k)f(i,l)f(j,l). \end{aligned}$$

Hence, we obtain

$$\begin{aligned}&|\mathrm {E}[Q(f;\varvec{X})^4]-6G_{\text {V}}| \\&\quad \le C_1\left\{ (1+M)\left( \sum _{i,k=1}^Nf(i,k)^4+\sum _{(i,j,k)\in \Delta _3^N}f(i,k)^2f(j,k)^2\right) \right. \\&\qquad \left. +{{\,\mathrm{tr}\,}}([f]^4)\right\} , \end{aligned}$$

where \(C_1>0\) is a universal constant. Since it holds that

$$\begin{aligned} \max \left\{ \sum _{(i,j)\in \Delta _2^N}f(i,j)^4,\sum _{(i,j,k)\in \Delta _3^N}f(i,k)^2f(j,k)^2\right\} \le \Vert f\Vert _{\ell _2}^2\mathcal {M}(f), \end{aligned}$$

we obtain the desired result. \(\square \)

Proof of Proposition 2.3

The desired result immediately follows from Corollary 2.2 and Lemma 7.4. \(\square \)

7.6 Proof of Proposition 2.5

Define the \(n_1\times n_2\) matrix \(\Xi _n(\theta )\) by \(\Xi _n(\theta )= (\frac{1}{2}\Delta _i^nX^1K^{ij}_\theta \Delta _j^nX^2)_{i,j},\) and set

$$\begin{aligned} {\widetilde{\Xi }}_n(\theta )= \left( \begin{array}{cc} O &{} \Xi _n(\theta ) \\ \Xi _n(\theta )^\top &{} O \end{array} \right) . \end{aligned}$$

Note that \(U_n^*(\theta )={\varvec{w}}^\top {\widetilde{\Xi }}_n(\theta ){\varvec{w}}\) with \({\varvec{w}}=((w^1_i)_{i=1}^{n_1},(w^2_j)_{j=1}^{n_2})^\top \). Hence, by Proposition 2.3, it suffices to prove

$$\begin{aligned}&\log ^2(\#{\mathcal {G}}_n)\cdot \sqrt{n}\max _{\theta ,\theta '\in {\mathcal {G}}_n}\left| \mathrm {E}[U_n(\theta )U_n(\theta ')]-\mathrm {E}[U^*_n(\theta )U^*_n(\theta ')\mid X]\right| \rightarrow ^p0, \end{aligned}$$
(7.4)
$$\begin{aligned}&\log ^5(\#{\mathcal {G}}_n)\max _{\theta \in {\mathcal {G}}_n}\sqrt{{{\,\mathrm{tr}\,}}\left( {\widetilde{\Xi }}_n(\theta )^4\right) }\rightarrow ^p0, \end{aligned}$$
(7.5)
$$\begin{aligned}&\log ^5(\#{\mathcal {G}}_n)\cdot \max _{\theta \in {\mathcal {G}}_n}\nonumber \\&\quad \sqrt{\max _{1\le i\le n_1}n\sum _{j=1}^{n_2}(\Delta ^n_iX^1)^2(\Delta ^n_jX^2)^2K^{ij}_\theta +\max _{1\le j\le n_2}n\sum _{i=1}^{n_1}(\Delta ^n_iX^1)^2(\Delta ^n_jX^2)^2K^{ij}_\theta }\rightarrow ^p0. \end{aligned}$$
(7.6)

Conditions (7.4) and (7.5) are established in the proof of [35, Proposition B.8] under the current assumptions. Moreover, arguing as in the bound for the quantity \(\mathrm {E}[R_{n,1}^*]\) in the proof of [35, Proposition B.8], we deduce that for any \(p\ge 1\)

$$\begin{aligned}&{{\,\mathrm{E}\,}}\left[ \left| \max _{\theta \in {\mathcal {G}}_n}\max _{i}n\sum _{j}(\Delta _i^n X^1)^2(\Delta _j^n X^2)^2K_\theta ^{ij}\right| ^p\right] \\&\quad \le n^p\sum _i{{\,\mathrm{E}\,}}\left[ (\Delta _i^n X^1)^{2p}\max _{\theta \in {\mathcal {G}}_n}\left| \sum _{j}(\Delta _j^n X^2)^2K_\theta ^{ij}\right| ^p\right] \\&\quad \le n^p\sum _i\sqrt{{{\,\mathrm{E}\,}}\left[ (\Delta _i^n X^1)^{4p}\right] {{\,\mathrm{E}\,}}\left[ \max _{\theta \in {\mathcal {G}}_n}\left| \sum _{j}(\Delta _j^n X^2)^2K_\theta ^{ij}\right| ^{2p}\right] }\\&\quad =O\left( n^pr_n^{p-1}(r_n\log (\#{\mathcal {G}}_n))^p\right) . \end{aligned}$$

Exchanging \(X^1\) and \(X^2\), we obtain a similar estimate. Hence, (7.6) holds by assumption. \(\square \)