Abstract
We study the distribution of a general class of asymptotically linear statistics which are symmetric functions of N independent observations. The distribution functions of these statistics are approximated by an Edgeworth expansion with a remainder of order \(o(N^{-1})\). The Edgeworth expansion is based on Hoeffding’s decomposition, which provides a stochastic expansion into a linear part, a quadratic part and smaller higher-order parts. The validity of this Edgeworth expansion is proved under Cramér’s condition on the linear part, moment assumptions for all parts of the statistic and an optimal dimensionality requirement for the non-linear part.
1 Introduction and results
1.1 Introduction
Let \(X,\, X_1,X_2,\dots , X_N\) be independent and identically distributed random variables taking values in a measurable space \((\mathcal X,\mathcal B)\). Let \(P_X\) denote the distribution of X on \((\mathcal X,\mathcal B)\). We assume that \(\mathbb T(X_1,\dots , X_N)\) is a symmetric function of its arguments (a symmetric statistic, for short). Furthermore, we assume that the moments \(\mathbf{E}\mathbb T\) and \(\sigma _{\mathbb T}^2:=\mathbf{Var}\mathbb T\) are finite. A function of the observations \(X_1,\dots , X_N\) is called a linear statistic if it can be represented as a sum of functions each depending on a single observation only. Many important statistics are non-linear but can be approximated by a linear statistic; we call these statistics asymptotically linear. The central limit theorem and the normal approximation with rate \(O(N^{-1/2})\) extend to the class of asymptotically linear statistics as well. Our approach to studying the distribution of this class of statistics in the statistically relevant case of asymptotically normal \(\mathbb T\) is based on Hoeffding’s decomposition of \(\mathbb T\), see Hoeffding [31], Efron and Stein [21] and van Zwet [37]. Hoeffding’s decomposition expands \(\mathbb T\) into a series of centered and mutually uncorrelated U-statistics of increasing order
Let L, Q and K denote the first, the second and the third sum. We call L the linear part, Q the quadratic part and K the cubic part of the decomposition. We shall consider a general situation where the kernel \(\mathbb T=\mathbb T^{(N)}\), the space \((\mathcal X,\mathcal B)=(\mathcal X^{(N)},\mathcal B^{(N)})\) and the distribution \(P_X=P_X^{(N)}\) all depend on N as \(N\rightarrow \infty \). In order to keep the notation simple we drop the subscript N in what follows. An improvement over the normal approximation is obtained by using Edgeworth expansions for the distribution function \(\mathbb F(x)=\mathbf{P}\{\mathbb T-\mathbf{E}\mathbb T\le \sigma _{\mathbb T}x\}\). For this purpose we write Hoeffding’s decomposition in the form
where R denotes the remainder. For a number of important examples of asymptotically linear statistics we have \(R/\sigma _{\mathbb T}=o_P(N^{-1})\) (in probability) as \(N\rightarrow \infty \). Therefore, the U-statistic \(\sigma _{\mathbb T}^{-1}(L+Q+K)\) can be viewed as a stochastic expansion of \((\mathbb T-\mathbf{E}\mathbb T)/\sigma _{\mathbb T}\) up to the order \(o_P(N^{-1})\).
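For a kernel of degree two the first Hoeffding projections can be computed explicitly as conditional expectations. The following small numerical sketch (with an illustrative kernel and distribution of our own choosing, not taken from the paper) checks the defining orthogonality of the quadratic kernel \(\psi\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative kernel (our choice): h(x, y) = x*y + x**2 * y**2,
# with X uniform on (-1/2, 1/2).  Then E h = 1/144 and the first
# Hoeffding projection is g(x) = E h(x, X) - E h = x**2/12 - 1/144.
def h(x, y):
    return x * y + x**2 * y**2

Eh = 1.0 / 144.0
def g(x):                       # kernel of the linear part
    return x**2 / 12.0 - Eh
def psi(x, y):                  # kernel of the quadratic part
    return h(x, y) - Eh - g(x) - g(y)

# Monte Carlo check of the defining property E[psi(x, X)] = 0 for fixed x:
# the quadratic part is uncorrelated with any function of one observation.
X = rng.uniform(-0.5, 0.5, size=1_000_000)
for x0 in (-0.4, 0.0, 0.3):
    assert abs(psi(x0, X).mean()) < 2e-3
```

The same centering scheme, iterated over subsets of observations, produces the higher-order kernels of the decomposition.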
Furthermore, a so-called Edgeworth expansion of \(\sigma _{\mathbb T}^{-1}(L+Q+K)\) can be used to approximate \(\mathbb F(x)\) by a smooth distribution function G(x) as defined in (2) below depending on N and moments of \(\mathbb T\). A two term Edgeworth expansion of the distribution function of \(\sigma _{\mathbb T}^{-1}(L+Q+K)\) is given by
Here \(\Phi \) respectively \(\Phi '\) denote the standard normal distribution function and its derivative. Furthermore, we introduce \(\sigma ^2=\mathbf{E}g^2(X_1)\) and
Our main result, Theorem 1 below, establishes a bound \(o(N^{-1})\) for the Kolmogorov distance between \(\mathbb F(x)\) and G(x):
Valid expansions of this type were established by Cramér [19] for sums of independent random variables \(X_j\) and later for the Student statistic (which is of type (1)) by Kai-Lai Chung [18]. A new impetus for studying higher order approximations in statistics was given by the fundamental paper of Hodges and Lehmann on deficiency [30], where they compared the power of two tests based on N and \(N'\) observations respectively, with \(N'-N=o(N)\) as \(N\rightarrow \infty \). They suggested a program of comparisons of the power of tests, estimators and confidence regions based on classical parametric and non-parametric symmetric statistics, e.g. those using ranks and ordered samples. They noted that this would require going beyond Gaussian limit theorems to asymptotic expansions to order \(N^{-1}\). For more details on the statistical relevance and the related development of asymptotic methods we refer to the review paper in memory of Willem van Zwet [12].
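In the classical case of standardized i.i.d. sums the two-term Edgeworth expansion is fully explicit, and its gain over the normal approximation is easy to observe numerically. The sketch below uses the standard expansion for sums (with skewness and excess kurtosis denoted lam3, lam4; our notation), not the statistic-specific G of (2):

```python
import numpy as np
from math import erf, sqrt, pi

rng = np.random.default_rng(1)

def Phi(x):            # standard normal distribution function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))
def phi(x):            # its derivative
    return np.exp(-x * x / 2.0) / sqrt(2.0 * pi)

# Classical two-term Edgeworth expansion for a standardized i.i.d. sum
# with skewness lam3 and excess kurtosis lam4 (see e.g. Petrov [33]).
def edgeworth(x, N, lam3, lam4):
    Phi_x = np.vectorize(Phi)(x)
    corr = (lam3 * (x**2 - 1) / (6 * sqrt(N))
            + (lam4 * (x**3 - 3 * x) / 24
               + lam3**2 * (x**5 - 10 * x**3 + 15 * x) / 72) / N)
    return Phi_x - phi(x) * corr

# Standardized sums of N exponential(1) variables: lam3 = 2, lam4 = 6.
N, M = 20, 500_000
S = rng.exponential(1.0, size=(M, N)).sum(axis=1)
Z = np.sort((S - N) / sqrt(N))

xs = np.linspace(-4, 4, 161)
F_emp = np.searchsorted(Z, xs, side='right') / M
err_norm = np.max(np.abs(F_emp - np.vectorize(Phi)(xs)))
err_edge = np.max(np.abs(F_emp - edgeworth(xs, N, 2.0, 6.0)))
assert err_edge < 0.5 * err_norm   # the expansion clearly beats the normal fit
```

The Kolmogorov distance to the normal law here is of order \(N^{-1/2}\), while the two-term expansion reduces it to \(o(N^{-1})\), in line with the bound of Theorem 1.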
Now we discuss the principal contribution of this paper: the minimal smoothness and structural conditions under which approximation (3) holds. Let us emphasize that any \(\mathbb F\) satisfying (3) cannot have fluctuations/increments of order \(\Theta (N^{-1})\) on intervals of size \(o(N^{-1})\), because G is a differentiable function with all derivatives bounded. We focus on the conditions that guarantee the necessary level of smoothness of the distribution of \(\mathbb T\). In the case of a linear statistic \(\mathbb T=\mathbf{E}\mathbb T+L\) the necessary smoothness of \(\mathbb F\) is ensured by the classical Cramér condition
This condition excludes, in particular, lattice distributions, for which approximation (3) obviously fails. We note that condition (C) can be weakened to cover some special classes of discrete distributions which are sufficiently non-lattice, see e.g. Bickel and Robinson [13], Angst and Poly [1] or Bobkov [14] for almost sure choices of such non-lattice discrete distributions.
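Condition (C) is easy to probe numerically: a continuous distribution keeps \(|\mathbf{E}\exp\{itX\}|\) bounded away from 1 over large frequency ranges, while a lattice distribution does not. A minimal check with two illustrative distributions of our own choosing:

```python
import numpy as np

# Numerical look at Cramér's condition (C): limsup_{|t| -> oo} |E exp(itX)| < 1.
t = np.linspace(1.0, 50.0, 200_001)

# X uniform on (-1/2, 1/2): characteristic function sin(t/2)/(t/2),
# whose modulus stays bounded away from 1 for |t| >= 1 -- (C) holds.
cf_uniform = np.abs(np.sin(t / 2) / (t / 2))

# X Rademacher (+-1 with probability 1/2): characteristic function cos(t),
# which returns to modulus 1 at every multiple of pi -- (C) fails.
cf_lattice = np.abs(np.cos(t))

assert cf_uniform.max() < 0.97   # bounded away from 1
assert cf_lattice.max() > 0.999  # keeps coming back to 1
```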
Since the class of symmetric statistics should include linear statistics, we require a Cramér type condition, but on the linear part of the statistic only, see (7). Interestingly, this condition together with appropriate moment conditions on the various parts of the decomposition (1) already guarantees an approximation error \(\Delta =O(N^{-1})\) for general symmetric statistics (see [4]). But (7) is not sufficient for the desired error bound \(o(N^{-1})\), even for U-statistics of degree two, see Example 1 below. The reason why (7) alone is not sufficient for the approximation accuracy \(\Delta =o(N^{-1})\) is the potential occurrence of a very special relation between the linear and quadratic parts L and Q that fosters an approximate lattice structure, as shown in Example 1. Namely, the quadratic part of the U-statistic in Example 1 has a factorizable kernel \(\psi \) of the form \(\psi _h(X_1,X_2)=h(X_1)g(X_2)+g(X_1)h(X_2)\) with measurable h. The following structural condition (4) (introduced in the unpublished manuscript by Götze and van Zwet [25]) rules out such counterexamples by separating (in \(L^2\) distance) the random variable \(\psi (X_1,X_2)\) from any random variable of the form \(\psi _h(X_1,X_2)\). Note that the \(L^2\) distance \(\mathbf{E}(\psi (X_1,X_2)-\psi _h(X_1,X_2))^2\) is minimized by \(h(x)=b(x)\), where
Here \(\kappa =\mathbf{E}\psi (X_1,X_2)g(X_1)g(X_2)\). Therefore, we will assume that, for some absolute constant \(\delta _{*}>0\), we have
The main contribution of the present paper consists of a proof that condition (4) does indeed ensure the desired bound \(\Delta =o(N^{-1})\). The proof is based on a careful investigation of the size distribution, for \(|t|> N^{1-\nu }\), of the absolute values of conditional Fourier transforms of symmetric statistics, that is, of the landscape of its maxima under Cramér’s condition (7) and the structural condition (4). New methods are used for studying this landscape in the frequency t as well as in the random function representing the conditioning. For the latter variable a combinatorial argument of Kleitman on symmetric partitions for the Littlewood–Offord problem in Banach spaces (see [15]) is used.
A short outline of the approach is given at the beginning of Sect. 2, where we focus on the use of condition (4).
1.2 Results
Let us state our main result, Theorem 1.
Moment conditions We will assume that, for some absolute constants \(0<A_*<1\) and \(M_*>0\) and numbers \(r>4\) and \(s>2\), we have
These moment conditions refer to the linear, quadratic and cubic part of \(\mathbb T\). In order to control the remainder R of the approximation (1) we use moments of differences introduced in Bentkus, Götze and van Zwet [4], see also van Zwet [37]. Define, for \(1\le i\le N\),
A subsequent application of difference operations \(D_i\), \(D_j\), \(\dots \) (the indices i, j, \(\dots \) are all distinct) produces higher order differences, like
For \(m=1,2,3,4\) write \(\Delta _m^2=\mathbf{E}|N^{m-1/2}D_1D_2\cdots D_m \mathbb T|^2\).
We will assume that for some absolute constant \(D_*>0\) and number \(\nu _1\in (0,1/2)\) we have
For a number of important examples of asymptotically linear statistics the moments \(\Delta _m^2\) are evaluated or estimated in [4]. Typically we have \(\Delta _m^2/\sigma _\mathbb T^2=O(1)\) for some m. Therefore, assuming that (6) holds uniformly in N as \(N\rightarrow \infty \), we obtain from the inequality \(\mathbf{E}R^2\le N^{-3}\Delta _4^2\), see (167) (see “Appendix”), that \(R/\sigma _\mathbb T=O_P(N^{-1-\nu _1})\). Furthermore, assuming that (5), (6) hold uniformly in N as \(N\rightarrow \infty \), we obtain from (167), (166), see “Appendix”, that \(\sigma ^2/\sigma _\mathbb T^2=(1-O(N^{-1}))\).
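The key effect of iterated differences is that an m-fold difference annihilates all parts of the decomposition of order below m, so \(\Delta_m\) measures only the contribution of order m and higher. A sketch, using one common variant of the difference operation (replacing \(X_i\) by an independent copy; see [4] for the exact definition and normalization entering \(\Delta_m\)):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 12
X  = rng.normal(size=N)      # the sample
Xp = rng.normal(size=N)      # independent copies used by the differences

# One common version of the difference operation (a sketch): D_i takes the
# difference between the statistic and its value with X_i replaced by an
# independent copy.  Iterated differences expand by inclusion-exclusion.
def D(stat, idx, x, xp):
    """Apply D_{i1} D_{i2} ... for idx = (i1, i2, ...)."""
    total = 0.0
    for mask in range(1 << len(idx)):
        y = x.copy()
        bits = 0
        for b, i in enumerate(idx):
            if mask >> b & 1:
                y[i] = xp[i]
                bits += 1
        total += (-1) ** bits * stat(y)
    return total

mean_stat = lambda x: x.mean()                         # linear statistic
u_stat    = lambda x: sum(x[i] * x[j] for i in range(len(x))
                          for j in range(i)) / len(x)  # degree-two U-statistic

# A second difference kills any linear statistic; a third difference
# kills any U-statistic of degree two.
assert abs(D(mean_stat, (0, 1), X, Xp)) < 1e-12
assert abs(D(u_stat, (0, 1, 2), X, Xp)) < 1e-12
```

By contrast, the second difference of the degree-two U-statistic is non-zero, which is why \(\Delta_2\) controls its quadratic part.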
Cramér type smoothness condition We introduce the function
and assume that, for some \(\delta >0\) and \(\nu _2>0\), we have
Here \(\beta _3=\sigma ^{-3}\mathbf{E}|g(X_1)|^3\). Define \(\nu =600^{-1}\min \{\nu _1,\nu _2,s-2,r-4\}\).
Theorem 1
Assume that for some absolute constants \(A_*,M_*,D_*>0\) and numbers \(r>4, s>2\), \(\nu _1,\nu _2>0\) and \(\delta ,\delta _{*}>0\), the conditions (5), (6), (7), (4) hold. Then there exists a constant \(C_*>0\) depending only on \(A_*\), \(M_*\), \(D_*\), r, s, \(\nu _1,\nu _2,\delta , \delta _{*}\) such that
Remark 1
The value of \(\nu =600^{-1}\min \{\nu _1,\nu _2,s-2,r-4\}\) is far from optimal. Furthermore, the moment conditions (5) and (6) are not the weakest possible that would ensure the approximation of order \(o(N^{-1})\). The condition (5) can likely be reduced to the moment conditions that are necessary to define the Edgeworth expansion terms \(\kappa _3\) and \(\kappa _4\); similarly, (6) can be reduced to \(\Delta _4^2/\sigma _\mathbb T^2= o(N^{-1})\). No effort was made to obtain the result under the optimal conditions. This would increase the complexity of the proof, which is already rather involved.
Remark 2
Condition (4) can be relaxed. Assume that for some absolute constant \(G_*\) we have
The bound of Theorem 1 holds if we replace (4) by this weaker condition. In this case we have \(\Delta \le C_*N^{-1-\nu }\), where the constant \(C_*\) depends on \(A_*,D_*, G_*\), \(M_*,r,s,\nu _1,\nu _2, \delta \).
In the particular case of U-statistics of degree three (the case where \(R\equiv 0\) in (1)) the proof of Theorem 1 was outlined in the unpublished manuscript by Götze and van Zwet [25]. We provide a complete and more readable version of the arguments sketched in that preprint and extend them to a general class of symmetric statistics. In the same paper [25], see also [4], it was shown that moment conditions (like (5), (6)) together with Cramér’s condition (like (7)) do not suffice for the bound \(\Delta =o(N^{-1})\). For convenience we state this result in Example 1 below.
Example 1
Let \(X_1,X_2,\dots \) be independent random variables uniformly distributed on the interval \((-1/2,1/2)\). Define \(T_N=(W_N+N^{-1/2}V_N)(1-N^{-1/2}V_N)\), where \(V_N=N^{-1/2}\sum \{N^{1/2}X_j\}\) and \(W_N=N^{-1}\sum [N^{1/2}X_j]\). Here [x] denotes the nearest integer to x and \(\{x\}=x-[x]\).
Assume that \(N=m^2\), where m is odd. We have, by the local limit theorem,
where \(c>0\) is an absolute constant. From these inequalities it follows by the independence of \(W_N\) and \(V_N\), that \(\mathbf{P}\{1-\delta ^2 N^{-1}\le T_N\le 1\}\ge c^2\delta N^{-1}\).
The example defines a sequence of U-statistics \(\mathbb T_N\) whose distribution functions \(\mathbb F_N\) have \(O(N^{-1})\) sized increments in a particular interval of length \(o(N^{-1})\). These fluctuations of magnitude \(O(N^{-1})\) appear as a result of a nearly lattice structure induced by the interplay between the (smooth) linear part and the quadratic part.
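The construction of Example 1 is easy to reproduce numerically. The sketch below verifies the algebraic identity \(W_N+N^{-1/2}V_N=N^{-1/2}\sum_j X_j\), which couples the lattice-valued part \(W_N\) with the smooth part \(V_N\) (ties in the nearest-integer rounding have probability zero and are resolved arbitrarily by `np.rint`):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 31                      # odd, N = m**2 as in Example 1
N = m * m
X = rng.uniform(-0.5, 0.5, size=N)

sq = np.sqrt(N) * X
W = np.rint(sq)             # [x]: the nearest integer to x
V = sq - W                  # {x} = x - [x]
W_N = W.sum() / N           # W_N = N^{-1} sum [sqrt(N) X_j]
V_N = V.sum() / np.sqrt(N)  # V_N = N^{-1/2} sum {sqrt(N) X_j}
T_N = (W_N + V_N / np.sqrt(N)) * (1 - V_N / np.sqrt(N))

# Identity behind the example: the lattice and fractional parts recombine
# into the smooth linear statistic N^{-1/2} sum X_j.
S = X.sum()
assert abs(W_N + V_N / np.sqrt(N) - S / np.sqrt(N)) < 1e-10
assert abs(T_N - (S / np.sqrt(N)) * (1 - V_N / np.sqrt(N))) < 1e-10
```

The near-lattice behaviour of \(T_N\) near 1 comes from \(W_N\) taking values on the lattice \(N^{-1}\mathbb Z\) while \(V_N\) stays bounded in probability.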
1.3 Earlier work
There is a rich literature devoted to normal approximation and Edgeworth expansions for various classes of asymptotically linear statistics (see e.g. Babu and Bai [2], Bai and Rao [3], Bentkus, Götze and van Zwet [4], Bhattacharya and Ghosh [8, 9], Bhattacharya and Rao [7], Bickel [10], Bickel, Götze and van Zwet [11], Callaert, Janssen and Veraverbeke [16], Chibisov [17], Hall [28], Helmers [29], Petrov [33], Pfanzagl [34], Serfling [35], etc.).
A wide class of statistics can be represented as functions of sample means of vector variables. Edgeworth expansions of such statistics can be obtained by applying the multivariate expansion to corresponding functions, see Bhattacharya and Ghosh [8, 9]. In their work the crucial Cramér condition (C) is assumed on the joint distribution of all the components of a vector which may be too restrictive in cases where some components have a negligible influence on the statistic. More often only one or a few of the components satisfy a conditional version of condition (C). Bai and Rao [3], Babu and Bai [2] established Edgeworth expansions for functions of sample means under such a conditional Cramér condition. This approach exploits the smoothness of the distribution of a random vector as well as the smoothness of the function defining the statistic. In particular this approach needs a class of statistics which are smooth functions of observations or can be approximated by such functions via Taylor’s expansion, see also Chibisov [17]. The respective condition (6) of the present paper is expressed in terms of moments of iterated differences \(\Delta _m\) and does not assume Taylor’s expansion.
Let us note that in general the smoothness of the distribution function of \(\mathbb T\) may have little to do with the smoothness of the function \(\mathbb T(X_1,\dots , X_N)\) of the observations \(X_1,\dots , X_N\); take, for example, Gini’s mean difference \(\sum _{i<j}|X_i-X_j|\) with absolutely continuous \(X_i\). Another interesting example is Studentization, which can dramatically enhance the smoothness of the distribution function of a sum of lattice random variables, see [26]. Our Theorem 1 shows, in particular, that the structural condition (4) together with (7) guarantees the smoothness of the distribution of \(\mathbb T\) necessary for the bound \(\Delta =o(N^{-1})\).
In order to compare Theorem 1 with earlier results of similar nature let us consider the case of U-statistics of degree two
where \(h(\cdot ,\cdot )\) denotes a (fixed) symmetric kernel. Assume for simplicity of notation and without loss of generality that \(\mathbf{E}h(X_1,X_2)=0\). Write \(h_1(x)=\mathbf{E}(h(X_1,X_2)|X_1=x)\) and assume that \(\sigma _h^2>0\), where \(\sigma _h^2=\mathbf{E}h_1^2(X_1)\). In this case Hoeffding’s decomposition (1) reduces to \(\mathbb U=L+Q\), where, by the assumption \(\sigma _h^2>0\), we have \(\mathbf{Var}L>0\). Since the cubic part vanishes we remove the moment \(\mathbf{E}g(X_1)g(X_2)g(X_3)\chi (X_1,X_2,X_3)\) from the expression for \(\kappa _4\). In this way we obtain the two term Edgeworth expansion (2) for the distribution function \(\mathbb F_U(x)=\mathbf{P}\{\mathbb U\le \sigma _\mathbb Ux\}\) with \(\sigma ^2_\mathbb U:=\mathbf{Var}\mathbb U\).
We call h reducible if for some measurable functions \(u,v:\mathcal X\rightarrow \mathbb R\) we have \(h(x,y)=v(x)u(y)+v(y)u(x)\) for \(P_X\times P_X\) almost all \((x,y)\in \mathcal X\times \mathcal X\). A simple calculation shows that for a sequence of U-statistics (9) with a fixed non-reducible kernel, condition (4) is satisfied, for some \(\delta _{*}>0\), uniformly in N. A straightforward consequence of Theorem 1 is the following corollary. Write \({\tilde{\nu }}=600^{-1}\min \{\nu _2,r-4,1\}\).
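Non-reducibility can be probed numerically through the associated kernel operator \(f(\cdot)\rightarrow \mathbf{E}\,h(X,\cdot)f(X)\): a reducible kernel yields an operator of rank at most two, while, say, \(h(x,y)=\min(x,y)\) on (0, 1) has the eigenvalues \(((k-1/2)\pi)^{-2}\), \(k=1,2,\dots\). A discretized sketch (grid quadrature, illustrative kernels of our own choosing):

```python
import numpy as np

# Discretize the operator f -> E[h(X, .) f(X)] for X uniform on (0, 1)
# on a midpoint grid with quadrature weight 1/n.
n = 400
x = (np.arange(n) + 0.5) / n

H_red = np.outer(x, x)            # h(x, y) = x * y: reducible, rank one
H_min = np.minimum.outer(x, x)    # h(x, y) = min(x, y): non-reducible

def n_eigs(H, thresh=1e-3):
    """Number of eigenvalues of the discretized operator above thresh."""
    ev = np.linalg.eigvalsh(H / n)      # 1/n = quadrature weight
    return int(np.sum(np.abs(ev) > thresh))

assert n_eigs(H_red) <= 2         # reducible: at most two nonzero eigenvalues
assert n_eigs(H_min) >= 5         # non-reducible: many nonzero eigenvalues
```

This is exactly the dichotomy behind the comparison with the eigenvalue condition of [11] discussed below Corollary 1.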
Corollary 1
Assume that \(\mathbf{E}h(X_1,X_2)=0\) and for some \(r>4\)
Assume that \(\sigma _h^2>0\) and the kernel h is non-reducible and that for some \(\delta >0\)
Then there exists a constant \(C_*>0\) such that
For U-statistics with a fixed kernel the validity of the Edgeworth expansion (2) up to the order \(o(N^{-1})\) was established by Callaert, Janssen and Veraverbeke [16] and Bickel, Götze and van Zwet [11]. In addition to the moment conditions (like (10)) and Cramér’s condition (like (11)) Callaert, Janssen and Veraverbeke [16] imposed the following rather implicit condition. They assumed that for some \(0<c<1\) and \(0<\alpha <1/8\) the event
has probability \(1-o(1/N\log N)\) uniformly for all \(t\in [N^{3/4}/\log N,\, N\log N]\). Here \(m\approx N^\alpha \), for a small positive \(\alpha \). Bickel, Götze and van Zwet [11] more explicitly required that the linear operator \(f(\cdot )\rightarrow \mathbf{E}\psi (X,\cdot )f(X)\) defined by \(\psi \) has a sufficiently large number of non-zero eigenvalues (the number depending on the existing moments, but always larger than 4). Accordingly, the eigenvalue condition is stronger than the non-reducibility condition of Corollary 1, since for a reducible kernel h the linear operator \(f(\cdot )\rightarrow \mathbf{E}\psi (X,\cdot )f(X)\) has at most two eigenvalues. On the other hand, it is difficult to compare the structural non-reducibility condition with condition (12), whose technical nature is discussed in the outline of the proof at the beginning of Sect. 2.
The remaining parts of the paper (Sects. 2–5) contain the proof of Theorem 1. Auxiliary results are placed in the “Appendix”.
2 Proof of Theorem 1
2.1 Proof highlights
After the seminal paper of Esseen [22], a standard proof of the validity of the normal approximation and its refinements proceeds in two steps. In the first step, with the aid of a smoothing inequality, the Kolmogorov distance between the distribution function and its approximation G is bounded above by a (weighted) average difference of the respective Fourier transforms, see (25). In the second step one performs a careful analysis of the Fourier transforms: for frequencies \(t=O(\sqrt{N})\) one shows the closeness of the respective Fourier transforms, while for the remaining range \(\Omega (\sqrt{N})\le |t|\le O(T)\) one establishes their exponential decay. The cut-off T is defined by the desired approximation accuracy level \(O(T^{-1})\) (in our case \(T=N^{1+\nu }\)). The approach, initially developed for sums of independent random variables [22, 33], was later applied to non-degenerate U-statistics [11, 16] and general asymptotically linear symmetric statistics [4, 37].
One particular problem related to the implementation of the proof strategy outlined above is establishing the exponential decay of the (absolute value of the) Fourier transform in the range of large frequencies. For a linear statistic this problem is elegantly resolved by introducing Cramér’s condition. Indeed, in view of the multiplicative property of the Fourier transform, Cramér’s condition implies the desired exponential decay. Consequently, Cramér’s condition together with moment conditions ensures the validity of an Edgeworth expansion of arbitrary order. But the multiplicative property cannot be used any more (at least directly) when we turn to general symmetric statistics, because the various parts (linear, quadratic, etc.) are mutually dependent. This fact leads to considerable difficulties in estimating the respective Fourier transforms in the range of large frequencies \(t\gg N\) and requires new conditions to control the above mentioned dependencies. The present paper suggests a novel approach to the estimation of the Fourier transform of a symmetric statistic for large frequencies.
As our general setup of symmetric statistics covers linear ones, we keep assuming the Cramér condition, but on the linear part of the statistic only, see (7). In view of Example 1, condition (7) is not enough. We introduce the additional structural condition (4), which together with (7) guarantees the desired \(O(N^{-1-\nu })\) upper bound on the weighted average of the Fourier transform over the frequency range \(N^{1-\nu }\le |t|\le N^{1+\nu }\), see (26) below. Condition (4) is optimal and natural in the sense that it matches the counterexample. It has first appeared in the unpublished manuscript [25] by Götze and van Zwet in the case of U statistics.
Let us compare (4) with alternative conditions introduced in the earlier papers by Callaert, Janssen and Veraverbeke [16] and Bickel, Götze and van Zwet [11] in the case of U-statistics of degree two. The conditional Cramér condition (12) of [16] forces the multiplicative property of the Fourier transform in a formal way, thus circumventing the problem of establishing a relation between the structure of the kernel (of the U-statistic) and the smoothness of the distribution. Therefore (4) and (12) are not comparable. This is not the case with the eigenvalue condition of [11], which is stronger than (4). In their proof Bickel, Götze and van Zwet [11] used, for the frequencies \(t\in [N^{(r-1)/r}/\log N,\, N\log N]\), a symmetrization technique of [23] which essentially estimates the absolute value of the Fourier transform of U by that of a bilinear version of Q, thus neglecting L and its smoothness properties implied by Cramér’s condition (7). The approach of the present paper instead makes use of the smoothness of L and Q simultaneously.
The main contribution of this paper is showing that condition (4), suggested by the counterexample (Example 1), is sufficient to prove the bound of Theorem 1. This condition is used in constructing estimates of weighted averages of the Fourier transform (26), on which we briefly comment below. In fact, after an initial “linearization” step we turn to a slightly modified statistic \({\tilde{\mathbb T}}(X_1,\dots , X_N)\), where the non-linear terms in \(X_1,\dots , X_m\) are removed (see (19)), and then switch to \(T' ={\tilde{\mathbb T}}(X_1,\dots , X_m, Y_{m+1},\dots , Y_N)\), where \(Y_{m+1},\dots , Y_{N}\) are truncated versions of \(X_{m+1},\dots , X_{N}\), see (42). Let \(\mathbf{E}_{\mathbb Y}\) denote the conditional expectation given \(\mathbb Y=(Y_{m+1},\dots , Y_N)\). The conditional Fourier transform \(\mathbf{E}_{\mathbb Y}\exp \{itT'\}=\mathbf{E}\bigl (\exp \{itT'\}\bigl |Y_{m+1},\dots , Y_N\bigr )\) contains a multiplicative component \(\alpha _t^m\), where
For t satisfying \(|\alpha _t|^2\le 1-m^{-1}\ln ^2N\) the bound \(|\mathbf{E}_{\mathbb Y}\exp \{itT'\}|\le \exp \{-0.5\ln ^2N\}\) follows immediately. We then look carefully at the set of remaining t. We show that this set is a union of non-intersecting intervals (depending on \(\mathbb Y\)), each of size \(O(\sqrt{N/m}\ln N)\). While estimating the weighted averages of the Fourier transform over these intervals, we split the frequency domain \(N^{1-\nu }\le |t|\le N^{1+\nu }\) into a deterministic sequence \(J_p\), \(p=1,2,\dots \), of consecutive intervals of size \(\Theta (N^{1-\nu })\), so that each ‘singular’ set \(\{t\in J_p:\, |\alpha _t|^2>1-m^{-1}\ln ^2N\}\) is either empty or an interval \([a_N, a_N+b_N^{-1}]\) of size \(b_N^{-1}=O\bigl (\sqrt{N/m}\ln N\bigr )\) (see (51) and (56), based on Lemma 12). At the very last step, using Kleitman’s concentration inequalities for sums of random variables with values in a function space, we bound the probability of the event that a particular singular set is non-empty, that is, the event that \(\sup _{t\in J_p}|\alpha _t|^2> 1-m^{-1}\ln ^2N\), thus obtaining an extra factor \(N^{-k\nu }\), \(k\ge 5\), to arrive at the error bound \(o(N^{-1})\).
More precisely, the projection of \(\sum _{l=m+1}^N\psi (\cdot ,Y_l)\) onto the orthogonal complement of g, which is non-zero by condition (4), is used in the crucial Lemma 2. Via conditioning and randomization we represent it as a sum \(S_{\alpha }:=\sum _{j=1}^n \alpha _j f_j\) with independent \(\alpha _j\in \{0,1\}\) and vectors \(f_j\) satisfying \(||f_j||> \epsilon \), and estimate the combinatorial probability of those \(\alpha =(\alpha _1,\dots , \alpha _n)\) for which a value larger than \(1- m^{-1} \ln ^2 N\) of the conditional Fourier transform, say \(\tilde{\phi }_t(\alpha )\), of \(f+ S_{\alpha }\) occurs at some ‘singular’ frequency \( t \in J_p\). This is achieved by Kleitman’s partition of the \(2^n\) \(\alpha \)’s into at most \(\left( {\begin{array}{c}n\\ n/2\end{array}}\right) \) disjoint sets, say \(C_d\), \(1\le d\le \left( {\begin{array}{c}n\\ n/2\end{array}}\right) \), such that for different \(\alpha , \alpha '\in C_d\) the sums \(S_{\alpha }\) and \(S_{\alpha '}\) are separated by a distance of at least \(\epsilon \). By Lemma 2 this separation implies that the event that some t is singular in the interval \(J_p\) can be witnessed by at most one \(\alpha \in C_d\) for each \(C_d\). Hence the singular event among the \(\alpha \)’s has combinatorial probability at most \(\left( {\begin{array}{c}n\\ n/2\end{array}}\right) 2^{-n} =O(n^{-1/2})\).
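In its simplest one-dimensional form, the combinatorial bound \(\binom{n}{n/2}2^{-n}=O(n^{-1/2})\) is the Erdős version of the Littlewood–Offord inequality, which can be verified by exhaustive enumeration for small n. A sketch with illustrative coefficients of our own choosing:

```python
import numpy as np
from itertools import product
from math import comb

rng = np.random.default_rng(4)
n = 14

# Littlewood-Offord bound in dimension one: if |a_i| >= 1, any open interval
# of length 2 contains at most binom(n, n//2) of the 2^n signed sums
# sum_i eps_i a_i, i.e. a fraction O(n^{-1/2}) of all sign patterns.
a = rng.uniform(1.0, 2.0, size=n) * rng.choice([-1.0, 1.0], size=n)
sums = np.sort([float(np.dot(eps, a))
                for eps in product((-1.0, 1.0), repeat=n)])

# Worst count over sliding windows [s, s + 2).
worst = max(int(np.searchsorted(sums, s + 2.0, side='left')) - i
            for i, s in enumerate(sums))
bound = comb(n, n // 2)
assert worst <= bound
assert bound / 2**n < 1.0 / np.sqrt(n)   # the O(n^{-1/2}) fraction
```

Kleitman’s theorem extends this separation-based counting from the real line to normed spaces, which is the form needed for the function-space valued sums \(S_\alpha\) above.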
The crucial arguments in Lemma 2 rest upon the observation on harmonics (see (118)) that two singular values \(\tilde{\phi }_t(\alpha ), \tilde{\phi }_s(\alpha ' ) \ge 1-m^{-1}\ln ^2N \) imply a similarly high value of \(\mathbf{E}\exp \{i( t (f+ S_{\alpha }) - s(f+ S_{\alpha '}))\}\). If t and s are close, say \(|t-s| \le \delta _2\), such a high value is excluded by the separation of \(S_{\alpha }\) and \(S_{\alpha '}\), which dominate \((t-s)f\) (see step 4.2.1 in Lemma 2), whereas for \( \delta _2<|t-s| < N^{\nu -1/2}\), Cramér’s condition for \((t-s)f\) applies, which together with size bounds on \(t S_{\alpha }\) and \(sS_{\alpha '}\) again prevents a high value (see step 4.2.2 in Lemma 2).
Note that this method of width bounds and separation of singular sets of Fourier transforms has been successfully employed for optimal approximation results for U-statistics with non-Gaussian limits by Bentkus, Götze and Zaitsev, see [5] and [27] and is strongly related to results on the distribution of quadratic forms on lattices by Bentkus and Götze, see [6] and [24], the latter providing a solution of the Davenport-Lewis conjecture for positive definite forms.
Finally, we mention that in the case of U-statistics of degree three (\(\mathbb T=\mathbf{E}\mathbb T+L+Q+K\)) the proof is outlined in the unpublished manuscript of Götze and van Zwet [25]. We extend these arguments to general symmetric statistics using stochastic expansions by means of Hoeffding’s decomposition and bounds for the various parts of the decomposition.
2.2 Outline of the proof
First, using the linear structure induced by Hoeffding’s decomposition, we replace \(\mathbb T/\sigma _\mathbb T\) by a statistic \({\tilde{\mathbb T}}\) which is conditionally linear given \(X_{m+1},\dots , X_N\). Second, invoking a smoothing inequality, we pass from distribution functions to Fourier transforms. In the remaining steps we bound the difference \(\delta (t)=\mathbf{E}e^{it {\tilde{\mathbb T}}}- {\hat{G}}(t)\) for \(|t|\le N^{1+\nu }\). For "small" frequencies \(|t|\le C N^{1/2}\) we expand the characteristic function \(\mathbf{E}e^{it {\tilde{\mathbb T}}}\) in order to show that \(\delta (t)=o(N^{-1})\). Here we combine various techniques developed in earlier papers [4, 11, 16]. For the remaining range of frequencies, \(C N^{1/2}\le |t|\le N^{1+\nu }\), we bound \(\mathbf{E}e^{it {\tilde{\mathbb T}}}\) and \({\hat{G}}(t)\) separately. The cases of "large" frequencies \(N^{1-\nu }\le |t|\le N^{1+\nu }\) and "medium" frequencies \(C\sqrt{N}\le |t|\le N^{1-\nu }\) are treated in different manners. For medium frequencies the Cramér type condition (7) ensures the exponential decay of \(|\mathbf{E}e^{it {\tilde{\mathbb T}}}|\). For large frequencies we combine conditions (7) and (4).
2.3 Hoeffding’s decomposition
Before starting the proof we introduce some notation. By \(c_*\) we shall denote a positive constant which may depend only on \(A_*,D_*,M_*, r, s, \nu _1,\nu _2, \delta \), but it does not depend on N. In different places the values of \(c_*\) may be different.
It is convenient to write the decomposition in the form
where, for every k, the symmetric kernel \(g_k\) is centered, i.e., \(\mathbf{E}g_k(X_1,\dots , X_k)=0\), and satisfies, see, e.g., [4],
Here we write \(g_1:= N^{-1/2}g\), \(g_2:= N^{-3/2}\psi \) and \(g_3:= N^{-5/2}\chi \). Furthermore, for an integer \(k>0\) we write \(\Omega _k:=\{1,\dots , k\}\). Given a subset \(A=\{i_1,\dots , i_k\}\subset \Omega _N\) we write, for short, \(T_A:=g_k(X_{i_1},\dots , X_{i_k})\). Put \(T_{\emptyset }:=\mathbf{E}\mathbb T\). Now the decomposition (14) can be written as follows
2.4 Proof of Theorem 1
Throughout the proof we assume without loss of generality that
Denote, for \(t>0\),
Linearization. Choose a number \(\nu >0\) and an integer m such that
Split
Furthermore, write
Before applying a smoothing inequality we replace \(\mathbb F(x)\) by
In order to show that \(\Lambda \) can be neglected we apply a simple Slutsky type argument. Given \(\varepsilon >0\), we have
From Lemma 5 we obtain via Chebyshev’s inequality, for \(\varepsilon =N^{-1-\nu }\),
In the last step we used conditions (5), (6) and the inequality (168). Furthermore, using (5) and (6) one can show that
Therefore, (20) implies
It remains to show that \({\tilde{\Delta }}\le c_*N^{-1-\nu }\).
A smoothing inequality. Given \(a>0\) and an even integer \(k\ge 2\), consider the probability density function, see (10.7) in Bhattacharya and Rao [7],
where c(k) is the normalizing constant. Its characteristic function
vanishes outside the interval \(|t|\le ka\). Here \(u^{*k}_{[-a,a]}(t)\) denotes the probability density function of the sum of k independent random variables each uniformly distributed in \([-a,a]\). It is easy to show that the function \(t\rightarrow {\hat{g}}_{a,k}(t)\) is unimodal and symmetric around \(t=0\).
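For \(k=2\) both the density and its compactly supported characteristic function are fully explicit: \(g_{a,2}(x)=(a/\pi)(\sin(ax)/(ax))^2\) has characteristic function \((1-|t|/(2a))_+\), the triangle vanishing for \(|t|>2a=ka\). This makes for a quick numerical sanity check (direct Riemann-sum integration; grid parameters are ours):

```python
import numpy as np

a = 1.0
x = np.linspace(-2000.0, 2000.0, 800_001)
dx = x[1] - x[0]
# g_{a,2}(x) = (a/pi) * (sin(ax)/(ax))**2; np.sinc(z) = sin(pi z)/(pi z).
dens = (a / np.pi) * np.sinc(a * x / np.pi)**2

def cf(t):
    """Characteristic function by direct numerical integration (even density)."""
    return float((dens * np.cos(t * x)).sum() * dx)

assert abs(cf(0.0) - 1.0) < 1e-3    # a probability density
assert abs(cf(1.0) - 0.5) < 1e-2    # triangle value 1 - |t|/(2a) at t = a
assert abs(cf(3.0)) < 1e-2          # vanishes outside |t| <= 2a
```

The convolution construction with general even k trades slower tail decay of the density for a characteristic function supported on the wider interval \(|t|\le ka\).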
Let \(\mu \) be the probability distribution with the density \(g_{a,2}\), where a is chosen to satisfy \(\mu ([-1,1])=3/4\). Given \(T>1\) define \(\mu _T({\mathcal{A}})=\mu (T{\mathcal{A}})\), for a Borel set \({\mathcal{A}}\subset \mathbb R\). Let \({\hat{\mu }}_T\) denote the characteristic function corresponding to \(\mu _T\).
We apply Lemma 12.1 of [7]. It follows from (21) and the identity \(\mu _T([-T^{-1}, T^{-1}])= 3/4\) that
Here \({\tilde{\mathcal{F}}}\) and \(\mathcal{G}\) denote the probability distribution of \({\tilde{\mathbb T}}\) and the signed measure with density function \(G'(x)\) respectively. Furthermore, \(*\) denotes the convolution operation. Proceeding as in the proof of Lemma 12.2 ibidem we obtain
where \({\hat{G}}\) denotes the Fourier transform of G(x). Note that \({\hat{\mu }}_T(t)\) vanishes outside the interval \(|t|\le 2aT\). Finally, we obtain from (23) and (24) that
where \(T'=T/2a\). Here we use the fact that \({\hat{\mu }}_{T'}(t)=0\) for \(|t|>T\). Denote \(K_N(t)= {\hat{\mu }}_{T'}(t)\) and observe that \(|K_N(t)|\le 1\) (since \(\mu _{T'}\) is a probability measure). Let
We have
Here we denote \(t_1=N^{1/2}10^{-3}/\beta _3\) and \(t_2=N^{1-\nu }\). In view of (25) the bound \({\tilde{\Delta }}\le c_*N^{-1-\nu }\) follows from the bounds
The bound \(I_2\le c_*N^{-1-\nu }\) is a consequence of the exponential decay of \(|{\hat{G}}(t)|\) as \(|t|\rightarrow \infty \). In Sect. 3 we show (26) for \(k=3,4\). The proof of (26) for \(k=1\) is based on careful expansions and is given in Sect. 5.
3 Large frequencies
Here we prove inequalities (26) for \(I_3\) and \(I_4\). The proof of \(|I_3|\le c_*N^{-1-\nu }\) is relatively simple and is deferred to the end of the section.
Let us upper bound \(|I_4|\). We will show that
In what follows we assume that N is sufficiently large, say \(N>C_*\), where \(C_*\) depends only on \(A_*,D_*,M_*,r,s,\nu _1,\nu _2,\delta \). We use this assumption in several places below, where the constant \(C_*\) can be easily specified. Note that for small N with \(N\le C_*\) the inequality (27) is trivial.
3.1 Notation
Let us first introduce some notation. Define the number
and note that for \(r\in (4,5]\) and \(\nu \) defined by (17) we have
Given N introduce the integers
We have \(N-m=M\,n+s\), where s is an integer with \(0\le s<n\). Observe that the inequalities \(\nu <600^{-1}\) and \(m<N^{1/2}\), see (17), imply \(M>n\). Therefore \(s<M\). Split the index set
Clearly, \(O_1,\dots , O_{n-1}\) are of equal size (=M) and \(|O_n|=M+s<2M\).
We shall assume that the random variable \(X:\Omega \rightarrow \mathcal X\) is defined on the probability space \((\Omega , P)\) and \(P_X\) is the probability distribution on \(\mathcal X\) induced by X. Given \(p\ge 1\) let \(L^p=L^p(\mathcal X,P_X)\) denote the space of real functions \(f:\mathcal X\rightarrow \mathbb R\) with \(\mathbf{E}|f(X)|^p<\infty \). Denote \(\Vert f\Vert _p=(\mathbf{E}|f(X)|^p)^{1/p}\). With a random variable f(X) we associate an element (vector) \(f=f(\cdot )\) of \(L^p\), \(p\le r\). Let \(p_g:L^2\rightarrow L^2\) denote the projection onto the subspace orthogonal to the vector \(g(\cdot )\) in \(L^2\). Given \(h\in L^2\), decompose
Here \(\left<h,g\right>=\int h(x)g(x)P_X(dx)\). For \(h\in L^r\) we have
Furthermore, for \(r^{-1}+v^{-1}=1\) (here \(r\ge 2\ge v>1\)) we have, by Hölder’s inequality,
In particular,
Denote
and observe that \(c_g\le c_g^*\). It follows from the decomposition (31) and (33) that
Introduce the numbers
It follows from (7) that there exist \(\delta ',\delta ''>0\) depending on \(A_*, M_*, \delta \) such that (uniformly in N) Cramér’s characteristic \(\rho \) satisfies the inequalities
We shall prove the first inequality only. In view of (7) it suffices to prove that \(\rho (a_1,\beta _3^{-1})\ge \delta '\). Expanding the exponent in powers of \(itg(X_1)/\sigma \) we show the inequality
For \(|t|\le \beta _3^{-1}\) this inequality implies
Therefore, \(\rho (a_1,\beta _3^{-1})\ge a_1^2/3\) and we can choose \(\delta '=\min \{\delta , a_1^2/3\}\) in (36).
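The expansion used here is the standard third-order Taylor bound for characteristic functions. A sketch, under the assumption that \(\beta _3\) denotes the ratio \(\mathbf{E}|Y|^3/\mathbf{E}Y^2\) for \(Y=g(X_1)/\sigma \): since \(\mathbf{E}Y=0\),

\[ \bigl |\mathbf{E}e^{itY}\bigr |\le 1-\frac{t^2}{2}\,\mathbf{E}Y^2+\frac{|t|^3}{6}\,\mathbf{E}|Y|^3 \qquad \text{for}\qquad t^2\,\mathbf{E}Y^2\le 2, \]

and for \(|t|\le \beta _3^{-1}\) the right-hand side is at most \(1-\frac{t^2}{3}\mathbf{E}Y^2\), because then \(\frac{|t|^3}{6}\mathbf{E}|Y|^3\le \frac{t^2}{6}\mathbf{E}Y^2\).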
Introduce the constant (depending only on \(A_*,M_*,\delta \))
Note that \(0<\delta _1<1/10\). Given \(f\in L^r\) and \(T_0\in {\mathbb R}\) such that
denote
Given a random variable \(\eta \) with values in \(L^r\) and number \(0<s< 1\), define
Introduce the function
and the number
It follows from (4) and our assumption \(\sigma _\mathbb T^2=1\), see (16), that \(\delta _3^2\ge \delta _{*}^2\).
3.2 Proof of (27)
We write \(\mathbf{E}_{\mathbb Y}\exp \{itT'\}\) in the form \(\mathbf{E}_{\mathbb Y}\exp \{itT'\}=\alpha _t^m\exp \{itW'\}\), where \(\alpha _t\) is defined in (13) and where the random variable \(W'\) is defined in the same way as \({\mathbb W}\) in (18), but with \(T_A=g_k(X_{i_1},\dots , X_{i_k})\) replaced by \(g_k(Y_{i_1},\dots , Y_{i_k})\) for each \(A=\{i_1,\dots , i_k\}\). A standard way to upper bound a quantity like \(|\mathbf{E}_{\mathbb Y}e^{itT'}|\) is to show an exponential decay (in m) of the product \(|\alpha _t^m|\) using a Cramér type condition. This task can be accomplished for medium frequencies. Indeed, for \(|t|=o( N)\) the quadratic part \(itN^{-3/2}\sum _{j=m+1}^N\psi (X_1,Y_j)\) can be neglected and Cramér’s condition implies \(|\alpha _t|\le 1-v'\) for some \(v'>0\). This leads to an exponential bound \(|\alpha _t^m|\le e^{-mv'}\). For large frequencies \(|t|\approx N\), the contribution of the quadratic part becomes significant. To upper bound \(|\alpha _t^m|\) we use condition (4). We show that, for a large set of values \(t\in J_p\), see (51), Cramér’s condition (7) yields the desired decay of \(|\alpha _t^m|\), while the measure of the set of remaining t is small with high probability.
Step 1. Truncation. Recall that the random variable \(X:\Omega \rightarrow \mathcal X\) is defined on the probability space \((\Omega , P)\). Let \(X'\) be an independent copy so that \((X,X')\) is defined on \((\Omega \times \Omega ', P\times P)\), where \(\Omega '=\Omega \). It follows from \(\mathbf{E}|\psi (X,X')|^r<\infty \), by Fubini, that for P almost all \(\omega '\in \Omega '\) the function \(\psi (\cdot , X'(\omega '))=\{x\rightarrow \psi (x,X'(\omega ')), \, x\in \mathcal X\}\) is an element of \(L^r\). Furthermore, one can define an \(L^r\)-valued random variable \(Z':\Omega '\rightarrow L^r\) such that \(Z'(\omega ')=\psi (\cdot , X'(\omega '))\), for P almost all \(\omega '\). Consider the event \({\tilde{\Omega }}=\{\Vert Z'\Vert _r\le N^{\alpha }\}\subset \Omega '\) and denote \(q_N=P({\tilde{\Omega }})\). Here \(\Vert Z'\Vert _r=(\int |\psi (x,X'(\omega '))|^rP_X(dx))^{1/r}\) denotes the \(L^r\) norm of the random vector \(Z'\) and \(\alpha \) is defined in (28). Let \(Y:{\tilde{\Omega }}\rightarrow \mathcal X\) denote the random variable \(X'\) conditioned on the event \({\tilde{\Omega }}\). Therefore Y is defined on the probability space \(({\tilde{\Omega }}, {\tilde{P}})\), where \({\tilde{P}}\) denotes the restriction of \(q_N^{-1}P\) to the set \({\tilde{\Omega }}\) and, for every \(\omega '\in {\tilde{\Omega }}\), we have \(Y(\omega ')=X'(\omega ')\). Let Z denote the \(L^r\)-valued random element \(\{x\rightarrow \psi (x,\, Y(\omega '))\}\) defined on the probability space \(({\tilde{\Omega }}, {\tilde{P}})\).
We can assume that \({\mathbb X}:=(X_1,\dots , X_N)\) is a sequence of independent copies of X defined on the probability space \((\Omega ^N, P^N)\). Let \({\overline{\omega }}=(\omega _1,\dots , \omega _N)\) denote an element of \(\Omega ^N\). Every \(X_j\) defines a random vector \(Z_j'=\psi (\cdot , X_j)\) taking values in \(L^r\). Introduce the events \(A_j:=\{\Vert Z_j'\Vert _r\le N^{\alpha }\}\subset \Omega ^N\) and let \({\mathbb X}'=(X_1,\dots , X_m,Y_{m+1},\dots , Y_N)\) denote the sequence \({\mathbb X}\) conditioned on the event \(\Omega ^*=\cap _{j=m+1}^NA_j=\Omega ^{m}\times {\tilde{\Omega }}^{N-m}\). Clearly, \({\mathbb X}'({\overline{\omega }})={\mathbb X}({\overline{\omega }})\) for every \({\overline{\omega }}\in \Omega ^*\) and \({\mathbb X}'\) is defined on the space \(\Omega ^{m}\times {\tilde{\Omega }}^{N-m}\) equipped with the probability measure \(P^m\times {\tilde{P}}^{N-m}\). In particular, the random variables \(X_1,\dots , X_m, Y_{m+1},\dots , Y_{N}\) are independent and \(Y_j\), for \(m+1\le j\le N\), has the same distribution as Y. Let \(Z_j\) denote the \(L^r\)-valued random element \(\{x\rightarrow \psi (x,Y_j),\, x\in \mathcal X\}\), for \(m+1\le j\le N\). Let
We are going to replace \(\mathbf{E}e^{it{\tilde{\mathbb T}}}\) by \(\mathbf{E}e^{itT'}\). For \(s>0\) we have almost surely
From (43) with \(s=r\) we obtain, by Chebyshev’s inequality, that
Consequently, for \(k\le N\) we have
Using the identity, which holds for a measurable function \(f:\mathcal X^N\rightarrow {\mathbb R}\),
we obtain from (45) and (46) for \(f\ge 0\) that
Furthermore, (45) and (46) imply
Now we can replace the integral in (27) by the integral
In view of (48) and the simple inequality \(|K_N(t)|\le 1\) the error of this replacement is at most \(c_*N^{-1-2\nu }\). Hence in order to prove (27) it remains to show that
Step 2. Here we prove (50). We split the integral
where \(\{J_p,\, p=1,2,\dots \}\) is a sequence of consecutive intervals of length \(\approx \delta _1N^{1-\nu }\) each and \(\cup _pJ_p=[N^{1-\nu }, N^{1+\nu }]\). Recall that \(\delta _1\) is defined in (37). To prove (50) we show that for every p
We fix p and prove (52). Firstly, we replace \(I_p\) by \(\mathbf{E}J_*\), where
and where \(I_*=I_*(Y_{m+1}, \dots , Y_{N})\subset J_p\) is a random subset:
Note that for \(t\in J_p\setminus I_*\), we have
These inequalities imply the bound
Secondly, we show that with a high probability the set \(I_*\subset J_p\) is an interval. This fact, together with the monotonicity of \(v_N(t)\), will be used later to bound the integral \(J_*\). Introduce the \(L^r\)-valued random element
We apply Lemma 12 (see below) to the set \(N^{-1/2}I_*\) conditionally given the event \({{\mathbb S}}=\{\Vert S\Vert _r<N^{\nu /10}\}\). This lemma shows that \(N^{-1/2} I_*\) is an interval of size at most \(c_*\varepsilon _m\). Hence we can write \(I_*=(a_N,a_N+b_N^{-1})\) and
where the random variables \(a_N,b_N\) (functions of \(Y_{m+1},\dots , Y_N\)) satisfy
Furthermore, by Lemma 13 below we have \(\mathbf{P}\{{\mathbb S}\}\ge 1-c_*N^{-3}\). Therefore,
Next, we observe that \(I_*\not =\emptyset \) if and only if \( {\tilde{\alpha }}^2>1-\varepsilon _m^2\), where
Therefore we can write (56) in the form
This identity together with (54) and (57) implies
Using the integration by parts formula we shall show below that
Moreover, we shall show that
The latter inequalities in combination with (58) and (59) yield (52). We prove (60) in Sect. 3.3.
Let us prove (59). Firstly, we show that
From the integration by parts formula we obtain the identity
By our choice of the smoothing kernel the function \(v_N(t)\) is monotone on \(J_p\). Therefore
Invoking the simple inequality \(|a'|\le |v_N(a_N)|+|v_N(a_N+b_N^{-1})|\) and using \(|v_N(t)|\le |t|^{-1}\) we obtain from (62) that
For \(|{\hat{T}}|>b_N\), this inequality implies (61). For \(|{\hat{T}}|\le b_N\) the inequality (61) follows from the inequalities
The proof of (61) is complete. Now from (61) and the inequality \(a_N \ge N^{1-\nu }\) we obtain that
Finally, using the inequality (which holds for arbitrary real number v)
we show that
The latter inequality implies (59).
3.3 Proof of (60)
The first and second inequalities of (60) are proved in Steps A and B, respectively.
Step A. Proof of the first inequality of (60). Recall \({\mathbb W}\) from (18). We split
Define \(W_1', W_2', W_3'\) as \(W_1, W_2, W_3\) above, but with \(X_j\) replaced by \(Y_j\), for \(m+1\le j\le N\). We have \(W'=W_1'+W_2'+W_3'\). Now we write \({\hat{T}}\) (see (49)) in the form \({\hat{T}}=L+\Delta +W_3'\), where
The inequalities \(|{\hat{T}}|\le \varepsilon \) and \(|L|\ge 2\varepsilon \) imply \(|\Delta +W_3'|>\varepsilon \). Therefore,
To prove the first inequality of (60) we show that
Step A.1. Proof of the second inequality of (64). We have
It follows from (47), by Chebyshev’s inequality, that \( \mathbf{P}\{|W_3'|>\varepsilon /2\}\le c_*\varepsilon ^{-2}\mathbf{E}W_3^2\). Furthermore, invoking the inequalities, see (167), (168) below,
we obtain from (65) that \(I_2(\varepsilon )\le I_3(\varepsilon )+c_*\varepsilon ^{-2}N^{-2}\). Since
it suffices to show inequality (64) for \(I_3(\varepsilon )\) (instead of \(I_2(\varepsilon )\)). Recall the notation \(\Lambda _1=N^{-3/2}\sum _{1\le i<j\le m}\psi (X_i,X_j)\) and put \(U=\Lambda _1+\Delta \). We have
Invoking the inequality, which follows by Chebyshev’s inequality,
we upper bound the integral
Hence, it remains to show the second inequality of (64) for \(I_4(\varepsilon )\).
Let \(I_4'(\varepsilon )\) be the same probability as \(I_4(\varepsilon )\) but with \(X_i\) replaced by \(Y_i\), for \(1\le i\le m\). That is,
By the same reasoning as in (48) we obtain that \(|I_4(\varepsilon )-I_4'(\varepsilon )|\le c_*N^{-1-3\nu }\). Now, in view of the bound
we conclude that it suffices to show (the second) inequality (64) for \(I_4'(\varepsilon )\).
Let us show the second inequality of (64) for \(I_4'(\varepsilon )\). We split the sample
into three groups of nearly equal size. Next, we split \(U'=\sum _{i\le j}U'_{ij}\) so that the sum \(U'_{ij}\) depends on the observations from the groups \({\mathbb Y}_i\) and \({\mathbb Y}_j\) only. We have
Now we show that the second inequality of (64) holds for every summand in the right of (66). Let \({\tilde{U}}\) denote a summand \(U'_{ij}\), say, not depending on \({\mathbb Y}_3\). Let
We observe that
and note that the random function \( x \rightarrow {\overline{S}}(x)\) is a sum of i.i.d. random variables with values in \(L^r\) such that, for every i, we have \(\Vert \psi (\cdot ,Y_i)\Vert _r\le N^{\alpha }\) for almost all values of \(Y_i\). By Lemma 13,
Therefore in (67) we can replace the event \(\mathcal V\) by \(\mathcal{V}_1=\mathcal{V}\cap \{\Vert {\overline{S}}\Vert _r\le N^{\nu }\}\). Furthermore, since \({\tilde{U}}\) does not depend on \({\mathbb Y}_3\), we have \(\mathbf{E}{\mathbb I}_\mathcal{U}{\mathbb I}_{\mathcal{V}_1} =\mathbf{E}{\mathbb I}_\mathcal{U}p'\), where \(p':=\mathbf{E}\bigl ({\mathbb I}_{\mathcal{V}_1}|{\mathbb Y}_1, {\mathbb Y}_2\bigr )\). The concentration bound for the conditional probability \(p'\), which is shown below,
implies
In the last step we applied Markov’s inequality
and the bound \(\mathbf{E}|N^{1/2}{\tilde{U}}|^r\le c_*\mathbf{E}|N^{1/2}U_{ij}|^r\le c_*\). Here \(U_{ij}\) denotes the random variable obtained from \({\tilde{U}}\) after we replace \(Y_j\) by \(X_j\) for every j. The second-to-last inequality follows from (47). The last inequality follows from the well-known moment inequalities for U-statistics [20].
It follows from (69) and the simple inequality \(\varepsilon \ge b_N\ge c_* N^{-1/2}\) that
provided that \( m^{r/2}\ge N^{6\nu }\). The latter inequality is ensured by (17). Thus we have shown (64) for \({\tilde{I}}(\varepsilon )\).
It remains to prove (68). We write \(L'+U'\) in the form \(L_*+U_*+b-x\), where
and where b is a function of \(\{Y_i\in {\mathbb Y}\setminus {\mathbb Y}_3\}\). Introduce the random variables \({\overline{L}}\) and \({\overline{U}}\) which are obtained from \(L_*\) and \(U_*\) after we replace every \(Y_j\in {\mathbb Y}_3\) by the corresponding observation \(X_j\). We have
In the last step we applied (47). Now an application of the Berry–Esseen bound due to van Zwet [37] shows (68). The proof of the second inequality of (64) is complete.
Step A.2. Proof of the first inequality of (64). We introduce events
(recall that \(\varepsilon _m\) is defined in (53)) and write \(I_1(\varepsilon )\) in the form \(I_1(\varepsilon ) =\mathbf{E}\, \mathbb I_{\mathbb A}\mathbb I_{{\mathbb S}}\mathbb I_{\mathbb L}\). We have
To upper bound \(I_1(\varepsilon )\) we use the following strategy. We can upper bound the probability \(\mathbf{P}\{\mathbb L\}\) using the Berry–Esseen inequality,
Furthermore, one can show that the probability \(\mathbf{P}\{\mathbb A\}=O(N^{-6\nu })\). We are going to make use of both of these bounds. However, since the events \(\mathbb A\) and \(\mathbb L\) refer to the same set of random variables \(Y_{m+1},\dots , Y_N\), we cannot argue directly that \(\mathbf{E}{\mathbb I}_{\mathbb A}{\mathbb I}_{\mathbb L}\approx \mathbf{P}\{\mathbb A\}\mathbf{P}\{\mathbb L\}\). Nevertheless, invoking a complex conditioning argument we are able to show that
The latter inequality together with the inequalities \(\varepsilon \ge b_N>N^{-1/2}\) imply the first part of (64). Let us prove (71). As the proof is rather involved we start by providing an outline. Let the integers n and M be defined by (29). Split \(\{1,\dots , N\}=O_0\cup O_1\cup \dots \cup O_n\), where \(O_0=\{1,\dots , m\}\) and where the sets \(O_i\), for \(1\le i\le n\), are defined in (30). Split L, see (63),
and where \(L_0=N^{-1/2}\sum _{j\in O_0}g(X_j)\). Observe that \(\mathbb I_{\mathbb L}\) is a function of \(L_0, L_{1},\dots , L_n\). The random variables \(\mathbb I_{\mathbb A}\) and \(\mathbb I_{\mathbb V}\) are functions of \(Y_{m+1},\dots , Y_N\) and do not depend on \(X_1,\dots , X_m\). Therefore, denoting
we obtain from (70)
Clearly, the bound \(\mathcal{M}\le c_*\mathcal{R}\) would imply (71). Unfortunately, we are not able to establish such a bound directly. In what follows we prove (71) using a delicate conditioning which allows us to estimate quantities like \(\mathcal{M}\).
Step A.2.1. Firstly we replace \(L_k\), \(1\le k\le n\), by smooth random variables
where \(\xi _{1},\dots , \xi _n\) are symmetric i.i.d. random variables with the density function defined by (22) with \(k=6\) and \(a=1/6\) so that the characteristic function \(t\rightarrow \mathbf{E}\exp \{it\xi _1\}\) vanishes outside the unit interval \(\{t:\, |t|<1\}\). Note that \(\mathbf{E}\xi _1^4<\infty \). We assume that the sequences \(\xi _1,\, \xi _2, \dots \) and \(X_1,\dots , X_m,Y_{m+1},\dots , Y_N\) are independent. In particular, \(\xi _k\) and \(L_k\) are independent.
Introduce the event
Note that
Using Markov’s inequality and the inequality \(\mathbf{E}\xi ^4\le c\) we estimate the probability
where in the last step we used \(\varepsilon ^2N\ge b_N^2N\ge c'_*\). Hence we have
In the subsequent steps of the proof we replace the conditioning on \(L_1,\dots , L_n\) (in (73)) by conditioning on the random variables \(g_1,\dots , g_n\). Since the latter random variables have densities (their densities are analysed in Lemma 7 below) the corresponding conditional distributions are much easier to handle. Moreover, we restrict the conditioning to the event where these densities are positive.
Step A.2.2. Given \(w>0\), consider the events \(\{|g_k|\le n^{-1/2}w\}\) and their indicator functions \(\mathbb I_k=\mathbb I_{\{|g_k|\le n^{-1/2}w\}}\). Using the simple inequality \(n\mathbf{E}g^2_k\le c_*\) (where \(c_*\) depends on \(M_*\) and r) we obtain from Chebyshev’s inequality that
where the last inequality holds for a sufficiently large constant w (depending on \(M_*,\, r\)). Fix w such that (76) holds and introduce the event \(\mathbb B^*=\{\sum _{k=1}^{n}\mathbb I_k> n/4\}\). Hoeffding’s inequality shows \(\mathbf{P}\{\mathbb B^*\}\ge 1-\exp \{-n/8\}\). Therefore,
Given a binary vector \(\theta =(\theta _1,\dots ,\theta _{n})\) (with \(\theta _k\in \{0,1\}\)) write \(|\theta |=\sum _k\theta _k\). Introduce the event \(\mathbb B_\theta =\{\mathbb I_k=\theta _k, \, 1\le k\le n\}\) and the conditional expectation
Note that \(\mathbb I_{\mathbb B_{\theta }}\), the indicator of the event \(\mathbb B_{\theta }\), is a function of \(g_1,\dots , g_{n}\). It follows from the identities
(here \(\mathbb B_{\theta }\cap \mathbb B_{\theta '}=\emptyset \), for \(\theta \not =\theta '\)) that
We shall show below that uniformly in \(\theta \), satisfying \(|\theta |> n/4\), we have
This bound in combination with (70), which extends to \({\tilde{\mathbb L}}\) as well, implies
Combining the latter inequalities with (75) and (77) we obtain (71).
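The Hoeffding step used for \(\mathbb B^*\) above can be sketched as follows, assuming (76) guarantees \(\mathbf{P}\{\mathbb I_k=1\}\ge 1/2\) for each k: the indicators \(\mathbb I_1,\dots ,\mathbb I_n\) are independent and \([0,1]\)-valued, so Hoeffding's inequality gives

\[ \mathbf{P}\Bigl \{\sum _{k=1}^{n}\mathbb I_k\le \frac{n}{4}\Bigr \} \le \mathbf{P}\Bigl \{\sum _{k=1}^{n}(\mathbb I_k-\mathbf{E}\mathbb I_k)\le -\frac{n}{4}\Bigr \} \le \exp \Bigl \{-\frac{2(n/4)^2}{n}\Bigr \}=e^{-n/8}, \]

which is the bound \(\mathbf{P}\{\mathbb B^*\}\ge 1-\exp \{-n/8\}\) claimed in Step A.2.2.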
Step A.2.3. Here we show (78). Fix \(\theta =(\theta _1,\dots , \theta _{n})\) satisfying \(|\theta |> n/4\). Denote, for brevity, \(h=|\theta |\) and assume without loss of generality that \(\theta _i=1\), for \(1\le i\le h\), and \(\theta _j=0\), for \(h+1\le j\le n\). Consider the \(h\)-dimensional random vector \({\overline{g}}_{[\theta ]}=(g_1,\dots , g_h)\). Note that the random vector \({\overline{g}}_{[\theta ]}\) and the sequences of random variables
are independent. Recall S from (55) and note that the terms \(S_{\theta }\) and \(S'_{\theta }\) of the decomposition
are independent as well.
For \({\overline{z}}_{[\theta ]} = (z_1,\dots , z_h)\in {\mathbb R}^h\) we have \( m_{\theta }(z_1,\dots , z_n) \le {\tilde{m}}_{\theta }({\overline{z}}_{[\theta ]})\), where
denotes the "ess sup" taken with respect to almost all values of \({\mathbb Y}_{\theta }\) and \(\xi _{\theta }\). To prove (78) we show that
Let us prove (79). Given \({\mathbb Y}_{\theta }\), denote \(f_{\theta }=S'_{\theta }\) (note that \(S'_{\theta }\) is a function of \(\mathbb Y_{\theta }\)). Using the notation (40), we have for the interval \(J'_p=N^{-1/2}J_p\),
Note that the factor \({\mathbb I}_{{\mathbb B}_{\theta }}\) on the right side is nonzero whenever \({\overline{z}}_{[\theta ]}=(z_1,\dots , z_h)\) satisfies \(|z_i|\le w/\sqrt{n}\), for \(i=1,\dots , h\). Introduce \(L^r\)-valued random variables
and the regular conditional probability
Here \(\mathcal{A}\) denotes a Borel subset of \(L^r\times \dots \times L^r\) (h-times). By independence, there exist regular conditional probabilities
such that for Borel subsets \({\mathcal{A}}_i\) of \(L^r\) we have
In particular, for every \({\overline{z}}_{[\theta ]}\), the regular conditional probability \(P({\overline{z}}_{[\theta ]};\cdot )\) is the (measure theoretical) extension of the product of the regular conditional probabilities (81). Therefore, denoting by \(\psi _i\) a random variable with values in \(L^r\) and with the distribution
we obtain that the distribution of the sum
of independent random variables \(\psi _1,\dots , \psi _h\) is the regular conditional distribution of \(S_{\theta }\), given \({\overline{g}}_{[\theta ]}={\overline{z}}_{[\theta ]}\). In particular, the expectation in the right side of (80) equals \(\delta _{\varepsilon _m}(f_{\theta }+\zeta )\), where
and where \(\mathbf{E}_\zeta \) denotes the conditional expectation given all the random variables, but \(\zeta \). Recall \(\varepsilon _m\) defined by (53) and note that for any \(\varepsilon _*\) satisfying the inequality
we have
We put \(\varepsilon _*=\mu _* |T_0| N^{-1/2}/20\) and apply Lemma 1 to upper bound \(\delta _{\varepsilon _*}(f_{\theta }+\zeta )\) (the quantity \(\mu _*\) is defined in (97) below). We will use the inequalities \(c_*\delta _3^2/n\le \mu _*^2\le c_*'\delta _3^2/n\) that follow from (217) below. Note that for \(T_0\) satisfying (38), integers m, n as in (17), (29), and the quantity \(\delta _3\) (see (41)) satisfying
the inequality (85) holds, provided that N is sufficiently large (\(N>C_*\)). Moreover, we have
Now Lemma 1 (together with the moment inequalities of Lemma 10) implies the inequality
where the number \(\kappa _*\), defined in (97), satisfies \(\kappa _*\le c_*\delta _3^{-r/(r-2)}\), by (218).
Denote \({\tilde{r}}=r^{-1}+(r-2)^{-1}\). It follows from (89), (88) and (86), for \(r>4\), that
In the last step we used the simple bound \(\delta _3^2\le c_*\), see (200), and the inequality \(1+\delta _3^{-{\tilde{r}}}\le 2+\delta _3^{-1}\), which holds for \({\tilde{r}}<1\). Note that (90) together with (80) and (84) implies (79). The proof of the first inequality of (60) is complete.
Step B. Here we prove the second bound of (60). It is convenient to write the \(L^r\)-valued random variable (55) in the form
Observe that \(U_1,\dots , U_{n-1}\) are independent and identically distributed. We are going to apply Lemma 1 conditionally, given \(U_n\), to the probability
To upper bound \({\tilde{p}}(f)\) we proceed similarly as in the proof of (90). Lemma 9 shows that \(U_1,\dots , U_{n-1}\) satisfy the moment conditions of Lemma 1. Note that in this case the quantity \(\mu _*\) satisfies \(c_*\delta _3^2/n\le \mu ^2_*\le c_*'/n\) (these inequalities follow from (201)). The right inequality implies the bound \(\varepsilon _*\le c_*N^{-48\nu }\) instead of (88) above. As a result we obtain a different power of \(\delta _3\) in the upper bound below. Proceeding as in the proof of (90), see (86), (88), (89), we obtain
In the last step we used the inequality \(1+\delta _3^{-r/(2(r-2))}\le 2+\delta _3^{-1}\), which follows from \(r/(2(r-2))< 1\) (recall that \(r>4\)). Therefore, we have \(\mathbf{P}\{\mathbb B\}\le \mathbf{E}{\tilde{p}}(U_n)\le c_*\mathcal{R}\), where \(\mathcal{R}\) is defined in (71). This completes the proof of the second inequality in (60).
3.4 Proof of (26) for \(k = 3\)
Here we prove the bound \(|I_3|\le c_*N^{-1-\nu }\), see (26). It follows from (48) and the identity \(\mathbf{E}_{\mathbb Y}\exp \{itT'\}=\alpha _t^m\exp \{itW'\}\), see (13), that
Recall the event \({\mathbb S}=\{\Vert S\Vert _r<N^{\nu /10}\}\), where S is defined in (55). We have
Using Lemma 13 we upper bound the second term on the right: \(\mathbf{P}\{\Vert S\Vert _r\ge N^{\nu /10}\}\le c_*N^{-3}\). Furthermore, the one-term expansion of the exponent in (13) in powers of \(itN^{-3/2}\sum _{j=m+1}^N\psi (X_1,Y_j)\) shows the inequality
It follows from (7) that the first summand is bounded from above by \(1-v\), for some \(v>0\) depending on \(A_*,M_*,D_*, \delta \) only, see the proof of (36). Furthermore, the second summand is bounded from above by \(N^{-9\nu /10}\) almost surely. Therefore, for sufficiently large \(N>C_*\) we have \({\mathbb I}_{\mathbb S}|\alpha _t|\le 1-v/2\) uniformly in t. Invoking this bound in (93) we obtain
for m satisfying (17). The latter inequality implies that the integral in (92) is bounded from above by \(c_*N^{-2}\) thus completing the proof.
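The last estimate rests on an elementary exponential-versus-polynomial comparison; a sketch, assuming (17) makes \(m\) of order at least \(c\log N\): for any fixed \(K>0\),

\[ (1-v/2)^{m}\le e^{-mv/2}\le N^{-K} \qquad \text{whenever}\qquad m\ge \frac{2K}{v}\,\log N, \]

so the factor \(|\alpha _t|^m\) dominates any polynomial (in N) length of the integration interval.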
4 Combinatorial concentration bound
We start the section by introducing some notation and collecting auxiliary inequalities. Then we formulate and prove Lemmas 1 and 2.
Introduce the number
where \(c_g=1+\Vert g\Vert _r/\Vert g\Vert _2\) and \(c_r=(7/24)2^{-(r-1)}\). Denote
It follows from the identity \(\rho ^* = \rho (2^{-1}\sigma \delta _2,\, \sigma N^{-\nu +1/2})\) and the simple inequality \(a_1\le \delta _2/4\), see (35), that \(\rho ^* \ge \rho (2 \sigma \,a_1, \sigma N^{-\nu +1/2})\). Furthermore, it follows from (169) and the assumption \(\sigma _\mathbb T^2=1\) that \(1/2<\sigma <2\) for sufficiently large N (\(N>C_*\)). Therefore, \(\rho ^* \ge \rho (a_1,2N^{-\nu +1/2}) \ge \delta '\), where the last inequality follows from (36). We obtain, for \(N>C_*\),
where the number \(\delta '\) depends on \(A_*,D_*,M_*,\nu _1, r,s,\delta \) only. In what follows we use the notation \(c_0=10\). Let \(L_0^r= \{y\in L^r: \int _\mathcal{X} y(x)P_X(dx)=0\}\) denote a subspace of \(L^r\). Observe that \(\mathbf{E}g(X_1)=0\) implies \(y^*(=p_g(y))\in L_0^r\), for every \(y\in L_0^r\).
4.1 Lemma 1
Let \(\psi _1,\dots , \psi _n\) denote independent random vectors with values in \(L_0^r\). For \(k=1,\dots , n\), write
Let \(\overline{\psi }_i\) denote an independent copy of \(\psi _i\). Write \(\psi _i^*=p_g(\psi _i)\) and \(\overline{\psi }_i^*=p_g({\overline{\psi }}_i)\), see (31). Introduce random vectors
We shall assume that, for some \(c_A\ge c_D\ge c_B>0\),
for every \(1\le i\le n\). Furthermore, denote \(\mu _i^2=\mathbf{E}\Vert {\tilde{\psi }}^*_i\Vert _2^2\) and \({\tilde{\kappa }}_i^{r-2}=\frac{8}{3}\frac{\mathbf{E}\Vert {\tilde{\psi }}_i\Vert _r^r}{\mu _i^r}\),
Observe that, by Hölder’s inequality and (32), we have \({\tilde{\kappa }}_i>1\), for \(i=1,\dots , n\).
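The claim \({\tilde{\kappa }}_i>1\) can be sketched as follows, assuming (32) amounts to \(\Vert \cdot \Vert _2\le \Vert \cdot \Vert _r\) (which holds since \(P_X\) is a probability measure): writing \(X=\Vert {\tilde{\psi }}_i^*\Vert _2\) and \(Y=\Vert {\tilde{\psi }}_i\Vert _r\), Lyapunov's inequality and the contractivity of the projection \(p_g\) in \(L^2\) give

\[ \mu _i^r=(\mathbf{E}X^2)^{r/2}\le \mathbf{E}X^r\le \mathbf{E}Y^r, \qquad \text{so}\qquad {\tilde{\kappa }}_i^{r-2}=\frac{8}{3}\,\frac{\mathbf{E}Y^r}{\mu _i^r}\ge \frac{8}{3}>1. \]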
Lemma 1
Let \(4<r\le 5\) and \(0<\nu <10^{-2}(r-4)\). Assume that \(n\ge N^{5\nu }\). Suppose that
Assume that (95), (96) as well as (106), (112) (below) hold. There exists a constant \(c_*>0\), depending on \(r,s,\nu , A_*, D_*, M_*, \delta \) only, such that for every \(T_0\) satisfying (38) we have
for an arbitrary non-random element \(f\in L_0^r\). Here \( {\varepsilon _*}=\frac{\mu _*}{2c_0}\frac{|T_0|}{\sqrt{N}}\). The function \(\delta _s(\cdot , I(T_0))\) is defined in (40).
In Step A.2.3 of Sect. 3 we apply this lemma to the random vector \(\zeta =\psi _1+\dots +\psi _h\), see (83). In Step B of Sect. 3 we apply this lemma to the random vector \(S'\), see (91).
Proof
We shall consider the case where \(T_0>0\). For \(T_0<0\) the proof is the same. We can assume without loss of generality that \(c_0<N^{\nu }\). Denote \(X=\Vert \tilde{\psi }_i^*\Vert _2\) and \(Y=\Vert \tilde{\psi }_i\Vert _r\) and \(\mu =\mu _i\), \(\kappa ={\tilde{\kappa }}_i\). By (32), we have \(Y\ge X\).
Step 1. Here we construct the bound (100), see below, for the probability \(\mathbf{P}\{B_i\}\), where
Write
Substitution of the bounds
gives
Finally, invoking the identity \(\kappa ^{r-2}=(8/3)\mathbf{E}Y^r/\mu ^r\) we obtain
Introduce the (random) set \(J=\{i:\, B_i\ {\text {occurs}}\}\subset \{1,\dots , n\}\). Hoeffding’s inequality applied to the random variable \(|J|=\mathbb I_{B_1}+\dots +\mathbb I_{B_n}\) shows
In the last step we invoke (98) and use (100).
Step 2. Here we introduce randomization. Note that for any \(\alpha _i\in \{-1, +1\}\), \(i=1,\dots , n\), the distributions of the random vectors
coincide. Therefore, denoting
we have for \(s>0\),
for every choice of \(\alpha _1,\dots , \alpha _n\). From now on let \(\alpha _1,\dots , \alpha _n\) denote a sequence of independent identically distributed Bernoulli random variables independent of \({\tilde{\psi }}_i, {\hat{\psi }}_i\), \(1\le i\le n\), and with probabilities \(\mathbf{P}\{\alpha _1=1\}=\mathbf{P}\{\alpha _1=-1\}=1/2\). Denoting by \(\mathbf{E}_\alpha \) the expectation with respect to the sequence \(\alpha _1,\dots , \alpha _n\) we obtain
We are going to condition on \({\tilde{\psi }}_i\) and \({\hat{\psi }}_i\), \(1\le i\le n\), while taking expectations with respect to \(\alpha _1,\dots , \alpha _n\). It follows from (101), (102) and the fact that the random variable |J| does not depend on \(\alpha _1,\dots , \alpha _n\) that
where
denotes the conditional expectation given \({\tilde{\psi }}_i,\, {\hat{\psi }}_i\), \(1\le i\le n\). Note that (99) is a consequence of (103) and of the bound
Let us prove this bound. Introduce the integers
Let us show that
It follows from the inequalities
that
Note that (98) implies \(k_*\le n^{1/4}\). Therefore, the inequality
implies \(l\le (3/16)k_*^{-2}n=\rho n\). We obtain (105).
Given \({\tilde{\psi }}_i,\, {\hat{\psi }}_i\), \(1\le i\le n\), consider the corresponding set J, say \(J=\{i_1,\dots , i_k\}\). Assume that \(k\ge \rho n\). From the inequality \(\rho n\ge n_0\), see (105), it follows that we can choose a subset \(J'\subset J\) of size \(|J'|=n_0\). Split
and denote \(f+\zeta '+{\hat{\zeta }}_n=f_*\). Note that \(f_*\in L_0^r\) almost surely. Let
where \(\mathbf{E}'\) denotes the conditional expectation given all the random variables, but \(\{\alpha _i,\, i\in J'\}\). The bound (104) would follow if we show that
Step 3. Here we prove (107). Note that for \(j\in J'\) the vectors
satisfy
Given \(A\subset J'\) denote
We are going to apply Kleitman’s theorem on symmetric partitions (see, e.g. the proof of Theorem 4.2, Bollobas [15]) to the sequence \(\{x_j^*,\, j\in J'\}\) in \(L^2\). Since for \(j\in J'\) we have \(\Vert x_j^*\Vert _2\ge c_0\varepsilon _*\), it follows from Kleitman’s theorem that the collection \(\mathcal{P}(J')\) of all subsets of \(J'\) splits into non-intersecting non-empty classes \(\mathcal{P}(J')=\mathcal{D}_1\cup \cdots \cup \mathcal{D}_s\), such that the corresponding sets of linear combinations \( V_t = \bigl \{ x^*_A,\, A\in \mathcal{D}_t \bigr \}\), \(t=1,2,\dots , s\), are sparse, i.e., given t, for \(A,A'\in \mathcal{D}_t\) and \(A\not =A'\) we have
Furthermore, the number of classes s is bounded from above by \(\left( {\begin{array}{c}n_0\\ \lfloor n_0/2\rfloor \end{array}}\right) \).
Next, using Lemma 2 we shall show that given \(f_*\) the class \(\mathcal{D}_t\) may contain at most one element \(A\in \mathcal{D}_t\) such that
This means that there are at most \(\left( {\begin{array}{c}n_0\\ \lfloor n_0/2\rfloor \end{array}}\right) \) different subsets \(A\subset J'\) for which (110) holds. This implies (107).
Finally, (99) follows from (103), (104), (107).
Given \(f_*\in L_0^r\) let us show that there is no pair \(A,\,A'\) in \(\mathcal{D}_t\) satisfying (110). Fix \(A,A'\in \mathcal{D}_t\). We have, by (108) and the choice of \(n_0\),
Denoting \(S_{A}=f_*+{\tilde{x}}_A\) and \(S_{A'}=f_*+{\tilde{x}}_{A'}\) we obtain
Assume that \(S_A\) and \(S_{A'}\) satisfy the second inequality of (110), i.e., \(\Vert S_A\Vert _r\le N^{\nu }\) and \(\Vert S_{A'}\Vert _r\le N^{\nu }\). We are going to apply Lemma 2 to the vectors \(S_A\) and \(S_{A'}\). In order to check the conditions of Lemma 2 note that (114) and (115) are verified by (108), (109) and (111). Furthermore, the inequalities \(c_0<N^{\nu }\) and
imply \( N^{2\nu -1/2}\le \varepsilon _*\). Finally, we can assume without loss of generality that \(\varepsilon _*\le c_*'\), where \(c_*':=\min \bigl \{( \delta '/4)^{r/2}, (A_*^{1/2}/6)^{r/2}\bigr \}\). Otherwise (99) follows from trivial inequalities
and the inequality \(\kappa _*>1\).
Now Lemma 2 implies \(\min \{v^2(S_A), \, v^2(S_{A'})\}\le 1-\varepsilon _*^2\) thus completing the proof of Lemma 1. \(\square \)
4.2 Lemma 2
Here we formulate and prove Lemma 2. Let us introduce first some notation. Given \(y\in L^r(=L^r(\mathcal X,P_X))\) define the symmetrization \(y_{s}\in L^r(\mathcal X\times \mathcal X,P_X\times P_X)\) by \(y_{s}(x,x')=y(x)-y(x')\), for \(x,x'\in \mathcal X\). In what follows \(X_1,X_2\) denote independent random variables with values in \(\mathcal X\) and with the common distribution \(P_X\). By \(\mathbf{E}\) we denote the expectation taken with respect to \(P_X\). For \(h\in L^r\) we write
Furthermore, for \(2\le p\le r\), denote
Note that for \(y\in L_0^r\) we have \(y^*(=p_g(y))\in L_0^r\) and, therefore,
Let \(y_1,\dots , y_k, f\) be non-random vectors in \(L^r\). We shall assume that these vectors belong to the linear subspace \(L_0^r\). Given non-random vectors \(\alpha =\{\alpha _i\}_{i=1}^k\) and \(\alpha '=\{\alpha '_i\}_{i=1}^k\), with \(\alpha _i, \alpha '_i\in \{-1, +1\}\), denote
Lemma 2
Let \(\varkappa >0\). Assume that (95) holds and suppose that
Given \(T_0\), satisfying (38), write \(T^*=N^{1/2}T_0^{-1}\) and assume that
Suppose that \(\Vert S_{\alpha }\Vert _r\le N^{\nu }\) and \(\Vert S_{\alpha '}\Vert _r\le N^{\nu }\) and
Then \(\min \{v^2(S_{\alpha }),\, v^2(S_{\alpha '})\}\le 1-\varepsilon ^2\).
Recall that the functionals \(v(\cdot ),\tau (\cdot )\), \(u_t(\cdot )\) and the interval \(I=I(T_0)\) used in the proof below are defined in (39).
Proof
Note that \(\delta _1<1/10\) and \(\delta _2<1/12\). In particular, we have
Step 1. Assume that the inequality \(\min \{v^2(S_{\alpha }),\, v^2(S_{\alpha '})\}\le 1-\varepsilon ^2\) fails. Then for some \(s,t\in I\) we have
see (39). Fix these s, t and denote
We are going to apply the inequality (256),
to \(Z=-{\tilde{X}}\) and \(Y=s(g+N^{-1/2}S_{\alpha '})\). It follows from this inequality and (117) that
In view of the identity \(|\mathbf{E}e^{-i{\tilde{X}}}|=|\mathbf{E}e^{i{\tilde{X}}}|\) we have
Step 2. Here we shall show that (118) contradicts the second inequality of (115). Firstly, we collect some auxiliary inequalities. Write the decomposition (31) for \(S_{\alpha }\) and \(S_{\alpha '}\),
Decompose
where \(v\in {\mathbb R}\) and where \(h\in L^r\) is \(L^2\)-orthogonal to g. An application of (34) to \(S_{\alpha }^*\) and \(S_{\alpha '}^*-S_{\alpha }^*\) gives
Furthermore, it follows from the simple inequality
that
Note that for a and \(a'\) defined in (119) we obtain from (33) and (115) that
Step 4.2.1. Consider the case where \( |s-t|<\delta _2\). Invoking the inequalities \(\Vert S_{\alpha }\Vert _r\le N^{\nu }\) and (115) we obtain from (120) that
Furthermore, using (116), (94), and \(N^{\nu -1/2}\le \varepsilon \), we obtain for \(4\le r\le 5\)
Note that (32) implies \(\Vert S_{\alpha }^*\Vert _2\le \Vert S_{\alpha }\Vert _r\le N^{\nu }\). This inequality in combination with (115) and (121) gives
Invoking (116) and using \(c_0>10\), \(\delta _2<12^{-1}\), and \(N^{\nu -1/2}\le \varepsilon \) we obtain
Now we are going to apply Lemma 12, statement a), to \({\tilde{X}}=vg+h\). For this purpose we verify the conditions of this lemma. Firstly, note that (125) and (113) imply \(\Vert h_s\Vert _2^2\ge (8/10)c_0^2\varepsilon ^2\). Furthermore, it follows from the simple inequality \(\mathbf{E}|h(X_1)-h(X_2)|^r\le 2^r\mathbf{E}|h(X_1)|^r\) and (124) that \(\Vert h_s\Vert _r^r\le 3(2/3)^r\varepsilon ^2\). Therefore, we obtain, for \(4\le r\le 5\),
Furthermore, the inequalities (122), (123) and (116) imply
for \(N^{\nu -1/2}\le \varepsilon \le 1\). Invoking (94) and using the inequality \(\Vert g_s\Vert _r^r\le 2^r\Vert g\Vert _r^r\) and the identity \(\Vert g_s\Vert _2^2=2\Vert g\Vert _2^2\) we obtain
as required by Lemma 12a). This lemma implies
In the last step we used (113). Now (125), for \(c_0\ge 10\), contradicts (118).
Step 4.2.2. Consider the case where \(\delta _2<|s-t|\le \delta _1 N^{-\nu +1/2}\). It follows from (120), (115) and (116) that
In the last step we used \(\delta _2<1/3\). From (122), (123) and (116), we obtain for \(\delta _2\le |s-t|\) and \(N^{\nu -1/2}\le \varepsilon \),
provided that \(\varepsilon ^{2/r}<\Vert g\Vert _2/6\). Similarly, using in addition, \(\delta _1, \delta _2<1/4\) and \(\varepsilon <\Vert g\Vert _2\), we obtain, for \(|s-t|\le \delta _1N^{-\nu +1/2}\),
It follows from these inequalities, see (95), that
Finally, invoking (126) and (37), we get
Once again we obtain a contradiction to (118), thus completing the proof.\(\square \)
5 Expansions
Here we prove the bound
where \(t_1=N^{1/2}/10^3\beta _3\). For the definition of \({\tilde{\mathbb T}}\) and \({\hat{G}}\) see Sect. 2.4. Here and below \(c_*\) denotes a constant depending on \(A_*,M_*,D_*,r,s,\nu _1\) only. We prove (127) for sufficiently large N, that is, we shall assume that \(N>C_*\), where \(C_*\) is a number depending on \(A_*,M_*,D_*,r,s,\nu _1\) only. Note that for \(N<C_*\), the bound (127) becomes trivial, since in this case the integral is bounded by a constant.
Let us first introduce some notation. Denote \(\Omega _m=\{1,\dots , m\}\). For \(A\subset \Omega _N\) write \({\mathbb U}_1(A)=\sum _{j\in A}g_1(X_j)\). Given complex valued functions f, h we write \(f\prec \mathcal R\) if
and write \(f\sim h\) if \(f-h\prec \mathcal R\). In particular, (127) can be written in shorthand as \(\mathbf{E}e^{it{\tilde{\mathbb T}}}\sim {\hat{G}}(t)\).
In order to prove (127) we show that
In what follows we use the notation of Sect. 2. We denote \(\alpha (t)=\mathbf{E}e^{itg(X_1)}\). We assume that (16) holds.
5.1 Proof of the first relation of (128)
We have, see (19),
where the random variables \(\Lambda _j\) are introduced in Sect. 2.4. We shall show that
The second relation follows from the moment bounds of Lemma 5 via Taylor expansion. We have
By Lyapunov’s inequality,
Invoking the moment bounds of Lemma 5 we obtain \(|t|\mathbf{E}|{\tilde{\Lambda }}_2|\prec \mathcal R\), thus proving the second part of (129).
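Lyapunov's inequality used here is the moment monotonicity \(\mathbf{E}|Y|\le (\mathbf{E}Y^2)^{1/2}\); for an empirical distribution it reduces to the Cauchy–Schwarz inequality and can be checked directly. A minimal sketch with an arbitrary sample (illustrative only):

```python
import random

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Empirical first absolute moment and root second moment.
abs_mean = sum(abs(x) for x in sample) / len(sample)
rms = (sum(x * x for x in sample) / len(sample)) ** 0.5

print(abs_mean <= rms)  # Lyapunov: E|Y| <= (E Y^2)^{1/2}
```

The comparison holds for every sample, not just in the limit, since it is Cauchy–Schwarz applied to the empirical measure.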
In order to prove the first part we combine Taylor’s expansion with bounds for characteristic functions. Expanding the exponent we obtain
Invoking the identities
we obtain, for \(\gamma _2<c_*\) and \(\zeta _2<c_*\), see (5), and \(m\le N^{1/12}\), that \(R\prec \mathcal R\). We complete the proof of (129) by showing that
Let us prove (131). Split \({\mathbb W}={\mathbb W}_1+{\mathbb W}_2+{\mathbb W}_3+R_{W}\), where
Here \(\Omega '=\{m+1,\dots , N\}\). Denote \({\mathbb R}={\mathbb U}_2^*+{\mathbb W}_3+R_W\) and \({\mathbb U}_1=\sum _{j=1}^Ng_1(X_j)\). We have \( {\tilde{\mathbb T}}={\mathbb U}_1+{\mathbb W}_2+{\mathbb R}\). Expanding the exponent in powers of \(it{\mathbb R}\) we obtain
where
In the last step we applied the Cauchy–Schwarz inequality. Combining (130) with the identities
and invoking the simple bound
we obtain \(t^2(r_1+r_2)(r_3+r_4+r_5)\prec \mathcal R\). Therefore, (132) implies
Let us show that \(t\mathbf{E}e^{it({\mathbb U}_1+{\mathbb W}_2)}{\tilde{\Lambda }}_1\sim 0\). Expanding the exponent in powers of \(it{\mathbb W}_2\) we get
where \(\theta _1, \theta _2\) are functions of \({\mathbb W}_2\) satisfying \(|\theta _i|\le 1\).
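The repeated "expanding the exponent" steps rest on the elementary estimate \(|e^{ix}-\sum _{k<n}(ix)^k/k!|\le |x|^n/n!\), which is what produces the bounded factors \(\theta _1,\theta _2\). A quick numeric verification on a grid (an illustration, not part of the argument):

```python
import cmath
import math

def taylor_remainder(x: float, n: int) -> float:
    # |e^{ix} - sum_{k<n} (ix)^k / k!|
    partial = sum((1j * x) ** k / math.factorial(k) for k in range(n))
    return abs(cmath.exp(1j * x) - partial)

# The remainder never exceeds |x|^n / n! (small slack for float error).
ok = all(
    taylor_remainder(x, n) <= abs(x) ** n / math.factorial(n) + 1e-12
    for n in (1, 2, 3)
    for x in [t / 10 for t in range(-50, 51)]
)
print(ok)
```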
Let us show that \(f_i\prec \mathcal R\), for \(i=1,2,3,4\). Split the set \(\Omega _m=\{1,\dots , m\}\) into three (non-intersecting) parts \(A_1\cup A_2\cup A_3=\Omega _m\) of (almost) equal size \(|A_i|\approx m/3\). The set of pairs \(\bigl \{\{i,j\}\subset \Omega _m\bigr \}\) splits into six (non-intersecting) parts \(B_{kr}\), \(1\le k\le r\le 3\) (the pair \(\{i,j\}\) belongs to \(B_{kr}\) if \(i\in A_k\) and \(j\in A_r\)). Write
Let us prove \(f_4\prec \mathcal R\). We shall show that
Given a pair (k, r) denote \(A_i=\Omega _m\setminus (A_k\cup A_r)\) and write \(k_i=|A_i|\). Note that \(k_i\approx m/3\). We shall assume that \(k_i\ge m/4\). Since the random variable \({\mathbb U}_1(A_i):=\sum _{j\in A_i}g_1(X_j)\) and the random variables \(\Lambda _1(k,r)\), \({\mathbb W}_2\) are independent, we have
Therefore,
The first factor on the right is bounded from above by \(\exp \{-mt^2/16N\}\), for \(k_i\ge m/4\), see (165) below. The second factor is bounded from above by r, where
Here we combined the Cauchy–Schwarz inequality and the bounds
Finally, (133) follows from (134).
The proof of \(f_3\prec \mathcal R\) is almost the same as that of \(f_4\prec \mathcal R\).
Let us prove \(f_2\prec \mathcal R\). Split the set \(\Omega '=\{m+1,\dots , N\}\) into three (non-intersecting) parts \(B_1\cup B_2\cup B_3=\Omega '\) of (almost) equal sizes \(|B_i|\approx (N-m)/3\). Split the set of pairs \(\bigl \{ \{i,j\}:\, m+1\le i<j\le N\bigr \}\) into (non-intersecting) groups D(k, r), for \(1\le k\le r\le 3\). The pair \(\{i,j\}\in D(k,r)\) if \(i\in B_k\) and \(j\in B_r\). Write
In order to prove \(f_2\prec \mathcal R\) we shall show that
Write \(B_i=\Omega '\setminus (B_k\cup B_r)\) and denote \(m_i=|B_i|\). We shall assume that \(m_i\ge N/4\). Since the random variable \({\mathbb U}_1(B_i)=\sum _{j\in B_i}g_1(X_j)\) and the random variables \(\Lambda _1\) and \({\mathbb W}_2(k,r)\) are independent, we have, cf. (134),
The first factor on the right is the product \(|\alpha ^{m_i}(t)|\le e^{-m_it^2/4N}\), see the argument used in the proof of (133) above. The second factor is bounded from above by \({\tilde{r}}\), where
Finally, we obtain, using the inequality \(m_i\ge N/4\),
This in combination with (136) shows (135). We obtain \(f_2\prec \mathcal R\).
Let us prove \(f_1\prec \mathcal R\). We shall show that \(f^*\prec \mathcal R\) and \(f^{\star }\prec \mathcal R\), where
satisfy \(f^*+f^{\star }=f_1\).
Let us show \(f^{\star }\prec \mathcal R\). Denote \({\mathbb U}_1^{\star }=\sum _{j=m+1}^Ng_1(X_j)\). We obtain, by the independence of \({\mathbb U}_1^{\star }\) and \(\Lambda _1\), that
Invoking, for \(N-m>N/2\), the bound \( |\mathbf{E}e^{it{\mathbb U}_1^{\star }}| \le e^{-t^2/8} \), see (165) below, and the bound \(\mathbf{E}|\Lambda _1|\le (\mathbf{E}\Lambda _1^2)^{1/2}\le c_*mN^{-3/2}\) we obtain
Let us prove \(f^*\prec \mathcal R\). We shall show that, for \(1\le k\le r\le 3\),
Proceeding as in the proof of (135) we obtain the chain of inequalities
In the last step we applied the Cauchy–Schwarz inequality and the simple bound \(\mathbf{E}\Lambda _4^2(k,r)\le c_*mN^{-3}\). Clearly, (138) implies (137).
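A pattern used repeatedly in this section is factoring an independent linear part out of the expectation: if \(U\) is independent of the pair \((V,W)\), then \(\mathbf{E}\,e^{it(U+V)}W=\mathbf{E}e^{itU}\cdot \mathbf{E}\,e^{itV}W\). A toy check with small discrete variables (illustrative; the names are not the paper's):

```python
import cmath
from itertools import product

t = 0.7
U_vals = [-1.0, 1.0]  # U uniform on {-1, 1}, independent of (V, W)
VW_vals = [(0.0, 1.0), (1.0, -2.0), (2.0, 0.5)]  # joint law of (V, W), uniform

# Left side: E e^{it(U+V)} W over the product law.
lhs = sum(cmath.exp(1j * t * (u + v)) * w
          for u, (v, w) in product(U_vals, VW_vals))
lhs /= len(U_vals) * len(VW_vals)

# Right side: E e^{itU} * E e^{itV} W.
rhs = (sum(cmath.exp(1j * t * u) for u in U_vals) / len(U_vals)
       * sum(cmath.exp(1j * t * v) * w for v, w in VW_vals) / len(VW_vals))

print(abs(lhs - rhs) < 1e-12)
```

The factorization is exact; in the proofs it lets the first factor be bounded by \(|\alpha (t)|^{|B|}\) while the second is estimated by moment inequalities.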
5.2 Proof of the second relation of (128)
Here we prove the second relation of (128). Firstly, we shall show that
where \(w=\mathbf{E}g_3(X_1,X_2,X_3)g_1(X_1)g_1(X_2)g_1(X_3)\).
Let m(t) be an integer-valued function such that
and put \(m(t)\equiv 10\), for \(|t|\le C_1\). Here \(C_1\) denotes a large absolute constant (one can take, e.g., \(C_1=200\)). Assume, in addition, that the numbers \(m=m(t)\) are even.
5.2.1 Proof of (139)
Given m write
where
In order to show (139) we expand the exponent in powers of \(it{\mathbb H}\) and \(it{\mathbb U}_3\),
where \( |R|\le t^2(\mathbf{E}{\mathbb H}^2+\mathbf{E}|{\mathbb U}_3{\mathbb H}|) \). Invoking the bounds, see (166), (167), (5), (6),
we obtain, by the Cauchy–Schwarz inequality, \(|R|\le c_*t^2 N^{-2-\nu _1}\prec \mathcal R\). We complete the proof of (139) by showing that
Before proving (143) we collect some auxiliary inequalities. For \(m=2k\) write
Furthermore, split the sum
In what follows we shall use the simple bounds, see (5),
Let us prove (143). Expand the exponent \(\exp \{it({\mathbb U}_1+{\mathbb Z}_1+\dots +{\mathbb Z}_4)\}\) in powers of \(it{\mathbb Z}_1\) to get
where \(h_1(t)=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb Z}_2+\dots +{\mathbb Z}_4)\}it{\mathbb H}\) and where
For \(m=m(t)\) satisfying (141) we have \(R\prec \mathcal R\). Therefore, we obtain
In order to prove \(h_1\prec \mathcal R\) we write \(h_1=h_2+h_3\) and show that \(h_2, h_3\prec \mathcal R\) , where
Let us show that \(h_2\prec \mathcal R\). Firstly, we prove that
where \(h_{2.1}(t)=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb Z}_4)\}it{\mathbb H}_1\) and, for \(j=2,3\),
Expanding the exponent in powers of \(it({\mathbb Z}_2+{\mathbb Z}_3)\) we obtain
where \(|R|\le |t|^3\mathbf{E}|{\mathbb H}_1|({\mathbb Z}_2+{\mathbb Z}_3)^2\) is bounded from above by
In the last step we used \(\mathbf{E}{\mathbb H}_1^2\le \mathbf{E}{\mathbb H}^2\) and applied (142) and (146). Therefore, (147) follows.
Let us show \(h_{2.i}\prec \mathcal R\), for \(i=1,\, 2,\,3\). The random variable \({\mathbb U}_1(A_1)\) does not depend on the observations \(X_j\), \(j\in \Omega \setminus A_1\). Therefore, we can write
Furthermore, using (165) we obtain, for \(|A_1|=m/2\),
In the last step we combined the bound \(\mathbf{E}{\mathbb H}_1^2\le c_*N^{-2-2\nu _1}\) and (146) to get
Note that choosing \(C_1\) in (141) sufficiently large implies, for \(|t|\ge C_1\),
An application of this bound to the argument of the exponent in (148) shows \(h_{2.3}\prec \mathcal R\). The proof of \(h_{2.i}\prec \mathcal R\), for \(i=1,2\), is almost the same. Therefore, we obtain \(h_2\prec \mathcal R\).
Let us prove \(h_3\prec \mathcal R\). Firstly we collect some auxiliary inequalities. Write \(m=2k\) (recall that the number m is even) and split \(\Omega _m=B \cup D\), where B denotes the set of odd numbers and D denotes the set of even numbers. Split \({\mathbb H}_2={\mathbb H}_B+{\mathbb H}_D+{\mathbb H}_C\). Here, for \(A\subset \Omega _N\) and \(|A|\ge 4\), we denote by \({\mathbb H}_B\) the sum of \(T_A\) such that \(A\cap B=\emptyset \) and \(A\cap D\not =\emptyset \); \({\mathbb H}_D\) denotes the sum of \(T_A\) such that \(A\cap B\not =\emptyset \) and \(A\cap D=\emptyset \); \({\mathbb H}_C\) denotes the sum of \(T_A\) such that \(A\cap B\not =\emptyset \) and \(A\cap D\not =\emptyset \). It follows from the inequalities (177) and (6) that
Using the notation \(z=it\exp \{it({\mathbb U}_1+{\mathbb Z}_2+{\mathbb Z}_3+{\mathbb Z}_4)\}\) write
We shall show that \(h_{3.i}\prec \mathcal R\), for \(i=1,2,3\). The relation \(h_{3.3}\prec \mathcal R\) follows from (149), (146), and the Cauchy–Schwarz inequality: \( |h_{3.3}|\le c_*|t|\, mN^{-2-\nu _1}\prec \mathcal R\).
Let us show that \(h_{3.2}\prec \mathcal R\). Expanding the exponent in powers of \(it({\mathbb Z}_2+{\mathbb Z}_3)\) we obtain
where \(|R|\le t^2\mathbf{E}|{\mathbb H}_D ({\mathbb Z}_2+{\mathbb Z}_3)|\). Combining the bounds (146) and (149) we obtain, by the Cauchy–Schwarz inequality, \(|R|\le c_*t^2mN^{-(5+2\nu _1)/2}\prec \mathcal R\). Next we show that \(h_{3.2}^*\prec \mathcal R\). The random variables \({\mathbb U}_1(D)=\sum _{j\in D}g_1(X_j)\) and \({\mathbb H}_D\) are independent. Therefore, we can write
Combining (165) and (149) we obtain, using the Cauchy–Schwarz inequality,
The proof of \(h_{3.1}\prec \mathcal R\) is similar. Therefore, we obtain \(h_3\prec \mathcal R\). This together with the relation \(h_2\prec \mathcal R\), proved above, implies \(h_1\prec \mathcal R\). Thus we arrive at (143) completing the proof of (139).
5.2.2 Proof of (140)
We start with some auxiliary moment inequalities. Split
Using the orthogonality and moment bounds for U-statistics, see, e.g., Dharmadhikari et al. [20], one can show that
and \(\mathbf{E}|Z|^s\le cN^{3s/2}\mathbf{E}|g_3(X_1,X_2,X_3)|^s\). Invoking (5) we obtain
For the sets \(A_1,A_2\subset \Omega _m\) defined in (144) write
We have \(\mathcal{D}=\mathcal{D}_1\cup \mathcal{D}_2\cup \mathcal{D}_3\) and \(W=\sum _{A\in \mathcal{D}}T_A\). Therefore, we can write \(W=W_1+W_2+W_3\), where \(W_j=\sum _{A\in \mathcal{D}_j}T_A\).
A calculation shows that
Therefore, we obtain from (5) that
Let us prove (140). Write \({\mathbb U}_3=W+Z\). Expanding the exponent in powers of itW we obtain
where, by (150), \(|R|\le t^2\mathbf{E}W^2\le c_*t^2mN^{-3}\prec \mathcal R\). This implies
In order to prove (140) we shall show that
Let us prove (152). Expanding the exponent (in \(h_5\)) in powers of itZ we obtain
where, by (150) and the Cauchy–Schwarz inequality,
We have \(h_5\sim h_6\).
It remains to show that \(h_6\sim \mathbf{E}\exp \{it{\mathbb U}_1\}itW\). Split
We have, see (146),
Expanding the exponent (in \(h_6\)) in powers of \(it{\mathbb U}_2^*\) we obtain
and where, by (150), (156) and the Cauchy–Schwarz inequality,
Therefore, we obtain \(h_6\sim h_7\).
We complete the proof of (152) by showing that \(h_7\sim \mathbf{E}\exp \{it{\mathbb U}_1\}itW\). Use the decomposition \(W=W_1+W_2+W_3\) and write
We shall show that
Expanding in powers of \(it{\mathbb U}_2^{\star }\) we obtain
where \(R_j=(it)^2\mathbf{E}\exp \{it{\mathbb U}_1\} W_j{\mathbb U}_2^{\star }\theta \) and where \(\theta \) is a function of \({\mathbb U}_2^{\star }\) satisfying \(|\theta |\le 1\). In order to prove (157) we show that \(R_j\prec \mathcal R\), for \(j=1,2,3\).
Combining (151) and (156) we obtain via the Cauchy–Schwarz inequality
Furthermore, using the fact that the random variable \({\mathbb U}_1(A_2)\) and the random variables \({\mathbb U}_2^{\star }\) and \(W_2\) are independent, we can write
Here we used (165) and the moment inequalities (151) and (156). The proof of \(R_1\prec \mathcal R\) is similar. We arrive at (157) and, thus, complete the proof of (152).
Let us prove (153). We proceed in two steps. Firstly we show
Secondly, we show
In order to prove (158) we write
and show that \(R\prec \mathcal R\). In order to bound the remainder R we write \({\mathbb U}_2={\mathbb U}_2^*+{\mathbb U}_2^{\star }\), see (155), and expand the exponent in powers of \(it{\mathbb U}_2^*\). We obtain \(R=R_1+R_2\), where
Note that, for \(2<s\le 3\), we have \(|{\tilde{r}}|\le c |tZ|^{s/2}\). Combining (150) and (156) we obtain via the Cauchy–Schwarz inequality,
In order to prove \(R_1\prec \mathcal R\) we use the fact that the random variable \({\mathbb U}_1(\Omega _m)\) and the random variables \({\mathbb U}_2^{\star }\) and \({\tilde{r}}\) are independent. Invoking the inequality \(|{\tilde{r}}|\le t^2Z^2\) we obtain from (165) and (150)
We thus arrive at (158).
Let us prove (159). Use the decomposition (145) and expand the exponent (in \(h_9\)) in powers of \(it{\mathbb Z}_1\) to get \(h_9=h_{10}+R\), where
Combining (146) and (150) we obtain via the Cauchy–Schwarz inequality
Therefore, we have
Now we expand the exponent in \(h_{10}\) in powers of \(it({\mathbb Z}_2+{\mathbb Z}_3)\) and obtain \( h_{10}=h_{11}+h_{12}+R\), where
and where \(|R|\le |t|^3\mathbf{E}|Z|\,|{\mathbb Z}_2+{\mathbb Z}_3|^2\). Combining (146) and (150) we obtain via the Cauchy–Schwarz inequality \(|R|\le |t|^3mN^{-3}\prec \mathcal R\). Therefore, we have
We complete the proof of (159) by showing that
In order to prove the second bound write
We shall show that \(R_3\prec \mathcal R\). Using the fact that the random variable \({\mathbb U}_1(A_1)\) and the random variables Z, \({\mathbb Z}_3\) and \({\mathbb Z}_4\) are independent we obtain from (165)
In the last step we combined (146), (150) and the Cauchy–Schwarz inequality. The proof of \(R_1\prec \mathcal R\) is similar.
In order to prove the first relation of (160) we expand the exponent in powers of \(it{\mathbb Z}_4\) and obtain \(h_{11}=\mathbf{E}\exp \{it{\mathbb U}_1\}itZ+R\). Furthermore, combining (165), (146) and (150) we obtain
Hence the first relation of (160). The proof of (153) is complete.
Let us prove (154). By symmetry and independence,
Here we denote \(z=g_3(X_1,X_2,X_3)\) and write,
Furthermore, write
In what follows we expand the exponents in powers of \(it x_j\), \(j=1,2,3\), and use the fact that \(\mathbf{E}\bigl ( g_3(X_1,X_2,X_3)\bigm |X_1,X_2\bigr )=0\) as well as the obvious symmetry. Thus, we have
Furthermore, we have
Invoking the bounds \(|r_j|\le |tx_j|^2\) and \(|v_j|\le |tx_j|\) we obtain
where \(|R|\le c|t|^{5}\mathbf{E}|z x_1x_2|\,x_3^2\). The bound \(|R|\le c_*|t|^5N^{-9/2}\) (which follows by the Cauchy–Schwarz inequality) in combination with (161) and (162) implies
Note that \(\left( {\begin{array}{c}N\\ 3\end{array}}\right) |w|\le c_*N^{-1}\). In order to show (154) we replace \(\mathbf{E}e^{it{\mathbb U}_*}\) by \(e^{-t^2/2}\). Therefore, (154) follows from (163) and the inequalities
The second inequality is a direct consequence of (169). The proof of the first inequality is routine and omitted here. Thus the proof of (140) is complete.
5.2.3 Completion of the proof of (128)
Here we show that
This relation in combination with (139) and (140) implies \(\mathbf{E}e^{it{\mathbb T}}\sim {\hat{G}}(t)\).
Let \(G_U(t)\) denote the two-term Edgeworth expansion of the U-statistic \({\mathbb U}_1+{\mathbb U}_2\). That is, \(G_U(t)\) is defined by (2), but with \(\kappa _4\) replaced by \(\kappa _4^*\), where \(\kappa _4^*\) is obtained from \(\kappa _4\) by removing the summand \(4\mathbf{E}g(X_1)g(X_2)g(X_3)\chi (X_1,X_2,X_3)\). Furthermore, let \({\hat{G}}_U(t)\) denote the Fourier transform of \(G_U(t)\). It is easy to show that
Therefore, in order to prove (164) it suffices to show that \({\hat{G}}_U(t)\sim \mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2)\}\). The bound
where \(\varepsilon _N\downarrow 0\), was shown by Callaert, Janssen and Veraverbeke [16] and by Bickel, Götze and van Zwet [11]. An inspection of their proofs shows that under the moment conditions (5) one can replace \(\varepsilon _N\) by \(c_*N^{-\nu }\). This completes the proof of (127).
For the reader's convenience we formulate in Lemma 3 a known result on upper bounds for characteristic functions.
Lemma 3
Assume that (16) holds. There exists a constant \(c_*\) depending on \(D_*,M_*, r, s, \nu _1\) only such that, for \(N>c_*\) and \(|t|\le N^{1/2}/10^3\beta _3\) and \(B\subset \Omega _N\), we have
Here \(\alpha (t)=\mathbf{E}\exp \{itg_1(X_1)\}\) and \({\mathbb U}_1(B)=\sum _{j\in B}g_1(X_j)\).
Proof
Let us prove the first inequality of (165). Expanding the exponent, see (188), we obtain
Invoking the inequality \(1-10^{-3}\le \sigma ^2\le 1\) which follows from (169) for \(N>c_*\), where \(c_*\) is sufficiently large, we obtain \(|\alpha (t)|\le 1-t^2/4N\), for \(|t|\le N^{1/2}/10^3\beta _3\).
The second inequality of (165) follows from the first one via the inequality \(1+x\le e^x\), for \(x\in \mathbb R\).\(\square \)
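The bound \(|\alpha (t)|\le 1-t^2/4N\) rests on the generic third-order estimate: for a mean-zero \(Y\) with variance \(\sigma ^2\) and \(\beta _3=\mathbf{E}|Y|^3\), one has \(|\mathbf{E}e^{itY}|\le |1-t^2\sigma ^2/2|+|t|^3\beta _3/6\). A numeric sketch of this generic estimate for a Rademacher variable, where the characteristic function is \(\cos t\) (illustrative only; the paper's \(\alpha (t)\) refers to \(g_1(X_1)\)):

```python
import math

# Rademacher Y: sigma^2 = 1, beta_3 = 1, and E e^{itY} = cos t.
sigma2, beta3 = 1.0, 1.0

ok = all(
    abs(math.cos(t)) <= abs(1 - t * t * sigma2 / 2) + abs(t) ** 3 * beta3 / 6 + 1e-12
    for t in [k / 100 for k in range(-150, 151)]
)
print(ok)
```

For \(|t|\) small relative to \(\sigma ^2/\beta _3\) the cubic term is dominated by half of the quadratic term, which yields bounds of the form \(1-t^2\sigma ^2/4\).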
References
Angst, J., Poly, G.: A weak Cramér condition and application to Edgeworth expansions. Electron. J. Probab. 22(59), 1–24 (2017)
Babu, G.J., Bai, Z.D.: Edgeworth expansions of a function of sample means under minimal moment conditions and partial Cramér's condition. Sankhya Ser. A 55, 244–258 (1993)
Bai, Z.D., Rao, C.R.: Edgeworth expansion of a function of sample means. Ann. Stat. 19, 1295–1315 (1991)
Bentkus, V., Götze, F., van Zwet, W.R.: An Edgeworth expansion for symmetric statistics. Ann. Stat. 25, 851–896 (1997)
Bentkus, V., Götze, F.: Lattice point problems and distribution of values of quadratic forms. Ann. Math. (2) 150(3), 977–1027 (1999)
Bentkus, V., Götze, F.: Optimal bounds in non-Gaussian limit theorems for U-statistics. Ann. Probab. 27, 454–521 (1999)
Bhattacharya, R.N., Rao, R.R.: Normal Approximation and Asymptotic Expansions. Robert E. Krieger Publishing Company, Inc., Malabar (1986)
Bhattacharya, R.N., Ghosh, J.K.: On the validity of the formal Edgeworth expansion. Ann. Stat. 6, 434–451 (1978)
Bhattacharya, R.N., Ghosh, J.K.: Correction to: On the validity of the formal Edgeworth expansion. Ann. Stat. 8, 1399 (1980)
Bickel, P.J.: Edgeworth expansions in nonparametric statistics. Ann. Stat. 2, 1–20 (1974)
Bickel, P.J., Götze, F., van Zwet, W.R.: The Edgeworth expansion for \(U\)-statistics of degree two. Ann. Stat. 14, 1463–1484 (1986)
Bickel, P., et al.: Willem van Zwet’s research. Ann. Stat. 49, 2439–2447 (2021)
Bickel, P.J., Robinson, J.: Edgeworth expansions and smoothness. Ann. Probab. 10, 500–503 (1982)
Bobkov, S.G.: Khinchine’s theorem and Edgeworth approximations for weighted sums. Ann. Stat. 47, 1616–1633 (2019)
Bollobás, B.: Combinatorics. Set Systems, Hypergraphs, Families of Vectors and Combinatorial Probability. Cambridge University Press, Cambridge (1986)
Callaert, H., Janssen, P., Veraverbeke, N.: An Edgeworth expansion for \(U\)-statistics. Ann. Stat. 8, 299–312 (1980)
Chibisov, D.M.: Asymptotic expansion for the distribution of a statistic admitting a stochastic expansion. I. Teor. Veroyatn. Primen. 25, 745–757 (1980)
Chung, K.-L.: The approximate distribution of Student’s statistic. Ann. Math. Stat. 17, 447–465 (1946)
Cramér, H.: Random Variables and Probability Distributions. Cambridge Tracts in Mathematics and Mathematical Physics, No. 36, 3rd edn. Cambridge University Press, Cambridge (1970; first edition 1937)
Dharmadhikari, S.W., Fabian, V., Jogdeo, K.: Bounds on the moments of martingales. Ann. Math. Stat. 39, 1719–1723 (1968)
Efron, B., Stein, C.: The jackknife estimate of variance. Ann. Stat. 9, 586–596 (1981)
Esseen, C.G.: Fourier analysis of distribution functions. A mathematical study of the Laplace–Gaussian law. Acta Math. 77, 1–125 (1945)
Götze, F.: Asymptotic expansions for bivariate von Mises functionals. Z. Wahrsch. Verw. Gebiete 50, 333–355 (1979)
Götze, F.: Lattice point problems and values of quadratic forms. Invent. Math. 157, 195–226 (2004)
Götze, F., van Zwet, W.R.: Edgeworth expansions for asymptotically linear statistics. Manuscript 1–45 (1992)
Götze, F., van Zwet, W.R.: An Expansion for a Discrete Non-lattice Distribution. Frontiers in Statistics, pp. 257–274. Imperial College, London (2006)
Götze, F., Zaitsev, A.: Explicit rates of approximation in the CLT for quadratic forms. Ann. Probab. 42, 354–397 (2014)
Hall, P.: Edgeworth expansion for Student’s t statistic under minimal moment conditions. Ann. Probab. 15, 920–931 (1987)
Helmers, R.: Edgeworth Expansions for Linear Combinations of Order Statistics. Mathematical Centre Tracts, vol. 105. CWI, Amsterdam (1982)
Hodges, J.L., Jr., Lehmann, E.L.: Deficiency. Ann. Math. Stat. 41, 783–801 (1970)
Hoeffding, W.: A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 19, 293–325 (1948)
Ledoux, M., Talagrand, M.: Probability in Banach Spaces. Isoperimetry and Processes. Springer, Berlin (1991)
Petrov, V.V.: Sums of Independent Random Variables. Springer, New York (1975)
Pfanzagl, J.: Asymptotic expansions for general statistical models. With the assistance of W. Wefelmeyer. In: Lecture Notes in Statistics, vol. 31. Springer, Berlin (1985)
Serfling, R.J.: Approximation Theorems of Mathematical Statistics. Wiley, New York (1980)
Yurinskii, V.V.: Exponential inequalities for sums of random vectors. J. Multivar. Anal. 6, 473–499 (1976)
van Zwet, W.R.: A Berry–Esseen bound for symmetric statistics. Z. Wahrsch. Verw. Gebiete 66, 425–440 (1984)
Funding
Open Access funding enabled and organized by Projekt DEAL.
Additional information
In memoriam Willem Rutger van Zwet, *March 31, 1934, †July 2, 2020.
Research funded in part by the German Research Foundation—SFB 1283/2 2021—317210226.
Appendices
Appendix 1
In Lemma 4 below we compare the moments \(\Delta _m^2\) and \(\mathbf{E}R_m^2\), where \(R_m\) is the remainder of expansion (14),
For \(k=1,\dots , N\), write \(\Omega _k=\{1,2,\dots , k\}\) and denote \( \sigma _k^2:=\mathbf{E}g_k^2(X_1,\dots , X_k)=\mathbf{E}T_{\Omega _k}^2\). It follows from (14), by the orthogonality property (15), that
Lemma 4
Assume that \(\mathbf{E}\mathbb T^2<\infty \). Then
Assume that (5) and (6) hold, then there exists a constant \(c_*<\infty \) depending on \(D_*,M_*,r,s, \nu _1\) such that
Remark. For \(m=3\), inequality (168) yields \(\Delta _3^2\le \zeta _2+N^{-1}\Delta _4^2\).
Proof
Let us prove (167). The identity
where \(\mathbb U_{k|m}=\sum _{|A|=k,\, A\supset \Omega _m}T_A\), implies
We have
where \(b_k=[k]_m/[N]_m\) satisfies \(b_k\ge b_m\ge m!N^{-m}\). Here we denote \([x]_m=x(x-1)\cdots (x-m+1)\). A comparison of (166) and (171) shows (167).
Let us prove (168). We have
where \({\tilde{b}}_k=(N-m)/(k-m)\le N\). We obtain the inequality
which implies (168).
Let us prove (169). From (166), (167) we have, for \(\sigma ^2=N\sigma _1^2\),
Invoking the bounds, which follow from (5),
and using (6) we obtain (169). \(\square \)
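The orthogonality property (15) behind these variance identities can be seen on a toy degree-two kernel: with \(h(x,y)=xy\) and \(X\) uniform on \(\{0,1,2\}\), the Hoeffding projections \(g_1(x)=\mathbf{E}h(x,X)-\mathbf{E}h\) and \(g_2(x,y)=h(x,y)-\mathbf{E}h-g_1(x)-g_1(y)\) are exactly uncorrelated. A minimal sketch (an illustration, not the paper's statistic):

```python
support = [0, 1, 2]          # X uniform on {0, 1, 2}
p = 1.0 / len(support)

def h(x, y):                 # symmetric degree-two kernel
    return x * y

mu = sum(h(x, y) * p * p for x in support for y in support)

def g1(x):                   # first Hoeffding projection
    return sum(h(x, y) * p for y in support) - mu

def g2(x, y):                # degenerate second-order part
    return h(x, y) - mu - g1(x) - g1(y)

# Orthogonality: E g1(X1) g2(X1, X2) = 0 and E g2(X1, X2) = 0.
cross = sum(g1(x) * g2(x, y) * p * p for x in support for y in support)
mean2 = sum(g2(x, y) * p * p for x in support for y in support)
print(abs(cross) < 1e-12, abs(mean2) < 1e-12)
```

Because the parts are uncorrelated, the variance of the U-statistic splits into the weighted sums of \(\sigma _k^2\) used throughout the appendix.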
In Lemma 5 below we establish moment bounds for various parts of the Hoeffding decomposition defined in Sect. 2.
Lemma 5
Assume that \(\sigma _\mathbb T^2=1\). For \(3\le m\le N\) and \(s>2\), we have
Here c denotes an absolute constant and c(s) denotes a constant which depends only on s.
Proof
The inequalities (172) are proved in [4].
Let us prove (173). Split \(\Lambda _4=z_1+\dots +z_m\), where
Let \(\mathbf{E}'\) denote the conditional expectation given \(X_{m+1},\dots , X_N\). It follows from Rosenthal’s inequality that almost surely
Invoking Hölder’s inequality we obtain, by symmetry,
Using well-known martingale moment inequalities (and their applications to U-statistics), see [20], one can show the bound \(\mathbf{E}|z_1|^s\le c(s)N^{-3s/2}\zeta _s\). Invoking this bound in (174) we obtain the first bound of (173).
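Rosenthal's inequality invoked above bounds \(\mathbf{E}|z_1+\dots +z_m|^s\) by \(c(s)\bigl (\sum \mathbf{E}|z_i|^s+(\sum \mathbf{E}z_i^2)^{s/2}\bigr )\) for independent centered summands. A Monte Carlo sanity check for \(s=3\) with Rademacher summands (illustrative; the constant 8 is an assumption for the demo, not the optimal \(c(s)\)):

```python
import random

random.seed(1)
n, s, trials = 50, 3, 20_000

# Monte Carlo estimate of E|X_1 + ... + X_n|^s for Rademacher X_i.
total = 0.0
for _ in range(trials):
    S = sum(random.choice((-1, 1)) for _ in range(n))
    total += abs(S) ** s
lhs = total / trials

# Rosenthal right-hand side: sum E|X_i|^s + (sum E X_i^2)^{s/2}.
rhs = n * 1.0 + (n * 1.0) ** (s / 2)

print(lhs <= 8 * rhs)  # holds with the generous constant 8
```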
In order to prove the second bound of (173) write
A simple calculation shows \(\mathbf{E}(U^*_k)^2= \left( {\begin{array}{c}N-m\\ k-1\end{array}}\right) \sigma _k^2\). Therefore, by orthogonality,
In the last step we invoke (170) and use the bound \(b_k\le N^3\), where \(b_k= \left( {\begin{array}{c}N-m\\ k-1\end{array}}\right) \left( {\begin{array}{c}N-4\\ k-4\end{array}}\right) ^{-1}\). Clearly, (175) implies \(\mathbf{E}\eta _i^2\le N^{-4}\Delta _4^2\). Finally, using the fact that \(\eta _1,\dots , \eta _m\) are uncorrelated we obtain
thus completing the proof.\(\square \)
Before formulating the next result we introduce some notation. Given m let \(\mathcal D\) denote the class of subsets \(A\subset \Omega _N\) satisfying \(|A|\ge 4\) and \(\Omega _m\cap A\not =\emptyset \). Introduce the random variable \({\mathbb H}(m)=\sum _{A\in \mathcal D}T_A\). Denote \(x_i=2i-1\) and \(y_i=2i\). For an even integer \(m=2k\le N\) write
and put \(A_0=B_0=\emptyset \). Let \(\mathcal{A}(k)\) (respectively \(\mathcal{B}(k)\)) denote the collection of those \(A\in \mathcal D\) which satisfy \(A\cap A_k=\emptyset \) (respectively \(A\cap B_k=\emptyset \)). Furthermore, let \(\mathcal{C}(k)\) denote the collection of \(A\in \mathcal{D}\) such that \(A\cap A_k\not =\emptyset \) and \(A\cap B_k\not =\emptyset \). Write
Lemma 6
There exists an absolute constant c such that,
For an even integer \(m=2k<N\) we have
Proof
Let us prove the first bound of (176). For \(m=4\) we have
A calculation shows that, for \(k=1,2,3,4\),
where the numbers
Invoking (171) we obtain
Finally, we obtain (176) for \(m=4\).
In order to prove (176) for \(m=5,6,\dots \) we apply a recursive argument. Write
where \(d_m={\mathbb H}(m+1)-{\mathbb H}(m)\) is the sum of those \(T_A\) with \(|A|\ge 4\) satisfying \(A\cap \Omega _m=\emptyset \) and \(A\cap \Omega _{m+1}\not =\emptyset \). In particular, we have
Therefore,
where the numbers
Invoking (171) we obtain \(\mathbf{E}d_m^2\le N^{-4}\Delta _4^2\). This bound together with (179) implies (176).
Let us prove (177). Note that for \(m=2k\) we have \({\mathbb H}(m)={\mathbb H}_A(k)+{\mathbb H}_B(k)+{\mathbb H}_C(k)\) and the summands are uncorrelated. Therefore, the first bound of (177) follows from (176).
Let us show the second inequality of (177). For \(k=2\) we have \(\mathcal{C}(2)\subset \mathcal{C}\), where \(\mathcal C\) denotes the class of subsets \(A\subset \Omega _N\) such that \(|A|\ge 4\) and \(|A\cap \Omega _4|\ge 2\). Write \({\mathbb H}_C=\sum _{A\in \mathcal C}T_A\). We have
In the last step we applied (178). We obtain (177) for \(k=2\).
In order to prove the bound (177), for \(k=3,4,\dots \), we apply a recursive argument similar to that used in the proof of (176). Denote
We shall show that
This bound in combination with the identity \(\mathbf{E}{\mathbb H}_C^2(k+1)=\mathbf{E}{\mathbb H}_C^2(k)+\mathbf{E}d_{[k]}^2\) shows (177) for arbitrary k.
In order to show (180) split the set \(\mathcal{C}(k+1)\setminus \mathcal{C}(k)\) into \(2k+1\) non-intersecting parts
where we denote
By the orthogonality property (\(\mathbf{E}T_AT_{V}=0\) for \(A\not =V\)), the random variables
are uncorrelated. Therefore, we have
A calculation shows that
where the coefficients
Invoking (171) we obtain \(\mathbf{E}d_{x.y}^2\le N^{-5}\Delta _4^2\). The same argument shows \(\mathbf{E}d_{x.i}^2=\mathbf{E}d_{y.i}^2\le N^{-5}\Delta _4^2\). The latter bound in combination with (181) shows (180). The lemma is proved.\(\square \)
Appendix 2
Here we construct bounds for the probability density function (and its derivatives) of the random variables \(g_k^*=(N/M)^{1/2}g_k\), for \(1\le k\le n-1\), where \(g_k\) are defined in (74). Since these random variables are identically distributed it suffices to consider
Here \(R=\sqrt{n\,M\,N\,}\). Introduce the random variables
Let \(p_i(\cdot )\) denote the probability density function of \(g_i^*\), for \(i=1,2,3\). Recall that the integers \(n\approx N^{50\nu }\le N^{\nu _2/10}\) and \(M\approx N/n \ge N^{9/10}\) are introduced in (29) and the number \(\nu >0\) is defined by (17).
Lemma 7
Assume that the conditions of Theorem 1 are satisfied. There exist positive constants \(C_*, c_*, c_*'\) depending only on \(M_*, D_*, \delta , r\) and \(\nu _1, \nu \) such that, for \(i=1,2,3\), we have uniformly in \(u\in \mathbb R\) and \(N>C_*\)
Furthermore, given \(w>0\) there exists a constant \(C_*(w)\) depending on \(M_*, D_*, \delta , r\), \(\nu _1, \nu \) and w such that uniformly in \(z_*\in [-2w,2w]\) and \(N>C_*(w)\) we have
Proof
We shall prove (182) and (183) for \(i=1\). For \(i=2,3\), the proof is almost the same. Before starting the proof we introduce some notation and collect auxiliary results.
Denote
and recall that \(q_N=\mathbf{P}\{A_j\}\), where \(A_j=\{\Vert Z_j'\Vert _r\le N^{\alpha }\}\). It follows from \(\mathbf{E}g(X_{m+1})=0\) that
Therefore, by Chebyshev’s inequality, for \(\alpha =3/(r+2)\) we have
In the last step we invoked the inequalities \(\alpha (r-1)\ge 1+(r-1)/(r+2)\ge 3/2\) and \(q_N^{-1}\le c_*\), see (45), and \(\mathbf{E}|g(X_{m+1})|\,\Vert Z_{m+1}'\Vert _r^{r-1}\le M_*\), where the latter inequality follows from (5) by Hölder's inequality.
Similarly, the identities
in combination with (44) and the inequalities
and \(\alpha (r-2)=1+2(r-4)/(r+2)\ge 1\) yield
Introduce the random variables
We have \(g_*=s^{-1}(g_1^*-\theta )\). Let \(p(\cdot )\) denote the density function of \(g_*\). Note that \(p_1(u)=s^{-1}p\bigl (s^{-1}(u-\theta )\bigr )\). Furthermore, we have, by (184), \(|\theta |\le c_*N^{-1}\) and, by (185), (169), \(|s^2-1|\le c_*N^{-1}\). Therefore, it suffices to prove (182) and (183) for \(p(\cdot )\) (we verify the latter inequality for every \(z_*\in [-3w,3w]\)).
In order to prove (182) and (183) we approximate the characteristic function \({\hat{p}}(t)=\mathbf{E}e^{itg_*}\) by \(e^{-t^2/2}\) and then apply a Fourier inversion formula. Write
The fact that \(\tau (t)=0\), for \(|t|\ge 1\), implies \({\hat{p}}(t)=0\), for \(|t|>s\, R\). Therefore, we obtain from the Fourier inversion formula,
Write \({\hat{p}}(t)-e^{-t^2/2}=r_1(t)+r_2(t)\), where
We shall show below that
These bounds in combination with the simple inequality
show that
Here \(\varphi \) denotes the standard normal density function
It follows from (187) that
Furthermore, given w we have uniformly in \(|z_*|\le 3w\)
for sufficiently large M (for \(N>C_*(w)\)).
In order to prove an upper bound for the \(k\)-th derivative, \(|p^{(k)}(x)|\le c_*\), write
and replace \({\hat{p}}(t)\) by \(e^{-t^2/2}\) as in the proof of (187). We obtain
This implies \(|p^{(k)}(x)-\varphi ^{(k)}(x)|\le c_*M^{-1/2}\). We arrive at the desired bound \(|p^{(k)}(x)|\le c_*\), for \(k=1,2,3\).
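The Fourier inversion step used here can be checked numerically in a toy case. The sketch below (illustrative only; the truncation level and trapezoid rule are arbitrary choices) recovers the standard normal density from its characteristic function \(e^{-t^2/2}\).

```python
import math

# Recover the standard normal density via the Fourier inversion formula
# p(x) = (2*pi)^{-1} * integral e^{-itx} phat(t) dt.  Since phat(t) =
# exp(-t^2/2) is real and even, this reduces to
# (1/pi) * integral_0^T cos(t x) exp(-t^2/2) dt, truncated at T.

def density_by_inversion(x, T=10.0, steps=20000):
    h = T / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        w = 0.5 if k in (0, steps) else 1.0   # trapezoid weights
        total += w * math.cos(t * x) * math.exp(-t * t / 2)
    return total * h / math.pi

phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
err = max(abs(density_by_inversion(x) - phi(x)) for x in (0.0, 0.5, 1.0, 2.0))
```

The truncation error is negligible here because \(e^{-t^2/2}\) decays rapidly; in the proof the same mechanism makes the tail of \({\hat{p}}(t)\) harmless.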
In the remaining part of the proof we verify (186). For \(i=2\) this bound follows from \(|\tau (t/sR)-1|\le ct^2/(sR)^2\). The latter inequality is a consequence of the short expansion
and \(\mathbf{E}\xi _1=0\) and \(\mathbf{E}\xi _1^2\le c\), for some absolute constant c.
Let us prove (186) for \(i=1\). Introduce the sequence of i.i.d. centered Gaussian random variables \(\eta _1,\,\eta _2,\,\dots \) with variances \(\mathbf{E}\eta _i^2=M^{-1}\). Denote
We are going to apply the well-known inequality
It follows from (188) and identities \(\mathbf{E}\eta _1^i=\mathbf{E}w_1^i\), \(i=1,2\), that
Here we use the inequality \(\mathbf{E}|\eta _1|^3\le c\mathbf{E}|w_1|^3\), which follows from \(\mathbf{E}\eta _1^2=\mathbf{E}w_1^2\).
Combining (189) and the simple identity
we obtain
Here we denote
We shall show below that
where \(\delta ''>0\) depends on \(\delta , A_*, D_*, M_*, \nu _1\) and is given in (36). This inequality in combination with (190) proves (186).
Let us prove (191). Clearly, \(Z\le |f^{M-1}(t)|+|\gamma ^{M-1}(t)|\). Furthermore, \(f^M(t)=e^{-t^2/2}\). In order to prove (191) we shall show
To show (192) we expand \(e^{itw_1}\) using (188),
Here we used the identity \(|1-t^2/2M|=1-t^2/2M\), which holds for \(|t|<M^{1/2}/{\tilde{\beta }}_3\), since \({\tilde{\beta }}_3\ge 1\). Finally, an application of the inequality \(1-x\le e^{-x}\) to \(x=t^2/3M>0\) completes the proof of (192).
Let us prove (193). For \(\delta ''\) defined by (36) we shall show \(\delta ''\le 2{\tilde{\delta }}\), where
We are going to replace \(g(Y_{m+1}),\, {\tilde{\beta }}_3,\, s^2\) by \(g(X_{m+1}), \,\beta _3, \,\sigma ^2\) respectively. Write
It follows from (44), (45) that, for every \(v\in \mathbb R\),
These bounds imply
One can show that, for sufficiently large N (i.e., for \(N>C_*\)), we have
Using (194), (195) we get, for \(N>C_*\),
We obtain \(|\gamma (t)|\le 1-{\tilde{\delta }}\le 1-\delta ''/2\) and, therefore, \(|\gamma (t)|\le e^{-\delta ''/2}\). The lemma is proved.\(\square \)
Appendix 3
The main results of this section are the moment inequalities of Lemma 9 and the corresponding inequalities for conditional moments of Lemma 10. Lemma 8 provides an auxiliary inequality.
We start with some notation. We call \(v=v(\cdot ),u=u(\cdot )\in L^r\) orthogonal if \(\langle u,v \rangle =0\), where
Given \(f\in L^2(P_X)\) we have for the kernel \(\psi ^{**}\) defined in (41)
and almost surely
The latter identity says that almost all values of the \(L^r\)-valued random variable \(\psi ^{**}(\cdot ,X_2)\) are orthogonal to the vector \(g(\cdot )\in L^r\).
Let \( p_g:L^r\rightarrow L^r \) denote the projection on the subspace of elements \(u\in L^r\) which are orthogonal to \(g=g(\cdot )\). For \(v\in L^r\), write \(v^*=p_g(v)\). It follows from (197) that
where \( b^*(\cdot )=p_g(b(\cdot ))= \sigma ^{-2}p_g\bigl (\, \mathbf{E}\psi (\cdot ,X_1)g(X_1)\, \bigr )\). Denote
where the \(L^r\)-valued random variables \(U_k\) are introduced in (91). For the random variables \(g_k\) and \(L_k\) introduced in (72) and (74), we have
Denote \(K=\mathbf{E}|\psi (X_1,X_2)|^r\) and \(K_s=\mathbf{E}|\psi ^{**}(X_1,X_2)|^s\), \(s\le r\).
Lemma 8
Let \(4<r\le 5\). For \(s\le r\), we have
Proof
The first inequality of (200) is a consequence of Lyapunov’s inequality. Let us prove the second inequality. The inequality \(|a+b+c|^r\le 3^r(|a|^r+|b|^r+|c|^r)\) implies
Therefore, (200) is a consequence of the inequalities
Here \(\kappa =\mathbf{E}\psi (X_1,X_2)g(X_1)g(X_2)\). To prove the first inequality use \(|a+b|^r\le 2^r(|a|^r+|b|^r)\) to get
Furthermore, by the Cauchy–Schwarz inequality,
Finally, Lyapunov’s inequality implies
We obtain \(\mathbf{E}\bigl |\mathbf{E}\bigl (\psi (X_1,X_2)g(X_2)\,| \,X_1\bigr )\bigr |^r\le K\sigma ^r\) thus completing the proof.\(\square \)
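The two elementary tools of this proof, Lyapunov's inequality \(\mathbf{E}|X|^s\le (\mathbf{E}|X|^r)^{s/r}\) for \(s\le r\) and the convexity bound \(|a+b+c|^r\le 3^{r-1}(|a|^r+|b|^r+|c|^r)\) (sharper than the constant \(3^r\) used above), admit quick numerical checks; the distributions below are arbitrary illustrations.

```python
import random

random.seed(1)
r, s = 5.0, 2.0
sample = [random.uniform(-2.0, 3.0) for _ in range(10000)]

# Lyapunov's inequality E|X|^s <= (E|X|^r)^{s/r}, checked on the
# empirical distribution of the sample (an instance of the power-mean
# inequality, so it must hold exactly).
m_s = sum(abs(x) ** s for x in sample) / len(sample)
m_r = sum(abs(x) ** r for x in sample) / len(sample)
lyapunov_ok = m_s <= m_r ** (s / r)

# Pointwise convexity bound |a+b+c|^r <= 3^{r-1}(|a|^r + |b|^r + |c|^r),
# which follows from convexity of t -> |t|^r applied to the average.
triples = [(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1))
           for _ in range(10000)]
pointwise_ok = all(
    abs(a + b + c) ** r
    <= 3 ** (r - 1) * (abs(a) ** r + abs(b) ** r + abs(c) ** r)
    for a, b, c in triples)
```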
Lemma 9
Let \(1\le k\le n-1\). For \({\overline{U}}_k^*\), an independent copy of \(U_k^*\), we have
Recall that \(\delta _3^2=\mathbf{E}|\psi ^{**}(X_1,X_2)|^2\).
Proof
Let us prove (201). By symmetry, we have, for \(i,j\in O_1\),
The inequality (201) follows from the inequalities
Let us prove (203). From (198) we have \(H_1=V_1+V_2+2V_3\), where
Let us show that
This inequality follows from (44), (45) and the identity
where \(V_1'=\mathbf{E}|\psi ^{**}(X_1,X_j)|^2(1-{\mathbb I}_{A_j})\) satisfies, by (43),
In the last step we applied Hölder’s inequality and Lemma 8 to get
Let us show that
For \({\tilde{b}}(\cdot ):= \mathbf{E}\psi (\cdot ,X_1)g(X_1)\) we have, by the Cauchy–Schwarz inequality,
Now the identity \(b^*=\sigma ^{-2}p_g({\tilde{b}})\) implies
Invoking the bound \(\mathbf{E}g^2(Y_j)\le c_*\sigma ^2\), see (47), we obtain (207).
Finally, write
Identity (197) implies \(\mathbf{E}{\tilde{V}}=0\). Therefore \(V_3=q_N^{-1}\mathbf{E}{\tilde{V}}({\mathbb I}_{A_j}-1)\). Invoking (43) and using \(q_N^{-1}\le c_*\), see (45), we obtain
In the last step we used the bound \(\mathbf{E}|{\tilde{V}}|\Vert Z_j'\Vert _r^{r-4}\le c_*\). In order to prove this bound we invoke the inequalities
to show that
Furthermore, by Hölder’s inequality and (200),
By the independence and (208),
Thus we arrive at (209). Combining (205), (207) and (209) we obtain (203).
Let us prove (204). Using (198) write \(H_2=Q_1+Q_2+2Q_3\), where
It follows from the identity (196) that
The simple inequality \(|\psi ^{**}(X_1,X_j)\psi ^{**}(X_1,X_i)|\le |\psi ^{**}(X_1,X_j)|^2+|\psi ^{**}(X_1,X_i)|^2\) yields, by symmetry,
In the last step we applied (206) and \(q_N^{-1}\le c_*\), see (44).
Furthermore, using the identity \(\mathbf{E}g(X_i)=0\) we obtain from (43)
In the last step we applied Hölder’s inequality to show \(\mathbf{E}|g(X_i)|\Vert Z_i\Vert _r^{r-1}\le c_*\).
The bounds (211), (44) and (208) together imply
The bound (204) follows from (210) and (212).
Let us prove (202). For this purpose we shall show that
and where \({\overline{Y}}_j\) denote independent copies of \(Y_j\), \(j\in O_k\). Using
we obtain, by symmetry and (47),
Now (213) follows from the well-known inequality
which is valid for independent centered random elements \(\xi _i\) with values in \(L^r\). One can derive this inequality from the Hoffmann–Jørgensen inequality (see, e.g., Proposition 6.8 in Ledoux and Talagrand [32]) using the type 2 property of the Banach space \(L^r\) and the symmetrization lemma (see formula (9.8) and Lemma 6.3 ibidem). The proof of the lemma is complete.\(\square \)
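The moment inequality invoked here is stated for \(L^r\)-valued sums; its real-valued prototype is Rosenthal's inequality. For \(r=4\) and i.i.d. centered summands that prototype can be verified exactly from the moment expansion \(\mathbf{E}S_n^4=n\mathbf{E}\xi ^4+3n(n-1)(\mathbf{E}\xi ^2)^2\). The constant 3 below is specific to this scalar sketch, not the Banach-space constant of the lemma.

```python
# Exact check of a Rosenthal-type bound for r = 4 and i.i.d. centered xi:
#   E S_n^4 = n*E xi^4 + 3*n*(n-1)*(E xi^2)^2
#           <= 3*(n^2*(E xi^2)^2 + n*E xi^4).
# For xi uniform on [-1, 1]: E xi^2 = 1/3, E xi^4 = 1/5.

m2, m4 = 1.0 / 3.0, 1.0 / 5.0

def fourth_moment_of_sum(n):
    """Exact E S_n^4 for a sum of n i.i.d. centered variables."""
    return n * m4 + 3 * n * (n - 1) * m2 ** 2

bounds_hold = all(
    fourth_moment_of_sum(n) <= 3 * (n ** 2 * m2 ** 2 + n * m4)
    for n in range(1, 200))
```

The Gaussian-dominated term \(n^2(\mathbf{E}\xi ^2)^2\) is exactly the \(n^{r/2}\)-scaling visible in the displayed inequality.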
Before formulating and proving Lemma 10 we introduce some more notation. Let \(\mathcal{B}(L^r)\) denote the class of Borel sets of \(L^r\). Consider the regular conditional probability \(P_k:{\mathbb R}\times \mathcal{B}(L^r)\rightarrow [0,1]\), defined, for \(z_k\in {\mathbb R}\) and \(B\in \mathcal{B}(L^r)\),
Recall, see (82), that \(\psi _k\) denotes an \(L^r\)-valued random variable with the distribution \(\mathbf{P}\{\psi _k\in B\}=P_k(z_k;B)\). Note that the \(L^r\)-valued random variable \(\psi _k^*=p_g(\psi _k)\) has distribution
Furthermore, using (199) we write (215) in the form
Let \({\overline{\psi }}_k\) respectively \({\overline{\psi }}^*_k\) denote an independent copy of \(\psi _k\) respectively \(\psi ^*_k\). Denote
Lemma 10
Let \(k=1,\dots , n-1\). Let \(|z_k|\le w\,n^{-1/2}\). There exist positive constants \(c_*^{(i)}\), \(i=0,1,2,3\), which depend on \(w, r,\nu _1,\nu _2, \delta , A_*,D_*,M_*\) only such that for
we have
Condition (216) requires N to be large enough. A simple calculation shows \(\tau _N\le N^{-75\nu }\), for \(\nu \) satisfying (17). Therefore, (87) implies \(\tau _N\le N^{-65\nu }\delta _3^2\). In particular, under (87) the inequality (216) is satisfied provided that \(N>c_*\), where \(c_*\) does not depend on \(\delta _3^2\).
Proof
By \({\tilde{c}}_*, {\tilde{c}}_*'\) we denote positive constants which depend only on \(w, r,\nu _1,\nu _2, \delta , A_*, D_*,M_*\); these constants may differ from one occurrence to the next. Given \(i,j\in O_k\), \(i\not = j\), introduce random variables
Here \(R=\sqrt{n\,M\,N}\) satisfies \(N/2\le R\le N\), by the choice of n and M. Let p, \(p_0\), \(p_1\), and \(p_2\) denote the densities of random variables \(\eta \), \(\zeta +\eta \), \(\zeta _i+\eta \), and \(\zeta _{ij}+\eta \) respectively.
Note that \(g_*=\sqrt{N/M}g_k\). Therefore, the condition \(g_k=z_k\) is equivalent to \(g_*=z_*\), where \(z_*=\sqrt{N/M}z_k\). Furthermore, \(|z_k|\le w\, n^{-1/2}\Leftrightarrow |z_*|\le w_*\), where \(w_*=w\sqrt{N/Mn}\le 2w\).
Given a random variable Y, we denote the conditional expectation \(\mathbf{E}(Y|g_*=z_*)=\mathbf{E}(Y|g_k=z_k)\) by \(\mathbf{E}_*Y\). For an event A, we have \(P(A|g_k=z_k)=P(A|g_*=z_*)\).
Proof of (217). For the \(L^r\)-valued random variable \({\hat{\psi }}^*=\psi _k^*-z_kb^*\) we have
Note that for an independent copy \({\overline{\psi }}^*_k\) of \(\psi ^*_k\) the distributions of \(\psi _k^*-{\overline{\psi }}^*_k\) and \({\hat{\psi }}^*-{\hat{\psi }}^*_c\) are the same. Here \({\hat{\psi }}^*_c\) denotes an independent copy of \({\hat{\psi }}^*\). Therefore,
In order to prove (217) we show that
and, for \(\tau _N\le c_*^{(0)}\delta _3^2\) (i.e., for sufficiently large N),
Since \(N^{-1}n<\tau _N\), we can choose \(c_*^{(0)}\) small enough such that the inequalities (220), (221) and (222) together imply (217).
Proof of (221). Recall that an element \(m=m(\cdot )\in L^2(P_X)\) is called the mean of an \(L^2(P_X)\)-valued random variable \({\hat{\psi }}^*={\hat{\psi }}^*(\cdot )\) if for every \(f=f(\cdot )\in L^2(P_X)\)
We shall show below that \(\mathbf{E}\Vert {\hat{\psi }}^*\Vert _2^2<\infty \). Then, by Fubini,
Therefore, \(m(x)=\mathbf{E}{\hat{\psi }}^*(x)\), for \(P_X\) almost all x.
For \(f\in L^2(P_X)\) it follows from (219) that
Fix \(i\in O_k\). By symmetry,
An application of (252) yields
where
are non-random elements of \(L^r\). It follows from (223), (224), (225) that
In order to prove (221) we show that, for \(|z_*|\le w_*\),
and apply (208). Note that, by Lemma 7, there exist positive constants \({\tilde{c}}_*, {\tilde{c}}_*'\) such that, for \(M,N>{\tilde{c}}_*'\), the inequality (228) holds.
Let us prove (226). In Lemma 7 we show, for \(i=1,2\), that \(p_i\) and its derivatives are bounded functions. That is,
Expanding in powers of \(M^{-1/2}g(Y_i)\) we obtain
It follows from the identities (196) and (197) that for \(P_X\) almost all x
Using (229) and the inequality \(q_N^{-1}\le c_*\), see (44), we obtain from (230)
where we denote \(a_2(\cdot )=\mathbf{E}\psi ^{**}(\cdot ,Y_i)g^2(Y_i)\). In order to prove (226) we show that
Let us prove (231). Invoking (43) we obtain, by Hölder’s inequality,
where we denote \(w(x)=\mathbf{E}|\psi ^{**}(x,X_i)|^r\). Furthermore, by Lyapunov’s inequality,
Clearly, the first bound of (231) follows from (232), (233) and (200). A similar argument shows the second bound of (231). We have
where we denote \(V=\mathbf{E}\bigl ( \Vert Z_i'\Vert _r^{r-2} |g(X_i)|\bigr )^{r/(r-1)}\). By Hölder’s inequality,
Clearly, (233), (234) and (235) imply the second bound of (231). The last bound of (231) follows from (47), by the Cauchy–Schwarz inequality. Indeed, we have
Therefore, \(\Vert a_2(\cdot )\Vert _2^2\le c_*K_2\mathbf{E}g^4(X_i)\le c_*\), by (200).
Let us prove (227). We have, by (251),
In order to prove (227) it suffices, in view of (228), to show that
Let \({\tilde{p}}\) denote the density function of \(\xi _k\). Then \(p(u)=R\,{\tilde{p}}(R\, u)\). We have
Therefore, denoting \(H(z_*)=1+|R\,(z_*-\zeta )|^5\), we obtain
On the event \(|\zeta -z_*|\ge R^{-1/2}\) we have \( H^{-1}(z_*) \le R^{-5/2}\). Furthermore, a bound for the probability of the complementary event
follows by the Berry–Esseen bound applied to the sum \(\zeta \). Therefore, \(\mathbf{E}H^{-1}(z_*)\) is bounded by the right side of (236). Now (236) follows from (237).
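The Berry–Esseen step above controls the probability that the normalized sum \(\zeta \) lands in a small interval. A Monte Carlo sketch of this mechanism in a scalar toy model (all parameters are illustrative choices, not the paper's):

```python
import math
import random

random.seed(7)
n, trials, eps, z = 100, 40000, 0.1, 1.0
sqrt3 = math.sqrt(3.0)

# zeta = n^{-1/2} * sum of i.i.d. uniform(-sqrt(3), sqrt(3)) (unit variance).
hits = 0
for _ in range(trials):
    s = sum(random.uniform(-sqrt3, sqrt3) for _ in range(n)) / math.sqrt(n)
    hits += abs(s - z) <= eps
empirical = hits / trials

# Normal approximation: P(|zeta - z| <= eps) ~ 2*eps*phi(z), the Berry-Esseen
# theorem bounding the error by a multiple of n^{-1/2}.
phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
predicted = 2 * eps * phi
```

In the proof the interval has length of order \(R^{-1/2}\), so the same estimate gives a probability of order \(R^{-1/2}\).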
Proof of (222). Write
It follows from (219), by the inequality \(\Vert u+v\Vert _2^2\ge \Vert u\Vert _2^2/2-\Vert v\Vert _2^2\), for \(u,v\in L^2(P_X)\), that
We shall show that
The inequalities (238) and (239) imply the lower bound in (222). Indeed, by (228), we have, for small \(c_*^{(0)}\),
Similarly, the inequalities (238) and (240) imply the upper bound in (222).
Proof of (238). We have, by (251),
Proceeding as in the proof of (236), we obtain
where \({\tilde{H}}(z_*)=1+|R(z_*-\zeta )|^{4}\) satisfies
Therefore, \( W \le c_*R^{-3/2}+c_*R^{-1}M^{-1/2}\). This inequality in combination with (208) implies (238).
Proof of (239). Fix \(i,j\in O_k\), \(i\not =j\). By symmetry,
We have, by (252),
The inequality (239) follows from (241) and the bounds
Let us prove (242). It follows from (229), by the mean value theorem, that
where \(|Q|\le c_*M^{-1/2}\). Indeed, by (47) and the Cauchy–Schwarz inequality,
In the last step we applied (200). Furthermore, the identity
combined with (43), (44) and (45) yields \(\mathbf{E}T_{11}\ge \delta _3^2-c_*M^{-1/2}\). This bound together with (244) shows (242).
Let us prove (243). Write \(y_i=g(Y_i)\) and expand
where \({\tilde{Q}}\) denotes the remainder term. From (229) it follows, for \(2<r-2\le 3\) that
Furthermore, denote
We obtain, by symmetry,
Denote
It follows from (45), by Hölder's inequality and (200), that
Therefore,
Furthermore, (196) and (197) imply
Invoking the inequalities \(q_N^{-2}\le c_*\), see (44), and \(1-\mathbb I_{A_i}\le V_i^s\), \(s>0\), where \(V_i:= \Vert Z_i'\Vert _r/N^{\alpha }\), see (43), we obtain, by Hölder’s inequality,
Combining (245), (247), (246) and using the simple inequalities
and the inequalities (229), we obtain (243).
Proof of (240). The inequality follows from (241), (243) and the inequality
which is obtained in the same way as (242) above.
Proof of (218). In order to prove (218) we shall show that
Split \(O_k=B\cup D\), where \(B\cap D=\emptyset \) and \(|B|=[M/2]\) and write
In particular, we have \(g_*=\eta +\zeta _B+\zeta _D\).
The inequality
combined with the bounds
implies (248). Let us prove the first bound of (249). By (252), we have
where \(p_3\) denotes the density of \(\eta +\zeta _D\). Furthermore, invoking the bound \(\sup _{x\in {\mathbb R}}|p_3(x)|\le c_*\) (which is obtained using the same argument as in the proof of Lemma 7) and the inequality (228), we obtain \(\mathbf{E}_*\Vert U_B\Vert _r^r\le {\tilde{c}}_*\mathbf{E}\Vert U_B\Vert _r^r\). Finally, invoking the bound
see (47) and (214), we obtain the first bound of (249). The second bound is obtained in the same way. This completes the proof of the lemma.\(\square \)
We collect some facts about conditional moments in a separate lemma.
Lemma 11
Let \(\eta \) and \(\zeta \) be independent random variables. Assume that \(\eta \) is real valued and has a density, say \(x\rightarrow p(x)\).
(i) Assume that \(\zeta \) is real valued. Then the function
is a density of the distribution \(P_{\eta +\zeta }\) of \(\eta +\zeta \). Let \(w:{\mathbb R}\rightarrow {\mathbb R}\) be a measurable function such that \(\mathbf{E}|w(\eta )|<\infty \). For \(P_{\eta +\zeta }\) almost all \(x\in {\mathbb R}\), we have
(ii) Assume that \(\zeta \) takes values in a measurable space, say \(\mathcal Y\). Assume that \(u,v:\mathcal Y\rightarrow {\mathbb R}\) are measurable functions and denote \(P_{\eta +u(\zeta )}\) the distribution of \(\eta +u(\zeta )\). If \(\mathbf{E}|v(\zeta )|<\infty \), then for \(P_{\eta +u(\zeta )}\) almost all \(x\in {\mathbb R}\),
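Lemma 11(ii) can be sanity-checked by simulation in a scalar toy case: with \(\eta \) standard normal, \(\zeta =\pm 1\) with probability 1/2 and \(u(\zeta )=v(\zeta )=\zeta \), the conditional-expectation formula reduces to \(\tanh (x)\). Everything below is an illustrative choice, not one of the paper's objects.

```python
import math
import random

random.seed(3)

# Lemma 11(ii) toy case: eta ~ N(0,1), zeta = +/-1 with prob 1/2,
# u(zeta) = v(zeta) = zeta.  The formula
#   E(v(zeta) | eta + u(zeta) = x) = E[v(zeta) p(x - u(zeta))] / p_0(x),
# with p the standard normal density, reduces to
#   (phi(x-1) - phi(x+1)) / (phi(x-1) + phi(x+1)) = tanh(x).

x, h = 1.0, 0.05
num = den = 0.0
for _ in range(400000):
    zeta = random.choice((-1.0, 1.0))
    s = random.gauss(0.0, 1.0) + zeta
    if abs(s - x) <= h:        # crude conditioning on {eta + zeta ~ x}
        num += zeta
        den += 1.0
mc_estimate = num / den
formula = math.tanh(x)
```

The crude window conditioning introduces an \(O(h^2)\) bias, well inside the Monte Carlo tolerance used here.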
Appendix 4
In the next lemma we consider independent and identically distributed random vectors \((\xi ,\,\eta )\) and \((\xi ',\, \eta ')\) with values in \({\mathbb R}^2\) and the symmetrization \((\xi _s,\eta _s)\) where \(\xi _s=\xi -\xi '\) and \(\eta _s=\eta -\eta '\). Note that in the main text we apply this lemma to \(\xi =g(X_1)\) and \(\eta =N^{-1/2}\sum _{j=m+1}^N\psi (X_1,Y_j)\).
Lemma 12
Let \(0<\nu <1/2\) and \(r>2\). Assume that \(\mathbf{E}|\xi |^r+\mathbf{E}|\eta |^r<\infty \). The following statements hold.
(a) For \(c_r=(7/12)2^{-r}\) the conditions
imply \(1-|\mathbf{E}\exp \{i(t\xi +\eta )\}|^2\ge 6^{-1}(t^2\mathbf{E}\xi _s^2+\mathbf{E}\eta _s^2)\).
(b) Assume that for some \({\tilde{c}}_1, {\tilde{c}}_2 >0\) we have
Let \(\varepsilon >0\) be such that
where \({\tilde{c}_3}=2+(5/{\tilde{c}}_1)^2\sigma _z^2\) and where the numbers A, B are defined in (265). Here \(\sigma _z^2=\mathbf{E}(\xi _s+N^{-1/2}\eta _s)^2\). Assume that for some \(0<\delta <{\tilde{c}}_2\) and \(\delta '>10\varepsilon ^2\),
Then for every \(T^*\), satisfying \(N^{1/2-\nu }\le |T^*|\le N^{\nu +1/2}\), the set
is an interval of size at most \(5{\tilde{c}}_1^{-1} \varepsilon \).
Proof
Proof of (a). Invoking the inequality \(1-\cos x\ge x^2/2-x^2/24-|x|^r\) and using the simple inequality \(|a+b|^r\le 2^{r-1}(|a|^r+|b|^r)\) we obtain
In the last step we used the conditions of (a).
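The trigonometric inequality \(1-\cos x\ge x^2/2-x^2/24-|x|^r\), \(r>2\), invoked above can be verified on a grid: for \(|x|\le 1\) it follows from \(1-\cos x\ge 11x^2/24\), while for \(|x|\ge 1\) the term \(|x|^r\ge x^2\) makes the right-hand side nonpositive. A numerical sweep (the values of \(r\) are arbitrary):

```python
import math

# Grid check of 1 - cos(x) >= x^2/2 - x^2/24 - |x|^r for several r > 2.

def holds(r, lo=-20.0, hi=20.0, steps=40001):
    for k in range(steps):
        x = lo + (hi - lo) * k / (steps - 1)
        lhs = 1.0 - math.cos(x)
        rhs = x * x / 2 - x * x / 24 - abs(x) ** r
        if lhs < rhs - 1e-12:   # tolerance for rounding at x = 0
            return False
    return True

ok = all(holds(r) for r in (2.1, 2.5, 3.0, 4.0, 5.0))
```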
Proof of (b). Introduce the function \(t\rightarrow \tau _t^*=1-|\mathbf{E}e^{it(\xi +N^{-1/2}\eta )}|^2\). Assume that the set \(I^*\) is non-empty and choose \(s,t\in I^*\), i.e., we have \(\tau _t^*,\tau _s^*\le \varepsilon ^2\). Firstly we show that \(|s-t|\le 5{\tilde{c}}_1^{-1} \varepsilon \), thus proving the bound for the size of the set \(I^*\).
The inequality \(1-\cos (x+y)\ge (1-\cos x)/2-(1-\cos y)\) implies
for arbitrary random variables X, Y. Choosing \({\tilde{Y}}=t(\xi +N^{-1/2}\eta )\) and \({\tilde{X}}=(s-t)(\xi +N^{-1/2}\eta )\) shows
Now we show that the inequality \(|t-s|>5{\tilde{c}}_1^{-1}\varepsilon \) implies \(1-|\mathbf{E}e^{i{\tilde{X}}}|^2>5\varepsilon ^2\), thus contradicting our choice \(\tau _s^*,\tau _t^*<\varepsilon ^2\) and (257). In what follows the cases of “large” and “small” values of \(|t-s|\) are treated separately.
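The splitting inequality \(1-\cos (x+y)\ge (1-\cos x)/2-(1-\cos y)\) behind (256) is equivalent to \(3+\cos x-2\cos y-2\cos (x+y)\ge 0\) and is \(2\pi \)-periodic in each variable, so a grid check over one period is conclusive up to rounding:

```python
import math

# Grid check of 1 - cos(x+y) >= (1 - cos x)/2 - (1 - cos y) over a full
# period in each variable; equality holds at x = y = 0.
ok2 = True
steps = 400
for i in range(steps + 1):
    x = -math.pi + 2 * math.pi * i / steps
    for j in range(steps + 1):
        y = -math.pi + 2 * math.pi * j / steps
        lhs = 1 - math.cos(x + y)
        rhs = (1 - math.cos(x)) / 2 - (1 - math.cos(y))
        if lhs < rhs - 1e-12:
            ok2 = False
```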
For \(5{\tilde{c}}_1^{-1}\varepsilon <|t-s|\le \delta \) we shall apply (256) to \({\tilde{X}}=X+Y\), where \(X=(s-t)\xi \) and \(Y=(s-t)N^{-1/2}\eta \). Note that statement (a) implies
Indeed, in view of the second inequality of (253), the conditions of (a) are satisfied for \(|t-s|\le \delta \le {\tilde{c}}_2\). Furthermore, we have
Invoking the bounds (258) and (259) in (256) we obtain
In the last step we used (253).
For \(\delta <|t-s|\le N^{-\nu +1/2}\) we expand in powers of \(a=i(s-t)N^{-1/2}\eta _s\) to get
In the last step we applied (255).
Let us prove that \(I^*\) is indeed an interval. Assume the contrary, i.e. there exist \(s<u<t\) such that \(s,t\in I^*\) and \(u\notin I^*\). In particular, \(\tau _t^*\le \varepsilon ^2<\tau _u^*\). Clearly, we can choose u to be a local maximum (stationary) point of the function \(t\rightarrow \tau _t^*\). Denote
An application of (256) to \(Y'=(t-u)(\xi +N^{-1/2}\eta )\) and \(X'=u(\xi +N^{-1/2}\eta )\) gives
Invoking the inequalities \(\tau _t^*\le \varepsilon ^2\) and \(1-\cos (t-u)z\le (t-u)^2z^2/2\) we obtain
Here we used the bound \(|t-u|\le |t-s|\le 5\varepsilon /{\tilde{c}}_1\) proved above.
Denoting \(y=(t-u)z\) we have \(\tau _t^*=1-\mathbf{E}e^{iuz}e^{iy}\). Invoking the expansion \(e^{iy}=1+iy+(iy)^2/2+R'\), where \(|R'|\le y^2/6+|y|^r\), we obtain
For a stationary point u we have \(0=\frac{\partial }{\partial t}\tau _t^*\bigl |_{t=u}=-i\mathbf{E}z e^{iuz}\). Therefore, \(\mathbf{E}ye^{iuz}=0\) and (261) implies
Write the right hand side in the form \(\tau _u^*+2^{-1}(t-u)^2R_1\), where
Note that the inequality \(R_1>0\) contradicts our assumption \(\tau _t^*< \tau _u^*\). We complete the proof by showing that \(R_1>0\).
Since the random variable z is symmetric we have \(\mathbf{E}z^2\sin uz=0\). Therefore,
Given \(\lambda >0\) split
In the last step we used Chebyshev’s inequality. Furthermore, invoking the inequality \(\mathbf{E}(1-\cos uz)=\tau _u^*\le {\tilde{c}}_3\varepsilon ^2\), see (260), we obtain from (262) and (263) for \(\lambda ^2=\varepsilon ^{-1}\sigma _z^2\)
Finally, invoking the inequality \(|t-u|\le |t-s|\le 5{\tilde{c}}_1^{-1}\varepsilon \) we obtain from (264)
where for the random variable \(z=\xi _s+N^{-1/2}\eta _s\) we write
Thus, for \(\varepsilon \) satisfying (254) we have \(R_1>0\).\(\square \)
Appendix 5
Let \(Z_1,\dots , Z_N\) be independent copies of the \(L^r\)-valued random element \(Z=\{x\rightarrow \psi (x,Y)\}\). Recall that almost surely \(\Vert Z\Vert \le N^{\alpha }\). Here \(\Vert \cdot \Vert \) denotes the norm of the Banach space \(L^r\), where \(r>4\) and \(1/2>\alpha >0\). Write \(M_p=\mathbf{E}|\psi (X_1,X_2)|^p\).
Lemma 13
(i) Assume that \(\Vert \mathbf{E}Z\Vert ^2\le \mathbf{E}\Vert Z\Vert ^2/N\). Then there exists a constant \(c(r)>0\) such that for \(k\le N\) and \(x>c(r)\) we have
$$\begin{aligned} \mathbf{P}\{\Vert Z_1+\dots +Z_k\Vert >k^{1/2} u\,x\} \le \exp \{-2^{-5}x^2(1+xN^{\alpha }/k^{1/2}u)^{-1}\}. \end{aligned}$$(266)
Here \(u^2=\mathbf{E}\Vert Z\Vert ^2\).
(ii) The following inequalities hold
$$\begin{aligned}&\Vert \mathbf{E}Z\Vert \le M_r/q_N N^{(r-1)\alpha }, \end{aligned}$$(267)
$$\begin{aligned}&q_N^{-1}(M_2-M_rN^{-(r-2)\alpha }) \le \mathbf{E}\Vert Z\Vert ^2 \le q_N^{-1}(M_r^{2/r}+M_rN^{-(r-2)\alpha }). \end{aligned}$$(268)
Remark. Assume that
Then (267) and (268) imply the inequality \(\Vert \mathbf{E}Z\Vert ^2\le \mathbf{E}\Vert Z\Vert ^2/N\). Note that \(r\alpha >2\) implies \(\varkappa >2\). Furthermore, by (44), the probability \(q_N\) satisfies \(q_N>1-M_rN^{-r\alpha }\).
Proof
We derive (i) from Yurinskii’s [36] inequality. Denote \(\zeta _k=Z_1+\dots +Z_k\). Using the type 2 inequality for the \(L^r\)-valued random variable \(\zeta _k-\mathbf{E}\zeta _k\),
and the inequality \(\Vert Z_1-\mathbf{E}Z_1\Vert ^2\le 2\Vert Z_1\Vert ^2+2\Vert \mathbf{E}Z_1\Vert ^2\), we obtain
We have
It follows from the inequality \(\Vert Z_1\Vert \le N^{\alpha }\) that
Write \(B_k^2=ku^2\). Theorem 2.1 of Yurinskii [36] shows
provided that \({\overline{x}}=x-\beta _k/B_k>0\).
Since \(\beta _k/B_k\le 1+c'(r)(1+k^{-1/2})\) we have, for \(x> c(r):=4c'(r)+2\),
The latter inequality implies
Finally, replacing B by \(B'\) in (269) we obtain (266).
Let us prove (ii). The mean value \(\mathbf{E}Z=\{x\rightarrow \mathbf{E}\psi (x,Y)\}\) is an element of \(L^r\). For \(P_X\) almost all \(x\in \mathcal X\) we have \(\mathbf{E}\psi (x,X)=0\). Therefore,
Invoking (43) and using Chebyshev and Hölder inequalities, we obtain, for \(P_X\) almost all x,
where \(a(x)=(\mathbf{E}|\psi (x,X)|^{r})^{1/r}\). Note that \(\mathbf{E}\Vert Z'\Vert _r^{r}=M_r\) and \(\Vert a\Vert ^r=M_r\). Finally,
Let us prove (268). Denote \(b_p(x)=(\mathbf{E}_{X_1}|\psi (X_1,x)|^p)^{1/p}\). Here \(\mathbf{E}_{X_1}\) denotes the conditional expectation given all the random variables except \(X_1\). We have
By Hölder’s inequality \(b_r(x)\ge b_2(x)\), for \(P_X\) almost all x. Therefore,
Combining (271) and (270) and the bound \(|R|\le M_rN^{-(r-2)\alpha }\) we obtain (268). In order to bound |R| we use (43), \(|R| \le N^{-(r-2)\alpha }\mathbf{E}\Vert Z'\Vert _r^{r-2}b_r^2(X)\), and apply Hölder’s inequality,
The lemma is proved.\(\square \)
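Exponential bounds of the form (266) have a familiar real-valued prototype: Hoeffding's inequality for bounded centered summands. The Monte Carlo sketch below illustrates that prototype only (all parameters arbitrary); it is not the Banach-space bound of the lemma.

```python
import math
import random

random.seed(11)
k, trials, x = 100, 20000, 2.0

# Real-valued analogue of an exponential tail bound for bounded summands:
# for Z_i = +/-1 (so |Z_i| <= 1, E Z_i = 0, u^2 = E Z_i^2 = 1), Hoeffding's
# inequality gives P(Z_1 + ... + Z_k > k^{1/2} x) <= exp(-x^2 / 2).
exceed = 0
for _ in range(trials):
    s = sum(random.choice((-1, 1)) for _ in range(k))
    exceed += s > math.sqrt(k) * x
empirical_tail = exceed / trials
hoeffding_bound = math.exp(-x * x / 2)
```

In (266) the extra factor \((1+xN^{\alpha }/k^{1/2}u)^{-1}\) accounts for the growing almost-sure bound \(N^{\alpha }\) on \(\Vert Z\Vert \).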
Bloznelis, M., Götze, F. Edgeworth approximations for distributions of symmetric statistics. Probab. Theory Relat. Fields 183, 1153–1235 (2022). https://doi.org/10.1007/s00440-022-01144-x
Keywords
- Edgeworth expansion
- Littlewood–Offord problem
- Concentration in Banach spaces
- Symmetric statistic
- U-statistic
- Hoeffding decomposition