1 Introduction and results

1.1 Introduction

Let \(X,\, X_1,X_2,\dots , X_N\) be independent and identically distributed random variables taking values in a measurable space \((\mathcal X,\mathcal B)\). Let \(P_X\) denote the distribution of X on \((\mathcal X,\mathcal B)\). We assume that \(\mathbb T(X_1,\dots , X_N)\) is a symmetric function of its arguments (a symmetric statistic, for short). Furthermore, we assume that the moments \(\mathbf{E}\mathbb T\) and \(\sigma _{\mathbb T}^2:=\mathbf{Var}\mathbb T\) are finite. A function of the observations \(X_1,\dots , X_N\) is called a linear statistic if it can be represented as a sum of functions each depending on a single observation only. Many important statistics are nonlinear but can be approximated by linear statistics; we call such statistics asymptotically linear. The central limit theorem and the normal approximation with rate \(O(N^{-1/2})\) extend to the class of asymptotically linear statistics as well. Our approach to studying the distribution of this class of statistics in the statistically relevant case of asymptotically normal \(\mathbb T\) is based on Hoeffding’s decomposition of \(\mathbb T\); see Hoeffding [31], Efron and Stein [21] and van Zwet [37]. Hoeffding’s decomposition expands \(\mathbb T\) into a series of centered and mutually uncorrelated U-statistics of increasing order

$$\begin{aligned} {\mathbb T}&= \mathbf{E}{\mathbb T} + \frac{1}{N^{1/2}} \sum _{1\le i\le N}g(X_i) + \frac{1}{N^{3/2}}\sum _{1\le i<j\le N}\psi (X_i,X_j)\\&\quad \, + \frac{1}{N^{5/2}}\sum _{1\le i<j<k\le N}\chi (X_i,X_j,X_k) +\dots . \end{aligned}$$

Let L, Q and K denote the first, the second and the third sum, respectively. We call L the linear part, Q the quadratic part and K the cubic part of the decomposition. We shall consider the general situation where the statistic \(\mathbb T=\mathbb T^{(N)}\), the space \((\mathcal X,\mathcal B)=(\mathcal X^{(N)},\mathcal B^{(N)})\) and the distribution \(P_X=P_X^{(N)}\) all depend on N as \(N\rightarrow \infty \). In order to keep the notation simple we suppress the dependence on N in what follows. An improvement over the normal approximation is obtained by using Edgeworth expansions for the distribution function \(\mathbb F(x)=\mathbf{P}\{\mathbb T-\mathbf{E}\mathbb T\le \sigma _{\mathbb T}x\}\). For this purpose we write Hoeffding’s decomposition in the form

$$\begin{aligned} \mathbb T-\mathbf{E}\mathbb T=L+Q+K+R, \end{aligned}$$
(1)

where R denotes the remainder. For a number of important examples of asymptotically linear statistics we have \(R/\sigma _{\mathbb T}=o_P(N^{-1})\) (in probability) as \(N\rightarrow \infty \). Therefore, the U-statistic \(\sigma _{\mathbb T}^{-1}(L+Q+K)\) can be viewed as a stochastic expansion of \((\mathbb T-\mathbf{E}\mathbb T)/\sigma _{\mathbb T}\) up to the order \(o_P(N^{-1})\).
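To fix ideas, here is a minimal Python sketch (ours; the centered kernels g, psi and chi of the decomposition are assumed to be available as callables) that evaluates the linear, quadratic and cubic sums L, Q and K for a sample x:

```python
from itertools import combinations

def hoeffding_parts(x, g, psi, chi):
    """Linear, quadratic and cubic sums of Hoeffding's decomposition,
    with the N^{-1/2}, N^{-3/2}, N^{-5/2} normalizations used above."""
    N = len(x)
    L = N**-0.5 * sum(g(xi) for xi in x)
    Q = N**-1.5 * sum(psi(x[i], x[j])
                      for i, j in combinations(range(N), 2))
    K = N**-2.5 * sum(chi(x[i], x[j], x[k])
                      for i, j, k in combinations(range(N), 3))
    return L, Q, K
```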

Furthermore, a so-called Edgeworth expansion of \(\sigma _{\mathbb T}^{-1}(L+Q+K)\) can be used to approximate \(\mathbb F(x)\) by a smooth distribution function G(x) as defined in (2) below depending on N and moments of \(\mathbb T\). A two term Edgeworth expansion of the distribution function of \(\sigma _{\mathbb T}^{-1}(L+Q+K)\) is given by

$$\begin{aligned} G(x)= & {} \Phi (x) -\frac{1}{\sqrt{N}}\frac{\kappa _3}{6}(x^2-1)\Phi '(x) \nonumber \\&-\frac{1}{N} \Bigl ( \frac{\kappa _3^2}{72}(x^5-10x^3+15x)\Phi '(x)+\frac{\kappa _4}{24}(x^3-3x)\Phi '(x) \Bigr ). \end{aligned}$$
(2)

Here \(\Phi \) and \(\Phi '\) denote the standard normal distribution function and its density, respectively. Furthermore, we introduce \(\sigma ^2=\mathbf{E}g^2(X_1)\) and

$$\begin{aligned} \kappa _3=&\, \sigma ^{-3}\Bigl (\mathbf{E}g^3(X_1)+3\mathbf{E}g(X_1)g(X_2)\psi (X_1, X_2)\Bigr ), \\ \kappa _4 =&\,\sigma ^{-4}\Bigl (\mathbf{E}g^4(X_1)-3\sigma ^4+12\mathbf{E}g^2(X_1)g(X_2)\psi (X_1,X_2) \\&+12\mathbf{E}g(X_1)g(X_2)\psi (X_1,X_3)\psi (X_2,X_3) \\&+4\mathbf{E}g(X_1)g(X_2)g(X_3)\chi (X_1,X_2,X_3) \Bigr ). \end{aligned}$$
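For numerical reference, a direct Python transcription (ours) of the expansion (2), with \(\kappa _3\) and \(\kappa _4\) assumed known or estimated, reads:

```python
import math

def edgeworth_G(x, N, k3, k4):
    """Two term Edgeworth expansion G(x) from (2); Phi is the standard
    normal distribution function and phi = Phi' its density."""
    Phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    term3 = (k3 / 6.0) * (x**2 - 1.0) * phi
    term4 = ((k3**2 / 72.0) * (x**5 - 10.0*x**3 + 15.0*x)
             + (k4 / 24.0) * (x**3 - 3.0*x)) * phi
    return Phi - term3 / math.sqrt(N) - term4 / N
```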

Our main result, Theorem 1 below, establishes a bound \(o(N^{-1})\) for the Kolmogorov distance between \(\mathbb F(x)\) and G(x):

$$\begin{aligned} \Delta =\sup _{x\in \mathbb R}|\mathbb F(x)-G(x)|= o(N^{-1}). \end{aligned}$$
(3)

Valid expansions of this type were established by Cramér [19] for sums of independent random variables \(X_j\) and later for the Student statistic (which is of type (1)) by Kai-Lai Chung [18]. A new impetus for studying higher order approximations in statistics was given by the fundamental paper of Hodges and Lehmann on deficiency [30], where they compared the power of two tests based on N and \(N'\) observations, respectively, with \(N'-N=o(N)\) as \(N\rightarrow \infty \). They suggested a program of comparisons of the power of tests, estimators and confidence regions based on classical parametric and nonparametric symmetric statistics, e.g. those using ranks and ordered samples. They noted that this would require going beyond Gaussian limit theorems to asymptotic expansions to order \(N^{-1}\). For more details on the statistical relevance and the related development of asymptotic methods we refer to the review paper in memory of Willem van Zwet [12].

Now we discuss the principal contribution of this paper: the minimal smoothness and structural conditions under which approximation (3) holds. Let us emphasize that any \(\mathbb F\) satisfying (3) cannot have fluctuations/increments of order \(\Theta (N^{-1})\) on intervals of size \(o(N^{-1})\), because G is a differentiable function with all derivatives bounded. We focus on the conditions that guarantee the necessary level of smoothness of the distribution of \(\mathbb T\). In the case of a linear statistic \(\mathbb T=\mathbf{E}\mathbb T+L\) the necessary smoothness of \(\mathbb F\) is ensured by the classical Cramér condition

$$\begin{aligned} \limsup _{|t|\rightarrow \infty } |\mathbf{E}\exp \{it g(X_1)\}|<1. \qquad \qquad (C) \end{aligned}$$

This condition excludes, in particular, lattice distributions, for which approximation (3) obviously fails. We note that condition (C) can be weakened to cover certain classes of sufficiently non-lattice discrete distributions; see, e.g., Bickel and Robinson [13], Angst and Poly [1] or Bobkov [14] for almost sure choices of such non-lattice discrete distributions.
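For example, if \(g(X_1)\) takes values in a lattice \(c+h\mathbb Z\), \(h>0\), then \(|\mathbf{E}\exp \{itg(X_1)\}|=1\) for \(t=2\pi /h\), so (C) fails; if, on the other hand, the distribution of \(g(X_1)\) is absolutely continuous, then \(|\mathbf{E}\exp \{itg(X_1)\}|\rightarrow 0\) as \(|t|\rightarrow \infty \) by the Riemann–Lebesgue lemma, and (C) holds.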

Since the class of symmetric statistics includes linear statistics, we require a Cramér type condition, but on the linear part of the statistic only; see (7). Interestingly, this condition together with appropriate moment conditions on the various parts of the decomposition (1) already guarantees an approximation error \(\Delta =O(N^{-1})\) for general symmetric statistics (see [4]). But (7) is not sufficient for the desired error bound \(o(N^{-1})\), even for U-statistics of degree two, see Example 1 below. The reason why (7) alone is not sufficient for the approximation accuracy \(\Delta =o(N^{-1})\) is the potential occurrence of a very special relation between the linear and quadratic parts L and Q that fosters an approximate lattice structure, as shown in Example 1. Namely, the quadratic part of the U-statistic in Example 1 has a factorizable kernel \(\psi \) of the form \(\psi _h(X_1,X_2)=h(X_1)g(X_2)+g(X_1)h(X_2)\), with measurable h. The following structural condition (4) (introduced in the unpublished manuscript by Götze and van Zwet [25]) rules out such counterexamples by separating (in \(L^2\) distance) the random variable \(\psi (X_1,X_2)\) from any random variable of the form \(\psi _h(X_1,X_2)\). Note that the \(L^2\) distance \(\mathbf{E}(\psi (X_1,X_2)-\psi _h(X_1,X_2))^2\) is minimized by \(h(x)=b(x)\), where

$$\begin{aligned} b(x)=\sigma ^{-2}\,\mathbf{E}\bigl (\psi (X_1,X_2)g(X_2)\,\big |\,X_1=x\bigr )-(\kappa /2\sigma ^4)\,g(x). \end{aligned}$$

Here \(\kappa =\mathbf{E}\psi (X_1,X_2)g(X_1)g(X_2)\). Therefore, we will assume that, for some absolute constant \(\delta _{*}>0\), we have

$$\begin{aligned} \mathbf{E}\Bigl (\psi (X_1,X_2)-\bigl (b(X_1)g(X_2)+b(X_2)g(X_1)\bigr )\Bigr )^2\ge \delta _{*}^2\sigma _\mathbb T^2. \end{aligned}$$
(4)
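To make condition (4) concrete, here is a small Python sketch (ours) that computes b and the left-hand side of (4) exactly when \(P_X\) has finite support; the arrays psi, g and p encode \(\psi (x_i,x_j)\), \(g(x_i)\) and \(\mathbf{P}\{X=x_i\}\):

```python
import numpy as np

def lhs_condition_4(psi, g, p):
    """Exact left-hand side of (4) for a finitely supported P_X:
    psi is a symmetric (k, k) array, g and p are length-k arrays."""
    gp = g * p
    sigma2 = np.dot(g * g, p)          # sigma^2 = E g^2(X_1)
    kappa = gp @ psi @ gp              # kappa = E psi(X_1,X_2) g(X_1) g(X_2)
    cond = psi @ gp                    # E( psi(X_1,X_2) g(X_2) | X_1 = x_i )
    b = cond / sigma2 - kappa / (2.0 * sigma2**2) * g
    resid = psi - np.outer(b, g) - np.outer(g, b)
    return np.sum(resid**2 * np.outer(p, p))
```

For a factorizable kernel \(\psi =\psi _h\) the returned value is zero, so (4) indeed quantifies the \(L^2\) distance from the class of kernels behind Example 1.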

The main contribution of the present paper is a proof that condition (4) indeed ensures the desired bound \(\Delta =o(N^{-1})\). The proof is based on a careful investigation, for \(|t|> N^{1-\nu }\), of the absolute values of conditional Fourier transforms of symmetric statistics, that is, of the landscape of their maxima under Cramér’s condition (7) and the structural condition (4). Here new methods are used for studying this landscape in the frequency t as well as in the random function representing the conditioning. For the latter variable a combinatorial argument of Kleitman on symmetric partitions for the Littlewood–Offord problem in Banach spaces (see [15]) is used.

A short outline of the approach is given at the beginning of Sect. 2, where we focus on the use of condition (4).

1.2 Results

Let us now state our main result, Theorem 1.

Moment conditions We will assume that, for some absolute constants \(0<A_*<1\) and \(M_*>0\) and numbers \(r>4\) and \(s>2\), we have

$$\begin{aligned}&\mathbf{E}g^2(X_1)>A_*\sigma _\mathbb T^2, \quad \mathbf{E}|g(X_1)|^r<M_*\sigma _\mathbb T^r, \nonumber \\&\mathbf{E}|\psi (X_1,X_2)|^r<M_*\sigma _\mathbb T^r, \quad \mathbf{E}|\chi (X_1,X_2,X_3)|^s<M_*\sigma _\mathbb T^s. \end{aligned}$$
(5)

These moment conditions refer to the linear, quadratic and cubic part of \(\mathbb T\). In order to control the remainder R of the approximation (1) we use moments of differences introduced in Bentkus, Götze and van Zwet [4], see also van Zwet [37]. Define, for \(1\le i\le N\),

$$\begin{aligned} D_i\mathbb T=\mathbb T-\mathbf{E}_i\mathbb T, \qquad \mathbf{E}_i\mathbb T:=\mathbf{E}(\mathbb T|X_1,\dots , X_{i-1},X_{i+1},\dots , X_N). \end{aligned}$$

Successive applications of the difference operations \(D_i\), \(D_j\), \(\dots \) (the indices \(i, j, \dots \) are all distinct) produce higher order differences, such as

$$\begin{aligned} D_iD_j\mathbb T:=D_i(D_j\mathbb T)=\mathbb T-\mathbf{E}_i\mathbb T-\mathbf{E}_j\mathbb T+\mathbf{E}_i\mathbf{E}_j\mathbb T. \end{aligned}$$

For \(m=1,2,3,4\) write \(\Delta _m^2=\mathbf{E}|N^{m-1/2}D_1D_2\cdots D_m \mathbb T|^2\).
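For orientation: if \(\mathbb T\) is the linear statistic \(N^{-1/2}\sum _{1\le j\le N}g(X_j)\), then \(D_1\mathbb T=N^{-1/2}g(X_1)\) (recall that \(\mathbf{E}g(X_1)=0\)), so that \(\Delta _1^2=\mathbf{E}g^2(X_1)\), while \(D_1D_2\mathbb T=0\) and hence \(\Delta _m=0\) for \(m\ge 2\); the higher order differences thus quantify the departure of \(\mathbb T\) from linearity.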

We will assume that for some absolute constant \(D_*>0\) and number \(\nu _1\in (0,1/2)\) we have

$$\begin{aligned} \Delta _4^2/\sigma _\mathbb T^2\le N^{1-2\nu _1}D_* \end{aligned}$$
(6)

For a number of important examples of asymptotically linear statistics the moments \(\Delta _m^2\) are evaluated or estimated in [4]. Typically we have \(\Delta _m^2/\sigma _\mathbb T^2=O(1)\) for some m. Therefore, assuming that (6) holds uniformly in N as \(N\rightarrow \infty \), we obtain from the inequality \(\mathbf{E}R^2\le N^{-3}\Delta _4^2\), see (167) in the “Appendix”, that \(R/\sigma _\mathbb T=O_P(N^{-1-\nu _1})\). Furthermore, assuming that (5), (6) hold uniformly in N as \(N\rightarrow \infty \), we obtain from (167), (166) in the “Appendix” that \(\sigma ^2/\sigma _\mathbb T^2=1-O(N^{-1})\).

Cramér type smoothness condition We introduce the function

$$\begin{aligned} \rho (a,b)=1-\sup \{|\mathbf{E}\exp \{itg(X_1)/\sigma \}|: \, a\le |t|\le b \} \end{aligned}$$

and assume that, for some \(\delta >0\) and \(\nu _2>0\), we have

$$\begin{aligned} \rho (\beta _3^{-1}, N^{\nu _2+1/2}) \ge \delta . \end{aligned}$$
(7)

Here \(\beta _3=\sigma ^{-3}\mathbf{E}|g(X_1)|^3\). Define \(\nu =600^{-1}\min \{\nu _1,\nu _2,s-2,r-4\}\).
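For orientation: if \(g(X_1)/\sigma \) is standard normal, then \(|\mathbf{E}\exp \{itg(X_1)/\sigma \}|=e^{-t^2/2}\), so that \(\rho (a,b)=1-e^{-a^2/2}\) for all \(b\ge a>0\), and (7) holds with any \(\delta \le 1-e^{-\beta _3^{-2}/2}\).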

Theorem 1

Assume that for some absolute constants \(A_*,M_*,D_*>0\) and numbers \(r>4, s>2\), \(\nu _1,\nu _2>0\) and \(\delta ,\delta _{*}>0\), the conditions (5), (6), (7), (4) hold. Then there exists a constant \(C_*>0\) depending only on \(A_*\), \(M_*\), \(D_*\), r, s, \(\nu _1,\nu _2,\delta , \delta _{*}\) such that

$$\begin{aligned} \Delta \le C_*N^{-1-\nu }\bigl (1+\delta _{*}^{-1}N^{-\nu }\bigr ). \end{aligned}$$

Remark 1

The value of \(\nu =600^{-1}\min \{\nu _1,\nu _2,s-2,r-4\}\) is far from optimal. Furthermore, the moment conditions (5) and (6) are not the weakest possible that would ensure the approximation of order \(o(N^{-1})\). Condition (5) can likely be reduced to the moment conditions that are necessary to define the Edgeworth expansion terms \(\kappa _3\) and \(\kappa _4\); similarly, (6) can be reduced to \(\Delta _4^2/\sigma _\mathbb T^2= o(N^{-1})\). No effort was made to obtain the result under optimal conditions, as this would further increase the complexity of a proof which is already rather involved.

Remark 2

Condition (4) can be relaxed. Assume that for some absolute constant \(G_*\) we have

$$\begin{aligned} \mathbf{E}\Bigl (\psi (X_1,X_2)-\bigl (b(X_1)g(X_2)+b(X_2)g(X_1)\bigr )\Bigr )^2\ge N^{-2\nu } G_*\sigma _\mathbb T^2. \end{aligned}$$
(8)

The bound of Theorem 1 holds if we replace (4) by this weaker condition. In this case we have \(\Delta \le C_*N^{-1-\nu }\), where the constant \(C_*\) depends on \(A_*,D_*, G_*\), \(M_*,r,s,\nu _1,\nu _2, \delta \).

In the particular case of U-statistics of degree three (the case where \(R\equiv 0\) in (1)) the proof of Theorem 1 was outlined in the unpublished manuscript by Götze and van Zwet [25]. We provide a complete and more readable version of the arguments sketched in that preprint and extend them to a general class of symmetric statistics. In the same paper [25], see also [4], it was shown that moment conditions (like (5), (6)) together with Cramér’s condition (like (7)) do not suffice for the bound \(\Delta =o(N^{-1})\). For convenience we state this result in Example 1 below.

Example 1

Let \(X_1,X_2,\dots \) be independent random variables uniformly distributed on the interval \((-1/2,1/2)\). Define \(T_N=(W_N+N^{-1/2}V_N)(1-N^{-1/2}V_N)\), where \(V_N=N^{-1/2}\sum _{j=1}^N\{N^{1/2}X_j\}\) and \(W_N=N^{-1}\sum _{j=1}^N[N^{1/2}X_j]\). Here [x] denotes the nearest integer to x and \(\{x\}=x-[x]\).

Assume that \(N=m^2\), where m is odd. We have, by the local limit theorem,

$$\begin{aligned} \mathbf{P}\{W_N=1\}\ge cN^{-1} \qquad {\text { and}} \qquad \mathbf{P}\{|V_N|<\delta \}>c\delta , \qquad 0<\delta <1, \end{aligned}$$

where \(c>0\) is an absolute constant. From these inequalities it follows, by the independence of \(W_N\) and \(V_N\), that \(\mathbf{P}\{1-\delta ^2 N^{-1}\le T_N\le 1\}\ge c^2\delta N^{-1}\).
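A small Monte Carlo sketch (ours; the parameter values are illustrative only) for estimating the two probabilities above with \(N=m^2\), m odd:

```python
import numpy as np

def example1_probabilities(m, delta=0.5, reps=100_000, seed=0):
    """Estimate P{W_N = 1} and P{|V_N| < delta} for N = m^2, m odd."""
    rng = np.random.default_rng(seed)
    N = m * m
    x = rng.uniform(-0.5, 0.5, size=(reps, N))
    nearest = np.rint(np.sqrt(N) * x)      # [N^{1/2} x_j], nearest integer
    frac = np.sqrt(N) * x - nearest        # {N^{1/2} x_j} = x - [x]
    W = nearest.sum(axis=1) / N
    V = frac.sum(axis=1) / np.sqrt(N)
    return (W == 1.0).mean(), (np.abs(V) < delta).mean()
```

Since \(\mathbf{P}\{W_N=1\}\) is of order \(N^{-1}\), a large number of replications is needed before the first estimate stabilizes.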

The example defines a sequence of U-statistics \(\mathbb T_N\) whose distribution functions \(\mathbb F_N\) have increments of size \(O(N^{-1})\) on particular intervals of length \(o(N^{-1})\). These fluctuations of magnitude \(O(N^{-1})\) appear as a result of a nearly lattice structure induced by the interplay between the (smooth) linear part and the quadratic part.

1.3 Earlier work

There is a rich literature devoted to normal approximation and Edgeworth expansions for various classes of asymptotically linear statistics (see e.g. Babu and Bai [2], Bai and Rao [3], Bentkus, Götze and van Zwet [4], Bhattacharya and Ghosh [8, 9], Bhattacharya and Rao [7], Bickel [10], Bickel, Götze and van Zwet [11], Callaert, Janssen and Veraverbeke [16], Chibisov [17], Hall [28], Helmers [29], Petrov [33], Pfanzagl [34], Serfling [35], etc.).

A wide class of statistics can be represented as functions of sample means of vector variables. Edgeworth expansions of such statistics can be obtained by applying the multivariate expansion to the corresponding functions, see Bhattacharya and Ghosh [8, 9]. In their work the crucial Cramér condition (C) is assumed on the joint distribution of all the components of a vector, which may be too restrictive in cases where some components have a negligible influence on the statistic. More often only one or a few of the components satisfy a conditional version of condition (C). Bai and Rao [3] and Babu and Bai [2] established Edgeworth expansions for functions of sample means under such a conditional Cramér condition. This approach exploits the smoothness of the distribution of a random vector as well as the smoothness of the function defining the statistic. In particular, it requires a class of statistics which are smooth functions of the observations or can be approximated by such functions via Taylor expansion, see also Chibisov [17]. The corresponding condition (6) of the present paper is expressed in terms of moments of the iterated differences \(\Delta _m\) and does not assume a Taylor expansion.

Let us note that, in general, the smoothness of the distribution function of \(\mathbb T\) may have little to do with the smoothness of the function \(\mathbb T(X_1,\dots , X_N)\) of the observations \(X_1,\dots , X_N\); take, for example, Gini’s mean difference \(\sum _{i<j}|X_i-X_j|\) with absolutely continuous \(X_i\). Another interesting example is Studentization, which can dramatically enhance the smoothness of the distribution function of a sum of lattice random variables, see [26]. Our Theorem 1 shows, in particular, that the structural condition (4) together with (7) guarantees the smoothness of the distribution of \(\mathbb T\) necessary for the bound \(\Delta =o(N^{-1})\).

In order to compare Theorem 1 with earlier results of similar nature let us consider the case of U-statistics of degree two

$$\begin{aligned} \mathbb U=\frac{\sqrt{N}}{2} \left( {\begin{array}{c}N\\ 2\end{array}}\right) ^{-1}\sum _{1\le i<j\le N}h(X_i,X_j), \end{aligned}$$
(9)

where \(h(\cdot ,\cdot )\) denotes a (fixed) symmetric kernel. Assume for simplicity of notation and without loss of generality that \(\mathbf{E}h(X_1,X_2)=0\). Write \(h_1(x)=\mathbf{E}(h(X_1,X_2)|X_1=x)\) and assume that \(\sigma _h^2>0\), where \(\sigma _h^2=\mathbf{E}h_1^2(X_1)\). In this case Hoeffding’s decomposition (1) reduces to \(\mathbb U=L+Q\), where, by the assumption \(\sigma _h^2>0\), we have \(\mathbf{Var}L>0\). Since the cubic part vanishes we remove the moment \(\mathbf{E}g(X_1)g(X_2)g(X_3)\chi (X_1,X_2,X_3)\) from the expression for \(\kappa _4\). In this way we obtain the two term Edgeworth expansion (2) for the distribution function \(\mathbb F_U(x)=\mathbf{P}\{\mathbb U\le \sigma _\mathbb Ux\}\) with \(\sigma ^2_\mathbb U:=\mathbf{Var}\mathbb U\).
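For later comparison we record this identification explicitly: a direct computation with \(h_2(x,y):=h(x,y)-h_1(x)-h_1(y)\) gives

$$\begin{aligned} \mathbb U=\frac{1}{N^{1/2}}\sum _{1\le i\le N}h_1(X_i) +\frac{1}{N^{3/2}}\,\frac{N}{N-1}\sum _{1\le i<j\le N}h_2(X_i,X_j), \end{aligned}$$

so that \(g=h_1\) and \(\psi =\frac{N}{N-1}\,h_2\) in the notation of (1).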

We call h reducible if for some measurable functions \(u,v:\mathcal X\rightarrow \mathbb R\) we have \(h(x,y)=v(x)u(y)+v(y)u(x)\) for \(P_X\times P_X\) almost all \((x,y)\in \mathcal X\times \mathcal X\). A simple calculation shows that for a sequence of U-statistics (9) with a fixed non-reducible kernel, condition (4) is satisfied, for some \(\delta _{*}>0\), uniformly in N. A straightforward consequence of Theorem 1 is the following corollary. Write \({\tilde{\nu }}=600^{-1}\min \{\nu _2,r-4,1\}\).

Corollary 1

Assume that \(\mathbf{E}h(X_1,X_2)=0\) and for some \(r>4\)

$$\begin{aligned} \mathbf{E}|h(X_1,X_2)|^r<\infty . \end{aligned}$$
(10)

Assume that \(\sigma _h^2>0\) and the kernel h is non-reducible and that for some \(\delta >0\)

$$\begin{aligned} \sup \{|\mathbf{E}e^{it\sigma _h^{-1}h_1(X_1)}|:\, |t|\ge \beta _3^{-1}\} \le 1-\delta . \end{aligned}$$
(11)

Then there exists a constant \(C_*>0\) such that

$$\begin{aligned} \sup _{x\in \mathbb R}|\mathbb F_U(x)-G(x)| \le C_*N^{-1-{\tilde{\nu }}}. \end{aligned}$$

For U-statistics with a fixed kernel the validity of the Edgeworth expansion (2) up to the order \(o(N^{-1})\) was established by Callaert, Janssen and Veraverbeke [16] and Bickel, Götze and van Zwet [11]. In addition to the moment conditions (like (10)) and Cramér’s condition (like (11)) Callaert, Janssen and Veraverbeke [16] imposed the following rather implicit condition. They assumed that for some \(0<c<1\) and \(0<\alpha <1/8\) the event

$$\begin{aligned} \Bigl |\mathbf{E}\bigl (\exp \{it\sigma _\mathbb U^{-1}\sum _{j=m+1}^Nh(X_1,X_j)\}\,\big |\, X_{m+1},\dots , X_N\bigr )\Bigr |\le c \end{aligned}$$
(12)

has probability \(1-o(1/N\log N)\) uniformly for all \(t\in [N^{3/4}/\log N,\, N\log N]\). Here \(m\approx N^\alpha \), for a small positive \(\alpha \). Bickel, Götze and van Zwet [11] more explicitly required that the linear operator \(f(\cdot )\rightarrow \mathbf{E}\psi (X,\cdot )f(X)\) defined by \(\psi \) have a sufficiently large number of non-zero eigenvalues (the number depending on the existing moments, but always larger than 4). The eigenvalue condition is accordingly stronger than the non-reducibility condition of Corollary 1, since for a reducible kernel h the linear operator \(f(\cdot )\rightarrow \mathbf{E}\psi (X,\cdot )f(X)\) has at most two non-zero eigenvalues. On the other hand, it is difficult to compare the structural non-reducibility condition with condition (12), whose technical nature is discussed in the outline of the proof at the beginning of Sect. 2.

The remaining parts of the paper (Sects. 2–5) contain the proof of Theorem 1. Auxiliary results are placed in the “Appendix”.

2 Proof of Theorem 1

2.1 Proof highlights

After the seminal paper of Esseen [22], a standard proof of the validity of the normal approximation and its refinements proceeds in two steps. In the first step, with the aid of a smoothing inequality, the Kolmogorov distance between the distribution function and its approximation G is bounded above by a (weighted) average difference of the respective Fourier transforms, see (25). In the second step one performs a careful analysis of the Fourier transforms: for frequencies \(t=O(\sqrt{N})\) one shows the closeness of the respective Fourier transforms, while for the remaining range \(\Omega (\sqrt{N})\le |t|\le O(T)\) one establishes their exponential decay. The cut-off T is determined by the desired approximation accuracy \(O(T^{-1})\) (in our case \(T=N^{1+\nu }\)). The approach, initially developed for sums of independent random variables [22, 33], was later applied to non-degenerate U-statistics [11, 16] and general asymptotically linear symmetric statistics [4, 37].

One particular problem in implementing the proof strategy outlined above is establishing the exponential decay of the (absolute value of the) Fourier transform in the range of large frequencies. For a linear statistic this problem is elegantly resolved by introducing Cramér’s condition. Indeed, in view of the multiplicative property of the Fourier transform, Cramér’s condition implies the desired exponential decay. Consequently, Cramér’s condition together with moment conditions ensures the validity of an Edgeworth expansion of arbitrary order. But the multiplicative property can no longer be used (at least directly) when we turn to general symmetric statistics, because the various parts (linear, quadratic, etc.) are mutually dependent. This fact leads to considerable difficulties in estimating the respective Fourier transforms in the range of large frequencies \(t\gg N\) and requires new conditions to control the above mentioned dependencies. The present paper suggests a novel approach to estimating the Fourier transform of a symmetric statistic for large frequencies.

As our general setup of symmetric statistics covers linear ones, we keep assuming Cramér’s condition, but on the linear part of the statistic only, see (7). In view of Example 1, condition (7) alone is not enough. We introduce the additional structural condition (4), which together with (7) guarantees the desired \(O(N^{-1-\nu })\) upper bound on the weighted average of the Fourier transform over the frequency range \(N^{1-\nu }\le |t|\le N^{1+\nu }\), see (26) below. Condition (4) is optimal and natural in the sense that it matches the counterexample. It first appeared in the unpublished manuscript [25] by Götze and van Zwet in the case of U-statistics.

Let us compare (4) with alternative conditions introduced in the earlier papers by Callaert, Janssen and Veraverbeke [16] and Bickel, Götze and van Zwet [11] in the case of U-statistics of degree two. The conditional Cramér condition (12) of [16] forces the multiplicative property of the Fourier transform in a formal way, thus circumventing the problem of relating the structure of the kernel (of the U-statistic) to the smoothness of the distribution. Therefore (4) and (12) are not comparable. This is not the case with the eigenvalue condition of [11], which is stronger than (4). In their proof Bickel, Götze and van Zwet [11] used, for the frequencies \(t\in [N^{(r-1)/r}/\log N,\, N\log N]\), a symmetrization technique of [23] which essentially estimates the absolute value of the Fourier transform of U by that of a bilinear version of Q, thus neglecting L and its smoothness properties implied by Cramér’s condition (7). The approach of the present paper instead makes use of the smoothness of L and Q simultaneously.

The main contribution of this paper is to show that condition (4), suggested by the counterexample (Example 1), is sufficient to prove the bound of Theorem 1. This condition is used in constructing estimates of the weighted averages of the Fourier transform (26), on which we briefly comment below. In fact, after an initial “linearization” step we turn to a slightly modified statistic \({\tilde{\mathbb T}}(X_1,\dots , X_N)\), where the nonlinear terms in \(X_1,\dots , X_m\) are removed (see (19)), and then switch to \(T' ={\tilde{\mathbb T}}(X_1,\dots , X_m, Y_{m+1},\dots , Y_N)\), where \(Y_{m+1},\dots , Y_{N}\) are truncated versions of \(X_{m+1},\dots , X_{N}\), see (42). Let \(\mathbf{E}_{\mathbb Y}\) denote the conditional expectation given \(\mathbb Y=(Y_{m+1},\dots , Y_N)\). The conditional Fourier transform \(\mathbf{E}_{\mathbb Y}\exp \{itT'\}=\mathbf{E}\bigl (\exp \{itT'\}\bigl |Y_{m+1},\dots , Y_N\bigr )\) contains a multiplicative component \(\alpha _t^m\), where

$$\begin{aligned} \alpha _t = \mathbf{E}_{\mathbb Y} \exp \left\{ itN^{-1/2}g(X_1)+itN^{-3/2}\sum _{l=m+1}^N\psi (X_1,Y_l)\right\} . \end{aligned}$$
(13)

For t satisfying \(|\alpha _t|^2\le 1-m^{-1}\ln ^2N\) the bound \(|\mathbf{E}_{\mathbb Y}\exp \{itT'\}|\le \exp \{-0.5\ln ^2N\}\) follows immediately, since \(|\mathbf{E}_{\mathbb Y}\exp \{itT'\}|\le |\alpha _t|^m\le (1-m^{-1}\ln ^2N)^{m/2}\le \exp \{-0.5\ln ^2N\}\). We then look carefully at the set of remaining t. We show that this set is a union of non-intersecting intervals (depending on \(\mathbb Y\)), each of size \(O(\sqrt{N/m}\ln N)\). While estimating the weighted averages of the Fourier transform over these intervals we split the frequency domain \(N^{1-\nu }\le |t|\le N^{1+\nu }\) into a deterministic sequence \(J_p\), \(p=1,2,\dots \), of consecutive intervals of size \(\Theta (N^{1-\nu })\), so that each ‘singular’ set \(\{t\in J_p:\, |\alpha _t|^2>1-m^{-1}\ln ^2N\}\) is either empty or an interval \([a_N, a_N+b_N^{-1}]\) of size \(b_N^{-1}=O\bigl (\sqrt{N/m}\ln N\bigr )\) (see (51) and (56), based on Lemma 12). At the very last step, using Kleitman’s concentration inequalities for sums of random variables with values in a function space, we bound the probability of the event that each particular singular set is non-empty, that is, the event that \(\sup _{t\in J_p}|\alpha _t|^2> 1-m^{-1}\ln ^2N\), thus obtaining an extra factor \(N^{-k\nu }\), \(k\ge 5\), to arrive at the error bound \(o(N^{-1})\).

More precisely, the projection of \(\sum _{l=m+1}^N\psi (\cdot ,Y_l)\) onto the orthogonal complement of g, which is non-zero by condition (4), is used in the crucial Lemma 2. Via conditioning and randomization we represent it as a sum \(S_{\alpha }:=\sum _{j=1}^n \alpha _j f_j\) with independent variables \(\alpha _j\in \{0,1\}\) and vectors \(f_j\) with \(||f_j||> \epsilon \), and estimate the combinatorial probability of those \(\alpha =(\alpha _1,\dots , \alpha _n)\) for which a value larger than \(1- m^{-1}\ln ^2 N\) of the conditional Fourier transform, say \(\tilde{\phi }_t(\alpha )\), of \(f+ S_{\alpha }\) occurs at some ‘singular’ frequency \( t \in J_p\). This is achieved by Kleitman’s partition of the \(2^n\) vectors \(\alpha \) into at most \(\left( {\begin{array}{c}n\\ n/2\end{array}}\right) \) disjoint sets, say \(C_d\), \(1\le d\le \left( {\begin{array}{c}n\\ n/2\end{array}}\right) \), such that for different \(\alpha , \alpha '\in C_d\) the sums \(S_{\alpha }\) and \(S_{\alpha '}\) are separated by a distance of at least \(\epsilon \). By Lemma 2 this separation implies that the event that t is singular somewhere in the interval \(J_p\) can be witnessed by at most one \(\alpha \in C_d\) for each \(C_d\). Hence the singular event among the \(\alpha \)’s has combinatorial probability at most \(\left( {\begin{array}{c}n\\ n/2\end{array}}\right) 2^{-n} =O(n^{-1/2})\).
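Here, by Stirling’s formula, \(\left( {\begin{array}{c}n\\ n/2\end{array}}\right) 2^{-n}\sim \sqrt{2/(\pi n)}\) as \(n\rightarrow \infty \), which is the source of the factor \(O(n^{-1/2})\).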

The crucial arguments in Lemma 2 rest upon the observation on harmonics (see (118)) that two singular values \(\tilde{\phi }_t(\alpha ), \tilde{\phi }_s(\alpha ' ) \ge 1-m^{-1}\ln ^2N \) imply a similarly high value of \(\mathbf{E}\exp \{i( t (f+ S_{\alpha }) - s(f+ S_{\alpha '}))\}\). If t and s are close, say \(|t-s| \le \delta _2\), such a high value is excluded by the separation of \(S_{\alpha }\) and \(S_{\alpha '}\), which dominate \((t-s)f\) (see step 4.2.1 in Lemma 2), whereas for \( \delta _2<|t-s| < N^{\nu -1/2}\) Cramér’s condition for \((t-s)f\) applies, which together with size bounds on \(t S_{\alpha }\) and \(sS_{\alpha '}\) again prevents a high value (see step 4.2.2 in Lemma 2).

Note that this method of width bounds and separation of singular sets of Fourier transforms has been successfully employed for optimal approximation results for U-statistics with non-Gaussian limits by Bentkus, Götze and Zaitsev, see [5] and [27], and is strongly related to results on the distribution of quadratic forms on lattices by Bentkus and Götze, see [6] and [24]; the latter provides a solution of the Davenport–Lewis conjecture for positive definite forms.

Finally, we mention that in the case of U-statistics of degree three (\(\mathbb T=\mathbf{E}\mathbb T+L+Q+K\)) the proof is outlined in the unpublished manuscript of Götze and van Zwet [25]. We extend these arguments to general symmetric statistics using stochastic expansions by means of Hoeffding’s decomposition and bounds for the various parts of the decomposition.

2.2 Outline of the proof

Firstly, using the linear structure induced by Hoeffding’s decomposition, we replace \(\mathbb T/\sigma _\mathbb T\) by the statistic \({\tilde{\mathbb T}}\), which is conditionally linear given \(X_{m+1},\dots , X_N\). Secondly, invoking a smoothing inequality, we pass from distribution functions to Fourier transforms. In the remaining steps we bound the difference \(\delta (t)=\mathbf{E}e^{it {\tilde{\mathbb T}}}- {\hat{G}}(t)\), for \(|t|\le N^{1+\nu }\). For “small frequencies” \(|t|\le C N^{1/2}\) we expand the characteristic function \(\mathbf{E}e^{it {\tilde{\mathbb T}}}\) in order to show that \(\delta (t)=o(N^{-1})\). Here we combine various techniques developed in the earlier papers [4, 11, 16]. For the remaining range of frequencies, that is, \(C N^{1/2}\le |t|\le N^{1+\nu }\), we bound the summands \(\mathbf{E}e^{it {\tilde{\mathbb T}}}\) and \({\hat{G}}(t)\) separately. The cases of “large frequencies” \(N^{1-\nu }\le |t|\le N^{1+\nu }\) and “medium frequencies” \(C\sqrt{N}\le |t|\le N^{1-\nu }\) are treated in different ways. For medium frequencies the Cramér type condition (7) ensures an exponential decay of \(|\mathbf{E}e^{it {\tilde{\mathbb T}}}|\). For large frequencies we combine conditions (7) and (4).

2.3 Hoeffding’s decomposition

Before starting the proof we introduce some notation. By \(c_*\) we shall denote a positive constant which may depend only on \(A_*,D_*,M_*, r, s, \nu _1,\nu _2, \delta \), but not on N. The value of \(c_*\) may be different in different places.

It is convenient to write the decomposition in the form

$$\begin{aligned} \mathbb T=\mathbf{E}\mathbb T+\sum _{1\le k\le N}\mathbb U_k, \qquad \mathbb U_k=\sum _{1\le i_1<\cdots <i_k\le N}g_k(X_{i_1},\dots , X_{i_k}), \end{aligned}$$
(14)

where, for every k, the symmetric kernel \(g_k\) is centered, i.e., \(\mathbf{E}g_k(X_1,\dots , X_k)=0\), and satisfies, see, e.g., [4],

$$\begin{aligned} \mathbf{E}\bigl (g_k(X_1,\dots , X_k)\bigr |X_2,\dots , X_k)=0 \qquad {\text {almost surely}}. \end{aligned}$$
(15)

Here we write \(g_1:= N^{-1/2}g\), \(g_2:= N^{-3/2}\psi \) and \(g_3:= N^{-5/2}\chi \). Furthermore, for an integer \(k>0\) we write \(\Omega _k:=\{1,\dots , k\}\). Given a subset \(A=\{i_1,\dots , i_k\}\subset \Omega _N\) we write, for short, \(T_A:=g_k(X_{i_1},\dots , X_{i_k})\). Put \(T_{\emptyset }:=\mathbf{E}\mathbb T\). Now the decomposition (14) can be written as follows

$$\begin{aligned} {\mathbb T} = \mathbf{E}\mathbb T+\sum _{1\le k\le N}{\mathbb U}_k = \sum _{A\subset \Omega _N}T_A, \qquad {\mathbb U}_k=\sum _{|A|=k,\, A\subset \Omega _N}T_A. \end{aligned}$$

2.4 Proof of Theorem 1

Throughout the proof we assume without loss of generality that

$$\begin{aligned} 4<r\le 5, \qquad 2<s\le 3 \qquad {\text {and}} \qquad \mathbf{E}\mathbb T=0, \qquad \sigma _\mathbb T^2=1. \end{aligned}$$
(16)

Denote, for \(t>0\),

$$\begin{aligned} \beta _t=\sigma ^{-t}\mathbf{E}|g(X_1)|^t, \qquad \gamma _t=\mathbf{E}|\psi (X_1,X_2)|^t, \qquad \zeta _t=\mathbf{E}|\chi (X_1,X_2,X_3)|^t. \end{aligned}$$

Linearization. Choose a number \(\nu >0\) and an integer m such that

$$\begin{aligned} \nu =600^{-1}\min \{\nu _1,\, \nu _2,\, s-2, \, r-4\}, \qquad m\approx N^{100\nu }. \end{aligned}$$
(17)

Split

$$\begin{aligned} {\mathbb T}={\mathbb T}_{[m]}+{\mathbb W}, \qquad {\mathbb T}_{[m]}=\sum _{A:\, A\cap \Omega _m\ne \emptyset }T_A, \qquad {\mathbb W}=\sum _{A:\, A\cap \Omega _m=\emptyset }T_A. \end{aligned}$$
(18)

Furthermore, write

$$\begin{aligned} {\mathbb T}_{[m]}&= {\mathbb U}_1^*+{\mathbb U}_2^*+\Lambda , \qquad \Lambda =\Lambda _1+\Lambda _2+\Lambda _3+\Lambda _4+\Lambda _5, \\ {\mathbb U}_1^*&= \sum _{i=1}^mT_{\{i\}}, \qquad {\mathbb U}_2^* = \sum _{i=1}^m\sum _{j=m+1}^N T_{\{i,j\}}, \\ \Lambda _1&= \sum _{1\le i<j\le m} T_{\{i,j\}}, \qquad \Lambda _2 = \sum _{|A|\ge 3, |A\cap \Omega _m|=2} T_A, \\ \Lambda _3&= \sum _{A:\, |A\cap \Omega _m|\ge 3} T_A, \qquad \Lambda _4=\sum _{|A|=3,\, |A\cap \Omega _m|=1}T_A, \\ \Lambda _5&= \sum _{i=1}^m\eta _i, \qquad \eta _i=\sum _{|A|\ge 4,\, A\cap \Omega _m=\{i\}}T_A. \end{aligned}$$

Before applying a smoothing inequality we replace \(\mathbb F(x)\) by

$$\begin{aligned} {\tilde{\mathbb F}}(x):=\mathbf{P}\{{\tilde{\mathbb T}}\le x\}, \qquad {\text { where}} \qquad {\tilde{\mathbb T}} = {\mathbb U}_1^*+{\mathbb U}_2^*+{\mathbb W}= {\mathbb T}-\Lambda . \end{aligned}$$
(19)

In order to show that \(\Lambda \) can be neglected we apply a simple Slutzky type argument. Given \(\varepsilon >0\), we have

$$\begin{aligned} \Delta \le \sup _{x\in \mathbb R}|{\tilde{\mathbb F}}(x)-G(x)| + \varepsilon \,\sup _{x\in \mathbb R}|G'(x)|+\mathbf{P}\{|\Lambda |>\varepsilon \}. \end{aligned}$$
(20)

From Lemma 5 we obtain via Chebyshev’s inequality, for \(\varepsilon =N^{-1-\nu }\),

$$\begin{aligned} \mathbf{P}\{|\Lambda |>\varepsilon \}&\le \sum _{i=1}^5\mathbf{P}\left\{ |\Lambda _i|>\frac{\varepsilon }{5}\right\} \\&\le \left( \frac{5}{\varepsilon }\right) ^3\mathbf{E}|\Lambda _1|^3 + \left( \frac{5}{\varepsilon }\right) ^2(\mathbf{E}\Lambda _2^2+\mathbf{E}\Lambda _3^2+\mathbf{E}\Lambda _5^2) + \left( \frac{5}{\varepsilon }\right) ^s\mathbf{E}|\Lambda _4|^s \\&\le c_*N^{-1-\nu }. \end{aligned}$$

In the last step we used conditions (5), (6) and the inequality (168). Furthermore, using (5) and (6) one can show that

$$\begin{aligned} \sup _{x\in \mathbb R}|G'(x)|\le c_*. \end{aligned}$$
(21)

Therefore, (20) implies

$$\begin{aligned} \Delta \le {\tilde{\Delta }}+c_*N^{-1-\nu }, \qquad {\text {where}} \qquad {\tilde{\Delta }}:=\sup _{x\in \mathbb R}|{\tilde{\mathbb F}}(x)-G(x)|. \end{aligned}$$

It remains to show that \({\tilde{\Delta }}\le c_*N^{-1-\nu }\).

A smoothing inequality. Given \(a>0\) and even integer \(k\ge 2\) consider the probability density function, see (10.7) in Bhattacharya and Rao [7],

$$\begin{aligned} x\rightarrow g_{a,k}(x)=a \,c(k)(ax)^{-k}\sin ^k(ax), \end{aligned}$$
(22)

where c(k) is the normalizing constant. Its characteristic function

$$\begin{aligned} {\hat{g}}_{a,k}(t) = \int _{-\infty }^{+\infty } e^{itx}g_{a,k}(x)dx = 2\pi \, a \, c(k)u^{*k}_{[-a,a]}(t) \end{aligned}$$

vanishes outside the interval \(|t|\le ka\). Here \(u^{*k}_{[-a,a]}(t)\) denotes the probability density function of the sum of k independent random variables each uniformly distributed in \([-a,a]\). It is easy to show that the function \(t\rightarrow {\hat{g}}_{a,k}(t)\) is unimodal and symmetric around \(t=0\).
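For instance, for \(k=2\) one can check that \(c(2)=1/\pi \) and \({\hat{g}}_{a,2}(t)=(1-|t|/(2a))_+\) (the Fejér kernel), which indeed vanishes for \(|t|\ge 2a\).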

Let \(\mu \) be the probability distribution with the density \(g_{a,2}\), where a is chosen to satisfy \(\mu ([-1,1])=3/4\). Given \(T>1\) define \(\mu _T({\mathcal{A}})=\mu (T{\mathcal{A}})\), for a Borel set \({\mathcal{A}}\subset \mathbb R\). Let \({\hat{\mu }}_T\) denote the characteristic function corresponding to \(\mu _T\).

We apply Lemma 12.1 of [7]. It follows from (21) and the identity \(\mu _T([-T^{-1}, T^{-1}])= 3/4\) that

$$\begin{aligned} {\tilde{\Delta }} \le 2\sup _{x\in \mathbb R}\bigl |({\tilde{{\mathcal{F}}}}-{\mathcal{G}})*\mu _T(-\infty ,x]\bigr | + c_*T^{-1}. \end{aligned}$$
(23)

Here \({\tilde{\mathcal{F}}}\) and \(\mathcal{G}\) denote the probability distribution of \({\tilde{\mathbb T}}\) and the signed measure with density function \(G'(x)\), respectively. Furthermore, \(*\) denotes the convolution operation. Proceeding as in the proof of Lemma 12.2 ibidem we obtain

$$\begin{aligned} ({\tilde{\mathcal{F}}}-\mathcal{G})*\mu _T(-\infty ,x] = \frac{1}{2\pi }\int _{-\infty }^{+\infty } e^{-itx} \Bigl ( \mathbf{E}e^{it{\tilde{\mathbb T}}}-{\hat{G}}(t)\Bigr ) \frac{{\hat{\mu }}_T(t)}{-it}dt, \end{aligned}$$
(24)

where \({\hat{G}}\) denotes the Fourier transform of G(x). Note that \({\hat{\mu }}_T(t)\) vanishes outside the interval \(|t|\le 2aT\). Finally, we obtain from (23) and (24) that

$$\begin{aligned} {\tilde{\Delta }} \le \frac{1}{\pi } \sup _{x\in \mathbb R}|I(x)| +c_*\frac{2a}{T}, \qquad I(x):= \int _{-T}^{T} e^{-itx} \bigl (\mathbf{E}e^{ it {\tilde{\mathbb T}} } - {\hat{G}} (t)\bigr ) \frac{{\hat{\mu }}_{T'}(t)}{-it}dt, \end{aligned}$$
(25)

where \(T'=T/2a\). Here we use the fact that \({\hat{\mu }}_{T'}(t)=0\) for \(|t|>T\). Denote \(K_N(t)= {\hat{\mu }}_{T'}(t)\) and observe that \(|K_N(t)|\le 1\) (since \(\mu _{T'}\) is a probability measure). Let

$$\begin{aligned} T=N^{1+\nu }. \end{aligned}$$

We have

$$\begin{aligned}&|I(x)|\le I_1+I_2+|I_3|+|I_4|, \\&I_1 = \int _{|t|\le t_1} \bigl |\mathbf{E}e^{ it {\tilde{\mathbb T}} } - {\hat{G}}(t)\bigr | \frac{dt}{|t|}, \quad I_2=\int _{t_1<|t|<T}|{\hat{G}}(t)|\frac{dt}{|t|}, \\&I_3=\int _{t_1<|t|<t_2}e^{-itx}\mathbf{E}e^{it{\tilde{\mathbb T}}}\frac{K_N(t)}{-it}dt, \quad I_4=\int _{t_2<|t|<T}e^{-itx}\mathbf{E}e^{it{\tilde{\mathbb T}}}\frac{K_N(t)}{-it}dt. \end{aligned}$$

Here we denote \(t_1=N^{1/2}10^{-3}/\beta _3\) and \(t_2=N^{1-\nu }\). In view of (25) the bound \({\tilde{\Delta }}\le c_*N^{-1-\nu }\) follows from the bounds

$$\begin{aligned} |I_k|\le c_*N^{-1-\nu }, \quad k=1,2,3, \quad {\text {and}} \quad |I_4|\le c_*N^{-1-\nu }(1+\delta _*^{-1}N^{-\nu }). \end{aligned}$$
(26)

The bound \(I_2\le c_*N^{-1-\nu }\) is a consequence of the exponential decay of \(|{\hat{G}}(t)|\) as \(|t|\rightarrow \infty \). In Sect. 3 we show (26) for \(k=3,4\). The proof of (26), for \(k=1\), is based on careful expansions and is given in Sect. 5.

3 Large frequencies

Here we prove inequalities (26) for \(I_3\) and \(I_4\). The proof of \(|I_3|\le c_*N^{-1-\nu }\) is relatively simple and is deferred to the end of the section.

Let us upper bound \(|I_4|\). We will show that

$$\begin{aligned} \Bigl | \int _{N^{1-\nu }<|t|<N^{1+\nu }}e^{-itx}\mathbf{E}e^{it{\tilde{\mathbb T}}}\frac{K_N(t)}{t}dt \Bigr | \le c_*\frac{1+\delta _{*}^{-1}}{N^{1+2\nu }}. \end{aligned}$$
(27)

In what follows we assume that N is sufficiently large, say \(N>C_*\), where \(C_*\) depends only on \(A_*,D_*,M_*,r,s,\nu _1,\nu _2,\delta \). We use this assumption in several places below, where the constant \(C_*\) can easily be specified. Note that for small N, such that \(N\le C_*\), the inequality (27) is trivial.

3.1 Notation

We first introduce some notation. Define the number

$$\begin{aligned} \alpha =3/(r+2) \end{aligned}$$
(28)

and note that for \(r\in (4,5]\) and \(\nu \) defined by (17) we have

$$\begin{aligned} 2/r<\alpha<1/2 \qquad {\text {and}} \qquad 80\nu <\min \{r\alpha -2,\, 1-2\alpha \}. \end{aligned}$$

Given N introduce the integers

$$\begin{aligned} n\approx N^{50\nu }, \qquad M=\lfloor (N-m)/n\rfloor . \end{aligned}$$
(29)

We have \(N-m=M\,n+s\), where the integer \(0\le s<n\). Observe that the inequalities \(\nu <600^{-1}\) and \(m<N^{1/2}\), see (17), imply \(M>n\). Therefore \(s<M\). Split the index set

$$\begin{aligned}&\{m+1,\dots , N\}=O_1\cup O_2\cup \dots \cup O_{n}, \nonumber \\&O_i=\{j:\, m+(i-1)M< j\le m+iM\}, \quad 1\le i\le n-1, \nonumber \\&O_n=\{j:\, m+(n-1)M<j\le N\}. \end{aligned}$$
(30)

Clearly, \(O_1,\dots , O_{n-1}\) are of equal size (=M) and \(|O_n|=M+s<2M\).

We shall assume that the random variable \(X:\Omega \rightarrow \mathcal X\) is defined on the probability space \((\Omega , P)\) and \(P_X\) is the probability distribution on \(\mathcal X\) induced by X. Given \(p\ge 1\) let \(L^p=L^p(\mathcal X,P_X)\) denote the space of real functions \(f:\mathcal X\rightarrow \mathbb R\) with \(\mathbf{E}|f(X)|^p<\infty \). Denote \(\Vert f\Vert _p=(\mathbf{E}|f(X)|^p)^{1/p}\). With a random variable f(X) we associate an element (vector) \(f=f(\cdot )\) of \(L^p\), \(p\le r\). Let \(p_g:L^2\rightarrow L^2\) denote the projection onto the subspace orthogonal to the vector \(g(\cdot )\) in \(L^2\). Given \(h\in L^2\), decompose

$$\begin{aligned} h=a_hg+h^*, \qquad {\text {where}} \qquad a_h=\left< h,g\right>\Vert g\Vert _2^{-2} \qquad {\text {and}} \qquad h^*=p_g(h). \end{aligned}$$
(31)

Here \(\left<h,g\right>=\int h(x)g(x)P_X(dx)\). For \(h\in L^r\) we have

$$\begin{aligned} \Vert h\Vert _r\ge \Vert h\Vert _2\ge \Vert h^*\Vert _2. \end{aligned}$$
(32)

Furthermore, for \(r^{-1}+v^{-1}=1\) (here \(r\ge 2\ge v>1\)) we have, by Hölder’s inequality,

$$\begin{aligned} |\left<h,g\right>|\le \Vert h\Vert _r\Vert g\Vert _v\le \Vert h\Vert _r\Vert g\Vert _2. \end{aligned}$$

In particular,

$$\begin{aligned} |a_h|\le \Vert h\Vert _r/\Vert g\Vert _2. \end{aligned}$$
(33)

Denote

$$\begin{aligned} c_g:=1+\Vert g\Vert _r/\Vert g\Vert _2, \qquad c_g^*:=1+M_*^{1/r}A_*^{-1/2} \end{aligned}$$

and observe that \(c_g\le c_g^*\). It follows from the decomposition (31) and (33) that

$$\begin{aligned} \Vert h^*\Vert _r \le \Vert h\Vert _r+|a_h|\,\Vert g\Vert _r\le \Vert h\Vert _r(1+\Vert g\Vert _r/\Vert g\Vert _2) = c_g\Vert h\Vert _r. \end{aligned}$$
(34)

Introduce the numbers

$$\begin{aligned} a_1 = \frac{1}{4} \min \left\{ \frac{1}{12c_g^*}, \, \frac{(c_rA_*/2^rM_*)^{1/(r-2)}}{1+4A_*^{-1/2}} \right\} , \qquad c_r=\frac{7}{24}\frac{1}{2^{r-1}}. \end{aligned}$$
(35)

It follows from (7) that there exist \(\delta ',\delta ''>0\) depending on \(A_*, M_*, \delta \) such that (uniformly in N) Cramér’s characteristic \(\rho \) satisfies the inequalities

$$\begin{aligned} \rho (a_1,2N^{-\nu + 1/2})\ge \delta ', \qquad \rho ((2\beta _3)^{-1},N^{\nu _2+1/2})\ge \delta ''. \end{aligned}$$
(36)

We shall prove the first inequality only. In view of (7) it suffices to prove that \(\rho (a_1,\beta _3^{-1})\ge \delta '\). Expanding the exponent in powers of \(itg(X_1)/\sigma \) we show the inequality

$$\begin{aligned} |\mathbf{E}e^{it\sigma ^{-1}g(X_1)}|\le 1-2^{-1}t^2(1-3^{-1}|t|\beta _3). \end{aligned}$$

For \(|t|\le \beta _3^{-1}\) this inequality implies

$$\begin{aligned} |\mathbf{E}e^{it\sigma ^{-1}g(X_1)}|\le 1-t^2/3. \end{aligned}$$

Therefore, \(\rho (a_1,\beta _3^{-1})\ge a_1^2/3\) and we can choose \(\delta '=\min \{\delta , a_1^2/3\}\) in (36).

Introduce the constant (depending only on \(A_*,M_*,\delta \))

$$\begin{aligned} \delta _1= \delta '/(10 c_g^*). \end{aligned}$$
(37)

Note that \(0<\delta _1<1/10\). Given \(f\in L^r\) and \(T_0\in {\mathbb R}\) such that

$$\begin{aligned} N^{-\nu +1/2}\le |T_0|\le N^{\nu +1/2}, \end{aligned}$$
(38)

denote

$$\begin{aligned}&I(T_0)=[T_0, \, T_0+\delta _1N^{-\nu +1/2}],\nonumber \\&u_t(f)=\int \exp \bigl \{it\bigl (g(x)+N^{-1/2}f(x)\bigr )\bigr \}P_X(dx), \nonumber \\&v(f) = \sup _{t\in I(T_0)}|u_t(f)|, \qquad \tau (f)=1-v^2(f). \end{aligned}$$
(39)

Given a random variable \(\eta \) with values in \(L^r\) and a number \(0<s< 1\), define

$$\begin{aligned} d_s(\eta ,I(T_0)) = \mathbb I_{\{v^2(\eta )>1-s^2\}} \mathbb I_{\{\Vert \eta \Vert _r\le N^{\nu }\}}, \qquad \delta _s(\eta ,I(T_0))=\mathbf{E}d_s(\eta , I(T_0)). \end{aligned}$$
(40)

Introduce the function

$$\begin{aligned} \psi ^{**}(x,y)=\psi (x,y)-b(x)g(y)-b(y)g(x) \end{aligned}$$
(41)

and the number

$$\begin{aligned} \delta _3^2=\mathbf{E}|\psi ^{**}(X_1,X_2)|^2. \end{aligned}$$

It follows from (4) and our assumption \(\sigma _\mathbb T^2=1\), see (16), that \(\delta _3^2\ge \delta _{*}^2\).

3.2 Proof of (27)

We write \(\mathbf{E}_{\mathbb Y}\exp \{itT'\}\) in the form \(\mathbf{E}_{\mathbb Y}\exp \{itT'\}=\alpha _t^m\exp \{itW'\}\), where \(\alpha _t\) is defined in (13) and where the random variable \(W'\) is defined in the same way as \({\mathbb W}\) in (18), but with \(T_A=g_k(X_{i_1},\dots , X_{i_k})\) replaced by \(g_k(Y_{i_1},\dots , Y_{i_k})\) for each \(A=\{i_1,\dots , i_k\}\). A standard way to upper bound a quantity like \(|\mathbf{E}_{\mathbb Y}e^{itT'}|\) is to show an exponential decay (in m) of the product \(|\alpha _t^m|\) using a Cramér type condition. This task can be accomplished for medium frequencies. Indeed, for \(|t|=o( N)\) the quadratic part \(itN^{-3/2}\sum _{j=m+1}^N\psi (X_1,Y_j)\) can be neglected and Cramér’s condition implies \(|\alpha _t|\le 1-v'\) for some \(v'>0\). This leads to an exponential bound \(|\alpha _t^m|\le e^{-mv'}\). For large frequencies \(|t|\approx N\), the contribution of the quadratic part becomes significant. To upper bound \(|\alpha _t^m|\) we use condition (4). We show that, for a large set of values \(t\in J_p\), see (51), Cramér’s condition (7) yields the desired decay of \(|\alpha _t^m|\), while the measure of the set of remaining t is small with high probability.

Step 1. Truncation. Recall that the random variable \(X:\Omega \rightarrow \mathcal X\) is defined on the probability space \((\Omega , P)\). Let \(X'\) be an independent copy so that \((X,X')\) is defined on \((\Omega \times \Omega ', P\times P)\), where \(\Omega '=\Omega \). It follows from \(\mathbf{E}|\psi (X,X')|^r<\infty \), by Fubini, that for P almost all \(\omega '\in \Omega '\) the function \(\psi (\cdot , X'(\omega '))=\{x\rightarrow \psi (x,X'(\omega ')), \, x\in \mathcal X\}\) is an element of \(L^r\). Furthermore, one can define an \(L^r\)-valued random variable \(Z':\Omega '\rightarrow L^r\) such that \(Z'(\omega ')=\psi (\cdot , X'(\omega '))\), for P almost all \(\omega '\). Consider the event \({\tilde{\Omega }}=\{\Vert Z'\Vert _r\le N^{\alpha }\}\subset \Omega '\) and denote \(q_N=P({\tilde{\Omega }})\). Here \(\Vert Z'\Vert _r=(\int |\psi (x,X'(\omega '))|^rP_X(dx))^{1/r}\) denotes the \(L^r\) norm of the random vector \(Z'\) and \(\alpha \) is defined in (28). Let \(Y:{\tilde{\Omega }}\rightarrow \mathcal X\) denote the random variable \(X'\) conditioned on the event \({\tilde{\Omega }}\). Thus Y is defined on the probability space \(({\tilde{\Omega }}, {\tilde{P}})\), where \({\tilde{P}}\) denotes the restriction of \(q_N^{-1}P\) to the set \({\tilde{\Omega }}\) and, for every \(\omega '\in {\tilde{\Omega }}\), we have \(Y(\omega ')=X'(\omega ')\). Let Z denote the \(L^r\)-valued random element \(\{x\rightarrow \psi (x,\, Y(\omega '))\}\) defined on the probability space \(({\tilde{\Omega }}, {\tilde{P}})\).

We can assume that \({\mathbb X}:=(X_1,\dots , X_N)\) is a sequence of independent copies of X defined on the probability space \((\Omega ^N, P^N)\). Let \({\overline{\omega }}=(\omega _1,\dots , \omega _N)\) denote an element of \(\Omega ^N\). Every \(X_j\) defines a random vector \(Z_j'=\psi (\cdot , X_j)\) taking values in \(L^r\). Introduce the events \(A_j:=\{\Vert Z_j'\Vert _r\le N^{\alpha }\}\subset \Omega ^N\) and let \({\mathbb X}'=(X_1,\dots , X_m,Y_{m+1},\dots , Y_N)\) denote the sequence \({\mathbb X}\) conditioned on the event \(\Omega ^*=\cap _{j=m+1}^NA_j=\Omega ^{m}\times {\tilde{\Omega }}^{N-m}\). Clearly, \({\mathbb X}'({\overline{\omega }})={\mathbb X}({\overline{\omega }})\) for every \({\overline{\omega }}\in \Omega ^*\), and \({\mathbb X}'\) is defined on the space \(\Omega ^{m}\times {\tilde{\Omega }}^{N-m}\) equipped with the probability measure \(P^m\times {\tilde{P}}^{N-m}\). In particular, the random variables \(X_1,\dots , X_m, Y_{m+1},\dots , Y_{N}\) are independent and \(Y_j\), for \(m+1\le j\le N\), has the same distribution as Y. Let \(Z_j\) denote the \(L^r\)-valued random element \(\{x\rightarrow \psi (x,Y_j),\, x\in \mathcal X\}\), for \(m+1\le j\le N\). Let

$$\begin{aligned} T':={\tilde{\mathbb T}}(X_1,\dots , X_m,Y_{m+1},\dots , Y_N). \end{aligned}$$
(42)

We are going to replace \(\mathbf{E}e^{it{\tilde{\mathbb T}}}\) by \(\mathbf{E}e^{itT'}\). For \(s>0\) we have almost surely

$$\begin{aligned} 1-{\mathbb I}_{A_j}\le N^{-\alpha \,s}\Vert Z_j'\Vert _r^{s}, \qquad \Vert Z_j'\Vert _r^r=\mathbf{E}\bigl ( |\psi (X,X_j)|^r \bigl |\, X_j\bigr ). \end{aligned}$$
(43)

From (43) with \(s=r\) we obtain, by Chebyshev’s inequality, that

$$\begin{aligned} 0 \le 1-q_N \le N^{-r\alpha }\mathbf{E}|\psi (X,X_j)|^r \le N^{-r\alpha }M_* \le c_* N^{-2-3\nu }. \end{aligned}$$
(44)

Consequently, for \(k\le N\) we have

$$\begin{aligned} q_N^{-k}\le & {} (1-N^{-r\, \alpha }M_*)^{-k} \le (1-N^{-2}M_*)^{-N} \le c_*, \nonumber \\ q_N^{-k}-1\le & {} kq_N^{-k}(1-q_N)\le c_*kN^{-2-3\nu }\le c_*N^{-1-3\nu }. \end{aligned}$$
(45)

Using the identity, which holds for a measurable function \(f:\mathcal X^N\rightarrow {\mathbb R}\),

$$\begin{aligned} \mathbf{E}f(X_1,\dots , X_m, Y_{m+1},\dots , Y_N) = \mathbf{E}f(X_1,\dots , X_N) \frac{ {\mathbb I}_{A_{m+1}}\dots {\mathbb I}_{A_N} }{q_N^{(N-m)}} \end{aligned}$$
(46)

we obtain from (45) and (46) for \(f\ge 0\) that

$$\begin{aligned} \mathbf{E}f(X_1,\dots , X_m, Y_{m+1},\dots , Y_N) \le c_*\mathbf{E}f(X_1,\dots , X_N). \end{aligned}$$
(47)

Furthermore, (45) and (46) imply

$$\begin{aligned} |\mathbf{E}e^{it(T'-x)}-\mathbf{E}e^{it({\tilde{\mathbb T}}-x)}|\le & {} \bigl (q_N^{-(N-m)}-1\bigr ) + \bigl (1-\mathbf{P}\{ A_{m+1}\cap \dots \cap A_N \}\bigr ) \nonumber \\= & {} (q_N^{-(N-m)}-1)+(1-q_N^{N-m}) \le c_*N^{-1-3\nu }. \end{aligned}$$
(48)

Now we can replace the integral in (27) by the integral

$$\begin{aligned} I := \int _{N^{1-\nu } \le |t|\le N^{1+\nu }} \mathbf{E}e^{it{\hat{T}}}v_N(t)dt, \qquad {\text {where}} \qquad v_N(t)=t^{-1} K_N(t), \qquad {\hat{T}}=T'-x. \end{aligned}$$
(49)

In view of (48) and the simple inequality \(|K_N(t)|\le 1\), the error of this replacement is at most \(c_*N^{-1-2\nu }\). Hence, in order to prove (27), it remains to show that

$$\begin{aligned} |I| \le c_*\frac{1+\delta _3^{-1}}{N^{1+2\nu }}. \end{aligned}$$
(50)

Step 2. Here we prove (50). We split the integral

$$\begin{aligned} I= \sum _pI_p, \qquad I_p=\mathbf{E}\int _{t\in J_p} e^{it{\hat{T}}}v_N(t)dt, \end{aligned}$$
(51)

where \(\{J_p,\, p=1,2,\dots \}\) is a sequence of consecutive intervals of length \(\approx \delta _1N^{1-\nu }\) each and \(\cup _pJ_p=[N^{1-\nu }, N^{1+\nu }]\). Recall that \(\delta _1\) is defined in (37). To prove (50) we show that for every p

$$\begin{aligned} |I_p| \le c_*N^{-2}+c_*N^{-1-4\nu }\bigl (1+\delta _3^{-1}\bigr ). \end{aligned}$$
(52)

We fix p and prove (52). Firstly, we replace \(I_p\) by \(\mathbf{E}J_*\), where

$$\begin{aligned} J_*=\int {\mathbb I}_{\{t\in I_*\}}v_N(t)\mathbf{E}_{\mathbb Y}e^{it{\hat{T}}}dt \end{aligned}$$

and where \(I_*=I_*(Y_{m+1}, \dots , Y_{N})\subset J_p\) is a random subset:

$$\begin{aligned} I_*=\{t\in J_p:\, |\alpha _t|^2>1-\varepsilon _m^2\}, \qquad \varepsilon _m^2=m^{-1}\ln ^{2}N. \end{aligned}$$
(53)

Note that for \(t\in J_p\setminus I_*\), we have

$$\begin{aligned} |\mathbf{E}_{\mathbb Y}e^{itT'}| \le |\alpha _t|^m \le (1-\varepsilon _m^2)^{m/2} \le c_*N^{-3}. \end{aligned}$$

These inequalities imply the bound

$$\begin{aligned} |I_p-\mathbf{E}J_*|\le c_*N^{-2}. \end{aligned}$$
(54)

Secondly, we show that with high probability the set \(I_*\subset J_p\) is an interval. This fact and the monotonicity of \(v_N(t)\) will be used later to bound the integral \(J_*\). Introduce the \(L^r\)-valued random element

$$\begin{aligned} S=N^{-1/2}(Z_{m+1}+\dots +Z_N)=N^{-1/2}\sum _{j=m+1}^N\psi (\cdot , Y_j). \end{aligned}$$
(55)

We apply Lemma 12 (see below) to the set \(N^{-1/2}I_*\) conditionally given the event \({{\mathbb S}}=\{\Vert S\Vert _r<N^{\nu /10}\}\). This lemma shows that \(N^{-1/2} I_*\) is an interval of size at most \(c_*\varepsilon _m\). Hence we can write \(I_*=(a_N,a_N+b_N^{-1})\) and

$$\begin{aligned} {\mathbb I}_{{\mathbb S}} J_*={\mathbb I}_{{\mathbb S}}\mathbf{E}_{\mathbb Y}{\tilde{J}}_*, \qquad {\tilde{J}}_*=\int _{a_N}^{a_N+b_N^{-1}}v_N(t)e^{it{\hat{T}}}dt, \end{aligned}$$
(56)

where the random variables \(a_N,b_N\) (functions of \(Y_{m+1},\dots , Y_N\)) satisfy

$$\begin{aligned} a_N\in J_p \qquad {\text {and}} \qquad b_N^{-1}\le c_*\varepsilon _m\sqrt{N}=c_*\sqrt{N} m^{-1/2}\ln N. \end{aligned}$$

Furthermore, by Lemma 13 below we have \(\mathbf{P}\{{\mathbb S}\}\ge 1-c_*N^{-3}\). Therefore,

$$\begin{aligned} |\mathbf{E}J_*-\mathbf{E}{\mathbb I}_{{\mathbb S}} J_*|\le c_*N^{-2}. \end{aligned}$$
(57)

Next, we observe that \(I_*\not =\emptyset \) if and only if \( {\tilde{\alpha }}^2>1-\varepsilon _m^2\), where

$$\begin{aligned} {\tilde{\alpha }}=\sup \{|\alpha _t|:\, t\in J_p\}. \end{aligned}$$

Therefore we can write (56) in the form

$$\begin{aligned} {\mathbb I}_{{\mathbb S}}J_*={\mathbb I}_{\mathbb B}J_*={\mathbb I}_{\mathbb B}\mathbf{E}_{\mathbb Y}{\tilde{J}}_*, \qquad {\text {where}} \qquad \mathbb B=\{{\tilde{\alpha }}^2>1-\varepsilon _m^2\}\cap {\mathbb S}. \end{aligned}$$

This identity together with (54) and (57) implies

$$\begin{aligned} |I_p|\le |\mathbf{E}{\mathbb I}_{\mathbb B}\mathbf{E}_{\mathbb Y}{\tilde{J}}_*|+c_*N^{-2}. \end{aligned}$$
(58)

Using the integration by parts formula we shall show below that

$$\begin{aligned} |\mathbf{E}{\mathbb I}_{\mathbb B}\mathbf{E}_{\mathbb Y}{\tilde{J}}_*| \le \frac{c}{N^{1-\nu }} \Bigl ( \mathbf{P}\{\mathbb B\} + \int _{b_N}^1\frac{\mathbf{P}\{\mathbb B_{\varepsilon }\}}{\varepsilon ^2}d\varepsilon \Bigr ), \qquad {\text {where}} \qquad \mathbb B_{\varepsilon }:=\mathbb B\cap \{|{\hat{T}}|\le \varepsilon \}. \end{aligned}$$
(59)

Moreover, we shall show that

$$\begin{aligned} \int _{b_N}^1\frac{\mathbf{P}\{\mathbb B_{\varepsilon }\}}{\varepsilon ^2}d\varepsilon \le c_*\frac{1+\delta _3^{-1}}{N^{5\nu }} \qquad {\text {and}} \qquad \mathbf{P}\{\mathbb B\} \le c_*\frac{1+\delta _3^{-1}}{N^{5\nu }}. \end{aligned}$$
(60)

The latter inequalities in combination with (58) and (59) yield (52). We prove (60) in Sect. 3.3.

Let us prove (59). Firstly, we show that

$$\begin{aligned} |{\tilde{J}}_*|\le c(|{\hat{T}}|+b_N)^{-1}a_N^{-1}. \end{aligned}$$
(61)

From the integration by parts formula we obtain the identity

$$\begin{aligned} i{\hat{T}}{\tilde{J}}_* = v_N(t)e^{it{\hat{T}}}\bigr |_{a_N}^{a_N+b_N^{-1}} -\int _{a_N}^{a_N+b_N^{-1}}v'_N(t)e^{it{\hat{T}}}dt =: a'-a''. \end{aligned}$$
(62)

By our choice of the smoothing kernel the function \(v_N(t)\) is monotone on \(J_p\). Therefore

$$\begin{aligned} |a''| \le \int _{a_N}^{a_N+b_N^{-1}}\left| v'_N(t)\right| dt = \left| \int _{a_N}^{a_N+b_N^{-1}}v'_N(t)dt \right| = \left| v_N(a_N)-v_N(a_N+b_N^{-1})\right| . \end{aligned}$$

Invoking the simple inequality \(|a'|\le |v_N(a_N)|+|v_N(a_N+b_N^{-1})|\) and using \(|v_N(t)|\le |t|^{-1}\) we obtain from (62) that

$$\begin{aligned} |{\hat{T}}{\tilde{J}}_*| \le c \, \bigl (a_N^{-1}+ (a_N+b_N^{-1})^{-1}\bigr ) \le c \, a_N^{-1}. \end{aligned}$$

For \(|{\hat{T}}|>b_N\), this inequality implies (61). For \(|{\hat{T}}|\le b_N\) the inequality (61) follows from the inequalities

$$\begin{aligned} |{\tilde{J}}_*| \le \int _{a_N}^{a_N+b_N^{-1}}|v_N(t)|dt \le \int _{a_N}^{a_N+b_N^{-1}}\frac{c}{|t|} dt \le c \, a_N^{-1}b_N^{-1}. \end{aligned}$$

The proof of (61) is complete. Now from (61) and the inequality \(a_N \ge N^{1-\nu }\) we obtain that

$$\begin{aligned} |{\tilde{J}}_*|\le c(|{\hat{T}}|+b_N)^{-1}N^{-1+\nu }. \end{aligned}$$

Finally, using the inequality (valid for an arbitrary real number v, as one checks by considering the cases \(|v|\ge 1\) and \(|v|<1\) separately)

$$\begin{aligned} \frac{1}{|v|+b_N} \le 2+2\int _{b_N}^1\frac{d\varepsilon }{\varepsilon ^2}{\mathbb I}_{\{|v|\le \varepsilon \}} \end{aligned}$$

we show that

$$\begin{aligned} |{\tilde{J}}_*| \le \frac{c_*}{N^{1-\nu }} \left( 1 + \int _{b_N}^1\frac{d\varepsilon }{\varepsilon ^2}{\mathbb I}_{\{|{\hat{T}}|\le \varepsilon \}} \right) . \end{aligned}$$

The latter inequality implies (59).
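For completeness, the elementary inequality used in the last step can be verified directly: for \(|v|\ge 1\) the left-hand side is at most \(1\), while for \(|v|<1\) (recall that \(b_N\le 1\)),

$$\begin{aligned} 2+2\int _{b_N}^1\frac{d\varepsilon }{\varepsilon ^2}{\mathbb I}_{\{|v|\le \varepsilon \}} = \frac{2}{\max \{|v|,\, b_N\}} \ge \frac{1}{|v|+b_N}, \end{aligned}$$

since \(|v|+b_N\ge \max \{|v|,\, b_N\}\).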

3.3 Proof of (60)

The first and second inequalities of (60) are proved in Steps A and B below.

Step A. Proof of the first inequality of (60). Recall \({\mathbb W}\) from (18). We split

$$\begin{aligned}&{\mathbb W}=W_1+W_2+W_3, \qquad W_1=\frac{1}{N^{1/2}}\sum _{j=m+1}^Ng(X_j), \\&W_2=\frac{1}{N^{3/2}}\sum _{m<i<j\le N}\psi (X_i,X_j), \qquad W_3=\sum _{|A|\ge 3:A\cap \Omega _m=\emptyset }T_A. \end{aligned}$$

Define \(W_1', W_2', W_3'\) as \(W_1, W_2, W_3\) above, but with \(X_j\) replaced by \(Y_j\), for \(m+1\le j\le N\). We have \(W'=W_1'+W_2'+W_3'\). Now we write \({\hat{T}}\) (see (49)) in the form \({\hat{T}}=L+\Delta +W_3'\), where

$$\begin{aligned}&L = \frac{1}{\sqrt{N}}\sum _{j=1}^m g(X_j) + \frac{1}{\sqrt{N}}\sum _{j=m+1}^N g(Y_j) -x, \nonumber \\&\Delta = \frac{1}{N^{3/2}}\sum _{j=1}^m\sum _{l=m+1}^N\psi (X_j,Y_l) + \frac{1}{N^{3/2}}\sum _{m+1\le j<l\le N}\psi (Y_j,Y_l). \end{aligned}$$
(63)

The inequalities \(|{\hat{T}}|\le \varepsilon \) and \(|L|\ge 2\varepsilon \) imply \(|\Delta +W_3'|\ge \varepsilon \). Therefore,

$$\begin{aligned} \mathbf{P}\{\mathbb B_{\varepsilon }\} \le \mathbf{P}\{\mathbb B\cap \{|L|\le 2\varepsilon \}\, \} + \mathbf{P}\{|{\hat{T}}|\le \varepsilon ,\, |\Delta +W_3'|\ge \varepsilon \} =:I_1(\varepsilon )+I_2(\varepsilon ). \end{aligned}$$

To prove the first inequality of (60) we show that

$$\begin{aligned} \int _{b_N}^1\frac{d\varepsilon }{\varepsilon ^2}I_1(\varepsilon )\le c_*N^{-5\nu }(1+\delta _3^{-1}), \qquad \int _{b_N}^1\frac{d\varepsilon }{\varepsilon ^2}I_2(\varepsilon )\le c_*N^{-5\nu }. \end{aligned}$$
(64)

Step A.1. Proof of the second inequality of (64). We have

$$\begin{aligned} I_2(\varepsilon )\le \mathbf{P}\{|W_3'|\ge \varepsilon /2\}+I_3(\varepsilon ), \quad {\text {where}} \quad I_3(\varepsilon ):=\mathbf{P}\{|L+\Delta |<3\varepsilon /2, |\Delta |>\varepsilon /2\}. \end{aligned}$$
(65)

It follows from (47), by Chebyshev’s inequality, that \( \mathbf{P}\{|W_3'|>\varepsilon /2\}\le c_*\varepsilon ^{-2}\mathbf{E}W_3^2\). Furthermore, invoking the inequalities, see (167), (168) below,

$$\begin{aligned} \mathbf{E}W_3^2 = \sum _{|A|\ge 3: A\cap \Omega _m=\emptyset }\mathbf{E}T_A^2 \le \sum _{|A|\ge 3}\mathbf{E}T_A^2\le N^{-2}\Delta _3^2 \le c_*N^{-2} \end{aligned}$$

we obtain from (65) that \(I_2(\varepsilon )\le I_3(\varepsilon )+c_*\varepsilon ^{-2}N^{-2}\). Since

$$\begin{aligned} \int _{b_N}^1\frac{d\varepsilon }{\varepsilon ^2}\Bigl (\frac{1}{\varepsilon ^2N^2}\Bigr ) \le c_* b_N^{-3}N^{-2} \le c_*N^{-5\nu }, \end{aligned}$$

it suffices to show inequality (64) for \(I_3(\varepsilon )\) (instead of \(I_2(\varepsilon )\)). Recall the notation \(\Lambda _1=N^{-3/2}\sum _{1\le i<j\le m}\psi (X_i,X_j)\) and put \(U=\Lambda _1+\Delta \). We have

$$\begin{aligned} I_3(\varepsilon )\le \mathbf{P}\{|\Lambda _1|>\varepsilon /4\}+I_4(\varepsilon ), \quad {\text {where}} \quad I_4(\varepsilon ):=\mathbf{P}\{|L+U|<2\varepsilon ,\, |U|>\varepsilon /4\}. \end{aligned}$$

Invoking the bound, which follows from Chebyshev’s inequality,

$$\begin{aligned} \mathbf{P}\{|\Lambda _1|>\varepsilon /4\} \le 16\varepsilon ^{-2}\mathbf{E}\Lambda _1^2 \le c_*\varepsilon ^{-2}m^2N^{-3} \end{aligned}$$

we upper bound the integral

$$\begin{aligned} \int _{b_N}^1\frac{d\varepsilon }{\varepsilon ^2}\mathbf{P}\{|\Lambda _1|>\varepsilon /4\} \le \int _{b_N}^1\frac{d\varepsilon }{\varepsilon ^2}\Bigl (\frac{m^2}{\varepsilon ^2N^3}\Bigr ) \le c_* b_N^{-3}m^2N^{-3} \le c_*N^{-5\nu }. \end{aligned}$$
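Both of the last two integral bounds rest on the same elementary computation, recorded here once:

$$\begin{aligned} \int _{b_N}^1\frac{d\varepsilon }{\varepsilon ^{4}} = \frac{1}{3}\bigl (b_N^{-3}-1\bigr ) \le \frac{1}{3}\, b_N^{-3}; \end{aligned}$$

after this, the bounds \(b_N^{-3}N^{-2}\le c_*N^{-5\nu }\) and \(b_N^{-3}m^2N^{-3}\le c_*N^{-5\nu }\), used above, follow from the definition of \(b_N\).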

Hence, it remains to show the second inequality of (64) for \(I_4(\varepsilon )\).

Let \(I_4'(\varepsilon )\) be the same probability as \(I_4(\varepsilon )\) but with \(X_i\) replaced by \(Y_i\), for \(1\le i\le m\). That is,

$$\begin{aligned}&I_4'(\varepsilon )=\mathbf{P}\{|L'+U'|<2\varepsilon ,\, |U'|>\varepsilon /4\}, \\&L'=\frac{1}{N^{1/2}}\sum _{1\le i\le N}g(Y_i)-x, \qquad U'=\frac{1}{N^{3/2}}\sum _{1\le i<j\le N}\psi (Y_i,Y_j). \end{aligned}$$

By the same reasoning as in (48) we obtain that \(|I_4(\varepsilon )-I_4'(\varepsilon )|\le c_*N^{-1-3\nu }\). Now, in view of the bound

$$\begin{aligned} \int _{b_N}^1\frac{d\varepsilon }{\varepsilon ^2}N^{-1-3\nu }\le c_*b_N^{-1}N^{-1-3\nu }\le c_*N^{-5\nu } \end{aligned}$$

we conclude that it suffices to show the second inequality of (64) for \(I_4'(\varepsilon )\).

Let us show the second inequality of (64) for \(I_4'(\varepsilon )\). We split the sample

$$\begin{aligned} {\mathbb Y} := \{Y_1,\dots , Y_N\} = {\mathbb Y}_1\cup {\mathbb Y}_2\cup {\mathbb Y}_3, \end{aligned}$$

into three groups of nearly equal size. Next, we split \(U'=\sum _{i\le j}U'_{ij}\) so that the partial sum \(U'_{ij}\) depends on the observations from the groups \({\mathbb Y}_i\) and \({\mathbb Y}_j\) only. We have

$$\begin{aligned} I_4'(\varepsilon ) \le \sum _{i\le j} \mathbf{P}\{|L'+U'|\le 2\varepsilon ,\,|U'_{ij}|\ge \varepsilon /24\}. \end{aligned}$$
(66)

Now we show that the second inequality of (64) holds for every summand on the right-hand side of (66). Let \({\tilde{U}}\) denote a summand \(U'_{ij}\), say, not depending on \({\mathbb Y}_3\). Let

$$\begin{aligned}&{\tilde{I}}(\varepsilon ) := \mathbf{P}\{|L'+U'|\le 2\varepsilon ,\, |{\tilde{U}}|\ge \varepsilon /24\}, \qquad \mathcal{U}=\{|{\tilde{U}}|\ge \varepsilon /24\}, \qquad \\&\mathcal{V}=\{|L'+U'|\le 2\varepsilon \},\qquad {\overline{S}}(x) := N^{-1/2} \sum _{Y_i\in {\mathbb Y}\setminus {\mathbb Y}_3}\psi (x,Y_i), \qquad x\in \mathcal X. \end{aligned}$$

We observe that

$$\begin{aligned} {\tilde{I}}(\varepsilon ) = \mathbf{E}{\mathbb I}_\mathcal{U}{\mathbb I}_\mathcal{V} \end{aligned}$$
(67)

and note that the random function \( x \rightarrow {\overline{S}}(x)\) is a sum of i.i.d. random variables with values in \(L^r\) such that, for every i, we have \(\Vert \psi (\cdot ,Y_i)\Vert _r\le N^{\alpha }\) for almost all values of \(Y_i\). By Lemma 13,

$$\begin{aligned} \mathbf{P}\{\Vert {\overline{S}}\Vert _r> N^{\nu }\}\le N^{-3}. \end{aligned}$$

Therefore in (67) we can replace the event \(\mathcal V\) by \(\mathcal{V}_1=\mathcal{V}\cap \{\Vert {\overline{S}}\Vert _r\le N^{\nu }\}\). Furthermore, since \({\tilde{U}}\) does not depend on \({\mathbb Y}_3\), we have \(\mathbf{E}{\mathbb I}_\mathcal{U}{\mathbb I}_{\mathcal{V}_1} =\mathbf{E}{\mathbb I}_\mathcal{U}p'\), where \(p':=\mathbf{E}\bigl ({\mathbb I}_{\mathcal{V}_1}|{\mathbb Y}_1, {\mathbb Y}_2\bigr )\). The concentration bound for the conditional probability \(p'\), which is shown below,

$$\begin{aligned} p' \le c_*(\varepsilon +N^{-1/2}) \end{aligned}$$
(68)

implies

$$\begin{aligned} {\tilde{I}}(\varepsilon ) \le c_*(\varepsilon +N^{-1/2})\mathbf{P}\{\mathcal{U}\} \le c_*(\varepsilon +N^{-1/2})\varepsilon ^{-r}N^{-r/2}. \end{aligned}$$
(69)

In the last step we applied Markov’s inequality

$$\begin{aligned} \mathbf{P}\{\mathcal{U}\} \le (24/\varepsilon )^{r}N^{-r/2}\mathbf{E}|N^{1/2}{\tilde{U}}|^r \end{aligned}$$

and the bound \(\mathbf{E}|N^{1/2}{\tilde{U}}|^r\le c_*\mathbf{E}|N^{1/2}U_{ij}|^r\le c_*\). Here \(U_{ij}\) denotes the random variable obtained from \({\tilde{U}}\) after we replace \(Y_j\) by \(X_j\) for every j. The second-to-last inequality follows from (47). The last inequality follows from the well-known moment inequalities for U-statistics [20].

It follows from (69) and the simple inequality \(\varepsilon \ge b_N\ge c_* N^{-1/2}\) that

$$\begin{aligned} \int _{b_N}^{1}\frac{d\varepsilon }{\varepsilon ^2}{\tilde{I}}(\varepsilon ) \le \frac{c_*}{N^{r/2}} \int _{b_N}^1\frac{d\varepsilon }{\varepsilon ^{1+r}} \le \frac{c_*}{N^{r/2}b_N^r} = c_*m^{-r/2}\ln ^rN \le c_* N^{-5\nu }, \end{aligned}$$

provided that \(m^{r/2}\ge N^{6\nu }\): indeed, then \(m^{-r/2}\ln ^rN\le N^{-6\nu }\ln ^rN\le N^{-5\nu }\) for \(N>C_*\). The inequality \(m^{r/2}\ge N^{6\nu }\) is ensured by (17). Thus we have shown (64) for \({\tilde{I}}(\varepsilon )\).

It remains to prove (68). We write \(L'+U'\) in the form \(L_*+U_*+b-x\), where

$$\begin{aligned} L_* =\frac{1}{N^{1/2}}\sum _{Y_j\in {\mathbb Y}_3} \bigl (g(Y_j)+N^{-1/2}{\overline{S}}(Y_j)\bigr ) \qquad {\text {and}} \qquad U_*= \frac{1}{N^{3/2}}\sum _{\{Y_j,Y_k\}\subset {\mathbb Y}_3}\psi (Y_j,Y_k), \end{aligned}$$

and where b is a function of \(\{Y_i\in {\mathbb Y}\setminus {\mathbb Y}_3\}\). Introduce the random variables \({\overline{L}}\) and \({\overline{U}}\) which are obtained from \(L_*\) and \(U_*\) after we replace every \(Y_j\in {\mathbb Y}_3\) by the corresponding observation \(X_j\). We have

$$\begin{aligned} p'&\le \sup _{v\in \mathbb R} \mathbf{E}\bigl ( {\mathbb I}_{\{L_*+U_*\in [v,v+2\varepsilon ]\}} \bigl | {\mathbb Y}_1, {\mathbb Y}_2 \bigr ) {\mathbb I}_{\{\Vert {\overline{S}}\Vert _r\le N^{\nu }\}} \\&\le c_*\sup _{v\in \mathbb R} \mathbf{E}\bigl ( {\mathbb I}_{\{{\overline{L}}+{\overline{U}}\in [v,v+2\varepsilon ]\}} \bigl | {\mathbb Y}_1, {\mathbb Y}_2 \bigr ) {\mathbb I}_{\{\Vert {\overline{S}}\Vert _r\le N^{\nu }\}}. \end{aligned}$$

In the last step we applied (47). Now an application of the Berry–Esseen bound due to van Zwet [37] shows (68). The proof of the second inequality of (64) is complete.
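Let us also record the standard concentration-function consequence of a Berry–Esseen type bound that is used here: if a generic random variable \(Z\) satisfies \(\sup _x|\mathbf{P}\{Z\le x\}-\Phi ((x-\mu )/\sigma _0)|\le \delta _0\) with \(\sigma _0\) bounded away from zero, then

$$\begin{aligned} \sup _{v\in \mathbb R}\mathbf{P}\{Z\in [v,v+2\varepsilon ]\} \le 2\delta _0+\frac{2\varepsilon }{\sigma _0\sqrt{2\pi }} \le c(\varepsilon +\delta _0), \end{aligned}$$

because \(\Phi '\le (2\pi )^{-1/2}\). Applied conditionally, with \(\delta _0=O(N^{-1/2})\), this gives the form of the bound asserted in (68).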

Step A.2. Proof of the first inequality of (64). We introduce events

$$\begin{aligned} \mathbb A=\{{\tilde{\alpha }}^2>1-\varepsilon _m^2\}, \quad \mathbb V=\{\Vert S\Vert _r\le N^{\nu }\}, \quad \mathbb L=\{|L|\le 2\varepsilon \} \end{aligned}$$

(recall that \(\varepsilon _m\) is defined in (53)) and write \(I_1(\varepsilon )\) in the form \(I_1(\varepsilon ) =\mathbf{E}\, \mathbb I_{\mathbb A}\mathbb I_{{\mathbb S}}\mathbb I_{\mathbb L}\). We have

$$\begin{aligned} I_1(\varepsilon ) =\mathbf{E}\, \mathbb I_{\mathbb A}\mathbb I_{{\mathbb S}}\mathbb I_{\mathbb L} \le \mathbf{E}\, \mathbb I_{\mathbb A}\mathbb I_{\mathbb V}\mathbb I_{\mathbb L}. \end{aligned}$$

To upper bound \(I_1(\varepsilon )\) we use the following strategy. We can upper bound the probability \(\mathbf{P}\{\mathbb L\}\) using the Berry–Esseen inequality,

$$\begin{aligned} \mathbf{P}\{\mathbb L\}\le c_*(\varepsilon +N^{-1/2}). \end{aligned}$$
(70)

Furthermore, one can show that the probability \(\mathbf{P}\{\mathbb A\}=O(N^{-6\nu })\). We are going to make use of both of these bounds. However, since the events \(\mathbb A\) and \(\mathbb L\) refer to the same set of random variables \(Y_{m+1},\dots , Y_N\), we cannot argue directly that \(\mathbf{E}{\mathbb I}_{\mathbb A}{\mathbb I}_{\mathbb L}\approx \mathbf{P}\{\mathbb A\}\mathbf{P}\{\mathbb L\}\). Nevertheless, invoking a delicate conditioning argument we are able to show that

$$\begin{aligned} I_1(\varepsilon )\le c_*\mathcal{R}(\varepsilon +N^{-1/2})+c_*N^{-2}, \qquad \mathcal{R}:=N^{-6\nu }(1+\delta _3^{-1}). \end{aligned}$$
(71)

The latter inequality together with the inequalities \(\varepsilon \ge b_N>N^{-1/2}\) implies the first part of (64). Let us prove (71). As the proof is rather involved we start by providing an outline. Let the integers n and M be defined by (29). Split \(\{1,\dots , N\}=O_0\cup O_1\cup \dots \cup O_n\), where \(O_0=\{1,\dots , m\}\) and where the sets \(O_i\), for \(1\le i\le n\), are defined in (30). Split L, see (63),

$$\begin{aligned} L= \sum _{k=0}^nL_k-x, \quad \ \ {\text {where}} \quad \ \ L_k=N^{-1/2}\sum _{j\in O_k}g(Y_j), \quad \ \ {\text {for}} \quad \ \ k=1,\dots , n, \end{aligned}$$
(72)

and where \(L_0=N^{-1/2}\sum _{j\in O_0}g(X_j)\). Observe that \(\mathbb I_{\mathbb L}\) is a function of \(L_0, L_{1},\dots , L_n\). The random variables \(\mathbb I_{\mathbb A}\) and \(\mathbb I_{\mathbb V}\) are functions of \(Y_{m+1},\dots , Y_N\) and do not depend on \(X_1,\dots , X_m\). Therefore, denoting

$$\begin{aligned} m(l_1,\dots , l_n)=\mathbf{E}(\mathbb I_{\mathbb A}\mathbb I_{\mathbb V}|L_{1}=l_{1},\dots , L_n=l_n) \qquad {\text {and}} \qquad \mathcal{M}={\text {ess sup}}\ m(l_{1},\dots , l_n) \end{aligned}$$

we obtain from (70)

$$\begin{aligned} \mathbf{E}\, \mathbb I_{\mathbb A}\mathbb I_{\mathbb V}\mathbb I_{\mathbb L} = \mathbf{E}\, \mathbb I_{\mathbb L} m(L_1,\dots , L_n) \le c_*(\varepsilon +N^{-1/2})\mathcal{M}. \end{aligned}$$
(73)

Clearly, the bound \(\mathcal{M}\le c_*\mathcal{R}\) would imply (71). Unfortunately, we are not able to establish such a bound directly. In what follows we prove (71) using a delicate conditioning which allows us to estimate quantities like \(\mathcal{M}\).

Step A.2.1. Firstly we replace \(L_k\), \(1\le k\le n\), by smooth random variables

$$\begin{aligned} g_k=\frac{1}{N}\frac{\xi _k}{n^{1/2}}+L_k, \end{aligned}$$
(74)

where \(\xi _{1},\dots , \xi _n\) are symmetric i.i.d. random variables with the density function defined by (22) with \(k=6\) and \(a=1/6\) so that the characteristic function \(t\rightarrow \mathbf{E}\exp \{it\xi _1\}\) vanishes outside the unit interval \(\{t:\, |t|<1\}\). Note that \(\mathbf{E}\xi _1^4<\infty \). We assume that the sequences \(\xi _1,\, \xi _2, \dots \) and \(X_1,\dots , X_m,Y_{m+1},\dots , Y_N\) are independent. In particular, \(\xi _k\) and \(L_k\) are independent.

Introduce the event

$$\begin{aligned} {\tilde{\mathbb L}} = \left\{ \left| L_0+\sum _{k=1}^ng_k-x \right| <3\varepsilon \right\} . \end{aligned}$$

Note that

$$\begin{aligned} \mathbb I_{\mathbb L} \le \mathbb I_{\tilde{\mathbb L}} + \mathbb I_{\{|\xi |\ge \varepsilon N\}}, \qquad {\text {where}} \qquad \xi =\frac{1}{n^{1/2}}\sum _{k=1}^n\xi _k. \end{aligned}$$

Using Markov’s inequality and the inequality \(\mathbf{E}\xi ^4\le c\) we estimate the probability

$$\begin{aligned} \mathbf{P}\{|\xi |\ge \varepsilon N\} \le \frac{\mathbf{E}\xi ^4}{\varepsilon ^4 N^4} \le \frac{c}{\varepsilon ^4N^4} \le \frac{c_*}{N^2}, \end{aligned}$$

where in the last step we used \(\varepsilon ^2N\ge b_N^2N\ge c'_*\). Hence we have

$$\begin{aligned} \mathbf{E}\mathbb I_{\mathbb A}\mathbb I_{\mathbb V}\mathbb I_{\mathbb L} \le \mathbf{E}\mathbb I_{\mathbb A}\mathbb I_{\mathbb V}\mathbb I_ {\tilde{\mathbb L}}+c_*N^{-2}. \end{aligned}$$
(75)

In the subsequent steps of the proof we replace the conditioning on \(L_1,\dots , L_n\) (in (73)) by the conditioning on the random variables \(g_1,\dots , g_n\). Since the latter random variables have densities (their densities are analysed in Lemma 7 below) the corresponding conditional distributions are much easier to handle. Moreover, we restrict the conditioning to the event where these densities are positive.

Step A.2.2. Given \(w>0\), consider the events \(\{|g_k|\le n^{-1/2}w\}\) and their indicator functions \(\mathbb I_k=\mathbb I_{\{|g_k|\le n^{-1/2}w\}}\). Using the simple inequality \(n\mathbf{E}g^2_k\le c_*\) (where \(c_*\) depends on \(M_*\) and r) we obtain from Chebyshev’s inequality that

$$\begin{aligned} \mathbf{P}\{\mathbb I_k=1\} = 1-\mathbf{P}\{|g_k|>n^{-1/2}w\} \ge 1-w^{-2}n\mathbf{E}|g_k|^2>7/8, \end{aligned}$$
(76)

where the last inequality holds for a sufficiently large constant w (depending on \(M_*,\, r\)). Fix w such that (76) holds and introduce the event \(\mathbb B^*=\{\sum _{k=1}^{n}\mathbb I_k> n/4\}\). Hoeffding’s inequality shows \(\mathbf{P}\{\mathbb B^*\}\ge 1-\exp \{-n/8\}\). Therefore,

$$\begin{aligned} \mathbf{E}\mathbb I_{\mathbb A}\mathbb I_{\mathbb V}\mathbb I_{\tilde{\mathbb L}} \le \mathbf{E}\mathbb I_{\mathbb A}\mathbb I_{\mathbb V}\mathbb I_{\tilde{\mathbb L}}\mathbb I_{\mathbb B^*}+c_*N^{-2}. \end{aligned}$$
(77)
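For the reader’s convenience we spell out the Hoeffding step. The indicators \(\mathbb I_k\) are independent (the blocks \(O_k\) are disjoint and the \(\xi _k\) are independent of everything else), and \(\mathbf{E}\,\mathbb I_k\ge 7/8\) by (76). Hence

$$\begin{aligned} \mathbf{P}\Bigl \{\sum _{k=1}^{n}\mathbb I_k\le \frac{n}{4}\Bigr \} \le \mathbf{P}\Bigl \{\sum _{k=1}^{n}(\mathbb I_k-\mathbf{E}\,\mathbb I_k)\le -\frac{5n}{8}\Bigr \} \le \exp \Bigl \{-\frac{2}{n}\Bigl (\frac{5n}{8}\Bigr )^{2}\Bigr \} \le \exp \{-n/8\}, \end{aligned}$$

and \(\exp \{-n/8\}\le c_*N^{-2}\), since \(n\) grows polynomially in \(N\) (cf. the condition \(n\ge N^{5\nu }\) of Lemma 1 below).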

Given a binary vector \(\theta =(\theta _1,\dots ,\theta _{n})\) (with \(\theta _k\in \{0,1\}\)) write \(|\theta |=\sum _k\theta _k\). Introduce the event \(\mathbb B_\theta =\{\mathbb I_k=\theta _k, \, 1\le k\le n\}\) and the conditional expectation

$$\begin{aligned} m_{\theta }(z_1,\dots ,z_{n}) = \mathbf{E}(\mathbb I_{\mathbb A}\mathbb I_{\mathbb V}\mathbb I_{\mathbb B_{\theta }}\,|\,g_1=z_1,\dots , g_{n}=z_{n}), \qquad (z_1,\dots , z_n)\in \mathbb R^n. \end{aligned}$$

Note that \(\mathbb I_{\mathbb B_{\theta }}\), the indicator of the event \(\mathbb B_{\theta }\), is a function of \(g_1,\dots , g_{n}\). It follows from the identities

$$\begin{aligned} \mathbb B^*=\cup _{|\theta |> n/4}\mathbb B_{\theta } \qquad {\text {and}} \qquad \mathbb I_{\mathbb B^*}=\sum _{|\theta |> n/4}\mathbb I_{\mathbb B_{\theta }} \end{aligned}$$

(here \(\mathbb B_{\theta }\cap \mathbb B_{\theta '}=\emptyset \), for \(\theta \not =\theta '\)) that

$$\begin{aligned} \mathbf{E}\mathbb I_{\mathbb A}\mathbb I_{\mathbb V}\mathbb I_{\tilde{\mathbb L}}\mathbb I_{\mathbb B^*} = \sum _{|\theta |> n/4}\mathbf{E}\mathbb I_{\mathbb A}\mathbb I_{\mathbb V}\mathbb I_{\tilde{\mathbb L}}\mathbb I_{\mathbb B_{\theta }} = \sum _{|\theta |> n/4}\mathbf{E}\mathbb I_{\mathbb B_\theta }\mathbb I_{\tilde{\mathbb L}}m_{\theta }(g_1,\dots , g_{n}). \end{aligned}$$

We shall show below that, uniformly in \(\theta \) satisfying \(|\theta |> n/4\), we have

$$\begin{aligned} M_{\theta }\le c_*\mathcal{R}, \qquad {\text {where}} \qquad M_{\theta } := {\text {ess sup}} \ m_{\theta }(z_1,\dots , z_{n}). \end{aligned}$$
(78)

This bound in combination with (70), which extends to \({\tilde{\mathbb L}}\) as well, implies

$$\begin{aligned} \mathbf{E}\mathbb I_{\mathbb A}\mathbb I_{\mathbb V}\mathbb I_{\tilde{\mathbb L}}\mathbb I_{\mathbb B^*}&\le c_*\mathcal{R}\sum _{|\theta |> n/4}\mathbf{E}\mathbb I_{\mathbb B_\theta }\mathbb I_{\tilde{\mathbb L}} = c_*\mathcal{R}\mathbf{E}\mathbb I_{\mathbb B^*}\mathbb I_{\tilde{\mathbb L}} \\&\le c_*\mathcal{R} \mathbf{P}\{\tilde{\mathbb L}\} \le c_*\mathcal{R}(\varepsilon +N^{-1/2}). \end{aligned}$$

Combining the latter inequalities with (75) and (77) we obtain (71).

Step A.2.3. Here we show (78). Fix \(\theta =(\theta _1,\dots , \theta _{n})\) satisfying \(|\theta |> n/4\). Denote, for brevity, \(h=|\theta |\) and assume without loss of generality that \(\theta _i=1\), for \(1\le i\le h\), and \(\theta _j=0\), for \(h+1\le j\le n\). Consider the \(h\)-dimensional random vector \({\overline{g}}_{[\theta ]}=(g_1,\dots , g_h)\). Note that the random vector \({\overline{g}}_{[\theta ]}\) and the sequences of random variables

$$\begin{aligned} {\mathbb Y}_{\theta } = \bigl \{ Y_j:\, m+hM<j\le N \bigr \}, \qquad \xi _{\theta }=\{\xi _j:\, h<j\le n\} \end{aligned}$$

are independent. Recall S from (55) and note that the terms \(S_{\theta }\) and \(S'_{\theta }\) of the decomposition

$$\begin{aligned} S=S_{\theta }+S'_{\theta }, \qquad S_{\theta }(\cdot )=\frac{1}{\sqrt{N}}\sum _{1\le k\le h} \sum _{j\in O_k}\psi (\cdot ,Y_j) \end{aligned}$$

are independent as well.

For \({\overline{z}}_{[\theta ]} = (z_1,\dots , z_h)\in {\mathbb R}^h\) we have \( m_{\theta }(z_1,\dots , z_n) \le {\tilde{m}}_{\theta }({\overline{z}}_{[\theta ]})\), where

$$\begin{aligned} {\tilde{m}}_{\theta }({\overline{z}}_{[\theta ]}) = {\text {ess sup}}_{\theta } \mathbf{E}\bigl ( \mathbb I_{\mathbb A}\mathbb I_{\mathbb V}\mathbb I_{\mathbb B_{\theta }} \, \bigr | \, {\overline{g}}_{[\theta ]}={\overline{z}}_{[\theta ]}, \, {\mathbb Y}_{\theta }, \, \xi _{\theta } \bigr ) \end{aligned}$$

denotes the "ess sup" taken with respect to almost all values of \({\mathbb Y}_{\theta }\) and \(\xi _{\theta }\). To prove (78) we show that

$$\begin{aligned} {\tilde{m}}_{\theta }({\overline{z}}_{[\theta ]}) \le c_*\mathcal{R}. \end{aligned}$$
(79)

Let us prove (79). Given \({\mathbb Y}_{\theta }\), denote \(f_{\theta }=S'_{\theta }\) (note that \(S'_{\theta }\) is a function of \(\mathbb Y_{\theta }\)). Using the notation (40), we have for the interval \(J'_p=N^{-1/2}J_p\),

$$\begin{aligned} \mathbf{E}\bigl ( \mathbb I_{\mathbb A}\mathbb I_{\mathbb V}\mathbb I_{\mathbb B_{\theta }} \, \bigr | \, {\overline{g}}_{[\theta ]}={\overline{z}}_{[\theta ]}, \, {\mathbb Y}_{\theta }, \, \xi _{\theta } \bigr ) = {\mathbb I}_{{\mathbb B}_{\theta }} \mathbf{E}\bigl ( d_{\varepsilon _m}(f_{\theta }+S_{\theta }, J'_p)\, \bigr | \, {\overline{g}}_{[\theta ]}={\overline{z}}_{[\theta ]},\, \mathbb Y_{\theta }, \,\xi _{\theta } \bigr ). \end{aligned}$$
(80)

Note that the factor \({\mathbb I}_{{\mathbb B}_{\theta }}\) on the right side is non-zero only when \({\overline{z}}_{[\theta ]}=(z_1,\dots , z_h)\) satisfies \(|z_i|\le w/\sqrt{n}\), for \(i=1,\dots , h\). Introduce the \(L^r\)-valued random variables

$$\begin{aligned} U_i=N^{-1/2}\sum _{j\in O_{i}}\psi (\cdot ,Y_j), \qquad i=1,\dots , h, \end{aligned}$$

and the regular conditional probability

$$\begin{aligned} P({\overline{z}}_{[\theta ]};{\mathcal{A}}) = \mathbf{E}\bigl ( {\mathbb I}_{\{(U_{1},\dots , U_h)\in {\mathcal{A}}\}} \, \bigr | \, {\overline{g}}_{[\theta ]}={\overline{z}}_{[\theta ]} \bigr ). \end{aligned}$$

Here \(\mathcal{A}\) denotes a Borel subset of \(L^r\times \dots \times L^r\) (h-times). By independence, there exist regular conditional probabilities

$$\begin{aligned} P_i(z_i;\,{\mathcal{A}}_i) = \mathbf{E}({\mathbb I}_{\{U_i\in {\mathcal{A}}_i\}}\,\bigr |\, g_i=z_i), \qquad i=1,\dots , h, \end{aligned}$$
(81)

such that for Borel subsets \({\mathcal{A}}_i\) of \(L^r\) we have

$$\begin{aligned} P({\overline{z}}_{[\theta ]}; {\mathcal{A}}_1\times \cdots \times {\mathcal{A}}_h) = \prod _{1\le i\le h}P_i(z_i; {\mathcal{A}}_i). \end{aligned}$$

In particular, for every \({\overline{z}}_{[\theta ]}\), the regular conditional probability \(P({\overline{z}}_{[\theta ]};\cdot )\) is the (measure-theoretic) extension of the product of the regular conditional probabilities (81). Therefore, denoting by \(\psi _i\) a random variable with values in \(L^r\) and with the distribution

$$\begin{aligned} \mathbf{P}\{\psi _i\in \mathcal{B} \}=P_i(z_i;\mathcal{B}) \qquad {\text {for every Borel set}} \quad \mathcal{B}\subset L^r, \end{aligned}$$
(82)

we obtain that the distribution of the sum

$$\begin{aligned} \zeta =\psi _1+\dots +\psi _h \end{aligned}$$
(83)

of independent random variables \(\psi _1,\dots , \psi _h\) is the regular conditional distribution of \(S_{\theta }\), given \({\overline{g}}_{[\theta ]}={\overline{z}}_{[\theta ]}\). In particular, the expectation in the right side of (80) equals \(\delta _{\varepsilon _m}(f_{\theta }+\zeta )\), where

$$\begin{aligned} \delta _s(f_{\theta }+\zeta ) := \mathbf{E}_\zeta d_s(f_{\theta }+\zeta , J_p'), \qquad s>0, \end{aligned}$$
(84)

and where \(\mathbf{E}_\zeta \) denotes the conditional expectation given all the random variables except \(\zeta \). Recall \(\varepsilon _m\), defined by (53), and note that for any \(\varepsilon _*\) satisfying the inequality

$$\begin{aligned} \varepsilon _m\le \varepsilon _* \end{aligned}$$
(85)

we have

$$\begin{aligned} \delta _{\varepsilon _m}(f_{\theta }+\zeta )\le \delta _{\varepsilon _*}(f_{\theta }+\zeta ). \end{aligned}$$
(86)

We put \(\varepsilon _*=\mu _* |T_0| N^{-1/2}/20\) and apply Lemma 1 to upper bound \(\delta _{\varepsilon _*}(f_{\theta }+\zeta )\) (the quantity \(\mu _*\) is defined in (97) below). We will use the inequalities \(c_*\delta _3^2/n\le \mu _*^2\le c_*'\delta _3^2/n\) that follow from (217) below. Note that for \(T_0\) satisfying (38), integers \(m, n\) as in (17), (29), and the quantity \(\delta _3\) (see (41)) satisfying

$$\begin{aligned} \delta _3^2\ge N^{-8\nu }, \end{aligned}$$
(87)

the inequality (85) holds, provided that N is sufficiently large (\(N>C_*\)). Moreover, we have

$$\begin{aligned} \varepsilon _*^2\le c_*\delta _3^2N^{-48\nu }. \end{aligned}$$
(88)

Now Lemma 1 (together with the moment inequalities of Lemma 10) implies the inequality

$$\begin{aligned} \delta _{\varepsilon _*}(f_{\theta }+\zeta ) \le c_*\kappa _*^{1/2}\varepsilon _*^{(r-2)/(2r)}+c_*N^{-2}, \end{aligned}$$
(89)

where the number \(\kappa _*\), defined in (97), satisfies \(\kappa _*\le c_*\delta _3^{-r/(r-2)}\), by (218).

Denote \({\tilde{r}}=r^{-1}+(r-2)^{-1}\). It follows from (89), (88) and (86), for \(r>4\), that

$$\begin{aligned} \delta _{\varepsilon _m}(f_{\theta }+\zeta ) \le c_*\delta _3^{-{\tilde{r}}}N^{-6\nu }+c_*N^{-2} \le c_*(1+\delta _3^{-{\tilde{r}}})N^{-6\nu } \le c_*\mathcal{R}. \end{aligned}$$
(90)

In the last step we used the simple bound \(\delta _3^2\le c_*\), see (200), and the inequality \(1+\delta _3^{-{\tilde{r}}}\le 2+\delta _3^{-1}\), which holds for \({\tilde{r}}<1\). Note that (90) and (80), (84) imply (79). The proof of the first inequality of (60) is complete.

Step B. Here we prove the second bound of (60). It is convenient to write the \(L^r\)-valued random variable (55) in the form

$$\begin{aligned} S=U_1+\dots +U_{n-1}+U_n=:S'+U_n, \qquad {\text {where}} \qquad U_i=N^{-1/2}\sum _{j\in O_i}\psi (\cdot ,Y_j). \end{aligned}$$
(91)

Observe that \(U_1,\dots , U_{n-1}\) are independent and identically distributed. We are going to apply Lemma 1 conditionally, given \(U_n\), to the probability

$$\begin{aligned} \mathbf{P}\{\mathbb B\}=\mathbf{E}{\tilde{p}}(U_n), \qquad {\text {where}} \qquad {\tilde{p}}(f)=\mathbf{E}\bigl (d_{\varepsilon _m}(S'+f,\,N^{-1/2}J_p)\bigl | U_n=f\bigr ). \end{aligned}$$

To upper bound \({\tilde{p}}(f)\) we proceed as in the proof of (90). Lemma 9 shows that \(U_1,\dots , U_{n-1}\) satisfy the moment conditions of Lemma 1. Note that in this case the quantity \(\mu _*\) satisfies \(c_*\delta _3^2/n\le \mu ^2_*\le c_*'/n\) (these inequalities follow from (201)). The right inequality implies the bound \(\varepsilon _*^2\le c_*N^{-48\nu }\) instead of (88) above. As a result we obtain a different power of \(\delta _3\) in the upper bound below. Proceeding as in the proof of (90), see (86), (88), (89), we obtain

$$\begin{aligned} {\tilde{p}}(f)\le c_*(1+\delta _3^{-r/(2(r-2))})N^{-6\nu } \le c_*\mathcal{R}. \end{aligned}$$

In the last step we used the inequality \(1+\delta _3^{-r/(2(r-2))}\le 2+\delta _3^{-1}\), which follows from \(r/(2(r-2))< 1\) (recall that \(r>4\)). Therefore, we have \(\mathbf{P}\{\mathbb B\}\le \mathbf{E}{\tilde{p}}(U_n)\le c_*\mathcal{R}\), where \(\mathcal{R}\) is defined in (71). This completes the proof of the second inequality in (60).

3.4 Proof of (26) for \(k = 3\)

Here we prove the bound \(|I_3|\le c_*N^{-1-\nu }\), see (26). It follows from (48) and the identity \(\mathbf{E}_{\mathbb Y}\exp \{itT'\}=\alpha _t^m\exp \{itW'\}\), see (13), that

$$\begin{aligned} |I_3|\le \int _{t_1<|t|<t_2}\frac{\mathbf{E}|\alpha _t^m|}{|t|}dt+c_*N^{-1-\nu }. \end{aligned}$$
(92)

Recall the event \({\mathbb S}=\{\Vert S\Vert _r<N^{\nu /10}\}\), where \(S\) is defined in (55). We have

$$\begin{aligned} \mathbf{E}|\alpha _t^m| \le \mathbf{E}{\mathbb I}_{{\mathbb S}}|\alpha _t^m| +\mathbf{E}(1-{\mathbb I}_{\mathbb S}). \end{aligned}$$
(93)

Using Lemma 13 we upper bound the second term on the right: \(\mathbf{P}\{\Vert S\Vert _r\ge N^{\nu /10}\}\le c_*N^{-3}\). Furthermore, the one-term expansion of the exponent in (13) in powers of \(itN^{-3/2}\sum _{j=m+1}^N\psi (X_1,Y_j)\) shows the inequality

$$\begin{aligned} {\mathbb I}_{\mathbb S}|\alpha _t| \le |\mathbf{E}\exp \{itN^{-1/2}g(X_1)\}|+ {\mathbb I}_{\mathbb S}|t|N^{-1}\Vert S\Vert _1. \end{aligned}$$

It follows from (7) that the first summand is bounded from above by \(1-v\), for some \(v>0\) depending on \(A_*,M_*,D_*, \delta \) only, see the proof of (36). Furthermore, the second summand is bounded from above by \(N^{-9\nu /10}\) almost surely. Therefore, for sufficiently large \(N>C_*\) we have \({\mathbb I}_{\mathbb S}|\alpha _t|\le 1-v/2\) uniformly in t. Invoking this bound in (93) we obtain

$$\begin{aligned} \mathbf{E}|\alpha _t^m|\le (1-v/2)^m+c_*N^{-3}\le c_*N^{-3}, \end{aligned}$$

for \(m\) satisfying (17); here we used \((1-v/2)^m\le \exp \{-mv/2\}\le c_*N^{-3}\), which holds since \(m\) grows polynomially in \(N\). The latter inequality implies that the integral in (92) is bounded from above by \(c_*N^{-2}\), thus completing the proof.

4 Combinatorial concentration bound

We start the section by introducing some notation and collecting auxiliary inequalities. Then we formulate and prove Lemmas 1 and 2.

Introduce the number

$$\begin{aligned} \delta _2 = \min \left\{ \frac{1}{12c_g}, \frac{(c_r\Vert g\Vert _2^2/2^r\Vert g\Vert _r^r)^{1/(r-2)}}{1+4/\Vert g\Vert _2} \right\} , \end{aligned}$$
(94)

where \(c_g=1+\Vert g\Vert _r/\Vert g\Vert _2\) and \(c_r=(7/24)2^{-(r-1)}\). Denote

$$\begin{aligned} \rho ^*=1-\sup \{|\mathbf{E}e^{itg(X_1)}|: 2^{-1}\delta _2\le |t|\le N^{-\nu +1/2}\}. \end{aligned}$$

It follows from the identity \(\rho ^* = \rho (2^{-1}\sigma \delta _2,\, \sigma N^{-\nu +1/2})\) and the simple inequality \(a_1\le \delta _2/4\), see (35), that \(\rho ^* \ge \rho (2 \sigma \,a_1, \sigma N^{-\nu +1/2})\). Furthermore, it follows from (169) and the assumption \(\sigma _\mathbb T^2=1\) that \(1/2<\sigma <2\) for sufficiently large N (\(N>C_*\)). Therefore, \(\rho ^* \ge \rho (a_1,2N^{-\nu +1/2}) \ge \delta '\), where the last inequality follows from (36). We obtain, for \(N>C_*\),

$$\begin{aligned} 1-\sup \{|\mathbf{E}e^{itg(X_1)}|: 2^{-1}\delta _2\le |t|\le N^{-\nu +1/2}\} \ge \delta ', \end{aligned}$$
(95)

where the number \(\delta '\) depends on \(A_*,D_*,M_*,\nu _1, r,s,\delta \) only. In what follows we use the notation \(c_0=10\). Let \(L_0^r= \{y\in L^r: \int _\mathcal{X} y(x)P_X(dx)=0\}\) denote a subspace of \(L^r\). Observe that \(\mathbf{E}g(X_1)=0\) implies \(y^*(=p_g(y))\in L_0^r\), for every \(y\in L_0^r\).

4.1 Lemma 1

Let \(\psi _1,\dots , \psi _n\) denote independent random vectors with values in \(L_0^r\). For \(k=1,\dots , n\), write

$$\begin{aligned} \zeta _k = \psi _1+\dots +\psi _k \qquad {\text {and}} \qquad \zeta =\zeta _n. \end{aligned}$$

Let \(\overline{\psi }_i\) denote an independent copy of \(\psi _i\). Write \(\psi _i^*=p_g(\psi _i)\) and \(\overline{\psi }_i^*=p_g({\overline{\psi }}_i)\), see (31). Introduce random vectors

$$\begin{aligned} \tilde{\psi }_i = 2^{-1}(\psi _i-\overline{\psi }_i), \qquad \tilde{\psi }_i^* = 2^{-1}(\psi _i^*-\overline{\psi }_i^*), \qquad \hat{\psi }_i = 2^{-1}(\psi _i+\overline{\psi }_i). \end{aligned}$$

We shall assume that, for some \(c_A\ge c_D\ge c_B>0\),

$$\begin{aligned} n^{r/2}\mathbf{E}\Vert {\tilde{\psi }}_i\Vert _r^r\le c_A^r, \qquad c_B^2 \le n\, \mathbf{E}\Vert {\tilde{\psi }}^*_i\Vert _2^2 \le c_D^2, \end{aligned}$$
(96)

for every \(1\le i\le n\). Furthermore, denote \(\mu _i^2=\mathbf{E}\Vert {\tilde{\psi }}^*_i\Vert _2^2\) and \({\tilde{\kappa }}_i^{r-2}=\frac{8}{3}\frac{\mathbf{E}\Vert {\tilde{\psi }}_i\Vert _r^r}{\mu _i^r}\),

$$\begin{aligned} \mu _*=\min _{1\le i\le n}\mu _i, \qquad \kappa _*=\max _{1\le i\le n}{\tilde{\kappa }}_i. \end{aligned}$$
(97)

Observe that, by Hölder’s inequality and (32), we have \({\tilde{\kappa }}_i>1\), for \(i=1,\dots , n\).
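Indeed, by Lyapunov’s inequality and (32),

$$\begin{aligned} {\tilde{\kappa }}_i^{r-2} = \frac{8}{3}\frac{\mathbf{E}\Vert {\tilde{\psi }}_i\Vert _r^r}{\mu _i^r} \ge \frac{8}{3}\frac{\bigl (\mathbf{E}\Vert {\tilde{\psi }}_i\Vert _r^2\bigr )^{r/2}}{\mu _i^r} \ge \frac{8}{3}\frac{\bigl (\mathbf{E}\Vert {\tilde{\psi }}^*_i\Vert _2^2\bigr )^{r/2}}{\mu _i^r} = \frac{8}{3}, \end{aligned}$$

so that \({\tilde{\kappa }}_i\ge (8/3)^{1/(r-2)}>1\).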

Lemma 1

Let \(4<r\le 5\) and \(0<\nu <10^{-2}(r-4)\). Assume that \(n\ge N^{5\nu }\). Suppose that

$$\begin{aligned} \kappa _*^4\le \frac{9}{256}\frac{n}{\ln N}. \end{aligned}$$
(98)

Assume that (95), (96) as well as (106), (112) (below) hold. There exists a constant \(c_*>0\), which depends on \(r,s,\nu , A_*, D_*, M_*, \delta \) only, such that for every \(T_0\) satisfying (38) we have

$$\begin{aligned} \delta _{\varepsilon _*}(f+\zeta , I(T_0)) \le c_*(c_D/c_B)^{1/2}\kappa _*^{1/2}{\varepsilon _*}^{(r-2)/(2r)}+c_*N^{-2}, \end{aligned}$$
(99)

for an arbitrary non-random element \(f\in L_0^r\). Here \( {\varepsilon _*}=\frac{\mu _*}{2c_0}\frac{|T_0|}{\sqrt{N}}\). The function \(\delta _s(\cdot , I(T_0))\) is defined in (40).

In Step A.2.3 of Sect. 3 we apply this lemma to the random vector \(\zeta =\psi _1+\dots +\psi _h\), see (83). In Step B of Sect. 3 we apply this lemma to the random vector \(S'\), see (91).

Proof

We shall consider the case where \(T_0>0\). For \(T_0<0\) the proof is the same. We can assume without loss of generality that \(c_0<N^{\nu }\). Denote \(X=\Vert \tilde{\psi }_i^*\Vert _2\), \(Y=\Vert \tilde{\psi }_i\Vert _r\), \(\mu =\mu _i\) and \(\kappa ={\tilde{\kappa }}_i\). By (32), we have \(Y\ge X\).

Step 1. Here we establish the bound (100) below for the probability \(\mathbf{P}\{B_i\}\), where

$$\begin{aligned} B_i=\{X\ge \mu /2,\, Y<\kappa \mu \}. \end{aligned}$$

Write

$$\begin{aligned}&\mu ^2=\mathbf{E}X^2 = \mathbf{E}X^2I_A+\mathbf{E}X^2I_{B_i}+\mathbf{E}X^2I_D, \\&A =\{X<\mu /2\}, \qquad D=\{X\ge \mu /2,\, Y\ge \kappa \mu \}. \end{aligned}$$

Substitution of the bounds

$$\begin{aligned} \mathbf{E}X^2I_A&\le \frac{\mu ^2}{4}, \\ \mathbf{E}X^2I_{B_i}&\le \mathbf{E}Y^2I_{B_i}\le (\kappa \mu )^2\mathbf{P}\{B_i\}, \\ \mathbf{E}X^2I_D&\le \mathbf{E}Y^2I_{\{Y\ge \kappa \mu \}} \le (\kappa \mu )^{2-r}\mathbf{E}Y^r \end{aligned}$$

gives

$$\begin{aligned} \mu ^2\le 4^{-1}\mu ^2+\kappa ^2\mu ^2\mathbf{P}\{B_i\}+(\kappa \mu )^{2-r}\mathbf{E}Y^r. \end{aligned}$$

Finally, invoking the identity \(\kappa ^{r-2}=(8/3)\mathbf{E}Y^r/\mu ^r\), so that \(4\mathbf{E}Y^r/(3\mu ^r\kappa ^{r-2})=1/2\), we obtain

$$\begin{aligned} \mathbf{P}\{B_i\} \ge \frac{3}{4\kappa ^2}-\frac{\mathbf{E}Y^r}{(\kappa \mu )^{r}} = \frac{3}{4\kappa ^2}\bigl (1-\frac{4\mathbf{E}Y^r}{3\mu ^r\kappa ^{r-2}}\bigr ) = \frac{3}{8\kappa ^2} \ge \frac{3}{8\kappa _*^2}=:p. \end{aligned}$$
(100)

Introduce the (random) set \(J=\{i:\, B_i\ {\text {occurs}}\}\subset \{1,\dots , n\}\). Hoeffding’s inequality applied to the random variable \(|J|=\mathbb I_{B_1}+\dots +\mathbb I_{B_n}\) shows

$$\begin{aligned} \mathbf{P}\{|J|\le \rho n\} \le \exp \{-np^2/2\} \le N^{-2}, \qquad \rho :=p/2=(3/16)\kappa _*^{-2}. \end{aligned}$$
(101)

In the last step we invoke (98) and use (100).
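Explicitly, \(p=(3/8)\kappa _*^{-2}\) and (98) give

$$\begin{aligned} \frac{np^2}{2} = \frac{9}{128}\,\frac{n}{\kappa _*^{4}} \ge \frac{9}{128}\cdot \frac{256}{9}\,\ln N = 2\ln N, \end{aligned}$$

so that \(\exp \{-np^2/2\}\le N^{-2}\).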

Step 2. Here we introduce randomization. Note that for any \(\alpha _i\in \{-1, +1\}\), \(i=1,\dots , n\), the distributions of the random vectors

$$\begin{aligned} (\psi _1,\dots , \psi _n) \qquad {\text { and}} \qquad \bigl ( \alpha _1\tilde{\psi }_1+\hat{\psi }_1, \dots , \alpha _n\tilde{\psi }_n+\hat{\psi }_n \bigr ) \end{aligned}$$

coincide. Therefore, denoting

$$\begin{aligned} {\tilde{\zeta }}_n=\alpha _1\tilde{\psi }_1+\dots +\alpha _n\tilde{\psi }_n, \qquad {\hat{\zeta }}_n={\hat{\psi }}_1+\dots +{\hat{\psi }}_n, \end{aligned}$$

we have for \(s>0\),

$$\begin{aligned} \delta _{s}(f+ \zeta , I(T_0)) = \delta _{s}(f+{\tilde{\zeta }}_n+{\hat{\zeta }}_n, I(T_0)), \end{aligned}$$

for every choice of \(\alpha _1,\dots , \alpha _n\). From now on let \(\alpha _1,\dots , \alpha _n\) denote a sequence of independent identically distributed Bernoulli random variables independent of \({\tilde{\psi }}_i, {\hat{\psi }}_i\), \(1\le i\le n\), and with probabilities \(\mathbf{P}\{\alpha _1=1\}=\mathbf{P}\{\alpha _1=-1\}=1/2\). Denoting by \(\mathbf{E}_\alpha \) the expectation with respect to the sequence \(\alpha _1,\dots , \alpha _n\) we obtain

$$\begin{aligned} \delta _{s}(f+\zeta ,I(T_0)) = \mathbf{E}_{\alpha }\delta _{s}(f+{\tilde{\zeta }}_n+{\hat{\zeta }}_n,I(T_0)). \end{aligned}$$
(102)

We are going to condition on \({\tilde{\psi }}_i\) and \({\hat{\psi }}_i\), \(1\le i\le n\), while taking expectations with respect to \(\alpha _1,\dots , \alpha _n\). It follows from (101), (102) and the fact that the random variable |J| does not depend on \(\alpha _1,\dots , \alpha _n\) that

$$\begin{aligned} \delta _{s}(f+\zeta , I(T_0)) \le \mathbf{E}\mathbb I_{\{|J|\ge \rho n\}} \gamma _s({\tilde{\psi }}_i,\, {\hat{\psi }}_i, \, 1\le i\le n)+N^{-2}, \end{aligned}$$
(103)

where

$$\begin{aligned} \gamma _s( {\tilde{\psi }}_i,\, {\hat{\psi }}_i, \, 1\le i\le n) = \mathbf{E}_{\alpha }\mathbb I_{\{|J|\ge \rho n\}} \mathbb I_{ \{ v^2(f+{\tilde{\zeta }}_n+{\hat{\zeta }}_n) > 1-s^2 \} } \mathbb I_{ \{ \Vert f+{\tilde{\zeta }}_n+{\hat{\zeta }}_n\Vert _r\le N^{\nu } \} } \end{aligned}$$

denotes the conditional expectation given \({\tilde{\psi }}_i,\, {\hat{\psi }}_i\), \(1\le i\le n\). Note that (99) is a consequence of (103) and of the bound

$$\begin{aligned} \gamma _{\varepsilon _*}({\tilde{\psi }}_i,\, {\hat{\psi }}_i, \, 1\le i\le n) \le c_*\kappa _*^{1/2}\varepsilon _*^{(r-2)/(2r)}. \end{aligned}$$
(104)

Let us prove this bound. Introduce the integers

$$\begin{aligned} n_0=l-1, \qquad l=\lfloor \delta _2\varkappa ^{-1}\varepsilon _*^{-(r-2)/r}\rfloor , \qquad \varkappa =2c_0(c_D/c_B)\kappa _*. \end{aligned}$$

Let us show that

$$\begin{aligned} n_0\le \rho n. \end{aligned}$$
(105)

It follows from the inequalities

$$\begin{aligned} \varepsilon _*^{-1}\le 2\frac{c_0}{c_B}N^{ \nu }n^{1/2}, \quad N^{\nu (r-2)/r}\le N^{\nu }\le n^{1/r}, \quad \delta _2\le \frac{3}{16}\bigl (\frac{3}{8}\bigr )^{1/(r-2)} \end{aligned}$$

that

$$\begin{aligned} l \le \frac{\delta _2}{c_D}\frac{1}{\kappa _*} \left( \frac{c_B}{2c_0}\right) ^{2/r} \left( N^{\nu }n^{1/2}\right) ^{(r-2)/r} \le \frac{3}{16}\frac{1}{\kappa _*}\frac{c_B^{2/r}}{c_D}n^{1/2}. \end{aligned}$$

Note that (98) implies \(\kappa _*\le n^{1/4}\). Therefore, the inequality

$$\begin{aligned} c_B^{2/r}c_D^{-1}\le n^{1/4} \end{aligned}$$
(106)

implies \(l\le (3/16)\kappa _*^{-2}n=\rho n\). We obtain (105).

Given \({\tilde{\psi }}_i,\, {\hat{\psi }}_i\), \(1\le i\le n\), consider the corresponding set J, say \(J=\{i_1,\dots , i_k\}\). Assume that \(k\ge \rho n\). From the inequality \(\rho n\ge n_0\), see (105), it follows that we can choose a subset \(J'\subset J\) of size \(|J'|=n_0\). Split

$$\begin{aligned} {\tilde{\zeta }}_n = \sum _{i\in J'}\alpha _i{\tilde{\psi }}_i + \sum _{i\in J\setminus J'}\alpha _i{\tilde{\psi }}_i =:\zeta _*+\zeta ' \end{aligned}$$

and denote \(f+\zeta '+{\hat{\zeta }}_n=f_*\). Note that \(f_*\in L_0^r\) almost surely. Let

$$\begin{aligned} {\tilde{\delta }} = \mathbf{E}' \mathbb I_{ \{ v^2(f_*+\zeta _*) > 1-\varepsilon _*^2 \}} \mathbb I_{ \{ \Vert f_*+\zeta _*\Vert _r\le N^{\nu } \}}, \end{aligned}$$

where \(\mathbf{E}'\) denotes the conditional expectation given all the random variables except \(\{\alpha _i,\, i\in J'\}\). The bound (104) would follow if we show that

$$\begin{aligned} {\tilde{\delta }} \le c_*\kappa _*^{1/2}\varepsilon _*^{(r-2)/(2r)}. \end{aligned}$$
(107)

Step 3. Here we prove (107). Note that for \(j\in J'\) the vectors

$$\begin{aligned} x_j=T_0N^{-1/2}{\tilde{\psi }}_j \qquad {\text { and}} \qquad x_j^*=p_g(x_j)=T_0N^{-1/2}{\tilde{\psi }}_j^* \end{aligned}$$

satisfy

$$\begin{aligned} \Vert x_j^*\Vert _2\ge c_0\varepsilon _*, \qquad \Vert x_j\Vert _r\le \varkappa \varepsilon _*, \qquad \varkappa =2c_0(c_D/c_B)\kappa _*. \end{aligned}$$
(108)

Given \(A\subset J'\) denote

$$\begin{aligned} x_A=\sum _{i\in A}x_i-\sum _{i\in J'\setminus A}x_i, \qquad x_A^*=p_g(x_A). \end{aligned}$$

We are going to apply Kleitman’s theorem on symmetric partitions (see, e.g. the proof of Theorem 4.2, Bollobás [15]) to the sequence \(\{x_j^*,\, j\in J'\}\) in \(L^2\). Since for \(j\in J'\) we have \(\Vert x_j^*\Vert _2\ge c_0\varepsilon _*\), it follows from Kleitman’s theorem that the collection \(\mathcal{P}(J')\) of all subsets of \(J'\) splits into non-intersecting non-empty classes \(\mathcal{P}(J')=\mathcal{D}_1\cup \cdots \cup \mathcal{D}_s\), such that the corresponding sets of linear combinations \( V_t = \bigl \{ x^*_A,\, A\in \mathcal{D}_t \bigr \}\), \(t=1,2,\dots , s\), are sparse, i.e., given t, for \(A,A'\in \mathcal{D}_t\) and \(A\not =A'\) we have

$$\begin{aligned} \Vert x_A^*-x_{A'}^*\Vert _2\ge c_0\varepsilon _*. \end{aligned}$$
(109)

Furthermore, the number of classes s is bounded from above by \(\left( {\begin{array}{c}n_0\\ \lfloor n_0/2\rfloor \end{array}}\right) \).

Next, using Lemma 2 we shall show that given \(f_*\) the class \(\mathcal{D}_t\) may contain at most one element \(A\in \mathcal{D}_t\) such that

$$\begin{aligned} v^2(f_*+{\tilde{x}}_A)> 1-\varepsilon _*^2, \qquad \Vert f_*+{\tilde{x}}_A\Vert _r\le N^{\nu }, \qquad {\tilde{x}}_A:=N^{1/2}T_0^{-1}x_A. \end{aligned}$$
(110)

This means that there are at most \(\left( {\begin{array}{c}n_0\\ \lfloor n_0/2\rfloor \end{array}}\right) \) different subsets \(A\subset J'\) for which (110) holds. This implies (107):

$$\begin{aligned} {\tilde{\delta }} \le 2^{-n_0}\left( {\begin{array}{c}n_0\\ \lfloor n_0/2\rfloor \end{array}}\right) \le cn_0^{-1/2} = c\delta _2^{-1/2} \varkappa ^{1/2}\varepsilon _*^{\frac{r-2}{2r}}. \end{aligned}$$
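Here the second inequality is the standard consequence of Stirling’s formula,

$$\begin{aligned} 2^{-n}\left( {\begin{array}{c}n\\ \lfloor n/2\rfloor \end{array}}\right) \le \sqrt{\frac{2}{\pi n}}, \qquad n\ge 1, \end{aligned}$$

applied with \(n=n_0\), while the final equality holds up to a constant factor by the definition of \(n_0=l-1\).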

Finally, (99) follows from (103), (104), (107).

Given \(f_*\in L_0^r\), let us show that no two distinct \(A,\,A'\in \mathcal{D}_t\) can both satisfy (110). Fix \(A,A'\in \mathcal{D}_t\) with \(A\not =A'\). We have, by (108) and the choice of \(n_0\),

$$\begin{aligned} \Vert x_A-x_{A'}\Vert _r \le 2\sum _{i\in J'}\Vert x_i\Vert _r \le 2n_0\varkappa \varepsilon _*<2\delta _2\varepsilon _*^{2/r}. \end{aligned}$$
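Here we used that \(n_0<l\le \delta _2\varkappa ^{-1}\varepsilon _*^{-(r-2)/r}\), so that \(2n_0\varkappa \varepsilon _*<2\delta _2\varepsilon _*^{1-(r-2)/r}=2\delta _2\varepsilon _*^{2/r}\).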

Denoting \(S_{A}=f_*+{\tilde{x}}_A\) and \(S_{A'}=f_*+{\tilde{x}}_{A'}\) we obtain

$$\begin{aligned} \Vert S_A-S_{A'}\Vert _r = N^{1/2}T_0^{-1}\Vert x_A-x_{A'}\Vert _r \le 2\delta _2\varepsilon _*^{2/r}N^{1/2}T_0^{-1}. \end{aligned}$$
(111)

Assume that \(S_A\) and \(S_{A'}\) satisfy the second inequality of (110), i.e., \(\Vert S_A\Vert _r\le N^{\nu }\) and \(\Vert S_{A'}\Vert _r\le N^{\nu }\). We are going to apply Lemma 2 to the vectors \(S_A\) and \(S_{A'}\). In order to check the conditions of Lemma 2 note that (114) and (115) are verified by (108), (109) and (111). Furthermore, the inequalities \(c_0<N^{\nu }\) and

$$\begin{aligned} c_B\ge 2N^{4\nu }(n/N)^{1/2}, \end{aligned}$$
(112)

imply \( N^{2\nu -1/2}\le \varepsilon _*\). Finally, we can assume without loss of generality that \(\varepsilon _*\le c_*'\), where \(c_*':=\min \bigl \{( \delta '/4)^{r/2}, (A_*^{1/2}/6)^{r/2}\bigr \}\). Otherwise (99) follows from the trivial inequalities

$$\begin{aligned} \delta _{\varepsilon _*} \le 1 \le (\varepsilon _*/c_*')^{(r-2)/2r} \le c_*\varepsilon _*^{(r-2)/2r} \end{aligned}$$

and the inequality \(\kappa _*>1\).

Now Lemma 2 implies \(\min \{v^2(S_A), \, v^2(S_{A'})\}\le 1-\varepsilon _*^2\) thus completing the proof of Lemma 1. \(\square \)

4.2 Lemma 2

Here we formulate and prove Lemma 2. Let us introduce first some notation. Given \(y\in L^r(=L^r(\mathcal X,P_X))\) define the symmetrization \(y_{s}\in L^r(\mathcal X\times \mathcal X,P_X\times P_X)\) by \(y_{s}(x,x')=y(x)-y(x')\), for \(x,x'\in \mathcal X\). In what follows \(X_1,X_2\) denote independent random variables with values in \(\mathcal X\) and with the common distribution \(P_X\). By \(\mathbf{E}\) we denote the expectation taken with respect to \(P_X\). For \(h\in L^r\) we write

$$\begin{aligned} \mathbf{E}h=\mathbf{E}h(X_1)=\int _\mathcal{X} h(x)P_X(dx), \qquad \mathbf{E}e^{ih}=\mathbf{E}e^{ih(X_1)}=\int _\mathcal{X} e^{ih(x)}P_X(dx). \end{aligned}$$

Furthermore, for \(2\le p\le r\), denote

$$\begin{aligned} \Vert y_s\Vert _p^p=\mathbf{E}|y(X_1)-y(X_2)|^p, \qquad \Vert y\Vert _p^p=\mathbf{E}|y(X_1)|^p. \end{aligned}$$

Note that for \(y\in L_0^r\) we have \(y^*(=p_g(y))\in L_0^r\) and, therefore (expanding the square and using \(\mathbf{E}y^*(X_1)=0\)),

$$\begin{aligned} \mathbf{E}|y^*(X_1)-y^*(X_2)|^2=2\mathbf{E}|y^*(X_1)|^2. \end{aligned}$$
(113)

Let \(y_1,\dots , y_k, f\) be non-random vectors in \(L^r\). We shall assume that these vectors belong to the linear subspace \(L_0^r\). Given non-random vectors \(\alpha =\{\alpha _i\}_{i=1}^k\) and \(\alpha '=\{\alpha '_i\}_{i=1}^k\), with \(\alpha _i, \alpha '_i\in \{-1, +1\}\), denote

$$\begin{aligned} S_{\alpha }=f+\sum _{i=1}^k\alpha _iy_i, \qquad S_{\alpha '}=f+\sum _{i=1}^k\alpha '_iy_i. \end{aligned}$$

Lemma 2

Let \(\varkappa >0\). Assume that (95) holds and suppose that

$$\begin{aligned} N^{\nu -1/2} \le \varepsilon \le \min \bigl \{( \delta '/4)^{r/2}, (\Vert g\Vert _2/6)^{r/2} \bigr \}. \end{aligned}$$

Given \(T_0\), satisfying (38), write \(T^*=N^{1/2}T_0^{-1}\) and assume that

$$\begin{aligned} \Vert y_j^*\Vert _2>c_0T^*\varepsilon , \quad \Vert y_j\Vert _r\le \varkappa \, T^*\varepsilon , \quad j=1,\dots , k. \end{aligned}$$
(114)

Suppose that \(\Vert S_{\alpha }\Vert _r\le N^{\nu }\) and \(\Vert S_{\alpha '}\Vert _r\le N^{\nu }\) and

$$\begin{aligned} \Vert S^*_{\alpha }-S^*_{\alpha '}\Vert _2\ge c_0T^*\varepsilon , \qquad \Vert S_{\alpha }-S_{\alpha '}\Vert _r \le 2\delta _2T^*\varepsilon ^{2/r}. \end{aligned}$$
(115)

Then \(\min \{v^2(S_{\alpha }),\, v^2(S_{\alpha '})\}\le 1-\varepsilon ^2\).

Recall that the functionals \(v(\cdot ),\tau (\cdot )\), \(u_t(\cdot )\) and the interval \(I=I(T_0)\) used in the proof below are defined in (39).

Proof

Note that \(\delta _1<1/10\) and \(\delta _2<1/12\). In particular, we have

$$\begin{aligned} 9/10\le 1-\delta _1\le |s/T_0|\le 1+\delta _1\le 11/10, \quad {\text {for}} \quad |s-T_0|<\delta _1N^{-\nu +1/2}. \end{aligned}$$
(116)

Step 1. Assume that the inequality \(\min \{v^2(S_{\alpha }),\, v^2(S_{\alpha '})\}\le 1-\varepsilon ^2\) fails. Then for some \(s,t\in I\) we have

$$\begin{aligned} 1-|u_t(S_{\alpha })|^2< \varepsilon ^2, \qquad 1-|u_s(S_{\alpha '})|^2< \varepsilon ^2, \end{aligned}$$
(117)

see (39). Fix these st and denote

$$\begin{aligned} {\tilde{X}}=s(g+N^{-1/2}S_{\alpha '})-t(g+N^{-1/2}S_{\alpha }). \end{aligned}$$

We are going to apply the inequality (256),

$$\begin{aligned} 1-|\mathbf{E}e^{i(Y+Z)}|^2 \ge 2^{-1}(1-|\mathbf{E}e^{iZ}|^2) - (1-|\mathbf{E}e^{iY}|^2) \end{aligned}$$

to \(Z=-{\tilde{X}}\) and \(Y=s(g+N^{-1/2}S_{\alpha '})\). It follows from this inequality and (117) that

$$\begin{aligned} \varepsilon ^2 > 1-|u_t(S_{\alpha })|^2 = 1-|\mathbf{E}e^{i(Y+Z)}|^2 \ge 2^{-1}(1-|\mathbf{E}e^{-i{\tilde{X}}}|^2)-\varepsilon ^2. \end{aligned}$$

In view of the identity \(|\mathbf{E}e^{-i{\tilde{X}}}|=|\mathbf{E}e^{i{\tilde{X}}}|\) we have

$$\begin{aligned} 1-|\mathbf{E}e^{i{\tilde{X}}}|^2< 4\varepsilon ^2. \end{aligned}$$
(118)

Step 2. Here we shall show that (118) contradicts the second inequality of (115). Firstly, we collect some auxiliary inequalities. Write the decomposition (31) for \(S_{\alpha }\) and \(S_{\alpha '}\),

$$\begin{aligned} S_{\alpha }=a\,g+S_{\alpha }^*, \qquad S_{\alpha '}=a'\,g+S_{\alpha '}^*. \end{aligned}$$
(119)

Decompose

$$\begin{aligned}&{\tilde{X}} =vg+h, \\&v=(s-t)(1+a\,N^{-1/2})+(a'-a)sN^{-1/2}, \\&h= (s-t)N^{-1/2}S_{\alpha }^*+sN^{-1/2}(S_{\alpha '}^*-S_{\alpha }^*), \end{aligned}$$

where \(v\in {\mathbb R}\) and where \(h\in L^r\) is \(L^2\)-orthogonal to g. An application of (34) to \(S_{\alpha }^*\) and \(S_{\alpha '}^*-S_{\alpha }^*\) gives

$$\begin{aligned} \Vert h\Vert _r \le c_g N^{-1/2} \bigl ( |s|\, \Vert S_{\alpha '}-S_{\alpha }\Vert _r+|s-t|\,\Vert S_{\alpha }\Vert _r \bigr ). \end{aligned}$$
(120)

Furthermore, it follows from the simple inequality

$$\begin{aligned} \Vert x+y\Vert _2^2\ge 2^{-1}\Vert x\Vert _2^2-\Vert y\Vert _2^2 \end{aligned}$$

(a consequence of \(\Vert x\Vert _2^2\le 2\Vert x+y\Vert _2^2+2\Vert y\Vert _2^2\)) that

$$\begin{aligned} \Vert h\Vert _2^2 \ge 2^{-1}s^{2}N^{-1}\Vert S_{\alpha '}^*-S_{\alpha }^*\Vert _2^2 - (s-t)^2N^{-1}\Vert S_{\alpha }^*\Vert _2^2. \end{aligned}$$
(121)

Note that for a and \(a'\) defined in (119) we obtain from (33) and (115) that

$$\begin{aligned}&|a| \le \Vert S_{\alpha }\Vert _r \Vert g\Vert _2^{-1} \le N^{\nu }\Vert g\Vert _2^{-1}, \end{aligned}$$
(122)
$$\begin{aligned}&|a'-a| \le \Vert S_{\alpha '}-S_{\alpha }\Vert _r \Vert g\Vert _2^{-1} \le 2 \delta _2 \varepsilon ^{2/r} N^{1/2}T_0^{-1}\Vert g\Vert _2^{-1}. \end{aligned}$$
(123)

Step 4.2.1. Consider the case where \(|s-t|<\delta _2\). Invoking the inequalities \(\Vert S_{\alpha }\Vert _r\le N^{\nu }\) and (115) we obtain from (120) that

$$\begin{aligned} \Vert h\Vert _r^r \le (4c_g)^r \delta _2^r\, \Bigl ( N^{\nu r-r/2}+\varepsilon ^2|s|^rT_0^{-r} \Bigr ). \end{aligned}$$

Furthermore, using (116), (94), and \(N^{\nu -1/2}\le \varepsilon \), we obtain for \(4\le r\le 5\)

$$\begin{aligned} \Vert h\Vert _r^r\le 3^{-r}(\varepsilon ^r+\varepsilon ^2(11/10)^r) \le 3^{1-r}\varepsilon ^2. \end{aligned}$$
(124)

Note that (32) implies \(\Vert S_{\alpha }^*\Vert _2\le \Vert S_{\alpha }\Vert _r\le N^{\nu }\). This inequality in combination with (115) and (121) gives

$$\begin{aligned} \Vert h\Vert _2^2 \ge 2^{-1}(s/T_0)^2 c_0^2 \varepsilon ^2-\delta _2^2N^{2\nu -1}. \end{aligned}$$

Invoking (116) and using \(c_0\ge 10\), \(\delta _2<12^{-1}\), and \(N^{\nu -1/2}\le \varepsilon \) we obtain

$$\begin{aligned} \Vert h\Vert _2^2\ge (4/10)c_0^2\varepsilon ^2. \end{aligned}$$
(125)
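Indeed, (116) gives \((s/T_0)^2\ge (9/10)^2\), while \(\delta _2^2N^{2\nu -1}\le \varepsilon ^2/144\); hence \(\Vert h\Vert _2^2\ge \bigl (\tfrac{81}{200}c_0^2-\tfrac{1}{144}\bigr )\varepsilon ^2\ge (4/10)c_0^2\varepsilon ^2\), where the last inequality uses \(c_0^2\ge 200/144\).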

Now we are going to apply Lemma 12, statement a), to \({\tilde{X}}=vg+h\). For this purpose we verify the conditions of this lemma. Firstly, note that (125), (113) imply \(\Vert h_s\Vert _2^2\ge (8/10)c_0^2\varepsilon ^2\). Furthermore, it follows from the simple inequality \(\mathbf{E}|h(X_1)-h(X_2)|^r\le 2^r\mathbf{E}|h(X_1)|^r\) and (124) that \(\Vert h_s\Vert _r^r\le 3(2/3)^r\varepsilon ^2\). Therefore, we obtain, for \(4\le r\le 5\),

$$\begin{aligned} \Vert h_s\Vert _r^r \le \frac{6}{10}\varepsilon ^2 \le c_0^{-2}\Vert h_s\Vert _2^2\le c_r \Vert h_s\Vert _2^2, \qquad c_r=(7/24)2^{-(r-1)}. \end{aligned}$$

Furthermore, the inequalities (122), (123) and (116) imply

$$\begin{aligned} |v| \le \delta _2+\delta _2\Vert g\Vert _2^{-1}(N^{\nu -1/2} + 2\varepsilon ^{2/r}(11/10)) \le \delta _2(1+4\Vert g\Vert _2^{-1}), \end{aligned}$$

for \(N^{\nu -1/2}\le \varepsilon \le 1\). Invoking (94) and using the inequality \(\Vert g_s\Vert _r^r\le 2^r\Vert g\Vert _r^r\) and the identity \(\Vert g_s\Vert _2^2=2\Vert g\Vert _2^2\) we obtain

$$\begin{aligned} |v|^{r-2} \le \frac{c_r}{2^r}\frac{\Vert g\Vert _2^2}{\Vert g\Vert _r^{r}} \le \frac{c_r}{2^r}\frac{2^{-1}\Vert g_s\Vert _2^2}{2^{-r}\Vert g_s\Vert _r^r} \le \frac{c_r}{2}\frac{\Vert g_s\Vert _2^2}{\Vert g_s\Vert _r^r} \end{aligned}$$

as required by Lemma 12a). This lemma implies

$$\begin{aligned} 1-|\mathbf{E}e^{i{\tilde{X}}}|^2 \ge 6^{-1}\Vert h_s\Vert _2^2=3^{-1}\Vert h\Vert _2^2. \end{aligned}$$

In the last step we used (113). Now (125), for \(c_0\ge 10\), contradicts (118).

Step 4.2.2. Consider the case where \(\delta _2<|s-t|\le \delta _1 N^{-\nu +1/2}\). It follows from (120), (115) and (116) that

$$\begin{aligned} \mathbf{E}|h| \le \Vert h\Vert _r\le & {} c_g \bigl (2\delta _2\varepsilon ^{2/r}|s/T_0|+\delta _1\bigr ) \nonumber \\\le & {} c_g(\delta _1+3\delta _2\varepsilon ^{2/r}) \le c_g\delta _1+\varepsilon ^{2/r}. \end{aligned}$$
(126)

In the last step we used \(\delta _2<1/3\). From (122), (123) and (116), we obtain for \(\delta _2\le |s-t|\) and \(N^{\nu -1/2}\le \varepsilon \),

$$\begin{aligned} |v|&\ge \delta _2 (1-N^{\nu -1/2}\Vert g\Vert _2^{-1}) - 2\delta _2\varepsilon ^{2/r}|s/T_0|\Vert g\Vert _2^{-1} \\&\ge \delta _2(1-\Vert g\Vert _2^{-1}(\varepsilon +\varepsilon ^{2/r}(22/10))) \\&\ge \delta _2(1-3\varepsilon ^{2/r}\Vert g\Vert _2^{-1})\ge \delta _2/2, \end{aligned}$$

provided that \(\varepsilon ^{2/r}<\Vert g\Vert _2/6\). Similarly, using in addition, \(\delta _1, \delta _2<1/4\) and \(\varepsilon <\Vert g\Vert _2\), we obtain, for \(|s-t|\le \delta _1N^{-\nu +1/2}\),

$$\begin{aligned} |v|&\le |s-t|(1+N^{\nu -1/2}\Vert g\Vert _2^{-1}) + 2\delta _2\varepsilon ^{2/r}|s/T_0|\Vert g\Vert _2^{-1} \\&\le |s-t|(1+\varepsilon \Vert g\Vert _2^{-1})+(22/10)\delta _2\varepsilon ^{2/r}\Vert g\Vert _2^{-1} \\&\le 2\,|s-t|+1 \le N^{-\nu +1/2}. \end{aligned}$$

It follows from these inequalities, see (95), that

$$\begin{aligned} 1-|\mathbf{E}e^{i{\tilde{X}}}|^2 \ge 1-|\mathbf{E}e^{i{\tilde{X}}}| \ge 1-|\mathbf{E}e^{ivg}|-\mathbf{E}|h| \ge \delta '-\mathbf{E}|h|. \end{aligned}$$

Finally, invoking (126) and (37), we get

$$\begin{aligned} 1-|\mathbf{E}e^{i{\tilde{X}}}|^2\ge \delta ' -c_g\delta _1-\varepsilon ^{2/r} \ge \delta '/2 > 4\varepsilon ^2. \end{aligned}$$

Once again we obtain a contradiction to (118), thus completing the proof.\(\square \)

5 Expansions

Here we prove the bound

$$\begin{aligned} \int _{|t|\le t_1}\Bigl |\mathbf{E}e^{it{\tilde{\mathbb T}}}-{\hat{G}}(t)\Bigr |\frac{dt}{|t|} \le c_*N^{-1-\nu }, \end{aligned}$$
(127)

where \(t_1=N^{1/2}/10^3\beta _3\). For the definition of \({\tilde{\mathbb T}}\) and \({\hat{G}}\) see Sect. 2.4. Here and below \(c_*\) denotes a constant depending on \(A_*,M_*,D_*,r,s,\nu _1\) only. We prove (127) for sufficiently large N, that is, we shall assume that \(N>C_*\), where \(C_*\) is a number depending on \(A_*,M_*,D_*,r,s,\nu _1\) only. Note that for \(N<C_*\), the bound (127) becomes trivial, since in this case the integral is bounded by a constant.

Let us first introduce some notation. Denote \(\Omega _m=\{1,\dots , m\}\). For \(A\subset \Omega _N\) write \({\mathbb U}_1(A)=\sum _{j\in A}g_1(X_j)\). Given complex-valued functions \(f, h\) we write \(f\prec \mathcal R\) if

$$\begin{aligned} \int _{|t|\le t_1}|t^{-1}f(t)|dt\le c_*N^{-1-\nu } \end{aligned}$$

and write \(f\sim h\) if \(f-h\prec \mathcal R\). In particular, (127) can be written in short as \(\mathbf{E}e^{it{\tilde{\mathbb T}}}\sim {\hat{G}}(t)\).

In order to prove (127) we show that

$$\begin{aligned} \mathbf{E}e^{it{\tilde{\mathbb T}}}\sim \mathbf{E}e^{it{\mathbb T}} \qquad {\text {and}} \qquad \mathbf{E}e^{it{\mathbb T}}\sim {\hat{G}}(t). \end{aligned}$$
(128)

In what follows we use the notation of Sect. 2. We denote \(\alpha (t)=\mathbf{E}e^{itg(X_1)}\). We assume that (16) holds.

5.1 Proof of the first relation of (128)

We have, see (19),

$$\begin{aligned} {\mathbb T}={\tilde{\mathbb T}}+{\tilde{\Lambda }}_1+{\tilde{\Lambda }}_2, \qquad {\tilde{\Lambda }}_1=\Lambda _1+\Lambda _4, \qquad {\tilde{\Lambda }}_2=\Lambda _2+\Lambda _3+\Lambda _5, \end{aligned}$$

where the random variables \(\Lambda _j\) are introduced in Sect. 2.4. We shall show that

$$\begin{aligned} \mathbf{E}e^{it{\tilde{\mathbb T}}} \sim \mathbf{E}e^{it({\tilde{\mathbb T}}+{\tilde{\Lambda }}_1)} \qquad {\text {and}} \qquad \mathbf{E}e^{it({\tilde{\mathbb T}}+{\tilde{\Lambda }}_1)} \sim \mathbf{E}e^{it {\mathbb T}}. \end{aligned}$$
(129)

The second relation follows from the moment bounds of Lemma 5 via Taylor expansion. We have

$$\begin{aligned} \mathbf{E}e^{it{\mathbb T}} = \mathbf{E}e^{it({\tilde{\mathbb T}}+{\tilde{\Lambda }}_1)} +R, \qquad |R|\le |t|\mathbf{E}|{\tilde{\Lambda }}_2|. \end{aligned}$$

By Lyapunov’s inequality,

$$\begin{aligned} \mathbf{E}|{\tilde{\Lambda }}_2| \le (\mathbf{E}\Lambda _2^2)^{1/2} + (\mathbf{E}\Lambda _3^2)^{1/2} + (\mathbf{E}\Lambda _5^2)^{1/2}. \end{aligned}$$

Invoking the moment bounds of Lemma 5 we obtain \(|t|\mathbf{E}|{\tilde{\Lambda }}_2|\prec \mathcal R\), thus proving the second part of (129).

In order to prove the first part we combine Taylor’s expansion with bounds for characteristic functions. Expanding the exponent we obtain

$$\begin{aligned} \mathbf{E}e^{it({\tilde{\mathbb T}}+{\tilde{\Lambda }}_1)} = \mathbf{E}e^{it{\tilde{\mathbb T}}} + it\mathbf{E}e^{it{\tilde{\mathbb T}}}{\tilde{\Lambda }}_1 + R, \qquad |R|\le t^2\mathbf{E}|{\tilde{\Lambda }}_1|^2. \end{aligned}$$

Invoking the identities

$$\begin{aligned} \mathbf{E}\Lambda _1^2=\left( {\begin{array}{c}m\\ 2\end{array}}\right) \frac{\gamma _2}{N^3}, \qquad \mathbf{E}\Lambda _4^2=m \left( {\begin{array}{c}N-m\\ 2\end{array}}\right) \frac{\zeta _2}{N^5} \end{aligned}$$
(130)

we obtain, using \(\gamma _2<c_*\) and \(\zeta _2<c_*\) (see (5)) together with \(m\le N^{1/12}\), that \(R\prec \mathcal R\).
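In more detail, this step may be sketched as follows (it uses \(t_1^2\le c_*N\), which holds because \(\beta _3\) is bounded away from zero by Lyapunov’s inequality, together with \(m\le N^{1/12}\) and the smallness of \(\nu \)):

$$\begin{aligned} \int _{|t|\le t_1}\frac{t^2\,\mathbf{E}|{\tilde{\Lambda }}_1|^2}{|t|}\, dt = t_1^2\,\mathbf{E}|{\tilde{\Lambda }}_1|^2 \le c_*N\bigl (2\mathbf{E}\Lambda _1^2+2\mathbf{E}\Lambda _4^2\bigr ) \le c_*\frac{m^2}{N^{2}} \le c_*N^{-2+1/6} \le c_*N^{-1-\nu }. \end{aligned}$$

We complete the proof of (129) by showing that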

$$\begin{aligned} t\mathbf{E}e^{it{\tilde{\mathbb T}}}{\tilde{\Lambda }}_1\prec \mathcal R. \end{aligned}$$
(131)

Let us prove (131). Split \({\mathbb W}={\mathbb W}_1+{\mathbb W}_2+{\mathbb W}_3+R_{W}\), where

$$\begin{aligned} {\mathbb W}_k=\sum _{A\subset \Omega ',\,|A|=k}T_A, \qquad R_W=\sum _{A\subset \Omega ', \, |A|\ge 4}T_A. \end{aligned}$$

Here \(\Omega '=\{m+1,\dots , N\}\). Denote \({\mathbb R}={\mathbb U}_2^*+{\mathbb W}_3+R_W\) and \({\mathbb U}_1=\sum _{j=1}^Ng_1(X_j)\). We have \( {\tilde{\mathbb T}}={\mathbb U}_1+{\mathbb W}_2+{\mathbb R}\). Expanding the exponent in powers of \(it{\mathbb R}\) we obtain

$$\begin{aligned} t\mathbf{E}e^{it{\tilde{\mathbb T}}}{\tilde{\Lambda }}_1 = t\mathbf{E}e^{it({\mathbb U}_1+{\mathbb W}_2)}{\tilde{\Lambda }}_1+t^2R, \end{aligned}$$
(132)

where

$$\begin{aligned}&|R| \le \mathbf{E}|{\tilde{\Lambda }}_1{\mathbb R}| \le (r_1+r_2)(r_3+r_4+r_5),\\&r_1^2=\mathbf{E}\Lambda _1^2, \quad r_2^2=\mathbf{E}\Lambda _4^2, \quad r_3^2=\mathbf{E}({\mathbb U}_2^*)^2, \quad r_4^2=\mathbf{E}R_W^2, \quad r_5^2=\mathbf{E}{\mathbb W}_3^2. \end{aligned}$$

In the last step we applied the Cauchy–Schwarz inequality. Combining (130) with the identities

$$\begin{aligned} \mathbf{E}({\mathbb U}_2^*)^2 = \frac{m(N-m)}{N^3}\gamma _2, \qquad \mathbf{E}{\mathbb W}_3^2=\frac{\left( {\begin{array}{c}N-m\\ 3\end{array}}\right) }{N^5}\zeta _2 \end{aligned}$$

and invoking the simple bound

$$\begin{aligned} \mathbf{E}R_W^2\le \frac{\Delta _4^2}{N^3}\le \frac{D_*}{N^{2+2\nu _1}}, \end{aligned}$$

we obtain \(t^2(r_1+r_2)(r_3+r_4+r_5)\prec \mathcal R\). Therefore, (132) implies

$$\begin{aligned} t\mathbf{E}e^{it{\tilde{\mathbb T}}}{\tilde{\Lambda }}_1 \sim t\mathbf{E}e^{it({\mathbb U}_1+{\mathbb W}_2)}{\tilde{\Lambda }}_1. \end{aligned}$$

Let us show that \(t\mathbf{E}e^{it({\mathbb U}_1+{\mathbb W}_2)}{\tilde{\Lambda }}_1\sim 0\). Expanding the exponent in powers of \(it{\mathbb W}_2\) we get

$$\begin{aligned}&t\mathbf{E}e^{it({\mathbb U}_1+{\mathbb W}_2)}{\tilde{\Lambda }}_1 = f_1(t)+f_2(t)+f_3(t)+f_4(t), \\&f_1(t)= t\mathbf{E}e^{it{\mathbb U}_1}{\tilde{\Lambda }}_1, \qquad \qquad \quad f_2(t)= it^2\mathbf{E}e^{it{\mathbb U}_1}\Lambda _1{\mathbb W}_2, \\&f_3(t)= t^2\mathbf{E}e^{it{\mathbb U}_1} \Lambda _4{\mathbb W}_2\theta _1, \qquad \ f_4(t)= t^3\mathbf{E}e^{it{\mathbb U}_1}\Lambda _1{\mathbb W}_2^2\theta _2/2, \end{aligned}$$

where \(\theta _1, \theta _2\) are functions of \({\mathbb W}_2\) satisfying \(|\theta _i|\le 1\).

Let us show that \(f_i\prec \mathcal R\), for \(i=1,2,3,4\). Split the set \(\Omega _m=\{1,\dots , m\}\) into three (non-intersecting) parts \(A_1\cup A_2\cup A_3=\Omega _m\) of (almost) equal size \(|A_i|\approx m/3\). The set of pairs \(\bigl \{\{i,j\}\subset \Omega _m\bigr \}\) splits into six (non-intersecting) parts \(B_{kr}\), \(1\le k\le r\le 3\) (the pair \(\{i,j\}\) belongs to \(B_{kr}\) if \(i\in A_k\) and \(j\in A_r\)). Write

$$\begin{aligned}&\Lambda _1=\sum _{1\le k\le r\le 3}\Lambda _1(k,r), \qquad \Lambda _1(k,r)=\sum _{\{i,j\}\in B_{kr}} g_2(X_i,X_j), \\&\Lambda _4=\sum _{1\le k\le 3}\Lambda _4(k), \qquad \Lambda _4(k)=\sum _{i\in A_k}\sum _{m+1\le j<l\le N}g_3(X_i,X_j,X_l). \end{aligned}$$

Let us prove \(f_4\prec \mathcal R\). We shall show that

$$\begin{aligned} t^3\mathbf{E}e^{it{\mathbb U}_1}\Lambda _1(k,r){\mathbb W}^2_2\theta _2\prec \mathcal R. \end{aligned}$$
(133)

Given a pair \((k,r)\) denote \(A_i=\Omega _m\setminus (A_k\cup A_r)\) and write \(k_i=|A_i|\). Note that \(k_i\approx m/3\). We shall assume that \(k_i\ge m/4\). Since the random variable \({\mathbb U}_1(A_i):=\sum _{j\in A_i}g_1(X_j)\) is independent of the pair \((\Lambda _1(k,r),{\mathbb W}_2)\), we have

$$\begin{aligned} \mathbf{E}e^{it{\mathbb U}_1}\Lambda _1(k,r){\mathbb W}^2_2\theta _2 = \mathbf{E}e^{it{\mathbb U}_1(A_i)} \ \mathbf{E}\Lambda _1(k,r){\mathbb W}^2_2\theta _2. \end{aligned}$$

Therefore,

$$\begin{aligned} |\mathbf{E}e^{it{\mathbb U}_1}\Lambda _1(k,r){\mathbb W}^2_2\theta _2| \le |\mathbf{E}e^{it{\mathbb U}_1(A_i)}| \ \mathbf{E}| \Lambda _1(k,r){\mathbb W}^2_2|. \end{aligned}$$
(134)

The first factor on the right is bounded from above by \(\exp \{-mt^2/16N\}\), for \(k_i\ge m/4\), see (165) below. The second factor is bounded from above by r, where

$$\begin{aligned} {r}^2=\mathbf{E}\Lambda _1^2(k,r)\mathbf{E}{\mathbb W}_2^4\le c_*m^2N^{-5}. \end{aligned}$$

Here we combined the Cauchy–Schwarz inequality and the bounds

$$\begin{aligned} \mathbf{E}\Lambda _1^2(k,r)\le c_*m^2N^{-3}, \qquad \mathbf{E}{\mathbb W}_2^4\le c_*N^{-2}. \end{aligned}$$

Finally, (133) follows from (134):

$$\begin{aligned} \bigl | t^3\mathbf{E}e^{it{\mathbb U}_1}\Lambda _1(k,r){\mathbb W}_2^2\theta _2 \bigr | \le c_*|t|^3e^{-mt^2/16N}mN^{-5/2}\prec \mathcal R. \end{aligned}$$

The proof of \(f_3\prec \mathcal R\) is almost the same as that of \(f_4\prec \mathcal R\).

Let us prove \(f_2\prec \mathcal R\). Split the set \(\Omega '=\{m+1,\dots , N\}\) into three (non-intersecting) parts \(B_1\cup B_2\cup B_3=\Omega '\) of (almost) equal sizes \(|B_i|\approx (N-m)/3\). Split the set of pairs \(\bigl \{ \{i,j\}:\, m+1\le i<j\le N\bigr \}\) into (non-intersecting) groups D(kr), for \(1\le k\le r\le 3\). The pair \(\{i,j\}\in D(k,r)\) if \(i\in B_k\) and \(j\in B_r\). Write

$$\begin{aligned}&{\mathbb W}_2=\sum _{1\le k\le r\le 3}{\mathbb W}_2(k,r), \qquad {\mathbb W}_2(k,r)=\sum _{\{i,j\}\in D(k,r)}g_2(X_i,X_j), \\&\Lambda _4=\sum _{1\le k\le r\le 3}\Lambda _4(k,r), \qquad \Lambda _4(k,r)=\sum _{1\le s\le m}\sum _{\{i,j\}\in D(k,r)}g_3(X_s,X_i,X_j). \end{aligned}$$

In order to prove \(f_2\prec \mathcal R\) we shall show that

$$\begin{aligned} t^2\mathbf{E}e^{it{\mathbb U}_1}\Lambda _1{\mathbb W}_2(k,r)\prec \mathcal R. \end{aligned}$$
(135)

Write \(B_i=\Omega '\setminus (B_k\cup B_r)\) and denote \(m_i=|B_i|\). We shall assume that \(m_i\ge N/4\). Since the random variable \({\mathbb U}_1(B_i)=\sum _{j\in B_i}g_1(X_j)\) and the random variables \(\Lambda _1\) and \({\mathbb W}_2(k,r)\) are independent, we have, cf. (134),

$$\begin{aligned} |\mathbf{E}e^{it{\mathbb U}_1}\Lambda _1{\mathbb W}_2(k,r)| \le |\mathbf{E}e^{it{\mathbb U}_1(B_i)}| \ \mathbf{E}|\Lambda _1{\mathbb W}_2(k,r)|. \end{aligned}$$
(136)

The first factor on the right equals \(|\alpha ^{m_i}(t)|\le e^{-m_it^2/4N}\); see the argument used in the proof of (133) above. The second factor is bounded from above by \({\tilde{r}}\), where

$$\begin{aligned} {\tilde{r}}^2=\mathbf{E}\Lambda _1^2\mathbf{E}{\mathbb W}_2^2(k,r)\le c_*m^2N^{-4}. \end{aligned}$$
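
Indeed, the summands of \({\mathbb W}_2\) are mutually uncorrelated, so that, invoking (130) for the second moments of \(g_2\),

$$\begin{aligned} \mathbf{E}{\mathbb W}_2^2(k,r)\le \mathbf{E}{\mathbb W}_2^2 =\left( {\begin{array}{c}N-m\\ 2\end{array}}\right) \frac{\gamma _2}{N^3}\le \frac{c_*}{N}, \qquad \mathbf{E}\Lambda _1^2=\left( {\begin{array}{c}m\\ 2\end{array}}\right) \frac{\gamma _2}{N^3}\le c_*\frac{m^2}{N^3}. \end{aligned}$$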

Finally, we obtain, using the inequality \(m_i\ge N/4\),

$$\begin{aligned} |\mathbf{E}e^{it{\mathbb U}_1(B_i)}| \ \mathbf{E}|\Lambda _1{\mathbb W}_2(k,r)| \le c_*\frac{m}{N^2}\exp \{-t^2\frac{m_i}{4N}\} \le c_*\frac{m}{N^2}\exp \{-\frac{t^2}{16}\}. \end{aligned}$$

This in combination with (136) shows (135). We obtain \(f_2\prec \mathcal R\).

Let us prove \(f_1\prec \mathcal R\). We shall show that \(f^*\prec \mathcal R\) and \(f^{\star }\prec \mathcal R\), where

$$\begin{aligned} f^{\star }=t\mathbf{E}e^{it{\mathbb U}_1}\Lambda _1 \qquad {\text {and}} \qquad f^*=t\mathbf{E}e^{it{\mathbb U}_1}\Lambda _4 \end{aligned}$$

satisfy \(f^*+f^{\star }=f_1\).

Let us show \(f^{\star }\prec \mathcal R\). Denote \({\mathbb U}_1^{\star }=\sum _{j=m+1}^Ng_1(X_j)\). We obtain, by the independence of \({\mathbb U}_1^{\star }\) and \(\Lambda _1\), that

$$\begin{aligned} |\mathbf{E}e^{it{\mathbb U}_1}\Lambda _1| \le |\mathbf{E}e^{it{\mathbb U}_1^{\star }}| \ \mathbf{E}|\Lambda _1|. \end{aligned}$$

Invoking, for \(N-m>N/2\), the bound \( |\mathbf{E}e^{it{\mathbb U}_1^{\star }}| \le e^{-t^2/8} \), see (165) below, and the bound \(\mathbf{E}|\Lambda _1|\le (\mathbf{E}\Lambda _1^2)^{1/2}\le c_*mN^{-3/2}\) we obtain

$$\begin{aligned} |f^{\star }(t)|\le c_*|t|e^{-t^2/8}N^{-3/2}\prec \mathcal R. \end{aligned}$$

Let us prove \(f^*\prec \mathcal R\). We shall show that, for \(1\le k\le r\le 3\),

$$\begin{aligned} t\mathbf{E}e^{it{\mathbb U}_1}\Lambda _4(k,r)\prec \mathcal R. \end{aligned}$$
(137)

Proceeding as in the proof of (135) we obtain the chain of inequalities

$$\begin{aligned} |\mathbf{E}e^{it{\mathbb U}_1}\Lambda _4(k,r)| \le e^{-t^2/16}\mathbf{E}|\Lambda _4(k,r)| \le c_* e^{-t^2/16}m^{1/2}N^{-3/2}. \end{aligned}$$
(138)

In the last step we applied Cauchy–Schwarz and the simple bound \(\mathbf{E}\Lambda _4^2(k,r)\le c_*mN^{-3}\). Clearly, (138) implies (137).
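
For completeness we indicate the origin of the latter bound: by (130) the orthogonal summands of \(\Lambda _4\) satisfy \(\mathbf{E}g_3^2(X_1,X_2,X_3)=\zeta _2N^{-5}\), and \(\Lambda _4(k,r)\) contains \(m|D(k,r)|\le mN^2/2\) of them, so that

$$\begin{aligned} \mathbf{E}\Lambda _4^2(k,r) \le \frac{mN^2}{2}\,\frac{\zeta _2}{N^5} \le c_*\frac{m}{N^3}. \end{aligned}$$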

5.2 Proof of the second relation of (128)

Here we prove the second relation of (128). Firstly, we shall show that

$$\begin{aligned}&\mathbf{E}e^{it{\mathbb T}}\sim \mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2+{\mathbb U}_3)\}, \end{aligned}$$
(139)
$$\begin{aligned}&\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2+{\mathbb U}_3)\} \sim \mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2)\}+\left( {\begin{array}{c}N\\ 3\end{array}}\right) e^{-t^2/2}(it)^4w, \end{aligned}$$
(140)

where \(w=\mathbf{E}g_3(X_1,X_2,X_3)g_1(X_1)g_1(X_2)g_1(X_3)\).

Let m(t) be an integer-valued function such that

$$\begin{aligned} m(t)\approx C_1Nt^{-2}\ln (t^2+1), \qquad C_1\le |t|\le t_1, \end{aligned}$$
(141)

and put \(m(t)\equiv 10\), for \(|t|\le C_1\). Here \(C_1\) denotes a large absolute constant (one can take, e.g., \(C_1=200\)). Assume, in addition, that the numbers \(m=m(t)\) are even.
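
For definiteness, one admissible choice (any even integer-valued function of this order will do) is

$$\begin{aligned} m(t)=2\Bigl \lceil \frac{C_1N\ln (t^2+1)}{2t^2}\Bigr \rceil , \qquad C_1\le |t|\le t_1. \end{aligned}$$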

5.2.1 Proof of (139)

Given m write

$$\begin{aligned} {\mathbb T}={\mathbb U}_1+{\mathbb U}_2+{\mathbb U}_3+{\mathbb H}, \end{aligned}$$

where

$$\begin{aligned} {\mathbb H}={\mathbb H}_1+{\mathbb H}_2, \qquad {\mathbb H}_1=\sum _{|A|\ge 4,\, A\cap \Omega _m=\emptyset }T_A, \qquad {\mathbb H}_2=\sum _{|A|\ge 4,\, A\cap \Omega _m\not =\emptyset }T_A. \end{aligned}$$

In order to show (139) we expand the exponent in powers of \(it{\mathbb H}\) and \(it{\mathbb U}_3\),

$$\begin{aligned} \mathbf{E}\exp \{it{\mathbb T}\} = \mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2+{\mathbb U}_3)\} + \mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2)\}it{\mathbb H} + R, \end{aligned}$$

where \( |R|\le t^2(\mathbf{E}{\mathbb H}^2+\mathbf{E}|{\mathbb U}_3{\mathbb H}|) \). Invoking the bounds, see (166), (167), (5), (6),

$$\begin{aligned} \mathbf{E}{\mathbb H}^2\le N^{-3}\Delta _4^2\le c_*N^{-2-2\nu _1}, \qquad \mathbf{E}{\mathbb U}_3^2\le N^{-2}\zeta _2\le c_*N^{-2} \end{aligned}$$
(142)

we obtain, by Cauchy–Schwarz, \(|R|\le c_*t^2 N^{-2-\nu _1}\prec \mathcal R\). We complete the proof of (139) by showing that

$$\begin{aligned} \mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2)\}it{\mathbb H}\prec \mathcal R. \end{aligned}$$
(143)

Before proving (143) we collect some auxiliary inequalities. For \(m=2k\) write

$$\begin{aligned} \Omega _m=A_1\cup A_2, {\text { where}} \qquad A_1=\{1,\dots , k\}, \qquad A_2=\{k+1,\dots , 2k\}. \end{aligned}$$
(144)

Furthermore, split the sum

$$\begin{aligned} {\mathbb U}_2= & {} {\mathbb Z}_1+{\mathbb Z}_2+{\mathbb Z}_3+{\mathbb Z}_4, \nonumber \\ {\mathbb Z}_1= & {} \sum _{1\le i<j\le m}g_2(X_i,X_j), \qquad {\mathbb Z}_2 = \sum _{i\in A_1}\sum _{m<j\le N}g_2(X_i,X_j), \nonumber \\ {\mathbb Z}_3= & {} \sum _{i\in A_2}\sum _{m<j\le N}g_2(X_i,X_j), \qquad {\mathbb Z}_4=\sum _{m<i<j\le N}g_2(X_i,X_j). \end{aligned}$$
(145)

In what follows we shall use the simple bounds, see (5),

$$\begin{aligned}&\mathbf{E}{\mathbb Z}_1^2\le \frac{m^2}{N^3}\gamma _2\le c_*\frac{m^2}{N^3}, \qquad \mathbf{E}{\mathbb Z}_4^2\le \frac{\gamma _2}{N}\le \frac{c_*}{N}, \nonumber \\&\mathbf{E}{\mathbb Z}_i^2\le \frac{m}{N^2}\gamma _2\le c_*\frac{m}{N^2}, \quad \mathbf{E}{\mathbb Z}_i^4\le c\frac{m^2}{N^{4}}\gamma _4\le c_*\frac{m^2}{N^4}, \quad i=2,3. \end{aligned}$$
(146)
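
To illustrate the computations behind (146), consider \(\mathbf{E}{\mathbb Z}_2^2\). The summands of \({\mathbb Z}_2\) are mutually uncorrelated, since the kernel \(g_2\) is completely degenerate, \(\mathbf{E}\bigl (g_2(X_1,X_2)\bigm |X_1\bigr )=0\), and, by (130), \(\mathbf{E}g_2^2(X_1,X_2)=\gamma _2N^{-3}\). Hence

$$\begin{aligned} \mathbf{E}{\mathbb Z}_2^2 =\frac{m}{2}(N-m)\,\mathbf{E}g_2^2(X_1,X_2) =\frac{m(N-m)}{2}\frac{\gamma _2}{N^3} \le \frac{m}{N^2}\gamma _2. \end{aligned}$$

The remaining second moment bounds in (146) follow in the same way.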

Let us prove (143). Expand the exponent \(\exp \{it({\mathbb U}_1+{\mathbb Z}_1+\dots +{\mathbb Z}_4)\}\) in powers of \(it{\mathbb Z}_1\) to get

$$\begin{aligned} \mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2)\}it{\mathbb H} = h_1(t)+R, \end{aligned}$$

where \(h_1(t)=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb Z}_2+\dots +{\mathbb Z}_4)\}it{\mathbb H}\) and where

$$\begin{aligned} |R| \le t^2\mathbf{E}|{\mathbb H}{\mathbb Z}_1| \le t^2(\mathbf{E}{\mathbb H}^2)^{1/2}(\mathbf{E}{\mathbb Z}_1^2)^{1/2} \le c_*t^2mN^{-(5+2\nu _1)/2}. \end{aligned}$$

For \(m=m(t)\) satisfying (141) we have \(R\prec \mathcal R\). Therefore, we obtain

$$\begin{aligned} \mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2)\}it{\mathbb H}\sim h_1. \end{aligned}$$

In order to prove \(h_1\prec \mathcal R\) we write \(h_1=h_2+h_3\) and show that \(h_2, h_3\prec \mathcal R\), where

$$\begin{aligned}&h_2=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb Z}_2+\dots +{\mathbb Z}_4)\}it{\mathbb H}_1, \qquad \\&h_3=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb Z}_2+\dots +{\mathbb Z}_4)\}it{\mathbb H}_2. \end{aligned}$$

Let us show that \(h_2\prec \mathcal R\). Firstly, we prove that

$$\begin{aligned} h_2\sim h_{2.1}+h_{2.2}+h_{2.3}, \end{aligned}$$
(147)

where \(h_{2.1}(t)=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb Z}_4)\}it{\mathbb H}_1\) and, for \(j=2,3\),

$$\begin{aligned} h_{2.j}(t)=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb Z}_4)\}(it)^2{\mathbb H}_1{\mathbb Z}_j. \end{aligned}$$

Expanding the exponent in powers of \(it({\mathbb Z}_2+{\mathbb Z}_3)\) we obtain

$$\begin{aligned} h_2=h_{2.1}+h_{2.2}+h_{2.3}+R, \end{aligned}$$

where \(|R|\le |t|^3\mathbf{E}|{\mathbb H}_1|({\mathbb Z}_2+{\mathbb Z}_3)^2\) is bounded from above by

$$\begin{aligned} |t|^3 (\mathbf{E}{\mathbb H}_1^2)^{1/2}(\mathbf{E}({\mathbb Z}_2+{\mathbb Z}_3)^4)^{1/2} \le c_*|t|^3mN^{-3-\nu _1} \prec \mathcal R. \end{aligned}$$

In the last step we used \(\mathbf{E}{\mathbb H}_1^2\le \mathbf{E}{\mathbb H}^2\) and applied (142) and (146). Therefore, (147) follows.

Let us show \(h_{2.i}\prec \mathcal R\), for \(i=1,\, 2,\,3\). The random variable \({\mathbb U}_1(A_1)\) depends only on the observations \(X_j\), \(j\in A_1\), while the remaining factors in \(h_{2.3}\) do not involve these observations. Therefore, we can write

$$\begin{aligned} h_{2.3} = \mathbf{E}\exp \{it{\mathbb U}_1(A_1)\} \mathbf{E}\exp \{it({\mathbb U}_1(\Omega \setminus A_1)+{\mathbb Z}_4)\}(it)^2 {\mathbb H}_1{\mathbb Z}_3. \end{aligned}$$

Furthermore, using (165) we obtain, for \(|A_1|=m/2\),

$$\begin{aligned} |h_{2.3}|\le t^2|\alpha ^{m/2}(t)|\mathbf{E}|{\mathbb H}_1{\mathbb Z}_3| \le c_*t^2\exp \{-t^2\frac{m}{8N}\}\frac{m^{1/2}}{N^{2+\nu _1}}. \end{aligned}$$
(148)

In the last step we combined the bound \(\mathbf{E}{\mathbb H}_1^2\le c_*N^{-2-2\nu _1}\) and (146) to get

$$\begin{aligned} \mathbf{E}|{\mathbb H}_1{\mathbb Z}_3| \le (\mathbf{E}{\mathbb H}_1^2)^{1/2}(\mathbf{E}{\mathbb Z}_3^2)^{1/2} \le c_*m^{1/2}N^{-2-\nu _1}. \end{aligned}$$

Note that choosing \(C_1\) in (141) sufficiently large implies, for \(|t|\ge C_1\),

$$\begin{aligned} t^2m/12N\approx (C_1/12)\ln (t^2+1)\ge 10 \ln (t^2+1). \end{aligned}$$

An application of this bound to the argument of the exponent in (148) shows \(h_{2.3}\prec \mathcal R\): for \(|t|\ge C_1\) we have \(e^{-t^2m/8N}\le (t^2+1)^{-10}\), and therefore the right-hand side of (148) is at most \(c_*t^2(t^2+1)^{-10}m^{1/2}N^{-2-\nu _1}\). The proof of \(h_{2.i}\prec \mathcal R\), for \(i=1,2\), is almost the same. Therefore, we obtain \(h_2\prec \mathcal R\).

Let us prove \(h_3\prec \mathcal R\). Firstly, we collect some auxiliary inequalities. Write \(m=2k\) (recall that the number m is even) and split \(\Omega _m=B \cup D\), where B denotes the set of odd numbers in \(\Omega _m\) and D the set of even numbers in \(\Omega _m\). Split \({\mathbb H}_2={\mathbb H}_B+{\mathbb H}_D+{\mathbb H}_C\). Here, for \(A\subset \Omega _N\) with \(|A|\ge 4\), we denote by \({\mathbb H}_B\) the sum of the \(T_A\) such that \(A\cap B=\emptyset \) and \(A\cap D\not =\emptyset \); by \({\mathbb H}_D\) the sum of the \(T_A\) such that \(A\cap B\not =\emptyset \) and \(A\cap D=\emptyset \); and by \({\mathbb H}_C\) the sum of the \(T_A\) such that \(A\cap B\not =\emptyset \) and \(A\cap D\not =\emptyset \). It follows from the inequalities (177) and (6) that

$$\begin{aligned} \mathbf{E}\,{\mathbb H}_C^2\le c_*m^2N^{-4-2\nu _1}, \qquad \mathbf{E}\,{\mathbb H}_B^2=\mathbf{E}\,{\mathbb H}_D^2\le c_*mN^{-3-2\nu _1}. \end{aligned}$$
(149)

Using the notation \(z=it\exp \{it({\mathbb U}_1+{\mathbb Z}_2+{\mathbb Z}_3+{\mathbb Z}_4)\}\) write

$$\begin{aligned}&h_3=\mathbf{E}z{\mathbb H}_2=h_{3.1}+h_{3.2}+h_{3.3}, \\&h_{3.1}=\mathbf{E}z{\mathbb H}_B, \quad h_{3.2}=\mathbf{E}z{\mathbb H}_D, \quad h_{3.3}=\mathbf{E}z{\mathbb H}_C. \end{aligned}$$

We shall show that \(h_{3.i}\prec \mathcal R\), for \(i=1,2,3\). The relation \(h_{3.3}\prec \mathcal R\) follows from (149) and (146) via Cauchy–Schwarz: \( |h_{3.3}|\le c_*|t|\, mN^{-2-\nu _1}\prec \mathcal R\).

Let us show that \(h_{3.2}\prec \mathcal R\). Expanding the exponent in powers of \(it({\mathbb Z}_2+{\mathbb Z}_3)\) we obtain

$$\begin{aligned} h_{3.2}=h_{3.2}^*+R, \qquad h_{3.2}^*:=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb Z}_4)\}it{\mathbb H}_D, \end{aligned}$$

where \(|R|\le t^2\mathbf{E}|{\mathbb H}_D ({\mathbb Z}_2+{\mathbb Z}_3)|\). Combining the bounds (146) and (149) we obtain, by Cauchy–Schwarz, \(|R|\le c_*t^2mN^{-(5+2\nu _1)/2}\prec \mathcal R\). Next we show that \(h_{3.2}^*\prec \mathcal R\). The random variable \({\mathbb U}_1(D)=\sum _{j\in D}g_1(X_j)\) and the random variable \({\mathbb H}_D\) are independent. Therefore, we can write

$$\begin{aligned} |h_{3.2}^*|\le |t|\,|\mathbf{E}\exp \{it{\mathbb U}_1(D)\}|\mathbf{E}|{\mathbb H}_D|. \end{aligned}$$

Combining (165) and (149) we obtain, using Cauchy–Schwarz,

$$\begin{aligned} |h_{3.2}^*| \le c_*|t|\,e^{-mt^2/8N}m^{1/2}N^{-(3+2\nu _1)/2}\prec \mathcal R. \end{aligned}$$

The proof of \(h_{3.1}\prec \mathcal R\) is similar. Therefore, we obtain \(h_3\prec \mathcal R\). This together with the relation \(h_2\prec \mathcal R\), proved above, implies \(h_1\prec \mathcal R\). Thus we arrive at (143) completing the proof of (139).

5.2.2 Proof of (140)

We start with some auxiliary moment inequalities. Split

$$\begin{aligned} {\mathbb U}_3=W+Z, \qquad W=\sum _{|A|=3,\, A\cap \Omega _m\not =\emptyset }T_A, \qquad Z=\sum _{|A|=3,\, A\cap \Omega _m=\emptyset }T_A. \end{aligned}$$

Using the orthogonality and moment bounds for U-statistics, see, e.g., Dharmadhikari et al. [20], one can show that

$$\begin{aligned} \mathbf{E}W^2\le mN^2\mathbf{E}g_3^2(X_1,X_2,X_3), \qquad \mathbf{E}Z^2\le N^3\mathbf{E}g_3^2(X_1,X_2,X_3), \end{aligned}$$

and \(\mathbf{E}|Z|^s\le cN^{3s/2}\mathbf{E}|g_3(X_1,X_2,X_3)|^s\). Invoking (5) we obtain

$$\begin{aligned} \mathbf{E}W^2\le c_*mN^{-3}, \qquad \mathbf{E}Z^2\le c_*N^{-2}, \qquad \mathbf{E}|Z|^s\le c_*N^{-s}. \end{aligned}$$
(150)
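
The first bound for \(\mathbf{E}W^2\) rests on a simple count of orthogonal summands: the number of sets A with \(|A|=3\) and \(A\cap \Omega _m\not =\emptyset \) satisfies

$$\begin{aligned} \#\{A:\,|A|=3,\ A\cap \Omega _m\not =\emptyset \} \le \sum _{s=1}^{m}\#\{A:\,|A|=3,\ s\in A\} = m\left( {\begin{array}{c}N-1\\ 2\end{array}}\right) \le mN^2, \end{aligned}$$

and \(\mathbf{E}W^2\) is at most this count multiplied by \(\mathbf{E}g_3^2(X_1,X_2,X_3)\).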

For the sets \(A_1,A_2\subset \Omega _m\) defined in (144) write

$$\begin{aligned}&\mathcal{D}=\{A\subset {\Omega }_N:\, |A|=3,\, A\cap {\Omega }_m\not =\emptyset \}, \\&\mathcal{D}_1=\{A\in \mathcal{D}:\, A\cap A_1=\emptyset \}, \\&\mathcal{D}_2=\{A\in \mathcal{D}:\, A\cap A_2=\emptyset \}, \\&\mathcal{D}_3=\{A\in \mathcal{D}:\, A\cap A_1\not =\emptyset , \ A\cap A_2\not =\emptyset \}. \end{aligned}$$

We have \(\mathcal{D}=\mathcal{D}_1\cup \mathcal{D}_2\cup \mathcal{D}_3\) and \(W=\sum _{A\in \mathcal{D}}T_A\). Therefore, we can write \(W=W_1+W_2+W_3\), where \(W_j=\sum _{A\in \mathcal{D}_j}T_A\).

A calculation shows that

$$\begin{aligned} \mathbf{E}W_1^2=\mathbf{E}W_2^2\le kN^{2}\mathbf{E}g_3^2(X_1,X_2,X_3), \qquad \mathbf{E}W_3^2\le k^2N\mathbf{E}g_3^2(X_1,X_2,X_3). \end{aligned}$$

Therefore, we obtain from (5) that

$$\begin{aligned} \mathbf{E}W_1^2=\mathbf{E}W_2^2\le c_*mN^{-3}, \qquad \mathbf{E}W_3^2\le c_*m^2N^{-4}. \end{aligned}$$
(151)

Let us prove (140). Write \({\mathbb U}_3=W+Z\). Expanding the exponent in powers of \(itW\) we obtain

$$\begin{aligned}&\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2+{\mathbb U}_3)\}=h_4+h_5+R, \\&h_4 =\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2+Z)\}, \\&h_5=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2+Z)\}itW, \end{aligned}$$

where, by (150), \(|R|\le t^2\mathbf{E}W^2\le c_*t^2mN^{-3}\prec \mathcal R\). This implies

$$\begin{aligned} \mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2+{\mathbb U}_3)\}\sim h_4+h_5. \end{aligned}$$

In order to prove (140) we shall show that

$$\begin{aligned}&h_5\sim \mathbf{E}\exp \{it{\mathbb U}_1\}itW, \end{aligned}$$
(152)
$$\begin{aligned}&h_4 \sim \mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2)\} + \mathbf{E}\exp \{it{\mathbb U}_1\}itZ, \end{aligned}$$
(153)
$$\begin{aligned}&\mathbf{E}\exp \{it{\mathbb U}_1\}it{\mathbb U}_3\sim \left( {\begin{array}{c}N\\ 3\end{array}}\right) e^{-t^2/2}(it)^4w. \end{aligned}$$
(154)

Let us prove (152). Expanding the exponent (in \(h_5\)) in powers of \(itZ\) we obtain

$$\begin{aligned} h_5 = h_6+R, \qquad h_6 = \mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2)\}itW, \end{aligned}$$

where, by (150) and Cauchy–Schwarz,

$$\begin{aligned} |R| \le t^2\mathbf{E}|WZ|\le c_*t^2m^{1/2}N^{-5/2}\prec \mathcal R. \end{aligned}$$

We have \(h_5\sim h_6\).

It remains to show that \(h_6\sim \mathbf{E}\exp \{it{\mathbb U}_1\}itW\). Split

$$\begin{aligned} {\mathbb U}_2={\mathbb U}_2^*+{\mathbb U}_2^{\star }, \qquad {\mathbb U}_2^*=\sum _{|A|=2,\, A\cap \Omega _m\not =\emptyset }T_A, \qquad {\mathbb U}_2^{\star }=\sum _{|A|=2,\, A\cap \Omega _m=\emptyset }T_A. \end{aligned}$$
(155)

We have, see (146),

$$\begin{aligned} \mathbf{E}({\mathbb U}_2^*)^2\le c_*mN^{-2}, \qquad \mathbf{E}({\mathbb U}_2^{\star })^2\le c_*N^{-1}. \end{aligned}$$
(156)

Expanding the exponent (in \(h_6\)) in powers of \(it{\mathbb U}_2^*\) we obtain

$$\begin{aligned} h_6=h_7+R, \qquad {\text {where}} \qquad h_7=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2^{\star })\}itW, \end{aligned}$$

and where, by (150), (156) and Cauchy–Schwarz,

$$\begin{aligned} |R| \le t^2\mathbf{E}|W{\mathbb U}_2^*| \le c_*t^2mN^{-5/2} \prec \mathcal R. \end{aligned}$$

Therefore, we obtain \(h_6\sim h_7\).

We complete the proof of (152) by showing that \(h_7\sim \mathbf{E}\exp \{it{\mathbb U}_1\}itW\). Use the decomposition \(W=W_1+W_2+W_3\) and write

$$\begin{aligned} h_7=h_{7.1}+h_{7.2}+h_{7.3}, \qquad h_{7.j}=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2^{\star })\}itW_j. \end{aligned}$$

We shall show that

$$\begin{aligned} h_{7.j}\sim \mathbf{E}\exp \{it{\mathbb U}_1\}itW_j, \qquad j=1,2,3. \end{aligned}$$
(157)

Expanding in powers of \(it{\mathbb U}_2^{\star }\) we obtain

$$\begin{aligned} h_{7.j}= \mathbf{E}\exp \{it{\mathbb U}_1\}itW_j+R_j, \end{aligned}$$

where \(R_j=(it)^2\mathbf{E}\exp \{it{\mathbb U}_1\} W_j{\mathbb U}_2^{\star }\theta \) and where \(\theta \) is a function of \({\mathbb U}_2^{\star }\) satisfying \(|\theta |\le 1\). In order to prove (157) we show that \(R_j\prec \mathcal R\), for \(j=1,2,3\).

Combining (151) and (156) we obtain, via Cauchy–Schwarz,

$$\begin{aligned} |R_3|\le c_*t^2mN^{-5/2}\prec \mathcal R. \end{aligned}$$

Furthermore, using the fact that the random variable \({\mathbb U}_1(A_2)\) and the random variables \({\mathbb U}_2^{\star }\) and \(W_2\) are independent, we can write

$$\begin{aligned} |R_2| \le t^2|\mathbf{E}\exp \{it{\mathbb U}_1(A_2)\}|\mathbf{E}|W_2{\mathbb U}_2^{\star }| \le c_*t^2e^{-mt^2/8N}m^{1/2}N^{-2} \prec \mathcal R. \end{aligned}$$

Here we used (165) and the moment inequalities (151) and (156). The proof of \(R_1\prec \mathcal R\) is similar. We arrive at (157) and, thus, complete the proof of (152).

Let us prove (153). We proceed in two steps. Firstly we show

$$\begin{aligned}&h_4\sim h_8+h_9, \nonumber \\&h_8=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2)\}, \qquad h_9=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2)\}itZ. \end{aligned}$$
(158)

Secondly, we show

$$\begin{aligned} h_9\sim \mathbf{E}\exp \{it{\mathbb U}_1\}itZ. \end{aligned}$$
(159)

In order to prove (158) we write

$$\begin{aligned} h_4=h_8+h_9+R, \qquad R=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2)\}{\tilde{r}}, \qquad {\tilde{r}}=\exp \{itZ\}-1-itZ, \end{aligned}$$

and show that \(R\prec \mathcal R\). In order to bound the remainder R we write \({\mathbb U}_2={\mathbb U}_2^*+{\mathbb U}_2^{\star }\), see (155), and expand the exponent in powers of \(it{\mathbb U}_2^*\). We obtain \(R=R_1+R_2\), where

$$\begin{aligned} R_1=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2^{\star })\}{\tilde{r}} \qquad {\text {and}} \qquad |R_2|\le \mathbf{E}|it{\mathbb U}^*_2{\tilde{r}}|. \end{aligned}$$

Note that, for \(2<s\le 3\), we have \(|{\tilde{r}}|\le c |tZ|^{s/2}\); indeed, \(|{\tilde{r}}|\le \min \{2|tZ|,\,|tZ|^2/2\}\le (2|tZ|)^{2-s/2}(|tZ|^2/2)^{s/2-1}\le c|tZ|^{s/2}\). Combining (150) and (156) we obtain, via Cauchy–Schwarz,

$$\begin{aligned} |R_2| \le |t|^{1+s/2}\mathbf{E}|Z|^{s/2}|{\mathbb U}_2^*| \le c_*|t|^{1+s/2}m^{1/2}N^{-1-s/2} \prec \mathcal R. \end{aligned}$$

In order to prove \(R_1\prec \mathcal R\) we use the fact that the random variable \({\mathbb U}_1(\Omega _m)\) and the random variables \({\mathbb U}_2^{\star }\) and \({\tilde{r}}\) are independent. Invoking the inequality \(|{\tilde{r}}|\le t^2Z^2\) we obtain from (165) and (150)

$$\begin{aligned} |R_1|\le t^2|\alpha ^m(t)|\mathbf{E}Z^2\le c_* t^2e^{-mt^2/4N}N^{-2}\prec \mathcal R. \end{aligned}$$

We thus arrive at (158).

Let us prove (159). Use the decomposition (145) and expand the exponent (in \(h_9\)) in powers of \(it{\mathbb Z}_1\) to get \(h_9=h_{10}+R\), where

$$\begin{aligned} h_{10}=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb Z}_2+{\mathbb Z}_3+{\mathbb Z}_4)\}itZ, \quad |R|\le t^2\mathbf{E}|Z{\mathbb Z}_1|. \end{aligned}$$

Combining (146) and (150) we obtain, via Cauchy–Schwarz,

$$\begin{aligned} |R|\le c_* t^2 mN^{-5/2}\prec \mathcal R. \end{aligned}$$

Therefore, we have

$$\begin{aligned} h_9\sim h_{10}. \end{aligned}$$

Now we expand the exponent in \(h_{10}\) in powers of \(it({\mathbb Z}_2+{\mathbb Z}_3)\) and obtain \( h_{10}=h_{11}+h_{12}+R\), where

$$\begin{aligned} h_{11}=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb Z}_4)\}itZ, \qquad h_{12}=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb Z}_4)\}(it)^2Z({\mathbb Z}_2+{\mathbb Z}_3), \end{aligned}$$

and where \(|R|\le |t|^3\mathbf{E}|Z|\,|{\mathbb Z}_2+{\mathbb Z}_3|^2\). Combining (146) and (150) we obtain, via Cauchy–Schwarz, \(|R|\le |t|^3mN^{-3}\prec \mathcal R\). Therefore, we have

$$\begin{aligned} h_{10}\sim h_{11}+h_{12}. \end{aligned}$$

We complete the proof of (159) by showing that

$$\begin{aligned} h_{11}\sim \mathbf{E}\exp \{it{\mathbb U}_1\}itZ \qquad {\text {and}} \qquad h_{12}\prec \mathcal R. \end{aligned}$$
(160)

In order to prove the second bound write

$$\begin{aligned} h_{12}=R_2+R_3, \qquad {\text {where}} \qquad R_j=\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb Z}_4)\}(it)^2Z{\mathbb Z}_j. \end{aligned}$$

We shall show that \(R_3\prec \mathcal R\). Using the fact that the random variable \({\mathbb U}_1(A_1)\) and the random variables Z, \({\mathbb Z}_3\) and \({\mathbb Z}_4\) are independent we obtain from (165)

$$\begin{aligned} |R_3| \le t^2|\alpha ^{m/2}(t)|\mathbf{E}|Z{\mathbb Z}_3| \le c_*t^2e^{-mt^2/8N}m^{1/2}N^{-2} \prec \mathcal R. \end{aligned}$$

In the last step we combined (146), (150) and Cauchy–Schwarz. The proof of \(R_2\prec \mathcal R\) is similar.

In order to prove the first relation of (160) we expand the exponent in powers of \(it{\mathbb Z}_4\) and obtain \(h_{11}=\mathbf{E}\exp \{it{\mathbb U}_1\}itZ+R\). Furthermore, combining (165), (146) and (150) we obtain

$$\begin{aligned} |R| \le t^2|\alpha ^m(t)| \mathbf{E}|Z{\mathbb Z}_4| \le c_*t^2e^{-mt^2/4N}N^{-3/2} \prec \mathcal R. \end{aligned}$$

This proves the first relation of (160), and the proof of (153) is complete.

Let us prove (154). By symmetry and independence,

$$\begin{aligned} \mathbf{E}e^{it{\mathbb U}_1}it{\mathbb U}_3 = \left( {\begin{array}{c}N\\ 3\end{array}}\right) h_{13} \mathbf{E}e^{it{\mathbb U}_*}, \qquad h_{13}=\mathbf{E}e^{itx_1}e^{itx_2}e^{itx_3}itz. \end{aligned}$$
(161)

Here we denote \(z=g_3(X_1,X_2,X_3)\) and write,

$$\begin{aligned} {\mathbb U}_1=x_1+x_2+x_3+{\mathbb U}_*, \qquad {\mathbb U}_*=\sum _{4\le j\le N}g_1(X_j), \qquad x_j=g_1(X_j). \end{aligned}$$

Furthermore, write

$$\begin{aligned} r_j=e^{itx_j}-1-itx_j, \qquad v_j=e^{itx_j}-1. \end{aligned}$$

In what follows we expand the exponents in powers of \(itx_j\), \(j=1,2,3\), and use the fact that \(\mathbf{E}\bigl (g_3(X_1,X_2,X_3)\bigm |X_1,X_2\bigr )=0\) as well as the obvious symmetry. Thus, we have

$$\begin{aligned}&h_{13}=h_{14}+R_1, \qquad h_{14}=\mathbf{E}e^{itx_2}e^{itx_3}(it)^2zx_1, \qquad R_1=\mathbf{E}e^{itx_2}e^{itx_3}itzr_1, \\&h_{14}=h_{15}+R_2, \qquad h_{15}=\mathbf{E}e^{itx_3}(it)^3zx_1x_2, \qquad R_2=\mathbf{E}e^{itx_3}(it)^2zx_1r_2, \\&h_{15}=h_{16}+R_3, \qquad h_{16}=\mathbf{E}(it)^4zx_1x_2x_3, \qquad R_3=\mathbf{E}(it)^3zx_1x_2r_3. \end{aligned}$$

Furthermore, we have

$$\begin{aligned} R_1=\mathbf{E}\,itzr_1v_2v_3, \qquad R_2=\mathbf{E}(it)^2zx_1r_2v_3. \end{aligned}$$
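
To see this, write \(e^{itx_j}=1+v_j\), \(j=2,3\), expand, and condition on the variables appearing in each term. For instance,

$$\begin{aligned} \mathbf{E}zr_1=\mathbf{E}\bigl (r_1\,\mathbf{E}(z\mid X_1)\bigr )=0, \qquad \mathbf{E}zr_1v_2=\mathbf{E}\bigl (r_1v_2\,\mathbf{E}(z\mid X_1,X_2)\bigr )=0, \end{aligned}$$

by the complete degeneracy of \(g_3\); only the terms containing \(v_2v_3\) (respectively \(v_3\)) survive.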

Invoking the bounds \(|r_j|\le |tx_j|^2\) and \(|v_j|\le |tx_j|\) we obtain

$$\begin{aligned} h_{13}=h_{16}+R, \end{aligned}$$
(162)

where \(|R|\le c|t|^{5}\mathbf{E}|zx_1x_2|\,x_3^2\). The bound \(|R|\le c_*|t|^5N^{-9/2}\) (which follows by Cauchy–Schwarz), in combination with (161), (162) and the identity \(h_{16}=(it)^4\mathbf{E}zx_1x_2x_3=(it)^4w\), implies

$$\begin{aligned} \mathbf{E}e^{it{\mathbb U}_1}it{\mathbb U}_3 \sim \left( {\begin{array}{c}N\\ 3\end{array}}\right) \mathbf{E}e^{it{\mathbb U}_*} (it)^4w. \end{aligned}$$
(163)

Note that \(\left( {\begin{array}{c}N\\ 3\end{array}}\right) |w|\le c_*N^{-1}\). In order to show (154) we replace \(\mathbf{E}e^{it{\mathbb U}_*}\) by \(e^{-t^2/2}\). Therefore, (154) follows from (163) and the inequalities

$$\begin{aligned} \frac{(it)^4}{N}(\mathbf{E}e^{it{\mathbb U}_*}-e^{-t^2\sigma ^2(N-3)/2N})\prec \mathcal{R}, \quad \frac{(it)^4}{N}(e^{-t^2\sigma ^2(N-3)/2N}-e^{-t^2/2})\prec \mathcal{R}. \end{aligned}$$

The second inequality is a direct consequence of (169). The proof of the first inequality is routine and is omitted here. Thus the proof of (140) is complete.

5.2.3 Completion of the proof of (128)

Here we show that

$$\begin{aligned} \mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2)\}+ \left( {\begin{array}{c}N\\ 3\end{array}}\right) e^{-t^2/2}(it)^4w \sim {\hat{G}}(t). \end{aligned}$$
(164)

This relation in combination with (139) and (140) implies \(\mathbf{E}e^{it{\mathbb T}}\sim {\hat{G}}(t)\).

Let \(G_U(t)\) denote the two-term Edgeworth expansion of the U-statistic \({\mathbb U}_1+{\mathbb U}_2\). That is, \(G_U(t)\) is defined by (2), but with \(\kappa _4\) replaced by \(\kappa _4^*\), where \(\kappa _4^*\) is obtained from \(\kappa _4\) after removing the summand \(4\mathbf{E}g(X_1)g(X_2)g(X_3)\chi (X_1,X_2,X_3)\). Furthermore, let \({\hat{G}}_U(t)\) denote the Fourier transform of \(G_U(t)\). It is easy to show that

$$\begin{aligned} {\hat{G}}(t)={\hat{G}}_U(t)+ \left( {\begin{array}{c}N\\ 3\end{array}}\right) e^{-t^2/2}(it)^4w. \end{aligned}$$

Therefore, in order to prove (164) it suffices to show that \({\hat{G}}_U(t)\sim \mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2)\}\). The bound

$$\begin{aligned} \int _{|t|\le t_1} |{\hat{G}}_U(t)-\mathbf{E}\exp \{it({\mathbb U}_1+{\mathbb U}_2)\}| \frac{dt}{|t|} \le \varepsilon _NN^{-1}, \end{aligned}$$

where \(\varepsilon _N\downarrow 0\), was shown by Callaert, Janssen and Veraverbeke [16] and Bickel, Götze and van Zwet [11]. An inspection of their proofs shows that under the moment conditions (5) one can replace \(\varepsilon _N\) by \(c_*N^{-\nu }\). This completes the proof of (127).

For the reader's convenience, we formulate in Lemma 3 a known result on upper bounds for characteristic functions.

Lemma 3

Assume that (16) holds. There exists a constant \(c_*\) depending on \(D_*,M_*, r, s, \nu _1\) only such that, for \(N>c_*\) and \(|t|\le N^{1/2}/10^3\beta _3\) and \(B\subset \Omega _N\), we have

$$\begin{aligned} |\alpha (t)|\le 1-t^2/4N, \qquad |\mathbf{E}\exp \{it{\mathbb U}_1(B)\}|\le |\alpha (t)|^{|B|}\le e^{-|B|t^2/4N}. \end{aligned}$$
(165)

Here \(\alpha (t)=\mathbf{E}\exp \{itg_1(X_1)\}\) and \({\mathbb U}_1(B)=\sum _{j\in B}g_1(X_j)\).

Proof

Let us prove the first inequality of (165). Expanding the exponent, see (188), we obtain

$$\begin{aligned} |\alpha (t)|&\le \bigl |1-2^{-1}t^2\mathbf{E}g_1^2(X_1)\bigr |+6^{-1}|t|^3\mathbf{E}|g_1(X_1)|^3 \\&= \bigl |1-\sigma ^2t^2/2N\bigr |+\beta _3\sigma ^3|t|^3/6N^{3/2}. \end{aligned}$$

Invoking the inequality \(1-10^{-3}\le \sigma ^2\le 1\), which follows from (169) for \(N>c_*\) with \(c_*\) sufficiently large, we obtain \(|\alpha (t)|\le 1-t^2/4N\), for \(|t|\le N^{1/2}/10^3\beta _3\).
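
In more detail, Lyapunov's inequality gives \(\beta _3\ge 1\), so that \(|t|\le N^{1/2}/10^3\beta _3\) implies \(\sigma ^2t^2/2N\le 1\) and \(\beta _3\sigma ^3|t|/6N^{1/2}\le \sigma ^3/(6\cdot 10^3)\). Therefore, using \(1-10^{-3}\le \sigma ^2\le 1\),

$$\begin{aligned} \bigl |1-\sigma ^2t^2/2N\bigr |+\beta _3\sigma ^3|t|^3/6N^{3/2} \le 1-\frac{t^2}{N}\Bigl (\frac{\sigma ^2}{2}-\frac{\sigma ^3}{6\cdot 10^3}\Bigr ) \le 1-\frac{t^2}{4N}. \end{aligned}$$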

The second inequality of (165) follows from the first one via the inequality \(1+x\le e^x\), \(x\in \mathbb {R}\).\(\square \)