# Limit theorems for von Mises statistics of a measure preserving transformation

## Abstract

### Keywords

Von Mises statistic · Measure preserving transformation · Projective tensor product · Ergodic theorem · Hoeffding decomposition · Central limit theorem · Martingale approximation

### Mathematics Subject Classification (2010)

Primary 60F05 60F15 60G10 28D05 28A35 37A30 Secondary 28A33 60A10 60B10 62E20 62G05

## 1 Introduction

### 1.1 Objectives and contents

The present paper aims to extend the theory of von Mises statistics for independent, identically distributed random variables to the realm of strictly stationary processes. Every stationary process will be investigated together with a respective measure preserving transformation of the main probability space. Such a transformation is the only structure used in the present article to establish a Strong Law of Large Numbers (SLLN) for von Mises statistics. The Central Limit Theorem (CLT) and other weak convergence results are treated in the framework of a filtration compatible with the transformation. A stationary process generating such a filtration will appear only in applications. It turns out that a considerable part of the limit theory can be developed on this basis. One of the objectives of the paper is to show that such a relatively modest additional structure creates a suitable setting to apply some form of the martingale approximation; indeed, the latter is our main tool when proving the CLT-type results. Below, we explain another objective of the present work and its results; the latter are collected in four statements.

*kernel*, we investigate, after normalizing appropriately, the asymptotic behavior of random variables

*von Mises statistic* (or a \(V\)-*statistic*) for the transformation \(T\) and the kernel \(f\). Notice that the same class of statistics is determined by *symmetric* kernels, so we will assume that \(f\) is symmetric whenever it is needed.
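To make the object concrete, the following minimal Python sketch evaluates a sum of the type (1) by brute force. It assumes that the indices run over the cube \(0 \le i_1,\ldots ,i_d < n\); this index range, the circle rotation `rotation`, the helper names and the kernel `kernel` are illustrative assumptions and are not taken from the text. The sketch also checks numerically the remark that a kernel and its symmetrization determine the same statistic.

```python
import math
from itertools import permutations, product

ALPHA = (math.sqrt(5.0) - 1.0) / 2.0  # hypothetical rotation angle

def rotation(x):
    """Circle rotation x -> x + ALPHA mod 1; it preserves Lebesgue measure."""
    return (x + ALPHA) % 1.0

def v_statistic(f, T, x, n, d):
    """Sum of f(T^{i_1} x, ..., T^{i_d} x) over 0 <= i_1, ..., i_d < n (assumed range)."""
    orbit = []
    y = x
    for _ in range(n):
        orbit.append(y)
        y = T(y)
    return sum(f(*point) for point in product(orbit, repeat=d))

def symmetrize(f, d):
    """Average of f over all permutations of its d arguments."""
    perms = list(permutations(range(d)))
    def f_sym(*args):
        return sum(f(*(args[i] for i in p)) for p in perms) / len(perms)
    return f_sym

def kernel(x, y):
    # A non-symmetric kernel of degree d = 2, chosen only for illustration.
    return x * y ** 2

v_plain = v_statistic(kernel, rotation, 0.1, 40, 2)
v_sym = v_statistic(symmetrize(kernel, 2), rotation, 0.1, 40, 2)
```

The two values agree up to rounding because the summation domain is invariant under permutations of the indices, so only the symmetric part of the kernel contributes.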

At first glance the summands in (1) can be defined in two steps. Firstly, the functions \((x_1,\ldots ,x_d)\mapsto f(T^{i_1}x_1,\ldots ,T^{i_d}x_d)\) can be obtained using the dynamics coordinatewise; secondly, they should be restricted to the main diagonal of \(X^d\). The second step, however, requires some care. Analysis and clarification of the concept of restriction became another important objective of this work. This is a crucial point determining substantially the approach in the present paper. If \(f: X^d \rightarrow \mathbb R \) is a measurable function on the Cartesian power \((X^d, \mathcal{F }^{\otimes d},\mu ^d)\), it is viewed, as usual, not as an individual function, but rather as an equivalence class of individual functions any two of which agree on some set of measure \(1\). Such an equivalence class, in general, does not have a well-defined restriction to a subset of measure zero, like the main diagonal is in the case of the atomless space \((X,\mathcal{F },\mu )\). However, some equivalence classes may contain individual functions with well-defined restrictions (for example, continuous functions, assuming that \(X\) is the unit interval with the Lebesgue measure \(\mu \)). A simple but important observation made in this article is that suitable nice functions on product probability spaces can be described in purely measure-theoretical terms. The key concept here is the *projective tensor product* of Banach spaces. First we show that, under appropriate assumptions, the elements of a respective abstract Banach space can be viewed as functions from some \(L_p(\mu ^d)\). In particular, every such a function determines an equivalence class discussed above. 
Analogously to the situation with continuous functions, nice representatives (non-unique) can be found within every such equivalence class; in view of specific properties of projective tensor products, they can be represented by absolutely convergent series of the products of functions in separate variables. Furthermore, such ‘special representatives’ can be restricted to the main diagonal in a correct way. Notice that the main diagonal is considered here as a probability measure space whose measure is the image of \(\mu \) under the map \(x \mapsto \underbrace{(x, \ldots ,x\,)}_{d \, \text {times}}\); correctness means here that possible uncertainty in the choice of the restricted function concerns only sets of measure \(0\) on the diagonal. We emphasize that this procedure of ‘naive restriction’ applies to ‘special representatives’ of equivalence classes only. A different choice of representative within the same equivalence class may lead to misunderstandings which can be observed in the literature. In the present paper, however, another approach to the restriction problem is developed. Using general properties of projective tensor products, a *restriction operator* is defined. We will see that this operator agrees with the ‘naive restriction’ in the case of the sums of product functions and their proper limits. On the other hand, for every equivalence class of measurable functions discussed above, the restriction operator can (or cannot) be applied to the *entire equivalence class* and sends it, if applicable, to an equivalence class of functions on the diagonal; thus, no special choice of a representative within the class is needed. Moreover, we show in Proposition 2 that the correct restriction can be obtained as the result of a natural procedure combining approximation and regularization (compare with the Steklov smoothing operators and Theorem 8.4 in [29]).
Finally, we obtain, along with the *correctness of the restriction*, its *continuous dependence on the kernel*; this continuity is critical for our approach. The above discussion introduces the following result which summarizes Lemma 1 and a particular case of Proposition 1 in Sect. 2 where also some information on projective tensor products can be found. We denote by \(L_p(\mu ^d)\) the space \(L_p\bigl (X^d, \mathcal{F }^{\otimes d}, \mu ^d \bigr )\) and by \(|\cdot |_p\) the norm in any space \(L_p\).

**Statement A**

Statement A leads to the following version of the multivariate ergodic theorem (Corollary 2 in Sect. 3).

**Statement B**

Here \(E_{\text {inv}}\) is the conditional expectation operator with respect to the \(\sigma \)-algebra of \(T\)-invariant sets, and \(E^{\otimes d}_{\text {inv},\,\pi }\) is the \(d\)-th projective tensor power of \(E_{\text {inv}}\).

*symmetric Hoeffding decomposition* asserts the existence of operators \(R_m: L_p^{sym}\,(\mu ^d)\rightarrow L_p^{sym}\,(\mu ^m)\) such that every \(f \in L_p^{sym}\,(\mu ^d)\) can be represented in a unique way in the form

In the following Statement C (Theorem 2 in Sect. 7) we assume that \(T\) is an *exact* transformation in the sense that \(\bigcap _{n \ge 1} T^{-n} \mathcal{F } = \mathcal{N }\), where \(\mathcal{N }\) is the trivial sub-\(\sigma \)-field of \(\mathcal{F }\). Let \( E\) denote the expectation operator. Using the Hoeffding decomposition and applying to each of its components the multiparameter martingale-coboundary representation [33], we prove

**Statement C**

This CLT is complemented by Theorem 3 in Sect. 7 which asserts, under weaker assumptions, only the convergence of the first absolute moments (besides the convergence to the Gaussian distribution). Last, in Theorem 4 of Sect. 8, we prove the following distributional result when \(d=2\) and \(f\) is a symmetric *canonical* kernel (that is \(R_0\,f=0\) and \(R_1\,f=0\)).

**Statement D**

The main limit theorems are presented with proofs in Sects. 3, 7 and 8. Section 2 contains necessary preliminary material; in particular, the restriction operator is introduced there. The Hoeffding decomposition and filtrations are discussed, respectively, in Sects. 4 and 5. Section 6 contains the main part of the preparatory work for the rest of the paper. It is here that the martingale decomposition undergoes the projective tensor multiplication, leading from the classical Burkholder martingale inequality to upper bounds for certain multiparameter sums. These bounds allow us (Sect. 7) to neglect the influence of higher degree summands in the Hoeffding decomposition on the asymptotic behavior when proving the CLT in the non-degenerate case. They are also applied in Sect. 8 in the proof of Statement D to show that the contribution of “partial coboundaries” vanishes in the limit; this reduces the proof to the particular case of a kernel with maximal possible martingale difference properties. Some examples (in fact, mostly general results treating entire classes of stationary processes and kernels) are collected in Sect. 9.

The results stated above, along with their modification for the case of invertible transformations (see Remark 8) and the examples in Sect. 9, clearly show that a substantial part of the limit theory for \(V\)-statistics of stationary processes can be developed on the basis of projective tensor products and martingale approximations alone. The latter is presented only in its original primitive form (moreover, only the adapted case is considered). Using more recent developments could substantially relax many assumptions in the paper. Many other limit results can be established similarly or at the expense of small additional efforts. However, we believe that this presentation is more suitable for introducing the subject.

*Remark 1*

For a given function \(f\) defined on \((X^d, \mathcal{F }^{\otimes d},\mu ^d)\), a natural question arises to decide whether \( f \in L_{p,\,\pi }(\mu ^d)\) and to bound its norm. For \(d=2\) and some \(p \in (1, \infty ], p^{\,\prime } \in [1,\infty )\), \(p^{-1}+(p^{\,\prime })^{-1}=1\), an equivalent question is whether the integral operator from \(L_{p^{\,\prime }}\) to \(L_{p}\) with the kernel \(f\) is *nuclear* [49]. There is an extensive literature on the topic, especially on nuclear (or *trace class*: see [47] and also [49] where Exercise 2.12 shows the difference between the complex and the real cases) operators in Hilbert spaces. Criteria for integral operators to be nuclear can be traced back to classical papers of Fredholm and Carleman (see monographs [28, 29] and references therein; in [29] also nuclear operators in Banach spaces are considered). A special class consists of positive semidefinite kernels. For example, the well-known Mercer’s theorem implies \(f \in L_{2,\,\pi }(\mu ^2)\) for such kernels under the additional assumption that \(X\) is a compact space and \(f\) is continuous.

To the best of our knowledge, for \(d \ge 3\), much less literature exists on this topic. The main tool here is the expansion of \(f\) into a functional series whose summands are products of sufficiently regular functions in separate variables \(x_1,\ldots ,x_d\) (see Proposition 6 and Sect. 9 for some examples).

*Remark 2*

The \(U\)-statistics [that is, for symmetric kernels \(f\), the off-diagonal modification of sums (1)] are mentioned but not treated in the present paper. Under some strengthening of our assumptions (the series in (3), (29) and (32) should converge unconditionally; for example, this will be the case if we are in the position to check the assumptions of Proposition 6) the conclusions of Theorems 2, 3 and 4 can be reformulated for \(U\)-statistics. Notice that both advantages of \(U\)-statistics compared to \(V\)-statistics in the i.i.d. case (to be unbiased estimates of the mean value of the kernel with i.i.d. arguments; to require weaker assumptions imposed on the kernel) in general are no longer valid in the dependent case.

### 1.2 Some history and earlier results

The theory of \(U\)- and \(V\)-statistics for i.i.d. variables is well developed (see [2, 15, 17, 35, 36, 40] and references therein). Degenerate von Mises statistics for independent variables have first been treated by von Mises in [52] and Filippova in [27]. Neuhaus [46] proved a functional form of the weak convergence for degenerate kernels of degree \(2\). Although he dealt with the \(U\)-statistics only, the method applies as well to von Mises statistics with properly modified limit distributions. In [23] the functional form of Filippova’s result is obtained with the distributional limit presented by multiple stochastic integrals with respect to the Kiefer–Müller process. Many fine results on \(U\)-statistics (maximal inequalities, large deviations, functional CLT) are included or surveyed in [17] and [45].

*absolutely regular*. Other *mixing conditions* are investigated in [4, 5, 7, 8, 9, 20, 39, 50, 51, 54]. Functionals of absolutely regular processes have been studied in [21]. In [22] these results were used to construct a new type of asymptotically distribution free confidence intervals for the correlation dimension (see [34]). Later many limit results have been considerably improved in [10] and [11] by establishing a functional form of the central limit theorem. In the weakly dependent case we mention the works of Babbel [4, 5] and Amanov [3] where various types of mixing conditions are considered, including strong mixing. The above list is incomplete; more information is contained in the surveys [18] and [19].

Notice that in a recent paper [41], independently of our research, for a certain class of canonical symmetric kernels of degree 2 (in 9.2.1 we call them *martingale kernels*) a limit distribution of \(V\)-statistics is derived which has the same form as in the i.i.d. case. This conclusion agrees with ours in Statement D above; the result in [41] is a rather particular case of our Statement D (see 9.2.1 for more details). The paper [41] and the subsequent papers [42, 43] also develop impressive statistical applications of this and other limit results; some new, compared to [41], limit theorems in [42, 43] are developed by means of methods different from those used in the present paper; the corresponding assumptions about the process include some decay of the Kantorovich distance between the conditional and the unconditional distributions of the process given its past; also some form of the Lipschitz condition is imposed on the kernel. The spectral decomposition of the kernel or, alternatively, its approximation by Lipschitz continuous wavelets are used there to derive the results.^{1}

## 2 Preliminaries

### 2.1 Multiparameter actions

Let \(T\) be a measure preserving transformation of a probability space \((X,\mathcal{F }, \mu )\) (which is assumed to be standard, that is a Lebesgue space in the sense of Rokhlin [48]). For every \(p \in [1,\infty ]\) we set \(L_p(\mu )= L_p(X,\mathcal{F }, \mu ) \), choosing \(\mathbb C \) as the field of scalars and denoting by \(|\cdot |_{p}\) the norm of \(L_p(\,\mu )\). Define an isometry \(V:L_p(\mu )\rightarrow L_p(\mu )\) by the relation \(Vf= f\circ T\). For every \(p \in [1,\infty )\) let \(V^*:L_{p\,'}(\mu )\rightarrow L_{p\,'}(\mu )\) be the adjoint operator of \(V:L_p(\mu )\rightarrow L_p(\mu )\) where \(p^{-1}+p\,'^{-1}=1\). The preadjoint operator (acting in \(L_1(X,\mathcal{F }, \mu ) \)) of the operator \(V: L_{\infty }(\mu )\rightarrow L_{\infty }(\mu )\) will be loosely called the adjoint of \(V\) and denoted by \(V^*\), too, whenever this does not lead to a misunderstanding. Analogous notations and agreements will be applied to other measure spaces, their transformations and related operators.

*exact transformations*). The family of adjoint operators \((V^{\mathbf{n}*})_{\mathbf{n} \in \mathbb Z ^d_+}\) is also a representation of \(\mathbb Z ^d_+\) (by *coisometries* in this case). Note that these two representations do not commute with each other in the noninvertible case (otherwise they clearly commute). However, if \(\mathbf{e}_1, \ldots , \mathbf{e}_d \) denote the standard basis of \(\mathbb Z _+^d\), the operators \(V^{\mathbf{e}_i}\) and \(V^{*\,\mathbf{e}_j}\) commute for \(i \ne j\) because they act on different coordinates in \(X^d\). This will be used in the proof of Lemma 5.

### 2.2 Tensor products and products of functions

We discuss here conditions on kernels under which \(V\)-statistics are well-defined. Recall the concept of the projective tensor product of Banach spaces [16, 49]. The main field is assumed to be \(\mathbb C \) or \(\mathbb R \).

Let \(B_1, \ldots ,B_d\) be Banach spaces with norms \(|\cdot |_{B_1}, \ldots ,|\cdot |_{B_d}\) and let \(B_1 \otimes \cdots \otimes B_d\) be their algebraic tensor product. Elements of \(B_1 \otimes \cdots \otimes B_d\), representable in the form \(f_1\otimes \cdots \otimes f_d\), are called *elementary tensors*. The *projective tensor product* of \(d \ge 2\) Banach spaces, denoted by \(B_1 \hat{\otimes }_{\pi } \cdots \hat{\otimes }_{\pi } B_d\), is, by definition, the completion of the algebraic tensor product with respect to the *projective norm* defined as the supremum of all *cross norms* on \(B_1 \otimes \cdots \otimes B_d\). Recall that a norm on \(B_1 \otimes \cdots \otimes B_d\) is said to be a cross norm whenever it equals \(\prod _{i=1}^d |f_i|_{B_i}\) for every elementary tensor \(f_1\otimes \cdots \otimes f_d.\)

**Lemma 1**

*Proof*

We prove now that \(J_d \,(\,=J_{d,\,p})\) is injective. For \(p=1\) it is so because \(J_{d,1}\) is an isometric isomorphism between its domain and its range ([49], Exercise 2.8). Let now, for some \(p > 1\), \(I_{1,\,p}: L_p(\mu ) \rightarrow L_1(\mu )\) and \(I_{d,\,p}: L_p(\mu ^d) \rightarrow L_1(\mu ^d)\) be the inclusion operators (of norm \(1\) each). By the *metric mapping property* ([16], 12.1) of the projective tensor norm, the inclusion \(I_{1,\,p}\) gives rise to the norm \(1\) mapping \(A_d: L_{p,\,\pi }(\mu ^d)\rightarrow L_{1,\,\pi }(\mu ^d)\) (notice that \(L_{1,\,\pi }(\mu ^d)\) and \(L_1(\mu ^d)\) are identified by \(J_{d,\,1}\)). Since the spaces \(L_p\) have the *approximation property*, the operator \(A_d\) is injective as a projective tensor product of injective operators \(I_{1,\,p}\) (see Corollary 4 (1), subsection 5.8, in [16]; then use induction). Starting with algebraic tensor products and passing, in view of boundedness of all operators involved, to the completions with respect to corresponding norms, we obtain that the mappings \(J_{d,\,1}\, A_d: L_{p,\pi }(\mu ^d) \rightarrow L_1(\mu ^d)\) and \(I_{d,\,p}\, J_{d,\,p}: L_{p,\,\pi }(\mu ^d) \rightarrow L_1(\,\mu ^d)\) agree. Since \(A_d\) and \(J_{d,\,1}\) are injective, so is \(J_{d,\,p}.\) \(\square \)

*Remark 3*

The space \( L_{2,\,\pi }(\mu ^2)\) can be identified with the space of nuclear (or *trace class*) operators from \({L_2(\mu )}^*\) to \(L_2(\mu )\) ([49]). The operator \(J_2\) in Lemma 1 transforms such (integral) operators to their kernels which form a subspace of \( L_{2}(\mu ^2)\).
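A small numerical sketch may help to see what Remark 3 means in the simplest possible situation. It assumes a two-point space \(X=\{0,1\}\) with hypothetical weights `w` and a hypothetical real kernel `f`; it also assumes the finite-dimensional counterpart of the identification in Remark 3, namely that the projective norm of \(f\) equals the nuclear (trace) norm of the matrix \(\bigl (\sqrt{w_i w_j}\, f(i,j)\bigr )\). For a \(2\times 2\) matrix the nuclear norm is computed from the identity \((s_1+s_2)^2 = s_1^2+s_2^2+2 s_1 s_2\) for the singular values.

```python
import math

def weighted_matrix(f, w):
    """M[i][j] = sqrt(w_i w_j) f(i, j): the kernel f transported from L_2(mu) to l_2."""
    return [[math.sqrt(w[i] * w[j]) * f[i][j] for j in range(2)] for i in range(2)]

def frobenius_norm(m):
    """Square root of the sum of squared entries (sum of squared singular values)."""
    return math.sqrt(sum(v * v for row in m for v in row))

def nuclear_norm_2x2(m):
    """Sum of the two singular values, via (s1 + s2)^2 = |m|_F^2 + 2 |det m|."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return math.sqrt(frobenius_norm(m) ** 2 + 2.0 * abs(det))

w = [0.3, 0.7]                     # hypothetical probability weights
f = [[1.0, -2.0], [0.5, 3.0]]      # hypothetical kernel on X^2
m = weighted_matrix(f, w)

norm_l2 = frobenius_norm(m)                           # norm of f in L_2(mu^2)
norm_pi = nuclear_norm_2x2(m)                         # projective norm of f (assumed identification)
diag_l1 = sum(w[i] * abs(f[i][i]) for i in range(2))  # L_1(mu) norm of the diagonal of f
```

In this toy case both `norm_l2` and `diag_l1` are dominated by `norm_pi`, which is consistent with the embedding of Lemma 1 having norm at most \(1\) and with the diagonal being controlled by the projective norm rather than by the \(L_2(\mu ^2)\) norm.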

### 2.3 Restriction to the diagonal

In the following Proposition 1, for every \(p_1,\ldots ,p_d \in [1, \infty ]\) with \(p_1^{-1}+\cdots + p_d^{-1}=1\) and for every \(f \in L_{p_1}(\mu ) \hat{\otimes }_{\pi } \cdots \hat{\otimes }_{\pi } L_{p_d}(\mu ),\) we define a function \(D_df\in L_1(\mu ).\) In the case of \(1 \le p_1= \cdots =p_d=p \le \infty \) the embedding \(J_d\) (Lemma 1) allows us to consider the space \(L_{p}(\mu ) \hat{\otimes }_{\pi } \cdots \hat{\otimes }_{\pi } L_{p}(\mu )\) as a subspace of \(L_{p}(\mu ^d)\) and interpret its elements as functions defined on \(X^d\). Then \(D_d f\) plays the role of the restriction of \(f\) to the principal diagonal \(\{(x_1,\ldots ,x_d): x_1=\cdots =x_d\}\subset X^d\). In this particular case the term ‘restriction’ can be justified by an approximation procedure described in Proposition 2 below.

**Proposition 1**

- (1) the map \(\mathcal{D }\), sending every \(d\)-tuple \((f_1,\ldots ,f_d) \in L_{p_1}(\mu )\times \cdots \times L_{p_d}(\mu )\) to the function $$\begin{aligned} x \mapsto f_1(x) \cdots f_d(x), \end{aligned}$$ is a norm \(1\) \(d\)-linear map from \(L_{p_1}(\mu )\times \cdots \times L_{p_d}(\mu )\) to \(L_r(\mu );\)
- (2) there exists a unique linear map (of norm \(1\)) $$\begin{aligned} D_d: L_{p_1}(\mu ) \hat{\otimes }_{\pi } \cdots \hat{\otimes }_{\pi } L_{p_d}(\mu ) \rightarrow L_r(\mu ) \end{aligned}$$ such that for every \(d\)-tuple \((f_1,\ldots ,f_d) \in L_{p_1}(\mu )\times \cdots \times L_{p_d}(\mu )\) $$\begin{aligned} D_d(f_1 \otimes \cdots \otimes f_d) = \mathcal{D } (f_1, \ldots , f_d). \end{aligned}$$

*Proof*

The first assertion is a consequence of the multiple Hölder inequality (Exercise 6.11.2 in [24]). The second one follows from the linearization property of the projective tensor products with respect to polylinear maps. For the case of bilinear maps see Theorem 2.9 in [49]; for \(d>2\) use induction and associativity. \(\square \)
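The two ingredients of this proof can be checked numerically on a finite probability space. The sketch below is only an illustration under stated assumptions: the space has `m` points with randomly generated weights `w`, the exponents are \(p_1=\cdots =p_d=d\) (so that the product lands in \(L_1\)), and the kernel is a finite sum of elementary tensors with random factors; all names are hypothetical. It computes the pointwise products of the factors, sums them, and compares the \(L_1\) norm of the result with the bound given by the triangle inequality and the multiple Hölder inequality.

```python
import math
import random

def lp_norm(g, w, p):
    """Norm of g in L_p of a finite probability space with weights w."""
    return sum(wi * abs(gi) ** p for gi, wi in zip(g, w)) ** (1.0 / p)

def diagonal_of_tensor(factors):
    """Pointwise product x -> f_1(x) ... f_d(x) of the factors of an elementary tensor."""
    out = [1.0] * len(factors[0])
    for g in factors:
        out = [a * b for a, b in zip(out, g)]
    return out

random.seed(0)
m, d, terms = 6, 3, 4
raw = [random.random() for _ in range(m)]
w = [v / sum(raw) for v in raw]            # hypothetical probability weights

# A finite sum of elementary tensors, each given by its d factors.
tensors = [[[random.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(d)]
           for _ in range(terms)]

# Restriction of the sum to the diagonal = sum of the pointwise products.
restriction = [sum(vals) for vals in zip(*(diagonal_of_tensor(t) for t in tensors))]

lhs = lp_norm(restriction, w, 1)
rhs = sum(math.prod(lp_norm(g, w, d) for g in t) for t in tensors)
```

Here `lhs` never exceeds `rhs`; taking the infimum of `rhs` over all representations of the same kernel as a sum of elementary tensors is what yields a bound by the projective norm.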

If \(p_1=\cdots =p_{d}=p,\) the space \( L_{p,\, \pi }(\mu ^d)= L_{p}(\mu ) \hat{\otimes }_{\pi } \cdots \hat{\otimes }_{\pi } L_{p}(\mu )\) is embedded into \(L_p(\mu ^d)\) by the operator \(J_d\) (Lemma 1); we omit \(J_d\) and treat an \(f \in L_{p,\, \pi }(\mu ^d)\) as a function. For every finite measurable partition \(\mathcal{A }=\{A_1, \ldots , A_m \} \) let us denote by \(\mathcal{F }_{\mathcal{A }}\) the \(\sigma \)-field of all possible unions of atoms of \(\mathcal{A }\) and by \(E(\cdot |\, \mathcal{A })\) the corresponding conditional expectation. Let \((\mathcal{A }_n)_{n\ge 1}\) be a refining sequence of finite measurable partitions \(\mathcal{A }_n=\{A_{1,\,n}, \ldots , A_{m_n\!,\,n} \} \) such that \(\mathcal{F }\) is the smallest \(\sigma \)-field containing all \(\mathcal{F }_{\mathcal{A }_n}, n \ge 1.\) Let \(I_A\) denote the indicator of the set \(A.\)

**Proposition 2**

*Proof*

The following corollary will be used in the proof of Proposition 3.

**Corollary 1**

The restriction operator \(D_d\) preserves positivity of real valued functions.

Thus, the function \(D_d f \in L_r(\mu )\) is a well-defined substitute for the naive restriction of \(f\) to the principal diagonal. For example, for \( \mathbf{n} \! = \! (n_1,\ldots ,n_d)\) the function \(D_d V^{\mathbf{n}}f\) can be viewed as a substitute for the function \( x \mapsto f(T^{n_1}x, \ldots ,T^{n_d}x).\)

## 3 Strong law of large numbers

### 3.1 A multivariate ergodic theorem

If \(T\) is an ergodic transformation of a probability space, a von Mises statistic may be considered as an estimate for the multiple integral of the kernel with respect to the invariant measure. Consistency is one of the desirable statistical properties of a sequence of estimates; this raises the question of an appropriate ergodic theorem. Proposition 3, the main result of this subsection, states such a theorem in a general setting. It asserts, in the ergodic case, the convergence of multiparameter sums (7) to the average of the kernel with respect to the product measure. This is reminiscent of a Wiener-type ergodic theorem ([24], Theorem 8.6.9) specialized to the case of \(d\) one-parameter coordinatewise actions on the product of \(d\) probability spaces. However, not only the assumptions, but also the conclusions in these results are different: unlike the Wiener theorem, our result asserts the convergence for almost all initial points with respect to a probability measure which is in general *neither absolutely continuous* with respect to the product measure (being supported on the main diagonal) *nor invariant* under the multiparameter action.

We do not assume here symmetry of the kernel and perform summation over rectangular coordinate domains (which is common in the multiparameter ergodic theorems, see [24], Chapter 8) rather than over coordinate cubes involved in the definition of \(V\)-statistics. In this subsection we consider several possibly different \(\mu \)–preserving transformations \(T_1, \ldots , T_d\) of the space \((X,\mathcal{F }, \mu )\), using the notation \(T^{(n_1,\ldots ,n_d)}(x_1,\ldots ,x_d)\! =\!(T_1^{n_1}x_1, \ldots ,T_d^{n_d}x_d)\) and \(V^{(n_1,\ldots ,n_d)}f\! = f \circ T^{(n_1,\ldots ,n_d)}.\)

Transformations considered in this subsection in general are not ergodic, so we need some notations to include the non-ergodic case. Recall that \(A \in \mathcal{F }\) is said to be \(T-\)*invariant* if \(T^{-1}A=A\). For every \(l \in \{ 1,\ldots ,d \}\) let \(\mathcal{F }_{inv,\,l}\) denote the \(\sigma \)-field of all \(T_l-\)invariant measurable sets in \((X, \mathcal{F }, \mu )\), and let \(E_{inv,\,l}\) be the corresponding conditional expectation considered as an operator in \(L_{p_{l}}(X, \mathcal{F }, \mu ).\)

**Proposition 3**

*Remark 4*

The next lemma will be used in the proof of Proposition 3.

**Lemma 2**

*Proof*

For the proof we will use the bound in [24], Theorem 8.6.8. Note that this result is the lemma for \(d=1\).

*Proof of Proposition 3*

### 3.2 Applications to the SLLN for von Mises statistics

We return here to the assumption that the transformations \(T_1, \ldots , T_d\) are copies of the same transformation \(T.\) For simplicity we assume that \(T\) is ergodic. Symmetry of the kernel is not assumed.

**Theorem 1**

*Proof*

The theorem follows from Proposition 3. We only need to identify the limits. Since the limit expressions given in Proposition 3 and in the theorem are both continuous in the projective norm, it suffices to check that these expressions agree for elementary tensors \(f_1 \otimes \cdots \otimes f_d\). It is straightforward to check that in the ergodic case both expressions reduce to \(Ef_1 \cdots Ef_d\), where \(E\) denotes the integral with respect to \(\mu .\)\(\square \)
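The reduction to elementary tensors used in this proof can be illustrated numerically. The sketch below assumes an irrational circle rotation as the ergodic transformation, the starting point `0.1`, and two hypothetical trigonometric factors `f1`, `f2` with integrals \(1\) and \(2\); none of these choices come from the text. For the product kernel \(f_1(x_1)f_2(x_2)\), the double sum over a square of indices, divided by \(n^2\), factorizes into a product of two one-parameter ergodic averages, and this product is close to \(Ef_1 \cdot Ef_2 = 2\).

```python
import math

ALPHA = (math.sqrt(5.0) - 1.0) / 2.0  # hypothetical irrational rotation angle

def rotation(x):
    """Ergodic circle rotation x -> x + ALPHA mod 1."""
    return (x + ALPHA) % 1.0

def f1(x):
    return 1.0 + math.cos(2.0 * math.pi * x)   # integral over [0, 1) equals 1

def f2(x):
    return 2.0 + math.sin(2.0 * math.pi * x)   # integral over [0, 1) equals 2

def orbit(x, n):
    """First n points x, Tx, ..., T^{n-1} x of the orbit of x."""
    pts = []
    for _ in range(n):
        pts.append(x)
        x = rotation(x)
    return pts

n = 300
pts = orbit(0.1, n)

# Normalized double sum of the product kernel f1(x1) f2(x2) along the orbit.
v_normalized = sum(f1(a) * f2(b) for a in pts for b in pts) / n ** 2

# The same quantity written as a product of two one-parameter ergodic averages.
avg1 = sum(f1(a) for a in pts) / n
avg2 = sum(f2(b) for b in pts) / n
```

For general kernels no such factorization is available, which is where continuity in the projective norm enters the argument.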

**Corollary 2**

In the case \(p=d\) Theorem 1 applies and gives the convergence with probability 1 and in \(L_1(\mu ).\)

*Remark 5*

Examples show that it is possible to extend the class of kernels to which the conclusion in Corollary 2 applies to such kernels \(f\in L_{p}(\,\mu ^d)\) which can be “sandwiched” between decreasing and increasing sequences of some \( L_{p, \pi }(\,\mu ^d)\)-kernels whose common \( L_{p}(\,\mu ^d)-\)limit is \(f\) (notice that bounding by products plays some role in [1]). This indicates that probably more appropriate functional spaces can be found in order to treat the SLLN.

**Corollary 3**

*Proof*

The series representing \(f\) obviously converges in \( L_{p\,, \,\pi }(\mu ^d) \), and the corollary follows. \(\square \)

## 4 The Hoeffding decomposition

In this section we recall well-known properties of the Hoeffding decomposition for kernels in the spaces \(L_{p\,}\), omitting proofs (see [25] for the proofs in the symmetric case). It is not hard to see that the results and formulas related to this decomposition (both general and symmetric) apply also to the spaces \( {L}_{p, \pi }\) and, in case \(\mu _1=\cdots = \mu _d=\mu \), to their symmetric subspaces.

### 4.1 The Hoeffding decomposition for general kernels

- (i) for every \(S \in \mathcal{S }_d\), \(R_S f \in L_p(\mu ^S);\)
- (ii) for every \(S=\{l_1, \ldots , l_m\} \in \mathcal{S }^m_d\) $$\begin{aligned} (R_S f) \circ \pi _S = Q_S f, \end{aligned}$$ where \(\pi _S:X^d \rightarrow X^S\) is defined by \(\pi _S(x_1,\ldots ,x_d)=(x_{l_1}, \ldots , x_{l_m});\)
- (iii) every \(R_S f\) is *canonical* (or, using an alternate terminology, *totally degenerate*), that is, for every \(l \in S, f \in L_p^{\{1,\ldots ,\,d\}}\) $$\begin{aligned} \check{E}^l\bigl ((R_Sf)\circ \pi _S\bigr )=0. \end{aligned}$$

*degree* \(m\) whenever it does not vanish identically and \(S \in \mathcal{S }^m_d\). Every kernel \(f \in L_{p\,}(\mu ^{\{1, \ldots ,d\}})\) can be represented in a unique way as a sum of canonical kernels (the Hoeffding decomposition) as follows

The *degree* of a kernel \(f\) with decomposition (9) (or the decomposition (10) below) is, by definition, the smallest degree of non-vanishing summands in (9). A kernel \(f\) in (9) is called *degenerate* if the degree of \(f-R_{\emptyset }f\) is greater than \(1\) and *non-degenerate* if it equals \(1\).

### 4.2 The Hoeffding decomposition of symmetric kernels

*symmetric*; their notation will contain the superscript \(sym\); their elements are called *symmetric functions*. The next property of the Hoeffding decomposition is specific for the symmetric case.

- (iv) whenever the function \(f\) belongs to \( L_p^{sym}(\mu ^d),\) the canonical function \( R_S f \) does not depend on the choice of \(S \in \mathcal{S }^m_d \) and is symmetric; thus, in this case there exist operators \( R_m: L_p^{sym}(\mu ^d) \rightarrow L_p^{sym}(\mu ^m)\) such that for every \(S=\{i_1, \ldots , i_m\} \in \mathcal{S }^m_d\) $$\begin{aligned} (R_m f) \circ \pi _S = Q_S f. \end{aligned}$$
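For \(d=2\) the symmetric decomposition can be computed by hand on a finite probability space. The following sketch assumes the standard formulas \(R_0f = \int \!\!\int f\,d\mu ^2\), \(R_1f(x)=\int f(x,y)\,\mu (dy)-R_0f\) and \(R_2f(x,y)=f(x,y)-R_1f(x)-R_1f(y)-R_0f\), which are taken here as an assumption about the normalization; the weights `w` and the kernel `f` are hypothetical. It then checks that the components are canonical, that is, that integrating out one variable gives zero.

```python
def hoeffding_d2(f, w):
    """Components R_0, R_1, R_2 of a symmetric kernel f on a finite space with weights w."""
    m = len(w)
    r0 = sum(w[i] * w[j] * f[i][j] for i in range(m) for j in range(m))
    marginal = [sum(w[j] * f[i][j] for j in range(m)) for i in range(m)]
    r1 = [marginal[i] - r0 for i in range(m)]
    r2 = [[f[i][j] - r1[i] - r1[j] - r0 for j in range(m)] for i in range(m)]
    return r0, r1, r2

w = [0.2, 0.3, 0.5]                # hypothetical probability weights
f = [[1.0, 2.0, 0.0],
     [2.0, -1.0, 4.0],
     [0.0, 4.0, 3.0]]              # hypothetical symmetric kernel of degree 2

r0, r1, r2 = hoeffding_d2(f, w)

# Canonical property: integrating out one variable returns zero.
mean_r1 = sum(w[i] * r1[i] for i in range(3))
partial_r2 = [sum(w[j] * r2[i][j] for j in range(3)) for i in range(3)]
```

In this notation a kernel with `r0 == 0` and all entries of `r1` equal to zero would be canonical of degree \(2\).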

*Remark 6*

## 5 Filtrations: exactness and Kolmogorov property

In the remaining part of the paper we deal with distributional convergence of von Mises statistics for a measure preserving transformation. Our tool here is a kind of martingale approximation. For \(d=1\) this approximation goes back to [30, 32] and [44] (in the latter paper only Harris recurrent Markov chains were considered) and was developed for higher dimensional random arrays in [33].

The additional structure needed is a filtration compatible with the dynamics defined by a measure preserving transformation. From now on we restrict ourselves to a class of measure preserving transformations of probability spaces, which are *exact* [48]. Let \(T\) be a measure preserving transformation of a probability space \((X, \mathcal{F }, \mu ).\) The transformation \(T\) defines a decreasing filtration \( (T^{-k} \mathcal{F })_{k \ge 0}.\) Exactness of \(T\) means that \( \bigcap _{k \ge 0}T^{-k}\mathcal{F } = \mathcal{N },\) where \(\mathcal{N }\) is the trivial \(\sigma \)-field of the space \((X,\mathcal{F }, \mu ).\) As can easily be seen, every exact transformation is ergodic. The standard assumption of the ergodic theory is that \((X,\mathcal{F }, \mu )\) is a Lebesgue space in the sense of Rokhlin. Under this assumption it can be shown that, except for the case of the one point measure space, the Lebesgue space with an exact transformation is an atomless measure space, hence, is isomorphic to the unit interval with the Lebesgue measure. As before, by \( V^* \) we denote the adjoint (for \(p >1\)) and the preadjoint (for \(p=1\)) of the operator \(V.\) As the operator \(V\) acts as an isometry in all \(L_p\) spaces, preserves constants and positivity, the operator \(V^*\) also acts on all these spaces as a contraction which preserves constants and positivity. The operator \(V^*\) is a particular case of a Markov transition operator.
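The properties of \(V\) and \(V^*\) listed above can be seen on a standard example. The sketch below assumes the doubling map \(x \mapsto 2x \bmod 1\) on \([0,1)\) with the Lebesgue measure, a well-known exact transformation that is used here only as an illustration and is not discussed in this section; for it, \(V^*\) averages a function over the two preimages of a point. The function names and the test function `g` are hypothetical.

```python
def T(x):
    """Doubling map x -> 2x mod 1 on [0, 1)."""
    return (2.0 * x) % 1.0

def V(f):
    """Isometry V f = f o T."""
    return lambda x: f(T(x))

def V_star(g):
    """Adjoint of V for the doubling map: average of g over the two preimages of x."""
    return lambda x: 0.5 * (g(x / 2.0) + g((x + 1.0) / 2.0))

def power(op, g, n):
    """Apply the operator op to the function g n times."""
    for _ in range(n):
        g = op(g)
    return g

def g(x):
    return x   # test function with integral 1/2

x0 = 0.3
left_inverse = V_star(V(g))(x0)        # V* V g = g, evaluated at x0
constants = V_star(lambda x: 1.0)(x0)  # V* preserves constants
smoothed = power(V_star, g, 12)(x0)    # (V*)^n g approaches the integral of g
```

For this `g` one has \((V^*)^n g(x) = 2^{-n}x + \tfrac{1}{2}(1-2^{-n})\), so the iterates converge to the constant \(Eg = 1/2\), in line with exactness of the doubling map.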

*Remark 7*

In the rest of the paper we will mainly restrict ourselves to exact transformations. This is just done to simplify the statements of the results and make the notation more convenient. We could easily extend these results to ergodic transformations \(T\) and to kernels \(f \in L_{p}(\mu ^d)\) satisfying the additional condition \(E(f\,|\,\mathcal{F }_1\otimes \cdots \otimes T_l^{-n} \mathcal{F }_l \otimes \cdots \otimes \mathcal{F }_d)\underset{n \rightarrow \infty }{\rightarrow } \check{E}^lf\), \(l=1, \ldots ,d\). Here \(T_l\) is the copy of \(T\) acting on the \(l\)-th coordinate in \(X^d\), \(\check{E}^l\) was defined in Sect. 4.1.

*Remark 8*

The results of the next sections are primarily concerned with exact (hence, non-invertible) transformations; however, they can be converted into some results on invertible transformations furnished with an additional structure. Indeed, assume that an invertible measure preserving \(T\) acts on \((X,\mathcal{F }, \mu )\) and we are given a \(\sigma \)-field \( \mathcal{F }_0 \subset \mathcal{F }\) such that \(T^{-1}\mathcal{F }_0 \supseteq \mathcal{F }_0.\) Then a theory, totally parallel to that we develop in the following sections for the exact case, applies to kernels measurable with respect to \(\mathcal{F }_0^{\otimes d}.\) The restriction of \(T^{-1}\) to \(\mathcal{F }_0\) corresponds to a non-invertible transformation. We leave details of this correspondence to the reader; it will be used when considering applications in Sect. 9. Just notice that the counterpart of exactness for an invertible \(T\) is the property \(\bigcap _{k \ge 0}T^k\mathcal{F }_0=\mathcal{N }\). If, moreover, \(\bigvee _{k \ge 0} T^{-k} \mathcal{F }_0=\mathcal{F }\), the transformation \(T\) is called *Kolmogorov*. Similarly to the exactness property in Remark 7, the Kolmogorov property can be relaxed to the requirement that \(T\) is ergodic and \(f\) satisfies an analogue of the additional condition there.

## 6 Growth rates for multiparameter sums

It follows from Lemma 1 for \(p \in [1, \infty )\) that the space \( L_{p,\, \pi }^{sym}(\,\mu ^m)\) can be identified, using the injective map \(J_m\), with a (non-closed) dense subspace of \(L_{p\,}^{sym}(\,\mu ^m)\). As we warned the reader above, the symbol \(J_m\) will be omitted and the relation \( L_{p,\,\pi }^{sym}(\,\mu ^m) \subset L_{p\,}^{sym}(\,\mu ^m)\) will be assumed instead of \(J_m( L_{p,\, \pi }^{sym}(\,\mu ^m)) \subset L_{p\,} ^{sym}(\,\mu ^m).\) In particular, it makes sense to speak of canonical elements of \( L_{p,\,\pi \,}^{sym}(\,\mu ^m).\)

A noninvertible measure preserving transformation \(T\) of a probability space \((X, \mathcal{F }, \mu )\) has a natural decreasing filtration given by \((T^{-n}\mathcal{F })_{n \ge 0}.\) We shall use the following consequence of the Burkholder inequality.

**Lemma 3**

*Proof*

**Lemma 4**

*Proof*

*Remark 9*

Every \(f\) satisfying the assumptions of the above lemma is \(S\)-canonical in the following sense: since every operator \( V^{* {\mathbf{e}}_l}\) preserves the integrals with respect to the \(l\)–th variable, it follows from (12) that, under the assumptions of Lemma 4, integrating \(f\) over the \(l\)–th variable returns \(0\) whenever \(l \in S.\) This implies the assertion.

The following lemma provides a condition under which the martingale-coboundary decomposition is valid.

**Lemma 5**

*Proof*

The results and the proofs in [33], developed originally for the \(L_p\)–spaces, apply to the \(L_{p, \pi }\)-spaces without any changes. The requirement of *complete commutativity* imposed in [33] on the multiparameter dynamical system and the invariant measure is obviously fulfilled for a direct product with a coordinatewise action which we deal with in the present paper. Hence, by Proposition 3 in [33], the convergence of the series (14) implies that the *Poisson equation* (see [33]) is solvable for \(f\); therefore, we may apply Proposition 1 in [33] to \(f\). Then we obtain the representation (15) with \(A^Sf\) defined by formulas (16), (17) and the assertion on the uniqueness of the summands of the form (16). Notice that the operator \(V^*\) preserves integrals of functions with respect to \(\mu \); as a consequence, every \(V^{*\mathbf{n}}\) maps canonical functions to canonical ones. Being according to (14) a limit of canonical functions, \(g\) is canonical. In view of (17), all \(h^S\) are canonical, too. \(\square \)

**Proposition 4**

*Proof*

**Proposition 5**

*Proof*

Again, since the norm of the operator \(D_m: L_{p, \pi }(\mu ^m) \rightarrow L_r(\mu )\) is \(1\), we only need to prove (21). As \(n_1 \ge 1, \ldots , n_m \ge 1,\) we have \(\prod _{l\, \in \,S} \frac{1}{n_l} \le 1\) for every \(S \in \mathcal{S }_m\). Using this relation along with (15) and (18) we obtain (21) with \(C_{p, m}=\sum _{s=0}^m \binom{m}{s} C_{p,\,m,\,s}\). \(\square \)

The following sufficient condition for convergence of the series in (14) will be used in Sect. 9 when considering applications. Expansion of a kernel into an absolutely convergent series whose summands are products of functions in separate variables is natural in the context of the limit theory of \(U\)- and \(V\)-statistics (see, for example, [9]). Projective tensor products call for using such series to represent arbitrary elements (see Proposition 2.8 in [49]). Neither uniqueness of the representation nor linear independence of the ‘basis’ is assumed. Notice that we used such a decomposition in Corollary 3.

**Proposition 6**

*Proof*

## 7 Central limit theorems in the non-degenerate case

\(N(m, \sigma ^2)\) will denote the Gaussian distribution in \(\mathbb R \) with mean value \(m \in \mathbb R \) and variance \(\sigma ^2\ge 0\) including the case \(\sigma ^2=0\) of the Dirac measure at \(m \in \mathbb R \). We first prove a central limit theorem together with the convergence of the second moments.

**Theorem 2**

*Remark 10*

According to the standard terminology, a kernel \(f\) is called *non-degenerate* if \(R_1f\) does not vanish identically, otherwise \(f\) is called *degenerate*. In the case of i.i.d. variables such non-degeneracy is equivalent to the non-degeneracy of the limit Gaussian distribution using normalization by the constants \(n^ {d-1/2}\). However, in the general stationary dependent case such a *statical non-degeneracy* may occur together with the degeneracy of the limit distribution. This phenomenon can be viewed as a *dynamical degeneracy*.
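The simplest mechanism behind such a dynamical degeneracy can be sketched numerically for \(d=1\). The example below is an illustration under assumptions not taken from the paper: it uses the doubling map \(Tx=2x \bmod 1\) and the coboundary \(h=g\circ T-g\) with \(g(x)=\cos (2\pi x)\). The function \(h\) does not vanish identically, yet its ergodic sums telescope to \(g\circ T^n-g\), so after division by \(\sqrt{n}\) they tend to \(0\). All function names are hypothetical.

```python
import math
import random


def doubling_orbit(n_steps, n_bits=64, seed=0):
    """Return x, Tx, ..., T^{n_steps} x for the doubling map Tx = 2x mod 1.

    The point x is built from i.i.d. fair bits, so T acts as a shift of the
    bit sequence. This avoids the collapse to 0 that occurs when a float is
    doubled repeatedly.
    """
    rng = random.Random(seed)
    bits = [rng.getrandbits(1) for _ in range(n_steps + 1 + n_bits)]
    return [
        sum(bits[k + j] * 2.0 ** -(j + 1) for j in range(n_bits))
        for k in range(n_steps + 1)
    ]


def g(x):
    return math.cos(2 * math.pi * x)


def h(x):
    # Coboundary h = g∘T - g; it is not identically zero.
    return g((2 * x) % 1.0) - g(x)


def normalized_sum(n, seed=0):
    """n^{-1/2} * sum_{k<n} h(T^k x); the sum telescopes to g(T^n x) - g(x)."""
    orbit = doubling_orbit(n, seed=seed)
    return sum(h(x) for x in orbit[:n]) / math.sqrt(n)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, normalized_sum(n))
```

Since \(|g|\le 1\), the normalized sums are bounded by \(2/\sqrt{n}\) up to rounding, which is the degenerate limit in this toy case.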

*Proof*

- (1) \( V_n^{(d)}f_1 \) converges in distribution to \(N(0,d^2\sigma ^2(f)),\)

- (2) \(|V_n^{(d)}f_1|_{2}^{\,2} \underset{n \rightarrow \infty }{\rightarrow }d^2\sigma ^2(f),\)

- (3) \(|V_n^{(d)}\sum _{m=2}^df_m|_{2} \underset{n \rightarrow \infty }{\rightarrow } 0.\)

Under somewhat weaker assumptions we have the following central limit theorem with the convergence of the first absolute moment.

**Theorem 3**

- (1) for every \(m=1, \ldots , d\), \(R_m f \in L_{m,\,\pi }^{sym}(\,\mu ^m)\) and the series
  $$\begin{aligned} \sum _{\begin{array}{c} \mathbf{k} \in \mathbb Z ^m_+\\ \mathbf{0} \varvec{\le }\mathbf{k} \varvec{<} \varvec{\infty } \end{array}} V^{* \mathbf{k}}\,R_m f \, \left( \,\overset{\text {def}}{=}\,\,\,\underset{\begin{array}{c} n_1 \rightarrow \infty \\ \ldots \\ n_m \rightarrow \infty \end{array}}{\mathrm{lim }} \sum _{\begin{array}{c} \mathbf{k} \in \mathbb Z ^m_+\\ \mathbf{0} \varvec{\le }\mathbf{k} \varvec{<} \mathbf{n} \end{array}} V^{* \mathbf{k}} R_m f \right) \end{aligned}\tag{29}$$
  converges in \( L_{m,\, \pi }(\,\mu ^m)\),
- (2) \(R_1 f\) satisfies the relation
  $$\begin{aligned} \left| \sum _{ k\,=\,0}^{n-1} V^kR_1 f\right| _1 =O(\sqrt{n})\qquad \text{ as } n \rightarrow \infty . \end{aligned}\tag{30}$$

*Proof*

- 1) for some \(\sigma (f)\ge 0\), \( V_n^{(d)}f_1 \) converges in distribution to \(N(0,d^2\sigma ^2(f)),\)

- 2) \(|V_n^{(d)}f_1|_1 \underset{n \rightarrow \infty }{\rightarrow }{d\sqrt{\frac{2}{\pi }}}\,\sigma (f),\)

- 3) \(|V_n^{(d)}\sum _{m=2}^df_m|_1 \underset{n \rightarrow \infty }{\rightarrow } 0.\)
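The constant in the second item is the first absolute moment of the limit law: if \(Z\) has distribution \(N(0,s^2)\), then \(E|Z|=s\sqrt{2/\pi }\), and here \(s=d\,\sigma (f)\). The following sketch checks this identity by a plain trapezoid rule; the function name, grid size and cutoff are illustrative choices, not part of the paper.

```python
import math


def abs_moment_gaussian(s, n_grid=200_000, cutoff=12.0):
    """Trapezoid approximation of E|Z| for Z ~ N(0, s^2)."""
    if s == 0:
        return 0.0  # Dirac measure at 0
    a = cutoff * s  # integrate over [-a, a]; the tails beyond are negligible
    step = 2 * a / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        x = -a + i * step
        weight = 0.5 if i in (0, n_grid) else 1.0
        total += weight * abs(x) * math.exp(-x * x / (2 * s * s))
    return total * step / (s * math.sqrt(2 * math.pi))


if __name__ == "__main__":
    d, sigma = 3, 0.5  # hypothetical values of d and sigma(f)
    print(abs_moment_gaussian(d * sigma), d * math.sqrt(2 / math.pi) * sigma)
```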

*Remark 11*

## 8 A limit theorem for canonical kernels of degree 2

Apart from non-degenerate kernels of the previous section, a different type of von Mises statistics emerges from canonical symmetric kernels of degree \(d \ge 2\). Limit distributions of \(V\)-statistics defined by such kernels are usually described in terms of series (or polynomials) in Gaussian variables, or in terms of multiple stochastic integrals. In the case of \(V\)-statistics of dependent variables some descriptions of the limits in terms of dependent Gaussian variables or non-orthogonal stochastic integrals are known [7, 8, 26]. A rather attractive way is to present the limit distribution, as in the i.i.d. case, in terms of *independent* Gaussian variables. This will be done below in the case \(d=2\) and is based on the *diagonalization* of the symmetric kernel. The point is that the diagonalization here is applied, instead of the original kernel, to a *martingale kernel* which emerges as a leading summand in the *martingale-coboundary representation* of the original kernel. Notice that the diagonalization of martingale kernels is also used in [41]; in the present work, however, martingale kernels are considered as a subclass to which the study of much more general kernels is reduced.
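To make the role of diagonalization concrete, here is a small sanity check in the i.i.d. case only; it does not reproduce the dependent setting or the martingale kernel of this paper. It assumes a hypothetical finite-rank canonical kernel \(f(x,y)=\sum _k \gamma _k\,\epsilon _k(x)\epsilon _k(y)\) on \([0,1]\) with \(\epsilon _k(x)=\sqrt{2}\cos (2\pi kx)\) and the normalization \(n^{-1}\). Under these assumptions the \(V\)-statistic equals \(\sum _k \gamma _k \bigl (n^{-1/2}\sum _i \epsilon _k(X_i)\bigr )^2\) exactly, which is why limits of the form \(\sum _k \gamma _k Z_k^2\) with independent standard Gaussian \(Z_k\) appear for independent observations.

```python
import math
import random

# Hypothetical eigenvalues gamma_1, gamma_2, gamma_3 of the kernel.
GAMMAS = [1.0, -0.5, 0.25]


def eps(k, x):
    """Orthonormal, mean-zero functions in L_2[0, 1]."""
    return math.sqrt(2.0) * math.cos(2 * math.pi * k * x)


def kernel(x, y):
    """Diagonalized symmetric canonical kernel of degree 2."""
    return sum(
        gam * eps(k, x) * eps(k, y) for k, gam in enumerate(GAMMAS, start=1)
    )


def v_statistic(sample):
    """n^{-1} * sum over all pairs (i, j), diagonal terms included."""
    n = len(sample)
    return sum(kernel(x, y) for x in sample for y in sample) / n


def diagonal_form(sample):
    """sum_k gamma_k * (n^{-1/2} * sum_i eps_k(X_i))^2."""
    n = len(sample)
    total = 0.0
    for k, gam in enumerate(GAMMAS, start=1):
        s_k = sum(eps(k, x) for x in sample) / math.sqrt(n)
        total += gam * s_k ** 2
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.random() for _ in range(100)]
    print(v_statistic(xs), diagonal_form(xs))
```

The two numbers agree up to rounding because the identity is purely algebraic; the probabilistic content lies entirely in the joint limit of the normalized sums.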

We assume that \(f=f_2\) in terms of the Hoeffding decomposition for symmetric kernels (see Remark 6 in Sect. 4.2). Let \(\theta \) denote the involution in \((X^2\!,\mathcal{F }^{\otimes 2}\!\!,\mu ^2)\) interchanging the multipliers in the Cartesian product. We consider the spaces \( L_{2,\,\pi }(\,\mu ^2)\) and \(L_{2,\,\pi }^{sym}(\,\mu ^2)\) as embedded in \(L_2(\,\mu ^2).\)

**Proposition 7**

*Proof*

**Theorem 4**

*Proof*

## 9 Exemplary applications

In this section we show how the results of the present paper can be applied in situations familiar to specialists in limit theorems for dynamical systems or weakly dependent random variables. We develop only a few of all possible applications and we do not optimize our assumptions. Instead, we show how certain earlier known and some new results can be deduced from ours. Applications of Theorem 1 were given in Corollaries 2 and 3.

### 9.1 Doubling transformation

**Translation-invariant kernels** Let now \(f \in L^2(\mu ^2)\) be of the form

**General kernels** Consider now (compare Proposition 6) a general kernel \(f \in L_2(X^2,\mathcal{F }^{\otimes 2}, \mu ^2)\) with Fourier expansion

*Remark 12*

In this subsection we gave applications of our results to the simplest example of a *differentiable expanding map*. This is based on the group structure of the example and its Fourier analysis. A more general approach can be developed on the basis of the *transfer operator* (\(V^*\) in our setting) restricted to some spaces of nice (smooth, Hölder or Sobolev) functions.
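For the doubling map \(Tx=2x \bmod 1\) on the unit interval with Lebesgue measure, the transfer operator is \(V^*f(x)=\tfrac{1}{2}\bigl (f(\tfrac{x}{2})+f(\tfrac{x+1}{2})\bigr )\), and on the characters \(e_k(x)=e^{2\pi ikx}\) it acts by \(V^*e_k=e_{k/2}\) for even \(k\) and \(V^*e_k=0\) for odd \(k\), while \(Ve_k=e_{2k}\). The sketch below uses this to compute, for a mean-zero trigonometric polynomial \(f\), the one-parameter decomposition \(f=h+(u\circ T-u)\) with \(V^*h=0\). It is only the \(d=1\) analogue of the multiparameter representation used in the paper; functions are stored as dictionaries of Fourier coefficients, and all names are hypothetical.

```python
def koopman(f):
    """V: e_k -> e_{2k}, i.e. composition with the doubling map."""
    return {2 * k: c for k, c in f.items()}


def transfer(f):
    """V*: e_k -> e_{k/2} for even k, and e_k -> 0 for odd k."""
    return {k // 2: c for k, c in f.items() if k % 2 == 0}


def add(f, g, sign=1):
    """Return f + sign * g, dropping zero coefficients."""
    out = dict(f)
    for k, c in g.items():
        out[k] = out.get(k, 0) + sign * c
    return {k: c for k, c in out.items() if c != 0}


def poisson_solution(f):
    """g = sum_{n >= 0} V*^n f, which solves g - V* g = f.

    The sum is finite for a mean-zero trigonometric polynomial, because
    every nonzero frequency becomes odd after finitely many halvings.
    """
    if f.get(0, 0) != 0:
        raise ValueError("f must have zero mean")
    g, term = {}, dict(f)
    while term:
        g = add(g, term)
        term = transfer(term)
    return g


def martingale_coboundary(f):
    """Return (h, u) with f = h + (V u - u) and V* h = 0."""
    g = poisson_solution(f)
    u = transfer(g)
    h = add(g, koopman(u), sign=-1)
    return h, u


if __name__ == "__main__":
    f = {4: 0.5, -4: 0.5}  # f(x) = cos(8*pi*x)
    h, u = martingale_coboundary(f)
    print("h =", h)
    print("u =", u)
```

For \(f(x)=\cos (8\pi x)\) the loop terminates after three steps. Restricting \(V^*\) to spaces of smooth or Hölder functions, as mentioned in the remark, replaces this finite computation by a convergent series.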

### 9.2 Stationary processes (martingale kernels, mixing conditions, Markov processes)

Let \(\xi =(\xi _n)_{n \in \mathbb Z }\) be an ergodic stationary random process defined on the space \((X, \mathcal{F }, \mu )\) where an invertible measure preserving transformation \(T\) acts so that \(\xi _{n+1}=\xi _n\circ T,\, n \in \mathbb Z \). We assume that all \(\xi _n\) take values in a probability space \( (Y, \mathcal{G }, \nu )\), \(\nu \) being the common distribution of \((\xi _n)_{n \in \, \mathbb Z }\). Let \((X^d, \mathcal{F }^{\otimes d}, \mu ^d)\) be the \(d\)-th Cartesian power of \((X, \mathcal{F }, \mu )\) with the coordinatewise action of \((T^{\mathbf{n}})_{\mathbf{n} \in \, \mathbb Z ^d}\) and the corresponding operators \((V^{\mathbf{n}})_{\mathbf{n} \in \, \mathbb Z ^d}\); let, furthermore, \((\xi _n^{(i)})_{n \in \mathbb Z }\), \(1 \le i \le d\), be independent copies of \((\xi _n)_{n \in \mathbb Z }\) defined on \((X^d, \mathcal{F }^{\otimes d}, \mu ^d)\) so that \(\xi _n^{(i)} (x_1,\ldots ,x_d)=\xi _n(x_i)\), where \(x_1,\ldots ,x_d \in X\), \(1\le i \le d\), \(n \in \mathbb Z .\) Assume now that we are given some \(F \in L_{p,\,\pi }(Y^d,\mathcal{G }^{\otimes d}, \nu ^d) \) for some \(d \in \mathbb N \) and \(p \in [1,\infty )\). Then \(f=F(\xi _0^{(1)}, \ldots ,\xi _0^{(d)}) \in L_{p,\,\pi }(X^d,\mathcal{F }^{\otimes d}, \mu ^d)\), \(F(\xi _{n_1}^{(1)},\ldots ,\xi _{n_d}^{(d)})=V^{\mathbf{n}}f\) and \(F(\xi _{n_1},\ldots ,\xi _{n_d})=D_d V^{\mathbf{n}}f\) for every \(\mathbf{n}=(n_1, \ldots , n_d) \in \mathbb Z ^d\).

In the rest of the paper, instead of saying that an assertion of the previous part of the paper applies to a kernel \(f\) and a transformation \(T\), we will usually say that this assertion applies to the kernel \(F\) (the process \(\xi \) will be omitted).
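In computational terms, the quantities \(F(\xi _{n_1},\ldots ,\xi _{n_d})=D_dV^{\mathbf{n}}f\) are simply the kernel evaluated at all \(d\)-tuples of observations. A minimal sketch of the corresponding unnormalized multiple sum over \(0\le n_1,\ldots ,n_d<n\) is given below. The normalization is deliberately left out, since it depends on the theorem being applied, and the function name `v_sum` is hypothetical.

```python
from itertools import product


def v_sum(F, xs, d):
    """Sum of F(xs[n_1], ..., xs[n_d]) over all 0 <= n_1, ..., n_d < len(xs).

    Diagonal terms (repeated indices) are included, as is usual for von Mises
    statistics, in contrast to U-statistics. The cost is len(xs) ** d kernel
    evaluations.
    """
    n = len(xs)
    return sum(
        F(*(xs[i] for i in idx)) for idx in product(range(n), repeat=d)
    )


if __name__ == "__main__":
    observations = [1, 2, 3]  # stands for xi_0, xi_1, xi_2
    print(v_sum(lambda a, b: a * b, observations, 2))
```

For a product kernel \(F(y_1,y_2)=g(y_1)g(y_2)\) the double sum factorizes into \(\bigl (\sum _k g(\xi _k)\bigr )^2\), which gives a convenient correctness check.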

#### 9.2.1 Martingale kernels

Let \(d=2\). Set \(\mathcal{F }_0=\sigma (\xi _0,\xi _{-1},\ldots ),\) the \(\sigma \)-field generated by \(\xi _0,\xi _{-1},\ldots \), and \(\mathcal{F }_n = T^{-n}\mathcal{F }_0=\sigma (\xi _n,\xi _{n-1},\ldots )\). Assume that \(f= F(\xi _0^{(1)},\xi _0^{(2)})\) is a canonical kernel. Obviously, it is measurable with respect to \(\mathcal{F }_0^{(1)} \otimes \mathcal{F }_0^{(2)}\bigl (\,\overset{\text {def}}{=} \sigma (\xi _0^{(1)},\xi _{-1}^{(1)},\ldots ,\xi _0^{(2)}, \xi _{-1}^{(2)},\ldots )\bigr )\).

*symmetric* and satisfies \(E\bigl (F(\xi _0^{(1)}\!,\,\xi _0^{(2)})|\,\mathcal{F }_{-1}^{(1)} \otimes \mathcal{F }_{0}^{(2)}\bigr )=0. \) This implies that a non-vanishing summand may appear in the above sum only for \(i_1=i_2=0\), so we have nothing more to check in this case.

#### 9.2.2 Processes satisfying mixing conditions

*mixing coefficients* by setting

In the rest of 9.2.2 we show how Proposition 6 (more precisely, its analogue for an invertible \(T\)) can be used to apply the results of the paper to \(V\)-statistics of a process \(\xi \) with suitable mixing properties.

*the invertible version of Proposition* 6 *applies to the kernel* \(f:(x_1,\ldots ,x_m) \mapsto F(\xi _0(x_1),\ldots ,\xi _0(x_m))\) *with some* \(p \in [1,\infty ]\) *and the system* \((e_k)_{k=0}^{\infty }\) *if, for a certain* \(q \in [p,\infty ]\), *the system* \((\epsilon _k)_{k=0}^{\infty }\) *satisfies the conditions* (41), \(F \in L_{q}(\,Y^m,\mathcal{G }^{\otimes m}, \nu ^m) \) *admits the representation* (43), *satisfying* (45), *and we have* \(M_{q,\,p}< \infty \) *for the process* \(\xi \).

We now indicate conditions (stated in terms of \(\alpha , \varphi \) and \(\psi \)) under which Theorems 2, 3 and 4 of the paper, in their invertible forms numbered 2\(^{\,\prime }\), 3\(^{\,\prime }\) and 4\(^{\,\prime }\), apply to an \(F\). Theorem 3 needs more substantial changes in the case of the mixing coefficient \(\varphi \). Below \((\epsilon _k)_{k \ge 0}\) is a system satisfying (41) with some parameter \(q\).

**(a)** Let \(q\in [2d,\infty ]\). We will use (38), (39) and (40), substituting there, in place of the pair \((q,p)\), the pair \((q,2d)\); we will employ Proposition 6 and formulas (42), (44) with \(p=2d\). Theorem 2\(^{\,\prime }\) applies to an \(F \in L_{2}^{sym}(\,\nu ^d) \) if

- (1) at least one of the series
  $$\begin{aligned} \sum _{n \ge 0} \alpha (n)^{(2d)^{-1}-q^{-1}},\quad \sum _{n \ge 0}\varphi (n)^{1-q^{-1}},\quad \sum _{n \ge 0} \psi (n) \end{aligned}\tag{46}$$
  converges (for \(q=2d\) the convergence of the \(\alpha \)-series means that \(\alpha (n)=0\) for \(n \ge n_0\)), and
- (2) for every \(m =2, \ldots ,d\), \(R_m F\) belongs to \(L_q^{sym}(\nu ^m)\) and admits the representation
  $$\begin{aligned} R_m F(y_1,\ldots ,y_m)= \sum _{\mathbf{0} \varvec{<} \mathbf{k} \varvec{<}\varvec{\infty }} \lambda ^{R_m F}_{\mathbf{k}}\, \epsilon _{k_1}(y_1)\cdots \epsilon _{k_m}(y_m) \end{aligned}\tag{47}$$
  where the coefficients satisfy \( \sum _{\mathbf{0} \, \varvec{<}\mathbf{k} \, \varvec{<}\,\varvec{\infty }} |\,\lambda ^{R_m F}_{\mathbf{k}}\,|\, < \infty \).

Under condition 2) with \(q = 2d\) Theorem 2\(^{\,\prime }\) applies, in particular, if \( \sum _{n \ge 0}\varphi (n)^{1- (2d)^{-1}} < \infty \).

**(b)** To simplify the statements involving \(\varphi \) assume that \(d \ge 2\). Let \(q\in [d,\infty ]\). We will use (38), (39) and (40), substituting there, in place of the pair \((q,p)\), the pair \((q,d)\); we will employ Proposition 6 and formulas (42), (44) with \(p=d\). Theorem 3\(^{\,\prime }\) applies to an \(F \in L_{1}^{sym}(\,\nu ^d) \) if

- (1) at least one of the series
  $$\begin{aligned} \sum _{n \ge 0} \alpha (n)^{d^{-1}-q^{-1}},\quad \sum _{n \ge 0}\varphi (n)^{1-q^{-1}},\quad \sum _{n \ge 0} \psi (n) \end{aligned}\tag{48}$$
  converges (if \(q=d\) the convergence of the \(\alpha \)-series means that \(\alpha (n)=0\) for \(n \ge n_0\));
- (2) \(R_1 F\) satisfies the relation (30):
  $$\begin{aligned} \left| \sum _{k=0}^{n-1}(R_1 F)\circ \xi _k\right| _1\,\,=\,\, O(\sqrt{n}); \end{aligned}$$
- (3) for every \(m =2, \ldots ,d\), \(R_m F\) belongs to \(L_q^{sym}(\nu ^m)\) and admits the representation
  $$\begin{aligned} R_m F(y_1,\ldots ,y_m)= \sum _{\mathbf{0} \varvec{<} \mathbf{k} \varvec{<}\varvec{\infty }} \lambda ^{R_m F}_{\mathbf{k}}\, \epsilon _{k_1}(y_1)\cdots \epsilon _{k_m}(y_m) \end{aligned}\tag{49}$$

Under conditions 2) and 3) Theorem 3\(^{\,\prime }\) applies, in particular, if \( q=2d\) and \( \sum _{n\, \ge \, 0}\alpha (n)^{1/(2d)} < \infty \).
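The exponents in these particular cases follow by substituting \(q=2d\) into the series (46) and (48); the short computation is recorded here only for the reader's convenience:

$$\begin{aligned} \text{in (48):}\quad &\frac{1}{d}-\frac{1}{q}\,\Big |_{q=2d}=\frac{1}{d}-\frac{1}{2d}=\frac{1}{2d},\\ \text{in (46):}\quad &1-\frac{1}{q}\,\Big |_{q=2d}=1-\frac{1}{2d}. \end{aligned}$$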

**(c)** Theorem 4 leads to a result on mixing processes in the following way. Let \(F \in L^{sym}_{2,\,\pi }(Y^2,\mathcal{G }^{\otimes 2}, \nu ^2) \) be a canonical function. Hence, it is the kernel of a *nuclear* (or *trace class*) symmetric integral operator in \(L_2(\nu )\) vanishing on constant functions. The general theory says that in \(L_2(\nu )\) there exists an orthogonal normalized sequence \(\epsilon _0\equiv 1,\epsilon _1,\ldots \) and a real sequence \(\gamma _1, \gamma _2, \ldots \) such that

*Remark 13*

The last assertion under the assumption \(\sum _{n \ge 0}\,\varphi (n)^{1/2} < \infty \) is, up to inessential details, Theorem 5 in [26]. In [9] the authors express doubts about the correctness in [26] of substituting a dependent process into the function (50). Our conclusion agrees with that of [26]. In our paper the correctness is a simple consequence of general properties of projective tensor products. However, an elementary argument shows that the series (50) converges absolutely in \(L_1(X^2,\kappa )\) where \(\kappa \) is an arbitrary probability on \(X^2\) with one-dimensional marginals \(\mu \).

#### 9.2.3 Discrete time Markov processes

Let \(\xi =(\xi _n)_{n \, \in \mathbb Z }\) be a stationary Markov process defined on the space \((X, \mathcal{F }, \mu )\) where an invertible measure preserving transformation \(T\) acts so that \(\xi _{n+1}=\xi _n\circ T,\, n \in \mathbb Z \). We assume that all \(\xi _n\) take values in a probability space \( (Y, \mathcal{G }, \nu )\), \(Y\) being the *state space* of \(\xi \) and \(\nu \) its *stationary distribution*. We will use the notations \(\mathcal{F }_k\), \(\mathcal{F }^k\), \(\mathcal{F }(k)\), \(E_k,E^k, E(k)\) and \(E\) as introduced above.

Let \(Q\) be the *transition operator* of \(\xi \) acting on every space \(L_p(\nu ), 1 \le p \le \infty ,\) with norm \(1\) and satisfying \(E_k f(\xi _{k+1})=(Qf)(\xi _k)\) for every \(f \in L_1(\nu )\) and \(k \in \mathbb Z \). Assuming \(\mathcal{F } =\sigma (\xi _l, l \in \mathbb Z )\), the process \(\xi \) (that is, the transformation \(T\)) is ergodic if and only if for the transition operator \(Q: L_2(\nu ) \rightarrow L_2(\nu )\) every solution to the equation \(Qf=f\) is a constant. To stay within the assumptions of the present paper we assume the stronger relation \(Q^n h \underset{n \rightarrow \infty }{\rightarrow } \int h (y)\,\nu (dy)\) for \(h \in L_1(\nu )\), which implies the Kolmogorov property of \(\xi \).

Let \(d \ge 1\) and \((\epsilon _k)_{k=0}^{\infty }\) be a sequence of functions satisfying (41) with \(q=2d\). Let \(I_{\nu }\) denote the identity operator in every space \(L_{q}(\nu )\). Assume that for some \(C > 0\) and every \(k \ge 1\) the equation \((I_{\nu }-Q)\phi _k=\epsilon _k\) is solvable and \(|\,\phi _k\,|_{2d}\le C \) (notice that the latter condition is fulfilled if the restriction \((I_{\nu }-Q)|_{L^0_{2d}}\) is invertible, \(L^0_{2d}\) denoting the subspace of functions in \(L_{2d}\) with integral \(0\)). Let \(F\in L_{2}\,(Y^d,\mathcal{G }^{\otimes d}, \nu ^d)\) satisfy assumption 2) of paragraph **(a)** in 9.2.2 with \(q=2d\). Let, finally, the equation \((I_{\nu }-Q)g=R_1F\) have a solution \(g \in L_2(\nu )\). Then Theorem 2\(^{\,\prime }\) applies to \(f=F(\xi _0^{(1)}, \ldots ,\xi _0^{(d)})\).
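The solvability conditions above can be made tangible on a finite state space, where \(Q\) is a stochastic matrix acting on column vectors. The sketch below is an illustration under assumptions not made in the paper: the chain is finite, irreducible and aperiodic, so that \(Q^n h\) converges geometrically and the solution of \((I_{\nu }-Q)g=h\) for a centered \(h\) can be computed as the truncated series \(g=\sum _{n \ge 0} Q^n h\). The matrix `P` and all function names are hypothetical.

```python
def apply_Q(P, f):
    """(Qf)(i) = sum_j P[i][j] * f(j): the transition operator on functions."""
    return [sum(p * x for p, x in zip(row, f)) for row in P]


def stationary_distribution(P, n_iter=200):
    """Approximate nu with nu P = nu by iterating nu -> nu P from uniform."""
    n = len(P)
    nu = [1.0 / n] * n
    for _ in range(n_iter):
        nu = [sum(nu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return nu


def poisson_solution(P, h, n_terms=200):
    """Truncated series g = sum_{n < n_terms} Q^n h for a nu-centered h."""
    nu = stationary_distribution(P)
    if abs(sum(a * b for a, b in zip(nu, h))) > 1e-9:
        raise ValueError("h must have zero mean under nu")
    g = [0.0] * len(h)
    term = list(h)
    for _ in range(n_terms):
        g = [a + b for a, b in zip(g, term)]
        term = apply_Q(P, term)
    return g


if __name__ == "__main__":
    # Hypothetical three-state chain with stationary law (1/4, 1/2, 1/4).
    P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
    h = [1.0, 0.0, -1.0]
    g = poisson_solution(P, h)
    residual = [gi - qi - hi for gi, qi, hi in zip(g, apply_Q(P, g), h)]
    print("g =", g)
    print("residual of (I - Q) g = h:", residual)
```

In this example \(h\) is an eigenvector of \(Q\) with eigenvalue \(1/2\), so the series sums to \(g=2h\). In the general setting of the paper no such spectral information is assumed, and the existence of \(g \in L_2(\nu )\) is a hypothesis.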

## Notes

### Acknowledgments

The authors would like to thank Herold Dehling for several discussions clarifying many aspects of limit distributions for \(V\)-statistics and for his encouragement to get this paper written. Also comments by referees were very helpful. The research was supported by the Deutsche Forschungsgemeinschaft under Grant Number 436 RUS 113/962/0-1 and the Russian Foundation for Basic Research under Grant Number 09-01-91331-NNIO-a. Manfred Denker was also partially supported by the National Science Foundation under Grant Number DMS-10008538. Mikhail Gordin was also partially supported by the Grant Number 13-01-00256-a of the Russian Foundation for Basic Research and by the grant Number NS-1216.2012.1 for the Support of Scientific Schools. He thanks Axel Munk (Institute for Mathematical Stochastics) and Laurent Bartholdi (Mathematical Institute) for their hospitality at the University of Göttingen where a part of this paper was prepared.

### References

- 1. Aaronson, J., Burton, R., Dehling, H., Gilat, D., Hill, T., Weiss, B.: Strong laws for \(L\)- and \(U\)-statistics. Trans. Am. Math. Soc. **348**(7), 2845–2866 (1992)
- 2. Arcones, M.A., Giné, E.: Limit theorems for \(U\)-processes. Ann. Probab. **21**(3), 1494–1542 (1993)
- 3. Amanov, A.K.: On the limit distribution of the Mises functional with degenerate kernel for dependent random variables. (Russian) Probabilistic models and mathematical statistics, Collect. Artic., Tashkent, 7–14 (1987)
- 4. Babbel, B.: Schwache Invarianzprinzipien für verallgemeinerte von Mises Funktionale und U-Statistiken von Funktionalen schwach abhängiger Prozesse im Mehrstichprobenfall. (Weak invariance principles for generalized von Mises functionals and U-statistics of functionals of weakly dependent processes in the multisample case) (German). Göttingen (FRG): Univ. Göttingen, Mathematisch-Naturwissenschaftlicher Fachbereich, Diss (1987)
- 5. Babbel, B.: Invariance principles for \(U\)-statistics and von Mises functionals. J. Stat. Plann. Inference **22**(3), 337–354 (1989)
- 6. Billingsley, P.: The Lindeberg–Levy theorem for martingales. Proc. Am. Math. Soc. **12**, 788–792 (1961)
- 7. Borisov, I.S., Bystrov, A.A.: Stochastic integrals and asymptotic analysis of canonical von Mises statistics based on dependent observations. High dimensional probability. IMS Lecture Notes Monograph Series, vol. 51, pp. 1–17. Inst. Math. Statist., Beachwood (2006)
- 8. Borisov, I.S., Bystrov, A.A.: Limit theorems for canonical von Mises statistics constructed from dependent observations. (Russian). Sibirsk. Mat. Zh. **47**(6), 1205–1217 (2006); transl. in Siberian Math. J. **47**(6), 980–989 (2006)
- 9. Borisov, I.S., Volod’ko, N.V.: Orthogonal series and limit theorems for canonical \(U\)- and \(V\)-statistics of stationarily connected observations. (Russian) Mat. Tr. **11**(1), 25–48 (2008); Transl. in Siberian Adv. Math. **18**(4), 242–257 (2008)
- 10. Borovkova, S., Burton, R., Dehling, H.: Limit theorems for functionals of mixing processes with applications to \(U\)-statistics and dimension estimation. Trans. Am. Math. Soc. **353**(11), 4261–4318 (2001) (electronic)
- 11. Borovkova, S., Burton, R., Dehling, H.: From dimension estimation to asymptotics of dependent \(U\)-statistics. In: Limit Theorems in Probability and Statistics, vol. 1, pp. 201–234 (Balatonlelle, 1999). János Bolyai Mathematical Society, Budapest (2002)
- 12. Bradley, R.C.: On some results of M. I. Gordin: a clarification of a misunderstanding. J. Theor. Probab. **1**(2), 115–119 (1988)
- 13. Burkholder, D.L.: Martingale transforms. Ann. Math. Stat. **37**, 1494–1504 (1966)
- 14. Davydov, Yu.A.: Convergence of distributions generated by stationary stochastic processes (Russian). Prob. Theory Appl. **13**(4), 730–737 (1968)
- 15. Dehling, H., Denker, M., Philipp, W.: Invariance principles for von Mises and \(U\)-statistics. Z. Wahrsch. Verw. Gebiete **67**(2), 139–167 (1984)
- 16. Defant, A., Floret, K.: Tensor norms and operator ideals. North-Holland Mathematics Studies, vol. 176. North-Holland Publishing Co., Amsterdam (1993)
- 17. de la Peña, V.H., Giné, E.: Decoupling. From dependence to independence. Randomly stopped processes. \(U\)-statistics and processes. Martingales and beyond. Probab. Appl. (New York), Springer, New York (1999)
- 18. Dehling, H., Taqqu, M.S.: The limit behavior of empirical processes and symmetric statistics for stationary sequences. In: Proceedings of the 46th Session of the International Statistical Institute, vol. 4 (Tokyo, 1987). Bull. Inst. Intern. Stat. **52**(4), 217–234 (1987)
- 19. Dehling, H.: Limit theorems for dependent \(U\)-statistics. Dependence in probability and statistics. Lecture Notes in Statist, vol. 187, pp. 65–86. Springer, New York (2006)
- 20. Dehling, H., Wendler, M.: Central limit theorem and a bootstrap for \(U\)-statistics of strongly mixing data. J. Mult. Anal. **101**, 126–137 (2010)
- 21. Denker, M., Keller, G.: On \(U\)-statistics and v. Mises’ statistics for weakly dependent processes. Z. Wahrsch. Verw. Gebiete **64**(4), 505–522 (1983)
- 22. Denker, M., Keller, G.: Rigorous statistical procedures for data from dynamical systems. J. Stat. Phys. **44**(1–2), 67–93 (1986)
- 23. Denker, M., Grillenberger, C., Keller, G.: A note on invariance principles for v. Mises’ statistics. Metrika **32**(3–4), 197–214 (1985)
- 24. Dunford, N., Schwartz, J.T.: Linear Operators. I. General Theory. Interscience Publishers Inc., New York (1958)
- 25. Dynkin, E.B., Mandelbaum, A.: Symmetric statistics, Poisson point processes, and multiple Wiener integrals. Ann. Stat. **11**(3), 739–746 (1983)
- 26. Eagleson, G.K.: Orthogonal expansions and \(U\)-statistics. Austral. J. Stat. **21**(3), 221–237 (1979)
- 27. Filippova, A.A.: Mises’ theorem on the asymptotic behavior of functionals of empirical distribution functions and its statistical applications. Theory Probab. Appl. **7**, 24–57 (1962)
- 28. Gohberg, I.C., Krein, M.G.: Introduction to the theory of linear nonselfadjoint operators. Translated from the Russian 1965 original. Translations of Mathematical Monographs, vol. 18. American Mathematical Society, Providence (1969)
- 29. Gohberg, I., Krupnik, N., Goldberg, S.: Traces and determinants of linear operators. Oper. Theory Adv. Appl. (Birkhäuser) **116** (2000)
- 30. Gordin, M.I.: On the central limit theorem for stationary processes. (Russian) Dokl. Akad. Nauk SSSR **188**, 739–741 (1969). Transl. in Soviet Math. Dokl. **10**, 1174–1176 (1969)
- 31. Gordin, M.I.: Central limit theorem for stationary processes without the assumption of finite variance. (Russian). Abstracts of Communications, T. 1: A-K. International Conference on Probability Theory and Mathematical Statistics, pp. 173–174, June 25–30, Vilnius (1973)
- 32. Gordin, M.I., Lifshits, B.A.: Central limit theorem for stationary Markov processes. (Russian) Dokl. Akad. Nauk SSSR **239**, 766–767 (1978); Transl. in Soviet Math. Dokl. **19**(2), 392–394 (1978)
- 33. Gordin, M.I.: Martingale-coboundary representation for a class of random fields. (Russian). Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov (POMI), **364** (2009), Veroyatnost i Statistika. no. 14.2, pp. 88–108, 236; Transl.: J. Math. Sci. (NY) **163**(2), 363–374 (2009)
- 34. Grassberger, P., Procaccia, I.: Characterization of strange attractors. Phys. Rev. Lett. **50**, 346–349 (1983)
- 35. Halmos, P.: The theory of unbiased estimation. Ann. Math. Stat. **17**, 34–43 (1946)
- 36. Hoeffding, W.: A class of statistics with asymptotically normal distribution. Ann. Math. Stat. **19**, 293–325 (1948)
- 37. Ibragimov, I.A.: A central limit theorem for a class of dependent random variables (Russian). Prob. Theory Appl. **8**(1), 89–94 (1963)
- 38. Ibragimov, I.A., Linnik, Yu.V.: Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff Publishing, Groningen (1971)
- 39. Khashimov, Sh.A.: Asymptotic normality of a generalized von Mises functional for \(m\)-dependent variables. Theory of Random Processes and its Applications (Russian), pp. 141–148. “Naukova Dumka”, Kiev (1990)
- 40. Koroljuk, V.S., Borovskich, Yu.V.: Theory of \(U\)-statistics. Math. Appl. **273**. Translated from the 1989 Russian original. Kluwer Academic Publications Group, Dordrecht (1989)
- 41. Leucht, A., Neumann, M.H.: Degenerate U- and V-statistics under ergodicity: asymptotics, bootstrap and applications in statistics. Ann. Inst. Stat. Math. **65**, 349–386 (2013)
- 42. Leucht, A., Neumann, M.H.: Dependent wild bootstrap for degenerate U- and V-statistics. J. Multivar. Anal. **117**, 257–280 (2013)
- 43. Leucht, A.: Degenerate U- and V-statistics under weak dependence: asymptotic theory and bootstrap consistency. Bernoulli **18**(4), 552–585 (2012)
- 44. Maigret, N.: Théorème de limite centrale fonctionnel pour une chaîne de Markov récurrente au sens de Harris et positive. Ann. Inst. H. Poincaré Sect. B (N.S.) **14**(4), 425–440 (1978)
- 45. Major, P.: Tail behaviour of multiple random integrals and \(U\)-statistics. J. Probab. Surv. **2**, 448–505 (2005)
- 46. Neuhaus, G.: Functional limit theorems for \(U\)-statistics in the degenerate case. J. Multivar. Anal. **7**(3), 424–439 (1977)
- 47. Reed, M., Simon, B.: Methods of modern mathematical physics. Functional Analysis, vol. 1. Academic Press Inc., New York (1980)
- 48. Rohlin, V.A.: Exact endomorphisms of a Lebesgue space (Russian). Izv. Akad. Nauk SSSR Ser. Mat. **25**, 499–530 (1961)
- 49. Ryan, R.A.: Introduction to Tensor Products of Banach Spaces. Springer, London (2002)
- 50. Sharipov, O.Sh.: The invariance principle for \(U\)-statistics and von Mises functionals of weakly dependent observations. (Russian). Teor. Veroyatnost. i Primenen. **47**(4), 814–817 (2002); Transl. in Theory Probab. Appl. **47**(4), 730–733 (2003)
- 51. Volod’ko, N.V.: Limit theorems for canonical von Mises statistics and \(U\)-statistics of \(m\)-dependent observations. (Russian). Teor. Veroyatn. Primen. **55**(2), 226–249 (2010). Transl. in Theory Probab. Appl. **55**(2), 271–290 (2011)
- 52. von Mises, R.: On the asymptotic distribution of differentiable statistical functions. Ann. Math. Stat. **18**, 309–348 (1947)
- 53. Yoshihara, K.-I.: Limiting behavior of \(U\)-statistics for stationary, absolutely regular processes. Z. Wahrsch. Verw. Gebiete **35**(3), 237–252 (1976)
- 54. Yoshihara, K.-I.: Limiting behavior of \(U\)-statistics for strongly mixing sequences. Yokohama Math. J. **39**(2), 107–113 (1992)