1 Introduction

In investigations of topological dynamical systems, different versions of chaos, which represent complexity in various senses, have attracted a lot of attention over the past few decades. The most studied notions of chaos are Devaney chaos, Li–Yorke chaos, distributional chaos, positive entropy and weak mixing, and the relationships among them naturally became a central topic as well. It is known that weak mixing implies Li–Yorke chaos [24]. In 2002, Huang and Ye [20] showed that Devaney chaos implies Li–Yorke chaos. Later, using ergodic-theoretic methods, Blanchard, Glasner, Kolyada and Maass [3] proved that positive topological entropy also implies Li–Yorke chaos. A combinatorial proof was given by Kerr and Li [25].

Many of the classical notions in topological dynamical systems have an analogous version in the mean sense. Downarowicz [11] observed that mean Li–Yorke chaos is equivalent to the so-called DC2 chaos and proved that positive topological entropy implies mean Li–Yorke chaos, which strengthens the result of [3]. Huang, Ye and the first author [22] provided a different approach and showed that positive topological entropy implies a multivariant version of mean Li–Yorke chaos. For related topics on chaotic behaviours we refer to [19, 23, 26] for general group actions, [15] for a new condition implying mean Li–Yorke chaos, [2, 13, 18] for more careful discussions, and the survey article [28] for more aspects and details.

It is natural to study “sequence versions” of dynamical notions (e.g., sequence entropy). To make our statement explicit, we restate the main theorem of [22], which is formulated in terms of multivariant mean Li–Yorke chaos. By a topological dynamical system, we mean a pair (X, T), where X is a compact metric space equipped with a metric d and T is a homeomorphism of X onto itself.

Theorem 1.1

([22, Theorem 1.1]). If a topological dynamical system (X, T) has positive topological entropy, then it is multivariant mean Li–Yorke chaotic; namely, there exists a Cantor subset K of X such that for every integer \(n\ge 2\) and pairwise distinct points \(x_1,x_2,\cdots ,x_n\) in K it holds that

$$\begin{aligned} \liminf _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\max _{1\le i<j\le n} d\left( T^kx_i,T^kx_j\right) =0 \end{aligned}$$

and

$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\min _{1\le i<j\le n} d\left( T^kx_i,T^kx_j\right) >0. \end{aligned}$$

Motivated by the above considerations and results, we ask whether “sequence chaos” appears in positive entropy systems; more precisely, we investigate the following question.

Question 1.2

Let \(\{a_k\}_{k=1}^\infty \) be a sequence of positive integers. Suppose that (X, T) is a topological dynamical system with positive entropy. Is there a Cantor (in particular, uncountable) subset K of X such that for every integer \(n\ge 2\) and pairwise distinct points \(x_1,x_2,\cdots ,x_n\) in K it holds that

$$\begin{aligned} \liminf _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\max _{1\le i<j\le n} d\left( T^{a_k}x_i,T^{a_k}x_j\right) =0 \end{aligned}$$

and

$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\min _{1\le i<j\le n} d\left( T^{a_k}x_i,T^{a_k}x_j\right) >0\;? \end{aligned}$$
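The finite-N averages appearing in Question 1.2 can be evaluated directly for a concrete system. The Python sketch below is only an illustration under our own assumptions: it uses the cat map \((x,y)\mapsto (2x+y,x+y)\) on the 2-torus (a homeomorphism with positive topological entropy), restricted to the finite invariant set of points with coordinates in \(\frac{1}{q}{\mathbb {Z}}\) so that long orbits are computed exactly, together with the flat torus metric. The helper names `cat_map`, `torus_dist` and `sequence_averages` are hypothetical. A finite computation of this kind cannot decide a liminf or a limsup; it only shows what is being averaged.

```python
import math
from itertools import combinations

def cat_map(p, q):
    """One step of (x, y) -> (2x + y, x + y) mod 1 on points of (1/q)Z^2, stored as
    integer pairs mod q; integer arithmetic keeps long orbits exact."""
    x, y = p
    return ((2 * x + y) % q, (x + y) % q)

def torus_dist(p, r, q):
    """Flat metric on R^2/Z^2 between two points encoded as integer pairs mod q."""
    dx = abs(p[0] - r[0]) % q
    dy = abs(p[1] - r[1]) % q
    return math.hypot(min(dx, q - dx), min(dy, q - dy)) / q

def sequence_averages(points, seq, N, q):
    """Finite-N averages along T^{a_1}, ..., T^{a_N}; returns the pair
    ((1/N) sum_k max_{i<j} d(T^{a_k} x_i, T^{a_k} x_j),
     (1/N) sum_k min_{i<j} d(T^{a_k} x_i, T^{a_k} x_j))."""
    current, time, snapshot = list(points), 0, {}
    for a in sorted(set(seq[:N])):          # advance the orbit once, recording needed times
        while time < a:
            current = [cat_map(p, q) for p in current]
            time += 1
        snapshot[a] = current
    sum_max = sum_min = 0.0
    for a in seq[:N]:
        dists = [torus_dist(u, v, q) for u, v in combinations(snapshot[a], 2)]
        sum_max += max(dists)
        sum_min += min(dists)
    return sum_max / N, sum_min / N

q = 101
points = [(1, 0), (0, 1), (5, 7)]           # three distinct points x_1, x_2, x_3
squares = [k * k for k in range(1, 101)]    # a_k = k^2
print(sequence_averages(points, squares, 100, q))
```

With only two points the two averages coincide, since the maximum and the minimum are then taken over a single pair.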

As previous results (e.g., Theorem 1.1) show, Birkhoff’s pointwise ergodic theorem plays a key role. We follow this idea to study Question 1.2. However, for general sequences of positive integers, pointwise convergence of ergodic averages is much more complicated. Fortunately, there are classical pointwise ergodic theorems along subsequences of positive integers as well (see Sect. 3.3 for details), which are quite useful for our argument. It is worth mentioning that the proofs of these ergodic theorems are highly nontrivial. These facts lead to the following restrictions on sequences.

Definition 1.3

An increasing sequence \(\{a_k\}_{k=1}^\infty \) of positive integers is pointwise good if for each measure-preserving system \((X,{\mathscr {B}},\mu ,T)\) and \(f\in L^2(\mu )\),

$$\begin{aligned} \frac{1}{N}\sum _{k=1}^{N}f(T^{a_k}x) \end{aligned}$$

converges for \(\mu \)-a.e. \(x\in X\).

There are many pointwise good sequences. In a series of papers, Bourgain [5,6,7] proved that \(\{[p(k)]\}\) is pointwise good provided p(x) is a nonconstant polynomial with real coefficients and positive leading coefficient, where \([\,\cdot \,]\) denotes the integer part of a real number. This result was extended to sequences arising from some logarithmic–exponential subpolynomials [4]. The sequence of primes [5] and many return time sequences [6, 8, 27] are also pointwise good. We refer to [9, 30] for a more comprehensive understanding of pointwise ergodic theorems.

If a sequence is pointwise good, then the ergodic average along this sequence converges almost everywhere. However, this gives no information about the value of the limit. Following [27], we say that a sequence \(\{a_k\}\) is very good if for any ergodic system \((X,{\mathscr {B}},\mu ,T)\) and \(f\in L^2(\mu )\), it holds that

$$\begin{aligned} \lim _{N\rightarrow \infty } \frac{1}{N}\sum _{k=1}^{N}f(T^{a_k}x)=\int fd\mu \quad \text {for}\; \mu \text {-a.e. }\, x\in X. \end{aligned}$$

In this terminology, Birkhoff’s pointwise ergodic theorem states that the sequence of natural numbers is very good. Following the ideas in [22], it is routine to check that Theorem 1.1 also holds along any very good sequence; namely, the answer to Question 1.2 is affirmative for every very good sequence. For example, it was shown in [27] that the Morse sequence is very good and in [4] that sequences generated by some logarithmic–exponential subpolynomials (including the sequences \(\{[k^r]\}\) for all non-integer numbers \(r>0\)) are very good.

Unfortunately, only a few pointwise good sequences are known to be very good, which is far from satisfactory in view of the remaining cases of Question 1.2. To deal with these cases, we follow the idea in [14] and study the limit of ergodic averages along sequences.
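A small exact computation shows why a pointwise good sequence need not be very good. Take the ergodic rotation \(x\mapsto x+1\) on \({\mathbb {Z}}/3{\mathbb {Z}}\) with the uniform measure, the indicator f of \(\{0\}\), and the pointwise good sequence \(a_k=k^2\) (Theorem 3.9). Since \(k^2\equiv 0\pmod 3\) when \(3\mid k\) and \(k^2\equiv 1\pmod 3\) otherwise, the limits of the averages at \(x=0,1,2\) are 1/3, 0 and 2/3, whereas \(\int fd\mu =1/3\); hence \(\{k^2\}\) is not very good. The Python sketch below (a toy example of our own; `rotation_average` is a hypothetical helper) reproduces these values with exact fractions.

```python
from fractions import Fraction

def rotation_average(x, q, seq, f):
    """Exact value of (1/N) sum_k f(x + a_k mod q) for the rotation x -> x + 1 on Z/qZ."""
    return Fraction(sum(f((x + a) % q) for a in seq), len(seq))

def indicator_of_zero(y):
    return 1 if y == 0 else 0

q = 3
N = 3000  # multiple of 3 = period of k^2 mod 3, so the finite average equals the limit
squares = [k * k for k in range(1, N + 1)]
limits = [rotation_average(x, q, squares, indicator_of_zero) for x in range(q)]
print(limits)  # [Fraction(1, 3), Fraction(0, 1), Fraction(2, 3)]
```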

Definition 1.4

A sub-\(\sigma \)-algebra \({\mathscr {F}}\) of \({\mathscr {B}}\) is a characteristic \(\sigma \) -algebra for the sequence \(\{a_k\}\) if for every \(f\in L^2(\mu )\),

$$\begin{aligned} \frac{1}{N}\sum _{k=1}^{N} T^{a_k}f- \frac{1}{N}\sum _{k=1}^{N} T^{a_k}E(f|{\mathscr {F}})\rightarrow 0 \end{aligned}$$

as \(N\rightarrow \infty \) in \(L^2(\mu )\), where \(E(\cdot |{\mathscr {F}})\) is the conditional expectation with respect to \({\mathscr {F}}\).

As we will show in Theorem 3.2, the Pinsker \(\sigma \)-algebra of a system is characteristic for all sequences \(\{a_k\}\) satisfying Condition (\(*\)) below, which helps us overcome additional difficulties in the proof of our main theorem.

Condition ( \(*\) ).  For every \(L>0\),

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N^2}\#\{(i,j)\in [1,N]^2:\,|a_i-a_j|\le L\}=0. \end{aligned}$$
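Condition (\(*\)) can be probed numerically for a concrete sequence. The Python sketch below, with the hypothetical helper `close_pair_ratio`, computes the ratio in Condition (\(*\)) for the first N terms by sorting and binary search. It is an illustration only, since a finite computation cannot establish the limit. For \(a_k=[k^{1/2}]\) (a sequence covered by Theorem 3.11) the ratio decays noticeably more slowly than for \(a_k=k\), because near the k-th term each value is repeated about \(2k^{1/2}\) times.

```python
import bisect
import math

def close_pair_ratio(seq, N, L):
    """(1/N^2) * #{(i, j) in [1, N]^2 : |a_i - a_j| <= L} for the first N terms of seq.
    After sorting, the j's matching a given a_i form a contiguous block located by two
    binary searches, so the cost is O(N log N) rather than O(N^2)."""
    values = sorted(seq[:N])
    count = 0
    for v in values:
        count += bisect.bisect_right(values, v + L) - bisect.bisect_left(values, v - L)
    return count / N**2

for N in (100, 1000, 10000):
    ks = range(1, N + 1)
    print(N,
          close_pair_ratio(list(ks), N, 5),                          # a_k = k
          close_pair_ratio([math.isqrt(k**3) for k in ks], N, 5),    # a_k = [k^{3/2}]
          close_pair_ratio([math.isqrt(k) for k in ks], N, 5))       # a_k = [k^{1/2}]
```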

We note here that Condition (\(*\)) is quite mild. In fact, it is not hard to check that any strictly increasing sequence of positive integers satisfies Condition (\(*\)) (see Lemma 3.1), and that all those pointwise good sequences we previously mentioned satisfy Condition (\(*\)) as well (see Remark 3.15). Moreover, for any pointwise good sequence satisfying Condition (\(*\)), the limit of the ergodic average is constant on each atom of the generating partition of the Pinsker \(\sigma \)-algebra.

We now state the main result of this paper, which answers Question 1.2 affirmatively for a large class of sequences.

Theorem 1.5

(Main theorem). Let \(\{a_k\}\) be a pointwise good sequence satisfying Condition (\(*\)). Suppose that (X, T) is a topological dynamical system with positive topological entropy. Then (X, T) is multivariant mean Li–Yorke chaotic along the sequence \(\{a_k\}\); namely, there exists a Cantor subset K of X such that for every integer \(n\ge 2\) and pairwise distinct points \(x_1,x_2,\cdots ,x_n\) in K we have

$$\begin{aligned} \liminf _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\max _{1\le i<j\le n} d\left( T^{a_k}x_i,T^{a_k}x_j\right) =0 \end{aligned}$$

and

$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\min _{1\le i<j\le n} d\left( T^{a_k}x_i,T^{a_k}x_j\right) >0. \end{aligned}$$

The proof of Theorem 1.5 is based on the techniques used in [22]. A main new ingredient in this paper is the convergence argument for sequences given in Sect. 3.

Corollary 1.6

Let \(\{a_k\}\) be a pointwise good sequence satisfying Condition (\(*\)). Suppose that (X, T) is a topological dynamical system with positive topological entropy. Then (X, T) is Li–Yorke chaotic along the sequence \(\{a_k\}\); namely, there exists a Cantor subset K of X such that for any \(x,y\in K\) with \(x\ne y\) we have

$$\begin{aligned} \liminf _{k\rightarrow \infty }d(T^{a_k}x,T^{a_k}y)=0 \quad \text {and}\; \limsup _{k\rightarrow \infty }d(T^{a_k}x,T^{a_k}y)>0. \end{aligned}$$

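As we mentioned previously, the prime sequence and \(\{[p(k)]\}\) (where p(x) is a nonconstant polynomial with real coefficients and positive leading coefficient) are pointwise good and satisfy Condition (\(*\)). Hence Theorem 1.5 yields the following two corollaries.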

Corollary 1.7

Any positive entropy system is multivariant mean Li–Yorke chaotic along the prime sequence.

Corollary 1.8

Any positive entropy system is multivariant mean Li–Yorke chaotic along the sequence \(\{[p(k)]\}\), where p(x) is a nonconstant polynomial with real coefficients and positive leading coefficient.

This paper is organized as follows. In Sect. 2, we review some necessary notions and properties. In Sect. 3, we study ergodic averages along sequences. More precisely, we show that the Pinsker \(\sigma \)-algebra is a characteristic \(\sigma \)-algebra for any sequence satisfying Condition (\(*\)) (Theorem 3.2). Moreover, for any pointwise good sequence satisfying Condition (\(*\)) we obtain a decomposition of any invariant measure (Theorem 3.4), and the limit of the ergodic average is constant on each atom of the generating partition of the Pinsker \(\sigma \)-algebra (Theorem 3.5). In Sect. 4, we prove Theorem 1.5, which holds for all continuous surjective maps as well (Theorem 4.5).

2 Preliminaries

For convenience, our notations will be as close to [22] as possible. For the reader who is not familiar with notions in ergodic theory and dynamical systems, we refer to [12, 32].

2.1 Mycielski’s theorem

Let X be a compact metric space and C(X) the space of all real-valued continuous functions on X equipped with the supremum norm. For \(n\ge 2\), denote \(X^n=X\times X\times \cdots \times X\) (n-copies). Set \(\Delta _n=\{(x,x,\cdots ,x)\in X^n:x\in X\}\) and \(\Delta ^{(n)}=\{(x_1,x_2,\cdots ,x_n)\in X^n:\,\text { there exists }1\le i<j\le n\text { such that }x_i=x_j\}\). We shall use the following version of Mycielski’s theorem.

Theorem 2.1

(Cf. [29, Theorem 1]). Assume that X is a perfect compact metric space. If for every integer \(n\ge 2\), \(R_n\) is a dense \(G_\delta \) subset of \(X^n\), then there exists a dense subset K of X which is a union of countably many Cantor sets such that \(K^n\subset R_n\cup \Delta ^{(n)}\) holds for all integers \(n\ge 2\).

2.2 Conditional expectation and disintegration

Let \((X,{\mathscr {B}},\mu )\) be a probability space and \({\mathscr {A}}\) a sub-\(\sigma \)-algebra of \({\mathscr {B}}\). The conditional expectation is a map

$$\begin{aligned} E(\,\cdot \,|{\mathscr {A}}):L^1(X,{\mathscr {B}},\mu )\rightarrow L^1(X,{\mathscr {A}},\mu ) \end{aligned}$$

satisfying the following properties:

(1) for every \(f\in L^1(X,{\mathscr {B}},\mu )\), \(E(f|{\mathscr {A}})\) is \({\mathscr {A}}\)-measurable;

(2) if g is \({\mathscr {A}}\)-measurable and \(fg\in L^1(X,{\mathscr {B}},\mu )\), then \(E(fg|{\mathscr {A}})=gE(f|{\mathscr {A}})\);

(3) if \(f\in L^p(X,{\mathscr {B}},\mu )\) for some \(p\ge 1\), then \(E(f|{\mathscr {A}})\in L^p(X,{\mathscr {A}},\mu )\) and

$$\begin{aligned} \Vert E(f|{\mathscr {A}})\Vert _{L^p}\le \Vert f \Vert _{L^p}; \end{aligned}$$

(4) for every \(f\in L^1(X,{\mathscr {B}},\mu )\) and \(A\in {\mathscr {A}}\), \(\int _A E(f|{\mathscr {A}})\,d\mu =\int _A f\,d\mu \); in particular, \(\int E(f|{\mathscr {A}})\,d\mu =\int f\,d\mu \).

The following martingale theorem is well known (see e.g., [16, Theorem 14.26], [12, Chapter 5.2]).

Theorem 2.2

(Martingale theorem). Let \((X,{\mathscr {B}},\mu )\) be a probability space. Suppose that \(\{{\mathscr {A}}_n\}_{n=1}^\infty \) is a decreasing sequence (resp. an increasing sequence) of sub-\(\sigma \)-algebras of \({\mathscr {B}}\) and \({\mathscr {A}}=\bigcap _{n\ge 1}{\mathscr {A}}_n\) (resp. \({\mathscr {A}}=\bigvee _{n\ge 1}{\mathscr {A}}_n\)). Then for any \(f\in L^1(\mu )\),

$$\begin{aligned} E(f|{\mathscr {A}}_n)\rightarrow E(f|{\mathscr {A}}) \end{aligned}$$

as \(n\rightarrow \infty \) in \(L^1(\mu )\) and \(\mu \)-almost everywhere.
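The increasing case of Theorem 2.2 is easy to watch on a standard example of our own choosing: Lebesgue measure on [0, 1), \({\mathscr {A}}_n\) generated by the dyadic intervals of length \(2^{-n}\) (so that \(\bigvee _n{\mathscr {A}}_n\) is the Borel \(\sigma \)-algebra), and \(f(x)=x\). Then \(E(f|{\mathscr {A}}_n)\) is the midpoint of the dyadic interval containing x, and \(\Vert E(f|{\mathscr {A}}_n)-f\Vert _{L^1}=2^{-n}/4\). The Python sketch below checks this on a uniform grid, a discretization we assume for illustration; the helper names are hypothetical.

```python
def dyadic_conditional_expectation(values, n):
    """E(f | A_n) for samples of f on a uniform grid of [0, 1): replace the samples in each
    of the 2^n dyadic blocks by their mean (len(values) must be divisible by 2^n)."""
    block = len(values) // 2**n
    out = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        out.extend([sum(chunk) / block] * block)
    return out

def l1_distance(u, v):
    """Normalized L^1 distance between two sample vectors on the same uniform grid."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)

M = 2**12
grid = [(i + 0.5) / M for i in range(M)]   # midpoints of M equal cells of [0, 1)
f = grid                                     # samples of f(x) = x
for n in range(7):
    error = l1_distance(f, dyadic_conditional_expectation(f, n))
    print(n, error, 2.0**-n / 4)             # the two columns agree up to rounding
```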

Let X be a compact metric space. Denote by \({\mathscr {B}}_X\) the Borel \(\sigma \)-algebra of X and \({\mathscr {M}}(X)\) the set of all Borel probability measures on X. For \(\mu \in {\mathscr {M}}(X)\), let \({\mathscr {B}}_\mu \) be the completion of \({\mathscr {B}}_X\) under the measure \(\mu \). Then \((X,{\mathscr {B}}_\mu ,\mu )\) is a Lebesgue space. A finite partition of X is a finite family of pairwise disjoint measurable subsets of X whose union is X. If \(\{ \alpha _i\}_{i\in I}\) is a countable family of finite partitions of X, then we say that \(\alpha =\bigvee _{i\in I}\alpha _i\) is a measurable partition. The sets \(A\in {\mathscr {B}}_\mu \) which are unions of atoms of \(\alpha \) form a sub-\(\sigma \)-algebra of \({\mathscr {B}}_\mu \), which we denote by \({\widehat{\alpha }}\), or by \(\alpha \) if there is no ambiguity. Every sub-\(\sigma \)-algebra of \({\mathscr {B}}_\mu \) coincides (mod \(\mu \)) with a \(\sigma \)-algebra constructed in this way.

Let \({\mathscr {F}}\) be a sub-\(\sigma \)-algebra of \({\mathscr {B}}_\mu \) and \(\alpha \) a measurable partition of X with \({\widehat{\alpha }}={\mathscr {F}}\) (mod \(\mu \)). Then \(\mu \) can be disintegrated over \({\mathscr {F}}\) as

$$\begin{aligned} \mu =\int _X \mu _x d \mu (x), \end{aligned}$$

where \(\mu _x\in {\mathscr {M}}(X)\) and \(\mu _x(\alpha (x))=1\) for \(\mu \)-a.e. \(x\in X\). The disintegration can be characterized by the properties (2.1) and (2.2) as follows:

$$\begin{aligned}&\text {for every } f \in L^1(X,{\mathscr {B}}_X,\mu ),\ f \in L^1(X,{\mathscr {B}}_X,\mu _x)\ \text {for } \mu \text {-a.e. } x\in X, \\&\text {and the map } x \mapsto \int _X f(y)\,d\mu _x(y) \text { is in } L^1(X,{\mathscr {F}},\mu ); \end{aligned}$$
(2.1)

$$\begin{aligned}&\text {for every } f\in L^1(X,{\mathscr {B}}_X,\mu ),\ E(f|{\mathscr {F}})(x)=\int _X f\,d\mu _{x}\ \text {for } \mu \text {-a.e. } x\in X. \end{aligned}$$
(2.2)

Then for any \(f \in L^1(X,{\mathscr {B}}_X,\mu )\), one has

$$\begin{aligned} \int _X \left( \int _X f(y)\,d\mu _x(y) \right) \, d\mu (x)=\int _X f \,d\mu . \end{aligned}$$
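On a finite probability space the disintegration over the \(\sigma \)-algebra generated by a partition \(\alpha \) is explicit: \(\mu _x\) is \(\mu \) conditioned on the atom \(\alpha (x)\). The Python sketch below uses a toy four-point example of our own choosing (the helpers `disintegrate` and `integrate` are hypothetical). It builds \(\mu _x\), evaluates \(E(f|{\mathscr {F}})\) through (2.2), and checks the identity \(\int _X(\int _X f\,d\mu _x)\,d\mu (x)=\int _X f\,d\mu \) with exact fractions.

```python
from fractions import Fraction

def disintegrate(mu, atom_of):
    """Conditional measures on a finite space: mu_x is mu restricted to the atom of x and
    normalized.  mu maps points to probabilities; atom_of maps points to atom labels."""
    atom_mass = {}
    for x, p in mu.items():
        atom_mass[atom_of[x]] = atom_mass.get(atom_of[x], 0) + p
    return {x: {y: mu[y] / atom_mass[atom_of[x]] for y in mu if atom_of[y] == atom_of[x]}
            for x in mu if mu[x] > 0}

def integrate(f, nu):
    """Integral of f against a finitely supported measure nu (dict point -> mass)."""
    return sum(f(y) * p for y, p in nu.items())

def f(y):
    return y * y

mu = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(1, 4), 3: Fraction(1, 4)}
atom_of = {0: "A", 1: "A", 2: "B", 3: "B"}                 # the partition alpha
mu_x = disintegrate(mu, atom_of)
cond_exp = {x: integrate(f, mu_x[x]) for x in mu_x}        # E(f | F)(x) via (2.2)
print(integrate(lambda x: cond_exp[x], mu) == integrate(f, mu))   # True
```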

2.3 Ergodic theory

By a measure-preserving system, we mean a quadruple \((X,{\mathscr {B}},\mu ,T)\), where \((X,{\mathscr {B}},\mu )\) is a probability space and \(T:X\rightarrow X\) is an invertible measure-preserving transformation. A measure-preserving system \((X,{\mathscr {B}},\mu ,T)\) is called ergodic if the only members \(B\in {\mathscr {B}}\) with \(T^{-1}B=B\) satisfy \(\mu (B)=0\) or \(\mu (B)=1\).

Let \(\alpha \) be a finite partition of X. The measure-theoretic entropy of \(\mu \) relative to \(\alpha \) is denoted by \(h_\mu (T,\alpha )\), and the measure-theoretic entropy of \(\mu \) is defined as

$$\begin{aligned} h_\mu (T)=\sup _{\alpha } h_\mu (T,\alpha ), \end{aligned}$$

where the supremum ranges over all finite partitions of X.

The Pinsker \(\sigma \) -algebra of a system \((X,{\mathscr {B}}, \mu ,T)\) is defined as

$$\begin{aligned} P_\mu (T)=\{A\in {\mathscr {B}}:h_\mu (T,\{A, X{\setminus } A\})=0\}. \end{aligned}$$

It is easy to see that \(P_\mu (T)\) is T-invariant. The Rokhlin–Sinai theorem identifies the Pinsker \(\sigma \)-algebra as the “remote past” of a generating partition.

Theorem 2.3

([31]). For a measure-preserving system \((X,{\mathscr {B}},\mu ,T)\), there exists a sub-\(\sigma \)-algebra \({\mathscr {P}}\) of \({\mathscr {B}}\) such that \(T^{-1}{\mathscr {P}}\subset {\mathscr {P}}\), \(\bigvee _{k=0}^\infty T^k{\mathscr {P}}={\mathscr {B}}\) and \(\bigcap _{k=0}^\infty T^{-k}{\mathscr {P}}=P_\mu (T)\).
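As an illustration (standard, and not used later), consider the two-sided Bernoulli shift on \(\{0,1\}^{{\mathbb {Z}}}\) with the \((\frac{1}{2},\frac{1}{2})\) product measure and \((Tx)_i=x_{i+1}\), and let \({\mathscr {P}}=\sigma (x_i:\,i\ge 0)\). For \(k\ge 0\) one computes

$$\begin{aligned} T^{k}{\mathscr {P}}=\sigma (x_i:\,i\ge -k),\qquad T^{-k}{\mathscr {P}}=\sigma (x_i:\,i\ge k), \end{aligned}$$

so \(T^{-1}{\mathscr {P}}\subset {\mathscr {P}}\) and \(\bigvee _{k\ge 0}T^k{\mathscr {P}}={\mathscr {B}}\) (mod \(\mu \)), while \(\bigcap _{k\ge 0}T^{-k}{\mathscr {P}}\) is the tail \(\sigma \)-algebra, which is trivial by Kolmogorov’s 0–1 law. This matches the fact that Bernoulli shifts have trivial Pinsker \(\sigma \)-algebra.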

2.4 Topological dynamics

Let (X, T) be a topological dynamical system. For a point \(x\in X\), the stable set of x is defined as

$$\begin{aligned} W^{s}(x,T)=\left\{ y\in X:\lim _{k\rightarrow \infty }d(T^kx,T^ky)=0\right\} , \end{aligned}$$

and the unstable set of x is defined as

$$\begin{aligned} W^{u}(x,T)=\left\{ y\in X:\lim _{k\rightarrow \infty }d(T^{-k}x,T^{-k}y)=0\right\} . \end{aligned}$$

Let \({\mathscr {M}}(X,T)\) (resp. \({\mathscr {M}}^e(X,T)\)) be the collection of all T-invariant (resp. ergodic T-invariant) Borel probability measures on X. For every \(\mu \in {\mathscr {M}}(X,T)\), \((X,{\mathscr {B}}_X,\mu ,T)\) is a measure-preserving system. The well-known variational principle states that

$$\begin{aligned} h_\text {top}(X,T)=\sup _{\mu \in {\mathscr {M}}(X,T)}h_\mu (X,T)=\sup _{\mu \in {\mathscr {M}}^e(X,T)}h_\mu (X,T), \end{aligned}$$

where \(h_\text {top}(X,T)\) denotes the topological entropy of (X, T).

3 The Pinsker \(\sigma \)-algebra and pointwise good sequences

3.1 Characteristic \(\sigma \)-algebras

Lemma 3.1

Any strictly increasing sequence of positive integers satisfies Condition (\(*\)).

Proof

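Fix a strictly increasing sequence \(\{a_k\}\) of positive integers and \(L>0\). Since the \(a_k\) are strictly increasing integers, \(|a_i-a_j|\ge |i-j|\) for all i, j, so \(|a_i-a_j|\le L\) implies \(|i-j|\le L\). For each i there are at most \(2L+1\) indices j with \(|i-j|\le L\); hence for every integer \(N>L\), we have

$$\begin{aligned} \#\{(i,j)\in [1,N]^2:|a_{i}-a_{j}|\le L\} \le \# \{(i,j)\in [1,N]^{2}:|i-j|\le L\} \le (2L+1)N. \end{aligned}$$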

Thus,

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N^2}\#\{(i,j)\in [1,N]^2:|a_i-a_j|\le L\}=0. \end{aligned}$$
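For the sequence \(a_k=k\) and an integer L with \(0\le L<N\), the count in the proof can be computed exactly by grouping the pairs according to \(h=j-i\):

$$\begin{aligned} \#\{(i,j)\in [1,N]^2:|i-j|\le L\} =\sum _{h=-L}^{L}(N-|h|)=(2L+1)N-L(L+1)\le (2L+1)N. \end{aligned}$$

For instance, \(N=10\) and \(L=2\) give 44 pairs; in this case the ratio in Condition (\(*\)) is of order \((2L+1)/N\).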

Theorem 3.2

Suppose that \(\{a_k\}\) is a sequence of positive integers satisfying Condition (\(*\)) and \((X,{\mathscr {B}},\mu ,T)\) is a measure-preserving system. Then the Pinsker \(\sigma \)-algebra \(P_\mu (T)\) is a characteristic \(\sigma \)-algebra for the sequence \(\{a_k\}\).

Proof

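Fix \(f\in L^2(\mu )\) and \(\varepsilon >0\). Choose a sub-\(\sigma \)-algebra \({\mathscr {P}}\) as in the Rokhlin–Sinai theorem (Theorem 2.3). Since \(\{T^m{\mathscr {P}}\}_{m\ge 0}\) increases to \({\mathscr {B}}\), Theorem 2.2 (whose convergence also holds in \(L^2(\mu )\) when \(f\in L^2(\mu )\)) gives \(m>0\) such that \(g_m=E(f|T^m{\mathscr {P}})\) satisfies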

$$\begin{aligned} \Vert g_m-f\Vert _{L^2(\mu )}<\varepsilon . \end{aligned}$$

Thus, it holds that

$$\begin{aligned} \left\| \frac{1}{N}\sum _{k=1}^{N} \left( T^{a_k}f- T^{a_k}g_m\right) \right\| _{L^2(\mu )}&\le \frac{1}{N} \sum _{k=1}^{N} \left\| T^{a_k}f- T^{a_k}g_m\right\| _{L^2(\mu )}\\&=\frac{1}{N} \sum _{k=1}^{N} \Vert f-g_m\Vert _{L^2(\mu )}<\varepsilon \end{aligned}$$

and

$$\begin{aligned}&\left\| \frac{1}{N}\sum _{k=1}^{N} \left( T^{a_k}E(g_m|P_\mu (T))- T^{a_k}E(f|P_\mu (T))\right) \right\| _{L^2(\mu )}\\&\qquad \le \frac{1}{N} \sum _{k=1}^{N} \Vert T^{a_k}E(g_m|P_\mu (T))- T^{a_k}E(f|P_\mu (T))\Vert _{L^2(\mu )}\\&\qquad = \frac{1}{N} \sum _{k=1}^{N} \Vert E(g_m|P_\mu (T))-E(f|P_\mu (T))\Vert _{L^2(\mu )}\\&\qquad = \frac{1}{N} \sum _{k=1}^{N}\Vert E(g_m-f|P_\mu (T))\Vert _{L^2(\mu )}\\&\qquad \le \frac{1}{N} \sum _{k=1}^{N}\Vert g_m-f\Vert _{L^2(\mu )} <\varepsilon . \end{aligned}$$

By Theorem 2.2 again, there exist \(n>0\) and \(h_n= E(g_m|T^{-n}{\mathscr {P}})\) such that

$$\begin{aligned} \Vert h_n-E(g_m|P_\mu (T))\Vert _{L^2(\mu )}<\varepsilon . \end{aligned}$$

It follows that

$$\begin{aligned} \left\| \frac{1}{N}\sum _{k=1}^{N} \left( T^{a_k}h_n-T^{a_k}E(g_m|P_\mu (T))\right) \right\| _{L^2(\mu )}<\varepsilon . \end{aligned}$$

Notice that

$$\begin{aligned} \left\| \frac{1}{N}\sum _{k=1}^{N}\left( T^{a_k}g_m-T^{a_k}h_n\right) \right\| _{L^2(\mu )}^2&=\int \left| \frac{1}{N}\sum _{k=1}^{N} \left( T^{a_k}g_m-T^{a_k}h_n\right) \right| ^2 d\mu \\&= \frac{1}{N^2} \sum _{i,j=1}^{N} \int \left( T^{a_i}g_m-T^{a_i}h_n\right) \left( T^{a_j}g_m-T^{a_j}h_n\right) d\mu \\&=\frac{1}{N^2}\sum _{i,j=1}^{N} \left( A_{ij}-B_{ij}+C_{ij}-D_{ij}\right) , \end{aligned}$$

where

$$\begin{aligned} A_{ij}&=\int T^{a_i}g_m \cdot T^{a_j}g_m d\mu ,\quad B_{ij}=\int T^{a_i}h_n\cdot T^{a_j}g_m d\mu ,\\ C_{ij}&=\int T^{a_i}h_n\cdot T^{a_j}h_nd\mu ,\quad D_{ij}=\int T^{a_i}g_m\cdot T^{a_j}h_nd\mu . \end{aligned}$$

Claim 1

If \(a_j-m\ge n+a_i\), then we have \(A_{ij}=B_{ij}\) and \(C_{ij}=D_{ij}\).

Proof of Claim 1

First, we have

$$\begin{aligned} B_{ij} =&\int h_n \circ T^{a_i}\cdot g_m\circ T^{a_j} d\mu =\int \left( h_n\cdot g_m\circ T^{a_j-a_i}\right) \circ T^{a_i} d\mu \\ =&\int h_n\cdot g_m\circ T^{a_j-a_i}d\mu . \end{aligned}$$

Recall that \(g_m=E(f|T^m{\mathscr {P}})\). Then \(g_m\) is \(T^m{\mathscr {P}}\)-measurable, and hence \(g_m\circ T^{a_j-a_i}\) is \(T^{-(a_j-a_i-m)}{\mathscr {P}}\)-measurable. Since \(a_j-a_i-m\ge n\) and \(T^{-1}{\mathscr {P}}\subset {\mathscr {P}}\), we have \(T^{-(a_j-a_i-m)}{\mathscr {P}}\subset T^{-n}{\mathscr {P}}\), so \(g_m\circ T^{a_j-a_i}\) is \(T^{-n}{\mathscr {P}}\)-measurable. Recall also that \(h_n=E(g_m|T^{-n}{\mathscr {P}})\). By (2) of Sect. 2.2, we have

$$\begin{aligned} B_{ij} =&\int E(g_m\cdot g_m \circ T^{a_j-a_i}|T^{-n}{\mathscr {P}}) d\mu =\int g_m\cdot g_m \circ T^{a_j-a_i} d\mu \\ =&\int \left( g_m\cdot g_m \circ T^{a_j-a_i}\right) \circ T^{a_i} d\mu = A_{ij}. \end{aligned}$$

Now we consider the term \(C_{ij}\):

$$\begin{aligned} C_{ij}&=\int h_n\circ T^{a_i}\cdot h_n\circ T^{a_j}d\mu =\int h_n \cdot h_n\circ T^{a_j-a_i} d\mu . \end{aligned}$$

Since \(a_j-a_i\ge n+m\ge 0\) and \(h_n=E(g_m|T^{-n}{\mathscr {P}})\) is \(T^{-n}{\mathscr {P}}\)-measurable, the function \(h_n\circ T^{a_j-a_i}\) is \(T^{-(n+a_j-a_i)}{\mathscr {P}}\)-measurable and hence \(T^{-n}{\mathscr {P}}\)-measurable. By (2) of Sect. 2.2, we have

$$\begin{aligned} C_{ij}=&\int E\left( g_m \cdot h_n\circ T^{a_j-a_i}|T^{-n}{\mathscr {P}}\right) d\mu =\int g_m \cdot h_n\circ T^{a_j-a_i}d\mu \\ =&\int g_m\circ T^{a_i}\cdot h_n\circ T^{a_j} d\mu =D_{ij}. \end{aligned}$$

This ends the proof of Claim 1. \(\square \)

By the same argument with the roles of i and j interchanged, we have the following.

Claim 2

If \(a_i-m\ge n+a_j\), then we have \(A_{ij}=D_{ij}\) and \(B_{ij}=C_{ij}\).
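Combining Claims 1 and 2, and using that the \(a_k\) are integers, every pair (i, j) with \(|a_j-a_i|\ge n+m\) contributes nothing to the double sum above:

$$\begin{aligned} A_{ij}-B_{ij}+C_{ij}-D_{ij}=\int T^{a_i}\left( g_m-h_n\right) \cdot T^{a_j}\left( g_m-h_n\right) d\mu =0\quad \text {whenever } |a_j-a_i|\ge n+m. \end{aligned}$$

Indeed, if \(a_j-a_i\ge n+m\), then \(a_j-m\ge n+a_i\) and Claim 1 gives \(A_{ij}=B_{ij}\), \(C_{ij}=D_{ij}\); if \(a_i-a_j\ge n+m\), then Claim 2 gives \(A_{ij}=D_{ij}\), \(B_{ij}=C_{ij}\). Only the pairs with \(|a_j-a_i|\le n+m-1\) remain, and Condition (\(*\)) controls their number.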

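By the Cauchy–Schwarz inequality, the invariance of \(\mu \) and (3) of Sect. 2.2 (which gives \(\Vert h_n\Vert _{L^2(\mu )}\le \Vert g_m\Vert _{L^2(\mu )}\)), it is easy to see that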

$$\begin{aligned} \max \left\{ \left| A_{ij}\right| , \left| B_{ij}\right| , \left| C_{ij}\right| , \left| D_{ij}\right| \right\} \le \Vert g_m\Vert _{L^2(\mu )}^2. \end{aligned}$$

Note that the sequence \(\{a_k\}\) satisfies Condition (\(*\)). Applying it with \(L=n+m-1\), we find \(N_0>0\) such that whenever \(N\ge N_0\) it holds that

$$\begin{aligned} \frac{1}{N^2}\#\{(i,j)\in [1,N]^2:|a_j-a_i|\le n+m-1\}< \frac{\varepsilon ^2}{4\Vert g_m\Vert _{L^2(\mu )} ^2}. \end{aligned}$$

Therefore, when \(N\ge N_0\) we have

$$\begin{aligned}&\left\| \frac{1}{N}\sum _{k=1}^{N} \left( T^{a_k}g_m-T^{a_k}h_n\right) \right\| _{L^2(\mu )}^2\\&\qquad =\frac{1}{N^2}\sum _{i,j=1}^{N} \left( A_{ij}-B_{ij}+C_{ij}-D_{ij}\right) \\&\qquad = \frac{1}{N^2} \sum _{\begin{array}{c} i,j=1\\ |a_j-a_i| \le n+m-1 \end{array}}^{N} \left( A_{ij}-B_{ij}+C_{ij}-D_{ij}\right) \\&\qquad \le 4\Vert g_m\Vert _{L^2(\mu )}^2\cdot \frac{1}{N^2}\#\{(i,j)\in [1,N]^2:|a_j-a_i|\le n+m-1\} < \varepsilon ^2. \end{aligned}$$

To sum up, for \(N\ge N_0\) we have

$$\begin{aligned}&\biggl \Vert \frac{1}{N}\sum _{k=1}^{N} \left( T^{a_k}f- T^{a_k}E(f|P_\mu (T))\right) \biggr \Vert _{L^2(\mu )}\\&\quad \le \biggl \Vert \frac{1}{N}\sum _{k=1}^{N} (T^{a_k}f- T^{a_k}g_m)\biggr \Vert _{L^2(\mu )}+ \biggl \Vert \frac{1}{N}\sum _{k=1}^{N} \left( T^{a_k}g_m-T^{a_k}h_n\right) \biggr \Vert _{L^2(\mu )}\\&\quad \quad +\biggl \Vert \frac{1}{N}\sum _{k=1}^{N} \left( T^{a_k}h_n- T^{a_k}E(g_m|P_\mu (T))\right) \biggr \Vert _{L^2(\mu )}\\&\quad \quad + \biggl \Vert \frac{1}{N}\sum _{k=1}^{N} \left( T^{a_k}E(g_m|P_\mu (T))- T^{a_k}E(f|P_\mu (T))\right) \biggr \Vert _{L^2(\mu )}\\&\quad < \varepsilon +\varepsilon +\varepsilon +\varepsilon =4\varepsilon . \end{aligned}$$

Since \(\varepsilon >0\) is arbitrary, this ends the proof. \(\square \)

Remark 3.3

We also refer to [10] for a variant of Theorem 3.2 which deals with polynomial iterates and pointwise convergence.

3.2 Good sequences for pointwise convergence

Similarly to the ergodic decomposition theorem, we have the following decomposition of an invariant measure with respect to any pointwise good sequence.

Theorem 3.4

Let \(\{a_k\}\) be a pointwise good sequence of positive integers. Suppose that (X, T) is a topological dynamical system and \(\mu \in {\mathscr {M}}(X,T)\). Then there exists a disintegration of \(\mu \),

$$\begin{aligned} \mu =\int \tau _xd\mu (x), \end{aligned}$$

in the sense that there exists a Borel subset \(X_0\) of X with \(\mu (X_0)=1\) such that for any \(x\in X_0\) and \(f\in C(X)\), it holds that

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^{N}f(T^{a_k}x)=\int fd\tau _x \end{aligned}$$

and

$$\begin{aligned} \int \int fd\tau _xd\mu (x)=\int fd\mu . \end{aligned}$$

Proof

Fix a countable dense subset \(\{f_n\}\) of C(X). As \(\{a_k\}\) is pointwise good, there exists a Borel subset \(X_0\) of X with \(\mu (X_0)=1\) such that for every \(x\in X_0\) and every \(n\in {\mathbb {N}}\), we have

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^{N}f_n(T^{a_k}x)=\overline{f}_n(x). \end{aligned}$$

Fix \(x\in X_0\). Define

$$\begin{aligned} L_x:C(X)\rightarrow {\mathbb {R}}, \;\; f\mapsto \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^{N}f(T^{a_k}x). \end{aligned}$$

Since \(\{f_n\}\) is dense in C(X) with respect to the supremum norm, the limit defining \(L_x\) exists for every \(f\in C(X)\), so \(L_x\) is well defined. Moreover, \(L_x\) is a positive linear functional with \(L_x(1)=1\). By the Riesz representation theorem, there exists \(\tau _x\in {\mathscr {M}}(X)\) such that for any \(f\in C(X)\),

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^{N}f(T^{a_k}x)=\int fd\tau _x. \end{aligned}$$

By Lebesgue’s dominated convergence theorem and the T-invariance of \(\mu \),

$$\begin{aligned} \int \left( \int fd\tau _x\right) d\mu (x)&= \int \left( \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^{N}f(T^{a_k}x)\right) d\mu (x)\\&= \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^{N} \int f(T^{a_k}x)d\mu (x)=\int fd\mu . \end{aligned}$$

\(\square \)
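A finite example of our own makes Theorem 3.4 concrete. On \({\mathbb {Z}}/4{\mathbb {Z}}\) with the rotation \(x\mapsto x+1\) and the uniform invariant measure \(\mu \), take the pointwise good sequence \(a_k=k^2\) (Theorem 3.9). Since \(k^2\equiv 0\pmod 4\) for even k and \(k^2\equiv 1\pmod 4\) for odd k, one gets \(\tau _x=\frac{1}{2}\delta _x+\frac{1}{2}\delta _{x+1}\ne \mu \), while \(\int \tau _x\,d\mu (x)=\mu \), as the theorem asserts. The Python sketch below, with the hypothetical helper `empirical_tau`, confirms this with exact fractions.

```python
from fractions import Fraction

def empirical_tau(x, q, seq):
    """Empirical measure (1/N) sum_k delta_{x + a_k mod q} for the rotation x -> x + 1 on
    Z/qZ.  If a_k mod q is periodic in k and N is a multiple of the period, this is tau_x."""
    N = len(seq)
    tau = [Fraction(0)] * q
    for a in seq:
        tau[(x + a) % q] += Fraction(1, N)
    return tau

q = 4
N = 1000                      # k^2 mod 4 has period 2 in k, and N is even
squares = [k * k for k in range(1, N + 1)]
taus = [empirical_tau(x, q, squares) for x in range(q)]
average = [sum(tau[y] for tau in taus) / q for y in range(q)]   # integral of tau_x d mu(x)
print(taus[0])   # [Fraction(1, 2), Fraction(1, 2), Fraction(0, 1), Fraction(0, 1)]
print(average)   # [Fraction(1, 4), Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)]
```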

Theorem 3.5

Let \(\{a_k\}\) be a pointwise good sequence, (X, T) a topological dynamical system and \(\mu \in {\mathscr {M}}(X,T)\). If the Pinsker \(\sigma \)-algebra \(P_\mu (T)\) is a characteristic \(\sigma \)-algebra for the sequence \(\{a_k\}\), then there exists a Borel subset \(X_1\) of X with \(\mu (X_1)=1\) such that for any \(x\in X_1\) and \(f\in C(X)\), it holds that

$$\begin{aligned} \int fd\tau _x=\int \left( \int fd\tau _y\right) d\mu _x(y), \end{aligned}$$

where

$$\begin{aligned} \mu =\int \tau _xd\mu (x)\quad \text {and} \; \mu =\int \mu _xd\mu (x) \end{aligned}$$

are the disintegrations of \(\mu \) as in Theorem 3.4 and over \(P_\mu (T)\) respectively.

Proof

As \(\{a_k\}\) is pointwise good, for every \(f\in C(X)\) there exist \(\bar{f}\) and \(f^*\) in \(L^2(\mu )\) such that

$$\begin{aligned} \frac{1}{N}\sum _{k=1}^N T^{a_k} f\rightarrow \bar{f} \;\;\text {and}\;\; \frac{1}{N}\sum _{k=1}^N T^{a_k}E(f|P_\mu (T))\rightarrow f^* \end{aligned}$$

\(\mu \)-almost everywhere. As the Pinsker \(\sigma \)-algebra \(P_\mu (T)\) is a characteristic \(\sigma \)-algebra for the sequence \(\{a_k\}\), by the definition we have

$$\begin{aligned} \frac{1}{N}\sum _{k=1}^{N} T^{a_k}f- \frac{1}{N}\sum _{k=1}^{N} T^{a_k}E(f|P_\mu (T))\rightarrow 0 \end{aligned}$$

as \(N\rightarrow \infty \) in \(L^2(\mu )\). So there exists a strictly increasing sequence \(\{N_i\}\) of positive integers such that

$$\begin{aligned} \frac{1}{N_i}\sum _{k=1}^{N_i} T^{a_k}f- \frac{1}{N_i}\sum _{k=1}^{N_i} T^{a_k}E(f|P_\mu (T))\rightarrow 0 \end{aligned}$$

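as \(i\rightarrow \infty \) \(\mu \)-almost everywhere. Hence \(\bar{f}(x)= f^*(x)\) for \(\mu \)-a.e. \(x\in X\). Since \(P_\mu (T)\) is T-invariant, each \(T^{a_k}E(f|P_\mu (T))\) is \(P_\mu (T)\)-measurable, so the almost everywhere limit \(f^*\) is \(P_\mu (T)\)-measurable, and hence so is \(\bar{f}\). Let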

$$\begin{aligned} \mu =\int \tau _xd\mu (x) \end{aligned}$$

be the disintegration of \(\mu \) as in Theorem 3.4. Then \(\bar{f}(x)=\int f d\tau _x\) for \(\mu \)-a.e. \(x\in X\). Let

$$\begin{aligned} \mu =\int \mu _xd\mu (x) \end{aligned}$$

be the disintegration of \(\mu \) over the Pinsker \(\sigma \)-algebra \(P_\mu (T)\). As \(\bar{f}\) is \(P_\mu (T)\)-measurable, by (2.2) we have \(\bar{f}(x)=\int \bar{f} d\mu _x\) for \(\mu \)-a.e. \(x\in X\).

Now fix a countable dense subset \(\{f_n\}\) of C(X). By the above discussion, there exists a Borel subset \(X_1\) of \(X_0\) with \(\mu (X_1)=1\) such that for every \(x\in X_1\) and \(n\in {\mathbb {N}}\),

$$\begin{aligned} \int f_n d\tau _x = \int \left( \int f_n d\tau _y\right) d\mu _x(y). \end{aligned}$$
(3.1)

As any function in C(X) can be uniformly approximated by \(\{f_n\}\), (3.1) holds for all \(f\in C(X)\). \(\square \)

Remark 3.6

Let \(\alpha \) be a measurable partition generating \(P_\mu (T)\). Then \(\mu _x(\alpha (x))=1\) for \(\mu \)-a.e. \(x\in X\), where \(\alpha (x)\) is the atom of \(\alpha \) containing x. If a function f is \(P_\mu (T)\)-measurable, then we know from (2.2) that for \(\mu \)-a.e. \(x\in X\), f is a constant almost everywhere with respect to \(\mu _x\) on \(\alpha (x)\). By Theorem 3.5, if a sequence is pointwise good and satisfies Condition (\(*\)), then the limit of the ergodic average along this sequence is a constant on any atom of the generating partition of \(P_\mu (T)\).

If \((X,{\mathscr {B}},\mu ,T)\) is a Kolmogorov system, then the Pinsker factor is trivial. This yields the following.

Corollary 3.7

Let \(\{a_k\}\) be a pointwise good sequence satisfying Condition (\(*\)) and \((X,{\mathscr {B}},\mu ,T)\) a Kolmogorov system. Then for every \(f\in L^2(\mu )\),

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^{N}f(T^{a_k}x) = \int fd\mu \end{aligned}$$

holds for \(\mu \)-a.e. \(x\in X\).
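The Bernoulli shift on \(\{0,1\}^{{\mathbb {Z}}}\) with the \((\frac{1}{2},\frac{1}{2})\) product measure is a Kolmogorov system, and the primes are pointwise good (Theorem 3.8) and strictly increasing, so Corollary 3.7 applies. For \(f(x)=x_0\) and the left shift \((Tx)_i=x_{i+1}\), one has \(f(T^{p_k}x)=x_{p_k}\), and the corollary predicts averages close to 1/2 for typical x. The Python sketch below simulates one such x with a seeded pseudo-random generator, an assumption standing in for a \(\mu \)-typical point; the helper names are hypothetical. In this particular case the conclusion also follows from the strong law of large numbers, since distinct primes select independent coordinates.

```python
import random

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n, in increasing order."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, int(n**0.5) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if is_prime[i]]

def prime_average(num_primes, seed=0):
    """(1/N) sum_{k<=N} f(T^{p_k} x) with f(x) = x_0, i.e. (1/N) sum_k x_{p_k}, for a
    simulated Bernoulli(1/2) point x; the seed is fixed for reproducibility."""
    primes = primes_up_to(300_000)[:num_primes]
    rng = random.Random(seed)
    x = [rng.getrandbits(1) for _ in range(primes[-1] + 1)]   # coordinates x_0, ..., x_{p_N}
    return sum(x[p] for p in primes) / num_primes

print(prime_average(20_000))
```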

3.3 Examples of pointwise good sequences

There are many results on pointwise good and bad sequences; we refer the reader to [9, 30] for recent work on this topic. Here we list some sequences of positive integers which are pointwise good and satisfy Condition (\(*\)).

Theorem 3.8

([5, Theorem 1]). The sequence of prime numbers is pointwise good.

Theorem 3.9

([6, Theorem 2]). For any polynomial

$$\begin{aligned} p(x)=b_{0}+b_{1}x+\cdots +b_{m}x^{m}, \end{aligned}$$

where \(m\ge 1\), \(b_0,\cdots ,b_m\in {\mathbb {R}}\), and \(b_m>0\), the sequence \(\{[p(k)]\}\) is pointwise good.

Remark 3.10

There are very good random sequences of integers that are not too sparse, including ones that grow faster than any polynomial (see [7, Proposition 8.2]).

Furthermore, a lot of sequences generated by logarithmic–exponential subpolynomials are pointwise good (for details see [4, Theorems 3.4, 3.5, 3.8]). In particular, one has the following:

Theorem 3.11

For every non-integer number \(r>0\), the sequence \(\{[k^r]\}\) is very good.

Theorem 3.12

([6, Theorem in Appendix]). Let \((X,{\mathscr {B}},\mu ,T)\) be an ergodic measure-preserving system and \(A\in {\mathscr {B}}\) with \(\mu (A)>0\). Then for \(\mu \)-a.e. \(x\in X\), the sequence \(\{n\in {\mathbb {N}}:T^nx\in A\}\) is pointwise good.

Theorem 3.13

([8, Theorem 1]). Let (X, T) be a minimal equicontinuous system with the unique invariant measure \(\mu \). For every \(x\in X\) and every subset A of X with \(\mu (A)>0\) and \(\mu (\partial A)=0\) (where \(\partial A\) is the boundary of A), the sequence \(\{n\in {\mathbb {N}}:T^nx\in A\}\) is pointwise good.

Theorem 3.14

([27, Theorem 3]). Return time sequences for dynamical systems which are abelian extensions of translations on compact abelian groups are pointwise good. Moreover, the Morse sequence is very good.

Remark 3.15

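The sequences in Theorems 3.8, 3.12 and 3.13 are strictly increasing, so they satisfy Condition (\(*\)) by Lemma 3.1. Similarly to the proof of Lemma 3.1, it is not hard to check that the sequences in Theorems 3.9 and 3.11 satisfy Condition (\(*\)) as well.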

4 Proof of the main theorem

4.1 Dynamical systems with positive entropy

Let (X, T) be a topological dynamical system, \(\mu \in {\mathscr {M}}(X,T)\), and \({\mathscr {B}}_{\mu }\) the completion of \({\mathscr {B}}_X\) under \(\mu \). Then \((X,{\mathscr {B}}_\mu ,\mu ,T)\) is a Lebesgue system. Let \(P_\mu (T)\) be the Pinsker \(\sigma \)-algebra of \((X,{\mathscr {B}}_\mu ,\mu ,T)\) and

$$\begin{aligned} \mu =\int _X \mu _x d \mu (x) \end{aligned}$$

the disintegration of \(\mu \) over \(P_\mu (T)\). In this setting, we first recall the following lemma.

Lemma 4.1

([22, Lemma 3.1]). If \(\mu \in {\mathscr {M}}^e(X,T)\) and \(h_\mu (T)>0\), then for \(\mu \)-a.e. \(x\in X\),

$$\begin{aligned} \overline{W^s(x,T)\cap {{\mathrm{supp}}}(\mu _x)}={{\mathrm{supp}}}(\mu _x) \quad \text {and}\; \overline{W^u(x,T)\cap {{\mathrm{supp}}}(\mu _x)}={{\mathrm{supp}}}(\mu _x). \end{aligned}$$

By Theorem 3.2 and the variational principle, our main result (Theorem 1.5) follows from Theorem 4.2:

Theorem 4.2

Let \((X,T)\) be a topological dynamical system and \(\mu \in {\mathscr {M}}^e(X,T)\) with \(h_\mu (T)>0\). If \(\{a_k\}\) is a pointwise good sequence and the Pinsker \(\sigma \)-algebra \(P_\mu (T)\) is a characteristic \(\sigma \)-algebra for the sequence \(\{a_k\}\), then for \(\mu \)-a.e. \(x\in X\), there exists a Cantor subset \(K_x\subset \overline{W^s(x,T)}\cap \overline{W^u(x,T)}\) such that for every integer \(n\ge 2\) and pairwise distinct points \(x_1,x_2,\cdots ,x_n\) in \(K_x\) we have

$$\begin{aligned} \liminf _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\max _{1\le i<j\le n} d\left( T^{a_k}x_i,T^{a_k}x_j\right) =0 \end{aligned}$$

and

$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\min _{1\le i<j\le n} d\left( T^{a_k}x_i,T^{a_k}x_j\right) \ge \eta _{x,n}, \end{aligned}$$

where \(\eta _{x,n}\) is a positive constant depending only on x and n.
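The two averages in Theorem 4.2 can be evaluated numerically for finitely many terms. The Python sketch below is only an illustration under stated assumptions: it uses the two-sided full shift on \(\{0,1\}^{\mathbb {Z}}\) with the metric \(d(x,y)=2^{-\min \{|i|:x_i\ne y_i\}}\), represents points as functions \({\mathbb {Z}}\rightarrow \{0,1\}\), and compares coordinates only in a finite window, so distances below \(2^{-\text {window}}\) are reported as 0. All names are hypothetical. Two points differing in only finitely many coordinates are asymptotic, and their average along \(a_k=k^2\) is already small for \(N=100\), whereas a pair differing everywhere stays at distance 1.

```python
from itertools import combinations
from typing import Callable, Sequence

# A point of the two-sided full shift {0,1}^Z, represented as a function Z -> {0,1}.
Point = Callable[[int], int]

def shift_orbit(x: Point, m: int) -> Point:
    """T^m x for the left shift (T x)_i = x_{i+1}."""
    return lambda i: x(i + m)

def shift_dist(x: Point, y: Point, window: int = 60) -> float:
    """d(x, y) = 2^(-min{|i| : x_i != y_i}), searching only |i| <= window.
    Points agreeing on the whole window are treated as being at distance 0 (truncation)."""
    for r in range(window + 1):
        if x(r) != y(r) or x(-r) != y(-r):
            return 2.0 ** (-r)
    return 0.0

def pair_average(points: Sequence[Point], seq: Sequence[int], agg) -> float:
    """(1/N) sum_{k=1}^N agg_{i<j} d(T^{a_k} x_i, T^{a_k} x_j) with seq = [a_1, ..., a_N].
    agg is max or min; at least two points are required."""
    total = 0.0
    for a in seq:
        images = [shift_orbit(x, a) for x in points]
        total += agg(shift_dist(p, q) for p, q in combinations(images, 2))
    return total / len(seq)

def base(i: int) -> int:
    return 1 if i % 3 == 0 else 0

def flipped(i: int) -> int:
    # Differs from base only at coordinate 0, hence is asymptotic to base.
    return 1 - base(i) if i == 0 else base(i)

def complement(i: int) -> int:
    # Differs from base at every coordinate.
    return 1 - base(i)

squares = [k * k for k in range(1, 101)]  # a_k = k^2, N = 100
print(pair_average([base, flipped], squares, max))     # small: the pair is asymptotic
print(pair_average([base, complement], squares, max))  # the pair is at distance 1 at every step
```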

Proof

Let

$$\begin{aligned} \mu =\int _X\mu _xd\mu (x) \end{aligned}$$

be the disintegration of \(\mu \) over \(P_\mu (T)\). By Lemma 4.1, there exists a Borel subset \(X_1\) of X with \(\mu (X_1)=1\) such that for any \(x\in X_1\),

$$\begin{aligned} \overline{W^s(x,T)\cap {{\mathrm{supp}}}(\mu _x)}={{\mathrm{supp}}}(\mu _x) \quad \text {and}\; \overline{W^u(x,T)\cap {{\mathrm{supp}}}(\mu _x)}={{\mathrm{supp}}}(\mu _x). \end{aligned}$$

Fix an integer \(n\ge 2\). Set

$$\begin{aligned} MP_n(X,T)= \left\{ (x_1,x_2,\cdots ,x_n)\in X^n:\liminf _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\max _{1\le i<j\le n} d(T^{a_k}x_i,T^{a_k}x_j)=0\right\} . \end{aligned}$$

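By an argument similar to that of [22, Lemma 2.4], \(MP_n(X,T)\) is a \(G_\delta \) subset of \(X^n\). It is clear that \((x_1,x_2,\cdots ,x_n)\in MP_n(X,T)\) for any \(x_1,x_2,\cdots ,x_n\) in \(W^s(x,T)\), since then \(d(T^mx_i,T^mx_j)\rightarrow 0\) as \(m\rightarrow \infty \) for all i and j. For each \(x\in X_1\), the set \((W^s(x,T)\cap {{\mathrm{supp}}}(\mu _x))^n\) is dense in \({{\mathrm{supp}}}(\mu _x)^n\) and contained in \(MP_n(X,T)\). Thus, \(MP_n(X,T)\cap {{\mathrm{supp}}}(\mu _x)^n\) is a dense \(G_\delta \) subset of \({{\mathrm{supp}}}(\mu _x)^n\).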

Define a measure \(\lambda _n\) on \((X^n,T^{(n)})\) by

$$\begin{aligned} \lambda _n=\int _X \mu _x^{(n)} d \mu (x), \end{aligned}$$

where \(\mu _x^{(n)}=\mu _x\times \mu _x\times \cdots \times \mu _x\) (n-times) and \(T^{(n)}=T\times T\times \cdots \times T\) (n-times). As \(\mu \) is ergodic and has positive entropy, \({\mu }_x\) is non-atomic for \(\mu \)-a.e. \(x\in X\) and \(\lambda _n\) is a \(T^{(n)}\)-invariant ergodic measure on \(X^{n}\) (see [21, Lemma 5.4]). Since \(\mu _x\) is non-atomic for \(\mu \)-a.e. \(x\in X\), by the Fubini theorem, \(\lambda _n(\Delta ^{(n)})=0\). By Theorem 3.4, there exists a disintegration of \(\lambda _n\) with respect to the sequence \(\{a_k\}\):
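The Fubini step can be spelled out as follows, assuming that \(\Delta ^{(n)}\) denotes the set \(\{(x_1,x_2,\cdots ,x_n)\in X^n: x_i=x_j \text { for some } i\ne j\}\). For each pair \(i<j\), the event \(\{x_i=x_j\}\) depends only on the coordinates i and j, and \(\mu _x(\{z\})=0\) for every \(z\in X\) whenever \(\mu _x\) is non-atomic, which holds for \(\mu \)-a.e. x:

$$\begin{aligned} \lambda _n(\Delta ^{(n)})&=\int _X \mu _x^{(n)}(\Delta ^{(n)})\,d\mu (x)\le \sum _{1\le i<j\le n}\int _X (\mu _x\times \mu _x)\left( \{(y,z)\in X^2:y=z\}\right) d\mu (x)\\&=\frac{n(n-1)}{2}\int _X\int _X \mu _x(\{z\})\,d\mu _x(z)\,d\mu (x)=0. \end{aligned}$$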

$$\begin{aligned} \lambda _n=\int \tau _{(x_1,x_2,\cdots ,x_n)}d\lambda _n(x_1,x_2,\cdots ,x_n). \end{aligned}$$

Consider the continuous function

$$\begin{aligned} f:X^n\rightarrow {\mathbb {R}},\quad (x_1,x_2,\cdots ,x_n)\mapsto \min \{d(x_i,x_j):1\le i<j\le n\}. \end{aligned}$$

By Theorem 3.4, for \(\lambda _n\)-a.e. \((x_1,x_2,\cdots ,x_n)\in X^n\) we have

$$\begin{aligned} \lim _{N\rightarrow \infty } \frac{1}{N}\sum _{k=1}^{N} f\left( (T^{(n)})^{a_k}(x_1,x_2,\cdots ,x_n)\right)&= \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\min _{1\le i<j\le n} d(T^{a_k}x_i,T^{a_k}x_j)\\&=\int f d\tau _{(x_1,x_2,\cdots ,x_n)}. \end{aligned}$$

Since \(\lambda _n(\Delta ^{(n)})=0\), \(\tau _{(x_1,x_2,\cdots ,x_n)}(\Delta ^{(n)})=0\) for \(\lambda _n\)-a.e. \((x_1,x_2,\cdots ,x_n)\). Note that for any \((x_1,x_2,\cdots ,x_n)\not \in \Delta ^{(n)}\), \(f(x_1,x_2,\cdots ,x_n)>0\). Thus, we have \(\int fd\tau _{(x_1,x_2,\cdots ,x_n)}>0\) for \(\lambda _n\)-a.e. \((x_1,x_2,\cdots ,x_n)\).

Let \(\pi :X^n\rightarrow X\), \((x_1,x_2,\cdots ,x_n)\mapsto x_1\) be the canonical projection to the first coordinate. By [17, Theorem 4] (see also [23, Lemma 4.2]), we know that the Pinsker \(\sigma \)-algebra of \((X^n,\lambda _n,T^{(n)})\) equals \(\pi ^{-1}(P_\mu (T))\pmod {\lambda _n}\). So

$$\begin{aligned} \lambda _n=\int _X \mu _x^{(n)} d \mu (x) \end{aligned}$$

can also be regarded as the disintegration of \(\lambda _n\) over the Pinsker \(\sigma \)-algebra of \((X^n,\lambda _n,T^{(n)})\). By Theorem 3.5, for \(\mu \)-a.e. \(x\in X\), \(\int f d\tau _{(x_1,x_2,\cdots ,x_n)}\) is constant for \(\mu _x^{(n)}\)-a.e. \((x_1,x_2,\cdots ,x_n)\in X^n\). More specifically, there exists a Borel subset \(X_2\) of X with \(\mu (X_2)=1\) such that for any \(x\in X_2\),

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\min _{1\le i<j\le n} d(T^{a_k}x_i,T^{a_k}x_j)=\eta _{x,n} \end{aligned}$$

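for \(\mu _x^{(n)}\)-a.e. \((x_1,x_2,\cdots ,x_n)\in X^n\) and some constant \(\eta _{x,n}\). As \(\int f d\tau _{(x_1,x_2,\cdots ,x_n)}>0\) for \(\lambda _n\)-a.e. \((x_1,x_2,\cdots ,x_n)\) and \(\lambda _n=\int _X \mu _x^{(n)} d \mu (x)\), we may also assume that \(\eta _{x,n}>0\) for every \(x\in X_2\). Put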

$$\begin{aligned} D_{n,\eta }(X,T)=\left\{ (x_1,x_2,\cdots ,x_n)\in X^n:\limsup _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\min _{1\le i<j\le n} d(T^{a_k}x_i,T^{a_k}x_j)\ge \eta \right\} . \end{aligned}$$

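As in [22, Lemma 2.4], \(D_{n,\eta }(X,T)\) is a \(G_\delta \) subset of \(X^n\). For each \(x\in X_2\), the set \(D_{n,\eta _{x,n}}(X,T) \cap {{\mathrm{supp}}}(\mu _x)^n\) has full \(\mu _x^{(n)}\)-measure; since every nonempty relatively open subset of \({{\mathrm{supp}}}(\mu _x)^n={{\mathrm{supp}}}(\mu _x^{(n)})\) has positive \(\mu _x^{(n)}\)-measure, it is a dense \(G_\delta \) subset of \({{\mathrm{supp}}}(\mu _x)^n\).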

Let \(X_0\) be the intersection of \(X_1\) with the sets \(X_2\) obtained above for all integers \(n\ge 2\). Then \(X_0\) is a Borel subset of X with \(\mu (X_0)=1\), and by the Baire category theorem, for every integer \(n\ge 2\) and \(x\in X_0\), \(MP_n(X,T)\cap D_{n,\eta _{x,n}}(X,T) \cap {{\mathrm{supp}}}(\mu _x)^n\) is a dense \(G_\delta \) subset of \({{\mathrm{supp}}}(\mu _x)^n\). Since \(\mu _x\) is non-atomic for \(\mu \)-a.e. \(x\in X\), we can also require that \(\mu _x\) is non-atomic for every \(x\in X_0\). Then for each \(x\in X_0\), \({{\mathrm{supp}}}(\mu _x)\) is a perfect closed subset of X. By Mycielski’s theorem, there exists a Cantor subset \(K_x\) of \({{\mathrm{supp}}}(\mu _x)\) such that for every integer \(n\ge 2\),

$$\begin{aligned} K_x^n\subset (MP_n(X,T)\cap D_{n,\eta _{x,n}}(X,T))\cup \Delta ^{(n)}. \end{aligned}$$

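By the choice of \(X_1\) (and \(X_0\subset X_1\)), \(K_x\subset {{\mathrm{supp}}}(\mu _x)\subset \overline{W^s(x,T)}\cap \overline{W^u(x,T)}\). Thus, \(K_x\) is as required. \(\square \)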

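Recall that the Pinsker factor of a Kolmogorov system is trivial; in this case \(\mu _x=\mu \) for \(\mu \)-a.e. \(x\in X\), so \({{\mathrm{supp}}}(\mu _x)={{\mathrm{supp}}}(\mu )\). Theorem 4.2 therefore yields the following.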

Corollary 4.3

Let \(\{a_k\}\) be a pointwise good sequence satisfying Condition (\(*\)). Suppose that \((X,T)\) is a topological dynamical system and there is an invariant measure \(\mu \) such that \({{\mathrm{supp}}}(\mu )=X\) and \((X,{\mathscr {B}},\mu ,T)\) is a Kolmogorov system. Then there exist a Cantor subset K of X and a sequence \(\{\eta _n\}\) of positive numbers such that for every integer \(n\ge 2\) and pairwise distinct points \(x_1,x_2,\cdots ,x_n\) in K, it holds that

$$\begin{aligned} \liminf _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\max _{1\le i<j\le n} d\left( T^{a_k}x_i,T^{a_k}x_j\right) =0 \end{aligned}$$

and

$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\min _{1\le i<j\le n} d\left( T^{a_k}x_i,T^{a_k}x_j\right) \ge \eta _{n}. \end{aligned}$$

Remark 4.4

One of the referees asked the following interesting question: do there exist subsequences that are bad for mean Li–Yorke chaos in dynamical systems with positive topological entropy? In particular, what about the sequence \(\{2^n\}\)?

Bellow proved in [1] that rapidly growing sequences, known as lacunary sequences, are bad for pointwise ergodic theorems. Recall that a positive sequence \(\{a_n\}\) is called lacunary if there exists \(\lambda > 1\) such that \(\frac{a_{n+1}}{a_n}\ge \lambda \) for all \(n\ge 1\). Clearly the sequence \(\{2^n\}\) is lacunary. Note that the proof of Theorem 4.2 relies heavily on the sequence being good for pointwise ergodic theorems, so answering the above question seems to require new techniques.
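The lacunarity condition can be tested on a finite initial segment, although a finite check can only suggest, not prove, that an infinite sequence is lacunary. The hypothetical helpers below return the smallest ratio \(a_{n+1}/a_n\) over a given prefix, using exact rational arithmetic. For \(\{2^n\}\) every ratio equals 2, while for \(\{n^2\}\) the ratios \((n+1)^2/n^2\) decrease to 1, so no single \(\lambda >1\) works for the whole sequence.

```python
from fractions import Fraction
from typing import Sequence

def min_ratio(prefix: Sequence[int]) -> Fraction:
    """Smallest ratio a_{n+1}/a_n over a finite prefix of a positive integer sequence."""
    if len(prefix) < 2 or any(a <= 0 for a in prefix):
        raise ValueError("need at least two positive terms")
    return min(Fraction(b, a) for a, b in zip(prefix, prefix[1:]))

def looks_lacunary(prefix: Sequence[int], lam: Fraction) -> bool:
    """True if a_{n+1}/a_n >= lam on the whole prefix; lam > 1 is a user-chosen candidate."""
    return lam > 1 and min_ratio(prefix) >= lam

powers = [2 ** n for n in range(1, 31)]
squares = [n * n for n in range(1, 31)]
print(min_ratio(powers), min_ratio(squares))
```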

4.2 Non-invertible case

In this subsection, we generalize Theorem 4.2 to the non-invertible case. Let \((X,T)\) be a non-invertible system; that is, X is a compact metric space and the map \(T:X\rightarrow X\) is continuous and surjective but not injective.

Theorem 4.5

Let \((X,T)\) be a non-invertible system and \(\mu \in {\mathscr {M}}^e(X,T)\) with \(h_\mu (T)>0\). If \(\{a_k\}\) is a pointwise good sequence and the Pinsker \(\sigma \)-algebra \(P_\mu (T)\) is a characteristic \(\sigma \)-algebra for the sequence \(\{a_k\}\), then for \(\mu \)-a.e. \(x\in X\), there exists a Cantor subset \(K_x\subset \overline{W^s(x,T)}\) such that for every integer \(n\ge 2\) and pairwise distinct points \(x_1,x_2,\cdots ,x_n\) in \(K_x\) we have

$$\begin{aligned} \liminf _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\max _{1\le i<j\le n} d\left( T^{a_k}x_i,T^{a_k}x_j\right) =0 \end{aligned}$$

and

$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\min _{1\le i<j\le n} d\left( T^{a_k}x_i,T^{a_k}x_j\right) \ge \eta _{x,n}, \end{aligned}$$

where \(\eta _{x,n}\) is a positive constant depending only on x and n.

Proof

We consider the natural extension \(({\widetilde{X}}, {\widetilde{T}})\) of \((X,T)\); that is,

$$\begin{aligned} {\widetilde{X}}=\left\{ (x_{1}, x_{2}, \cdots )\in X^{{\mathbb {N}}}:Tx_{i+1}=x_{i},\ i\in {\mathbb {N}}\right\} \end{aligned}$$

and \({\widetilde{T}}:{\widetilde{X}}\rightarrow {\widetilde{X}}\), \((x_{1}, x_{2}, \cdots )\mapsto (Tx_{1}, x_{1}, x_{2},\cdots )\) is the shift homeomorphism. A compatible metric \({\tilde{d}}\) on \({\widetilde{X}}\) is defined by

$$\begin{aligned} {\tilde{d}}((x_{1}, x_{2}, \cdots ), (y_{1}, y_{2}, \cdots ))= \sum _{i=1}^{\infty }\frac{d(x_{i}, y_{i})}{2^{i}}. \end{aligned}$$
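To illustrate the construction, the Python sketch below works with finite windows \((x_1,x_2,\cdots ,x_L)\) of points of \({\widetilde{X}}\) for a hypothetical non-invertible example, the doubling map \(Tx=2x \pmod 1\) on the circle, using exact rational arithmetic. All function names are hypothetical. The truncated metric omits the tail of the series, which is at most \(\frac{1}{2}\cdot 2^{-L}\) because the circle has diameter 1/2. The example checks a direct consequence of the definitions: if two points of \({\widetilde{X}}\) have the same first coordinate, then the first \(m+1\) coordinates of their images under \({\widetilde{T}}^m\) coincide, so their \({\tilde{d}}\)-distance is at most \(2^{-(m+2)}\).

```python
from fractions import Fraction

def T(x: Fraction) -> Fraction:
    """Doubling map on the circle R/Z, computed exactly with rationals."""
    return (2 * x) % 1

def circle_dist(x: Fraction, y: Fraction) -> Fraction:
    t = abs(x - y) % 1
    return min(t, 1 - t)

def lift(x1: Fraction, branches: list[int]) -> list[Fraction]:
    """Window (x_1, ..., x_L) of a point of the natural extension, so T x_{i+1} = x_i.
    branches[i] in {0, 1} selects the preimage x/2 or x/2 + 1/2 of the previous entry."""
    pts = [x1 % 1]
    for b in branches:
        pts.append(pts[-1] / 2 + Fraction(b, 2))
    return pts

def shift(xt: list[Fraction]) -> list[Fraction]:
    """Truncated map (x_1, x_2, ...) -> (T x_1, x_1, x_2, ...), keeping the window length."""
    return [T(xt[0])] + xt[:-1]

def d_tilde(xt: list[Fraction], yt: list[Fraction]) -> Fraction:
    """sum_i d(x_i, y_i) / 2^i over the window (the omitted tail is small)."""
    return sum(circle_dist(a, b) / 2 ** i for i, (a, b) in enumerate(zip(xt, yt), start=1))

x = lift(Fraction(1, 3), [0] * 12)
y = lift(Fraction(1, 3), [1] + [0] * 11)  # same x_1, different preimage x_2
for m in range(8):
    print(m, float(d_tilde(x, y)), 2.0 ** -(m + 2))
    x, y = shift(x), shift(y)
```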

Let \(\pi :{\widetilde{X}}\rightarrow X\), \((x_1,x_2,\cdots )\mapsto x_1\) be the projection to the first coordinate. Then \(\pi :({\widetilde{X}},{\widetilde{T}})\rightarrow (X,T)\) is a factor map. As \(\mu \) is an ergodic invariant measure of \((X,T)\), there exists an ergodic invariant measure \({\widetilde{\mu }}\) of \(({\widetilde{X}},{\widetilde{T}})\) such that \(\pi ({\widetilde{\mu }})=\mu \). Clearly, \(h_{{\widetilde{\mu }}}({\widetilde{T}})\ge h_{\mu }(T)>0\). By Theorem 4.2, there is a Borel subset \({\widetilde{X}}_{0}\) of \({\widetilde{X}}\) with \({\widetilde{\mu }}({\widetilde{X}}_{0})=1\) such that for each \({\tilde{x}}\in {\widetilde{X}}_{0}\), there exists a Cantor subset \(K_{{\tilde{x}}}\subset \overline{W^s({\tilde{x}},{\widetilde{T}})}\) with the property that for every integer \(n\ge 2\) and pairwise distinct points \({\tilde{x}}_1,{\tilde{x}}_2,\cdots ,{\tilde{x}}_n\) in \(K_{{\tilde{x}}}\) one has

$$\begin{aligned} \liminf _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\max _{1\le i<j\le n} {\tilde{d}} \left( {\widetilde{T}}^{a_k}{\tilde{x}}_i,{\widetilde{T}}^{a_k}{\tilde{x}}_j\right) =0 \end{aligned}$$

and

$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\min _{1\le i<j\le n} {\tilde{d}}\left( {\widetilde{T}}^{a_k}{\tilde{x}}_i,{\widetilde{T}}^{a_k}{\tilde{x}}_j\right) \ge \eta _{{\tilde{x}},n}. \end{aligned}$$

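Let \(X_{0}=\pi ({\widetilde{X}}_{0})\). As a continuous image of a Borel set, \(X_{0}\) is analytic and hence \(\mu \)-measurable, and \(\mu (X_{0})={\widetilde{\mu }}(\pi ^{-1}(X_{0}))\ge {\widetilde{\mu }}({\widetilde{X}}_{0})=1\). For any \(x\in X_{0}\), choose \({\tilde{x}}\in {\widetilde{X}}_{0}\) with \(\pi ({\tilde{x}})=x\) and let \(K_{x}=\pi (K_{{\tilde{x}}})\). Since \(\pi \) is continuous and maps \(W^s({\tilde{x}},{\widetilde{T}})\) into \(W^s(x,T)\), we have \(K_{x}\subset \overline{W^s(x,T)}\). Moreover, two points of \({\widetilde{X}}\) with the same first coordinate are asymptotic under \({\widetilde{T}}\), whereas by the \(\limsup \) estimate above no two distinct points of \(K_{{\tilde{x}}}\) are; hence \(\pi \) is injective on \(K_{{\tilde{x}}}\) and \(K_x\) is a Cantor set. Arguing as in [22, Lemma 3.7], for every integer \(n\ge 2\) there is a positive constant \(\eta _{x,n}\) such that for any pairwise distinct points \(x_1,x_2,\cdots ,x_n\) in \(K_x\) one has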

$$\begin{aligned} \liminf _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\max _{1\le i<j\le n} d\left( T^{a_k}x_i,T^{a_k}x_j\right) =0 \end{aligned}$$

and

$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N}\sum _{k=1}^N\min _{1\le i<j\le n} d\left( T^{a_k}x_i,T^{a_k}x_j\right) \ge \eta _{x,n}. \end{aligned}$$

So \(K_x\) is as required. \(\square \)