1 Introduction

1.1 General background

This paper concerns probabilistic aspects of the Takens delay embedding theorem, dealing with the problem of reconstructing a dynamical system from a sequence of measurements of a one-dimensional observable. More precisely, let \(T:X \rightarrow X\) be a transformation on a phase space X. Fix \(k \in {\mathbb {N}}\) and consider a function (observable) \(h :X \rightarrow {\mathbb {R}}\) together with the corresponding k-delay coordinate map

$$\begin{aligned} \phi :X \rightarrow {\mathbb {R}}^k, \qquad \phi (x) = (h(x), \ldots , h(T^{k-1}x)). \end{aligned}$$

Takens-type delay embedding theorems state that if k is large enough, then \(\phi \) is an embedding (i.e. is injective) for a typical observable h. The injectivity of \(\phi \) ensures that an (unknown) initial state \(x \in X\) of the system can be uniquely recovered from the sequence of k measurements \(h(x), \ldots , h(T^{k-1}x)\) of the observable h, performed along the orbit of x. It also implies that the dynamical system \((X,T)\) has a reliable model in \({\mathbb {R}}^k\) of the form \(({{\tilde{X}}}, {{\tilde{T}}}) = (\phi (X), \phi \circ T \circ \phi ^{-1})\).

This line of research originates from the seminal paper of Takens [Tak81] on diffeomorphisms of compact manifolds. Extensions of Takens’ work were obtained in several categories, e.g. in [SYC91, Sta99, Cab00, Rob05, Gut16, GQS18, SBDH97, SBDH03, NV20] (see also [Rob11, BGŚ20] for a more detailed overview). A common feature of these results is that the minimal number of measurements sufficient for an exact reconstruction of the system is \(k \approx 2\dim X \), where \(\dim X \) is the dimension of the phase space X. This threshold agrees with the one appearing in the classical non-dynamical embedding theorems (e.g. the Whitney theorem [Whi36], the Menger–Nöbeling theorem [HW41, Theorem V.2] and the Mañé theorem [Rob11, Theorem 6.2]). It is worth noticing that Takens-type theorems serve as a justification of the validity of time-delay based procedures, which are actually used in applications (see e.g. [HGLS05, KY90, SGM90, SM90]) and have met with great interest among mathematical physicists (see e.g. [PCFS80, HBS15, SYC91, Vos03]).

In 1998, Shroer, Sauer, Ott and Yorke conjectured (see [SSOY98, Conjecture 1]) that for smooth diffeomorphisms on compact manifolds, in a probabilistic setting (i.e. when the initial point \(x \in X\) is chosen randomly according to a natural probability measure \(\mu \)), the number of measurements required for an almost sure predictable reconstruction of the system can be generically reduced by half, down to the information dimension of \(\mu \). A precise formulation is given below in Sect. 1.2. We will refer to this conjecture as the Shroer–Sauer–Ott–Yorke predictability conjecture or the SSOY predictability conjecture. In [SSOY98], the authors provided some heuristic arguments supporting the conjecture, together with a numerical verification for some examples (the Hénon and Ikeda maps). However, a rigorous proof of the conjecture has remained unknown until now.

In this paper, we prove a general version of a predictable embedding theorem (Theorem 1.7), valid for injective Lipschitz transformations of compact sets and arbitrary Borel probability measures, which shows that an almost sure predictable reconstruction of the system is possible with the number of measurements reduced to the Hausdorff dimension of \(\mu \), under a mild assumption bounding the dimensions of the sets of periodic points of low periods. As a corollary, we obtain the SSOY predictability conjecture for generic smooth \(C^r\)-diffeomorphisms on compact manifolds for \(r \ge 1\), with the information dimension replaced by the Hausdorff one (Corollary 1.9), and the original conjecture for arbitrary \(C^r\)-diffeomorphisms and ergodic measures (Corollary 1.10). We also construct an example of a \(C^\infty \)-smooth diffeomorphism of a compact Riemannian manifold with a non-ergodic natural measure, for which the original conjecture does not hold (Theorem 1.11). This shows that in the general case, replacing the information dimension with the Hausdorff one is necessary.

Let us note that the SSOY predictability conjecture has been invoked in a number of papers (see e.g. [Liu10, MS04, OL98]) as a theoretical argument for reducing the number of measurements required for a reliable reconstruction of the system, also in applications (see e.g. [QMAV99] studying neural brain activity in focal epilepsy). Our result provides a mathematically rigorous proof of the correctness of these procedures.

1.2 Shroer–Sauer–Ott–Yorke predictability conjecture

Before we formulate the conjecture stated in [SSOY98] in a precise way, we need to introduce some preliminaries, in particular the notion of predictability. In the sequel, we consider a general situation, when the phase space X is an arbitrary compact set in \({\mathbb {R}}^N\) (note that by the Whitney embedding theorem [Whi36], we can assume that a smooth compact manifold is embedded in \({\mathbb {R}}^N\) for sufficiently large N). We denote the (topological) support of a measure \(\mu \) by \({{\,\mathrm{supp}\,}}\mu \) and write \(\phi _*\mu \) for the push-forward of \(\mu \) by a measurable transformation \(\phi \), defined by \(\phi _*\mu (A) = \mu (\phi ^{-1}(A))\) for measurable sets A.

Definition 1.1

Let \(X \subset {\mathbb {R}}^N\) be a compact set, let \(\mu \) be a Borel probability measure with support in X and let \(T :X \rightarrow X\) be a Borel transformation (i.e. such that the preimage of any Borel set is Borel). Fix \(k \in {\mathbb {N}}\). Let \(h :X \rightarrow {\mathbb {R}}\) be a Borel observable and let \(\phi :X \rightarrow {\mathbb {R}}^k\) given by \(\phi (x) = (h(x), \ldots , h(T^{k-1}x))\) be the corresponding k-delay coordinate map. Set \(\nu = \phi _*\mu \) (considered as a Borel measure in \({\mathbb {R}}^k\)) and note that \({{\,\mathrm{supp}\,}}\nu \subset \phi (X)\). For \(y \in {{\,\mathrm{supp}\,}}\nu \) and \(\varepsilon >0\) define

$$\begin{aligned} \chi _{\varepsilon }(y)&= \frac{1}{\mu \big (\phi ^{-1}(B(y, \varepsilon ))\big )} \int \limits _{\phi ^{-1}(B(y, \varepsilon ))} \phi (Tx)d\mu (x),\\ \sigma _{\varepsilon }(y)&= \bigg (\frac{1}{\mu \big (\phi ^{-1}(B(y, \varepsilon ))\big )} \int \limits _{\phi ^{-1}(B(y, \varepsilon ))} \Vert \phi (Tx) -\chi _{\varepsilon }(y)\Vert ^2d\mu (x)\bigg )^{\frac{1}{2}}, \end{aligned}$$

where \(B(y,\varepsilon )\) denotes the open ball of radius \(\varepsilon \) centered at y. In other words, \(\chi _{\varepsilon }(y)\) is the conditional expectation of the random variable \(\phi \circ T\) (with respect to \(\mu \)) given \(\phi \in B(y, \varepsilon )\), while \(\sigma _{\varepsilon }(y)\) is its conditional standard deviation. Define also the prediction error at y as

$$\begin{aligned} \sigma (y) = \lim \limits _{\varepsilon \rightarrow 0} \sigma _{\varepsilon }(y), \end{aligned}$$

provided the limit exists. A point y is said to be predictable if \(\sigma (y)=0\).

Note that the prediction error depends on the observable h. We simplify the notation by suppressing this dependence.
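Definition 1.1 has a direct Monte Carlo reading: sample x from \(\mu \), keep the samples with \(\phi (x) \in B(y, \varepsilon )\), and take the empirical mean and standard deviation of \(\phi (Tx)\). The following Python sketch illustrates this; the map, observable and sample size are illustrative assumptions, not part of the definition.

```python
import numpy as np

def delay_map(h, T, k):
    """k-delay coordinate map phi(x) = (h(x), h(Tx), ..., h(T^{k-1}x))."""
    def phi(x):
        out, z = [], x
        for _ in range(k):
            out.append(h(z))
            z = T(z)
        return np.array(out)
    return phi

def sigma_eps(samples, h, T, k, y, eps):
    """Monte Carlo estimate of sigma_eps(y) from Definition 1.1: the conditional
    standard deviation of phi(Tx) given phi(x) in B(y, eps)."""
    phi = delay_map(h, T, k)
    coords = np.array([phi(x) for x in samples])      # phi(x)
    images = np.array([phi(T(x)) for x in samples])   # phi(Tx)
    mask = np.linalg.norm(coords - y, axis=1) < eps   # condition on the ball
    if not mask.any():
        return np.nan
    chi = images[mask].mean(axis=0)                   # estimate of chi_eps(y)
    return np.sqrt((np.linalg.norm(images[mask] - chi, axis=1) ** 2).mean())

# Illustrative run: doubling map, Lebesgue measure, injective observable h(x) = x.
rng = np.random.default_rng(0)
T = lambda x: (2.0 * x) % 1.0
h = lambda x: x
xs = rng.uniform(size=100_000)
for eps in (0.1, 0.01, 0.001):
    print(eps, sigma_eps(xs, h, T, 1, np.array([0.3]), eps))
# The estimates shrink linearly with eps: the point y = 0.3 is predictable.
```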

Remark 1.2

Note that the predictability of points of the support of the measure \(\nu \) does not imply that the delay coordinate map \(\phi \) is injective. Indeed, if h (and hence \(\phi \)) is constant, then every point \(y \in {{\,\mathrm{supp}\,}}\nu \) is predictable.

Remark 1.3

(Farmer and Sidorowich algorithm) As explained in [SSOY98], the notion of predictability arises naturally in the context of a prediction algorithm proposed by Farmer and Sidorowich in [FS87]. To describe it, suppose that for a point \(x \in X\) we are given a sequence of measurements \(h(x), \ldots , h(T^{n+k-1}(x))\) of the observable h for some \(n \in {\mathbb {N}}\). This defines a sequence of k-delay coordinate vectors of the form

$$\begin{aligned} y_i = (h(T^ix), \ldots , h(T^{i+k-1}x)), \qquad i = 0, \ldots , n. \end{aligned}$$

Knowing the sample values of \(y_0, \ldots , y_n\), we would like to predict the one-step future of the model, i.e. the value of the next point \(y_{n+1} = (h(T^{n+1}x), \ldots , h(T^{n+k}x))\). For a small \(\varepsilon >0\) we define the predicted value of \(y_{n+1}\) as

$$\begin{aligned} \widehat{y_{n+1}} = \frac{1}{\#{\mathcal {I}}} \sum _{i \in {\mathcal {I}}} y_{i+1} \quad \text {for} \quad {\mathcal {I}} = \{ 0 \le i < n : y_i \in B(y_n, \varepsilon ) \}. \end{aligned}$$

In other words, the predicted value of \(y_{n+1}\) is taken to be the average of the values \(y_{i+1}\), where we count only those i, for which \(y_i\) are \(\varepsilon \)-close to the last known point \(y_n\).
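The scheme above is only a few lines of code. A minimal Python sketch follows; the logistic-map series, k and \(\varepsilon \) in the usage lines are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def fs_predict(measurements, k, eps):
    """Farmer-Sidorowich one-step prediction: given the scalar series
    h(x), h(Tx), ..., h(T^{n+k-1}x), average y_{i+1} over those i < n
    with y_i in B(y_n, eps)."""
    m = np.asarray(measurements, dtype=float)
    n = len(m) - k                                       # indices i = 0, ..., n
    Y = np.stack([m[i:i + k] for i in range(n + 1)])     # delay vectors y_i
    close = [i for i in range(n) if np.linalg.norm(Y[i] - Y[n]) < eps]
    if not close:
        raise ValueError("no neighbours of y_n within eps")
    return Y[np.array(close) + 1].mean(axis=0)           # predicted y_{n+1}

# Illustrative usage: noiseless logistic-map series with h = identity, k = 2.
T = lambda x: 4.0 * x * (1.0 - x)
series, x = [], 0.2
for _ in range(5_000):
    series.append(x)
    x = T(x)
print(fs_predict(series, k=2, eps=0.01))
```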

Notice that if the k-delay coordinate map \(\phi \) is an embedding, then the points \(y_i\) form an orbit of \(y_0\) under the model transformation \({{\tilde{T}}}\) defined by the delay coordinate map \(\phi \), i.e. \(y_i = {{\tilde{T}}}^i(y_0)\) for \(({{\tilde{X}}}, {{\tilde{T}}}) = (\phi (X), \phi \circ T \circ \phi ^{-1})\). Hence, in this case the predicted value \(\widehat{y_{n+1}}\) of \(y_{n+1} = {{\tilde{T}}}(y_n)\) is the average of the values \(y_{i+1} = {{\tilde{T}}}(y_i)\), \(i \in {\mathcal {I}}\).

If the initial point \(x \in X\) is chosen randomly according to an ergodic probability measure \(\mu \), then for \(n \rightarrow \infty \), the collection of points \(y_i,\ i \in {\mathcal {I}}\) is asymptotically distributed in \(B(y_n, \varepsilon )\) according to the measure \(\nu = \phi _*\mu \). Therefore, the value of \(\sigma _\varepsilon (y_n)\) from Definition 1.1 approaches asymptotically the standard deviation of the predicted point \(\widehat{y_{n+1}}\). The condition of predictability states that this standard deviation converges to zero as \(\varepsilon \) tends to zero.

In [SSOY98], the Shroer–Sauer–Ott–Yorke predictability conjecture is stated for a special class of measures, called natural measures. To define it, recall first that a measure \(\mu \) on X is invariant for a measurable map \(T:X \rightarrow X\) if \(\mu (T^{-1}(A))=\mu (A)\) for every measurable set \(A \subset X\). A set \(\Lambda \subset X\) is called T-invariant if \(T(\Lambda ) \subset \Lambda \).

Definition 1.4

Let X be a compact Riemannian manifold and \(T :X \rightarrow X\) be a smooth diffeomorphism. A compact T-invariant set \(\Lambda \subset X\) is called an attractor, if the set \(B(\Lambda ) = \{x \in X: \lim _{n \rightarrow \infty } {{\,\mathrm {dist}\,}}(T^n x, \Lambda ) = 0\}\) is an open set containing \(\Lambda \). The set \(B(\Lambda )\) is called the basin of attraction to \(\Lambda \). A T-invariant Borel probability measure \(\mu \) on \(\Lambda \) is called a natural measure if

$$\begin{aligned} \lim \limits _{n \rightarrow \infty } \frac{1}{n} \sum \limits _{i=0}^{n-1} \delta _{T^i x} = \mu \end{aligned}$$

for almost every \(x \in B(\Lambda )\) with respect to the volume measure on X, where \(\delta _y\) denotes the Dirac measure at y and the limit is taken in the weak-\(^*\) topology.
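Computationally, the definition says that integrals against \(\mu \) are limits of time averages along Lebesgue-typical orbits in the basin. A minimal Python sketch, on an illustrative circle diffeomorphism whose natural measure is a Dirac measure (all choices below are ours, for illustration only):

```python
import numpy as np

def time_average(T, g, x0, n):
    """Birkhoff average (1/n) * sum_{i<n} g(T^i x0); by the definition of a
    natural measure it converges to the mu-integral of g for Lebesgue-a.e. x0
    in the basin of attraction."""
    total, x = 0.0, x0
    for _ in range(n):
        total += g(x)
        x = T(x)
    return total / n

# Illustrative circle diffeomorphism on R/Z: t = 0 is an attracting fixed point,
# t = 1/2 a repelling one, so the natural measure on the attractor {0} is delta_0.
T = lambda t: (t - 0.1 * np.sin(2 * np.pi * t)) % 1.0
g = lambda t: np.cos(2 * np.pi * t)
print(time_average(T, g, x0=0.3, n=100_000))    # approaches g(0) = 1
```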

Remark 1.5

Note that in ergodic theory of dynamical systems, some authors use the name physical measure or SRB (Sinai–Ruelle–Bowen) measure for similar concepts (see e.g. [You02]). The term ‘natural measure’ occurs commonly in mathematical physics literature (see e.g. [Ott02, OY08]).

Definition 1.6

For a Borel probability measure \(\mu \) in \({\mathbb {R}}^N\) with compact support define its lower and upper information dimensions as

$$\begin{aligned} {\underline{{{\,\mathrm{ID}\,}}}}(\mu ) &= \liminf \limits _{\varepsilon \rightarrow 0} \int \limits _{{{\,\mathrm{supp}\,}}\mu } \frac{\log \mu (B(x,\varepsilon ))}{\log \varepsilon } d\mu (x), \\ \overline{{{\,\mathrm{ID}\,}}}(\mu ) &= \limsup \limits _{\varepsilon \rightarrow 0} \int \limits _{{{\,\mathrm{supp}\,}}\mu } \frac{\log \mu (B(x,\varepsilon ))}{\log \varepsilon } d\mu (x). \end{aligned}$$

If \({\underline{{{\,\mathrm{ID}\,}}}}(\mu ) = \overline{{{\,\mathrm{ID}\,}}}(\mu )\), then we denote their common value as \({{\,\mathrm{ID}\,}}(\mu )\) and call it the information dimension of \(\mu \).
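For instance, for the measure \(\mu _0 = \frac{1}{2} \delta _{p_0} + \frac{1}{2} {{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\) considered in Sect. 4, where \(p_0\) is an atom lying off a circle \({\mathbb {S}}^1\) carrying the normalized length measure \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\), one checks directly that for small \(\varepsilon > 0\),

$$\begin{aligned} \frac{\log \mu _0(B(p_0,\varepsilon ))}{\log \varepsilon } = \frac{\log \frac{1}{2}}{\log \varepsilon } \rightarrow 0, \qquad \frac{\log \mu _0(B(t,\varepsilon ))}{\log \varepsilon } \rightarrow 1 \quad \text {for } t \in {\mathbb {S}}^1, \end{aligned}$$

as \(\varepsilon \rightarrow 0\), since \(\mu _0(B(t,\varepsilon ))\) is comparable to \(\varepsilon \) for \(t \in {\mathbb {S}}^1\). By bounded convergence, the integral in Definition 1.6 converges to \(\frac{1}{2} \cdot 0 + \frac{1}{2} \cdot 1\), so \({{\,\mathrm{ID}\,}}(\mu _0) = \frac{1}{2}\); this computation is used in Corollary 4.3.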

We are now ready to state the SSOY predictability conjecture in its original form as stated in [SSOY98]. Recall that for a map \(T:X \rightarrow X\) with a Borel probability measure \(\mu \), a number \(k \in {\mathbb {N}}\) and a function \(h :X \rightarrow {\mathbb {R}}\), we consider the k-delay coordinate map for the observable h defined by

$$\begin{aligned} \phi (x) = \phi _{h,k} (x) = (h(x), \ldots , h(T^{k-1}x)). \end{aligned}$$

To emphasize the dependence on h and k, we will write \(\phi _{h,k}\) for \(\phi \) and \(\nu _{h,k}\) for the push-forward measure \(\nu = \nu _{h,k} = (\phi _{h,k})_*\mu \).

SSOY predictability conjecture ([SSOY98, Conjecture 1]) Let \(T:X \rightarrow X\) be a smooth diffeomorphism of a compact Riemannian manifold X and let \(\Lambda \subset X\) be an attractor of T with a natural measure \(\mu \) such that \({{\,\mathrm{ID}\,}}(\mu ) = D\). Fix \(k>D\). Then \(\nu _{h,k}\)-almost every point of \({\mathbb {R}}^k\) is predictable for a generic observable \(h:X \rightarrow {\mathbb {R}}\).

Note that in this formulation some details (e.g. the type of genericity and the smoothness class of the dynamics) are not specified precisely.

1.3 Main results

Now we present the main results of the paper. First, we state a predictable embedding theorem, which holds in a general context of injective Lipschitz maps T on a compact set \(X \subset {\mathbb {R}}^N\) equipped with a Borel probability measure \(\mu \). Recall that by the Whitney embedding theorem [Whi36], we can assume that a smooth compact manifold is embedded in \({\mathbb {R}}^N\) for sufficiently large N. Our observation is that in this generality, the predictability holds if we replace the information dimension \({{\,\mathrm{ID}\,}}(\mu )\) by the Hausdorff dimension \(\dim _H \mu \) (see Sect. 2.1 for definition).

In the presented results, we understand the genericity of the observable h in the sense of prevalence in the space \({{\,\mathrm{Lip}\,}}(X)\) of Lipschitz observables \(h :X \rightarrow {\mathbb {R}}\) (with a polynomial probe set), which is an analogue of the ‘Lebesgue almost sure’ condition in infinite dimensional spaces (see Sect. 2.2 for precise definitions). In particular, the genericity of h holds also in the sense of prevalence in the space of \(C^r\)-smooth observables \(h :X \rightarrow {\mathbb {R}}\), for \(r \ge 1\). Let us note that it is standard to use prevalence as a notion of genericity in the context of Takens-type embedding theorems (see e.g. [SYC91, Rob11]).

It is known that Takens-type theorems require some bounds on the size of sets of T-periodic points of low periods. Following [BGŚ20], we assume \(\dim _H(\mu |_{{{\,\mathrm{Per}\,}}_p(T)}) < p\) for \(p=1, \ldots , k-1\), where

$$\begin{aligned} {{\,\mathrm{Per}\,}}_p(T) = \{ x \in X : T^p x= x \}. \end{aligned}$$

With these remarks, our main result is the following.

Theorem 1.7

(Predictable embedding theorem for Lipschitz maps). Let \(X \subset {\mathbb {R}}^N\) be a compact set, let \(\mu \) be a Borel probability measure on X and let \(T:X \rightarrow X\) be an injective Lipschitz map. Take \(k>\dim _H\mu \) and assume \(\dim _H(\mu |_{{{\,\mathrm{Per}\,}}_p(T)}) < p\) for \(p=1, \ldots , k-1\). Then for a prevalent set of Lipschitz observables \(h:X \rightarrow {\mathbb {R}}\), the k-delay coordinate map \(\phi _{h,k}\) is injective on a Borel set of full \(\mu \)-measure, and \(\nu _{h,k}\)-almost every point of \({\mathbb {R}}^k\) is predictable.

Remark 1.8

Notice that in addition to predictability, we obtain almost sure injectivity of the delay coordinate map, which means that the system can be reconstructed in \({\mathbb {R}}^k\) in a one-to-one fashion on a set of full measure.

An extended version of Theorem 1.7 is proved in Sect. 3 as Theorem 3.1.

Note that the assumption on the dimension of \(\mu \) restricted to the set of p-periodic points can be omitted if there are only finitely many periodic points of given period. By the Kupka–Smale theorem (see [PdM82, Chapter 3, Theorem 3.6]), the latter condition is generic (in the Baire category sense) in the space of \(C^r\)-diffeomorphisms, \(r\ge 1\), of a compact manifold, equipped with the uniform \(C^r\)-topology (see [BGŚ20] for more details). Therefore, we immediately obtain the SSOY predictability conjecture for generic smooth \(C^r\)-diffeomorphisms, with information dimension replaced by the Hausdorff one.

Corollary 1.9

(SSOY predictability conjecture for generic diffeomorphisms). Let X be a compact Riemannian manifold and \(r \ge 1\). Then for a \(C^r\)-generic diffeomorphism \(T:X \rightarrow X\) with a natural measure \(\mu \) (or, more generally, any Borel probability measure) and \(k > \dim _H \mu \), for a prevalent set (depending on T) of Lipschitz observables \(h:X \rightarrow {\mathbb {R}}\), the k-delay coordinate map \(\phi _{h,k}\) is injective on a set of full \(\mu \)-measure, and \(\nu _{h,k}\)-almost every point of \({\mathbb {R}}^k\) is predictable.

Suppose now the measure \(\mu \) in Theorem 1.7 is T-invariant and ergodic. Then we have \(\dim _H\mu \le {\underline{{{\,\mathrm{ID}\,}}}}(\mu ) \le \overline{{{\,\mathrm{ID}\,}}}(\mu )\) (see Proposition 2.1). Moreover, either the set of T-periodic points has \(\mu \)-measure zero, or \(\mu \) is supported on a periodic orbit of T (see the proof of [BGŚ20, Remark 4.4(c)]). Hence, the assumption on the dimension of \(\mu \) restricted to the set of p-periodic points can again be omitted. This proves the original SSOY conjecture for arbitrary \(C^r\)-diffeomorphisms and ergodic measures.

Corollary 1.10

(SSOY predictability conjecture for ergodic measures). Let X be a compact Riemannian manifold, \(r \ge 1\), and let \(T:X \rightarrow X\) be a \(C^r\)-diffeomorphism with an ergodic natural measure \(\mu \) (or, more generally, any T-invariant ergodic Borel probability measure). Take \(k>{\underline{{{\,\mathrm{ID}\,}}}}(\mu )\). Then for a prevalent set of Lipschitz observables \(h:X \rightarrow {\mathbb {R}}\), the k-delay coordinate map \(\phi _{h,k}\) is injective on a set of full \(\mu \)-measure, and \(\nu _{h,k}\)-almost every point of \({\mathbb {R}}^k\) is predictable.

Our final result is that the SSOY predictability conjecture does not hold in its original formulation for all smooth diffeomorphisms, i.e. the condition \(k>{{\,\mathrm{ID}\,}}(\mu )\) is not sufficient for almost sure predictability for generic observables, even within the class of natural measures.

Theorem 1.11

There exists a \(C^{\infty }\)-smooth diffeomorphism of the 3-dimensional compact Riemannian manifold \(X = {\mathbb {S}}^2 \times {\mathbb {S}}^1\) with a natural measure \(\mu \), such that \({{\,\mathrm{ID}\,}}(\mu )<1\) and for a prevalent set of Lipschitz observables \(h:X \rightarrow {\mathbb {R}}\), there exists a positive \(\nu _{h,1}\)-measure set of non-predictable points. In particular, the set of Lipschitz observables \(h:X \rightarrow {\mathbb {R}}\) for which \(\nu _{h,1}\)-almost every point of \({\mathbb {R}}\) is predictable is not prevalent.

The construction is presented in Sect. 4 (see Theorem 4.14 for details).

Remark 1.12

Theorem 1.11 shows that the original SSOY predictability conjecture fails for a specific system (XT). It remains an open question whether it holds for a generic \(C^r\)-diffeomorphism T of a given compact Riemannian manifold X. By Corollary 1.9, this would follow from the dimension conjecture of Farmer, Ott and Yorke [FOY83, Conjecture 1], which (in particular) states that the Hausdorff and information dimension of the natural measure typically coincide.

1.4 Organization of the paper

Section 2 contains preliminary material, gathering definitions and tools required for the rest of the paper. Theorem 1.7 and its extension Theorem 3.1 are proved in Sect. 3. Section 4 contains a construction of the example presented in Theorem 1.11, divided into several steps.

2 Preliminaries

2.1 Hausdorff and information dimensions

For \(s>0\), the s-dimensional (outer) Hausdorff measure of a set \(X \subset {\mathbb {R}}^N\) is defined as

$$\begin{aligned} {\mathcal {H}}^s(X) = \lim \limits _{\delta \rightarrow 0}\ \inf \Big \{ \sum \limits _{i = 1}^{\infty } |U_i|^s : X \subset \bigcup \limits _{i=1}^{\infty } U_i,\ |U_i| \le \delta \Big \}, \end{aligned}$$

where \(| \cdot |\) denotes the diameter of a set (with respect to the Euclidean distance in \({\mathbb {R}}^N)\). The Hausdorff dimension of X is given as

$$\begin{aligned} \dim _HX = \inf \{ s> 0 : {\mathcal {H}}^s(X) = 0 \} = \sup \{ s > 0 : {\mathcal {H}}^s(X) = \infty \}. \end{aligned}$$

The (upper) Hausdorff dimension of a finite Borel measure \(\mu \) in \({\mathbb {R}}^N\) is defined as

$$\begin{aligned} \dim _H\mu = \inf \{ \dim _HX: X \subset {\mathbb {R}}^N \text { is a Borel set of full } \mu \text {-measure} \}. \end{aligned}$$

By the Whitney embedding theorem [Whi36], we can assume that a smooth compact manifold is smoothly embedded in the Euclidean space, hence the Hausdorff dimension is well defined also for Borel measures on manifolds.

In general, \({\underline{{{\,\mathrm{ID}\,}}}}(\mu )\) and \(\overline{{{\,\mathrm{ID}\,}}}(\mu )\) are not comparable with \(\dim _H\mu \) (see [FLR02, Sect. 3]). One can however obtain inequalities between them for measures which are ergodic with respect to Lipschitz transformations.

Proposition 2.1

Let \(X \subset {\mathbb {R}}^N\) be a closed set, let \(T :X \rightarrow X\) be a Lipschitz map and let \(\mu \) be a T-invariant and ergodic Borel probability measure on X. Then

$$\begin{aligned} \dim _H\mu \le {\underline{{{\,\mathrm{ID}\,}}}}(\mu ) \le \overline{{{\,\mathrm{ID}\,}}}(\mu ). \end{aligned}$$

Proof

The inequality \({\underline{{{\,\mathrm{ID}\,}}}}(\mu ) \le \overline{{{\,\mathrm{ID}\,}}}(\mu )\) is obvious. The estimate \(\dim _H\mu \le {\underline{{{\,\mathrm{ID}\,}}}}(\mu )\) follows by combining [Fal97, Propositions 10.2–10.3] with [FLR02, Theorem 1.3] and [Fal97, Proposition 10.6]. \(\quad \square \)

For more information on dimension theory in Euclidean spaces we refer to [Fal04, Mat95, Rob11].

2.2 Prevalence

In the formulation of our results, the genericity of the considered observables is understood in terms of prevalence – a notion introduced by Hunt, Sauer and Yorke in [HSY92], which is regarded as an analogue of the ‘Lebesgue almost sure’ condition in infinite dimensional normed linear spaces.

Definition 2.2

Let V be a normed space. A Borel set \(S \subset V\) is called prevalent if there exists a Borel measure \(\nu \) in V, which is positive and finite on some compact set in V, such that for every \(v \in V\), the vector \(v + e\) belongs to S for \(\nu \)-almost every \(e \in V\). A non-Borel subset of V is prevalent if it contains a prevalent Borel subset.

We will apply this definition to the space \({{\,\mathrm{Lip}\,}}(X)\) of all Lipschitz functions \(h :X \rightarrow {\mathbb {R}}\) on a compact metric space X, endowed with the Lipschitz norm \(\Vert h\Vert _{{{\,\mathrm{Lip}\,}}} = \Vert h\Vert _{\infty } + {{\,\mathrm{Lip}\,}}(h)\), where \(\Vert h\Vert _\infty \) is the supremum norm and \({{\,\mathrm{Lip}\,}}(h)\) is the Lipschitz constant of h. We will use the following standard condition, which is sufficient for prevalence. Let \(\{h_1, \ldots , h_m\}\), \(m \in {\mathbb {N}}\), be a finite set of functions in \({{\,\mathrm{Lip}\,}}(X)\), called the probe set. Define \(\xi :{\mathbb {R}}^m \rightarrow {{\,\mathrm{Lip}\,}}(X)\) by \(\xi (\alpha _1, \ldots , \alpha _m) = \sum _{j=1}^m \alpha _j h_j\). Then \(\nu = \xi _*{{\,\mathrm{Leb}\,}}\), where \({{\,\mathrm{Leb}\,}}\) is the Lebesgue measure in \({\mathbb {R}}^m\), is a Borel measure in \({{\,\mathrm{Lip}\,}}(X)\), which is positive and finite on the compact set \(\xi ([0,1]^m)\). For this measure, the sufficient condition for a set \(S \subset {{\,\mathrm{Lip}\,}}(X)\) to be prevalent is that for every \(h \in {{\,\mathrm{Lip}\,}}(X)\), the function \(h + \sum _{j=1}^m \alpha _j h_j\) is in S for Lebesgue almost every \((\alpha _1, \ldots , \alpha _m) \in {\mathbb {R}}^m\). In this case, we say that S is prevalent in \({{\,\mathrm{Lip}\,}}(X)\) with the probe set \(\{h_1, \ldots , h_m\}\).
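Numerically, the probe-set construction amounts to perturbing a given observable by a polynomial with random coefficients. A minimal Python sketch with the degree \(\le 1\) probe set used in the proof of Corollary 4.3 (the sampled range of the coefficients is an illustrative choice):

```python
import numpy as np

def random_probe_perturbation(h, N, rng, scale=1.0):
    """Perturb an observable h on X in R^N by a random element of the span of
    the probe set {1, x_1, ..., x_N} (degree <= 1 polynomials); alpha plays the
    role of the Lebesgue-random coefficients."""
    alpha = rng.uniform(-scale, scale, size=N + 1)
    perturbed = lambda x: h(x) + alpha[0] + alpha[1:] @ np.asarray(x, dtype=float)
    return perturbed, alpha

# Usage: for a prevalent target set S, the perturbed observable lies in S for
# almost every alpha, whatever Lipschitz h we start from.
rng = np.random.default_rng(1)
h = lambda x: np.sin(x[0])                 # an arbitrary Lipschitz observable
h_pert, alpha = random_probe_perturbation(h, N=2, rng=rng)
print(alpha, h_pert([0.3, 0.7]))
```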

For more information on prevalence we refer to [HSY92] and [Rob11, Chapter 5].

2.3 Probabilistic Takens delay embedding theorem

To prove Theorem 1.7, we will use our previous result from [BGŚ20], which we recall below, using the notion of prevalence described in Sect. 2.2. This is a probabilistic version of the Takens delay embedding theorem, asserting that under suitable conditions on k, there is a prevalent set of Lipschitz observables, which give rise to an almost surely injective k-delay coordinate map.

Theorem 2.3

(Probabilistic Takens delay embedding theorem, [BGŚ20, Theorem 4.3 and Remark 4.4]). Let \(X \subset {\mathbb {R}}^N\) be a compact set, \(\mu \) a Borel probability measure on X and \(T:X \rightarrow X\) an injective Lipschitz map. Take \(k > \dim _H\mu \) and assume \(\dim _H(\mu |_{{{\,\mathrm{Per}\,}}_p(T)}) < p\) for \(p=1, \ldots , k-1\). Let S be the set of Lipschitz observables \(h :X \rightarrow {\mathbb {R}}\), for which the k-delay coordinate map \(\phi _{h,k}\) is injective on a Borel set \(X_h \subset X\) of full \(\mu \)-measure. Then S is prevalent in \({{\,\mathrm{Lip}\,}}(X)\) with the probe set equal to a linear basis of the space of real polynomials of N variables of degree at most \(2k-1\). If \(\mu \) is additionally T-invariant, then the set \(X_h\) for \(h \in S\) can be chosen to satisfy \(T(X_h) = X_h\).

2.4 Topological Rokhlin disintegration theorem

A useful tool connecting the probabilistic Takens delay embedding theorem and the SSOY predictability conjecture is the following topological version of the Rokhlin disintegration theorem in compact metric spaces. The Rokhlin disintegration theorem (see e.g. [Roh52]) is a classical result on the existence and almost sure uniqueness of the system of conditional measures. The crucial fact for us is that in the topological setting, the conditional measures can be defined as limits of conditional measures on preimages of shrinking balls, where the convergence holds almost surely, as was proved by Simmons in [Sim12].

In the context of the Rokhlin disintegration theorem, one assumes that the considered measures are complete, i.e. every subset of a zero-measure set is measurable. Recall that every finite Borel measure \(\mu \) on a metric space X has an extension (completion) to a complete measure on the \(\sigma \)-algebra of \(\mu \)-measurable sets, i.e. the smallest \(\sigma \)-algebra containing all Borel sets in X and all subsets of zero \(\mu \)-measure Borel sets. In other words, every \(\mu \)-measurable set A can be expressed as \(A = B \cup C\), where B is a Borel set and \(C \subset D\) for some Borel set D with \(\mu (D) = 0\) (see e.g. [Fol99, Theorem 1.19] for the case \(X = {\mathbb {R}}\)). Alternatively, this \(\sigma \)-algebra is obtained as a family of sets measurable with respect to the outer measure generated by \(\mu \) (see e.g. [Fol99, Example 22, p. 32]). Recall also that a function \(\psi :X \rightarrow {\mathbb {R}}\) is called \(\mu \)-measurable if \(\psi ^{-1}(B)\) is \(\mu \)-measurable for every Borel set \(B \subset {\mathbb {R}}\).

Definition 2.4

Let X be a compact metric space and let \(\mu \) be a complete Borel probability measure on X. Let Y be a separable Riemannian manifold and let \(\phi :X \rightarrow Y\) be a Borel map. Set \(\nu = \phi _* \mu \) (considered as a complete Borel measure in Y). A family \(\{ \mu _y : y \in Y\}\) is a system of conditional measures of \(\mu \) with respect to \(\phi \), if

  (1) for every \(y \in Y\), \(\mu _y\) is a (possibly zero) Borel measure on \(\phi ^{-1}(\{y\})\),

  (2) for \(\nu \)-almost every \(y \in Y\), \(\mu _y\) is a Borel probability measure,

  (3) for every \(\mu \)-measurable set \(A \subset X\), the function \(Y \ni y \mapsto \mu _y(A)\) is \(\nu \)-measurable and

    $$\begin{aligned} \mu (A) = \int \limits _{Y} \mu _y(A)d\nu (y). \end{aligned}$$

We say that the system of conditional measures \(\{ \mu _y : y \in Y\}\) is unique if for every family \(\{ {{\tilde{\mu }}}_y : y \in Y \}\) satisfying (1)–(3), we have \({{\tilde{\mu }}}_y = \mu _y\) for \(\nu \)-almost every \(y \in Y\).

Theorem 2.5

(Topological Rokhlin disintegration theorem, [Sim12, Theorems 2.1–2.2]). Let X be a compact metric space and let \(\mu \) be a Borel probability measure on X. Let Y be a separable Riemannian manifold and let \(\phi :X \rightarrow Y\) be a Borel map. Set \(\nu = \phi _* \mu \). Then for \(\nu \)-almost every \(y \in {{\,\mathrm{supp}\,}}\nu \) and \(\varepsilon > 0\), the conditional probability measures

$$\begin{aligned} \mu _{y,\varepsilon } = \frac{1}{\mu (\phi ^{-1}(B(y,\varepsilon )))} \mu |_{\phi ^{-1}(B(y,\varepsilon ))} \end{aligned}$$

converge in weak-\(^*\) topology to a Borel probability measure \(\mu _y\) as \(\varepsilon \) tends to 0. Moreover, the collection of measures \(\{ \mu _y : y \in Y\}\), where we set \(\mu _y = 0\) if \(y \notin {{\,\mathrm{supp}\,}}\nu \) or the convergence does not hold, is a unique system of conditional measures of \(\mu \) with respect to \(\phi \).

The proof of the above theorem is based on the differentiation theorem for finite Borel measures, see [Sim12, Theorem 9.1] for details.

3 Proof of the Predictable Embedding Theorem for Lipschitz Maps

In this section we prove the following extended version of Theorem 1.7, which is at the same time an extension of Theorem 2.3, adding the assertion of prevalent almost sure predictability.

Theorem 3.1

(Predictable embedding theorem for Lipschitz maps – extended version). Let \(X \subset {\mathbb {R}}^N\) be a compact set, let \(\mu \) be a Borel probability measure on X and let \(T:X \rightarrow X\) be an injective Lipschitz map. Take \(k > \dim _H\mu \) and assume \(\dim _H(\mu |_{{{\,\mathrm{Per}\,}}_p(T)}) < p\) for \(p=1, \ldots , k-1\). Then there is a set S of Lipschitz observables \(h :X \rightarrow {\mathbb {R}}\), such that S is prevalent in \({{\,\mathrm{Lip}\,}}(X)\) with the probe set equal to a linear basis of the space of real polynomials of N variables of degree at most \(2k-1\), and for every \(h \in S\), the following assertions hold.

  (a) There exists a Borel set \(X_h\subset X\) of full \(\mu \)-measure, such that the k-delay coordinate map \(\phi _{h,k}\) is injective on \(X_h\).

  (b) For every \(x \in X_h\), \(\lim \limits _{\varepsilon \rightarrow 0} \mu _{\phi _{h,k}(x), \varepsilon } = \delta _x\) in the weak-\(^*\) topology, where \(\delta _x\) denotes the Dirac measure at the point x.

  (c) \(\nu _{h,k}\)-almost every point of \({\mathbb {R}}^k\) is predictable.

If \(\mu \) is additionally T-invariant, then the set \(X_h\) for \(h \in S\) can be chosen to satisfy \(T(X_h) = X_h\).

The main ingredients of the proof of Theorem 3.1 are Theorems 2.3 and 2.5. First, notice that under the assumptions of Theorem 3.1, we can use Theorem 2.5 to show the existence of a system \(\{ \mu _y : y \in {\mathbb {R}}^k\}\) of conditional measures of \(\mu \) with respect to \(\phi _{h,k}\), such that for \(\nu _{h,k}\)-almost every \(y \in {\mathbb {R}}^k\), \(\mu _y\) is a Borel probability measure in X satisfying

$$\begin{aligned} \mu _y = \lim _{\varepsilon \rightarrow 0} \mu _{y,\varepsilon } \end{aligned}$$
(3.1)

in weak-\(^*\) topology, where

$$\begin{aligned} \mu _{y,\varepsilon } = \frac{1}{\mu (\phi ^{-1}_{h,k}(B(y,\varepsilon )))} \mu |_{\phi ^{-1}_{h,k}(B(y,\varepsilon ))} \end{aligned}$$

for \(\varepsilon > 0\).

The following lemma shows that for \(\nu _{h,k}\)-almost every \(y \in {\mathbb {R}}^k\), the prediction error \(\sigma (y)\) from Definition 1.1 is equal to the standard deviation of the random variable \(\phi _{h,k} \circ T\) with respect to the measure \(\mu _y\). Note that the lemma is valid for any continuous (not necessarily Lipschitz) maps T and h.

Lemma 3.2

For \(\nu _{h,k}\)-almost every \(y \in {\mathbb {R}}^k\),

$$\begin{aligned} \sigma (y) = \sqrt{{{\,\mathrm{Var}\,}}_{\mu _y} (\phi _{h,k} \circ T)}, \end{aligned}$$

where

$$\begin{aligned} {{\,\mathrm{Var}\,}}_{\mu _y} ( \phi _{h,k} \circ T ) = \int \limits _{X} \Big \Vert \phi _{h,k} \circ T - \int \limits _{X}\phi _{h,k} \circ T d \mu _y\Big \Vert ^2 d\mu _y. \end{aligned}$$

Proof

For simplicity, let us write \(\phi = \phi _{h,k}\). Observe first that for \(\nu _{h,k}\)-almost every \(y \in {\mathbb {R}}^k\), by (3.1) and the continuity of \(\phi \circ T\), we have

$$\begin{aligned} \chi _\varepsilon (y) = \int \limits _X \phi \circ T d\mu _{y,\varepsilon } \underset{\varepsilon \rightarrow 0}{\longrightarrow } \chi (y) \end{aligned}$$
(3.2)

for

$$\begin{aligned} \chi (y) = \int \limits _X \phi \circ T d\mu _y. \end{aligned}$$

Moreover,

$$\begin{aligned} \sigma ^2_\varepsilon (y) - {{\,\mathrm{Var}\,}}_{\mu _y} ( \phi \circ T)&= \int \limits _X \Vert \phi \circ T - \chi _\varepsilon (y) \Vert ^2 d\mu _{y,\varepsilon } - \int \limits _{X} \Vert \phi \circ T - \chi (y)\Vert ^2 d\mu _y\\&= \bigg (\int \limits _X \Vert \phi \circ T - \chi _\varepsilon (y) \Vert ^2 d\mu _{y,\varepsilon } - \int \limits _X \Vert \phi \circ T - \chi (y) \Vert ^2d\mu _{y,\varepsilon }\bigg )\\&\quad + \bigg (\int \limits _X \Vert \phi \circ T - \chi (y) \Vert ^2d\mu _{y,\varepsilon } - \int \limits _{X} \Vert \phi \circ T - \chi (y)\Vert ^2 d\mu _y\bigg ) = I + \textit{II}. \end{aligned}$$

Again by the continuity of \(\phi \circ T\), we have \(\textit{II} \underset{\varepsilon \rightarrow 0}{\longrightarrow } 0\). Furthermore,

$$\begin{aligned} |I|&\le \int \limits _X \big | \Vert \phi \circ T - \chi _\varepsilon (y) \Vert ^2 - \Vert \phi \circ T - \chi (y) \Vert ^2 \big |d\mu _{y,\varepsilon }\\&= \int \limits _X \big ( \Vert \phi \circ T - \chi _\varepsilon (y) \Vert + \Vert \phi \circ T - \chi (y) \Vert \big ) \, \big | \Vert \phi \circ T - \chi _\varepsilon (y) \Vert - \Vert \phi \circ T - \chi (y)\Vert \big |d\mu _{y,\varepsilon }\\&\le 4 \Vert \phi \circ T\Vert _\infty \int \limits _X \Vert \chi _{\varepsilon }(y) - \chi (y)\Vert d\mu _{y,\varepsilon } = 4 \Vert \phi \circ T\Vert _\infty \, \Vert \chi _{\varepsilon }(y) - \chi (y)\Vert , \end{aligned}$$

by the triangle inequality and the fact that \(\Vert \chi _\varepsilon (y)\Vert \le \Vert \phi \circ T \Vert _\infty \). The latter quantity converges to zero by (3.2). Therefore, \(\sigma ^2_\varepsilon (y)\) tends to \({{\,\mathrm{Var}\,}}_{\mu _y} ( \phi \circ T)\) as \(\varepsilon \rightarrow 0\), so \(\sigma (y) = \sqrt{{{\,\mathrm{Var}\,}}_{\mu _y} (\phi \circ T)}\). \(\quad \square \)

The following corollary is immediate.

Corollary 3.3

For \(\nu _{h,k}\)-almost every \(y \in {\mathbb {R}}^k\), y is predictable if and only if \(\phi _{h,k} \circ T\) is constant \(\mu _y\)-almost surely. In particular, y is predictable provided \(\mu _y = \delta _x\) for some \(x \in X\).

By Corollary 3.3, in order to establish almost sure predictability, it is enough to prove the convergence \(\lim _{\varepsilon \rightarrow 0} \mu _{\phi _{h,k}(x), \varepsilon } = \delta _x\) for almost every \(x \in X\). The idea of the proof of Theorem 3.1 is the following. Theorem 2.3 guarantees that for a prevalent set of observables, the corresponding delay coordinate map is injective on a set of full \(\mu \)-measure. On the other hand, Theorem 2.5 ensures that the measures \(\mu _{\phi (x),\varepsilon }\) are almost surely convergent as \(\varepsilon \rightarrow 0\), and the limits form a system of conditional measures of \(\mu \) with respect to \(\phi _{h,k}\). Almost sure injectivity implies that these conditional measures are almost surely Dirac measures, hence indeed \(\lim _{\varepsilon \rightarrow 0} \mu _{\phi _{h,k}(x), \varepsilon } = \delta _x\). A detailed proof is presented below.

Proof of Theorem 3.1

By Theorem 2.3, there exists a prevalent set S of Lipschitz observables h, such that for each \(h \in S\), the k-delay coordinate map \(\phi _{h,k}\) is injective on a Borel set \({{\tilde{X}}}_h \subset X\) of full \(\mu \)-measure. For \(h \in S\), let us denote for simplicity \(\phi = \phi _{h,k}\) and

$$\begin{aligned} {{\tilde{Y}}}_h = \phi ({{\tilde{X}}}_h). \end{aligned}$$

Note that \({{\tilde{Y}}}_h\) has full \(\nu _{h,k}\)-measure. Moreover, \({{\tilde{Y}}}_h\) is Borel, as a continuous and injective image of a Borel set, see [Kec95, Theorem 15.1]. Since \(\phi \) is injective on \({{\tilde{X}}}_h\), for every \(y \in {{\tilde{Y}}}_h\) there exists a unique point \(x_y \in {{\tilde{X}}}_h\), such that \(\phi (x_y) = y\). For \(y \in {\mathbb {R}}^k\) define

$$\begin{aligned} {{\tilde{\mu }}}_y = {\left\{ \begin{array}{ll} \delta _{x_y} &{}\text {for } y \in {{\tilde{Y}}}_h\\ 0 &{}\text {for } y \in {\mathbb {R}}^k \setminus {{\tilde{Y}}}_h \end{array}\right. }. \end{aligned}$$

We check that the collection \(\{ {{\tilde{\mu }}}_y : y \in {\mathbb {R}}^k\}\) satisfies the conditions (1)–(3) of Definition 2.4. The first two conditions are obvious. To check the third one, take a \(\mu \)-measurable set \(A \subset X\) and note that for \(y \in \phi (A \cap {{\tilde{X}}}_h)\), we have \(y \in {{\tilde{Y}}}_h\) and \(x_y \in A\), so \({{\tilde{\mu }}}_y(A) = \delta _{x_y}(A) = 1\). On the other hand, if \(y \in {{\tilde{Y}}}_h \setminus \phi (A \cap {{\tilde{X}}}_h)\), then \(x_y \notin A\), so \({{\tilde{\mu }}}_y(A) = \delta _{x_y}(A) = 0\). Since \({{\tilde{\mu }}}_y(A) = 0\) for \(y \in {\mathbb {R}}^k \setminus {{\tilde{Y}}}_h\), we conclude that for

$$\begin{aligned} \psi :{\mathbb {R}}^k \rightarrow {\mathbb {R}}, \qquad \psi (y)= {{\tilde{\mu }}}_y(A) \end{aligned}$$

we have

$$\begin{aligned} \psi = \mathbbm {1}_{\phi (A \cap {{\tilde{X}}}_h)}. \end{aligned}$$
(3.3)

Hence, to show the \(\nu _{h,k}\)-measurability of \(\psi \), it is enough to check that the set \(\phi (A \cap {{\tilde{X}}}_h)\) is \(\nu _{h,k}\)-measurable. To this end, note that since A is \(\mu \)-measurable, we have \(A = B \cup C\), where B is a Borel set and \(C \subset D\) for some Borel set D with \(\mu (D) = 0\). Hence, \(\phi (A \cap {{\tilde{X}}}_h) = \phi (B \cap {{\tilde{X}}}_h) \cup \phi (C \cap {{\tilde{X}}}_h)\). The set \(\phi (B \cap {{\tilde{X}}}_h)\) is Borel, which again follows from [Kec95, Theorem 15.1], as \(\phi \) is continuous and injective on the Borel set \(B \cap {{\tilde{X}}}_h\). Similarly, the set \(\phi (C \cap {{\tilde{X}}}_h)\) is contained in the Borel set \(\phi (D \cap {{\tilde{X}}}_h)\). Since \({{\tilde{X}}}_h\) has full \(\mu \)-measure, we have

$$\begin{aligned} \nu _{h,k}(\phi (D \cap {{\tilde{X}}}_h)) = \mu (\phi ^{-1}(\phi (D \cap {{\tilde{X}}}_h))) = \mu (\phi ^{-1}(\phi (D \cap {{\tilde{X}}}_h))\cap {{\tilde{X}}}_h) = \mu (D) = 0. \end{aligned}$$

This yields the \(\nu _{h,k}\)-measurability of the set \(\phi (A \cap {{\tilde{X}}}_h)\) and the function \(\psi \). Moreover, by (3.3),

$$\begin{aligned} \int \limits _{Y} {{\tilde{\mu }}}_y(A)d\nu _{h,k}(y)&= \nu _{h,k}(\phi (A \cap {{\tilde{X}}}_h))\\&= \mu (\phi ^{-1}(\phi (A \cap {{\tilde{X}}}_h)))\\&= \mu (\phi ^{-1}(\phi (A \cap {{\tilde{X}}}_h))\cap {{\tilde{X}}}_h) = \mu (A). \end{aligned}$$

It follows that \(\{ {{\tilde{\mu }}}_y : y \in {\mathbb {R}}^k\}\) is a system of conditional measures of \(\mu \) with respect to \(\phi \), so by the uniqueness in Theorem 2.5 and (3.1),

$$\begin{aligned} {{\tilde{\mu }}}_y = \mu _y = \lim _{\varepsilon \rightarrow 0} \mu _{y,\varepsilon } \end{aligned}$$

for \(\nu _{h,k}\)-almost every \(y \in {\mathbb {R}}^k\). Since \({{\tilde{Y}}}_h\) is a Borel set of full \(\nu _{h,k}\)-measure, we have

$$\begin{aligned} \mu _y = \lim _{\varepsilon \rightarrow 0} \mu _{y,\varepsilon } = \delta _{x_y} \end{aligned}$$
(3.4)

for every \(y \in Y_h\), where \(Y_h \subset {{\tilde{Y}}}_h\) and \(Y_h\) is a Borel set of full \(\nu _{h,k}\)-measure. By Corollary 3.3, this implies that \(\nu _{h,k}\)-almost every \(y \in {\mathbb {R}}^k\) is predictable, which proves the assertion (c) in Theorem 3.1.

Define

$$\begin{aligned} X_h = \phi ^{-1}(Y_h) \cap {{\tilde{X}}}_h. \end{aligned}$$

Then \(X_h\) is a Borel full \(\mu \)-measure subset of X. Since \(\phi (X_h) \subset Y_h \subset {{\tilde{Y}}}_h\), by (3.4) we have

$$\begin{aligned} \mu _{\phi (x)} = \lim _{\varepsilon \rightarrow 0} \mu _{\phi (x),\varepsilon } = \delta _{x_{\phi (x)}} = \delta _x \end{aligned}$$

for every \(x \in X_h\), which shows the assertion (b). Finally, the assertion (a) follows from the fact \(X_h \subset {{\tilde{X}}}_h\).

To end the proof of Theorem 3.1, note that if the measure \(\mu \) is T-invariant, we can define \(X_h' = \bigcap _{n\in {\mathbb {Z}}} T^n(X_h)\) to obtain a full \(\mu \)-measure subset of \(X_h\) with \(T(X_h') = X_h'\). For details, see the proof of [BGŚ20, Remark 4.4(b)]. \(\quad \square \)

Remark 3.4

As in [BGŚ20], the assumptions \(\dim _H(\mu ) < k\) and \(\dim _H(\mu |_{{{\,\mathrm{Per}\,}}_p(T)}) < p\) of Theorem 3.1 can be weakened to \(\mu \perp {\mathcal {H}}^k\) and \(\mu |_{{{\,\mathrm{Per}\,}}_p(T)} \perp {\mathcal {H}}^p\), respectively. Moreover, one can prove a version of Theorem 3.1 for \(\beta \)-Hölder observables \(h:X \rightarrow {\mathbb {R}}\), \(\beta \in (0,1]\). It is enough to take k with \({\mathcal {H}}^{\beta k}(X) = 0\) and assume that \(\mu |_{{{\,\mathrm{Per}\,}}_p(T)}\) is singular with respect to \({\mathcal {H}}^{\beta p}\) for \(p = 1, \ldots , k-1\), where \({\mathcal {H}}^s\) is the s-dimensional Hausdorff measure. For a precise formulation of the required assumptions see [BGŚ20, Theorem 4.3]. As previously, the assumption on periodic points can be omitted if the measure \(\mu \) is T-invariant and ergodic (see [BGŚ20, Remark 4.4(c)] and its proof).

4 Counterexample to SSOY Predictability Conjecture: Proof of Theorem 1.11

In this section we prove Theorem 1.11, constructing an example of a \(C^{\infty }\)-smooth diffeomorphism T of a compact Riemannian manifold X with an attractor \(\Lambda \) endowed with a natural measure \(\mu \), such that \({{\,\mathrm{ID}\,}}(\mu )<1\) and for a prevalent set of Lipschitz observables, there is a positive \(\nu _{h,1}\)-measure set of non-predictable points. In particular, the set of Lipschitz observables \(h:X \rightarrow {\mathbb {R}}\), for which \(\nu _{h,1}\)-almost sure predictability holds for the 1-delay coordinate map \(\phi _{h,1}\), is not prevalent. Since the proof is quite involved, we briefly describe the subsequent steps.

In Sect. 4.1 we construct a model for the natural measure \(\mu \). First, we prove that for an irrational rotation on a circle \({\mathbb {S}}^1 \subset {\mathbb {R}}^N\) endowed with the Lebesgue measure \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\), the only Lipschitz observables \(h :{\mathbb {S}}^1\rightarrow {\mathbb {R}}\) such that almost sure predictability holds for the 1-delay coordinate map \(\phi \) are the constant functions. Then we construct a model \(\mu _0\) for the natural measure \(\mu \), taking \(X_0 = \{p_0\} \cup {\mathbb {S}}^1 \subset {\mathbb {R}}^N\) for some \(p_0 \notin {\mathbb {S}}^1\) and defining \(T_0 :X_0 \rightarrow X_0\) as the identity on \(\{p_0\}\) and an irrational rotation on \({\mathbb {S}}^1\). Then the measure \(\mu _0 = \delta _{p_0}/2 + {{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}/2\) satisfies \({{\,\mathrm{ID}\,}}(\mu _0) = 1/2 < 1\), yet the only Lipschitz observables \(h :X_0 \rightarrow {\mathbb {R}}\) yielding almost sure predictability for the 1-delay coordinate maps are the functions constant on \({\mathbb {S}}^1\). The same holds for any extension \((X,\mu ,T)\) of \((X_0,\mu _0, T_0)\) with \(X_0 \subset X,\ T|_{X_0} = T_0\) and \(\mu = \mu _0\). In particular, the set of Lipschitz observables \(h:X \rightarrow {\mathbb {R}}\) with almost sure predictability for the 1-delay coordinate map is not prevalent. Moreover, for a prevalent set of Lipschitz observables, almost sure predictability fails (Corollary 4.3).

The main step, performed in Sects. 4.2–4.3, is to realize the model measure \(\mu _0\) as a natural measure \(\mu \) for a smooth diffeomorphism T of a compact Riemannian manifold X. In Sect. 4.2, we construct a \(C^\infty \)-diffeomorphism f of the 2-dimensional sphere \({\mathbb {S}}^2 = {\mathbb {R}}^2 \cup \{\infty \}\), such that the trajectories of Lebesgue-almost all points of \({\mathbb {S}}^2\) spiral towards the invariant unit circle \(S = \{(x,y): x^2 + y^2 = 1\}\), spending most of the time in small neighbourhoods of two fixed points \(p, q \in S\) (Proposition 4.12). It follows that the average of the Dirac measures at p and q is a natural measure for f, with the attractor S and basin \({\mathbb {S}}^2 \setminus \{(0,0), \infty \}\) (Corollary 4.13). Then, in Sect. 4.3, we take

$$\begin{aligned} X = {\mathbb {S}}^2 \times {\mathbb {S}}^1 \end{aligned}$$

and define a \(C^\infty \)-diffeomorphism \(T:X \rightarrow X\) as a skew product of the form

$$\begin{aligned} T(z,t) = (f(z), h_{z}(t)), \qquad z \in {\mathbb {S}}^2,\; t \in {\mathbb {S}}^1, \end{aligned}$$

where \(h_z\) are diffeomorphisms of \({\mathbb {S}}^1\) depending smoothly on \(z \in {\mathbb {S}}^2\), such that for z in a neighbourhood of p, the map \(h_z\) is equal to a map \(g:{\mathbb {S}}^1\rightarrow {\mathbb {S}}^1\) with a unique fixed point \(0 \in {\mathbb {R}}/{\mathbb {Z}}\simeq {\mathbb {S}}^1\) attracting all points of \({\mathbb {S}}^1\), while for z in a neighbourhood of q, the map \(h_z\) is an irrational rotation on \({\mathbb {S}}^1\). See Fig. 1 for a schematic view of the map T.

Fig. 1. Schematic view of the map \(T:{\mathbb {S}}^2 \times {\mathbb {S}}^1 \rightarrow {\mathbb {S}}^2 \times {\mathbb {S}}^1\)

The map T has an attractor

$$\begin{aligned} \Lambda = S \times {\mathbb {S}}^1 \end{aligned}$$

with the basin \(B(\Lambda ) = ({\mathbb {S}}^2 \setminus \{(0,0), \infty \}) \times {\mathbb {S}}^1\) and natural measure

$$\begin{aligned} \mu = \frac{1}{2} \delta _{p_0} + \frac{1}{2} {{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}, \end{aligned}$$

where \(p_0 = (p,0)\) and \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\) is the Lebesgue measure on the circle \(\{q\} \times {\mathbb {S}}^1\) (Theorem 4.14). Since the measure \(\mu \) is equal to the model measure \(\mu _0\), the conclusion follows.
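For concreteness, the skew-product structure of T can be sketched in code. This is only a schematic illustration of Fig. 1: the fibre maps of the actual construction in Sect. 4.3 depend smoothly on z, while the interpolation below (through a lift of the circle) is a simple stand-in, and the choices of g, \(\alpha \) and the neighbourhood size are ours.

```python
import numpy as np

ALPHA = (np.sqrt(5) - 1) / 2     # an irrational rotation number (illustrative)

def fibre_map(z, t, p, q, delta=0.3):
    """Schematic h_z: near p it equals a circle map g with a unique attracting
    fixed point at t = 0, near q it equals the rotation R_alpha; in between it
    interpolates through the lift t + (1 - w)*(g(t) - t) + w*alpha."""
    dp, dq = np.linalg.norm(np.subtract(z, p)), np.linalg.norm(np.subtract(z, q))
    if dp < delta:
        w = 0.0                  # pure g near p
    elif dq < delta:
        w = 1.0                  # pure rotation near q
    else:
        w = dp / (dp + dq)       # crude blend in the transition region
    return (t - (1 - w) * 0.1 * np.sin(2 * np.pi * t) + w * ALPHA) % 1.0

def skew_product(f, p, q):
    """T(z, t) = (f(z), h_z(t)) on S^2 x S^1, for f as constructed in Sect. 4.2
    (see the numerical sketch there)."""
    return lambda z, t: (f(z), fibre_map(z, t, p, q))
```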

4.1 Model measure

Consider a circle \({\mathbb {S}}^1 \subset {\mathbb {R}}^N\) (by a circle we mean an image of \(\{(x,y) \in {\mathbb {R}}^2: x^2+y^2 = 1\}\) by an affine similarity transformation) with the normalized Lebesgue (1-Hausdorff) measure \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\) and a rotation \(R_\alpha :{\mathbb {S}}^1\rightarrow {\mathbb {S}}^1\) by an angle \(\alpha \). We use here an additive notation, i.e. for an angle coordinate \(t \in {\mathbb {R}}/{\mathbb {Z}}\simeq {\mathbb {S}}^1\) we write \(R_\alpha (t) = t + \alpha \text { mod } 1\). We assume \(\alpha \in {\mathbb {R}}\setminus {\mathbb {Q}}\). By \(d(\cdot , \cdot )\) we denote the standard rotation-invariant metric on \({\mathbb {S}}^1\).

For the system \(({\mathbb {S}}^1, {{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}, R_\alpha )\) we consider Lipschitz observables \(h:{\mathbb {S}}^1 \rightarrow {\mathbb {R}}\) and the corresponding 1-delay coordinate maps \(\phi :{\mathbb {S}}^1 \rightarrow {\mathbb {R}}\). Note that 1-delay coordinate maps are equal to the observables, i.e. \(\phi = h\).
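Proposition 4.1 below can be illustrated numerically. For a non-constant observable such as \(h(t) = \cos (2\pi t)\) (an illustrative choice), a typical fibre \(h^{-1}(y)\) consists of the two points \(t_y\) and \(1-t_y\), which \(R_\alpha \) sends to points with different h-values, so \(\sigma _\varepsilon (y)\) stays bounded away from zero as \(\varepsilon \rightarrow 0\). A minimal Python sketch:

```python
import numpy as np

ALPHA = (np.sqrt(5) - 1) / 2                # irrational rotation number

def sigma_eps(y, eps, n=1_000_000, seed=0):
    """Estimate sigma_eps(y) of Definition 1.1 for T = R_alpha, h(t) = cos(2 pi t),
    k = 1 (so phi = h) and mu = Leb on the circle R/Z."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(size=n)                 # samples from Leb on S^1
    h = lambda s: np.cos(2 * np.pi * s)
    sel = np.abs(h(t) - y) < eps            # condition on phi(x) in B(y, eps)
    return h((t[sel] + ALPHA) % 1.0).std()  # standard deviation of phi(Tx)

for eps in (0.1, 0.01, 0.001):
    print(eps, sigma_eps(0.3, eps))
# The estimates stabilise at a positive value: sigma(0.3) > 0, so y = 0.3 is
# not predictable, in line with Proposition 4.1.
```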

Proposition 4.1

Suppose that for a Lipschitz function \(h :{\mathbb {S}}^1\rightarrow {\mathbb {R}}\), \(\nu \)-almost every \(y \in {\mathbb {R}}\) is predictable for the 1-delay coordinate map \(\phi =h\), where \(\nu = \phi _*{{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\). Then h is constant.

Proof

Take h as in the proposition. The proof that h is constant is divided into four parts, described by the following claims.

Claim 1

There exists a set \(B \subset {\mathbb {S}}^1\) of full \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\)-measure, with the following property: if \(t_1, t_2 \in B\) and \(h(t_1) = h(t_2)\), then \(h(R^n_\alpha t_1) = h(R^n_\alpha t_2)\) for every \(n \ge 0\).

For the proof of the above claim, consider the system \(\{ \mu _y : y \in {\mathbb {R}}\}\) of conditional measures of \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\) with respect to \(\phi = h\), given by Theorem 2.5. Let

$$\begin{aligned} A = \Big \{ t\in {\mathbb {S}}^1: h (R_\alpha t) = \int \limits h \circ R_\alpha d\mu _{h(t)} \Big \}. \end{aligned}$$

It follows from Theorem 2.5 that the map \(y \mapsto \int \limits h \circ R_\alpha d\mu _{y}\) is \(\nu \)-measurable, hence \(t \mapsto \int \limits h \circ R_\alpha d\mu _{h(t)}\) is \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\)-measurable. Consequently, A is a \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\)-measurable set. By Theorem 2.5,

$$\begin{aligned} {{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}(A) = \int \limits _{\mathbb {R}}\mu _y(A) d\nu (y) \end{aligned}$$
(4.1)

and

$$\begin{aligned} \mu _y(A) = \mu _y(A \cap \{h = y\}) = \mu _y \Big ( \Big \{ t\in {\mathbb {S}}^1: h(t) = y \text { and } h(R_\alpha t) = \int \limits h \circ R_\alpha d\mu _{y} \Big \} \Big ). \end{aligned}$$

Since \(\nu \)-almost every \(y \in {\mathbb {R}}\) is predictable, Lemma 3.2 implies that the function \(h \circ R_\alpha \) is constant \(\mu _y\)-almost surely for \(\nu \)-almost every \(y \in {\mathbb {R}}\), hence \(\mu _y(A) = 1\) for \(\nu \)-almost every \(y \in {\mathbb {R}}\). Therefore, (4.1) gives \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}(A) = 1\).

Let

$$\begin{aligned} B = \bigcap \limits _{n=0}^{\infty } R_\alpha ^{-n}(A). \end{aligned}$$

Then B has full \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\)-measure. Moreover, the definition of A implies that if \(t_1, t_2 \in A\) and \(h(t_1) = h(t_2)\), then \(h(R_\alpha t_1) = h(R_\alpha t_2)\). Therefore, if \(t_1, t_2 \in B\) and \(h(t_1) = h(t_2)\), then \(h(R^n_\alpha t_1) = h(R^n_\alpha t_2)\) for every \(n \ge 0\).

Claim 2

If \(t_1, t_2 \in B\) and \(h(t_1) = h(t_2)\), then \(h(t_1 + s) = h(t_2 + s)\) for every \(s \in {\mathbb {S}}^1\).

In order to prove the claim, assume that \(t_1, t_2 \in B\) and \(h(t_1) = h(t_2)\). Fix \(s \in {\mathbb {S}}^1\). Since \(\alpha \notin {\mathbb {Q}}\), every orbit under \(R_\alpha \) is dense in \({\mathbb {S}}^1\), so there exists a sequence \(n_k \rightarrow \infty \) with \(R^{n_k}_\alpha t_1 \rightarrow t_1+s\) as \(k \rightarrow \infty \). Then \(R^{n_k}_\alpha t_2 \rightarrow t_2+s\). As \(t_1, t_2 \in B\) and \(h(t_1)=h(t_2)\), by Claim 1 we have \(h(R^{n_k}_\alpha t_1) = h(R^{n_k}_\alpha t_2)\), hence the continuity of h gives \(h(t_1 + s) = h(t_2 + s)\).

Claim 3

For every \(\varepsilon >0\), there exist \(t_1, t_2 \in B\) such that \(0< d(t_1, t_2) < \varepsilon \) and \(h(t_1) = h(t_2)\).

To prove Claim 3, note first that it holds trivially if the set \(h^{-1}\left( \left\{ \inf h \right\} \right) \) has non-empty interior. Otherwise, fix a small \(\varepsilon > 0\) and take \(t_0 \in {\mathbb {S}}^1\) such that \(h(t_0) = \inf h\). Then by the continuity of h, there exist disjoint open arcs \(I, J \subset {\mathbb {S}}^1\) of length smaller than \(\varepsilon /2\), such that \({{\overline{I}}} \cap {{\overline{J}}} = \{ t_0 \}\) and their images \(h(I),\ h(J)\) are intervals of positive length with \(\overline{h(I)} = \overline{h(J)} = K\) for some closed, non-degenerate interval \(K \subset {\mathbb {R}}\). As B is of full \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\)-measure and h is Lipschitz, \(h(I \cap B)\) and \(h(J \cap B)\) both have full Lebesgue measure in K, hence \(h(I \cap B) \cap h(J \cap B) \ne \emptyset \). This proves the claim.

Claim 4

The map h is constant.

For the proof of Claim 4, fix a small \(\delta >0\). As h is uniformly continuous, there exists \(\varepsilon >0\) such that \(|h(t) - h(t')| < \delta \) whenever \(d(t,t') < \varepsilon \). According to Claim 3, there exist \(t_1, t_2 \in B\) such that \(0<d(t_1, t_2) < \varepsilon \) and \(h(t_1) = h(t_2)\). Let \(\beta = t_2 - t_1\text { mod }1\) and note that \(\beta \ne 0\), \(|\beta | < \varepsilon \). Applying inductively Claim 2 to \(t_1, t_2\) with \(s = \beta , \ldots , (n-1)\beta \text { mod }1\), for \(n \in {\mathbb {N}}\), we obtain \(h(t_1) = h(t_1 + \beta \text { mod }1) = \cdots = h(t_1 + n\beta \text { mod }1)\). Again by Claim 2, we arrive at \(h(0) = h(n\beta \text { mod }1)\) for \(n \in {\mathbb {N}}\).

As \(|\beta | < \varepsilon \), for every \(t \in {\mathbb {S}}^1\) there exists \(n \in {\mathbb {N}}\) such that \(d(t, n\beta \text { mod }1) < \varepsilon \). For such n we have \(|h(t) - h(0)| = |h(t) - h(n\beta \text { mod }1)| < \delta \). As \(\delta \) was arbitrary, we obtain \(h(t) = h(0)\) for every \(t \in {\mathbb {S}}^1\). Therefore, h is constant. \(\quad \square \)

Remark 4.2

In [BGŚ20, Example 3.5] it is shown that there does not exist a Lipschitz map \(h :{\mathbb {S}}^1\rightarrow {\mathbb {R}}\) which is injective on a set of full \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\)-measure. However, it may still happen that for certain Lipschitz transformations \(T :{\mathbb {S}}^1\rightarrow {\mathbb {S}}^1\) almost sure predictability holds for every h, e.g. if T is the identity.

Corollary 4.3

Let \(X \subset {\mathbb {R}}^N\) be a compact set with a Borel probability measure \(\mu \) and let \(T:X \rightarrow X\) be an injective Lipschitz map, such that

$$\begin{aligned} ({{\,\mathrm{supp}\,}}\mu , \mu , T|_{{{\,\mathrm{supp}\,}}\mu }) = (X_0, \mu _0, T_0), \end{aligned}$$

where \(X_0 = \{p_0\} \cup {\mathbb {S}}^1\) for a circle \({\mathbb {S}}^1 \subset {\mathbb {R}}^N\) and \(p_0 \in {\mathbb {R}}^N \setminus {\mathbb {S}}^1\),

$$\begin{aligned} \mu _0 = \frac{1}{2} \delta _{p_0} + \frac{1}{2} {{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}, \end{aligned}$$

and \(T_0 :X_0 \rightarrow X_0\) is such that \(T_0(p_0)=p_0\) and \(T_0\) is an irrational rotation \(R_\alpha \) on \({\mathbb {S}}^1\). Set \(\nu = \phi _*\mu \). Then \({{\,\mathrm{ID}\,}}(\mu ) = 1/2\) and the only Lipschitz observables \(h :X \rightarrow {\mathbb {R}}\), such that \(\nu \)-almost every \(y \in {\mathbb {R}}\) is predictable for the 1-delay coordinate map \(\phi =h\), are the ones constant on \({\mathbb {S}}^1\). Consequently, for a prevalent set of Lipschitz observables, there is a positive \(\nu \)-measure set of non-predictable points. In particular, the set of Lipschitz observables \(h:X \rightarrow {\mathbb {R}}\) for which \(\nu \)-almost every point of \({\mathbb {R}}\) is predictable is not prevalent.

Proof

The fact \({{\,\mathrm{ID}\,}}(\mu ) = {{\,\mathrm{ID}\,}}(\mu _0) = 1/2\) follows from the definition of the information dimension by a direct computation (see the example following Definition 1.6: the local dimension of \(\mu _0\) is 0 at the atom \(p_0\) and 1 at \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\)-almost every point of \({\mathbb {S}}^1\)). The assertion that only observables constant on \({\mathbb {S}}^1\) give almost sure predictability is an immediate consequence of Proposition 4.1. Consider now the space \({{\,\mathrm{Lip}\,}}(X)\) of all Lipschitz observables \(h:X \rightarrow {\mathbb {R}}\), with the Lipschitz norm \(\Vert h\Vert _{{{\,\mathrm{Lip}\,}}}\) (see Sect. 2.2), and let \(Z \subset {{\,\mathrm{Lip}\,}}(X)\) be the set of Lipschitz observables which are constant on \({\mathbb {S}}^1\). Note first that any prevalent set is dense (see [Rob11, Sect. 5.1]), while Z is not dense in \({{\,\mathrm{Lip}\,}}(X)\) in the supremum norm (hence also in the Lipschitz norm). Therefore, Z is not prevalent in \({{\,\mathrm{Lip}\,}}(X)\). In fact, we can prove more, showing that \({{\,\mathrm{Lip}\,}}(X) \setminus Z\) is prevalent (note that a subset of the complement of a prevalent set cannot be prevalent, as the intersection of two prevalent sets is prevalent, see [HSY92]).

In order to prove prevalence of \({{\,\mathrm{Lip}\,}}(X) \setminus Z\), we can assume that the circle \({\mathbb {S}}^1 \subset X\) is of the form \({\mathbb {S}}^1 = \{ (x_1, \ldots , x_N) \in {\mathbb {R}}^N : x_1^2 + x_2^2 = 1,\ x_3 = 0, \ldots , x_N = 0 \}\). Indeed, an affine change of coordinates in \({\mathbb {R}}^N\) transforming the circle in X to the circle \(\{ (x_1, \ldots , x_N) \in {\mathbb {R}}^N : x_1^2 + x_2^2 = 1,\ x_3 = 0, \ldots , x_N = 0 \}\) induces a linear isomorphism between the corresponding spaces of Lipschitz observables. As in Theorem 3.1, we show the prevalence of \({{\,\mathrm{Lip}\,}}(X) \setminus Z\) with the probe set equal to a linear basis of the space of real polynomials of N variables of degree at most 1. In other words, we should check that for any \(h \in {{\,\mathrm{Lip}\,}}(X)\), we have \(h + \alpha _0 + \alpha _1 h_1 + \cdots + \alpha _N h_N \notin Z\) for Lebesgue-almost every \(\alpha = (\alpha _0, \ldots , \alpha _N) \in {\mathbb {R}}^{N+1}\), where \(h_j(x_1, \ldots , x_N) = x_j\), \(j = 1, \ldots , N\). Let \(e_1, \ldots , e_N\) be the standard basis of \({\mathbb {R}}^N\). If \(h + \alpha _0 + \alpha _1 h_1 + \cdots + \alpha _N h_N \in Z\), then evaluating at \(e_1,e_2 \in {\mathbb {S}}^1\) gives

$$\begin{aligned} h(e_1) + \alpha _0 + \alpha _1 = h(e_2) + \alpha _0 + \alpha _2. \end{aligned}$$

Therefore \(\alpha _1 = \alpha _2 + h(e_2) - h(e_1)\), so \(\alpha \) belongs to an affine subspace of \({\mathbb {R}}^{N+1}\) of codimension one. It follows that given \(h \in {{\,\mathrm{Lip}\,}}(X)\), we have \(h + \alpha _0 + \alpha _1 h_1 + \cdots + \alpha _N h_N \in Z\) for \((\alpha _0, \ldots , \alpha _N)\) in a set of zero Lebesgue measure in \({\mathbb {R}}^{N+1}\), which ends the proof. \(\quad \square \)

4.2 Construction of the diffeomorphism \(f:{\mathbb {S}}^2 \rightarrow {\mathbb {S}}^2\)

In this subsection we construct a smooth diffeomorphism f of \({\mathbb {S}}^2 \simeq {\mathbb {R}}^2 \cup \{\infty \}\) with an invariant unit circle S containing two fixed points p, q, such that the trajectories of all points in \({\mathbb {R}}^2 \setminus \{(0,0)\}\) spiral towards S, spending most of the time in small neighbourhoods of p and q.

We consider points \((x,y) \in {\mathbb {R}}^2\) in polar coordinates, i.e. \(x = r \cos \varphi \), \(y = r \sin \varphi \) for \(r \in [0, +\infty )\), \(\varphi \in {\mathbb {R}}\). Let

$$\begin{aligned} f(r\cos \varphi , r\sin \varphi ) = (R(r) \cos \Phi (r, \varphi ), R(r) \sin \Phi (r, \varphi )) \end{aligned}$$

for

$$\begin{aligned} R(r) = r + \varepsilon \frac{r(1-r)^3}{1+r^4}, \qquad \Phi (r, \varphi ) = \varphi + \varepsilon \theta (\varphi ) + (1-r)^2 \eta (r), \end{aligned}$$

where \(\varepsilon > 0\) is a small constant, \(\theta :{\mathbb {R}}\rightarrow [0, +\infty )\) is a \(\pi \)-periodic \(C^\infty \)-function such that \(\theta (\varphi ) = \varphi ^2\) for \(\varphi \in (-\pi /4, \pi /4)\) and \(\theta \) has no zeroes except for \(k\pi \), \(k \in {\mathbb {Z}}\), while \(\eta :[0, +\infty ) \rightarrow [0, +\infty )\) is a \(C^\infty \)-function such that \(\eta |_{[\frac{1}{2}, \frac{3}{2}]} \equiv 1\), \(\eta > 0\) on \((0,\infty )\) and \(\lim _{r \rightarrow 0^+} (1-r)^2\eta (r) = \lim _{r \rightarrow +\infty } (1-r)^2\eta (r) = 0\) (the role of \(\eta \) is to ensure that f extends to a \(C^{\infty }\)-diffeomorphism of the sphere).
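The formulas above are concrete enough to experiment with numerically. Below is a minimal Python sketch (ours, not from the paper): we restrict to initial radii in \([\frac{1}{2}, \frac{3}{2}]\), where \(\eta \equiv 1\), and take \(\sin ^2\varphi \) as a convenient surrogate for \(\theta \); it is \(\pi \)-periodic, vanishes exactly at \(k\pi \) and equals \(\varphi ^2 + O(\varphi ^4)\) near 0, so it has the qualitative features the construction needs, although it is not literally \(\varphi ^2\) on \((-\pi /4, \pi /4)\).

```python
import numpy as np

EPS = 0.05  # the small parameter epsilon from the construction

def R(r):
    # radial part of f: attracts every r > 0 towards the invariant circle r = 1
    return r + EPS * r * (1 - r)**3 / (1 + r**4)

def Phi(r, phi):
    # angular part of f for r in [1/2, 3/2], where eta(r) = 1;
    # sin(phi)^2 stands in for the bump function theta
    return phi + EPS * np.sin(phi)**2 + (1 - r)**2

def orbit(r0, phi0, n_steps):
    # iterate (r_{n+1}, phi_{n+1}) = (R(r_n), Phi(r_n, phi_n))
    rs = np.empty(n_steps + 1)
    phis = np.empty(n_steps + 1)
    rs[0], phis[0] = r0, phi0
    for n in range(n_steps):
        rs[n + 1] = R(rs[n])
        phis[n + 1] = Phi(rs[n], phis[n])
    return rs, phis

rs, phis = orbit(0.9, 0.3, 200_000)  # a trajectory spiralling towards r = 1
```

The following two lemmas about R and \(\Phi \) are elementary.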

Lemma 4.4

For sufficiently small \(\varepsilon >0\), the function R has the following properties.

(a) R is an increasing homeomorphism of \([0,+\infty )\).

(b) \(R(0) = 0\), \(R(r) > r\) for \(r \in (0,1)\), \(R(1) = 1\) and \(R(r) < r\) for \(r \in (1, +\infty )\).

(c) Near \(r = 1\), R has the Taylor expansion \(R(r) = 1 + (r-1) - \frac{\varepsilon }{2} (r-1)^3 + \cdots \).
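For instance, (c) is a one-line computation: substituting \(u = r-1\) and using \(1 + (1+u)^4 = 2 + 4u + O(u^2)\),

$$\begin{aligned} R(1+u) = 1 + u - \varepsilon \frac{(1+u)\,u^3}{1+(1+u)^4} = 1 + u - \frac{\varepsilon }{2} u^3 + O(u^4). \end{aligned}$$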

Lemma 4.5

For sufficiently small \(\varepsilon >0\), the function \(\Phi \) has the following properties.

(a) \(\Phi (r, \varphi ) > \varphi \) for \(r \in (0,1) \cup (1, +\infty )\).

(b) For given \(r \in (0, +\infty )\), the function \(\varphi \mapsto \Phi (r, \varphi )\) is strictly increasing.

(c) For the function \(\varphi \mapsto \Phi (1, \varphi ) \mod 2\pi \), the points \(0, \pi \) are the unique fixed points and the intervals \((0, \pi ), (\pi , 2\pi )\) are invariant.

Let

$$\begin{aligned} {\mathbb {B}}= \{(x,y) \in {\mathbb {R}}^2 : \Vert (x,y)\Vert < 1\}, \qquad S= \{(x,y) \in {\mathbb {R}}^2 : \Vert (x,y)\Vert = 1\}, \end{aligned}$$

where \(\Vert \cdot \Vert \) denotes the Euclidean norm. For sufficiently small \(\varepsilon \), the function f defines a \(C^\infty \)-diffeomorphism of \({\mathbb {R}}^2\), such that the unit disc \({\mathbb {B}}\), the unit circle S and the complement of \({\overline{{\mathbb {B}}}}\) are f-invariant. Compactifying \({\mathbb {R}}^2\) to the Riemann sphere \({\mathbb {S}}^2 \simeq {\mathbb {R}}^2 \cup \{\infty \}\) and putting \(f(\infty ) = \infty \), we extend f to a \(C^\infty \)-diffeomorphism of \({\mathbb {S}}^2\) with fixed points at (0, 0) and \(\infty \). Another two fixed points,

$$\begin{aligned} p = (1,0), \qquad q = (-1, 0), \end{aligned}$$

corresponding to the fixed points described in Lemma 4.5(c), are located on the unit circle S.

Now we analyse the behaviour of the orbits of points \((x, y) \in {\mathbb {S}}^2\) under f. By Lemma 4.5, if \((x, y) = (\cos \varphi _0, \sin \varphi _0) \in S\) for some \(\varphi _0 \in {\mathbb {R}}\), then \(f^n(x, y)\) tends to p (resp. to q) as \(n \rightarrow \infty \) for \(\varphi _0 \in (-\pi , 0]\) \(\text {mod }2\pi \) (resp. \(\varphi _0 \in (0, \pi ]\) \(\text {mod }2\pi \)). Suppose now \((x, y) \in {\mathbb {S}}^2 \setminus S\). Recall that the points (0, 0) and \(\infty \) are fixed, so we can assume \((x, y) \in {\mathbb {R}}^2 \setminus (S \cup \{(0,0)\})\). Then

$$\begin{aligned} (x, y) = (r_0 \cos \varphi _0,r_0 \sin \varphi _0) \end{aligned}$$

for \(r_0 \in (0, +\infty ) \setminus \{1\}\), \(\varphi _0 \in {\mathbb {R}}\). The goal of this subsection is to prove

$$\begin{aligned} \lim \limits _{N \rightarrow \infty } \frac{1}{N} \sum \limits _{n=0}^{N-1} \delta _{f^n(x, y)} = \frac{1}{2} \delta _p + \frac{1}{2} \delta _q \end{aligned}$$

in the sense of weak-\(^*\) convergence (see Corollary 4.13). To this end, we find the asymptotics of the successive times spent by the iterates of (x, y) in small neighbourhoods of the points p and q. We will make calculations only for the case

$$\begin{aligned} r_0 \in (0,1), \end{aligned}$$

since the functions R, \(\Phi \) are defined so that the behaviour of the trajectories of points with \(r_0 > 1\) is symmetric (see Remark 4.11). From now on, we fix the initial point \((x, y) = (r_0 \cos \varphi _0,r_0 \sin \varphi _0)\) with \(r_0 \in (0,1)\) and allow all the constants appearing below to depend on this point. For \(n \in {\mathbb {N}}\) let

$$\begin{aligned} r_n = R^n(r_0) \end{aligned}$$

and define inductively

$$\begin{aligned} \varphi _{n+1} = \Phi (r_n, \varphi _n). \end{aligned}$$

Then

$$\begin{aligned} f^n(r_0\cos \varphi _0, r_0\sin \varphi _0) = (r_n \cos \varphi _n, r_n \sin \varphi _n). \end{aligned}$$

For convenience, set

$$\begin{aligned} \rho _n = 1 -r_n \end{aligned}$$

and note that by Lemma 4.4, \(\rho _n\) decreases to 0 as \(n \rightarrow \infty \).

Lemma 4.6

We have

$$\begin{aligned} \rho _n = \frac{a + o(1)}{\sqrt{n}} \end{aligned}$$

as \(n \rightarrow \infty \) for some \(a > 0\). Moreover, for every \(0 \le k \le n\),

$$\begin{aligned} \frac{k}{cn^{3/2}} \le \rho _n - \rho _{n+k} \le \frac{ck}{n^{3/2}}, \end{aligned}$$

where \(c > 0\) is independent of n and k.

Proof

By Lemma 4.4, we have \(\rho _n \searrow 0^+\) as \(n \rightarrow \infty \) and

$$\begin{aligned} \rho _{n+1} = \rho _n - \frac{\varepsilon }{2} \rho _n^3 + \cdots \end{aligned}$$

for \(\rho _n\) close to 0. Hence, the first assertion follows from the standard analysis of the behaviour of an analytic map near a parabolic fixed point, see e.g. [Mil06, Lemma 10.1]. To check the second one, note that there exists a univalent holomorphic map \(\psi :V \rightarrow {\mathbb {C}}\) (Fatou coordinate) on a domain \(V \subset {\mathbb {C}}\) containing \(\rho _n\) for large n, such that \(\psi (V)\) contains a half-plane \(\{z \in {\mathbb {C}}: Re (z) > c_0\}\) for some \(c_0 \in {\mathbb {R}}\) and

$$\begin{aligned} \psi (\rho _{n+1}) = \psi (\rho _n) + 1 \end{aligned}$$

(see e.g. [Mil06, Theorem 10.9]). Let

$$\begin{aligned} z_n = \psi (\rho _n) \end{aligned}$$

for large n and take \(n_0\) with \(Re (z_{n_0}) > c_0\). Then \(\psi ^{-1}\) is defined on

$$\begin{aligned} D = \{z \in {\mathbb {C}}: |z- z_{n+k}| < n+k-n_0\} \end{aligned}$$

for large n, and \(z_{\lfloor n/2\rfloor }, z_n \in D'\) for

$$\begin{aligned} D' = \{z \in {\mathbb {C}}: |z- z_{n+k}| \le n+k-\lfloor n/2\rfloor \}. \end{aligned}$$

Since \(k \le n\), the ratio of the radius of \(D'\) to the radius of D is at most \(\frac{(3/2)n+1}{2n-n_0}\), which tends to 3/4 as \(n \rightarrow \infty \). Moreover,

$$\begin{aligned} \frac{|z_{n+k} - z_n|}{|z_n - z_{\lfloor n/2\rfloor }|} = \frac{k}{n - \lfloor n/2\rfloor }. \end{aligned}$$

Therefore, by the Koebe distortion theorem (see e.g. [CG93, Theorem 1.6]),

$$\begin{aligned} \frac{1}{c} \frac{k}{n}< \frac{\rho _n - \rho _{n+k}}{\rho _{\lfloor n/2\rfloor } - \rho _n} < c \frac{k}{n} \end{aligned}$$

for some constant \(c > 0\). Since \(\sqrt{n}(\rho _{\lfloor n/2\rfloor } - \rho _n) \rightarrow (\sqrt{2} - 1)a\) as \(n \rightarrow \infty \) by the first assertion of the lemma, this ends the proof. \(\quad \square \)
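A quick numerical sanity check of the first assertion, reusing the hypothetical orbit sketch from Sect. 4.2 (with the \(\sin ^2\) surrogate for \(\theta \)):

```python
# Lemma 4.6: sqrt(n) * rho_n should stabilise near some constant a > 0
rho = 1 - rs  # rs computed in the orbit sketch above
for n in (10**3, 10**4, 2 * 10**5):
    print(n, np.sqrt(n) * rho[n])
```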

Convention. Within subsequent calculations, we will write \(a_n \asymp b_n\) for sequences \(a_n, b_n\), if \(\frac{1}{c}< \frac{a_n}{b_n} < c\), where \(c > 0\) is independent of n.

Lemma 4.7

Suppose

$$\begin{aligned} x_{n+1} = x_n + a x_n^2 \end{aligned}$$

for \(n\in {\mathbb {Z}}\) and some \(a > 0\). Then for given \(x_0 < 0\) (resp. \(x_0 > 0)\) sufficiently close to 0, we have

$$\begin{aligned} x_n \asymp -\frac{1}{n} \qquad \Big (\text {resp. } x_{-n} \asymp \frac{1}{n}\Big ) \end{aligned}$$

for \(n \in {\mathbb {N}}\).

Proof

Follows directly from [Mil06, Lemma 10.1]. \(\quad \square \)
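The heuristic behind Lemma 4.7 is the comparison of the recursion with the differential equation \(\dot{x} = a x^2\):

$$\begin{aligned} x_{n+1} - x_n = a x_n^2 \approx \frac{dx}{dn} \quad \Longrightarrow \quad x(n) = \frac{x_0}{1 - a n x_0} \asymp -\frac{1}{an} \quad \text {for } x_0 < 0, \end{aligned}$$

and symmetrically for the backward orbit when \(x_0 > 0\).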

By Lemmas 4.4–4.6, the trajectory of (x, y) approaches the unit circle S, spiralling around it infinitely many times and slowing down near the fixed points p and q. In fact, the definitions of the functions R, \(\Phi \) easily imply that p and q are in the limit set of the trajectory. In particular, for a fixed \(\delta > 0\) (small enough to satisfy several conditions specified later), the trajectory visits the \(\delta \)-neighbourhoods of p and q infinitely many times, where these neighbourhoods are defined respectively by

$$\begin{aligned} \begin{aligned} U_p&= \{(r\cos \varphi , r\sin \varphi ): r \in (1-\delta , 1+\delta ),\, \varphi \in (-\delta , \delta )\},\\ U_q&= \{(r\cos \varphi , r\sin \varphi ): r \in (1-\delta , 1+\delta ),\, \varphi \in (\pi -\delta , \pi + \delta )\}. \end{aligned} \end{aligned}$$
(4.2)

Hence, for \(i \in {\mathbb {N}}\) we can define \(N_{p, i}\) (resp. \(N_{q, i}\)) to be the time spent by the trajectory during its i-th visit in \(U_p\) (resp. \(U_q\)). More precisely, set \(n^+_{p,0} = 0\) and define inductively

$$\begin{aligned} n^-_{p, i}&= \min \{n \ge n^+_{p, i-1}: (r_n\cos \varphi _n, r_n\sin \varphi _n) \in U_p\},\\ n^+_{p, i}&= \min \{n \ge n^-_{p, i}: (r_n\cos \varphi _n, r_n\sin \varphi _n) \notin U_p\},\\ N_{p, i}&= n^+_{p, i} - n^-_{p, i} \end{aligned}$$

for \(i \ge 1\). Define \(n^-_{q, i}\), \(n^+_{q, i}\), \(N_{q, i}\) analogously. By Lemmas 4.4 and 4.5, if \(\delta >0\) is chosen small enough, then

$$\begin{aligned} 0< n^-_{p,1}< n^+_{p,1}< n^-_{q,1}< n^+_{q,1}< \cdots< n^-_{p, i}< n^+_{p, i}< n^-_{q, i}< n^+_{q, i} < \cdots \end{aligned}$$
(4.3)

or

$$\begin{aligned} 0< n^-_{q,1}< n^+_{q,1}< n^-_{p,1}< n^+_{p,1}< \cdots< n^-_{q, i}< n^+_{q, i}< n^-_{p, i}< n^+_{p, i} < \cdots , \end{aligned}$$

depending on the position of the point (x, y). To simplify notation, we assume that (4.3) holds. Again by Lemmas 4.4 and 4.5, we obtain the following.

Lemma 4.8

We have

$$\begin{aligned} n^-_{q, i} - n^+_{p, i}, \; n^-_{p,i+1} - n^+_{q, i} < N_0 \end{aligned}$$

for some constant \(N_0 > 0\). In other words, the times spent by the trajectory of (x, y) between consecutive visits in \(U_p \cup U_q\) remain uniformly bounded.

Now we estimate the times spent by the trajectory during its stay in \(U_p\) and \(U_q\).

Lemma 4.9

$$\begin{aligned} N_{p, i} \asymp N_{q, i} \asymp i. \end{aligned}$$

Proof

We prove the lemma by induction. Obviously, we can assume that i is large. Suppose, by induction,

$$\begin{aligned} \frac{j}{C} \le N_{p, j} \le C j, \quad \frac{j}{C} \le N_{q, j} \le C j \qquad \text {for } j = 1, \ldots , i-1 \end{aligned}$$
(4.4)

for a large constant \(C > 1\) (to be specified later). First, we estimate \(N_{p, i}\). By Lemma 4.8,

$$\begin{aligned} \frac{i^2}{c_1C} \le n^-_{p, i} \le c_1 C i^2 \end{aligned}$$
(4.5)

for some \(c_1 > 0\) (we denote by \(c_1, c_2, \ldots \) constants independent of C). Obviously, we can assume \(\varphi _{n^-_{p, i}} \in [-\pi , \pi )\). Then, since \(\delta \) is small and i is large, we have

$$\begin{aligned} -\frac{\pi }{4}< -\delta< \varphi _{n^-_{p, i}} < 0. \end{aligned}$$

Note that \(\rho _{n^-_{p, i}} < \delta \) and the sequence \(\rho _n\) is decreasing, so

$$\begin{aligned} N_{p, i} = \min \{n \ge n^-_{p, i}: \varphi _n \ge \delta \} - n^-_{p, i}. \end{aligned}$$

Recall that if \(\varphi _n \in (-\pi /4, \pi /4)\) (in particular, if \(n \in [n^-_{p, i}, n^+_{p, i})\)), then

$$\begin{aligned} \varphi _{n+1} = \varphi _n + \varepsilon \varphi _n^2 + \rho _n^2. \end{aligned}$$
(4.6)

Let

$$\begin{aligned} \rho ^-_i = \frac{1}{C^{2/3}i}, \qquad \rho ^+_i = \frac{C^{2/3}}{i}. \end{aligned}$$

To estimate the behaviour of the sequence \(\varphi _n\) for \(n \ge n^-_{p, i}\), we will compare it with the sequences \(\varphi ^+_n\), \(\varphi ^-_n\) for \(n \ge n^-_{p, i}\), given by

$$\begin{aligned} \varphi ^\pm _{n^-_{p, i}} = \varphi _{n^-_{p, i}}, \qquad \varphi ^\pm _{n+1} = \varphi ^\pm _n + \varepsilon (\varphi ^\pm _n)^2 + (\rho ^\pm _i)^2. \end{aligned}$$
(4.7)

First, we will analyse the behaviour of the sequences \(\varphi ^\pm _n\) and then show that they provide upper and lower bounds for \(\varphi _n\). By definition, \(\varphi _{n^-_{p, i}}^\pm \in (-\delta , 0)\) and \(\varphi _n^\pm \) increases to infinity as \(n \rightarrow \infty \). Hence, we can define

$$\begin{aligned} N^\pm _i = \min \{n \ge n^-_{p, i}: \varphi ^\pm _n \ge \delta \} - n^-_{p, i} \end{aligned}$$

to be the time which the sequence \(\varphi ^\pm _n\) spends in \((-\delta , \delta )\). Since \(\rho ^-_i < \rho ^+_i\), we have \(\varphi ^-_n \le \varphi ^+_n\) and \(N^+_i \le N^-_i\). Set

$$\begin{aligned} k^\pm _1&= \min \left\{ n \in [n^-_{p, i}, n^-_{p, i} + N^\pm _i] : \varphi ^\pm _n> - \frac{\rho ^\pm _i}{\sqrt{\varepsilon }}\right\} ,\\ k^\pm _2&= \min \left\{ n \in [k^\pm _1, n^-_{p, i} + N^\pm _i] : \varphi ^\pm _n > \frac{\rho ^\pm _i}{\sqrt{\varepsilon }}\right\} . \end{aligned}$$

Note that for \(n \in [n^-_{p, i}, k^\pm _1) \cup [k^\pm _2, N^\pm _i + n^-_{p, i})\) we have \(\varepsilon (\varphi ^\pm _n)^2 \ge (\rho ^\pm _i)^2\), so

$$\begin{aligned} \varphi ^\pm _n + \varepsilon (\varphi ^\pm _n)^2 \le \varphi ^\pm _{n+1} \le \varphi ^\pm _n + 2\varepsilon (\varphi ^\pm _n)^2. \end{aligned}$$

Hence, by Lemma 4.7,

$$\begin{aligned} k^\pm _1 - n^-_{p, i} \asymp N^\pm _i + n^-_{p, i} - k^\pm _2 \asymp \frac{1}{\rho ^\pm _i}. \end{aligned}$$

On the other hand, for \(n \in [k^\pm _1, k^\pm _2)\) we have \(\varepsilon (\varphi ^\pm _n)^2 \le (\rho ^\pm _i)^2\), so

$$\begin{aligned} \varphi ^\pm _n + (\rho ^\pm _i)^2 \le \varphi ^\pm _{n+1} \le \varphi ^\pm _n + 2 (\rho ^\pm _i)^2, \end{aligned}$$

which implies

$$\begin{aligned} k^\pm _2 - k^\pm _1 \asymp \frac{1}{\rho ^\pm _i}. \end{aligned}$$

Hence,

$$\begin{aligned} \frac{i}{c_2 C^{2/3}} = \frac{1}{c_2\rho ^+_i} \le N^+_i \le N^-_i \le \frac{c_2}{\rho ^-_i} = c_2 C^{2/3} i \end{aligned}$$

for some \(c_2 > 0\). If C is chosen sufficiently large, then this yields

$$\begin{aligned} \frac{i}{C} \le N^+_i \le N^-_i \le C i. \end{aligned}$$
(4.8)

Now we show by induction that

$$\begin{aligned} \varphi ^-_n \le \varphi _n \le \varphi ^+_n \end{aligned}$$
(4.9)

for \(n \in [n^-_{p, i}, n^-_{p, i} + \min (N_{p, i}, N^-_i)]\). To this end, note that for \(n = n^-_{p, i}\) we have equalities in (4.9). Suppose, by induction, that (4.9) is satisfied for some \(n \in [n^-_{p, i}, n^-_{p, i} + \min (N_{p, i}, N^-_i))\). Then by (4.6) and (4.7),

$$\begin{aligned} \varphi _{n+1} - \varphi ^\pm _{n+1} = (\varphi _n - \varphi ^\pm _n)(1 + \varepsilon (\varphi _n + \varphi ^\pm _n)) + \rho _n^2 - (\rho ^\pm _i)^2, \end{aligned}$$

where \(1 + \varepsilon (\varphi _n + \varphi ^\pm _n)>1 -2\varepsilon \delta > 0\). Moreover, by Lemma 4.6, (4.5) and (4.8), there exists a constant \(c_3 > 0\), such that

$$\begin{aligned} \frac{1}{c_3 \sqrt{C} \, i} \le \rho _n \le \frac{c_3 \sqrt{C}}{i}, \end{aligned}$$

which gives

$$\begin{aligned} \rho ^-_i \le \rho _n \le \rho ^+_i, \end{aligned}$$

provided C is chosen sufficiently large. Therefore, the sign of \(\varphi _{n+1} - \varphi ^\pm _{n+1}\) is the same as the one of \(\varphi _n - \varphi ^\pm _n\), which provides the induction step and proves (4.9).

Using (4.9), we can show

$$\begin{aligned} N^+_i \le N_{p, i} \le N^-_i. \end{aligned}$$
(4.10)

Indeed, if \(N_{p, i} > N_i^-\), then by (4.9),

$$\begin{aligned} \delta \le \varphi ^-_{n^-_{p, i} + N^-_i} \le \varphi _{n^-_{p, i} + N^-_i}, \end{aligned}$$

so \(n^+_{p, i} \le n^-_{p, i} + N^-_i\), which is a contradiction. Hence, \(N_{p, i} \le N_i^-\), and then (4.9) gives

$$\begin{aligned} \delta \le \varphi _{n^+_{p, i}} \le \varphi ^+_{n^+_{p, i}}, \end{aligned}$$

which implies (4.10). By (4.8) and (4.10),

$$\begin{aligned} \frac{i}{C} \le N_{p, i} \le C i, \end{aligned}$$

which completes the inductive step started in (4.4) and shows \(N_{p, i} \asymp i\).

To show \(N_{q, i} \asymp i\), note that if \(\varphi _n \in (3\pi /4 , 5\pi /4)\), then for \({{\tilde{\varphi }}}_n = \varphi _n - \pi \) we have

$$\begin{aligned} {{\tilde{\varphi }}}_{n+1} = {{\tilde{\varphi }}}_n + \varepsilon {{\tilde{\varphi }}}_n^2 + \rho _n^2. \end{aligned}$$

Moreover, by the assertion \(N_{p, i} \asymp i\) just proved and Lemmas 4.6 and 4.8, we have \(n^-_{q, i} \asymp n^-_{p, i}\) and \(\rho _{n^-_{q, i}}\asymp \rho _{n^-_{p, i}}\). Using this, one can show \(N_{q, i}\asymp i\) by repeating the proof given in the case of \(N_{p, i}\). \(\quad \square \)

A more accurate comparison of \(N_{p, i}\) and \(N_{q, i}\) is presented below.

Lemma 4.10

There exists \(M > 0\) such that

$$\begin{aligned} |N_{p, i}-N_{q, i}| < M \end{aligned}$$

for all \(i \ge 1\).

Proof

Take a large \(i \in {\mathbb {N}}\). Let

$$\begin{aligned} (\eta _n, \psi _n) = f^n(r_{n_{p, i}^-}, \varphi _{n_{p, i}^-}), \qquad ({{\tilde{\eta }}}_n, {{\tilde{\psi }}}_n) = f^n(r_{n_{q, i}^-}, \varphi _{n_{q, i}^-} - \pi ) \end{aligned}$$

and

$$\begin{aligned} \sigma _n = 1 -\eta _n = \rho _{n+n_{p, i}^-}, \qquad {{\tilde{\sigma }}}_n = 1 -{{\tilde{\eta }}}_n = \rho _{n+n_{q, i}^-} \end{aligned}$$

for \(n \ge 0\). Subtracting multiples of \(2\pi \), we can assume \(\psi _0, {{\tilde{\psi }}}_0 \in [-\pi , \pi )\), so in fact

$$\begin{aligned} -\delta< \psi _0, {{\tilde{\psi }}}_0 < 0. \end{aligned}$$

By definition,

$$\begin{aligned} \psi _{n+1} = \psi _n + \varepsilon \psi _n^2 + \sigma _n^2, \qquad {{\tilde{\psi }}}_{n+1} = {{\tilde{\psi }}}_n + \varepsilon {{\tilde{\psi }}}_n^2 + {{\tilde{\sigma }}}_n^2 \end{aligned}$$
(4.11)

as long as \(\psi _n, {{\tilde{\psi }}}_n < \pi /4\). It follows that

$$\begin{aligned} N_{p, i} = \min \{n \ge 0: \psi _n \ge \delta \}, \qquad N_{q, i} = \min \{n \ge 0: {{\tilde{\psi }}}_n \ge \delta \}. \end{aligned}$$

Note that (4.11) holds for \(n \le \min (N_{p, i}, N_{q, i})+1\). To prove the lemma, we will carefully compare the behaviour of the sequences \(\psi _n\) and \({{\tilde{\psi }}}_n\). First, note that

$$\begin{aligned} {{\tilde{\psi }}}_0 \le \psi _2 \le {{\tilde{\psi }}}_4 \end{aligned}$$
(4.12)

provided i is sufficiently large (because then \(\sigma _n, {{\tilde{\sigma }}}_n\) are small compared to \(\varepsilon \) and \(\delta \)). Note also that since \(\rho _n\) is decreasing, we have

$$\begin{aligned} \sigma _{n+2} > {{\tilde{\sigma }}}_n \end{aligned}$$
(4.13)

for every \(n \ge 0\). By (4.11),

$$\begin{aligned} \psi _{n+3} - {{\tilde{\psi }}}_{n+1} = (\psi _{n+2} - {{\tilde{\psi }}}_n) (1 + \varepsilon (\psi _{n+2} + {{\tilde{\psi }}}_n)) + \sigma _{n+2}^2 - {{\tilde{\sigma }}}_n^2 \end{aligned}$$

for \(n \le \min (N_{p, i}-2, N_{q, i})\), where \(\varepsilon (\psi _{n+2} + {{\tilde{\psi }}}_n)< \varepsilon \pi /2 < 1\). Hence, by induction, using (4.12) and (4.13), we obtain

$$\begin{aligned} \psi _{n+2} \ge {{\tilde{\psi }}}_n \end{aligned}$$
(4.14)

for \(n \in [0, \min (N_{p, i}-2, N_{q, i})+1]\). In particular,

$$\begin{aligned} N_{p, i} < N_{q, i}+2 \qquad \text {or} \qquad \psi _{N_{q, i}+2} > {{\tilde{\psi }}}_{N_{q, i}} \ge \delta , \end{aligned}$$

which gives

$$\begin{aligned} N_{p, i} \le N_{q, i} + 2. \end{aligned}$$
(4.15)

The proof of the opposite estimate is more involved, so let us first present its sketch. We fix a number k such that (roughly speaking) \(\psi _k \approx 1/i\). Then we show inductively \({{\tilde{\psi }}}_{n+2} \ge \psi _n - cn/i^3\) for \(n \le k\) and some constant \(c >0\) (see (4.18)). This gives \({{\tilde{\psi }}}_{k+2} \ge \psi _k - c'/i^2\) for some \(c' > 0\) (see (4.19)). By the definition of k, we check that for sufficiently large constant \(M > 0\) we have \({{\tilde{\psi }}}_{k+M} \ge \psi _k + c''M/i^2\) for some \(c'' > 0\). With this starting condition, we inductively show \({{\tilde{\psi }}}_{n+M} \ge \psi _n + c''M/i^2\) for \(n \in [k, N_{p, i}]\) (see (4.23)). This provides \({{\tilde{\psi }}}_{N_{p, i}+M} \ge \psi _{N_{p, i}} \ge \delta \), so \(N_{q, i} \le N_{p, i} + M\).

Now let us go into the details of the proof. By Lemmas 4.8 and 4.9, we have

$$\begin{aligned} n_{p, i}^- \asymp n_{q, i}^- \asymp i^2, \qquad N_{p, i} \asymp N_{q, i} \asymp i, \end{aligned}$$
(4.16)

so by Lemma 4.6,

$$\begin{aligned} \sigma _n \le \frac{c_1}{i}, \qquad \sigma _n^2 - {{\tilde{\sigma }}}_{n+2}^2 = (\sigma _n + {{\tilde{\sigma }}}_{n+2})(\sigma _n - {{\tilde{\sigma }}}_{n+2}) \le \frac{c_1}{i^3} \end{aligned}$$
(4.17)

for \(n \in [0, N_{q, i}+4]\) and a constant \(c_1 > 0\). Let

$$\begin{aligned} k = \max \left\{ n \in [2, N_{q, i}]: \psi _{n+4} < \frac{b}{i}\right\} \end{aligned}$$

for a small constant \(b > 0\) (to be specified later). Note that \(k \le \min (N_{p, i} - 5, N_{q, i})\), so (4.11) holds for \(n \in [2, k)\).

We will show by induction that

$$\begin{aligned} \psi _n - {{\tilde{\psi }}}_{n+2} \le \frac{2c_1n}{i^3} \end{aligned}$$
(4.18)

for every \(n \in [2,k]\). For \(n = 2\), (4.18) holds due to (4.12). Suppose it holds for some \(n\in [2, k)\). By (4.11), we have

$$\begin{aligned} \psi _{n+1} - {{\tilde{\psi }}}_{n+3} = (\psi _n - {{\tilde{\psi }}}_{n+2}) (1 + \varepsilon (\psi _n + {{\tilde{\psi }}}_{n+2})) + \sigma _n^2 - {{\tilde{\sigma }}}_{n+2}^2, \end{aligned}$$

where by (4.14) and the definition of k, \(\psi _n + {{\tilde{\psi }}}_{n+2} \le \psi _n + \psi _{n+4}< 2\psi _{n+4} < 2b/i\), so using (4.16), (4.17) and the inductive assumption (4.18), we obtain

$$\begin{aligned} \psi _{n+1} - {{\tilde{\psi }}}_{n+3} \le \frac{2c_1 n}{i^3} \left( 1 + \frac{2\varepsilon b}{i}\right) + \frac{c_1}{i^3} \le \left( 2n + \frac{4\varepsilon b N_{q, i}}{i} + 1\right) \frac{c_1}{i^3} < \frac{(2n + c_2 b + 1)c_1}{i^3} \end{aligned}$$

for some constant \(c_2 > 0\). Choosing the constant b in the definition of k sufficiently small, we can assume \(c_2 b< 1\), which gives

$$\begin{aligned} \psi _{n+1} - {{\tilde{\psi }}}_{n+3} \le \frac{2c_1(n + 1)}{i^3}. \end{aligned}$$

This completes the inductive step and proves (4.18).

By (4.16) and (4.18),

$$\begin{aligned} {{\tilde{\psi }}}_{k+2} \ge \psi _k - \frac{c_3}{i^2} \end{aligned}$$
(4.19)

for a constant \(c_3 > 0\), while (by the definition of k),

$$\begin{aligned} \psi _{k+5} \ge \frac{b}{i} \end{aligned}$$
(4.20)

and, by the definition of k together with (4.11) and (4.17),

$$\begin{aligned} \psi _{k+5} = \psi _k + \varepsilon (\psi _k^2 + \cdots + \psi _{k+4}^2) + \sigma _k^2 + \cdots + \sigma _{k+4}^2 <\psi _k + \frac{5(\varepsilon b^2 + c_1)}{i^2}. \end{aligned}$$
(4.21)

Using (4.19), (4.20) and (4.21), we obtain

$$\begin{aligned} {{\tilde{\psi }}}_{k+2} \ge \frac{b}{i} - \frac{5(\varepsilon b^2 + c_1)+c_3}{i^2}\ge \frac{b}{2i} \end{aligned}$$
(4.22)

for large i.

Take a large constant \(M > 0\). We will show inductively

$$\begin{aligned} {{\tilde{\psi }}}_{n+M} - \psi _n \ge \frac{M\varepsilon b^2}{5i^2} \end{aligned}$$
(4.23)

for \(n \in [k, N_{p, i}]\). By (4.11), (4.19) and (4.22), we have

$$\begin{aligned} {{\tilde{\psi }}}_{k+M}&\ge {{\tilde{\psi }}}_{k+2} + \varepsilon ({{\tilde{\psi }}}_{k+2}^2 + \cdots +{{\tilde{\psi }}}_{k+M-1}^2) \ge {{\tilde{\psi }}}_{k+2} + (M-2) \varepsilon {{\tilde{\psi }}}_{k+2}^2 \\&\ge {{\tilde{\psi }}}_{k+2} + \frac{(M-2) \varepsilon b^2}{4i^2} \ge \psi _k - \frac{c_3}{i^2}+ \frac{(M-2) \varepsilon b^2}{4i^2} \ge \psi _k + \frac{M\varepsilon b^2}{5i^2}, \end{aligned}$$

if M is chosen sufficiently large, so (4.23) holds for \(n = k\). Suppose (4.23) holds for some \(n \in [k, N_{p, i})\). Now (4.15) implies that (4.11) is valid for n, so

$$\begin{aligned} {{\tilde{\psi }}}_{n+1+M} - \psi _{n+1} = ({{\tilde{\psi }}}_{n+M} - \psi _n) (1 + \varepsilon ({{\tilde{\psi }}}_{n+M} + \psi _n)) + {{\tilde{\sigma }}}_{n+M}^2 -\sigma _n^2, \end{aligned}$$

where

$$\begin{aligned} {{\tilde{\psi }}}_{n+M} + \psi _n> {{\tilde{\psi }}}_{k+M} + \psi _k> {{\tilde{\psi }}}_{k+2} \end{aligned}$$

for large i by (4.20) and (4.21) (which imply \(\psi _k > 0\)), while

$$\begin{aligned} {{\tilde{\sigma }}}_{n+M}^2 -\sigma _n^2 > - \frac{c_4}{i^3} \end{aligned}$$

for a constant \(c_4 > 0\) by (4.16) and Lemma 4.6 (with estimates analogous to the ones in (4.17)). Hence, using (4.22) we obtain

$$\begin{aligned} {{\tilde{\psi }}}_{n+1+M} - \psi _{n+1} \ge \frac{M\varepsilon b^2}{5i^2}(1 + \varepsilon {{\tilde{\psi }}}_{k+2}) - \frac{c_4}{i^3} \ge \frac{M\varepsilon b^2}{5i^2} \left( 1 + \frac{\varepsilon b}{2i}\right) - \frac{c_4}{i^3} \ge \frac{M\varepsilon b^2}{5i^2}, \end{aligned}$$

provided M is chosen sufficiently large. This ends the inductive step and proves (4.23).

By (4.23),

$$\begin{aligned} {{\tilde{\psi }}}_{N_{p, i}+M} \ge \psi _{N_{p, i}} \ge \delta , \end{aligned}$$

so

$$\begin{aligned} N_{q, i} \le N_{p, i} + M. \end{aligned}$$

This and (4.15) end the proof of the lemma. \(\quad \square \)

Remark 4.11

Proving Lemmas 4.8–4.10, we have made the calculations for the initial point \((x, y) = (r_0 \cos \varphi _0, r_0 \sin \varphi _0)\) assuming \(r_0 \in (0,1)\). In fact, the case \(r_0 > 1\) can be treated analogously. This can be seen by noting that \(\Phi \) is symmetric with respect to r around the circle \(r=1\), while the only properties of R used in the proofs of the lemmas are the ones stated in Lemma 4.4. As the initial terms of the Taylor expansion of R near \(r=1\) are symmetric around 1, we see that an analogue of Lemma 4.6 holds in the case \(r_0 > 1\), and the proofs of Lemmas 4.8–4.10 can be repeated in that case. We conclude that Lemmas 4.8–4.10 hold for every initial point \((x, y) \in {\mathbb {S}}^2 \setminus (S \cup \{(0,0), \infty \})\).

We summarize the results of this subsection in the following proposition.

Proposition 4.12

For every \((x, y) \in {\mathbb {S}}^2 \setminus (S \cup \{(0,0), \infty \})\) and every \(\delta > 0\), if \(N_{p, i}(x, y)\) (resp. \(N_{q, i}(x, y))\) is the time spent by the trajectory of (x, y) under f during its i-th visit in the \(\delta \)-neighbourhood \(U_p\) of p (resp. \(U_q\) of q), defined in (4.2), then

$$\begin{aligned} N_{p, i}(x, y) \asymp N_{q, i}(x, y) \asymp i \end{aligned}$$

and

$$\begin{aligned} |N_{p, i}(x, y) - N_{q, i}(x, y)| \le M \end{aligned}$$

for some constant \(M>0\), while the times spent by the trajectory between consecutive visits in \(U_p \cup U_q\) are uniformly bounded.
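Proposition 4.12 can be illustrated numerically with the hypothetical orbit sketch from Sect. 4.2 (the thresholds below are ours): the run lengths of consecutive iterates inside \(U_p\) and \(U_q\) should grow roughly linearly in the visit number i, with a bounded difference.

```python
# visit times to U_p and U_q along the sampled trajectory (rs, phis)
DELTA = 0.1
ang = np.mod(phis + np.pi, 2 * np.pi) - np.pi   # angle reduced to [-pi, pi)
near_S = np.abs(1 - rs) < DELTA
in_p = near_S & (np.abs(ang) < DELTA)           # U_p: angle near 0
in_q = near_S & (np.pi - np.abs(ang) < DELTA)   # U_q: angle near pi

def visit_lengths(mask):
    """Lengths of maximal runs of True values in a boolean array."""
    runs, count = [], 0
    for m in mask:
        if m:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    return runs

Np, Nq = visit_lengths(in_p), visit_lengths(in_q)
for i in (5, 10, min(len(Np), len(Nq)) - 1):  # indices assume enough visits
    print(i, Np[i], Nq[i], Np[i] - Nq[i])     # N ~ i, difference bounded
```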

Proposition 4.12 implies the following.

Corollary 4.13

For every \((x, y) \in {\mathbb {S}}^2 \setminus (S \cup \{(0,0), \infty \})\),

$$\begin{aligned} \lim \limits _{m \rightarrow \infty } \frac{1}{m} \sum \limits _{n=0}^{m-1} \delta _{f^n(x, y)} = \frac{1}{2} \delta _p + \frac{1}{2} \delta _q \end{aligned}$$

in the sense of weak-\(^*\) convergence.

Proof

Fix \((x, y) \in {\mathbb {S}}^2 \setminus (S \cup \{(0,0), \infty \})\) and \(\delta >0\). It is sufficient to prove that for the \(\delta \)-neighbourhoods \(U_p\) and \(U_q\), defined in (4.2), one has

$$\begin{aligned} \lim \limits _{m \rightarrow \infty } \frac{1}{m} \sum \limits _{n=0}^{m-1} \mathbbm {1}_{U_p} \big (f^n(x , y) \big ) = \lim \limits _{m \rightarrow \infty } \frac{1}{m} \sum \limits _{n=0}^{m-1} \mathbbm {1}_{U_q} \big (f^n(x , y)\big ) = \frac{1}{2}. \end{aligned}$$

Fix \(m \in {\mathbb {N}}\) and let \(i = i(m)\) be the number of visits of (x, y) to \(U_p\) started up to time m, i.e. let i be the unique number such that

$$\begin{aligned} n^-_{p, i} \le m < n^-_{p,i+1}. \end{aligned}$$

Then by Proposition 4.12, there exists a constant \(c>0\) (independent of m) such that

$$\begin{aligned} \frac{i^2}{c}\le \sum \limits _{n=0}^{m-1} \mathbbm {1}_{U_p}\big (f^n(x , y)\big ) \le c i^2, \qquad \frac{i^2}{c}\le \sum \limits _{n=0}^{m-1} \mathbbm {1}_{U_q}\big (f^n(x , y)\big ) \le c i^2, \end{aligned}$$

and

$$\begin{aligned} \sum \limits _{n=0}^{m-1} \mathbbm {1}_{(U_p \cup U_q)^c}\big (f^n(x , y)\big ) \le ci. \end{aligned}$$

This implies

$$\begin{aligned} \frac{2i^2}{c} \le m \le 3ci^2 \end{aligned}$$
(4.24)

provided i is large enough (which holds if m is large enough). Therefore,

$$\begin{aligned} \lim \limits _{m \rightarrow \infty } \frac{1}{m} \sum \limits _{n=0}^{m-1} \mathbbm {1}_{(U_p \cup U_q)^c}\big (f^n(x , y)\big ) = 0 \end{aligned}$$

and hence

$$\begin{aligned} \lim \limits _{m \rightarrow \infty } \bigg ( \frac{1}{m} \sum \limits _{n=0}^{m-1} \mathbbm {1}_{U_p}\big (f^n(x , y)\big ) + \frac{1}{m} \sum \limits _{n=0}^{m-1} \mathbbm {1}_{U_q}\big (f^n(x , y)\big ) \bigg ) = 1. \end{aligned}$$
(4.25)

Proposition 4.12 together with (4.24) implies

$$\begin{aligned} \bigg |\frac{1}{m} \sum \limits _{n=0}^{m-1} \mathbbm {1}_{U_p} \big (f^n(x , y) \big ) - \frac{1}{m} \sum \limits _{n=0}^{m-1} \mathbbm {1}_{U_q} \big (f^n(x , y) \big ) \bigg | \le \frac{C}{i} \end{aligned}$$

for a constant \(C >0\) (independent of m), hence

$$\begin{aligned} \lim \limits _{m \rightarrow \infty }\bigg |\frac{1}{m} \sum \limits _{n=0}^{m-1} \mathbbm {1}_{U_p} \big (f^n(x , y) \big ) - \frac{1}{m} \sum \limits _{n=0}^{m-1} \mathbbm {1}_{U_q} \big (f^n(x , y) \big ) \bigg | = 0. \end{aligned}$$
(4.26)

Combining (4.25) with (4.26) finishes the proof (it is enough to notice that if \(a_n, b_n\) are sequences of real numbers with \(\lim _{n \rightarrow \infty } (a_n + b_n) = 1\) and \(\lim _{n \rightarrow \infty } |a_n - b_n| = 0\), then \(\lim _{n \rightarrow \infty } a_n = \lim _{n \rightarrow \infty } b_n = \frac{1}{2}\)). \(\quad \square \)
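The same hypothetical simulation illustrates Corollary 4.13 empirically; by (4.24) and the bounds above, the error should decay roughly like \(1/\sqrt{m}\), so convergence is slow.

```python
# fractions of time spent in U_p and U_q up to time m; both should approach 1/2
for m in (10**3, 10**4, 2 * 10**5):
    print(m, in_p[:m].mean(), in_q[:m].mean())
```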

4.3 Construction of the diffeomorphism \(T:{\mathbb {S}}^2 \times {\mathbb {S}}^1 \rightarrow {\mathbb {S}}^2 \times {\mathbb {S}}^1\)

Let

$$\begin{aligned} X = {\mathbb {S}}^2 \times {\mathbb {S}}^1, \end{aligned}$$

where \({\mathbb {S}}^2\simeq {\mathbb {R}}^2 \cup \{\infty \}\) and \({\mathbb {S}}^1 \simeq {\mathbb {R}}/ {\mathbb {Z}}\). We can assume \(X \subset {\mathbb {R}}^N\) for some \(N \in {\mathbb {N}}\). Let

$$\begin{aligned} R_\alpha :{\mathbb {S}}^1 \rightarrow {\mathbb {S}}^1, \qquad R_\alpha (t) = t + \alpha \mod 1, \qquad \alpha \in {\mathbb {R}}\setminus {\mathbb {Q}}\end{aligned}$$

be an irrational rotation. Recall that the normalized Lebesgue measure on \({\mathbb {S}}^1\) is the unique \(R_\alpha \)-invariant Borel probability measure. Let

$$\begin{aligned} g:{\mathbb {S}}^1 \rightarrow {\mathbb {S}}^1, \qquad g(t) = t + \frac{1}{100}\sin ^2(\pi t) \mod 1. \end{aligned}$$

Note that g is a \(C^{\infty }\)-diffeomorphism of \({\mathbb {S}}^1\) with 0 as the unique fixed point. Moreover, \(\lim _{n \rightarrow \infty } g^n(t) = 0\) for every \(t \in {\mathbb {S}}^1\). Therefore, \(\delta _0\) is the unique g-invariant Borel probability measure. Let \(f :{\mathbb {S}}^2 \rightarrow {\mathbb {S}}^2\) be the diffeomorphism defined in Sect. 4.2, with the invariant unit circle \(S \subset {\mathbb {S}}^2\) and the fixed points \(p, q \in S\). Fix a small \(\delta >0\) and consider the \(\delta \)-neighbourhoods \(U_p, U_q \subset {\mathbb {S}}^2\) of p and q, respectively, defined in (4.2). Let

$$\begin{aligned} T :X \rightarrow X, \qquad T(z,t) = (f(z), h_{z}(t)), \qquad z \in {\mathbb {S}}^2,\; t \in {\mathbb {S}}^1, \end{aligned}$$

where \(h_z\) are diffeomorphisms of \({\mathbb {S}}^1\) depending smoothly on \(z \in {\mathbb {S}}^2\), such that \(h_z = g\) for \(z \in U_p\), \(h_z = R_\alpha \) for \(z \in U_q\), and for z outside \(U_p \cup U_q\), \(h_z\) is defined in any manner which makes T a \(C^{\infty }\)-diffeomorphism of X.
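Here is a minimal sketch of one step of T, reusing the planar sketch from Sect. 4.2 (again ours, with crude choices: outside \(U_p \cup U_q\) we simply leave the fibre coordinate fixed, which is only a stand-in for the interpolating diffeomorphisms \(h_z\)):

```python
ALPHA = (np.sqrt(5) - 1) / 2        # an irrational rotation number

def g(t):
    # circle diffeomorphism with unique fixed point 0, as in the text
    return (t + np.sin(np.pi * t)**2 / 100) % 1.0

def T_step(r, phi, t):
    # one step of the skew product T(z, t) = (f(z), h_z(t)) in polar coordinates
    ang = np.mod(phi + np.pi, 2 * np.pi) - np.pi
    if abs(1 - r) < DELTA and abs(ang) < DELTA:            # z in U_p
        t = g(t)
    elif abs(1 - r) < DELTA and np.pi - abs(ang) < DELTA:  # z in U_q
        t = (t + ALPHA) % 1.0                              # rotation R_alpha
    # else: fibre left unchanged -- a crude stand-in for the smooth h_z
    return R(r), Phi(r, phi), t
```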

In view of Corollary 4.3, to conclude the proof of Theorem 1.11, it is sufficient to show the following.

Theorem 4.14

The map T has an attractor

$$\begin{aligned} \Lambda = S \times {\mathbb {S}}^1 \end{aligned}$$

with the basin \(B(\Lambda ) = ({\mathbb {S}}^2 \setminus \{(0,0), \infty \}) \times {\mathbb {S}}^1\) and natural measure

$$\begin{aligned} \mu = \frac{1}{2} \delta _{p_0} + \frac{1}{2} {{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}, \end{aligned}$$

where \(p_0 = (p,0)\) and \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\) is the Lebesgue measure on the circle \(\{q\} \times {\mathbb {S}}^1\).

Before proving Theorem 4.14 we show the following lemma.

Lemma 4.15

Let \(T :X \rightarrow X\) be a continuous transformation of a compact metric space. Let \(\nu _n,\ n \ge 0\), be a sequence of Borel probability measures on X and let \({\mathcal {A}} \subset {\mathbb {N}}\cup \{0\}\) be a set of asymptotic density zero, i.e.

$$\begin{aligned} \lim \limits _{m \rightarrow \infty } \frac{1}{m} \#\{ 0 \le n < m : n\in {\mathcal {A}} \} = 0. \end{aligned}$$

Assume \(\nu _{n+1} = T_*\nu _n\) for \(n \notin {\mathcal {A}}\). Then any weak-\(^*\) limit point of the sequence

$$\begin{aligned} \frac{1}{m} \sum \limits _{n=0}^{m-1} \nu _n \end{aligned}$$

is T-invariant.

Proof

Let \(\nu \) be a weak-\(^*\) limit of a sequence \(\frac{1}{m_k} \sum \limits _{n=0}^{m_k-1} \nu _n\) for some sequence \(m_k \nearrow \infty \). Then

$$\begin{aligned} T_*\nu - \nu = \lim \limits _{k \rightarrow \infty } \frac{1}{m_k} \sum \limits _{n=0}^{m_k-1} (T_*\nu _n - \nu _n) \end{aligned}$$
(4.27)

and we will prove

$$\begin{aligned} \lim \limits _{k \rightarrow \infty } \Big \Vert \frac{1}{m_k} \sum \limits _{n=0}^{m_k-1} (T_*\nu _{n} - \nu _n)\mathbbm {1}_{{\mathcal {A}}}(n) \Big \Vert =0 \end{aligned}$$
(4.28)

and

$$\begin{aligned} \lim \limits _{k \rightarrow \infty } \Big \Vert \frac{1}{m_k} \sum \limits _{n=0}^{m_k-1} (T_*\nu _{n} - \nu _n)\mathbbm {1}_{{\mathcal {A}}^c}(n) \Big \Vert = \lim \limits _{k \rightarrow \infty } \Big \Vert \frac{1}{m_k} \sum \limits _{n=0}^{m_k-1} (\nu _{n+1} - \nu _n)\mathbbm {1}_{{\mathcal {A}}^c}(n) \Big \Vert = 0 \end{aligned}$$
(4.29)

where \(\Vert \cdot \Vert \) stands for the total variation norm. Due to (4.27), this will imply \(T_* \nu = \nu \). For (4.28), we have

$$\begin{aligned} \lim \limits _{k \rightarrow \infty } \Big \Vert \frac{1}{m_k} \sum \limits _{n=0}^{m_k-1} (T_*\nu _{n} - \nu _n)\mathbbm {1}_{{\mathcal {A}}}(n) \Big \Vert \le \lim \limits _{k \rightarrow \infty } \frac{2}{m_k} \sum \limits _{n=0}^{m_k-1} \mathbbm {1}_{{\mathcal {A}}}(n) = 0, \end{aligned}$$

as the asymptotic density of \({\mathcal {A}}\) is zero and all \(\nu _n\) and \(T_*\nu _n\) are probability measures. For (4.29), observe that the first equality follows from the assumption \(\nu _{n+1} = T_*\nu _n\) for \(n \notin {\mathcal {A}}\). Moreover, for a given \(n \in \{0, \ldots , m_{k}-2\}\), if both n and \(n+1\) are in \({\mathcal {A}}^c\), then \(\nu _{n+1}\) cancels out in the sum \(\sum _{n=0}^{m_k-1} (\nu _{n+1} - \nu _n)\mathbbm {1}_{{\mathcal {A}}^c}(n)\), and otherwise it appears in the sum at most once (possibly with a negative sign). The terms \(\nu _0\) and \(\nu _{m_k}\) appear at most once. Therefore,

$$\begin{aligned}&\lim \limits _{k \rightarrow \infty } \Big \Vert \frac{1}{m_k} \sum \limits _{n=0}^{m_k-1} (\nu _{n+1} - \nu _n)\mathbbm {1}_{{\mathcal {A}}^c}(n) \Big \Vert \\&\quad \le \lim \limits _{k \rightarrow \infty } \frac{1}{m_k} \Big ( \Vert \nu _{m_k} \Vert + \Vert \nu _0 \Vert + \sum \limits _{n=0}^{m_k-2} \Vert \nu _{n+1} \Vert \big (1- \mathbbm {1}_{{\mathcal {A}}^c}(n)\mathbbm {1}_{{\mathcal {A}}^c}(n+1)\big ) \Big )\\&\quad = \lim \limits _{k \rightarrow \infty } \frac{1}{m_k}\Big (2+ \sum \limits _{n=0}^{m_{k}-2} \big (1- \mathbbm {1}_{{\mathcal {A}}^c}(n)\mathbbm {1}_{{\mathcal {A}}^c}(n+1)\big )\Big )\\&\quad \le \lim \limits _{k \rightarrow \infty } \frac{1}{m_k} \Big ( 2 + \sum \limits _{n=0}^{m_{k}-2} \big (\mathbbm {1}_{{\mathcal {A}}}(n) + \mathbbm {1}_{{\mathcal {A}}}(n+1)\big ) \Big ) = 0. \end{aligned}$$

\(\square \)

Let us proceed now with the proof of Theorem 4.14.

Proof of Theorem 4.14

By the construction of f, the set \(\Lambda \) is a compact T-invariant set, and for every \((z,t) \in ({\mathbb {S}}^2 \setminus \{(0,0), \infty \}) \times {\mathbb {S}}^1\), we have \({{\,\mathrm{dist}\,}}(T^n(z,t), \Lambda ) \rightarrow 0\) as \(n \rightarrow \infty \). Hence, \(\Lambda \) is an attractor for T with the basin \(B(\Lambda ) = ({\mathbb {S}}^2 \setminus \{(0,0), \infty \}) \times {\mathbb {S}}^1\). To prove that \(\mu \) is a natural measure for T, we show that the sequence of measures

$$\begin{aligned} \mu _m = \frac{1}{m}\sum \limits _{n=0}^{m-1} \delta _{T^n (z,t)} \end{aligned}$$

converges to \(\mu \) in the weak-\(^*\) topology for every \((z,t) \in ({\mathbb {S}}^2 \setminus (S \cup \{(0,0), \infty \})) \times {\mathbb {S}}^1\). It is enough to prove that every limit point of the sequence \(\mu _m\) is equal to \(\mu \). It follows from Corollary 4.13 that every such limit point must be of the form \(\nu _1/2 + \nu _2/2\), where \(\nu _1\) is a probability measure on the circle \(\{p\} \times {\mathbb {S}}^1\) and \(\nu _2\) is a probability measure on the circle \(\{q\} \times {\mathbb {S}}^1\). Our goal is to show that \(\nu _1 = \delta _{(p, 0)}\) and \(\nu _2 = {{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\), where \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\) is the Lebesgue measure on \(\{q\} \times {\mathbb {S}}^1\).

Take \(m_k \nearrow \infty \) such that \(\lim \limits _{k \rightarrow \infty } \mu _{m_k} = \nu _1/2 + \nu _2/2\). Let

$$\begin{aligned} \vartheta _{p, k} = \frac{1}{m_k}\sum \limits _{n=0}^{m_k - 1} \mathbbm {1}_{U_p}(f^n(z)) \,\delta _{T^n (z,t)},\qquad \vartheta _{q, k} = \frac{1}{m_k}\sum \limits _{n=0}^{m_k - 1} \mathbbm {1}_{U_q}(f^n(z))\, \delta _{T^n (z,t)} \end{aligned}$$

and

$$\begin{aligned} \vartheta _{O, k} = \frac{1}{m_k}\sum \limits _{n=0}^{m_k - 1} \mathbbm {1}_{{\mathbb {S}}^2 \setminus (S \cup \{(0,0), \infty \} \cup U_p \cup U_q)}(f^n(z)) \, \delta _{T^n (z,t)}. \end{aligned}$$

Clearly,

$$\begin{aligned} \mu _{m_k} = \vartheta _{p, k} + \vartheta _{q, k} + \vartheta _{O, k}. \end{aligned}$$

By Corollary 4.13,

$$\begin{aligned} \lim \limits _{k \rightarrow \infty } \vartheta _{p, k} = \frac{1}{2} \nu _1,\ \lim \limits _{k \rightarrow \infty } \vartheta _{q, k} = \frac{1}{2} \nu _2\ \text { and }\ \lim \limits _{k \rightarrow \infty } \vartheta _{O, k} = 0. \end{aligned}$$

Let

$$\begin{aligned} \pi :X \rightarrow {\mathbb {S}}^1, \qquad \pi (z,t) = t \end{aligned}$$

be the projection. As \({{\,\mathrm{supp}\,}}\nu _1 \subset \{ p \} \times {\mathbb {S}}^1\) and \({{\,\mathrm{supp}\,}}\nu _2 \subset \{q\} \times {\mathbb {S}}^1\), and g, \(R_\alpha \) are uniquely ergodic with invariant measures \(\delta _0\) and \({{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}\), respectively, it is enough to show that the limits of the projected measures \(\pi _*\vartheta _{p, k}\) and \(\pi _*\vartheta _{q, k}\) are, respectively, g- and \(R_\alpha \)-invariant.

We have

$$\begin{aligned} \pi _*\vartheta _{p, k} = \frac{1}{m_k}\sum \limits _{n=0}^{m_k - 1} \mathbbm {1}_{U_p}(f^n(z)) \, \delta _{\pi (T^n(z,t))}. \end{aligned}$$

Let

$$\begin{aligned} M_k = \sum \limits _{n=0}^{m_k - 1} \mathbbm {1}_{U_p}(f^n(z)) \end{aligned}$$

be the number of iterates \(f^n(z)\) which are in \(U_p\) up to time \(m_k -1\) and let \((z_0, t_0), (z_1, t_1), \ldots \) be consecutive elements of the trajectory \(\{T^n(z,t)\}_{n=0}^{\infty }\), such that \((z_j, t_j) \in U_p \times {\mathbb {S}}^1\). Then

$$\begin{aligned} \pi _*\vartheta _{p, k} = \frac{1}{m_k}\sum \limits _{j=0}^{M_k-1} \delta _{t_j}. \end{aligned}$$

Note that if \(f(z_j) \in U_p\), then \(t_{j+1} = g(t_j)\), so \(\delta _{t_{j+1}} = g_* \delta _{t_j}\). Let \({\mathcal {A}} = \{ j \in {\mathbb {N}}: f(z_j) \notin U_{p} \}\). By Proposition 4.12, the set \({\mathcal {A}}\) has asymptotic density zero, as the time spent in \(U_p\) by the trajectory of z under f during its i-th visit grows linearly with i, while during each visit only the last iterate satisfies \(f(z_j) \notin U_p\). We can therefore apply Lemma 4.15 to conclude that every weak-\(^*\) limit point of the sequence \(\frac{1}{M_k}\sum _{j=0}^{M_k-1} \delta _{t_j}\) is a g-invariant probability measure; since \(\delta _0\) is the unique such measure, we get

$$\begin{aligned} \lim \limits _{k \rightarrow \infty } \frac{1}{M_k}\sum \limits _{j=0}^{M_k-1} \delta _{t_j} = \delta _{0}. \end{aligned}$$

On the other hand, Corollary 4.13 implies \(\lim _{k \rightarrow \infty } \frac{M_k}{m_k} = \frac{1}{2}\), so

$$\begin{aligned} \lim \limits _{k \rightarrow \infty } \pi _*\vartheta _{p, k} = \frac{1}{2} \delta _0. \end{aligned}$$

By the same arguments we show

$$\begin{aligned}\lim \limits _{k \rightarrow \infty } \pi _*\vartheta _{q, k} = \frac{1}{2} {{\,\mathrm{Leb}\,}}_{{\mathbb {S}}^1}.\end{aligned}$$

Therefore, \(\mu _m\) converges to \(\mu \) in the weak-\(^*\) topology and \(\mu \) is a natural measure for T. \(\square \)
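The statement can be illustrated with the sketch of T given after its definition: along one orbit, the fibre coordinates collected over \(U_p\) should pile up near \(t = 0\), while those collected over \(U_q\) should equidistribute on the circle (hypothetical code, same caveats as before).

```python
# empirical fibre distributions over U_p and U_q along one orbit of T
r, phi, t = 0.9, 0.3, 0.123
tp, tq = [], []
for _ in range(500_000):
    ang = np.mod(phi + np.pi, 2 * np.pi) - np.pi
    if abs(1 - r) < DELTA and abs(ang) < DELTA:
        tp.append(t)
    elif abs(1 - r) < DELTA and np.pi - abs(ang) < DELTA:
        tq.append(t)
    r, phi, t = T_step(r, phi, t)
tp, tq = np.array(tp), np.array(tq)
print("mean circle distance of t to 0 over U_p:", np.mean(np.minimum(tp, 1 - tp)))
print("quartiles of t over U_q:", np.percentile(tq, [25, 50, 75]))
```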

Remark 4.16

To obtain a counterexample to the SSOY predictability conjecture in its original formulation, one can also perform a similar construction on the manifold with boundary \({\mathbb {B}}\times {\mathbb {S}}^1\), where \({\mathbb {B}}\) is a closed 2-dimensional disc. Namely, it is enough to replace the diffeomorphism f of \({\mathbb {S}}^2\) constructed in Sect. 4.2 with a diffeomorphism of \({\mathbb {B}}\) which is a suitable modification of the 'Bowen eye' example described e.g. in [Cat14, Example 5.2.(B)], with properties similar to f.