1 Introduction

Let \(T :M \rightarrow M\) be a nonuniformly hyperbolic dynamical system in the sense of Young [32]. Notable examples of such systems are Axiom A or Hénon-type attractors and various chaotic billiards. Suppose that T preserves an ergodic physical (Sinai–Ruelle–Bowen) probability measure m. Let \(\varphi :M \rightarrow {\mathbb {R}}\) be a Hölder observable with \(\int \varphi \, dm = 0\) and let \(S_n = \sum _{j =0}^{n-1} \varphi \circ T^j\) be the corresponding Birkhoff sums.

It is common to expect that \(S_n\), considered as a random process on the probability space (M, m), behaves like a Brownian motion. A standard result, proved for a large class of nonuniformly hyperbolic maps under nonrestrictive assumptions, is the Almost Sure Invariance Principle (ASIP), see Melbourne and Nicol [23] or Gouëzel [17]. It holds if, without changing the distribution, the process \((S_n)_{n \ge 0}\) can be defined on a probability space supporting a Brownian motion \((W_t)_{t \ge 0}\) with variance \(c^2\), such that with some \(\delta \in (0, 1/2)\),

$$\begin{aligned} S_n = W_n + o(n^\delta ) \quad \text {almost surely.} \end{aligned}$$
(1.1)

In this case we say that the process \(S_n\) satisfies the ASIP with rate \(o(n^\delta )\) and variance \(c^2\).

The ASIP has a range of useful implications, such as the (functional) central limit theorem or the (functional) law of the iterated logarithm, see Berkes and Philipp [4]. The error rates have received a significant amount of attention, see e.g. our paper [11] for an overview. See also Zaitsev [33] for a historical overview in the case of sums of independent random vectors.

There are natural nonuniformly hyperbolic dynamical systems where the ASIP is expected but is not covered by existing proofs for one reason or another. Prime examples are Bunimovich flowers [7] and Wojtkowski's system of two falling balls [31].

[figure a: the Bunimovich flower billiard and Wojtkowski's system of two falling balls; these two maps are referred to as (1.2) in the text.]

The nonuniformly hyperbolic structure for these maps, under certain natural assumptions, is established by Bálint, Borbély and Varga [1] and Chernov and Zhang [9]. Although the ASIP has not been proven, other statistical properties such as the functional central limit theorem or (iterated) moment bounds are known, see Fleming-Vázquez [15] and Melbourne and Varandas [24].

Another prototypical example is the intermittent baker’s map [24, Example 4.1]. It is defined as a transformation of the unit square \(M = [0,1] \times [0,1]\) by

$$\begin{aligned} T (x,y) = {\left\{ \begin{array}{ll} (g(x), g^{-1}(y)) &{} \quad x \in [0, 1/2] , \\ (2x - 1, (y+1) / 2 ) &{} \quad x \in (1/2, 1] , \end{array}\right. } \end{aligned}$$
(1.3)

where \(g(x) = x ( 1 + 2^\alpha x^\alpha )\) and \(\alpha \in (0,1)\).
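As a concrete illustration (our addition, not part of the original analysis), the map (1.3) is easy to iterate numerically: g is increasing on \([0,1/2]\) with \(g(0) = 0\) and \(g(1/2) = 1\), so \(g^{-1}\) can be evaluated by bisection. The following minimal sketch computes Birkhoff sums along an orbit; the observable and all numerical choices are arbitrary, and for the ASIP one would first center the observable with respect to m.

```python
import numpy as np

ALPHA = 0.3  # intermittency parameter alpha in (0, 1)

def g(x):
    # g(x) = x * (1 + 2^alpha * x^alpha), increasing from [0, 1/2] onto [0, 1]
    return x * (1.0 + (2.0 * x) ** ALPHA)

def g_inv(y, tol=1e-14):
    # invert g on [0, 1/2] by bisection
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < y else (lo, mid)
    return 0.5 * (lo + hi)

def T(x, y):
    # the intermittent baker's map (1.3) on the unit square
    if x <= 0.5:
        return g(x), g_inv(y)
    return 2.0 * x - 1.0, 0.5 * (y + 1.0)

def birkhoff_sum(phi, x, y, n):
    # S_n = sum_{j < n} phi(T^j(x, y))
    s = 0.0
    for _ in range(n):
        s += phi(x, y)
        x, y = T(x, y)
    return s

if __name__ == "__main__":
    phi = lambda x, y: np.cos(2.0 * np.pi * x) + y - 0.5  # arbitrary test observable
    for n in (10**3, 10**4, 10**5):
        print(n, birkhoff_sum(phi, 0.1234, 0.5678, n) / np.sqrt(n))
```

Since \(g'(0) = 1\), orbits linger near the neutral fixed point at \(x = 0\), which is the source of the heavy return time tails quantified in Remark 3.2 below.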

In this paper we prove the ASIP for a class of nonuniformly hyperbolic maps, including (1.2) and (1.3), with rates close to optimal (see Theorem 3.1). Applied to the above examples, our main result is:

Theorem 1.1

Suppose that \(T :M \rightarrow M\) is either (1.3) or one of the maps (1.2) under the assumptions of [1, 9]. Let m be the unique physical invariant probability measure. Let \(v :M \rightarrow {\mathbb {R}}\) be Hölder with \(\int v \, dm = 0\) and \(S_n = \sum _{j < n} v \circ T^j\). Then, for each \({\varepsilon }> 0\),

  (a)

    \(S_n\) satisfies the ASIP with rate \(o(n^{1/3} (\log n)^{4/3 + {\varepsilon }})\) for both maps (1.2). (Possibly, the logarithmic factor can be reduced, see Theorem 3.1 and Remark 3.3.)

  (b)

    \(S_n\) satisfies the ASIP with rate \(o(n^\alpha (\log n)^{\alpha + {\varepsilon }})\) for the map (1.3) when \(\alpha < 1/2\).

Our proofs are based on:

\(\bullet \):

A construction of an extension of T, which is similar to a Young tower in the sense that it is (topologically) Markov with a tower structure and that the semiconjugacy map is Lipschitz, but with the difference that our extension is also Markov from the measure-theoretical point of view (i.e. can be described by a matrix of transition probabilities).

\(\bullet \):

A representation of the process \(( v \circ T^j)_{ j \ge 0}\), without changing its distribution (but on another probability space), as

$$\begin{aligned} \big ( \psi ( \ldots , g_{k-1}, g_k, g_{k+1}, \ldots ) \big )_{k \in {\mathbb Z}} , \end{aligned}$$
(1.4)

where \((g_n)\) is a particular Markov chain and \(\psi \) is a sufficiently regular function of its trajectories. A key tool in the study of the Markov chain is the meeting time which is, informally, the time when two independent copies of the Markov chain first meet. Moments of the meeting time are used to control approximations of \(\psi ( \ldots , g_{j-1}, g_j, g_{j+1}, \ldots )\) by functions of finitely many coordinates.

\(\bullet \):

An adaptation of an argument of Berkes, Liu and Wu [3] which allows us to deal with functions of the whole trajectory of a Markov chain instead of functions of a sequence of independent identically distributed random variables as in [3].

This is a continuation and extension of our previous works on nonuniformly expanding maps [11, 12].

Remark 1.2

As in [11, 12], our proof of the ASIP takes place at the level of the Markov chain (1.4). At the same time, in [11, 12] we only had to deal with functions \(\psi \) depending on future trajectories, i.e. with sums \(\sum _{j=0}^{n-1} \psi (g_j, g_{j+1}, \ldots )\). Working with nonuniformly hyperbolic dynamical systems, we necessarily have to deal with functions of the whole past and future trajectories. This creates problems which go beyond simple modifications of our previous arguments, both in the reduction to the Markov chain (Sect. 2) and in the probabilistic part (Sect. 3). This can be contrasted with the proof of the ASIP in [3], which is written for a one-sided Bernoulli shift rather than for a two-sided one “to simplify the notation”.

Remark 1.3

It is curious that the degenerate case of zero variance is not covered by our general arguments and requires a special and very different treatment (Sect. 3.6).

Remark 1.4

There are prior ideas which may be useful in finding alternative proofs of the ASIP for dynamical systems such as (1.2) and (1.3), including the generalization to vector-valued observables. Notably:

\(\bullet \):

Melbourne and Nicol [22, 23] prove the ASIP for vector-valued Hölder observables on nonuniformly hyperbolic dynamical systems with uniform contraction along stable leaves. Their proofs use the so-called Sinai trick (after Sinai [30] and Bowen [6], see [24, Introduction]), representing a Hölder observable \(v :M \rightarrow {\mathbb {R}}\) as \(v = u + \chi \circ T - \chi \), where \(\chi \) is bounded and u is Hölder and constant on stable leaves. Then one can work with Birkhoff sums of u on a quotient system which is nonuniformly expanding and can in turn be reduced to a Gibbs–Markov map by inducing. For the ASIP, Melbourne and Nicol adapt the results of Philipp and Stout [26] and Kuelbs and Philipp [20]. In the absence of uniform contraction along stable leaves, as is the case for the maps (1.2) and (1.3), the Sinai trick still applies but \(\chi \) and u lose regularity and become challenging to control. Nevertheless, some control is possible. Melbourne and Varandas [24] prove that \(\chi ,u \in L^p\) with some \(p > 1\), and furthermore \(u = m + \psi \circ T - \psi \) where again \(m,\psi \in L^p\) and Birkhoff sums of m are a reverse martingale.

\(\bullet \):

Gouëzel [17] proves the ASIP for vector-valued random variables under a condition of certain exponential memory loss for characteristic functions. This condition does not hold for the maps (1.2) or (1.3), but for instance Gouëzel [17, Theorem 2.4] uses it to prove the ASIP for a class of \(L^p\) observables on Gibbs–Markov maps, and as a corollary on nonuniformly expanding maps via inducing.

However, using the above techniques, the error rates would be \(O(n^{1/4} (\log n)^\alpha )\) for some \(\alpha >0\) at best, whereas our approach is the only currently available one with better rates.

The paper is organized as follows. In Sect. 2 we state the assumptions on the nonuniformly hyperbolic maps and prove that they can be modelled using Markov chains of a particular type. We call such systems Markov shift towers. In Sect. 3 we prove the ASIP for Hölder observables on nonuniformly hyperbolic maps through that on Markov shift towers. Theorem 1.1 is a corollary of the more general Theorem 3.1.

We use the notation \({\mathbb {N}}= \{0,1,\ldots \}\) and \({\mathbb {N}}_0 = \{1,2,\ldots \}\).

2 Nonuniformly Hyperbolic Maps as Markov Shift Towers

In this section we give a standard definition of nonuniformly hyperbolic maps, introduce Markov shift towers and show that every nonuniformly hyperbolic map \(T :M \rightarrow M\) with invariant measure m has an extension \(f :\Delta \rightarrow \Delta \) with invariant measure \({{\,\mathrm{\mathbb {P}}\,}}\) which is a Markov shift tower, and the semiconjugacy \(\pi :\Delta \rightarrow M\) is Lipschitz.

[figure b: the extension \(f :\Delta \rightarrow \Delta \) and the semiconjugacy \(\pi :\Delta \rightarrow M\).]

We also quantify the return time tails of f depending on those of T. This result is stated formally in Theorem 2.4, but first we need a few pages of notation.

2.1 Notation and result

2.1.1 Nonuniformly hyperbolic maps

Let \((M,d_0)\) be a bounded metric space with a Borel probability measure m such that (M, m) is a Lebesgue space. Let \(T :M \rightarrow M\) be measurable for the Borel sigma-field. We assume that T preserves m, is ergodic and is nonuniformly hyperbolic in the following sense:

\(\bullet \):

There is a measurable subset \(Y \subset M\), \(m(Y) >0\), with an at most countable partition \(\alpha \) and a return time \(\tau :Y \rightarrow \{1,2,\ldots \}\) that is constant on each \(a \in \alpha \) with value \(\tau (a)\) and \(T^{\tau (y)}(y) \in Y\) for all \(y \in Y\). We denote by \(F :y \mapsto T^{\tau (y)}(y)\) the induced map. There is an F-invariant probability measure \(\mu \) on Y such that \(\int \tau \, d\mu < \infty \) and which agrees with m in the sense usual for induced maps:

$$\begin{aligned} m = \Bigl ( \int \tau \, d\mu \Bigr )^{-1} \sum _{a \in \alpha } \sum _{k=0}^{\tau (a) - 1} (T^k)_* \mu _a , \end{aligned}$$

where \(\mu _a\) is the restriction of \(\mu \) to a.

\(\bullet \):

Coding of orbits under F by elements of \(\alpha \) is non-pathological in the sense that the set of \((\ldots , a_{-1}, a_0, a_1, \ldots ) \in \alpha ^{\mathbb {Z}}\), for which there exists \((\ldots , y_{-1}, y_0, y_1, \ldots ) \in Y^{\mathbb {Z}}\) such that \(y_n \in a_n\) and \(F(y_n) = y_{n+1}\), is measurable in \(\alpha ^{\mathbb {Z}}\) with the product topology and Borel sigma-algebra (\(\alpha \) is equipped with the discrete topology).

\(\bullet \):

For \(y,z \in Y\) we define the separation time s(yz) as the least \(n \ge 0\) such that \(F^n(y)\) and \(F^n(z)\) belong to different elements of \(\alpha \), or \(\infty \) if such n does not exist. There are constants \(K\ge 1\) and \(\gamma \in (0,1)\) such that:

\(\bullet \):

If \(y,z \in Y\), then for all \(j \ge 0\),

$$\begin{aligned} d_0(F^j(y),F^j(z)) \le K (\gamma ^{s(y,z) - j} + \gamma ^{j}) . \end{aligned}$$
(2.1)
\(\bullet \):

If \(y,z \in a\), \(a \in \alpha \), then for all \(0 \le j \le \tau (a)\),

$$\begin{aligned} d_0(T^j y,T^j z) \le K \bigl [d_0(y,z) + d_0(F(y), F(z)) \bigr ] \end{aligned}$$
(2.2)
\(\bullet \):

There is a partition \({\mathcal {W}}^s\) of Y into “stable leaves”. The stable leaves are invariant under F, meaning that \(F(W^s_y) \subset W^s_{F(y)}\) for all \(y \in Y\), where \(W^s_y\) is the stable leaf containing y. Each \(a \in \alpha \) is a union of stable leaves, i.e. \({\mathcal {W}}^s\) is a refinement of \(\alpha \). Let \({\bar{Y}}= Y / \sim \), where \(y \sim z\) if \(y \in W^s_z\), and let \({\bar{\pi }}:Y \rightarrow {\bar{Y}}\) be the natural projection. Since the stable leaves are invariant under F and \({\mathcal {W}}^s\) is a refinement of \(\alpha \), we obtain a well defined quotient map \({\bar{F}}:{\bar{Y}}\rightarrow {\bar{Y}}\) with a partition \({\bar{\alpha }}\) of \({\bar{Y}}\) and separation time s. We suppose that the probability measure \({\bar{\mu }}= {\bar{\pi }}_* \mu \) on \({\bar{Y}}\) is invariant under \({\bar{F}}\). Moreover, for each \(a \in {\bar{\alpha }}\), the restriction \({\bar{F}}:a \rightarrow {\bar{Y}}\) is a bijection (modulo sets of \({\bar{\mu }}\)-measure zero), and its inverse Jacobian \( \displaystyle \zeta _a= \frac{d{\bar{\mu }}}{d{\bar{\mu }}\circ {\bar{F}}}\) satisfies

$$\begin{aligned} \bigl |\log \zeta _a(y)-\log \zeta _a(z)\bigr | \le K\gamma ^{s(y,z)} \quad \text {for all } y,z \in a . \end{aligned}$$

In other words, the quotient map \({\bar{F}}\) is full branch Gibbs–Markov.

2.1.2 Markov shift towers

A closely related class of nonuniformly hyperbolic maps is what we call a Markov shift tower. The main difference is that the induced map, denoted by \(f_X\) below, is a Bernoulli shift, both topologically and measure-theoretically.

We say that \(f :\Delta \rightarrow \Delta \) is a Markov shift tower if it has the following structure.

\(\bullet \):

We are given a finite or countable probability space \(({\mathcal {A}}, {{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}})\), an integrable function \(h_{\mathcal {A}}:{\mathcal {A}}\rightarrow \{1,2,\ldots \}\) and a constant \(\xi \in (0, 1)\). These define the rest of the construction.

\(\bullet \):

Let \((X, {{\,\mathrm{\mathbb {P}}\,}}_X) = ({\mathcal {A}}^{\mathbb {Z}}, {{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}^{\mathbb {Z}})\) be the product probability space and let \(f_X :X \rightarrow X\) be the left shift

$$\begin{aligned} f_X(\ldots , a_{-1}, a_0, a_1, \ldots ) = (\ldots , a_0, a_1, a_2, \ldots ) . \end{aligned}$$

Define \(h :X \rightarrow \{1,2,\ldots \}\) by \(h(\ldots , a_{-1}, a_0, a_1, \ldots ) = h_{\mathcal {A}}(a_0)\).

\(\bullet \):

The map \(f :\Delta \rightarrow \Delta \) is a suspension over \(f_X :X \rightarrow X\) with a roof function h, i.e.

$$\begin{aligned} \begin{aligned} \Delta&= \{(x,\ell ) \in X \times {\mathbb {Z}}: 0 \le \ell< h(x)\} \\ f(x,\ell )&= {\left\{ \begin{array}{ll} (x, \ell + 1), &{} \quad \ell < h(x) - 1 \\ (f_X (x), 0), &{} \quad \ell = h(x) - 1 \end{array}\right. } . \end{aligned} \end{aligned}$$
(2.3)
\(\bullet \):

Let \({{\,\mathrm{\mathbb {P}}\,}}\) be the probability measure on \(\Delta \) defined by

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}(A \times \{\ell \}) = \biggl ( \int h \, d {{\,\mathrm{\mathbb {P}}\,}}_X \biggr )^{-1} {{\,\mathrm{\mathbb {P}}\,}}_X(A) \quad \text {for all } \ell \ge 0 \text { and } A \subset \{x \in X : h(x) \ge \ell + 1\} . \end{aligned}$$

Since \({{\,\mathrm{\mathbb {P}}\,}}_X\) is \(f_X\)-invariant, \({{\,\mathrm{\mathbb {P}}\,}}\) is \(f\)-invariant.

\(\bullet \):

Define a distance \(d\) on \(X\) by \(d(x,y) = \xi ^{s(x,y)}\), where \(s :X \times X \rightarrow \{0,1,\ldots \}\) is the separation time,

$$\begin{aligned} s((\ldots , a_{-1}, a_0, a_1, \ldots ), (\ldots , b_{-1}, b_0, b_1, \ldots ) ) = \inf \{ j \ge 0 : a_j \ne b_j \text { or } a_{-j} \ne b_{-j} \} . \end{aligned}$$

Let \(d\) also denote the related distance on \(\Delta \):

$$\begin{aligned} d((x,k),(y,j)) = {\left\{ \begin{array}{ll} 1, &{} \quad k \ne j \\ d(x,y), &{} \quad k=j \end{array}\right. }. \end{aligned}$$
(2.4)

Remark 2.1

As a part of the definition of a Markov shift tower, \(f :\Delta \rightarrow \Delta \) is Markov both topologically (as a suspension over a Bernoulli shift, it has a natural Markov partition) and probabilistically (the invariant measure \({{\,\mathrm{\mathbb {P}}\,}}\) is Markov, it can be described by transition probabilities). Also \(\Delta \) is a metric space with metric d.

Remark 2.2

We used similar Markov shift towers in our previous works [11, 12, 19]. The main difference is that they were one-sided: the base map was the left shift \(f_X :X \rightarrow X\) on \(X = {\mathcal {A}}^{\mathbb {N}}\) instead of \(X = {\mathcal {A}}^{\mathbb {Z}}\) here. A two-sided Markov shift tower is the natural extension [28, 29] of the corresponding one-sided tower.

Remark 2.3

An important characteristic of both nonuniformly hyperbolic maps and Markov shift towers is the tail of the return times, i.e. the asymptotics of \(\mu (\tau \ge n)\) and \({{\,\mathrm{\mathbb {P}}\,}}_X(h \ge n)\) respectively.

2.1.3 Main result

Theorem 2.4

Suppose that \(T :M \rightarrow M\) is a nonuniformly hyperbolic map as in Sect. 2.1.1. Then there exists a Markov shift tower \(f :\Delta \rightarrow \Delta \) and a map \(\pi :\Delta \rightarrow M\), defined \({{\,\mathrm{\mathbb {P}}\,}}\)-almost surely on \(\Delta \), such that:

  • \(\pi \) is a semiconjugacy, i.e. \(\pi \circ f = T \circ \pi \),

  • \(\pi \) is measure preserving, i.e. \(\pi _* {{\,\mathrm{\mathbb {P}}\,}}= m\),

  • \(\pi \) is Lipschitz.

Moreover, the return time tails of f are closely related to those of T:

  • if \(\mu (\tau \ge n) = O(n^{-\beta } L(n) )\) with \(\beta > 0\) and L a slowly varying function at infinity, then \({{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}(h_{\mathcal {A}}\ge n) = O(n^{-\beta } L(n) )\);

  • if \(\int \tau ^\beta \, d\mu < \infty \) with \(\beta > 0\), then \(\int h_{\mathcal {A}}^\beta \, d{{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}< \infty \);

  • if \(\int e^{\beta \tau ^\delta } \, d\mu < \infty \) with \(\beta > 0\) and \(\delta \in (0,1]\), then \(\int e^{\beta ' h_{\mathcal {A}}^\delta } \, d{{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}< \infty \) for some \(0 < \beta ' \le \beta \).

2.2 Proof of Theorem 2.4

Our strategy is to reduce the proof to an application of  [19, Theorem 3.4]. There, a one-sided Markov shift tower is constructed for a nonuniformly expanding map. Then T can be considered via the natural extension of such a map, and the corresponding natural extension of the one-sided Markov shift in [19] is a two-sided Markov shift, as required for Theorem 2.4.

Let \(\sigma :\alpha ^{\mathbb {Z}}\rightarrow \alpha ^{\mathbb {Z}}\) be the left shift. We supply \(\alpha ^{\mathbb {Z}}\) with the product topology and the Borel sigma-algebra. For \(x=(x_k)_{k \in {\mathbb {Z}}}\) and \(y=(y_k)_{k \in {\mathbb {Z}}}\) in \(\alpha ^{\mathbb {Z}}\), define the separation time and distance by

$$\begin{aligned} s(x,y)&= \inf \{k \ge 0 : x_k \ne y_k \text { or } x_{-k} \ne y_{-k} \} , \\ d(x,y)&= \gamma ^{s(x,y)} . \end{aligned}$$

Here \(\gamma \) is the constant in (2.1).

2.2.1 Tower over full shift

Our first step is to model \(T :M \rightarrow M\) as a suspension tower over \(\sigma :\alpha ^{\mathbb {Z}}\rightarrow \alpha ^{\mathbb {Z}}\). It is natural that this can be done, yet this requires some care.

Proposition 2.5

There exists a probability measure \(\mu _\alpha \) on \(\alpha ^{\mathbb {Z}}\) and a \(\mu _\alpha \)-almost everywhere defined map \(\chi _\alpha :\alpha ^{\mathbb {Z}}\rightarrow Y\) such that on the domain of \(\chi _\alpha \):

  • \(\chi _\alpha \) is Lipschitz: \(d_0(\chi _\alpha (x), \chi _\alpha (y)) \le 2 K d(x,y)\),

  • \((\chi _\alpha )_* \mu _\alpha = \mu \),

  • \(F \circ \chi _\alpha = \chi _\alpha \circ \sigma \),

  • \(\chi _\alpha (\ldots , a_{-1}, a_0, a_1, \ldots ) \in a_0\).

Proof

In order to construct the natural extension [28, 29] of \(F :Y \rightarrow Y\), we assume that \(F(Y) = Y\). This comes without loss of generality: otherwise instead of Y we work with \(\cap _{n \ge 0} F^{-n}(Y)\). This is a full measure subset of Y on which F is surjective, and the proof goes through with straightforward and minor changes.

Let \({\breve{F}}:{\breve{Y}}\rightarrow {\breve{Y}}\) be the natural extension [28, 29] of \(F :Y \rightarrow Y\):

$$\begin{aligned} {\breve{Y}}= \bigl \{ (\ldots , y_{-1}, y_0, y_1, \ldots ) \in Y^{\mathbb {Z}}: y_{n + 1} = F(y_n) \bigr \} , \end{aligned}$$

and \({\breve{F}}\) is the left shift. The topology on \({\breve{Y}}\) is generated by open cylinders of the type \(\{y_i \in E\}\) with open \(E \subset Y\) (in the induced topology on Y as a subspace of M), and we consider \({\breve{Y}}\) with the Borel sigma-algebra. Let \({\breve{\pi }}_\ell :{\breve{Y}}\rightarrow Y\), \((\ldots , y_{-1}, y_0, y_1, \ldots ) \mapsto y_\ell \) be the natural projections, set \({\breve{\pi }}= {\breve{\pi }}_0\), and let \({\breve{\mu }}\) be the unique \({\breve{F}}\)-invariant probability measure on \({\breve{Y}}\) such that \({\breve{\pi }}_* {\breve{\mu }}= \mu \).

Define \(\iota :{\breve{Y}}\rightarrow \alpha ^{\mathbb {Z}}\) by \((\ldots , y_{-1}, y_0, y_1, \ldots ) \mapsto (\ldots , a_{-1}, a_0, a_1, \ldots )\) where \(y_n \in a_n\). Then \(\iota \) is a measurable and injective map. Set \(\mu _\alpha = \iota _* {\breve{\mu }}\). Using (2.1), for \(0 \le |\ell | \le n\),

$$\begin{aligned}&{{\,\textrm{diam}\,}}\bigl ( ({\breve{\pi }}_\ell \circ \iota ^{-1})([a_{-n}, \ldots , a_n]) \bigr )\\&\quad = {{\,\textrm{diam}\,}}\bigl ( F^{n + \ell } ( \{ y \in Y : F^k(y) \in a_{-n + k} \text { for all } 0 \le k \le 2n \} ) \bigr ) \\&\quad \le 2 K \gamma ^{n - |\ell |} = 2 K \gamma ^{-|\ell |} {{\,\textrm{diam}\,}}([a_{-n}, \ldots , a_n]) , \end{aligned}$$

where \([a_{-n}, \ldots , a_n] \subset \alpha ^{\mathbb {Z}}\) is a cylinder. Hence \({\breve{\pi }}_\ell \circ \iota ^{-1}\) is Lipschitz with Lipschitz constant \(2 K \gamma ^{-|\ell |}\). Since \(\iota ^{-1}(x) = ( ({\breve{\pi }}_\ell \circ \iota ^{-1}) (x) )_{\ell \in {\mathbb {Z}}}\) and each \({\breve{\pi }}_\ell \circ \iota ^{-1}\) is continuous, \(\iota ^{-1} :\iota ({\breve{Y}}) \rightarrow {\breve{Y}}\) is continuous. Recall that, as a part of the definition of T as a nonuniformly hyperbolic map we assumed that \(\iota ({\breve{Y}})\) is measurable in \(\alpha ^{\mathbb {Z}}\). Then \(\iota ^{-1}\) is measurable.

Observe that \(\iota ^{-1}\) is a conjugacy between \({\breve{F}}\) and \(\sigma \). Set \(\chi _\alpha = {\breve{\pi }}\circ \iota ^{-1}\).

[figure c: commutative diagram relating \(\sigma \), \({\breve{F}}\) and F through \(\iota \) and \({\breve{\pi }}\).]

Then \(\chi _\alpha \) is a measure preserving semiconjugacy between \(\sigma \) and F with the required properties. \(\square \)

In the setup of Proposition 2.5, consider the suspension map over \(\sigma \) with roof function \(\tau _\alpha = \tau \circ \chi _\alpha \), i.e. the space

$$\begin{aligned} M_\alpha = \{(x, \ell ) : x \in \alpha ^{\mathbb {Z}}, \; 0 \le \ell < \tau _\alpha (x) \} \end{aligned}$$

with the transformation

$$\begin{aligned} T_\alpha :(x, \ell ) \mapsto {\left\{ \begin{array}{ll} (x, \ell + 1), &{} \quad \ell < \tau _\alpha (x) - 1, \\ (\sigma (x), 0), &{} \quad \text {else.} \end{array}\right. } \end{aligned}$$

As in (2.4), we supply \(M_\alpha \) with the metric

$$\begin{aligned} d_\alpha ((x, k), (y, \ell )) = {\left\{ \begin{array}{ll} 1, &{} \quad k \ne \ell , \\ d(x,y), &{} \quad k = \ell . \end{array}\right. } \end{aligned}$$

Let \(m_\alpha = \mu _\alpha \times \text {counting} / \text {normalization}\) be the probability measure on \(M_\alpha \), and let \(\pi _\alpha :M_\alpha \rightarrow M\),

$$\begin{aligned} \pi _\alpha :(x, \ell ) \mapsto T^\ell (\chi _\alpha (x)) . \end{aligned}$$

Remark 2.6

Observe that:

  • \(T_\alpha :M_\alpha \rightarrow M_\alpha \) is a nonuniformly hyperbolic map with the same tails of return times as T, namely \(\mu _\alpha (\tau _\alpha \ge n) = \mu (\tau \ge n)\) for all n.

  • \(\pi _\alpha \) is defined \(m_\alpha \)-almost surely, but possibly not on the whole \(M_\alpha \).

  • \(\pi _\alpha :M_\alpha \rightarrow M\) is a measure preserving semiconjugacy between \(T_\alpha \) and T (modulo zero measure).

  • \(\pi _\alpha :M_\alpha \rightarrow M\) is Lipschitz (on its domain). Indeed,

    $$\begin{aligned} d_0(\pi _\alpha (x, \ell ), \pi _\alpha (y, k)) \le C d((x,\ell ), (y, k)) \end{aligned}$$

    holds:

    • with \(C = {{\,\textrm{diam}\,}}M\) when \(\ell \ne k\) by construction,

    • with \(C = 2K\) when \(\ell = k = 0\) by Proposition 2.5,

    • with \(C = 2 K^2 ( 1 + \gamma ^{-1})\) when \(\ell = k > 0\). Indeed, by (2.2),

      $$\begin{aligned} d_0(\pi _\alpha (x, \ell ), \pi _\alpha (y, \ell )) \le K \big [ d_0 ( \chi _\alpha (x), \chi _\alpha (y)) + d_0 ( F \circ \chi _\alpha (x), F \circ \chi _\alpha (y)) \big ] \, . \end{aligned}$$

      Now, by Proposition 2.5, \(F \circ \chi _\alpha = \chi _\alpha \circ \sigma \) and \(\chi _\alpha \) is Lipschitz. Then

      $$\begin{aligned} d_0(\pi _\alpha (x, \ell ), \pi _\alpha (y, \ell )) \le 2K^2 \big [ d ( x,y) + d ( \sigma (x), \sigma (y)) \big ] \, . \end{aligned}$$

      To conclude, we use that \(s(x,y) \le 1 + s(\sigma (x),\sigma (y)) \), and hence \(d ( \sigma (x), \sigma (y)) \le \gamma ^{-1} d ( x,y)\).

2.2.2 Proof of Theorem 2.4 for \(T_\alpha \)

We prove Theorem 2.4 for \(T_\alpha \) instead of T, and Remark 2.6 guarantees that the result carries over to T. We construct a number of intermediate spaces.

Let \({\mathcal {A}}\) denote the set of all finite words in the alphabet \(\alpha \), not including the empty word. For \(w = a_0 \cdots a_{n-1} \in {\mathcal {A}}\) let \(h_{\mathcal {A}}(w) = \tau (a_0) + \cdots + \tau (a_{n-1})\).

Define \(\pi _{\mathcal {A}}:{\mathcal {A}}^{\mathbb {Z}}\rightarrow \alpha ^{\mathbb {Z}}\) by \(\pi _{\mathcal {A}}(\ldots , w_{-1}, w_0, w_1, \ldots ) = \cdots w_{-1} w_0 w_1 \cdots \), which is a concatenation with the first letter of \(w_0\) at index 0. Similarly define \({\bar{\pi }}_{\mathcal {A}}:{\mathcal {A}}^{\mathbb {N}}\rightarrow \alpha ^{{\mathbb {N}}}\) by \({\bar{\pi }}_{\mathcal {A}}(w_0, w_1, \ldots ) = w_0 w_1 \cdots \).

Let \(\pi _\alpha ^+ :\alpha ^{\mathbb {Z}}\rightarrow \alpha ^{\mathbb {N}}\) be the projection on nonnegative coordinates. Using that \(\tau _\alpha \) depends only on the “future” coordinates, define \({\bar{\tau }}_\alpha :\alpha ^{{\mathbb {N}}} \rightarrow {\mathbb {N}}_0\) so that \(\tau _\alpha (x) = {\bar{\tau }}_\alpha (\pi _\alpha ^+(x))\).

Let \({\bar{M}}_\alpha = \{ (x, \ell ) \in \alpha ^{{\mathbb {N}}} \times {\mathbb {Z}}: 0 \le \ell < {\bar{\tau }}_\alpha (x) \}\), and let

$$\begin{aligned} {\bar{T}}_\alpha :(x, \ell ) \mapsto {\left\{ \begin{array}{ll} (x, \ell + 1) &{} \quad \ell < {\bar{\tau }}_\alpha (x) - 1 , \\ (\sigma (x), 0) &{} \quad \ell = {\bar{\tau }}_\alpha (x) - 1 . \end{array}\right. } \end{aligned}$$

Then \({\bar{T}}_\alpha :{\bar{M}}_\alpha \rightarrow {\bar{M}}_\alpha \) is a nonuniformly expanding system. It preserves the probability measure \({\bar{m}}_\alpha = {\bar{\mu }}_\alpha \times \text {counting} / \text {normalization}\), where \({\bar{\mu }}_\alpha = (\pi _\alpha ^+)_* \mu _\alpha \) is the projection of \(\mu _\alpha \) on nonnegative coordinates.

Let \(\psi :M_\alpha \rightarrow {\bar{M}}_\alpha \) be the natural projection,

$$\begin{aligned} \psi :( x, \ell ) \mapsto \bigl ( \pi _\alpha ^+(x), \ell \bigr ) . \end{aligned}$$

Remark 2.7

\(T_\alpha \) is the natural extension of \({\bar{T}}_\alpha \). In particular, \(\psi \) is a semiconjugacy, and \(m_\alpha \) is the unique \(T_\alpha \)-invariant probability measure such that \(\psi _* m_\alpha = {\bar{m}}_\alpha \).

To use the same notation as in [19], let \(\xi = \gamma \). By [19, Theorem 3.4, Section 4.1], there exists a probability measure \({{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}\) on \({\mathcal {A}}\), such that the one-sided Markov shift tower (see Remark 2.2) \({\bar{f}}:{\bar{\Delta }}\rightarrow {\bar{\Delta }}\), defined by \(\bigl \{ ({\mathcal {A}}, {{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}})\), \(h_{\mathcal {A}}\), \(\xi \bigr \}\), is an extension of \({\bar{T}}_\alpha \) with the measure preserving and Lipschitz semiconjugacy \({\bar{\pi }}_\alpha :{\bar{\Delta }}\rightarrow {\bar{M}}_\alpha \),

$$\begin{aligned} {\bar{\pi }}_\alpha :(x, \ell ) \mapsto {\bar{T}}_\alpha ^\ell ({\bar{\pi }}_{\mathcal {A}}(x),0) . \end{aligned}$$

Next let \({\bar{{{\,\mathrm{\mathbb {P}}\,}}}}\) denote the \({\bar{f}}\)-invariant probability measure on \({\bar{\Delta }}\).

Now let \(f :\Delta \rightarrow \Delta \) be the two-sided (i.e. as in Sect. 2.1) Markov shift tower defined by the same objects \(\bigl \{ ({\mathcal {A}}, {{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}})\), \(h_{\mathcal {A}}\), \(\xi \bigr \}\), with the invariant probability measure \({{\,\mathrm{\mathbb {P}}\,}}\).

Let \({\bar{\pi }}_\Delta :\Delta \rightarrow {\bar{\Delta }}\) be the natural projection,

$$\begin{aligned} {\bar{\pi }}_\Delta :(x, \ell ) \mapsto ({\bar{\pi }}^+_{\mathcal {A}}(x), \ell ) , \end{aligned}$$

where \({\bar{\pi }}^+_{\mathcal {A}}: {\mathcal {A}}^{\mathbb {Z}}\rightarrow {\mathcal {A}}^{\mathbb {N}}\) is defined by \({\bar{\pi }}^+_{\mathcal {A}}(\ldots , w_{-1}, w_0, w_1, \ldots ) = (w_0, w_1, w_2, \ldots )\). Observe that \({\bar{\pi }}_\Delta \) is a measure preserving and Lipschitz semiconjugacy between f and \({\bar{f}}\).

Remark 2.8

\({\bar{\pi }}_\alpha \circ {\bar{\pi }}_\Delta :\Delta \rightarrow {\bar{M}}_\alpha \) is a measure preserving semiconjugacy, in particular

$$\begin{aligned} ({\bar{\pi }}_\alpha \circ {\bar{\pi }}_\Delta )_* {{\,\mathrm{\mathbb {P}}\,}}= {\bar{m}}_\alpha . \end{aligned}$$

Define \(\pi _\alpha :\Delta \rightarrow M_\alpha \),

$$\begin{aligned} \pi _\alpha :(x, \ell ) \mapsto T_\alpha ^\ell (\pi _{\mathcal {A}}(x), 0) . \end{aligned}$$

Observe that \(\pi _\alpha \) is a measure preserving semiconjugacy too.

Now \(f :\Delta \rightarrow \Delta \), preserving the probability measure \({{\,\mathrm{\mathbb {P}}\,}}\) and equipped with the semiconjugacy \(\pi _\alpha \) to \(T_\alpha :M_\alpha \rightarrow M_\alpha \), is the Markov shift tower we are after. It remains to verify that it has the required properties: that \(\pi _\alpha \) is measure preserving and Lipschitz, and to bound the return time tails.

Proposition 2.9

\(\pi _\alpha \) is measure preserving: \((\pi _\alpha )_* {{\,\mathrm{\mathbb {P}}\,}}= m_\alpha \).

Proof

We have constructed four dynamical systems: \(f :\Delta \rightarrow \Delta \) with invariant measure \({{\,\mathrm{\mathbb {P}}\,}}\), \({\bar{f}}:{\bar{\Delta }}\rightarrow {\bar{\Delta }}\) with invariant measure \({\bar{{{\,\mathrm{\mathbb {P}}\,}}}}\), \(T_\alpha :M_\alpha \rightarrow M_\alpha \) with invariant measure \(m_\alpha \), and \({\bar{T}}_\alpha :{\bar{M}}_\alpha \rightarrow {\bar{M}}_\alpha \) with invariant measure \({\bar{m}}_\alpha \). They are connected by semiconjugacies \(\pi _\alpha :\Delta \rightarrow M_\alpha \), \({\bar{\pi }}_\alpha :{\bar{\Delta }}\rightarrow {\bar{M}}_\alpha \), \({\bar{\pi }}_\Delta :\Delta \rightarrow {\bar{\Delta }}\) and \(\psi :M_\alpha \rightarrow {\bar{M}}_\alpha \), where \(\psi \) is the natural projection.

The left diagram commutes, and we have to justify the dashed arrow in the right diagram:

[figure d: two commutative diagrams connecting f, \({\bar{f}}\), \(T_\alpha \) and \({\bar{T}}_\alpha \) via \({\bar{\pi }}_\Delta \), \({\bar{\pi }}_\alpha \) and \(\psi \); the dashed arrow is \(\pi _\alpha :\Delta \rightarrow M_\alpha \).]

By Remark 2.7, \(m_\alpha \) is the unique \(T_\alpha \)-invariant probability measure on \(M_\alpha \) such that \(\psi _* m_\alpha = {\bar{m}}_\alpha \). On the other hand, \((\pi _\alpha )_* {{\,\mathrm{\mathbb {P}}\,}}\) is \(T_\alpha \)-invariant and \(\psi \circ \pi _\alpha = {\bar{\pi }}_\alpha \circ {\bar{\pi }}_\Delta \), and therefore \(\psi _* \bigl ( (\pi _\alpha )_* {{\,\mathrm{\mathbb {P}}\,}}\bigr ) = {\bar{m}}_\alpha \), so \((\pi _\alpha )_* {{\,\mathrm{\mathbb {P}}\,}}= m_\alpha \) as required. \(\square \)

By the same argument as in [19, Proposition 4.18], \(\pi _\alpha \) is Lipschitz.

The return time tails on \(\Delta \) and \({\bar{\Delta }}\) are equal by construction, and [19, Theorem 3.4] proves that on \({\bar{\Delta }}\) they are as required for Theorem 2.4, except in the polynomial case \(\mu _\alpha (\tau _\alpha \ge n) = O(n^{-\beta } L(n))\) with \(\beta > 0\) and L a slowly varying function at infinity, which is more general than the case treated in [19]. In turn, the relation between the tails in [19] is taken from [18, Section 4]. To extend the results as we require, it is sufficient to notice that, taking into account the properties of slowly varying functions (see for instance [5, Chapter 1]), the following version of [18, Proposition 4.4] holds:

Proposition 2.10

(in notation of [18]). Suppose that there exist \(C_\tau > 0\), \(\beta > 0\), L a slowly varying function at infinity and \(\ell _0 \ge 1\), such that \(m(\tau \ge \ell ) \le C_\tau \ell ^{-\beta } L( \ell )\) for all \(\ell \ge \ell _0\). Then \({{\,\mathrm{\mathbb {P}}\,}}(t \ge \ell ) \le C \ell ^{-\beta }L( \ell )\).

The proof of Theorem 2.4 for \(T_\alpha \) is complete, and following Remark 2.6 we recover the full result.

3 ASIP for Hölder Observables of Nonuniformly Hyperbolic Maps

Theorem 1.1 is an application of the following general result, which is the goal of this section:

Theorem 3.1

Let \(T :M \rightarrow M\) be an ergodic, measure-preserving transformation defined on a bounded metric space (M, d) with Borel probability measure m. Let \(\varphi :M \rightarrow {\mathbb {R}}\) be Hölder continuous and centered, i.e. \(\int \varphi \, dm = 0\). Let \(S_n( \varphi ) = \sum _{k =0}^{ n-1} \varphi \circ T^k\). Suppose that T is nonuniformly hyperbolic (in the sense of Sect. 2.1.1) with uniformly hyperbolic induced map F, return time \(\tau \) and F-invariant measure \(\mu \) associated with a subset Y of M. Assume that \(\int \tau ^2 \, d \mu < \infty \). Then the limit \(c^2= \lim _{n \rightarrow \infty } n^{-1}\int |S_n( \varphi )|^2 \, dm\) exists. In addition,

  (a)

    If \( \mu ( \tau \ge n) = O(n^{-\beta } (\log n)^\gamma )\), with \(\beta > 2\) and \(\gamma \in {\mathbb R}\), then for each \({\varepsilon }> 0\) the process \(S_n\) satisfies the ASIP with variance \(c^2\) and rate \(o(n^{1/\beta } (\log n)^{( \gamma + 1)/\beta + {\varepsilon }})\).

  (b)

    If \(\int \tau ^\beta \, d \mu < \infty \) with \(\beta > 2\), then \(S_n\) satisfies the ASIP with variance \(c^2\) and rate \(o(n^{1/\beta })\).

  (c)

    If \(\int e^{\beta \tau ^\delta } \, d \mu < \infty \) with \(\beta > 0\) and \(\delta \in (0,1]\), then \(S_n\) satisfies the ASIP with variance \(c^2\) and rate \(O((\log n)^{1 + 1/\delta })\).

Remark 3.2

The dependence of the ASIP on \({\varepsilon }\) in item (a) should be understood as follows: for every \({\varepsilon }> 0\), there exists a Brownian motion \((W_t)_{t >0}\) with variance \(c^2\) such that \(S_n = W_n + o(n^{1/\beta } (\log n)^{( \gamma + 1)/\beta + {\varepsilon }})\) almost surely.

For the flower billiard map and the two falling balls map defined in (1.2), it has been proved in [9] that they are examples of nonuniformly hyperbolic maps with return time tails of the form \(O ( n^{-3} ( \log n)^3 )\). On the other hand, for the intermittent baker's map as described in (1.3), it has been proved in [24] that the associated return times have tails \(\sim n^{-\beta }\) with \(\beta =1/\alpha \). These considerations together with Theorem 3.1 prove Theorem 1.1 of the introduction.

Remark 3.3

It is expected [2] that for the maps (1.2) the tails of the return times can be improved to \(O(n^{-3})\), dropping the logarithmic factor, similarly to what was done for Bunimovich stadia and related systems [10]. This would improve our result.

Let \(f :\Delta \rightarrow \Delta \) be a Markov shift tower constructed for T as in Theorem 2.4, built with a probability space \(({\mathcal {A}}, {{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}})\), roof function \(h_{\mathcal {A}}\) and the metric constant \(\xi \in (0,1)\). Recall the associated notations, in particular that \((\Delta , d)\) is a metric space and that f preserves the probability measure \({{\,\mathrm{\mathbb {P}}\,}}\).

By Theorem 2.4, the process \(( \varphi \circ T^k)_k\) defined on (Mm) and the process \(( v \circ f^k)_k\), where \(v = \varphi \circ \pi \), defined on \((\Delta ,{{\,\mathrm{\mathbb {P}}\,}})\) have the same law. Moreover, if \( \varphi \) is Hölder continuous then so is v since \(\pi \) is Lipschitz. Hence Theorem 3.1 will follow from its equivalent result stated for Markov shift towers as given in Proposition 3.4 below.

3.1 ASIP for Markov shift towers

Proposition 3.4

Suppose that \(v :\Delta \rightarrow {\mathbb {R}}\) is Hölder continuous and centered, i.e. \(\int v \, d{{\,\mathrm{\mathbb {P}}\,}}= 0\). Let \(S_n = \sum _{k < n} v \circ f^k\). Assume that \(\int h_{\mathcal {A}}^2 \, d{{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}< \infty \). Then \(c^2= \lim _{n \rightarrow \infty } n^{-1}\int |S_n|^2 \, d{{\,\mathrm{\mathbb {P}}\,}}\) exists. In addition,

  (a)

    If \({{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}(h_{\mathcal {A}}\ge n) = O(n^{-\beta } (\log n)^\gamma )\), with \(\beta > 2\) and \(\gamma \in {\mathbb R}\), then \(S_n\) satisfies the ASIP with variance \(c^2\) and rate \(o(n^{1/\beta } (\log n)^{ (\gamma + 1)/\beta + {\varepsilon }})\) for each \({\varepsilon }> 0\).

  (b)

    If \(\int h_{\mathcal {A}}^\beta \, d{{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}< \infty \) with \(\beta > 2\), then \(S_n\) satisfies the ASIP with variance \(c^2\) and rate \(o(n^{1/\beta })\).

  (c)

    If \(\int e^{\beta h_{\mathcal {A}}^\delta } \, d{{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}< \infty \) with \(\beta > 0\) and \(\delta \in (0,1]\), then \(S_n\) satisfies the ASIP with variance \(c^2\) and rate \(O((\log n)^{1 + 1/\delta })\).

The rest of this section is devoted to the proof of Proposition 3.4.

3.2 The associated Markov chain

It is convenient to represent the dynamics on a Markov shift tower as a Markov chain in the conventional sense. For this, let \(G = \{(a, \ell ) \in {\mathcal {A}}\times {\mathbb {Z}}: 0 \le \ell < h_{\mathcal {A}}(a)\}\) and let \({\mathcal {G}}\subset G^{\mathbb {Z}}\) be the set of admissible symbolic trajectories:

$$\begin{aligned} {\mathcal {G}}= \{ g = (g_n)_{n \in {\mathbb {Z}}} \in G^{\mathbb {Z}}: \text { if } g_n = (a, \ell ) \text { with } \ell < h(a) - 1 \text {, then } g_{n+1} = (a, \ell + 1) \}. \end{aligned}$$

Let now \((g_n)_{n\ge 0}\) be a Markov chain with state space G and transition probabilities

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}(g_{n+1} = (a, \ell )&\mid g_n = (a', \ell ')) \\&= {\left\{ \begin{array}{ll} 1, &{} \quad \ell = \ell ' + 1 \text { and } \ell ' + 1 < h(a) \text { and } a = a', \\ {{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}(a) , &{} \quad \ell = 0 \text { and } \ell ' + 1 = h(a'), \\ 0, &{} \quad \text {else}. \end{array}\right. } \end{aligned}$$

The Markov chain has a unique invariant probability measure \(\nu \) with respect to which the Markov chain \((g_n)_{n \ge 0}\) is stationary. By the Kolmogorov existence theorem, there exists a stationary Markov chain indexed by \({\mathbb {Z}}\) that we still denote by \((g_n)_{n \in {\mathbb {Z}}}\) with transition probabilities given above and which defines a probability measure \({{\,\mathrm{\mathbb {P}}\,}}_{{\mathcal {G}}}\) on the space \({\mathcal {G}}\).
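For concreteness, \(\nu \) can be written down explicitly (this closed form is our addition; it is a routine verification): since each column of the tower passes its mass upward level by level and restarts at the base with law \({{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}\),

$$\begin{aligned} \nu (a, \ell ) = \Bigl ( \int h_{\mathcal {A}}\, d{{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}\Bigr )^{-1} {{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}(a) \quad \text {for all } (a, \ell ) \in G . \end{aligned}$$

Indeed, \(\nu (a, \ell + 1) = \nu (a, \ell )\) for \(\ell + 1 < h_{\mathcal {A}}(a)\), the mass entering \((a, 0)\) is \(\sum _{a'} \nu (a', h_{\mathcal {A}}(a') - 1) \, {{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}(a) = \nu (a, 0)\), and the total mass is \(\sum _{a} h_{\mathcal {A}}(a) \nu (a, 0) = 1\).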

Define \(\pi _G :\Delta \rightarrow {\mathcal {G}}\) by \(\pi _G(x) = (a_n, \ell _n)_n\) such that \(f^n(x) \in ([a_n], \ell _n)\) for all n, where \([a_n] \subset {\mathcal {A}}^{\mathbb {Z}}\) is the cylinder \(\{ x \in {\mathcal {A}}^{\mathbb {Z}}: x_0 = a_n \}\). Note that \(\pi _G\) is a bijection; moreover, it is a measure preserving conjugacy. It follows that if we define

$$\begin{aligned} \psi :{\mathcal {G}}\rightarrow {\mathbb {R}}\text { by }\psi = v \circ \pi _G^{-1}, \end{aligned}$$
(3.1)

then setting for all \(k \in {\mathbb {Z}}\),

$$\begin{aligned} X_k = \psi ( (g_{k+n})_{n\in {\mathbb {Z}}} ) , \end{aligned}$$
(3.2)

the stationary process \((X_k)_{k \in {\mathbb {Z}}} \) has the same law as \(( v \circ f^k)_{k \in {\mathbb {Z}}} \). To give the regularity properties of \(\psi \), we need to equip \({\mathcal {G}}\) with a suitable metric.

3.3 Metric on \({\mathcal {G}}\)

Since \(\pi _G :\Delta \rightarrow {\mathcal {G}}\) is a bijection and \(\Delta \) is a metric space, it is natural to define a metric on \({\mathcal {G}}\) so that \(\pi _G\) is an isometry. For \(g, g' \in {\mathcal {G}}\), let

$$\begin{aligned} {\tilde{s}}^\pm (g, g') = \inf \{ \ell \ge 0 : g_{\pm \ell } \ne g'_{\pm \ell } \} . \end{aligned}$$

Let \(G_0 =\{(a, 0) : a \in {\mathcal {A}}\} \subset G\),

$$\begin{aligned} s^{-} (g, g') = \# \big \{ 0 \le \ell < {\tilde{s}}^{-} (g, g') : g_{- \ell } \in G_0 \big \} \quad \text {and} \quad s^{+} (g, g') = \# \big \{ 0 < \ell \le {\tilde{s}}^{+} (g, g') : g_{ \ell } \in G_0 \big \} . \end{aligned}$$

Let

$$\begin{aligned} s (g, g') = \min \bigl \{ s^- (g, g'), s^+ (g, g') \bigr \} . \end{aligned}$$

Then s is a kind of separation time (cf.  the separation time on X defined in Sect. 2.1). Recall that \(\xi \in (0,1)\) and define

$$\begin{aligned} d(g, g') = \xi ^{s(g, g')} . \end{aligned}$$

This is the metric on \({\mathcal {G}}\) which agrees with that on \(\Delta \) in the sense that \(d(x,y) = d(\pi _G(x), \pi _G(y))\) for all \(x,y \in \Delta \).

This allows us to infer that if v is Hölder with index \(\eta \in ]0,1]\) then so is \(\psi \) defined by (3.1).
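For illustration, here is a direct transcription of these definitions into code (our own sketch; the trajectories are represented as hypothetical callables from indices to states, and the toy inputs below are not admissible trajectories of any particular tower — the snippet only demonstrates the bookkeeping behind \(s^\pm \), s and d).

```python
def sep_metric(ga, gb, xi=0.5, horizon=1000):
    # d(g, g') = xi ** s(g, g'); ga, gb map an index n to a state (a, ell)
    def tilde_s(sign):
        # first l >= 0 with g_{sign*l} != g'_{sign*l} (capped at the horizon)
        for l in range(horizon):
            if ga(sign * l) != gb(sign * l):
                return l
        return horizon

    in_G0 = lambda state: state[1] == 0  # G_0 = {(a, 0) : a in A}
    s_minus = sum(1 for l in range(tilde_s(-1)) if in_G0(ga(-l)))
    s_plus = sum(1 for l in range(1, tilde_s(+1) + 1) if in_G0(ga(l)))
    return xi ** min(s_minus, s_plus)

# toy demo: states agree for |n| <= 3 and differ afterwards
ga = lambda n: ("a", abs(n) % 2)
gb = lambda n: ("a", abs(n) % 2) if abs(n) <= 3 else ("b", 0)
print(sep_metric(ga, gb))  # xi ** min(s^-, s^+) = 0.5 ** 2
```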

Next, to prove the ASIP for the partial sums associated with \((X_k)_{k \in {\mathbb {Z}}}\) defined by (3.2) with \(\psi \) bounded and Hölder continuous, it is convenient, as in [11, 12], to represent the Markov chain with the help of its innovations and to introduce a particular meeting time.

3.4 Innovations and meeting time of the underlying Markov chain

Extending the approach in [11, Section 3], without changing the distribution of \((g_n)_{n \in {\mathbb {Z}}}\), we assume that there is a sequence of iid random elements \(({\varepsilon }_n)_{n \in {\mathbb {Z}}}\) sampled from \(({\mathcal {A}}, {{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}})\) such that each \({\varepsilon }_{n+1}\) is independent from \(({\varepsilon }_\ell , g_\ell )_{\ell \le n}\), and \(g_{n+1}=U(g_n,{\varepsilon }_{n+1})\), where

$$\begin{aligned} U((a, \ell ), {\varepsilon }) = {\left\{ \begin{array}{ll} (a, \ell +1), &{} \quad \ell < h(a) - 1, \\ ({\varepsilon }, 0), &{} \quad \ell = h(a) - 1. \end{array}\right. } \end{aligned}$$

We refer to \(({\varepsilon }_n)_{n \in {\mathbb {Z}}}\) as innovations.
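In code, the innovation representation is just a deterministic update driven by iid draws. Here is a minimal sketch (our own, with a toy alphabet and toy distribution \({{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}\); the initial state below is drawn from the base, not from the stationary measure \(\nu \)).

```python
import random

H = {"a": 1, "b": 2, "c": 4}        # toy return times h_A
P = {"a": 0.5, "b": 0.3, "c": 0.2}  # toy distribution P_A

def draw(rng):
    # one iid innovation from (A, P_A)
    return rng.choices(list(P), weights=list(P.values()))[0]

def U(state, eps):
    # g_{n+1} = U(g_n, eps_{n+1}): climb the column, else restart at (eps, 0)
    a, ell = state
    return (a, ell + 1) if ell < H[a] - 1 else (eps, 0)

rng = random.Random(0)
g = (draw(rng), 0)
for _ in range(12):
    print(g, end=" ")
    g = U(g, draw(rng))
```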

For \(g_0, g'_0 \in G\) and a sequence \(({\varepsilon }_n)_{n \in {\mathbb {Z}}}\) of random elements in \({\mathcal {A}}\), let \(T_{g_0, g'_0} (({\varepsilon }_n))\) be the meeting time of two Markov chains with respective initial states \(g_0, g'_0\) and common innovations \(({\varepsilon }_n)_{n \in {\mathbb {Z}}}\):

$$\begin{aligned} T_{g_0, g'_0} (({\varepsilon }_n)) = \inf \{ \ell \ge 0 : g_\ell = g'_\ell \} , \end{aligned}$$

where \(g_{n+1} = U(g_n, {\varepsilon }_{n+1})\) and \(g'_{n+1} = U(g'_n, {\varepsilon }_{n+1})\).
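A small Monte Carlo sketch (again ours, reusing the toy chain above but repeated so that the snippet is self-contained) estimates the tail \({{\,\mathrm{\mathbb {P}}\,}}(T \ge n)\) for two copies coupled through common innovations; note that once the two copies meet they coincide forever.

```python
import random

H = {"a": 1, "b": 2, "c": 4}        # toy h_A, as above
P = {"a": 0.5, "b": 0.3, "c": 0.2}  # toy P_A

def draw(rng):
    return rng.choices(list(P), weights=list(P.values()))[0]

def U(state, eps):
    a, ell = state
    return (a, ell + 1) if ell < H[a] - 1 else (eps, 0)

def stationary_start(rng):
    # sample from nu: the weight of (a, ell) is proportional to P_A(a)
    states = [(a, l) for a in H for l in range(H[a])]
    return rng.choices(states, weights=[P[a] for a, _ in states])[0]

def meeting_time(rng, horizon=10**6):
    g, g2 = stationary_start(rng), stationary_start(rng)  # independent starts
    for n in range(horizon):
        if g == g2:
            return n
        eps = draw(rng)                 # common innovation for both copies
        g, g2 = U(g, eps), U(g2, eps)
    return horizon

rng = random.Random(1)
samples = [meeting_time(rng) for _ in range(10**4)]
for n in (1, 2, 4, 8):
    print(n, sum(t >= n for t in samples) / len(samples))  # estimate P(T >= n)
```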

Proposition 3.5

Suppose that \(g'_0\) is an independent copy of \(g_0\), also independent from \((g_n, {\varepsilon }_n)_{n \ge 0}\). Set \(T = T_{g_0, g'_0} (({\varepsilon }_n))\).

  1.

    Assume that there exists a sequence \({\varepsilon }(n)\) of positive reals tending to 0 as \(n \rightarrow \infty \), such that \({{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}( h_{\mathcal {A}}\ge n) \ge \exp (-n {\varepsilon }(n))\). If in addition \({{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}( h_{\mathcal {A}}\ge n) = O ( L(n) n^{-\beta }) \) with \(\beta > 1\) and L(n) a slowly varying function at infinity, then \({{\,\mathrm{\mathbb {P}}\,}}(T \ge n ) = O ( L(n) n^{-(\beta -1)})\).

  2.

    If \({{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}( h_{\mathcal {A}}\ge n) = O ( L(n) n^{-\beta }) \) with \(\beta > 1\) and L(n) a slowly varying function at infinity, then \( {{\,\mathrm{\mathbb {E}}\,}}( g_\beta (T) ) <\infty \) for any \(\eta >1\) where \(g_\beta (x) = x^{\beta -1} (\log (1+x))^{- \eta }/ L(x)\).

  3.

    If \({{\,\mathrm{\mathbb {E}}\,}}_{\mathcal {A}}( h^\beta _{\mathcal {A}}) < \infty \) for some \(\beta >1\), then \({{\,\mathrm{\mathbb {E}}\,}}( T^{\beta -1} ) < \infty \).

  4.

    If \({{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}( h_{\mathcal {A}}\ge n) = O ( \textrm{e}^{ - c n^\delta } )\) for some \(c >0\) and \(\delta \in (0,1]\), then there exists \(\kappa >0\) such that \( {{\,\mathrm{\mathbb {P}}\,}}(T \ge n ) = O ( \textrm{e}^{ - \kappa n^\delta } )\).

Items 2 and 3 are proved in [11] whereas Item 4 is proved in [12]. Item 1 of Proposition 3.5 follows straightforwardly from Lemma 3.6 below.

Lemma 3.6

Suppose that \(g'_0\) is an independent copy of \(g_0\), also independent from \((g_n, {\varepsilon }_n)_{n \ge 0}\). Set \(T = T_{g_0, g'_0} (({\varepsilon }_n))\). Suppose that there exists a sequence \({\varepsilon }(n)\) of positive reals tending to 0 as \(n \rightarrow \infty \), such that \({{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}( h_{\mathcal {A}}\ge n) \ge \exp (-n {\varepsilon }(n))\). Then

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}( T \ge n) = O \bigl ( {{\,\mathrm{\mathbb {E}}\,}}_{\mathcal {A}}( (h_{\mathcal {A}}-n)_+) \bigr ) . \end{aligned}$$

Proof

As in Lindvall [21] (see also Rio [27, Proposition 9.6]), let \(\Lambda _0\) be the class of nondecreasing functions \(\psi \) from \({\mathbb N}\) to \([1, \infty [\) such that \(\log (\psi (n))/n\) decreases to 0 as \(n \rightarrow \infty \). Let

$$\begin{aligned} \psi ^{(0)}(k) = \sum _{i=0}^{k-1} \psi (i) \quad \text {and, for }k \ge 1, \quad \varphi (k) = \psi (k)-\psi (k-1) . \end{aligned}$$

Proceeding exactly as in the proof of [11, Lemma 3.1] and applying the result of [21], we infer that

$$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}_{\mathcal {A}}\bigl ( \psi ^{(0)}(h_{\mathcal {A}}) \bigr )< \infty \Rightarrow {{\,\mathrm{\mathbb {E}}\,}}(\psi (T)) < \infty . \end{aligned}$$

This last assertion is in fact equivalent to

$$\begin{aligned} \sum _{k=1}^\infty \varphi (k) {{\,\mathrm{\mathbb {E}}\,}}_{\mathcal {A}}\bigl ( (h_{\mathcal {A}}-k)_+ \bigr )< \infty \Rightarrow \sum _{k=1}^\infty \varphi (k) {{\,\mathrm{\mathbb {P}}\,}}( T \ge k ) < \infty . \end{aligned}$$
(3.3)

Now, assume that the conclusion of Lemma 3.6 is not true. Then there exists an increasing sequence \((n_k)_{k \ge 1}\) such that \({{\,\mathrm{\mathbb {P}}\,}}( T \ge n_k) \ge k {{\,\mathrm{\mathbb {E}}\,}}_{\mathcal {A}}( (h_{\mathcal {A}}-n_k)_+)\). We then define the function \(\varphi \) as follows: \(\varphi (i)=0\) if \(i\notin \{n_k, k \ge 1\}\) and \(\varphi (n_k)= k^{-3/2}/{{\,\mathrm{\mathbb {E}}\,}}_{\mathcal {A}}( (h_{\mathcal {A}}-n_k)_+) \). For such a \(\varphi \) it is clear that the left-hand sum in (3.3) is finite, while the choice of \(n_k\) implies that the right-hand sum is \(+\infty \). This leads to a contradiction and proves the lemma, provided we check that the function \(\psi \) defined by \(\psi (k) =\psi (0) + \varphi (1)+ \cdots + \varphi (k)\) belongs to \(\Lambda _0\). Hence, it remains to check that \(\log (\psi (n_k))/n_k\) tends to 0 as \(k\rightarrow \infty \). Now, by the definition of \(\varphi \), \(\psi (n_k)-\psi (0) \le C\sqrt{k}/{{\,\mathrm{\mathbb {E}}\,}}_{\mathcal {A}}( (h_{\mathcal {A}}-n_k)_+)\). On the other hand, the assumption on \(h_{\mathcal {A}}\) implies that \({\mathbb E}_{\mathcal {A}}( (h_{\mathcal {A}}-n_k)_+) \ge \exp (-n_k {\varepsilon }'(n_k))\) for some sequence \({\varepsilon }'(n)\) of positive reals tending to 0 as \(n \rightarrow \infty \). This implies that \(\log (\psi (n_k))/n_k\) tends to 0 as \(k\rightarrow \infty \), and the proof is complete. \(\square \)

3.5 Proof of Proposition 3.4 when \(c^2 > 0\)

By the previous considerations, it suffices to prove the ASIP for the partial sums associated with \((X_k)_{k \in {\mathbb {Z}}}\) defined by (3.2) with \(\psi \) bounded and Hölder continuous, by taking into account Proposition 3.5.

Without loss of generality, one can assume that \(v :\Delta \rightarrow {\mathbb {R}}\) is Lipschitz and that \(\Vert v\Vert _{{{\,\textrm{Lip}\,}}} \le 1\), where \(\Vert v\Vert _{{{\,\textrm{Lip}\,}}} = \sup _x |v(x)| + \sup _{x \ne y} |v(x) - v(y)| / d(x,y)\). Recall that \(\psi :{\mathcal {G}}\rightarrow {\mathbb {R}}\), \(\psi = v \circ \pi _G^{-1}\). Then \(\psi \) is also Lipschitz, in particular

$$\begin{aligned} | \psi (g) - \psi (g') | \le \xi ^{s(g,g')} . \end{aligned}$$

We assume also in the rest of this section that the Markov chain \((g_n)_{n \in {\mathbb {Z}}}\) is aperiodic. If it is not the case, we modify the proof as in [11, Appendix A].

The proof follows the lines of [11, Section 4.2] or [12, Section 3] with the same notation, but with the following changes: [11, Proposition 3.2] and [12, Proposition 2.3] are replaced by Proposition 3.7 below, [11, Inequality 4.10] by our Proposition 3.8, and [11, Lemma 3.3] by our Lemma 3.9. Moreover, in the case where \({{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}(h_{\mathcal {A}}\ge n) = O(n^{-\beta } (\log n)^\gamma )\), with \(\beta > 2\) and \(\gamma \in {\mathbb R}\), we take \(m=m_{\ell } = [3^{\ell /\beta } \ell ^{\kappa }]\) with \(\kappa > (1+\gamma )/\beta \). In case \(\int h_{\mathcal {A}}^\beta \, d{{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}< \infty \) with \(\beta > 2\), we select \(m=m_{\ell } = [3^{\ell /\beta } ]\), and if \(\int e^{\beta h_{\mathcal {A}}^\delta } \, d{{\,\mathrm{\mathbb {P}}\,}}_{\mathcal {A}}< \infty \) with \(\beta > 0\) and \(\delta \in (0,1]\), we take \(m=m_{\ell } = [\kappa \ell ^{1/\delta } ] \), for a suitable \(\kappa \).

To state the key Propositions 3.7 and 3.8, we need to introduce some notation.

Fix \(k \in {\mathbb {Z}}\) and \(m > 0\), and let \(({\varepsilon }'_n)_n\) be a copy of \(({\varepsilon }_n)_n\) independent of \((g_n, {\varepsilon }_n)_n\). Define

$$\begin{aligned} {\tilde{g}}_\ell = {\left\{ \begin{array}{ll} g_\ell , &{} \quad \ell \le k + m , \\ U( g_{\ell - 1}, {\varepsilon }'_\ell ) ,&{} \quad \ell = k + m +1 , \\ U( {\tilde{g}}_{\ell - 1}, {\varepsilon }'_\ell ), &{} \quad \ell \ge k + m +2 \end{array}\right. } \qquad \text {and} \qquad X_{m,k} = {{\,\mathrm{\mathbb {E}}\,}}_g \psi ( ({\tilde{g}}_{n+k})_{n \in {\mathbb {Z}}} ) , \end{aligned}$$

where \({{\,\mathrm{\mathbb {E}}\,}}_g\) is the conditional expectation given \((g_n, {\varepsilon }_n)\).

Let

$$\begin{aligned} T = T_{g_0, g'_0}(({\varepsilon }_n)_{n\ge 1}) \end{aligned}$$
(3.4)

be the meeting time of two copies of the Markov chain driven by the same innovations \(({\varepsilon }_n)_{n \ge 1}\) but started from two independent starting points \(g_0\) and \(g_0'\).

Proposition 3.7

  1.

    Assume that \({{\,\mathrm{\mathbb {E}}\,}}(T) < \infty \). Then, for every \(r \ge 1\),

    $$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}| X_{m,k} - X_k | \ll m^{-r/2} + {{\,\mathrm{\mathbb {P}}\,}}( T \ge \lfloor m / r \rfloor ) . \end{aligned}$$
  2.

    Assume that there exist \(\delta >0\) and \(\gamma \in ]0,1]\) such that \({{\,\mathrm{\mathbb {P}}\,}}( T \ge n) = O ( \textrm{e}^{- \delta n^{\gamma }} )\). Then there exists \(\delta ' >0\) such that

    $$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}| X_{m,k} - X_k | = O ( \textrm{e}^{- \delta ' m^{\gamma }} ) . \end{aligned}$$

Proof

See [11, Proposition 3.2] and [12, Proposition 2.3]. \(\square \)

Now define

$$\begin{aligned} {\widetilde{X}}_{m,k} = {{\,\mathrm{\mathbb {E}}\,}}( X_{m,k} \mid {\varepsilon }_{k-m}, \ldots , {\varepsilon }_{k+m} ) . \end{aligned}$$
(3.5)

Proposition 3.8

Let \(\theta ^{-}_u = \sum _{k=0}^{u-1} \textbf{1}_{\{ g_{-k} \in G_0 \}}\) and \(\theta ^{+}_u = \sum _{k=0}^{u-1} \textbf{1}_{\{ g_{k} \in G_0 \}}\). Then

$$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}| {\widetilde{X}}_{m,k} - X_{m,k} | \ll \Vert \psi \Vert _\infty {{\,\mathrm{\mathbb {P}}\,}}(T > m/2) + {{\,\mathrm{\mathbb {E}}\,}}\xi ^{ \min (\theta ^{-}_{[m/2]}, \theta ^{+}_{m} ) } , \end{aligned}$$

where T is defined by (3.4). Consequently:

  1.

    If \({{\,\mathrm{\mathbb {E}}\,}}(T) < \infty \), then, for every \(r \ge 1\),

    $$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}| {\widetilde{X}}_{m,k} - X_k | \ll m^{-r/2} + {{\,\mathrm{\mathbb {P}}\,}}( T \ge \lfloor m / r \rfloor ) . \end{aligned}$$
  2.

    If there exist \(\delta >0\) and \(\gamma \in ]0,1]\) such that \({{\,\mathrm{\mathbb {P}}\,}}( T \ge n) = O ( \textrm{e}^{- \delta n^{\gamma }} )\), then there exists \(\delta ' >0\) such that

    $$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}| {\widetilde{X}}_{m,k} - X_k | = O ( \textrm{e}^{- \delta ' m^{\gamma }} ) . \end{aligned}$$

Proof

Since \(X_{m,k}\) is determined by \((g_n)_{n \le k + m}\), we write \(X_{m,k} = h_m((g_n)_{n \le k + m})\) with some function \(h_m\).

Let \((g'_n, {\varepsilon }'_n)_{n \in {\mathbb {Z}}}\) be an independent copy of \((g_n, {\varepsilon }_n)_{n \in {\mathbb {Z}}}\). Note that \(g_n' = U (g'_{n-1}, {\varepsilon }'_n)\). Define \(( {\hat{g}}'_{\ell }) \) by

$$\begin{aligned} {\hat{g}}'_\ell = {\left\{ \begin{array}{ll} g'_\ell &{} \quad \ell < k - m , \\ U ( g'_{k-m -1}, {\varepsilon }_{k-m} ) , &{} \quad \ell = k - m , \\ U ({\hat{g}}'_{\ell -1 }, {\varepsilon }_{\ell } ) , &{} \quad \ell \ge k - m + 1 . \end{array}\right. } \end{aligned}$$

Let \({\mathcal {B}}_{k,m}\) be the sigma-algebra generated by \(\bigl ( (g_{\ell })_{\ell < k-m} , {\varepsilon }_{k-m}, \ldots , {\varepsilon }_{k+m} \bigr )\). By the properties of conditional expectation, we have

$$\begin{aligned} X_{m,k} = {{\,\mathrm{\mathbb {E}}\,}}\bigl ( h_m ((g_n)_{n \le k + m}) \mid {\mathcal {B}}_{k,m} \bigr ) \end{aligned}$$

and

$$\begin{aligned} {\widetilde{X}}_{m,k} = {{\,\mathrm{\mathbb {E}}\,}}\bigl ( h_m \bigl ( (g'_u)_{u < k - m} , ({\hat{g}}'_u)_{k - m \le u \le k+m} \bigr ) \mid {\mathcal {B}}_{k,m} \bigr ) . \end{aligned}$$

Set

$$\begin{aligned} {\widehat{T}}= T_{g_{k-m-1}, {\hat{g}}'_{k-m-1} } ( ( {\varepsilon }_{\ell } )_{\ell \ge k - m}) \end{aligned}$$

This is the meeting time of the chains \((g_n)_{n \ge k - m - 1}\) and \(({\hat{g}}'_n)_{n \ge k - m-1}\) with innovations \(( {\varepsilon }_{\ell } )_{\ell \ge k - m}\) and independent starting points \(g_{k-m-1}\) and \({\hat{g}}'_{k-m-1}\). It has the same law as T defined in (3.4). Note that

$$\begin{aligned} I&= \bigl | h_m ((g_n)_{n \le k + m}) - h_m \bigl ((g'_u)_{u \le k - m-1} , ({\hat{g}}'_u)_{k - m \le u \le k + m} \bigr ) \bigr | \\&\le 2 \Vert h_m \Vert _{\infty } {\textbf{1}}_{{\widehat{T}}>m/2} + \sum _{\ell =0}^{[m/2]} {\textbf{1}}_{{\widehat{T}}= \ell } \Bigl | h_m ((g_n)_{n \le k + m}) \\&\quad - h_m \bigl ( (g'_u)_{u \le k - m-1} , ({\hat{g}}'_u)_{k - m \le u < k - m + \ell }, g_{k - m + \ell }, \dots , g_k, \dots , g_{k+m} \bigr ) \Bigr | . \end{aligned}$$

But, from the definition of the separation distance and using \(\Vert v\Vert _{{{\,\textrm{Lip}\,}}} \le 1\), for any \(\ell \) in \(\{0, \dots , [m/2] \}\),

$$\begin{aligned}&\bigl | h_m ((g_n)_{n \le k + m}) - h_m \bigl ((g'_u)_{u \le k - m-1} , ({\hat{g}}'_u)_{k - m \le u < k-m + \ell }, g_{k - m + \ell }, \dots , g_k, \dots , g_{k+m} \bigr ) \bigr | \\&\quad \le \max \big ( \xi ^{\# \{ k - m + \ell \le i \le k : g_i \in G_0 \} } , \xi ^{\# \{ k \le i \le k +m : g_i \in G_0 \} } \big ) . \end{aligned}$$

Taking the expectation, we get

$$\begin{aligned}&{{\,\mathrm{\mathbb {E}}\,}}|X_{m,k} - {\widetilde{X}}_{m,k} | \le {{\,\mathrm{\mathbb {E}}\,}}( I ) \\&\quad \le 2 {{\,\mathrm{\mathbb {E}}\,}}\bigl ( \Vert h_m \Vert _{\infty } {\textbf{1}}_{{\widehat{T}}>m/2} \bigr ) + {{\,\mathrm{\mathbb {E}}\,}}\max \bigl ( \xi ^{\# \{ k - [m/2] \le i \le k : g_i \in G_0 \} } , \xi ^{\# \{ k \le i \le k +m : g_i \in G_0 \} } \bigr ) . \end{aligned}$$

The first part of the proposition follows by stationarity and the fact that \( \Vert h_m \Vert _{\infty } \le \Vert \psi \Vert _\infty \). To end the proof, we use the same arguments as in the proofs of [11, Proposition 3.2] and [12, Proposition 2.3]. \(\square \)

Lemma 3.9

  1.

    If \({{\,\mathrm{\mathbb {E}}\,}}(T) < \infty \), for every \(\alpha \ge 1\) and every \(k \ge 1\),

    $$\begin{aligned} | \textrm{Cov} (X_0,X_k) | \ll k^{-\alpha /2} + {{\,\mathrm{\mathbb {P}}\,}}( T \ge \lfloor k / (4 \alpha ) \rfloor ) . \end{aligned}$$
  2.

    If there exist \(\delta >0\) and \(\gamma \in ]0,1]\) such that \({{\,\mathrm{\mathbb {P}}\,}}( T \ge n) = O ( \textrm{e}^{- \delta n^{\gamma }} )\), then there exists \(\delta ' >0\) such that, for any \(k \ge 0\),

    $$\begin{aligned} | \textrm{Cov} (X_0,X_k) | = O ( \textrm{e}^{- \delta ' k^{\gamma }} ) . \end{aligned}$$

Proof

For every positive integers m and n, we have

$$\begin{aligned} | \textrm{Cov} (X_0,X_k) |&\le \Vert X_k \Vert _{\infty } \Vert X_0 - {\widetilde{X}}_{m,0} \Vert _1+ | \textrm{Cov} ( {\widetilde{X}}_{m,0} ,X_k) | \\&\le \Vert X_k \Vert _{\infty } \Vert X_0 - {\widetilde{X}}_{m,0} \Vert _1 + \Vert {\widetilde{X}}_{m,0} \Vert _{\infty } \Vert X_k - {\widetilde{X}}_{n,k} \Vert _1 + | \textrm{Cov} ( {\widetilde{X}}_{m,0} ,{\widetilde{X}}_{n,k} ) | \, . \end{aligned}$$

For every \(k \ge 2\), we select \(m= [k/2]\) and \(n = [k/2] -1\). In this case \({\widetilde{X}}_{m,0}\) and \({\widetilde{X}}_{n,k}\) are independent, implying that for any \(k \ge 2\),

$$\begin{aligned} | \textrm{Cov} (X_0,X_k) | \le \Vert \psi \Vert _{\infty } \big ( \Vert X_0 - {\widetilde{X}}_{ [k/2] ,0} \Vert _1 + \Vert X_k - {\widetilde{X}}_{ [k/2] -1,k} \Vert _1 \big ) \, . \end{aligned}$$

This upper bound combined with Proposition 3.8 proves the lemma. \(\square \)
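In the proof we used the independence of \({\widetilde{X}}_{[k/2],0}\) and \({\widetilde{X}}_{[k/2]-1,k}\); since this drives the choice of m and n, we spell out the routine window computation. By construction, \({\widetilde{X}}_{m,0}\) is a function of \(({\varepsilon }_{-m}, \dots , {\varepsilon }_{m})\) and \({\widetilde{X}}_{n,k}\) is a function of \(({\varepsilon }_{k-n}, \dots , {\varepsilon }_{k+n})\). With \(m = [k/2]\) and \(n = [k/2]-1\), using \(2 [k/2] \le k\),

$$\begin{aligned} m = [k/2] < k - [k/2] + 1 = k - n , \end{aligned}$$

so the two windows of innovations are disjoint, and independence follows from the independence of the \({\varepsilon }_i\)'s.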

3.6 Proof of Proposition 3.4 when \(c^2=0\)

As in the nonuniformly expanding case [11], our strategy is to represent \(X_k\) as a coboundary: \(X_k = z_{k-1} - z_k\), where \((z_i)\) is a sufficiently nice stationary sequence, and to bound the growth of \(z_k\) almost surely. Our proof follows well-established ideas [6, 13, 24, 30], but unlike in the nonuniformly expanding case, we have not found a result that we can cite directly.

The representation \(X_k = z_{k-1} - z_k\) is provided by the following general result, which is essentially due to Gordin [16] (see also [13, Lemma 7.1] for the case of adapted random variables; the condition (c) there is actually unnecessary).

Lemma 3.10

Suppose that \((X_k)_{k \in {\mathbb {Z}}}\) is a strictly stationary sequence of real-valued random variables that are in \(L^2\), and let \(({{\mathcal {F}}}_k)_{k \in {\mathbb {Z}}}\) be a stationary filtration. Let \({{\,\mathrm{\mathbb {E}}\,}}_i\) denote the conditional expectation given \({\mathcal {F}}_i\) and set \(S_n = X_1+ \cdots + X_n\). Assume that

  1. (a)

    \({{\,\mathrm{\mathbb {E}}\,}}_0 (S_n)\) and \( \sum _{k =0}^{n} ( X_{-k} - {{\,\mathrm{\mathbb {E}}\,}}_{0} ( X_{-k}) )\) converge in \(L^1\) as \(n \rightarrow \infty \),

  2. (b)

    \(\displaystyle \liminf _{n \rightarrow \infty } \frac{{{\,\mathrm{\mathbb {E}}\,}}|S_n| }{\sqrt{n}}=0\).

Then, almost surely, \( X_i = z_{i-1} - z_i \), where \(z_i = g_i-h_i\) with \(g_i = \sum _{k \ge i +1} {{\,\mathrm{\mathbb {E}}\,}}_{i} ( X_k)\) and \(h_i = \sum _{k \le i } ( X_{k} - {{\,\mathrm{\mathbb {E}}\,}}_{i} ( X_{k}) )\). All \(z_i, g_i, h_i\) are in \(L^1\).

Proof

Set \(d_{i} = \sum _{k \in {\mathbb Z}} P_i (X_{k+i}) \) where \(P_i = {{\,\mathrm{\mathbb {E}}\,}}_i - {{\,\mathrm{\mathbb {E}}\,}}_{i-1} \). Then

$$\begin{aligned} X_i = d_{i } + z_{i-1} - z_{i} . \end{aligned}$$
(3.6)
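Although standard, the verification of (3.6) is a short computation which we include for convenience. From the definitions of \(g_i\) and \(h_i\),

$$\begin{aligned} z_{i-1} - z_i&= (g_{i-1} - g_i) + (h_i - h_{i-1}) \\&= {{\,\mathrm{\mathbb {E}}\,}}_{i-1} (X_i) - \sum _{k \ge i+1} P_i (X_k) + X_i - {{\,\mathrm{\mathbb {E}}\,}}_{i} (X_i) - \sum _{k \le i-1} P_i (X_k) \\&= X_i - \sum _{k \in {\mathbb {Z}}} P_i (X_k) = X_i - d_i , \end{aligned}$$

where all the series converge in \(L^1\) by assumption (a) and stationarity.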

By assumption, all random variables in the above decomposition are in \(L^1\). Moreover, \((d_{i})_{i \in {\mathbb {Z}}}\) is a stationary martingale difference sequence. Let us prove that the \(d_i\)'s are almost surely equal to zero. To this end, we proceed as in [14]. By Burkholder's deviation inequality for the martingale square function [8] (see [14, Lemma 6] for an easy reference), there exists a positive constant C such that, for any \(\lambda >0\),

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}\Big ( n^{-1} \sum _{i=1}^n d_i^2 > \lambda ^2 \Big ) \le C \lambda ^{-1} \frac{ {{\,\mathrm{\mathbb {E}}\,}}\big | \sum _{i=1}^n d_i \big |}{ \sqrt{n}} . \end{aligned}$$

But, by (3.6), \( {{\,\mathrm{\mathbb {E}}\,}}\big | \sum _{i=1}^n d_i \big | \le {{\,\mathrm{\mathbb {E}}\,}}|S_n| + 2 {{\,\mathrm{\mathbb {E}}\,}}|z_0| \). Using part (b), there exists a subsequence \((n_k)_{k>0}\) tending to infinity such that \(n_k^{-1} \sum _{i=1}^{n_k} d_i^2\) converges to zero in probability as \(k \rightarrow \infty \). On the other hand, by the ergodic theorem, since \((d_i^2)_{i \in {\mathbb Z}}\) is a stationary sequence of nonnegative r.v.’s, \(n^{-1} \sum _{i=1}^n d_i^2\) converges almost surely to \({{\,\mathrm{\mathbb {E}}\,}}(d_0^2 | \mathcal {I} ) \) where \(\mathcal {I}\) is the so-called invariant \(\sigma \)-field. So, overall, \( {{\,\mathrm{\mathbb {E}}\,}}(d_0^2 | \mathcal {I} ) =0\) almost surely and \({{\,\mathrm{\mathbb {E}}\,}}(d_0^2) = {{\,\mathrm{\mathbb {E}}\,}}( {{\,\mathrm{\mathbb {E}}\,}}(d_0^2 | \mathcal {I} ))=0\). This proves that \(d_0=0\) almost surely. \(\square \)

Now we return to the specific process we are interested in: \(X_k = \psi ( (g_{k+\ell })_{\ell \in {\mathbb {Z}}})\) as in (3.2). Let \({\mathcal {F}}_k = \sigma ( {\varepsilon }_i, i \le k)\).

Proposition 3.11

Suppose that \({{\,\mathrm{\mathbb {E}}\,}}(T) < \infty \) and \(c^2 = 0\). Then the assumptions of Lemma 3.10 are satisfied.

Proof

Recall the construction of \(X_{m,k}\) and \({\widetilde{X}}_{m,k} = {{\,\mathrm{\mathbb {E}}\,}}( X_{m,k} | {\varepsilon }_{k-m}, \ldots , {\varepsilon }_{k+m})\). For \(k \ge 1\), let \(X_{k}^*={\widetilde{X}}_{m_{k} ,k} \) with \(m_k = \lfloor k/2 \rfloor \). Since \(X_{k}^*\) is a function of \({\varepsilon }_{k-m_k}, \ldots , {\varepsilon }_{k+m_k}\) and \(k - m_k \ge 1\), it is independent of \({\mathcal {F}}_{0}\); it is also centered. To prove the first part of assumption (a), that is the convergence of \({{\,\mathrm{\mathbb {E}}\,}}_0 (S_n)\), write

$$\begin{aligned} \sum _{k \ge 2} \Vert {{\,\mathrm{\mathbb {E}}\,}}_{0} ( X_k) \Vert _1 \le \sum _{k \ge 2} \Vert X_k - X_{k}^* \Vert _1 , \end{aligned}$$

and use Proposition 3.8 to prove that the right hand side above converges.
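The inequality in the last display uses that \(X_k^*\) is centered and independent of \({\mathcal {F}}_0\), so that

$$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}_0 (X_k) = {{\,\mathrm{\mathbb {E}}\,}}_0 (X_k - X_k^*) \quad \text {and hence} \quad \Vert {{\,\mathrm{\mathbb {E}}\,}}_0 (X_k) \Vert _1 \le \Vert X_k - X_k^* \Vert _1 \end{aligned}$$

by the \(L^1\)-contractivity of conditional expectation.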

Next we prove the second part of assumption (a). For \(k \ge 0\) let \(X_{-k}^* = {\widetilde{X}}_{\lfloor k/2 \rfloor , - k}\). Clearly \(X_{-k}^*\) is \({\mathcal {F}}_0\)-measurable. Hence

$$\begin{aligned}{} & {} \sum _{k \ge 0} \Vert X_{-k} - {{\,\mathrm{\mathbb {E}}\,}}_0 (X_{-k} ) \Vert _1\\{} & {} = \sum _{k \ge 0} \Vert X_{-k} - X_{-k}^* - {{\,\mathrm{\mathbb {E}}\,}}_0 (X_{-k} - X_{-k}^* ) \Vert _1 \le 2 \sum _{k \ge 0} \Vert X_{-k} - X_{-k}^* \Vert _1 . \end{aligned}$$

By another application of Proposition 3.8, for any \(r \ge 1\),

$$\begin{aligned} \sum _{k \ge 2} \Vert X_{-k} - {{\,\mathrm{\mathbb {E}}\,}}_0 (X_{-k} ) \Vert _1 \ll \sum _{k \ge 2} \bigl ( k^{-r/2} + {{\,\mathrm{\mathbb {P}}\,}}( T \ge \lfloor k /(2 r) \rfloor ) \bigr ) . \end{aligned}$$

Choosing \(r > 2\), the second part of assumption (a) follows.
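Indeed, for \(r > 2\) both series on the right-hand side converge, by the following elementary tail computation:

$$\begin{aligned} \sum _{k \ge 2} k^{-r/2}< \infty \quad \text {and} \quad \sum _{k \ge 2} {{\,\mathrm{\mathbb {P}}\,}}( T \ge \lfloor k / (2 r) \rfloor ) \le (2r+1) \sum _{j \ge 0} {{\,\mathrm{\mathbb {P}}\,}}( T \ge j ) \le (2r+1) \bigl ( 1 + {{\,\mathrm{\mathbb {E}}\,}}(T) \bigr ) < \infty , \end{aligned}$$

since at most \(2r+1\) integers k share a given value of \(\lfloor k/(2r) \rfloor \) and \(\sum _{j \ge 1} {{\,\mathrm{\mathbb {P}}\,}}( T \ge j ) \le {{\,\mathrm{\mathbb {E}}\,}}(T)\).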

We turn now to the proof of assumption (b). By Lemma 3.9, since \({{\,\mathrm{\mathbb {E}}\,}}(T) < \infty \), we have \(\sum _{k \ge 0} \bigl | \textrm{Cov} (X_0, X_k) \bigr | < \infty \); hence \(c^2=\lim _{n \rightarrow \infty } n^{-1} {{\,\mathrm{\mathbb {E}}\,}}(S_n^2)\) is well defined, and by hypothesis it is zero. It follows that assumption (b) is satisfied. \(\square \)
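For the reader's convenience, we recall the computation behind the last step. Since the \(X_k\) are centered, stationarity gives

$$\begin{aligned} \frac{1}{n} {{\,\mathrm{\mathbb {E}}\,}}(S_n^2) = {{\,\mathrm{\mathbb {E}}\,}}(X_0^2) + 2 \sum _{k=1}^{n-1} \Bigl ( 1 - \frac{k}{n} \Bigr ) \textrm{Cov} (X_0,X_k) \rightarrow {{\,\mathrm{\mathbb {E}}\,}}(X_0^2) + 2 \sum _{k \ge 1} \textrm{Cov} (X_0,X_k) = c^2 \end{aligned}$$

by dominated convergence; when \(c^2 = 0\), the Cauchy–Schwarz inequality yields \({{\,\mathrm{\mathbb {E}}\,}}|S_n| \le ( {{\,\mathrm{\mathbb {E}}\,}}(S_n^2) )^{1/2} = o(\sqrt{n})\), which is assumption (b).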

At this point, as long as \(c^2 = 0\), we have the representation

$$\begin{aligned} S_n = X_1 + \cdots + X_n = z_0 - z_n \end{aligned}$$

with \(z_n\) as in Lemma 3.10. Thus the proof of Proposition 3.4 is reduced to estimating \(z_n\) almost surely, and is completed by:

Proposition 3.12

Suppose that \({{\,\mathrm{\mathbb {E}}\,}}(T) < \infty \) and \(c^2 = 0\), and let \(z_i\) be as in Lemma 3.10.

  1.

    If \({{\,\mathrm{\mathbb {E}}\,}}(T^{p}) < \infty \) for some \(p>1\), then \( \Vert z_0 \Vert _p < \infty \) and \(S_n = o (n^{1/(p+1)})\) a.s.

  2.

    If \({{\,\mathrm{\mathbb {E}}\,}}(\psi _p(T) ) < \infty \) with \(\psi _p(x) = x^{p} ( \log x )^{ - ( \gamma + 1 + {\varepsilon })}\) for some \({\varepsilon }>0\), then \(S_n = o (n^{1/(p+1)} ( \log n)^{ ( \gamma + 1 + {\varepsilon }) /(p+1) } )\) a.s.

  3.

    If \({{\,\mathrm{\mathbb {P}}\,}}( T \ge n) = O ( \textrm{e}^{- \delta n^{\gamma }} )\) with some \(\delta >0\) and \(\gamma \in ]0,1]\), then \(S_n = O ( ( \log n)^{1/ \gamma } )\) a.s.

Proof

Denote \(M = \Vert X_0\Vert _\infty = \Vert \psi \Vert _\infty \).

We begin with part 1: let \(p > 1\) and suppose that \({{\,\mathrm{\mathbb {E}}\,}}(T^p) < \infty \). By Proposition 3.8,

$$\begin{aligned} \sum _{k >0} k^{p-1} \bigl ( \Vert X_k - {\widetilde{X}}_{ [k/2] ,k} \Vert _1 + \Vert X_{-k} - {\widetilde{X}}_{ [k/2] ,-k} \Vert _1 \bigr ) < \infty . \end{aligned}$$
(3.7)

First we use (3.7) to prove that \(z_0\) is in \(L^p\). Recall that \(z_0 = g_0 - h_0\) where \(g_0 = \sum _{k \ge 1} {{\,\mathrm{\mathbb {E}}\,}}_{0} ( X_k)\) and \(h_0 = \sum _{k \le 0 } ( X_{k} - {{\,\mathrm{\mathbb {E}}\,}}_{0} ( X_{k}) ) \). Observe that for \(k \ge 2\), these summands can be bounded in \(L^1\) by those in (3.7): \(\Vert {{\,\mathrm{\mathbb {E}}\,}}_0 (X_k) \Vert _1 \le \Vert X_k - {\widetilde{X}}_{ [k/2] ,k} \Vert _1\) and \(\Vert X_{-k} - {{\,\mathrm{\mathbb {E}}\,}}_{0} ( X_{-k}) \Vert _1 \le 2 \Vert X_{-k} - {\widetilde{X}}_{ [k/2] ,-k} \Vert _1\). Now, \(g_0\), \(h_0\) and consequently \(z_0\) are in \(L^p\) by an application of Lemma 3.13 with \(V=g_0\) and \(V=h_0\) respectively. (To prove that \(g_0\) is in \(L^p\) we can also use the arguments given in the proof of [13, Proposition 2.1].)

To prove that \(S_n = o (n^{1/(p+1)})\) a.s., it suffices to show that, for every \({\varepsilon }>0\), \( \sum _{\ell>0} {{\,\mathrm{\mathbb {P}}\,}}\bigl ( \max _{k \le 2^{\ell }} |S_k| > {\varepsilon }2^{\ell /(p+1)} \bigr ) < \infty \), which is equivalent to proving that

$$\begin{aligned} \sum _{n>1} n^{-1} {{\,\mathrm{\mathbb {P}}\,}}\bigl ( \max _{k \le n} |S_k| > {\varepsilon }n^{1/(p+1)} \bigr ) < \infty . \end{aligned}$$
(3.8)
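The reduction to (3.8) is the usual dyadic blocking argument, which we sketch. Grouping the integers \(n \in (2^{\ell }, 2^{\ell +1}]\) and using that, on each such block, \(n^{-1} \ge 2^{-\ell -1}\), \(\max _{k \le n} |S_k| \ge \max _{k \le 2^{\ell }} |S_k|\) and \(n^{1/(p+1)} \le 2^{(\ell +1)/(p+1)}\), we obtain

$$\begin{aligned} \sum _{n>1} n^{-1} {{\,\mathrm{\mathbb {P}}\,}}\bigl ( \max _{k \le n} |S_k|> {\varepsilon }n^{1/(p+1)} \bigr ) \ge \frac{1}{2} \sum _{\ell \ge 1} {{\,\mathrm{\mathbb {P}}\,}}\bigl ( \max _{k \le 2^{\ell }} |S_k| > {\varepsilon }2^{(\ell +1)/(p+1)} \bigr ) . \end{aligned}$$

Since \({\varepsilon }>0\) is arbitrary, (3.8) implies the summability of the dyadic series, and the almost sure bound then follows by the Borel–Cantelli lemma.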

Let \(k \le n\) and write

$$\begin{aligned} S_k = S_k - {{\,\mathrm{\mathbb {E}}\,}}_k(S_k) + {{\,\mathrm{\mathbb {E}}\,}}_k(S_k-S_n) + {{\,\mathrm{\mathbb {E}}\,}}_k(S_n) . \end{aligned}$$

Let \(q \ge 1\). Using \({{\,\mathrm{\mathbb {E}}\,}}_k( X_i - {{\,\mathrm{\mathbb {E}}\,}}_{i-q} (X_i) ) =0 \) if \(i -q \ge k\), we have

$$\begin{aligned} \big | {{\,\mathrm{\mathbb {E}}\,}}_k(S_n-S_k) \big |&\le \Bigl | \sum _{i=k+1}^n {{\,\mathrm{\mathbb {E}}\,}}_k( X_i - {{\,\mathrm{\mathbb {E}}\,}}_{i-q} (X_i) ) \Bigr | + \Bigl | \sum _{i=k+1}^n {{\,\mathrm{\mathbb {E}}\,}}_k( {{\,\mathrm{\mathbb {E}}\,}}_{i-q} (X_i) ) \Bigr | \\&\le 2qM + \sum _{i=k+1}^n | {{\,\mathrm{\mathbb {E}}\,}}_k ( {{\,\mathrm{\mathbb {E}}\,}}_{i-q} (X_i) ) | . \end{aligned}$$

Then, using \({{\,\mathrm{\mathbb {E}}\,}}_k( {{\,\mathrm{\mathbb {E}}\,}}_{\ell + q } (X_{\ell } ) ) = {{\,\mathrm{\mathbb {E}}\,}}_{\ell + q } (X_{\ell } ) \) for \(\ell \le k-q\),

$$\begin{aligned} |S_k - {{\,\mathrm{\mathbb {E}}\,}}_k(S_k)|&\le \Bigl | \sum _{\ell =1}^{k - q} \bigl ( X_{\ell } - {{\,\mathrm{\mathbb {E}}\,}}_{\ell + q} X_\ell - {{\,\mathrm{\mathbb {E}}\,}}_k( X_{\ell } - {{\,\mathrm{\mathbb {E}}\,}}_{\ell + q} X_\ell ) \bigr ) \Bigr | + 2 q M \\&\le \sum _{\ell =1}^{k - q} \vert X_{\ell } - {{\,\mathrm{\mathbb {E}}\,}}_{\ell + q } (X_{\ell } ) \vert + \sum _{\ell =1}^{k - q} {{\,\mathrm{\mathbb {E}}\,}}_k \vert X_{\ell } - {{\,\mathrm{\mathbb {E}}\,}}_{\ell + q } (X_{\ell } ) \vert + 2 q M . \end{aligned}$$

Overall,

$$\begin{aligned}{} & {} \max _{1 \le k \le n} | S_k | \le 4 q M + \max _{1 \le k \le n} | {{\,\mathrm{\mathbb {E}}\,}}_k ( S_n) | + \max _{1 \le k \le n} \sum _{i=1}^n | {{\,\mathrm{\mathbb {E}}\,}}_k ( {{\,\mathrm{\mathbb {E}}\,}}_{i-q} (X_i) ) | \\{} & {} \quad + \sum _{\ell =1}^{n} \vert X_{\ell } - {{\,\mathrm{\mathbb {E}}\,}}_{\ell + q } (X_{\ell } ) \vert + \max _{1 \le k \le n} \sum _{\ell =1}^{n} {{\,\mathrm{\mathbb {E}}\,}}_k \vert X_{\ell } - {{\,\mathrm{\mathbb {E}}\,}}_{\ell + q } (X_{\ell } ) \vert . \end{aligned}$$

Accordingly, if \(x > 0\) and \(qM \le x\), then

$$\begin{aligned} \begin{aligned}&{{\,\mathrm{\mathbb {P}}\,}}\bigl ( \max _{1 \le k \le n} |S_k|> 8x \bigr ) \le {{\,\mathrm{\mathbb {P}}\,}}\bigl ( \max _{1 \le k \le n} | {{\,\mathrm{\mathbb {E}}\,}}_k ( S_n) |> x \bigr ) + {{\,\mathrm{\mathbb {P}}\,}}\Bigl ( \max _{1 \le k \le n} \sum _{i=1}^n | {{\,\mathrm{\mathbb {E}}\,}}_k ( {{\,\mathrm{\mathbb {E}}\,}}_{i-q} (X_i) ) |> x \Bigr ) \\&\quad + {{\,\mathrm{\mathbb {P}}\,}}\Bigl ( \sum _{\ell =1}^{n} \vert X_{\ell } - {{\,\mathrm{\mathbb {E}}\,}}_{\ell + q } (X_{\ell } ) \vert>x \Bigr ) + {{\,\mathrm{\mathbb {P}}\,}}\Bigl ( \max _{1 \le k \le n} \sum _{\ell =1}^{n} {{\,\mathrm{\mathbb {E}}\,}}_k \vert X_{\ell } - {{\,\mathrm{\mathbb {E}}\,}}_{\ell + q } (X_{\ell } ) \vert >x \Bigr ) . \end{aligned} \end{aligned}$$
(3.9)

Let

$$\begin{aligned} \theta (q) = \Vert X_0 - {\widetilde{X}}_{q ,0} \Vert _1 . \end{aligned}$$

Starting from (3.9) and using Doob's maximal inequality, as done in the proof of inequality (A.42) in [25], we infer that for any nondecreasing, nonnegative and convex function \(\varphi \) and any \(x > 0\),

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}\Bigl ( \max _{1 \le k \le n} |S_k| > 8x \Bigr ) \ll \frac{{{\,\mathrm{\mathbb {E}}\,}}( \varphi (|S_n|) ) }{ \varphi (x) } + n x^{-1} \theta (\lfloor x/M \rfloor ) . \end{aligned}$$
(3.10)
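For instance, the first term in (3.9) is handled as follows: \(( {{\,\mathrm{\mathbb {E}}\,}}_k (S_n) )_{1 \le k \le n}\) is a martingale, so \(( \varphi ( | {{\,\mathrm{\mathbb {E}}\,}}_k (S_n) | ) )_{1 \le k \le n}\) is a nonnegative submartingale by Jensen's inequality, and Doob's maximal inequality gives

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}\Bigl ( \max _{1 \le k \le n} | {{\,\mathrm{\mathbb {E}}\,}}_k ( S_n) | > x \Bigr ) \le \frac{ {{\,\mathrm{\mathbb {E}}\,}}\bigl ( \varphi ( | {{\,\mathrm{\mathbb {E}}\,}}_n (S_n) | ) \bigr ) }{ \varphi (x) } \le \frac{ {{\,\mathrm{\mathbb {E}}\,}}( \varphi ( |S_n| ) ) }{ \varphi (x) } , \end{aligned}$$

using Jensen's inequality once more in the last step. The remaining three terms in (3.9) are treated by Markov's inequality together with bounds on \(\Vert {{\,\mathrm{\mathbb {E}}\,}}_{i-q} (X_i) \Vert _1\) and \(\Vert X_{\ell } - {{\,\mathrm{\mathbb {E}}\,}}_{\ell + q} (X_{\ell }) \Vert _1\) in terms of \(\theta \), applied with \(q = \lfloor x/M \rfloor \); we refer to [25] for the details.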

Then, taking \(\varphi (x) =x^p\),

$$\begin{aligned} \sum _{n>1} n^{-1} {{\,\mathrm{\mathbb {P}}\,}}\bigl ( \max _{k \le n} |S_k|> {\varepsilon }n^{1/(p+1)} \bigr )&\ll \sum _{n> 1} n^{-1-\frac{p}{p+1}} \Vert S_n \Vert _p^p + \sum _{n> 1} n^{-1/(p+1)} \theta (\lfloor {\varepsilon }n^{1/(p+1)} / M \rfloor )\\&\ll \sup _n \Vert S_n \Vert _p^p + \sum _{n >1} n^{p-1} \theta (n) . \end{aligned}$$

The expression above is finite: \(\Vert S_n\Vert _p \le 2 \Vert z_0\Vert _p < \infty \) and \(\sum _{n >1} n^{p-1} \theta (n) < \infty \) by Proposition 3.8. This verifies (3.8) and completes the proof of part 1.
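The second \(\ll \) in the previous display is the change of variable \(m = \lfloor {\varepsilon }n^{1/(p+1)} / M \rfloor \) (applied, strictly speaking, to the monotone majorant of \(\theta \) coming from Proposition 3.8): up to constants depending on \({\varepsilon }\), M and p, about \(m^{p}\) consecutive integers n give the same value of m, and each of them contributes about \(m^{-1} \theta (m)\), so that

$$\begin{aligned} \sum _{n > 1} n^{-1/(p+1)} \theta ( \lfloor {\varepsilon }n^{1/(p+1)} / M \rfloor ) \ll \sum _{m \ge 1} m^{p} \cdot m^{-1} \theta (m) = \sum _{m \ge 1} m^{p-1} \theta (m) . \end{aligned}$$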

We now turn to part 2. By part 1 applied with exponent r, \(\sup _n \Vert S_n \Vert _r < \infty \) for every \(r \in [1, p[\). Hence, with \(b_n = n^{1/(p+1)} (\log n)^{ ( {\varepsilon }+ \gamma +1 )/(p+1) }\), applying (3.10) with \(\varphi (x) = x^r\), it follows that for any \(\eta >0\),

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}( \max _{1 \le k \le n} |S_k|> \eta b_n) \ll b_n^{-r} + \min \Bigl ( 1, \frac{n}{ \eta b_n} {{\,\mathrm{\mathbb {P}}\,}}(T >c \eta b_n) \Bigr ) , \end{aligned}$$

where c is a positive constant which does not depend on n. Part 2 is proved if we show that \(\sum _{n \ge 1} b_n^{-1} {{\,\mathrm{\mathbb {P}}\,}}(T > c \eta b_n) < \infty \). By a change of variable (as in part 1) this is equivalent to \(\sum _{n \ge 2} n^{p-1} (\log n )^{ - ( {\varepsilon }+ \gamma +1 ) } {{\,\mathrm{\mathbb {P}}\,}}(T > n) < \infty \), which holds because \( {{\,\mathrm{\mathbb {E}}\,}}(\psi _p (T)) < \infty \).

We turn now to the proof of part 3. Let \(K>0\), to be chosen later. By (3.10) applied with \(x= K(\log n)^{1/\gamma }\) and \(\varphi (u)=u^{2}\), using Lemma 3.13, our assumption and part 2 of Proposition 3.8, we see that

$$\begin{aligned} \frac{1}{n} {{\,\mathrm{\mathbb {P}}\,}}\bigl ( \max _{1\le k\le n} |S_k|>8K(\log n)^{1/\gamma } \bigr )&\ll \frac{1}{n (\log n)^{2/\gamma }} + \frac{1}{K(\log n)^{1/\gamma }} \textrm{e}^{-\delta ' (K(\log n)^{1/\gamma }/M)^\gamma } \\&\ll \frac{1}{n (\log n)^2} +\frac{1}{n^{\delta '(K/M)^\gamma }} . \end{aligned}$$

We conclude by taking \(K =2M\delta '^{-1/\gamma }\), for which \(\delta ' (K/M)^{\gamma } = 2^{\gamma } > 1\): the right-hand side is then summable in n, and the claim follows by the Borel–Cantelli lemma. \(\square \)

Lemma 3.13

Let \((V_k)_{k\in {\mathbb {N}}}\) be a sequence of nonnegative random variables uniformly bounded by M. Set \(V = \sum _{k\in {\mathbb {N}}}V_k\). Then, for every \(p>1\), we have

$$\begin{aligned} \Vert V\Vert _p \le M+ \sum _{k\in {\mathbb {N}}} k^{p-1}\Vert V_k\Vert _1 . \end{aligned}$$
(3.11)

Proof

Recall that \(\Vert V\Vert _p=\sup _{\Vert Z\Vert _q=1}{{\,\mathrm{\mathbb {E}}\,}}(ZV)\), where \(q=p/(p-1)\). Let Z be a random variable with \(\Vert Z\Vert _q=1\). Since \(0 \le V_k \le M\), we have the pointwise bound \(|Z| V_k \le k^{p-1} V_k + M |Z| \textbf{1}_{\{|Z|\ge k^{p-1}\}}\) for each k. Then

$$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}(|ZV|)&\le \sum _{k\in {\mathbb {N}}} k^{p-1}{{\,\mathrm{\mathbb {E}}\,}}(|V_k|) + \sum _{k\in {\mathbb {N}}} M {{\,\mathrm{\mathbb {E}}\,}}\bigl ( |Z| \textbf{1}_{\{|Z|\ge k^{p-1}\}} \bigr ) \\&= \sum _{k\in {\mathbb {N}}} k^{p-1}{{\,\mathrm{\mathbb {E}}\,}}(|V_k|) + M{{\,\mathrm{\mathbb {E}}\,}}\Bigl ( |Z|\sum _{1\le k\le |Z|^{1/(p-1)}} 1 \Bigr ) \\&\le \sum _{k\in {\mathbb {N}}} k^{p-1}{{\,\mathrm{\mathbb {E}}\,}}(|V_k|) + M{{\,\mathrm{\mathbb {E}}\,}}(|Z|^q)\, , \end{aligned}$$

and the lemma is proved. \(\square \)
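To illustrate how Lemma 3.13 is applied in the proof of Proposition 3.12, take \(V_k = | {{\,\mathrm{\mathbb {E}}\,}}_0 (X_k) |\) for \(k \ge 1\), so that \(V_k \le M\) and \(|g_0| \le V = \sum _{k \ge 1} V_k\). Then, by (3.11) and (3.7),

$$\begin{aligned} \Vert g_0 \Vert _p \le \Vert V \Vert _p \le M + \sum _{k \ge 1} k^{p-1} \Vert {{\,\mathrm{\mathbb {E}}\,}}_0 (X_k) \Vert _1 \le 2M + \sum _{k \ge 2} k^{p-1} \Vert X_k - {\widetilde{X}}_{ [k/2] ,k} \Vert _1 < \infty . \end{aligned}$$

The bound for \(h_0\) is obtained in the same way with \(V_k = | X_{-k} - {{\,\mathrm{\mathbb {E}}\,}}_0 ( X_{-k} ) |\), which is bounded by 2M.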