1 Introduction

Throughout this paper, we assume that \(\varvec{\Upsilon }=(\Upsilon _n)_{n\in {{\mathbb {N}}}}\) is a Markov chain defined on a probability space \((\Omega , {{\mathcal {F}}}, {\mathbb {P}})\), taking values in a measurable (countably generated) space \(({{\mathcal {X}}},{{\mathcal {B}}})\), with a transition function \(P:{{\mathcal {X}}}\times {{\mathcal {B}}}\rightarrow [0,1]\). Moreover, we assume that \(\varvec{\Upsilon }\) is \(\varvec{\psi }\)-irreducible and aperiodic and admits a unique invariant probability measure \(\pi \). As usual for any initial distribution \(\mu \) on \(\mathcal {X}\), we will write \({\mathbb {P}}_{\mu }\left( \varvec{\Upsilon } \in \cdot \right) \) for the distribution of the chain with \(\Upsilon _0\) distributed according to the measure \(\mu \). We will denote by \(\delta _x\) the Dirac’s mass at x, and to shorten the notation, we will use \({\mathbb {P}}_{x}\) instead of \({\mathbb {P}}_{\delta _x}\).

We say that \(\varvec{\Upsilon }\) is geometrically ergodic if there exists a positive number \(\rho < 1\) and a real function \(G : {{\mathcal {X}}}\rightarrow {{\mathbb {R}}}\) such that for every starting point \(x \in {{\mathcal {X}}}\) and \(n \in {{\mathbb {N}}}\),

$$\begin{aligned} \left\| P^n (x, \cdot ) - \pi (\cdot ) \right\| _{TV} \le G(x) \rho ^n, \end{aligned}$$
(1.1)

where \(\Vert \cdot \Vert _{TV}\) denotes the total variation norm of a measure and \(P^n(\cdot , \cdot )\) is the n-step transition function of the chain. For equivalent conditions, we refer to Chapter 15 of [22].

We will be interested in tail inequalities for sums of random variables of the form

$$\begin{aligned} {\mathbb {P}}_x \left( \left| \sum _{i=0}^{n-1} f(\Upsilon _i)\right| >t\right) , \end{aligned}$$

where \(f:{{\mathcal {X}}}\rightarrow {{\mathbb {R}}}\) is a measurable real function and \(x\in \mathcal {X}\) is a starting point. Although our main results, stated in Sect. 4, do not require f to be bounded, we give here a version in the bounded case for the sake of simplicity. This version will be easier to compare to the Bernstein inequality for bounded random variables stated in Sect. 2 (cf. Theorem 2.1). Below for convenience, we set \(\log (\cdot ) = \ln (\cdot \vee e)\), where \(\ln (\cdot )\) is the natural logarithm.

Theorem 1.1

(Bernstein-like inequality for Markov chains) Let \(\varvec{\Upsilon }\) be a geometrically ergodic Markov chain with state space \({{\mathcal {X}}}\), and let \(\pi \) be its unique stationary probability measure. Moreover, let \(f:{{\mathcal {X}}}\rightarrow {{\mathbb {R}}}\) be a bounded measurable function such that \({{\mathbb {E}}}_\pi f=0\). Furthermore, let \(x\in {{\mathcal {X}}}\). Then, we can find constants \(K,\tau >0\) depending only on x and the transition probability \(P(\cdot ,\cdot )\) such that for all \(t>0\),

$$\begin{aligned} {\mathbb {P}}_x \left( \left| \sum _{i=0}^{n-1} f(\Upsilon _i)\right| >t\right) \le K \exp \left( -\frac{t^2}{32 n \sigma _{Mrv}^2+ \tau t \Vert f \Vert _\infty \log n}\right) , \end{aligned}$$

where

$$\begin{aligned} \sigma _{Mrv}^2 = \text {Var}_\pi (f(\Upsilon _0))+2\sum _{i=1}^\infty \text {Cov}_\pi (f(\Upsilon _0),f(\Upsilon _i)) \end{aligned}$$
(1.2)

denotes the asymptotic variance of the process \(\left( f(\Upsilon _i)\right) _i\).
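
To make the role of \(\sigma _{Mrv}^2\) in Theorem 1.1 more tangible, the following minimal Python sketch (our own illustration, not part of the paper's argument; the 3-state transition matrix and the function f are arbitrary choices) estimates the asymptotic variance (1.2) by a truncated sum of empirical autocovariances along one long, approximately stationary trajectory.

```python
# Sketch: estimate sigma^2_Mrv from (1.2) for a toy geometrically ergodic chain.
# All numerical choices (P, f, trajectory length, truncation lag) are illustrative.
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])              # toy transition matrix (rows sum to 1)
pi = np.linalg.matrix_power(P, 200)[0]       # stationary distribution, numerically
f = np.array([1.0, -2.0, 1.0])
f = f - pi @ f                               # center f so that E_pi f = 0

n = 200_000
x = rng.choice(3, p=pi)                      # start (approximately) from pi
traj = np.empty(n, dtype=int)
for i in range(n):
    traj[i] = x
    x = rng.choice(3, p=P[x])

y = f[traj]
acov = [np.mean(y[:-k] * y[k:]) for k in range(1, 50)]   # truncated autocovariances
sigma2_mrv = y.var() + 2 * sum(acov)
print("estimated sigma^2_Mrv :", sigma2_mrv)
print("for comparison, Var_pi:", y.var())
```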

Remark 1.2

We refer to Theorem 4.3 for a more general counterpart of Theorem 1.1 and to Theorem 4.4 for explicit formulas for K and \(\tau \).

Let us comment briefly on the method of proof. We rely on the by now classical regeneration technique of Athreya–Ney and Nummelin (see [3, 22, 23]), which allows us to split the sum in question into a random number of 1-dependent blocks of random lengths. In the context of tail inequalities, this approach has been successfully used, e.g., in [1, 2, 6, 7, 10, 12] and provides Bernstein inequalities of optimal type under the additional assumption of strong aperiodicity of the chain (corresponding to \(m=1\) in (3.1)), which ensures that the blocks are independent and allows for a reduction to inequalities for sums of i.i.d. random variables. However, in the general case the implementations of this method available in the literature lead to a loss of the correlation structure and, as a consequence, to a suboptimal sub-Gaussian coefficient in Bernstein’s inequality (in place of \(\sigma ^2_{Mrv}\)). Our main technical contribution is a regeneration-based approach which preserves the correlation structure and recovers the correct asymptotic behavior, corresponding to the CLT for Markov chains.

The organization of the article is as follows. After a brief discussion of our results (Sect. 2), we introduce the notation and provide a short description of the regeneration method (Sect. 3). Next, we state our main theorems in their full strength (Sect. 4). At the end, we present their proofs (Sect. 7). Along the way, we develop auxiliary theorems for 1-dependent random variables (Sect. 5) and bounds on the number of regenerations (Sect. 6). Some technical lemmas concerning exponential Orlicz norms are deferred to the Appendix.

2 Discussion of the Main Result

Let us start by recalling the Bernstein inequality in the i.i.d. bounded case.

Theorem 2.1

(Classical Bernstein inequality) If \((\xi _i)_i\) is a sequence of i.i.d. centered random variables such that \(\sup _i \Vert \xi _i \Vert _{\infty }\le M\), then for \(\sigma ^2 = {{\mathbb {E}}}\xi _i^2\) and any \(t > 0\),

$$\begin{aligned} {\mathbb {P}}\left( \sup _{1\le k\le n}\left| \sum _{i=1}^k \xi _i\right| \ge t\right) \le 2\exp \left( - \frac{t^2}{2n\sigma ^2+ \frac{2}{3} M t}\right) . \end{aligned}$$

Let us recall that the CLT for Markov chains (see, e.g., [9, 22, 23]) guarantees that under the assumptions and notation of Theorem 1.1, the sums \(\frac{1}{\sqrt{n}} \sum _{i=0}^{n-1} f(\Upsilon _i)\) converge in distribution to the normal distribution \({{\mathcal {N}}}(0,\sigma ^2_{Mrv})\). Thus, the inequality obtained in Theorem 1.1 reflects (up to constants) the asymptotic normal behavior of the sums \(\frac{1}{\sqrt{n}}\sum f(\Upsilon _i)\), just as the classical Bernstein inequality does in the i.i.d. context. Furthermore, the term \(\log n\) which appears in our inequality is necessary. More precisely, one can show that if the following inequality holds for all \(t > 0\):

$$\begin{aligned} {\mathbb {P}}_x \left( \left| \sum _{i=0}^{n-1} f(\Upsilon _i)\right| >t\right) \le const \cdot \exp \left( -\frac{t^2}{const \cdot n\sigma ^2 + const(x) \cdot a_n t \Vert f\Vert _\infty }\right) \nonumber \\ \end{aligned}$$
(2.1)

for some \(a_n = o(n)\) and \(\sigma \in {{\mathbb {R}}}\) (const’s stand for some absolute constants, whereas const(x) depends only on x and the Markov chain), then one must have \(\sigma ^2 \ge const \cdot \sigma _{Mrv}^2\). Moreover, it is known that for some geometrically ergodic chains \(a_n\) must grow at least logarithmically with n (see [1], Section 3.3).

Concentration inequalities for Markov chains and processes have been thoroughly studied in the literature; a (non-comprehensive) list of works concerning this topic includes [1, 2, 6, 7, 10, 11, 12, 13, 15, 16, 17, 19, 20, 24, 25, 27]. Some results are devoted to concentration for general functions of the chain (they are usually obtained under various Lipschitz or bounded difference type conditions); others specialize to additive functionals, which are the object of study in our case. Tail inequalities for additive functionals are usually counterparts of Hoeffding or Bernstein inequalities. The former do not take into account the variance of the additive functional and are expressed in terms of \(\Vert f\Vert _\infty \) only. They can often be obtained as special cases of concentration inequalities for general functions (see, e.g., [11, 24, 25]). Bernstein-type estimates of the form (2.1) are considered, e.g., in [1, 2, 6, 7, 10, 12, 13, 16, 17, 19, 20, 24, 27] and use various variance proxies \(\sigma ^2\), which do not necessarily coincide with the limiting variance \(\sigma _{Mrv}^2\). In the continuous-time case, inequalities of Bernstein type for the natural counterpart of the additive functional, involving the asymptotic variance, have been obtained under certain spectral gap or Lyapunov-type conditions in [13, 16]. For discrete-time Markov chains, the inequalities obtained in [1, 2, 7, 10, 12] by the regeneration method give (2.1) (under various types of ergodicity assumptions and with various parameters \(a_n\)) with \(\sigma ^2\), which coincides with \(\sigma _{Mrv}^2\) only under the additional assumption of strong aperiodicity of the chain.

On the other hand, the articles [19, 20, 25, 27] provide more general results, available for not necessarily Markovian sequences of random variables satisfying various types of mixing conditions. The variance proxies \(\sigma ^2\) that are used in these references are close to the asymptotic variance; however, in general they do not coincide with it. For instance, the inequality obtained in [19], which is valid in particular for geometrically ergodic chains, uses (in our notation) \(\sigma ^2 = \text {Var}_\pi (f(\Upsilon _0))+2\sum _{i=1}^\infty |\text {Cov}_\pi (f(\Upsilon _0),f(\Upsilon _i))|\). Comparing with (1.2), one can see that \(\sigma _{Mrv}^2 \le \sigma ^2\). In fact, one can construct examples in which the ratio between the two quantities is arbitrarily large or even \(\sigma _{Mrv}^2 = 0\) and \(\sigma ^2 > 0\). Reference [27] provides an inequality for uniformly geometrically ergodic processes, involving a certain implicitly defined variance proxy \(\sigma _n^2\), which may be bounded from above by \(\sigma ^2\) from [19] or by \(\text {Var}_\pi (f(\Upsilon _0))+C\Vert f\Vert _\infty {{\mathbb {E}}}_\pi |f(\Upsilon _0)|\), where C is a constant depending on the mixing properties of the process. For a fixed process, in the non-degenerate situation when the asymptotic variance is nonzero, the asymptotic variance can be substituted for \(\sigma _n^2\) at the cost of introducing additional multiplicative constants, depending on the chain and the function f.

To the best of our knowledge, Theorem 1.1 is therefore the first tail inequality available for general geometrically ergodic Markov chains (not necessarily strongly aperiodic), which (up to universal constants) reflects the correct limiting Gaussian behavior of additive functionals. The problem of obtaining an inequality of this type was posed in [2]. Let us remark that quantitative investigation of problems related to the central limit theorems for general aperiodic Markov chains seems to be substantially more difficult than for chains which are strongly aperiodic. For instance, optimal strong approximation results are still known only in the latter case [21].

3 Notation and Basic Properties

For any \(k, l \in {{\mathbb {Z}}}\), \(k \le l\), we define intervals of consecutive integers

$$\begin{aligned}{}[k, l] = \left\{ k, k + 1, \ldots , l\right\} , \quad [k, l) = \left\{ k, k + 1, \ldots , l - 1\right\} , \quad [k, \infty ) = \left\{ k, k + 1, \ldots \right\} . \end{aligned}$$

For any process \(\mathbf {X} = \left( X_i\right) _{i\in {{\mathbb {N}}}}\) and \(S \subset {{\mathbb {N}}}\), we put

$$\begin{aligned} X_{S} = \left( X_i\right) _{i \in S}, \quad \mathcal {F}^\mathbf {X} = \left( \mathcal {F}_i^\mathbf {X}\right) _{i\in {{\mathbb {N}}}}, \quad \mathcal {F}_i^\mathbf {X}= \sigma \left( X_{[0, i]}\right) . \end{aligned}$$

Moreover, for \(k \in {{\mathbb {N}}}\) we define the corresponding vectorized process

$$\begin{aligned} \mathbf {X}^{(k)} = \left( X_i^{(k)}\right) _{i \in {{\mathbb {N}}}}, \qquad X_i^{(k)} = X_{[i k,(i + 1)k)}. \end{aligned}$$

Definition 3.1

(Stationarity) We say that a process \((X_n)_{n\in {{\mathbb {N}}}}\) is stationary if for any \(k \in {{\mathbb {N}}}\) the shifted process \((X_{n + k})_{n\in {{\mathbb {N}}}}\) has the same distribution as \((X_n)_{n\in {{\mathbb {N}}}}\).

Definition 3.2

(m-dependence) Fix \(m \in {{\mathbb {N}}}\). We say that \((X_n)_{n\in {{\mathbb {N}}}}\) is \(\mathbf {m}\)-dependent if for any \(k \in {{\mathbb {N}}}\) the process \((X_n)_{n \le k}\) is independent of the process \((X_n)_{n \ge m + 1 + k}\).

Remark 3.3

Let us note that a process \((X_n)_{n\in {{\mathbb {N}}}}\) is 0-dependent iff the variables \((X_n)_{n\in {{\mathbb {N}}}}\) are independent. Finally, let us give a natural example of a 1-dependent process \((X_n)_{n\in {{\mathbb {N}}}}\). Fix an independent process \((\xi _n)_{n\in {{\mathbb {N}}}}\) and a Borel, real function \(h:{{\mathbb {R}}}^2 \rightarrow {{\mathbb {R}}}\). Then, \((h(\xi _n, \xi _{n+1}))_{n\in {{\mathbb {N}}}}\) is 1-dependent. Such processes are called two-block factors. It is worth noting that there are 1-dependent processes which are not two-block factors (see [8]).
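
As a quick numerical illustration of a two-block factor (our own toy example, with \(h(x,y)=x+y\) and i.i.d. standard normal \(\xi _n\)): the process \(X_n = \xi _n + \xi _{n+1}\) is 1-dependent, so adjacent terms are correlated while terms two or more steps apart are independent. A short sketch checking this empirically:

```python
# Sketch: a two-block factor X_n = h(xi_n, xi_{n+1}) with h(x, y) = x + y.
# Adjacent terms share one xi (lag-1 covariance ~ Var(xi) = 1); terms two apart
# share none (lag-2 covariance ~ 0), in line with 1-dependence.
import numpy as np

rng = np.random.default_rng(0)
xi = rng.standard_normal(100_000)
X = xi[:-1] + xi[1:]
print("lag-1 covariance:", np.mean(X[:-1] * X[1:]) - X.mean() ** 2)   # approx 1
print("lag-2 covariance:", np.mean(X[:-2] * X[2:]) - X.mean() ** 2)   # approx 0
```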

Remark 3.4

Assume that a process \((X_n)_{n\in {{\mathbb {N}}}}\) is m-dependent. Then for any \(n_0\in {{\mathbb {N}}}\), the process \((X_{n_0 + k(m+1)})_{k\in {{\mathbb {N}}}}\) is independent. Moreover, if the process \((X_n)_{n\in {{\mathbb {N}}}}\) is stationary, then for any \(n_0\in {{\mathbb {N}}}\), \((X_{n_0 + k(m+1)})_{k\in {{\mathbb {N}}}}\) is a collection of i.i.d. random variables.

3.1 Split Chain

As already mentioned in the Introduction, our proofs will be based on the regeneration technique which was invented independently by Nummelin and Athreya–Ney (see [3] and [23]) and was popularized by Meyn and Tweedie [22]. We will introduce the split chain and then the regeneration times of the split chain. The construction of the split chain is well known, and as references, we recommend [22] (Chaps. 5, 17) and [23]. We briefly recall this technique below. Let us stress that although this construction is based on the one presented in [22], our notation is slightly different. Firstly, let us recall the minorization condition for Markov chains, which plays a central role in the splitting technique.

Definition 3.5

We say that a Markov chain \(\varvec{\Upsilon }\) satisfies the minorization condition if there exists a set \(C\in {{\mathcal {B}}}({{\mathcal {X}}})\) (called a small set), a probability measure \(\nu \) on \(\mathcal {X}\) (a small measure), a constant \(\delta >0\) and a positive integer \(m\in {{\mathbb {N}}}\) such that \(\pi (C) > 0\) and

$$\begin{aligned} P^m(x,B)\ge \delta \nu (B) \end{aligned}$$
(3.1)

holds for all \(x\in C\) and \(B\in {{\mathcal {B}}}({{\mathcal {X}}})\).

Remark 3.6

One can assume that \(\nu (C)=1\) (possibly at the cost of increasing m).

Remark 3.7

One can check that under the assumptions of our theorems, the minorization condition (3.1) holds for some C, \(\nu \), \(\delta \) and m. We refer to [22], Section 5.2, for the proof of this fact.
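
In the simplest, finite-state situation a small set and a small measure can be written down explicitly. The sketch below (our own illustration with an arbitrary 3-state matrix, not taken from [22]) exhibits the minorization (3.1) with \(m=1\) and C equal to the whole state space.

```python
# Sketch: an explicit minorization for a finite chain with a positive transition
# matrix. Taking delta = sum_y min_x P(x, y) and nu(y) = min_x P(x, y) / delta
# gives P(x, .) >= delta * nu(.) for every x, i.e., (3.1) with m = 1.
import numpy as np

P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
col_min = P.min(axis=0)              # min over starting states, per target state
delta = col_min.sum()                # here delta = 0.2 + 0.3 + 0.1 = 0.6
nu = col_min / delta                 # the small measure nu
assert np.all(P >= delta * nu - 1e-12)   # (3.1), up to floating-point slack
print("delta =", delta, " nu =", nu)
```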

Fix C, m, \(\nu \) and \(\delta >0\) as in (3.1). The minorization condition allows us to redefine the chain \(\varvec{\Upsilon }\) together with an auxiliary regeneration structure. More precisely, we start with a splitting of the space \({{\mathcal {X}}}\) into two identical copies on level 0 and 1, namely we consider \({\overline{{{\mathcal {X}}}}} ={{\mathcal {X}}}\times \{0,1\}\). Now, we split \(\varvec{\Upsilon }\) in the following way. We consider a process \(\varvec{\Phi } =(\varvec{\Upsilon }, \varvec{\Lambda })=(\Upsilon _i,\Lambda _i)_{i\ge 0}\) (usually called the split chain) defined on \({\overline{{{\mathcal {X}}}}}\). (We slightly abuse the notation by denoting the first coordinate of the split chain with the same letter as for the initial Markov chain, but it will turn out that the first coordinate of the split chain has the same distribution as the starting Markov chain, so this notation is justified.) The random variables \(\Lambda _k\) take values in \(\{0,1\}\). (They indicate the level on which \(\Phi _k\) is.) For a fixed \(x\in C\), let

$$\begin{aligned} r(x,y) = \frac{\delta \nu (dy)}{P^m(x,dy)} \end{aligned}$$
(3.2)

and note that the above Radon–Nikodym derivative is well defined thanks to (3.1). Moreover, \(r(x,y) \le 1\). Now, for any \(A_1,\ldots ,A_m\in {{\mathcal {B}}}({{\mathcal {X}}})\), \(k\in {{\mathbb {N}}}\) and \(i \in \{0, 1\}\) set

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\left( \Lambda _{km} = i, \Upsilon _{[km+1, (k + 1)m]}\in A_1\times \cdots \times A_m \;|\;{{\mathcal {F}}}_{km}^{\varvec{\Upsilon }}, {{\mathcal {F}}}_{km-m}^{\varvec{\Lambda }} ,\Upsilon _{km}=x\right) \\&\quad = {\mathbb {P}}\left( \Lambda _{0} = i, \Upsilon _{[1, m]}\in A_1\times \cdots \times A_m \;|\; \Upsilon _{0}=x\right) \\&\quad =\int _{A_1} \cdots \int _{A_m} r(x,x_m, i) P(x_{m-1},dx_m)P(x_{m-2},dx_{m-1})\ldots P(x,dx_{1}), \end{aligned} \end{aligned}$$
(3.3)

where

$$\begin{aligned} r(x, y, i) = \left\{ \begin{array}{ll} {\mathbb {1}}_{x\in C}\; r(x,y), \quad &{}\text {if } i = 1, \\ 1-{\mathbb {1}}_{x\in C}\; r(x,y), \quad &{}\text {if } i = 0. \\ \end{array}\right. \end{aligned}$$
(3.4)

Moreover, for any \(k, i \in {{\mathbb {N}}}\) such that \(km< i < (k + 1)m\) we set

$$\begin{aligned} \Lambda _i = \Lambda _{km}. \end{aligned}$$
(3.5)
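
To build some intuition, here is a simulation sketch of the split chain in the strongly aperiodic case \(m=1\) (our own illustration; it uses the equivalent description in which the level is drawn first: if the current state lies in C, then with probability \(\delta \) the level is 1 and the next state is drawn from \(\nu \), and otherwise the level is 0 and the next state is drawn from the residual kernel \(\left( P(x,\cdot )-\delta \nu \right) /(1-\delta )\)). In this toy example, C is the whole (finite) state space, so the indicator \({\mathbb {1}}_{x\in C}\) is always 1.

```python
# Sketch: simulate the split chain (Upsilon, Lambda) for m = 1 on a toy 3-state space.
# Lambda_k = 1 marks a regeneration: the next state Upsilon_{k+1} is drawn from nu.
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
col_min = P.min(axis=0)
delta, nu = col_min.sum(), col_min / col_min.sum()
residual = (P - col_min) / (1.0 - delta)   # valid kernel, since P >= delta * nu = col_min

def split_step(x):
    """One step of the split chain; here C is the whole state space."""
    level = rng.random() < delta
    x_next = rng.choice(3, p=nu if level else residual[x])
    return x_next, int(level)

x, ups, lam = 0, [], []
for _ in range(20):
    ups.append(x)
    x, level = split_step(x)
    lam.append(level)
print("Upsilon:", ups)
print("Lambda :", lam)
```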

Remark 3.8

(Initial distribution for the split chain) In order to be able to specify the initial distribution of the split chain for an arbitrary probability measure \(\mu \) on \({{\mathcal {X}}}\), we define the split measure \(\mu ^*\) on \({\overline{{{\mathcal {X}}}}}\) by:

$$\begin{aligned} \mu ^*(A\times \{i\}) = \left\{ \begin{array}{ll} (1-\delta )\mu (C\cap A)+\mu (A\cap C^c), \quad &{}\text {if } i = 0, \\ \delta \mu (C\cap A), \quad &{}\text {if } i = 1 . \\ \end{array}\right. \end{aligned}$$
(3.6)

This definition ensures that \((\Upsilon _0, \Lambda _0) \sim \mu ^*\) as soon as \(\Upsilon _0 \sim \mu \). For convenience, for any \(x \in {{\mathcal {X}}}\), we will write

$$\begin{aligned} {\mathbb {P}}_{x^*}(\cdot ) = {\mathbb {P}}_{\delta _{x}^*}(\cdot ). \end{aligned}$$
(3.7)

Remark 3.9

(Markov-like properties of the split chain) In order to give some intuition behind the definition of the split chain, note that the distribution of the first coordinate of the split chain \(\varvec{\Phi }\) with initial distribution \(\mu ^{*}\) coincides with that of the original Markov chain \(\varvec{\Upsilon }\) which starts from \(\mu \). From now on, \(\varvec{\Upsilon }\) always corresponds to this first coordinate of the split chain. One can easily generalize (3.3) to show the following Markov-like property of the split chain: For any \(k\in {{\mathbb {N}}}\) and product measurable bounded function F, we have

$$\begin{aligned} {{\mathbb {E}}}\left( F\left( \Upsilon _{[km+1, \infty )}, \Lambda _{[km, \infty )}\right) \;|\;{{\mathcal {F}}}_{km}^{\varvec{\Upsilon }},{{\mathcal {F}}}_{km-m}^{\varvec{\Lambda }} \right) = {{\mathbb {E}}}\left( F\left( \Upsilon _{[km+1, \infty )}, \Lambda _{[km, \infty )}\right) \;|\;\Upsilon _{km} \right) .\nonumber \\ \end{aligned}$$
(3.8)

This, in turn, leads to the fact that the vectorized split chain \(\varvec{\Phi }^{(m)}\) is a Markov chain. Even more, for any product measurable bounded function F and \(k\in {{\mathbb {N}}}\) we have

$$\begin{aligned} {{\mathbb {E}}}\left( F\left( \Phi ^{(m)}_{[k, \infty )} \right) |\;\Phi ^{(m)}_{[0, k)}\right)= & {} {{\mathbb {E}}}\left( F\left( \Phi ^{(m)}_{[k, \infty )} \right) |\; \Phi ^{(m)}_{k - 1}\right) \\= & {} {{\mathbb {E}}}\left( F\left( \Phi ^{(m)}_{[k, \infty )} \right) |\; \Upsilon _{mk-m}, \Upsilon _{mk - 1}, \Lambda _{mk - m}\right) . \end{aligned}$$

Now, we can introduce the aforementioned regeneration structure for \(\varvec{\Phi }\). Firstly, we define certain stopping times. For convenience, we put \(\tau _{-1} = -m\), and then, for \(i \ge 0\) we define \(\tau _i\) to be the ith time when the second coordinate (level coordinate) hits 1, namely

$$\begin{aligned} \tau _i = \min \{k> \tau _{i-1} \;\big |\; \Lambda _{k}=1, \; m|k\}. \end{aligned}$$
(3.9)

Now, we are ready to introduce the random blocks and the random block process

$$\begin{aligned} \Xi _i=\Upsilon _{[\tau _{i-1}+m, \tau _i + m)}, \quad \varvec{\Xi } = \left( \Xi _{i}\right) _{i \ge 0}, \end{aligned}$$
(3.10)

where we consider \( \Xi _i\) as a random variable with values in the disjoint union \(\bigsqcup _{j \ge 0}\mathcal {X}^j\). For clarity of this presentation, here and later on, we omit the measurability details.

Remark 3.10

Let us now briefly discuss the behavior of these random blocks. Firstly, by the strong Markov property of the vectorized split chain it is not hard to see that \(\varvec{\Xi }\) is a Markov chain. On a closer look, one can see that for any product measurable function F

$$\begin{aligned} {{\mathbb {E}}}\left( F\left( \Xi _{[i, \infty )}\right) \;|\;\Xi _{[0, i)}\right) = {{\mathbb {E}}}\left( F\left( \Xi _{[i, \infty )}\right) \;|\;\Xi _{i - 1}\right) = {{\mathbb {E}}}\left( F\left( \Xi _{[i, \infty )}\right) \;|\;\text {pr}_m\left( \Xi _{i - 1}\right) \right) ,\nonumber \\ \end{aligned}$$
(3.11)

where \(\text {pr}_m:\bigsqcup _{j \ge m}\mathcal {X}^j\rightarrow \mathcal {X}^m\) is the projection onto the last m coordinates,

$$\begin{aligned} \text {pr}_m\left( x_0, \ldots , x_j\right) = \left( x_{j - m + 1}, \ldots , x_j\right) . \end{aligned}$$
(3.12)

Apart from being Markovian, the sequence \((\Xi _i)_{i \ge 0}\) is 1-dependent, whereas \((\Xi _i)_{i \ge 1}\) is stationary (see [9], Corollary 2.4). The stationarity follows from the fact that for m|k, we have

$$\begin{aligned} \mathcal {L}\left( \Upsilon _{k + m}\;|\; \Lambda _{k} = 1\right) = \nu , \end{aligned}$$
(3.13)

that is, at every time k which is a multiple of m and at which the split chain is on level 1 (note that this implies \(\Upsilon _k \in C\)), the split chain regenerates and starts anew from \(\nu \). Furthermore, the lengths of \(\Xi _i\):

$$\begin{aligned} \left| \Xi _i\right| = \tau _{i} - \tau _{i - 1}, \end{aligned}$$
(3.14)

are independent random variables for \(i \ge 0\) and form a stationary process for \(i\ge 1\). Let us add that if \(m = 1\), one can show that \(\Xi _i\)’s are independent. This fact makes a crucial difference between strongly aperiodic and not strongly aperiodic Markov chains (see [5, Section 6]).

At last, let us introduce the excursions and the excursion process

$$\begin{aligned} \chi _i = \chi _i\left( f\right) = \sum _{j = \tau _i + m}^{\tau _{i + 1}+ m - 1} f(\Upsilon _j), \qquad \varvec{\chi } = \left( \chi _i\right) _{i \ge 0}, \end{aligned}$$
(3.15)

which will play a crucial role in our future considerations. By properties of the random blocks, one concludes that \(\varvec{\chi }\) is 1-dependent and satisfies

$$\begin{aligned} {{\mathbb {E}}}\left( \chi _i \;|\; \Xi _{[0, i]}\right) = {{\mathbb {E}}}\left( \chi _i \;|\; \Xi _{i}\right) . \end{aligned}$$
(3.16)

Moreover, \(\left( \chi _i\right) _{i \ge 1}\) is stationary. Due to the Pitman occupation measure formula (see [22], Theorem 17.3.1, page 428), which says that for any measurable real function G,

$$\begin{aligned} {{\mathbb {E}}}_\nu \sum _{i=0}^{\tau _0/m} G(\Upsilon _{mi},\Lambda _{mi})=\delta ^{-1}\pi (C)^{-1}{{\mathbb {E}}}_\pi G(\Upsilon _0,\Lambda _0), \end{aligned}$$
(3.17)

and the observation that the \({\mathbb {P}}_\mu \)-distribution of the excursion \(\chi _i(f)\) (\(i \ge 1\)) is equal to the \({\mathbb {P}}_\nu \)-distribution of \(\chi _0\), we get that for any initial distribution \(\mu \) and any \(i \ge 1\),

$$\begin{aligned} {{\mathbb {E}}}_\mu \chi _i = {{\mathbb {E}}}_\nu \chi _{0}= \delta ^{-1}\pi (C)^{-1}m\int f d\pi . \end{aligned}$$
(3.18)

As a consequence, \({{\mathbb {E}}}_\pi f(\Upsilon _i) = 0\) implies that for every \(i \ge 1\), \({{\mathbb {E}}}_\mu \chi _i(f) = 0\). Now, we are ready to decompose our sums into random blocks. If m|n, then

$$\begin{aligned} \sum _{i=0}^{n-1}f(\Upsilon _i)= & {} \left( \sum _{i=0}^{\tau _0/m} \Theta _i {\mathbb {1}}_{N>0} + {\mathbb {1}}_{N=0} \sum _{i=0}^{n/m - 1} \Theta _i \right) +\left( \sum _{i=1}^{N} \chi _{i-1}(f)\right) \nonumber \\&- \left( {\mathbb {1}}_{N>0} \sum _{k=n}^{\tau _N+ m - 1} f(\Upsilon _k)\right) , \end{aligned}$$
(3.19)

where

$$\begin{aligned} \Theta _k = \Theta _k(f) = \sum _{i=0}^{m-1} f(\Upsilon _{km+i}), \qquad N = \inf \{i\ge 0 \;|\; \tau _i + m - 1 \ge n-1\}.\nonumber \\ \end{aligned}$$
(3.20)

This decomposition will be of utmost importance in our proof.
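
As a sanity check, the decomposition (3.19)–(3.20) can be verified pathwise. The sketch below (our own illustration, reusing the toy split chain with \(m=1\) and C equal to the whole state space, so that \(\Theta _k = f(\Upsilon _k)\) and, on \(\{N>0\}\), \(\sum _{i=1}^{N}\chi _{i-1}(f)=\sum _{j=\tau _0+1}^{\tau _N} f(\Upsilon _j)\)) simulates one trajectory and checks the identity numerically.

```python
# Sketch: pathwise check of the block decomposition (3.19) in the case m = 1.
# The chain, delta, nu and f are the illustrative toy choices used above.
import numpy as np

rng = np.random.default_rng(2)
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
col_min = P.min(axis=0)
delta, nu = col_min.sum(), col_min / col_min.sum()
residual = (P - col_min) / (1.0 - delta)
f = np.array([1.0, -2.0, 1.0])

n = 50
x, ups, lam = 0, [], []
# run until at least n states are recorded and some regeneration time is >= n - 1
while len(ups) < n or not any(l == 1 and k >= n - 1 for k, l in enumerate(lam)):
    ups.append(x)
    level = rng.random() < delta
    lam.append(int(level))
    x = rng.choice(3, p=nu if level else residual[x])

tau = [k for k, l in enumerate(lam) if l == 1]           # regeneration times (3.9), m = 1
N = next(i for i, t in enumerate(tau) if t >= n - 1)     # N from (3.20)
lhs = sum(f[ups[i]] for i in range(n))
if N > 0:
    first = sum(f[ups[i]] for i in range(tau[0] + 1))               # sum of Theta_i up to tau_0
    middle = sum(f[ups[j]] for j in range(tau[0] + 1, tau[N] + 1))  # sum of chi_0, ..., chi_{N-1}
    tail = sum(f[ups[k]] for k in range(n, tau[N] + 1))             # overshoot beyond time n - 1
else:
    first, middle, tail = lhs, 0.0, 0.0
assert np.isclose(lhs, first + middle - tail)            # identity (3.19)
print("decomposition (3.19) verified on this path")
```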

3.2 Asymptotic Variances

In the upcoming proofs, we will encounter two asymptotic variances: \(\sigma _{Mrv}^2\), associated with the process \(\left( f(\Upsilon _i)\right) _{i \ge 0}\), and \(\sigma _{ \infty }^2\), associated with \(\varvec{\chi }\). The first one, defined as

$$\begin{aligned} \sigma _{Mrv}^2= & {} \lim \limits _{n\rightarrow \infty }\frac{1}{n}\text {Var}\left( f(\Upsilon _0) + \cdots + f(\Upsilon _{n - 1}) \right) \nonumber \\= & {} \text {Var}_\pi (f(\Upsilon _0)) + 2 \sum _{i\ge 1} \text {Cov}_\pi (f(\Upsilon _i),f(\Upsilon _0)) \end{aligned}$$
(3.21)

is exactly the variance of the limiting normal distribution of the sequence \(\frac{1}{\sqrt{n}} \sum _{i=1}^n f(\Upsilon _i)\). The second one:

$$\begin{aligned} \sigma _\infty ^2 = \lim \limits _{n\rightarrow \infty }\frac{1}{n}\text {Var}\left( \chi _{1} + \cdots + \chi _n \right) = {{\mathbb {E}}}\chi _1^2 + 2{{\mathbb {E}}}\chi _1 \chi _2, \end{aligned}$$

is the variance of the limiting normal distribution of the sequence \(\frac{1}{\sqrt{n}} \sum _{i=1}^n \chi _i\). Both asymptotic variances are very closely linked via the formula

$$\begin{aligned} \sigma _\infty ^2 = \sigma _{Mrv}^2 {{\mathbb {E}}}(\tau _1-\tau _0) = \sigma _{Mrv}^2 m \delta ^{-1} \pi (C)^{-1}. \end{aligned}$$
(3.22)

For the proof of this formula, we refer to [22] (see (17.32), page 434).
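
The relation (3.22) can also be checked by simulation. In the toy \(m=1\) example from the previous sketches one has \(\pi (C)=1\), so (3.22) reads \(\sigma _\infty ^2 = \sigma _{Mrv}^2/\delta \). The sketch below (our own illustration, with all numerical choices arbitrary) estimates both sides from a single long trajectory: \(\sigma _{Mrv}^2\) through truncated autocovariances and \(\sigma _\infty ^2\) through the excursion sums \(\chi _i\).

```python
# Sketch: Monte Carlo check of (3.22), sigma_inf^2 = sigma_Mrv^2 * E(tau_1 - tau_0),
# for the toy m = 1 split chain (C = whole space, so E(tau_1 - tau_0) = 1/delta).
import numpy as np

rng = np.random.default_rng(3)
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
col_min = P.min(axis=0)
delta, nu = col_min.sum(), col_min / col_min.sum()
residual = (P - col_min) / (1.0 - delta)
pi = np.linalg.matrix_power(P, 200)[0]
f = np.array([1.0, -2.0, 1.0])
f = f - pi @ f                                   # center f, so the excursions are centered

n = 200_000
x = rng.choice(3, p=pi)
traj = np.empty(n, dtype=int)
chis, cur = [], 0.0
for k in range(n):
    traj[k] = x
    cur += f[x]
    level = rng.random() < delta
    if level:                                    # a regeneration closes the current excursion
        chis.append(cur)
        cur = 0.0
    x = rng.choice(3, p=nu if level else residual[x])

y = f[traj]
sigma2_mrv = y.var() + 2 * sum(np.mean(y[:-l] * y[l:]) for l in range(1, 50))
chis = np.array(chis[1:])                        # drop the first, non-stationary excursion
sigma2_inf = np.mean(chis**2) + 2 * np.mean(chis[:-1] * chis[1:])
print("sigma_inf^2 estimate:", sigma2_inf)
print("sigma_Mrv^2 / delta :", sigma2_mrv / delta)   # should roughly agree
```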

4 Main Results

In order to state our results in the general form, we need to recall the definition of the exponential Orlicz norm. For any random variable X and \(\alpha >0\), we define

$$\begin{aligned} \Vert X \Vert _{\psi _\alpha }=\inf \Big \{c>0 \;|\; {{\mathbb {E}}}\exp \left( \frac{|X|^\alpha }{c^\alpha }\right) \le 2\Big \}. \end{aligned}$$
(4.1)

If \(\alpha < 1\), then \(\Vert \cdot \Vert _{\psi _\alpha }\) is just a quasi-norm. (For basic properties of these quasi-norms, we refer to Appendix A.) In what follows, we will deal with various underlying measures on the state space \({\overline{{{\mathcal {X}}}}}\). In order to stress the dependence of the Orlicz norm on the initial distribution \(\mu \) of the chain \(\varvec{\Phi }\), we will sometimes write \(\Vert \cdot \Vert _{\psi _\alpha ,\mu }\) instead of \(\Vert \cdot \Vert _{\psi _\alpha }\).
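
For instance (a simple illustration, not used later), if X has the standard exponential distribution, then \({{\mathbb {E}}}\exp \left( |X|/c\right) = c/(c-1)\) for \(c>1\), which equals 2 precisely for \(c=2\), so \(\Vert X \Vert _{\psi _1}=2\). Similarly, for a bounded random variable X one has \(\Vert X \Vert _{\psi _\alpha }\le \Vert X\Vert _\infty (\ln 2)^{-1/\alpha }\), since for any \(c \ge \Vert X\Vert _\infty (\ln 2)^{-1/\alpha }\), \({{\mathbb {E}}}\exp \left( |X|^\alpha /c^\alpha \right) \le \exp \left( \Vert X\Vert _\infty ^\alpha /c^\alpha \right) \le 2\).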

Before we formulate our main result, let us introduce and explain the role of the following parameters:

$$\begin{aligned} \mathbf{{{a}} }= & {} \left\| \sum _{k=0}^{\tau _0 /m} \left| \Theta _k\right| \right\| _{\psi _\alpha , {\mathbb {P}}_{x^*}}, \quad \mathbf{{{b}} }= \left\| \sum _{k=0}^{\tau _0 /m} \left| \Theta _k\right| \right\| _{\psi _\alpha ,{\mathbb {P}}_{\pi ^*}},\nonumber \\ \mathbf{{{c}} }= & {} \left\| \chi _i(f)\right\| _{\psi _\alpha },\quad \mathbf{{{d}} }= \left\| \tau _1-\tau _0\right\| _{\psi _1}, \end{aligned}$$
(4.2)

where \(\Theta _k = \sum _{i=0}^{m-1} f(\Upsilon _{km+i})\) (cf. (3.19)). The parameter \(\mathbf{{{a}} }\) (resp. \(\mathbf{{{b}} }\)) will allow us to estimate the first (resp. third) term on the right-hand side of (3.19), whereas the parameters \(\mathbf{{{c}} }\) and \(\mathbf{{{d}} }\) will be used to control the middle term. We note that \(\mathbf{{{d}} }\) quantifies geometric ergodicity of \(\varvec{\Upsilon }\) and is finite as soon as \(\varvec{\Upsilon }\) is geometrically ergodic. Let us mention that all these parameters can be bounded, for example, by means of drift conditions widely used in the theory of Markov chains (see Remark 4.2). Finally, let us recall that \(\sigma _{Mrv}^2 = \text {Var}_\pi (f(\Upsilon _0)) +2 \sum _{i = 1}^\infty \text {Cov}_\pi (f(\Upsilon _0),f(\Upsilon _i))\) denotes the asymptotic variance of normalized partial sums of the process \(\left( f(\Upsilon _i)\right) _i\).

We are now ready to formulate the first of our main results. (Recall the definitions of the small set C and the minorization condition (3.1).)

Theorem 4.1

Let \(\varvec{\Upsilon }\) be a geometrically ergodic Markov chain and \(\pi \) be its unique stationary probability measure. Let \(f:{{\mathcal {X}}}\rightarrow {{\mathbb {R}}}\) be a measurable function such that \({{\mathbb {E}}}_\pi f=0\) and let \(\alpha \in (0,1]\). Moreover, assume for simplicity that m|n. Then for all \(x\in {{\mathcal {X}}}\) and \(t>0\),

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}_x \left( \left| \sum _{i=0}^{n-1} f(\Upsilon _i)\right| >t\right) \le 2\exp \left( -\frac{t^\alpha }{(23\mathbf{{{a}} })^\alpha }\right) + 2\left[ \delta \pi (C)\right] ^{-1} \exp \left( -\frac{t^\alpha }{(23\mathbf{{{b}} })^\alpha }\right) \\&\quad + 6\exp (8)\exp \left( -\frac{t^\alpha }{\frac{16}{\alpha } (27 \mathbf{{{c}} })^\alpha }\right) + 6\exp \left( -\frac{t^2}{30 n\sigma _{Mrv}^2+ 8 t M}\right) \\&\quad +\exp (1)\exp \left( -\frac{ n m}{67 \delta \pi (C)\mathbf{{{d}} }^2 }\right) , \end{aligned} \end{aligned}$$
(4.3)

where \(\sigma ^2_{Mrv}\) denotes the asymptotic variance for the process \((f(\Upsilon _i))_i\) given by (3.21), the parameters \(\mathbf{{{a}} },\mathbf{{{b}} },\mathbf{{{c}} },\mathbf{{{d}} }\) are defined by (4.2) and \(M=\mathbf{{{c}} }(24\alpha ^{-3} \log {n})^\frac{1}{ \alpha }\).

Remark 4.2

For the conditions under which \(\mathbf{{{a}} },\mathbf{{{b}} },\mathbf{{{c}} }\) are finite, we refer to [2], where the authors give bounds on \(\mathbf{{{a}} },\mathbf{{{b}} },\mathbf{{{c}} }\) under classical drift conditions. If f is bounded, then one easily shows that

$$\begin{aligned} \max \left( \mathbf{{{a}} }, \mathbf{{{b}} }\right) \le 2 D \Vert f\Vert _\infty ,\qquad \mathbf{{{c}} }\le D\Vert f\Vert _{\infty }, \end{aligned}$$
(4.4)

where \(D = \max \left( \mathbf{{{d}} }, \Vert \tau _0\Vert _{\psi _1, \;{\mathbb {P}}_{x^*}}, \Vert \tau _0\Vert _{\psi _1, \;{\mathbb {P}}_{\pi ^*}}\right) \). For computable bounds on D, we refer to [4].

Let us note that in Theorem 4.1, the right-hand side of the inequality does not converge to 0 when t tends to infinity. (One of the terms depends on n but not on t.) Usually, in applications t is of order at most n and the other terms dominate on the right-hand side of the inequality, so this does not pose a problem. Nevertheless, one can obtain another version of Theorem 4.1, namely

Theorem 4.3

Under the assumptions and notation of Theorem 4.1, for any \(p>0\) we have

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}_x \left( \left| \sum _{i=0}^{n-1} f(\Upsilon _i)\right| >t\right) \le 2\exp \left( -\frac{t^\alpha }{(54\mathbf{{{a}} })^\alpha }\right) + 2\left[ \delta \pi (C)\right] ^{-1} \exp \left( -\frac{t^\alpha }{(54\mathbf{{{b}} })^\alpha }\right) \\&\;\; + 4\exp (8)\exp \left( -\frac{t^\alpha }{\frac{16}{\alpha }(27\mathbf{{{c}} })^\alpha }\right) + 6\exp \left( -\frac{t^2}{37(1+p) n\sigma _{Mrv}^2+ 18 M \mathbf{{{d}} }\sqrt{K_p}t}\right) , \\ \end{aligned}\nonumber \\ \end{aligned}$$
(4.5)

where \(K_p = L_p + 16/L_p\) and \(L_p = \frac{16}{p} + 20\).

It is well known that for geometrically ergodic chains \(\Vert \tau _0\Vert _{\psi _1, \;{\mathbb {P}}_{x^*}}\), \(\Vert \tau _0\Vert _{\psi _1, \;{\mathbb {P}}_{\pi ^*}}\), \(\Vert \tau _1 - \tau _0\Vert _{\psi _1} < \infty \) (see [4] for constructive estimates). Therefore, (4.4) and Theorem 4.1 lead to

Theorem 4.4

Let \(\varvec{\Upsilon }\) be a geometrically ergodic Markov chain and \(\pi \) be its unique stationary probability measure. Let \(f:{{\mathcal {X}}}\rightarrow {{\mathbb {R}}}\) be a bounded, measurable function such that \({{\mathbb {E}}}_\pi f=0\). Fix \(x\in {{\mathcal {X}}}\). Moreover, assume that \(\Vert \tau _0\Vert _{\psi _1,\delta _{x}^*}\), \(\Vert \tau _0\Vert _{\psi _1,\pi ^*}\), \(\Vert \tau _1-\tau _0\Vert _{\psi _1} \le D\). Then for all \(t>0\),

$$\begin{aligned} \begin{aligned} {\mathbb {P}}_x \left( \left| \sum _{i=0}^{n-1} f(\Upsilon _i)\right| >t\right)&\le K \exp \left( -\frac{t^2}{32 n \sigma _{Mrv}^2+ 433 t \delta \pi (C)\Vert f\Vert _\infty D^2 \log n} \right) , \end{aligned}\nonumber \\ \end{aligned}$$
(4.6)

where \(\sigma ^2_{Mrv}\) is the asymptotic variance of \((f(\Upsilon _i))_i\) and \(K = \exp (10)+2\delta ^{-1}\pi (C)^{-1}\).

Remark 4.5

Theorem 4.4 implies our main Theorem 1.1 from the Introduction with constants \(K = \left( \exp (10) + 2\delta ^{-1}\pi (C)^{-1}\right) \) and \(\tau =433 \delta \pi (C) D^2 \).

5 Bernstein Inequalities for 1-Dependent Sequences

In this section, we will show two versions (for suprema and for randomly stopped sums) of the Bernstein inequality for 1-dependent random variables. They will be used later in the proofs of our main theorems. In what follows, for a 1-dependent sequence of random variables \((X_i)_{i\ge 0}\), \(\sigma _\infty ^2\) denotes the asymptotic variance of normalized partial sums, i.e.,

$$\begin{aligned} \sigma _\infty ^2 = {{\mathbb {E}}}X_1^2 + 2{{\mathbb {E}}}X_1X_2. \end{aligned}$$

Lemma 5.1

(Bernstein inequality for suprema of partial sums) Let \((X_i)_{i\ge 0}\) be a 1-dependent sequence of centered random variables such that \({{\mathbb {E}}}\exp (c^{-\alpha }|X_i|^\alpha )\le 2\) for some \(\alpha \in (0,1]\) and \(c>0\). Assume that there exists a filtration \(\left( {{\mathcal {F}}}_i\right) _{i\ge 0}\) such that for \(Z_i = X_i + {{\mathbb {E}}}\left( X_{i+1}|{{\mathcal {F}}}_i\right) -{{\mathbb {E}}}\left( X_{i}|{{\mathcal {F}}}_{i-1}\right) \) we have the following:

  (0) \(X_i\) is \({{\mathcal {F}}}_i\)-measurable,

  (1) \((Z_i)_{i\ge 1}\) is stationary,

  (2) \((Z_i)_{i\ge 1}\) is m-dependent with \(m=1\) or \(m=2\),

  (3) \(\left( {{\mathbb {E}}}\left( X_{i}|{{\mathcal {F}}}_{i-1}\right) \right) _{i\ge 1}\) is stationary,

  (4) \( {{\mathbb {E}}}(X_i|{{\mathcal {F}}}_{i-1})\) is independent of \(X_{i+1}\) for any \(i \ge 1\).

Then,

$$\begin{aligned} {{\mathbb {E}}}Z_i^2 = \sigma _\infty ^2, \quad \Vert Z_i\Vert _{\psi _\alpha } \le c (8/\alpha )^\frac{1}{\alpha }. \end{aligned}$$
(5.1)

Moreover, for any \(t > 0\) and \(n\in {{\mathbb {N}}}\),

$$\begin{aligned} {\mathbb {P}}\left( \sup _{1 \le k \le n} \left| \sum _{i=1}^k X_i\right| > t \right) \le K_m\exp \left( -\frac{t^\alpha }{u_m c^\alpha }\right) \;\; + L_m\exp \left( -\frac{t^2}{v_{n, m} \sigma _\infty ^2+ w_{n, m} t}\right) \nonumber \\ \end{aligned}$$
(5.2)

where \(u_m=\frac{16\cdot 8^\alpha (m+1)^\alpha }{\alpha }\), \(v_{n, m}=5(m+1)(n + m + 1)\), \(w_{n, m}=2(m+1)(24\alpha ^{-3} \log {n})^\frac{1}{ \alpha }c\), \(K_m = 2(m + 1)\exp (8)\) and \(L_m = 2(m + 1)\).

Proof

Firstly, we will show that if \(X_i\)’s are centered, independent random variables with common variance \(\sigma _\infty ^2\) and \({{\mathbb {E}}}\exp (c^{-\alpha }|X_i|^\alpha )\le 2\), then (5.2) holds with \(u_0 = 2\cdot 6^\alpha \), \(v_{n, 0}=\frac{72}{25}n\), \(w_{n, 0}=\frac{8}{5}c \left( 3\alpha ^{-2}\log n\right) ^{\frac{1}{\alpha }}\), \(K_0 = \exp (8)\) and \(L_0 = 2\) (allowing for a slight abuse of precision we consider this the \(m=0\) case of the lemma). Indeed, by Lemma 4.1 in [2] for \(\lambda = (2^{1/ \alpha } c)^{-1}\),

$$\begin{aligned} {{\mathbb {E}}}\exp \left( \lambda ^\alpha \sum _{i=0}^{n-1} \left( |U_i|^\alpha +({{\mathbb {E}}}|U_i|)^\alpha \right) \right) \le \exp (8), \end{aligned}$$
(5.3)

where \(U_i = X_i \mathbb {1}_{\left| X_i\right| > M_0}\) stands for the “unbounded” part of \(X_i\) and \(M_0 = c \left( 3\alpha ^{-2}\log n\right) ^{\frac{1}{\alpha }}\). Define the “bounded” part of \(X_i\), \(B_i = X_i {\mathbb {1}}_{|X_i| \le M_0}\) and notice that \(X_i = \overline{B_i} + \overline{U_i}\), where \(\overline{B_i} = B_i - {{\mathbb {E}}}B_i\) and \(\overline{U_i} = U_i - {{\mathbb {E}}}U_i\). Using the union bound, we get for \(p = 1/6\)

$$\begin{aligned} {\mathbb {P}}\left( \sup _{1\le k \le n}\left| \sum _{i=1}^k X_i\right|> t \right)\le & {} {\mathbb {P}}\left( \sup _{1\le k \le n}\left| \sum _{i=1}^k \overline{U_i} \right|> tp \right) \\&+ {\mathbb {P}}\left( \sup _{1\le k \le n}\left| \sum _{i=1}^k \overline{B_i} \right| > t(1-p) \right) . \end{aligned}$$

Consider first the unbounded part. Using the subadditivity of \(x \rightarrow x^\alpha \), Markov’s inequality and then (5.3), we get

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\left( \sup _{1\le k \le n}\left| \sum _{i=1}^k \overline{U_i} \right|> tp \right) \le {\mathbb {P}}\left( \exp \left( \lambda ^\alpha \sum _{i=1}^n |\overline{U_i}|^\alpha \right) > \exp \left( \lambda p t\right) ^\alpha \right) \\&\qquad \qquad \le \exp \left( 8\right) \exp \left( -\frac{t^\alpha p^\alpha }{2 c^\alpha }\right) = \exp \left( 8\right) \exp \left( -\frac{t^\alpha }{2(6c)^\alpha }\right) . \end{aligned} \end{aligned}$$

As for the “bounded” part, notice that \({{\mathbb {E}}}\overline{B_i}^2 \le {{\mathbb {E}}}B_i^2 \le {{\mathbb {E}}}X_i^2 = \sigma _\infty ^2\). Therefore, using the classical Bernstein inequality we get

$$\begin{aligned} {\mathbb {P}}\left( \sup _{1\le k \le n}\left| \sum _{i=1}^k \overline{B_i} \right| > t(1-p) \right) \le 2\exp \left( -\frac{t^2(1-p)^2}{2n\sigma ^2_\infty + \frac{4}{3}t(1-p)M_0}\right) . \end{aligned}$$

Combining the last three estimates and substituting \(p = 1/6\) allows us to finish the proof for independent random variables.

We will now use the independent case to prove the tail estimate (5.2), assuming (5.1), the proof of which we postpone. Note that (5.2) is trivial unless \(t\ge w_{n, m}\log \left( 2(m+1)\right) \) (as the right-hand side exceeds 1). Therefore, from now on we will consider only t satisfying this lower bound. In particular, setting \(p = 1/5\), we have \(t \ge \frac{2}{p}(2/\alpha )^\frac{1}{\alpha }c\) and \( t\ge 4^\frac{1}{\alpha } \frac{2c}{p} \log (n)^\frac{1}{\alpha }\). Using the union bound and the assumption 3), we get (denoting for brevity \({{\mathbb {E}}}_i\left( \cdot \right) = {{\mathbb {E}}}\left( \cdot \;|\;{{\mathcal {F}}}_i\right) \))

$$\begin{aligned} {\mathbb {P}}\left( \sup _{1\le k \le n}\left| \sum _{i=1}^k X_i\right|> t \right)\le & {} {\mathbb {P}}\left( \sup _{1\le k \le n}\left| \sum _{i=1}^k Z_i\right|> t(1-p) \right) \nonumber \\&+ {\mathbb {P}}\left( \sup _{1\le i \le n}\left| {{\mathbb {E}}}_i X_{i+1} - {{\mathbb {E}}}_0 X_{1}\right|> tp \right) \nonumber \\\le & {} {\mathbb {P}}\left( \sup _{1\le k \le n}\left| \sum _{i=1}^k Z_i\right|> t(1-p) \right) \nonumber \\&+ 2{\mathbb {P}}\left( \sup _{1\le i \le n}\left| {{\mathbb {E}}}_{i - 1}X_{i}\right| > \frac{tp}{2} \right) . \end{aligned}$$
(5.4)

By another application of the union bound together with Lemma A.5 and stationarity of \(\left( {{\mathbb {E}}}_{i - 1}X_i\right) _i\), we obtain

$$\begin{aligned} 2{\mathbb {P}}\left( \sup _{1\le i \le n}\left| {{\mathbb {E}}}_{i - 1}X_{i}\right|> \frac{tp}{2} \right) \le 2n {\mathbb {P}}\left( \left| {{\mathbb {E}}}_0 X_{1} \right| > \frac{tp}{2} \right) \le 12n\exp \left( -\frac{p^\alpha t^\alpha }{2(2c)^\alpha }\right) . \end{aligned}$$

Notice that

$$\begin{aligned} 12n\exp \left( -\frac{p^\alpha t^\alpha }{2(2c)^\alpha }\right)= & {} 12\left[ n\exp \left( -\frac{p^\alpha t^\alpha }{4(2c)^\alpha }\right) \right] \exp \left( -\frac{p^\alpha t^\alpha }{4(2c)^\alpha }\right) \\\le & {} 12\exp \left( -\frac{p^\alpha t^\alpha }{4(2c)^\alpha }\right) , \end{aligned}$$

where the inequality is a consequence of the estimate \( t\ge 4^\frac{1}{\alpha } \frac{2c}{p} \log (n)^\frac{1}{\alpha }\). It follows that

$$\begin{aligned} 2{\mathbb {P}}\left( \sup _{1\le i \le n}\left| {{\mathbb {E}}}_{i - 1} X_{i}\right| > \frac{pt}{2} \right) \le 12\exp \left( -\frac{p^\alpha t^\alpha }{4(2c)^\alpha }\right) = 12\exp \left( -\frac{t^\alpha }{4(10c)^\alpha }\right) .\nonumber \\ \end{aligned}$$
(5.5)

In order to deal with \({\mathbb {P}}\left( \left| \sum _{i=1}^n Z_i\right| > t(1-p) \right) \), we start with splitting this sum into \(m+1\) parts and using the union bound, namely

$$\begin{aligned} {\mathbb {P}}\left( \sup _{1\le k \le n}\left| \sum _{i=1}^k Z_i\right|> t(1-p) \right) \le \sum _{j=0}^m {\mathbb {P}}\left( \sup _{1\le k \le n}\left| \sum _{1\le i \le k, m+1|i-j} Z_i\right| > \frac{t(1-p)}{m+1} \right) . \end{aligned}$$

Now, to each summand on the right-hand side of the above inequality we will apply the estimate for the independent case obtained at the beginning of this proof. Setting \(M = (24\alpha ^{-3} \log {n})^\frac{1}{ \alpha }c\) and taking into account (5.1), we obtain

$$\begin{aligned}&\frac{1}{m + 1}{\mathbb {P}}\left( \sup _{1\le k \le n}\left| \sum _{i=1}^k Z_i\right|> t(1-p) \right) \nonumber \\&\quad \le \frac{1}{m + 1} \sum _{j=0}^m {\mathbb {P}}\left( \sup _{1\le k \le n}\left| \sum _{1\le i \le k, m+1|i-j} Z_i\right| > \frac{t(1-p)}{m+1} \right) \nonumber \\&\quad \le \exp (8)\exp \left( -\frac{t^\alpha }{\frac{16}{\alpha } (8(m+1)c)^\alpha }\right) \nonumber \\&\qquad + 2\exp \left( -\frac{(1-p)^2 t^2 }{\frac{72}{25}(m+1)\left[ \left( n+m+1\right) \sigma _\infty ^2+\frac{8}{5}(1-p)tM\right] }\right) \nonumber \\&\quad \le \exp (8)\exp \left( -\frac{t^\alpha }{\frac{16}{\alpha } (8(m+1)c)^\alpha }\right) \nonumber \\&\qquad + 2\exp \left( -\frac{ t^2 }{(m+1)\left[ 5\left( n+ m + 1\right) \sigma _\infty ^2+2tM\right] }\right) . \end{aligned}$$
(5.6)

Finally, using (5.4), (5.5) and (5.6) we get

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\left( \sup _{1\le k \le n}\left| \sum _{i=1}^k X_i\right| > t \right) \le 12\exp \left( -\frac{t^\alpha }{4(10c)^\alpha }\right) \\&\quad + (m+1)\exp (8)\exp \left( -\frac{t^\alpha }{\frac{16}{\alpha } (8(m+1)c)^\alpha }\right) \\&\quad + 2(m+1)\exp \left( -\frac{t^2}{5(m+1)\left( n+m+1\right) \sigma _\infty ^2+2(m+1)t M}\right) . \end{aligned} \end{aligned}$$

To conclude (5.2), it is now enough to note that the second summand on the right-hand side above dominates the first one.

To finish the proof of the lemma, it remains to show (5.1). Firstly, we address the variance of \(Z_i\), which can be easily calculated by using the properties of conditional expectation. We have (recall the notation \({{\mathbb {E}}}_i\left( \cdot \right) = {{\mathbb {E}}}\left( \cdot \;|\;{{\mathcal {F}}}_i\right) \))

$$\begin{aligned} \begin{aligned} {{\mathbb {E}}}Z_i^2 = {{\mathbb {E}}}\big [ X_i^2+{{\mathbb {E}}}^2_i X_{i+1}+{{\mathbb {E}}}^2_{i - 1} X_{i} -2{{\mathbb {E}}}_{i}X_{i+1}{{\mathbb {E}}}_{i - 1} X_{i} - 2 X_i{{\mathbb {E}}}_{i - 1} X_{i}+2 X_i{{\mathbb {E}}}_{i} X_{i+1}\big ]. \end{aligned} \end{aligned}$$

Since \({{\mathbb {E}}}X_i{{\mathbb {E}}}_{i - 1}X_{i} ={{\mathbb {E}}}{{\mathbb {E}}}^2_{i - 1}X_i\), \({{\mathbb {E}}}{{\mathbb {E}}}_{i} X_{i+1}{{\mathbb {E}}}_{i - 1} X_{i}={{\mathbb {E}}}X_{i + 1} {{\mathbb {E}}}_{i - 1}X_{i}\) and \( X_i{{\mathbb {E}}}_{i} X_{i+1} = {{\mathbb {E}}}_{i}( X_i X_{i+1})\), we obtain

$$\begin{aligned} \begin{aligned} {{\mathbb {E}}}Z_i^2&= {{\mathbb {E}}}\Big ( X_i^2+{{\mathbb {E}}}^2_{i}X_{i+1}-{{\mathbb {E}}}^2_{i - 1} X_{i} - 2 X_{i+1}{{\mathbb {E}}}_{i - 1} X_{i} + 2 X_i X_{i+1} \Big )\\&= {{\mathbb {E}}}\left( X_i^2 + 2 X_i X_{i+1}\right) -2{{\mathbb {E}}}\left( X_{i+1}{{\mathbb {E}}}_{i - 1} X_{i}\right) + {{\mathbb {E}}}\left( {{\mathbb {E}}}^2_{i}X_{i+1}-{{\mathbb {E}}}^2_{i - 1}X_{i}\right) .\\ \end{aligned} \end{aligned}$$

The variance formula in (5.1) follows by observing that due to 3), \({{\mathbb {E}}}\left( {{\mathbb {E}}}^2_{i}X_{i+1}-{{\mathbb {E}}}^2_{i - 1}X_{i}\right) = 0\), whereas by 4), \({{\mathbb {E}}}\left( X_{i+1}{{\mathbb {E}}}_{i - 1} X_{i}\right) = 0\).

Now, we will demonstrate the upper bound on \( \Vert Z_i\Vert _{\psi _\alpha }\) in (5.1). Using the triangle inequality (cf. Lemma A.1) twice and then Lemma A.3, we obtain

$$\begin{aligned} \begin{aligned} \Vert Z_i\Vert _{\psi _\alpha }&\le 2^{\frac{1}{\alpha } - 1}\Vert X_i\Vert _{\psi _\alpha } + 2^{\frac{1}{\alpha } - 1} \Vert {{\mathbb {E}}}_i X_{i+1} - {{\mathbb {E}}}_0 X_{1}\Vert _{\psi _\alpha } \le 2^{\frac{1}{\alpha }}\Vert X_i\Vert _{\psi _\alpha } + 2^{\frac{2}{\alpha }-1} \Vert {{\mathbb {E}}}_0 X_{1}\Vert _{\psi _\alpha } \\&\le 2^{\frac{1}{\alpha }}\Vert X_i\Vert _{\psi _\alpha } + 2^{\frac{2}{\alpha }-1} (2/\alpha )^\frac{1}{\alpha } \Vert X_{1}\Vert _{\psi _\alpha } \le \Vert X_{1}\Vert _{\psi _\alpha } \left( 2^{\frac{1}{\alpha }}+\frac{1}{2} (8/\alpha )^\frac{1}{\alpha }\right) \le c (8/\alpha )^\frac{1}{\alpha }. \end{aligned} \end{aligned}$$
(5.7)

This concludes the proof of the lemma. \(\square \)

Remark 5.2

If \((X_i)_{i\ge 0}\) is a 1-dependent, centered and stationary Markov chain such that \(\Vert X_i\Vert _\infty \le M <\infty \), then the assumptions of the above lemma are satisfied with \(m = 2\) and \({{\mathcal {F}}}_i = \sigma \left\{ X_j \;|\; j\le i\right\} \). If \((\xi _i)_{i\ge 0}\) are i.i.d. random variables and \(f:{{\mathbb {R}}}^2\rightarrow {{\mathbb {R}}}\) is a bounded, Borel function such that \(X_i = f(\xi _i, \xi _{i+1})\) are centered, then we can take \({{\mathcal {F}}}_i = \sigma \{\xi _j \;|\; j\le i+1\}\) and notice that the assumptions of the above lemma are satisfied with \(m = 1\).

Remark 5.3

It is worth noticing that \(\sigma _\infty ^2\) may be equal to 0 in the case of 1-dependent processes \((X_i)_{i \in {{\mathbb {N}}}}\). Take for example \(X_i= \xi _{i+1} - \xi _i\), where \((\xi _i)_{i \in {{\mathbb {N}}}}\) are i.i.d. random variables; then the partial sums telescope, \(\text {Var}(X_1+\cdots +X_n)=\text {Var}(\xi _{n+1}-\xi _1)\) stays bounded, and hence \(\sigma _\infty ^2=0\). It turns out (cf. [14]) that the converse is true, that is, if for a 1-dependent, bounded stationary process \((X_i)_{i \in {{\mathbb {N}}}}\) we have \(\sigma _\infty ^2 = 0\), then there exists an i.i.d. process \((\xi _i)_{i \in {{\mathbb {N}}}}\) such that \(X_i= \xi _{i+1} - \xi _i\).

Lemma 5.4

(Bernstein inequality for random sums) Let \((X_i)_{i\ge 0}\) be a 1-dependent sequence of centered random variables such that \({{\mathbb {E}}}\exp (c^{-\alpha }|X_i|^\alpha )\le 2\) for some \(\alpha \in (0,1]\) and \(c \ge 1\). Moreover, let \(N\le n\in {{\mathbb {N}}}\) be an \({{\mathbb {N}}}\)-valued bounded random variable. Assume that we can find a filtration \({{\mathcal {F}}}= \left( {{\mathcal {F}}}_i\right) _{i\ge 0}\) such that for \(Z_i =X_i + {{\mathbb {E}}}\left( X_{i+1}|{{\mathcal {F}}}_i\right) -{{\mathbb {E}}}\left( X_{i}|{{\mathcal {F}}}_{i-1}\right) \) we have the following:

  (0) \(X_i\) is \({{\mathcal {F}}}_i\)-measurable,

  (1) N is a stopping time with respect to \({{\mathcal {F}}}\),

  (2) \((Z_i)_{i\ge 1}\) is stationary,

  (3) for each \(j \in {{\mathbb {N}}}\), the process \((Z_i)_{i\ge j + 3}\) is independent of \({{\mathcal {F}}}_{j}\),

  (4) \(\left( {{\mathbb {E}}}\left( X_{i}|{{\mathcal {F}}}_{i-1}\right) \right) _{i\ge 1}\) is stationary,

  (5) \( {{\mathbb {E}}}(X_i|{{\mathcal {F}}}_{i-1})\) is independent of \(X_{i+1}\) for all \(i \ge 1\).

Then for any \(t > 0\) and \(a>0\),

$$\begin{aligned} {\mathbb {P}}\left( \left| \sum _{i=1}^N X_i\right| > t \right) \le 4\exp (8) \exp \left( -\frac{t^\alpha }{u c^\alpha }\right) + 9 \exp \left( -\frac{t^2}{v \sigma _\infty ^2 + w t}\right) , \end{aligned}$$
(5.8)

where \(u = \frac{16 \cdot 26^\alpha }{\alpha }\), \(v = 102 a\), \(w = 14M\max \left( 2, \sqrt{\Vert \left( \lceil N/3 \rceil - a + 1\right) _+ \Vert _{\psi _1}}\right) \) and \(M = c(24\alpha ^{-3} \log {n})^\frac{1}{ \alpha }\).

Proof

Observe that 0) and 3) imply 2-dependence of the process \((Z_i)_{i\ge 1}\). Therefore, the filtration \(\mathcal {F}\) satisfies all the assumptions of Lemma 5.1 and thus (5.1) holds. Note also that without loss of generality, we may assume that \(t\ge w\log 9 \). (Otherwise, the right-hand side of (5.8) is at least one.) Fix \(s = (8\sqrt{2}\log 9)^{-1}\). Using the union bound, we get (\({{\mathbb {E}}}_i\left( \cdot \right) = {{\mathbb {E}}}\left( \cdot \;|\;{{\mathcal {F}}}_i\right) \))

$$\begin{aligned} {\mathbb {P}}\left( \left| \sum _{i=1}^N X_i\right|> t \right) \le {\mathbb {P}}\left( \left| \sum _{i=1}^N Z_i\right|> t(1-s) \right) + 2{\mathbb {P}}\left( \sup _{1\le i \le n}\left| {{\mathbb {E}}}_{i - 1}X_i \right| > \frac{ts}{2} \right) .\nonumber \\ \end{aligned}$$
(5.9)

Now using Lemma A.5, \(ts/2 \ge c\left( \frac{2}{\alpha }\right) ^\frac{1}{\alpha }\), \(t\ge w\log 9 \) and \(n\exp \left( -\frac{(st)^\alpha }{4(2c)^\alpha }\right) \le 1\), we obtain

$$\begin{aligned} \begin{aligned} 2{\mathbb {P}}\left( \sup _{1\le i \le n}\left| {{\mathbb {E}}}_{i - 1}X_{i}\right|> \frac{st}{2} \right) \le 2n{\mathbb {P}}\left( \left| {{\mathbb {E}}}_0 X_{1}\right| > \frac{st}{2} \right) \le 12\exp \left( -\frac{(st)^\alpha }{4(2c)^\alpha }\right) . \end{aligned}\qquad \quad \end{aligned}$$
(5.10)

Next, we take care of the other term on the right-hand side of (5.9). Firstly, we split the sum

$$\begin{aligned} {\mathbb {P}}\left( \left| \sum _{i=1}^N Z_{i} \right|> t(1-s) \right) \le \sum _{j=0}^2 {\mathbb {P}}\left( \left| \sum _{1\le i\le N,\; 3|(i+j)} Z_i \right| > \frac{t(1-s)}{3} \right) . \end{aligned}$$
(5.11)

Now, we will consider the jth summand of the above sum. Let us take \(r = \frac{3}{8\sqrt{2}\log (9)}\) and notice that there is a function \(f_j:{{\mathbb {N}}}\rightarrow {{\mathbb {N}}}\) such that for any \(n\in {{\mathbb {N}}}\), \(\left\lfloor \frac{n}{3} \right\rfloor \le f_j(n) \le \left\lceil \frac{n}{3} \right\rceil \) and

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\left( \left| \sum _{1\le i\le N,\; 3|i + j} Z_{i} \right|> t(1-s)/3 \right) = {\mathbb {P}}\left( \left| \sum _{1\le i\le f_j(N)} Z_{3i - j} \right|> t(1-s)/3 \right) \\&\quad \le {\mathbb {P}}\left( \left| \sum _{1\le i\le \lceil N/3 \rceil + 1} Z_{3i - j } \right|> t(1-r)(1-s)/3 \right) + {\mathbb {P}}\left( 2\sup _{k\le n+6} \left| Z_{k} \right| >t r(1-s)/3\right) .\\ \end{aligned}\nonumber \\ \end{aligned}$$
(5.12)

Due to \(\Vert Z_i\Vert _{\psi _\alpha } \le c (8/\alpha )^\frac{1}{\alpha }\) (cf. (5.1)) and Lemma A.4 along with \(t \ge w \log (9)\), \(n \ge 2\) (for \(n = 1\), the result of the lemma is trivial), we get

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\left( 2\sup _{k\le n+6} \left| Z_{k} \right|>\frac{t r(1-s)}{3}\right) \le (n + 6) {\mathbb {P}}\left( \left| Z_{k} \right| >\frac{t r(1-s)}{3}\right) \\&\quad \le 2(n + 6) \exp \left( -\frac{\alpha (tr(1 - s))^\alpha }{8(3c)^\alpha }\right) \le 2\exp \left( -\frac{\alpha (tr(1 - s))^\alpha }{16(3c)^\alpha }\right) . \end{aligned} \end{aligned}$$
(5.13)

To handle the first summand on the right-hand side of (5.12), let us fix j and denote \(\gamma _i := Z_{3i + 3 - j}\), \({{\mathcal {G}}}_i := {{\mathcal {F}}}_{3i - j}\), \(T:= \lceil N/3 + 1\rceil \le \lceil n/3 \rceil + 1\). Using the assumptions on the filtration \(\mathcal {F}\) and (5.1), it is straightforward to check that the following properties hold:

  1. \(\gamma _i\) are independent,

  2. \({{\mathbb {E}}}\gamma _i = 0\), \({{\mathbb {E}}}\gamma _i^2 = \sigma _\infty ^2\), \(\Vert \gamma _i\Vert _{\psi _\alpha } \le c (8/\alpha )^\frac{1}{\alpha }\),

  3. \(\gamma _{i-1}\) is \({{\mathcal {G}}}_i\)-measurable,

  4. \(\gamma _i\) is independent of \({{\mathcal {G}}}_i\),

  5. T is a stopping time with respect to the filtration \({{\mathcal {G}}}_i\).

This is precisely the setting of Proposition 4.4 (ii) from [2], which, applied with \(\epsilon := 1\), \(p:= \frac{\sqrt{2}}{\sqrt{2}-1}\) and \(q := \sqrt{2}\), gives that for any \(a > 0\),

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\left( \left| \sum _{1\le i\le \lceil N/3 \rceil + 1} Z_{3i - j } \right| > t(1-r)(1-s)/3 \right) \\&\quad \le \exp (8)\exp \left( -\frac{(t(1-r)(1-s))^\alpha }{2(3(2+\sqrt{2}){\hat{c}})^\alpha }\right) \\&\qquad + 3\exp \left( -\frac{(t(1-r)(1-s))^2}{72 a \sigma _\infty ^2 +6\sqrt{2}\mu (1-r)(1-s)t}\right) , \end{aligned} \end{aligned}$$
(5.14)

where

$$\begin{aligned} \mu = \max \left( \frac{8M}{3}, 2\sigma _\infty \sqrt{\Vert \left( \lceil N/3 \rceil - a + 1\right) _+ \Vert _{\psi _1}}\right) , \qquad {\hat{c}} = c \left( \frac{8}{\alpha }\right) ^\frac{1}{\alpha }. \end{aligned}$$

Using (5.1), Lemma A.2 with \(Y = \frac{\alpha Z^\alpha }{8c^\alpha }\) and \(\beta = \frac{2}{\alpha }\), together with the gamma function estimate \(\Gamma (x) \le \left( \frac{x}{2}\right) ^{x - 1} \text { for } x \ge 2\) (see Theorem 1 in [18]), we get

$$\begin{aligned} \sigma _\infty ^2 = {{\mathbb {E}}}Z_1^2 \le 2c^2\left( \frac{8}{\alpha }\right) ^\frac{2}{\alpha }\Gamma \left( \frac{2}{\alpha } + 1\right) \le \frac{4}{\alpha }c^2\left( \frac{8}{\alpha }\right) ^\frac{2}{\alpha }\Gamma \left( \frac{2}{\alpha }\right) \le 4 c^2\left( \frac{8}{\alpha ^2}\right) ^{\frac{2}{\alpha }}, \end{aligned}$$

which implies that \(\sigma _\infty \le \frac{2}{3}M\) and as a consequence,

$$\begin{aligned} \mu \le \frac{4}{3}M b, \qquad b = \max \left( 2, \sqrt{\Vert \left( \lceil N/3 \rceil - a + 1\right) _+ \Vert _{\psi _1}}\right) . \end{aligned}$$

Therefore, (5.14) reduces to

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\left( \left| \sum _{1\le i\le \lceil N/3 \rceil + 1} Z_{3i - j } \right| > t(1-r)(1-s)/3 \right) \\&\quad \le \exp (8)\exp \left( -\frac{(t(1-r)(1-s))^\alpha }{2(3(2+\sqrt{2}){\hat{c}})^\alpha }\right) \\&\qquad + 3\exp \left( -\frac{(t(1-r)(1-s))^2}{72 a \sigma _\infty ^2 +8\sqrt{2}M b(1-r)(1-s)t}\right) .\\ \end{aligned} \end{aligned}$$

Combining the above inequality with (5.9)–(5.13), we obtain

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\left( \left| \sum _{i=1}^N X_i\right| > t \right) \le 12\exp \left( -\frac{(st)^\alpha }{4(2c)^\alpha }\right) + 6\exp \left( -\frac{\alpha (tr(1 - s))^\alpha }{16(3c)^\alpha }\right) \\&\quad + 3\exp (8)\exp \left( -\frac{(t(1-r)(1-s))^\alpha }{2(3(2+\sqrt{2}){\hat{c}})^\alpha }\right) \\&\quad +9\exp \left( -\frac{(t(1-r)(1-s))^2}{72 a \sigma _\infty ^2 +8\sqrt{2}M b(1-r)(1-s)t}\right) . \end{aligned} \end{aligned}$$

To conclude, it is now enough to recall that \(r = 3(8\sqrt{2}\log (9))^{-1}\), \(s = (8\sqrt{2}\log 9)^{-1}\) and do some elementary calculations. \(\square \)

6 Bounds on the Number of Regenerations

We will now obtain a bound on the stopping time N, introduced in (3.20). To this end, we will use the \(\psi _1\) version of the Bernstein inequality, which follows easily from the classical moment version of this inequality (see, e.g., Lemma 2.2.11 in [26]), by observing that for \(k \ge 2\), \({{\mathbb {E}}}|\xi |^k \le k! \Vert \xi \Vert _{\psi _1}^k = k!M^{k-2}v/2\) (a consequence of the pointwise inequality \(e^u \ge 1 + u^k/k!\) for \(u \ge 0\)), where \(M = \Vert \xi \Vert _{\psi _1}\), \(v = 2\Vert \xi \Vert _{\psi _1}^2\).

Lemma 6.1

(\(\psi _1\) Bernstein inequality) If \((\xi _i)_i\) is a sequence of independent centered random variables such that \(\sup _i \Vert \xi _i \Vert _{\psi _1}\le \tau \), then

$$\begin{aligned} {\mathbb {P}}\left( \sum _{i=1}^n \xi _i \ge t\right) \le \exp \left( - \frac{t^2}{4n\tau ^2+2\tau t}\right) . \end{aligned}$$

Lemma 6.2

If \(\Vert \tau _1-\tau _0\Vert _{\psi _1} \le d \), then for any \(p>0\),

$$\begin{aligned} {\mathbb {P}}\left( N > \left\lceil {(1+p)n\left[ {{\mathbb {E}}}(\tau _1-\tau _0)\right] ^{-1}}\right\rceil \right) \le \exp (1)\exp \left( -\frac{p n {{\mathbb {E}}}(\tau _1-\tau _0)}{K_p d^2 }\right) , \end{aligned}$$
(6.1)

where \(K_p = L_p + 16/L_p\) and \(L_p = \frac{16}{p} + 20\). Moreover, the function \(p\rightarrow K_p\) is decreasing on \({{\mathbb {R}}}_+\) (in particular, \(K_p \ge K_\infty = \frac{104}{5}\)) and if \(p=2/3\), then \(\frac{1}{p} K_{p} \le 67\) .

Proof

For convenience, let \(T_i = \tau _i - \tau _{i - 1}\) for \(i \ge 1\). Firstly, notice that without loss of generality, we may assume that \(np \ge L_p {{\mathbb {E}}}T_1\). Indeed, otherwise, using \({{\mathbb {E}}}T_1 \le d\) we obtain

$$\begin{aligned} \exp (1)\exp \left( -\frac{p n {{\mathbb {E}}}T_1}{K_p d^2 }\right) \ge \exp (1)\exp \left( -\frac{L_p {{\mathbb {E}}}^2T_1}{K_p d^2 }\right) \ge \exp \left( 1 -\frac{L_p}{K_p}\right) \ge 1. \end{aligned}$$

Thus, from now on we consider n such that \(np \ge L_p {{\mathbb {E}}}T_1\). For \(A = (1+p)n\left[ {{\mathbb {E}}}T_1\right] ^{-1} \ge 1\), we get

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}(N > \lceil {A}\rceil ) \le {\mathbb {P}}(\tau _{\lceil {A}\rceil }-\tau _0\le n) \le {\mathbb {P}}\left( \sum _{i=0}^{\lceil {A}\rceil -1}T_{i + 1} -{{\mathbb {E}}}T_{i + 1} \le n -A {{\mathbb {E}}}T_{1}\right) \\&\quad = {\mathbb {P}}\left( \sum _{i=0}^{\lceil {A}\rceil - 1 }T_{i + 1} -{{\mathbb {E}}}T_{i + 1} \le n - (1+p)n\right) \\&\quad = {\mathbb {P}}\left( \sum _{i=0}^{\lceil {A}\rceil -1}T_{i + 1} -{{\mathbb {E}}}T_{i + 1} \le -np\right) . \end{aligned} \end{aligned}$$
(6.2)

Now, we have \(\Vert T_{i+1} -{{\mathbb {E}}}T_{i+1}\Vert _{\psi _1} \le 2d\), so using Lemma 6.1, \({{\mathbb {E}}}T_1 \le d\) and \(np \ge L_p {{\mathbb {E}}}T_1\), we get

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\left( N > \left\lceil (1+p)n \left[ {{\mathbb {E}}}T_1\right] ^{-1} \right\rceil \right) \le \exp \left( -\frac{p^2 n^2}{4(A+1) 4d^2 + 4d np}\right) \\&\quad = \exp \left( -\frac{p^2 n^2}{16 d^2 \left[ (1+p)n\left[ {{\mathbb {E}}}T_1\right] ^{-1} + 1\right] + 4d np}\right) \\&\quad = \exp \left( -\frac{p n {{\mathbb {E}}}T_{1}}{16 d^2 \left( \frac{1+p}{p} +\frac{{{\mathbb {E}}}T_{1}}{pn}\right) + 4d {{\mathbb {E}}}T_{1}}\right) \\&\quad \le \exp \left( -\frac{p n {{\mathbb {E}}}T_{1}}{16 d^2 \left( \frac{1+p}{p} +\frac{1}{L_p}\right) + 4d^2}\right) = \exp \left( -\frac{p n {{\mathbb {E}}}T_{1}}{K_p d^2 }\right) \\&\quad \le \exp \left( 1 - \frac{p n {{\mathbb {E}}}T_{1}}{K_p d^2 }\right) , \end{aligned} \end{aligned}$$

which finishes the proof of (6.1). The properties of \(K_p\) follow from easy computations. \(\square \)
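The equality \(16 d^2\left( \frac{1+p}{p} +\frac{1}{L_p}\right) + 4d^2 = K_p d^2\) used in the last display of the proof follows from \(16\cdot \frac{1+p}{p} + \frac{16}{L_p} + 4 = \frac{16}{p} + 20 + \frac{16}{L_p} = L_p + \frac{16}{L_p} = K_p\). As for the properties of \(K_p\): \(L_p\) decreases to 20 as \(p\rightarrow \infty \) and \(x\mapsto x + 16/x\) is increasing on \([4,\infty )\), so \(p\mapsto K_p\) is decreasing and \(K_p \ge K_\infty = 20 + \frac{16}{20} = \frac{104}{5}\). Moreover, \(L_{2/3} = 44\), whence

$$\begin{aligned} \frac{3}{2} K_{2/3} = \frac{3}{2}\left( 44 + \frac{16}{44}\right) = \frac{732}{11} \le 67. \end{aligned}$$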

The following lemma is a standard consequence of the tail estimates given in Lemma 6.2. Its proof, based on integration by parts, is analogous to that of Lemma 5.4 in [2] and is therefore omitted.

Lemma 6.3

Suppose that \(\Vert \tau _1-\tau _0\Vert _{\psi _1} \le d \) for some \(d>0\). Then for any \(p>0\),

$$\begin{aligned} \left\| \left( N - a\right) _+\right\| _{\psi _1} \le \frac{4 K_p d^2}{\left[ {{\mathbb {E}}}(\tau _1-\tau _0)\right] ^2}\le \frac{4 K_p d^2}{m^2}, \end{aligned}$$

where \(a = (1+p) n \left[ {{\mathbb {E}}}(\tau _1-\tau _0)\right] ^{-1}\), \(K_p = L_p + \frac{16}{L_p}\) and \(L_p = \frac{16}{p} + 20\). Moreover,

$$\begin{aligned} \frac{d^2 K_p}{\left[ {{\mathbb {E}}}(\tau _1-\tau _0)\right] ^2} \ge K_p \ge K_\infty . \end{aligned}$$

7 Proofs of Theorems 4.1, 4.3 and 4.4

In this section, we will prove our main results. The proofs of Theorems 4.1 and 4.3 have a similar structure and share a common part, which we present first in Sects. 7.1 and 7.2. The proof of Theorem 4.1 will be concluded in Sect. 7.3 and the proof of Theorem 4.3 in Sect. 7.4. Theorem 4.4 will be obtained as a corollary to Theorem 4.1 in Sect. 7.5.

Let us thus pass to the proofs of Theorems 4.1 and 4.3. Assume that \(m \mid n\). The argument will be based on the approach of [1] and [2] (see also [10] and [12]) and will rely on the decomposition

$$\begin{aligned} \left| \sum _{i=0}^{n-1}f(\Upsilon _i)\right| \le H_n + M_n + T_n, \end{aligned}$$
(7.1)

where

$$\begin{aligned} \begin{aligned} H_n&= \left| \sum _{i=0}^{\tau _0 /m} \Theta _i {\mathbb {1}}_{N>0} + {\mathbb {1}}_{N=0} \sum _{i=0}^{n/m - 1} \Theta _i \right| ,\; M_n = \left| \sum _{i=1}^{N} \chi _{i-1}(f)\right| ,\\ T_n&=\left| {\mathbb {1}}_{N>0} \sum _{k=n}^{\tau _N + m - 1} f(\Upsilon _k)\right| , \; N = \inf \{i\ge 0 \;|\; \tau _i + m - 1\ge n-1\}. \end{aligned} \end{aligned}$$
(7.2)

The proof will be divided into three main steps. In the first two (common for both theorems), we will get easy bounds on tails of \(H_n\) and \(T_n\). The main, third step will be devoted to obtaining two different estimates on the tail of \(M_n\). To this end, we will use Lemmas 5.1 and 6.2 (for the proof of Theorem 4.1) and Lemmas 5.4 and 6.3 (for Theorem 4.3).

7.1 Estimate on \(H_n\)

Using \(\{N=0\}\subset \{\tau _0 \ge n - m\}\), the definition of \(\mathbf{{{a}} }\) (see (4.2)) and Lemma A.4, we get

$$\begin{aligned} \begin{aligned} {\mathbb {P}}_{x^*}(H_n>t)&\le {\mathbb {P}}_{x^*}\left( {\mathbb {1}}_{N>0}\sum _{i=0}^{\tau _0 /m}\left| \Theta _i \right| + {\mathbb {1}}_{N=0} \sum _{i=0}^{n/m - 1} \left| \Theta _i \right|> t\right) \\&\le {\mathbb {P}}_{x^*}\left( \sum _{i=0}^{\tau _0 /m}\left| \Theta _i \right| > t\right) \\&\le 2 \exp \left( -\frac{t^\alpha }{\mathbf{{{a}} }^\alpha }\right) . \end{aligned} \end{aligned}$$
(7.3)

7.2 Estimate on \(T_n\)

By repeating verbatim the easy argument presented in the proof of Theorem 5.1 in [2], we obtain

$$\begin{aligned} {\mathbb {P}}\left( \left| T_n\right| > t \right) \le 2\left[ \delta \pi (C)\right] ^{-1} \exp \left( -\frac{t^\alpha }{\mathbf{{{b}} }^\alpha }\right) . \end{aligned}$$
(7.4)

We skip the details.

7.3 Proof of Theorem 4.1

Recall that \(M = \mathbf{{{c}} }(24\alpha ^{-3} \log {n})^\frac{1}{ \alpha }\) and note that without loss of generality, we can assume that \(t \ge 8 M \log 6\); otherwise, (4.3) is trivial, as its right-hand side is greater than or equal to 1. Fix \(p = 2/3\) and set \(A := \left\lceil {(p+1)n({{\mathbb {E}}}(\tau _1-\tau _0))^{-1}}\right\rceil \). We have

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\left( M_n \ge t \right) = {\mathbb {P}}\left( M_n \ge t ,\; N\le A \right) + {\mathbb {P}}\left( M_n \ge t, N> A\right) \\&\quad \le {\mathbb {P}}\left( \sup _{1\le k \le A} \left| \sum _{i=1}^{k} \chi _{i-1}\right| \ge t \right) + {\mathbb {P}}\left( N > A\right) . \end{aligned} \end{aligned}$$
(7.5)

To control the first summand on the right-hand side of the above inequality, we will apply Lemma 5.1 with \(m=2\), \(X_i := \chi _{i} = F(\Xi _{i + 1})\) (cf. (3.16)), \(c := \mathbf{{{c}} }\) and \(n := A\). Assuming for the moment that the assumptions of the lemma are satisfied (we will verify them below), we obtain (in the first line, we use the stationarity of \((\Xi _i)_{i\ge 1}\)):

$$\begin{aligned}&P := {\mathbb {P}}\left( \sup _{1\le k \le A} \left| \sum _{i=1}^{k} \chi _{i-1}\right| \ge t \right) = {\mathbb {P}}\left( \sup _{1\le k \le A} \left| \sum _{i=1}^{k} F(\Xi _{i+1})\right| \ge t \right) \nonumber \\&\quad \le 6\exp (8)\exp \left( -\frac{t^\alpha }{\frac{16}{\alpha } (24 \mathbf{{{c}} })^\alpha }\right) \nonumber \\&\qquad + \,6\exp \left( -\frac{t^2}{15 \left( \left\lceil {(p+1)n({{\mathbb {E}}}(\tau _1-\tau _0))^{-1}}\right\rceil +3\right) \sigma _\infty ^2+6 t M}\right) \nonumber \\&\quad \le 6\exp (8)\exp \left( -\frac{t^\alpha }{\frac{16}{\alpha } (24 \mathbf{{{c}} })^\alpha }\right) \nonumber \\&\qquad +\, 6\exp \left( -\frac{t^2}{15 \left( (p+1)n({{\mathbb {E}}}(\tau _1-\tau _0))^{-1}+4\right) \sigma _\infty ^2+6 t M}\right) \end{aligned}$$
(7.6)

Recall that by (3.22), \(\sigma _\infty ^2 = \sigma _{Mrv}^2 {{\mathbb {E}}}(\tau _1-\tau _0)\). We will now obtain a comparison between \(\sigma ^2_\infty \) and tM, which will allow us to reduce the above estimate to one in which the sub-Gaussian coefficient is expressed only in terms of \(\sigma _{Mrv}^2\). Thanks to Lemma A.2 applied with \(\Lambda := (\chi _1/\mathbf{{{c}} })^\alpha \) and \(\beta := 2/\alpha \), we have

$$\begin{aligned} \sigma _\infty ^2 \le 3{{\mathbb {E}}}\chi ^2_1 \le 3\mathbf{{{c}} }^2 \Gamma (2/\alpha + 1) \le 3\mathbf{{{c}} }^2 (2/\alpha )^{\frac{2}{\alpha } + 1}, \end{aligned}$$

where the last inequality is a consequence of equation 4 in [18]. Moreover, recalling the definition of M and using the assumption \(t \ge 8\log (6)M\), we obtain

$$\begin{aligned} \begin{aligned} tM&\ge 8 \log (6) M^2 =8\log (6) \mathbf{{{c}} }^2(24 \alpha ^{-3} \log (n))^\frac{2}{\alpha }\ge 16\cdot 8\log (6) \mathbf{{{c}} }^2(2/\alpha )^{\frac{2}{\alpha } + 1} \ge 76 \sigma _\infty ^2. \end{aligned} \end{aligned}$$
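Here, the last inequality follows by combining the bound \(\sigma _\infty ^2 \le 3\mathbf{{{c}} }^2 (2/\alpha )^{\frac{2}{\alpha } + 1}\) obtained above with the numerical estimate \(16\cdot 8\log (6) = 128\log (6) \ge 228 = 76\cdot 3\).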

The last inequality in combination with (7.6) yields

$$\begin{aligned} \begin{aligned} P&\le 6\exp (8)\exp \left( -\frac{t^\alpha }{\frac{16}{\alpha } (24 \mathbf{{{c}} })^\alpha }\right) + 6\exp \left( -\frac{t^2}{15 (p+1)n\sigma _{Mrv}^2+7 t M}\right) \\&\le 6\exp (8)\exp \left( -\frac{t^\alpha }{\frac{16}{\alpha } (24 \mathbf{{{c}} })^\alpha }\right) + 6\exp \left( -\frac{t^2}{25 n\sigma _{Mrv}^2+ 7 t M}\right) . \end{aligned} \end{aligned}$$
(7.7)

In order to justify the above inequality, it remains to verify the assumptions of Lemma 5.1. To this end, take

$$\begin{aligned} {{\mathcal {F}}}_i = \sigma \{\Xi _j\;|\; j\le i+1 \}, \quad Z_i =\chi _i+{{\mathbb {E}}}(\chi _{i+1}|{{\mathcal {F}}}_{i})-{{\mathbb {E}}}(\chi _{i}|{{\mathcal {F}}}_{i-1}). \end{aligned}$$

We will now strongly rely on the properties of the stationary sequence of 1-dependent blocks \((\Xi _i)_{i\ge 1}\) stated in Remark 3.10 together with (3.15) and (3.16). Since \(\chi _i = F(\Xi _{i+1})\), the assumption 0) of Lemma 5.1 is trivially satisfied. To prove 2), observe that \({{\mathbb {E}}}(\chi _{i+1}|{{\mathcal {F}}}_{i}) = {{\mathbb {E}}}(\chi _{i+1}|\Xi _{i+1}) = G(\Xi _{i+1})\) for some measurable function \(G\), and so the sequence \(\left( Z_i\right) _{i\ge 1}=\left( F(\Xi _{i+1})+G(\Xi _{i+1}) - G(\Xi _{i})\right) _{i\ge 1}\) is stationary (as a function of the stationary sequence \(\left( \Xi _i\right) _{i\ge 1}\)). The sequence \(\left( \Xi _i\right) _{i\ge 1}\) is 1-dependent, which clearly implies that \(\left( Z_i\right) _{i\ge 1}\) is 2-dependent, i.e., the assumption 2) of the lemma. The assumption 3), i.e., the stationarity of the sequence \(\left( {{\mathbb {E}}}(\chi _i|{{\mathcal {F}}}_{i-1})\right) _{i\ge 1} = \left( G(\Xi _i)\right) _{i\ge 1}\), follows again by stationarity of \(\left( \Xi _i\right) _{i\ge 1}\). Finally, using once more the fact that \((\Xi _i)_{i\ge 1}\) is 1-dependent, we obtain that for any \(i\ge 1\), the random variable \({{\mathbb {E}}}(\chi _i|{{\mathcal {F}}}_{i-1}) = G(\Xi _i)\) is independent of \(\chi _{i+1} = F(\Xi _{i+2})\), which ends the verification of the assumptions of Lemma 5.1 and proves (7.7).

Thus, in order to get a bound on \({\mathbb {P}}(M_n > t)\), it suffices to estimate the second term on the right-hand side of (7.5). To this end, we use Lemma 6.2 with \(p =2/3\) and \(d= \mathbf{{{d}} }\) (together with the bound \(\frac{1}{p} K_{p} \le 67\) from that lemma), obtaining

$$\begin{aligned} {\mathbb {P}}\left( N > \left\lceil {(1+p)n({{\mathbb {E}}}(\tau _1-\tau _0))^{-1}}\right\rceil \right) \le \exp (1)\exp \left( -\frac{ n {{\mathbb {E}}}(\tau _1-\tau _0)}{67 \mathbf{{{d}} }^2 }\right) . \end{aligned}$$

In combination with (7.5) and (7.7), this gives

$$\begin{aligned} \begin{aligned} {\mathbb {P}}\left( M_n \ge t \right)&\le 6\exp (8)\exp \left( -\frac{t^\alpha }{\frac{16}{\alpha } (24 \mathbf{{{c}} })^\alpha }\right) + 6\exp \left( -\frac{t^2}{25 n\sigma _{Mrv}^2+ 7 t M}\right) \\&\quad +\exp (1)\exp \left( -\frac{ n {{\mathbb {E}}}(\tau _1-\tau _0)}{67 \mathbf{{{d}} }^2 }\right) . \end{aligned} \end{aligned}$$

Combining the above inequality with (7.3) and (7.4), we get

$$\begin{aligned}&{\mathbb {P}}_x \left( \left| \sum _{i=0}^{n-1} f(\Upsilon _i)\right| >t\right) \le {\mathbb {P}}\left( H_n\ge \frac{1-\sqrt{5/6}}{2}t\right) + {\mathbb {P}}\left( M_n\ge \sqrt{5/6}t\right) \\&\qquad + {\mathbb {P}}\left( T_n\ge \frac{1-\sqrt{5/6}}{2}t\right) \\&\quad \le 2\exp \left( -\frac{t^\alpha }{(23\mathbf{{{a}} })^\alpha }\right) + 2\left[ \delta \pi (C)\right] ^{-1} \exp \left( -\frac{t^\alpha }{(23\mathbf{{{b}} })^\alpha }\right) \\&\qquad +\exp (1)\exp \left( -\frac{ n {{\mathbb {E}}}(\tau _1-\tau _0)}{67 \mathbf{{{d}} }^2 }\right) \\&\qquad + 6\exp (8)\exp \left( -\frac{t^\alpha }{\frac{16}{\alpha } (27 \mathbf{{{c}} })^\alpha }\right) + 6\exp \left( -\frac{t^2}{30 n\sigma _{Mrv}^2+8 t M}\right) . \end{aligned}$$
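Here, we used the elementary numerical bounds \(\frac{2}{1-\sqrt{5/6}} \le 23\), \(24\sqrt{6/5} \le 27\), \(\frac{6}{5}\cdot 25 = 30\) and \(7\sqrt{6/5} \le 8\) to absorb the splitting constants into the constants appearing in the individual estimates.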

In order to finish the proof of Theorem 4.1, it is enough to substitute \({{\mathbb {E}}}(\tau _1-\tau _0) = \delta ^{-1} \pi (C)^{-1}m\).

7.4 Proof of Theorem 4.3

Recall that \(M = \mathbf{{{c}} }(24\alpha ^{-3} \log {n})^\frac{1}{ \alpha }\), and let \(p>0\) be a parameter which will be fixed later on. We are going to apply Lemma 5.4 with \(X_i := \chi _{i} = F(\Xi _{i + 1})\), \(c := \mathbf{{{c}} }\), \({{\mathcal {F}}}_i := \sigma \{\Xi _j\;|\; 0\le j \le i + 1\}\). Clearly, N is a stopping time with respect to \({{\mathcal {F}}}\). The remaining assumptions of Lemma 5.1 can be verified in the same manner as in the proof of Theorem 4.1. Let \(a = (1+p)\frac{n}{3} \left[ {{\mathbb {E}}}(\tau _1-\tau _0)\right] ^{-1}\). By Lemma 6.3, we get

$$\begin{aligned} \begin{aligned} \left\| \left( \lceil N/3 \rceil - a + 1 \right) _+\right\| _{\psi _1}&\le \frac{1}{3}\left\| \left( N - (1+p)n({{\mathbb {E}}}(\tau _1-\tau _0))^{-1} \right) _+\right\| _{\psi _1} + \frac{2}{\ln 2}\\&\le \frac{4}{3} \mathbf{{{d}} }^2 K_p + \frac{2}{\ln 2} \le \left( \frac{4}{3} +\frac{7}{50}\right) \mathbf{{{d}} }^2 K_p, \end{aligned} \end{aligned}$$

where the last inequality follows from (recall the definition of \(K_\infty \) from Lemma 6.2)

$$\begin{aligned} \frac{7}{50}K_p \ge \frac{7}{50}K_\infty = \frac{7}{50}\cdot \frac{104}{5} \ge \frac{2}{\ln 2}. \end{aligned}$$
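The first inequality in the penultimate display follows from \(\left( \lceil N/3 \rceil - a + 1\right) _+ \le \frac{1}{3}\left( N - 3a\right) _+ + 2\), the monotonicity and the triangle inequality of \(\Vert \cdot \Vert _{\psi _1}\), and the fact that the \(\psi _1\) norm of the constant 2 equals \(2/\ln 2\).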

Therefore, \(\max \left( 2, \sqrt{\Vert \left( \lceil N/3 \rceil - a + 1\right) _+ \Vert _{\psi _1}}\right) \le \sqrt{4/3 + 7/50} \sqrt{K_p} \cdot \mathbf{{{d}} }\) and we get that for arbitrary \(p > 0\),

$$\begin{aligned}&{\mathbb {P}}\left( \left| \sum _{i=1}^N \chi _{i-1}\right| >t\right) \le 4\exp (8)\exp \left( -\frac{t^\alpha }{\frac{16}{\alpha }(26\mathbf{{{c}} })^\alpha }\right) \\&\quad +9\exp \left( -\frac{t^2}{34(1+p) n\sigma _{Mrv}^2 + 17 M \mathbf{{{d}} }t\sqrt{K_p}}\right) . \end{aligned}$$

Using the above inequality together with (7.3) and (7.4), we obtain

$$\begin{aligned}&{\mathbb {P}}_x \left( \left| \sum _{i=0}^{n-1} f(\Upsilon _i)\right| >t\right) \le {\mathbb {P}}\left( H_n\ge \frac{t}{54}\right) + {\mathbb {P}}\left( M_n\ge \frac{26t}{27}\right) + {\mathbb {P}}\left( T_n\ge \frac{t}{54}\right) \\&\quad \le 2\exp \left( -\frac{t^\alpha }{(54\mathbf{{{a}} })^\alpha }\right) + 2\left[ \delta \pi (C)\right] ^{-1} \exp \left( -\frac{t^\alpha }{(54\mathbf{{{b}} })^\alpha }\right) \\&\qquad + 4\exp (8)\exp \left( -\frac{t^\alpha }{\frac{16}{\alpha }(27\mathbf{{{c}} })^\alpha }\right) \\&\qquad +9\exp \left( -\frac{t^2}{37(1+p) n\sigma _{Mrv}^2 + 18 M \mathbf{{{d}} }t\sqrt{K_p}}\right) , \end{aligned}$$

which concludes the proof of Theorem 4.3.
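Above, the splitting \(t = \frac{t}{54} + \frac{26t}{27} + \frac{t}{54}\) together with the elementary bounds \(\left( \frac{27}{26}\right) ^2\cdot 34 \le 37\) and \(\frac{27}{26}\cdot 17 \le 18\) accounts for the numerical constants in the last display.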

7.5 Proof of Theorem 4.4

Denote \(M = \left\| f\right\| _{\infty }\) and notice that for \(t > n M\), the left-hand side of (4.6) vanishes, so we may assume that \(t \le n M\). Using (4.4), one can easily see that if \(m \mid n\), then Theorem 4.1 applied with \(\alpha = 1\) implies that

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}_x \left( \left| \sum _{i=0}^{n-1} f(\Upsilon _i)\right| >t\right) \le \left( 2 + 2\left[ \delta \pi (C)\right] ^{-1}\right) \exp \left( -\frac{t}{46DM}\right) \\&\quad + 6\exp (8)\exp \left( -\frac{t}{432DM}\right) \\&\quad + 6\exp \left( -\frac{t^2}{30 n\sigma _{Mrv}^2+ 192 t DM}\right) +\exp (1)\exp \left( -\frac{ n m}{67 \delta \pi (C)D^2 }\right) . \end{aligned} \end{aligned}$$
(7.8)

The assumption \(t \le nM\) yields

$$\begin{aligned} \exp \left( -\frac{ n m}{67 \delta \pi (C)D^2 }\right) \le \exp \left( -\frac{ t m}{67 \delta \pi (C)MD^2 }\right) , \end{aligned}$$

which, plugged into (7.8), gives after some elementary calculations (recall that \(K =\exp (10) + 2\left[ \delta \pi (C)\right] ^{-1}\)) that

$$\begin{aligned} \begin{aligned} {\mathbb {P}}_x \left( \left| \sum _{i=0}^{n-1} f(\Upsilon _i)\right| >t\right)&\le K\exp \left( -\frac{t^2}{30n\sigma _{Mrv}^2 + 432t D^2 M \delta \pi (C) \log n}\right) , \end{aligned} \end{aligned}$$
(7.9)

proving the theorem in the special case \(m \mid n\).

Now, we consider the case \(m\not \mid n\). Define \(\lceil n \rceil _m\) to be the smallest integer that is greater than or equal to \(n\) and divisible by \(m\). Notice that without loss of generality, we can assume that \(t > 4330D^2M\delta \pi (C)\). (Otherwise, the assertion of the theorem is trivial, as the right-hand side of (4.6) exceeds one.) Since \(D^2\delta \pi (C) > m\) (recall \({{\mathbb {E}}}(\tau _1-\tau _0) = \delta ^{-1} \pi (C)^{-1}m\)), this implies that \(t \ge 4330 Mm\). Moreover, as \(t \le nM\), we also obtain that \(n\ge 4330m\).

Thus, for \(p = 1/4330\) we have \(\left| \sum _{i=n}^{\lceil n \rceil _m - 1} f(\Upsilon _i)\right| \le Mm \le pt\), and as a consequence,

$$\begin{aligned} \begin{aligned} {\mathbb {P}}_x \left( \left| \sum _{i=0}^{n-1} f(\Upsilon _i)\right|>t\right) \le {\mathbb {P}}_x \left( \left| \sum _{i=0}^{\lceil n \rceil _m -1} f(\Upsilon _i)\right| >(1-p)t\right) . \end{aligned} \end{aligned}$$
(7.10)

Now, using (7.9) (applied with \(\lceil n \rceil _m\) in place of \(n\) and \((1-p)t\) in place of \(t\)) and the inequality \(n \ge 4330m\), we get

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}_x \left( \left| \sum _{i=0}^{n-1} f(\Upsilon _i)\right| >t\right) \le K\exp \left( -\frac{t^2}{31\lceil n \rceil _m \sigma _{Mrv}^2 + 433t D^2 M\delta \pi (C)\log n}\right) \\&\qquad \le K\exp \left( -\frac{t^2}{31 (n + m) \sigma _{Mrv}^2 + 433 t D^2 M\delta \pi (C)\log n}\right) \\&\qquad \le K\exp \left( -\frac{t^2}{32n \sigma _{Mrv}^2 + 433t D^2 M\delta \pi (C)\log n}\right) . \\ \end{aligned} \end{aligned}$$
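Here, in the first inequality, we used \(30(1-p)^{-2} \le 31\) and \(432(1-p)^{-1}\log \lceil n \rceil _m \le 433\log n\) (recall that \(p = 1/4330\), \(m \le n/4330\) and \(\log n \ge 1\)), the second inequality follows from \(\lceil n \rceil _m \le n + m\), and the last one from \(31(n+m) \le 31\left( 1 + \frac{1}{4330}\right) n \le 32n\).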

This concludes the proof of Theorem 4.4.