1 Introduction

Nonconventional ergodic theorems, which have attracted substantial attention in ergodic theory (see, for instance, [2] and [13]), study the limits of expressions of the form \(1/N\sum _{n=1}^NT^{q_1(n)}f_1\cdots T^{q_\ell (n)}f_\ell \) where \(T\) is a weakly mixing measure preserving transformation, the \(f_i\) are bounded measurable functions and the \(q_i\) are polynomials taking on integer values on the integers. While, for instance, [2] and [13] were interested in \(L^2\) convergence, other papers such as [1] provided conditions for almost sure convergence in such ergodic theorems. Originally, these results were motivated by applications to multiple recurrence for dynamical systems, with the functions \(f_i\) taken to be indicators of measurable sets.

Introducing stronger mixing or weak dependence conditions enabled us in [22] to obtain functional central limit theorems for even more general expressions of the form

$$\begin{aligned} \frac{1}{\sqrt{N}}\sum _{n=1}^{[Nt]}\big ( F(X(q_1(n)),\ldots ,X(q_\ell (n)))-\bar{F}\big ) \end{aligned}$$
(1.1)

where \(X(n),\, n\ge 0\) is a sufficiently fast mixing vector valued process with some moment conditions and stationarity properties, \(F\) is a locally Hölder continuous function with polynomial growth, \(\bar{F}=\int Fd(\mu \times \cdots \times \mu )\) and \(\mu \) is the distribution of \(X(0)\). In order to ensure existence of limiting variances and covariances we had to impose certain assumptions on the functions \(q_j(n),\, j\ge 1\): there exists an integer \(k\ge 1\) such that \(q_j(n)=jn\) for \(j=1,\ldots ,k\), while the \(q_j(n),\, j>k\) are positive functions taking on integer values on the integers and satisfying certain (faster than linear) growth conditions.

The next natural step in the study of the limiting behavior of nonconventional sums \(S_N=\sum _{n=1}^NF\big (X(q_1(n)),\ldots ,X(q_\ell (n))\big )\) is to obtain large deviations estimates. Namely, in this paper we will be interested in the asymptotic behavior as \(N\rightarrow \infty \) of the probabilities

$$\begin{aligned} P\left\{ \frac{1}{N}S_N\in {\Gamma }\right\} \end{aligned}$$
(1.2)

for various (open or closed) sets \({\Gamma }\subset \mathbb{R }\). According to [19], under appropriate conditions \(\frac{1}{N}S_N\) converges with probability one as \(N\rightarrow \infty \) to \(\bar{F}=\int Fd(\mu \times \cdots \times \mu )\) where \(\mu \) is the common distribution of the \(X(n)\)’s. Thus, as usual, (1.2) describes deviations of \(\frac{1}{N}S_N\) from the limit in the law of large numbers.
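This law of large numbers is easy to observe numerically. The following minimal sketch makes purely illustrative choices (not from the text): i.i.d. \(X(n)\) uniform on \([0,1]\), \(\ell =2\), \(q_1(n)=n\), \(q_2(n)=2n\) and \(F(x_1,x_2)=x_1x_2\), so that \(\bar F=(EX(0))^2=1/4\).

```python
import random

# Illustrative law of large numbers for a nonconventional sum:
# X(n) i.i.d. uniform on [0,1], F(x1, x2) = x1 * x2, q1(n) = n, q2(n) = 2n,
# so that bar F = int F d(mu x mu) = (E X)^2 = 1/4.
random.seed(0)
N = 200_000
X = [random.random() for _ in range(2 * N + 1)]  # X[0], ..., X[2N]

S_N = sum(X[n] * X[2 * n] for n in range(1, N + 1))
print(abs(S_N / N - 0.25))  # small: S_N / N concentrates near bar F = 1/4
```

The terms \(F(X(n),X(2n))\) are not independent across \(n\) (the value \(X(2n)\) reappears as a first coordinate), which is precisely what makes the nonconventional setup nontrivial; nevertheless the empirical average settles at \(\bar F\).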

The study of the asymptotics of the probabilities in (1.2) leads to what is usually called the first level of large deviations. We will also study second level large deviations estimates, which in our setup means considering the occupational measures

$$\begin{aligned} \zeta _N=\frac{1}{N}\sum _{n=1}^N{\delta }_{\big (X(q_1(n)),\ldots ,X(q_\ell (n))\big )} \end{aligned}$$
(1.3)

and studying the asymptotic behavior as \(N\rightarrow \infty \) of the probabilities \(P\{\zeta _N\in \mathcal{U }\}\) where \(\mathcal{U }\) is a subset of the space of probability measures on the corresponding product space. In addition, we will also consider large deviations in the averaging setup, namely, for the “slow” variable \(\Xi ^{\varepsilon }(n)=\Xi _x^{\varepsilon }(n)\) given by a difference equation of the form

$$\begin{aligned}&\Xi ^{\varepsilon }(n+1)=\Xi ^{\varepsilon }(n)+{\varepsilon }F\big (\Xi ^{\varepsilon }(n),X(q_1(n)),\ldots ,X(q_\ell (n))\big ),\nonumber \\&\quad n=0,1,\ldots ,\,\Xi _x^{\varepsilon }(0)=x \end{aligned}$$
(1.4)

which is actually a generalization of the above since if \(F(\xi ,x_1,\ldots ,x_\ell )\) does not depend on \(\xi \) then \(\Xi ^{\frac{1}{N}}(N)= \frac{1}{N}S_N\). We will also deal with continuous time versions of the above results, considering \(S_T=\int _0^TF\big (X(q_1(t)),\ldots ,X(q_\ell (t))\big )dt\) for some stochastic process \(X(s),\, s\ge 0\).
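The identity \(\Xi ^{1/N}(N)=\frac{1}{N}S_N\) for \(\xi \)-independent \(F\) is a telescoping of (1.4), and can be checked directly. The sketch below uses hypothetical choices (i.i.d. \(X(n)\), \(\ell =2\), \(q_1(n)=n\), \(q_2(n)=2n\), \(F(x_1,x_2)=x_1+x_2\)) and indexes the \(N\) Euler steps by \(n=1,\ldots ,N\) so as to match the summation range of \(S_N\).

```python
import random

# If F(xi, x1, ..., x_l) does not depend on xi, the recursion (1.4) with
# epsilon = 1/N telescopes to the average of the nonconventional sum:
# Xi(N) - Xi(0) = (1/N) * sum of F over the N steps = S_N / N.
# Hypothetical choices: q1(n) = n, q2(n) = 2n, F(x1, x2) = x1 + x2,
# steps indexed by n = 1, ..., N to match S_N.
random.seed(1)
N = 1000
X = [random.random() for _ in range(2 * N + 1)]

def F(x1, x2):
    return x1 + x2

eps = 1.0 / N
xi = 0.0                   # Xi(0) = x = 0
for n in range(1, N + 1):  # one Euler step of (1.4) per n
    xi += eps * F(X[n], X[2 * n])

S_N = sum(F(X[n], X[2 * n]) for n in range(1, N + 1))
assert abs(xi - S_N / N) < 1e-9  # the two quantities coincide
```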

As for conventional sums (\(\ell =k=1\)), meaningful large deviations estimates can be obtained only for some specific classes of stochastic processes and dynamical systems. In our more general situation we assume in the probabilistic setup that \(X(n),\, n=0,1,\ldots \) is a Markov chain satisfying a (strong) Doeblin condition, while in the dynamical systems setup we consider \(X(n)=X(n,{\omega })=f(T^n{\omega })\) where \(T\) is either a mixing subshift of finite type, a hyperbolic diffeomorphism or an expanding transformation and \(f\) is a Hölder continuous (vector) function. In the continuous time case we take the underlying process \(X(t)\) in the probabilistic setup to be either an irreducible continuous time Markov chain with a finite state space or a nondegenerate diffusion on a compact manifold, while in the dynamical systems setup we can take \(X(t)=X(t,{\omega })=f(T^t{\omega })\) where \(T^t,\, t\ge 0\) is a hyperbolic flow on a compact manifold and \(f\) is a Hölder continuous (vector) function.

We will show that it is not difficult to reduce the problem to the case \(k=\ell \), so that the major difficulties arise only in dealing with the random variables \(X(n), X(2n),\ldots ,X(kn)\). When \(k=1\) this reduction leads to the standard (conventional) setup of large deviations. When \(k>1\) the general case of Markov sequences requires a quite elaborate technique and a lengthy proof, and it will be treated in another paper; here, for \(k>1\), we restrict ourselves to independent identically distributed (i.i.d.) sequences \(X(n), n\ge 0\), which, unlike in the conventional setup, is still nontrivial.

Both the probabilistic and the dynamical systems setups are united by common ideas and motivations, but their machineries are quite different. For this reason most of this paper deals with the probabilistic setup, and only in the last Sect. 5 do we discuss some dynamical systems results, which may especially benefit readers familiar with this field.

2 Preliminaries and main results

We start with the probabilistic discrete time setup where the underlying process \(X(0),\, X(1),\, X(2),\ldots \) is a Markov chain defined on a probability space \(({\Omega },\mathcal{F },P)\) and evolving on a Polish measurable space \((M,\mathcal{B })\) as its phase space. We assume a “strong” Doeblin condition saying that for some integer \(n_0>0\), a constant \(C>0\) and a probability measure \(\nu \) on \(M\) the \(n_0\)-step transition probability \(P(n_0,x,\cdot )\) of the above Markov chain \(X\) satisfies

$$\begin{aligned} C^{-1}\nu (G)\le P(n_0,x,G)\le C\nu (G) \end{aligned}$$
(2.1)

for any \(x\in M\) and every measurable set \(G\subset M\). It is well known (see, for instance, [8]) that (2.1) implies the existence of a unique invariant measure \(\mu \) of the Markov chain \(X\), and the invariance relation \(\mu (G)=\int d\mu (x)P(n_0,x,G)\) together with (2.1) yields that

$$\begin{aligned} C^{-1}\le \frac{d\mu }{d\nu }(x)=p(x)\le C \end{aligned}$$
(2.2)

where \(d\mu /d\nu \) denotes the Radon-Nikodym derivative.
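Indeed, (2.2) follows from (2.1) in one line: integrating the Doeblin bounds over \(x\) against the invariant measure gives

```latex
% sketch of the derivation of (2.2) from (2.1), using invariance of \mu:
C^{-1}\nu(G)\;\le\;\mu(G)=\int_M P(n_0,x,G)\,d\mu(x)\;\le\; C\,\nu(G)
\quad\text{for every measurable } G\subset M,
```

so \(\mu \) and \(\nu \) are mutually absolutely continuous and the density \(p=d\mu /d\nu \) is pinched between \(C^{-1}\) and \(C\) (\(\nu \)-almost everywhere).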

In all cases our setup also includes a bounded measurable function \(F=F(x_1,x_2,\ldots ,x_\ell )\) on the \(\ell \)-fold product space \(M^\ell = M\times \cdots \times M\). The setup becomes complete with the introduction of positive increasing functions \(q_j,\, j=1,\ldots ,\ell \) taking on integer values on integers and such that

$$\begin{aligned} q_j(n)=jn\quad \text{for }j=1,\ldots ,k\ \text{ and some }k\le \ell \end{aligned}$$
(2.3)

while for \(j=k+1,\ldots ,\ell \) and any \({\gamma }>0\),

$$\begin{aligned} \lim _{n\rightarrow \infty }(q_j(n)-q_j(n-1))=\infty \,\,\,\text{ and}\quad \liminf _{n\rightarrow \infty }(q_j({\gamma }n)- q_{j-1}(n))>0. \end{aligned}$$
(2.4)

For any function \(W\) on \(M^\ell \) we denote by \(\hat{W}\) the function on \(M\) defined by

$$\begin{aligned} \hat{W}(x)=\int \exp (W(x,x_2,\ldots ,x_\ell ))d\mu (x_2)\ldots d\mu (x_\ell ). \end{aligned}$$
(2.5)

As usual we denote by \(P_x\) the probability conditioned to \(X(0)=x\) and by \(E_x\) the corresponding expectation. Now, we can formulate our first result.

Theorem 2.1

Let \(W_{\lambda }(x_1,\ldots ,x_\ell ),\,{\lambda }\in (-\infty ,\infty )\) be a family of bounded measurable functions on \(M^\ell \) which is differentiable in \({\lambda }\) and such that \(dW_{\lambda }(x_1,\ldots ,x_\ell )/d{\lambda }\) is bounded for each \({\lambda }\) as well. Assume that \(k=1\) in (2.3) and (2.4). Then for any \(x\in M\) the limit

$$\begin{aligned} Q(W_{\lambda })=\lim _{N\rightarrow \infty }\frac{1}{N}\ln E_x\exp \bigg (\sum _{n=1}^N W_{\lambda }(X(q_1(n)),\ldots ,X(q_\ell (n)))\bigg ) \end{aligned}$$
(2.6)

exists, is independent of \(x\) and is differentiable in \({\lambda }\). In fact, \(Q(W_{\lambda })=\ln r(W_{\lambda })\) where \(r(W)\) is the spectral radius of the positive operator \(R(W)\) acting by

$$\begin{aligned} R(W)g(x)=\int P(x,dy)g(y)\hat{W}(y). \end{aligned}$$
(2.7)

Furthermore, set \(W_{\lambda }(x_1,\ldots ,x_\ell )={\lambda }F(x_1,\ldots ,x_\ell )\) and

$$\begin{aligned} J(u)=\sup _{\lambda }({\lambda }u-\ln r(W_{\lambda })),\, u\in \mathbb{R }. \end{aligned}$$
(2.8)

Then for any closed set \(K\subset \mathbb{R }\),

$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N}\ln P\left\{ \frac{1}{N}S_N\in K\right\} \le -\inf _{u\in K} J(u) \end{aligned}$$
(2.9)

and for any open set \(U\subset \mathbb{R }\),

$$\begin{aligned} \liminf _{N\rightarrow \infty }\frac{1}{N}\ln P\left\{ \frac{1}{N}S_N\in U\right\} \ge -\inf _{u\in U} J(u) \end{aligned}$$
(2.10)

where, as before, \(S_N=S_N(F)=\sum _{n=1}^NF\big (X(q_1(n)),\ldots , X(q_\ell (n))\big )\).
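In the conventional case \(\ell =k=1\) with i.i.d. \(X(n)\) the limit (2.6) reduces to the classical Cramér picture: \(Q({\lambda }F)=\ln Ee^{{\lambda }F(X(0))}\), and the rate function is its Fenchel-Legendre transform. The sketch below illustrates this with purely hypothetical choices (Bernoulli(1/2) variables and \(F(x)=x\)), approximating the supremum over \({\lambda }\) by a grid search.

```python
import math

# Classical Cramer special case (l = k = 1, i.i.d. X(n) ~ Bernoulli(1/2),
# F(x) = x): Q(lambda) = ln E exp(lambda X) = ln((1 + e^lambda)/2), and the
# rate function is J(u) = sup_lambda (lambda * u - Q(lambda)).
def Q(lam):
    return math.log((1.0 + math.exp(lam)) / 2.0)

def J(u, lo=-30.0, hi=30.0, steps=60000):
    # crude grid approximation of the Fenchel-Legendre supremum over lambda
    best = -math.inf
    for i in range(steps + 1):
        lam = lo + i * (hi - lo) / steps
        best = max(best, lam * u - Q(lam))
    return best

# closed form for comparison: J(u) = u ln(2u) + (1-u) ln(2(1-u)) for 0 < u < 1
u = 0.75
exact = u * math.log(2 * u) + (1 - u) * math.log(2 * (1 - u))
assert abs(J(u) - exact) < 1e-3
assert abs(J(0.5)) < 1e-6  # no cost at the law-of-large-numbers value u = 1/2
```

The rate vanishes exactly at \(u=\bar F=1/2\) and is strictly positive elsewhere, which is the shape (2.9) and (2.10) quantify in the nonconventional setting.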

We observe that a very particular case of Theorem 2.1, when \(\{ X(n), \, n\ge 0\}\) are i.i.d. random variables, was considered in Section 6 of [18]. Next, we describe the second level of large deviations in the nonconventional setup, which deals with the occupational measures \(\zeta _N\) on \(M^\ell \) given by (1.3), where \(M\) is assumed to be a compact space and \({\delta }_z\) is the unit mass concentrated at \(z\). For any probability measure \(\eta \) on \(M^\ell \) define

$$\begin{aligned} I(\eta )= -\inf _{u\in \mathbb{C }_+(M^\ell )}\int _{M^\ell }\!\ln \frac{E_{x_1}\int u(X(1),x_2,\ldots ,x_\ell )d\mu (x_2)\ldots d\mu (x_\ell )}{u(x_1,\ldots ,x_\ell )} d\eta (x_1,\ldots ,x_\ell )\nonumber \\ \end{aligned}$$
(2.11)

where \(\mathbb{C }_+(\cdot )\) denotes the space of all positive continuous functions on a space in brackets.

Theorem 2.2

Let \(k=1\) in (2.3) and (2.4). Then for any continuous function \(W=W(x_1,\ldots ,x_\ell )\) on \(M^\ell \) the limit

$$\begin{aligned} Q(W)=\lim _{N\rightarrow \infty }\frac{1}{N}\ln E_x\exp \left(\sum _{n=1}^N W(X(q_1(n)),\ldots ,X(q_\ell (n)))\right) \end{aligned}$$
(2.12)

is a convex lower semicontinuous functional satisfying

$$\begin{aligned} Q(W)=\sup _{\eta \in \mathcal{P }(M^\ell )}\left(\int W(x_1,\ldots ,x_\ell )d\eta (x_1,\ldots ,x_\ell )- I(\eta )\right) \end{aligned}$$
(2.13)

where \(\mathcal{P }(\cdot )\) is the space of probability measures on a space in brackets considered with the topology of weak convergence.

Furthermore, for any closed set \(K\subset \mathcal{P }(M^\ell )\),

$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N}\ln P\{\zeta _N\in K\}\le -\inf _{\eta \in K}I(\eta ) \end{aligned}$$
(2.14)

and for any open set \(U\subset \mathcal{P }(M^\ell )\),

$$\begin{aligned} \liminf _{N\rightarrow \infty }\frac{1}{N}\ln P\{\zeta _N\in U\}\ge -\inf _{\eta \in U}I(\eta ). \end{aligned}$$
(2.15)

Next, we exhibit continuous time versions of the above results. Here we assume that \(X(t),\, t\ge 0\) is a Markov process on a Polish measurable space \((M,\mathcal{B })\) such that for some \(t_0>0\), a constant \(C>0\) and a probability measure \(\nu \) on \(M\) the time \(t_0\) transition probability \(P(t_0,x,\cdot )\) of the above Markov process \(X\) satisfies

$$\begin{aligned} C^{-1}\nu (G)\le P(t_0,x,G)\le C\nu (G) \end{aligned}$$
(2.16)

for any \(x\in M\) and every measurable set \(G\subset M\). Again (see [8]), (2.16) implies existence of a unique invariant measure \(\mu \) of the Markov process \(X\) which satisfies (2.2). Now we introduce positive increasing functions \(q_j,\, j=1,\ldots ,\ell \) on \(\mathbb{R }_+\) such that for some \(0<{\alpha }_1<{\alpha }_2<\cdots <{\alpha }_k\) and \(k\le \ell \),

$$\begin{aligned} q_j(t)={\alpha }_jt\quad \text{ for}\, j=1,\ldots ,k \end{aligned}$$
(2.17)

while for \(j=k+1,\ldots ,\ell \) and any \({\gamma }>0\),

$$\begin{aligned} \lim _{t\rightarrow \infty }\left(q_j(t+{\gamma })-q_j(t)\right)=\infty \,\,\,\text{and}\,\,\, \liminf _{t\rightarrow \infty }\left(q_j({\gamma }t)-q_{j-1}(t)\right)>0. \end{aligned}$$
(2.18)

We will be interested in large deviations estimates as \(T\rightarrow \infty \) for

$$\begin{aligned} S_T(F)=S_T=\int _0^TF\big (X(q_1(t)),\ldots ,X(q_\ell (t))\big )dt. \end{aligned}$$

Theorem 2.3

Let \(W_{\lambda }(x_1,\ldots ,x_\ell ),\,{\lambda }\in (-\infty ,\infty )\) be as in Theorem 2.1. Assume that \(k=1\) in (2.17) and (2.18). Then for any \(x\in M\) the limit

$$\begin{aligned} Q_{\text{ cont}}(W_{\lambda })=\lim _{T\rightarrow \infty }\frac{1}{T}\ln E_x\exp \left(\int _0^T W_{\lambda }(X(q_1(t)),\ldots ,X(q_\ell (t)))dt\right)\qquad \end{aligned}$$
(2.19)

exists, is independent of \(x\) and is differentiable in \({\lambda }\). In fact, \(Q_\mathrm{cont}(W_{\lambda })=\ln r_\mathrm{cont}(W_{\lambda })\) where \(r_\mathrm{cont}(W)\) is the spectral radius of the semigroup of positive operators \(R^t_\mathrm{cont}(W)\) acting by the formula

$$\begin{aligned} R^t_{cont}(W)g(x)=E_x\left(g(X(t))\hat{W}_{cont}(t)\right) \end{aligned}$$
(2.20)

where

$$\begin{aligned} \hat{W}_{cont}(t)=\exp \left(\int _0^tds\int W_{\lambda }(X({\alpha }_1 s),x_2,\ldots , x_\ell ) d\mu (x_2)\ldots d\mu (x_\ell )\right)\!. \end{aligned}$$
(2.21)

Furthermore, set \(W_{\lambda }(x_1,\ldots ,x_\ell )={\lambda }F(x_1,\ldots ,x_\ell )\) and define \(J(u)=J_{cont}(u)\) by (2.8) with \(r_{cont}\) in place of \(r\). Then for any closed set \(K\subset \mathbb{R }\),

$$\begin{aligned} \limsup _{T\rightarrow \infty }\frac{1}{T}\ln P\left\{ \frac{1}{T}S_T\in K\right\} \le -\inf _{u\in K} J(u) \end{aligned}$$
(2.22)

and for any open set \(U\subset \mathbb{R }\),

$$\begin{aligned} \liminf _{T\rightarrow \infty }\frac{1}{T}\ln P\left\{ \frac{1}{T}S_T\in U\right\} \ge -\inf _{u\in U} J(u). \end{aligned}$$
(2.23)

The second level of large deviations in the continuous time nonconventional setup deals with occupational measures

$$\begin{aligned} \zeta _T=\frac{1}{T}\int _0^T{\delta }_{\left(X(q_1(t)),\ldots ,X(q_\ell (t))\right)}dt \end{aligned}$$
(2.24)

on \(M^\ell \). Now we assume that \(X(t),\, t\ge 0\) is a diffusion process on a compact Riemannian manifold \(M\) with the generator \(L\) which is a nondegenerate second order elliptic differential operator. For any probability measure \(\eta \) on \(M^\ell \) set

$$\begin{aligned} I_\mathrm{cont}(\eta )=-\inf _{u\in D_+}\int _{M^\ell }\frac{\int L_{x_1}u(x_1,x_2,\ldots ,x_\ell ) d\mu (x_2)\ldots d\mu (x_\ell )}{u(x_1,x_2,\ldots ,x_\ell )}d\eta (x_1,\ldots ,x_\ell ) \end{aligned}$$
(2.25)

where the infimum is taken over the set \(D_+\) of all positive \(u\) from the domain of \(L\).

Theorem 2.4

Let \(k=1\) in (2.17) and (2.18). Then for any continuous function \(W=W(x_1,\ldots ,x_\ell )\) on \(M^\ell \) the limit

$$\begin{aligned} Q_\mathrm{cont}(W)=\lim _{T\rightarrow \infty }\frac{1}{T}\ln E_x\exp \left(\int _0^T W(X(q_1(t)),\ldots ,X(q_\ell (t)))dt\right)=\ln r_\mathrm{cont}(W)\nonumber \\ \end{aligned}$$
(2.26)

is a convex lower semicontinuous functional satisfying

$$\begin{aligned} Q_{cont}(W)=\sup _{\eta \in \mathcal{P }(M^\ell )}\left(\int W(x_1,\ldots ,x_\ell ) d\eta (x_1,\ldots ,x_\ell )-I_{cont}(\eta )\right)\!. \end{aligned}$$
(2.27)

Furthermore, for any closed set \(K\subset \mathcal{P }(M^\ell )\),

$$\begin{aligned} \limsup _{T\rightarrow \infty }\frac{1}{T}\ln P\{\zeta _T\in K\}\le -\inf _{\eta \in K} I_{cont}(\eta ) \end{aligned}$$
(2.28)

and for any open set \(U\subset \mathcal{P }(M^\ell )\),

$$\begin{aligned} \liminf _{T\rightarrow \infty }\frac{1}{T}\ln P\{\zeta _T\in U\}\ge -\inf _{\eta \in U} I_{cont}(\eta ). \end{aligned}$$
(2.29)

A similar result holds true when \(X(t)\) is a nondegenerate continuous time Markov chain with a finite state space.

Next, we describe our large deviations estimates in a nonconventional averaging setup. Here we consider either a difference equation (1.4) for \(\Xi ^{\varepsilon }(n)\) in the discrete time case where \(X(n),\, n\ge 0\) is a Markov chain satisfying conditions of Theorem 2.1 or a differential equation for \(\Xi ^{\varepsilon }(t)=\Xi _x^{\varepsilon }(t)\in \mathbb{R }^d,\,t\ge 0\),

$$\begin{aligned} \frac{d\Xi ^{\varepsilon }(t)}{dt}={\varepsilon }F\left(\Xi ^{\varepsilon }(t),X(q_1(t)),\ldots ,X(q_\ell (t))\right), \, \Xi _x^{\varepsilon }(0)=x \end{aligned}$$
(2.30)

in the continuous time setup where \(X(t),\, t\ge 0\) is a Markov process satisfying the conditions of Theorem 2.3. We assume that \(F(\xi ,x_1,\ldots , x_\ell )\) is bounded and Lipschitz continuous in \(\xi \). The setup of (2.30) emerges when considering, for instance, a time dependent small perturbation of the oscillator equation

$$\begin{aligned} \ddot{x}+{\lambda }^2x={\varepsilon }g(x,\dot{x},t) \end{aligned}$$
(2.31)

where the force term \(g\) depends on time in a random way, \(g(x,y,t)=g(x,y, X(q_1(t)), \ldots ,X(q_\ell (t)))\). Then, passing to the polar coordinates \((r,\phi )\) with \(x=r\sin ({\lambda }(t-\phi ))\) and \(\dot{x}={\lambda }r\cos ({\lambda }(t-\phi ))\), Eq. (2.31) is transformed into (2.30) with \(\Xi ^{\varepsilon }=(r,\phi )\). It seems reasonable that a random force may depend on versions of the same process moving with different speeds, which is what we have here.
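For the reader's convenience, here is the standard computation behind this reduction (not spelled out in the text; write \(\theta ={\lambda }(t-\phi )\) and abbreviate \(g=g(x,\dot x,t)\)): differentiating the ansatz \(x=r\sin \theta \), imposing \(\dot x={\lambda }r\cos \theta \) and substituting into (2.31) yields the linear system

```latex
% consistency of the ansatz and the perturbed oscillator equation:
\dot r\,\sin\theta-\lambda r\dot\phi\,\cos\theta=0,
\qquad
\lambda\dot r\,\cos\theta+\lambda^{2}r\dot\phi\,\sin\theta=\varepsilon g,
% whose solution (the determinant equals \lambda^{2}r) is
\dot r=\frac{\varepsilon}{\lambda}\,g\cos\theta,
\qquad
\dot\phi=\frac{\varepsilon}{\lambda^{2}r}\,g\sin\theta,
```

which is a system of the form (2.30) for \(\Xi ^{\varepsilon }=(r,\phi )\).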

As is well known (see, for instance, [25]), if \(F(\xi ,x_1,\ldots , x_\ell )\) is bounded and Lipschitz continuous in \(\xi \) and if for each \(\xi \) the (pointwise) limit

$$\begin{aligned} \bar{F}(\xi )=\lim _{\mathcal{T }\rightarrow \infty }\frac{1}{\mathcal{T }}\int _0^\mathcal{T }F(\xi ,X(q_1(t)),\ldots , X(q_\ell (t)))dt \end{aligned}$$

exists, then for any \(T\ge 0\),

$$\begin{aligned} \lim _{{\varepsilon }\rightarrow 0}\sup _{0\le t\le T/{\varepsilon }}|\Xi ^{\varepsilon }(t)-\bar{\Xi }^{\varepsilon }(t)|=0 \end{aligned}$$

where

$$\begin{aligned} \frac{d\bar{\Xi }^{\varepsilon }(t)}{dt}={\varepsilon }\bar{F}(\bar{\Xi }^{\varepsilon }(t)). \end{aligned}$$

In the discrete time case we have to take

$$\begin{aligned} \bar{F}(\xi )=\lim _{N\rightarrow \infty }\frac{1}{N}\sum _{n=0}^N F(\xi ,X(q_1(n)),\ldots , X(q_\ell (n))). \end{aligned}$$

Almost everywhere limits of the averages above can be obtained in rather general circumstances by the nonconventional pointwise ergodic theorems from [4] and [1], respectively, in the dynamical systems case, while under another set of conditions the existence of such limits follows from [19]. The next natural step here is to obtain large deviations estimates for the above approximation of the slow motion \(\Xi ^{\varepsilon }\) by the averaged one \(\bar{\Xi }^{\varepsilon }\).

For any \(\eta \in \mathcal{P }(M^\ell )\) set

$$\begin{aligned} \bar{B}_\eta (\xi )=\int B(\xi ,x_1,\ldots ,x_\ell )d\eta (x_1,\ldots ,x_\ell ). \end{aligned}$$
(2.32)

For each absolutely continuous curve \({\gamma }_t,\, t\in [0,\mathcal{T }]\) set

$$\begin{aligned} \mathcal{S }_{0,\mathcal{T }}({\gamma })=\int _0^\mathcal{T }\inf \{ I(\eta ):\,\dot{\gamma }_t =\bar{B}_\eta ({\gamma }_t)\}dt \end{aligned}$$

where \(I(\eta )\) is given by (2.11) or \(I(\eta )=I_\mathrm{cont}(\eta )\) is given by (2.25) in the discrete or continuous time cases, respectively. If \({\gamma }_t,\, t\in [0,\mathcal{T }]\) is not absolutely continuous we set \(\mathcal{S }_{0,\mathcal{T }}({\gamma }) =\infty \).

Theorem 2.5

Let \(k=1\) in (2.3) and (2.4) or in (2.17) and (2.18) and set \(\Psi ^{\varepsilon }(t)=\Xi ^{\varepsilon }([t/{\varepsilon }])\) or \(\Psi ^{\varepsilon }(t)=\Xi ^{\varepsilon }(t/{\varepsilon })\) in the discrete or continuous time cases, respectively. Then for any continuous function \(W_t(x_1,\ldots ,x_\ell )\) on \(\mathbb{R }_+\times M^\ell \),

$$\begin{aligned} \lim _{{\varepsilon }\rightarrow 0}{\varepsilon }\ln E_x\exp \left({\varepsilon }^{-1}\int _0^TW_t\big ( X(q_1(t/{\varepsilon })),\ldots , X(q_\ell (t/{\varepsilon }))\big )dt\right)=\int _0^Tr_{ cont}(W_t)dt\nonumber \\ \end{aligned}$$
(2.33)

where \(r_\mathrm{cont}\) is the same as in Theorem 2.3 with \(W_t\) considered as a function on \(M^\ell \); in the discrete time case we either extend \(q_j\) to all \(t\ge 0\) by \(q_j(t)=q_j([t])\) in order to make sense of the integral in the exponent in (2.33), or replace this integral by the corresponding sum.

Furthermore, for any \(a,{\delta },{\lambda }>0\) and every continuous \({\gamma }_t,\, t\in [0,\mathcal{T }],\,{\gamma }_0=x\), there exists \({\varepsilon }_0>0\) such that for all positive \({\varepsilon }<{\varepsilon }_0\),

$$\begin{aligned} P\{\,\rho _{0,\mathcal{T }}(\Psi _x^{\varepsilon },{\gamma })<{\delta }\}\ge \exp \left\{ -\frac{1}{{\varepsilon }}(\mathcal{S }_{0,\mathcal{T }}({\gamma })+{\lambda })\right\} \quad \text{ and}\end{aligned}$$
(2.34)
$$\begin{aligned} P\{\,\rho _{0,\mathcal{T }}(\Psi _x^{\varepsilon },\Phi ^a_{0,\mathcal{T }}(x))\ge {\delta }\}\le \exp \left\{ -\frac{1}{{\varepsilon }}(a-{\lambda })\right\} \end{aligned}$$
(2.35)

where \(\Psi _x^{\varepsilon }(0)=x\), \(\rho _{0,\mathcal{T }}\) is the uniform distance and \(\Phi _{0,\mathcal{T }}^a(x)=\{{\gamma }:\,{\gamma }_0=x,\, \mathcal{S }_{0,\mathcal{T }}({\gamma })\le a\}\).

Remark 2.6

Suppose that the averaged motion \(\bar{\Xi }^{\varepsilon }\) has several attracting fixed points and limit cycles. Then, similarly to [12] (the Markov chain case) and [17] (the dynamical systems case), we can study rare transitions of the slow motion \(\Xi ^{\varepsilon }\) between these attractors. However, in the nonconventional setup the situation is more complicated and this problem will not be dealt with in this paper.

Certain versions of Theorems 2.2–2.5 can be obtained for some classes of dynamical systems such as mixing subshifts of finite type and \(C^2\) hyperbolic and expanding transformations but, in order not to interrupt the probabilistic exposition, we discuss some of these results in the last Sect. 5.

In the next section we will show that the study of large deviations in our nonconventional setup can always be reduced to the case \(k=\ell \), i.e. we have to deal only with \(q_j(n)=jn,\, j=1,\ldots ,k\). So we discuss next this situation, allowing any \(k\ge 1\) while assuming that \(X(n),\, n\ge 0\), the \(q_j\) and \(F\) are the same as in Theorem 2.1. It turns out that the treatment of the general case when \(X(0),X(1),X(2),\ldots \) is a Markov chain requires a quite complicated and technical proof whose exposition here would make this paper too long, and so it will be discussed in another paper. Thus, we will restrict ourselves here to the particular case when \(X(n),\, n\ge 0\) are independent identically distributed (i.i.d.) random variables (or vectors). Namely, we are interested in large deviations estimates for \(S_N(F)= \sum _{n=1}^NF(X(n),X(2n),\ldots ,X(kn))\) where \(X(n)\in M,\, n\ge 1\) are i.i.d. random variables (vectors) with a compact support \(M\). Let \(r_1,\ldots ,r_m\ge 2\) be all the primes not exceeding \(k\). Set \(A_n=\{ a\le n:\, a\,\,\text{is relatively prime to}\,\, r_1,\ldots ,r_m\}\) and \(B_n(a)=\{ b\le n:\, b=ar_1^{d_1}r_2^{d_2}\cdots r_m^{d_m}\) for some nonnegative integers \(d_1,\ldots ,d_m\}\). Now for any bounded measurable function \(V\) on \(M^k\) we write

$$\begin{aligned} S_N(V)=\sum _{a\in A_N}S_{N,a}(V)\,\,\text{ with}\,\, S_{N,a}(V)= \sum _{b\in B_N(a)}V(X(b),X(2b),\ldots ,X(kb)).\nonumber \\ \end{aligned}$$
(2.36)

Observe that \(S_{N,a}(V),\, a\in A_N\) is a collection of independent random variables.
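The partition underlying (2.36) is easy to verify directly. A small sketch for the hypothetical choice \(k=3\) (so the primes not exceeding \(k\) are \(r_1=2\) and \(r_2=3\)):

```python
# Partition of {1, ..., N} used in (2.36), illustrated for k = 3,
# where the primes not exceeding k are r_1 = 2 and r_2 = 3.
N = 1000
primes = [2, 3]

def strip(b):
    # divide out all factors r_1, ..., r_m; what remains is the unique
    # a relatively prime to every r_i with b = a * 2^d1 * 3^d2
    for r in primes:
        while b % r == 0:
            b //= r
    return b

A_N = [a for a in range(1, N + 1) if all(a % r != 0 for r in primes)]
B_N = {a: [b for b in range(1, N + 1) if strip(b) == a] for a in A_N}

# Every b <= N lies in exactly one B_N(a); moreover, for b in B_N(a) all
# indices jb with j <= k have the same stripped part a, so the summands
# S_{N,a}(V) use disjoint groups of the X's and hence are independent.
assert sorted(b for a in A_N for b in B_N[a]) == list(range(1, N + 1))
```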

Theorem 2.7

For any continuous function \(V\) on \(M^k\) the limit

$$\begin{aligned} Q(V)&= \lim _{N\rightarrow \infty }\frac{1}{N}\ln E\exp \left(\sum _{n=1}^NV(X(n),X(2n),\ldots ,X(kn))\right)\nonumber \\&= \lim _{N\rightarrow \infty }\frac{1}{N}\sum _{a\in A_N}\ln E\exp S_{N,a}(V) \end{aligned}$$
(2.37)

exists and the functional \(Q(V)\) is convex and lower semicontinuous. If \(V=V_{\lambda }\) depends on a parameter \({\lambda }\) and has a bounded derivative in \({\lambda }\) then \(Q(V_{\lambda })\) is differentiable in \({\lambda }\) as well. Thus, taking \(V_{\lambda }={\lambda }F\) we obtain that also for \(k\ge 2\) in the above i.i.d. setup both the upper and lower large deviations bounds (2.9) and (2.10) hold true with the rate functional \(J\) being the Fenchel-Legendre transform \(J(u)=\sup _{\lambda }({\lambda }u- Q({\lambda }F))\) of \(Q\).

In Sect. 4 we will provide a rather explicit computation of the limit (2.37). As a model application of Theorem 2.7 we can consider the digits \(X(n)=X(n,{\omega }),\, n\ge 1\) of base \(M\) expansions \({\omega }=\sum _{n=1}^\infty \frac{X(n,{\omega })}{M^n},\,X(n,{\omega })\in \{ 0,1,\ldots ,M-1\}\) of numbers \({\omega }\in [0,1)\), which are i.i.d. random variables on the probability space \(([0,1),\mathcal{B },P)\) where \(\mathcal{B }\) is the Borel \({\sigma }\)-algebra and \(P\) is the Lebesgue measure. Take, for instance, \(V(x_1,\ldots ,x_k)={\delta }_{{\alpha }_1x_1} {\delta }_{{\alpha }_2x_2}\cdots {\delta }_{{\alpha }_kx_k}\) for some \({\alpha }_1,\ldots ,{\alpha }_k \in \{ 0,1,\ldots ,M-1\}\), where \({\delta }_{ij}=1\) if \(i=j\) and \({\delta }_{ij}=0\) otherwise. Then Theorem 2.7 provides large deviations estimates for the number

$$\begin{aligned}&n_{{\alpha }_1,\ldots ,{\alpha }_k}(N,{\omega })= \#\{ n\le N:\, X(n,{\omega })={\alpha }_1,X(2n,{\omega })={\alpha }_2,\nonumber \\&\ldots ,X(kn,{\omega })={\alpha }_k\}=\sum _{n=1}^NV(X(n,{\omega }),\ldots ,X(kn,{\omega })). \end{aligned}$$
(2.38)

The same setup can be reformulated in the following way. Consider infinite sequences of letters (colors, spins, etc.) taken out of an alphabet of size \(M\). Let \(n_{{\alpha }_1,\ldots ,{\alpha }_k}(N)\) be the number of arithmetic progressions of length \(k\), with both the first term and the difference equal to some \(n\le N\), having the letter (color, spin, etc.) \({\alpha }_i\) in place \(i=1,2,\ldots ,k\). Then Theorem 2.7 yields large deviations bounds as \(N\rightarrow \infty \) for \(n_{{\alpha }_1,\ldots ,{\alpha }_k}(N)\) considered as a random variable on the space of sequences of letters with any product probability measure, in particular with the uniform probability measure which assigns the same weight to each combination of \(n\) consecutive letters (i.e. to each cylinder set of length \(n\)) for all \(n=1,2,\ldots \). We observe that another statistical physics interpretation of a particular case of the above i.i.d. setup appeared independently in the recent paper [6], though large deviations bounds were obtained there only for the case \(k=M=2\).
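The count in (2.38) is straightforward to compute on simulated digits. A minimal sketch for base \(M=2\) and \(k=2\), so that \(n_{{\alpha }_1,{\alpha }_2}(N,{\omega })\) counts the \(n\le N\) with \(X(n,{\omega })={\alpha }_1\) and \(X(2n,{\omega })={\alpha }_2\):

```python
import random

# Count n <= N with X(n) = alpha_1 and X(2n) = alpha_2 as in (2.38),
# for i.i.d. fair binary digits (base M = 2, k = 2).
random.seed(2)
N = 100_000
X = [None] + [random.randrange(2) for _ in range(2 * N)]  # X[1..2N]

def n_count(alpha1, alpha2):
    return sum(1 for n in range(1, N + 1)
               if X[n] == alpha1 and X[2 * n] == alpha2)

counts = {(a1, a2): n_count(a1, a2) for a1 in (0, 1) for a2 in (0, 1)}
assert sum(counts.values()) == N  # the four digit patterns partition {1,...,N}
# each count concentrates near N/4; Theorem 2.7 quantifies the exponential
# smallness of the probability of a macroscopic deviation from N/4
assert abs(counts[(0, 0)] / N - 0.25) < 0.02
```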

3 Large deviations for Markov processes: \(k=1\) case

3.1 Reduction to the \(k=\ell \) case

First, we will show that the study of the limit (2.6) for any \(k\le \ell \) can be reduced to the case \(k=\ell \). In order to apply this result not only to Markov chains but also to other fast mixing processes, in particular to the dynamical systems considered in Sect. 5, we will deal here with a somewhat more general setup.

Let \(\{ X(n),\, n=0,1,\ldots \}\) be a sequence of measurable mappings of a measurable space \(({\Omega },\mathcal{F })\) to a Polish space \(M\) considered with its Borel \({\sigma }\)-algebra \(\mathcal{B }\). Since \((M,\mathcal{B })\) is isomorphic to a Borel subset \({\Upsilon }\) of an interval we can and do identify \(M\) with \({\Upsilon }\) and assume that each \(X(n)\) is real (or vector) valued. Then \(\{ X(n),\, n=0,1,\ldots \}\) becomes a real (or vector) valued stochastic process under each probability measure on \(({\Omega },\mathcal{F })\). Our setup includes two such measures \(P\) and \(\Pi \), and we assume that \(X(n)\Pi =\mu \) does not depend on \(n\), i.e. that the one dimensional distribution \(\mu \) of \(X(n)\) on the probability space \(({\Omega },\mathcal{F },\Pi )\) is the same for all \(n\). In order to state our conditions we also introduce a family of \({\sigma }\)-algebras \(\mathcal{F }_{ml}\subset \mathcal{F },\,-\infty \le m\le l\le \infty \), satisfying \(\mathcal{F }_{-\infty ,\infty }=\mathcal{F }\) and \(\mathcal{F }_{ml}\subset \mathcal{F }_{m^{\prime }l^{\prime }}\) if \(m^{\prime }\le m\) and \(l^{\prime }\ge l\). Next, we define a modified \(\psi \)-mixing (dependence) coefficient by

$$\begin{aligned}&\psi (n)=\psi _{P,\Pi }(n)=\sup _{l\ge 0,g}\{\Vert E_P(g|\mathcal{F }_{-\infty ,l})- E_\Pi g\Vert _\infty :\\&\quad g\,\,\text{is}\,\,\mathcal{F }_{l+n,\infty }\text{-measurable and}\,\, E_\Pi |g|\le 1\} \end{aligned}$$

where \(E_Q\) is the expectation with respect to a probability measure \(Q\) and \(\Vert \cdot \Vert _\infty \) is the \(L^\infty ({\Omega },P)\) norm. The rationale behind introducing the two probability measures \(P\) and \(\Pi \) above is to allow \(X(n),\, n\ge 0\) to be a Markov chain with an arbitrary initial distribution (in particular, starting at a point) under \(P\), while \(X(n)\) is stationary under \(\Pi \) and the distribution of \(X(n)\) under \(P\) converges to \(\mu =X(0)\Pi \). Furthermore, we will not assume measurability of the \(X(n)\)’s with respect to some of the \({\sigma }\)-algebras \(\mathcal{F }_{m,l}\) but instead will rely on approximation coefficients defined for each bounded continuous function \(V=V(x_1,\ldots ,x_\ell )\) on \(M^\ell \) by

$$\begin{aligned}&{\beta }_V(n)=\max _{1\le j\le \ell }\sup _{x_1,\ldots ,x_{j-1},x_{j+1},\ldots ,x_\ell \in M} \sup _{m\ge 0}\Vert V(x_1,\ldots ,x_{j-1},\\&X(m),x_{j+1},\ldots ,x_\ell )-V(x_1,\ldots ,x_{j-1},E_P(X(m)|\mathcal{F }_{m-n,m+n}), x_{j+1},\ldots ,x_\ell )\Vert _\infty . \end{aligned}$$

Since \(V\) is continuous we can take here the supremum over a countable dense set in \(M^{\ell -1}\), and so outside of one set of \(P\)-measure zero \({\beta }_V(n)\) gives a uniform bound on the difference above.

Proposition 3.1

Let \(V(x_1,\ldots ,x_\ell )\) be a bounded continuous function on \(M^\ell \) and assume that

$$\begin{aligned} \lim _{n\rightarrow \infty }(\psi (n)+{\beta }_V(n))=0 \end{aligned}$$
(3.1)

together with the conditions (2.3) and (2.4) on functions \(q_j,\, j=1,\ldots ,\ell \). Then,

$$\begin{aligned}&\lim _{N\rightarrow \infty }\frac{1}{N}\left(\ln E_P\exp \left(\sum _{n=1}^NV(X(q_1(n)),\ldots , X(q_\ell (n)))\right)\right.\nonumber \\&\quad \left.-\ln E_P\exp \left(\sum _{n=1}^NV^{(k)}(X(n),X(2n),\ldots ,X(kn))\right)\right)=0 \end{aligned}$$
(3.2)

where for each \(m<\ell \),

$$\begin{aligned}&V^{(m)}(x_1,\ldots ,x_m)=\ln \int _M\ldots \int _M\exp (V(x_1,\ldots ,x_m,x_{m+1},\ldots ,x_\ell ))\nonumber \\&\quad d\mu (x_{m+1})\ldots d\mu (x_\ell )\,\,\text{ and}\,\, V^{(\ell )}=V. \end{aligned}$$
(3.3)

If, in fact, \(X(n)\) is \(\mathcal{F }_{n,n}\)-measurable then (3.2) holds true for any bounded measurable function \(V\) assuming only that \(\psi (n)\rightarrow 0\) as \(n\rightarrow \infty \).

Proof

Observe that (2.4) yields

$$\begin{aligned} \lim _{n\rightarrow \infty }(q_j({\gamma }n)-q_{j-1}(n))=\infty \,\,\text{ for} \text{ any}\,\, j>k\,\,\text{ and}\,\,{\gamma }>0. \end{aligned}$$
(3.4)

Set

$$\begin{aligned} d_{\gamma }(n)=\min _{k+1\le j\le \ell }\min \left(q_j({\gamma }n)-q_{j-1}(n), \min _{l\ge {\gamma }n}(q_j(l)-q_j(l-1))\right) \end{aligned}$$

and observe that \(d_{\gamma }(n)\rightarrow \infty \) as \(n\rightarrow \infty \) in view of (2.4) and (3.4). For any \(l=0,1,\ldots \) and \(0\le r\le \infty \) set

$$\begin{aligned} X_r(l)=E_P(X(l)|\mathcal{F }_{l-r,l+r}). \end{aligned}$$

Next, for \(m=1,2,\ldots ,\ell ,\,a\le b\le c\) and \(0\le r\le \infty \) denote

$$\begin{aligned}&Z_r^{(m)}(a,b,c)=E_P\exp \left(\sum _{a<l\le b}V^{(m)}(X_r(q_1(l)),\ldots , X_r(q_m(l)))\right.\\&\quad \left.+\sum _{b<l\le c}V^{(m-1)}(X_r(q_1(l)),\ldots ,X_r(q_{m-1}(l)))\right). \end{aligned}$$

If \(b=c\), i.e. we have only the first sum above, we set \(Z_r^{(m)}(a,b,c)= Z_r^{(m)}(a,b)\). If \(r=\infty \) we drop the index \(r\) and write just \(Z^{(m)}(a,b,c)\) or \(Z^{(m)}(a,b)\). Observe that

$$\begin{aligned} e^{-C(V){\gamma }N}Z_r^{(m)}({\gamma }N,N)\le Z^{(m)}_r(0,N)\le e^{C(V){\gamma }N} Z_r^{(m)}({\gamma }N,N) \end{aligned}$$
(3.5)

where \(C(V)=\sup _{(x_1,\ldots ,x_\ell )\in M^\ell }|V(x_1,\ldots ,x_\ell )|\). By the definition of \({\beta }_V(n)\) (and the remark after it) we obtain also that for any \(m\!=\!1,2,\ldots ,\ell ,\,a\!\le \! b\!\le \! c\) and \(0\!\le \! r\!\le \!\infty \),

$$\begin{aligned} Z_r^{(m)}(a,b,c)e^{-(c-a)\ell {\beta }_V(r)}\le Z^{(m)}(a,b,c)\le Z_r^{(m)}(a,b,c) e^{(c-a)\ell {\beta }_V(r)}. \end{aligned}$$
(3.6)

Let \(g=g(x,y)\) be a bounded measurable function on a product \(M_1\times M_2\) (for some measurable spaces \((M_1,\mathcal{B }_1)\) and \((M_2,\mathcal{B }_2)\)) and let \(X:{\Omega }\rightarrow M_1\) and \(Y:{\Omega }\rightarrow M_2\) be \(\mathcal{F }_{-\infty ,l}\)- and \(\mathcal{F }_{l+n,\infty }\)-measurable random variables (maps), respectively. Then it follows from the definition of \(\psi (n)=\psi _{P,\Pi }(n)\) that

$$\begin{aligned} |E_P(g(X,Y)|\mathcal{F }_{-\infty ,l})-g_\Pi (X)|\le \psi (n)|g|_\Pi (X) \end{aligned}$$
(3.7)

where \(g_\Pi (x)=E_\Pi g(x,Y)\) and \(|g|_\Pi (x)=E_\Pi |g(x,Y)|\). Now take \(r=r_{\gamma }(N)=[\frac{1}{3}d_{\gamma }(N)]\) where \([\cdot ]\) denotes the integral part. Then for all \(N\ge n\ge {\gamma }N+1,\,m=k+1,\ldots ,\ell \) and \(N\) large enough

$$\begin{aligned}&Z_r^{(m)}({\gamma }N,n,N)=E_P\bigg ( J\exp \bigg (\sum _{{\gamma }N<l\le n-1}V^{(m)}(X_r(q_1(l)),\ldots ,X_r(q_m(l)))\nonumber \\&\quad +\sum _{n<l\le N}V^{(m-1)}(X_r(q_1(l)),\ldots ,X_r(q_{m-1}(l)))\bigg )\bigg ) \end{aligned}$$
(3.8)

where

$$\begin{aligned} J=J_r(n)=E_P\left(\exp \big (V^{(m)}(X_r(q_1(n)),\ldots ,X_r(q_m(n)))\big )\big \vert \mathcal{F }_{-\infty ,q_m(n-1)+r}\right)\!. \end{aligned}$$

By (3.7) and the definition of \({\beta }_V\) we conclude that

$$\begin{aligned}&\bigg \vert J-\int \exp \big (V^{(m)}(X_r(q_1(n)),\ldots ,X_r(q_{m-1}(n)),y)\big ) d\mu (y)\bigg \vert \nonumber \\&\le \eta (r)\int \exp \big (V^{(m)}(X_r(q_1(n)),\ldots ,X_r(q_{m-1}(n)),y)\big )d\mu (y) \end{aligned}$$
(3.9)

where \(\eta (n)=(\psi (n)+2{\beta }_V(n)+2{\beta }_V(n)\psi (n))e^{C(V)}\rightarrow 0\) as \(n\rightarrow \infty \). Employing (3.8) and (3.9) for \(n=N,N-1,\ldots ,[{\gamma }N]+1\) we obtain that

$$\begin{aligned} (1-\eta (r))^NZ_r^{(m-1)}({\gamma }N,N)\le Z_r^{(m)}({\gamma }N,N)\le (1+\eta (r))^NZ_r^{(m-1)}({\gamma }N,N).\nonumber \\ \end{aligned}$$
(3.10)

Next, we use (3.10) for \(m=\ell ,\ell -1,\ldots ,k+1\) which together with (3.5) and (3.6) yields that

$$\begin{aligned}&(1-\eta (r))^{\ell N}e^{-2N(\ell {\beta }_V(r)+C(V){\gamma })}Z^{(k)}(0,N)\le Z^{(\ell )}(0,N)\nonumber \\&\le (1+\eta (r))^{\ell N}e^{2N(\ell {\beta }_V(r)+C(V){\gamma })}Z^{(k)}(0,N). \end{aligned}$$
(3.11)

Taking \(\ln \) in (3.11), dividing by \(N\), letting \(N\rightarrow \infty \) and taking into account that then \(r=r(N)\rightarrow \infty \), we obtain that

$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N}\big \vert \ln Z^{(\ell )}(0,N)-\ln Z^{(k)}(0,N) \big \vert \le 2C(V){\gamma }\end{aligned}$$

and (3.2) follows since \({\gamma }>0\) is arbitrary.

If \(X(n)\) is \(\mathcal{F }_{n,n}\)-measurable for each \(n\) then we do not have to deal with the approximation coefficient \({\beta }_V(r)\) since \(X_r=X\) and \(Z_r^{(m)}= Z^{(m)}\) above. Hence all the above arguments remain valid with \({\beta }_V(r)=0\) for any bounded measurable \(V\) and we obtain (3.2) provided \(\psi (n)\rightarrow 0\) as \(n\rightarrow \infty \). \(\square \)

It is easy to check the conditions of Proposition 3.1 for Markov chains \(X(n),\, n\ge 0\) satisfying the “strong” Doeblin condition (2.1). Indeed, denote by \(\mathcal{F }_{l,m},\, l\le m\) the \({\sigma }\)-algebra generated by \(X(l),\ldots , X(m)\) with \(\mathcal{F }_{l,\infty }\) being the minimal \({\sigma }\)-algebra containing all \(\mathcal{F }_{l,m},\, m\ge l\), and set \(\mathcal{F }_{l,m}=\mathcal{F }_{0,m}\) for \(l<0\) and \(m\ge 0\). If \(g\) is \(\mathcal{F }_{l+n,\infty }\)-measurable then by the Markov property

$$\begin{aligned} E_P(g|\mathcal{F }_{-\infty ,l})=\int P(n,X(l),dy)E_{P_y}g \end{aligned}$$
(3.12)

where \(P_y\) is the probability measure on the path space of the Markov chain \(X(n)\) starting at \(y\). The Chapman-Kolmogorov equation says that for any \(n\ge n_0\),

$$\begin{aligned} P(n,x,G)=\int P(n-n_0,x,dy)P(n_0,y,G), \end{aligned}$$

and so by (2.1) for all such \(n\),

$$\begin{aligned} C^{-1}\nu (G)\le P(n,x,G)\le C\nu (G). \end{aligned}$$

This together with the Radon–Nikodym theorem yields, for \(\nu \)-almost all \(y\) and \(n\ge n_0\), the existence of the transition density \(p(n,x,y)\) satisfying

$$\begin{aligned} C^{-1}\le p(n,x,y)=\frac{dP(n,x,\cdot )}{d\nu }(y)\le C. \end{aligned}$$

It is well known (see, for instance, [8]) that (2.1) and (2.2) imply that

$$\begin{aligned} (1-Ke^{-{\kappa }n})p(y)\le p(n,x,y)\le (1+Ke^{-{\kappa }n})p(y) \end{aligned}$$
(3.13)

for some \(K,{\kappa }>0\) independent of \(n\ge n_0\). If \(\Pi \) is the stationary probability of the Markov chain on the path space then

$$\begin{aligned} E_\Pi g=\int p(y)E_{P_y}gd\nu (y). \end{aligned}$$

Hence, by (3.12) and (3.13),

$$\begin{aligned} \Vert E_P(g|\mathcal{F }_{-\infty ,l})-E_\Pi g\Vert _\infty \le Ke^{-{\kappa }n}E_\Pi |g|. \end{aligned}$$

Thus condition (3.1) with \({\beta }_V(n)=0\) is satisfied in our Markov chain case.
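The exponential decay in the last display is easy to observe numerically on a finite-state chain. The following sketch uses a hypothetical \(3\)-state transition matrix with strictly positive entries (so a "strong" Doeblin condition of the type (2.1) holds trivially); the numbers are illustrative assumptions.

```python
import numpy as np

# A toy 3-state chain with strictly positive transition matrix; all entries
# are illustrative assumptions, chosen so a Doeblin-type condition holds.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Stationary law pi: normalized left Perron eigenvector of P.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

# sup_x of the total variation distance between P(n, x, .) and pi decays
# geometrically, which is the exponential mixing behind (3.13).
Pn = np.eye(3)
dists = []
for n in range(1, 21):
    Pn = Pn @ P
    dists.append(np.abs(Pn - pi).sum(axis=1).max() / 2)

ratios = [dists[i + 1] / dists[i] for i in range(len(dists) - 1)]
print(dists[-1], max(ratios))
```

The printed successive ratios stay strictly below 1, exhibiting the geometric rate \(Ke^{-\kappa n}\).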

Corollary 3.2

Assume that the conditions of Proposition 3.1 hold true. Suppose that for any bounded measurable function \(V_{\lambda }(x_1,\ldots ,x_k)\) on \(\mathbb{R }\times M^k\) whose derivative in the parameter \({\lambda }\in (-\infty ,\infty )\) is bounded in \(x_1,\ldots ,x_k\) the limit

$$\begin{aligned} Q(V_{\lambda })=\lim _{N\rightarrow \infty }\frac{1}{N}\ln E_x\exp \left(\sum _{n=1}^NV_{\lambda }(X(n),X(2n),\ldots ,X(kn))\right) \end{aligned}$$

exists, is a lower semicontinuous convex functional and is differentiable in the parameter \({\lambda }\). Then for any bounded measurable function \(W_{\lambda }(x_1,\ldots ,x_\ell )\) on \(\mathbb{R }\times M^\ell \) whose derivative in the parameter \({\lambda }\in (-\infty ,\infty )\) is bounded in \(x_1,\ldots ,x_\ell \) the limit

$$\begin{aligned} Q(W_{\lambda })=\lim _{N\rightarrow \infty }\frac{1}{N}\ln E_x\exp \left(\sum _{n=1}^NW_{\lambda }(X(q_1(n)),\ldots ,X(q_\ell (n)))\right)=Q(W_{\lambda }^{(k)}) \end{aligned}$$

exists, is a lower semicontinuous convex functional and is differentiable in the parameter \({\lambda }\). In particular, the large deviations estimates in the form (2.9) and (2.10) then hold true with the rate functional \(J\) given by (2.8) with \(W_{\lambda }={\lambda }F\).

Proof

By Proposition 3.1, \(Q(W_{\lambda })=Q(W_{\lambda }^{(k)})\) and we see from (3.3) that if \(W_{\lambda }\) is bounded and has a bounded derivative in \({\lambda }\) then \(W_{\lambda }^{(k)}\) is bounded with a bounded derivative in \({\lambda }\), as well. Hence, by the assumption, \(Q(W_{\lambda }^{(k)})\) is a lower semicontinuous convex functional which is differentiable in \({\lambda }\), and this implies the same for \(Q(W_{\lambda })\), so the result follows. \(\square \)

Now let \(k=1\) and \(V=W_{\lambda }\) as in Theorem 2.1. Then \(\hat{W}_{\lambda }= V^{(1)}\) and by Proposition 3.1,

$$\begin{aligned} Q(W_{\lambda })=\lim _{N\rightarrow \infty }\frac{1}{N}\ln E_x\exp \left(\sum _{n=1}^N \hat{W}_{\lambda }(X(n))\right)\!. \end{aligned}$$
(3.14)

Thus we arrive at the standard limit appearing in “conventional” large deviations results, which is well known for Markov chains \(X(n),\, n\ge 0\) satisfying our conditions as described in Theorem 2.1. Differentiability of \(Q(W_{\lambda })\) in \({\lambda }\) follows from standard results on positive operators (see, for instance, [20]) and we derive now Theorem 2.1 from well known “conventional” large deviations results (see, for instance, [9, 16] and Section 2.3 in [11]). \(\square \)

3.2 Second level of large deviations

Recall that in the setup of Theorem 2.2 we have \(k=1,\,M\) being a compact space and the result is about large deviations for occupational measures \(\zeta _N\) appearing there. Let \(W\) be a continuous function on \(M^\ell \) with \(\hat{W}\) defined by (2.5). By Proposition 3.1 together with the well known facts (see, for instance, [9, 11] and [16]),

$$\begin{aligned} Q(W)=\lim _{N\rightarrow \infty }\frac{1}{N}\ln E_x\exp \left(\sum _{n=1}^NW(X(q_1(n)),\ldots , X(q_\ell (n)))\right)=\ln (r(W))\nonumber \\ \end{aligned}$$
(3.15)

where \(r(W)\) is the spectral radius of the operator

$$\begin{aligned} R(W)g(x)=E_x\left(g(X(1))\hat{W}(X(1))\right)=E_x\left(g(X(1))e^{\ln \hat{W}(X(1))}\right)\!. \end{aligned}$$
(3.16)
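In the finite-state case the identity \(Q(W)=\ln (r(W))\) from (3.15)–(3.16) can be checked directly: with \(\hat{W}\) a positive function on the state space, \(E_x\exp (\sum _{n=1}^N\ln \hat{W}(X(n)))=(R^N\mathbf{1})(x)\) for the tilted operator \(R\), so the normalized logarithm converges to the logarithm of the spectral radius. The chain \(P\) and the values of \(\hat{W}\) below are illustrative assumptions.

```python
import numpy as np

# Finite-state sketch of (3.15)-(3.16): E_x exp(sum ln What(X(n))) equals
# (R^N 1)(x) for R g(x) = E_x(g(X(1)) What(X(1))); hypothetical P and What.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
What = np.array([1.5, 0.7, 1.1])

R = P * What                        # R[x, y] = P[x, y] * What(y)
r = max(abs(np.linalg.eigvals(R)))  # spectral radius r(W)

N = 400
growth = np.log(np.linalg.matrix_power(R, N) @ np.ones(3)) / N
print(float(np.log(r)), growth)
```

The three entries of `growth` (one per starting state \(x\)) agree with \(\ln r(W)\) up to an \(O(1/N)\) error, as the Perron–Frobenius argument behind (3.15) predicts.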

Observe that by the Donsker–Varadhan variational formula (see [9] and [10]),

$$\begin{aligned} Q(W)=\sup _{\nu \in \mathcal P (M)}\left(\int _M\ln \hat{W}(x)d\nu (x)-\hat{I}(\nu )\right) \end{aligned}$$
(3.17)

where \(\hat{I}(\nu )=-\inf _{u\in C_+(M)}\int \ln \frac{E_xu(X(1))}{u(x)}d\nu (x)\) and the infimum is taken over positive continuous functions on \(M\).

Next, let \(Y^{(i)}(n),\, i=2,\ldots ,\ell ;\, n=0,1,2,\ldots \) be i.i.d. \(M\)-valued random variables with the distribution \(\mu \), all of them independent of the Markov chain \(X(n),\, n\ge 0\). Then it is easy to see that

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N}\ln E_x\exp \left(\sum _{n=1}^NW(X(n),Y^{(2)}(n),\ldots , Y^{(\ell )}(n))\right)=Q(W). \end{aligned}$$
(3.18)

Indeed, let \(\mathcal{F }_X\) be the \(\sigma \)-algebra generated by the Markov chain \(X(n),n\ge 0\). Then

$$\begin{aligned}&E_x\exp \left(\sum _{n=1}^NW(X(n),Y^{(2)}(n),\ldots ,Y^{(\ell )}(n))\right)\nonumber \\&=E_x\left(E_x(\exp (\sum _{n=1}^NW(X(n),Y^{(2)}(n),\ldots ,Y^{(\ell )}(n)))| \mathcal{F }_X)\right)\nonumber \\&=E_x\exp \left(\sum _{n=1}^N\ln \hat{W}(X(n))\right) \end{aligned}$$
(3.19)

and (3.18) follows. But now we have the standard situation for the Markov chain \((X(n),Y^{(2)}(n),\ldots ,Y^{(\ell )}(n)),\, n\ge 0\), and so by the Donsker–Varadhan variational formula (see [9] and [10]),

$$\begin{aligned} Q(W)=\sup _{\nu \in \mathcal P (M\times \cdots \times M)}\left(\int W(x_1,x_2,\ldots ,x_\ell )d\nu (x_1,\ldots ,x_\ell )-I(\nu )\right) \end{aligned}$$
(3.20)

where

$$\begin{aligned}&I(\nu )=-\inf _{u\in C_+(M\times \cdots \times M)}\int _{M\times \cdots \times M}\nonumber \\&\ln \frac{E_{x_1}\int u(X(1),x_1,\ldots ,x_\ell )d\mu (x_2)\ldots d\mu (x_\ell )}{u(x_1,\ldots ,x_\ell )}d\nu (x_1,\ldots ,x_\ell ). \end{aligned}$$
(3.21)

It is known here (see, for instance, Proposition 5.1 in [15]) that there exists a unique \(\nu =\nu _W\) on which the supremum in (3.20) is attained and it follows from the standard theory (see, for instance, [16]) that \(I(\nu )\) is the rate functional for the second level large deviations both for the auxiliary occupational measures

$$\begin{aligned} \frac{1}{N}\sum _{n=1}^N\delta _{(X(n),Y^{(2)}(n),\ldots ,Y^{(\ell )}(n))} \end{aligned}$$

and for our nonconventional occupational measures \(\zeta _N\). \(\square \)

3.3 Continuous time case

Similarly to the discrete time case, the main step in the proof of Theorem 2.3 is to establish (2.19) and to identify the limit there as the spectral radius of the semigroup (2.20).

From (2.16) it follows that for any \(t\ge t_0\) and every measurable set \(G\subset M\),

$$\begin{aligned} P(t,x,G)=\int _Gp(t,x,y)d\nu (y)\,\,\,\text{ with}\,\,\, C^{-1}\le p(t,x,y) \le C. \end{aligned}$$
(3.22)

Furthermore, similarly to (3.13) (see [8]),

$$\begin{aligned} (1-Ke^{-{\kappa }t})p(y)\le p(t,x,y)\le (1+Ke^{-{\kappa }t})p(y) \end{aligned}$$
(3.23)

where \(p(y)=\frac{d\mu }{d\nu }(y)\) is the density of the unique invariant measure \(\mu \) of the Markov process \(X\). Observe that (2.18) implies also that for any \(j\ge k+1\) and \({\gamma }>0\),

$$\begin{aligned} \lim _{t\rightarrow \infty }(q_j({\gamma }t)-q_{j-1}(t))=\infty . \end{aligned}$$
(3.24)

Let \(V=V(x_1,\ldots ,x_\ell )\) be a bounded measurable function on \(M^\ell \) and for \(m=1,2,\ldots ,\ell \) set

$$\begin{aligned} V^{(m)}_\mathrm{cont}(x_1,\ldots ,x_m)=\int _M\ldots \int _MV(x_1,\ldots ,x_m,x_{m+1},\ldots , x_\ell )d\mu (x_{m+1})\ldots d\mu (x_\ell ) \end{aligned}$$

with \(V^{(\ell )}_\mathrm{cont}=V\). Set \(t_n({\gamma },T)={\gamma }T+n({\gamma }+{\gamma }^2)\) for \(n=0,1,2,\ldots ,M({\gamma },T)-1\) where \(M({\gamma },T)=[T(1-{\gamma })/({\gamma }+{\gamma }^2)]\). Next, for \(a\le b\le c\) and \(m=1,2,\ldots ,\ell \) we denote

$$\begin{aligned}&Z^{(m)}_x(a,b,c)=E_x\exp \bigg (\sum _{a\le n<b}\int _{t_n({\gamma },T)}^{t_n({\gamma },T) +{\gamma }}V^{(m)}_\mathrm{cont}(X(q_1(t)),\ldots ,X(q_m(t)))dt\\&+\sum _{b\le n<c}\int _{t_n({\gamma },T)}^{t_n({\gamma },T)+{\gamma }}V^{(m-1)}_\mathrm{cont} (X(q_1(t)),\ldots ,X(q_{m-1}(t)))dt\bigg ) \end{aligned}$$

and set \(Z^{(m)}_x(a,b)=Z^{(m)}_x(a,b,b)\). Observe that \(Z_x^{(\ell )}(0,M({\gamma },T))\) contains neither the integration from 0 to \({\gamma }T\) nor the sum of integrals from \(t_n({\gamma },T)+{\gamma }\) to \(t_n({\gamma },T)+{\gamma }+{\gamma }^2\), both of which are present in the integral from 0 to \(T\), and so estimating these missing parts we arrive at the inequality

$$\begin{aligned} \exp (-2C(V){\gamma }T)Z_x^{(\ell )}(0,M({\gamma },T))&\le E_x\exp \left(\int _0^TV(X(q_1(t)),\ldots ,X(q_\ell (t)))dt\right)\nonumber \\&\le \exp (2C(V){\gamma }T)Z_x^{(\ell )}(0,M({\gamma },T)) \end{aligned}$$
(3.25)

where \(C(V)=\sup _{(x_1,\ldots ,x_\ell )}|V(x_1,\ldots ,x_\ell )|\).

Denote by \(\mathcal{F }_t\) the \({\sigma }\)-algebra generated by \(X(s),\, s\,{\le }\, t\). Then by (2.18) and (3.24) for all \(T\) large enough if \(n\ge 1,\, T\ge t\ge t_n({\gamma },T)\) and \(s\le t_{n-1}({\gamma },T)+{\gamma }\) then \(X(q_1(s)),\ldots , X(q_m(s))\) and \(X(q_1(t)),\ldots ,X(q_{m-1}(t))\) are \(\mathcal{F }_{q_m(t_n({\gamma },T) -{\gamma }^2)}-\)measurable. Hence,

$$\begin{aligned}&Z^{(m)}_x(0,n,M({\gamma },T))=E_x\bigg ( J_{m,n}\exp \bigg (\sum _{0\le l<n}\int _{t_l({\gamma },T)}^{t_l({\gamma },T)+{\gamma }}V^{(m)}_\mathrm{cont}(X(q_1(s)),\ldots ,X(q_m(s)))ds\nonumber \\&\quad +\sum _{n+1\le l<M({\gamma },T)}\int _{t_l({\gamma },T)}^{t_l({\gamma },T)+{\gamma }}V^{(m-1)}_\mathrm{cont}(X(q_1(s)),\ldots ,X(q_{m-1}(s)))ds\bigg )\bigg ) \end{aligned}$$
(3.26)

where

$$\begin{aligned} J_{m,n}=E_x\left(\exp \left(\int _{t_n({\gamma },T)}^{t_n({\gamma },T)+{\gamma }} V^{(m)}_\mathrm{cont}(X(q_1(s)),\ldots ,X(q_m(s)))ds\right)\big \vert \mathcal{F }_{q_m(t_n ({\gamma },T)-{\gamma }^2)}\right). \end{aligned}$$

Let

$$\begin{aligned} \tilde{J}_{m,n}=\exp \left(\int _{t_n({\gamma },T)}^{t_n({\gamma },T)+{\gamma }}E_x\big ( V^{(m)}_\mathrm{cont}(X(q_1(s)),\ldots ,X(q_m(s)))\left|\mathcal{F }_{q_m(t_n ({\gamma },T)-{\gamma }^2)}\right)ds\right). \end{aligned}$$

Since \(|e^{\alpha }-1-{\alpha }|\le {\alpha }^2\) whenever \(|{\alpha }|\le 1\), we obtain

$$\begin{aligned} |J_{m,n}-\tilde{J}_{m,n}|\le 2{\gamma }^2(C(V))^2. \end{aligned}$$
(3.27)

On the other hand, by the Markov property

$$\begin{aligned}&\tilde{J}_{m,n}=\exp \left(\int _{t_n({\gamma },T)}^{t_n({\gamma },T)+{\gamma }}ds\int _M p\big (q_m(s)-q_m(t_n({\gamma },T)-{\gamma }^2),\right.\nonumber \\&\left. X(q_m(t_n({\gamma },T)-{\gamma }^2)),y\big )V^{(m)}_\mathrm{cont}(X(q_1(s)),\ldots ,X(q_{m-1}(s)),y)d\nu (y) \right)\!.\qquad \end{aligned}$$
(3.28)

Set

$$\begin{aligned} d_{\gamma }(t)=\inf _{s\ge {\gamma }t}\min _{k+1\le j\le \ell }\min (q_j(s)- q_{j-1}(s{\gamma }^{-1}),\, q_j(s)-q_j(s-{\gamma }^2)) \end{aligned}$$

and observe that \(d_{\gamma }(t)\rightarrow \infty \) as \(t\rightarrow \infty \) for each fixed \({\gamma }>0\) in view of the assumption (2.18). Now, by (3.23) and (3.28),

$$\begin{aligned}&\exp (-Ke^{-{\kappa }d_{\gamma }(T)}C(V){\gamma })\le J_{m,n}\exp \bigg (-\int _{t_n({\gamma },T)}^{t_n({\gamma },T)+{\gamma }}V^{(m-1)}_\mathrm{cont}(X(q_1(s)),\ldots \nonumber \\&\quad \ldots ,X(q_{m-1}(s)))ds\bigg )\le \exp (Ke^{-{\kappa }d_{\gamma }(T)}C(V){\gamma }). \end{aligned}$$
(3.29)

Employing (3.26)–(3.29) for \(n=M({\gamma },T),\, M({\gamma },T)-1,\ldots ,1\) with each \(m=\ell ,\ell -1,\ldots ,k+1\) we obtain that

$$\begin{aligned} \limsup _{T\rightarrow \infty }\frac{1}{T}\big \vert \ln \big ( Z_x^{(\ell )}(0,M({\gamma },T))\big )-\ln \big ( Z_x^{(k)}(0,M({\gamma },T))\big )\big \vert =0. \end{aligned}$$
(3.30)

Now taking \(\ln \) in (3.25) and letting first \(T\rightarrow \infty \) and then \({\gamma }\rightarrow 0\) we obtain from (3.30) and the definition of \(Z_x^{(k)}\) that

$$\begin{aligned}&\lim _{T\rightarrow \infty }\frac{1}{T}\left(\ln E_x\exp \left(\int _0^TV(X(q_1(t)),\ldots , X(q_\ell (t)))dt\right)\right.\nonumber \\&\left. -\ln E_x\exp \left(\int _0^TV^{(k)}_\mathrm{cont} (X({\alpha }_1t),\ldots ,X({\alpha }_kt))dt\right)\right)=0. \end{aligned}$$
(3.31)

If \(k=1\) then \(\frac{1}{T}\) of the second expression in brackets in (3.31) converges as \(T\rightarrow \infty \) to the logarithm of the spectral radius of the semigroup of operators \(R^t_\mathrm{cont}(V)\) defined in (2.20). Thus, the assertions of Theorems 2.3 and 2.4 follow from the well known results on large deviations (see [9, 10, 16] and [11]) in the same way as in the discrete time case. \(\square \)

3.4 Nonconventional averaging

According to [12] the large deviations estimates (2.34) and (2.35) follow once we establish (2.33) for all continuous functions \(W_t(x_1,\ldots ,x_\ell )\) on \(\mathbb{R }_+\times M^\ell \). First, we claim that even without the assumption \(k=1\),

$$\begin{aligned}&\lim _{{\varepsilon }\rightarrow 0}{\varepsilon }\bigg (\ln E_x\exp \bigg ({\varepsilon }^{-1}\int _0^TW_t(X(q_1(t/{\varepsilon })),\ldots ,X(q_\ell (t/{\varepsilon })))dt\bigg )\nonumber \\&\quad -\ln E_x\exp \bigg ({\varepsilon }^{-1}\int _0^TW_t^{(k)}(X(q_1(t/{\varepsilon })),\ldots ,X(q_k(t/{\varepsilon })))dt\bigg )\bigg )=0 \end{aligned}$$
(3.32)

where in the discrete time case \(q_j\)’s are extended to all \(s\ge 0\) by writing \(q_j(s)=q_j([s])\) and we set

$$\begin{aligned} W^{(k)}_t(x_1,\ldots ,x_k)=\ln \int _M\ldots \int _M\exp (W_t(x_1,\ldots ,x_\ell )) d\mu (x_{k+1})\ldots d\mu (x_\ell ) \end{aligned}$$

while in the continuous time case we set

$$\begin{aligned} W^{(k)}_t(x_1,\ldots ,x_k)=\int _M\ldots \int _MW_t(x_1,\ldots ,x_\ell ) d\mu (x_{k+1})\ldots d\mu (x_\ell ). \end{aligned}$$

The proof of (3.32) is the same as the proofs of (3.2) in the discrete time case and of (3.31) in the continuous time case, while the dependence of \(W_t\) on \(t\) does not play any role in the arguments employed there.

Next, when \(k=1\) we arrive at the “conventional” setup and (2.33) follows in the same way as in [12] (see also [17]). \(\square \)

4 Large deviations for any \(k\ge 1\): i.i.d. case

Here we assume that \(X(n),\, n\ge 1\) are i.i.d. random variables (vectors) and rely on the decomposition (2.36). In view of the independence of \(S_{N,a}(V)\) for different \(a\in A_N\) we can write

$$\begin{aligned} Z_N(V)=E\exp \left(\sum ^N_{n=1}V(X(n),X(2n),\ldots ,X(kn))\right)=\prod _{a\in A_N} Z_{N,a}(V) \end{aligned}$$
(4.1)

where

$$\begin{aligned} Z_{\eta ,a}(V)=E\exp \left(\sum _{b\in B_\eta (a)}V(X(b),X(2b),\ldots ,X(kb))\right) \end{aligned}$$

with \(A_N\) and \(B_\eta (a)\) defined in Sect. 2.

In order to study \(Z_{N,a}(V)\) we introduce also

$$\begin{aligned} B(a)=\{ b\ge 1:\, b=ar_1^{d_1}r_2^{d_2}\cdots r_m^{d_m}\,\,\text{ for} \text{ some} \text{ nonnegative} \text{ integers}\,\, d_1,\ldots ,d_m\}. \end{aligned}$$

Observe that each \(l=1,2,\ldots ,k\) can be written uniquely in the form \(l=r_1^{d_1(l)} r_2^{d_2(l)}\cdots r_m^{d_m(l)}\) for some nonnegative integers \(d_1(l),\ldots ,d_m(l)\). Now, if \(b=ar_1^{d_1}\cdots r_m^{d_m}\in B(a)\) and \(l=1,2,\ldots ,k\) then \(lb=ar_1^{d_1+d_1(l)}\cdots r_m^{d_m+d_m(l)}\in B(a)\). Next, consider the lattice \(\mathbb{Z }^m\) and set

$$\begin{aligned} \mathbb{Z }^m_+=\{ n=(n_1,\ldots ,n_m),\, n_i\ge 0\,\,\text{ for} \text{ all}\,\, i=1,\ldots ,m\}. \end{aligned}$$

Then the formula \({\varphi }_a(n_1,\ldots ,n_m)=ar_1^{n_1}\cdots r_m^{n_m}\) provides a one-to-one correspondence

$$\begin{aligned} {\varphi }_a:\,\mathbb{Z }^m_+\rightarrow B(a) \end{aligned}$$

where, recall, \(a\) is relatively prime with \(r_1,\ldots ,r_m\). Set

$$\begin{aligned} D(\rho )=\left\{ n=(n_1,\ldots ,n_m)\in \mathbb{Z }^m:\, n_i\ge 0,\, i=1,\ldots ,m\,\,\text{ and} \,\,\sum _{i=1}^mn_i\ln r_i\le \rho \right\} \!. \end{aligned}$$

Then, clearly,

$$\begin{aligned} {\varphi }_aD(\ln (N/a))=B_N(a). \end{aligned}$$
(4.2)

It follows that

$$\begin{aligned} |B_N(a)|\le \prod _{i=1}^m\left(1+\frac{1}{\ln r_i}\ln \frac{N}{a}\right)\le \left(1+\frac{1}{\ln 2}\ln \frac{N}{a}\right)^m \end{aligned}$$
(4.3)

where \(|{\Gamma }|\) denotes the cardinality of a set \({\Gamma }\). Hence

$$\begin{aligned} a\le N2^{-(|B_N(a)|^{1/m}-1)}. \end{aligned}$$
(4.4)
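The counting bounds (4.3)–(4.4) are easy to verify numerically on a small instance. The choices below, \(m=2\) with \(r_1=2,\, r_2=3\), and the values of \(N\) and \(a\), are illustrative assumptions for the sketch.

```python
import math

# Numerical check of the counting bounds (4.3)-(4.4); m = 2 with r_1 = 2,
# r_2 = 3 and the sampled values of N and a are illustrative assumptions.
r = [2, 3]

def B(N, a):
    """Enumerate B_N(a) = {a * 2^{d_1} * 3^{d_2} <= N, d_i >= 0}."""
    out, stack = set(), [a]
    while stack:
        b = stack.pop()
        if b <= N and b not in out:
            out.add(b)
            stack.extend(b * ri for ri in r)
    return out

N = 10_000
for a in (1, 5, 7, 11):                  # a relatively prime to r_1, r_2
    card = len(B(N, a))
    assert card <= (1 + math.log(N / a) / math.log(2)) ** len(r)   # (4.3)
    assert a <= N * 2 ** (-(card ** (1 / len(r)) - 1))             # (4.4)
print("(4.3) and (4.4) hold for a = 1, 5, 7, 11")
```

The breadth-first enumeration of \(B_N(a)\) mirrors the lattice description \(\varphi _a(\mathbb{Z }^m_+)\cap [1,N]\) used below.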

Next, we claim that \(Z_{N,a}(V)\) is determined only by \(|B_N(a)|\) and not by \(N\) and \(a\) themselves. Indeed, since \(|D(\rho )|\) is nondecreasing in \(\rho \), it determines the set \(D(\rho )\) itself, and so \(|D(\ln (N/a))|=|B_N(a)| =|B_{N/a}(1)|\) determines the set \(B_{N/a}(1)\) in view of (4.2). Set \(\hat{B}_\eta (a)=B_\eta (a)\cup \{ n:\,n=ln^{\prime }\) for some \(n^{\prime }\in B_\eta (a)\) and \(l=2,3,\ldots ,k\}\). Then we can write

$$\begin{aligned} Z_{\eta ,a}(V)=\int \ldots \int \exp \left(\sum _{b\in B_\eta (a)}V(x_b,x_{2b},\ldots ,x_{kb}) \right)\prod _{b^{\prime }\in \hat{B}_\eta (a)}d\mu (x_{b^{\prime }}). \end{aligned}$$

It is easy to see from here that \(Z_{\eta ,a}(V)=Z_{\eta /a,1}(V)\) for any \(\eta >0\) and an integer \(a\ge 2\) relatively prime with \(r_1,\ldots ,r_m\). Indeed, \(Z_{\eta ,a}(V)\) is determined by the labeled directed graph \({\Gamma }_\eta (a)\) having \(B_\eta (a)\) as its vertices and having arrows of \(k-1\) types so that an arrow with a label \(l=2,3,\ldots ,k\) is drawn from \(n\in B_\eta (a)\) to \(n^{\prime }\in B_\eta (a)\) if \(n^{\prime }=ln\). Clearly, the graphs \({\Gamma }_\eta (a)\) and \({\Gamma }_{\eta /a}(1)\) are isomorphic in the sense that there exists a one-to-one map \({\varphi }:\, B_\eta (a)\rightarrow B_{\eta /a}(1)\) such that if \(n,n^{\prime }\in B_\eta (a)\) and \(n^{\prime }=ln\) then \({\varphi }n,{\varphi }n^{\prime }\in B_{\eta /a}(1)\) and \({\varphi }n^{\prime }=l{\varphi }n\). Since \(X(n),\, n\ge 1\) are i.i.d., \(Z_{\eta ,a}(V)\) is determined, in fact, by the isomorphism class of \({\Gamma }_\eta (a)\) and not by \({\Gamma }_\eta (a)\) itself, and so \(Z_{\eta ,a}(V)=Z_{\eta /a,1}(V)\). Since \(|B_N(a)|\) determines the set \(B_{N/a}(1)\) we conclude that it determines \(Z_{N,a}(V)\), as well, proving the claim.
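The graph-isomorphism step above can be checked by machine on a small instance: division by \(a\) maps \(B_\eta (a)\) bijectively onto \(B_{\eta /a}(1)\) and commutes with multiplication by the labels \(l\). The choices \(r_1=2,\, r_2=3\) and the values of \(\eta \) and \(a\) are illustrative assumptions.

```python
# Sketch of the isomorphism Gamma_eta(a) ~ Gamma_{eta/a}(1) via b -> b/a;
# r_1 = 2, r_2 = 3 and eta, a are hypothetical values for the check.
r = [2, 3]

def B(eta, a):
    """Enumerate B_eta(a) = {a * 2^{d_1} * 3^{d_2} <= eta, d_i >= 0}."""
    out, stack = set(), [a]
    while stack:
        b = stack.pop()
        if b <= eta and b not in out:
            out.add(b)
            stack.extend(b * ri for ri in r)
    return out

eta, a = 9000, 7
Ba, B1 = B(eta, a), B(eta / a, 1)
image = {b // a for b in Ba}
assert image == B1                      # phi: b -> b/a is a vertex bijection
# phi respects the labeled arrows: l*b lies in B_eta(a) iff l*(b/a) lies
# in B_{eta/a}(1).
assert all((l * b in Ba) == (l * (b // a) in B1) for b in Ba for l in r)
print(len(Ba), len(B1))
```

Both assertions pass, matching the claim that \(Z_{\eta ,a}(V)\) depends only on the isomorphism class of \({\Gamma }_\eta (a)\).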

Let \(l=|B_N(a)|\) and set \(R_l(V)=Z_{N,a}(V)\) since the latter depends only on \(l\) (and, of course, on \(V\)). Observe that

$$\begin{aligned} \ln R_l(V)\le lC(V) \end{aligned}$$
(4.5)

where \(C(V)=\sup _{x_1,\ldots ,x_k\in M}|V(x_1,\ldots ,x_k)|\). Set \(A^{(l)}_N= \{ a\in A_N:\, |B_N(a)|=l\}\). By (4.4),

$$\begin{aligned} |A_N^{(l)}|\le N2^{-(l^{1/m}-1)}. \end{aligned}$$
(4.6)

Observe that \(|D(\rho )|\) is a nondecreasing right continuous piecewise constant function and, since \(r_1,r_2,\ldots ,r_m\) are distinct primes, the jumps of \(|D(\rho )|\) can only be of size 1, i.e. for all \(\tilde{\rho }>0\),

$$\begin{aligned} |D(\tilde{\rho })|-\lim _{\rho \uparrow \tilde{\rho }}|D(\rho )|\le 1. \end{aligned}$$

It follows that

$$\begin{aligned} \rho _{\min }(l)=\inf \{\rho \ge 0:\, |D(\rho )|=l\}\,\,\text{ and}\,\, \rho _{\max }(l)=\sup \{\rho \ge 0:\, |D(\rho )|=l\} \end{aligned}$$

are well defined for each integer \(l\ge 1\) and \(\rho _{\min }(l)<\rho _{\max }(l)\). Denote \(\hat{A}_N^{(l)}=\{ a\in \mathbb{N }:\, Ne^{-\rho _{\max }(l)}\le a\le Ne^{-\rho _{\min }(l)},\, a\,\) is relatively prime with \(\,r_1,r_2,\ldots ,r_m\}\). Then by (4.2) and the above,

$$\begin{aligned} \frac{1}{N}\big \vert |A_N^{(l)}|-|\hat{A}_N^{(l)}|\big \vert \le \frac{1}{N}\rightarrow 0 \,\,\text{ as}\,\, N\rightarrow \infty . \end{aligned}$$
(4.7)

We will show next that the limit

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N}|\hat{A}_N^{(l)}|=(e^{-\rho _{\min }(l)}- e^{-\rho _{\max }(l)})r \end{aligned}$$
(4.8)

exists with

$$\begin{aligned} r=1\!-\!\frac{1}{2}\!-\!\frac{1}{3}\!+\!\frac{1}{2\cdot 3}-\frac{1}{5}+\frac{1}{2\cdot 5}+ \frac{1}{3\cdot 5}-\frac{1}{2\cdot 3\cdot 5}+\cdots +(-1)^m\frac{1}{r_1\cdot r_2\cdots r_m}.\nonumber \\ \end{aligned}$$
(4.9)

Indeed, for each integer \(n\ge 1\) set \(G(n)=\{ in:\, i\in \mathbb{Z }_+\}\) and \(G^{(l)}_N(n)=\{ j\in G(n):\, Ne^{-\rho _{\max }(l)}\le j\le Ne^{-\rho _{\min }(l)}\}\). Then (by the inclusion-exclusion principle),

$$\begin{aligned} |\hat{A}_N^{(l)}|&= |G_N^{(l)}(1)|-|G_N^{(l)}(2)|-|G_N^{(l)}(3)|\nonumber \\&+\ |G_N^{(l)}(2\cdot 3)|+\cdots +(-1)^m|G_N^{(l)}(r_1\cdot r_2\cdots r_m)|. \end{aligned}$$
(4.10)

Since each \(G(n)\) is an arithmetic progression with the difference \(n\) we obtain that

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N}|G_N^{(l)}(n)|=\frac{1}{n}(e^{-\rho _{\min }(l)}- e^{-\rho _{\max }(l)}) \end{aligned}$$
(4.11)

and (4.8)–(4.9) follows from (4.10)–(4.11).
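The inclusion-exclusion constant (4.9) is just the density of integers relatively prime to \(r_1,\ldots ,r_m\), which equals \(\prod _{i=1}^m(1-1/r_i)\). A minimal sketch, assuming \(m=3\) with \(r_1,r_2,r_3=2,3,5\) as in the terms displayed above:

```python
from fractions import Fraction
from itertools import combinations
from math import prod

# Check of (4.9): the signed subset sum equals prod_i (1 - 1/r_i), and both
# match the empirical density of integers coprime to 2, 3 and 5.
primes = [2, 3, 5]

# Sum over all subsets S of {2, 3, 5} of (-1)^{|S|} / prod(S).
ie = sum(Fraction((-1) ** len(S), prod(S, start=1))
         for j in range(len(primes) + 1) for S in combinations(primes, j))
assert ie == prod(Fraction(p - 1, p) for p in primes)   # = 4/15

# Direct count of integers <= M that are coprime to 2, 3 and 5.
M = 10**6
count = sum(1 for n in range(1, M + 1) if all(n % p for p in primes))
print(float(ie), count / M)
```

The empirical frequency agrees with \(r=4/15\) to within \(O(1/M)\), as the arithmetic-progression argument of (4.10)–(4.11) predicts.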

Observe that by (4.2), (4.3) and the definition of \(\rho _{\min }\) and \(\rho _{\max }\),

$$\begin{aligned} \rho _{\max }(l)\ge \rho _{\min }(l)\ge (l^{1/m}-1)\ln 2, \end{aligned}$$
(4.12)
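Inequality (4.12) can also be confirmed numerically: listing the values \(\sum _in_i\ln r_i\) in increasing order, the \(l\)-th smallest is \(\rho _{\min }(l)\) and the next one is \(\rho _{\max }(l)\), since the values are distinct and \(|D(\rho )|\) jumps by 1. The choice \(m=2\) with \(r_1=2,\, r_2=3\) and the cutoff are illustrative assumptions.

```python
import math

# Numerical check of (4.12) with m = 2, r_1 = 2, r_2 = 3 (hypothetical
# choices). The sorted lattice values give rho_min(l) and rho_max(l).
logs = [math.log(2), math.log(3)]
LIMIT = 25.0
vals = sorted(n1 * logs[0] + n2 * logs[1]
              for n1 in range(int(LIMIT / logs[0]) + 1)
              for n2 in range(int(LIMIT / logs[1]) + 1)
              if n1 * logs[0] + n2 * logs[1] <= LIMIT)

for l in range(1, 200):
    rho_min, rho_max = vals[l - 1], vals[l]
    assert rho_min < rho_max
    assert rho_min >= (l ** (1 / 2) - 1) * math.log(2) - 1e-12   # (4.12)
print("(4.12) holds for l = 1,...,199")
```

Here \(\rho _{\min }(1)=0\) and \(\rho _{\min }(2)=\ln 2\), consistent with \(|D(0)|=1\).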

and so we obtain from (4.1) and (4.5)–(4.9) that

$$\begin{aligned} \frac{1}{N}\ln Z_N(V)&= \frac{1}{N}\sum _{a\in A_N}\ln Z_{N,a}(V)\nonumber \\&= \frac{1}{N}\sum _{1\le l\le (1+\frac{1}{\ln 2}\ln N)^m}|A^{(l)}_N| \ln R_l(V)\nonumber \\&\longrightarrow r\sum _{l=1}^\infty (e^{-\rho _{\min }(l)}- e^{-\rho _{\max }(l)})\ln R_l(V)\,\,\text{ as}\,\, N\rightarrow \infty \end{aligned}$$
(4.13)

while the last series converges absolutely in view of (4.5) and (4.12). Furthermore, if \(V=V_{\lambda }\) depends on a parameter \({\lambda }\) in a differentiable way with a derivative bounded by \(\tilde{C}\) then each \(\ln R_l(V_{\lambda })\) is also differentiable in \({\lambda }\) with a derivative bounded by \(\tilde{C}l\). Hence, in this case we can differentiate in \({\lambda }\) the series in the right hand side of (4.13) and the assertion of Theorem 2.7 follows. \(\square \)

Remark 4.1

The arguments of the present section also yield moderate deviations estimates for the sums \(S_N(V)\) given by (2.36) in the above i.i.d. setup. Namely, let \(\bar{V}=\int V(x_1,x_2,\ldots ,x_k)d\mu (x_1)d\mu (x_2)\cdots d\mu (x_k)\), where \(\mu \) is the probability distribution of \(X(1)\), and observe that \(\bar{V}=EV(X(n),X(2n),\ldots ,X(kn))\) for any \(n\ge 1\). Then for any \({\kappa }\in (0,\frac{1}{2})\),

$$\begin{aligned} \limsup _{N\rightarrow \infty }N^{2{\kappa }-1}\ln P\{ N^{{\kappa }-1}S_N(V-\bar{V})\in K\}\le -\frac{1}{2}{\Lambda }\inf _{u\in K}u^2 \end{aligned}$$
(4.14)

for any closed set \(K\subset \mathbb{R }\) and

$$\begin{aligned} \liminf _{N\rightarrow \infty }N^{2{\kappa }-1}\ln P\{ N^{{\kappa }-1}S_N(V-\bar{V})\in U\}\ge -\frac{1}{2}{\Lambda }\inf _{u\in U}u^2 \end{aligned}$$
(4.15)

for any open set \(U\subset \mathbb{R }\) provided that for any \({\lambda }\in \mathbb{R }\),

$$\begin{aligned} \lim _{N\rightarrow \infty }N^{2{\kappa }-1}\ln E\exp ({\lambda }N^{-{\kappa }}S_N(V-\bar{V}))=\frac{1}{2} {\Lambda }^{-1}{\lambda }^2 \end{aligned}$$
(4.16)

(cf. [12] and [11]). In order to compute the limit (4.16) we observe, relying on the same arguments as above, that \({\upsilon }_l(V)= E(S_{N,a}(V-\bar{V}))^2\) depends only on \(l=|B_N(a)|\) and on \(V\) where, recall, \(S_{N,a}\) was defined in (2.36). It follows that

$$\begin{aligned} \ln Z_{N,a}({\lambda }N^{-{\kappa }}(V-\bar{V}))=\frac{1}{2}{\lambda }^2N^{-2{\kappa }}{\upsilon }_l(V)+ O(|{\lambda }|^3N^{-3{\kappa }}\Vert V\Vert ^3l^3) \end{aligned}$$
(4.17)

provided \(|B_N(a)|=l\). Then in the same way as in (4.13),

$$\begin{aligned}&\lim _{N\rightarrow \infty }N^{2{\kappa }-1}\ln Z_N({\lambda }N^{-{\kappa }}(V-\bar{V}))\nonumber \\&\quad =\frac{1}{2}{\lambda }^2\lim _{N\rightarrow \infty }N^{-1}\sum _{1\le l\le (1+\frac{1}{\ln 2}\ln N)^m}|A_N^{(l)}|{\upsilon }_l(V)\nonumber \\&\quad =\frac{1}{2}{\lambda }^2r\sum _{l=1}^\infty (e^{-\rho _{\min }(l)}-e^{-\rho _{\max }(l)}){\upsilon }_l(V) \end{aligned}$$
(4.18)

and (4.16) follows under a nondegeneracy condition \({\upsilon }_l(V)\ne 0\) whenever \(A_N^{(l)}\ne \emptyset \).

5 Nonconventional large deviations for dynamical systems

In this section we discuss nonconventional large deviations results in the dynamical systems case; a reader who is not familiar with hyperbolic dynamical systems and is interested only in the probabilistic setup may skip this section altogether. We assume now that \(T:\, M\rightarrow M\) is either a subshift of finite type or a \(C^2\) expanding endomorphism or a hyperbolic diffeomorphism on a compact Riemannian manifold (see [3] and [21]). By the latter we mean a \(C^2\) Anosov diffeomorphism or, more generally, a \(C^2\) diffeomorphism defined in a neighborhood of a hyperbolic attractor. We identify now the probability space \(({\Omega },\mathcal{F },P)\) with \((M,\mathcal{B },\mu )\) where \(\mathcal{B }\) is the Borel \({\sigma }\)-algebra on \(M\) and \(\mu \) is a Gibbs \(T\)-invariant measure constructed from a Hölder continuous potential \(g\) (see [3] and [21]). Let \(k=1\) in (2.3), (2.4) and (2.17), (2.18).

Theorem 5.1

Let \(X(n)=X(n,{\omega })=X(n,x)=f(T^nx),\, n\ge 0\), where \(f\) is a Hölder continuous (vector) function, and take the \(q_j\)’s as in Theorem 2.1. Let \(k=1\). Then for any \(W_{\lambda }=W_{\lambda }(x_1,\ldots ,x_\ell )\) continuous in \(x_1,\ldots ,x_\ell \),

$$\begin{aligned}&Q(W_{\lambda })=\lim _{N\rightarrow \infty }\frac{1}{N}\ln \int _M\exp \bigg (\sum _{n=1}^N W_{\lambda }(T^{q_1(n)}x,\ldots ,T^{q_\ell (n)}x)\bigg )d\mu (x)\nonumber \\&\quad =\mathfrak{P }(\ln \hat{W}_{\lambda }+g) \end{aligned}$$
(5.1)

with \(\hat{W}\) defined by (2.5), \(g\) being the potential of \(\mu \) and \(\mathfrak{P }(\cdot )\) being the topological pressure of a function in brackets for the transformation \(T\) (see [3] and [21]). If the derivative \(dW_{\lambda }/d{\lambda }\) exists and is bounded in \(x_1,\ldots ,x_\ell \) for each \({\lambda }\) then \(Q(W_{\lambda })\) is differentiable in \({\lambda }\), as well. In the expanding and hyperbolic cases the limit in (5.1) remains the same if we integrate in (5.1) either with respect to the normalized Riemannian volume or with respect to the Sinai-Ruelle-Bowen (SRB) measure \(\mu ={\mu ^{\mathrm{\tiny {SRB}}}}\) which is the Gibbs measure corresponding to the potential \(g=-\ln {\varphi }\) where \({\varphi }\) is the Jacobian of the differential \(DT\) restricted to unstable leaves (see [3] and [21]). The large deviations estimates (2.9) and (2.10) hold true with the rate functional \(J\) given by (2.8) with \(Q(W_{\lambda })\) for \(W_{\lambda }={\lambda }F\) given by (5.1) in place of \(r(W_{\lambda })\) in (2.8).

Proof

For \(T\) being a \(C^2\) Axiom A diffeomorphism (in particular, Anosov) in a neighborhood of an attractor or \(T\) being an expanding \(C^2\) endomorphism of a Riemannian manifold \(M\) (see [3]) let \(\zeta \) be a finite Markov partition for \(T\). Then we can take \(\mathcal{F }_{kl}\) to be the finite \({\sigma }\)-algebra generated by the partition \(\cap _{i=k}^lT^i\zeta \). Another case covered by the above theorem is when \(T\) is a topologically mixing subshift of finite type, i.e. \(T\) is the left shift on a subspace \(\Xi \) of the space of one-sided sequences \({\varsigma }=({\varsigma }_i,i\ge 0),\, {\varsigma }_i=1,\ldots ,l_0\), such that \({\varsigma }\in \Xi \) if \(\xi _{{\varsigma }_i{\varsigma }_{i+1}}=1\) for all \(i\ge 0\), where \((\xi _{ij})\) is an \(l_0\times l_0\) matrix with \(0\) and \(1\) entries such that for some \(n\) all entries of its \(n\)th power are positive. Again, we take \(\mu \) to be a Gibbs invariant measure corresponding to some Hölder continuous function and define \(\mathcal{F }_{kl}\) to be the finite \({\sigma }\)-algebra generated by cylinder sets with fixed coordinates having numbers from \(k\) to \(l\). The exponentially fast \(\psi \)-mixing is well known in the above cases (see [3]). In fact, convergence to zero of the modified \(\psi \)-mixing coefficient \(\psi _{P,\Pi }(n)\) holds true, as well, in the hyperbolic and expanding case when \(P\) is the normalized Riemannian volume and \(\Pi ={\mu ^{\mathrm{\tiny {SRB}}}}\).

If the function \(W_{\lambda }=W_{\lambda }(x_1,\ldots ,x_\ell )\) is continuous in \(x_1,\ldots ,x_\ell \) then \({\beta }_{W_{\lambda }}(n)\) from Proposition 3.1 tends to zero as \(n\rightarrow \infty \), and so the condition (3.1) will be satisfied here. It follows from [16] that

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N}\ln \int \exp \left(\sum _{n=1}^N\hat{W}_{\lambda }(f(T^nx)) \right)d\mu (x)=\mathfrak{P }(\ln \hat{W}_{\lambda }+g) \end{aligned}$$

and Theorem 5.1 follows from Proposition 3.1 and Corollary 3.2 considered with \(k=1\) since in our circumstances differentiability of the topological pressure in parameters of the potential is well known (see, for instance, [24] and [23]). \(\square \)

Remark 5.2

  1. (i)

    A version of Theorem 2.2 can also be obtained in the present dynamical systems setup where the limit \(Q(W)=\mathfrak{P }(\ln \hat{W}(x)+g)\) is obtained in the same way as in Theorem 5.1. Since \(\mathfrak{P }(q)\) is Gateaux differentiable at any Hölder continuous \(q\) (see [26] and [23]), \(Q(W)\) is also Gateaux differentiable at any Hölder continuous \(W\), and the large deviations for occupational measures

    $$\begin{aligned} \zeta _N=\zeta _{N,x}=\frac{1}{N}\sum _{n=1}^N{\delta }_{\big (T^{q_1(n)}x,\ldots , T^{q_\ell (n)}x\big )} \end{aligned}$$

    follow from Section 4.5.3 in [11] with a rate function which is the Fenchel–Legendre transform of \(Q\).

  2. (ii)

    Theorem 2.6 provides a direct application to the dynamical systems case when \(T\) is a full shift (on a finite alphabet sequence space) considered with a Bernoulli invariant measure, taking \(X(n)=f\circ T^n\) with a function \(f\) on the sequence space depending only on the zero coordinate. Nonconventional large deviations when \(k>1\) in more general cases (e.g. subshifts of finite type with Gibbs invariant measures, hyperbolic and expanding transformations etc.) require a more elaborate technique and will not be treated in this paper.