Abstract
We obtain large deviations theorems for both discrete time expressions of the form \(\sum _{n=1}^NF\big (X(q_1(n)),\ldots ,X(q_\ell (n))\big )\) and similar expressions of the form \(\int _0^TF\big ( X(q_1(t)),\ldots , X(q_\ell (t))\big )dt\) in continuous time. Here \(X(n),n\ge 0\) or \(X(t), t\ge 0\) is a Markov process satisfying Doeblin’s condition, \(F\) is a bounded continuous function and \(q_i(n)=in\) for \(i\le k\) while for \(i>k\) they are positive functions taking on integer values on integers with some growth conditions which are satisfied, for instance, when \(q_i\)’s are polynomials of increasing degrees. Applications to some types of dynamical systems such as mixing subshifts of finite type and hyperbolic and expanding transformations will be obtained, as well.
1 Introduction
Nonconventional ergodic theorems, which attracted substantial attention in ergodic theory (see, for instance, [2] and [13]), studied the limits of expressions of the form \(1/N\sum _{n=1}^NT^{q_1(n)}f_1\cdots T^{q_\ell (n)}f_\ell \) where \(T\) is a weakly mixing measure preserving transformation, \(f_i\)’s are bounded measurable functions and \(q_i\)’s are polynomials taking on integer values on the integers. While, for instance, [2] and [13] were interested in \(L^2\) convergence, other papers such as [1] provided conditions for almost sure convergence in such ergodic theorems. Originally, these results were motivated by applications to multiple recurrence for dynamical systems, where the functions \(f_i\) are taken to be indicators of some measurable sets.
Introducing stronger mixing or weak dependence conditions enabled us in [22] to obtain functional central limit theorems for even more general expressions of the form
where \(X(n),\, n\ge 0\) is a sufficiently fast mixing vector valued process with some moment conditions and stationarity properties, \(F\) is a locally Hölder continuous function with polynomial growth, \(\bar{F}=\int Fd(\mu \times \cdots \times \mu )\) and \(\mu \) is the distribution of \(X(0)\). In order to ensure existence of limiting variances and covariances we had to impose certain assumptions concerning the functions \(q_j(n),\, j\ge 1\) saying that there exists an integer \(k\ge 1\) such that \(q_j(n)=jn\) for \(j=1,\ldots ,k\) while \(q_j(n),\, j>k\) are positive functions taking on integer values on integers with some (faster than linear) growth conditions.
The next natural step in the study of the limiting behavior of nonconventional sums \(S_N=\sum _{n=1}^NF\big (X(q_1(n)),\ldots ,X(q_\ell (n))\big )\) is to obtain large deviations estimates. Namely, in this paper we will be interested in the asymptotic behavior as \(N\rightarrow \infty \) of the probabilities
\[P\Big \{ \frac{1}{N}S_N\in {\Gamma }\Big \} \tag{1.2}\]
for various (open or closed) sets \({\Gamma }\subset \mathbb{R }\). According to [19] under appropriate conditions \(\frac{1}{N}S_N\) converges with probability one as \(N\rightarrow \infty \) to \(\bar{F}=\int Fd(\mu \times \cdots \times \mu )\) where \(\mu \) is the common distribution of \(X(n)\)’s. Thus, as usual, (1.2) describes deviations of \(\frac{1}{N}S_N\) from the limit in the law of large numbers.
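The following minimal Python sketch illustrates the nonconventional sum \(S_N\) and the law of large numbers for \(\frac{1}{N}S_N\). It is only an illustration under assumptions that are not part of the paper: the process is taken to be i.i.d. fair bits, \(\ell =2\), \(q_1(n)=n\), \(q_2(n)=n^2\) and \(F(x_1,x_2)=x_1x_2\), so that \(\bar{F}=1/4\). The helper names `nonconventional_sum` and `iid_sequence` are hypothetical.

```python
import random

def nonconventional_sum(X, qs, F, N):
    """Return S_N = sum_{n=1}^N F(X(q_1(n)), ..., X(q_l(n)))."""
    return sum(F(*(X(q(n)) for q in qs)) for n in range(1, N + 1))

def iid_sequence(seed, values=(0, 1)):
    """Lazily sampled i.i.d. sequence n -> X(n); values are memoised so that
    X(n) returns the same sample every time it is queried."""
    rng = random.Random(seed)
    cache = {}
    def X(n):
        if n not in cache:
            cache[n] = rng.choice(values)
        return cache[n]
    return X

# Illustrative choices (assumptions, not taken from the paper).
qs = [lambda n: n, lambda n: n * n]   # q_1(n) = n, q_2(n) = n^2
F = lambda x1, x2: x1 * x2            # bounded function on M x M, M = {0, 1}
X = iid_sequence(seed=0)
N = 20000
average = nonconventional_sum(X, qs, F, N) / N
print(average)  # expected to be close to F-bar = 1/4
```

Large deviations estimates quantify how unlikely it is for `average` to stay away from \(\bar{F}\) when \(N\) is large.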
The study of asymptotics of probabilities in (1.2) leads to what is usually called the first level of large deviations. We will also study second level large deviations estimates, which in our setup means considering the occupational measures
\[\zeta _N=\frac{1}{N}\sum _{n=1}^N{\delta }_{(X(q_1(n)),\ldots ,X(q_\ell (n)))} \tag{1.3}\]
and to study the asymptotic behavior as \(N\rightarrow \infty \) of probabilities \(P\{\zeta _N\in \mathcal{U }\}\) where \(\mathcal{U }\) is a subset of the space of probability measures on a corresponding product space. In addition, we will also consider large deviations in the averaging setup, namely, for the “slow” variable \(\Xi ^{\varepsilon }(n)=\Xi _x^{\varepsilon }(n)\) given by a difference equation of the form
which is actually a generalization of the above since if \(F(\xi ,x_1,\ldots ,x_\ell )\) does not depend on \(\xi \) then \(\Xi ^{\frac{1}{N}}(N)= \frac{1}{N}S_N\). We will deal also with continuous time versions of the above results considering \(S_T=\int _0^TF\big (X(q_1(t)),\ldots ,X(q_\ell (t))\big )dt\) for some stochastic process \(X(s),\, s\ge 0\).
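The relation \(\Xi ^{\frac{1}{N}}(N)=\frac{1}{N}S_N\) can be checked numerically. The sketch below assumes, since the display (1.4) is not reproduced here, that the difference equation has the form \(\Xi ^{\varepsilon }(n)=\Xi ^{\varepsilon }(n-1)+{\varepsilon }F\big (\Xi ^{\varepsilon }(n-1),X(q_1(n)),\ldots ,X(q_\ell (n))\big )\) with \(\Xi ^{\varepsilon }(0)=0\); the indexing and the initial condition are assumptions chosen so that the identity holds exactly. The function name `slow_variable` is hypothetical.

```python
def slow_variable(eps, steps, F, X, qs, x0=0.0):
    """Iterate the assumed recursion
    Xi(n) = Xi(n-1) + eps * F(Xi(n-1), X(q_1(n)), ..., X(q_l(n))), Xi(0) = x0."""
    xi = x0
    for n in range(1, steps + 1):
        xi = xi + eps * F(xi, *(X(q(n)) for q in qs))
    return xi

# Deterministic toy "process" and functions, chosen only for illustration.
X = lambda n: n % 3
qs = [lambda n: n, lambda n: n * n]
F = lambda xi, x1, x2: x1 * x2        # does not depend on the slow variable xi

N = 50
S_N = sum(X(n) * X(n * n) for n in range(1, N + 1))
print(slow_variable(1.0 / N, N, F, X, qs), S_N / N)  # the two values agree
```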
As for conventional sums (\(\ell =k=1\)) meaningful large deviations estimates can be obtained only for some specific classes of stochastic processes and dynamical systems. In our more general situation we also assume that in the probabilistic setup \(X(n),\, n=0,1,\ldots \) is a Markov chain satisfying a (strong) Doeblin condition while in the dynamical systems setup we can consider \(X(n)=X(n,{\omega })=f(T^n{\omega })\) where \(T\) is either a mixing subshift of finite type or a hyperbolic diffeomorphism or an expanding transformation and \(f\) is a Hölder continuous (vector) function. In the continuous time case we take the underlying process \(X(t)\) to be in the probabilistic setup either an irreducible finite Markov chain with continuous time or a nondegenerate diffusion on a compact manifold while in the dynamical systems setup we can take \(X(t)=X(t,{\omega })=f(T^t{\omega })\) where \(T^t,\, t\ge 0\) is a hyperbolic flow on a compact manifold and \(f\) is a Hölder continuous (vector) function.
We will show that it is not difficult to reduce the problem to the case \(k=\ell \) and the major problems arise only in dealing with random variables \(X(n), X(2n),\ldots ,X(kn)\). When \(k=1\) the above reduction leads to the standard (conventional) setup of large deviations. When \(k>1\) then the general case of Markov sequences requires a quite elaborate technique and a lengthy proof and it will be treated in another paper while here when \(k>1\) we restrict ourselves to independent identically distributed (i.i.d.) sequences \(X(n), n\ge 0\) which, unlike in the conventional setup, is still nontrivial.
Both the probabilistic and the dynamical systems setups are united by common ideas and motivations, but their machineries are quite different. For this reason most of this paper deals with the probabilistic setup, and only in the last Sect. 5 do we discuss some of the dynamical systems results, which can especially benefit readers familiar with this field.
2 Preliminaries and main results
We start with the probabilistic discrete time setup where the underlying process \(X(0),\, X(1),\, X(2),\ldots \) is a Markov chain defined on a probability space \(({\Omega },\mathcal{F },P)\) and evolving on a Polish measurable space \((M,\mathcal{B })\) as its phase space. We assume a “strong” Doeblin condition saying that for some integer \(n_0>0\), a constant \(C>0\) and a probability measure \(\nu \) on \(M\) the \(n_0\)-step transition probability \(P(n_0,x,\cdot )\) of the above Markov chain \(X\) satisfies
for any \(x\in M\) and every measurable set \(G\subset M\). It is well known (see, for instance, [8]) that (2.1) implies existence of a unique invariant measure \(\mu \) of the Markov chain \(X\) and the equality \(\mu (G)=\int d\mu (x)P(n,x,G)\) yields that
where \(d\mu /d\nu \) denotes the Radon-Nikodym derivative.
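For a finite state space the Doeblin condition and its consequence for the invariant measure can be verified directly. The sketch below assumes, since the displays (2.1) and (2.2) are not reproduced here, that (2.1) is the two-sided bound \(C^{-1}\nu (G)\le P(n_0,x,G)\le C\nu (G)\) and that (2.2) reads \(C^{-1}\le d\mu /d\nu \le C\). The transition matrix, the measure \(\nu \) and all function names are illustrative choices.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """n-step transition matrix P^n for n >= 1."""
    result = P
    for _ in range(n - 1):
        result = mat_mul(result, P)
    return result

def doeblin_constant(P, n0, nu):
    """Smallest C >= 1 with nu(y)/C <= P^{n0}(x, y) <= C * nu(y) for all x, y,
    or None if some n0-step transition probability vanishes."""
    C = 1.0
    for row in mat_pow(P, n0):
        for p, v in zip(row, nu):
            if p <= 0.0:
                return None
            C = max(C, p / v, v / p)
    return C

def invariant_measure(P, iterations=500):
    """Approximate the invariant measure by iterating mu -> mu P."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iterations):
        mu = [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]
    return mu

P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
nu = [0.5, 0.25, 0.25]

print(doeblin_constant(P, 1, nu))   # None: one step does not reach every state
C = doeblin_constant(P, 2, nu)      # two steps suffice for this chain
mu = invariant_measure(P)
print(C, [m / v for m, v in zip(mu, nu)])  # densities lie in [1/C, C]
```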
In all cases our setup includes also a bounded measurable function \(F=F(x_1,x_2,\ldots ,x_\ell )\) on the \(\ell \)-times product space \(M^\ell = M\times \cdots \times M\). The setup becomes complete with introduction of positive increasing functions \(q_j,\, j=1,\ldots ,\ell \) taking on integer values on integers and such that
while for \(j=k+1,\ldots ,\ell \) and any \({\gamma }>0\),
For any function \(W\) on \(M^\ell \) we denote by \(\hat{W}\) the function on \(M\) defined by
As usual we denote by \(P_x\) the probability conditioned to \(X(0)=x\) and by \(E_x\) the corresponding expectation. Now, we can formulate our first result.
Theorem 2.1
Let \(W_{\lambda }(x_1,\ldots ,x_\ell ),\,{\lambda }\in (-\infty ,\infty )\) be a family of bounded measurable functions on \(M^\ell \) which is differentiable in \({\lambda }\) and such that \(dW_{\lambda }(x_1,\ldots ,x_\ell )/d{\lambda }\) is bounded for each \({\lambda }\), as well. Assume that \(k=1\) in (2.3) and (2.4). Then for any \(x\in M\) the limit
exists, it is independent of \(x\) and it is differentiable in \({\lambda }\). In fact, \(Q(W_{\lambda })=\ln r(W_{\lambda })\) where \(r(W)\) is the spectral radius of the positive operator \(R(W)\) acting by
Furthermore, set \(W_{\lambda }(x_1,\ldots ,x_\ell )={\lambda }F(x_1,\ldots ,x_\ell )\) and
Then for any closed set \(K\subset \mathbb{R }\),
and for any open set \(U\subset \mathbb{R }\),
where, as before, \(S_N=S_N(F)=\sum _{n=1}^NF\big (X(q_1(n)),\ldots , X(q_\ell (n))\big )\).
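To make the role of the rate functional concrete, the following sketch computes a Fenchel-Legendre transform \(J(u)=\sup _{\lambda }({\lambda }u-Q({\lambda }F))\) numerically in the simplest conventional case. The assumptions are ours: \(\ell =k=1\), the \(X(n)\) are i.i.d. Bernoulli(\(p\)) and \(F(x)=x\), so that \(Q({\lambda }F)=\ln (1-p+pe^{\lambda })\) and \(J\) is known in closed form as a relative entropy. The function names are hypothetical, and the ternary search is just one convenient way to maximize a concave function.

```python
import math

def log_mgf_bernoulli(lam, p):
    """Q(lam * F) = ln E exp(lam * X) for X ~ Bernoulli(p) and F(x) = x."""
    return math.log(1.0 - p + p * math.exp(lam))

def legendre_transform(Q, u, lo=-50.0, hi=50.0, iters=200):
    """sup over lam in [lo, hi] of lam * u - Q(lam), found by ternary search;
    the objective is concave in lam because Q is convex."""
    f = lambda lam: lam * u - Q(lam)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f((lo + hi) / 2.0)

def relative_entropy(u, p):
    """Closed form of the Bernoulli rate function for 0 < u < 1."""
    return u * math.log(u / p) + (1.0 - u) * math.log((1.0 - u) / (1.0 - p))

p = 0.5
Q = lambda lam: log_mgf_bernoulli(lam, p)
for u in (0.5, 0.7, 0.9):
    print(u, legendre_transform(Q, u), relative_entropy(u, p))
```

The rate vanishes at the mean \(u=p\) and grows as \(u\) moves away from it, which is the qualitative picture behind the upper and lower bounds of the theorem.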
We observe that a very particular case of Theorem 2.1 when \(\{ X(n), \, n\ge 0\}\) are i.i.d. random variables was considered in Section 6 of [18]. Next, we describe the second level of large deviations in the nonconventional setup which deals with occupational measures \(\zeta _N\) on \(M^\ell \) given by (1.3) where \(M\) is assumed to be a compact space and \({\delta }_z\) is the unit mass concentrated at \(z\). For any probability measure \(\eta \) on \(M^\ell \) define
where \(\mathbb{C }_+(\cdot )\) denotes the space of all positive continuous functions on a space in brackets.
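The occupational measures \(\zeta _N\) can be sketched in code for a finite phase space. We assume here that \(\zeta _N\) assigns mass \(1/N\) to each of the points \(\big (X(q_1(n)),\ldots ,X(q_\ell (n))\big ),\, n=1,\ldots ,N\), which is the usual form of such measures; the function names and the toy deterministic sequence are illustrative. Integrating \(F\) against \(\zeta _N\) recovers \(\frac{1}{N}S_N\), which is how the second level of large deviations contains the first one.

```python
from collections import Counter
from fractions import Fraction

def occupational_measure(X, qs, N):
    """Empirical measure of the points (X(q_1(n)), ..., X(q_l(n))), n = 1..N,
    returned as a dict mapping each visited point to its exact weight."""
    counts = Counter(tuple(X(q(n)) for q in qs) for n in range(1, N + 1))
    return {z: Fraction(c, N) for z, c in counts.items()}

def integrate(F, measure):
    """Integral of F with respect to a finitely supported measure."""
    return sum(F(*z) * w for z, w in measure.items())

# Toy data, chosen only for illustration.
X = lambda n: n % 3
qs = [lambda n: n, lambda n: n * n]
F = lambda x1, x2: x1 + 2 * x2
N = 12

zeta = occupational_measure(X, qs, N)
S_N = sum(F(X(n), X(n * n)) for n in range(1, N + 1))
print(zeta)
print(integrate(F, zeta), Fraction(S_N, N))  # the two values coincide
```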
Theorem 2.2
Let \(k=1\) in (2.3) and (2.4). Then for any continuous function \(W=W(x_1,\ldots ,x_\ell )\) on \(M^\ell \) the limit
is a convex lower semicontinuous functional satisfying
where \(\mathcal{P }(\cdot )\) is the space of probability measures on a space in brackets considered with the topology of weak convergence.
Furthermore, for any closed set \(K\subset \mathcal{P }(M^\ell )\),
and for any open set \(U\subset \mathcal{P }(M^\ell )\),
Next, we exhibit continuous time versions of the above results. Here we assume that \(X(t),\, t\ge 0\) is a Markov process on a Polish measurable space \((M,\mathcal{B })\) such that for some \(t_0>0\), a constant \(C>0\) and a probability measure \(\nu \) on \(M\) the time \(t_0\) transition probability \(P(t_0,x,\cdot )\) of the above Markov process \(X\) satisfies
for any \(x\in M\) and every measurable set \(G\subset M\). Again (see [8]), (2.16) implies existence of a unique invariant measure \(\mu \) of the Markov process \(X\) which satisfies (2.2). Now we introduce positive increasing functions \(q_j,\, j=1,\ldots ,\ell \) on \(\mathbb{R }_+\) such that for some \(0<{\alpha }_1<{\alpha }_2<\cdots <{\alpha }_k\) and \(k\le \ell \),
while for \(j=k+1,\ldots ,\ell \) and any \({\gamma }>0\),
We will be interested in large deviations estimates as \(T\rightarrow \infty \) for
Theorem 2.3
Let \(W_{\lambda }(x_1,\ldots ,x_\ell ),\,{\lambda }\in (-\infty ,\infty )\) be as in Theorem 2.1. Assume that \(k=1\) in (2.17) and (2.18). Then for any \(x\in M\) the limit
exists, it is independent of \(x\) and it is differentiable in \({\lambda }\). In fact, \(Q_{cont}(W_{\lambda })=\ln r_\mathrm{cont}(W_{\lambda })\) where \(r_{ cont}(W)\) is the spectral radius of the semigroup of positive operators \(R^t_{\text{ cont}}(W)\) acting by the formula
where
Furthermore, set \(W_{\lambda }(x_1,\ldots ,x_\ell )={\lambda }F(x_1,\ldots ,x_\ell )\) and define \(J(u)=J_{cont}(u)\) by (2.8) with \(r_{cont}\) in place of \(r\). Then for any closed set \(K\subset \mathbb{R }\),
and for any open set \(U\subset \mathbb{R }\),
The second level of large deviations in the continuous time nonconventional setup deals with occupational measures
on \(M^\ell \). Now we assume that \(X(t),\, t\ge 0\) is a diffusion process on a compact Riemannian manifold \(M\) with the generator \(L\) which is a nondegenerate second order elliptic differential operator. For any probability measure \(\eta \) on \(M^\ell \) set
where the infimum is taken over all positive \(u\) from the domain of \(L\).
Theorem 2.4
Let \(k=1\) in (2.17) and (2.18). Then for any continuous function \(W=W(x_1,\ldots ,x_\ell )\) on \(M^\ell \) the limit
is a convex lower semicontinuous functional satisfying
Furthermore, for any closed set \(K\subset \mathcal{P }(M^\ell )\),
and for any open set \(U\subset \mathcal{P }(M^\ell )\),
A similar result holds true when \(X(t)\) is a nondegenerate continuous time Markov chain with a finite state space.
Next, we describe our large deviations estimates in a nonconventional averaging setup. Here we consider either a difference equation (1.4) for \(\Xi ^{\varepsilon }(n)\) in the discrete time case where \(X(n),\, n\ge 0\) is a Markov chain satisfying conditions of Theorem 2.1 or a differential equation for \(\Xi ^{\varepsilon }(t)=\Xi _x^{\varepsilon }(t)\in \mathbb{R }^d,\,t\ge 0\),
in the continuous time setup where \(X(t),\, t\ge 0\) is a Markov process satisfying conditions of Theorem 2.3. We assume that \(F(\xi ,x_1,\ldots , x_\ell )\) is bounded and Lipschitz continuous in \(\xi \). The setup of (2.30) emerges considering, for instance, a time dependent small perturbation of the oscillator equation
where the force term \(g\) depends on time in a random way \(g(x,y,t)=g(x,y, X(q_1(t)), \ldots ,X(q_\ell (t)))\). Then passing to the polar coordinates \((r,\phi )\) with \(x=r\sin ({\lambda }(t-\phi ))\) and \(\dot{x}={\lambda }r\cos ({\lambda }(t-\phi ))\) the Eq. (2.31) will be transformed into (2.30) with \(\Xi ^{\varepsilon }=(r,\phi )\). It seems reasonable that a random force may depend on versions of the same process moving with different speeds, which is what we have here.
As is well known (see, for instance, [25]), if \(F(\xi ,x_1,\ldots , x_\ell )\) is bounded and Lipschitz continuous in \(\xi \) then whenever for each \(\xi \) the (pointwise) limit
exists then for any \(T\ge 0\),
where
In the discrete time case we have to take
Almost everywhere limits of the averages above can be obtained by nonconventional pointwise ergodic theorems from [4] and [1], respectively, in rather general circumstances in the dynamical systems case and under another set of conditions existence of such limits follows from [19]. The next natural step here is to obtain large deviations estimates for the above approximation of the slow motion \(\Xi ^{\varepsilon }\) by the averaged one \(\bar{\Xi }^{\varepsilon }\).
For any \(\eta \in \mathcal{P }(M^\ell )\) set
For each absolutely continuous curve \({\gamma }_t,\, t\in [0,\mathcal{T }]\) set
where \(I(\eta )\) is given by (2.11) or \(I(\eta )=I_\mathrm{cont}(\eta )\) given by (2.25) in the discrete or continuous time cases, respectively. If \({\gamma }_t,\, t\in [0,\mathcal{T }]\) is not absolutely continuous we set \(S_{0\mathcal{T }}({\gamma }) =\infty \).
Theorem 2.5
Let \(k=1\) in (2.3) and (2.4) or in (2.17) and (2.18) and set \(\Psi ^{\varepsilon }(t)=\Xi ^{\varepsilon }([t/{\varepsilon }])\) or \(\Psi ^{\varepsilon }(t)=\Xi ^{\varepsilon }(t/{\varepsilon })\) in the discrete or continuous time cases, respectively. Then for any continuous function \(W_t(x_1,\ldots ,x_\ell )\) on \(\mathbb{R }_+\times M^\ell \),
where \(r_\mathrm{cont}\) is the same as in Theorem 2.3 with \(W_t\) considered as a function on \(M^\ell \) and in the discrete time case we either extend \(q_j(t)=q_j([t])\) to all \(t\ge 0\) in order to write the integral in exponent in (2.33) or replace this integral by the corresponding sum.
Furthermore, for any \(a,{\delta },{\lambda }>0\) and every continuous \({\gamma }_t,\, t\in [0,\mathcal{T }],\,{\gamma }_0=x\) there exists \({\varepsilon }_0>0\) such that for all positive \({\varepsilon }<{\varepsilon }_0\),
where \(\Psi _x^{\varepsilon }(0)=x,\,\rho _{0,\mathcal{T }}\) is the uniform distance and \(\Phi _{0,\mathcal{T }}^a(x)=\{{\gamma }:\,{\gamma }_0=x,\, S_{0\mathcal{T }}({\gamma })\le a\}\).
Remark 2.6
Suppose that the averaged motion \(\bar{\Xi }^{\varepsilon }\) has several attracting fixed points and limit cycles. Then similarly to [12] (Markov chains case) and [17] (dynamical systems case) we can study rare transitions of the slow motion \(\Xi ^{\varepsilon }\) between these attractors. However, in the nonconventional setup the situation is more complicated and this problem will not be dealt with in this paper.
Certain versions of Theorems 2.2–2.5 can be obtained for some classes of dynamical systems such as mixing subshifts of finite type and \(C^2\) hyperbolic and expanding transformations but in order not to interrupt probabilistic exposition here we discuss some of these results in the last Sect. 5.
In the next section we will show that the study of large deviations in our nonconventional setup can be always reduced to the case \(k=\ell \), i.e. we have to deal only with \(q_j(n)=jn,\, j=1,\ldots ,k\). So we discuss next this situation allowing any \(k\ge 1\) while assuming that \(X(n),\, n\ge 0,\,q_j\) and \(F\) are the same as in Theorem 2.1. It turns out that the treatment of the general case when \(X(0),X(1),X(2),\ldots \) is a Markov chain requires a quite complicated and technical proof whose exposition here would make this paper too long, and so it will be discussed in another paper. Thus, we will restrict ourselves here to a particular case when \(X(n),\, n\ge 0\) are independent identically distributed (i.i.d.) random variables (or vectors). Namely, we are interested in large deviations estimates for \(S_N(F)= \sum _{n=1}^NF(X(n),X(2n),\ldots ,X(kn))\) where \(X(n)\in M,\, n\ge 1\) are i.i.d. random variables (vectors) with a compact support \(M\). Let \(r_1,\ldots ,r_m\ge 2\) be all primes not exceeding \(k\). Set \(A_n=\{ a\le n:\, a\,\,\text{is relatively prime to}\,\, r_1,\ldots ,r_m\}\) and \(B_\eta (a)=\{ b\le \eta :\, b=ar_1^{d_1}r_2^{d_2}\cdots r_m^{d_m}\) for some nonnegative integers \(d_1,\ldots ,d_m\}\). Now for any bounded measurable function \(V\) on \(M^k\) we write
Observe that \(S_{N,a}(V),\, a\in A_N\) is a collection of independent random variables.
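The combinatorics behind this independence can be checked by a short computation. The sketch below assumes, since the display (2.36) is not reproduced here, that the decomposition has the form \(S_N(V)=\sum _{a\in A_N}S_{N,a}(V)\) with \(S_{N,a}(V)=\sum _{b\in B_N(a)}V(X(b),X(2b),\ldots ,X(kb))\). It verifies that the sets \(B_N(a),\, a\in A_N\) partition \(\{ 1,\ldots ,N\}\) and that the time indices \(jb,\, j\le k\) used by different \(a\) are disjoint, which is what makes the blocks independent for an i.i.d. sequence. All function names are hypothetical.

```python
def primes_up_to(k):
    """All primes r <= k (trial division is enough for small k)."""
    return [p for p in range(2, k + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

def A(n, k):
    """Integers a <= n relatively prime to every prime not exceeding k."""
    primes = primes_up_to(k)
    return [a for a in range(1, n + 1) if all(a % r for r in primes)]

def B(eta, a, k):
    """Integers b <= eta of the form a * r_1^{d_1} * ... * r_m^{d_m}."""
    primes = primes_up_to(k)
    members = {a} if a <= eta else set()
    frontier = list(members)
    while frontier:
        b = frontier.pop()
        for r in primes:
            c = b * r
            if c <= eta and c not in members:
                members.add(c)
                frontier.append(c)
    return sorted(members)

N, k = 20, 3
print(A(N, k))      # integers up to 20 coprime to 2 and 3
print(B(N, 1, k))   # integers up to 20 of the form 2^d1 * 3^d2

# Time indices j*b, j <= k, touched by the block attached to each a.
index_sets = {a: {j * b for b in B(N, a, k) for j in range(1, k + 1)}
              for a in A(N, k)}
```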
Theorem 2.7
For any continuous function \(V\) on \(M^k\) the limit
exists and the functional \(Q(V)\) is convex and lower semicontinuous. If \(V=V_{\lambda }\) depends on a parameter \({\lambda }\) and has a bounded derivative in \({\lambda }\) then \(Q(V_{\lambda })\) is also differentiable in \({\lambda }\). Thus taking \(V_{\lambda }={\lambda }F\) we obtain that also for \(k\ge 2\) in the above i.i.d. setup both upper and lower large deviations bounds (2.9) and (2.10) hold true with the rate functional \(J\) being the Fenchel-Legendre transform \(J(u)=\sup _{\lambda }({\lambda }u- Q({\lambda }F))\) of \(Q\).
In Sect. 4 we will provide a rather explicit computation of the limit (2.37). As a model application of Theorem 2.7 we can consider digits \(X(n)=X(n,{\omega }),\, n\ge 1\) of base \(M\) expansions \({\omega }=\sum _{n=1}^\infty \frac{X(n,{\omega })}{M^n},\,X(n,{\omega })\in \{ 0,1,\ldots ,M-1\}\) of numbers \({\omega }\in [0,1)\) which are i.i.d. random variables on the probability space \(([0,1),\mathcal{B },P)\) where \(\mathcal{B }\) is the Borel \({\sigma }\)-algebra and \(P\) is the Lebesgue measure. Take, for instance, \(V(x_1,\ldots ,x_k)={\delta }_{{\alpha }_1x_1} {\delta }_{{\alpha }_2x_2}\cdots {\delta }_{{\alpha }_kx_k}\) for some \({\alpha }_1,\ldots ,{\alpha }_k \in \{ 0,1,\ldots ,M-1\}\) with \({\delta }_{ij}=1\) if \(i=j\) and \({\delta }_{ij}=0\) otherwise. Then Theorem 2.7 provides large deviations estimates for the number
The same setup can be reformulated in the following way. Consider infinite sequences of letters (colors, spins, etc.) taken out of an alphabet of size \(M\). Let \(n_{{\alpha }_1,\ldots ,{\alpha }_k}(N)\) be the number of arithmetic progressions of length \(k\) with both the first term and the difference equal \(n\le N\) and having the letter (color, spin, etc.) \({\alpha }_i\) on the place \(i=1,2,\ldots ,k\). Then Theorem 2.7 yields large deviations bounds for \(n_{{\alpha }_1,\ldots ,{\alpha }_k}(N)\) as \(N\rightarrow \infty \) considered as a random variable on the space of sequences of letters with any product probability measure, in particular, with uniform probability measure which assigns the same weight to each combination of \(n\) consecutive letters (i.e. to each cylinder set of length \(n\)) for all \(n=1,2,\ldots \). We observe that another statistical physics interpretation of a particular case of the above i.i.d. setup appeared independently in a recent paper [6] though large deviations bounds were obtained there only for the case \(k=M=2\).
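The counting quantity \(n_{{\alpha }_1,\ldots ,{\alpha }_k}(N)\) is easy to compute for a given sequence of letters, and it coincides with \(S_N(V)\) for the product of Kronecker deltas \(V\) described above. The sketch below is only an illustration; the function names and the sample sequences are our own choices.

```python
def pattern_count(X, alphas, N):
    """Number of n <= N such that X(j*n) == alphas[j-1] for every j = 1..k,
    i.e. the progression n, 2n, ..., kn carries the prescribed letters."""
    return sum(
        all(X(j * n) == alpha for j, alpha in enumerate(alphas, start=1))
        for n in range(1, N + 1)
    )

def indicator_V(alphas):
    """V(x_1, ..., x_k) = product over i of delta_{alpha_i x_i}."""
    def V(*xs):
        return int(all(x == a for x, a in zip(xs, alphas)))
    return V

def S(X, V, k, N):
    """S_N(V) = sum_{n=1}^N V(X(n), X(2n), ..., X(kn))."""
    return sum(V(*(X(j * n) for j in range(1, k + 1))) for n in range(1, N + 1))

# A deterministic sequence of bits: X(n) = 1 exactly for odd n.
parity = lambda n: n % 2
print(pattern_count(parity, (1, 0), 10))   # counts the odd n <= 10

# A finite word used as a sequence of letters (positions are 1-based).
word = "abracadabra" * 3
letters = lambda n: word[n - 1]
alphas = ("a", "b", "r")
print(pattern_count(letters, alphas, 11), S(letters, indicator_V(alphas), 3, 11))
```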
3 Large deviations for Markov processes: \(k=1\) case
3.1 Reduction to the \(k=\ell \) case
First, we will show that the study of the limit (2.6) for any \(k\le \ell \) can be reduced to the case \(k=\ell \). In order to apply this result not only to Markov chains but also to other fast mixing processes, in particular to dynamical systems considered in Sect. 5, we will deal here with a somewhat more general setup.
Let \(\{ X(n),\, n=0,1,\ldots \}\) be a sequence of measurable mappings of a measurable space \(({\Omega },\mathcal{F })\) to a Polish space \(M\) considered with its Borel \({\sigma }\)-algebra \(\mathcal{B }\). Since \((M,\mathcal{B })\) is isomorphic to a Borel subset \({\Upsilon }\) of an interval we can and do identify \(M\) with \({\Upsilon }\) and assume that each \(X(n)\) is real (or vector) valued. Then \(\{ X(n),\, n=0,1,\ldots \}\) becomes a real (or vector) valued stochastic process under each probability measure on \(({\Omega },\mathcal{F })\). Our setup includes two such measures \(P\) and \(\Pi \) while we assume that \(X(n)\Pi =\mu \) does not depend on \(n\), i.e. that the one dimensional distribution \(\mu \) of \(X(n)\) on the probability space \(({\Omega },\mathcal{F },\Pi )\) is the same for all \(n\). In order to state our conditions we introduce also a family of \({\sigma }\)-algebras \(\mathcal{F }_{ml}\subset \mathcal{F },\,-\infty \le m\le l\le \infty \) satisfying \(\mathcal{F }_{-\infty ,\infty }=\mathcal{F }\) and \(\mathcal{F }_{ml}\subset \mathcal{F }_{m^{\prime }l^{\prime }}\) if \(m^{\prime }\le m\) and \(l^{\prime }\ge l\). Next, we define a modified \(\psi \)-mixing (dependence) coefficient by
where \(E_Q\) is the expectation with respect to a probability measure \(Q\) and \(\Vert \cdot \Vert _\infty \) is the \(L^\infty ({\Omega },P)\) norm. The rationale behind the introduction of two probability measures \(P\) and \(\Pi \) above is to allow \(X(n),\, n\ge 0\) to be a Markov chain with an arbitrary initial distribution (in particular, starting at a point) under \(P\) while \(X(n)\) is stationary under \(\Pi \) and the distribution of \(X(n)\) under \(P\) converges to \(\mu =X(0)\Pi \). Furthermore, we will not assume measurability of \(X(n)\)’s with respect to some of the \({\sigma }\)-algebras \(\mathcal{F }_{m,l}\) but instead will rely on approximation coefficients defined for each bounded continuous function \(V=V(x_1,\ldots ,x_\ell )\) on \(M^\ell \) by
Since \(V\) is continuous we can take here the supremum over a countable dense set in \(M^{\ell -1}\), and so outside of one \(P\)-measure zero set \({\beta }_V(n)\) gives a uniform bound of the difference above.
Proposition 3.1
Let \(V(x_1,\ldots ,x_\ell )\) be a bounded continuous function on \(M^\ell \) and assume that
together with the conditions (2.3) and (2.4) on functions \(q_j,\, j=1,\ldots ,\ell \). Then,
where for each \(m<\ell \),
If, in fact, \(X(n)\) is \(\mathcal{F }_{n,n}\)-measurable then (3.2) holds true for any bounded measurable function \(V\) assuming only that \(\psi (n)\rightarrow 0\) as \(n\rightarrow \infty \).
Proof
Observe that (2.4) yields
Set
and observe that \(d_{\gamma }(n)\rightarrow \infty \) as \(n\rightarrow \infty \) in view of (2.4) and (3.4). For any \(l=0,1,\ldots \) and \(0\le r\le \infty \) set
Next, for \(m=1,2,\ldots ,\ell ,\,a\le b\le c\) and \(0\le r\le \infty \) denote
If \(b=c\), i.e. we have only the first sum above, we set \(Z_r^{(m)}(a,b,c)= Z_r^{(m)}(a,b)\). If \(r=\infty \) we drop the index \(r\) and write just \(Z^{(m)}(a,b,c)\) or \(Z^{(m)}(a,b)\). Observe that
where \(C(V)=\sup _{(x_1,\ldots ,x_\ell )\in M^\ell }|V(x_1,\ldots ,x_\ell )|\). By the definition of \({\beta }_V(n)\) (and the remark after it) we obtain also that for any \(m\!=\!1,2,\ldots ,\ell ,\,a\!\le \! b\!\le \! c\) and \(0\!\le \! r\!\le \!\infty \),
Let \(g=g(x,y)\) be a bounded measurable function on a product \(M_1\times M_2\) (for some measurable spaces \((M_1,\mathcal{B }_1)\) and \((M_2,\mathcal{B }_2)\)) and let \(X:{\Omega }\rightarrow M_1\) and \(Y:{\Omega }\rightarrow M_2\) be \(\mathcal{F }_{-\infty ,l}\)- and \(\mathcal{F }_{l+n,\infty }\)-measurable random variables (maps), respectively. Then it follows from the definition of \(\psi (n)=\psi _{P,\Pi }(n)\) that
where \(g_\Pi (x)=E_\Pi g(x,Y)\) and \(|g|_\Pi (x)=E_\Pi |g(x,Y)|\). Now take \(r=r_{\gamma }(N)=[\frac{1}{3}d_{\gamma }(N)]\) where \([\cdot ]\) denotes the integer part. Then for all \(N\ge n\ge {\gamma }N+1,\,m=k+1,\ldots ,\ell \) and \(N\) large enough
where
By (3.7) and the definition of \({\beta }_V\) we conclude that
where \(\eta (n)=(\psi (n)+2{\beta }_V(n)+2{\beta }_V(n)\psi (n))e^{C(V)}\rightarrow 0\) as \(n\rightarrow \infty \). Employing (3.8) and (3.9) for \(n=N,N-1,\ldots ,[{\gamma }N]+1\) we obtain that
Next, we use (3.10) for \(m=\ell ,\ell -1,\ldots ,k+1\) which together with (3.5) and (3.6) yields that
Taking \(\ln \) in (3.11), dividing by \(N\), letting \(N\rightarrow \infty \) and taking into account that then \(r=r(N)\rightarrow \infty \), we obtain that
and (3.2) follows since \({\gamma }>0\) is arbitrary.
If \(X(n)\) is \(\mathcal{F }_{n,n}\)-measurable for each \(n\) then we do not have to deal with the approximation coefficient \({\beta }_V(r)\) and \(X_r=X,\, Z_r^{(m)}= Z^{(m)}\) above. Hence all above arguments remain true with \({\beta }_V(r)=0\) for any bounded measurable \(V\) and we obtain (3.2) provided \(\psi (n)\rightarrow 0\) as \(n\rightarrow \infty \). \(\square \)
It is easy to check the conditions of Proposition 3.1 for Markov chains \(X(n),\, n\ge 0\) satisfying the “strong” Doeblin condition (2.1). Indeed, denote by \(\mathcal{F }_{l,m},\, l\le m\) the \({\sigma }\)-algebra generated by \(X(l),\ldots , X(m)\) with \(\mathcal{F }_{l,\infty }\) being the minimal \({\sigma }\)-algebra containing all \(\mathcal{F }_{l,m},\, m\ge l\) and we set \(\mathcal{F }_{l,m}=\mathcal{F }_{0,m}\) for \(l<0\) and \(m\ge 0\). If \(g\) is \(\mathcal{F }_{l+n,\infty }-\)measurable then by the Markov property
where \(P_y\) is the probability measure on the path space of the Markov chain \(X(n)\) starting at \(y\). The Chapman-Kolmogorov equation says that for any \(n\ge n_0\),
and so by (2.1) for all such \(n\),
This together with the Radon-Nikodym theorem yields existence for \(\nu \)-almost all \(y\) and \(n\ge n_0\) of the transition density \(p(n,x,y)\) satisfying
It is well known (see, for instance, [8]) that (2.1) and (2.2) imply that
for some \(K,{\kappa }>0\) independent of \(n\ge n_0\). If \(\Pi \) is the stationary probability of the Markov chain on the path space then
Thus the condition (3.1) with \({\beta }_V(n)=0\) is satisfied in our Markov chains case.
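The exponentially fast mixing used above can be observed numerically for a finite Markov chain. The sketch below computes \(\sup _{x,y}|P^n(x,y)/\mu (y)-1|\), a finite-state analogue of a \(\psi \)-type mixing coefficient; this particular quantity, the two-state matrix and the function names are illustrative assumptions and not the exact expressions of the omitted displays. For this chain the second eigenvalue equals \(0.7\) and the deviation equals \(2\cdot 0.7^n\), an instance of a bound of the form \(Ke^{-{\kappa }n}\).

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def stationary(P, iterations=2000):
    """Approximate the invariant measure by iterating mu -> mu P."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iterations):
        mu = [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]
    return mu

def mixing_deviation(P, n):
    """sup over x, y of |P^n(x, y) / mu(y) - 1| for n >= 1."""
    mu = stationary(P)
    Pn = P
    for _ in range(n - 1):
        Pn = mat_mul(Pn, P)
    return max(abs(Pn[x][y] / mu[y] - 1.0)
               for x in range(len(P)) for y in range(len(P)))

P = [[0.9, 0.1],
     [0.2, 0.8]]
for n in (1, 5, 10, 20):
    print(n, mixing_deviation(P, n), 2 * 0.7 ** n)
```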
Corollary 3.2
Assume that conditions of Proposition 3.1 hold true. Suppose that for any bounded measurable function \(V_{\lambda }(x_1,\ldots ,x_k)\) on \(\mathbb{R }\times M^k\) having a bounded in \(x_1,\ldots ,x_k\) derivative in a parameter \({\lambda }\in (-\infty ,\infty )\) the limit
exists, it is a lower semicontinuous convex functional and it is differentiable in the parameter \({\lambda }\). Then for any bounded measurable function \(W_{\lambda }(x_1,\ldots ,x_\ell )\) on \(\mathbb{R }\times M^\ell \) having a bounded in \(x_1,\ldots ,x_\ell \) derivative in a parameter \({\lambda }\in (-\infty ,\infty )\) the limit
exists, it is a lower semicontinuous convex functional and it is differentiable in the parameter \({\lambda }\). In particular, the large deviations estimates in the form (2.9) and (2.10) hold true then with the rate functional \(J\) given by (2.8) with \(W_{\lambda }={\lambda }F\).
Proof
By Proposition 3.1, \(Q(W_{\lambda })=Q(W_{\lambda }^{(k)})\) and we see from (3.3) that if \(W_{\lambda }\) is bounded and has a bounded derivative in \({\lambda }\) then so does \(W_{\lambda }^{(k)}\). Hence, by the assumption \(Q(W_{\lambda }^{(k)})\) is a lower semicontinuous convex functional and it is differentiable in \({\lambda }\) which implies the same for \(Q(W_{\lambda })\) and the result follows. \(\square \)
Now let \(k=1\) and \(V=W_{\lambda }\) as in Theorem 2.1. Then \(\hat{W}_{\lambda }= V^{(1)}\) and by Proposition 3.1,
Thus we arrive at the standard limit appearing in “conventional” large deviations results which is well known for Markov chains \(X(n),\, n\ge 0\) satisfying our conditions as it is described in Theorem 2.1. Differentiability of \(Q(W_{\lambda })\) in \({\lambda }\) follows from standard results on positive operators (see, for instance, [20]) and we now derive Theorem 2.1 from well known “conventional” large deviations results (see, for instance, [9, 16] and Section 2.3 in [11]). \(\square \)
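The conventional limit can be illustrated for a finite Markov chain, where the positive operator becomes a matrix. The sketch below assumes the classical tilted matrix \(A(x,y)=P(x,y)e^{w(y)}\) for a function \(w\) of one variable (playing the role of \(\hat{W}\)); the exact form of the operator \(R(W)\) in the omitted display may differ in detail. It compares the logarithm of the Perron eigenvalue with \(\frac{1}{N}\ln E_x\exp \big (\sum _{n=1}^Nw(X(n))\big )\) computed exactly. All names and numerical values are illustrative.

```python
import math

def tilted_matrix(P, w):
    """A(x, y) = P(x, y) * exp(w(y))."""
    n = len(P)
    return [[P[x][y] * math.exp(w[y]) for y in range(n)] for x in range(n)]

def spectral_radius(A, iterations=2000):
    """Perron eigenvalue of a positive matrix by power iteration."""
    n = len(A)
    v = [1.0 / n] * n
    r = 0.0
    for _ in range(iterations):
        u = [sum(A[x][y] * v[y] for y in range(n)) for x in range(n)]
        r = sum(u)              # v sums to one, so this estimates the eigenvalue
        v = [c / r for c in u]
    return r

def log_moment(P, w, x, N):
    """(1/N) ln E_x exp(sum_{n=1}^N w(X(n))), using E_x[...] = (A^N 1)(x);
    the vector is rescaled at every step to avoid overflow."""
    A = tilted_matrix(P, w)
    n = len(P)
    g = [1.0] * n
    log_scale = 0.0
    for _ in range(N):
        g = [sum(A[a][b] * g[b] for b in range(n)) for a in range(n)]
        s = max(g)
        g = [c / s for c in g]
        log_scale += math.log(s)
    return (log_scale + math.log(g[x])) / N

P = [[0.9, 0.1],
     [0.2, 0.8]]
w = [0.3, -0.5]
Q = math.log(spectral_radius(tilted_matrix(P, w)))
print(Q, log_moment(P, w, 0, 5000), log_moment(P, w, 1, 5000))
```

The two finite-\(N\) values agree with \(\ln r\) up to an error of order \(1/N\) and do not depend on the starting point in the limit, in line with the statement of Theorem 2.1.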
3.2 2nd level of large deviations
Recall that in the setup of Theorem 2.2 we have \(k=1,\,M\) being a compact space and the result is about large deviations for occupational measures \(\zeta _N\) appearing there. Let \(W\) be a continuous function on \(M^\ell \) with \(\hat{W}\) defined by (2.5). By Proposition 3.1 together with the well known facts (see, for instance, [9, 11] and [16]),
where \(r(W)\) is the spectral radius of the operator
Observe, that by the Donsker–Varadhan variational formula (see [9] and [10]),
where \(\hat{I}(\nu )=-\inf _{u\in C_+(M)}\int \ln \frac{E_xu(X(1))}{u(x)}d\nu (x)\) and the infimum is taken over positive continuous functions on \(M\).
Next, let \(Y^{(i)}(n),\, i=2,\ldots ,\ell ;\, n=0,1,2,\ldots \) be i.i.d. \(M\)-valued random variables with the distribution \(\mu \), all of them independent of the Markov chain \(X(n),\, n\ge 0\). Then it is easy to see that
Indeed, let \(\mathcal{F }_X\) be the \(\sigma \)-algebra generated by the Markov chain \(X(n),n\ge 0\). Then
and (3.18) follows. But now we have the standard situation for the Markov chain \((X(n),Y^{(2)}(n),\ldots ,Y^{(\ell )}(n)),\, n\ge 0\), and so by the Donsker–Varadhan variational formula (see [9] and [10]),
where
It is known here (see, for instance, Proposition 5.1 in [15]) that there exists a unique \(\nu =\nu _W\) on which the supremum in (3.20) is attained and it follows from the standard theory (see, for instance, [16]) that \(I(\nu )\) is the rate functional for the second level large deviations both for the auxiliary occupational measures
and for our nonconventional occupational measures \(\zeta _N\). \(\square \)
3.3 Continuous time case
Similarly to the discrete time case, the main step in the proof of Theorem 2.3 is to establish (2.19) and to identify the limit there as the spectral radius of the semigroup (2.20).
From (2.16) it follows that for any \(t\ge t_0\) and every measurable set \(G\subset M\),
Furthermore, similarly to (3.5) (see [8]),
where \(p(y)=\frac{d\mu }{d\nu }(y)\) is the density of the unique invariant measure \(\mu \) of the Markov process \(X\). Observe that (2.18) implies also that for any \(j\ge k+1\) and \({\gamma }>0\),
Let \(V=V(x_1,\ldots ,x_\ell )\) be a bounded measurable function on \(M^\ell \) and for \(m=1,2,\ldots ,\ell \) set
with \(V^{(\ell )}_\mathrm{cont}=V\). Set \(t_n({\gamma },T)={\gamma }T+n({\gamma }+{\gamma }^2)\) for \(n=0,1,2,\ldots ,M({\gamma },T)-1\) where \(M({\gamma },T)=[T(1-{\gamma })/({\gamma }+{\gamma }^2)]\). Next, for \(a\le b\le c\) and \(m=1,2,\ldots ,\ell \) we denote
and set \(Z^{(m)}_x(a,b)=Z^{(m)}_x(a,b,b)\). Observe that \(Z_x^{(\ell )}(0,M({\gamma },T))\) does not contain the integration from 0 to \({\gamma }T\) as well as the sum of integrals from \(t_n({\gamma },T)+{\gamma }\) to \(t_n({\gamma },T)+{\gamma }+{\gamma }^2\) which are both present in the integral from 0 to \(T\), and so estimating these missing parts we arrive at the inequality
where \(C(V)=\sup _{(x_1,\ldots ,x_\ell )}|V(x_1,\ldots ,x_\ell )|\).
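The block structure behind this estimate can be made explicit. The sketch below builds the intervals \([t_n({\gamma },T),\,t_n({\gamma },T)+{\gamma }]\), \(n=0,\ldots ,M({\gamma },T)-1\), in exact rational arithmetic and computes the total length of the omitted parts of \([0,T]\). Since \(|V|\le C(V)\), that length times \(C(V)\) bounds the contribution of the omitted parts; the precise constants in the omitted display may be arranged differently. The function names are hypothetical.

```python
from fractions import Fraction
from math import floor

def time_blocks(gamma, T):
    """Intervals [t_n, t_n + gamma] with t_n = gamma*T + n*(gamma + gamma^2),
    n = 0, ..., M - 1, where M = floor(T*(1 - gamma)/(gamma + gamma^2))."""
    gamma, T = Fraction(gamma), Fraction(T)
    step = gamma + gamma ** 2
    M = floor(T * (1 - gamma) / step)
    return [(gamma * T + n * step, gamma * T + n * step + gamma)
            for n in range(M)]

def omitted_length(gamma, T):
    """Length of the part of [0, T] not covered by the blocks."""
    return Fraction(T) - sum(b - a for a, b in time_blocks(gamma, T))

gamma, T = Fraction(1, 10), Fraction(100)
blocks = time_blocks(gamma, T)
M = len(blocks)
# Initial segment, gaps of length gamma^2, and at most one leftover step.
bound = gamma * T + M * gamma ** 2 + (gamma + gamma ** 2)
print(M, float(omitted_length(gamma, T)), float(bound))
```

For small \({\gamma }\) the omitted fraction of \([0,T]\) is of order \({\gamma }\), which is why the error disappears when first \(T\rightarrow \infty \) and then \({\gamma }\rightarrow 0\).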
Denote by \(\mathcal{F }_t\) the \({\sigma }\)-algebra generated by \(X(s),\, s\,{\le }\, t\). Then by (2.18) and (3.24) for all \(T\) large enough if \(n\ge 1,\, T\ge t\ge t_n({\gamma },T)\) and \(s\le t_{n-1}({\gamma },T)+{\gamma }\) then \(X(q_1(s)),\ldots , X(q_m(s))\) and \(X(q_1(t)),\ldots ,X(q_{m-1}(t))\) are \(\mathcal{F }_{q_m(t_n({\gamma },T) -{\gamma }^2)}-\)measurable. Hence,
where
Let
Since \(|e^{\alpha }-1-{\alpha }|\le {\alpha }^2\) if \(|{\alpha }|\le 1\) then
On the other hand, by the Markov property
Set
and observe that \(d_{\gamma }(t)\rightarrow \infty \) as \(t\rightarrow \infty \) for each fixed \({\gamma }>0\) in view of the assumption (2.18). Now, by (3.23) and (3.28),
Employing (3.26)–(3.29) for \(n=M({\gamma },T),\, M({\gamma },T)-1,\ldots ,1\) with each \(m=\ell ,\ell -1,\ldots ,k+1\) we obtain that
Now taking \(\ln \) in (3.25) and letting first \(T\rightarrow \infty \) and then \({\gamma }\rightarrow 0\) we obtain from (3.30) and the definition of \(Z_x^{(k)}\) that
If \(k=1\) then \(\frac{1}{T}\) of the second expression in brackets in (3.31) converges as \(T\rightarrow \infty \) to the logarithm of the spectral radius of the semigroup of operators \(R^t_\mathrm{cont}(V)\) defined in (2.20). Thus, the assertions of Theorems 2.3 and 2.4 follow from the well known results on large deviations (see [9, 10, 16] and [11]) in the same way as in the discrete time case. \(\square \)
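The identification of an exponential growth rate with the logarithm of a spectral radius can be seen in the simplest conventional situation. The Python sketch below is a toy analogue and not the operator from (2.20): we assume a finite state Markov chain in discrete time with a hypothetical transition matrix `P` and a function `V` of one variable, and we use the standard fact that \(E_x\exp \big (\sum _{n=1}^NV(X(n))\big )=(A^N\mathbf{1})(x)\) for the tilted matrix \(A(x,y)=P(x,y)e^{V(y)}\).

```python
import math

def tilted_matrix(P, V):
    """A[x][y] = P[x][y] * exp(V[y]): one step of the chain weighted by exp(V)."""
    n = len(P)
    return [[P[x][y] * math.exp(V[y]) for y in range(n)] for x in range(n)]

def log_growth_rate(A, N):
    """(1/N) * ln (A^N 1)(0), with renormalisation at each step to avoid overflow."""
    n = len(A)
    v = [1.0] * n
    log_scale = 0.0
    for _ in range(N):
        v = [sum(A[x][y] * v[y] for y in range(n)) for x in range(n)]
        s = max(v)
        v = [c / s for c in v]
        log_scale += math.log(s)
    return (log_scale + math.log(v[0])) / N

def spectral_radius_2x2(A):
    """Largest eigenvalue of a 2x2 matrix with positive entries (closed form)."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0

# Hypothetical two-state example (values chosen only for illustration).
P = [[0.9, 0.1], [0.2, 0.8]]
V = [0.5, -0.3]
A = tilted_matrix(P, V)
print(log_growth_rate(A, 2000), math.log(spectral_radius_2x2(A)))
```

The two printed numbers agree up to an error of order \(1/N\), which is the finite-dimensional counterpart of the convergence used above.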
3.4 Nonconventional averaging
According to [12] the large deviations estimates (2.34) and (2.35) follow once we establish (2.33) for all continuous functions \(W_t(x_1,\ldots ,x_\ell )\) on \(\mathbb{R }_+\times M^\ell \). First, we claim that even without the assumption \(k=1\),
where in the discrete time case \(q_j\)’s are extended to all \(s\ge 0\) by writing \(q_j(s)=q_j([s])\) and we set
while in the continuous time case we set
The proof of (3.32) is the same as the proofs of (3.1) in the discrete time case and of (3.31) in the continuous time case while the dependence of \(W_t\) on \(t\) does not play any role in the arguments employed there.
Next, when \(k=1\) we arrive at the “conventional” setup and (2.33) follows in the same way as in [12] (see also [17]). \(\square \)
4 Large deviations for any \(k\ge 1\): i.i.d. case
Here we assume that \(X(n),\, n\ge 1\) are i.i.d. random variables (vectors) and rely on the decomposition (2.36). In view of the independence of \(S_{N,a}(V)\) for different \(a\in A_N\) we can write
where
with \(A_N\) and \(B_\eta (a)\) defined in Sect. 2.
In order to study \(Z_{N,a}(V)\) we introduce also
Observe that each \(l=1,2,\ldots ,k\) can be written uniquely in the form \(l=r_1^{d_1(l)} r_2^{d_2(l)}\cdots r_m^{d_m(l)}\) for some nonnegative integers \(d_1(l),\ldots ,d_m(l)\). Now, if \(b=ar_1^{d_1}\cdots r_m^{d_m}\in B(a)\) and \(l=1,2,\ldots ,k\) then \(lb=ar_1^{d_1+d_1(l)}\cdots r_m^{d_m+d_m(l)}\in B(a)\). Next, consider the lattice \(\mathbb{Z }^m\) and set
Then the formula \({\varphi }_a(n_1,\ldots ,n_m)=ar_1^{n_1}\cdots r_m^{n_m}\) provides a one-to-one correspondence
where, recall, \(a\) is relatively prime with \(r_1,\ldots ,r_m\). Set
Then, clearly,
It follows that
where \(|{\Gamma }|\) denotes the cardinality of a set \({\Gamma }\). Hence
Next, we claim that \(Z_{N,a}(V)\) is determined only by \(|B_N(a)|\) and not by \(N\) and \(a\) themselves. Indeed, since \(|D(\rho )|\) is nondecreasing in \(\rho \), it determines the set \(D(\rho )\) itself, and so \(|D(\ln (N/a))|=|B_N(a)| =|B_{N/a}(1)|\) determines the set \(B_{N/a}(1)\) in view of (4.2). Set \(\hat{B}_\eta (a)=B_\eta (a)\cup \{ n:\,n=ln^{\prime }\) for some \(n^{\prime }\in B_\eta (a)\) and \(l=2,3,\ldots ,k\}\). Then we can write
It is easy to see from here that \(Z_{\eta ,a}(V)=Z_{\eta /a,1}(V)\) for any \(\eta >0\) and an integer \(a\ge 2\) relatively prime with \(r_1,\ldots ,r_m\). Indeed, \(Z_{\eta ,a}(V)\) is determined by the labeled directed graph \({\Gamma }_\eta (a)\) having \(B_\eta (a)\) as its vertices and having arrows of \(k-1\) types so that an arrow with a label \(l=2,3,\ldots ,k\) is drawn from \(n\in B_\eta (a)\) to \(n^{\prime }\in B_\eta (a)\) if \(n^{\prime }=ln\). Clearly, the graphs \({\Gamma }_\eta (a)\) and \({\Gamma }_{\eta /a}(1)\) are isomorphic in the sense that there exists a one-to-one map \({\varphi }:\, B_\eta (a)\rightarrow B_{\eta /a}(1)\) such that if \(n,n^{\prime }\in B_\eta (a)\) and \(n^{\prime }=ln\) then \({\varphi }n,{\varphi }n^{\prime }\in B_{\eta /a}(1)\) and \({\varphi }n^{\prime }=l{\varphi }n\). Since \(X(n),\, n\ge 1\) are i.i.d., \(Z_{\eta ,a}(V)\) is determined, in fact, by the isomorphism class of \({\Gamma }_\eta (a)\) and not by \({\Gamma }_\eta (a)\) itself, and so \(Z_{\eta ,a}(V)=Z_{\eta /a,1}(V)\). Since \(|B_N(a)|\) determines the set \(B_{N/a}(1)\) we conclude that it determines \(Z_{N,a}(V)\), as well, proving the claim.
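The combinatorial claim can be checked on small examples. The Python sketch below rests on our reading of the definitions from Sect. 2, which are not reproduced here: we assume that \(r_1,\ldots ,r_m\) are the primes not exceeding \(k\) and that \(B_N(a)\) consists of all \(b\le N\) of the form \(b=ar_1^{d_1}\cdots r_m^{d_m}\). The helper names are ours.

```python
def primes_up_to(k):
    """Primes r_1 < ... < r_m not exceeding k (trial division)."""
    return [p for p in range(2, k + 1)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def strip_primes(n, primes):
    """Divide out all factors from `primes`; for n in B(a) the result is a."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n

def B(N, a, primes):
    """B_N(a): all b <= N of the form a * r_1^{d_1} * ... * r_m^{d_m}."""
    return sorted(b for b in range(a, int(N) + 1, a)
                  if strip_primes(b, primes) == a)

def arrows(vertices, k):
    """Labeled arrows (n, l, l*n), l = 2..k, with both ends in the vertex set."""
    vs = set(vertices)
    return {(n, l, l * n) for n in vs for l in range(2, k + 1) if l * n in vs}

k, N, a = 4, 100, 5
primes = primes_up_to(k)          # [2, 3]
left = B(N, a, primes)            # vertices of Gamma_N(a)
right = B(N / a, 1, primes)       # vertices of Gamma_{N/a}(1)
# The map n -> n / a sends one labeled graph onto the other.
mapped = {(n // a, l, m // a) for (n, l, m) in arrows(left, k)}
print(left, right, mapped == arrows(right, k))
```

In this example \(B_{N/a}(1)\) is just the list of the first \(|B_N(a)|\) integers whose prime factors all lie among \(r_1,\ldots ,r_m\), which is why the cardinality alone determines the set.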
Let \(l=|B_N(a)|\) and set \(R_l(V)=Z_{N,a}(V)\) since the latter depends only on \(l\) (and, of course, on \(V\)). Observe that
where \(C(V)=\sup _{x_1,\ldots ,x_k\in M}|V(x_1,\ldots ,x_k)|\). Set \(A^{(l)}_N= \{ a\in A_N:\, |B_N(a)|=l\}\). By (4.4),
Observe that \(|D(\rho )|\) is a nondecreasing right continuous piecewise constant function and, since \(r_1,r_2,\ldots ,r_m\) are primes, the jumps of \(|D(\rho )|\) can only be of size 1, i.e. for all \(\tilde{\rho }>0\),
It follows that
are well defined for each integer \(l\ge 1\) and \(\rho _{\min }(l)<\rho _{\max }(l)\). Denote \(\hat{A}_N^{(l)}=\{ a\in \mathbb{N }:\, Ne^{-\rho _{\max }(l)}\le a\le Ne^{-\rho _{\min }(l)},\, a\,\) is relatively prime with \(\,r_1,r_2,\ldots ,r_m\}\). Then by (4.2) and the above,
We will show next that the limit
exists with
Indeed, for each integer \(n\ge 1\) set \(G(n)=\{ in:\, i\in \mathbb{Z }_+\}\) and \(G^{(l)}_N(n)=\{ j\in G(n):\, Ne^{-\rho _{\max }(l)}\le j\le Ne^{-\rho _{\min }(l)}\}\). Then (by the inclusion-exclusion principle),
Since each \(G(n)\) is an arithmetic progression with the difference \(n\) we obtain that
and (4.8)–(4.9) follow from (4.10)–(4.11).
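The inclusion-exclusion count used in this argument is easy to test directly. The Python sketch below is an illustration with our own function names; as a simplification it takes integer endpoints `lo` and `hi` in place of \(Ne^{-\rho _{\max }(l)}\) and \(Ne^{-\rho _{\min }(l)}\). It counts the integers in an interval that are relatively prime with given primes by summing, with alternating signs, the sizes of the arithmetic progressions \(G(n)\) restricted to the interval.

```python
from itertools import combinations
from math import prod

def count_multiples(n, lo, hi):
    """Number of integers j with lo <= j <= hi divisible by n (lo >= 1)."""
    return hi // n - (lo - 1) // n

def count_coprime(primes, lo, hi):
    """Inclusion-exclusion count of j in [lo, hi] divisible by none of the primes."""
    total = 0
    for size in range(len(primes) + 1):
        for subset in combinations(primes, size):
            # prod(()) == 1, so the empty subset counts all integers in [lo, hi].
            total += (-1) ** size * count_multiples(prod(subset), lo, hi)
    return total

def density(primes):
    """Asymptotic density prod(1 - 1/r_i) of integers coprime to all the primes."""
    return prod(1 - 1 / p for p in primes)

primes = [2, 3, 5]
lo, hi = 101, 1000
print(count_coprime(primes, lo, hi), density(primes) * (hi - lo + 1))
```

Each progression contributes an error of at most 1 when its cardinality is replaced by the interval length divided by \(n\), so the count differs from the density times the interval length by at most \(2^m\), which disappears after division by \(N\).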
Observe that by (4.2), (4.3) and the definition of \(\rho _{\min }\) and \(\rho _{\max }\),
and so we obtain from (4.1) and (4.5)–(4.9) that
while the last series converges absolutely in view of (4.5) and (4.12). Furthermore, if \(V=V_{\lambda }\) depends on a parameter \({\lambda }\) in a differentiable way with a derivative bounded by \(\tilde{C}\) then each \(\ln R_l(V_{\lambda })\) is also differentiable in \({\lambda }\) with a derivative bounded by \(\tilde{C}l\). Hence, in this case we can differentiate in \({\lambda }\) the series in the right hand side of (4.13) and the assertion of Theorem 2.7 follows. \(\square \)
Remark 4.1
Arguments of the present section yield also moderate deviations estimates for sums \(S_N(V)\) given by (2.36) in the above i.i.d. setup. Namely, let \(\bar{V}=\int V(x_1,x_2,\ldots ,x_k)d\mu (x_1)d\mu (x_2)\cdots d\mu (x_k)\), where \(\mu \) is the probability distribution of \(X(1)\), and observe that \(\bar{V}=EV(X(n),X(2n),\ldots ,X(kn))\) for any \(n\ge 1\). Then for any \({\kappa }\in (0,\frac{1}{2})\),
for any closed set \(K\subset \mathbb{R }\) and
for any open set \(U\subset \mathbb{R }\) provided that for any \({\lambda }\in \mathbb{R }\),
(cf. [12] and [11]). In order to compute the limit (4.16) we observe relying on the same arguments as above that \({\upsilon }_l(V)= E(S_{N,a}(V-\bar{V}))^2\) depends only on \(l=|B_N(a)|\) and on \(V\) where, recall, \(S_{N,a}\) was defined in (2.36). It follows that
provided \(|B_N(a)|=l\). Then in the same way as in (4.13),
and (4.16) follows under a nondegeneracy condition \({\upsilon }_l(V)\ne 0\) whenever \(A_N^{(l)}\ne \emptyset \).
5 Nonconventional large deviations for dynamical systems
In this section we discuss nonconventional large deviations results in the dynamical systems case, and a reader who is not familiar with hyperbolic dynamical systems and is interested only in the probabilistic setup may skip this section altogether. We assume now that \(T:\, M\rightarrow M\) is either a subshift of finite type or a \(C^2\) expanding endomorphism or a hyperbolic diffeomorphism on a compact Riemannian manifold (see [3] and [21]). By the latter we mean a \(C^2\) Anosov diffeomorphism or, more generally, a \(C^2\) diffeomorphism defined in a neighborhood of a hyperbolic attractor. We identify now the probability space \(({\Omega },\mathcal{F },P)\) with \((M,\mathcal{B },\mu )\) where \(\mathcal{B }\) is the Borel \({\sigma }\)-algebra on \(M\) and \(\mu \) is a Gibbs \(T\)-invariant measure constructed by a Hölder continuous potential \(g\) (see [3] and [21]). Let \(k=1\) in (2.3), (2.4) and (2.17), (2.18).
Theorem 5.1
Let \(X(n)=X(n,{\omega })=X(n,x)=f(T^nx),\, n\ge 0\), where \(f\) is a Hölder continuous (vector) function, and take the \(q_j\)’s as in Theorem 2.1. Let \(k=1\). Then for any \(W_{\lambda }=W_{\lambda }(x_1,\ldots ,x_\ell )\) continuous in \(x_1,\ldots ,x_\ell \),
with \(\hat{W}\) defined by (2.5), \(g\) being the potential of \(\mu \) and \(\mathfrak{P }(\cdot )\) being the topological pressure of a function in brackets for the transformation \(T\) (see [3] and [21]). If the derivative \(dW_{\lambda }/d{\lambda }\) exists and is bounded in \(x_1,\ldots ,x_\ell \) for each \({\lambda }\) then \(Q(W_{\lambda })\) is differentiable in \({\lambda }\), as well. In the expanding and hyperbolic cases the limit in (5.1) remains the same if we integrate in (5.1) either with respect to the normalized Riemannian volume or with respect to the Sinai-Ruelle-Bowen (SRB) measure \(\mu ={\mu ^{\mathrm{\tiny {SRB}}}}\) which is the Gibbs measure corresponding to the potential \(g=-\ln {\varphi }\) where \({\varphi }\) is the Jacobian of the differential \(DT\) restricted to unstable leaves (see [3] and [21]). The large deviations estimates (2.9) and (2.10) hold true with the rate functional \(J\) given by (2.8) with \(Q(W_{\lambda })\) for \(W_{\lambda }={\lambda }F\) given by (5.1) in place of \(r(W_{\lambda })\) in (2.8).
Proof
If \(T\) is a \(C^2\) Axiom A diffeomorphism (in particular, Anosov) in a neighborhood of an attractor or an expanding \(C^2\) endomorphism of a Riemannian manifold \(M\) (see [3]), let \(\zeta \) be a finite Markov partition for \(T\). Then we can take \(\mathcal{F }_{kl}\) to be the finite \({\sigma }\)-algebra generated by the partition \(\cap _{i=k}^lT^i\zeta \). Another case for the above theorem is when \(T\) is a topologically mixing subshift of finite type, i.e. \(T\) is the left shift on a subspace \(\Xi \) of the space of one-sided sequences \({\varsigma }=({\varsigma }_i,i\ge 0), {\varsigma }_i=1,\ldots ,l_0\) such that \({\varsigma }\in \Xi \) if \(\xi _{{\varsigma }_i{\varsigma }_{i+1}}=1\) for all \(i\ge 0\) where \(\Xi =(\xi _{ij})\) is an \(l_0\times l_0\) matrix with \(0\) and \(1\) entries and such that \(\Xi ^n\) for some \(n\) is a matrix with positive entries. Again, we take \(\mu \) to be a Gibbs invariant measure corresponding to some Hölder continuous function and define \(\mathcal{F }_{kl}\) as the finite \({\sigma }\)-algebra generated by cylinder sets with fixed coordinates having numbers from \(k\) to \(l\). The exponentially fast \(\psi \)-mixing is well known in the above cases (see [3]). In fact, convergence to zero of the modified \(\psi \)-mixing coefficient \(\psi _{P,\Pi }(n)\) holds true, as well, in the hyperbolic and expanding cases when \(P\) is the normalized Riemannian volume and \(\Pi ={\mu ^{\mathrm{\tiny {SRB}}}}\).
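The mixing condition on the transition matrix of a subshift of finite type can be verified mechanically. The Python sketch below is an illustration with our own function names: it searches for the smallest \(n\) such that \(\Xi ^n\) has only positive entries, stopping at Wielandt's bound \((l_0-1)^2+1\), and counts admissible words of a given length. Symbols are indexed from 0 in the code rather than from 1 as in the text.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def first_positive_power(Xi):
    """Smallest n with all entries of Xi^n positive, or None if there is none.

    For a primitive l0 x l0 matrix such an n never exceeds (l0 - 1)^2 + 1
    (Wielandt's bound), so the search can stop there.
    """
    l0 = len(Xi)
    power = Xi
    for n in range(1, (l0 - 1) ** 2 + 2):
        if all(entry > 0 for row in power for entry in row):
            return n
        # Only the zero pattern matters, so entries are clipped to {0, 1}.
        power = [[min(1, e) for e in row] for row in mat_mul(power, Xi)]
    return None

def count_words(Xi, length):
    """Number of admissible words of the given length (sum of entries of Xi^(length-1))."""
    n = len(Xi)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(length - 1):
        power = mat_mul(power, Xi)
    return sum(sum(row) for row in power)

golden_mean = [[1, 1], [1, 0]]   # the word "11" (in 0-based symbols) is forbidden
flip = [[0, 1], [1, 0]]          # period 2, hence not mixing
print(first_positive_power(golden_mean), first_positive_power(flip))
print([count_words(golden_mean, n) for n in range(1, 6)])
```

For the golden mean shift the word counts are Fibonacci numbers, so they grow like powers of the spectral radius of \(\Xi \), whose logarithm is the topological pressure of the zero potential.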
If the function \(W_{\lambda }=W_{\lambda }(x_1,\ldots ,x_\ell )\) is continuous in \(x_1,\ldots ,x_\ell \) then \({\beta }_{W_{\lambda }}(n)\) from Proposition 3.1 tends to zero as \(n\rightarrow \infty \), and so the condition (3.1) will be satisfied here. It follows from [16] that
and Theorem 5.1 follows from Proposition 3.1 and Corollary 3.2 considered with \(k=1\) since in our circumstances differentiability of the topological pressure in parameters of the potential is well known (see, for instance, [24] and [23]).
Remark 5.2
(i) A version of Theorem 2.2 can also be obtained in the present dynamical systems setup where the limit \(Q(W)=\mathfrak{P }(\ln \hat{W}(x)+g)\) is obtained in the same way as in Theorem 5.1. Since \(\mathfrak{P }(q)\) is Gateaux differentiable at any Hölder continuous \(q\) (see [26] and [23]), \(Q(W)\) is also Gateaux differentiable at any Hölder continuous \(W\) and the large deviations for occupational measures
$$\begin{aligned} \zeta _N=\zeta _{N,x}=\frac{1}{N}\sum _{n=1}^N{\delta }_{\big (T^{q_1(n)}x,\ldots , T^{q_\ell (n)}x\big )} \end{aligned}$$
follow from Section 4.5.3 in [11] with a rate function which is the Fenchel–Legendre transform of \(Q\).
(ii) Theorem 2.6 provides a direct application to the dynamical systems case when \(T\) is a full shift (on a finite alphabet sequence space) considered with a Bernoulli invariant measure, taking \(X(n)=f\circ T^n\) with a function \(f\) on the sequence space depending only on the zeroth coordinate. Nonconventional large deviations when \(k>1\) for more general cases (e.g. subshifts of finite type with Gibbs invariant measures, hyperbolic and expanding transformations etc.) require more elaborate techniques and they will not be treated in this paper.
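The occupational measures \(\zeta _N\) are easy to simulate for a full shift with a Bernoulli measure. The Python sketch below is an illustration under assumptions of our own: the alphabet is \(\{0,1\}\), the point \(x\) is a pseudo-random sequence with a hypothetical parameter `p`, the measure is projected to the zeroth coordinates of \(T^{q_j(n)}x\), and we take \(\ell =2\), \(q_1(n)=n\), \(q_2(n)=n^2\). The function name `occupational_measure` is ours.

```python
import random
from collections import Counter

def occupational_measure(x, qs, N):
    """Empirical distribution of (x[q_1(n)], ..., x[q_l(n)]) over n = 1..N.

    Since T is the left shift, x[q(n)] is the zeroth coordinate of T^{q(n)} x.
    """
    counts = Counter(tuple(x[q(n)] for q in qs) for n in range(1, N + 1))
    return {pattern: c / N for pattern, c in counts.items()}

rng = random.Random(0)   # fixed seed so that the run is reproducible
N = 1000
p = 0.3                  # hypothetical probability of the symbol 1
x = [1 if rng.random() < p else 0 for _ in range(N * N + 1)]

zeta = occupational_measure(x, [lambda n: n, lambda n: n * n], N)
product = {(a, b): (p if a else 1 - p) * (p if b else 1 - p)
           for a in (0, 1) for b in (0, 1)}
for pattern in sorted(product):
    print(pattern, zeta.get(pattern, 0.0), product[pattern])
```

For large \(N\) the empirical frequencies are close to the product measure, and the large deviations results describe the exponentially small probability of seeing a noticeably different \(\zeta _N\).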
References
[1] Assani, I.: Multiple recurrence and almost sure convergence for weakly mixing dynamical systems. Israel J. Math. 103, 111–124 (1998)
[2] Bergelson, V.: Weakly mixing PET. Ergod. Th. Dynam. Sys. 7, 337–349 (1987)
[3] Bowen, R.: Equilibrium states and the ergodic theory of Anosov diffeomorphisms. In: Lecture Notes in Math. 470, Springer, Berlin, 2nd ed. (2008)
[4] Bergelson, V., Leibman, A., Moreira, C.G.: From discrete- to continuous-time ergodic theorems. Ergod. Th. Dynam. Sys. (2013)
[5] Contreras, G.: Regularity of topological and metric entropy of hyperbolic flows. Math. Z. 210, 97–111 (1992)
[6] Carinci, G., Chazottes, J.R., Giardina, C., Redig, F.: Nonconventional averages along arithmetic progressions and lattice spin systems. Indag. Math. 23, 589–602 (2012)
[7] Dolgopyat, D.: Limit theorems for partially hyperbolic systems. Trans. Am. Math. Soc. 356, 1637–1689 (2003)
[8] Doob, J.: Stochastic Processes. Wiley, New York (1953)
[9] Donsker, M.D., Varadhan, S.R.S.: Asymptotic evaluation of certain Markov processes expectations for large time. I. Comm. Pure Appl. Math. 28, 1–47 (1975)
[10] Donsker, M.D., Varadhan, S.R.S.: On the variational formula for the principal eigenvalue for operators with maximum principle. Proc. Natl. Acad. Sci. USA 72, 780–783 (1975)
[11] Dembo, A., Zeitouni, O.: Large deviations techniques and applications, 2nd edn. Springer, Heidelberg (1998)
[12] Freidlin, M.I.: The averaging principle and theorems on large deviations. Russ. Math. Surv. 33(5), 107–160 (1978)
[13] Furstenberg, H.: Nonconventional ergodic averages. Proc. Symp. Pure Math. 50, 43–56 (1990)
[14] Ibragimov, I.A., Linnik, Yu.V.: Independent and stationary sequences of random variables. Wolters-Noordhoff, Groningen (1971)
[15] Kifer, Yu.: Principal eigenvalues, topological pressure, and stochastic stability of equilibrium states. Israel J. Math. 70, 1–47 (1990)
[16] Kifer, Yu.: Large deviations in dynamical systems and stochastic processes. Trans. Am. Math. Soc. 321, 505–524 (1990)
[17] Kifer, Yu.: Averaging in dynamical systems and large deviations. Invent. Math. 110, 337–370 (1992)
[18] Kifer, Yu.: Nonconventional limit theorems. Prob. Th. Rel. Fields 148, 71–106 (2010)
[19] Kifer, Yu.: A nonconventional strong law of large numbers and fractal dimensions of some multiple recurrence sets. Stoch. Dynam. 12, 1150023 (2012)
[20] Krasnoselskii, M.A.: Positive solutions of operator equations. Noordhoff, Groningen (1964)
[21] Katok, A., Hasselblatt, B.: Introduction to the modern theory of dynamical systems. Cambridge Univ. Press, Cambridge (1995)
[22] Kifer, Yu., Varadhan, S.R.S.: Nonconventional limit theorems in discrete and continuous time via martingales. Ann. Probab. (2013)
[23] Parry, W., Pollicott, M.: Zeta functions and the periodic structure of hyperbolic dynamics. Astérisque 187–188 (Soc. Math. de France) (1990)
[24] Ruelle, D.: Thermodynamic formalism. Addison-Wesley, Reading (1978)
[25] Sanders, J.A., Verhulst, F., Murdock, J.: Averaging methods in nonlinear dynamical systems, 2nd edn. Springer, New York (2007)
[26] Walters, P.: Differentiability properties of the pressure of a continuous transformation on a compact metric space. J. London Math. Soc. (2) 46, 471–481 (1992)
Additional information
Yu. Kifer was supported by ISF grants 130/06 and 82/10 and S. R. S. Varadhan was supported by NSF grants OISE 0730136 and DMS 0904701.
Kifer, Y., Varadhan, S.R.S. Nonconventional large deviations theorems. Probab. Theory Relat. Fields 158, 197–224 (2014). https://doi.org/10.1007/s00440-013-0481-4
DOI: https://doi.org/10.1007/s00440-013-0481-4
Keywords
- Large deviations
- Markov processes
- Nonconventional averages
- Hyperbolic diffeomorphisms
Mathematics Subject Classification (2000)
- Primary 60F10
- Secondary 60J05
- 60J25
- 37D20