Cramér’s Theorem is Atypical

Conference paper
Part of the Association for Women in Mathematics Series book series (AWMS, volume 6)

Abstract

The empirical mean of n independent and identically distributed (i.i.d.) random variables \((X_1,\dots ,X_n)\) can be viewed as a suitably normalized scalar projection of the n-dimensional random vector \(X^{(n)}\displaystyle \mathop {=}^{\cdot }\,(X_1,\dots ,X_n)\) in the direction of the unit vector \(n^{-1/2}(1,1,\dots ,1) \in \mathbb {S}^{n-1}\). The large deviation principle (LDP) for such projections as \(n\rightarrow \infty \) is given by the classical Cramér’s theorem. We prove an LDP for the sequence of normalized scalar projections of \(X^{(n)}\) in the direction of a generic unit vector \(\theta ^{(n)}\in \mathbb {S}^{n-1}\), as \(n\rightarrow \infty \). This LDP holds under fairly general conditions on the distribution of \(X_1\), and for “almost every” sequence of directions \((\theta ^{(n)})_{n\in \mathbb {N}}\). The associated rate function is “universal” in the sense that it does not depend on the particular sequence of directions. Moreover, under mild additional conditions on the law of \(X_1\), we show that the universal rate function differs from the Cramér rate function, thus showing that the sequence of directions \(n^{-1/2}(1,1,\dots ,1) \in \mathbb {S}^{n-1},\)\(n \in \mathbb {N}\), corresponding to Cramér’s theorem is atypical.

Keywords

Large deviations, Projections, High-dimensional product measures, Cramér’s theorem, Rate function

Mathematics Subject Classification

60F10 (primary), 60D05 (secondary)

1 Introduction

Let \(X^{(n)} = (X_1,\dots , X_n)\) be a sequence of n independent and identically distributed (i.i.d.) \(\mathbb {R}\)-valued random variables with common distribution \(\gamma \in \mathcal {P}(\mathbb {R})\), the space of probability measures on \({\mathbb {R}}\). A fundamental probabilistic question is how the empirical mean of \(X^{(n)}\) behaves as the length of the sequence n increases. From a geometric perspective, the empirical mean is a suitably normalized version of (the scalar component of) the projection of the n-dimensional vector \(X^{(n)}\) in the direction of the unit vector \(\iota ^{(n)}\), defined by
$$\begin{aligned} \iota ^{(n)} \displaystyle \mathop {=}^{\cdot }\frac{1}{\sqrt{n}}(1,1,\dots ,1) \in \mathbb {S}^{n-1}, \quad n\in \mathbb {N}. \end{aligned}$$
(1)
In other words, we can write
$$\begin{aligned} W_\iota ^{(n)}\displaystyle \mathop {=}^{\cdot }\frac{1}{\sqrt{n}} \langle X^{(n)}, \iota ^{(n)}\rangle _n = \frac{1}{n}\sum _{i=1}^n X_i \, , \end{aligned}$$
(2)
where \(\langle \cdot , \cdot \rangle _n\) denotes the Euclidean inner product. With some abuse of terminology, for \(x\in \mathbb {R}^n\) and \(v\in \mathbb {S}^{n-1}\), we hereby write the “projection of x in the direction v” to refer to the scalar component \(\langle x,v\rangle _n \in \mathbb {R}\) (rather than the vector \(\langle x,v\rangle _nv\in \mathbb {R}^n\)). Then, the expression (2) indicates that questions on the empirical mean for large n can be rephrased in a geometric language as questions on suitably normalized projections of high-dimensional random vectors.
The classical Cramér’s theorem characterizes the large deviations behavior of (2), the empirical mean of i.i.d. random variables, as \(n\rightarrow \infty \). In particular, if \(X_1\sim \gamma \) has some finite exponential moments, in the sense that
$$\begin{aligned} \exists \, t_0>0 \text { s.t. } \forall \, |t|< t_0, \quad \varLambda (t) \displaystyle \mathop {=}^{\cdot }\log \mathbb {E}[e^{tX_1}] < \infty , \end{aligned}$$
(3)
then, for every \(x > \mathbb {E}[X_1]\), we have the limit
$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n}\log \mathbb {P}(W_\iota ^{(n)}\ge x) = -\varLambda ^*(x), \end{aligned}$$
where \(^*\) denotes the Legendre transform,
$$\begin{aligned} \varLambda ^*(x) \displaystyle \mathop {=}^{\cdot }\sup _{t\in \mathbb {R}} \{ tx - \varLambda (t)\}. \end{aligned}$$
(4)
We refer to [17, Sect. 12] for a review of the Legendre transform (also known as the convex conjugate).
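As a quick numerical illustration of (4), the sketch below approximates the Legendre transform by maximizing \(tx-\varLambda (t)\) over a finite grid of t values. It assumes NumPy; the grid, the helper names, and the two test laws (the standard Gaussian, with \(\varLambda (t)=t^2/2\), and the symmetric \(\pm 1\) (Rademacher) law, with \(\varLambda (t)=\log \cosh t\)) are illustrative choices that are not taken from the text. The grid maximum is only a lower bound for the supremum and is accurate when the maximizer lies well inside the grid.

```python
import numpy as np

def legendre_transform(log_mgf, x, t_grid):
    """Approximate Lambda^*(x) = sup_t { t x - Lambda(t) } by a maximum over t_grid.

    This is a lower bound for the true supremum; it is accurate when the
    maximizing t lies well inside the grid and the grid is fine.
    """
    return float(np.max(t_grid * x - log_mgf(t_grid)))

def gaussian_log_mgf(t):
    # Standard Gaussian: Lambda(t) = t^2 / 2, hence Lambda^*(x) = x^2 / 2.
    return 0.5 * t**2

def rademacher_log_mgf(t):
    # P(X = 1) = P(X = -1) = 1/2: Lambda(t) = log cosh t = log((e^t + e^{-t}) / 2).
    return np.logaddexp(t, -t) - np.log(2.0)

def rademacher_rate(x):
    # Closed form of Lambda^* for the Rademacher law, valid for |x| < 1.
    return 0.5 * (1 + x) * np.log1p(x) + 0.5 * (1 - x) * np.log1p(-x)

t_grid = np.linspace(-20.0, 20.0, 400001)   # step 1e-4
for x in (0.0, 0.5, 1.5):
    print(x, legendre_transform(gaussian_log_mgf, x, t_grid), 0.5 * x**2)
for x in (0.0, 0.5, 0.9):
    print(x, legendre_transform(rademacher_log_mgf, x, t_grid), rademacher_rate(x))
```

The same grid-maximization idea is reused in the sketches that follow; a finer grid (or a one-dimensional convex optimizer) would be preferable when high accuracy is needed.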
Given the geometric view of empirical means given by (2), it is natural to investigate analogs of Cramér’s theorem for normalized projections in directions \(\theta ^{(n)}\in \mathbb {S}^{n-1}\) other than \(\iota ^{(n)}\). Such projections correspond to weighted means,
$$\begin{aligned} W_{\theta }^{(n)}\displaystyle \mathop {=}^{\cdot }\frac{1}{\sqrt{n}} \langle X^{(n)}, \theta ^{(n)}\rangle _n = \frac{1}{n}\sum _{i=1}^n X_i \sqrt{n} \theta ^{(n)}_i . \end{aligned}$$
(5)
Our main result is an LDP for \((W_{\theta }^{(n)})_{n\in \mathbb {N}}\) for almost every (in a sense that is specified below) sequence of directions \(\theta = (\theta ^{(1)},\theta ^{(2)},\dots )\). In particular, we show that the associated rate function does not depend on \(\theta \), and that it differs from the Cramér rate function \(\varLambda ^*\). That is, the sequence of directions \((\iota ^{(n)})_{n\in \mathbb {N}}\) corresponding to Cramér’s theorem is “atypical”!

Remark 1

While the LDP for (5) is novel, the corresponding law of large numbers (LLN) and central limit theorem (CLT) for weighted sums are well known. For example, a weak LLN follows from Chebyshev’s inequality, and a CLT follows from the Lindeberg condition (see, e.g., [11, Sect. VIII.4, Theorem 3]).

The outline of this note is as follows. In Sect. 2, we state our main results and discuss their relation to prior work. In Sect. 3, we prove the claimed LDP. In Sect. 4, we establish that Cramér’s theorem is atypical, and also comment on a generalization that is considered in [13].

2 Main Results

We first set some notation. Suppose the random variables \(X_1,X_2,\dots \) are all defined on a common probability space \((\varOmega ,\mathcal {F},\mathbb {P})\). Let \(\Vert \cdot \Vert _{n}\) denote the Euclidean norm on \(\mathbb {R}^n\). Write \(\sigma _{n-1}\) for the unique rotation invariant probability measure on \(\mathbb {S}^{n-1}\), the unit sphere in \(\mathbb {R}^n\). Let \(\mathbb {S}\displaystyle \mathop {=}^{\cdot }{\prod }_{n \in \mathbb {N}} \mathbb {S}^{n-1}\), and let \(\pi _n:\mathbb {S}\rightarrow \mathbb {S}^{n-1}\) be the coordinate map such that for \(\theta = (\theta ^{(1)},\theta ^{(2)},\dots )\in \mathbb {S}\), we have \(\pi _n(\theta ) = \theta ^{(n)}\). Let \(\sigma \) be a probability measure on (the Borel sets of) \(\mathbb {S}\) such that
$$\begin{aligned} \sigma \circ \pi _n^{-1} = \sigma _{n-1}, \quad n\in \mathbb {N}. \end{aligned}$$
(H1)
The generic example to keep in mind that satisfies (H1) is the product measure \(\sigma =\bigotimes _{n\in \mathbb {N}} \sigma _{n-1}\), in which case the projection directions \(\theta ^{(n)}\), \(n\in \mathbb {N}\), are independent under \(\sigma \). However, our results allow for more general dependencies; for more discussion on \(\sigma \) and the condition (H1), see Remark 3.

For \(\sigma \)-a.e. \(\theta \in \mathbb {S}\), we prove a large deviation principle for the sequence \((W_\theta ^{(n)})_{n\in \mathbb {N}}\) with a rate function that does not depend on \(\theta \). We refer to [9] for general background on large deviations. In particular, recall the following definition:

Definition 1

The sequence of probability measures \((\mu _n)_{n\in \mathbb {N}}\subset \mathcal {P}(\mathbb {R})\) is said to satisfy a large deviation principle (LDP) with a rate function \(\mathbb {I}:\mathbb {R}\rightarrow [0,\infty ]\) if \(\mathbb {I}\) is lower semicontinuous, and for all Borel measurable sets \(\varGamma \subset \mathbb {R}\),
$$\begin{aligned} -\inf _{x\in \varGamma ^\circ } \mathbb {I}(x) \le \liminf _{n\rightarrow \infty } \tfrac{1}{n} \log \mu _n(\varGamma ^\circ ) \le \limsup _{n\rightarrow \infty } \tfrac{1}{n} \log \mu _n(\bar{\varGamma }) \le -\inf _{x\in \bar{\varGamma }} \mathbb {I}(x), \end{aligned}$$
where \(\varGamma ^\circ \) and \(\bar{\varGamma }\) denote the interior and closure of \(\varGamma \), respectively. Furthermore, \(\mathbb {I}\) is said to be a good rate function if it has compact level sets.

We say the sequence of \(\mathbb {R}\)-valued random variables \((\xi _n)_{n\in \mathbb {N}}\) satisfies an LDP if the sequence of laws \((\mu _n)_{n\in \mathbb {N}}\) given by \(\mu _n = \mathbb {P}\circ \xi _n^{-1}\) satisfies an LDP.

In particular, for empirical means of i.i.d. random variables, we recall the following classical result, due to [5, 7].

Theorem 1

(Cramér) Let \((X_n)_{n\in \mathbb {N}}\) be an i.i.d. sequence such that (3) holds, and let \(\iota =(\iota ^{(1)},\iota ^{(2)},\dots )\) be defined as in (1). Then the sequence \((W_\iota ^{(n)})_{n\in \mathbb {N}}\) of (2) satisfies an LDP with the good rate function \(\mathbb {I}_\iota \), given by
$$\begin{aligned} \mathbb {I}_\iota (w) \displaystyle \mathop {=}^{\cdot }\varLambda ^*(w) = \sup _{t\in \mathbb {R}} \{ tw - \varLambda (t)\}. \end{aligned}$$
(6)
Let \(\nu \in \mathcal {P}(\mathbb {R})\) denote the standard one-dimensional Gaussian measure. In the sequel, we assume the following condition on \(\varLambda \), the logarithmic moment generating function (log mgf) of \(X_1\sim \gamma \):
$$\begin{aligned} \forall \,t\in \mathbb {R}, \quad \int _\mathbb {R}|\varLambda (tu)|^4\nu (du) < \infty . \end{aligned}$$
(H2)
Note that (H2) is even stronger than requiring the exponential moment condition (3) to hold with \(t_0=\infty \): since (H2) forces \(\varLambda (tu)<\infty \) for \(\nu \)-a.e. u and every \(t\in \mathbb {R}\), the function \(\varLambda \) is finite on all of \(\mathbb {R}\). For \(\gamma \) absolutely continuous with density f, a sufficient condition for (H2) is that the tail of the density decays strictly faster than exponentially, in the following sense:

Lemma 1

Suppose that \(\gamma \) has density f, and that there exist \(p\in (1,\infty )\) and constants \(0< C_1, C_2, C_3 < \infty \) such that for \(|x| > C_1\), we have
$$\begin{aligned} f(x) \le C_2 e^{-C_3|x|^p}. \end{aligned}$$
Then there exists some constant \(C< \infty \) such that \(\varLambda \), the log mgf of \(\gamma \), satisfies the following upper bound for all \(t\in \mathbb {R}\):
$$\begin{aligned} \varLambda (t) \le C|t|^{p/(p-1)} +C. \end{aligned}$$
Moreover, this implies that \(\varLambda \) satisfies the condition (H2).

Proof

By Young’s inequality applied to the conjugate exponents p and \(\tfrac{p}{p-1}\), for \(\varepsilon > 0\) and \(t,y\in \mathbb {R}\),
$$\begin{aligned} ty \le \left( \varepsilon ^{-1/p}|t|\right) \left( \varepsilon ^{1/p} |y|\right) \le \tfrac{p-1}{p} \varepsilon ^{-1/(p-1)}|t|^{p/(p-1)} + \frac{\varepsilon |y|^p}{ p}. \end{aligned}$$
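In the following, let C absorb all constants, and note that for \(0< \varepsilon < C_3p\) and \(t\in \mathbb {R}\),
$$\begin{aligned} \varLambda (t)&= \log \left( \int _{|y|\le C_1} e^{ty} f(y) dy + \int _{|y| > C_1} e^{ty} f(y) dy\right) \\&\le \log 2 + \max \left\{ C_1|t| ,\; \log C_2 + \log \int _{\mathbb {R}} e^{ty} e^{-C_3 |y|^p} dy\right\} \\&\le \log 2 + \max \left\{ C_1|t|^{p/(p-1)} + C_1 ,\; \log C_2 + \tfrac{p-1}{p} \varepsilon ^{-1/(p-1)}|t|^{p/(p-1)} + \log \int _{\mathbb {R}} e^{-(C_3-\varepsilon /p)|y|^p} dy\right\} \\&\le C |t|^{p/(p-1)} + C, \end{aligned}$$
where we used \(\log (a+b)\le \log 2+\max \{\log a,\log b\}\), the bound \(\int _{|y|\le C_1} e^{ty} f(y)dy\le e^{C_1|t|}\), the inequality \(|t|\le |t|^{p/(p-1)}+1\), Young’s inequality above, and the finiteness of the last integral, which holds because \(C_3-\varepsilon /p>0\).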
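Moreover, by Jensen’s inequality, \(\varLambda (t)\ge t\,\mathbb {E}[X_1]\), where \(\mathbb {E}[X_1]\) is finite by the tail bound on f. Hence \(|\varLambda (tu)| \le C|t|^{p/(p-1)}|u|^{p/(p-1)} + C + |t|\,|u|\,|\mathbb {E}[X_1]|\) for all \(t,u\in \mathbb {R}\), and since the Gaussian measure \(\nu \) has finite moments of every order, \(\varLambda \) satisfies the integrability condition (H2). \(\square \)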
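The growth bound of Lemma 1 can be checked numerically for a concrete density. The sketch below (NumPy assumed) takes the unnormalized density \(e^{-|x|^4}\), i.e., \(p=4\) and \(C_3=1\), which is the generalized normal law of Example 1 with \(\alpha =1\), \(\beta =4\). It computes \(\varLambda (t)\) by a Riemann sum on a truncated grid (an approximation chosen for illustration) and reports \(\varLambda (t)/|t|^{p/(p-1)}\), which Lemma 1 predicts stays bounded as \(|t|\) grows.

```python
import numpy as np

def log_integral(values):
    # Stable log(sum(exp(values))); the common grid step cancels in log_mgf below.
    m = np.max(values)
    return m + np.log(np.sum(np.exp(values - m)))

def log_mgf(t, p, y):
    """Approximate Lambda(t) = log E[e^{tX}] for the density proportional to exp(-|x|^p).

    The tilted and untilted integrals are Riemann sums over the same uniform grid y,
    so both the grid step and the normalizing constant of the density cancel.
    """
    log_f = -np.abs(y) ** p
    return log_integral(t * y + log_f) - log_integral(log_f)

p = 4.0
q = p / (p - 1.0)                      # the exponent p/(p-1) from Lemma 1
y = np.linspace(-8.0, 8.0, 160001)     # truncation at |y| = 8 is ample for t <= 50 when p = 4
ts = [1.0, 5.0, 10.0, 20.0, 50.0]
ratios = [log_mgf(t, p, y) / t**q for t in ts]
print([round(r, 4) for r in ratios])
```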
We define the following analog of the log mgf in the case of weighted sums,
$$\begin{aligned} \varPsi (t) \displaystyle \mathop {=}^{\cdot }\int _\mathbb {R}\varLambda (tu) \tfrac{1}{\sqrt{2\pi }} e^{-u^2/2}du, \quad t\in \mathbb {R}. \end{aligned}$$
(7)
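Since (7) is a Gaussian average, it can be approximated by Gauss–Hermite quadrature. The sketch below assumes NumPy’s `numpy.polynomial.hermite_e.hermegauss`, whose nodes and weights integrate against \(e^{-u^2/2}\) (total mass \(\sqrt{2\pi }\)); the helper `make_psi` and the quadrature degree are illustrative choices of ours. For the standard Gaussian law, \(\varLambda (t)=t^2/2\) is a polynomial, so the rule is exact up to rounding and returns \(\varPsi (t)=t^2/2\), i.e., \(\varPsi =\varLambda \).

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def make_psi(log_mgf, degree=60):
    """Return t -> Psi(t) = int Lambda(t u) nu(du), nu = N(0, 1), via Gauss-Hermite quadrature.

    hermegauss integrates against exp(-u^2 / 2), whose total mass is sqrt(2 pi); dividing
    the weights by sqrt(2 pi) turns the rule into an expectation under N(0, 1).
    The rule is exact for polynomials of degree < 2 * degree and approximate otherwise.
    """
    nodes, weights = hermegauss(degree)
    weights = weights / np.sqrt(2.0 * np.pi)

    def psi(t):
        return float(np.sum(weights * log_mgf(t * nodes)))

    return psi

def gaussian_log_mgf(s):
    return 0.5 * s**2      # Lambda for the standard Gaussian law

psi_gauss = make_psi(gaussian_log_mgf)
for t in (0.0, 0.5, 2.0):
    print(t, psi_gauss(t), gaussian_log_mgf(t))
```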
Our first main result is the following.

Theorem 2

(Weighted LDP) Assume (H1) and (H2). Then, for \(\sigma \)-a.e. \(\theta \in \mathbb {S}\), the sequence \((W_\theta ^{(n)})_{n\in \mathbb {N}}\) of (5) satisfies an LDP with the convex good rate function \(\mathbb {I}_\sigma \), given by
$$\begin{aligned} \mathbb {I}_\sigma (w) \displaystyle \mathop {=}^{\cdot }\varPsi ^*(w) = \sup _{t\in \mathbb {R}} \{ tw - \varPsi (t)\}. \end{aligned}$$
(8)

The proof of Theorem 2 is given in Sect. 3, with intermediate steps established in Sects. 3.1 and 3.2, and the proof completed in Sect. 3.3.

In principle, the rate function \(\mathbb {I}_\sigma \) of Theorem 2 could depend on the particular choice of \(\theta \), but our result shows that the rate function is the same for \(\sigma \)-a.e. \(\theta \). In the case where \(\sigma \) is the product measure \(\sigma =\bigotimes _{n\in \mathbb {N}} \sigma _{n-1}\), this follows immediately from the Kolmogorov zero-one law. That is, let \(\mathcal {T}_n\) be the sigma-algebra generated by \((\theta ^{(k)})_{k\ge n}\), and let
$$\begin{aligned} \mathcal {T} \displaystyle \mathop {=}^{\cdot }\bigcap _{n=1}^\infty \mathcal {T}_n \end{aligned}$$
(9)
denote the tail sigma-algebra induced by \((\theta ^{(1)},\theta ^{(2)},\dots )\). Whether \((W_\theta ^{(n)})_{n\in \mathbb {N}}\) satisfies an LDP with a given rate function is unaffected by changing finitely many of the directions \(\theta ^{(n)}\), so the rate function, viewed as a function of \(\theta \), is measurable with respect to \(\mathcal {T}\), and the Kolmogorov zero-one law states that \(\mathcal {T}\) is trivial under the product measure. Hence, the rate function coincides for \(\sigma \)-a.e. \(\theta \in \mathbb {S}\). However, our claim holds for general \(\sigma \) satisfying (H1). In particular, Example 2(b) in Sect. 3.1 gives an example of \(\sigma \) under which \(\theta ^{(1)},\theta ^{(2)},\dots \) are highly dependent and \(\mathcal {T}\) is not trivial; hence, the fact that the rate function \(\mathbb {I}_\sigma \) does not depend on \(\theta \) is not a priori obvious.
Given the \(\sigma \)-a.e. statement of Theorem 2, it is natural to ask what happens on the \(\sigma \)-null set of directions in \(\mathbb {S}\) where the stated LDP may fail. In particular, our second main result, Theorem 3, shows that under certain additional conditions on \(\varLambda \), the sequence of directions \(\iota \) associated with Cramér’s theorem is exceptional, in the sense that Cramér’s rate function \(\mathbb {I}_\iota \) differs from the universal rate function \(\mathbb {I}_\sigma \). For the following theorem, we assume that \(\gamma \) is symmetric, or equivalently, that its log mgf is even:
$$\begin{aligned} \forall \, t\in \mathbb {R}, \quad \varLambda (t)=\varLambda (-t). \end{aligned}$$
(H3)

Theorem 3

(Atypicality) Assume \(\varLambda \) satisfies (H3), and let \(\mathbb {I}_\iota \) and \(\mathbb {I}_\sigma \) be given by (6) and (8), respectively.
(a)

If \(\varLambda \circ \sqrt{\cdot }\) is concave on \(\mathbb {R}_+\), then \(\mathbb {I}_\sigma (w) \ge \mathbb {I}_\iota (w)\) for all \(w\in \mathbb {R}\).

(b)

If \(\varLambda \circ \sqrt{\cdot }\) is convex on \(\mathbb {R}_+\), then \(\mathbb {I}_\sigma (w) \le \mathbb {I}_\iota (w)\) for all \(w\in \mathbb {R}\).

(c)

If \(\varLambda \circ \sqrt{\cdot }\) is concave or convex, but not linear, on \(\mathbb {R}_+\), then \(\mathbb {I}_\sigma (w) = \mathbb {I}_\iota (w) < \infty \) if and only if \(w=0\).

The proof of Theorem 3 is given in Sect. 4.
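To see Theorem 3(a) at work, one can take the symmetric \(\pm 1\) (Rademacher) law, an illustrative choice not discussed in the text. Here \(\varLambda (t)=\log \cosh t\le |t|\), so (H2) and (H3) hold, and \(\mathbb {E}[|X_1|^{2k}]=1\) gives \(\varphi (k)=2k+1\) in Proposition 1(i) below, so \(\varLambda \circ \sqrt{\cdot }\) is concave. Jensen’s inequality then gives \(\varPsi (t)=\mathbb {E}[(\varLambda \circ \sqrt{\cdot })(t^2U^2)]\le \varLambda (t)\) for \(U\sim \nu \), hence \(\mathbb {I}_\sigma \ge \mathbb {I}_\iota \). The NumPy sketch below checks this numerically, using Gauss–Hermite quadrature for \(\varPsi \) and grid maximization for both Legendre transforms. Both steps are approximations, so the comparison is only meaningful for w well inside the effective domain of \(\mathbb {I}_\sigma \), which is \([-\sqrt{2/\pi },\sqrt{2/\pi }]\approx [-0.80,0.80]\).

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def log_cosh(s):
    # Lambda(s) = log cosh s for the symmetric +/-1 law, computed stably.
    return np.logaddexp(s, -s) - np.log(2.0)

# Gauss-Hermite rule normalized to integrate against nu = N(0, 1).
nodes, weights = hermegauss(80)
weights = weights / np.sqrt(2.0 * np.pi)

def psi(t):
    # Psi(t) = E[Lambda(t U)], U ~ N(0, 1), as in (7); approximate for non-polynomial Lambda.
    return float(np.sum(weights * log_cosh(t * nodes)))

t_grid = np.linspace(-30.0, 30.0, 6001)
lam_vals = log_cosh(t_grid)
psi_vals = np.array([psi(t) for t in t_grid])

def conjugate(values, w):
    # Grid approximation of sup_t {t w - f(t)}, given f evaluated on t_grid.
    return float(np.max(t_grid * w - values))

for w in (0.0, 0.25, 0.5):
    # Columns: w, I_iota(w) = Lambda^*(w), I_sigma(w) = Psi^*(w).
    print(w, conjugate(lam_vals, w), conjugate(psi_vals, w))
```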

We now provide some sufficient conditions (established in [1]) for the convexity or concavity conditions of Theorem 3 to hold.

Proposition 1

Assume the exponential moment condition (3) and the symmetry condition (H3).
(i)
Suppose \(\gamma \ne \delta _0\), the Dirac mass at 0. Define \(\varphi :\mathbb {N}\rightarrow \mathbb {R}\) by
$$\begin{aligned} \varphi (k) \displaystyle \mathop {=}^{\cdot }(2k+1) \frac{\mathbb {E}[|X_1|^{2k}]}{\mathbb {E}[|X_1|^{2k+2}]}, \quad k\in \mathbb {N}. \end{aligned}$$
If \(\varphi \) is non-decreasing (resp., non-increasing), then \(\varLambda \circ \sqrt{\cdot }\) is concave (resp., convex) on \(\mathbb {R}_+\).
(ii)

Suppose \(\gamma \) has density f such that \(\log f \circ \sqrt{\cdot }\) is concave (resp., convex) on \(\mathbb {R}_+\). Then \(\varLambda \circ \sqrt{\cdot }\) is concave (resp., convex) on \(\mathbb {R}_+\).

Proof

Part (i) is established in Theorem 7 of [1]. Part (ii) follows from applying Theorem 12 of [1] with their f replaced by our \(f\circ \sqrt{\cdot }\), and noticing that the integrability of \(f\circ \sqrt{\cdot }\) follows from the fact that f has finite first moment, due to the exponential moment condition of (3). \(\square \)

Example 1

Suppose \(\gamma \) is the generalized normal distribution with location 0, scale \(\alpha > 0\), and shape \(\beta > 1\); that is, \(\gamma =\mu _{\alpha ,\beta }\), where
$$\begin{aligned} \mu _{\alpha ,\beta }(dx) \displaystyle \mathop {=}^{\cdot }\frac{1}{2\alpha \varGamma (1+\frac{1}{\beta })}e^{-(|x|/\alpha )^\beta }dx \end{aligned}$$
It follows from Lemma 1 that \(\mu _{\alpha ,\beta }\) satisfies (H2), which implies (3). It is also easy to see that \(\mu _{\alpha ,\beta }\) satisfies (H3). Thus, the conditions of Proposition 1 are satisfied. It follows immediately from Proposition 1(ii) that for \(\beta \ge 2\) (resp., for \(\beta \le 2\)), \(\varLambda \circ \sqrt{\cdot }\) is concave (resp., convex). In fact, for \(\beta \ne 2\), the concavity (resp., convexity) is strict.
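Proposition 1(i) can also be checked directly for the family of Example 1. Using the standard moment formula \(\mathbb {E}[|X_1|^m]=\alpha ^m\varGamma (\tfrac{m+1}{\beta })/\varGamma (\tfrac{1}{\beta })\) for \(\gamma =\mu _{\alpha ,\beta }\), the sketch below (standard library only; the function name `phi` mirrors \(\varphi \) but is otherwise our own) evaluates \(\varphi (k)\) for a few shapes. One can verify that \(\varphi (k)=\beta \,\varGamma (x+1)/(\alpha ^2\varGamma (x+2/\beta ))\) with \(x=(2k+1)/\beta \), which is non-decreasing for \(\beta \ge 2\), non-increasing for \(\beta \le 2\), and constant (equal to \(2/\alpha ^2\)) when \(\beta =2\), consistent with the conclusions drawn from Proposition 1(ii).

```python
import math

def phi(k, alpha, beta):
    """phi(k) = (2k+1) E|X|^{2k} / E|X|^{2k+2} for the generalized normal mu_{alpha,beta}.

    Uses E|X|^m = alpha^m Gamma((m+1)/beta) / Gamma(1/beta); the Gamma(1/beta)
    factors cancel in the ratio, and lgamma keeps large k numerically stable.
    """
    log_ratio = math.lgamma((2 * k + 1) / beta) - math.lgamma((2 * k + 3) / beta)
    return (2 * k + 1) * math.exp(log_ratio) / alpha**2

for beta in (1.5, 2.0, 4.0):
    values = [phi(k, 1.0, beta) for k in range(1, 6)]
    print(beta, [round(v, 3) for v in values])
```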

The preceding example highlights the special role of the Gaussian, which corresponds to \(\beta =2\). In particular, for \(\gamma \ne \delta _0\), we have \(\gamma =\mu _{\alpha ,2}\) for some \(\alpha > 0\) if and only if \(\varLambda \circ \sqrt{\cdot }\) is linear. Thus, we could interpret the conditions of Theorem 3 as evaluating whether our distribution of interest is “more” or “less” log-concave than the Gaussian. We also have the following result in the Gaussian case (i.e., when \(\gamma =\mu _{\alpha ,2}\)), which holds for all \(\theta \) as opposed to just for \(\sigma \)-a.e. \(\theta \).

Proposition 2

Suppose \(\gamma = \mu _{\alpha ,2}\) for some \(\alpha > 0\). Then, for all \(\theta \in \mathbb {S}\), the sequence \((W_\theta ^{(n)})_{n\in \mathbb {N}}\) satisfies an LDP with the good rate function \(\varPsi ^*(w) = \varLambda ^*(w) = (w/\alpha )^2\), where \(\varLambda ^*\) is defined in (4) with \(\varLambda \) the log mgf of the Gaussian with mean 0 and variance \(\alpha ^2/2\).

Proof

This follows from the fact that for all \(n\in \mathbb {N}\), the Gaussian measure on \(\mathbb {R}^n\) is spherically symmetric, and hence, for any \(\theta ^{(n)} \in \mathbb {S}^{n-1}\), the law of \(\langle X^{(n)}, \theta ^{(n)}\rangle _n\) is the same as the law of \(\langle X^{(n)}, \iota ^{(n)}\rangle _n\). Thus, the LDP for \((W_\theta ^{(n)})_{n\in \mathbb {N}}\) follows from the classical Cramér’s theorem for empirical means of i.i.d. Gaussians, for which the rate function can be easily computed to be \(\varLambda ^*(w) = (w/\alpha )^2\). \(\square \)
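The computation behind Proposition 2 is short, and we record it here for convenience. For \(\gamma =\mu _{\alpha ,2}\), i.e., the centered Gaussian with variance \(\alpha ^2/2\),
$$\begin{aligned} \varLambda (t) = \frac{\alpha ^2t^2}{4}, \qquad \varPsi (t) = \int _\mathbb {R}\frac{\alpha ^2t^2u^2}{4}\,\nu (du) = \frac{\alpha ^2t^2}{4} = \varLambda (t), \end{aligned}$$
and the supremum in (4) is attained at \(t = 2w/\alpha ^2\), so that
$$\begin{aligned} \varLambda ^*(w) = \frac{2w}{\alpha ^2}\,w - \frac{\alpha ^2}{4}\Big (\frac{2w}{\alpha ^2}\Big )^2 = \frac{2w^2}{\alpha ^2} - \frac{w^2}{\alpha ^2} = \Big (\frac{w}{\alpha }\Big )^2 = \varPsi ^*(w). \end{aligned}$$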

Remark 2

It is not clear whether a converse of Proposition 2 holds, that is, whether \(\mathbb {I}_\sigma \equiv \mathbb {I}_\iota \) if and only if \(\gamma \) is Gaussian. As one possible approach in this direction, it would suffice to show that for any measure \(\gamma \) satisfying both (H2) and (H3) (and possibly some additional natural conditions), the function \(\varLambda \circ \sqrt{\cdot }\) must be either concave or convex.

Aside from the sequence of Cramér directions \(\iota \in \mathbb {S}\), another natural sequence of directions to consider is the sequence of canonical basis vectors, \(e_1 = (e_1^{(1)}, e_1^{(2)},\dots ) \in \mathbb {S}\), where
$$\begin{aligned} e_1^{(n)} \displaystyle \mathop {=}^{\cdot }(1,0,\dots ,0) \in \mathbb {S}^{n-1}, \quad n\in \mathbb {N}. \end{aligned}$$
Then \(W_{e_1}^{(n)} = X_1/\sqrt{n}\) for all n. The following result states that under certain tail conditions, such normalized projections yield a trivial LDP, again with a rate function different from \(\mathbb {I}_\sigma \).

Proposition 3

Assume the following condition (which is stronger than (H2)):
$$\begin{aligned} \exists \, C < \infty ,\ r\in [0,2) \text { such that } \forall \,t\in \mathbb {R}, \quad \varLambda (t) \le C(1+|t|^r). \end{aligned}$$
(H2′)
Then the sequence \((W_{e_1}^{(n)})_{n\in \mathbb {N}}\) satisfies an LDP with the trivial good rate function \(\chi _{0}\) given by
$$\begin{aligned} \chi _0(x) \displaystyle \mathop {=}^{\cdot }\left\{ \begin{array}{ll} 0, &{} x=0;\\ \infty , &{} x\ne 0. \end{array}\right. \end{aligned}$$

Proof

Consider the normalized log mgfs whose limit enters the Gärtner–Ellis theorem (recalled for convenience later in Theorem 4). For all \(t\in \mathbb {R}\),
$$\begin{aligned} \varLambda _n(t) \displaystyle \mathop {=}^{\cdot }\frac{1}{n} \log \mathbb {E}[\exp (tnW_{e_1}^{(n)})] = \frac{1}{n}\log \mathbb {E}[\exp (t\sqrt{n} X_1)] = \frac{1}{n} \varLambda (t\sqrt{n}) \le \frac{1}{n} (C|t|^{r} n^{r/2} + C). \end{aligned}$$
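Since the exponent r of (H2′) satisfies \(r<2\) by assumption, the upper bound tends to 0 as \(n\rightarrow \infty \). Moreover, by Jensen’s inequality, \(\varLambda _n(t) \ge t\,\mathbb {E}[X_1]/\sqrt{n}\), which also tends to 0. Hence \(\lim _{n\rightarrow \infty }\varLambda _n (t) = 0\) for all \(t\in \mathbb {R}\). Thus, by the Gärtner–Ellis theorem, the sequence \((W_{e_1}^{(n)})_{n\in \mathbb {N}}\) satisfies an LDP with good rate function \(0^*=\chi _0\). \(\square \)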
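For a concrete instance of Proposition 3, one can again take the symmetric \(\pm 1\) (Rademacher) law (an illustrative choice), for which \(\varLambda (t)=\log \cosh t\le |t|\), so (H2′) holds with \(C=1\) and \(r=1\). The short sketch below (standard library only) evaluates \(\varLambda _n(t)=\varLambda (t\sqrt{n})/n\) alongside the bound \(|t|/\sqrt{n}\), showing the decay to 0 that produces the trivial rate function \(\chi _0\).

```python
import math

def log_cosh(x):
    # Stable log cosh(x) = |x| + log(1 + e^{-2|x|}) - log 2.
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def normalized_log_mgf(t, n):
    # Lambda_n(t) = Lambda(t sqrt(n)) / n for W_{e_1}^{(n)} = X_1 / sqrt(n), Rademacher X_1.
    return log_cosh(t * math.sqrt(n)) / n

t = 2.0
for n in (1, 10, 100, 10_000, 1_000_000):
    print(n, normalized_log_mgf(t, n), abs(t) / math.sqrt(n))
```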

2.1 Relation to Prior Work

There is a wealth of literature on large deviations for weighted sums, but our work appears to be the first to emphasize the special position of Cramér’s theorem in the geometric setting. Moreover, none of the existing results seems readily adaptable to our particular problem. We offer a brief (and necessarily incomplete) survey of existing results.

In the somewhat classical works of Book [2, 3], one finds asymptotic bounds for quantities of the form
$$\begin{aligned} \mathbb {P}\left( \frac{\sum _{k=1}^n a_{nk} X_k}{\sum _{k=1}^n a_{nk}} > c \right) , \end{aligned}$$
where \((a_{nk})_{k\le n, n\in \mathbb {N}}\) is a triangular array of weights such that \(\sum _{k=1}^n a_{nk}^2 = 1\) for all n. However, this does not address our setting: if we let \(a_{nk} = \theta ^{(n)}_k\), then \(\sum _{k=1}^n a_{nk}^2 = 1\), but this only yields tail bounds of the form \( \mathbb {P}( W_\theta ^{(n)} > cn^{-1/2}\sum _{k=1}^n \theta _k^{(n)})\), as opposed to the desired asymptotics for \(\mathbb {P}( W_\theta ^{(n)} > c)\). Furthermore, Book does not establish an LDP or identify a rate function.
A more recent line of work is [14]; in the notation of [14, p. 932], the symbols \(\lambda \) and \(\nu \) correspond to our n and k, respectively. For \(Z\sim N(0,1)\), we have the following correspondence:
$$\begin{aligned} a_j(n)&= \mathbf {1}_{\{j\le n-1\}} \frac{1}{n} \sqrt{n} \theta _j^{(n)}, \quad j, n\in \mathbb {N};\\ \sum _{j=0}^\infty a_j(n)^k&\approx \sum _{j=0}^{n-1} \frac{1}{n^k} z_j^k \mathop {\approx }\limits ^{(\zeta -a.e. )} \frac{\mathbb {E}[|Z|^k]}{n^{k-1}} \displaystyle \mathop {=}^{\cdot }\frac{a_k}{n^{k-1}} , \quad k,n\in \mathbb {N};\\ \phi (n)&= n, \quad n\in \mathbb {N}. \end{aligned}$$
Suppose that the sequence \((a_k)_{k\in \mathbb {N}}\) (which depends on the particular choice of weights \(a_j(n)\), \(j,n\in \mathbb {N}\)) satisfies the following condition (from p. 932 of [14]):
$$\begin{aligned} \lim _{k\rightarrow \infty } |a_k|^{1/k} < \infty . \end{aligned}$$
(10)
The main result of [14] is that for a sequence of i.i.d. random variables \((X_k)_{k\in \mathbb {N}}\) with cumulants \((c_k)_{k\in \mathbb {N}}\), if condition (10) holds, then the sequence of weighted means \(\frac{1}{n}\sum _{j=1}^n a_j(n) X_j\), \(n\in \mathbb {N}\), satisfies an LDP with rate function \(\chi ^*\), the Legendre transform of \(\chi (t) \displaystyle \mathop {=}^{\cdot }\sum \nolimits _{k=2}^\infty \frac{a_kc_k}{k!}t^k\). However, the finiteness condition (10) does not hold in our setting of \(a_k = \mathbb {E}[|Z|^k]\), since the following limit is infinite:
$$\begin{aligned} \lim _{k\rightarrow \infty } \mathbb {E}[|Z|^k]^{1/k} = \sqrt{2} \lim _{k\rightarrow \infty } \varGamma (\tfrac{k+1}{2})^{1/k} = \infty . \end{aligned}$$
Therefore, the weighted mean LDP of [14] does not apply in our setting.
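The divergence in the last display is easy to see numerically. The sketch below (standard library only) uses the closed form \(\mathbb {E}[|Z|^k]=2^{k/2}\varGamma (\tfrac{k+1}{2})/\sqrt{\pi }\) and works in log space to avoid overflow; by Stirling’s formula the k-th root grows roughly like \(\sqrt{k/e}\), so (10) fails.

```python
import math

def abs_moment_root(k):
    """Return E[|Z|^k]^(1/k) for Z ~ N(0, 1), computed in log space to avoid overflow."""
    log_moment = 0.5 * k * math.log(2.0) + math.lgamma((k + 1) / 2) - 0.5 * math.log(math.pi)
    return math.exp(log_moment / k)

ks = (1, 2, 4, 10, 100, 1000, 10_000)
for k in ks:
    # Columns: k, E|Z|^k ^ (1/k), and the rough Stirling growth sqrt(k / e).
    print(k, abs_moment_root(k), math.sqrt(k / math.e))
```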

Yet more recently, [16] proves an LDP for weighted empirical means similar to (5), except with weights that are uniformly bounded (in n). Our results correspond to unbounded weights \(\sqrt{n}\theta ^{(n)}_i\) which are not covered by their results. Similarly, [6] proves an LDP for empirical means of certain bounded functionals, which again fails to apply to our unbounded weights.

In the context of information theory, [8] states an LDP for sums of the form \(\frac{1}{n}\sum _{i=1}^n \rho (x_i,Y_i)\), where \((x_i)_{i\in \mathbb {N}}\) are “weights,” \((Y_i)_{i\in \mathbb {N}}\) is a sequence of random variables satisfying certain mixing properties, and \(\rho :\mathcal {X}\times \mathcal {Y}\rightarrow \mathbb {R}_+\) for Polish spaces \(\mathcal {X}\) and \(\mathcal {Y}\). The LDP is stated in the form of a generalized asymptotic equipartition property for “distortion measures.” However, note that \(\rho \) is assumed to be nonnegative, so a function like \(\rho (x,y)=xy\) (corresponding to projections) does not fit within the setting of [8]. Moreover, their weights \((x_i)_{i\in \mathbb {N}}\) are assumed to be a realization of a stationary ergodic process, which is not the case for our weights \(\sqrt{n}\theta ^{(n)}\) that are drawn from the scaled sphere \(\sqrt{n}\mathbb {S}^{n-1}\). This lends our work a geometric rather than information-theoretic interpretation.

The paper [12], co-authored by the first and third authors of this work, also analyzes weighted sums of i.i.d. random variables, but the emphasis there is on sums of subexponential random variables rather than on the weights themselves.

The most closely related work to our own is the recent work of [4], which gives strong large deviations (i.e., refined asymptotics) for weighted sums of i.i.d. random variables and i.i.d. weights, conditioned on the weights. Our weights \(\sqrt{n}\theta ^{(n)}_i\) are not i.i.d., but in Sects. 3.1 and 3.2, we prove that Theorem 2 can be reduced to an LDP for the sequence \((\widehat{W}_z^{(n)})_{n\in \mathbb {N}}\) defined in (12), which is an i.i.d. weighted sum, conditional on given weights. With some additional calculations from this point, the rate function \(\mathbb {I}_\sigma \) of Theorem 2 could then be deduced from the conditional LDP of [4], stated in their Theorem 1.6 with rate function defined in their Eq. (1.13). Note that condition (iii) of their Theorem 1.6 has two parts, but our integrability condition (H2) corresponds only to their first part; in fact, it follows from Lemma 4 that their second part follows from our condition (15), which is weaker than (H2), and thus, need not be assumed separately. Moreover, our research (completed independently) differs due to our emphasis on a geometric point of view; as a consequence, we can explicitly identify a rate function \(\mathbb {I}_\sigma \) and highlight the atypical position occupied by Cramér’s theorem.

Lastly, the method we use is a simplification of those developed in a companion paper [13], where we consider normalized projections of certain non-product measures, as well as projections in random directions.

3 The \(\sigma \)-Almost Everywhere LDP

3.1 The Surface Measure on \(\mathbb {S}^{n-1}\)

In this section, we recall a convenient representation for a random vector distributed according to the surface measure on \(\mathbb {S}^{n-1}\), in order to obtain (13), which reduces \(\sigma \)-a.e. statements into more tractable statements about Gaussian random variables. Let \(\mathbb {A} \displaystyle \mathop {=}^{\cdot }{\prod }_{n\in \mathbb {N}} \mathbb {R}^n\) denote the space of infinite triangular arrays. That is, \(z\in \mathbb {A}\) is of the form \(z=(z^{(1)},z^{(2)},\dots )\) where \(z^{(n)} \in \mathbb {R}^n\) for all \(n\in \mathbb {N}\). Let \(\mathcal {R}:\mathbb {A}\rightarrow \mathbb {A}\) be the map such that for \(z \in \mathbb {A}\), the nth row of \(\mathcal {R}(z)\) is
$$\begin{aligned}{}[\mathcal {R}(z)]^{(n)} \displaystyle \mathop {=}^{\cdot }\frac{z^{(n)}}{\Vert z^{(n)}\Vert _{n}}. \end{aligned}$$
Let \(\bar{\pi }_n:\mathbb {A}\rightarrow \mathbb {R}^n\) denote the nth row map such that \(\bar{\pi }_n(z) = z^{(n)}\). Recall that \(\nu \) denotes the standard Gaussian measure on \(\mathbb {R}\), and let \(\nu ^{\otimes n}\) denote the standard Gaussian measure on \(\mathbb {R}^n\).

Lemma 2

If \(\zeta \in \mathcal {P}(\mathbb {A})\) is such that
$$\begin{aligned} \zeta \circ \bar{\pi }_n^{-1} = \nu ^{\otimes n}, \quad n\in \mathbb {N}, \end{aligned}$$
(11)
then \(\sigma \displaystyle \mathop {=}^{\cdot }\zeta \circ \mathcal {R}^{-1}\) satisfies (H1). Conversely, if \(\sigma \in \mathcal {P}(\mathbb {S})\) satisfies (H1), then there exists some \(\zeta \in \mathcal {P}(\mathbb {A})\) satisfying (11) such that \(\sigma = \zeta \circ \mathcal {R}^{-1}\).

Proof

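The first claim is merely a restatement of the well-known fact that if \(Z^{(n)}\) has the n-dimensional standard Gaussian distribution, then \(Z^{(n)}/\Vert Z^{(n)}\Vert _n\) is uniformly distributed on the unit sphere \(\mathbb {S}^{n-1}\), and independent of \(\Vert Z^{(n)}\Vert _n\). For the converse, let \(\theta \sim \sigma \), and let \(R_1,R_2,\dots \) be independent of \(\theta \) and of each other, with \(R_n\) distributed as \(\Vert Z^{(n)}\Vert _n\). By the same fact, \(z^{(n)} = R_n\theta ^{(n)}\) has law \(\nu ^{\otimes n}\) for each n, so the law \(\zeta \) of \(z=(z^{(1)},z^{(2)},\dots )\) satisfies (11), and \(\zeta \circ \mathcal {R}^{-1}=\sigma \) since \(R_n>0\) almost surely. \(\square \)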
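Lemma 2 is also the standard recipe for simulating \(\sigma _{n-1}\). The NumPy sketch below (the helper `sample_sphere`, the seed, and the sample sizes are illustrative) normalizes standard Gaussian vectors and checks three consequences: unit norm, centered coordinates, and \(\mathbb {E}[(\theta ^{(n)}_1)^2]=1/n\), which follows by symmetry because the n squared coordinates sum to 1.

```python
import numpy as np

def sample_sphere(n, size, rng):
    """Draw `size` points from sigma_{n-1} by normalizing standard Gaussian vectors."""
    z = rng.standard_normal((size, n))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

rng = np.random.default_rng(0)
n = 50
theta = sample_sphere(n, 20_000, rng)

max_norm_error = np.abs(np.linalg.norm(theta, axis=1) - 1.0).max()   # rounding error only
coordinate_means = theta.mean(axis=0)                                  # near 0 by symmetry
scaled_second_moment = n * np.mean(theta[:, 0] ** 2)                   # near 1 since E[theta_1^2] = 1/n
print(max_norm_error, np.abs(coordinate_means).max(), scaled_second_moment)
```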

Note that Lemma 2 states that for any given \(\sigma \in \mathcal {P}(\mathbb {S})\) satisfying (H1), we can find a corresponding \(\zeta \in \mathcal {P}(\mathbb {A})\) satisfying (11). Fix such a pair \((\sigma ,\zeta )\). Now, for \(z\in \mathbb {A}\), define
$$\begin{aligned} \widehat{W}_z^{(n)} \displaystyle \mathop {=}^{\cdot }\frac{1}{n} \sum _{i=1}^n X_i\,z_i^{(n)}. \end{aligned}$$
(12)
Then, given \(W_\theta ^{(n)}\) as defined in (5), and any good rate function \(\mathbb {I}:\mathbb {R}\rightarrow [0,\infty ]\), Lemma 2 implies that
$$\begin{aligned} \sigma&\left( \theta \in \mathbb {S}: (W_\theta ^{(n)})_{n\in \mathbb {N}} \text { satisfies an LDP with good rate function } \mathbb {I}\right) \nonumber \\&\quad = \zeta \left( z\in \mathbb {A} : (\tfrac{\sqrt{n}}{\Vert z^{(n)}\Vert _n} \widehat{W}_z^{(n)})_{n\in \mathbb {N}} \text { satisfies an LDP with good rate function } \mathbb {I}\right) . \end{aligned}$$
(13)
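The identity behind (13) is purely algebraic: with \(\theta ^{(n)}=z^{(n)}/\Vert z^{(n)}\Vert _n\), (5) and (12) give \(W_\theta ^{(n)}=(\sqrt{n}/\Vert z^{(n)}\Vert _n)\,\widehat{W}_z^{(n)}\) for every realization. The NumPy sketch below checks this for a single row. Drawing the \(X_i\) from a standard Gaussian law is an arbitrary illustrative choice, since the identity holds for any values. The sketch also prints the factor \(\sqrt{n}/\Vert z^{(n)}\Vert _n\), which is close to 1 for large n; Sect. 3.2 explains why this factor does not affect the LDP.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.standard_normal(n)        # illustrative values for X_1, ..., X_n (any values work)
z = rng.standard_normal(n)        # row z^{(n)} with law nu^{otimes n}
theta = z / np.linalg.norm(z)     # theta^{(n)} = [R(z)]^{(n)}

w_theta = np.dot(x, theta) / np.sqrt(n)       # W_theta^{(n)} as in (5)
w_hat = np.dot(x, z) / n                      # hat W_z^{(n)} as in (12)
factor = np.sqrt(n) / np.linalg.norm(z)       # sqrt(n) / ||z^{(n)}||_n

print(w_theta, factor * w_hat)  # equal up to rounding
print(factor)                   # close to 1 for large n
```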
In addition, Lemma 2 yields a large class of examples of \(\sigma \) satisfying (H1), constructed via \(\zeta \) satisfying (11). We specify two such examples below.

Example 2

  (a)

    Consider the completely independent case, where the elements \(Z_i^{(n)}\), \(i=1,\dots , n\), \(n\in \mathbb {N}\), are all independent standard Gaussians; then the law of \(\mathcal {R}(Z)\) is the product measure \(\sigma = \bigotimes _{n\in \mathbb {N}} \sigma _{n-1}\), under which the rows \(\theta ^{(n)}\) of \(\theta \) are independent. As previously noted, the tail sigma-algebra \(\mathcal {T}\) induced by the rows (defined in (9)) is trivial in this case due to the Kolmogorov zero-one law.

     
  (b)
    Alternatively, consider the following highly dependent case: let \(\zeta \in \mathcal {P}(\mathbb {A})\) satisfy (11) and be such that for \(\zeta \)-a.e. \(z\in \mathbb {A}\), we have \(z_i^{(n)} = z_i^{(m)}\) for all \(i\in \mathbb {N}\) and \(m,n\ge i\) (i.e., z is constant within columns); for instance, take \(z_i^{(n)} = Y_i\) for an i.i.d. standard Gaussian sequence \((Y_i)_{i\in \mathbb {N}}\). Then, let \(\sigma = \zeta \circ \mathcal {R}^{-1}\), so that \(\sigma \) satisfies (H1) by Lemma 2. In this case, the strong dependence across rows precludes any claim that the tail sigma-algebra \(\mathcal {T}\) induced by the rows is trivial. In fact, consider the event
    $$\begin{aligned} A \displaystyle \mathop {=}^{\cdot }\left\{ \theta \in \mathbb {S}: \lim _{n\rightarrow \infty } \sqrt{n} \theta ^{(n)}_1 > 0 \right\} \end{aligned}$$
    Note that A is measurable with respect to \(\mathcal {T}\). However, we also have due to the strong law of large numbers (\(\zeta \)-a.e., as stated precisely in (14)),
    $$\begin{aligned} \sigma (A) = \zeta \left( z\in \mathbb {A} : \lim _{n\rightarrow \infty } \sqrt{n} \,z_1^{(n)} / \Vert z^{(n)}\Vert _{n}> 0\right) = \zeta \left( z\in \mathbb {A} : z_1^{(1)} > 0 \right) = \frac{1}{2}. \end{aligned}$$
    That is, \(\mathcal {T}\) is non-trivial, and so \(\mathbb {I}_\sigma \) cannot a priori be declared as \(\sigma \)-a.e. constant through a simple analysis of the tail sigma-algebra.
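The non-trivial tail event in Example 2(b) can be observed directly. The NumPy sketch below uses the concrete column-constant array \(z_i^{(n)}=Y_i\) with \((Y_i)\) i.i.d. standard Gaussian (one admissible choice; the example only requires constant columns). It prints \(\sqrt{n}\,\theta _1^{(n)}=\sqrt{n}\,Y_1/\Vert (Y_1,\dots ,Y_n)\Vert _n\), which converges to \(Y_1\) by the strong law of large numbers. Hence the sign of the limit, and so membership in A, is decided by the first column alone.

```python
import numpy as np

def scaled_first_coordinate(y, n):
    """Return sqrt(n) * theta_1^{(n)} for the row z^{(n)} = (y_1, ..., y_n)."""
    row = y[:n]
    return np.sqrt(n) * row[0] / np.linalg.norm(row)

rng = np.random.default_rng(2)
y = rng.standard_normal(1_000_000)   # column values, shared by all rows n >= i

for n in (10, 1_000, 1_000_000):
    print(n, scaled_first_coordinate(y, n), y[0])
```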
     

Remark 3

We assume the condition (H1) not in an attempt to be as general as possible, but rather to point out that the universality of the rate function is a genuinely interesting phenomenon. Specifically, if we only consider the independent case of Example 2(a), then the fact that \(\mathbb {I}_\sigma \) is “universal” (in that it does not depend on \(\theta \)) is a consequence of the fact that the tail sigma-algebra \(\mathcal {T}\) is trivial. However, Example 2(b) shows that universality of the rate function is a more general phenomenon that holds even when \(\mathcal {T}\) is non-trivial. The condition (H1) only imposes constraints on the “marginal” distribution of the nth row of the array \(\theta \), and imposes no restrictions on the dependence across different rows \(\theta ^{(n)}\), \(n\in \mathbb {N}\). In fact, for \(Z\sim \zeta \) satisfying (11), the elements of Z need not even be jointly Gaussian in order for the law of \(\mathcal {R}(Z)\) to satisfy (H1).

3.2 Exponential Equivalence

As a consequence of Lemma 2 and the equality in (13), we can replace \(\sigma \)-a.e. statements about \(W_\theta ^{(n)}\), \(n\in \mathbb {N}\), with \(\zeta \)-a.e. statements about \((\sqrt{n}/ \Vert z^{(n)}\Vert _n)\widehat{W}_z^{(n)}\), \(n\in \mathbb {N}\). In this section, we go further and explain why, in the large deviations setting, we can ignore the contribution of the multiplicative factor \(\sqrt{n}/\Vert z^{(n)}\Vert _n\). That is, we show that this factor yields an exponentially equivalent sequence, defined as follows.

Definition 2

Let \((\xi _n)_{n\in \mathbb {N}}\) and \((\tilde{\xi }_n)_{n\in \mathbb {N}}\) be two sequences of \(\mathbb {R}\)-valued random variables such that for all \(\delta > 0\),
$$\begin{aligned} \limsup _{n\rightarrow \infty } \frac{1}{n} \log \mathbb {P}( |\xi _n - \tilde{\xi }_n| > \delta ) = -\infty ; \end{aligned}$$
then \((\xi _n)_{n\in \mathbb {N}}\) and \((\tilde{\xi }_n)_{n\in \mathbb {N}}\) are said to be exponentially equivalent.

Proposition 4

([9]) If \((\xi _n)_{n\in \mathbb {N}}\) is a sequence of random variables that satisfies an LDP with good rate function \(\mathbb {I}\), and \((\tilde{\xi }_n)_{n\in \mathbb {N}}\) is another sequence that is exponentially equivalent to \((\xi _n)_{n\in \mathbb {N}}\), then \((\tilde{\xi }_n)_{n\in \mathbb {N}}\) satisfies an LDP with good rate function \(\mathbb {I}\).

Lemma 3

Let \((\xi _n)_{n\in \mathbb {N}}\) be a sequence of random variables that satisfies an LDP with a good rate function \(\mathbb {I}\). Let \((a_n)_{n\in \mathbb {N}}\) be a deterministic sequence such that \(a_n\rightarrow 1\) as \(n\rightarrow \infty \), and let \((\tilde{\xi }_n)_{n\in \mathbb {N}}\) be another sequence defined by:
$$\begin{aligned} \tilde{\xi }_n = a_n {\xi }_n, \quad n\in \mathbb {N}. \end{aligned}$$
If \(\mathbb {I}\) is quasiconvex—that is, if the set \(\{x \in \mathbb {R}: \mathbb {I}(x) \in (-\infty ,c) \}\) is convex for all \(c\in \mathbb {R}\)—then \(({\xi }_n)_{n\in \mathbb {N}}\) and \((\tilde{\xi }_n)_{n\in \mathbb {N}}\) are exponentially equivalent.

Proof

For \(\varepsilon > 0\), let \(N_\varepsilon <\infty \) be such that for all \(n\ge N_\varepsilon \), we have \(|1-a_n| < \varepsilon \). For \(n\ge N_\varepsilon \) and any \(\delta > 0\),
$$\begin{aligned} |\tilde{\xi }_n - {\xi }_n| \ge \delta \quad \Leftrightarrow \quad |\xi _n| \cdot |1- a_n| \ge \delta \quad \Rightarrow \quad |\xi _n| \ge \tfrac{\delta }{\varepsilon }. \end{aligned}$$
Because \(\mathbb {I}\) is lower semicontinuous and has compact level sets, it achieves its global minimum at some (not necessarily unique) \(\bar{x} \in \mathbb {R}\). Fix \(\delta > 0\) and let \(\varepsilon > 0\) be small enough such that \(|\bar{x}| < \tfrac{\delta }{\varepsilon }\). Then,
$$\begin{aligned} \limsup _{n\rightarrow \infty } \frac{1}{n} \log \mathbb {P}( |\tilde{\xi }_n - \xi _n| > \delta )&\le \limsup _{n\rightarrow \infty } \frac{1}{n} \log \mathbb {P}( | \xi _n| \ge \tfrac{\delta }{\varepsilon } ) \\&\le -\inf _{|x|\ge \delta /\varepsilon } \mathbb {I}(x)\\&= -\min \left[ \mathbb {I}(\tfrac{\delta }{\varepsilon }) , \, \mathbb {I}(-\tfrac{\delta }{\varepsilon })\right] . \end{aligned}$$
The second inequality follows from the LDP for \((\xi _n)_{n\in \mathbb {N}}\). The last equality follows from the fact that if a quasiconvex function has a global minimizer \(\bar{x}\), then it is non-increasing for \(x < \bar{x}\), and non-decreasing for \(x > \bar{x}\) [15, Lemma 1]. Hence, since the rate function \(\mathbb {I}\) is quasiconvex and has a global minimizer \(\bar{x}\) which satisfies \(|\bar{x}| < \delta /\varepsilon \), it follows that if \(x \ge \delta /\varepsilon \) (resp., \(x \le -\delta /\varepsilon \)), then we have \(\mathbb {I}(x) \ge \mathbb {I}(\delta /\varepsilon )\) (resp., \(\mathbb {I}(x) \ge \mathbb {I}(-\delta /\varepsilon )\)). Lastly, take the limit as \(\varepsilon \rightarrow 0\), and use the compactness of the level sets of \(\mathbb {I}\) to conclude that \(\mathbb {I}(\frac{\delta }{\varepsilon }) \rightarrow +\infty \) and \(\mathbb {I}(-\frac{\delta }{\varepsilon }) \rightarrow +\infty \). This proves the required exponential equivalence. \(\square \)
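For a concrete instance of Lemma 3 (our illustration, not part of the original argument), let \(\xi _n\) be the empirical mean of n i.i.d. standard Gaussian random variables. Then \(\xi _n\) has the law of \(G/\sqrt{n}\) with G standard Gaussian, and by Cramér’s theorem \((\xi _n)_{n\in \mathbb {N}}\) satisfies an LDP with the convex, hence quasiconvex, good rate function \(\mathbb {I}(x) = x^2/2\). Using the Chernoff bound \(\mathbb {P}(|G|>y) \le 2e^{-y^2/2}\), \(y\ge 0\), we obtain, for every \(\delta > 0\) and every n with \(a_n\ne 1\),
$$\begin{aligned} \frac{1}{n} \log \mathbb {P}( |\tilde{\xi }_n - \xi _n| > \delta ) = \frac{1}{n} \log \mathbb {P}\left( |G| > \frac{\delta \sqrt{n}}{|1-a_n|}\right) \le \frac{\log 2}{n} - \frac{\delta ^2}{2(1-a_n)^2}, \end{aligned}$$
and the right-hand side tends to \(-\infty \) because \(a_n\rightarrow 1\) (if \(a_n = 1\), the probability is zero). This is the exponential equivalence asserted by Lemma 3, here with an explicit rate.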
Fix \(\zeta \) satisfying (11). Due to the strong law of large numbers, we have that for \(\zeta \)-a.e. \(z\in \mathbb {A}\),
$$\begin{aligned} \frac{\sqrt{n}}{\Vert z^{(n)}\Vert _n} = \left( \frac{1}{n} \sum _{i=1}^n (z_i^{(n)})^2 \right) ^{-1/2} \xrightarrow {n\rightarrow \infty } 1. \end{aligned}$$
(14)
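Thus, for \(\zeta \)-a.e. \(z\in \mathbb {A}\), the factor \(a_n = \sqrt{n}/\Vert z^{(n)}\Vert _n\) satisfies the hypothesis \(a_n\rightarrow 1\) of Lemma 3, which motivates the analysis of an LDP for \((\widehat{W}_z^{(n)})_{n\in \mathbb {N}}\).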

3.3 Proof of the LDP for \((W_\theta ^{(n)})_{n\in \mathbb {N}}\)

We aim to prove an LDP for the sequence \((\widehat{W}_z^{(n)})_{n\in \mathbb {N}}\); that is, an LDP for sums of independent but not identically distributed random variables (where the lack of identical distribution comes from the inhomogeneous weights \(z_i^{(n)}\) within the sum). The Gärtner–Ellis theorem (recalled below) is well suited for such an LDP.

Theorem 4

(Gärtner–Ellis) Let \((\xi _n)_{n\in \mathbb {N}}\) be a sequence of \(\mathbb {R}\)-valued random variables. Suppose that the limit log mgf \(\bar{\varLambda }:\mathbb {R}\rightarrow \mathbb {R}\) defined by
$$\begin{aligned} \bar{\varLambda }(t) \displaystyle \mathop {=}^{\cdot }\lim _{n\rightarrow \infty } \frac{1}{n} \log \mathbb {E}[e^{tn\xi _n}] \end{aligned}$$
is finite and differentiable at all \(t\in \mathbb {R}\). Then \((\xi _n)_{n\in \mathbb {N}}\) satisfies an LDP with the convex good rate function \(\bar{\varLambda }^*\), the Legendre transform of \(\bar{\varLambda }\).

For a proof of Theorem 4, we refer to [10, Theorem V.6], which also includes a more general version of the Gärtner–Ellis theorem that applies even if \(\bar{\varLambda }\) is finite for only some \(t\in \mathbb {R}\) (under mild additional conditions).
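The rate function in Theorem 4 is a Legendre transform, which is rarely available in closed form. A minimal numerical sketch follows, assuming only that the log mgf is finite and convex on \(\mathbb {R}\), so that \(t\mapsto tx-\bar{\varLambda }(t)\) is concave and a golden-section search over a bracket `[lo, hi]` (our choice, assumed wide enough to contain the maximizer) locates the supremum. The helper names `legendre`, `log_cosh` and `cramer_rademacher` are hypothetical. As test cases it uses the Gaussian log mgf \(t^2/2\), whose transform is \(x^2/2\), and the log mgf \(\log \cosh t\) of a symmetric \(\pm 1\) (Rademacher) variable, whose transform is the classical Cramér rate \(\frac{1}{2}[(1+x)\log (1+x) + (1-x)\log (1-x)]\) on \((-1,1)\).

```python
import math

def log_cosh(t):
    """Numerically stable log(cosh(t)): log mgf of a symmetric +/-1 variable."""
    a = abs(t)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def legendre(f, x, lo=-50.0, hi=50.0, iters=200):
    """Approximate f*(x) = sup_t [t*x - f(t)] for a finite convex f.

    Golden-section search on the concave map t -> t*x - f(t) over [lo, hi].
    Assumes the maximizer lies in [lo, hi].
    """
    def g(t):
        return t * x - f(t)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio
    a, b = lo, hi
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if g(c) > g(d):
            b = d  # maximizer lies in [a, d]
        else:
            a = c  # maximizer lies in [c, b]
    return g(0.5 * (a + b))

def cramer_rademacher(x):
    """Closed-form Legendre transform of log cosh, valid for -1 < x < 1."""
    return 0.5 * ((1.0 + x) * math.log1p(x) + (1.0 - x) * math.log1p(-x))

print(legendre(lambda t: 0.5 * t * t, 1.3), 0.5 * 1.3 ** 2)
print(legendre(log_cosh, 0.5), cramer_rademacher(0.5))
```

Golden-section search is used only because it needs nothing beyond concavity; any one-dimensional concave maximizer would do.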

The following lemma establishes a property of \(\varPsi \) that will be used in the application of the Gärtner–Ellis theorem.

Lemma 4

Suppose that
$$\begin{aligned} \forall \, t\in \mathbb {R}, \quad \int _\mathbb {R}|\varLambda (tu)| \, \nu (du) < \infty . \end{aligned}$$
(15)
Then, the function \(\varPsi \) of (7) is differentiable on \(\mathbb {R}\).

Proof

For each \(t\in \mathbb {R}\), differentiability of \(\varPsi \) at t follows from the differentiability of \(t\mapsto \varLambda (tu)\) for all \(u\in \mathbb {R}\), and an application of the dominated convergence theorem with the dominating function
$$\begin{aligned} g_t(u) \displaystyle \mathop {=}^{\cdot }|\varLambda '((t-1)u)u| + |\varLambda '((t+1)u)u|, \quad u\in \mathbb {R}. \end{aligned}$$
Indeed, fix \(t\in \mathbb {R}\) and for each \(\delta \in (-1,1)\setminus \{0\}\) and \(u\in \mathbb {R}\), define the difference quotient \(R_{t,\delta }(u) \displaystyle \mathop {=}^{\cdot }[\varLambda ((t+\delta )u) - \varLambda (tu)]/\delta \). Then,
$$\begin{aligned} |R_{t,\delta }(u)| \le \sup \left\{ |\varLambda '((t+\alpha )u)u| : \alpha \in [-1,1]\right\} \le g_t(u), \end{aligned}$$
where the first inequality follows from the mean value theorem, and the last inequality uses the fact that, for fixed u, the map \(s\mapsto u\,\varLambda '(su)\) is non-decreasing (because \(\varLambda '\) is non-decreasing by convexity of \(\varLambda \)), so that the supremum is attained at \(\alpha = \pm 1\). To show that \(g_t\) is integrable, first note that the convexity of \(\varLambda \) implies that for \(u,s\in \mathbb {R}\),
$$\begin{aligned} \varLambda (su) - \varLambda (0) \le \varLambda '(su)\,su \le \varLambda (2su)-\varLambda (su), \end{aligned}$$
and hence,
$$\begin{aligned} |\varLambda '(su)su| \le |\varLambda (0)| + |\varLambda (su)| + |\varLambda (2su)|. \end{aligned}$$
Since, by the assumption (15), the right-hand side is an integrable function of u for every \(s\in \mathbb {R}\), taking \(s=t-1\) and \(s=t+1\) shows that \(g_t\) is also integrable for every \(t\in \mathbb {R}\). \(\square \)
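Lemma 4 justifies differentiating \(\varPsi \) under the integral sign, \(\varPsi '(t) = \int _\mathbb {R}u\,\varLambda '(tu)\,\nu (du)\). The sketch below checks this identity numerically in one hypothetical instance: \(X_1\) a symmetric \(\pm 1\) (Rademacher) variable, so that \(\varLambda (t) = \log \cosh t\) and \(\varLambda '(t) = \tanh t\), with \(\nu \) the standard Gaussian distribution, as in the proof of Lemma 5. Condition (15) holds here because \(0\le \log \cosh (tu) \le |tu|\). Gaussian integrals are approximated by a trapezoidal rule on \([-10,10]\), a numerical choice of ours; the helper names are hypothetical.

```python
import math

def log_cosh(t):
    """Numerically stable log(cosh(t))."""
    a = abs(t)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def gauss_expect(h, lim=10.0, step=0.01):
    """E[h(Z)], Z ~ N(0,1), by the trapezoidal rule on [-lim, lim]."""
    m = int(round(2.0 * lim / step))
    total = 0.0
    for k in range(m + 1):
        u = -lim + k * step
        weight = 0.5 if k in (0, m) else 1.0
        total += weight * h(u) * math.exp(-0.5 * u * u)
    return total * step / math.sqrt(2.0 * math.pi)

def psi(t):
    """Psi(t) = E[Lambda(tZ)] with Lambda = log cosh."""
    return gauss_expect(lambda u: log_cosh(t * u))

def psi_prime(t):
    """Derivative taken under the integral: E[Z * Lambda'(tZ)], Lambda' = tanh."""
    return gauss_expect(lambda u: u * math.tanh(t * u))

eta = 1e-4
for t in (0.0, 0.7, 2.0):
    central_diff = (psi(t + eta) - psi(t - eta)) / (2.0 * eta)
    print(t, central_diff, psi_prime(t))
```

Agreement of the central difference with the integral of \(u\,\varLambda '(tu)\) is exactly what the dominated convergence argument guarantees.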

Proof of Theorem 2

Due to Lemma 2 (in particular, its consequence, (13)), it suffices to prove a \(\zeta \)-a.e. LDP for the sequence \(((\sqrt{n}/\Vert z^{(n)}\Vert _n)\widehat{W}_z^{(n)})_{n\in \mathbb {N}}\), where \(\widehat{W}_z^{(n)}\) is defined as in (12). By Lemma 3, Proposition 4 and the limit (14), it suffices in turn to prove a \(\zeta \)-a.e. LDP for \((\widehat{W}_z^{(n)})_{n\in \mathbb {N}}\) with a quasiconvex good rate function. To this end, we consider the Gärtner–Ellis limit log mgf for the sequence \((\widehat{W}_z^{(n)})_{n\in \mathbb {N}}\). For every \(n\in \mathbb {N}\) and \(t\in \mathbb {R}\), we have, due to the independence of \(X_i\), \(i=1,\ldots , n\),
$$\begin{aligned} \varLambda _{n,z}(t)\displaystyle \mathop {=}^{\cdot }\frac{1}{n} \log \mathbb {E}\left[ \exp \left( tn\widehat{W}_z^{(n)}\right) \right] = \frac{1}{n} \log \prod _{i=1}^n \mathbb {E}\left[ \exp \left( t X_iz_i^{(n)}\right) \right] = \frac{1}{n} \sum _{i=1}^n \varLambda (tz_i^{(n)}). \end{aligned}$$
(16)
We first claim that for \(\zeta \)-a.e. \(z\in \mathbb {A}\), the Gärtner–Ellis limit log mgf, the limit of (16), satisfies, for each \(t\in \mathbb {R}\),
$$\begin{aligned} \lim _{n\rightarrow \infty } \varLambda _{n,z}(t) = \int _\mathbb {R}\varLambda (tu) \nu (du) = \varPsi (t), \end{aligned}$$
(17)
with \(\varPsi \) as defined in (7).
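As a sanity check on (16) and (17) (an illustration we add, not part of the argument), take \(X_1\) standard Gaussian, so that \(\varLambda (t) = t^2/2\) and (15) clearly holds. Then, for \(\zeta \)-a.e. \(z\in \mathbb {A}\), by (14) and since \(\nu \) is the standard Gaussian distribution (as in the proof of Lemma 5),
$$\begin{aligned} \varLambda _{n,z}(t) = \frac{1}{n} \sum _{i=1}^n \frac{t^2 (z_i^{(n)})^2}{2} = \frac{t^2}{2}\cdot \frac{\Vert z^{(n)}\Vert _n^2}{n} \xrightarrow {n\rightarrow \infty } \frac{t^2}{2} = \int _\mathbb {R}\frac{t^2u^2}{2}\, \nu (du) = \varPsi (t). \end{aligned}$$
Thus \(\varPsi = \varLambda \) in the Gaussian case, consistent with Lemma 5(c), since here \(\varLambda \circ \sqrt{\cdot }\) is the linear map \(s\mapsto s/2\).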

We proceed by proving the following modified claim (obtained by interchanging the quantifiers in our original claim): for each \(t\in \mathbb {R}\), for \(\zeta \)-a.e. \(z\in \mathbb {A}\), the expression (17) holds. Note that if z were an i.i.d. sequence instead of a triangular array, our modified claim would follow from the usual strong law of large numbers. However, the strong LLN does not necessarily extend to empirical means of rows of i.i.d. random variables in a triangular array (see, e.g., [18, Example 5.41]). On the other hand, if the common distribution of the i.i.d. elements (in our case, each of the random variables \(\varLambda (tz_i^{(n)})\), \(i=1,\dots ,n\), \(n\in \mathbb {N}\)) has finite fourth moment, then the strong LLN follows from a standard weak LLN and Borel–Cantelli argument [18, p. 113, (i)]. Due to our assumption (H2), it follows that for all \(t\in \mathbb {R}\), for \(\zeta \)-a.e. \(z\in \mathbb {A}\), the limit (17) holds.
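The modified claim can be illustrated by simulation. The sketch below is a hypothetical instance, not part of the proof: it takes \(X_1\) Rademacher, so that \(\varLambda = \log \cosh \), draws each row \(z^{(n)}\) independently with i.i.d. standard Gaussian entries (one admissible choice of \(\zeta \), assumed here), and compares the row average \(\varLambda _{n,z}(t)\) from (16) with \(\varPsi (t)\), computed by a trapezoidal rule. The value \(t=1\), the seed and the helper names are our choices.

```python
import math
import random

def log_cosh(t):
    """Numerically stable log(cosh(t))."""
    a = abs(t)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def lambda_nz(t, row):
    """Lambda_{n,z}(t) = (1/n) * sum_i Lambda(t * z_i^{(n)}), as in (16)."""
    return sum(log_cosh(t * zi) for zi in row) / len(row)

def psi(t, lim=10.0, step=0.01):
    """Psi(t) = E[Lambda(tZ)], Z ~ N(0,1), by the trapezoidal rule."""
    m = int(round(2.0 * lim / step))
    total = 0.0
    for k in range(m + 1):
        u = -lim + k * step
        weight = 0.5 if k in (0, m) else 1.0
        total += weight * log_cosh(t * u) * math.exp(-0.5 * u * u)
    return total * step / math.sqrt(2.0 * math.pi)

rng = random.Random(7)
t = 1.0
results = {}
for n in (100, 10_000, 200_000):
    row = [rng.gauss(0.0, 1.0) for _ in range(n)]  # row n of the array z
    results[n] = lambda_nz(t, row)
    print(n, results[n], psi(t))
```

A single simulated path cannot distinguish almost sure convergence from convergence in probability; that distinction is exactly what the fourth-moment argument above supplies.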

Next, we aim to interchange the quantifiers to establish the original claim. Note that for each \(n\in \mathbb {N}\), \(\varLambda _{n,z}\) of (16) is a convex function (since it is a normalized sum of convex functions). Now, let \(T\subset \mathbb {R}\) be countable and dense. Since a countable union of \(\zeta \)-null sets is \(\zeta \)-null, for \(\zeta \)-a.e. \(z\in \mathbb {A}\) the convex functions \(\varLambda _{n,z}(t)\) converge as \(n\rightarrow \infty \) to \(\varPsi (t)\) simultaneously for all t in the dense subset \(T\subset \mathbb {R}\). Since these limits are finite, the convex analytic considerations of [17, Theorem 10.8] imply that \(\varLambda _{n,z}(t)\) converges for every \(t\in \mathbb {R}\) to a finite convex limit; as this limit and \(\varPsi \) are both continuous and agree on T, the pointwise convergence of \(\varLambda _{n,z}(t)\) to \(\varPsi (t)\) holds for all \(t\in \mathbb {R}\). That is, for \(\zeta \)-a.e. \(z\in \mathbb {A}\), for all \(t\in \mathbb {R}\), the limit (17) holds, proving our original claim.

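Since (H2) holds, \(\varPsi (t)<\infty \) for all \(t\in \mathbb {R}\) and, because (15) follows trivially from (H2), Lemma 4 implies that \(\varPsi \) is differentiable on \(\mathbb {R}\). Therefore, by the Gärtner–Ellis theorem (Theorem 4), for \(\zeta \)-a.e. \(z\in \mathbb {A}\), the sequence \((\widehat{W}_z^{(n)})_{n\in \mathbb {N}}\) satisfies an LDP with good rate function \(\varPsi ^*\), which is convex and hence quasiconvex. By the reductions at the beginning of the proof, this establishes the LDP for \((W_\theta ^{(n)})_{n\in \mathbb {N}}\) for \(\sigma \)-a.e. \(\theta \), with rate function \(\mathbb {I}_\sigma = \varPsi ^*\). \(\square \)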

4 Atypicality

In this section, we compare the rate function \(\mathbb {I}_\sigma \) with the Cramér rate function \(\mathbb {I}_\iota \). We first use Jensen’s inequality to compare the associated log mgfs \(\varPsi \) and \(\varLambda \).

Lemma 5

Assume (H3), and let \(\varLambda \) and \(\varPsi \) be defined as in (3) and (7), respectively.
(a)

If \(\varLambda \circ \sqrt{\cdot }\) is concave on \(\mathbb {R}_+\), then \(\varPsi (t) \le \varLambda (t)\) for all \(t\in \mathbb {R}\).

(b)

If \(\varLambda \circ \sqrt{\cdot }\) is convex on \(\mathbb {R}_+\), then \(\varPsi (t) \ge \varLambda (t)\) for all \(t\in \mathbb {R}\).

(c)

If \(\varLambda \circ \sqrt{\cdot }\) is concave or convex, but not linear, on \(\mathbb {R}_+\), then \(\varLambda (t) = \varPsi (t)\) if and only if \(t=0\).

Proof

We begin with part (a). Let \(\nu \) be the standard Gaussian distribution, and let \(Z\sim \nu \) be a standard Gaussian random variable. Then, for all \(t\in \mathbb {R}\), we have
$$\begin{aligned} \varPsi (t)&= \mathbb {E}[\varLambda (tZ)]\\ \text {(symmetry)}\quad&= \mathbb {E}\left[ \varLambda \left( (t^2Z^2)^{1/2}\right) \right] \\ \text {(Jensen)} \quad&\le \varLambda \left( \mathbb {E}[t^2Z^2]^{1/2}\right) \\&= \varLambda (t). \end{aligned}$$
Similar calculations can be used to establish part (b). As for part (c), recall that, for a concave (or convex) function \(\varphi \) and an integrable random variable Y, equality holds in Jensen’s inequality if and only if \(\varphi \) is affine on the convex hull of the support of Y. If \(t=0\), then \(t^2Z^2 = 0\) almost surely, and equality holds trivially. If \(t\ne 0\), then the support of \(t^2Z^2\) is all of \(\mathbb {R}_+\), so equality would force \(\varLambda \circ \sqrt{\cdot }\) to be affine on \(\mathbb {R}_+\); since \(\varLambda (0) = 0\), it would then be linear, which is excluded by assumption. \(\square \)
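Lemma 5 can be checked numerically in two examples of our choosing. For a Rademacher \(X_1\), \(\varLambda (t) = \log \cosh t\), and \(s\mapsto \log \cosh \sqrt{s}\) is concave and not linear on \(\mathbb {R}_+\) (its derivative \(\tanh (\sqrt{s})/(2\sqrt{s})\) is decreasing), so parts (a) and (c) predict \(\varPsi (t) < \varLambda (t)\) for \(t\ne 0\). For a standard Gaussian \(X_1\), \(\varLambda (t)=t^2/2\) and \(\varLambda \circ \sqrt{\cdot }\) is linear, so \(\varPsi = \varLambda \). The sketch below, with hypothetical helper names and a trapezoidal quadrature on \([-10,10]\) for the Gaussian expectation defining \(\varPsi \), tests concavity through second differences on a grid and compares \(\varPsi \) with \(\varLambda \).

```python
import math

def log_cosh(t):
    """Numerically stable log(cosh(t)): log mgf of a Rademacher variable."""
    a = abs(t)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def gaussian_lam(x):
    """Log mgf of a standard Gaussian variable."""
    return 0.5 * x * x

def psi(lam, t, lim=10.0, step=0.01):
    """Psi(t) = E[lam(tZ)], Z ~ N(0,1), by the trapezoidal rule on [-lim, lim]."""
    m = int(round(2.0 * lim / step))
    total = 0.0
    for k in range(m + 1):
        u = -lim + k * step
        weight = 0.5 if k in (0, m) else 1.0
        total += weight * lam(t * u) * math.exp(-0.5 * u * u)
    return total * step / math.sqrt(2.0 * math.pi)

def concave_on_grid(f, grid, tol=1e-12):
    """True if all second differences of f on an equally spaced grid are <= tol."""
    return all(f(a) - 2.0 * f(b) + f(c) <= tol
               for a, b, c in zip(grid, grid[1:], grid[2:]))

grid = [0.05 * k for k in range(501)]  # s in [0, 25]
rademacher_concave = concave_on_grid(lambda s: log_cosh(math.sqrt(s)), grid)
print("log cosh(sqrt(s)) concave on grid:", rademacher_concave)

for t in (0.5, 1.0, 2.0, 4.0):
    print(t, psi(log_cosh, t), log_cosh(t), psi(gaussian_lam, t), gaussian_lam(t))
```

A grid check of course only supports, and does not prove, the concavity used in part (a).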

Before we prove Theorem 3, we recall some basic facts about the log mgf of \(X_1\sim \gamma \). Let the domain of a function \(f:\mathbb {R}\rightarrow (-\infty ,\infty ]\) be the set \(D_f\displaystyle \mathop {=}^{\cdot }\{x\in \mathbb {R}: f(x)<\infty \}\). For a set \(D\subset \mathbb {R}\), let \(D^\circ \) denote the interior of D.

Lemma 6

Let \(\varLambda (t) = \log \mathbb {E}[e^{tX_1}]\) be the log mgf of some random variable \(X_1\). Then,
  1. \(\varLambda \) is lower semicontinuous;
  2. \(\varLambda \) is smooth in \(D_{\varLambda }^\circ \);
  3. \(\varLambda \) is convex.

Furthermore, if \(X_1\) is non-degenerate (i.e., not a.s. constant), then
  4. \(\varLambda \) is strictly convex in \(D_{\varLambda }^\circ \);
  5. \(\varLambda ^*\) is differentiable in \(D_{\varLambda ^*}^\circ \);
  6. for \(x\in D_{\varLambda ^*}^\circ \), the maximum in the definition of the Legendre transform is uniquely attained; that is, the following quantity is well defined:
    $$\begin{aligned} t_x \displaystyle \mathop {=}^{\cdot }\arg \max \{tx - \varLambda (t)\}. \end{aligned}$$
    (18)

Proof

These are mostly standard, but we provide sketches of the proofs. For 1., lower semicontinuity follows from Fatou’s lemma. For 2., smoothness follows from interchanging differentiation and expectation. Convexity in 3. and strict convexity in 4. follow from Hölder’s inequality. As for 5., it is classical that if a function is lower semicontinuous and strictly convex in the interior of its domain, then its Legendre transform is differentiable in the interior of its domain (see [17, Theorem 26.3]). Lastly, for 6., it is also classical that for \(x\in D_{\varLambda ^*}^\circ \), we have \(t_x = (\varLambda ^*)'(x)\) (see [17, Theorem 26.5]). \(\square \)
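Items 5 and 6, together with the identity \(t_x = (\varLambda ^*)'(x)\) used in the proof, can be checked in the Rademacher example (our choice): there \(\varLambda = \log \cosh \), \(\varLambda ' = \tanh \), \(\varLambda ^*(x) = \frac{1}{2}[(1+x)\log (1+x) + (1-x)\log (1-x)]\) on \((-1,1)\), and \(t_x = \operatorname {artanh} x\). The sketch finds \(t_x\) by bisection on \(\varLambda '(t) = x\), which is legitimate because \(\varLambda '\) is strictly increasing for non-degenerate \(X_1\), and compares it with a finite-difference derivative of \(\varLambda ^*\). The function names and the bracket \([-40,40]\) are hypothetical choices.

```python
import math

def cramer_rademacher(x):
    """Lambda*(x) for Lambda = log cosh, valid for -1 < x < 1."""
    return 0.5 * ((1.0 + x) * math.log1p(x) + (1.0 - x) * math.log1p(-x))

def t_of_x(x, lam_prime=math.tanh, lo=-40.0, hi=40.0, iters=200):
    """Solve lam_prime(t) = x by bisection; lam_prime must be increasing."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lam_prime(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eta = 1e-5
for x in (-0.6, 0.0, 0.3, 0.9):
    t_x = t_of_x(x)
    fd = (cramer_rademacher(x + eta) - cramer_rademacher(x - eta)) / (2.0 * eta)
    print(x, t_x, math.atanh(x), fd)
```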

Proof of Theorem 3

Assume without loss of generality that \(X_1\) is non-degenerate. If it were degenerate, then due to the symmetry condition (H3), we would have \(\gamma = \delta _0\), in which case \(\varLambda = \varPsi = 0\). Therefore, \(\mathbb {I}_\sigma \) and \(\mathbb {I}_\iota \) would both be equal to the convex indicator function of \(\{0\}\) (which is equal to 0 at \(w=0\) and \(+\infty \) for all other w), and the result is trivial.

Suppose \(\varLambda \circ \sqrt{\cdot }\) is concave (the convex case is similar, but with inequalities reversed). Due to Lemma 5, we have \(\varPsi (t) \le \varLambda (t)\) for all \(t\in \mathbb {R}\), which due to the definition of the Legendre transform implies that \(\mathbb {I}_\sigma (w) = \varPsi ^*(w) \ge \varLambda ^*(w)= \mathbb {I}_\iota (w)\) for all \(w\in \mathbb {R}\), thus proving (a) (and (b) for the convex case).

Further assume the stronger condition of (c), that \(\varLambda \circ \sqrt{\cdot }\) is concave but not linear. Then, for \(w\in \mathbb {R}\) such that \(\varLambda ^*(w) < \infty \), let \(t_w\) be as in (18), which is well defined due to the non-degeneracy condition of Lemma 6. Then,
$$\begin{aligned} \mathbb {I}_\sigma (w) = \varPsi ^*(w)&\ge t_ww - \varPsi (t_w) \\&\ge t_ww - \varLambda (t_w) \\&= \varLambda ^*(w) = \mathbb {I}_\iota (w). \end{aligned}$$
Due to Lemma 5, the second inequality above is an equality if and only if \(t_w = 0\), which, since \(t_w = (\varLambda ^*)'(w)\) (see the proof of Lemma 6), occurs if and only if \((\varLambda ^*)'(w) = 0\). Note that \(\varLambda \) is symmetric, so \(\varLambda ^*\) is also symmetric (by definition of the Legendre transform). Moreover, the smoothness of \(\varLambda \) (see Lemma 6) implies the strict convexity of \(\varLambda ^*\) within its domain (see [17, Theorem 26.3]). A symmetric, strictly convex, differentiable function has its unique critical point at the origin, so \((\varLambda ^*)'(w) = 0\) if and only if \(w=0\). This yields the claim of part (c). \(\square \)
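The strict inequality in part (c) can be seen numerically in the Rademacher example (our choice), for which \(s\mapsto \log \cosh \sqrt{s}\) is concave and not linear, since its derivative \(\tanh (\sqrt{s})/(2\sqrt{s})\) is decreasing. The sketch computes \(\mathbb {I}_\sigma (w) = \varPsi ^*(w)\) by combining a trapezoidal approximation of \(\varPsi (t) = \mathbb {E}[\log \cosh (tZ)]\), Z standard Gaussian, with a golden-section search for the Legendre transform, and compares it with the Cramér rate \(\mathbb {I}_\iota (w) = \varLambda ^*(w)\) at a few values of w where both are finite. All names, the quadrature and the search bracket \([-20,20]\) are assumptions of the sketch.

```python
import math

def log_cosh(t):
    """Numerically stable log(cosh(t))."""
    a = abs(t)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

# Trapezoidal nodes and weights for E[h(Z)], Z ~ N(0,1), on [-10, 10].
_M = 2000
_NODES = [-10.0 + 20.0 * k / _M for k in range(_M + 1)]
_WEIGHTS = [(0.5 if k in (0, _M) else 1.0) * (20.0 / _M)
            * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
            for k, u in enumerate(_NODES)]

def psi(t):
    """Psi(t) = E[log cosh(tZ)]."""
    return sum(wt * log_cosh(t * u) for u, wt in zip(_NODES, _WEIGHTS))

def legendre(f, w, lo=-20.0, hi=20.0, iters=60):
    """sup_t [t*w - f(t)] by golden-section search (f convex, maximizer in [lo, hi])."""
    def g(t):
        return t * w - f(t)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if g(c) > g(d):
            b = d
        else:
            a = c
    return g(0.5 * (a + b))

def cramer_rademacher(w):
    """I_iota(w) = Lambda*(w) for Lambda = log cosh, valid for -1 < w < 1."""
    return 0.5 * ((1.0 + w) * math.log1p(w) + (1.0 - w) * math.log1p(-w))

for w in (0.0, 0.25, 0.5):
    print(w, legendre(psi, w), cramer_rademacher(w))
```

The printed gap \(\mathbb {I}_\sigma (w) - \mathbb {I}_\iota (w)\) should be positive for \(w\ne 0\) and vanish at \(w=0\), as part (c) asserts.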

Remark 4

In this paper, we address the “atypical” nature of the directions \(\iota ^{(n)}= n^{-1/2}(1,1,\dots ,1)\) associated with Cramér’s theorem for large deviations of product measures. But in fact, the notions of atypicality and universal rate function extend beyond the product case. In particular, the companion paper [13] establishes LDPs for random projections of random vectors distributed according to the uniform measure on \(\ell ^p\) balls, again with a rate function that coincides for \(\sigma \)-a.e. sequence of directions, and the sequence of directions \(\iota ^{(n)}= n^{-1/2}(1,1,\dots ,1)\), \(n \in \mathbb {N}\), can be shown to be atypical in that setting as well.

Notes

Acknowledgments

NG and KR would like to thank ICERM, Providence, for an invitation to the program “Computational Challenges in Probability,” where some of this work was initiated. SSK and KR would also like to thank Microsoft Research New England for their hospitality during the Fall of 2014, when some of this work was completed. SSK was partially supported by a Department of Defense NDSEG fellowship. KR was partially supported by ARO grant W911NF-12-1-0222 and NSF grant DMS 1407504. The authors would like to thank an anonymous referee for helpful feedback on the exposition.

References

  1. F. Barthe, A. Koldobsky, Extremal slabs in the cube and the Laplace transform. Adv. Math. 174(1), 89–114 (2003)
  2. S.A. Book, Large deviation probabilities for weighted sums. Ann. Math. Stat. 43(4), 1221–1234 (1972)
  3. S.A. Book, A large deviation theorem for weighted sums. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 26(1), 43–49 (1973)
  4. A. Bovier, H. Mayer, A conditional strong large deviation result and a functional central limit theorem for the rate function. ALEA Lat. Am. J. Probab. Math. Stat. 12, 533–550 (2015)
  5. H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Stat. 27, 1–22 (1956)
  6. Z. Chi, Stochastic sub-additivity approach to the conditional large deviation principle. Ann. Probab. 29(3), 1303–1328 (2001)
  7. H. Cramér, Sur un nouveau théorème-limite de la théorie des probabilités. Actualités Scientifiques et Industrielles 736, 5–23 (1938)
  8. A. Dembo, I. Kontoyiannis, Source coding, large deviations, and approximate pattern matching. IEEE Trans. Inf. Theory 48(6), 1590–1615 (2002)
  9. A. Dembo, O. Zeitouni, Large Deviations Techniques and Applications, 2nd edn. (Springer, Berlin, 1998)
  10. F. Den Hollander, Large Deviations, Fields Institute Monographs, vol. 14 (American Mathematical Society, Providence, 2008)
  11. W. Feller, An Introduction to Probability Theory and Its Applications, vol. II (Wiley, New York, 1970)
  12. N. Gantert, K. Ramanan, F. Rembart, Large deviations for weighted sums of stretched exponential random variables. Electron. Commun. Probab. 19, 1–14 (2014)
  13. N. Gantert, S.S. Kim, K. Ramanan, Large deviations for random projections of \(\ell ^p\) balls. Preprint (2015) arXiv:1512.04988
  14. R. Kiesel, U. Stadtmüller, A large deviation principle for weighted sums of independent identically distributed random variables. J. Math. Anal. Appl. 251(2), 929–939 (2000)
  15. D.G. Luenberger, Quasi-convex programming. SIAM J. Appl. Math. 16(5), 1090–1095 (1968)
  16. J. Najim, A Cramér type theorem for weighted random variables. Electron. J. Probab. 7(4), 1–32 (2002)
  17. R.T. Rockafellar, Convex Analysis, vol. 28 (Princeton University Press, Princeton, 1970)
  18. J.P. Romano, A.F. Siegel, Counterexamples in Probability and Statistics (CRC Press, Boca Raton, 1986)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Nina Gantert (1)
  • Steven Soojin Kim (2)
  • Kavita Ramanan (3)
  1. Faculty for Mathematics, Technical University of Munich, Garching, Germany
  2. Division of Applied Mathematics, Brown University, Providence, USA
  3. Division of Applied Mathematics, Brown University, Providence, USA
