Advances in the Mathematical Sciences, pp. 253–270

# Cramér’s Theorem is Atypical

## Abstract

The empirical mean of *n* independent and identically distributed (i.i.d.) random variables \((X_1,\dots ,X_n)\) can be viewed as a suitably normalized scalar projection of the *n*-dimensional random vector \(X^{(n)}\displaystyle \mathop {=}^{\cdot }\,(X_1,\dots ,X_n)\) in the direction of the unit vector \(n^{-1/2}(1,1,\dots ,1) \in \mathbb {S}^{n-1}\). The large deviation principle (LDP) for such projections as \(n\rightarrow \infty \) is given by Cramér’s classical theorem. We prove an LDP for the sequence of normalized scalar projections of \(X^{(n)}\) in the direction of a generic unit vector \(\theta ^{(n)}\in \mathbb {S}^{n-1}\), as \(n\rightarrow \infty \). This LDP holds under fairly general conditions on the distribution of \(X_1\), and for “almost every” sequence of directions \((\theta ^{(n)})_{n\in \mathbb {N}}\). The associated rate function is “universal” in the sense that it does not depend on the particular sequence of directions. Moreover, under mild additional conditions on the law of \(X_1\), we show that the universal rate function differs from the Cramér rate function, which implies that the sequence of directions \(n^{-1/2}(1,1,\dots ,1) \in \mathbb {S}^{n-1}\), \(n \in \mathbb {N}\), corresponding to Cramér’s theorem is atypical.

### Keywords

Large deviations, Projections, High-dimensional product measures, Cramér’s theorem, Rate function

### Mathematics Subject Classification

60F10 (primary), 60D05 (secondary)

## 1 Introduction

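Let \(X^{(n)}\displaystyle \mathop {=}^{\cdot }\,(X_1,\dots ,X_n)\) be a sequence of *n* independent and identically distributed (i.i.d.) \(\mathbb {R}\)-valued random variables with common distribution \(\gamma \in \mathcal {P}(\mathbb {R})\), the space of probability measures on \({\mathbb {R}}\). A fundamental probabilistic question is how the empirical mean of \(X^{(n)}\) behaves as the length *n* of the sequence increases. From a geometric perspective, the empirical mean is a suitably normalized version of (the scalar component of) the projection of the *n*-dimensional vector \(X^{(n)}\) in the direction of the unit vector \(\iota ^{(n)}\), defined by
$$\begin{aligned} \iota ^{(n)} \displaystyle \mathop {=}^{\cdot }n^{-1/2}(1,1,\dots ,1) \in \mathbb {S}^{n-1}. \end{aligned}$$(1)
In other words, we can write
$$\begin{aligned} W_\iota ^{(n)} \displaystyle \mathop {=}^{\cdot }\frac{1}{n}\sum _{i=1}^n X_i = \frac{1}{\sqrt{n}}\langle X^{(n)},\iota ^{(n)}\rangle _n, \end{aligned}$$(2)
where \(\langle \cdot ,\cdot \rangle _n\) denotes the Euclidean inner product on \(\mathbb {R}^n\). For \(x\in \mathbb {R}^n\) and \(v\in \mathbb {S}^{n-1}\), we use the phrase “*projection* of *x* in the direction *v*” to refer to the scalar component \(\langle x,v\rangle _n \in \mathbb {R}\) (rather than the vector \(\langle x,v\rangle _nv\in \mathbb {R}^n\)). Then, the expression (2) indicates that questions on the empirical mean for large *n* can be rephrased in geometric language as questions on suitably normalized projections of high-dimensional random vectors.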
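The identity (2) is easy to check numerically. For any sample of length *n*, the empirical mean coincides with \(n^{-1/2}\) times the projection onto \(n^{-1/2}(1,\dots ,1)\). The short sketch below uses plain Python. The exponential sample and the helper names `inner` and `normalized_projection` are illustrative choices, not taken from the text.

```python
import math
import random

def inner(x, v):
    """Euclidean inner product <x, v>_n on R^n."""
    return sum(a * b for a, b in zip(x, v))

def normalized_projection(x, theta):
    """n^{-1/2} <x, theta>_n for a unit vector theta in S^{n-1}."""
    return inner(x, theta) / math.sqrt(len(x))

n = 1000
rng = random.Random(0)
x = [rng.expovariate(1.0) for _ in range(n)]  # any sample of length n works here
iota = [1.0 / math.sqrt(n)] * n               # the unit vector n^{-1/2}(1, ..., 1)

empirical_mean = sum(x) / n
projection = normalized_projection(x, iota)
print(empirical_mean, projection)  # equal up to floating-point rounding
```

Replacing `iota` by any other unit vector gives the corresponding normalized projection, i.e., a weighted mean with weights \(\sqrt{n}\theta _i^{(n)}\).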

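Cramér’s theorem (recalled in Theorem 1 below) describes the large deviations of the empirical mean in terms of the log moment generating function (log mgf) \(\varLambda (t) \displaystyle \mathop {=}^{\cdot }\log \mathbb {E}[e^{tX_1}]\), \(t\in \mathbb {R}\), of \(X_1\), and its *Legendre transform* (also known as the *convex conjugate*)
$$\begin{aligned} \varLambda ^*(w) \displaystyle \mathop {=}^{\cdot }\sup _{t\in \mathbb {R}}\,\{tw - \varLambda (t)\}, \quad w\in \mathbb {R}. \end{aligned}$$(4)

More generally, given unit vectors \(\theta ^{(n)}\in \mathbb {S}^{n-1}\), \(n\in \mathbb {N}\), we consider the normalized projections of \(X^{(n)}\) in the directions \(\theta ^{(n)}\), which are *weighted* means:
$$\begin{aligned} W_\theta ^{(n)} \displaystyle \mathop {=}^{\cdot }\frac{1}{\sqrt{n}}\langle X^{(n)},\theta ^{(n)}\rangle _n = \frac{1}{n}\sum _{i=1}^n \sqrt{n}\,\theta ^{(n)}_i X_i, \quad n\in \mathbb {N}. \end{aligned}$$(5)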

### Remark 1

While the LDP for (5) is novel, the corresponding law of large numbers (LLN) and central limit theorem (CLT) for weighted sums are well known. For example, a weak LLN follows from Chebyshev’s inequality, and a CLT follows from the Lindeberg condition (see, e.g., [11, Sect. VIII.4, Theorem 3]).

The outline of this note is as follows. In Sect. 2, we state our main results and discuss their relation to prior work. In Sect. 3, we prove the claimed LDP. In Sect. 4, we establish that Cramér’s theorem is atypical, and also comment on a generalization that is considered in [13].

## 2 Main Results

For \(\sigma \)-a.e. \(\theta \in \mathbb {S}\), we prove a large deviation principle for the sequence \((W_\theta ^{(n)})_{n\in \mathbb {N}}\) with a rate function that does not depend on \(\theta \). We refer to [9] for general background on large deviations. In particular, recall the following definition:

### Definition 1

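A sequence of probability measures \((\mu _n)_{n\in \mathbb {N}}\) on \(\mathbb {R}\) is said to satisfy a *large deviation principle (LDP)* with a *rate function* \(\mathbb {I}:\mathbb {R}\rightarrow [0,\infty ]\) if \(\mathbb {I}\) is lower semicontinuous, and for all Borel measurable sets \(\varGamma \subset \mathbb {R}\),
$$\begin{aligned} -\inf _{x\in \varGamma ^\circ }\mathbb {I}(x) \le \liminf _{n\rightarrow \infty }\frac{1}{n}\log \mu _n(\varGamma ) \le \limsup _{n\rightarrow \infty }\frac{1}{n}\log \mu _n(\varGamma ) \le -\inf _{x\in \bar{\varGamma }}\mathbb {I}(x), \end{aligned}$$
where \(\varGamma ^\circ \) and \(\bar{\varGamma }\) denote the interior and the closure of \(\varGamma \), respectively. Furthermore, \(\mathbb {I}\) is said to be a *good rate function* if it has compact level sets.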

We say the sequence of \(\mathbb {R}\)-valued random variables \((\xi _n)_{n\in \mathbb {N}}\) satisfies an LDP if the sequence of laws \((\mu _n)_{n\in \mathbb {N}}\) given by \(\mu _n = \mathbb {P}\circ \xi _n^{-1}\) satisfies an LDP.
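To see Definition 1 at work in the simplest case, consider Rademacher variables with \(\mathbb {P}(X_1=1)=\mathbb {P}(X_1=-1)=1/2\). This distribution is chosen here purely for illustration. Then \(\varLambda (t)=\log \cosh t\), and the tail probability of the empirical mean can be computed exactly from the binomial distribution. The sketch below checks that \(-\frac{1}{n}\log \mathbb {P}(W_\iota ^{(n)}\ge w)\) approaches \(\varLambda ^*(w)\), as Cramér’s theorem (Theorem 1 below) predicts. The function names are hypothetical helpers.

```python
import math

def cramer_rate_rademacher(w):
    """Legendre transform of Lambda(t) = log cosh t, for |w| < 1."""
    return 0.5 * (1 + w) * math.log(1 + w) + 0.5 * (1 - w) * math.log(1 - w)

def log_tail_prob(n, w):
    """Exact log P((X_1 + ... + X_n) / n >= w) for i.i.d. Rademacher X_i.

    Uses S_n = 2K - n with K ~ Binomial(n, 1/2), and log-sum-exp for stability.
    """
    k0 = math.ceil(n * (1 + w) / 2)
    logs = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1) - n * math.log(2)
        for k in range(k0, n + 1)
    ]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

w = 0.5
for n in (100, 1000, 4000):
    # -(1/n) log P(mean >= w) approaches the Cramér rate Lambda^*(w) from above
    print(n, -log_tail_prob(n, w) / n, cramer_rate_rademacher(w))
```

The approach is from above because the Chernoff bound \(\mathbb {P}(W_\iota ^{(n)}\ge w)\le e^{-n\varLambda ^*(w)}\) holds for every *n*. The gap shrinks at rate \(O(\log n/n)\).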

In particular, for empirical means of i.i.d. random variables, we recall the following classical result, due to [5, 7].

### Theorem 1

**(Cramér)** Let \((X_n)_{n\in \mathbb {N}}\) be an i.i.d. sequence such that (3) holds, and let \(\iota =(\iota ^{(1)},\iota ^{(2)},\dots )\) be defined as in (1). Then the sequence \((W_\iota ^{(n)})_{n\in \mathbb {N}}\) of (2) satisfies an LDP with the good rate function \(\mathbb {I}_\iota \), given by
$$\begin{aligned} \mathbb {I}_\iota (w) = \varLambda ^*(w), \quad w\in \mathbb {R}. \end{aligned}$$(6)

### Lemma 1

*f*, and that there exist \(p\in (1,\infty )\) and constants \(0< C_1, C_2, C_3 < \infty \) such that for \(|x| > C_1\), we have

### Proof

*p* and \(\tfrac{p}{p-1}\), for \(\varepsilon > 0\) and \(t,y\in \mathbb {R}\),

*C* absorb all constants, and note that for \(0< \varepsilon < C_3p\) and \(t\in \mathbb {R}\),

### Theorem 2

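**(Weighted LDP)** Assume (H1) and (H2). Then, for \(\sigma \)-a.e. \(\theta \in \mathbb {S}\), the sequence \((W_\theta ^{(n)})_{n\in \mathbb {N}}\) of (5) satisfies an LDP with the convex good rate function \(\mathbb {I}_\sigma \), given by
$$\begin{aligned} \mathbb {I}_\sigma (w) = \varPsi ^*(w) = \sup _{t\in \mathbb {R}}\,\{tw - \varPsi (t)\}, \quad w\in \mathbb {R}, \end{aligned}$$(8)
where \(\varPsi (t) \displaystyle \mathop {=}^{\cdot }\mathbb {E}[\varLambda (tZ)]\), \(t\in \mathbb {R}\), with \(Z\sim N(0,1)\).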

The proof of Theorem 2 is given in Sect. 3, with intermediate steps established in Sects. 3.1 and 3.2, and the proof completed in Sect. 3.3.

Let \(\mathcal {T}\) denote the *tail sigma-algebra* induced by \((\theta ^{(1)},\theta ^{(2)},\dots )\). The rate function \(\mathbb {I}_\sigma \) is measurable with respect to \(\mathcal {T}\), and the Kolmogorov zero-one law states that \(\mathcal {T}\) is trivial under the product measure. Hence, when \(\sigma \) is a product measure, the rate function coincides for \(\sigma \)-a.e. \(\theta \in \mathbb {S}\). However, our claim holds for general \(\sigma \) satisfying (H1). In particular, Example 2(b) in Sect. 3.1 gives an example of \(\sigma \) such that \(\theta ^{(1)},\theta ^{(2)},\dots \) are highly dependent and \(\mathcal {T}\) is not trivial. In that case, the lack of dependence of the rate function \(\mathbb {I}_\sigma \) on \(\theta \) is not a priori obvious.

### Theorem 3

**(Atypicality)** Assume \(\varLambda \) satisfies (H3), and let \(\mathbb {I}_\iota \) and \(\mathbb {I}_\sigma \) be given by (6) and (8), respectively.

- (a)
If \(\varLambda \circ \sqrt{\cdot }\) is concave on \(\mathbb {R}_+\), then \(\mathbb {I}_\sigma (w) \ge \mathbb {I}_\iota (w)\) for all \(w\in \mathbb {R}\).

- (b)
If \(\varLambda \circ \sqrt{\cdot }\) is convex on \(\mathbb {R}_+\), then \(\mathbb {I}_\sigma (w) \le \mathbb {I}_\iota (w)\) for all \(w\in \mathbb {R}\).

- (c)
If \(\varLambda \circ \sqrt{\cdot }\) is concave or convex, but not linear, on \(\mathbb {R}_+\), then \(\mathbb {I}_\sigma (w) = \mathbb {I}_\iota (w) < \infty \) if and only if \(w=0\).

The proof of Theorem 3 is given in Sect. 4.
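Theorem 3 can be explored numerically in a concrete case. For Rademacher \(X_1\), \(\varLambda (t)=\log \cosh t\). Here \(\varphi (k)=2k+1\) in Proposition 1(i) below, so \(\varLambda \circ \sqrt{\cdot }\) is concave and part (a) predicts \(\mathbb {I}_\sigma \ge \mathbb {I}_\iota \). The sketch assumes \(\varPsi (t)=\mathbb {E}[\varLambda (tZ)]\) with \(Z\sim N(0,1)\). It approximates this expectation by a trapezoid rule and takes the Legendre transform (8) on a finite grid. The grid sizes and the helper names are illustrative assumptions.

```python
import math

LOG2 = math.log(2.0)

def logcosh(x):
    """Numerically stable log(cosh x): the log mgf Lambda of a Rademacher variable."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - LOG2

def rademacher_rate(w):
    """Cramér rate I_iota(w) = Lambda^*(w) for Rademacher X_1, valid for |w| < 1."""
    return 0.5 * (1 + w) * math.log(1 + w) + 0.5 * (1 - w) * math.log(1 - w)

# Trapezoid rule for E[g(Z)], Z ~ N(0, 1), with g even: integrate over [0, U_MAX] and double.
U_MAX, N_NODES = 8.0, 800
H = U_MAX / N_NODES
NODES = [i * H for i in range(N_NODES + 1)]
WEIGHTS = [2.0 * H * math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi) for u in NODES]
WEIGHTS[0] /= 2.0   # trapezoid end-point weights
WEIGHTS[-1] /= 2.0

def psi(t):
    """Psi(t) = E[Lambda(t Z)] with Lambda = log cosh (assumed form of Psi)."""
    return sum(wt * logcosh(t * u) for u, wt in zip(NODES, WEIGHTS))

# For w >= 0 a grid of t >= 0 suffices: Psi >= 0 = Psi(0), so t < 0 gives t*w - Psi(t) <= 0.
T_GRID = [j * 0.01 for j in range(1001)]  # t in [0, 10]
PSI_VALUES = [psi(t) for t in T_GRID]

def rate_sigma(w):
    """Grid approximation (from below) of I_sigma(w) = sup_t {t w - Psi(t)}, for w >= 0."""
    return max(t * w - p for t, p in zip(T_GRID, PSI_VALUES))

for t in (0.5, 1.0, 2.0):
    print("t =", t, "Psi:", psi(t), "Lambda:", logcosh(t))
for w in (0.3, 0.5, 0.7):
    print("w =", w, "I_sigma:", rate_sigma(w), "I_iota:", rademacher_rate(w))
```

Because \(\log \cosh x\le |x|\), we have \(\varPsi (t)\le t\,\mathbb {E}|Z|\). Hence \(\mathbb {I}_\sigma (w)=+\infty \) for \(|w|>\mathbb {E}|Z|=\sqrt{2/\pi }\approx 0.798\), while \(\mathbb {I}_\iota \) is finite for \(|w|<1\). For such *w*, the grid value only reflects the truncation \(t\le 10\).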

We now provide some sufficient conditions (established in [1]) for the convexity or concavity conditions of Theorem 3 to hold.

### Proposition 1

- (i)
Suppose \(\gamma \ne \delta _0\), the Dirac mass at 0. Define \(\varphi :\mathbb {N}\rightarrow \mathbb {R}\) by
$$\begin{aligned} \varphi (k) \displaystyle \mathop {=}^{\cdot }(2k+1) \frac{\mathbb {E}[|X_1|^{2k}]}{\mathbb {E}[|X_1|^{2k+2}]}, \quad k\in \mathbb {N}. \end{aligned}$$
If \(\varphi \) is non-decreasing (resp., non-increasing), then \(\varLambda \circ \sqrt{\cdot }\) is concave (resp., convex) on \(\mathbb {R}_+\).
- (ii)
Suppose \(\gamma \) has density *f* such that \(\log f \circ \sqrt{\cdot }\) is concave (resp., convex) on \(\mathbb {R}_+\). Then \(\varLambda \circ \sqrt{\cdot }\) is concave (resp., convex) on \(\mathbb {R}_+\).
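Criterion (i) can be checked mechanically from even moments. The sketch below uses exact rational arithmetic for three symmetric laws chosen for illustration: the standard Gaussian, the Rademacher distribution, and the uniform distribution on \([-1,1]\). The moment formulas are standard facts and do not come from the text. The helper names are hypothetical.

```python
from fractions import Fraction

def phi(even_moment, k):
    """phi(k) = (2k+1) E|X|^{2k} / E|X|^{2k+2} from Proposition 1(i)."""
    return (2 * k + 1) * even_moment(2 * k) / even_moment(2 * k + 2)

def gaussian_moment(m):
    """E|Z|^m = (m-1)!! for Z ~ N(0, 1) and even m >= 0."""
    out = Fraction(1)
    for j in range(m - 1, 0, -2):
        out *= j
    return out

def rademacher_moment(m):
    return Fraction(1)          # |X| = 1 almost surely

def uniform_moment(m):
    return Fraction(1, m + 1)   # X uniform on [-1, 1]

for name, mom in [("gaussian", gaussian_moment),
                  ("rademacher", rademacher_moment),
                  ("uniform", uniform_moment)]:
    print(name, [str(phi(mom, k)) for k in range(1, 6)])
```

For the Gaussian, \(\varphi \equiv 1\) is both non-decreasing and non-increasing, so \(\varLambda \circ \sqrt{\cdot }\) is both concave and convex, i.e., linear. The Rademacher and uniform laws give increasing \(\varphi \) (\(2k+1\) and \(2k+3\)), so by (i) \(\varLambda \circ \sqrt{\cdot }\) is concave in those cases.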

### Proof

Part (i) is established in Theorem 7 of [1]. Part (ii) follows from applying Theorem 12 of [1] with their *f* replaced by our \(f\circ \sqrt{\cdot }\), and noticing that the integrability of \(f\circ \sqrt{\cdot }\) follows from the fact that *f* has finite first moment, due to the exponential moment condition of (3). \(\square \)

### Example 1

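Suppose \(\gamma \) is the *generalized normal distribution* with location 0, scale \(\alpha > 0\), and shape \(\beta > 1\); that is, \(\gamma =\mu _{\alpha ,\beta }\), where
$$\begin{aligned} \mu _{\alpha ,\beta }(dx) = \frac{\beta }{2\alpha \Gamma (1/\beta )}\, e^{-(|x|/\alpha )^\beta }\,dx. \end{aligned}$$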

The preceding example suggests the particular role of the Gaussian, which corresponds to \(\beta =2\). In particular, \(\gamma =\mu _{\alpha ,2}\) for some \(\alpha > 0\) if and only if \(\varLambda \circ \sqrt{\cdot }\) is linear. Thus, we could interpret the conditions of Theorem 3 as evaluating whether our distribution of interest is “more” or “less” log-concave than the Gaussian. We also have the following result in the Gaussian case (i.e., when \(\gamma =\mu _{\alpha ,2}\)), which holds for *all* \(\theta \) as opposed to just for \(\sigma \)-a.e. \(\theta \).
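For the generalized normal family of Example 1, the case distinction in Proposition 1(ii) can be made explicit. Assume the standard parametrization, in which \(\mu _{\alpha ,\beta }\) has density \(f(x) = \frac{\beta }{2\alpha \Gamma (1/\beta )}e^{-(|x|/\alpha )^\beta }\). A direct computation then gives, for \(x>0\),
$$\begin{aligned} \log f(\sqrt{x}) = \log \frac{\beta }{2\alpha \Gamma (1/\beta )} - \frac{x^{\beta /2}}{\alpha ^\beta }, \qquad \frac{d^2}{dx^2}\log f(\sqrt{x}) = -\frac{\beta }{2}\left( \frac{\beta }{2}-1\right) \frac{x^{\beta /2-2}}{\alpha ^\beta }. \end{aligned}$$
Hence \(\log f\circ \sqrt{\cdot }\) is convex for \(1<\beta <2\), linear for \(\beta =2\), and concave for \(\beta >2\). Combine this with Proposition 1(ii) and Theorem 3, provided the remaining hypotheses of Theorem 3 hold. The result suggests \(\mathbb {I}_\sigma \le \mathbb {I}_\iota \) when \(1<\beta <2\) and \(\mathbb {I}_\sigma \ge \mathbb {I}_\iota \) when \(\beta >2\). When \(\beta \ne 2\), finite equality holds only at \(w=0\).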

### Proposition 2

Suppose \(\gamma = \mu _{\alpha ,2}\) for some \(\alpha > 0\). Then, for all \(\theta \in \mathbb {S}\), the sequence \((W_\theta ^{(n)})_{n\in \mathbb {N}}\) satisfies an LDP with the good rate function \(\varPsi ^*(w) = \varLambda ^*(w) = (w/\alpha )^2\), where \(\varLambda ^*\) is defined in (4) with \(\varLambda \) the log mgf of the Gaussian with mean 0 and variance \(\alpha ^2/2\).

### Proof

This follows from the fact that for all \(n\in \mathbb {N}\), the Gaussian measure on \(\mathbb {R}^n\) is spherically symmetric, and hence, for any \(\theta ^{(n)} \in \mathbb {S}^{n-1}\), the law of \(\langle X^{(n)}, \theta ^{(n)}\rangle _n\) is the same as the law of \(\langle X^{(n)}, \iota ^{(n)}\rangle _n\). Thus, the LDP for \((W_\theta ^{(n)})_{n\in \mathbb {N}}\) follows from the classical Cramér’s theorem for empirical means of i.i.d. Gaussians, for which the rate function can be easily computed to be \(\varLambda ^*(w) = (w/\alpha )^2\). \(\square \)
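As a sanity check on the closed form in Proposition 2, note that the log mgf of \(N(0,\alpha ^2/2)\) is \(\varLambda (t)=\alpha ^2t^2/4\). The Legendre transform (4) of this function can be approximated by a maximum over a grid. The sketch below compares that approximation with \((w/\alpha )^2\). The value \(\alpha =1.5\), the grid parameters, and the helper names are arbitrary illustrative choices.

```python
def gaussian_log_mgf(t, alpha):
    """log E[exp(t X)] for X ~ N(0, alpha**2 / 2)."""
    return alpha ** 2 * t ** 2 / 4.0

def legendre_on_grid(f, w, t_max=20.0, steps=40000):
    """Approximate sup_t {t w - f(t)} by a maximum over an even grid on [-t_max, t_max]."""
    h = 2.0 * t_max / steps
    return max(t * w - f(t) for t in (-t_max + k * h for k in range(steps + 1)))

alpha = 1.5
for w in (-1.0, 0.0, 0.7, 2.0):
    approx = legendre_on_grid(lambda t: gaussian_log_mgf(t, alpha), w)
    print(w, approx, (w / alpha) ** 2)  # grid value vs. closed form (w / alpha)^2
```

The maximizer is \(t^*=2w/\alpha ^2\), so the grid range only needs to contain it. The grid error is of order \(\alpha ^2h^2\) for grid spacing *h*.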

### Remark 2

It is not clear whether a converse of Proposition 2 holds, that is, whether \(\mathbb {I}_\sigma \equiv \mathbb {I}_\iota \) implies that \(\gamma \) is Gaussian. As one possible approach in this direction, it would be sufficient to show that for any measure \(\gamma \) satisfying both (H2) and (H3) (and possibly some additional natural conditions), the function \(\varLambda \circ \sqrt{\cdot }\) must be either concave or convex.

*n*. The following result states that under certain tail conditions, such normalized projections yield a trivial LDP, again with a rate function different from \(\mathbb {I}_\sigma \).

### Proposition 3

### Proof

*r* of (H2′) satisfies \(r<2\) by assumption, we have \(\lim _{n\rightarrow \infty }\varLambda _n (t) = 0\) for all \(t\in \mathbb {R}\). Thus, by the Gärtner–Ellis theorem, the sequence \((W_{e_1}^{(n)})_{n\in \mathbb {N}}\) satisfies an LDP with good rate function \(0^*=\chi _0\). \(\square \)

### 2.1 Relation to Prior Work

There is a wealth of literature on large deviations for weighted sums, but our work seems to be the first to emphasize the unique position of Cramér’s theorem in the geometric setting. Moreover, it appears that none of the existing literature is readily adaptable to our particular problem. We offer a brief (and necessarily incomplete) survey of existing results.

*n*. However, this does not address our setting: if we let \(a_{nk} = \theta ^{(n)}_k\), we have \(\sum _{k=1}^n a_{nk}^2 = 1\), but this only yields tail bounds of the form \( \mathbb {P}( W_\theta ^{(n)} > cn^{-1/2}\sum _{k=1}^n \theta _k^{(n)})\), as opposed to the desired asymptotics for \(\mathbb {P}( W_\theta ^{(n)} > c)\). Furthermore, Book does not establish an LDP or identify a rate function.

*n* and *k*, respectively. For \(Z\sim N(0,1)\), we have the following correspondence:

Yet more recently, [16] proves an LDP for weighted empirical means similar to (5), except with weights that are uniformly bounded (in *n*). Our results correspond to *unbounded* weights \(\sqrt{n}\theta ^{(n)}_i\) which are not covered by their results. Similarly, [6] proves an LDP for empirical means of certain bounded functionals, which again fails to apply to our unbounded weights.

In the context of information theory, [8] states an LDP for sums of the form \(\frac{1}{n}\sum _{i=1}^n \rho (x_i,Y_i)\), where \((x_i)_{i\in \mathbb {N}}\) are “weights,” \((Y_i)_{i\in \mathbb {N}}\) is a sequence of random variables satisfying certain mixing properties, and \(\rho :\mathcal {X}\times \mathcal {Y}\rightarrow \mathbb {R}_+\) for Polish spaces \(\mathcal {X}\) and \(\mathcal {Y}\). The LDP is stated in the form of a *generalized asymptotic equipartition property* for “distortion measures.” However, note that \(\rho \) is assumed to be nonnegative, so a function like \(\rho (x,y)=xy\) (corresponding to projections) does not fit within the setting of [8]. Moreover, their weights \((x_i)_{i\in \mathbb {N}}\) are assumed to be a realization of a stationary ergodic process, which is not the case for our weights \(\sqrt{n}\theta ^{(n)}\) that are drawn from the scaled sphere \(\sqrt{n}\mathbb {S}^{n-1}\). This lends our work a geometric rather than information-theoretic interpretation.

The paper [12], co-authored by the first and third authors of this work, also analyzes weighted sums of i.i.d. random variables, but there the emphasis is on sums of subexponential random variables, rather than the weights themselves.

The most closely related work to our own is the recent work of [4], which gives strong large deviations (i.e., refined asymptotics) for weighted sums of i.i.d. random variables and i.i.d. weights, conditioned on the weights. Our weights \(\sqrt{n}\theta ^{(n)}_i\) are not i.i.d., but in Sects. 3.1 and 3.2, we prove that Theorem 2 can be reduced to an LDP for the sequence \((\widehat{W}_z^{(n)})_{n\in \mathbb {N}}\) defined in (12), which is a weighted sum of i.i.d. random variables, conditional on the weights. With some additional calculations from this point, the rate function \(\mathbb {I}_\sigma \) of Theorem 2 could then be deduced from the conditional LDP of [4], stated in their Theorem 1.6 with rate function defined in their Eq. (1.13). Note that condition (iii) of their Theorem 1.6 has two parts, but our integrability condition (H2) corresponds only to their first part. In fact, Lemma 4 shows that their second part is implied by our condition (15), which is weaker than (H2), and thus need not be assumed separately. Moreover, our research (completed independently) differs due to our emphasis on a geometric point of view; as a consequence, we can explicitly identify a rate function \(\mathbb {I}_\sigma \) and highlight the atypical position occupied by Cramér’s theorem.

Lastly, the method we use is a simplification of those developed in a companion paper [13], where we consider normalized projections of certain *non*-product measures, as well as projections in random directions.

## 3 The \(\sigma \)-Almost Everywhere LDP

### 3.1 The Surface Measure on \(\mathbb {S}^{n-1}\)

*n*th row of \(\mathcal {R}(z)\) is

*n*th row map such that \(\bar{\pi }_n(z) = z^{(n)}\). Let \(\nu \) denote the standard Gaussian measure on \(\mathbb {R}\), so that \(\nu ^{\otimes n}\) is the standard Gaussian measure on \(\mathbb {R}^n\).

### Lemma 2

### Proof

Both results are merely a restatement of the well known fact that if \(Z^{(n)}\) has the *n*-dimensional standard Gaussian distribution, then \(Z^{(n)}/\Vert Z^{(n)}\Vert _n\) is uniformly distributed on the unit sphere \(\mathbb {S}^{n-1}\), and independent of \(\Vert Z^{(n)}\Vert _n\). \(\square \)
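The fact used in this proof also gives a simple way to sample directions from the uniform measure on \(\mathbb {S}^{n-1}\): normalize a standard Gaussian vector. The sketch below, with a hypothetical helper `uniform_on_sphere`, checks two consequences. Every sample has unit norm, and by symmetry \(\mathbb {E}[(\theta ^{(n)}_1)^2]=1/n\).

```python
import math
import random

def uniform_on_sphere(n, rng):
    """Sample from the uniform (surface) measure on S^{n-1} as Z / ||Z||_n, Z ~ N(0, I_n)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]

rng = random.Random(1)
n, reps = 50, 10000
samples = [uniform_on_sphere(n, rng) for _ in range(reps)]

# Each sample lies on the sphere, and by symmetry E[theta_1^2] = 1/n.
max_norm_error = max(abs(math.sqrt(sum(v * v for v in s)) - 1.0) for s in samples)
mean_sq_first = sum(s[0] ** 2 for s in samples) / reps
print(max_norm_error, mean_sq_first, 1.0 / n)
```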

### Example 2

- (a)
Consider the completely independent case, where the elements \(Z_i^{(n)}\), \(i=1,\dots , n\), \(n\in \mathbb {N}\), are all independent. Then the law of \(\mathcal {R}(Z)\) is the product measure \(\sigma = \bigotimes _{n\in \mathbb {N}} \sigma _{n-1}\), under which the rows \(\theta ^{(n)}\) of \(\theta \) are independent. As previously noted, the tail sigma-algebra \(\mathcal {T}\) induced by the rows (defined in (9)) is trivial in this case due to the Kolmogorov zero-one law.

- (b)
Alternatively, consider the following highly dependent case: let \(\zeta \in \mathcal {P}(\mathbb {A})\) satisfy (11) such that for \(\zeta \)-a.e. \(z\in \mathbb {A}\), we have \(z_i^{(n)} = z_i^{(m)}\) for all \(i\in \mathbb {N}\) and \(m,n\ge i\) (i.e., constant within columns). Then, let \(\sigma = \zeta \circ \mathcal {R}^{-1}\), so that \(\sigma \) satisfies (H1) by Lemma 2. In this case, there is strong dependence across rows, which precludes a claim regarding triviality of the tail sigma-algebra \(\mathcal {T}\) induced by the rows. In fact, consider the event
$$\begin{aligned} A \displaystyle \mathop {=}^{\cdot }\left\{ \theta \in \mathbb {S}: \lim _{n\rightarrow \infty } \sqrt{n} \theta ^{(n)}_1 > 0 \right\} . \end{aligned}$$
Note that *A* is measurable with respect to \(\mathcal {T}\). However, due to the strong law of large numbers (\(\zeta \)-a.e., as stated precisely in (14)), we also have
$$\begin{aligned} \sigma (A) = \zeta \left( z\in \mathbb {A} : \lim _{n\rightarrow \infty } \sqrt{n} \,z_1^{(n)} / \Vert z^{(n)}\Vert _{n}> 0\right) = \zeta \left( z\in \mathbb {A} : z_1^{(1)} > 0 \right) = \frac{1}{2}. \end{aligned}$$
That is, \(\mathcal {T}\) is non-trivial, and so \(\mathbb {I}_\sigma \) cannot *a priori* be declared \(\sigma \)-a.e. constant through a simple analysis of the tail sigma-algebra.
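The key step in Example 2(b) is that \(\sqrt{n}\,\theta ^{(n)}_1\to z_1^{(1)}\). This can be observed in a simulation of the constant-column construction. The sketch assumes one realization of \(\zeta \) in which a single standard Gaussian \(g_i\) fills column *i*, so that every row \(z^{(n)}=(g_1,\dots ,g_n)\) has i.i.d. \(N(0,1)\) entries. This specific construction and the helper name are illustrative assumptions.

```python
import math
import random

rng = random.Random(2)
N = 200000
g = [rng.gauss(0.0, 1.0) for _ in range(N)]  # column values: z_i^{(n)} = g[i-1] for every n >= i

# Running sums of squares give ||z^{(n)}||_n^2 for every row n at once.
sq_sums = []
acc = 0.0
for v in g:
    acc += v * v
    sq_sums.append(acc)

def sqrt_n_theta_1(n):
    """sqrt(n) * theta_1^{(n)} where theta^{(n)} = z^{(n)} / ||z^{(n)}||_n."""
    return math.sqrt(n) * g[0] / math.sqrt(sq_sums[n - 1])

for n in (10, 1000, 100000, N):
    print(n, sqrt_n_theta_1(n), g[0])  # the limit should be z_1^{(1)} = g[0]
```

The sign of the limit is the sign of \(g_1\), which is positive with probability 1/2. This matches \(\sigma (A)=1/2\).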

### Remark 3

We assume the condition (H1) not in an attempt to be as general as possible, but rather to point out that the universality of the rate function is a genuinely interesting phenomenon. Specifically, if we only consider the independent case of Example 2(a), then the fact that \(\mathbb {I}_\sigma \) is “universal” (in that it does not depend on \(\theta \)) is a consequence of the fact that the tail sigma-algebra \(\mathcal {T}\) is trivial. However, Example 2(b) shows that universality of the rate function is a more general phenomenon that holds even when \(\mathcal {T}\) is non-trivial. The condition (H1) only imposes constraints on the “marginal” distribution of the *n*th row of the array \(\theta \), and imposes no restrictions on the dependence across different rows \(\theta ^{(n)}\), \(n\in \mathbb {N}\). In fact, for \(Z\sim \zeta \) satisfying (11), the elements of *Z* need not even be *jointly* Gaussian in order for the law of \(\mathcal {R}(Z)\) to satisfy (H1).

### 3.2 Exponential Equivalence

As a consequence of Lemma 2 and the equality in (13), we can replace \(\sigma \)-a.e. statements about \(W_\theta ^{(n)}\), \(n\in \mathbb {N}\), with \(\zeta \)-a.e. statements about \((\sqrt{n}/ \Vert z^{(n)}\Vert _n)\widehat{W}_z^{(n)}\), \(n\in \mathbb {N}\). In this section, we go further and explain why in the large deviations setting, we can ignore the contribution of the multiplicative factor \(\sqrt{n}/\Vert z^{(n)}\Vert _n\). That is, we show that such a factor yields an exponentially equivalent sequence, defined as follows.

### Definition 2

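Two sequences of \(\mathbb {R}\)-valued random variables \((\xi _n)_{n\in \mathbb {N}}\) and \((\tilde{\xi }_n)_{n\in \mathbb {N}}\), defined on a common probability space, are said to be *exponentially equivalent* if for every \(\delta > 0\),
$$\begin{aligned} \limsup _{n\rightarrow \infty }\frac{1}{n}\log \mathbb {P}\left( |\xi _n - \tilde{\xi }_n| > \delta \right) = -\infty . \end{aligned}$$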

### Proposition 4

([9]) If \((\xi _n)_{n\in \mathbb {N}}\) is a sequence of random variables that satisfies an LDP with good rate function \(\mathbb {I}\), and \((\tilde{\xi }_n)_{n\in \mathbb {N}}\) is another sequence that is exponentially equivalent to \((\xi _n)_{n\in \mathbb {N}}\), then \((\tilde{\xi }_n)_{n\in \mathbb {N}}\) satisfies an LDP with good rate function \(\mathbb {I}\).

### Lemma 3

### Proof

### 3.3 Proof of the LDP for \((W_\theta ^{(n)})_{n\in \mathbb {N}}\)

We aim to prove an LDP for the sequence \((\widehat{W}_z^{(n)})_{n\in \mathbb {N}}\); that is, an LDP for sums of independent but not identically distributed random variables (where the lack of identical distribution comes from the inhomogeneous weights \(z_i^{(n)}\) within the sum). The Gärtner–Ellis theorem (recalled below) is well suited for such an LDP.

### Theorem 4

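**(Gärtner–Ellis)** Let \((\xi _n)_{n\in \mathbb {N}}\) be a sequence of \(\mathbb {R}\)-valued random variables. Suppose that the *limit log mgf* \(\bar{\varLambda }:\mathbb {R}\rightarrow \mathbb {R}\) defined by
$$\begin{aligned} \bar{\varLambda }(t) \displaystyle \mathop {=}^{\cdot }\lim _{n\rightarrow \infty }\frac{1}{n}\log \mathbb {E}\left[ e^{nt\xi _n}\right] , \quad t\in \mathbb {R}, \end{aligned}$$
exists and is finite for all \(t\in \mathbb {R}\), and that \(\bar{\varLambda }\) is differentiable on \(\mathbb {R}\). Then \((\xi _n)_{n\in \mathbb {N}}\) satisfies an LDP with the good rate function \(\bar{\varLambda }^*\).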

For a proof of Theorem 4, we refer to [10, Theorem V.6], which also includes a more general version of the Gärtner–Ellis theorem that applies even if \(\bar{\varLambda }\) is finite for only *some*\(t\in \mathbb {R}\) (under mild additional conditions).

The following lemma establishes a property of \(\varPsi \) that will be used in the application of the Gärtner–Ellis theorem.

### Lemma 4

### Proof

*t* follows from the differentiability of \(t\mapsto \varLambda (tu)\) for all \(u\in \mathbb {R}\), and an application of the dominated convergence theorem with the dominating function

*u*, it follows that \(g_t\) is also integrable for every \(t\in \mathbb {R}\). \(\square \)

### Proof of Theorem 2

We proceed by proving the following modified claim (obtained by interchanging the quantifiers in our original claim): for each \(t\in \mathbb {R}\), for \(\zeta \)-a.e. \(z\in \mathbb {A}\), the expression (17) holds. Note that if *z* were an i.i.d. sequence instead of a triangular array, our modified claim would follow from the usual strong law of large numbers. However, the strong LLN does not necessarily extend to empirical means of rows of i.i.d. random variables in a triangular array (see, e.g., [18, Example 5.41]). On the other hand, suppose the common distribution of the i.i.d. elements has a finite fourth moment. In our case, these elements are the random variables \(\varLambda (tz_i^{(n)})\), \(i=1,\dots ,n\), \(n\in \mathbb {N}\). Then the strong LLN follows from a standard argument combining a fourth-moment bound with the Borel–Cantelli lemma [18, p. 113, (i)]. Due to our assumption (H2), it follows that for all \(t\in \mathbb {R}\), for \(\zeta \)-a.e. \(z\in \mathbb {A}\), the limit (17) holds.

Next, we aim to interchange the quantifiers to establish the original claim. Note that for each \(n\in \mathbb {N}\), \(\varLambda _{n,z}\) of (16) is a convex function (since it is a sum of convex functions). Now, let \(T\subset \mathbb {R}\) be countable and dense. Then, by countable subadditivity, for \(\zeta \)-a.e. \(z\in \mathbb {A}\), the convex functions \(\varLambda _{n,z}\) converge pointwise as \(n\rightarrow \infty \) to \(\varPsi \) at every *t* in the dense subset \(T\subset \mathbb {R}\). Hence, the convex analytic considerations of [17, Theorem 10.8] imply that the pointwise convergence of \(\varLambda _{n,z}(t)\) to \(\varPsi (t)\) holds for all \(t\in \mathbb {R}\). That is, for \(\zeta \)-a.e. \(z\in \mathbb {A}\), for all \(t\in \mathbb {R}\), the limit (17) holds, proving our original claim.

Since (H2) holds, \(\varPsi (t)<\infty \) for all \(t\in \mathbb {R}\) and, because (15) follows trivially from (H2), Lemma 4 implies that \(\varPsi \) is differentiable on \(\mathbb {R}\). Therefore, by the Gärtner–Ellis Theorem (Theorem 4), for \(\zeta \)-a.e. \(z\in \mathbb {A}\), the sequence \((\widehat{W}_z^{(n)})_{n\in \mathbb {N}}\) satisfies an LDP with good rate function \(\varPsi ^*\). \(\square \)
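The row-wise strong LLN used in this proof can be watched numerically: \(\varLambda _{n,z}(t)\to \varPsi (t)\) along independent Gaussian rows. The sketch assumes, as the argument indicates, that \(\varLambda _{n,z}(t)=n^{-1}\sum _{i=1}^n\varLambda (tz_i^{(n)})\) and \(\varPsi (t)=\mathbb {E}[\varLambda (tZ)]\) with \(Z\sim N(0,1)\). It uses the hypothetical choice \(\varLambda =\log \cosh \) (Rademacher \(X_1\)). For this choice, \(\varLambda (tZ)\) has moments of all orders because \(\log \cosh x\le |x|\).

```python
import math
import random

LOG2 = math.log(2.0)

def logcosh(x):
    """Stable log(cosh x): the log mgf Lambda of a Rademacher variable (illustrative choice)."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - LOG2

def psi(t, u_max=8.0, nodes=1600):
    """Trapezoid approximation of Psi(t) = E[log cosh(t Z)], Z ~ N(0, 1), using evenness in Z."""
    h = u_max / nodes
    total = 0.0
    for i in range(nodes + 1):
        u = i * h
        weight = 0.5 if i in (0, nodes) else 1.0
        total += weight * logcosh(t * u) * math.exp(-u * u / 2.0)
    return 2.0 * h * total / math.sqrt(2.0 * math.pi)

def lambda_nz(t, row):
    """(1/n) * sum_i Lambda(t * z_i^{(n)}) for one row of the triangular array."""
    return sum(logcosh(t * z) for z in row) / len(row)

rng = random.Random(3)
t = 1.0
target = psi(t)
results = {}
for n in (10, 100, 1000, 10000, 100000):
    row = [rng.gauss(0.0, 1.0) for _ in range(n)]  # fresh row, independent of earlier rows
    results[n] = lambda_nz(t, row)
    print(n, results[n], target)
```

A single simulation cannot distinguish almost sure convergence from convergence in probability. It only shows the averages settling near \(\varPsi (t)\), with fluctuations of order \(n^{-1/2}\).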

## 4 Atypicality

In this section, we compare the rate function \(\mathbb {I}_\sigma \) with the Cramér rate function \(\mathbb {I}_\iota \). We first use Jensen’s inequality to compare the associated log mgfs \(\varPsi \) and \(\varLambda \).

### Lemma 5

- (a)
If \(\varLambda \circ \sqrt{\cdot }\) is concave on \(\mathbb {R}_+\), then \(\varPsi (t) \le \varLambda (t)\) for all \(t\in \mathbb {R}\).

- (b)
If \(\varLambda \circ \sqrt{\cdot }\) is convex on \(\mathbb {R}_+\), then \(\varPsi (t) \ge \varLambda (t)\) for all \(t\in \mathbb {R}\).

- (c)
If \(\varLambda \circ \sqrt{\cdot }\) is concave or convex, but not linear, on \(\mathbb {R}_+\), then \(\varLambda (t) = \varPsi (t)\) if and only if \(t=0\).
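Parts (a) and (b) rest on a Jensen step, sketched here. Assume, consistently with the proof of Theorem 2, that \(\varPsi (t) = \mathbb {E}[\varLambda (tZ)]\) with \(Z\sim N(0,1)\). The symmetry condition (H3) makes \(\varLambda \) even, so that \(\varLambda (s) = (\varLambda \circ \sqrt{\cdot })(s^2)\) for all \(s\in \mathbb {R}\). Then
$$\begin{aligned} \varPsi (t) = \mathbb {E}\left[ (\varLambda \circ \sqrt{\cdot })(t^2Z^2)\right] \le (\varLambda \circ \sqrt{\cdot })\left( t^2\,\mathbb {E}[Z^2]\right) = (\varLambda \circ \sqrt{\cdot })(t^2) = \varLambda (t), \end{aligned}$$
where the inequality is Jensen’s inequality for the concave function \(\varLambda \circ \sqrt{\cdot }\) on \(\mathbb {R}_+\), and \(\mathbb {E}[Z^2]=1\). If \(\varLambda \circ \sqrt{\cdot }\) is convex instead, the inequality is reversed.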

### Proof

Before we prove the theorem, we recall some basic facts about the log mgf of \(X_1\sim \gamma \). Let the *domain* of a function \(f:\mathbb {R}\rightarrow (-\infty ,\infty ]\) be the set \(D_f\displaystyle \mathop {=}^{\cdot }\{x\in \mathbb {R}: f(x)<\infty \}\). For a set \(D\subset \mathbb {R}\), let \(D^\circ \) denote the interior of *D*.

### Lemma 6

- 1.
\(\varLambda \) is lower semicontinuous;

- 2.
\(\varLambda \) is smooth in \(D_{\varLambda }^\circ \);

- 3.
\(\varLambda \) is convex;

- 4.
\(\varLambda \) is strictly convex in \(D_{\varLambda }^\circ \);

- 5.
\(\varLambda ^*\) is differentiable in \(D_{\varLambda ^*}^\circ \);

- 6.
for \(x\in D_{\varLambda ^*}^\circ \), the maximum in the definition of the Legendre transform is uniquely attained; that is, the following quantity is well defined:
$$\begin{aligned} t_x \displaystyle \mathop {=}^{\cdot }\arg \max _{t\in \mathbb {R}} \{tx - \varLambda (t)\}. \end{aligned}$$(18)

### Proof

These are mostly standard, but we provide sketches of the proofs. For 1., lower semicontinuity follows from Fatou’s lemma. For 2., smoothness follows from interchanging differentiation and expectation. Convexity in 3. and strict convexity in 4. follow from Hölder’s inequality. As for 5., it is classical that if a function is lower semicontinuous and strictly convex in the interior of its domain, then its Legendre transform is differentiable in the interior of its domain (see [17, Theorem 26.3]). Lastly, for 6., it is also classical that for \(x\in D_{\varLambda ^*}^\circ \), we have \(t_x = (\varLambda ^*)'(x)\) (see [17, Theorem 26.5]). \(\square \)

### Proof of Theorem 3

Assume without loss of generality that \(X_1\) is non-degenerate. If it were degenerate, then due to the symmetry condition (H3), the law of \(X_1\) would be \(\gamma = \delta _0\), in which case \(\varLambda = \varPsi = 0\). Therefore, \(\mathbb {I}_\sigma \) and \(\mathbb {I}_\iota \) would both equal the characteristic function (in the sense of convex analysis) of \(\{0\}\), which is equal to 0 at \(w=0\) and \(+\infty \) for all other *w*, and the result is trivial.

Suppose \(\varLambda \circ \sqrt{\cdot }\) is concave (the convex case is similar, but with inequalities reversed). Due to Lemma 5, we have \(\varPsi (t) \le \varLambda (t)\) for all \(t\in \mathbb {R}\), which due to the definition of the Legendre transform implies that \(\mathbb {I}_\sigma (w) = \varPsi ^*(w) \ge \varLambda ^*(w)= \mathbb {I}_\iota (w)\) for all \(w\in \mathbb {R}\), thus proving (a) (and (b) for the convex case).

### Remark 4

In this paper, we address the “atypical” nature of the directions \(\iota ^{(n)}= n^{-1/2}(1,1,\dots ,1)\) associated with Cramér’s theorem for large deviations of product measures. In fact, the notions of atypicality and universal rate function extend beyond the product case. In particular, the companion paper [13] establishes LDPs for random projections of random vectors distributed according to the uniform measure on \(\ell ^p\) balls, again with a rate function that coincides for \(\sigma \)-a.e. sequence of directions. The sequence of directions \(\iota ^{(n)}= n^{-1/2}(1,1,\dots ,1)\), \(n \in \mathbb {N}\), can be shown to be atypical in that setting as well.

## Notes

### Acknowledgments

NG and KR would like to thank ICERM, Providence, for an invitation to the program “Computational Challenges in Probability,” where some of this work was initiated. SSK and KR would also like to thank Microsoft Research New England for their hospitality during the Fall of 2014, when some of this work was completed. SSK was partially supported by a Department of Defense NDSEG fellowship. KR was partially supported by ARO grant W911NF-12-1-0222 and NSF grant DMS 1407504. The authors would like to thank an anonymous referee for helpful feedback on the exposition.

### References

- 1. F. Barthe, A. Koldobsky, Extremal slabs in the cube and the Laplace transform. Adv. Math. **174**(1), 89–114 (2003)
- 2. S.A. Book, Large deviation probabilities for weighted sums. Ann. Math. Stat. **43**(4), 1221–1234 (1972)
- 3. S.A. Book, A large deviation theorem for weighted sums. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete **26**(1), 43–49 (1973)
- 4. A. Bovier, H. Mayer, A conditional strong large deviation result and a functional central limit theorem for the rate function. ALEA Lat. Am. J. Probab. Math. Stat. **12**, 533–550 (2015)
- 5. H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Stat. **27**, 1–22 (1956)
- 6. Z. Chi, Stochastic sub-additivity approach to the conditional large deviation principle. Ann. Probab. **29**(3), 1303–1328 (2001)
- 7. H. Cramér, Sur un nouveau théorème-limite de la théorie des probabilités. Actualités Scientifiques et Industrielles **736**, 5–23 (1938)
- 8. A. Dembo, I. Kontoyiannis, Source coding, large deviations, and approximate pattern matching. IEEE Trans. Inf. Theory **48**(6), 1590–1615 (2002)
- 9. A. Dembo, O. Zeitouni, *Large Deviations Techniques and Applications*, 2nd edn. (Springer, Berlin, 1998)
- 10. F. den Hollander, *Large Deviations*. Fields Institute Monographs, vol. 14 (American Mathematical Society, Providence, 2008)
- 11. W. Feller, *An Introduction to Probability Theory and Its Applications*, vol. II (Wiley, New York, 1970)
- 12. N. Gantert, K. Ramanan, F. Rembart, Large deviations for weighted sums of stretched exponential random variables. Electron. Commun. Probab. **19**, 1–14 (2014)
- 13. N. Gantert, S.S. Kim, K. Ramanan, Large deviations for random projections of \(\ell ^p\) balls. Preprint (2015), arXiv:1512.04988
- 14. R. Kiesel, U. Stadtmüller, A large deviation principle for weighted sums of independent identically distributed random variables. J. Math. Anal. Appl. **251**(2), 929–939 (2000)
- 15. D.G. Luenberger, Quasi-convex programming. SIAM J. Appl. Math. **16**(5), 1090–1095 (1968)
- 16. J. Najim, A Cramér type theorem for weighted random variables. Electron. J. Probab. **7**(4), 1–32 (2002)
- 17. R.T. Rockafellar, *Convex Analysis*, vol. 28 (Princeton University Press, Princeton, 1970)
- 18. J.P. Romano, A.F. Siegel, *Counterexamples in Probability and Statistics* (CRC Press, Boca Raton, 1986)