1 Introduction

Sanov’s theorem is a well-known result in the theory of large deviations. It describes the large deviations behaviour of the empirical measure of a sequence of i.i.d. random variables and identifies its rate function as the relative entropy. This short note provides an alternative proof of this fact, by exploiting the metric structure of the weak topology together with the variational formulation of the relative entropy.

Formally, let \((M,\mathrm{d})\) be a Polish space and let \(\big (X_n \big )_{n \in {\mathbb {N}}}\) be a sequence of independent M-valued random elements identically distributed according to \(\mu \in \mathcal {P}(M)\), where \(\mathcal {P}(M)\) is the set of Borel probability measures on M. We denote by \(\delta _x\) the probability measure degenerate at \(x \in M\), and define the empirical measure of \(X_1, \dots , X_n\) by

$$\begin{aligned} L_n {:}{=} \frac{1}{n}\sum _{i=1}^n\delta _{X_i}. \end{aligned}$$
(1.1)
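
For readers who prefer a computational picture, the following Python sketch (an illustration only; the choices \(M = \mathbb {R}\), the standard normal law for \(\mu \), the sample size and the test sets are all arbitrary) draws an i.i.d. sample and evaluates the empirical measure \(L_n\) on a couple of Borel sets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choice: M = R with mu the standard normal law (any Polish space works).
n = 1000
sample = rng.standard_normal(n)      # X_1, ..., X_n, i.i.d. with law mu

def empirical_measure(points, borel_set):
    """L_n(A) = (1/n) * #{i : X_i in A}, with the set A given as a boolean predicate."""
    return float(np.mean(borel_set(np.asarray(points))))

# By the law of large numbers these values approach mu((-inf, 0]) = 0.5
# and mu([1, inf)) ~ 0.159 as n grows.
print(empirical_measure(sample, lambda x: x <= 0.0))
print(empirical_measure(sample, lambda x: x >= 1.0))
```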

Also, given \(\upsilon , \mu \in \mathcal {P}(M)\), the relative entropy of \(\upsilon \) with respect to \(\mu \) is defined as:

$$\begin{aligned} H(\upsilon |\mu ){:}{=}\sup \left\{ \int f \mathrm{d}\upsilon -\log \int e^f \mathrm{d}\mu ;\,f \text { is measurable and bounded} \right\} . \end{aligned}$$
(1.2)

Sanov’s theorem is given by the following statement.

Theorem 1.1

(Sanov) Let \(\big (X_n \big )_{n \in {\mathbb {N}}}\) be a sequence of i.i.d. random variables taking values in a Polish space \((M,\mathrm{d})\) with distribution \(\mu \in \mathcal {P}(M)\). The sequence of empirical measures \(\big (L_n\big )_{n \in {\mathbb {N}}}\) of \(\big (X_{n}\big )_{n \in {\mathbb {N}}}\) (defined in Eq. (1.1)) satisfies a large deviations principle on the space \({\mathcal {P}}(M)\), endowed with the weak topology, with rate function \(H( \,\cdot \, |\mu )\).

When the space M is finite, the theorem above can be proved in an elementary and elegant way (see Den Hollander [3, Theorem II.2], and Dembo and Zeitouni [2, Theorem 2.1.10]). In this work, we prove the theorem for general Polish metric spaces by extending this elementary proof via sequences of discretizations of the space. We split the set M into a finite number of subsets, each belonging to one of two distinct categories. The well-behaved sets are the ones with small diameter, while the badly behaved sets have small \(\mu \)-measure. We remark, though, that when the space M is compact, no badly behaved sets are necessary. These partitions define natural projections on the space and allow us to approximate the sequence \(\big (X_{n} \big )_{n \in {\mathbb {N}}}\) by variables in the discretized spaces, and consequently provide approximations for its empirical measures. The main technical observations are that the discretized relative entropy converges to the relative entropy (1.2) as we take finer partitions (see Lemma 4.1) and that the relative entropy is well approximated in balls (see Lemma 4.3).
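
To make the finite-space statement concrete, here is a small numerical illustration (a sketch of ours, not part of the proof; the choices \(M=\{0,1\}\), \(\mu = \mathrm{Bernoulli}(p)\) with \(p=0.3\), and the target frequency \(q=0.6\) are arbitrary): the exact probabilities \(\mathbb {P}(L_n(\{1\}) = k_n/n)\), with \(k_n/n \approx q\), decay at the exponential rate given by the relative entropy, as the theorem predicts.

```python
import numpy as np
from scipy.stats import binom

p, q = 0.3, 0.6   # mu = Bernoulli(p); the target empirical frequency of ones is q

def rate(q, p):
    """Relative entropy H(Bernoulli(q) | Bernoulli(p)) on the two-point space, cf. (2.9)."""
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

for n in [10, 100, 1000, 10000]:
    k = int(round(q * n))                 # number of ones, so that k/n is close to q
    log_prob = binom.logpmf(k, n, p)      # log P(L_n({1}) = k/n), computed exactly
    print(n, -log_prob / n, rate(q, p))   # the two columns approach each other as n grows
```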

Some ideas used to prove Lemma 4.3 are roughly inspired by the proof of the upper bound in Csiszár [1]. His work presents a proof of Sanov’s theorem for the \(\uptau \)-topology, a stronger topology than that of weak convergence, with an approach that differs greatly from more classical ones that can be found, for example, in [2, Theorem 6.2.10].

There are two proofs of Sanov’s theorem in [2], one by means of Cramér’s theorem for Polish spaces and the other following a projective limit approach. Although we make strong use of the metric structure of the space, our proof does not require deep knowledge of large deviations theory or general topology.

Organization of the paper. In the next section, we collect some preliminary notation and results that are used throughout the text. Section 3 introduces the discretization considered here. Section 4 contains the statement of the main lemmas used in the proof and shows how Sanov’s theorem follows from them. Sections 5 and 6 contain the proofs of Lemmas 4.1 and 4.3, respectively.

2 Preliminaries

In this section, we review some basic concepts. We provide the definition of large deviations principle and weak topology, and collect some properties of the relative entropy.

Definition 2.1

(Large deviations principle) A sequence \(\big ( \mathbb {P}_n \big )_{n \in {\mathbb {N}}}\) of probability measures on a metric space \(({\mathfrak {X}}, \mathrm{d}_{{\mathfrak {X}}})\) satisfies a large deviations principle with rate function I if

  1.

    (Lower bound) For any open set \(\mathcal {O} \subset {\mathfrak {X}}\),

    $$\begin{aligned} \liminf _{n \rightarrow \infty }\frac{1}{n} \log \mathbb {P}_n (\mathcal {O}) \ge -\inf _{x \in \mathcal {O}} I(x); \end{aligned}$$
    (2.1)
  2.

    (Upper bound) For any closed set \(\mathcal {C} \subset {\mathfrak {X}}\),

    $$\begin{aligned} \limsup _{n \rightarrow \infty }\frac{1}{n} \log \mathbb {P}_n(\mathcal {C}) \le -\inf _{x \in \mathcal {C}} I(x). \end{aligned}$$
    (2.2)

The weak topology on \(\mathcal {P}(M)\) is defined as the topology generated by the functionals:

$$\begin{aligned} \upsilon \mapsto \int \varphi \, \mathrm{d}\upsilon , \end{aligned}$$
(2.3)

where \(\varphi \in C_b(M)\) is a continuous bounded function.

A metric compatible with the weak topology is the bounded Lipschitz metric

$$\begin{aligned} \mathrm{d}_{\mathrm{BL}}(\mu , \upsilon ) = \sup \left\{ \left| \int f \, \mathrm{d}\upsilon - \int f \, \mathrm{d}\mu \right| : f \in BL(M) \right\} , \end{aligned}$$
(2.4)

where BL(M) is the class of 1-Lipschitz functions \(f : M \rightarrow \mathbb {R}\) bounded by one.
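
When the two measures are supported on a common finite set of points, the supremum in (2.4) becomes a linear program over the values of f at those points (any feasible vector of values extends to a function in BL(M)). The following sketch is only illustrative and not part of the argument; it computes \(\mathrm{d}_{\mathrm{BL}}\) in this way for two measures on three points of the real line, and the point set and weights are arbitrary choices.

```python
import numpy as np
from scipy.optimize import linprog

def d_BL_finite(points, mu_w, nu_w):
    """Bounded Lipschitz distance between two measures on a finite subset of R,
    computed as an LP over the values f_i = f(x_i), with |f| <= 1 and f 1-Lipschitz."""
    points = np.asarray(points, dtype=float)
    k = len(points)
    # Maximize sum_i f_i (nu_i - mu_i), i.e. minimize -(nu - mu) . f.
    c = -(np.asarray(nu_w, dtype=float) - np.asarray(mu_w, dtype=float))
    # Lipschitz constraints: f_i - f_j <= |x_i - x_j| for every ordered pair (i, j).
    rows, rhs = [], []
    for i in range(k):
        for j in range(k):
            if i != j:
                row = np.zeros(k)
                row[i], row[j] = 1.0, -1.0
                rows.append(row)
                rhs.append(abs(points[i] - points[j]))
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(-1.0, 1.0)] * k, method="highs")
    # The feasible set is symmetric under f -> -f, so the maximum equals the absolute value.
    return -res.fun

print(d_BL_finite([0.0, 0.5, 2.0], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]))
```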

For the next lemma, let \(x \wedge y\) denote the minimum between x and y.

Lemma 2.2

Let \((X, Y)\) be a coupling of two distributions \(\mu \) and \(\upsilon \). Then,

$$\begin{aligned} \mathrm{d}_{\mathrm{BL}}(\mu ,\upsilon )\le \mathbb {E}(\mathrm{d}(X,Y) \wedge 2). \end{aligned}$$
(2.5)

Proof

Let \(x, y \in M\) and notice that, for each \(f \in BL(M)\),

$$\begin{aligned} |f(x) -f(y)| \le \mathrm{d}(x,y) \wedge 2, \end{aligned}$$
(2.6)

since f is 1-Lipschitz and bounded by one. The proof is now complete by noting that

$$\begin{aligned} \left| \int f \, \mathrm{d}\upsilon - \int f \, \mathrm{d}\mu \right| = \big | \mathbb {E} \big (f(X)-f(Y) \big ) \big | \le \mathbb {E}(\mathrm{d}(X,Y) \wedge 2). \end{aligned}$$
(2.7)

\(\square \)

Equation (1.2) is called the variational formulation of the relative entropy, and it readily implies the so-called entropy inequality

$$\begin{aligned} \int f \mathrm{d}\upsilon \le H(\upsilon |\mu ) + \log \int e^f \mathrm{d}\mu , \end{aligned}$$
(2.8)

for any measurable bounded function f. We will also make use of the integral formulation of the relative entropy, provided in the next lemma. This formulation is the key tool used to relate the relative entropies of the discretized measures to the relative entropy on the general space.

Lemma 2.3

The variational formula (1.2) for the relative entropy is equivalent to the following integral formulation:

$$\begin{aligned} H(\upsilon |\mu )={\left\{ \begin{array}{ll} \int \frac{\mathrm{d}\upsilon }{\mathrm{d}\mu }\log \frac{\mathrm{d}\upsilon }{\mathrm{d}\mu } \mathrm{d}\mu , &{} \text { if } \upsilon \ll \mu , \\ +\infty , &{} \text { otherwise}. \end{array}\right. } \end{aligned}$$
(2.9)

We refrain from presenting the proof of the lemma above and refer the reader to [4, Theorem 5.2.1].
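
On a finite space both formulations are easy to evaluate numerically, and the following sketch (illustrative only; the three-point space and the weights are arbitrary) checks that maximizing the variational functional in (1.2) reproduces the value of the integral formula (2.9).

```python
import numpy as np
from scipy.optimize import minimize

def H_integral(nu, mu):
    """Relative entropy via (2.9) on a finite space; +inf unless nu << mu."""
    nu, mu = np.asarray(nu, float), np.asarray(mu, float)
    if np.any((mu == 0) & (nu > 0)):
        return np.inf
    mask = nu > 0
    return float(np.sum(nu[mask] * np.log(nu[mask] / mu[mask])))

def H_variational(nu, mu):
    """Relative entropy via (1.2): sup_f  int f dnu - log int e^f dmu,
    the supremum over all functions f on the finite space (numerical maximization)."""
    nu, mu = np.asarray(nu, float), np.asarray(mu, float)
    def neg_objective(f):
        return -(np.dot(f, nu) - np.log(np.dot(np.exp(f), mu)))
    res = minimize(neg_objective, x0=np.zeros(len(mu)), method="BFGS")
    return -res.fun

mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.2, 0.6]
print(H_integral(nu, mu))      # approximately 0.3948
print(H_variational(nu, mu))   # should agree up to optimizer tolerance
```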

3 Discretization

In this section, we present the discretization procedure used for the space M and related constructions for measures and random variables.

We start by discretizing the space. Let \(\mu \in \mathcal {P}(M)\) and recall that since M is a Polish space, there exists, for each \(m \in \mathbb {N}\), a compact set \(K_m\) with

$$\begin{aligned} \mu (K_{m}^{\complement }) \le \frac{e^{-m^{2}-1}}{m}. \end{aligned}$$
(3.1)

The support of the measure \(\mu \) is contained in the closure of the union of the compacts \(K_{m}\). Notice that the collection of probability measures supported on the closure of \(\cup _{m=1}^{\infty } K_{m}\) forms a closed subset of \(\mathcal {P}(M)\), and thus, it is enough to prove a large deviations principle for this subspace (see [2, Lemma 4.1.5]). We assume from now on that

$$\begin{aligned} M = \overline{\bigcup _{m=1}^{\infty } K_{m}}. \end{aligned}$$
(3.2)

Given a sequence of partitions \(\big ( \mathcal {A}_{m} \big )_{m \in \mathbb {N}}\), let \({\mathcal {F}}_{m}\) and \({\mathcal {F}}_{\infty }\) denote the \(\sigma \)-algebras generated by \({\mathcal {A}}_{m}\) and by the union \(\cup _{m=1}^{\infty }{\mathcal {A}}_{m}\), respectively. We write \({\mathcal {B}}(M)\) for the Borel \(\sigma \)-algebra in M.

Lemma 3.1

There exists a sequence of nested partitions \(\big ( \mathcal {A}_{m} \big )_{m \in \mathbb {N}}\) such that \(\mathcal {A}_{m} = \{ A_{m, 1}, \dots , A_{m, \ell _{m}} \}\) and

  • \(\mathrm{diam}(A_{m, i})<\frac{1}{m}\), if \(i=1,\dots , {\tilde{\ell }}_{m}\), for some \({\tilde{\ell }}_{m} \le \ell _{m}\).

  • \(K_{m}^{\complement } = \bigcup _{i={\tilde{\ell }}_{m}+1}^{\ell _{m}} A_{m, i}\).

  • \({\mathcal {F}}_{\infty } = {\mathcal {B}}(M)\).

Proof

Notice that if, for each m, we can construct a partition \(\mathcal {A}_{m}\) satisfying the three listed requirements of the lemma without requiring the sequence to be nested, then it is possible to take refinements in order to obtain a nested sequence.

Recall the definition of the compact set \(K_{m}\) in (3.1). By compactness, it is possible to partition \(K_{m}\) into subsets \(\{ C_{m, 1}, \dots , C_{m, {\bar{\ell }}_m} \}\) of diameter smaller than \(\frac{1}{m}\), so that \(\mathcal {C}_m=\{K_{m}^{\complement }, C_{m, 1}, \dots , C_{m, {\bar{\ell }}_m}\}\) defines a partition of M.

Consider an enumeration \(\big (B^{i}\big )_{i \in \mathbb {N}}\) of the balls of rational radius centred at the points of a countable dense subset of M. We now define the partition \(\mathcal {A}_{m}\) by intersecting the sets in \(\mathcal {C}_{m}\) with \(B^{m}\) and with its complement. We write

$$\begin{aligned} \mathcal {A}_{m} = \{ A_{m, 1}, \dots , A_{m, \ell _{m}} \}, \end{aligned}$$
(3.3)

where the sets \(A_{m, i}\) with \(i \le {\tilde{\ell }}_{m}\) are those contained in \(K_{m}\), and the sets \(A_{m, i}\) with \({\tilde{\ell }}_{m}+1 \le i \le \ell _{m}\) are those contained in \(K_{m}^{\complement }\).

Notice that the first two statements about the partition \(\mathcal {A}_{m}\) are immediately verified. To check the last claim, notice that \(B^{i} \in {\mathcal {F}}_{i}\), and thus \(B^{i} \in \mathcal {F}_{\infty }\), for all \(i \in \mathbb {N}\); since these balls generate the Borel \(\sigma \)-algebra, this implies \({\mathcal {F}}_{\infty } = {\mathcal {B}}(M)\) and concludes the proof. \(\square \)

We select a subset \(M_m{:}{=}\{a_{m, 1}, \dots , a_{m, \ell _{m}} \} \subset M\) such that \(a_{m, i} \in A_{m, i}\) for \(i=1, \dots ,\ell _m\) and turn \((\mathcal {A}_{m},M_{m})\) into a tagged partition. We will furthermore assume that \(M_m \subset M_{m+1}\).

For each \(m \in \mathbb {N}\), the tagged partition \((\mathcal {A}_{m}, M_{m})\) defines a natural projection \(\pi ^{m}: M \rightarrow M_{m}\) via

$$\begin{aligned} \pi ^m(x)=a_{m, i}, \text { if } x \in A_{m, i}. \end{aligned}$$
(3.4)

This allows us to define, for any measure \(\upsilon \in \mathcal {P}(M)\), its discretized version \(\upsilon ^{m} \in \mathcal {P}(M)\) as the probability measure supported in \(M_m\) given by the pushforward of \(\upsilon \) via the map \(\pi ^{m}\), i.e.

$$\begin{aligned} \upsilon ^{m}(a_{m, i})=\upsilon \big ( (\pi ^{m})^{-1}(a_{m, i}) \big ) = \upsilon (A_{m, i}), \text { for }i=1,\dots ,\ell _m. \end{aligned}$$
(3.5)

Random elements are also discretized with the aid of the projection maps \(\pi ^{m}\). If \(\big ( X_i \big )_{i \in \mathbb {N}}\) is an i.i.d. sequence of random elements with distribution \(\mu \in \mathcal {P}(M)\), then \(X_{i}^{m}=\pi ^m(X_i)\) yields an i.i.d. sequence of random elements distributed according to \(\mu ^{m}\).

The empirical measure for the discretized elements is given by

$$\begin{aligned} L_n^{m}{:}{=}\frac{1}{n}\sum _{i=1}^n\delta _{X_i^{m}}. \end{aligned}$$
(3.6)
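
The following sketch (illustrative only; it takes \(M = [0,1)\) with dyadic cells playing the role of the \(A_{m, i}\), left endpoints as tags, and \(\mu \) uniform, whereas the construction above uses the compact sets \(K_m\) and may contain badly behaved cells) shows the projection \(\pi ^m\), the discretized sample and the discretized empirical measure \(L_n^m\) in action.

```python
import numpy as np

rng = np.random.default_rng(1)

def project(x, m):
    """A toy projection pi^m on M = [0, 1): cells A_{m,i} = [i/2^m, (i+1)/2^m),
    tagged by their left endpoints a_{m,i} = i / 2^m."""
    return np.floor(np.asarray(x) * 2**m) / 2**m

n, m = 500, 3
sample = rng.random(n)                 # X_1, ..., X_n i.i.d. uniform on [0, 1)
discretized = project(sample, m)       # X_1^m, ..., X_n^m

# The discretized empirical measure L_n^m assigns to each tag the fraction
# of sample points falling in the corresponding cell, cf. (3.5)-(3.6).
tags, counts = np.unique(discretized, return_counts=True)
L_n_m = dict(zip(tags, counts / n))
print(L_n_m)   # each cell has mu-mass 1/8, so each weight is close to 0.125
```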

Since, for each \(m \in \mathbb {N}\), the elements \(X_{i}^{m}\) take values in the finite space \(M_{m}\), the finite-space case of Sanov’s theorem (see [2, Theorem 2.1.10]) ensures that the sequence of empirical measures \(\big ( L_n^m \big )_{n \in \mathbb {N}}\) satisfies a large deviations principle on the space \(\mathcal {P}(M_{m})\) with rate function \(H( \,\cdot \, |\mu ^m)\). Via [2, Lemma 4.1.5], we can extend these large deviations principles to the whole space \(\mathcal {P}(M)\) with rate function also given by \(H( \,\cdot \, |\mu ^m)\) (note that \(H( \upsilon |\mu ^m)\) is infinite if \(\upsilon \notin \mathcal {P}(M_{m})\)).

Lemma 2.3 yields the following expression for the rate function \(H( \upsilon |\mu ^m)\), when \(\upsilon \in \mathcal {P}(M_{m})\):

$$\begin{aligned} H(\upsilon |\mu ^m) = \sum _{a \in M_m}\upsilon (a)\log \frac{\upsilon (a)}{\mu ^m(a)}. \end{aligned}$$
(3.7)

4 Proof of Theorem 1.1

In this section, we present our approach to the proof of Sanov’s theorem. Our goal is to deduce that the empirical measures \(L_{n}\) given by (1.1) satisfy a large deviations principle from the information that the sequences \(\big ( L_n^m \big )_{n \in \mathbb {N}}\) satisfy large deviations principles, for all \(m \in \mathbb {N}\). Since the rate function given by Sanov’s theorem (Theorem 1.1) is the relative entropy, the following two lemmas that relate the entropies in discrete and Polish spaces are the central pieces in our proof.

Lemma 4.1

For any two probability measures \(\mu , \upsilon \in \mathcal {P}(M)\),

$$\begin{aligned} \sup _m H(\upsilon ^m|\mu ^m)=\lim _{m} H(\upsilon ^m|\mu ^m)=H(\upsilon |\mu ). \end{aligned}$$
(4.1)

Furthermore, if \(\sup _m H(\upsilon ^{m}| \mu ^m)<\infty \), then there exists a constant \(c > 0\) such that

$$\begin{aligned} \mathrm{d}_{\mathrm{BL}}(\upsilon ^m,\upsilon )\le \frac{c}{m}. \end{aligned}$$
(4.2)

Remark 4.2

Notice that the lemma above provides an alternative expression for the rate function in Sanov’s theorem, depending on the sequence of partitions chosen. Even more is true: the supremum in (4.1) can be taken over all finite tagged partitions of M. In fact, simply notice that if \(({\mathcal {A}}, {\mathcal {M}})\) is a tagged partition of M with projection map \(\pi : M \rightarrow {\mathcal {M}}\), then any function \(f: {\mathcal {M}} \rightarrow \mathbb {R}\) can be extended to M via \({\tilde{f}}=f \circ \pi \). To conclude, apply the variational formulation of the relative entropy to obtain \(H(\upsilon ^{{\mathcal {A}}}|\mu ^{\mathcal {A}}) \le H(\upsilon |\mu )\).

Lemma 4.3

For any \(\mu , \upsilon \in \mathcal {P}(M)\) and \(m_{0} \in \mathbb {N}\),

$$\begin{aligned} \sup _{m \ge m_{0}} \inf _{\sigma \in {\bar{B}}_{\frac{1}{\sqrt{m}}}(\upsilon )}H(\sigma |\mu ^m)=H(\upsilon |\mu ). \end{aligned}$$
(4.3)

We prove Lemma 4.1 in Sect. 5 and Lemma 4.3 in Sect. 6.

The next lemma is an exponential equivalence result.

Lemma 4.4

Let \(L_n\) and \(L_n^m\) be as defined in Eqs. (1.1) and (3.6), respectively. Then,

$$\begin{aligned} \mathbb {P}\bigg ( \mathrm{d}_{\mathrm{BL}}(L_n,L_n^m) > \frac{3}{m}\bigg ) \le \exp \big (-mn\big ). \end{aligned}$$
(4.4)

Proof

Observe that if \(X_i \in A_{m, j}\) for some \(j=1,\dots ,\tilde{\ell }_m\), then \(\mathrm{d}(X_i,X_i^{m})\le 1/m\). In particular,

$$\begin{aligned} \mathrm{d}_{\mathrm{BL}}(L_n,L_n^m)\le \frac{1}{n}\sum _{i=1}^n \mathrm{d}(X_i,X_i^{m}) \wedge 2 \le \frac{1}{m}+\frac{2}{n}\sum _{i=1}^n1_{\{X_i \in K_m^{\complement }\}}. \end{aligned}$$
(4.5)

This implies

$$\begin{aligned} \mathbb {P}\bigg ( \mathrm{d}_{\mathrm{BL}}(L_n,L_n^m)>\frac{3}{m}\bigg )\le \mathbb {P}\bigg (\frac{1}{n}\sum _{i=1}^n 1_{\{X_i \in K_m^{\complement }\}}>\frac{1}{m}\bigg ). \end{aligned}$$
(4.6)

In order to bound the last probability, we use a union bound and independence to obtain

$$\begin{aligned} \begin{aligned} \mathbb {P}\bigg (\frac{1}{n}\sum _{i=1}^n 1_{\{X_i \in K_m^{\complement }\}}>\frac{1}{m}\bigg )&\le \sum _{A \subset [n]: |A|=\frac{n}{m}} \mathbb {P} \big (X_i \in K_m^{\complement }, \text { for all } i \in A \big ) \\&\le \left( {\begin{array}{c}n\\ \frac{n}{m}\end{array}}\right) \Big ( \frac{e^{-m^{2}-1}}{m} \Big )^{\frac{n}{m}} \\&\le \bigg (em\frac{e^{-m^{2}-1}}{m} \bigg )^{\frac{n}{m}} \\&\le \exp \big (-mn\big ), \end{aligned} \end{aligned}$$
(4.7)

concluding the proof. \(\square \)
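
The chain of bounds in (4.7) can be sanity-checked numerically. The sketch below (illustrative values of n and m only, with n taken as a multiple of m so that n/m is an integer) compares the logarithm of the binomial bound with \(-mn\).

```python
import numpy as np
from scipy.special import gammaln

def log_binom(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)

def check_bound(n, m):
    """log of C(n, n/m) * (exp(-m^2 - 1) / m)^(n/m), which (4.7) bounds by -m*n."""
    k = n // m                                  # assume n is a multiple of m
    lhs = log_binom(n, k) + k * (-m**2 - 1 - np.log(m))
    return lhs, -m * n

for n, m in [(100, 5), (1000, 10), (600, 3)]:
    lhs, rhs = check_bound(n, m)
    print(n, m, lhs <= rhs, lhs, rhs)    # the bound holds in each case
```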

We are now ready to work on the proof of Theorem 1.1. It is proved in [2, Lemma 6.2.6] that the sequence \(\big ( L_n \big )_{n \in \mathbb {N}}\) is exponentially tight. In particular, there exists a subsequence \(\big ( L_{n_k} \big )_{k \in \mathbb {N}}\) that satisfies a large deviations principle with rate function I.

From now on, we drop the subscript k in \(n_{k}\). Notice that

$$\begin{aligned} -I(\upsilon ) = \lim _{\varepsilon \rightarrow 0} \liminf _{n\rightarrow \infty } \frac{1}{n}\log \mathbb {P}(L_n\in B_\varepsilon (\upsilon )) = \lim _{\varepsilon \rightarrow 0} \limsup _{n\rightarrow \infty } \frac{1}{n}\log \mathbb {P}(L_n\in B_\varepsilon (\upsilon )). \end{aligned}$$
(4.8)

Even though the rate function I might depend on the subsequence, our goal is to prove that this is not the case. In fact, we prove that \(I(\, \cdot \,)=H( \, \cdot \,|\mu )\). In Proposition 4.5, we prove that \(H( \, \cdot \,|\mu ) \ge I(\, \cdot \,)\), while the opposite inequality is established in Proposition 4.6. This concludes the proof of Theorem 1.1, since any subsequence \(\big ( L_{n_{k}} \big )_{k \in \mathbb {N}}\) that satisfies a large deviations principle does so with the same rate function \(H( \,\cdot \, | \mu )\), which implies that the whole sequence also satisfies a large deviations principle.

Proposition 4.5

The function I in Eq. (4.8) satisfies \(H(\, \cdot \,|\mu )\ge I(\, \cdot \,)\).

Proof

Fix \(\upsilon \in \mathcal {P}(M)\) and notice that we can assume that \(H(\upsilon | \mu )\) is finite, since the statement is trivial otherwise.

Due to Lemma 4.1, we have

$$\begin{aligned} \mathrm{d}_{\mathrm{BL}}(\upsilon ,\upsilon ^m) \le \frac{c}{m}, \end{aligned}$$
(4.9)

for some constant \(c>0\). In particular, this implies

$$\begin{aligned} \mathbb {P}( L^m_n \in B_\varepsilon (\upsilon ^m))\le \mathbb {P}\left( L_n \in {\bar{B}}_{\varepsilon +\frac{3+c}{m}}(\upsilon ) \right) +\mathbb {P} \left( \mathrm{d}_{\mathrm{BL}}(L_n,L_n^m)> \frac{3}{m} \right) , \end{aligned}$$
(4.10)

which yields

$$\begin{aligned} \begin{aligned} \frac{1}{n}&\log \mathbb {P}(L^m_n \in B_\varepsilon (\upsilon ^m)) \le \frac{\log 2}{n} \\&\qquad \qquad + \max \left\{ \frac{1}{n}\log \mathbb {P}\left( L_n \in {\bar{B}}_{\varepsilon +\frac{3+c}{m}}(\upsilon ) \right) , \frac{1}{n}\log \mathbb {P} \left( \mathrm{d}_{\mathrm{BL}}(L_n,L_n^m)> \frac{3}{m} \right) \right\} . \end{aligned} \end{aligned}$$
(4.11)

Lemma 4.4 gives

$$\begin{aligned} \frac{1}{n}\log \mathbb {P} \left( \mathrm{d}_{\mathrm{BL}}(L_n,L_n^m)> \frac{3}{ m} \right) \le -m, \end{aligned}$$
(4.12)

and, by taking the limit as n grows in (4.11),

$$\begin{aligned} -\inf _{\sigma \in B_{\varepsilon }(\upsilon ^m)} H(\sigma |\mu ^{m}) \le \max \left\{ -\inf _{\sigma \in {\bar{B}}_{\varepsilon +\frac{3+c}{m}}(\upsilon )}I(\sigma ),-m \right\} . \end{aligned}$$
(4.13)

We now take the limit as \(\varepsilon \) goes to zero. Since the subsequence is exponentially tight, its rate function I is good (its level sets are compact), and therefore the infima of I over the closed balls \({\bar{B}}_{\varepsilon +\frac{3+c}{m}}(\upsilon )\) converge, as \(\varepsilon \rightarrow 0\), to the infimum over \({\bar{B}}_{\frac{3+c}{m}}(\upsilon )\). This yields

$$\begin{aligned} -H(\upsilon ^m | \mu ^m) \le \max \left\{ - \inf _{\sigma \in {\bar{B}}_{\frac{3+c}{m}}(\upsilon )} I (\sigma ),-m \right\} , \end{aligned}$$
(4.14)

which readily implies, via Lemma 4.1,

$$\begin{aligned} H(\upsilon | \mu ) = \sup _{{{\bar{m}}}}H(\upsilon ^{{{\bar{m}}}}|\mu ^{{{\bar{m}}}}) \ge \min \left\{ \inf _{\sigma \in {\bar{B}}_{\frac{3+c}{m}}(\upsilon )} I(\sigma ), m \right\} , \end{aligned}$$
(4.15)

for all \(m \in \mathbb {N}\).

Since the function I is lower semicontinuous,

$$\begin{aligned} \lim _{m \rightarrow \infty } \inf _{\sigma \in {\bar{B}}_{\frac{3+c}{m}}(\upsilon )}I(\sigma )=I(\upsilon ). \end{aligned}$$
(4.16)

In particular,

$$\begin{aligned} H(\upsilon | \mu ) \ge I(\upsilon ), \end{aligned}$$
(4.17)

concluding the proof. \(\square \)

Proposition 4.6

We have \(H( \,\cdot \, |\mu )\le I(\, \cdot \,)\).

Proof

Fix \(\upsilon \in \mathcal {P}(M)\) and observe once again that

$$\begin{aligned} \begin{aligned}&\frac{1}{n} \log \mathbb {P}\big ( L_n \in B_\varepsilon (\upsilon ) \big ) \le \frac{\log 2}{n} \\&\qquad + \max \bigg \{\frac{1}{n} \log \mathbb {P} \left( L_n^m \in {\bar{B}}_{\varepsilon +\frac{1}{\sqrt{m}}}(\upsilon ) \right) , \frac{1}{n} \log \mathbb {P} \left( \mathrm{d}_{\mathrm{BL}} (L_n,L_n^m)>\tfrac{1}{\sqrt{m}} \right) \bigg \}. \end{aligned} \end{aligned}$$

Taking \(n \rightarrow \infty \) and then \(\varepsilon \rightarrow 0\) (for the latter, note that \(H( \,\cdot \, |\mu ^m)\) is a good rate function, since its sublevel sets are closed subsets of the compact set \(\mathcal {P}(M_{m})\)), we obtain with the aid of Lemma 4.4

$$\begin{aligned} -I(\upsilon ) \le \max \bigg \{-\inf _{\sigma \in \bar{B}_{\frac{1}{\sqrt{m}}}(\upsilon )}H(\sigma |\mu ^m),\,-m \bigg \}. \end{aligned}$$
(4.18)

Let us now split the discussion according to whether \(H(\upsilon |\mu )\) is finite or not. Assume first that this relative entropy is infinite and notice that Lemma 4.3 implies

$$\begin{aligned} \sup _{m}\inf _{\sigma \in {\bar{B}}_{\frac{1}{\sqrt{m}}}(\upsilon )}H(\sigma |\mu ^m) = \infty , \end{aligned}$$
(4.19)

which readily implies \(I(\upsilon )=\infty \), when combined with (4.18).

If, on the other hand, we have \(H(\upsilon |\mu )< \infty \), we combine (4.18) and Lemma 4.3 with \(m_{0} \ge H(\upsilon |\mu )\) to obtain

$$\begin{aligned} I(\upsilon ) \ge \min \left\{ \inf _{\sigma \in {\bar{B}}_{\frac{1}{\sqrt{m}}}(\upsilon )} H(\sigma |\mu ^m), m \right\} \ge \inf _{\sigma \in {\bar{B}}_{\frac{1}{\sqrt{m}}}(\upsilon )} H(\sigma |\mu ^m), \end{aligned}$$
(4.20)

for every \(m \ge m_{0}\). Taking the supremum in m concludes the proof. \(\square \)

5 Proof of Lemma 4.1

In this section, we prove Lemma 4.1. We start with the following preliminary lemma, which in particular implies the second part of Lemma 4.1. We prove the first part afterwards.

Lemma 5.1

If \(\sigma \in \mathcal {P}(M)\) is such that \(H(\sigma |\mu ) \le \alpha \), then, for any \(\theta >0\), we have

$$\begin{aligned} \mathrm{d}_{\mathrm{BL}}(\sigma ^m,\sigma ) \le \frac{1}{m}+2\frac{\alpha }{\theta }+ 2\frac{e^{-m^{2}-1+\theta }}{m\theta }. \end{aligned}$$
(5.1)

In particular,

$$\begin{aligned} \mathrm{d}_{\mathrm{BL}}(\sigma ^m,\sigma ) \le \frac{3+2\alpha }{m}, \text { for all } m \in \mathbb {N}. \end{aligned}$$
(5.2)

Proof

Consider \(X \sim \sigma \) and notice that \(X^{m} = \pi ^{m}(X)\) has distribution \(\sigma ^{m}\). Therefore,

$$\begin{aligned} \mathrm{d}_{\mathrm{BL}}(\sigma ^m,\sigma )\le \mathbb {E}(\mathrm{d}(X^m,X)\wedge 2). \end{aligned}$$
(5.3)

Splitting on whether \(X\in K_{m}\) or not, we obtain

$$\begin{aligned} \mathrm{d}_{\mathrm{BL}}(\sigma ^m,\sigma )\le \frac{1}{m}+2\sigma (K_m^{\complement }). \end{aligned}$$
(5.4)

We now combine the entropy inequality with the bound \(\log (1+x)\le x\) to obtain, for \(\theta >0\),

$$\begin{aligned} \begin{aligned} \sigma (K_m^{\complement })&= \frac{1}{\theta } {\mathbb {E}}_{\sigma }\left[ \theta 1_{K_m^{\complement }} \right] \le \frac{1}{\theta }\left( H(\sigma |\mu )+\log \mathbb {E}_{\mu } \left[ e^{\theta 1_{K_m^{\complement }}}\right] \right) \\&\le \frac{1}{\theta } \left( \alpha + \log \left( 1-\mu \left( K_m^{\complement }\right) +e^\theta \mu \left( K_m^{\complement }\right) \right) \right) \\&\le \frac{\alpha }{\theta } + \frac{(e^{\theta }-1)}{\theta } \mu \left( K_m^{\complement }\right) \\&\le \frac{\alpha }{\theta } + \frac{(e^{\theta }-1)}{\theta }\frac{e^{-m^{2}-1}}{m}, \end{aligned} \end{aligned}$$
(5.5)

by the choice of \(K_m\) in (3.1). Combining the equation above with (5.4) concludes the proof of (5.1).

Choose now \(\theta =m\) in (5.1) to obtain

$$\begin{aligned} \mathrm{d}_{\mathrm{BL}}(\sigma ^m,\sigma )\le \frac{1}{m}+2\frac{\alpha }{m} + 2\frac{e^{-m^{2}+m-1}}{m^{2}} \le \frac{3+2\alpha }{m}, \end{aligned}$$
(5.6)

concluding the proof. \(\square \)

Next, we introduce a martingale that will be useful in the proof.

Lemma 5.2

Assume that either \(H(\upsilon |\mu )\) or \(\sup _{m}H(\upsilon ^{m}|\mu ^{m})\) is finite. Then,

$$\begin{aligned} S_m = \frac{\mathrm{d}\upsilon ^m}{\mathrm{d}\mu ^m} \circ \pi ^{m} \end{aligned}$$
(5.7)

is a uniformly integrable martingale on the probability space \(\big (M, \mathcal {B}(M), \mu \big )\) with respect to the filtration \(\big ({\mathcal {F}}_{m}\big )_{m \in \mathbb {N}}\).

Proof

Assume first that \(H(\upsilon |\mu ) < \infty \). In this case, \(\tfrac{\mathrm{d}\upsilon }{\mathrm{d}\mu }\) exists and

$$\begin{aligned} {\hat{S}}_m = {\mathbb {E}}_\mu \left[ \frac{\mathrm{d}\upsilon }{\mathrm{d}\mu } \Big | {\mathcal {F}}_m \right] \end{aligned}$$
(5.8)

is a uniformly integrable martingale. It follows directly from the definition of conditional expectation and Radon–Nikodym derivative that \({\hat{S}}_{m}=S_{m}\) almost surely for every \(m \in \mathbb {N}\), concluding the proof of the first case.

Assume now that \(\sup _{m}H(\upsilon ^{m}|\mu ^{m}) < \infty \) and observe that this implies that \(S_m\) is well defined for all \(m \in \mathbb {N}\), has expectation one, and is non-negative. We first have to verify that \({\mathbb {E}}[S_{m+1} |{\mathcal {F}}_m] = S_m\). Take an element \(A_{m,k} \in {\mathcal {F}}_m\), with \(1 \le k \le \ell _{m}\), so that

$$\begin{aligned} {\mathbb {E}}_\mu [S_{m} \cdot 1_{A_{m,k}}] = \frac{\upsilon (A_{m,k})}{\mu (A_{m,k})} {\mathbb {E}}_{\mu }[ 1_{A_{m,k}}] = \upsilon (A_{m,k}). \end{aligned}$$
(5.9)

Now, let \(B_1, \dots , B_j\) be the elements of \(\mathcal {A}_{m+1}\) such that \(\cup _{i=1}^j B_i = A_{m,k}\); then

$$\begin{aligned} \begin{aligned} {\mathbb {E}}_\mu \left[ {\mathbb {E}}_\mu \left[ S_{m+1} | {\mathcal {F}}_m \right] \cdot 1_{A_{m,k}} \right]&= {\mathbb {E}}_\mu \left[ {\mathbb {E}}_\mu \left[ S_{m+1} \left. \sum _{i=1}^j 1_{B_i} \right| {\mathcal {F}}_m \right] \right] \\&= {\mathbb {E}}_\mu \left[ {\mathbb {E}}_\mu \left[ \left. \sum _{i=1}^j \frac{\upsilon (B_i)}{\mu (B_i)} 1_{B_i} \right| {\mathcal {F}}_m \right] \right] \\&= \sum _{i=1}^j \frac{\upsilon (B_i)}{\mu (B_i)} \mu (B_i) \\&= \upsilon (A_{m,k})={\mathbb {E}}_\mu [S_m1_{A_{m,k}}], \end{aligned} \end{aligned}$$
(5.10)

which implies \({\mathbb {E}}[S_{m+1} |{\mathcal {F}}_m] = S_m\), since both sides are constant on each \(A_{m,k}\). This establishes the martingale property.

In order to verify uniform integrability of \(S_{m}\), observe that

$$\begin{aligned} {\mathbb {E}}_\mu [S_m \log S_m] = H(\upsilon ^{m}|\mu ^{m}) \le \sup _{m} H(\upsilon ^{m}|\mu ^{m}) {=}{:} K < \infty . \end{aligned}$$
(5.11)

Now, for each \(R>1\) we have, uniformly in \(m \in \mathbb {N}\),

$$\begin{aligned} {\mathbb {E}}_\mu [S_m 1_{\{S_m \ge R\}}] \le {\mathbb {E}}_\mu \left[ S_m 1_{\{S_m \ge R\}} \dfrac{\log S_m}{\log R} \right] \le \frac{K}{\log R}. \end{aligned}$$
(5.12)

Therefore, \(S_m\) is a uniformly integrable martingale, concluding the proof of the lemma. \(\square \)

We are now in position to prove the first part of Lemma 4.1.

Proof of Lemma 4.1

We first observe that via the variational definition of entropy and the fact that \(M_{m} \subset M_{m+1}\), for all \(m \in \mathbb {N}\), we obtain that \(H(\upsilon ^m|\mu ^m)\) is monotone increasing in m (following the steps pointed out in Remark 4.2 or directly as a consequence of [4, Corollary 5.2.2]). In particular,

$$\begin{aligned} \sup _m H(\upsilon ^m|\mu ^m)=\lim _{m \rightarrow \infty } H(\upsilon ^m|\mu ^m) \end{aligned}$$
(5.13)

and thus it suffices to verify that

$$\begin{aligned} \lim _m H(\upsilon ^m|\mu ^m) = H(\upsilon |\mu ). \end{aligned}$$
(5.14)

Again from the variational definition of relative entropy, we have \( H(\upsilon ^m|\mu ^m) \le H(\upsilon |\mu )\), for all \(m \in \mathbb {N}\), so that

$$\begin{aligned} \limsup _{m \rightarrow \infty } H(\upsilon ^{m}|\mu ^{m}) \le H(\upsilon |\mu ). \end{aligned}$$
(5.15)

We now work on the proof of the reverse inequality. The strategy of the proof is as follows. If at least one of the two quantities of interest is finite, we have access to the uniformly integrable martingale \(S_{m}\) given by the Radon–Nikodym derivative of \(\upsilon ^{m}\) with respect to \(\mu ^{m}\). As we will see, this martingale converges in \(L^{1}\) and almost surely to \(\tfrac{\mathrm{d}\upsilon }{\mathrm{d}\mu }\), which will yield the result when combined with Fatou’s lemma.

Assume that either \(H(\upsilon |\mu )<\infty \) or \(\sup _{m}H(\upsilon ^{m}|\mu ^{m})<\infty \). The martingale \(S_{m}\) introduced in (5.7) is uniformly integrable and thus converges almost surely and in \(L^{1}\) to a random variable X.

In the case \(H(\upsilon |\mu )<\infty \), we have

$$\begin{aligned} X=E_{\mu }\left[ \frac{\mathrm{d}\upsilon }{\mathrm{d}\mu } \Big | {\mathcal {F}}_{\infty } \right] = \frac{\mathrm{d}\upsilon }{\mathrm{d}\mu }, \end{aligned}$$
(5.16)

since \({\mathcal {F}}_{\infty } = {\mathcal {B}}(M)\) (see Lemma 3.1). If we are in the case \(\sup _{m}H(\upsilon ^{m}|\mu ^{m})<\infty \), the identity \(X = \tfrac{\mathrm{d}\upsilon }{\mathrm{d}\mu }\) also holds: by \(L^{1}\) convergence, \({\mathbb {E}}_{\mu }[X 1_{A}] = \lim _{m} {\mathbb {E}}_{\mu }[S_{m} 1_{A}] = \upsilon (A)\) for every \(A \in \cup _{m=1}^{\infty } {\mathcal {F}}_{m}\), and since this union is a \(\pi \)-system that generates \({\mathcal {B}}(M)\) (see Lemma 3.1), the finite measures \(A \mapsto \upsilon (A)\) and \(A \mapsto {\mathbb {E}}_{\mu }[X 1_{A}]\) coincide on \({\mathcal {B}}(M)\); in particular, \(\upsilon \ll \mu \) and \(X = \tfrac{\mathrm{d}\upsilon }{\mathrm{d}\mu }\).

We now note that since \(x \log x \ge -e^{-1}\), Fatou’s lemma implies

$$\begin{aligned} \liminf _{m} {\mathbb {E}}_\mu \left[ S_{m} \log S_{m} \right] \ge {\mathbb {E}}_\mu \left[ \frac{\mathrm{d}\upsilon }{\mathrm{d}\mu } \log \frac{\mathrm{d}\upsilon }{\mathrm{d}\mu } \right] , \end{aligned}$$
(5.17)

which verifies (4.1) (see also Lemma 2.3) and concludes the proof of the lemma. \(\square \)
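
As a numerical illustration of the monotone convergence in (5.13)-(5.14) (a sketch of ours, not part of the proof; the choices \(M=[0,1]\), \(\mu \) uniform, \(\upsilon = \mathrm{Beta}(2,5)\) and dyadic partitions are arbitrary), the discretized entropies \(H(\upsilon ^m|\mu ^m)\) computed over dyadic cells increase towards the value of the integral formula (2.9).

```python
import numpy as np
from scipy import stats, integrate

a, b = 2.0, 5.0                 # illustrative choice: upsilon = Beta(2, 5), mu = Uniform[0, 1]
upsilon = stats.beta(a, b)

def discretized_entropy(m):
    """H(upsilon^m | mu^m) over the partition of [0, 1] into 2^m dyadic cells, cf. (3.7)."""
    edges = np.linspace(0.0, 1.0, 2**m + 1)
    nu_cells = np.diff(upsilon.cdf(edges))     # upsilon(A_{m,i})
    mu_cells = np.diff(edges)                  # mu(A_{m,i}) = 2^{-m}
    mask = nu_cells > 0
    return float(np.sum(nu_cells[mask] * np.log(nu_cells[mask] / mu_cells[mask])))

def integrand(x):
    """The integrand of (2.9): f log f, with f = d(upsilon)/d(mu) the Beta density."""
    f = upsilon.pdf(x)
    return 0.0 if f == 0.0 else f * np.log(f)

limit, _ = integrate.quad(integrand, 0.0, 1.0)

for m in range(1, 11):
    print(m, discretized_entropy(m))
print("H(upsilon | mu) =", limit)    # the discretized values increase towards this number
```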

6 Proof of Lemma 4.3

In this section, we prove Lemma 4.3. We fix \(m_{0} \in \mathbb {N}\) and denote by

$$\begin{aligned} I^0(\upsilon ) {:}{=} \sup _{m \ge m_{0}} \inf _{\sigma \in {\bar{B}}_{\frac{1}{\sqrt{m}}}(\upsilon )} H(\sigma |\mu ^m). \end{aligned}$$
(6.1)

Our goal is to show that \(I^0(\upsilon )= H(\upsilon |\mu )\). We will prove this in two steps, by checking that \(I^0(\upsilon ) \le H(\upsilon |\mu )\) and \(I^0(\upsilon ) \ge H(\upsilon |\mu )\). The first inequality is verified in the next paragraph. The reverse inequality is more delicate, and we dedicate the rest of the section to its verification.

Let us check that \(I^0(\upsilon ) \le H(\upsilon |\mu )\). Indeed, the inequality is trivial if \(H(\upsilon |\mu ) = \infty \). If, on the other hand, this entropy is finite, we have, in view of Lemma 4.1,

$$\begin{aligned} \mathrm{d}_{\mathrm{BL}}(\upsilon , \upsilon ^{m}) \le \frac{c}{m } \le \frac{1}{\sqrt{m}}, \end{aligned}$$
(6.2)

for m large enough, from which our claim follows by noting that \(\upsilon ^{m} \in {\bar{B}}_{\frac{1}{\sqrt{m}}}(\upsilon )\) and applying Lemma 4.1.

We now focus on the proof of the inequality

$$\begin{aligned} I^0(\upsilon )\ge H(\upsilon |\mu ). \end{aligned}$$
(6.3)

Once again we assume that \(I^0(\upsilon )<\infty \), since the alternative case is trivial.

The first observation we make is that (6.3) follows if, for any \(\alpha > 0\),

$$\begin{aligned} I^0(\upsilon ) < \alpha \text { implies } H(\upsilon |\mu ) \le \alpha . \end{aligned}$$
(6.4)

This follows directly from the following lemma together with the lower semicontinuity of the relative entropy \(H( \,\cdot \, |\mu )\).

Lemma 6.1

If \(I^0(\upsilon ) < \alpha \), then, for every \(\varepsilon >0\), there exists \(\rho \in B_{\varepsilon }(\upsilon )\) such that

$$\begin{aligned} H(\rho |\mu ) \le \alpha . \end{aligned}$$
(6.5)

Proof

Our goal will be to find \(\rho \) such that \(H(\rho |\mu ) \le \alpha \) and \(\mathrm{d}_{\mathrm{BL}}(\upsilon , \rho )< \varepsilon \). Fix \(m \ge m_{0}\) large enough such that

$$\begin{aligned} \frac{3+2\alpha }{m}+\frac{1}{\sqrt{m}} < \varepsilon . \end{aligned}$$
(6.6)

Recall from (6.1) that \(I^0(\upsilon ) < \alpha \) implies that there exists \(\sigma \in {\bar{B}}_{\frac{1}{\sqrt{m}}}(\upsilon )\) such that \(H(\sigma |\mu ^{m}) \le \alpha \). Notice that, since this entropy is finite, we have \(\sigma = \sigma ^{m}\).

Define

$$\begin{aligned} \rho (F){:}{=}\sum _{i=1}^{\ell _m}\frac{\sigma (A_{m, i})}{\mu (A_{m, i})} \mu (F \cap A_{m, i}) = \sum _{i=1}^{\ell _m} \mu (F | A_{m, i}) \sigma (A_{m, i}). \end{aligned}$$
(6.7)

Via direct substitution it follows that \(\rho ^{m} = \sigma \). We claim that \(H(\rho |\mu )=H(\sigma |\mu ^{m}) \le \alpha \) and \(\mathrm{d}_{\mathrm{BL}}(\upsilon , \rho )< \varepsilon \).

In order to verify that \(H(\rho |\mu )=H(\sigma |\mu ^{m})\), observe that

$$\begin{aligned} \begin{aligned} H(\rho ^{m+j}|\mu ^{m+j})&=\sum _{i=1}^{\ell _{m+j}} \rho (A_{m+j, i}) \log \frac{\rho (A_{m+j, i})}{\mu (A_{m+j, i})}\\&=\sum _{i=1}^{\ell _m}\sum _{k:A_{m+j, k}\subset A_{m, i}} \rho (A_{m+j, k})\log \frac{ \rho (A_{m+j, k})}{\mu (A_{m+j, k})}. \end{aligned} \end{aligned}$$
(6.8)

Furthermore, if \(A_{m+j, k} \subset A_{m, i}\), then, from (6.7),

$$\begin{aligned} \rho (A_{m+j, k})=\frac{\sigma (A_{m, i})}{\mu (A_{m, i})}\mu (A_{m+j, k}). \end{aligned}$$
(6.9)

Therefore,

$$\begin{aligned} \begin{aligned} H(\rho ^{m+j}|\mu ^{m+j})&= \sum _{i=1}^{\ell _m} \sum _{k:A_{m+j, k}\subset A_{m, i}} \frac{\sigma (A_{m, i})}{\mu (A_{m, i})}\mu (A_{m+j, k}) \log \frac{\sigma (A_{m, i})}{\mu (A_{m, i})} \\&= H(\sigma |\mu ^{m}), \end{aligned} \end{aligned}$$
(6.10)

since

$$\begin{aligned} \sum _{k:A_{m+j, k} \subset A_{m, i}} \mu (A_{m+j, k})= \mu (A_{m, i}). \end{aligned}$$
(6.11)

In particular, from Lemma 4.1, \(H(\rho |\mu ) = H(\sigma |\mu ^{m}) \le \alpha \).

Finally, we prove that \(\mathrm{d}_{\mathrm{BL}}(\rho , \upsilon ) < \varepsilon \) by estimating

$$\begin{aligned} \begin{aligned} \mathrm{d}_{\mathrm{BL}}(\rho , \upsilon )&\le \mathrm{d}_{\mathrm{BL}}(\rho , \rho ^{m}) + \mathrm{d}_{\mathrm{BL}}(\rho ^{m}, \sigma ) + \mathrm{d}_{\mathrm{BL}}(\sigma , \upsilon ) \\&\le \frac{3+2\alpha }{m}+\frac{1}{\sqrt{m}} < \varepsilon , \end{aligned} \end{aligned}$$
(6.12)

where the second inequality uses Lemma 5.1 (since \(H(\rho |\mu )\) is bounded by \(\alpha \)) together with the facts that \(\rho ^m=\sigma \) and \(\sigma \in {\bar{B}}_{\frac{1}{\sqrt{m}}}(\upsilon )\), and the last inequality is the choice of m in (6.6). This concludes the proof. \(\square \)