Large Deviations of Continuous Regular Conditional Probabilities
Abstract
We study product regular conditional probabilities under measures of two coordinates with respect to the second coordinate that are weakly continuous on the support of the marginal of the second coordinate. Assuming that there exists a sequence of probability measures on the product space that satisfies a large deviation principle, we present necessary and sufficient conditions for the conditional probabilities under these measures to satisfy a large deviation principle. The arguments of these conditional probabilities are assumed to converge. We show how regular conditional probabilities can be viewed as a special case of product regular conditional probabilities, and we use this to derive conditions for large deviations of regular conditional probabilities. In addition, we derive a Sanov-type theorem for large deviations of the empirical distribution of the first coordinate conditioned on fixing the empirical distribution of the second coordinate.
Keywords
(Product) regular conditional kernel, Weakly continuous, Large deviations
Mathematics Subject Classification (2010)
60A10, 60F10
1 Introduction and Main Results
Such kernels are called regular conditional probabilities and form an important object in probability theory. The existence of regular conditional probabilities has been studied extensively, for example by Faden [12] and by Leao et al. [21]. In fact, there are various forms of regular conditional probabilities: with respect to a \(\sigma \)-algebra, with respect to a measurable map, or with respect to the projection on one of the coordinates (in case of a product space).
In order to consider large deviations of conditional probabilities, we have to specify which conditional probability we are considering; the conditional probability may not be unique. However, if a (product) regular conditional probability is weakly continuous on the support of the measure composed with the inverse of the measurable map (or projection), it is unique on that domain. For these (product) regular conditional probabilities, it is natural to study their large deviations whenever the argument of the probability is in the domain on which it is unique. In this paper, we study the large deviations in the case when the arguments of these kernels converge, i.e. we study large deviations of \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) for the case that \(y_n \rightarrow y\). To the best of our knowledge, current literature does not provide a general condition under which such kernels satisfy a large deviation principle.
1.1 Literature
Some results in this direction exist. For example, Adams et al. [1] prove the large deviation principle for the empirical distribution that is evolved by independent Brownian motions, conditioned on the initial empirical distribution lying in a ball (see [1, Theorem 1]). They proceed by proving that the rate function of the large deviation principle converges as the radius of the ball tends to zero. For the purpose of this paper, one would additionally have to show that the limit in the radius of the ball and the limit belonging to the large deviation principle can be interchanged. Léonard [22] proves the large deviation principle for the empirical distribution that is evolved by independent Brownian motions, conditioned on the initial empirical distribution; these initial empirical distributions are assumed to converge (see [22, Proposition 2.19]). In both papers, the evolved state is conditioned on the initial state, whereas there is also interest in large deviations of the initial state conditioned on the evolved state. In this paper, we prove the large deviation principle in this setting for finite state spaces.
There exist various results on quenched large deviations, i.e. large deviations for regular conditional probabilities in the sense that for almost all realisations of the disorder, the conditional probabilities satisfy the large deviation principle with a rate function that does not depend on the disorder. Examples of papers on quenched large deviations are Comets [5] for conditional large deviations of i.i.d. random fields, Greven and den Hollander [14] and Comets et al. [6] for random walks in random environments, Kosygina–Rezakhanlou–Varadhan [18] for a diffusion with a random drift and Rassoul-Agha et al. [24] for polymers in a random potential.
Biggins [2] obtains the large deviation principle for mixtures of probability measures that satisfy the large deviation principle with kernels that satisfy the large deviation principle as their arguments converge. To some extent, we complement the article in the opposite direction, in the sense that we assume the large deviation principle of the mixture and derive the large deviation principle of the kernels.
Our main motivation to study the above large deviations lies in the theory of Gibbs–non-Gibbs transitions. There is a correspondence between the large deviation rate function of the conditional probability with respect to the evolved coordinate and the evolved state (measure or sequence) being Gibbs (see van Enter et al. [9]). We refer to Sect. 1.4 for further discussion of Gibbs–non-Gibbs transitions.
1.2 Large Deviations
In the literature on large deviations, two dominant definitions of large deviation principles are used. One is in terms of a \(\sigma \)-algebra on the topological space, as is done in the book by Dembo and Zeitouni [7] and in the book by Deuschel and Stroock [8]; the other is in terms of the topology, i.e. in terms of open and closed sets, as is done in the book by den Hollander [16] and in the book by Rassoul-Agha and Seppäläinen [25]. Whenever one considers the Borel \(\sigma \)-algebra on the topological space, the two definitions agree.
We define the large deviation lower bound and the large deviation upper bound separately and, in Sects. 1.3 and 6, we describe necessary and sufficient conditions for each of the bounds separately. Moreover, we define them on a set of subsets of the topological space, which is not required to be a \(\sigma \)-algebra. In Remark 7.4, we motivate the choice of this definition.
Definition 1.1
We omit “on \(\mathcal {A}\)” whenever \(\mathcal {A}\) is the Borel \(\sigma \)-algebra \(\mathcal {B}(\mathcal {X})\) on \(\mathcal {X}\). In this case, the large deviation lower bound is satisfied if and only if the inequality in (1.2) holds for all open subsets of \(\mathcal {X}\) and the large deviation upper bound is satisfied if and only if the inequality in (1.3) holds for all closed subsets of \(\mathcal {X}\).
1.3 Main Results
See Sects. 3 and 4 for the definitions of the objects in the statements of the following theorems. In Sects. 6 and 7, we consider a more general situation. Theorem 1.2 is a consequence of Theorem 6.9, and Theorem 1.3 is a consequence of Theorem 7.5.
In this section \(\mathcal {X}\) and \(\mathcal {Y}\) are metric spaces.
Theorem 1.2

(A1) For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \rightarrow y\) and \(y_n \in {{\mathrm{supp}}}(\mu _n\circ \pi ^{-1})\) for all n large enough,^{1} the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation lower bound with rate function I.
(A2) For all \(x\in \mathcal {X}\) and \(r>0\), with \(U= B(x,r)\),$$\begin{aligned} \sup _{\varepsilon >0} \liminf _{n\rightarrow \infty }\inf _{ \genfrac{}{}{0.0pt}{}{z\in \mathcal {Y}, \delta \in (0,\varepsilon ) }{ B(z,\delta ) \subset B(y,\varepsilon ) } } \tfrac{1}{n}\log \mu _n \Big ({\overline{U}} \times \mathcal {Y}\, \Big | \, \mathcal {X}\times B(z,\delta ) \Big ) \ge -\inf I(U ). \end{aligned}\tag{1.5}$$

(B1) For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \rightarrow y\) and \(y_n \in {{\mathrm{supp}}}(\mu _n\circ \pi ^{-1})\) for all n large enough, the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation upper bound with rate function I.
(B2) For all \(x_1,\dots ,x_k\in \mathcal {X}\) and \(r_1,\dots ,r_k>0\), with \(W = \mathcal {X}\setminus [B(x_1,r_1)\cup \cdots \cup B(x_k,r_k)]\),$$\begin{aligned} \inf _{\varepsilon >0} \limsup _{n\rightarrow \infty }\sup _{ \genfrac{}{}{0.0pt}{}{z\in \mathcal {Y}, \delta \in (0,\varepsilon ) }{ B(z,\delta ) \subset B(y,\varepsilon ) } } \tfrac{1}{n}\log \mu _n \Big (W^\circ \times \mathcal {Y}\, \Big | \, \mathcal {X}\times B(z,\delta ) \Big ) \le -\inf I(W ). \end{aligned}\tag{1.6}$$
The next theorem is similar to Theorem 1.2, but considers the large deviation bounds for regular conditional kernels instead of product regular conditional probabilities.
Theorem 1.3

(A1) For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \rightarrow y\) and \(y_n \in {{\mathrm{supp}}}(\nu _n\circ \tau ^{-1})\) for all n large enough, the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation lower bound with rate function I.
(A2) For all \(x\in \mathcal {X}\) and \(r>0\), with \(U= B(x,r)\),$$\begin{aligned} \sup _{\varepsilon >0} \liminf _{n\rightarrow \infty }\inf _{ \genfrac{}{}{0.0pt}{}{z\in \mathcal {Y}, \delta \in (0,\varepsilon ) }{ B(z,\delta ) \subset B(y,\varepsilon ) } } \tfrac{1}{n}\log \nu _n \Big ({\overline{U}} \, \Big | \, \tau ^{-1}(B(z,\delta ) ) \Big ) \ge -\inf I(U ). \end{aligned}\tag{1.8}$$

(B1) For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \rightarrow y\) and \(y_n \in {{\mathrm{supp}}}(\nu _n\circ \tau ^{-1})\) for all n large enough, the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation upper bound with rate function I.
(B2) For all \(x_1,\dots ,x_k\in \mathcal {X}\) and \(r_1,\dots ,r_k>0\), with \(W = \mathcal {X}\setminus [B(x_1,r_1)\cup \cdots \cup B(x_k,r_k)]\),$$\begin{aligned} \inf _{\varepsilon >0} \limsup _{n\rightarrow \infty }\sup _{ \genfrac{}{}{0.0pt}{}{z\in \mathcal {Y}, \delta \in (0,\varepsilon ) }{ B(z,\delta ) \subset B(y,\varepsilon ) } } \tfrac{1}{n}\log \nu _n \Big (W^\circ \, \Big | \, \tau ^{-1}(B(z,\delta ) ) \Big ) \le -\inf I(W ). \end{aligned}\tag{1.9}$$
1.4 Gibbs–non-Gibbs Transitions and Future Research
In this section we discuss the relation between the large deviation results in this paper and Gibbs–nonGibbs transitions in more detail. In particular, we discuss possible future directions regarding large deviations of conditional kernels.
The following situation for interacting particle systems occurs in the mean-field context (a similar situation arises on lattices). The initial system of so-called spins consists of distributions describing the interaction between spins via a potential V (for each n there is a distribution describing the law of n spins). This initial system is assumed to be Gibbs, which is called sequentially Gibbs in the mean-field context. Allowing the initial state to be transformed, for example by an evolution of the spins, a question of interest is whether the transformed state is (sequentially) Gibbs. This question has been addressed in the mean-field context by Ermolaev and Külske [11] and by Fernández et al. [13] for \(\{-1,+1\}\)-valued spins, by den Hollander et al. [17] for \(\mathbb {R}\)-valued spins and by Külske and Opoku [19] and van Enter et al. [10] for compact-valued spins. In these papers, independent dynamics of the spins are considered (the evolution of each spin is independent of the evolution of the other spins). Independent dynamics simplify the situation: the evolved measure on either the product space of the initial and the final space, or (in case of an evolution) the space of trajectories, is a tilt of the evolved measure for \(V=0\). For \(V=0\) the measure is a product measure, which means that the spins are independent. As a consequence (this will be clarified in a forthcoming paper), the conditional kernel \(\eta _n\) of the initial state on n spins with respect to the final state (for a fixed potential V) is a tilted version of the conditional kernel \(\eta _n^0\) of the initial state with respect to the final state of independent spins (i.e. \(V=0\)).
Because of this tilting, by Varadhan’s lemma, \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function \(V+ I_y - \inf (V+ I_y)\) if \((\eta _n^0(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function \(I_y\). In the forthcoming paper, we will prove that the evolved sequence is sequentially Gibbs if \(V+I_\zeta \) has a unique global minimiser.
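The effect of tilting on the rate function can be checked numerically in a toy case. The Python sketch below is only an illustration under simplifying assumptions: the spin model is replaced by the empirical mean of n independent fair coins, whose rate function I is the relative entropy \(x\log (2x) + (1-x)\log (2(1-x))\), and the potential V is a hypothetical choice. It compares \(-\tfrac{1}{n}\log \) of the tilted probabilities with \(V+I-\inf (V+I)\) pointwise on the grid \(\{k/n\}\).

```python
import math


def coin_rate(x: float, p: float = 0.5) -> float:
    """Cramer rate function of the empirical mean of Bernoulli(p) coins:
    the relative entropy H(Ber(x) | Ber(p)), with the convention 0 log 0 = 0."""
    value = 0.0
    if x > 0:
        value += x * math.log(x / p)
    if x < 1:
        value += (1 - x) * math.log((1 - x) / (1 - p))
    return value


def tilted_log_probs(n: int, V, p: float = 0.5) -> list:
    """log P_n^V({k/n}) for k = 0..n, where P_n^V is the law of the empirical mean
    of n Bernoulli(p) coins tilted by exp(-n V) and renormalised."""
    log_weights = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
        - n * V(k / n)
        for k in range(n + 1)
    ]
    top = max(log_weights)  # log-sum-exp for a numerically stable normalising constant
    log_z = top + math.log(sum(math.exp(w - top) for w in log_weights))
    return [w - log_z for w in log_weights]


def tilted_rate(n: int, V, p: float = 0.5) -> list:
    """V + I - inf(V + I) on the grid {k/n}; the infimum is taken over the grid."""
    values = [V(k / n) + coin_rate(k / n, p) for k in range(n + 1)]
    smallest = min(values)
    return [v - smallest for v in values]


if __name__ == "__main__":
    # Hypothetical toy model: fair coins instead of spins, and a hypothetical potential V.
    n = 2000
    V = lambda x: 3.0 * (x - 0.2) ** 2
    log_probs = tilted_log_probs(n, V)
    rate = tilted_rate(n, V)
    for k in range(0, n + 1, 400):
        print(f"x = {k / n:.1f}:  -log P / n = {-log_probs[k] / n:.4f},  V + I - inf = {rate[k]:.4f}")
```

Pointwise on the grid the two quantities differ by terms of order \(\log n/n\), which come from the polynomial prefactors in Stirling's formula and in the normalising constant.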
The large deviation principle of \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) has been mentioned in the case of trajectories in [11, Corollary 2.4] and—as a corollary of that theorem—for the case of the product space of the initial and the final space in [13, Corollary 1.3]. However, no proof was given. Theorem 8.2 provides a rigorous proof of the large deviation principle statement in [13, Corollary 1.3]. In this paper, we do not provide a rigorous proof of [11, Corollary 2.4]. But Theorem 1.3 may be used, as the conditioning on the final state is a regular conditional kernel with respect to the map \(\tau : C([0,T],\mathcal {X}) \rightarrow \mathcal {X}\), \(\tau (f)=f(T)\).
In order to deal with empirical distributions (and not with magnetisations as is done in [17]), in future research we strive to “extend” the statement of Theorem 8.2 to infinite and possibly non-compact state spaces. In the case of non-compact spaces, it may be that topologies on the space of probability measures are considered that are not metrisable.
1.5 Outline
We list some notations, definitions and assumptions in Sect. 2. In Sect. 3, we give and compare the notions of regular conditional kernels, and we show that a regular conditional kernel under a measure \(\nu \) is in fact a product regular conditional kernel under a measure that is related to \(\nu \). In Sect. 4, we introduce and study weakly continuous regular conditional kernels. In Sect. 5, we present some facts about lower semicontinuous functions with compact sublevel sets. Relying on the results of Sects. 4 and 5, in Sect. 6, we present results on large deviation bounds for product regular conditional probabilities, in particular necessary and sufficient conditions for these bounds to hold. In Sect. 7, we discuss how to obtain large deviation bounds for regular conditional probabilities from the results in Sect. 6. In Sect. 8, we apply the theory to obtain the large deviation principle for the empirical density of the first coordinate given the empirical density of the second coordinate, for independent and identically distributed pairs of random variables. In Sect. 9, we give some examples, including an example for which the conditions are not satisfied. For this example we compare the quenched large deviations with large deviations of the weakly continuous regular conditional probabilities and comment on the difference with an example by La Cour and Schieve [20]. In Appendices 1 and 2 we state some general results on large deviation bounds that are used in the different sections. In Appendix 3 we provide the proof of a theorem on which the examples of Sect. 9 rely.
2 Notations and Conventions
\(\mathbb {N}=\{1,2,3,\dots \}\). For a topological space \(\mathcal {X}\), we write \(\mathcal {B}(\mathcal {X})\) for the Borel \(\sigma \)-algebra and \(\mathcal {P}(\mathcal {X})\) and \(\mathcal {M}(\mathcal {X})\) for the spaces of probability and signed measures on \(\mathcal {B}(\mathcal {X})\), respectively. For \(A\subset \mathcal {X}\) we write \(A^\circ \) for the interior of A and \({{\overline{A}}}\) for the closure of A. For \(x\in \mathcal {X}\) we write \(\delta _x\) for the element in \(\mathcal {P}(\mathcal {X})\) with \(\delta _x(A) =1\) if \( x\in A\) and \(\delta _x(A)=0\) otherwise. For \(x\in \mathcal {X}\) we write \(\mathcal {N}_x\) for the set of \(\mathcal {B}(\mathcal {X})\)-measurable neighbourhoods of x. For \(\mu \in \mathcal {M}(\mathcal {X})\) we write \({{\mathrm{supp}}}\, \mu = \{ x\in \mathcal {X}: \mu (V)>0 \text{ for } \text{ all } V\in \mathcal {N}_x\}\) and call this the support of \(\mu \). For a function f from a set \(\mathcal {X}\) into \(\mathbb {R}\) and \(c\in \mathbb {R}\) we write \([f \ge c] = \{x\in \mathcal {X}: f(x) \ge c\}\). Similarly, we use the notations \([f > c]\), \([f\le c]\) and \([f<c]\). Whenever \((x_\iota )_{\iota \in \mathbb {I}}\) is a net, where \(\mathbb {I}\) is directed by (a direction) \(\preceq \), we write \(\liminf _{\iota \in \mathbb {I}} x_\iota = \sup _{\iota _0\in \mathbb {I}} \inf _{\iota \succeq \iota _0, \iota \in \mathbb {I}} x_\iota \) (and similarly for \(\limsup \)). In particular, if \(\mathcal {V}\subset \mathcal {N}_x\), \(\bigcap \mathcal {V}= \{x\}\) and \(f: \mathcal {V}\rightarrow \mathbb {R}\), we write \(\liminf _{V\in \mathcal {V}} f(V) = \sup _{V_0\in \mathcal {V}} \inf _{V\subset V_0, V\in \mathcal {V}} f(V)\), i.e. we consider \((f(V))_{V\in \mathcal {V}}\) as a net where \(\mathcal {V}\) is directed by \(\supset \) (as \(\preceq \)).
Whenever we write \(\mu (A \mid B)\), we implicitly assume that it is well defined (as \(\mu (A\cap B)/\mu (B)\)), i.e. that \(\mu (B)\ne 0\).
We use the conventions \(\log 0 = -\infty \) and \(\inf I(\emptyset ) = \infty \) whenever I is a function with values in \([0,\infty ]\).
All measures in this paper are signed measures, unless mentioned otherwise.
3 Regular Conditional Kernels Being Product Regular Conditional Kernels
In this section we introduce the notion of a (product) regular conditional kernel. For an extensive study of regular conditional kernels, see Bogachev [4, Section 10.4]. The notion of a product regular conditional kernel does not appear in [4], but it does in Faden [12] and in Leao et al. [21]. Besides giving definitions, we make a few observations, of which Theorem 3.6 is used later on to derive statements about regular conditional kernels from statements about product regular conditional kernels.
In this section \((X,\mathcal {A})\), \((Y,\mathcal {B})\) are measurable spaces, \(\nu \) is a measure on \(\mathcal {A}\) and \(\mu \) is a measure on \(\mathcal {A}\otimes \mathcal {B}\), \(\tau : X \rightarrow Y\) is measurable, and \(\pi : X \times Y \rightarrow Y\) is given by \(\pi (x,y)=y\).
Definition 3.1
A function \(\eta : Y \times \mathcal {A}\rightarrow \mathbb {R}\) is called a (\(\mathcal {B}\)-)kernel if \(\eta (\cdot ,A)\) is (\(\mathcal {B}\)-)measurable for all \(A\in \mathcal {A}\) and \(\eta (y,\cdot )\) is a measure for all \(y \in Y\). A kernel \(\eta \) is called a probability kernel if \(\eta (y,\cdot )\) is a probability measure for all \(y \in Y\).
Definition 3.2
(a) \(\eta \) is called a regular conditional kernel (regular conditional probability) under \(\nu \) with respect to \(\tau \) if$$\begin{aligned} \nu (A \cap \tau ^{-1}(B) ) = \int _Y \mathbbm {1}_B(y) \eta (y,A) {{\mathrm{\, \mathrm {d}}}}\left[ \nu \circ \tau ^{-1}\right] (y) \quad (A\in \mathcal {A}, B\in \mathcal {B}). \end{aligned}\tag{3.1}$$
(b) \(\eta \) is called a product regular conditional kernel (product regular conditional probability) under \(\mu \) with respect to \(\pi \) if$$\begin{aligned} \mu (A \times B ) = \int _Y \mathbbm {1}_B(y) \eta (y,A) {{\mathrm{\, \mathrm {d}}}}\left[ \mu \circ \pi ^{-1}\right] (y) \quad (A\in \mathcal {A}, B\in \mathcal {B}). \end{aligned}\tag{3.2}$$
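On finite spaces equipped with their power sets, (3.2) can be checked by brute force. The Python sketch below is a toy illustration under stated assumptions: \(\mu \) is a positive measure given by (hypothetical) point masses, and the kernel is taken to be \(\eta (y,\{x\}) = \mu (\{(x,y)\})/\mu (X\times \{y\})\) wherever the second marginal is nonzero, with an arbitrary probability measure elsewhere. Positivity matters here: for a signed measure the marginal \(\mu (X\times \{y\})\) can vanish while \(\mu (\{(x,y)\})\ne 0\), and then no kernel satisfies (3.2). Exact rational arithmetic avoids spurious floating-point mismatches.

```python
from fractions import Fraction
from itertools import chain, combinations


def subsets(points):
    """All subsets of a finite set, i.e. the discrete sigma-algebra."""
    points = list(points)
    return [set(c) for c in chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1))]


def product_kernel(mu, X, Y, fallback):
    """eta(y, {x}) = mu({(x, y)}) / mu(X x {y}) where the y-marginal is nonzero;
    the arbitrary probability measure `fallback` on X is used elsewhere."""
    eta = {}
    for y in Y:
        marginal = sum(mu[(x, y)] for x in X)
        if marginal != 0:
            eta[y] = {x: mu[(x, y)] / marginal for x in X}
        else:
            eta[y] = dict(fallback)
    return eta


def satisfies_3_2(mu, eta, X, Y):
    """Check mu(A x B) = sum_{y in B} eta(y, A) * (mu o pi^{-1})({y}) for all A, B."""
    marginal = {y: sum(mu[(x, y)] for x in X) for y in Y}
    for A in subsets(X):
        for B in subsets(Y):
            lhs = sum(mu[(x, y)] for x in A for y in B)
            rhs = sum(sum(eta[y][x] for x in A) * marginal[y] for y in B)
            if lhs != rhs:
                return False
    return True


# Hypothetical toy data: y = 2 carries no mass, so eta(2, .) is not determined there.
X, Y = ["a", "b"], [0, 1, 2]
mu = {("a", 0): Fraction(1, 4), ("b", 0): Fraction(1, 4),
      ("a", 1): Fraction(1, 8), ("b", 1): Fraction(3, 8),
      ("a", 2): Fraction(0), ("b", 2): Fraction(0)}
eta = product_kernel(mu, X, Y, fallback={"a": Fraction(1), "b": Fraction(0)})
print(satisfies_3_2(mu, eta, X, Y))  # True
```

Replacing the kernel by, say, the uniform distribution at every y makes the check fail, because (3.2) then no longer reproduces \(\mu \) on the rectangle \(\{a\}\times \{1\}\).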
3.3
Suppose that \(\mathcal {E}\) is a sub-\(\sigma \)-algebra of \(\mathcal {A}\). Let \((Y,\mathcal {B}) = (X, \mathcal {E})\) and let \({{\mathrm{Id}}}: (X, \mathcal {A}) \rightarrow (Y,\mathcal {B})\) be the identity map. In agreement with [4, Definition 10.4.1], a kernel \(\eta : Y \times \mathcal {A}\rightarrow \mathbb {R}\) is a regular conditional kernel under \(\nu \) with respect to \(\mathcal {E}\) if and only if \(\eta \) is a regular conditional kernel under \(\nu \) with respect to \({{\mathrm{Id}}}\).
3.4
Consider the two kernels \(\eta : Y \times \mathcal {A}\rightarrow \mathbb {R}\) and \(\xi : Y \times (\mathcal {A}\otimes \mathcal {B}) \rightarrow \mathbb {R}\), corresponding to each other by the formulas \(\xi (y, F) = \int _\mathcal {X}\mathbbm {1}_{F}(x,y) {{\mathrm{\, \mathrm {d}}}}[\eta (y,\cdot )](x)\) and \(\eta (y,A) = \xi (y,A\times Y)\). Then \(\xi \) is a regular conditional kernel under \(\mu \) given \(\pi \) if and only if \(\eta \) is a product regular conditional kernel under \(\mu \) given \(\pi \).
In general, \(X\times Y\) may be equipped with a \(\sigma \)-algebra \(\mathcal {F}\) different from \(\mathcal {A}\otimes \mathcal {B}\). In this situation, where \(\mu \) is a measure on \(\mathcal {F}\) and \(\pi \) is \(\mathcal {F}\)-measurable, the above correspondence cannot be used in general to reduce statements about product regular conditional kernels to statements about regular conditional kernels. See also 4.5.
On the other hand, regular conditional probabilities can be seen as special cases of product regular conditional probabilities; see Theorem 3.6. In the present paper we use this to derive Theorem 1.3 from Theorem 1.2, and Theorem 7.5 from Theorem 6.9.
Remark 3.5
If \(\mathcal {A}\) is generated by a countable set, two regular conditional probabilities under a measure with respect to a \(\sigma \)-algebra (see 3.3) are almost everywhere equal (see Bogachev [4, Theorem 10.4.3]). Analogous statements can be formulated for regular conditional kernels with respect to measurable maps and for product regular conditional kernels. In Theorem 4.3 we prove that (product) regular conditional kernels are unique on the domain on which they are weakly continuous, in case the underlying topological space is perfectly normal. For such spaces the Borel \(\sigma \)-algebra need not be generated by a countable set.^{2}
Theorem 3.6
 (a)
There exists a measure \({{\tilde{\mu }}}\) on \((X \times Y, \mathcal {A}\otimes \mathcal {B})\) for which \({{\tilde{\mu }}}(A\times B) = \nu (A \cap \tau ^{-1}(B))\).
 (b)
\(\eta : Y\times \mathcal {A}\rightarrow \mathbb {R}\) is a regular conditional kernel under \(\nu \) with respect to \(\tau \) if and only if \(\eta \) is a product regular conditional kernel under \({{\tilde{\mu }}}\) with respect to \(\pi \).
Proof
 (a)
We may assume \(\nu \) to be positive, since \(\nu = \nu ^+ - \nu ^-\). Let \(\mathcal {E}\) be the set that consists of the sets \(\bigcup _{i=1}^n A_i \times B_i\), where \(n\in \mathbb {N}\) and \(A_i \in \mathcal {A}, B_i \in \mathcal {B}\) are such that \(A_1\times B_1,\dots , A_n\times B_n\) are disjoint. Define \(\nu ^*: \mathcal {E}\rightarrow [0,\infty )\) by \( \nu ^* \left( \bigcup _{i=1}^n A_i \times B_i \right) = \nu \left( \bigcup _{i=1}^n A_i \cap \tau ^{-1}(B_i) \right) \) for \(A_1,\dots ,A_n\in \mathcal {A}\) and \(B_1,\dots ,B_n \in \mathcal {B}\) as above. Checking that \(\mathcal {E}\) is a ring of sets and that \(\nu ^*\) is well defined and \(\sigma \)-additive is left to the reader. The existence and uniqueness of the extension \({{\tilde{\mu }}}\) follow from the Carathéodory theorem (see Halmos [15, Section 13, Theorem A]).
 (b)
It follows from the definition of \({{\tilde{\mu }}}\) (note that \(\nu \circ \tau ^{-1} = {{\tilde{\mu }}} \circ \pi ^{-1}\)). \(\square \)
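In the finite case the measure \({{\tilde{\mu }}}\) of Theorem 3.6 can be written down explicitly: it is the image of \(\nu \) under \(x\mapsto (x,\tau (x))\), since this image measure gives \(A\times B\) the mass \(\nu (A\cap \tau ^{-1}(B))\). The Python sketch below uses a hypothetical toy example with positive point masses; it builds \({{\tilde{\mu }}}\) this way and checks, for one kernel that fits the data and one that does not, that (3.1) under \(\nu \) holds exactly when (3.2) under \({{\tilde{\mu }}}\) holds.

```python
from fractions import Fraction
from itertools import chain, combinations


def subsets(points):
    """All subsets of a finite set, i.e. the discrete sigma-algebra."""
    points = list(points)
    return [set(c) for c in chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1))]


def lift(nu, tau, X, Y):
    """mu_tilde({(x, y)}) = nu({x}) if y == tau(x), else 0: the image of nu under x -> (x, tau(x))."""
    return {(x, y): (nu[x] if tau[x] == y else Fraction(0)) for x in X for y in Y}


def satisfies_3_1(nu, tau, eta, X, Y):
    """(3.1): nu(A cap tau^{-1}(B)) = sum_{y in B} eta(y, A) * (nu o tau^{-1})({y})."""
    image = {y: sum(nu[x] for x in X if tau[x] == y) for y in Y}
    return all(
        sum(nu[x] for x in A if tau[x] in B)
        == sum(sum(eta[y][x] for x in A) * image[y] for y in B)
        for A in subsets(X) for B in subsets(Y))


def satisfies_3_2(mu, eta, X, Y):
    """(3.2): mu(A x B) = sum_{y in B} eta(y, A) * (mu o pi^{-1})({y})."""
    marginal = {y: sum(mu[(x, y)] for x in X) for y in Y}
    return all(
        sum(mu[(x, y)] for x in A for y in B)
        == sum(sum(eta[y][x] for x in A) * marginal[y] for y in B)
        for A in subsets(X) for B in subsets(Y))


# Hypothetical toy data: tau records the parity of x; the value "other" is never attained.
X, Y = [1, 2, 3, 4], ["odd", "even", "other"]
tau = {1: "odd", 2: "even", 3: "odd", 4: "even"}
nu = {x: Fraction(x, 10) for x in X}
mu_tilde = lift(nu, tau, X, Y)


def point_mass(p):
    return {x: Fraction(int(x == p)) for x in X}


good = {"odd": {1: Fraction(1, 4), 2: Fraction(0), 3: Fraction(3, 4), 4: Fraction(0)},
        "even": {1: Fraction(0), 2: Fraction(1, 3), 3: Fraction(0), 4: Fraction(2, 3)},
        "other": point_mass(1)}  # arbitrary where nu o tau^{-1} has no mass
bad = {y: {x: Fraction(1, 4) for x in X} for y in Y}

for eta in (good, bad):
    print(satisfies_3_1(nu, tau, eta, X, Y), satisfies_3_2(mu_tilde, eta, X, Y))
# True True
# False False
```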
4 Weakly Continuous Kernels
In this section we introduce the notion of weak continuity for kernels on topological spaces. In Theorem 4.3 we show uniqueness of (product) regular conditional kernels that are weakly continuous. In Theorems 4.6 and 4.8 we describe conditions that imply the existence of weakly continuous regular conditional probabilities. As in the Portmanteau theorem for metric spaces, weak continuity implies lower bounds for open sets and upper bounds for closed sets, as is shown in Theorem 4.10. As described in Lemmas 4.11 and 4.12, these \(\liminf \) and \(\limsup \) bounds imply bounds for (product) regular conditional probabilities, on which the results of Sects. 6 and 7 are based.
In this section \(\mathcal {X}\) and \(\mathcal {Y}\) are topological spaces, \(\nu \) is a measure on \(\mathcal {B}(\mathcal {X})\), \(\mu \) is a measure on \(\mathcal {B}(\mathcal {X})\otimes \mathcal {B}(\mathcal {Y})\), \(\tau : \mathcal {X}\rightarrow \mathcal {Y}\) is measurable, and \(\pi : \mathcal {X}\times \mathcal {Y}\rightarrow \mathcal {Y}\) is given by \(\pi (x,y) = y\).
Definition 4.1
We equip the space of measures, \(\mathcal {M}(\mathcal {X})\), with the weak topology (generated by \(C_b(\mathcal {X})\), which we denote by \(\sigma (\mathcal {M}(\mathcal {X}),C_b(\mathcal {X}))\) as in the book of Schaefer [26, Chapter II, Section 5]). In this topology, a net \((\mu _\iota )_{\iota \in \mathbb {I}}\) in \(\mathcal {M}(\mathcal {X})\) converges to a \(\mu \) in \(\mathcal {M}(\mathcal {X})\) if and only if \(\int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\mu _\iota \rightarrow \int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\mu \) for all \(f\in C_b(\mathcal {X})\).
Let \(D\subset \mathcal {Y}\). A kernel \(\eta : \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow \mathbb {R}\) is called weakly continuous on D if the map \(D\rightarrow \mathcal {M}(\mathcal {X})\) given by \(y\mapsto \eta (y,\cdot )\) is continuous in the weak topology. \(\eta \) is called weakly continuous if \(\eta \) is weakly continuous on \(\mathcal {Y}\).
Theorem 4.2
Proof
We may assume \(\mu \) is positive. Let \(x\in {{\mathrm{supp}}}\, \mu \). Then \(\mu (V)>0\) for all \(V\in \mathcal {N}_x\). Let \(f\in C(\mathcal {X},[0,1])\) be such that \(f(x)>0\). Then \(V= f^{-1}(0,\infty )\) is an open neighbourhood of x and therefore has strictly positive measure. Since \(\mu (V) = \lim _{n\rightarrow \infty }\int _\mathcal {X}\min \{nf, 1\} {{\mathrm{\, \mathrm {d}\!}}}\mu \) by monotone convergence, there exists an n such that \(\int _\mathcal {X}\min \{nf, 1\} {{\mathrm{\, \mathrm {d}\!}}}\mu >0\). Consequently, as \(f \ge \frac{1}{n} \min \{nf, 1\}\), we have \(\int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\mu >0\).
Conversely, let \(x\in \mathcal {X}\) be such that \(\int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\mu >0\) for all \(f \in C(\mathcal {X},[0,1])\) with \(f(x)>0\). Let \(V\in \mathcal {N}_x\). As V contains a set of the form \(f^{-1}(0,\infty )\) for some \(f\in C(\mathcal {X},[0,1])\) with \(f(x)>0\), and \(0\le f\le 1\) vanishes outside that set, we have \(\mu (V) \ge \int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\mu >0\). \(\square \)
Theorem 4.3
 (a)
Let \(\eta \) and \(\zeta \) be regular conditional kernels under \(\nu \) with respect to \(\tau \) that are weakly continuous on \({{\mathrm{supp}}}(\nu \circ \tau ^{-1})\). Then \(\eta (y,\cdot )=\zeta (y,\cdot )\) for all \(y\in {{\mathrm{supp}}}(\nu \circ \tau ^{-1})\). If \(\nu \) is a probability measure, then \(\eta (y,\cdot )\) is a probability measure for all \(y\in {{\mathrm{supp}}}(\nu \circ \tau ^{-1})\).
 (b)
Let \(\eta \) and \(\zeta \) be product regular conditional kernels under \(\mu \) with respect to \(\pi \) that are weakly continuous on \({{\mathrm{supp}}}(\mu \circ \pi ^{-1})\). Then \(\eta (y,\cdot )=\zeta (y,\cdot )\) for all \(y\in {{\mathrm{supp}}}(\mu \circ \pi ^{-1})\). If \(\mu \) is a probability measure, then \(\eta (y,\cdot )\) is a probability measure for all \(y\in {{\mathrm{supp}}}(\mu \circ \pi ^{-1})\).
Proof
4.4
When \(\eta \) is a regular conditional kernel under \(\nu \) with respect to \(\tau \), the value of the function \(\eta (\cdot ,A)\) on the complement of \({{\mathrm{supp}}}(\nu \circ \tau ^{-1})\) is not determined, in the sense that if \({{\tilde{\eta }}}\) is a kernel with \({{\tilde{\eta }}}(y,\cdot ) = \eta (y,\cdot )\) for all \(y\in {{\mathrm{supp}}}(\nu \circ \tau ^{-1})\), then \({{\tilde{\eta }}}\) is also a regular conditional kernel under \(\nu \) with respect to \(\tau \).
For example, \({{\tilde{\eta }}}\) given by \({{\tilde{\eta }}}(y,\cdot ) = \eta (y,\cdot )\) for \(y\in {{\mathrm{supp}}}(\nu \circ \tau ^{-1})\) and \({{\tilde{\eta }}}(y,\cdot ) = \delta _x\) for \(y\in {{\mathrm{supp}}}(\nu \circ \tau ^{-1})^c\), for some fixed \(x\in \mathcal {X}\), is such a regular conditional kernel.
Hence, if \(\nu \) is a probability measure and there exists a regular conditional kernel under \(\nu \) with respect to \(\tau \) that is weakly continuous on \({{\mathrm{supp}}}(\nu \circ \tau ^{-1})\), then we may as well assume this kernel to be a probability kernel. A similar statement is true for product regular conditional kernels.
4.5
By Theorem 3.6, statement (a) of Theorem 4.3 is a consequence of statement (b). When one tries to reduce statement (b) to statement (a), the following problem arises with the correspondence between regular conditional kernels and product regular conditional kernels that is mentioned in 3.4.
The Borel \(\sigma \)-algebra of \(\mathcal {X}\times \mathcal {Y}\), i.e. \(\mathcal {B}(\mathcal {X}\times \mathcal {Y})\), may be strictly larger than \(\mathcal {B}(\mathcal {X}) \otimes \mathcal {B}(\mathcal {Y})\) (see, e.g. Bogachev [4, Lemma 6.4.1 and Example 6.4.3]). If this is the case, i.e. \(\mathcal {B}(\mathcal {X})\otimes \mathcal {B}(\mathcal {Y}) \subsetneq \mathcal {B}(\mathcal {X}\times \mathcal {Y})\), and \(\mathcal {B}(\mathcal {X}\times \mathcal {Y})\) equals the Baire \(\sigma \)-algebra on \(\mathcal {X}\times \mathcal {Y}\), i.e. the smallest \(\sigma \)-algebra that makes all continuous functions \(\mathcal {X}\times \mathcal {Y}\rightarrow \mathbb {R}\) measurable, then there exists a continuous function \(f\in C(\mathcal {X}\times \mathcal {Y})\) that is not \(\mathcal {B}(\mathcal {X}) \otimes \mathcal {B}(\mathcal {Y})\)-measurable. Composing f with \(\arctan \), we obtain a \(g\in C_b(\mathcal {X}\times \mathcal {Y})\) that is not measurable with respect to \(\mathcal {B}(\mathcal {X})\otimes \mathcal {B}(\mathcal {Y})\). So if \(\eta : \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow \mathbb {R}\) is a product regular conditional kernel under \(\mu \) with respect to \(\pi \), and \(\xi : \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \otimes \mathcal {B}(\mathcal {Y}) \rightarrow \mathbb {R}\) is as in 3.4, then g is not integrable with respect to \(\xi (y,\cdot )\) for any \(y\in \mathcal {Y}\), since g is not measurable with respect to the domain \(\mathcal {B}(\mathcal {X})\otimes \mathcal {B}(\mathcal {Y})\) of \(\xi (y,\cdot )\).
\(\mathcal {B}(\mathcal {X}\times \mathcal {Y})\) equals the Baire \(\sigma \)-algebra if \(\mathcal {X}\times \mathcal {Y}\) is a metric space (Bogachev [4, Proposition 6.3.4]). Therefore \(\mathcal {X}= \mathcal {Y}= \mathbb {R}^\mathbb {R}\), equipped with the discrete topology, form an example for which the above is the case.
We state two theorems (Theorems 4.6 and 4.8) showing the existence of product regular conditional probabilities that are weakly continuous on \({{\mathrm{supp}}}(\mu \circ \pi ^{-1})\).
Theorem 4.6
Proof
It follows from the fact that \(\mu (A\times B) = \sum _{y\in B} \mu (A\times \{y\})\) for \(A\in \mathcal {B}(\mathcal {X})\), \(B\in \mathcal {B}(\mathcal {Y})\). \(\square \)
4.7
 (a)
\(\eta \) is weakly continuous in y.
 (b)
For all \((y_n)_{n\in \mathbb {N}}\) in \(\mathcal {Y}\) with \(y_n \rightarrow y\), one has \(\eta (y_n,\cdot ) \xrightarrow {w} \eta (y,\cdot )\).
The following theorem is an easy consequence of the Lebesgue dominated convergence theorem.
Theorem 4.8
Remark 4.9
In the above theorem the conditions may be weakened. Instead of assuming f to be bounded and \(\lambda \), \(\kappa \) to be probability measures, we may as well assume that \(\lambda \) and \(\kappa \) are positive nonzero measures; that for all \(y\in D\) there exist a \(V\in \mathcal {N}_y\) and a \(\lambda \)-integrable \(h: \mathcal {X}\rightarrow [0,\infty )\) such that \(f(x,z) \le h(x)\) for all \(x\in \mathcal {X}\) and all \(z\in V \cap D \); and that f is \(\lambda \otimes \kappa \)-integrable.
In Sect. 6, the condition (b) of Theorem 4.10 is one of the key assumptions. If \(\mathcal {X}\) is a metric space, this property follows from weak continuity as in the Portmanteau theorem. We state this in Theorem 4.10.
Theorem 4.10
 (a)
\(D \rightarrow \mathcal {M}(\mathcal {X})\), \(y\mapsto \eta (y,\cdot )\) is weakly continuous in y.
 (b)
\(\liminf _{\iota \in \mathbb {I}} \eta (y_\iota ,G) \ge \eta (y,G)\) for all open \(G\subset \mathcal {X}\) and \((y_\iota )_{\iota \in \mathbb {I}}\) in D with \(y_\iota \rightarrow y\).
 (c)
\(\limsup _{\iota \in \mathbb {I}}\eta (y_\iota ,F) \le \eta (y,F)\) for all closed \(F\subset \mathcal {X}\) and \((y_\iota )_{\iota \in \mathbb {I}}\) in D with \(y_\iota \rightarrow y\).
 (d)
\(\sup _{V\in \mathcal {V}} \inf _{v\in V\cap D} \eta (v,G) \ge \eta (y,G)\) for all open sets \(G\subset \mathcal {X}\).
 (e)
\(\inf _{V\in \mathcal {V}} \sup _{v\in V\cap D} \eta (v,F) \le \eta (y,F)\) for all closed sets \(F\subset \mathcal {X}\).
Proof
We leave it to the reader to check the equivalences between (b), (c), (d) and (e). If \(\mathcal {X}\) is a metric space, one can follow the proof of the Portmanteau theorem in the book of Billingsley [3, Theorem 2.1] for the implication (a)\(\Longrightarrow \)(b). The fact that the measures in that proof are indexed by the natural numbers instead of a general directed set \(\mathbb {I}\) does not affect the argument. The proof of (b)\(\Longrightarrow \)(a) in the book of Billingsley relies on the Lebesgue dominated convergence theorem, which applies to sequences but not to general nets. When \(\mathcal {Y}\) is first countable, however, one can restrict to sequences (see 4.7) and obtain the implication (b)\(\Longrightarrow \)(a) as is done in the book of Billingsley. \(\square \)
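The inequalities in (b) and (c) may be strict. Here is a simple illustration that is not taken from the text. On \(\mathcal {X}= \mathcal {Y}= \mathbb {R}\), the kernel \(\eta (y,\cdot ) = \delta _y\) is weakly continuous in y. Yet along \(y_n = 1/n \rightarrow 0\), the open set \(G=(0,1)\) and the closed set \(F=\{0\}\) both give strict inequalities. The Python sketch below only evaluates a finite tail of the sequence, so it illustrates the limits rather than proving them.

```python
def dirac(y, contains):
    """eta(y, A) = delta_y(A); the set A is given by a membership predicate."""
    return 1.0 if contains(y) else 0.0


def in_G(x):
    return 0 < x < 1      # the open set G = (0, 1)


def in_F(x):
    return x == 0         # the closed set F = {0}


y = 0.0
y_seq = [1 / n for n in range(2, 10_001)]   # y_n = 1/n -> y
tail = y_seq[-100:]                          # finite stand-in for the tail of the sequence

liminf_G = min(dirac(t, in_G) for t in tail)
limsup_F = max(dirac(t, in_F) for t in tail)
print(liminf_G, dirac(y, in_G))   # 1.0 0.0  (strict inequality in (b))
print(limsup_F, dirac(y, in_F))   # 0.0 1.0  (strict inequality in (c))
```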
Lemma 4.11
Proof
For a regular conditional probability we have a similar statement; see Lemma 4.12. The proof can be done following the lines of the proof of Lemma 4.11 or as a consequence of Lemma 4.11 using Theorem 3.6.
Lemma 4.12
5 Some Facts About Functions with Compact Sublevel Sets
In this section we present some facts for functions with compact sublevel sets which are used in Sects. 6, 7 and 8.
In this section \(\mathcal {X},\mathcal {Y}\) and \(\mathcal {Z}\) are topological spaces.
Definition 5.1
Let \(J: \mathcal {X}\rightarrow [0,\infty ]\). We call the set \([J\le \alpha ]\) (see Sect. 2) a sublevel set of J for \(\alpha \in [0,\infty )\). J is said to be lower semicontinuous if all sublevel sets of J are closed. J is said to have compact sublevel sets if all sublevel sets of J are compact.
5.2
Lemma 5.3
Proof
5.4
The assumption that \(\tau \) be continuous is not redundant. For example, consider \(\mathcal {Y}=\mathcal {Z}=[0,1]\), \(J = \mathbbm {1}_{(\frac{1}{2},1]}\), \(F=[0,1]\), \(y=1\), and \(\tau \) given by \(\tau (0)=0\), \(\tau (1)=1\) and \(\tau (x) = 1-x\) for \(x\in (0,1)\). Then, for all neighbourhoods V of y, \(\tau ^{-1}(V)\) contains the interval \((0,\varepsilon )\) for some \(\varepsilon >0\), whence \(\inf J(F\cap \tau ^{-1}(V)) = 0\) but \(\inf J(F\cap \tau ^{-1}(\{y\})) = J(1) = 1\).
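The counterexample can be checked numerically. The Python sketch below replaces \([0,1]\) by a uniform grid, which is an approximation, so it illustrates the claim rather than proving it. It confirms that the infimum of J over \(F\cap \tau ^{-1}(V)\) is 0 for shrinking neighbourhoods \(V=(1-\delta ,1]\) of \(y=1\). It also confirms that J equals 1 on \(F\cap \tau ^{-1}(\{y\}) = \{1\}\).

```python
def J(z):
    """Indicator of (1/2, 1]; lower semicontinuous on [0, 1]."""
    return 1.0 if 0.5 < z <= 1.0 else 0.0


def tau(x):
    """Discontinuous map on [0, 1]: fixes 0 and 1, sends x to 1 - x on (0, 1)."""
    if x in (0.0, 1.0):
        return x
    return 1.0 - x


F = [k / 10_000 for k in range(10_001)]   # finite grid standing in for F = [0, 1]
y = 1.0

# Neighbourhoods V = (1 - delta, 1] of y: the preimage always reaches into (0, delta).
for delta in (0.1, 0.01, 0.001):
    preimage = [x for x in F if 1.0 - delta < tau(x) <= 1.0]
    print(delta, min(J(x) for x in preimage))       # infimum 0.0 for every delta

# The fibre over y itself is {1}, where J equals 1.
print(min(J(x) for x in F if tau(x) == y))           # 1.0
```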
Lemma 5.5
 (a)For all open \(G\subset \mathcal {X}\) and \(\varepsilon >0\), there exists a \(U\in \mathcal {G}\) with \(U\subset {{\overline{U}}} \subset G\) such that$$\begin{aligned} \inf J(G \times \{y\}) + \varepsilon \ge \inf J (U \times \{y\}). \end{aligned}$$(5.6)
 (b) For all closed \(F\subset \mathcal {X}\) and \(\alpha < \inf J(F \times \{y\})\), there exist \(U_1,\dots , U_k \in \mathcal {G}\) such that with \(W=\mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\) one has \(F\subset W^\circ \subset W\) and$$\begin{aligned} \alpha&< \inf J(W \times \{y\}) \le \inf J(W^\circ \times \{y\}) \le \inf J(F \times \{y\}). \end{aligned}$$(5.7)
Proof
 (a)
Let \(\varepsilon >0\). Let \(x\in G\) be such that \( J(x,y) \le \inf J(G \times \{y\}) + \varepsilon . \) Since \(\mathcal {X}\) is a normal topological space, there exists an open set U with \(x\in U \subset {{\overline{U}}} \subset G\). Because \(\mathcal {G}\) is a basis, U may be chosen in \(\mathcal {G}\). Then \( \inf J(G \times \{y\}) + \varepsilon \ge J(x,y) \ge \inf J(U \times \{y\}). \)
 (b)
Let \(\beta >\alpha \) be such that \(\beta < \inf J(F \times \{y\})\). The set \(K:=\{ x\in \mathcal {X}: J(x,y)\le \beta \}\) is a compact set that is disjoint from F. Hence there exist disjoint open \(U,V \subset \mathcal {X}\) with \(K\subset U\) and \(F\subset V\). Since \(\mathcal {G}\) is a basis and K is compact, there exist \(U_1,\dots ,U_k\) in \(\mathcal {G}\) with \(K\subset U_1\cup \cdots \cup U_k \subset U\). Then \(\overline{ U_1\cup \cdots \cup U_k} \cap V = \emptyset \), as V is open and disjoint from U. Hence, with \(W:=\mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\), one has \(F\subset V\subset \mathcal {X}\setminus \overline{ U_1\cup \cdots \cup U_k} = W^\circ \) and \(W\subset \mathcal {X}\setminus K\). The latter implies \(\inf J(W \times \{y\}) \ge \beta > \alpha \). \(\square \)
6 Large Deviations for Product Regular Conditional Probabilities
 (i)
\(\mathcal {X}\) and \(\mathcal {Y}\) are topological spaces, where \(\mathcal {X}\) is normal.
 (ii)
\(\mathcal {G}\) is a basis for the topology of \(\mathcal {X}\) and \(\mathcal {H}\) is a basis for the topology of \(\mathcal {Y}\).
 (iii)
\(\pi : \mathcal {X}\times \mathcal {Y}\rightarrow \mathcal {Y}\) is given by \(\pi (x,y) =y\).
 (iv)
\((\mu _n)_{n\in \mathbb {N}}\) is a sequence of probability measures on \(\mathcal {B}(\mathcal {X}) \otimes \mathcal {B}(\mathcal {Y})\) satisfying the large deviation principle on \(\{A\times B: A\in \mathcal {B}(\mathcal {X}), B\in \mathcal {B}(\mathcal {Y})\}\) with a rate function \(J: \mathcal {X}\times \mathcal {Y}\rightarrow [0,\infty ]\) that has compact sublevel sets.
 (v) For each \(n\in \mathbb {N}\) we assume the following: \({{\mathrm{supp}}}(\mu _n \circ \pi ^{-1}) \ne \emptyset \),^{5} and there exists a product regular conditional probability \(\eta _n : \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow [0,1]\) under \(\mu _n\) with respect to \(\pi \) which satisfies the following continuity condition (see Theorem 4.10):$$\begin{aligned}&\liminf _{\iota \in \mathbb {I}} \eta _n(y_\iota ,G) \ge \eta _n(y,G) \hbox { for all open }G\subset \mathcal {X}\nonumber \\&\hbox {and }(y_\iota )_{\iota \in \mathbb {I}} \hbox { in }{{\mathrm{supp}}}(\mu _n \circ \pi ^{-1}) \hbox { with } y_\iota \rightarrow y. \end{aligned}$$(6.1)
 (vi) Let \(y \in \mathcal {Y}\). We assume that \(\inf J(\mathcal {X}\times \{y\})<\infty \) and that there exist \(y_n \in {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\) with \(y_n \rightarrow y\). We define \(I: \mathcal {X}\rightarrow [0,\infty ]\) by$$\begin{aligned} I(x) = J(x,y) - \inf J(\mathcal {X}\times \{y\}). \end{aligned}$$(6.2)
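To make (6.2) concrete, assume for illustration only the Cramér rate function \(J(x,y) = (x^2 - 2\rho xy + y^2)/(2(1-\rho ^2))\). This is the rate function of the empirical mean of i.i.d. standard bivariate Gaussian pairs with correlation \(\rho \), a hypothetical choice not made in this section. Under this assumption, I equals \((x-\rho y)^2/(2(1-\rho ^2))\) and is minimised at \(x=\rho y\). The Python sketch below evaluates (6.2) on a grid, so the infimum is taken over grid points only.

```python
import math

RHO = 0.5  # hypothetical correlation


def J(x, y):
    """Cramer rate function of the mean of i.i.d. standard bivariate Gaussians with correlation RHO."""
    return (x * x - 2 * RHO * x * y + y * y) / (2 * (1 - RHO ** 2))


def conditional_rate(J, xs, y):
    """I(x) = J(x, y) - inf_{x'} J(x', y) as in (6.2), with the infimum taken over the grid xs."""
    base = min(J(x, y) for x in xs)
    if math.isinf(base):
        raise ValueError("inf J(X x {y}) must be finite, as assumed in (vi)")
    return {x: J(x, y) - base for x in xs}


xs = [k / 100 for k in range(-300, 301)]
I = conditional_rate(J, xs, y=1.0)
x_min = min(I, key=I.get)
print(x_min, I[x_min])                                        # 0.5 0.0
print(I[1.0], (1.0 - RHO * 1.0) ** 2 / (2 * (1 - RHO ** 2)))  # both approx. 0.1667
```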
In Theorem 6.3 we consider a fixed sequence \((y_n)_{n\in \mathbb {N}}\) with \(y_n \rightarrow y\) and describe equivalent conditions for the lower and upper large deviation bounds to hold.
We are interested in the question whether for all sequences \((y_n)_{n\in \mathbb {N}}\) with \(y_n \rightarrow y\) the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the lower and upper large deviation bound with rate function I. In Theorem 6.9 we give equivalent^{6} and sufficient conditions for these bounds in a way that does not depend on sequences \((y_n)_{n\in \mathbb {N}}\) and the sets \((\mathcal {V}_n)_{n\in \mathbb {N}}\) as in Theorem 6.3.
Finally in 6.12 we comment on deriving Theorem 1.2 from Theorem 6.9.
But first we consider specific situations, providing a simple proof of the large deviation bounds with rate function I for sequences of the form \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\). Namely, we consider the case that \(\mathcal {Y}\) is a discrete space (Theorem 6.1) and the case where \(\mu _n\) is a product measure for all \(n\in \mathbb {N}\) (Theorem 6.2).
Theorem 6.1
Suppose that \(\mathcal {Y}\) is countable and equipped with the discrete topology. Let \(y\in \mathcal {Y}\) be such that \(\inf J(\mathcal {X}\times \{y\})<\infty \). For all \((y_n)_{n\in \mathbb {N}}\) in \(\mathcal {Y}\) with \(y_n \in {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\) and \(y_n \rightarrow y\), the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function I.
Proof
Theorem 6.2
(Independent coordinates) Suppose that \(\mathcal {X}\) and \(\mathcal {Y}\) are second countable and \(\mathcal {Y}\) is regular. Suppose that \(\mu _n = \mu _n^1 \otimes \mu _n^2\) for some \(\mu _n^1\) on \(\mathcal {B}(\mathcal {X})\) and \(\mu _n^2\) on \(\mathcal {B}(\mathcal {Y})\) for all \(n\in \mathbb {N}\). Then \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function I for all sequences \((y_n)_{n\in \mathbb {N}}\) in \(\mathcal {Y}\). In particular, \(\eta _n(y_n,\cdot ) = \mu _n^1\) and \(I(x) = \inf J(\{x\} \times \mathcal {Y})\).
Proof
I is lower semicontinuous (e.g. by 5.2). For \(c\in \mathbb {R}\), the set \([I \le c]\) is a subset of the compact set \(\{x\in \mathcal {X}: \exists z\in \mathcal {Y}, J(x,z) \le c + \inf J(\mathcal {X}\times \{y\})\}\). Being closed, it is therefore compact.
\([I\le c] = \pi ( [ J \le c +\inf J(\mathcal {X}\times \{y\})])\). \(\square \)
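The two identities in Theorem 6.2 can be illustrated on finite spaces. The Python sketch below uses hypothetical marginals and a hypothetical additive rate function \(J(x,z) = J_1(x) + J_2(z)\) with \(\inf J_1 = \inf J_2 = 0\); neither is taken from the text. It checks that the conditional law of the first coordinate does not depend on y. It also checks that (6.2) agrees with \(\inf J(\{x\}\times \mathcal {Y})\).

```python
from fractions import Fraction

# Hypothetical marginals on X = {a, b} and Y = {0, 1, 2}; mu is their product.
mu1 = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
mu2 = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
mu = {(x, y): p * q for x, p in mu1.items() for y, q in mu2.items()}


def eta(y, A):
    """mu(A x {y}) / mu(X x {y}): the conditional law of the first coordinate given y."""
    num = sum(p for (x, z), p in mu.items() if z == y and x in A)
    den = sum(p for (_, z), p in mu.items() if z == y)
    return num / den


# Under a product measure the kernel does not depend on y: eta(y, .) = mu^1.
for y in mu2:
    assert all(eta(y, {x}) == mu1[x] for x in mu1)

# Hypothetical additive rate function J(x, z) = J1(x) + J2(z) with inf J1 = inf J2 = 0.
J1 = {"a": 0.0, "b": 1.5}
J2 = {0: 0.75, 1: 0.0, 2: 2.0}


def J(x, z):
    return J1[x] + J2[z]


y = 0
I_def = {x: J(x, y) - min(J(u, y) for u in J1) for x in J1}   # definition (6.2)
I_marg = {x: min(J(x, z) for z in J2) for x in J1}           # inf J({x} x Y)
print(I_def, I_marg)   # {'a': 0.0, 'b': 1.5} {'a': 0.0, 'b': 1.5}
```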
Theorem 6.3
 (a1) For all open \(G\subset \mathcal {X}\)$$\begin{aligned} \liminf _{n\rightarrow \infty }\tfrac{1}{n}\log \eta _n(y_n,G) \ge - \inf I(G). \end{aligned}$$(6.5)
 (a2) For all \(U\in \mathcal {G}\) ^{7}$$\begin{aligned} \liminf _{n\rightarrow \infty }\limsup _{V\in \mathcal {V}_n} \tfrac{1}{n}\log \mu _n({\overline{U}} \times \mathcal {Y} \mid \mathcal {X}\times V) \ge - \inf I(U). \end{aligned}$$(6.6)
 (a3) For all open \(U\subset \mathcal {X}\), one has$$\begin{aligned} \liminf _{n\rightarrow \infty }\liminf _{V\in \mathcal {V}_n} \tfrac{1}{n}\log \mu _n(U \times \mathcal {Y} \mid \mathcal {X}\times V) \ge - \inf I(U). \end{aligned}$$(6.7)
 (b1) For all closed \(F\subset \mathcal {X}\)$$\begin{aligned} \limsup _{n\rightarrow \infty }\tfrac{1}{n}\log \eta _n(y_n,F) \le - \inf I(F). \end{aligned}$$(6.8)
 (b2) For all \(U_1,\dots ,U_k\in \mathcal {G}\), one has for \(W= \mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\)$$\begin{aligned} \limsup _{n\rightarrow \infty }\liminf _{V\in \mathcal {V}_n} \tfrac{1}{n}\log \mu _n(W^\circ \times \mathcal {Y} \mid \mathcal {X}\times V) \le - \inf I(W). \end{aligned}$$(6.9)
 (b3) For all closed \(W\subset \mathcal {X}\)$$\begin{aligned} \limsup _{n\rightarrow \infty }\limsup _{V\in \mathcal {V}_n} \tfrac{1}{n}\log \mu _n(W \times \mathcal {Y} \mid \mathcal {X}\times V) \le - \inf I(W). \end{aligned}$$(6.10)
Proof
The implications (a3) \(\Longrightarrow \) (a2) and (b3) \(\Longrightarrow \) (b2) are immediate.
6.4
(Fixed y) Note that if \(y_n =y\) for all \(n\in \mathbb {N}\), one can take \(\mathcal {V}_n = \mathcal {V}\) for a \(\mathcal {V}\subset \mathcal {N}_y\) with \(\bigcap \mathcal {V}= \{y\}\). Then Theorem 6.3 implies that \((\eta _n(y,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function I if and only if (a2) and (b2) hold (with \(\mathcal {V}_n=\mathcal {V}\)).
6.5
Let \((y_n)_{n\in \mathbb {N}}\) in \(\mathcal {Y}\) be such that \(y_n \in {{\mathrm{supp}}}(\mu _n\circ \pi ^{-1})\) and \(y_n \rightarrow y\). From Theorem 6.3 we derive that (a2) holds for some \(\mathcal {V}_n\subset \mathcal {N}_{y_n}\) with \(\bigcap \mathcal {V}_n=\{y_n\}\) if and only if (a2) holds for all such \(\mathcal {V}_n\). Similarly, (b2) holds for some \(\mathcal {V}_n\subset \mathcal {N}_{y_n}\) with \(\bigcap \mathcal {V}_n=\{y_n\}\) if and only if (b2) holds for all such \(\mathcal {V}_n\subset \mathcal {N}_{y_n}\).
In Lemma 6.7, we give a consequence of the large deviation principle of \((\mu _n)_{n\in \mathbb {N}}\). In Theorems 6.9 and 6.10 we use this to formulate sufficient conditions for upper or lower large deviation bounds on sequences \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) with \(y_n \rightarrow y\) and sequences \((\eta _n(y,\cdot ))_{n\in \mathbb {N}}\).
We assumed \(\mathcal {X}\) to be normal in this section. For Lemma 6.7 this assumption can be dropped.
6.6
Lemma 6.7
 (a) For open \(G\subset \mathcal {X}\)$$\begin{aligned} \liminf _{V\in \mathcal {N}_y} \liminf _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n(G \times \mathcal {Y} \mid \mathcal {X}\times V) \ge - \inf I (G). \end{aligned}$$(6.16)
 (b) For closed \(F\subset \mathcal {X}\)$$\begin{aligned} \limsup _{V\in \mathcal {N}_y} \limsup _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n(F \times \mathcal {Y} \mid \mathcal {X}\times V) \le - \inf I(F). \end{aligned}$$(6.17)
Proof
 (a) Let \(\varepsilon >0\). By Lemma 5.3, there exists a \(V_0\in \mathcal {N}_y\) such that for all \(V\in \mathcal {N}_y\) with \(V\subset V_0\)$$\begin{aligned}&\inf J(\mathcal {X}\times \{y\}) \ge \inf J(\mathcal {X}\times {\overline{V}}) \ge \inf J(\mathcal {X}\times {\overline{V}}_0) \ge \inf J(\mathcal {X}\times \{y\}) - \varepsilon . \end{aligned}$$(6.18)Let \(V\in \mathcal {N}_y\) be such that \(V \subset V_0\). As \(\limsup _{n\rightarrow \infty }\tfrac{1}{n}\log \mu _n(\mathcal {X}\times V) > -\infty \) (see 6.6), we can “split the \(\liminf \) in two”. By the large deviation principle and by (6.18) we then get$$\begin{aligned}&\liminf _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n (G \times \mathcal {Y} \mid \mathcal {X}\times V) \nonumber \\&\qquad \qquad \ge \liminf _{n\rightarrow \infty }\tfrac{1}{n}\log \mu _n(G \times V) - \limsup _{n\rightarrow \infty }\tfrac{1}{n}\log \mu _n(\mathcal {X}\times V) \nonumber \\&\qquad \qquad \ge - \inf J(G \times \{y\} ) + \inf J(\mathcal {X}\times {\overline{V}}) \ge - \inf I(G) - \varepsilon . \end{aligned}$$(6.19)
 (b) Let \(\alpha < \inf J(F \times \{y\})\). There exists a neighbourhood \(V_0\) of y such that for all neighbourhoods V of y with \(V\subset V_0\)$$\begin{aligned}&\inf J(F \times \{y\}) \ge \inf J(F \times {\overline{V}}) \ge \inf J(F \times {\overline{V}}_0) \ge \alpha . \end{aligned}$$(6.20)Let \(V\in \mathcal {N}_y\) be such that \(V \subset V_0\). Similarly as above, we get$$\begin{aligned} \limsup _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n(F \times \mathcal {Y} \mid \mathcal {X}\times V)&\le - \alpha + \inf J(\mathcal {X}\times \{y\} ). \end{aligned}$$(6.21)As \(\alpha < \inf J(F \times \{y\})\) was arbitrary, (6.17) follows. \(\square \)
Theorem 6.8
I has compact sublevel sets.
Proof
\([I\le c] = \pi ([ J \le c +\inf J(\mathcal {X}\times \{y\})])\). \(\square \)
Theorem 6.9

(A1) For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \in {{\mathrm{supp}}}(\mu _n\circ \pi ^{-1})\) and \(y_n \rightarrow y\), the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation lower bound with rate function I.
 (A2) For all \(U\in \mathcal {G}\)$$\begin{aligned} \sup _{V_0\in \mathcal {N}_y} \liminf _{n\rightarrow \infty }\inf _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \mu _n({\overline{U}} \times \mathcal {Y} \mid \mathcal {X}\times V) \ge - \inf I(U). \end{aligned}$$(6.22)
 (A3) For all \(U\in \mathcal {G}\)$$\begin{aligned}&\sup _{V_0\in \mathcal {N}_y} \liminf _{n\rightarrow \infty }\inf _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \mu _n({\overline{U}} \times \mathcal {Y} \mid \mathcal {X}\times V) \nonumber \\&\quad \ge \liminf _{V\in \mathcal {N}_y} \liminf _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n( U \times \mathcal {Y} \mid \mathcal {X}\times V). \end{aligned}$$(6.23)
 (A4) For all \(U\in \mathcal {G}\) we have \(\forall Z_0\in \mathcal {N}_y\; \forall \varepsilon >0\, \exists V_0\in \mathcal {N}_y\, \exists Z\in \mathcal {N}_y, Z\subset Z_0\; \forall M \,\exists m \ge M\, \exists N\; \forall n\ge N\; \forall V\in \mathcal {H}, V \subset V_0, V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset \):$$\begin{aligned} \tfrac{1}{n}\log \mu _n (\overline{U} \times \mathcal {Y} \mid \mathcal {X}\times V) \ge \tfrac{1}{m}\log \mu _m (U \times \mathcal {Y} \mid \mathcal {X}\times Z) - \varepsilon . \end{aligned}$$(6.24)
 (A5) For all \(U\in \mathcal {G}\) we have \(\forall \varepsilon >0\; \forall V_0 \in \mathcal {N}_y\, \exists N\in \mathbb {N}\; \forall n \ge N\; \forall V\in \mathcal {H}, V\subset V_0, V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset \):$$\begin{aligned} \mu _n( {{\overline{U}}} \times \mathcal {Y} \mid \mathcal {X}\times V) \ge e^{-n\varepsilon } \mu _n( U \times \mathcal {Y} \mid \mathcal {X}\times V_0). \end{aligned}$$(6.25)

(B1) For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \in {{\mathrm{supp}}}(\mu _n\circ \pi ^{-1})\) and \(y_n \rightarrow y\), the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation upper bound with rate function I.
 (B2) For all \(U_1,\dots ,U_k\in \mathcal {G}\) one has for \(W= \mathcal {X}\setminus (U_1 \cup \cdots \cup U_k)\)$$\begin{aligned}&\inf _{V_0\in \mathcal {N}_y} \limsup _{n\rightarrow \infty }\sup _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \mu _n(W^\circ \times \mathcal {Y} \mid \mathcal {X}\times V) \le - \inf I(W). \end{aligned}$$(6.26)
 (B3) For all \(U_1,\dots , U_k\in \mathcal {G}\) with \(W= \mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\)$$\begin{aligned}&\inf _{V_0\in \mathcal {N}_y} \limsup _{n\rightarrow \infty }\sup _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \mu _n(W^\circ \times \mathcal {Y} \mid \mathcal {X}\times V) \nonumber \\&\quad \le \limsup _{V\in \mathcal {N}_y} \limsup _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n(W \times \mathcal {Y} \mid \mathcal {X}\times V). \end{aligned}$$(6.27)
 (B4) For all \(U_1,\dots , U_k\in \mathcal {G}\) with \(W= \mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\) we have \(\forall Z_0\in \mathcal {N}_y\; \forall \varepsilon >0 \,\exists V_0\in \mathcal {N}_y \,\exists Z\in \mathcal {N}_y, Z\subset Z_0\; \forall M \,\exists m \ge M \,\exists N\; \forall n\ge N\; \forall V\in \mathcal {H}, V \subset V_0,V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset \):$$\begin{aligned} \tfrac{1}{n}\log \mu _n (W^\circ \times \mathcal {Y} \mid \mathcal {X}\times V) \le \tfrac{1}{m}\log \mu _m (W \times \mathcal {Y} \mid \mathcal {X}\times Z) + \varepsilon . \end{aligned}$$(6.28)
 (B5) For all \(U_1,\dots , U_k\in \mathcal {G}\) with \(W= \mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\) we have \(\forall \varepsilon >0\; \forall V_0 \in \mathcal {N}_y\, \exists N\in \mathbb {N}\; \forall n \ge N\; \forall V\in \mathcal {H}, V\subset V_0,V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset \):$$\begin{aligned} \mu _n(W^\circ \times \mathcal {Y} \mid \mathcal {X}\times V) \le e^{n\varepsilon } \mu _n( W \times \mathcal {Y} \mid \mathcal {X}\times V_0). \end{aligned}$$(6.29)
Proof
The proofs of (B5) \(\Longrightarrow \) (B4) \(\iff \) (B3) \(\Longrightarrow \) (B2) \(\Longrightarrow \) (B1) and of (B1) \(\Longrightarrow \) (B2) are similar to the proofs of the following implications.
(A4) \(\iff \) (A3) follows by definition of \(\sup \), \(\inf \), \(\limsup \) and \(\liminf \).
(A2) \(\Longrightarrow \) (A1). Suppose that (A2) holds. Let \(U\in \mathcal {G}\) with \(\inf J(U \times \{y\})<\infty \) and let \(\varepsilon >0\). Let \(V_0 \in \mathcal {N}_y\) and \(N\in \mathbb {N}\) be such that \(\tfrac{1}{n}\log \mu _n( {\overline{U}} \times \mathcal {Y} \mid \mathcal {X}\times V) \ge - \inf I(U) - \varepsilon \) for all \(n\ge N\) and all \(V\in \mathcal {H}\) with \(V\subset V_0\) and \(V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset \). Let \((y_n)_{n\in \mathbb {N}}\) be such that \(y_n \in {{\mathrm{supp}}}(\mu _n\circ \pi ^{-1})\) and \(y_n \rightarrow y\). Let \(N_0\ge N\) be such that \(y_n \in V_0\) for all \(n\ge N_0\). Then for all \(n\ge N_0\) and \(V \in \mathcal {N}_{y_n}\cap \mathcal {H}\) with \(V\subset V_0\) we have \(\tfrac{1}{n}\log \mu _n({\overline{U}} \times \mathcal {Y} \mid \mathcal {X}\times V) \ge - \inf I(U) - \varepsilon \). This implies (a2) of Theorem 6.3 (with \(\mathcal {V}_n=\mathcal {N}_{y_n}\cap \mathcal {H}\)).
We can also use Lemma 6.7 and Theorem 6.3 (see also 6.4) to obtain sufficient conditions for the lower or upper large deviation bounds for \((\eta _n(y,\cdot ))_{n\in \mathbb {N}}\).
Theorem 6.10
 (a) Suppose that for all \(U\in \mathcal {G}\) with \(\inf J(U\times \{y\})<\infty \)$$\begin{aligned}&\liminf _{n\rightarrow \infty }\limsup _{V\in \mathcal {V}} \tfrac{1}{n}\log \mu _n({\overline{U}} \times \mathcal {Y} \mid \mathcal {X}\times V) \nonumber \\&\quad \ge \liminf _{V\in \mathcal {N}_y} \liminf _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n( U \times \mathcal {Y} \mid \mathcal {X}\times V). \end{aligned}$$(6.36)Then \((\eta _n(y,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation lower bound with rate function I.
 (b) Suppose that for all \(U_1,\dots , U_k\in \mathcal {G}\) with \(W= \mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\)$$\begin{aligned}&\limsup _{n\rightarrow \infty }\liminf _{V\in \mathcal {V}} \tfrac{1}{n}\log \mu _n(W^\circ \times \mathcal {Y} \mid \mathcal {X}\times V) \nonumber \\&\quad \le \limsup _{V\in \mathcal {N}_y} \limsup _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n(W \times \mathcal {Y} \mid \mathcal {X}\times V). \end{aligned}$$(6.37)Then \((\eta _n(y,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation upper bound with rate function I.
6.11
7 Large Deviations for Regular Conditional Probabilities
In this section \(\mathcal {X}\) and \(\mathcal {Y}\) are topological spaces, \((\nu _n)_{n\in \mathbb {N}}\) is a sequence of probability measures on \(\mathcal {B}(\mathcal {X})\) that satisfies the large deviation principle with rate function \(K: \mathcal {X}\rightarrow [0,\infty ]\), and \(\tau : \mathcal {X}\rightarrow \mathcal {Y}\) is continuous. For more assumptions, see 7.2.
We derive analogues of the statements of Sect. 6 for regular conditional kernels instead of product regular conditional kernels (7.3 and Theorem 7.5). First we consider \(\mu _n\), the probability measure on the product space corresponding to \(\nu _n\) as in Theorem 3.6. We show that the sequence \((\mu _n)_{n\in \mathbb {N}}\) satisfies the large deviation principle with a rate function described in terms of K (Theorem 7.1).
If \((\eta _n)_{n\in \mathbb {N}}\) are regular conditional probabilities under \((\nu _n)_{n\in \mathbb {N}}\) given \(\tau \), then one could also follow the proofs in Sect. 6 for product regular conditional probabilities to obtain similar results on large deviations for sequences of the form \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\). Instead, we use Theorem 3.6 to translate the results of Sect. 6 to the setting of regular conditional probabilities.
Theorem 7.1
Proof
7.2
 (v)*

For each \(n\in \mathbb {N}\) we assume the following: \({{\mathrm{supp}}}(\nu _n \circ \tau ^{-1}) \ne \emptyset \), and there exists a regular conditional probability \(\eta _n : \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow [0,1]\) under \(\nu _n\) with respect to \(\tau \) satisfying the continuity condition (6.1).
 (vi)*
 Let \(y \in \mathcal {Y}\). We assume that \(\inf K(\tau ^{-1}(\{y\}))<\infty \) and that there exist \(y_n \in {{\mathrm{supp}}}(\nu _n \circ \tau ^{-1})\) with \(y_n \rightarrow y\). Let \(I: \mathcal {X}\rightarrow [0,\infty ]\) be given by$$\begin{aligned} I(x)&= J(x,y) - \inf J(\mathcal {X}\times \{y\}) \nonumber \\&= {\left\{ \begin{array}{ll} K(x) - \inf K(\tau ^{-1}(\{y\})) &{} \tau (x)=y, \\ \infty &{} \tau (x) \ne y. \end{array}\right. } \end{aligned}$$(7.7)
7.3
Remark 7.4
Because of the relation between \(\mu _n\) and \(\nu _n\) and between K and J, in Theorem 7.1 we were able to prove the large deviation principle on \(\{A\times B: A\in \mathcal {B}(\mathcal {X}), B\in \mathcal {B}(\mathcal {Y})\}\). Whether it can be extended to the large deviation principle on \(\mathcal {B}(\mathcal {X})\otimes \mathcal {B}(\mathcal {Y})\) is a priori not clear. However, for the purpose of using the results of Sect. 6, this is not required (as only (iv) of Sect. 6 is required). This is the main reason to define the large deviation bounds as in Definition 1.1.
Theorem 7.5
 (A1)
For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \in {{\mathrm{supp}}}(\nu _n\circ \tau ^{-1})\) and \(y_n \rightarrow y\), the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation lower bound with rate function I.
 (A2) For all \(U\in \mathcal {G}\)$$\begin{aligned} \sup _{V_0\in \mathcal {N}_y} \liminf _{n\rightarrow \infty }\inf _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\nu _n \circ \tau ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \nu _n({\overline{U}} \mid \tau ^{-1}(V) ) \ge - \inf I(U). \end{aligned}$$(7.9)
 (A3) For all \(U\in \mathcal {G}\)$$\begin{aligned}&\sup _{V_0\in \mathcal {N}_y} \liminf _{n\rightarrow \infty }\inf _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\nu _n \circ \tau ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \nu _n({\overline{U}} \mid \tau ^{-1}(V) )\nonumber \\&\quad \ge \liminf _{V\in \mathcal {N}_y} \liminf _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \nu _n(\tau ^{-1}(V))>0 } } \tfrac{1}{n}\log \nu _n( U \mid \tau ^{-1}(V)). \end{aligned}$$(7.10)
 (B1)

For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \in {{\mathrm{supp}}}(\nu _n\circ \tau ^{-1})\) and \(y_n \rightarrow y\), the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation upper bound with rate function I.
 (B2)
 For all \(U_1,\dots ,U_k\in \mathcal {G}\) one has for \(W= \mathcal {X}\setminus (U_1 \cup \cdots \cup U_k)\)$$\begin{aligned}&\inf _{V_0\in \mathcal {N}_y} \limsup _{n\rightarrow \infty }\sup _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\nu _n \circ \tau ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \nu _n(W^\circ \mid \tau ^{-1}(V ) ) \le - \inf I(W). \end{aligned}$$(7.11)
 (B3)
 For all \(U_1,\dots , U_k\in \mathcal {G}\) with \(W= \mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\)$$\begin{aligned}&\inf _{V_0\in \mathcal {N}_y} \limsup _{n\rightarrow \infty }\sup _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\nu _n \circ \tau ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \nu _n(W^\circ \mid \tau ^{-1}(V ) )\nonumber \\&\quad \le \limsup _{V\in \mathcal {N}_y} \limsup _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \nu _n(\tau ^{-1}(V))>0 } } \tfrac{1}{n}\log \nu _n(W \mid \tau ^{-1}(V ) ). \end{aligned}$$(7.12)
8 An Application to Conditional Probabilities of Empirical Distributions on Finite Sets
In terms of random variables, Sanov’s theorem gives the large deviation principle for the empirical measures \(\frac{1}{n}\sum _{i=1}^n \delta _{X_i}\), where \(X_1,X_2,\dots \) are independent and identically distributed random variables. We consider large deviations of \(\frac{1}{n}\sum _{i=1}^n \delta _{X_i}\) conditioned on \(\frac{1}{n}\sum _{i=1}^n \delta _{Y_i} = \psi _n\). Here \((X_1,Y_1), (X_2,Y_2), \dots \) are independent and identically distributed pairs of random variables, each attaining its values in a finite set. This large deviation principle is formalised in Theorem 8.2.

Let \(\mathcal {R}\) and \(\mathcal {S}\) be finite sets equipped with the discrete topology (discrete metric). Let \(\mathcal {P}(\mathcal {R}),\mathcal {P}(\mathcal {S})\) and \(\mathcal {P}(\mathcal {R}\times \mathcal {S})\) be equipped with the weak topology, and let \(\mathfrak {d}\) denote the Prohorov metric (see Billingsley [3, Appendix III]) on each of these spaces.

Let \(\lambda \in \mathcal {P}(\mathcal {R}\times \mathcal {S})\). We assume \(\lambda (\mathcal {R}\times \{s\}) >0\) for all \(s\in \mathcal {S}\).

For \(n\in \mathbb {N}\) let \(L_n : \mathcal {R}^n \rightarrow \mathcal {P}(\mathcal {R})\) be given by \(L_n(r) = \frac{1}{n} \sum _{i=1}^n \delta _{r_i}\) for \(r=(r_1,\dots ,r_n)\in \mathcal {R}^n\).

Write \(\mathcal {P}_{emp}^n(\mathcal {R}) = L_n(\mathcal {R}^n) = \{ \frac{1}{n} \sum _{i=1}^n \delta _{r_i} : r_1,\dots ,r_n\in \mathcal {R}\}\), similarly \(\mathcal {P}_{emp}^n(\mathcal {S})= L_n(\mathcal {S}^n)\) and \(\mathcal {P}_{emp}^n(\mathcal {R}\times \mathcal {S})= L_n((\mathcal {R}\times \mathcal {S})^n)\).
 Let \(\mathfrak {m}: \mathcal {P}(\mathcal {R}\times \mathcal {S}) \rightarrow \mathcal {P}(\mathcal {R}) \times \mathcal {P}(\mathcal {S})\) be the map that maps a measure in \(\mathcal {P}(\mathcal {R}\times \mathcal {S})\) onto the pair of its marginals, i.e. \(\mathfrak {m}\) is given by$$\begin{aligned} \mathfrak {m}(\xi ) = \big ( \xi (\cdot \times \mathcal {S}), \xi (\mathcal {R}\times \cdot ) \big ). \end{aligned}$$(8.1)

Let \(\pi : \mathcal {P}(\mathcal {R}) \times \mathcal {P}(\mathcal {S}) \rightarrow \mathcal {P}(\mathcal {S})\) be the map given by \(\pi (\xi ,\zeta ) = \zeta \).
 Let \(\mu _n\) be the probability measure on \(\mathcal {B}(\mathcal {P}(\mathcal {R})) \otimes \mathcal {B}(\mathcal {P}(\mathcal {S}))\) defined by \(\mu _n = \left( \bigotimes _{i=1}^n \lambda \right) \circ L_n^{-1} \circ \mathfrak {m}^{-1}\), so that for \(A\in \mathcal {B}(\mathcal {P}(\mathcal {R}))\) and \(B\in \mathcal {B}(\mathcal {P}(\mathcal {S}))\)$$\begin{aligned} \mu _n(A \times B) = \left( \bigotimes _{i=1}^n \lambda \right) ( L_n^{-1}(A) \times L_n^{-1}(B)). \end{aligned}$$(8.2)

Define \(\theta : \mathcal {S}\times \mathcal {B}(\mathcal {R}) \rightarrow [0,1]\) by \(\theta (s,A) = \lambda (A\times \mathcal {S} \mid \mathcal {R}\times \{s\})\).
 Define \(\eta _n : \mathcal {P}(\mathcal {S}) \times \mathcal {B}(\mathcal {P}(\mathcal {R})) \rightarrow [0,1]\) by$$\begin{aligned} \eta _n(\xi ,A) = {\left\{ \begin{array}{ll} \left[ \bigotimes _{i=1}^n \theta (s_i,\cdot ) \right] \circ L_n^{-1}(A) &{} \xi \in \mathcal {P}_{emp}^n(\mathcal {S}), \xi = L_n(s_1,\dots ,s_n) \\ &{} \text{ for } s_1,\dots ,s_n \in \mathcal {S}, \\ 0 &{} \xi \notin \mathcal {P}_{emp}^n(\mathcal {S}). \end{array}\right. } \end{aligned}$$(8.3)
 Let \(J: \mathcal {P}(\mathcal {R}) \times \mathcal {P}(\mathcal {S}) \rightarrow [0,\infty ]\) be given by$$\begin{aligned} J(\rho , \sigma )&= \inf _{\xi \in \mathfrak {m}^{-1} (\{ (\rho ,\sigma ) \}) } H(\xi \mid \lambda ), \end{aligned}$$(8.4)where \(H(\xi \mid \lambda )\) is the relative entropy of \(\xi \) with respect to \(\lambda \) ([7, Definition 2.1.5]).
 Let \(\psi \in \mathcal {P}(\mathcal {S})\) be such that$$\begin{aligned} \inf _{ \xi \in \mathfrak {m}^{-1} (\mathcal {P}(\mathcal {R}) \times \{\psi \}) } H(\xi \mid \lambda )<\infty . \end{aligned}$$(8.5)
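The infimum in (8.5) can be made explicit using the chain rule for relative entropy, a standard fact not stated in this section. Write \(\lambda _S = \lambda (\mathcal {R}\times \cdot )\). Then every \(\xi \) with \(\xi (\mathcal {R}\times \cdot ) = \psi \) satisfies \(H(\xi \mid \lambda ) = H(\psi \mid \lambda _S) + \sum _s \psi (s) H(\xi (\cdot \mid s) \mid \theta (s,\cdot )) \ge H(\psi \mid \lambda _S)\). Equality holds for \(\xi (r,s) = \psi (s)\theta (s,\{r\})\), so the infimum in (8.5) equals \(H(\psi \mid \lambda _S)\). The Python sketch below checks this numerically for a hypothetical \(\lambda \) and \(\psi \) on two-point sets.

```python
import math

# Hypothetical lambda on R x S with R = {r1, r2}, S = {s1, s2}.
R, S = ["r1", "r2"], ["s1", "s2"]
lam = {("r1", "s1"): 0.25, ("r2", "s1"): 0.25, ("r1", "s2"): 0.125, ("r2", "s2"): 0.375}


def rel_entropy(xi, ref):
    """H(xi | ref) = sum xi log(xi / ref), with 0 log 0 = 0 and +inf if xi is not << ref."""
    h = 0.0
    for k, p in xi.items():
        if p == 0:
            continue
        if ref.get(k, 0) == 0:
            return math.inf
        h += p * math.log(p / ref[k])
    return h


lam_S = {s: sum(lam[(r, s)] for r in R) for s in S}               # lambda(R x {s})
theta = {(r, s): lam[(r, s)] / lam_S[s] for r in R for s in S}    # theta(s, {r})

psi = {"s1": 0.1, "s2": 0.9}
xi_star = {(r, s): psi[s] * theta[(r, s)] for r in R for s in S}  # candidate minimiser
print(rel_entropy(xi_star, lam), rel_entropy(psi, lam_S))         # equal values

# Another xi with second marginal psi has larger relative entropy (chain rule).
xi_other = {("r1", "s1"): 0.1, ("r2", "s1"): 0.0, ("r1", "s2"): 0.45, ("r2", "s2"): 0.45}
assert rel_entropy(xi_other, lam) >= rel_entropy(psi, lam_S)
```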
8.1
 (a)
\(\mathcal {P}_{emp}^n(\mathcal {S})\) is closed in \(\mathcal {P}(\mathcal {S})\). Moreover, if \(\xi _k\) and \(\xi \) in \(\mathcal {P}_{emp}^n(\mathcal {S})\) are such that \(\xi _k \rightarrow \xi \), then there exist \(s_{ki}\) and \(q_i\) in \(\mathcal {S}\) for \(i\in \{1,\dots ,n\}\) such that \(\xi _k = L_n((s_{k1},\dots ,s_{kn}))\), \(\xi = L_n((q_1,\dots ,q_n))\) and \(s_{ki} \rightarrow q_i\) for all \(i\in \{1,\dots ,n\}\).
 (b)
\({{\mathrm{supp}}}(\mu _n \circ \pi ^{-1}) = \mathcal {P}_{emp}^n(\mathcal {S})\).
 (c)
\(\eta _n\) is a product regular conditional kernel under \(\mu _n\) with respect to \(\pi \) that is weakly continuous on \(\mathcal {P}_{emp}^n(\mathcal {S})\).
 (d)
\(\big ((\bigotimes _{i=1}^n \lambda ) \circ L_n^{-1}\big )_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function \( H(\cdot \mid \lambda )\).
 (e)
\(\mathfrak {m}\) is continuous.
 (f)
\((\mu _n)_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function J.
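Definition (8.3) and statement (c) can be checked by brute force for very small n. The Python sketch below uses a hypothetical \(\lambda \) on two-point sets \(\mathcal {R}\) and \(\mathcal {S}\). It encodes an empirical measure by its counts, which is equivalent for fixed n. It computes \(\eta _n(\psi ,\cdot )\) from (8.3) as the law of \(L_n(r_1,\dots ,r_n)\) with independent \(r_i \sim \theta (s_i,\cdot )\). It then compares the result with \(\mu _n(\cdot \times \{\psi \})/\mu _n(\mathcal {P}(\mathcal {R})\times \{\psi \})\), obtained by enumerating \((\mathcal {R}\times \mathcal {S})^n\) with exact rational arithmetic.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical lambda on R x S (all masses positive).
R = ["r1", "r2"]
lam = {("r1", "s1"): Fraction(1, 4), ("r2", "s1"): Fraction(1, 4),
       ("r1", "s2"): Fraction(1, 8), ("r2", "s2"): Fraction(3, 8)}


def emp(seq):
    """Empirical measure L_n(seq), encoded by its counts (n is fixed)."""
    return tuple(sorted(Counter(seq).items()))


def theta(s, r):
    """theta(s, {r}) = lambda({r} x S | R x {s})."""
    return lam[(r, s)] / sum(lam[(q, s)] for q in R)


def eta_n(s_seq):
    """Law of L_n(r_1, ..., r_n) with independent r_i ~ theta(s_i, .), as in (8.3)."""
    law = Counter()
    for r_seq in product(R, repeat=len(s_seq)):
        p = Fraction(1)
        for r, s in zip(r_seq, s_seq):
            p *= theta(s, r)
        law[emp(r_seq)] += p
    return law


def conditional_from_mu(n, psi):
    """mu_n(. x {psi}) / mu_n(P(R) x {psi}), by enumerating (R x S)^n."""
    joint = Counter()
    for pairs in product(lam, repeat=n):
        p = Fraction(1)
        for pair in pairs:
            p *= lam[pair]
        rs, ss = zip(*pairs)
        if emp(ss) == psi:
            joint[emp(rs)] += p
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}


n = 3
s_seq = ("s1", "s1", "s2")
psi = emp(s_seq)
assert dict(eta_n(s_seq)) == conditional_from_mu(n, psi)
```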
In the rest of this section, we prove the following theorem.
Theorem 8.2
As \(\mathcal {P}(\mathcal {S})\) is first countable, it is sufficient to show that (A2) and (B2) of Theorem 6.9 hold. In 8.4 we use the bounds of Lemma 8.3 to derive other bounds which imply (A2) and (B2). The continuity of I follows by continuity of the map \(\nu \mapsto H(\nu \mid \lambda )\) (Lemma 8.5).
Lemma 8.3
8.4
Lemma 8.5
Lemma 8.6
 (a)
Let \(k,l\in \mathbb {N}\) and \(\zeta \in \mathcal {P}_{emp}^k(\mathcal {S})\). For all \(m\ge kl\) there exists a \(\nu \in \mathcal {P}_{emp}^m(\mathcal {S})\) such that \(\mathfrak {d}(\nu ,\zeta ) <\frac{1}{l}\).
 (b)
For all nonempty open \(\Theta \subset \mathcal {P}(\mathcal {S})\) there exists an \(N\in \mathbb {N}\) such that \(\mathcal {P}_{emp}^n(\mathcal {S}) \cap \Theta \ne \emptyset \) for all \(n\ge N\).
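One natural construction behind Lemma 8.6 (a) can be sketched as follows; it is offered as an illustration, not as the proof omitted here. Write \(m = qk + r\) with \(0\le r<k\), repeat each atom of \(\zeta = L_k(s_1,\dots ,s_k)\) \(q\) times and add \(r\) further atoms. The resulting \(\nu \in \mathcal {P}_{emp}^m(\mathcal {S})\) is within total variation distance \(\frac{r}{m} < \frac{k}{m} \le \frac{1}{l}\) of \(\zeta \) whenever \(m\ge kl\). We assume here that \(\mathfrak {d}\) is dominated by the total variation distance (as the Lévy–Prokhorov metric is); the helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def refine(atoms, m):
    """Return m points whose empirical measure nu is close to zeta = L_k(atoms).

    With m = q * k + r (0 <= r < k), take q copies of every atom plus the
    first r atoms once more; then d_TV(nu, zeta) <= r / m.
    """
    k = len(atoms)
    q, r = divmod(m, k)
    return list(atoms) * q + list(atoms[:r])


def tv_distance(xs, ys):
    """Total variation distance between the empirical measures of xs and ys."""
    cx, cy = Counter(xs), Counter(ys)
    return sum(abs(Fraction(cx[a], len(xs)) - Fraction(cy[a], len(ys)))
               for a in set(cx) | set(cy)) / 2


ATOMS = ["x", "y", "y", "z"]  # zeta = L_4(x, y, y, z), so k = 4
L = 3                         # target accuracy 1 / l
for m in (12, 13, 17, 30):    # all m >= k * l = 12
    print(m, tv_distance(refine(ATOMS, m), ATOMS))
```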
Proof
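(b) Let \(\xi \in \Theta \) and \(\delta >0\) be such that \(B(\xi ,\delta ) \subset \Theta \). Every element of \(\mathcal {P}(\mathcal {S})\) can be approximated by empirical measures, so there are a \(k\in \mathbb {N}\) and a \(\zeta \in \mathcal {P}_{emp}^k(\mathcal {S})\) such that \(\mathfrak {d}(\zeta ,\xi )<\frac{\delta }{2}\). Let l be such that \(\frac{1}{l} < \frac{\delta }{2}\) and let \(N= lk\). By (a), for every \(n\ge N\) there is a \(\nu \in \mathcal {P}_{emp}^n(\mathcal {S})\) with \(\mathfrak {d}(\nu ,\zeta ) <\frac{1}{l}\), so that \(\mathfrak {d}(\nu ,\xi )<\delta \) by the triangle inequality, i.e. \(\nu \in B(\xi ,\delta ) \subset \Theta \). Hence \(\mathcal {P}_{emp}^n(\mathcal {S}) \cap \Theta \ne \emptyset \) for all \(n\ge N\). \(\square \)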
Lemma 8.7
Proof
First we show that there exists a \(\xi ^* \in \mathcal {P}_{emp}^n(\mathcal {R}\times \mathcal {S})\) with \(\xi ^* \ll \xi \) and \(|\xi _{rs}^* - \xi _{rs}| \le \frac{2}{n}\) for all \(r\in \mathcal {R}\) and \(s\in \mathcal {S}\). For each pair \((r,s) \in \mathcal {R}\times \mathcal {S}\) with \(\xi _{rs}>0\) we can choose a \(\xi _{rs}'\in \{0,\frac{1}{n},\frac{2}{n}, \dots , 1\}\) such that \(|\xi _{rs} - \xi _{rs}'|<\frac{1}{n}\). By letting \(\xi _{rs}^*=0\) when \(\xi _{rs}=0\), and by adding \(\frac{1}{n}\) to or subtracting \(\frac{1}{n}\) from some of the \(\xi _{rs}'\), we obtain a collection of \(\xi _{rs}^*\in \{0,\frac{1}{n},\frac{2}{n},\dots ,1\}\) with \(\sum _{r,s} \xi _{rs}^*=1\), \(|\xi _{rs}^* - \xi _{rs}| \le \frac{2}{n}\), and \(\xi ^*_{rs}=0\) whenever \(\xi _{rs}=0\), for all \(r\in \mathcal {R}\) and \(s\in \mathcal {S}\).
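The rounding step above can be carried out explicitly. The Python sketch below uses largest-remainder rounding, which is one concrete way to perform the "add or subtract \(\frac{1}{n}\)" adjustment (it only ever adds): round every \(n\xi _{rs}\) down, then give one extra unit to the entries with the largest fractional parts until the total is n. Only entries with \(\xi _{rs}>0\) have a positive fractional part, so zeros stay zero, and every entry moves by less than \(\frac{1}{n}\), which is within the bound \(\frac{2}{n}\). The function name and the example \(\xi \) are hypothetical; exact arithmetic with `Fraction` is used so that the sum is exactly 1.

```python
from fractions import Fraction
from math import floor


def round_to_grid(xi, n):
    """Largest-remainder rounding of a probability vector to the grid {0, 1/n, ..., 1}.

    xi maps keys (r, s) to Fractions summing to 1. The result sums to 1,
    vanishes wherever xi vanishes, and satisfies |xi_star - xi| < 1/n entrywise.
    """
    counts = {key: floor(n * p) for key, p in xi.items()}
    deficit = n - sum(counts.values())  # equals the sum of the fractional parts
    # Keys ordered by fractional part of n * xi, largest first.
    order = sorted(xi, key=lambda key: n * xi[key] - counts[key], reverse=True)
    for key in order[:deficit]:
        counts[key] += 1
    return {key: Fraction(c, n) for key, c in counts.items()}


XI = {("r1", "s1"): Fraction(1, 3), ("r1", "s2"): Fraction(1, 6),
      ("r2", "s1"): Fraction(0), ("r2", "s2"): Fraction(1, 2)}
print(round_to_grid(XI, 7))
```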
Lemma 8.8
Proof
9 Examples
In Sect. 8, we showed that the regular conditional kernel \(\eta _n\) as in (8.3) satisfies (A1) and (B1) of Theorem 6.9 by showing that (A2) and (B2) of that theorem hold. This is not always the best approach; in Example 9.1 we show that, for a specific example of Gaussian measures, the explicit expression for \(\eta _n\) allows us to derive (A1) and (B1) directly.
Furthermore, relying on Theorem 9.2, we give in Example 9.4 an example of a sequence \((\eta _n)_{n\in \mathbb {N}}\) for which (A1) of Theorem 6.9 does not hold. In Remark 9.5 we note that for one choice of measures in Example 9.4 a quenched large deviation principle holds, while for the other choice there is no quenched large deviation principle. In Example 9.6 we show that, for a choice of measures as in Example 9.4, the regular conditional kernel at a specific point does not satisfy any large deviation principle. In Remark 9.7 we discuss exponential tightness of the regular conditional kernel, and in Remark 9.8 we discuss the differences between the present paper and the paper of La Cour and Schieve [20].
Example 9.1
The proof of the following theorem can be found in “Appendix 3”.
Theorem 9.2
Note that \(I(x) = J(x,y) - \inf J(\mathcal {X}\times \{y\})\) for all \(x\in \mathcal {X}, y\in \mathcal {Y}\).
Example 9.3
 (a)
Let \(\mathcal {Y}= [0,\infty )\), \(\alpha _n (y) = \min \{ny,1\}\) for \(y\in \mathcal {Y}\) and let \(\nu _n(B) = \int _0^\infty \mathbbm {1}_B(y) n e^{-ny} \,\mathrm {d}y\) for \(B\in \mathcal {B}([0,\infty ))\). Then \(\int _0^\frac{1}{n} \alpha _n \,\mathrm {d}\nu _n = 1-2e^{-1}\) and \(\int _0^\frac{1}{n} (1-\alpha _n) \,\mathrm {d}\nu _n = e^{-1}\). Therefore, with this \(\nu _n\), \(\alpha _n\) and \(W_n = [-\frac{1}{n},\frac{1}{n}]\), (9.4) is satisfied. Moreover, \((\nu _n)_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function \(L :\mathcal {Y}\rightarrow [0,\infty ]\), \(L(y) =y\) (this can be verified directly from \(\nu _n([a,\infty )) = e^{-na}\) for \(a\ge 0\); the Gärtner–Ellis theorem [7, Theorem 2.3.6] does not directly yield the lower bound here, as the limiting logarithmic moment generating function is not essentially smooth).
 (b)
Let \(\mathcal {Y}= \mathbb {R}\) and \(\nu _n =\mu _{\mathcal {N}(0,\frac{1}{n})}\) (the Gaussian measure corresponding to a \(\mathcal {N}(0,\frac{1}{n})\)-distributed random variable). Then there exists a decreasing sequence \((\varepsilon _n)_{n\in \mathbb {N}}\) in \((0,\infty )\) with \(\varepsilon _n \downarrow 0\), such that with \(W_n = [-\varepsilon _n,\varepsilon _n]\) there exist functions \(\alpha _n\) as in Theorem 9.2 such that (9.4) is satisfied (see the postscript). With \(\nu _n^0 = \frac{1}{2} \delta _0 + \frac{1}{2} \nu _n\) instead of \(\nu _n\), (9.4) is also satisfied. Moreover, \((\nu _n)_{n\in \mathbb {N}}\) and \((\nu _n^0)_{n\in \mathbb {N}}\) (use Lemma 9.9) satisfy the large deviation principle with rate function \(L :\mathcal {Y}\rightarrow [0,\infty ]\), \(L(y) = \frac{1}{2} y^2\).
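The two integrals in Example 9.3 (a) do not depend on n (substitute \(u = ny\)); they equal \(\int _0^1 u e^{-u}\,\mathrm {d}u = 1-2e^{-1}\approx 0.264\) and \(\int _0^1 (1-u) e^{-u}\,\mathrm {d}u = e^{-1}\approx 0.368\). The following Python sketch is a numerical sanity check of these values with a composite midpoint rule; the helper names are hypothetical.

```python
from math import exp


def midpoint(f, a, b, steps=20_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def example_a_integrals(n):
    """Integrals of alpha_n and 1 - alpha_n against nu_n over [0, 1/n]."""
    density = lambda y: n * exp(-n * y)   # density of nu_n on [0, infinity)
    alpha = lambda y: min(n * y, 1.0)     # alpha_n(y) = min{ny, 1}
    return (midpoint(lambda y: alpha(y) * density(y), 0.0, 1.0 / n),
            midpoint(lambda y: (1.0 - alpha(y)) * density(y), 0.0, 1.0 / n))


for n in (1, 5, 50):
    print(n, example_a_integrals(n), (1 - 2 * exp(-1), exp(-1)))
```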
Postscript.
Let \(\beta = \nu _1([-1,1])\) and \(\kappa _n = \frac{1}{\sqrt{n}}\). Then \(\nu _n([-\kappa _n,\kappa _n]) = \beta \) for all \(n\in \mathbb {N}\). Let \(\phi _\varepsilon : \mathbb {R}\rightarrow [0,1]\) be defined by \(\phi _\varepsilon (z) = \min \{\varepsilon ^{-1} |z|,1\}\). Then \(\lim _{\varepsilon \downarrow 0} \int _{[-\kappa _n,\kappa _n]} \phi _\varepsilon \,\mathrm {d}\nu _n = \beta \), \(\lim _{\varepsilon \downarrow 0} \int _{[-\kappa _n,\kappa _n]} (1 - \phi _\varepsilon ) \,\mathrm {d}\nu _n =0\) and $$\begin{aligned} \int _{[-\kappa _n,\kappa _n]} \phi _{\kappa _n} \,\mathrm {d}\nu _n < \int _{[-\kappa _n,\kappa _n]} (1 - \phi _{\kappa _n}) \,\mathrm {d}\nu _n. \end{aligned}$$ (9.9) Since \(\varepsilon \mapsto \int _{[-\kappa _n,\kappa _n]} \phi _\varepsilon \,\mathrm {d}\nu _n\) is continuous on \((0,\kappa _n]\) (by dominated convergence), for all \(n\in \mathbb {N}\) there exists an \(\varepsilon _n \in (0,\kappa _n)\) such that $$\begin{aligned} \int _{[-\kappa _n,\kappa _n]} \phi _{\varepsilon _n} \,\mathrm {d}\nu _n = \tfrac{1}{2} \beta = \int _{[-\kappa _n,\kappa _n]} (1 - \phi _{\varepsilon _n}) \,\mathrm {d}\nu _n. \end{aligned}$$ (9.10) With \(\alpha _n = \phi _{\varepsilon _n}\), (9.4) as in Theorem 9.2 is satisfied.
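The choice of \(\varepsilon _n\) in the postscript can be made explicit. Under the substitution \(z = \kappa _n u\), \(\nu _n\) restricted to \([-\kappa _n,\kappa _n]\) becomes the standard normal law on \([-1,1]\) and \(\phi _{\varepsilon }(z) = \min \{|u|/c, 1\}\) with \(c = \varepsilon /\kappa _n\). So (9.10) reduces to the single equation \(F(c) = \frac{\beta }{2}\), where \(F(c) = 2\big (\frac{\varphi (0)-\varphi (c)}{c} + \Phi (1)-\Phi (c)\big )\), with \(\varphi \) and \(\Phi \) the standard normal density and distribution function. Hence one admissible choice is \(\varepsilon _n = c\,\kappa _n\) with a fixed \(c\in (0,1)\), which in particular is decreasing with \(\varepsilon _n\downarrow 0\). The Python sketch below solves for c by bisection (F is decreasing, \(F(1)<\frac{\beta }{2}\) is (9.9), and \(F(0+)=\beta \)); the helper names are hypothetical.

```python
from math import erf, exp, pi, sqrt


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def phi(x):
    """Standard normal density."""
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)


BETA = 2.0 * Phi(1.0) - 1.0  # beta = nu_1([-1, 1])


def F(c):
    """Integral of min(|u|/c, 1) against N(0, 1) over [-1, 1], for 0 < c <= 1.

    By the scaling z = kappa_n * u this equals the integral of phi_{c kappa_n}
    against nu_n over [-kappa_n, kappa_n], for every n.
    """
    return 2.0 * ((phi(0.0) - phi(c)) / c + Phi(1.0) - Phi(c))


def solve_c(tol=1e-12):
    """Bisection for F(c) = beta / 2 on (0, 1); F is decreasing in c."""
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) > BETA / 2:
            lo = mid  # F still too large: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


c = solve_c()
eps = [c / sqrt(n) for n in range(1, 6)]  # eps_n = c * kappa_n
print(c, eps)
```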
Example 9.4
With \(\mathcal {X}= \mathbb {R}\), \(\mu _n^1 = \mu _{\mathcal {N}(0,\frac{1}{n})}\), \(\mu _n^2 = \delta _{\frac{1}{n}}\) and \(I(x) = \frac{1}{2} x^2\) for \(x\in \mathbb {R}\) and \(\mathcal {Y},\nu _n\) (or \(\nu _n^0\)), \(\alpha _n\), \(W_n\) and L as in Examples 9.3 (a) or (b) the conditions of Theorem 9.2 are satisfied (note that \((\delta _{\frac{1}{n}})_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function \(H: \mathbb {R}\rightarrow [0,\infty ]\) given by \(H(0)=0\) and \(H(x) =\infty \) for \(x\ne 0\)).
Remark 9.5
(Quenched large deviations) Consider the situation as in Example 9.4. For all \(n\in \mathbb {N}\) we have the following: if \(\zeta _n: \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow [0,1]\) is a product regular conditional probability under \(\mu _n\) with respect to \(\pi \), then \(\zeta _n(y,\cdot ) = \eta _n(y,\cdot )\) for \([\mu _n \circ \pi ^{-1}]\)-almost all y (see Remark 3.5).
Hence, with \(\nu _n\) as in Example 9.3 (a) or (b), we have a quenched large deviation principle for the conditional probability with respect to the second coordinate with rate function I: for every sequence of product regular conditional probabilities \(\zeta _n\) under \(\mu _n\) with respect to \(\pi \) there exists a \(Z\subset \mathcal {Y}\) with \(\mu _n \circ \pi ^{-1}(Z) = \nu _n(Z)=1\) for all \(n\in \mathbb {N}\) such that \((\zeta _n(y,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function I for all \(y\in Z\).
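However, if \(\nu _n^0\) as in Example 9.3 (b) is used instead of \(\nu _n\), then every such \(\zeta _n\) satisfies \(\zeta _n(0,\cdot ) = \eta _n(0,\cdot )\), as \(\nu _n^0(\{0\}) >0\). Since every set of full \(\nu _n^0\)-measure contains 0, and \((\eta _n(0,\cdot ))_{n\in \mathbb {N}}\) does not satisfy the large deviation principle with rate function I, we do not have such a quenched large deviation principle in this case.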
Example 9.6
With \(\mathcal {X}= \mathbb {N}\), \(\mu _n^1 = \sum _{k\in \mathbb {N}} 2^{-k} \delta _k\), \(\mu _n^2= \delta _n\) and \(I(x) =0\) for \(x\in \mathbb {N}\) as in Example 9.4, and \(\mathcal {Y}, W_n, \alpha _n, \nu _n\) and L as in Example 9.3 (a) or (b), the conditions of Theorem 9.2 are satisfied. In this case, \((\eta _n(0,\cdot ))_{n\in \mathbb {N}}\) does not satisfy a large deviation principle.
Remark 9.7
(Exponential tightness of the regular conditional kernel) In the situation of Theorem 9.2, we note that if \((\mu _n^1)_{n\in \mathbb {N}}\) is exponentially tight, then so is \((\mu _n)_{n\in \mathbb {N}}\), since \(\mu _n(K_1^c\times K_2^c) = \mu _n^1(K_1^c) \nu _n(K_2^c)\) for large n and (compact) \(K_1\subset \mathcal {X}, K_2\subset \mathcal {Y}\). Similarly, \((\eta _n(y,\cdot ))_{n\in \mathbb {N}}\) is exponentially tight for all \(y>0\), since \(\eta _n(y,K^c) = \mu _n^1(K^c)\) for large n and compact \(K\subset \mathcal {X}\). However, as Example 9.6 shows, \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) need not be exponentially tight for all converging sequences \((y_n)_{n\in \mathbb {N}}\) (e.g. if \((\mu _n^2)_{n\in \mathbb {N}}\) is not exponentially tight, then neither is \((\eta _n(0,\cdot ))_{n\in \mathbb {N}}\)).
Remark 9.8
Example 9.4 with \(\nu _n\) (or \(\nu _n^0\)) and \(\alpha _n\) as in Example 9.3 (b) fits the assumptions made in Sect. 4 of La Cour and Schieve [20] (see footnote 8). In Sect. 4 of that paper, it is claimed that the law of the first coordinate conditioned on the second coordinate satisfies the large deviation principle with rate function I. Their notion of conditioning on y is “condition on an arbitrarily small neighbourhood around y”, an approach that needs justification. Our results differ: by Example 9.4, the conditional kernel at 0, \(\eta _n(0,\cdot )\), does not satisfy the large deviation principle with rate function I (not even in the sense of quenched large deviations as discussed in Remark 9.5).
Footnotes
 1.
Meaning that there exists an \(N\in \mathbb {N}\) such that \(y_n \in {{\mathrm{supp}}}(\mu _n\circ \pi ^{-1})\) for all \(n\ge N\).
 2.
The Sorgenfrey line, the space \(\mathbb {R}\) with the right half-open interval topology, is perfectly normal but not second countable (see Steen and Seebach [27, Example 51]).
 3.
Perfectly normal means that every open set in \(\mathcal {X}\) is equal to \(f^{-1}((0,\infty ))\) for some \(f\in C(\mathcal {X})\). All metric spaces are perfectly normal (see Bogachev [4, Proposition 6.3.5]).
 4.
This is not true in general. For an example, see Bogachev [4, Example 7.1.3].
 5.
As we are considering large deviation bounds for \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) with \(y_n \in {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\), we want such \(y_n\) to exist. Instead of this condition, one could of course deal with the situation where there is an \(N\in \mathbb {N}\) such that \({{\mathrm{supp}}}(\mu _n \circ \pi ^{-1}) \ne \emptyset \) for all \(n\ge N\), and consider sequences \((y_n)_{n\in \mathbb {N}}\) with \(y_n \in {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\) for \(n\ge N\).
 6.
Under the condition that \(\mathcal {Y}\) is first countable.
 7.
Note that \(\mu _n(\mathcal {X}\times V) >0\) for all \(n\in \mathbb {N}\) and \(V\in \mathcal {N}_{y_n}\), as \(y_n \in {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\).
 8.
The logarithmic moment generating function (see Dembo and Zeitouni [7, Assumption 2.3.2]) is given by \((x,y) \mapsto \frac{1}{2} x^2 + \frac{1}{2} y^2\), whence its Hessian equals the identity matrix and is therefore invertible. In [20], it is mentioned that the conditioning cannot be carried out at all points; only points that equal the derivative of \(y\mapsto \frac{1}{2} y^2\) at some point are considered, of which 0 is an example.
 9.
 10.
This can also be found in O’Brien [23, Proposition 2.1].
Notes
Acknowledgements
The author is supported by ERC Advanced Grant VARIS-267356 of Frank den Hollander. The author is grateful to both Frank den Hollander and Frank Redig for valuable suggestions and useful discussions.
References
 1. Adams, S., Dirr, N., Peletier, M.A., Zimmer, J.: From a large-deviations principle to the Wasserstein gradient flow: a new micro-macro passage. Commun. Math. Phys. 307(3), 791–815 (2011)
 2. Biggins, J.D.: Large deviations for mixtures. Electron. Commun. Probab. 9, 60–71 (2004) (electronic)
 3. Billingsley, P.: Convergence of Probability Measures, Wiley Series in Probability and Statistics: Probability and Statistics, 2nd edn. Wiley, New York (1999)
 4. Bogachev, V.: Measure Theory. Springer, Berlin (2007)
 5. Comets, F.: Large deviation estimates for a conditional probability distribution. Applications to random interaction Gibbs measures. Probab. Theory Relat. Fields 80(3), 407–432 (1989)
 6. Comets, F., Gantert, N., Zeitouni, O.: Quenched, annealed and functional large deviations for one-dimensional random walk in random environment. Probab. Theory Relat. Fields 118(1), 65–114 (2000)
 7. Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications, Stochastic Modelling and Applied Probability, vol. 38. Springer, Berlin (2010). [Corrected reprint of the second edition (1998)]
 8. Deuschel, J.D., Stroock, D.W.: Large Deviations, Pure and Applied Mathematics, vol. 137. Academic Press, Boston (1989)
 9. van Enter, A.C.D., Fernández, R., den Hollander, F., Redig, F.: A large-deviation view on dynamical Gibbs-non-Gibbs transitions. Mosc. Math. J. 10(4), 687–711 (2010)
 10. van Enter, A.C.D., Külske, C., Opoku, A.A., Ruszel, W.M.: Gibbs-non-Gibbs properties for n-vector lattice and mean-field models. Braz. J. Probab. Stat. 24(2), 226–255 (2010)
 11. Ermolaev, V., Külske, C.: Low-temperature dynamics of the Curie–Weiss model: periodic orbits, multiple histories, and loss of Gibbsianness. J. Stat. Phys. 141(5), 727–756 (2010)
 12. Faden, A.: The existence of regular conditional probabilities: necessary and sufficient conditions. Ann. Probab. 13(1), 288–298 (1985)
 13. Fernández, R., den Hollander, F., Martínez, J.: Variational description of Gibbs-non-Gibbs dynamical transitions for the Curie–Weiss model. Commun. Math. Phys. 319(3), 703–730 (2013)
 14. Greven, A., den Hollander, F.: Large deviations for a random walk in random environment. Ann. Probab. 22(3), 1381–1428 (1994)
 15. Halmos, P.R.: Measure Theory. Springer, Berlin (1974)
 16. den Hollander, F.: Large Deviations, Fields Institute Monographs, vol. 14. American Mathematical Society, Providence, RI (2000)
 17. den Hollander, F., Redig, F., van Zuijlen, W.: Gibbs-non-Gibbs dynamical transitions for mean-field interacting Brownian motions. Stoch. Process. Appl. 125(1), 371–400 (2015)
 18. Kosygina, E., Rezakhanlou, F., Varadhan, S.R.S.: Stochastic homogenization of Hamilton–Jacobi–Bellman equations. Commun. Pure Appl. Math. 59(10), 1489–1521 (2006)
 19. Külske, C., Opoku, A.A.: Continuous spin mean-field models: limiting kernels and Gibbs properties of local transforms. J. Math. Phys. 49(12), 125215 (2008)
 20. La Cour, B.R., Schieve, W.C.: A general conditional large deviation principle. J. Stat. Phys. 161(1), 123–130 (2015)
 21. Leao Jr., D., Fragoso, M., Ruffino, P.: Regular conditional probability, disintegration of probability and Radon spaces. Proyecciones 23(1), 15–29 (2004)
 22. Léonard, C.: A large deviation approach to optimal transport. arXiv:0710.1461v1 (2007)
 23. O’Brien, G.L.: Sequences of capacities, with connections to large-deviation theory. J. Theor. Probab. 9(1), 19–35 (1996)
 24. Rassoul-Agha, F., Seppäläinen, T., Yilmaz, A.: Quenched free energy and large deviations for random walk in random potential. Commun. Pure Appl. Math. 66(2), 202–244 (2013)
 25. Rassoul-Agha, F., Seppäläinen, T.: A Course on Large Deviations with an Introduction to Gibbs Measures, Graduate Studies in Mathematics, vol. 162. American Mathematical Society, Providence, RI (2015)
 26. Schaefer, H.H.: Topological Vector Spaces. Springer, New York (1971). (Third printing corrected, Graduate Texts in Mathematics, Vol. 3)
 27. Steen, L.A., Seebach Jr., J.A.: Counterexamples in Topology. Holt, Rinehart and Winston, New York (1970)
Copyright information
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.