Abstract
We study product regular conditional probabilities under measures of two coordinates with respect to the second coordinate that are weakly continuous on the support of the marginal of the second coordinate. Assuming that there exists a sequence of probability measures on the product space that satisfies a large deviation principle, we present necessary and sufficient conditions for the conditional probabilities under these measures to satisfy a large deviation principle. The arguments of these conditional probabilities are assumed to converge. A way to view regular conditional probabilities as a special case of product regular conditional probabilities is presented. This is used to derive conditions for large deviations of regular conditional probabilities. In addition, we derive a Sanov-type theorem for large deviations of the empirical distribution of the first coordinate conditioned on fixing the empirical distribution of the second coordinate.
Introduction and Main Results
In the present paper, we study large deviations of probabilities “of the form”
$$\begin{aligned} \mathbb {P}\left( X_n \in \cdot \, \big | \, Y_n = y_n \right) , \end{aligned}$$(1.1)
where \(((X_n,Y_n))_{n\in \mathbb {N}}\) is a sequence of couples of random variables that satisfies a large deviation principle and \(y_n \rightarrow y\) for some y. As the event \([Y_n = y_n]\) may have probability zero, we make sense of (1.1) in terms of a kernel \(\eta _n\), so that
$$\begin{aligned} \eta _n(y_n,\cdot ) \end{aligned}$$
“represents” (1.1).
Such kernels are called regular conditional probabilities and form an important object in probability theory. The existence of regular conditional probabilities has been studied extensively, for example, by Faden [12] or by Leao et al. [21]. There exist in fact various forms of regular conditional probabilities, namely either with respect to a \(\sigma \)-algebra, with respect to a measurable map or with respect to the projection on one of the coordinates (in case of a product space).
In order to consider large deviations of conditional probabilities, we have to specify which conditional probability we are considering; the conditional probability may not be unique. However, if a (product) regular conditional probability is weakly continuous on the support of the measure composed with the inverse of the measurable map (or projection), it is unique on that domain. For these (product) regular conditional probabilities, it is natural to study their large deviations whenever the argument of the probability is in the domain on which it is unique. In this paper, we study the large deviations in the case when the arguments of these kernels converge, i.e. we study large deviations of \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) for the case that \(y_n \rightarrow y\). To the best of our knowledge, current literature does not provide a general condition under which such kernels satisfy a large deviation principle.
Literature
Some examples in this direction are present. For example in Adams et al. [1], the large deviation principle is proved for the empirical distribution that is evolved by independent Brownian motions conditioned on their initial empirical distribution to lie in a ball (see [1, Theorem 1]). They proceed by proving that the large deviation principle rate function converges as the radius of the ball converges to zero. For the purpose of this paper, we have to show that the limit of the radius of the ball and the limit belonging to the large deviation principle can be interchanged. Léonard [22] proves the large deviation principle of the empirical distribution that is evolved by independent Brownian motions conditioned on their initial empirical distribution; those initial empirical distributions are assumed to be converging (see [22, Proposition 2.19]). In both papers, the evolved state is conditioned on the initial state, while there is also interest in large deviations of the initial state conditioned on the evolved state. In this paper, we prove the large deviation principle in this setting for finite state spaces.
There exist various results on quenched large deviations, i.e. large deviations for regular conditional probabilities in the sense that for almost all realisations of the disorder, the conditional probabilities satisfy the large deviation principle with a rate function that does not depend on the disorder. Examples of papers on quenched large deviations are Comets [5] for conditional large deviations of i.i.d. random fields, Greven and den Hollander [14] and Comets et al. [6] for random walks in random environments, Kosygina–Rezakhanlou–Varadhan [18] for a diffusion with a random drift and Rassoul-Agha et al. [24] for polymers in a random potential.
Biggins [2] obtains the large deviation principle for mixtures of probability measures that satisfy the large deviation principle with kernels that satisfy the large deviation principle as their arguments converge. To some extent, we complement the article in the opposite direction, in the sense that we assume the large deviation principle of the mixture and derive the large deviation principle of the kernels.
Our main motivation to study the above large deviations lies in the theory of Gibbs–non-Gibbs transitions. There is a correspondence between the large deviation rate function of the conditional probability with respect to the evolved coordinate and the evolved state (measure or sequence) being Gibbs (see van Enter et al. [9]). We refer to Sect. 1.4 for further discussions on Gibbs–non-Gibbs transitions.
Large Deviations
In the literature on large deviations, two dominant definitions of large deviation principles are used. One is in terms of a \(\sigma \)-algebra on the topological space, as is done in the book by Dembo and Zeitouni [7] and in the book by Deuschel and Stroock [8]; the other is in terms of the topology, i.e. in terms of open and closed sets, as is done in the book by den Hollander [16] and in the book by Rassoul-Agha and Seppäläinen [25]. Whenever one considers the Borel \(\sigma \)-algebra on the topological space, the two definitions agree.
We define the large deviation lower bound and the large deviation upper bound separately, as in Sect. 1.3, and in Sect. 6, we describe the necessary and sufficient conditions for each of the bounds separately. Moreover, we define them on a set of subsets of the topological space, which is not required to be a \(\sigma \)-algebra. In Remark 7.4, we motivate the choice for this definition.
Definition 1.1
Let \(\mathcal {X}\) be a topological space and \(\mathcal {A}\) be a set of subsets of \(\mathcal {X}\). Let \(I: \mathcal {X}\rightarrow [0,\infty ]\) be lower semicontinuous. Let \((\mu _n)_{n\in \mathbb {N}}\) be a sequence of probability measures on \(\mathcal {A}\). Let \((r_n)_{n\in \mathbb {N}}\) be an increasing sequence in \((0,\infty )\) with \(\lim _{n\rightarrow \infty }r_n = \infty \). We say that \((\mu _n)_{n\in \mathbb {N}}\) satisfies a large deviation lower bound on \(\mathcal {A}\) with rate function I and rates \((r_n)_{n\in \mathbb {N}}\) if
$$\begin{aligned} \liminf _{n\rightarrow \infty } \tfrac{1}{r_n} \log \mu _n(A) \ge - \inf I(A^\circ ) \qquad (A\in \mathcal {A}). \end{aligned}$$(1.2)
We say that \((\mu _n)_{n\in \mathbb {N}}\) satisfies a large deviation upper bound on \(\mathcal {A}\) with rate function I and rates \((r_n)_{n\in \mathbb {N}}\) if
$$\begin{aligned} \limsup _{n\rightarrow \infty } \tfrac{1}{r_n} \log \mu _n(A) \le - \inf I({\overline{A}}) \qquad (A\in \mathcal {A}). \end{aligned}$$(1.3)
In the rest of the paper we only consider the rates \(r_n=n\). However, the theory presented is still valid for general rates \((r_n)_{n\in \mathbb {N}}\). We say that \((\mu _n)_{n\in \mathbb {N}}\) satisfies a large deviation principle on \(\mathcal {A}\) with rate function I whenever it satisfies both the large deviation lower bound and the large deviation upper bound with rate function I.
We omit “on \(\mathcal {A}\)” whenever \(\mathcal {A}\) is the Borel \(\sigma \)-algebra \(\mathcal {B}(\mathcal {X})\) on \(\mathcal {X}\). In this case, the large deviation lower bound is satisfied if and only if the inequality in (1.2) holds for all open subsets of \(\mathcal {X}\) and the large deviation upper bound is satisfied if and only if the inequality in (1.3) holds for all closed subsets of \(\mathcal {X}\).
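As an illustration of Definition 1.1 (a standard example added here for orientation, not one of the results of the paper), take \(\mathcal {X}= \mathbb {R}\) and let \(\mu _n\) be the centred Gaussian law with variance \(1/n\), i.e. the law of the mean of n independent standard Gaussian random variables. With rates \(r_n = n\) and \(I(x) = \tfrac{1}{2}x^2\), the Gaussian tail estimate gives, for \(a>0\) and \(A=[a,\infty )\),
$$\begin{aligned} \lim _{n\rightarrow \infty } \tfrac{1}{n} \log \mu _n\big ([a,\infty )\big ) = -\tfrac{1}{2}a^2 = - \inf I\big ((a,\infty )\big ) = - \inf I\big ([a,\infty )\big ), \end{aligned}$$
so that for this set the lower bound over the interior and the upper bound over the closure hold with equality.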
Main Results
See Sects. 3 and 4 for the definitions of the objects in the statements of the following theorems. In Sects. 6 and 7, we consider a more general situation. Theorem 1.2 is a consequence of Theorem 6.9, and Theorem 1.3 is a consequence of Theorem 7.5.
In this section \(\mathcal {X}\) and \(\mathcal {Y}\) are metric spaces.
Theorem 1.2
Let \(\pi : \mathcal {X}\times \mathcal {Y}\rightarrow \mathcal {Y}\) be given by \(\pi (x,y)=y\). Suppose that \((\mu _n)_{n\in \mathbb {N}}\) is a sequence of probability measures on \(\mathcal {B}(\mathcal {X}) \otimes \mathcal {B}(\mathcal {Y})\) that satisfies the large deviation principle with rate function \(J: \mathcal {X}\times \mathcal {Y}\rightarrow [0,\infty ]\) that has compact sublevel sets. Suppose that for each \(n\in \mathbb {N}\), there exists a product regular conditional probability \(\eta _n : \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow [0,1]\) under \(\mu _n\) with respect to \(\pi \) that is weakly continuous on \({{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\), which we assume to be nonempty. Let \(y \in \mathcal {Y}\) be such that \(\inf J(\mathcal {X}\times \{y\})<\infty \). Define \(I: \mathcal {X}\rightarrow [0,\infty ]\) by
$$\begin{aligned} I(x) = J(x,y) - \inf J(\mathcal {X}\times \{y\}) \qquad (x\in \mathcal {X}). \end{aligned}$$(1.4)
Then I has compact sublevel sets, and, for each \(n\in \mathbb {N}\), \(\eta _n\) is unique on \({{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\). Moreover,
$$\begin{aligned} \text{(A1)} \iff \text{(A2)}, \qquad \text{(B1)} \iff \text{(B2)}, \end{aligned}$$
where

(A1) For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \rightarrow y\) and \(y_n \in {{\mathrm{supp}}}(\mu _n\circ \pi ^{-1})\) for all n large enough,^{Footnote 1} the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation lower bound with rate function I.

(A2) For all \(x\in \mathcal {X}\) and \(r>0\), with \(U= B(x,r)\),
$$\begin{aligned} \sup _{\varepsilon >0} \liminf _{n\rightarrow \infty }\inf _{ \genfrac{}{}{0.0pt}{}{z\in \mathcal {Y}, \delta \in (0,\varepsilon ) }{ B(z,\delta ) \subset B(y,\varepsilon ) } } \tfrac{1}{n}\log \mu _n \Big ({\overline{U}} \times \mathcal {Y}\, \Big | \, \mathcal {X}\times B(z,\delta ) \Big ) \ge - \inf I(U ). \end{aligned}$$(1.5)
(B1) For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \rightarrow y\) and \(y_n \in {{\mathrm{supp}}}(\mu _n\circ \pi ^{-1})\) for all n large enough, the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation upper bound with rate function I.

(B2) For all \(x_1,\dots ,x_k\in \mathcal {X}\) and \(r_1,\dots ,r_k>0\), with \(W = \mathcal {X}\setminus [B(x_1,r_1)\cup \cdots \cup B(x_k,r_k)]\),
$$\begin{aligned} \inf _{\varepsilon >0} \limsup _{n\rightarrow \infty }\sup _{ \genfrac{}{}{0.0pt}{}{z\in \mathcal {Y}, \delta \in (0,\varepsilon ) }{ B(z,\delta ) \subset B(y,\varepsilon ) } } \tfrac{1}{n}\log \mu _n \Big (W^\circ \times \mathcal {Y}\, \Big | \, \mathcal {X}\times B(z,\delta ) \Big ) \le - \inf I(W ). \end{aligned}$$(1.6)
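A Gaussian example may help to see what the rate function of the conditional probabilities looks like in Theorem 1.2. It is a sketch added for illustration, assuming that I is obtained from \(J(\cdot ,y)\) by subtracting its infimum over \(\mathcal {X}\times \{y\}\); the correlation parameter \(\rho \in (-1,1)\) is a hypothetical choice. Let \((X_n,Y_n)\) be the mean of n independent centred Gaussian pairs with unit variances and correlation \(\rho \). Then the joint laws satisfy the large deviation principle on \(\mathbb {R}^2\) with
$$\begin{aligned} J(x,y) = \frac{x^2 - 2\rho xy + y^2}{2(1-\rho ^2)}, \qquad \inf J(\mathbb {R}\times \{y\}) = \tfrac{1}{2}y^2, \qquad I(x) = J(x,y) - \tfrac{1}{2}y^2 = \frac{(x-\rho y)^2}{2(1-\rho ^2)}. \end{aligned}$$
This agrees with the explicit kernel: \(\eta _n(y_n,\cdot )\) is the Gaussian law with mean \(\rho y_n\) and variance \((1-\rho ^2)/n\), which is weakly continuous in \(y_n\) and, for \(y_n\rightarrow y\), satisfies the large deviation principle with the rate function I above.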
The next theorem is similar to Theorem 1.2, but considers the large deviation bounds for regular conditional kernels instead of product regular conditional probabilities.
Theorem 1.3
Let \(\tau : \mathcal {X}\rightarrow \mathcal {Y}\) be continuous. Suppose that \((\nu _n)_{n\in \mathbb {N}}\) is a sequence of probability measures on \(\mathcal {B}(\mathcal {X}) \) that satisfies the large deviation principle with rate function \(J: \mathcal {X}\rightarrow [0,\infty ]\) that has compact sublevel sets. Suppose that for each \(n\in \mathbb {N}\) there exists a regular conditional probability \(\eta _n : \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow [0,1]\) under \(\nu _n\) with respect to \(\tau \) that is weakly continuous on \({{\mathrm{supp}}}(\nu _n \circ \tau ^{-1})\), which is assumed to be nonempty. Let \(y \in \mathcal {Y}\) be such that \(\inf J(\tau ^{-1}(\{y\}))<\infty \). Define \(I: \mathcal {X}\rightarrow [0,\infty ]\) by
$$\begin{aligned} I(x) = {\left\{ \begin{array}{ll} J(x) - \inf J(\tau ^{-1}(\{y\})) &{} \text{ if } \tau (x) = y, \\ \infty &{} \text{ otherwise }. \end{array}\right. } \end{aligned}$$(1.7)
Then I has compact sublevel sets, and, for each \(n\in \mathbb {N}\), \(\eta _n\) is unique on \({{\mathrm{supp}}}(\nu _n \circ \tau ^{-1})\). Moreover,
$$\begin{aligned} \text{(A1)} \iff \text{(A2)}, \qquad \text{(B1)} \iff \text{(B2)}, \end{aligned}$$
where

(A1) For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \rightarrow y\) and \(y_n \in {{\mathrm{supp}}}(\nu _n\circ \tau ^{-1})\) for all n large enough, the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation lower bound with rate function I.

(A2) For all \(x\in \mathcal {X}\) and \(r>0\), with \(U= B(x,r)\),
$$\begin{aligned} \sup _{\varepsilon >0} \liminf _{n\rightarrow \infty }\inf _{ \genfrac{}{}{0.0pt}{}{z\in \mathcal {Y}, \delta \in (0,\varepsilon ) }{ B(z,\delta ) \subset B(y,\varepsilon ) } } \tfrac{1}{n}\log \nu _n \Big ({\overline{U}} \, \Big | \, \tau ^{-1}(B(z,\delta ) ) \Big ) \ge - \inf I(U ). \end{aligned}$$(1.8)
(B1) For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \rightarrow y\) and \(y_n \in {{\mathrm{supp}}}(\nu _n\circ \tau ^{-1})\) for all n large enough, the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation upper bound with rate function I.

(B2) For all \(x_1,\dots ,x_k\in \mathcal {X}\) and \(r_1,\dots ,r_k>0\), with \(W = \mathcal {X}\setminus [B(x_1,r_1)\cup \cdots \cup B(x_k,r_k)]\),
$$\begin{aligned} \inf _{\varepsilon >0} \limsup _{n\rightarrow \infty }\sup _{ \genfrac{}{}{0.0pt}{}{z\in \mathcal {Y}, \delta \in (0,\varepsilon ) }{ B(z,\delta ) \subset B(y,\varepsilon ) } } \tfrac{1}{n}\log \nu _n \Big (W^\circ \, \Big | \, \tau ^{-1}(B(z,\delta ) ) \Big ) \le - \inf I(W ). \end{aligned}$$(1.9)
Gibbs–non-Gibbs Transitions and Future Research
In this section we discuss the relation between the large deviation results in this paper and Gibbs–non-Gibbs transitions in more detail. In particular, we discuss possible future directions regarding large deviations of conditional kernels.
The following situation for interacting particle systems occurs in the mean-field context (a similar situation occurs in the context of lattices). The initial system of so-called spins consists of distributions describing the interaction between spins via a potential V (for each n there is a distribution describing the law of n spins). This initial system is assumed to be Gibbs, which is called sequentially Gibbs in the mean-field context. Allowing the initial state to be transformed, for example, by an evolution of the spins, a question of interest is whether the transformed state is (sequentially) Gibbs. This question has been addressed in the mean-field context by Ermolaev and Külske [11] and by Fernández et al. [13] for \(\{-1,+1\}\)-valued spins, by den Hollander et al. [17] for \(\mathbb {R}\)-valued spins and by Külske and Opoku [19] and van Enter et al. [10] for compactly valued spins. In these papers, independent dynamics of the spins are considered (the evolution of each spin is independent of the evolution of the other spins). Independent dynamics simplify the situation. Namely, the evolved measure on either the product space of the initial and the final space or, in case of an evolution, the space of trajectories, is a tilted measure of the evolved measure when considering \(V=0\). In this case the measure is a product measure, which means that the spins are independent. As a consequence (this will be clarified in a forthcoming paper), the conditional kernel \(\eta _n\) of the initial state on n spins with respect to the final state (for a fixed potential V) is a tilted version of the conditional kernel \(\eta _n^0\) of the initial state with respect to the final state of independent spins (i.e. \(V=0\)).
Because of this tilting, by Varadhan’s lemma, \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function \(V+ I_y - \inf (V+ I_y)\) if \((\eta _n^0(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function \(I_y\). In the forthcoming paper, we will prove that the evolved sequence is sequentially Gibbs if \(V+I_\zeta \) has a unique global minimiser.
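The tilting step can be sketched formally as follows. This is a heuristic illustration rather than the argument of the forthcoming paper: it assumes that V is a bounded continuous function on the state space of the kernels and that the tilt has the exponential form below, where the normalising constant \(Z_n\) is a symbol introduced here for illustration.
$$\begin{aligned} \eta _n(y_n, \mathrm {d}x) = \frac{1}{Z_n}\, e^{-nV(x)} \, \eta _n^0(y_n, \mathrm {d}x), \qquad Z_n = \int e^{-nV} \, \mathrm {d}\big [\eta _n^0(y_n,\cdot )\big ]. \end{aligned}$$
If \((\eta _n^0(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function \(I_y\), then Varadhan’s lemma gives \(\tfrac{1}{n}\log Z_n \rightarrow - \inf (V+I_y)\), and for sets A on which the lower and upper bounds coincide one expects
$$\begin{aligned} \tfrac{1}{n}\log \eta _n(y_n,A) = \tfrac{1}{n}\log \int _A e^{-nV} \, \mathrm {d}\big [\eta _n^0(y_n,\cdot )\big ] - \tfrac{1}{n}\log Z_n \;\longrightarrow \; - \Big ( \inf _{x\in A} \big [V(x)+I_y(x)\big ] - \inf (V+I_y) \Big ), \end{aligned}$$
which is the rate function \(V+I_y-\inf (V+I_y)\) evaluated through its infimum over A.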
The large deviation principle of \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) has been mentioned in the case of trajectories in [11, Corollary 2.4] and, as a corollary of that theorem, for the case of the product space of the initial and the final space in [13, Corollary 1.3]. However, no proof was given. Theorem 8.2 provides a rigorous proof of the large deviation principle statement in [13, Corollary 1.3]. In this paper, we do not provide a rigorous proof of [11, Corollary 2.4]. However, Theorem 1.3 may be used for this purpose, as the conditioning on the final state is a regular conditional kernel with respect to the map \(\tau : C([0,T],\mathcal {X}) \rightarrow \mathcal {X}\), \(\tau (f)=f(T)\).
In order to deal with empirical distributions (and not with magnetisations as is done in [17]), in future research we strive to “extend” the statement of Theorem 8.2 to infinite and possibly noncompact state spaces. In the case of noncompact spaces, it may be that topologies on the space of probability measures are considered that are not metrisable.
Outline
We list some notations, definitions and assumptions in Sect. 2. In Sect. 3, we give and compare the notions of regular conditional kernels, and we show that a regular conditional kernel under a measure \(\nu \) is in fact a product regular conditional kernel under a measure that is related to \(\nu \). In Sect. 4, we introduce and study weakly continuous regular conditional kernels. In Sect. 5, we present some facts about lower semicontinuous functions with compact sublevel sets. Relying on the results of Sects. 4 and 5, in Sect. 6, we present results on large deviation bounds for product regular conditional probabilities, in particular necessary and sufficient conditions for these bounds to hold. In Sect. 7, we discuss how to obtain large deviation bounds for regular conditional probabilities from the results in Sect. 6. In Sect. 8, we apply the theory to obtain the large deviation principle for the empirical density of the first coordinate given the empirical density of the second coordinate, for independent and identically distributed pairs of random variables. In Sect. 9, we give some examples. We also include an example for which the conditions are not satisfied. For this example we compare the quenched large deviations with large deviations of the weakly continuous regular conditional probabilities and comment on the difference with an example by La Cour and Schieve [20]. In “Appendices 1 and 2” we state some general results concerning large deviation bounds that are used in the different sections. In “Appendix 3” we provide the proof of a theorem on which the examples of Sect. 9 rely.
Notations and Conventions
\(\mathbb {N}=\{1,2,3,\dots \}\). For a topological space \(\mathcal {X}\), we write \(\mathcal {B}(\mathcal {X})\) for the Borel \(\sigma \)-algebra and \(\mathcal {P}(\mathcal {X})\) and \(\mathcal {M}(\mathcal {X})\) for the spaces of probability and signed measures on \(\mathcal {B}(\mathcal {X})\), respectively. For \(A\subset \mathcal {X}\) we write \(A^\circ \) for the interior of A and \({{\overline{A}}}\) for the closure of A. For \(x\in \mathcal {X}\) we write \(\delta _x\) for the element in \(\mathcal {P}(\mathcal {X})\) with \(\delta _x(A) =1\) if \( x\in A\) and \(\delta _x(A)=0\) otherwise. For \(x\in \mathcal {X}\) we write \(\mathcal {N}_x\) for the set of \(\mathcal {B}(\mathcal {X})\)-measurable neighbourhoods of x. For a \(\mu \in \mathcal {M}(\mathcal {X})\) we write \({{\mathrm{supp}}}\, \mu = \{ x\in \mathcal {X}: \mu (V)>0 \text{ for } \text{ all } V\in \mathcal {N}_x\}\) and call this the support of \(\mu \). For a function f from a set \(\mathcal {X}\) into \(\mathbb {R}\) and \(c\in \mathbb {R}\) we write \([f \ge c] = \{x\in \mathcal {X}: f(x) \ge c\}\). Similarly, we use the notations \([f > c]\), \([f\le c]\) and \([f<c]\). Whenever \((x_\iota )_{\iota \in \mathbb {I}}\) is a net, where \(\mathbb {I}\) is a directed set by (a direction) \(\preceq \), we write \(\liminf _{\iota \in \mathbb {I}} x_\iota = \sup _{\iota _0\in \mathbb {I}} \inf _{\iota \succeq \iota _0, \iota \in \mathbb {I}} x_\iota \) (similarly \(\limsup \)). In particular, if \(\mathcal {V}\subset \mathcal {N}_x\) and \(\bigcap \mathcal {V}= \{x\}\) and \(f: \mathcal {V}\rightarrow \mathbb {R}\), we write \(\liminf _{V\in \mathcal {V}} f(V) = \sup _{V_0\in \mathcal {V}} \inf _{V\subset V_0, V\in \mathcal {V}} f(V)\) (i.e. we consider \((f(V))_{V\in \mathcal {V}}\) as a net where \(\mathcal {V}\) is directed by \(\supset \) (as \(\preceq \))).
Whenever we write \(\mu (A \mid B)\), we implicitly assume that it is well defined (as \(\mu (A\cap B)/\mu (B)\)), i.e. that \(\mu (B)\ne 0\).
We use the conventions \(\log 0 = - \infty \) and \(\inf I(\emptyset ) = \infty \) whenever I is a function with values in \([0,\infty ]\).
All measures in this paper are signed measures, unless mentioned otherwise.
Regular Conditional Kernels Being Product Regular Conditional Kernels
In this section we introduce the notion of a (product) regular conditional kernel. For an extensive study on regular conditional kernels, see Bogachev [4, Section 10.4]. The notion of a product regular conditional kernel does not appear in [4], but it does in Faden [12] and in Leao et al. [21]. Besides giving definitions, we make a few observations, of which Theorem 3.6 is used later on to derive statements of regular conditional kernels from statements of product regular conditional kernels.
In this section \((X,\mathcal {A})\), \((Y,\mathcal {B})\) are measurable spaces, \(\nu \) is a measure on \(\mathcal {A}\) and \(\mu \) is a measure on \(\mathcal {A}\otimes \mathcal {B}\), \(\tau : X \rightarrow Y\) is measurable, and \(\pi : X \times Y \rightarrow Y\) is given by \(\pi (x,y)=y\).
Definition 3.1
A function \(\eta : Y \times \mathcal {A}\rightarrow \mathbb {R}\) is called a (\(\mathcal {B}\)-)kernel if \(\eta (\cdot ,A)\) is (\(\mathcal {B}\)-)measurable for all \(A\in \mathcal {A}\) and \(\eta (y,\cdot )\) is a measure for all \(y \in Y\). A kernel \(\eta \) is called a probability kernel if \(\eta (y,\cdot )\) is a probability measure for all \(y \in Y\).
Definition 3.2
Let \(\eta : Y \times \mathcal {A}\rightarrow \mathbb {R}\) be a (probability) kernel.

(a) \(\eta \) is called a regular conditional kernel (regular conditional probability) under \(\nu \) with respect to \(\tau \) if
$$\begin{aligned} \nu (A \cap \tau ^{-1}(B) ) = \int _Y \mathbbm {1}_B(y) \eta (y,A) {{\mathrm{\, \mathrm {d}}}}\left[ \nu \circ \tau ^{-1}\right] (y) \quad (A\in \mathcal {A}, B\in \mathcal {B}). \end{aligned}$$(3.1)
(b) \(\eta \) is called a product regular conditional kernel (product regular conditional probability) under \(\mu \) with respect to \(\pi \) if
$$\begin{aligned} \mu (A \times B ) = \int _Y \mathbbm {1}_B(y) \eta (y,A) {{\mathrm{\, \mathrm {d}}}}\left[ \mu \circ \pi ^{-1}\right] (y) \quad (A\in \mathcal {A}, B\in \mathcal {B}). \end{aligned}$$(3.2)
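On finite spaces the defining identity for a product regular conditional probability can be checked by brute force. The following sketch is an illustration only: the spaces, the joint law `mu` and all function names are hypothetical choices. It takes the kernel to be the usual ratio of joint mass to marginal mass, and sets it to a Dirac measure at points of marginal mass zero, where the identity places no constraint on the kernel.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical joint law mu on X x Y; the point "c" has marginal mass zero.
X = [0, 1]
Y = ["a", "b", "c"]
mu = {
    (0, "a"): Fraction(1, 8), (1, "a"): Fraction(3, 8),
    (0, "b"): Fraction(1, 4), (1, "b"): Fraction(1, 4),
    (0, "c"): Fraction(0), (1, "c"): Fraction(0),
}

def marginal(y):
    """Mass that mu composed with the inverse of the projection puts on {y}."""
    return sum(mu[(x, y)] for x in X)

def eta(y, A):
    """Candidate product regular conditional probability eta(y, A)."""
    m = marginal(y)
    if m == 0:
        # Arbitrary choice off the support: Dirac measure at X[0].
        return Fraction(1) if X[0] in A else Fraction(0)
    return sum(mu[(x, y)] for x in A) / m

def subsets(points):
    """All subsets of a finite list, as Python sets."""
    return [set(c) for c in chain.from_iterable(
        combinations(points, k) for k in range(len(points) + 1))]

def satisfies_product_identity():
    """Check mu(A x B) = sum over y in B of eta(y, A) * marginal(y)."""
    for A in subsets(X):
        for B in subsets(Y):
            lhs = sum(mu[(x, y)] for x in A for y in B)
            rhs = sum(eta(y, A) * marginal(y) for y in B)
            if lhs != rhs:
                return False
    return True
```

Changing the value of `eta("c", ·)` to any other probability measure leaves the check unaffected, since the point "c" carries no marginal mass.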
3.3
Suppose that \(\mathcal {E}\) is a sub-\(\sigma \)-algebra of \(\mathcal {A}\). Let \((Y,\mathcal {B}) = (X, \mathcal {E})\) and \({{\mathrm{Id}}}: (X, \mathcal {A}) \rightarrow (Y,\mathcal {B})\) be the identity map. In agreement with [4, Definition 10.4.1], a kernel \(\eta : Y \times \mathcal {A}\rightarrow \mathbb {R}\) is a regular conditional kernel under \(\nu \) with respect to \(\mathcal {E}\) if and only if \(\eta \) is a regular conditional kernel under \(\nu \) with respect to \({{\mathrm{Id}}}\).
3.4
Consider the two kernels \(\eta : Y \times \mathcal {A}\rightarrow \mathbb {R}\) and \(\xi : Y \times (\mathcal {A}\otimes \mathcal {B}) \rightarrow \mathbb {R}\), corresponding to each other by the formulas \(\xi (y, F) = \int _\mathcal {X}\mathbbm {1}_{F}(x,y) {{\mathrm{\, \mathrm {d}}}}[\eta (y,\cdot )](x)\) and \(\eta (y,A) = \xi (y,A\times Y)\). Then \(\xi \) is a regular conditional kernel under \(\mu \) given \(\pi \) if and only if \(\eta \) is a product regular conditional kernel under \(\mu \) given \(\pi \).
In general, \(X\times Y\) may be equipped with a \(\sigma \)-algebra \(\mathcal {F}\) different from \(\mathcal {A}\otimes \mathcal {B}\). In this situation, where \(\mu \) is a measure on \(\mathcal {F}\) and \(\pi \) is \(\mathcal {F}\)-measurable, the above correspondence cannot be used in general to reduce statements about product regular conditional kernels to statements about regular conditional kernels. See also Example 4.5.
On the other hand, regular conditional probabilities can be seen as special cases of product regular conditional probabilities; see Theorem 3.6. In the present paper we use this to derive Theorem 1.3 from Theorem 1.2 but also Theorem 7.5 from Theorem 6.9.
Remark 3.5
If \(\mathcal {A}\) is generated by a countable set, two regular conditional probabilities under a measure with respect to a \(\sigma \)-algebra (see 3.3) are almost everywhere equal (see Bogachev [4, Theorem 10.4.3]). Similarly one could state an analogous statement for regular conditional kernels with respect to measurable maps and for product regular conditional kernels. In Theorem 4.3 we prove that (product) regular conditional kernels are unique on the domain on which they are weakly continuous, in case the underlying topological space is perfectly normal. For such a space the Borel \(\sigma \)-algebra may not be generated by a countable set.^{Footnote 2}
Theorem 3.6

(a) There exists a measure \({{\tilde{\mu }}}\) on \((X \times Y, \mathcal {A}\otimes \mathcal {B})\) for which \({{\tilde{\mu }}}(A\times B) = \nu (A \cap \tau ^{-1}(B))\).

(b) \(\eta : Y\times \mathcal {A}\rightarrow \mathbb {R}\) is a regular conditional kernel under \(\nu \) with respect to \(\tau \) if and only if \(\eta \) is a product regular conditional kernel under \({{\tilde{\mu }}}\) with respect to \(\pi \).
Proof

(a) We may assume \(\nu \) to be positive, since \(\nu = \nu ^+ - \nu ^-\). Let \(\mathcal {E}\) be the set that consists of \(\bigcup _{i=1}^n A_i \times B_i\), where \(n\in \mathbb {N}\) and \(A_i \in \mathcal {A}, B_i \in \mathcal {B}\) are such that \(A_1\times B_1,\dots , A_n\times B_n\) are disjoint. Define \(\nu ^*: \mathcal {E}\rightarrow [0,\infty )\) by \( \nu ^* \left( \bigcup _{i=1}^n A_i \times B_i \right) = \nu \left( \bigcup _{i=1}^n A_i \cap \tau ^{-1}(B_i) \right) \) for \(A_1,\dots ,A_n\in \mathcal {A}\) and \(B_1,\dots ,B_n \in \mathcal {B}\) as above. Checking that \(\mathcal {E}\) is a ring of sets and that \(\nu ^*\) is \(\sigma \)-additive is left to the reader. The existence and uniqueness of the extension \({{\tilde{\mu }}}\) follow from the Carathéodory theorem (see Halmos [15, Section 13, Theorem A]).

(b) This follows from the definition of \({{\tilde{\mu }}}\) (note that \(\nu \circ \tau ^{-1} = {{\tilde{\mu }}} \circ \pi ^{-1}\)). \(\square \)
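On finite spaces the measure \({{\tilde{\mu }}}\) of Theorem 3.6 can be written down explicitly: it is the image of \(\nu \) under \(x\mapsto (x,\tau (x))\), so all its mass sits on the graph of \(\tau \). The sketch below is an illustration under hypothetical choices (the spaces, the law `nu`, the map `tau` and the helper names are not from the paper). It checks the identity of part (a) on all rectangles, together with the equality of marginals noted in the proof of part (b).

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite example: tau maps each point of X to its parity.
X = [0, 1, 2, 3]
Y = ["even", "odd"]
nu = {0: Fraction(1, 10), 1: Fraction(2, 10),
      2: Fraction(3, 10), 3: Fraction(4, 10)}

def tau(x):
    return "even" if x % 2 == 0 else "odd"

# Image of nu under x -> (x, tau(x)): mass only on the graph of tau.
tilde_mu = {(x, y): (nu[x] if tau(x) == y else Fraction(0))
            for x in X for y in Y}

def subsets(points):
    """All subsets of a finite list, as Python sets."""
    return [set(c) for c in chain.from_iterable(
        combinations(points, k) for k in range(len(points) + 1))]

def agrees_on_rectangles():
    """Check tilde_mu(A x B) = nu(A intersected with the preimage of B)."""
    for A in subsets(X):
        for B in subsets(Y):
            lhs = sum(tilde_mu[(x, y)] for x in A for y in B)
            rhs = sum(nu[x] for x in A if tau(x) in B)
            if lhs != rhs:
                return False
    return True

def marginals_agree():
    """Check that nu pushed forward by tau equals the Y-marginal of tilde_mu."""
    for y in Y:
        pushforward = sum(nu[x] for x in X if tau(x) == y)
        projection = sum(tilde_mu[(x, y)] for x in X)
        if pushforward != projection:
            return False
    return True
```

Because the two marginals coincide, the integrals on the right-hand sides of (3.1) and (3.2) are taken against the same measure, which is why the same kernel serves in both roles.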
Weakly Continuous Kernels
In this section we introduce the notion of weak continuity for kernels on topological spaces. In Theorem 4.3 we show uniqueness of (product) regular conditional kernels that are weakly continuous. In Theorems 4.6 and 4.8 we describe conditions that imply the existence of weakly continuous regular conditional probabilities. Similarly as is done in the Portmanteau theorem when one considers metric spaces, weak convergence implies lower bounds for open sets and upper bounds for closed sets, as is shown in Theorem 4.10. As described in Lemmas 4.11 and 4.12, these \(\liminf \) and \(\limsup \) bounds imply bounds for (product) regular conditional probabilities on which the results of Sects. 6 and 7 are based.
In this section \(\mathcal {X}\) and \(\mathcal {Y}\) are topological spaces, \(\nu \) is a measure on \(\mathcal {B}(\mathcal {X})\), \(\mu \) is a measure on \(\mathcal {B}(\mathcal {X})\otimes \mathcal {B}(\mathcal {Y})\), \(\tau : \mathcal {X}\rightarrow \mathcal {Y}\) is measurable, and \(\pi : \mathcal {X}\times \mathcal {Y}\rightarrow \mathcal {Y}\) is given by \(\pi (x,y) = y\).
Definition 4.1
We equip the space of measures, \(\mathcal {M}(\mathcal {X})\), with the weak topology (generated by \(C_b(\mathcal {X})\), which we denote by \(\sigma (\mathcal {M}(\mathcal {X}),C_b(\mathcal {X}))\) as in the book of Schaefer [26, Chapter II, Section 5]). In this topology, a net \((\mu _\iota )_{\iota \in \mathbb {I}}\) in \(\mathcal {M}(\mathcal {X})\) converges to a \(\mu \) in \(\mathcal {M}(\mathcal {X})\) if and only if \(\int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\mu _\iota \rightarrow \int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\mu \) for all \(f\in C_b(\mathcal {X})\).
Let \(D\subset \mathcal {Y}\). A kernel \(\eta : \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow \mathbb {R}\) is called weakly continuous on D if the map \(D\rightarrow \mathcal {M}(\mathcal {X})\) given by \(y\mapsto \eta (y,\cdot )\) is continuous in the weak topology. \(\eta \) is called weakly continuous if \(\eta \) is weakly continuous on \(\mathcal {Y}\).
Theorem 4.2
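Let \(\mathcal {X}\) be a perfectly normal^{Footnote 3} space and \(\mu \in \mathcal {M}(\mathcal {X})\). Then
$$\begin{aligned} {{\mathrm{supp}}}\, \mu = \left\{ x\in \mathcal {X}: \int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\mu >0 \text{ for } \text{ all } f\in C(\mathcal {X},[0,1]) \text{ with } f(x)>0 \right\} . \end{aligned}$$
Moreover, \(\mu (\mathcal {X}\setminus {{\mathrm{supp}}}(\mu )) =0\).^{Footnote 4} As a consequence, \(\mu =0\) if and only if \(\int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\mu =0\) for all \(f\in C_b(\mathcal {X})\).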
Proof
We may assume \(\mu \) is positive. Let \(x\in {{\mathrm{supp}}}\, \mu \). Then \(\mu (V)>0\) for all \(V\in \mathcal {N}_x\). Let \(f\in C(\mathcal {X},[0,1])\) be such that \(f(x)>0\). Then \(V= f^{-1}(0,\infty )\) has strictly positive measure. Since \(\mu (V) = \lim _{n\rightarrow \infty }\int _\mathcal {X}\min \{nf, 1\} {{\mathrm{\, \mathrm {d}\!}}}\mu \), there exists an n such that \(\int _\mathcal {X}\min \{nf, 1\} {{\mathrm{\, \mathrm {d}\!}}}\mu >0\). Consequently, as \(f \ge \frac{1}{n} \min \{nf, 1\}\), we have \(\int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\mu >0\).
Let \(x\in \mathcal {X}\) be such that \(\int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\mu >0\) for all \(f \in C(\mathcal {X},[0,1])\) with \(f(x)>0\). Let \(V\in \mathcal {N}_x\). As \(V= f^{-1}(0,\infty )\) for some \(f\in C(\mathcal {X},[0,1])\), we have \(\mu (V) \ge \int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\mu >0\). \(\square \)
Theorem 4.3
Suppose that \(\mathcal {X}\) is a perfectly normal space.

(a) Let \(\eta \) and \(\zeta \) be regular conditional kernels under \(\nu \) with respect to \(\tau \) that are weakly continuous on \({{\mathrm{supp}}}(\nu \circ \tau ^{-1})\). Then \(\eta (y,\cdot )=\zeta (y,\cdot )\) for all \(y\in {{\mathrm{supp}}}(\nu \circ \tau ^{-1})\). If \(\nu \) is a probability measure, then \(\eta (y,\cdot )\) is a probability measure for all \(y\in {{\mathrm{supp}}}(\nu \circ \tau ^{-1})\).

(b) Let \(\eta \) and \(\zeta \) be product regular conditional kernels under \(\mu \) with respect to \(\pi \) that are weakly continuous on \({{\mathrm{supp}}}(\mu \circ \pi ^{-1})\). Then \(\eta (y,\cdot )=\zeta (y,\cdot )\) for all \(y\in {{\mathrm{supp}}}(\mu \circ \pi ^{-1})\). If \(\mu \) is a probability measure, then \(\eta (y,\cdot )\) is a probability measure for all \(y\in {{\mathrm{supp}}}(\mu \circ \pi ^{-1})\).
Proof
We prove (a); the proof of (b) is similar (replace “\(\nu \circ \tau ^{-1}\)” by “\(\mu \circ \pi ^{-1}\)”). To prove \(\eta = \zeta \) on \(D={{\mathrm{supp}}}(\nu \circ \tau ^{-1})\), by Theorem 4.2, it is sufficient to prove \(\int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\eta (y,\cdot ) = \int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\zeta (y,\cdot )\) for all \(y\in D\) and all \(f\in C_b(\mathcal {X})\). Let \(f\in C_b(\mathcal {X})\). Because f is the uniform limit of simple functions, one has for all \(B\in \mathcal {B}(\mathcal {Y})\)
$$\begin{aligned} \int _B \left[ \int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\eta (y,\cdot ) \right] {{\mathrm{\, \mathrm {d}}}}\left[ \nu \circ \tau ^{-1}\right] (y) = \int _{\tau ^{-1}(B)} f {{\mathrm{\, \mathrm {d}\!}}}\nu = \int _B \left[ \int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\zeta (y,\cdot ) \right] {{\mathrm{\, \mathrm {d}}}}\left[ \nu \circ \tau ^{-1}\right] (y). \end{aligned}$$
Therefore there exists a set \(Z\in \mathcal {B}(\mathcal {Y})\) with \(\nu \circ \tau ^{-1}(\mathcal {Y}\setminus Z)=0\) such that
$$\begin{aligned} \int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\eta (y,\cdot ) = \int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\zeta (y,\cdot ) \qquad (y\in Z). \end{aligned}$$
Since both \(y \mapsto \int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\eta (y,\cdot )\) and \(y \mapsto \int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\zeta (y,\cdot )\) are continuous on D, and Z is dense in D by Theorem 4.2, we have \(\int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\eta (y,\cdot ) = \int _\mathcal {X}f {{\mathrm{\, \mathrm {d}\!}}}\zeta (y,\cdot )\) for all \(y\in D\). The second statement is proved by taking \(f=\mathbbm {1}_\mathcal {X}\). \(\square \)
4.4
When \(\eta \) is a regular conditional kernel under \(\nu \) with respect to \(\tau \), the value of the function \(\eta (\cdot ,A)\) on the complement of \({{\mathrm{supp}}}(\nu \circ \tau ^{-1})\) is not determined, in the sense that if \({{\tilde{\eta }}}\) is a kernel with \({{\tilde{\eta }}}(y,\cdot ) = \eta (y,\cdot )\) for all \(y\in {{\mathrm{supp}}}(\nu \circ \tau ^{-1})\), then \({{\tilde{\eta }}}\) is also a regular conditional kernel under \(\nu \) with respect to \(\tau \).
For example, \({{\tilde{\eta }}}\) given by \({{\tilde{\eta }}}(y,\cdot ) = \eta (y,\cdot )\) for \(y\in {{\mathrm{supp}}}(\nu \circ \tau ^{-1})\) and \({{\tilde{\eta }}}(y,\cdot ) = \delta _x\) for \(y\in {{\mathrm{supp}}}(\nu \circ \tau ^{-1})^c\), for some chosen \(x\in \mathcal {X}\), is such a regular conditional kernel.
Whence if \(\nu \) is a probability measure and there exists a regular conditional kernel under \(\nu \) with respect to \(\tau \) that is weakly continuous on \({{\mathrm{supp}}}(\nu \circ \tau ^{-1})\), then we may as well assume this kernel to be a probability kernel. A similar statement is true for product regular conditional kernels.
4.5
By Theorem 3.6, statement (a) of Theorem 4.3 is a consequence of statement (b). In an attempt to reduce statement (b) to statement (a), the following problem arises with the correspondence between regular conditional kernels and product regular conditional kernels mentioned in 3.4.
The Borel \(\sigma \)-algebra of \(\mathcal {X}\times \mathcal {Y}\), i.e. \(\mathcal {B}(\mathcal {X}\times \mathcal {Y})\), may be strictly larger than \(\mathcal {B}(\mathcal {X}) \otimes \mathcal {B}(\mathcal {Y})\) (see, e.g. Bogachev [4, Lemma 6.4.1 and Example 6.4.3]). If this is the case, i.e. \(\mathcal {B}(\mathcal {X})\otimes \mathcal {B}(\mathcal {Y}) \subsetneq \mathcal {B}(\mathcal {X}\times \mathcal {Y})\), and \(\mathcal {B}(\mathcal {X}\times \mathcal {Y})\) equals the Baire \(\sigma \)-algebra on \(\mathcal {X}\times \mathcal {Y}\), i.e. the smallest \(\sigma \)-algebra that makes all continuous functions \(\mathcal {X}\times \mathcal {Y}\rightarrow \mathbb {R}\) measurable, then there exists a continuous function \(f\in C(\mathcal {X}\times \mathcal {Y})\) that is not \(\mathcal {B}(\mathcal {X}) \otimes \mathcal {B}(\mathcal {Y})\)-measurable. Composing the function f with \(\arctan \), we obtain a \(g\in C_b(\mathcal {X}\times \mathcal {Y})\) that is not measurable with respect to \(\mathcal {B}(\mathcal {X})\otimes \mathcal {B}(\mathcal {Y})\). So if \(\eta : \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow \mathbb {R}\) is a product regular conditional kernel under \(\mu \) with respect to \(\pi \), and \(\xi : \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \otimes \mathcal {B}(\mathcal {Y}) \rightarrow \mathbb {R}\) is as in Example 3.4, then g is not integrable with respect to \(\xi (y,\cdot )\) for any \(y\in \mathcal {Y}\).
\(\mathcal {B}(\mathcal {X}\times \mathcal {Y})\) equals the Baire \(\sigma \)-algebra if \(\mathcal {X}\times \mathcal {Y}\) is a metric space (Bogachev [4, Proposition 6.3.4]). Therefore \(\mathcal {X}= \mathcal {Y}= \mathbb {R}^\mathbb {R}\) equipped with the discrete topology provides an example for which the above is the case.
We state two theorems (Theorems 4.6 and 4.8) showing the existence of product regular conditional probabilities that are weakly continuous on \({{\mathrm{supp}}}(\mu \circ \pi ^{-1})\).
Theorem 4.6
Suppose that \(\mathcal {Y}\) is countable and equipped with the discrete topology. Then \(\eta : \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow \mathbb {R}\) defined by
is a product regular conditional kernel under \(\mu \) with respect to \(\pi \) that is weakly continuous on \({{\mathrm{supp}}}(\mu \circ \pi ^{-1})\).
Proof
It follows from the fact that \(\mu (A\times B) = \sum _{y\in B} \mu (A\times \{y\})\) for \(A\in \mathcal {B}(\mathcal {X})\), \(B\in \mathcal {B}(\mathcal {Y})\). \(\square \)
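The identity used in this proof can be checked on a toy example. The sketch below assumes that the kernel of Theorem 4.6 is the usual ratio \(\eta (y,A) = \mu (A\times \{y\})/\mu (\mathcal {X}\times \{y\})\) on the support; the finite set standing in for \(\mathcal {X}\), the point masses `mu`, and the value 0 chosen off the support are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

# Hypothetical finite measure mu on X x Y, stored as point masses mu[(x, y)].
mu = {("a", 0): Fraction(1, 4), ("b", 0): Fraction(1, 4), ("a", 1): Fraction(1, 2)}

def marginal(y):
    """Mass mu(X x {y}) of the fibre over y."""
    return sum((m for (_, v), m in mu.items() if v == y), Fraction(0))

def eta(y, A):
    """Candidate kernel eta(y, A) = mu(A x {y}) / mu(X x {y}) on the support."""
    total = marginal(y)
    if total == 0:
        return Fraction(0)  # arbitrary choice off the support (assumption)
    mass = sum((m for (x, v), m in mu.items() if v == y and x in A), Fraction(0))
    return mass / total

def mu_rectangle(A, B):
    """mu(A x B) computed directly from the point masses."""
    return sum((m for (x, v), m in mu.items() if x in A and v in B), Fraction(0))

def disintegrated(A, B):
    """Sum over y in B of eta(y, A) * mu(X x {y})."""
    return sum((eta(y, A) * marginal(y) for y in B), Fraction(0))

A, B = {"a"}, {0, 1, 2}
lhs, rhs = mu_rectangle(A, B), disintegrated(A, B)
```

Here `lhs` and `rhs` agree, which is the defining property of a product regular conditional kernel in this discrete setting. Since every subset of a discrete space is open, continuity in y imposes no further constraint.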
4.7
If \(\mathcal {Y}\) is first countable, the notions of open and closed sets and of continuity of functions \(\mathcal {Y}\rightarrow \mathbb {R}\) are characterised by the convergence of sequences. Therefore the following are equivalent for a kernel \(\eta : \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow \mathbb {R}\):

(a)
\(\eta \) is weakly continuous in y.

(b)
For all \((y_n)_{n\in \mathbb {N}}\) in \(\mathcal {Y}\) with \(y_n \rightarrow y\), one has \(\eta (y_n,\cdot ) \xrightarrow {w} \eta (y,\cdot )\).
The following theorem is an easy consequence of Lebesgue's dominated convergence theorem.
Theorem 4.8
Let \(\mathcal {Y}\) be first countable. Let \(\lambda \) be a probability measure on \(\mathcal {B}(\mathcal {X})\). Let \(D\subset \mathcal {Y}\). Let \(f: \mathcal {X}\times \mathcal {Y}\rightarrow [0,\infty )\) be a bounded \(\mathcal {B}(\mathcal {X}) \otimes \mathcal {B}(\mathcal {Y})\)-measurable function such that \(y\mapsto f(x,y)\) is continuous on D and equal to zero on \(\mathcal {Y}\setminus D\) for \(\lambda \)-almost all \(x\in \mathcal {X}\). Suppose that \(\int _\mathcal {X}f(x,y) {{\mathrm{\, \mathrm {d}\!}}}\lambda (x)>0\) for all \(y\in D\). If \(\eta :\, \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow [0,1]\) is given by
then \(\eta \) is weakly continuous on D (even strongly continuous, i.e. \(y \mapsto \eta (y,A)\) is continuous for all \(A\in \mathcal {B}(\mathcal {X})\)). Let \(\kappa \) be a probability measure on \(\mathcal {B}(\mathcal {Y})\) and assume \(D= {{\mathrm{supp}}}\, \kappa \). Then \(\eta \) is a product regular conditional kernel under
with respect to \(\pi \), that is weakly continuous on \(D= {{\mathrm{supp}}}(\mu \circ \pi ^{-1})\).
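As a rough numerical illustration of the first claim, the sketch below assumes that the kernel in Theorem 4.8 is the normalised density \(\eta (y,A) = \int _A f(x,y) {{\mathrm{\, \mathrm {d}\!}}}\lambda (x) / \int _\mathcal {X}f(x,y) {{\mathrm{\, \mathrm {d}\!}}}\lambda (x)\) for \(y\in D\). The five-point space, the uniform \(\lambda \), the function `f` and the choice \(D=\mathbb {R}\) are hypothetical and only serve the example.

```python
import math

X = [0, 1, 2, 3, 4]                  # finite stand-in for the space X (assumption)
lam = {x: 1.0 / len(X) for x in X}   # probability measure lambda on X

def f(x, y):
    """Bounded, strictly positive and continuous in y (hypothetical choice)."""
    return math.exp(-(x - y) ** 2)

def eta(y, A):
    """Normalised density kernel: mass of A under f(., y) d(lambda), rescaled to 1."""
    numerator = sum(f(x, y) * lam[x] for x in X if x in A)
    denominator = sum(f(x, y) * lam[x] for x in X)
    return numerator / denominator

y0, A = 1.3, {1, 2}
total_mass = eta(y0, set(X))                    # should be 1 for a probability kernel
jump = abs(eta(y0 + 1e-6, A) - eta(y0, A))      # small when y -> eta(y, A) is continuous
```

A small value of `jump` is consistent with strong continuity of \(y\mapsto \eta (y,A)\); it is of course not a proof.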
Remark 4.9
In the above theorem the conditions may be weakened. Instead of assuming f to be bounded and \(\lambda \), \(\kappa \) to be probability measures, we may as well assume that \(\lambda \) and \(\kappa \) are positive nonzero measures; that for all \(y\in D\) there exist a \(V\in \mathcal {N}_y\) and a \(\lambda \)-integrable \(h: \mathcal {X}\rightarrow [0,\infty )\) such that \(f(x,z) \le h(x)\) for all \(x\in \mathcal {X}\) and all \(z\in V \cap D \); and that f is \(\lambda \otimes \kappa \)-integrable.
In Sect. 6, the condition (b) of Theorem 4.10 is one of the key assumptions. If \(\mathcal {X}\) is a metric space, this property follows from weak continuity as in the Portmanteau theorem. We state this in Theorem 4.10.
Theorem 4.10
Let \(\eta : \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow \mathbb {R}\) be a probability kernel. Let \(D\subset \mathcal {Y}\), \(y\in D\) and \(\mathcal {V}\subset \mathcal {N}_y\) be such that \(\bigcap \mathcal {V}= \{y\}\). Consider the following conditions.

(a)
\(D \rightarrow \mathcal {M}(\mathcal {X})\), \(y\mapsto \eta (y,\cdot )\) is weakly continuous in y.

(b)
\(\liminf _{\iota \in \mathbb {I}} \eta (y_\iota ,G) \ge \eta (y,G)\) for all open \(G\subset \mathcal {X}\) and \((y_\iota )_{\iota \in \mathbb {I}}\) in D with \(y_\iota \rightarrow y\).

(c)
\(\limsup _{\iota \in \mathbb {I}}\eta (y_\iota ,F) \le \eta (y,F)\) for all closed \(F\subset \mathcal {X}\) and \((y_\iota )_{\iota \in \mathbb {I}}\) in D with \(y_\iota \rightarrow y\).

(d)
\(\sup _{V\in \mathcal {V}} \inf _{v\in V\cap D} \eta (v,G) \ge \eta (y,G)\) for all open sets \(G\subset \mathcal {X}\).

(e)
\(\inf _{V\in \mathcal {V}} \sup _{v\in V\cap D} \eta (v,F) \le \eta (y,F)\) for all closed sets \(F\subset \mathcal {X}\).
(b), (c), (d), (e) are equivalent. If \(\mathcal {X}\) is metrisable, then (a) implies (b). If \(\mathcal {X}\) is metrisable and \(\mathcal {Y}\) is first countable, then (a) is equivalent to (b) and hence to (c), (d), and (e).
Proof
We leave it to the reader to check the equivalences between (b), (c), (d), (e). If \(\mathcal {X}\) is a metric space, one can follow the lines of the Portmanteau theorem in the book of Billingsley [3, Theorem 2.1] for the implication (a) implies (b); the fact that the measures in the proof are indexed by the natural numbers instead of a general directed set \(\mathbb {I}\) does not affect the argument. The proof of (b)\(\Longrightarrow \)(a) in the book of Billingsley relies on the Lebesgue dominated convergence theorem. But when \(\mathcal {Y}\) is first countable, one can restrict to sequences (see 4.7) and obtain the implication (b)\(\Longrightarrow \)(a) as is done in the book of Billingsley. \(\square \)
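A simple example shows why (b) and (c) are inequalities rather than equalities. The sketch below uses the Dirac kernel \(\eta (y,\cdot )=\delta _y\) on \(\mathbb {R}\), which is weakly continuous in y; the kernel, the sets and the sequence \(y_n = 1/n\) are illustrative choices, and the minimum or maximum over a finite tail is only a proxy for \(\liminf \) or \(\limsup \).

```python
def eta(y, indicator):
    """Dirac kernel eta(y, .) = delta_y, evaluated on a set given by its indicator."""
    return 1.0 if indicator(y) else 0.0

def open_G(x):
    """Indicator of the open interval (0, 1)."""
    return 0 < x < 1

def closed_F(x):
    """Indicator of the closed set {0}."""
    return x == 0

ys = [1.0 / n for n in range(2, 200)]   # y_n -> y = 0, with every y_n in (0, 1)
tail = ys[100:]

liminf_G = min(eta(v, open_G) for v in tail)     # proxy for liminf eta(y_n, G)
limsup_F = max(eta(v, closed_F) for v in tail)   # proxy for limsup eta(y_n, F)
limit_G = eta(0.0, open_G)                       # eta(y, G) at the limit point
limit_F = eta(0.0, closed_F)                     # eta(y, F) at the limit point
```

In this example `liminf_G` is 1 while `limit_G` is 0, and `limsup_F` is 0 while `limit_F` is 1, so both inequalities are strict although the kernel is weakly continuous.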
Lemma 4.11
Assume that \(\mu \) is a probability measure. Let \(\eta \) be a product regular conditional probability under \(\mu \) with respect to \(\pi \). Write \(D={{\mathrm{supp}}}(\mu \circ \pi ^{-1})\) and let \(y\in D\). Then for every \(U\in \mathcal {N}_y\), one has \(\mu (\mathcal {X}\times U) >0\) and
Moreover, if \(\mathcal {V}\subset \mathcal {N}_y\) is such that \(\bigcap \mathcal {V}= \{y\}\) and \(\eta \) satisfies (b) of Theorem 4.10, then
Proof
Let \(U \in \mathcal {N}_y\). Since \(y\in D={{\mathrm{supp}}}(\mu \circ \pi ^{-1})\), one has \( \mu (\mathcal {X}\times U)>0\). (4.7) follows from the fact that for all \(A\in \mathcal {B}(\mathcal {X})\)
For an open \(G\subset \mathcal {X}\) we have for \(\mathcal {V}\) as above
Thus (4.8) follows when assuming (d) of Theorem 4.10. Similarly, one obtains (4.9). \(\square \)
For a regular conditional probability we have a similar statement; see Lemma 4.12. The proof can be done following the lines of the proof of Lemma 4.11 or as a consequence of Lemma 4.11 using Theorem 3.6.
Lemma 4.12
Assume that \(\nu \) is a probability measure. Let \(\eta \) be a regular conditional probability under \(\nu \) with respect to \(\tau \). Write \(D={{\mathrm{supp}}}(\nu \circ \tau ^{-1})\) and let \(y\in D\). Then for every \(U\in \mathcal {N}_y\), one has \(\nu (\tau ^{-1}(U)) >0\) and
Moreover, if \(\mathcal {V}\subset \mathcal {N}_y\) is such that \(\bigcap \mathcal {V}= \{y\}\) and \(\eta \) satisfies (b) of Theorem 4.10, then
Some Facts About Functions with Compact Sublevel Sets
In this section we present some facts for functions with compact sublevel sets which are used in Sects. 6, 7 and 8.
In this section \(\mathcal {X},\mathcal {Y}\) and \(\mathcal {Z}\) are topological spaces.
Definition 5.1
Let \(J: \mathcal {X}\rightarrow [0,\infty ]\). We call the set \([J\le \alpha ]\) (see Sect. 2) a sublevel set of J for \(\alpha \in [0,\infty )\). J is said to be lower semicontinuous if all sublevel sets of J are closed. J is said to have compact sublevel sets if all sublevel sets of J are compact.
5.2
Let \(J: \mathcal {X}\rightarrow [0,\infty ]\) be lower semicontinuous. Then
Indeed, for all \(\alpha < J(x)\) the set \([J > \alpha ] \) is open and contains x.
Hence, a function \(J: \mathcal {X}\rightarrow [0,\infty ]\) is lower semicontinuous if and only if
$$\begin{aligned} \liminf _{\iota \in \mathbb {I}} J(x_\iota ) \ge J(x) \end{aligned}$$
for all \(x\in \mathcal {X}\) and all nets \((x_\iota )_{\iota \in \mathbb {I}}\) in \(\mathcal {X}\) that converge to x.
Lemma 5.3
Let \(\tau : \mathcal {Z}\rightarrow \mathcal {Y}\) be continuous. Let \(J: \mathcal {Z}\rightarrow [0,\infty ]\) have compact sublevel sets. Let \(y\in \mathcal {Y}\) and \(\mathcal {V}\subset \mathcal {N}_y\), \(\bigcap \mathcal {V}= \{y\}\). Let \(F\subset \mathcal {Z}\) be closed. Then
Consequently, if \(\mathcal {Z}= \mathcal {X}\times \mathcal {Y}\), then, for all closed \(F\subset \mathcal {X}\) with \(\inf J(F \times \{y\})<\infty \),
Proof
The \(\le \) inequality in (5.3) is immediate. Because \(\liminf _{V\in \mathcal {V}} \inf J(F \cap \tau ^{-1}( {\overline{V}} ) ) \ge \liminf _{V\in \mathcal {N}_y} \inf J(F \cap \tau ^{-1}( \overline{V} ) ) \), it is sufficient to prove
Note that \(\alpha =\sup _{V\in \mathcal {N}_y} \inf J(F \cap \tau ^{-1}({\overline{V}}) )\). If \(\alpha = \infty \), there is nothing to prove. Suppose that \(\alpha <\infty \). Then \(F\cap \tau ^{-1}({\overline{V}}) \cap [ J \le \alpha + \varepsilon ] \ne \emptyset \) for all \(V\in \mathcal {N}_y\) and all \(\varepsilon >0\). Since \([J \le \alpha + \varepsilon ]\) is compact, this implies that \(\bigcap _{V\in \mathcal {N}_y} F\cap \tau ^{-1}({\overline{V}}) \cap [ J \le \alpha + \varepsilon ] \ne \emptyset \), i.e. \(\inf J(F \cap \tau ^{-1}(\{y\})) \le \alpha + \varepsilon \) for all \(\varepsilon >0\). \(\square \)
5.4
The assumption that \(\tau \) be continuous is not redundant, e.g. consider \(\mathcal {Y}=\mathcal {Z}=[0,1]\) and \(J = \mathbbm {1}_{(\frac{1}{2},1]}\) and \(\tau \) given by \(\tau (0)=0\), \(\tau (1)=1\) and \(\tau (x) = 1-x\) for \(x\in (0,1)\), \(F=[0,1]\) and \(y=1\). Then, for all neighbourhoods V of y, \(\tau ^{-1}(V)\) contains the interval \((0,\varepsilon )\) for some \(\varepsilon >0\), whence \(\inf J(F\cap \tau ^{-1}(V)) = 0\) but \(\inf J(F\cap \tau ^{-1}(\{y\})) = J(1) = 1\).
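The counterexample can be checked numerically. The sketch below evaluates both infima on a finite grid standing in for \(F=[0,1]\); the grid, the helper names and the particular neighbourhoods \((1-\varepsilon ,1]\) of \(y=1\) are assumptions made for the illustration.

```python
def tau(x):
    """The discontinuous map from 5.4: fixes 0 and 1, reflects the open interval."""
    if x == 0.0 or x == 1.0:
        return x
    return 1.0 - x

def J(x):
    """Indicator of (1/2, 1]; its sublevel sets [0, 1/2] and [0, 1] are compact."""
    return 1.0 if x > 0.5 else 0.0

grid = [i / 1000 for i in range(1001)]  # finite stand-in for F = [0, 1] (assumption)

def inf_J_over_preimage(in_V):
    """Infimum of J over the grid points x with tau(x) in V."""
    return min(J(x) for x in grid if in_V(tau(x)))

# Shrinking neighbourhoods V = (1 - eps, 1] of y = 1.
inf_over_neighbourhoods = [
    inf_J_over_preimage(lambda t, e=eps: t > 1.0 - e) for eps in (0.5, 0.1, 0.01)
]
# The fibre over y = 1 itself, which only contains x = 1.
inf_over_fibre = inf_J_over_preimage(lambda t: t == 1.0)
```

Every entry of `inf_over_neighbourhoods` is 0, whereas `inf_over_fibre` is 1, so the conclusion of Lemma 5.3 fails for this \(\tau \).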
Lemma 5.5
Let \(\mathcal {X}\) be normal and let \(\mathcal {G}\) be a basis for the topology of \(\mathcal {X}\). Let \(J: \mathcal {X}\times \mathcal {Y}\rightarrow [0,\infty ]\) have compact sublevel sets.

(a)
For all open \(G\subset \mathcal {X}\) and \(\varepsilon >0\), there exists a \(U\in \mathcal {G}\) with \(U\subset {{\overline{U}}} \subset G\) such that
$$\begin{aligned} \inf J(G \times \{y\}) + \varepsilon \ge \inf J (U \times \{y\}). \end{aligned}$$(5.6) 
(b)
For all closed \(F\subset \mathcal {X}\) and \(\alpha < \inf J(F \times \{y\})\), there exist \(U_1,\dots , U_k \in \mathcal {G}\) such that with \(W=\mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\) one has \(F\subset W^\circ \subset W\) and
$$\begin{aligned} \alpha&< \inf J(W \times \{y\}) \le \inf J(W^\circ \times \{y\}) \le \inf J(F \times \{y\}). \end{aligned}$$(5.7)
Proof

(a)
Let \(\varepsilon >0\). Let \(x\in G\) be such that \( J(x,y) \le \inf J(G \times \{y\}) + \varepsilon . \) Since \(\mathcal {X}\) is a normal topological space, there exists an open set U with \(x\in U \subset {{\overline{U}}} \subset G\). Because \(\mathcal {G}\) is a basis, U may be chosen in \(\mathcal {G}\). Then \( \inf J(G \times \{y\}) + \varepsilon \ge J(x,y) \ge \inf J(U \times \{y\}). \)

(b)
Let \(\beta >\alpha \) be such that \(\beta < \inf J(F \times \{y\})\). The set \(K:=\{ x\in \mathcal {X}: J(x,y)\le \beta \}\) is a compact set that is disjoint from F. Whence there exist disjoint open \(U,V \subset \mathcal {X}\) with \(K\subset U\) and \(F\subset V\). Since \(\mathcal {G}\) is a basis and K is compact, there exist \(U_1,\dots ,U_k\) in \(\mathcal {G}\) with \(K\subset U_1\cup \cdots \cup U_k \subset U\). Then \(\overline{ U_1\cup \cdots \cup U_k} \cap V = \emptyset \). Whence with \(W:=\mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\), one has \(F\subset V \subset \mathcal {X}\setminus \overline{ U_1\cup \cdots \cup U_k} = W^\circ \) and \(W\subset \mathcal {X}\setminus K\), which implies \(\inf J(W \times \{y\}) \ge \beta > \alpha \). \(\square \)
Large Deviations for Product Regular Conditional Probabilities
In this section we consider the following situation.

(i)
\(\mathcal {X}\) and \(\mathcal {Y}\) are topological spaces, where \(\mathcal {X}\) is normal.

(ii)
\(\mathcal {G}\) is a basis for the topology of \(\mathcal {X}\) and \(\mathcal {H}\) is a basis for the topology of \(\mathcal {Y}\).

(iii)
\(\pi : \mathcal {X}\times \mathcal {Y}\rightarrow \mathcal {Y}\) is given by \(\pi (x,y) =y\).

(iv)
\((\mu _n)_{n\in \mathbb {N}}\) is a sequence of probability measures on \(\mathcal {B}(\mathcal {X}) \otimes \mathcal {B}(\mathcal {Y})\) satisfying the large deviation principle on \(\{A\times B: A\in \mathcal {B}(\mathcal {X}), B\in \mathcal {B}(\mathcal {Y})\}\) with a rate function \(J: \mathcal {X}\times \mathcal {Y}\rightarrow [0,\infty ]\) that has compact sublevel sets.

(v)
For each \(n\in \mathbb {N}\) we assume the following: \({{\mathrm{supp}}}(\mu _n \circ \pi ^{-1}) \ne \emptyset \),^{Footnote 5} and there exists a product regular conditional probability \(\eta _n : \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow [0,1]\) under \(\mu _n\) with respect to \(\pi \), which satisfies the following continuity condition (see Theorem 4.10):
$$\begin{aligned}&\liminf _{\iota \in \mathbb {I}} \eta _n(y_\iota ,G) \ge \eta _n(y,G) \hbox { for all open }G\subset \mathcal {X}\nonumber \\&\hbox {and }(y_\iota )_{\iota \in \mathbb {I}} \hbox { in }{{\mathrm{supp}}}(\mu _n \circ \pi ^{-1}) \hbox { with } y_\iota \rightarrow y. \end{aligned}$$(6.1) 
(vi)
Let \(y \in \mathcal {Y}\). We assume that \(\inf J(\mathcal {X}\times \{y\})<\infty \) and that there exist \(y_n \in {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\) with \(y_n \rightarrow y\). We define \(I: \mathcal {X}\rightarrow [0,\infty ]\) by
$$\begin{aligned} I(x) = J(x,y) - \inf J(\mathcal {X}\times \{y\}). \end{aligned}$$(6.2)
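To make the definition of I concrete, consider the following sketch under an assumption that is not part of the text: \(\mu _n\) is a centred bivariate Gaussian with unit variances, correlation \(\rho \) and covariance scaled by 1/n, for which the rate function is the quadratic form \(J(x,y) = (x^2 - 2\rho xy + y^2)/(2(1-\rho ^2))\). Subtracting the infimum over the fibre \(\mathcal {X}\times \{y\}\), here approximated on a hypothetical grid, should then give the rate function \((x-\rho y)^2/(2(1-\rho ^2))\) of the Gaussian conditional law.

```python
def J(x, y, rho):
    """Quadratic rate function of the assumed Gaussian example."""
    return (x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho))

def I_from_J(x, y, rho, grid):
    """I(x) = J(x, y) - inf J(X x {y}), with the infimum taken over a grid."""
    inf_over_fibre = min(J(u, y, rho) for u in grid)
    return J(x, y, rho) - inf_over_fibre

def I_closed_form(x, y, rho):
    """Rate function of the conditional law N(rho * y, (1 - rho^2) / n)."""
    return (x - rho * y) ** 2 / (2.0 * (1.0 - rho * rho))

grid = [i / 1000 for i in range(-5000, 5001)]   # contains the minimiser rho * y = 0.4
rho, y, x = 0.5, 0.8, 1.0
numeric = I_from_J(x, y, rho, grid)
exact = I_closed_form(x, y, rho)
```

The two values agree up to rounding, and I vanishes at \(x=\rho y\), as a rate function of a sequence of probability measures should.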
In this section we derive necessary and sufficient conditions for the large deviation bounds with rate function I for sequences of the form \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\). We prove this for general topological spaces instead of metric spaces as it does not cost more effort.
In Theorem 6.3 we consider a fixed sequence \((y_n)_{n\in \mathbb {N}}\) with \(y_n \rightarrow y\) and describe equivalent conditions for the lower and upper large deviation bound to hold.
We are interested in the question whether for all sequences \((y_n)_{n\in \mathbb {N}}\) with \(y_n \rightarrow y\) the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the lower and upper large deviation bound with rate function I. In Theorem 6.9 we give equivalent^{Footnote 6} and sufficient conditions for these bounds in a way that does not depend on sequences \((y_n)_{n\in \mathbb {N}}\) and the sets \((\mathcal {V}_n)_{n\in \mathbb {N}}\) as in Theorem 6.3.
Finally in 6.12 we comment on deriving Theorem 1.2 from Theorem 6.9.
But first we consider specific situations, providing a simple proof of the large deviation bounds with rate function I for sequences of the form \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\). Namely, we consider the case that \(\mathcal {Y}\) is a discrete space (Theorem 6.1) and the case where \(\mu _n\) is a product measure for all \(n\in \mathbb {N}\) (Theorem 6.2).
Theorem 6.1
Suppose that \(\mathcal {Y}\) is countable and equipped with the discrete topology. Let \(y\in \mathcal {Y}\) be such that \(\inf J(\mathcal {X}\times \{y\})<\infty \). For all \((y_n)_{n\in \mathbb {N}}\) in \(\mathcal {Y}\) with \(y_n \in {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\) and \(y_n \rightarrow y\), the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function I.
Proof
This follows from the following inequalities, which are consequences of the large deviation principle and of Theorem 4.6.
\(\square \)
Theorem 6.2
(Independent coordinates) Suppose that \(\mathcal {X}\) and \(\mathcal {Y}\) are second countable and \(\mathcal {Y}\) is regular. Suppose that \(\mu _n = \mu _n^1 \otimes \mu _n^2\) for some \(\mu _n^1\) on \(\mathcal {B}(\mathcal {X})\) and \(\mu _n^2\) on \(\mathcal {B}(\mathcal {Y})\) for all \(n\in \mathbb {N}\). Then \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function I for all sequences \((y_n)_{n\in \mathbb {N}}\) in \(\mathcal {Y}\). In particular, \(\eta _n(y_n,\cdot ) = \mu _n^1\) and \(I(x) = \inf J(\{x\} \times \mathcal {Y})\).
Proof
I is lower semicontinuous (e.g. by 5.2) and for \(c\in \mathbb {R}\) the set \([I \le c]\) is a subset of the compact set \(\{x\in \mathcal {X}: \exists z\in \mathcal {Y}, J(x,z) \le c + \inf J(\mathcal {X}\times \{y\})\}\).
\([I\le c] = \pi ( [ J \le c +\inf J(\mathcal {X}\times \{y\})])\). \(\square \)
Theorem 6.3
Let \((y_n)_{n\in \mathbb {N}}\) be a sequence in \(\mathcal {Y}\) with \(y_n \in {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\) that converges to y. For \(n\in \mathbb {N}\) let \(\mathcal {V}_n \subset \mathcal {N}_{y_n}\) be such that \(\bigcap \mathcal {V}_n = \{y_n\}\). Then (a2) \(\iff \) (a3) \(\iff \) (a1) and (b2) \(\iff \) (b3) \(\iff \) (b1), where

(a1)
For all open \(G\subset \mathcal {X}\)
$$\begin{aligned} \liminf _{n\rightarrow \infty }\tfrac{1}{n}\log \eta _n(y_n,G) \ge - \inf I(G). \end{aligned}$$(6.5) 
(a2)
For all \(U\in \mathcal {G}\) ^{Footnote 7}
$$\begin{aligned} \liminf _{n\rightarrow \infty }\limsup _{V\in \mathcal {V}_n} \tfrac{1}{n}\log \mu _n({\overline{U}} \times \mathcal {Y} \mid \mathcal {X}\times V) \ge - \inf I(U). \end{aligned}$$(6.6) 
(a3)
For all open \(U\subset \mathcal {X}\), one has
$$\begin{aligned} \liminf _{n\rightarrow \infty }\liminf _{V\in \mathcal {V}_n} \tfrac{1}{n}\log \mu _n(U \times \mathcal {Y} \mid \mathcal {X}\times V) \ge - \inf I(U). \end{aligned}$$(6.7) 
(b1)
For all closed \(F\subset \mathcal {X}\)
$$\begin{aligned} \limsup _{n\rightarrow \infty }\tfrac{1}{n}\log \eta _n(y_n,F) \le - \inf I(F). \end{aligned}$$(6.8) 
(b2)
For all \(U_1,\dots ,U_k\in \mathcal {G}\), one has for \(W= \mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\)
$$\begin{aligned} \limsup _{n\rightarrow \infty }\liminf _{V\in \mathcal {V}_n} \tfrac{1}{n}\log \mu _n(W^\circ \times \mathcal {Y} \mid \mathcal {X}\times V) \le - \inf I(W). \end{aligned}$$(6.9) 
(b3)
For all closed \(W\subset \mathcal {X}\)
$$\begin{aligned} \limsup _{n\rightarrow \infty }\limsup _{V\in \mathcal {V}_n} \tfrac{1}{n}\log \mu _n(W \times \mathcal {Y} \mid \mathcal {X}\times V) \le - \inf I(W). \end{aligned}$$(6.10)
Proof
The implications (a3) \(\Longrightarrow \) (a2) and (b3) \(\Longrightarrow \) (b2) are immediate.
(a1) \(\Longrightarrow \) (a3) Let \(U\subset \mathcal {X}\) be an open set. By Lemma 4.11, (4.8),
(b1) \(\Longrightarrow \) (b3) Let \(W\subset \mathcal {X}\) be a closed set. By Lemma 4.11, (4.9),
(a2) \(\Longrightarrow \) (a1). Let \(G\subset \mathcal {X}\) be open. Let \(\varepsilon >0\) and U be as in Lemma 5.5(a). Then we obtain using Lemma 4.11
As this holds for all \(\varepsilon >0\), we conclude (6.5).
(b2) \(\Longrightarrow \) (b1). Let \(\alpha < \inf J (F \times \{y\})\) and \(U_1,\dots ,U_k\) and W be as in Lemma 5.5(b). Then we obtain using Lemma 4.11
As this holds for all \(\alpha < \inf J(F \times \{y\})\), we conclude (6.8). \(\square \)
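The bounds (6.5) and (6.8) can be observed numerically in an explicit case. The sketch assumes, beyond what the text states, that \(\mu _n\) is a centred bivariate Gaussian with unit variances, correlation \(\rho \) and covariance scaled by 1/n, so that \(\eta _n(y,\cdot )\) is the normal law with mean \(\rho y\) and variance \((1-\rho ^2)/n\) and \(I(x) = (x-\rho y)^2/(2(1-\rho ^2))\). The half-line \(G=(a,\infty )\) with \(a>\rho y\), the sequence \(y_n = y + 1/n\) and the chosen values of n are hypothetical.

```python
import math

def log_eta_n(n, y_n, a, rho):
    """log eta_n(y_n, (a, inf)) for the conditional law N(rho * y_n, (1 - rho^2) / n)."""
    z = (a - rho * y_n) * math.sqrt(n / (2.0 * (1.0 - rho * rho)))
    return math.log(0.5 * math.erfc(z))   # Gaussian upper tail via erfc

rho, y, a = 0.5, 0.8, 0.9
# Since a > rho * y, the infimum of I over (a, inf) is attained at the boundary point a.
minus_inf_I = -((a - rho * y) ** 2) / (2.0 * (1.0 - rho * rho))

errors = {}
for n in (500, 1000, 2000, 4000):
    y_n = y + 1.0 / n                      # arguments converging to y
    errors[n] = abs(log_eta_n(n, y_n, a, rho) / n - minus_inf_I)
```

The entries of `errors` shrink as n grows, in line with \(\tfrac{1}{n}\log \eta _n(y_n,G) \rightarrow - \inf I(G)\) for this set. The values of n are kept moderate because `math.erfc` underflows for very large arguments.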
6.4
(Fixed y) Note that if \(y_n =y\) for all \(n\in \mathbb {N}\), one can take \(\mathcal {V}_n = \mathcal {V}\) for a \(\mathcal {V}\subset \mathcal {N}_y\) with \(\bigcap \mathcal {V}= \{y\}\). Then Theorem 6.3 implies that \((\eta _n(y,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function I if and only if (a2) and (b2) hold (with \(\mathcal {V}_n=\mathcal {V}\)).
6.5
Let \((y_n)_{n\in \mathbb {N}}\) in \(\mathcal {Y}\) be such that \(y_n \in {{\mathrm{supp}}}(\mu _n\circ \pi ^{-1})\) and \(y_n \rightarrow y\). From Theorem 6.3 we derive that (a2) holds for some \(\mathcal {V}_n\subset \mathcal {N}_{y_n}\) with \(\bigcap \mathcal {V}_n=\{y_n\}\) if and only if (a2) holds for all such \(\mathcal {V}_n\). Similarly, (b2) holds for some \(\mathcal {V}_n\subset \mathcal {N}_{y_n}\) with \(\bigcap \mathcal {V}_n=\{y_n\}\) if and only if (b2) holds for all such \(\mathcal {V}_n\subset \mathcal {N}_{y_n}\).
In Lemma 6.7, we give a consequence of the large deviation principle of \((\mu _n)_{n\in \mathbb {N}}\). In Theorems 6.9 and 6.10 we use this to formulate sufficient conditions for upper or lower large deviation bounds on sequences \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) with \(y_n \rightarrow y\) and sequences \((\eta _n(y,\cdot ))_{n\in \mathbb {N}}\).
We assumed \(\mathcal {X}\) to be normal in this section. For Lemma 6.7 this assumption can be dropped.
6.6
For all neighbourhoods V of y one has by the large deviation principle
In particular, there exists an \(N\in \mathbb {N}\) such that \(\mu _n (\mathcal {X}\times V) >0\) for all \(n\ge N\). Therefore \(\mu _n (G \times \mathcal {Y} \mid \mathcal {X}\times V)\) is well defined for large n.
Lemma 6.7

(a)
For open \(G\subset \mathcal {X}\)
$$\begin{aligned} \liminf _{V\in \mathcal {N}_y} \liminf _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n(G \times \mathcal {Y} \mid \mathcal {X}\times V) \ge - \inf I (G). \end{aligned}$$(6.16) 
(b)
For closed \(F\subset \mathcal {X}\)
$$\begin{aligned} \limsup _{V\in \mathcal {N}_y} \limsup _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n(F \times \mathcal {Y} \mid \mathcal {X}\times V) \le - \inf I(F). \end{aligned}$$(6.17)
Proof

(a)
Let \(\varepsilon >0\). By Lemma 5.3, there exists a \(V_0\in \mathcal {N}_y\) such that for all \(V\in \mathcal {N}_y\) with \(V\subset V_0\)
$$\begin{aligned}&\inf J(\mathcal {X}\times \{y\}) \ge \inf J(\mathcal {X}\times {\overline{V}}) \ge \inf J(\mathcal {X}\times {\overline{V}}_0) \ge \inf J(\mathcal {X}\times \{y\}) - \varepsilon . \end{aligned}$$(6.18)Let \(V\in \mathcal {N}_y\) be such that \(V \subset V_0\). As \(\limsup _{n\rightarrow \infty }\tfrac{1}{n}\log \mu _n(\mathcal {X}\times V) > -\infty \) (see 6.6), we can “split the \(\liminf \) in two” and we get by the large deviation principle and by (6.18)
$$\begin{aligned}&\liminf _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n (G \times \mathcal {Y} \mid \mathcal {X}\times V) \nonumber \\&\qquad \qquad \ge \liminf _{n\rightarrow \infty }\tfrac{1}{n}\log \mu _n(G \times V) - \limsup _{n\rightarrow \infty }\tfrac{1}{n}\log \mu _n(\mathcal {X}\times V) \nonumber \\&\qquad \qquad \ge - \inf J(G \times \{y\} ) + \inf J(\mathcal {X}\times {\overline{V}}) \ge - \inf I(G) - \varepsilon . \end{aligned}$$(6.19) 
(b)
Let \(\alpha < \inf J(F \times \{y\})\). There exists a neighbourhood \(V_0\) of y such that for all neighbourhoods V of y with \(V\subset V_0\)
$$\begin{aligned}&\inf J(F \times \{y\}) \ge \inf J(F \times {\overline{V}}) \ge \inf J(F \times {\overline{V}}_0) \ge \alpha . \end{aligned}$$(6.20)Let \(V\in \mathcal {N}_y\) be such that \(y\in V \subset V_0\). Similarly as above, we get
$$\begin{aligned} \limsup _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n(F \times \mathcal {Y} \mid \mathcal {X}\times V)&\le - \alpha + \inf J(\mathcal {X}\times \{y\} ). \end{aligned}$$(6.21)\(\square \)
Theorem 6.8
I has compact sublevel sets.
Proof
\([I\le c] = \pi ([ J \le c +\inf J(\mathcal {X}\times \{y\})])\). \(\square \)
Theorem 6.9
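We have
$$\begin{aligned} \hbox {(A5)} \Longrightarrow \hbox {(A4)} \iff \hbox {(A3)} \Longrightarrow \hbox {(A2)} \Longrightarrow \hbox {(A1)} \end{aligned}$$
and, if \(\mathcal {Y}\) is first countable, then
$$\begin{aligned} \hbox {(A1)} \Longrightarrow \hbox {(A2)}, \end{aligned}$$
where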

(A1) For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \in {{\mathrm{supp}}}(\mu _n\circ \pi ^{-1})\) and \(y_n \rightarrow y\), the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation lower bound with rate function I.

(A2) For all \(U\in \mathcal {G}\)
$$\begin{aligned} \sup _{V_0\in \mathcal {N}_y} \liminf _{n\rightarrow \infty }\inf _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \mu _n({\overline{U}} \times \mathcal {Y} \mid \mathcal {X}\times V) \ge - \inf I(U). \end{aligned}$$(6.22) 
(A3) For all \(U\in \mathcal {G}\)
$$\begin{aligned}&\sup _{V_0\in \mathcal {N}_y} \liminf _{n\rightarrow \infty }\inf _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \mu _n({\overline{U}} \times \mathcal {Y} \mid \mathcal {X}\times V) \nonumber \\&\quad \ge \liminf _{V\in \mathcal {N}_y} \liminf _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n( U \times \mathcal {Y} \mid \mathcal {X}\times V). \end{aligned}$$(6.23) 
(A4) For all \(U\in \mathcal {G}\) we have \(\forall Z_0\in \mathcal {N}_y\; \forall \varepsilon >0\; \exists V_0\in \mathcal {N}_y\; \exists Z\in \mathcal {N}_y, Z\subset Z_0\; \forall M \;\exists m \ge M\; \exists N\; \forall n\ge N\; \forall V\in \mathcal {H}, V \subset V_0, V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset \):
$$\begin{aligned} \tfrac{1}{n}\log \mu _n (\overline{U} \times \mathcal {Y} \mid \mathcal {X}\times V) \ge \tfrac{1}{m}\log \mu _m (U \times \mathcal {Y} \mid \mathcal {X}\times Z) - \varepsilon . \end{aligned}$$(6.24) 
(A5) For all \(U\in \mathcal {G}\) we have \(\forall \varepsilon >0\; \forall V_0 \in \mathcal {N}_y\; \exists N\in \mathbb {N}\; \forall n \ge N\; \forall V\in \mathcal {H}, V\subset V_0, V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset \):
$$\begin{aligned} \mu _n( {{\overline{U}}} \times \mathcal {Y} \mid \mathcal {X}\times V) \ge e^{-n\varepsilon } \mu _n( U \times \mathcal {Y} \mid \mathcal {X}\times V_0). \end{aligned}$$(6.25)
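Moreover,
$$\begin{aligned} \hbox {(B5)} \Longrightarrow \hbox {(B4)} \iff \hbox {(B3)} \Longrightarrow \hbox {(B2)} \Longrightarrow \hbox {(B1)} \end{aligned}$$
and, if \(\mathcal {Y}\) is first countable, then
$$\begin{aligned} \hbox {(B1)} \Longrightarrow \hbox {(B2)}, \end{aligned}$$
where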

(B1) For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \in {{\mathrm{supp}}}(\mu _n\circ \pi ^{-1})\) and \(y_n \rightarrow y\) the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation upper bound with rate function I.

(B2) For all \(U_1,\dots ,U_k\in \mathcal {G}\) one has for \(W= \mathcal {X}\setminus (U_1 \cup \cdots \cup U_k)\)
$$\begin{aligned}&\inf _{V_0\in \mathcal {N}_y} \limsup _{n\rightarrow \infty }\sup _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \mu _n(W^\circ \times \mathcal {Y} \mid \mathcal {X}\times V) \le - \inf I(W). \end{aligned}$$(6.26) 
(B3) For all \(U_1,\dots , U_k\in \mathcal {G}\) with \(W= \mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\)
$$\begin{aligned}&\inf _{V_0\in \mathcal {N}_y} \limsup _{n\rightarrow \infty }\sup _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \mu _n(W^\circ \times \mathcal {Y} \mid \mathcal {X}\times V) \nonumber \\&\quad \le \limsup _{V\in \mathcal {N}_y} \limsup _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n(W \times \mathcal {Y} \mid \mathcal {X}\times V). \end{aligned}$$(6.27) 
(B4) For all \(U_1,\dots , U_k\in \mathcal {G}\) with \(W= \mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\) we have \(\forall Z_0\in \mathcal {N}_y\; \forall \varepsilon >0\; \exists V_0\in \mathcal {N}_y\; \exists Z\in \mathcal {N}_y, Z\subset Z_0\; \forall M \;\exists m \ge M\; \exists N\; \forall n\ge N\; \forall V\in \mathcal {H}, V \subset V_0, V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset \):
$$\begin{aligned} \tfrac{1}{n}\log \mu _n (W^\circ \times \mathcal {Y} \mid \mathcal {X}\times V) \le \tfrac{1}{m}\log \mu _m (W \times \mathcal {Y} \mid \mathcal {X}\times Z) + \varepsilon . \end{aligned}$$(6.28) 
(B5) For all \(U_1,\dots , U_k\in \mathcal {G}\) with \(W= \mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\) we have \(\forall \varepsilon >0\; \forall V_0 \in \mathcal {N}_y\; \exists N\in \mathbb {N}\; \forall n \ge N\; \forall V\in \mathcal {H}, V\subset V_0, V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset \):
$$\begin{aligned} \mu _n(W^\circ \times \mathcal {Y} \mid \mathcal {X}\times V) \le e^{n\varepsilon } \mu _n( W \times \mathcal {Y} \mid \mathcal {X}\times V_0). \end{aligned}$$(6.29)
Proof
The proofs of (B5) \(\Longrightarrow \) (B4) \(\iff \) (B3) \(\Longrightarrow \) (B2) \(\Longrightarrow \) (B1) and of (B1) \(\Longrightarrow \) (B2) are similar to the proofs of the following implications.
(A4) \(\iff \) (A3) follows by definition of \(\sup \), \(\inf \), \(\limsup \) and \(\liminf \).
(A5) \(\Longrightarrow \) (A3) Let \(U \in \mathcal {G}\). Assuming (A5) we obtain \(\forall \varepsilon >0\; \forall V_0 \in \mathcal {N}_y\, \exists N\in \mathbb {N}\; \forall n\ge N\) and one has \(\mu _n(\mathcal {X}\times V_0)>0\) and
So \(\forall \varepsilon >0\; \forall V_0 \in \mathcal {N}_y\)
(A3) \(\Longrightarrow \) (A2) Follows by Lemma 6.7.
(A2) \(\Longrightarrow \) (A1). Suppose that (A2) holds. Let \(U\in \mathcal {G}\) with \(\inf J(U \times \{y\})<\infty \) and let \(\varepsilon >0\). Let \(V_0 \in \mathcal {N}_y\) and \(N\in \mathbb {N}\) be such that \(\tfrac{1}{n}\log \mu _n( {\overline{U}} \times \mathcal {Y} \mid \mathcal {X}\times V) \ge - \inf I(U) - \varepsilon \) for all \(n\ge N\) and all \(V\in \mathcal {H}\) with \(V\subset V_0\) and \(V\cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset \). Let \((y_n)_{n\in \mathbb {N}}\) be such that \(y_n \in {{\mathrm{supp}}}(\mu _n\circ \pi ^{-1})\) and \(y_n \rightarrow y\). Let \(N_0\ge N\) be such that \(y_n \in V_0\) for all \(n\ge N_0\). Then for all \(n\ge N_0\) and \(V \in \mathcal {N}_{y_n}\cap \mathcal {H}\) with \(V\subset V_0\) we have \(\tfrac{1}{n}\log \mu _n({\overline{U}} \times \mathcal {Y} \mid \mathcal {X}\times V) \ge - \inf I(U) - \varepsilon \). This implies (a2) of Theorem 6.3 (with \(\mathcal {V}_n=\mathcal {N}_{y_n}\cap \mathcal {H}\)).
(A1) \(\Longrightarrow \) (A2) (assuming \(\mathcal {Y}\) is first countable). Suppose that (A2) does not hold. Let \((V_m)_{m\in \mathbb {N}}\) be a decreasing sequence in \(\mathcal {H}\) with \(\bigcap _{m\in \mathbb {N}} V_m = \{y\}\). Then there exists a \(U\in \mathcal {G}\) with \(\inf J(U \times \{y\})<\infty \) and an \(\alpha >\inf I(U)\) such that for all \(M\in \mathbb {N}\) and \(N\in \mathbb {N}\) there exist an \(n\ge N\) and a \(V\in \mathcal {H}\) with \(V\subset V_M\) and \(V \cap {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\ne \emptyset \) such that
Let \(\beta <\alpha \) be such that \(\beta > \inf I(U)\). By Lemma 4.11 we have
For each \(m\in \mathbb {N}\) there exist an \(n_m\) and a \(y_{n_m}\in V_m \cap {{\mathrm{supp}}}(\mu _{n_m} \circ \pi ^{1})\) such that
We may choose \(n_1<n_2<n_3<\cdots \). With \(y_k = y\) for \(k\notin \{n_m: m\in \mathbb {N}\}\) we have \(y_n \rightarrow y\) and
Therefore (a1) of Theorem 6.3 does not hold, which implies that (A1) does not hold. \(\square \)
We can also use Lemma 6.7 and Theorem 6.3 (see also 6.4) to obtain sufficient conditions for the lower or upper large deviation bounds for \((\eta _n(y,\cdot ))_{n\in \mathbb {N}}\).
Theorem 6.10
Let \(\mathcal {V}\subset \mathcal {N}_y\) be such that \(\bigcap \mathcal {V}= \{y\}\).

(a)
Suppose that for all \(U\in \mathcal {G}\) with \(\inf J(U\times \{y\})<\infty \)
$$\begin{aligned}&\liminf _{n\rightarrow \infty }\limsup _{V\in \mathcal {V}} \tfrac{1}{n}\log \mu _n({\overline{U}} \times \mathcal {Y} \mid \mathcal {X}\times V) \nonumber \\&\quad \ge \liminf _{V\in \mathcal {N}_y} \liminf _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n( U \times \mathcal {Y} \mid \mathcal {X}\times V). \end{aligned}$$(6.36)
Then \((\eta _n(y,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation lower bound with rate function I.

(b)
Suppose that for all \(U_1,\dots , U_k\in \mathcal {G}\) with \(W= \mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\)
$$\begin{aligned}&\limsup _{n\rightarrow \infty }\liminf _{V\in \mathcal {V}} \tfrac{1}{n}\log \mu _n(W^\circ \times \mathcal {Y} \mid \mathcal {X}\times V) \nonumber \\&\quad \le \limsup _{V\in \mathcal {N}_y} \limsup _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \mu _n(\mathcal {X}\times V)>0 } } \tfrac{1}{n}\log \mu _n(W \times \mathcal {Y} \mid \mathcal {X}\times V). \end{aligned}$$(6.37)
Then \((\eta _n(y,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation upper bound with rate function I.
6.11
(6.36) and (6.37) hold for example when \(\forall \varepsilon >0 \;\forall V_0 \in \mathcal {V}\,\exists N\in \mathbb {N}\; \forall n\ge N \;\forall V\in \mathcal {V}, V\subset V_0:\)
respectively.
6.12
Theorem 1.2 is a consequence of Theorems 4.10, 6.8 and 6.9 with \(\mathcal {G}= \{ B(x,r): x\in \mathcal {X}, r>0\}\) and \(\mathcal {H}= \{B(y,\delta ): y\in \mathcal {Y}, \delta >0\}\).
Large Deviations for Regular Conditional Probabilities
In this section \(\mathcal {X}\) and \(\mathcal {Y}\) are topological spaces, \((\nu _n)_{n\in \mathbb {N}}\) is a sequence of probability measures on \(\mathcal {B}(\mathcal {X})\) that satisfies the large deviation principle with rate function \(K: \mathcal {X}\rightarrow [0,\infty ]\), and \(\tau : \mathcal {X}\rightarrow \mathcal {Y}\) is continuous. For more assumptions, see 7.2.
We derive statements analogous to those of Sect. 6, but for regular conditional kernels instead of product regular conditional kernels (7.3 and Theorem 7.5). First we show that, with \(\mu _n\) the probability measure on the product space corresponding to \(\nu _n\) as in Theorem 3.6, the sequence \((\mu _n)_{n\in \mathbb {N}}\) satisfies the large deviation principle with a rate function described in terms of K (Theorem 7.1).
If \((\eta _n)_{n\in \mathbb {N}}\) are regular conditional probabilities under \((\nu _n)_{n\in \mathbb {N}}\) given \(\tau \), then one could also follow the proofs in Sect. 6 for the product regular conditional probabilities to obtain similar results for large deviations for sequences of the form \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\). Instead, we take the approach via Theorem 3.6 and translate the results of Sect. 6 to the setting of regular conditional probabilities.
Theorem 7.1
For all \(n\in \mathbb {N}\) let \(\mu _n\) be the probability measure on \( \mathcal {B}(\mathcal {X}) \otimes \mathcal {B}(\mathcal {Y})\) for which \(\mu _n(A\times B) = \nu _n (A \cap \tau ^{-1}(B))\) for \(A\in \mathcal {B}(\mathcal {X}), B\in \mathcal {B}(\mathcal {Y})\) (as in Theorem 3.6). Then \((\mu _n)_{n\in \mathbb {N}}\) satisfies the large deviation principle on \(\{A\times B: A\in \mathcal {B}(\mathcal {X}), B\in \mathcal {B}(\mathcal {Y})\}\) with rate function \(J: \mathcal {X}\times \mathcal {Y}\rightarrow [0,\infty ]\) given by
If \(K\) has compact sublevel sets, then so does \(J\).
Proof
By definition of \(J\) we have
Let \(A\in \mathcal {B}(\mathcal {X})\) and \(B\in \mathcal {B}(\mathcal {Y})\). Then
We have \((A \cap \tau ^{-1}(B))^\circ = A^\circ \cap \tau ^{-1}(B)^\circ \) and \( \tau ^{-1}(B)^\circ \supset \tau ^{-1}(B^\circ )\), whence
Similarly
We have \(\overline{A \cap \tau ^{-1}(B)} \subset \overline{A} \cap \overline{\tau ^{-1}(B)}\) and \(\overline{\tau ^{-1}(B)} \subset \tau ^{-1}({\overline{B}})\), whence
Suppose that \(K\) has compact sublevel sets. Let \(c\ge 0\). Then \([J\le c]\) is contained in the compact set \([K\le c] \times \tau ([K\le c])\). By Theorem 6.8 I has compact sublevel sets. \(\square \)
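The construction \(\mu _n(A\times B) = \nu _n (A \cap \tau ^{-1}(B))\) can be made concrete on a finite toy model. The following Python sketch is an illustration only: the sets `X`, `Y`, the map `tau` and the weights `nu` are hypothetical choices, not taken from the paper. It checks that \(\mu \) puts no mass off the graph of \(\tau \), that its \(\mathcal {Y}\)-marginal is \(\nu \circ \tau ^{-1}\), and that conditioning \(\nu \) on \(\tau ^{-1}(V)\) agrees with conditioning \(\mu \) on \(\mathcal {X}\times V\), which is the kind of identity one expects to use when translating the results of Sect. 6.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy data: X = {0,1,2,3}, Y = {0,1}, tau(x) = x mod 2.
X = [0, 1, 2, 3]
Y = [0, 1]
tau = lambda x: x % 2
nu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}

def mu(A, B):
    """mu(A x B) = nu(A intersected with tau^{-1}(B)) on measurable rectangles."""
    return sum(nu[x] for x in A if tau(x) in B)

def pushforward(B):
    """(nu o tau^{-1})(B)."""
    return sum(nu[x] for x in X if tau(x) in B)

def cond_nu(A, V):
    """nu(A | tau^{-1}(V)); only defined when nu(tau^{-1}(V)) > 0."""
    return sum(nu[x] for x in A if tau(x) in V) / pushforward(V)

def cond_mu(A, V):
    """mu(A x Y | X x V); only defined when mu(X x V) > 0."""
    return mu(A, V) / mu(X, V)

# Mass that mu assigns to each point (x, y) of the product space.
point_mass = {(x, y): mu([x], [y]) for x, y in product(X, Y)}
# Total mass off the graph {(x, tau(x))}; it vanishes by construction.
off_graph = sum(m for (x, y), m in point_mass.items() if tau(x) != y)
```

In this toy model `cond_nu(A, V)` and `cond_mu(A, V)` coincide for every `A` and every `V` of positive mass, because both reduce to the same ratio of sums of `nu`.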
7.2
In the rest of this section \(\mathcal {X}\) is normal, and \(\mathcal {G}\), \(\mathcal {H}\), \(\pi \) are as in (ii) and (iii) of Sect. 6. Furthermore similarly to (v) and (vi) of Sect. 6, we assume the following.
 (v)*:

For each \(n\in \mathbb {N}\) we assume the following: \({{\mathrm{supp}}}(\nu _n \circ \tau ^{-1}) \ne \emptyset \), and there exists a regular conditional probability \(\eta _n : \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow [0,1]\) under \(\nu _n\) with respect to \(\tau \), satisfying the continuity condition (6.1).
 (vi)*:

Let \(y \in \mathcal {Y}\). We assume that \(\inf K(\tau ^{-1}(\{y\}))<\infty \) and that there exist \(y_n \in {{\mathrm{supp}}}(\nu _n \circ \tau ^{-1})\) with \(y_n \rightarrow y\). Let \(I: \mathcal {X}\rightarrow [0,\infty ]\) be given by
$$\begin{aligned} I(x)&= J(x,y) - \inf J(\mathcal {X}\times \{y\}) \nonumber \\&= {\left\{ \begin{array}{ll} K(x) - \inf K(\tau ^{-1}(\{y\})) &{} \tau (x)=y, \\ \infty &{} \tau (x) \ne y. \end{array}\right. } \end{aligned}$$(7.7)
7.3
By Theorem 3.6, \(\eta _n\) is the product regular conditional kernel under \(\mu _n\) with respect to \(\pi \); by Theorem 7.1, \((\mu _n)_{n\in \mathbb {N}}\) satisfies the large deviation principle on \(\{A\times B: A\in \mathcal {B}(\mathcal {X}), B\in \mathcal {B}(\mathcal {Y})\}\) with rate function J; and \(\inf J(\mathcal {X}\times \{y\}) = \inf K(\tau ^{-1}(\{y\}))<\infty \). Hence \(\mu _n\) and \(\eta _n\) are as in Sect. 6 (in the sense that (iv), (v), (vi) hold). Therefore we can translate the results of Sect. 6, and also those of Sects. 4 and 5, using, for example, (7.2), \( \nu _n \circ \tau ^{-1} = \mu _n \circ \pi ^{-1}\), and the fact that for \(V\in \mathcal {B}(\mathcal {Y})\) with \(\nu _n(\tau ^{-1}(V))>0\) and for \(A\in \mathcal {B}(\mathcal {X})\)
In this sense also Theorem 1.3 follows from Theorem 1.2. We present some of the equivalent statements of Theorem 6.9 in Theorem 7.5.
Remark 7.4
Because of the relation between \(\mu _n\) and \(\nu _n\) and between K and J, in Theorem 7.1 we were able to prove the large deviation principle on \(\{A\times B: A\in \mathcal {B}(\mathcal {X}), B\in \mathcal {B}(\mathcal {Y})\}\). Whether it can be extended to the large deviation principle on \(\mathcal {B}(\mathcal {X})\otimes \mathcal {B}(\mathcal {Y})\) is a priori not clear. However, for the purpose of using the results of Sect. 6, this is not required (as only (iv) of Sect. 6 is required). This is the main reason to define the large deviation bounds as in Definition 1.1.
Theorem 7.5
(A3) \(\Longrightarrow \) (A2) \(\Longrightarrow \) (A1). If \(\mathcal {Y}\) is first countable, then (A1) \(\iff \) (A2).

(A1)
For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \in {{\mathrm{supp}}}(\nu _n\circ \tau ^{-1})\) and \(y_n \rightarrow y\) the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation lower bound with rate function I.

(A2)
For all \(U\in \mathcal {G}\)
$$\begin{aligned} \sup _{V_0\in \mathcal {N}_y} \liminf _{n\rightarrow \infty }\inf _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\nu _n \circ \tau ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \nu _n({\overline{U}} \mid \tau ^{-1}(V) ) \ge -\inf I(U). \end{aligned}$$(7.9)
(A3)
For all \(U\in \mathcal {G}\)
$$\begin{aligned}&\sup _{V_0\in \mathcal {N}_y} \liminf _{n\rightarrow \infty }\inf _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\nu _n \circ \tau ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \nu _n({\overline{U}} \mid \tau ^{-1}(V) )\nonumber \\&\quad \ge \liminf _{V\in \mathcal {N}_y} \liminf _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \nu _n(\tau ^{-1}(V))>0 } } \tfrac{1}{n}\log \nu _n( U \mid \tau ^{-1}(V)). \end{aligned}$$(7.10)
(B3) \(\Longrightarrow \) (B2) \(\Longrightarrow \) (B1). If \(\mathcal {Y}\) is first countable, then (B1) \(\iff \) (B2).
 (B1):

For all \((y_n)_{n\in \mathbb {N}}\) with \(y_n \in {{\mathrm{supp}}}(\nu _n\circ \tau ^{-1})\) and \(y_n \rightarrow y\) the sequence \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation upper bound with rate function I.
 (B2):

For all \(U_1,\dots ,U_k\in \mathcal {G}\) one has for \(W= \mathcal {X}\setminus (U_1 \cup \cdots \cup U_k)\)
$$\begin{aligned}&\inf _{V_0\in \mathcal {N}_y} \limsup _{n\rightarrow \infty }\sup _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\nu _n \circ \tau ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \nu _n(W^\circ \mid \tau ^{-1}(V ) ) \le -\inf I(W). \end{aligned}$$(7.11)
 (B3):

For all \(U_1,\dots , U_k\in \mathcal {G}\) with \(W= \mathcal {X}\setminus (U_1\cup \cdots \cup U_k)\)
$$\begin{aligned}&\inf _{V_0\in \mathcal {N}_y} \limsup _{n\rightarrow \infty }\sup _{ \genfrac{}{}{0.0pt}{}{V\in \mathcal {H}, V \subset V_0}{V\cap {{\mathrm{supp}}}(\nu _n \circ \tau ^{-1})\ne \emptyset } } \tfrac{1}{n}\log \nu _n(W^\circ \mid \tau ^{-1}(V ) )\nonumber \\&\quad \le \limsup _{V\in \mathcal {N}_y} \limsup _{ \genfrac{}{}{0.0pt}{}{ n\rightarrow \infty }{n\in \mathbb {N}: \nu _n(\tau ^{-1}(V))>0 } } \tfrac{1}{n}\log \nu _n(W \mid \tau ^{-1}(V ) ). \end{aligned}$$(7.12)
An Application to Conditional Probabilities of Empirical Distributions on Finite Sets
In terms of random variables, Sanov’s theorem gives us the large deviation principle of empirical densities \(\frac{1}{n}\sum _{i=1}^n \delta _{X_i}\), where \(X_1,X_2,\dots \) are independent and identically distributed random variables. We consider large deviations of \(\frac{1}{n}\sum _{i=1}^n \delta _{X_i}\) conditioning on \(\frac{1}{n}\sum _{i=1}^n \delta _{Y_i} = \psi _n\), where \((X_1,Y_1), (X_2,Y_2), \dots \) are independent and identically distributed couples of random variables, both random variables attaining their values in a finite set. This large deviation principle is formalised in Theorem 8.2.
In this section we consider the following.

Let \(\mathcal {R}\) and \(\mathcal {S}\) be finite sets equipped with the discrete topology (discrete metric). Let \(\mathcal {P}(\mathcal {R}),\mathcal {P}(\mathcal {S})\) and \(\mathcal {P}(\mathcal {R}\times \mathcal {S})\) be equipped with the weak topology, and let \(\mathfrak {d}\) denote the Prohorov metric (see Billingsley [3, Appendix III]) on each of the spaces.

Let \(\lambda \in \mathcal {P}(\mathcal {R}\times \mathcal {S})\). We assume \(\lambda (\mathcal {R}\times \{s\}) >0\) for all \(s\in \mathcal {S}\).

For \(n\in \mathbb {N}\) let \(L_n : \mathcal {R}^n \rightarrow \mathcal {P}(\mathcal {R})\) be given by \(L_n(r) = \frac{1}{n} \sum _{i=1}^n \delta _{r_i}\) for \(r=(r_1,\dots ,r_n)\in \mathcal {R}^n\).

Write \(\mathcal {P}_{emp}^n(\mathcal {R}) = L_n(\mathcal {R}^n) = \{ \frac{1}{n} \sum _{i=1}^n \delta _{r_i} : r_1,\dots ,r_n\in \mathcal {R}\}\), similarly \(\mathcal {P}_{emp}^n(\mathcal {S})= L_n(\mathcal {S}^n)\) and \(\mathcal {P}_{emp}^n(\mathcal {R}\times \mathcal {S})= L_n((\mathcal {R}\times \mathcal {S})^n)\).

Let \(\mathfrak {m}: \mathcal {P}(\mathcal {R}\times \mathcal {S}) \rightarrow \mathcal {P}(\mathcal {R}) \times \mathcal {P}(\mathcal {S})\) be the map that maps a measure in \(\mathcal {P}(\mathcal {R}\times \mathcal {S})\) onto the pair of its marginals, i.e. \(\mathfrak {m}\) is given by
$$\begin{aligned} \mathfrak {m}(\xi ) = \big ( \xi (\cdot \times \mathcal {S}), \xi (\mathcal {R}\times \cdot ) \big ). \end{aligned}$$(8.1) 
Let \(\pi : \mathcal {P}(\mathcal {R}) \times \mathcal {P}(\mathcal {S}) \rightarrow \mathcal {P}(\mathcal {S})\) be the map given by \(\pi (\xi ,\zeta ) = \zeta \).

Let \(\mu _n\) be the probability measure on \(\mathcal {B}(\mathcal {P}(\mathcal {R})) \otimes \mathcal {B}(\mathcal {P}(\mathcal {S}))\) defined by \(\mu _n = \left( \bigotimes _{i=1}^n \lambda \right) \circ L_n^{-1} \circ \mathfrak {m}^{-1}\), so that for \(A\in \mathcal {B}(\mathcal {P}(\mathcal {R}))\) and \(B\in \mathcal {B}(\mathcal {P}(\mathcal {S}))\)
$$\begin{aligned} \mu _n(A \times B) = \left( \bigotimes _{i=1}^n \lambda \right) ( L_n^{-1}(A) \times L_n^{-1}(B)). \end{aligned}$$(8.2) 
Define \(\theta : \mathcal {S}\times \mathcal {B}(\mathcal {R}) \rightarrow [0,1]\) by \(\theta (s,A) = \lambda (A\times \mathcal {S} \mid \mathcal {R}\times \{s\})\).

Define \(\eta _n : \mathcal {P}(\mathcal {S}) \times \mathcal {B}(\mathcal {P}(\mathcal {R})) \rightarrow [0,1]\) by
$$\begin{aligned} \eta _n(\xi ,A) = {\left\{ \begin{array}{ll} \left[ \bigotimes _{i=1}^n \theta (s_i,\cdot ) \right] \circ L_n^{-1}(A) &{} \xi \in \mathcal {P}_{emp}^n(\mathcal {S}), \xi = L_n(s_1,\dots ,s_n) \\ &{} \text{ for } s_1,\dots ,s_n \in \mathcal {S}, \\ 0 &{} \xi \notin \mathcal {P}_{emp}^n(\mathcal {S}). \end{array}\right. } \end{aligned}$$(8.3) 
Let \(J: \mathcal {P}(\mathcal {R}) \times \mathcal {P}(\mathcal {S}) \rightarrow [0,\infty ]\) be given by
$$\begin{aligned} J(\rho , \sigma )&= \inf _{\xi \in \mathfrak {m}^{-1} (\{ (\rho ,\sigma ) \}) } H(\xi \mid \lambda ), \end{aligned}$$(8.4)
where \(H(\xi \mid \lambda )\) is the relative entropy of \(\xi \) with respect to \(\lambda \) ([7, Definition 2.1.5]).

Let \(\psi \in \mathcal {P}(\mathcal {S})\) be such that
$$\begin{aligned} \inf _{ \xi \in \mathfrak {m}^{-1} (\mathcal {P}(\mathcal {R}) \times \{\psi \}) } H(\xi \mid \lambda )<\infty . \end{aligned}$$(8.5)
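The objects introduced above are elementary to compute on small finite sets. The following Python sketch is an illustration only; the function names, the dictionary representation of measures and the numerical values of `lam` are our own hypothetical choices. It implements the empirical measure \(L_n\), the marginal map \(\mathfrak {m}\), the kernel \(\theta \) and the relative entropy \(H(\xi \mid \lambda )\).

```python
import math
from collections import Counter

def empirical_measure(sample):
    """L_n: map a sequence (r_1, ..., r_n) to (1/n) * (sum of point masses)."""
    n = len(sample)
    return {point: count / n for point, count in Counter(sample).items()}

def marginals(xi):
    """m(xi): the pair of marginals of a measure xi on R x S (dict keyed by (r, s))."""
    rho, sigma = {}, {}
    for (r, s), mass in xi.items():
        rho[r] = rho.get(r, 0.0) + mass
        sigma[s] = sigma.get(s, 0.0) + mass
    return rho, sigma

def relative_entropy(xi, lam):
    """H(xi | lam) = sum of xi * log(xi / lam); +infinity unless xi << lam."""
    total = 0.0
    for point, mass in xi.items():
        if mass == 0.0:
            continue
        if lam.get(point, 0.0) == 0.0:
            return math.inf
        total += mass * math.log(mass / lam[point])
    return total

def theta(lam, s):
    """theta(s, .) = lam(. x S | R x {s}); needs lam(R x {s}) > 0."""
    column = {r: mass for (r, t), mass in lam.items() if t == s}
    norm = sum(column.values())
    return {r: mass / norm for r, mass in column.items()}

# Hypothetical example with R = {"a", "b"} and S = {0, 1}.
lam = {("a", 0): 0.4, ("b", 0): 0.1, ("a", 1): 0.2, ("b", 1): 0.3}
pairs = [("a", 0), ("a", 0), ("b", 1), ("a", 1)]
xi = empirical_measure(pairs)      # an element of P_emp^4(R x S)
rho, sigma = marginals(xi)         # m(xi)
```

Here `sigma` plays the role of an empirical distribution \(\psi _n\) of the second coordinate, and `rho` that of the empirical distribution of the first coordinate whose conditional large deviations are studied in this section.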
8.1
We present some facts which follow from the assumptions with little effort; for some of them we give a brief explanation or a reference.

(a)
\(\mathcal {P}_{emp}^n(\mathcal {S})\) is closed in \(\mathcal {P}(\mathcal {S})\). Moreover, if \(\xi _k\) and \(\xi \) in \(\mathcal {P}_{emp}^n(\mathcal {S})\) are such that \(\xi _k \rightarrow \xi \), then there exist \(s_{ki}\) and \(q_i\) in \(\mathcal {S}\) for \(i\in \{1,\dots ,n\}\) such that \(\xi _k = L_n((s_{k1},\dots ,s_{kn}))\), \(\xi = L_n((q_1,\dots ,q_n))\) and \(s_{ki} \rightarrow q_i\) for all \(i\in \{1,\dots ,n\}\).

(b)
\({{\mathrm{supp}}}(\mu _n \circ \pi ^{-1}) = \mathcal {P}_{emp}^n(\mathcal {S})\).

(c)
\(\eta _n\) is a product regular conditional kernel under \(\mu _n\) with respect to \(\pi \) that is weakly continuous on \(\mathcal {P}_{emp}^n(\mathcal {S})\).

(d)
\(\big ( ( \bigotimes _{i=1}^n \lambda ) \circ L_n^{-1}\big )_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function \( H(\cdot \mid \lambda )\).

(e)
\(\mathfrak {m}\) is continuous.

(f)
\((\mu _n)_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function J.
(a) follows from the fact that \(\mathcal {S}\) is a finite space. (b) follows from (a), from the fact that the complement of \(\mathcal {P}_{emp}^n(\mathcal {S})\) has \(\mu _n \circ \pi ^{-1}\)-measure zero, and because \(\mu _n \circ \pi ^{-1}(\{L_n(s)\}) >0 \) for all \(s\in \mathcal {S}^n\), which is due to the assumptions on \(\lambda \). (c) follows by a straightforward calculation, and the continuity follows from (a). For (d) see Sanov’s theorem (Dembo and Zeitouni [7, Theorem 6.2.10]). (e) follows from the fact that if \(\xi _n \rightarrow \xi \) in \(\mathcal {P}(\mathcal {R}\times \mathcal {S})\), then the \(\mathcal {R}\)- and \(\mathcal {S}\)-marginals of \(\xi _n\) converge to the \(\mathcal {R}\)- and \(\mathcal {S}\)-marginals of \(\xi \), respectively. Then (f) follows from (d) and (e) by the contraction principle [7, Theorem 4.2.1].
In the rest of this section, we prove the following theorem.
Theorem 8.2
For all \((\psi _n)_{n\in \mathbb {N}}\) with \(\psi _n \in \mathcal {P}_{emp}^n(\mathcal {S})\) and \(\psi _n \rightarrow \psi \), the sequence \((\eta _n(\psi _n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function \(I: \mathcal {P}(\mathcal {R}) \rightarrow [0,\infty ]\), given by
I is continuous on \([I<\infty ]\).
As \(\mathcal {P}(\mathcal {S})\) is first countable, it is sufficient to show that (A2) and (B2) of Theorem 6.9 hold. In 8.4 we use the bounds of Lemma 8.3 to derive other bounds which imply (A2) and (B2). The continuity of I follows by continuity of the map \(\nu \mapsto H(\nu \mid \lambda )\) (Lemma 8.5).
Lemma 8.3
[7, Lemma 2.1.9] For \(\nu \in \mathcal {P}_{emp}^n(\mathcal {R}\times \mathcal {S})\) one has, with \(M=(\#\mathcal {R})(\#\mathcal {S})\),
8.4
From Lemma 8.3 we obtain the following bounds for \(A\in \mathcal {B}(\mathcal {P}(\mathcal {R}))\) and \(B\in \mathcal {B}(\mathcal {P}(\mathcal {S}))\).
Whence
In order to derive (A2) and (B2) of Theorem 6.9, we make the following observation. By (8.10) we have for an open U and a closed W that if for both \(A=U\) and \(C= \mathcal {R}\) as well as \(A=\mathcal {R}\) and \(C=W\) we have
then
As
(8.11) holds (for both \(A=U\) and \(C= \mathcal {R}\) as well as for \(A=\mathcal {R}\) and \(C=W\), where U is open and W is closed) if for all open U and all closed W
(8.16) is a consequence of Lemma 5.3, as \(\mathfrak {m}^{-1}(W\times V) = \mathfrak {m}^{-1}(W\times \mathcal {P}(\mathcal {S})) \cap \mathfrak {m}^{-1}(\mathcal {P}(\mathcal {R}) \times V)\), the set \(F= \mathfrak {m}^{-1}(W\times \mathcal {P}(\mathcal {S}))\) is closed for closed W, \(\mathfrak {m}^{-1}(\mathcal {P}(\mathcal {R}) \times V)= (\pi \circ \mathfrak {m})^{-1}(V)\), and \(\pi \circ \mathfrak {m}\) is continuous. The proof of inequality (8.15) requires a little more attention. First we present some facts which are used to prove this inequality in Lemma 8.8.
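The estimate cited as Lemma 8.3 can be checked numerically on small examples. The Python sketch below assumes the usual method-of-types formulation of [7, Lemma 2.1.9], namely \((n+1)^{-M} e^{-nH(\nu \mid \lambda )} \le (\bigotimes _{i=1}^n \lambda )(L_n^{-1}(\{\nu \})) \le e^{-nH(\nu \mid \lambda )}\) for every empirical measure \(\nu \) of \(n\) points on an alphabet of size \(M\); this formulation, the encoding of the alphabet \(\mathcal {R}\times \mathcal {S}\) as indices \(0,\dots ,M-1\), and all function names are assumptions made for illustration.

```python
import math

def compositions(n, m):
    """All tuples of m nonnegative integers summing to n (the types with denominator n)."""
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, m - 1):
            yield (first,) + rest

def type_probability(counts, lam):
    """Probability, under the n-fold product of lam, that L_n has the given counts."""
    n = sum(counts)
    log_p = math.lgamma(n + 1)  # log n!
    for c, p in zip(counts, lam):
        log_p += c * math.log(p) - math.lgamma(c + 1)
    return math.exp(log_p)

def relative_entropy(counts, lam):
    """H(nu | lam) for the type nu = counts / n, with lam strictly positive."""
    n = sum(counts)
    return sum((c / n) * math.log((c / n) / p) for c, p in zip(counts, lam) if c > 0)

def check_type_bounds(n, lam):
    """True if the assumed two-sided bound holds for every type of denominator n."""
    m = len(lam)
    slack = 1 + 1e-9  # tolerance for floating-point rounding
    for counts in compositions(n, m):
        p = type_probability(counts, lam)
        upper = math.exp(-n * relative_entropy(counts, lam))
        lower = (n + 1) ** (-m) * upper
        if not (lower <= p * slack and p <= upper * slack):
            return False
    return True
```

For instance, `check_type_bounds(12, [0.4, 0.1, 0.2, 0.3])` enumerates all 455 types of denominator 12 on a four-point alphabet. The polynomial prefactor \((n+1)^{-M}\) is what makes the bounds on sets of types match on the exponential scale.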
Lemma 8.5
[7, Remark below Definition 2.1.5] The map \(\nu \mapsto H(\nu \mid \lambda )\) is continuous on \([H(\cdot \mid \lambda ) <\infty ]\). In particular, for all \(\varepsilon >0\) and \(\xi \in \mathcal {P}(\mathcal {R}\times \mathcal {S})\) there exists a \(\Theta \in \mathcal {N}_\xi \) such that
Consequently, I as in (8.6) is continuous on \([I<\infty ]\).
Lemma 8.6

(a)
Let \(k,l\in \mathbb {N}\) and \(\zeta \in \mathcal {P}_{emp}^k(\mathcal {S})\). For all \(m\ge kl\) there exists a \(\nu \in \mathcal {P}_{emp}^m(\mathcal {S})\) such that \(\mathfrak {d}(\nu ,\zeta ) <\frac{1}{l}\).

(b)
For all open \(\Theta \subset \mathcal {P}(\mathcal {S})\) there exists an \(N\in \mathbb {N}\) such that \(\mathcal {P}_{emp}^n(\mathcal {S}) \cap \Theta \ne \emptyset \) for all \(n\ge N\).
Proof
(a) Let \(i \in \{1,\dots ,k\}\). Let \(\xi \in \mathcal {P}_{emp}^i(\mathcal {S})\). Then the measure \(\frac{lk}{lk+i}\zeta + \frac{i}{lk+i} \xi \) is an element of \(\mathcal {P}_{emp}^{lk+i}(\mathcal {S})\). For every \(A \subset \mathcal {S}\)
By definition of the Prohorov metric, this implies \(\mathfrak {d}( [\tfrac{lk}{lk+i}\zeta + \tfrac{i}{lk+i} \xi ], \zeta ) \le \frac{2}{l}\).
(b) Let \(\xi \in \mathcal {P}(\mathcal {S})\) and \(\delta >0\) be such that \(B(\xi ,\delta ) \subset \Theta \). For each \(\xi \in \mathcal {P}(\mathcal {S})\) there is a \(k\in \mathbb {N}\) and a \(\zeta \in \mathcal {P}_{emp}^k(\mathcal {S})\) such that \(\mathfrak {d}(\zeta ,\xi )<\frac{\delta }{2}\). Because of this, (b) follows from (a) by letting l be such that \(\frac{1}{l} < \frac{\delta }{2}\) and \(N= lk\). \(\square \)
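The mixing step in the proof of (a) can be verified with exact rational arithmetic. The following Python sketch is illustrative only: the helper names are hypothetical, and the total variation distance is used as a stand-in for the Prohorov metric (on a finite set with the discrete metric the two agree as long as the distance is below one).

```python
from fractions import Fraction

def mix(zeta, k, xi, i, l):
    """Return (l*k*zeta + i*xi) / (l*k + i) for zeta in P_emp^k(S) and xi in P_emp^i(S)."""
    m = l * k + i
    points = set(zeta) | set(xi)
    return {s: (l * k * zeta.get(s, Fraction(0)) + i * xi.get(s, Fraction(0))) / m
            for s in points}

def is_empirical(measure, m):
    """True if every mass is a multiple of 1/m and the masses sum to one."""
    return (sum(measure.values()) == 1
            and all((mass * m).denominator == 1 for mass in measure.values()))

def total_variation(a, b):
    """max over subsets A of |a(A) - b(A)|, i.e. half the l1 distance."""
    points = set(a) | set(b)
    return sum(abs(a.get(s, Fraction(0)) - b.get(s, Fraction(0))) for s in points) / 2

# Hypothetical example on S = {0, 1, 2} with k = 3, i = 2, l = 5.
zeta = {0: Fraction(1, 3), 1: Fraction(2, 3)}   # element of P_emp^3(S)
xi = {2: Fraction(1)}                           # element of P_emp^2(S)
nu = mix(zeta, 3, xi, 2, 5)                     # element of P_emp^17(S)
```

In this example the distance between `nu` and `zeta` equals \(\frac{i}{lk+i} = \frac{2}{17}\), which is the worst case because `xi` and `zeta` have disjoint supports.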
Lemma 8.7
Let \(\xi \in \mathcal {P}(\mathcal {R}\times \mathcal {S})\), \(\pi \circ \mathfrak {m}(\xi ) = \psi \) and \(\xi \ll \lambda \). For all \(\delta >0\) there exist a \(\kappa >0\) and an \(N\in \mathbb {N}\) such that for all \(n\ge N\) and all \(\zeta \in \mathcal {P}_{emp}^n(\mathcal {S})\) with \(\mathfrak {d}(\zeta ,\psi ) < \kappa \) there is a \(\nu \in \mathcal {P}_{emp}^n(\mathcal {R}\times \mathcal {S})\) with
Proof
In this proof, for a measure \(\xi \in \mathcal {P}(\mathcal {R}\times \mathcal {S})\), we write \(\xi _{rs} = \xi (\{(r,s)\})\), so that \(\xi = \sum _{rs} \xi _{rs} \delta _{(r,s)}\) where we use the shorthand notation “\(\sum _{rs}\)” instead of “\(\sum _{r\in \mathcal {R},s\in \mathcal {S}}\)”. Let \(M= \# \mathcal {R}\# \mathcal {S}\). Note that
Let \(\kappa >0\) and \(n\in \mathbb {N}\). We first derive an estimate that makes clear how \(\kappa \) and N should be chosen. By the assumptions on \(\lambda \), for every \(s\in \mathcal {S}\) there exists an \(r_s \in \mathcal {R}\) with \(\lambda _{r_s s}>0\).
First we show that there exists a \(\xi ^* \in \mathcal {P}_{emp}^n(\mathcal {R}\times \mathcal {S})\) with \(\xi ^* \ll \xi \) and \(|\xi _{rs}^* - \xi _{rs}| \le \frac{2}{n}\) for all \(r\in \mathcal {R}\) and \(s\in \mathcal {S}\). For each pair \((r,s) \in \mathcal {R}\times \mathcal {S}\) with \(\xi _{rs}>0\) we can choose a \(\xi _{rs}'\in \{0,\frac{1}{n},\frac{2}{n}, \dots , 1\}\) such that \(|\xi _{rs} - \xi _{rs}'|<\frac{1}{n}\). By letting \(\xi _{rs}^*=0\) when \(\xi _{rs}=0\) and adding \(\frac{1}{n}\) to, or subtracting \(\frac{1}{n}\) from, some of the \(\xi _{rs}'\), we obtain a collection of \(\xi _{rs}^*\in \{0,\frac{1}{n},\frac{2}{n},\dots ,1\}\) with \(\sum _{rs} \xi _{rs}^*=1\) and \(|\xi _{rs}^* - \xi _{rs}| \le \frac{2}{n}\) and \(\xi ^*_{rs}=0\) whenever \(\xi _{rs}=0\) for all \(r\in \mathcal {R}\) and \(s\in \mathcal {S}\).
Let \(\xi \in \mathcal {P}(\mathcal {R}\times \mathcal {S})\). Suppose that \(\zeta \in \mathcal {P}_{emp}^n(\mathcal {S})\) is such that \(|\zeta _s - \sum _r \xi _{rs}|<\kappa \). Then \(|\zeta _s - \sum _r \xi ^*_{rs}| < \kappa + \frac{2}{n} M\). We construct a \(\nu \in \mathcal {P}_{emp}^n(\mathcal {R}\times \mathcal {S})\) by defining the \(\nu _{rs}\) for each s separately. Let \(s\in \mathcal {S}\). If \(\zeta _s - \sum _r \xi ^*_{rs}<0\), then we choose \(\nu _{rs} \le \xi ^*_{rs}\) with \(\nu _{rs} \in \{0,\frac{1}{n},\dots ,1\}\) in such a way that \(\sum _r \nu _{rs} = \zeta _s\) (note that \(|\nu _{rs} - \xi ^*_{rs}| \le |\zeta _s - \sum _r \xi ^*_{rs}|\)). If instead \(\zeta _s - \sum _r \xi ^*_{rs}\ge 0\), then we let \(\nu _{rs} = \xi ^*_{rs}\) for all \(r\ne r_s\) and we let \(\nu _{r_s s} = \xi _{r_s s}^* + \zeta _s - \sum _r \xi ^*_{rs}\) (so that \(\sum _r \nu _{rs} = \zeta _s\)). As \(\xi ^* \ll \xi \) and \(\xi \ll \lambda \), by the construction of \(\nu \) we have \(\nu \ll \lambda \). Moreover, we have \(\pi \circ \mathfrak {m}(\nu ) = \zeta \) and
which implies by (8.20)
Moreover, as \(| \sum _s \nu _{rs} - \sum _s \xi _{rs}| \le M \max _{s\in \mathcal {S}} |\nu _{rs} - \xi _{rs}|\),
By choosing \(\kappa >0\) and \(N\in \mathbb {N}\) such that \(M^2 \kappa + \tfrac{2}{n} (M^3 +M^2) <\delta \) the proof is complete. \(\square \)
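The first step of the proof, the approximation of \(\xi \) by some \(\xi ^* \in \mathcal {P}_{emp}^n(\mathcal {R}\times \mathcal {S})\) with \(\xi ^* \ll \xi \), can be carried out by a largest-remainder rounding. The Python sketch below shows one possible way to do this and is not necessarily the procedure the proof has in mind; the function name and the example values are hypothetical. It even achieves an error below \(\frac{1}{n}\) per entry, which is stronger than the bound \(\frac{2}{n}\) needed in the proof. The subsequent adjustment of the \(\mathcal {S}\)-marginal to \(\zeta \) is not implemented here.

```python
from fractions import Fraction
from math import floor

def round_to_grid(xi, n):
    """Largest-remainder rounding of a probability vector xi to multiples of 1/n.

    xi maps points to exact Fractions summing to one. Entries that are zero stay
    zero, so the result is absolutely continuous with respect to xi.
    """
    scaled = {key: mass * n for key, mass in xi.items()}
    counts = {key: floor(value) for key, value in scaled.items()}
    deficit = n - sum(counts.values())  # number of units of 1/n still to hand out
    # Give the missing units to the entries with the largest fractional parts;
    # there are always more entries with positive fractional part than units.
    by_remainder = sorted(xi, key=lambda key: scaled[key] - counts[key], reverse=True)
    for key in by_remainder[:deficit]:
        counts[key] += 1
    return {key: Fraction(count, n) for key, count in counts.items()}

# Hypothetical example on R x S = {"a", "b"} x {0, 1} with n = 10.
xi = {("a", 0): Fraction(1, 3), ("b", 0): Fraction(0),
      ("a", 1): Fraction(1, 6), ("b", 1): Fraction(1, 2)}
xi_star = round_to_grid(xi, 10)
```

Using exact fractions avoids rounding artefacts when the masses of \(\xi \) lie close to the grid \(\{0,\frac{1}{n},\dots ,1\}\).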
Lemma 8.8
For all open \(U\subset \mathcal {P}(\mathcal {R})\)
Proof
We assume \(\inf _{\nu \in \mathfrak {m}^{-1} (U \times \{\psi \}) } H(\nu \mid \lambda )<\infty \). Let \(\xi \in \mathfrak {m}^{-1}(U \times \{\psi \})\) be such that \(H(\xi \mid \lambda )<\infty \). Let \(\varepsilon >0\). We show there exists a \(V_0 \in \mathcal {N}_\psi \) and an \(N\in \mathbb {N}\) such that for all \(n\ge N\) the set \(\mathcal {P}_{emp}^n(\mathcal {S})\cap V_0\) is not empty and for all \(\zeta \in \mathcal {P}_{emp}^n(\mathcal {S})\cap V_0\) there exists a \(\nu \in \mathfrak {m}^{-1}(U\times \{\zeta \})\cap \mathcal {P}_{emp}^n(\mathcal {R}\times \mathcal {S})\) with
Let \(\delta \) be such that (see Lemma 8.5)
Then let \(\kappa >0\) and \(N\in \mathbb {N}\) be as in Lemma 8.7. Let \(V_0 = B(\psi ,\kappa )\). By Lemma 8.6 we may assume that N is large enough such that \(\mathcal {P}_{emp}^n (\mathcal {S}) \cap V_0 \ne \emptyset \). Let \(n\ge N\) and \(\zeta \in \mathcal {P}_{emp}^n (\mathcal {S}) \cap V_0\). By Lemma 8.7 there exists a \(\nu \in \mathcal {P}_{emp}^n(\mathcal {R}\times \mathcal {S})\) with \(\pi \circ \mathfrak {m}(\nu ) = \zeta \), \(\nu \ll \lambda \) and \(\nu (\cdot \times \mathcal {S}) \in B(\xi (\cdot \times \mathcal {S}), \delta )\), \(\nu \in B(\xi ,\delta )\), i.e. by (8.26), \(\nu \in \mathfrak {m}^{-1}(U\times \{\zeta \})\). \(\nu \ll \lambda \) implies \(\nu \in [H(\cdot \mid \lambda )<\infty ]\); thus with (8.27) we obtain (8.25). \(\square \)
Examples
In Sect. 8, we showed that the regular conditional kernel \(\eta _n\) as in (8.3) satisfies (A1) and (B1) of Theorem 6.9 by showing that (A2) and (B2) of that theorem hold. This is not always the most efficient approach; in Example 9.1 we show that for a specific example of Gaussian measures the expression of \(\eta _n\) allows us to derive (A1) and (B1) directly.
Furthermore, relying on Theorem 9.2, in Example 9.4 we give an example of a sequence \((\eta _n)_{n\in \mathbb {N}}\) for which (A1) of Theorem 6.9 does not hold. In Remark 9.5 we mention that for one choice of measures in Example 9.4 a quenched large deviation principle is satisfied, while for the other choice of measures there is no quenched large deviation principle. In Example 9.6 we show that for a choice of measures as in Example 9.4 the regular conditional kernel at a specifically chosen point does not satisfy any large deviation principle. In Remark 9.7 we discuss exponential tightness of the regular conditional kernel. In Remark 9.8 we discuss the differences between the present paper and the paper of La Cour and Schieve [20].
Example 9.1
Let \(r\ne 0\), \(Z_n : = \int _{\mathbb {R}} \int _\mathbb {R}e^{-\frac{n}{2} (x^2 - 2rxy +y^2)} {{\mathrm{\, \mathrm {d}\!}}}x {{\mathrm{\, \mathrm {d}\!}}}y\) and consider \((\mu _n)_{n\in \mathbb {N}}\) the sequence of probability measures on \(\mathcal {B}(\mathbb {R}\times \mathbb {R})\) determined by
The sequence satisfies the large deviation principle with rate function \(J: \mathbb {R}^2 \rightarrow [0,\infty ]\) given by \( J(x,y) = \tfrac{1}{2} (x^2 - 2rxy +y^2) \). By Theorem 4.8 \(\eta _n\) given by
is the weakly continuous product regular conditional probability under \(\mu _n\) with respect to the projection on the \(\mathcal {Y}\)-coordinate. If \(y_n \rightarrow y\), one can show that for \(\lambda \in \mathbb {R}\)
Then by the Gärtner–Ellis theorem (see for example Dembo and Zeitouni [7, Theorem 2.3.6]) we conclude that \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with the same rate function as the one of the large deviation principle of \((\eta _n(y,\cdot ))_{n\in \mathbb {N}}\), which is \(x\mapsto \tfrac{1}{2}(x - ry )^2\). Note that this equals \(J(x,y) - \inf J(\mathbb {R}\times \{y\})\) because of the equality \(x^2 - 2rxy +y^2= (x-ry)^2 + (1-r^2)y^2\).
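The convergence in this example can be observed numerically. The Python sketch below assumes, as completing the square in the density \(e^{-\frac{n}{2}(x^2-2rxy+y^2)}\) suggests, that \(\eta _n(y,\cdot )\) is the \(\mathcal {N}(ry,\frac{1}{n})\) law; the parameter values and function names are hypothetical. It compares \(-\frac{1}{n}\log \eta _n(y_n,(a,\infty ))\) along a sequence \(y_n \rightarrow y\) with the value \(\tfrac{1}{2}(a-ry)^2\) of the rate function at the boundary point \(a > ry\).

```python
import math

def conditional_tail(n, r, y, a):
    """eta_n(y, (a, inf)) under the assumption eta_n(y, .) = N(r*y, 1/n)."""
    return 0.5 * math.erfc((a - r * y) * math.sqrt(n / 2.0))

def empirical_rate(n, r, y_n, a):
    """-(1/n) * log eta_n(y_n, (a, inf))."""
    return -math.log(conditional_tail(n, r, y_n, a)) / n

def rate_function(x, r, y):
    """J(x, y) - inf J(R x {y}) = (x - r*y)^2 / 2 for J(x, y) = (x^2 - 2rxy + y^2) / 2."""
    return 0.5 * (x - r * y) ** 2

# Hypothetical parameters: r = 0.5, y = 1, a = 1.5, and y_n = y + 1/n.
r, y, a = 0.5, 1.0, 1.5
rates = [empirical_rate(n, r, y + 1.0 / n, a) for n in (10, 100, 1000)]
```

The values in `rates` decrease towards `rate_function(a, r, y)`, which is 0.5 for these parameters; the remaining gap is of order \(\frac{\log n}{n}\), as expected from the Gaussian tail.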
The proof of the following theorem can be found in “Appendix 3”.
Theorem 9.2
Let \(\mathcal {X}\) and \(\mathcal {Y}\) be separable metric spaces. Let \((\mu _n^1)_{n\in \mathbb {N}}\) and \((\mu _n^2)_{n\in \mathbb {N}}\) be sequences of probability measures on \(\mathcal {B}(\mathcal {X})\). Let \((\nu _n)_{n\in \mathbb {N}}\) be a sequence of probability measures on \(\mathcal {B}(\mathcal {Y})\) that satisfies the large deviation principle with a rate function \(L: \mathcal {Y}\rightarrow [0,\infty ]\). Suppose that \(y\in \mathcal {Y}\) and \(W_n\in \mathcal {N}_y\) are such that \(\bigcap _{n\in \mathbb {N}} W_n = \{y\}\) and \(\alpha _n : \mathcal {Y}\rightarrow [0,1]\) is a continuous function with \(\alpha _n(y) =0\) and \(\alpha _n =1\) on \(\mathcal {Y}\setminus W_n\) such that
Assume \((\mu _n^1)_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function I. Assume furthermore that for all open \(A\subset \mathcal {X}\)
Let \(\mu _n\) be the probability measure on \(\mathcal {B}(\mathcal {X}) \otimes \mathcal {B}(\mathcal {Y})\) for which for \(A\in \mathcal {B}(\mathcal {X})\), \(B\in \mathcal {B}(\mathcal {Y})\)
Then \((\mu _n)_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function \(J: \mathcal {X}\times \mathcal {Y}\rightarrow [0,\infty ]\) given by \(J(x,y) = I(x) + L(y)\). \(\eta _n: \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow [0,1]\) defined by
is the weakly continuous product regular conditional probability under \(\mu _n\) with respect to \(\pi : \mathcal {X}\times \mathcal {Y}\rightarrow \mathcal {Y}\) given by \(\pi (x,y)=y\).
Note that \(I(x) = J(x,y) - \inf J(\mathcal {X}\times \{y\})\) for all \(x\in \mathcal {X}, y\in \mathcal {Y}\).
Example 9.3
We give examples of \(\mathcal {Y}, W_n, \alpha _n, \nu _n\) and L such that (9.4) of Theorem 9.2 is satisfied and \((\nu _n)_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function L.

(a)
Let \(\mathcal {Y}= [0,\infty )\), \(\alpha _n (y) = \min \{ny,1\}\) for \(y\in \mathcal {Y}\) and let \(\nu _n(B) = \int _0^\infty \mathbbm {1}_B(y) n e^{-ny} {{\mathrm{\, \mathrm {d}\!}}}y\) for \(B\in \mathcal {B}([0,\infty ))\). Then \(\int _0^\frac{1}{n} \alpha _n {{\mathrm{\, \mathrm {d}\!}}}\nu _n = 1-2e^{-1}\) and \(\int _0^\frac{1}{n} (1-\alpha _n) {{\mathrm{\, \mathrm {d}\!}}}\nu _n = e^{-1}\). Therefore with this \(\nu _n\), \(\alpha _n\) and \(W_n = [0,\frac{1}{n}]\), (9.4) is satisfied. Moreover \((\nu _n)_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function \(L :\mathcal {Y}\rightarrow [0,\infty ]\), \(L(y) =y\) (this follows, for example, from the Gärtner–Ellis theorem [7, Theorem 2.3.6]).

(b)
Let \(\mathcal {Y}= \mathbb {R}\) and \(\nu _n =\mu _{\mathcal {N}(0,\frac{1}{n})}\) (the Gaussian measure corresponding to a \(\mathcal {N}(0,\frac{1}{n})\) distributed random variable). Then there exists a decreasing sequence \((\varepsilon _n)_{n\in \mathbb {N}}\) in \((0,\infty )\) with \(\varepsilon _n \downarrow 0\), such that with \(W_n = [-\varepsilon _n,\varepsilon _n]\) there exist functions \(\alpha _n\) as in Theorem 9.2 such that (9.4) is satisfied (see the postscript). With \(\nu _n^0 = \frac{1}{2} \delta _0 + \frac{1}{2} \nu _n\) instead of \(\nu _n\), (9.4) is also satisfied. Moreover, \((\nu _n)_{n\in \mathbb {N}}\) and \((\nu _n^0)_{n\in \mathbb {N}}\) (use Lemma 9.9) satisfy the large deviation principle with rate function \(L :\mathcal {Y}\rightarrow [0,\infty ]\), \(L(y) = \frac{1}{2} y^2\).
Postscript.
Let \(\beta = \nu _1([-1,1])\). Let \(\kappa _n = \frac{1}{\sqrt{n}}\). Then \(\nu _n[-\kappa _n,\kappa _n] = \beta \) for all \(n\in \mathbb {N}\). Let \(\phi _\varepsilon : \mathbb {R}\rightarrow [0,1]\) be defined by \(\phi _\varepsilon (z) = \min \{\varepsilon ^{-1} |z|,1\}\). Then \(\lim _{\varepsilon \downarrow 0} \int _{[-\kappa _n,\kappa _n]} \phi _\varepsilon {{\mathrm{\, \mathrm {d}\!}}}\nu _n = \beta \), \(\lim _{\varepsilon \downarrow 0} \int _{[-\kappa _n,\kappa _n]} (1- \phi _\varepsilon ) {{\mathrm{\, \mathrm {d}\!}}}\nu _n =0\) and
$$\begin{aligned} \int _{[-\kappa _n,\kappa _n]} \phi _{\kappa _n} {{\mathrm{\, \mathrm {d}\!}}}\nu _n < \int _{[-\kappa _n,\kappa _n]} (1- \phi _{\kappa _n}) {{\mathrm{\, \mathrm {d}\!}}}\nu _n. \end{aligned}$$(9.9)
Therefore, for all \(n\in \mathbb {N}\), there exists an \(\varepsilon _n \in (0,\kappa _n)\) such that
$$\begin{aligned} \int _{[-\kappa _n,\kappa _n]} \phi _{\varepsilon _n} {{\mathrm{\, \mathrm {d}\!}}}\nu _n = \tfrac{1}{2} \beta = \int _{[-\kappa _n,\kappa _n]} (1- \phi _{\varepsilon _n}) {{\mathrm{\, \mathrm {d}\!}}}\nu _n. \end{aligned}$$(9.10)
With \(\alpha _n = \phi _{\varepsilon _n}\), (9.4) as in Theorem 9.2 is satisfied.
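Both parts of Example 9.3 lend themselves to a quick numerical check. The Python sketch below is an illustration with hypothetical function names and a plain midpoint rule. For (a) it evaluates the two integrals over \([0,\frac{1}{n}]\). For the postscript it uses the substitution \(z = u/\sqrt{n}\), under which \(\int _{[-\kappa _n,\kappa _n]} \phi _\varepsilon {{\mathrm{\, \mathrm {d}\!}}}\nu _n\) with \(\varepsilon = c/\sqrt{n}\) becomes an integral of \(\min \{|u|/c,1\}\) against the standard Gaussian over \([-1,1]\), and locates by bisection a \(c\in (0,1)\) for which this integral equals \(\tfrac{1}{2}\beta \), so that \(\varepsilon _n = c/\sqrt{n}\).

```python
import math

def midpoint_integral(f, a, b, steps=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (j + 0.5) * h) for j in range(steps))

def example_a(n):
    """Integrals of alpha_n and 1 - alpha_n over [0, 1/n] against nu_n(dy) = n e^{-ny} dy."""
    density = lambda y: n * math.exp(-n * y)
    alpha = lambda y: min(n * y, 1.0)
    first = midpoint_integral(lambda y: alpha(y) * density(y), 0.0, 1.0 / n)
    second = midpoint_integral(lambda y: (1.0 - alpha(y)) * density(y), 0.0, 1.0 / n)
    return first, second

def gaussian_density(u):
    """Density of the standard normal law."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def scaled_integral(c):
    """Integral over [-1, 1] of min(|u|/c, 1) against N(0, 1); decreasing in c."""
    return midpoint_integral(lambda u: min(abs(u) / c, 1.0) * gaussian_density(u), -1.0, 1.0)

def solve_c(tol=1e-8):
    """Bisection for the c in (0, 1) with scaled_integral(c) = beta / 2."""
    beta = math.erf(1.0 / math.sqrt(2.0))  # beta = nu_1([-1, 1])
    lo, hi = 1e-6, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if scaled_integral(mid) > 0.5 * beta:
            lo = mid   # integral still too large: c must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `example_a(n)` for any `n` returns values close to \(1-2e^{-1}\) and \(e^{-1}\), independently of `n`. The bisection is justified by the intermediate value argument of the postscript: the integral tends to \(\beta \) as \(c \downarrow 0\) and is below \(\tfrac{1}{2}\beta \) at \(c=1\) by (9.9).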
Example 9.4
With \(\mathcal {X}= \mathbb {R}\), \(\mu _n^1 = \mu _{\mathcal {N}(0,\frac{1}{n})}\), \(\mu _n^2 = \delta _{\frac{1}{n}}\) and \(I(x) = \frac{1}{2} x^2\) for \(x\in \mathbb {R}\), and with \(\mathcal {Y},\nu _n\) (or \(\nu _n^0\)), \(\alpha _n\), \(W_n\) and L as in Examples 9.3 (a) or (b), the conditions of Theorem 9.2 are satisfied (note that \((\delta _{\frac{1}{n}})_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function \(H: \mathbb {R}\rightarrow [0,\infty ]\) given by \(H(0)=0\) and \(H(x) =\infty \) for \(x\ne 0\)).
Then \(\eta _n(0,\cdot ) = \delta _{\frac{1}{n}}\) and \(\eta _n(\varepsilon _n,\cdot ) = \mu _{\mathcal {N}(0,\frac{1}{n})}\) for all \(n\in \mathbb {N}\). Whence \((\eta _n(0,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function H and \((\eta _n(\varepsilon _n,\cdot ))_{n\in \mathbb {N}}\) (and also \((\eta _n(y,\cdot ))_{n\in \mathbb {N}}\) for \(y>0\)) satisfies the large deviation principle with rate function I. Because \(I \le H\), the sequence \((\eta _n(0,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation upper bound not only with H but also with I instead of H. Therefore (b1) of Theorem 6.3 holds in case \(y_n =0\) for all n. Since \((\eta _n(0,\cdot ))_{n\in \mathbb {N}}\) does not satisfy the large deviation principle with rate function I, (a1) of Theorem 6.3 does not hold. Therefore for any decreasing sequence \((V_m)_{m\in \mathbb {N}}\) in \(\mathcal {N}_0\) with \(\bigcap _{m\in \mathbb {N}} V_m= \{0\}\) there exists an open set U with \(\inf I(U) <\infty \) with
We illustrate this for \(\mathcal {Y},\alpha _n, W_n, \nu _n \) and L as in Examples 9.3(a): For \(V_m=[0,\tfrac{1}{m})\), \(U=(1,\infty )\) we get for \(m\ge n\)
Since \(\int _0^{\frac{1}{m}} n y\cdot n e^{-ny} {{\mathrm{\, \mathrm {d}\!}}}y \le \frac{n}{m} \int _0^{\frac{1}{m}} n e^{-ny} {{\mathrm{\, \mathrm {d}\!}}}y\), we get
which converges to zero as \(m\rightarrow \infty \), which implies
Remark 9.5
(Quenched large deviations) Consider the situation as in Example 9.4. For all \(n\in \mathbb {N}\) we have the following. If \(\zeta _n: \mathcal {Y}\times \mathcal {B}(\mathcal {X}) \rightarrow [0,1]\) is a product regular conditional probability under \(\mu _n\) with respect to \(\pi \), then \(\zeta _n(y,\cdot ) = \eta _n(y,\cdot )\) for \([\mu _n \circ \pi ^{-1}]\)-almost all y (see Remark 3.5).
Whence, with \(\nu _n\) as in Examples 9.3 (a) or (b), we have a quenched large deviation principle of the conditional probability with respect to the second coordinate with rate function I: for every product regular conditional probability \(\zeta _n\) under \(\mu _n\) with respect to \(\pi \) there exists a \(Z\subset \mathcal {Y}\) with \(\mu _n \circ \pi ^{-1}(Z) = \nu _n(Z)=1\) such that \((\zeta _n(y,\cdot ))_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function I for all \(y\in Z\).
However, with \(\nu _n^0\) as in Examples 9.3 (b) instead of \(\nu _n\), for such \(\zeta _n\) one has \(\zeta _n(0,\cdot ) = \eta _n(0,\cdot )\) as \(\nu _n^0(\{0\}) >0\). Thus in this case we do not have such a quenched large deviation principle.
Example 9.6
With \(\mathcal {X}= \mathbb {N}\), \(\mu _n^1 = \sum _{k\in \mathbb {N}} 2^{-k} \delta _k\), \(\mu _n^2= \delta _n\) and \(I(x) =0\) for \(x\in \mathbb {N}\) as in Example 9.4, and \(\mathcal {Y}, W_n, \alpha _n, \nu _n\) and L as in Examples 9.3 (a) or (b), the conditions of Theorem 9.2 are satisfied. In this case, \((\eta _n(0,\cdot ))_{n\in \mathbb {N}}\) does not satisfy a large deviation principle.
Remark 9.7
(Exponential tightness of the regular conditional kernel) Considering the situation as in Theorem 9.2, we would like to mention that if \((\mu _n^1)_{n\in \mathbb {N}}\) is exponentially tight, then so is \((\mu _n)_{n\in \mathbb {N}}\) since \(\mu _n(K_1^c\times K_2^c) = \mu _n^1(K_1^c) \nu _n(K_2^c)\) for large n and (compact) \(K_1\subset \mathcal {X}, K_2\subset \mathcal {Y}\). Similarly \((\eta _n(y,\cdot ))_{n\in \mathbb {N}}\) is exponentially tight for all \(y>0\) since \(\eta _n(y,K^c) = \mu _n^1(K^c)\) for large n and compact \(K\subset \mathcal {X}\). However, as is the case in Example 9.6, \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) need not be exponentially tight for all converging sequences \((y_n)_{n\in \mathbb {N}}\) (e.g. if \((\mu _n^2)_{n\in \mathbb {N}}\) is not exponentially tight, then neither is \((\eta _n(0,\cdot ))_{n\in \mathbb {N}}\)).
Remark 9.8
Example 9.4 with \(\nu _n\) (or \(\nu _n^0\)) and \(\alpha _n\) as in Examples 9.3 (b) fits the assumptions made in Sect. 4 of La Cour and Schieve [20].^{Footnote 8} In Sect. 4 of that paper, it is claimed that the law of the first coordinate conditioned on the second coordinate satisfies the large deviation principle with the rate function I. Their notion of conditioning on y is “condition on an arbitrarily small neighbourhood around y”. This approach needs to be justified. Our results are different, as by Example 9.4 the conditioned kernel at 0, \(\eta _n(0,\cdot )\), does not satisfy the large deviation principle with the rate function I (even in the sense of quenched large deviations as discussed in Remark 9.5).
Notes
 1.
Meaning that there exists an \(N\in \mathbb {N}\) such that \(y_n \in {{\mathrm{supp}}}(\mu _n\circ \pi ^{-1})\) for all \(n\ge N\).
 2.
The Sorgenfrey line, the space \(\mathbb {R}\) with the right half-open interval topology, is perfectly normal but not second countable (see Steen and Seebach [27, Example 51]).
 3.
Perfectly normal means that every open set in \(\mathcal {X}\) is equal to \(f^{-1}((0,\infty ))\) for some \(f\in C(\mathcal {X})\). All metric spaces are perfectly normal; see Bogachev [4, Proposition 6.3.5].
 4.
This is not true in general. For an example, see Bogachev [4, Example 7.1.3].
 5.
As we are considering large deviation bounds for \((\eta _n(y_n,\cdot ))_{n\in \mathbb {N}}\) with \(y_n \in {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\), we want such \(y_n\) to exist. Instead of this condition, one could of course deal with the situation where \({{\mathrm{supp}}}(\mu _n \circ \pi ^{-1}) \ne \emptyset \) for all \(n\ge N\), for some large N, and consider sequences \((y_n)_{n\in \mathbb {N}}\) with \(y_n \in {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\) for \(n\ge N\).
 6.
Under the condition that \(\mathcal {Y}\) is first countable.
 7.
Note that \(\mu _n(\mathcal {X}\times V) >0\) for all \(n\in \mathbb {N}\) and \(V\in \mathcal {N}_{y_n}\), as \(y_n \in {{\mathrm{supp}}}(\mu _n \circ \pi ^{-1})\).
 8.
The logarithmic moment generating function (see Dembo and Zeitouni [7, Assumption 2.3.2]) is given by \((x,y) \mapsto \frac{1}{2} x^2 + \frac{1}{2} y^2\), whence its Hessian equals the identity matrix and is therefore invertible. In [20], it is mentioned that one cannot carry out the conditioning on all elements; only those that equal the derivative of \(y\mapsto \frac{1}{2} y^2\) at a certain point are considered, of which 0 is an example.
 9.
 10.
This can also be found in O’Brien [23, Proposition 2.1].
References
 1.
Adams, S., Dirr, N., Peletier, M.A., Zimmer, J.: From a large-deviations principle to the Wasserstein gradient flow: a new micro-macro passage. Commun. Math. Phys. 307(3), 791–815 (2011)
 2.
Biggins, J.D.: Large deviations for mixtures. Electron. Commun. Probab. 9, 60–71 (2004). (electronic)
 3.
Billingsley, P.: Convergence of Probability Measures, Wiley Series in Probability and Statistics: Probability and Statistics, 2nd edn. Wiley, New York (1999)
 4.
Bogachev, V.: Measure Theory. Springer, Berlin (2007)
 5.
Comets, F.: Large deviation estimates for a conditional probability distribution. Applications to random interaction Gibbs measures. Probab. Theory Relat. Fields 80(3), 407–432 (1989)
 6.
Comets, F., Gantert, N., Zeitouni, O.: Quenched, annealed and functional large deviations for one-dimensional random walk in random environment. Probab. Theory Relat. Fields 118(1), 65–114 (2000)
 7.
Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications, Stochastic Modelling and Applied Probability, vol. 38. Springer, Berlin (2010). [Corrected reprint of the second edition (1998)]
 8.
Deuschel, J.D., Stroock, D.W.: Large Deviations, Pure and Applied Mathematics, vol. 137. Academic Press, Boston (1989)
 9.
van Enter, A.C.D., Fernández, R., den Hollander, F., Redig, F.: A large-deviation view on dynamical Gibbs-non-Gibbs transitions. Mosc. Math. J. 10(4), 687–711 (2010)
 10.
van Enter, A.C.D., Külske, C., Opoku, A.A., Ruszel, W.M.: Gibbs-non-Gibbs properties for n-vector lattice and mean-field models. Braz. J. Probab. Stat. 24(2), 226–255 (2010)
 11.
Ermolaev, V., Külske, C.: Low-temperature dynamics of the Curie–Weiss model: periodic orbits, multiple histories, and loss of Gibbsianness. J. Stat. Phys. 141(5), 727–756 (2010)
 12.
Faden, A.: The existence of regular conditional probabilities: necessary and sufficient conditions. Ann. Probab. 13(1), 288–298 (1985)
 13.
Fernández, R., den Hollander, F., Martínez, J.: Variational description of Gibbs-non-Gibbs dynamical transitions for the Curie–Weiss model. Commun. Math. Phys. 319(3), 703–730 (2013)
 14.
Greven, A., den Hollander, F.: Large deviations for a random walk in random environment. Ann. Probab. 22(3), 1381–1428 (1994)
 15.
Halmos, P.R.: Measure Theory. Springer, Berlin (1974)
 16.
den Hollander, F.: Large Deviations, Fields Institute Monographs, vol. 14. American Mathematical Society, Providence, RI (2000)
 17.
den Hollander, F., Redig, F., van Zuijlen, W.: Gibbs-non-Gibbs dynamical transitions for mean-field interacting Brownian motions. Stoch. Process. Appl. 125(1), 371–400 (2015)
 18.
Kosygina, E., Rezakhanlou, F., Varadhan, S.R.S.: Stochastic homogenization of Hamilton–Jacobi–Bellman equations. Commun. Pure Appl. Math. 59(10), 1489–1521 (2006)
 19.
Külske, C., Opoku, A.A.: Continuous spin mean-field models: limiting kernels and Gibbs properties of local transforms. J. Math. Phys. 49(12), 125215 (2008)
 20.
La Cour, B.R., Schieve, W.C.: A general conditional large deviation principle. J. Stat. Phys. 161(1), 123–130 (2015)
 21.
Leao Jr., D., Fragoso, M., Ruffino, P.: Regular conditional probability, disintegration of probability and Radon spaces. Proyecciones 23(1), 15–29 (2004)
 22.
Léonard, C.: A large deviation approach to optimal transport. arXiv:0710.1461v1 (2007)
 23.
O’Brien, G.L.: Sequences of capacities, with connections to large-deviation theory. J. Theor. Probab. 9(1), 19–35 (1996)
 24.
Rassoul-Agha, F., Seppäläinen, T., Yilmaz, A.: Quenched free energy and large deviations for random walk in random potential. Commun. Pure Appl. Math. 66(2), 202–244 (2013)
 25.
Rassoul-Agha, F., Seppäläinen, T.: A Course on Large Deviations with an Introduction to Gibbs Measures, Graduate Studies in Mathematics, vol. 162. American Mathematical Society, Providence, RI (2015)
 26.
Schaefer, H.H.: Topological Vector Spaces. Springer, New York (1971). (Third printing corrected, Graduate Texts in Mathematics, Vol. 3)
 27.
Steen, L.A., Seebach Jr., J.A.: Counterexamples in Topology. Holt, Rinehart and Winston, New York (1970)
Acknowledgements
The author is supported by ERC Advanced Grant VARIS-267356 of Frank den Hollander. The author is grateful to both Frank den Hollander and Frank Redig for valuable suggestions and useful discussions.
Appendices
Appendix 1: An Elementary Fact About Limsup and Liminf
Lemma 9.9
Let \(k\in \mathbb {N}\) and \(a_n^i\in [0,\infty )\) for all \(n\in \mathbb {N}\) and \(i\in \{1,\dots ,k\}\). If there exists an \(N\in \mathbb {N}\) such that \(\max _{i\in \{1,\dots ,k\}} a_n^i >0\) for all \(n\ge N\), then^{Footnote 9}
Proof
(9.16), (9.17) and (9.18) follow from the inequality
\(\square \)
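The way the lemma is used later (the limsup of \(\tfrac{1}{n}\log \) of a sum equals the maximum of the individual limsups) is the familiar largest-term principle. As a hedged illustration, assuming the relevant inequality is the elementary sandwich \(\max _i a_n^i \le \sum _{i=1}^k a_n^i \le k \max _i a_n^i\), the following sketch checks numerically that \(\tfrac{1}{n}\log \sum _i a_n^i\) differs from \(\tfrac{1}{n}\log \max _i a_n^i\) by at most \(\tfrac{1}{n}\log k\). The three sequences are hypothetical examples chosen only for this check.

```python
import math

def scaled_log_sum(log_terms, n):
    """(1/n) * log(sum_i a_n^i), given the values log a_n^i, computed stably (log-sum-exp)."""
    top = max(log_terms)
    return (top + math.log(sum(math.exp(t - top) for t in log_terms))) / n

def rates(n):
    """Return ((1/n) log of the sum, (1/n) log of the largest term) for three example sequences."""
    # Hypothetical sequences: a_n^1 = exp(-n), a_n^2 = n * exp(-2n), a_n^3 = exp(-3n)
    logs = [-1.0 * n, math.log(n) - 2.0 * n, -3.0 * n]
    return scaled_log_sum(logs, n), max(logs) / n

for n in (1, 5, 50, 500):
    total, largest = rates(n)
    # Sandwich: largest <= total <= largest + log(k)/n with k = 3
    print(n, largest, total, largest + math.log(3) / n)
```

Working with logarithms of the terms avoids floating-point underflow for large \(n\).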
Appendix 2: Sufficient Bounds for Large Deviation Bounds
Let \(\mathcal {X}\) be a topological space. Let \(I : \mathcal {X}\rightarrow [0,\infty ]\) have compact sublevel sets. Let \((\mu _n)_{n\in \mathbb {N}}\) be a sequence of probability measures on \(\mathcal {B}(\mathcal {X})\).
Lemma 9.10
Let \((F_m)_{m\in \mathbb {N}}\) be a decreasing sequence of closed sets with \(F= \bigcap _{m\in \mathbb {N}} F_m\). Then
$$\begin{aligned} \inf I(F) = \sup _{m\in \mathbb {N}} \inf I(F_m). \end{aligned}$$
Proof
Let \(c:= \sup _{m\in \mathbb {N}}\inf I(F_m) \). Note that \(c\le \inf I(F)\). If \(c=\infty \), there is nothing to prove. Assume that \(c <\infty \). Let K be the compact set \([I\le c]\). Then \(F_m \cap K \ne \emptyset \) for all \(m\in \mathbb {N}\), whence \(F\cap K \ne \emptyset \) and thus \(\inf I(F) \le c\). \(\square \)
Remark 9.11
For Lemma 9.10 the condition that I has compact sublevel sets is not redundant. For example: Let \(I: \mathbb {N}\cup \{0\} \rightarrow [0,\infty ]\) be given by \(I(0)=1\) and \(I(x) = 0\) for \(x\in \mathbb {N}\). Then for \(F_m = \{0\} \cup \{m,m+1,\dots \}\) and \(F=\{0\}\) one has \(\sup _{m\in \mathbb {N}}\inf I(F_m)=0\) and \(\inf I(F)=1\).
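The counterexample can be checked mechanically. The sketch below is an illustration under stated assumptions: \(\mathbb {N}\cup \{0\}\) carries the discrete topology (so compact means finite), the infinite tails \(\{m,m+1,\dots \}\) are truncated at a finite `horizon`, and `I_compact` is a hypothetical contrasting function with finite sublevel sets that does not appear in the text. For both functions the truncation does not change the infimum, because the infimum over the tail is attained at its first element.

```python
def I_remark(x):
    """The function from the remark: I(0) = 1 and I(x) = 0 for x >= 1."""
    return 1.0 if x == 0 else 0.0

def I_compact(x):
    """Hypothetical contrast: I(0) = 1, I(x) = x for x >= 1, so sublevel sets are finite."""
    return 1.0 if x == 0 else float(x)

def inf_on_F_m(I, m, horizon):
    """inf of I over F_m = {0} u {m, m+1, ...}, with the tail truncated at `horizon` >= m."""
    return min(I(0), min(I(x) for x in range(m, horizon + 1)))

def sup_inf(I, max_m=50, horizon=200):
    """sup over m = 1..max_m of inf I(F_m)."""
    return max(inf_on_F_m(I, m, horizon) for m in range(1, max_m + 1))

# F = intersection of the F_m = {0}, so inf I(F) = I(0)
print(sup_inf(I_remark), I_remark(0))    # the two values differ
print(sup_inf(I_compact), I_compact(0))  # the two values agree
```

With finite sublevel sets the two quantities coincide, in line with Lemma 9.10; without them the tail of \(F_m\) keeps the infimum at zero for every m.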
Lemma 9.12

(a)
Let \(\mathcal {U}\) be a set of open subsets of \(\mathcal {X}\). Suppose that for all \(G\in \mathcal {U}\)
$$\begin{aligned} \liminf _{n\rightarrow \infty }\tfrac{1}{n}\log \mu _n(G) \ge -\inf I(G). \end{aligned}$$(9.21)
Then \(G= \bigcup \mathcal {U}\) satisfies (9.21) as well.^{Footnote 10}

(b)
Let \(F_1,F_2,\dots \) be closed. Suppose that for all \(F\in \{F_m:m\in \mathbb {N}\}\)
$$\begin{aligned} \limsup _{n\rightarrow \infty }\tfrac{1}{n}\log \mu _n(F) \le -\inf I(F). \end{aligned}$$(9.22)
Then \(F= \bigcap _{m\in \mathbb {N}} F_m\) satisfies (9.22) as well.
Proof
Now apply Lemma 9.10. \(\square \)
As a consequence of Lemma 9.12 we obtain the following.
Theorem 9.13
Suppose that \(\mathcal {G}\) is a basis for the topology on \(\mathcal {X}\), such that (9.21) holds for all \(G\in \mathcal {G}\) and (9.22) holds for \(F= \mathcal {X}\setminus G\). Suppose that every open G can be written as a countable union of elements of \(\mathcal {G}\). Then \((\mu _n)_{n\in \mathbb {N}}\) satisfies the large deviation principle with rate function I.
Appendix 3: Proof of Theorem 9.2
Proof of Theorem 9.2
As \(\mathcal {X}\) and \(\mathcal {Y}\) are separable metric spaces, every open subset of \(\mathcal {X}\times \mathcal {Y}\) is a countable union of elements of the form \(A\times B\) where \(A\subset \mathcal {X}\) is open and \(B\in \mathcal {H}\), where (with \(d_\mathcal {Y}\) the metric on \(\mathcal {Y}\))
We use Theorem 9.13 to prove the large deviation bounds. Note first that \((\mathcal {X}\times \mathcal {Y}) \setminus (A \times B) = (\mathcal {X}\times (\mathcal {Y}\setminus B))\cup ((\mathcal {X}\setminus A) \times \mathcal {Y})\), that \( \min \{\inf I(\mathcal {X}\setminus A), \inf L(\mathcal {Y}\setminus B) \} = \inf _{(x,y) \in (\mathcal {X}\times \mathcal {Y}) \setminus (A\times B)} I(x) + L(y)\) and that by (9.17)
Using this and Theorem 9.13 it is sufficient to show that for all open sets \(A\subset \mathcal {X}\) and \(B\subset \mathcal {Y}\)
Let \(A\subset \mathcal {X}\) be open and \(B\in \mathcal {H}\).

(9.26) follows from the fact that \(\mu _n(\mathcal {X}\times (\mathcal {Y}\setminus B) ) = \nu _n (\mathcal {Y}\setminus B)\).

(9.27) follows from the fact that by (9.4), (9.6) and (9.17) we have
$$\begin{aligned} \limsup _{n\rightarrow \infty }\tfrac{1}{n}&\log \mu _n((\mathcal {X}\setminus A) \times \mathcal {Y})\nonumber \\&= \max \Big \{ \limsup _{n\rightarrow \infty }\tfrac{1}{n} \log \mu _n^1(\mathcal {X}\setminus A), \limsup _{n\rightarrow \infty }\tfrac{1}{n} \log \mu _n^2(\mathcal {X}\setminus A) \Big \} \nonumber \\&= \limsup _{n\rightarrow \infty }\tfrac{1}{n} \log \mu _n^1(\mathcal {X}\setminus A) \le -\inf I(\mathcal {X}\setminus A), \end{aligned}$$(9.29)
(9.28) follows by separating two cases (as either \(y\in B\) or \(y\notin {\overline{B}}\)): If \(y \notin {\overline{B}}\), then \(W_n \cap B = \emptyset \) and so \(\mu _n(A\times B) = \mu _n^1(A)\nu _n(B)\) for large n, whence
$$\begin{aligned} \liminf _{n\rightarrow \infty }&\tfrac{1}{n} \log \mu _n(A \times B) = \liminf _{n\rightarrow \infty }\bigg ( \tfrac{1}{n} \log \mu _n^1(A) + \tfrac{1}{n} \log \nu _n(B) \bigg ). \end{aligned}$$(9.30)
Suppose that \(y \in B\), i.e. \(W_n\subset B\) for large n. By (9.18) we obtain
Using that \(\liminf _{n\rightarrow \infty }\tfrac{1}{n} \log (\int _{W_n^c} \mathbbm {1}_B {{\mathrm{\, \mathrm {d}\!}}}\nu _n) \le 0\) together with (9.4) and (9.5), we obtain
Because \(\inf L(B) \ge 0\), we conclude (9.28).
We leave it to the reader to check that \(\eta _n\) is the weakly continuous product regular conditional probability under \(\mu _n\) with respect to \(\pi \). \(\square \)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
van Zuijlen, W. Large Deviations of Continuous Regular Conditional Probabilities. J Theor Probab 31, 1058–1096 (2018). https://doi.org/10.1007/s10959-016-0733-1
Keywords
 (Product) regular conditional kernel
 Weakly continuous
 Large deviations
Mathematics Subject Classification (2010)
 60A10
 60F10