A General Conditional Large Deviation Principle

Given a sequence of Borel probability measures on a Hausdorff space which satisfy a large deviation principle, we consider the corresponding sequence of measures formed by conditioning on a set $B$. If the large deviation rate function $I$ is good and effectively continuous and the conditioning set has the property that (1) $\overline{B^\circ} = \overline{B}$ and (2) $I(x)<\infty$ for all $x \in \overline{B}$, then the sequence of conditional measures satisfies a large deviation principle with the good, effectively continuous rate function $I_B$, where $I_B(x) = I(x)-\inf I(B)$ if $x\in\overline{B}$ and $I_B(x) = \infty$ otherwise.


Introduction
Conditional large deviation principles (LDPs) have played an important role in both mathematical statistics and statistical mechanics. In the latter field in particular, the asymptotic behavior of conditional distributions has been studied since the time of Boltzmann [1,5] and remains an important subject today in establishing the equivalence of microcanonical and canonical ensembles [6,14]. Early work by Lanford [9] and Ruelle [13] had anticipated many of the results regarding entropy and the thermodynamic limit which would later find form in the mathematical theory of large deviations. More recently, large deviation techniques have also been applied to nonequilibrium macroscopic time evolution [7,8].
Much of the past work on conditional LDPs has focused on distributions of sample means or, equivalently, empirical measures for mutually independent and identically distributed (IID) random variables. In this case, the conditioning set may be expressed as a constraint on the sample mean of IID random variables or, equivalently, as a constraint on the empirical distribution directly. Stroock and Zeitouni [14], for example, have developed a theorem on Gibbs conditioning, later revised in Dembo and Zeitouni [3], which establishes convergence in probability for empirical measures conditioned on the sample mean of IID random variables. Earlier, van Campenhout and Cover [15], using results from Zabell and Lanford, showed explicitly that the marginals of a distribution conditioned on a particular value of the sample mean converge to the canonical form predicted by the maximum entropy principle.
General conditional limit theorems have also been studied by Lewis et al. [10]. Casting the work of Ruelle and Lanford into the language of large deviation theory, they show that for a suitable sequence of conditioning sets, the corresponding conditional measures converge asymptotically to a canonical or "tilted" (in the Varadhan sense) measure with respect to the Kullback-Leibler information gain (Theorem 5.1). If the conditioning sets are assumed to be convex, then an even stronger conclusion follows in which the conditional equilibrium points may be determined from the subgradient of the free energy (Theorem 6.1). Their results fall short of giving an LDP for the sequence of conditional measures, however, and it is this problem which we address here.
In this paper we develop a general conditional LDP in terms of an assumed LDP and a given conditioning set. Suppose $\{P_n\}$ is a sequence of Borel probability measures on a Hausdorff topological space $(X, \mathcal{T})$ which satisfies an LDP with a good rate function $I$. Given a set $B \subseteq X$ for which $\inf I(B^\circ) < \infty$, the large deviation lower bound implies that $P_n(B) > 0$ for all $n$ sufficiently large. Without loss of generality, we may suppose that $P_n(B) > 0$ for all $n$, so the sequence $\{P_n(\,\cdot\,|B)\}$ of conditional probability measures is well defined. Since the minimum of the unconditioned rate function, $I$, gives the asymptotically most likely value (or values) in $X$, we may anticipate that, under conditioning, the asymptotically most likely values in $B$ (or, more precisely, in $\overline{B}$) will be those values that minimize $I$ over $\overline{B}$. If this sequence of conditional measures is to have an LDP, however, then this minimum must be zero, and this suggests that the conditional rate function, denoted $I_B$, is given by
$$I_B(x) = \begin{cases} I(x) - \inf I(B) & \text{if } x \in \overline{B}, \\ \infty & \text{otherwise.} \end{cases}$$
For the large deviation upper and lower bounds to agree under conditioning, it suffices that $I$ be finite and continuous everywhere and that $\overline{B^\circ} = \overline{B}$. The latter condition holds if $B$ is open or, e.g., a closed ball in a metric space. Also, if $B$ is a convex subset of a normed linear space, then $\overline{B^\circ} = \overline{B}$ holds if $B$ has a nonempty interior. This is a straightforward extension of Theorem 6.3 in Rockafellar [12]. The utility of convex conditioning sets had been recognized by Csiszár [2] in studying conditionally distributed empirical measures.
The condition that $I$ be finite and continuous everywhere may be further relaxed by noting that continuity is relevant only on the effective domain, $D_I = \{x \in X : I(x) < \infty\}$, of $I$. We shall say that a rate function is effectively continuous if it is continuous relative to its effective domain. Convex rate functions on a Banach space, for example, are effectively continuous, since $D_I$ is convex whenever $I$ is convex (see, e.g., Roberts and Varberg [11], p. 112). With these considerations in mind, the main result may now be stated as follows:
Theorem 1 Let $\{P_n\}$ be a sequence of Borel probability measures on a Hausdorff topological space $(X, \mathcal{T})$ which satisfies an LDP with a good, effectively continuous rate function $I$. If $B \subseteq X$ is such that (1) $\overline{B^\circ} = \overline{B}$ and (2) $I(x) < \infty$ for all $x \in \overline{B}$, then $\{P_n(\,\cdot\,|B)\}$ satisfies an LDP with the good, effectively continuous rate function $I_B$.
This result is proved in Sect. 3.
The restriction to Hausdorff spaces is needed to obtain a good conditional rate function.
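As a concrete illustration of the conditional rate function, consider the following worked example. It is not taken from the results above; the choice of measures and of conditioning set is an assumption made purely for concreteness. Take $P_n$ to be the law of the sample mean of $n$ IID standard normal random variables, which satisfies an LDP with $a_n = n$ and $I(x) = x^2/2$, and condition on $B = [1, \infty)$.

```latex
% Assumed example (not from the source): P_n = law of the mean of n IID N(0,1),
% a_n = n, I(x) = x^2/2, conditioning set B = [1, \infty).
\begin{align*}
\overline{B^\circ} &= \overline{(1,\infty)} = [1,\infty) = \overline{B}
  && \text{(condition 1)} \\
I(x) &= x^2/2 < \infty \quad \text{for all } x \in \overline{B}
  && \text{(condition 2)} \\
\inf I(B) &= I(1) = \tfrac{1}{2} \\
I_B(x) &=
  \begin{cases}
    \tfrac{1}{2}x^2 - \tfrac{1}{2} & \text{if } x \ge 1, \\
    \infty & \text{if } x < 1.
  \end{cases}
\end{align*}
```

Here $I_B$ vanishes only at $x = 1$, so under this conditioning the sample mean concentrates at the boundary point of $B$ closest to the unconditioned minimizer $x = 0$.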

Auxiliary Rate Function Theorems
Before proving the conditional LDP theorem, a few general results regarding large deviation rate functions are needed. We begin with the following theorem regarding continuous rate functions. Throughout this section it is assumed that (X, T ) is a Hausdorff topological space.

Theorem 2 Let I be a good rate function which is continuous on an open set containing A. If either
Since I is a good rate function and A is closed, there exists at least one ) and note that, since I is continuous and We thus arrive at a contradiction and conclude that I (

Corollary 1 Let I be a good rate function which is continuous on
where the last equality follows from Theorem 2.

Proof of Conditional Large Deviation Principle
Observe that the large deviation bounds imply that, for any $\varepsilon > 0$ and all $n$ sufficiently large, for $\varepsilon > 0$ and all $n$ sufficiently large, while inf
We begin with the large deviation upper bound. First assume $0 < \inf I(A \cap B) < \infty$ and $\inf I(B) > 0$. For a given $\varepsilon > 0$ we have that for all $n$ sufficiently large As the second term is positive and may be made arbitrarily small, we conclude
$$\limsup_{n\to\infty} a_n^{-1} \log P_n(A|B) \le -\inf I_B(A).$$
where Corollary 1 has been used in the last equality. The second term is negative and may be made arbitrarily small, so we conclude Thus, the lower bound is satisfied in this case.
, the lower bound is clearly satisfied in this case as well.
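To complete the proof, we must show that $I_B$ is a good rate function which is continuous relative to $\overline{B}$. Effective continuity of $I_B$ follows from that of $I$. To show that $I_B$ is a good rate function, consider any $\alpha < \infty$ and note that
$$\{x \in X : I_B(x) \le \alpha\} = \{x \in X : I(x) \le \alpha + \inf I(B)\} \cap \overline{B}.$$
We have already established that $\inf I(B) < \infty$. As $X$ is a Hausdorff space and $I$ is a good rate function, the first set on the right is compact and $\overline{B}$ is closed, so the above intersection is compact, thus establishing that $I_B$ is a good rate function.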
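The overall structure of the argument can be summarized in a few lines. The following is only a sketch: it is written with limits in place of the $\varepsilon$-bounds used in the proof, and it assumes the standard closed-set and open-set form of the large deviation bounds at speed $a_n$. The names $F$ and $G$ are introduced here for illustration and denote an arbitrary closed set and an arbitrary open set in $X$, respectively.

```latex
% Sketch only. F is closed, G is open (names introduced for illustration).
% Under the hypotheses of Theorem 1,
%   \inf I(B^\circ) = \inf I(\overline{B}) = \inf I(B) < \infty .
\begin{align*}
\limsup_{n\to\infty} a_n^{-1} \log P_n(F|B)
  &\le \limsup_{n\to\infty} a_n^{-1} \log P_n(F \cap \overline{B})
     - \liminf_{n\to\infty} a_n^{-1} \log P_n(B^\circ) \\
  &\le -\inf I(F \cap \overline{B}) + \inf I(B^\circ)
   = -\inf I_B(F), \\[1ex]
\liminf_{n\to\infty} a_n^{-1} \log P_n(G|B)
  &\ge \liminf_{n\to\infty} a_n^{-1} \log P_n(G \cap B^\circ)
     - \limsup_{n\to\infty} a_n^{-1} \log P_n(\overline{B}) \\
  &\ge -\inf I(G \cap B^\circ) + \inf I(\overline{B})
   = -\inf I_B(G).
\end{align*}
```

The final equality in the lower bound uses $\inf I(G \cap B^\circ) = \inf I(G \cap \overline{B})$. This holds because $\overline{B^\circ} = \overline{B}$ makes every point of $G \cap \overline{B}$ the limit of a net in $G \cap B^\circ$, along which $I$ converges by effective continuity, since $\overline{B} \subseteq D_I$.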

Application to Joint Random Vectors
Let $\{(\Omega_n, \mathcal{F}_n, P_n)\}_{n\in\mathbb{N}}$ be a sequence of probability spaces and let $\{(X_n, Y_n)\}_{n\in\mathbb{N}}$ be a sequence of Borel-measurable random vectors on $\Omega_n$ taking values in $\mathbb{R}^d \times \mathbb{R}^d$. Suppose we are interested in the asymptotic behavior of $Y_n$ when $X_n$ is conditioned on a value $x_0 \in \mathbb{R}^d$. Rather than condition on $X_n = x_0$ explicitly, we shall instead consider the joint distribution of $(X_n, Y_n)$ and construct a conditioning set for which $X_n$ converges to $x_0$ in probability. Assuming an LDP for the joint distribution and using the conditional LDP theorem (Theorem 1), we will determine the value $y_0$ corresponding to $x_0$ to which $Y_n$ converges in probability under this conditioning.
By the Gärtner-Ellis Theorem [3], the joint distribution of $(X_n, Y_n)$ will satisfy an LDP if the free energy, $\Lambda : \mathbb{R}^d \times \mathbb{R}^d \to (-\infty, \infty]$, given by
$$\Lambda(\lambda_1, \lambda_2) = \lim_{n\to\infty} \frac{1}{a_n} \log \int_{\Omega_n} e^{a_n[\lambda_1 \cdot X_n(\omega) + \lambda_2 \cdot Y_n(\omega)]} \, dP_n(\omega),$$
is well defined and everywhere finite and differentiable. Assuming this to be the case, the rate function, $I$, is given by the Legendre-Fenchel transform of $\Lambda$, i.e.,
$$I(x, y) = \sup_{(\lambda_1, \lambda_2)} \{\lambda_1 \cdot x + \lambda_2 \cdot y - \Lambda(\lambda_1, \lambda_2)\} = \lambda_x \cdot x + \lambda_y \cdot y - \Lambda(\lambda_x, \lambda_y),$$
where $\lambda_x$ and $\lambda_y$ are such that $x = \nabla_1 \Lambda(\lambda_x, \lambda_y)$ and $y = \nabla_2 \Lambda(\lambda_x, \lambda_y)$. (The mapping $(\lambda_x, \lambda_y) \mapsto (x, y)$ is invertible if the Jacobian of $(\nabla_1 \Lambda, \nabla_2 \Lambda)$ exists everywhere and vanishes nowhere, by the inverse function theorem; we shall assume this is indeed the case.) It follows that $I$ is a good, essentially strictly convex (hence, effectively continuous) rate function. (See Ellis [4], Theorem VII.2.1.) As such, there is a unique point $(x^*, y^*)$ for which the rate function is zero and to which $(X_n, Y_n)$ converges in probability. In terms of the free energy, note that $x^* = \nabla_1 \Lambda(0, 0)$ and $y^* = \nabla_2 \Lambda(0, 0)$. The effective domain of $I$ will be denoted, as usual, by $D_I$.
To condition on $x_0$, we consider a conditional LDP with a conditioning set $B$ chosen so that $I$ has its infimum over $B$ at a unique point $(x_0, y_0)$ for some $y_0$. Not all choices of $x_0$ will allow for a suitable conditioning set, as boundary points may be problematic. One way to address this problem is to consider the LDP for $X_n$ alone. By the contraction principle, $X_n$ satisfies an LDP with rate function $I_X(x) = \inf_{y} I(x, y)$ and corresponding free energy $\Lambda_X(\,\cdot\,) = \Lambda(\,\cdot\,, 0)$. If we choose $x_0 \in \nabla \Lambda_X(\mathbb{R}^d)$, then clearly there exists a $\lambda_0 \in \mathbb{R}^d$ such that $x_0 = \nabla \Lambda_X(\lambda_0) = \nabla_1 \Lambda(\lambda_0, 0)$. As we have assumed invertibility, $\lambda_0$ is in fact uniquely determined by $x_0$.
Now choose
$$B = \{(x, y) \in D_I : \lambda_0 \cdot (x - x_0) \ge 0\}.$$
Using the value of $\lambda_0$ determined by $x_0$, define $y_0 = \nabla_2 \Lambda(\lambda_0, 0)$ and note that, since $(x_0, y_0) \in D_I$, $B$ is the intersection of the convex set $D_I$ with the affine half-space demarcated by the hyperplane containing the point $(x_0, y_0)$ and having a normal vector proportional to $(\lambda_0, 0)$ directed towards its interior; thus, $B$ is also convex. We shall now verify that the conditions of Theorem 1 do indeed hold.
We have already established that $I$ is a good, effectively continuous rate function and its domain is clearly Hausdorff, so it remains to verify the required conditions on $B$. Clearly $B$ is a convex set, and $B^\circ \subseteq B \subseteq D_I$. Due to the choice of $x_0$ and the assumed continuity of $(\nabla_1 \Lambda, \nabla_2 \Lambda)$, $B$ also has a nonempty interior, so the fact that it is convex implies $\overline{B^\circ} = \overline{B}$. This establishes the conditional LDP. It remains, then, to determine the corresponding rate function, i.e., to compute $\inf I(B)$.
For $(x, y) \in B$ we have that $\lambda_0 \cdot x \ge \lambda_0 \cdot x_0$, so
$$I(x, y) \ge \lambda_0 \cdot x - \Lambda(\lambda_0, 0) \ge \lambda_0 \cdot x_0 - \Lambda(\lambda_0, 0) = I(x_0, y_0),$$
with equality if and only if $(\lambda_x, \lambda_y) = (\lambda_0, 0)$. Thus, $I(x, y) \ge I(x_0, y_0)$ for all $(x, y) \in B$, and, since $(x_0, y_0) \in B$, we conclude that
$$\inf I(B) = I(x_0, y_0) = \lambda_0 \cdot x_0 - \Lambda(\lambda_0, 0),$$
with $y_0 = \nabla_2 \Lambda(\lambda_0, 0)$ and $\lambda_0$ given by $x_0 = \nabla_1 \Lambda(\lambda_0, 0)$. This gives the desired conditional rate function, from which it follows that $(X_n, Y_n)$ converges in probability to $(x_0, y_0)$. Note that, since $B \subseteq D_I$ is convex and $I$ is essentially strictly convex, $(x_0, y_0)$ is the unique point in $B$ at which $I$ attains the minimum value $\inf I(B)$.
Note that this result continues to hold if the condition $\lambda_0 \cdot (x - x_0) \ge 0$ is replaced by $\lambda_0 \cdot (x - x_0) \in [0, \delta)$, where $\delta > 0$ is arbitrary. (In statistical mechanics, this corresponds to a microcanonical distribution with a "thickened" energy shell.) Consequently, while we do not condition on $X_n = x_0$ precisely, we may restrict $X_n$ to be arbitrarily close to $x_0$.
The asymptotic value for $Y_n$, i.e., $y_0$, may be written in a more familiar form by evaluating $\nabla_2 \Lambda(\lambda_0, 0)$ explicitly. Since $\Lambda$ is convex and we have assumed it to be finite and differentiable, it follows from Theorem 25.7 of Rockafellar [12] that the convergence of the gradients is uniform; hence,
$$y_0 = \nabla_2 \Lambda(\lambda_0, 0) = \lim_{n\to\infty} \frac{\int_{\Omega_n} Y_n(\omega)\, e^{a_n \lambda_0 \cdot X_n(\omega)} \, dP_n(\omega)}{\int_{\Omega_n} e^{a_n \lambda_0 \cdot X_n(\omega')} \, dP_n(\omega')}, \tag{8}$$
which is the familiar canonical expectation.
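To make the construction concrete, consider a Gaussian case that is not part of the original text and is assumed here only for illustration: $d = 1$, $a_n = n$, and $(X_n, Y_n)$ the sample mean of $n$ IID bivariate normal vectors with zero means, unit variances, and correlation $\rho \in (-1, 1)$. Under these assumptions the free energy (written $\Lambda$ here) is exactly quadratic, and for $x_0 \neq 0$ each step can be carried out in closed form.

```latex
% Assumed example (not from the source): bivariate normal sample means,
% d = 1, a_n = n, correlation \rho \in (-1,1), conditioning value x_0 \neq 0.
\begin{align*}
\Lambda(\lambda_1, \lambda_2)
  &= \tfrac{1}{2}\left(\lambda_1^2 + 2\rho\lambda_1\lambda_2 + \lambda_2^2\right), \\
\nabla_1\Lambda(\lambda_1, \lambda_2) &= \lambda_1 + \rho\lambda_2, \qquad
\nabla_2\Lambda(\lambda_1, \lambda_2) = \rho\lambda_1 + \lambda_2, \qquad
(x^*, y^*) = (0, 0), \\
I(x, y) &= \frac{x^2 - 2\rho x y + y^2}{2(1 - \rho^2)}, \qquad D_I = \mathbb{R}^2, \\
x_0 = \nabla_1\Lambda(\lambda_0, 0) &\;\Longrightarrow\; \lambda_0 = x_0, \qquad
y_0 = \nabla_2\Lambda(\lambda_0, 0) = \rho x_0, \\
B &= \{(x, y) \in \mathbb{R}^2 : x_0 (x - x_0) \ge 0\}, \\
\inf I(B) &= I(x_0, \rho x_0)
  = \lambda_0 x_0 - \Lambda(\lambda_0, 0) = \tfrac{1}{2}x_0^2 .
\end{align*}
```

In this example the limit point $y_0 = \rho x_0$ is the usual linear regression of $Y$ on $X$, and $\inf I(B) = x_0^2/2$ agrees with the marginal rate function of $X_n$ evaluated at $x_0$.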
Using the contraction principle one may obtain an explicit LDP for $Y_n | x_0$, i.e., $Y_n$ conditioned on $\lambda_0 \cdot (X_n - x_0) \in [0, \delta)$. To do this, note that the projection map $(x, y) \mapsto y$ is continuous; hence, $Y_n | x_0$ satisfies an LDP with rate function for $y \in \mathbb{R}^d$. The last equality follows from an argument similar to that used to determine $\inf I(B)$. From the properties of $I$ it follows that $I_{x_0}$ is a good, essentially strictly convex rate function. Thus, the corresponding free energy, call it $\Lambda_{x_0}$, is given by for $\lambda \in \mathbb{R}^d$. The properties of finiteness and differentiability for $\Lambda_{x_0}$ follow from those of $\Lambda$ (or, more specifically, from those of $\Lambda(\lambda_0, \,\cdot\,)$), so the global minimum of $I_{x_0}$ is attained at $\nabla \Lambda_{x_0}(0) = \nabla_2 \Lambda(\lambda_0, 0) = y_0$, as expected. Note that $I_{x_0}$ may also be written directly in terms of $\Lambda_{x_0}$ via the relation
$$I_{x_0}(y) = \lambda \cdot y - \Lambda_{x_0}(\lambda). \tag{11}$$