1 Introduction and Main Results

Throughout, S is a metric space, \({\mathcal {B}}\) the Borel \(\sigma \)-field on S, and \((\mu _n:n\ge 0)\) a sequence of probability measures on \({\mathcal {B}}\). Moreover, \((T,{\mathcal {C}})\) is a measurable space, \(g:(S,{\mathcal {B}})\rightarrow (T,{\mathcal {C}})\) a measurable function, and

$$\begin{aligned} \sigma (g)=g^{-1}({\mathcal {C}})=\bigl \{g^{-1}(C):C\in {\mathcal {C}}\bigr \}. \end{aligned}$$

If \(\mu _n\rightarrow \mu _0\) weakly and \(\mu _0\) is separable (namely, \(\mu _0(A)=1\) for some separable \(A\in {\mathcal {B}}\)), then, on some probability space, there are S-valued random variables \(X_n\) such that \(X_n\sim \mu _n\) for all \(n\ge 0\) and \(X_n\overset{\text {a.s.}}{\longrightarrow }X_0\). This is the Skorohod representation theorem (SRT) as it appears after Skorohod [22], Dudley [12] and Wichura [24]. See [13, p. 130] and [23, p. 77] for historical notes, and [5] for the case where \(\mu _0\) is not separable. Other related references are [3, 4, 8,9,10, 14,15,16, 18, 21].

This paper stems from the following question. Suppose \(\mu _0\) is separable, \(\mu _n\rightarrow \mu _0\) in some sense, and

$$\begin{aligned} \mu _n=\mu _0\text { on }\sigma (g)\text { for all }n\ge 0. \end{aligned}$$
(1)

Is it possible to take the \(X_n\) in SRT such that \(g(X_n)=g(X_0)\) for all n? More precisely, the question is whether, on some probability space, there are S-valued random variables \(X_n\) satisfying

$$\begin{aligned} X_n\overset{\text {a.s.}}{\longrightarrow }X_0,\quad X_n\sim \mu _n\,\text { and }\,g(X_n)=g(X_0)\,\text { for all }n\ge 0. \end{aligned}$$
(2)

This question is intriguing, quite natural from a foundational point of view, and also has some practical implications. Examples are given in Sect. 3.

Some more notation is needed. If \({\mathcal {X}}\) is any topological space, \({\mathcal {B}}({\mathcal {X}})\) denotes the Borel \(\sigma \)-field on \({\mathcal {X}}\) and \(C_b({\mathcal {X}})\) the set of real bounded continuous functions on \({\mathcal {X}}\). In case \({\mathcal {X}}=S\), we just write \({\mathcal {B}}\) instead of \({\mathcal {B}}(S)\). Moreover, we say that \((T,{\mathcal {C}})\) is diagonal if

$$\begin{aligned} \bigl \{(t,t):t\in T\bigr \}\in {\mathcal {C}}\otimes {\mathcal {C}}. \end{aligned}$$

In particular, \((T,{\mathcal {C}})\) is diagonal if T is a separable metric space and \({\mathcal {C}}={\mathcal {B}}(T)\).

We are now able to state our main result.

Theorem 1

Suppose \((T,{\mathcal {C}})\) is diagonal, \(\mu _n\) is tight and \(\mu _n=\mu _0\) on \(\sigma (g)\) for every \(n\ge 0\). Then, there are S-valued random variables \(X_n\), defined on the same probability space, such that

$$\begin{aligned} X_n\sim \mu _n\,\text { and }\,g(X_n)=g(X_0)\,\text { for all }n\ge 0. \end{aligned}$$
(3)

Such \(X_n\) can be taken to meet condition (2) (namely, they also satisfy \(X_n\overset{\mathrm{a.s.}}{\longrightarrow }X_0\)) if and only if

$$\begin{aligned} E_{\mu _n}(f\mid g)\,\overset{\mu _0-\mathrm{a.s.}}{\longrightarrow }\,E_{\mu _0}(f\mid g)\quad \quad \text {for each }f\in C_b(S). \end{aligned}$$
(4)

In a nutshell, Theorem 1 states that, under mild conditions on \(\mu _n\) and \((T,{\mathcal {C}})\),

$$\begin{aligned} \text {(1)}\,\,\Leftrightarrow \,\,\text {(3)}\quad \quad \text {and}\quad \quad \text {(1)}\wedge \text {(4)}\,\,\Leftrightarrow \,\,\text {(2)}. \end{aligned}$$

The second equivalence is possibly more meaningful but the first may be useful as well; see, e.g., Example 3. We also note that the notation \(E_{\mu _n}(f\mid g)\) stands for

$$\begin{aligned} E_{\mu _n}(f\mid g)=E_{\mu _n}\bigl [f\mid g^{-1}({\mathcal {C}})\bigr ]. \end{aligned}$$

Some more remarks are in order.

  (i)

    Any countably generated sub-\(\sigma \)-field \({\mathcal {G}}\subset {\mathcal {B}}\) can be written as \({\mathcal {G}}=\sigma (g)\) for some Borel measurable function \(g:S\rightarrow {\mathbb {R}}\).

  (ii)

    Some assumption on \((T,{\mathcal {C}})\) is necessary. As an obvious example, take \(T=S\) and \({\mathcal {C}}\) the collection of countable and co-countable subsets of S. If g is the identity map, conditions (1) and (4) hold true whenever \(\mu _n\rightarrow \mu _0\) weakly and \(\mu _n\{x\}=0\) for all \(n\ge 0\) and \(x\in S\). But, since g is the identity, condition (3) fails unless \(\mu _n=\mu _0\) on all of \({\mathcal {B}}\).

  (iii)

    Let \(\mu =\sum _{n=0}^\infty 2^{-n-1}\mu _n\). In view of Theorem 1, it is tempting to say that \(\mu _n\) converges to \(\mu _0\) conditionally with respect to g, written as \(\mu _n\overset{g}{\longrightarrow }\mu _0\), whenever

    $$\begin{aligned} E_{\mu _n}(f\mid g)\,\overset{\mu -\text {a.s.}}{\longrightarrow }\,E_{\mu _0}(f\mid g)\quad \quad \text {for each }f\in C_b(S). \end{aligned}$$

    This notion of convergence allows for a version of SRT and reduces to weak convergence in the special case where g is constant. Furthermore, if S is Polish, \(\mu _n\overset{g}{\longrightarrow }\mu _0\) is equivalent to

    $$\begin{aligned} \gamma _n(x)\rightarrow \gamma _0(x)\text { weakly for }\mu \text {-almost all }x\in S, \end{aligned}$$

    where \(\gamma _n=\{\gamma _n(x):x\in S\}\) is a regular conditional distribution for \(\mu _n\) given \(\sigma (g)\); see Sect. 2.

  (iv)

    In condition (4), \(C_b(S)\) can be replaced by the set of bounded Lipschitz functions on S. Note also that, if g is constant, condition (1) is trivially true and condition (4) reduces to \(\mu _n\rightarrow \mu _0\) weakly. Thus, when the \(\mu _n\) are tight, SRT is contained in Theorem 1.

  (v)

    As an obvious application, suppose the \(\mu _n\) are tight and take a countable Borel partition \(\bigl \{H_0,H_1,\ldots \bigr \}\) of S (a numerical sketch of this application is given after this list). Then, there are random variables \(X_n\) such that

    $$\begin{aligned} X_n\overset{\text {a.s.}}{\longrightarrow }X_0,\quad X_n\sim \mu _n\,\text { and }\,1_{H_j}(X_n)=1_{H_j}(X_0)\,\text { for all }n,\,j\ge 0 \end{aligned}$$

    if and only if

    $$\begin{aligned} \mu _n(H_j)= & {} \mu _0(H_j)\text { for all }n,\,j\ge 0\text { and }\\ E_{\mu _0}(f\,1_{H_j})= & {} \lim _nE_{\mu _n}(f\,1_{H_j}) \text { for all }f\in C_b(S)\text { and }j\ge 0. \end{aligned}$$

    This follows from Theorem 1 with \(T=\bigl \{0,1,\ldots \bigr \}\), \({\mathcal {C}}\) the power set of T, and \(g(x)=j\) for all \(x\in H_j\).

  (vi)

    Suppose S is Polish, \({\mathcal {G}}\subset {\mathcal {B}}\) is a countably generated sub-\(\sigma \)-field, and some element \(A\in {\mathcal {G}}\) satisfies \(\{x\}\in {\mathcal {G}}\) for all \(x\in A\). Then, \(A\cap {\mathcal {G}}=A\cap {\mathcal {B}}\) by a result of Blackwell [7], and one can take \(T=S\) and \(g(x)=x\) for all \(x\in A\). Therefore, under condition (3), one obtains \(X_n=X_0\) on the set \(\{X_0\in A\}\).

  (vii)

    A weak version of condition (2) is

    $$\begin{aligned} X_n\rightarrow X_0\text { in probability},\quad X_n\sim \mu _n\,\text { and }\,g(X_n)=g(X_0)\,\text { for all }n\ge 0. \end{aligned}$$

    This condition can be characterized by the same argument used for Theorem 1. If \((T,{\mathcal {C}})\) is diagonal, the \(\mu _n\) are tight and condition (1) holds, the weak version of (2) is actually equivalent to

    $$\begin{aligned} E_{\mu _n}(f\mid g)\rightarrow E_{\mu _0}(f\mid g),\text { in }\mu _0\text {-probability, for each }f\in C_b(S). \end{aligned}$$
    (5)

    Indeed, as shown by Example 1, it may be that conditions (1) and (5) hold but condition (4) fails.
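
As announced in remark (v), here is a minimal numerical sketch of the two conditions appearing there, for measures supported by a fixed finite set; the support, the partition and the perturbation below are illustrative assumptions, not part of the theorem.

```python
# Toy check of remark (v): the blocks H_0 = {x < 0} and H_1 = {x >= 0} form
# a Borel partition, each mu_n gives the same mass to every H_j, and the
# "local" integrals E_{mu_n}(f 1_{H_j}) converge to E_{mu_0}(f 1_{H_j}).
import numpy as np

xs = np.linspace(-1.0, 1.0, 8)                 # the common (finite) support
H0 = xs < 0                                    # H_1 is the complement
mu0 = np.full(8, 0.125)
pert = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])

def mu(n):
    # the perturbation sums to 0 inside each block, so mu_n(H_j) = mu_0(H_j)
    return mu0 + 0.05 * pert / (n + 1)

f = np.cos                                     # some f in C_b(S)
print((f(xs) * mu0)[H0].sum())                 # E_{mu_0}(f 1_{H_0})
for n in (1, 10, 100):
    print((f(xs) * mu(n))[H0].sum())           # -> the value above
```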

To motivate Theorem 1, in addition to the previous remarks, some examples are given in Sect. 3. Here, we close this section with three corollaries.

Given the metric spaces \(S_1\) and \(S_2\), define

$$\begin{aligned} S=S_1\times S_2\quad \text {and}\quad g(x,y)=x\quad \quad \text {for all }(x,y)\in S. \end{aligned}$$

If \(S_1\) is separable, it is possible to let \(T=S_1\) and \({\mathcal {C}}={\mathcal {B}}(S_1)\), so that

$$\begin{aligned} \sigma (g)=\bigl \{A\times S_2:A\in {\mathcal {B}}(S_1)\bigr \}. \end{aligned}$$

Therefore, Theorem 1 yields:

Corollary 2

Let \(\nu (\cdot )=\mu _0(\cdot \times S_2)\) be the marginal of \(\mu _0\) on \(S_1\). Suppose \(S_1\) is separable, \(\mu _n\) is tight and

$$\begin{aligned} \mu _n(\cdot \times S_2)=\nu (\cdot )\quad \quad \text {for every }n\ge 0. \end{aligned}$$

Then, on some probability space, there are random variables Y and \((Z_n:n\ge 0)\), where Y is \(S_1\)-valued and the \(Z_n\) are \(S_2\)-valued, such that

$$\begin{aligned} (Y,Z_n)\sim \mu _n\text { for all }n\ge 0. \end{aligned}$$

Moreover, the \(Z_n\) can be taken such that \(Z_n\overset{\text {a.s.}}{\longrightarrow }Z_0\) if and only if

$$\begin{aligned} E_{\mu _n}(f\mid g)\,\overset{\nu -\text {a.s.}}{\longrightarrow }\,E_{\mu _0}(f\mid g)\quad \quad \text {for each }f\in C_b(S_2). \end{aligned}$$
(6)

An obvious application of Corollary 2 is as follows. Let \((U_n,V_n)\) be a sequence of random variables such that \(U_n\sim U_0\) for all \(n\ge 0\). For some reason, we would like to replace \((U_n,V_n)\) with another sequence \(X_n=(Y,Z_n)\), possibly defined on a different probability space, such that \(X_n\overset{\text {a.s.}}{\longrightarrow }X_0\) and \(X_n\sim (U_n,V_n)\) for all n. Note that all the \(X_n\) have the same first coordinate Y; this is essential in various frameworks, including optimal transport and stochastic control. Corollary 2 states that, under mild conditions, replacing \((U_n,V_n)\) with \(X_n\) is admissible if and only if condition (6) holds with \(\mu _n\) the probability distribution of \((U_n,V_n)\). See also [4, Prop. 2] for an analogous result.
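
To make this concrete, the following is a hedged sketch of condition (6) under the illustrative assumption that \((U_n,V_n)\) is bivariate Gaussian with a common first coordinate; the model and the Monte Carlo routine below are not part of the corollary.

```python
# Gaussian model: U_n = U ~ N(0,1) and V_n = rho_n*U + sqrt(1-rho_n^2)*eps,
# with eps ~ N(0,1) independent of U. Then E(f(V_n)|U=u) converges for every
# u whenever rho_n -> rho_0, so condition (6) holds and the replacement
# (U_n,V_n) ~> (Y,Z_n) is admissible.
import numpy as np

rng = np.random.default_rng(0)
eps = rng.normal(size=200_000)                 # common noise for all rho_n

def cond_exp(f, u, rho):
    """Monte Carlo evaluation of E(f(V_n) | U = u) in the Gaussian model."""
    return f(rho * u + np.sqrt(1.0 - rho**2) * eps).mean()

f, u = np.tanh, 0.7                            # some f in C_b(R) and a point u
for rho in (0.6, 0.52, 0.502, 0.5):            # rho_n -> rho_0 = 0.5
    print(cond_exp(f, u, rho))                 # values converge to the last one
```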

The second corollary concerns the tightness condition. The main reason for assuming \(\mu _n\) tight, for every \(n\ge 0\), is to reduce the proof of Theorem 1 to the special case where S is a Borel subset of a Polish space. We do not know whether the tightness condition can be weakened for all \(n\ge 0\). However, arguing as in [3], the tightness condition on \(\mu _0\) can be dropped at the price of requiring something more on S and g.

Corollary 3

Suppose \((T,{\mathcal {C}})\) is diagonal, S is a subset of a Polish space \({\mathcal {X}}\), and g can be extended to a function \(\phi :{\mathcal {X}}\rightarrow T\) such that \(\phi ^{-1}({\mathcal {C}})\subset {\mathcal {B}}({\mathcal {X}})\). Suppose also that \(\mu _n\) is tight and \(\mu _n=\mu _0\) on \(\sigma (g)\) for each \(n\ge 1\). Then, condition (3) holds for some S-valued random variables \(X_n\) (defined on the same probability space). Moreover, under condition (4), the \(X_n\) also satisfy condition (2).

The third corollary deals with the probability space, say \((\Omega ,{\mathcal {A}},P)\), where the \(X_n\) can be defined. In Theorem 1 and Corollary 2, one can take \(\Omega =(0,1)^2\), \({\mathcal {A}}={\mathcal {B}}(\Omega )\) and P the Lebesgue measure. In Corollary 3, since \(\mu _0\) is not necessarily tight, \(\Omega \) is a suitable subset of \((0,1)^2\) and P the Lebesgue outer measure. In our last result, the \(\mu _n\) are the probability distributions of random variables \(U_n\) defined on the same probability space. In this case, a question is whether the \(X_n\) can be defined on the probability space where the \(U_n\) live.

Corollary 4

Let \((U_n:n\ge 0)\) be a sequence of S-valued random variables on the probability space \((\Omega ,{\mathcal {A}},P)\). Define \(\mu _n(\cdot )=P(U_n\in \cdot )\) and suppose

  • \((T,{\mathcal {C}})\) is diagonal and \((\Omega ,{\mathcal {A}},P)\) is nonatomic;

  • Conditions (1) and (4) hold and \(\mu _n\) is tight for each \(n\ge 0\).

Then, \((\Omega ,{\mathcal {A}},P)\) supports S-valued random variables \(X_n\) satisfying condition (2).

It is worth noting that, for \((\Omega ,{\mathcal {A}},P)\) to be nonatomic, a sufficient condition is \(P(U_n=x)=0\) for some \(n\ge 0\) and all \(x\in S\); see, e.g., [3, Lem. 3.3].

2 Preliminaries

In the sequel, m is the Lebesgue measure on \({\mathcal {B}}((0,1))\) and \(\delta _z\) the unit mass at the point z. We denote by \({\mathcal {P}}\) the collection of all probability measures on \({\mathcal {B}}\) and we write \(\mu (f)=\int f\,\hbox {d}\mu \) whenever \(\mu \in {\mathcal {P}}\) and \(f:S\rightarrow {\mathbb {R}}\) is a bounded Borel function.

Let \(F\subset C_b(S)\) and \({\mathcal {Q}}\subset {\mathcal {P}}\). Say that F is a convergence determining class for \({\mathcal {Q}}\) if, for any sequence \((\lambda _n:n\ge 0)\subset {\mathcal {Q}}\),

$$\begin{aligned} \lambda _n\rightarrow \lambda _0\text { weakly}\quad \Leftrightarrow \quad \lambda _0(f)=\lim _n\lambda _n(f)\text { for each }f\in F. \end{aligned}$$

Let \({\mathcal {G}}\subset {\mathcal {B}}\) be a sub-\(\sigma \)-field. We recall that a regular conditional distribution (r.c.d.) for \(\mu _n\) given \({\mathcal {G}}\) is a collection \(\gamma _n=\{\gamma _n(x):x\in S\}\) such that

  • \(\gamma _n(x)\in {\mathcal {P}}\) for each \(x\in S\);

  • \(x\mapsto \gamma _n(x)(B)\) is \({\mathcal {G}}\)-measurable for each \(B\in {\mathcal {B}}\);

  • \(\int _A\gamma _n(x)(B)\,\mu _n(\hbox {d}x)=\mu _n(A\cap B)\) for all \(A\in {\mathcal {G}}\) and \(B\in {\mathcal {B}}\).

A r.c.d. for \(\mu _n\) given \({\mathcal {G}}\) exists and is \(\mu _n\)-a.s. unique whenever \({\mathcal {B}}\) is countably generated and \(\mu _n\) tight.

The following version of SRT is involved in the proof of Theorem 1.

Theorem 5

(Blackwell and Dubins [8]) If S is Polish, there is a Borel map \(\Phi :(0,1)\times {\mathcal {P}}\rightarrow S\) such that

  • \(m\bigl \{\beta \in (0,1):\Phi (\beta ,\lambda )\in B\bigr \}=\lambda (B)\) for all \(\lambda \in {\mathcal {P}}\) and \(B\in {\mathcal {B}}\);

  • \(m\bigl \{\beta \in (0,1):\Phi (\beta ,\lambda _n)\rightarrow \Phi (\beta ,\lambda _0)\bigr \}=1\) if \(\lambda _n\in {\mathcal {P}}\) and \(\lambda _n\rightarrow \lambda _0\) weakly.

A clear and detailed proof of Theorem 5 can be found in [16, pp. 52–54] (Blackwell and Dubins actually provide only a sketch of the proof). Moreover, it is straightforward to verify that Theorem 5 is still valid if S is only a Borel subset of a Polish space.
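
For illustration, the following is a minimal numerical sketch of Theorem 5 in the special case \(S={\mathbb {R}}\), where one may take \(\Phi (\beta ,\lambda )\) to be the \(\lambda \)-quantile of \(\beta \); this reduction, and the Gaussian laws below, are assumptions made for concreteness and not the construction of [8].

```python
# Quantile version of Phi for S = R: Phi(beta, law) = inf{x : F_law(x) >= beta}.
# The first property of Theorem 5 is the standard quantile transform; the
# second reduces to a.e. convergence of quantile functions under weak
# convergence of laws.
import numpy as np
from scipy import stats

def Phi(beta, law):
    # `law` is a frozen scipy.stats distribution; .ppf is its quantile map
    return law.ppf(beta)

beta = np.random.default_rng(0).uniform(size=100_000)  # beta ~ m on (0,1)
law_n = stats.norm(loc=0.01)                           # law_n -> law_0 weakly
law_0 = stats.norm(loc=0.0)

X_n, X_0 = Phi(beta, law_n), Phi(beta, law_0)
print(abs(X_n.mean() - 0.01))             # small: Phi(., law_n) ~ law_n
print(np.max(np.abs(X_n - X_0)))          # = 0.01: pointwise convergence
```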

We finally state two technical lemmas. The first is certainly known, but we give a proof since we are not aware of any explicit reference.

Lemma 6

A measurable space \((T,{\mathcal {C}})\) is diagonal if and only if there is a countably generated sub-\(\sigma \)-field \({\mathcal {C}}_0\subset {\mathcal {C}}\) such that \(\{t\}\in {\mathcal {C}}_0\) for all \(t\in T\). Moreover, if \((T,{\mathcal {C}})\) is diagonal and Q is a probability measure on \({\mathcal {C}}\otimes {\mathcal {C}}\), then

$$\begin{aligned} Q\bigl \{(t,t):t\in T\bigr \}=1\quad \Leftrightarrow \quad Q\bigl (A\times A^c\bigr )=0\text { for all }A\in {\mathcal {C}}. \end{aligned}$$
(7)

Proof

Let \(\Delta =\bigl \{(t,t):t\in T\bigr \}\). If \((T,{\mathcal {C}})\) is diagonal, there are \(A_n,\,B_n\in {\mathcal {C}}\) such that \(\Delta \in \sigma (A_1\times B_1,A_2\times B_2,\ldots )\). Let \({\mathcal {C}}_0=\sigma (A_1,B_1,A_2,B_2,\ldots )\). Then, \({\mathcal {C}}_0\) is countably generated and \(\Delta \in {\mathcal {C}}_0\otimes {\mathcal {C}}_0\). Therefore, \(\{t\}=\{u:(t,u)\in \Delta \}\in {\mathcal {C}}_0\) for all \(t\in T\). Conversely, if \({\mathcal {C}}_0\subset {\mathcal {C}}\) is countably generated and includes the singletons, there is a distance \(\rho \) on T such that \((T,\rho )\) is a separable metric space and \({\mathcal {C}}_0\) is the Borel \(\sigma \)-field on T under \(\rho \). Therefore, \(\Delta \in {\mathcal {C}}_0\otimes {\mathcal {C}}_0\subset {\mathcal {C}}\otimes {\mathcal {C}}\).

Finally, we turn to (7). Fix \({\mathcal {C}}_0\) as above, and recall that \((T,{\mathcal {C}}_0)\) can be regarded as a separable metric space equipped with the Borel \(\sigma \)-field. Hence, \(Q(\Delta )=1\) provided \(Q\bigl (A\times A^c\bigr )=0\) for all \(A\in {\mathcal {C}}_0\). This proves "\(\Leftarrow \)" while "\(\Rightarrow \)" is trivial. \(\square \)

Lemma 7

If B is a \(\sigma \)-compact subset of S, there is a countable convergence determining class for \({\mathcal {Q}}=\bigl \{\mu \in {\mathcal {P}}:\mu (B)=1\bigr \}\).

Proof

Since B is \(\sigma \)-compact, there is a sequence \((x_n)\subset B\) such that \(\overline{\{x_1,x_2,\ldots \}}={\overline{B}}\). Let \(H= [0,1]^\infty \) be the Hilbert cube (equipped with the usual metric) and

$$\begin{aligned} h(x)=\bigl (d(x,x_1)\wedge 1,d(x,x_2)\wedge 1,\ldots \bigr )\quad \quad \text {for all }x\in S, \end{aligned}$$

where d is the distance on S. Then, \(h:S\rightarrow H\) is continuous, it is a homeomorphism as a map \(h:B\rightarrow h(B)\), and \(h(B)\in {\mathcal {B}}(H)\) (for h(B) is \(\sigma \)-compact). Take a countable subset \(G\subset C_b(H)\), dense in \(C_b(H)\) under the sup-norm, and define

$$\begin{aligned} F=\bigl \{g\circ h:g\in G\bigr \}. \end{aligned}$$

Then, \(F\subset C_b(S)\) is countable. Suppose now that \(\lambda _n\in {\mathcal {Q}}\) and \(\lambda _n(f)\rightarrow \lambda _0(f)\) for each \(f\in F\). Then, \(\lambda _n\circ h^{-1}\rightarrow \lambda _0\circ h^{-1}\) weakly, since G is dense in \(C_b(H)\). Hence, \(\lambda _n\rightarrow \lambda _0\) weakly follows from the fact that \(h:B\rightarrow h(B)\) is a homeomorphism and \(\lambda _n\circ h^{-1}\bigl (h(B)\bigr )\ge \lambda _n(B)=1\) for all \(n\ge 0\). This concludes the proof. \(\square \)
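
As a small computational aside, the embedding h used above can be rendered as follows; truncating h to finitely many coordinates, and the choice \(B=[0,1]\), are assumptions made only to have something computable.

```python
# First coordinates of the Hilbert-cube embedding h(x) = (d(x,x_i) /\ 1)_i,
# with (x_i) a dense sequence in B. Distinct points of B get distinct images,
# and h is a homeomorphism of B onto h(B).
import numpy as np

def h(x, anchors, d, k=8):
    """First k coordinates of the embedding."""
    return np.minimum([d(x, a) for a in anchors[:k]], 1.0)

anchors = np.linspace(0.0, 1.0, 50)       # a dense-ish sequence in B = [0,1]
d = lambda x, y: abs(x - y)               # the metric of S = R
print(h(0.3, anchors, d))
```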

3 Examples

It may be that conditions (1) and (5) hold but condition (4) fails. In this case, condition (2) cannot be realized (since (4) fails). However, as noted in Remark (vii), some random variables \(X_n\) satisfy a weak version of (2).

Example 1

Let

$$\begin{aligned} S=[-1,1],\quad T=[0,1],\quad {\mathcal {C}}={\mathcal {B}}(T)\quad \text {and}\quad g(x)=|x|. \end{aligned}$$

Take a sequence \((B_n:n\ge 1)\subset {\mathcal {B}}[0,1]\) such that

$$\begin{aligned} \bigcap _{n=1}^\infty \bigcup _{j=n}^\infty B_j=[0,1]\quad \text {and}\quad \lim _nm(B_n)=0. \end{aligned}$$

Using such \(B_n\), define \(C_n=[-1,1]\setminus B_n\) and

$$\begin{aligned} f_n(x)=(1/2)\,1_{C_n}(x)+(1/4)\,\bigl \{1_{B_n}(x)+1_{B_n}(-x)\bigr \}\quad \quad \text {for all }x\in [-1,1]. \end{aligned}$$

Moreover, let \(f_0=1/2\) and

$$\begin{aligned} \mu _n(\hbox {d}x)=f_n(x)\,\hbox {d}x\quad \quad \text {for all }n\ge 0. \end{aligned}$$

Then,

$$\begin{aligned} \mu _n\circ g^{-1}=\mu _0\circ g^{-1}=m\quad \quad \text {for all }n\ge 0. \end{aligned}$$

Furthermore, a r.c.d. \(\gamma _n=\{\gamma _n(x):x\in S\}\) for \(\mu _n\) given \(\sigma (g)\) (see Sect. 2) is

$$\begin{aligned} \gamma _0(x)= & {} (1/2)\,\bigl \{\delta _{|x|}+\delta _{-|x|}\bigr \}\quad \quad \text {and}\\ \gamma _n(x)= & {} \gamma _0(x)\,\text { if }|x|\notin B_n\quad \text {and}\quad \gamma _n(x)=\frac{\delta _{|x|}+3\,\delta _{-|x|}}{4}\,\text { if }|x|\in B_n. \end{aligned}$$

Thus, for each \(f\in C_b[-1,1]\) and \(\epsilon >0\),

$$\begin{aligned} \mu _0\bigl \{x\in [-1,1]:|\gamma _n(x)(f)-\gamma _0(x)(f)|>\epsilon \bigr \}\le m(B_n)\rightarrow 0. \end{aligned}$$

Therefore, conditions (1) and (5) are both satisfied. However, condition (4) fails, since every \(x\in [-1,1]\) is such that \(|x|\in B_n\) for infinitely many n.
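
A simulation sketch of this example is as follows; the "typewriter" choice of \((B_n)\) below is an assumption (any sequence with the two stated properties would do).

```python
# B_1 = [0,1], B_2 = [0,1/2], B_3 = [1/2,1], B_4 = [0,1/4], ... Every t lies
# in infinitely many B_n while m(B_n) -> 0, which is exactly what makes (5)
# hold and (4) fail.
import numpy as np

def B(n):
    """n-th typewriter interval: level k and slot j, where n = 2**k + j."""
    k = int(np.floor(np.log2(n)))
    j = n - 2**k
    return j / 2**k, (j + 1) / 2**k

def sample_mu(n, size, rng):
    """Draw from mu_n: |X| ~ m on [0,1]; the sign is +1 with probability 1/2,
    except that P(sign = +1) = 1/4 when |X| lies in B_n (the r.c.d. gamma_n)."""
    t = rng.uniform(size=size)
    lo, hi = B(n)
    p_plus = np.where((t >= lo) & (t <= hi), 0.25, 0.5)
    sign = np.where(rng.uniform(size=size) < p_plus, 1.0, -1.0)
    return sign * t

rng = np.random.default_rng(0)
print(sample_mu(100, 100_000, rng).mean())  # slightly < 0: mass moved to -B_n
```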

In a sense, the next example completes the previous one.

Example 2

Let \(S=S_1\times S_2\), where \(S_1\) and \(S_2\) are Polish spaces, and \((U_n,V_n)\) a sequence of S-valued random variables such that

$$\begin{aligned} (U_n,V_n)\rightarrow (U_0,V_0)\text { in distribution}\quad \text {and}\quad U_n\sim U_0\text { for all }n\ge 0. \end{aligned}$$

Then, by [17, Cor. 2.9], one obtains \(E\bigl [f(U_n,V_n)\bigr ]\rightarrow E\bigl [f(U_0,V_0)\bigr ]\) for each bounded Borel function \(f:S\rightarrow {\mathbb {R}}\) which is continuous in the second coordinate. Nevertheless, it may be that \(E\bigl [f(V_n)\mid U_n\bigr ]\) does not converge to \(E\bigl [f(V_0)\mid U_0\bigr ]\), even in distribution, for some \(f\in C_b(S_2)\). As an example, take \(S_1=S_2=(0,1)\) and \((U_0,V_0)\) uniform on \(S=(0,1)^2\). Then, \((U_n,V_n)\rightarrow (U_0,V_0)\) in distribution for some sequence \((U_n,V_n)\) such that

$$\begin{aligned} U_n\sim V_n\sim m\quad \text {and}\quad V_n\overset{\text {a.s.}}{=}\varphi _n(U_n)\quad \text {for all }n\ge 1 \end{aligned}$$

where the \(\varphi _n:(0,1)\rightarrow (0,1)\) are suitable Borel functions; see, e.g., [1, Prop. 2.7]. Therefore,

$$\begin{aligned} E(V_0\mid U_0)\overset{\text {a.s.}}{=}E(V_0)=1/2\quad \text {and}\quad E(V_n\mid U_n)\overset{\text {a.s.}}{=}V_n\sim m\text { for all }n\ge 1. \end{aligned}$$

Another example, similar to the previous one, is [11, Ex. 6.1]. Even in this case, \(E\bigl [f(V_n)\mid U_n\bigr ]\) does not converge in distribution to \(E\bigl [f(V_0)\mid U_0\bigr ]\) for some \(f\in C_b(S_2)\). In addition, one also obtains \(U_n\sim V_n\sim m\) for all \(n\ge 0\) and \(E\bigl [h(U_0,V_0)\bigr ]=\lim _nE\bigl [h(U_n,V_n)\bigr ]\) for each bounded Borel function \(h:S\rightarrow {\mathbb {R}}\).
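
For concreteness, a classical choice of the maps \(\varphi _n\) in the first construction of this example is \(\varphi _n(u)=2^nu \bmod 1\); this particular choice is an assumption ([1, Prop. 2.7] is more general), and a quick check is sketched below.

```python
# With U ~ m and V_n = 2**n * U mod 1, the pair (U, V_n) converges in
# distribution to the uniform law on (0,1)^2 (by mixing of the doubling map),
# while V_n ~ m and E(V_n | U_n) = V_n for every n.
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(size=200_000)
for n in (1, 4, 12):
    V = (2**n * U) % 1.0                   # V_n = phi_n(U_n), still ~ m
    print(n, np.corrcoef(U, V)[0, 1])      # ~ 2**(-n) -> 0: decorrelation
```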

Two remarks are in order. First, to obtain the weak version of (2) involved in Remark (vii), condition (5) cannot be dropped. As noted in [11], however, condition (5) can be weakened to

$$\begin{aligned} E_{\mu _n}(f\mid g)\rightarrow E_{\mu _0}(f\mid g),\text { in distribution, for each }f\in C_b(S). \end{aligned}$$

Second, if \(((U, V_n):n\ge 0)\) are S-valued random variables, it may be that \(V_n\rightarrow V_0\) in probability and yet there are no random variables \((Y,Z_n)\) satisfying \(Z_n\overset{\text {a.s.}}{\longrightarrow }Z_0\) and \((Y,Z_n)\sim (U,V_n)\) for all \(n\ge 0\); see Example 1.

Let d be the distance on S. Sometimes, one aims to realize the \(\mu _n\) by random variables which converge (say in probability) under some distance \(\rho \) stronger than d; see [4, 5]. This motivates the next example.

Example 3

Suppose S is separable and the \(\mu _n\) are tight. Fix \(A\in {\mathcal {B}}\) and define

$$\begin{aligned} b(x,y)=1_A(x)\,1\wedge d(x,y)+1_{A^c}(x)\,1\wedge \rho (x,y)\quad \quad \text {for }(x,y)\in S^2, \end{aligned}$$

where \(\rho \) is any distance on S such that the map \((x,y)\mapsto \rho (x,y)\) is measurable with respect to \({\mathcal {B}}\otimes {\mathcal {B}}\). For instance, \(\rho \) could be the 0-1 distance. Or else, S could be the set of real cadlag functions on [0, 1], d the Skorohod distance, and \(\rho \) the uniform distance.

A question is whether there are S-valued random variables \(X_n\) such that

$$\begin{aligned} \lim _nE\bigl \{b(X_0,X_n)\bigr \}=0\quad \text {and}\quad X_n\sim \mu _n\text { for each }n\ge 0. \end{aligned}$$
(8)

Equivalently, the question is whether the \(\mu _n\) can be realized by some \(X_n\) such that \(d(X_n,X_0)\rightarrow 0\) in probability on the set \(\{X_0\in A\}\) and \(\rho (X_n,X_0)\rightarrow 0\) in probability on \(\{X_0\notin A\}\).

Corollary 2 makes it possible to answer this question. For any \(\mu ,\,\lambda \in {\mathcal {P}}\), let \(\Gamma (\mu ,\lambda )\) denote the collection of those probability measures \(\tau \) on \({\mathcal {B}}\otimes {\mathcal {B}}\) such that \(\tau (\cdot \times S)=\mu \) and \(\tau (S\times \cdot )=\lambda \). By Corollary 2, there are \(X_n\) satisfying condition (8) if and only if

$$\begin{aligned} \lim _n\tau _n(b)=0\quad \text {for some sequence }\tau _n\in \Gamma (\mu _0,\mu _n). \end{aligned}$$

In turn, by a duality theorem [20], the above condition can be written as

$$\begin{aligned} \sup _{(f,h)}\bigl \{\mu _0(f)+\mu _n(h)\bigr \}\rightarrow 0 \end{aligned}$$
(9)

where the \(\sup \) is over all pairs \((f,\,h)\) of bounded Borel functions on S such that

$$\begin{aligned} f(x)+h(y)\le b(x,y)\quad \quad \text {for all }(x,y)\in S^2. \end{aligned}$$

The equivalence between (8) and (9), obtained above, improves [4, Theorem 4]. It also improves [21, Theorem 2.1] in the special case where S is separable and the \(\mu _n\) are tight.
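
As a computational aside, for finitely supported measures the quantity \(\min _\tau \tau (b)\) over \(\tau \in \Gamma (\mu _0,\mu _n)\) is a small linear program (a discrete optimal-transport problem). The sketch below is a hedged illustration: the data, and the choice \(A=S\) so that \(b=1\wedge d\), are assumptions.

```python
# Discrete version of condition (8): minimize tau(b) over couplings tau of
# (mu0, mun), i.e. over nonnegative matrices with prescribed marginals.
import numpy as np
from scipy.optimize import linprog

def min_coupling_cost(mu0, mun, b):
    k, l = len(mu0), len(mun)
    A_eq = np.zeros((k + l, k * l))
    for i in range(k):
        A_eq[i, i * l:(i + 1) * l] = 1.0   # i-th row sum equals mu0[i]
    for j in range(l):
        A_eq[k + j, j::l] = 1.0            # j-th column sum equals mun[j]
    res = linprog(b.ravel(), A_eq=A_eq, b_eq=np.concatenate([mu0, mun]),
                  bounds=(0, None), method="highs")
    return res.fun

xs = np.linspace(-1.0, 1.0, 5)                          # a common finite support
b = np.minimum(np.abs(xs[:, None] - xs[None, :]), 1.0)  # b = 1 /\ d (A = S)
mu0 = np.full(5, 0.2)
mun = np.array([0.2, 0.2, 0.2, 0.1, 0.3])
print(min_coupling_cost(mu0, mun, b))      # small, and -> 0 as mun -> mu0
```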

The next example deals with exchangeable sequences, but the same argument applies to many other types of sequences, including martingale difference sequences and stationary sequences.

Example 4

Let \((T_n:n\ge 1)\) be an exchangeable sequence of real random variables on the probability space \((\Omega ,{\mathcal {A}},P)\). Suppose \(E(T_1^2)<\infty \) and define

$$\begin{aligned}&{\mathcal {T}}=\bigcap _n\sigma (T_n,T_{n+1},\ldots ),\quad V=E\bigl (T_1^2\mid {\mathcal {T}}\bigr )-E\bigl (T_1\mid {\mathcal {T}}\bigr )^2\\ \text {and}\quad \quad&M_n=\sqrt{n}\,\,\Bigl \{\,\frac{\sum _{i=1}^nT_i}{n}-E\bigl (T_1\mid {\mathcal {T}}\bigr )\Bigr \}. \end{aligned}$$

Moreover, let N be a standard normal random variable independent of \((T_n)\) and

$$\begin{aligned} M_0=\sqrt{V}N. \end{aligned}$$

Then, \(M_n\rightarrow M_0\) in distribution (and even stably); see, e.g., [2, Theorem 3.1]. In addition, even if the tail \(\sigma \)-field \({\mathcal {T}}\) is not countably generated, there is a real random variable U such that

$$\begin{aligned} \sigma ({\mathcal {T}}\cup {\mathcal {N}})=\sigma \bigl (\sigma (U)\cup {\mathcal {N}}\bigr )\quad \text {where}\quad {\mathcal {N}}=\bigl \{A\in {\mathcal {A}}:P(A)=0\bigr \}. \end{aligned}$$

Apart from trivial cases, \(M_n\) fails to converge in probability. However, thanks to Corollary 2, some real random variables Y and \(Z_n\) satisfy

$$\begin{aligned} Z_n\overset{\text {a.s.}}{\longrightarrow }Z_0\quad \text {and}\quad (Y,Z_n)\sim (U,M_n)\text { for each }n\ge 0. \end{aligned}$$

Take in fact \(S_1=S_2={\mathbb {R}}\) and define \(\mu _n\) to be the probability distribution of \((U,M_n)\). Conditionally on \({\mathcal {T}}\), the sequence \((T_n)\) is i.i.d. with mean \(E\bigl (T_1\mid {\mathcal {T}}\bigr )\) and variance V. Hence, given \(f\in C_b({\mathbb {R}})\), the standard CLT yields

$$\begin{aligned} E\bigl \{f(M_n)\mid U\bigr \}=E\bigl \{f(M_n)\mid {\mathcal {T}}\bigr \}\overset{\text {a.s.}}{\longrightarrow }\int f(x)\,N(0,V)(\hbox {d}x), \end{aligned}$$

where N(0, V) denotes the Gaussian law with mean 0 and (random) variance V with \(N(0,0)=\delta _0\). On the other hand,

$$\begin{aligned} E\bigl \{f(M_0)\mid U\bigr \}=E\bigl \{f(M_0)\mid {\mathcal {T}}\bigr \}=\int f(x)\,N(0,V)(\hbox {d}x). \end{aligned}$$

Therefore, condition (6) is satisfied.
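
A quick simulation sketch of this example is given below, under the illustrative assumption \(T_i=W+\epsilon _i\) with W a fair \(\pm 1\) coin independent of the i.i.d. standard normal \(\epsilon _i\); in this case \(E(T_1\mid {\mathcal {T}})=W\) and \(V=1\).

```python
# Here M_n = sqrt(n)*(mean_n - W) = sqrt(n)*mean(eps), whose law is close to
# N(0, V) = N(0, 1) for large n, in line with [2, Theorem 3.1].
import numpy as np

rng = np.random.default_rng(0)
reps, n = 5_000, 1_000
W = rng.choice([-1.0, 1.0], size=(reps, 1))       # the mixing variable
X = W + rng.normal(size=(reps, n))                # rows: (T_1, ..., T_n)
M_n = np.sqrt(n) * (X.mean(axis=1) - W[:, 0])     # sqrt(n)*(mean_n - E(T_1|T))
print(M_n.mean(), M_n.var())                      # ~ 0 and ~ 1 = V
```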

Our last example concerns conditionally identically distributed sequences.

Example 5

Let \(S={\mathbb {R}}^\infty \) and T the set of probability measures on \({\mathcal {B}}({\mathbb {R}})\) (equipped with the topology of weak convergence). Moreover, let \({\mathcal {C}}={\mathcal {B}}(T)\) and \(g:S\rightarrow T\) the weak limit of the empirical measures. Precisely, for each \(x=(x_1,x_2,\ldots )\in S\),

$$\begin{aligned} g(x)=\lim _n\,\frac{1}{n}\,\sum _{i=1}^n\delta _{x_i}\,\,\text { provided the limit exists and }\,\,g(x)=\delta _{x_1}\,\,\text { otherwise}, \end{aligned}$$

where the limit is meant as a weak limit of probability measures.

A sequence \(Y=(Y_1,Y_2,\ldots )\) of real random variables is conditionally identically distributed (c.i.d.) if

$$\begin{aligned} (Y_1,\ldots ,Y_n,Y_{n+2})\sim (Y_1,\ldots ,Y_n,Y_{n+1})\text { for all }n\ge 0. \end{aligned}$$

An exchangeable sequence is c.i.d. but not conversely; see, e.g., [6] and references therein. However, as in the exchangeable case, if Y is c.i.d. one obtains

$$\begin{aligned} \frac{1}{n}\,\sum _{i=1}^n\delta _{Y_i}(B)\,\overset{\text {a.s.}}{\longrightarrow }\, g(Y)(B)\quad \quad \text {for all }B\in {\mathcal {B}}({\mathbb {R}}). \end{aligned}$$

Suppose Y is c.i.d. and define

$$\begin{aligned} N=g(Y),\quad \mu _0(\cdot )=E\Bigl (N^\infty (\cdot )\Bigr )\quad \text {and}\quad \mu _n(\cdot )=P\Bigl ((Y_n,Y_{n+1},\ldots )\in \cdot \Bigr ). \end{aligned}$$

Here, \(N^\infty =N\times N\times \ldots \) denotes the random probability measure on \((S,{\mathcal {B}})\) which makes the coordinate random variables i.i.d. with common distribution N. Hence, \(\mu _0\) is exchangeable (for it is a mixture of i.i.d. probability measures). Moreover, \(\mu _n\rightarrow \mu _0\) weakly and \(\mu _n=\mu _0\) on \(\sigma (g)\) because of [2, Theorem 2.6].

Since \(\mu _n=\mu _0\) on \(\sigma (g)\), Theorem 1 applies. Thus, under condition (4), there are real random sequences

$$\begin{aligned} X_n=(X_{n,1},X_{n,2},\ldots ),\quad \quad n\ge 0, \end{aligned}$$

such that

$$\begin{aligned} X_{0,j}\overset{\text {a.s.}}{=}\lim _nX_{n,j}\,\text { for each }j\ge 1,\quad g(X_n)=g(X_0)\,\text { for each }n\ge 0, \\X_0\sim \mu _0\quad \text {and}\quad X_n\sim (Y_n,Y_{n+1},\ldots )\text { for each }n\ge 1. \end{aligned}$$

We finally turn to condition (4), namely

$$\begin{aligned} E\bigl \{f(Y_n,Y_{n+1},\ldots )\mid N\bigr \}\overset{\text {a.s.}}{\longrightarrow }\int f(x)\,N^\infty (\hbox {d}x)\quad \quad \text {for all }f\in C_b(S). \end{aligned}$$

We do not know whether this condition holds for any c.i.d. sequence, but it holds in some (meaningful) special cases, including N a.s. discrete; see also [6, Th. 18].
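
As a closing sketch, a Pólya urn yields an exchangeable (hence c.i.d.) binary sequence whose directing measure \(N=\mathrm{Bernoulli}(\theta )\) is a.s. discrete, so it falls within the special cases just mentioned; the standard urn scheme below is an assumption chosen for concreteness.

```python
# Polya urn: draw a ball, replace it together with one more of the same
# colour. The empirical frequency of red converges a.s. to a random
# theta ~ Beta(1,1), and N = Bernoulli(theta) is a.s. discrete.
import numpy as np

rng = np.random.default_rng(0)
r, b = 1, 1                                 # one red and one black ball
draws = []
for _ in range(10_000):
    red = rng.uniform() < r / (r + b)
    draws.append(red)
    r, b = r + red, b + (not red)
print(np.mean(draws))                       # one realization of theta
```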

4 Proofs

From now on, to make the notation easier, we let

$$\begin{aligned} {\mathcal {G}}=\sigma (g). \end{aligned}$$

4.1 Proof of Theorem 1

We first prove that (2) \(\Rightarrow \) (4). Suppose condition (2) holds for some S-valued random variables \(X_n\) defined on the probability space \((\Omega ,{\mathcal {A}},P)\). Fix \(f\in C_b(S)\). Because of condition (1), for each \(n\ge 0\), there is a measurable function \(\phi _n:(T,{\mathcal {C}})\rightarrow ({\mathbb {R}},{\mathcal {B}}({\mathbb {R}}))\) such that

$$\begin{aligned} E_{\mu _n}(f\mid g)=\phi _n\circ g,\quad \quad \mu _0\text {-a.s.} \end{aligned}$$

It is straightforward to verify that \(E_P\bigl \{f(X_n)\mid g(X_n)\bigr \}=\phi _n\bigl [g(X_n)\bigr ]\) a.s. Hence, \(g(X_n)=g(X_0)\) implies

$$\begin{aligned} E_P\bigl \{f(X_n)\mid g(X_n)\bigr \}=\phi _n\bigl [g(X_0)\bigr ]\quad \quad \text {a.s.} \end{aligned}$$

On the other hand, since f is bounded, \(f(X_n)\overset{\text {a.s.}}{\longrightarrow }f(X_0)\) and \(g(X_n)=g(X_0)\) for all n, one also obtains

$$\begin{aligned} E_P\bigl \{f(X_n)\mid g(X_n)\bigr \}=E_P\bigl \{f(X_n)\mid g(X_0)\bigr \}\overset{\text {a.s.}}{\longrightarrow }E_P\bigl \{f(X_0)\mid g(X_0)\bigr \}. \end{aligned}$$

Therefore, \(\phi _n\bigl [g(X_0)\bigr ]\overset{\text {a.s.}}{\longrightarrow }\phi _0\bigl [g(X_0)\bigr ]\), or equivalently

$$\begin{aligned} \mu _0\Bigl (E_{\mu _n}(f\mid g)\rightarrow E_{\mu _0}(f\mid g)\Bigr )= & {} \mu _0\Bigl (\phi _n\circ g\rightarrow \phi _0\circ g\Bigr )\\= & {} P\Bigl (\phi _n\bigl [g(X_0)\bigr ]\rightarrow \phi _0\bigl [g(X_0)\bigr ]\Bigr )=1. \end{aligned}$$

Thus, condition (4) holds.

The rest of the proof is split into two steps.

Step 1 We first suppose that S is a Borel subset of a Polish space. Then, for every \(n\ge 0\), we can fix a r.c.d. \(\gamma _n=\{\gamma _n(x):x\in S\}\) for \(\mu _n\) given \({\mathcal {G}}\). We will write

$$\begin{aligned} \gamma _n(x)(f)=\int f(y)\,\gamma _n(x)(\hbox {d}y) \end{aligned}$$

for all \(x\in S\) and all bounded Borel functions \(f:S\rightarrow {\mathbb {R}}\).

Let \(\Phi :(0,1)\times {\mathcal {P}}\rightarrow S\) be the Borel map involved in Theorem 5 and

$$\begin{aligned} (\Omega ,{\mathcal {A}},P)=\Bigl ((0,1)^2,\,{\mathcal {B}}((0,1)^2),\,m^2\Bigr ). \end{aligned}$$

For each \(n\ge 0\) and \((\alpha ,\beta )\in (0,1)^2\), define

$$\begin{aligned} \phi (\alpha )=\Phi (\alpha ,\mu _0)\quad \text {and}\quad X_n(\alpha ,\beta )=\Phi \Bigl (\beta ,\,\gamma _n[\phi (\alpha )]\Bigr ). \end{aligned}$$

The \(X_n\) are S-valued random variables on \((\Omega ,{\mathcal {A}},P)\). We now prove that they meet condition (3).

Fix \(n\ge 0\) and note that

$$\begin{aligned} m\Bigl \{\beta :X_n(\alpha ,\beta )\in B\Bigr \}=m\Bigl \{\beta :\Phi \Bigl (\beta ,\,\gamma _n[\phi (\alpha )]\Bigr )\in B\Bigr \}=\gamma _n[\phi (\alpha )](B) \end{aligned}$$

for all \(\alpha \in (0,1)\) and \(B\in {\mathcal {B}}\). Hence, Fubini’s theorem yields

$$\begin{aligned} P(X_n\in B)= & {} \int _0^1 m\Bigl \{\beta :X_n(\alpha ,\beta )\in B\Bigr \}\,\hbox {d}\alpha =\int _0^1\gamma _n[\phi (\alpha )](B)\,\hbox {d}\alpha \\= & {} \int \gamma _n(x)(B)\,\mu _0(\hbox {d}x)=\int \gamma _n(x)(B)\,\mu _n(\hbox {d}x)=\mu _n(B), \end{aligned}$$

where the third equality is because \(m\circ \phi ^{-1}=\mu _0\) while the fourth depends on \(\mu _n=\mu _0\) on \({\mathcal {G}}\) and \(x\mapsto \gamma _n(x)(B)\) is \({\mathcal {G}}\)-measurable. This proves \(X_n\sim \mu _n\).
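
Before continuing, here is a self-contained numerical rendering of this construction, specialized (as an assumption) to the setting of Example 1 with the simpler choice \(B_n=[0,1/n]\); then \(\phi (\alpha )=2\alpha -1\) is the \(\mu _0\)-quantile map and \(\Phi (\beta ,\gamma _n(x))\) just picks an atom of the two-point law \(\gamma _n(x)\).

```python
# X_n(alpha, beta) = Phi(beta, gamma_n[phi(alpha)]) on Omega = (0,1)^2.
import numpy as np

def X(n, alpha, beta):
    t = np.abs(2 * alpha - 1)                    # |phi(alpha)|, phi ~ mu_0
    in_Bn = (n >= 1) & (t <= 1 / max(n, 1))      # is |phi(alpha)| in B_n?
    p_plus = np.where(in_Bn, 0.25, 0.5)          # gamma_n's weight on +t
    return np.where(beta < p_plus, t, -t)        # pick an atom of gamma_n

rng = np.random.default_rng(0)
a, b = rng.uniform(size=100_000), rng.uniform(size=100_000)
x5, x0 = X(5, a, b), X(0, a, b)
print(np.mean(np.abs(x5) != np.abs(x0)))   # 0.0 : g(X_n) = g(X_0) everywhere
print(np.mean(x5 != x0))                   # ~ 0.05 = m(B_5)/4
```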

We next prove \(P\bigl (g(X_n)\ne g(X_0)\bigr )=0\). To this end, since \((T,{\mathcal {C}})\) is diagonal, it suffices to show that \(P\bigl (g(X_n)\in C,\,g(X_0)\notin C\bigr )=0\) for all \(C\in {\mathcal {C}}\); see Lemma 6. Fix \(C\in {\mathcal {C}}\), define \(A=\{g\in C\}\), and note that

$$\begin{aligned}&m\Bigl \{\beta :X_n(\alpha ,\beta )\in A,\,X_0(\alpha ,\beta )\notin A\Bigr \}\\&\quad \le \min \Bigl \{m\bigl \{\beta :X_n(\alpha ,\beta )\in A\bigr \},\,m\bigl \{\beta :X_0(\alpha ,\beta )\notin A\bigr \}\Bigr \}\\&\quad =\min \Bigl \{\gamma _n[\phi (\alpha )](A),\,\gamma _0[\phi (\alpha )](A^c)\Bigr \}\quad \quad \text {for all }\alpha \in (0,1). \end{aligned}$$

Recalling that \(m\circ \phi ^{-1}=\mu _0\), one obtains

$$\begin{aligned} P\bigl (g(X_n)\in C,\,g(X_0)\notin C\bigr )= & {} P\bigl (X_n\in A,\,X_0\notin A\bigr )\\= & {} \int _0^1 m\Bigl \{\beta :X_n(\alpha ,\beta )\in A,\,X_0(\alpha ,\beta )\notin A\Bigr \}\,\hbox {d}\alpha \\\le & {} \int _0^1 \min \Bigl \{\gamma _n[\phi (\alpha )](A),\,\gamma _0[\phi (\alpha )](A^c)\Bigr \}\,\hbox {d}\alpha \\= & {} \int \min \Bigl \{\gamma _n(x)(A),\,\gamma _0(x)(A^c)\Bigr \}\,\mu _0(\hbox {d}x). \end{aligned}$$

Let

$$\begin{aligned} G=\bigl \{x\in S:\gamma _n(x)(A)=1_A(x),\,\gamma _0(x)(A^c)=1_{A^c}(x)\bigr \}. \end{aligned}$$

Since \(\bigl \{x\in S:\gamma _n(x)(A)=1_A(x)\bigr \}\) belongs to \({\mathcal {G}}\), condition (1) implies

$$\begin{aligned} \mu _0(G)=\mu _0\bigl \{x\in S:\gamma _n(x)(A)=1_A(x)\bigr \}=\mu _n\bigl \{x\in S:\gamma _n(x)(A)=1_A(x)\bigr \}=1. \end{aligned}$$

In turn, this implies

$$\begin{aligned} P\bigl (g(X_n)\in C,\,g(X_0)\notin C\bigr )\le \int _G\min \Bigl \{1_A(x),\,1_{A^c}(x)\Bigr \}\,\mu _0(\hbox {d}x)=0. \end{aligned}$$

Hence, \(P\bigl (g(X_n)\ne g(X_0)\bigr )=0\). To get \(g(X_n)=g(X_0)\) everywhere, as prescribed by condition (3), it suffices to modify \(X_n\) on a P-null set.

This proves condition (3). It remains to show that, under condition (4), one also obtains \(X_n\overset{\text {a.s.}}{\longrightarrow }X_0\) as \(n\rightarrow \infty \). Since S is separable (it is in fact a Borel subset of a Polish space), there is a countable convergence determining class \(F\subset C_b(S)\) for \({\mathcal {P}}\). Let

$$\begin{aligned} H=\bigl \{x\in S:\gamma _0(x)(f)=\lim _n\gamma _n(x)(f)\text { for each }f\in F\bigr \}. \end{aligned}$$

Then, \(\gamma _n(x)\rightarrow \gamma _0(x)\) weakly for each \(x\in H\). Moreover, since F is countable, condition (4) implies \(\mu _0(H)=1\). Therefore, Theorem 5 yields

$$\begin{aligned} P(X_n\rightarrow X_0)= & {} \int _0^1 m\Bigl \{\beta :\,\Phi (\beta ,\gamma _n[\phi (\alpha )])\,\longrightarrow \,\Phi (\beta ,\gamma _0[\phi (\alpha )])\Bigr \}\,\hbox {d}\alpha \\= & {} \int _H m\Bigl \{\beta :\,\Phi (\beta ,\gamma _n(x))\,\longrightarrow \,\Phi (\beta ,\gamma _0(x))\Bigr \}\,\mu _0(\hbox {d}x)\\= & {} \int _H 1\,d\mu _0=1. \end{aligned}$$

This concludes the proof when S is a Borel subset of a Polish space.

Step 2 Suppose now that S is an arbitrary metric space. For each \(n\ge 0\), since \(\mu _n\) is tight, there is a \(\sigma \)-compact set \(B_n\subset S\) such that \(\mu _n(B_n)=1\). Let \(B=\cup _nB_n\) and let \({\mathcal {X}}\) be the completion of B. Since B is \(\sigma \)-compact, \({\mathcal {X}}\) is a Polish space and \(B\in {\mathcal {B}}({\mathcal {X}})\) (in fact, B is still \(\sigma \)-compact as a subset of \({\mathcal {X}}\)). Since \(\mu _n(B)=1\) for each \(n\ge 0\), the \(\mu _n\) can be regarded as probability measures on \({\mathcal {B}}(B)\). Furthermore, as shown below, condition (4) implies

$$\begin{aligned} E_{\mu _n}\bigl (f\mid B\cap {\mathcal {G}}\bigr )\,\overset{\mu _0-\text {a.s.}}{\longrightarrow }\,E_{\mu _0}\bigl (f\mid B\cap {\mathcal {G}}\bigr )\quad \quad \text {for each }f\in C_b(B). \end{aligned}$$
(10)

Hence, to conclude the proof, it suffices to replace S with B, \({\mathcal {G}}\) with \(B\cap {\mathcal {G}}\), and to apply what was already proved in Step 1.

We finally prove that (4) \(\Rightarrow \) (10). For each \(n\ge 0\), as \(\mu _n(B)=1\) and \(B\in {\mathcal {B}}({\mathcal {X}})\), there is a r.c.d. for \(\mu _n\) given \(\sigma ({\mathcal {G}}\cup \{B\})\), say \(\rho _n=\{\rho _n(x):x\in S\}\), such that

$$\begin{aligned} \rho _n(x)(B)=1\quad \quad \text {for all }x\in S. \end{aligned}$$
(11)

By Lemma 7, there is a countable convergence determining class \(F\subset C_b(S)\) for \({\mathcal {Q}}=\{\mu \in {\mathcal {P}}:\mu (B)=1\}\). Moreover, condition (4) implies

$$\begin{aligned} \rho _n(\cdot )(f)=E_{\mu _n}(f\mid g)\overset{\mu _0-\text {a.s.}}{\longrightarrow }E_{\mu _0}(f\mid g)=\rho _0(\cdot )(f)\quad \text {for all }f\in F. \end{aligned}$$

Hence, because of (11) and the fact that F is countable, there is a set \(H\in {\mathcal {G}}\) such that

$$\begin{aligned} \mu _0(H)=1\quad \text {and}\quad \rho _n(x)\rightarrow \rho _0(x)\text { weakly for all }x\in H. \end{aligned}$$

On the other hand, thanks to (11), \(\rho _n(x)\rightarrow \rho _0(x)\) weakly if and only if

$$\begin{aligned} \rho _n(x)(\cdot \cap B)\rightarrow \rho _0(x)(\cdot \cap B)\text { weakly with respect to the relative topology on }B. \end{aligned}$$

Therefore,

$$\begin{aligned} \rho _0(x)(f\,1_B)=\lim _n\rho _n(x)(f\,1_B)\quad \quad \text {for all }x\in H\text { and }f\in C_b(B). \end{aligned}$$

Since \(\mu _0(H)=1\) and

$$\begin{aligned} E_{\mu _n}\bigl (f\mid B\cap {\mathcal {G}}\bigr )=\rho _n(\cdot )(f\,1_B),\quad \quad \mu _0\text {-a.s.}, \end{aligned}$$

this proves condition (10) and concludes the proof of the theorem.

4.2 Proof of Remark (vii)

Suppose S is a Borel subset of a Polish space and define the \(X_n\) as in Step 1 of the proof of Theorem 1. Since condition (3) holds, it suffices to prove \(X_n\rightarrow X_0\) in probability. Equivalently, we have to show that, for each subsequence \(n_j\), there is a sub-subsequence \(n_{j_k}\) such that \(X_{n_{j_k}}\overset{\text {a.s.}}{\longrightarrow }X_0\) as \(k\rightarrow \infty \). Fix a subsequence \(n_j\) and a countable convergence determining class F for \({\mathcal {P}}\). Since \(F\subset C_b(S)\), by a diagonalizing argument, there is a sub-subsequence \(n_{j_k}\) satisfying

$$\begin{aligned} E_{\mu _{n_{j_k}}}(f\mid g)\,\overset{\mu _0-\text {a.s.}}{\longrightarrow }\,E_{\mu _0}(f\mid g),\quad \quad \text {as }k\rightarrow \infty ,\text { for each }f\in F. \end{aligned}$$

Since F is a convergence determining class for \({\mathcal {P}}\), one obtains

$$\begin{aligned} \gamma _{n_{j_k}}(x)\,\overset{\text {weakly}}{\longrightarrow }\,\gamma _0(x),\quad \quad \text {as }k\rightarrow \infty ,\text { for }\mu _0\text {-almost all }x\in S, \end{aligned}$$

where \(\gamma _n\) is a r.c.d. for \(\mu _n\) given \({\mathcal {G}}\). In turn, this implies

$$\begin{aligned} E_{\mu _{n_{j_k}}}(f\mid g)\,\overset{\mu _0-\text {a.s.}}{\longrightarrow }\,E_{\mu _0}(f\mid g),\quad \quad \text {as }k\rightarrow \infty ,\text { for each }f\in C_b(S). \end{aligned}$$

Hence, \(X_{n_{j_k}}\overset{\text {a.s.}}{\longrightarrow }X_0\). Finally, the general case (where S is arbitrary but the \(\mu _n\) are tight) can be handled as in Step 2 of the proof of Theorem 1.

4.3 Proof of Corollary 2

Let \(\nu (\cdot )=\mu _0(\cdot \times S_2)\) denote the marginal of \(\mu _0\) on \(S_1\). In view of Theorem 1, we only have to prove that (6) \(\Rightarrow \) (4).

Let \(S=S_1\times S_2\) and \(g(x,y)=x\) for all \((x,y)\in S\). Suppose condition (6) holds. For any \(f_1:S_1\rightarrow {\mathbb {R}}\) and \(f_2:S_2\rightarrow {\mathbb {R}}\), define a function \(f_1\times f_2\) on S as

$$\begin{aligned} (f_1\times f_2)(x,y)=f_1(x)\,f_2(y)\quad \quad \text {for all }(x,y)\in S. \end{aligned}$$

Also, as in the proof of Theorem 1, take a \(\sigma \)-compact set \(B\subset S\) satisfying \(\mu _n(B)=1\) for each \(n\ge 0\). Then, by Lemma 7 and separability of \(S_1\), there are two countable collections \(F_1\subset C_b(S_1)\) and \(F_2\subset C_b(S_2)\) such that

$$\begin{aligned} F=\bigl \{f_1\times f_2:f_1\in F_1,\,f_2\in F_2\bigr \} \end{aligned}$$

is a convergence determining class for \({\mathcal {Q}}=\bigl \{\mu \in {\mathcal {P}}:\mu (B)=1\bigr \}\).

Having noted this fact, fix a r.c.d. \(\rho _n=\bigl \{\rho _n(x,y):(x,y)\in S\bigr \}\) for \(\mu _n\) given \(\sigma \bigl ({\mathcal {G}}\cup \{B\}\bigr )\) such that \(\rho _n(x,y)(B)=1\) for all \((x,y)\in S\). Since \(g(x,y)=x\) and \(\mu _n(B)=1\), we can write \(\rho _n(x)\) instead of \(\rho _n(x,y)\). Then, for each \(f=f_1\times f_2\in F\), condition (6) implies

$$\begin{aligned} \rho _n(x)(f)=f_1(x)\,\rho _n(x)(f_2)\longrightarrow f_1(x)\,\rho _0(x)(f_2)=\rho _0(x)(f) \end{aligned}$$

for \(\nu \)-almost all \(x\in S_1\). Since F is countable, it follows that

$$\begin{aligned} \nu \bigl \{x\in S_1:\rho _n(x)\rightarrow \rho _0(x)\text { weakly}\bigr \}=1, \end{aligned}$$

which in turn implies condition (4). This concludes the proof.

4.4 Proof of Corollary 3

Let

$$\begin{aligned} \hat{{\mathcal {G}}}=\phi ^{-1}({\mathcal {C}})\quad \text {and}\quad {\hat{\mu }}_n(D)=\mu _n(S\cap D)\quad \text {for all }n\ge 0\text { and }D\in {\mathcal {B}}({\mathcal {X}}). \end{aligned}$$

The \({\hat{\mu }}_n\) are probability measures on \({\mathcal {B}}({\mathcal {X}})\) and

$$\begin{aligned} {\hat{\mu }}_n\bigl (\phi ^{-1}(C)\bigr )=\mu _n\bigl (S\cap \phi ^{-1}(C)\bigr )=\mu _n\bigl (g^{-1}(C)\bigr )=\mu _0\bigl (g^{-1}(C)\bigr )={\hat{\mu }}_0\bigl (\phi ^{-1}(C)\bigr ) \end{aligned}$$

for all \(n\ge 0\) and \(C\in {\mathcal {C}}\). Thus, \({\hat{\mu }}_n={\hat{\mu }}_0\) on \(\hat{{\mathcal {G}}}\) for each \(n\ge 0\).

Let \(L=(0,1)^2\) and let \({\mathcal {L}}=m^2\) be the Lebesgue measure on \({\mathcal {B}}(L)\). Because of Theorem 1 (and its proof), on the probability space \(\bigl (L,{\mathcal {B}}(L),{\mathcal {L}}\bigr )\), there are \({\mathcal {X}}\)-valued random variables \({\hat{X}}_n\) such that

$$\begin{aligned} {\hat{X}}_n\sim {\hat{\mu }}_n\,\text { and }\,\phi ({\hat{X}}_n)=\phi ({\hat{X}}_0)\,\text { for all }n\ge 0. \end{aligned}$$

Furthermore, \({\hat{X}}_n\overset{\text {a.s.}}{\longrightarrow }{\hat{X}}_0\) provided

$$\begin{aligned} E_{{\hat{\mu }}_n}(f\mid \hat{{\mathcal {G}}})\,\overset{{\hat{\mu }}_0-\text {a.s.}}{\longrightarrow }\,E_{{\hat{\mu }}_0}(f\mid \hat{{\mathcal {G}}})\quad \quad \text {for all }f\in C_b({\mathcal {X}}). \end{aligned}$$
(12)

Let \({\mathcal {L}}^*\) denote the \({\mathcal {L}}\)-outer measure and

$$\begin{aligned} \Omega =\bigl \{{\hat{X}}_n\in S\text { for each }n\ge 0\bigr \}\quad \text {and}\quad {\mathcal {A}}={\mathcal {B}}(\Omega )=\Omega \cap {\mathcal {B}}(L). \end{aligned}$$

Suppose \({\mathcal {L}}^*(\Omega )=1\). Under this assumption, one can define

$$\begin{aligned} P(B\cap \Omega )={\mathcal {L}}(B)\quad \quad \text {for all }B\in {\mathcal {B}}(L). \end{aligned}$$

Such a P is a probability measure on \({\mathcal {A}}\) and condition (3) is satisfied by the S-valued random variables

$$\begin{aligned} X_n(\omega )={\hat{X}}_n(\omega )\quad \quad \text {for all }n\ge 0\text { and }\omega \in \Omega . \end{aligned}$$

We next prove \({\mathcal {L}}^*(\Omega )=1\). Since \(\mu _n\) is tight for \(n\ge 1\), there is a \(\sigma \)-compact subset \(B\subset S\) such that \(\mu _n(B)=1\) for each \(n\ge 1\). On noting that \(B\in {\mathcal {B}}({\mathcal {X}})\) (in fact, B is a \(\sigma \)-compact subset of \({\mathcal {X}}\)), one obtains

$$\begin{aligned} \bigl \{{\hat{X}}_n\in B\bigr \}\in {\mathcal {B}}(L)\quad \text {and}\quad {\mathcal {L}}({\hat{X}}_n\in B)={\hat{\mu }}_n(B)=\mu _n(B)=1\text { for each }n\ge 1. \end{aligned}$$

It follows that

$$\begin{aligned} {\mathcal {L}}\bigl ({\hat{X}}_n\in B\text { for each }n\ge 1\bigr )=1. \end{aligned}$$

Moreover, since \({\mathcal {L}}\) is tight,

$$\begin{aligned} {\mathcal {L}}^*\bigl ({\hat{X}}_0\in S\bigr )={\hat{\mu }}_0^*(S)=1 \end{aligned}$$

where \({\hat{\mu }}_0^*\) is the \({\hat{\mu }}_0\)-outer measure; see the proof of [3, Th. 3.1] and [13, Th. 3.4.1]. Thus, \(B\subset S\) implies

$$\begin{aligned} {\mathcal {L}}^*(\Omega )\ge {\mathcal {L}}^*\bigl ({\hat{X}}_0\in S\text { and }{\hat{X}}_n\in B\text { for each }n\ge 1\bigr )={\mathcal {L}}^*({\hat{X}}_0\in S)=1. \end{aligned}$$

Hence, \({\mathcal {L}}^*(\Omega )=1\) and this proves condition (3).

Finally, it is not hard to show that condition (4) implies condition (12) (we omit the explicit calculations). Therefore, under (4), one obtains \({\hat{X}}_n\overset{\text {a.s.}}{\longrightarrow }{\hat{X}}_0\), which in turn implies \(X_n\overset{\text {a.s.}}{\longrightarrow }X_0\). This concludes the proof.

4.5 Proof of Corollary 4

Thanks to the assumptions on \((T,{\mathcal {C}})\) and \((\mu _n:n\ge 0)\), on the probability space \(\bigl ((0,1)^2,{\mathcal {B}}((0,1)^2),\,m^2\bigr )\), there are random variables \(V_n\) such that \(V_n\overset{\text {a.s.}}{\longrightarrow }V_0\), \(V_n\sim \mu _n\) and \(g(V_n)=g(V_0)\) for all \(n\ge 0\); see Theorem 1 and its proof. Moreover, being a nonatomic probability space, \((\Omega ,{\mathcal {A}},P)\) supports a random variable W with distribution \(m^2\), namely, \(W\sim m^2\) for some measurable map \(W:\Omega \rightarrow (0,1)^2\); see, e.g., [3, Th. 3.1]. Therefore, it suffices to let \(X_n=V_n\circ W\) for all \(n\ge 0\).