1 Introduction

Let S be a metric space and \((\mu _n:n\ge 0)\) a sequence of probability measures on \(\mathcal {B}(S)\). (Throughout, for any topological space T, we let \(\mathcal {B}(T)\) denote the Borel \(\sigma \)-field on T). We say that \((X_n:n\ge 0)\) is a coupling of \((\mu _n)\) if

  • The \(X_n\) are S-valued random variables, all defined on the same probability space, such that \(X_n\sim \mu _n\) for each \(n\ge 0\).

The Skorohod representation theorem (SRT) states that, if \(\mu _n\rightarrow \mu _0\) weakly and \(\mu _0\) has a separable support, there is a coupling \((X_n)\) of \((\mu _n)\) such that \(X_n\overset{a.s.}{\longrightarrow }X_0\). This version of the SRT is due to Wichura (1970), who reworked the earlier versions by Skorohod (1956) and Dudley (1968). We refer to [Dudley (1999), p. 130] and [van der Vaart and Wellner (1996), p. 77] for historical notes, and to Berti et al. (2013) for the case where \(\mu _0\) does not have a separable support. Some other related references are (Berti et al. 2007, 2011, 2015; Blackwell and Dubins 1983; Chau and Rasonyi 2017; Cortissoz 2007; Dumav and Stinchcombe 2016; Fernique 1988; Hernandez-Ceron 2010; Jakubowski 1998; Sethuraman 2002).
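On the real line, the proof of the SRT is constructive: all the \(\mu _n\) are coupled through a single uniform random variable and the quantile functions \(F_n^{-1}\). The following minimal Python sketch simulates this classical coupling for exponential laws (whose quantile functions are available in closed form; the rates and the sample size are illustrative only) and checks the almost sure convergence numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantile_exp(u, rate):
    """Quantile function of the exponential distribution with the given rate."""
    return -np.log1p(-u) / rate

# mu_n = Exponential(1 + 1/n) converges weakly to mu_0 = Exponential(1).
rates = [1.0] + [1.0 + 1.0 / n for n in range(1, 51)]

# Skorohod coupling on (0,1): one uniform variable drives every X_n.
U = rng.uniform(size=10_000)
X = np.array([quantile_exp(U, r) for r in rates])   # X[n] ~ mu_n, all on one space

# X_n(omega) -> X_0(omega) for every omega, i.e. almost sure convergence.
gaps = np.abs(X[1:] - X[0]).max(axis=1)
print(gaps[0], gaps[9], gaps[-1])                   # the gaps shrink as n grows
```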

We aim to obtain some new results in the spirit of the SRT. Our starting point is the following version of the SRT, recently proved in Pratelli and Rigo (2023).

Theorem 1

Let T be a separable metric space, \(g:S\rightarrow T\) a Borel function, and

$$\begin{aligned} \sigma (g)=\bigl \{g^{-1}(B):B\in \mathcal {B}(T)\bigr \}. \end{aligned}$$

Suppose

$$\begin{aligned} \mu _n\text { is tight and }\mu _n=\mu _0\text { on }\sigma (g)\text { for every }n\ge 0. \end{aligned}$$

Then, on some probability space \((\Omega ,\mathcal {A},\mathbb {P})\), there is a coupling \((X_n)\) of \((\mu _n)\) such that

$$\begin{aligned} g(X_n)=g(X_0)\,\text { for all }n\ge 0. \end{aligned}$$
(1)

In addition to (1), one also obtains \(X_n\overset{\mathbb {P}-a.s.}{\longrightarrow }X_0\) if and only if

$$\begin{aligned} E_{\mu _n}(f\mid g)\,\overset{\mu _0-a.s.}{\longrightarrow }\,E_{\mu _0}(f\mid g)\quad \quad \text {for each bounded continuous }f:S\rightarrow \mathbb {R}. \end{aligned}$$

Here and in the sequel, for any probability \(\nu \) on \(\mathcal {B}(S)\), the notation \(E_\nu (f\mid g)\) stands for the conditional expectation of f given \(\sigma (g)\) in the probability space \((S,\mathcal {B}(S),\nu )\). Note that, when g is constant, \(\sigma (g)\) reduces to the trivial \(\sigma \)-field and \(E_\nu (f\mid g)=E_\nu (f)=\int f\,d\nu \). Hence, if g is constant and the \(\mu _n\) are tight, Theorem 1 reduces to SRT.

This paper provides some extensions of Theorem 1 and investigates some of its consequences. Our results are of three types.

  (i)

    In Theorem 3, \(\sigma (g)\) is replaced by an arbitrary sub-\(\sigma \)-field \(\mathcal {G}\subset \mathcal {B}(S)\). In this case, the coupling \((X_n)\) of \((\mu _n)\) only satisfies

    $$\begin{aligned} \mathbb {P}(X_n\in A,\,X_0\notin A)=0\quad \quad \text {for all }A\in \mathcal {G}\text { and }n\ge 0. \end{aligned}$$

    However, in the special case \(\mathcal {G}=\sigma (g)\) with g as in Theorem 1, the above condition is equivalent to \(g(X_n)=g(X_0)\) a.s. Hence, Theorem 3 actually extends Theorem 1.

  (ii)

    In Examples 3 and 4, Theorem 1 is applied to some specific frameworks. Example 3 deals with a sequence \((U_n:n\ge 0)\) of cadlag processes with finite activity. Let \(U_n^*\) be the continuous part of \(U_n\). It is shown that, if \(U_n^*\sim U_0^*\) for all n, the \(U_n\) admit a common decomposition. Precisely,

    $$\begin{aligned} U_n\sim I+J_n\quad \quad \text {for all }n\ge 0, \end{aligned}$$

    where the processes I and \(J_n\) are defined on the same probability space, I has continuous paths and \(J_n\) is a pure jump process. Example 4 is concerned with optimal transport. It is shown that Theorem 1 implies (and slightly improves) a recent duality result on equivalence couplings and total variation distances; see (Jaffe 2023).

  (iii)

    In Sect. 4, we deal with models and kernels. Let \((\Theta ,\mathcal {H})\) and \((\mathcal {X},\mathcal {E})\) be measurable spaces. A model is a collection \(\mathcal {P}=\{P_\theta :\,\theta \in \Theta \}\), indexed by \(\Theta \), of probability measures \(P_\theta \) on \(\mathcal {E}\). A kernel is a model which satisfies a certain measurability condition. Those non-atomic kernels such that

    $$\begin{aligned} P_\theta \Bigl (h^{-1}\{\theta \}\Bigr )=1,\quad \quad \theta \in \Theta , \end{aligned}$$

    for some measurable function \(h:\mathcal {X}\rightarrow \Theta \), have been recently characterized. Such a characterization, obtained in Hansen et al. (2024), is reported in Theorem 4. Our contribution consists of two versions of Theorem 4. One extends Theorem 4 from kernels to models, while the other is in the spirit of Theorem 1. Unlike Theorem 4, both versions admit a straightforward proof. Obviously, models and kernels are fundamental in probability theory (just think of conditional distributions and Markov processes). But models and kernels play a role in many other frameworks as well. For instance, in decision theory, a model \(\mathcal {P}\) can be regarded as the collection of probability distributions of a state-contingent payoff conditional on a parameter \(\theta \). Similarly, in statistical inference, \(\mathcal {P}\) may be viewed as the class of possible probability distributions of the data. Accordingly, Theorem 4 and its two versions admit a natural interpretation. In Sect. 4, this interpretation is discussed and various examples are given.

2 Preliminaries

We briefly recall some well known definitions and results. To this end, we let \((\mathcal {X},\mathcal {E},\mu )\) denote any probability space.

The measurable space \((\mathcal {X},\mathcal {E})\) is said to be a standard Borel space if \(\mathcal {X}\) is a Borel subset of a Polish space and \(\mathcal {E}=\mathcal {B}(\mathcal {X})\). Similarly, \((\mathcal {X},\mathcal {E})\) is a Radon space if \(\mathcal {X}\) is a metric space, \(\mathcal {E}=\mathcal {B}(\mathcal {X})\), and each probability measure on \(\mathcal {E}\) is tight. A standard Borel space is a Radon space but not conversely. For instance, if \(\mathcal {X}\) is a universally measurable, non-Borel subset of a Polish space, then \((\mathcal {X},\mathcal {B}(\mathcal {X}))\) is not a standard Borel space but every probability measure on \(\mathcal {B}(\mathcal {X})\) is tight.

A \(\mu \)-atom is a set \(A\in \mathcal {E}\) such that \(\mu (A)>0\) and \(\mu (\cdot \mid A)\) is 0-1 valued. We say that \((\mathcal {X},\mathcal {E},\mu )\) is a non-atomic probability space, or that \(\mu \) is non-atomic, if \(\mu \) has no atoms. If \(\mathcal {X}\) is a separable metric space and \(\mathcal {E}=\mathcal {B}(\mathcal {X})\), then \(\mu \) is non-atomic if and only if \(\mu \{x\}=0\) for all \(x\in \mathcal {X}\).

Let \(\mathcal {F}\subset \mathcal {E}\) be a sub-\(\sigma \)-field. A regular conditional distribution for \(\mu \) given \(\mathcal {F}\) is a collection \(\gamma =\{\gamma (x):x\in \mathcal {X}\}\) such that:

  • \(\gamma (x)\) is a probability measure on \(\mathcal {E}\) for each \(x\in \mathcal {X}\);

  • \(\gamma (\cdot )(A)\) is a version of \(E_\mu (1_A\mid \mathcal {F})\) for each \(A\in \mathcal {E}\).

If \((\mathcal {X},\mathcal {E})\) is a Radon space, a regular conditional distribution for \(\mu \) given \(\mathcal {F}\) exists and is \(\mu \)-a.s. unique.

Finally, to prove the forthcoming Theorem 3, we report the following version of the SRT; see (Blackwell and Dubins 1983) and [Hernandez-Ceron (2010), p. 52–54] for a detailed proof.

Theorem 2

(Blackwell and Dubins) Let m be the Lebesgue measure on \(\mathcal {B}((0,1))\) and \(\Lambda \) the collection of probability measures on \(\mathcal {B}(S)\). If S is Polish, there is a Borel map \(\Phi :(0,1)\times \Lambda \rightarrow S\) such that

  • \(m\bigl \{\beta \in (0,1):\Phi (\beta ,\lambda )\in B\bigr \}=\lambda (B)\) for all \(\lambda \in \Lambda \) and \(B\in \mathcal {B}(S)\);

  • \(m\bigl \{\beta \in (0,1):\Phi (\beta ,\lambda _n)\rightarrow \Phi (\beta ,\lambda _0)\bigr \}=1\) if \(\lambda _n\in \Lambda \) and \(\lambda _n\rightarrow \lambda _0\) weakly.

It is easily seen that Theorem 2 is still true if S is a Borel subset of a Polish space (but not necessarily a Polish space).

3 Theorem 1 and its consequences

This section includes three applications of Theorem 1, outlined in the form of examples, as well as an extension of Theorem 1. We begin with the latter.

Any \(\sigma \)-field \(\mathcal {G}\) over S can be written as \(\mathcal {G}=\sigma (g)\) for a suitable function g on S. More precisely, the following result is available.

Lemma 1

For each \(\sigma \)-field \(\mathcal {G}\) over S, there are a measurable space \((T,\mathcal {C})\) and a function \(g:S\rightarrow T\) such that

$$\begin{aligned} \mathcal {G}=\bigl \{g^{-1}(C):C\in \mathcal {C}\bigr \}=\sigma (g). \end{aligned}$$

Proof

For each \(x\in S\), let H(x) be the \(\mathcal {G}\)-atom including the point x, that is

$$\begin{aligned} H(x)=\bigl \{y\in S:1_B(y)=1_B(x)\text { for each }B\in \mathcal {G}\bigr \}; \end{aligned}$$

see e.g. (Berti and Rigo 2007) and (Blackwell and Dubins 1975). Define

$$\begin{aligned} T=\bigl \{H(x):x\in S\bigr \}. \end{aligned}$$

Then, T is a partition of S and every element of \(\mathcal {G}\) is a union of elements of T. For any \(C\subset T\), define \(C^*=\bigl \{x\in S:H(x)\in C\bigr \}\). Then, it suffices to let

$$\begin{aligned} \mathcal {C}=\bigl \{C\subset T:C^*\in \mathcal {G}\bigr \}\quad \text {and}\quad g(x)=H(x)\quad \text { for every }x\in S. \end{aligned}$$

\(\square \)

Based on Lemma 1, it is tempting to extend Theorem 1 to an arbitrary sub-\(\sigma \)-field \(\mathcal {G}\subset \mathcal {B}(S)\). This is impossible, however, if Theorem 1 is stated as above.

Example 1

Suppose \(\mu _n\{x\}=\mu _0\{x\}=0\) for all \(x\in S\) and take \(\mathcal {G}\) to be the collection of countable and co-countable subsets of S. In this case, \(\mu _n=\mu _0\) on \(\mathcal {G}\). However, since \(\mathcal {G}\) includes the singletons, any function g such that \(\mathcal {G}=\sigma (g)\) is injective, so that \(g(X_n)=g(X_0)\) amounts to \(X_n=X_0\). Hence, \((\mu _n)\) admits a coupling \((X_n)\) satisfying condition (1) if and only if \(\mu _n=\mu _0\) on all of \(\mathcal {B}(S)\).

The next result is motivated by the previous comments. In the sequel, for any topological space T, we denote by \(C_b(T)\) the collection of real bounded continuous functions on T.

Theorem 3

Fix a sub-\(\sigma \)-field \(\mathcal {G}\subset \mathcal {B}(S)\) and suppose

$$\begin{aligned} \mu _n\text { is tight and }\mu _n=\mu _0\text { on }\mathcal {G}\text { for every }n\ge 0. \end{aligned}$$

Then, on some probability space \((\Omega ,\mathcal {A},\mathbb {P})\), there is a coupling \((X_n)\) of \((\mu _n)\) such that

$$\begin{aligned} \mathbb {P}(X_n\in A,\,X_0\notin A)=0\,\text { for all }A\in \mathcal {G}\text { and }n\ge 0. \end{aligned}$$
(2)

In addition to (2), one also obtains \(X_n\overset{\mathbb {P}-a.s.}{\longrightarrow }X_0\) if and only if

$$\begin{aligned} E_{\mu _n}(f\mid \mathcal {G})\,\overset{\mu _0-a.s.}{\longrightarrow }\,E_{\mu _0}(f\mid \mathcal {G})\quad \quad \text {for each }f\in C_b(S). \end{aligned}$$
(3)

Proof

We only sketch the proof, since it is quite similar to that of Theorem 1.

Since all the \(\mu _n\) are tight, S can be assumed to be a Borel subset of a Polish space. Hence, Theorem 2 applies. Moreover, for each \(n\ge 0\), we can fix a regular conditional distribution for \(\mu _n\) given \(\mathcal {G}\), say \(\gamma _n=\{\gamma _n(x):x\in S\}\); see Sect. 2.

Let m be the Lebesgue measure on \(\mathcal {B}((0,1))\) and \(\Phi :(0,1)\times \Lambda \rightarrow S\) the Borel map involved in Theorem 2. Define

$$\begin{aligned} \Omega =(0,1)\times (0,1),\quad \mathcal {A}=\mathcal {B}\Bigl ((0,1)\times (0,1)\Bigr ),\quad \mathbb {P}=m\times m. \end{aligned}$$

For each \(n\ge 0\) and \((\alpha ,\beta )\in (0,1)\times (0,1)\), define also

$$\begin{aligned} \phi (\alpha )=\Phi (\alpha ,\mu _0)\quad \text {and}\quad X_n(\alpha ,\beta )=\Phi \Bigl (\beta ,\,\gamma _n[\phi (\alpha )]\Bigr ). \end{aligned}$$

The \(X_n\) are S-valued random variables on \((\Omega ,\mathcal {A},\mathbb {P})\). Arguing as in the proof of Theorem 1, it can be shown that \((X_n)\) is a coupling of \((\mu _n)\) and condition (3) is equivalent to \(X_n\overset{\mathbb {P}-a.s.}{\longrightarrow }X_0\). Finally, we prove (2). Fix \(A\in \mathcal {G}\) and note that

$$\begin{aligned} m\Bigl \{\beta :X_n(\alpha ,\beta )\in A,\,X_0(\alpha ,\beta )\notin A\Bigr \}&\le \min \Bigl \{m\bigl \{\beta :X_n(\alpha ,\beta )\in A\bigr \},\,m\bigl \{\beta :X_0(\alpha ,\beta )\notin A\bigr \}\Bigr \}\\ &=\min \Bigl \{\gamma _n[\phi (\alpha )](A),\,\gamma _0[\phi (\alpha )](A^c)\Bigr \}\quad \quad \text {for all }\alpha \in (0,1). \end{aligned}$$

Since \(A\in \mathcal {G}\), we have \(\gamma _n(x)(A)=1_A(x)\) for \(\mu _n\)-almost all \(x\in S\). Since \(\mu _n=\mu _0\) on \(\mathcal {G}\), it follows that

$$\begin{aligned} \mu _0\bigl \{x\in S:\,\gamma _n(x)(A)=1_A(x)\bigr \}=1. \end{aligned}$$

Therefore, since \(m\circ \phi ^{-1}=\mu _0\), one obtains

$$\begin{aligned} \mathbb {P}(X_n\in A,\,X_0\notin A)&=\int _0^1 m\Bigl \{\beta :X_n(\alpha ,\beta )\in A,\,X_0(\alpha ,\beta )\notin A\Bigr \}\,d\alpha \\ &\le \int _0^1 \min \Bigl \{\gamma _n[\phi (\alpha )](A),\,\gamma _0[\phi (\alpha )](A^c)\Bigr \}\,d\alpha \\ &=\int \min \Bigl \{\gamma _n(x)(A),\,\gamma _0(x)(A^c)\Bigr \}\,\mu _0(dx)\\ &=\int \min \Bigl \{1_A(x),\,1_{A^c}(x)\Bigr \}\,\mu _0(dx)=0. \end{aligned}$$

\(\square \)
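To make the two-stage construction \(X_n(\alpha ,\beta )=\Phi \bigl (\beta ,\gamma _n[\phi (\alpha )]\bigr )\) concrete, here is a minimal numerical sketch on \(S=\mathbb {R}\) with \(\mathcal {G}\) generated by the half line \((-\infty ,0)\); the map \(\Phi \) is the quantile transform and all distributional choices below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# mu_n: mass p on (-inf,0), where it is distributed as -Exp(a_n), and mass 1-p
# on [0,inf), where it is distributed as Exp(b_n).  Since p does not depend on
# n, mu_n = mu_0 on G = sigma((-inf,0)).
p = 0.3
a = [1.0] + [1.0 + 1.0 / n for n in range(1, 21)]   # a_n -> a_0
b = [2.0] + [2.0 + 2.0 / n for n in range(1, 21)]   # b_n -> b_0

def q_exp(beta, rate):                               # quantile function of Exp(rate)
    return -np.log1p(-beta) / rate

def X(n, alpha, beta):
    """X_n(alpha, beta) = Phi(beta, gamma_n[phi(alpha)]): phi(alpha) ~ mu_0 and
    gamma_n[x] is mu_n conditioned on the sign of x, so only the sign of
    phi(alpha), i.e. the event {alpha < p}, matters."""
    neg = alpha < p
    return np.where(neg, -q_exp(beta, a[n]), q_exp(beta, b[n]))

alpha, beta = rng.uniform(size=(2, 100_000))

X0, X5, X20 = X(0, alpha, beta), X(5, alpha, beta), X(20, alpha, beta)

print(np.mean(X0 < 0), np.mean(X5 < 0))        # marginals: both put mass ~ p on (-inf,0)
print(np.sum((X5 < 0) & (X0 >= 0)))            # condition (2): exactly 0 for A = (-inf,0)
print(np.abs(X5 - X0).max(), np.abs(X20 - X0).max())   # a.s. convergence, since (3) holds
```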

Theorem 3 extends Theorem 1 to an arbitrary sub-\(\sigma \)-field \(\mathcal {G}\subset \mathcal {B}(S)\). In fact, if g is as in Theorem 1, then \(g(X_n)=g(X_0)\) a.s. if and only if \(\mathbb {P}(X_n\in A,\,X_0\notin A)=0\) for all \(A\in \sigma (g)\).

One more remark on Theorem 3 is in order. If X and Y are S-valued random variables on \((\Omega ,\mathcal {A},\mathbb {P})\) such that

$$\begin{aligned} \mathbb {P}(X\in A,\,Y\notin A)=0\quad \quad \text {for all }A\in \mathcal {G}, \end{aligned}$$
(4)

then

$$\begin{aligned} \mathbb {P}(X\in A)=\mathbb {P}(X\in A,\,Y\in A)=\mathbb {P}(Y\in A)\quad \quad \text {for all }A\in \mathcal {G}. \end{aligned}$$

Therefore, for any tight probability measures \(\mu \) and \(\nu \) on \(\mathcal {B}(S)\), Theorem 3 yields

$$\begin{aligned} \mu =\nu \text { on }\mathcal {G}\quad \Leftrightarrow \quad &\text {Condition (4) holds for some }X\text { and }Y\\ &\text {such that }X\sim \mu \text { and }Y\sim \nu . \end{aligned}$$

We now turn to some applications of Theorem 1. We begin with an example which is not new but may help to clarify the scope of Theorem 1.

Example 2

(Corollary 2 of Pratelli and Rigo (2023)) For each \(n\ge 0\), let \(U_n\) and \(V_n\) be random variables on a probability space \((\Omega _n,\mathcal {A}_n,\mathbb {P}_n)\). Suppose \(U_n\) is \(S_1\)-valued and \(V_n\) is \(S_2\)-valued, where \(S_1\) and \(S_2\) are metric spaces and \(S_1\) is separable. Suppose also that \(U_n\sim U_0\) and \((U_n,V_n)\) has a tight probability distribution. Under these conditions, Theorem 1 applies to \(S=S_1\times S_2\) and \(g(x,y)=x\). It follows that, on a probability space \((\Omega ,\mathcal {A},\mathbb {P})\), there are random variables U and \(V_n^*\) such that \((U,V_n^*)\sim (U_n,V_n)\) for all \(n\ge 0\). Moreover, \(V_n^*\overset{a.s.}{\longrightarrow }V_0^*\) if and only if \(E_{\mu _n}(f\mid g)\overset{\mu _0-a.s.}{\longrightarrow }E_{\mu _0}(f\mid g)\) for each \(f\in C_b(S_2)\), where \(\mu _n\) denotes the probability distribution of \((U_n,V_n)\).

In a nutshell, Example 2 may be summarized as follows. If \(U_n\sim U_0\) for all n, the random variables \((U_n,V_n)\) can be replaced by \((U,V_n^*)\). In addition to satisfying \((U,V_n^*)\sim (U_n,V_n)\), the new random variables \((U,V_n^*)\) are all defined on the same probability space and they all have the same first coordinate (that is, U). Using \((U,V_n^*)\) instead of \((U_n,V_n)\) may be useful in various settings, such as mass transportation and stochastic control.
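As an illustration of this replacement, the following Python sketch (with made-up discrete and exponential distributions) builds the variables \((U,V_n^*)\) from a common pair of uniform variables, so that they all share the first coordinate while \((U,V_n^*)\sim (U_n,V_n)\).

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative setting: U_n in {0,1} with P(U_n = 1) = 0.4 for every n, and
# V_n | U_n = u ~ Exp(r[u][n]).  The laws of the U_n coincide, so the pairs
# (U_n, V_n) can be replaced by pairs (U, V_n*) sharing the SAME first coordinate.
r = {0: [1.0 + 1.0 / (n + 1) for n in range(10)],    # conditional rates for u = 0
     1: [3.0 - 1.0 / (n + 1) for n in range(10)]}    # conditional rates for u = 1

alpha = rng.uniform(size=50_000)      # drives the common first coordinate
beta = rng.uniform(size=50_000)       # drives the conditional quantile transform

U = (alpha < 0.4).astype(int)         # U has the common law of U_0, U_1, ...

def V_star(n):
    """V_n* = quantile of the law of V_n given U_n = U, evaluated at beta."""
    rate = np.where(U == 1, r[1][n], r[0][n])
    return -np.log1p(-beta) / rate

# (U, V_star(n)) ~ (U_n, V_n), and the first coordinate never depends on n.
print(np.mean(U))                                     # ~ 0.4
print(np.mean(V_star(0)[U == 0]), 1.0 / r[0][0])      # empirical vs exact conditional mean
```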

The next example deals with a sequence \(U_0,U_1,\ldots \) of cadlag processes indexed by \([0,\infty )\). Using Theorem 1 we prove that, if the continuous part of \(U_n\) is distributed as that of \(U_0\) for every n, then \(U_0,U_1,\ldots \) can be coupled so as to have exactly the same continuous part.

Example 3

(Decomposition of cadlag processes with finite activity) Let D be the set of real cadlag functions on \([0,\infty )\), equipped with the Skorohod topology. Define

$$\begin{aligned} S&=\bigl \{x\in D:\sum _{0<s\le t}|\Delta x(s)|<\infty \text { for each }t>0\bigr \}\quad \text {and}\\ g(x)(t)&=x(t)-\sum _{0<s\le t}\Delta x(s)\quad \quad \text {for each }x\in S\text { and }t\ge 0, \end{aligned}$$

where \(\Delta x(s)=x(s)-x(s-)\) is the jump of x at the point s. In financial econometrics, a cadlag function is said to have finite activity if it has only finitely many jumps on any bounded interval. Hence, in particular, S includes all elements of D with finite activity. In turn, the function g associates every \(x\in S\) with its continuous part g(x). It can be shown that \(g:S\rightarrow C\) is a Borel map, where C denotes the set of continuous functions on \([0,\infty )\) (we omit the calculations). Moreover, since D is Polish and \(S\in \mathcal {B}(D)\), each probability measure on \(\mathcal {B}(S)\) is tight.

For each \(n\ge 0\), let \(U_n\) be a process with paths in S. Suppose \(g(U_n)\sim g(U_0)\) for each \(n\ge 0\), namely, the continuous parts of the \(U_n\) are identically distributed. Then, there are processes I and \(J_n\) such that:

  • I and the \(J_n\) are all defined on the same probability space;

  • \(I+J_n\sim U_n\) for all \(n\ge 0\);

  • I has continuous paths while \(J_n\) is a pure jump process.

The existence of I and \(J_n\) follows from Theorem 1. It suffices to take \(\mu _n\) as the probability distribution of \(U_n\) and to let

$$\begin{aligned} I=g(X_0)\quad \text {and}\quad J_n=X_n-g(X_n)=X_n-g(X_0). \end{aligned}$$

Note also that \(I+J_n\overset{a.s.}{\longrightarrow }I+J_0\) (in the Skorohod topology) if and only if

$$\begin{aligned} E_{\mu _n}(f\mid g)\overset{\mu _0-a.s.}{\longrightarrow }E_{\mu _0}(f\mid g)\quad \quad \text {for each }f\in C_b(S). \end{aligned}$$
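The map g is easy to compute once a finite-activity path is stored through its continuous part and its jumps. The following Python sketch (with a made-up path) evaluates \(x(t)\) and checks that \(g(x)(t)=x(t)-\sum _{0<s\le t}\Delta x(s)\) returns the continuous part.

```python
import numpy as np

# A finite-activity cadlag path, encoded as a continuous part plus finitely
# many jumps (times and sizes below are purely illustrative).
def cont(t):
    return np.sin(t) + 0.1 * t            # continuous part c(t)

jump_times = np.array([0.7, 1.9, 3.2])
jump_sizes = np.array([1.0, -0.5, 2.0])

def cum_jumps(t):
    """sum_{0 < s <= t} Delta x(s), computed from the stored jump data."""
    return (jump_times[None, :] <= t[:, None]).astype(float) @ jump_sizes

def x(t):
    """x(t) = continuous part + accumulated jumps (a cadlag path)."""
    return cont(t) + cum_jumps(t)

def g_of_x(t):
    """g(x)(t) = x(t) - sum_{0 < s <= t} Delta x(s)."""
    return x(t) - cum_jumps(t)

grid = np.linspace(0.0, 5.0, 1001)
print(np.max(np.abs(g_of_x(grid) - cont(grid))))   # 0: g strips off the jumps
```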

Our next example deals with a notion of duality recently introduced by Jaffe (2023). In addition to being theoretically intriguing, this notion is potentially useful in various frameworks, including mathematical finance, decision theory, mass transportation and probability theory.

Example 4

(Equivalence couplings and total variation) To keep the notation simple, in this example we write \(\mathcal {B}\) instead of \(\mathcal {B}(S)\). Let \(E\subset S\times S\) be a measurable equivalence relation. This means that \(E\in \mathcal {B}\otimes \mathcal {B}\) and the relation on S defined as

$$\begin{aligned} x\sim y\quad \Leftrightarrow \quad (x,y)\in E \end{aligned}$$

is reflexive, symmetric and transitive. Say that E is strongly dualizable if there is a sub-\(\sigma \)-field \(\mathcal {C}\subset \mathcal {B}\) such that

$$\begin{aligned} \min _{P\in \Gamma (\mu ,\nu )}(1-P(E))=\sup _{A\in \mathcal {C}}\,|\mu (A)-\nu (A)| \end{aligned}$$
(5)

for all probability measures \(\mu \) and \(\nu \) on \(\mathcal {B}\). Here, \(\Gamma (\mu ,\nu )\) is the collection of probability measures on \(\mathcal {B}\otimes \mathcal {B}\) with marginals \(\mu \) and \(\nu \), and the notation “\(\min \)” indicates that the infimum is actually attained.

Various conditions for E to be strongly dualizable are given in Jaffe (2023); see also (Pratelli and Rigo 2024). One such condition is the following. Define the sub-\(\sigma \)-field

$$\begin{aligned} \mathcal {U}=\bigl \{A\in \mathcal {B}:1_A(x)=1_A(y)\quad \text {for all }(x,y)\in E\bigr \}. \end{aligned}$$

Then, E is strongly dualizable provided \(E\in \mathcal {U}\otimes \mathcal {U}\) and \((S,\mathcal {B})\) is a standard Borel space; see [Jaffe (2023), Theo. 3.13] and [Pratelli and Rigo (2024), Cor. 6]. As we now prove, however, this result is a consequence of Theorem 1. Moreover, the assumption that \((S,\mathcal {B})\) is standard Borel can be weakened.

Suppose \(E\in \mathcal {U}\otimes \mathcal {U}\) and \((S,\mathcal {B})\) is a Radon space. Since \(E\in \mathcal {U}\otimes \mathcal {U}\),

$$\begin{aligned} E\in \sigma \bigl (A_1\times B_1,\,A_2\times B_2,\,\ldots \bigr ) \end{aligned}$$

for some \(A_n,\,B_n\in \mathcal {U}\), \(n\ge 1\). Define \(\mathcal {G}=\sigma (A_1,\,B_1,\,A_2,\,B_2,\,\ldots )\). Since \(\mathcal {G}\) is a countably generated sub-\(\sigma \)-field of \(\mathcal {B}\), there is a Borel function \(g:S\rightarrow \mathbb {R}\) such that \(\mathcal {G}=\sigma (g)\). Moreover, since \(E\in \mathcal {G}\otimes \mathcal {G}\), one obtains

$$\begin{aligned} \bigl \{(x,y):\,g(x)=g(y)\bigr \}\subset E. \end{aligned}$$

Next, fix two probability measures \(\mu \) and \(\nu \) on \(\mathcal {B}\) such that \(\mu =\nu \) on \(\mathcal {U}\). Since \(\mathcal {G}\subset \mathcal {U}\) and \((S,\mathcal {B})\) is a Radon space, \(\mu \) and \(\nu \) are tight and \(\mu =\nu \) on \(\mathcal {G}\). Because of Theorem 1, applied with \(\mu _0=\mu \) and \(\mu _n=\nu \) for \(n>0\), there are S-valued random variables \(X_0\) and \(X_1\) such that \(X_0\sim \mu \), \(X_1\sim \nu \) and \(g(X_0)=g(X_1)\). Denoting by P the probability distribution of \((X_0,X_1)\), it follows that

$$\begin{aligned} P\in \Gamma (\mu ,\nu )\quad \text {and}\quad P(E)\ge P\bigl \{(x,y):\,g(x)=g(y)\bigr \}=1. \end{aligned}$$

Therefore, letting \(\mathcal {C}=\mathcal {U}\), equation (5) holds provided \(\mu =\nu \) on \(\mathcal {U}\). This concludes the proof. In fact, if \(\mathcal {C}=\mathcal {U}\), equation (5) holds for all \(\mu \) and \(\nu \) if and only if it holds for those \(\mu \) and \(\nu \) such that \(\mu =\nu \) on \(\mathcal {U}\); see e.g. [Jaffe (2023), Prop. 3.9].
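On a finite space, both sides of duality (5) can be computed exactly, the left-hand side as a small linear program over couplings. The following Python sketch (using scipy's linear-programming routine, with S, the equivalence E and the two measures all chosen for illustration) checks that the two sides agree.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import linprog

# S = {0,...,5}; x ~ y iff g(x) = g(y) with g(x) = x mod 3, so the invariant
# sigma-field U consists of the unions of the classes {0,3}, {1,4}, {2,5}.
S = np.arange(6)
g = S % 3
E = (g[:, None] == g[None, :])                     # indicator matrix of E

rng = np.random.default_rng(3)
mu = rng.dirichlet(np.ones(6))
nu = rng.dirichlet(np.ones(6))

# Right-hand side of (5): sup over unions A of equivalence classes of |mu(A) - nu(A)|.
classes = [np.where(g == c)[0] for c in range(3)]
rhs = 0.0
for k in range(4):
    for sel in combinations(range(3), k):
        A = np.concatenate([classes[c] for c in sel]) if sel else np.array([], dtype=int)
        rhs = max(rhs, abs(mu[A].sum() - nu[A].sum()))

# Left-hand side of (5): min over couplings P of (1 - P(E)), as a linear program
# in the 6x6 matrix of coupling probabilities.
cost = (~E).astype(float).ravel()                  # pay 1 whenever (x, y) is not in E
A_eq = np.zeros((12, 36))
for i in range(6):
    A_eq[i, i * 6:(i + 1) * 6] = 1.0               # row sums equal mu
    A_eq[6 + i, i::6] = 1.0                        # column sums equal nu
res = linprog(cost, A_eq=A_eq, b_eq=np.concatenate([mu, nu]), bounds=(0, None))

print(res.fun, rhs)   # the two sides of (5) agree up to solver tolerance
```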

4 Kernels versus models

Let \((\Theta ,\mathcal {H})\) and \((\mathcal {X},\mathcal {E})\) be measurable spaces. To avoid trivialities, we assume

$$\begin{aligned} \text {card}\,(\mathcal {X})>1. \end{aligned}$$

A model is a collection

$$\begin{aligned} \mathcal {P}=\{P_\theta :\,\theta \in \Theta \} \end{aligned}$$

where each \(P_\theta \) is a probability measure on \(\mathcal {E}\). A model \(\mathcal {P}\) is non-atomic if \(P_\theta \) is a non-atomic probability measure on \(\mathcal {E}\) for each \(\theta \in \Theta \). Moreover, \(\mathcal {P}\) is measurable if the real-valued map \(\theta \mapsto P_\theta (A)\) is \(\mathcal {H}\)-measurable for each fixed \(A\in \mathcal {E}\). A measurable model is usually called a kernel.

One more definition is needed. Suppose \(\mathcal {H}\) includes the singletons. Then, a model \(\mathcal {P}\) is said to be orthogonal if there is a measurable function \(h:\mathcal {X}\rightarrow \Theta \) such that

$$\begin{aligned} P_\theta \Bigl (h^{-1}\{\theta \}\Bigr )=1\quad \quad \text {for all }\theta \in \Theta . \end{aligned}$$

Here, measurability of h is meant as \(h^{-1}(\mathcal {H})\subset \mathcal {E}\). Orthogonal kernels are investigated in Mauldin et al. (1983) and (Weis 1984). They are involved in many contexts, including ergodic decompositions, Gibbs states, disintegrations and extremal models; see e.g. (Berti and Rigo 2007; Blackwell and Dubins 1975; Dynkin 1978; Farrell 1962; Föllmer 1975; Lauritzen 1974; Maitra 1977). The next example, though elementary, is useful to put orthogonal kernels into context.

Example 5

(An orthogonal kernel) For any real random variables U and V, there is an orthogonal version of the conditional distribution of \((U,V)\) given U. In fact, take \((\Theta ,\mathcal {H})=(\mathbb {R},\mathcal {B}(\mathbb {R}))\), \((\mathcal {X},\mathcal {E})=(\mathbb {R}^2,\mathcal {B}(\mathbb {R}^2))\) and define the function \(h(u,v)=u\) for all \((u,v)\in \mathbb {R}^2\). Also, denote by \(\pi \) the marginal distribution of U. Any kernel \(\mathcal {P}=\{P_\theta :\,\theta \in \Theta \}\) satisfying the equation

$$\begin{aligned} \text {Prob}\bigl (U\in A,\,V\in B\bigr )=\int _A P_\theta (\mathbb {R}\times B)\,\pi (d\theta ),\quad \text {for all }A,\,B\in \mathcal {B}(\mathbb {R}), \end{aligned}$$

is a version of the conditional distribution of \((U,V)\) given U. If \(\mathcal {P}\) is one such version, it is well known that

$$\begin{aligned} P_\theta \Bigl (h^{-1}\{\theta \}\Bigr )=P_\theta \bigl (\{\theta \}\times \mathbb {R}\bigr )=1\quad \quad \text {for }\pi \text {-almost all }\theta \in \mathbb {R}; \end{aligned}$$

see e.g. (Berti and Rigo 2007) and (Blackwell and Dubins 1975). Hence, up to modifying \(\mathcal {P}\) on a \(\pi \)-null set, one obtains a kernel \(\mathcal {Q}=\{Q_\theta :\,\theta \in \Theta \}\) such that

$$\begin{aligned} \pi \bigl \{\theta :Q_\theta \ne P_\theta \bigr \}=0\quad \text {and}\quad Q_\theta \Bigl (h^{-1}\{\theta \}\Bigr )=1\quad \quad \text {for all }\theta \in \mathbb {R}. \end{aligned}$$

Such a \(\mathcal {Q}\) is an orthogonal version of the conditional distribution of \((U,V)\) given U.
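In a discrete analogue, no null-set modification is needed and the orthogonality can be checked directly. The following Python sketch (with a made-up joint pmf) computes the conditional distribution of \((U,V)\) given U and verifies that it puts full mass on the fiber \(h^{-1}\{\theta \}\).

```python
import numpy as np

# joint[u, v] = P(U = u, V = v), with U in {0,1,2} and V in {0,1} (made-up numbers).
joint = np.array([[0.10, 0.20],
                  [0.05, 0.25],
                  [0.15, 0.25]])
pi = joint.sum(axis=1)                    # marginal law of U

def P(theta):
    """P_theta = conditional law of (U, V) given U = theta, as a 3x2 matrix."""
    out = np.zeros_like(joint)
    out[theta] = joint[theta] / pi[theta]
    return out

# With h(u, v) = u, each P_theta sits on {theta} x {0,1}: the kernel is orthogonal.
for theta in range(3):
    print(theta, P(theta)[theta].sum())   # P_theta(h^{-1}{theta}) = 1 for every theta
```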

In this section, we focus on the following result from (Hansen et al. 2024).

Theorem 4

(Hansen, Maccheroni, Marinacci, Sargent) Let \((\Theta ,\mathcal {H})\) and \((\mathcal {X},\mathcal {E})\) be standard Borel spaces and \(\mathcal {P}\) a kernel. Then, \(\mathcal {P}\) is non-atomic and orthogonal if and only if, for any other kernel \(\mathcal {Q}=\{Q_\theta :\theta \in \Theta \}\), there is a measurable function \(f:\mathcal {X}\rightarrow \mathcal {X}\) such that

$$\begin{aligned} Q_\theta =P_\theta \circ f^{-1}\quad \quad \text {for each }\theta \in \Theta . \end{aligned}$$

In Theorem 4, measurability of f is meant as \(f^{-1}(\mathcal {E})\subset \mathcal {E}\) and \(P_\theta \circ f^{-1}\) denotes the probability on \(\mathcal {E}\) defined as \(P_\theta \circ f^{-1}(A)=P_\theta \bigl (f^{-1}(A)\bigr )\) for all \(A\in \mathcal {E}\).

Essentially, Theorem 4 states that a kernel \(\mathcal {P}\) is non-atomic and orthogonal if and only if any other kernel \(\mathcal {Q}\) is a pushforward of \(\mathcal {P}\), in the sense that \(Q_\theta =P_\theta \circ f^{-1}\) for all \(\theta \) and a suitable function f. This characterization may be useful in every framework where kernels play a role, and the list of such frameworks is very long. In probability theory, for instance, kernels are obviously a basic ingredient: just think of conditional distributions or Markov processes. In Bayesian statistical inference, a kernel \(\mathcal {P}\) may be viewed as the collection of the distributions of the data conditional on a parameter. In decision theory, \(\mathcal {P}\) can be regarded as the collection of the distributions of a state-contingent payoff conditional on a parameter; see e.g. (Hansen et al. 2024). In weak optimal transport, each \(P_\theta \) provides information about how the mass taken at \(\theta \) is distributed over \(\mathcal {X}\); see e.g. (Chone and Kramarz 2021; Chone et al. 2023; Galichon et al. 2014) and references therein. Thus, Theorem 4 is well motivated in each of these frameworks.

The previous remarks are still valid if kernels are replaced by models. In fact, there are several problems where measurability of a kernel is superfluous. We support this claim by three examples.

Example 6

(Classical statistical inference) According to the classical approach to statistics, the two basic ingredients of an inferential problem are a measurable space \((\mathcal {X},\mathcal {E})\) and a model \(\mathcal {P}=\{P_\theta :\theta \in \Theta \}\). The set \(\mathcal {X}\) is regarded as the sample space and \(P_\theta \) is the probability distribution of the data when the value of the parameter is \(\theta \). Importantly, the parameter is viewed as an unknown but fixed constant, and there is no reason to integrate over it. Hence, the \(\sigma \)-field \(\mathcal {H}\) is superfluous and measurability of \(\mathcal {P}\) is not required. In the language of this paper, \(\mathcal {P}\) is a model but not a kernel.

Example 7

(Disintegrations) For any model \(\mathcal {P}\), let \(\sigma (\mathcal {P})\) denote the \(\sigma \)-field over \(\Theta \) generated by the maps \(\theta \mapsto P_\theta (A)\) for all \(A\in \mathcal {E}\). One of the main reasons for requiring measurability of a kernel is the need to define a probability on \(\mathcal {E}\) as

$$\begin{aligned} \mu _\pi (A)=\int P_\theta (A)\,\pi (d\theta ),\quad A\in \mathcal {E}, \end{aligned}$$
(6)

where \(\pi \) is a given probability on \(\mathcal {H}\). Such a \(\mu _\pi \) cannot be defined if \(\mathcal {P}\) is a model but not a kernel. In Bayesian inference, for instance, \(\mathcal {P}\) is required to be a kernel and \(\pi \) is the prior distribution. This procedure assumes that the \(\sigma \)-field \(\mathcal {H}\) is fixed before \(\mathcal {P}\). However, these two steps can be reversed: one first selects a model \(\mathcal {P}\) and then takes \(\mathcal {H}=\sigma (\mathcal {P})\). This is exactly what happens with non-measurable disintegrations. To illustrate, suppose we are given a probability P on \(\mathcal {E}\) and a partition \(\{A_\theta :\theta \in \Theta \}\) with \(A_\theta \in \mathcal {E}\) for all \(\theta \). A (non-measurable) disintegration for P is a pair \((\mathcal {P},\pi )\) where \(\mathcal {P}\) is a model, \(\pi \) is a probability on \(\sigma (\mathcal {P})\), and

  • \(P_\theta (A_\theta )=1\) for all \(\theta \in \Theta \);

  • \(P(A)=\int P_\theta (A)\,\pi (d\theta )\) for all \(A\in \mathcal {E}\).

A disintegration is said to be measurable if \(\Theta \) is equipped with a \(\sigma \)-field \(\mathcal {H}\) and \(\mathcal {P}\) is a kernel. Obviously, the conditions for having a non-measurable disintegration are much more general than those for a measurable disintegration; see e.g. (Berti et al. 2020) and references therein.
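As a concrete (and admittedly trivial) illustration of the two bullet conditions and of equation (6), the following Python sketch builds a measurable disintegration on a finite space, with all numbers made up.

```python
import numpy as np

# X = {0,...,5} with a probability P, partitioned by theta = parity of x.
P = np.array([0.05, 0.10, 0.20, 0.25, 0.15, 0.25])
theta_of = np.arange(6) % 2                       # A_0 = even points, A_1 = odd points
pi = np.array([P[theta_of == t].sum() for t in range(2)])

def P_theta(t):
    """P_t = P conditioned on A_t, so that P_t(A_t) = 1."""
    out = np.where(theta_of == t, P, 0.0)
    return out / out.sum()

# First bullet: P_theta(A_theta) = 1 for every theta.
print([P_theta(t)[theta_of == t].sum() for t in range(2)])
# Second bullet (equation (6)): P(A) = sum_t P_t(A) pi(t), checked pointwise,
# which is enough on a finite space.
mix = sum(pi[t] * P_theta(t) for t in range(2))
print(np.max(np.abs(mix - P)))                    # ~ 0 (up to rounding)
```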

Example 8

(Orthogonality preserving models) As noted in Example 7, if \(\mathcal {P}\) is a kernel and \(\pi \) a probability on \(\mathcal {H}\), one can define a probability \(\mu _\pi \) on \(\mathcal {E}\) via equation (6). A kernel \(\mathcal {P}\) is orthogonality preserving if \(\mu _{\pi _1}\) and \(\mu _{\pi _2}\) are singular whenever \(\pi _1\) and \(\pi _2\) are singular probabilities on \(\mathcal {H}\). It is straightforward to prove that an orthogonal kernel is orthogonality preserving; see (Mauldin et al. 1983). This implication is still valid if kernels are replaced by models. Indeed, in Proposition 7, we will show that a weakly orthogonal model (as defined below) is orthogonality preserving in a suitable sense.

We now extend Theorem 4 from kernels to models. Unlike Theorem 4, the extended version admits a straightforward proof. Moreover, the notion of orthogonality can be weakened.

For any model \(\mathcal {P}\), define the \(\sigma \)-field

$$\begin{aligned} \mathcal {E}_\mathcal {P}=\bigcap _{\theta \in \Theta }\overline{\mathcal {E}}^{P_\theta } \end{aligned}$$

where \(\overline{\mathcal {E}}^{P_\theta }\) is the completion of \(\mathcal {E}\) with respect to \(P_\theta \). Given a function \(f:\mathcal {X}\rightarrow \mathcal {X}\), we say that f is measurable if \(f^{-1}(\mathcal {E})\subset \mathcal {E}\) and that f is \(\mathcal {P}\)-measurable if \(f^{-1}(\mathcal {E})\subset \mathcal {E}_\mathcal {P}\). Note that f is \(\mathcal {P}\)-measurable if and only if it is measurable with respect to \(P_\theta \) for every \(\theta \in \Theta \). We also say that \(\mathcal {P}\) is weakly orthogonal if there is a partition \(\{A_\theta :\,\theta \in \Theta \}\) of \(\mathcal {X}\) such that

$$\begin{aligned} A_\theta \in \mathcal {E}_\mathcal {P}\quad \text {and}\quad P_\theta (A_\theta )=1\quad \text {for each }\theta \in \Theta . \end{aligned}$$
(7)

Here, with a slight abuse of notation, the unique extension of \(P_\theta \) to \(\mathcal {E}_\mathcal {P}\) is still denoted by \(P_\theta \). In this notation, the following result is available.

Theorem 5

Suppose \(\text {card}\,(\Theta )\le \text {card}\,(\mathcal {X})\) and \((\mathcal {X},\mathcal {E})\) is a Radon space. Then, a model \(\mathcal {P}\) is non-atomic and weakly orthogonal if and only if, for any other model \(\mathcal {Q}\), there is a \(\mathcal {P}\)-measurable function \(f:\mathcal {X}\rightarrow \mathcal {X}\) such that

$$\begin{aligned} Q_\theta =P_\theta \circ f^{-1}\quad \quad \text {for each }\theta \in \Theta . \end{aligned}$$
(8)

Proof

If \(\mathcal {E}\) does not support non-atomic probability measures, non-atomic models do not exist and condition (8) certainly fails for some choice of \(\mathcal {Q}\). Hence, \(\mathcal {E}\) can be assumed to support a non-atomic probability measure.

Suppose \(\mathcal {P}\) is non-atomic and weakly orthogonal. Fix a model \(\mathcal {Q}\) and a partition \(\{A_\theta :\,\theta \in \Theta \}\) of \(\mathcal {X}\) satisfying condition (7). Given \(\theta \in \Theta \), since \(Q_\theta \) is tight and \(P_\theta \) is a non-atomic probability measure, there is a measurable function \(f_\theta :\mathcal {X}\rightarrow \mathcal {X}\) such that \(Q_\theta =P_\theta \circ f_\theta ^{-1}\); see [Berti et al. (2007), Theo. 3.1]. For each \(x\in \mathcal {X}\), denote by \(\theta _x\) the unique \(\theta \in \Theta \) such that \(x\in A_\theta \). Define a function \(f:\mathcal {X}\rightarrow \mathcal {X}\) as

$$\begin{aligned} f(x)=f_{\theta _x}(x)\quad \quad \text {for every }x\in \mathcal {X}. \end{aligned}$$

Fix \(\theta \in \Theta \) and \(A\in \mathcal {E}\). Then,

$$\begin{aligned} \bigl \{f\in A\bigr \}=\bigl \{f_\theta \in A,\,f=f_\theta \bigr \}\cup \bigl \{f\in A,\,f\ne f_\theta \bigr \}. \end{aligned}$$

Since \(\{f\ne f_\theta \}\subset A_\theta ^c\) and \(P_\theta (A_\theta ^c)=0\), both the sets \(\bigl \{f=f_\theta \bigr \}\) and \(\bigl \{f\in A,\,f\ne f_\theta \bigr \}\) belong to \(\overline{\mathcal {E}}^{P_\theta }\). Since \(f_\theta \) is measurable, \(\bigl \{f_\theta \in A\bigr \}\in \mathcal {E}\). It follows that

$$\begin{aligned} \bigl \{f\in A\bigr \}\in \,\overline{\mathcal {E}}^{P_\theta }. \end{aligned}$$

Therefore, f is \(\mathcal {P}\)-measurable. Furthermore,

$$\begin{aligned} Q_\theta =P_\theta \circ f_\theta ^{-1}=P_\theta \circ f^{-1}\quad \quad \text {for each }\theta \in \Theta . \end{aligned}$$

Conversely, suppose that, for any model \(\mathcal {Q}\), there is a \(\mathcal {P}\)-measurable function \(f:\mathcal {X}\rightarrow \mathcal {X}\) satisfying condition (8). Fix \(\theta \in \Theta \) and a non-atomic probability measure \(\nu \) on \(\mathcal {E}\). Taking \(\mathcal {Q}\) such that \(Q_\theta =\nu \), condition (8) implies \(P_\theta \circ f^{-1}=\nu \) for some f. Hence, \(P_\theta \) is non-atomic since \(\nu \) is non-atomic and \(P_\theta \circ f^{-1}=\nu \). We next prove that \(\mathcal {P}\) is weakly orthogonal. Since \(\text {card}\,(\Theta )\le \text {card}\,(\mathcal {X})\), there is an injective function \(\phi :\Theta \rightarrow \mathcal {X}\). Letting \(Q_\theta =\delta _{\phi (\theta )}\) for each \(\theta \in \Theta \), condition (8) yields

$$\begin{aligned} P_\theta \bigl (f^{-1}\{\phi (\theta )\}\bigr )=\delta _{\phi (\theta )}\{\phi (\theta )\}=1 \end{aligned}$$

for some \(\mathcal {P}\)-measurable function \(f:\mathcal {X}\rightarrow \mathcal {X}\). Define \(B_\theta =f^{-1}\{\phi (\theta )\}\) and

$$\begin{aligned} D=\Bigl (\bigcup _{\theta \in \Theta } B_\theta \Bigr )^c. \end{aligned}$$

The sets \(B_\theta \) belong to \(\mathcal {E}_\mathcal {P}\) and are pairwise disjoint since \(\phi \) is injective. Moreover, \(D\in \mathcal {E}_\mathcal {P}\) since \(D\subset B_\theta ^c\) and \(P_\theta (B_\theta ^c)=0\) for all \(\theta \in \Theta \). Hence, having fixed any point \(\theta _0\in \Theta \), condition (7) holds with \(A_\theta =B_\theta \) for \(\theta \ne \theta _0\) and \(A_{\theta _0}=B_{\theta _0}\cup D\). \(\square \)
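The gluing step \(f(x)=f_{\theta _x}(x)\) in the direct half of the proof is easy to visualize numerically. The following Python sketch does so in a toy case (all choices illustrative): \(\Theta =\{0,1\}\), \(\mathcal {X}=[0,1]\), \(P_0\) uniform on \(A_0=[0,1/2)\) and \(P_1\) uniform on \(A_1=[1/2,1]\), so that \(\mathcal {P}\) is non-atomic and weakly orthogonal, and the targets are \(Q_0\) with distribution function \(x^2\) and \(Q_1\) with distribution function \(\sqrt{x}\).

```python
import numpy as np

rng = np.random.default_rng(4)

def f0(x):                       # quantile of Q_0 composed with the CDF of P_0
    return np.sqrt(2.0 * x)              # the P_0-distribution of f0 is Q_0
def f1(x):                       # quantile of Q_1 composed with the CDF of P_1
    return (2.0 * (x - 0.5)) ** 2        # the P_1-distribution of f1 is Q_1

def f(x):
    """f(x) = f_{theta_x}(x), where theta_x is the cell of the partition containing x."""
    return np.where(x < 0.5, f0(x), f1(x))

# Monte Carlo check of Q_theta = P_theta o f^{-1} at a grid of points.
x0 = rng.uniform(0.0, 0.5, size=200_000)     # sample from P_0
x1 = rng.uniform(0.5, 1.0, size=200_000)     # sample from P_1
grid = np.linspace(0.0, 1.0, 11)
cdf0 = np.mean(f(x0)[:, None] <= grid, axis=0)
cdf1 = np.mean(f(x1)[:, None] <= grid, axis=0)
print(np.max(np.abs(cdf0 - grid ** 2)))      # ~ 0: P_0 o f^{-1} = Q_0
print(np.max(np.abs(cdf1 - np.sqrt(grid))))  # ~ 0: P_1 o f^{-1} = Q_1
```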

We do not know whether the assumption \(\text {card}\,(\Theta )\le \text {card}\,(\mathcal {X})\) can be dropped. Such an assumption, instead, is superfluous in Theorem 4. In fact, Theorem 4 is trivially true if \(\mathcal {X}\) is countable. Otherwise, if \(\mathcal {X}\) is uncountable, \(\text {card}\,(\Theta )\le \text {card}\,(\mathcal {X})\) follows from the fact that \((\Theta ,\mathcal {H})\) and \((\mathcal {X},\mathcal {E})\) are standard Borel spaces.

As noted above, the heuristic interpretation of kernels applies to models as well. Thus, Theorem 5 has essentially the same motivations as Theorem 4.

Our next result is actually a mixture of Theorems 1, 4 and 5. Let \(\mathcal {P},\,\mathcal {Q}_0,\,\mathcal {Q}_1,\ldots \) be models with \(\mathcal {P}\) non-atomic and weakly orthogonal. By Theorem 5, for each \(n\ge 0\), there is a \(\mathcal {P}\)-measurable function \(f_n:\mathcal {X}\rightarrow \mathcal {X}\) such that \(P_\theta \circ f_n^{-1}=Q_{n,\theta }\) for all \(\theta \). We now prove that, if \(Q_{n,\theta }=Q_{0,\theta }\) on \(\sigma (g)\) for all \(\theta \) and a suitable function g, then \(f_n\) can be taken such that \(g(f_n)=g(f_0)\). Moreover, we give conditions for \(f_n\overset{P_\theta -a.s.}{\longrightarrow }f_0\), as \(n\rightarrow \infty \), for fixed \(\theta \in \Theta \).

Theorem 6

Let \((\mathcal {X},\mathcal {E})\) be a Radon space, \(\mathcal {Y}\) a separable metric space, and \(g:\mathcal {X}\rightarrow \mathcal {Y}\) a Borel function. Let \(\mathcal {P}\) and \(\mathcal {Q}_n\) be models, where \(n\ge 0\). Suppose \(\mathcal {P}\) is non-atomic and weakly orthogonal and

$$\begin{aligned} Q_{n,\theta }=Q_{0,\theta }\quad \text {on }\sigma (g)\text { for all }n\ge 0\text { and }\theta \in \Theta . \end{aligned}$$

Then, there are \(\mathcal {P}\)-measurable functions \(f_n:\mathcal {X}\rightarrow \mathcal {X}\) such that

$$\begin{aligned} P_\theta \circ f_n^{-1}=Q_{n,\theta }\,\text { and }\,g(f_n)=g(f_0)\,\text { for all }n\ge 0\text { and }\theta \in \Theta . \end{aligned}$$
(9)

In addition to (9), for fixed \(\theta \in \Theta \), one obtains

$$\begin{aligned} f_n\overset{P_\theta -a.s.}{\longrightarrow }f_0 \end{aligned}$$

whenever

$$\begin{aligned} E_{Q_{n,\theta }}(\varphi \mid g)\,\overset{Q_{0,\theta }-a.s.}{\longrightarrow }\,E_{Q_{0,\theta }}(\varphi \mid g)\quad \quad \text {for each }\varphi \in C_b(\mathcal {X}). \end{aligned}$$
(10)

Proof

Fix \(\theta \in \Theta \). By Corollary 4 of Pratelli and Rigo (2023), since \((\mathcal {X},\mathcal {E})\) is Radon and \((\mathcal {X},\mathcal {E},P_\theta )\) is a non-atomic probability space, there are measurable functions \(f_{n,\theta }:\mathcal {X}\rightarrow \mathcal {X}\) such that

$$\begin{aligned} Q_{n,\theta }=P_\theta \circ f_{n,\theta }^{-1}\quad \text {and}\quad g(f_{n,\theta })=g(f_{0,\theta })\quad \text {for all }n\ge 0. \end{aligned}$$

Moreover, under condition (10), one also obtains \(f_{n,\theta }\overset{P_\theta -a.s.}{\longrightarrow }f_{0,\theta }\).

Next, since \(\mathcal {P}\) is weakly orthogonal, there is a partition \(\{A_\theta :\,\theta \in \Theta \}\) of \(\mathcal {X}\) satisfying condition (7). For all \(n\ge 0\) and \(x\in \mathcal {X}\), define

$$\begin{aligned} f_n(x)=f_{n,\theta _x}(x) \end{aligned}$$

where \(\theta _x\) denotes the unique \(\theta \in \Theta \) such that \(x\in A_\theta \). Then, it is obvious that \(g(f_n)=g(f_0)\) for all n. Moreover, arguing as in the proof of Theorem 5, the \(f_n\) are \(\mathcal {P}\)-measurable and \(Q_{n,\theta }=P_\theta \circ f_{n}^{-1}\) for all n and \(\theta \). Finally, since \(P_\theta \bigl (f_n=f_{n,\theta }\bigr )=1\), condition (10) implies \(f_n\overset{P_\theta -a.s.}{\longrightarrow }f_0\). \(\square \)

Incidentally, we note that \(\text {card}\,(\Theta )\le \text {card}\,(\mathcal {X})\) under the assumptions of Theorem 6. In fact, \(\text {card}\,(\Theta )\le \text {card}\,(\mathcal {X})\) follows from \(\mathcal {P}\) being weakly orthogonal.

We close the paper by proving a claim made in Example 8.

Proposition 7

Let \(\mathcal {P}\) be a model and \(\sigma (\mathcal {P})\) the \(\sigma \)-field defined in Example 7. If \(\mathcal {P}\) is weakly orthogonal and \(\pi _1\) and \(\pi _2\) are singular probabilities on \(\sigma (\mathcal {P})\), then

$$\begin{aligned} \mu _{\pi _1}(A)=\mu _{\pi _2}(A^c)=1\quad \quad \text {for some }A\in \mathcal {E}_\mathcal {P}. \end{aligned}$$

Proof

Let \(\{A_\theta :\,\theta \in \Theta \}\) be a partition of \(\mathcal {X}\) satisfying condition (7). Since \(\pi _1\) and \(\pi _2\) are singular, there is \(B\in \sigma (\mathcal {P})\) such that \(\pi _1(B)=\pi _2(B^c)=1\). Define

$$\begin{aligned} A=\bigcup _{\theta \in B}A_\theta . \end{aligned}$$

Then, \(A\supset A_\theta \) for \(\theta \in B\) and \(A\subset A_\theta ^c\) for \(\theta \in B^c\). Since \(P_\theta (A_\theta )=1\) for all \(\theta \in \Theta \), it follows that \(A\in \mathcal {E}_\mathcal {P}\). Moreover,

$$\begin{aligned} \mu _{\pi _1}(A)=\int P_\theta (A)\,\pi _1(d\theta )=\int _B P_\theta (A)\,\pi _1(d\theta )=\int _B 1\,d\pi _1=\pi _1(B)=1, \end{aligned}$$

and similarly \(\mu _{\pi _2}(A^c)=1\). \(\square \)