Abstract
The Skorokhod embedding problem is to represent a given probability as the distribution of Brownian motion at a chosen stopping time. Over the last 50 years this has become one of the important classical problems in probability theory and a number of authors have constructed solutions with particular optimality properties. These constructions employ a variety of techniques ranging from excursion theory to potential and PDE theory and have been used in many different branches of pure and applied probability. We develop a new approach to Skorokhod embedding based on ideas and concepts from optimal mass transport. In analogy to the celebrated article of Gangbo and McCann on the geometry of optimal transport, we establish a geometric characterization of Skorokhod embeddings with desired optimality properties. This leads to a systematic method to construct optimal embeddings. It allows us, for the first time, to derive all known optimal Skorokhod embeddings as special cases of one unified construction and leads to a variety of new embeddings. While previous constructions typically used particular properties of Brownian motion, our approach applies to all sufficiently regular Markov processes.
Introduction
Let B be a Brownian motion started in 0 and consider a probability \(\mu \) on the real line which is centered and has finite second moment. The Skorokhod embedding problem is to construct a stopping time \(\tau \) embedding \(\mu \) into Brownian motion in the sense that
\[ B_\tau \sim \mu , \qquad \mathbb {E}[\tau ] < \infty . \tag{SEP} \]
Here, the second condition is imposed to exclude certain undesirable solutions, and can be modified to extend to measures without a second moment. As already demonstrated by Skorokhod [53, 54] in the early 1960s, it is always possible to construct solutions to the problem. Indeed, the survey article [43] of Obłój classifies 21 distinct solutions to (SEP), although this list (from 2004) misses many more recent contributions. A common inspiration for many of these papers is to construct solutions to (SEP) that exhibit additional desirable properties or a distinct internal structure. These have found applications in different fields and various extensions of the original problem have been considered. We refer to [43] (and the 120+ references therein) for a comprehensive account of the field.
Our aim is to develop a new approach to (SEP) based on ideas from optimal transport. Many of the previous developments are thus obtained as applications of one unifying principle (Theorem 1.3) and several difficult problems are rendered tractable. Moreover, our methods can easily handle a number of more general versions of the problem: for example, integrable measures, general starting distributions, and \(\mathbb {R}^d\)-valued Feller processes.
A motivating example: Root’s construction
To illustrate our approach we introduce Root’s construction [48], which will serve as inspiration in the rest of the paper. Root’s construction is one of the earliest solutions to (SEP), and it is prototypical for many further solutions to (SEP) in that it has a simple geometric description and possesses a certain optimality property in the class of all solutions.
Root established that there exists a barrier \({\mathcal {R}}\) (which is essentially unique) such that the Skorokhod embedding problem is solved by the stopping time
\[ \tau _{\text {Root}} := \inf \{ t \ge 0 : (t,B_t)\in {\mathcal {R}}\}. \tag{1.1} \]
A barrier is a Borel set \({\mathcal {R}} \subseteq \mathbb {R}_+\times \mathbb {R}\) such that \((s,x)\in {\mathcal {R}}\) and \(s < t\) implies \((t,x)\in {\mathcal {R}}\) (see Fig. 1). The Root construction is distinguished by the following optimality property: among all solutions to (SEP) for a fixed terminal distribution \(\mu \), it minimizes \(\mathbb {E}[\tau ^2]\). For us, the optimality property will be the starting point from which we deduce a geometric characterization of \(\tau _{\text {Root}}\). To this end, we now formalize the corresponding optimization problem.
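As a minimal illustration of a barrier-type embedding (a sketch for one specific target measure, not taken from the text above), take \(\mu = \tfrac{1}{2}(\delta _{-1}+\delta _{1})\). The set below does not depend on the time coordinate, so it is trivially a barrier, and its hitting time is the first exit time of B from \((-1,1)\):

\[
{\mathcal {R}} = \{(t,x) : t\ge 0,\ |x|\ge 1\}, \qquad
\tau = \inf \{t\ge 0 : |B_t| = 1\}, \qquad
\mathbb {P}(B_\tau = \pm 1) = \tfrac{1}{2}, \qquad
\mathbb {E}[\tau ] = \mathbb {E}[B_\tau ^2] = 1 .
\]

The last identity uses optional stopping for the martingale \(B_t^2 - t\), so both requirements of (SEP) hold in this example.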
Optimal Skorokhod embedding problem
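We consider the set of stopped paths
\[ S := \{(f,s) : f:[0,s]\rightarrow \mathbb {R} \text{ is continuous},\ f(0)=0\}. \tag{1.2} \]
Throughout the paper we consider a function
\[ \gamma : S \rightarrow \mathbb {R}. \]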
We fix a stochastic basis \(\Omega =(\Omega ,\mathcal {G},(\mathcal {G}_t)_{t\ge 0},\mathbb {P})\) which is sufficiently rich to support a Brownian motion B and a uniformly distributed \(\mathcal {G}_0\)-measurable random variable, independent of B. The optimal Skorokhod embedding problem is to construct a stopping time \(\tau \) on \(\Omega \) which optimizes
\[ P_\gamma := \inf \{ \mathbb {E}[\gamma ((B_t)_{t\le \tau },\tau )] : \tau \text{ solves (SEP)} \}. \tag{OptSEP} \]
We emphasize that (OptSEP) does not depend on the particular choice of the underlying basis as long as it is rich enough in the above sense, cf. Lemma 3.11/Sect. 4.1. We will usually assume that (OptSEP) is well posed in the sense that \(\mathbb {E}[\gamma ((B_t)_{t\le \tau },\tau )]\) exists with values in \((-\infty ,\infty ]\) for all \(\tau \) which solve (SEP) and is finite for one such \(\tau \).
The Root stopping time solves (OptSEP) in the case where \(\gamma (f,s)= s^2\). Other examples where the solution is known include functions depending on the running maximum \(\gamma ((f,s)):= {\bar{f}}(s):= \max _{t\le s} f(t)\) or functions of the local time at 0.
The solutions to (SEP) have their origins in many different branches of probability theory, and in many cases, the original derivation of the embedding occurred separately from the proof of the corresponding optimality properties. Moreover, the optimality of a given construction is often not immediate; for example, the optimality property of the Root embedding was first conjectured by Kiefer [36] and subsequently established by Rost [50].
In contrast to existing work, we will start with the optimization problem (OptSEP) and we seek a systematic method to determine the minimizer for a given function \(\gamma \). To develop a general theory for this optimization problem we interpret stopping times in terms of a transport plan from the Wiener space \(({C_0(\mathbb {R}_+)},{\mathbb {W}})\) to the target measure \(\mu \), i.e. we want to think of a stopping time \(\tau \) as transporting the mass of a trajectory \((B_t(\omega ))_{t \in \mathbb {R}_+}\) to the point \(B_{\tau (\omega )}(\omega )\in \mathbb {R}.\) Note that this is not a coupling between \({\mathbb {W}}\) and \(\mu \) in the usual sense and one cannot directly apply optimal transport theory. Nevertheless the transport perspective provides a powerful intuition that guides us to develop an analogous theory, which in particular accounts for the adaptedness properties of stopping times. To this end, it is necessary to combine ideas and results from optimal transport with concepts and techniques from stochastic analysis.
As in optimal transport, it is crucial to consider (OptSEP) in a suitably relaxed form, i.e. in (OptSEP) we will optimize over randomized stopping times (see Definition 3.7 below). These can be viewed as usual stopping times on a possibly enlarged probability space but in our context it is more natural to interpret them as stopping times of ‘Kantorovich-type’ (in the sense of optimal transport), i.e. stopping times which terminate a given path not at a single deterministic time instance but according to a distribution.
This relaxation will allow us to transfer many of the convenient properties of classical transport theory to our probabilistic setup. Exactly as in classical transport theory, (OptSEP) can be viewed as a linear optimization problem. The set of couplings in mass transport is compact and similarly the set of all randomized stopping times solving (SEP) on Wiener space is compact in a natural sense. Under the standing assumption that B is defined on a sufficiently rich stochastic basis, these considerations allow us to prove:
Theorem 1.1
Let \(\gamma :S\rightarrow \mathbb {R}\) be lsc and bounded from below. Then (OptSEP) admits a minimizing stopping time \(\tau \).
Here we can talk about the continuity properties of \(\gamma \) since S possesses a natural Polish topology (cf. (3.1)).
In the language of linear optimization, Theorem 1.1 is a primal problem. It is therefore natural to expect that there exists a corresponding dual problem, and our second main result concerns this duality:
Theorem 1.2
Let \(\gamma : S \rightarrow \mathbb {R}\) be lsc and bounded from below, and set
where \(M, \psi \) satisfy \(M_t \le a + bt + c B_t^2\) and \(\psi (y) \le a + b y^2\) for some \(a,b,c>0\). Then we have the duality relation
We will prove this result in Sect. 4, and variants of this result will prove to be important in establishing later results. Theorem 1.2 has close analogues in the literature. In particular, using Hobson’s time change argument [30, 31], Theorem 1.2 is comparable to the work of Dolinsky and Soner [20, 21]. Similar duality results in a discrete time framework are established by Bouchard and Nutz [9] among others.
Geometric characterization of optimizers: monotonicity principle
A fundamental idea in optimal transport is that the optimality of a transport plan is reflected by the geometry of its support set. Often this is key to understanding the transport problem. On the level of support sets, the relevant notion is c-cyclical monotonicity. The relevance of this concept for the theory of optimal transport has been fully recognized by Gangbo and McCann [24], based on earlier work of Knott and Smith [37] and Rüschendorf [51, 52] among others.
Inspired by these results, we establish a monotonicity principle which links the optimality of a stopping time \(\tau \) with ‘geometric’ properties of \(\tau \). Combined with Theorem 1.1, this principle will turn out to be surprisingly powerful. For the first time, all the known solutions to (SEP) with optimality properties can be established through one unifying principle. Moreover, the monotonicity principle allows us to treat the optimization problem (OptSEP) in a systematic manner, generating further embeddings as a byproduct.
Our third main result states:
Theorem 1.3
(Monotonicity principle) Let \(\gamma :S\rightarrow \mathbb {R}\) be Borel measurable. Suppose that (OptSEP) is well posed and \(\tau \) is an optimizer. Then there exists a \(\gamma \)-monotone (cf. Definition 1.5 below) Borel set \(\Gamma \subseteq S\) such that \(\mathbb {P}\)-a.s.
\[ ((B_t)_{t\le \tau },\tau )\in \Gamma . \tag{1.4} \]
If (1.4) holds, we will loosely say that \(\Gamma \) supports \(\tau \). The significance of Theorem 1.3 is that it links the optimality of the stopping time \(\tau \) with a particular property of the set \(\Gamma \), i.e. \(\gamma \)-monotonicity. In applications, the latter turns out to be much more tangible. We emphasize that we do not require continuity assumptions on \(\gamma \) in this result. This will be important when we apply our results.
To link the optimality of a stopping time with properties of the set \(\Gamma \) we consider the minimization problem (OptSEP) on a pathwise level. Consider two paths \((f,s), (g,t)\in S\) which end at the same value, i.e. \(f(s)=g(t)\). We want to determine which of the two paths should be stopped and which one should be allowed to go on further, bearing in mind that we try to minimize \(\mathbb {E}[\gamma ((B_s)_{s\le \tau }, \tau )]\). To make this formal, we need to perform an operation at the level of individual paths. We will write \(f\oplus h\) for the concatenation of the two paths \((f,s), (h,u) \in S\), specifically:
\[ (f\oplus h)(r) := \begin{cases} f(r), & r\in [0,s],\\ f(s)+h(r-s), & r\in (s,s+u]. \end{cases} \]
Then we set
\[ \gamma ^{(f,s)\oplus }(h,u) := \gamma (f\oplus h, s+u). \]
We will call ((f, s), (g, t)) a stop-go pair if it is advantageous to stop (f, s) and to go on after (g, t) in the following sense:
Definition 1.4
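The pair \(((f,s), (g,t))\in S\times S\) is a stop-go pair, written \(((f,s), (g,t))\in \mathsf {SG}\), iff \(f(s)=g(t)\) and
\[ \mathbb {E}\big [\gamma ^{(f,s)\oplus }\big ((B_u)_{u\le \sigma },\sigma \big )\big ] + \gamma (g,t) > \gamma (f,s) + \mathbb {E}\big [\gamma ^{(g,t)\oplus }\big ((B_u)_{u\le \sigma },\sigma \big )\big ] \tag{1.6} \]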
for every \((\mathcal {F}^B_t)_{t \ge 0}\)-stopping time \(\sigma \) which satisfies \(0< \mathbb {E}[\sigma ] < \infty \) and for which both sides of (1.6) are well defined and the left hand side is finite.
Here \((\mathcal {F}_t^B)_{t \ge 0}\) denotes the natural filtration generated by the Brownian motion B. A consequence of considering only \((\mathcal {F}_t^B)_{t \ge 0}\)-stopping times is that the set \(\mathsf {SG}\) does not depend on the particular choice of the underlying stochastic basis.
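As a concrete instance, consider the Root cost \(\gamma (f,s)=s^2\) mentioned above. The computation below is only a sketch: it assumes that the stop-go inequality compares the cost of continuing \((f,s)\) and stopping \((g,t)\) (left-hand side) with the cost of stopping \((f,s)\) and continuing \((g,t)\) (right-hand side), and that \(\mathbb {E}[\sigma ^2]<\infty \) so that both sides are finite:

\[
\mathbb {E}[(s+\sigma )^2] + t^2 > s^2 + \mathbb {E}[(t+\sigma )^2]
\iff 2s\,\mathbb {E}[\sigma ] > 2t\,\mathbb {E}[\sigma ]
\iff s > t ,
\]

where the last step uses \(\mathbb {E}[\sigma ]>0\). Under these assumptions, two paths ending at the same level form a stop-go pair exactly when the path to be stopped is the one that has run for the longer time.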
We note that a swapping of paths (as illustrated in Fig. 2) was used by Hobson [31, p. 34] to provide a heuristic derivation of the optimality properties of the Root embedding. Indeed Hobson’s approach was the starting point of the present paper.
Recalling (1.4), we see that the set \(\Gamma \subseteq S\) contains the stopped paths: that is, a path (g, t) is in \(\Gamma \) if there is some possibility that the optimal stopping rule decides to stop at time t having observed the path \((g(u))_{u \in [0,t]}\). In addition, we need to consider those paths which we observe as the initial section of a longer, stopped, path: these are the going paths
\[ \Gamma ^< := \{(f,s): \exists ({\tilde{f}},{\tilde{s}})\in \Gamma ,\ s< {\tilde{s}} \text{ and } f\equiv {\tilde{f}} \text{ on } [0,s]\}. \]
We can now formally introduce \(\gamma \)-monotonicity.
Definition 1.5
A set \(\Gamma \subseteq S\) is called \(\gamma \)-monotone iff \(\Gamma ^< \times \Gamma \) contains no stop-go pairs, i.e.
\[ \mathsf {SG}\cap (\Gamma ^< \times \Gamma ) = \emptyset . \]
By the monotonicity principle, Theorem 1.3, an optimal stopping time is supported by a set \(\Gamma \) such that \(\Gamma ^<\times \Gamma \) contains no stop-go pair \(\big ((f,s),(g,t)\big )\). Intuitively, such a pair gives rise to a possible modification, improving the given stopping rule: as \(f(s) = g(t)\), we can imagine stopping the path (f, s) at time s, and allowing (g, t) to go on by transferring all paths which extend (f, s), the ‘remaining lifetime’, onto (g, t), which is now going (see Fig. 2). By (1.6) this guarantees an improved value of \(P_\gamma \), contradicting the optimality of our stopping rule. Observe that the condition \(f(s)=g(t)\) is what guarantees that a modified stopping rule still embeds the measure \(\mu \). In Sect. 2 below we will briefly indicate how the monotonicity principle can be used to derive existing solutions to the Skorokhod embedding problem as well as a whole family of novel solutions to the Skorokhod embedding problem; many further examples will be provided in Sect. 6.
Importantly, the transport-based approach readily admits a number of strong generalizations and extensions. With only minor changes the existence result, Theorem 1.1, the duality result, Theorem 1.2, and the monotonicity principle, Theorem 1.3, extend to general starting distributions and Brownian motion in \(\mathbb {R}^d\), and more generally to sufficiently regular Markov processes; see Sects. 5 and 7. This is notable since previous constructions usually exploit rather specific properties of Brownian motion.
The monotonicity principle, Theorem 1.3, represents the culmination of the three main results, and the proof of this result will be the most complex part of this paper, requiring substantial preparation in order to combine the relevant concepts from stochastic analysis and optimal transport. The preparation and proof of this result will therefore comprise the majority of the paper. In fact the proof will automatically imply a stronger version (Theorem 5.7) of Theorem 1.3. For our applications, it will also be helpful to introduce a version of this result which incorporates a secondary optimization, Theorem 5.16.
The ‘classical’ optimal transport version of Theorem 1.3 can be established through fairly direct arguments, at least in a reasonably regular setting, cf. [3, Thms. 3.2,3.3] and [57, p. 88f]. However, these approaches do not extend easily to our setup: stopping times are of course not couplings in the usual sense and there is no reason for particular combinatorial manipulations to carry over in a direct fashion. Another substantial difference is that the procedure of transferring paths described below Definition 1.5 necessarily refers to a continuum of paths while the classical notion of cyclical monotonicity is concerned with rearrangements along finite cycles. The argument given subsequently is more in the spirit of [6, 8] and requires a fusion of ideas from optimal transport and stochastic analysis.
New horizons
The results presented in this paper are limited to the case of the classical Skorokhod embedding problem for Markov processes with continuous paths. However we believe that our methods are sufficiently general that a number of interesting and important extensions, which previously would have been intractable, may now be within reach:

1. Markov processes. The results presented in this paper should extend to a more general class of Markov processes with càdlàg paths. The main technical issues this would present lie in the generalization of the results in Sect. 3, where the specific structure of the space of continuous paths is exploited.

2. Multiple path-swapping. In our monotonicity principle, Theorem 1.3, we consider the impact of swapping mass from a single unstopped path onto a single stopped path, and argue that if this improves the objective \(\gamma \) on average, then the stopping time in question was not optimal. In classical optimal transport, it is known that single swapping is not sufficient to guarantee optimality; rather, one needs to consider the impact of allowing a finite ‘cycle’ of swaps to occur, and moreover, that this is both a necessary and sufficient condition for optimality. It is natural to conjecture that a similar result applies in the present setup.

3. Multiple marginals. A natural generalization of the Skorokhod embedding problem is to consider the case where a sequence of measures \(\mu _1, \mu _2, \ldots , \mu _n\) is given, and the aim is to find a sequence of stopping times \(\tau _1 \le \tau _2 \le \cdots \le \tau _n\) such that \(B_{\tau _k} \sim \mu _k\), and such that the chosen sequence of stopping times minimizes \(\mathbb {E}[\gamma ((B_t)_{t \le \tau _n},\tau _1,\ldots ,\tau _n)]\) for a suitable function \(\gamma \). In this setup, it is natural to ask whether there exists a suitable monotonicity principle, corresponding to Theorem 1.3.

4. Constrained embedding problems. In this paper, we consider classical embedding problems, where the optimization is carried out over the class of solutions to (SEP). However, in many natural applications, one needs to further consider the class of constrained embedding problems: for example, where one minimizes some function over the class of embeddings which also satisfy a restriction on the probability of stopping after a given time. It would be natural to derive generalizations of our duality results, and a corresponding monotonicity principle for such problems.
Background
Since the first solution to (SEP) by Skorokhod [54] the embedding problem has received frequent attention in the literature, with new solutions appearing regularly, and exploiting a number of different mathematical tools. Many of these solutions also prove to be, by design or accident, solutions of (OptSEP) for a particular choice of \(\gamma \), e.g. [4, 33, 45, 48, 50, 56]. The survey [43] is a comprehensive account of all the solutions to (SEP) up to 2004 and references many articles which use or develop solutions to the Skorokhod embedding problem. More recently, novel twists on the classical Skorokhod embedding problem have been investigated by: Last et al. [38], who consider the closely related problem of finding unbiased shifts of Brownian motion (and where there are also natural connections to optimal transport); Hirsch et al. [29], who have used solutions to the Skorokhod embedding problem to construct Peacocks; and Gassiat et al. [25], who have exploited particular properties of Root’s solution to construct efficient numerical schemes for SDEs.
The Skorokhod embedding problem has also recently received substantial attention from the mathematical finance community. This goes back to an idea of Hobson [30]: through the Dambis-Dubins-Schwarz Theorem, the optimization problems (OptSEP) are related to the pricing of financial derivatives, and in particular to the problem of model-risk. We refer the reader to the survey article [31] for further details.
Recently there has been much interest in optimal transport problems where the transport plan must satisfy additional martingale constraints. Such problems arise naturally in the financial context, but are also of independent mathematical interest: for example, mirroring classical optimal transport, they have important consequences for the study of martingale inequalities (see e.g. [9, 28, 44]). The first papers to study such problems include [7, 19, 23, 32], and this field is commonly referred to as martingale optimal transport. The Skorokhod embedding problem has been considered in this context by Galichon et al. in [23]; through a stochastic control problem they recover the Azéma–Yor solution of the Skorokhod embedding problem. Notably, their approach is very different from the one pursued in the present paper.
Outline of the article
In Sect. 2 we establish the Root and the Rost embeddings as a consequence of Theorems 1.1 and 1.3, as well as constructing a family of new embeddings. The results presented in this section are intended as a motivation for the rest of the paper. In the derivation of these embeddings we highlight the interplay between arguments of a probabilistic nature, and arguments relating to the pathwise space S introduced in (1.2). A major benefit of working in these two separate domains is that it is typically relatively easy to prove pointwise statements in the setup of the space S; on the other hand, the associated probabilistic arguments are usually straightforward. However neither set of arguments naturally transfers to the other setup.
The link between these distinct domains is provided by Theorems 1.1 and 1.2, and in particular the monotonicity principle Theorem 1.3 which we establish in Sects. 3–5. In Sect. 3, we introduce a framework that allows us to view classical probabilistic concepts on the pathwise space S and establish a number of auxiliary results that will be needed later on. In Sect. 4 we prove our first two main results. As in the transport case, Theorem 1.1 will be a simple consequence of lower semicontinuity plus compactness of the set of solutions to the Skorokhod problem. To establish Theorem 1.2, we use classical duality results from optimal transport. In Sect. 5 we prove Theorem 1.3 based on a combination of arguments from optimal transport with Choquet’s capacitability theorem and ingredients from stochastic analysis.
In Sect. 6 we use our results to establish all known solutions to (OptSEP) as well as further embeddings. We also give an example in which (OptSEP) admits only optimizers depending on additional randomization. For readers who are mainly interested in these applications, it should be possible to read this section immediately after Sect. 2.
In Sect. 7 we describe a number of extensions of our previous results. In particular we consider general starting distributions and show that our main results extend to continuous Feller processes under certain assumptions which we are able to verify for a large class of processes. As a special case of the results in this section, we also show that, as usual, the moment condition on \(\mu \) can be dropped when the second condition in (SEP) is recast in terms of uniform integrability resp. minimality (cf. (2.1)).
Frequently used notation

The set of (sub)probability measures on a space \(\mathsf {X}\) is denoted by \({\mathcal {P}}(\mathsf {X})\) / \({\mathcal {P}}^{\le 1}(\mathsf {X})\).

For a measure \(\xi \) on \(\mathsf {X}\) we write \(f(\xi )\) for the pushforward of \(\xi \) under \(f:\mathsf {X}\rightarrow \mathsf {Y}\).

We use \(\xi (f)\) as well as \(\int f~ d\xi \) to denote the integral of a function f against a measure \(\xi \).

Stochastic processes are usually denoted by capital letters like X, Y, Z.

\({C_x(\mathbb {R}_+)}\) denotes the continuous functions starting in x; \({C(\mathbb {R}_+)}=\bigcup _{x\in \mathbb {R}}{C_x(\mathbb {R}_+)}\).

The set of stopped paths is \(S =\{(f,s): f:[0,s] \rightarrow \mathbb {R} \text{ is continuous},\ f(0)=0\}\) and we define \(r:{C_0(\mathbb {R}_+)}\times \mathbb {R}_+\rightarrow S\) by \(r(\omega , t):= (\omega _{\upharpoonright [0,t]},t)\).

For \(\Gamma \subseteq S\) we set \(\Gamma ^<:=\{(f,s): \exists ({\tilde{f}},{\tilde{s}})\in \Gamma , s< {\tilde{s}} \text{ and } f\equiv {\tilde{f}} \hbox { on }[0,s]\}.\)

For \((f,s) \in S\) we write \(\bar{f} = \sup _{r \le s} f(r)\), \(\underline{f} = \inf _{r \le s} f(r)\) and \(f^* = \sup _{r \le s} |f(r)|\).

We use \(\oplus \) for the concatenation of paths: depending on the context the arguments may be elements of \(S,{C_0(\mathbb {R}_+)}\) or \({C_0(\mathbb {R}_+)}\times \mathbb {R}_+\).

If F is a function on S resp. \({C_0(\mathbb {R}_+)}\times \mathbb {R}_+\) and \((f,s)\in S\) we set \(F^{(f,s)\oplus } (y):= F((f,s)\oplus y)\), where y may be an element of \( S,{C_0(\mathbb {R}_+)}\), or \({C_0(\mathbb {R}_+)}\times \mathbb {R}_+\).

\({\mathbb {W}}\) denotes Wiener measure; \(\mathcal {F}^0\) (\(\mathcal {F}^a\)) the natural (augmented) filtration on \({C_0(\mathbb {R}_+)}\).

Two probability spaces are commonly used. The first is \((\Omega , \mathcal {G}, (\mathcal {G}_t)_{t \ge 0}, \mathbb {P})\), an arbitrary probability space on which there exists a process B which is a Brownian motion, and sometimes also a \(\mathcal {G}_0\)-measurable random variable Y which is uniformly distributed on [0, 1]. On this space, the natural filtration generated by the process B is denoted by \((\mathcal {F}^B_t)_{t \ge 0}\). In addition, we sometimes refer to the space \(({{\overline{C}}_0(\mathbb {R}_+)}, {\bar{\mathcal {F}}}, ({\bar{\mathcal {F}}}_t)_{t \ge 0}, \overline{{\mathbb {W}}})\), which is the product space \({{\overline{C}}_0(\mathbb {R}_+)}= {C_0(\mathbb {R}_+)}\times [0,1]\) equipped with a suitable filtration (see the discussion above Theorem 3.8 for further details) and the product measure \(\overline{{\mathbb {W}}}={\mathbb {W}}\otimes {\mathcal {L}}\) of Wiener and Lebesgue measure.
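To make the shift notation \(F^{(f,s)\oplus }\) concrete, here are two instances. They are a sketch under the assumption that concatenation glues the second path, shifted by \(f(s)\), onto the endpoint of the first; the path \((k,u)\in S\) is an arbitrary continuation introduced only for this illustration. For a cost depending on time alone, \(\gamma (f,s)=h(s)\), and for the running maximum, \(\gamma (f,s)=\bar{f}\), one gets

\[
\gamma ^{(f,s)\oplus }(k,u) = h(s+u)
\qquad \text{and} \qquad
\gamma ^{(f,s)\oplus }(k,u) = \max \big (\bar{f},\ f(s)+\bar{k}\big ),
\]

respectively. In the first case the shifted cost sees only the elapsed time s of the initial segment; in the second it sees the maximum attained so far together with the current level \(f(s)\).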
Particular embeddings
In this section we explain how Theorem 1.3 can be used to derive particular solutions to the Skorokhod embedding problem, (SEP), using the optimization problem (OptSEP). For much of the paper, we consider (SEP) for measures \(\mu \) where \(\int x^2\, \mu (dx) < \infty \). This constraint can be weakened to require only the first moment to be finite, subject to the restriction that the stopping time is minimal: that is, if \(\tau \) is a stopping time such that \(B_{\tau } \sim \mu \), then for any stopping time \(\tau '\),
\[ B_{\tau '} \sim \mu \ \text{ and } \ \tau ' \le \tau \quad \implies \quad \tau ' = \tau \ \text{ a.s.} \tag{2.1} \]
In the case where \(\mu \) has a second moment, minimality and \(\mathbb {E}[\tau ]< \infty \) are equivalent. We emphasize that, mutatis mutandis, all of our results are valid in this more general setup, see Sect. 7. Recall that we are working on a stochastic basis which is rich enough to support a Brownian motion and a uniformly distributed random variable.
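A simple example (a sketch for the trivial target \(\mu =\delta _0\), not taken from the text) shows the kind of solution that minimality and the condition \(\mathbb {E}[\tau ]<\infty \) are meant to exclude. Write \(T_1\) for the first hitting time of level 1, a symbol introduced here only for illustration, and compare

\[
\tau = 0
\qquad \text{with} \qquad
\tau ' = \inf \{t\ge T_1 : B_t = 0\}, \qquad T_1 = \inf \{t\ge 0 : B_t = 1\}.
\]

Both stopping times satisfy \(B_{\tau } = B_{\tau '} = 0\), so both embed \(\delta _0\). However \(\tau \le \tau '\) with \(\tau \ne \tau '\), so \(\tau '\) is not minimal, and \(\mathbb {E}[\tau ']\ge \mathbb {E}[T_1]=\infty \).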
The Root embedding
We recall the definition of the Root embedding, \(\tau _{\text {Root}}\), from (1.1), and we wish to recover Root’s result [48] from an optimization problem. Remember that, according to Root’s terminology, a (closed) set \({\mathcal {R}} \subseteq \mathbb {R}_+ \times \mathbb {R}\) is a barrier if \((s,x) \in {\mathcal {R}}\) implies \((t,x) \in {\mathcal {R}}\) whenever \(t>s\). Then Root’s construction of a solution to the Skorokhod embedding problem can be summarized as follows:
Theorem 2.1
Let \(\gamma (f,t)= h(t)\), where \(h:\mathbb {R}_+\rightarrow \mathbb {R}\) is a strictly convex function such that (OptSEP) is well posed. Then a minimizer of (OptSEP) exists, and moreover for any minimizer \({\hat{\tau }}\), there exists a barrier \({\mathcal {R}}\) such that \({\hat{\tau }}=\inf \{ t \ge 0 : (t,B_t)\in {\mathcal {R}}\}\). In particular the Skorokhod embedding problem has a solution of barrier type as in (1.1).
Proof
Step 1. We first pick—by Theorem 1.1—a stopping time \({\hat{\tau }}\) which attains \(P_\gamma .\) By Theorem 1.3 there exists a set \(\Gamma \subseteq S\) such that \(\left( \left( B_{s}\right) _{s \le {\hat{\tau }}}, {\hat{\tau }}\right) \in \Gamma \) almost surely, and such that \((\Gamma ^{<} \times \Gamma ) \cap \mathsf {SG}= \emptyset \).
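Step 2. Next, consider paths \((f,s),(g,t)\in S\) such that \(f(s)=g(t)\). We consider when \(\big ((f,s),(g,t)\big )\in \mathsf {SG},\) i.e. under which conditions (f, s) should be stopped and Brownian motion should continue to go after (g, t). In the present case (1.6) amounts to
\[ \mathbb {E}[h(s+\sigma )] + h(t) > h(s) + \mathbb {E}[h(t+\sigma )]. \]
Thus, by strict convexity of \(h\), \(((f,s),(g,t)) \in \mathsf {SG}\) iff \(t <s\). We define two barriers by
\[ {\mathcal {R}}_{\textsc {cl}} := \{(s,x) : \exists (g,t)\in \Gamma ,\ g(t)=x,\ t\le s\}, \qquad {\mathcal {R}}_{\textsc {op}} := \{(s,x) : \exists (g,t)\in \Gamma ,\ g(t)=x,\ t< s\}. \]
Fix \((g,t) \in \Gamma \). Then we have \((t,g(t)) \in {\mathcal {R}}_{\textsc {cl}}\). Suppose for contradiction that \(\inf \{s \in [0,t]: (s,g(s)) \in {\mathcal {R}}_{\textsc {op}}\} < t\). Then there exists \(s<t\) such that \((f,s) := (g_{\upharpoonright [0,s]},s) \in \Gamma ^{<}\) and \((s, f(s)) \in {\mathcal {R}}_{\textsc {op}}\). By definition of \({\mathcal {R}}_{\textsc {op}}\), it follows that there exists another path \((k,u) \in \Gamma \) such that \(k(u) = f(s)\) and \(u < s\). But then \(( (f,s), (k,u)) \in \mathsf {SG}\cap (\Gamma ^<\times \Gamma )\) which cannot be the case. Hence,
\[ \inf \{s \in [0,t]: (s,g(s)) \in {\mathcal {R}}_{\textsc {cl}}\} \le t \le \inf \{s \in [0,t]: (s,g(s)) \in {\mathcal {R}}_{\textsc {op}}\}. \]
Step 3. Now consider \(\omega \in \Omega \) such that \((g,t) = ((B_{s}(\omega ))_{s \le {\hat{\tau }}(\omega )}, {\hat{\tau }}(\omega )) \in \Gamma \). Then it follows immediately that:
\[ \tau _\textsc {cl}(\omega ) \le {\hat{\tau }}(\omega ) \le \tau _\textsc {op}(\omega ), \]
where \(\tau _\textsc {cl}\) and \(\tau _\textsc {op}\) denote the hitting times of \({\mathcal {R}}_{\textsc {cl}}\) and \({\mathcal {R}}_{\textsc {op}}\), respectively.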
We finally observe that \(\tau _\textsc {cl}= \tau _\textsc {op}\) a.s. by the strong Markov property, and the fact that one-dimensional Brownian motion immediately returns to its starting point. \(\square \)
A consequence of this proof is that (on a given stochastic basis) there exists exactly one solution of the Skorokhod embedding problem which minimizes \(\mathbb {E}[h(\tau )]\); this property was first established in [50], together with the optimality property of Root’s solution. To see this, assume that minimizers \(\tau _1\) and \(\tau _2\) are given. Then we can use an independent coin-flip to define a new minimizer \({\bar{\tau }}\) which is with probability 1/2 equal to \(\tau _1\) and with probability 1/2 equal to \(\tau _2\). By Theorem 2.1, \({\bar{\tau }}\) is of barrier type and hence \(\tau _1=\tau _2\).
Remark 2.2
We highlight here the nature of the proof of Theorem 2.1. The proof divides into three steps, two of these steps (Steps 1 and 3) being probabilistic in nature, making arguments about random variables on a particular probability space. The second step, however, is purely a pointwise argument about the properties of subsets of \(\Gamma \) in relation to the function \(\gamma \) which we look to optimize. The latter arguments are not probabilistic in nature.
Remark 2.3
The following argument, due to Loynes [39], can be used to argue that barriers are unique in the sense that if two barriers solve (SEP), then their hitting times must be equal. Suppose that \({\mathcal {R}}\) and \({\mathcal {S}}\) are both closed barriers which embed \(\mu \). Note that we can take the closed barriers without altering the stopping properties. Consider the barrier \({\mathcal {R}} \cup {\mathcal {S}}\): let \(A \subseteq \Omega _{{\mathcal {R}}} := \{x: (t,x) \in {\mathcal {S}} \implies (t,x) \in {\mathcal {R}}\}\). Then \(\mathbb {P}(B_{\tau _{{\mathcal {R}} \cup {\mathcal {S}}}} \in A) \le \mathbb {P}(B_{\tau _{{\mathcal {R}}}} \in A) = \mu (A)\). Similarly, for \(A' \subseteq \Omega _{\mathcal {S}} := \{x: (t,x) \in {\mathcal {R}} \implies (t,x) \in {\mathcal {S}}\}\), we have \(\mathbb {P}(B_{\tau _{{\mathcal {R}} \cup \mathcal {S}}} \in A') \le \mathbb {P}(B_{\tau _{\mathcal {S}}} \in A') = \mu (A')\). Since \(\mu (\Omega _{{\mathcal {R}}} \cup \Omega _\mathcal {S}) = 1\), \(\tau _{{\mathcal {R}} \cup {\mathcal {S}}}\) embeds \(\mu \).
It is known (see Monroe [41]) that, when \(\mu \) has a second moment, the second condition in (SEP), \(\mathbb {E}[\tau ] < \infty \) is equivalent to minimality of the stopping time (recall (2.1)). It immediately follows from the argument above that if the barriers \({\mathcal {R}}\) and \({\mathcal {S}}\) solve (SEP), then \(\tau _{{\mathcal {R}}} = \tau _{\mathcal {S}}\) a.s. With minor modifications the argument of Loynes also applies to the Rost solution discussed below as well as to a number of further classical embeddings presented in Sect. 6 below.
In Sect. 6.3 we will prove generalizations of Theorem 2.1 which admit similar conclusions in \(\mathbb {R}^d\) and for general initial distributions.
The Rost embedding
A set \({{\mathcal {R}}} \subseteq \mathbb {R}_+\times \mathbb {R}\) is an inverse barrier if \((s,x)\in {{\mathcal {R}}}\) and \(s > t\) implies that \((t,x)\in {{\mathcal {R}}}\). It has been shown by Rost [50] that under the condition \(\mu (\{0\})=0\) there exists an inverse barrier such that the corresponding hitting time (in the sense of (1.1)) solves the Skorokhod problem (see Fig. 3(A)). It is not hard to see that without this condition some additional randomization is required. We derive this using an argument almost identical to the one above.
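To make the definition concrete, here is a minimal sketch of a discretized analogue, assuming an integer time grid and integer levels. The discretization and the names `is_inverse_barrier` and `hitting_index` are illustrative assumptions and not part of the construction above. The sketch checks the inverse-barrier property and computes the first time a discrete path meets the set.

```python
def is_inverse_barrier(cells):
    """cells: set of (time_index, level) pairs on an integer grid (hypothetical).

    Inverse-barrier property: (s, x) in cells and t < s imply (t, x) in cells.
    """
    return all((t, x) in cells for (s, x) in cells for t in range(s))


def hitting_index(path, cells):
    """First index t with (t, path[t]) in cells, or None if the path never hits."""
    for t, x in enumerate(path):
        if (t, x) in cells:
            return t
    return None


# Toy inverse barrier: level 2 is blocked for times 0..3, level -1 for times 0..5.
barrier = {(t, 2) for t in range(4)} | {(t, -1) for t in range(6)}

early_path = [0, 1, 2]        # reaches level 2 at time 2, inside the barrier
late_path = [0, 1, 0, 1, 2]   # reaches level 2 only at time 4, after the barrier ends
```

In this toy picture a path is stopped only if it reaches a level early enough, which is the qualitative behaviour of an inverse barrier.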
Theorem 2.4
Suppose \(\mu (\{0\}) = 0\). Let \(\gamma (f,t)= h(t)\), where \(h:\mathbb {R}_+\rightarrow \mathbb {R}_+\) is a strictly concave function such that \(\mathrm{(OptSEP)}\) is well posed. Then a minimizer \({\hat{\tau }}\) of \(\mathrm{(OptSEP)}\) exists, and moreover for any minimizer \({\hat{\tau }}\), there exists an inverse barrier \({\mathcal {R}}\) such that \({\hat{\tau }}=\inf \{ t \ge 0 : (t,B_t)\in {\mathcal {R}}\}\). In particular the Skorokhod embedding problem has a solution which is the hitting time of an inverse barrier.
Proof
Our proof follows closely the proof of Theorem 2.1. In particular, Steps 1 and 2 can be carried out almost verbatim to get an optimizer \({\hat{\tau }}\) and a \(\gamma \)-monotone set \(\Gamma \subseteq S\) such that \(\mathbb {P}(((B_t)_{t\le {\hat{\tau }}},{\hat{\tau }})\in \Gamma )=1\). By concavity of h, the set of stop-go pairs is now given by
We remove all paths (f, s) with \(f(s)=0\) from \(\Gamma \); as \(\mu (\{0\})=0\), this does not alter the full support property (or the \(\gamma \)-monotone property). Next we define inverse barriers by
Denoting the respective hitting times by \(\tau _{\textsc {op}}\) and \(\tau _{\textsc {cl}}\) the argument familiar from the Root case yields \(\tau _\textsc {cl}\le {\hat{\tau }}\le \tau _\textsc {op}\) a.s. and it remains to show \(\tau _\textsc {cl}= \tau _\textsc {op}\) a.s. The argument is slightly more involved than in the Root case but again entirely probabilistic:
We define \(b(t) := \inf \{x > 0: (t,x) \in {\mathcal {R}}_\textsc {cl}\},c(t) := \sup \{x<0: (t,x) \in {\mathcal {R}}_\textsc {cl}\}\) and note that
Concentrating on the function b, we have for \(\varepsilon > 0\)
By Girsanov’s Theorem, \(\lim _{\varepsilon \rightarrow 0} \mathbb {P}(\sigma ^\varepsilon _b \le t) = \mathbb {P}(\sigma _b \le t)\) for each \(t\in \mathbb {R}_+\) hence \( \sigma _b^+= \sigma _b\) a.s.
Arguing likewise on c, we obtain \(\tau _\textsc {cl}= \tau _\textsc {op}\) a.s. \(\square \)
As in the case of the Root embedding we obtain that the minimizer of \(\mathbb {E}[h( \tau )]\) is unique.
The Cave embedding
In this section we give an example of a new embedding that can be derived from Theorem 1.3. It can be seen as a unification of the Root and Rost embeddings. A set \({\mathcal {R}} \subseteq \mathbb {R}_+\times \mathbb {R}\) is a cave barrier if there exists \(t_0\in \mathbb {R}_+\), an inverse barrier \({\mathcal {R}}^0\subseteq [0,t_0]\times \mathbb {R}\) and a barrier \({\mathcal {R}}^1 \subseteq [t_0,\infty )\times \mathbb {R}\) such that \({\mathcal {R}}={\mathcal {R}}^0\cup {\mathcal {R}}^1.\) We will show that there exists a cave barrier such that the corresponding hitting time (in the sense of (1.1)) solves the Skorokhod problem (Fig. 3(B)). We derive this using an argument similar to the one above:
Fix \(t_0\in \mathbb {R}_+\) and pick a continuous function \(\varphi :\mathbb {R}_+\rightarrow [0,1]\) such that

\(\varphi (0)=0, \lim _{t\rightarrow \infty }\varphi (t)=0, \varphi (t_0)=1\)

\(\varphi \) is strictly concave on \([0,t_0]\)

\(\varphi \) is strictly convex on \([t_0,\infty )\).
It follows that \(\varphi \) is strictly increasing on \([0,t_0]\) and strictly decreasing on \([t_0,\infty )\).
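As an illustration, one function satisfying the three conditions above can be sketched as follows. The specific choice (a sine arc up to \(t_0\), followed by exponential decay) and the value \(t_0 = 1\) are assumptions made only for this example; any \(\varphi \) with the listed properties would serve equally well. The helper `midpoint_gap` is a hypothetical numerical spot check of concavity or convexity, not a proof.

```python
import math

T0 = 1.0  # hypothetical choice of t_0


def phi(t, t0=T0):
    """Example cost profile: strictly concave on [0, t0], strictly convex on [t0, inf)."""
    if t <= t0:
        return math.sin(math.pi * t / (2.0 * t0))  # rises from 0 to 1
    return math.exp(-(t - t0))                     # decays from 1 towards 0


def midpoint_gap(f, a, b):
    """f((a+b)/2) - (f(a)+f(b))/2: positive if f is strictly concave on [a, b],
    negative if f is strictly convex on [a, b]."""
    return f(0.5 * (a + b)) - 0.5 * (f(a) + f(b))


def increment(t, r):
    """The increment phi(t + r) - phi(t) of the example function."""
    return phi(t + r) - phi(t)
```

For this example the increment over a fixed step \(r = 0.5\) decreases as \(t\) moves from 0 to \(t_0\), in line with the monotonicity claimed for \(\varphi \).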
Theorem 2.5
(Cave embedding) Suppose \(\mu (\{0\}) = 0\). Let \(\gamma (f,t)= \varphi (t)\). Then a minimizer \({\hat{\tau }}\) of (OptSEP) exists, and moreover for any minimizer \({\hat{\tau }}\), there exists a cave barrier \({\mathcal {R}}\) such that \({\hat{\tau }}=\inf \{ t \ge 0 : (t,B_t)\in {\mathcal {R}}\}\). In particular the Skorokhod embedding problem has a solution which is the hitting time of a cave barrier.
Since this construction does not already appear in the literature, we emphasize that the result remains true for integrable (centered) measures \(\mu \) (see Sect. 7).
Proof of Theorem 2.5
Note that since \(\varphi \) is bounded, the problem (OptSEP) is well posed. Following the steps of the proofs of Theorems 2.1 and 2.4, we find an optimizer \({\hat{\tau }}\) and a \(\gamma \)-monotone set \(\Gamma \subseteq S\) such that \(\mathbb {P}(((B_t)_{t\le {\hat{\tau }}},{\hat{\tau }})\in \Gamma )=1\). The set of stop-go pairs is given by
Indeed, for \(s<t\le t_0\) and any \((h,r)\in S\) we have
which holds iff \(t\mapsto \varphi (t+r)-\varphi (t)\) is strictly decreasing on \([0,t_0]\) for all \(r>0.\) If \(t+r,t\in [0,t_0]\) this follows from concavity of \(\varphi \). In the case that \(t\le t_0, t+r>t_0\) this follows since \(\varphi '\) is strictly positive on \([0,t_0)\) and strictly negative on \((t_0,\infty ).\) The case \(t_0\le t<s\) can be established similarly.
Then, we define an ‘open’ cave barrier by
and \({\mathcal {R}}_{\textsc {op}}:={\mathcal {R}}_{\textsc {op}}^0 \cup {\mathcal {R}}_{\textsc {op}}^1\) (resp. a ‘closed’ cave barrier where we allow \(t\le s\) and \(s\le t\) in \({\mathcal {R}}_{\textsc {cl}}^0\) and \({\mathcal {R}}_{\textsc {cl}}^1\) resp.). We denote the corresponding hitting time by \(\tau _{{\mathcal {R}}_{\textsc {op}}}=\tau _{{\mathcal {R}}_{\textsc {op}}^0}\wedge \tau _{{\mathcal {R}}_{\textsc {op}}^1}\) (resp. \(\tau _{{\mathcal {R}}_{\textsc {cl}}}\)).
By the same argument as for the Root and Rost embeddings it then follows that \(\tau _{{\mathcal {R}}_{\textsc {cl}}}\le {\hat{\tau }}\le \tau _{{\mathcal {R}}_{\textsc {op}}}\) a.s. and also that \(\tau _{{\mathcal {R}}_{\textsc {cl}}}=\tau _{{\mathcal {R}}_{\textsc {op}}}\) a.s., proving the claim. \(\square \)
Remarks
In Sect. 6.3 we will show that the arguments above can be adapted to prove the existence of Rost and Root embeddings in a more general setting. Specifically, in Sects. 6 and 7 we will show that this approach generalizes to a multidimensional setup and (sufficiently regular) Markov processes. In the case of the Root embedding it does not matter for the argument whether the starting distribution is a Dirac in 0 as in our setup or a more general distribution \(\lambda \). For the Rost embedding a general starting distribution is slightly more difficult. When \(\lambda \) and \(\mu \) have common mass, it may be the case that \({{\mathrm{proj}}}_{\mathbb {R}_+}({\mathcal {R}}_\textsc {cl}\cap (A \times \mathbb {R}_+)) = \{0\}\) for some set A; that is, all paths which stop at \(x \in A\) do so at time zero. In this case it is possible that \({\hat{\tau }} < \tau _\textsc {op}\) when the process starts in A, and in general, some proportion of the paths starting on A must be stopped instantly. As a result, in the case of general starting measures, independent randomization is necessary. In the Rost case, it is also straightforward to compute the independent randomization which preserves the embedding property.
Other recent approaches to the Root and Rost embeddings can be found in [13, 14, 25, 26]. These papers largely exploit PDE techniques and, as a consequence, are able to produce more explicit descriptions of the barriers; however, the methods tend to be highly specific to the problem under consideration.
Preliminaries on stopping times and filtrations
A key feature of this article is that we are taking a nonstandard perspective on stopping times; the main purpose of this section is to provide a convenient framework. To this end, we need to discuss connections between common notions defined on an arbitrary probability space and their related notions defined on the canonical path space \({C_0(\mathbb {R}_+)}\) and the space S. We then see (by Lemma 3.11, Theorem 3.8) that in the context of our optimization problem, rather than studying the class of all possible stopping times, we can equivalently focus on randomized stopping times on the canonical space. These can be characterized in various equivalent terms (cf. Theorem 3.8); e.g. viewing them as measures on \({C_0(\mathbb {R}_+)}\times \mathbb {R}_+\) is useful to establish compactness results while the representation through ‘increasing’ functions on S is necessary for the manipulations of stopping times which we need to consider in the proof of the monotonicity principle, Theorem 1.3, in Sect. 5. Finally, we shall consider the set of ‘joinings’ which can be interpreted as a type of coupling between a randomized stopping time and an abstract probability measure. This is an important ingredient in the proofs of Theorem 1.2 and Theorem 1.3.
Spaces and filtrations
We will primarily consider the space \({C_0(\mathbb {R}_+)}\) of continuous functions on \(\mathbb {R}_+\) starting at the value 0, with the topology of uniform convergence on compact sets. The elements of \({C_0(\mathbb {R}_+)}\) will be denoted by \(\omega \). We denote the canonical process on \({C_0(\mathbb {R}_+)}\) by \((B_t)_{t\ge 0}\), i.e. \(B_t(\omega )=\omega _t.\) We denote the Wiener measure by \({\mathbb {W}}\). As explained above we consider the set S of all continuous functions defined on some initial segment [0, s] of \(\mathbb {R}_+\) and starting with value 0; we will denote the elements of S by (f, s) and (g, t). The set S admits a natural partial ordering; we say that (g, t) extends (f, s) if \(t\ge s \) and the restriction \( g_{\upharpoonright [0,s]}\) of g to the interval [0, s] equals f. We consider S with the topology induced by the metric
for \((f,s), (g, t)\in S, s\le t\). Equipped with this topology, S is a Polish space.
For our arguments it will be important to be precise about the relationship between the sets \({{C_0(\mathbb {R}_+)}} \times \mathbb {R}_+\) and S. We therefore discuss the underlying filtrations in some detail.
We consider two different filtrations on the Wiener space \({{C_0(\mathbb {R}_+)}}\), the canonical or natural filtration \(\mathcal {F}^0=(\mathcal {F}_t^0)_{t\in \mathbb {R}_+}\) as well as its usual augmentation \(\mathcal {F}^a=(\mathcal {F}^a_t)_{t\in \mathbb {R}_+}\). As Brownian motion is a continuous Feller process, all right-continuous \(\mathcal {F}^a\)-martingales are continuous ([47, Theorem VI. 15.4]) and hence all \(\mathcal {F}^a\)-stopping times are predictable and the \(\mathcal {F}^a\)-optional and \(\mathcal {F}^a\)-predictable \(\sigma \)-algebras coincide [46, Corollary IV 5.7]. By [16, Theorem IV. 97, Rem. IV. 98] we also have that the \(\mathcal {F}^0\)-predictable, \(\mathcal {F}^0\)-optional and \(\mathcal {F}^0\)-progressive \(\sigma \)-algebras coincide because \({{C_0(\mathbb {R}_+)}}\) is the set of continuous paths. Moreover, we will use the following result.
Theorem 3.1
Let \((\Omega ,\mathcal {G},(\mathcal {G}_t)_{t\in \mathbb {R}_+},\mathbb {P})\) be a filtered probability space and let \(\mathcal {G}^a \) be the usual augmentation of the filtration \(\mathcal {G}\).

(1)
If \(\tau \) is a predictable time wrt \(\mathcal {G}^a\), then there exists a predictable time \(\tau '\) wrt \(\mathcal {G}\) such that \(\tau =\tau '\) a.s. For every \(\mathcal {G}^a\)-predictable process \((X_t)_{t\in \mathbb {R}_+}\) there is a \(\mathcal {G}\)-predictable process \((X_t')_{t\in \mathbb {R}_+}\) which is indistinguishable from \((X_t)_{t\in \mathbb {R}_+}.\)

(2)
If \((A_t)_{t\in \mathbb {R}_+}\) is an increasing right-continuous \(\mathcal {G}^a\)-predictable process there is an increasing right-continuous \(\mathcal {G}\)-predictable process \((A_t')_{t\in \mathbb {R}_+}\) (possibly assuming the value \(+\infty \)) which is indistinguishable from \((A_t)_{t\in \mathbb {R}_+}\).
Proof
For Statement (1) we refer to [16, Theorem IV. 78] and the comments directly afterwards. To prove statement (2), let \((A_t)_{t\in \mathbb {R}_+}\) be an increasing right-continuous \(\mathcal {G}^a\)-predictable process. Arguing on \((\frac{2}{\pi }\arctan (A_t-A_0))_{t\in \mathbb {R}_+}\), we may assume that A takes values in [0, 1].
We use an extension of the filtered probability space denoted \((\bar{\Omega },\bar{\mathcal {G}},(\bar{\mathcal {G}}_t)_{t \ge 0},\bar{\mathbb {P}})\), where we take \(\bar{\Omega } = \Omega \times [0,1],\bar{\mathcal {G}} = \mathcal {G}\otimes {\mathcal {B}}([0,1]), \bar{\mathbb {P}}(D_1\times D_2) = \mathbb {P}(D_1) \mathcal {L}(D_2)\), and set \(\bar{\mathcal {G}}_t= \mathcal {G}_t \otimes {\mathcal {B}}([0,1])\) and let \(\bar{\mathcal {G}}^a\) be its usual augmentation. Here, \(\mathcal {L}\) denotes Lebesgue measure. Abusing notation we also write A for the mapping \((\omega ,x,t) \mapsto A_t(\omega )\) on \( \bar{\Omega }\times \mathbb {R}_+\).
Set \(Y(\omega , x):=x\). Then \(A-Y\) is \(\bar{\mathcal {G}}^a\)-predictable and right-continuous, hence
is a \(\bar{\mathcal {G}}^a\)-predictable stopping time by the (predictable) Debut theorem. Moreover
Pick a \(\bar{\mathcal {G}}\)-predictable stopping time \(\rho '\) such that \(\rho '=\rho \), \(\bar{\mathbb {P}}\)-a.s., and set
Then \(A'(\omega )\) is increasing and right-continuous for each \(\omega \). For each t
for \(\mathbb {P}\)-a.a. \(\omega \), hence \(A'\) is a version of A. By right-continuity, A and \(A'\) are indistinguishable. Predictability of \(\rho '\) asserts that (using obvious abbreviations)
Hence \((\omega ,t) \mapsto A'_t(\omega )\) is \({\mathsf {pred}}_{{\mathcal {G}}}\)-measurable. \(\square \)
The message of Theorem 3.2 below is that a process \((X_t)_{t\in \mathbb {R}_+}\) is \(\mathcal {F}^0\)-optional (and hence also \(\mathcal {F}^0\)-predictable in our setup) iff \(X_t(\omega )\) can be calculated from the restriction \(\omega _{\upharpoonright [0,t]}\). We introduce the mapping
We note that the topology on S introduced in (3.1) coincides with the final topology induced by the mapping r; moreover r is a continuous open mapping.
The following result is a particular case of [16, Theorem IV. 97] (in somewhat different notation).
Theorem 3.2
\(\mathcal {F}^0\)-optional sets and functions on \({{C_0(\mathbb {R}_+)}} \times \mathbb {R}_+\) correspond to Borel measurable sets and functions on S. More precisely we have:

(1)
A set \(D\subseteq {{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+\) is \(\mathcal {F}^0\)-optional iff \(D=r^{-1}(A)\) for some Borel set \(A\subseteq S\).

(2)
A process \(X=(X_t)_{t\in \mathbb {R}_+}\) is \(\mathcal {F}^0\)-optional iff \(X=H\circ r\) for some Borel measurable \(H:S\rightarrow \mathbb {R}\).
The mapping r is not a closed mapping: it is easy to see that there exist closed sets in \({{C_0(\mathbb {R}_+)}} \times \mathbb {R}_+\) with a non-closed image under r. However this does not happen for closed optional sets: it is straightforward that an \(\mathcal {F}^0\)-optional set \(A\subseteq {{C_0(\mathbb {R}_+)}} \times \mathbb {R}_+\) is closed iff the corresponding set r(A) is closed in S.
Definition 3.3
If X is an \(\mathcal {F}^0\)-optional process we write \(X^S\) for the unique function \(S\rightarrow \mathbb {R}\) satisfying \(X=X^S\circ r\). We say that an optional process X is S-continuous (resp. S-lsc) if the corresponding function \(X^S: S \rightarrow \mathbb {R}\) is continuous (resp. lsc).
It is trivially true that an S-continuous process is continuous in the usual pathwise sense. The converse is not generally true: consider the case where \(X_t(\omega )=\mathrm{sign}(\omega (1))(t-2)_+\). This is a continuous, optional process, however the corresponding function \(X^S\) is not a continuous mapping from S to \(\mathbb {R}\). Other examples arise from functions connected to the local time of Brownian motion, cf. Sect. 6.2.
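The failure of S-continuity in this example can be seen numerically. The sketch below is an illustration under stated assumptions: paths are represented as Python functions, the two test paths are straight lines with slopes \(\pm \varepsilon \) (a hypothetical choice), and closeness is measured by the uniform distance on a common interval \([0,s]\). The names `X_S`, `line` and `sup_dist` are introduced only for this example.

```python
def sign(x):
    """Sign of a real number: -1, 0 or 1."""
    return (x > 0) - (x < 0)


def X_S(f, s):
    """X^S(f, s) = sign(f(1)) * (s - 2)_+ for a stopped path (f, s).

    For s <= 2 the factor (s - 2)_+ vanishes, so f(1) is only read when s > 2.
    """
    if s <= 2.0:
        return 0.0
    return sign(f(1.0)) * (s - 2.0)


def line(slope):
    """Path t -> slope * t, which starts at 0 as required."""
    return lambda t: slope * t


def sup_dist(f, g, s, n=1000):
    """Approximate uniform distance of f and g on [0, s] using a grid of n steps."""
    return max(abs(f(s * k / n) - g(s * k / n)) for k in range(n + 1))


eps = 1e-6
up, down = line(eps), line(-eps)
gap = abs(X_S(up, 3.0) - X_S(down, 3.0))  # stays equal to 2 however small eps is
```

The two stopped paths are uniformly within \(6\varepsilon \) of each other on \([0,3]\), yet the values of \(X^S\) differ by 2.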
Definition 3.4
For a measurable \(X:{{C_0(\mathbb {R}_+)}}\rightarrow \mathbb {R}\) which is bounded or positive we set
Clearly, (3.3) defines an \(\mathcal {F}^0_t\)-measurable function which is a version of the classical conditional expectation; subsequently, it will be useful to have this function defined for all \(\omega \). In accordance with Definition 3.3 we write \(X^{M,S}\) for the function satisfying \(X^M = X^{M,S}\circ r\).
Proposition 3.5
Let \(X\in C_b({{C_0(\mathbb {R}_+)}})\). Then \(X^M_t\) is an S-continuous martingale, \(X^M_\infty =\lim _{t\rightarrow \infty } X^M_t\) exists and equals X.
Proof
Note that \(X^{M,S}(f,s)= \int X^{(f,s)\oplus }(\omega )\, {\mathbb {W}}(d\omega )\) for \((f,s)\in S\). Also, \((f_n,s_n)\rightarrow (f,s)\) implies \(f_n\oplus \omega \rightarrow f\oplus \omega \) for \(\omega \in {{C_0(\mathbb {R}_+)}}\) and, by continuity of \(X\), \(X^{(f_n,s_n)\oplus }(\omega )\rightarrow X^{(f,s)\oplus }(\omega )\). Since X is bounded, dominated convergence implies \(X^{M,S}(f_n,s_n)\rightarrow X^{M,S}(f,s).\) \(\square \)
For \(X\in C_b({{C_0(\mathbb {R}_+)}})\), \(X^M\) is a martingale with continuous paths and hence satisfies the optional stopping theorem. Using the functional monotone class theorem, we see that the optional stopping theorem holds for \(X^M\) for all bounded measurable \(X: {C_0(\mathbb {R}_+)}\rightarrow \mathbb {R}\). Also one can prove that \(X^M\) has almost surely continuous paths, even if X itself was not continuous, but we will not use this fact.
Randomized stopping times
Working on the probability space \(({C_0(\mathbb {R}_+)}, {\mathbb {W}})\), a stopping time \(\tau \) is a mapping which assigns to each path \(\omega \) the time \(\tau (\omega ) \) at which the path is stopped. If the stopping time depends on external randomization, then we may consider a path \(\omega \) which is not stopped at a single point \(\tau (\omega )\), but rather that there is a subprobability measure \(\tau _\omega \) on \(\mathbb {R}\) which represents the probability that the path \(\omega \) is stopped at a given time, conditional on observing the path \(\omega \). The aim of this section is to make this idea precise, and to establish connections with related properties in the literature. Specifically, the notion of a randomized stopping time has previously appeared in e.g. [5, 40, 49].
Subsequently we will identify randomized stopping times as a subset of the well studied \({\mathbf {P}}\)-measures: A finite measure \(\xi \) on \({C_0(\mathbb {R}_+)}\times \mathbb {R}_+\) is a \({\mathbf {P}}\)-measure (wrt \({\mathbb {W}}\)) if it does not charge any \({\mathbb {W}}\)-evanescent set. A basic result of Doléans [18] is the following
Theorem 3.6
(cf. [17, Theorem VI 65]) A finite measure \(\xi \) on \({C_0(\mathbb {R}_+)}\times \mathbb {R}_+\) is a \({\mathbf {P}}\)-measure iff there exists a right-continuous increasing process \(A\) with \(\mathbb {E}[A_\infty ]<\infty \) such that for all bounded and measurable processes X
Here the process A is unique up to evanescence.
We will be particularly interested in the following subset of \({\mathbf {P}}\)-measures:
where \((\xi _\omega )_{\omega \in {C_0(\mathbb {R}_+)}} \) is a disintegration of \(\xi \) in the first coordinate \(\omega \in {C_0(\mathbb {R}_+)}\). We equip \({\mathsf {M}}\) with the weak topology induced by the continuous bounded functions on \({C_0(\mathbb {R}_+)}\times \mathbb {R}_+\). Clearly any \(\xi \in {\mathsf {M}}\) is a \({\mathbf {P}}\)-measure with corresponding increasing process \(A^\xi _\omega (t)=\xi _\omega ([0,t])\) being the cumulative distribution function of \(\xi _\omega .\)
Definition 3.7
(Randomized stopping times) A measure \(\xi \in \mathsf {M}\) is called a randomized stopping time, written \(\xi \in \mathsf {RST}\), iff the associated increasing process A is optional.
Below, it will sometimes be convenient to represent randomized stopping times on an extension of the space \(({C_0(\mathbb {R}_+)}, \mathcal {F}^0, (\mathcal {F}^0_t)_{t\ge 0},{\mathbb {W}})\): we will consider \(({{\overline{C}}_0(\mathbb {R}_+)},{\bar{\mathcal {F}}},({\bar{\mathcal {F}}}_t)_{t \ge 0},\overline{{\mathbb {W}}})\), where \({{\overline{C}}_0(\mathbb {R}_+)}= {C_0(\mathbb {R}_+)}\times [0,1], \overline{{\mathbb {W}}}(A_1\times A_2) = {\mathbb {W}}(A_1) \mathcal {L}(A_2)\) (where \(\mathcal {L}\) denotes Lebesgue measure), \({\bar{\mathcal {F}}}\) is the completion of \(\mathcal {F}^0 \otimes {\mathcal {B}}([0,1])\), and \({\bar{\mathcal {F}}}_t\) the usual augmentation of \((\mathcal {F}_t^0 \otimes {\mathcal {B}}([0,1]))_{t \ge 0}\). We will write \({\bar{B}}=({\bar{B}}_t)_{t\ge 0}\) for the process given by \({\bar{B}}_t(\omega ,u)=\omega _t.\) Observe that if \(Y_t(\omega ,u) = u\), then \(({\bar{B}}_t, Y_t)\) is (trivially) a continuous Feller process, and hence by the same arguments as above, the \({\bar{\mathcal {F}}}\)-predictable and \({\bar{\mathcal {F}}}\)-optional \(\sigma \)-algebras coincide.
Randomized stopping times play a key role in this paper; depending on the respective context, the following different characterizations will be useful:
Theorem 3.8
Let \(\xi \in {\mathsf {M}}\). Then the following are equivalent:

(1)
There is a Borel function \(A:S\rightarrow [0,1]\) such that the process \(A\circ r\) is right-continuous increasing and
$$\begin{aligned} \xi _\omega ([0,s]):=A\circ r(\omega ,s) \end{aligned}$$ (3.4) defines a disintegration of \(\xi \) wrt \({\mathbb {W}}\).

(2)
We have \(\xi \in \mathsf {RST}\), i.e. given a disintegration \((\xi _\omega )_{\omega \in {C_0(\mathbb {R}_+)}}\) of \(\xi \), the random variable \( {\tilde{A}}_t(\omega )=\xi _\omega ([0,t])\) is \(\mathcal {F}^a_t\)-measurable for all \(t\in \mathbb {R}_+\).

(3)
For all \(f\in C_b(\mathbb {R}_+)\) supported on some \( [0,t],t\ge 0\) and all \(g\in C_b({C_0(\mathbb {R}_+)})\)
$$\begin{aligned} \int f(s) (g-\mathbb {E}[g|\mathcal {F}_t^0])(\omega ) \, \xi (d\omega , ds)=0 \end{aligned}$$ (3.5)
(4)
On the probability space \(({{\overline{C}}_0(\mathbb {R}_+)},{\bar{\mathcal {F}}},({\bar{\mathcal {F}}}_t)_{t \ge 0},\overline{{\mathbb {W}}})\), the random time
$$\begin{aligned} \rho (\omega ,u) :=\inf \{ t \ge 0 : \xi _\omega ([0,t]) \ge u\} \end{aligned}$$ (3.6) defines an \({\bar{\mathcal {F}}}\)-stopping time.
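The representation (3.6) is the generalized inverse of the distribution function \(t \mapsto \xi _\omega ([0,t])\), evaluated at the external randomization \(u\). A minimal sketch for a single fixed path, assuming \(\xi _\omega \) is a discrete subprobability measure given as a dictionary from times to masses (this encoding and the function name `rho` are assumptions for illustration only):

```python
import math


def rho(masses, u):
    """Generalized inverse inf{t >= 0 : xi_omega([0, t]) >= u}.

    masses: dict mapping stopping times t >= 0 to the mass xi_omega({t});
            the total mass may be below 1 (the path is then not always stopped).
    u:      external randomization in [0, 1].
    """
    if u <= 0.0:
        return 0.0  # xi_omega([0, 0]) >= 0 holds trivially
    cumulative = 0.0
    for t in sorted(masses):
        cumulative += masses[t]
        if cumulative >= u:
            return t
    return math.inf  # u exceeds the total mass: the path is never stopped


# Stopped at time 1 with probability 0.25, at time 2 with probability 0.5,
# and never stopped with the remaining probability 0.25.
xi_omega = {1.0: 0.25, 2.0: 0.5}
```

With \(u\) uniform on \([0,1]\), the set of \(u\) mapped to a given time has Lebesgue measure equal to the mass \(\xi _\omega \) puts on that time, which is why \(\rho \) represents \(\xi \).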
Proof
The equivalence of (1) and (2) follows directly from Theorems 3.1, 3.2 and 3.6.
It is straightforward to deduce (4) from (1). To see that (4) implies (2), consider for \(t\ge 0, \omega \in {C_0(\mathbb {R}_+)}\)
To show that (2) and (3) are equivalent, we first note that (2) is equivalent to requiring that \(X_t(\omega ) := \xi _\omega (f)\) is \(\mathcal {F}^a_t\)-measurable whenever \(f \in C_b(\mathbb {R}_+)\) is supported on [0, t]. However we can express this measurability in a different fashion. Note that a bounded Borel function h is \(\mathcal {F}_t^a\)-measurable iff for all bounded Borel functions g
vanishes; of course this does not rely on our particular setup. By a functional monotone class argument, for \(\mathcal {F}_t^a\)-measurability of \(X_t\) it is sufficient to check that
for all \(g\in C_b({{C_0(\mathbb {R}_+)}})\). In terms of \(\xi \), (3.7) amounts to
\(\square \)
Remark 3.9

(1)
The function A in (3.4) is unique up to indistinguishability (cf. Theorem 3.6). We will denote this function by \(A^\xi \).

(2)
We will say \(\xi \in \mathsf {RST}\) is a non-randomized stopping time iff there is a disintegration \((\xi _\omega )_{\omega \in {C_0(\mathbb {R}_+)}}\) of \(\xi \) such that \(\xi _\omega \) is either null (corresponding to a path which is not stopped) or a Dirac measure (of mass 1) for every \(\omega \). Clearly this means that \(\xi _\omega = \delta _{\tau (\omega )}\) a.s. for some (non-randomized) stopping time \(\tau \). \(\xi \) is a non-randomized stopping time iff there is a version of \(A^\xi \) which only attains the values 0 and 1.

(3)
We will say \(\xi \in \mathsf {RST}\) is a finite randomized stopping time iff \(\xi ({C_0(\mathbb {R}_+)}\times \mathbb {R}_+) = 1\).
An immediate consequence of Theorem 3.8 (3) is the following
Corollary 3.10
The set \(\mathsf {RST}\) is closed wrt the weak topology induced by the continuous bounded functions on \({{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+\).
The next lemma implies that optimizing over usual stopping times on a rich enough probability space in (OptSEP) is equivalent to optimizing over randomized stopping times on Wiener space.
Lemma 3.11
Let B be a Brownian motion on some stochastic basis \((\Omega , \mathcal {G}, (\mathcal {G}_t)_{t\ge 0}, \mathbb {P})\), let \(\tau \) be a \(\mathcal {G}\)-stopping time and consider
Then \(\xi := \Phi (\mathbb {P})\) is a randomized stopping time and for any measurable \(\gamma :S \rightarrow \mathbb {R}\) we have
If \(\Omega \) is sufficiently rich that it supports a uniformly distributed random variable which is \(\mathcal {G}_0\)-measurable then for any \(\xi \in \mathsf {RST}\), we can find a \(\mathcal {G}\)-stopping time \(\tau \) such that \(\xi = \Phi (\mathbb {P})\) and (3.8) holds.
Proof
Clearly \(\xi :=\Phi (\mathbb {P})\in \mathsf {M}\). Write \((\xi _\omega )_{\omega \in {C_0(\mathbb {R}_+)}}\) for a disintegration wrt Wiener measure. We need to show that \(\xi _\omega ([0,t])\) is \(\mathcal {F}_t^a\)-measurable. Let \(g:{C_0(\mathbb {R}_+)}\rightarrow \mathbb {R}\) be a measurable function. If \(h = \mathbb {E}_{\mathbb {W}}[g|\mathcal {F}_t^a]\), writing \(\mathcal {G}_t^a\) for the usual augmentation of \(\mathcal {G}\), and noting that \((B_t)_{t\ge 0}\) is also a \((\mathcal {G}_t^a)_{t \ge 0}\)-Brownian motion, we have
It then follows that
Hence \(\xi _\omega ([0,t])\) is \(\mathcal {F}_t^a\)-measurable as required.
To prove the second part, we observe that by Theorem 3.8 (4), there exists an \({\bar{\mathcal {F}}}\)-stopping time \(\rho '\) representing \(\xi \). Since \(\rho '\) is \({\bar{\mathcal {F}}}\)-predictable, it follows from Theorem 3.1 that there exists an almost surely equal \((\mathcal {F}_t^0 \times {\mathcal {B}}([0,1]))_{t \ge 0}\)-stopping time \(\rho \). Then we can define a random time on \(\Omega \) by \(\rho ((B_s)_{s \ge 0},Y)\), where B is the Brownian motion, and Y the independent \(\mathcal {G}_0\)-measurable, uniform random variable. Consider the map \({\bar{\Phi }}:\Omega \rightarrow {{\overline{C}}_0(\mathbb {R}_+)}, {\bar{\omega }}\mapsto ((B_t({\bar{\omega }}))_{t\ge 0}, Y({\bar{\omega }})).\) Since \(\rho \) is a \((\mathcal {F}_t^0 \times {\mathcal {B}}([0,1]))_{t \ge 0}\)-stopping time and \({\bar{\Phi }} \) is measurable from \( (\Omega , \mathcal {G}_t)\) to \(({{\overline{C}}_0(\mathbb {R}_+)}, \mathcal {F}_t^0 \times {\mathcal {B}} ([0,1]))\), \(\rho \circ (B,Y)\) is a \(\mathcal {G}\)-stopping time. \(\square \)
Randomized stopping times solving the Skorokhod problem and compactness
For a finite randomized stopping time \(\xi \) and optional \(Y:{C_0(\mathbb {R}_+)}\times \mathbb {R}_+ \rightarrow \mathbb {R}\) which is bounded or positive, define \(Y_\xi \) as the pushforward of \(\xi \) under the mapping \((t,\omega )\mapsto Y_t(\omega )\) and denote \(Y^\xi _t:=Y_{\xi \wedge t}\) for \(t\in \mathbb {R}_+\). Considering the representation \(\rho \) of \(\xi \) on the extended space \({{\overline{C}}_0(\mathbb {R}_+)}\) as in (3.6) and writing \({\bar{Y}}_t(\omega ,u)= Y_t(\omega )\), we then have
Taking \(Y_t = t\) we obtain \({\xi }(T) = {\bar{\mathbb {E}}}[\rho ]\), where T denotes the projection
Recall that \(\mu \) has mean 0 and finite second moment \( \int x^2\, \mu (dx)=:V\). Then the following result follows directly from classical properties of stopping times (e.g. [31, Corollary 3.3]).
Lemma 3.12
Let \(\xi \in \mathsf {RST}^1\) with representation \(\rho \) on \({{\overline{C}}_0(\mathbb {R}_+)}\) as in (3.6). Assume that \(B_\xi = \mu \), i.e. \(\bar{B}_\rho \sim \mu \). Then the following are equivalent:

(1)
\(\xi (T)={\bar{\mathbb {E}}}[\rho ] < \infty \),

(2)
\(\xi (T)={\bar{\mathbb {E}}}[\rho ] = V\) ,

(3)
\((\bar{B}_{\rho \wedge t})\) is uniformly integrable.
Definition 3.13
We denote by \(\mathsf {RST}(\mu )\) the set of all finite randomized stopping times satisfying the conditions in Lemma 3.12.
For us it is crucial that randomized stopping times have the following property:
Theorem 3.14
The set \(\mathsf {RST}(\mu )\) is nonempty and compact wrt the weak topology induced by the continuous and bounded functions on \({{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+\).
Proof
If \(\mu \) is a centered probability then it is not hard to establish that the Skorokhod embedding problem has a solution, e.g. one can use the external randomization \(u\in [0,1]\) to stop \(({\bar{B}}_t(\omega ,u))_{t\ge 0}\) once it leaves (a(u), b(u)). Choosing a, b carefully we obtain a solution of (SEP), see e.g. [43, p. 332] for a detailed account.
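The randomized exit-time construction can be made explicit for piecewise constant \(a(u), b(u)\). The sketch below relies on two classical facts for Brownian motion started at 0 and stopped on leaving \((a,b)\) with \(a<0<b\): the exit happens at \(b\) with probability \(-a/(b-a)\), and the expected exit time is \(-ab\). The particular intervals and weights, and the names `exit_law` and `mixture`, are hypothetical choices for illustration; this is not the specific choice of \(a, b\) referred to in [43].

```python
from fractions import Fraction


def exit_law(a, b):
    """Exit distribution of Brownian motion from 0 leaving (a, b), with a < 0 < b."""
    a, b = Fraction(a), Fraction(b)
    return {a: b / (b - a), b: -a / (b - a)}


def mixture(intervals):
    """intervals: list of (weight, a, b); weights sum to 1 and play the role of
    the Lebesgue measure of the set of u on which (a(u), b(u)) = (a, b).

    Returns the embedded law and the expected value of the stopping time.
    """
    law = {}
    expected_time = Fraction(0)
    for w, a, b in intervals:
        w = Fraction(w)
        for x, p in exit_law(a, b).items():
            law[x] = law.get(x, Fraction(0)) + w * p
        expected_time += w * (-Fraction(a) * Fraction(b))
    return law, expected_time


# Use (-1, 1) for u in [0, 1/2) and (-1, 2) for u in [1/2, 1].
law, expected_time = mixture([(Fraction(1, 2), -1, 1), (Fraction(1, 2), -1, 2)])
mean = sum(x * p for x, p in law.items())
second_moment = sum(x * x * p for x, p in law.items())
```

In this example the embedded law is centered and its second moment equals the expected stopping time, consistent with the equivalence stated in Lemma 3.12.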
By Prokhorov’s theorem we have to show that \(\mathsf {RST}(\mu )\) is tight and closed.
Tightness. Fix \(\varepsilon >0\) and take \(R=2V/\varepsilon \). Then, for any \(\xi \in \mathsf {RST}(\mu )\) we have \(\xi (T>R)\le \varepsilon /2.\) As \({{C_0(\mathbb {R}_+)}}\) is Polish there is a compact set \({\tilde{K}}\subseteq {{C_0(\mathbb {R}_+)}}\) such that \({\mathbb {W}}( {{\tilde{K}}}^c)\le \varepsilon /2.\) Set \(K:={\tilde{K}}\times [0,R].\) Then K is compact and we have for any \(\xi \in \mathsf {RST}(\mu )\)
Closedness. Take a sequence \((\xi _n)_{n\in \mathbb {N}}\) in \(\mathsf {RST}(\mu )\) converging to some \(\xi \). Putting \(h:{{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+\rightarrow \mathbb {R}, (\omega ,t)\mapsto \omega (t)\) we have to show that \(h(\xi )=\mu \) and that \(\xi (T)<\infty .\) Note that h is a continuous map. Take any \(g\in C_b(\mathbb {R}).\) Then \(g \circ h\in C_b({{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+)\) and hence
thus \(h(\xi )=\mu .\) Moreover, \(T\wedge N\) is continuous and bounded for each \(N\in \mathbb {N}\), hence \(\xi (T\wedge N)=\lim _n\xi _n(T\wedge N)\le V\). As N was arbitrary, it follows that also \(\xi (T)\le V<\infty \). \(\square \)
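The choice \(R = 2V/\varepsilon \) in the tightness step is an application of Markov's inequality: since \(\xi (T) = V\) for every \(\xi \in \mathsf {RST}(\mu )\), one gets \(\xi (T>R)\le V/R = \varepsilon /2\). A small numerical sketch, assuming a hypothetical discrete distribution of stopping times with mean \(V\) (the function names are illustrative):

```python
def tightness_radius(V, eps):
    """Time horizon R = 2V/eps used in the tightness step."""
    return 2.0 * V / eps


def tail_mass(times, probs, R):
    """Probability that the stopping time exceeds R."""
    return sum(p for t, p in zip(times, probs) if t > R)


# Hypothetical law of the stopping time: mean V = 0.5*0.5 + 0.25*1 + 0.25*4 = 1.5.
times = [0.5, 1.0, 4.0]
probs = [0.5, 0.25, 0.25]
V = sum(t * p for t, p in zip(times, probs))

eps = 1.0
R = tightness_radius(V, eps)      # equals 3.0 here
tail = tail_mass(times, probs, R)  # mass beyond R, bounded by eps / 2
```

The bound depends on the law of the stopping time only through its mean, which is what makes it uniform over \(\mathsf {RST}(\mu )\).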
Our use of randomization to achieve compactness of a set of stopping times has similarities to the work of Baxter and Chacon [5]. However their setup is different, and their intended applications are not connected to Skorokhod embedding.
Joinings
We now add another dimension: we assume that \((\mathsf {Y}, \nu )\) is some Polish probability space and consider randomized stopping times where each death of a particle is tagged by an element of \(\mathsf {Y}\). More precisely, the set of joinings \(\mathsf {JOIN}(\nu )\) is given by
We shall also write \( \mathsf {JOIN}^1(\nu )\) for the subset of \(\pi \in \mathsf {JOIN}(\nu )\) having mass 1.
Remark 3.15
Write \({\mathsf {pred}}\) for the \(\sigma \)-algebra of \(\mathcal {F}^0\)-predictable sets in \({{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+\). We call a set \(A\subseteq {{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+\times \mathsf {Y}\) predictable if it is an element of \({\mathsf {pred}}\otimes {\mathcal {B}}(\mathsf {Y})\). We will say that a function defined on \({{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+\times \mathsf {Y}\) is predictable if it is measurable wrt \({\mathsf {pred}}\otimes {\mathcal {B}}(\mathsf {Y})\). As before, predictable subsets of \( {{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+\times \mathsf {Y}\) correspond to measurable subsets of \(S\times \mathsf {Y}\), and similarly for functions.
The optimization problem and duality
The primal problem
As defined in (OptSEP) in the introduction, our primal problem is to minimize the value corresponding to a function \(\gamma :S \rightarrow \mathbb {R}\), where the minimization is taken over stopping times of Brownian motion defined on a sufficiently rich probability space. By Lemma 3.11, we obtain an equivalent problem if we take B to be the canonical process on Wiener space \({C_0(\mathbb {R}_+)}\) and minimize over all randomized stopping times, i.e. we have
In the following we will mainly work with the technically convenient formulation given in (4.1). It immediately allows us to establish the existence of optimal stopping times:
Theorem 4.1
Assume that \(\gamma :S\rightarrow \mathbb {R}\) is lsc and bounded from below in the sense that for some constants \(a,b,c\in \mathbb {R}_+\)
holds on \({C_0(\mathbb {R}_+)}\times \mathbb {R}_+\). Then the functional
is lsc and (4.1) admits a minimizer.
By Lemma 3.11, Theorem 1.1 is a consequence of this result.
Proof of Theorem 4.1/Theorem 1.1
By the Portmanteau theorem, the functional (4.3) is lsc if \(\gamma :S\rightarrow \mathbb {R}\) is lsc and bounded from below by a constant.
For the general case we recall the pathwise version of Doob’s inequality (see [1])
We emphasize that we can understand the integral defining M in a pathwise fashion. This is possible since \(r\mapsto \max _{t\le r}B_t\) is increasing; we refer to [1] for details. In fact it is straightforward to show that M is an S-continuous martingale satisfying \(M_{t} <2 \max _{r\le t} B_r^2\). It follows that \({\tilde{\gamma }}(f,s):= \gamma (f,s) + b s+ c (M^S(f,s) + 4 f(s)^2)\) is bounded from below and hence \(\xi \mapsto \int {\tilde{\gamma }}\, d\xi \) is lsc. As the value of \(\int b s+ c (M_s(\omega ) + 4 B_s^2(\omega ))\, d\xi (\omega , s)\) is the same for any \(\xi \in \mathsf {RST}(\mu )\), the functional (4.3) is lsc as well. \(\square \)
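The last step of the proof uses that the correction term integrates to one and the same constant against every \(\xi \in \mathsf {RST}(\mu )\). A short sketch of this computation, under the assumptions that \(\xi (T)=V=\int x^2\, \mu (dx)\) for \(\xi \in \mathsf {RST}(\mu )\), that \(\xi \) embeds \(\mu \), and that the martingale M (which starts in 0) integrates to 0 against \(\xi \) by optional stopping:

```latex
\begin{aligned}
\int \big( b s + c\,(M_s(\omega) + 4 B_s^2(\omega)) \big)\, d\xi(\omega,s)
 &= b \int s \, d\xi(\omega,s) + c \int M_s(\omega) \, d\xi(\omega,s)
    + 4c \int B_s^2(\omega) \, d\xi(\omega,s) \\
 &= b\,V + c \cdot 0 + 4c \int y^2 \, \mu(dy) \\
 &= (b + 4c)\, V .
\end{aligned}
```

Under these assumptions \(\int \gamma \, d\xi \) and \(\int {\tilde{\gamma }}\, d\xi \) differ by a constant on \(\mathsf {RST}(\mu )\), so lower semicontinuity of one functional carries over to the other.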
In Sect. 7 below we establish existence of a minimizing stopping time in the case where the measure \(\mu \) does not necessarily admit a finite second moment. However, we will then replace Assumption (4.2) by the requirement that \(\gamma \) is bounded from below.
The dual problem
The following result implies Theorem 1.2.
Theorem 4.2
Let \(\gamma : S \rightarrow \mathbb {R}\) be lsc and bounded from below in the sense of (4.2). Set
where \(\varphi , \psi \) satisfy \(|\varphi _t| \le a + bt+c B_t^2\), \(|\psi (y)| \le a+ b y^2\) for some \(a,b,c>0\). Then we have
Using the same argument as in the proof of Theorem 4.1, we see that it suffices to establish Theorem 4.2 in the case where \(\gamma \) is bounded from below. As usual, one part of the duality relation is straightforward to verify: for \((\varphi ,\psi )\) satisfying the dual constraint and \(\xi \in \mathsf {RST}(\mu )\) we have
hence \(D_\gamma \le P_\gamma \).
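For orientation, the chain of inequalities behind this easy direction can be sketched as follows. We assume here that the dual constraint of Theorem 4.2 reads \(\varphi _t(\omega )+\psi (\omega (t))\le \gamma (\omega ,t)\) with \(\varphi \) an S-continuous martingale started in 0; this form is an assumption of the sketch.

```latex
\begin{aligned}
\int \gamma \, d\xi
 &\ge \int \big( \varphi_t(\omega) + \psi(\omega(t)) \big)\, d\xi(\omega,t) \\
 &= \underbrace{\int \varphi_t(\omega)\, d\xi(\omega,t)}_{=\,0}
    + \int \psi(y)\, \mu(dy)
  = \int \psi \, d\mu .
\end{aligned}
```

The growth bounds imposed on \(\varphi \) and \(\psi \), together with \(\xi (T)<\infty \), are what make both integrals well defined and allow the martingale term to be evaluated. Taking the supremum over admissible \((\varphi ,\psi )\) and the infimum over \(\xi \in \mathsf {RST}(\mu )\) gives the stated inequality.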
We will establish Theorem 4.2 as a consequence of the following auxiliary duality result, where we write T for the projection map \({C_0(\mathbb {R}_+)}\times \mathbb {R}_+\times \mathbb {R}\rightarrow \mathbb {R}_+,T(\omega ,t,y)=t\).
Proposition 4.3
Let \(c:{C_0(\mathbb {R}_+)}\times \mathbb {R}_+\times \mathbb {R}\rightarrow \mathbb {R}\cup \{\infty \}\) be lsc, predictable (cf. Remark 3.15) and bounded from below. Write \(V = \int x^2 \, \mu (dx)\). Then
where the infimum is taken over the set
and the supremum is taken over \(\varphi \in C_b({C_0(\mathbb {R}_+)}),\psi \in C_b(\mathbb {R})\) such that
Proposition 4.3 should be compared to the (formally) very similar classical duality theorem of optimal transport, see e.g. [58, Section 5] for a proof as well as for a discussion of its origin and related literature.
Theorem 4.4
(Monge–Kantorovich Duality) Let \((\mathsf {X}_i,\mu _i),\) \( i=1,2\) be Polish probability spaces and \(c:\mathsf {X}_1\times \mathsf {X}_2\rightarrow \mathbb {R}\cup \{\infty \}\) a cost function which is lsc and bounded from below. Then
$$\begin{aligned} \inf _\pi \int c\, d\pi = \sup _{\varphi ,\psi } \Big ( \int \varphi \, d\mu _1+\int \psi \, d\mu _2\Big ), \end{aligned}$$
where the \(\inf \) is taken over probabilities \(\pi \) on \(\mathsf {X}_1\times \mathsf {X}_2\) satisfying \({{\mathrm{proj}}}_{\mathsf {X}_1}(\pi )=\mu _1, {{\mathrm{proj}}}_{\mathsf {X}_2}(\pi )=\mu _2\) and the \(\sup \) is taken over \(\varphi \in C_b(\mathsf {X}_1),\psi \in C_b({\mathsf {X}_2})\) satisfying for \(x_1\in {\mathsf {X}_1}, x_2\in {\mathsf {X}_2}\)
$$\begin{aligned} \varphi (x_1)+\psi (x_2)\le c(x_1,x_2). \end{aligned}$$
The strategy of the proof of Proposition 4.3 is to establish the duality relation \((\star )\) for \(\pi \), resp. \((\varphi , \psi ) \) taken from certain larger candidate sets, in which case the duality relation follows from Theorem 4.4. Then we introduce additional constraints via a variational approach to obtain an improved duality through the following min–max theorem.
Theorem 4.5
(see e.g. [2, Thm. 2.4.1] or [55, Thm. 45.8]) Let K, L be convex subsets of vector spaces \(H_1\) resp. \(H_2\), where \(H_1\) is locally convex and let \(F:K\times L\rightarrow \mathbb {R}\) be given. If

(1)
K is compact,

(2)
\(F(\cdot , y)\) is continuous and convex on K for every \(y\in L\),

(3)
\(F(x,\cdot )\) is concave on L for every \(x\in K\)
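then
$$\begin{aligned} \inf _{x\in K}\sup _{y\in L} F(x,y)=\sup _{y\in L}\inf _{x\in K} F(x,y). \end{aligned}$$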
Proof of Proposition 4.3
Fix \(t_0> 0\) and consider for a probability \(\pi \) on \({C_0(\mathbb {R}_+)}\times \mathbb {R}_+\times \mathbb {R}\) and \((\varphi , \psi )\in C_b({C_0(\mathbb {R}_+)})\times C_b(\mathbb {R})\) the conditions
Using compactness of \([0,t_0]\) it is not hard to see that \({\tilde{c}}(\omega ,y)=\inf _{t\le t_0} c(\omega , t, y)\) is continuous. We may thus apply the Monge–Kantorovich duality (Theorem 4.4) to the cost \({\tilde{c}}\) and obtain:
Claim 1
Taking the \(\inf \) over \(\pi \) satisfying (\(p[t_0]\)) and the \(\sup \) over \((\varphi , \psi )\) satisfying (\(d[c,t_0]\)), the duality relation \((\star )\) holds for continuous bounded \(c: {C_0(\mathbb {R}_+)}\times \mathbb {R}_+\times \mathbb {R}\rightarrow \mathbb {R}\).
Next consider the constraints
Using the min–max theorem (Theorem 4.5) with the function \(F(\pi ,\alpha )= \int c +\alpha (T-V) \, d\pi \), the set of \(\pi \) satisfying (\(p[t_0]\)), and \(\alpha \ge 0\) we thus obtain
where we applied Claim 1 to the function \({\tilde{c}}=c +\alpha (T-V)\) to establish the equality between (4.7) and (4.8). Hence we obtain:
Claim 2
Taking the \(\inf \) over \(\pi \) satisfying (\(p[t_0, V]\)) and the \(\sup \) over \((\varphi , \psi )\) satisfying \((d[c,t_0, V])\), the duality relation \((\star )\) holds for continuous bounded \(c: {C_0(\mathbb {R}_+)}\times \mathbb {R}_+\times \mathbb {R}\rightarrow \mathbb {R}\).
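The role of the multiplier \(\alpha \) can be made explicit. The following is a sketch under the assumption that (\(p[t_0,V]\)) strengthens (\(p[t_0]\)) by the moment constraint \(\int T\, d\pi \le V\); for a probability \(\pi \) we then have

```latex
\sup_{\alpha \ge 0} \int \big( c + \alpha (T - V) \big)\, d\pi
 = \int c \, d\pi + \sup_{\alpha \ge 0} \alpha \Big( \int T \, d\pi - V \Big)
 = \begin{cases}
     \int c \, d\pi , & \text{if } \int T \, d\pi \le V, \\[2pt]
     +\infty , & \text{if } \int T \, d\pi > V .
   \end{cases}
```

Minimizing the left hand side over \(\pi \) satisfying (\(p[t_0]\)) therefore only sees those \(\pi \) with \(\int T\, d\pi \le V\). Exchanging \(\inf \) and \(\sup \) is what Theorem 4.5 permits, since \(F(\pi ,\alpha )\) is affine in each variable and, for continuous bounded c and \(T\le t_0\), continuous in \(\pi \).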
In the next step we will drop \(t_0\) and consider the constraints
Claim 3
Taking the \(\inf \) over \(\pi \) satisfying (p[V]) and the \(\sup \) over \((\varphi , \psi )\) satisfying (d[c, V]), the duality relation \((\star )\) holds for \(c: {C_0(\mathbb {R}_+)}\times \mathbb {R}_+\times \mathbb {R}\rightarrow \mathbb {R}\) lsc and bounded from below.
Given \(c\ge 0 \) lsc, \({{\mathrm{supp}}}\,c \subseteq {C_0(\mathbb {R}_+)}\times [0,t_0]\times \mathbb {R}\) for some \(t_0\) it is straightforward to verify
Such functions can be used to approximate any nonnegative lsc function on \( {C_0(\mathbb {R}_+)}\times \mathbb {R}_+\times \mathbb {R}\) from below. Using that the set of \(\pi \) satisfying (p[V]) is compact, a straightforward approximation argument (see e.g. [58, Proof of Theorem 5.10, Step 5] for details) yields Claim 3.
Recalling (3.5), \(\pi \in \mathsf {JOIN}^{1,V}(\mu )\) if and only if
here, k enforces the condition that \(\mathsf {proj}_{ {{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+}(\pi _{\upharpoonright {{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+ \times D}) \in \mathsf {RST}\) for all Borel sets D. We will apply the min–max theorem to \( F(\pi , h )= \int c+ h\, d\pi , \) where \(\pi \) satisfies (p[V]) and
\(n\in \mathbb {N},f_i\in C_b(\mathbb {R}_+), {{\mathrm{supp}}}\,f_i\subseteq [0,t_i],t_i\ge 0\), \(g_i\in C_b({C_0(\mathbb {R}_+)}),k_i\in C_b(\mathbb {R})\).
The set of \(\pi \) satisfying (p[V]) is convex and compact by Prokhorov’s theorem and the set of all h of the form (4.9) is a vector space as well. Hence we obtain for c continuous and bounded
where the last equality holds by Claim 3. Assume now that \(c\) is also predictable. For \((\varphi ,\psi )\) satisfying \((d[c+ h,V])\) there is some \(\alpha \ge 0\) such that
Fixing t and y, (4.11) can be read as an inequality between functions in \(\omega \). Taking conditional expectations wrt \(\mathcal {F}^0_t\) in the sense of Definition 3.4 we obtain
for all \(\omega \in {C_0(\mathbb {R}_+)},t\in \mathbb {R}_+, y\in \mathbb {R}\), where we have used that \(c\) is predictable and that \(\mathbb {E}[f(t) (g-\mathbb {E}[g|\mathcal {F}_{u}^0])|\mathcal {F}^0_t]=0\) whenever \({{\mathrm{supp}}}\,f \subseteq [0,u]\).
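The vanishing of this conditional expectation can be checked by distinguishing two cases. A brief sketch, with f, g and u as in the preceding sentence, using only that f(t) is deterministic and the tower property:

```latex
\mathbb{E}\big[ f(t)\,( g - \mathbb{E}[g \mid \mathcal{F}^0_u] ) \,\big|\, \mathcal{F}^0_t \big]
 = f(t)\,\Big( \mathbb{E}[g \mid \mathcal{F}^0_t]
      - \mathbb{E}\big[ \mathbb{E}[g \mid \mathcal{F}^0_u] \,\big|\, \mathcal{F}^0_t \big] \Big)
 = \begin{cases}
     f(t)\,\big( \mathbb{E}[g \mid \mathcal{F}^0_t] - \mathbb{E}[g \mid \mathcal{F}^0_t] \big) = 0, & t \le u, \\[2pt]
     0 \quad (\text{since } f(t) = 0), & t > u .
   \end{cases}
```

Consequently every summand of a function h of the form (4.9) disappears after conditioning on \(\mathcal {F}^0_t\), which is why only the terms involving \(\varphi \), \(\psi \), \(\alpha \) and c survive.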
It follows that \((\varphi ,\psi )\) satisfy (\(d^M[c,V]\)). Thus (4.10) yields the nontrivial part of \((\star )\) for the constraints \((p^M[V])\), (\(d^M[c,V]\)) in the case of continuous bounded c. As above, the extension to lsc c is straightforward. \(\square \)
Proof of Theorem 4.2
Consider the space \({C_0(\mathbb {R}_+)}\times \mathbb {R}_+ \times \mathbb {R}\) and the cost function
It is straightforward to see that \(c\) is lsc since \(\gamma \) was assumed to be lsc. Hence \((\star )\) holds by Proposition 4.3. It remains to show that
To prove the first inequality, consider a bounded pair \((\varphi ,\psi )\) satisfying (\(d^M[c,V]\)), i.e. there is \(\alpha \ge 0\) such that \( \varphi ^M_t(\omega ) +\psi (y) - \alpha (t-V)\le c(\omega ,t,y) \) for all \(\omega \in {C_0(\mathbb {R}_+)}, y\in \mathbb {R}, t\in \mathbb {R}_+\). But then
which we rewrite as
Noting that \(\alpha (\omega (t)^2-t)\) is an S-continuous martingale starting in 0, we find that \(({\bar{\varphi }}, {\bar{\psi }})\) satisfies the constraint of the dual problem considered in Theorem 4.2. Since \(V=\int y^2\, \mu (dy)\) we have \(\int {\bar{\psi }}(y)\ \mu (dy)=\int \psi (y)\ \mu (dy)+{\mathbb {W}}(\varphi )\), establishing the first part of (4.13).
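That \(\omega (t)^2-t\) is a martingale under \({\mathbb {W}}\) is the standard computation below, recalled here for convenience. For \(s\le t\) and B the canonical process, independence and stationarity of the increments give

```latex
\begin{aligned}
\mathbb{E}\big[ B_t^2 - t \,\big|\, \mathcal{F}^0_s \big]
 &= \mathbb{E}\big[ (B_t - B_s)^2 \,\big|\, \mathcal{F}^0_s \big]
    + 2 B_s \, \mathbb{E}\big[ B_t - B_s \,\big|\, \mathcal{F}^0_s \big]
    + B_s^2 - t \\
 &= (t - s) + 0 + B_s^2 - t
  = B_s^2 - s .
\end{aligned}
```

Since \(B_0=0\), the process starts in 0, and multiplying by the constant \(\alpha \) preserves both properties.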
To prove the latter inequality, note that each \(\pi \in \mathsf {JOIN}^{1,V}( \mu )\) satisfying \(\int c\, d\pi <\infty \) is concentrated on \(\{(\omega ,t, y): \omega (t)=y\}\) and writing \(p(\omega , t, y):=(\omega ,t)\) we find \(\xi := p(\pi )\in \mathsf {RST}(\mu ),\int c\, d\pi = \int \gamma \, d\xi \). \(\square \)
General starting distribution
In this section we consider \({C(\mathbb {R}_+)}\), the set of all continuous functions on \(\mathbb {R}_+\), and
Let \(\lambda \) be a probability measure on \(\mathbb {R}\) prior to \(\mu \) in convex order—i.e., \(\int F(x) \ \lambda (dx) \le \int F(x) \ \mu (dx)\) for any convex function F. In particular \(\lambda \) is centered and \(V_\lambda =\int x^2\ \lambda (dx)\le V < \infty \). This ensures the existence of solutions to the Skorokhod embedding problem with general starting distribution \(\lambda \) with finite first moment. Denote by \({\mathbb {W}}_x\) the law of Brownian motion starting in x and put \({\mathbb {W}}_\lambda (d\omega )=\int {\mathbb {W}}_x(d\omega )\lambda (dx)\) for \(\omega \in {C(\mathbb {R}_+)}\), the law of Brownian motion starting at a random point according to the distribution \(\lambda \). Given a function \(\gamma : S_\mathbb {R}\rightarrow \mathbb {R}\) we are interested in the minimization problem
where \(\mathsf {RST}(\lambda ,\mu )\) is the set of all randomized stopping times \(\xi \) on \(({C(\mathbb {R}_+)},{\mathbb {W}}_\lambda )\) embedding \(\mu \) and satisfying \({\xi }(T)=V-V_\lambda \); in particular \({{\mathrm{proj}}}_{{C(\mathbb {R}_+)}}(\xi )={\mathbb {W}}_\lambda \) and \(h(\xi )=\mu \) for the map \(h:{C(\mathbb {R}_+)}\times \mathbb {R}_+ \rightarrow \mathbb {R}, (\omega ,t)\mapsto \omega (t).\) We then have the following result:
Theorem 4.6
Let \(\gamma : S_\mathbb {R}\rightarrow \mathbb {R}\) be lsc and bounded from below as in (4.2). Put
where \(\varphi , \psi \) satisfy \(|\varphi _t| \le a + bt+c B_t^2\), \(|\psi (y)| \le a+ b y^2\) for some \(a,b,c>0\). Then we have the duality relation \(P_\gamma = D_\gamma .\)
The proof goes along the same lines as the proof of Theorem 4.2. The inequality \(D_\gamma (\lambda ,\mu )\le P_\gamma (\lambda ,\mu )\) is straightforward. For the other direction we can use the same argument as before, replacing \({\mathbb {W}}\) by \({\mathbb {W}}_\lambda \) and V by \({\tilde{V}}:=V-V_\lambda \). Up to Eq. (4.14) everything can be copied verbatim. Then we rewrite \( \varphi ^M_t(\omega )+\psi (\omega (t)) - \alpha (t-V+V_\lambda )\) as
and note that \({{\mathbb {W}}_\lambda }(\omega (t)^2)=t+V_\lambda .\) The proof concludes as before.
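Two elementary computations underlie this subsection, and we record them as a sketch. The first two lines test the convex order with the convex functions \(F(x)=\pm x\) and \(F(x)=x^2\); the third uses that \(B_t\) has mean x and variance t under \({\mathbb {W}}_x\). The symbol \(\mathbb {E}_x\) for expectation under \({\mathbb {W}}_x\) is notation introduced only for this sketch.

```latex
\begin{aligned}
 &\int x \, \lambda(dx) = \int x \, \mu(dx) = 0
   && (F(x) = x \text{ and } F(x) = -x), \\
 &V_\lambda = \int x^2 \, \lambda(dx) \le \int x^2 \, \mu(dx) = V
   && (F(x) = x^2), \\
 &\mathbb{W}_\lambda\big( \omega(t)^2 \big)
   = \int \mathbb{E}_x\big[ B_t^2 \big] \, \lambda(dx)
   = \int (x^2 + t) \, \lambda(dx)
   = t + V_\lambda .
\end{aligned}
```

In particular \(\omega (t)^2-t-V_\lambda \) is centered under \({\mathbb {W}}_\lambda \), which matches the normalization \(\xi (T)=V-V_\lambda \) imposed on \(\mathsf {RST}(\lambda ,\mu )\).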
The monotonicity principle
In this section we will establish the monotonicity principle: suppose \(\xi \in \mathsf {RST}(\mu )\) is an optimal stopping rule for some function \(\gamma \), then we will find a set \(\Gamma \) supporting \(\xi \) such that \(\mathsf {SG}\cap (\Gamma ^<\times \Gamma )=\emptyset \). The argument can be divided into two major steps:

1.
Consider an optimal stopping rule \(\xi \) and a stop-go pair \(((f,s),(g,t))\in \mathsf {SG}\) where (f, s) is still going according to the stopping rule \(\xi \) while (g, t) is stopped by \(\xi \). Intuitively speaking, we can find an (infinitesimal) improvement of \(\xi \) by switching the roles of f and g. As \(\xi \) is optimal, there should only exist a few such pairs. We formalize this in Proposition 5.8 by showing that if \(\pi (\mathsf {SG})>0\) for some \(\pi \in \mathsf {JOIN}(r(\xi ))\) we can explicitly construct a stopping rule with strictly lower ‘cost’.

2.
Knowing that \(\mathsf {SG}\) is negligible in the sense that it is not seen by the ‘couplings’ \(\pi \) just described, it remains to find a support \(\Gamma \) of \(\xi \) such that \(\mathsf {SG}\cap (\Gamma ^< \times \Gamma )=\emptyset .\) The crucial step is the characterization of a set which is null wrt all \(\pi \in \mathsf {JOIN}(r(\xi ))\) which we establish in Proposition 5.9 based on Choquet’s capacitability theorem and an auxiliary duality result.
Armed with Propositions 5.8 and 5.9, we will establish Theorem 1.3: If \(\xi \) is an optimal stopping time, then Proposition 5.8 implies that the set of stop-go pairs is negligible in a quasi-sure sense, i.e. it is a null set with respect to every \(\pi \in \mathsf {JOIN}(r(\xi ))\). Proposition 5.9 will then allow us to exclude a \(r(\xi )\)-null set of paths to obtain a support \(\Gamma \) of \(r(\xi )\) such that \(\Gamma ^<\times \Gamma \) avoids all stop-go pairs.
In the first part of this section we will give a number of definitions and results that are needed to establish Theorem 1.3 (including the statements of Propositions 5.8 and 5.9); the respective proofs will be given subsequently.
The notion of stop-go pairs introduced in Definition 1.4 requires that all possible extensions \(\sigma \) are considered. However, to establish the monotonicity principle, it is actually more natural to prove a stronger result that appeals to a relaxed notion of stop-go pairs which are sensitive to the stopping measure \(\xi \), or, more precisely, to a representation of \(\xi \) through a function \(A^\xi \) as in Theorem 3.8 (1).
Important Convention
Throughout this section we will fix \(\xi \in \mathsf {RST}(\mu )\), as well as the particular representation \(A^\xi \).
Definition 5.1
For \((f,s)\in S\), the conditional randomized stopping time \(\xi ^{(f,s)}\) is given as
The measure \(\xi ^{(f,s)}\) is the normalized stopping measure given that we followed the path f up to time s. In other words this is the normalized stopping measure of the ‘bush’ which follows the ‘stub’ (f, s). We note that \(\xi ^{(f,s)}\) depends measurably on \((f,s)\in S\).
Informally, the following lemma asserts that if \(\xi \) is a well-behaved stopping time, then the same holds for \(\xi ^{(f,s)}\) for typical \((f,s)\in S\). More precisely, we say that \(V\subseteq S\) is evanescent if \(r^{-1}(V)\) is an evanescent subset of \({C_0(\mathbb {R}_+)}\times \mathbb {R}_+\). Equivalently, V is evanescent if there is a Borel set \(A\subseteq {C_0(\mathbb {R}_+)}, {\mathbb {W}}(A)=1\) such that \(r(A\times \mathbb {R}_+)\cap V=\emptyset \). Recall that T denotes the projection from \({{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+\) onto \( \mathbb {R}_+.\)
Lemma 5.2
The set \(\{(f,s)\in S : \xi ^{(f,s)}\notin \mathsf {RST}^1\}\) is evanescent. Moreover, if \(F:{C_0(\mathbb {R}_+)}\times \mathbb {R}_+\rightarrow \mathbb {R}_+\) is predictable and satisfies \(\xi (F)<\infty \) then the set \(\{(f,s)\in S : \xi ^{(f,s)}(F^{(f,s)\oplus }) =\infty \}\) is evanescent. In particular, \(\{(f,s)\in S:\xi ^{(f,s)}(T) =\infty \}\) is evanescent, since \(\xi \in \mathsf {RST}(\mu )\).
Definition 5.3
The set \(\mathsf {SG}^\xi \) of stop-go pairs relative to \(\xi \) consists of all \(\big ( (f,s), (g,t)\big )\in S\times S\), \(f(s)= g(t)\) such that
We define stop-go pairs in the wide sense by \({\widehat{\mathsf {SG}}}^\xi =\mathsf {SG}^\xi \cup \{(f,s)\in S: A^\xi (f,s)=1\}\times S\).
In analogy to Definition 1.4 we agree that (5.2) holds in any of the following cases:

(1)
\(\int T\, d\xi ^{(f,s)}=\infty \) or \(\xi ^{(f,s)}({C_0(\mathbb {R}_+)}\times \mathbb {R}_+)<1\);

(2)
the integral on the left side equals \(\infty \);

(3)
either of the integrals is not defined.
We now discuss the relation between the set \(\mathsf {SG}\) given in Definition 1.4 and the set \({\widehat{\mathsf {SG}}}^\xi \). Note that if \(A^\xi (f,s)<1\) and \(((f,s),(g,t))\in \mathsf {SG}\) for some \((g,t) \in S\) then we shall show below that \(((f,s),(g,t))\in \mathsf {SG}^\xi \). In contrast to this, whenever \(A^\xi (f,s)=1\), the left and right hand sides of (5.2) are identical and ((f, s), (g, t)) cannot be a stop-go pair relative to \(\xi \). However, in general \( \mathsf {SG}\cap \{(f,s)\in S: A^\xi (f,s)=1\}\times S\) may be nonempty. For this reason we are also interested in the set of stop-go pairs in the wide sense which satisfy:
Lemma 5.4
Every stop-go pair is a stop-go pair in the wide sense, i.e.
Remark 5.5
Note that \(\mathsf {SG}^\xi \) and \({\widehat{\mathsf {SG}}}^\xi \) are Borel subsets of \(S\times S\) (corresponding to predictable subsets of \(({{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+)\times ({{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+)=({{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+)\times \mathsf {Y}\) in the sense of Remark 3.15). In contrast, \(\mathsf {SG}\) is in general just coanalytic.
Definition 5.6
A set \(\Gamma \subseteq S\) is called \((\gamma ,\xi )\)-monotone iff
Recall that we say that our optimization problem (OptSEP) is well posed if \({\int \gamma ~ d\xi }\) exists with values in \((-\infty ,\infty ]\) for all \(\xi \in \mathsf {RST}(\mu )\) and it is finite for one such \(\xi \). Together with Lemma 5.4, the following result implies Theorem 1.3 stated in the introduction, and is itself a slightly stronger result.
Theorem 5.7
Assume that \(\gamma :S\rightarrow \mathbb {R}\) is Borel measurable, the optimization problem (4.1) is well posed and that \(\xi \in \mathsf {RST}(\mu )\) is an optimizer. Then there exists a \((\gamma ,\xi )\)-monotone Borel set \(\Gamma \subseteq S\) which supports \(\xi \) in the sense that \(r(\xi )(\Gamma )=1\).
The proof of Theorem 5.7 relies on Proposition 5.8 and Proposition 5.9 below. The first result formalizes the heuristic idea that an optimizer cannot be improved on a large set of paths but at most on a small set of exceptional paths. The second result allows us to entirely exclude such an exceptional set of paths.
Given functions \(F:\mathsf {X}\rightarrow \mathsf {X}', G:\mathsf {Y}\rightarrow \mathsf {Y}'\) we denote the product map by \(F\otimes G:\mathsf {X}\times \mathsf {Y}\rightarrow \mathsf {X}'\times \mathsf {Y}'.\) Given a probability \(\nu \) on a Polish space \(\mathsf {Y}\), we defined the set \(\mathsf {JOIN}(\nu )\) in Sect. 3.4. An element \(\pi \in \mathsf {JOIN}(\nu )\) is a measure on \(({{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+) \times \mathsf {Y}\), and we will commonly consider the pushforward measure \((F \otimes G)(\pi )\). Typically F will be the map \(r: {{C_0(\mathbb {R}_+)}} \times \mathbb {R}_+ \rightarrow S\), and G will be r or the identity.
Proposition 5.8
Assume that \(\gamma :S\rightarrow \mathbb {R}\) is Borel measurable, the optimization problem (4.1) is well posed and that \(\xi \in \mathsf {RST}(\mu )\) is an optimizer. Then \((r \otimes {{\mathrm{Id}}})(\pi )(\mathsf {SG}^\xi )=0\) for any \(\pi \in \mathsf {JOIN}^1( r(\xi ))\).
Below we apply Proposition 5.9 to \((\mathsf {Y}, \nu )=(S, r(\xi ))\), but this choice is not relevant for the proof of Proposition 5.9 and so we state it for an abstract Polish probability space \((\mathsf {Y}, \nu )\).
Proposition 5.9
Let \((\mathsf {Y}, \nu )\) be a Polish probability space and \(E\subseteq S\times \mathsf {Y}\) a Borel set. Then the following are equivalent:

(1)
\((r \otimes {{\mathrm{Id}}})(\pi )(E)=0\) for all \(\pi \in \mathsf {JOIN}^1( \nu )\).

(2)
\(E \subseteq (F \times \mathsf {Y})\ \cup \ (S\times N)\) for some evanescent set \(F\subseteq S\) and a \(\nu \)-null set \(N\subseteq \mathsf {Y}\).
Intuitively speaking, Proposition 5.9 characterizes when a predictable set \(E\subseteq S\times \mathsf {Y}\) is ‘negligible’. In this sense it relates to the classical (cross) section theorem, which implies the following characterization of negligible subsets of S.
Proposition 5.10
Let \(E\subseteq S\) be Borel. Then the following are equivalent:

(1)
\(r(\alpha )(E)=0\) for all \(\alpha \in \mathsf {RST}\).

(2)
E is evanescent.

(1’)
\({\mathbb {W}}(((B_s)_{s\le \tau },\tau )\in E)=0\) for every \(\mathcal {F}^0\)-stopping time \(\tau \).
Note that the equivalence of (1) and (2) in Proposition 5.10 corresponds precisely to Proposition 5.9 in the case where \(\mathsf {Y}\) consists of a single element.
Proof of Theorem 5.7
By Proposition 5.8, \((r \otimes {{\mathrm{Id}}})(\pi )(\mathsf {SG}^\xi )=0\) for any \(\pi \in \mathsf {JOIN}^1(r(\xi ))\). Applying Proposition 5.9 with \((\mathsf {Y}, \nu )=(S, r(\xi ))\) we deduce that there exists an evanescent set \({\tilde{F}}\subseteq S\) and a set \(N\subseteq S\) such that \(r(\xi )(N)=0\), and
Put \(F:=\{(g,t)\in S:\exists (f,s)\in {\tilde{F}},t\ge s, g\equiv f \text { on }[0,s]\}.\) Then F is evanescent and satisfies
Setting \(\Gamma _0= S{\setminus } (F\cup N) \) we have \(r(\xi )(\Gamma _0)=1\) as well as \(\mathsf {SG}^\xi \cap ( \Gamma _0^<\times \Gamma _0)=\emptyset .\) Next we define
Then \(r(\xi )( \Gamma _1)=1\) and \( \Gamma _1^< \cap \{(f,s):A^\xi (f,s)=1\}=\emptyset \) so that \( {\widehat{\mathsf {SG}}}^\xi \cap (\Gamma _1^<\times \Gamma _1)=\emptyset \). Finally we take \(\Gamma \) to be a Borel subset of \(\Gamma _1\) which has full measure. \(\square \)
It remains to establish the auxiliary results stated above.
Proof of Lemma 5.2
Consider
Set \(A^\xi (\omega ):= \lim _{t\rightarrow \infty } A^\xi \circ r (\omega ,t)\). Then \((f,s)\in U_1 \) is equivalent to \(\int A^\xi (f\oplus \omega )\, d{\mathbb {W}}(\omega )<1\). Given an \(\mathcal {F}^0\)-stopping time \(\tau \), the strong Markov property implies
hence \({\mathbb {W}}(((B_s)_{s\le \tau }, \tau )\in U_1)=0\).
Additionally, setting \(\alpha (d\omega ,dt)=\delta _{\tau (\omega )}(dt){\mathbb {W}}(d\omega )\) we have
which implies \(r(\alpha )(U_2)=0\). Summing up, we get \({\mathbb {W}}(((B_s)_{s\le \tau }, \tau )\in U_1\cup U_2)=0\) proving the claim in view of Proposition 5.10. \(\square \)
Proof of Lemma 5.4
Suppose that \(A^\xi (f,s)<1\) and \(((f,s),(g,t))\not \in {\widehat{\mathsf {SG}}}^{\xi }\) for some \((g,t) \in S\) with \(g(t) = f(s)\). In particular, (5.2) fails for \(\xi ^{(f,s)}\), and conditions (1)–(3) above all fail. By Theorem 3.8 (4), and using the same argument as seen in the proof of Lemma 3.11, we can find a \((\mathcal {F}^0_t \otimes {\mathcal {B}}([0,1]))_{t\ge 0}\)-stopping time \(\rho \) such that \(\overline{{\mathbb {W}}}(\rho>0) > 0\) and for any measurable and bounded or nonnegative \(X: {C_0(\mathbb {R}_+)}\times \mathbb {R}_+ \rightarrow \mathbb {R}\), we have \(\int X_t(\omega ) \, d \xi ^{(f,s)}(\omega ,t) = \int \mathcal {L}(du) \int {\mathbb {W}}(d\omega ) X_{\rho (\omega ,u)}(\omega )\). By the conditions below Definition 5.3, it follows that there exists \(u_0 \in [0,1]\) such that
and such that \(\rho _0: \omega \mapsto \rho (\omega ,u_0)\) is an \((\mathcal {F}^0_t)_{t \ge 0}\)-stopping time with \(0< {\mathbb {W}}(\rho _0) < \infty \), both sides of (5.4) are well defined, and the left hand side is finite. In particular, writing \(B^\Omega \) for Brownian motion on the abstract probability space \(\Omega \), \(\sigma = \rho _0\circ B^\Omega \) defines an \(\mathcal {F}^B\)-stopping time, and (1.6) fails for this stopping time. Hence \(((f,s),(g,t))\not \in \mathsf {SG}\). \(\square \)
Proof of Proposition 5.8
Working towards a contradiction we assume that there is \(\pi \in \mathsf {JOIN}( r(\xi ))\) such that \((r \otimes {{\mathrm{Id}}})(\pi )(\mathsf {SG}^\xi )>0\). Observe that \(\pi \in \mathsf {JOIN}(r(\xi ))\) implies that \(\pi _{\upharpoonright (r\otimes {{\mathrm{Id}}})^{-1}(E)}\in \mathsf {JOIN}(r(\xi ))\) for any \(E\subseteq S\times S\). Hence, considering \((r \otimes {{\mathrm{Id}}})(\pi )_{\upharpoonright \mathsf {SG}^\xi }\), we can also assume that \((r \otimes {{\mathrm{Id}}})(\pi )\) is concentrated on \(\mathsf {SG}^\xi \) and then \(r({{\mathrm{proj}}}_\mathsf {X}(\pi ))(\{(f,s):A^\xi (f,s)=1\})=0\), where \(\mathsf {X}:= {C_0(\mathbb {R}_+)}\times \mathbb {R}_+\). Finally we also consider the representation of \(\pi \) on \(({C_0(\mathbb {R}_+)}\times \mathbb {R}_+)\times ({C_0(\mathbb {R}_+)}\times \mathbb {R}_+)\) defined through
and note that \(\pi =({{\mathrm{Id}}}\otimes r)({\bar{\pi }}). \)
We will use \(\pi \) and \({\bar{\pi }}\) to define modifications \(\xi _0^\pi \in \mathsf {RST}\) and \(\xi _1^\pi \in \mathsf {RST}\) of \(\xi \) such that the following hold true:

(1)
The terminal distributions \(\mu _0, \mu _1\) corresponding to \(\xi _0^\pi \) and \(\xi _1^\pi \) satisfy \((\mu _0+\mu _1)/2= \mu .\)

(2)
\(\xi _0^\pi \) stops paths earlier than \(\xi \) while \(\xi _1^\pi \) stops later than \(\xi \).

(3)
The cost of \(\xi _0^\pi \) plus the cost of \(\xi _1^\pi \) is less than twice the cost of \(\xi \), i.e.
$$\begin{aligned}&\int \gamma \circ r(\omega ,t) \, d\xi ^\pi _0(\omega , t)+ \int \gamma \circ r(\omega ,t) \, d\xi ^\pi _1(\omega , t) \\&\quad < 2 \int \gamma \circ r(\omega ,t) \, d\xi (\omega , t). \end{aligned}$$
More formally, (2) asserts that for almost all \(\omega \), and every \(s\ge 0\)
where \((\xi _\omega )_{\omega \in {{C_0(\mathbb {R}_+)}}}\) is the disintegration of \(\xi \) wrt \({\mathbb {W}}\) induced by \(A^{\xi }\) and \(((\xi _0^\pi )_\omega )_{\omega \in {{C_0(\mathbb {R}_+)}}},\) \( ((\xi _1^\pi )_\omega )_{\omega \in {{C_0(\mathbb {R}_+)}}}\) are disintegrations of \(\xi _0^\pi , \xi _1^\pi \) wrt \({\mathbb {W}}\).
If we are able to construct such a pair \(\xi _0^\pi , \xi _1^\pi \), then \(\xi ^\pi :=(\xi _0^\pi +\xi _1^\pi )/2 \in \mathsf {RST}(\mu )\) is strictly better than \(\xi \) and therefore yields the desired contradiction.
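The contradiction is obtained simply by averaging. As a sketch, assuming the integrals in (3) are finite so that the sum can be split, properties (1) and (3) give

```latex
\int \gamma \circ r \, d\xi^\pi
 = \tfrac12 \Big( \int \gamma \circ r \, d\xi^\pi_0 + \int \gamma \circ r \, d\xi^\pi_1 \Big)
 < \int \gamma \circ r \, d\xi ,
\qquad
\tfrac12 \, (\mu_0 + \mu_1) = \mu .
```

The second identity says that \(\xi ^\pi \) has terminal distribution \(\mu \). That \(\xi ^\pi \) also has the correct expected stopping time, and hence belongs to \(\mathsf {RST}(\mu )\), is verified separately later in the proof.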
In the proof we will often use the following ‘strong Markov property’ of randomized stopping times: for \(\alpha \in \mathsf {RST}\) and bounded measurable \(F:{C_0(\mathbb {R}_+)}\times \mathbb {R}_+\rightarrow \mathbb {R}\) we have
To define \(\xi _0^\pi \), let \(\alpha _0= \mathsf {proj}_\mathsf {X}(\pi )\in \mathsf {RST}\) and consider \(A^{\alpha _0}:S\rightarrow [0,1]\) as in Theorem 3.8 (1). We define the randomized stopping time \(\xi _0^\pi \) via the product
The probabilistic interpretation of this definition is that a particle is stopped by \(\xi _0^\pi \) if it is stopped by \(\alpha _0\) or stopped by \(\xi \), where these events are taken to be conditionally independent given the particle followed the path f until time s. Comparing \(\xi \) and \(\xi _0^\pi \) the latter will stop some particles earlier than the first one. Also, \(\xi ^\pi _0\in \mathsf {RST}\) by Theorem 3.8 (1). By partial integration, if \(D\subseteq {{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+\) then \(\xi ^\pi _0\) satisfies
where \((A^{\alpha _0}\circ r)_{-}\) denotes the left-continuous version of \(A^{\alpha _0}\circ r\).
Our next goal is to derive (in (5.9) below) a representation for the difference between \(\xi _0^\pi \) and \(\xi \). For Borel \(D\subseteq {{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+\) we have
Furthermore, writing \(D_\omega = \{ t\in \mathbb {R}_+: (\omega ,t)\in D\}\) and \(\theta _s(\omega )= (\omega _{t+s}-\omega _s)_{t\ge 0}\), we have
Combining (5.7) and (5.8) we obtain for bounded measurable \(F:S\rightarrow \mathbb {R}\) using (5.6)
Let us now turn to the definition of \(\xi _1^\pi \). For \(D\subseteq {{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+\) we define
and observe that \(\alpha _1\in \mathsf {RST}\) by Theorem 3.8 (2) since \(\eta \mapsto (\alpha _1)_\eta ([0,t])\) is \(\mathcal {F}_t^a\)-measurable by (5.5). Then we define the probability measure \(\xi _1^\pi \) on \({{C_0(\mathbb {R}_+)}}\times \mathbb {R}_+\) by
To motivate this definition, we note that the support of the randomized stopping time \(\xi \) can be viewed (informally) as a subtree of S. The joining \(\pi \) defines a plan how to trim this tree, i.e. to cut a bush at position \(r(\omega ,s)\) and to plant it on top of \(r(\eta ,t)\). Hence, we take the tree, \(\xi \), prepare the position where something will be newly planted, subtract \(\alpha _1\) which takes away some mass, and plant as much as possible (accounting for the factor \((1-A^\xi \circ r(\omega ,s))\) in (5.10) and (5.11)) on these stubs to end up with a tree of mass one again.
As a consequence of Definition 5.1 for each u the map \(\eta \mapsto \xi ^{(f,s)}_{\theta _t(\eta )}([0,(u-t)\vee 0])\) is \(\mathcal {F}_u^0\)-measurable. Moreover, \((\xi ^\pi _1)_\eta \in {\mathcal {P}}^{\le 1}(\mathbb {R}_+)\) and it follows that \(\xi ^\pi _1\in \mathsf {RST}.\) From (5.10) and (5.11) it follows that for bounded measurable \(F:S\rightarrow \mathbb {R}\) using (5.5)
Adding (5.9) and (5.12) and recalling \(2\xi ^\pi =\xi ^\pi _0+\xi ^\pi _1\), we obtain for bounded measurable \(F:S\rightarrow \mathbb {R}\)
Next we show that (5.13) extends to nonnegative functions \(F:S\rightarrow \mathbb {R}_+\) satisfying \(\xi (F)<\infty \). Put \(X(\omega ):=\int F(r(\omega ,t))~\xi _\omega (dt)\). Then \(\mathbb {E}[X]=\xi (F)<\infty .\) Moreover, recalling Definition 3.4 we have
It then follows that
where \(\rho _0\) denotes the representation of \(\alpha _0\) as in (3.6) and \({\bar{X}}^M_t(\omega ,u)=X^M_t(\omega )\). This implies
hence (5.13) holds also for such F. Applying this to \(F(f,s)=s\) we find \(\xi ^\pi (T)=\xi (T)<\infty \). Taking \(F(f,s)=G(f(s))\) for bounded measurable \(G:\mathbb {R}\rightarrow \mathbb {R}\), the right hand side of (5.13) vanishes since \({\bar{\pi }}\) is concentrated on pairs \(((\omega ,s),(\eta ,t))\) satisfying \(\omega (s)=\eta (t)\). This implies that \(\xi \) and \(\xi ^\pi \) embed the same distribution, i.e. \(\xi ^\pi \in \mathsf {RST}(\mu )\).
Arguing on the negative and positive part of \(\gamma \) and using that \(\xi ^\pi (\gamma ^-),\xi (\gamma ^-)<\infty \) we see that (5.13) applies to \(F=\gamma \). By definition of \(\mathsf {SG}^\xi \),
is \({\bar{\pi }}\)-a.s. strictly negative since \(r({{\mathrm{proj}}}_\mathsf {X}(\pi ))(\{(f,s):\xi ^{(f,s)}(T)=\infty \text { or } \xi ^{(f,s)}\notin \mathsf {RST}^1\})=0\) by Lemma 5.2. Hence \(\xi ^\pi (\gamma ) <\xi (\gamma )\), contradicting optimality of \(\xi \). \(\square \)
Proof of Proposition 5.9
Only the implication (1) \(\Rightarrow \) (2) of Proposition 5.9 is nontrivial. The proof is based on Choquet’s capacitability theorem and the following auxiliary duality result which is closely related to Proposition 4.3. We fix \(t_0\in \mathbb {R}_+\) and set \(S_{t_0}:=\{(f,s)\in S: s\le t_0\}\).
Proposition 5.11
Consider a Polish probability space \((\mathsf {Y},\nu )\) and let \(c:{C_0(\mathbb {R}_+)}\times \mathbb {R}_+\times \mathsf {Y}\rightarrow \mathbb {R}\cup \{\infty \}\) be lsc, predictable (cf. Remark 3.15) and bounded from below. Then
where the infimum is taken over the set
and the supremum is taken over \(\varphi \in C_b({C_0(\mathbb {R}_+)}),\psi \in C_b(\mathsf {Y})\) such that
Proof
As the arguments are almost identical to the ones from Proposition 4.3 we will only sketch the proof. By approximation it is sufficient to establish the result for continuous bounded c. As before the Monge–Kantorovich duality yields that \((\star \star )\) holds provided that \(\pi \) and \((\varphi , \psi )\), resp., satisfy
If c is predictable, we can then argue as in the last step of Proposition 4.3 to obtain the assertion of Proposition 5.11. \(\square \)
We now state several consequences of Proposition 5.11 in which we switch the roles of \(\inf \) and \(\sup \) to provide a more natural formulation.
Denote for Borel \(K\subseteq S_{t_0}\times \mathsf {Y}\)
where \(\mathsf {DC}_{t_0}(K)\) consists of all pairs of lsc \(\varphi ,\psi \) on \({C_0(\mathbb {R}_+)}\) resp. \(\mathsf {Y}\) satisfying
where we recall the notation \(\varphi ^{M,S}\) from Definition 3.4.
Corollary 5.12
Let \(K\subseteq S_{t_0}\times \mathsf {Y}\) be closed. Then
Proof
Fix \(\varepsilon >0\). Applying Proposition 5.11 to \(c=-\mathbbm {1}_{(r\otimes {{\mathrm{Id}}})^{-1}(K)}\), which is lsc since K is closed and r is continuous, we obtain that there exist functions \(\varphi \in C_b({C_0(\mathbb {R}_+)}),\psi \in C_b(\mathsf {Y})\) such that
It follows from (5.16) that \(\varphi ^{M,S}\) is bounded from below on \(S_{t_0}\) and wlog we may assume that \(\varphi (\omega )=\varphi _{t_0}^M(\omega )\). Subtracting a constant from \(\varphi \) and adding it to \(\psi \), we may assume that \(\inf \varphi = 0\) (which implies \(\psi \ge 0\)). It follows that we can replace \(\psi \) with \({\bar{\psi }}=\psi \wedge 1\).
It suffices to consider the case \({\mathbb {W}}(\varphi )\le 1\). Put \(\rho =\inf \{t\ge 0:\varphi ^M_t > 1\}.\) Due to S-continuity of \(\varphi ^M\) (by Proposition 3.5) the set \(O:=\{(\omega ,t) :\varphi ^{M,S}\circ r(\omega ,t)>1\}\) is open. Hence also \(\{\rho <\infty \}={{\mathrm{proj}}}_{{C_0(\mathbb {R}_+)}} O\) is open as projections are open mappings and the map \(\omega \mapsto \varphi ^M_{\rho (\omega )}(\omega )=:{\bar{\varphi }}(\omega )\le 1\) is lsc. Clearly, \((\bar{\varphi },{\bar{\psi }})\) satisfies (5.15) and \({\mathbb {W}}(\bar{\varphi })+\nu ({\bar{\psi }})\le {\mathbb {W}}(\varphi )+\nu (\psi ).\) \(\square \)
Lemma 5.13
\(D_{t_0}\) is a Choquet capacity on \(S\times \mathsf {Y}.\)
Proof
We need to verify the defining properties of a capacity \(\Psi \) (cf. [35, Definition 30.1]):

(1)
monotonicity: \(A\subseteq B \Rightarrow \Psi (A)\le \Psi (B)\)

(2)
continuity from below: \(A_1\subseteq A_2\subseteq \cdots \Rightarrow \Psi (A_n) \rightarrow \Psi (\bigcup _j A_j)\)

(3)
boundedness: \(\Psi (K)<\infty \) for all compact K; if \(\Psi (K)<u\) there exists open \(U\supseteq K\) with \(\Psi (U)<u.\)
Moreover, it is sufficient to test these properties for Borel sets (see [35, Section 30B]). The monotonicity is immediate. Let us turn to the continuity from below.
Take an increasing sequence \(A_1 \subseteq A_2\subseteq \cdots \subseteq S \times \mathsf {Y}\) of Borel sets and put \(A=\bigcup _n A_n.\) For all n there are lsc functions \(\varphi _n:{{C_0(\mathbb {R}_+)}}\rightarrow [0,1]\) (which give rise to S-lsc martingales) and \(\psi _n: \mathsf {Y}\rightarrow [0,1]\) such that \(\mathbbm {1}_{A_n}((f,s),y)\le \varphi _n^{M,S}(f,s) + \psi _n(y)\) for all \((f,s)\in S_{t_0},y\in \mathsf {Y}\) and
Using a Mazur/Komlós-type lemma (e.g. Lemma A1.1 in [15]) we can assume that some appropriate convex combinations of \(\psi _n\) and \(\varphi _n\) converge a.s. to functions \(\psi \) and \(\varphi \). More precisely: there exist convex coefficients \(\alpha _n^{n}, \ldots ,\alpha _{k_n}^{n}, n\ge 1, k_n<\infty ,\) and full measure subsets \(\Omega _1\subseteq {C_0(\mathbb {R}_+)},\mathsf {Y}_1\subseteq \mathsf {Y}\) such that with \(\tilde{\varphi }_n:= \sum _{i=n}^{k_n} \alpha _i^{n} \varphi _i,{\tilde{\psi }}_n:= \sum _{i=n}^{k_n} \alpha _i^{n} \psi _i\) we have that for all \(\omega \in \Omega _1\) and all \(y\in \mathsf {Y}_1\)
exist. Extend these functions to \({{C_0(\mathbb {R}_+)}}\) and \(\mathsf {Y}\), resp., through
This implies for \((f,s)\in S\)
Given \(m\le n\) we have for \((f,s)\in S_{t_0},y\in \mathsf {Y}\)
hence \(\mathbbm {1}_{A_m}((f,s), y) \le \varphi ^{M,S}(f,s)+\psi (y)\) and thus also
Given \(\varepsilon >0\), we can find lsc functions \(\varphi ^\varepsilon \ge \varphi \) and \(\psi ^\varepsilon \ge \psi \) such that \({\mathbb {W}}(\varphi ^\varepsilon )-\varepsilon /2< {\mathbb {W}}(\varphi )=\lim {\mathbb {W}}({{\tilde{\varphi }}_n})\) and \(\nu (\psi ^\varepsilon )-\varepsilon /2 < \nu (\psi )=\lim \nu ({\tilde{\psi }}_n).\) It follows that
Let us turn to the third property. Trivially, \(D_{t_0}(K) \le 1\), so take a compact set \(K\subseteq S\times \mathsf {Y}\) and fix \(\varepsilon >0.\) By Corollary 5.12 there is \((\varphi ,\psi )\in \mathsf {DC}_{t_0}(K)\) such that
As \((\varphi ,\psi )\in \mathsf {DC}_{t_0}(K)\) we have \(K\subseteq \{((f,s),y):\varphi ^{M,S}(f,s)+\psi (y) \ge 1\}.\) At the additional cost of \(2\varepsilon \) we can find two lsc functions \(\varphi ^\varepsilon :=(\varphi +\varepsilon ) \wedge 1 \ge \varphi \) and \(\psi ^\varepsilon := (\psi +\varepsilon ) \wedge 1 \ge \psi \) such that \({\mathbb {W}}(\varphi ^\varepsilon ) + \nu (\psi ^\varepsilon )\le {\mathbb {W}}(\varphi ) + \nu (\psi )+2\varepsilon \) and \(K\subseteq \{((f,s),y):(\varphi ^\varepsilon )^{M,S}(f,s)+\psi ^\varepsilon (y)>1\}\). By lower semicontinuity, \(U:=\{((f,s),y):(\varphi ^\varepsilon )^{M,S}(f,s)+\psi ^\varepsilon (y)>1\}\) is open. Hence, for every \(\varepsilon >0\) we have found an open \(U\supseteq K\) such that \(D_{t_0}(U)\le D_{t_0}(K) + 3\varepsilon \), proving the last claim. \(\square \)
The next step is to show that up to a factor of 2 we can restrict ourselves to dual functions \(\varphi \) and \(\psi \) which are indicator functions. The simple reason is that if \(1\le a + b\) then \(a> 1/2\) or \(b\ge 1/2\). In the formulation of the next lemma and subsequently we use the notation
Lemma 5.14
Let \(K\subseteq S_{t_0} \times \mathsf {Y}\) be Borel. Then
\(\text {where }{\mathsf {Cov}}(K)=\{F \subseteq S \text{ open } , A\subseteq \mathsf {Y}: K\subseteq (F \times \mathsf {Y}) \cup (S\times A)\}.\)
Proof
We may assume \(D_{t_0}(K)<1/2\), otherwise simply take \(A=\mathsf {Y}, F=\emptyset \).
Take \((\varphi ,\psi )\in \mathsf {DC}_{t_0}(K)\). As the cost function is \(\{0,1\}\)-valued, the dual constraint
implies that
Recalling that \(0\le \psi \le 1\) we set \(A=\{\psi \ge 1/2\}\) and note that \(\nu (A)/2\le \nu (\psi ) .\)
Let us turn our attention to the set \(F=\{(f,s):\varphi ^{M,S}(f,s)> 1/2 \}\). As \(D_{t_0}(K)<1/2\), we may assume that \(\varphi ^{M,S}(0,0)< {1}/{2}\). Given \(\varepsilon >0\) we apply the optional section theorem to \(r^{-1}(F)\cap ({C_0(\mathbb {R}_+)}\times [0,t_0))\) to obtain a stopping time \(\tau \) such that \({\mathbb {W}}(\tau <t_0)>{\mathbb {W}}({{\mathrm{deb}}}_{t_0}(F))-\varepsilon \) and \(\varphi ^M_\tau >1/2\) on \(\{\tau < t_0\}\). By optional stopping
As \(\varepsilon >0\) was arbitrary, \( {\mathbb {W}}({{\mathrm{deb}}}_{t_0}(F))+ \nu (A) \le 2 (\mathbb {E}[\varphi _0^M]+ \nu (\psi ))\), establishing (5.19). \(\square \)
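The two estimates combined at the end of this proof can be sketched as follows. This is only a sketch under the assumption (as in the proof of Lemma 5.13) that \(\varphi \) takes values in [0, 1], so that \(\varphi ^M\) is a bounded non-negative martingale and optional stopping applies at the bounded time \(\tau \wedge t_0\); the first line uses only \(0\le \psi \le 1\) and \(A=\{\psi \ge 1/2\}\):
$$\begin{aligned} \nu (\psi )&\ge \int _A \psi \, d\nu \ge \tfrac{1}{2}\, \nu (A),\\ \mathbb {E}[\varphi ^M_0]&= \mathbb {E}[\varphi ^M_{\tau \wedge t_0}] \ge \mathbb {E}\big [\varphi ^M_{\tau }\mathbbm {1}_{\{\tau<t_0\}}\big ] \ge \tfrac{1}{2}\, {\mathbb {W}}(\tau <t_0) > \tfrac{1}{2}\big ({\mathbb {W}}({{\mathrm{deb}}}_{t_0}(F))-\varepsilon \big ). \end{aligned}$$
Doubling and adding the two lines gives \({\mathbb {W}}({{\mathrm{deb}}}_{t_0}(F))+\nu (A)\le 2(\mathbb {E}[\varphi ^M_0]+\nu (\psi ))+\varepsilon \), which is the bound used before letting \(\varepsilon \) tend to 0.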
Proof of Proposition 5.9
Assume first that \(E\subseteq S_{t_0}\times \mathsf {Y}\). We have \(\sup _{\pi \in \mathsf {JOIN}^1( \nu )} \pi (K)=0\) for all compact \(K\subseteq E\). By Corollary 5.12, this implies that \(D_{t_0}(K)=0\) for all compact \(K\subseteq E\). By Choquet’s capacitability theorem [35, Theorem 30.13] and Lemma 5.13 this in turn implies \(D_{t_0}(E)=0\).
Hence, by Lemma 5.14, for each \(\varepsilon >0\) there exist \(F \subseteq S\) and a set \(N\subseteq \mathsf {Y}\) such that \(E \subseteq (F \times \mathsf {Y})\cup (S \times N)\) and \({\mathbb {W}}({{\mathrm{deb}}}_{t_0}(F))+\nu (N)\le 2\varepsilon .\)
For each k, pick some set \(F_k\subseteq S\) and a set \(N_k\subseteq \mathsf {Y}\) such that \(E \subseteq \left( F_k\times \mathsf {Y}\right) \cup \left( S \times N_k\right) \) and \({\mathbb {W}}({{\mathrm{deb}}}_{t_0}(F_k)) + \nu (N_k) \le 2^{-k}\). Setting \( F=\limsup _k F_k\) and \( N= \limsup _k N_k\) we get \({\mathbb {W}}({{\mathrm{deb}}}_{t_0}( F))=0,\nu (N)=0\) and
To establish the result in the case of general \(E\subseteq S\times \mathsf {Y}\), for each \(n\in \mathbb {N}\) pick sets \(N_n\subseteq \mathsf {Y}, \nu (N_n)=0,\) \(F_n\subseteq S,{\mathbb {W}}({{\mathrm{deb}}}_{n}(F_n))=0\) such that \(E\cap (S_n\times \mathsf {Y})\subseteq (F_n\times \mathsf {Y}) \cup (S\times N_n).\) Then \(N:= \bigcup _{n\ge 1} N_n\) and \(F:=\bigcup _{n\ge 1} F_n\) are as required. \(\square \)
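The limsup step in the first part of this proof is a Borel–Cantelli-type estimate. A brief sketch, assuming the bounds \({\mathbb {W}}({{\mathrm{deb}}}_{t_0}(F_k))+\nu (N_k)\le 2^{-k}\) and assuming that \({{\mathrm{deb}}}_{t_0}\) is monotone and maps countable unions to countable unions (as it does if it denotes the projection of \(r^{-1}(\cdot )\), restricted to times up to \(t_0\), onto path space): for every m,
$$\begin{aligned} \nu (N)&\le \nu \Big (\bigcup _{k\ge m} N_k\Big ) \le \sum _{k\ge m} 2^{-k} = 2^{1-m},\\ {\mathbb {W}}({{\mathrm{deb}}}_{t_0}(F))&\le \sum _{k\ge m} {\mathbb {W}}({{\mathrm{deb}}}_{t_0}(F_k)) \le 2^{1-m}. \end{aligned}$$
Letting \(m\rightarrow \infty \) shows that both quantities vanish. For the covering property, if \(((f,s),y)\in E\) and \(y\notin N\), then \(y\notin N_k\) for all sufficiently large k, hence \((f,s)\in F_k\) for all sufficiently large k and therefore \((f,s)\in F\).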
A secondary minimization result
In certain cases, in order to resolve possible non-uniqueness of a minimizer, it will be useful to identify particular solutions as the solution not only to a primary optimization problem, but also as the unique optimizer within this class of a second minimization problem. To this end, we begin by making the following definition: Supposing that \(\gamma , {\tilde{\gamma }}:S \rightarrow \mathbb {R}\) are Borel measurable, we write \(\mathsf {Opt}_\gamma \) for the set of optimizers of (4.1). If \(\mathsf {Opt}_\gamma \ne \emptyset \), we consider the secondary optimization problem
We will say that (5.20) is well posed if the primary optimization problem (4.1) is well posed and \(\int {\tilde{\gamma }}\, d\xi \) exists with values in \((-\infty ,\infty ]\) for all \(\xi \in \mathsf {Opt}_\gamma \) and is finite for one such \(\xi \). Observe that, when \(P_\gamma \) is finite and the map \(\pi \mapsto \int \gamma \ d\pi \) is lsc, the set \(\mathsf {Opt}_\gamma \) is a closed subset of \(\mathsf {RST}(\mu )\), and hence also compact.
We need an extended version of the stop-go pairs introduced in Definition 5.3.
Definition 5.15
Let \(\gamma ,{\tilde{\gamma }}:S\rightarrow \mathbb {R}\) be Borel measurable. The set of secondary stop-go pairs \(\mathsf {SG}_{2}^{\xi }\) (relative to \(\xi \)) consists of all \(\big ( (f,s), (g,t)\big )\in S\times S,f(s)= g(t)\) such that either \(((f,s), (g,t)) \in \mathsf {SG}^\xi \), or
As before, we also say that (5.21) holds if any of the integrals in the second equation are not defined, or the integral on the left-hand side equals \(\infty \).
We also define secondary stop-go pairs in the wide sense by \({{\widehat{\mathsf {SG}}}_{2}^{\xi }}=\mathsf {SG}_2^\xi \cup \{(f,s)\in S: A^\xi (f,s)=1\}\times S\).
Then we have the following generalization of Theorem 5.7.
Theorem 5.16
Let \(\gamma , {\tilde{\gamma }}\) be Borel measurable functions on S. Suppose that \(\mathsf {Opt}_\gamma \ne \emptyset \), and that the optimization problem (5.20) is well posed with optimizer \(\xi \in \mathsf {Opt}_\gamma \). Then there exists a Borel set \(\Gamma \subseteq S\) such that \(r(\xi )(\Gamma ) = 1\) and
The proof given for Theorem 5.7 also applies in the present situation. Hence, the result follows immediately from the following straightforward variant of Proposition 5.8.
Proposition 5.17
Assume that \(\gamma , \tilde{\gamma }: S \rightarrow \mathbb {R}\) are measurable, the optimization problem (5.20) is well posed, and that \(\xi \in \mathsf {RST}(\mu )\) is an optimizer. Then \((r \otimes {{\mathrm{Id}}})(\pi )(\mathsf {SG}_2^\xi )=0\) for any \(\pi \in \mathsf {JOIN}^1(r(\xi ))\).
Proof
As \(\xi \in \mathsf {Opt}_\gamma \) we have to show that \((r \otimes {{\mathrm{Id}}})(\pi )(\mathsf {SG}_2^\xi {\setminus }\mathsf {SG}^\xi )=0\); this follows by the same construction as in the proof of Proposition 5.8. \(\square \)
Embeddings in abundance
In the following we suppose that \((\Omega ,\mathcal {G},(\mathcal {G}_t)_{t\ge 0},\mathbb {P})\) is a stochastic basis which is sufficiently rich to support a Brownian motion B and a uniformly distributed \(\mathcal {G}_0\)-measurable random variable. We suppose that \(\gamma : S\rightarrow \mathbb {R}\) is a Borel measurable function. In a slight abuse of notation we will also write \((\gamma _t)_{t\in \mathbb {R}_+}\) for the process given by
In the previous section we have considered a secondary optimization problem and a version of the monotonicity principle (Theorem 5.16) accounting for this extension. We now give a brief summary in probabilistic terms.
Write \(\mathsf {Opt}_\gamma \) for the set of \(\mathcal {G}\)-stopping times on \(\Omega \) which are optimizers of (OptSEP) and consider another Borel function \({\tilde{\gamma }}:S\rightarrow \mathbb {R}\). We call \({{\hat{\tau }}}\in \mathsf {Opt}_\gamma \) a secondary minimizer if it solves
As in (5.20) we say that \((\hbox {OptSEP}_2)\) is well posed if the primary optimization problem (OptSEP) is well posed and \(\mathbb {E}\left[ {\tilde{\gamma }}_\tau \right] \) exists with values in \((-\infty ,\infty ]\) for all \(\tau \in \mathsf {Opt}_\gamma \) and is finite for one such \(\tau \). Then we have the following version of Theorems 1.1 and 4.1:
Theorem 6.1
Let \(\gamma , \tilde{\gamma }:S\rightarrow \mathbb {R}\) be lsc and bounded from below in the sense of (4.2). Then \(\mathrm{({OptSEP}_2)}\) admits a minimizer \({\hat{\tau }}\).
We now provide the appropriate generalizations of Definitions 1.4 and 1.5 and Theorem 1.3 for this case.
Definition 6.2
The pair \(\big ((f,s), (g,t)\big )\in S\times S\) constitutes a secondary stop-go pair, written \(\big ((f,s), (g,t)\big )\in \mathsf {SG}_2\), iff \(f(s)=g(t)\), and for every \((\mathcal {F}^B_t)_{t \ge 0}\)-stopping time \(\sigma \) which satisfies \(0< \mathbb {E}[\sigma ] < \infty \),
whenever both sides are well defined, and the left-hand side is finite; and if
then
whenever both sides are well defined and the left-hand side (of (6.3)) is finite.
Definition 6.3
We say that \(\Gamma \) is \({\tilde{\gamma }}|\gamma \)-monotone if
From Theorem 5.16 together with a trivial modification of Lemma 5.4 we then obtain:
Theorem 6.4
(Monotonicity Principle II) Let \(\gamma , {\tilde{\gamma }}:S\rightarrow \mathbb {R}\) be Borel measurable, suppose that \(\mathrm{({OptSEP}_2)}\) is well posed and that \({\hat{\tau }}\) is an optimizer. Then there exists a \({\tilde{\gamma }}|\gamma \)-monotone Borel set \(\Gamma \subseteq S\) such that \(\mathbb {P}\)-a.s.
Recovering classical embeddings
In this section we derive a number of classical embeddings as well as establish new embeddings. Figure 4 shows graphical representations of some of these constructions. We highlight the common feature of all these pictures: when plotted in an appropriate phase space, the stopping time is the hitting time of a barrier type set. Identifying the appropriate phase space, and determining the exact structure of the barrier will be the key step in deriving the solutions to (SEP) in this section.
For subsequent use, it will be helpful to write, for \((f,s) \in S\), \(\bar{f} = \sup _{r \le s} f(r),\underline{f} = \inf _{r \le s} f(r)\) and \(f^* = \sup _{r \le s} |f(r)|\).
Theorem 6.5
(The Azéma–Yor embedding, cf. [4]) There exists a stopping time \(\tau _{AY}\) which maximizes
over all solutions to (SEP) and which is of the form \(\tau _{AY} = \inf \{ t > 0: B_t \le \psi (\sup _{s \le t} B_s)\}\) a.s., for some increasing function \(\psi \).
Proof
Fix a bounded and strictly increasing continuous function \(\varphi :\mathbb {R}_+\rightarrow \mathbb {R}\) and consider the continuous functions \(\gamma ((f,s)) = -\bar{f}\) and \({\tilde{\gamma }}((f,s)) = \varphi (\bar{f})(f(s))^2\). Then \((\hbox {OptSEP}_2)\) is well posed and by Theorem 6.1 there exists a minimizer \(\tau _{AY}\). By Theorem 6.4, pick a \({\tilde{\gamma }}|\gamma \)-monotone set \(\Gamma \subseteq S\) supporting \(\tau _{AY}\). We claim that
This is represented graphically in Fig. 5.
Indeed, pick \(((f,s),(g,t))\in S\times S\) with \(f(s)=g(t)\) and \( \bar{g}< \bar{f}\) and a stopping time \(\sigma \) with positive and finite expectation. Then (6.1) amounts to
with a strict inequality unless \({\bar{g}} \ge g(t) + {\bar{B}}_\sigma \) a.s. However in that case (6.2) is trivially satisfied and (6.3) amounts to
which holds since \(g(t)=f(s)\). Summing up, \(((f,s),(g,t))\in \mathsf {SG}\subseteq {\mathsf {SG}_2}\) in the former case and \(((f,s),(g,t))\in {\mathsf {SG}_2}\) in the latter case, proving (6.6).
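The comparison behind (6.6) rests on an elementary monotonicity, which can be sketched as follows. The sketch assumes that the primary cost is \(\gamma ((f,s)) = -\bar{f}\) and that (6.1) has the usual stop-go form (stopping at (f, s) and continuing from (g, t), compared with the reverse); the symbols \(c = f(s) = g(t)\) and \(M = \bar{B}_\sigma \ge 0\) are notation introduced here for illustration. Continuing a path whose current maximum is x increases the running maximum by
$$\begin{aligned} \max (x, c+M) - x = (c+M-x)^+, \end{aligned}$$
which is non-increasing in x. Hence, for \(\bar{g} < \bar{f}\),
$$\begin{aligned} \mathbb {E}\big [(c+M-\bar{f})^+\big ] \le \mathbb {E}\big [(c+M-\bar{g})^+\big ], \end{aligned}$$
and the inequality is strict as soon as \(c+M>\bar{g}\) with positive probability, that is, unless \(\bar{g} \ge g(t)+\bar{B}_\sigma \) a.s.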
In complete analogy with the derivation of the Root embedding (Theorem 2.1) we define
and write \(\tau _\textsc {cl}, \tau _\textsc {op}\) for the first times the process \((\bar{B}_t(\omega ),{B}_t(\omega ))\) hits the sets \({\mathcal {R}}_\textsc {cl}\) and \({\mathcal {R}}_\textsc {op}\) respectively. Then we claim \(\tau _\textsc {cl}\le \tau _{AY} \le \tau _\textsc {op}\) a.s. Note that \(\tau _\textsc {cl}\le \tau _{AY}\) holds by definition of \(\tau _\textsc {cl}.\) To show \(\tau _{AY} \le \tau _\textsc {op}\), consider \(\omega \) satisfying \(((B_s(\omega ))_{s\le \tau _{AY}(\omega )},\tau _{AY}(\omega ))\in \Gamma \) and assume for contradiction that \(\tau _\textsc {op}(\omega )<\tau _{AY}(\omega ).\) Then there exists \(s\in [\tau _\textsc {op}(\omega ),\tau _{AY}(\omega ))\) such that \(f:=(B_r(\omega ))_{r\le s}\) satisfies \(({\bar{f}}, f(s))\in {\mathcal {R}}_\textsc {op}\). Since \(s< \tau _{AY}(\omega )\) we have \((f,s)\in \Gamma ^<\). By definition of \({\mathcal {R}}_\textsc {op}\), there exists \((g,t)\in \Gamma \) such that \(f(s)= g(t)\) and \({\bar{g}} < {\bar{f}}\), yielding a contradiction.
Finally, we define
It follows from the definition of \({\mathcal {R}}_\textsc {cl}\) that \(\psi _0(m)\) is increasing, and we define the right-continuous function \(\psi _+(m) = \psi _0(m+)\), and the left-continuous function \(\psi _-(m) = \psi _0(m-)\). It follows from the definitions of \(\tau _{\textsc {op}}\) and \(\tau _{\textsc {cl}}\) that:
It is then easily checked that \(\tau _- = \tau _+\) a.s., and the result follows on taking \(\psi = \psi _+\). \(\square \)
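The barrier form \(B_t \le \psi (\sup _{s\le t} B_s)\) can be illustrated in a discrete toy setting. The sketch below is not the construction of this proof: it replaces Brownian motion by a simple symmetric random walk, takes \(\psi \) from the classical barycentre description of the Azéma–Yor embedding (an assumption imported from the literature, not derived in this text), and uses a hypothetical target law `MU` chosen so that all barycentre values are integers. The helper names `barycentre`, `psi` and `azema_yor_law` are likewise introduced only for illustration.

```python
from fractions import Fraction

# Hypothetical centred target law on the integers (not taken from the text),
# chosen so that every barycentre value below is an integer.
MU = {-1: Fraction(1, 2), 0: Fraction(1, 4), 2: Fraction(1, 4)}


def barycentre(mu, x):
    """Return E[X | X >= x] for a discrete law mu = {atom: probability}."""
    mass = sum(p for v, p in mu.items() if v >= x)
    return sum(v * p for v, p in mu.items() if v >= x) / mass


def psi(mu, m):
    """Largest atom whose barycentre does not exceed the running maximum m."""
    return max(v for v in mu if barycentre(mu, v) <= m)


def azema_yor_law(mu, max_steps=100):
    """Exact law of the walk stopped once position <= psi(running maximum)."""
    live = {(0, 0): Fraction(1)}  # (position, running maximum) -> probability
    law = {}
    for _ in range(max_steps):
        if not live:
            break
        nxt = {}
        for (pos, top), p in live.items():
            if pos <= psi(mu, top):  # barrier reached: stop this path
                law[pos] = law.get(pos, Fraction(0)) + p
                continue
            for step in (-1, 1):  # otherwise take one more step
                new_pos = pos + step
                key = (new_pos, max(top, new_pos))
                nxt[key] = nxt.get(key, Fraction(0)) + p / 2
        live = nxt
    return law  # any mass still alive after max_steps is dropped


if __name__ == "__main__":
    print(azema_yor_law(MU) == MU)
```

For this particular `MU` the walk is absorbed after at most two steps, and the computed law coincides with `MU`; for general targets a lattice walk need not realise the rule exactly, which is why the example is restricted to a hand-picked measure.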
Theorem 6.6
(The Jacka Embedding, cf. [33]) Let \(\varphi :\mathbb {R}_+\rightarrow \mathbb {R}\) be a bounded, strictly increasing right-continuous function. There exists a stopping time \(\tau _{J}\) which maximizes
over all solutions to (SEP), and which is of the form
a.s., for some functions \(\alpha _+, \alpha _-\), where \(\alpha _+\) is increasing, \(\alpha _-\) is decreasing, and \(\alpha _+(y) \ge \alpha _-(y)\) for all \(y > y_0,\alpha _-(y) = \alpha _+(y) = \infty \) for \(y < y_0\), some \(y_0 \ge 0\).
Proof
The proof runs along similar lines to the proof of Theorem 6.5, when we take \(\gamma ((f,s)) = \varphi (f^*)\) and set \(\tilde{\gamma }((f,s)) = \tilde{\varphi }(f^*)(f(s))^2\) for some bounded and strictly increasing, continuous function \(\tilde{\varphi }\). Then the statement follows once we see
define
and then take \( \alpha _-(m) = \inf \{ x : (m,x) \in {\mathcal {R}}_\textsc {cl}\} \text { and } \alpha _+(m) = \sup \{x: (m,x) \in {\mathcal {R}}_\textsc {cl}\}. \) \(\square \)
Remark 6.7
We observe that both the results hold for one-dimensional Brownian motion with an arbitrary starting distribution \(\lambda \) satisfying the usual convex ordering condition.
Theorem 6.8
(The Perkins Embedding, cf. [45]) Suppose \(\mu (\{0\}) = 0\). Let \(\varphi :\mathbb {R}_+^2\rightarrow \mathbb {R}\) be a bounded function which is continuous and strictly increasing in both arguments. There exists a stopping time \(\tau _{P}\) which minimizes
over all solutions to (SEP) and which is of the form \(\tau _{P} = \inf \{ t > 0: B_t \not \in (\alpha _+(\bar{B}_t), \alpha _-(\underline{B}_t))\}\), for some decreasing functions \(\alpha _+\) and \(\alpha _-\) which are left- and right-continuous respectively.
Proof
Fix a bounded and strictly increasing continuous function \({\tilde{\varphi }}:\mathbb {R}^2_+\rightarrow \mathbb {R}\) and consider the continuous functions \(\gamma ((f,s)) = \varphi (\bar{f},-\underline{f})\) and \({\tilde{\gamma }}((f,s)) = (f(s))^2 {\tilde{\varphi }}(\bar{f},-\underline{f})\). Then \((\hbox {OptSEP}_2)\) is well posed and by Theorem 6.1 there exists a minimizer \(\tau _{P}\). By Theorem 6.4, pick a \({\tilde{\gamma }}|\gamma \)-monotone set \(\Gamma \subseteq S\) supporting \(\tau _{P}\). Note that we may assume that \(\Gamma \) only contains points such that \(\underline{g}< 0 < \bar{g}\), since \(\mu (\{0\}) = 0\).
By a similar argument to that given in the proof of Theorem 6.5 we can show
where \((\bar{f}, -\underline{f})< (\bar{g}, -\underline{g})\) iff \((\bar{f}, -\underline{f})\le (\bar{g}, -\underline{g})\) but not \((\bar{f}, -\underline{f})= (\bar{g}, -\underline{g})\), and \((\bar{f}, -\underline{f})\le (\bar{g}, -\underline{g})\) refers to the partial order of \(\mathbb {R}^2\).
In addition, consider a path \((g,t) \in S\) such that \(\underline{g}< g(t) < \bar{g}\). Then there exists \((f,s) \in S\) such that \(f(r) = g(r)\) for \(r \le s\), and such that \(f(s) = g(t)\), and exactly one of \(\bar{f} = \bar{g}\), or \(\underline{f} = \underline{g}\). This is true since there must exist a last time that \(g(r) = x\) before setting the most recent extremum. In particular, \(((f,s),(g,t)) \in {\mathsf {SG}_2}\). It follows that \(\Gamma \cap \{(g,t): \underline{g}< g(t) < \bar{g}\} = \emptyset \), that is, any stopped path must stop at a minimum or a maximum.
Now consider the sets:
and their respective hitting times by \((\bar{B}_t,\underline{B}_t)_{t\ge 0}\), denoted \(\tau _\textsc {cl}, \tau _\textsc {op}\). Since \(\Gamma \cap \{(g,t): \underline{g}< g(t) < \bar{g}\} = \emptyset \), it follows that \(\tau _\textsc {cl}\le \tau _P\) a.s. In addition, an essentially identical argument to that used in the proof of Theorem 6.5 gives \(\tau _{P} \le \tau _\textsc {op}\) a.s.
We now set \(\alpha _+(m) = \sup \{x < 0 : (m,x) \in \underline{\mathcal {R}}_\textsc {cl}\}, \alpha _-(i) = \inf \{x > 0 : (x,i) \in \bar{\mathcal {R}}_\textsc {cl}\}.\) Then these functions are both clearly decreasing and left- and right-continuous respectively, by definition of the respective sets \(\underline{\mathcal {R}}_\textsc {cl}, \bar{\mathcal {R}}_\textsc {cl}\). Moreover, it is immediate that
and we deduce that \(\tau _\textsc {cl}= \tau _\textsc {op}\) a.s. by standard properties of Brownian motion. The conclusion follows. \(\square \)
Theorem 6.9
(Maximizing the range) Let \(\varphi :\mathbb {R}_+^2\rightarrow \mathbb {R}\) be a bounded function which is continuous and strictly increasing in both arguments. There exists a stopping time \(\tau _{xr}\) which maximizes
over all solutions to (SEP), and which is of the form \(\tau _{xr} = \inf \{ t > 0: B_t \ge \alpha _-(\bar{B}_t,\underline{B}_t)\ \text {or}\) \(\ B_t \le \alpha _+(\bar{B}_t,\underline{B}_t)\}\) for some right-continuous functions \(\alpha _-(m,i)\) decreasing in both coordinates and \(\alpha _+(m,i)\) increasing in both coordinates.
Proof
Our primary objective will be to minimize \(\gamma ((f,s)) = -\varphi (\bar{f},-\underline{f})\), which is an lsc function on S. We again introduce a secondary minimization problem: specifically, we consider the function \({\tilde{\gamma }}((f,s)) = (f(s))^2 {\tilde{\varphi }}(\bar{f},-\underline{f})\) for some bounded, continuous and strictly increasing function \({\tilde{\varphi }}:\mathbb {R}_+^2\rightarrow \mathbb {R}\). Then \((\hbox {OptSEP}_2)\) is well posed and by Theorem 6.1 there exists a minimizer \(\tau _{xr}\). By Theorem 6.4, pick a \({\tilde{\gamma }}|\gamma \)-monotone set \(\Gamma \subseteq S\) supporting \(\tau _{xr}.\)
By a similar argument to that given in the proof of Theorem 6.5 we can show \({\mathsf {SG}_2}\supseteq \{((f,s),(g,t))\in S\times S: f(s)=g(t), (\bar{f}, -\underline{f})> (\bar{g}, -\underline{g})\}\).
Let \({{\mathrm{conv}}}\) denote the convex hull, and write
Then \(I_{\textsc {cl}}, I_{\textsc {op}}\) are both increasing in both coordinates, and \(I_{\textsc {cl}} \supseteq I_{\textsc {op}}\). Write \(\tau _{\textsc {op}} := \inf \{t \ge 0: B_t \in I_{\textsc {op}}(\bar{B}_t,\underline{B}_t)\}\), and \(\tau _{\textsc {cl}} := \inf \{t \ge 0: B_t \in I_{\textsc {cl}}(\bar{B}_t,\underline{B}_t)\}\). As previously, we deduce that \(\tau _{\textsc {cl}} \le \tau _{xr} \le \tau _{\textsc {op}}\). If, in addition, we define
then \(\alpha _+, \alpha _-\) satisfy the conditions of the theorem, and
To conclude, we need to show that \(\tau _{\textsc {op}} = \tau _{\textsc {cl}}\). However, we first observe that \(\tau _{\textsc {op}} \ge \sigma \), and \(\tau _{\textsc {cl}} \ge \sigma _{\textsc {cl}}\), where
and in fact, \(\sigma = \sigma _{\textsc {cl}}\) a.s. In addition, on \(\{\sigma >0\}\) we have \(B_{\sigma } \in \{\bar{B}_\sigma , \underline{B}_\sigma \}\). On the set \(\{B_\sigma = \bar{B}_\sigma \}\) say, then
by the same argument as used at the end of the proof of Theorem 6.5, and the fact that \(\alpha _{+}(m+,i) = \alpha _{+,\textsc {cl}}(m,i)\), by the definition of the sets \(I_{\textsc {cl}}, I_{\textsc {op}}\). \(\square \)
Remark 6.10
We observe that, in the case of Theorem 6.9, the characterization provided would not appear to be sufficient to identify the functions \(\alpha _+, \alpha _-\) given the measure \(\mu \). This is in contrast to the constructions of Azéma–Yor, Perkins and Jacka, where knowledge of the form of the embedding is sufficient to identify the corresponding stopping rule.
On a more abstract level, uniqueness of barrier type embeddings in a two dimensional phase space can be seen as a consequence of Loynes’ argument [39]. More precisely, let \(A_t\) be some continuous process and suppose that \(\tau _1\) and \( \tau _2\) denote the times when \((A_t, B_t)\) hits a closed barrier type set \(R_1\) resp. \(R_2\). If \(\mathbb {E}[ \tau _1], \mathbb {E}[\tau _2] < \infty \) and both stopping times embed the same measure, the argument presented in Remark 2.3 shows that \(\tau _1=\tau _2\).
Remark 6.11
In Cox and Obłój [12], embeddings are constructed which maximize certain double-exit probabilities: for example, to maximize the probability that both \(\bar{B}_\tau \ge \bar{b}\) and \(\underline{B}_\tau \le \underline{b}\), for given levels \(\bar{b}\) and \(\underline{b}\). In this case, the embedding is no longer naturally viewed as a barrier type construction; instead, it is natural to characterize the embedding in terms of where the paths with different crossing behaviour for the barriers finish (for example, the paths which only hit the upper level may end up above a certain value, or between two other values). However, it is possible, again using a suitable secondary maximization problem, to show that there exists an optimizer demonstrating the behaviour characterizing the Cox–Obłój embeddings. (Specifically, if we write \(H_{b}((f,s)) = \inf \{t\le s: f(t) = b\},\underline{H} = H_{\underline{b}} \wedge H_{\bar{b}}\) and \(\bar{H} = H_{\underline{b}}\vee H_{\bar{b}}\) then the secondary maximization problem
is sufficient to rederive the form of these embeddings).
The Vallois embedding and optimizing functions of local time
In this section we shall determine the stopping rule which solves
where \(\mathfrak {L}\) denotes the local time of Brownian motion at 0 and h is a convex or concave function. In many ways, the proof of this result will follow the arguments used in the previous section; however, in contrast to the functions considered there, \(h(\mathfrak {L})\) is not defined on S in a straightforward way and hence we need to apply some care in fixing our notions. Moreover, local time does not have an S-continuous modification and hence some additional argument is needed to establish that (6.7) admits a minimizer.
We say that a \(\mathcal {G}\)-adapted process \(\mathfrak {L}^x\) is a local time in x if it is a (right-continuous, increasing) compensator of \(|B-x|\) and we suppress x in the case of local time at 0. This determines \(\mathfrak {L}^x\) up to indistinguishability (and clearly the choice of \(\mathfrak {L}^x\) is irrelevant for (6.7)).
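For orientation, the compensator property can be read off from Tanaka's formula. The display below is a standard fact rather than part of the present argument, and it assumes the normalisation of local time in which no factor 1/2 appears (other conventions occur in the literature): for a Brownian motion B started in 0 and \(x\in \mathbb {R}\),
$$\begin{aligned} |B_t-x| = |x| + \int _0^t {{\mathrm{sgn}}}(B_s-x)\, dB_s + \mathfrak {L}^x_t, \end{aligned}$$
so that \(|B_t-x|-\mathfrak {L}^x_t\) is a martingale and \(\mathfrak {L}^x\) is the increasing process compensating \(|B-x|\).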
For us it is convenient to allow local time to assume the value \(+\infty \) on an evanescent set. Using this convention, Theorem 4.1 implies that there exists a Borel function \(L^x:S\rightarrow [0,\infty ]\) such that \(L^x\circ r \) is a (right-continuous, increasing) \(\mathcal {F}^0\)-predictable local time on Wiener space. We will call such a process \(L^x\) a raw local time in x. We note that the value \(+\infty \) cannot be avoided here, see [42].
Lemma 6.12
Let L be a raw local time in 0. Then there exists a Borel set \(A\subseteq {C_0(\mathbb {R}_+)},{\mathbb {W}}(A)=1\) such that for all
we have \(L(f,s)<\infty \) and
is a raw local time in \(f(s)\).
Proof
Write V for the set of all (f, s) such that \(L^{(f,s)}\) is not a raw local time. To understand whether \((f,s)\in V\) we need to check whether or not \( (\omega , t)\mapsto |B_{s+t}(f\oplus \omega )|-L^{(f,s)}(r(\omega ,t)) \) defines a martingale. Since this is a Borel property, \(V\subseteq S\) is Borel. Hence
is analytic and thus universally measurable. To prove that \({\mathbb {W}}({{\mathrm{deb}}}(V))=0\) it is sufficient to show this for any given Borel subset of \({{\mathrm{deb}}}(V)\). Suppose for contradiction that \({\mathbb {W}}(E)>0\) for some Borel set \(E\subseteq {{\mathrm{deb}}}(V)\). By the optional section theorem this implies that there exists an \(\mathcal {F}^a\)-stopping time \(\tau \) such that \({\mathbb {W}}(\tau <\infty )>0\) and \((\omega , \tau (\omega ))\in r^{-1}(V)\) whenever \(\tau (\omega )<\infty \). Upon requiring this only a.s. we may of course assume that \(\tau \) is an \(\mathcal {F}^0\)-stopping time.
Given \(H=G\mathbbm {1}_{[[\tau ,\infty [[}\) for bounded \(\mathcal {F}^0_\tau \)-measurable G it follows from usual properties of local time that
is a martingale. As G was arbitrary,
defines a martingale for almost all \(\omega ',\tau (\omega ')<\infty \), contradicting \({\mathbb {W}}({{\mathrm{deb}}}(V))>0\).
It follows that \({\mathbb {W}}({{\mathrm{deb}}}(V))=0\), hence we may pick a Borel set \(A\subseteq {{\mathrm{deb}}}(V)^c\) with \({\mathbb {W}}(A)=1\) such that (6.8) holds. \(\square \)
Our next goal is to verify that (6.7) admits an optimizer.
Proposition 6.13
Let \(h:[0,\infty ) \rightarrow \mathbb {R}\) be continuous and bounded. Then there exists an optimizer for (6.7). Moreover, if \({\tilde{\gamma }}(f,s) = e^{-L(f,s)}f^2(s)\) or \({\tilde{\gamma }}(f,s) = e^{L(f,s)}f^2(s)\), the secondary minimization problem \(\mathrm{({OptSEP}_2)}\) also admits a solution.
Proof
Let L be a raw local time. We first observe that \((\mathfrak {L}_t)_{t \ge 0}:= (L\circ r((B_t)_{t \ge 0},t))_{t \ge 0}\) is (indistinguishable from) the local time of \((B_t)_{t \ge 0}\) on \((\Omega , \mathcal {G}, (\mathcal {G}_t)_{t\ge 0}, \mathbb {P})\). By Lemma 3.11 there exists a sequence \(\xi _1, \xi _2, \xi _3, \ldots \in \mathsf {RST}(\mu )\) such that
Possibly passing to a subsequence there is \(\xi \in \mathsf {RST}(\mu )\) such that \(\xi = \lim _n \xi _n\).
A result of Jacod and Memin ([34, Corollary 2.9]) asserts
for any bounded measurable function \(\varphi :{C_0(\mathbb {R}_+)}\times \mathbb {R}_+\rightarrow \mathbb {R}\) such that \(t\mapsto \varphi (\omega ,t)\) is continuous for every \(\omega \in {C_0(\mathbb {R}_+)}\). It follows that \(\int h(\mathfrak {L}_t(\omega )) \,\xi (d\omega ,dt) = V^*\). Moreover (again by Lemma 3.11) there exists a \(\mathcal {G}\)-stopping time \(\tau ^*\) such that \(\mathbb {E}[h(\mathfrak {L}_{\tau ^*})] = \int h(\mathfrak {L}_t(\omega )) \,\xi (d\omega ,dt)\).
The second assertion follows from a similar reasoning, using an approximation argument to handle the unboundedness of \({\tilde{\gamma }}\). \(\square \)
Note added in revision. Guo et al. [27] were able to relax the continuity assumption in our existence and duality results Theorems 1.1 and 1.2. Based on the work of Jacod and Memin [34] they establish these results under the assumption that \(t\mapsto \gamma \circ r(\omega , t)\) is lsc for every \(\omega \in {C_0(\mathbb {R}_+)}\). In particular their results would imply a more general version of Proposition 6.13.
We are now able to show:
Theorem 6.14
Let \(h:[0,\infty ] \rightarrow \mathbb {R}\) be a bounded, strictly concave function and \(\mathfrak {L}\) the local time of B at 0.

(1)
There exists a stopping time \(\tau _{V}\) which maximizes
$$\begin{aligned} \mathbb {E}\left[ h\left( \mathfrak {L}_\tau \right) \right] \end{aligned}$$ over the set of all solutions to (SEP), and which is of the form
$$\begin{aligned}\tau _{V} = \inf \left\{ t > 0: B_t \notin \left( \alpha _-\left( \mathfrak {L}_t\right) ,\alpha _+\left( \mathfrak {L}_t\right) \right) \right\} \text { a.s.,} \end{aligned}$$ for some decreasing function \(\alpha _+\ge 0\) and increasing function \(\alpha _-\le 0\).

(2)
There exists a stopping time \(\tau _{V+}\) which minimizes
$$\begin{aligned} \mathbb {E}\left[ h\left( \mathfrak {L}_\tau \right) \right] \end{aligned}$$
over the set of all solutions to (SEP), and which is of the form
$$\begin{aligned} \tau _{V+} = Z \wedge \inf \left\{ t > 0: B_t \notin \left( \alpha _-\left( \mathfrak {L}_t\right) ,\alpha _+\left( \mathfrak {L}_t\right) \right) \right\} , \text { a.s.} \end{aligned}$$
for some increasing function \(\alpha _+\ge 0\), some decreasing function \(\alpha _-\le 0\), and a \(\{0,\infty \}\)-valued \(\mathcal {G}_0\)-measurable random variable Z.
Proof
We consider the second case, under the additional assumption that \(0<\mu (\{0\}) <1\), the other cases being slightly simpler. As above, we let L be a raw local time and observe that \((\mathfrak {L}_t)_{t \ge 0}:= (L\circ r((B_t)_{t \ge 0},t))_{t \ge 0}\) is (indistinguishable from) the local time of \((B_t)_{t \ge 0}\) on \((\Omega , \mathcal {G}, (\mathcal {G}_t)_{t\ge 0}, \mathbb {P})\).
Applying Proposition 6.13 and Theorem 6.4 to the optimizations corresponding to \(\gamma (\omega ,t) = h(L\circ r(\omega ,t))\) and \({\tilde{\gamma }}(\omega ,t) = e^{L\circ r(\omega ,t)}\omega ^2_t\) we obtain a minimizer \(\tau _{V+}\) and a \({\tilde{\gamma }}|{\gamma }\)-monotone set \(\Gamma \subseteq S\) supporting \(\tau _{V+}\).
Recall the set \(A \subseteq {C_0(\mathbb {R}_+)}\) given by Lemma 6.12. By projection the set
is universally measurable and since \(\tau _{V+}\) is a finite stopping time, \(\mathbb {P}( ((B_t)_{t\le \tau _{V+}},\tau _{V+}) \in U)=1\). Passing to an appropriate subset if necessary, we may also assume that U is Borel. We may therefore assume \(\Gamma \subseteq U\), and it then also follows that \(\Gamma ^{<} \subseteq U\).
By a similar argument to the previous cases we can show that
where Lemma 6.12 guarantees that the local time of paths is well-behaved following a path-swapping operation. In particular, since both f and g belong to U, it follows that (6.8) holds, and (6.9) is a direct consequence of this.
Define the sets
and the corresponding stopping times
Strictly speaking, the random times on the right-hand side only define stopping times in the augmented filtration (by the Début Theorem); however, by Theorem 3.1, this is sufficient to find almost surely equal \(\mathcal {G}\)-stopping times.
Since \((\Gamma ^{<} \times \Gamma ) \cap \mathsf {SG}_2 = \emptyset \) and \((0,0) \in \Gamma ^{<}\) (\(\Gamma \) contains a nontrivial element since \(\mu (\{0\}) < 1\)) then \((l,0) \not \in \Gamma \) for any \(l \ge 0\). It follows that \(\mathbb {P}(\tau _{V+} = 0) = \mu (\{0\})\).
We now consider \(\tau _{V+}\) on \(\{\tau _{V+} > 0 \}\). Note that \(\{\tau>0\}= \{ \mathfrak {L}_\tau >0\}\) a.s., for any stopping time \(\tau \) and hence in particular \(\{\tau _{V+}> 0 \}= \{ \mathfrak {L}_{\tau _{V+}} >0\}\) a.s. Then on \(\{\tau _{V+} >0\}\), \(\tau _\textsc {cl}^* \le \tau _{V+} \le \tau _\textsc {op}^*\) a.s., and hence \(\mathbb {P}(\tau _{V+} \le \tau _{\textsc {op}}^*) = 1\). Define \(\alpha _+(l) = \inf \{ x>0: (l,x) \in {\mathcal {R}}_\textsc {op}\}\) and \(\alpha _-(l) = \sup \{ x<0: (l,x) \in {\mathcal {R}}_\textsc {op}\}\).
If either of \(\alpha _-(\eta ) = 0\) or \(\alpha _+(\eta ) = 0\) for some \(\eta >0\), then \(\tau _{\textsc {op}}^* = 0\) a.s. Since \(\tau _{V+} \le \tau _\textsc {op}^*\) and \(\mathbb {P}(\tau _{V+}>0) >0\) we must therefore have \(\alpha _+(\eta )>0, \alpha _-(\eta )<0\) for \(\eta >0\). In addition, \(\alpha _+(l)\) is clearly right-continuous and increasing, so it must have at most countably many discontinuities, and similarly for \(\alpha _-(l)\). We can write
and observe that (by standard properties of Brownian motion) the stopping times on the left and right are almost surely equal (since there are at most countably many discontinuities, and \(\alpha _+(l)\) and \(\alpha _-(l)\) are bounded away from zero on \([\eta ,\infty )\) for \(\eta >0\)). It follows that \(\tau _{V+} = \inf \{t: \mathfrak {L}_t >0, B_t \not \in (\alpha _-(\mathfrak {L}_t),\alpha _+(\mathfrak {L}_t))\} \) on \(\{\tau _{V+} > 0\}\), and we deduce that \(\tau _{V+}\) is zero with probability \(\mu (\{0\})\), and, conditional on being greater than zero, \( \tau _{V+} =\inf \{ t > 0: B_t \notin (\alpha _-(\mathfrak {L}_t),\alpha _+(\mathfrak {L}_t))\}\) a.s. \(\square \)
Remark 6.15
The arguments above extend from local time at 0 to a general continuous additive functional A. Recalling that \(\mathfrak {L}^x\) denotes local time in x, A can be represented in the form \(A_t:=\int _0^t \mathfrak {L}_s^x\, dm_A(x)\). Let f be a convex function such that \(f'' = m_A\) in the sense of distributions. If \(\int f\, d\mu < \infty \) then the above proof is easily adapted to the more general situation.
In this manner, we deduce the existence of optimal solutions to (SEP) for functions depending on A. By analogy with Theorem 6.14 this can be used to generate (inverse/cave) barrier type embeddings of various kinds. Other generalizations and variants may be considered in a similar manner. We leave specific examples as an exercise for the reader.
Root and Rost embeddings in higher dimensions
In this section we consider the Root and Rost constructions of Sects. 2.1 and 2.2 in the case of d-dimensional Brownian motion with general initial distribution, for \(d\ge 2\). In \(\mathbb {R}^d\), since Brownian motion is transient, it is no longer straightforward to assert the existence of an embedding. In general, [49] gives necessary and sufficient conditions for the existence of an embedding, without the additional condition that \(\mathbb {E}[\tau ] < \infty \). In the Brownian case, Rost’s conditions for \(d \ge 3\) can be written as follows. There exists a stopping time \(\tau \) such that \(B_0 \sim \lambda \) and \(B_\tau \sim \mu \) if and only if for all \(y \in \mathbb {R}^d\)
However, it is not clear that such a stopping time will satisfy the condition
As a result, it is not straightforward to give simple criteria for the existence of a solution in \(\mathsf {RST}(\mu )\).
In the case \(d=2\) it follows from Falkner’s results [22] that the Skorokhod problem admits a solution (i.e. \(\mathsf {RST}(\mu )\ne \emptyset \)) if (6.10) is satisfied for \(u(x,y)=-\ln |x-y|\) and then (6.11) applies.
In either case, assuming that we do have a solution satisfying (6.11), then the existence result as well as the monotonicity principle carry over to the present setup (with identical proofs) and we are able to state the following:
Theorem 6.16
Suppose \(\mathsf {RST}(\mu )\) is nonempty. If h is a strictly convex function and \(\hat{\tau } \in \mathsf {RST}(\mu )\) minimizes \(\mathbb {E}[h(\tau )]\) over \(\tau \in \mathsf {RST}(\mu )\) then there exists a barrier \({\mathcal {R}}\) such that \(\hat{\tau } = \inf \{ t > 0 : (B_t,t) \in {\mathcal {R}}\}\) on \(\{{\hat{\tau }} >0\}\) a.s.
The proof of this result is much the same as that of Theorem 2.1, except we no longer show that \(\tau _\textsc {cl}= \tau _\textsc {op}\). In higher dimensions with general initial laws, it is easy to construct examples where there are common atoms of \(\lambda \) and \(\mu \), but where the size of the atom in \(\lambda \) is strictly larger than the atom of \(\mu \). By the transience of the process, it is clear that the optimal (indeed, only) behaviour is to stop mass starting at such a point immediately with a probability strictly between 0 and 1, however the stopping times \(\tau _\textsc {cl}\) and \(\tau _\textsc {op}\) will always stop either all the mass, or none of this mass respectively. For this reason, we do not say anything about the behaviour of \({\hat{\tau }}\) when \({\hat{\tau }} = 0\). Trivially, the above result tells us that the solution of the optimal embedding problem is given by a barrier if there exists a set D such that \(\lambda (D) = 1 = \mu (D^c)\).
Proof of Theorem 6.16
The first part of the proof proceeds similarly to the proof of Theorem 2.1. In particular, the set of stop-go pairs is given by
and we define the sets \({\mathcal {R}}_\textsc {cl}, {\mathcal {R}}_\textsc {op}\) and the stopping times \(\tau _\textsc {cl}, \tau _\textsc {op}\) as above. We then fix \(\delta >0\), and consider the set \(\{{\hat{\tau }}\ge \delta \}\). Given \(\eta \ge 0\), we define \(B^{\eta }_t = B_{t+\eta }\), for \(t \ge \eta \) and set
Then \(\tau _\textsc {cl}^{\eta ,\delta } \ge \delta \), and for any \(\varepsilon >0\), there exists \(\eta >0\) sufficiently small that \(d_{TV}(B^{\eta }_{\delta },B_{\delta }) < \varepsilon ,\) where \(d_{TV}\) denotes the total variation distance. By the Strong Markov property of Brownian motion, it follows that \( d_{TV}(B^{\eta }_{\tau _\textsc {cl}^{\eta ,\delta }}, B_{\tau _\textsc {cl}^{0,\delta }}) < \varepsilon \). In particular, the law of \(B^{\eta }_{\tau _\textsc {cl}^{\eta ,\delta }}\) converges weakly to the law of \(B_{\tau _\textsc {cl}^{0,\delta }}\) as \(\eta \rightarrow 0\). Thus
so \(\tau _\textsc {cl}^{\eta ,\delta } \ge \tau _{R}^{0,\delta }\), and moreover, \(\tau _\textsc {cl}^{\eta ,\delta } \rightarrow \tau _\textsc {op}^{0,\delta }\) a.s. as \(\eta \rightarrow 0\). Hence, \(B^{\eta }_{\tau _\textsc {cl}^{\eta ,\delta }} \rightarrow B_{\tau _\textsc {op}^{0,\delta }}\) in probability, as \(\eta \rightarrow 0\), so we have weak convergence of the law of \(B^{\eta }_{\tau _\textsc {cl}^{\eta ,\delta }}\) to the law of \(B_{\tau _\textsc {op}^{0,\delta }}\), and hence \({B_{\tau _\textsc {op}^{0,\delta }} \sim B_{\tau _\textsc {cl}^{0,\delta }}}\). We now observe that, by an essentially identical argument to that in the proof of Theorem 2.1, we must have \(\tau _\textsc {cl}^{0,\delta } \le \hat{\tau } \le \tau _\textsc {op}^{0,\delta }\) on \(\{\hat{\tau } \ge \delta \}\). However, in the argument above, we know that \(\tau _\textsc {cl}^{0,\delta } \le {\hat{\tau }} \le \tau _\textsc {op}^{0,\delta }\), and \(\tau _\textsc {cl}^{\eta ,\delta } \rightarrow _{\mathcal {D}} \tau _\textsc {cl}^{0,\delta }\) and \(\tau _\textsc {cl}^{\eta ,\delta } \rightarrow _{\mathcal {D}} \tau _\textsc {op}^{0,\delta }\) as \(\eta \rightarrow 0\) (where \({\mathcal {D}}\) denotes convergence in distribution). It follows that \(\tau _\textsc {cl}^{0,\delta } =_{\mathcal {D}} \tau _\textsc {op}^{0,\delta }\) and hence \(\tau _\textsc {cl}^{0,\delta } = \tau _\textsc {op}^{0,\delta }\) a.s. In particular, \(B_{\tau _\textsc {cl}^{0,\delta }} = B_{\tau _\textsc {op}^{0,\delta }} = B_{{\hat{\tau }}}\) on \(\{{\hat{\tau }} \ge \delta \}\). Letting \(\delta \rightarrow 0\) we observe that \(\tau _\textsc {op}^{0,\delta } \rightarrow \tau _\textsc {op}\), and hence the required result holds on taking \({\mathcal {R}}={\mathcal {R}}_\textsc {op}\). \(\square \)
We now consider the generalization of the Rost embedding. Recall that \((\min (\lambda , \mu ))(A) := \inf _{B \subseteq A} \left( \lambda (B)+ \mu (A{\setminus } B)\right) \) defines a measure.
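For finitely supported measures, the infimum defining \(\min (\lambda , \mu )\) can be evaluated by brute force and compared with the atom-wise minimum of the two masses. The following is a minimal sketch under that assumption; the function name `min_measure` and the example masses are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def min_measure(lam, mu, A):
    """Evaluate inf over subsets B of A of lam(B) + mu(A minus B).

    lam and mu are dicts mapping atoms to masses (finitely supported
    measures, an assumption of this sketch); A is a list of atoms.
    """
    A = list(A)
    best = float("inf")
    for k in range(len(A) + 1):
        for B in combinations(A, k):
            B = set(B)
            value = sum(lam.get(x, 0.0) for x in B)
            value += sum(mu.get(x, 0.0) for x in A if x not in B)
            best = min(best, value)
    return best

# Hypothetical example: lambda on {0, 1}, mu on {0, 1, 2}.
lam = {0: 0.5, 1: 0.5}
mu = {0: 0.25, 1: 0.25, 2: 0.5}
A = [0, 1, 2]

brute = min_measure(lam, mu, A)
# For discrete measures the infimum is attained by choosing, atom by atom,
# the smaller of the two masses.
pointwise = sum(min(lam.get(x, 0.0), mu.get(x, 0.0)) for x in A)
```

In this example both atoms 0 and 1 are common atoms where \(\lambda \) carries strictly more mass than \(\mu \), which is the situation in which an optimal stopping time must stop only part of the mass at time 0.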
Theorem 6.17
Suppose \(\lambda , \mu \) are measures in \(\mathbb {R}^d\) and \(\hat{\tau } \in \mathsf {RST}(\mu )\) maximizes \(\mathbb {E}[h(\tau )]\) over all stopping times in \(\mathsf {RST}(\mu )\), for a convex function \(h: \mathbb {R}_+ \rightarrow \mathbb {R}\), with \(\mathbb {E}[h(\tau )]<\infty \). Then \(\mathbb {P}(\hat{\tau }=0, B_0 \in A) = (\min (\lambda , \mu ))(A)\), for \(A \in {\mathcal {B}}(\mathbb {R})\), and on \(\{\hat{\tau }>0\},\hat{\tau }\) is the first hitting time of an inverse barrier.
Proof
We follow the proof of Theorem 2.4 to recover the set of stop-go pairs given by
and the sets \({\mathcal {R}}_\textsc {op}\) and \({\mathcal {R}}_\textsc {cl}\), and their corresponding hitting times \(\tau _\textsc {op}, \tau _\textsc {cl}\). For \(0 \le \eta \le \delta \), we define in addition the stopping times
where \(B_t^\eta = B_{t-\eta }\), for \(t \ge \eta \).
It follows from an identical argument to that in the proof of Theorem 2.4 that \(\tau _\textsc {cl}^{0,\delta } \le \hat{\tau } \le \tau _\textsc {op}^{0,\delta }\) on \(\{\hat{\tau } \ge \delta \}\). However, by similar arguments to those used above, we deduce that \(\tau _\textsc {op}^{0,\delta }\) and \(\tau _\textsc {cl}^{0,\delta }\) have the same law on \(\{\hat{\tau } \ge \delta \}\), and hence that \(\hat{\tau } = \tau _\textsc {op}^{0,\delta }\) on this set, and then by taking \(\delta \rightarrow 0\), we get \(\hat{\tau } = \tau _\textsc {op}\) on \(\{\hat{\tau }>0\}\).
To see the final claim, we note that trivially \(\mathbb {P}(\hat{\tau }=0, B_0 \in A) \le (\min (\lambda , \mu ))(A)\). If there is strict inequality, then there exist some paths in \(\Gamma \) which start at \(x \in A\), and paths in \(\Gamma \) which stop at x at strictly positive time, constituting a stop-go pair and therefore violating the monotonicity principle. \(\square \)
Remark 6.18
We observe that the arguments of Remark 2.3 can be applied again in this context. However, one needs to be a little more careful, since it is necessary to take the fine closure of the barriers with respect to the fine topology for the processes \((t,B_t)_{t\ge 0}\). With this modification in place, the argument of Loynes can be easily adapted to show that the (finely closed versions) of the barriers in Theorems 6.16 and 6.17 are unique in the sense of Remark 2.3.
An optimal Skorokhod embedding problem which admits only randomized solutions
By analogy with optimal transport, we might interpret a ‘natural stopping time’ (i.e. a stopping time with respect to the Brownian filtration) which solves (OptSEP) as a Monge-type solution, whereas stopping times which depend on additional randomization are of Kantorovich-type. With the exception of the Rost solution, all optimal stopping times encountered in the previous section are natural stopping times, and in the Rost case external randomization is only needed at time 0. One might ask whether the optimal Skorokhod embedding problem always admits a solution \(\tau \) which is natural on \(\{\tau >0\}\). We sketch an example showing that this is not the case:
Example 6.19
There exist an absolutely continuous probability \(\mu \) and a continuous adapted process \(\gamma _t=\gamma ((B_s)_{s\le t})\) with values in [0, 1] such that (OptSEP) admits only randomized solutions.
Proof
Define the stopping time \(\sigma := \inf \{t \ge 0: B_t^2 + t^2 \ge 1\}\), the first time the Brownian path leaves the right half of the unit disc. Write \((C(0,\sigma ), {\mathbb {W}}_\sigma )\) for the space of continuous functions up to time \(\sigma \), equipped with the corresponding projection of Wiener measure. Pick an isomorphism
of standard Borel probability spaces. Using some extra randomization (independent of \(\mathcal {F}^B\)) we define a stopping time \(\tau \) such that

(1)
\(\tau =\sigma \) with probability 1 / 2,

(2)
otherwise \(\tau \) stops the first time the Brownian path reaches the level \(\pm l((B_s)_{s\le \sigma })\).
We then define \(\mu := \mathrm {Law}(B_\tau )\) and pick \(\gamma \) to be a function which equals 0 on paths which are stopped by \(\tau \) and is strictly positive otherwise; clearly we can do this in such a way that \( \gamma \) has continuous paths.
Write \({\hat{\tau }}\) for the randomized stopping time in \(\mathsf {RST}(\mu )\) corresponding to \( \tau \). It is then straightforward to see that \({\hat{\tau }} \) is the unique solution of (OptSEP). Thus, the optimal Skorokhod embedding problem admits no (non-randomized) solution in the natural filtration of B. \(\square \)
In optimal transport it is a difficult and interesting problem to understand under which conditions transport problems admit solutions of Monge-type. An interesting subject for future research would be to understand when Monge-type solutions exist for the optimal Skorokhod embedding problem.
Skorokhod embedding for Feller processes
In this section we discuss the extension of our results to the embedding problem for a continuous Feller process Z, with values in \(\mathbb {R}^d\) and \(Z_0\sim \lambda \). Throughout we suppose that Z is defined on a stochastic basis \((\Omega ,\mathcal {G},(\mathcal {G}_t)_{t\ge 0},\mathbb {P})\) which is sufficiently rich to support a uniformly distributed \(\mathcal {G}_0\)-measurable random variable independent of Z. Given a probability \(\mu \in {\mathcal {P}}(\mathbb {R}^d)\) the analogue of (SEP) is to construct a stopping time \(\tau \) such that
Recall from (2.1) that a stopping time \(\tau \) is minimal iff for any stopping time \(\tau '\) such that \(Z_{\tau '}\sim Z_{\tau }\) then \(\tau '\le \tau \) implies \(\tau '=\tau \) a.s. If Z is a one dimensional Brownian motion and \(\mu \) has second moment, minimality of \(\tau \) is equivalent to \(\mathbb {E}[\tau ]<\infty .\) Working in higher dimensions with general starting law we redefine
Given a function \(\gamma :S\rightarrow \mathbb {R}\) the optimal Skorokhod embedding problem for Z is to construct a stopping time optimizing
(As above, the value of \(P_\gamma ^Z\) does not depend on the underlying stochastic basis provided it supports a uniformly distributed random variable independent of Z.)
Most of the arguments required to establish our main results are abstract and carry over to the present setup. In fact, only the parts building on the condition \(\mathbb {E}[\tau ]<\infty \) need to be adjusted to account for the more general condition of \(\tau \) being minimal. Therefore, to establish Theorems 1.1, 1.2, and 1.3 in the general Feller setup, we need the crucial Assumption 7.1 below which we verify in a number of natural examples in Sect. 7.2.
Assumption 7.1
From now on we assume that \((\hbox {SEP}^\mathrm{Z})\) admits a solution and either

(1)
that there exist continuous functions \(h:\mathbb {R}^n\rightarrow \mathbb {R}\) and \(\zeta :S\rightarrow \mathbb {R}\) such that:

\(\zeta _t:=\zeta ((Z_s)_{s\le t},t)\) is strictly increasing, \(\zeta _0=0,\lim _{t\rightarrow \infty } \zeta _t=\infty \), \(\mathbb {P}\)-a.s. and

\(X_t := h(Z_t)-\zeta _t\) is a martingale and \((X^\tau _{ t})_{t\ge 0}\) is uniformly integrable for all \(\tau \) solving \((\hbox {SEP}^\mathrm{Z})\), or


(2)
that whenever \(\tau \) is a finite stopping time satisfying \(Z_\tau \sim \mu \) then \(\tau \) is minimal and there is an increasing function \(G:\mathbb {R}_+\rightarrow \mathbb {R},\lim _{t\rightarrow \infty }G(t) =\infty \) which satisfies
$$\begin{aligned} \sup \{\mathbb {E}[G(\tau )]: \tau \text{ solves } (\hbox {SEP}^\mathrm{Z})\} =: V <\infty . \end{aligned}$$(7.1)
The existence of a function G such that (7.1) holds is equivalent to
In fact, it is straightforward to see that we would arrive at an equivalent condition when replacing the deterministic function G by a stochastic process \((\zeta _t)_{t\ge 0}\) as in Case (1).
Note also that in Case (1) of Assumption 7.1, \(\tau \) with \(Z_\tau \sim \mu \) is minimal if and only if
Under Assumption 7.1, our main results extend to continuous Feller processes:
Theorem 7.2
If \(\gamma :S\rightarrow \mathbb {R}\) is lsc and bounded from below, \(\mathrm{({OptSEP}^{Z})}\) admits a minimizer.
Theorem 7.3
Let \(\gamma : S \rightarrow \mathbb {R}\) be lsc and bounded from below. Then we have the duality relation \(P^Z_\gamma = D^Z_\gamma \) for \(D^Z_\gamma :=\sup \int \psi (y) \, d\mu (y)\), where the supremum is taken over all continuous \(\psi \in L^1(\mu )\) such that there exists a continuous bounded martingale M with \(\mathbb {E}[M_0] = 0\) and a decreasing process A with \(\mathbb {E}[A_\tau ]\ge 0\) for all solutions \(\tau \) of \((\hbox {SEP}^\mathrm{Z})\) and almost surely for all \(t\ge 0\)
Moreover, in Case (1) of Assumption 7.1, the process A may be assumed to be zero at the expense of assuming that \((M_{\tau \wedge t})_{t\ge 0}\) is only uniformly integrable for all \(\tau \) solving \((\hbox {SEP}^\mathrm{Z})\).
Theorem 7.4
Let \(\gamma :S\rightarrow \mathbb {R}\) be Borel measurable. If \(\mathrm{({OptSEP}^{Z})}\) is well posed and \(\tau \) is an optimizer, there exists a \(\gamma \)-monotone Borel set \(\Gamma \subseteq S\) such that \(\mathbb {P}\)-a.s.
Remark 7.5

(1)
Of course, the analogues of the secondary optimization results, Theorems 6.1 (on existence of a minimizer) and 6.4 (monotonicity principle), carry over to the present setup with the obvious changes.

(2)
The continuity of \(\zeta \) on S which was imposed in Assumption 7.1 (1) is not required in Theorems 7.2 and 7.4.

(3)
The condition \(0< \mathbb {E}[\sigma ] < \infty \) in Definition 1.4 should be replaced by considering all stopping times with \(0< \mathbb {E}[\zeta ((B_s)_{s \le \sigma },\sigma )]<\infty \) in case (1) of Assumption 7.1, or \(0<\mathbb {E}[G(\tau )] < \infty \) in case (2). In addition, the expectation should be taken over the law of the Feller process started at \(f(s) = g(t)\).
Sketch of proofs
As in Sect. 3 we consider the canonical setup \((C(\mathbb {R}_+,\mathbb {R}^d),\mathcal {F}^0,\mathbb {Q})\) (where \(\mathbb {Q}\) denotes the law of the Feller process) and we write Y for the canonical process. It follows from continuity of Y (resp. Z) and the Feller property that the \(\mathcal {F}^a\)-optional and the \(\mathcal {F}^a\)-predictable \(\sigma \)-algebra on the canonical space agree; similarly Proposition 3.5 on the definition of S-continuous martingales extends to the present context. We define \(\mathsf {RST},\mathsf {JOIN}\) and related notions as before with \(\mathbb {Q}\) replacing \({\mathbb {W}}\). We say that \(\xi \in \mathsf {RST}\) is a minimal embedding of \(\mu \) if the corresponding stopping time \(\rho \) (cf. (3.6)) on the enlarged probability space \((C(\mathbb {R}_+,\mathbb {R}^d)\times [0,1],{\bar{\mathbb {Q}}})\) constitutes a minimal embedding. (Representing randomized stopping times as in Theorem 3.8 (1), the stopping time \(\xi \) constitutes a minimal embedding iff there is no randomized stopping time \(\xi '\ne \xi \) embedding the same measure which satisfies \(A^{\xi '}\ge A^\xi \).) For \(\mu \in {\mathcal {P}}(\mathbb {R}^d)\) we define \(\mathsf {RST}(\mu )\) to be the set of all minimal randomized stopping times embedding the measure \(\mu \).
Recalling the argument from Theorem 3.14, we see that the existence of a function \(\zeta : S \rightarrow \mathbb {R}\) such that \(\zeta \circ r\) increases to \(\infty \) and \(\sup _{\xi \in \mathsf {RST}(\mu )} \xi (\zeta \circ r)<\infty \) implies that \(\mathsf {RST}(\mu )\) is compact. (Vice versa, if \(\mathsf {RST}(\mu )\) is compact then such a function exists and can be chosen so that \(\zeta \circ r\) is deterministic). Hence, by (7.3) resp. (7.1), \(\mathsf {RST}(\mu )\) is compact.
Proof of Theorem 7.2
The argument follows the proof of Theorem 4.1 line by line. \(\square \)
Proof of Theorem 7.3
We give the argument in the case \(\lambda =\delta _0\) for ease of exposition. Setting \(h= \zeta \circ r\) resp. \(h= G\circ T\) (and using identical arguments as previously) we obtain the following extension of Proposition 4.3:
For \(c:{C_0(\mathbb {R}_+)}\times \mathbb {R}_+\times \mathbb {R}\rightarrow \mathbb {R}\cup \{\infty \}\) lsc, predictable and bounded from below
where the infimum is taken over the set \(\mathsf {JOIN}^{1,V} (\mu )=\{\pi \in \mathsf {JOIN}^1(\mu ): \pi (h)\le V\}\) and the supremum is taken over \(\varphi \in C_b({C_0(\mathbb {R}_+)}),\psi \in C_b(\mathbb {R})\) for which
The argument used to derive Theorem 4.2 from Proposition 4.3 then implies the desired duality relation \(P^Z_\gamma =D^Z_\gamma \), with a decreasing process (in (7.4)) of the form \(A_t= -\alpha (h_t-V)\) for some \(\alpha \ge 0\). In Case (1) of Assumption 7.1, A can be ‘hidden’ in M / \(\psi \) as in (4.14).
Proof of Theorem 7.4
Apart from the abstract theory, the ingredients of the proof of Theorem 5.7 are Propositions 5.8 and 5.9. The only stage where the proof of Proposition 5.8 has to be altered is when establishing that the randomized stopping time \(\xi ^\pi \) is minimal. Under Assumption 7.1 (1) this follows using the minimality characterization given in (7.3); under Assumption 7.1 (2) this is of course trivial.
Proposition 5.9 only uses transport duality, the Feller property to construct S-continuous martingales, and Choquet’s capacitability theorem. \(\square \)
Examples
We now provide a list of examples in which Assumption 7.1 is satisfied and Theorems 7.2, 7.3, and 7.4 apply.
Let Z be a one-dimensional Brownian motion and assume that \(\lambda \) and \(\mu \) have first moments and are in convex order: then Assumption 7.1 (1) holds.
Proof
By the de la Vallée-Poussin theorem (see e.g. [16, Thm. II 22]) there exists a positive, smooth and symmetric function \(F:\mathbb {R}\rightarrow \mathbb {R}_+\) with strictly positive, bounded second derivative and \(\lim _{x\rightarrow \infty } F(x)/x=\infty \) such that \(V:=\int F(x)\, \mu (dx) <\infty .\) We set
and note that \(\zeta _t\) increases to \(\infty \) since \(\mathbb {Q}(\int _0^\infty \mathbbm {1}_{[-1,1]} (Z_t)\, dt= \infty )=1\) and \(F''\) is bounded away from 0 on \([-1,1]\). Using Itô’s formula and our conditions on F we define the martingale
In the present Brownian case, it is known that the minimality of a finite stopping time \(\tau \) is equivalent to \((Z_{\tau \wedge t})_{t\ge 0}\) being a uniformly integrable martingale. This follows (in the case of a general starting law) from Lemma 12 and Theorem 17 of [10].
If \(Z_\tau \sim \mu \) and \((Z^\tau _{ t})_{t\ge 0}\) is uniformly integrable, then for each t, the law of \(Z_{\tau \wedge t}\) is bounded by \(\mu \) in the convex order and in particular \(\mathbb {E}[F(Z_{\tau \wedge t})] \le V, t\ge 0\). Uniform integrability of X then follows upon noting
\(\square \)
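The convex-order hypothesis used in this example can be checked directly when \(\lambda \) and \(\mu \) are finitely supported. The sketch below relies on the standard fact that two measures with equal mass and equal mean are in convex order if and only if \(\int |y-x|\,\lambda (dy) \le \int |y-x|\,\mu (dy)\) for all x; for finitely supported measures both sides are piecewise linear in x and agree outside the supports, so it is enough to test x at the atoms. The function names and the example measures are hypothetical.

```python
def potential(measure, x):
    """Integral of |y - x| against a finitely supported measure (dict atom -> mass)."""
    return sum(mass * abs(y - x) for y, mass in measure.items())

def in_convex_order(lam, mu, tol=1e-12):
    """Return True if lam precedes mu in convex order (finitely supported case)."""
    mass_l, mass_m = sum(lam.values()), sum(mu.values())
    mean_l = sum(y * m for y, m in lam.items())
    mean_m = sum(y * m for y, m in mu.items())
    # Equal total mass and equal mean are necessary conditions.
    if abs(mass_l - mass_m) > tol or abs(mean_l - mean_m) > tol:
        return False
    # The difference of the two potentials is piecewise linear with kinks
    # only at atoms, so checking the atoms is sufficient.
    kinks = set(lam) | set(mu)
    return all(potential(lam, x) <= potential(mu, x) + tol for x in kinks)

# Hypothetical example: a point mass at 0 and a symmetric two-point law.
dirac0 = {0.0: 1.0}
two_point = {-1.0: 0.5, 1.0: 0.5}
forward = in_convex_order(dirac0, two_point)
backward = in_convex_order(two_point, dirac0)
```

Here `forward` corresponds to embedding the two-point law into Brownian motion started at 0 (for instance by the exit time of \((-1,1)\)), while `backward` corresponds to an embedding problem that has no solution among uniformly integrable stopping times.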
One-dimensional regular diffusions
Let Z be a regular (time-homogeneous) one-dimensional diffusion on an interval \(I \subseteq \mathbb {R}\), with inaccessible or absorbing endpoints (see [47] for the relevant definitions and terminology) and \(Z_0 \sim \lambda ,\lambda (I^\circ )=\mu (I^\circ )=1\). In particular, Z is a continuous Feller process [47, Proposition V.50.1]. Then (on a possibly enlarged probability space) there exists a scale function s and a continuous, strictly increasing time change \(A_t\) such that \(B_t = s(Z_{A_t})\) is a Brownian motion up to the exit of \(s(I^\circ )\). Recalling the discussion in [11, Section 5], with the obvious extension of our notation, it is clear that there exists a minimal stopping time \(\tau \) embedding \(\mu \) in Z if and only if there exists a stopping time \(\tau '\) embedding \(s(\mu )\) in B such that
Moreover, write \(A^{-1}_t\) for the inverse of \(A_t\), so \(A^{-1}_{A_t} = t\). Since A and \(A^{-1}\) are continuous and strictly increasing, \(\tau \) is a minimal embedding of \(\mu \) in Z if and only if \(\tau ':=A^{-1}_\tau \) is a minimal embedding of \(s(\mu )\) in B.
We now consider three cases. In the first two we verify Assumption 7.1 (2) and in the last case we verify Assumption 7.1 (1) under some additional smoothness assumptions. Subsequently we give some concrete examples.

(i)
Suppose \(s(I^\circ ) = (a,b)\) for \(a, b \in \mathbb {R}\). Then it follows from [10, Theorems 17 and 22] that a solution to \((\hbox {SEP}^\mathrm{Z})\) exists if and only if \(s(\lambda )\) precedes \(s(\mu )\) in convex order, and in fact, any finite \(\tau \) with \(Z_\tau \sim \mu \) is minimal. Moreover we note that

\(\{\tau ': B_{\tau '}\sim s(\mu ), \tau ' \text{ is } \text{ minimal }\} \) is bounded in probability.

\(A_t<\infty \) provided the path \((B_s)_{s\le t}\) stays inside an interval \([c,d]\subseteq (a,b)\).

Given \(\varepsilon >0\) there exists an interval \([c,d]\subseteq (a,b)\) such that \((B_s)_{s\le \tau '}\) stays inside [c, d] with probability \(> 1-\varepsilon \) for each minimal \(\tau '\) with \(B_{\tau '}\sim s(\mu )\).
It follows that \( \{A_{\tau '}: B_{\tau '}\sim s(\mu ), \tau ' \text{ is } \text{ minimal }\}\) is bounded in probability, hence (7.2) and then Assumption 7.1 (2) holds.


(ii)
Suppose \(s(I^\circ ) = (a,\infty )\) for \(a \in \mathbb {R}\), and that \(s(\lambda )\) and \(s(\mu )\) are in convex order and that the moments \(m_\lambda = \int s(y) \,\lambda (dy),m_\mu = \int s(y)\, \mu (dy)\) exist. Then it follows from Theorems 17 and 22 and the discussion at the top of p. 245 of [10] that a solution to \((\hbox {SEP}^\mathrm{Z})\) exists if and only if for all \(x \ge a\),
$$\begin{aligned} -\int |s(y)-x|\,\mu (dy) \le -\int |s(y)-x|\, \lambda (dy) + (m_\lambda -m_\mu ). \end{aligned}$$(7.7)
Again, any finite \(\tau \) with \(Z_\tau \sim \mu \) is minimal and (7.2) follows as above.
An analogous result holds if \(s(I^\circ ) = (-\infty ,b)\) for \(b \in \mathbb {R}\).

(iii)
Suppose \(s(I^\circ ) = (-\infty , \infty )\) and that \(s(\lambda ), s(\mu )\) are in convex order, \( \int s(y)^2 \,\mu (dy) < \infty \). Then we are in the classical case, and a stopping time \(\tau \) with \(Z_\tau \sim \mu \) is minimal if and only if \(\mathbb {E}[A^{-1}_\tau ] < \infty \). If the process Z is sufficiently well-behaved (as in the examples below) one can show that \(X_t = s(Z_t)^2 - A_t^{-1}\) is a martingale and that \(A^{-1}\) depends continuously on the path \((Z_s)_{s\le t}\). For all \(\tau \) solving \((\hbox {SEP}^\mathrm{Z})\), \(\mathbb {E}[A^{-1}_\tau ] < \infty \); hence \((X^\tau _t)_{t\ge 0}\) is uniformly integrable and Assumption 7.1 (1) is satisfied.
More generally, when only the integrals \(\int s(y) \,\lambda (dy),\int s(y) \,\mu (dy)\) are finite, (assuming sufficient regularity of the diffusion), Assumption 7.1 (1) follows as in Sect. 7.2.1.
Remark 7.6
Observe that none of the constructions described in Sects. 6.1 and 6.2 rely on fine properties of Brownian motion: the main properties used are the continuity of paths, the strong Markov property, and the regularity and diffusive nature of paths (that the process started at x immediately returns to x, and immediately enters the sets \((x,\infty )\) and \((-\infty ,x)\)). It follows that all the given constructions extend to the case of regular diffusions described above.
Example 7.7
(Brownian motion with drift) Let \(Z_t=B_t+at\) for some \(a<0\) with \(Z_0\sim \lambda \), and \(I=(-\infty ,\infty )\). Then a possible choice of the scale function is \(s(x) = \exp (-2 a x)\). Let \(\lambda , \mu \in {\mathcal {P}}(\mathbb {R})\) be such that \(s(\lambda ), s(\mu )\) are integrable and satisfy (7.7). Then Assumption 7.1 holds by (ii) above.
Example 7.8
(Geometric Brownian motion) Let Z be a geometric Brownian motion, given through the SDE \(dZ_t= Z_t\,dB_t, Z_0\sim \lambda .\) A possible choice of scale function is \(s(x)=x.\) Let \(\lambda ,\mu \in {\mathcal {P}}(0,\infty )\) be such that \(s(\lambda ),s(\mu )\) are integrable and satisfy the corresponding version of (7.7). Then Assumption 7.1 holds by (ii) above. (More general versions of geometric Brownian motion can be treated similarly.)
Example 7.9
(Three-dimensional Bessel process) Let \(Z=|B|\) for a three-dimensional Brownian motion \((B_t)_{t\ge 0}\) with \(Z_0\sim \lambda .\) A possible choice of scale function is \(s(x)=1-1/x,\) and \(s(I^\circ )=(-\infty , 1)\). Let \(\lambda ,\mu \in {\mathcal {P}}(0,\infty )\) be such that \(s(\lambda ),s(\mu )\) are integrable and satisfy the corresponding version of (7.7). Then Assumption 7.1 holds by (ii) above. Similar results hold for d-dimensional Bessel processes, with \(d > 2\).
Example 7.10
(Ornstein–Uhlenbeck process) Let Z be an Ornstein–Uhlenbeck process, given for example as the solution to the SDE \(dZ_t = -Z_t \, dt + dW_t, Z_0\sim \lambda \). Then \(Z_t\) is a regular diffusion on \(I=(-\infty ,\infty )\) with scale function given (up to constants) by \(s'(x) = \exp (x^2)\), and \(s(I^\circ ) = (-\infty ,\infty )\). Suppose \(\lambda , \mu \) are measures on \(\mathbb {R}\) such that \(s(\lambda ), s(\mu )\) are in convex order and \( \int s(y)^2 \,\mu (dy) < \infty \). Then \(A_t^{-1} = \int _0^t \exp \{2Z_s^2\} \, ds\) is continuous as a function of \((Z_s)_{s \le t}\), and hence Assumption 7.1 holds by (iii) above.
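The scale functions in the examples above can be sanity-checked numerically. The sketch below assumes the standard characterization that, for a diffusion \(dZ_t = b(Z_t)\,dt + dW_t\) with unit diffusion coefficient, a scale function satisfies \(\tfrac{1}{2}s'' + b\,s' = 0\). It takes \(s'(x) = -2a\exp (-2ax)\) for Brownian motion with drift a, \(s'(x) = 1/x^2\) with drift \(1/x\) for the three-dimensional Bessel process, and \(s'(x)=\exp (x^2)\) with drift \(-x\) for the Ornstein–Uhlenbeck process. The helper name, the drift value and the evaluation points are hypothetical.

```python
import math

def generator_residual(s_prime, drift, x, h=1e-5):
    """Approximate (1/2) s''(x) + drift(x) * s'(x).

    s'' is obtained from s' by a central finite difference, so the result
    is only zero up to discretization and rounding error.
    """
    s_second = (s_prime(x + h) - s_prime(x - h)) / (2.0 * h)
    return 0.5 * s_second + drift(x) * s_prime(x)

a = -0.7  # hypothetical drift parameter with a < 0

# Each entry: (derivative of the scale function, drift coefficient, test point).
cases = {
    "Brownian motion with drift": (
        lambda x: -2.0 * a * math.exp(-2.0 * a * x), lambda x: a, 0.3),
    "three-dimensional Bessel": (
        lambda x: 1.0 / x**2, lambda x: 1.0 / x, 1.2),
    "Ornstein-Uhlenbeck": (
        lambda x: math.exp(x**2), lambda x: -x, 0.8),
}

residuals = {name: generator_residual(sp, b, x)
             for name, (sp, b, x) in cases.items()}
```

Each residual should vanish up to numerical error, reflecting that \(s(Z)\) is a local martingale in each case. Geometric Brownian motion is omitted because it has no drift and a non-constant diffusion coefficient, so \(s(x)=x\) works trivially.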
The Hoeffding–Frechet coupling as a very particular Root solution
Let Z be the deterministic process given by \(dZ_t= dt\) started in \(Z_0 \sim \lambda \). Z is not a regular diffusion; however, Assumption 7.1 (2) is easily checked. Let \(\mu \) be another probability and assume for simplicity that \(\max {{\mathrm{supp}}}\,\lambda \le \min {{\mathrm{supp}}}\,\mu \). Then the Root solution minimizes \(\mathbb {E}[\tau ^2]\). But note also that since \(\tau = Z_\tau - Z_0\), this minimization problem corresponds precisely to finding the joint distribution \((Z_0, Z_\tau )\) which minimizes \(\mathbb {E}[(Z_\tau - Z_0)^2]\): the classical transport problem in the simplest setup. Specifically, the Root solution for the particular case of the process Z corresponds precisely to the monotone (Hoeffding–Frechet) coupling. In the same fashion the Rost solution corresponds to the comonotone coupling between \(\lambda \) and \(\mu \).
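The identification of the minimizer of \(\mathbb {E}[(Z_\tau - Z_0)^2]\) with the monotone coupling can be illustrated numerically in a toy discrete setting. The sketch below is not the Root barrier construction itself: it assumes \(\lambda \) and \(\mu \) are uniform on finitely many atoms (the same number for both), so that it suffices to search over permutation couplings, and all function names and sample values are hypothetical.

```python
from itertools import permutations


def monotone_coupling(starts, targets):
    """Pair the i-th smallest starting point with the i-th smallest target."""
    return list(zip(sorted(starts), sorted(targets)))


def second_moment_cost(pairs):
    """Empirical E[tau^2], using tau = Z_tau - Z_0 for the process dZ_t = dt."""
    return sum((y - x) ** 2 for x, y in pairs) / len(pairs)


def brute_force_min_cost(starts, targets):
    """Smallest cost over all permutation couplings (feasible for tiny inputs only)."""
    return min(
        second_moment_cost(list(zip(starts, perm)))
        for perm in permutations(targets)
    )


# Hypothetical atoms with max supp(lambda) <= min supp(mu), so every tau is >= 0.
starts = [-1.0, 0.0, 0.5]
targets = [1.0, 2.5, 4.0]

pairs = monotone_coupling(starts, targets)
monotone_cost = second_moment_cost(pairs)
best_cost = brute_force_min_cost(starts, targets)
```

On this example the monotone pairing attains the brute-force minimum. The exhaustive search grows factorially in the number of atoms and is meant only as a sanity check.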
References
Acciaio, B., Beiglböck, M., Penkner, F., Schachermayer, W., Temme, J.: A trajectorial interpretation of Doob’s martingale inequalities. Ann. Appl. Probab. 23(4), 1494–1505 (2013)
Adams, D.R., Hedberg, L.I.: Function Spaces and Potential Theory, volume 314 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer, Berlin (1996)
Ambrosio, L., Pratelli, A.: Existence and stability results in the \(L^1\) theory of optimal transportation. In: Caffarelli, L.A., Salsa, S. (eds.) Optimal Transportation and Applications (Martina Franca, 2001), volume 1813 of Lecture Notes in Math., pp. 123–160. Springer, Berlin (2003)
Azéma, J., Yor, M.: Une solution simple au problème de Skorokhod. In: Séminaire de Probabilités, XIII (Univ. Strasbourg, Strasbourg, 1977/78), volume 721 of Lecture Notes in Math., pp. 90–115. Springer, Berlin (1979)
Baxter, J.R., Chacon, R.V.: Compactness of stopping times. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 40(3), 169–181 (1977)
Beiglböck, M., Goldstern, M., Maresch, G., Schachermayer, W.: Optimal and better transport plans. J. Funct. Anal. 256(6), 1907–1927 (2009)
Beiglböck, M., Henry-Labordère, P., Penkner, F.: Model-independent bounds for option prices: a mass transport approach. Finance Stoch. 17(3), 477–501 (2013)
Bianchini, S., Caravenna, L.: On optimality of \(c\)-cyclically monotone transference plans. C. R. Math. Acad. Sci. Paris 348(11–12), 613–618 (2010)
Bouchard, B., Nutz, M.: Arbitrage and duality in nondominated discrete-time models. Ann. Appl. Probab. 25(2), 823–859 (2015)
Cox, A.M.G.: Extending Chacon–Walsh: minimality and generalised starting distributions. In: Donati-Martin, C., Émery, M., Rouault, A., Stricker, C. (eds.) Séminaire de probabilités XLI, volume 1934 of Lecture Notes in Math., pp. 233–264. Springer, Berlin (2008)
Cox, A.M.G., Hobson, D.: Skorokhod embeddings, minimality and non-centred target distributions. Probab. Theory Relat. Fields 135, 395–414 (2005)
Cox, A.M.G., Obłój, J.: Robust hedging of double touch barrier options. SIAM J. Financ. Math. 2, 141–182 (2011)
Cox, A.M.G., Wang, J.: Root’s barrier: construction, optimality and applications to variance options. Ann. Appl. Probab. 23(3), 859–894 (2013)
Cox, A.M.G., Peskir, G.: Embedding laws in diffusions by functions of time. Ann. Probab. 43(5), 2481–2510 (2015)
Delbaen, F., Schachermayer, W.: A general version of the fundamental theorem of asset pricing. Math. Ann. 300(3), 463–520 (1994)
Dellacherie, C., Meyer, P.A.: Probabilities and Potential, volume 29 of North-Holland Mathematics Studies. North-Holland Publishing Co., Amsterdam (1978)
Dellacherie, C., Meyer, P.A.: Probabilities and Potential. B, volume 72 of North-Holland Mathematics Studies. North-Holland Publishing Co., Amsterdam (1982). Theory of martingales, Translated from the French by J. P. Wilson
Doléans, C.: Existence du processus croissant naturel associé à un potentiel de la classe (d). Probab. Theory Relat. Fields 9(4), 309–314 (1968)
Dolinsky, Y., Soner, H.M.: Martingale optimal transport and robust hedging in continuous time. Probab. Theory Relat. Fields 160(1–2), 391–427 (2014)
Dolinsky, Y., Soner, H.M.: Martingale optimal transport in the Skorokhod space. Stoch. Process. Appl. 125(10), 3893–3931 (2015)
Dolinsky, Y., Soner, H.M.: Robust hedging with proportional transaction costs. Finance Stoch. 18(2), 327–347 (2014)
Falkner, N.: The distribution of Brownian motion in \({R}^{n}\) at a natural stopping time. Adv. Math. 40(2), 97–127 (1981)
Galichon, A., Henry-Labordère, P., Touzi, N.: A stochastic control approach to no-arbitrage bounds given marginals, with an application to lookback options. Ann. Appl. Probab. 24(1), 312–336 (2014)
Gangbo, W., McCann, R.: The geometry of optimal transportation. Acta Math. 177(2), 113–161 (1996)
Gassiat, P., Mijatović, A., Oberhauser, H.: An integral equation for Root’s barrier and the generation of Brownian increments. Ann. Appl. Probab. 25(4), 2039–2065 (2015)
Gassiat, P., Oberhauser, H., dos Reis, G.: Root’s barrier, viscosity solutions of obstacle problems and reflected FBSDEs. Stoch. Process. Appl. 125(12), 4601–4631 (2015)
Guo, G., Tan, X., Touzi, N.: Optimal Skorokhod embedding under finitely-many marginal constraints. ArXiv e-prints (2015)
Henry-Labordère, P., Obłój, J., Spoida, P., Touzi, N.: The maximum maximum of a martingale with given \(n\) marginals. Ann. Appl. Probab. 26(1), 1–44 (2016)
Hirsch, F., Profeta, C., Roynette, B., Yor, M.: Peacocks and associated martingales, with explicit constructions, volume 3 of Bocconi & Springer Series. Springer, Milan; Bocconi University Press, Milan (2011)
Hobson, D.: Robust hedging of the lookback option. Finance Stoch. 2, 329–347 (1998)
Hobson, D.: The Skorokhod embedding problem and model-independent bounds for option prices. In: Carmona, R., Çınlar, E., Ekeland, I., Jouini, E., Scheinkman, J.A., Touzi, N. (eds.) Paris-Princeton Lectures on Mathematical Finance 2010, volume 2003 of Lecture Notes in Math., pp. 267–318. Springer, Berlin (2011)
Hobson, D., Neuberger, A.: Robust bounds for forward start options. Math. Finance 22(1), 31–56 (2012)
Jacka, S.: Doob’s inequalities revisited: a maximal \(h^1\)-embedding. Stoch. Process. Appl. 29(2), 281–290 (1988)
Jacod, J., Mémin, J.: Sur un type de convergence intermédiaire entre la convergence en loi et la convergence en probabilité. In: Azéma, J., Yor, M. (eds.) Séminaire de Probabilités XV. 1979/80 (Univ. Strasbourg, Strasbourg, 1979/1980) (French), volume 850 of Lecture Notes in Math., pp. 529–546. Springer, Berlin-New York (1981)
Kechris, A.S.: Classical Descriptive Set Theory, volume 156 of Graduate Texts in Mathematics. Springer, New York (1995)
Kiefer, J.: Skorohod embedding of multivariate RV’s, and the sample DF. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 24(1), 1–35 (1972)
Knott, M., Smith, C.S.: On the optimal mapping of distributions. J. Optim. Theory Appl. 43(1), 39–49 (1984)
Last, G., Mörters, P., Thorisson, H.: Unbiased shifts of Brownian motion. Ann. Probab. 42(2), 431–463 (2014)
Loynes, R.M.: Stopping times on Brownian motion: some properties of Root’s construction. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 16, 211–218 (1970)
Meyer, P.A.: Convergence faible et compacité des temps d’arrêt, d’après Baxter et Chacon. In: Dellacherie, C., Meyer, P.A., Weil, M. (eds.) Séminaire de Probabilités XII, volume 649 of Lecture Notes in Math., pp. 411–423. Springer, Berlin (1978)
Monroe, I.: On embedding right continuous martingales in Brownian motion. Ann. Math. Stat. 43, 1293–1311 (1972)
Najnudel, J., Nikeghbali, A.: A new kind of augmentation of filtrations. ESAIM Probab. Stat. 15, S39–S57 (2011)
Obłój, J.: The Skorokhod embedding problem and its offspring. Probab. Surv. 1, 321–390 (2004)
Obłój, J., Spoida, P., Touzi, N.: Martingale inequalities for the maximum via pathwise arguments. In: Donati-Martin, C., Lejay, A., Rouault, A. (eds.) In Memoriam Marc Yor - Séminaire de Probabilités XLVII, pp. 227–247. Springer (2015)
Perkins, E.: The Cereteli-Davis solution to the \(H^1\)-embedding problem and an optimal embedding in Brownian motion. In: Çınlar, E., Chung, K.L., Getoor, R.K. (eds.) Seminar on Stochastic Processes, 1985, pp. 172–223. Birkhäuser, Boston (1986)
Revuz, D., Yor, M.: Continuous Martingales and Brownian Motion, volume 293 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 3rd edn. Springer, Berlin (1999)
Rogers, L.C.G., Williams, D.: Diffusions, Markov Processes and Martingales: Itô Calculus. Cambridge University Press, Cambridge (2000)
Root, D.H.: The existence of certain stopping times on Brownian motion. Ann. Math. Stat. 40, 715–718 (1969)
Rost, H.: The stopping distributions of a Markov process. Invent. Math. 14, 1–16 (1971)
Rost, H.: Skorokhod stopping times of minimal variance. In: Meyer, P.A. (ed.) Séminaire de Probabilités, X (Première partie, Univ. Strasbourg, Strasbourg, année universitaire 1974/1975), pp. 194–208. Lecture Notes in Math., vol. 511. Springer, Berlin (1976)
Rüschendorf, L.: Fréchet-bounds and their applications. In: Dall’Aglio, G., Kotz, S., Salinetti, G. (eds.) Advances in Probability Distributions with Given Marginals (Rome, 1990), volume 67 of Math. Appl., pp. 151–187. Kluwer Acad. Publ., Dordrecht (1991)
Rüschendorf, L.: Optimal solutions of multivariate coupling problems. Appl. Math. (Warsaw) 23(3), 325–338 (1995)
Skorohod, A.V.: Issledovaniya po teorii sluchainykh protsessov (Stokhasticheskie differentsialnye uravneniya i predelnye teoremy dlya protsessov Markova). Izdat. Kiev. Univ., Kiev (1961)
Skorokhod, A.V.: Studies in the Theory of Random Processes (Translated from the Russian by Scripta Technica Inc.). Addison-Wesley Publishing Co. Inc., Reading (1965)
Strasser, H.: Mathematical Theory of Statistics: Statistical Experiments and Asymptotic Decision Theory, volume 7 of de Gruyter Studies in Mathematics. Walter de Gruyter & Co., Berlin (1985)
Vallois, P.: Le problème de Skorokhod sur \({\mathbb{R}}\): une approche avec le temps local. In: Azéma, J., Yor, M. (eds.) Séminaire de Probabilités XVII 1981/82, pp. 227–239. Springer (1983)
Villani, C.: Topics in Optimal Transportation, volume 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence (2003)
Villani, C.: Optimal Transport. Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin (2009)
Acknowledgements
The authors thank Julio Backhoff, Walter Schachermayer, and Nizar Touzi for useful discussions. We are particularly indebted to the anonymous referees and Manu Eder for many helpful suggestions which had a significant impact on the final version of this article.
Additional information
The first author was supported by the FWF-grants P26736 and Y00782, the third author by the CRC 1060.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Beiglböck, M., Cox, A.M.G. & Huesmann, M. Optimal transport and Skorokhod embedding. Invent. math. 208, 327–400 (2017). https://doi.org/10.1007/s00222-016-0692-2
DOI: https://doi.org/10.1007/s00222-016-0692-2
Mathematics Subject Classification
 Primary 60G42
 60G44
 Secondary 91G20