Optimal Transport and Skorokhod Embedding

The Skorokhod embedding problem is to represent a given probability as the distribution of Brownian motion at a chosen stopping time. Over the last 50 years this has become one of the important classical problems in probability theory and a number of authors have constructed solutions with particular optimality properties. These constructions employ a variety of techniques ranging from excursion theory to potential and PDE theory and have been used in many different branches of pure and applied probability. We develop a new approach to Skorokhod embedding based on ideas and concepts from optimal mass transport. In analogy to the celebrated article of Gangbo and McCann on the geometry of optimal transport, we establish a geometric characterization of Skorokhod embeddings with desired optimality properties. This leads to a systematic method to construct optimal embeddings. It allows us, for the first time, to derive all known optimal Skorokhod embeddings as special cases of one unified construction and leads to a variety of new embeddings. While previous constructions typically used particular properties of Brownian motion, our approach applies to all sufficiently regular Markov processes.


Introduction
Throughout this paper we denote by λ a measure on the real line which has barycenter 0 and finite second moment. Let B be a Brownian motion on some stochastic basis (Ω, F , (F t ) t≥0 , P). It is well known that there then exists a stopping time τ which solves¹ the Skorokhod embedding problem
B τ ∼ λ, E[τ] = ∫ x² λ(dx). (1.1)
There exists a variety of different constructions of a stopping time τ which solves the embedding problem (1.1); we refer to the survey of Obłój [Obł04]. Starting with Hobson's seminal paper [Hob98] the Skorokhod embedding problem has received significant attention in the mathematical finance community due to its relevance in the theory of model-independent finance, see [Hob11] for an overview. Here we do not elaborate on this connection; we just mention that a large class of problems corresponds to an optimization problem which relates to (1.1) and which we now formalize. Let
S := { ( f, s) : f ∈ C([0, s]), f (0) = 0, s ∈ R + }. (1.2)
Throughout the paper we consider a functional γ : S → R.
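For concreteness (an illustration of ours, not part of the paper): if λ = (b/(a+b)) δ −a + (a/(a+b)) δ b is a centered two-point measure, the first hitting time of {−a, b} solves (1.1). A minimal Monte Carlo sketch, approximating B by a scaled simple random walk (function names hypothetical):

```python
import random

def embed_two_point(a, b, dt=0.01, seed=0, n=4000):
    """Embed lambda = b/(a+b)*delta_{-a} + a/(a+b)*delta_b via the first
    hitting time tau of {-a, b}; Brownian motion is approximated by a
    simple random walk with spatial step sqrt(dt) on an integer grid."""
    rng = random.Random(seed)
    step = dt ** 0.5
    lo, hi = round(-a / step), round(b / step)  # grid positions of -a and b
    hits_lo, total_time = 0, 0.0
    for _ in range(n):
        k, steps = 0, 0
        while lo < k < hi:
            k += 1 if rng.random() < 0.5 else -1
            steps += 1
        hits_lo += (k == lo)
        total_time += steps * dt
    # empirical P(B_tau = -a) and empirical E[tau]
    return hits_lo / n, total_time / n
```

For a = 1, b = 2 one expects P(B τ = −1) = 2/3 and E[τ] = ab = 2 = ∫ x² λ(dx), matching the second-moment condition in (1.1).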
The primal problem which we study consists in
P γ (λ) = sup { E[γ((B t ) t≤τ )] : τ solves (1.1) }. (1.3)
The authors thank Walter Schachermayer for many discussions and Nizar Touzi for helpful comments. The first author was supported by the FWF-grant p21209, the second author by the CRC 1060.
¹ In particular we allow here that the stopping time τ depends on external randomization. The condition E[τ] = ∫ x² λ(dx) is imposed to exclude trivial (degenerate) solutions of the embedding problem.
We say that the problem is well-posed iff E[γ((B t ) t≤τ )] exists with values in [−∞, ∞) for all τ which satisfy (1.1) and is finite for at least one such τ. Typical examples of the functional γ which are relevant in model-independent finance are the running maximum γ(( f, s)) := max t≤s f (t) and convex/concave functions of time, e.g. γ(( f, s)) = h(s), where h : R + → R is convex/concave.
The set of all randomized stopping times (see (4.2) below) solving the Skorokhod problem (1.1) is compact in a natural sense; as a consequence we will establish:
Proposition 1.1. Let γ : S → R be upper semi-continuous. Then (1.3) admits a maximizer τ whenever the optimization problem is well-posed.
Here we can talk about the continuity properties of γ since S carries a natural Polish topology: Let ( f, s), (g, t) ∈ S and assume w.l.o.g. s ≤ t. We then say that ( f, s) and (g, t) are ε-close if
max { t − s, sup 0≤u≤s | f (u) − g(u)|, sup s≤u≤t |g(u) − g(s)| } < ε. (1.4)
1.2. The Dual Problem. Related to the above primal problem is a dual problem which has a financial interpretation in terms of robust super-hedging. In the formulation given below we take the probability space to be the path space C(R + ) of continuous functions starting in 0, equipped with the Wiener measure W, and we take (B t ) t≥0 to be the canonical process, i.e. B t (ω) = ω(t). In the dual problem D γ (λ), H runs through all predictable processes with ∫ 0 t E[H s ²] ds ≤ at + b for some a, b > 0. Then there is no duality gap (Theorem 1.2), i.e. P γ (λ) = D γ (λ).
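The ε-closeness in (1.4) can be sketched numerically; a toy implementation of ours (suprema are approximated over a fixed grid, so this is only a sketch, and the function name is hypothetical):

```python
def dist(f, s, g, t, grid=0.01):
    """Distance between stopped paths (f, s) and (g, t) in the spirit of (1.4):
    max of |t - s|, sup_{u <= s} |f(u) - g(u)| and sup_{s <= u <= t} |g(u) - g(s)|,
    with suprema approximated on a grid.  f, g are callables; w.l.o.g. s <= t."""
    if s > t:
        f, s, g, t = g, t, f, s
    us1 = [i * grid for i in range(int(s / grid) + 1)] + [s]
    d1 = max(abs(f(u) - g(u)) for u in us1)
    us2 = [s + i * grid for i in range(int((t - s) / grid) + 1)] + [t]
    d2 = max(abs(g(u) - g(s)) for u in us2 if u <= t)
    return max(t - s, d1, d2)
```

For instance, two identical linear paths stopped at times 1 and 1.2 are ε-close exactly for ε > 0.2.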
1.3. Variational Principle. A basic and fundamental notion in the theory of optimal transport is c-cyclical monotonicity, which we recall in (3.6) below. The remarkable feature of this optimality criterion is that the optimality of the measure π is linked to the geometry of the support set supp(π). Often this is the key to understanding the transport problem. We establish a corresponding result which applies to the theory of Skorokhod embedding. Let B be a Brownian motion (on some stochastic basis (Ω, F , (F t ) t≥0 , P)) and τ a stopping time. Let Γ be a Borel subset of the set S defined in (1.2). We say that τ is concentrated on Γ if, P-a.s.,
((B t ) t≤τ , τ) ∈ Γ. (1.6)
Suppose now that τ̂ is an optimizer of the primal problem P γ (λ) for some function γ. Intuition from optimal transport suggests that in this case there exists a set Γ ⊆ S which supports τ̂ and reflects the optimality of τ̂: Suppose that τ̂ stops a path once it reaches (g, t) while some other path is still living in ( f, s). Assume also that f (s) = g(t) and that
"stop in ( f, s), don't stop in (g, t)" leads to a better γ-payoff. (1.7)
Then we say that (( f, s), (g, t)) is a bad pair w.r.t. γ and write BP for the set of all (( f, s), (g, t)) satisfying (1.7). If τ̂ is optimal we should not encounter bad pairs for which ( f, s) is still living with respect to τ̂ while the process is dying in (g, t). After all, in this case it would be better to switch the roles of ( f, s) and (g, t) under the stopping time τ̂.
The following result formalizes this heuristic idea.
Theorem 1.3 (Variational Principle). Assume that γ : S → R is upper semi-continuous, that the optimization problem (1.3) is well-posed and that τ̂ is an optimizer of P γ (λ). Then there is a stopping set Γ ⊆ S such that τ̂ is supported by Γ and there are no bad pairs with respect to Γ, i.e. if (( f, s), (g, t)) ∈ BP, then at least one of the following applies:
(1) ( f, s) is not before the right end of Γ, i.e. ( f, s) has no proper extension ( f ′ , s ′ ) ∈ Γ.
(2) (g, t) ∉ Γ.
We call a set Γ verifying (1) and (2) γ-monotone. Writing Γ < for the set of all ( f, s) which are before (the right end of) Γ, the properties (1), (2) can also be expressed as
(Γ < × Γ) ∩ BP = ∅. (1.9)
Notice that, in general, the sets Γ < and Γ are not disjoint. In fact, for a stopping time τ there exists a set Γ such that τ is supported by Γ and Γ < ∩ Γ = ∅ iff the stopping time τ depends only on the evolution of B and not on external randomization, i.e. iff τ is a stopping time for the filtration generated by B.
In Section 2 we will use Theorem 1.3 to give short derivations of the Root and the Rost solutions of the Skorokhod problem.
1.4. Connections with the Literature. The idea to relate the theory of optimal transport with model-independent finance first appeared in the papers [GHT12,BHP12].
While the article [BHP12] is concerned with a discrete-time setup, Galichon, Henry-Labordere, and Touzi [GHT12] study the Skorokhod embedding problem as an optimal stopping problem. By connecting the Skorokhod embedding problem to a free boundary problem they derive the Azema-Yor solution to the Skorokhod embedding.
Through the Dambis-Dubins-Schwarz theorem, the optimization problems (1.3) and (5.4) are related to the pricing of financial derivatives whose payoff is invariant under time-changes; this idea goes back to [Hob98]. In mathematical finance terms, Theorem 1.2 is a robust super-replication theorem comparable to the recent result of Dolinsky and Soner [DS12]. Dolinsky and Soner proceeded through a discretization of the problem. In contrast to our result, this allows them to also treat functionals γ which are not necessarily invariant w.r.t. time-changes. On the other hand, in [DS12] it is necessary to assume stricter continuity conditions on the functional γ, hence excluding functionals involving the quadratic variation. For related duality results in a quasi-sure context we refer to [PRT13].
The idea to consider an analogue of c-cyclical monotonicity in the martingale context comes from [BJ12] where the corresponding notion is introduced in a discrete time framework and applied to obtain a 1-dimensional martingale analogue of Brenier's theorem. A different and more explicit approach to this Brenier-type result is given by Henry-Labordere and Touzi in [HT13].
1.5. Organization of the Article. In Section 2 we establish the Root and the Rost embeddings based on Theorem 1.3. In Section 3 we recall some principal definitions and results from optimal transport. In Section 4 we consider randomized stopping times on Wiener space and establish some basic properties. In Section 5 we develop the dual side of the problem and prove Theorem 1.2. In Sections 6 and 7 we establish Theorem 1.3 by combining the duality theory of optimal transport with Choquet's capacitability theorem.

Particular embeddings
In this section we explain how Theorem 1.3 can be used to derive particular solutions to the Skorokhod embedding problem. We first define the notion of "bad pairs".
Definition 2.1. Write (g ⊕ h, t + u) for the path obtained from concatenating (g, t) and (h, u) ∈ S . Then the set of bad pairs for γ : S → R is given by
BP = { (( f, s), (g, t)) ∈ S × S : f (s) = g(t) and γ(( f, s)) + γ((g ⊕ h, t + u)) > γ(( f ⊕ h, s + u)) + γ((g, t)) for all (h, u) ∈ S }.
2.1. The Root embedding. A set R ⊆ R × R + is a barrier if (x, t) ∈ R and s > t implies that (x, s) ∈ R. Root [Roo69] established that there exists a barrier R such that the Skorokhod problem is solved by the corresponding hitting time
τ R = inf { t ≥ 0 : (B t , t) ∈ R }. (2.1)
Theorem 2.2. Let h : R + → R be strictly concave, put γ(( f, s)) := h(s) and let τ̂ be a maximizer of (1.3). Then τ̂ is the hitting time of a barrier.
Proof. Pick, by Theorem 1.3, a γ-monotone set Γ ⊆ S such that P(τ̂ ∈ Γ) = 1. Note that due to the concavity of h the set of bad pairs is given by
BP = { (( f, s), (g, t)) ∈ S × S : f (s) = g(t), t < s }.
As Γ is γ-monotone, (Γ < × Γ) ∩ BP = ∅. Define a left and a right barrier by (2.4) and denote the respective hitting times by τ L and τ R . We claim that τ L ≤ τ̂ ≤ τ R a.s. Note that τ L ≤ τ̂ holds by definition of τ L . To show the other inequality pick ω satisfying ((B t (ω)) t≤τ̂(ω) , τ̂(ω)) ∈ Γ and assume for contradiction that τ R (ω) < τ̂(ω). Then there exists s < τ̂(ω) such that (B s (ω), s) ∈ R R . By definition of the right barrier, this means that there is some (g, t) ∈ Γ such that t < s and g(t) = B s (ω). But then ( f, s) := ((B u (ω)) u≤s , s) ∈ Γ < , hence (( f, s), (g, t)) ∈ BP ∩ (Γ < × Γ), which is the desired contradiction.
It remains to show that B τ L ∼ B τ R . This is evident from the properties of one-dimensional Brownian motion but can also be seen by a "softer" argument: Consider the shifted barrier R ε := { (x, s) : (x, t) ∈ R L for some t with t + ε ≤ s } and the corresponding hitting time τ ε . Then the law of B τ ε tends to the law of B τ L in the total variation norm. To see this, write B ε for Brownian motion started at time −ε and note that B ε τ L ∼ B τ ε .
A consequence of this proof is that (on a given stochastic basis) there exists exactly one solution of the Skorokhod embedding problem which maximizes (2.2): Assume that maximizers τ 1 and τ 2 are given. Then we can use an independent coin-flip to define a new maximizer τ̂ which equals τ 1 with probability 1/2 and τ 2 with probability 1/2. By Theorem 2.2, τ̂ is of barrier-type and hence τ 1 = τ 2 .
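For intuition (a sketch of ours, not part of the paper's argument): a barrier that is closed to the right — (x, t) ∈ R and s > t imply (x, s) ∈ R — can be encoded by a function t R (x) = inf{t : (x, t) ∈ R}, and the hitting time in (2.1) computed along a discretized path (all names hypothetical):

```python
def barrier_hitting_time(path, dt, t_R):
    """First time the discretized path (path[i] approximates B at time i*dt)
    enters the barrier R = {(x, t) : t >= t_R(x)}; every barrier that is
    closed to the right has this form.  Returns None if R is not hit."""
    for i, x in enumerate(path):
        t = i * dt
        if t >= t_R(x):
            return t
    return None
```

For the Root embedding the barrier must of course be chosen so that B at the hitting time has law λ; finding that barrier is the hard part, and here t_R is an arbitrary input.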
We also note that the above proof of Theorem 2.3 is based on a heuristic derivation of the optimality properties of the Root embedding given by Hobson in [Hob11]. Indeed, Hobson's approach was the starting point of the present paper.
2.2. The Rost embedding. A set R ⊆ R × R + is an inverse barrier if (x, s) ∈ R and s > t implies that (x, t) ∈ R. It has been shown by Rost that under the condition λ({0}) = 0 there exists an inverse barrier such that the corresponding hitting time (in the sense of (2.1)) solves the Skorokhod problem. We derive this using an argument almost identical to the one above: Let h : R + → R be strictly convex and put γ(( f, s)) := h(s).
Proof. Pick, by Theorem 1.3, a γ-monotone set Γ ⊆ S such that P(τ̂ ∈ Γ) = 1. Note that due to the convexity of h the set of bad pairs is given by
BP = { (( f, s), (g, t)) ∈ S × S : f (s) = g(t), s < t }.
As Γ is γ-monotone, (Γ < × Γ) ∩ BP = ∅. Define a left and a right inverse barrier by (2.6) and denote the respective hitting times by τ L and τ R . We claim that τ R ≤ τ̂ ≤ τ L a.s. Note that τ R ≤ τ̂ holds by definition of τ R . To show the other inequality pick ω satisfying ((B t (ω)) t≤τ̂(ω) , τ̂(ω)) ∈ Γ and assume for contradiction that τ L (ω) < τ̂(ω). Then there exists s < τ̂(ω) such that (B s (ω), s) ∈ R L . By definition of the left barrier, this means that there is some (g, t) ∈ Γ such that s < t and g(t) = B s (ω). But then ( f, s) := ((B u (ω)) u≤s , s) ∈ Γ < , hence (( f, s), (g, t)) ∈ BP ∩ (Γ < × Γ), which is the desired contradiction.
Similar to the previous proof we have B τ L ∼ B τ R .
As in the case of the Root embedding we obtain that the maximizer of E[h(τ)] is unique.
2.3. Remarks. It is well known (see for instance [Obł04, Hob11]) that the Root and Rost embeddings can be shown to maximize E[h(τ)] for concave resp. convex h. In the above approach we have turned this upside down: the optimization problem is used as an auxiliary tool to derive the Root and Rost solutions of the Skorokhod problem.
The arguments used here do not rely on fine properties of one-dimensional Brownian motion. We believe that the above approach generalizes to a multi-dimensional setup and to (sufficiently regular) continuous Markov processes. Also, it does not matter for the argument whether the starting distribution is a Dirac mass in 0, as in our setup, or a more general distribution.
We also mention two recent accounts of the Root embedding given in [CW12] and [Od13]. These approaches are based on PDE techniques and in particular allow for a much more explicit description of the barrier than the proof given in Theorem 2.3.

The classical Transport Problem
To establish Theorem 1.2 and Theorem 1.3 we link the Skorokhod embedding problem to the Monge-Kantorovich optimal transport. This allows us to use the duality theorem of optimal transport and techniques related to c-cyclical monotonicity in the context of Brownian motion.
In abstract terms the transport problem (cf. [Vil03, Vil09]) can be stated as follows: For probabilities µ, ν on Polish spaces X, Y the set Cpl(µ, ν) of transport plans consists of all couplings between µ and ν. These are all measures on X × Y with X-marginal µ and Y-marginal ν. Associated to a cost function c : X × Y → [0, ∞] and π ∈ Cpl(µ, ν) are the transport costs ∫ X×Y c(x, y) dπ(x, y). The Monge-Kantorovich problem is then to determine the value
inf { ∫ c dπ : π ∈ Cpl(µ, ν) } (3.1)
and to identify an optimal transport plan π̂ ∈ Cpl(µ, ν), i.e. a minimizer of (3.1). Going back to Kantorovich, this is related to the following dual problem. Consider the set Φ(µ, ν) of pairs (ϕ, ψ) of integrable functions ϕ : X → R, ψ : Y → R which satisfy ϕ(x) + ψ(y) ≤ c(x, y) for all (x, y) ∈ X × Y. The dual part of the Monge-Kantorovich problem then consists in maximizing ∫ ϕ dµ + ∫ ψ dν for (ϕ, ψ) ∈ Φ(µ, ν). In the literature duality has been established under various conditions, see for instance [Vil09, p. 98f] for a short overview. Moreover, the duality relation pertains if the optimization in the dual problem is restricted to bounded functions ϕ, ψ.
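A toy finite instance (ours, not from the paper): for uniform marginals on n points, the Birkhoff-von Neumann theorem guarantees an optimal coupling among the permutation couplings, so (3.1) can be solved by brute force:

```python
from itertools import permutations

def discrete_ot(xs, ys, c):
    """Solve the Monge-Kantorovich problem (3.1) for uniform marginals on the
    point lists xs and ys (equal length): search over permutation couplings,
    which by Birkhoff-von Neumann contain an optimal transport plan."""
    n = len(xs)
    best_cost, best_plan = float("inf"), None
    for perm in permutations(range(n)):
        cost = sum(c(xs[i], ys[perm[i]]) for i in range(n)) / n
        if cost < best_cost:
            best_cost = cost
            best_plan = [(xs[i], ys[perm[i]]) for i in range(n)]
    return best_cost, best_plan
```

With quadratic cost the optimizer is the monotone rearrangement, in line with the classical one-dimensional theory.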
Concerning the origins of c-cyclical monotonicity in convex analysis and the study of its relation to optimality we mention [Roc66, KS92, Rüs96, GM96]. Recall that a transport plan π is c-cyclically monotone if for all pairs (x 1 , y 1 ), . . . , (x n , y n ) ∈ supp(π) we have
c(x 1 , y 1 ) + · · · + c(x n , y n ) ≤ c(x 1 , y 2 ) + · · · + c(x n−1 , y n ) + c(x n , y 1 ).
Intuitively speaking, c-cyclically monotone transport plans resist improvement by means of cyclical rerouting, and optimal transport plans are expected to have this property. Indeed we have:
Theorem 3.3. Let c : X × Y → R + be a lower semi-continuous cost function. Then a transport plan is optimal if and only if it is c-cyclically monotone. (If c takes only values in [0, ∞), then it suffices to assume plain measurability [BS09].)
Even in the case where c is the squared Euclidean distance this is a non-trivial result, posed as an open question by Villani in [Vil03, Problem 2.25]. Following contributions of Ambrosio and Pratelli [AP03], this problem was resolved by Pratelli [Pra08] and Schachermayer-Teichmann [ST09], who established the clear-cut characterization stated in Theorem 3.3. Notably, Theorem 1.3 above is only a first step towards a full characterization of optimality as provided in Theorem 3.3. To obtain a necessary and sufficient condition for optimality in the Skorokhod embedding case (1.3), an extension from pairs to a finite number of stopped paths will be required.
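The monotonicity criterion of Theorem 3.3 is easy to check on finite supports; a small sketch of ours (for a finite support, permutations of the full family already contain all cyclical rearrangements of subfamilies, since a permutation may fix points):

```python
from itertools import permutations

def is_c_cyclically_monotone(support, c):
    """Check whether a finite set of pairs (x_i, y_i) is c-cyclically monotone:
    no rearrangement p of the targets may lower sum_i c(x_i, y_p(i))."""
    xs = [x for x, _ in support]
    ys = [y for _, y in support]
    base = sum(c(x, y) for x, y in support)
    return all(
        sum(c(xs[i], ys[p[i]]) for i in range(len(xs))) >= base - 1e-12
        for p in permutations(range(len(xs)))
    )
```

For squared-distance cost the monotone coupling passes the check, while any coupling containing a "crossing" fails it.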

Spaces and Filtrations.
In this section we mainly discuss the formal aspects of filtrations, measure theory, etc. Confident readers might want to skip this section.
We consider the space Ω = C(R + ) of continuous paths with the topology of uniform convergence on compact sets. The elements of Ω will be denoted by ω. As explained above we consider the set S of all continuous functions defined on some initial segment [0, s] of R + ; we will denote the elements of S by ( f, s) and (g, t). The set S admits a natural partial ordering: we say that (g, t) extends ( f, s) if t ≥ s and the restriction g| [0,s] equals f . In this case we write ( f, s) ≺ (g, t).
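The partial order ≺ can be sketched for paths sampled on a common time grid (a toy representation of ours; actual elements of S are continuous functions):

```python
def extends(f, g):
    """Decide whether (g, t) extends (f, s), i.e. (f, s) < (g, t) in the
    partial order on S: t >= s and the restriction of g to [0, s] equals f.
    Paths are given as lists of samples on a common, fixed time grid."""
    return len(g) >= len(f) and g[: len(f)] == f
```

Note that the order is reflexive, so every stopped path extends itself.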
For two sets A, B the projection from A × B to A (resp. B) will be denoted by proj A (resp. proj B ). For a map T : X → Y and a measure µ on X the push forward of µ by T will be denoted by T (µ). The set of all probability (resp. sub-probability) measures on a Polish space Z will be denoted by P(Z) (resp. P ≤1 (Z)). The set of all finite nonnegative measures on a set Z will be denoted by M(Z). The complement of a set A will be denoted by ∁A.
For our arguments it will be important to be precise about the relationship between the sets C(R + ) × R + and S . We therefore discuss the underlying filtrations in some detail.
We consider three different filtrations on the Wiener space C(R + ): the canonical or natural filtration F 0 = (F 0 t ) t∈R + , the right-continuous filtration F + = (F + t ) t∈R + , and the augmented filtration F a = (F a t ) t∈R + obtained from (F 0 t ) t∈R + by including all W-null sets in F 0 0 . As Brownian motion is a continuous Feller process, F a is automatically right-continuous, all F a -stopping times are predictable, and all right-continuous F a -martingales are continuous. In particular, the F a -optional and the F a -predictable σ-algebras coincide (see e.g. [RY99, Corollary IV.5.7]). This will allow us to use the following result.
Of course, every F a -martingale has a continuous version. Not so commonly used but entirely straightforward is the following: if M is an F 0 -martingale then there is a version M ′ of M which is an F 0 -martingale and almost all paths of M ′ are continuous. Concerning Theorem 3.3, we refer to [BGMS09] and [BC10] for more general results; in particular, it turns out that lower semi-continuity of the cost function is not required.
The message of the proposition below is that a process (X t ) t∈R + is F 0 -predictable iff X t (ω) can be calculated from the restriction ω| [0,t] . We introduce the mapping
r : C(R + ) × R + → S , r(ω, t) := (ω| [0,t] , t). (4.1)
Note that the topology on S introduced in (1.4) coincides with the final topology induced by the mapping r; in particular r is continuous.
To establish Proposition 4.2 we use a result from [DM78]. Adjoin to R a coffin state ð. We let D be the set of all cadlag paths with lifetime, i.e. all cadlag functions f : R + → R ∪ {ð} which take the value ð from their lifetime ζ( f ) := inf{t : f (t) = ð} onwards. Proposition 4.2 then follows from the following result.
We will primarily work with the natural filtration F 0 on Ω = C(R + ).

Preliminaries on stopping times.
Working on the path space C(R + ), a stopping time τ is a mapping which assigns to each path ω the time τ(ω) at which the path is stopped.
Assuming that a stopping time depends on some external randomization, we may think that a path ω is not stopped at a particular point τ(ω) but rather that there exists a sub-probability µ ω on R + such that the path ω is stopped randomly according to the law µ ω . Let us make this idea precise. We consider the space M of sub-probability measures µ on C(R + ) × R + which admit a disintegration µ(dω, dt) = µ ω (dt) W(dω) with µ ω ∈ P ≤1 (R + ) for W-a.e. ω. We equip M with the weak topology induced by the continuous bounded functions on C(R + ) × R + . In particular, we will be interested in the subset RST of all elements which are "adapted". Formally we define the set RST of all randomized stopping times to consist of all µ ∈ M satisfying one of the equivalent properties in the following theorem.
(The relation of usual (non-randomized) stopping times to randomized stopping times is analogous to the relation of (Monge) transport maps to (Kantorovich) transport plans in the theory of optimal transport.)
Theorem 4.5. Let µ ∈ M. Then the following are equivalent:
(1) There is a Borel function H : S → [0, 1] which is right-continuous and decreasing (w.r.t. the partial order ≺) such that µ ω ([0, s]) = 1 − H((ω| [0,s] , s)) for W-a.e. ω and all s.
(2) For every continuous function f : R + → R + whose support lies in [0, t], the random variable ω ↦ ∫ f dµ ω is F a t -measurable.
Proof. The argument is fairly straightforward. We establish that (2) implies (1). Consider a disintegration (µ ω ) ω∈Ω of µ. Define a process H̄ by H̄ t (ω) := 1 − µ ω ([0, t]). Then H̄ is right-continuous and F a -adapted, hence F a -progressive. By Theorem 4.1 and Proposition 4.2 there exists a Borel function H on S such that H̄ is indistinguishable from H ◦ r. This function H is as required.
(1) The function H in (4.2) is unique up to indistinguishability (cf. Remark 4.3). We will denote this function by H µ in the following. It has a natural interpretation: H µ ( f, s) is the probability that a particle is still alive at time s given that it has followed the path f . We call H µ the livelyhood function associated to µ.
(2) We will say µ is a non-randomized stopping time or a deterministic stopping time iff there is a disintegration (µ ω ) ω∈Ω of µ such that µ ω is a Dirac measure (of mass 1) for every ω. Clearly this means that µ ω = δ τ(ω) a.s. for some usual stopping time τ. Moreover, µ is a deterministic stopping time iff there is a version of H µ which only attains the values 0 and 1.
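The role of H µ can be illustrated by inverse-transform sampling (a sketch of ours with a path-independent survival function; in the paper H µ depends on the whole stub ( f, s)):

```python
import math

def sample_stopping_time(H, u, dt=1e-3, t_max=50.0):
    """Given a right-continuous decreasing survival function H with H(0) = 1,
    return tau = inf{t : H(t) <= u}; for u uniform on (0, 1) this samples the
    stopping measure with mu([0, t]) = 1 - H(t).  Grid search with step dt."""
    t = 0.0
    while t <= t_max:
        if H(t) <= u:
            return t
        t += dt
    return float("inf")

# A deterministic stopping time corresponds to H taking only the values 0 and 1:
H_det = lambda t: 1.0 if t < 2.0 else 0.0
```

With H(t) = e^{−t} the stopped time is exponentially distributed, while H_det stops every particle at time 2 regardless of the randomization u.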
The set of all finite randomized stopping times will be denoted by RST 1 .
Note that on the set RST, equipped with the topology inherited from M, it is sufficient to consider continuous, bounded, adapted processes as test functions.
Proposition 4.8. The set RST is closed.
Proof. Assume that a sequence (µ n ) n∈N in RST tends to µ. Fix a continuous bounded random variable Y : C(R + ) → R + and a continuous function f : R + → R + which has support in [0, t]. Then, using the above notation, (4.4) shows that X n t converges to X t weakly in L 2 (Ω, W). All X n t are F 0 t -measurable, hence also X t can be taken to be F 0 t -measurable.

Randomized Stopping of Martingales.
Given µ ∈ M and s ∈ R + we define the measure µ ∧ s ∈ M to be the random time which is the minimum of µ and s. For a process (U s ) s∈R + on Ω the corresponding stopped process is denoted by (U µ s ). Recall (Definition 4.7) that the random time µ is finite if the measure µ has mass one. If (U t ) t≥0 is uniformly integrable, then we may also consider the stopped value U µ . Of course the optional stopping theorem applies:
Proposition 4.9. Let µ ∈ RST and let (M t ) t∈R + be a martingale. Then (M µ t ) t∈R + is a martingale.
Subsequently we will use that this property actually characterizes whether a given random time is a stopping time, by testing against every uniformly integrable martingale M.
Subsequently we will often use the following notation: Let f : C(R + ) → R be a continuous bounded function. Then we write f M for the F 0 -martingale defined through f M t := E[ f | F 0 t ], whose paths are almost surely continuous.
Then, by Proposition 4.10, µ ∈ M is a randomized stopping time if and only if (4.6) holds for all continuous bounded functions f .
Proof of Proposition 4.10. We show (2); the proof of (1) is the same. The first implication follows from optional stopping, so assume that (4.6) holds for all uniformly integrable martingales. If µ is not a randomized stopping time, then the defining property fails for some t; it follows that there also exists a bounded continuous function H witnessing this failure. This is the desired contradiction.

Relation with Skorokhod-Embedding.
As is customary in the theory of Skorokhod embedding we consider stopping times which are minimal, that is, we are interested in finite randomized stopping times µ such that (B µ t ) t∈R + is uniformly integrable. Given a centered probability measure λ on R we denote by

RST(λ)
the set of minimal stopping times µ such that B µ ∼ λ. From now on we make the assumption that λ has finite second moment
V := ∫ x² λ(dx) < ∞. (4.8)
Denote by T the projection T : C(R + ) × R + → R + . For µ ∈ RST(λ) we then have E µ [T ] ≤ V (by uniform integrability of (B µ t ) and Jensen's inequality). On the other hand, if B µ ∼ λ then E µ [T ] < ∞ implies that (B µ t ) t≤∞ is uniformly integrable: (B t )² − t =: M t defines a martingale to which optional stopping applies. Summing up we obtain the following fact (which is of course well known in the theory of Skorokhod embedding):
Lemma 4.11. Using the above notations and assumptions, the following are equivalent for µ ∈ RST with B µ ∼ λ:
(1) µ is minimal.
(2) E µ [T ] = V.
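The equivalence in Lemma 4.11 can be sanity-checked by simulation (our sketch): for the first exit time τ of B from (−1, 1) we have B τ ∼ λ = (δ −1 + δ 1 )/2, hence V = 1, and minimality forces E[τ] = V = 1.

```python
import random

def mean_exit_time(n=5000, dt=0.01, seed=1):
    """Monte Carlo estimate of E[tau] for the exit time of Brownian motion
    from (-1, 1), with B approximated by a random walk of step sqrt(dt).
    For this embedding V = int x^2 dlambda = 1, so E[tau] should be near 1."""
    rng = random.Random(seed)
    m = round(1 / dt ** 0.5)  # barriers sit at +-m grid points
    total = 0.0
    for _ in range(n):
        k, steps = 0, 0
        while -m < k < m:
            k += 1 if rng.random() < 0.5 else -1
            steps += 1
        total += steps * dt
    return total / n
```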
The main reason why we consider randomized stopping times is that they have the following property: Theorem 4.12. The set RST(λ) is compact.
Proof. By Prohorov's theorem we have to show that RST(λ) is tight and that RST(λ) is closed.
Tightness. Fix ε > 0 and take R so large that V/R ≤ ε/2. Then, for any µ ∈ RST(λ) we have µ(T ≥ R) ≤ E µ [T ]/R = V/R ≤ ε/2. Pick a compact set K 0 ⊆ C(R + ) with W(∁K 0 ) ≤ ε/2 and put K := K 0 × [0, R]. Then K is compact and we have µ(∁K) ≤ ε for any µ ∈ RST(λ). Hence, RST(λ) is tight.

This readily implies that RST(λ) is closed, which completes the proof.
4.5. Joinings / Tagged Stopping Times. We now add another dimension: assume that (Y, ν) is some Polish probability space; the set of all tagged random times or joinings is denoted JOIN(W, ν) = JOIN(ν). We shall also write JOIN 1 (W, ν) / JOIN 1 (ν) for the subset of measures which have mass 1.
Remark 4.13. Write pred for the σ-algebra of F 0 -predictable sets in C(R + ) × R + . We call a set A ⊆ C(R + ) × R + × Y predictable if it is an element of pred ⊗ B(Y); a function defined on C(R + ) × R + × Y is called predictable if it is measurable w.r.t. this σ-algebra. Recall that we consider a function γ : S → R + which is continuous (or at least upper semi-continuous).
A randomized stopping time µ gives rise to the probability measure µ S := r(µ). Given an F a -predictable function γ̄ on C(R + ) × R + we can find a Borel function γ on S such that γ ◦ r is indistinguishable from γ̄ , and then
∫ γ̄ dµ = ∫ γ(( f, s)) µ S (d( f, s)). (5.1)
As long as there is no danger of confusion we will not distinguish strictly between γ̄ and γ as well as µ and µ S , respectively. We assume that there exists at least one µ ∈ RST(λ) which satisfies the integrability condition (5.2), and that the integral in (5.2) is less than ∞ for all µ ∈ RST(λ). The maximization problem introduced in the introduction can then also be written as
P γ (λ) = sup { ∫ γ dµ : µ ∈ RST(λ) }. (5.3)
It is straightforward to see that the functional (5.1) is upper semi-continuous provided that γ is (upper semi-)continuous. (This is spelled out in detail for instance in [Vil09, Chapter 4] in the context of classical optimal transport.) In particular (5.3) then admits an optimizer according to the compactness properties derived above.
Theorem 5.1. Let γ : C(R + ) × R + → R be upper semi-continuous, bounded from above and predictable. Define the dual value D γ (W, λ) as in (5.4), where ϕ runs through all continuous F a -martingales with E[ϕ t ²] ≤ at + b for some a, b > 0. Then we have the duality relation P γ (W, λ) = D γ (W, λ). By the martingale representation theorem, Theorem 5.1 and Theorem 1.2 are equivalent.
A more natural assumption on the function γ would be that D γ (W, λ) < ∞ but presently we are not able to establish Theorem 5.1 in this case.
The key idea for the proof of Theorem 5.1 is to translate the embedding problem for λ into a transportation problem between the Wiener measure W and λ using the cost function
c(ω, t, y) := γ(ω, t) if ω(t) = y, and c(ω, t, y) := −∞ otherwise.
The result of choosing this special cost function is (5.5), where T is the projection on R + , V = ∫ x² λ(dx), and we used Y = R in the definition of JOIN 1 (W, λ) (see Section 4.5).
In Proposition 5.6 we will establish a dual problem corresponding to P C(R + )×R + ×R c (W, λ) and Theorem 5.1 will then be a simple consequence. However we need some preparations before we can establish Proposition 5.6.

A Non-Adapted (NA) Duality Result.
We first prove a "non-adapted version" of the desired result and afterwards use the min-max Theorem 5.4 to introduce adaptedness. To this end we introduce non-adapted primal and dual values P NA c and D NA c . Again it is easy to show that D NA c ≥ P NA c . To show the other inequality we first collect some ingredients which will also be useful later on. In particular, we will use the min-max theorem in the following form.
where DC V NA (c n ) is to remind us of the dependence of the dual constraint set on c n ; P NA c and D NA c are defined analogously. We have to prove that D NA c ≤ P NA c . For each k let π k ∈ TM V (W, λ) be a near-optimizer of the primal problem for c k . By compactness of TM V (W, λ) there is a subsequence, still denoted by k, such that (π k ) k converges weakly to some π ∈ TM V (W, λ). Then by monotone convergence, using the monotonicity of the sequence (c k ) k∈N , the corresponding values converge. Since c k ≥ c implies P NA c k ≥ P NA c and D NA c ≤ D NA c k , this allows us to deduce the claim.
Proof of Proposition 5.3. We may assume that c is bounded from above by zero. Hence, by Lemma 5.5 it is sufficient to establish (5.7) for continuous functions whose support satisfies (5.8) for some t 0 ∈ R + . Assume now that c satisfies (5.8) for some t 0 ≥ V. Formally the conditions involving V disappear in TM V t 0 (W, λ) and DC V NA,t 0 (c) if we put V = ∞; we therefore define the corresponding objects for c(ω, t, y), t ≤ t 0 , y ∈ R, W-a.s. As a consequence of the classical Monge-Kantorovich duality theorem (3.1) we have (5.11) for c̃ upper semi-continuous and bounded from above. Using the min-max Theorem 5.4 with π ∈ TM ∞ t 0 (W, λ) and α ≥ 0 we thus obtain the desired identity, where we have applied (5.11) to the function c̃ = c − α(t − V) to establish the equality between (5.13) and (5.14). This concludes the proof.

Introducing Adaptedness.
We can test "adaptedness" of a measure π ∈ P(C(R + ) × R + × R) by testing it against martingales. This follows in complete analogy to Proposition 4.10 and will be crucial for the subsequent argument.
Consider now the following set DC V (c) of dual candidates: pairs (ϕ, ψ) which dominate c in the sense that ϕ t (ω) + ψ(y) ≥ c(ω, t, y), W-a.s., for all y ∈ R, t ∈ R + .
Then we can derive the following, adapted version of Proposition 5.3. Proof. Let us start with the case that c is continuous and bounded; the general case will follow by approximation, cf. Lemma 5.5. We will again use the notation f M t = E[ f | F t ]. We want to use the min-max Theorem 5.4. The set TM V (W, λ) is convex and compact by Prohorov's theorem and the set of all h ∈ C b (C(R + )) × C b (R + ) of the form (5.17) is convex as well. Applying the min-max theorem we obtain a chain of identities whose last equality holds by Proposition 5.3. We set c h = c + h.
Taking conditional expectations w.r.t. F 0 t and using predictability of c (cf. Remark 4.13), this implies that (ϕ t , ψ) ∈ DC V (c). Because W(ϕ M t ) = W(ϕ), this implies that DC V NA (c h ) ⊆ DC V (c). Therefore we obtain the desired inequality; as usual, the other inequality is straightforward.
Proof of Theorem 5.1. We already saw in the beginning of this section that P γ (W, λ) = P c (W, λ). Moreover, as γ was assumed to be upper semi-continuous, c is upper semi-continuous as well. Indeed, take any sequence (ω n , t n , y n ) converging to (ω, t, y). If lim sup n c(ω n , t n , y n ) = −∞ there is nothing to prove. On the other hand, if lim sup n c(ω n , t n , y n ) > −∞ there is a subsequence (ω n k , t n k , y n k ) with ω n k (t n k ) = y n k converging to (ω, t, y). Then we necessarily have ω(t) = y, because |ω(t) − y| ≤ |ω(t) − ω n k (t n k )| + |y n k − y| → 0 since ω n k (t n k ) = y n k . Thus the upper semi-continuity of c follows from the upper semi-continuity of γ. Hence Proposition 5.6 applies, and it remains to identify the resulting dual value with D γ (W, λ). The alternative representation in (5.19) is useful to us since ω(t)² − t is a martingale. Letting φ̂ 0 denote the corresponding martingale part, the pair (φ̂ − φ̂ 0 , ψ̂ + φ̂ 0 ) satisfies the constraint in the dual problem in (5.4). Recalling that V was defined by V = ∫ y² λ(dy), the λ-integrals of ψ̂ and ψ agree, and we can conclude.

Bad Pairs and Closed Stochastic Intervals
In the following, ν will always denote an optimizer of the primal optimization problem (5.3).
The notion of BP given in Definition 2.1 requires that all possible extensions (h, u) are considered. In this section we will also consider a weaker notion which is sensitive to the stopping measure ν. To this end we introduce the conditional randomized stopping time given ( f, s).
Definition 6.1. Let µ ∈ RST be given and consider the livelyhood function H µ as in Remark 4.6. The conditional randomized stopping time of µ given ( f, s) ∈ S , denoted by µ ( f,s) , is defined to be the normalized remaining stopping measure if H µ ( f, s) > 0, and 0 otherwise. This is the normalized stopping measure given that we have followed the path f up to time s; in other words, it is the normalized stopping measure of the "bush" which follows the "stub" ( f, s).
Definition 6.2. The set BP_ν of bad pairs relative to ν is defined analogously. The interpretation of BP_ν is that, on average, it is better to stop at (f, s), chop off the "bush", and transfer it onto the "stub" (g, t).
The following result constitutes an important intermediate step towards Theorem 1.3. In the formulation as well as in the proof we interpret the space (C(R+) × R+) × (C(R+) × R+) as a product X × Y, so that we can make sense of the projections proj_X and proj_Y. Proposition 6.3. Let ν be a randomized stopping time which maximizes (5.3). Then (Y, ν) = (C(R+) × R+, ν) is a Polish probability space. Assume that π ∈ JOIN(τ, ν) (where τ can be arbitrary) satisfies (6.3). Then we have π(BP_ν) = 0.
The interpretation of (6.3) is that if a particle has a strictly positive chance of being alive w.r.t. proj_X(π), then the probability that this particle is still alive w.r.t. ν is positive as well.
(3) The cost of ν^π_0 plus the cost of ν^π_1 is less than twice the cost of ν, i.e.
To define ν^π_0, we first consider p_0 = proj_X(π), which is a randomized stopping time. As in Remark 4.6 we can view p_0 as a right-continuous, decreasing likelihood function H^{p_0} : S → [0, 1] which starts at 1. Possibly H^{p_0} does not decrease to 0, since we allow particles to survive until ∞.
We now define the randomized stopping time ν^π_0 as the product of the two likelihood functions. The probabilistic interpretation of this definition is that a particle is stopped by ν^π_0 if it is stopped by p_0 or stopped by ν, where these events are taken to be conditionally independent given the path ω ∈ C(R+). Comparing ν and ν^π_0, the latter will stop some particles earlier than the former. We note that this in particular implies that E_{ν^π_0}[T] ≤ E_ν[T] < ∞. Let us now turn to the definition of ν^π_1. Set p_1 := proj_Y(π). (Recall that we write (C(R+) × R+) × (C(R+) × R+) = X × Y, so that proj_Y denotes the projection on the second coordinate.) Fix an F^0-measurable 10 disintegration (ν_ω)_{ω∈C(R+)} of ν. Given (f, s) ∈ S and (ω, t) ∈ C(R+) × R+, we define a measure on R with support in [t, ∞), where θ_t(ω) = (ω_{s+t} − ω_t)_{s≥0}. As discussed above, randomized stopping times can be represented either as probability measures on C(R+) × R+ or as probability measures on S. Formally, the tagged random time π is a measure on (C(R+) × R+) × (C(R+) × R+). However, in defining ν^π_1 we consider π as a probability on S × (C(R+) × R+). 11 This defines the probability measure ν^π_1. We then have (1) ν^π_0, ν^π_1 ∈ RST, with finite expected stopping times. This quantity is strictly positive by the definition of bad pairs and Assumption (6.3).

6.1. Approximation by particular stopping times.
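The conditional-independence product above can be illustrated numerically. Below is a minimal sketch (all concrete values are invented for illustration) of the principle that multiplying two survival ("likelihood") functions models conditionally independent stopping: the product again starts at 1, decreases, and is dominated by each factor, so the resulting stopping time stops particles no later than either ingredient.

```python
# Minimal numerical sketch of the product construction behind nu^pi_0.
# A "likelihood function" H assigns to each time on a discrete grid the
# probability that a particle is still alive; all values are invented.

def product_likelihood(H_p0, H_nu):
    """Survival function of conditionally independent stopping:
    a particle survives iff it survives both stopping mechanisms."""
    return [a * b for a, b in zip(H_p0, H_nu)]

# Two decreasing survival functions starting at 1 (discrete time grid).
H_p0 = [1.0, 0.9, 0.7, 0.7, 0.4]   # need not decrease to 0: mass may survive
H_nu = [1.0, 0.8, 0.8, 0.5, 0.2]

H_prod = product_likelihood(H_p0, H_nu)

# The product is again decreasing, starts at 1, and is dominated by each
# factor -- i.e. the product stopping time stops particles earlier.
assert H_prod[0] == 1.0
assert all(x >= y for x, y in zip(H_prod, H_prod[1:]))
assert all(h <= min(a, b) + 1e-12 for h, a, b in zip(H_prod, H_p0, H_nu))
```

The domination property is exactly what yields E_{ν^π_0}[T] ≤ E_ν[T] in the text: a smaller survival function corresponds to stochastically earlier stopping.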
Lemma 6.4. Let τ be a non-randomized stopping time w.r.t. the right-continuous filtration F^+. For any ε, η > 0 there is an F^+-stopping time ρ ≤ τ such that ⟦0, ρ⟧ is closed and W(τ − ρ ≥ ε) ≤ η. Proof. Fix ε, η > 0. Assume first that τ is an F^0-stopping time which takes only the values t and ∞. In this proof we will often use this kind of identification of a stopping time with its stochastic interval. Choosing t_n with t − ε ≤ t_n < t, one obtains approximating stopping times ρ_n; then ⟧ρ_n, ∞⟦ is open and ⟦0, ρ_n⟧ is closed. Put U = ∪_n ⟧ρ_n, ∞⟦ and define ρ := inf{t : (ω, t) ∈ U}. Then we have ⟦0, ρ⟧ = ∩_n ⟦0, ρ_n⟧, which implies that ρ(ω) = inf_n ρ_n(ω). Hence ρ is an F^+-stopping time and ⟦0, ρ⟧ is closed. Moreover, because t − ε ≤ t_n < t, we have for all n ≥ 1 that t − ε ≤ ρ_n(ω) < τ(ω); hence also t − ε ≤ ρ(ω) < τ(ω). Therefore, we can conclude. This proves the lemma for the case that τ is an F^0-stopping time which takes only the values t and ∞. From here it is straightforward to prove the lemma for the case where τ takes values in a discrete subset of R+.
Assume now that τ is an arbitrary F^0-stopping time. Since τ is predictable, there is an F^0-stopping time τ̄ such that τ̄ ≤ τ and W(τ − ε/2 < τ̄) ≥ 1 − η/2. Pick a sequence of stopping times (τ_n)_{n∈N}, each taking values in some discrete set, such that τ_n ↓ τ̄. Put ε_n = 2^{−n} ε/2 and η_n = 2^{−n} η/2. According to what we have proved above, pick ρ_n which are very close (in terms of ε_n, η_n) to the τ_n and satisfy that ⟧ρ_n, ∞⟦ is open. Then set V := ∪_n ⟧ρ_n, ∞⟦ and ρ := inf{t : (ω, t) ∈ V}, so that V = ⟧ρ, ∞⟦ is open. Note that ρ = inf_n ρ_n. Hence, by construction, ρ ≤ τ satisfies the required properties. Indeed, we only have to check that W(τ − ρ ≥ ε) ≤ η. To this end, one easily checks an inclusion of events which directly yields the estimate.
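One way to organize the final estimate (a hedged reconstruction of the omitted display, consistent with the choices ε_n = 2^{−n} ε/2 and η_n = 2^{−n} η/2 above; the inclusion of events is an assumption suggested by the construction):

```latex
% Since \rho = \inf_n \rho_n, \tau_n \downarrow \bar\tau, and
% \sum_n \varepsilon_n = \varepsilon/2, if \tau - \rho \ge \varepsilon then
% either \tau - \bar\tau \ge \varepsilon/2 or \tau_n - \rho_n \ge \varepsilon_n
% for some n; hence
\{\tau - \rho \ge \varepsilon\}
  \subseteq \{\tau - \bar\tau \ge \varepsilon/2\}
     \cup \bigcup_{n} \{\tau_n - \rho_n \ge \varepsilon_n\},
\qquad
W(\tau - \rho \ge \varepsilon)
  \le \frac{\eta}{2} + \sum_{n} \eta_n
  = \frac{\eta}{2} + \frac{\eta}{2} = \eta .
```

Indeed, if τ − τ̄ < ε/2 and τ_n − ρ_n < ε_n for all n, then ρ_n > τ_n − ε_n ≥ τ̄ − ε/2 for every n, so ρ ≥ τ̄ − ε/2 > τ − ε.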
If τ is an F + -stopping time, it can be represented as a decreasing limit of F 0 -stopping times and repeating the above argument yields the result also in this case.
Corollary 6.5. Let τ be a non-randomized F^+-stopping time 12. Then there is a sequence of F^+-stopping times τ_n such that (1) τ_n ↑ τ W-a.s.
(3) For each n, the stochastic interval ⟦0, τ_n⟧ is closed in C(R+) × R+.
Proof. For each n, apply the previous lemma with ε_n = η_n = 2^{−n}.
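The choice ε_n = η_n = 2^{−n} makes the error probabilities summable; the following is a hedged sketch of the omitted argument (the construction τ_n := max(ρ_1, …, ρ_n) is an assumption consistent with the required monotonicity and closedness):

```latex
% Lemma 6.4 yields \rho_n \le \tau with \llbracket 0, \rho_n \rrbracket closed and
W(\tau - \rho_n \ge 2^{-n}) \le 2^{-n}, \qquad \sum_n 2^{-n} < \infty .
% By Borel--Cantelli, W-a.s. \tau - \rho_n < 2^{-n} for all large n, so
% \rho_n \to \tau\ W\text{-a.s.}  Setting \tau_n := \max(\rho_1,\dots,\rho_n)
% gives an increasing sequence with \tau_n \uparrow \tau\ W\text{-a.s. and}
\llbracket 0, \tau_n \rrbracket
  = \bigcup_{k \le n} \llbracket 0, \rho_k \rrbracket
% closed, as a finite union of closed stochastic intervals.
```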
Corollary 6.6. Let µ be a randomized stopping time. There exists a sequence of stopping times µ_n such that (1) for each n there exist stopping times τ_1 ≤ . . . ≤ τ_k and convex coefficients α_1, . . . , α_k representing µ_n up to indistinguishability, where moreover the τ_i can be chosen so that ⟦0, τ_i⟧ is closed; (2) for each n, the stopping time µ_n is before µ, i.e. H^{µ_n} ≤ H^µ, and µ_n → µ weakly.
Proof. Fix n and define, for ω ∈ C(R+) and 1 ≤ i ≤ 2^n, the corresponding level sets; this in turn yields H^n(ω↾[0,t]) = k/2^n < α. By Lemma 6.4 there are stopping times τ_i < τ̄_i with W(τ̄_i − τ_i > 3^{−n}) ≤ 3^{−n} and such that ⟦0, τ_i⟧ is closed. Defining µ_n accordingly finishes the proof.

In the following we assume that τ is a non-randomized, bounded stopping time such that ⟦0, τ⟧ is closed. Then the corresponding set of measures is compact as a consequence of Prohorov's theorem. We also let RST_τ = RST ∩ M_τ. Since RST_τ is closed, we have the following. Recall the definition of joinings in Section 4.5. We set JOIN(τ, ν) = {π ∈ JOIN(W, ν) : proj_{C(R+)×R+}(π) ∈ RST_τ}. Observe: Lemma 6.8. Under the above assumptions, the set JOIN(τ, ν) of tagged random times / joinings is compact with respect to the topology coming from the continuous bounded functions on C(R+) × R+ × R.

A Filtered Kellerer-type Lemma and the Principle of Pointwise Determination
In this section we establish the following result which implies Theorem 1.3 stated in the introduction.
where BP_ν is as in Definition 6.2 and Γ^< is as in (1.8).
As an intermediate step towards the proof of Theorem 7.1 we will look for two sets Γ_L ⊆ S and Γ_D ⊆ S, where Γ_L (which roughly corresponds to Γ^<) represents the "still living" pairs, while ν is concentrated on Γ_D, which represents the paths that get killed by ν. Here Γ_L is a subset of all (f, s) which lie before the "death" set Γ_D. The above condition on Γ then corresponds to: for ((f, s), (g, t)) ∈ BP_ν, at least one of the following applies: (1) (f, s) ∉ Γ_L ((f, s) is not living).
(2) (g, t) ∉ Γ_D ((g, t) is not dying). As in (1.9) above, this can equivalently be expressed accordingly. Define a (non-randomized) stopping time τ_ν as follows. Using Corollary 6.5 we can pick a sequence τ_n, n ≥ 1, of stopping times such that (1) τ_n ↑ τ_ν.
Finally we can replace Γ by a K σ -subset which still has full ν measure.
It remains to establish Lemma 7.2, which we shall now do.

Important Convention. For the remainder of this section we fix a (finite) non-randomized stopping time τ such that ⟦0, τ⟧ is closed and τ ≤ t_0 for some t_0 ∈ R+.

7.1. An Auxiliary Optimization Problem. We fix a Polish probability space (Y, ν), which eventually will be taken to be (S, ν), where ν denotes an optimizer of the primal problem (5.3). We are interested in the maximization problem (7.6) of I_c(π) and its relation to the dual problem. To indicate the dependence of DC on the cost function c and the stopping time τ we sometimes write DC(c) or DC(c, τ). Note that for integrable ϕ we always have the corresponding identity by optional stopping. Note that this is different from the already established duality results because we allow subprobability measures.
We first establish the easy inequality. Lemma 7.3. With the above notation and assumptions we have D_{≤1} ≥ P_{≤1}.
Proof. Take (ϕ, ψ) ∈ DC and π ∈ JOIN(τ, ν). Then, by the definition of a tagged random time, we obtain the desired chain of inequalities; the last inequality holds by the dual constraint.
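The omitted chain plausibly takes the following form (a hedged sketch; the exact shape of the dual constraint, the nonnegativity of ϕ and ψ, and the domination of the marginals of the sub-probability π are assumptions suggested by the surrounding text):

```latex
% Assumptions: the dual constraint reads c(\omega,t,y) \le \varphi_t(\omega) + \psi(y),
% where (\varphi_t) is the martingale generated by an integrable \varphi \ge 0,
% \psi \ge 0, and the marginals of the sub-probability \pi are dominated by W
% (stopped before \tau) and by \nu. Then
I_c(\pi) = \int c(\omega,t,y)\, d\pi
  \;\le\; \int \bigl(\varphi_t(\omega) + \psi(y)\bigr)\, d\pi
  \;\le\; \int \varphi_{\tau(\omega)}(\omega)\, dW + \int \psi\, d\nu
  \;=\; W(\varphi) + \nu(\psi),
% the final equality using optional stopping, as noted above.
```

Taking the supremum over π and the infimum over (ϕ, ψ) then yields D_{≤1} ≥ P_{≤1}.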
We will first prove a version which applies to a not necessarily predictable c; afterwards, we will use Proposition 4.10 to derive the predictable version. Let us start with the following. Theorem 7.5. Let c : C(R+) × R+ × Y → R+ be (upper semi-)continuous and bounded from above. Then the duality holds, where the set of all tagged random measures is defined accordingly. Proof of Theorem 7.5. We reduce the theorem to the classical duality theorem in optimal transport. Put c̄(ω, y) = sup_{t≤τ(ω)} c(ω, t, y). As ⟦0, τ⟧ is closed and bounded, c̄ is continuous.
This implies that P_{≤1,NA} + ε ≥ P̃. Letting ε go to zero, we obtain the claim.
Remark 7.6. A consequence of allowing partial transports, i.e. sub-probability measures π, in the definition of the set JOIN is the following: assume that we are given a cost function c which is nonpositive. Then P_{≤1}(c) = 0, as the zero measure is admissible and every other choice is worse. The value of the dual problem is likewise zero, D_{≤1}(c) = 0, as the constraint is satisfied by ϕ, ψ ≡ 0. Similarly, for a general cost function c we have P_{≤1}(c) = P_{≤1}(c ∨ 0) and also D_{≤1}(c) = D_{≤1}(c ∨ 0). Hence, the requirement that c be nonnegative is not a restriction.
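The identity P_{≤1}(c) = P_{≤1}(c ∨ 0) can be checked on a toy discrete partial-transport problem (all numbers invented for illustration): over sub-couplings π, i.e. nonnegative matrices with row sums ≤ µ and column sums ≤ ν, the maximum of ∑ c_{ij} π_{ij} is unchanged when c is replaced by c ∨ 0, since mass on negative-cost cells can simply be dropped.

```python
# Toy check of Remark 7.6 for discrete partial (sub-coupling) transport;
# all numbers are invented for illustration.
from itertools import product

mu = [0.5, 0.5]          # bounds on row masses
nu = [0.5, 0.5]          # bounds on column masses
c  = [[1.0, -2.0],
      [-1.0, 0.5]]       # cost with mixed signs

grid = [0.0, 0.25, 0.5]  # admissible mass values per cell (discretization)

def best_value(cost):
    """Maximize sum c_ij * pi_ij over sub-couplings pi on the grid:
    pi >= 0, row sums <= mu, column sums <= nu."""
    best = float("-inf")
    for p00, p01, p10, p11 in product(grid, repeat=4):
        pi = [[p00, p01], [p10, p11]]
        rows_ok = all(sum(pi[i]) <= mu[i] + 1e-9 for i in range(2))
        cols_ok = all(pi[0][j] + pi[1][j] <= nu[j] + 1e-9 for j in range(2))
        if rows_ok and cols_ok:
            val = sum(cost[i][j] * pi[i][j]
                      for i in range(2) for j in range(2))
            best = max(best, val)
    return best

c_pos = [[max(x, 0.0) for x in row] for row in c]

# The zero sub-coupling is admissible, so both optima are >= 0, and mass
# on negative-cost cells can be dropped: the two optima coincide.
assert best_value(c) >= 0.0
assert abs(best_value(c) - best_value(c_pos)) < 1e-9
```

Note that partial transport is essential here: with exact marginal constraints, mass could not be dropped from the negative-cost cells, and the two optima could differ.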
Proof. This is a direct consequence of tightness, as the integrals ∫_{{g≥R}} g dπ → 0 uniformly in π as R → ∞ by assumption.
Proof of Theorem 7.4. As c is bounded from above we have P ≤1 < ∞. Arguing as in Lemma 5.5, we may assume that the cost function c is continuous.
We will now argue as in Proposition 5.6. That is, we consider again the functions h, h̄ as in (5.17), and we shall apply Theorem 5.4 to the corresponding function of π ∈ TM(τ, ν). The set TM(τ, ν) is convex and compact by Prohorov's theorem, and the set of h under consideration is convex as well. The function F is continuous by Lemma 7.7. This allows us to deduce the desired identity, where the last equality holds by Theorem 7.5. Writing out the dual value, for (ϕ, ψ) ∈ DC(c_h) it holds that (see Remark 7.6) c_h(ω, t, y)^+ = c_h(ω, t, y) ∨ 0 ≤ ϕ(ω) + ψ(y).
Having established the duality, we can start drawing conclusions. Proof. Without loss of generality we may assume that K = K ∩ ⟦0, τ⟧. We want to apply the previous theorem with the cost function c = 𝟙_K. Clearly, P_{≤1}(𝟙_K) = sup_{π∈JOIN(τ,ν)} π(K).

A Choquet Argument
We now want to extend the previous result to the more general case of a merely measurable set K.
In the proof we will use Choquet's theorem, similarly as in [BLS12], and rely on Corollary 6.5 and Lemma 7.9. Recall that we assume that the stopping time τ is smaller than or equal to some number t_0.
A simple stopping time is a right-continuous, increasing, F^+-predictable process. Lemma 7.9. D is a capacity.
Proof. To show that D defines a capacity we have to check the three defining properties of capacities: monotonicity, continuity from below, and continuity from above on compact sets. Monotonicity is clear. Let us turn to continuity from below. Take an increasing sequence A_1 ⊆ A_2 ⊆ . . . ⊆ C(R+) × R+ × Y of measurable sets and put A = ∪_n A_n. For all n there are simple stopping times F_n and measurable functions ϕ_n : S → [0, 1] such that (ϕ_n, F_n) ∈ Cov_R(A_n) and ν(ϕ_n) + W((F_n)_{t_0}) ≤ D(A_n) + 1/n.
We can then replace F by its right-continuous version, which of course does not affect W(F_{t_0}). Given ε > 0, by Corollary 6.6 we can find a simple stopping time F^ε ≥ F such that W(F^ε_{t_0}) − ε < W(F_{t_0}) = lim_n W((F_n)_{t_0}). Therefore we can conclude that D(A) ≤ lim sup_n (D(A_n) + 1/n) + ε.
To show continuity from above on compact sets, take a sequence K_1 ⊇ K_2 ⊇ . . . of compact sets in C(R+) × R+ × Y and put K = ∩_n K_n. Fix ε > 0. Then there is (ϕ, F) ∈ Cov_S(K) such that ν(ϕ) + W(F_{t_0}) ≤ D(K) + ε.
Using Corollary 6.5 / Corollary 6.6 and the regularity of the measure ν we can, at the cost of another ε, assume that ϕ and F have a particular form, where a_i, b_j ≥ 0, the sets B_i and ⟧τ_i, ∞⟦ are open, and F^ε ↓ F with W(F^ε_{t_0} − F_{t_0}) ≤ ε/2. By compactness of the K_n, n ≥ 1, it follows that there is some N such that also (ϕ, F^ε) ∈ Cov_S(K_N). Hence D(K_N) ≤ D(K) + 2ε.
Fix η > 0 and pick for each k some κ_k, N_k such that K ⊆ (⟧κ_k, ∞⟦ × Y) ∪ (C(R+) × R+ × N_k) and W(κ_k < τ) + ν(N_k) ≤ η 2^{−k}. Then, combining these covers over k, this shows that κ can be chosen so that W(κ < τ) = 0. Repeating this argument for the set N we obtain the desired result.
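One plausible way to combine the covers (a hedged reconstruction of the omitted display; the definitions of κ as a supremum and N as a union are assumptions suggested by the construction):

```latex
% With \kappa := \sup_k \kappa_k and N := \bigcup_k N_k one would obtain
W(\kappa < \tau) \;\le\; \inf_k W(\kappa_k < \tau)
 \;\le\; \inf_k \eta\, 2^{-k} \;=\; 0,
\qquad
\nu(N) \;\le\; \sum_k \eta\, 2^{-k} \;=\; \eta ,
% since \{\kappa < \tau\} \subseteq \{\kappa_k < \tau\} for every k.
```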