Geometry of distribution-constrained optimal stopping problems

We adapt ideas and concepts developed in optimal transport (and its martingale variant) to give a geometric description of optimal stopping times τ of Brownian motion subject to the constraint that the distribution of τ is a given probability μ. The methods work for a large class of cost processes. (At a minimum we need the cost process to be measurable and $(\mathcal{F}^0_t)_{t \ge 0}$-adapted. Continuity assumptions can be used to guarantee existence of solutions.) We find that for many of the cost processes one can come up with, the solution is given by the first hitting time of a barrier in a suitable phase space. As a by-product we recover classical solutions of the inverse first passage time problem/Shiryaev’s problem.


Appetizer
To whet the reader's appetite and to give some idea of the kind of problems that can be solved with the methods presented in this paper, we would like to start with two corollaries to our main results. In Sect. 3 we will present these main results and in Sect. 4 we will use them to prove Corollary 1.1.
Both Corollaries 1.1 and 1.2 assert that the solutions of certain optimal stopping problems can be described by a barrier in an appropriate phase space.
In this section, let $(B_t)_{t \ge 0}$ be a Brownian motion started in 0 (see footnote 1) on some filtered probability space $(\Omega, \mathcal{G}, (\mathcal{G}_t)_{t \ge 0}, \mathbb{P})$ satisfying the usual conditions and let μ be a measure on (0, ∞). First we consider optimal stopping problems of the following form.
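Problem (OptStop$^{\psi(B_t,t)}$) Among all stopping times τ ∼ μ on $(\Omega, \mathcal{G}, (\mathcal{G}_t)_{t \ge 0}, \mathbb{P})$ find the maximizer of
$$\tau \mapsto \mathbb{E}[Z_\tau],$$
where the process Z is of the form $Z_t = \psi(B_t, t)$.
Corollary 1.1 Assume that μ has finite first moment. There is a function $\beta : \mathbb{R}_+ \to [-\infty, \infty]$ such that the stopping time
$$\tau := \inf\{t > 0 : B_t \le \beta(t)\} \tag{1.1}$$
has distribution μ. τ has the following uniqueness properties: On the one hand it is the a.s. unique stopping time which has distribution μ and which is of the form (1.1) (we will later say that such a stopping time is the hitting time of a downwards barrier).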
On the other hand τ is also the a.s. unique solution of (OptStop$^{\psi(B_t,t)}$) for a number of different ψ. Namely:
- Let p ≥ 0, assume μ has finite moment of order $\tfrac{1}{2} + p + \varepsilon$ for some ε > 0 and let $A : \mathbb{R}_+ \to \mathbb{R}$ be strictly increasing with $|A(t)| \le K(1 + t^p)$ for some constant K (see footnote 2). Then we may choose $\psi(B_t, t) = B_t A(t)$.
- Let p ≥ 2, assume μ has finite moment of order $\tfrac{p}{2} + \varepsilon$ for some ε > 0 and let $\varphi : \mathbb{R} \to \mathbb{R}$ satisfy $\varphi''' > 0$ as well as $|\varphi(y)| \le K(1 + |y|^p)$ for some constant K. Then we may choose $\psi(B_t, t) = \varphi(B_t)$.
To give an example of a slightly more complicated functional amenable to analysis with our tools, consider the problem (OptStop$^{B^*_t}$) of finding, among all stopping times τ ∼ μ, the maximizer of $\tau \mapsto \mathbb{E}[B^*_\tau]$, where $B^*_t = \sup_{s \le t} B_s$.
Footnote 1: We note that the results presented in this section remain valid for Brownian motions started according to a general law λ at the cost of slightly more tedious moment conditions in the formulation of Corollaries 1.1 and 1.2.
Footnote 2: One may of course choose $0 \le p < \tfrac{1}{2}$, $\varepsilon := \tfrac{1}{2} - p$ and e.g. $A(t) := t^p$ so that no moment conditions beyond those at the very beginning of this theorem are imposed on μ.
We emphasize that the solutions to the constrained optimal stopping problems provided in Corollaries 1.1 and 1.2 represent particular applications of the abstract results obtained below. Figure 1 presents graphical depictions of stopping rules of several further solutions of constrained optimal stopping problems (together with the respective optimality properties). These stopping rules can be derived, under suitable moment conditions, using arguments very similar to those required for Corollaries 1.1 and 1.2 (see also the comments in Remark 7.1 at the end of the paper).
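To make the notion of a barrier-type stopping rule concrete, the following minimal sketch simulates a discretized Brownian path and records the first grid time at which it lies on or below a barrier function. It assumes that the rule has the form τ = inf{t > 0 : B_t ≤ β(t)}; the function name `first_hitting_time`, the barrier `beta`, the time step and the horizon are hypothetical choices made only for illustration, and the time grid only approximates the continuous-time hitting time.

```python
import math
import random


def first_hitting_time(beta, horizon=5.0, dt=1e-2, seed=0):
    """Approximate tau = inf{t > 0 : B_t <= beta(t)} on a grid of step dt.

    Returns math.inf if the barrier is not hit before `horizon`.
    """
    rng = random.Random(seed)
    steps = int(round(horizon / dt))
    sd = math.sqrt(dt)
    b = 0.0  # Brownian motion started in 0
    for k in range(1, steps + 1):
        b += rng.gauss(0.0, sd)  # independent N(0, dt) increment
        t = k * dt
        if b <= beta(t):
            return t
    return math.inf


def beta(t):
    # Hypothetical downwards barrier, chosen only for illustration.
    return -1.0 + 0.5 * t


# One approximate hitting time per simulated path.
samples = [first_hitting_time(beta, seed=s) for s in range(50)]
```

The empirical distribution of `samples` approximates the law of the hitting time for this particular barrier. The problem treated in the text goes in the opposite direction: the law μ is prescribed and the barrier has to be found.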

Background: martingale optimal transport and Shiryaev's problem
In this article we consider distribution-constrained stopping problems from a mass transport perspective. Specifically we find that problems of the form exemplified in (OptStop ψ(B t ,t) ) and (OptStop B * t ) are amenable to techniques originally developed for the martingale version of the classical mass transport problem. This martingale optimal transport problem arises naturally in robust finance; papers to investigate such problems include [8,12,16,18,20,25,31]. In mathematical finance, transport techniques complement the Skorokhod embedding approach (see [24,32] for an overview) to model-independent/robust finance.
A fundamental idea in optimal transport is that the optimality of a transport plan is reflected by the geometry of its support set, which can be characterized using the notion of c-cyclical monotonicity. The relevance of this concept for the theory of optimal transport has been fully recognized by Gangbo and McCann [19], based on earlier work of Knott and Smith [28] and Rüschendorf [36,37] among others. Inspired by these ideas, the literature on martingale optimal transport has developed a 'monotonicity principle' which allows one to characterize martingale transport plans through geometric properties of their support sets, cf. [6,7,9,10,22,39].
The main contribution of this article is to establish a monotonicity principle which is applicable to distribution-constrained optimal stopping problems. This transport approach turns out to be remarkably powerful. In particular we will find that questions as raised in Problems (OptStop$^{\psi(B_t,t)}$) and (OptStop$^{B^*_t}$) can be addressed using a relatively intuitive set of arguments.
The distribution-constrained optimal stopping problem (OptStop) (and specifically (OptStop B * t )) arises naturally in financial and actuarial mathematics. We refer the reader to [23] which describes various examples (unit-linked life insurances, stochastic modelling for health insurances, the liquidation of an investment portfolio, the valuation of swing options).
Bayraktar and Miller [5] consider the same optimization problem that we treat here. However their setup and methods are rather distinct from the ones used here: they assume that the target distribution is given by finitely many atoms and that the target functional depends solely on the terminal value of Brownian motion. Following the measure valued martingale approach of Cox and Källblad [14], they address the constrained optimal stopping problem using a Bellman perspective.
The problem of constructing a stopping time τ of Brownian motion such that the law of τ matches a given distribution on the real line was proposed by Shiryaev in his Banach Center lectures in the 1970s; it has since been called Shiryaev's problem or the inverse first passage problem. Dudley and Gutmann [17] provide an abstract measure-theoretic construction. An early barrier-type solution to the inverse first passage problem was given by Anulova [3]. She constructs a symmetric two-sided barrier (corresponding to the case a = 0 in the sixth picture of Fig. 1). Anulova discretises the measure μ and concludes through approximation arguments. The solution to the inverse first passage problem given in Corollary 1.1 was derived by Chen et al. [13] based on a variational inequality which describes the corresponding barrier. Notably, this is predated by a (formal) PDE description of such barriers given by Avellaneda and Zhu [4] in the context of credit risk modeling. Ekström and Janson [13] relate this solution to an optimal stopping problem and provide an integral equation for the barrier. Analytic solutions to the inverse first passage problem are known only in a few cases ([1,2,11,29,33,38]). An interesting connection between the inverse first passage problem and Skorokhod's problem is provided by Jaimungal et al. [26].

Statement of main results
Assumption 1 Throughout we will assume that $(\Omega, \mathcal{G}, (\mathcal{G}_t)_{t \ge 0}, \mathbb{P})$ is a filtered probability space and that $(B_t)_{t \ge 0}$ is an adapted process which has continuous paths on $(\Omega, \mathcal{G}, (\mathcal{G}_t)_{t \ge 0})$, such that B can be regarded as a measurable map from Ω to $C(\mathbb{R}_+)$, the space of continuous functions from $\mathbb{R}_+$ to $\mathbb{R}$. The cost function c will always be a measurable map $C(\mathbb{R}_+) \times \mathbb{R}_+ \to \mathbb{R}$. μ will denote a probability measure on $\mathbb{R}_+$.
Then the problem we consider can be stated as follows.
Problem (OptStop) Among all stopping times τ ∼ μ find the minimizer of
$$\tau \mapsto \mathbb{E}[c(B, \tau)].$$
Here we formulate our main optimization problem in terms of minimization, following the usual convention in the optimal transport literature (which is also used in the closely related paper [6]). Clearly, a sign change transforms this into a maximization problem and in our applications we will in fact turn to this latter version when the resulting formulations appear more natural. We trust that this will not cause confusion.
Assumption 2 Throughout we will also make the following assumptions without further mention:
1. The cost function c is measurable and $(\mathcal{F}^0_t)_{t \ge 0}$-adapted, where $(\mathcal{F}^0_t)_{t \ge 0}$ is the filtration generated by the canonical process on $C(\mathbb{R}_+)$.
2. There is a $\mathcal{G}_0$-measurable random variable U which is uniformly distributed on [0, 1] and independent of the process $(B_t)_{t \ge 0}$.
3. There is a probability measure λ s.t. $(B_t)_{t \ge 0}$ is a Brownian motion with initial law λ, i.e. $B_0 \sim \lambda$.
4. The problem is well-posed in the sense that $\mathbb{E}[c(B, \tau)]$ is defined and > −∞ for all stopping times τ ∼ μ and that $\mathbb{E}[c(B, \tau)] < \infty$ for at least one such stopping time.
5. $\int t^{p_0} \, d\mu(t) < \infty$, where $p_0 \ge 0$ is some constant that we fix here and that can be chosen when applying the results from this section.
A note on language: The adjective "adapted" is usually applied to processes whose time argument is written in subscript form. For any filtered measurable space $\tilde\Omega$ and any function $f : \tilde\Omega \times \mathbb{R}_+ \to \mathbb{R}$ (or possibly $f : \tilde\Omega \times \mathbb{R}_+ \to [-\infty, \infty]$) we will interchangeably think of f simply as a function or as the process $Y_t(\omega) := f(\omega, t)$. And so f being adapted means the same thing as $(Y_t)_{t \in \mathbb{R}_+}$ being adapted. Similarly for a subset Γ of $\tilde\Omega \times \mathbb{R}_+$ we may also think of Γ as its indicator function or as the process $Y_t(\omega) := 1_\Gamma(\omega, t)$ and will also say that the set Γ is adapted.
With that in mind, Assumption 2.1 should seem like an obvious thing to ask for from the cost function. Also, knowing about the existence of optional projections, it should be clear no later than Lemma 5.3 that Assumption 2.1 does not pose a real restriction on the class of problems we are treating.
The role of Assumption 2.2 should become clearer soon. We would like to note at this point though that often enough our results put together will imply that the solution of Problem (OptStop) for a space $(\Omega, \mathcal{G}, (\mathcal{G}_t)_{t \ge 0}, \mathbb{P})$ which satisfies Assumption 2.2 is essentially the same as the solution of the Problem for a space which may not satisfy said assumption, and we will find that we can describe this solution in detail. This can be seen executed in the proofs of the corollaries stated in the Appetizer.
The methods in this paper work not just for Brownian motion but for a class of processes which is conceptually bigger, but then turns out to not include much beyond Brownian motion, namely for any space-homogeneous but possibly time-inhomogeneous Markov process with continuous paths which has the strong Markov property. (Here space-homogeneous means that starting the process at location x and then moving its paths to start at location y results in a version of the process started at y.) If the reader so wishes, she may think of B as a process from this slightly larger class of processes. Care was taken not to reference any properties of Brownian motion beyond those stated here. In particular our results apply to multi-dimensional Brownian motion.
Assumption 2.4 is mostly just there to ensure that we are actually talking about an optimization problem in a meaningful sense. For the problems presented in the Appetizer, the moment conditions on μ which are given in the statement of Corollary 1.1 and Corollary 1.2 ensure that Assumption 2.4 is satisfied (as we will see in the proofs of these corollaries).
The constant $p_0$ in Assumption 2.5 will (implicitly) appear in the statement of Theorem 3.6, one of the main results. Its role is to ensure that $\mathbb{E}[\varphi(B, \tau)]$ will be finite for some (class of) function(s) ϕ and any solution τ of (OptStop). (The choice $\varphi(B, \tau) = \tau^{p_0}$ is somewhat arbitrary here.)
The main results are Theorems 3.1 and 3.6. We give two versions of Theorem 3.1. Version A is easier to state and may feel more natural, but we will need Version B (which is more general and has essentially the same proof as Version A) in the proof of the corollaries in the Appetizer.

Theorem 3.1 Version A.
Assume that the cost function c is bounded from below and lower semicontinuous when we equip C(R + ) with the topology of uniform convergence on compacts. Then the Problem (OptStop) has a solution.
Version B. Assume that the cost function c is lower semicontinuous when we equip $C(\mathbb{R}_+) \times \mathbb{R}_+$ with the product topology of two Polish topologies which generate the right sigma-algebras on $C(\mathbb{R}_+)$ and $\mathbb{R}_+$ respectively and assume that the set $\{c^-(B, \tau) : \tau \sim \mu, \ \tau \text{ is a stopping time}\}$ is uniformly integrable, where $c^- := (-c) \vee 0$ denotes the negative part of c. Then the Problem (OptStop) has a solution.
To state Theorem 3.6 we need a few more definitions.
Remark 3.2 We will find it convenient to talk about processes that don't start at time 0 but instead at some time t > 0. Similarly we will consider stopping times taking values in [t, ∞). These will be defined on the space $C([t, \infty))$ equipped with the filtration $(\mathcal{F}^t_s)_{s \ge t}$, again generated by the canonical process $(\omega \mapsto \omega(s))_{s \ge t}$. We refer to the distribution of Brownian motion started at time t and location x by $\mathbb{W}^t_x$. This is a measure on $C([t, \infty))$. For a probability measure κ on $\mathbb{R}$ we write $\mathbb{W}^t_\kappa$ for the distribution of Brownian motion started at time t with initial law κ.

Definition 3.3 (Concatenation)
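For every $t \in \mathbb{R}_+$ we have an operation ⊙ of concatenation, which is a map into $C([t, \infty))$ and is defined for $(\omega, s) \in C([t, \infty)) \times [t, \infty)$ and $\theta \in C([s, \infty))$ with θ(s) = 0 by
$$((\omega, s) \odot \theta)(r) := \begin{cases} \omega(r) & t \le r \le s, \\ \omega(s) + \theta(r) & r > s. \end{cases}$$
Definition 3.4 (Stop-Go pairs) The set of Stop-Go pairs $SG \subseteq (C(\mathbb{R}_+) \times \mathbb{R}_+) \times (C(\mathbb{R}_+) \times \mathbb{R}_+)$ is defined as the set of all pairs ((ω, t), (η, t)) (note that the time components have to match) such that
$$c(\omega, t) + \mathbb{E}\big[c\big((\eta, t) \odot \tilde B, \sigma\big)\big] < c(\eta, t) + \mathbb{E}\big[c\big((\omega, t) \odot \tilde B, \sigma\big)\big]$$
for every $(\mathcal{F}^t_s)_{s \ge t}$-stopping time σ which is not a.s. equal to t, which satisfies $\mathbb{E}[\sigma^{p_0}] < \infty$ and for which both sides are well defined. Here $\tilde B$ is Brownian motion started in 0 at time t on $C([t, \infty))$.
A hopefully intuitive way of putting the definition of Stop-Go pairs into words is the following: ((ω, t), (η, t)) form a Stop-Go pair iff, irrespective of how we might stop after time t (i.e. which stopping rule σ we might use after time t), stopping ω at time t and letting η go on is better, i.e. has lower cost, than stopping η and letting ω go on. These two situations are contrasted in Fig. 2.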
As hinted at earlier, the definition of Stop-Go pairs depends on the parameter p 0 from Assumption 2.5. A larger p 0 means that we are asking for more in Assumption 2.5 and implies that we get a larger set SG, as we are quantifying over fewer stopping times σ in the definition of SG. This in turn implies that the conclusion of Theorem 3.6 below will be stronger.
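The concatenation operation can be sketched on a time grid as follows. This is a minimal sketch under the assumption that concatenation follows ω up to time s and then continues with ω(s) + θ; the function name `concatenate` and the list-based representation of paths are hypothetical and used only for illustration.

```python
def concatenate(omega, s_index, theta):
    """Discrete analogue of concatenating the path omega, cut at time s, with theta.

    omega   : values of a path on grid points 0, 1, 2, ...
    s_index : grid index of the time s at which omega is cut
    theta   : values of a path on grid points s_index, s_index + 1, ...,
              required to start in 0 (theta[0] == 0)
    """
    if theta[0] != 0:
        raise ValueError("theta must start in 0 at time s")
    head = list(omega[: s_index + 1])                   # omega up to and including s
    tail = [omega[s_index] + x for x in theta[1:]]      # omega(s) + theta after s
    return head + tail


# The result only depends on omega through its values up to time s.
path = concatenate([0, 1, 2, 5], 2, [0, -1, 3])
```

Here `path` follows `[0, 1, 2]` up to the cut and then continues with the increments of `theta` added to the value 2 at the cut.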
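Theorem 3.6 (Monotonicity Principle) Assume that τ solves (OptStop). Then there is a measurable, $(\mathcal{F}^0_t)_{t \ge 0}$-adapted set $\Gamma \subseteq C(\mathbb{R}_+) \times \mathbb{R}_+$ such that $\mathbb{P}[(B, \tau) \in \Gamma] = 1$ and
$$SG \cap (\Gamma^< \times \Gamma) = \emptyset, \tag{3.4}$$
where $\Gamma^< := \{(\omega, s) : (\omega, t) \in \Gamma \text{ for some } t > s\}$.
The following lemma should give a first hint about how the Monotonicity Principle can be applied.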
Lemma 3.7 Let τ be a solution of (OptStop) and assume that the cost function c is such that there exists a measurable, $(\mathcal{F}^0_t)_{t \ge 0}$-adapted process $(Y_t)_{t \ge 0}$ with
$$Y_t(\omega) < Y_t(\eta) \implies ((\omega, t), (\eta, t)) \in SG. \tag{3.5}$$
Let Γ be a set with the properties in Theorem 3.6 and set
$$\check R := \{(y, t) : \exists (\omega, t) \in \Gamma, \ y < Y_t(\omega)\}, \qquad \bar R := \{(y, t) : \exists (\omega, t) \in \Gamma, \ y \le Y_t(\omega)\}.$$
Define the functions $\check\tau$ and $\bar\tau$ on $C(\mathbb{R}_+)$ by
$$\check\tau(\omega) := \inf\{t : (Y_t(\omega), t) \in \check R\}, \qquad \bar\tau(\omega) := \inf\{t : (Y_t(\omega), t) \in \bar R\}.$$
Then
$$\bar\tau(B) \le \tau \le \check\tau(B) \quad \mathbb{P}\text{-a.s.} \tag{3.6}$$
When applying this lemma to show that some optimal stopping problem has a barrier-type solution, as symbolized for example by the pictures in Fig. 1, […] (the sign is flipped relative to the labelling in the picture because in this picture the barrier is drawn "up" instead of "down"), etc.
Notice that, contrary to custom, when we draw the barriers $\check R$/$\bar R$ in the pictures in Fig. 1 the first coordinate is the vertical axis and the second coordinate is the horizontal axis. This is because, to make cross-referencing and comparison with [6] easier, we follow their convention of always having time as the second coordinate, while in the pictures it seems more natural to put the independent variable on the horizontal axis.
Note that a priori $\bar\tau$ and $\check\tau$ need not be stopping times or even measurable, as we don't know much about the sets $\check R$ and $\bar R$.
Using the properties of a concrete process $(Y_t)_{t \ge 0}$ we will in the proofs of Corollaries 1.1 and 1.2 be able to show that $\bar\tau = \check\tau$ a.s. (this should not be surprising as for each time t the barriers $\check R$ and $\bar R$ differ by at most a single point) and therefore that the optimizer τ is the hitting time of a barrier.

Remark 3.8 (Duality) Problem (OptStop)
is an infinite-dimensional linear programming problem and one would hence expect that a corresponding dual problem can be formulated. Indeed, assuming that c is lower semicontinuous and bounded from below, the value of the optimization problem equals where the supremum is taken over bounded This can be established in complete analogy to the duality result derived in [6, Theorem 1.2/Section 4.2] and we do not elaborate.

Digesting the appetizer
We will now demonstrate how to use the Monotonicity Principle of Theorem 3.6 to derive Corollary 1.1. The proof of Corollary 1.2 is very similar but relies on understanding a technical detail which does not add much to the story at this point, so we leave it for the end of the paper.
Both of the sets $\check R$ and $\bar R$ in Lemma 3.7 have the property that (writing R for the set in question) $(y, t) \in R$ and $y' \le y$ implies $(y', t) \in R$. We call such sets (downwards) barriers. More specifically, for technical reasons in what follows it is slightly more convenient to talk about subsets of $[-\infty, \infty] \times \mathbb{R}_+$ instead of subsets of $\mathbb{R} \times \mathbb{R}_+$, giving the following definition.
The reader will easily verify the following lemma.
[…] while the inverse maps a barrier R to the function β given by $\beta(t) := \sup\{y \in [-\infty, \infty] : (y, t) \in R\}$.
What we will show now, on the way to proving Corollary 1.1, is that the first hitting time after 0 of any downwards barrier by Brownian motion is a.s. equal to the first hitting time after 0 of the closure of that barrier. This serves both to resolve the question whether the times in Lemma 3.7 are stopping times and to show that $\bar\tau = \check\tau$ a.s.
Let us assume for the rest of this section that B is actually a Brownian motion started in 0.

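Lemma 4.3 Let R be a downwards barrier in $[-\infty, \infty] \times \mathbb{R}_+$ with closure $\bar R$, and let $\tau_R$ and $\tau_{\bar R}$ denote the first hitting times after 0 of R and of $\bar R$ by $(B_t, t)$.
Then $\tau_R = \tau_{\bar R}$ a.s.
Proof […] $A(t) := \frac{t}{1+t}$ is a bounded, strictly increasing function. Using just that $\bar R$ is the closure of R one proves by elementary methods that $\tau_R(\omega) \le \tau^\varepsilon(\omega)$ for all ω ∈ Ω and any ε > 0. Because $A(t) = \int_0^t (1 + s)^{-2} \, ds$ is the integral from 0 to t of a square integrable function we can apply Girsanov's theorem (see e.g. [34, Theorem 38.5]) to see that $\tau^{1/n}$ converges to $\tau_{\bar R}$ in distribution as n → ∞.
As $(\tau^{1/n})_n$ is a decreasing sequence bounded below by $\tau_{\bar R}$ we get that convergence holds almost surely.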
The following is a particular case of [21, Corollary 2.3] (which in turn relies on arguments given in [30,35]). Note that this lemma is purely a statement about barriertype stopping times and is not directly connected to the optimization problem under consideration.

Lemma 4.4 (Uniqueness of Barrier-type solutions)
Assume that (Y t ) t≥0 is a measurable, (F 0 t ) t≥0 -adapted process and that the process Z defined through Z t := Y t (B) has a.s. continuous paths. Let R 1 , R 2 ⊆ [−∞, ∞] × R + be closed downwards barriers such that for Proof Is to be found in [21,Corollary 2.3].
We now have the necessary prerequisites to use our main results in showing that the first optimization problem in the Appetizer admits a (unique) barrier-type solution.
Proof of Corollary 1.1 The strategy is as follows: We choose a cost function and leverage Theorem 3.1 to show that an optimizer exists. The Monotonicity Principle in the form of Theorem 3.6 and Lemma 3.7 will, with some help from Lemma 4.3, show that any optimizer must be the hitting time of a barrier. Lemma 4.4 shows that any two barrier-type solutions must be equal.
We now provide the details. Start with a cost function $c(\omega, t) := -\omega(t) A(t)$ for a strictly monotone function $A : \mathbb{R}_+ \to \mathbb{R}$ which satisfies $|A(t)| \le K(1 + t^p)$ and assume that μ has moment of order $\tfrac{1}{2} + p + \varepsilon$ for some ε > 0. To prove that a barrier-type solution exists when μ has first moment, choose a bounded strictly increasing A and p = 0, $\varepsilon = \tfrac{1}{2}$ in this step. (These assumptions guarantee in particular that the optimization problems considered below have a finite value.) Clearly the problem (OptStop) for c corresponds to (OptStop$^{\psi(B_t,t)}$) for $\psi(B_t, t) = B_t A(t)$ (i.e. ψ takes the role of −c such that the minimal/maximal values agree up to a change of sign). We will deal with the case where $\psi(B_t, t) = \varphi(B_t)$ at the end of this proof.
We now check that the conditions in Version B of Theorem 3.1 are satisfied. We also need to check that Assumption 2.4 holds. Here we need the assumption that μ has moment of order $\tfrac{1}{2} + p + \varepsilon$, as well as the Hölder and Burkholder-Davis-Gundy inequalities. The latter, specialized to Brownian motion started in 0, state that for all q > 0 there are positive constants $K_0$ and $K_1$ such that for any stopping time τ we have
$$K_0 \, \mathbb{E}\big[\tau^{q/2}\big] \le \mathbb{E}\Big[\sup_{s \le \tau} |B_s|^q\Big] \le K_1 \, \mathbb{E}\big[\tau^{q/2}\big].$$
With these in hand a straightforward calculation allows us to bound $B_\tau A(\tau)$ in the $L^{1+\delta}$-norm for some δ > 0, independently of the stopping time τ ∼ μ.
This shows both that the uniform integrability condition in Version B of Theorem 3.1 is satisfied and that Assumption 2.4 is satisfied.
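The "straightforward calculation" can be organized in several ways. The sketch below records one possible choice of exponents for the case p > 0, which is an assumption of this illustration and not necessarily the authors' calculation: with δ = ε/(p + 1/2) and Hölder exponents a = 2p + 1, b = a/(a − 1), bounding $|B_\tau A(\tau)|^{1+\delta}$ via Hölder and Burkholder-Davis-Gundy only requires moments of τ of order at most 1/2 + p + ε. The function name `holder_choice` is hypothetical.

```python
from fractions import Fraction


def holder_choice(p, eps):
    """Return (delta, a, b, moment_for_B, moment_for_A, available_moment).

    moment_for_B: order of the moment of tau needed, via BDG, to control
                  E[sup|B|^((1+delta)*a)], namely (1+delta)*a/2.
    moment_for_A: order of the moment of tau needed to control
                  E[tau^(p*(1+delta)*b)], namely (1+delta)*b*p.
    Both must not exceed the available moment 1/2 + p + eps.
    """
    p, eps = Fraction(p), Fraction(eps)
    if p <= 0 or eps <= 0:
        raise ValueError("this sketch covers p > 0 and eps > 0 only")
    available = Fraction(1, 2) + p + eps
    delta = eps / (p + Fraction(1, 2))
    a = 2 * p + 1            # Hölder exponent for the Brownian factor
    b = a / (a - 1)          # conjugate exponent for the factor A(tau)
    moment_for_B = (1 + delta) * a / 2
    moment_for_A = (1 + delta) * b * p
    return delta, a, b, moment_for_B, moment_for_A, available


delta, a, b, m_B, m_A, m = holder_choice(1, Fraction(1, 2))
```

For p = 1 and ε = 1/2 this gives δ = 1/3, a = 3, b = 3/2, and both required moment orders equal the available order 2. Exact rational arithmetic is used so that the comparison of exponents is not affected by rounding.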
On $C(\mathbb{R}_+)$ we may choose the (Polish) topology of uniform convergence on compacts. For the topology on $\mathbb{R}_+$ we start with the usual topology and turn A into a continuous function (if it wasn't), by making use of the fact that any measurable function from a Polish space to a second countable space may be turned into a continuous function by passing to a larger Polish topology (with the same Borel sets) on the domain. (This can be found for example in [27, Theorem 13.11].)
In the statement of Corollary 1.1 we did not require that the probability space $(\Omega, \mathcal{G}, (\mathcal{G}_t)_{t \ge 0}, \mathbb{P})$ satisfy Assumption 2.2. To remedy this we can enlarge the probability space by setting $\tilde\Omega := \Omega \times [0, 1]$, $\tilde{\mathcal{G}} := \mathcal{G} \otimes \mathcal{B}([0, 1])$, $\tilde{\mathcal{G}}_t := \mathcal{G}_t \otimes \mathcal{B}([0, 1])$ and $\tilde{\mathbb{P}} := \mathbb{P} \otimes \mathcal{L}$, where $\mathcal{L}$ is Lebesgue measure on [0, 1]. On this space we consider the Brownian motion $\tilde B_t(\omega, x) := B_t(\omega)$. Theorem 3.1 now gives us an optimal stopping time $\tilde\tau$ on the enlarged probability space. If we can show that this stopping time is in fact the hitting time of a barrier, then it follows that $\tilde\tau = \tau \circ ((\omega, x) \mapsto \omega)$ for a stopping time τ which is defined as the hitting time of the Brownian motion B of the same barrier. As there are more stopping times on $(\tilde\Omega, \tilde{\mathcal{G}}, (\tilde{\mathcal{G}}_t)_{t \ge 0})$ we conclude that τ must also be optimal among the stopping times on $(\Omega, \mathcal{G}, (\mathcal{G}_t)_{t \ge 0})$. With this out of the way, let us refer to our Brownian motion by B, to the optimal stopping time by τ and to our filtered probability space by $(\Omega, \mathcal{G}, (\mathcal{G}_t)_{t \ge 0}, \mathbb{P})$ irrespective of whether this is the original process and space we started with, or an enlarged one.
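Choosing $p_0 := \tfrac{1}{2} + p + \varepsilon$ in Assumption 2.5 we apply Theorem 3.6 to obtain a set Γ on which (B, τ) is concentrated under $\mathbb{P}$ and for which (3.4) holds. As μ is concentrated on (0, ∞), we may assume that $\Gamma \subseteq C(\mathbb{R}_+) \times (0, \infty)$. Translating (3.5) to our situation, we want to prove that ω(t) < η(t) implies
$$-\omega(t) A(t) - \mathbb{E}\big[(\eta(t) + \tilde B_\sigma) A(\sigma)\big] < -\eta(t) A(t) - \mathbb{E}\big[(\omega(t) + \tilde B_\sigma) A(\sigma)\big],$$
where $\tilde B$ is Brownian motion started in 0 at time t on $C([t, \infty))$ and σ is any stopping time admissible in the definition of SG. After cancelling the common term $\mathbb{E}[\tilde B_\sigma A(\sigma)]$ this is equivalent to
$$(\eta(t) - \omega(t)) \big(\mathbb{E}[A(\sigma)] - A(t)\big) > 0,$$
which clearly follows from the assumptions. So we know that Lemma 3.7 holds, i.e. using the names from said lemma we have $\bar\tau \le \tau \le \check\tau$ $\mathbb{P}$-a.s.
Finally, consider the case where $\psi(B_t, t) = \varphi(B_t)$ and μ has finite moment of order $\tfrac{p}{2} + \varepsilon$ for some ε > 0. Most of the proof remains unchanged. Setting $c(\omega, t) = -\varphi(\omega(t))$ we may again use the Burkholder-Davis-Gundy inequalities to show that $c(B, \tau)$ is bounded in $L^{1+\delta}$-norm, independently of the stopping time τ ∼ μ, thereby showing both that Assumption 2.4 is satisfied and that the uniform-integrability condition in Version B of Theorem 3.1 is satisfied.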
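For the first cost function of this proof, c(ω, t) = −ω(t)A(t), the comparison behind Stop-Go pairs reduces to elementary algebra. The following minimal sketch computes the two expected costs ("stop ω, let η go" versus "stop η, let ω go") from the summary quantities E[A(σ)] and E[B̃_σ A(σ)] and returns their difference, which should equal (ω(t) − η(t))(E[A(σ)] − A(t)). The function name `stop_go_gap` and its parameters are hypothetical; passing the two expectations as numbers is a simplification for illustration.

```python
def stop_go_gap(omega_t, eta_t, A_t, mean_A_sigma, mean_BA_sigma=0.0):
    """Cost of (stop omega, continue eta) minus cost of (stop eta, continue omega).

    Cost function: c(path, time) = -path(time) * A(time).
    mean_A_sigma : E[A(sigma)] for the continuation rule sigma
    mean_BA_sigma: E[B_sigma * A(sigma)] for the increment B after time t
    A negative return value means that stopping omega is the better choice.
    """
    stop_omega = -omega_t * A_t - (eta_t * mean_A_sigma + mean_BA_sigma)
    stop_eta = -eta_t * A_t - (omega_t * mean_A_sigma + mean_BA_sigma)
    return stop_omega - stop_eta


# omega lies below eta at time t and A increases after t (E[A(sigma)] > A(t)).
gap = stop_go_gap(omega_t=0.0, eta_t=1.0, A_t=2.0, mean_A_sigma=3.0, mean_BA_sigma=0.5)
```

In this example the gap is negative, so stopping the lower path is cheaper, in line with a downwards barrier. The term `mean_BA_sigma` cancels in the difference, which is why it does not influence the sign.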

Existence of an optimizer
The proof of existence of solutions to the Problem (OptStop) crucially depends on thinking of stopping times as the joint distribution of the process to be stopped and the stopping time. We introduce some concepts to make this precise and give a proof of Theorem 3.1 at the end of this section.
s ] we will mean this function.
Proof Obvious.
Here we use $C_b(X)$ to denote the set of continuous bounded functions from a topological space X to $\mathbb{R}$. The last sentence of the lemma is of course true for any topology on $C([t, \infty))$ for which the map $\omega \mapsto \omega \odot \theta$ is continuous for all θ, but we will only need it for the topology of uniform convergence on compacts.
Given spaces X and Y we will denote the projection from X × Y to X by $\mathrm{proj}_X$ (and similarly for Y). For a measurable map F : X → Y between measure spaces and a measure ν on X we denote the pushforward of ν under F by $F_*(\nu)$.
[…] In this definition the topology on $C([t, \infty))$ is that of uniform convergence on compacts and the topology on [t, ∞) is the usual topology.
Given a distribution ν on C ([t, ∞)) we write We write RST t κ (P) for the set of all ξ ∈ RST t κ with mass 1 and call these the finite randomized stopping times.
In any of these, if we drop the superscript t then we will mean time t = 0, while, if we drop the subscript κ, then we mean that the initial distribution κ = δ 0 , i.e. the Brownian motion to be stopped is started deterministically in 0.
To explain the qualifier finite it may help to imagine that for a non-finite randomized stopping time of mass α < 1, the mass 1 − α which is missing is placed along C([t, ∞)) × {∞}.
The following Lemma 5.3 from [6] shows that the problem (OptStop) is equivalent to the following optimization problem (OptStop') in the sense that a solution of one can be translated into a solution of the other and vice versa. This of course also implies that the values of the two problems are equal, thereby showing that the concrete space ( , G, (G t ) t≥0 , P) has no bearing on this value, as long as Assumptions 1 and 2 are satisfied.
The definition we have given for a randomized stopping time is only the most convenient (for our purposes) of a number of possible equivalent definitions. Although Lemma 5.3 below should provide some intuition on what a randomized stopping time is, the reader may still wish to refer to [6, Theorem 3.8] for the other possible ways of defining randomized stopping times. The first step in connecting condition (5.1), which is one of the equivalent conditions listed in said theorem, to the others, is to notice that (5.1) can be rewritten as […] where $\xi_\omega$ is a disintegration of ξ with respect to $\mathbb{W}^t_\kappa$. This says that the function […] for all bounded continuous G, i.e. that it is a.s. $\mathcal{F}^t_s$-measurable whenever F is supported on [t, s]. A limit argument then shows that $\omega \mapsto \xi_\omega([t, s])$ is a.s. $\mathcal{F}^t_s$-measurable. Again, we refer the reader to [6] for a more detailed exposition.
Proof of Theorem 3.1 We prove Version B of the theorem. Version A is a special case. We show that Problem (OptStop') has a solution. To this end we show that the set $\mathrm{RST}_\lambda(\mu)$ is compact (in the weak topology). From the fact that c is lower semicontinuous and bounded from below in an appropriate sense we then deduce by the Portmanteau theorem that the map $\hat c : \zeta \mapsto \int c \, d\zeta$ is lower semicontinuous and therefore that the infimum $\inf_{\zeta \in \mathrm{RST}_\lambda(\mu)} \hat c(\zeta)$ is attained.
Now for the details. On each of the spaces $C(\mathbb{R}_+)$ and $\mathbb{R}_+$ we are dealing with two topologies, one coming from the Definition 5.2 of randomized stopping times (to wit, the topology of uniform convergence on compacts on the space $C(\mathbb{R}_+)$ and the usual topology on $\mathbb{R}_+$) and one coming from the assumptions in the statement of this theorem. We can equip each of these spaces with the smallest topology which contains the two topologies in question. These are again Polish topologies and they still generate the standard sigma-algebras on the respective spaces. For the remainder of this proof all topological notions are to be understood relative to these topologies. So the topology on $C(\mathbb{R}_+) \times \mathbb{R}_+$ is the product topology of these two topologies, and the weak topology on the space of measures on $C(\mathbb{R}_+) \times \mathbb{R}_+$ is to be understood relative to this product topology. The cost function c of course remains lower semicontinuous and by Lemma 5.1 the functions $(\omega, r) \mapsto F(r)\,(G(\omega) - \mathbb{E}[G \mid \mathcal{F}^0_s](\omega))$ appearing in Definition 5.2 are continuous.
Note that for $\xi \in \mathrm{RST}_\lambda(\mu)$, as μ has mass 1, so must ξ and $(\mathrm{proj}_{C(\mathbb{R}_+)})_*(\xi)$, which together with $(\mathrm{proj}_{C(\mathbb{R}_+)})_*(\xi) \le \mathbb{W}^0_\lambda$ implies $(\mathrm{proj}_{C(\mathbb{R}_+)})_*(\xi) = \mathbb{W}^0_\lambda$. So we deduce
$$\mathrm{RST}_\lambda(\mu) \subseteq \big\{\zeta : (\mathrm{proj}_{C(\mathbb{R}_+)})_*(\zeta) = \mathbb{W}^0_\lambda, \ (\mathrm{proj}_{\mathbb{R}_+})_*(\zeta) = \mu\big\}.$$
The set on the right-hand side is compact by Prokhorov's Theorem and the fact that pushforwards are continuous maps between measure spaces. It remains to show that $\mathrm{RST}_\lambda(\mu)$ is a nonempty closed subset. It is nonempty because the product measure $\mathbb{W}^0_\lambda \otimes \mu \in \mathrm{RST}_\lambda(\mu)$. It is closed because, as noted, the functions $(\omega, r) \mapsto F(r)\,(G(\omega) - \mathbb{E}[G \mid \mathcal{F}^0_s](\omega))$ are continuous.
Now we show that $\hat c$ is lower semicontinuous. The functions $c_N := c \vee (-N)$ are each bounded from below and lower semicontinuous. By the Portmanteau theorem the maps $\hat c_N := \zeta \mapsto \int c_N \, d\zeta$ are lower semicontinuous. On $\mathrm{RST}_\lambda(\mu)$ they converge uniformly to $\hat c$ because
$$\sup_{\zeta \in \mathrm{RST}_\lambda(\mu)} |\hat c_N(\zeta) - \hat c(\zeta)| \le \sup_{\zeta \in \mathrm{RST}_\lambda(\mu)} \int c^- \, 1_{\{c^- > N\}} \, d\zeta,$$
which converges to 0 as N goes to ∞ by the uniform integrability assumption. As a uniform limit of lower semicontinuous functions is again lower semicontinuous we see that $\hat c$ is lower semicontinuous.
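The product measure used for nonemptiness corresponds to a stopping time that ignores the path and is driven only by an independent uniform random variable, as provided by Assumption 2.2. A minimal sketch of this construction for a measure μ with finitely many atoms uses the quantile transform; the function name `independent_stopping_time` and the example atoms and weights are hypothetical and serve only as illustration.

```python
import bisect
from itertools import accumulate


def independent_stopping_time(atoms, weights, u):
    """Quantile transform: map a uniform value u in [0, 1] to an atom of mu.

    atoms   : increasing list of possible stopping times
    weights : probabilities of the atoms (summing to 1)
    If U is uniform on [0, 1] and independent of the Brownian motion, then
    independent_stopping_time(atoms, weights, U) has law mu and does not
    depend on the path, so its joint law with the path is a product measure.
    """
    cdf = list(accumulate(weights))        # cumulative distribution function
    i = bisect.bisect_left(cdf, u)         # smallest index with cdf[i] >= u
    return atoms[min(i, len(atoms) - 1)]   # guard against rounding at u = 1


atoms = [1.0, 2.0, 4.0]
weights = [0.25, 0.5, 0.25]
times = [independent_stopping_time(atoms, weights, u) for u in (0.1, 0.5, 0.9)]
```

Such a stopping time is typically far from optimal; it only shows that the set of admissible randomized stopping times with time-marginal μ is not empty.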

Geometry of the optimizer
This section is devoted to the proof of Theorem 3.6. The proof closely mimics that of Theorem 1.3/Theorem 5.7 in [6]. For the benefit of those readers already familiar with said paper we will first describe the changes required to the proofs there to make them work in our situation and then, for the sake of a more self-contained presentation, indulge in reiterating the main arguments and only citing results from [6] that we can use verbatim.

Sketch of differences in the proof of Theorem 3.6 relative to [6, Theorem 5.7]
Again the strategy is to show that for a larger set $SG^\xi \supseteq SG$ we can find a set Γ ⊆ […] The definition of $SG^\xi$ must of course be adapted analogously to the changes required to the definition of SG. Apart from that the only real changes are to [6, Theorem 5.8]. Whereas previously it was essential that the randomized stopping time $\xi^{r(\omega,s)}$ is also a valid randomized stopping time of the Markov process in question when started at a different time but the same location ω(s), we now need that $\xi^{r(\omega,s)}$ will also be a randomized stopping time of our Markov process when started at the same time s but in a different place. Of course, when we are talking about Brownian motion both are true, but this difference is the reason why in the case of the Skorokhod embedding the right class of processes to generalize the argument to is that of Feller processes, while in our setup we don't need our processes to be time-homogeneous but we do need them to be space-homogeneous. That we are able to plant this "bush" $\xi^{r(\omega,s)}$ in another location is what guarantees that the measure $\xi^\pi_1$ defined in the proof of Theorem 5.8 of [6] is again a randomized stopping time.
Whereas in the Skorokhod case the task is to show that the new better randomized stopping time $\xi^\pi$ embeds the same distribution as ξ, we now have to show that the randomized stopping time we construct has the same distribution as ξ. The argument works along the same lines though: instead of using that $((\omega, s), (\eta, t)) \in SG^\xi$ implies ω(s) = η(t) we now use that $((\omega, s), (\eta, t)) \in SG^\xi$ implies s = t.
We now present the argument in more detail.
As may be clear by now, what we will show is that if $\xi \in \mathrm{RST}_\lambda(\mu)$ is a solution of (OptStop'), then there is a measurable, $(\mathcal{F}^0_t)_{t \ge 0}$-adapted set $\Gamma \subseteq C(\mathbb{R}_+) \times \mathbb{R}_+$ such that $SG \cap (\Gamma^< \times \Gamma) = \emptyset$. Using Lemma 5.3 this implies Theorem 3.6.
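We need to make some preparations. To align the notation with [6] and to make some technical steps easier it is useful to have another characterization of measurable, (F 0 t ) t≥0 -adapted processes and sets. To this end define

Definition 6.1 S := {( f, s) : s ∈ R + , f ∈ C([0, s])} and r : C(R + ) × R + → S, r (ω, t) := (ω [0,t] , t).

r has many right inverses. A simple one extends f to all of R + by keeping it constant after time s, i.e. it maps ( f, s) to (ω, s) with ω(t) := f (t ∧ s). We endow S with the sigma algebra generated by r .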
[6, Theorem 3.2], which is a direct consequence of [15,Theorem IV. 97], asserts that a process X is measurable, (F 0 t ) t≥0 -adapted iff X factors as X = X •r for a measurable function X : Note that r (ω, t) = r (ω , t ) implies (ω, t) θ = (ω , t ) θ and therefore for a set SG ⊆ S × S which is described by an expression almost identical to that in Definition 3.4. Namely we can overload to also be the name for the operation whose first operand is an element of S, such that (ω, t) θ = r (ω, t) θ and note that as c is measurable, (F 0 t ) t≥0 -adapted we can write c = c • r and thus get a cost function c which is defined on S.
Given an optimal ξ ∈ RST λ (μ) we may therefore rephrase our task as having to find a measurable set Γ ⊆ S such that r * (ξ ) is concentrated on Γ and that SG ∩ (Γ^< × Γ) = ∅. Note that for Γ ⊆ S, although r −1 [Γ]^< is not equal to r −1 [Γ^<], we still have

One of the main ingredients of the proof of [6, Theorem 1.3] and of our Theorem 3.6 is a procedure whereby we accumulate many infinitesimal changes to a given randomized stopping time ξ to build a new stopping time ξ π . The guiding intuition for the authors is to picture these changes as replacing certain "branches" of the stopping time ξ by different branches. Some of these branches will actually enter the statement of a somewhat stronger theorem (Theorem 6.8 below), so we begin by describing these. Our way to get a handle on "branches", i.e. infinitesimal parts of a randomized stopping time, is to describe them through a disintegration (wrt W 0 λ ) of the randomized stopping time. We need the following statement from [6], which should also serve to provide more intuition on the nature of randomized stopping times.
Using Lemma 6.2 above let us fix for the rest of this section both ξ ∈ RST λ (μ) and a disintegration (ξ ω ) ω∈C(R + ) with the properties above. Both Definition 6.3 below and Theorem 6.8 implicitly depend on this particular disintegration and we emphasize that whenever we write ξ ω in the following we are always referring to the same fixed disintegration with the properties given in Lemma 6.2. Note that the measurability properties of (ξ ω ) ω∈C(R + ) imply that for any I ⊆ [0, s] we can determine ξ ω (I ) from ω [0,s] alone. For ( f, s) ∈ S we will again overload notation and use ξ ( f,s) to refer to the measure on [0, s] which is equal to (ξ ω ) [0,s] for any ω ∈ C(R + ) such that r (ω, s) = ( f, s).
Here δ s is the Dirac measure concentrated at s. Really, the definition in the case where ξ ( f,s) ([0, s]) = 1 is somewhat arbitrary; it is more of a convenience to avoid partially defined functions. What we will use is that (( f, t), (g, t)) ∈ S × S (again the times have to match) such that either

Definition 6.4 (relative Stop-Go pairs) The set SG ξ consists of all
or any one of the integral on the right hand side equals ∞ 3. either of the integrals is not defined holds. We also define Lemma 6.6 below says that the numbered cases above are exceptional in an appropriate sense and one may consider them a technical detail. Note that when we say (( f, t), (g, t)) ∈ SG ξ we are implicitly saying that ξ ( f,t) ([0, t]) < 1.
Note that the sets SG ξ and SG ξ are measurable (in contrast to SG, which may be more complicated).

Lemma 6.6 [6, Lemma 5.2] Let F : C(R + ) × R + → R be some measurable function for which ∫ F dξ ∈ R. Then the following sets are evanescent.

Theorem 6.8 Assume that ξ is a solution of (OptStop').
Then there is a measurable set Γ ⊆ S such that r * (ξ )(Γ) = 1 and SG ξ ∩ (Γ^< × Γ) = ∅.

Our argument follows [6, Theorem 5.7]. We also need the following two auxiliary propositions, which in turn require some definitions.

Definition 6.9 Let υ be a probability measure on some measure space Y . The set JOIN λ (υ) is the set of all subprobability measures π on (C(

Proposition 6.10 Let ξ be a solution of (OptStop'). Then (r × Id) * (π )(SG ξ ) = 0 for all π ∈ JOIN λ (r * (ξ )).
Here we use × to denote the Cartesian product map, i.e. for sets X i , Y i and functions x 1 ), F 2 (x 2 )). Proposition 6.10 is an analogue of [6,Proposition 5.8] and it is where the material changes compared to [6] take place. We will give the proof at the end of this section. Proposition 6.11 [6,Proposition 5.9] Let (Y, υ) be a Polish probability space and let E ⊆ S × Y be a measurable set. Then the following are equivalent N ) for some evanescent set F ⊆ S and a measurable set N ⊆ Y which satisfies υ(N ) = 0.
Proposition 6.11 is proved in [6] and we will not repeat the proof here.
Proof of Theorem 6.8 Using Proposition 6.10 we see that (r × Id) * (π )(SG ξ ) = 0 for all π ∈ JOIN λ (r * (ξ )). Plugging this into Proposition 6.11 we find an evanescent set F 1 ⊆ S and a set N ⊆ S such that r * (ξ )(N ) = 0 and SG ξ ⊆ ( Defining for any Borel set E ⊆ S the analytic set This shows that S\(N ∪ F > ) has full r * (ξ )-measure. Let be a Borel subset of that set which also has full r * (ξ )-measure. Then which shows SG ξ ∩ ( < × ) = ∅.
Proof Using integral notation instead of the more conventional E, we may write the classical form of the strong markov property as for all bounded measurable G : C(R + ) → R and all bounded F 0 τ -measurable H : Here t is the function which cuts off the initial segment of a path up to time t. From this a simple monotone class argument shows that We may then choose for K (ω, ω) the function F(η, τ (ω)) where the path η is created by cutting off the tail of ω after time τ (ω) and attachingω in its place. Noting the relationship between W τ (ω) x and W τ (ω) 0 we then get . For a fixed y ∈ [0, 1], ω → τ (y, ω) is an (F 0 t ) t≥0 -stopping time, so we may apply the previous equation to these stopping times and integrate over y ∈ [0, 1] to get F(ω, τ (y, ω)) · 1 R + (τ (y, ω)) d(L ⊗ W 0 λ )(y, ω) = F((ω, τ (y, ω)) ˜ , τ (y, ω)) · 1 R + (τ (y, ω)) dW Using the equation for α we see that this is what we wanted to prove. Lemma 6.14 (Gardener's Lemma) Assume that we have ξ ∈ RST λ (P), a measure α on C(R + ) × R + and two families β (ω,t) , are measurable for all Borel D ⊆ C(R + ) × R + and that for all Borel D ⊆ C(R + ) × R + . Then forξ defined by for all bounded measurable F we haveξ ∈ RST λ (P).

Remark 6.15
The intuition behind the Gardener's Lemma is that we are replacing certain branches β (ω,t) of the randomized stopping time ξ by other branches γ (ω,t) to obtain a new stopping timeξ . This process happens along the measure α. Note that (6.6) implies that 1 D ((ω, t) The authors like to think of α as a stopping time and of the maps (ω, t) → β (ω,t) and (ω, t) → γ (ω,t) as adapted (in some sense that would need to be made precise). As these assumptions aren't necessary for the proof of the Gardener's Lemma, they were left out, but it might help the reader's intuition to keep them in mind.
Proof of Lemma 6.14 We need to check that theξ we define is indeed a measure, that (proj C(R + ) ) * (ξ) = W 0 λ and that (5.1) holds forξ . Checking thatξ is a measure is routine-we just note that (6.6) guarantees that ξ(D) ≥ 0 for all Borel D.
Let G : C(R + ) → R be a bounded measurable function.
The first summand is 0 because ξ ∈ RST λ (P). Looking at the second summand we expand the definition of E [G|F 0 r ].
whenever t ≤ r , which is the case for those t which are relevant in the integrand above, because F(s) = 0 implies s ≤ r and moreover β (ω,t) is concentrated on (ω, s) for which t ≤ s.
which is 0 because β (ω,t) ∈ RST t (P) and therefore for all (ω, t) and r ≥ t. The same argument works for the third summand in (6.7).
If π ∈ JOIN λ (r * (ξ )), then for any two measurable sets D 1 , D 2 ⊆ S, because π (C(R + )×R + )×D 2 ∈ RST λ and by making use of Lemma 6.12 we can deduce that Using the monotone class theorem this extends to any measurable subset of S × S in place of D 1 × D 2 . So we can set π := π (r ×Id) −1 [SG ξ ] and know that (proj C(R + )×R + ) * (π ) ∈ RST λ and that π is concentrated on SG ξ . We will be using a disintegration of π wrt r (ξ ), which we call π (g,t) (g,t)∈S and for which we assume that π (g,t) is a subprobability measure for all (g, t) ∈ S. It will also be useful to assume that π (g,t) is concentrated on the set {(ω, s) ∈ C(R + ) × R + : s = t} not just for r (ξ )-almost all (g, t) but for all (g, t). Again this is no restriction of generality. We will also push π onto (C(R + ) × R + ) × (C(R + ) × R + ), defining a measureπ via for all bounded measurable F. Observe that by Lemma 6.13 the pushforward of π under projection onto the second coordinate (pair) is ξ and that a disintegration ofπ wrt to ξ (again in the second coordinate) is given by π r (η,t) (η,t)∈C(R + )×R + . Let us name (proj C(R + )×R + ) * (π ) =: ζ ∈ RST λ . We will now use the Gardener's Lemma to define two modifications ξ π 0 , ξ π 1 of ξ such that ξ π := 1 2 (ξ π 0 + ξ π 1 ) is our improved randomized stopping time.
The concatenation on the last line is well-definedπ -almost everywhere becauseπ is concentrated on (r × r ) −1 SG ξ and so in the integrand above s = t on a set of full measure.
We need to check that the Gardener's Lemma applies in both cases. First of all observe that the product measure W t 0 ⊗δ t is in RST t (P) and that Lemma 6.13 implies for any randomized stopping time α. So for ξ π 0 the measures γ (ω,t) are given by W t 0 ⊗δ t and for ξ π 1 the measures β (ω,t) are given by W t 0 ⊗ δ t . For ξ π 0 the measure along which we are replacing branches is given by The branches β (ω,s) we remove are ξ r (ω,s) . We need to check that for all positive, bounded, measurable F : C(R + ) × R + → R. Let us calculate.
(1 − ξ ω ([0, s])) F((ω, s) ω, u) dξ r (ω,s) (ω, u) dζ(ω, s) Here we first used the definition of ξ r (ω,s) and then Lemma 6.13 and finally that (proj C(R + ) ) * (ζ ) ≤ W 0 λ . For ξ π 1 we replace branches along The calculation above shows that for all positive, bounded, measurable F : C(R + ) × R + → R. For ξ π 1 the branches γ (η,t) that we add are given by t) (ω, s) > 0 and δ t otherwise (again, the latter is arbitrary). In the more interesting case γ (η,t) is an average over elements of RST t (P) and therefore itself in RST t (P). Here it is again crucial that for π r (η,t) -almost all (ω, s) we have s = t, otherwise we would be averaging randomized stopping times of our process started at unrelated times.
We now want to extend (6.8) to c. We first show that (6.8) also holds for F : C(R + ) × R + → R which are measurable and positive and for which ∫ F dξ < ∞. To see this, approximate such an F from below by bounded measurable functions (for which (6.8) holds) and note that by previous calculations both

Looking at positive and negative parts of c and using Assumption 2.4 to see that ∫ c − d(ξ π − ξ) ∈ R we get that indeed (6.8) holds for F = c. Now we will argue that the integrand on the right hand side of (6.8) is negative π̄-almost everywhere. This will conclude the proof.

Variations on the theme
We proceed to prove Corollary 1.2. This is closely modelled on the treatment of the Azéma-Yor embedding in [6, Theorem 6.5]. As is the case there, we run into a technical obstacle, though one which can be overcome by combining the ideas we have already seen in slightly new ways.
To demonstrate the problem let us begin an attempt to prove Corollary 1.2. Again, we read off c(ω, t) = −ω*(t), with ω*(t) = sup s≤t ω(s). We may use Theorem 3.1 to find a solution τ of the problem (OptStop B * t ) and we use Theorem 3.6 to find a set Γ ⊆ C(R + ) × R + for which P[(B, τ ) ∈ Γ] = 1 and SG ∩ (Γ^< × Γ) = ∅. Now we would like to apply Lemma 3.7 with Y t (ω) = ω(t) − ω*(t), as proposed by Corollary 1.2, so we want to prove that ω(t) − ω*(t) < η(t) − η*(t) implies ((ω, t), (η, t)) ∈ SG.

Let us do the calculations. We start with an (F s t ) s≥t -stopping time σ , for which W t 0 (σ = t) < 1, W t 0 (σ = ∞) = 0 and for which both sides in (3.2) are defined and finite. To reduce clutter, let us name (ω → (ω, σ (ω))) * (W t 0 ) =: α, so that (3.2), which we want to prove, reads

−ω*(t) + ∫ ((ω, t) ⊙ θ)*(s) dα(θ, s) < −η*(t) + ∫ ((η, t) ⊙ θ)*(s) dα(θ, s). (7.1)

We may rewrite the left hand side as

∫ (ω(t) + θ*(s) − ω*(t))^+ dα(θ, s),

where x^+ = max(x, 0). For the right hand side we get the same expression with ω replaced by η. Looking at the integrands we see that if

θ*(s) > η*(t) − η(t), (7.2)

then (ω(t) + θ*(s) − ω*(t))^+ < (η(t) + θ*(s) − η*(t))^+, but in the other case both positive parts are 0. So if (7.2) holds for (θ, s) from a set of positive α-measure, then we proved what we wanted to prove. But if θ*(s) ≤ η*(t) − η(t) for α-a.a. (θ, s) then in (3.2) we have equality instead of strict inequality.

As in [6, Theorem 6.5], one way of getting around this is to introduce a secondary optimization criterion. One way to explain the idea of secondary optimization is to think about what happens if, instead of considering a cost function c : C(R + ) × R + → R, we consider a cost function c : C(R + ) × R + → R n . Of course, to be able to talk about optimization, we will then want to have an order on R n . For reasons that should become clear soon, we decide on the lexicographical order.
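The rewriting of the left hand side of (7.1) rests on a pathwise identity for the running maximum of a concatenated path. The following is a minimal numerical sketch of that identity on a time grid, not part of the proof. The helper names `running_max`, `concatenate` and `positive_part` are hypothetical, and the discrete concatenation convention (the continuation θ starts at 0 and is attached at the endpoint of ω) is an assumption made for illustration.

```python
from itertools import accumulate


def running_max(path):
    """Running maximum path*(k) = max(path[0], ..., path[k]) on a grid."""
    return list(accumulate(path, max))


def concatenate(omega, t, theta):
    """Hypothetical discrete concatenation of (omega, t) with theta.

    Assumption: theta[0] == 0, so theta is attached at the value omega[t].
    """
    return omega[: t + 1] + [omega[t] + x for x in theta[1:]]


def positive_part(x):
    """x^+ = max(x, 0)."""
    return max(x, 0.0)


omega = [0.0, 1.0, 0.5, 2.0, 1.0]    # path observed up to grid index t
theta = [0.0, -0.5, 0.75, 1.5, 1.0]  # continuation, started at 0
t = len(omega) - 1

glued_max = running_max(concatenate(omega, t, theta))
omega_max_t = running_max(omega)[t]
theta_max = running_max(theta)

# Check: -omega*(t) + ((omega, t) glued with theta)*(s)
#        == (omega(t) + theta*(s) - omega*(t))^+  for every grid time s.
for s in range(len(theta)):
    lhs = -omega_max_t + glued_max[t + s]
    rhs = positive_part(omega[t] + theta_max[s] - omega_max_t)
    assert abs(lhs - rhs) < 1e-12
```

In this example the excess is 0 until the continuation climbs above ω*(t) − ω(t) = 1, which is the discrete counterpart of the case distinction around (7.2).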
For the case n = 2 that we are actually interested in for Corollary 1.2, the lexicographic order means that (x 1 , x 2 ) ≤ (y 1 , y 2 ) iff x 1 < y 1 , or x 1 = y 1 and x 2 ≤ y 2 .

We claim that Theorem 3.6 is still true if we replace c : C(R + ) × R + → R by c : C(R + ) × R + → R n and read any symbol ≤ which appears between vectors in R n as the lexicographic order on R n (and of course likewise for all the derived symbols and notions <, ≥, >, inf, etc.). Moreover, the arguments are exactly the same. Indeed the crucial part that may deserve some mention is at the end of the proof of Proposition 6.10, where we use the assumption that (6.9) holds on a set of positive measure, i.e. that the integrand is < 0 on a set of positive measure, and that the integrand is 0 outside that set, to conclude that the integral itself must be < 0. This implication is also true for the lexicographical order on R n .

One more detail to be aware of is that integrating functions which map into R 2 may give results of the form (∞, x), (x, −∞), etc. In the case of a one-dimensional cost function we excluded such problems by making Assumption 2.4. What we really want in the proof of Proposition 6.10 is that ∫ c dξ and ∫ c dξ π should be finite. Clearly a sufficient condition to guarantee this is to replace Assumption 2.4 by (4') E[c(B, τ )] ∈ R n for all stopping times τ ∼ μ. This is not the most general version possible but it will suffice for our purposes.
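The implication used at the end of the proof of Proposition 6.10 can be illustrated on a finite measure space. The sketch below is only a discrete sanity check under assumptions of our own choosing: the integrand values, the weights and the helper `weighted_sum` are hypothetical, and the check relies on the fact that Python compares tuples lexicographically.

```python
def weighted_sum(values, weights):
    """Componentwise weighted sum, a discrete stand-in for an R^n-valued integral."""
    n = len(values[0])
    return tuple(sum(w * v[i] for v, w in zip(values, weights)) for i in range(n))


zero = (0.0, 0.0)

# Hypothetical integrand on four atoms: lexicographically negative on two
# atoms of positive weight and equal to 0 on the remaining atoms.
integrand = [(0.0, 0.0), (0.0, -1.0), (0.0, 0.0), (-0.5, 3.0)]
weights = [0.25, 0.25, 0.25, 0.25]

# Tuple comparison in Python is lexicographic.
assert all(v <= zero for v in integrand)
assert any(v < zero for v in integrand)

total = weighted_sum(integrand, weights)
assert total < zero  # the "integral" is lexicographically negative
```

Note that the second component of the total is positive here; the total is still lexicographically negative because its first component is.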
To get an existence result we may assume that c = (c 1 , c 2 ) is component-wise lower semicontinuous and that both c 1 and c 2 are bounded below (in either of the ways described in the two versions of Theorem 3.1). Note that, because we are talking about the lexicographic order, ξ ∈ RST λ (μ) is a solution of (OptStop') for c iff ξ is a solution of (OptStop') for c 1 and, among all such solutions, ξ minimizes ∫ c 2 dξ . By Theorem 3.1 in the form that we have already proved, the set of solutions of (OptStop') for c 1 is non-empty. It is also a closed subset of a compact set and therefore itself compact. This allows us to reiterate the argument that we used in the proof of Theorem 3.1 to find inside this set a minimizer of ξ → ∫ c 2 dξ . This minimizer is the solution of (OptStop') for c.
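The two-stage description of lexicographic minimization can be sketched on a finite candidate set, where compactness is not an issue. This is an illustration only: the candidate labels and cost values are hypothetical, and the function `two_stage_minimizers` is our own name for the procedure described above (minimize the first cost, then minimize the second cost among the first-stage minimizers).

```python
def two_stage_minimizers(candidates, cost1, cost2):
    """Minimize cost1 first, then cost2 among the minimizers of cost1."""
    best1 = min(cost1(x) for x in candidates)
    stage1 = [x for x in candidates if cost1(x) == best1]
    best2 = min(cost2(x) for x in stage1)
    return [x for x in stage1 if cost2(x) == best2]


# Hypothetical candidates with their (c1, c2) cost pairs.
costs = {
    "xi_a": (1.0, 5.0),
    "xi_b": (1.0, 2.0),
    "xi_c": (3.0, 0.0),
    "xi_d": (1.0, 2.5),
}
candidates = list(costs)

staged = two_stage_minimizers(
    candidates,
    cost1=lambda x: costs[x][0],
    cost2=lambda x: costs[x][1],
)

# Direct minimization in the lexicographic order (tuple comparison).
direct = min(candidates, key=lambda x: costs[x])

assert staged == [direct]
```

The candidate with the smallest second cost overall is not selected, because it fails to minimize the first cost; this is the sense in which the second criterion is only a tie-breaker.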
We would like to conclude by giving a couple of pointers to the interested reader who may want to work through the proofs corresponding to the remaining pictures in Fig. 1.
For the problem of minimizing E[B * τ ], it may actually happen that the two times from Lemma 3.7 do not coincide. Specifically one has to expect this to happen on a non-negligible set when Ř contains parts of the time axis which R does not contain. Under these circumstances an optimizer may turn out to be a true randomized stopping time, with a proportion of the paths hitting the time axis at a certain point needing to be stopped while the rest continue. In this situation the picture alone does not completely describe the optimal stopping time.
For the problems involving absolute values one needs to make a minor modification in the proof of Proposition 6.10. Specifically one can allow "mirroring" the paths which are "transplanted" using the Gardener's Lemma. This leads to a slightly different definition of Stop-Go pairs, which is perhaps most easily described by saying that in Fig. 2 the green paths which are stopped by σ may be flipped upside-down on either side.