The geometry of multi-marginal Skorokhod Embedding

The Skorokhod Embedding Problem (SEP) is one of the classical problems in the study of stochastic processes, with applications in many different fields (cf.~ the surveys \cite{Ob04,Ho11}). Many of these applications have natural multi-marginal extensions leading to the \emph{(optimal) multi-marginal Skorokhod problem} (MSEP). Some of the first papers to consider this problem are \cite{Ho98b, BrHoRo01b, MaYo02}. However, this problem turns out to be difficult using existing techniques: only recently was a complete solution obtained in \cite{CoObTo15}, establishing an extension of the Root construction, while other instances are only partially answered or remain wide open. In this paper, we extend the theory developed in \cite{BeCoHu14} to the multi-marginal setup, in a way comparable to the extension of the optimal transport problem to the multi-marginal optimal transport problem. As in the one-marginal case, this viewpoint turns out to be very powerful. In particular, we are able to show that all classical optimal embeddings have natural multi-marginal counterparts. Notably, these different constructions are linked through a joint geometric structure, and the classical solutions are recovered as particular cases. Moreover, our results also have consequences for the study of the martingale transport problem as well as the peacock problem.


Introduction
The Skorokhod Embedding problem (SEP) is a classical problem in probability, dating back to the 1960s [58,59]. Simply stated, the aim is to represent a given probability measure as the distribution of Brownian motion at a chosen stopping time. Recently, motivated by applications in probability, mathematical finance, and numerical methods, there has been renewed, sustained interest in solutions to the SEP (cf. the two surveys [36,50]) and its multi-marginal extension, the multi-marginal SEP: Given marginal measures $\mu_0, \dots, \mu_n$ of finite variance and a Brownian motion with $B_0 \sim \mu_0$, construct stopping times $\tau_1 \le \dots \le \tau_n$ such that

$$B_{\tau_i} \sim \mu_i \ \text{ for all } 1 \le i \le n \quad \text{and} \quad \mathbb{E}[\tau_n] < \infty. \tag{MSEP}$$

It is well known that a solution to (MSEP) exists iff the marginals are in convex order ($\mu_0 \preceq_c \dots \preceq_c \mu_n$) and have finite second moment; under this condition, Skorokhod's original results give the existence of solutions of the induced one-period problems, which can then be pasted together to obtain a solution to (MSEP). It appears to be significantly harder to develop genuine extensions of one-period solutions: many of the classical solutions to the SEP exhibit additional desirable characteristics and optimality properties which one would like to extend to the multi-marginal case. However, the original derivations of these solutions make significant use of the particular structure inherent to certain problems, often relying on explicit calculations, which makes extensions difficult if not impossible. The first paper we are aware of that attempts to extend a classical construction to the multi-marginal setting is [12], which generalised the Azéma-Yor embedding [1] to the case of two marginals. This work was further extended by Henry-Labordère et al. [30,52], who were able to treat arbitrarily (finitely) many marginals under particular assumptions on the measures. Using an extension of the stochastic control approach in [25], Claisse et al. [15] constructed a two-marginal extension of the Vallois embedding. Recently, Cox et al. [18] were able to characterise the solution to the general multi-marginal Root embedding through the use of an optimal stopping formulation.
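For finitely supported marginals, the convex-order condition above can be checked mechanically. The following Python sketch is a hypothetical helper, not part of the paper's method: it uses the standard characterisation that $\mu \preceq_c \nu$ iff both measures have the same mean and $\int (x-k)^+\,d\mu \le \int (x-k)^+\,d\nu$ for every $k$. For discrete measures both sides are piecewise linear in $k$ with kinks only at atoms, so it suffices to compare them at the atoms. Exact `Fraction` weights avoid rounding issues, and finite second moments are automatic here.

```python
from fractions import Fraction
from typing import List, Tuple

Measure = List[Tuple[Fraction, Fraction]]  # (atom location, weight) pairs, weights sum to 1


def mean(mu: Measure) -> Fraction:
    return sum((w * x for x, w in mu), Fraction(0))


def call_function(mu: Measure, k: Fraction) -> Fraction:
    """Integral of (x - k)^+ against the discrete measure mu."""
    return sum((w * max(x - k, Fraction(0)) for x, w in mu), Fraction(0))


def in_convex_order(mu: Measure, nu: Measure) -> bool:
    """Return True iff mu <=_c nu for finitely supported mu, nu."""
    if mean(mu) != mean(nu):
        return False
    # Both call functions are piecewise linear in k with kinks only at atoms;
    # with equal means they coincide below and above all atoms, so comparing
    # them at every atom of mu and nu is sufficient.
    strikes = {x for x, _ in mu} | {x for x, _ in nu}
    return all(call_function(mu, k) <= call_function(nu, k) for k in strikes)


def increasing_in_convex_order(marginals: List[Measure]) -> bool:
    """Check mu_0 <=_c mu_1 <=_c ... <=_c mu_n, the solvability condition for (MSEP)."""
    return all(in_convex_order(a, b) for a, b in zip(marginals, marginals[1:]))
```

For instance, $\delta_0 \preceq_c \frac12(\delta_{-1}+\delta_1) \preceq_c \frac14\delta_{-2}+\frac12\delta_0+\frac14\delta_2$, so (MSEP) is solvable for this triple, while the reverse ordering of the last two measures fails.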
Mass transport approach and general multi-marginal embedding
In this paper, we develop a new approach to the multi-marginal Skorokhod problem, based on insights from the field of optimal transport.
Following the seminal paper of Gangbo and McCann [26], the mutual interplay of optimality and geometry of optimal transport plans has been a cornerstone of the field. As shown for example in [14,45,53], this is not limited to the two-marginal case but extends to the multi-marginal case, although the analysis there turns out to be much harder. Recently, similar ideas have been shown to carry over to more probabilistic contexts: to optimal transport problems satisfying additional linear constraints [7,28,60] and, in fact, to the classical Skorokhod embedding problem [2].
Building on these insights, we extend the mass transport viewpoint developed in [2] to the multi-marginal Skorokhod embedding problem. This allows us to give multi-marginal extensions of all the classical optimal solutions to the Skorokhod problem in full generality, which we illustrate through several examples. In particular, the classical solutions of Azéma-Yor, Root, Rost, Jacka, Perkins, and Vallois can be recovered as special cases. In addition, the approach allows us to derive a number of new solutions to (MSEP) which have further applications to, e.g., martingale optimal transport and the peacock problem. A main contribution of this paper is to show that, in many different cases, solutions to the multi-marginal SEP share a common geometric structure. In all the cases we consider, this geometric information is in fact enough to characterise the optimiser uniquely, which highlights the flexibility of our approach.
Furthermore, our approach to the Skorokhod embedding problem is very general and does not rely on fine properties of Brownian motion. Therefore, exactly as in [2], the results of this article carry over to sufficiently regular Markov processes, e.g. geometric Brownian motion, the three-dimensional Bessel process, Ornstein-Uhlenbeck processes, and Brownian motion in $\mathbb{R}^d$ for $d > 1$. As the arguments are precisely the same as in [2], we refer to [2, Section 8] for details.
Related work
Interest in the multi-marginal Skorokhod problem comes from a number of directions, and we describe some of these here:

• Maximising the running maximum: the Azéma-Yor embedding
Suppose $(M_t)_{t\ge0}$ is a martingale and write $\bar M_t := \sup_{s\le t} M_s$. The relationship between the laws of $M_1$ and $\bar M_1$ has been studied by Blackwell and Dubins [10], Dubins and Gilat [23] and Kertz and Rösler [44], culminating in a complete classification of all possible joint laws by Rogers [55]. In particular, given the law of $M_1$, the set of possible laws of $\bar M_1$ admits a maximum w.r.t. the stochastic ordering; this can be seen through the Azéma-Yor embedding. Given initial and terminal laws of the martingale, Hobson [35] gave a sharp upper bound on the law of the maximum based on an extension of the Azéma-Yor embedding to Brownian motion started according to a non-trivial initial law. These results are further extended in [12] to the case of martingales started in 0 and constrained to a specified marginal at an intermediate time point, essentially based on a further extension of the Azéma-Yor construction. The natural aim is to solve this question in the case of arbitrarily many marginals. Assuming that the marginals have ordered barycenter functions, this case is included in the work of Madan and Yor [47], based on iterating the Azéma-Yor scheme. More recently, the stochastic control approach of [25] (for one marginal) has been extended by Henry-Labordère et al. [30,52] to marginals in convex order satisfying an additional assumption ([52, Assumption ] 1 ). Together with the Dambis-Dubins-Schwarz Theorem, Theorem 2.11 below provides a solution to this problem in full generality.

• Multi-marginal Root embedding
In a now classical paper, Root [56] showed that for any centred distribution $\mu$ with finite second moment, there exists a (right) barrier $R$, i.e. a Borel subset of $\mathbb{R}_+ \times \mathbb{R}$ such that $(t, x) \in R$ implies $(s, x) \in R$ for all $s \ge t$, for which
$$B_{\tau_R} \sim \mu, \qquad \tau_R = \inf\{t : (t, B_t) \in R\}.$$
This work was further generalised to a large class of Markov processes by Rost [57], who also showed that this construction is optimal in that it minimises $\mathbb{E}[h(\tau)]$ for convex functions $h$.
More recent work on the Root embedding has focused on attempts to characterise the stopping region. A number of papers do this either through analytical means [16,17,27,48] or through connections with optimal stopping problems [19].
Recently the connection to optimal stopping problems has enabled Cox et al. [18] to extend these results to the multi-marginal setting. Moreover, they prove that this solution enjoys a similar optimality property to the one-marginal Root solution. The principal strategy is to first prove the result in the case of locally finitely supported measures by means of a time reversal argument. The proof is then completed in the case of general measures by a delicate limiting procedure. As a consequence of the theoretical results in this paper, we will be able to prove similar results. In particular, the barrier structure as well as the optimality properties are recovered in Theorem 2.7. Indeed, as we will show below, the particular geometric structure of the Root embedding turns out to be archetypal for a number of multi-marginal counterparts of classical embeddings.

• Model-independent finance
An important application field for the results in this paper, and one of the motivating factors behind the recent resurgence of interest in the SEP, is model-independent finance. In mathematical finance, one models the price process $S$ as a martingale under a risk-neutral measure, and specifying the prices of all call options with maturity $T$ is equivalent to fixing the distribution $\mu$ of $S_T$. Understanding no-arbitrage price bounds for a functional $\gamma$ can often be seen to be equivalent to finding the range of $\mathbb{E}[\gamma((B_t)_{t\le\tau})]$ over all solutions $\tau$ of the Skorokhod embedding problem for $\mu$. This link between the SEP and model-independent pricing and hedging was pioneered by Hobson [35] and has remained an important question ever since. A comprehensive overview is given in [36]. However, the above approach uses only market data for the maturity time $T$, while in practice market data for many intermediate maturities may also be available; this corresponds to the multi-marginal SEP. While we do not pursue this direction of research in this article, we emphasise that our approach yields a systematic method to address this problem. In particular, the general framework of super-replication results for model-independent finance now includes a number of important contributions, see [3,22,29,40], and most of these papers allow for information at multiple intermediate times.

• Martingale optimal transport
Optimal transport problems where the transport plan must satisfy additional martingale constraints have recently been investigated, e.g. in the works of Dolinsky, Ekren, Galichon, Ghoussoub, Henry-Labordère, Hobson, Juillet, Kim, Lim, Nutz, Obłój, Soner, Tan, Touzi in [4,7–9,13,22,24,25,38]. Besides having a natural interpretation in finance, such martingale transport problems are also of independent mathematical interest; for example, similarly to classical optimal transport, they have consequences for the investigation of martingale inequalities (see e.g. [11,30,52]). As observed in [5], one can gain insight into the martingale transport problem between two probabilities $\mu_1$ and $\mu_2$ by relating it to a Skorokhod embedding problem, which may be considered as a continuous-time version of the martingale transport problem. Notably, this idea can be used to recover the known solutions of the martingale optimal transport problem in a unified fashion [41].
It thus seems natural that an improved understanding of an n-marginal martingale transport problem can be obtained based on the multi-marginal Skorokhod embedding problem. Indeed this is exemplified in Theorem 2.17 below, where we use a multi-marginal embedding to establish an n-period version of the martingale monotone transport plan, and recover similar results to recent work of Nutz et al. [49].

• Construction of peacocks
Dating back to the work of Madan and Yor [47], and studied in depth in the book of Hirsch et al. [33], the setting is as follows: given a family of probability measures $(\mu_t)_{t\in[0,T]}$ which are increasing in convex order, a peacock (from the acronym PCOC, "Processus Croissant pour l'Ordre Convexe") is a martingale $(M_t)_{t\in[0,T]}$ with $M_t \sim \mu_t$ for all $t \in [0,T]$. The existence of such a process is granted by Kellerer's celebrated theorem, and typically there is an abundance of such processes. Loosely speaking, the peacock problem is to give constructions of such martingales. Often such constructions are based on Skorokhod embedding or particular martingale transport plans, and often one is further interested in producing solutions with some additional optimality properties; see for example the recent works [31,37,42,43]. Given the intricacies of multi-period martingale optimal transport and Skorokhod embedding, it is necessary to make additional assumptions on the underlying marginals, and desired optimality properties are in general not preserved in a straightforward way during the inherent limiting/pasting procedure. We expect that an improved understanding of the multi-marginal Skorokhod embedding problem will provide a first step towards tackling this range of problems in a systematic fashion.

Outline of the paper
We will proceed as follows. In Sect. 2.1, we will describe our main results. Our main technical tool is a 'monotonicity principle', Theorem 2.5. This result allows us to deduce the geometric structure of optimisers. Having stated this result, and defined the notion of 'stop-go pairs', which are the mathematical embodiment of the notion of 'swapping' stopping rules for a candidate optimiser, we will be able to deduce our main consequential results. Specifically, we will prove the multi-marginal generalisations of the Root, Rost and Azéma-Yor embeddings, using their optimality properties as a key tool in their construction. The Rost construction is entirely novel, and our Azéma-Yor embedding generalises existing results, which have only previously been given under a stronger assumption on the measures. We also give a multi-marginal generalisation of an embedding due to Hobson and Pedersen; this is, in some sense, the counterpart of the Azéma-Yor embedding. Classically, this counterpart is better recognised as the embedding of Perkins [54]; however, for reasons we give later, this embedding has no multi-marginal extension. Moreover, the proofs of these results share a common structure, and it will be clear how to generalise these methods to provide similar results for a number of other classical solutions to the SEP. In Sect. 2.1, we also use our methods to give a multi-marginal martingale monotone transport plan, using a construction based on an SEP viewpoint.
The remainder of the paper is then dedicated to proving the main technical result, Theorem 2.5. In Sect. 3, we introduce our technical setup and prove some preliminary results. As in [2], it will be important to consider the class of randomised multi-stopping times; we define these in this section and derive a number of useful properties. It is technically convenient to consider randomised multi-stopping times on a canonical probability space with sufficient additional randomisation, independent of the Brownian motion; however, we will prove in Lemma 3.11 that any sufficiently rich probability space suffices. A key property of the set of randomised multi-stopping times embedding a given sequence of measures is that this set is compact in an appropriate (weak) topology, which will be proved in Proposition 3.19; an important consequence is that optimisers of the multi-marginal SEP exist under relatively mild assumptions on the objective (Theorem 2.1).
In Sect. 4 we introduce the notions of colour-swap pairs and multi-colour-swap pairs. These will be the fundamental constituents of the set of 'bad pairs', i.e. combinations of stopped and running paths that we do not expect to see in optimal solutions. In this section we define these pairs and prove some technical properties of the corresponding sets.
In Sect. 5 we complete the proof of Theorem 2.5. In spirit this follows the proof of the corresponding result in [2], and we only provide details where the proof needs to be adapted to the multi-marginal setting.

Frequently used notation
• The set of Borel (sub-)probability measures on a topological space X is denoted by P(X) / P ≤1 (X).
of length d.
• The d-dimensional Lebesgue measure will be denoted by $\mathcal{L}^d$.
• For a measure $\xi$ on $X$ we write $f(\xi)$ for the push-forward of $\xi$ under $f : X \to Y$.
• We use $\xi(f)$ as well as $\int f \, d\xi$ to denote the integral of a function $f$ against a measure $\xi$.
• $C_x(\mathbb{R}_+)$ denotes the continuous functions starting at $x$; $C(\mathbb{R}_+) = \bigcup_{x\in\mathbb{R}} C_x(\mathbb{R}_+)$.
the usual augmentation of $(\mathcal{F}^0_t \otimes \mathcal{B}([0,1]^d))_{t\ge0}$. To keep notation manageable, we suppress d from the notation since the precise number will always be clear from the context.
• X is a Polish space equipped with a Borel probability measure m. We set X := t≥0 . Again, we suppress d from the notation since the precise number will always be clear from the context.
• The set of stopped paths started at 0 is denoted by
$$S := \{(f, s) : s \ge 0,\ f : [0, s] \to \mathbb{R} \text{ is continuous},\ f(0) = 0\},$$
and we define $r : C_0(\mathbb{R}_+) \times \mathbb{R}_+ \to S$, $(\omega, t) \mapsto (\omega_{|[0,t]}, t)$, together with its analogue $r_X$ for paths started in $X$, i.e. $r_X = (\mathrm{Id}, r)$.
• We use $\oplus$ for the concatenation of paths: depending on the context, the arguments may be elements of $S$, where $Y$ is either $S$ or $C_0(\mathbb{R}_+) \times \mathbb{R}_+$, and $Z$ may be any of the three spaces. For example, if $(f, s) \in S$ and $\omega \in C_0(\mathbb{R}_+)$, then $(f, s) \oplus \omega$ is the path
$$\big((f, s) \oplus \omega\big)(t) := \begin{cases} f(t), & t \le s,\\ f(s) + \omega(t - s), & t > s. \end{cases} \tag{1.1}$$
• As well as the simple concatenation of paths, we introduce a concatenation operator which keeps track of the concatenation time:
$$(f, s) \otimes (g, t) := (f \oplus g, s, s + t).$$
We denote the set of elements of this form by $S^{\otimes 2}$, and inductively define $S^{\otimes i}$ in the same manner.
• Elements of $S^{\otimes i}$ will usually be denoted by $(f, s_1, \dots, s_i)$ or $(g, t_1, \dots, t_i)$. We define $r_i$: Accordingly, the set of i-times stopped paths started in $X$ is $S^{\otimes i}_X = (X, S^{\otimes i})$. Elements of $S^{\otimes i}_X$ are usually denoted by $(x, f, s_1, \dots, s_i)$ or $(y, g, t_1, \dots, t_i)$. In the case $X = \mathbb{R}$ we often simply write $(f, s_1, \dots, s_i)$ or $(g, t_1, \dots, t_i)$ with the understanding that $f(0), g(0) \in \mathbb{R}$. When there is no danger of confusion we will also sometimes write $S^{\otimes i}_{\mathbb{R}} = S^{\otimes i}$. The operators $\oplus$, $\otimes$ generalise in the obvious way to allow elements of $S^{\otimes i}_X$ to the left of the operator.
• For $(x, f, s_1, \dots, s_i) \in S^{\otimes i}_X$ and $(h, s) \in S$ we often denote their concatenation by $(x, f, s_1, \dots, s_i)|(h, s)$, which is the same element as $(x, f, s_1, \dots, s_i) \otimes (h, s)$ but comes with the probabilistic interpretation of conditioning on the continuation of $(f, s_1, \dots, s_i)$ by $(h, s)$. In practice, this means that we will typically expect $(h, s)$ to be absorbed by a later $\oplus$ operation.
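As a sanity check on the concatenation conventions, the following sketch works with discretised paths: a stopped path $(f, s)$ is stored as the list of its values on a uniform grid (so $s$ is encoded by the number of grid steps), and the functions `oplus` and `otimes` are hypothetical stand-ins for $\oplus$ and $\otimes$, not code from the paper. As for $\omega \in C_0(\mathbb{R}_+)$ above, the appended path is assumed to start at 0.

```python
from typing import List, Tuple

Path = List[float]  # samples f(0), f(dt), ..., f(s) on a uniform grid


def oplus(f: Path, g: Path) -> Path:
    """(f, s) ⊕ (g, t): follow f up to time s, then add the increments of g."""
    if g[0] != 0:
        raise ValueError("the appended path must start at 0")
    return f + [f[-1] + v for v in g[1:]]


def otimes(stopped: Tuple[Path, List[int]], g: Path) -> Tuple[Path, List[int]]:
    """(f, s_1, ..., s_i) ⊗ (g, t) = (f ⊕ g, s_1, ..., s_i, s_i + t).

    Times are grid indices, and s_i must be the final index of f.
    """
    f, times = stopped
    if times[-1] != len(f) - 1:
        raise ValueError("the last stopping time must be the end of the path")
    t = len(g) - 1
    return oplus(f, g), times + [times[-1] + t]
```

Applying `otimes` repeatedly produces elements of $S^{\otimes 2}, S^{\otimes 3}, \dots$: each application appends one new stopping time, equal to the previous one plus the duration of the appended piece.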

Existence and monotonicity principle
In this section we present our key results and provide an interpretation in probabilistic terms. To move closer to classical probabilistic notions, in this section we slightly deviate from the notation used in the rest of the article. We consider a Brownian motion $B$ on some generic probability space and recall that, for each $1 \le i \le n$,
$$S^{\otimes i} = \{(f, s_1, \dots, s_i) : 0 \le s_1 \le \dots \le s_i,\ (f, s_i) \in S\}.$$
We note that $S^{\otimes i}$ carries a natural Polish topology. For a Borel function $\gamma : S^{\otimes n} \to \mathbb{R}$ and a sequence $(\mu_i)_{i=0}^n$ of centred probability measures on $\mathbb{R}$, increasing in convex order, we are interested in the optimization problem
$$P_\gamma = \inf\{\mathbb{E}[\gamma((B_s)_{s\le\tau_n}, \tau_1, \dots, \tau_n)] : \tau_1 \le \dots \le \tau_n,\ B_{\tau_i} \sim \mu_i \text{ for } 1 \le i \le n,\ \mathbb{E}[\tau_n] < \infty\}. \tag{OptMSEP}$$
We denote the set of all minimizers of (OptMSEP) by $\mathrm{Opt}_\gamma$. Take another Borel measurable function $\gamma_2 : S^{\otimes n} \to \mathbb{R}$. We will also be interested in the secondary optimization problem
$$P_{\gamma_2|\gamma} = \inf\{\mathbb{E}[\gamma_2((B_s)_{s\le\tau_n}, \tau_1, \dots, \tau_n)] : (\tau_1, \dots, \tau_n) \in \mathrm{Opt}_\gamma\}. \tag{OptMSEP$_2$}$$
Both optimization problems, (OptMSEP) and (OptMSEP$_2$), do not depend on the particular choice of the underlying probability space, provided that $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge0}, \mathbb{P})$ is sufficiently rich that it supports a Brownian motion $(B_t)_{t\ge0}$ starting with law $\mu_0$ and an independent, uniformly distributed random variable $Y$ which is $\mathcal{F}_0$-measurable (see Lemma 3.11). From now on we assume that we are working in this setting. On this space, we denote the filtration generated by the Brownian motion by $\mathcal{F}^B$. Many of the assumptions imposed on the problem can be weakened. First, the assumption that $\mathbb{E}[\tau_n] < \infty$ can be weakened, and the class of measures considered can then be extended to the class of probability measures with a finite first moment. More generally, the class of processes can be extended to include e.g. diffusions. Since all the arguments are identical to those in the single-marginal setting, we do not work in this generality in this paper, but rather restrict our consideration to the case outlined above. For further details of how to extend the arguments, we refer the reader to [2, Section 7].
We will prove this result in Sect. 3.3. Our main result is the monotonicity principle, Theorem 2.5, which is a geometric characterisation of optimizers $\tau = (\tau_1, \dots, \tau_n)$ of (OptMSEP$_2$). The version we state here is weaker than the result we will prove in Sect. 5, but easier to formulate and still sufficient for our intended applications.
For two families of increasing stopping times (σ j ) n j=i and (τ j ) n j=i with τ i = 0 we define Note that (σ j ) n j=i and (τ j ) n j=i are again two families of increasing stopping times, sinceτ and similarly forσ j .

Example 2.2
To illustrate this construction, consider the following sequences of stopping times for Brownian motion started at $B_0 = 0$. Let $\sigma_j = H_{\pm(j+1)} := \inf\{t \ge 0 : |B_t| \ge j + 1\}$ and $\tau_j = j$. The idea is that we want to construct a new sequence $(\tilde\sigma_j)$ which 'starts' with $\tau_0$, but reverts to the original $(\sigma_j)$ sequence as soon as possible. Correspondingly, we wish to construct the sequence $(\tilde\tau_j)$ which starts like $(\sigma_j)$, but reverts to $(\tau_j)$ as soon as possible. As above, $k = \inf\{j \ge i : \tau_{j+1} \ge \sigma_j\}$ is the first index $j$ (if any) at which $B$ reaches $\pm(j+1)$ no later than time $j+1$. If this never happens, then the two sequences simply swap. If the sequences do switch back, then the construction gives:
Note in particular that with this swap, the $\tilde\sigma$ stopping times stop instantly, while the $\tilde\tau$ times no longer stop at time 0.
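The switching index $k$ of Example 2.2 can be evaluated along a single sampled path. In this hypothetical sketch the path is given on a grid with step `dt`, the hitting time $H_{\pm(j+1)}$ is approximated by the first grid time with $|B| \ge j+1$, $\tau_j = j$, and the function returns the first $j \ge i$ with $j + 1 \le n$ and $\tau_{j+1} \ge \sigma_j$, or `None` if the two sequences never switch back on the sampled path. The helper names are illustrative only.

```python
import math
from typing import List, Optional


def exit_time(path: List[float], dt: float, level: float) -> float:
    """First grid time with |B| >= level; math.inf if it does not happen on the grid."""
    for idx, b in enumerate(path):
        if abs(b) >= level:
            return idx * dt
    return math.inf


def switch_index(path: List[float], dt: float, i: int, n: int) -> Optional[int]:
    """k = inf{j >= i : tau_{j+1} >= sigma_j}, with sigma_j = H_{±(j+1)} and tau_j = j.

    Only indices with j + 1 <= n are considered; None means no switch back.
    """
    for j in range(i, n):
        sigma_j = exit_time(path, dt, j + 1)
        tau_next = float(j + 1)
        if tau_next >= sigma_j:
            return j
    return None
```

For a path that stays in $(-1, 1)$ up to time $1.5$ and then jumps to $2.5$, the first condition $\tau_1 = 1 \ge \sigma_0 = 1.5$ fails but $\tau_2 = 2 \ge \sigma_1 = 1.5$ holds, so $k = 1$.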
whenever both sides are well defined and the left hand side (of (2.4)) is finite.
Remark 2.6 (1) We will also consider ternary or j-ary optimization problems given j Borel measurable functions $\gamma_1, \dots, \gamma_j : S^{\otimes n} \to \mathbb{R}$, leading to ternary or j-ary i-th stop-go pairs $SG_{i,3}, \dots, SG_{i,j}$ for $1 \le i \le n$, the notion of $\gamma_j|\dots|\gamma_1$-monotone sets and a corresponding monotonicity principle. To save (digital) trees we leave it to the reader to write down the corresponding definitions. (2) Intuitively, the sets $\Gamma_i$ in Definition 2.4 could simply be defined to be the projections of $\Gamma_n$ onto $S^{\otimes i}$; however, this would not guarantee measurability of the sets $\Gamma_i$. Hence we need a slightly more involved statement of Theorem 2.5.

The n-marginal Root embedding
The classical Root embedding [56] establishes the existence of a barrier (or right-barrier) $R \subseteq \mathbb{R}_+ \times \mathbb{R}$ such that the first hitting time of $R$ solves the Skorokhod embedding problem. A barrier $R$ is a Borel set such that $(s, x) \in R \Rightarrow (t, x) \in R$ for all $t > s$. Moreover, the Root embedding has the property that it minimises $\mathbb{E}[h(\tau)]$ for a strictly convex function $h : \mathbb{R}_+ \to \mathbb{R}$ over all solutions to the Skorokhod embedding problem, cf. [57]. We will show that there is a unique n-marginal Root embedding in the sense that there are n barriers $(R_i)_{i=1}^n$ such that for each $i \le n$ the first hitting time of $R_i$ after hitting $R_{i-1}$ embeds $\mu_i$.

Theorem 2.7 (n-marginal Root embedding, cf. [18]) Put $\gamma_i : S^{\otimes n} \to \mathbb{R}$, $(f, s_1, \dots, s_n) \mapsto h(s_i)$ for some strictly convex function $h : \mathbb{R}_+ \to \mathbb{R}$ and assume that (OptMSEP) is well posed. Then there exist n barriers $(R_i)_{i=1}^n$ such that, defining $\tau^{Root}_0 := 0$ and
$$\tau^{Root}_i := \inf\{t \ge \tau^{Root}_{i-1} : (t, B_t) \in R_i\},$$
the multi-stopping time $(\tau^{Root}_1, \dots, \tau^{Root}_n)$ minimises $\mathbb{E}[h(\tilde\tau_i)]$ simultaneously for all $1 \le i \le n$ among all increasing families of stopping times $\tilde\tau_1 \le \dots \le \tilde\tau_n$ satisfying $B_{\tilde\tau_j} \sim \mu_j$ for all $1 \le j \le n$.

Hence for every i ≤ n we have P-a.s.
We claim that, for all 1 ≤ i ≤ n we have . . , t i ) ∈ S ⊗i satisfying s i > t i and consider two families of stopping times (σ j ) n j=i and (τ j ) n j=i on some probability space ( , F , P) together with their modifications (σ j ) n j=i and (τ j ) n j=i as in Sect. 2.1. Put and inductively for 1 < a ≤ n − i + 1 Let l = arg min{a : P[σ j a =σ j a ] > 0}. By the definition ofσ j andτ j we have in case of j l = i the equality {σ j l =σ j l } = and for j l > i it holds that As τ k ≤ τ k+1 , in particular, we have on {σ j l =σ j l } the inequality σ k > τ k for every i ≤ k ≤ j l . The strict convexity of h and s > t implies Hence, we get a strict inequality in (the corresponding κ −1 ( j l )-ary version of) (2.4) and the claim is proven.
Following the argument in the proof of Theorem 2.1 in [2], we define $\tau^1_{cl}$ and $\tau^1_{op}$ to be the first hitting times of $R^1_{cl}$ and $R^1_{op}$ respectively, and see that in fact $\tau^1_{cl} \le \tau^{Root}_1 \le \tau^1_{op}$ and $\tau^1_{cl} = \tau^1_{op}$ a.s. by the strong Markov property. Then we can proceed inductively and define By the very same argument we see that Finally, we need to show that the choice of the permutation $\kappa$ does not matter. This follows from a straightforward adaptation of the argument of Loynes [46] (see also [2, Remark 2.3] and [18, Proof of Lemma 2.4]) to the multi-marginal setup. Indeed, the first barrier $R_1$ is unique by Loynes' original argument. This implies that the second barrier is unique, because Loynes' argument is valid for a general starting distribution of the process $(t, B_t)$ in $\mathbb{R}_+ \times \mathbb{R}$, and we can conclude by induction.

Remark 2.8 (1) In the last theorem, the result stays the same if we take different strictly convex functions $h_i$ for each $i$. (2) Moreover, it is easy to see that the proof is simplified if one starts with the objective $\sum_{i=1}^n h_i(\tau_i)$, which removes the need for taking an arbitrary permutation of the indices at the start. Of course, to get the more general conclusion, one needs to consider these permutations.
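To make the structure of the n-marginal Root embedding concrete, the sketch below (hypothetical names, discretised time) represents each closed right-barrier as $R_i = \{(t, x) : t \ge r_i(x)\}$ for a function $r_i$ with values in $[0, \infty]$, and computes $\tau_i$ as the first grid time at or after $\tau_{i-1}$ at which $(t, B_t) \in R_i$. The barrier functions used in the usage example are arbitrary illustrations; the optimal barriers of Theorem 2.7 are only characterised implicitly and are not computed here.

```python
import math
from typing import Callable, List

# A closed right-barrier R = {(t, x) : t >= r(x)}: once (t, x) lies in R,
# so does (t', x) for every t' > t.
BarrierFn = Callable[[float], float]


def sequential_hitting_times(path: List[float], dt: float,
                             barriers: List[BarrierFn]) -> List[float]:
    """tau_0 = 0 and tau_i = first grid time t >= tau_{i-1} with t >= r_i(B_t)."""
    taus: List[float] = []
    start = 0  # grid index of tau_{i-1}
    for r in barriers:
        hit = math.inf
        for idx in range(start, len(path)):
            t = idx * dt
            if t >= r(path[idx]):
                hit, start = t, idx
                break
        taus.append(hit)
        if hit == math.inf:
            # On a finite grid the remaining barriers cannot be reached either.
            taus.extend([math.inf] * (len(barriers) - len(taus)))
            break
    return taus
```

For example, with $r_1(x) = 0$ for $|x| \ge 1$ (and $\infty$ otherwise) and $r_2 \equiv 4$, a path reaching level 1 at time 2 gives $\tau_1 = 2$ and $\tau_2 = 4$.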

Corollary 2.9
Let $h : \mathbb{R}_+ \to \mathbb{R}$ be a strictly convex function and let γ : among all increasing families of stopping times $\tilde\tau_1 \le \dots \le \tilde\tau_n$ satisfying $B_{\tilde\tau_i} \sim \mu_i$ for all $1 \le i \le n$.

The n-marginal Rost embedding
The classical Rost embedding [57] establishes the existence of an inverse barrier (or left-barrier) $R \subseteq \mathbb{R}_+ \times \mathbb{R}$ such that the first hitting time of $R$ solves the Skorokhod embedding problem. An inverse barrier $R$ is a Borel set such that $(t, x) \in R \Rightarrow (s, x) \in R$ for all $s < t$. Moreover, the Rost embedding has the property that it maximises $\mathbb{E}[h(\tau)]$ for a strictly convex function $h : \mathbb{R}_+ \to \mathbb{R}$ over all solutions to the Skorokhod embedding problem, cf. [57]. Similarly to the Root embedding it follows that simultaneously for all 1 ≤ i ≤ n among all increasing families of stopping times This solution is unique in the sense that for any solution $\tilde\tau_1, \dots, \tilde\tau_n$ of such a barrier type we have $\tau^{Rost}$ The proof of this theorem goes along the very same lines as the proof of Theorem 2.7. The only difference is that due to the maximisation we get leading to inverse barriers. We omit the details.
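The only structural difference between the Root and Rost constructions is the direction in which a barrier is closed in time. As a small hypothetical illustration on a finite space-time grid (time and space indices), the helpers below test whether a set of grid points is a (right) barrier or an inverse (left) barrier in the sense of the definitions above; they are not part of the paper's arguments.

```python
from typing import Set, Tuple

Point = Tuple[int, int]  # (time index, space index) on a finite grid


def is_barrier(region: Set[Point], n_times: int) -> bool:
    """Root-type (right) barrier: (t, x) in R implies (t', x) in R for all later grid times t'."""
    return all((t2, x) in region
               for (t, x) in region
               for t2 in range(t + 1, n_times))


def is_inverse_barrier(region: Set[Point]) -> bool:
    """Rost-type inverse (left) barrier: (t, x) in R implies (t', x) in R for all earlier t'."""
    return all((t2, x) in region
               for (t, x) in region
               for t2 in range(t))
```

A region that switches on at some time and stays on (Root) is a barrier but not an inverse barrier, while a region that is on from time 0 and switches off later (Rost) is an inverse barrier but not a barrier.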

The n-marginal Azéma-Yor embedding
For $(f, s_1, \dots, s_n) \in S^{\otimes n}$ we will use the notation $\bar f_{s_i} := \max_{0 \le s \le s_i} f(s)$.

Theorem 2.11 (n-marginal Azéma-Yor solution) There exist n barriers
This solution is unique in the sense that for any
We emphasise that this result has not previously appeared in the literature in this generality; previously, the most general result was due to [30,51], which proved a closely related result under an additional condition on the measures that is not necessary here. However, unlike our solution, the approaches of [30,51] are constructive.

Remark 2.12
In fact, similarly to the n-marginal Root and Rost solutions, $\tau^{AY}$ simultaneously solves the optimization problems for each $i$, which of course implies Theorem 2.11 (see also Remark 2.8.2). To keep the presentation readable, we only prove the less general version.
Proof Fix a bounded and strictly increasing continuous function ϕ : R + → R + and consider the continuous Pick, by Theorem 2.1, a minimizer τ AY of (OptMSEP 2 ) and, by Theorem 2.5, aγ |γ -monotone family of sets We claim that and f s i >ḡ t i and take two families of stopping times (σ j ) n j=i and (τ j ) n j=i together with their modifications (σ j ) n j=i and (τ j ) n j=i as in Sect. 2.1. We assume that they live on some probability space ( , F , P) additionally supporting a standard Brownian motion W . Observe that (as written out in the proof of Theorem 2.7) on {σ j =σ j } it holds that σ j > τ j . Hence, on this set we haveW σ j ≥W τ j . This implies that for ω ∈ {σ j =σ j } (and henceσ j = τ j ,τ j = σ j ) with a strict inequality unless eitherW On the set {σ j =σ j } we do not change the stopping rule for the j-th stopping time and hence we get a (pathwise) equality in (2.7). Thus, we always have a strict inequality in Hence, (( f , s 1 , . . . , s i ), (g, t 1 , . . . , t i )) ∈ SG ⊆ SG 2 in the first case and in the second case we have (( f , s 1 , . . . , s i ), (g, t 1 , . . . , t i )) ∈ SG 2 proving (2.6).
For each i ≤ n we define We will show inductively on i that firstly τ i cl ≤ τ AY i ≤ τ i op a.s. and secondly τ i cl = τ i op a.s. proving the theorem. The case i = 1 has been settled in [2]. So let us assume Finally, we need to show that τ i cl = τ i op a.s. Before we proceed we give a short reminder of the case i = 1 from [2, Theorem 6.5]. We definẽ From the definition of R 1 cl , we see thatψ 1 0 (m) is increasing, and we define the rightcontinuous function ψ 1 It follows from the definitions of τ 1 op and τ 1 cl that: Asψ 1 0 has at most countably many jump points (discontinuity points) it is easily checked that τ − = τ + a.s. and hence τ 1 Note also that the lawμ 1 ofB τ AY 1 can have an atom only at the rightmost point of its support. Hence, with {(x,y):y<x} has a density with respect to Lebesgue measure when projected onto the first coordinate.
Defining these quantities in obvious analogy for j ∈ {2, . . . , n}, we need to prove τ i+1 cl = τ i+1 op = τ AY i+1 assuming that π i has continuous projection onto the horizontal axis. To do so, we decompose π i into free and trapped particles Here π i f refers to particles which are free to reach a new maximum, while π i t refers to particles which are trapped in the sense that they will necessarily hit R i op (and thus also R i cl ) before they reach a new maximum. For particles started in π i f it follows precisely as above that the hitting times of R i+1 op and R i+1 cl agree. For particles started in π i t this is a consequence of Lemma 2.13. Additionally, as above we find that π i+1 {(x,y):y<x} has continuous projection onto the horizontal axis.
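For orientation, recall the classical one-marginal Azéma-Yor rule for $B_0 = 0$ and a centred, integrable target $\mu$: $\tau = \inf\{t : \bar B_t \ge b_\mu(B_t)\}$, where $b_\mu(x) = \mathbb{E}_\mu[X \mid X \ge x]$ is the barycenter function and $\bar B_t = \max_{s \le t} B_s$. The sketch below (hypothetical helper names, discretised path) computes $b_\mu$ for a finitely supported $\mu$ and applies this rule; it illustrates only the one-marginal building block, not the n-marginal barriers of Theorem 2.11, which are characterised implicitly.

```python
import math
from typing import List, Optional, Tuple

Atoms = List[Tuple[float, float]]  # (location, weight) pairs of a centred target law


def barycenter(atoms: Atoms, x: float) -> float:
    """b_mu(x) = E_mu[X | X >= x]; math.inf if mu([x, inf)) = 0."""
    mass = sum(w for y, w in atoms if y >= x)
    if mass == 0:
        return math.inf
    return sum(w * y for y, w in atoms if y >= x) / mass


def azema_yor_index(path: List[float], atoms: Atoms) -> Optional[int]:
    """First grid index at which the running maximum reaches b_mu(B_t); None if never."""
    running_max = -math.inf
    for idx, b in enumerate(path):
        running_max = max(running_max, b)
        if running_max >= barycenter(atoms, b):
            return idx
    return None
```

For $\mu = \frac12(\delta_{-1} + \delta_1)$ one gets $b_\mu = 0$ on $(-\infty, -1]$ and $b_\mu = 1$ on $(-1, 1]$, so the rule stops at the first hitting time of $\pm 1$, as expected.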

The n-marginal Perkins/Hobson-Pedersen embedding
For $(f, s_1, \dots, s_n) \in S^{\otimes n}$ we will use the notation $\underline{f}_{s_i} := \min_{0 \le s \le s_i} f(s)$ to denote the running minimum of the path up to time $s_i$. (Recall also that $\bar f_{s_i}$ is the maximum of the path.) In this section we will consider a generalisation of the embeddings of Perkins [54] and Hobson and Pedersen [39].
for decreasing functions $\gamma_+$, $\gamma_-$. Hobson and Pedersen constructed, for the case of a general starting distribution, a stopping time where $G$ was an appropriately chosen, $\mathcal{F}_0$-measurable random variable. (Here, recalling the discussion at the start of Sect. 2.1, we need to use the assumption that the filtration supports the Brownian motion and an additional $\mathcal{F}_0$-measurable, independent uniform random variable; this additional information is then enough to construct a suitable $G$.)
Pick, by Theorem 2.1, a minimizer τ H P of (OptMSEP 2 ) and, by Theorem 2.5, a γ 2 |γ -monotone family of sets By an essentially identical argument to that given in Theorem 2.11, we have with respective hitting times (τ 0 = 0) It can be shown inductively on i that firstly τ i s., proving the theorem. The proofs of these results are now essentially identical to the proof of Theorem 2.11.
Of course, as before, a more general version of the statement (without the summation) can be proved, at the expense of a more complicated argument.

Remark 2.15
The result above says nothing about the uniqueness of the solution. However, the following argument (also used in [2]) shows that any optimal solution (to both the primary and secondary optimisation problems in the proof of Theorem 2.14) will have the same barrier form: specifically, suppose that $(\tau_i)$ and $(\sigma_i)$ are both optimal. Define a new stopping rule which, at time 0, chooses either the stopping rule $(\tau_i)$ or the stopping rule $(\sigma_i)$, each with probability 1/2. This stopping rule is also optimal (for both the primary and secondary objectives), and the arguments above may be re-run to deduce the corresponding form of the optimal solution.
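The mixing step in Remark 2.15 rests only on the linearity of the objectives in the law of the stopping rule. A short derivation, assuming the choice at time 0 is made with an $\mathcal{F}_0$-measurable fair coin $\{U < 1/2\}$, where $U$ is a uniform variable independent of the Brownian motion and of all randomisation used by $(\tau_i)$ and $(\sigma_i)$ (for instance, obtained by splitting $Y$ into two independent uniforms), and writing $P_\gamma$ for the optimal value of (OptMSEP):

```latex
\begin{align*}
\rho_i &:= \tau_i\,\mathbb{1}_{\{U<1/2\}} + \sigma_i\,\mathbb{1}_{\{U\ge 1/2\}}, \qquad 1\le i\le n,\\
\mathbb{P}[B_{\rho_i}\in A] &= \tfrac12\,\mathbb{P}[B_{\tau_i}\in A] + \tfrac12\,\mathbb{P}[B_{\sigma_i}\in A] = \mu_i(A),\\
\mathbb{E}\big[\gamma\big((B_s)_{s\le\rho_n},\rho_1,\dots,\rho_n\big)\big]
 &= \tfrac12\,\mathbb{E}\big[\gamma\big((B_s)_{s\le\tau_n},\tau_1,\dots,\tau_n\big)\big]
  + \tfrac12\,\mathbb{E}\big[\gamma\big((B_s)_{s\le\sigma_n},\sigma_1,\dots,\sigma_n\big)\big]
 = P_\gamma .
\end{align*}
```

Since $\{U < 1/2\} \in \mathcal{F}_0$, each $\rho_i$ is a stopping time, $\rho_1 \le \dots \le \rho_n$ and $\mathbb{E}[\rho_n] < \infty$; the same computation with $\gamma_2$ in place of $\gamma$ shows that $(\rho_i)$ also attains $P_{\gamma_2|\gamma}$.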
In fact, a more involved argument would appear to give uniqueness of the resulting barrier among the class of all such solutions; the idea is to use a Loynes-style argument as before, but applied both to the barrier and to the rate of stopping at the maximum. The difficulty here is to argue that any stopping times of the form given above are essentially equivalent to another stopping time which simply stops at the maximum according to some rate which depends only on the choice of the lower barrier (that is, in the language above, P(H i for any $x$ and $\varepsilon > 0$, where $H^i_x := \inf\{t \ge \tau^{HP}_{i-1} : B_t \ge x\}$). By identifying each of the possible optimisers with a canonical form of the optimiser, and using a Loynes-style argument which combines two stopping rules of the form above by taking the maximum of the left-barriers and the fastest stopping rate of the two rules, one can deduce that there is a unique sequence of barriers and stopping rates giving rise to an embedding of this form. We leave the details to the interested reader.

Remark 2.16

We conclude by considering informally the 'Perkins'-type construction implied by our methods. Recall that in the single-marginal case with $B_0 = 0$, the Perkins embedding simultaneously maximises the law of the minimum and minimises the law of the maximum. A slight variant of the methods above would suggest considering the optimiser which has the same primary objective as above and which then also aims to minimise the law of the minimum. In this case the arguments may be run to give, for each marginal, a stopping region which is a barrier in the following sense: the stopping time is the first hitting time of a left-barrier $R$ which is left-closed in the sense that if (for a fixed $x$) a path with $\bar f_s = m$, $\underline f_s = j$ is stopped, then so too are all paths with $\bar g_s = m'$, $\underline g_s = j'$, where $(m', -j') \prec (m, -j)$ and $\prec$ denotes the lexicographical ordering.
With this definition, the general outline of the argument given above can proceed as usual. However, we do not carry this out here, since the final stage of the argument (showing that the closed and open hitting times of such a region are equal) would appear to be much more subtle than in the previous examples; we leave this as an open problem for future work.
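The left-closedness condition in Remark 2.16 is a statement about the lexicographic order on the pairs $(m, -j)$ built from the running maximum $m$ and the running minimum $j$ of paths sitting at the same position. Since Python compares tuples lexicographically, the condition can be sketched in a few lines. This is only an illustration under our own conventions: the helper names are hypothetical, $\prec$ is read as the strict order, and all sample paths are taken to sit at position $x = 0$.

```python
def lex_key(running_max: float, running_min: float) -> tuple:
    """Key (m, -j) used for the lexicographic comparison in Remark 2.16."""
    return (running_max, -running_min)


def forced_to_stop(stopped: tuple, candidate: tuple) -> bool:
    """Return True if left-closedness forces `candidate` to be stopped.

    Both arguments are (running maximum, running minimum) pairs of paths at the
    same position. If a path with data `stopped` is stopped, every path whose
    key is lexicographically strictly smaller has to be stopped as well.
    A False answer only means that stopping is not forced by this rule.
    """
    return lex_key(*candidate) < lex_key(*stopped)


stopped = (1.0, -1.0)  # a stopped path with maximum 1 and minimum -1
print(forced_to_stop(stopped, (0.5, -2.0)))  # True: smaller maximum
print(forced_to_stop(stopped, (1.0, -0.5)))  # True: same maximum, higher minimum
print(forced_to_stop(stopped, (1.0, -2.0)))  # False: same maximum, lower minimum
print(forced_to_stop(stopped, (2.0, 0.0)))   # False: larger maximum
```

A smaller maximum always forces stopping, while among paths with equal maxima it is those with a higher minimum that are forced to stop.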
However, more notable is that in the multiple-marginal case (and indeed, already to some extent in the case of a single marginal with a general starting law), the Perkins optimality property is no longer strictly preserved. To see why this might be the case (see also [39, Remark 2.3]), we note that, in the case of a single marginal with trivial starting law, the embedding constructed via the double minimisation problems always stops at a time when the process sets a new minimum or a new maximum. At any given possible stopping point, the decision to stop should depend both on the current minimum and on the current maximum; however, when the process is at its current maximum, the current position and the current maximum coincide. In consequence, the decision to stop at, e.g., a new maximum will only depend on the value of the minimum, and the optimisation problem relating to maximising a function of the maximum will be unaffected by the choice. In particular, it is never important which optimisation is the primary optimisation problem and which is the secondary optimisation problem: in terms of the barrier criteria established above, this can be seen by observing that in lexicographic ordering,

On the other hand, with multiple marginals we may have to consider stopping at times which do not correspond to setting a new maximum or minimum. Consider for example the case $\mu_0 = \delta_0$, $\mu_1 = (\delta_1 + \delta_{-1})/2$, $\mu_2 = 2(\delta_2 + \delta_{-2})/5 + \delta_0/5$. Here the first stopping time $\tau_1$ must be the first hitting time of $\{-1, 1\}$, and if the process stops at 0 at the second stopping time then, to be optimal, it must stop there at the first time it hits 0 after $\tau_1$. The probability of returning to 0 after $\tau_1$ before hitting $\{-2, 2\}$ is $1/2$, which is larger than $1/5$, so we need to choose a rule to determine which of the paths returning to 0 we should stop.
It is clear that, if the primary optimisation is to minimise the law of the maximum, then this decision would only depend on the running maximum, while it would depend only on the running minimum if the primary and secondary objectives were switched. In particular, the two problems give rise to different optimal solutions. The difference arises from the fact that we cannot assume that all paths have either the same maximum or the same minimum. As a consequence, we do not, in general, expect to recover a general version of the Perkins embedding, i.e. a multi-marginal embedding which simultaneously minimises the law of the maximum and maximises the law of the minimum.
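The numbers in the example above can be checked by hand: Brownian motion started at $x \in (a, b)$ hits $b$ before $a$ with probability $(x - a)/(b - a)$. The following Python sketch (exact arithmetic with `fractions.Fraction`; the function name is ours) computes the probability of returning to 0 after $\tau_1$ before hitting $\{-2, 2\}$, and the fraction of the returning paths that has to be stopped at 0 in order to produce the atom $\mu_2(\{0\}) = 1/5$.

```python
from fractions import Fraction


def prob_hit_upper_first(x, a, b):
    """P_x(Brownian motion hits b before a), for a <= x <= b."""
    return Fraction(x - a, b - a)


# mu_1 = (delta_1 + delta_{-1}) / 2, so B_{tau_1} = +1 or -1 with probability 1/2 each.
mu1 = {1: Fraction(1, 2), -1: Fraction(1, 2)}
mu2_atom_at_zero = Fraction(1, 5)

# From +1 the path returns to 0 before 2 iff it leaves (0, 2) at 0;
# from -1 it returns to 0 before -2 iff it leaves (-2, 0) at 0.
p_return_from_plus = 1 - prob_hit_upper_first(1, 0, 2)
p_return_from_minus = prob_hit_upper_first(-1, -2, 0)
p_return = mu1[1] * p_return_from_plus + mu1[-1] * p_return_from_minus

# Only part of the returning paths may be stopped at 0.
fraction_stopped = mu2_atom_at_zero / p_return

print(p_return)          # 1/2, which exceeds 1/5
print(fraction_stopped)  # 2/5
```

Which of the returning paths make up this fraction $2/5$ is precisely the choice that is governed by the primary optimisation problem.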

Further "classical" embeddings and other remarks
By combining the ideas and techniques from the previous sections with those from [2, Section 6.2], we can establish the existence of n-marginal versions of the Jacka and Vallois embeddings and of their siblings (replacing the local time with a suitably regular additive functional) as constructed in [2, Remark 7.13]. We leave the details to the interested reader.
We also remark that it is possible to obtain more detailed descriptions of the structure of the different barriers. At this point we only note that all the embeddings presented above have the nice property that their $n$-marginal solution, restricted to the first $n-1$ marginals, is in fact the $(n-1)$-marginal solution. This is a direct consequence of the extension of the Loynes argument to $n$ marginals, as shown in the proof of Theorem 2.7. For a more detailed description of the barriers for the $n$-marginal Root embedding we refer to [18].
We also observe that, as in [2, Section 6.3], it is possible to deduce multi-marginal versions of some of the embeddings presented in the previous sections (e.g. Root and Rost) in higher dimensions. We leave the details to the interested reader.

An n-marginal version of the monotone martingale coupling
We next discuss the embedding giving rise to a multi-marginal version of the monotone martingale transport plan. Note that we need an extra assumption on the starting law $\mu_0$, but on $\mu_0$ only.
simultaneously for all $1 \le i \le n$ among all increasing families of stopping times $(\tilde\tau_1, \dots, \tilde\tau_n)$ such that $B_{\tilde\tau_j} \sim \mu_j$ for all $1 \le j \le n$. This solution is unique in the sense that for any solution $\tilde\tau_1, \dots, \tilde\tau_n$ of such a barrier type we have $\tau_i = \tilde\tau_i$.

Remark 2.18
In the final stage of writing this article we learned of the work of Nutz et al. [49] on multi-period martingale optimal transport which (among various further results) provides an n-marginal version of the monotone martingale transport plan. Their methods are rather different from the ones employed in this article and in particular not related to the Skorokhod problem, but their solution is the same as the one presented here (see also [5]).

Proof of Theorem 2.17
The overall strategy of the proof, and in particular its first steps, follow exactly the arguments encountered above. Fix a permutation $\kappa$ of $\{1, \dots, n\}$.
We claim that for all $1 \le i \le n$ we have To this end, we have to consider $(f, s_1, \dots,$ (2.9) and that this inequality is strict, provided that the set $\{\rho > \tau\}$ has positive probability. To establish this inequality, of course only the part where $\rho > \tau$ matters. Otherwise put, the inequality remains equally valid if we replace all of $\sigma, \tau, \tilde\sigma, \tilde\tau$ by $\tau \vee \sigma$ on the set $\{\rho \le \tau\}$, in which case we have $\tilde\sigma = \tau$, $\tilde\tau = \sigma$, $\sigma \ge \tau$. Hence to prove (2.9) it is sufficient to show, for $\alpha := \mathrm{Law}(B_\sigma)$, $\beta := \mathrm{Law}(B_\tau)$ and $a := f(s) = g(t)$, that To obtain this, we claim that is decreasing in $t$: this holds true since $c_x$ is concave and $\beta$ precedes $\alpha$ in the convex order (strictly if $P(\rho > \tau) > 0$).
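The last step uses the standard fact that if $\beta$ precedes $\alpha$ in convex order, then $\int c\,d\beta \ge \int c\,d\alpha$ for every concave $c$. The functions $c_x$ themselves are not recoverable from this excerpt, so the Python sketch below uses the concave stand-ins $c(y) = -|y - a|$ and, as an example of our own choosing, the measures $\beta = \mu_1$, $\alpha = \mu_2$ from Remark 2.16. Convex order of finitely supported measures is checked through equal means and ordered call functions $k \mapsto \int (y - k)^+\,d\mu(y)$; testing the strikes at the atoms suffices because these functions are piecewise linear with kinks only at the atoms.

```python
from fractions import Fraction


def integrate(mu, f):
    """Integral of f against a finitely supported measure {atom: weight}."""
    return sum((w * f(y) for y, w in mu.items()), Fraction(0))


def call(mu, k):
    """Call function k -> int (y - k)^+ dmu(y)."""
    return integrate(mu, lambda y: max(y - k, 0))


def in_convex_order(beta, alpha):
    """Check beta <=_c alpha: equal means and ordered call functions at all atoms."""
    if integrate(beta, lambda y: y) != integrate(alpha, lambda y: y):
        return False
    strikes = sorted(set(beta) | set(alpha))
    return all(call(beta, k) <= call(alpha, k) for k in strikes)


beta = {-1: Fraction(1, 2), 1: Fraction(1, 2)}                      # mu_1
alpha = {-2: Fraction(2, 5), 0: Fraction(1, 5), 2: Fraction(2, 5)}  # mu_2
print(in_convex_order(beta, alpha))  # True

# Concave stand-ins c(y) = -|y - a| have a larger integral under beta than under alpha.
for a in range(-3, 4):
    lhs = integrate(beta, lambda y: -abs(y - a))
    rhs = integrate(alpha, lambda y: -abs(y - a))
    assert lhs >= rhs, (a, lhs, rhs)
```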
Having established the claim, we define for each $1 \le i \le n$ Following the argument used above, we define $\tau_1^{cl}$ and $\tau_1^{op}$ to be the first times the process $(B_t - B_0, B_t)_{t \ge 0}$ hits $R_1^{cl}$ and $R_1^{op}$, respectively, to see that actually $\tau_1^{cl} \le \tau_1 \le \tau_1^{op}$. It remains to show that $\tau_1^{cl} = \tau_1^{op}$ (this has already been shown in [41, Prop. 3.1]; we present the argument for completeness). To this end, note that the hitting time of $(B_t - B_0, B_t)_{t \ge 0}$ into a barrier can equally well be interpreted as the hitting time of $(-B_0, B_t)_{t \ge 0}$ into a transformed barrier (namely, its image under the shear $(d, x) \mapsto (d - x, x)$). The purpose of this alteration is that the process $(-B_0, B_t)_{t \ge 0}$ moves only vertically, and we can now apply Lemma 2.13 to establish that indeed $\tau_1^{cl} = \tau_1^{op}$. Observe that at this stage the continuity assumption on $\mu_0$ is crucial. We then proceed by induction. As above, uniqueness and the irrelevance of the permutation follow from Loynes' argument.
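The shearing step is purely deterministic: $(B_t - B_0, B_t) \in R$ iff $(-B_0, B_t) \in R'$, where $R'$ is the image of $R$ under $(d, x) \mapsto (d - x, x)$, so the two hitting times coincide path by path. A minimal Python check on a discretised path; the barrier `in_R` is a hypothetical example chosen only for illustration and is not the barrier constructed in the proof.

```python
import random


def in_R(d: float, x: float) -> bool:
    """Hypothetical barrier in (displacement d = B_t - B_0, position x = B_t) coordinates."""
    return d >= 1.0 or x <= -1.5


def in_R_sheared(e: float, x: float) -> bool:
    """Image of R under the shear (d, x) -> (d - x, x): (e, x) lies in it iff (e + x, x) lies in R."""
    return in_R(e + x, x)


def first_hit(points, region):
    """Index of the first point lying in `region`, or None if there is none."""
    for k, (u, v) in enumerate(points):
        if region(u, v):
            return k
    return None


rng = random.Random(0)
b0 = rng.uniform(-1.0, 1.0)      # starting point drawn from a continuous law
path = [b0]
for _ in range(10_000):          # crude random-walk discretisation of Brownian motion
    path.append(path[-1] + rng.gauss(0.0, 0.01))

original = first_hit([(b - b0, b) for b in path], in_R)
# In sheared coordinates the first component -B_0 is frozen: the process only moves vertically.
sheared = first_hit([(-b0, b) for b in path], in_R_sheared)
print(original, sheared)
assert original == sheared
```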
A very natural conjecture is then that Theorem 2.17 would give rise to a solution to the peacock problem. The set of martingales $(S_t)_{t \in [0,T]}$ (more precisely the set of corresponding martingale measures) carries a natural topology and, given $D \subseteq [0, T]$ with $T \in D$, the set of martingales with prescribed marginals $(\mu_t)_{t \in D}$ is compact (cf. [6]). By taking limits of the solutions provided above along appropriate finite discretisations $D \subseteq [0, T]$, one obtains a sequence of optimisers to the discrete problem whose limit simultaneously for all $t \in [0, T]$ among all such martingales. However, since this is beyond the scope of the present article, we leave the details for future work.
We note that this would also provide a continuous-time extension of the monotone martingale coupling, rather different from the constructions given by Henry-Labordère et al. [32] and Juillet [42].

Stopping times and multi-stopping times
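For a Polish space $X$ equipped with a probability measure $m$ we define a new probability space $(\mathcal X, \mathcal G^0, (\mathcal G^0_t)_{t \ge 0}, \mathbb P)$ by setting $\mathcal X := X \times C_0(\mathbb R_+)$, $\mathcal G^0 := \mathcal B(X) \otimes \mathcal F^0$, $\mathcal G^0_t := \mathcal B(X) \otimes \mathcal F^0_t$ and $\mathbb P := m \otimes \mathbb W$, where $\mathcal B(X)$ denotes the Borel σ-algebra on $X$, $\mathbb W$ denotes the Wiener measure, and $(\mathcal F^0_t)_{t \ge 0}$ the natural filtration. We denote the usual augmentation of $\mathcal G^0$ by $\mathcal G^a$. Moreover, for $* \in \{0, a\}$ we set $\mathcal G^*_{0-} := \mathcal B(X) \otimes \mathcal F^*_0$. If we want to stress the dependence on $(X, m)$ we write $\mathcal G^a(X, m)$, $\mathcal G^a_t(X, m)$, ….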
The natural coordinate process on $\mathcal X$ will be denoted by $Y$, i.e. for $t \ge 0$ we set Note that under $\mathbb P$, in the case where $X = \mathbb R$, the process $Y$ can be interpreted as a Brownian motion with starting law $m$. In particular, $t \mapsto Y_t(x, \omega)$ is continuous and We recall and introduce the maps We equip $C_0(\mathbb R_+)$ with the topology of uniform convergence on compacts and $S_X$ with the final topology inherited from $\mathcal X \times \mathbb R_+$, turning it into a Polish space. This structure is very convenient due to the following proposition, which is a particular case of [20, Theorem IV.97].

Proposition 3.1
Optional sets / functions on $\mathcal X \times \mathbb R_+$ correspond to Borel measurable sets / functions on $S_X$. More precisely we have:

Definition 3.2 A $\mathcal G^0$-optional process $Z = H \circ r_X$ is called $S_X$-continuous (resp. lower/upper semicontinuous) iff $H : S_X \to \mathbb R$ is continuous (resp. lower/upper semicontinuous).

Randomised stopping times
We set and equip it with the weak topology induced by the continuous and bounded functions on $\mathcal X \times \mathbb R_+$. Each $\xi \in M$ can be uniquely characterized by its cumulative distribution

Definition 3.6 A measure $\xi \in M$ is called a randomized stopping time, written $\xi \in \mathrm{RST}$, iff the associated increasing process $A^\xi$ is $\mathcal G^0$-optional. If we want to stress the Polish probability space $(X, \mathcal B(X), m)$ in the background, we write $\mathrm{RST}(X, m)$.
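In discrete time the optionality requirement of Definition 3.6 becomes transparent: if at each step we stop with a probability that only depends on the path observed so far, then the cumulative distribution $A_t$ (the probability of having stopped by time $t$) is increasing, takes values in $[0, 1]$, and depends on the path only up to time $t$. The following Python sketch is a toy discrete-time analogue built on a simple random walk; the rule `stop_prob` is a hypothetical choice of ours and not an object from the paper.

```python
from itertools import product


def stop_prob(prefix):
    """Hypothetical rule: stop w.p. 1/2 at each time the walk is at level >= 1.

    The probability only depends on the steps observed so far (the prefix).
    """
    return 0.5 if sum(prefix) >= 1 else 0.0


def cumulative_distribution(steps):
    """A_t = probability of having stopped by time t, for t = 0, ..., len(steps)."""
    survive = 1.0
    A = []
    for t in range(len(steps) + 1):
        survive *= 1.0 - stop_prob(steps[:t])
        A.append(1.0 - survive)
    return A


horizon = 6
paths = list(product((-1, 1), repeat=horizon))
for w in paths:
    A = cumulative_distribution(w)
    assert all(0.0 <= a <= 1.0 for a in A)        # values in [0, 1]
    assert all(a <= b for a, b in zip(A, A[1:]))  # increasing in t
# Adaptedness: A_t coincides for all paths sharing their first t steps.
for t in range(horizon + 1):
    values = {}
    for w in paths:
        values.setdefault(w[:t], set()).add(cumulative_distribution(w)[t])
    assert all(len(v) == 1 for v in values.values())
```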
We remark that randomized stopping times are a subset of the so-called P-measures introduced by Doleans [21] (for motivation and further remarks see [2, Section 3.2]).
In the sequel we will mostly be interested in a representation of randomized stopping times on an enlarged probability space. We will be interested in (X , G , The following characterization of randomized stopping times is essentially Theorem 3.8 of [2]. The only difference is the presence of $X$ in the starting position; however, it is easily checked that this does not affect the proof.
(1) There is a Borel function $A : S_X \to [0, 1]$ such that the process $A \circ r_X$ is right-continuous, increasing, and defines a disintegration of $\xi$ w.r.t. $\mathbb P$.
defines a $\bar{\mathcal G}$-stopping time.
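A randomized stopping time can be turned into an ordinary stopping time on the enlarged space by feeding an independent uniform variable $u$ into the generalised inverse of its cumulative distribution, $\rho(u) := \inf\{t \ge 0 : A_t \ge u\}$. We assume this standard construction here rather than quote the precise statement of Theorem 3.7. Since $\{\rho \le t\} = \{u \le A_t\}$ only involves $A$ up to time $t$, $\rho$ is a stopping time, and $P(\rho \le t) = A_t$. A Python sketch on a discrete time grid, with illustrative values for `A`:

```python
from fractions import Fraction
from typing import Optional


def generalised_inverse(A, u) -> Optional[int]:
    """rho(u) = min{t : A[t] >= u}; None if there is no such t (never stopped)."""
    for t, a in enumerate(A):
        if a >= u:
            return t
    return None


# Hypothetical cumulative distribution of a randomized stopping time on times 0..4.
A = [Fraction(0), Fraction(1, 4), Fraction(1, 4), Fraction(3, 4), Fraction(1)]

# {rho <= t} = {u <= A[t]}, so for u uniform on [0, 1] we get P(rho <= t) = A[t].
# We check this exactly on the midpoints u_k = (2k + 1) / (2N) of a uniform grid.
N = 400
counts = [0] * len(A)
for k in range(N):
    counts[generalised_inverse(A, Fraction(2 * k + 1, 2 * N))] += 1
cdf = [Fraction(sum(counts[: t + 1]), N) for t in range(len(A))]
print(cdf == A)  # True
```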

Randomised multi-stopping times
In this section, we extend the results of the last section to the case of multiple stopping. Recall the notation defined in Sect. 1.2. In particular, for $d \ge 1$, recall that Recall that (X,Ḡ, We mostly denote $\mathcal L^d(du)$ by $du$. For $(u_1, \dots, u_d) \in [0, 1]^d$ we often simply write $u = (u_1, \dots, u_d)$. We suppress the $d$-index in the notation for the extended probability space. It will either be clear from the context which $d$ we mean, or we explicitly write down the corresponding spaces.
We denote the subset of all randomised multi-stopping times with total mass 1 by $\mathrm{RMST}^1_d$. If we want to stress the dependence on $(X, m)$ we write $\mathrm{RMST}_d(X, m)$ or $\mathrm{RMST}^1_d(X, m)$.

Remark 3.10 We can understand the condition (3.6) as follows. Consider the case where $d = 2$ and $i = 1$. Then the measure $\xi_2$ is a sub-probability measure of the form $\xi_2(d(x, \omega), ds_1, ds_2) = \xi_{x,\omega}(ds_1, ds_2)\, \mathbb P(d(x, \omega))$. Then $\bar r_{2,1}(\xi_2)$ is a sub-probability measure on S ⊗i X × X × 1 . This measure can be disintegrated against $r_1(\xi_1)$, which is a measure on $S^{\otimes i}_X$, to give a measure on X × 1 . Intuitively, this measure is the conditional law, given $((B_s)_{s \le \tau_1}, \tau_1)$, of $((B_t - B_{\tau_1})_{t \ge 0}, \tau_2 - \tau_1)$. The condition (3.6) then states that the law of this pair is consistent with the law of a randomised stopping time.
Unlike for randomised stopping times, there is no obvious analogue of (1), (2) or (3) of Theorem 3.7 in the multi-stopping time setting. However, below we prove a representation result for randomised multi-stopping times in a similar manner to (4). The following lemma (cf. [2, Lemma 3.11]) then enables us to conclude that, on an arbitrary probability space, all sequences of increasing stopping times can be represented as a randomised multi-stopping time on our canonical probability space.

Lemma 3.11 Let $B$ be a Brownian motion on some stochastic basis $(\Omega, \mathcal H, (\mathcal H_t)_{t \ge 0}, Q)$ with right-continuous filtration. Let $\tau_1, \dots, \tau_n$ be an increasing sequence of $\mathcal H$-stopping times and consider $\Phi : \omega \mapsto ((B_t)_{t \ge 0}, \tau_1(\omega), \dots, \tau_n(\omega))$.
Then $\xi := \Phi(Q)$ is a randomized multi-stopping time and for any measurable $\gamma : S^{\otimes n} \to \mathbb R$ we have
$$\mathbb E_Q\big[\gamma((B_s)_{s \le \tau_n}, \tau_1, \dots, \tau_n)\big] = \int \gamma(f, s_1, \dots, s_n)\, r_n(\xi)(d(f, s_1, \dots, s_n)). \qquad (3.7)$$
If $\Omega$ is sufficiently rich that it supports a uniformly distributed random variable which is $\mathcal H_0$-measurable, then for $\xi \in \mathrm{RMST}$ we can find an increasing family $(\tau_i)_{1 \le i \le n}$ of $\mathcal H$-stopping times such that $\xi = \Phi(Q)$ and (3.7) holds.
Proof First we show that $\bar r_1$ Take a measurable and bounded $F : S_{\mathbb R} \times C_0(\mathbb R_+) \to \mathbb R$. Then, using the strong Markov property in the last step, we have Let $q$ be the projection from $S_{\mathbb R} \times C_0(\mathbb R_+) \times \mathbb R_+$ to $S_{\mathbb R} \times C_0(\mathbb R_+)$, and $p$ be the projection from X × 2 → X × R +, $p(\omega, s_1, s_2) = (\omega, s_1)$. Then $q \circ \bar r_{2,1} = \bar r_1 \circ p$.
To show the second part of the lemma we start by constructing an increasing sequence of stopping times on the extended canonical probability space $(\bar{\mathcal X}, \bar{\mathcal G}, (\bar{\mathcal G}_t)_{t \ge 0}, \bar{\mathbb P})$. By Theorem 3.7 and the assumption that $\xi_1 \in \mathrm{RST}(X, m)$, there is a $\bar{\mathcal G}$-stopping time $\rho^1(x, \omega, u) = \rho^1(x, \omega, u_1)$ defining a disintegration of $\xi_1$ w.r.t. $\mathbb P$ via By assumption, $\bar r_{2,1}(\xi_2) \in \mathrm{RST}(S_X, r_1(\xi_1))$. Hence, writing $\bar s_2 = s_2 - s_1$, we can disintegrate such that for $r_1(\xi_1)$-a.e. $(x, f, s_1)$ the disintegration $\xi^2_{(x, f, s_1)}$ is a randomized stopping time. Again by Theorem 3.7 there is a stopping time $\bar\rho^2_{x, f, s_1}(\omega, u_2)$ representing $\xi^2_{(x, f, s_1)}$ as in (3.5). Then, defines a $\bar{\mathcal G}$-stopping time such that defines a $\mathcal G^a$-measurable disintegration of $\xi_2$ w.r.t. $\mathbb P$. We proceed inductively. To finish the proof, let $U$ be the $[0, 1]^d$-valued uniform $\mathcal H_0$-measurable random variable. Then $\tau_i := \rho^i(B, U)$ define the required increasing family of $\mathcal H$-stopping times.

Lemma 3.11 shows that optimizing over an increasing family of stopping times on a rich enough probability space in (OptMSEP) is equivalent to optimizing over randomized multi-stopping times on the Wiener space.
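The hypothesis of Lemma 3.11 provides a single uniformly distributed $\mathcal H_0$-measurable random variable, whereas the proof uses a $[0, 1]^d$-valued uniform $U$. That one uniform suffices is a standard fact; one way to see it is to de-interleave binary digits: if $V = \sum_k b_k 2^{-k}$ has i.i.d. fair digits $b_k$, then the digit subsequences $b_i, b_{i+d}, b_{i+2d}, \dots$ are independent and each encodes a uniform variable. A finite-precision Python sketch of this map (our own illustration, not taken from the paper):

```python
def split_uniform(bits, d):
    """Map the binary digits of one uniform to d numbers in [0, 1].

    Coordinate i (0-based) uses the digits bits[i], bits[i + d], bits[i + 2d], ...
    With infinitely many i.i.d. fair digits the d coordinates are independent
    uniforms; here the expansion is truncated to finitely many digits.
    """
    coords = []
    for i in range(d):
        sub = bits[i::d]
        coords.append(sum(b * 2.0 ** -(k + 1) for k, b in enumerate(sub)))
    return coords


# Digits of 0.101100 (base 2), split into d = 2 coordinates: 0.110 and 0.010 (base 2).
print(split_uniform([1, 0, 1, 1, 0, 0], 2))  # [0.75, 0.25]
```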

Remark 3.12
defines a $\mathcal G^a$-measurable disintegration of $\xi$ w.r.t. $\mathbb P$.
Recalling that $\xi_i = p_{d,i}(\xi)$, it follows that there exists a disintegration of $\bar r_{d,i}(\xi)$ with respect to $r_i(\xi_i)$, which we denote by: Moreover, we set

Remark 3.15
We note that the last corollary still holds for $i = 0$ by setting $S^{\otimes 0}_{\mathbb R} = \mathbb R$, $r_0(\xi_0) = m$. Then the result says that for a disintegration $(\xi_x)_x$ of $\xi$ w.r.t. $m$, for $m$-a.e. $x \in X$ we have $\xi_x \in \mathrm{RMST}_d$. Of course, this can also trivially be seen as a consequence of $\mathbb P = m \otimes \mathbb W$.
An important property of RMST is the following Lemma.

Lemma 3.16 RMST is closed w.r.t. the weak topology induced by the continuous and bounded functions on X × d .
Proof We fix $0 \le i \le d - 1$ and consider the Polish space $\bar X = S^{\otimes i}_X$ with corresponding $\mathcal X = \bar X \times C_0(\mathbb R_+)$ and $\mathbb P = r_i(\xi_i) \otimes \mathbb W$. To show the defining property (3.6) in Definition 3.9 we consider condition (2) in Theorem 3.7; the goal is to express measurability of of course this does not rely on our particular setup. By a functional monotone class argument, for $\mathcal G^0_t$-measurability of $Z_t$ it is sufficient to check that for all $G \in C_b(X)$. In terms of $\xi_{i+1}$, (3.10) amounts to which is a closed condition by Proposition 3.5.
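A common way to express $\mathcal G^0_t$-measurability of $Z_t$ through integrals against test functions, which is the kind of closed condition used here, is the orthogonality relation $\mathbb E[Z_t\,(G - \mathbb E[G \mid \mathcal G^0_t])] = 0$ for all bounded $G$; we do not claim that it coincides literally with (3.10). On a finite probability space of fair coin flips this can be checked exhaustively: since $\mathbb E[Z(G - \mathbb E[G \mid \mathcal F_t])] = \mathbb E[(Z - \mathbb E[Z \mid \mathcal F_t])G]$, the relation holds for all $G$ exactly when $Z$ depends only on the first $t$ flips, and by linearity it suffices to test indicators of single outcomes. A Python sketch:

```python
from fractions import Fraction
from itertools import product

T = 4
OMEGA = list(product((0, 1), repeat=T))  # T fair coin flips, uniform measure
P = Fraction(1, len(OMEGA))


def cond_exp(G, t):
    """E[G | first t flips], returned as a function on OMEGA."""
    def value(w):
        block = [v for v in OMEGA if v[:t] == w[:t]]
        return Fraction(sum(G(v) for v in block), len(block))
    return value


def orthogonal_to_all(Z, t):
    """Check E[Z (G - E[G | F_t])] = 0 for all indicators G of single outcomes."""
    for target in OMEGA:
        def G(v, target=target):
            return 1 if v == target else 0
        EG = cond_exp(G, t)
        if sum(P * Z(w) * (G(w) - EG(w)) for w in OMEGA) != 0:
            return False
    return True


def adapted(w):
    return w[0] + 2 * w[1]  # depends on the first two flips only


def not_adapted(w):
    return w[0] + w[3]      # also looks at the fourth flip


print(orthogonal_to_all(adapted, 2))      # True
print(orthogonal_to_all(not_adapted, 2))  # False
```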
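Assume that $(M_s)_{s \ge 0}$ is a process on $\mathcal X$. Then $M^\xi$ is defined to be the probability measure on $\mathbb R^{d+1}$ such that for all bounded and measurable functions $F : \mathbb R^{d+1} \to \mathbb R$
$$\int F\, dM^\xi = \int F\big(M_0(x, \omega), M_{s_1}(x, \omega), \dots, M_{s_d}(x, \omega)\big)\, \xi(d(x, \omega), ds_1, \dots, ds_d).$$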
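For $M = B$, the measure $B^\xi$ is the joint law of the Brownian motion at time 0 and at the successive stopping times. As an illustration of our own choosing, the Python sketch below estimates it by simulation for a pasted two-marginal embedding of $\mu_0 = \delta_0$, $\mu_1 = (\delta_1 + \delta_{-1})/2$, $\mu_2 = 2(\delta_2 + \delta_{-2})/5 + \delta_0/5$ (the example of Remark 2.16). Since all stopping times involved are hitting times of integers, it suffices to record the successive integer levels visited, which form a simple symmetric random walk. At the first return to 0 after $\tau_1$ the sketch stops with probability $2/5$; this is one admissible rule, not necessarily an optimal one. The estimated laws of $B_{\tau_1}$ and $B_{\tau_2}$ should be close to $\mu_1$ and $\mu_2$.

```python
import random
from collections import Counter


def sample(rng):
    """One draw of (B_0, B_{tau_1}, B_{tau_2}), reading the path off at integer levels."""
    b0 = 0
    b1 = b0 + rng.choice((-1, 1))      # tau_1: first exit from (-1, 1)
    x = b1 + rng.choice((-1, 1))       # next integer level reached: 0 or +-2
    if x == 0 and rng.random() < 0.4:  # stop at the first return to 0 with probability 2/5
        return b0, b1, 0
    while abs(x) < 2:                  # otherwise run until {-2, 2} is hit
        x += rng.choice((-1, 1))
    return b0, b1, x


rng = random.Random(2024)  # fixed seed for reproducibility
n = 100_000
draws = [sample(rng) for _ in range(n)]
law_b1 = {k: v / n for k, v in sorted(Counter(d[1] for d in draws).items())}
law_b2 = {k: v / n for k, v in sorted(Counter(d[2] for d in draws).items())}
print(law_b1)
print(law_b2)
```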

, compactness and existence of optimisers
In this subsection, we specialise our setup to $X = \mathbb R$, $m = \mu_0 \in \mathcal P(\mathbb R)$ and $d = n$. Let $\mu_0, \mu_1, \dots, \mu_n \in \mathcal P(\mathbb R)$ be centered, in convex order and with finite second moment. We extend $B$ to the extended probability space $\bar{\mathcal X}$ by setting $\bar B(x, \omega, u) = B(x, \omega)$. By considering the martingale $\bar B_t^2 - t$ we immediately get (see the proof of Lemma 3.12 in [2] for more details)

Lemma 3.17 Let $\xi \in \mathrm{RMST}_n$ and assume that $B^\xi = (\mu_0, \mu_1, \dots, \mu_n)$. Let $(\rho^1, \dots, \rho^n)$ be any representation of $\xi$ granted by Lemma 3.11. Then the following are equivalent Of course it is sufficient to test any of the above quantities for $i = n$.

Definition 3.18 We denote by $\mathrm{RMST}(\mu_0, \mu_1, \dots, \mu_n)$ the set of all randomised multi-stopping times satisfying one of the conditions in Lemma 3.17.
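Since $\bar B_t^2 - t$ is a martingale, optional stopping gives $\mathbb E[\rho^i] = \int x^2\,d\mu_i - \int x^2\,d\mu_0$ whenever the stopped process is sufficiently integrable. For a concrete pasted solution this can be confirmed by hand, using that Brownian motion started at $x \in [a, b]$ leaves $(a, b)$ after an expected time $(x - a)(b - x)$. The following Python sketch does this with exact fractions for the example $\mu_0 = \delta_0$, $\mu_1 = (\delta_1 + \delta_{-1})/2$, $\mu_2 = 2(\delta_2 + \delta_{-2})/5 + \delta_0/5$ of Remark 2.16, where, as an assumption of ours, $3/5$ of the paths returning to 0 after $\tau_1$ are sent on to $\{-2, 2\}$.

```python
from fractions import Fraction


def expected_exit_time(x, a, b):
    """E_x[exit time of (a, b)] for standard Brownian motion started at x in [a, b]."""
    return Fraction((x - a) * (b - x))


def second_moment(mu):
    return sum(w * y * y for y, w in mu.items())


mu0 = {0: Fraction(1)}
mu1 = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
mu2 = {-2: Fraction(2, 5), 0: Fraction(1, 5), 2: Fraction(2, 5)}

# tau_1 is the exit time of (-1, 1) started at 0.
e_tau1 = expected_exit_time(0, -1, 1)

# After tau_1 the path runs from +-1 until it hits 0 or +-2 (by symmetry use +1 and (0, 2));
# with probability 1/2 this happens at 0, and 3/5 of those paths continue until {-2, 2}.
e_tau2 = (
    e_tau1
    + expected_exit_time(1, 0, 2)
    + Fraction(1, 2) * Fraction(3, 5) * expected_exit_time(0, -2, 2)
)

print(e_tau1, second_moment(mu1) - second_moment(mu0))  # 1 1
print(e_tau2, second_moment(mu2) - second_moment(mu0))  # 16/5 16/5
```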
By pasting solutions to the one-marginal Skorokhod embedding problem one can see that the set $\mathrm{RMST}(\mu_0, \mu_1, \dots, \mu_n)$ is non-empty. However, the most important property is

Joinings of stopping times
We now introduce the notion of a joining; joinings will be used later to define new stopping times which are candidate competitors for our optimisation problem.

Definition 3.20 Let $(Y, \sigma)$ be a Polish probability space. The set $\mathrm{JOIN}(m, \sigma)$ of joinings between $\mathbb P = m \otimes \mathbb W$ and $\sigma$ is defined to consist of all sub-probability measures $\pi \in \mathcal P^{\le 1}(\mathcal X \times \mathbb R_+ \times Y)$ such that

Example 3.21 An important example in the sequel will be the probability space $(\mathcal X, \mathbb P)$ constructed from $X = S^{\otimes i}_{\mathbb R}$ and $m = r_i(\xi_i)$ for $\xi \in \mathrm{RMST}^1_n(\mathbb R, \mu_0)$ and $0 \le i < n$, where we set $S^{\otimes 0} = \mathbb R$, $r_0(\xi_0) = \mu_0$, leading to $\mathcal X = S^{\otimes i}_{\mathbb R} \times C(\mathbb R_+)$ and P = r i (ξ i )W = r i (ξ i ) (cf. Corollary 3.14).

Colour swaps, multi-colour swaps and stop-go pairs
In this section, we will define the general notion of stop-go pairs, which was already introduced in a weaker form in Sect. 2.1. We will do so in two steps: first we define colour swap pairs, and then we combine several colour swaps to obtain multi-colour swaps. Together, they make up the stop-go pairs. Our basic intuition for the different swapping rules comes from the following picture. We imagine that each of the measures $\mu_1, \dots, \mu_n$ carries a certain colour, i.e. the measure $\mu_i$ carries colour $i$. The Brownian motion is thought of as being represented by a particle of a certain colour: at time zero the Brownian particle has colour 1, and when it is stopped for the $i$-th time it changes its colour from $i$ to $i + 1$ (cf. Fig. 1 in Sect. 2.1).
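In this picture the colour of the particle at time $t$ is simply one plus the number of stopping times that have already occurred, $1 + \#\{i : \tau_i \le t\}$; consecutive stopping times may coincide, in which case several colour changes happen at once. A tiny Python sketch of this bookkeeping (the function name is ours):

```python
from bisect import bisect_right


def colour(t, stopping_times):
    """Colour at time t: 1 + number of (increasing) stopping times tau_i <= t."""
    return 1 + bisect_right(stopping_times, t)


taus = [0.3, 0.3, 1.2]  # tau_1 <= tau_2 <= tau_3; the first two colour changes coincide
print([colour(t, taus) for t in (0.0, 0.3, 1.0, 1.2, 5.0)])  # [1, 3, 3, 4, 4]
```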
In identifying a stop-go pair, we want to consider two sub-paths, $(f, s_1, \dots, s_i)$ and $(g, t_1, \dots, t_i)$, and imagine the future stopping rules, which will now be sequences of colour changes, obtained by concatenating a path $\omega$ onto the two paths. The simplest way of creating a new stopping rule is simply to exchange the coloured tails. This preserves the marginal law of the stopped process, while generating a new multi-stopping time. A generalisation of this rule would be to try to swap back to the original colour rule at the $j$-th colour change, where $i < j$. In this case, one would swap the colours until the first time one of the paths would stop for the $j$-th time, after which one attempts to revert to the previous stopping rule. Note, however, that this may not be possible: if the other path has not yet reached the $(j-1)$-st colour change, then the rules cannot be swapped back, since one would have to switch from the $j$-th colour to the $(j-1)$-st colour, which is not allowed. Instead, in such a case, we simply keep the swapped colourings. We call recolouring rules of this nature colour swaps (or $i \leftrightarrow j$ colour swaps). We will define such colour swap pairs in Sect. 4.2.
After consideration of these colour swaps, it is clear that the time at which we revert to the original stopping rule could be chosen in a more sophisticated manner. For example, instead of trying to revert only at the $j$-th colour change, one could try to revert at every colour change, and revert the first time it is possible to do so. This recolouring rule gives us a second set of possible path swaps, and we call such pairs multi-colour swaps. We will define these recolouring rules in Sect. 4.3. Of course, a multitude of other rules can easily be created. For our purposes, colour swaps and multi-colour swaps will be sufficient, but other generalisations could easily be considered, and may be important for showing optimality in cases outside those considered in the current paper. We leave this as an avenue for future research.

In this case it is possible that also all particles of colour $j \in \{i + 2, \dots, n\}$ are stopped at time $s$ by $(\xi^{i+1}_{(f, s_1, \dots, s_i)})_{h \oplus \omega}$. This is the reason for the closed intervals in the second line on the right-hand side of (4.1). Using Lemma 3.11 resp. Corollary 3.13 it is not hard to see that (4.1) indeed defines a randomized multi-stopping time (one simply has to consider the stopping times $\rho^l(\omega, u_1, \dots, u_l)$ representing ξ ( f ,s 1 ,...,s i ) with ,s 1 ,...,s i ) for the first case, and the second case is immediate). Accordingly, we define the normalised conditional randomised multi-stopping times, bȳ is measurable.
Recall the connection of Borel sets of $S_X$ and optional sets in $\mathcal X \times \mathbb R_+$ given by Proposition 3.1. r (ω, s)). Then, ( f , s 1 , . . . , ,s 1 ,...,s i ) (h ⊕ ω) W(dω) < 1. Set $X = S^{\otimes i}$ and $m = r_i(\xi_i)$ and recall that the natural coordinate process on $\mathcal X$ is denoted by $Y$. Given a $\mathcal G^0$-stopping time $\tau$ on $(\mathcal X, \mathcal G, \mathbb P)$ we have r i (ξ i ) a.s. by the strong Markov property and the fact that $\xi$ is almost surely a finite stopping time: Hence, the first part follows from the optional section theorem.

Colour swaps
As a first step towards the definition of stop-go pairs we introduce an important building block, the colour swap pairs. By Corollary 3.13 and Corollary 3.14, for $r_i(\xi_i)$-a.e. $(g, t_1, \dots, t_i)$ there is an increasing sequence $(\rho^j_{(g, t_1, \dots, t_i)})_{j=i+1}^n$ of $\bar{\mathcal F}^a$-stopping times such that ,s 1 ,...,s i−1 )|(h,s) = δ 0 · · · δ 0 there is an increasing sequence $(\rho^j_{(f, s_1, \dots, s_{i-1})|(h,s)})_{j=i}^n$ of $\bar{\mathcal F}^a$-stopping times such that δ ρ i+1

the corresponding results, Theorem 5.7 (resp. Theorem 5.16), in [2]. For the benefit of the reader, and to keep our presentation compact, we concentrate on those aspects of the proof where additional insight is needed to account for the multi-marginal aspects of the problem. We refer the reader to [2] for other details.
The essence of the proof is to first show that if we have a candidate optimiser ξ , and a joining rule π which identifies stop-go pairs, we can construct an infinitesimal improvement ξ π , which will also be a candidate solution, but which will improve the objective. It will follow that the joining π will place no mass on the set of stop-go pairs. The second part of the proof shows that we can strengthen this to give a pointwise result, where we can exclude any stop-go pair from a set related to the support of the optimiser.
The proof of Theorem 5.2 is based on the following two propositions.
and consider the corresponding probability space (X, P). By Proposition 5.
Finally, we can take a Borel subset of i with full measure and taking suitable intersections we can assume that proj S ⊗i−1

Proof of Proposition 5.3
For notational convenience we will only prove the statement for the colour swap pairs $\mathrm{CS}^\xi_i$. As the colour swap pairs are the main building block for the multi-colour swap pairs $\mathrm{MCS}^\xi_i$, it will be immediate how to adapt the proof for the general case. Moreover, it is clearly sufficient to show that for every $j \ge i$ we have $(r_X \otimes \mathrm{Id})(\pi)(\mathrm{CS}^\xi_{i \leftrightarrow j}) = 0$ for each $\pi \in \mathrm{JOIN}(r_{i-1}(\xi_{i-1}), r_i(\xi_i))$.
We also define the secondary stop-go pairs of colour $i$ relative to $\xi$ in the wide sense, SG

Theorem 5.7 follows from a straightforward modification of Proposition 5.3 by the same proof as for Theorem 5.2, using Proposition 5.4. We omit further details.

Proof of main result
We are now able to conclude: our main result is a simple consequence of the previous results.
Proof of Theorem 2.5 Since any $\xi \in \mathrm{RMST}(\mu_0, \dots, \mu_n)$ induces, via Lemma 4.2 and Corollary 3.13, a sequence of stopping times as used for the definition of stop-go pairs in Sect. 2.1, the result follows from Theorem 5.7.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.