From Bachelier to Dupire via optimal transport

Famously, mathematical finance was started by Bachelier in his 1900 PhD thesis where – among many other achievements – he also provided a formal derivation of the Kolmogorov forward equation. This also forms the basis for Dupire’s (again formal) solution to the problem of finding an arbitrage-free model calibrated to a given volatility surface. The latter result has rigorous counterparts in the theorems of Kellerer and Lowther. In this survey article, we revisit these hallmarks of stochastic finance, highlighting the role played by some optimal transport results in this context.

Bachelier considered Brownian motion as an infinitesimal version of a random walk. His 19th-century-style argument runs as follows. Suppose that the grid in space is given by ..., x_{n-2}, x_{n-1}, x_n, x_{n+1}, x_{n+2}, ... with the same (infinitesimal) distance Δx = x_n − x_{n-1} for all n, and such that at time t, these points have (infinitesimal) probabilities ..., p^t_{n-2}, p^t_{n-1}, p^t_n, p^t_{n+1}, p^t_{n+2}, ...
What are the probabilities of these points at time t + Δt?
The random walk moves half of the mass p^t_n sitting at time t in x_n to the point x_{n+1}. Conversely, it moves half of the mass p^t_{n+1} sitting at time t in x_{n+1} to the point x_n. We thus may calculate the net difference between p^t_n/2 and p^t_{n+1}/2, which Bachelier identifies with

p^t_n/2 − p^t_{n+1}/2 ≈ −((Δx)²/2) ∂_x p_t(x),

where p_t(·) denotes the probability density, i.e., p^t_n ≈ p_t(x_n) Δx, and where we let x = x_n = x_{n+1}, which is legitimate for Bachelier as x_n and x_{n+1} only differ by an infinitesimal. This amount of mass is transported from the interval (−∞, x_n] to [x_{n+1}, ∞) during the time interval (t, t + Δt). In Bachelier's own words, this is very nicely captured by the following quote from his thesis: "Chaque cours x rayonne pendant l'élément de temps vers le cours voisin une quantité de probabilité proportionelle à la différence de leurs probabilités. Je dis proportionnelle, car on doit tenir compte du rapport de Δx à Δt. La loi qui précède peut, par analogie avec certaines théories physiques, être appelée la loi du rayonnement ou de diffusion de la probabilité." In the English translation: "Each price x during an element of time radiates towards its neighbouring price an amount of probability proportional to the difference of their probabilities. I say proportional because it is necessary to account for the relation of Δx to Δt. The above law can, by analogy with certain physical theories, be called the law of radiation or diffusion of probability." Passing formally to the continuous limit and, using today's terminology, denoting by

P_t(x) = ∫_{−∞}^x p_t(y) dy

the cumulative distribution function of the density p_t(·), we obtain

∂_t P_t(x) = (1/2) ∂_x p_t(x), (1.1)

where we have normalised the relation between Δx and Δt, namely (Δx)² = Δt, to obtain the constant 1/2. By differentiating (1.1) with respect to x, one obtains the usual heat equation

∂_t p_t(x) = (1/2) ∂_xx p_t(x) (1.2)

for the density function p_t(x), which then is Gaussian. Of course, the heat equation was known to Bachelier, and he notes regarding (1.2) "C'est une équation de Fourier." ("This is a Fourier equation.") Bachelier thus derived, on a formal level, the Kolmogorov forward equation, also known as Fokker-Planck equation, for the propagation of a probability density p under Brownian motion.
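Bachelier's radiation scheme is easy to simulate. The following sketch (grid sizes and the normalisation (Δx)² = Δt are illustrative choices, not taken from the thesis) exchanges half of each site's mass with its two neighbours and compares the resulting distribution with the Gaussian heat kernel; the comparison is made at the level of cumulative distribution functions, which smooths out the even/odd parity effect of the discrete walk.

```python
import numpy as np
from math import erf, sqrt

# Bachelier's "radiation of probability": in each time step, every grid
# point sends half of its mass to each of its two neighbours.
dx = 0.05
dt = dx**2                # normalisation (Delta x)^2 = Delta t
n_steps = 400
t = n_steps * dt          # elapsed time, here t = 1.0

grid = np.arange(-4.0, 4.0 + dx / 2, dx)
p = np.zeros(len(grid))
p[len(grid) // 2] = 1.0   # all mass starts at x = 0

for _ in range(n_steps):
    p = 0.5 * np.roll(p, 1) + 0.5 * np.roll(p, -1)

# Compare the cumulative distribution with that of N(0, t).
cdf = np.cumsum(p)
gauss_cdf = np.array([0.5 * (1 + erf(x / sqrt(2 * t))) for x in grid])
err = np.max(np.abs(cdf - gauss_cdf))
```

After 400 radiation steps the discrete mass distribution is already close to the N(0, 1) heat kernel, in line with the formal limit above.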
The forward equation will also play an important role subsequently, and we take the opportunity to note that Bachelier's argument can equally well be applied to the more general process with increments dX_t = σ(t, X_t) dW_t to arrive at the PDE

∂_t p_t(x) = (1/2) ∂_xx (σ²(t, x) p_t(x)). (1.3)

But let us still remain with the form (1.1) of the heat equation and analyse its message in terms of "horizontal transport of probability measures". One may ask: what is the "velocity field", acting on the set of probabilities on R, which moves the probability density p_t(·) to the probability density p_{t+dt}(·)? Following Bachelier's intuition and keeping in mind that the mass sitting at time t in x equals p_t(x), the velocity of this move at the point x must be equal to

v_t(x) = −(1/2) ∂_x p_t(x) / p_t(x), (1.4)

which has the natural interpretation as the "speed" of the horizontal transport induced by p_t(x). We thus encounter in nuce the "score function"

∇ log p_t(x) = ∇p_t(x) / p_t(x),

where the nabla notation ∇ indicates that this is a vector field which makes perfect sense in the n-dimensional case, too.
At this stage, we can relate Bachelier's work with the more recent notion of the Wasserstein metric W_2(·, ·), at least intuitively and at an infinitesimal level. One may ask: what is the necessary kinetic energy needed to transport p_t(·) to p_{t+dt}(·)? Knowing the speed (1.4) and the usual formula for the kinetic energy, we obtain for the Wasserstein distance between the two infinitesimally close probabilities p_t and p_{t+dt} the expression

W_2(p_t, p_{t+dt}) = ( ∫_R v_t(x)² p_t(x) dx )^{1/2} dt,

where v_t is the velocity field (1.4). For a formal definition of the Wasserstein distance W_2(·, ·), we refer e.g. to Villani [56, Definition 6.1]. While for the finite version of the Wasserstein distance between two probability measures, one has to find an optimal transport plan, the situation is simpler - and very pleasant - in the case of the infinitesimal transport induced by the vector field (1.4). This infinitesimal transport is automatically optimal in an asymptotic sense. Indeed, under suitable regularity conditions, the vector field inducing the optimal transport between p_t and p_{t+h} converges, after normalising by 1/h, to the vector field (1.4). Intuitively, this corresponds to the geometric insight in the one-dimensional case that the transport lines of infinitesimal length cannot cross each other. For a thorough treatment of the geometry of absolutely continuous curves of probabilities such as (p_t(·))_{t≥0} above, we refer to the lecture notes by Ambrosio et al. [3, Chap. II].
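For the heat kernel p_t = N(0, t), all of this can be checked in closed form: the velocity field (1.4) becomes v_t(x) = x/(2t), the kinetic-energy formula gives the speed 1/(2√t), and the exact Gaussian formula W_2(N(0, t), N(0, t + h)) = √(t + h) − √t agrees with it to first order in h. A small numerical sanity check (grid and step size are arbitrary choices):

```python
import numpy as np

t, h = 1.0, 1e-3

# Velocity field (1.4) for the heat kernel p_t = N(0, t):
# v_t(x) = -(1/2) p_t'(x) / p_t(x) = x / (2 t).
x = np.linspace(-8.0, 8.0, 20001)
dx = x[1] - x[0]
p = np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)
v = x / (2 * t)

# Kinetic-energy formula: speed of the curve (p_t) in Wasserstein space,
# which should equal 1/(2 sqrt(t)).
speed = np.sqrt(np.sum(v**2 * p) * dx)
w2_kinetic = speed * h

# Exact Wasserstein-2 distance between N(0, t) and N(0, t + h).
w2_exact = np.sqrt(t + h) - np.sqrt(t)
```

With t = 1 the speed is 1/2, and the kinetic-energy approximation matches the exact Gaussian distance up to terms of order h².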
We finish the section by returning to Bachelier's thesis. The rapporteur of Bachelier's dissertation was no lesser a figure than Henri Poincaré. Apparently he was aware of the enormous potential of the section "Rayonnement de la probabilité" in Bachelier's thesis, when he added to his very positive report the handwritten phrase "On peut regretter que M. Bachelier n'ait pas développé davantage cette partie de sa thèse." That is: One might regret that Mr Bachelier did not develop further this part of his thesis. Truly prophetic words!

Dupire's formula
We now turn to a well-known and more recent topic in mathematical finance continuing the early achievements of Bachelier.
Suppose that in a financial market, we know the prices of "many" European options on a given (highly liquid) stock S. What can we deduce from this data about the prices of exotic, i.e., path-dependent options?
This question leads to the following mathematical idealisation. Suppose we know the prices of all European call options, i.e., the price C(t, x) of every call option with strike price x and maturity t, for every 0 ≤ t ≤ T and x ∈ R + . Our task is to analyse the set of all possible (local) martingale measures for the stock price process which are compatible with this data. Once we have a handle on the relevant set of martingale measures, we can price arbitrary exotic options by taking expectations.
To make the question more tractable, it is a good idea to restrict the class of processes under consideration e.g. to continuous, Markovian martingales. We also make the economically meaningful assumptions that the function (t, x) → C(t, x) is sufficiently smooth, as well as strictly convex in the variable x and strictly increasing in the variable t, to allow subsequent formal manipulations.
The first observation is that the knowledge of C(t, x) for 0 ≤ t ≤ T and x > 0 is tantamount to the knowledge of the marginal probabilities (μ_t)_{0≤t≤T} of the underlying stock price process under a martingale measure which determines the prices via the formula

C(t, x) = ∫ (y − x)^+ μ_t(dy) = E[(S_t − x)^+]. (2.1)

This observation goes back to Breeden and Litzenberger [12].
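The Breeden-Litzenberger observation can be illustrated numerically: differentiating call prices twice in the strike recovers the density of μ_t. The sketch below does this by finite differences in a Bachelier-type model S_t ~ N(s0, σ²t), where the call price is available in closed form; all parameter values are illustrative.

```python
import numpy as np
from math import erf, exp, pi, sqrt

def norm_pdf(z):
    return exp(-z * z / 2) / sqrt(2 * pi)

def norm_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def call_price(s0, sigma, t, x):
    # C(t, x) = E[(S_t - x)^+] for S_t ~ N(s0, sigma^2 t)
    s = sigma * sqrt(t)
    d = (s0 - x) / s
    return (s0 - x) * norm_cdf(d) + s * norm_pdf(d)

s0, sigma, t, dx = 100.0, 5.0, 1.0, 0.01
strikes = np.arange(85.0, 115.5, 0.5)

# Breeden-Litzenberger: p_t(x) = d^2 C / dx^2, here via central differences.
recovered = np.array([
    (call_price(s0, sigma, t, x + dx) - 2 * call_price(s0, sigma, t, x)
     + call_price(s0, sigma, t, x - dx)) / dx**2
    for x in strikes
])
true_density = np.array([
    norm_pdf((x - s0) / (sigma * sqrt(t))) / (sigma * sqrt(t)) for x in strikes
])
```

The finite-difference curvature of the call prices reproduces the N(100, 25) marginal density to high accuracy across the range of strikes.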
If the measures μ_t are absolutely continuous with respect to Lebesgue measure with a continuous density function p_t(x), then (2.1) amounts to the relation

∂_xx C(t, x) = p_t(x), x > 0, (2.2)

as one verifies via integration by parts. In a very influential and highly cited paper from 1994 (compare also the work of Derman and Kani [16]), Dupire [17] considered diffusion processes of the form

dS_t = σ(t, S_t) S_t dW_t, (2.3)

where the "local volatility" σ(·, ·) is modelled as a deterministic function of t and x, and (W_t) is a Brownian motion adapted to its natural filtration (F_t)_{0≤t≤T}. It turns out that there is the beautiful and strikingly simple "Dupire formula" which relates σ(·, ·) to the given option prices C(t, x), namely

σ²(t, x) = 2 ∂_t C(t, x) / (x² ∂_xx C(t, x)). (2.4)
Indeed, the Fokker-Planck equation implies, at least on a formal level, that the marginal densities satisfy

∂_t p_t(x) = (1/2) ∂_xx (σ²(t, x) x² p_t(x)).

Integrating with respect to x, using (2.2) and changing the order of derivatives quickly yields (2.4). We note that this beautiful argument is very much in line with Bachelier's reasoning in (1.1) and (1.2) above pertaining to the case of constant volatility σ. We note in passing that Bachelier used instead of the wording "volatility" the more colourful term "nervousness of the market".
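As a sanity check of Dupire's formula (2.4), one can feed it Black-Scholes call prices with constant volatility σ and verify that the recovered local volatility is again σ. The following sketch uses finite differences; parameter values are illustrative and zero interest rates are assumed.

```python
import numpy as np
from math import erf, log, sqrt

def norm_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def bs_call(s0, sigma, t, x):
    # Black-Scholes call price with zero rates, maturity t, strike x.
    s = sigma * sqrt(t)
    d1 = (log(s0 / x) + 0.5 * s * s) / s
    return s0 * norm_cdf(d1) - x * norm_cdf(d1 - s)

s0, sigma = 1.0, 0.3
dt, dx = 1e-5, 1e-4

local_vols = []
for t in (0.5, 1.0):
    for x in (0.8, 1.0, 1.25):
        c_t = (bs_call(s0, sigma, t + dt, x)
               - bs_call(s0, sigma, t - dt, x)) / (2 * dt)
        c_xx = (bs_call(s0, sigma, t, x + dx) - 2 * bs_call(s0, sigma, t, x)
                + bs_call(s0, sigma, t, x - dx)) / dx**2
        # Dupire's formula (2.4): sigma(t, x)^2 = 2 dC/dt / (x^2 d^2C/dx^2)
        local_vols.append(sqrt(2 * c_t / (x * x * c_xx)))
```

Across all maturities and strikes tested, the recovered local volatility agrees with the constant input volatility 0.3, as (2.4) predicts.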
Of course, Dupire's formal arguments need proper regularity assumptions in order to be justified. There are two aspects: existence and uniqueness of the martingales fitting the given option prices C(t, x). As regards the former, the question of existence amounts to a remarkable theorem by Kellerer [38,39]: Given a family (μ_t)_{0≤t≤T} of probability distributions on R which is increasing in the convex order, there is a Markov martingale having these probabilities as marginals. By "increasing in the convex order", we mean that each μ_t has finite first moment and that μ_t(f) := ∫_R f(x) μ_t(dx) is nondecreasing in t, for every convex function f on R. Kellerer's theorem extends earlier work of Strassen [54] who established a discrete-time version of the result. We also note that the convex order condition on the marginal distributions is necessary, as easily follows from Jensen's inequality.
Kellerer's theorem goes far beyond the simple formula (2.4) and has been further refined, notably by Lowther [46,45,47] in an impressive series of papers. We shall review these results in the subsequent sections.
However, from an application point of view, the existence question is not of primordial relevance. After all, the function C(t, x) is an idealisation of reality which has to be estimated from a finite set of given European option prices. In this context, it does not harm to make strong regularity assumptions on the smoothness and convexity (in the variable x) of the function C(t, x) which justify the above argument. Under such assumptions, Dupire's solution (2.4) does make sense and the issue of existence is settled.
A different issue is the question of uniqueness. As we shall see below, this question is challenging and relevant -at least from a mathematical point of view -even in very regular settings, such as the Bachelier or the Black-Scholes model.
In order to formulate existence and uniqueness results for a process with given marginals, one has to specify the class of processes with respect to which we want to establish existence and uniqueness. Under proper regularity assumptions, the unique solution should of course equal Dupire's solution. Dupire's process is a martingale with continuous paths, enjoying the Markov property. Is Dupire's solution unique within this class? In a veritable tour de force, Lowther [46,45] has shown that the answer is yes, provided that we replace the word Markov by the words strong Markov and restrict to continuous processes.
We also refer to Hirsch et al. [30,Theorem 6.1] where a slightly different version of this theorem, credited to Morgan Pierre, is proved. These theorems settle the question of uniqueness in a very satisfactory way. We shall discuss Lowther's theorem in more detail in Sect. 6.
To the best of our knowledge, the following question remained open: Is it really necessary to add the adjective strong to the word Markov in Lowther's uniqueness theorem? At least if one is willing to accept strong regularity assumptions on the function C( · , · ) and the resulting process S as defined in (2.3), one may ask whether the Markov property alone is sufficient. We focus on this question in the next section.

An eye-opening example
The subsequent example has been known since the work of Dynkin and Jushkevich [18] in the 1950s.

Example 3.1
There is an R + -valued, continuous, Markov martingale which fails to be strongly Markovian.
Proof We define the process S = (S_t)_{0≤t≤1} by starting at S_0 = 1 and subsequently proceeding in two steps. For t ∈ [0, 1/2], the process S is a stopped geometric Brownian motion, i.e.,

S_t = exp(B_{t∧τ} − (t∧τ)/2),

where B is a standard Brownian motion and τ is the first moment when S hits the level 2. For 1/2 ≤ t ≤ 1, we distinguish two cases. If S has been stopped, i.e., if S_{1/2} = 2, the process S simply remains constant at the level 2. If this is not the case, the process continues to follow a geometric Brownian motion, i.e.,

S_t = S_{1/2} exp((B_t − B_{1/2}) − (t − 1/2)/2).

Obviously, S is a continuous martingale. The crucial feature is its Markovian nature: The Markov property follows from the fact that for every fixed (deterministic) time 1/2 ≤ t ≤ 1, the probability for the geometric Brownian motion S_t to be equal to 2 is zero on the set {τ > 1/2}. Hence, for every fixed 1/2 ≤ t ≤ 1, the conditional law of (S_u)_{t≤u≤1} is almost surely determined by the present value S_t of the process.
Why does S fail to be strongly Markovian? On the set {τ > 1/2}, define the stopping time ϑ as the first instance u > 1/2 when S_u equals the value 2, which happens with positive probability during the interval (1/2, 1). At time ϑ ∧ 1, the process S therefore takes the value 2 on a non-negligible part of the set {τ > 1/2}. Of course, the random variable S_τ equals 2 on the set {τ ≤ 1/2}, too. Hence there is no strongly Markovian prescription telling the process S what to do after time ϑ: Without further information on the past, the process S cannot decide whether it should remain constant or continue to move on as a geometric Brownian motion.
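A Monte Carlo sketch of Example 3.1 (path count, step count and seed are arbitrary simulation choices): the simulated process keeps unit expectation, as a martingale should, and places an atom of positive mass at the level 2, while unstopped paths cross the level 2 freely in the second half of the time interval.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 20000, 1000
dt = 1.0 / n_steps

paths = np.full(n_paths, 1.0)
stopped = np.zeros(n_paths, dtype=bool)

for k in range(n_steps):
    z = rng.standard_normal(n_paths)
    gbm_step = np.exp(-0.5 * dt + np.sqrt(dt) * z)
    # stopped paths stay at the level 2, the others follow geometric BM
    paths = np.where(stopped, 2.0, paths * gbm_step)
    if k < n_steps // 2:
        # first phase t <= 1/2: absorb at the level 2
        stopped |= paths >= 2.0
        paths = np.where(stopped, 2.0, paths)
    # second phase t > 1/2: no absorption, paths may cross the level 2

mean_S1 = paths.mean()              # martingale: should be close to 1
atom_at_2 = np.mean(paths == 2.0)   # mass sitting at the atom {S_1 = 2}
```

With these (illustrative) parameters roughly a fifth of the mass ends up frozen at the level 2, coming exclusively from the first phase, in line with the construction above.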
Let us apply this example to the pricing of options of the form (S_1 − x)^+ with strike x > 0. For 0 ≤ t ≤ 1, define the conditional option prices

P^x(t, z) := E[(S_1 − x)^+ | S_t = z].

Letting z = 2, we find

P^x(t, 2) = (2 − x)^+.

Indeed, this is clear for t ≤ 1/2 because then τ ≤ t ≤ 1/2 and S_1 = 2. For t > 1/2, the set {S_t = 2, τ > 1/2} has probability 0 so that again S_1 = 2 P-a.s. on {S_t = 2}. On the other hand, for z ≠ 2 and 1/2 ≤ t < 1, the prices P^x(t, z) are given by the usual Black-Scholes formula and are therefore strictly positive. Hence, for 1/2 ≤ t < 1, the option prices z → P^x(t, z) are discontinuous at z = 2. They also fail to be increasing and convex in the variable z, which a reasonable option pricing regime should certainly satisfy. On the other hand, we note that these option prices, strange as they might be, do not violate the no-arbitrage principle as they were legitimately derived from a martingale.
The marginal distributions of the process S have an atom at the point 2 which is rather unpleasant. One may ask whether it is possible to construct variants of the above example which have more regular marginals.
Here is a fairly straightforward modification. Fix an uncountable compact set K in R + with zero Lebesgue measure. For example, one may take the (shifted) classical Cantor set

K = {1 + Σ_{n≥1} ε_n/3^n : ε_n ∈ {0, 2}}

and c^{-1}: [0, 1] → K as the (strictly increasing) right-continuous generalised inverse of the Cantor function associated with K. We can modify the construction of Example 3.1 as follows. On the time interval [0, 2/3], let S be a geometric Brownian motion started at S_0 = 1, but stopped at the stopping time

τ := inf{t ∈ [1/3, 2/3] : S*_t ≥ c^{-1}(t)},

where S*_u := max_{r≤u} S_r denotes the running maximum of S. Clearly, the probability that τ takes any fixed value vanishes, but the set {τ ∈ [1/3, 2/3]} has positive probability. To see this, consider the event E := {S*_{1/3} ≥ 2}, which has positive probability. Since S* and c^{-1} are increasing and continuous resp. right-continuous, there exists on E a minimal t* ∈ [1/3, 2/3] with S*_{t*} ≥ c^{-1}(t*), and we deduce from (right-)continuity of the involved functions that τ = t* ≤ 2/3 on E. After time 2/3, on {τ ∈ [1/3, 2/3]}, the process S remains constant, i.e., S_t = S_τ, and on {τ ∉ [1/3, 2/3]}, S continues to follow geometric Brownian motion. The process S thus enjoys all the features of Example 3.1 and in addition has continuous marginals; this uses that c^{-1} is strictly increasing so that the stopped process does not get stuck in some point with positive probability. Note, however, that these marginals are not given by densities as they are not absolutely continuous with respect to Lebesgue measure.
Turning back to the context of Example 3.1, there is another continuous Markovian martingale with the same marginals as S, inducing reasonable option prices. In fact, there is a continuous strongly Markovian martingale with this property and which is unique in this latter class (Theorem 4.1 below).
We only give an informal, verbal description of this strong Markov process. On the stochastic interval 0 ≤ t ≤ 1/2 ∨ τ, let S be defined as in Example 3.1. For 1/2 ∨ τ ≤ t ≤ 1, we have to define S in a way that keeps the probability of the event {S_t = 2} constant and preserves the strong Markov property. For this reason, we stop paths at the level 2 and at the same time start excursions from the set of paths stopped at the level 2 with a certain intensity rate. We are free to choose this rate in such a way that the mass remaining at the atom {S_t = 2} equals precisely the constant mass which is prescribed by the given marginals of the process S. We thus have indicated the construction of another continuous martingale having the same marginals as the process S in Example 3.1. One may check that the latter construction is strongly Markovian, as opposed to the above construction in Example 3.1, and that the option prices are increasing and strictly convex in the variable z, as they should be. It will follow from Theorem 4.1 below that the latter martingale is the unique strong Markov solution for the given marginals.
Note that this answers the question raised at the end of Sect. 2. In Lowther's uniqueness theorem, it is not sufficient to consider Markovian (but not necessarily strongly Markovian) martingales; as we have just seen, there exist two distinct continuous Markov martingales with the same one-dimensional marginal distributions.
In view of Dupire's formula, this leads to the next question. It seemed natural to conjecture (but turned out to be wrong, as seen above) that, provided the call prices are sufficiently regular in t and x, there should be only one continuous Markov martingale matching these prices. Correspondingly one would ask: can one obtain similar examples as above, i.e., a continuous strongly Markovian martingale and a continuous Markov martingale failing the strong Markov property with the same absolutely continuous (or even more regular) marginals?
To the surprise of the present authors, it turned out that the answer is "yes", even when we pass to the "most regular" situation when S is a Brownian motion, i.e., in the Bachelier model (or the Black-Scholes model). The construction is more involved but rests on the above developed intuition; see the companion paper by Beiglböck et al. [10].

Uniqueness of Dupire's diffusion
There is a huge literature on one-dimensional processes inducing a given family of one-dimensional marginal distributions (see Kellerer [38], Madan and Yor [48], Hirsch and Roynette [29], Hirsch et al. [30], Beiglböck et al. [7], Lowther [45], Hamza and Klebaner [26], Fan et al. [19], Hobson [32], Oleszkiewicz [49], Albin [2], Baker et al. [6], Källblad et al. [36], among others). In particular, the late Marc Yor and his co-authors Hirsch, Profeta and Roynette wrote the beautiful book [28] on "peacocks". This is a pun on the French acronym PCOC, for "processus croissant pour l'ordre convexe" (a process increasing in the convex order), and a peacock is a stochastic process (X_t)_{t≥0} for which the family of laws law(X_t), t ≥ 0, is increasing in the convex order. We take here the liberty to use the word peacock also for a family of probabilities (μ_t)_{t≥0} that increases in the convex order.

To connect with this literature, we find it more natural to pass from the multiplicative setting (2.3) to the additive setting of a martingale diffusion

dX_t = σ(t, X_t) dW_t. (4.1)

Hence we consider now processes taking values possibly in all of R and switch to the notation X instead of the "stock price" S. We note, however, that this change is only for notational reasons, and everything below could also be done in the multiplicative setting of the previous sections. Given a peacock (μ_t)_{t≥0}, we may define option prices via

C(t, x) := ∫_R (y − x)^+ μ_t(dy), (4.2)

where μ_t, t ≥ 0, denote the one-dimensional marginals of X_t, t ≥ 0, and x ∈ R. The "multiplicative" formula (2.4) becomes in the additive setting

σ²(t, x) = 2 ∂_t C(t, x) / ∂_xx C(t, x). (4.3)

We can now cite Lowther's complete solution for the uniqueness problem within the class of continuous, strong Markov martingales. We stress (and admire) that this theorem does not require any additional regularity assumptions.
Theorem 4.1 (Lowther [46,45]) Let X = (X_t)_{0≤t≤T} and Y = (Y_t)_{0≤t≤T} be R-valued, continuous, strong Markov martingales. If X and Y have the same one-dimensional marginal distributions, they also have the same distributions (as stochastic processes).
The proof of this theorem is highly technical and its presentation goes far beyond the scope of the present paper. Instead, we formulate a "toy" version of the theorem under strong regularity assumptions. We then analyse why the notion of strong Markovianity is key in the above theorem and finally give some hints on the strategy for the proof of Theorem 4.1.

Assumption 4.2
We suppose that the process X is given by X_0 = z_0 and (4.1), where σ(t, x) is sufficiently smooth to guarantee that there is a unique strong solution X. We also suppose that X_T has finite second moment. Denoting by μ_t the law of X_t, we assume that the function C(t, x) defined in (4.2) is strictly convex in the variable x, strictly increasing in the variable t and satisfies standard Itô smoothness assumptions, i.e., it is twice continuously differentiable in x and once continuously differentiable in t. We also assume that for every x ∈ R, the pricing function (t, z) → P^{X,x}(t, z) defined via

P^{X,x}(t, z) := E[(X_T − x)^+ | X_t = z] (4.4)

also satisfies these standard Itô assumptions (of course, now with respect to z and t).
Assumption 4.2 is strong enough to guarantee that the function C(t, x) indeed satisfies (4.3). Here is then the "toy" version of Theorem 4.1 with strong regularity assumptions which make life easier.

Theorem 4.3 Let X satisfy Assumption 4.2 and let Y = (Y_t)_{0≤t≤T} be another continuous Markov (but not necessarily strongly Markovian) martingale such that X_t and Y_t have the same distribution for every 0 ≤ t ≤ T. For fixed strike price x ∈ R, let P^{Y,x}(t, z) be the corresponding option prices defined via

P^{Y,x}(t, z) := E[(Y_T − x)^+ | Y_t = z], (4.5)

and assume that for every x, the function (t, z) → P^{Y,x}(t, z) also satisfies the above standard Itô smoothness assumptions. Then P^{X,x}(t, z) = P^{Y,x}(t, z) for all t, x, z, and the processes X and Y have the same distributions (as stochastic processes).
Proof As the function (t, z) → P^{Y,x}(t, z) is assumed to satisfy the standard Itô conditions, we may apply Itô's formula to obtain

dP^{Y,x}(t, Y_t) = ∂_t P^{Y,x}(t, Y_t) dt + ∂_z P^{Y,x}(t, Y_t) dY_t + (1/2) ∂_zz P^{Y,x}(t, Y_t) d⟨Y⟩_t,

where ⟨Y⟩ denotes the quadratic variation process of the continuous, square-integrable martingale Y. By (4.5), the process (P^{Y,x}(t, Y_t))_{0≤t≤T} is a martingale. Indeed, since Y is Markovian, we get

P^{Y,x}(t, Y_t) = E[(Y_T − x)^+ | Y_t] = E[(Y_T − x)^+ | F_t],

which is a martingale in the filtration of Y. The martingale condition implies that the drift term vanishes, so that the equality

∂_t P^{Y,x}(t, Y_t) dt + (1/2) ∂_zz P^{Y,x}(t, Y_t) d⟨Y⟩_t = 0

holds almost surely. Because t → P^{Y,x}(t, z) is strictly decreasing and z → P^{Y,x}(t, z) is strictly convex, we may define the function

ρ²(t, z) := −2 ∂_t P^{Y,x}(t, z) / ∂_zz P^{Y,x}(t, z)

and then conclude that d⟨Y⟩_t = ρ²(t, Y_t) dt. We therefore must have that Y may be represented as in (4.1), with σ replaced by ρ.
Recall that X and Y have by assumption the same one-dimensional marginals μ_t, t ≥ 0. Denoting the option prices of Y, for x ∈ R and t ≥ 0, by

C_Y(t, x) := E[(Y_t − x)^+] = ∫_R (y − x)^+ μ_t(dy),

we therefore have C_Y = C. On the other hand, the same reasoning as for (4.3) implies that C_Y satisfies

ρ²(t, x) = 2 ∂_t C_Y(t, x) / ∂_xx C_Y(t, x).

Comparing with (4.3), we obtain ρ² = σ², which shows the identity of the processes X and Y in distribution.

Theorem 4.3 provides a sufficient set of regularity assumptions to substantiate the statement in Dupire's paper [17] that ". . . we can recover, up to technical regularity assumptions, a unique diffusion process".
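The key step of the proof, extracting the squared diffusion coefficient from derivatives of the conditional option price, can be checked numerically. In the Bachelier model dX_t = σ dW_t (parameter values below are illustrative), the conditional price P(t, z) = E[(X_T − x)^+ | X_t = z] is explicit, and the ratio −2 ∂_t P / ∂_zz P should return σ²:

```python
from math import erf, exp, pi, sqrt

sigma, T, strike = 0.4, 1.0, 0.1   # illustrative coefficient, horizon, strike

def norm_pdf(z):
    return exp(-z * z / 2) / sqrt(2 * pi)

def norm_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def P(t, z):
    # conditional call price E[(X_T - strike)^+ | X_t = z] for dX = sigma dW
    s = sigma * sqrt(T - t)
    d = (z - strike) / s
    return (z - strike) * norm_cdf(d) + s * norm_pdf(d)

t, z = 0.5, 0.2
dt, dz = 1e-6, 1e-4
p_t = (P(t + dt, z) - P(t - dt, z)) / (2 * dt)
p_zz = (P(t, z + dz) - 2 * P(t, z) + P(t, z - dz)) / dz**2
rho_squared = -2 * p_t / p_zz      # should recover sigma^2 = 0.16
```

The recovered value agrees with σ² to finite-difference accuracy, illustrating why matching conditional option prices pins down the diffusion coefficient.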
Of course, one could do some massaging of the above argument to somewhat weaken the very strong Assumption 4.2 which we have imposed. But there is a long and thorny road, going far beyond simple cosmetic changes, to arrive at Lowther's result in Theorem 4.1.
In Theorem 4.3, the strong regularity assumptions imply in particular the strong Markov property of the process X (although this is not used in the simple proof above). We stress once more that in the setting of Lowther's result in Theorem 4.1, the strong Markov property is the key assumption.
Passing to Lowther's notation and looking at (4.4), a crucial step in the above argument is to start from a convex, increasing and 1-Lipschitz function g, such as g(z) = (z − x)^+, and pass to its conditional expectations

f(t, z) := E[g(X_T) | X_t = z]. (4.6)

In order to start a chain of arguments, one has to verify that f(t, z) is a "nice" function. When looking at Example 3.1 and its variants, we have seen that in that case, for g(z) = (z − x)^+, this is not at all the case. Its conditional expectation f(t, z) lacked each of the following desired properties: continuity, monotonicity, and convexity in z.
Contrary to this lamentable breakdown of regularity, we shall verify in Corollary 5.3 that the strong Markov property guarantees that the following three properties are inherited from g( · ) by each f (t, · ): convexity, monotonicity, and 1-Lipschitz continuity (which serves as a more quantitative version of continuity). This preservation of regularity is a decisive feature of Lowther's proof.

Coupling strong Markov processes
What is the salient property which distinguishes the strong Markov property from the Markov property in our context? While the former condition allows Lowther's uniqueness theorem to hold true, we have seen in Example 3.1 that there may be different continuous Markov martingales inducing the same marginals. The following well-known concept is the key to understanding the difference.

Definition 5.1
For probability measures π_1 and π_2 on R, we say that π_2 dominates π_1 to first order if for every a ∈ R, we have

π_1((−∞, a]) ≥ π_2((−∞, a]).

We show in the next proposition that the strong Markov property of a continuous martingale implies that the transition probabilities (π^{s,t}_x)_{x∈R} given by

π^{s,t}_x[A] := P[X_t ∈ A | X_s = x],

where s < t and A is a Borel set in R, are increasing to first order in the variable x, for every s < t. We follow Hobson [31] who applied a well-known technique, namely the "joys of coupling" (to quote his paper), in the present context.

Proposition 5.2 Let X = (X t ) 0≤t≤T be a continuous strong Markov process with transition probabilities π s,t x [ · ]
. Then for 0 ≤ s < t ≤ T and x < y, the probability π s,t y dominates π s,t x to first order.
Proof Fix s, t and x < y as above and let (X^x_u)_{s≤u≤t} and (X^y_u)_{s≤u≤t} be independent copies of the process X, starting at X^x_s = x and X^y_s = y, both defined on the same filtered probability space. Define the stopping time τ as the first moment u when X^x_u equals X^y_u, if this happens for some u ∈ [s, t]; otherwise we let τ = ∞. Define the process X̃^x by

X̃^x_u := X^x_u for s ≤ u ≤ τ ∧ t, and X̃^x_u := X^y_u for τ ∧ t ≤ u ≤ t.

We clearly have

X̃^x_t ≤ X^y_t almost surely. (5.1)

Indeed, if τ = ∞, the paths of (X̃^x_u)_{s≤u≤t} = (X^x_u)_{s≤u≤t} and (X^y_u)_{s≤u≤t} never touch, so that we even have strict inequality by continuity of the processes. If τ < ∞, then X̃^x and X^y have "joined" at time τ and subsequently follow the same trajectory. Hence X̃^x_u = X^y_u for τ ≤ u ≤ t. Inequality (5.1) implies that the law of X^y_t dominates the law of X̃^x_t to first order. We conclude by observing that X^x_t and X̃^x_t have the same law due to the strong Markov property.
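The coupling in this proof can be visualised by simulation for Brownian motion, which is itself a continuous strong Markov martingale. The sketch below runs independent Brownian paths from x < y, glues the lower path to the upper one at their first (discretely monitored) meeting, and checks the pathwise ordering (5.1) that drives the first-order dominance; grid, seed and the snap-to-meeting at discrete times are simulation artefacts.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, t_end = 10000, 500, 1.0
dt = t_end / n_steps
x0, y0 = 0.0, 0.5

X = np.full(n_paths, x0)     # will become the glued process "X tilde"
Y = np.full(n_paths, y0)
coupled = np.zeros(n_paths, dtype=bool)

for _ in range(n_steps):
    X = np.where(coupled, X, X + np.sqrt(dt) * rng.standard_normal(n_paths))
    Y = Y + np.sqrt(dt) * rng.standard_normal(n_paths)
    # first (discrete) meeting: from now on X follows Y's trajectory
    coupled |= X >= Y
    X = np.where(coupled, Y, X)

# pathwise ordering (5.1): X_t <= Y_t, hence the law of Y_t dominates
# the law of the glued process to first order
ordered = bool(np.all(X <= Y + 1e-12))
mean_X = X.mean()   # strong Markov: glued process has the law of BM from x0
```

The pathwise inequality holds on every simulated path, and the mean of the glued process stays near x0, consistent with the strong Markov identification of its law.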

Corollary 5.3
Let X = (X_t)_{0≤t≤T} be a continuous strong Markov process with marginal laws μ_t and transition probabilities π^{s,t}_x[·]. Let 0 ≤ s ≤ t ≤ T and z → g(z) be a measurable μ_t-integrable function, and define the conditional expectation similarly as in (4.6) by

f(z) := ∫_R g(y) π^{s,t}_z[dy] = E[g(X_t) | X_s = z].

Then the following assertion holds true:

(i) If g is nondecreasing, then f is nondecreasing.

If we assume in addition that X is a martingale, we also have the following two assertions:

(ii) If g is 1-Lipschitz, then f is 1-Lipschitz.

(iii) If g is convex, then f is convex.

Proof (i) This is just a reformulation of Proposition 5.2.

(ii) If g is 1-Lipschitz, we use the coupling from the proof of Proposition 5.2. For x < y, the random variables there satisfy X̃^x_t ≤ X^y_t, whence |g(X^y_t) − g(X̃^x_t)| ≤ X^y_t − X̃^x_t, so that

|f(y) − f(x)| = |E[g(X^y_t)] − E[g(X̃^x_t)]| ≤ E[X^y_t − X̃^x_t] = y − x,

where the last equality uses the martingale property together with the fact that X̃^x_t has the same law as X^x_t.
(iii) We follow the proof of Hobson [31, Theorem 3.1]. For convex g and fixed x < y < z, we have to show that

(z − x) f(y) ≤ (z − y) f(x) + (y − x) f(z). (5.2)

Choose three independent copies X^x, X^y, X^z of the process X, starting at time s from the initial values x, y and z. To simplify notation, we denote the resulting triple of processes (X^x, X^y, X^z) by (X, Y, Z). We define coupling times similarly as above.

Let τ^x be the first moment u > s when X_u = Y_u; similarly, τ^z is defined as the first moment when Z and Y meet. Finally, let τ = τ^x ∧ τ^z ∧ t. This time, we leave the processes unchanged; we rather argue on the three disjoint (up to null sets) sets {τ = τ^x}, {τ = τ^z} and {τ = t}. We start with the latter set, on which we have X_t < Y_t < Z_t. By the convexity of g, we have

(Z_t − X_t) g(Y_t) ≤ (Z_t − Y_t) g(X_t) + (Y_t − X_t) g(Z_t). (5.3)

On {τ = τ^x}, we have X_t = Y_t so that the last term on the right-hand side of (5.3) vanishes. Moreover, the left-hand side and the first term on the right-hand side are equal so that (5.3) holds true (with equality) on the set {τ = τ^x}. In particular,

E[(Z_t − X_t) g(Y_t) 1_{τ=τ^x}] ≤ E[((Z_t − Y_t) g(X_t) + (Y_t − X_t) g(Z_t)) 1_{τ=τ^x}].

Analogous reasoning applies to {τ = τ^z}. Summing up over the three sets, we obtain

E[(Z_t − X_t) g(Y_t)] ≤ E[(Z_t − Y_t) g(X_t) + (Y_t − X_t) g(Z_t)].

Finally, we use independence and the martingale property of X, Y and Z to obtain

E[(Z_t − X_t) g(Y_t)] = (z − x) f(y), E[(Z_t − Y_t) g(X_t)] = (z − y) f(x), E[(Y_t − X_t) g(Z_t)] = (y − x) f(z),

which is tantamount to (5.2).
We can reformulate the message of Corollary 5.3 (ii) in the spirit of Bachelier by considering the Wasserstein cost W_1(π^{s,t}_x[·], π^{s,t}_y[·]) of the horizontal transport of the conditional probability measure π^{s,t}_x[·] to π^{s,t}_y[·]. Recall that for probabilities μ, ν on the real line, the Wasserstein-1 distance is given by

W_1(μ, ν) = inf_{π ∈ cpl(μ,ν)} ∫_{R²} |x − y| π(dx, dy),

where cpl(μ, ν) denotes the set of all probabilities on R² having μ, ν as marginal measures; see e.g. Villani [56] for an extensive overview of the field of optimal transport.
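In one dimension, W_1 admits the handy representation W_1(μ, ν) = ∫ |F_μ(z) − F_ν(z)| dz in terms of the cumulative distribution functions. As a quick numerical illustration (grid parameters are arbitrary), the Brownian transition kernels satisfy W_1(N(x, t), N(y, t)) = |x − y|, the Lipschitz-kernel property of the next definition, with equality as befits a martingale:

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def w1_normal(x, y, t, grid):
    # W_1 between N(x, t) and N(y, t) via the one-dimensional cdf
    # representation W_1(mu, nu) = integral of |F_mu - F_nu|.
    dz = grid[1] - grid[0]
    Fx = np.array([norm_cdf((g - x) / sqrt(t)) for g in grid])
    Fy = np.array([norm_cdf((g - y) / sqrt(t)) for g in grid])
    return float(np.sum(np.abs(Fx - Fy)) * dz)

grid = np.linspace(-10.0, 10.0, 20001)
t = 0.7
w = w1_normal(0.0, 0.3, t, grid)   # Brownian kernels started at 0 and 0.3
```

The numerically computed distance equals the distance 0.3 of the starting points, up to discretisation error.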

Definition 5.4
Let π be a probability on R² and write μ for its projection onto the first coordinate and (π_x)_x for the respective disintegration so that π = ∫_R π_x dμ(x). Then π is called a Lipschitz-kernel if for all x, y in a set X with μ[X] = 1, we have

W_1(π_x, π_y) ≤ |x − y|.

We call π a martingale coupling if ∫ y dπ_x(y) = x μ-a.s. It is then straightforward to see that for a martingale coupling π, the following are equivalent:

(i) π is a Lipschitz-kernel.

(ii) For all x, y in a set X with μ[X] = 1, we have W_1(π_x, π_y) = |x − y|.

(iii) For all x, y in a set X with μ[X] = 1 and x ≤ y, the measure π_x is dominated to first order by π_y.

Definition 5.5

Let X be an R-valued Markov process. Then X has the Lipschitz-Markov property if for all s ≤ t, the law of (X_s, X_t) is a Lipschitz-kernel.
To give yet another characterisation of Lipschitz-Markov processes, recall that a process X is Markov if and only if for all s ≤ t and every bounded measurable function f , there is a measurable function g such that

E[f (X t )|F s ] = g(X s ).
A process X is Lipschitz-Markov if and only if for all s ≤ t and every 1-Lipschitz function f, there is a 1-Lipschitz function g such that

E[f(X_t)|F_s] = g(X_s).

This is a straightforward consequence of the Kantorovich-Rubinstein theorem which provides a dual characterisation of the Wasserstein-1 distance through 1-Lipschitz functions.
We can now summarise the crucial role of the strong Markov property. To the best of our knowledge, Lipschitz-kernels play a crucial role in all known proofs of Kellerer's theorem. The decisive property is the following.

Proposition 5.7 The set of Lipschitz-Markov martingales is closed with respect to convergence in finite-dimensional distributions.

In contrast, the set of Markov martingales is not closed. See e.g. Beiglböck et al. [7] for the (simple) proof of Proposition 5.7.

Continuity of the martingale solution
An important question in the present context is the following: Under which conditions on a peacock (μ_t)_{0≤t≤1}, as defined in Sect. 4 above, is there a strong Markov martingale with continuous trajectories having the given marginals? We only focus on the one-dimensional case as we have done throughout this paper. It is important to mention that the corresponding question of "mimicking" a peacock by a "nice" martingale remains wide open for dimensions d ≥ 2. The one-dimensional case, however, is fully understood by now, again by the definitive work of Lowther.

Theorem 6.1 (Lowther [45, Theorem 1.3]) Let (μ_t)_{t≥0} be a peacock and assume that t → μ_t is weakly continuous and that each μ_t has convex support. Then there exists a unique continuous strong Markov martingale X such that X_t ∼ μ_t, t ≥ 0.
We do not show Lowther's theorem in full generality, but again we want to isolate a sufficient set of assumptions that allows us to present a (comparably simple) self-contained proof of the existence theorem.

Remark 6.2
A key ingredient of the proof is that for probabilities μ, ν in convex order, there exists a continuous martingale (X_t)_{0≤t≤1} with X_0 ∼ μ and X_1 ∼ ν which is strongly Markovian (and hence Lipschitz-Markov). For instance, we can take X to be a stretched Brownian motion, that is, a solution to a continuous-time martingale transport problem; see Backhoff-Veraguas et al. [5]. Another possibility would be to apply an appropriate deterministic time change to Root's solution [52] (see Cox and Wang [15] for the case of a non-trivial starting distribution) of the Skorokhod embedding problem. We note that the martingale transport approach is also applicable to measures μ, ν defined on R^d, d > 1. This could be interesting in view of a possible multidimensional extension of Lowther's result in Theorem 6.1, but this is not within the scope of the present article.

Assumption 6.3 Let (μ_t)_{0≤t≤1} be a one-dimensional peacock centered at zero, with densities p_t(x) and finite second moments, such that the function t ↦ m_2^2(μ_t) is continuous. We assume that there is a (bounded or unbounded) open interval I ⊆ R supporting all the μ_t such that for each compact subset K ⊆ I, the Lebesgue densities x ↦ p_t(x) of μ_t are bounded away from zero, uniformly in x ∈ K and t ∈ [0, 1].
It will be convenient to suppose, without loss of generality via a deterministic time change, that t ↦ m_2^2(μ_t) is affine; this normalisation is recorded in (6.1). There is an obvious and well-known strategy for the proof. We want to obtain the desired martingale M as a limit of approximations which fit the peacock (μ_t)_{0≤t≤1} at finitely many points in time. As in Hirsch et al. [30], it is convenient to do so along the partially ordered set S of finite subsets S ⊆ [0, 1], naturally ordered by inclusion.
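The deterministic time change behind this normalisation can be sketched numerically. The following toy illustration uses a hypothetical second-moment curve m(t) = t², not taken from the text: reparametrising time by the inverse of the normalised curve makes the second moment affine in the new time variable.

```python
import numpy as np

# Deterministic time change normalising the second moment, a sketch:
# suppose t -> m(t) := m_2^2(mu_t) is continuous and increasing (here a
# hypothetical m(t) = t^2 for illustration).  Reparametrising via
# lam(s) = m^{-1}(m(0) + s * (m(1) - m(0))) makes s -> m_2^2(mu_lam(s))
# affine in s, as assumed for the peacock in the text.
def m(t):
    return t ** 2                         # hypothetical second-moment curve

t_grid = np.linspace(0.0, 1.0, 10_001)
m_vals = m(t_grid)

def lam(s):
    """Inverse time change: smallest grid t with normalised m(t) >= s."""
    target = m_vals[0] + s * (m_vals[-1] - m_vals[0])
    return t_grid[np.searchsorted(m_vals, target)]

s_grid = np.linspace(0.0, 1.0, 11)
m_reparam = np.array([m(lam(s)) for s in s_grid])

# After the time change the second moment is (up to grid error) affine in s.
affine = m_vals[0] + s_grid * (m_vals[-1] - m_vals[0])
assert np.allclose(m_reparam, affine, atol=1e-3)
```

The same recipe applies to any continuous, strictly increasing second-moment curve; plateaus of m would correspond to time intervals on which the peacock is constant.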
For each S ∈ S, we choose a continuous strong Markov martingale M^S having the given marginals at each time s_i ∈ S. The existence of M^S is a direct consequence of Remark 6.2. The family (M^S)_{S∈S} admits limit points with respect to convergence in finite-dimensional distributions; see Beiglböck et al. [7] for the straightforward argument. By refining the filter S, we may suppose that M is a limit point.
We fix such a limiting process M which, by Proposition 5.7, is a Lipschitz-Markov martingale. These arguments are by now standard and are well presented, e.g., in the papers by Hirsch et al. [30] or Beiglböck et al. [7]. A priori, the martingale M has càdlàg trajectories. Our present task is to show the continuity of the trajectories of the limiting process M under the above assumptions.
We first give a general criterion for the continuity of a limiting martingale M which is somewhat reminiscent of the classical Kolmogorov continuity criterion; see Revuz and Yor.

Proposition 6.5 Let (M^i)_{i∈I} be a net of continuous martingales converging in finite-dimensional distributions to a càdlàg martingale M, and suppose that there are constants C > 0 and β > 0 such that, for all i ∈ I and 0 ≤ t_0 ≤ t_0 + h ≤ 1,

E[|M^i_{t_0+h} − M^i_τ| | F_τ] ≤ C h^β almost surely. (6.2)

Then the martingale M has continuous trajectories.
In (6.2), the argument τ runs through the [t_0, t_0 + h]-valued stopping times with respect to the natural filtration of (M^i_t)_{t_0 ≤ t ≤ t_0+h}. As M^i is strong Markov, condition (6.2) is tantamount to the requirement that the first moments of the transition probabilities π^{i,τ,t_0+h} satisfy

m_1(π^{i,τ,t_0+h}_x) ≤ C h^β for μ^i_τ-almost all x ∈ R, (6.3)

where μ^i_τ denotes the law of M^i_τ.

An important feature of the BMO-norms for continuous martingales is that, by the John-Nirenberg inequality, all BMO_q-norms are equivalent for 1 ≤ q < ∞ (see e.g. Kazamaki [37, Corollary 2.1]). Applying this fact to the present context, (6.2) is equivalent to the existence of a constant C_q > 0 (for some or, equivalently, for all 1 ≤ q < ∞) such that

m_q(π^{i,τ,t_0+h}_x) ≤ C_q h^β for μ^i_τ-almost all x ∈ R. (6.4)

Proof of Proposition 6.5 Suppose that M fails to be continuous, and let us work towards a contradiction to (6.4) when q > 1/β. Assume M has jumps of size bigger than 3a > 0 with probability bigger than κ > 0, i.e.,

P[|M_t − M_{t−}| > 3a for some t ∈ [0, 1]] > κ.

As M has càdlàg paths, there is h_0 > 0 such that for all 0 < h ≤ h_0,

P[|M_{t_0+h} − M_{t_0}| > 2a for some t_0 ∈ [0, 1 − h]] > κ.

By the pigeonhole principle, we can find for each 0 < h ≤ h_0 a time t_0 ∈ [0, 1] with

P[|M_{t_0+h} − M_{t_0}| > 2a] ≥ κh.

In view of the convergence of finite-dimensional distributions of (M^i)_{i∈I} to M, we find for each 0 < h ≤ h_0 a time t_0 ∈ [0, 1] (without loss of generality t_0 + h ≤ 1) and an index i ∈ I with

P[|M^i_{t_0+h} − M^i_{t_0}| > a] ≥ κh.

Fixing such an index i ∈ I, it follows that there is a set A ⊆ R of positive measure with respect to the law of M^i_{t_0} such that for all x ∈ A,

π^{i,t_0,t_0+h}_x[{y : |y − x| > a}] ≥ κh.

For x ∈ A, we therefore have

m_q(π^{i,t_0,t_0+h}_x) ≥ a (κh)^{1/q}.

As q > 1/β, we can choose 0 < h ≤ h_0 sufficiently small, with a(κh)^{1/q} > C_q h^β, and arrive at the desired contradiction to (6.4) via

C_q h^β ≥ m_q(π^{i,t_0,t_0+h}_x) ≥ a (κh)^{1/q} > C_q h^β.

Turning back to the family (M^S)_{S∈S} of martingales defined above, we next establish an inequality of the type (6.3) for the transition probabilities π^{S,t_0,t_0+h}_x, using the fact that π^{S,t_0,t_0+h}_x is a Lipschitz-kernel.

Lemma 6.6 Let (μ_t)_{0≤t≤1} be a peacock satisfying Assumption 6.3. Fix a compact set K ⊆ I.
There is a constant D > 0, depending only on K, such that for all h > 0 sufficiently small, all x ∈ K and all S ∈ S with 0 ≤ t_0 ≤ t_0 + h ≤ 1 and t_0, t_0 + h ∈ S, the first moments of the transition measures π^{S,t_0,t_0+h}_x can be estimated by

m_1(π^{S,t_0,t_0+h}_x) = E_P[|M^S_{t_0+h} − x| | M^S_{t_0} = x] ≤ D h^{1/4}, (6.5)

where P denotes the law of the martingale M^S.
Proof We first suppose that I = R. By (6.1) and Jensen's inequality, we have

E_P[|M^S_{t_0+h} − M^S_{t_0}|] ≤ (E_P[(M^S_{t_0+h} − M^S_{t_0})^2])^{1/2} = (m_2^2(μ_{t_0+h}) − m_2^2(μ_{t_0}))^{1/2} = (ch)^{1/2}.

We may rewrite this inequality in the form

∫ F(x) p_{t_0}(x) dx ≤ (ch)^{1/2}, (6.6)

where we alleviate the notation from π^{S,t_0,t_0+h}_x to π_x and set F(x) := m_1(π_x). We claim that the function x ↦ F(x) satisfies the estimate

|F(x) − F(y)| ≤ 2|x − y|. (6.7)

Indeed,

|F(x) − F(y)| ≤ W_1(π_x, π_y) + |x − y| ≤ 2|x − y|,

proving the claim; here we have used the Kantorovich-Rubinstein duality together with the fact that π is a Lipschitz-kernel.

Now choose a compact interval [a, b] such that K ⊆ [a + 1, b − 1] and denote by ℓ > 0 a lower bound for the density function p_{t_0} of μ_{t_0} on [a, b]. We want to estimate F(x_0) for x_0 ∈ K and start by showing the rough estimate F(x_0) ≤ 2. Indeed, if F(x_0) > 2, then (6.7) yields F(x) ≥ 1 for |x − x_0| ≤ 1/2, hence ∫_{x_0−1/2}^{x_0+1/2} F(x) dx ≥ 1, and we arrive at the following contradiction to (6.6): for small enough h,

ℓ ≤ ∫_{x_0−1/2}^{x_0+1/2} F(x) p_{t_0}(x) dx ≤ (ch)^{1/2} < ℓ.

From F(x_0) ≤ 2, we may argue similarly, using again (6.7) and elementary geometry, to obtain for x_0 ∈ K that

ℓ F(x_0)^2/2 ≤ ∫ F(x) p_{t_0}(x) dx ≤ (ch)^{1/2},

yielding for h sufficiently small the desired estimate

F(x_0) ≤ D h^{1/4},

where the constant D > 0 only depends on the compact set K, but not on h.
Finally, we have to come back to our assumption I = R, which allowed us to embed the compact set K into an interval [a, b] ⊆ I such that K ⊆ [a + 1, b − 1]. If I is bounded on one or both sides, we have to reason slightly more carefully, as we can only embed the compact set K into an interval [a, b] ⊆ I such that [a + ε, b − ε] contains K, for some ε > 0. But no difficulties arise from replacing 1 by ε, and it is straightforward to adapt the above argument to this situation.
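Stepping back to the proof of Proposition 6.5 above, its concluding step rests on elementary asymptotics: since 1/q < β, the lower bound a(κh)^{1/q} eventually dominates C_q h^β as h ↓ 0. A quick numerical sanity check with illustrative constants (all values below are assumptions chosen for the demonstration):

```python
# Elementary asymptotics behind the contradiction in the proof of
# Proposition 6.5: for q > 1/beta, the jump lower bound a*(kappa*h)^(1/q)
# eventually exceeds any bound of the form C_q * h^beta as h -> 0,
# because 1/q < beta.  Illustrative constants only.
a, kappa, C_q = 1.0, 0.1, 10.0
beta, q = 0.5, 3.0                     # q > 1/beta = 2

assert 1.0 / q < beta
h = 1.0
while a * (kappa * h) ** (1.0 / q) <= C_q * h ** beta:
    h /= 2.0                           # shrink h until the bound flips

# The lower bound now strictly exceeds the upper bound: this is the
# desired contradiction for all sufficiently small h.
assert a * (kappa * h) ** (1.0 / q) > C_q * h ** beta
assert h > 0.0
```

The loop terminates because the ratio of the two sides grows like h^{1/q − β} with a strictly negative exponent.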
Under the assumptions of Lemma 6.6, Tschebyscheff's inequality and (6.5) allow us to control the difference between medians and means: for fixed 0 < δ < 1/4, we have

π^{S,t_0,t_0+h}_x[{y : |y − x| ≥ m_1(π^{S,t_0,t_0+h}_x)/δ}] ≤ δ

for h > 0 sufficiently small, x ∈ K and feasible S ∈ S. Hence, noting that x is the mean of π^{S,t_0,t_0+h}_x by the martingale property, we have in the setting of Lemma 6.6 that

|median(π^{S,t_0,t_0+h}_x) − x| ≤ D h^{1/4}/δ. (6.8)

Lemma 6.7 Let 0 < δ < 1/4. Under the assumptions of Lemma 6.6, the same conclusion as in (6.5) holds true for every [t_0, t_0 + h]-valued stopping time τ: by possibly changing the constant D to a different constant C, we have for x ∈ K that

m_1(π^{S,τ,t_0+h}_x) ≤ C h^{1/4}. (6.9)

Proof By Corollary 5.6, π^{S,t_0,t}_x is a Lipschitz-kernel for all t ∈ [t_0, t_0 + h]. From this, we can deduce the continuity of the map (t, x) ↦ π^{S,t_0,t}_x. Therefore, by the strong Markov property, it suffices to show (6.9) for deterministic τ ≡ t ∈ [t_0, t_0 + h]. To this end, let K̃ be a compact interval in I containing the compact set K in its interior, and fix the constant D from Lemma 6.6 applied to K̃.
To argue (6.9) for τ ≡ t and a given point y ∈ K, we find x ∈ I such that y equals the median of the measure π^{S,t_0,t}_x. Since π^{S,t_0,t}_x is a Lipschitz-kernel by Corollary 5.6, such an x exists. We may use x to obtain the estimate (6.10); there, we use (i) from Corollary 5.3 for the first inequality and the fact that y is the median of π^{S,t_0,t}_x for the second. Note that for h sufficiently small, we have by (6.8) that x ∈ K̃. Applying the estimates (6.5) and (6.8) to (6.10) then yields (6.9), which concludes the proof.
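The median-mean control invoked before Lemma 6.7 is a completely general fact: if m_1 denotes the first absolute central moment of a random variable Y, then Markov's inequality gives P[|Y − mean| ≥ m_1/δ] ≤ δ, and for δ < 1/2 the median must lie within m_1/δ of the mean. A small numerical sketch (the sampled distribution is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)

# Markov/Tschebyscheff control of the median by the mean: if
# m1 := E|Y - mean(Y)| is small, then P[|Y - mean| >= m1/delta] <= delta,
# and for delta < 1/2 the median lies within m1/delta of the mean.
delta = 0.2                                    # any 0 < delta < 1/4 works too
y = rng.exponential(scale=1.0, size=100_000)   # skewed: mean != median

mean = y.mean()
m1 = np.abs(y - mean).mean()
median = np.median(y)

# Markov's inequality for |Y - mean|, exact for the empirical measure:
tail = np.mean(np.abs(y - mean) >= m1 / delta)
assert tail <= delta + 1e-12

# Hence the median (a point with >= 1/2 mass on each side) is within
# m1/delta of the mean.
assert abs(median - mean) <= m1 / delta
```

In the text this is applied with Y ∼ π^{S,t_0,t_0+h}_x, whose mean is x by the martingale property and whose m_1 is of order h^{1/4} by Lemma 6.6.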
Proof of Theorem 6.4 We have to combine Proposition 6.5 and Lemma 6.7 with a stopping argument. To this end, let (K_n := [a_n, b_n])_{n∈N} be an increasing sequence of compact intervals exhausting the interval I. For S ∈ S, let τ^{S,n} denote the first time that M^S leaves K_n, write M^{S,n} for the process M^S stopped at τ^{S,n}, and let m^S(ω) be the smallest integer such that the entire trajectory (M^S_t(ω))_{0≤t≤1} is contained in K_{m^S(ω)}. We know already that for every n ∈ N, the net (M^{S,n})_{S∈S} allows finite-dimensional-distribution-convergent subnets along S, where the limits have continuous versions by Proposition 6.5. We want to argue that, similarly, the pairs ((M^{S,n})_{n∈N}, m^S)_{S∈S}, taking values in C[0, 1]^N × N, admit a convergent subnet, too. For this, it is sufficient to show the following claim: for every ε > 0, there is n ∈ N such that

sup_{S∈S} P[m^S > n] < ε. (6.11)

Indeed, for each n ∈ N (sufficiently large), there are maximal positive numbers α_n, β_n with α_n + β_n ≤ 1 such that the probability measure

α_n δ_{a_n} + β_n δ_{b_n} + (1 − α_n − β_n) δ_{(mean(μ_1) − α_n a_n − β_n b_n)/(1 − α_n − β_n)}

is dominated by μ_1 in the convex order. Since μ_1 puts no mass onto the boundary of I, there is for each ε > 0 an index N ∈ N such that α_n + β_n < ε for all n ≥ N. The law of M^{S,n}_1 is dominated in the convex order by μ_1. By the maximality of α_n and β_n, we find uniformly for all S ∈ S that

P[m^S > n] = P[τ^{S,n} < ∞] = P[M^{S,n}_1 ∈ {a_n, b_n}] ≤ α_n + β_n < ε,

which yields the claim (6.11).
Given a continuum of marginals which increase in the convex order (and maybe satisfy additional technical conditions), different authors have provided specific constructions of (not necessarily Markovian) martingales that match these marginals. A main motivation stems from the calibration problem in mathematical finance. An additional goal has often been to give constructions that optimise particular functionals, given the martingale and marginal constraints, since this yields robust bounds on option prices. Madan and Yor [48] and Källblad et al. [36] establish a continuous-time version of the Azéma-Yor embedding. Hobson [33] established a continuous-time version of the martingale coupling constructed in Hobson and Klimmek [34]. Henry-Labordère et al. [27] as well as Brückerhoff et al. [13] provide continuous-time versions of the shadow coupling (originally introduced in Beiglböck and Juillet [8]). Richard et al. [51] give a continuous-time version of the Root solution to the Skorokhod embedding problem. In a slightly different but related direction, Boubel and Juillet [11] consider a continuum of marginals on the real line that do not satisfy an order condition and construct a canonical Markov process matching these marginals. We also refer to the book of Hirsch et al. [28] that collects a variety of related constructions.
The problem of finding martingales with given one-dimensional marginals has received specific attention in the case where these marginals equal the ones of Brownian motion. Hamza and Klebaner [26] posed the challenge of constructing martingales with Brownian marginals that differ from Brownian motion, so-called fake Brownian motions. Non-continuous solutions can be found in Madan and Yor [48], Hamza and Klebaner [26], Hobson [32] and Fan et al. [19], whereas continuous (but non-Markovian) fake Brownian motions were constructed by Oleszkiewicz [49], Albin [2], Baker et al. [6] and Hobson [33]. As already noted, the companion article Beiglböck et al. [10] establishes that there exists a Markovian martingale with continuous paths that has Brownian marginals. In this context, we also refer to the work of Föllmer et al. [21] which establishes the existence of weak Brownian motions of arbitrary order k > 0, that is, processes which have the same k-dimensional marginals as Brownian motion, but are not Gaussian.
A somewhat different direction arises if one starts with marginals that do not merely satisfy a structural condition (specifically, monotonicity in the convex order), but rather assumes that the marginals are generated by an Itô process

dX_t = σ_t dB_t + μ_t dt, (7.1)

and one seeks a Markovian diffusion dX̃_t = σ̃_t(X̃_t) dB_t + μ̃_t(X̃_t) dt that mimics the evolution of X in the sense that law(X̃_t) = law(X_t) for each t ≥ 0. The process X̃ is then called a Markovian projection of X. This line of research goes back essentially to the work of Krylov [41] and Gyöngy [25]. Of course, the work of Dupire [17] can also be seen as a formal contribution to this line of research. A rigorous justification of Dupire's formula under rather general assumptions is obtained by Klebaner [40]. A very general theorem on mimicking aspects of Itô processes is given by Brunick and Shreve [14]. Recently, Lacker et al. [43] showed that the results of [25, 14] can be established directly from the superposition principle of Trevisan [55] (or Figalli [20] in the case where (7.1) has bounded coefficients). (Notably, the main focus of the work [43] is a mimicking result showing that conditional time marginals of an Itô process can be matched by a solution of a conditional McKean-Vlasov SDE with Markovian coefficients.) In the mathematical finance community, Markovian (local volatility) models are often considered to exhibit dynamics that are not particularly realistic. There has been significant interest in combining the convenience that the local volatility model offers in terms of calibration with the more realistic dynamics exhibited by other classes of financial models. That is, given a Markovian model dX̃_t = σ̃_t(X̃_t) dB_t that represents market data, one would like to "reconstruct" a more realistic model dX_t = σ_t dB_t and thus to "invert" the Markovian projection.
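A Monte Carlo sketch of a Markovian projection in the spirit of Gyöngy's result, for a toy model in which the volatility is a coin flip drawn at time 0 (all modelling choices here are illustrative assumptions, not from the text): the projected volatility σ̃²(t, x) = E[σ² | X_t = x] is available in closed form for this two-point mixture, and the corresponding local-volatility process reproduces the time marginals of X.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy non-Markovian-in-X model: dX_t = sigma dB_t, where sigma is drawn
# once per path from {0.5, 1.5} with equal probability (illustrative).
n, steps, T = 100_000, 200, 1.0
dt = T / steps

def sigma_loc_sq(t, x):
    """Projected volatility E[sigma^2 | X_t = x] in closed form:
    X_t | sigma ~ N(0, sigma^2 t), so the weights are Gaussian densities."""
    v_lo, v_hi = 0.25 * t, 2.25 * t
    w_lo = np.exp(-x * x / (2 * v_lo)) / np.sqrt(v_lo)
    w_hi = np.exp(-x * x / (2 * v_hi)) / np.sqrt(v_hi)
    return (0.25 * w_lo + 2.25 * w_hi) / (w_lo + w_hi)

# Original process: X_T = sigma * B_T.
sigma = np.where(rng.random(n) < 0.5, 0.5, 1.5)
x_orig = sigma * rng.standard_normal(n) * np.sqrt(T)

# Mimicking local-volatility process, Euler scheme started at 0.
x = np.zeros(n)
for k in range(steps):
    t = (k + 1) * dt                      # avoid the t = 0 singularity
    vol = np.sqrt(sigma_loc_sq(t, x))
    x += vol * np.sqrt(dt) * rng.standard_normal(n)

# By Gyongy's theorem the time marginals agree; compare second moments
# (the common theoretical variance at T = 1 is E[sigma^2] = 1.25).
assert abs(x_orig.var() - 1.25) < 0.05
assert abs(x.var() - 1.25) < 0.15
```

For models without a closed-form conditional expectation, σ̃² is typically estimated from simulated paths by binning or kernel regression, which is the idea behind particle methods for the calibration problem discussed below.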
A concrete way to perform this inversion is the stochastic local volatility model; see the work of Guyon and Henry-Labordère [22, 23] and [24, Chap. 11]. However, it is remarkably delicate to establish existence and uniqueness results for the resulting SDEs. Partial solutions were given by Jourdain and Zhou [35] and by Lacker et al. [42]. The problem is also discussed by Acciaio and Guyon [1], who consider it an important open problem to establish existence of the stochastic local volatility model under fairly general assumptions.
Funding Note Open access funding provided by Swiss Federal Institute of Technology Zurich.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/ 4.0/.