Robust bounds for the American Put

We consider the problem of finding a model-free upper bound on the price of an American put given the prices of a family of European puts on the same underlying asset. Specifically, we assume that the American put must be exercised at either $T_1$ or $T_2$ and that we know the prices of all vanilla European puts with these maturities. In this setting we find a model which is consistent with European put prices and an associated exercise time, for which the price of the American put is maximal. Moreover, we derive a cheapest superhedge. The model associated with the highest price of the American put is constructed from the left-curtain martingale transport of Beiglböck and Juillet.


Introduction
This article is motivated by an attempt to understand the range of possible prices of an American put in a robust, or model-independent, framework. In our interpretation this means that we assume we are given today's prices of a family of European-style vanilla puts (for a continuum of strikes and for a discrete set of maturities). The goal is to find the consistent model for the underlying for which the American put has the highest price, where by definition a model is consistent if the discounted price process is a martingale and if the model-based discounted expected values of European-put payoffs match the given prices of European puts. This notion of model-independent or robust bounds on the prices of exotic options was introduced in Hobson [17] in the context of lookback options, and has been applied several times since, see Brown et al. [8] (barrier options), Cox and Obłój [12] (no-touch options), Hobson and Neuberger [21] and Hobson and Klimmek [20] (forward-start straddles), Carr and Lee [9] and Cox and Wang [13] (variance options), Stebegg [27] (Asian options) and the survey article Hobson [19]. The principal idea is that the prices of the vanilla European puts determine the marginal distributions of the price process at the traded maturities (but not the joint distributions) and that these distributional requirements, coupled with the martingale property, place meaningful and useful restrictions on the class of consistent models. These restrictions lead to bounds on the expected payoffs of path-dependent functionals, or equivalently bounds on the prices of exotic options.
In addition to the pricing problem there is a related dual or hedging problem. In the dual problem the aim is to construct a static portfolio of European put options and a dynamic discrete-time hedge in the underlying which combine to form a superhedge (pathwise over a suitable class of candidate price paths) for the exotic option. The value of the dual problem is the cost of the cheapest superhedge.

There is a growing literature, beginning with Beiglböck et al. [4] for discrete-time problems, and Galichon et al. [15] in continuous time, which aims to explain how to formulate the problem in such a way that there is no duality gap, i.e. the highest model-based price is equal to the cheapest superhedge, either for specific derivatives, or in general.
Many of the early papers on robust hedging exploited a link with the Skorokhod embedding problem (Skorokhod [26]). For example, in the study of the lookback option in Hobson [17] the consistent model which achieves the highest lookback price is constructed from the Azéma-Yor [2] solution of the Skorokhod embedding problem. More recently, Beiglböck et al. [4] (see also Dolinsky and Soner [14] and Touzi [28]) have championed the connection between robust hedging problems and martingale optimal transport. In this paper we will make use of the left-curtain martingale coupling introduced by Beiglböck and Juillet [6], and developed by Henry-Labordère and Touzi [16] and Beiglböck et al. [5].
The study of American style claims in the robust framework was initiated by Neuberger [25], see also Hobson and Neuberger [23], Bayraktar and Zhou [3] and Aksamit et al. [1]. (There is also a paper by Cox and Hoeggerl [11] which asks about the possible shapes of the price of an American put, considered as a function of strike, given the prices of co-maturing European puts.) The main innovation of this paper is that rather than focussing on general American payoffs and proving that the pricing (primal) problem and the dual (hedging) problem have the same value, we focus explicitly on American puts and try to say as much as possible about the structure of the consistent price process for which the model-based American put price is maximised, and the structure of the cheapest superhedge.
Mathematically, it will turn out that our problem can be cast as follows. Let $\mu$ and $\nu$ be a pair of probability measures which are increasing in convex order and therefore necessarily have the same mean $\bar{\mu}$. A standing assumption in this paper will be that $\mu$ is continuous (or equivalently, $\mu$ has no atoms). Let $\Pi_M(\mu, \nu)$ be the set of martingale couplings (which are often alternatively called martingale transports) between $\mu$ and $\nu$ and let $K_1 > K_2$ be a pair of fixed constants. The problem we consider is to find
$$\sup_{B} \sup_{\pi \in \Pi_M(\mu,\nu)} E\big[(K_1 - M_1)^+ 1_{\{M_1 \in B\}} + (K_2 - M_2)^+ 1_{\{M_1 \notin B\}}\big] \qquad (1)$$
where $M = (\bar{\mu}, M_1, M_2)$ is a martingale with joint law $P(M_1 \in dx, M_2 \in dy) = \pi(dx, dy)$ and $B$ is a Borel subset of $\mathbb{R}$. In terms of the American put problem $M$ should be thought of as the discounted price of the underlying asset (to simplify notation we write $M_1 \equiv X$ and $M_2 \equiv Y$). Further, $K_1$ and $K_2$ are the discounted strikes of the put and $B$ represents the set of values of the discounted time-1 price of the underlying such that the option is exercised at time 1; otherwise the put is exercised at time 2. Then (1) represents the primal problem of finding the highest model-based expected payoff of the American put. See Section 2.2.
There is a corresponding dual or hedging problem of finding the cheapest superhedge based on static portfolios of European puts and a piecewise constant holding of the underlying asset, see Section 2.3.
Our main achievement is to exhibit the model and stopping rule which achieves the highest possible price for the American put, to exhibit the cheapest superhedge, and to show that the highest model-based price is equal to the cost of the cheapest superhedge.
For fixed $\mu$, $\nu$ and $K_1 > K_2$ there is typically a family of optimal models. Fixing $\mu$ and $\nu$ but varying $K_1$ and $K_2$ it turns out that there is a model which is optimal for all $K_1$ and $K_2$ simultaneously. This model is related to the left-curtain coupling of Beiglböck and Juillet [6]. In particular, given $\mu \le_{cx} \nu$ (with $\mu$ continuous), Beiglböck and Juillet [6] prove that there exist functions $T_d$ and $T_u$ with $T_d(x) \le x \le T_u(x)$ such that $T_u$ is increasing and such that if $x < x'$ then $T_d(x') \notin (T_d(x), T_u(x))$, and such that there is $\pi \in \Pi_M(\mu, \nu)$ which is concentrated on the graphs of $T_d$ and $T_u$. Under this martingale coupling $Y \in \{T_d(X), T_u(X)\}$ and by the martingale property $P(Y = T_d(X) \mid X) = \frac{T_u(X) - X}{T_u(X) - T_d(X)}$ (assuming not both $T_d(X) = X$ and $T_u(X) = X$).
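The conditional probability quoted above follows from a one-line computation. The sketch below fixes a value $x$ of $X$ with $T_d(x) < T_u(x)$; the symbol $p$ is introduced only for this illustration.

```latex
% p := P(Y = T_d(x) | X = x). The two-point law on {T_d(x), T_u(x)} must have mean x.
\begin{align*}
p\,T_d(x) + (1-p)\,T_u(x) &= x \\
\Longrightarrow\quad p &= \frac{T_u(x) - x}{T_u(x) - T_d(x)}, \qquad
1 - p = \frac{x - T_d(x)}{T_u(x) - T_d(x)} .
\end{align*}
```

For instance, with $T_d(x) = 0$, $T_u(x) = 4$ and $x = 1$ the downward probability is $3/4$, and indeed $0 \cdot \tfrac34 + 4 \cdot \tfrac14 = 1$.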
In this paper we will concentrate on the case where $\mu$ is continuous. Indeed, if $\mu$ has atoms then the situation becomes more delicate. On one hand, we must allow for a wider range of possible candidates for exercise determining sets $B$. On atoms of $X$ we may want to sometimes stop and sometimes continue, although we must still take stopping decisions which do not violate the martingale property of future price movements. On the other hand, the functions $T_d$, $T_u$ that characterise the left-curtain coupling become multi-valued on the points where $\mu$ has atoms. Then it is not clear how the optimal model can be identified. For these reasons we must extend our notion of a martingale coupling and generalise, in a useful fashion, the left-curtain martingale coupling of Beiglböck and Juillet [6] to the case with atoms. The appropriate extension of the left-curtain coupling to the case with atoms in $\mu$ is discussed in a companion paper ([24]); in this paper we focus on the financial aspects of our results, namely the application to the robust hedging of American puts.
The remainder of the paper is structured as follows. In the next section we formulate precisely our problem of finding the robust, model-independent price of an American put and explain how the problem can be transformed into (1) in the atom-free case. We also explain how the pricing problem is related to the dual problem of constructing the cheapest superhedge. In Section 3 we assume that µ is continuous, and we show by studying a series of ever more complicated set-ups how to determine the best model and hedge. The constructions in this section make use of results on the left-curtain coupling of Beiglböck and Juillet [6] and Henry-Labordère and Touzi [16].
By weak duality the highest model price is bounded above by the cost of the cheapest superhedge. Hence, if on the one hand we can identify a consistent model and stopping rule and on the other a superhedge, such that the expected payoff in that model with that stopping rule is equal to the cost of the superhedge then we must have identified an optimal model and an optimal stopping rule together with an optimal hedging strategy. Moreover there is no duality gap. This is the strategy of our proofs. One feature of our analysis is that wherever possible we provide pictorial explanations and derivations of our results. In our view this approach helps bring insights which may be hidden under calculus-based approaches.
2 Preliminaries and set-up
It follows from Lemma 1 that if there is a point $x$ in the interior of the interval $I_\eta$ such that $D_{\eta,\chi}(x) = 0$ then we can separate the problem of constructing martingale couplings of $\eta$ to $\chi$ into a pair of subproblems involving mass to the left and right of $x$, respectively, always taking care to allocate mass of $\chi$ at $x$ appropriately. Indeed, if there are multiple $\{x_j\}$ with $D_{\eta,\chi}(x_j) = 0$ then we can divide the problem into a sequence of 'irreducible' problems, each taking place on an interval $I_i$ such that $D > 0$ on the interior of $I_i$ and $D = 0$ at the endpoints. All mass starting in a given interval is transported to a point in the same interval. However, in our setting, in addition to specifying a model (or equivalently a martingale coupling) we also need to specify a stopping rule, and this needs to be defined across all irreducible components simultaneously. For this reason we do not insist that $D > 0$ on the interior of $I_\chi$, although this will be the case in the simple settings in which we build our solution.

The financial model and model based prices for American puts
Suppose $\tilde{Z} = (\tilde{Z}_{T_i})_{i=0,1,2}$ is the price of a financial security which pays no dividends, where $T_0 = 0$ is today's date. (In this section a superscript tilde denotes an undiscounted quantity.) Suppose interest rates are non-stochastic and positive. Let one unit of cash invested at time $T_0$ in a bank account paying the riskless rate be worth $\tilde{B}_{T_i}$ at time $T_i$ for $i = 0, 1, 2$. Then $\tilde{B}_{T_0} = 1$. Define $Z = (Z_i)_{i=0,1,2}$ by $Z_i = \tilde{Z}_{T_i}/\tilde{B}_{T_i}$ so that $Z$ is the discounted asset price with a simplified time-index $i = 0, 1, 2$. We assume that $Z_0$ is known at time 0.
Let $\Sigma$ be the set of stopping rules taking values in $\{T_1, T_2\}$ and let $\mathcal{T}$ be the set of stopping rules taking values in $\{1, 2\}$. Consider an American put with strike $\tilde{K}$ which may be exercised at $T_1$ or $T_2$ only. Define $K_i = \tilde{K}/\tilde{B}_{T_i}$. Under a fixed model the expected payoff of an American put under an exercise (stopping) rule $\sigma$ taking values in $\{T_1, T_2\}$ is given by $E[\frac{1}{\tilde{B}_\sigma}(\tilde{K} - \tilde{Z}_\sigma)^+]$ and the price of the American option (assuming exercise is only allowed at $T_1$ or $T_2$) is
$$\sup_{\sigma \in \Sigma} E\Big[\frac{1}{\tilde{B}_\sigma}(\tilde{K} - \tilde{Z}_\sigma)^+\Big].$$
Assume we are given European put prices $\{\tilde{P}_{T_i}(\tilde{k})\}_{\tilde{k} \ge 0}$ for $i = 1, 2$ for a continuum of strikes $\tilde{k}$. If the put prices have come from a model for which the discounted price process is a martingale then
$$\tilde{P}_{T_i}(\tilde{k}) = E\Big[\frac{1}{\tilde{B}_{T_i}}(\tilde{k} - \tilde{Z}_{T_i})^+\Big] = E\Big[\Big(\frac{\tilde{k}}{\tilde{B}_{T_i}} - Z_i\Big)^+\Big].$$
Writing $P_i(k) = E[(k - Z_i)^+]$, for fixed $i$ we have $P_i(k) = \tilde{P}_{T_i}(k\tilde{B}_{T_i})$, and if we are given European put prices with maturity $T_i$ then we can read off the law of $Z_i$: $P(Z_i \le k) = P_i'(k+)$, the right-derivative of $P_i$ at $k$.
Henceforth we assume we work in a discounted setting and with time-index in the set $i = 0, 1, 2$. In this setting the American put has payoff $(K_1 - Z_1)^+$ at time 1 and payoff $(K_2 - Z_2)^+$ at time 2 where $K_i = \tilde{K}/\tilde{B}_{T_i}$. Since interest rates are positive by hypothesis, we have $K_2 < K_1$. We assume that we are given the prices of European puts (with maturities $T_1$ and $T_2$ in the original timescale) for all possible strikes. From these we can infer the laws of the discounted price process at times 1 and 2. We denote these laws by $\mu$ and $\nu$. It follows from Jensen's inequality that if $\mu$ and $\nu$ have arisen from sets of European put options in this way then $\mu \le_{cx} \nu$.
Definition (Hobson and Neuberger [22]). Suppose $\mu \le_{cx} \nu$.
We say (S, M ) is a (µ, ν)-consistent model if S is a filtered probability space and M is a (S, µ, ν) consistent stochastic process.
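Two facts used in this subsection can be spelled out: European put prices pin down the law of $Z_i$, and the laws obtained at times 1 and 2 are in convex order. The following is a sketch under the assumption that $P_\chi(k) = \int (k - x)^+ \chi(dx)$ denotes the discounted put price function of an integrable law $\chi$; the distribution function $F_\chi$ is notation introduced here for illustration.

```latex
% (i) Put prices determine the law: by Fubini, P_chi is the integral of F_chi.
\begin{align*}
P_\chi(k) = \int_{-\infty}^{k} (k - x)\,\chi(dx) = \int_{-\infty}^{k} F_\chi(u)\,du
\quad\Longrightarrow\quad
F_\chi(k) = P_\chi'(k+).
\end{align*}
% (ii) Ordering of put prices: conditional Jensen applied to the convex map y -> (k - y)^+,
%      using the martingale property E[Z_2 | Z_1] = Z_1.
\begin{align*}
P_\nu(k) = E\big[(k - Z_2)^+\big]
         = E\big[\,E[(k - Z_2)^+ \mid Z_1]\,\big]
         \ge E\big[(k - E[Z_2 \mid Z_1])^+\big]
         = E\big[(k - Z_1)^+\big] = P_\mu(k).
\end{align*}
```

Since $\mu$ and $\nu$ have the same mean, the inequality $P_\mu \le P_\nu$ at every strike is equivalent to $\mu \le_{cx} \nu$.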
Remark 1. It is important to note that the supremum in (3) can exceed the supremum in (1), but only in the case where $\mu$ has atoms, see Hobson and Norgilas [24]. The supremum in (1) gives the highest model-based price under the restriction that $\mathcal{F}_0$ is trivial, $\mathcal{F}_1 = \sigma(X)$ and $\mathcal{F}_2 = \sigma(X, Y)$.
However, as pointed out in Hobson and Neuberger [23], see also Hobson and Neuberger [22], Bayraktar and Zhou [3] and Aksamit et al. [1], it is sometimes possible to achieve a higher model price if we work on a richer probability space. In the financial context, the choice of probability space is typically not specified. Instead the choice of probability space is a modelling issue, and it seems unreasonable to restrict attention to a sub-class of models without good reason, especially if this sub-class does not include the optimum. The case where µ has atoms will be excluded by our standing assumptions, so we find that it is always sufficient to work in a setting in which F is the natural filtration of M .

Superhedging
The following notion of a robust superhedge for an American option was first introduced by Neuberger [25], see also Bayraktar and Zhou [3] and Hobson and Neuberger [23].
We work in discounted units over two time-points. Consider a general American-style option with payoff a if exercised at time 1, and payoff b if exercised at time 2, where a : R → R + and b : R → R + are positive functions.

The idea behind the definition is that the hedger purchases a portfolio of maturity-1 European puts (and calls) with payoff $\phi$ and a portfolio of maturity-2 European puts (and calls) with payoff $\psi$. (The fact that this can be done and has cost $C$ follows from arguments of Breeden and Litzenberger [7].) In addition, if the American option is exercised at time 1 the hedger holds $\theta_1$ units of the underlying between times 1 and 2; otherwise the hedger holds $\theta_2$ units of the underlying over this time-period. In the former case, (4) implies that the strategy superhedges the American option payout; in the latter case (5) implies the same.

Remark 2.
We could extend the definition and allow a holding of $\theta_0$ units of the discounted asset over the time-period $[0, 1)$. Then the RHS of (4) would be
$$\phi(x) + \psi(y) + \theta_0 (x - M_0) + \theta_1(x)(y - x). \qquad (6)$$
However, after a relabelling $\phi(x) + \theta_0(x - M_0) \to \phi(x)$, (6) reduces to (4). (Note that $\int \theta_0 (x - M_0)\mu(dx) = 0$ by the martingale property so that $C$ is unchanged.) Similarly for (5). Hence there is no gain in generality by allowing non-zero strategies between times 0 and 1.
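The dual (superhedging) problem is to find the infimum $\mathcal{D}$ of the cost $C$ over the space $\mathcal{H}$ of superhedges. Potentially the space $\mathcal{H}$ could be very large and it is extremely useful to be able to search over a smaller space. The next lemma shows that any convex $\psi$ with $\psi \ge b$ can be used to generate a superhedge $(\phi, \psi, \theta_{1,2})$. For a convex function $\chi$ let $\chi'_+$ denote the right-derivative of $\chi$.
Lemma 2. Suppose $\psi$ is convex with $\psi \ge b$. Set $\phi = (a - \psi)^+$, $\theta_1 = -\psi'_+$ and $\theta_2 = 0$. Then $(\phi, \psi, \theta_{1,2})$ is a superhedge.
Proof. Since $\phi \ge 0$ and $\theta_2 = 0$ we have $b(y) \le \psi(y) \le \phi(x) + \psi(y)$ and (5) follows. Also, by the convexity of $\psi$, $\psi(x) \le \psi(y) - \psi'_+(x)(y - x)$ and
$$a(x) \le \phi(x) + \psi(x) \le \phi(x) + \psi(y) + \theta_1(x)(y - x).$$
Hence (4) follows.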
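Let $\tilde{\mathcal{H}} = \tilde{\mathcal{H}}(b)$ be the set of convex functions $\psi$ with $\psi \ge b$. For $\psi \in \tilde{\mathcal{H}}$ we can define the associated cost of the portfolio, $\tilde{C}(\psi, \mu, \nu)$. The reduced dual hedging problem restricts attention to superhedges generated from $\psi \in \tilde{\mathcal{H}}$ and is to find $\tilde{\mathcal{D}} = \inf_{\psi \in \tilde{\mathcal{H}}} \tilde{C}(\psi, \mu, \nu)$. Clearly we have $\mathcal{D} \le \tilde{\mathcal{D}}$: we will show that $\mathcal{D} = \tilde{\mathcal{D}}$ for the American put.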
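As a concrete instance of the construction, take the American put payoffs $a(x) = (K_1 - x)^+$, $b(y) = (K_2 - y)^+$ and the simplest admissible choice $\psi = b$. The sketch below assumes that the superhedging conditions (4) and (5) take the pathwise form $a(x) \le \phi(x) + \psi(y) + \theta_1(x)(y - x)$ and $b(y) \le \phi(x) + \psi(y) + \theta_2(x)(y - x)$, and that the hedge generated from $\psi$ uses $\phi = (a - \psi)^+$, $\theta_1 = -\psi'_+$ and $\theta_2 = 0$. These forms are reconstructed from the surrounding argument rather than quoted.

```latex
% Assumed hedge generated from psi(y) = (K_2 - y)^+, with K_1 > K_2.
\begin{align*}
\phi(x) &= (K_1 - x)^+ - (K_2 - x)^+ = (K_1 - (x \vee K_2))^+, \qquad
\theta_1(x) = -\psi'_+(x) = \mathbf{1}_{\{x < K_2\}}, \qquad \theta_2 \equiv 0 .
\end{align*}
% Exercise at time 1 with x < K_2: here phi(x) = K_1 - K_2 and theta_1(x) = 1.
\begin{align*}
\phi(x) + \psi(y) + \theta_1(x)(y - x)
 &= (K_1 - K_2) + (K_2 - y)^+ + (y - x) \\
 &\ge (K_1 - K_2) + (K_2 - y) + (y - x) = K_1 - x .
\end{align*}
% Exercise at time 1 with x >= K_2 (theta_1(x) = 0), and exercise at time 2 (theta_2 = 0, phi >= 0).
\begin{align*}
\phi(x) + \psi(y) = (K_1 - x)^+ + (K_2 - y)^+ \ge (K_1 - x)^+ ,
\qquad
\phi(x) + \psi(y) \ge (K_2 - y)^+ .
\end{align*}
```

In every case the portfolio dominates the payoff of the put, whichever exercise date is chosen.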

Weak and Strong Duality
and we have weak duality $\mathcal{P} \le \mathcal{D}$.
Suppose we can find $(S^*, M^*, B^*)$ with $M^* \in \mathcal{M}(S^*, \mu, \nu)$ and $\psi^* \in \tilde{\mathcal{H}}$ such that $A(B^*, M^*, S^*) = \tilde{C}(\psi^*, \mu, \nu)$. Then $A(B^*, M^*, S^*) \le \mathcal{P} \le \mathcal{D} \le \tilde{\mathcal{D}} \le \tilde{C}(\psi^*, \mu, \nu)$ but since the two outer terms are equal we have $\mathcal{P} = \mathcal{D}$ and strong duality. Moreover, $(S^*, M^*)$ is a consistent model which generates the highest price for the American put (and $\tau^*$ given by $\tau^* = 1$ if and only if $X \in B^*$ is the optimal exercise rule) and $\psi^*$ generates the cheapest superhedge.

The left-curtain coupling
The left-curtain coupling (or martingale transport) was introduced by Beiglböck and Juillet [6] and further studied by Henry-Labordère and Touzi [16] and Beiglböck et al. [5].
For $c \le x \le d$, $\chi_{c,x,d}$ is the law of a Brownian motion started at $x$ evaluated on the first exit from $(c, d)$.
Lemma 3 (Beiglböck and Juillet [6], Corollary 1.6). Let $\mu$, $\nu$ be probability measures in convex order and assume that $\mu$ is continuous. Then there exists a pair of measurable functions $T_d$, $T_u$ with $T_d(x) \le x \le T_u(x)$ such that $T_u$ is increasing, such that if $x < x'$ then $T_d(x') \notin (T_d(x), T_u(x))$, and such that there is a martingale coupling $\pi_{lc} \in \Pi_M(\mu, \nu)$ concentrated on the graphs of $T_d$ and $T_u$.
Note that there is no claim of uniqueness of the functions $T_d$, $T_u$ in Lemma 3. For example, the definitions of $T_d$ and $T_u$ are immaterial outside $[\ell_\mu, r_\mu]$. Further, if $T_u$ has a (necessarily upward) jump at $x'$ then it does not matter what value we take for $T_u(x')$. (Since we are assuming $\mu$ is continuous, the probability that we choose an $x$-coordinate value of $x'$ is zero.) More importantly, if $(T_d, T_u)$ satisfy the properties of Lemma 3 and if $T_u(x) = x$ on an interval $[\underline{x}, \overline{x})$ then we can modify the definition of $T_d$ on $[\underline{x}, \overline{x})$ to either $T_d(x) = x$ or $T_d(x) = T_d(\underline{x}-)$ and still satisfy the relevant monotonicity properties. Henry-Labordère and Touzi [16] resolve this indeterminacy by setting $T_d(x) = x$ on the set $T_u(x) = x$ and also taking $T_u$ and $T_d$ to be right-continuous.
We follow Henry-Labordère and Touzi [16] by taking T d (x) = x on the set T u (x) = x but we do not make right-continuity assumptions on T d and T u . Also we write (f, g) in place of (T d , T u ).
Lemma 4. Let (T d , T u ) be a pair of functions satisfying the monotonicity properties listed in Lemma 3. Suppose they lead to a solution π lc ∈Π M (µ, ν).
Remark 3. The left-curtain martingale coupling can be identified with Figure 1 in the following way: . Then the coordinates (x, y) represent the realised values of (X, Y ).
For a horizontal level y there are two cases. Either, g(y) > y and then the value of y arises from a choice according to µ of x = g −1 (y) for which g(x) is chosen rather than f (x); or g(y) = y and the value y arises either from a choice according to µ of x = y, or from a choice according to µ of f −1 (y) combined with a choice of y-coordinate of f (f −1 (y)) = y.
Suppose $\nu$ is also continuous and fix $x$. Then, by the first paragraph of Remark 3, under the left-curtain martingale coupling mass in the interval $(f(x), x)$ at time 1 is mapped to the interval $(f(x), g(x))$ at time 2, so that
$$\int_{f(x)}^{x} \mu(dz) = \int_{f(x)}^{g(x)} \nu(dz), \qquad (9)$$
$$\int_{f(x)}^{x} z\,\mu(dz) = \int_{f(x)}^{g(x)} z\,\nu(dz). \qquad (10)$$
Essentially, (9) is a preservation of mass condition and (10) is preservation of mean and the martingale property. If $\nu$ has atoms then (9) and (10) must be modified to account for atoms of $\nu$ at $f(x)$ and $g(x)$. Returning to the case of continuous $\mu$ and $\nu$, for fixed $x$ there can be multiple solutions to (9) and (10). If, however, we consider $f$ and $g$ as functions of $x$ and impose the additional monotonicity properties of Lemma 3, then typically, for almost all $x$ there is a unique solution to (9) and (10). However, there are exceptional $x$ at which $f$ jumps and at which there are multiple solutions, see Section 3.3.
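To make the mass and mean conditions concrete, here is a worked example that is not taken from the paper. Assume $\mu$ is uniform on $[-1, 1]$ and $\nu$ is uniform on $[-2, 2]$, and assume the two conditions read $\int_f^x \mu(dz) = \int_f^g \nu(dz)$ and $\int_f^x z\,\mu(dz) = \int_f^g z\,\nu(dz)$. Looking for a solution with $f \le -1 \le x \le g \le 2$ reduces them to two linear equations.

```latex
% Illustrative assumption: mu = U[-1,1] (density 1/2), nu = U[-2,2] (density 1/4).
% We seek f <= -1 <= x <= g, so mu puts no mass on (f, -1).
\begin{align*}
\text{mass:}\quad & \tfrac12 (x + 1) = \tfrac14 (g - f)
   &&\Longrightarrow\quad g - f = 2(x + 1), \\
\text{mean:}\quad & \tfrac14 (x^2 - 1) = \tfrac18 (g^2 - f^2) = \tfrac18 (g - f)(g + f)
   &&\Longrightarrow\quad g + f = x - 1, \\
\text{hence}\quad & f(x) = -\tfrac12 (x + 3), \qquad g(x) = \tfrac12 (3x + 1),
   && x \in [-1, 1].
\end{align*}
% Martingale check: the downward probability (g - x)/(g - f) equals 1/4 for every x, and
\begin{align*}
\tfrac14 f(x) + \tfrac34 g(x) = \tfrac18\big(-(x + 3) + 3(3x + 1)\big) = x .
\end{align*}
```

In this example $f$ is decreasing and $g$ is increasing, with $f(-1) = g(-1) = -1$, $f(1) = -2$ and $g(1) = 2$.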
Remark 5. There are many pairs $(\mu, \nu)$ which lead to the same pair of functions $(f, g)$. Conversely, let $I_1 \subseteq I_2 \subseteq \mathbb{R}$ be intervals, let $f, g : I_1 \to I_2$ satisfy $f(x) \le x \le g(x)$, let $\mu$ be a measure with support in $I_1$, and define
$$\nu(dy) = \int_x \mu(dx)\,\chi_{f(x),x,g(x)}(dy), \qquad (13)$$
together with $\pi(dx, dy) = \mu(dx)\chi_{f(x),x,g(x)}(dy)$. Then (subject to integrability conditions) we have $\pi \in \Pi_M(\mu, \nu)$. Moreover, if $(f, g) \in \Xi_{Mon}$, that is, if $(f, g)$ has the monotonicity properties of Lemma 3, then provided the same integrability conditions are satisfied we have that if $\nu$ is given by (13) then $\pi$ given by $\pi(dx, dy) = \mu(dx)\chi_{f(x),x,g(x)}(dy)$ is the left-curtain martingale coupling of $\mu$ and $\nu$.
The relevance of this remark is as follows. Given a pair $\mu \le_{cx} \nu$ it may be difficult to determine the properties of $(f, g)$ which define the left-curtain coupling, beyond the fact that $(f, g) \in \Xi_{Mon}$. (For example, it may be difficult to ascertain the number of downward jumps of $f$ without calculating $f$ and $g$ everywhere.) However, if we want to construct examples for which $(f, g)$ have additional properties (such as no downward jump) then we can start with an appropriate pair $(f, g)$, take arbitrary (continuous) initial law $\mu$ with support on the interval where $f$ is defined, and then define $\nu$ via (13). This observation underpins our analysis in Sections 3.2 and 3.3.
3 Robust bounds for American puts when µ is atom-free

Problem formulation
Our goal in this section is to derive the highest consistent model price for the American put. We begin by giving a concise formulation of the problem, and stating a version of our main result. Then we first study the problem in a simple special case, second generalise to a case which exhibits all the main features and third present the analysis in the general case.
Throughout this paper we assume that $\mu$ has no atoms. The same assumption is made in Beiglböck and Juillet [6], Henry-Labordère and Touzi [16] and Beiglböck et al. [5]. The extension of the left-curtain martingale coupling to the case where $\mu$ has atoms is the subject of Hobson and Norgilas [24].
We consider an American put on an asset. Under the bond numeraire, we represent the price of the underlying security by $M = (M_0 = \bar{\mu}, M_1 = X, M_2 = Y)$. The American put may only be exercised at time 1 or time 2: if the put is exercised at time 1 the payoff is $(K_1 - X)^+$; if the put is exercised at time 2 the payoff is $(K_2 - Y)^+$. We say the put is in-the-money at time 1 (respectively time 2) if $X < K_1$ (respectively $Y < K_2$). Otherwise the put is out-of-the-money. The laws of $X$ and $Y$ are presumed to be given and $\mathcal{L}(X) = \mu$ and $\mathcal{L}(Y) = \nu$.
Under Standing Assumption 1 our problem is to find the supremum in (1).
Our main result is as follows:
Theorem 1. The highest model-based expected payoff of the American put is equal to the cheapest superhedging price. Moreover, the highest model-based expected payoff is attained by the model associated with the left-curtain martingale coupling (and a judiciously chosen stopping rule). Further, we can characterise the cheapest superhedging strategy: it takes the form described in Lemma 2 and it is one of four possible types.
We begin by considering a couple of degenerate cases. If $K_1 \le \ell_\mu$ then the American put is always out-of-the-money at time 1, and the American put is equivalent to the European put with strike $K_2$ and maturity 2. Since puts with strike $K_1$ and maturity 1 are costless, a simple superhedging strategy is to purchase one European put with strike $K_1$ and maturity 1, and one European put with strike $K_2$ and maturity 2. The cost of this hedge is $P_\nu(K_2)$; this is also the model-based expected payoff of the American put under any consistent model.
Again the American put is equivalent to the European put with strike $K_2$ and maturity 2. In this case, for a superhedge it is sufficient to purchase one European put with strike $K_2$ and maturity 2. By Lemma 2 (with $\psi(y) = (K_2 - y)^+$ and $\phi = 0$) this generates a superhedge with cost $P_\nu(K_2)$. Again, this is the model-based expected payoff of the American put under any consistent model.
For the remainder of the paper we make

The left-curtain coupling
The goal in this section is to present the theory in a simple special case, and to illustrate the main features and solution techniques of our approach unencumbered by technical issues or the consideration of exceptional cases. The following assumption is a small modification of one introduced by Hobson and Klimmek [20], see also Henry-Labordère and Touzi [16]. See Figure 2.
Under the Dispersion Assumption $\{k : D_{\mu,\nu}(k) > 0\}$ is an interval and $D = D_{\mu,\nu}$ is convex to the left of $e_-$, concave on $(e_-, e_+)$ and again convex above $e_+$.
Lemma 5 (Henry-Labordère and Touzi [16], Section 3.4). Suppose Assumption 1 holds. For all $x \in (e_-, r_\mu)$, there exist $f, g$ with $f < e_- < x < g$ such that (9) and (10) hold. Moreover, if we consider $f$ and $g$ as functions of $x$ on $(e_-, r_\mu)$ then $f$ and $g$ are continuous, $f$ is strictly decreasing and $g$ is strictly increasing.
Figure 3: Sketch of functions $f$ and $g$ under the Dispersion Assumption, with the regions $K_2 < f(K_1)$ and $K_2 > f(K_1)$ shaded. This is a simple special case of Figure 1.

Remark 6.
As discussed at the end of Remark 5, for the purposes of the analysis of this section it is not the fact that the measures µ and ν satisfy the Dispersion Assumption which is important, but rather that π lc is so simple, and {k : g(k) > k} is a single interval on which f is a monotone decreasing function.
Starting with monotonic $f$ and $g$, letting $\mu$ be continuous and defining $\nu$ by $\nu(dy) = \int_x \mu(dx)\chi_{f(x),x,g(x)}(dy)$ and $\pi_{lc}$ by $\pi_{lc}(dx, dy) = \mu(dx)\chi_{f(x),x,g(x)}(dy)$, the pair $(\mu, \nu)$ may or may not satisfy Assumption 1 but nonetheless, a candidate optimal model, stopping time and hedge can be constructed exactly as described in this section, and can be proved to be optimal by the methods of this section. Since our analysis depends on the pair $(\mu, \nu)$ only through the functions $(f, g)$ we may take as our starting point any $(f, g) \in \Xi_{Mon}$.
Remark 7. In a related problem, Hobson and Klimmek [20] show how under the Dispersion Assumption, upper and lower functions can be characterised as solutions of a pair of coupled differential equations. In our case $(f, g)$ solve a pair of coupled differential equations on $[e_-, r_\mu)$, obtained from differentiating (9) and (10), with the initial condition $f(e_-) = e_- = g(e_-)$. See also Henry-Labordère and Touzi [16, Equations (3.10) and (3.9)].
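The differential equations referred to in Remark 7 can be sketched as follows. This is a derivation under stated assumptions, not a quotation: $\mu$ and $\nu$ are assumed to have continuous densities, written $\rho$ and $\eta$ respectively, (9) and (10) are assumed to be the mass and mean conditions $\int_f^x \mu(dz) = \int_f^g \nu(dz)$ and $\int_f^x z\,\mu(dz) = \int_f^g z\,\nu(dz)$, and $f$, $g$ are assumed differentiable in $x$.

```latex
% Differentiate the mass and mean conditions with respect to x (f = f(x), g = g(x)).
\begin{align*}
\rho(x) - \rho(f)\, f' &= \eta(g)\, g' - \eta(f)\, f' , \\
x\,\rho(x) - f\,\rho(f)\, f' &= g\,\eta(g)\, g' - f\,\eta(f)\, f' .
\end{align*}
% Second equation minus f times the first gives g';
% second equation minus g times the first gives f'.
\begin{align*}
g' = \frac{(x - f)\,\rho(x)}{(g - f)\,\eta(g)} , \qquad
f' = -\,\frac{(g - x)\,\rho(x)}{(g - f)\,\big(\eta(f) - \rho(f)\big)} .
\end{align*}
```

Wherever $\eta(f) > \rho(f)$ these expressions give $f' < 0 < g'$, in line with the monotonicity stated in Lemma 5.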
The principle behind the left-curtain martingale coupling of Beiglböck and Juillet [6] is to determine where to map mass at $x$ at time 1 sequentially, working from left to right. In our current setting there is an interval $(\ell_\mu, e_-]$ on which mass can remain unmoved between times 1 and 2. To the right of $e_-$ we can define $f$, $g$ in such a way that mass is moved as little as possible. This leads to the ODEs in Remark 7.

The American put
Suppose $K_1 \in (e_-, r_\mu]$ and suppose $f$ and $g$ are constructed as in Lemma 5. Define $\Lambda$ by
$$\Lambda(x) = \frac{g(x) - K_1}{g(x) - x} - \frac{g(x) - K_2}{g(x) - f(x)}.$$
Pictorially $\Lambda$ is the difference in slope of the two dashed lines in Figure 4.
Lemma 6. Suppose $K_1 \in (e_-, r_\mu]$ and $f(K_1) < K_2$. Then there is a unique $x^* = x^*(\mu, \nu; K_1, K_2) \in (g^{-1}(K_1), K_1)$ such that $\Lambda(x^*) = 0$. Moreover $f(x^*) < K_2$ and
$$\frac{g(x^*) - K_1}{g(x^*) - x^*} = \frac{g(x^*) - K_2}{g(x^*) - f(x^*)}. \qquad (16)$$
Proof. It is clear, see Figure 4, that since $f$ and $g$ are continuous monotonic functions we have that $\Lambda$ is continuous and strictly increasing. Moreover, $\Lambda(g^{-1}(K_1)) < 0 < \Lambda(K_1)$. Hence there is a unique root to $\Lambda = 0$. At this root the equality in (16) holds.
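The root $x^*$ can be found in closed form in a simple illustrative example that is not from the paper. For $\mu$ uniform on $[-1, 1]$ and $\nu$ uniform on $[-2, 2]$ one can check that $f(x) = -\frac12(x + 3)$ and $g(x) = \frac12(3x + 1)$ preserve mass and mean. The sketch assumes that $\Lambda(x) = 0$ is equivalent to the slope condition $\frac{K_2 - f(x)}{g(x) - f(x)} = \frac{K_1 - x}{g(x) - x}$, which is the relation used later in the hedging-cost calculation.

```latex
% Illustrative: g - f = 2(x + 1), g - x = (x + 1)/2, K_2 - f = K_2 + (x + 3)/2.
\begin{align*}
\frac{K_2 + \tfrac12 (x + 3)}{2(x + 1)} = \frac{K_1 - x}{\tfrac12 (x + 1)}
\quad&\Longleftrightarrow\quad K_2 + \tfrac12 (x + 3) = 4 (K_1 - x) \\
\quad&\Longleftrightarrow\quad x^* = \tfrac19 \big( 8 K_1 - 2 K_2 - 3 \big).
\end{align*}
% Numerical instance: K_1 = 1/2, K_2 = 0, so that f(K_1) = -7/4 < K_2.
\begin{align*}
x^* = \tfrac19, \qquad f(x^*) = -\tfrac{14}{9} < K_2, \qquad g(x^*) = \tfrac23 > K_1, \qquad
\frac{K_2 - f(x^*)}{g(x^*) - f(x^*)} = \frac{K_1 - x^*}{g(x^*) - x^*} = \tfrac{7}{10}.
\end{align*}
```

Here $g^{-1}(K_1) = 0$, so $x^* = \frac19$ lies in $(g^{-1}(K_1), K_1) = (0, \frac12)$ as Lemma 6 requires.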
Figure 5: A combination of Figures 3 and 4, showing how jointly they define the best model and best hedge. By adjusting $x$ we can find $x^*$ such that $\Lambda(x^*) = 0$. Together the quantities $(f(x^*), x^*, g(x^*))$ define the optimal model, stopping time and hedge.
Continue to suppose K 1 > e − and f (K 1 ) < K 2 . Now we define a superhedge of the American put.
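Let $\psi^*$ be the function
$$\psi^*(z) = \begin{cases} K_2 - z, & z \le f(x^*), \\ \dfrac{(K_2 - f(x^*))\,(g(x^*) - z)}{g(x^*) - f(x^*)}, & f(x^*) < z \le g(x^*), \\ 0, & z > g(x^*). \end{cases} \qquad (17)$$
Note that by construction and by (16), $\psi^*(x^*) = K_1 - x^*$. Moreover, $\psi^*$ is convex and satisfies $\psi^*(z) \ge (K_2 - z)^+$. Hence by Lemma 2, $\psi^*$ can be used to construct a superhedge $(\psi^*, \phi^*, \theta^*_{1,2})$. In the following theorem we will assume the American put is not always strictly in-the-money at time 1 (or equivalently, $K_1 \le r_\mu$). Discussion of the case $K_1 > r_\mu$ is postponed until Section 3.3.5 below.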
Theorem 2.
1. Suppose $K_1 \in (e_-, r_\mu]$ and $f(K_1) < K_2$. The model $(S^*, M^*)$ described in the previous paragraphs is a consistent model for which the price of the American option is the highest. The stopping time $\tau^*$ is the optimal exercise time. The function $\psi^*$ defined in (17) defines the cheapest superhedge.
Moreover, the highest model-based price is equal to the cost of the cheapest superhedge.
2. Suppose either that Case A: K 1 ≤ e − or that Case B: K 1 ∈ (e − , r µ ] and f (K 1 ) ≥ K 2 . Then there is a consistent model for which (Y < K 2 ) = (X < K 2 ) ∪ (X > K 1 , Y < K 2 ) and any model with this property with the stopping rule τ = 1 if X < K 1 and τ = 2 otherwise attains the highest consistent model price. The cheapest superhedge is generated from ψ(x) = (K 2 − x) + and the highest model-based price is equal to the cost of the cheapest hedge.
Remark 8. In Part 2 of Theorem 2, the left-curtain coupling generates a model which, when associated with the stopping rule of the theorem, attains the highest consistent model price.
Since ν is continuous we have that f, x, g solve (9) and (10). The elements f, x, g can be used to define a model using the construction after Lemma 6 above. For this model we can calculate the expected payoff of the American put. At the same time we can use (f, x, g) to define a superhedge. The remaining task is to show that the cost of the superhedge equals that of the model-based expected payoff. Then by the discussion in Section 2.4 we have found an optimal model and a cheapest superhedge.
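The expected payoff of the American put (for this model and stopping rule) is
$$MBEP = \int_{-\infty}^{x} (K_1 - w)\,\mu(dw) + \int_{-\infty}^{f} (K_2 - z)\,(\nu - \mu)(dz).$$
Now we consider the hedging cost. Set $\Theta = \frac{K_2 - f}{g - f} \in (0, 1)$. Note that, since $x$ is such that $\Lambda(x) = 0$ we have $\Theta = \frac{K_1 - x}{g - x}$. Recall the definition of $\psi^*$ in (17). Then $\psi^*(z) = \Theta (g - z)^+ + (1 - \Theta)(f - z)^+$. Using Lemma 2 we can use $\psi^*$ to generate a superhedging strategy. The cost of this strategy is
$$HC = \Theta P_\nu(g) + (1 - \Theta) P_\nu(f) + \int_{-\infty}^{x} \big[(K_1 - w) - \psi^*(w)\big]\,\mu(dw), \qquad (18)$$
where the first two terms arise from the purchase of the static time-2 portfolio $\psi^*$ and the third comes from the purchase of the time-1 portfolio $(K_1 - w)^+ - \psi^*(w)$. The expression in (18) can be rewritten as
$$HC = \int_{-\infty}^{x} (K_1 - w)\,\mu(dw) + (1 - \Theta) D(f) + \Theta \Big[ P_\nu(g) - \int_{-\infty}^{x} (g - w)\,\mu(dw) \Big].$$
Now we consider the difference between the hedging cost ($HC$) and the model-based expected payoff ($MBEP$). Recall that $P_\chi(k) = \int_{-\infty}^{k} (k - x)\chi(dx)$, $\chi \in \{\mu, \nu\}$, and that $D(k) = D_{\mu,\nu}(k) = P_\nu(k) - P_\mu(k)$. Then (9) and (10) can be combined to give $\int_f^g (g - z)\,\nu(dz) = \int_f^x (g - w)\,\mu(dw)$, so that $P_\nu(g) - \int_{-\infty}^{x} (g - w)\,\mu(dw) = \int_{-\infty}^{f} (g - z)\,(\nu - \mu)(dz)$. We find
$$HC - MBEP = \int_{-\infty}^{f} \big[(1 - \Theta)(f - z) + \Theta (g - z) - (K_2 - z)\big]\,(\nu - \mu)(dz) = 0,$$
where we use (9), (10) and the definition of $\Theta$, which gives $(1 - \Theta) f + \Theta g = K_2$. Optimality of the model, stopping rule and hedge now follows.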
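The equality of payoff and cost can be checked by hand in an illustrative example that is not from the paper: $\mu$ uniform on $[-1, 1]$, $\nu$ uniform on $[-2, 2]$, $K_1 = \frac12$ and $K_2 = 0$, for which $x^* = \frac19$, $f(x^*) = -\frac{14}{9}$, $g(x^*) = \frac23$ and $\Theta = \frac{7}{10}$. The sketch assumes that the put is exercised at time 1 exactly when $X < x^*$ and that $\psi^*(z) = \Theta(g - z)^+ + (1 - \Theta)(f - z)^+$.

```latex
% Illustrative data: mu has density 1/2 on [-1,1]; nu - mu has density 1/4 on (-2,-1);
% P_nu(k) = (k + 2)^2 / 8 on [-2,2], so P_nu(2/3) = 8/9 and P_nu(-14/9) = 2/81.
\begin{align*}
\text{payoff:}\quad
 & \int_{-1}^{1/9} \big(\tfrac12 - w\big)\,\tfrac12\,dw
   + \int_{-2}^{-14/9} (0 - z)\,\tfrac14\,dz
   = \tfrac{85}{162} + \tfrac{16}{81} = \tfrac{13}{18}, \\
\text{cost:}\quad
 & \tfrac{7}{10}\cdot\tfrac89 + \tfrac{3}{10}\cdot\tfrac{2}{81}
   + \int_{-1}^{1/9} \Big[\big(\tfrac12 - w\big) - \tfrac{7}{10}\big(\tfrac23 - w\big)\Big]\tfrac12\,dw
   = \tfrac{28}{45} + \tfrac{1}{135} + \tfrac{5}{54} = \tfrac{13}{18}.
\end{align*}
```

Both sides equal $\frac{13}{18}$, so in this example the model and the hedge certify each other's optimality through weak duality.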
Alternatively, suppose $K_1 > e_-$ but $f(K_1) \ge K_2$. Then under the left-curtain martingale coupling mass below $K_2$ at time 1 stays constant between times 1 and 2 (note that $K_2 \le f(K_1) \le e_-$), and mass between $K_2$ and $K_1$ at time 1 is mapped to $(K_2, \infty)$. Then, mass which is below $K_2$ at time 2 was either below $K_2$ at time 1, or above $K_1$ at time 1. The expected payoff under this model (using a strategy of exercising at time 1 if the American put is in-the-money) is again given by (21). Now consider the hedging cost. Let $\psi(y) = (K_2 - y)^+$. Defining $\phi$ as in Lemma 2 we find $\phi(x) = (K_1 - x)^+ - (K_2 - x)^+ = (K_1 - (x \vee K_2))^+$ and the superhedging cost is
$$\int \phi\,d\mu + \int \psi\,d\nu = P_\mu(K_1) - P_\mu(K_2) + P_\nu(K_2) = P_\mu(K_1) + D(K_2).$$
Hence the model-based expected payoff equals the hedging cost.
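The expected payoff in this case can be computed directly from the description of the model. The following sketch uses only the properties stated above: $Y = X$ on $\{X < K_2\}$, mass with $K_2 \le X < K_1$ ends above $K_2$, and the put is exercised at time 1 exactly when $X < K_1$. Continuity of $\mu$ and $\nu$ is assumed so that boundary events have probability zero.

```latex
% Time-1 exercise contributes P_mu(K_1); time-2 exercise only pays on {X >= K_1, Y < K_2}.
\begin{align*}
E\big[(K_1 - X)^+\big] + E\big[(K_2 - Y)^+ ; X \ge K_1\big]
 &= P_\mu(K_1) + E\big[(K_2 - Y)^+\big] - E\big[(K_2 - Y)^+ ; X < K_2\big] \\
 &= P_\mu(K_1) + P_\nu(K_2) - E\big[(K_2 - X)^+\big] \\
 &= P_\mu(K_1) + P_\nu(K_2) - P_\mu(K_2) = P_\mu(K_1) + D(K_2).
\end{align*}
```

This coincides with the cost $\int \phi\,d\mu + \int \psi\,d\nu$ of the hedge generated from $\psi(y) = (K_2 - y)^+$.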

Two intervals of g > x and one downward jump in f
We now relax the Dispersion Assumption to the case where f is not monotone. The simplest situation when this may arise is when there are two intervals on which g(x) > x. We do not contend that there are many natural examples which fall into this situation, but rather that this intermediate case illustrates phenomena which are to be found in the general case but which were not to be found under the Dispersion Assumption.
By construction we have that so that if mass in (x ′ , x ′′ ) at time 1 is mapped to (x ′ , g(x ′′ )) at time 2 then total mass and mean are preserved. Note that given (f ′ , x ′ ) satisfy (22) we also have In particular, given (22) and (23), the pair of equations has two solutions for f , namely f = x ′ and f = f ′ . Hence, in defining the left-curtain martingale coupling there are two choices for f at x ′′ : we may take f (x ′′ ) = x ′ or f (x ′′ ) = f ′ . Rather than assuming one of these choices (for example by requiring left-continuity of f ) it is convenient to allow f to be multi-valued. Then, for x such that g(x) > x let ℵ(x) = {f : (f, x, g(x)) solves (9) and (10)}.
Then, in the setting of Assumption 2, for $x > e_-$, $|\aleph(x)| = 1$ except at $x''$ and there $\aleph(x'') = \{f', x'\}$.
Remark 9. As discussed in Remark 5 when constructing examples which fit with the analysis of this section, we may begin with $f$, $g$ as presented in the bottom half of Figure 7. Given $\mu$ with support $(\ell_\mu, r_\mu)$ we can define $\nu$ via $\nu(dy) = \int_x \mu(dx)\chi_{f(x),x,g(x)}(dy)$. Then the pair $(\mu, \nu)$ satisfy the hypotheses of Assumption 2.
Remark 10. Recall Remark 7 and the principle that quantities in the left-curtain coupling are determined working from left to right. Given that µ and ν have continuous densities and given that η > ρ on (ℓ µ , e 1 − ) we can set f (x) = g(x) = x on this interval. To the right of e 1 − we have ρ > η and we can define f and g using the differential equations in Remark 7. There are two cases: either g(x) > x for all x ∈ (e 1 − , r µ ) (in which case we can define (f, g) on (e 1 − , r µ ) with the properties described in Lemma 5), or there is some point at which g first hits the diagonal line y = x again. This point is exactly x ′ .
If x ′ exists it must satisfy x ′ ∈ (e 1 + , e 2 − ). Then we set g(x) = x on (x ′ , e 2 − ) and let f and g solve the same coupled differential equations as in Remark 7 but with a new starting point g(e 2 − ) = e 2 − = f (e 2 − ). The ODEs determine f and g until f first reaches x ′ . This happens at x ′′ , and at x ′′ f jumps down to f ′ (and g is continuous). To the right of x ′′ , f and g solve the differential equations again, subject to initial conditions f (x ′′ +) = f ′ and g(x ′′ +) = g(x ′′ ). If f is multi-valued, then Λ will also be multi-valued. In Section 3.2, one of our main steps was to find x such that Λ(x) = 0, and our aim is similar here.
Instead of seeking x which is a root of Λ(x) = 0 our goal is to find (f, x, g) with g = g(x) and f ∈ ℵ(x) such that Υ(f, x, g) = 0. Fixing K 1 , the value of K 2 such that Υ(f ′ = f (x ′′ +), x ′′ , g(x ′′ )) = 0 is given by Similarly, the value of K 2 such that Υ(x ′ = f (x ′′ −), x ′′ , g(x ′′ )) = 0 is given by This motivates the introduction of the linear increasing functions L u , Pictorially, L d and L u are the lower and upper boundaries, respectively, of the dotted triangular area G in Figure 8. From Figure 8 we identify four regions (and various subregions) on which four different martingale couplings and hedging strategies will be needed in order to find the highest model-based expected payoff of an American put. (Compare this with two regimes under the Dispersion Assumption in Figure 3.) Define , which we write more compactly as R 1 = {e 1 − < k 1 < x ′ , f (k 1 ) < k 2 < k 1 }. Using the same compact notation define In general, on the boundaries between the regions the boundaries could be allocated to either region. However, we allocate points on the boundary to the region where the hedge is simplest.
Recall the proof of Theorem 2. There, to show that MBEP = HC, we used the fact that under model M * , or more specifically, under any martingale coupling which mapped (f * , x * ) to (f * , g * ), the mass that is 'unexercised' at time 1 and is in-the-money at time 2 is given by (ν − µ)| (−∞,f * ) . If f * < e 1 − (as is the case when (K 1 , K 2 ) ∈ R 1 ∪ R 4 ∪ R 5 ) then the same proof applies, MBEP = HC and we have optimality. However, if (K 1 , K 2 ) ∈ R 2 ∪ R 3 , then it is not the case that f * < e 1 − and thus, in order to specify the optimal model, we need to impose additional structure on the coupling µ̃ f * ,x * → ν̃ f * ,g * .
Theorem 3. Suppose Assumption 2 holds and (K 1 , K 2 ) ∈ R. If (K 1 , K 2 ) ∈ R 1 ∪ R 4 ∪ R 5 then the model M * , and if (K 1 , K 2 ) ∈ R 2 ∪ R 3 then the model M̃ * , together with the stopping time τ * , is a consistent model for which the price of the American put is highest. The function ψ * defined in (17) defines the cheapest superhedge. Moreover, the highest model-based price is equal to the cost of the cheapest superhedge.
Proof. If (K 1 , K 2 ) ∈ R 1 ∪ R 4 ∪ R 5 then the proof is essentially the same as the proof of the first case in Theorem 2. We repeat it for convenience. First find x * ∈ (g −1 (K 1 ), K 1 ) and f * ∈ ℵ(x * ) such that Υ(f * , x * , g * = g(x * )) = 0. If x * = x ′′ we find f * = f (x ′′ +) = f ′ . Under the candidate model M * mass below f * at time 1 is mapped to the same point at time 2 (which is possible since f * < e 1 − ), and mass in (f * , x * ) is mapped to (f * , g * ), while mass above x * is either mapped to below f * or to above g * . Then under the candidate stopping rule τ * the model-based expected payoff is equal to the cost of the hedging strategy generated by ψ * .
Now suppose (K 1 , K 2 ) ∈ R 2 ∪ R 3 . By Lemma 7 there is a unique x * ∈ (g −1 (K 1 ), x ′′ ] and f * ∈ ℵ(x * ) such that Υ(f * , x * , g * = g(x * )) = 0. If x * = x ′′ then we have f * = f (x ′′ −) = x ′ . Then, since ν is continuous we have that f * , x * , g * solve (9) and (10). Note, however, that x ′ ≤ f * < e 2 − . Under the candidate model M̃ * mass in (f ′ , x ′ ) at time 1 is mapped to the same interval at time 2. Also, mass below f ′ and mass in (x ′ , f * ) at time 1 is mapped to the same point at time 2, and mass in (f * , x * ) is mapped to (f * , g * ). Mass above x * is either mapped to below f ′ , to (x ′ , f * ), or to above g * . In particular (ν − µ)| (−∞,f ′ )∪(x ′ ,f * ) is the mass that was not 'exercised' at time 1 and is 'exercised' in-the-money at time 2. In other words, (ν − µ)| (−∞,f ′ )∪(x ′ ,f * ) is the probability under M̃ * that (X > x * , Y < K 2 ). From (22) we have

Theorem 4. Suppose Assumption 2 holds and that
. Then any model with these properties with the stopping rule τ = 1 if X < K 1 and τ = 2 otherwise attains the highest consistent model price. The cheapest superhedge is generated from ψ(x) = (K 2 − x) + and the highest model-based price is equal to the cost of the cheapest hedge.
Proof. Let ψ(y) = (K 2 − y) + . Defining φ as in Lemma 2 we find φ(x) = (K 1 − x) + − (K 2 − x) + and the superhedging cost (which is the same for all the cases) is Suppose (K 1 , K 2 ) ∈ B 1 . Then using the properties of f and g and the left-curtain coupling we see that the proof that the model-based expected payoff is equal to the hedging cost is the same as in the second case of Theorem 2. In particular, Then under the left-curtain coupling mass from (f ′ , x ′ ) at time 1 is mapped to the same interval at time 2. Therefore mass which is below K 2 at time 2 was either below K 2 at time 1, or above x ′ at time 1. Therefore, we again have Finally, suppose (K 1 , K 2 ) ∈ B 3 . We again utilise the fact that under the left-curtain coupling, mass from (f ′ , x ′ ) at time 1 is mapped to the same interval at time 2. In both cases, the mass which is below K 2 at time 2 was either below K 2 at time 1, or above K 1 at time 1. In particular, mass that can be 'exercised' at time 2 is given by (ν − µ)| (−∞,f ′ )∪(x ′ ,K2) . Then using which ends the proof.
Then there is a consistent model for which (f ′ < X < x ′ ) = (f ′ < Y < x ′ ) and any model with this property with the stopping rule τ = 1 if X < K 1 and τ = 2 otherwise attains the highest consistent model price. The cheapest superhedge is generated from ψ x ′ defined in (26) and the highest model-based price is equal to the cost of the cheapest hedge.
Proof. First note that and the cost of this strategy (under any consistent model) is
Figure 9: Picture of f and g along with superhedge for the blank region W.
Now consider the model-based expected payoff. From (22) it follows that µ f ′ ,x ′ and ν f ′ ,x ′ have the same mean and mass, and are in convex order. Moreover, the same holds for µ̃ f ′ ,x ′ and ν̃ f ′ ,x ′ . Hence there exists a martingale coupling, which we term π̃ x ′ ∈ Π M (µ, ν), which maps the mass in (f ′ , x ′ ) at time 1 to the same interval at time 2. Under this model the only mass that can be 'exercised' at time 2 is therefore given by (ν − µ)| (−∞,f ′ ) .
Note that, since f ′ and x ′ satisfy (22), and hence Then given that we stop at time 1 if X < K 1 and at time 2 otherwise we have
Recall the construction of L u and L d . For K 1 ∈ (x ′′ , g(x ′′ )) and K 2 ∈ (L d (K 1 ), L u (K 1 )) there does not exist x * ∈ (g −1 (K 1 ), K 1 ) such that Λ(x * ) = 0; instead Λ(x ′′ −) < 0 < Λ(x ′′ +). On the other hand, from (23) we have that there exists a martingale coupling of µ x ′ ,x ′′ and ν x ′ ,g(x ′′ ) . Moreover, note that the restrictions of µ̃ f ′ ,x ′ to (x ′ , x ′′ ) and ν̃ f ′ ,x ′ to (x ′ , g(x ′′ )) are equal to µ x ′ ,x ′′ and ν x ′ ,g(x ′′ ) , respectively. Then we define a martingale coupling π̃ x ′ ,x ′′ ∈ Π(µ, ν) by combining the couplings of Given x ′ , and thus also x ′′ , we define the superhedge as follows. First define linear functions
Figure 10: Picture of f and g along with superhedge for the dotted region G. The hedge function ψ x ′ ,x ′′ has a kink at x ′ .
Theorem 6. Suppose Assumption 2 holds and (K 1 , K 2 ) ∈ G. The model M x ′ ,x ′′ and the stopping time τ = 1 if X < x ′′ and τ = 2 otherwise attains the highest consistent model price. Moreover, ψ x ′ ,x ′′ defined in (27) generates the cheapest superhedge and the highest model-based price is equal to the cost of the cheapest superhedge.
Proof. Under the candidate model M x ′ ,x ′′ mass in (f ′ , x ′ ) at time 1 is mapped to the same interval at time 2, while the mass in (x ′ , x ′′ ) is mapped to (x ′ , g(x ′′ )). Then under the candidate stopping time (exercise at time 1 if X < x ′′ and at time 2 otherwise) the law of Y (under M x ′ ,x ′′ ) on the event that the option was not exercised at time 1 is given by Now consider the hedging cost generated by ψ x ′ ,x ′′ . Note that we can rewrite (27) as and thus the hedging cost is Now using (22) and the fact that g( (28) moreover (23) gives that Then, combining (28) and (29) we conclude that HC = MBEP.

K 1 > r µ
In Lemma 5, and under the Dispersion Assumption, we constructed f and g but only on the interval (e − , r µ ]. More generally, under Standing Assumption 1 the arguments of Beiglböck and Juillet [6] and Henry-Labordère and Touzi [16] allow us to construct T d = f and T u = g on [ℓ µ , r µ ] for arbitrary laws µ ≤ cx ν. For their purposes the definitions of f and g outside the range of µ are not important since they have no impact on the construction of the left-curtain martingale coupling. Nonetheless, we can extend the definitions of f and g to R in a way which respects the conditions in Lemma 3; in particular f and g are taken to be constant on [r µ , r ν ]. We will show that with these definitions for f and g the analysis of the previous sections extends to the case K 1 > r µ .
Suppose r ν > r µ and r µ < K 1 < r ν . Then Λ(r µ ) = (r ν − K 1 )/(r ν − r µ ) − (K 1 − K 2 )/(r µ − ℓ ν ) and Λ(r ν −) = ∞. If Λ(r µ ) ≥ 0 and Λ is continuous then there exists x * ∈ [ℓ µ , r µ ] such that g(x * ) > x * and Λ(x * ) = 0. Then, exactly as in Section 3.2.2 we can construct a model, stopping time and superhedge such that the model-based expected payoff equals the hedging cost, and hence the model, stopping time and hedge are all optimal. The model could be based on the left-curtain coupling, and the optimal exercise rule is to exercise the American put at time 1 if X < x * . Even if Λ is not continuous, there may exist x * such that Λ(x * ) = 0 and the same arguments apply (see Section 3.3.1). If not, then we are in the setting of Section 3.3.4, but again we can identify the optimal model and hedge. Essentially, the case Λ(r µ ) ≥ 0 is covered by a direct extension of existing arguments. Note that Λ(r µ ) ≥ 0 is equivalent to (r ν − K 1 )(r µ − ℓ ν ) ≥ (K 1 − K 2 )(r ν − r µ ).
Now suppose r µ < K 1 < r ν and (r ν − K 1 )(r µ − ℓ ν ) < (K 1 − K 2 )(r ν − r µ ). Then Λ(r µ ) < 0 and since Λ(r ν −) = ∞ and Λ is continuous on [r µ , r ν ] (note that we have defined f and g to be constants on this range) there must exist x * ∈ (r µ , K 1 ) such that Λ(x * ) = 0.
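The existence of x * in the case Λ(r µ ) < 0 can be made concrete. The following is a minimal sketch, assuming that the constant extension is f = ℓ ν and g = r ν on (r µ , r ν ); these values are inferred from the expression for Λ(r µ ) rather than stated here, and all numbers are hypothetical. Under that assumption Λ(x) = 0 is linear in x after cross-multiplication.

```python
def lam(x, k1, k2, f_const, g_const):
    """Lambda(x) when f and g are constant (assumed: f = l_nu, g = r_nu)."""
    return (g_const - k1) / (g_const - x) - (k1 - k2) / (x - f_const)


def root_constant_case(k1, k2, l_nu, r_nu):
    """Solve Lambda(x) = 0 on (r_mu, r_nu) for constant f = l_nu, g = r_nu.

    Cross-multiplying gives (r_nu - k1) * (x - l_nu) = (k1 - k2) * (r_nu - x),
    which is linear in x.
    """
    return ((k1 - k2) * r_nu + (r_nu - k1) * l_nu) / (r_nu - k2)


# Hypothetical support endpoints and strikes (not taken from the paper).
l_nu, r_mu, r_nu = -2.0, 1.0, 2.0
k1, k2 = 1.9, 0.0

lam_at_r_mu = lam(r_mu, k1, k2, l_nu, r_nu)   # negative for these strikes
x_star = root_constant_case(k1, k2, l_nu, r_nu)
```

For these illustrative values Λ(r µ ) < 0 and the root lies in (r µ , K 1 ), in line with the argument above.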
It is always optimal to exercise at time 1 and any martingale coupling can be used to generate a model which attains the highest model-based price of P µ (K 1 ) = (K 1 − µ̄). A cheapest superhedge is generated by The cost of this hedge is
Finally suppose K 1 > r ν . Then X < K 1 almost surely under any consistent model and for X < K 1 It is always optimal to exercise the American put at time 1. If K 2 > r ν or K 2 < ℓ ν then we are in the case studied in Section 3.3.2 and a cheapest superhedge is generated by a time 2 payoff ψ(y) = (K 2 − y) + . If K 2 ∈ [ℓ ν , r ν ] then we are in the case studied in Section 3.3.3 and a cheapest superhedge is generated by ψ = ψ(y) where ψ is given by (30). In either case the highest model-based expected payoff is P µ (K 1 ) = (K 1 − µ̄) and this is also the cost of the superhedge.
Figure 11: The various cases for K 1 > r ν in the setting of Section 3.3.
For jumps in both f and g, we have a pictorial representation of the regions of pairs (K 1 , K 2 ) for which the hedging strategy has to be adapted as above, see Figure 14. If g has a jump at x, then Λ(x−) < 0 and Λ(x+) > 0 is equivalent to the point (K 1 , K 2 ) lying in the interior of a triangle with vertices {(g(x−), g(x−)), (g(x+), g(x+)), (x, f (x))}. On the other hand, if f jumps downwards at x, then Λ(x−) < 0 and Λ(x+) > 0 is equivalent to the point (K 1 , K 2 ) lying in the interior of a triangle with vertices {(x, f (x−)), (x, f (x+)), (g(x), g(x))} (compare this with the region G).

The general case for continuous ν
In the previous sections we showed how the left-curtain coupling can be used to find an optimal model, exercise strategy and superhedge, under the assumption that both µ and ν are continuous together with further regularity and simplifying assumptions which we labelled the Dispersion Assumption and the Single Jump Assumption. Under the latter assumption, the existence of points that solve (22) led us to identify two further types of hedging strategy that were not present under the Dispersion Assumption, making four in total.
If we relax the assumptions further and require only that both µ and ν are continuous, then we expect that there exist multiple pairs (f ′ i , x ′ i ), i = 1, 2, 3, ..., that solve (22). Note that from the monotonicity of g we can write {x : g(x) > x} as a countable union of intervals, and on each such interval f is decreasing. Moreover, f jumps over the intervals (f ′ i , x ′ i ) identified above (at least those with x ′ i to the left of the current value of x). In particular, f has only countably many downward jumps. Figure 1 is a stylized representation of the general left-curtain martingale coupling, not least because in the figure f has only finitely many jumps. Using Figure 1 we can divide {(K 1 , K 2 ) : K 2 < K 1 } into four regions, see Figure 13. The key point is that these four regions are characterised exactly as in the cases described in Section 3.3. For given (K 1 , K 2 ) we can determine which of the types of hedging strategy is a candidate optimal superhedge, and determine a candidate optimal stopping rule. (We can always use the model associated with the left-curtain martingale coupling π lc .) The fact that these candidates are indeed optimal can be proved using exactly analogous techniques to those used in Section 3.3.
Figure 13: General picture of f, g with shading of regions. There remain 4 types of shading corresponding to 4 forms of optimal hedge.
More specifically, we can divide {(k 1 , k 2 ) : k 2 < k 1 } into {(k 1 , k 2 ) : k 2 ≤ f (k 1 )} ∪ {(k 1 , k 2 ) : f (k 1 ) < k 2 < k 1 }. We can divide the former into two regions W = {(k 1 , k 2 ) : k 2 < k 1 , ∃x ≤ k 1 such that f (x) < k 2 < g(x)} and B = {(k 1 , k 2 ) : k 2 ≤ f (k 1 )} \ W. The latter we again divide into two regions G and R = {(k 1 , k 2 ) : f (k 1 ) < k 2 < k 1 } \ G. Here we can write G = ∪ x:f (x−)>f (x+) ∆(x) where ∆(x) is a triangle with vertices (x, f (x+)), (x, f (x−)) and (g(x), g(x)). Then on each of the regions W, B, G and R we have a superhedge exactly as described in Section 3.3. Moreover, again by the arguments of Section 3.3, we can show that the hedging cost associated with the superhedging strategy is precisely the model-based expected payoff of the American put under the martingale coupling π lc (and candidate stopping rule), thus proving the optimality of the hedge and of the model/exercise rule.
Remark 11. The set {x : g(x) > x} is a collection of intervals and we let I + denote the set of right endpoints of these intervals. As remarked above, Figure 13 is drawn in the case of 'finite complexity' in the sense that the set I + contains a finite number of elements. The results extend easily to countable I + provided I + contains no accumulation points.
In general I + may contain an accumulation point, and as discussed in Henry-Labordère and Touzi [16], care is needed in the construction of the left-curtain mappings (T d , T u ) in this case. However, from our perspective such subtleties do not cause a problem. The reason for this is we do not aim to derive the left-curtain coupling, but rather take the left-curtain coupling as a given, and use it to solve the put pricing problem.
Our construction of the best model and the cheapest hedge is local in the sense that when we determine in which region of Figure 13 the point (K 1 , K 2 ) lies, the fine detail of the picture in other parts of (k 1 , k 2 )-space is not important. So, the existence of accumulation points can only be an issue if K 1 is equal to one of those accumulation points. Let x ∞ be such an accumulation point in I + and suppose K 1 = x ∞ . Depending on the value of K 2 , either there exists (x ′ , f ′ ) with f ′ < K 2 < x ′ such that (22) holds, or there does not. In the former case we can follow the analysis of Section 3.3.3, and in the latter that of Section 3.3.2: in either case we construct a model and hedge such that the model price and hedging cost agree, thus proving optimality of both.

Atoms in the target law
When ν has atoms, the preservation of mass and mean conditions become (11) and (12), respectively. In particular, atoms of ν correspond to the flat sections in f or g. See Figure 14. In this case we can still find all the optimal quantities as before. In particular, Λ(x) := (g(x) − K 1 )/(g(x) − x) − (K 1 − K 2 )/(x − f (x)) is strictly increasing in x, even if f and/or g is constant. Hence we can find solutions to Λ = 0 (more generally solutions x, f ∈ ℵ(x) to Υ(f, x, g = g(x)) = 0) exactly as before. The superhedge is unchanged. A little care is needed in constructing the optimal model, but mass in (f (x * ), x * ) is mapped to (f (x * ), g(x * )) together with (potentially) atoms at f (x * ) or g(x * ). Specifically, given f * , x * , g * we can find λ * f and λ * g such that (11) and (12) hold. Then, in any optimal model mass in (f * , x * ) is mapped to ν x * which is defined to be ν x * = ν| (f * ,g * ) + λ * f δ f * + λ * g δ g * and mass outside (f * , x * ) is mapped to ν − ν x * .
with the left-curtain coupling is typically not optimal. The reason is that a model (S, M ) is only optimal when it is combined with the best stopping rule, and the optimal stopping rule does depend on (K 1 , K 2 ). Conversely, although the model associated with the left-curtain coupling is optimal (simultaneously across all pairs (K 1 , K 2 )), we do not need the full power of this coupling when we work with fixed (K 1 , K 2 ). Under the Dispersion Assumption all we need is a coupling in which (f (x * ), x * ) is mapped to (f (x * ), g(x * )) where x * is such that Λ(x * ) = 0. There are many martingale couplings which have this property.
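As a numerical illustration of the step of finding x * with Λ(x * ) = 0, the sketch below uses bisection, relying only on Λ being increasing on the bracketing interval (g −1 (K 1 ), K 1 ). The maps f and g are hypothetical toy choices, f (x) = −(x + 3)/2 and g(x) = (3x + 1)/2 on (−1, 1), which satisfy f (x) < x < g(x) with f decreasing and g increasing; the strikes are arbitrary. None of these values is taken from the text.

```python
def big_lambda(x, k1, k2, f, g):
    """Lambda(x) = (g(x) - K1)/(g(x) - x) - (K1 - K2)/(x - f(x))."""
    return (g(x) - k1) / (g(x) - x) - (k1 - k2) / (x - f(x))


def bisect_root(h, lo, hi, tol=1e-12, max_iter=200):
    """Bisection for an increasing function h with h(lo) < 0 < h(hi)."""
    if not (h(lo) < 0.0 < h(hi)):
        raise ValueError("root is not bracketed by (lo, hi)")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid          # root lies to the right of mid
        else:
            hi = mid          # root lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical toy maps (assumptions of this sketch, not from the paper).
def f(x):
    return -(x + 3.0) / 2.0


def g(x):
    return (3.0 * x + 1.0) / 2.0


def g_inv(y):
    return (2.0 * y - 1.0) / 3.0


k1, k2 = 0.5, 0.25            # arbitrary strikes with f(k1) < k2 < k1
x_star = bisect_root(lambda x: big_lambda(x, k1, k2, f, g), g_inv(k1), k1)
```

For these toy choices Λ(x) = 0 can also be solved by hand, giving x * = 1/18, which the bisection reproduces to within the tolerance. The candidate exercise rule is then to exercise at time 1 if X < x * .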
The intuition behind the optimality of the left-curtain coupling is as follows. With American puts there is a tension between the time-decay of the option payout promoting early exercise, and the convexity of the payoff function promoting delay. If the aim is to maximise the payoff of the option then any paths which are in-the-money at time 1, and will remain in-the-money, are best exercised at time 1. However, once a path has been exercised, any further volatility is irrelevant. In particular, when designing a candidate optimal model we should try to keep paths which are exercised at time 1 constant (or near constant) whenever possible. Thus the probability space should be split into two regions: one region where the put is in-the-money at time 1 and is exercised, and thereafter paths move little, and a second region where the put is out-of-the-money at time 1 (and sometimes just in-the-money, but left unexercised at time 1) and then paths move a long way between times 1 and 2. The left-curtain coupling has this property.

Multiple exercise times
It is natural to ask if it is possible to extend the analysis to American puts which can be exercised at multiple dates (T 1 , T 2 , . . . , T N ) where N > 2, or equivalently to martingales M = (M n ) 0≤n≤N with marginals (µ n ) where µ 1 has mean M 0 = µ̄ and µ n ≤ cx µ n+1 for 1 ≤ n ≤ N − 1. It is clear that many of the ideas extend naturally to the multi-marginal case. However, the number of types of hedging strategy may grow exponentially with N . This is left as future work.
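The requirement µ n ≤ cx µ n+1 can be checked directly from put prices, since two laws with finite mean are in convex order exactly when they have equal means and the put price function of the first is dominated by that of the second. The sketch below does this for finitely supported laws; the representation of a law as a pair (atoms, probabilities) and the example marginals are assumptions made for illustration. For discrete laws the put price functions are piecewise linear with kinks at the atoms, so it suffices to compare them at the atoms of both laws.

```python
def mean(atoms, probs):
    """Mean of a finitely supported law."""
    return sum(p * a for a, p in zip(atoms, probs))


def put_price(atoms, probs, k):
    """P(k) = E[(k - X)^+] for a finitely supported law."""
    return sum(p * max(k - a, 0.0) for a, p in zip(atoms, probs))


def in_convex_order(mu, nu, tol=1e-12):
    """Return True if mu <=cx nu, for laws given as (atoms, probs)."""
    if abs(mean(*mu) - mean(*nu)) > tol:
        return False
    # Both put price functions are linear between consecutive atoms of
    # either law, so checking the union of atoms is sufficient.
    strikes = sorted(set(mu[0]) | set(nu[0]))
    return all(put_price(*mu, k) <= put_price(*nu, k) + tol for k in strikes)


def marginals_increasing(marginals):
    """Check mu_n <=cx mu_{n+1} along a list of marginals."""
    return all(in_convex_order(a, b) for a, b in zip(marginals, marginals[1:]))


# Hypothetical marginals with common mean 0.
mu1 = ([0.0], [1.0])
mu2 = ([-1.0, 1.0], [0.5, 0.5])
mu3 = ([-2.0, 0.0, 2.0], [0.25, 0.5, 0.25])
consistent = marginals_increasing([mu1, mu2, mu3])
```

A family of marginals passing this check admits at least one consistent martingale model; the check says nothing about which model maximises the American put price.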