Incorporating statistical model error into the calculation of acceptability prices of contingent claims

The determination of acceptability prices of contingent claims requires the choice of a stochastic model for the underlying asset price dynamics. Given this model, optimal bid and ask prices can be found by stochastic optimization. However, the model for the underlying asset price process is typically based on data and found by a statistical estimation procedure. We define a confidence set of possible estimated models by a nonparametric neighborhood of a baseline model. This neighborhood serves as ambiguity set for a multi-stage stochastic optimization problem under model uncertainty. We obtain distributionally robust solutions of the acceptability pricing problem and derive the dual problem formulation. Moreover, we prove a general large deviations result for the nested distance, which allows us to relate the bid and ask prices under model ambiguity to the quality of the observed data.


Introduction
The no-arbitrage paradigm is the cornerstone of mathematical finance. The fundamental work of Harrison, Kreps and Pliska [12][13][14][21] and Delbaen and Schachermayer [6], to mention some of the most important contributions, paved the way for a sound theory for the pricing of contingent claims. In a general market model, the exclusion of arbitrage opportunities leads to intervals of fair prices.
Typically, the resulting no-arbitrage price bounds are too wide to provide practically meaningful information. In practice, market-makers wish to have a framework for controlling the acceptable risk when setting their spreads. Pioneering contributions to incorporate risk in the pricing procedure for contingent claims were made by Carr, Geman and Madan [3] as well as Föllmer and Leukert [8,9], subsequent generalizations being made, e.g., by Nakano [24] or Rudloff [42]. The pricing framework of the present paper is in this spirit: by specifying acceptability functionals, an agent may control her shortfall risk in a rather intuitive manner. In particular, using the Average-Value-at-Risk ($\mathrm{AV@R}_\alpha$) will allow for a whole range of prices between the extreme cases of hedging with probability one (the traditional approach) and hedging w.r.t. expectation by varying the parameter $\alpha$.
Nowadays, there is great awareness of the epistemic uncertainty inherent in setting up a stochastic model for a given problem. For single-stage and two-stage situations, there is a plethora of available literature on different approaches to account for model ambiguity (see the lists contained in [31, pp. 232-233] or [45, p. 2]). Recently, balls w.r.t. the Kantorovich-Wasserstein distance around an estimated model have gained a lot of popularity (e.g., [7,10,11,23,25,46]), although they were originally proposed by Pflug and Wozabal [34] in 2007. However, the literature on nonparametric ambiguity sets for multistage problems is still extremely sparse. Analui and Pflug [1] were the first to study balls w.r.t. the multistage generalization of the Kantorovich-Wasserstein distance, named nested distance, for incorporating model uncertainty into multistage decision making. It is the aim of this article to further explore this rather uncharted territory. The classic mathematical finance problem of contingent claim pricing serves as a very well suited instance for doing so. In fact, while in the traditional pointwise hedging setup only the null sets of the stochastic model for the dynamics of the underlying asset price process influence the resulting price of a contingent claim, the full specification of the model affects the claim price when acceptability is introduced. Thus, model dependency is even stronger in the latter case, which is the topic of this paper.
Stochastic optimization offers a natural framework to deal with the problems of mathematical finance. Application of the fundamental work of Rockafellar and Wets [35][36][37][38][39][40][41] on conjugate duality and stochastic programming has led to a stream of literature on those topics. King [17] originally formulated the problem of contingent claim pricing as a stochastic program. Extensions of this approach have been made, amongst others, by King, Pennanen and their coauthors [17][18][19][20][26][27][28], Kallio and Ziemba [16] or Dahl [5]. The stochastic programming approach naturally allows for incorporating features and constraints of real-world markets and allows one to efficiently obtain numerical results by applying the powerful toolkit of available algorithms for convex optimization problems.
The main contribution of this article is the link between statistical model error and the pricing of contingent claims, where the pricing methodology allows for a controlled hedging shortfall. The setup is inspired by practically very relevant aspects of decision making under both aleatoric and epistemic uncertainty. Given the stochastic model from which future evolutions are drawn, agents are willing to accept a certain degree of risk in their decisions. However, it may be dangerously misleading to neglect the fact that it is impossible to detect the true model without error. Thus, a distributionally robust framework, which takes the limitations of nonparametric statistical estimation into account, is required. In the statistical terminology, balls w.r.t. the nested distance may be seen as confidence regions: by considering all models whose nested distance to the estimated baseline model does not exceed some threshold, it is ensured that the true model is covered with a certain probability and hence the decision is robust w.r.t. the statistical model estimation error. In particular, we prove a large deviations theorem for the nested distance, based on which we show that a scenario tree can be constructed out of data such that it converges (in terms of the nested distance) to the true model in probability at an exponential rate. Thus, distributionally robust claim prices w.r.t. nested distance balls as ambiguity sets include a hedge under the true model with arbitrarily high probability, depending on the available data. In other words, we provide a framework that allows for setting up bid and ask prices for a contingent claim which result from finding hedging strategies with truly calculated risks, since the important factor of model uncertainty is not neglected.
This paper is organized as follows. In Section 2 we introduce our framework for acceptability pricing, i.e., we replace the traditional almost sure super-/subreplication requirement by the weaker constraint of an acceptable hedge. The acceptability condition is formulated w.r.t. one given probability model. This lowers the ask price and increases the bid price such that the bid-ask spread may be tightened or even closed. Section 3 contains the main results of this article. We weaken the assumption of one single probability model, assuming instead that a collection of models is plausible. In particular, we define the distributionally robust acceptability pricing problem and derive the dual problem formulation under rather general assumptions on the ambiguity set. The effect of the introduction of acceptability and ambiguity into the classical pricing methodology is nicely mirrored by the dual formulations. Moreover, we give a strong statistical motivation for using nested distance balls as ambiguity sets by proving a large deviations theorem for the nested distance. Section 4 contains illustrative examples to visualize the effect of acceptability and model ambiguity on contingent claim prices. In Section 5 we discuss the algorithmic solution of the AV@R-acceptability pricing problem w.r.t. nested distance balls as ambiguity sets. In particular, we exploit the duality results of Section 3 and the special stagewise structure of the nested distance by a sequential linear programming algorithm which yields approximate solutions to the originally semi-infinite non-convex problem. In this way, we improve on the current state-of-the-art computational methods for multistage stochastic optimization problems under non-parametric model ambiguity. Finally, we summarize our results in Section 6.

Acceptability functionals
The terminology introduced in this section follows the book of Pflug and Römisch [33]. A detailed discussion of acceptability functionals and their properties can be found therein. Intuitively speaking, an acceptability functional $\mathcal{A}$ maps a stochastic position $Y \in L_p(\Omega)$, $1 < p < \infty$, defined on a probability space (Ω, F, P), to the real numbers extended by −∞ in such a way that higher values of the position correspond to higher values of the functional, i.e., a 'higher degree of acceptance'. In particular, the defining properties of an acceptability functional are translation equivariance, concavity, monotonicity, and positive homogeneity. We assume all acceptability functionals to be version independent,⁵ i.e., $\mathcal{A}(Y)$ depends only on the distribution of the random variable Y.
The following proposition is well known. It follows directly from the Fenchel-Moreau-Rockafellar Theorem (see [35, Th. 5] and [33, Th. 2.31]).

Proposition 1. An acceptability functional $\mathcal{A}$ which fulfills the above conditions has a dual representation of the form

$$\mathcal{A}(Y) = \inf\{\mathbb{E}[Y Z] : Z \in \mathcal{Z}\},$$

where $\mathcal{Z}$ is a closed convex subset of $L_q(\Omega)$, with $1/p + 1/q = 1$. We call $\mathcal{Z}$ the superdifferential of $\mathcal{A}$. Monotonicity and translation equivariance imply that all $Z \in \mathcal{Z}$ are nonnegative densities.
Assumption A1. There exists some constant $K_1 \in \mathbb{R}$ such that for all $Z \in \mathcal{Z}$ it holds that $\|Z\|_q \le K_1$.
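This assumption implies that $\mathcal{A}$ is Lipschitz on $L_p$:

$$|\mathcal{A}(Y_1) - \mathcal{A}(Y_2)| \le K_1 \, \|Y_1 - Y_2\|_p .$$

A good example of such an acceptability functional is the Average Value-at-Risk, $\mathrm{AV@R}_\alpha$, whose superdifferential is given by

$$\mathcal{Z} = \{Z : 0 \le Z \le 1/\alpha, \ \mathbb{E}[Z] = 1\}.$$

The extreme cases are represented by the essential infimum ($\mathrm{AV@R}_0(Y) := \lim_{\alpha \downarrow 0} \mathrm{AV@R}_\alpha(Y) = \operatorname{essinf}(Y)$⁶) and the expectation ($\alpha = 1$). Their superdifferentials are given by the set of all probability densities and by the function identically 1, respectively.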
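For a position with finitely many outcomes, the Average Value-at-Risk described above can be evaluated directly. The following is a minimal sketch, assuming the acceptability version of the AV@R as the average of the lower α-tail; the function name `avar` and the sample data are illustrative and not taken from the paper. The tail masses filled in by the loop correspond to a density Z with 0 ≤ Z ≤ 1/α and E[Z] = 1, i.e., an element of the superdifferential.

```python
def avar(values, probs, alpha):
    """Average Value-at-Risk of a discrete position at level alpha in (0, 1].

    Acceptability version (assumption of this sketch): the average of the
    lower alpha-tail, (1/alpha) * integral_0^alpha F^{-1}(u) du.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    remaining = alpha   # probability mass of the tail still to be filled
    total = 0.0
    # Fill the tail with the worst outcomes first.
    for y, p in sorted(zip(values, probs)):
        mass = min(p, remaining)   # equals alpha * p * Z for the minimizing density Z
        total += mass * y
        remaining -= mass
        if remaining <= 1e-15:
            break
    return total / alpha


# Hypothetical position with four outcomes.
outcomes = [-10.0, 0.0, 5.0, 20.0]
weights = [0.1, 0.2, 0.3, 0.4]
for level in (1e-6, 0.2, 1.0):
    print(level, avar(outcomes, weights, level))
```

For α = 1 the function returns the expectation, and for very small α it returns the worst outcome, in line with the two extreme cases mentioned in the text.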
Other common names for the AV@R are Conditional-Value-at-Risk, Tail-Value-at-Risk, or Expected Shortfall. The subtleties between these terminologies are addressed, e.g., in Sarykalin et al. [43]. All our computational studies in Section 4 and Section 5 will be based on some $\mathrm{AV@R}_\alpha$, while our theoretical results are general.

⁵ For version independent acceptability functionals, upper semi-continuity follows from concavity (see Jouini, Schachermayer and Touzi [15]).

⁶ Strictly speaking, Assumption A1 is not respected by $\mathrm{AV@R}_0$. However, all our results on AV@R-acceptability pricing will hold true also for $\mathrm{AV@R}_0$. In fact, this is the special case which is well treated in the literature.

Acceptable replications
Let us now introduce the notion of acceptability in the pricing procedure for contingent claims.
As usual in mathematical finance, we consider a market model as a filtered probability space (Ω, F, P), where the filtration is given by the increasing sequence of sigma-algebras $\mathcal{F} = (\mathcal{F}_0, \mathcal{F}_1, \ldots, \mathcal{F}_T)$ with $\mathcal{F}_0 = \{\emptyset, \Omega\}$. The liquidly traded basic asset prices are given by a discrete-time $\mathbb{R}^m_+$-valued stochastic process $S = (S_0, \ldots, S_T)$, where $S_t = (S_t^{(1)}, \ldots, S_t^{(m)})$. We assume the filtration to be generated by the asset price process.
One asset, denoted by $S^{(1)}$, serves as numéraire (a risk-less bond, say). We assume w.l.o.g. that $S_t^{(1)} = 1$ a.s. If not, we may replace $(S_t^{(1)}, \ldots, S_t^{(m)})$ by $(1, S_t^{(2)}/S_t^{(1)}, \ldots, S_t^{(m)}/S_t^{(1)})$.

A contingent claim C consists of an F-adapted series of cash flows $C = (C_1, \ldots, C_T)$ measured in units of the numéraire. The fact that the payoff $C_t$ is contingent on the respective state of the market up to time t is reflected by the condition that C is adapted to the filtration F, for which we write $C \lhd \mathcal{F}$.
To be more precise, let [...] and $L_q^1 := L_q(\Omega, \mathcal{F}_1) \times \cdots \times L_q(\Omega, \mathcal{F}_T)$. We assume that $S \in L_p^m$, $x \in L_\infty^m$ and $C \in L_p^1$. The norm in $L_p^m$ is given by [...] and similarly for $L_\infty^m$. Notice that $x_0$ and $S_0$ are deterministic vectors.

Assumption A2. We assume that all claims are Lipschitz-continuous functions of the underlying asset price process S.

Definition 1. Consider a contingent claim C and fix acceptability functionals $\mathcal{A}_t$, for all t = 1, ..., T. We assume that all functionals $\mathcal{A}_t$ have a representation given by Proposition 1. Then the acceptable prices are given by the optimal values of the following stochastic optimization programs:

i) the acceptable ask price of C is defined as [...]

ii) the acceptable bid price of C is defined as [...]

where the optimization runs over all trading strategies $x \in L_\infty^m$ for the liquidly traded assets. The constraints in (2a) and (3a) are formulated for all t = 1, ..., T − 1.
To interpret Definition 1, the acceptable ask price is given by the minimal initial capital required to acceptably superhedge the cash-flows C t , which have to be paid out by the seller. On the other hand, the acceptable bid price corresponds to the maximal amount of money that can initially be borrowed from the market to buy the claim, such that by receiving the payments C t and always rebalancing one's portfolio in an acceptable way, one ends up with an acceptable position at maturity.
In what follows we will mainly consider the ask price problem (P) and its variants. The bid price problem (P′) is its mirror image and all assertions and proofs for the problem (P) can be rewritten literally for problem (P′).
Assumption A3. The optima are attained and all solutions x to the problems ($P_\beta$), for β in a neighborhood of 0, are uniformly bounded, i.e., $\|x\|_{L_\infty^m} \le K_2$ for some constant $K_2$.

We show the following auxiliary result for the problems ($P_\beta$).

Lemma 1. Let $v_\beta$ be the optimal value of ($P_\beta$) and $v^*$ be the optimal value of (P). Then, in a neighborhood of 0, [...]

Proof. If $v_\beta$ is the optimal value of ($P_\beta$), then by inclusion of the feasible sets [...] We have to bound $v_{|\beta|} - v_{-|\beta|}$. Let $x_t^*$ be the solution of ($P_{-|\beta|}$). $x_t^*$ is not necessarily feasible for ($P_{|\beta|}$). We modify $x_t^*$ in order to get feasibility for ($P_{|\beta|}$). Let $a_t$, t = 1, ..., T − 1, be the vector with identical components $2 \sum_{s=t+1}^{T} |\beta_s|$ [...] which concludes the proof.
Notice that the primal program (P) is semi-infinite, if the constraints are written in the extensive form [...] where $Z = (Z_1, \ldots, Z_T) \in L_q^1$. Lemma 2 below demonstrates the validity of an approximation with only finitely many supergradients.

Since the $L_p$ spaces are separable, there exist sequences $(Z_{t,1}, Z_{t,2}, \ldots)$ that are dense in $\mathcal{Z}_t$, for each t. Let [...] as $n \to \infty$.

Lemma 2. Let $v^*$ be the optimal value of the basic problem (P) and let $v_n^*$ be the optimal value of the similar optimization problem [...] Let $x_t^*$ be the solution of (P). We may find finite sub-sigma-algebras $\tilde{\mathcal{F}}_t \subseteq \mathcal{F}_t$ such that with [...] we have that [...]

Denote by ($\tilde{P}$) the variant of the problem (P), where the processes $(S_t)$ and $(C_t)$ are replaced by $(\tilde{S}_t)$ and $(\tilde{C}_t)$. Similarly as before introduce the notation [...] By Lemma 1 we may conclude that $v^* \le \tilde{v}^* + \eta$, where $\tilde{v}^*$ is the optimal value of ($\tilde{P}$). Let ($\tilde{P}_n$) be the variant of problem ($\tilde{P}$), where all $\mathcal{A}_t$ are replaced by $\mathcal{A}_{t,n}$. The optimal value of ($\tilde{P}_n$) is denoted by $\tilde{v}_n^*$. In this finite situation we may show that $\tilde{v}_n^* \uparrow \tilde{v}^*$. Obviously, $\tilde{v}_n^*$ is a monotonically increasing sequence with $\tilde{v}_n^* \le \tilde{v}^*$. It remains to demonstrate that $\lim_n \tilde{v}_n^*$ cannot be smaller than $\tilde{v}^*$. For this, let $\tilde{x}^{n*}$ be a solution of ($\tilde{P}_n$). Because of the finiteness of the filtration $\tilde{\mathcal{F}}$, the solutions of ($\tilde{P}_n$) as well as of ($\tilde{P}$) are just vectors in some high-, but finite-dimensional $\mathbb{R}^N$ and are all bounded by $K_2$. Let $\tilde{x}^{**}$ be an accumulation point of $(\tilde{x}^{n*})$, i.e., we have for some subsequence that $\tilde{x}^{n_i *} \to \tilde{x}^{**}$. We show that $\tilde{x}^{**}$ satisfies the constraints of ($\tilde{P}$).

Suppose the contrary. Then there is a t such that [...] Since the objective function is continuous in $\tilde{x}$ this implies that $\lim_i \tilde{v}_{n_i}^* = \tilde{v}^*$ and, by monotonicity, $\lim_n \tilde{v}_n^* = \tilde{v}^*$. We have therefore shown that we can find an index n such that $v^* < \tilde{v}_n^* + \eta$.

Let $x^{n*}$ be the solution of ($P_n$) and let $\tilde{x}^{n*} = \mathbb{E}[x^{n*} \mid \tilde{\mathcal{F}}_t]$. Analogously as before, one may prove that [...] Putting (5), (6) and (7) together one sees that [...] which contradicts the assumption that $v_n^* < v^* - 3\eta$.
We now turn to the duals of the problems (P) and (P′), called (D) and (D′), respectively. It turns out that also in our general acceptability case a martingale property appears in the dual, as is known for the case of a.s. super-/subreplication.

Theorem 1. For all t = 1, ..., T, let $\mathcal{A}_t$ be acceptability functionals with corresponding superdifferentials $\mathcal{Z}_t$. Then, the acceptable ask price is given by [...] and the acceptable bid price is given by [...]

Proof. The acceptable ask/bid price corresponds to a special case of the distributionally robust acceptable ask/bid price introduced in Definition 2 below, namely when the ambiguity set reduces to a singleton. Hence, the validity of Theorem 1 follows directly from the proof of Theorem 2.
Remark 1 (Interpretation of the dual formulations). The objective of the dual formulations (D) and (D′) is to maximize (minimize, resp.) the expected value of the payoffs resulting from the claim w.r.t. some feasible measure Q.
The constraints (8a) and (9a) require Q to be such that the underlying asset price process is a martingale w.r.t. Q. This is well known from the traditional approach of pointwise super-/subreplication. The acceptability criterion enters the dual problems in terms of the constraints (8b) and (9b), which reduce the feasible sets by a stronger condition than the two probability measures just having the same null sets. Making the feasible sets smaller obviously lowers the ask price and increases the bid price and thus gives a tighter bid-ask spread.
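The effect described in Remark 1 can be made concrete on a one-period model with three states. The sketch below assumes AV@R-acceptability, so that the dual feasible set consists of martingale measures Q whose density satisfies dQ/dP ≤ 1/α; the function name `price_bounds`, the tree values and the call payoff are hypothetical illustrations, not the examples of the paper. With three states the martingale measures form a one-parameter family, so the dual problems reduce to optimizing a linear function over an interval.

```python
def price_bounds(s0, states, probs, payoffs, alpha):
    """Bid and ask price in a one-period, three-state model (illustrative sketch).

    Assumption: the dual feasible set is {Q : E_Q[S_1] = s0, q_i <= p_i / alpha}.
    Equivalence of Q and P is relaxed to q_i >= 0 (closure of the feasible set).
    Returns (bid, ask), or None if no feasible Q exists.
    """
    s1, s2, s3 = states
    # Parametrize by u = q_2 and solve the two linear equations for q_1, q_3:
    # q_i(u) = a_i + b_i * u.
    a3 = (s0 - s1) / (s3 - s1)
    b3 = (s1 - s2) / (s3 - s1)
    coeffs = [(1.0 - a3, -1.0 - b3), (0.0, 1.0), (a3, b3)]
    lo, hi = 0.0, 1.0
    for (a, b), p in zip(coeffs, probs):
        cap = p / alpha                      # bound on q_i from dQ/dP <= 1/alpha
        if abs(b) < 1e-15:                   # q_i does not depend on u
            if not -1e-12 <= a <= cap + 1e-12:
                return None
            continue
        u1, u2 = (0.0 - a) / b, (cap - a) / b
        lo, hi = max(lo, min(u1, u2)), min(hi, max(u1, u2))
    if lo > hi + 1e-12:
        return None

    def value(u):
        # Expected payoff under the measure with q_2 = u.
        return sum((a + b * u) * c for (a, b), c in zip(coeffs, payoffs))

    v_lo, v_hi = value(lo), value(hi)        # a linear objective is extremal at endpoints
    return min(v_lo, v_hi), max(v_lo, v_hi)


# Hypothetical tree: S_0 = 100, S_1 in {80, 100, 120}, call struck at 95.
tree = (80.0, 100.0, 120.0)
uniform = (1 / 3, 1 / 3, 1 / 3)
call = tuple(max(s - 95.0, 0.0) for s in tree)
for level in (0.5, 0.8, 1.0):
    print(level, price_bounds(100.0, tree, uniform, call, level))
```

In this toy model the spread shrinks as α increases and closes at α = 1, where the only feasible measure is P itself.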
Proposition 2. For fixed acceptability functionals $\mathcal{A}_1, \ldots, \mathcal{A}_T$, consider the acceptable ask price $\pi_a(P)$ as a function of the underlying model P. This function is Lipschitz.
Proof. The assertion follows from Theorem 5 in the Appendix, considering the Lipschitz property of claims (Assumption A2) and the problem formulation resulting from Theorem 1.

Model ambiguity and distributional robustness
Traditional stochastic programs are based on a given and fixed probability model for the uncertainties. However, already since the pioneering paper of Scarf [44] in the 1950s, it was felt that the fact that these models are based on observed data as well as the statistical error should be taken into account when making decisions. Ambiguity sets are typically either a finite collection of models or a neighborhood of a given baseline model. In what follows we study the latter case and, in particular, we use the nested distance to construct parameter-free ambiguity sets.

Acceptability pricing under model ambiguity
In Section 2.2 we defined the bid/ask price of a contingent claim as the maximal/minimal amount of capital needed in order to sub-/superhedge its payoff(s) w.r.t. an acceptability criterion. However, the result computed with this approach heavily depends on the particular choice of the probability model. This section weakens the strong dependency on the model. More specifically, acceptable bid and ask prices shall be based on an acceptability criterion that is robust w.r.t. all models contained in a certain ambiguity set.
Definition 2. Consider a contingent claim C. Then, for acceptability functionals $\mathcal{A}_t$, t = 1, ..., T, and an ambiguity set $\mathcal{P}_\varepsilon$ of probability models,

i) the distributionally robust acceptable ask price of C is defined as [...]

ii) the distributionally robust acceptable bid price is defined as [...]

where the optimization runs over all trading strategies $x \in L_\infty^m$ for the liquidly traded assets. The constraints in (10a) and (11a) are formulated for all t = 1, ..., T − 1 and $\mathcal{A}_t^{P}$ denotes the value of the acceptability functional when the underlying probability model is given by P.
Theorem 2. Let $\mathcal{P}_\varepsilon$ be a convex set of probability models, which is spanned by a sequence of models $(P_1, P_2, \ldots)$. Moreover, let $\mathcal{P}_\varepsilon$ be dominated by some model $P_0$ and assume all densities w.r.t. $P_0$ to be bounded. For t = 1, ..., T, let $\mathcal{A}_t$ be acceptability functionals with corresponding superdifferentials $\mathcal{Z}_{\mathcal{A}_t}$. Then, the distributionally robust acceptable ask price is given by [...] and the distributionally robust acceptable bid price is given by [...]

Proof. [...] Then, the constraints in (PP) can be written in the form [...] Since all densities $f_t$ are bounded by assumption,⁷ Lemma 2 holds true if we replace $Z_t \in \mathcal{Z}_t$ by $d_t \in \mathcal{D}_t$. It can easily be seen that for each t there are sequences $(d_{t,1}, d_{t,2}, \ldots)$ which are dense in $\mathcal{D}_t$. Let us define [...]

⁷ It would be sufficient to assume $\mathcal{Z}_{\mathcal{A}_t} \subseteq L_s$ and $f_t \in L_r$ such that $\frac{1}{r} + \frac{1}{s} = \frac{1}{q}$. However, for simplicity, we keep $\mathcal{Z}_{\mathcal{A}_t} \subseteq L_q$ and assume $f_t \in L_\infty$.

Then, it holds that $\mathcal{D}_t^n \subseteq \mathcal{D}_t^{n+1}$ and $\bigcup_n \mathcal{D}_t^n = \mathcal{D}_t$. Thus, by Lemma 2 we may approximate (PP) by a problem of the form [...] Rearranging its Lagrangian leads to the following representation of ($PP_n$): [...] where $W_t^n :=$ [...] This is a finite-dimensional bilinear problem. Notice that ($PP_n$) is always feasible. We may thus interchange the inf and the sup. Carrying out explicitly the minimization in x, the unconstrained minimax problem (14) can be written as the constrained maximization problem [...]

[...] the problem can be rewritten in terms of Q in the form [...]
It is left to show that there is no duality gap in the limit, as $n \to \infty$. Assume that the dual problem (DD) has an optimal value $\tilde{\pi}_a \neq \pi_a$. By the primal constraints in (PP), for any dual feasible solution Q it holds [...] Thus, the optimal primal value $\pi_a$ is also greater than or equal to the optimal dual value $\tilde{\pi}_a$. Now assume $\tilde{\pi}_a < \pi_a$. Then, since $\pi_a^n \uparrow \pi_a$ by Lemma 2, there must exist some n such that $\pi_a^n > \tilde{\pi}_a$. Moreover, there exists some $Q^n$, which is dual feasible and such that $\mathbb{E}_{Q^n}\big[\sum_{t=1}^{T} C_t\big] = \pi_a^n$. This is a contradiction to $\tilde{\pi}_a$ being the limit of the monotonically increasing sequence of optimal values of the approximate dual problems of the form ($DD_n$). Hence, $\tilde{\pi}_a = \pi_a$, i.e., it is shown that there is no duality gap in the limit.

Finally, considering the structure of $\mathcal{D}_t$, the condition $\frac{dQ}{dP_0}\big|_{\mathcal{F}_t} \in \mathcal{D}_t$ means that it is of the form $Z_t f_t$, where there exists some $P \in \mathcal{P}_\varepsilon$ such that $Z_t \in \mathcal{Z}_{\mathcal{A}_t^{P}}$ and $\frac{dP}{dP_0}\big|_{\mathcal{F}_t} = f_t$. This completes the derivation of the dual problem formulation (DD).

Nested distance balls as ambiguity sets: a large deviations result
In order to find appropriate nonparametric distances for probability models used in the framework of stochastic optimization, one has to observe that a minimal requirement is that the distance metricizes weak convergence and allows for convergence of empirical distributions. The Kantorovich-Wasserstein distance does metricize the weak topology on the family of probability measures having a first moment. Its multistage generalization, the nested distance, measures the distance between stochastic processes on filtered probability spaces. The Appendix contains the definition and interpretation of both the Kantorovich-Wasserstein distance and the nested distance. Realistic probability models must be based on observed data. While for single- or vector-valued random variables with finite expectation the empirical distribution based on an i.i.d. sample converges in Kantorovich-Wasserstein distance to the underlying probability measure, the situation is more involved for stochastic processes. The simple empirical distribution for stochastic processes does not converge in nested distance (cf. Pflug and Pichler [32]), but a smoothed version involving density estimates does.
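As a small illustration of the single-stage case, the Kantorovich-Wasserstein distance of order 1 between two empirical distributions on the real line with equally many, equally weighted atoms can be computed by matching sorted samples. This is a sketch for that special case only; the function name is illustrative and the general (multivariate or unequal-weight) case requires solving a transportation problem.

```python
def wasserstein1_equal_samples(xs, ys):
    """Order-1 Kantorovich-Wasserstein distance between two empirical
    distributions on R with the same number of equally weighted atoms.

    In one dimension the monotone (sorted) matching is optimal, so the
    distance is the mean absolute difference of the order statistics.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Shifting every atom by one unit moves the distribution by distance one.
print(wasserstein1_equal_samples([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
```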
As we show here by merging the concepts of kernel estimations and transportation distances, one may get good estimates for confidence balls and ambiguity sets under some assumptions on regularity.
Let P be the distribution of the stochastic process $\xi = (\xi_1, \ldots, \xi_T)$ with values $\xi_t \in \mathbb{R}^m$. Notice that P is a distribution on $\mathbb{R}^\ell$ with $\ell = m \cdot T$. Let $P^n$ be the probability measure of n independent samples from P. If $\xi^{(j)} = (\xi_1^{(j)}, \ldots, \xi_T^{(j)})$, j = 1, ..., n, is such a sample, then the empirical distribution $\hat{P}_n$ puts the weight 1/n on each of the paths $\xi^{(j)}$. For the construction of nested ambiguity balls, the empirical distribution has to be smoothed by convolution with a kernel function k(x) for $x \in \mathbb{R}^\ell$. For a bandwidth h > 0 to be specified later, let $k_h(x) := h^{-\ell} k(x/h)$. In what follows we will work with the kernel density estimate $\hat{f}_n = \hat{P}_n * k_h$, where * denotes convolution.
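The smoothing step can be sketched in one dimension as follows. The triangular kernel is chosen here only because it vanishes outside the unit ball and is Lipschitz; the kernel, the function names and the sample values are assumptions of this sketch, and the paper works in dimension m · T rather than on the real line.

```python
def triangular_kernel(u):
    """Kernel on R: 1 - |u| on [-1, 1] and zero outside; Lipschitz with constant 1."""
    return max(0.0, 1.0 - abs(u))


def kernel_density_estimate(sample, h):
    """Return the function x -> (1/n) * sum_j k_h(x - xi_j), with k_h(x) = k(x/h) / h.

    This is the convolution of the empirical distribution of `sample`
    with the rescaled kernel k_h (one-dimensional illustration).
    """
    if h <= 0.0 or not sample:
        raise ValueError("need a positive bandwidth and a non-empty sample")
    n = len(sample)

    def f_hat(x):
        return sum(triangular_kernel((x - xi) / h) for xi in sample) / (n * h)

    return f_hat


# Hypothetical sample of three observations and bandwidth h = 0.1.
f_hat = kernel_density_estimate([0.2, 0.5, 0.9], 0.1)
step = 0.001
total_mass = sum(f_hat(-0.5 + step * i) * step for i in range(2001))
print(f_hat(0.5), total_mass)
```

The Riemann sum in the usage example is only a sanity check that the estimate integrates to approximately one.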

Assumption A4.
1. The support of P is a set $D = D_1 \times \cdots \times D_T$, where $D_i$ are compact sets in $\mathbb{R}^m$;
2. P has a Lebesgue density f, which is Lipschitz on D with constant L;
3. f is bounded from below and from above on D by $0 < \underline{c} \le f(x) \le \overline{c}$;
4. the kernel function k vanishes outside the unit ball and is Lipschitz with constant L;
Remark 2. The proof of Theorem 3 below relies on the lower bound $\underline{c}$ of the density. As the denominator of the conditional density f(x|y) = f(x, y)/f(y) has to be estimated by density estimation as well, the bound ensures that the denominator does not vanish. In fact, the assumptions on the compact cube (point 1.) can be weakened to D being a compact set; the proof, however, is slightly more involved in that case. For the other technical assumptions (under point 5.) we may refer to Mirkov and Pflug [22].
Theorem 3 (Large deviation for the nested distance). Under Assumption A4 there exists a constant K > 0 such that for n sufficiently large and appropriately chosen bandwidth h. Here, dI denotes the nested distance.
The proof of (16) is based on several steps presented as propositions below. To start with we recall two important results for density estimates $\hat{f}_n = \hat{P}_n * k_h$ for densities f on $\mathbb{R}^\ell$.

Proposition 3. Under the Lipschitz conditions for f and k given above, it holds that [...] if the bandwidth is chosen as h = ε/(2L).

Proposition 4. Let f and g be densities vanishing outside a compact set D and set $P^f(A) = \int_A f(x)\,dx$ resp. $P^g(A) = \int_A g(x)\,dx$. Then their Wasserstein distance d is bounded by [...] Here Δ is the diameter of D and λ(D) is the Lebesgue measure of D.
The next result extends the previous one to conditional densities.

Proposition 5. Let f and g be bivariate densities on compact sets $\tilde{D}_1 \times \tilde{D}_2$ bounded by $0 < \underline{c} \le f, g \le \overline{c} < \infty$ which are sufficiently close so that [...] Then there is a universal constant $\kappa_1$, depending on the set $\tilde{D} := \tilde{D}_1 \times \tilde{D}_2$ only, so that the conditional densities are close as well, i.e., they satisfy [...] for all $x \in \tilde{D}_1$ and $y \in \tilde{D}_2$, i.e., [...]

Proof. To abbreviate the notation set $\varepsilon := \sup_{x,y} |f(x, y) - g(x, y)|$ and note that $\varepsilon \le \underline{c}\,\lambda(\tilde{D})\,[2\Delta^{\ell}]^{-1}$. Consider the marginal density $f(y) := \int_{\tilde{D}_1} f(x, y)\,dx$ ($g(y) := \int_{\tilde{D}_1} g(x, y)\,dx$, resp.). It holds that [...] Clearly $|f(y)| \ge \underline{c}\,\lambda(\tilde{D}_1)$, where $\lambda(\tilde{D}_1)$ is the Lebesgue measure of $\tilde{D}_1$ and therefore [...] The elementary inequality $\frac{1}{1+x} \le 1 + 2|x|$ is valid for $x \ge -\frac{1}{2}$. With (20) it follows that [...] The assertion of the proposition finally follows by exchanging the roles of the densities f and g.
Theorem 4. Given Assumption A4 there exists a constant $\kappa_2$ such that [...] for all ε > 0 and n sufficiently large.

Proof. It follows from (18) and (19) that [...] Recall the large deviation result from [2, Th. 2.8], which is given by

$$P^n\big(d(\hat{P}_n, P) > \eta\big) \le \exp(-n \kappa \eta^2),$$

for some universal constant κ depending on the Lipschitz constants of f and k only.

We employ the results elaborated above for $\tilde{P} := \hat{P}_n * k_h$. Then [...]

We employ (21) to deduce that [...] The desired large deviation result follows for n sufficiently large for any [...]

The smoothed model $\hat{P}_n * k_h$ is not yet a tree, but by Theorem 6 of the Appendix one may find a finite tree process $\bar{P}_n$, which is arbitrarily close to it. Therefore, by possibly increasing the probability bound in (16) by another constant factor, it holds true also for $\bar{P}_n$.
Remark 3. From a statistical perspective, the results contained in this section represent a strong motivation to use nested distance balls as ambiguity sets for general stochastic optimization problems on scenario trees constructed from observed data. In particular, the distributionally robust acceptable ask price allows the seller of a claim to invest in a trading strategy which gives an acceptable superhedge of the payments to be made under the true model with arbitrarily high probability, given sufficient available data.
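To see how radius, confidence level and sample size interact, the exponential bound exp(−nκη²) recalled in the proof of Theorem 4 can be inverted for n. The sketch below does this for the Wasserstein bound only; the constant κ is not known in practice, so the value used here is purely hypothetical, and the function names are illustrative.

```python
import math


def tail_bound(n, eta, kappa):
    """Upper bound exp(-n * kappa * eta^2) on the probability that the
    distance between the empirical and the true model exceeds eta."""
    return math.exp(-n * kappa * eta ** 2)


def samples_needed(eta, delta, kappa):
    """Smallest sample size n for which the bound is at most delta."""
    if not (eta > 0.0 and 0.0 < delta < 1.0 and kappa > 0.0):
        raise ValueError("need eta > 0, 0 < delta < 1 and kappa > 0")
    return math.ceil(math.log(1.0 / delta) / (kappa * eta ** 2))


# Hypothetical constant kappa = 1: radius 0.1 at confidence level 95%.
n = samples_needed(eta=0.1, delta=0.05, kappa=1.0)
print(n, tail_bound(n, 0.1, 1.0))
```

Halving the radius η quadruples the required sample size, which is the quantitative content of the phrase "given sufficient available data".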

Illustrative examples
One may summarize the results of the previous sections in the following way: If the martingale measure is not unique ('incomplete market'), then typically there is a positive bid-ask spread in the (pointwise) replication model. This spread does also exist in the acceptability model. However, if the acceptability functional is the $\mathrm{AV@R}_\alpha$, then by changing α we can get the complete range between the replication model (α → 0) and the expectation model (α = 1). At least in the latter case, but possibly even for some α < 1, there is no bid-ask spread and thus a unique price. On the other hand, model ambiguity widens the bid-ask spread: The more models are considered, i.e., the larger the radius of the ambiguity set, the wider is the bid-ask spread. For illustrative purposes, let us look at the simplest form of examples which demonstrate these effects.

Since infinitely many equivalent martingale measures can be constructed on this tree, there is a considerable bid-ask spread for the pointwise replication model, which corresponds to the $\mathrm{AV@R}_\alpha$-acceptability pricing model with α = 0. However, by increasing α for both contract sides, the bid-ask spread gets monotonically smaller. For α = 1, there is no bid-ask spread, since all martingale measures coincide in their expectation and both buyer and seller only consider expectation in their valuation. Figure 1a visualizes this behavior for the price of a call option struck at 95%: the bid price increases with α, while the ask price decreases. For α = 1 they coincide.
Computationally, AV@R-acceptability pricing on scenario trees boils down to solving a linear program (LP). It is thus straightforward to implement and the problem scales with the complexity of LPs.

This tree can carry only one single martingale measure. In such a model, the change of acceptability levels does not change the price, since also under weakened acceptability the price is determined by a martingale measure, namely the unique one (in case α is small enough such that it is feasible). However, in an ambiguity situation, a bid-ask spread may appear, since there are typically many martingale measures contained in ambiguity sets. We consider nested distance balls around the baseline tree, where we keep the uniform distribution of the scenarios for simplicity, but allow the values of the process to change.

Figure 1. (a) Acceptability: The bid-ask spread tightens for increasing acceptability.

The result for a call option struck at 95% can be seen in Figure 1b. While there is a unique price for small radii ε of the nested distance ball, an increasing bid-ask spread appears for larger values of ε.

Algorithmic solution
The nested distance between two given scenario trees can be obtained by solving an LP. However, the distributionally robust AV@R-acceptability pricing problem w.r.t. nested distance balls as ambiguity sets results in a highly non-linear, in general non-convex problem. Therefore, we assume the tree structure to be given by the baseline model. In particular, it is assumed that different probability models within the ambiguity set differ only in terms of the transition probabilities; state values and the information structure are kept fixed.
Still, distributionally robust acceptability pricing is a semi-infinite non-convex problem. The only algorithmic approach available in the literature for similar problems is based on the idea of successive programming (cf. [31,Chap. 7.3.3]): an approximate solution is computed by starting with the baseline model only and alternately adding worst case models and finding optimal solutions. However, for typical instances of tree models this is computationally hard, as it involves the solution of a non-convex problem in each iteration step.
Hence, we tackle the dual formulation presented in Theorem 2. The structure of the nested distance enables an iterative approach. Algorithm 1 finds an approximate solution by solving a sequence of linear programs. Based on duality considerations and algorithmic exploitation of the specific stagewise transportation structure inherent to the nested distance, the algorithm approximates the solution of a semi-infinite non-convex problem by a sequence of LPs. The current state-of-the-art method, on the other hand, requires the solution of a non-convex program in each iteration step. Clearly, a sequential linear programming approach improves the performance considerably. Moreover, our algorithm turned out to find feasible solutions in many cases where our implementation of a successive programming method fails to do so.
Let us extend the concept of the nested distance to subtrees, iteratively from the leaves to the root ('top-down'). For two scenario trees (here with identical filtration structures), define $\mathrm{dI}_T(i, j)$ as the distance of the paths leading to the leaf nodes $i, j \in N_T$. Moreover, define
$$\mathrm{dI}_t(k, l) := \sum_{i \in k+} \sum_{j \in l+} \pi(i, j \mid k, l)\, \mathrm{dI}_{t+1}(i, j)$$
for all nodes $k, l \in N_t$, where $0 \le t < T$. Then the nested distance between the two trees is given by $\mathrm{dI}_0(1, 1)$. This stagewise backwards approach (cf. [31, Alg. 2.1]) is the basic idea of Algorithm 1. As we assume the tree structure to be fixed, Algorithm 1 iterates through the tree in the same top-down manner and searches for the optimal solution in each stage, while ensuring that the nested distance constraint remains satisfied. The variables are the conditional transition probabilities under Q, i.e., $q_i := Q[i \mid i-]$, as well as the transportation subplans $\pi(i, j \mid i-, j-)$, as defined in the Appendix. We use the notation $n-$ for the immediate predecessor of some node n. The measure P is not needed explicitly, since it is given by the transportation plan from the baseline model; condition (4.3) in Algorithm 1 ensures that it is still well-defined implicitly (note that some node of $N_{t-1}$ always needs to be fixed). Condition (1) ensures that Q is a martingale measure, condition (2) ensures that Q represents conditional probabilities, condition (3) corresponds to the constraint on the measure change ($dQ/dP \le 1/\alpha$) resulting from the primal AV@R$_\alpha$-acceptability conditions, and conditions (4.1)-(4.3) represent the constraint that there must be one P contained in the nested distance ball such that condition (3) holds.
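The backward recursion for dI_t can be illustrated with a small sketch. It evaluates the recursion for given conditional subplans on two trees with identical structure; the result is the transportation cost of that particular plan and hence only an upper bound for the nested distance, which additionally requires minimizing over all feasible subplans. The node numbering, the l1 path distance and all data below are assumptions made for illustration.

```python
from itertools import product


def path_distance(x, y):
    """l1 distance between two paths (an assumed choice of path metric)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def evaluate_plan(k, l, leaf_dist, subplans):
    """Evaluate dI_t(k, l) recursively for fixed conditional subplans."""
    if (k, l) in leaf_dist:  # pair of leaf nodes: distance of the two paths
        return leaf_dist[(k, l)]
    # Weighted sum over pairs of immediate successors (i, j) of (k, l).
    return sum(w * evaluate_plan(i, j, leaf_dist, subplans)
               for (i, j), w in subplans[(k, l)].items() if w > 0.0)


# Hypothetical binary trees with T = 2: root 1, stage-1 nodes 2 and 3,
# leaves 4..7; each leaf is identified with the path leading to it.
paths_a = {4: (100, 110, 120), 5: (100, 110, 100),
           6: (100, 90, 100), 7: (100, 90, 80)}
paths_b = {4: (100, 108, 118), 5: (100, 108, 102),
           6: (100, 92, 98), 7: (100, 92, 82)}
leaf_dist = {(i, j): path_distance(paths_a[i], paths_b[j])
             for i, j in product(paths_a, paths_b)}

# Conditional subplans pi(i, j | k, l) that couple corresponding nodes.
subplans = {
    (1, 1): {(2, 2): 0.5, (3, 3): 0.5},
    (2, 2): {(4, 4): 0.5, (5, 5): 0.5},
    (3, 3): {(6, 6): 0.5, (7, 7): 0.5},
}
value = evaluate_plan(1, 1, leaf_dist, subplans)
```

In the setting of Algorithm 1 the subplans are decision variables rather than fixed inputs, so this evaluation step would sit inside the stagewise optimization.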
The algorithm optimizes the variables stagewise top-down. The optimal solution at stage t + 1 depends on the values of the variables for all stages up to stage t, which result from the previous iteration step. Therefore, the algorithm iterates as long as there is further improvement possible at some stage, given updated variable values for the earlier stages of the tree. Otherwise, it terminates and the optimal solution of our approximate problem is found.

Algorithm 1 Acceptability pricing under model ambiguity.
Start with some feasible model P in the nested distance ball around the baseline model. Initialize $\pi^{\mathrm{old}}$ by assigning the optimal transportation plan between P and the baseline model, and initialize 'oldprice'.
for t from T to 1 do
    solve (1) …
price ← $\mathbb{E}_Q[\sum_{t=1}^{T} C_t]$, construct transportation plan π(·, ·) from subplans π(·, ·|·, ·)
return [price, π]
end function

Example 3. Consider the price of a plain vanilla call option struck at 95, in the Black-Scholes model with parameters $S_0 = 100$, $r = 0.01$, $\sigma = 0.2$, $T = 1$. Applying optimal quantization techniques (see, e.g., [31, Chap. 4] for an overview) to discretize the lognormal distribution, we construct a scenario tree with 500 nodes. While there exists a unique martingale measure (and thus a unique option price) in the Black-Scholes model, the discrete approximation allows for several martingale measures (and thus a positive bid-ask spread). Figure 2 visualizes the bid-ask spread as a function of the AV@R-acceptability level α and the radius ε of the nested distance ball used as model ambiguity set. For α → 1 and ε = 0, the spread closes and the resulting price approximates the true Black-Scholes price up to 4 digits. For illustrative purposes, the spread between the bid and the ask price surface is shown from two perspectives.
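For reference, the Black-Scholes benchmark in Example 3 can be reproduced with the standard closed-form call price. The sketch below assumes the usual formula with continuous compounding and the parameters stated in the example; the function names are illustrative and do not come from the paper.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(s0, strike, r, sigma, maturity):
    """Closed-form Black-Scholes price of a European call option."""
    vol = sigma * sqrt(maturity)
    d1 = (log(s0 / strike) + (r + 0.5 * sigma ** 2) * maturity) / vol
    d2 = d1 - vol
    return s0 * norm_cdf(d1) - strike * exp(-r * maturity) * norm_cdf(d2)


# Parameters of Example 3: S_0 = 100, strike 95, r = 0.01, sigma = 0.2, T = 1.
reference_price = bs_call(100.0, 95.0, 0.01, 0.2, 1.0)
```

This value is the limit that the tree-based bid and ask prices should approach for α → 1 and ε = 0.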

Conclusion
In this paper we extended the usual methods for contingent claim pricing in two directions. First, we replaced the replication constraint by a more realistic acceptability constraint. By doing so, the claim price depends explicitly on the stochastic model for the price dynamics of the underlying (and not just on its null sets). If the model is based on observed data, then the calculation of the claim price can be seen as a statistical estimate. Therefore, as a second extension, we introduced model ambiguity into the acceptability pricing framework and we derived the dual problem formulations in the extended setting. Moreover, we used the nested distance for stochastic processes to define a confidence set for the underlying price model. In this way, we link acceptability prices of a claim to the quality of observed data. In particular, the size of the confidence region decreases with the sample size, i.e., the number of observed independent paths of the stochastic process of the underlying. For a given sample of observations, the ambiguity radius indicates how much the baseline ask/bid price should be corrected to safeguard the seller/buyer of a claim against the inherent statistical model risk, as Section 5 illustrates.

Figure 2: The bid-ask spread as a function of acceptability and ambiguity.
Finite scenario trees are much easier to work with than general stochastic processes. For finite trees, where every node m has a unique predecessor, we write $m+$ for the set of its immediate successors. Denote by $N_t$ the set of all nodes at stage t of the tree model P. For a node $i \in m+$ let $P[i \mid m]$ be the conditional transition probability from m to i.
The matrix π of transportation plans and the matrix D carrying the pairwise distances of the paths are defined on $N_T \times \tilde{N}_T$. The conditional joint probabilities $\pi(i, j \mid k, l)$ in (22) are given by
$$\pi(i, j \mid k, l) = \pi_{i,j} \cdot \Bigl[ \sum_{i' \in k+} \sum_{j' \in l+} \pi_{i',j'} \Bigr]^{-1}.$$
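The normalization above can be sketched in a few lines. The example assumes that the joint masses are stored in a dictionary keyed by node pairs and that the successor sets of k and l are given as lists; both the representation and the numbers are hypothetical.

```python
def conditional_plan(pi, succ_k, succ_l):
    """Return pi(i, j | k, l) for i in k+ and j in l+ by normalizing the
    joint masses pi[(i, j)] over the successor pairs of (k, l)."""
    total = sum(pi.get((i, j), 0.0) for i in succ_k for j in succ_l)
    if total <= 0.0:
        raise ValueError("the node pair (k, l) carries no transported mass")
    return {(i, j): pi.get((i, j), 0.0) / total
            for i in succ_k for j in succ_l}


# Hypothetical joint plan on the successors of two node pairs.
joint = {(4, 4): 0.2, (4, 5): 0.05, (5, 4): 0.05, (5, 5): 0.2,
         (6, 6): 0.25, (7, 7): 0.25}
cond = conditional_plan(joint, [4, 5], [4, 5])
```

Each conditional subplan sums to one by construction, so it is a probability distribution on the successor pairs of the node pair (k, l).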
Approximation of random processes by finite trees. The subsequent result follows from [31,Prop. 4.26].
Theorem 6. If the stochastic process $\xi = (\xi_1, \ldots, \xi_T)$ satisfies the Lipschitz condition given in Assumption A4.5 in Section 3.2, then for every ε > 0 there is a stochastic process, defined on a finite tree, whose distribution has nested distance at most ε from P, where P is the distribution of ξ on the filtered space (Ω, F).