The distortion principle for insurance pricing: properties, identification and robustness

Distortion (Denneberg 1990) is a well known premium calculation principle for insurance contracts. In this paper, we study sensitivity properties of distortion functionals w.r.t. the assumptions for risk aversion as well as robustness w.r.t. ambiguity of the loss distribution. Ambiguity is measured by the Wasserstein distance. We study variances of distances for probability models and identify some worst case distributions. In addition to the direct problem we also investigate the inverse problem, that is how to identify the distortion density on the basis of observations of insurance premia.


Introduction
The function of the insurance business is to carry the risk of a loss of the customer for a fixed amount, called the premium. The premium has to be larger than the expected loss, otherwise the insurance company faces ruin with probability one. The difference between the premium and the expectation is called the risk premium. There are several principles, from which an insurance premium is calculated on the basis of the loss distribution.
Let X be a (non-negative) random loss variable. Traditionally, an insurance premium is a functional $\pi : \{X \ge 0 \text{ defined on } (\Omega, \mathcal{F}, P)\} \to \mathbb{R}_{\ge 0}$. We work with functionals that depend only on the distribution of the loss random variable (a property sometimes called law-invariance or version-independence, Young 2014 [33]). If X has distribution function F, we write π(F) for the pertaining insurance premium and E(F) for the expectation of F. We use the notations π(F) and π(X), resp. E(F) and E(X), interchangeably, whichever is more convenient. Throughout the paper, a more specific notation is used for particular cases of the premium.
The distortion principle. The distortion principle is related to the idea of stress testing. The original distribution function F is modified (distorted) and the premium is the expectation of the modified distribution. If $g : [0, 1] \to \mathbb{R}$ is a concave, monotonically increasing function with $g(0) = 0$ and $g(1) = 1$, then the distorted distribution $F_g$ is given by
$$F_g(x) = 1 - g(1 - F(x)).$$
The function g is called the distortion function and
$$h(v) = g'(1 - v),$$
with $g'$ being the derivative of g, is the distortion density. Notice that h is a density on [0, 1]. We denote by $H(u) = \int_0^u h(v)\,dv$ the distortion distribution. Since the assumptions imply that $g(x) \ge x$ for $0 \le x \le 1$, we have $F_g \le F$, i.e. $F_g$ is first order stochastically larger than F. The distortion premium is the expectation of $F_g$,
$$\pi_h(F) = \int_0^\infty g(1 - F(x))\,dx \ge \int_0^\infty (1 - F(x))\,dx = E(X).$$
By a simple integral transform, one sees that the premium can equivalently be written as
$$\pi_h(F) = \int_0^1 \mathrm{V@R}_v(F)\,h(v)\,dv,$$
where $\mathrm{V@R}_v(F) = F^{-1}(v)$ is the quantile function. A functional of this form is called an L-estimate (Huber 2011 [11]). If the random variable X also takes negative values, the premium can be defined more generally as a Choquet integral,
$$\pi_h(F) = \int_{-\infty}^0 \big[g(1 - F(x)) - 1\big]\,dx + \int_0^\infty g(1 - F(x))\,dx.$$
In principle, any distortion function g which is monotonic and satisfies $g(u) \ge u$ is a valid basis for a distortion premium. However, the concavity of g guarantees that the pertaining distortion density h is increasing, which, in insurance applications, reflects the fact that putting aside risk capital gets more expensive for higher quantiles of the risk distribution. Nondecreasing distortion functions lead to non-negative distortion densities, with the consequence that
$$\pi_h(F_1) \le \pi_h(F_2)$$
whenever $F_2$ is stochastically larger than $F_1$.
Relaxing the monotonicity assumption for g would violate in general the monotonicity w.r.t. first stochastic order.
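For an empirical distribution the quantile representation of the premium reduces to a weighted sum of order statistics, which the following minimal sketch illustrates. The function names and the choice $H(u) = u^2$ are illustrative assumptions and are not taken from the paper; the only ingredient used is the distortion distribution H.

```python
from typing import Callable, Sequence


def distortion_premium(sample: Sequence[float], H: Callable[[float], float]) -> float:
    """Distortion premium of the empirical distribution of `sample`.

    The empirical quantile function equals the i-th order statistic on
    ((i-1)/n, i/n], so the integral of F^{-1}(v) h(v) over [0, 1] becomes a
    weighted sum with weights H(i/n) - H((i-1)/n).
    """
    xs = sorted(sample)
    n = len(xs)
    return sum(x * (H(i / n) - H((i - 1) / n)) for i, x in enumerate(xs, start=1))


def H_identity(u: float) -> float:
    """Distortion distribution for h = 1 (no distortion)."""
    return u


def H_square(u: float) -> float:
    """Illustrative convex choice H(u) = u^2, i.e. h(v) = 2v (increasing)."""
    return u * u


losses = [1.0, 4.0, 2.0, 3.0]
expected_loss = distortion_premium(losses, H_identity)  # plain sample mean
loaded_premium = distortion_premium(losses, H_square)   # upper quantiles weigh more
print(expected_loss, loaded_premium)
```

Because the weights increase with the rank of the observation, the distorted premium cannot fall below the sample mean, in line with the inequality $\pi_h(F) \ge E(X)$.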
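Examples of distortion functions. Widely used distortion functions g, resp. the pertaining distortion densities h, are
- the power distortion with exponent s. If 0 < s < 1,
$$g^{(s)}(v) = v^s, \qquad h^{(s)}(v) = s\,(1 - v)^{s-1}. \tag{3}$$
The premium is known as the proportional hazard transform (Wang 1995 [27]) and is calculated as
$$\pi_{h^{(s)}}(F) = \int_0^\infty (1 - F(x))^s\,dx.$$
If s ≥ 1, then we take
$$g^{(s)}(v) = 1 - (1 - v)^s, \qquad h^{(s)}(v) = s\,v^{s-1}. \tag{5}$$
The premium is
$$\pi_{h^{(s)}}(F) = \int_0^\infty \big(1 - F(x)^s\big)\,dx.$$
For integer exponents, the premium has a special representation.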
Proposition 1 Let $X^{(i)}$, $i = 1, \dots, s$, be independent copies of the random variable X. Then the power distortion premium with integer power s has the representation $\pi_{h^{(s)}}(X) = E\big[\max\{X^{(1)}, \dots, X^{(s)}\}\big]$.
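Proof Let F be the distribution of X. The power distortion premium for integer power s is computed with $g^{(s)}$ in (5), and by definition
$$\pi_{h^{(s)}}(F) = \int_0^\infty \big(1 - F(x)^s\big)\,dx.$$
The assertion follows from the fact that the distribution function of the random variable $\max\{X^{(1)}, \dots, X^{(s)}\}$ is $F(x)^s$.
Finally, notice that the distortion density is bounded for s ≥ 1, but unbounded for 0 < s < 1.
- the Wang distortion or Wang transform (Wang 2000 [26]),
$$g(v) = \Phi\big(\Phi^{-1}(v) + \lambda\big), \qquad h(v) = \frac{\varphi\big(\Phi^{-1}(1 - v) + \lambda\big)}{\varphi\big(\Phi^{-1}(1 - v)\big)}, \qquad \lambda > 0,$$
where Φ is the standard normal distribution function and φ its density.
- the AV@R (average value-at-risk) distortion function and density,
$$g_\alpha(v) = \min\Big\{\frac{v}{1 - \alpha},\, 1\Big\}, \qquad h_\alpha(v) = \frac{1}{1 - \alpha}\,\mathbb{1}_{[\alpha, 1]}(v), \tag{7}$$
where 0 ≤ α < 1. The pertaining premium has different names, such as conditional tail expectation (CTE), CV@R (conditional value at risk) or ES (expected shortfall) (Embrechts et al. 1997 [4]). The premium is
$$\mathrm{AV@R}_\alpha(F) = \frac{1}{1 - \alpha}\int_\alpha^1 F^{-1}(v)\,dv.$$
- piecewise constant distortion densities. The insurance industry also uses piecewise constant increasing distortion functions. For example, the following distortion function is used by a large reinsurer.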
For more examples of different choices of h and of different families of distributions, see Wang 1996 [28] and Furman and Zitikis 2008 [6].
Certainty Equivalence. Let V be a convex, strictly monotonic disutility function. (The original notion of a utility function introduced by von Neumann and Morgenstern was a concave monotonic U, such that the decision maker maximizes the expectation E(U(Y)) of a profit variable Y. A disutility function can be defined from a utility function by setting $V(x) = -U(-x)$.) The certainty equivalence premium is the solution of
$$V(\pi) = E[V(X)],$$
i.e. it is obtained by equating the disutility of the premium and the expected disutility of the loss. The premium is written as
$$\pi_V(F) = V^{-1}\big(E[V(X)]\big).$$
By Jensen's inequality, $\pi_V(F) \ge E(F)$. Examples for disutilities V are the power utility $V(x) = x^s$ for s ≥ 1 or the exponential utility $V(x) = \exp(x)$. Related to this premium, one could consider just the expected value and compute the expected disutility (Borch 1961 [2]), obtaining
$$\pi(F) = E(V(X)). \tag{9}$$
For generalizations of the CEQ premium see Vinel and Krokhmal 2017 [24].
The ambiguity principle. Let $\mathcal{F}$ be a family of distributions which contains the "most probable" loss distribution F. The ambiguity insurance premium is
$$\pi^{\mathcal{F}}(F) = \sup\{E(G) : G \in \mathcal{F}\},$$
and $\mathcal{F}$ is called the ambiguity set. In an alternative, but equivalent, notation the ambiguity premium is given by
$$\pi^{\mathcal{Q}}(X) = \sup\{E_Q(X) : Q \in \mathcal{Q}\},$$
where $\mathcal{Q}$ is a family of probability models containing the baseline model P.
Remark 1 In their seminal paper from 1989, Gilboa and Schmeidler [7] give an axiomatic approach to extended utility functionals of the form
$$\inf\{E_Q[U(Y)] : Q \in \mathcal{Q}\},$$
where U is a utility function and Y is a profit variable. For the insurance case, U should be replaced by a disutility function V and Y by a loss variable X, leading to the equivalent expression
$$\sup\{E_Q[V(X)] : Q \in \mathcal{Q}\}.$$
The link to (10) is obvious: the expression can be seen as a combination of expected disutility (9) and ambiguity.
Remark 2 Recall that the fundamental pricing formula for derivatives in financial markets states that the price can be obtained by taking the maximum of the discounted expected payoffs, where the maximum is taken over all probability measures which make the discounted price of the underlying a martingale. This can be seen as an ambiguity price.
The ambiguity premium is characterized by the choice of the ambiguity set $\mathcal{F}$. In principle, this set can be chosen arbitrarily as long as it contains F. Convex premium functionals have a dual representation, which is also of the form of an ambiguity functional. For distortion functionals, this will be illustrated in the next section. Other important examples of ambiguity premium prices can be defined through distances for probability distributions. Let D be such a distance; then an ambiguity set is given by
$$B_\varepsilon(F) = \{G : D(G, F) \le \varepsilon\}.$$
We call ε the ambiguity radius. This radius quantifies not only the risk premium, but also the model uncertainty, since the real distribution is typically not exactly known and all we have is a baseline model F. In Section 6 we base ambiguity models on the Wasserstein distance WD.
Combined models. Luan 2001 [15] introduced a combination of the distortion and the certainty equivalence premium by defining a variable W distributed according to $F_g$ and setting
$$\pi(F) = V^{-1}\big(E[V(W)]\big), \qquad W \sim F_g.$$
More generally, one may also add ambiguity with respect to the model and set
$$\pi(F) = \sup\big\{V^{-1}\big(E[V(W)]\big) : W \sim G_g,\ G \in \mathcal{F}\big\}. \tag{11}$$
Notice that (11) contains all previous definitions as special cases, obtained by making some of the following parameter settings: $g(v) = v$ (no distortion), $V(x) = x$ (no disutility), $\mathcal{F} = \{F\}$ (no ambiguity). If all parameters are set like that, we recover the expectation. We could also consider the expected disutility premium (9) and combine it with the distortion premium. Section 6 is dedicated to the study of the combination of distortion and ambiguity premium prices.
As to notation, we denote by $L_p$ the space of all random variables with finite p-norm $\|X\|_p = (E|X|^p)^{1/p}$ for $1 \le p < \infty$, resp. $\|X\|_\infty = \operatorname{ess\,sup}|X|$, the essential supremum. The same notation is used for any real valued function on [0, 1]. The exponents p and q are called conjugates if 1/p + 1/q = 1.

The distortion premium and generalizations
The characterization and representations of the distortion premium have been studied exhaustively. Among the most classic contributions we mention the dual theory of Yaari 1987 [32] and the axiomatic characterization of this premium developed in Wang et al. 1997 [29], where the power distortion for 0 < s < 1 is also characterized in a unique manner. A summary of other known representations and new generalizations of this premium is presented below. Recall that any mapping X → π(X) which is monotone, convex and fulfils translation equivariance is a risk measure. Furthermore, if π is also positively homogeneous, monotonic w.r.t. the first stochastic order and subadditive, then it is a coherent risk measure (Artzner et al. 1999 [1]). The distortion premium fulfils all these properties; therefore, by the Fenchel-Moreau-Rockafellar Theorem, it has a dual representation.
Theorem 1 (see Pflug 2006 [19]) The dual representation of the distortion premium with distortion density h is given by
$$\pi_h(X) = \sup\{E(X \cdot Z) : Z = h(U),\ U \text{ uniformly distributed on } [0, 1]\}.$$
Note that all admissible Z's in Theorem 1 are densities, since h ≥ 0 and E(h(U)) = 1. To put it differently, given X defined on (Ω, F, P), let $\mathcal{Q}$ be the set of all probability measures Q on (Ω, F) such that the density dQ/dP has distribution function H, the distortion distribution. Then
$$\pi_h(X) = \sup\{E_Q(X) : Q \in \mathcal{Q}\}.$$
Therefore, every distortion premium can also be seen as an ambiguity premium with $\mathcal{Q}$ as the ambiguity set.
Let us look in more detail at the special case of the AV@R premium. In this case, the dual representation specializes to
$$\mathrm{AV@R}_\alpha(X) = \sup\Big\{E(X \cdot Z) : 0 \le Z \le \frac{1}{1 - \alpha},\ E(Z) = 1\Big\}.$$
From this representation, we can see that the AV@R-distortion densities $h_\alpha$ are the extremes of the convex set of all distortion densities. This fact implies that any distortion premium can be represented as a mixture of AV@R's; such representations are called Kusuoka representations (Kusuoka 2001 [14], Jouini et al. 2006 [12]). Coherent risks have a Kusuoka representation of the form
$$\pi(F) = \sup_{K \in \mathcal{K}} \int_0^1 \mathrm{AV@R}_\alpha(F)\,dK(\alpha),$$
where $\mathcal{K}$ is a collection of probability measures on [0, 1]. In particular, for the distortion premium we have the following result (Pflug/Römisch 2007 [18]).

Theorem 2 Any distortion premium can be written as
$$\pi_h(F) = \int_0^1 \mathrm{AV@R}_\alpha(F)\,dK(\alpha).$$
The mixture distribution K is given by the way h is represented as a mixture of the AV@R-distortion densities, i.e.
$$h(v) = \int_0^v \frac{1}{1 - \alpha}\,dK(\alpha).$$
The pure $\mathrm{AV@R}_\beta$ is contained in this class by setting $K = \delta_\beta$, the Dirac measure at β. Moreover, the integral of the AV@R's is obtained for K(α) = α and is defined as
$$\int_0^1 \mathrm{AV@R}_\alpha(F)\,d\alpha,$$
if the integral exists.
Remark 3 Some other generalizations of the distortion premium were studied in Greselin and Zitikis 2018 [10], who consider a class of functionals defined through an integrable function ν(·, ·) and show that the Gini index and the Bonferroni index belong to this class. These generalizations lead to inequality measures instead of risk measures.
As a related generalization of the distortion premium one may consider for some convex and monotonic Lipschitz function ν and some non-negative function k on [0,1]. Clearly, R(X) is convex and monotonic, but in general is neither positively homogeneous nor translation equivariant unless ν is the identity (see Appendix for a proof). To our knowledge, functionals of the form (12) are not used in the insurance sector. For this and some other generalizations see the papers of Goovaerts [5]. To start, we recall the notion of the Wasserstein distance.
Definition 1 Let (Ω, d) be a metric space and $P$, $\tilde{P}$ be two Borel probability measures on it. Then the Wasserstein distance of order r ≥ 1 is defined as
$$WD_{r,d}(P, \tilde{P}) = \Big(\inf E\big[d(X, Y)^r\big]\Big)^{1/r}.$$
Here the infimum is over all joint distributions of the pair (X, Y) such that the marginal distributions are P resp. $\tilde{P}$, i.e. $X \sim P$, $Y \sim \tilde{P}$. For two distributions F and G on the real line endowed with the metric $d_1(x, y) = |x - y|$, this definition specializes to (see Vallender 1974 [23])
$$WD_{1,d_1}(F, G) = \int_{-\infty}^{\infty} |F(x) - G(x)|\,dx = \int_0^1 |F^{-1}(v) - G^{-1}(v)|\,dv.$$
Therefore, the Wasserstein distance is the (absolute) area between the distribution functions, which is also the (absolute) area between the inverse distributions. By a similar argument one may prove that the Wasserstein distance of order r ≥ 1 with the $d_1$ metric on the real line is
$$WD_{r,d_1}(F, G) = \Big(\int_0^1 |F^{-1}(v) - G^{-1}(v)|^r\,dv\Big)^{1/r}.$$
We now study continuity properties of the functional F → π_h(F).
Proposition 2 (Continuity for bounded distortion densities) Let F and G be two distributions on the real line and h a distortion density function. If both distributions have finite first moments and h is bounded, then
$$|\pi_h(F) - \pi_h(G)| \le \|h\|_\infty \cdot WD_{1,d_1}(F, G).$$
Proof See Pichler 2010 [20].
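On the real line, the quantile form of the Wasserstein distance is easy to evaluate for two samples of equal size, because the empirical quantile functions are step functions on the same grid. The sketch below relies on that equal-size assumption; the function name is illustrative.

```python
from typing import Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float], r: float = 1.0) -> float:
    """Wasserstein distance of order r (metric |x - y|) between two empirical
    distributions with the same number of atoms.

    Both quantile functions are constant on ((i-1)/n, i/n], so the integral of
    |F^{-1} - G^{-1}|^r is the average of |x_[i] - y_[i]|^r over the order statistics.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    if r < 1:
        raise ValueError("order r must be at least 1")
    a, b = sorted(xs), sorted(ys)
    return (sum(abs(u - v) ** r for u, v in zip(a, b)) / len(a)) ** (1.0 / r)


shifted = wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0])  # a pure shift by 1
tail_w1 = wasserstein_1d([0.0, 0.0], [0.0, 2.0], r=1)
tail_w2 = wasserstein_1d([0.0, 0.0], [0.0, 2.0], r=2)
print(shifted, tail_w1, tail_w2)
```

The second pair of calls shows that the distance grows with the order r when the difference is concentrated in the tail.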

Remark 4
The boundedness of h is ensured if g has a finite right-hand derivative at 0, and also if g has a finite Lipschitz constant L, since $\|h\|_\infty \le L$.
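The bound of Proposition 2 can be checked numerically for the AV@R density, whose supremum is 1/(1 − α). The following sketch uses two small hand-made samples of equal size; the data and helper names are illustrative assumptions.

```python
def empirical_avar(sample, alpha):
    """AV@R of the empirical distribution via the weights H(i/n) - H((i-1)/n),
    where H(u) = max(0, u - alpha) / (1 - alpha) is the AV@R distortion distribution."""
    xs = sorted(sample)
    n = len(xs)

    def H(u):
        return max(0.0, u - alpha) / (1.0 - alpha)

    return sum(x * (H(i / n) - H((i - 1) / n)) for i, x in enumerate(xs, start=1))


def w1(xs, ys):
    """Order-1 Wasserstein distance between two equal-size samples."""
    a, b = sorted(xs), sorted(ys)
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


f_sample = [float(k) for k in range(1, 11)]   # losses 1, ..., 10
g_sample = f_sample[:-1] + [20.0]             # same sample, largest loss doubled
alpha = 0.8

gap = abs(empirical_avar(g_sample, alpha) - empirical_avar(f_sample, alpha))
bound = w1(f_sample, g_sample) / (1.0 - alpha)  # sup of h_alpha times WD_1
print(gap, bound)
```

In this example the two samples differ only in the upper tail, where the AV@R density attains its supremum, so the gap essentially exhausts the bound.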
Proposition 2 can be easily generalized as follows.
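Proposition 3 (Continuity for distortion densities in $L_q$ for q < ∞) Let F and G be two distributions on the real line and h a distortion density function. If F, G have finite p-moments and $h \in L_q$, then
$$|\pi_h(F) - \pi_h(G)| \le \|h\|_q \cdot WD_{p,d_1}(F, G),$$
where p and q are conjugates.
Proof By Hölder's inequality for p and q we obtain
$$|\pi_h(F) - \pi_h(G)| = \Big|\int_0^1 \big(F^{-1}(v) - G^{-1}(v)\big)\,h(v)\,dv\Big| \le \|h\|_q \Big(\int_0^1 |F^{-1}(v) - G^{-1}(v)|^p\,dv\Big)^{1/p} = \|h\|_q \cdot WD_{p,d_1}(F, G).$$
Example 1 Let F and G be two distributions with finite first moments.
- For the AV@R distortion premium, $\|h_\alpha\|_\infty = \frac{1}{1 - \alpha}$, and therefore
$$|\mathrm{AV@R}_\alpha(F) - \mathrm{AV@R}_\alpha(G)| \le \frac{1}{1 - \alpha}\,WD_{1,d_1}(F, G).$$
- For the power distortion with s ≥ 1, $\|h^{(s)}\|_\infty = s$, and therefore
$$|\pi_{h^{(s)}}(F) - \pi_{h^{(s)}}(G)| \le s \cdot WD_{1,d_1}(F, G).$$
The power distortion density with 0 < s < 1 is not bounded. The next result is dedicated to this particular case.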
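Proposition 4 (Continuity for the power distortion with 0 < s < 1) Let F and G be distribution functions and $h^{(s)}$ the distortion density defined in (3). If F and G have finite p-moments for p > 1/s and $h^{(s)} \in L_q$, then
$$|\pi_{h^{(s)}}(F) - \pi_{h^{(s)}}(G)| \le \|h^{(s)}\|_q \cdot WD_{p,d_1}(F, G),$$
where p and q are conjugates.
Proof The integral $\int_0^1 h^{(s)}(v)^q\,dv = s^q \int_0^1 (1 - v)^{q(s-1)}\,dv$ is finite if and only if q(1 − s) < 1, i.e. if and only if p > 1/s. Hence $h^{(s)} \in L_q$, and Proposition 3 proves the statement.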
The next result is a direct consequence of Proposition 4.
Corollary 1 (Continuity for distortion densities dominated by power distortion densities with 0 < s < 1) Let F and G be distribution functions and h a distortion density.
where p and q are conjugates.
Corollary 2 (Convergence) If F and $F_n$, for all n ≥ 1, have finite uniformly bounded p-moments, $h \in L_q$ and $WD_{p,d_1}(F_n, F) \to 0$ as n → ∞, then
$$\pi_h(F_n) \to \pi_h(F) \quad \text{as } n \to \infty,$$
where p and q are conjugates.
Remark 5 Corollary 2 holds when the sequence of distributions consists of the empirical distributions $F_n$ defined on an i.i.d. sample of size n, $(x_1, \dots, x_n)$, drawn from F. This result follows by applying Lemma 4.1 in [17].
Finally notice that, for continuity, the order of the Wasserstein distance r coincides with the number of finite moments of F .

Partial coverage
Many insurance contracts do not guarantee complete indemnity; their payoff is only a part of the full damage. Such contracts include proportional insurance, deductibles and capped insurance. In general, there is a (monotonic) payoff function T such that the payoff is T(X) if the total loss is X. A quite flexible form is, for instance, the excess-of-loss insurance (XL-insurance), which has a payoff function of the form
$$T(x) = \min\{\max\{x - a,\, 0\},\ b\}$$
for a retention a ≥ 0 and a limit b > 0. Denote by $F^T$ the distribution of T(X) if F is the distribution of X. The distortion premium for partial coverage is $\pi_h(F^T)$. We study the relationship between $F^T$ and $G^T$ as well as between $\pi_h(F^T)$ and $\pi_h(G^T)$ in a slightly more general setup, namely for Hölder continuous T. Recall that T is β-Hölder continuous with constant $H_\beta$ if $|T(x) - T(y)| \le H_\beta\,|x - y|^\beta$ for all x, y.
Theorem 3 (Distance between the original and image probabilities by T) Let P and Q be two probability measures and consider their image probabilities under T, denoted by $P^T$ and $Q^T$, respectively. If T is a β-Hölder continuous mapping, then
$$WD_{r_\beta,d_1}(P^T, Q^T) \le H_\beta \cdot WD_{r,d_1}(P, Q)^\beta,$$
for $r_\beta = r/\beta \ge 1$ and r ≥ 1, where $H_\beta$ is the β-Hölder constant.
Proof Let the joint distribution of X and Y be such that $E|X - Y|^r = WD_{r,d_1}(P, Q)^r$. Then
$$WD_{r_\beta,d_1}(P^T, Q^T)^{r_\beta} \le E|T(X) - T(Y)|^{r_\beta} \le H_\beta^{r_\beta}\,E|X - Y|^{\beta r_\beta} = H_\beta^{r_\beta}\,WD_{r,d_1}(P, Q)^{r}.$$
Taking the $r_\beta$-th root on both sides finishes the proof.
For the XL-insurance, the Hölder-constant is a Lipschitz constant (β = 1) and has the value 1.
From the previous Theorem we can conclude that, if two probabilities are close, then the image probabilities by a mapping T with the characteristics of Theorem 3, are close in Wasserstein distance as well. Theorem 3 isolates the argument also used in Theorem 3.31 in [17]. Note that the underlying distances for the Wasserstein distances are the metrics of the respective spaces.
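For a Lipschitz payoff with constant 1, Theorem 3 says that applying the payoff cannot increase the Wasserstein distance. The sketch below illustrates this for an XL-type payoff, assuming the common parametrization with a retention and a limit; the parameter names and the sample values are illustrative and not taken from the paper.

```python
def xl_payoff(x, retention, limit):
    """Excess-of-loss payoff: the part of the loss above `retention`, capped at `limit`.
    This map is Lipschitz with constant 1."""
    return min(max(x - retention, 0.0), limit)


def w1(xs, ys):
    """Order-1 Wasserstein distance between two equal-size samples."""
    a, b = sorted(xs), sorted(ys)
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


losses_p = [0.0, 5.0, 12.0, 30.0]
losses_q = [2.0, 9.0, 15.0, 50.0]

paid_p = [xl_payoff(x, retention=4.0, limit=10.0) for x in losses_p]
paid_q = [xl_payoff(x, retention=4.0, limit=10.0) for x in losses_q]

distance_before = w1(losses_p, losses_q)  # distance between the loss samples
distance_after = w1(paid_p, paid_q)       # distance between the payoff samples
print(distance_before, distance_after)
```

The large difference in the biggest loss disappears after capping, which is why the distance between the payoffs is much smaller than the distance between the losses.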
Corollary 3 Let F, G be two distributions defined by the probabilities P and Q, respectively, and let $F^T$, $G^T$ be their image distributions under T. If T is a β-Hölder continuous mapping with constant $H_\beta$, $h \in L_q$, and the distributions $F^T$, $G^T$ have finite p-moments, then for r = p · β (r ≥ 1), the distortion premium with payoff function T satisfies
$$|\pi_h(F^T) - \pi_h(G^T)| \le \|h\|_q \cdot H_\beta \cdot WD_{r,d_1}(F, G)^\beta.$$
We now proceed to study sensitivity properties of the distortion premium w.r.t. the distortion density.

Continuity of the premium w.r.t. the distortion density
Previously, we studied the mapping F → π_h(F) for fixed h. In this section, we consider and present properties of the mapping h → π_h(F) for fixed F. Different sensitivity properties w.r.t. the distortion parameters were studied in Gourieroux and Liu 2006 [9].
Proposition 5 (Continuity of the distortion premium w.r.t. the distortion density h) Let F be a distribution and consider two different distortion densities $h_1$, $h_2$. If F has finite p-moments and $h_1, h_2 \in L_q$, then
$$|\pi_{h_1}(F) - \pi_{h_2}(F)| \le \|F^{-1}\|_p \cdot \|h_1 - h_2\|_q,$$
where p and q are conjugates. Here the choices p = 1, q = ∞ and p = ∞, q = 1 are included.
Proof Use Hölder inequality and the result is direct.
We can conclude that, if h 1 and h 2 are close, then also the premium prices are close. However, h is always identifiable by the following Proposition.
Proof Let $F_a$ be the distribution which takes the value 0 with probability a and the value 1 with probability 1 − a, for some a ∈ (0, 1). Then its inverse $F_a^{-1}$ is the indicator function of the interval [a, 1]. Hence,
$$\pi_{h_i}(F_a) = \int_a^1 h_i(v)\,dv = 1 - H_i(a), \qquad i = 1, 2.$$
If the two premia coincide for all a ∈ (0, 1), the distortion distributions $H_1$ and $H_2$ are equal and therefore $h_1 = h_2$ almost everywhere.
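The argument of the proof can be turned into a small numerical experiment: prices of two-point losses reveal the distortion distribution through H(a) = 1 − π_h(F_a). In the sketch below the "hidden" pricing rule is an AV@R distortion with α = 0.9, chosen only for illustration; all function names are assumptions.

```python
def premium_discrete(values, probs, H):
    """Distortion premium of a discrete loss distribution.

    Atoms are processed in increasing order; an atom with probability p that
    starts at cumulative level c receives the weight H(c + p) - H(c).
    """
    total, cum = 0.0, 0.0
    for x, p in sorted(zip(values, probs)):
        total += x * (H(cum + p) - H(cum))
        cum += p
    return total


def H_avar(alpha):
    """Distortion distribution of the AV@R density."""
    return lambda u: max(0.0, u - alpha) / (1.0 - alpha)


H_true = H_avar(0.9)


def quoted_price(a):
    """Plays the role of the insurer's undisclosed pricing rule for the
    two-point loss that is 0 with probability a and 1 with probability 1 - a."""
    return premium_discrete([0.0, 1.0], [a, 1.0 - a], H_true)


grid = [k / 20 for k in range(1, 20)]
H_recovered = [1.0 - quoted_price(a) for a in grid]
print(list(zip(grid, H_recovered))[-3:])
```

The recovered values agree with the true distortion distribution on the whole grid, which is the identifiability statement in discretized form.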

Estimating the distortion density from observations
The way insurance companies calculate a premium is typically not revealed to the customer. Notice that risk premia appear not only in the insurance business; see the link between insurance premium prices and asset pricing in Nguyen et al. 2012 [16]. Risk premia also appear in other areas, such as
- power future markets. A future contract fixes the price today for delivery of energy later. There is the risk of price changes between now and the delivery period. Thus, such a contract has the character of an insurance and the pricing principles apply, although the price is found in exchange markets (e.g. electricity future markets).
- exotic options. While standard options are priced through a replication strategy argument, this argument does not apply to other types of options, and these options have the character of insurance contracts. Pricing of such contracts is often done over the counter, but again the pricing principle is not revealed to the counterparty.
- credit derivatives. These contracts also carry the character of insurance and can be priced according to insurance pricing principles.
In this section we assume that we know the distortion premium prices of m contracts, which are all priced with the same distortion density h. For each contract j, we also have a sample x (j) 1 , . . . , x (j) n of size n drawn from the loss distribution of this contract at our disposal. For simplicity we assume that n is the same for all contracts, but this is not crucial.
The goal of this section is to show how the distortion density h can be regained from the observations of the insurance prices, which would help us to shed more light on the price formation of contract counterparties. Notice that our aim is not to estimate the distortion premium prices from empirical data as is done in Gourieroux and Liu 2006 [9] or Tsukahara 2013 [22].
A simulation example. As an example we consider m different loss distributions, all of Gamma type. From each distribution, we obtain a sample of size n. For each sample, we calculate the AV@R and power distortion premium prices. Based on the prices obtained and our samples, we aim to recover the distortion density h. We denote the ordered sample from the j-th loss distribution by $x^{(j)}_{[1]}, \dots, x^{(j)}_{[n]}$. The distortion premium with distortion density h for each sample j = 1, ..., m is
$$\pi^{(j)} = \sum_{i=1}^n x^{(j)}_{[i]} \int_{(i-1)/n}^{i/n} h(v)\,dv. \tag{16}$$
In the following, we develop (16) for the particular cases of AV@R and power distortion premium prices for each sample j = 1, ..., m.
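AV@R distortion premium. The price for $h_\alpha$ defined in (7) is
$$\pi^{(j)} = \frac{1}{1 - \alpha} \sum_{i=1}^n x^{(j)}_{[i]} \Big(\frac{i}{n} - \max\Big\{\alpha, \frac{i-1}{n}\Big\}\Big)^{+}. \tag{17}$$
Power distortion premium. The price given by the power distortion $h^{(s)}$ defined in (3) with 0 < s < 1 is
$$\pi^{(j)} = \sum_{i=1}^n x^{(j)}_{[i]} \Big[\Big(1 - \frac{i-1}{n}\Big)^s - \Big(1 - \frac{i}{n}\Big)^s\Big], \tag{18}$$
and the price given by $h^{(s)}$ defined in (5) with s ≥ 1 is
$$\pi^{(j)} = \sum_{i=1}^n x^{(j)}_{[i]} \Big[\Big(\frac{i}{n}\Big)^s - \Big(\frac{i-1}{n}\Big)^s\Big]. \tag{19}$$
The inverse problem consists in estimating the distortion density h from observed prices. Recall that the examples of common distortion densities presented above include step functions and continuous functions; therefore we use step and spline functions in order to estimate h. We do so for the prices obtained in (17), (18) and (19).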
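The price formulas above all have the same structure: each order statistic is weighted by the increment of the distortion distribution H over ((i−1)/n, i/n]. The sketch below generates Gamma samples and prices them with the AV@R and power distortions. The Gamma parameters, the seed and the sample sizes are illustrative assumptions, not the settings used in the paper.

```python
import random


def empirical_price(sample, H):
    """Distortion price of a sample: sum of x_[i] * (H(i/n) - H((i-1)/n))."""
    xs = sorted(sample)
    n = len(xs)
    return sum(x * (H(i / n) - H((i - 1) / n)) for i, x in enumerate(xs, start=1))


def H_avar(alpha):
    """H for the AV@R density: zero up to alpha, then linear up to 1."""
    return lambda u: max(0.0, u - alpha) / (1.0 - alpha)


def H_power(s):
    """H for the power distortion: 1 - (1-u)^s if 0 < s < 1, and u^s if s >= 1."""
    if 0.0 < s < 1.0:
        return lambda u: 1.0 - (1.0 - u) ** s
    return lambda u: u ** s


random.seed(0)
m, n = 3, 200
# Gamma losses with shape 2 and scale 1 + j (illustrative choice).
samples = [[random.gammavariate(2.0, 1.0 + j) for _ in range(n)] for j in range(m)]

distortions = {"avar_0.9": H_avar(0.9), "power_0.8": H_power(0.8), "power_3": H_power(3.0)}
prices = {name: [empirical_price(x, H) for x in samples] for name, H in distortions.items()}
means = [sum(x) / n for x in samples]

for name, values in prices.items():
    print(name, [round(v, 3) for v in values])
```

Since all three densities are increasing, every price lies above the corresponding sample mean; the difference is the risk premium.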

Estimation of the distortion density with a step function
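Distortion density as a step function. Let $h^1_l$ denote the step function consisting of l equal-size steps, defined as
$$h^1_l(v) = \sum_{k=1}^l \lambda_k\,\mathbb{1}_{((k-1)/l,\ k/l]}(v),$$
where $\lambda_k \in \mathbb{R}$ for k = 1, ..., l, l denotes the dimension of the step function space and L = n/l is the number of order statistics covered by each step. We also impose
$$\int_0^1 h^1_l(v)\,dv = \frac{1}{l} \sum_{k=1}^l \lambda_k = 1,$$
with $0 \le \lambda_1 \le \dots \le \lambda_l$. In this way, $h^1_l$ fulfils the density constraint as well as the non-decreasing constraint.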
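Prices with the step function. For each sample j = 1, ..., m, the prices with $h^1_l$ are
$$\pi^{(j)}_{h^1_l} = \frac{1}{n} \sum_{k=1}^l \lambda_k \sum_{i=(k-1)L+1}^{kL} x^{(j)}_{[i]}. \tag{22}$$
Estimation. In order to estimate $h^1_l$ we minimize the sum of the squared differences between the prices obtained with a distortion density h and the prices obtained with $h^1_l$ in (22). We test our results with the given prices $\pi^{(j)}$ calculated in (17), (18) and (19). We solve
$$\min_{\lambda}\ \sum_{j=1}^m \Big(\pi^{(j)} - \pi^{(j)}_{h^1_l}\Big)^2 \quad \text{s.t.} \quad \frac{1}{l}\sum_{k=1}^l \lambda_k = 1,\quad 0 \le \lambda_1 \le \dots \le \lambda_l. \tag{$P_1$}$$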
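The ingredients of the least-squares problem for the step function can be assembled in a few lines. The sketch below builds, for each sample, the coefficients that multiply the step heights in the price formula, and evaluates the objective and the constraints. It does not solve the quadratic program; a QP solver would be needed for that. The observed prices are generated here from an AV@R density with α = 0.75, which is itself a step function with four steps, so the true parameters give a zero objective. All names and parameter values are illustrative assumptions.

```python
import random


def step_coefficients(sample, l):
    """Coefficients c_k = (1/n) * (sum of the k-th block of L = n/l order statistics),
    so that the price with step heights lam equals sum_k lam[k] * c_k."""
    xs = sorted(sample)
    n = len(xs)
    if n % l != 0:
        raise ValueError("the sample size must be a multiple of the number of steps")
    L = n // l
    return [sum(xs[k * L:(k + 1) * L]) / n for k in range(l)]


def objective(lam, rows, prices):
    """Sum over contracts of the squared price differences."""
    return sum((p - sum(a * c for a, c in zip(lam, row))) ** 2 for row, p in zip(rows, prices))


def is_feasible(lam, tol=1e-9):
    """Density constraint (mean height 1) and monotonicity 0 <= lam_1 <= ... <= lam_l."""
    integrates_to_one = abs(sum(lam) / len(lam) - 1.0) <= tol
    nondecreasing = lam[0] >= -tol and all(b >= a - tol for a, b in zip(lam, lam[1:]))
    return integrates_to_one and nondecreasing


random.seed(1)
m, n, l = 5, 40, 4
samples = [[random.gammavariate(2.0, 1.0 + j) for _ in range(n)] for j in range(m)]
rows = [step_coefficients(x, l) for x in samples]

true_lam = [0.0, 0.0, 0.0, 4.0]   # AV@R density with alpha = 0.75 as a 4-step function
flat_lam = [1.0, 1.0, 1.0, 1.0]   # h = 1, i.e. pricing with the expectation
observed = [sum(a * c for a, c in zip(true_lam, row)) for row in rows]

print(objective(true_lam, rows, observed), objective(flat_lam, rows, observed))
```

Both candidate vectors are feasible, but only the true step heights reproduce the observed prices.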

Estimation of the distortion density with a cubic monotone spline
B-splines construction. For our purposes we define the splines on the interval [0, 1]. Any B-spline is a linear combination of the B-spline basis functions. The B-spline basis functions all have the same degree b, and we choose to define them at equally spaced knots $t_k = k/L$, for k = 0, ..., L, hence on L subintervals. The functions of this basis are denoted by $B_{k,b}$ and constructed following a recursion formula. The B-spline basis function of degree 0 is denoted and defined as
$$B_{k,0}(v) = \begin{cases} 1, & t_k \le v < t_{k+1}, \\ 0, & \text{otherwise.} \end{cases}$$
The B-spline basis functions of degree b, $B_{k,b}$, are obtained as an interpolation between $B_{k,b-1}$ and $B_{k+1,b-1}$, following the recursion formula
$$B_{k,b}(v) = \frac{v - t_k}{t_{k+b} - t_k}\,B_{k,b-1}(v) + \frac{t_{k+b+1} - v}{t_{k+b+1} - t_{k+1}}\,B_{k+1,b-1}(v).$$
In the recursion we need to define fake knots $t_{-k} = 0$ and $t_{L+k} = 1$ for k = 1, ..., b; terms with a zero denominator are set to zero. In our case, we consider splines of degree b = 2. If we divide [0, 1] into L equally sized intervals, the basis has L + 2 functions,
$$\{B_{-2,2},\, B_{-1,2},\, \dots,\, B_{L-1,2}\}. \tag{23}$$
Notice that all the elements of the basis can be obtained by translating the B-spline basis function $B_{0,2}$ defined on the first b + 2 = 4 knots. In order to have a base of increasing monotone cubic splines we integrate the functions of (23) and obtain a new base
$$\{S_{-2},\, S_{-1},\, \dots,\, S_{L-1}\}, \tag{24}$$
where $S_k(v) = \int_0^v B_{k,2}(w)\,dw$ for all k = −2, ..., L − 1. We scale the functions of (23) so that the splines in (24) are distribution functions. Note that, by construction, no linear combination of (24) gives a constant function. Therefore, we need to add one element to our base, say $S_L(v) = c$, and hence
$$\{S_{-2},\, S_{-1},\, \dots,\, S_{L-1},\, S_L\} \tag{25}$$
is our final base with l = L + 3 elements, where l denotes its dimension.
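The recursion for the B-spline basis functions can be written down directly. The following sketch uses a knot list in which the boundary knots 0 and 1 are repeated b extra times, so list index 0 corresponds to the paper's index k = −b; this index shift and the function name are assumptions of the sketch. Terms with a zero denominator are dropped.

```python
def bspline_basis(k, b, knots, v):
    """Value at v of the B-spline basis function of degree b that starts at knots[k],
    computed with the standard recursion on the degree."""
    if b == 0:
        return 1.0 if knots[k] <= v < knots[k + 1] else 0.0
    left_den = knots[k + b] - knots[k]
    right_den = knots[k + b + 1] - knots[k + 1]
    left = 0.0
    if left_den > 0.0:
        left = (v - knots[k]) / left_den * bspline_basis(k, b - 1, knots, v)
    right = 0.0
    if right_den > 0.0:
        right = (knots[k + b + 1] - v) / right_den * bspline_basis(k + 1, b - 1, knots, v)
    return left + right


L, b = 5, 2
# Interior knots k/L plus the boundary knots, each boundary knot repeated b extra times.
knots = [0.0] * (b + 1) + [k / L for k in range(1, L)] + [1.0] * (b + 1)
num_basis = len(knots) - b - 1   # equals L + 2 for degree 2

points = [0.0, 0.05, 0.3, 0.5, 0.95]
sums = [sum(bspline_basis(k, b, knots, v) for k in range(num_basis)) for v in points]

# List index 2 is the basis function supported on [0, 3/5]; 0.3 is its midpoint.
peak = bspline_basis(2, b, knots, 0.3)
print(num_basis, sums, peak)
```

Before any rescaling, the basis functions sum to one at every point of [0, 1), which is a convenient sanity check for an implementation of the recursion.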
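As an example we illustrate the base obtained for L = 5. We start with $B_{0,2}$ defined on $t_0 = 0$, $t_1 = 1/5$, $t_2 = 2/5$, $t_3 = 3/5$; precisely, before scaling,
$$B_{0,2}(v) = \begin{cases} \frac{25}{2}\,v^2, & 0 \le v < \frac15, \\ \frac{25}{2}\big[v\,(\frac25 - v) + (\frac35 - v)(v - \frac15)\big], & \frac15 \le v < \frac25, \\ \frac{25}{2}\,(\frac35 - v)^2, & \frac25 \le v < \frac35, \\ 0, & \text{otherwise.} \end{cases}$$
We denote by $S_0$ the distribution function of $B_{0,2}$ and obtain the rest of the monotone cubic splines by translating $S_0$. The basis of cubic monotone splines of dimension l = 8, illustrated in Figure 1, is denoted as
$$\{S_{-2},\, S_{-1},\, \dots,\, S_4,\, S_5\},$$
where $S_k(v) = S_0(v - k/5)$ for k = −2, ..., 4 and $S_5(v) = c$.
Distortion density as a spline. Let $h^2_l(v)$ denote an increasing monotone cubic density defined as a linear combination of the l = L + 3 splines in (25),
$$h^2_l(v) = \sum_{k=-2}^{L} \lambda_k\,S_k(v),$$
where $\lambda_k \ge 0$ for all k = −2, ..., L. Notice that by setting the scalars to be non-negative, $h^2_l$ is increasing. However, $h^2_l$ must integrate to 1 on [0, 1], hence
$$\int_0^1 h^2_l(v)\,dv = \sum_{k=-2}^{L} \lambda_k\,a_k = 1, \qquad a_k = \int_0^1 S_k(v)\,dv. \tag{28}$$
Prices with the spline. For each sample j = 1, ..., m, the prices with $h^2_l$ are
$$\pi^{(j)}_{h^2_l} = \sum_{k=-2}^{L} \lambda_k \sum_{i=1}^n x^{(j)}_{[i]} \int_{(i-1)/n}^{i/n} S_k(v)\,dv. \tag{29}$$
Estimation. Given prices $\pi^{(j)}$ calculated as in (17), (18) or (19) and the prices calculated in (29) for every sample j = 1, ..., m, we solve
$$\min_{\lambda}\ \sum_{j=1}^m \Big(\pi^{(j)} - \pi^{(j)}_{h^2_l}\Big)^2 \quad \text{s.t.} \quad \sum_{k=-2}^{L} \lambda_k\,a_k = 1,\quad \lambda_k \ge 0, \tag{$P_2$}$$
where $a_k$ is defined in (28). The estimations obtained by solving $(P_1)$ and $(P_2)$ are presented below.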
AV@R distortion premium. We consider the particular cases of $h_\alpha$ for α = 0.9, 0.95. For each case, we estimate the distortion density with two different step functions, corresponding to l = 8, 10 steps, and with two different spline bases of dimensions l = 8, 13, respectively.
Step function. The estimated step distortions $h^1_l$ for l = 8, 10 are obtained by solving $(P_1)$ and illustrated below.
Splines. The estimated spline distortions $h^2_l$ for l = 8, 13 are obtained by solving $(P_2)$ and illustrated below.
Power distortion premium. For this case we consider $h^{(s)}$ for s = 0.8, 3. We solve $(P_1)$ and $(P_2)$ with the same numbers of steps and of spline basis functions as before.
Step function. The estimated step distortions $h^1_l$ for l = 8, 10 are obtained by solving $(P_1)$ and illustrated below.
The optimal values of the optimization problems for all cases can be seen in the following table.
Table 1: Optimal values of the problems $(P_1)$ and $(P_2)$ for the AV@R distortion and the power distortion.

Ambiguity
In this section we combine the distortion premium with the ambiguity principle. Such an approach allows us to incorporate model uncertainty into the premium. Recall that, by setting the distortion density to h = 1, we would price just with the ambiguity principle. As was mentioned in Section 1, distances can be used to define ambiguity sets. Here, closed Wasserstein balls will serve as ambiguity sets. These sets will be centred at F , an initial distribution, that we refer to as our baseline model.
Definition 2 (Robust distortion premium under Wasserstein balls with $d_1$) Let F be the baseline loss distribution and h a distortion density. The robust distorted price of order r ≥ 1 is
$$\pi_{h,r,d_1}(F) = \sup\{\pi_h(G) : G \in B_{r,d_1}(F, \varepsilon)\}, \tag{P-r}$$
where $B_{r,d_1}(F, \varepsilon) = \{G : WD_{r,d_1}(G, F) \le \varepsilon\}$. We call a distribution $F^*$ the worst case distribution if $F^* \in B_{r,d_1}(F, \varepsilon)$ and
$$\pi_h(F^*) = \pi_{h,r,d_1}(F).$$
We can say more about the value and solution of (P-r) if we choose r = p. We start with bounded distortion densities, i.e. with p = 1 and q = ∞.
Proposition 7 (Characterization of the worst case distribution for r ≥ p = 1) Let the baseline distribution F have its first moment finite.
(i) If h is unbounded, then (P-r) for r = 1 is unbounded. (ii) If h is bounded with sup v h(v) = h ∞ , then (P-r) is bounded for all r ≥ 1.
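If r = 1, the optimal value of (P-r) is
$$\pi_{h,1,d_1}(F) = \pi_h(F) + \varepsilon \cdot \|h\|_\infty.$$
We interpret the additional term $\varepsilon \cdot \|h\|_\infty$ as the ambiguity premium. For the worst case distribution:
- If $h(v) = \|h\|_\infty$ for v ≥ 1 − η and some 0 < η ≤ 1, then the supremum is attained at
$$F^*_\eta(x) = \begin{cases} F(x), & x < F^{-1}(1 - \eta), \\ 1 - \eta, & F^{-1}(1 - \eta) \le x < F^{-1}(1 - \eta) + \varepsilon/\eta, \\ F(x - \varepsilon/\eta), & x \ge F^{-1}(1 - \eta) + \varepsilon/\eta. \end{cases}$$
- Otherwise, the supremum is not attained, but can be approximated by the sequence $F^*_{1/n}$, n ∈ N.
Proof (i) Given that h is increasing and unbounded, the increasing sequence $K_n = h(1 - 1/n)$ is such that $\lim_{n \to \infty} K_n = \infty$. For all n ∈ N we define a distribution $G_n$ through its inverse
$$G_n^{-1}(v) = F^{-1}(v) + n\,\varepsilon\,\mathbb{1}_{[1 - 1/n,\ 1]}(v),$$
so that $G_n$ is on the boundary of $B_{1,d_1}(F, \varepsilon)$ and
$$\pi_h(G_n) = \pi_h(F) + n\,\varepsilon \int_{1 - 1/n}^1 h(v)\,dv \ge \pi_h(F) + \varepsilon\,K_n \to \infty.$$
Hence, (P-r) is unbounded for r = 1.
(ii) It is sufficient to prove that (P-r) is bounded for r = 1, since $B_{1,d_1} \supseteq B_{r,d_1}$ for all r ≥ 1 (see Remark 8). Any admissible G for r = 1 satisfies $\int_0^1 |G^{-1}(v) - F^{-1}(v)|\,dv \le \varepsilon$. Since F has a finite first moment, the following upper bound is finite:
$$\pi_h(G) = \pi_h(F) + \int_0^1 \big(G^{-1}(v) - F^{-1}(v)\big)\,h(v)\,dv \le \pi_h(F) + \varepsilon \cdot \|h\|_\infty. \tag{31}$$
The distribution $F^*_\eta$ given in the Proposition has inverse
$$(F^*_\eta)^{-1}(v) = F^{-1}(v) + \frac{\varepsilon}{\eta}\,\mathbb{1}_{[1 - \eta,\ 1]}(v).$$
Therefore, $F^*_\eta$ is on the boundary of $B_{1,d_1}(F, \varepsilon)$ and
$$\pi_h(F^*_\eta) = \pi_h(F) + \frac{\varepsilon}{\eta} \int_{1 - \eta}^1 h(v)\,dv.$$
If $h(v) = \|h\|_\infty$ for v ≥ 1 − η, then $F^*_\eta$ attains the upper bound in (31). Otherwise, $F^*_{1/n}$ approaches the maximum from below, since
$$n \int_{1 - 1/n}^1 h(v)\,dv \uparrow \|h\|_\infty \quad \text{as } n \to \infty$$
and
$$\pi_h(F^*_{1/n}) = \pi_h(F) + \varepsilon \cdot n \int_{1 - 1/n}^1 h(v)\,dv.$$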
As an example, we illustrate the worst case distribution for the AV@R premium.
Fig. 6: The worst case distribution $F^*_\eta$ for $h_\alpha$ with α = 0.9 is obtained by shifting F to the right, from $x_\alpha$ on, by the length ε/η, where $x_\alpha = F^{-1}(\alpha)$ and η = 1 − α.
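The worst case construction for the AV@R premium can be reproduced on a sample: the largest ηn observations are shifted to the right by ε/η, with η = 1 − α. The sketch below assumes that ηn is an integer, so that the AV@R is a plain tail average; the data and function names are illustrative.

```python
def tail_average(sample, alpha):
    """AV@R of a sample as the average of its k = (1 - alpha) * n largest values;
    (1 - alpha) * n is assumed to be a positive integer."""
    xs = sorted(sample)
    n = len(xs)
    k = round((1.0 - alpha) * n)
    if k < 1 or abs(k - (1.0 - alpha) * n) > 1e-9:
        raise ValueError("(1 - alpha) * n must be a positive integer")
    return sum(xs[n - k:]) / k


def worst_case_sample(sample, alpha, eps):
    """Shift the upper (1 - alpha) fraction of the sample to the right by eps / (1 - alpha)."""
    xs = sorted(sample)
    n = len(xs)
    eta = 1.0 - alpha
    k = round(eta * n)
    shift = eps / eta
    return xs[:n - k] + [x + shift for x in xs[n - k:]]


def w1(xs, ys):
    """Order-1 Wasserstein distance between two equal-size samples."""
    a, b = sorted(xs), sorted(ys)
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


baseline = [float(k) for k in range(1, 11)]
alpha, eps = 0.8, 0.5

worst = worst_case_sample(baseline, alpha, eps)
radius_used = w1(baseline, worst)
extra_premium = tail_average(worst, alpha) - tail_average(baseline, alpha)
print(radius_used, extra_premium, eps / (1.0 - alpha))
```

The shifted sample lies on the boundary of the Wasserstein ball of radius ε, and the premium increases by ε/(1 − α), that is, by ε times the supremum of the AV@R density.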
If h is unbounded we can characterize the solution of (P-r) as follows.
Proposition 8 (Characterization of the worst case distribution for r ≥ p > 1) Let the baseline distribution F have finite p-moments. If $h \in L_q$, then (P-r) is bounded for r ≥ p. If r = p, the optimal value of (P-r) is
$$\pi_{h,p,d_1}(F) = \pi_h(F) + \varepsilon \cdot \|h\|_q.$$
Also in this case, the term $\varepsilon \cdot \|h\|_q$ is interpreted as the ambiguity premium. Furthermore, the worst case distribution $F^*$ of (P-r) for r = p is such that
$$(F^*)^{-1}(v) = F^{-1}(v) + \varepsilon \Big(\frac{h(v)}{\|h\|_q}\Big)^{q-1}.$$
Proof We prove that (P-r) is bounded for r = p; by Remark 8 we then have boundedness for all r ≥ p. Notice that, for all admissible G, if r = p, Hölder's inequality gives
$$\pi_h(G) - \pi_h(F) = \int_0^1 \big(G^{-1}(v) - F^{-1}(v)\big)\,h(v)\,dv \le WD_{p,d_1}(G, F) \cdot \|h\|_q \le \varepsilon \cdot \|h\|_q.$$
$F^*$ is admissible since it is on the boundary of $B_{p,d_1}(F, \varepsilon)$, and $F^*$ attains the upper bound.
Under some conditions on h we can also prove unboundedness of (P-r) for r > p > 1 in the case where h is not in $L_q$, where q is the conjugate of p, the number of finite moments of F.
Proposition 9 (Unboundedness for r > p > 1) Let the baseline distribution F have finite p-moments and let $h \notin L_q$, for p, q conjugates and r, s conjugates with r > 1. If there exists $s_1 < s$ such that $\int_0^1 h(v)^{s_1}\,dv = \infty$ and $h \in L_t$ for all $t < s_1$, then (P-r) is unbounded for all r > p.
Proof [...] Since $\psi_\eta \in L_r$ for r > 1 (note that $r(s_1 - 1) < s_1$), there exists an 0 < η < 1 such that [...], and its premium is unbounded.
Remark 10 If instead of the metric $d_1$ we consider $d_p(x, y) = |x^p - y^p|$ as the underlying metric for the Wasserstein distance, we can define the ambiguity principle
$$\pi_{h,1,d_p}(F) = \sup\{\pi_h(G) : G \in B_{1,d_p}(F, \varepsilon)\}, \tag{P-dp}$$
where $B_{r,d_p}(F, \varepsilon) = \{G : WD_{r,d_p}(G, F) \le \varepsilon\}$. It is easy to see that, if F has finite p-moments, the constraint of the ball forces all admissible distributions to have finite p-moments as well; therefore, by Proposition 3, if $h \in L_q$, then (P-dp) is bounded. Furthermore, continuity with respect to this Wasserstein distance implies our continuity results in Section 3.

Conclusions
After an introduction to general premium principles, we proposed generalizations of the distortion premium. In addition, we studied in detail three functional relationships for the distortion premium:
- the premium function F → π_h(F), i.e. the properties of π_h as a premium principle,
- the direct function h → π_h(F), i.e. the dependency on the distortion density,
- the inverse function π_h(F) → h.
The smoothness properties are important for robustness aspects; however, it is well known that a quite smooth direct function makes the inverse problem difficult. We showed nevertheless that the distortion density is identifiable in the inverse problem, and we gave a simple quadratic optimization problem to estimate it from empirical data. We illustrated this in a simulation study; the application to real data is left for further research. We also identified the ambiguity premium for Wasserstein balls as ambiguity sets, offering, in some cases, a specific formulation of the worst case distribution. It turned out that the extra premium for ambiguity depends on the distortion density h and, in a multiplicative way, on the ambiguity radius ε, but not on the loss distribution F itself. Thus it is the same for all contracts and can be calculated separately. Finally, by using different distances as underlying metrics for the Wasserstein ball, and hence for the ambiguity set, we found conditions under which the robust premium is bounded.
Finally, based on the subdifferential, one gets a dual representation where Z Y is given by (33).
On different underlying metrics for the Wasserstein distance. There is a whole family of distances on R, which are generalizations of d 1 .
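Set, for x, y ≥ 0, $d_p(x, y) = |x^p - y^p|$. The Wasserstein distance of order 1 with distance $d_p$ is
$$WD_{1,d_p}(F, G) = \int_0^1 \big|F^{-1}(v)^p - G^{-1}(v)^p\big|\,dv.$$
Lemma 1 For p ≥ 1,
$$WD_{p,d_1}(F, G) \le \big[WD_{1,d_p}(F, G)\big]^{1/p}.$$
Proof By the superadditivity of $x \mapsto x^p$ on $\mathbb{R}_{\ge 0}$ for p ≥ 1, one has $|x - y|^p \le |x^p - y^p|$ and therefore
$$WD_{p,d_1}(F, G) = \Big(\int_0^1 |F^{-1}(v) - G^{-1}(v)|^p\,dv\Big)^{1/p} \le \Big(\int_0^1 \big|F^{-1}(v)^p - G^{-1}(v)^p\big|\,dv\Big)^{1/p} = \big[WD_{1,d_p}(F, G)\big]^{1/p}.$$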
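The inequality between the two Wasserstein distances can be illustrated on two small non-negative samples of equal size. Because x → x^p is increasing on the non-negative reals, pairing the order statistics is appropriate for both metrics. The sample values and function names are illustrative assumptions.

```python
def wd_p_d1(xs, ys, p):
    """Wasserstein distance of order p with metric |x - y| for equal-size samples."""
    a, b = sorted(xs), sorted(ys)
    return (sum(abs(u - v) ** p for u, v in zip(a, b)) / len(a)) ** (1.0 / p)


def wd_1_dp(xs, ys, p):
    """Wasserstein distance of order 1 with metric |x^p - y^p| for equal-size,
    non-negative samples."""
    if min(xs) < 0 or min(ys) < 0:
        raise ValueError("samples must be non-negative")
    a, b = sorted(xs), sorted(ys)
    return sum(abs(u ** p - v ** p) for u, v in zip(a, b)) / len(a)


xs = [0.0, 1.0, 2.0]
ys = [1.0, 1.0, 4.0]
p = 2

lhs = wd_p_d1(xs, ys, p)
rhs = wd_1_dp(xs, ys, p) ** (1.0 / p)
print(lhs, rhs)
```

Here the left-hand side is the square root of 5/3 and the right-hand side is the square root of 13/3, so the inequality holds with room to spare.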

Remark 11
This argument also shows that if F has finite p-moments and if $WD_{1,d_p}(F, G) < \infty$ (and a fortiori if $WD_{p,d_1}(F, G) < \infty$), then G also has finite p-moments. On the other hand, if both F and G have finite p-moments, then
$$WD_{1,d_p}(F, G) \le p \cdot WD_{p,d_1}(F, G)\,\big(1 + \|F^{-1}\|_p^{p-1} + \|G^{-1}\|_p^{p-1}\big)$$
(see Lemma 2.19 in [17]). Therefore, imposing conditions on $WD_{1,d_p}$ or on $WD_{p,d_1}$ leads to quite similar results.