1 Introduction

The function of the insurance business is to carry a customer's risk of loss in exchange for a fixed amount, called the premium. The premium has to be larger than the expected loss, since otherwise the insurance company faces ruin with probability one. The difference between the premium and the expected loss is called the risk premium. Several principles exist by which an insurance premium is calculated from the loss distribution.

Let X be a (non-negative) random loss variable. Traditionally, an insurance premium is a functional, \(\pi {:}\, \{ X\ge 0 \text { defined on } (\varOmega , \mathcal {F}, P) \} \rightarrow \mathbb {R}_{\ge 0}\). We will work with functionals that depend only on the distribution of the loss random variable (sometimes called the law-invariance or version-independence property, Young 2014). If X has distribution function F, we use the notation \(\pi (F)\) for the pertaining insurance premium and \(\mathbb {E}(F)\) for the expectation of F. We use the notation \(\pi (F)\) or \(\pi (X)\), resp. \(\mathbb {E}(F)\) or \(\mathbb {E}(X)\), whichever is more convenient. Throughout the paper, a more specific notation is used for particular cases of the premium.

We consider the following basic pricing principles:

  • The distortion principle (Denneberg 1990).

  • The certainty equivalence principle (Von Neumann and Morgenstern 1947).

  • The ambiguity principle (Gilboa and Schmeidler 1989).

  • Combinations of the previous (for instance Luan 2001).

1.1 The distortion principle

The distortion principle is related to the idea of stress testing. The original distribution function F is modified (distorted) and the premium is the expectation of the modified distribution. If \(g:\, [0,1] \rightarrow \mathbb {R}\) is a concave monotonically increasing function with the property \(g(0)=0\), \(g(1)=1\), then the distorted distribution \(F^{g}\) is given by

$$\begin{aligned} F^{g}(x)=1-g(1-F(x)). \end{aligned}$$

The function g is called the distortion function and

$$\begin{aligned} h(v)=g^{\prime }(1-v), \end{aligned}$$

with \(g^\prime \) being the derivative of g, is the distortion density.Footnote 1 Notice that h is a density in [0, 1]. We denote by \(H(u)=\int _0^u h(v) \, dv\) the distortion distribution. Since the assumptions imply that \(g(x) \ge x\) for \(0\le x \le 1\), \(F^g \le F\), i.e. \(F^g\) is first order stochastically larger than F.Footnote 2 The distortion premium is the expectation of \(F^{g}\)

$$\begin{aligned} \pi _h(F)=\int _{0}^{\infty }g(1-F(x))\,dx \ge \int _0^{\infty } (1- F(x)) \, dx = \mathbb {E}(X). \end{aligned}$$

By a simple integral transform, one may easily see that the premium can equivalently be written as

$$\begin{aligned} \pi _h(F)=\int _{0}^{1}F^{-1}(v)\,h(v)\,dv = \int _0^1 {\text {V@R}}_v(F) \, h(v)\, dv, \end{aligned}$$
(1)

where \({\text {V@R}}_v(F) = F^{-1}(v)\) is the quantile function. Note that a functional of this form is called an L-estimate (Huber 2011). If the random variable X also takes negative values, the premium can more generally be defined as a Choquet integral

$$\begin{aligned} \pi _h(F) = \int _{-\infty }^0 g(1-F(x)) - 1\, dx + \int _0^\infty g(1-F(x))\, dx. \end{aligned}$$
(2)

In principle, any monotonically increasing distortion function g satisfying \(g(u) \ge u\) is a valid basis for a distortion premium. However, the concavity of g guarantees that the pertaining distortion density h is increasing, which, in insurance applications, reflects the fact that putting aside risk capital gets more expensive for higher quantiles of the risk distribution. Nondecreasing distortion functions lead to non-negative distortion densities, with the consequence that

$$\begin{aligned} \pi _h(F_1) \le \pi _h(F_2) \qquad \hbox { whenever } F_2 \hbox { is stochastically larger than } F_1. \end{aligned}$$

Relaxing the monotonicity assumption for g would, in general, violate monotonicity w.r.t. the first stochastic order.
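For illustration, the representation (1) lends itself to direct numerical evaluation. The following sketch (plain Python, midpoint quadrature; the Exp(1) loss, the level \(\alpha =0.95\) and the \({\text {AV@R}}\)-type distortion density \(h(v)=\frac{1}{1-\alpha }\mathbb {1}_{v\ge \alpha }\) discussed in Sect. 1.2 are illustrative choices, and the function names are ours) approximates the premium and compares it with the closed form \(1-\log (1-\alpha )\) valid for Exp(1):

```python
import math

def distortion_premium(quantile, h, n=200_000):
    """Midpoint-rule approximation of (1): integral of F^{-1}(v) * h(v) over [0,1]."""
    total = 0.0
    for i in range(n):
        v = (i + 0.5) / n
        total += quantile(v) * h(v)
    return total / n

alpha = 0.95
h_avar = lambda v: 1.0 / (1.0 - alpha) if v >= alpha else 0.0  # AV@R distortion density
exp_quantile = lambda v: -math.log(1.0 - v)                    # quantile of an Exp(1) loss

premium = distortion_premium(exp_quantile, h_avar)
# for Exp(1) the closed form is 1 - log(1 - alpha), here about 3.996
```

As expected, the premium exceeds \(\mathbb {E}(X)=1\), in line with the inequality displayed above.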

1.2 Examples of distortion functions

Widely used distortion functions g resp. the pertaining distortion densities h are

  • the power distortion with exponent s. If \(0<s< 1\),

    $$\begin{aligned} g^{(s)}(v)=v^{s},\quad h^{(s)}(v)=s(1-v)^{s-1}. \end{aligned}$$
    (3)

    The premium is known as the proportional hazard transform (Wang 1995) and calculated as

$$\begin{aligned} \pi _{h^{(s)}}(F) = \int _0^\infty (1- F(x))^s \, dx = s\int _0^1 F^{-1}(v)(1-v)^{s-1} \, dv. \end{aligned}$$
    (4)

    If \(s\ge 1\), then we take

    $$\begin{aligned} g^{(s)}(v)= 1- (1-v)^s, \quad h^{(s)}(v) = s v^{s-1}. \end{aligned}$$
    (5)

    The premium is

$$\begin{aligned} \pi _{h^{(s)}}(F) = \int _0^\infty 1- F(x)^s \, dx = s\int _0^1 F^{-1}(v)v^{s-1} \, dv. \end{aligned}$$
    (6)

    If the exponent is an integer, the premium has a special representation.

Proposition 1

Let \(X^{(i)}\), \(i=1, \ldots , n\) be independent copies of the random variable X, then the power distortion premium with integer power s has the representation

$$\begin{aligned} \pi _{h^{(s)}} (X)= \mathbb {E}\left( \max \left\{ X^{(1)}, \ldots , X^{(s)}\right\} \right) . \end{aligned}$$

Proof

Let F be the distribution of X. The power distortion premium for integer power s is computed with \(g^{(s)}\) in (5) and by definition

$$\begin{aligned} \pi _{h^{(s)}}(F) =\int _0^\infty g^{(s)}(1-F(x))\, dx = \int _0^\infty 1 - F(x)^s \, dx. \end{aligned}$$

The assertion follows from the fact that the distribution function of the random variable \(\max \lbrace X^{(1)}, \ldots , X^{(s)}\rbrace \) is \(F(x)^s\). \(\square \)

Finally, notice that the distortion density is bounded for \(s\ge 1\), but unbounded for \(0<s<1\).
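Proposition 1 can also be checked by simulation. The sketch below (plain Python; the U(0,1) loss and \(s=3\) are convenience choices) compares a Monte Carlo estimate of \(\mathbb {E}(\max \{X^{(1)}, X^{(2)}, X^{(3)}\})\) with the closed form \(s/(s+1)\) that (6) yields for the uniform distribution:

```python
import random

random.seed(1)
s, n = 3, 200_000  # integer power, Monte Carlo sample size

# E(max of s independent copies), estimated by simulation
mc = sum(max(random.random() for _ in range(s)) for _ in range(n)) / n

# premium (6) for X ~ U(0,1): s * integral of v * v^(s-1) over [0,1] = s/(s+1)
exact = s / (s + 1)
```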

  • the Wang distortion or Wang transform (Wang 2000)

    $$\begin{aligned} g(v)=\varPhi \left( \varPhi ^{-1}(v)+\lambda \right) ,\qquad h(v)=\frac{\phi (\varPhi ^{-1}(1-v)+\lambda )}{\phi \left( \varPhi ^{-1}(1-v)\right) },\quad \lambda >0, \end{aligned}$$

    where \(\varPhi \) is the standard normal distribution and \(\phi \) its density.

  • the \({\text {AV@R}}\) (average value-at-risk) distortion function and density are

    $$\begin{aligned} g_\alpha (v)=\min \left\{ \frac{v}{1-\alpha } ,1\right\} ,\qquad h_\alpha (v)=\frac{1}{1-\alpha }\,\mathbb {1}_{v\ge \alpha }, \end{aligned}$$
    (7)

    where \(0\le \alpha <1\). The pertaining premium has different names, such as conditional tail expectation (CTE), CV@R (conditional value at risk) or ES (expected shortfall) (Embrechts et al. 1997). The premium is

    $$\begin{aligned} \pi _{h_\alpha }(F)=\int _{0}^{\infty }\min \left\{ \frac{1-F(x)}{1-\alpha },1\right\} \,dx=\frac{1}{1-\alpha }\int _{\alpha }^{1}F^{-1}(v)\,dv. \end{aligned}$$
    (8)
  • piecewise constant distortion densities. The insurance industry also uses piecewise constant, increasing distortion densities (equivalently, piecewise linear concave distortion functions). For example, the following distortion density is used by a large reinsurer.

    v                \(h\,(v)\)     v                \(h\,(v)\)
    [0, 0.85)        0.8443       [0.988, 0.992)   3.6462
    [0.85, 0.947)    1.1731       [0.992, 0.993)   4.0572
    [0.947, 0.965)   1.4121       [0.993, 0.996)   6.5378
    [0.965, 0.975)   1.7335       [0.996, 0.997)   12.7020
    [0.975, 0.988)   2.4806       [0.997, 1]       14.9436
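A distortion density must be non-negative, nondecreasing (for concave g) and integrate to one over [0, 1]. A quick check of the tabulated values (plain Python; the small deviation of the total mass from 1 is due to rounding in the published figures):

```python
# The reinsurer's piecewise constant distortion density from the table above,
# as (left endpoint, right endpoint, value) triples.
steps = [
    (0.000, 0.850, 0.8443), (0.850, 0.947, 1.1731),
    (0.947, 0.965, 1.4121), (0.965, 0.975, 1.7335),
    (0.975, 0.988, 2.4806), (0.988, 0.992, 3.6462),
    (0.992, 0.993, 4.0572), (0.993, 0.996, 6.5378),
    (0.996, 0.997, 12.7020), (0.997, 1.000, 14.9436),
]

total_mass = sum((b - a) * h for a, b, h in steps)  # should be close to 1
values = [h for _, _, h in steps]                   # should be increasing
```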

For more examples on different choices of h and also for different families of distributions, see Wang (1996) and Furman and Zitikis (2008).

1.3 Certainty equivalence principle

Let V be a convex, strictly monotonic disutility function.Footnote 3 The certainty equivalence premium is the solution of

$$\begin{aligned} V(\pi )=\mathbb {E}(V(X)), \end{aligned}$$

i.e. it is obtained by equating the disutility of the premium and the expected disutility of the loss. The premium is written as follows

$$\begin{aligned} \pi ^V(F)=V^{-1}\left( \mathbb {E}(V(X))\right) = V^{-1} \left( \int _0^1 V\left( F^{-1}(v)\right) \, dv \right) . \end{aligned}$$

By Jensen’s inequality \(\pi ^V(F) \ge \mathbb {E}(F)\). Examples of disutilities V are the power disutility \(V(x)=x^{s}\) for \(s \ge 1\) or the exponential disutility \(V(x)=\exp (x)\).
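A small numerical sketch of the certainty equivalence premium (plain Python, midpoint quadrature; the exponential disutility and a U(0,1) loss are illustrative choices, the function name is ours). For this combination the premium is \(\log \mathbb {E}(e^X) = \log (e-1) \approx 0.54\), which indeed exceeds \(\mathbb {E}(X)=0.5\) as Jensen's inequality requires:

```python
import math

def ceq_premium(V, V_inv, quantile, n=100_000):
    """Certainty equivalence premium V^{-1}( integral of V(F^{-1}(v)) over [0,1] ),
    approximated by the midpoint rule."""
    mean_disutility = sum(V(quantile((i + 0.5) / n)) for i in range(n)) / n
    return V_inv(mean_disutility)

# exponential disutility, X ~ U(0,1): premium = log(E e^X) = log(e - 1)
pi_ceq = ceq_premium(math.exp, math.log, lambda v: v)
```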

Related to this premium, one could dispense with the certainty equivalent transformation and simply charge the expected disutility (Borch 1961), obtaining

$$\begin{aligned} \pi (F) = \mathbb {E}(V(X)). \end{aligned}$$
(9)

For generalizations of the CEQ premium see Vinel and Krokhmal (2017).

1.4 The ambiguity principle

Let \(\mathfrak {F}\) be a family of distributions, which contains the “most probable” loss distribution F. The ambiguity insurance premium is

$$\begin{aligned} \pi ^{\mathfrak {F}}(F) = \sup \left\{ \mathbb {E}(G) : G \in \mathfrak {F} \right\} . \end{aligned}$$

\(\mathfrak {F}\) is called the ambiguity set. In an alternative, but equivalent notation, the ambiguity premium is given by

$$\begin{aligned} \pi ^{\mathcal {Q}}(X)=\max \left\{ \mathbb {E}_{Q}(X){:}\,Q\in \mathcal {Q}\right\} , \end{aligned}$$
(10)

where \(\mathcal {Q}\) is a family of probability models containing the baseline model P. The functional inside the maximization need not be the expectation; it can be more general, see e.g. Wozabal (2012), Wozabal (2014), Gilboa and Schmeidler (1989) and our Sect. 6.

Remark 1

In their seminal paper from 1989, Gilboa and Schmeidler (1989) give an axiomatic approach to extended utility functionals of the form

$$\begin{aligned} \min \left\{ \mathbb {E}_{Q}(U(Y)){:}\,Q\in \mathcal {Q}\right\} , \end{aligned}$$

where U is a utility function and Y is a profit variable. For the insurance case, U should be replaced by a disutility function V and Y should be replaced by a loss variable X leading to an equivalent expression

$$\begin{aligned} \max \left\{ \mathbb {E}_{Q}(V(X)){:}\,Q\in \mathcal {Q}\right\} . \end{aligned}$$

The link to (10) is obvious and it can be seen as a combination of expected disutility (9) and ambiguity.

Remark 2

Recall that the fundamental pricing formula for derivatives in financial markets states that the price can be obtained by taking the maximum of the discounted expected payoffs, where the maximum is taken over all probability measures which make the discounted price of the underlying a martingale. This can be seen as an ambiguity price.

The ambiguity premium is characterized by the choice of the ambiguity set \(\mathfrak {F}\). In principle, this set can be arbitrarily chosen as long as it contains F. Convex premium functionals have a dual representation, which is also of the form of an ambiguity functional. For distortion functionals, this will be illustrated in the next section. Other important examples of ambiguity premium prices can be defined through distances for probability distributions. Let D be such a distance; then an ambiguity set is given by

$$\begin{aligned} \mathfrak {F}=\left\{ G:\,D(F,G)\le \epsilon \right\} , \end{aligned}$$

with ambiguity premium

$$\begin{aligned} \pi ^\epsilon _{D}(F) = \max \left\{ \mathbb {E}(G) : D(F,G) \le \epsilon \right\} . \end{aligned}$$

We call \(\epsilon \) the ambiguity radius. This radius quantifies not only the risk premium, but also the model uncertainty, since the real distribution is typically not exactly known and all we have is a baseline model F. In our Sect. 6 we base ambiguity models on the Wasserstein distance WD.

1.5 Combined models

Luan (2001) introduced a combination of distortion and certainty equivalence premium prices by defining a variable W distributed according to \(F^g\) and setting

$$\begin{aligned} \pi ^V_h(F) = V^{-1}(\mathbb {E}[V(W)]) = V^{-1} \left( \int _0^1 V\left( F^{-1}(v) \right) h(v) \, dv \right) . \end{aligned}$$

Notice that \((F^g)^{-1} (v) = F^{-1}(1-g^{-1}(1-v))\).
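The combined premium \(\pi ^V_h\) can be evaluated in the same spirit as before (plain Python, midpoint quadrature; the exponential disutility, the \({\text {AV@R}}\) distortion with \(\alpha =0.9\) and a U(0,1) loss are illustrative choices, with the closed form \(\log ((e-e^{\alpha })/(1-\alpha ))\) as benchmark):

```python
import math

def combined_premium(V, V_inv, quantile, h, n=200_000):
    """Luan's combination: V^{-1}( integral of V(F^{-1}(v)) h(v) over [0,1] ),
    approximated by the midpoint rule."""
    acc = 0.0
    for i in range(n):
        v = (i + 0.5) / n
        acc += V(quantile(v)) * h(v)
    return V_inv(acc / n)

alpha = 0.9
h_avar = lambda v: 1.0 / (1.0 - alpha) if v >= alpha else 0.0
pi_combined = combined_premium(math.exp, math.log, lambda v: v, h_avar)
# closed form for U(0,1): log( (e - e^alpha) / (1 - alpha) ), about 0.9504
```

Note that the combined premium exceeds the pure distortion premium (here 0.95, the mean of [0.9, 1]), again by Jensen's inequality.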

More generally, one may also add ambiguity with respect to the model and set

$$\begin{aligned} \pi ^{V,\, \epsilon }_{h}(F) = \sup \bigg \lbrace V^{-1} \left( \int _0^1 V\left( G^{-1}(v)\right) \, h(v) \, dv \right) : D(F,G) \le \epsilon \bigg \rbrace . \end{aligned}$$
(11)

Notice that (11) contains all previous definitions as special cases, obtained by some of the following parameter settings

$$\begin{aligned} h(v)= 1, \quad V(v)=v, \quad \epsilon =0. \end{aligned}$$

If all three parameters are set like that, we recover the expectation.

We could also consider the expected disutility premium (9) and combine it with the distortion premium,

$$\begin{aligned} \int _0^1 V(F^{-1}(v))\, h(v) \,dv = \mathbb {E}[V(W)]. \end{aligned}$$

Section 6 will be dedicated to study the combination of distortion and ambiguity premium prices.

As to notation, for \(p\ge 1\) we denote by \(\mathcal {L}^p\) the space of all random variables with finite p-norm

$$\begin{aligned} \Vert X \Vert _p = [\mathbb {E}(|X|^p)]^{1/p}, \end{aligned}$$

resp. \(\Vert X \Vert _\infty = \hbox {ess sup } (|X|)\), the essential supremum. The same notation is used for any real-valued function on [0, 1]; p and q are called conjugates if \(1/p + 1/q =1\).

2 The distortion premium and generalizations

The characterization and representations of the distortion premium have been studied exhaustively. Among the most classic contributions we mention the dual theory of Yaari (1987) and the axiomatic characterization of this premium developed in Wang et al. (1997), where the power distortion for \(0<s<1\) is also characterized in a unique manner. A summary of other known representations and new generalizations of this premium will be presented below. Recall that any mapping \(X \mapsto \pi (X)\) which is monotone, convex and fulfils translation equivarianceFootnote 4 is a risk measure. Furthermore, if \(\pi \) is also positively homogeneous, monotonic w.r.t. the first stochastic order and subadditiveFootnote 5, then it is a coherent risk measure (Artzner et al. 1999). The distortion premium fulfils all these properties; therefore, by the Fenchel–Moreau–Rockafellar theorem, it has a dual representation.

Theorem 1

(see Pflug 2006) The dual representation of the distortion premium with distortion density h is given by

$$\begin{aligned} \pi _h(X) = \sup \{ \mathbb {E}(X \cdot Z) : Z= h(U), \quad \hbox {where}\ U \hbox { is uniformly distributed on } [0,1] \}. \end{aligned}$$

Note that all admissible Z's in Theorem 1 are densities on [0, 1], since \(h\ge 0\) and \(\mathbb {E}(h(U))=1\). To put it differently: let X be defined on \((\varOmega , \mathcal {F}, P)\) and let \(\mathcal {Q}\) be the set of all probability measures Q on \((\varOmega , \mathcal {F})\) such that the density \(\frac{dQ}{dP}\) has distribution function H, the distortion distribution; then

$$\begin{aligned} \pi _h(X) = \sup \{ \mathbb {E}_Q (X) : Q \in \mathcal {Q} \}. \end{aligned}$$

Therefore, every distortion premium can be seen as well as an ambiguity premium with \(\mathcal {Q}\) as the ambiguity set.

Let us look in more detail at the special case of the \({\text {AV@R}}\) premium. In this case, the dual representation specializes to

$$\begin{aligned} \pi _{h_\alpha }(X) = \sup \left\{ \mathbb {E}(X \cdot Z) : 0\le Z \le \frac{1}{1-\alpha };\, \mathbb {E}(Z)=1 \right\} . \end{aligned}$$

From this representation we can see that the \({\text {AV@R}}\)-distortion densities \(h_{\alpha }\) are the extreme points of the convex set of all distortion densities. This fact implies that any distortion premium can be represented as a mixture of \({\text {AV@R}}\)’s; such representations are called Kusuoka representations (Kusuoka 2001; Jouini et al. 2006). Coherent risks have a Kusuoka representation of the form

$$\begin{aligned} \pi (F) = \sup _{K\in \mathcal {K}} \int _0^1 {\text {AV@R}}_\alpha (F) \, dK(\alpha ), \end{aligned}$$

where \(\mathcal {K}\) is a collection of probability measures in [0, 1]. In particular, for the distortion premium we have the following result (Pflug and Römisch 2007).

Theorem 2

Any distortion premium can be written as

$$\begin{aligned} \pi _h(F) = \int _0^1 {\text {AV@R}}_\alpha (X) \, dK(\alpha ). \end{aligned}$$

The mixture distribution K is determined by the way in which h is represented as a mixture of the \({\text {AV@R}}\)-distortion densities, i.e.

$$\begin{aligned} h(v) = \int _0^v \frac{1}{1-\alpha } \, dK(\alpha ). \end{aligned}$$

The pure \({\text {AV@R}}_\beta \) is contained in this class by setting \(K = \delta _\beta \), the Dirac measure at \(\beta \). Moreover, the uniform mixture of the \({\text {AV@R}}\)’s is obtained for \(K(\alpha ) = \alpha \) and is given by

$$\begin{aligned} \int _0^1 {\text {AV@R}}_\alpha (F) \, d\alpha = \int _0^1 F^{-1}(v) \, \left[ -\log (1-v) \right] \, dv, \end{aligned}$$

if the integral exists.
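For \(K(\alpha )=\alpha \) and an Exp(1) loss, both sides of the last identity equal 2 (using the closed form \({\text {AV@R}}_\alpha = 1-\log (1-\alpha )\) for Exp(1)), which the following sketch verifies numerically (plain Python, midpoint quadrature; the distribution choice is ours):

```python
import math

n = 400_000
grid = [(i + 0.5) / n for i in range(n)]

# left-hand side: integral over alpha of AV@R_alpha(F),
# with the Exp(1) closed form AV@R_alpha = 1 - log(1 - alpha)
lhs = sum(1.0 - math.log(1.0 - a) for a in grid) / n

# right-hand side: integral of F^{-1}(v) * (-log(1-v)) with F^{-1}(v) = -log(1-v)
rhs = sum(math.log(1.0 - v) ** 2 for v in grid) / n

# both integrals equal 2 exactly for Exp(1)
```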

Remark 3

Some other generalizations of the distortion premium were studied in Greselin and Zitikis (2018), who consider the class of functionals

$$\begin{aligned} \int _0^1 \nu ({\text {AV@R}}_\alpha (X), {\text {AV@R}}_0(X))\, d\alpha , \end{aligned}$$

with \(\nu (\cdot ,\cdot )\) an integrable function, and show that the Gini index and the Bonferroni index belong to this class. These generalizations lead to inequality measures instead of risk measures.

As a related generalization of the distortion premium one may consider

$$\begin{aligned} R(X)=\int _0^1 \nu ({\text {AV@R}}_\alpha (X)) \, k(\alpha ) \, d\alpha , \end{aligned}$$
(12)

for some convex and monotonic Lipschitz function \(\nu \) and some non-negative function k on [0, 1]. Clearly, R(X) is convex and monotonic, but in general it is neither positively homogeneous nor translation equivariant, unless \(\nu \) is the identity (see “Appendix” section for a proof). To our knowledge, functionals of the form (12) are not used in the insurance sector. For this and other generalizations see the papers of Goovaerts et al. (2004) and Furman and Zitikis (2008).

3 Continuity of the premium w.r.t. the Wasserstein distance

In this section we study sensitivity properties of the distortion premium with respect to the underlying distribution. Some results in this section are related to those in Pichler (2013), Pflug and Pichler (2014) and Kiesel et al. (2016). Similar continuity results for variability measures are studied in Furman et al. (2017). To start, we recall the notion of the Wasserstein distance.

Definition 1

Let \((\varOmega ,d)\) be a metric space and P, \(\tilde{P}\) be two Borel probability measures on it. Then the Wasserstein distance of order \(r\ge 1\) is defined as

$$\begin{aligned} WD_{r,d} (P,\tilde{P})= \left( {\mathop {\mathop {\inf }\limits _{X\sim P}}\limits _{Y\sim \tilde{P}}} \mathbb {E}\left( d(X,Y)^r\right) \right) ^{1/r}. \end{aligned}$$

Here the infimum is over all joint distributions of the pair (XY), such that the marginal distributions are P resp. \(\tilde{P}\), i.e. \(X\sim P\), \(Y \sim \tilde{P}\).

For two distributions F and G on the real line endowed with the metric

$$\begin{aligned} d_1(x,y)= |x-y|, \end{aligned}$$

this definition specializes to (see Vallender 1974)

$$\begin{aligned} WD_{1,d_1}(F,G)= & {} \int _{-\infty }^\infty |F(x)-G(x)|\,dx = \int _0^1 |F^{-1}(v) - G^{-1}(v)| \,dv. \end{aligned}$$

Therefore, the Wasserstein distance is the (absolute) area between the distribution functions which is also the (absolute) area between the inverse distributions. By a similar argument one may prove that the Wasserstein distance of order \(r\ge 1\) with the \(d_1\) metric on the real line is

$$\begin{aligned} WD^r_{r,d_1}(F,G) = \int _{0}^1 |F^{-1}(v)-G^{-1}(v)|^r\,dv = \Vert F^{-1}-G^{-1}\Vert _r^r. \end{aligned}$$
(13)
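For two empirical distributions of equal size, formula (13) with \(r=1\) reduces to an average of absolute differences of order statistics; a minimal sketch (plain Python; the function name and the four-point sample are ours):

```python
def wd1(xs, ys):
    """WD_{1,d_1} between two equal-size empirical distributions:
    the average absolute difference of the order statistics (cf. (13) with r = 1)."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

sample = [1.0, 2.0, 5.0, 9.0]
shifted = [x + 3.0 for x in sample]
# shifting every observation by c moves the distribution by exactly c in WD_1
```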

We now study continuity properties of the functional \(F \mapsto \pi _h(F)\).

Proposition 2

(Continuity for bounded distortion densities) Let F and G be two distributions on the real line and h a distortion density function. If the distributions have both finite first moments and h is bounded, then

$$\begin{aligned} \left| \pi _h(F) - \pi _h(G) \right| \le || h ||_\infty \cdot WD_{1,d_1} (F,G). \end{aligned}$$

Proof

See Pichler (2010). \(\square \)

Remark 4

The boundedness of h is ensured if g has a finite right hand side derivative at 0, and also if g has finite Lipschitz constant L, since \(\Vert h\Vert _\infty \le L\).

Proposition 2 can be easily generalized as follows.

Proposition 3

(Continuity for distortion densities in \(\mathcal {L}^q\) for \(q<\infty \)) Let F and G be two distributions on the real line and h a distortion density function. If F, G have finite p-moments and \(h\in \mathcal {L}^q\), then

$$\begin{aligned} \left| \pi _h(F) - \pi _h(G) \right| \le || h ||_q \cdot WD_{p,d_1} (F,G), \end{aligned}$$

where p and q are conjugates.

Proof

By Hölder’s inequality for p and q we obtain

$$\begin{aligned} | \pi _h(F) - \pi _h(G)|&= \left| \int _0^1 h(v) \cdot \left( F^{-1}(v)- G^{-1}(v)\right) \, dv\right| \\&\le \left( \int _0^1 \left| h(v) \right| ^q \,dv \right) ^{1/q} \cdot \left( \int _0^1 \left| F^{-1}(v)- G^{-1}(v) \right| ^p \, dv \right) ^{1/p} \\&\le || h ||_q \cdot WD_{p,d_1} (F,G). \end{aligned}$$

\(\square \)

Example 1

Let F and G be two distributions with finite first moments.

  • For the \({\text {AV@R}}\) distortion premium \( ||h_\alpha ||_\infty = \frac{1}{1-\alpha }\), and therefore

    $$\begin{aligned} | \pi _{h_\alpha }(F) - \pi _{h_\alpha }(G)|\le \frac{1}{1-\alpha } \cdot WD_{1,d_1} (F,G) . \end{aligned}$$
  • For the power distortion with \(s\ge 1\), \( ||h^{(s)}||_\infty = s\), and therefore

    $$\begin{aligned} | \pi _{h^{(s)}}(F) - \pi _{h^{(s)}}(G)|\le s\cdot WD_{1,d_1} (F,G) . \end{aligned}$$
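The first bound of Example 1 is easy to verify on empirical distributions (plain Python; the two four-point samples, the level \(\alpha =0.5\) and the helper functions are ours, and the empirical \({\text {AV@R}}\) sketch assumes \(n(1-\alpha )\) is an integer):

```python
def avar(xs, alpha):
    """Empirical AV@R: mean of the top (1-alpha) fraction of the sorted sample."""
    xs = sorted(xs)
    k = round(len(xs) * (1.0 - alpha))
    return sum(xs[-k:]) / k

def wd1(xs, ys):
    """WD_1 of two equal-size empirical distributions via order statistics."""
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

alpha = 0.5
F = [0.0, 1.0, 4.0, 7.0]
G = [0.5, 1.0, 6.0, 9.0]

gap = abs(avar(F, alpha) - avar(G, alpha))   # |pi(F) - pi(G)|
bound = wd1(F, G) / (1.0 - alpha)            # Lipschitz bound of Example 1
```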

The power distortion density with \(0<s<1\) is unbounded. The next result is dedicated to this particular case.

Proposition 4

(Continuity for the power distortion with \(0<s<1\)) Let F and G be distribution functions and \(h^{(s)}\) the distortion density defined in (3). If F and G have finite p-moments for \(p>\frac{1}{s}\), then \(h^{(s)}\in \mathcal {L}^q\) and

$$\begin{aligned} | \pi _{h^{(s)}}(F) - \pi _{h^{(s)}}(G)| \le \frac{ s}{ \root q \of {1+q\, (s-1)} } \cdot WD_{p,d_1} (F,G), \end{aligned}$$

where p and q are conjugates.

Proof

We first note that \(p>\frac{1}{s}\) implies \(q< \frac{1}{1-s} \); let \(t=1+q\, (s-1)>0\). Then

$$\begin{aligned} \left( \int _0^1 h^{(s)}(v)^q \, dv \right) ^{1/q}&= \left( \int _0^1 s^q \cdot (1-v)^{q\cdot (s-1)} \, dv \right) ^{1/q} \\&= \left( \int _0^1 s^q \cdot (1-v)^{t-1} \, dv \right) ^{1/q} \\&= \frac{s}{ \root q \of {t} } \cdot \left( \int _0^1 t\,(1-v)^{t-1} \, dv\right) ^{1/q} = \frac{s}{ \root q \of {t} }. \end{aligned}$$

The statement now follows from Proposition 3. \(\square \)

The next result is a direct consequence of Proposition 4.

Corollary 1

(Continuity for distortion densities dominated by power distortion densities with \(0<s<1\)) Let F and G be distribution functions and h a distortion density. If \(h(v)\le c\cdot h^{(s)}(v)\) for all \(v\in [0,1]\), with \(c>0\) and \(0<s<1\), and F and G have finite p-moments for \(p>\frac{1}{s}\), then \(h\in \mathcal {L}^q\) and

$$\begin{aligned} | \pi _h(F) - \pi _h(G)| \le \frac{c\cdot s}{ \root q \of {1+q\, (s-1)} } \cdot WD_{p,d_1}(F,G) , \end{aligned}$$

where p and q are conjugates.

Corollary 2

(Convergence) If \(F, F_n\) for all \(n\ge 1\) have finite uniformly bounded p-moments, \(h\in \mathcal {L}^q\) and \(WD_{p,d_1} (F_n,F) \rightarrow 0\) as \(n\rightarrow \infty \), then

$$\begin{aligned} \left| \pi _h(F) - \pi _h(F_n) \right| \xrightarrow [n \rightarrow \infty ]{} 0, \end{aligned}$$

where p and q are conjugates.

Remark 5

Corollary 2 holds when the sequence of distributions are the empirical distributions \(\widehat{F}_n\) defined on an i.i.d. sample of size n, \((x_1, \ldots , x_n)\) from \(X\sim F\). If F has finite p-moments, then \(WD_{p,d_1} (\widehat{F}_n,F) \xrightarrow [n \rightarrow \infty ]{} 0\), hence \( \left| \pi _h(\widehat{F}_n) - \pi _h(F) \right| \xrightarrow [n \rightarrow \infty ]{} 0\). This result follows by applying Lemma 4.1 in Pflug and Pichler (2014).

Finally, notice that, for continuity, the order r of the Wasserstein distance coincides with the number of finite moments required of F.

3.1 Partial coverage

Many insurance contracts do not guarantee complete indemnity; their payoff is just a part of the full damage. Such contracts include proportional insurance, deductibles and capped insurance. In general, there is a (monotonic) payoff function T such that the payoff is T(X) if the total loss is X. A quite flexible form is, for instance, the excess-of-loss insurance (XL insurance), which has the payoff function

$$\begin{aligned} T(x)=\left\{ \begin{array}{ll} 0 &{} \quad \hbox {if}\quad x \le a\\ x-a &{} \quad \hbox {if}\quad a \le x \le e \\ e-a &{} \quad \hbox {if}\quad x \ge e. \end{array} \right. \end{aligned}$$
(14)

Denote by \(F^T\) the distribution of T(X), if F is the distribution of X. The distortion premium for partial coverage is \(\pi _h(F^T)\). We study the relationship between \(F^T\) and \(G^T\) as well as between \(\pi _h(F^T)\) and \(\pi _h(G^T)\) in a slightly more general setup, namely for Hölder continuous T. Recall that T is Hölder continuous with constant \(H_\beta \) if \(|T(x)-T(y)|\le H_\beta \cdot |x-y|^\beta \) for some \(0<\beta \le 1\).
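The payoff function (14) in code form (plain Python; the deductible \(a=100\) and cap \(e=500\) are illustrative values, the function name is ours). Since T is 1-Lipschitz, it is \(\beta \)-Hölder with \(\beta =1\) and \(H_1=1\):

```python
def xl_payoff(x, a, e):
    """Excess-of-loss payoff (14): nothing below the deductible a,
    the excess x - a in between, capped at e - a above the exit point e."""
    return min(max(x - a, 0.0), e - a)

a, e = 100.0, 500.0
```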

Theorem 3

(Distance between the original and image probabilities by T) Let P and Q be two probability measures and consider their image probabilities under T denoted by \(P^T\) and \(Q^T\), respectively. If T is a \(\beta \)-Hölder continuous mapping, then

$$\begin{aligned} WD_{r_\beta , d_1} \left( P^T, Q^T\right) \le H_\beta \cdot WD_{r,d_1}^\beta (P,Q), \end{aligned}$$

for \(r_\beta =\frac{r}{\beta }\ge 1\) and \(r\ge 1\), where \(H_\beta \) is the \(\beta \)-Hölder constant.

Proof

Let the joint distribution of X and Y be such that

$$\begin{aligned} WD_{r,d_1}(X,Y) = \mathbb {E}^{1/r}\left( |X-Y|^r\right) , \end{aligned}$$

then

$$\begin{aligned} WD^{ r_\beta }_{r_\beta , d_1} (P^T, Q^T)&\le \mathbb {E}(|T(X) - T(Y)|^{r_\beta }) \\&\le H_\beta ^{r_\beta } \cdot \mathbb {E}(|X - Y|^r) = H_\beta ^{r_\beta } \cdot WD^{r}_{r,d_1} (P,Q). \end{aligned}$$

Taking the \(r_\beta \) root on both sides finishes the proof. \(\square \)

For the XL-insurance, the Hölder-constant is a Lipschitz constant (\(\beta =1\)) and has the value 1.

From the previous theorem we can conclude that if two probabilities are close, then their image probabilities under a mapping T with the characteristics of Theorem 3 are close in Wasserstein distance as well. Theorem 3 isolates the argument also used in Theorem 3.31 of Pflug and Pichler (2014). Note that the underlying distances for the Wasserstein distances are the metrics of the respective spaces.

Corollary 3

Let F, G be two distributions given by the probabilities P and Q, respectively, and let \(F^T, G^T\) be their image distributions under T. If T is a \(\beta \)-Hölder continuous mapping with constant \(H_\beta \), \(h\in \mathcal {L}^q\), and the distributions \(F^T\), \(G^T\) have finite p-moments, then for \(r=p\cdot \beta \) (\(r\ge 1\)) the distortion premium with payment function T satisfies

$$\begin{aligned} | \pi _{h}\left( F^T\right) - \pi _{h}\left( G^T\right) |\le || h||_q \cdot WD_{p,d_1}\left( P^T,Q^T\right) \le || h||_q \cdot H_\beta \cdot WD_{r,d_1}^{\beta }(P,Q).\nonumber \\ \end{aligned}$$
(15)

We proceed now to study sensitivity properties of the distortion premium w.r.t. the distortion density.

4 Continuity of the premium w.r.t. the distortion density

Previously, we studied the mapping \(F \mapsto \pi _h(F)\) for fixed h. In this section, we present properties of the mapping \(h \mapsto \pi _h(F)\) for fixed F. Different sensitivity properties w.r.t. the distortion parameters were studied in Gourieroux and Liu (2006).

Proposition 5

(Continuity of the distortion premium w.r.t. the distortion density h) Let F be a distribution and consider two different distortion densities \(h_1, \, h_2\). If F has finite p-moments and \(h_1, h_2\in \mathcal {L}^q\), then

$$\begin{aligned} \left| \pi _{h_1}(F) - \pi _{h_2}(F) \right| \le ||F^{-1}||_p \cdot || h_1-h_2 ||_q , \end{aligned}$$

where p and q are conjugates. Here the choices \(p=1\), \(q=\infty \) and \(p=\infty \), \(q=1\) are included.

Proof

Apply Hölder’s inequality; the result is immediate. \(\square \)

We can conclude that if \(h_1\) and \(h_2\) are close, then the premium prices are close as well. Conversely, h is always identifiable from the premium prices, by the following proposition.

Proposition 6

If \(\pi _{h_1}(F) = \pi _{h_2}(F)\) for all distribution functions F (the value \(\infty \) is not excluded), then

$$\begin{aligned} h_1(v) = h_2(v) \text{ a.s. } \end{aligned}$$

Proof

Let \(F_a\) be the distribution which takes the value 0 with probability a and the value 1 with probability \(1-a\), for some \(a\in (0,1)\), then its inverse \(F_a^{-1}\) is the indicator function of the interval [a, 1]. Hence,

$$\begin{aligned} \pi _{h_1}(F_a) = \int \mathbb {1}_{[a,1]}(v) \, h_1(v) \, dv = \int _a^1 h_1(v) \, dv = \pi _{h_2}(F_a) =\int _a^1 h_2(v) \, dv. \end{aligned}$$

Thus, the distortion distributions \(H_1\) and \(H_2\) are equal and therefore \(h_1 = h_2\) almost surely.

\(\square \)

Remark 6

Note that the previous proposition remains true if the family of distributions on which the premium prices coincide contains all Bernoulli variables. Compare also Theorem 2 in Wang et al. (1997).

Remark 7

Another family with the property that its premium prices determine the distortion in a unique manner is the family of power distributions of the form \(F_{\gamma }(u)=u^\gamma \) on [0, 1] and, more generally, of the form \(F_{\gamma , \beta }(u)=\beta ^{-\gamma }u^\gamma \) on \([0, \beta ]\). The distortion premium prices for this family are

$$\begin{aligned} \int _0^1 \beta \, v^{1/\gamma } \, h(v) \, dv, \end{aligned}$$

and the uniqueness of h and \(\beta \) is obtained since

$$\begin{aligned} \beta = \lim _{\gamma \rightarrow \infty } \int _0^1 \beta v^{1/\gamma } \, h(v) \, dv , \end{aligned}$$

and the inversion formula for the Mellin transform (see Zwillinger 2002).

5 Estimating the distortion density from observations

The way in which insurance companies calculate a premium is typically not revealed to the customer. Notice that risk premia appear not only in the insurance business; see the link between insurance premium prices and asset pricing in Nguyen et al. (2012). Risk premia appear in other areas such as

  • Power future markets A future contract fixes the price today for delivery of energy later. There is the risk of price changes between now and the delivery period. Thus, such a contract has the character of an insurance and the pricing principles apply, although the price is found in exchange markets (e.g. electricity future markets).

  • Exotic options While standard options are priced through a replication strategy argument, this argument does not apply for other types of options and these options have the character of insurance contracts. Pricing of such contracts is often done over the counter, but again the pricing principle is not revealed to the counterparty.

  • Credit derivatives Also these contracts carry the character of insurance and can be priced according to insurance price principles.

In this section we assume that we know the distortion premium prices of m contracts, which are all priced with the same distortion density h. For each contract j, we also have a sample \(x_1^{(j)}, \ldots , x_n^{(j)}\) of size n drawn from the loss distribution of this contract at our disposal. For simplicity we assume that n is the same for all contracts, but this is not crucial.

The goal of this section is to show how the distortion density h can be regained from observations of the insurance prices, which helps to shed more light on the price formation of contract counterparties. Notice that our aim is not to estimate the distortion premium prices from empirical data, as is done in Gourieroux and Liu (2006) or Tsukahara (2013).

A simulation example As an example we consider m different loss distributions, all of Gamma type. From each distribution, we obtain a sample of size n. For each sample, we calculate the \({\text {AV@R}}\) and power distortion premium prices. Based on the prices obtained and our samples, we aim to recover the distortion density h. We denote the ordered sample from the j-th loss distribution by \(x^{(j)}_{[1]} , \ldots , x^{(j)}_{[n]}\). The distortion premium, with distortion density h for each sample \(j=1, \ldots , m\), is

$$\begin{aligned} \pi ^{(j)} = \sum _{i=1}^n x^{(j)}_{[i]}\int _{\frac{i-1}{n}}^{\frac{i}{n}} h(v)\, dv =\sum _{i=1}^n x^{(j)}_{[i]} \left( H\left( \frac{i}{n}\right) - H\left( \frac{i-1}{n} \right) \right) . \end{aligned}$$
(16)
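As a numerical illustration of (16), the following sketch computes the discrete distortion premium of an ordered sample. The callable `H` (the antiderivative of the distortion density h) and the Gamma parameters are our own illustrative choices, not taken from the text.

```python
import numpy as np

def distortion_premium(sample, H):
    """Discrete distortion premium (16): sum_i x_[i] * (H(i/n) - H((i-1)/n))."""
    x = np.sort(np.asarray(sample, dtype=float))   # ordered sample x_[1] <= ... <= x_[n]
    n = len(x)
    weights = np.diff(H(np.arange(n + 1) / n))     # H(i/n) - H((i-1)/n)
    return float(np.dot(x, weights))

# Sanity check: with H(u) = u (no distortion) the premium is the sample mean.
rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.0, scale=1.5, size=1000)
premium = distortion_premium(sample, lambda u: u)
```

Since any concave distortion puts more weight on the upper order statistics, passing a concave `H` (e.g. the antiderivative of an increasing density) yields a premium at least as large as the mean.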

In the following, we develop (16) for the particular cases of \({\text {AV@R}}\) and power distortion premium prices for each sample \(j=1, \ldots , m\).

AV@R distortion premium The price for \(h_{ \alpha }\) defined in (7) is

$$\begin{aligned} \pi ^{(j)}= \frac{1}{n\, (1-\alpha )} \cdot \sum _{i=i_\alpha }^n x^{(j)}_{[i]}, \end{aligned}$$
(17)

where \(1<i_\alpha \le n\) is such that \( \frac{i_\alpha -1}{n}\le \alpha < \frac{i_\alpha }{n}\).

Power distortion premium The price given by the power distortion \(h^{(s)}\) defined in (3) with \(0<s<1\) is

$$\begin{aligned} \pi ^{(j)} = \sum _{i=1}^n x^{(j)}_{[i]}\cdot \left( \left( 1-\frac{i-1}{n}\right) ^s - \left( 1- \frac{i}{n}\right) ^s \right) , \end{aligned}$$
(18)

and the price given by \(h^{(s)}\) defined in (5) with \(s\ge 1\) is

$$\begin{aligned} \pi ^{(j)} = \sum _{i=1}^n x^{(j)}_{[i]} \cdot \left( \left( \frac{i}{n}\right) ^s - \left( \frac{i-1}{n}\right) ^s \right) . \end{aligned}$$
(19)
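The three sample prices (17)–(19) can be sketched as follows. The helper names `avar_premium` and `power_premium` and the Gamma sample are hypothetical, introduced only for illustration.

```python
import numpy as np

def avar_premium(sample, alpha):
    """AV@R price (17): average of the upper empirical tail above level alpha."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    i_alpha = int(np.floor(n * alpha)) + 1         # (i_alpha - 1)/n <= alpha < i_alpha/n
    return float(x[i_alpha - 1:].sum() / (n * (1 - alpha)))

def power_premium(sample, s):
    """Power-distortion price: (18) for 0 < s < 1, (19) for s >= 1."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    if s < 1:
        w = (1 - (i - 1) / n) ** s - (1 - i / n) ** s   # weights of (18)
    else:
        w = (i / n) ** s - ((i - 1) / n) ** s           # weights of (19)
    return float(np.dot(x, w))

rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.0, scale=1.5, size=1000)
```

Both prices load the tail: for \(s=1\) the power premium reduces to the sample mean, while larger s (or \(\alpha \) closer to 1) puts more weight on the largest observations.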

The inverse problem consists of estimating the distortion density h from observed prices. Recall that among the examples of common distortion densities we presented there were both step functions and continuous functions; therefore, we will use step functions and splines in order to estimate h. We do so for the prices obtained in (17)–(19).

5.1 Estimation of the distortion density with a step function

Distortion density as a step function Let \(\widehat{h}^1_l\) denote the step function consisting of l equal-size steps, defined as

$$\begin{aligned} \widehat{h}^1_l(v)= \sum _{k=1}^l \lambda _k \cdot I_{\left[ L\cdot \frac{k-1}{n}, L\cdot \frac{k}{n}\right) }(v)=\sum _{k=1}^l \lambda _k \cdot I_{\left[ \frac{k-1}{l},\frac{k}{l}\right) }(v), \end{aligned}$$
(20)

where \(L= n/l\), \( \lambda _k\in \mathbb {R}\) for \(k=1, \ldots , l\), and l denotes the dimension of the step function space. We also impose

$$\begin{aligned} \int _0^1 \widehat{h}^1_l (v) \, dv=\sum _{k=1}^l \int _{ \frac{k-1}{l}}^{ \frac{k}{l}} \lambda _k \,dv = \frac{1}{l}\cdot \sum _{k=1}^l \lambda _k = 1, \end{aligned}$$
(21)

with \(0\le \lambda _1\le \cdots \le \lambda _l. \) In this way, \(\widehat{h}^1_l\) fulfils the density constraints as well as the non-decreasing constraints.
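A minimal sketch of the constraints (20)–(21), with hypothetical step heights chosen by us:

```python
import numpy as np

l = 8
lam = np.array([0, 0, 0, 0, 0.4, 0.8, 2.0, 4.8])   # hypothetical heights lambda_k

# (21): the average height is 1, so the step density integrates to one,
# and the heights are non-negative and non-decreasing.
assert abs(lam.sum() / l - 1.0) < 1e-12
assert lam[0] >= 0 and np.all(np.diff(lam) >= 0)

def h_step(v):
    """Evaluate the step density (20) at v in [0, 1]."""
    idx = np.minimum((np.asarray(v) * l).astype(int), l - 1)
    return lam[idx]

# heights at v = 0.05, 0.6, 0.95 are 0.0, 0.4 and 4.8 respectively
```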

Prices with the step function For each sample \(j=1, \ldots , m\), the prices with \(\widehat{h}^1_l\) are

$$\begin{aligned} \widehat{\pi }^{(j)} = \sum _{i=1}^n x_{[i]}^{(j)} \cdot \int _{\frac{i-1}{n}}^{\frac{i}{n}} \widehat{h}^1_l (v) \, dv =\sum _{k=1}^l \sum _{i = (k-1)L + 1}^{L\cdot k} x_{[i]}^{(j)} \cdot \int _{\frac{i-1}{n}}^{\frac{i}{n}} \lambda _k \, dv = \sum _{k=1}^l \frac{\lambda _k}{n} \cdot \sum _{i = (k-1)L + 1}^{L\cdot k} x_{[i]}^{(j)} . \end{aligned}$$
(22)

Estimation In order to estimate \(\widehat{h}^1_l\), we minimize the squared differences between the given prices \(\pi ^{(j)}\), calculated in (17), (18) and (19), and the prices \(\widehat{\pi }^{(j)}\) obtained with \(\widehat{h}^1_l\) in (22). We solve

$$\begin{aligned} (P_1){:}\qquad \min _{\lambda _1, \ldots , \lambda _l}\ \sum _{j=1}^m \left( \pi ^{(j)} - \widehat{\pi }^{(j)}\right) ^2 \qquad \text {s.t.} \quad \frac{1}{l} \sum _{k=1}^l \lambda _k = 1, \quad 0\le \lambda _1\le \cdots \le \lambda _l. \end{aligned}$$
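The least-squares problem described above can be sketched numerically, assuming SciPy's SLSQP solver is acceptable; the numbers m, n, l, the Gamma shapes and the AV@R level are illustrative choices of ours, not taken from the text.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
m, n, l = 30, 200, 8                 # contracts, sample size, number of steps
L = n // l
alpha = 0.9

# m Gamma loss samples (shapes chosen for illustration), each ordered
samples = [np.sort(rng.gamma(shape=1 + j % 5, scale=2.0, size=n)) for j in range(m)]

# "observed" AV@R prices, cf. (17)
prices = np.array([x[int(n * alpha):].sum() / (n * (1 - alpha)) for x in samples])

# prices (22) under the step density are linear in lambda: C @ lambda
C = np.array([[x[k * L:(k + 1) * L].sum() / n for k in range(l)] for x in samples])

def objective(lam):
    return np.sum((prices - C @ lam) ** 2)

constraints = [{"type": "eq", "fun": lambda lam: lam.sum() / l - 1.0}]       # (21)
constraints += [{"type": "ineq", "fun": lambda lam, k=k: lam[k + 1] - lam[k]}
                for k in range(l - 1)]                                       # monotone
res = minimize(objective, x0=np.ones(l), bounds=[(0, None)] * l,
               constraints=constraints, method="SLSQP")
lam_hat = res.x          # estimated step heights; the last step should dominate
```

The flat density \(\lambda _k \equiv 1\) is a feasible starting point, so the solver can only improve on it; for AV@R prices the estimated mass concentrates on the last steps.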

5.2 Estimation of the distortion density with a cubic monotone spline

B-splines construction For our purposes we define the splines on the interval [0, 1]. Any B-spline is a linear combination of the B-spline basis functions. The basis functions all have the same degree b, and we choose to define them at equally spaced knots \(t_k=k/L\), for \(k=0, \ldots , L\), hence L subintervals. The functions of this basis are denoted by \(B_{k,b}\) and are constructed by a recursion formula. The B-spline basis function of degree 0 is defined as

$$\begin{aligned} B_{k,0}(v)={\left\{ \begin{array}{ll} 1 &{} \quad t_k \le v \le t_{k+1} \\ 0 &{} \quad \text {otherwise.} \end{array}\right. } \end{aligned}$$

The B-spline basis functions of degree b, \(B_{k,b}\) are obtained as an interpolation between \(B_{k,b-1}\) and \(B_{k+1,b-1}\), following the recursion formula

$$\begin{aligned} B_{k,b}(v) = \frac{v-t_k}{t_{k+b} -t_k} B_{k, b-1}(v) + \frac{t_{k+b+1}- v}{t_{k+b+1} -t_{k+1}} B_{k+1, b-1}(v). \end{aligned}$$

In the recursion we need to define auxiliary knots \(t_{-k}=0\) and \(t_{L+k}=1\) for \(k=1, \ldots ,b\). In our case, we consider splines of degree \(b=2\). If we divide [0, 1] into L equally sized intervals, the basis has \(L+2 \) functions

$$\begin{aligned} \left\{ B_{-2,2}, B_{-1,2}, B_{0,2}, B_{1,2}, \ldots ,B_{L-1,2} \right\} . \end{aligned}$$
(23)

Notice that all the elements of the basis can be obtained by translating the B-spline basis function \(B_{0,2}\), defined on the first \(b+2=4\) knots. In order to obtain a basis of monotonically increasing cubic splines, we integrate the functions of (23) and obtain a new basis

$$\begin{aligned} \{S_{-2},S_{-1},S_0, \ldots , S_{L-1}\}, \end{aligned}$$
(24)

where \(S_k(v)= \int _0^v B_{k,2}(w)\, dw\) for all \(k=-2, \ldots , L-1\). We scale the functions of (23) so that the splines in (24) are distribution functions. Note that, by construction of (24), no linear combination of its elements gives a constant function. Therefore, we add one element to the basis, say \(S_{L}(v)=c\), and hence

$$\begin{aligned} \{S_{-2},S_{-1},S_0, \ldots , S_{L-1}, S_L\}, \end{aligned}$$
(25)

is our final basis with \(l=L+3\) elements, where l denotes its dimension.

As an example we illustrate the basis obtained for \(L=5\). We start with \(B_{0,2}\), defined on the knots \(t_0=0, t_1=1/5, t_2=2/5, t_3=3/5\); precisely,

$$\begin{aligned} B_{0,2}(v) = \frac{5^3}{2} \cdot \left( v^2 \mathbb {1}_{[t_0,t_1)} + \left( v(t_2-v)+(t_3-v)(v-t_1) \right) \mathbb {1}_{[t_1,t_2)} + (t_3-v)^2 \mathbb {1}_{[t_2,t_3)}\right) . \end{aligned}$$

We denote by \(S_0\) the distribution function of \(B_{0,2}\) and obtain the rest of the monotone cubic splines by translating \(S_0\). The basis of cubic monotone splines of dimension \(l=8\), illustrated in Fig. 1, is denoted as

$$\begin{aligned} \{S_{-2},S_{-1},S_0, \ldots , S_4, S_5\}, \end{aligned}$$
(26)

where \(S_k(v)=S_0(v-k/5)\) for \(k=-2, \ldots , 4\) and \(S_{5}(v)=c\).

Fig. 1
figure 1

Cubic increasing monotonic base functions

Any linear combination of the splines in (26) with positive scalars defines a spline which is an increasing and positive function.
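The recursion and the closed form of \(B_{0,2}\) can be cross-checked numerically. In the sketch below (with our own helper names), the factor 5 is the scaling that makes the integral of \(B_{0,2}\) equal to one.

```python
import numpy as np

def bspline(k, b, t, v):
    """Cox-de Boor recursion for B_{k,b} on the knot sequence t (0/0 := 0)."""
    if b == 0:
        return 1.0 if t[k] <= v < t[k + 1] else 0.0
    left = right = 0.0
    if t[k + b] > t[k]:
        left = (v - t[k]) / (t[k + b] - t[k]) * bspline(k, b - 1, t, v)
    if t[k + b + 1] > t[k + 1]:
        right = (t[k + b + 1] - v) / (t[k + b + 1] - t[k + 1]) * bspline(k + 1, b - 1, t, v)
    return left + right

t = np.arange(6) / 5                  # knots 0, 1/5, ..., 1 for L = 5

def B02_closed(v):
    """Scaled closed form of B_{0,2} from the text, piece by piece."""
    t0, t1, t2, t3 = t[:4]
    if t0 <= v < t1:
        p = v ** 2
    elif t1 <= v < t2:
        p = v * (t2 - v) + (t3 - v) * (v - t1)
    elif t2 <= v < t3:
        p = (t3 - v) ** 2
    else:
        p = 0.0
    return 5 ** 3 / 2 * p

# the recursion, rescaled by 5, matches the closed form on all three pieces
for v in (0.1, 0.3, 0.5):
    assert abs(5 * bspline(0, 2, t, v) - B02_closed(v)) < 1e-12
```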

Distortion density as a spline Let \(\widehat{h}^2_l(v)\) denote a monotonically increasing cubic spline density, defined as a linear combination of the \(l=L+3\) splines in (25)

$$\begin{aligned} \widehat{h}^2_l(v) = \sum _{k=-2}^{L} \lambda _k\cdot S_k(v), \end{aligned}$$
(27)

where \(\lambda _k\ge 0\) for all \(k=-2, \ldots , L\). Notice that by setting the scalars to be non-negative, \(\widehat{h}^2_l \) is increasing. However, \(\widehat{h}^2_l\) must integrate to 1 on [0, 1], hence

$$\begin{aligned} \int _0^1 \widehat{h}^2_l(v) \, dv= \sum _{k=-2}^{L} \lambda _k\cdot \int _0^1 S_k(v)\,dv =\sum _{k=-2}^{L} \lambda _k\cdot \left( \sum _{i=1}^nA_{ik}\right) = \sum _{k=-2}^{L} \lambda _k\cdot a_k=1, \end{aligned}$$

where

$$\begin{aligned} A_{ik}= \int _{\frac{i-1}{n}}^{\frac{i}{n}}S_k(v)\,dv, \quad a_k=\sum _{i=1}^n A_{ik}. \end{aligned}$$
(28)

Prices with the spline For each sample \(j=1, \ldots , m\), the prices with \(\widehat{h}^2_l\) are

$$\begin{aligned} \widehat{\pi }^{(j)} = \sum _{i=1}^n x_{[i]}^{(j)} \cdot \int _{\frac{i-1}{n}}^{\frac{i}{n}} \widehat{h}^2_l (v) \, dv = \sum _{i=1}^n x_{[i]}^{(j)} \cdot \left( \sum _{k=-2}^{L} \lambda _k \, A_{ik} \right) . \end{aligned}$$
(29)

Estimation Given prices \(\pi ^{(j)}\) calculated as in (17), (18) or (19) and the prices calculated in (29) for every sample \(j=1, \ldots , m\), we solve

$$\begin{aligned} (P_2){:}\qquad \min _{\lambda _{-2}, \ldots , \lambda _{L}}\ \sum _{j=1}^m \left( \pi ^{(j)} - \widehat{\pi }^{(j)}\right) ^2 \qquad \text {s.t.} \quad \sum _{k=-2}^{L} \lambda _k \, a_k = 1, \quad \lambda _k \ge 0 \ \ \text {for } k=-2, \ldots , L, \end{aligned}$$

where \(a_k\) is defined in (28).
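A sketch of this spline estimation for the \(L=5\) example basis, again assuming SciPy: the grid sizes, the Gamma samples and the choice \(c=1\) for the constant element are our own illustrative assumptions, and the integrals \(A_{ik}\) of (28) are approximated by the trapezoidal rule.

```python
import numpy as np
from scipy.optimize import minimize

Lsub = 5                                  # subintervals; basis dimension l = L + 3 = 8

def B02(v):
    """Scaled closed form of B_{0,2} (integral one) on knots 0, 1/5, 2/5, 3/5."""
    v = np.asarray(v, dtype=float)
    t1, t2, t3 = 0.2, 0.4, 0.6
    return 5 ** 3 / 2 * np.where(v < t1, v ** 2,
                        np.where(v < t2, v * (t2 - v) + (t3 - v) * (v - t1),
                        np.where(v < t3, (t3 - v) ** 2, 0.0)))

grid = np.linspace(0, 1, 2001)
S0 = np.concatenate([[0.0], np.cumsum((B02(grid[1:]) + B02(grid[:-1])) / 2 * np.diff(grid))])

def S(k, v):
    """S_k(v) = S_0(v - k/5) for k = -2..4; S_5 is the constant element (c = 1)."""
    if k == Lsub:
        return np.ones_like(v)
    return np.interp(v - k / Lsub, grid, S0, left=0.0, right=S0[-1])

m, n = 30, 200
rng = np.random.default_rng(1)
samples = [np.sort(rng.gamma(1 + j % 5, 2.0, size=n)) for j in range(m)]
prices = np.array([x[int(n * 0.9):].sum() / (n * 0.1) for x in samples])   # AV@R prices

A = np.empty((n, Lsub + 3))
fine = np.linspace(0, 1, 10 * n + 1)
for c, k in enumerate(range(-2, Lsub + 1)):
    Sv = S(k, fine)
    cum = np.concatenate([[0.0], np.cumsum((Sv[1:] + Sv[:-1]) / 2 * np.diff(fine))])
    A[:, c] = np.diff(np.interp(np.arange(n + 1) / n, fine, cum))          # A_ik, (28)
a = A.sum(axis=0)                                                          # a_k, (28)

D = np.array([x @ A for x in samples])        # model prices (29) are D @ lambda
res = minimize(lambda lam: np.sum((prices - D @ lam) ** 2),
               x0=np.full(Lsub + 3, 1.0 / a.sum()), bounds=[(0, None)] * (Lsub + 3),
               constraints=[{"type": "eq", "fun": lambda lam: lam @ a - 1.0}],
               method="SLSQP")
lam_hat = res.x                               # non-negative spline coefficients
```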

The estimations obtained by solving (\(P_1\)) and (\(P_2\)) are presented below.

AV@R distortion premium We consider particular cases of \(h_\alpha \) for \(\alpha =0.9, 0.95\). We estimate the distortion density for each of the cases, with two different step functions, corresponding to \(l=8, 10\) steps, and two different spline basis functions of dimensions \(l=8, 13\), respectively.

Step function The estimated step distortions \(\widehat{h}_l\) for \(l=8, 10\) are obtained by solving (\(P_1\)) and illustrated below (Fig. 2).

Fig. 2
figure 2

The true distortion density \(h_{{\alpha }}\) for \(\alpha =0.9, 0.95\) and their respective step functions estimators for \(l=8\) steps, and \(l=10\) steps

Splines The estimated spline distortions \(\widehat{h}^2_l\) for \(l=8,13\) are obtained by solving (\(P_2\)) and illustrated below (Fig. 3).

Fig. 3
figure 3

The true distortion density \(h_{{\alpha }}\) for \(\alpha =0.9, 0.95\) and their respective spline estimators for \(l=8\) and \(l=13\) spline base dimension

Power distortion premium For this case we consider \(h^{(s)}\) for \(s=0.8, 3\). We solve (\(P_1\)) and (\(P_2\)) with the same numbers of steps and spline basis functions as before.

Step function The estimated step distortions \(\widehat{h}^1_l\) for \(l=8,10\) are obtained by solving (\(P_1\)) and illustrated below (Fig. 4).

Fig. 4
figure 4

The true distortion density \(h^{(s)}\) for \(s=0.8,3\) and their respective estimated step distortions with \(l=8,10\) steps

Splines The estimated spline distortions \(\widehat{h}^2_l\) for \(l=8,13\) are obtained by solving (\(P_2\)) and illustrated below (Fig. 5).

Fig. 5
figure 5

The true distortion density \(h^{(s)}\) for \(s=0.8,3\) and their respective estimated spline distortions with \(l=8,13\) spline base dimension

The optimal values of the optimization problems for all cases are reported in Table 1.

Table 1 Optimal values of the problems (\(P_1\)) and (\(P_2\)) for the \({\text {AV@R}}\)-distortion and the power distortion

6 Ambiguity

In this section we combine the distortion premium with the ambiguity principle. Such an approach allows us to incorporate model uncertainty into the premium. Recall that, by setting the distortion density to \(h=1\), we would price with the ambiguity principle alone. As was mentioned in Sect. 1, distances can be used to define ambiguity sets. Here, closed Wasserstein balls will serve as ambiguity sets. These sets are centred at an initial distribution F, which we refer to as our baseline model.

Definition 2

(Robust distortion premium under Wasserstein balls with \(d_1\)) Let F be the baseline loss distribution, h a distortion density. The robust distorted price of order \(r\ge 1\) is

$$\begin{aligned} \pi ^\epsilon _{h,r,d_1} (F)=\sup \left\{ \pi _h(G) : G\in \mathcal {B}_{r,d_1}(F,\epsilon ) \right\} , \end{aligned}$$
(P-r)

where \(\mathcal {B}_{r,d_1}(F,\epsilon ) =\{G:\, WD_{r,d_1}(G, F)\le \epsilon \}.\) We call \(F^*\) a worst case distribution if \(F^* \in \mathcal {B}_{r,d_1}(F,\epsilon ) \) and

$$\begin{aligned} \pi ^\epsilon _{h,r,d_1} (F) = \pi _h(F^*). \end{aligned}$$

Remark 8

Notice that for \(r_1 \le r_2\)

$$\begin{aligned} WD_{r_1,d_1} \le WD_{r_2,d_1}, \end{aligned}$$
(30)

thus \(\mathcal {B}_{r_1,d_1} \supseteq \mathcal {B}_{r_2,d_1}\).

We can say more about the value and solution of (P-r) if we choose \(r=p\). We start with bounded distortion densities, i.e. for \(p=1\) and \(q=\infty \).

Proposition 7

(Characterization of the worst case distribution for \(r\ge p=1 \)) Let the baseline distribution F have a finite first moment.

  1. (i)

    If h is unbounded, then (P-r) for \(r = 1\) is unbounded.

  2. (ii)

    If h is bounded with \(\sup _v h(v) = \Vert h\Vert _\infty \), then (P-r) is bounded for all \(r\ge 1\). If \(r=1\), the optimal value of (P-r) is

    $$\begin{aligned} \pi ^\epsilon _{h,1,d_1}(F) = \pi _h(F) + \epsilon \cdot \Vert h\Vert _\infty . \end{aligned}$$

    We interpret the additional term \(\epsilon \cdot \Vert h\Vert _\infty \) as the ambiguity premium. For the worst case distribution,

    • if \(h(v) = \Vert h\Vert _\infty \) for \(v \ge 1-\eta \) and \(0<\eta \le 1\), then the supremum is attained at

      $$\begin{aligned} F_\eta ^*(x) = \left\{ \begin{array}{ll} F(x) &{} \quad x< F^{-1}(1-\eta ),\\ 1-\eta &{} \quad F^{-1}(1-\eta )\le x < F^{-1}(1-\eta ) + \epsilon / \eta ,\\ F\left( x- \epsilon / \eta \right) &{} \quad x \ge F^{-1}(1-\eta ) + \epsilon / \eta . \end{array} \right. \end{aligned}$$
    • Otherwise, the supremum is not attained, but can be approximated by the sequence \(F^*_{1/n}(x)\), \(\forall n\in \mathbb {N}\).

Proof

  1. (i)

    Given that h is increasing and unbounded, the increasing sequence \(K_n = h\left( 1- 1/n \right) \) satisfies \(\lim _{n\rightarrow \infty } K_n =\infty \). For all \(n\in \mathbb {N}\) we define a distribution \(G_n\) such that

    $$\begin{aligned} G_n^{-1}(v) = F^{-1}(v) + \epsilon \cdot n\, \mathbb {1}_{[1-1/n , 1]}. \end{aligned}$$

\(G_n\) is on the boundary of \( \mathcal {B}_{1,d_1}(F,\epsilon ) \) and

$$\begin{aligned} \pi _h(G_n) = \pi _h(F) + \epsilon \cdot n \int _{1-1/n}^1 h(v)\, dv \ge \pi _h(F) + \epsilon \, K_n. \end{aligned}$$

Hence, (P-r) is unbounded for \(r=1\). (ii) It is sufficient to prove that (P-r) is bounded for \(r=1\), since \(\mathcal {B}_{1,d_1} \supseteq \mathcal {B}_{r,d_1}\) for all \(r\ge 1\) (see Remark 8). Any admissible G for \(r=1\) can be written as \(G^{-1}(v) = F^{-1}(v) + G_1^{-1}(v)\), where \(G_1\) is such that \(\int _0^1 G_1^{-1}(v) \, dv \le \epsilon \). Since F has a finite first moment, the following upper bound is finite:

$$\begin{aligned} \pi _h(G) = \pi _h(F) + \int _0^1 G_1^{-1}(v)\, h(v) \, dv \le \pi _h(F) + \epsilon \cdot \Vert h\Vert _\infty . \end{aligned}$$
(31)

The distribution \(F_\eta ^*(x)\) given in the Proposition has inverse

$$\begin{aligned} \left( F_\eta ^*\right) ^{-1} (v) = F^{-1}(v) + \frac{\epsilon }{\eta } \mathbb {1}_{[1-\eta ,1]}. \end{aligned}$$

Therefore, \(F_\eta ^* \) is on the boundary of \(\mathcal {B}_{1,d_1}(F,\epsilon )\) and

$$\begin{aligned} \pi _h\left( F_\eta ^*\right) =\int _0^1 \left( F^{-1}(v) + \frac{\epsilon }{\eta } \mathbb {1}_{[1-\eta ,1]}\right) h(v) \, dv= \pi _h(F) + \frac{\epsilon }{\eta } \int _{1-\eta }^1 h(v) \, dv. \end{aligned}$$

If \(h(v) = \Vert h\Vert _\infty \) for \(v \ge 1-\eta \), then \(F_\eta ^*\) attains the upper bound in (31). Otherwise, \(F^*_{1/n}\) approaches the maximum from below, since

$$\begin{aligned} \left( F_{1/n}^*\right) ^{-1} (v) = F^{-1}(v) + \epsilon \cdot n \mathbb {1}_{[1-1/n,1]}, \end{aligned}$$

and

$$\begin{aligned} \pi _h(F^*_{1/n})= \pi _h(F) + \epsilon \cdot n \int _{1-1/n}^1 h(v) \, dv \uparrow \pi _h(F) + \epsilon \cdot \Vert h\Vert _\infty . \end{aligned}$$

\(\square \)

Remark 9

The solution \(F^*_\eta \) in Proposition 7 is not unique. Any distribution \(\tilde{F}_\eta \) such that \({\tilde{F}_\eta }^{-1}(v) = F^{-1}(v) + \frac{\epsilon }{\eta }\cdot k(v)\mathbb {1}_{[1-\eta ,1]} \), with \(\frac{1}{\eta }\cdot k(v)\mathbb {1}_{[1-\eta ,1]}\) a density on [0, 1], attains the supremum.

As an example, we illustrate the worst case distribution for the \({\text {AV@R}}\) premium (Fig. 6).

Fig. 6
figure 6

The worst case distribution \(F^*_\eta \) for \(h_\alpha \) with \(\alpha =0.9\) is obtained by shifting F from \(x_\alpha \), a length \(\epsilon /\eta \), where \(x_\alpha =F^{-1}(\alpha )\) and \(\eta =1-\alpha \)
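The statement of Proposition 7 for \(h_\alpha \) can be checked on an empirical distribution; in the sketch below the sample and the parameters \(\alpha , \epsilon \) are illustrative choices of ours.

```python
import numpy as np

alpha, eps = 0.9, 0.5
eta = 1 - alpha
rng = np.random.default_rng(2)
x = np.sort(rng.gamma(shape=2.0, scale=1.5, size=1000))
n = len(x)

def avar(sample):
    """Empirical AV@R_alpha: average of the upper (1 - alpha) tail."""
    return sample[int(n * alpha):].sum() / (n * (1 - alpha))

# empirical analogue of F*_eta: shift the upper eta-tail by eps / eta;
# this moves mass eta a distance eps / eta, i.e. Wasserstein-1 cost exactly eps
x_star = x.copy()
x_star[int(n * (1 - eta)):] += eps / eta

ambiguity_premium = avar(x_star) - avar(x)   # equals eps * ||h||_inf = eps / (1 - alpha)
```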

If h is unbounded we can characterize the solution of (P-r) as follows.

Proposition 8

(Characterization of the worst case distribution for \(\mathbf {r\ge p>1}\)) Let the baseline distribution F have a finite p-th moment. If \(h\in \mathcal {L}^q\), then (P-r) is bounded for \(r\ge p\). If \(r=p\), the optimal value of (P-r) is

$$\begin{aligned} \pi ^\epsilon _{h, p, d_1}(F)= \pi _h(F) + \epsilon \cdot \Vert h\Vert _q. \end{aligned}$$

Also in this case, the term \(\epsilon \cdot \Vert h\Vert _q\) is interpreted as the ambiguity premium.

Furthermore, the worst case distribution \(F^*\) of (P-r) for \(r=p\) is such that

$$\begin{aligned} {F^*}^{-1}(v) = F^{-1}(v) + \epsilon \cdot \left( \frac{h(v)}{|| h||_q}\right) ^{q/p} . \end{aligned}$$

Proof

We prove that (P-r) is bounded for \(r=p\); by Remark 8 we then have boundedness for all \(r\ge p\). Notice that, for any admissible G, if \(r=p\), we have

$$\begin{aligned} \int _0^1 G^{-1}(v) \, h(v)\, dv&\le \int _0^1 F^{-1}(v)\, h(v) \, dv + \int _0^1 |G^{-1} -F^{-1}| \, h(v)\, dv \\&\le \pi _h(F) + \left( \int _0^1 |G^{-1} -F^{-1}|^p \, dv \right) ^{1/p} ||h||_q \\&\le \pi _h(F) + \epsilon \cdot || h||_q. \end{aligned}$$

\(F^*\) is admissible since it is on the boundary of \(\mathcal {B}_{p, d_1}(F, \epsilon )\):

$$\begin{aligned} WD_{p, d_1}(F, F^*) =\left( \int _0^1 \epsilon ^p \cdot \left( \frac{h(v)}{ || h ||_q}\right) ^q \, dv \right) ^{1/p}=\epsilon , \end{aligned}$$

and \(F^*\) attains the upper bound

$$\begin{aligned} \pi _h(F^*) - \pi _h(F) = \int _0^1 \epsilon \cdot \left( \frac{h(v)}{ || h ||_q}\right) ^{q/p} h(v)\, dv =\epsilon \cdot \int _0^1\frac{ h(v)^q}{||h ||^{q-1}_q} \, dv =\epsilon \cdot || h ||_q. \end{aligned}$$

\(\square \)
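The optimal value of Proposition 8 can be verified numerically, e.g. for the power distortion \(h(v)=s\,v^{s-1}\) with the conjugate pair \(p=q=2\); in this sketch, grid integration stands in for the exact integrals.

```python
import numpy as np

def integrate(y, x):
    """Trapezoidal rule, kept explicit to avoid version-specific NumPy names."""
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

s, p, q, eps = 3.0, 2.0, 2.0, 0.1        # p = q = 2 are conjugate exponents
v = np.linspace(0, 1, 100001)
h = s * v ** (s - 1)                      # power distortion density, in L^q
h_q = integrate(h ** q, v) ** (1 / q)     # ||h||_q, here sqrt(9/5)

shift = eps * (h / h_q) ** (q / p)        # worst-case quantile shift of Proposition 8

ball_radius = integrate(shift ** p, v) ** (1 / p)   # Wasserstein-p distance to F
premium_gain = integrate(shift * h, v)              # pi_h(F*) - pi_h(F)
```

As the proposition states, the shift sits on the boundary of the Wasserstein ball (`ball_radius` is \(\epsilon \)) and the premium increase equals the ambiguity premium \(\epsilon \cdot \Vert h\Vert _q\).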

Under some conditions on h we can also prove unboundedness of (P-r) for \(r>p>1\) in the case where h is not in \(\mathcal {L}^q\), where q is the conjugate of p, the order of the finite moments of F.

Proposition 9

(Unboundedness for \(\mathbf {r}>\mathbf {p}>\mathbf {1}\)) Let the baseline distribution F have a finite p-th moment and let \(h\notin \mathcal {L}^q\), where \(p,\, q\) are conjugates and \(r,\,s\) are conjugates with \(r>1\). If there exists \(s_1<s\) such that \(\int _0^1 h(v)^{s_1} \, dv =\infty \) and \(h\in \mathcal {L}^t\) for all \(t<s_1\), then (P-r) is unbounded for all \(r>p\).

Proof

Define \(\psi _\eta (v) = h(v)^{s_1-1}\mathbb {1}_{[1-\eta , 1]}\). Since \(\psi _\eta \in \mathcal {L}^r\) for \(r>1\) (note that \(r(s_1 - 1)<s_1\)), there exists \(0<\eta <1\) such that

$$\begin{aligned} \int _0^1 \psi _\eta (v)^r\, dv = \int _{1-\eta }^1 h(v)^{r(s_1 - 1)}\, dv < \epsilon ^r. \end{aligned}$$

Thus, the distribution \(G_\eta \) with \(G_\eta ^{-1}(v) = F^{-1}(v) + \psi _\eta (v)\) is in \(\mathcal {B}_{r,d_1}(F, \epsilon )\), and its premium is unbounded:

$$\begin{aligned} \pi _h(G_\eta ) = \pi _h(F) + \int _0^1 \psi _\eta (v)\, h(v)\, dv = \pi _h(F) + \int _{1-\eta }^1 h(v)^{s_1}\, dv = \infty . \end{aligned}$$

\(\square \)

Remark 10

If instead of the metric \(d_1\) we consider \(d_p(x,y)= |x^p - y^p|\) as the underlying metric for the Wasserstein distance, we can define the ambiguity principle

$$\begin{aligned} \pi ^\epsilon _{h,1,d_p} (F)=\sup \{ \pi _h(G) : G\in \mathcal {B}_{1,d_p}(F,\epsilon ) \} , \end{aligned}$$
(P-dp)

where \(\mathcal {B}_{r,d_p}(F,\epsilon ) =\{G:\, WD_{r,d_p}(G, F)\le \epsilon \}.\) It is easy to see that, if F has a finite p-th moment, the ball constraint forces all admissible distributions to have a finite p-th moment as well; therefore, by Proposition 3, if \(h\in \mathcal {L}^q\), then (P-dp) is bounded. Furthermore, continuity with respect to this Wasserstein distance implies our continuity results in Sect. 3.

7 Conclusions

After an introduction to general premium principles, we proposed generalizations of the distortion premium. In addition, we studied in detail three functional relationships for the distortion premium:

  • the premium function \(F \mapsto \pi _h(F)\), i.e. the properties of \(\pi _h\) as a premium principle,

  • the direct function \(h \mapsto \pi _h(F)\), i.e. the dependency on the distortion density,

  • the inverse function \(\pi _h(F) \mapsto h\).

The smoothness properties are important for robustness aspects; however, it is well known that a very smooth direct function makes the inverse problem difficult. We showed nevertheless that the inverse problem is identifiable, and we gave a simple quadratic optimization problem to estimate the distortion density from empirical data. We successfully illustrated this in a simulation study; the application to real data is left for further research. We also identified the ambiguity premium for Wasserstein balls as ambiguity sets, offering, in some cases, an explicit formulation of the worst case distribution. It turned out that the extra premium for ambiguity depends on the distortion density h and, in a multiplicative way, on the ambiguity radius \(\epsilon \), but not on the loss distribution F itself. Thus it is the same for all contracts and can be calculated separately. Finally, by using different distances as underlying metrics for the Wasserstein ball, and hence for the ambiguity set, we found conditions under which the robust premium is bounded.