1 Introduction

As pointed out by Embrechts et al. [17] and McNeil et al. [33, Sect. 6.2.1], in financial mathematics and actuarial science, marginal risks and their dependence structure are often modelled separately. While the marginal risks of a \(d\)-variate risk are identified with probability distributions \(\mu _{1},\ldots ,\mu _{d}\) on the real line, the dependence structure is most often modelled by a \(d\)-variate copula \(C\). The distribution function \(F_{\mu}\) of the joint distribution \(\mu \) is then given by

$$ F_{\mu}(x_{1},\ldots ,x_{d})=C\big(F_{\mu _{1}}(x_{1}),\ldots ,F_{\mu _{d}}(x_{d})\big)\quad \mbox{for all } x_{1},\ldots ,x_{d}\in \mathbb{R}, $$
(1.1)

where \(F_{\mu _{i}}\) is the distribution function of \(\mu _{i}\).
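As a minimal sketch of how (1.1) combines a copula with marginal distribution functions, the following Python snippet evaluates the joint distribution function for two illustrative copulas. The marginals (a standard normal and a unit-rate exponential distribution) and all function names are assumptions chosen for illustration only; they do not come from the text.

```python
import math
from statistics import NormalDist


def independence_copula(u):
    """Product copula: C(u_1, ..., u_d) = u_1 * ... * u_d."""
    prod = 1.0
    for ui in u:
        prod *= ui
    return prod


def comonotone_copula(u):
    """Upper Frechet bound: C(u_1, ..., u_d) = min(u_1, ..., u_d)."""
    return min(u)


def joint_cdf(copula, marginal_cdfs, x):
    """Evaluate (1.1): F_mu(x) = C(F_mu_1(x_1), ..., F_mu_d(x_d))."""
    return copula([F(xi) for F, xi in zip(marginal_cdfs, x)])


def exp_cdf(x):
    """Distribution function of the exponential distribution with rate 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


# Hypothetical marginals: standard normal and unit-rate exponential.
marginals = [NormalDist().cdf, exp_cdf]
x = (0.0, 1.0)

print(joint_cdf(independence_copula, marginals, x))
print(joint_cdf(comonotone_copula, marginals, x))
```

The same marginals produce different joint distribution functions depending on the copula, which is the separation of marginal and dependence modelling described above.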

In practical applications, quantitative risk managers and actuaries are interested in various aspects \(\mathcal{T}_{d}(\mu )\) of the joint distribution \(\mu \) of the individual risks. An important example is \(\mathcal{T}_{d}=\mathcal{R}_{A_{d}}\) with

$$ {\mathcal{R}}_{A_{d}}(\mu ):=\mathcal{R}(\mu \circ A_{d}^{-1}), $$
(1.2)

where \(\mathcal{R}\) is the risk functional corresponding to some distribution-invariant ‘downside’ risk measure and \(A_{d}:\mathbb{R}^{d}\to \mathbb{R}\) is a fixed Borel-measurable map regarded as an aggregation map in the spirit of McNeil et al. [33, Sect. 6.2.1]. Standard examples for the aggregation map are \(A_{d}(x_{1},\dots ,x_{d}) := \sum _{i=1}^{d}x_{i}\) and the three other maps presented in Example 4.4 below. Note that \(\mu \circ A_{d}^{-1}\) is the distribution of \(A_{d}(X_{1},\ldots ,X_{d})\) when \((X_{1},\ldots ,X_{d})\) is a random vector distributed according to \(\mu \). Therefore \(\mathcal{R}_{A_{d}}(\mu )\) can be seen as the downside risk of the aggregate position \(A_{d}(X_{1},\ldots ,X_{d})\).
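To make the construction in (1.2) concrete, the following sketch computes the image measure \(\mu \circ A_{d}^{-1}\) for a discrete \(\mu \) and then applies a risk functional to it. The choice of the lower quantile (Value at Risk) as the risk functional, the interpretation of the values as losses, and the two toy joint distributions are assumptions made here for illustration; the text does not fix them.

```python
from collections import defaultdict


def pushforward(joint_pmf, aggregation):
    """Distribution of A_d(X_1, ..., X_d), i.e. mu o A_d^{-1}, for a discrete mu."""
    out = defaultdict(float)
    for point, prob in joint_pmf.items():
        out[aggregation(point)] += prob
    return dict(out)


def value_at_risk(pmf, alpha):
    """Lower alpha-quantile of a discrete univariate distribution.

    Used here as an illustrative stand-in for the risk functional R.
    """
    cumulative = 0.0
    for value in sorted(pmf):
        cumulative += pmf[value]
        if cumulative >= alpha - 1e-12:  # small tolerance for rounding
            return value
    return max(pmf)


# Two hypothetical joint laws with identical Bernoulli(0.1) loss marginals.
independent = {(0, 0): 0.81, (0, 1): 0.09, (1, 0): 0.09, (1, 1): 0.01}
comonotone = {(0, 0): 0.9, (1, 1): 0.1}

# Aggregation map A_2(x_1, x_2) = x_1 + x_2, implemented by the built-in sum.
print(value_at_risk(pushforward(independent, sum), 0.95))  # 1
print(value_at_risk(pushforward(comonotone, sum), 0.95))   # 2
```

Both joint laws share the same marginals, yet the aggregate risk differs, so the value of \(\mathcal{R}_{A_{d}}(\mu )\) genuinely depends on the dependence structure.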

More generally, one could consider \(\mathcal{T}_{d}=\mathcal{R}_{\mathfrak{A}_{d}}\) with

$$ {\mathcal{R}}_{\mathfrak{A}_{d}}(\mu ):=\inf \{\mathcal{R}(\mu \circ A_{d}^{-1}) : A_{d}\in \mathfrak{A}_{d} \}=\inf \{\mathcal{R}_{A_{d}}(\mu ) : A_{d} \in \mathfrak{A}_{d} \}, $$
(1.3)

where \(\mathfrak{A}_{d}\) is a fixed set of Borel-measurable maps \(A_{d}:\mathbb{R}^{d}\to \mathbb{R}\). If there exists an \(A_{d}^{*}\in \mathfrak{A}_{d}\) at which the infimum in (1.3) is attained, then \(\mathcal{R}_{\mathfrak{A}_{d}}(\mu )\) can be seen as the smallest possible risk of a position \(A_{d}(X_{1},\ldots ,X_{d})\) derived from the single risks \(X_{1},\ldots ,X_{d}\) with joint distribution \(\mu \) through a function \(A_{d}\in \mathfrak{A}_{d}\). It is worth noting that ‘risk’ here does not necessarily mean downside risk, but can also be for instance a mean–downside risk mixture which is the target value in many portfolio optimisation problems. For details, see Sect. 5.2, in particular Remark 5.5.

Of course, there are many other examples for \(\mathcal{T}_{d}\). One of them is the optimal value in a multi-period portfolio optimisation problem that is addressed in Sect. 6.2. In this example, the role of \(\mu \) is played by the joint distribution of the relative price changes of the \(d\) risky assets that are available on the considered financial market.

When starting from separate models for the copula and the marginal distributions, it is reasonable to regard \(\mathcal{T}_{d}\) as a functional of the copula \(C\) and the marginal distributions \(\mu _{1},\ldots ,\mu _{d}\) via

$$ \mathfrak{T}_{d}(C,\mu _{1},\ldots ,\mu _{d}):=\mathcal{T}_{d}\Big( \mathfrak{p}_{d}\big(C(F_{\mu _{1}},\ldots ,F_{\mu _{d}})\big)\Big), $$
(1.4)

where \(\mathfrak{p}_{d}\) assigns to a \(d\)-variate distribution function its corresponding Borel probability measure on \(\mathbb{R}^{d}\).

In [33, Sect. 6.2.1], McNeil et al. point out that practitioners are often required to work only with partial information. For instance, in some situations, it is possible to obtain (sufficient) information on \(\mu _{1},\ldots ,\mu _{d}\), but it is much more difficult to obtain information on the dependence structure. Carrying this to the extreme, McNeil et al. assume that \(\mu _{1},\ldots ,\mu _{d}\) are fully known and \(C\) is fully unknown. In this case, one cannot specify \(\mathfrak{T}_{d}(C,\mu _{1},\ldots ,\mu _{d})\), because \(C\) is unknown. This leads to the ‘Fréchet problem’ of specifying the range of the map \(C\mapsto \mathfrak{T}_{d}(C,\mu _{1},\ldots ,\mu _{d})\). In the special case where \(\mathfrak{T}_{d}\) takes values in ℝ, this is often related to finding (sharp) upper and lower bounds for this map. There is a vast literature dealing with this problem; see for instance the works of Rüschendorf [43], [44, Chap. 4], Embrechts and Puccetti [14], Embrechts et al. [15], Puccetti [39], Embrechts et al. [17] and the references cited therein.

In the present paper, a related but different problem is addressed. Still in the case where \(\mu _{1},\ldots ,\mu _{d}\) are known (and fixed), assume that \(\widehat{C}\) is a guess for the true copula \(C\). It might be based on an expert opinion, a statistical estimation, or the like. Of course, as a guess, \(\widehat{C}\) can differ from \(C\). It is clear that a deviation of \(\widehat{C}\) from \(C\) can imply a significant difference between \(\mathfrak{T}_{d}(\widehat{C},\mu _{1},\ldots ,\mu _{d})\) and \(\mathfrak{T}_{d}(C,\mu _{1},\ldots ,\mu _{d})\). On the other hand, one might ask whether the difference remains small if the deviation of \(\widehat{C}\) from \(C\) is small. This question was raised and answered by Embrechts et al. [17] in the context of (1.2) with \(A_{d}(x_{1},\dots ,x_{d}) := \sum _{i=1}^{d}x_{i}\). Krätschmer et al. [28, Sect. 4.2.4] took up this concept and generalised the respective result of [17]. In fact, in the latter two references, continuity of the functional \(\mathcal{T}_{d}\) at the probability measure \(\mathfrak{p}_{d}(C(F_{\mu _{1}},\ldots ,F_{\mu _{d}}))\) (with fixed marginal distributions \(\mu _{1},\ldots ,\mu _{d}\) having finite \(p\)th moments) was not considered with respect to a metric on the set of copulas, but with respect to the (relative) weak topology on the set of \(d\)-variate distributions (with marginal distributions \(\mu _{1},\ldots ,\mu _{d}\)). However, it can be seen from Theorem 3.10 below that this is equivalent when the set of copulas is equipped with the supremum distance.

Despite this equivalence, it might be a little more accessible for some readers to measure the difference between two dependence structures directly through the difference between the corresponding copulas, in particular if one starts from separate models for the copula and the marginal distributions. If one follows this approach, one ought to take into account that a \(d\)-variate distribution \(\mu \) with fixed marginal distributions \(\mu _{1},\ldots ,\mu _{d}\) depends on the copula \(C\) only through the values that \(C\) takes on \(\mathrm{ran}F_{\mu _{1}}\times \cdots \times\mathrm{ran}F_{\mu _{d}}\) (\(\subseteq [0,1]^{d}\)), where \(\mathrm{ran}F_{\mu _{i}}\) is the range of \(F_{\mu _{i}}\). This is apparent from (1.1) and suggests measuring the distance between copulas (in the considered framework) only on \(\mathrm{ran}F_{\mu _{1}}\times \cdots \times\mathrm{ran}F_{\mu _{d}}\).

We propose to say that the functional \(\mathcal{T}_{d}\) underlying \(\mathfrak{T}_{d}\) (recall Eq. (1.4)) is copula robust if for any ‘admissible’ univariate distributions \(\mu _{1},\ldots ,\mu _{d}\), the map \(C\mapsto \mathfrak{T}_{d}(C, \mu _{1},\ldots ,\mu _{d})\) is continuous with respect to pointwise (or uniform) convergence on \(\overline{\mathrm{ran}F_{\mu _{1}}}\times \cdots \times \overline{\mathrm{ran}F_{\mu _{d}}}\), where it is assumed that \(\mathcal{T}_{d}\) (and thus \(\mathfrak{T}_{d}\)) takes values in a topological space. By ‘admissible’ we mean that one can find at least one copula \(C\) such that the probability measure \(\mathfrak{p}_{d}(C(F_{\mu _{1}},\ldots ,F_{\mu _{d}}))\) is contained in the domain of \(\mathcal{T}_{d}\). The precise definition of copula robustness is given in Sect. 3. The required notation and terminology as well as some auxiliary results are given before in Sect. 2. It is worth mentioning that Theorem 2.3 provides a generalisation of Deheuvels’ [10] copula convergence theorem and that Corollary 2.9 provides a characterisation of weak convergence in Fréchet classes of \(d\)-variate distributions.

In the second part of the paper, we discuss three examples for copula robust functionals \(\mathcal{T}_{d}\). First, in Sect. 4, we address the quantification of the ‘downside risk’ of aggregate financial positions. It will be seen that the functional in (1.2) is copula robust under mild assumptions (Sect. 4.2). The relation of copula robustness to the concept of aggregation robustness of Embrechts et al. [17] (Sect. 4.3) as well as copula robustness of inf-convolution functionals (Sect. 4.4) are also discussed in detail. Second, in Sect. 5, we address stochastic programming problems. It can be inferred from results of Claus et al. [9] that the optimal value of a general stochastic programming problem depends copula robustly on the distribution of the underlying \(d\)-variate input random variable \(Z\). This covers in particular classical one-period portfolio optimisation problems (where the role of \(Z\) is played by the vector of the relative price changes of \(d\) risky assets) and therefore backs in a way a hypothesis of Saida and Prigent [45]. In [45, Sect. 1], they conclude from their numerical investigations that ‘investors must more take care of the specification of the marginal distribution than of the copula function’. Third, in Sect. 6, we address multi-period portfolio optimisation problems and derive results that are similar to those in the one-period case. The main tool in this context is Theorem 6.2 which is a variant of a result of Müller [35] about the continuous dependence of the value function on the transition function in a Markov decision model. Theorem 6.2 is of independent interest and contributes to the general theory of Markov decision processes.

Throughout this paper, \(|\cdot |\) denotes any norm on \(\mathbb{R}^{d}\) and \(\langle \,\cdot \,,\,\cdot \,\rangle \) is the Euclidean scalar product defined by \(\langle x,y\rangle :=\sum _{i=1}^{d}x_{i}y_{i}\) for any elements \(x=(x_{1},\ldots ,x_{d})\) and \(y=(y_{1},\ldots ,y_{d})\) of \(\mathbb{R}^{d}\). Moreover, we set \(\mathbb{R}_{+}:=[0,\infty )\) and \(\mathbb{R}_{++}:=(0,\infty )\). The proofs of all results can be found in Appendix A.

2 Preliminary notation, terminology and results

2.1 Fréchet classes and copulas

For any \(d\in \mathbb{N}\), let us use \(\mathcal{M}_{d}\) to denote the set of all Borel probability measures on \(\mathbb{R}^{d}\). For any fixed \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}\), denote by \(\mathcal{M}_{d}(\mu _{1},\ldots ,\mu _{d})\) the set of all \(\mu \in{\mathcal{M}}_{d}\) having marginals \(\mu _{1},\ldots ,\mu _{d}\), i.e., satisfying \(\mu \circ \pi _{i}^{-1}=\mu _{i}\) for any \(i=1,\ldots ,d\), where \(\pi _{i}:\mathbb{R}^{d}\to \mathbb{R}\) is the projection on the \(i\)th coordinate. The set \(\mathcal{M}_{d}(\mu _{1},\ldots ,\mu _{d})\) is known as the Fréchet class associated with the univariate Borel probability measures \(\mu _{1},\ldots ,\mu _{d}\). The distribution function of a Borel probability measure \(\mu \) will be denoted by \(F_{\mu}\).

By definition, a \(d\)-variate copula is the distribution function \(C:[0,1]^{d}\rightarrow [0,1]\) of a Borel probability measure on \([0,1]^{d}\) whose marginal distributions are all given by the uniform distribution on \([0,1]\). The latter condition ensures that each \(d\)-variate copula \(C\) is Lipschitz-continuous. Theorem 2.10.7 in Nelsen’s textbook [36] indeed shows that every \(d\)-variate copula \(C\) satisfies \(|C(u)-C(v)|\le |u-v|_{1}\), where \(|x|_{1}:=\sum _{i=1}^{d}|x_{i}|\) for any \(x=(x_{1},\ldots ,x_{d})\in \mathbb{R}^{d}\).

Let us denote by \(\mathbf{C}_{d}\) the set of all \(d\)-variate copulas. With any \(C\in \mathbf{C}_{d}\) and \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}\), we associate an element \(\mu \) of the Fréchet class \(\mathcal{M}_{d}(\mu _{1},\ldots ,\mu _{d})\) through (1.1). It is indeed easily seen that the right-hand side of (1.1) defines a probability distribution function on \(\mathbb{R}^{d}\) and that the corresponding Borel probability measure \(\mu \) has \(\mu _{1},\ldots ,\mu _{d}\) as its marginal distributions. Sklar’s theorem ([49]; see also [36, Theorem 2.10.9]) shows that the distribution function of any element \(\mu \) of \(\mathcal{M}_{d}(\mu _{1},\dots ,\mu _{d})\) admits the representation (1.1). That is, for any \(\mu \in{\mathcal{M}}_{d}(\mu _{1},\dots ,\mu _{d})\), one can find a copula \(C\in \mathbf{C}_{d}\) such that (1.1) holds. On the set \(\mathrm{ran}F_{\mu _{1}}\times \cdots \times\mathrm{ran}F_{\mu _{d}}\), the copula \(C\) is uniquely determined and given by

$$ C(u_{1},\ldots ,u_{d})=F_{\mu}\big(F_{\mu _{1}}^{\leftarrow}(u_{1}), \ldots ,F_{\mu _{d}}^{\leftarrow}(u_{d})\big), $$
(2.1)

where \(F_{\mu _{i}}^{\leftarrow}(u_{i}):=\inf \{x\in \mathbb{R}:F_{\mu _{i}}(x)\ge u_{i}\}\). In particular, if \(F_{\mu _{1}},\ldots ,F_{\mu _{d}}\) are all continuous, then the copula \(C\) is unique and given by (2.1) on the whole unit cube \([0,1]^{d}\). For background on copulas, see for instance the textbooks by Durante and Sempi [12, Chaps. 1–2] or Nelsen [36, Chaps. 1–2].
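The following sketch illustrates formula (2.1) for a discrete bivariate distribution, where the copula is determined only on the (finite) product of the ranges of the marginal distribution functions. The joint probability mass function and all function names are hypothetical and serve only as an example; the generalised inverse follows the usual left-continuous convention \(\inf \{x:F(x)\ge u\}\).

```python
import math

# Hypothetical joint pmf on {0, 1}^2 (an assumption for illustration).
joint_pmf = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.2}


def joint_cdf(x1, x2):
    """F_mu(x1, x2) for the discrete joint law above."""
    return sum(p for (a, b), p in joint_pmf.items() if a <= x1 and b <= x2)


def marginal_cdf(i, x):
    """Distribution function of the i-th marginal (i = 0 or 1)."""
    return sum(p for point, p in joint_pmf.items() if point[i] <= x)


def generalized_inverse(i, u):
    """Generalised inverse inf{x in R : F_i(x) >= u} of the i-th marginal."""
    if u <= 0.0:
        return -math.inf  # F_i(x) >= 0 holds for every real x
    for x in sorted({point[i] for point in joint_pmf}):
        if marginal_cdf(i, x) >= u - 1e-12:  # tolerance for rounding
            return x
    return math.inf  # only reached for u > 1


def copula_on_range(u1, u2):
    """Right-hand side of (2.1); meaningful for (u1, u2) in ran F_1 x ran F_2."""
    return joint_cdf(generalized_inverse(0, u1), generalized_inverse(1, u2))


u1 = marginal_cdf(0, 0)  # a point of ran F_1 (approximately 0.7)
u2 = marginal_cdf(1, 0)  # a point of ran F_2 (approximately 0.6)
print(copula_on_range(u1, u2), joint_cdf(0, 0))
```

At points of the ranges, the value returned by `copula_on_range` coincides with the joint distribution function evaluated at the corresponding arguments, in line with (1.1). Outside the ranges, many different copulas are compatible with the same joint law.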

For any nonempty compact set \(K\subseteq [0,1]^{d}\), we can define a pseudo-metric \(d_{K}\) on \(\mathbf{C}_{d}\) through

$$ d_{K}(C_{1},C_{2}):=\sup _{u\in K} |C_{1}(u)-C_{2}(u) |. $$

Since the elements of \(\mathbf{C}_{d}\) are all Lipschitz-continuous with Lipschitz constant 1 on \([0,1]^{d}\), the set \(\mathbf{C}_{d}\) is uniformly equicontinuous. This implies that convergence of a sequence \((C_{n})_{n\in \mathbb{N}}\in \mathbf{C}_{d}^{\mathbb{N}}\) to some \(C\in \mathbf{C}_{d}\) with respect to \(d_{K}\) is equivalent to pointwise convergence of \((C_{n})_{n\in \mathbb{N}}\) to \(C\) on \(K\). The topology on \(\mathbf{C}_{d}\) generated by \(d_{K}\) is denoted by \(\mathcal{O}_{K}\). For any \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}\), we let

$$ d_{\mu _{1},\ldots ,\mu _{d}}:=d_{K}\quad \mbox{and}\quad\mathcal{O}_{ \mu _{1},\ldots ,\mu _{d}}:=\mathcal{O}_{K}\quad \mbox{with}\ K:= \overline{\mathrm{ran}F_{\mu _{1}}}\times \cdots \times \overline{\mathrm{ran}F_{\mu _{d}}}. $$
(2.2)

For \(K=[0,1]^{d}\), the pseudo-metric \(d_{K}\) is even a metric, and the topology \(\mathcal{O}_{K}\) is the standard topology on \(\mathbf{C}_{d}\) (and the counterpart of the weak topology on the set of all Borel probability measures on \([0,1]^{d}\) whose distribution functions are \(d\)-variate copulas). In particular, if \(F_{\mu _{1}},\ldots ,F_{\mu _{d}}\) are all continuous, then \(d_{\mu _{1},\ldots ,\mu _{d}}=d_{[0,1]^{d}}\) and \(\mathcal{O}_{\mu _{1},\ldots ,\mu _{d}}=\mathcal{O}_{[0,1]^{d}}\).
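As a small numerical sketch of how \(d_{\mu _{1},\ldots ,\mu _{d}}\) can differ from \(d_{[0,1]^{d}}\), the snippet below compares the independence copula and the comonotone copula for \(d=2\). The Bernoulli marginals with \(\mu _{i}[\{0\}]=0.9\) are an assumption made for illustration, so that \(\overline{\mathrm{ran}F_{\mu _{i}}}=\{0,0.9,1\}\); the supremum over \([0,1]^{2}\) is approximated on a finite grid.

```python
def independence(u, v):
    """Independence copula C(u, v) = u * v."""
    return u * v


def comonotone(u, v):
    """Comonotone copula C(u, v) = min(u, v)."""
    return min(u, v)


def sup_distance(c1, c2, points):
    """d_K(C_1, C_2) = sup over K of |C_1 - C_2|, for a finite set K."""
    return max(abs(c1(u, v) - c2(u, v)) for u, v in points)


# Hypothetical marginals: Bernoulli with P(X_i = 0) = 0.9, so ran F = {0, 0.9, 1}.
ran_f = [0.0, 0.9, 1.0]
k = [(u, v) for u in ran_f for v in ran_f]

# Finite grid used as a proxy for the whole unit square.
n = 200
grid = [(i / n, j / n) for i in range(n + 1) for j in range(n + 1)]

d_k = sup_distance(independence, comonotone, k)        # approximately 0.09
d_full = sup_distance(independence, comonotone, grid)  # approximately 0.25
print(d_k, d_full)
```

In this example the distance measured on the product of the ranges is strictly smaller than the distance on the whole unit square, since the point \((1/2,1/2)\), where the two copulas differ most, does not belong to \(K\).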

Convergence of copulas with respect to \(\mathcal{O}_{[0,1]^{d}}\) has been addressed in the literature several times, for instance by Charpentier and Segers [7] and Trutschnig [51]. Metrics inducing topologies that are at least as fine as \(\mathcal{O}_{[0,1]^{d}}\) have been studied for instance by Li et al. [30], Trutschnig [50], Fernández Sánchez and Trutschnig [19] and Kasper et al. [24]. On the other hand, the (pseudo-) metric \(d_{\mu _{1},\ldots ,\mu _{d}}\) defined by (2.2) generates a topology that is at most as fine as \(\mathcal{O}_{[0,1]^{d}}\). It is finally worth mentioning that the metric on the set of bivariate subcopulas that was recently introduced by Rachasingho and Tasena [40] basically differs from the metric \(d_{\mu _{1},\mu _{2}}\) and from its variant \(d_{\mu _{1},\mu _{2}}^{\sim}\) introduced in the following Remark 2.1; for details, see Appendix B.

Remark 2.1

For any fixed \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}\), one can regard the pseudo-metric \(d_{\mu _{1},\ldots ,\mu _{d}}\) as a metric when changing from \(\mathbf{C}_{d}\) to the quotient set \(\mathbf{C}_{d}/_{\sim _{\mu _{1},\ldots ,\mu _{d}}}\) with respect to the equivalence relation

$$ \sim _{\mu _{1},\ldots ,\mu _{d}}:=\{(C,C')\in \mathbf{C}_{d}\times \mathbf{C}_{d}:C=C'\mbox{ on }\overline{\mathrm{ran}F_{\mu _{1}}}\times \cdots \times \overline{\mathrm{ran}F_{\mu _{d}}}\}. $$

On the resulting quotient set \(\mathbf{C}_{d}/_{\sim _{\mu _{1},\ldots ,\mu _{d}}}\), one may then define a metric through \(d_{\mu _{1},\ldots ,\mu _{d}}^{\sim}(\mathcal{C},\mathcal{C}'):=d_{\mu _{1},\ldots ,\mu _{d}}(C,C')\), where \(C,C'\) are (arbitrary) representatives of the equivalence classes \(\mathcal{C},\mathcal{C}'\in \mathbf{C}_{d}/_{\sim _{\mu _{1},\ldots ,\mu _{d}}}\). The topology on \(\mathbf{C}_{d}/_{\sim _{\mu _{1},\ldots ,\mu _{d}}}\) generated by \(d_{\mu _{1},\ldots ,\mu _{d}}^{\sim}\), henceforth denoted by \(\mathcal{O}_{\mu _{1},\ldots ,\mu _{d}}^{\sim}\), preserves the topological structure in the sense that a set \(G\subseteq \mathbf{C}_{d}\) lies in \(\mathcal{O}_{\mu _{1},\ldots ,\mu _{d}}\) if and only if the set

$$ \{\mathcal{C}\in \mathbf{C}_{d}/_{\sim _{\mu _{1},\ldots ,\mu _{d}}}:\mbox{there exists a }C\in \mathcal{C}\mbox{ with }C\in G\} $$

lies in \(\mathcal{O}_{\mu _{1},\ldots ,\mu _{d}}^{\sim}\).

2.2 The set \(\mathcal{M}_{d}^{p}\) and the \(p\)-weak topology

Fix \(p\in \mathbb{R}_{+}\) and let \(\mathcal{M}_{d}^{p}\) be the set of all \(\mu \in{\mathcal{M}}_{d}\) for which \(\int _{\mathbb{R}^{d}}|x|^{p}\,\mu (dx)<\infty \). Note that \(\mathcal{M}_{d}=\mathcal{M}_{d}^{0}\supseteq{\mathcal{M}}_{d}^{p_{1}}\supseteq{\mathcal{M}}_{d}^{p_{2}}\) for any \(p_{1},p_{2}\in \mathbb{R}_{+}\) with \(p_{1}\le p_{2}\). The \(p\)-weak topology on \(\mathcal{M}_{d}^{p}\), henceforth denoted by \(\mathcal{O}_{d}^{p}\), is defined as the coarsest topology for which all mappings \(\mu \mapsto \int f\,d\mu \), \(f\in {\mathcal{C}}_{d}^{p}\), are continuous, where \(\mathcal{C}_{d}^{p}\) is the space of all continuous functions \(f:\mathbb{R}^{d}\to \mathbb{R}\) with \(\sup _{x\in \mathbb{R}^{d}}|f(x)|/(1+|x|^{p})<\infty \). The 0-weak topology on \(\mathcal {M}_{d}^{0}\) (\(=\mathcal{M}_{d}\)) is just the classical weak topology, and the \(p\)-weak topology \(\mathcal{O}_{d}^{p}\) is finer than the relative weak topology \(\mathcal{O}_{d}^{0}\cap{\mathcal{M}}_{d}^{p}\) when \(p>0\).
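A standard example, sketched here under assumptions not taken from the text, shows why the \(p\)-weak topology can be strictly finer than the relative weak topology. Consider the hypothetical sequence \(\mu _{n}:=(1-1/n)\delta _{0}+(1/n)\delta _{n}\) in \(\mathcal{M}_{1}\). It converges weakly to \(\delta _{0}\), because \(\int f\,d\mu _{n}=(1-1/n)f(0)+f(n)/n\to f(0)\) for every bounded continuous \(f\). However, \(f(x)=|x|\) belongs to \(\mathcal{C}_{1}^{1}\), and the snippet shows that its integral stays at 1 instead of tending to 0.

```python
def pth_moment(atoms, p):
    """Integral of |x|^p with respect to a discrete measure {atom: mass}."""
    return sum(mass * abs(x) ** p for x, mass in atoms.items())


def mu_n(n):
    """Hypothetical sequence mu_n = (1 - 1/n) * delta_0 + (1/n) * delta_n."""
    return {0.0: 1.0 - 1.0 / n, float(n): 1.0 / n}


for n in (10, 100, 1000, 10000):
    # First moment stays near 1; the moment of order 1/2 equals n**(-1/2).
    print(n, pth_moment(mu_n(n), 1.0), pth_moment(mu_n(n), 0.5))
```

The first moments do not converge to the first moment of \(\delta _{0}\), so the sequence does not converge in \(\mathcal{O}_{1}^{1}\), whereas the moments of order \(1/2\) do tend to 0.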

It is known from Krätschmer et al. [28, Lemma 2.1] that \((\mathcal{M}_{d}^{p},\mathcal{O}_{d}^{p})\) is a Polish space and that \(\mu _{n}\to \mu \) in \(\mathcal{O}_{d}^{p}\) if and only if both \(\mu _{n}\to \mu \) in \(\mathcal{O}_{d}^{0}\cap{\mathcal{M}}_{d}^{p}\) and

$$ \int _{\mathbb{R}^{d}}|x|^{p}\,\mu _{n}(dx)\longrightarrow \int _{\mathbb{R}^{d}}|x|^{p}\,\mu (dx). $$

In particular, \(\mathcal{O}_{d}^{p}\) is metrised by

$$ d(\mu ,\nu ):=d_{\mathrm{weak}}(\mu ,\nu )+\bigg|\int _{\mathbb{R}^{d}}|x|^{p}\,\mu (dx)-\int _{\mathbb{R}^{d}}|x|^{p}\,\nu (dx)\bigg| $$

for any metric \(d_{\mathrm{weak}}\) which metrises \(\mathcal{O}_{d}^{0}\). Already in the 1980s, Bickel and Freedman [5, Lemma 8.3] proved for \(p\in [1,\infty )\) that \(\mathcal{O}_{d}^{p}\) is also metrisable by the \(L^{p}\)-Wasserstein metric. The following proposition is a sort of continuous mapping theorem.

Proposition 2.2

Let \(d,d'\in \mathbb{N}\) and \(p,p'\in \mathbb{R}_{+}\). Let \(h:\mathbb{R}^{d}\to \mathbb{R}^{d'}\) be a continuous function such that \(\sup _{x\in \mathbb{R}^{d}}|h(x)|^{p'}/(1+|x|^{p})<\infty \). Then \(\mathfrak{h}(\mu ):=\mu \circ h^{-1}\) lies in \(\mathcal{M}_{d'}^{p'}\) for any \(\mu \in{\mathcal{M}}_{d}^{p}\), and the map \(\mathfrak{h}:\mathcal{M}_{d}^{p}\to{\mathcal{M}}_{d'}^{p'}\) is \((\mathcal{O}_{d}^{p},\mathcal{O}_{d'}^{p'})\)-continuous.

2.3 A generalisation of Deheuvels’ copula convergence theorem

Deheuvels’ convergence theorem [10, Théorème 2.3, Lemma 4.1] says that given a \(d\)-variate distribution \(\mu \in{\mathcal{M}}_{d}\) whose marginal distributions \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}\) all possess continuous distribution functions \(F_{\mu _{1}},\ldots ,F_{\mu _{d}}\), a sequence \((\mu _{n})_{n\in \mathbb{N}}\in \mathcal{M}_{d}^{\mathbb{N}}\) converges to \(\mu \) in \(\mathcal{O}_{d}^{0}\) if and only if \((\mu _{n,i})\) converges to \(\mu _{i}\) in \(\mathcal{O}_{1}^{0}\), \(i=1,\ldots ,d\), and \(d_{[0,1]^{d}}(C_{n},C)\to 0\). Here \(C\) is the unique copula of \(\mu \), \(C_{n}\) is any copula of \(\mu _{n}\), and \(\mu _{n,i}\) is the \(i\)th marginal distribution of \(\mu _{n}\). Sempi [47] and Lindner and Szimayer [31] extended Deheuvels’ result to the general case where the marginal distribution functions \(F_{\mu _{1}},\ldots ,F_{\mu _{d}}\) might be discontinuous. Theorem 2.1 in [31] shows that \((\mu _{n})_{n\in \mathbb{N}}\in \mathcal{M}_{d}^{\mathbb{N}}\) converges to \(\mu \) in \(\mathcal{O}_{d}^{0}\) if and only if \((\mu _{n,i})\) converges to \(\mu _{i}\) in \(\mathcal{O}_{1}^{0}\), \(i=1,\ldots ,d\), and \(d_{\mu _{1},\ldots ,\mu _{d}}(C_{n},C)\to 0\) (a similar result was proved earlier for \(d=2\) in [47, Theorems 2 and 3]). Recall that \(C\) is uniquely determined only on \(\overline{\mathrm{ran}F_{\mu _{1}}}\times \cdots \times \overline{\mathrm{ran}F_{\mu _{d}}}\). The results of [47, Example 2] and [31, Example 2.2] show that convergence of the copula on the whole unit cube \([0,1]^{d}\) can indeed fail. Theorem 2.3 below is a version of the Sempi–Lindner–Szimayer result where the weak topologies are replaced by \(p\)-weak topologies.

Consider the map \(\mathfrak{P}_{d}:\mathbf{C}_{d}\times\mathcal{M}_{1}\times \cdots \times\mathcal{M}_{1}\to{\mathcal{M}}_{d}\) defined by

$$ \mathfrak{P}_{d}(C,\mu _{1},\ldots ,\mu _{d}):=\mathfrak{p}_{d}\big(C(F_{ \mu _{1}},\ldots ,F_{\mu _{d}})\big), $$
(2.3)

where \(\mathfrak{p}_{d}\) assigns to a \(d\)-variate distribution function its corresponding Borel probability measure on \(\mathbb{R}^{d}\). Note that \(\mathfrak{P}_{d}(C,\mu _{1},\ldots ,\mu _{d})\) remains unchanged when \(C\) is modified outside \(\overline{\mathrm{ran}F_{\mu _{1}}}\times \cdots \times \overline{\mathrm{ran}F_{\mu _{d}}}\). It is easily seen (see Appendix A.2) that for any \(p\in \mathbb{R}_{+}\), the univariate distributions \(\mu _{1},\ldots ,\mu _{d}\) lie in \(\mathcal{M}_{1}^{p}\) if and only if the \(d\)-variate distribution \(\mathfrak{P}_{d}(C,\mu _{1},\ldots ,\mu _{d})\) lies in \(\mathcal{M}_{d}^{p}\), regardless of the copula \(C\). In particular, the restriction of \(\mathfrak{P}_{d}\) to \(\mathbf{C}_{d}\times\mathcal{M}_{1}^{p}\times \cdots \times\mathcal{M}_{1}^{p}\) can be regarded as an \(\mathcal{M}_{d}^{p}\)-valued map.

Theorem 2.3

Fix \(p\in \mathbb{R}_{+}\) and let \((C,\mu _{1},\ldots ,\mu _{d})\) and \((C_{n},\mu _{n,1},\ldots ,\mu _{n,d})\), \(n\in \mathbb{N}\), be elements of \(\mathbf{C}_{d}\times\mathcal{M}_{1}^{p}\times \cdots \times\mathcal{M}_{1}^{p}\). Then

$$ \mathfrak{P}_{d}(C_{n},\mu _{n,1},\ldots ,\mu _{n,d})\longrightarrow \mathfrak{P}_{d}(C,\mu _{1},\ldots ,\mu _{d}) $$

in \(\mathcal{O}_{d}^{p}\) if and only if \(\mu _{n,i}\to \mu _{i}\) in \(\mathcal{O}_{1}^{p}\), \(i=1,\ldots ,d\), and \(d_{\mu _{1},\ldots ,\mu _{d}}(C_{n},C)\to 0\).

Recall that a sequence in a product space converges in the product topology if and only if for each projection, the corresponding marginal sequence converges. As a direct consequence, we can obtain from Theorem 2.3 the following corollary, taking into account that each of the involved topologies is metrisable, or at least pseudo-metrisable, and that \(d_{\mu _{1},\ldots ,\mu _{d}}(C,C)=0\) for any \(C\in \mathbf{C}_{d}\) and \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}\).

Corollary 2.4

For any \(p\in \mathbb{R}_{+}\), \(C\in \mathbf{C}_{d}\) and \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}^{p}\), the following two assertions hold:

(i) The map \(\mathfrak{P}_{d}(\,\cdot \,,\mu _{1},\ldots ,\mu _{d}):\mathbf{C}_{d} \to{\mathcal{M}}_{d}^{p}\) is \((\mathcal{O}_{\mu _{1},\ldots ,\mu _{d}},\mathcal{O}_{d}^{p})\)-continuous.

(ii) The map \(\mathfrak{P}_{d}(C,\,\cdot \,,\,\ldots ,\,\cdot \,):\mathcal{M}_{1}^{p} \times \cdots \times\mathcal{M}_{1}^{p}\to{\mathcal{M}}_{d}^{p}\) is continuous for the pair \((\mathcal{O}_{1}^{p}\times \cdots \times\mathcal{O}_{1}^{p},\mathcal{O}_{d}^{p})\).

2.4 Characterisation of (\(p\)-)weak convergence in Fréchet classes

For any \(p\in \mathbb{R}_{+}\) and \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}^{p}\), the image of the map

$$ \mathfrak{P}_{d}(\,\cdot \,,\mu _{1},\ldots ,\mu _{d}):\mathbf{C}_{d} \to{\mathcal{M}}_{d}^{p} $$

is the Fréchet class \(\mathcal{M}_{d}(\mu _{1},\ldots ,\mu _{d})\), and we have \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}^{p'}\) as well as \({\mathcal{M}}_{d}(\mu _{1},\ldots ,\mu _{d})\subseteq{\mathcal{M}}_{d}^{p'}\) for any \(p'\in [0,p]\). Therefore, Corollary 2.4 (i) immediately yields the following result.

Corollary 2.5

For any \(p\in \mathbb{R}_{+}\) and \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}^{p}\), the map

$$ \mathfrak{P}_{d}(\,\cdot \,,\mu _{1},\ldots ,\mu _{d}):\mathbf{C}_{d} \to{\mathcal{M}}_{d}(\mu _{1},\ldots ,\mu _{d}) $$

is \((\mathcal{O}_{\mu _{1},\ldots ,\mu _{d}},\mathcal{O}_{d}^{p'}\cap{\mathcal{M}}_{d}( \mu _{1},\ldots ,\mu _{d}))\)-continuous for any \(p'\in [0,p]\).

As a simple consequence of Theorem 2.3, we obtain the following corollary (see Appendix A.3). The result is already known from Krätschmer et al. [28, Proposition 3.9] (with \(A_{d}\) chosen to be the identity on \(\mathbb{R}^{d}\)), where other arguments have been used for the proof.

Corollary 2.6

For any \(p\in \mathbb{R}_{+}\) and \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}^{p}\), we have that

$$ {\mathcal{O}}_{d}^{p}\cap{\mathcal{M}}_{d}(\mu _{1},\ldots ,\mu _{d})=\mathcal{O}_{d}^{p'} \cap{\mathcal{M}}_{d}(\mu _{1},\ldots ,\mu _{d}) $$

for any \(p'\in [0,p]\).

For any fixed \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}^{p}\), we use as before \(\mathbf{C}_{d}/_{\sim _{\mu _{1},\ldots ,\mu _{d}}}\) to denote the quotient set of \(\mathbf{C}_{d}\) with respect to the equivalence relation \(\sim _{\mu _{1},\ldots ,\mu _{d}}\) of identity on \(\overline{\mathrm{ran}F_{\mu _{1}}}\times \cdots \times \overline{\mathrm{ran}F_{\mu _{d}}}\). Recall from Remark 2.1 that we denote by \(\mathcal{O}_{\mu _{1},\ldots ,\mu _{d}}^{\sim}\) the topology on \(\mathbf{C}_{d}/_{\sim _{\mu _{1},\ldots ,\mu _{d}}}\) generated by the metric \(d_{\mu _{1},\ldots ,\mu _{d}}^{\sim}\) corresponding to the pseudo-metric \(d_{\mu _{1},\ldots ,\mu _{d}}\), and that \(\mathcal{O}_{\mu _{1},\ldots ,\mu _{d}}^{\sim}\) preserves the topological structure of \(\mathcal{O}_{\mu _{1},\ldots ,\mu _{d}}\).

Let us denote by \(\mathfrak{P}_{\mu _{1},\ldots ,\mu _{d}}:\mathbf{C}_{d}/_{\sim _{ \mu _{1},\ldots ,\mu _{d}}}\to{\mathcal{M}}_{d}(\mu _{1},\ldots ,\mu _{d})\) the map that assigns to each equivalence class \(\mathcal{C}\in \mathbf{C}_{d}/_{\sim _{\mu _{1},\ldots ,\mu _{d}}}\) the unique probability measure \(\mu _{\mathcal{C}}\in \mathcal{M}_{d}(\mu _{1},\ldots ,\mu _{d})\) that satisfies \(\mu _{\mathcal{C}}=\mathfrak{P}_{d}(C,\mu _{1},\ldots ,\mu _{d})\) for all representatives \(C\in \mathcal{C}\). Then Corollary 2.5 can be reformulated as follows.

Corollary 2.7

For any \(p\in \mathbb{R}_{+}\) and \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}^{p}\), the map

$$ \mathfrak{P}_{\mu _{1},\ldots ,\mu _{d}}:\mathbf{C}_{d}/_{\sim _{\mu _{1}, \ldots ,\mu _{d}}}\to{\mathcal{M}}_{d}(\mu _{1},\ldots ,\mu _{d}) $$

is \((\mathcal{O}_{\mu _{1},\ldots ,\mu _{d}}^{\sim},\mathcal{O}_{d}^{p'}\cap{\mathcal{M}}_{d}(\mu _{1},\ldots ,\mu _{d}))\)-continuous for any \(p'\in [0,p]\).

Let \(\mathfrak{C}_{\mu _{1},\ldots ,\mu _{d}}:\mathcal{M}_{d}(\mu _{1}, \ldots ,\mu _{d})\to \mathbf{C}_{d}/_{\sim _{\mu _{1},\ldots ,\mu _{d}}}\) be the map that assigns to each \(\mu \in{\mathcal{M}}_{d}(\mu _{1},\ldots ,\mu _{d})\) the unique equivalence class \(\mathcal{C}_{\mu}\in \mathbf{C}_{d}/_{\sim _{\mu _{1},\ldots ,\mu _{d}}}\) whose representatives are copulas of \(\mu \). Then we have the following converse of Corollary 2.7.

Corollary 2.8

For any \(p\in \mathbb{R}_{+}\) and \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}^{p}\), the map

$$ \mathfrak{C}_{\mu _{1},\ldots ,\mu _{d}}:\mathcal{M}_{d}(\mu _{1}, \ldots ,\mu _{d})\to \mathbf{C}_{d}/_{\sim _{\mu _{1},\ldots ,\mu _{d}}} $$

is \((\mathcal{O}_{d}^{p'}\cap{\mathcal{M}}_{d}(\mu _{1},\ldots ,\mu _{d}),\mathcal{O}_{ \mu _{1},\ldots ,\mu _{d}}^{\sim})\)-continuous for any \(p'\in [0,p]\).

As an immediate consequence of Corollaries 2.7 and 2.8, we obtain the following result. Note that the equivalence (a) ⇔ (b) also follows from Corollary 2.6, and that condition (c) is equivalent to \(d_{\mu _{1},\ldots ,\mu _{d}}(C_{n},C)\to 0\) for any copulas \(C\) and \(C_{n}\), \(n\in \mathbb{N}\), of \(\mu \) and \(\mu _{n}\), \(n\in \mathbb{N}\), respectively.

Corollary 2.9

Fix \(p\in \mathbb{R}_{+}\) and \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}^{p}\). Then the following assertions are equivalent for any \((\mu _{n})_{n\in \mathbb{N}}\in \mathcal{M}_{d}(\mu _{1},\ldots ,\mu _{d})^{\mathbb{N}}\) and \(\mu \in \mathcal{M}_{d}(\mu _{1},\ldots ,\mu _{d})\):

(a) \(\mu _{n}\to \mu \) in \(\mathcal{O}_{d}^{0}\cap{\mathcal{M}}_{d}(\mu _{1},\ldots ,\mu _{d})\).

(b) \(\mu _{n}\to \mu \) in \(\mathcal{O}_{d}^{p}\cap{\mathcal{M}}_{d}(\mu _{1},\ldots ,\mu _{d})\).

(c) \(\mathfrak{C}_{\mu _{1},\ldots ,\mu _{d}}(\mu _{n})\to \mathfrak{C}_{ \mu _{1},\ldots ,\mu _{d}}(\mu )\) in \(\mathcal{O}_{\mu _{1},\ldots ,\mu _{d}}^{\sim}\).

3 Copula robustness

3.1 Definition of copula robustness

Let \(\mathcal{M}_{d}'\subseteq{\mathcal{M}}_{d}\) and \(\mathcal{T}_{d}:\mathcal{M}_{d}'\longrightarrow \mathbf{E} \) be any map taking values in some topological space \((\mathbf{E},\mathcal{O}_{\mathbf{E}})\). As before, let the map \(\mathfrak{P}_{d}:\mathbf{C}_{d}\times\mathcal{M}_{1}\times \cdots \times\mathcal{M}_{1}\to{\mathcal{M}}_{d}\) be defined by (2.3). Let \(\mathfrak{D}_{d}'\) be the set of all \((C,\mu _{1},\ldots ,\mu _{d})\in \mathbf{C}_{d}\times\mathcal{M}_{1} \times \cdots \times\mathcal{M}_{1}\) for which \(\mathfrak{P}_{d}(C,\mu _{1},\ldots ,\mu _{d})\) lies in \(\mathcal{M}_{d}'\). Then we can associate with \(\mathcal{T}_{d}\) a functional \(\mathfrak{T}_{d}:\mathfrak{D}_{d}'\to \mathbf{E}\) through

$$ \mathfrak{T}_{d}(C,\mu _{1},\ldots ,\mu _{d}):=\mathcal{T}_{d}\big( \mathfrak{P}_{d}(C,\mu _{1},\ldots ,\mu _{d})\big). $$
(3.1)

Let \(\mathfrak{M}_{d}'\) be the set of all \(d\)-tuples \((\mu _{1},\ldots ,\mu _{d})\in{\mathcal{M}}_{1}\times \cdots \times\mathcal{M}_{1}\) for which there exists a copula \(C\in \mathbf{C}_{d}\) such that \((C,\mu _{1},\ldots ,\mu _{d})\in \mathfrak{D}_{d}'\). Moreover, for any fixed \((\mu _{1},\ldots ,\mu _{d})\in \mathfrak{M}_{d}'\), let the set \(\mathbf{C}_{d}'(\mu _{1},\ldots ,\mu _{d})\) consist of all those copulas \(C\in \mathbf{C}_{d}\) for which \((C,\mu _{1},\ldots ,\mu _{d})\in \mathfrak{D}_{d}'\).

Definition 3.1

The map \(\mathcal{T}_{d}\) is copula robust if for any fixed \((\mu _{1},\ldots ,\mu _{d})\in \mathfrak{M}_{d}'\), the map \(\mathfrak{T}_{d}(\,\cdot \,,\mu _{1},\ldots ,\mu _{d}):\mathbf{C}_{d}'( \mu _{1},\ldots ,\mu _{d})\to \mathbf{E}\) is continuous for the pair

$$ (\mathcal{O}_{\mu _{1},\ldots ,\mu _{d}}\cap \mathbf{C}_{d}'(\mu _{1}, \ldots ,\mu _{d}),\mathcal{O}_{\mathbf{E}}).$$

The sets \(\mathfrak{M}_{d}'\) and \(\mathbf{C}_{d}'(\mu _{1},\ldots ,\mu _{d})\) are illustrated by Examples 3.2 and 3.3 below. The examples show in particular that the set \(\mathbf{C}_{d}'(\mu _{1},\ldots ,\mu _{d})\) can be quite different from case to case. In Example 3.2, and in the further course, let \(\mathcal{N}_{1}\) be the set of all non-degenerate univariate normal distributions and for \(d\ge 2\), let \(\mathcal{N}_{d}\) be the set of all (possibly degenerate) \(d\)-variate normal distributions with continuous marginals. In Example 3.2, we also need the notion of a Gaussian copula. Recall that, by definition, a \(d\)-variate Gaussian copula is an element \(C\in \mathbf{C}_{d}\) given through

$$ C(u_{1},\ldots ,u_{d}):=\boldsymbol{\varPhi}_{\boldsymbol{0},R}\big( \varPhi _{0,1}^{-1}(u_{1}),\ldots ,\varPhi _{0,1}^{-1}(u_{d})\big) $$
(3.2)

for some correlation matrix \(R\), i.e., for some symmetric and positive semi-definite matrix \(R\in [-1,1]^{d\times d}\) which has entries 1 on the diagonal. Here \(\varPhi _{0,1}\) and \(\boldsymbol{\varPhi}_{\boldsymbol{0},R}\) are respectively the distribution function of the univariate standard normal distribution and the distribution function of the centered \(d\)-variate normal distribution \(\mathbf{N}_{\boldsymbol{0},R}\) with covariance matrix equal to \(R\), and we set \(\varPhi _{0,1}^{-1}(0):=-\infty \) and \(\varPhi _{0,1}^{-1}(1):=+ \infty \) as well as

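$$ \boldsymbol{\varPhi}_{\boldsymbol{0},R}(x_{1},\ldots ,x_{d}):=\mathbf{N}_{\boldsymbol{0},R}\Big[\Big(\prod _{i=1}^{d}(-\infty ,x_{i}]\Big)\cap \mathbb{R}^{d}\Big] $$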

for any \(x_{1},\ldots ,x_{d}\in \overline{\mathbb{R}}:=\mathbb{R}\cup \{-\infty ,+\infty \}\). The set of all Gaussian copulas is denoted by \(\mathbf{C}_{d}^{{\mbox{\textup {{\scriptsize {Ga}}}}}}\).
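For \(d=2\) and a correlation \(\rho \in (-1,1)\), the Gaussian copula in (3.2) can be evaluated numerically. The sketch below is only one possible approach and not a method taken from the text: it conditions on the first coordinate, which gives \(C(u,v)=\int _{0}^{u}\varPhi _{0,1}\big((\varPhi _{0,1}^{-1}(v)-\rho \,\varPhi _{0,1}^{-1}(s))/\sqrt{1-\rho ^{2}}\big)\,ds\), and approximates the integral by the midpoint rule. The function name and the number of integration steps are assumptions.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # univariate standard normal distribution


def gaussian_copula_2d(u, v, rho, steps=4000):
    """Approximate the bivariate Gaussian copula with correlation rho, |rho| < 1.

    Midpoint rule applied to
    C(u, v) = int_0^u Phi((Phi^{-1}(v) - rho * Phi^{-1}(s)) / sqrt(1 - rho^2)) ds.
    """
    # Boundary cases follow from the uniform marginals of a copula.
    if u <= 0.0 or v <= 0.0:
        return 0.0
    if u >= 1.0:
        return min(v, 1.0)
    if v >= 1.0:
        return u
    b = _STD.inv_cdf(v)
    scale = math.sqrt(1.0 - rho * rho)
    h = u / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h  # midpoint of the k-th subinterval, always in (0, 1)
        total += _STD.cdf((b - rho * _STD.inv_cdf(s)) / scale)
    return total * h


print(gaussian_copula_2d(0.3, 0.7, 0.0))  # close to 0.3 * 0.7 (independence)
print(gaussian_copula_2d(0.5, 0.5, 0.5))  # close to 1/4 + arcsin(0.5) / (2 * pi)
```

For \(\rho =0\) the integrand is constant and the independence copula is recovered. The second value can be compared with the classical orthant probability of a centred bivariate normal vector.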

Example 3.2

If \(\mathcal{M}_{d}'=\mathcal{N}_{d}\), then \(\mathfrak{D}_{d}'=\mathbf{C}_{d}^{{\mbox{\textup {{\scriptsize {Ga}}}}}}\times\mathcal{N}_{1}\times \cdots \times\mathcal{N}_{1}\) (see Appendix A.5). In particular, \(\mathfrak{M}_{d}'=\mathcal{N}_{1}\times \cdots \times\mathcal{N}_{1}\) and \(\mathbf{C}_{d}'(\mu _{1},\ldots ,\mu _{d})=\mathbf{C}_{d}^{{\mbox{\textup {{\scriptsize {Ga}}}}}}\) for any \((\mu _{1},\ldots ,\mu _{d})\in \mathfrak{M}_{d}'\).

Example 3.3

If \(\mathcal{M}_{d}'=\mathcal{M}_{d}^{p}\) for some \(p\in \mathbb{R}_{+}\), then \(\mathfrak{D}_{d}'=\mathbf{C}_{d}\times\mathcal{M}_{1}^{p}\times \cdots \times\mathcal{M}_{1}^{p}\) (see Appendix A.6). In particular, \(\mathfrak{M}_{d}'=\mathcal{M}_{1}^{p}\times \cdots \times\mathcal{M}_{1}^{p}\) and \(\mathbf{C}_{d}'(\mu _{1},\ldots ,\mu _{d})=\mathbf{C}_{d}\) for any \((\mu _{1},\ldots ,\mu _{d})\in \mathfrak{M}_{d}'\).

The following lemma is trivial but nevertheless worth recording. In the lemma, \((\mathbf{E}',\mathcal{O}_{\mathbf{E}}')\) is another topological space.

Lemma 3.4

If \(\mathcal{T}_{d}\) is copula robust and \(\mathcal{U}:\mathbf{E}\to \mathbf{E}'\) is any \((\mathcal{O}_{\mathbf{E}},\mathcal{O}_{\mathbf{E}}')\)-continuous map, then the composition \(\mathcal{T}_{d}':=\mathcal{U}\circ{\mathcal{T}}_{d}\) is copula robust.

3.2 Copula robustness of functionals on \(\mathcal{N}_{d}\)

In this section, let specifically \(\mathcal{M}_{d}'=\mathcal{N}_{d}\). That is, let \(\mathcal{T}_{d}:\mathcal{N}_{d}\to \mathbf{E}\) be any map taking values in some topological space \((\mathbf{E},\mathcal{O}_{\mathbf{E}})\). In view of Example 3.2, the definition of copula robustness of \(\mathcal{T}_{d}\) (Definition 3.1) can then be reformulated as follows.

Definition 3.5

The map \(\mathcal{T}_{d}\) on \(\mathcal{N}_{d}\) is copula robust if for any fixed \(\mu _{1},\ldots ,\mu _{d}\in \mathcal{N}_{1}\), the map \(\mathfrak{T}_{d}(\,\cdot \,,\mu _{1},\ldots ,\mu _{d}):\mathbf{C}_{d}^{ {\mbox{\textup {{\scriptsize {Ga}}}}}}\to \mathbf{E}\) is \((\mathcal{O}_{[0,1]^{d}}\cap \mathbf{C}_{d}^{{\mbox{\textup {{\scriptsize {Ga}}}}}},\mathcal{O}_{ \mathbf{E}})\)-continuous.

Remark 3.6

Convergence in \((\mathbf{C}_{d}^{{\mbox{\textup {{\scriptsize {Ga}}}}}},\mathcal{O}_{[0,1]^{d}}\cap \mathbf{C}_{d}^{ {\mbox{\textup {{\scriptsize {Ga}}}}}})\) is nothing but pointwise (or uniform) convergence in \(\mathbf{C}_{d}^{{\mbox{\textup {{\scriptsize {Ga}}}}}}\). This sort of convergence is therefore equivalent to convergence of the respective correlation matrices in any matrix norm; for details, see Appendix A.7.
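As a minimal numerical illustration of Remark 3.6 for \(d=2\), the following Python sketch evaluates the bivariate Gaussian copula with correlation \(r\) at the single point \((1/2,1/2)\). It relies on the classical orthant-probability identity \(\boldsymbol{\varPhi}_{\boldsymbol{0},R}(0,0)=1/4+\arcsin (r)/(2\pi )\), which is not taken from the present text, and the function name `gaussian_copula_at_half` is purely illustrative. The value depends continuously on \(r\), in line with the remark, although a single evaluation point of course does not establish uniform convergence.

```python
import math


def gaussian_copula_at_half(r):
    """Bivariate Gaussian copula with correlation r, evaluated at (1/2, 1/2).

    Since the standard normal quantile at 1/2 is 0, this is the probability
    that both components of a centred bivariate normal vector with unit
    variances and correlation r are <= 0. The closed form
    1/4 + arcsin(r) / (2*pi) is an assumption of this sketch (a classical
    identity), not a formula from the text.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return 0.25 + math.asin(r) / (2.0 * math.pi)


# Correlations r_n -> 0.5 yield copula values approaching the value at 0.5.
target = gaussian_copula_at_half(0.5)
gaps = [abs(gaussian_copula_at_half(0.5 + 1.0 / n) - target) for n in (10, 100, 1000)]
```

The boundary cases \(r=0\), \(r=1\) and \(r=-1\) reproduce the values \(1/4\), \(1/2\) and \(0\) of the independence copula and the two Fréchet bounds at \((1/2,1/2)\).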

Example 3.7

The identity map \(\mathcal{P}_{d}:\mathcal{N}_{d}\to{\mathcal{N}}_{d}\) is copula robust in the sense of Definition 3.5 when the role of \((\mathbf{E},\mathcal{O}_{\mathbf{E}})\) is played by \((\mathcal{N}_{d},\mathcal{O}_{d}^{p}\cap{\mathcal{N}}_{d})\) for arbitrary (but fixed) \(p\in \mathbb{R}_{+}\). For details, see Appendix A.8.

Example 3.7 and Lemma 3.4 (applied to \(\mathcal{T}_{d}':=\mathcal{T}_{d}\), \(\mathcal{T}_{d}:=\mathcal{P}_{d}\), \(\mathcal{U}:=\mathcal{T}_{d}\)) immediately yield the following result.

Theorem 3.8

If \(\mathcal{T}_{d}\) is \((\mathcal{O}_{d}^{p}\cap{\mathcal{N}}_{d},\mathcal{O}_{\mathbf{E}})\)-continuous for some \(p\in \mathbb{R}_{+}\), then it is copula robust.

Of course, Theorem 3.8 can be generalised to larger sets of parametric distributions, such as the set \(\mathcal{S}_{d}\) of all \(d\)-variate (Student) \(t\)-distributions with continuous marginals; see for instance Demarta and McNeil [11] for the definitions of \(d\)-variate \(t\)-distributions and \(t\)-copulas. However, for the sake of clarity and ease, the exposition here is restricted to the Gaussian setting. A perhaps more interesting setting is addressed in the next section.

3.3 Copula robustness of functionals on \(\mathcal{M}_{d}^{p}\)

In this section, let specifically \(\mathcal{M}_{d}'=\mathcal{M}_{d}^{p}\) for some \(p\in \mathbb{R}_{+}\). That is, let \(\mathcal{T}_{d}:\mathcal{M}_{d}^{p}\to \mathbf{E}\) be any map taking values in some topological space \((\mathbf{E},\mathcal{O}_{\mathbf{E}})\). In view of Example 3.3, the definition of copula robustness of \(\mathcal{T}_{d}\) (Definition 3.1) can then be reformulated as follows.

Definition 3.9

The map \(\mathcal{T}_{d}\) on \(\mathcal{M}_{d}^{p}\) is copula robust if for any fixed \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}^{p}\), the map \(\mathfrak{T}_{d}(\,\cdot \,,\mu _{1},\ldots ,\mu _{d}):\mathbf{C}_{d} \to \mathbf{E}\) is \((\mathcal{O}_{\mu _{1},\ldots ,\mu _{d}},\mathcal{O}_{\mathbf{E}})\)-continuous.

With the help of Corollaries 2.5 and 2.8, we can derive the following characterisation of copula robustness of \(\mathcal{T}_{d}\). For details, see Appendix A.9.

Theorem 3.10

Let \(\mathcal{T}_{d}: \mathcal{M}_{d}^{p} \to \mathbf{E}\) be any map. Then \(\mathcal{T}_{d}\) is copula robust if and only if for any fixed \(\mu _{1},\ldots ,\mu _{d}\in{\mathcal{M}}_{1}^{p}\), its restriction \(\mathcal{T}_{d}|_{\mathcal{M}_{d}(\mu _{1},\ldots ,\mu _{d})}\) to the Fréchet class \(\mathcal{M}_{d}(\mu _{1},\ldots ,\mu _{d})\) is continuous for the pair \((\mathcal{O}_{d}^{0}\cap{\mathcal{M}}_{d}(\mu _{1},\ldots ,\mu _{d}),\mathcal{O}_{ \mathbf{E}})\).

Example 3.11

Corollary 2.4 (i) shows that the identity map \(\mathcal{P}_{d}:\mathcal{M}_{d}^{p}\to{\mathcal{M}}_{d}^{p}\) is copula robust in the sense of Definition 3.9 when the role of \((\mathbf{E},\mathcal{O}_{\mathbf{E}})\) is played by \((\mathcal{M}_{d}^{p},\mathcal{O}_{d}^{p})\).

Example 3.11 and Lemma 3.4 (applied to \(\mathcal{T}_{d}':=\mathcal{T}_{d}\), \(\mathcal{T}_{d}:=\mathcal{P}_{d}\), \(\mathcal{U}:=\mathcal{T}_{d}\)) immediately yield the following result.

Theorem 3.12

If \(\mathcal{T}_{d}\) is \((\mathcal{O}_{d}^{p},\mathcal{O}_{\mathbf{E}})\)-continuous, then it is copula robust.

Now fix \(d'\in \mathbb{N}\) and \(p'\in \mathbb{R}_{+}\). Proposition 2.2 ensures that in the scope of the following corollary, we have \(\mu \circ h^{-1}\in{\mathcal{M}}_{d'}^{p'}\) for any \(\mu \in{\mathcal{M}}_{d}^{p}\).

Corollary 3.13

Let \(\mathcal{T}_{d'}:\mathcal{M}_{d'}^{p'}\to \mathbf{E}\) be an \((\mathcal{O}_{d'}^{p'},\mathcal{O}_{\mathbf{E}})\)-continuous map and suppose \(h:\mathbb{R}^{d}\to \mathbb{R}^{d'}\) is a continuous map with \(\sup _{x\in \mathbb{R}^{d}}|h(x)|^{p'}/(1+|x|^{p})<\infty \). Then the map \(\mathcal{T}_{d}':\mathcal{M}_{d}^{p}\to \mathbf{E}\) defined by \(\mathcal{T}_{d}'(\mu ):=\mathcal{T}_{d'}(\mu \circ h^{-1})\) is copula robust.

4 Example 1: risk measures of aggregate risks

4.1 Foundations of risk measures

Let \((\varOmega ,\mathcal{F},\mathbb{P})\) be an atomless probability space and denote by \(L^{0}:=L^{0}(\varOmega ,\mathcal{F},\mathbb{P})\) the usual class of all finite-valued random variables modulo the equivalence relation of \(\mathbb{P}\)-a.s. identity. Moreover, let \(L^{p}=L^{p}(\varOmega ,\mathcal{F},\mathbb{P})\) be the usual \(L^{p}\)-space, \(p>0\). For any \(p\in \mathbb{R}_{+}\), we say that a map \(\rho :L^{p}\to \mathbb{R}\) is a risk measure when the following three conditions are satisfied:

(i) (monotonicity) \(\rho (X)\le \rho (Y)\) for \(X\), \(Y\in L^{p}\) with \(X\le Y\);

(ii) (cash-additivity) \(\rho (X+m)=\rho (X)+m\) for \(X\in L^{p}\) and \(m\in \mathbb{R}\);

(iii) (distribution-invariance) \(\rho (X)=\rho (Y)\) for \(X,Y\in L^{p}\) with \(\mathbb{P}_{X}=\mathbb{P}_{Y}\).

In this context, the elements of \(L^{p}\) should be seen as payoff profiles where positive realisations correspond to losses. Following Föllmer and Schied [22], [23, Chap. 4], a risk measure \(\rho :L^{p}\to \mathbb{R}\) is said to be convex if it satisfies the following condition:

(iv) (convexity) \(\rho (\lambda X+(1-\lambda ) Y)\le \lambda \rho (X)+(1-\lambda ) \rho (Y)\) for all \(X,Y\in L^{p}\) and \(\lambda \in [0,1]\).

The following example recalls three risk measures which are popular in practice and/or among academics. For background, see Emmer et al. [18] and references cited therein.

Example 4.1

Fix \(\alpha \in (0,1)\).

(i) The value at risk at level \(\alpha \) is the risk measure \(\mathrm{VaR}_{\alpha}:L^{0}\to \mathbb{R}\) defined by \(\mathrm{VaR}_{\alpha}(X):=F_{X}^{\leftarrow}(\alpha )\), where \(F_{X}^{\leftarrow}(\alpha ):=\inf \{x\in \mathbb{R}:F_{X}(x)\ge \alpha \}\) is the lower \(\alpha \)-quantile of \(\mathbb{P}_{X}\). It is not convex.

(ii) The average value at risk at level \(\alpha \) is the risk measure \(\mathrm{AVaR}_{\alpha}:L^{1}\to \mathbb{R}\) defined by \(\mathrm{AVaR}_{\alpha}(X):=\frac{1}{1-\alpha}\int _{\alpha}^{1}F_{X}^{ \leftarrow}(s)\,ds\) and known to be convex; see for instance the work of Wang and Dhaene [53].

(iii) The expectile at level \(\alpha \) is the risk measure \(\mathrm{Ept}_{\alpha}:L^{1}\to \mathbb{R}\) defined by \(\mathrm{Ept}_{\alpha}(X):=\mathcal{U}_{\alpha}(X)^{-1}(0)\), where \(\mathcal{U}_{\alpha}(X)^{-1}\) denotes the inverse of the function \(\mathcal{U}_{\alpha}(X)(m):=\mathbb{E}[U_{\alpha}(X-m)]\) with \(U_{\alpha}(x):=\alpha x\) or \((1-\alpha )x\) depending on whether \(x\ge 0\) or \(x<0\). It is well defined, and known to be convex if and only if \(\alpha \ge 1/2\); see the work of Bellini et al. [3].
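The three risk measures of Example 4.1 can be evaluated directly on an empirical distribution. The following Python sketch does so for a finite sample with equal weights, which is an assumption made for illustration only; the function names and the bisection tolerance are hypothetical choices and not part of the text.

```python
import math


def value_at_risk(sample, alpha):
    """Lower alpha-quantile inf{x : F_n(x) >= alpha} of the empirical distribution."""
    xs = sorted(sample)
    n = len(xs)
    # Smallest k with k/n >= alpha; the small shift guards against rounding in n*alpha.
    k = max(1, math.ceil(n * alpha - 1e-12))
    return xs[k - 1]


def average_value_at_risk(sample, alpha):
    """(1/(1-alpha)) times the integral of the empirical quantile function over (alpha, 1)."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for k, x in enumerate(xs, start=1):
        # The empirical quantile function equals x on ((k-1)/n, k/n].
        lower, upper = max((k - 1) / n, alpha), k / n
        if upper > lower:
            total += x * (upper - lower)
    return total / (1.0 - alpha)


def expectile(sample, alpha, tol=1e-12):
    """Root m of the empirical mean of U_alpha(X - m), found by bisection."""
    def mean_utility(m):
        return sum(alpha * (x - m) if x >= m else (1.0 - alpha) * (x - m)
                   for x in sample) / len(sample)

    lo, hi = min(sample), max(sample)  # mean_utility(lo) >= 0 >= mean_utility(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_utility(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


losses = [1.0, 2.0, 3.0, 4.0]
summary = (value_at_risk(losses, 0.5),
           average_value_at_risk(losses, 0.5),
           expectile(losses, 0.5))
```

For the sample above and \(\alpha =1/2\), the expectile reduces to the sample mean, since \(U_{1/2}(x)=x/2\).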

For any risk measure \(\rho :L^{p}\to \mathbb{R}\), we may define a functional \(\mathcal{R}_{\rho}:\mathcal{M}_{1}^{p}\to \mathbb{R}\) through

$$ {\mathcal{R}}_{\rho}(\mu ):=\rho (X_{\mu}), $$
(4.1)

where \(X_{\mu}\) is any random variable on \((\varOmega ,\mathcal{F},\mathbb{P})\) with distribution \(\mu \). We refer to \(\mathcal{R}_{\rho}\) as the risk functional associated with \(\rho \). The assertion of the following result is a direct consequence of Cheridito and Li [8, Theorem 4.1] combined with the representation theorem of Krätschmer et al. [27, Theorem 3.5]. Here \(\mathcal{O}_{\mathbb{R}}\) refers to the natural topology on \(\mathbb{R}\).

Theorem 4.2

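Let \(p\in \mathbb{R}_{+}\). For any convex risk measure \(\rho :L^{p}\to \mathbb{R}\), the corresponding risk functional \(\mathcal{R}_{\rho}:\mathcal{M}_{1}^{p}\to \mathbb{R}\) is \((\mathcal{O}_{1}^{p},\mathcal{O}_{\mathbb{R}})\)-continuous.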

4.2 Copula robustness of risk measures of aggregate risks

Let \(\rho :L^{p'}\to \mathbb{R}\) be a risk measure for some \(p'\in \mathbb{R}_{+}\). Let \(A_{d}:\mathbb{R}^{d}\to \mathbb{R}\) be any continuous map, regarded as an aggregation map in the spirit of McNeil et al. [33, Sect. 6.2.1]. Assume that for some \(p\in \mathbb{R}_{+}\) and any \(X_{1},\ldots ,X_{d}\in L^{p}\), the random variable \(A_{d}(X_{1},\ldots ,X_{d})\) lies in \(L^{p'}\). Then we can define a map \(\mathcal{R}_{\rho ,A_{d}}:\mathcal{M}_{d}^{p}\to \mathbb{R}\) through

$$ \mathcal{R}_{\rho ,A_{d}}(\mu ):=\mathcal{R}_{\rho }(\mu \circ A_{d}^{-1} ). $$
(4.2)

We refer to \(\mathcal{R}_{\rho ,A_{d}}\) as the aggregation risk functional associated with \(\rho \) and \(A_{d}\). Note that the right-hand side in (4.2) equals \(\rho (A_{d}(X_{1},\ldots ,X_{d}))\) when \((X_{1},\ldots ,X_{d})\) is an \(\mathbb{R}^{d}\)-valued random variable with distribution \(\mu \). As a direct consequence of Corollary 3.13 (applied to \(\mathcal{T}_{1}:=\mathcal{R}_{\rho}\), \(h:=A_{d}\), \(\mathcal{T}_{d}':=\mathcal{R}_{\rho ,A_{d}}\)) and Theorem 4.2, we obtain the following result.

Corollary 4.3

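Take \(p,p'\in \mathbb{R}_{+}\), a convex risk measure \(\rho :L^{p'}\to \mathbb{R}\) and a continuous map \(A_{d}:\mathbb{R}^{d}\to \mathbb{R}\) satisfying \(\sup _{x\in \mathbb{R}^{d}}|A_{d}(x)|^{p'}/(1+|x|)^{p}<\infty \). Then the aggregation risk functional \(\mathcal{R}_{\rho ,A_{d}}:\mathcal{M}_{d}^{p}\to \mathbb{R}\) defined by (4.2) is copula robust.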

Example 4.4

In risk management, \(A_{d}\) is frequently chosen as one of the following maps; see for instance the textbook by McNeil et al. [33, Sect. 6.2]:

(i) \(A_{d}(x_{1},\dots ,x_{d}) := \sum _{i=1}^{d}x_{i}\);

(ii) \(A_{d}(x_{1},\dots ,x_{d}) := \max \{x_{1},\dots ,x_{d}\}\);

(iii) \(A_{d}(x_{1},\dots ,x_{d}) := \sum _{i=1}^{d}(x_{i} - t_{i})^{+}\) for thresholds \(t_{1},\dots ,t_{d} > 0\);

(iv) \(A_{d}(x_{1},\dots ,x_{d}) := (\sum _{i=1}^{d}x_{i} - t)^{+}\) for a threshold \(t > 0\).

It is easily seen that for each of these four maps, \(\sup _{x\in \mathbb{R}^{d}}|A_{d}(x)|^{p}/(1+|x|)^{p}<\infty \) holds for any \(p\in \mathbb{R}_{+}\). That is, all these maps satisfy the assumptions of Corollary 4.3 for \(p'=p\) (and thus for any \(p\in \mathbb{R}_{+}\) and \(p'\in [0,p]\)). In particular, for each of these four maps \(A_{d}\) and for any convex risk measure \(\rho :L^{p}\to \mathbb{R}\), the corresponding aggregation risk functional \(\mathcal{R}_{\rho ,A_{d}}:\mathcal{M}_{d}^{p}\to \mathbb{R}\) defined by (4.2) is copula robust for any \(d\in \mathbb{N}\).
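The growth condition in Example 4.4 can be checked numerically. The following Python sketch implements the four aggregation maps and evaluates the ratio \(|A_{d}(x)|^{p}/(1+|x|)^{p}\) at randomly drawn points. It assumes that \(|\cdot |\) is the Euclidean norm, in which case the ratio stays below \(d^{p/2}\) for each of the four maps; the function names, the thresholds and the sampling scheme are illustrative assumptions and do not come from the text.

```python
import math
import random


def agg_sum(x):
    return sum(x)


def agg_max(x):
    return max(x)


def agg_excess_sum(x, thresholds):
    return sum(max(xi - ti, 0.0) for xi, ti in zip(x, thresholds))


def agg_stop_loss(x, threshold):
    return max(sum(x) - threshold, 0.0)


def growth_ratio(agg, x, p):
    """|A_d(x)|^p / (1 + |x|)^p with the Euclidean norm (an assumption of this sketch)."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return (abs(agg(x)) / (1.0 + norm)) ** p


def worst_ratio(agg, d, p, draws=5000, seed=0):
    """Largest observed growth ratio over random points of widely varying magnitude."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(draws):
        x = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.uniform(-2.0, 4.0) for _ in range(d)]
        worst = max(worst, growth_ratio(agg, x, p))
    return worst


d = 3
maps = {
    "sum": agg_sum,
    "max": agg_max,
    "excess_sum": lambda x: agg_excess_sum(x, [0.5, 1.0, 1.5]),  # hypothetical thresholds
    "stop_loss": lambda x: agg_stop_loss(x, 2.0),                # hypothetical threshold
}
observed = {name: worst_ratio(agg, d, p=2.0) for name, agg in maps.items()}
```

Such a random search can only fail to find a violation; it does not replace the elementary estimate \(|A_{d}(x)|\le \sqrt{d}\,|x|\) behind the bound.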

Remark 4.5

Of course, the assertion of Corollary 4.3 and the last assertion in Example 4.4 also hold true for any other risk measure \(\rho \) for which the corresponding risk functional \(\mathcal{R}_{\rho}:\mathcal{M}_{1}^{p}\to \mathbb{R}\) is \((\mathcal{O}_{1}^{p},\mathcal{O}_{\mathbb{R}})\)-continuous.

Remark 4.5 indicates that in the setting of Corollary 4.3, the assumed convexity of \(\rho \) is not necessary. To give an example that shows that this is indeed true, let \(\rho \) be the \(\alpha \)-expectile \(\mathrm{Ept}_{\alpha}\) with \(\alpha <1/2\) (see Example 4.1 (iii)). Then \(\rho \) is not convex (see Bellini et al. [3, Proposition 7(b–c)]), but the corresponding risk functional \(\mathcal{R}_{\rho}:\mathcal{M}_{1}^{1}\to \mathbb{R}\) is \((\mathcal{O}_{1}^{1},\mathcal{O}_{\mathbb{R}})\)-continuous (see Krätschmer and Zähle [29, Theorem 2.1]), and the latter implies that the aggregation risk functional \(\mathcal{R}_{\rho ,A_{d}}:\mathcal{M}_{d}^{1}\to \mathbb{R}\) is copula robust.

On the other hand, if the risk functional \(\mathcal{R}_{\rho}:\mathcal{M}_{1}^{p}\to \mathbb{R}\) corresponding to some \(\rho \) is not \((\mathcal{O}_{1}^{p},\mathcal{O}_{\mathbb{R}})\)-continuous, then copula robustness of \(\mathcal{R}_{\rho ,A_{d}}:\mathcal{M}_{d}^{p}\to \mathbb{R}\) can indeed fail to hold. For instance, Example 4.7 below shows that \(\mathcal{R}_{\rho ,A_{2}}:\mathcal{M}_{2}^{0}\to \mathbb{R}\) is not copula robust when \(\rho :=\mathrm{VaR}_{\alpha}\) (see Example 4.1 (i)) and \(A_{2}(x_{1},x_{2}):=x_{1}+x_{2}\). Note here that the risk functional \(\mathcal{R}_{\rho}:\mathcal{M}_{1}^{0}\to \mathbb{R}\) associated with \(\rho :=\mathrm{VaR}_{\alpha}\) is known not to be weakly continuous, and that weak continuity is just \((\mathcal{O}_{1}^{0},\mathcal{O}_{\mathbb{R}})\)-continuity.

It is further known that the risk functional \(\mathcal{R}_{\rho}:\mathcal{M}_{1}^{0}\to \mathbb{R}\) associated with \(\rho :=\mathrm{VaR}_{\alpha}\) can be made weakly continuous when restricting it to the set \(\mathcal{M}_{1}^{(\alpha )}\) of all those Borel probability measures on \(\mathbb{R}\) that possess a unique \(\alpha \)-quantile (see e.g. van der Vaart [52, Lemma 21.2]), or even to the set \(\mathcal{M}_{1}^{\mathcal{L}}\) of all \(\mu \in \bigcap _{s\in (0,1)}{\mathcal{M}}_{1}^{(s)}\) that possess a Lebesgue density. Nonetheless, the corresponding aggregation risk functional \(\mathcal{R}_{\rho ,A_{2}}\), with \(A_{2}(x_{1},x_{2}):=x_{1}+x_{2}\), defined on the set \(\mathcal{M}_{2}^{\mathcal{L}}\) of all Borel probability measures on \(\mathbb{R}^{2}\) with marginal distributions in \(\mathcal{M}_{1}^{\mathcal{L}}\), is still not copula robust. This is also a consequence of Example 4.7. The lack of copula robustness of \(\mathcal{R}_{\rho ,A_{2}}\) on \(\mathcal{M}_{2}^{\mathcal{L}}\) is not immediately obvious. Note, however, that for \(\mu \in{\mathcal{M}}_{2}^{\mathcal{L}}\), the image measure \(\mu \circ A_{2}^{-1}\) can be purely discrete (see Example 4.7), i.e., \(\mu \circ A_{2}^{-1}\) can lie outside the set \(\mathcal{M}_{1}^{(\alpha )}\) on which \(\mathcal{R}_{\rho}\) is weakly continuous.

When \(\mathcal{R}_{\rho ,A_{2}}\), with \(\rho :=\mathrm{VaR}_{\alpha}\) and \(A_{2}(x_{1},x_{2}):=x_{1}+x_{2}\), is restricted to the much smaller set \(\mathcal{N}_{2}\) introduced before (3.2), copula robustness does hold. Note that \(\mu \circ A_{2}^{-1}\in{\mathcal{N}}_{1}'\subseteq{\mathcal{M}}_{1}^{(\alpha )}\) for all \(\mu \in{\mathcal{N}}_{2}\), where \(\mathcal{N}_{1}'\) (\(\supseteq{\mathcal{N}}_{1}\)) is the set of all (possibly degenerate) univariate normal distributions. The copula robustness follows from Theorem 3.8 since the restriction of \(\mathcal{R}_{\rho ,A_{2}}\) to \(\mathcal{N}_{2}\) is \((\mathcal{O}_{2}^{0}\cap{\mathcal{N}}_{2},\mathcal{O}_{\mathbb{R}})\)-continuous. The latter follows from the \((\mathcal{O}_{2}^{0}\cap{\mathcal{N}}_{2},\mathcal{O}_{1}^{0}\cap{\mathcal{N}}_{1}')\)-continuity of the map \(\mathfrak{h}:\mathcal{N}_{2}\to{\mathcal{N}}_{1}'\) defined by \(\mathfrak{h}(\mu ):=\mu \circ A_{2}^{-1}\) and the \((\mathcal{O}_{1}^{0}\cap{\mathcal{N}}_{1}',\mathcal{O}_{\mathbb{R}})\)-continuity of the restriction of \(\mathcal{R}_{\rho}\) to \(\mathcal{N}_{1}'\) (\(\subseteq{\mathcal{M}}_{1}^{(\alpha )}\)).

4.3 Relation to aggregation robustness of risk measures

In [17], Embrechts et al. consider the special case where \(A_{d}\) is defined as in (i) of Example 4.4 and \(\rho \) is a coherent distortion risk measure defined on a subset of \(L^{1}\). In this case, they obtain an analogue of Corollary 4.3 and refer to it as aggregation robustness. In fact, they do not explicitly consider continuity in the copula, but rather weak continuity of the analogous functional defined on the corresponding Fréchet class. However, as seen in Theorem 3.10, the two notions coincide. An extension to more general risk measures and more general aggregation maps is given in the work of Krätschmer et al. [28, Sect. 4.2.4].

The following definition is a reformulation of the definition of aggregation robustness of a risk measure \(\rho :L^{p}\to \mathbb{R}\) (i.e., of [17, Definition 2.1]). As before, the aggregation risk functional \(\mathcal{R}_{\rho ,A_{d}}\) associated with \(\rho \) and \(A_{d}(x_{1},\ldots ,x_{d}):=\sum _{i=1}^{d}x_{i}\) is defined by (4.2).

Definition 4.6

Let \(p\in \mathbb{R}_{+}\). A risk measure \(\rho :L^{p}\to \mathbb{R}\) is said to be aggregation robust if the corresponding aggregation risk functionals \(\mathcal{R}_{\rho ,A_{d}}:\mathcal{M}_{d}^{p}\to \mathbb{R}\), \(d\ge 2\), are copula robust.

In view of Corollary 4.3 and Example 4.4, any convex risk measure \(\rho :L^{p}\to \mathbb{R}\) is aggregation robust. This assertion remains true when replacing in Definition 4.6 the map \(A_{d}(x_{1},\ldots ,x_{d}):=\sum _{i=1}^{d}x_{i}\) by any other of the maps introduced in Example 4.4.

In their Example 2.2, Embrechts et al. [17] demonstrated that for any \(\alpha \in (0,1)\), the value at risk \(\mathrm{VaR}_{\alpha}:L^{0}\to \mathbb{R}\) is not aggregation robust. The following example extends the first part of that example (from \(\alpha =1/2\) to general \(\alpha \in (0,1)\)) and shows that for any \(\alpha \in (0,1)\) and \(p\in \mathbb{R}_{+}\), the aggregation risk functional \(\mathcal{R}_{\mathrm{VaR}_{\alpha},A_{2}}:\mathcal{M}_{2}^{p}\to \mathbb{R}\) is not copula robust. The example is in particular interesting in that it shows that even if the marginal distributions \(\mu _{1},\ldots ,\mu _{d}\) possess Lebesgue densities and unique quantiles, the map \(C\mapsto \mathfrak{R}_{\mathrm{VaR}_{\alpha},A_{d}}(C,\mu _{1},\ldots , \mu _{d})\) need not be continuous when choosing \(A_{d}(x_{1},\ldots ,x_{d}):=\sum _{i=1}^{d}x_{i}\) (here \(\mathfrak{R}_{\mathrm{VaR}_{\alpha},A_{d}}\) is derived from \(\mathcal{R}_{\rho ,A_{d}}\) as \(\mathfrak{T}_{d}\) is derived from \(\mathcal{T}_{d}\) in (3.1)). The point here is that for random variables \(X_{1},\ldots ,X_{d}\) with distributions \(\mu _{1},\ldots ,\mu _{d}\), the distribution of \(\sum _{i=1}^{d}X_{i}\) can be discrete even if \(\mu _{1},\ldots ,\mu _{d}\) possess Lebesgue densities. This fact has already been pointed out in [17].

Example 4.7

Generalising the first part of Embrechts et al. [17, Example 2.2], define for \(\alpha \in (0,1/2]\) a bivariate copula \(C_{0}^{(\alpha )}\) through

$$\begin{aligned} &C_{0}^{(\alpha )}(u_{1},u_{2}) \\ &\ := \max \big\{ \min \{u_{1},\alpha \}+\min \{u_{2},\alpha \}- \alpha ,0\big\} +\max \{u_{1}+u_{2}-(1+\alpha ),0 \}, \end{aligned}$$

let \(C_{1}\) be the bivariate independence copula, i.e., \(C_{1}(u_{1},u_{2}):=u_{1}u_{2}\), and define for any \(t\in [0,1]\) the copula \(C_{t}^{(\alpha )}\) as a mixture of \(C_{0}^{(\alpha )}\) and \(C_{1}\) via

$$ C_{t}^{(\alpha )}(u_{1},u_{2}):= (1-t)\,C_{0}^{(\alpha )}(u_{1},u_{2})+t \,C_{1}(u_{1},u_{2}). $$

Moreover, for any \(t\in [0,1]\), let \(\hat{C}_{t}^{(\alpha )}\) be the survival copula of \(C_{t}^{(\alpha )}\) which is defined by \(\hat{C}_{t}^{(\alpha )}(u_{1},u_{2}):=u_{1}+u_{2}-1+C_{t}^{(\alpha )}(1-u_{1},1-u_{2})\). Finally, let \(\mu _{1}:=\mu _{2}:=\mathrm{U}_{[0,1]}\) as well as \(\hat{\mu}_{1}:=\hat{\mu}_{2}:=\mathrm{U}_{[-1,0]}\), where \(\mathrm{U}_{I}\) is used to denote the uniform distribution on \(I\). Then the following two assertions are valid:

(i) \(\mathfrak{R}_{\mathrm{VaR}_{\alpha},A_{2}}(C_{0}^{(\alpha )},\mu _{1}, \mu _{2})=\alpha \) and \(\mathfrak{R}_{\mathrm{VaR}_{\alpha},A_{2}}(C_{t}^{(\alpha )},\mu _{1}, \mu _{2})=\sqrt{2\alpha}\) for any \(t\in (0,1]\). Therefore we have \(\lim _{t\searrow 0}C_{t}^{(\alpha )}=C_{0}^{(\alpha )}\) uniformly, but

$$ \lim _{t\searrow 0}\mathfrak{R}_{\mathrm{VaR}_{\alpha},A_{2}}(C_{t}^{( \alpha )},\mu _{1},\mu _{2})\neq\mathfrak{R}_{\mathrm{VaR}_{\alpha},A_{2}}(C_{0}^{( \alpha )},\mu _{1},\mu _{2}). $$

(ii) \(\mathfrak{R}_{\mathrm{VaR}_{1-\alpha},A_{2}}(\hat{C}_{0}^{(\alpha )}, \hat{\mu}_{1},\hat{\mu}_{2})=-1-\alpha \) and \(\mathfrak{R}_{\mathrm{VaR}_{1-\alpha},A_{2}}(\hat{C}_{t}^{(\alpha )}, \hat{\mu}_{1},\hat{\mu}_{2})=-\sqrt{2\alpha}\) for any \(t\in (0,1]\). Therefore we have that \(\lim _{t\searrow 0}\hat{C}_{t}^{(\alpha )}=\hat{C}_{0}^{(\alpha )}\) uniformly, but

$$ \lim _{t\searrow 0}\mathfrak{R}_{\mathrm{VaR}_{1-\alpha},A_{2}}(\hat{C}_{t}^{( \alpha )},\hat{\mu}_{1},\hat{\mu}_{2})\neq\mathfrak{R}_{\mathrm{VaR}_{1- \alpha},A_{2}}(\hat{C}_{0}^{(\alpha )},\hat{\mu}_{1},\hat{\mu}_{2}).$$

For details, see Appendix A.11.
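Assertion (i) of Example 4.7 can be reproduced numerically. The following Python sketch uses the fact that the sum of two \(\mathrm{U}_{[0,1]}\)-distributed random variables coupled via \(C_{t}^{(\alpha )}\) has a distribution function that is the corresponding mixture of the two-point distribution function under \(C_{0}^{(\alpha )}\) and the triangular distribution function under \(C_{1}\), and computes the lower \(\alpha \)-quantile by bisection. The function names, the search interval and the tolerance are illustrative assumptions; the sketch presumes \(\alpha \in (0,1/2]\), as in the example.

```python
import math


def cdf_sum_independent(x):
    """Distribution function of the sum of two independent U[0,1] variables."""
    if x <= 0.0:
        return 0.0
    if x <= 1.0:
        return 0.5 * x * x
    if x < 2.0:
        return 1.0 - 0.5 * (2.0 - x) ** 2
    return 1.0


def cdf_sum_c0(x, alpha):
    """Distribution function of alpha*delta_alpha + (1-alpha)*delta_{1+alpha}."""
    if x < alpha:
        return 0.0
    if x < 1.0 + alpha:
        return alpha
    return 1.0


def cdf_sum_ct(x, alpha, t):
    """Mixture with weight t on the independence copula and 1-t on C_0."""
    return (1.0 - t) * cdf_sum_c0(x, alpha) + t * cdf_sum_independent(x)


def lower_quantile(cdf, level, lo=-1.0, hi=3.0, tol=1e-12):
    """inf{x : cdf(x) >= level} by bisection; requires cdf(lo) < level <= cdf(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) >= level:
            hi = mid
        else:
            lo = mid
    return hi


def var_of_sum(alpha, t):
    """VaR at level alpha of the sum under the copula C_t."""
    return lower_quantile(lambda x: cdf_sum_ct(x, alpha, t), alpha)


alpha = 0.3
values = {t: var_of_sum(alpha, t) for t in (0.0, 1e-4, 1e-2, 0.5, 1.0)}
```

For \(\alpha =0.3\), the computed value is close to \(\alpha \) at \(t=0\) and close to \(\sqrt{2\alpha}\) for every \(t>0\) in the list, however small, which is exactly the jump described in (i).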

It is worth commenting on the copulas \(C_{0}^{(\alpha )}\), \(C_{1}\) and \(C_{t}^{(\alpha )}\) in the preceding example. The copula \(C_{1}\) is well known; it is simply the distribution function of the uniform distribution on \([0,1]^{2}\). The copula \(C_{0}^{(\alpha )}\) is the distribution function of the ‘uniform distribution’ on the union of the two disjoint line segments \(S_{1}^{\alpha}\) and \(S_{2}^{\alpha}\) with endpoints \((\alpha ,0),(0,\alpha )\) and \((1,\alpha ),(\alpha ,1)\), respectively (see Appendix A.11 for the precise definition). Thus \(C_{t}^{(\alpha )}\) is the distribution function of the Borel probability measure on \([0,1]^{2}\) that is defined as a convex combination (with coefficients \(t\) and \(1-t\)) of the uniform distribution on \([0,1]^{2}\) and the ‘uniform distribution’ on \(S_{1}^{\alpha}\uplus S_{2}^{\alpha}\). For a visualisation of \(C_{0}^{(\alpha )}\), see Fig. 1, and note that the distribution of the sum of two \(\mathrm{U}_{[0,1]}\)-distributed random variables coupled via \(C_{0}^{(\alpha )}\) is the two-point distribution \(\alpha \delta _{\alpha}+(1-\alpha )\delta _{1+\alpha}\).

Fig. 1 Visualisation of the copula \(C_{0}^{(\alpha )}\) and of the sets \(S_{1}^{\alpha}\) and \(S_{2}^{\alpha}\) for \(\alpha =0.3\)
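The description of \(C_{0}^{(\alpha )}\) and \(C_{t}^{(\alpha )}\) as distribution functions suggests a simple way of simulating from these copulas. The following Python sketch draws from the 'uniform distribution' on the two line segments by first choosing a segment (with probabilities \(\alpha \) and \(1-\alpha \), proportional to the segment lengths) and then a uniform position on it, and mixes in independent uniforms with probability \(t\). The function names and the fixed seed are illustrative assumptions.

```python
import random


def sample_c0(alpha, rng):
    """One draw (u1, u2) from the measure with distribution function C_0^(alpha)."""
    if rng.random() < alpha:
        # Segment with endpoints (alpha, 0) and (0, alpha): u1 + u2 = alpha.
        u1 = alpha * rng.random()
        return u1, alpha - u1
    # Segment with endpoints (1, alpha) and (alpha, 1): u1 + u2 = 1 + alpha.
    u1 = alpha + (1.0 - alpha) * rng.random()
    return u1, 1.0 + alpha - u1


def sample_ct(alpha, t, rng):
    """One draw from the mixture: independence with probability t, else C_0."""
    if rng.random() < t:
        return rng.random(), rng.random()
    return sample_c0(alpha, rng)


rng = random.Random(0)  # fixed seed, only to make the illustration reproducible
draws = [sample_c0(0.3, rng) for _ in range(20000)]
sums = [u1 + u2 for u1, u2 in draws]
share_low = sum(1 for s in sums if s < 1.0) / len(sums)
```

In such a simulation every sum lies, up to rounding, at \(\alpha \) or \(1+\alpha \), and the share of sums equal to \(\alpha \) is close to \(\alpha \), in line with the two-point distribution mentioned above.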

4.4 Application to optimal capital and risk allocations

Let \(p\in [1,\infty )\). As in the work of Filipović and Svindland [21], consider \(d\) agents, or business units, with endowments \(X_{1},\ldots ,X_{d}\in L^{p}\). We then assume that these agents assess the riskiness of their positions by means of some convex risk measures \(\rho _{1},\ldots ,\rho _{d}:L^{p}\to \mathbb{R}\) (in the sense of Sect. 4.1). In order to minimise the total and individual risk, the agents redistribute the aggregate endowment \(X:=\sum _{i=1}^{d}X_{i}\) among themselves. By a redistribution of \(X\), we mean any \(d\)-tuple \((Y_{1},\ldots ,Y_{d})\) of random variables (payoffs) in \(L^{p}\) such that \(X=\sum _{i=1}^{d}Y_{i}\). A redistribution \((X_{1}^{*},\ldots ,X_{d}^{*})\) is called an optimal capital and risk allocation of \(X\) if

$$ \sum _{i=1}^{d}\rho _{i}(X_{i}^{*})=\inf \bigg\{ \sum _{i=1}^{d}\rho _{i}(Y_{i}) : Y_{1},\ldots ,Y_{d}\in L^{p}\mbox{ and }\sum _{i=1}^{d}Y_{i}=X \bigg\} . $$
(4.3)

Here it is assumed that the redistribution is not subject to frictions, i.e., that every redistribution of \(X\) is admissible, even if this is not always the case (as pointed out by Filipović and Kupper [20]).

Note that an optimal capital and risk allocation \((X_{1}^{*},\ldots ,X_{d}^{*})\) as a redistribution must satisfy \(X=\sum _{i=1}^{d}X_{i}^{*}\). An optimal capital and risk allocation of \(X\) need not exist. If it exists, its total risk \(\sum _{i=1}^{d}\rho _{i}(X_{i}^{*})\) coincides with the inf-convolution of \(\rho _{1},\ldots ,\rho _{d}\) at \(X\), denoted by \(\mathop {\square }_{i=1}^{d}\rho _{i}(X)\), which is defined by the right-hand side of (4.3). Using the convention \(\inf \emptyset =\infty \), the inf-convolution can be seen as a map \(\mathop {\square }_{i=1}^{d}\rho _{i}:L^{p}\to (-\infty ,\infty ]\). For background, see [21] and references cited therein.

In the above economic setting, the inf-convolution can also be regarded as a map \(\mathop {\boxtimes }_{i=1}^{d}\rho _{i}:L^{p}\times \cdots \times L^{p}\to (- \infty ,\infty ]\) through

$$ \mathop {\boxtimes }_{i=1}^{d}\rho _{i}(X_{1},\ldots ,X_{d}):=\mathop {\square }_{i=1}^{d} \rho _{i}\bigg(\sum _{i=1}^{d}X_{i} \bigg). $$

Since we assumed \(\rho _{1},\ldots ,\rho _{d}\) to be convex risk measures on \(L^{p}\), a result of Filipović and Svindland [21, Corollary 2.7] ensures that the inf-convolution \(\mathop {\square }_{i=1}^{d}\rho _{i}\) is a convex risk measure on \(L^{p}\), too (note that in [21, Corollary 2.7], \(\mathop {\square }_{i=1}^{d}\rho _{i}\) is exact, and hence it is \(\mathbb{R}\)-valued if \(\rho _{1},\ldots ,\rho _{d}\) are \(\mathbb{R}\)-valued). As a convex risk measure, \(\mathop {\square }_{i=1}^{d}\rho _{i}\) is distribution-invariant, and so is \(\mathop {\boxtimes }_{i=1}^{d}\rho _{i}\). Thus we may associate with \(\mathop {\boxtimes }_{i=1}^{d}\rho _{i}\) a corresponding functional \(\mathcal{R}_{\mathop {\boxtimes }_{i=1}^{d}\rho _{i}}:\mathcal{M}_{d}^{p}\to \mathbb{R}\) through

$$ {\mathcal{R}}_{\mathop {\boxtimes }_{i=1}^{d}\rho _{i}}(\mu ):=\mathcal{R}_{\mathop {\square }_{i=1}^{d} \rho _{i},A_{d}}(\mu )=\mathcal{R}_{\mathop {\square }_{i=1}^{d}\rho _{i}} (\mu \circ A_{d}^{-1} ) $$

with \(A_{d}(x_{1},\ldots ,x_{d})=\sum _{i=1}^{d}x_{i}\), where \(\mathcal{R}_{\mathop {\square }_{i=1}^{d}\rho _{i}}\) and \(\mathcal{R}_{\mathop {\square }_{i=1}^{d}\rho _{i},A_{d}}\) are defined as in (4.1) and (4.2), respectively.

It is worth mentioning that [21, Corollary 2.7] even ensures that for any \(X\in L^{p}\), there exists a comonotone optimal capital and risk allocation \((X_{1}^{*},\ldots ,X_{d}^{*})\). This implies that whenever \(\rho _{1}=\cdots =\rho _{d}\) and \(\rho :=\rho _{1}\) is comonotonic (i.e., finitely additive for all comonotone risks), we have

$$ {\mathcal{R}}_{\mathop {\boxtimes }_{i=1}^{d}\rho}(\mu )=\mathcal{R}_{\rho }(\mu \circ A_{d}^{-1} ) $$
(4.4)

for any \(\mu \in{\mathcal{M}}_{d}^{p}\) (see Appendix A.12). Of course, for convex risk measures \(\rho \) that are not comonotonic, the representation (4.4) need not apply. An example of a comonotonic convex risk measure is the average value at risk at level \(\alpha \in (0,1)\). A counterexample is the expectile at level \(\alpha \in (1/2,1)\); see Emmer et al. [18].

The following result is a direct consequence of Corollary 4.3 and Example 4.4 since we have seen above that \(\mathop {\square }_{i=1}^{d}\rho _{i}\) is a convex risk measure on \(L^{p}\) if \(\rho _{1},\ldots ,\rho _{d}\) are.

Corollary 4.8

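Let \(p\in [1,\infty )\) and \(\rho _{1},\ldots ,\rho _{d}:L^{p}\to \mathbb{R}\) be convex risk measures. Then \(\mathcal{R}_{\mathop {\boxtimes }_{i=1}^{d}\rho _{i}}:\mathcal{M}_{d}^{p}\to \mathbb{R}\) is copula robust.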

The following example shows that if the risk measures \(\rho _{1},\ldots ,\rho _{d}\) are not assumed to be convex, copula robustness of \(\mathcal{R}_{\mathop {\boxtimes }_{i=1}^{d}\rho _{i}}\) may fail; recall that \(\mathrm{VaR}_{\alpha}\) is not convex.

Example 4.9

It is known from the work of Embrechts et al. [13, Corollary 2] that \(\mathop {\square }_{i=1}^{2}{\mathrm{VaR}}_{\alpha}=\mathrm{VaR}_{2\alpha}\) on \(L^{1}\) when \(\alpha \in (0,1/2)\). Therefore

$$ {\mathcal{R}}_{\mathop {\boxtimes }_{i=1}^{2}{\mathrm{VaR}}_{\alpha}}(\mu )=\mathcal{R}_{ \mathop {\square }_{i=1}^{2}{\mathrm{VaR}}_{\alpha},A_{2}}(\mu )=\mathcal{R}_{\mathrm{VaR}_{2 \alpha},A_{2}}(\mu )=\mathcal{R}_{\mathrm{VaR}_{2\alpha}}(\mu \circ A_{2}^{-1}) $$

for any \(\mu \in{\mathcal{M}}_{2}^{1}\) and \(\alpha \in (0,1/2)\). Thus it follows from Example 4.7 that

$$ {\mathcal{R}}_{\mathop {\boxtimes }_{i=1}^{2}{\mathrm{VaR}}_{\alpha}}:\mathcal{M}_{2}^{1}\to \mathbb{R} $$

is not copula robust for any \(\alpha \in (0,1/2)\).

5 Example 2: stochastic programming problems

5.1 A class of stochastic programming problems

Adopting the framework of Claus et al. [9], let \(\varXi \) be a nonempty and compact subset of \(\mathbb{R}^{k}\), \(h:\varXi \times \mathbb{R}^{d}\to \mathbb{R}\) a Borel-measurable function and \(Z\) an \(\mathbb{R}^{d}\)-valued random variable on an atomless probability space \((\varOmega ,\mathcal{F},\mathbb{P})\). Let \(p\in [1,\infty )\) and assume that \(h(\xi ,Z)\) is contained in \(L^{p}=L^{p}(\varOmega ,\mathcal{F},\mathbb{P})\) for any \(\xi \in \varXi \). Consider the optimisation problem

$$ \min \big\{ \rho \big(h(\xi ,Z)\big) : \xi \in \varXi \big\} , $$
(5.1)

where \(\rho :L^{p}\to \mathbb{R}\) is any map. A classical example for \(\rho \) is the expectation, i.e., \(\rho (Y)=\mathbb{E}[Y]\), where \(p=1\). In Sect. 5.2, we consider another example where \(\rho \) is a more general monotone, distribution-invariant and convex function on \(L^{p}\). Problem (5.1) can be written as \(\min \{\mathcal{R}_{\rho}(\mathbb{P}\circ h(\xi ,Z)^{-1}):\xi \in \varXi \}\) or, equivalently, as

$$ \min \big\{ \mathcal{R}_{\rho}\big((\delta _{\xi}\otimes \mu )\circ h^{-1} \big) : \xi \in \varXi \big\} , $$
(5.2)

where \(\mathcal{R}_{\rho}\) is derived from \(\rho \) as in (4.1) and \(\mu \) denotes the distribution of \(Z\).

Lemma 5.1 below assumes the following three conditions, where monotonicity, distribution-invariance and convexity are defined as in (i), (iii) and (iv) in Sect. 4.1. Recall that \((\varOmega ,\mathcal{F},\mathbb{P})\) is assumed to be atomless.

(a) \(\rho :L^{p}\to \mathbb{R}\), for some \(p\in [1,\infty )\), is monotone, distribution-invariant and convex.

(b) \(h:\varXi \times \mathbb{R}^{d}\to \mathbb{R}\) is Borel-measurable and limited by an exponent \(\gamma \in \mathbb{R}_{++}\).

(c) \((\delta _{\xi}\otimes \mu )[D_{h}]=0\) for any \(\xi \in \varXi \) and \(\mu \in{\mathcal{M}}_{d}^{\gamma p}\).

The second requirement in (b) means that there exists some locally bounded map \(\eta :\varXi \to (0,\infty )\) such that \(|h(\xi ,z)|\le \eta (\xi )(1+|z|)^{\gamma}\) for all \((\xi ,z)\in \varXi \times \mathbb{R}^{d}\). In (c), the set \(D_{h}\) is the set of all discontinuity points of \(h\). Under conditions (a) and (b), the map \(\mathcal{Q}_{\rho ,h}:\varXi \times \mathcal{M}_{d}^{\gamma p}\to \mathbb{R}\) given by

$$ \mathcal{Q}_{\rho ,h}(\xi ,\mu ):=\mathcal{R}_{\rho} \big((\delta _{\xi} \otimes \mu )\circ h^{-1}\big) $$

is well defined. The following lemma is known from Claus et al. [9, Theorem 5.2].

Lemma 5.1

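If conditions (a)–(c) hold true, then the map \(\mathcal{Q}_{\rho ,h}:\varXi \times \mathcal{M}_{d}^{\gamma p}\to \mathbb{R}\) is \(((\mathcal{O}_{\mathbb{R}^{k}}\cap \varXi )\times \mathcal{O}_{d}^{\gamma p},\mathcal{O}_{\mathbb{R}})\)-continuous.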

Lemma 5.1 can be used to obtain the following result on the map

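$$ \mathcal{R}_{\rho ,h}:\mathcal{M}_{d}^{\gamma p}\to \mathbb{R}\cup \{-\infty \},\qquad \mathcal{R}_{\rho ,h}(\mu ):=\inf \{\mathcal{Q}_{\rho ,h}(\xi ,\mu ) : \xi \in \varXi \}. $$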
(5.3)

Recall that the set \(\varXi \) was assumed to be compact.

Theorem 5.2

If conditions (a)–(c) hold true, then the infimum in (5.3) is attained for any \(\mu \in{\mathcal{M}}_{d}^{\gamma p}\), and the map \(\mathcal{R}_{\rho ,h}:\mathcal{M}_{d}^{\gamma p}\to \mathbb{R}\) is \((\mathcal{O}_{d}^{\gamma p},\mathcal{O}_{\mathbb{R}})\)-continuous.

Note here that if the infimum in (5.3) is attained, then \(\mathcal{R}_{\rho ,h}(\mu )\) is the optimal value of problem (5.2). Theorem 5.2 is a variant of Claus et al. [9, Corollary 2.4].

Remark 5.3

The \((\mathcal{O}_{d}^{\gamma p},\mathcal{O}_{\mathbb{R}})\)-continuity of \(\mathcal{R}_{\rho ,h}:\mathcal{M}_{d}^{\gamma p}\to \mathbb{R}\) obtained in the preceding theorem can be seen as robustness of \(\rho \) relative to \((\mathcal{G},Z,\pi _{d}^{\gamma p})\) in the sense of Embrechts et al. [16, Definition 1], where \(\mathcal{G}:=\{h(\xi ,\,\cdot \,):\xi \in \varXi \}\) and \(\pi _{d}^{\gamma p}\) is any metric metrising the \((p\gamma )\)-weak topology \(\mathcal{O}_{d}^{\gamma p}\).

5.2 Example: one-period mean–risk portfolio optimisation

Consider a one-period financial market consisting of one riskless bond and \(d\) risky assets with prices per unit \(S_{0}^{0}:=1\) and \(S_{0}^{1},\ldots ,S_{0}^{d}\in \mathbb{R}_{++}\) at time 0. In between time 0 and time 1, the prices change to \(S_{1}^{0},S_{1}^{1},\ldots ,S_{1}^{d}\) according to \(S_{1}^{i}=Z^{i}S_{0}^{i}\), \(i=0,\ldots ,d\), where the bond’s relative price change \(Z^{0}\) is deterministic (\(\in \mathbb{R}_{++}\)) and known at time 0 and the assets’ relative price changes \(Z^{1},\ldots ,Z^{d}\) are \(\mathbb{R}_{+}\)-valued random variables on a common atomless probability space \((\varOmega ,\mathcal{F},\mathbb{P})\) and are unobservable at time 0. Let \(x_{0}\in \mathbb{R}_{++}\) be an amount of capital to be invested in the bond and in the \(d\) assets at time 0. If for any \(i=1,\ldots ,d\), the amount of capital invested in the asset \(i\) is denoted by \(\xi _{i}\), then the amount of capital invested in the bond is \(\xi _{0}:=x_{0}-\langle \xi ,\boldsymbol{1}\rangle \), where \(\xi :=(\xi _{1},\ldots ,\xi _{d})\) and \(\boldsymbol{1}:=(1,\ldots ,1)\in \mathbb{R}^{d}\). When identifying a portfolio with the corresponding amounts of capital \(\xi _{1},\ldots ,\xi _{d}\) and assuming that taking loans and short selling are banned, the set

$$ \varXi :=\{(\xi _{1},\ldots ,\xi _{d})\in \mathbb{R}_{+}^{d} : \langle \xi ,\boldsymbol{1}\rangle \le x_{0}\} $$
(5.4)

can be seen as the set of all admissible portfolios. The realised loss at time 1 of a portfolio \(\xi =(\xi _{1},\ldots ,\xi _{d})\in \varXi \) is given by

$$ h(\xi ,z):=(x_{0}-\langle \xi ,\boldsymbol{1}\rangle )(1-Z^{0})+ \langle \xi ,\boldsymbol{1}-z\rangle , $$
(5.5)

when \(z=(z_{1},\ldots ,z_{d})\) is the vector of the assets’ realised relative price changes, i.e., the realisation of \(Z:=(Z^{1},\ldots ,Z^{d})\).

Of course, the portfolio \(\xi =(\xi _{1},\ldots ,\xi _{d})\in \varXi \) should be chosen such that the expected profit is as high as possible, i.e., such that the expected loss \(\mathbb{E}[h(\xi ,Z)]\) is as small as possible. Simultaneously the portfolio’s downside risk should be as small as possible, where the downside risk can be measured by \(\sigma (h(\xi ,Z))\) for a suitable given ‘downside’ risk measure \(\sigma :L^{p}\to \mathbb{R}\). This leads to the mean–risk model

$$ \min \{\mathbb{E}[h(\xi ,Z)]+\kappa \,\sigma (h(\xi ,Z)) : \xi \in \varXi \}, $$
(5.6)

where \(\kappa \in \mathbb{R}_{++}\) is a risk aversion parameter. Note that the model (5.6) aims at minimising the weighted sum of two competing objectives and is in line with Markowitz’ [32] classical mean–variance optimisation theory (where \(\sigma (h(\xi ,Z))=\mathbb{V}\mathrm{ar}[h(\xi ,Z)]\)). It is also worth mentioning that mean–risk models are related to the corresponding multiobjective optimisation problems; see for instance the works of Ogryczak and Ruszczyński [37, 38] and Schultz and Tiedemann [46].
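
To make the objective in (5.6) concrete, the following sketch evaluates it on a finite set of equally likely scenarios and minimises it by a crude grid search. It is an illustration under assumptions that are not in the original text: the risk measure \(\sigma \) is taken to be the Average Value at Risk at level `beta` (a monotone, convex and distribution-invariant choice), computed via the Rockafellar–Uryasev representation, and all names and numbers are hypothetical.

```python
from itertools import product


def loss(xi, z, x0, z0):
    """Realised loss h(xi, z) as in (5.5)."""
    return (x0 - sum(xi)) * (1.0 - z0) + sum(a * (1.0 - zi) for a, zi in zip(xi, z))


def avar(sample, beta):
    """Average Value at Risk of an equally weighted sample (assumed choice of sigma).

    Uses min_q { q + E[(Y - q)^+] / (1 - beta) }; for a finite sample the
    minimum is attained at one of the sample points.
    """
    n = len(sample)
    return min(
        q + sum(max(y - q, 0.0) for y in sample) / ((1.0 - beta) * n)
        for q in sample
    )


def mean_risk_objective(xi, scenarios, x0, z0, kappa, beta):
    """E[h(xi, Z)] + kappa * sigma(h(xi, Z)) for equally likely scenarios."""
    losses = [loss(xi, z, x0, z0) for z in scenarios]
    return sum(losses) / len(losses) + kappa * avar(losses, beta)


def grid_minimise(scenarios, x0, z0, kappa, beta, steps=20):
    """Grid search over the admissible set (5.4); returns (value, portfolio)."""
    d = len(scenarios[0])
    grid = [x0 * k / steps for k in range(steps + 1)]
    best = None
    for xi in product(grid, repeat=d):
        if sum(xi) <= x0 + 1e-12:
            value = mean_risk_objective(xi, scenarios, x0, z0, kappa, beta)
            if best is None or value < best[0]:
                best = (value, xi)
    return best


# Hypothetical scenarios for d = 2 assets.
SCENARIOS = [(1.20, 0.90), (0.80, 1.10), (1.10, 1.10), (1.00, 0.95)]
best_value, best_portfolio = grid_minimise(SCENARIOS, x0=100.0, z0=1.02, kappa=0.5, beta=0.5)
```

The grid search is only meant to show what is being minimised; for a convex \(\sigma \) a convex solver would be the natural tool.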

The mean–risk model (5.6) coincides with problem (5.2) when \(Z\) is distributed according to \(\mu \) and \(\rho :L^{p}\to \mathbb{R}\) is defined by

$$ \rho (Y):=\mathbb{E}[Y]+\kappa \,\sigma (Y), $$
(5.7)

where one should note that \(\rho \) is monotone, distribution-invariant and convex if \(\sigma \) is. For any fixed \(p\in [1,\infty )\), the following corollary is a simple consequence of Theorem 5.2; see Appendix A.14.

Corollary 5.4

Let \(\sigma :L^{p}\to \mathbb{R}\) be monotone, distribution-invariant and convex. For any \(\mu \in{\mathcal{M}}_{d}^{p}\), let \(\mathcal{R}_{\rho ,h}(\mu )\) be defined by (5.3) (and (5.7)). Then the infimum on the right-hand side of (5.3) is attained (and thus finite) for any \(\mu \in{\mathcal{M}}_{d}^{p}\), and the map \(\mathcal{R}_{\rho ,h}:\mathcal{M}_{d}^{p}\to \mathbb{R}\) is \((\mathcal{O}_{d}^{p},\mathcal{O}_{\mathbb{R}})\)-continuous.

Remark 5.5

If we use \(\mathfrak{A}_{d}\) to denote the set of all functions \(h(\xi ,\,\cdot \,):\mathbb{R}^{d}\to \mathbb{R}\), \(\xi \in \varXi \), then \(\mathcal{R}_{\rho ,h}=\mathcal{R}_{\rho ,\mathfrak{A}_{d}}\) for the functional \(\mathcal{R}_{\rho ,\mathfrak{A}_{d}}\) defined by (1.3), i.e., by

$$ {\mathcal{R}}_{\rho ,\mathfrak{A}_{d}}(\mu ):=\inf \{\mathcal{R}_{\rho}(\mu \circ A_{d}^{-1}) : A_{d}\in \mathfrak{A}_{d} \}=\inf \{\mathcal{R}_{ \rho ,A_{d}}(\mu ) : A_{d}\in \mathfrak{A}_{d} \}. $$

Here \(\mathcal{R}_{\rho}\) is derived from \(\rho \) as in (4.1), and \(\mathcal{R}_{\rho ,A_{d}}\) is derived from \(\mathcal{R}_{\rho}\) and \(A_{d}\) as in (4.2).

5.3 Copula robustness of stochastic programming problems

In the setting of Sect. 5.1, assume that conditions (a)–(c) are satisfied and recall that \(\varXi \) was assumed to be compact. Then by Theorem 5.2, the map \(\mathcal{R}_{\rho ,h}:\mathcal{M}_{d}^{\gamma p}\to \mathbb{R}\) is \((\mathcal{O}_{d}^{\gamma p},\mathcal{O}_{\mathbb{R}})\)-continuous. Together with Theorem 3.12, this leads to the following result.

Corollary 5.6

If conditions (a)–(c) hold true, then the map \(\mathcal{R}_{\rho ,h}:\mathcal{M}_{d}^{\gamma p}\to \mathbb{R}\) is copula robust.

Corollary 5.6 shows that under conditions (a)–(c), the minimal value of problem (5.2) is robust with respect to slight changes in the copula of \(\mu \).

Example 5.7

Let us return to the specific setting of Sect. 5.2 (mean–risk portfolio optimisation), where \(\mu \) played the role of the joint distribution of the relative price changes \((Z^{1},\ldots ,Z^{d})\). In this framework, it can be seen in the proof of Corollary 5.4 that conditions (a)–(c) are satisfied for \(p=1\) when \(\sigma :L^{p}\to \mathbb{R}\) is monotone, distribution-invariant and convex. Thus under the latter assumptions on \(\sigma \), Corollary 5.6 ensures that the functional \(\mathcal{R}_{\rho ,h}:\mathcal{M}_{d}^{p}\to \mathbb{R}\) is copula robust. Of course, the copula robustness of \(\mathcal{R}_{\rho ,h}\) also directly follows from Theorem 3.12 and Corollary 5.4.

6 Example 3: multi-period portfolio optimisation

In this section, the objective is to show that the maximal expected utility of the terminal wealth of a portfolio in a multi-period financial market model (see Sect. 6.2) is copula robust if it is regarded as a function of the joint distribution of the assets’ relative price changes. The terminal wealth portfolio optimisation problem can be regarded as a Markov decision problem as introduced in the textbook by Bäuerle and Rieder [1, Chaps. 1 and 2] and in other standard monographs. To prove the main result of this section (Corollary 6.8), it is therefore useful to first establish a variant of a result of Müller [35, Theorem 4.2] about the dependence of the value function on the Markov transition probability function. This variant can be found in Theorem 6.2 and is of independent interest. It is worth pointing out that the factor \(1/\psi (x)\) on the right-hand side of (6.3) is essential for our purposes; see the proof of Corollary 6.7.

6.1 Groundwork: a class of Markov decision models

6.1.1 Basic notation and terminology

Let \((E,\mathcal{E})\) be a measurable space, to be regarded as the state space, and \(N\in \mathbb{N}\) the fixed finite planning horizon. For each \(n=0,\ldots ,N-1\) and \(x\in E\), let \(A_{n}(x)\) be a nonempty set whose elements are regarded as the admissible actions at time \(n\) in state \(x\). For each \(n=0,\ldots ,N-1\), let \(A_{n}:=\bigcup _{x\in E} A_{n}(x)\) and \(D_{n}:=\{(x,a)\in E \times A_{n} : a\in A_{n}(x)\}\). The elements of \(A_{n}\) can be seen as the actions that may basically be selected at time \(n\), whereas the elements of \(D_{n}\) are the possible state–action combinations at time \(n\). We equip \(A_{n}\) with a \(\sigma \)-algebra \(\mathcal{A}_{n}\) and \(D_{n}\) with the trace \(\sigma \)-algebra \(\mathcal{D}_{n}:=(\mathcal{E}\otimes{\mathcal{A}}_{n})\cap D_{n}\). We use \(\boldsymbol{A}\) to denote the family that consists of all sets \(A_{n}(x)\), \(n=0,\ldots ,N-1\), \(x\in E\), and of all \(\sigma \)-algebras \(\mathcal{A}_{n}\), \(n=0,\ldots ,N-1\). All the sets and spaces just introduced are fully determined by \((E,\mathcal{E})\) and \(\boldsymbol{A}\). Although all the objects introduced in what follows depend on \((E,\mathcal{E})\) and \(\boldsymbol{A}\), we suppress this dependence in the notation.

By a (Markov decision) transition function associated with \((E,\mathcal{E})\) and \(\boldsymbol{A}\), we mean an \(N\)-tuple \(P=(P_{n})_{n=0}^{N-1}\), where \(P_{n}\) is a probability kernel from \((D_{n},\mathcal{D}_{n})\) to \((E,\mathcal{E})\), to be seen as the one-step transition kernel at time \(n\). The set of all transition functions is denoted by \(\overline{\mathcal{P}}\). The actions are governed by a so-called \(N\)-stage strategy, i.e., by an \(N\)-tuple \(\pi =(f_{0},\ldots ,f_{N-1})\) where \(f_{n}\) is a decision rule at time \(n\), i.e., an \((\mathcal{E},\mathcal{A}_{n})\)-measurable map \(f_{n}:E\rightarrow A_{n}\) satisfying \(f_{n}(x)\in A_{n}(x)\) for all \(x\in E\). Let \(F_{n}\) be a nonempty set of decision rules at time \(n\), and define the set of all ‘admissible’ strategies by \(\varPi :=F_{0}\times \cdots \times F_{N-1}\).

For any \(P=(P_{n})_{n=0}^{N-1}\in \overline{\mathcal{P}}\), \(\pi =(f_{n})_{n=0}^{N-1}\in \varPi \) and \(n=0,\ldots ,N-1\), define the probability kernel \(P_{n}^{\pi}\) from \((E,\mathcal{E})\) to \((E,\mathcal{E})\) by \(P_{n}^{\pi}(x,B) := P_{n}((x,f_{n}(x)),B)\), \(x\in E\), \(B\in{\mathcal{E}}\). The probability measure \(P_{n}^{\pi}(x,\,\cdot \,)\) can be seen as the one-step transition probability at time \(n\) given state \(x\) when the actions are chosen according to \(\pi \). On the measurable space \((\varOmega ,\mathcal{F}):=(E^{N+1},\mathcal{E}^{\otimes (N+1)})\), we can define for any \(x_{0}\in E\) and \(\pi \in \varPi \) the probability measure \(\mathbb{P}^{x_{0},P;\pi}:=\delta _{x_{0}}\otimes P_{0}^{\pi}\otimes \cdots \otimes P_{N-1}^{\pi}\), where the right-hand side is the usual product of the probability measure \(\delta _{x_{0}}\) and the kernels \(P_{0}^{\pi},\ldots ,P_{N-1}^{\pi}\). Under the probability measure \(\mathbb{P}^{x_{0},P;\pi}\), the identity map \(X=(X_{n})_{n=0}^{N}\) on \(\varOmega \) is called Markov decision process (MDP) associated with initial state \(x_{0}\), transition function \(P\) and strategy \(\pi \).

Let \(r_{n}:D_{n}\to \mathbb{R}\) be a \((\mathcal{D}_{n},\mathcal{B}(\mathbb{R}))\)-measurable map, referred to as one-stage reward function, and \(r_{N}:E\to \mathbb{R}\) an \((\mathcal{E},\mathcal{B}(\mathbb{R}))\)-measurable map, referred to as terminal reward function. Here \(r_{n}(x,a)\) specifies the one-stage reward when action \(a\) is taken at time \(n\) in state \(x\), and \(r_{N}(x)\) specifies the reward of being in state \(x\) at the terminal time \(N\). Finally, set \(\vec{\boldsymbol{r}}:=(r_{n})_{n=0}^{N}\).

For any fixed subset \(\mathcal{P}\subseteq \overline{\mathcal{P}}\), the collection of the objects \((E,\mathcal{E})\), \(\boldsymbol{A}\), \(\varPi \), \(\mathcal{P}\), \(\{\mathbb{P}^{x_{0},P;\pi} : x_{0}\in E,\,P\in \mathcal{P},\,\pi \in \varPi \}\), \(X\) and \(\vec{\boldsymbol{r}}\) introduced so far is often referred to as a Markov decision model. In fact, in the standard literature, the set \(\mathcal{P}\) is typically a singleton. If, however, there is uncertainty with respect to the ‘true’ transition function, then one should allow a whole bundle of transition functions in the model.

6.1.2 Intrinsic optimisation problem

We assume that \(r_{k}(X_{k},f_{k}(X_{k}))\), \(k=0,\ldots ,N-1\), and \(r_{N}(X_{N})\) are \(\mathbb{P}^{x_{0},P;\pi}\)-integrable for any \(x_{0}\in E\), \(P\in{\mathcal{P}}\), \(\pi \in \varPi \) (for a sufficient condition, see Lemma 6.1 below). As a consequence, we can define for any \(P\in{\mathcal{P}}\) and \(\pi =(f_{n})_{n=0}^{N-1}\in \varPi \) an \((\mathcal{E},\mathcal{B}(\mathbb{R}))\)-measurable map \(V_{0}^{P;\pi}:E\to \mathbb{R}\) by

$$ V_{0}^{P;\pi}(x_{0}):=\mathbb{E}^{x_{0},P;\pi}\Big[\sum _{k=0}^{N-1}r_{k}(X_{k},f_{k}(X_{k}))+r_{N}(X_{N})\Big]. $$

The value \(V_{0}^{P;\pi}(x_{0})\) specifies the expected total reward of \(X\) under \(\mathbb{P}^{x_{0},P;\pi}\). Here ‘under \(\mathbb{P}^{x_{0},P;\pi}\)’ means that \(X\) starts in \(x_{0}\) and that the random transitions of \(X\) are governed by \(P\) and \(\pi \). For fixed \(P\in{\mathcal{P}}\), it is natural to look for those strategies \(\pi \in \varPi \) for which the expected total reward from time 0 to \(N\) is maximal for a given initial state \(x_{0}\in E\). This results in the optimisation problem

$$ \max \{V_{0}^{P;\pi}(x_{0}) : \pi \in \varPi \}. $$
(6.1)

We assume that \(\sup _{\pi \in \varPi}V_{0}^{P;\pi}(x_{0})<\infty \) for any \(x_{0}\in E\), which means that it is impossible to gain an arbitrarily high reward. A strategy \(\pi ^{P}\in \varPi \) is said to be optimal for (6.1) if \(V_{0}^{P;\pi ^{P}}(x_{0})=V_{0}^{P}(x_{0})\) for any \(x_{0}\in E\), where the map \(V_{0}^{P}:E\to \mathbb{R}\) is defined by \(V_{0}^{P}(x_{0}):=\sup _{\pi \in \varPi}V_{0}^{P;\pi}(x_{0})\). The map \(V_{0}^{P}\) is referred to as the value function.

Some known facts about the existence of optimal strategies are recalled in Appendix C. Part (i) of Theorem C.1 shows that under some assumptions, the value function can be obtained by the Bellman iteration scheme. The latter involves the time-\(n\) value functions \(V_{n}^{P}\), \(n=1,\ldots ,N\), defined by \(V_{n}^{P}(x):=\sup _{\pi \in \varPi}V_{n}^{P;\pi}(x)\), where for any \(\pi =(f_{n})_{n=0}^{N-1}\in \varPi \) the \((\mathcal{E},\mathcal{B}(\mathbb{R}))\)-measurable map \(V_{n}^{P;\pi}:E\to \mathbb{R}\) is defined by \(V_{n}^{P;\pi}(x):=\mathbb{E}^{x_{0},P;\pi}[\sum _{k=n}^{N-1}r_{k}(X_{k},f_{k}(X_{k}))+r_{N}(X_{N})\,|\,X_{n}=x]\) (note that the right-hand side is independent of \(x_{0}\in E\)). Here and in the following, we use the convention \(\sum _{n=N}^{N-1}:=0\). The maps \(V_{n}^{P;\pi} (\, \cdot \,)\), \(\pi \in \varPi \), are sometimes called policy value functions and appear in Theorem 6.2.
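
For a model with finitely many states and actions, the Bellman iteration can be sketched as a backward recursion \(V_{N}=r_{N}\), \(V_{n}(x)=\max _{a}\{r_{n}(x,a)+\sum _{y}P_{n}((x,a),\{y\})V_{n+1}(y)\}\). The following toy example is an assumption-laden illustration, not the scheme of Theorem C.1 itself: the two-state model, its kernels and its rewards are hypothetical, and \(\varPi \) is taken to consist of all decision rules. It compares the recursion with a brute-force maximisation of the policy value functions over all strategies.

```python
from itertools import product

STATES = [0, 1]
ACTIONS = ["stay", "move"]  # same action set in every state (toy assumption)
N = 3                       # planning horizon


def kernel(n, x, a):
    """One-step transition probabilities P_n((x, a), .) as a dict over STATES."""
    p_same = 0.9 if a == "stay" else 0.2
    return {x: p_same, 1 - x: 1.0 - p_same}


def reward(n, x, a):
    """One-stage reward r_n(x, a): state 1 pays 1, moving costs 0.3."""
    return float(x) - (0.3 if a == "move" else 0.0)


def terminal_reward(x):
    """Terminal reward r_N(x)."""
    return 2.0 * x


def bellman_values():
    """Backward induction; returns a list V with V[n][x] for n = 0, ..., N."""
    V = [None] * (N + 1)
    V[N] = {x: terminal_reward(x) for x in STATES}
    for n in range(N - 1, -1, -1):
        V[n] = {
            x: max(
                reward(n, x, a) + sum(kernel(n, x, a)[y] * V[n + 1][y] for y in STATES)
                for a in ACTIONS
            )
            for x in STATES
        }
    return V


def policy_value(policy):
    """Policy value function at time 0 for policy[n][x] = action chosen at time n in x."""
    V = {x: terminal_reward(x) for x in STATES}
    for n in range(N - 1, -1, -1):
        V = {
            x: reward(n, x, policy[n][x])
            + sum(kernel(n, x, policy[n][x])[y] * V[y] for y in STATES)
            for x in STATES
        }
    return V


def brute_force_values():
    """Maximise the policy value over all N-stage strategies."""
    rules = [dict(zip(STATES, choice)) for choice in product(ACTIONS, repeat=len(STATES))]
    best = {x: float("-inf") for x in STATES}
    for policy in product(rules, repeat=N):
        value = policy_value(policy)
        for x in STATES:
            best[x] = max(best[x], value[x])
    return best


V = bellman_values()
brute = brute_force_values()
```

In this finite setting the two computations agree, which is the content of the Bellman principle; the general measurable case needs the assumptions recalled in Appendix C.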

6.1.3 Bounding function

For the Markov decision model introduced above and \(P\in{\mathcal{P}}\), an \((\mathcal{E},\mathcal{B}([1,\infty )))\)-measurable function \(\psi :E\to [1,\infty )\) is called a bounding function for \(P\) if there exist constants \(K_{1},K_{2},K_{3}\in \mathbb{R}_{+}\) such that the following three assertions hold:

(a) \(|r_{n}(x,a)| \le K_{1} \psi (x)\) for any \(n=0,\ldots ,N-1\) and \((x,a)\in D_{n}\);

(b) \(|r_{N}(x)| \le K_{2} \psi (x)\) for any \(x\in E\);

(c) \(\int _{E}\psi (y)\,P_{n}((x,a),dy)\le K_{3} \psi (x)\) for any \(n=0,\ldots ,N-1\) and \((x,a)\in D_{n}\).

This terminology is adapted from the work of Müller [34, Definition 2.4] and the textbook by Bäuerle and Rieder [1, Definition 2.4.1]. Denote by \(\mathbb{M}(E)\) the set of all \((\mathcal{E},\mathcal{B}(\mathbb{R}))\)-measurable maps \(v:E\to \mathbb{R}\), and by \(\mathbb{M}_{\psi}(E)\) the set of all \(v\in \mathbb{M}(E)\) satisfying \(\|v\|_{\psi}<\infty \), where \(\|v\|_{\psi}:=\sup _{x\in E}|v(x)|/\psi (x)\).
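
On a finite state space every function \(\psi \ge 1\) is a bounding function, and the smallest admissible constants in (a)–(c) can be computed by direct enumeration. The sketch below does this for a hypothetical four-state model with \(\psi (x)=1+x^{2}\); the model and all names are illustrative assumptions rather than objects from the text.

```python
STATES = [0, 1, 2, 3]
ACTIONS = ["down", "up"]
N = 2


def kernel(n, x, a):
    """P_n((x, a), .): move one step in the chosen direction w.p. 0.7, else stay."""
    target = min(x + 1, 3) if a == "up" else max(x - 1, 0)
    probs = {y: 0.0 for y in STATES}
    probs[target] += 0.7
    probs[x] += 0.3
    return probs


def reward(n, x, a):
    return float(x) - (0.5 if a == "up" else 0.0)


def terminal_reward(x):
    return float(x * x)


def psi(x):
    return 1.0 + x * x


def psi_norm(v):
    """Weighted sup-norm ||v||_psi = sup_x |v(x)| / psi(x)."""
    return max(abs(v(x)) / psi(x) for x in STATES)


def bounding_constants():
    """Smallest K1, K2, K3 for which conditions (a)-(c) hold in this toy model."""
    pairs = [(n, x, a) for n in range(N) for x in STATES for a in ACTIONS]
    k1 = max(abs(reward(n, x, a)) / psi(x) for n, x, a in pairs)
    k2 = psi_norm(terminal_reward)
    k3 = max(
        sum(kernel(n, x, a)[y] * psi(y) for y in STATES) / psi(x)
        for n, x, a in pairs
    )
    return k1, k2, k3


K1, K2, K3 = bounding_constants()
```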

Lemma 6.1

Let \(P\in{\mathcal{P}}\). If there exists a bounding function \(\psi \) for \(P\), then the random variables \(r_{k}(X_{k},f_{k}(X_{k}))\), \(k=0,\ldots ,N-1\), and \(r_{N}(X_{N})\) are \(\mathbb{P}^{x_{0},P;\pi}\)-integrable for any \(x_{0}\in E\) and \(\pi \in \varPi \), and moreover, \(\|V_{n}^{P}\|_{\psi}<\infty \) (in particular, \(V_{n}^{P;\pi}\in \mathbb{M}_{\psi}(E)\) for any \(\pi \in \varPi \)) for any \(n=0,\ldots ,N-1\).

6.1.4 Continuous dependence of the optimal value on the transition function

Let \(\psi :E\to [1,\infty )\) be an \((\mathcal{E},\mathcal{B}([1,\infty )))\)-measurable function, and note that the integral \(\int _{E} v\,d\mathfrak{m}\) exists and is finite for any \(v\in \mathbb{M}_{\psi}(E)\) and \(\mathfrak{m}\in{\mathcal{M}}_{1}^{\psi}(E)\), the set of all probability measures on \((E,\mathcal{E})\) with \(\int \psi \,d\mathfrak{m}<\infty \). For any fixed subset \(\mathbb{M}\subseteq \mathbb{M}_{\psi}(E)\), the distance between \(\mathfrak{m}_{1}\) and \(\mathfrak{m}_{2}\) from \(\mathcal{M}_{1}^{\psi}(E)\) can be measured by

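$$ d_{\mathbb{M}}(\mathfrak{m}_{1},\mathfrak{m}_{2}):=\sup _{v\in \mathbb{M}}\Big|\int _{E} v\,d\mathfrak{m}_{1}-\int _{E} v\,d\mathfrak{m}_{2}\Big|. $$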
(6.2)

Note that (6.2) defines a probability pseudo-metric (in the sense of Rachev [41, Sect. 2.3]), i.e., a map \(d_{\mathbb{M}}:\mathcal{M}_{1}^{\psi}(E)\times \mathcal{M}_{1}^{\psi}(E)\to \overline{\mathbb{R}}_{+}\) which is symmetric and fulfils the triangle inequality. If \(\mathbb{M}\) separates points in \(\mathcal{M}_{1}^{\psi}(E)\) (i.e., if any two \(\mathfrak{m}_{1},\mathfrak{m}_{2}\in{\mathcal{M}}_{1}^{\psi}(E)\) coincide when \(\int _{E} v\,d\mathfrak{m}_{1}=\int _{E} v\,d\mathfrak{m}_{2}\) for all \(v\in \mathbb{M}\)), then \(d_{\mathbb{M}}\) is even a probability metric. It is sometimes called integral probability metric or probability metric with a \(\zeta \)-structure; see Müller [35] and Zolotarev [54].

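In some situations, the (pseudo-)metric \(d_{\mathbb{M}}\) (with \(\mathbb{M}\subseteq \mathbb{M}_{\psi}(E)\) fixed) can be represented by the right-hand side of (6.2) with \(\mathbb{M}\) replaced by a different subset \(\mathbb{M}'\) of \(\mathbb{M}_{\psi}(E)\). Each such set \(\mathbb{M}'\) is said to be a generator of \(d_{\mathbb{M}}\). The largest generator of \(d_{\mathbb{M}}\) is called the maximal generator of \(d_{\mathbb{M}}\) and will be denoted by \(\overline{\mathbb{M}}\). That is, \(\overline{\mathbb{M}}\) is the set of all \(v\in \mathbb{M}_{\psi}(E)\) for which \(|\int _{E} v\,d\mathfrak{m}_{1}-\int _{E} v\,d\mathfrak{m}_{2}|\le d_{\mathbb{M}}(\mathfrak{m}_{1},\mathfrak{m}_{2})\) for all \(\mathfrak{m}_{1},\mathfrak{m}_{2}\in{\mathcal{M}}_{1}^{\psi}(E)\); see [35, Definition 3.1]. Examples for \(d_{\mathbb{M}}\) and \(\overline{\mathbb{M}}\) are discussed in Kern et al. [26] and Müller [34, 35].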

Now denote by \(\mathcal{P}_{\psi}\) the set of all transition functions \(P=(P_{n})_{n=0}^{N-1}\in{\mathcal{P}}\) with \(P_{n}((x,a),\,\cdot \,) \in{\mathcal{M}}_{1}^{\psi}(E)\) for all \((x,a)\in D_{n}\) and \(n=0,\ldots ,N-1\). For any \(P\in{\mathcal{P}}_{\psi}\), the integrals \(\int _{E} v(y)\,P_{n}((x,a),dy)\), \(v\in \mathbb{M}_{\psi}(E)\), \((x,a)\in D_{n}\), \(n=0,\ldots ,N-1\), exist and are finite. For any \(\mathbb{M}\subseteq \mathbb{M}_{\psi}(E)\), we may define the distance between two transition functions \(P=(P_{n})_{n=0}^{N-1}\) and \(Q=(Q_{n})_{n=0}^{N-1}\) from \(\mathcal{P}_{\psi}\) by

$$ d_{\mathbb{M},\psi}(P,Q):=\max _{n=0,\ldots ,N-1}\,\sup _{(x,a)\in D_{n}}\frac{d_{\mathbb{M}}\big(P_{n}((x,a),\,\cdot \,),Q_{n}((x,a),\,\cdot \,)\big)}{\psi (x)}. $$
(6.3)
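
On a finite state space and for a finite family of test functions, both (6.2) and (6.3) reduce to maxima over finitely many terms. The sketch below is illustrative only: it assumes three states, a single action per state (so that \(D_{n}\) can be identified with \(E\)), indicator functions of all nonempty subsets as test functions (which yields the total variation distance), and \(\psi (x)=1+x\). None of these choices come from the text.

```python
from itertools import combinations

STATES = [0, 1, 2]
N = 1  # one time step is enough to illustrate (6.3)


def indicator(subset):
    """Indicator function of a subset of STATES."""
    return lambda x: 1.0 if x in subset else 0.0


# Assumed test family: indicators of all nonempty subsets of STATES.
TEST_FUNCTIONS = [
    indicator(set(c))
    for k in range(1, len(STATES) + 1)
    for c in combinations(STATES, k)
]


def ipm(m1, m2):
    """Distance (6.2): sup over the test family of |int v dm1 - int v dm2|."""
    return max(
        abs(sum(v(x) * (m1[x] - m2[x]) for x in STATES)) for v in TEST_FUNCTIONS
    )


def psi(x):
    return 1.0 + x


def transition_distance(P, Q):
    """Distance (6.3) between kernels P(n, x) and Q(n, x), weighted by 1 / psi(x)."""
    return max(ipm(P(n, x), Q(n, x)) / psi(x) for n in range(N) for x in STATES)


M1 = {0: 0.5, 1: 0.3, 2: 0.2}
M2 = {0: 0.2, 1: 0.3, 2: 0.5}


def P(n, x):
    return M1


def Q(n, x):
    # Q deviates from P only in the state where psi is largest.
    return M2 if x == 2 else M1


distance = transition_distance(P, Q)
```

Because the deviation sits at \(x=2\), where \(\psi (2)=3\), the unweighted distance 0.3 between `M1` and `M2` is scaled down to 0.1. This is the effect of the factor \(1/\psi (x)\) in (6.3).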

For any \(\mathbb{M}'\subseteq \mathbb{M}_{\psi}(E)\), the Minkowski functional \(\varrho _{\mathbb{M}'}:\mathbb{M}_{\psi}(E)\to \overline{\mathbb{R}}_{+}\) (in the sense of Rudin [42, paragraph after Definition 1.33]) is defined by

$$ \varrho _{\mathbb{M}'}(v):=\inf \{\lambda \in \mathbb{R}_{++} : v/\lambda \in \mathbb{M}'\}, $$

where we set \(\inf \emptyset :=\infty \). Examples for \(\mathbb{M}'\) and \(\varrho _{\mathbb{M}'}\) are discussed in Kern et al. [26] and Müller [34]. In the following result, we assume that \(\psi \) is a bounding function for any \(Q\in{\mathcal{P}}_{\psi}\). By Lemma 6.1, it then follows that \(V_{n}^{Q}(x)<\infty \) for any \(n=0,\ldots ,N\), \(Q\in{\mathcal{P}}_{\psi}\) and \(x\in E\). In particular, we can define a functional \(\overline{\mathcal{V}}_{n}^{x}:\mathcal{P}_{\psi}\to \mathbb{R}\) by \(\overline{\mathcal{V}}_{n}^{x}(Q):=V_{n}^{Q}(x)\). Note that Theorem 6.2 is a refinement of Kern’s PhD thesis [25, Theorem 2.2.8] and that a related result was proved earlier by Müller [34, Theorem 4.2]. We use \(K_{3,P}\) to denote the constant in condition (c) of a bounding function for \(P\).

Theorem 6.2

We assume that \(\psi \) is a bounding function for any \(Q\in{\mathcal{P}}_{\psi}\), and we let \(\mathbb{M}\subseteq \mathbb{M}_{\psi}(E)\) and \(\mathbb{M}'\) be a generator of \(d_{\mathbb{M}}\). Then for any \(n=0,\ldots ,N-1\), \(x_{n}\in E\) and \(Q,P\in{\mathcal{P}}_{\psi}\), we have

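$$ \big|\overline{\mathcal{V}}_{n}^{x_{n}}(Q)-\overline{\mathcal{V}}_{n}^{x_{n}}(P)\big| \le \sum _{j=n}^{N-1}\sup _{\pi \in \varPi}\varrho _{\mathbb{M}'}(V_{j+1}^{P;\pi})\,\big(K_{3,P}+\varrho _{\mathbb{M}'}(\psi )\,d_{\mathbb{M},\psi}(Q,P)\big)^{j-n}\,\psi (x_{n})\,d_{\mathbb{M},\psi}(Q,P). $$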

As a direct consequence of Theorem 6.2, we get the following result.

Corollary 6.3

Assume that \(\psi \) is a bounding function for any \(Q\in{\mathcal{P}}_{\psi}\) and let \(P\in{\mathcal{P}}_{\psi}\). Let \(\mathbb{M}\subseteq \mathbb{M}_{\psi}(E)\) and \(\mathbb{M}'\) be a generator of \(d_{\mathbb{M}}\). If \(\varrho _{\mathbb{M}'}(\psi )<\infty \) and \(\sup _{\pi \in \varPi}\varrho _{\mathbb{M}'}(V_{n+1}^{P;\pi})<\infty \) for any \(n=0,\ldots ,N-1\), then \(\overline{\mathcal{V}}_{n}^{x_{n}}\) is \((d_{\mathbb{M},\psi},|\cdot |)\)-continuous at \(P\) for any \(n=0,\ldots ,N-1\) and \(x_{n}\in E\).

6.2 A utility-based portfolio optimisation problem

6.2.1 Financial market model and a terminal wealth optimisation problem

Consider an \(N\)-period financial market consisting of one riskless bond \(S^{0}=(S^{0}_{n})_{n=0}^{N}\) and \(d\) risky assets \(S^{i}=(S^{i}_{n})_{n=0}^{N}\), \(i=1,\ldots ,d\), for some fixed \(d\in \mathbb{N}\). Assume that the value of the bond evolves deterministically according to

$$ S^{0}_{0}=1\quad \mbox{ and }\quad S^{0}_{n+1}=Z^{0}_{n+1} S^{0}_{n}, \quad n=0,\ldots ,N-1, $$

for some fixed constants \(Z^{0}_{1},\ldots ,Z^{0}_{N}\in [1,\infty )\), and that the value of the \(i\)th asset evolves stochastically according to

$$ S^{i}_{0}=s^{i}_{0}\quad \mbox{ and }\quad S^{i}_{n+1}=Z^{i}_{n+1} S^{i}_{n}, \quad n=0,\ldots ,N-1, $$

for a constant \(s_{0}^{i}\in \mathbb{R}_{++}\) and independent \(\mathbb{R}_{+}\)-valued random variables \(Z^{i}_{1},\ldots ,Z^{i}_{N}\) on a common probability space \((\varOmega ,\mathcal{F},\mathbb{P})\). For \(n=0,\ldots ,N\), set \(S_{n}:=(S_{n}^{1},\ldots ,S_{n}^{d})\) and \(Z_{n}:=(Z_{n}^{1},\ldots ,Z_{n}^{d})\) and denote by \(\mu _{n}\) the distribution of \(Z_{n}\). We also define \(\boldsymbol{1}:=(1,\ldots ,1)\in \mathbb{R}^{d}\), \(\mathcal{F}_{0}:=\{\emptyset ,\varOmega \}\), \(\mathcal{F}_{n}:=\sigma (S_{0},\ldots ,S_{n})=\sigma (Z_{1},\ldots ,Z_{n})\), \(n=1,\ldots ,N\), and \(\mathbb{F}:=(\mathcal{F}_{n})_{n=0}^{N}\).

Now, an agent invests a given amount of capital \(x_{0}\in \mathbb{R}_{++}\) in the bond and the assets according to some self-financing trading strategy. By a trading strategy, we mean an \(\mathbb{F}\)-adapted \(\mathbb{R}_{+}^{d+1}\)-valued stochastic process \(\xi =(\xi _{n}^{0},\xi _{n})_{n=0}^{N-1}\) with \(\xi _{n}=(\xi _{n}^{1},\ldots ,\xi _{n}^{d})\), where \(\xi _{n}^{0}\) and \(\xi _{n}^{i}\) specify the amounts of capital invested in the bond and in the \(i\)th asset, respectively, during the time interval \([n,n+1)\). The nonnegativity of \(\xi _{n}^{0},\xi _{n}^{1},\ldots ,\xi _{n}^{d}\), \(n=0,\ldots ,N-1\), means that taking loans and short selling of the assets are excluded. The corresponding (\(\mathbb{F}\)-adapted) portfolio process \(X^{\xi}=(X_{n}^{\xi})_{n=0}^{N}\) associated with \(\xi =(\xi _{n}^{0},\xi _{n})_{n=0}^{N-1}\) is defined by

$$ X_{0}^{\xi}:=\xi _{0}^{0}+\langle \xi _{0},\boldsymbol{1}\rangle , \qquad X_{n+1}^{\xi}:=\xi _{n}^{0}Z^{0}_{n+1} + \langle \xi _{n},Z_{n+1} \rangle ,\quad n=0,\ldots ,N-1. $$
(6.4)

A trading strategy \(\xi =(\xi _{n}^{0},\xi _{n})_{n=0}^{N-1}\) is called self-financing with respect to the initial capital \(x_{0}\) if \(x_{0}=\xi _{0}^{0}+\langle \xi _{0},\boldsymbol{1}\rangle \) and \(X_{n}^{\xi}=\xi _{n}^{0}+\langle \xi _{n},\boldsymbol{1}\rangle \) for any \(n=1,\ldots ,N-1\). Note that \(\xi _{n}^{0}\) and \(\langle \xi _{n},\boldsymbol{1}\rangle \) specify the amounts of capital invested during the time interval \([n,n+1)\) in the bond and in the \(d\) assets, respectively. For any self-financing trading strategy \(\xi =(\xi _{n}^{0},\xi _{n})_{n=0}^{N-1}\) with respect to \(x_{0}\), we have \(\xi _{n}^{0}=X_{n}^{\xi}-\langle \xi _{n},\boldsymbol{1}\rangle \) for any \(n=0,\ldots ,N-1\), and therefore the corresponding portfolio process admits the representation

$$ X_{0}^{\xi}=x_{0}, \qquad X_{n+1}^{\xi}=Z^{0}_{n+1} X_{n}^{\xi} + \langle \xi _{n},Z_{n+1}-Z^{0}_{n+1}\boldsymbol{1}\rangle \quad \mbox{for }n=0,\ldots ,N-1. $$
(6.5)
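
The passage from (6.4) to (6.5) can be checked numerically along a single path. In the sketch below, which is an illustration with hypothetical names and numbers, `z0[n]` stands for \(Z^{0}_{n+1}\), `z[n]` for a realisation of \(Z_{n+1}\), and `invest(n, x)` for the amounts put into the assets at time \(n\) when the current wealth is \(x\).

```python
def wealth_path_64(x0, invest, z0, z):
    """Iterate (6.4), with the bond position fixed by the self-financing condition."""
    x = x0
    path = [x]
    for n in range(len(z)):
        xi = invest(n, x)
        xi0 = x - sum(xi)  # capital left for the bond
        x = xi0 * z0[n] + sum(a * zi for a, zi in zip(xi, z[n]))
        path.append(x)
    return path


def wealth_path_65(x0, invest, z0, z):
    """Iterate the equivalent recursion (6.5)."""
    x = x0
    path = [x]
    for n in range(len(z)):
        xi = invest(n, x)
        x = z0[n] * x + sum(a * (zi - z0[n]) for a, zi in zip(xi, z[n]))
        path.append(x)
    return path


def invest(n, x):
    """Hypothetical rule: half of the wealth in asset 1, a quarter in asset 2."""
    return (0.5 * x, 0.25 * x)


Z0 = [1.01, 1.01, 1.01]
Z = [(1.10, 0.90), (0.95, 1.20), (1.05, 1.00)]

path_64 = wealth_path_64(100.0, invest, Z0, Z)
path_65 = wealth_path_65(100.0, invest, Z0, Z)
```

Both functions return the same wealth path up to rounding, which reflects that (6.5) is just (6.4) with \(\xi _{n}^{0}\) eliminated.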

In view of (6.5), we identify a self-financing trading strategy with respect to \(x_{0}\) with an \(\mathbb{F}\)-adapted \(\mathbb{R}_{+}^{d}\)-valued stochastic process \(\xi =(\xi _{n})_{n=0}^{N-1}\) with \(\xi _{n}=(\xi _{n}^{1}, \ldots ,\xi _{n}^{d})\) such that \(\langle \xi _{0},\boldsymbol{1}\rangle \in [0,x_{0}]\) and \(\langle \xi _{n},\boldsymbol{1}\rangle \in [0,X_{n}^{\xi}]\) for any \(n=1,\ldots ,N-1\). We restrict ourselves to Markovian self-financing trading strategies \(\xi =(\xi _{n})_{n=0}^{N-1}\) with respect to \(x_{0}\) which means that \(\xi _{n}\) only depends on \(n\) and \(X_{n}^{\xi}\). To put it another way, we assume that for any \(n=0,\ldots ,N-1\), there exists a Borel-measurable map \(f_{n}:\mathbb{R}_{+}\to \mathbb{R}_{+}^{d}\) such that \(\xi _{n} = f_{n}(X_{n}^{\xi})\). Then in particular, \(X^{\xi}\) is an \(\mathbb{R}_{+}\)-valued \(\mathbb{F}\)-Markov process whose one-step transition probability at time \(n\in \{0,\ldots ,N-1\}\) given state \(x\in \mathbb{R}_{+}\) and strategy \(\xi =(\xi _{n})_{n=0}^{N-1}\) (resp. \(\pi :=(f_{n})_{n=0}^{N-1}\)) is given by \(\mu _{n+1}\circ \eta _{n,(x,f_{n}(x))}^{-1} \), where

$$ \eta _{n,(x,a)}(z):=Z_{n+1}^{0}x+\langle a,z-Z_{n+1}^{0}\boldsymbol{1}\rangle ,\qquad z\in \mathbb{R}_{+}^{d}. $$
(6.6)

The agent’s aim is to find a self-financing trading strategy \(\xi =(\xi _{n})_{n=0}^{N-1}\) (resp. \(\pi =(f_{n})_{n=0}^{N-1}\)) with respect to \(x_{0}\) for which her expected utility of the relative terminal wealth is maximised. We assume that the agent is risk-averse and that her attitude towards risk is set via the power utility function \(u_{\alpha}:\mathbb{R}_{+}\to \mathbb{R}_{+}\) defined by

$$ u_{\alpha}(x):=x^{\alpha }$$
(6.7)

for some fixed \(\alpha \in (0,1)\). Hence the agent is interested in those self-financing trading strategies \(\xi =(\xi _{n})_{n=0}^{N-1}\) (resp. \(\pi =(f_{n})_{n=0}^{N-1}\)) with respect to \(x_{0}\) for which the expectation of \(u_{\alpha}(X_{N}^{\xi}/(x_{0}S_{N}^{0}))\) is maximised. Since \(u_{\alpha}(X_{N}^{\xi}/(x_{0}S_{N}^{0})) = u_{\alpha}(X_{N}^{\xi})/(x_{0}S_{N}^{0})^{ \alpha}\), this is equivalent to maximising the expectation of \(u_{\alpha}(X_{N}^{\xi})\). For notational simplicity, we consider the terminal wealth optimisation problem in the latter form. We assume that \(Z_{n}^{1},\ldots ,Z_{n}^{d}\) are \(\mathbb{P}\)-a.s. strictly positive and \(\mathbb{E}[u_{\alpha}(\langle Z_{n},\boldsymbol{1}\rangle )]<\infty \) for any \(n=1,\ldots ,N\).

Example 6.4

Assume that the bond and the \(d\) assets evolve according to the 1-dimensional ordinary (Itô stochastic) differential equations

$$\begin{aligned} & d\mathsf{s}^{0}_{t} = \delta _{0}\mathsf{s}^{0}_{t}\,dt,\quad \mathsf{s}_{0}^{0}=1, \\ & d\mathsf{s}_{t}^{i}=\delta _{i}\mathsf{s}_{t}^{i}\,dt+\sigma _{i} \mathsf{s}_{t}^{i}\,dB_{t}^{i},\quad \mathsf{s}_{0}^{i}=s_{0}^{i}, \qquad i=1,\ldots ,d, \end{aligned}$$

where \(\delta _{0},\delta _{1},\ldots ,\delta _{d},\sigma _{1},\ldots ,\sigma _{d}\in \mathbb{R}_{++}\) are constants and \(B^{1},\ldots ,B^{d}\) are (jointly Gaussian) correlated 1-dimensional standard Brownian motions which satisfy for any \(t\in \mathbb{R}_{+}\) that \(\mathbb{C}\mathrm{ov}(B_{t}^{i},B_{t}^{j})=R_{i,j}t\), where \(R=(R_{i,j})_{1\le i,j\le d}\in \mathbb{R}^{d\times d}\) is a fixed correlation matrix (i.e., \(R\) is symmetric and positive semi-definite with entries 1 on the diagonal). This is a multivariate version of the classical Black–Scholes–Merton model. Choose the trading period to be the unit interval \([0,1]\) and assume that the bond and the assets can be traded only at \(N\) equidistant time points in \([0,1]\), namely at \(t_{N,n}:=n/N\), \(n=0,\ldots ,N-1\). Then the relative price changes \(Z^{0}_{n+1}:=S^{0}_{n+1}/S^{0}_{n}=\mathsf{s}^{0}_{t_{N,n+1}}/ \mathsf{s}^{0}_{t_{N,n}}\) and \(Z_{n+1}^{i}:=S_{n+1}^{i}/S_{n}^{i}=\mathsf{s}^{i}_{t_{N,n+1}}/ \mathsf{s}^{i}_{t_{N,n}}\) are given by, respectively, \(e^{\delta _{0}(t_{N,n+1}-t_{N,n})}\) and \(e^{(\delta _{i} - \sigma _{i}^{2}/2)(t_{N,n+1}-t_{N,n})+\sigma _{i}(B^{i}_{t_{N,n+1}}-B^{i}_{t_{N,n}})}\), i.e., for \(n=0,\ldots ,N-1\),

$$ Z_{n+1}^{0}=e^{\delta _{0}/N}\quad \mbox{ and }\quad Z_{n+1}^{i}=e^{( \delta _{i} - \sigma _{i}^{2}/2)/N+\sigma _{i}(B^{i}_{t_{N,n+1}}-B^{i}_{t_{N,n}})}, \qquad i=1,\ldots ,d. $$

That is, we have \(Z_{n+1}=(e^{G_{1}},\ldots ,e^{G_{d}})\) for a \(d\)-variate random variable \((G_{1},\ldots ,G_{d})\) which has a \(d\)-variate normal distribution \(\mathbf{N}_{\delta ,\varGamma}\) with \(\delta :=(( \delta _{i} - \sigma _{i}^{2}/2)/N)_{i=1}^{d}\) and \(\varGamma :=( \sigma _{i}R_{i,j}\sigma _{j}/N)_{1\le i,j\le d}\). Thus we have \(\mu _{1}=\cdots =\mu _{N}=\mathrm{LN}_{\delta ,\varGamma}\), where \(\mathrm{LN}_{\delta ,\varGamma}\) is a \(d\)-variate log-normal distribution with parameters \(\delta \) and \(\varGamma \).
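
The parameters \(\delta \) and \(\varGamma \) of Example 6.4 follow directly from the model constants. The sketch below computes them and, as a sanity check, uses the standard formula \(\exp (\delta _{i}+\varGamma _{i,i}/2)\) for the mean of a log-normal marginal, which here reduces to \(e^{\delta _{i}/N}\) in terms of the drift constant \(\delta _{i}\) of the model. Function names and numerical inputs are illustrative assumptions.

```python
import math


def lognormal_parameters(drifts, vols, corr, n_periods):
    """Mean vector delta and covariance matrix Gamma of (G_1, ..., G_d) in Example 6.4."""
    d = len(drifts)
    delta = [(drifts[i] - vols[i] ** 2 / 2.0) / n_periods for i in range(d)]
    gamma = [
        [vols[i] * corr[i][j] * vols[j] / n_periods for j in range(d)]
        for i in range(d)
    ]
    return delta, gamma


def bond_factor(drift0, n_periods):
    """Deterministic relative price change of the bond over one period."""
    return math.exp(drift0 / n_periods)


def mean_relative_change(delta, gamma, i):
    """Mean of the i-th log-normal marginal: exp(delta_i + Gamma_ii / 2)."""
    return math.exp(delta[i] + gamma[i][i] / 2.0)


# Hypothetical constants: d = 2 assets, N = 12 trading periods.
DRIFTS = [0.05, 0.08]
VOLS = [0.20, 0.30]
CORR = [[1.0, 0.5], [0.5, 1.0]]
delta, gamma = lognormal_parameters(DRIFTS, VOLS, CORR, 12)
```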

6.2.2 Interpretation as a Markov decision problem

The terminal wealth optimisation problem just introduced can be embedded in the framework of Sect. 6.1 as follows. Let \(Z^{0}_{1},\ldots ,Z^{0}_{N}\in [1,\infty )\) be a priori fixed and choose \((E,\mathcal{E}):=(\mathbb{R}_{+},\mathcal{B}(\mathbb{R}_{+}))\). For any \(x\in \mathbb{R}_{+}\) and \(n=0,\ldots ,N-1\), let

$$ A_{n}(x):=A(x):=\{a\in \mathbb{R}_{+}^{d} : \langle a,\boldsymbol{1}\rangle \le x\}. $$

Hence \(A_{n}=\mathbb{R}_{+}^{d}\) and \(D_{n}=D:=\{(x,a)\in \mathbb{R}_{+}^{d+1} : a\in A(x)\}\) for \(n=0, \ldots ,N-1\). Let \(\mathcal{A}_{n}:=\mathcal{B}(\mathbb{R}_{+}^{d})\) and \(\mathcal{D}_{n}:=\mathcal{B}(\mathbb{R}_{+}^{d+1})\cap D\) for any \(n=0,\ldots ,N-1\), and let the set \(F\) consist of all those Borel-measurable maps \(f:\mathbb{R}_{+}\to \mathbb{R}_{+}^{d}\) that satisfy \(\langle f(x),\boldsymbol{1}\rangle \in [0,x]\) for any \(x\in \mathbb{R}_{+}\). Finally, let \(F_{n}:=F\) for \(n=0,\ldots ,N-1\) and \(\varPi :=F_{0}\times \cdots \times F_{N-1}=F^{N}\).

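Let \(\mathcal{M}_{1}^{\alpha}(\mathbb{R}_{++}^{d})\) be the set of all Borel probability measures \(\mu \) on \(\mathbb{R}_{++}^{d}\) for which \(\int _{\mathbb{R}_{++}^{d}}|z|^{\alpha}\,\mu (dz)<\infty \). The latter condition is equivalent to \(\int _{\mathbb{R}_{++}^{d}}\langle z,\boldsymbol{1}\rangle ^{\alpha}\,\mu (dz)<\infty \), which can be shown by using arguments as at the beginning of Appendix A.2. For any \(\vec{\boldsymbol{\mu}}=(\mu _{n})_{n=1}^{N}\in \mathcal{M}_{1}^{\alpha}(\mathbb{R}_{++}^{d})^{N}\), we define a transition function \(P^{\vec{\boldsymbol{\mu}}}=(P_{n}^{\vec{\boldsymbol{\mu}}})_{n=0}^{N-1}\) by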

$$P_{n}^{\vec{\boldsymbol{\mu}}}\big((x,a),\,\cdot \,\big)=(\mu _{n+1} \circ \eta _{n,(x,a)}^{-1}) [\,\cdot \,],\qquad (x,a)\in D_{n},\,n=0, \ldots ,N-1, $$

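where the map \(\eta _{n,(x,a)}:\mathbb{R}_{+}^{d}\to \mathbb{R}_{+}\) is defined by (6.6). The set of all such transition functions is denoted by \(\mathcal{P}_{\alpha}\), i.e., \(\mathcal{P}_{\alpha}:=\{P^{\vec{\boldsymbol{\mu}}} : \vec{\boldsymbol{\mu}}\in \mathcal{M}_{1}^{\alpha}(\mathbb{R}_{++}^{d})^{N}\}\), and plays the role of \(\mathcal{P}\).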

Let \(r_{n}:= 0\), \(n=0,\ldots ,N-1\), and \(r_{N}(x):=u_{\alpha}(x)\), \(x\in \mathbb{R}_{+}\). Then

$$ V_{0}^{P;\pi}(x_{0})=\mathbb{E}^{x_{0},P;\pi}[r_{N}(X_{N})]=\mathbb{E}^{x_{0},P;\pi}[u_{\alpha}(X_{N})] $$

for any \(x_{0}\in \mathbb{R}_{+}\), \(P\in{\mathcal{P}}_{\alpha}\) and \(\pi \in \varPi \), and the terminal wealth problem introduced subsequent to (6.7) can be identified with the optimisation problem (6.1), i.e., with

$$ \max \{\mathbb{E}^{x_{0},P;\pi}[u_{\alpha}(X_{N})] : \pi \in \varPi \} $$
(6.8)

for any \(x_{0}\in \mathbb{R}_{+}\) and \(P\in{\mathcal{P}}_{\alpha}\). A strategy \(\pi ^{P}\in \varPi \) is called an optimal (self-financing) trading strategy for \(P\) if it solves the maximisation problem (6.8) for any \(x_{0}\in \mathbb{R}_{+}\). Note that the coordinate process \(X\) plays the role of the portfolio process \(X^{\xi}\) introduced in (6.4), and that for each \(x_{0}\in \mathbb{R}_{+}\), any self-financing trading strategy \(\xi =(\xi _{n})_{n=0}^{N-1}\) with respect to \(x_{0}\) may be identified with some \(\pi =(f_{n})_{n=0}^{N-1}\in \varPi \) through \(\xi _{n}=f_{n}(X_{n}^{\xi})\). Theorem C.3 ensures that optimal trading strategies exist.
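
To give a feeling for problem (6.8), the following sketch approximates the last-stage quantity \(\sup _{a\in A(x)}\mathbb{E}[u_{\alpha}(\eta _{N-1,(x,a)}(Z_{N}))]\), which is what a single Bellman step at time \(N-1\) would compute. It is a rough illustration under assumptions not made in the text: a single risky asset (\(d=1\)), a two-point distribution for \(Z_{N}\), and a grid search over the invested fraction of wealth.

```python
def eta(x, a, z, z0):
    """Map (6.6): next wealth from wealth x, asset investments a and price changes z."""
    return z0 * x + sum(ai * (zi - z0) for ai, zi in zip(a, z))


def last_step_value(x, scenarios, probs, z0, alpha, steps=200):
    """Grid approximation of sup over a in A(x) of E[eta(x, a, Z)^alpha] for d = 1.

    Returns the best value found and the corresponding investment a.
    """
    best_val, best_a = None, None
    for k in range(steps + 1):
        a = (x * k / steps,)  # invest the fraction k / steps of the wealth
        val = sum(p * eta(x, a, z, z0) ** alpha for z, p in zip(scenarios, probs))
        if best_val is None or val > best_val:
            best_val, best_a = val, a
    return best_val, best_a


# Hypothetical two-point distribution of the asset's relative price change.
SCENARIOS = [(1.30,), (0.80,)]
PROBS = [0.5, 0.5]

value_1, action_1 = last_step_value(1.0, SCENARIOS, PROBS, z0=1.02, alpha=0.5)
value_2, action_2 = last_step_value(2.0, SCENARIOS, PROBS, z0=1.02, alpha=0.5)
```

Since \(\eta _{n,(x,a)}\) is jointly linear in \((x,a)\) and \(A(cx)=cA(x)\), the computed value scales like \(x^{\alpha}\), so `value_2` equals \(2^{\alpha}\) times `value_1` up to rounding.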

6.2.3 Continuous dependence of the optimal value on \(P^{\vec{\boldsymbol{\mu}}}\)

Let the function \(\psi _{\alpha}:\mathbb{R}_{+}\to [1,\infty )\) be defined by \(\psi _{\alpha}(x):=1 + u_{\alpha}(x)\). Moreover, let \(\mathcal{P}_{\psi _{\alpha}}\) be derived from \(\mathcal{P}_{\alpha}\) as \(\mathcal{P}_{\psi}\) is derived from \(\mathcal{P}\) in Sect. 6.1.

Lemma 6.5

\(\psi _{\alpha}\) is a bounding function for any \(P\in{\mathcal{P}}_{\alpha}\), and we have \(\mathcal{P}_{\psi _{\alpha}}=\mathcal{P}_{\alpha}\).

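Let \(\mathbb{M}:=\mathbb{M}_{\text{H\"{o}l},\alpha}:=\{v\in \mathbb{R}^{\mathbb{R}_{+}} : \|v\|_{\text{H\"{o}l},\alpha}\le 1\}\), where the Hölder-\(\alpha \) norm is defined by \(\|v\|_{\text{H\"{o}l},\alpha}:=\sup _{x,y\in \mathbb{R}_{+}:\,x\neq y}|v(x)-v(y)|/|x-y|^{\alpha}\). We obviously have \(\mathbb{M}_{\text{H\"{o}l},\alpha}\subseteq \mathbb{M}_{\psi _{\alpha}}(\mathbb{R}_{+})\), and in view of Lemmas 6.5 and 6.1, we can therefore define a functional \(\overline{\mathcal{V}}_{n}^{x}:\mathcal{P}_{\psi _{\alpha}}\to \mathbb{R}\) through \(\overline{\mathcal{V}}_{n}^{x}(P):=V_{n}^{P}(x)\). The set \(\mathbb{M}_{\text{H\"{o}l},\alpha}\) separates points in \(\mathcal{M}_{1}^{\psi _{\alpha}}(\mathbb{R}_{+})\), implying that \(d_{\mathbb{M}_{\text{H\"{o}l},\alpha}}\) (defined by (6.2) with \(\mathbb{M}:=\mathbb{M}_{\text{H\"{o}l},\alpha}\)) provides a metric on \(\mathcal{M}_{1}^{\psi _{\alpha}}(\mathbb{R}_{+})\); see Kern et al. [26] for details. Let \(d_{\mathbb{M}_{\text{H\"{o}l},\alpha},\psi _{\alpha}}\) be defined by (6.3) with \(\mathbb{M}:=\mathbb{M}_{\text{H\"{o}l},\alpha}\) and \(\psi :=\psi _{\alpha}\).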

Theorem 6.6

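For any \(n=0,\ldots ,N-1\) and \(x\in \mathbb{R}_{+}\), the map \(\overline{\mathcal{V}}_{n}^{x}:\mathcal{P}_{\psi _{\alpha}}\to \mathbb{R}\) is \((d_{\mathbb{M}_{\text{H\"{o}l},\alpha},\psi _{\alpha}},|\cdot |)\)-continuous.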

Recall that the elements of \(\mathcal{P}_{\alpha}\) (\(=\mathcal{P}_{\psi _{ \alpha}}\)) are parametrised by the elements of the set \(\mathcal{M}_{1}^{\alpha}(\mathbb{R}_{++}^{d})^{N}\). For any \(\mu \in \mathcal{M}_{1}^{\alpha}(\mathbb{R}_{++}^{d})\), denote by \(\overline{\mu}\) the element of \(\mathcal{M}_{1}^{\alpha}(\mathbb{R}_{++}^{d})^{N}\) whose \(N\) entries are all equal to \(\mu \), i.e., \(\overline{\mu}:=(\mu )_{n=1}^{N}\). Then we can define a functional \(\mathcal{V}_{n}^{x}:\mathcal{M}_{1}^{\alpha}(\mathbb{R}_{++}^{d})\to \mathbb{R}\) by

$$ \mathcal{V}_{n}^{x}(\mu )\,:=\,\overline{\mathcal{V}}_{n}^{x}( \overline{\mu}) = V_{n}^{P^{\overline{\mu}}}(x). $$
(6.9)

Since we used \(\mathcal{O}_{d}^{\alpha}\) to denote the \(\alpha \)-weak topology on \(\mathcal{M}_{1}^{\alpha}\) (see Sect. 2.2), we use \(\mathcal{O}_{d}^{\alpha}(\mathbb{R}_{++}^{d})\) to denote the analogous topology on \(\mathcal{M}_{1}^{\alpha}(\mathbb{R}_{++}^{d})\).

Corollary 6.7

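For any \(n=0,\ldots ,N-1\) and \(x\in \mathbb{R}_{+}\), the map \(\mathcal{V}_{n}^{x}:\mathcal{M}_{1}^{\alpha}(\mathbb{R}_{++}^{d})\to \mathbb{R}\) defined by (6.9) is \((\mathcal{O}_{d}^{\alpha}(\mathbb{R}_{++}^{d}),\mathcal{O}_{\mathbb{R}})\)-continuous.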

6.3 Copula robustness of the maximal expected utility of the terminal wealth

For any \(n=0,\ldots ,N-1\) and \(x\in \mathbb{R}_{+}\), let the map \(\mathcal{V}_{n}^{x}:\mathcal{M}_{1}^{\alpha}(\mathbb{R}_{++}^{d})\to \mathbb{R}\) be defined by (6.9), and note that \(\mathcal{V}_{0}^{x_{0}}(\mu )\) corresponds to the maximal expected utility of the terminal wealth in (6.8) with \(P=P^{\overline{\mu}}\). When regarding each \(\mu \in \mathcal{M}_{1}^{\alpha}(\mathbb{R}_{++}^{d})\) as a Borel probability measure on the whole Euclidean space \(\mathbb{R}^{d}\) (with \(\mu [\mathbb{R}_{++}^{d}]=1\)), the set \(\mathcal{M}_{1}^{\alpha}(\mathbb{R}_{++}^{d})\) can be seen as a subset of \(\mathcal{M}_{1}^{\alpha}\). Thus \(\mathbf{C}_{d}(\mu _{1},\ldots ,\mu _{d})=\mathbf{C}_{d}\) for any \(\mu _{1},\ldots ,\mu _{d}\in \mathcal{M}_{1}^{\alpha}(\mathbb{R}_{++}^{d})\), and Theorem 3.12 and Corollary 6.7 together imply the following result.

Corollary 6.8

For any \(n=0,\ldots ,N-1\) and \(x\in \mathbb{R}_{+}\), the map \(\mathcal{V}_{n}^{x}:\mathcal{M}_{1}^{\alpha}(\mathbb{R}_{++}^{d})\to \mathbb{R}\) defined by (6.9) is copula robust.