Abstract
There are many ways of measuring and modeling tail dependence in random vectors: from the general framework of multivariate regular variation and the flexible class of max-stable vectors down to simple and concise summary measures like the matrix of bivariate tail-dependence coefficients. This paper starts by providing a review of existing results from a unifying perspective, which highlights connections between extreme value theory and the theory of cuts and metrics. Our approach leads to some new findings in both areas, with applications to current topics in risk management.
We begin by using the framework of multivariate regular variation to show that extremal coefficients, or equivalently, the higher-order tail-dependence coefficients of a random vector can simply be understood in terms of random exceedance sets, which allows us to extend the notion of Bernoulli compatibility. In the special but important case of bivariate tail dependence, we establish a correspondence between tail-dependence matrices and \(L^1\)- and \(\ell _1\)-embeddable finite metric spaces via the spectral distance, which is a metric on the space of jointly 1-Fréchet random variables. Namely, the coefficients of the cut decomposition of the spectral distance and of the Tawn-Molchanov max-stable model realizing the corresponding bivariate extremal dependence coincide. We show that line metrics are rigid and that, if the spectral distance corresponds to a line metric, the higher-order tail dependence is determined by the bivariate tail-dependence matrix.
Finally, the correspondence between \(\ell _1\)-embeddable metric spaces and tail-dependence matrices allows us to revisit the realizability problem, i.e., checking whether a given matrix is a valid tail-dependence matrix. We confirm a conjecture of Shyamalkumar and Tao (2020) that this problem is NP-complete.
1 Introduction
Extreme events such as large portfolio losses in insurance and finance, spatial and environmental extremes such as heatwaves, floods, electric grid outages, and many other complex system failures are associated with tail events, that is, with the simultaneous occurrence of extreme values in the components of a possibly very high-dimensional vector \(X = (X_i)_{1\le i\le p}\) of covariates. Such simultaneous extremes occur due to dependence among the extremes of the \(X_i\)’s. This has motivated a large body of literature on modeling and quantifying tail dependence, see, e.g. (Coles 2001; Finkenstädt and Rootzén 2003; Rachev 2003; Beirlant et al. 2004; Castillo 1988; Resnick 2007; de Haan and Ferreira 2007). One basic and popular measure is the bivariate (upper) tail-dependence coefficient
\(\lambda _X(i,j) := \lim _{u\uparrow 1} \mathbb P\big [ X_i> F_i^{-1}(u) \,\big |\, X_j > F_j^{-1}(u)\big ], \qquad (1.1)\)
where \(F_i^{-1}(u):= \inf \{ x\,:\, \mathbb P[X_i \le x] \ge u\}\) is the generalized inverse of the cumulative distribution function \(F_i\) of \(X_i\). Under weak conditions the above limit exists and is independent of the choice of the (continuous) marginal distributions of \((X_i,X_j)\). The matrix \(\Lambda := (\lambda _X(i,j))_{p\times p}\) of bivariate tail-dependence coefficients is necessarily positive semidefinite and in fact, since \(\lambda _X(i,i) = 1\), it is a correlation matrix of a random vector, see Schlather and Tawn (2003). We call \(\Lambda\) as defined in (1.1) a tail-dependence matrix, or TD matrix for short.
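To make the limit in (1.1) concrete, here is a minimal rank-based sketch (our illustration, not part of the paper): it replaces the limit by a fixed high level \(u\) and uses empirical ranks in place of the marginal distribution functions. The data-generating model and the level \(u=0.95\) are our own illustrative choices.

```python
import numpy as np

def empirical_tail_dependence(x, y, u=0.95):
    """Rank-based estimate of P[F_i(X_i) > u | F_j(X_j) > u],
    a finite-level proxy for the limit in (1.1)."""
    n = len(x)
    rx = np.argsort(np.argsort(x)) / n  # empirical probability-integral transform
    ry = np.argsort(np.argsort(y)) / n
    return np.mean((rx > u) & (ry > u)) / (1.0 - u)

rng = np.random.default_rng(0)
n = 100_000
z = rng.pareto(1.0, n)  # shared heavy-tailed factor induces tail dependence
dep = empirical_tail_dependence(z + rng.pareto(1.0, n), z + rng.pareto(1.0, n))
ind = empirical_tail_dependence(rng.pareto(1.0, n), rng.pareto(1.0, n))
print(dep, ind)  # shared-factor pair: clearly positive; independent pair: near 0
```

For the shared-factor pair the joint tail exceedances are driven by the common factor, so the estimate is bounded away from zero, while for independent components it is of order \(1-u\).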
The general theme of our paper is to review and contribute to the unified treatment of tail dependence using the powerful framework of multivariate regular variation. This leads to deep connections to existing results in the theory of cut (semi)metrics and \(\ell _1\)-embeddable metrics (Deza and Laurent 1997), as well as to extensions of the Bernoulli compatibility characterization of tail-dependence matrices established in Embrechts et al. (2016) and Krause et al. (2018). What follows is an overview of our key ideas and contributions.
Since the marginal distributions of X are not important in quantifying tail dependence, one may transform its marginals to be heavy-tailed. In fact, we make the additional and often very mild assumption that the vector X is regularly varying, i.e., that there exists a Radon measure \(\mu\) on \(\mathbb R^p\setminus \{0\}\) and a suitable positive sequence \(a_n\uparrow \infty\) such that
\(n\, \mathbb P[ a_n^{-1} X \in A] \longrightarrow \mu (A), \quad \text {as } n\rightarrow \infty ,\)
for all Borel sets \(A\subset \mathbb R^p\) that are bounded away from 0 and such that \(\mu (\partial A) =0\) (see Definition 2.1). This allows us to conclude that \(n\mathbb P[ h(X)> a_n] \rightarrow \mu \{ h>1\}\) for continuous and 1-homogeneous functions \(h:\mathbb R^p \rightarrow [0,\infty )\) (Proposition 2.5). Therefore, if h is such a risk functional, we readily obtain an asymptotic approximation of the probability of an extreme loss \(\mathbb P[h(X)> a_n] \approx n^{-1} \mu \{h>1\}\). By varying the risk functional h, one obtains different measures of tail dependence, which may be of particular interest to practitioners. For example, for \(L = \{i_1,\cdots ,i_k\}\subset [p]:=\{1,\cdots ,p\}\), taking \(h_L(X) = (\min _{i \in L} X_i)_+:=\max \{0,\min _{i \in L} X_i\}\), the risk functional quantifies the joint exceedance probability
that all components of X with index in the set L are simultaneously extreme – an event with potentially devastating consequences. In practice, due to the limited horizon of historical data, such extreme events, especially for large sets L, are rarely (if ever) observed. Thus, quantifying their probabilities is very challenging. Yet, as Emil Gumbel eloquently put it, “It is not possible that the improbable will never occur.” This underscores the importance of the theoretical understanding, modeling, and inference of such functionals. Namely, one naturally arrives at the higher-order tail-dependence coefficients
It can be seen that if the marginals of the \(X_i\)’s are identical and \(a_n\) is such that \(n^{-1} \sim \mathbb P[X_i>a_n]\) (i.e., \(\lim _{n \rightarrow \infty } n\mathbb P[X_i>a_n]=1\)), then \(\lambda _X(\{i,j\}) = \lim _{n\rightarrow \infty } \mathbb P[ X_i>a_n\mid X_j>a_n]\) recovers the classic bivariate tail-dependence coefficients \(\lambda _X(i,j)\) in (1.1). Using the functionals \(h(X):= \max _{j\in K} X_j\) for some \(K\subset [p]\), one arrives at the popular extremal coefficients arising in the study of max-stable processes:
Starting from the seminal works of Schlather and Tawn (2002, 2003), the structure of the extremal coefficients \(\{\theta _X(K),\ K\subset [p]\}\) has been studied extensively, see Strokorb and Schlather (2015); Strokorb et al. (2015); Molchanov and Strokorb (2016); Fiebig et al. (2017), which address fundamental theoretical problems and develop stochastic process extensions. Our goal here is more modest. We want to study both the tail-dependence and extremal coefficients as risk functionals from the unifying perspective of regular variation. Interestingly, they can be succinctly understood in terms of exceedance sets. Namely, defining the random set
we show (Proposition 3.1)
where the limit \(\Theta\) is a nonempty random subset of [p] such that
where \(a = \theta _X([p])\). Thus, \(\lambda _X\) and \(\theta _X\) (up to rescaling by a) are precisely the inclusion and hitting functionals characterizing the distribution of \(\Theta\) (Molchanov 2017). Interestingly, the probability mass function of the random set \(\Theta\) recovers (up to rescaling) the coefficients in a (generalized) Tawn-Molchanov max-stable model associated with X (see (3.6)).
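The exceedance-set reading above can be checked numerically in a small max-linear model. The discrete spectral functions below are our own hypothetical example; with counting measure, \(\theta _X([p])=\sum _j \max _i f_i(j)\) and \(\lambda _X(L)=\sum _j \min _{i\in L} f_i(j)\), and the conditional law of the exceedance set stabilizes accordingly.

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical discrete spectral functions (rows f_1, f_2, f_3), counting measure
F = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.3, 0.6],
              [0.4, 0.4, 0.2]])
reps, n = 500_000, 40.0
Z = 1.0 / rng.exponential(size=(reps, 3))     # iid standard 1-Fréchet
X = (F[None] * Z[:, None, :]).max(axis=2)     # max-linear (de Haan) construction

exceed = X > n                                # rows encode Theta_n = {i : X_i > n}
nonempty = exceed.any(axis=1)
a = np.maximum.reduce(F).sum()                # theta([p]) = sum_j max_i f_i(j) = 1.6
lam12 = np.minimum(F[0], F[1]).sum()          # lambda({1,2}) = 0.5
est = np.mean(exceed[nonempty][:, 0] & exceed[nonempty][:, 1])
print(est, lam12 / a)                         # P[{1,2} in Theta | Theta nonempty]
```

The two printed values agree up to Monte Carlo error, illustrating that the inclusion probabilities of the limiting set \(\Theta\) are the tail-dependence coefficients rescaled by \(a=\theta _X([p])\).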
The above probabilistic representation in (1.2) of the tail-dependence functionals leads to transparent proofs of seminal results from Embrechts et al. (2016) and Krause et al. (2018) on the characterization of TD matrices in terms of so-called Bernoulli-compatible matrices. In fact, we readily obtain a more general result on the characterization of higher-order tail-dependence coefficients via Bernoulli-compatible tensors (Proposition 3.4).
Associated with the bivariate tail-dependence coefficients \(\lambda _X(\{i,j\})\), we introduce and discuss the so-called spectral distance \(d_X\) given by
This spectral distance defines a metric on the space of 1-Fréchet random variables (i.e., random variables with distribution function \(F(x)=\exp \{-c/x\}, x \ge 0,\) for some nonnegative scale coefficient c, where we speak of a standard 1-Fréchet distribution if \(c=1\)) living on a joint probability space. It metricizes convergence in probability and was considered in Davis and Resnick (1993); Stoev and Taqqu (2005); Fiebig et al. (2017). In Section 4 we will establish the \(L^1\)-embeddability of this metric, which allows us to apply the rich theory of metric embeddings to the analysis of the tail-dependence coefficients.
In Section 4.2, utilizing the exceedance set representation of the bivariate tail-dependence coefficients and the \(L^1\)-embeddability of the spectral distance, we recover the equivalence of \(L^1\)- and \(\ell _1\)-embeddability as well as a probabilistic proof of the so-called cut decomposition of \(\ell _1\)-embeddable finite metric spaces. In this case, the decomposition turns out to be closely related to the Tawn-Molchanov model of an associated max-stable vector X (Proposition 4.5). When a given \(\ell _1\)-embeddable metric has a unique cut decomposition, it is called rigid (Deza and Laurent 1997). Rigidity of the spectral distance essentially means that the bivariate tail-dependence coefficients \(\Lambda\) determine all higher-order tail-dependence coefficients. In Theorem 4.11, we show that line metrics are rigid, which to the best of our knowledge is a new finding. In particular, we obtain that the bivariate tail-dependence coefficient matrices corresponding to line metrics determine the complete set of tail-dependence or, equivalently, extremal coefficients of X. Interestingly, the random set \(\Theta\) corresponding to such line-metric tail dependence is (after a suitable reordering of marginals) a random segment, more precisely a random set of the form \(\{i, i+1, \ldots , j-1, j\}\) for \(1 \le i \le j \le p\) with \(i=1\) or \(j=p\). In general, the characterization of rigidity is computationally hard, as it is equivalent to the characterization of the simplex faces of the cone of cut metrics (Deza and Laurent 1997).
The bivariate TD matrix \(\Lambda\) is a correlation matrix of a random vector. It is well-known, however, that not every correlation matrix with nonnegative entries is a matrix of tail-dependence coefficients. The recent works of Fiebig et al. (2017), Embrechts et al. (2016), Krause et al. (2018), and Shyamalkumar and Tao (2020), among others, have extensively studied various aspects of the class of TD matrices. One surprisingly difficult problem, referred to as the realizability problem, is checking whether a given matrix \(\Lambda\) is a valid TD matrix. The extensive study of Shyamalkumar and Tao (2020) proposed several practical and efficient algorithms for realizability. Moreover, Shyamalkumar and Tao (2020) conjectured that the realizability problem is NP-complete. In Section 5, we confirm their conjecture. We do so by exploiting the established connection to \(\ell _1\)-embeddability, which allows us to utilize the rich theory on cuts and metrics outlined in the monograph of Deza and Laurent (1997). It is known that checking whether a given p-point metric space is \(\ell _1\)-embeddable is a computationally hard problem in the NP-complete class.
The paper is structured as follows: In Section 2 we give an overview of several ways of modeling and measuring tail dependence of a random vector, presented in a hierarchical fashion: First, multivariate regular variation allows for the most complete asymptotic description of the tail behavior of (heavy-tailed) random vectors in terms of the tail measure, with a direct correspondence to the class of max-stable models as the natural representatives for each given tail measure. A more condensed description of tail dependence is given by the values of special extremal dependence functionals like the extremal coefficients and tail-dependence coefficients. Finally, a rather coarse but popular description of tail dependence is given in the form of these functionals evaluated only at bivariate marginals, of which the bivariate tail-dependence coefficients form the most prominent example.
In Section 3 we first discuss exceedance sets, as introduced above, and Bernoulli compatibility. Based on this interpretation, we give a short introduction to generalized Tawn-Molchanov models.
In Section 4 we explore the relationship between bivariate tail-dependence coefficients and the spectral distance on the space of 1-Fréchet random variables. After a brief introduction to the concepts of metric embeddings of finite metric spaces, we will show that the spectral distance is both \(L^1\)- and \(\ell _1\)-embeddable, some consequences of which will be explored in Sections 4.2 and 5. In Section 4.2 we introduce the concept of rigid metrics and prove that the building blocks of \(\ell _1\)-embeddability, i.e., the line metrics, correspond to Tawn-Molchanov models with a special structure which is completely determined by the line metric.
Finally, in Section 5 we use known results about the computational complexity of embedding problems to show that the realizability problem for tail-dependence matrices is NP-complete. Some proofs are deferred to Appendix A.
2 Regular variation, max-stability, and extremal dependence
In this section, we provide a concise overview of fundamental notions of multivariate regular variation and max-stable distributions, which underpin the study of tail dependence.
2.1 Multivariate regular variation
The concept of multivariate regular variation is key to the unified treatment of the various tail-dependence notions we will consider. Much of this material is classic, but we provide here a self-contained review tailored to our purposes. Many more details and insights can be found in Resnick (1987, 2007); Hult and Lindskog (2006); Basrak and Planinić (2019); Kulik and Soulier (2020), among other sources.
We start with some notation. A set \(A \subset \mathbb R^p\) is said to be bounded away from 0 if \(0\not \in A^\textrm{cl}\), i.e., \(A\cap B(0,\varepsilon ) =\emptyset\) for some \(\varepsilon >0\). Here \(A^\textrm{cl}\) is the closure of A and \(B(x,r):=\{ y\in \mathbb R^p\,:\, \Vert x-y\Vert <r\}\) is the ball of radius r centered at x in a given fixed norm \(\Vert \cdot \Vert\). Furthermore, denote the Borel \(\sigma\)-algebra on \(\mathbb {R}^p\) by \(\mathcal{B}(\mathbb R^p)\).
Consider the class \(M_0(\mathbb R^p)\) of all Borel measures \(\mu\) on \(\mathcal{B}(\mathbb R^p)\) that are finite on sets bounded away from 0, i.e., such that \(\mu (B(0,\varepsilon )^c)<\infty\), for all \(\varepsilon >0\). Such measures will be referred to as boundedly finite. For \(\mu _n,\mu \in M_0(\mathbb R^p),\) we write
if \(\int _{\mathbb R^p} f(x)\mu _n(dx) \rightarrow \int _{\mathbb R^p} f(x) \mu (dx), \text { as }n\rightarrow \infty ,\) for all bounded and continuous f vanishing in a neighborhood of 0. The latter is equivalent to having
for all \(\mu\)continuity Borel sets A that are bounded away from 0 (Hult and Lindskog 2006, Theorems 2.1 and 2.4).
Definition 2.1
A random vector X in \(\mathbb R^p\) is said to be regularly varying if there is a positive sequence \(a_n\uparrow \infty\) and a nonzero \(\mu \in M_0(\mathbb R^p)\) such that
\(n\, \mathbb P[a_n^{-1} X \in \cdot \,] \rightarrow \mu (\cdot )\) in \(M_0(\mathbb R^p)\), as \(n\rightarrow \infty\).
In this case, we write \(X\in \textrm{RV}(\{a_n\},\mu )\) and call \(\mu\) the tail measure of X.
If \(X\in \textrm{RV}(\{a_n\},\mu )\), then it necessarily follows that there is an index \(\alpha >0\) such that
\(\mu (cA) = c^{-\alpha }\, \mu (A), \quad \text {for all } c>0 \text { and } A\in \mathcal{B}(\mathbb R^p), \qquad (2.2)\)
and, moreover, \(a_n \sim n^{1/\alpha } \ell (n)\), for some slowly varying function \(\ell\), see, e.g., Kulik and Soulier (2020), Section 2.1. We shall denote by \(\textrm{index}(X)\) the index of regular variation \(\alpha\) and sometimes write \(X\in \textrm{RV}_\alpha (\{a_n\},\mu )\) to specify that \(\textrm{index}(X) = \alpha\).
The measure \(\mu\) is unique up to a multiplicative constant, and the scaling property (2.2) implies that \(\mu\) factors into a radial and an angular component. Namely, fix any norm \(\Vert \cdot \Vert\) on \(\mathbb R^p\) and define, for \(x\not = 0\), the polar coordinates \(r:=\Vert x\Vert\) and \(u:= x/\Vert x\Vert\). Then,
where \(S:=\{x\,:\, \Vert x\Vert =1\}\) is the unit sphere and \(\sigma\) is a finite Borel measure on S referred to as the angular or spectral measure associated with \(\mu\), see, e.g., Kulik and Soulier (2020), Section 2.2. Given the norm \(\Vert \cdot \Vert\), the measure \(\sigma\) is uniquely determined as
where \(\mathcal{B}(A)\) for \(A \subset \mathbb {R}^p\) denotes the Borel subsets of A. The following is a useful characterization of regular variation, sometimes taken as an equivalent definition, see again, e.g., Kulik and Soulier (2020), Section 2.2.
Proposition 2.2
We have \(X\in \textrm{RV}_\alpha (\{a_n\},\mu )\) if and only if for all \(x>0\)
where \(\Rightarrow\) denotes the weak convergence of probability distributions.
Proposition 2.2 characterizes regularly varying random vectors in terms of exceedances over a threshold. An equivalent characterization is also possible in terms of maxima, see, e.g., Kulik and Soulier (2020), Section 2.1.
Proposition 2.3
For a random vector \(Y \in [0,\infty )^p\) we have \(Y \in \textrm{RV}_\alpha (\{a_n\},\mu )\) if and only if there exists a nondegenerate random vector X such that for all \(x \in [0,\infty )^p\)
where \([0,x]^c:= \mathbb R_+^p \setminus [0,x]=\mathbb R_+^p \setminus ([0,x_1] \times \ldots \times [0,x_p])\) and \(Y^{(t)},\ t=1,\dots ,n\) are independent copies of Y, and the operation \(\vee\) denotes taking the componentwise maximum. The random vector X is said to have a (multivariate) Fréchet distribution with exponent measure \(\mu\).
Multivariate regular variation provides an asymptotic framework: for given \(\alpha , \{a_n\}\) and \(\mu\), there exist many distributions of random vectors Y such that \(Y \in \textrm{RV}_\alpha (\{a_n\},\mu )\), but according to Proposition 2.3 their maxima are all attracted to the same random vector X, whose distribution depends only on \(\mu\). The class of limiting random vectors in Proposition 2.3 will be inspected more closely in the next section.
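As a one-dimensional sanity check of this attraction (our illustration, with unit-Pareto margins as an assumed input), normalized maxima of iid \(\textrm{RV}_1\) samples approach a standard 1-Fréchet law \(\exp \{-1/x\}\), matching the marginal content of Proposition 2.3.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 1_000, 10_000
# unit-Pareto variables: P[Y > y] = 1/y for y >= 1, so Y is RV_1 with a_n = n
samples = rng.pareto(1.0, size=(reps, n)) + 1.0
maxima = samples.max(axis=1) / n      # a_n^{-1} (Y^(1) v ... v Y^(n))
x = 2.0
emp = np.mean(maxima <= x)
print(emp, np.exp(-1.0 / x))          # both close to exp(-1/2)
```

Here \(\mathbb P[\max _t Y^{(t)} \le nx] = (1 - 1/(nx))^n \rightarrow \exp \{-1/x\}\), so the empirical frequency agrees with the Fréchet limit up to Monte Carlo error.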
2.2 Max-stable vectors
The homogeneity property (2.2) of \(\mu\) implies that the limiting random vector in Proposition 2.3 has a certain stability property, namely that
with the same notation as in Proposition 2.3 and where \({\mathop {=}\limits ^{d}}\) stands for equality in distribution, see Kulik and Soulier (2020), Section 2.1. We call such a random vector X max-stable, and we call X nondegenerate max-stable if in addition \(\mathbb P[X=(0, \ldots , 0)]<1\). For \(\alpha =1\) this simplifies to
and we speak of a simple max-stable random vector X, which we will further analyze in the following.
The marginal distributions of simple max-stable distributions are necessarily 1-Fréchet, that is,
\(\mathbb P[X_i \le x] = \exp \{-\sigma _i/x\}, \quad x>0,\)
for some nonnegative scale coefficient \(\sigma _i\). We shall write \(\Vert X_i\Vert _1:=\sigma _i\) for the scale coefficient of the 1-Fréchet variable \(X_i\). The next result characterizes all multivariate simple max-stable distributions. Here, we recall the so-called de Haan construction of a simple max-stable vector.
Proposition 2.4
Let \((E,\mathcal{E},\nu )\) be a measure space and let \(L_+^1(E,\nu )\) denote the set of all nonnegative \(\nu\)integrable functions on E. For every collection \(f_i\in L_+^1 (E,\nu ),\ 1\le i\le p\), there is a random vector \(X = (X_i)_{1\le i\le p}\), such that for all \(x_i>0, 1 \le i \le p,\)
The random vector X is simple max-stable. Conversely, for every simple max-stable vector X, Equation (2.6) holds and \((E,\mathcal{E},\nu )\) can be chosen as \(([0,1],\mathcal{B}[0,1],\textrm{Leb})\). In fact, we have the stochastic representation
where \(\{(\Gamma _j,U_j)\}\) is a Poisson point process on \((0,\infty )\times [0,1]\) with mean measure \(dx\times \nu (du)\).
For a proof and more details, see e.g. de Haan (1984); Stoev and Taqqu (2005). The functions \(f_i\) in (2.6) and (2.7) are referred to as spectral functions associated with the vector X. From (2.6) and (2.7), one can readily see that for all \(f\in L_+^1(E,\nu )\), the so-called extremal integral I(f) in (2.7) is a well-defined 1-Fréchet random variable. More precisely, its cumulative distribution function is:
Moreover, the extremal integral functional \(I(\cdot )\) is max-linear in the sense that for all \(a_i\ge 0\) and \(f_i\in L_+^1(E,\nu ),\ 1\le i\le n\), we have
Thus, every max-linear combination \(\vee _{i=1}^n a_i X_i\) of the components of X as above with coefficients \(a_i\ge 0\) is a 1-Fréchet random variable with scale coefficient:
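For a discrete spectral measure, (2.6) and (2.7) reduce to a max-linear model \(X_i = \max _j f_i(j) Z_j\) with iid standard 1-Fréchet \(Z_j\), and the scale of a max-linear combination is \(\sum _j \max _i a_i f_i(j)\). The following sketch (with our own hypothetical spectral values) checks this scale formula by simulation.

```python
import numpy as np

rng = np.random.default_rng(3)
m, reps = 3, 200_000
# spectral functions on E = {0,1,2} with counting measure nu (hypothetical values)
F = np.array([[1.0, 0.5, 0.0],    # f_1
              [0.2, 0.8, 1.0]])   # f_2
Z = 1.0 / rng.exponential(size=(reps, m))   # iid standard 1-Fréchet
X = (F[None] * Z[:, None, :]).max(axis=2)   # X_i = max_j f_i(j) Z_j, cf. (2.7)

a = np.array([1.0, 1.0])
scale = np.maximum(a[0] * F[0], a[1] * F[1]).sum()  # scale of a_1 X_1 v a_2 X_2 = 2.8
x0 = 3.0
emp = np.mean(np.maximum(a[0] * X[:, 0], a[1] * X[:, 1]) <= x0)
print(emp, np.exp(-scale / x0))             # both close to exp(-2.8/3)
```

Since the max-linear combination is exactly 1-Fréchet with the computed scale, the empirical distribution function matches \(\exp \{-2.8/x_0\}\) up to sampling noise.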
We will further explore the asymptotic properties of simple max-stable random vectors and how they fit into the framework of multivariate regular variation in the following section.
2.3 Extremal dependence functionals and tail-dependence coefficients
The tail measure \(\mu\) and the normalizing sequence \(\{a_n\}\) from Section 2.1 provide a comprehensive description of the asymptotic behavior of a random vector X and allow one to approximate probabilities of the form \(\mathbb P[X \in a_n A]\) for all sets A bounded away from 0. Sometimes, however, one may be interested in those probabilities only for certain simple sets A, and describe the asymptotic behavior of X by certain extremal dependence functionals instead. In this section, we first derive a general result for such extremal dependence functionals and then introduce two particularly popular families of them.
Proposition 2.5
Let \(X\in \textrm{RV}_\alpha (\{a_n\},\mu )\) in \(\mathbb R^p\). Let also \(h:\mathbb R^p\rightarrow [0,\infty )\) be a nonnegative, continuous and 1-homogeneous function, i.e., \(h(c x) = c h(x),\ c>0,\ x\in \mathbb R^p\). Then,
where Y has probability distribution \(\sigma (\cdot )/\sigma (S)\) with \(\sigma\) as in (2.4) and \(S=\{x\,:\, \Vert x\Vert =1\}\).
Though this result is similar to Yuen et al. (2020), Lemma A.7, and is also a special case of Dyszewski and Mikosch (2020), Theorem 2.1, its proof is given in Appendix A.
We will apply the formula in (2.8) for homogeneous functionals of the form \(h(x) = (\min _{i\in K} x_i)_+\) and \(h(x) = (\max _{i\in K} x_i)_+\) for some subset \(K\subset [p]=\{1,\ldots ,p\}\).
The next result shows that simple max-stable vectors are regularly varying and provides means to express their extremal dependence functionals both in terms of spectral functions and tail measures.
Proposition 2.6
Let \(X=(X_i)_{1\le i\le p}\) be a nondegenerate simple max-stable vector as in (2.6). Then, \(X\in \textrm{RV}_1(\{n\},\mu )\), where \(\mu\) is supported on \([0,\infty )^p\) and for all \(x=(x_i)_{1\le i\le p}\in \mathbb R_+^p\setminus \{0\}\)
Moreover, for every nonnegative, continuous 1-homogeneous function \(h:\mathbb R^p\rightarrow [0,\infty )\), we have
where \(\textbf{f}(z) = (f_1(z),\cdots ,f_p(z))\). In particular, the spectral measure \(\sigma\) has the representation
Again, this result is standard, but we sketch its proof for the sake of completeness in Appendix A. The classic representation of the simple max-stable cumulative distribution functions is a simple corollary of Proposition 2.6.
Corollary 2.7
In the situation of Proposition 2.6, by taking \(h(u):= h_x(u):= (\max _{i\in [p]} u_i/x_i)_+\) for \(x\in (0,\infty )^p\) in (2.9), we obtain \(\mu (\{h>1\}) = \mu ([0,x]^c)\) and
For more details on the characterization of the max-domain of attraction of multivariate max-stable laws in terms of multivariate regular variation, see e.g., Proposition 5.17 in Resnick (1987).
We are now ready to recall the general definitions of the extremal and tail-dependence coefficients of a regularly varying random vector, which have briefly been introduced in Section 1, now with additional notation for the normalizing sequence \(\{a_n\}\).
Definition 2.8
Let \(X=(X_i)_{1\le i\le p} \in \textrm{RV}(\{a_n\},\mu )\). Then, for nonempty sets \(K, L\subset [p]\), we let
The \(\theta _X(K;\{a_n\})\)’s and \(\lambda _X(L;\{a_n\})\)’s are referred to as the extremal and tail-dependence coefficients of the vector X relative to \(\{a_n\}\), respectively.
If it is clear which random vector we refer to, or if it does not matter for the argument, we may drop the index X and just write \(\theta (K;\{a_n\})\) and \(\lambda (K;\{a_n\})\). Sometimes we will view \(\theta\) and \(\lambda\) as functions of k-tuples and write, for example,
(where some of the arguments \(i_1,\ldots ,i_k\) may repeat) which corresponds to \(\lambda _X(L,\{a_n\})\) where L is the set of all distinct values in \(\{i_1,\ldots ,i_k\}\).
Remark 2.9
Note that the definitions of \(\theta _X(K,\{a_n\})\) and \(\lambda _X(L,\{a_n\})\) depend on the choice of the sequence \(\{a_n\}\). They are unique, however, up to a multiplicative constant. More precisely, if \(\textrm{index}(X) = \alpha\) and \(a_n' \sim c\, a_n\) for some \(c>0\), then
Remark 2.10
In the following we will focus on extremal and tail-dependence coefficients of max-stable random vectors, which exist by Definition 2.8 in combination with Proposition 2.6 as long as X is nondegenerate. Observe that if X is nondegenerate simple max-stable, then
Thus, if all marginals of X are standard \(1\)-Fréchet, i.e., \(\Vert X_i\Vert _1=1\), then setting \(a_n=n\) ensures that \(\lim _{n \rightarrow \infty } n \mathbb P[X_i>a_n]=1\) and one recovers the upper tail-dependence coefficient \(\lambda _X(i,j)\) from (1.1), \(i,j \in [p]\). More generally, if X is nondegenerate simple max-stable, then we can choose \(a_n=n\) as a normalizing sequence and in this case (or if the sequence \(\{a_n\}\) does not matter for the argument), we will also write
In the case that \(\mathbb P[X=(0,\ldots , 0)]=1\), we set \(\theta _X(K)=\lambda _X(L)=0\) for all \(K, L \subset [p]\).
The following result expresses these functionals in terms of both the tail measure \(\mu\) and the spectral functions of the vector X. Again, the proof is given in Appendix A.
Corollary 2.11
Let \(X=(X_i)_{1\le i\le p}\) be a simple max-stable vector as in (2.6). Then,
where \(A_i:= \{ x\in \mathbb R^p\,:\, x_i>1\}\) and
2.4 Bivariate tail-dependence measures and spectral distance
In Definition 2.8 we introduced general extremal and tail-dependence coefficients for arbitrary nonempty subsets \(K,L\subset [p]\), i.e., for \(2^p-1\) different sets. Often these are too many coefficients for a handy description of the dependence structure. Therefore, one may consider only the pairwise dependence in a simple max-stable vector X, which corresponds to the consideration of sets K and L with at most two elements. The set of tail-dependence coefficients for sets containing at most two elements can be written in the so-called matrix of bivariate tail-dependence coefficients, which we denote by
For the bivariate tail dependence we have the alternative representation
For standardized marginals \(\Vert X_i\Vert _1=1\) this implies \(\lambda _X(i,j)=2-\Vert X_i\vee X_j\Vert _1\). The 1-Fréchet marginals of X imply
as \(n\rightarrow \infty\), where \(\Vert X_i\vee X_j\Vert _1\) denotes the scale coefficient of the 1-Fréchet distribution of \(X_i\vee X_j\). Thus, for standardized marginals \(\Vert X_i\Vert _1=1\), \(1\le i\le p\), the bivariate tail-dependence coefficients also have the following representation for all \(1\le i,j\le p\):
In this form, the bivariate tail-dependence matrix is a popular measure for the extremal dependence in the random vector X. First appearing around the 1960s (e.g., de Oliveira 1962), the bivariate tail-dependence coefficients are frequently considered in the literature, see e.g. Coles et al. (1999); Beirlant et al. (2004); Frahm et al. (2005); Fiebig et al. (2017); Shyamalkumar and Tao (2020) for different considerations (sometimes under other names, such as coefficient of (upper) tail dependence or \(\chi\)-measure). In the context of finance and insurance, but also in environmental contexts, this measure is used to describe the extremal risk in the random vector X. Moreover, the characterization of whether \(X_i\) and \(X_j\) are extremally dependent is usually formulated via these bivariate tail-dependence coefficients: If \(\lambda _X(i,j)=0\), then \(X_i\) and \(X_j\) are extremally independent; otherwise the two random variables are extremally dependent.
Note that for standardized marginals the relation \(\theta _X(i,j)=2-\lambda _X(i,j)\) holds. The extremal dependence coefficient in this form has often been used in the literature as a measure for extremal dependence, see e.g. Smith (1990); Schlather and Tawn (2003); Strokorb and Schlather (2015).
In all these references, the tail-dependence coefficient was defined as in (2.15) and standardized (or at least identically distributed) marginal distributions were assumed, as is common for the analysis of dependence. However, we allow for unequal scales and therefore use the more general form (2.14).
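For standardized margins the identity \(\lambda _X(i,j)=2-\Vert X_i\vee X_j\Vert _1\) can be checked directly in a discrete-spectral (max-linear) model, where it becomes \(\lambda _X(i,j)=\sum _j \min (f_i(j),f_j(j))\). The spectral values below are a hypothetical example of ours; the Monte Carlo estimate approximates \(n\,\mathbb P[X_i>n, X_j>n]\).

```python
import numpy as np

rng = np.random.default_rng(4)
# standardized discrete spectral functions: each row sums to 1 (hypothetical)
F = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.3, 0.6]])
lam = np.minimum(F[0], F[1]).sum()                  # integral of f_1 ^ f_2 = 0.5
# same value as 2 - ||X_1 v X_2||_1 = 2 - integral of f_1 v f_2
assert abs(lam - (2.0 - np.maximum(F[0], F[1]).sum())) < 1e-12

reps, n = 400_000, 50.0
Z = 1.0 / rng.exponential(size=(reps, F.shape[1]))  # iid standard 1-Fréchet
X = (F[None] * Z[:, None, :]).max(axis=2)
lam_mc = n * np.mean((X[:, 0] > n) & (X[:, 1] > n)) # n P[X_1 > n, X_2 > n]
print(lam, lam_mc)                                  # both close to 0.5
```

The agreement of the two quantities illustrates that the scale-coefficient formula and the joint-exceedance definition of \(\lambda _X(i,j)\) describe the same object.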
Remark 2.12
The matrix of bivariate tail-dependence coefficients \(\Lambda\) of a simple max-stable vector is necessarily positive semidefinite. Indeed, this follows from the observation that by Corollary 2.11
where \(B=\{B(t),\ t\ge 0\}\) is a standard Brownian motion, and from the fact that nonnegative mixtures of covariance matrices are again covariance matrices. Another way to see this is from the observation that for each n, \(n\mathbb P[X_i>n, X_j>n] =n\mathbb E[ I(X_i>n) I(X_j>n)]\) is a positive semidefinite function of \(i,j\in [p]\), which is related to the fact that \((i,j)\mapsto \lambda (i,j)\) is, up to a multiplicative constant, the covariance function of a certain random exceedance set (see Remark 3.6 below).
The matrix \(\Lambda\) is thus positive semidefinite, has nonnegative entries, and for standardized marginals of X we have \(\lambda (\{i\})=1\), i.e., \(\Lambda\) is a correlation matrix. However, not every correlation matrix with nonnegative entries is necessarily a matrix of bivariate tail-dependence coefficients. The realization problem (i.e., the question whether a given matrix is the tail-dependence matrix of some random vector) is a recent topic in the literature (Fiebig et al. 2017; Krause et al. 2018; Shyamalkumar and Tao 2020). We will further discuss this problem in Section 5.
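A few necessary (but by no means sufficient) conditions on a candidate TD matrix can already be screened cheaply; the following sketch of ours combines the properties above with the triangle-type inequality \(\lambda (i,j)+\lambda (j,k) \le 1 + \lambda (i,k)\), which expresses that \(d(i,j) = 2(1-\lambda (i,j))\) must satisfy the triangle inequality. The function name and tolerances are our own choices.

```python
import numpy as np
from itertools import permutations

def td_necessary_checks(L):
    """Necessary (not sufficient!) conditions for a candidate TD matrix:
    symmetry, unit diagonal, entries in [0,1], positive semidefiniteness,
    and lambda(i,j) + lambda(j,k) <= 1 + lambda(i,k), i.e. the triangle
    inequality for d(i,j) = 2(1 - lambda(i,j))."""
    L = np.asarray(L, dtype=float)
    p = L.shape[0]
    ok = (np.allclose(L, L.T) and np.allclose(np.diag(L), 1.0)
          and bool((L >= 0).all()) and bool((L <= 1).all())
          and np.linalg.eigvalsh(L).min() >= -1e-10)
    return bool(ok) and all(L[i, j] + L[j, k] <= 1 + L[i, k] + 1e-12
                            for i, j, k in permutations(range(p), 3))

print(td_necessary_checks([[1, .5, .5], [.5, 1, .5], [.5, .5, 1]]))  # True
print(td_necessary_checks([[1, .9, .1], [.9, 1, .9], [.1, .9, 1]]))  # False
```

The second matrix fails both positive semidefiniteness and the triangle condition, so it cannot be a TD matrix; passing all checks, however, does not certify realizability, which is the hard problem treated in Section 5.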
Related to the bivariate tail-dependence coefficients, we define an associated function, which will turn out to be a semimetric on [p].
Definition 2.13
Let \(X=(X_i)_{1\le i\le p}\) be a simple max-stable vector. Then, for \(i,j\in [p]\), the spectral distance \(d_X\) is defined by
By (2.14)
If the scales of the marginals of the simple max-stable vector \((X_i)_{1\le i\le p}\) are all equal, i.e., \(\Vert X_i\Vert _1 = c\) for some \(c>0\) and all \(1\le i\le p\), then (2.17) simplifies to
For standard 1-Fréchet marginals this further reduces to \(d(i,j)=2(1-\lambda _X(i,j))\).
The spectral distance for max-stable vectors was already considered in Stoev and Taqqu (2005), equation (2.11). There it was shown that this distance is indeed a semimetric on [p] (Stoev and Taqqu 2005, Proposition 2.6) and that it metricizes convergence in probability in 1-Fréchet spaces (Stoev and Taqqu 2005, Proposition 2.4). In the form of (2.17), the spectral distance also appears in Fiebig et al. (2017), where it was defined in two steps (Fiebig et al. 2017, Propositions 34 and 37). There, the use of the spectral distance is based on the fundamental work of (Deza and Laurent 1997, Section 5.2), where it is used in a different context.
In Section 4 we will prove that the spectral distance of a simple max-linear vector X is \(L^1\)-embeddable, with representation \(d_X(i,j)=\Vert f_i-f_j\Vert _{L^1}\), where \(f_i,f_j\) are the spectral functions of X. In this form, the spectral distance was already used in Davis and Resnick (1989, 1993), where it was mainly applied in a projection method for the prediction of max-stable processes. Davis and Resnick (1993) also gave a connection to the bivariate tail-dependence coefficients \(\lambda (i,j)\) as considered in de Oliveira (1962), but only in the case of equally scaled marginals.
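In a discrete-spectral model the \(L^1\) representation can be verified directly: our hypothetical sketch below computes \(d_X(i,j)=\Vert f_i-f_j\Vert _{L^1}\), checks it against the scale-coefficient form \(2\Vert X_i\vee X_j\Vert _1-\Vert X_i\Vert _1-\Vert X_j\Vert _1\), and confirms the triangle inequality.

```python
import numpy as np
from itertools import permutations

# discrete spectral functions of a simple max-stable vector (hypothetical)
F = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.3, 0.6],
              [0.4, 0.4, 0.2]])

# d_X(i,j) = ||f_i - f_j||_{L^1} (counting measure on three points)
D = np.abs(F[:, None, :] - F[None, :, :]).sum(axis=2)

# agrees with 2 ||X_i v X_j||_1 - ||X_i||_1 - ||X_j||_1,
# since 2 max(a,b) - a - b = |a - b| pointwise
for i in range(3):
    for j in range(3):
        alt = 2 * np.maximum(F[i], F[j]).sum() - F[i].sum() - F[j].sum()
        assert abs(D[i, j] - alt) < 1e-12

# triangle inequality, as for any L^1-embeddable semimetric
assert all(D[i, k] <= D[i, j] + D[j, k] + 1e-12
           for i, j, k in permutations(range(3), 3))
print(D)
```

The pointwise identity \(2\max (a,b)-a-b=|a-b|\) is what makes the two forms of the spectral distance coincide.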
3 Tail-dependence via exceedance sets
In this section we develop a unified approach to representing tail-dependence via random exceedance sets, which explains and extends the notion of Bernoulli compatibility discovered in Embrechts et al. (2016) to higher-order tail-dependence. Moreover, we introduce a slight extension of the so-called Tawn-Molchanov models and explore their connections to extremal and tail-dependence coefficients.
3.1 Bernoulli compatibility
We will first demonstrate that tail-dependence can be succinctly characterized via a random set obtained as the limit of exceedance sets. Let \(X \in \textrm{RV}_\alpha (\{a_n\},\mu )\) and consider the exceedance set:
The asymptotic distribution of this random set, conditioned on it being nonempty, can be directly characterized in terms of the extremal or tail-dependence coefficients of X. Specifically, these dependence coefficients can be seen as the hitting and inclusion functionals of a limiting random set \(\Theta\), respectively. For the precise definitions and related notions from the theory of random sets, we refer throughout to the monograph of Molchanov (2017).
Before proceeding with the analysis of \(\Theta\) we will introduce some appropriate coefficients. Let
where again \(A_i:= \{ x\in \mathbb R^p\,:\, x_i>1\}, i \in [p]\). Then, in view of (2.12), since the \(B_J\)’s are all pairwise disjoint in J,
This, in view of the so-called Möbius inversion formula, see, e.g., Molchanov (2017), Theorem 1.1.61, yields the inversion formulae:
which is Equation (7) in Schlather and Tawn (2003), Theorem 1. We also have
Finally, the usual inclusion–exclusion type relationships hold between \(\theta\) and \(\lambda\):
Although some of the relations (3.3), (3.4), and (3.5) are available in the literature, we prove them independently, with elementary arguments, in Lemma A.2 in Appendix A.
Observe that the event \(\{\Theta _n \cap K \not = \emptyset \}\) is \(\{\max _{i\in K} X_i >a_n\}\) and note that
due to (2.2) and \(\mu\) being nonzero. This implies that
The functionals \(T_n(\cdot )\) are known as the hitting functionals of the conditional distribution of the random set \(\Theta _n\). They are completely alternating capacities and their limit yields hitting functionals \(T(K):= \theta _X(K)/\theta _X([p])\) of a nonempty random set \(\Theta \subset [p]\). This random set \(\Theta\) may be viewed as the “typical” exceedance set for a regularly varying vector as the threshold \(a_n\) approaches infinity. It is immediate from (3.3) and Molchanov (2017), Corollary 1.1.31, that
Observing that \(\theta _X([p]) = \sum _{\emptyset \ne K \subset [p]} \beta (K),\) we have thus established the following result.
Proposition 3.1
Let \(X \in \textrm{RV}(\{a_n\},\mu )\) and define the random exceedance set \(\Theta _n:= \{i\,:\, X_i>a_n\}\). Then, as \(n\rightarrow \infty\), we have
where the probability mass function of \(\Theta\) is as in (3.6) and the \(\beta (J)\)’s are as in (3.1). We have moreover that
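Proposition 3.1 can be traced numerically. The following sketch (all weights \(\beta(J)\) hypothetical, chosen only for illustration) builds the probability mass function of \(\Theta\) from given coefficients and verifies that its hitting and inclusion functionals reproduce \(\theta(K)/\theta([p])\) and \(\lambda(K)/\theta([p])\), respectively:

```python
from itertools import chain, combinations

p = 3
# hypothetical weights beta(J) over nonempty subsets of {0, 1, 2}
beta = {frozenset(J): w
        for J, w in [((0,), 0.2), ((1,), 0.1), ((0, 1), 0.3), ((0, 1, 2), 0.4)]}

def subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s) + 1))

def theta(K):
    # extremal coefficient: total mass of the sets J hitting K
    return sum(w for J, w in beta.items() if J & frozenset(K))

def lam(L):
    # tail-dependence coefficient: total mass of the sets J containing L
    return sum(w for J, w in beta.items() if frozenset(L) <= J)

total = theta(range(p))                        # theta([p])
pmf = {J: w / total for J, w in beta.items()}  # P[Theta = J], cf. (3.6)

for K in subsets(range(p)):
    # hitting functional: P[Theta intersects K] = theta(K) / theta([p])
    hit = sum(pr for J, pr in pmf.items() if J & frozenset(K))
    assert abs(hit - theta(K) / total) < 1e-12
    # inclusion functional: P[K subset of Theta] = lambda(K) / theta([p]), cf. (3.7)
    incl = sum(pr for J, pr in pmf.items() if frozenset(K) <= J)
    assert abs(incl - lam(K) / total) < 1e-12
print("hitting and inclusion functionals match theta and lambda up to theta([p])")
```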
Remark 3.2
Molchanov and Strokorb (2016) introduced the important class of Choquet random sup-measures, whose distribution is characterized by the extremal coefficient functional \(\theta (\cdot )\). This is closely related but not identical to our perspective here, which emphasizes threshold exceedance rather than max-stability.
The above result shows that all tail-dependence coefficients can be succinctly represented (up to a constant) via the random set \(\Theta\). This finding allows us to connect the tail-dependence coefficients to so-called Bernoulli-compatible tensors.
Definition 3.3
A k-tensor \(T = (T(i_1,\cdots ,i_k))_{1\le i_1,\cdots ,i_k\le p}\) is said to be Bernoulli-compatible, if
where \(\xi (1),\cdots ,\xi (p)\) are (possibly dependent) Bernoulli, i.e. \(\{0,1\}\)-valued, random variables with \(\mathbb P[\xi (i)=1]=p_i=1-\mathbb P[\xi (i)=0]\) for some \(p_i \in [0,1], i \in [p]\). If not all \(\xi (i)\)’s are identically zero, the tensor T is said to be nondegenerate.
In the case \(k=2\), this definition recovers the notion of Bernoulli compatibility in Embrechts et al. (2016). Proposition 3.1 implies the following result.
Proposition 3.4

(i)
For every Bernoulli-compatible k-tensor \(T = (T(i_1,\cdots ,i_k))_{[p]^k}\), there exists a simple max-stable random vector X such that
$$T(i_1,\cdots ,i_k) = \lambda _X(i_1,\cdots ,i_k),$$for all \(i_1,\cdots ,i_k\in [p]\).

(ii)
Conversely, for every simple max-stable random vector \(X=(X_i)_{1\le i\le p}\), and every \(c\ge \theta _X([p])\) (or every \(c>0\) if \(\theta _X([p])=0\))
$$\begin{aligned} (T(i_1,\cdots ,i_k))_{[p]^k} := \frac{1}{c}\cdot \Big ( \lambda _X(i_1,\cdots ,i_k)\Big )_{[p]^k} \end{aligned}$$(3.9)is a Bernoulli-compatible k-tensor.
Proof
(i) : Assume (3.8) holds and introduce the random (possibly empty) set \(\Theta :=\{i\,:\, \xi (i)=1\}\). Let \(\beta (J):= \mathbb P[\Theta =J]\) and define the simple max-stable vector
where \(1_{J}=(1_J(i))_{1\le i\le p}\) contains 1 in the coordinates in J and 0 otherwise and the \(Z_J\)’s are iid standard 1-Fréchet. If T is degenerate, then \(\mathbb P[\Theta =\emptyset ]=1\) and \(\mathbb P[X=(0,\ldots , 0)]=1\), so by our previous convention we have \(\lambda _X(L)=0\) for all \(L\subset [p]\) and the statement follows. Otherwise, X is nondegenerate. Then, in view of Lemma A.1 and since \(\lambda _{1_J Z_J}(L)=1\) for \(L\subset J\) and \(\lambda _{1_J Z_J}(L)=0\) for \(L\not \subset J\), we have
Since for \(L = \{i_1,\cdots ,i_k\}\) we have \(1_{\{L\subset \Theta \}} = \prod _{j=1}^k \xi (i_j)\), we obtain
This completes the proof of (i).
(ii) : If \(\mathbb P[X=(0, \ldots , 0)]=1\), then \(\theta _X([p])=0\) and the statement follows by setting all \(\xi (i)\) identically to 0, so assume \(\mathbb P[X = (0, \ldots , 0)]<1\) in the following, which implies \(\theta _X([p])>0\). Let \(\Theta \subset [p]\) be a random set such that (3.7) holds, i.e.,
Define \(\xi (i):= B\cdot 1_{\Theta }(i)\), \(i \in [p]\), where B is a Bernoulli random variable, independent of \(\Theta\), with \(\mathbb P[B=1] = 1-\mathbb P[B=0] = q\in (0,1]\). Then, we have that
This shows that (3.9) holds with potentially any \(c\ge \theta _X([p])\). \(\square\)
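The construction in the proof of part (i) can be made concrete: starting from the joint law of a Bernoulli vector \(\xi\), the weights \(\beta(J) = \mathbb P[\Theta = J]\) define a max-linear vector whose tail-dependence coefficients coincide with the tensor entries. A minimal sketch with a purely hypothetical pmf:

```python
from itertools import combinations

p = 3
# hypothetical joint pmf of a Bernoulli vector xi = (xi(1), ..., xi(p))
pmf = {(1, 1, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 0): 0.3, (0, 0, 0): 0.2}

def T(idx):
    # Bernoulli-compatible tensor entry T(i_1,...,i_k) = E[xi(i_1) ... xi(i_k)]
    return sum(pr for b, pr in pmf.items() if all(b[i] for i in idx))

# weights beta(J) = P[Theta = J] with Theta = {i : xi(i) = 1}, as in the proof
beta = {}
for b, pr in pmf.items():
    J = frozenset(i for i in range(p) if b[i])
    if J:
        beta[J] = beta.get(J, 0.0) + pr

def lam(L):
    # tail-dependence coefficient of the max-linear vector (3.10): mass of J >= L
    return sum(w for J, w in beta.items() if frozenset(L) <= J)

# tensor entries and tail-dependence coefficients coincide, as claimed in (3.8)
for k in (1, 2, 3):
    for idx in combinations(range(p), k):
        assert abs(T(idx) - lam(idx)) < 1e-12
print("T(i_1,...,i_k) == lambda_X(i_1,...,i_k) for all index tuples")
```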
Remark 3.5
As can be seen from the proof, the lower bound on the constant c in Proposition 3.4 (ii) cannot be improved. Observe that \(\theta ([p]) \le \sum _{i\in [p]} \lambda _X(i)\), where the inequality is strict unless all \(X_i\)’s are independent. Thus, even in the case \(k=2\), the above result improves upon Theorem 3 in Krause et al. (2018), where the range for the constant c is \(c\ge \sum _{i\in [p]} \lambda _X(i)\).
Remark 3.6
In the case of two-point sets, we have that the bivariate tail-dependence coefficient
is proportional to the so-called covariance function \((i,j)\mapsto \mathbb P[i,j\in \Theta ] =\mathbb E[1_{\Theta }(i) 1_{\Theta }(j)]\) of the random set \(\Theta\). This shows again that the bivariate tail-dependence function \((i,j)\mapsto \lambda (i,j)\) is positive semidefinite.
Remark 3.7
Relation (3.11) provides a simple proof of the Bernoulli compatibility of TD matrices established in Theorem 3.3 of Embrechts et al. (2016). Namely, their result states that \(\Lambda = (\lambda _{i,j})_{p\times p}\) is a matrix of bivariate tail-dependence coefficients, if and only if \(\Lambda = c \mathbb E[\xi \xi ^\top ]\) for some \(c>0\) and a random vector \(\xi =(\xi _i)_{1\le i\le p}\) with Bernoulli entries taking values in \(\{0,1\}\). Clearly, there is a one-to-one correspondence between a random set \(\Theta \subset [p]\) and a Bernoulli random vector: \(\Theta :=\{i\,:\, \xi _i=1\}\) and \(\xi = (1_{\Theta }(i))_{1\le i\le p}\). The characterization result then follows from (3.11).
3.2 Generalized Tawn-Molchanov models
In the previous section we defined in (3.1) the coefficients \(\beta (J)\) to characterize the distribution of the limiting exceedance set \(\Theta\). These coefficients were then used in (3.10) to construct a max-stable random vector in order to prove Proposition 3.4. This special random vector is in fact nothing else than a generalized version of the so-called Tawn-Molchanov model, which we introduce formally in this section.
The following result is a slight extension and reformulation of existing results in the literature, which first appeared in Schlather and Tawn (2002, 2003) (see also Strokorb and Schlather (2015); Molchanov and Strokorb (2016) for extensions) in the context of finding necessary and sufficient conditions for a set of \(2^p-1\) numbers \(\{\theta (K)\mid \, \emptyset \not =K\subset [p]\}\) to be the extremal coefficients of a max-stable vector X. The novelty here is that we consider max-stable vectors with possibly nonidentical marginals and treat simultaneously the cases of extremal as well as tail-dependence coefficients.
Theorem 3.8
The function \(\{\theta (K),\ K \subset [p]\}\) (\(\{\lambda (L),\ L\subset [p]\}\), respectively) yields the extremal (tail-dependence, respectively) coefficients of a simple max-stable vector \(X=(X_i)_{1\le i\le p}\) if and only if the \(\beta (J)\)’s in (3.3) ((3.4), respectively) are nonnegative for all \(\emptyset \not =J\subset [p]\). In this case, let \(Z_J, J \subset [p]\), be iid standard 1-Fréchet random variables and define
where \(1_{J}=(1_J(i))_{1\le i\le p}\) contains 1 in the coordinates in J and 0 otherwise. Then, \(X^*\) is a max-stable random vector whose extremal (tail-dependence) coefficients are precisely the \(\theta (K)\)’s (\(\lambda (L)\)’s, respectively).
The proof is given in Appendix A. The vector \(X^*\) defined in (3.12) is referred to as the Tawn-Molchanov or simply TM model associated with the extremal (tail-dependence) coefficients \(\{\theta (K)\}\) (\(\{\lambda (L)\},\) respectively).
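Theorem 3.8 yields a simple realizability check when the full tail-dependence function is available: invert \(\lambda\) to the \(\beta(J)\)'s and test nonnegativity. A minimal sketch (function names hypothetical; the inversion below is the Möbius formula underlying (3.4)):

```python
from itertools import chain, combinations

def nonempty_subsets(universe):
    s = list(universe)
    return [frozenset(c) for r in range(1, len(s) + 1) for c in combinations(s, r)]

def tm_weights(lam, p):
    """Invert lambda(L) = sum_{J >= L} beta(J) by Mobius inversion:
    beta(J) = sum_{L >= J} (-1)^(|L| - |J|) * lambda(L)."""
    full = frozenset(range(p))
    beta = {}
    for J in nonempty_subsets(range(p)):
        rest = list(full - J)
        extras = chain.from_iterable(
            combinations(rest, r) for r in range(len(rest) + 1))
        beta[J] = sum((-1) ** len(extra) * lam[J | frozenset(extra)]
                      for extra in extras)
    return beta

# hypothetical tail-dependence function on p = 2 points
p = 2
lam = {frozenset({0}): 1.0, frozenset({1}): 1.0, frozenset({0, 1}): 0.4}
beta = tm_weights(lam, p)

# realizable (Theorem 3.8) if and only if all beta(J) are nonnegative
assert all(b >= -1e-12 for b in beta.values())
# sanity check: the TM model built from beta reproduces lambda
for L in nonempty_subsets(range(p)):
    assert abs(sum(b for J, b in beta.items() if L <= J) - lam[L]) < 1e-12
print({tuple(sorted(J)): round(b, 6) for J, b in beta.items()})
```

Note that this check is polynomial in the size \(2^p-1\) of its input; the hardness discussed in Section 5 arises when only the bivariate coefficients are prescribed.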
Remark 3.9
The distribution of the random set \(\Theta\) introduced in Section 3.1 can be understood in terms of the Tawn-Molchanov model (3.12) using the single-large-jump heuristic: given that \(\Theta _n = \{i\,:\, X^*_i>n\} \not =\emptyset\), for large n, only one of the \(Z_J\)’s is extreme enough to contribute to the exceedance set. Thus, with high probability, \(\Theta _n\) equals the corresponding J in (3.12). The probability of the set J to occur is asymptotically proportional to the weight \(\beta (J)\), which explains the formula (3.6).
We have seen in Section 2.4 that extremal dependence can also be measured in terms of the spectral distance. In the following section we explore further the connections between the spectral distance and the just introduced Tawn-Molchanov models, and see how the latter naturally lead to a decomposition of the former which is equivalent to \(\ell _1\)-embeddability.
4 Embeddability and rigidity of the spectral distance
So far, we have mainly considered the overall tail-dependence of X, or the tail-dependence function \(\lambda (L)\) for arbitrary \(L\subset [p]\). In this section we focus on the bivariate dependence as in Section 2.4. Specifically, we look at the spectral distance and prove that it is both \(L^1\)- and, equivalently, \(\ell _1\)-embeddable. For special spectral distances, namely those corresponding to line metrics, we prove that they are rigid and completely determine the tail-dependence of a TM model.
4.1 \(L^1\)-embeddability of the spectral distance
Recall that a function \(d:T\times T \rightarrow [0,\infty )\) on a nonempty set T is called a semimetric on T if (i) \(d(u,u) = 0\), \(u\in T\) (ii) \(d(u,v) = d(v,u),\ u,v\in T\) and (iii) \(d(u,w)\le d(u,v)+d(v,w),\ u,v,w\in T\). The semimetric is a metric if \(d(u,v) = 0\) only if \(u=v\).
Definition 4.1
A semimetric d on a set T is said to be \(L^1(E,\nu )\)-embeddable (or \(L^1\)-embeddable for short, when the measure space is understood) if there exists a collection of functions \(f_t\in L^1(E,\nu )\), \(t\in T\), such that
The concept of \(L^1\)-embeddability is extensively discussed in Deza and Laurent (1997). An overview can also be found in Matoušek (2013). Our first theorem in this section shows that the spectral distance matrix \(d_X\) of a max-stable vector X as defined in (2.16) is \(L^1\)-embeddable.
Theorem 4.2

(i)
For a simple max-stable vector X with bivariate tail-dependence coefficients \(\lambda _{i,j} = \lambda _X(i,j)\), the spectral distance
$$\begin{aligned} d(i,j) := \lambda _{i,i} + \lambda _{j,j} - 2 \lambda _{i,j} \end{aligned}$$(4.1)(see Definition 2.13 and (2.17)) is an \(L^1\)-embeddable semimetric.

(ii)
Conversely, for every \(L^1\)-embeddable semimetric d on [p], there exists a simple max-stable vector X such that (4.1) holds with \(\lambda _{i,j}:= \lambda _X(i,j),\ 1\le i,j\le p\). Moreover, there exists a \(c \ge 0\) such that X may be chosen to have equal marginal distributions with \(\Vert X_i\Vert _1 = c, i \in [p]\).

(iii)
The semimetric d in parts (i) and (ii) is a metric if and only if \(\mathbb P[X_i \not = X_j]>0\) for all \(i\not =j\).
Proof
Part (i): Suppose that \(X=(X_i)_{1\le i\le p}\) is simple max-stable and let \(f_i\in L_+^1([0,1])\) be as in (2.6), where for simplicity and without loss of generality we choose \(\nu = \textrm{Leb}\). In view of Relation (2.13), we obtain
Now the identity \(|a-b| = a + b - 2(a\wedge b)\) implies
This shows that the semimetric in (4.1) is \(L^1\)-embeddable. Note that d is a metric if and only if, for all \(i\not =j\), \(f_i\) and \(f_j\) are not almost everywhere equal, or equivalently, \(\mathbb P[X_i = X_j]<1\).
Part (ii): Suppose now that \(d(i,j) = \Vert g_i - g_j\Vert _{L^1}\) for some \(g_i\in L^1(E,\nu ),\ i\in [p]\). For simplicity and without loss of generality, we can assume that \((E,\mathcal{E},\nu ) = ([0,1],\mathcal{B}[0,1],\textrm{Leb})\). Define the function \(g^*(x):= \max _{i\in [p]} g_i(x)\) and let
This way, we clearly have that the \(f_i\)’s are nonnegative elements of \(L^1([0,1])\) and
Letting \(X_i:= I(f_i)\) be the extremal integrals defined in (2.7), we obtain as in (4.2) that
This proves the first claim in part (ii). It remains to argue that (with this particular choice of \(f_i\)’s) the scales of the \(X_i\)’s are all equal. Note that \(\Vert X_i\Vert _1 = \Vert f_i\Vert _{L^1}\) and since
we obtain \(\Vert X_i\Vert _1 = \Vert f_i\Vert _{L^1} = \int _0^{1} g^*(u) du,\) for all \(i\in [p],\) which completes the proof of part (ii).
Part (iii): The claim follows from the observation that \(X_i:=I(f_i) = I(f_j)=:X_j\) almost surely if and only if \(f_i=f_j\) a.e., or equivalently, \(\Vert f_i-f_j\Vert _{L^1}=0\).\(\square\)
Remark 4.3
The construction in the proof of part (ii) of Theorem 4.2 still works with \(f_i\) replaced by \(\tilde{f}_i=f_i+\tilde{c}\) for any \(\tilde{c}>0\). Thus, the constant c can be chosen equal to or larger than \(\int _0^{1} g^*(u) du\), where \(g^*(x):= \max _{i\in [p]} g_i(x)\) and \(g_i\in L^1(E,\nu ),\ i\in [p]\), are such that \(d(i,j) = \Vert g_i - g_j\Vert _{L^1}\). In particular, for \(\int _0^{1} g^*(u) du\le 1\), one may choose X with standardized marginals, i.e. \(\Vert X_i\Vert _1=1, i \in [p]\).
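As a quick numerical illustration of Theorem 4.2 (i), the following sketch (with a purely hypothetical, but valid, bivariate tail-dependence matrix) computes the spectral distance via (4.1) and verifies the semimetric axioms:

```python
p = 3
# hypothetical bivariate tail-dependence matrix of a simple max-stable
# vector with standardized margins, so that lambda(i, i) = 1
lam = [[1.0, 0.4, 0.2],
       [0.4, 1.0, 0.5],
       [0.2, 0.5, 1.0]]

# spectral distance (4.1): d(i,j) = lambda_ii + lambda_jj - 2 * lambda_ij
d = [[lam[i][i] + lam[j][j] - 2 * lam[i][j] for j in range(p)] for i in range(p)]

# with standard 1-Frechet margins this reduces to d(i,j) = 2 * (1 - lambda_ij)
for i in range(p):
    for j in range(p):
        assert abs(d[i][j] - 2 * (1 - lam[i][j])) < 1e-12

# verify the semimetric axioms from Theorem 4.2 (i)
for i in range(p):
    assert d[i][i] == 0
    for j in range(p):
        assert d[i][j] == d[j][i]
        for k in range(p):
            assert d[i][k] <= d[i][j] + d[j][k] + 1e-12
print("d is a semimetric")
```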
4.2 \(\ell _1\)-embeddability of the spectral distance
In Theorem 4.2 we have shown the equivalence between \(L^1\)-embeddable metrics and spectral distances of simple max-stable vectors. In this section, we additionally state an explicit formula for the \(\ell _1\)-embedding of the spectral distance. Thereby we show that \(L^1\)- and \(\ell _1\)-embeddability are equivalent and, in passing, we recover and provide novel probabilistic interpretations of the so-called cut decomposition of \(\ell _1\)-embeddable metrics (Deza and Laurent 1997).
Definition 4.4
A semimetric d on T is said to be \(\ell _1\)-embeddable in \((\mathbb R^m, \Vert \cdot \Vert _{\ell _1})\) (or \(\ell _1\)-embeddable for short) for some integer \(m\ge 1\) if there exist \(x_t=(x_t(k))_{1\le k\le m}\in \mathbb R^m\), \(t\in T\), such that
Proposition 4.5
A semimetric d on the finite set [p] is embeddable in \(L^1(E,\mathcal{E},\nu )\) if and only if
for some nonnegative \(\beta (J)\)’s. This means that d is \(L^1\)-embeddable if and only if it is \(\ell _1\)-embeddable in \(\mathbb R^m\), where \(m = |\mathcal{J}|\) and \(\mathcal{J} = \{ \emptyset \ne J\subset [p]\,:\, \beta (J)>0\}\). Indeed, (4.3) is equivalent to \(d(i,j) = \Vert x_i - x_j\Vert _{\ell _1}\), with \(x_i = (x_i(J))_{J\in \mathcal{J}}:= (\beta (J)1_J(i))_{J\in \mathcal{J}} \in \mathbb R_+^m,\ i,j\in [p]\).
Proof
By Theorem 4.2, d is \(L^1\)-embeddable if and only if (4.1) holds, where \(\lambda _{i,j}=\lambda _X(\{i,j\})\) for some simple max-stable random vector X. If this X is degenerate, d is equal to 0 and (4.3) follows by setting all \(\beta (J)\)’s to 0. Otherwise, \(X \in \textrm{RV}(\{a_n\},\mu )\). Then, in view of (3.7) for the special case of \(J=\{i\}\), using that \(\mathbb P[J\subset \Theta ] = \mathbb E[ 1_{\{ J\subset \Theta \}}]\), we have
Taking \(X^*\) to be the (generalized) TM model with extremal coefficients matching those of X, by Relations (3.6) and (4.4) we obtain (4.3).\(\square\)
Remark 4.6
Equation (4.4) shows that the spectral distance d is proportional to the probability that the limiting exceedance set \(\Theta\) covers one and only one of the points i and j.
Remark 4.7
Proposition 4.5 recovers the well-known result that \(L^1\)- and \(\ell _1\)-embeddability are equivalent (see Theorem 4.2.6 in Deza and Laurent 1997).
Proposition 4.5 also provides a probabilistic interpretation of the so-called cut decomposition of \(\ell _1\)-embeddable metrics. To connect to the rich literature on the subject, we introduce some terminology following Chapter 4 of the monograph of Deza and Laurent (1997).
Let \(J\subset [p]\) be a nonempty set and define the so-called cut semimetric:
The positive cone \(\textrm{CUT}_p:=\{ \sum _{J\subset [p]} c_J \delta (J),\ c_J\ge 0\}\) is referred to as the cut cone of nonnegative functions defined on [p]. Notice that \(\textrm{CUT}_p\) consists of semimetrics. Therefore, Proposition 4.5 entails that the cut cone \(\textrm{CUT}_p\) comprises all \(\ell _1\)-embeddable metrics on p points (Proposition 4.2.2 in Deza and Laurent 1997). Relation (4.3), moreover, provides a decomposition of any such metric as a positive linear combination of cut semimetrics. The coefficients of this decomposition are precisely the coefficients of some Tawn-Molchanov model. Finally, in view of (4.4), the random exceedance set \(\Theta\) of this TM model is such that
Remark 4.8
For a given spectral distance d, Proposition 4.5 provides a decomposition and thereby shows the \(\ell _1\)-embeddability of d in \(\mathbb R^m\), where \(m = |\mathcal{J}|\) and \(\mathcal{J} = \{ \emptyset \ne J\subset [p]\,:\, \beta (J)>0\}\). Without further knowledge about the number of sets J with \(\beta (J)>0\), we can always choose \(m=2^p-2\), since we may set \(\beta ([p])=0\) as it does not affect d. However, by Carathéodory’s theorem, each \(\ell _1\)-embeddable metric on [p] is in fact known to be \(\ell _1\)-embeddable in \(\mathbb{R}^m\) with \(m=\binom{p}{2}\), see (Matoušek 2013, Proposition 1.4.2). We would like to mention that finding the corresponding “minimal” TM model (i.e. the one with minimal \(|\mathcal {J}|\)) and analyzing the properties of such representations could be an interesting topic for further research.
Observe that
where \(J^c = [p]\setminus J\), which implies that, in general, the decomposition of d in Proposition 4.5 is not unique. Furthermore, \(\beta ([p]) \ge 0\) does not affect d in (4.3), since \(1_{[p]}(i)-1_{[p]}(j)=0\). The next definition guarantees that, apart from these unavoidable ambiguities, the representation in (4.3) is essentially unique.
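The cut decomposition, the explicit \(\ell_1\)-embedding of Proposition 4.5, and the complementation ambiguity \(\delta(J)=\delta(J^c)\) can all be checked in a few lines. A sketch with purely hypothetical weights \(\beta(J)\):

```python
from itertools import combinations

p = 3
# hypothetical TM weights beta(J); only sets with beta(J) > 0 are listed
beta = {frozenset({0}): 0.5, frozenset({0, 1}): 0.3}
cuts = sorted(beta, key=sorted)

def delta(J, i, j):
    # cut semimetric (4.5): 1 if J separates i and j, 0 otherwise
    return 1 if (i in J) != (j in J) else 0

# cut decomposition (4.3): d(i,j) = sum_J beta(J) * delta(J)(i,j)
d = {(i, j): sum(beta[J] * delta(J, i, j) for J in cuts)
     for i, j in combinations(range(p), 2)}

# explicit l1-embedding from Proposition 4.5: x_i(J) = beta(J) * 1_J(i)
x = {i: [beta[J] * (i in J) for J in cuts] for i in range(p)}
for (i, j), dij in d.items():
    assert abs(sum(abs(a - b) for a, b in zip(x[i], x[j])) - dij) < 1e-12

# non-uniqueness: replacing the cut {0} by its complement {1, 2} leaves d unchanged
beta2 = {frozenset({1, 2}): 0.5, frozenset({0, 1}): 0.3}
for (i, j), dij in d.items():
    assert abs(sum(w * delta(J, i, j) for J, w in beta2.items()) - dij) < 1e-12
print("d is l1-embeddable and its cut decomposition is not unique")
```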
Definition 4.9
An \(\ell _1\)-embeddable metric d is said to be rigid if for any two representations
and
with nonnegative \(\beta (J), \tilde{\beta }(J), \emptyset \ne J \subset [p],\) the equality
holds for all \(\emptyset \ne J \subsetneq [p]\).
Observe that each semimetric d on p points can be identified with a vector \(d = (d(i,j),\ 1\le i < j\le p)\) in \(\mathbb R^N\), where \(N:= \binom{p}{2}\). Thus, sets of such semimetrics can be treated as subsets of the Euclidean space \(\mathbb R^N\). By Corollary 4.3.3 in Deza and Laurent (1997), the metric d is rigid if and only if it lies on a simplex face of the cut cone \(\textrm{CUT}_p\). That is, if and only if the set \(\{J_1,\cdots ,J_m\}=\{\emptyset \ne J \subset [p]: \beta (J)>0\}\) is such that the cut semimetrics \(\delta (J_i),\ i=1,\cdots ,m\) (defined in (4.5)) span an affinely independent face of \(\textrm{CUT}_p\). Recall that the points \(\delta _i\in \mathbb R^N,\ i=1,\cdots ,m\) are affinely independent if and only if \(\{\delta _i-\delta _1,\ i=2,\cdots ,m\}\) are linearly independent. In general, the description of the faces of the cut cone is challenging, but the next section deals with a special class of metrics which are always rigid.
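The affine-independence part of this criterion can be checked mechanically for small instances. A sketch (hypothetical example; rank computed by naive Gaussian elimination) verifying it for the cuts that support a line metric on four points:

```python
from itertools import combinations

def cut_vector(J, p):
    # cut semimetric delta(J) as a vector indexed by the pairs 1 <= i < j <= p
    return [1.0 if (i in J) != (j in J) else 0.0
            for i, j in combinations(range(p), 2)]

def rank(rows):
    # matrix rank by Gaussian elimination with partial pivot search
    rows = [r[:] for r in rows]
    if not rows:
        return 0
    rk = 0
    for col in range(len(rows[0])):
        piv = next((r for r in range(rk, len(rows))
                    if abs(rows[r][col]) > 1e-9), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for r in range(len(rows)):
            if r != rk and abs(rows[r][col]) > 1e-9:
                f = rows[r][col] / rows[rk][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def affinely_independent(cuts, p):
    # delta(J_1), ..., delta(J_m) affinely independent iff the differences
    # delta(J_i) - delta(J_1), i >= 2, are linearly independent
    vecs = [cut_vector(J, p) for J in cuts]
    diffs = [[a - b for a, b in zip(v, vecs[0])] for v in vecs[1:]]
    return rank(diffs) == len(diffs)

p = 4
# cuts supporting a line metric on 4 points: the initial segments (0-indexed)
line_cuts = [frozenset({0}), frozenset({0, 1}), frozenset({0, 1, 2})]
assert affinely_independent(line_cuts, p)
print("line-metric cuts are affinely independent")
```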
4.3 Rigidity of line metrics
In this section we show that so-called line metrics are rigid (cf. Definition 4.9) and that, for spectral distances corresponding to line metrics, the bivariate tail-dependence coefficients, in combination with the marginal distributions, fully determine the higher-order tail-dependence coefficients of the underlying random vector and thus the coefficients of the corresponding Tawn-Molchanov model.
Definition 4.10
A metric d on [p] is said to be a line metric if there exist a permutation \(\pi =(\pi _i)_{1\le i\le p}\) of [p] and some weights \(w_k\ge 0\), \(1\le k\le p-1\), such that
In other words, d is a line metric if the points of [p] can be placed in some order on a line such that the distance between any two points equals the distance along that line.
Theorem 4.11
Let d be a line metric, where without loss of generality the indices are ordered in such a way that for all \(1\le i < j\le p\) and some \(w_k\ge 0\)

(i)
The line metric d is \(\ell _1\)embeddable and rigid.
Assume in addition that X follows a (generalized) TM model as in (3.12) with given univariate \(\lambda (i) = \lambda _{i,i}\) and bivariate tail-dependence coefficients \(\lambda (i,j)=\lambda _{i,j}\) satisfying (4.1) with d as in (4.6). Then:

(ii)
For every nonempty set \(J\subset [p]\), we have
$$\begin{aligned} \lambda (J) = \lambda (i,j),\ \ \text { where } i = \min (J) \text { and }j=\max (J). \end{aligned}$$ 
(iii)
For the coefficients \(\beta (J)\) of the (generalized) TM model, we have that for all \(1\le k\le p-1\),
$$\begin{aligned} \beta ([1:k]) =\lambda (k)-\lambda (k,k+1),\ \ \beta ([k+1:p]) = \lambda (k+1)-\lambda (k,k+1), \end{aligned}$$(4.7)where \([i:j]:=\{i, i+1, \ldots , j-1,j\}, i<j \in [p],\)
$$\begin{aligned} \beta ([p])=\lambda (1,p),\end{aligned}$$(4.8)and \(\beta (J)=0\) for all other \(J\subset [p]\).
Proof
Part (i): To see that d is \(\ell _1\)-embeddable, set \(\beta ([1:k])=w_k, k \in [p-1],\) and \(\beta (J)=0\) for all other sets \(\emptyset \ne J \subset [p]\), which gives
Thus, d is \(\ell _1\)embeddable by Proposition 4.5.
Let now \(\beta (J), \emptyset \ne J \subset [p]\) be the coefficients of a representation (4.3) of d. We will show that
To this end, note that (4.6) implies, for any \(i\le j \in [p]\), that \(d(i,j)=\sum _{k=i}^{j-1}d(k,k+1)\) and thus
or, equivalently,
Since
and all \(\beta (J)\) are nonnegative, (4.10) implies that
for those J with \(\beta (J)>0\) and all \(i \le j \in [p]\). Note that this immediately excludes that \(1, p \in J^c\) as J was assumed to be nonempty. The three remaining cases are:

(i)
If \(1, p \in J\), then \(J=[p]\).

(ii)
If \(1 \in J, p \in J^c\), then there exists one \(k \in [p]\) such that \(J=[1:k]\).

(iii)
If \(1 \in J^c, p \in J\), then there exists one \(k \in [p]\) such that \(J=[k:p]\).
We have thus shown (4.9) and in order to show that d is rigid, we only need to consider sets of the form \(J=[1:k], J^c=[k+1:p], k \in [p1]\). For those sets we get
and thus the sum \(\beta (J)+\beta (J^c)=w_k\) is invariant for all representations (4.3) of d and d is rigid.
Part (ii): Let \(\emptyset \ne J \subset [p]\) and set \(i=\min (J), j=\max (J)\). Then, from part (i) and (3.2),
where we used the fact that \(\beta (J) = 0\) for all \(J\subset [2:p-1]\), established in the proof of part (i). This completes the proof of (ii).
Part (iii): We have from (4.11) that
and it follows for \(k \in [1:p-1]\) by (i) and (3.2) that
Together, this gives (4.7). Furthermore, (4.8) follows from
That \(\beta (J)=0\) if J is not of the form [1 : k] or \([k:p], k \in [p],\) has already been shown in (i).\(\square\)
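Under the additional assumption of standard 1-Fréchet margins (so that \(\lambda(i)=1\)), formulas (4.7)-(4.8) can be evaluated directly. The following sketch, with a purely hypothetical line metric on \(p=4\) points (0-indexed in the code), computes the TM weights and verifies that they reproduce all margins and bivariate coefficients:

```python
p = 4
# hypothetical bivariate coefficients with standard 1-Frechet margins
# (lambda(i) = 1), whose spectral distance d(i,j) = 2*(1 - lambda_ij)
# is a line metric; indices are 0-based in this sketch
lam_next = [0.8, 0.7, 0.9]            # lambda(k, k+1) for consecutive pairs

def d(i, j):
    # line metric (4.6): distances accumulate along consecutive links
    i, j = min(i, j), max(i, j)
    return sum(2 * (1 - lam_next[k]) for k in range(i, j))

def lam2(i, j):
    # bivariate tail-dependence coefficient recovered from d
    return 1 - d(i, j) / 2

# TM weights from Theorem 4.11 (iii) with lambda(k) = 1 (cf. Remark 4.12)
beta = {}
for k in range(p - 1):
    beta[frozenset(range(k + 1))] = 1 - lam_next[k]      # beta([1:k])
    beta[frozenset(range(k + 1, p))] = 1 - lam_next[k]   # beta([k+1:p])
beta[frozenset(range(p))] = lam2(0, p - 1)               # beta([p]) = lambda(1, p)

# the TM model reproduces the margins and all bivariate coefficients
for i in range(p):
    assert abs(sum(b for J, b in beta.items() if i in J) - 1) < 1e-9
for i in range(p):
    for j in range(i + 1, p):
        assert abs(sum(b for J, b in beta.items() if {i, j} <= J)
                   - lam2(i, j)) < 1e-9

# higher-order coefficients collapse: lambda(J) = lambda(min J, max J)
lam_012 = sum(b for J, b in beta.items() if {0, 1, 2} <= J)
assert abs(lam_012 - lam2(0, 2)) < 1e-9
print("line-metric TM weights reconstruct all tail-dependence coefficients")
```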
Remark 4.12
Consider a max-stable vector X with standard 1-Fréchet marginals, i.e., \(\Vert X_i\Vert _1 = \lambda _X(i) = 1,\ i\in [p]\). Theorem 4.11 shows that if the spectral distance \(d_X(i,j)= 2(1-\lambda _X(i,j)),\ i,j\in [p]\) is a line metric on [p], then
and for all other \(\emptyset \ne J \subset [p], \beta (J)=0\). In particular, all higher-order extremal coefficients of X are then completely determined by the bivariate tail-dependence coefficients and given via (3.2) by
Remark 4.13
The random set \(\Theta\) corresponding to such line-metric tail-dependence is a random segment with one of its endpoints anchored at 1 or p. This is a direct consequence of the characterization of the \(\beta (J)\) in Theorem 4.11 (iii) and (3.6).
Remark 4.14
In practical applications, nonparametric inference on higher-order tail-dependence coefficients can be very challenging or virtually impossible; often only, say, the bivariate tail-dependence coefficients \(\Lambda = (\lambda _X(i,j))_{p\times p}\) of the vector X can be estimated well. Given such constraints, one may be interested in providing upper and lower bounds on \(\lambda _X(\{1,\cdots ,p\})\), which provide the worst- and best-case scenarios for the probability of simultaneous extremes.
If the spectral distance turns out to be a line metric and the marginal distributions are known, then Theorem 4.11 provides a way to calculate \(\lambda _X(\{1,\cdots ,p\})\) exactly. In general, however, this problem falls in the framework of computational risk management (see e.g. Embrechts and Puccetti 2010) as well as the distributionally robust inference perspective (see, e.g. Yuen et al. 2020, and the references therein). The problem can be stated as a linear optimization problem in dimension \(2^p-1\), similar to the approach in Yuen et al. (2020). Unfortunately, the exponential growth of the complexity of the problem makes it computationally intractable for \(p\ge 15\). In fact, the exact solution to such types of optimization problems may be NP-hard. This underscores the importance of the line of research initiated by Shyamalkumar and Tao (2020), where new approximate solutions or model-regularized approaches to distributionally robust inference in high-dimensional extremes are of great interest.
5 Computational complexity of decision problems
In this section we will use known results about the algorithmic complexity of \(\ell _1\)-embeddings to derive that the so-called tail-dependence realization problem is NP-complete, thereby confirming a conjecture from Shyamalkumar and Tao (2020). While a formal introduction to the theory of algorithmic complexity is beyond the scope of this paper, we shall informally recall the basic notions needed in our context, following the treatment in (Deza and Laurent 1997, Section 2.3).
Consider a class of computational problems D, where each instance \(\mathcal{I}\) of D can be encoded with a finite number \(|\mathcal{I}|\) of bits. D is said to be a decision problem if for any input instance \(\mathcal{I}\) there is a correct answer, which is either “yes” or “no”. The goal is to determine this answer based on any input \(\mathcal{I}\) by using a computer (i.e., a deterministic Turing machine).
The decision problem D is said to belong to:

The class P (for polynomial complexity), if there is an algorithm (i.e., a deterministic Turing machine) that can produce the correct answer in polynomial time, i.e. its running time is of the order \(\mathcal {O}(|\mathcal{I}|^k)\) for some \(k \in \mathbb {N}\).

The class NP (nondeterministic polynomial time), if the problem admits a polynomially verifiable positive certificate. More precisely, this means that for each instance \(\mathcal{I}\) of D with positive (“yes”) answer, there exists a finite-bit certificate \(\mathcal{C}\) of size \(|\mathcal{C}|\) that can be verified by an algorithm / deterministic Turing machine with running time \(\mathcal {O}(|\mathcal{C}|^l)\) for some \(l \in \mathbb {N}\). (The certificate need not be constructed in polynomial time.)

The class NP-hard, if any problem in NP reduces to D in polynomial time. This means that for every problem \(D'\) in NP, the correct answer to this decision problem for any instance \(\mathcal {I}'\) of \(D'\) can be found by first applying an algorithm that runs in time polynomial in \(|\mathcal{I}'|\) to transform \(\mathcal {I}'\) into an instance \(\mathcal {I}\) of D and then solving the decision problem D for this instance \(\mathcal {I}\). Note that this definition does not require that D itself is in NP.

The class NP-complete, if D is both in NP and NP-hard.
A decision problem which has received some attention recently, see Fiebig et al. (2017), Embrechts et al. (2016), Krause et al. (2018), and Shyamalkumar and Tao (2020), is the realization problem of a TD matrix with standardized entries on the diagonal, namely finding an algorithm with the following input and output:
This problem may at first glance look similar to deciding whether a given matrix is a valid covariance matrix. Indeed, as a strengthening of Remark 3.7, it can be shown that there exists a bijection between TD matrices as in the above problem and a subset of the so-called Bernoulli-compatible random matrices, i.e. expected outer products \(\mathbb E[Y Y^\top ]\) of random (column) vectors Y with Bernoulli margins, see Embrechts et al. (2016) and Fiebig et al. (2017). But while it is a simple task to check whether a matrix is the covariance matrix of some random vector, for example by computing its eigenvalues, it can be much harder to check whether a matrix is the covariance matrix or expected outer product over a restricted space of random variables. Practical and numerical aspects of deciding whether a given matrix is a TD matrix have been studied in Krause et al. (2018) and Shyamalkumar and Tao (2020), including a discussion of the computational complexity of the problem. Indeed, they point out that, due to results by Pitowsky (1991), checking whether a matrix is Bernoulli-compatible is an NP-complete problem. However, some subtlety arises: in order to check whether a \(p \times p\) matrix L is a so-called tail coefficient matrix, i.e. a TD matrix with 1’s on the diagonal, one needs to check that \(p^{-1}L\) is Bernoulli-compatible, see Shyamalkumar and Tao (2020) and our Proposition 3.4. Thus, the problem narrows down to checking Bernoulli compatibility for the subclass of matrices with 1/p on the diagonal, and this may have a different complexity than the general membership problem. Due to the similarity of the above-mentioned problems, Shyamalkumar and Tao (2020) conjecture that the TDR problem is NP-complete as well.
We add to the discussion by using results about the computational complexity of problems related to cut metrics and metric embeddings; see Section 4.4 in Deza and Laurent (1997) for a brief overview of some relevant results. To this end, let us first introduce a problem which is related to the TDR problem but easier to handle in the subsequent complexity analysis.
With the help of our previous results and the known computational complexity of \(\ell _1\)embeddings it is simple to establish the computational complexity of the above problem.
Theorem 5.1
The SDR problem with unconstrained, identical margins is NP-complete.
Proof
Due to Theorem 4.2 (i)-(ii), the spectral distance \(d(i,j)=2(c-\lambda _X(i,j))\) of a simple max-stable random vector with \(\Vert X_i\Vert _1=c, i \in [p],\) is \(L^1\)-embeddable, and for each \(L^1\)-embeddable semimetric d there exists a simple max-stable vector X with \(\Vert X_i\Vert _1=c, i \in [p],\) for some \(c>0\) such that d is the spectral distance of X. Thus, the question is equivalent to checking that d is \(L^1\)-embeddable, and this is equivalent to checking that d is \(\ell _1\)-embeddable, see Remark 4.7. The latter problem is NP-complete by Avis and Deza (1991), see also (P5) in Deza and Laurent (1997).\(\square\)
Remark 5.2
In the SDR problem one could add further assumptions on d in the first place under “Input”, for example that the entries on the diagonal of d are equal to 0 or that d is a distance matrix. Alternatively, one could merely assume under “Input” that d is a \(p \times p\) matrix. Since a positive answer to the question always ensures that d is a distance matrix, and since all mentioned properties (nonnegativity, symmetry, triangle inequality) can be checked in a number of steps that is polynomial in p, these additional assumptions do not change the NP-completeness of the problem.
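A sketch of these polynomial-time pre-checks (our own helper, assuming d is given as a NumPy array):

```python
import numpy as np


def is_semimetric(d, tol=1e-12):
    # All properties mentioned in the remark are checkable in polynomially
    # many steps: nonnegativity, zero diagonal and symmetry in O(p^2),
    # the triangle inequality in O(p^3).
    if np.any(d < -tol) or np.any(np.abs(np.diag(d)) > tol):
        return False
    if np.any(np.abs(d - d.T) > tol):
        return False
    p = d.shape[0]
    return all(d[i, j] <= d[i, k] + d[k, j] + tol
               for i in range(p) for j in range(p) for k in range(p))
```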
Unfortunately, the constant c in (5.1) is not part of the input to the algorithm and thus cannot be fixed a priori. If we could, for example, set \(c=1\) and thus ask whether for a given d a simple max-stable vector X with standard 1-Fréchet margins exists such that \(d(i,j)=2(1-\lambda _X(i,j))\), then this would be equivalent to checking that \(\lambda _{i,j}:=1-d(i,j)/2\) is a TD matrix. But while such an arbitrary choice of c may change the nature of the problem, the following statement provides an a posteriori feasible range for c.
Lemma 5.3
If the outcome of the SDR problem with unconstrained, identical margins is a positive answer to the question, then (5.1) holds for a suitably chosen max-stable vector X and every \(c \ge (2^p-2)\max _{i,j \in [p]}d(i,j)\).
The proof is given in Appendix A. From the previous lemma we see that the SDR problem with unconstrained, identical margins is equivalent to
Finally, by changing from X to \(\tilde{X}:=X/((2^p-2)\max _{i,j \in [p]}d(i,j))\), the spectral distance \(d_{\tilde{X}}\) of \(\tilde{X}\) and the bivariate tail-dependence coefficients \(\lambda _{\tilde{X}}(i,j)\) scale accordingly by Lemma A.1, and we see that the latter problem is actually equivalent to
From the last line in the above problem we see that our SDR problem with constrained, standard margins can be solved if we have an algorithm to check that \(\lambda\) of the given form is a TD matrix. But since we know from the stated equivalence of all three SDR problems, in combination with Theorem 5.1, that all of them are NP-complete, the underlying membership problem must be NP-complete as well. This leads to the following result.
Theorem 5.4
The TDR problem is NP-complete.
Proof
We need to show that the TDR problem is both in NP and NP-hard. That the TDR problem is in NP has been shown in (Shyamalkumar and Tao 2020, p. 255) with the help of Caratheodory's theorem. It remains to show NP-hardness, which we establish in the typical way by reducing a known NP-complete problem to TDR. Indeed, any input matrix d(i, j) to any of the three equivalent, and by Theorem 5.1 NP-complete, SDR problems can be transformed in polynomial time into the matrix \(\lambda (i,j):=1-d(i,j)/(2(2^p-2)\max _{i,j \in [p]}d(i,j))\). By the statement of the third SDR problem, the question with input d can be answered by using \(\lambda\) as an input to the TDR problem. Thus, an NP-complete problem reduces in polynomial time to the TDR problem, so the TDR problem is NP-hard and hence NP-complete.\(\square\)
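The reduction in the proof is a single matrix transformation, computable in polynomial time. A Python sketch of it (the function name is ours; it assumes d has at least one positive entry):

```python
import numpy as np


def sdr_to_tdr_instance(d):
    # Transform an SDR input d into a TDR input lambda in polynomial time:
    # rescale so that the realizing max-stable vector (if one exists) can be
    # taken with standard 1-Frechet margins, then invert the relation
    # d(i, j) = 2 * (1 - lambda(i, j)).
    p = d.shape[0]
    scale = (2 ** p - 2) * d.max()   # feasible margin scale from Lemma 5.3
    return 1.0 - d / (2.0 * scale)
```

Feeding the output to any TDR oracle then answers the original SDR question.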
Data availability
Not applicable.
References
Avis, D., Deza, M.: The cut cone, \({L}^1\) embeddability, complexity, and multicommodity flows. Networks 21(6), 595–617 (1991). https://doi.org/10.1002/net.3230210602
Basrak, B., Planinić, H.: A note on vague convergence of measures. Statist. Probab. Lett. 153, 180–186 (2019). https://doi.org/10.1016/j.spl.2019.06.004
Beirlant, J., Goegebeur, Y., Teugels, J., Segers, J.: Statistics of extremes. Wiley Series in Probability and Statistics, John Wiley & Sons, Chichester (2004). https://doi.org/10.1002/0470012382
Castillo, E.: Extreme value theory in engineering. Statistical Modeling and Decision Science, Academic Press Inc, Boston, MA (1988). https://doi.org/10.1016/c20090221696
Coles, S.: An introduction to statistical modeling of extreme values. Springer Series in Statistics, Springer-Verlag, London (2001). https://doi.org/10.1007/9781447136750
Coles, S., Heffernan, J., Tawn, J.: Dependence measures for extreme value analyses. Extremes 2, 339–365 (1999). https://doi.org/10.1023/A:1009963131610
Davis, R.A., Resnick, S.I.: Basic properties and prediction of max-ARMA processes. Adv. in Appl. Probab. 21(4), 781–803 (1989). https://doi.org/10.2307/1427767
Davis, R.A., Resnick, S.I.: Prediction of stationary max-stable processes. Ann. Appl. Probab. 3(2), 497–525 (1993). https://doi.org/10.1214/aoap/1177005435
de Haan, L.: A spectral representation for max-stable processes. Ann. Probab. 12(4), 1194–1204 (1984). https://doi.org/10.1214/aop/1176993148
de Haan, L., Ferreira, A.: Extreme value theory: an introduction. Springer Science & Business Media (2007). https://doi.org/10.1007/0387344713
de Oliveira, J.T.: Structure theory of bivariate extremes extensions. Estudos de Math. Estat. Econom. 7, 165–195 (1962)
Deza, M.M., Laurent, M.: Geometry of cuts and metrics, Vol. 15 of Algorithms and Combinatorics, Springer-Verlag, Berlin (1997). https://doi.org/10.1007/9783642042959
Dyszewski, P., Mikosch, T.: Homogeneous mappings of regularly varying vectors. Ann. Appl. Probab. 30, 2999–3026 (2020). https://doi.org/10.1214/20AAP1579
Embrechts, P., Hofert, M., Wang, R.: Bernoulli and tail-dependence compatibility. Ann. Appl. Probab. 26(3), 1636–1658 (2016). https://doi.org/10.1214/15AAP1128
Embrechts, P., Puccetti, G.: Bounds for the sum of dependent risks having overlapping marginals. J. Multivariate Anal. 101(1), 177–190 (2010). https://doi.org/10.1016/j.jmva.2009.07.004
Fiebig, U.R., Strokorb, K., Schlather, M.: The realization problem for tail correlation functions. Extremes 20(1), 121–168 (2017). https://doi.org/10.1007/s1068701602508
Finkenstädt, B., Rootzén, H.: Extreme Values in Finance, Telecommunications, and the Environment. Monographs on Statistics and Applied Probability, Chapman & Hall/CRC Press, New York (2003). https://doi.org/10.1201/9780203483350
Frahm, G., Junker, M., Schmidt, R.: Estimating the tail-dependence coefficient: properties and pitfalls. Insurance Math. Econom. 37(1), 80–100 (2005). https://doi.org/10.1016/j.insmatheco.2005.05.008
Hult, H., Lindskog, F.: Regular variation for measures on metric spaces. Publ. Inst. Math. (Beograd) (N.S.) 80(94), 121–140 (2006). https://doi.org/10.2298/PIM0694121H
Krause, D., Scherer, M., Schwinn, J., Werner, R.: Membership testing for Bernoulli and tail-dependence matrices. J. Multivar. Anal. 168, 240–260 (2018). https://doi.org/10.1016/j.jmva.2018.07.014
Kulik, R., Soulier, P.: Heavy-tailed time series. Springer Series in Operations Research and Financial Engineering, Springer, New York (2020). https://doi.org/10.1007/9781071607374
Matoušek, J.: Lecture notes on metric embeddings, Technical report, Institute of Theoretical Computer Science, ETH Zürich (2013). https://kam.mff.cuni.cz/~matousek/baa4.pdf
Molchanov, I.: Theory of random sets, Vol. 87 of Probability Theory and Stochastic Modelling, Springer-Verlag, London. Second edition (2017). https://doi.org/10.1007/9781447173496
Molchanov, I., Strokorb, K.: Max-stable random sup-measures with comonotonic tail dependence. Stochastic Process. Appl. 126(9), 2835–2859 (2016). https://doi.org/10.1016/j.spa.2016.03.004
Pitowsky, I.: Correlation polytopes: their geometry and complexity. Math. Program. 50(1), 395–414 (1991). https://doi.org/10.1007/BF01594946
Rachev, S.T.: Handbook of heavy-tailed distributions in finance. Handbooks in Finance, Book 1, Elsevier (2003). https://doi.org/10.1016/B9780444508966.X50006
Resnick, S.I.: Extreme Values, Regular Variation and Point Processes. Springer-Verlag, New York (1987). https://doi.org/10.1007/9780387759531
Resnick, S.I.: Heavy-tail phenomena: probabilistic and statistical modeling. Springer Series in Operations Research and Financial Engineering, Springer, New York (2007). https://doi.org/10.1007/9780387450247
Schlather, M., Tawn, J.: Inequalities for the extremal coefficients of multivariate extreme value distributions. Extremes 5(1), 87–102 (2002). https://doi.org/10.1023/A:1020938210765
Schlather, M., Tawn, J.A.: A dependence measure for multivariate and spatial extreme values: properties and inference. Biometrika 90(1), 139–156 (2003). https://doi.org/10.1093/biomet/90.1.139
Shyamalkumar, N.D., Tao, S.: On tail dependence matrices. Extremes 23, 245–285 (2020). https://doi.org/10.1007/s1068701900366y
Smith, R.L.: Max-stable processes and spatial extremes. Unpublished Manuscript 205, 1–32 (1990)
Stoev, S., Taqqu, M.S.: Extremal stochastic integrals: a parallel between max-stable processes and \(\alpha\)-stable processes. Extremes 8, 237–266 (2005). https://doi.org/10.1007/s1068700600040
Strokorb, K., Ballani, F., Schlather, M.: Tail correlation functions of max-stable processes: construction principles, recovery and diversity of some mixing max-stable processes with identical TCF. Extremes 18(2), 241–271 (2015). https://doi.org/10.1007/s106870140212y
Strokorb, K., Schlather, M.: An exceptional max-stable process fully parameterized by its extremal coefficients. Bernoulli 21(1), 276–302 (2015). https://doi.org/10.3150/13BEJ567
Yuen, R., Stoev, S., Cooley, D.: Distributionally robust inference for extreme Value-at-Risk. Insurance Math. Econom. 92, 70–89 (2020). https://doi.org/10.1016/j.insmatheco.2020.03.003
Acknowledgements
We would like to thank the referees and the AE for a careful reading of the manuscript and helpful comments.
Funding
Open Access funding enabled and organized by Projekt DEAL. Stilian Stoev was partially supported by the NSF Grant DMS1916226.
Author information
Contributions
All three authors have made equal contributions to this manuscript.
Ethics declarations
Ethical approval
Not applicable.
Conflict of interest
The authors declare that there are no competing interests related to this submission.
A. Proofs and auxiliary results
A.1 Proofs for Section 2
Proof of Proposition 2.5
Here, for brevity, we shall write \(\{h\in A\}\) for the preimage set \(h^{-1}(A)=\{x\in \mathbb R^p\,:\, h(x)\in A\}\). By the continuity of h, it follows that for all \(a\ge 0\), the set \(\{h\ge a\}\) is closed and \(\{h>a\}\) is open. Hence \(\{h=a\} = \{h\ge a\}\setminus \{h>a\} \supset \partial \{h>a\}\), where \(\partial A = A^\textrm{cl}\setminus A^\textrm{int}\) denotes the boundary of the set A. Since \(h(0) = 0\) (by continuity and homogeneity), we have that for all \(a>0\), the closed set \(\{h\ge a\}\) does not contain 0 and hence it is bounded away from 0. Thus, \(\mu (\{ h\ge a\}) <\infty\). Since \(\{h=t\} = t \cdot \{h=1\},\ t>0\), the scaling property (2.2) of \(\mu\) implies that \(\mu (\{h=t\}) = t^{-\alpha } \mu (\{h=1\})\), so if \(\mu (\{h=t\})>0\) for some (any) \(t>0\), then \(\mu (\{h=t\})>0\) for all \(t>0\). On the other hand, \(\{h\ge a\} = \cup _{t\ge a} \{h=t\}\) is a union of uncountably many disjoint sets, and a finite measure can charge at most countably many of them. Thus, \(\mu (\{h=t\})\) must vanish for all \(t>0\). This means that \(\mu (\partial \{h > a\}) = 0\), i.e. the sets \(\{h>a\}\), \(a>0\), are \(\mu\)-continuity sets. This allows us to apply the definition of regular variation (2.1) and obtain
Now, (2.3) entails
where in the last two displays we used the homogeneity of h and the change of variables \(x:=r^{-\alpha }\).
This completes the proof of the first relation in (2.8). The second relation therein follows from the observation that \(\sigma (\cdot )/\sigma (S)\) is a probability distribution.\(\square\)
Proof of Proposition 2.6
Since X has nonnegative components, to establish its regular variation it is enough to consider measures supported only on \(\mathbb R_+^p\) and show that
where \(\mu [0,x]^c = -\log (\mathbb P[X\le x])\) with \([0,x]^c:= \mathbb R_+^p \setminus [0,x]\).
Fix an \(x \in [0,\infty )^p\setminus \{0\}\). The definition of simple max-stability (2.5) entails \(\mathbb P[X \le n x]^n = \mathbb P[X\le x]\). Thus, for all \(n \in \mathbb {N}\),
Observe that \(\mathbb P[X\le x]\) is positive. Indeed, since X is nondegenerate, \(\xi :=\max _{1\le i\le p} x_i^{-1} X_i\) is 1-Fréchet and thus \(\mathbb P[\xi \le 1 ] = \mathbb P[X\le x] >0\), for all \(x\in \mathbb R_+^p\). This means that for the boundedly finite measures \(\mu _n(\cdot )= n \mathbb P[ X\in n\cdot ]\), we have
The latter relation shows that the sequence of measures \(\{\mu _n\}\) is relatively compact in \(M_0(\mathbb R^p)\), equipped with the \(M_0\)convergence topology. Indeed, by (Hult and Lindskog 2006, Theorem 2.7), it suffices to show that for all \(\varepsilon >0\) and \(\eta >0\), there exists an \(M=M(\varepsilon ,\eta )>0\), such that
The first condition follows from (6.2) and since \(\mathbb P[X \le x]>0\). The second condition follows from the fact that \(\log (\mathbb P[X\le x]) \downarrow 0\), as \(x\uparrow \infty\), which is true since X has a valid probability distribution.
The relative compactness of the measures \(\{\mu _n\}\) entails that \(\mu _{n'}{\mathop {\Longrightarrow }\limits ^{\mathrm{M_0}}} \mu\) for some \(\mu \in M_0\) and a subsequence \(n'\rightarrow \infty\). However, by (6.2) and Proposition 2.4 we have
for all \(x\in [0,\infty )^p\setminus \{0\},\) and the limit measure is uniquely determined by its values on all the complements of rectangles containing the origin. Furthermore, we see from (6.3) that for nondegenerate X, the limit measure \(\mu\) is nondegenerate as well. This proves that \(X\in \textrm{RV}_1(\{n\},\mu )\) where \(\mathbb P[X\le x] = \exp \{-\mu [0,x]^c\}\).
Having established regular variation, the first equality in Relation (2.9) follows from the \(\mu\)continuity of the set \(\{h>1\}\) as argued in the proof of Proposition 2.5. The rest of Relation (2.9) follows from (6.3).
Finally, the representation in (2.10) follows from the fact that \(\sigma\) is determined by \(\int _S g(u) \sigma (du)\), for all continuous functions \(g: S\rightarrow \mathbb R_+\). Indeed, for every such g, the function \(h(x):= g(x/\Vert x\Vert ) \Vert x\Vert 1_{\{x\not = 0\}}\) is continuous, nonnegative and 1-homogeneous, and hence by (2.9)
This, since g is arbitrary, proves (2.10).\(\square\)
Proof of Corollary 2.11
Relation (2.12) follows by applying (2.9) with h replaced by the continuous and homogeneous functions \(h_{\max ,L} (x):= (\max _{i\in L}x_i)_+\) and \(h_{\min ,L} (x):= (\min _{i\in L}x_i)_+\), respectively. Indeed, observe that
On the other hand, \(\{x\,:\, h_{\min ,L}(x)>1\} = \bigcap _{i\in L} A_i\).
The formula for \(\lambda (L)\) in Relation (2.13) follows from the above and Equation (2.9) since \(h_{\min ,L}(\textbf{f})=\min _{i\in L} f_i(x).\) The derivations of the formulae for \(\theta (K)\) are similar.
We conclude this section with an auxiliary result stating that the spectral distance and the tail-dependence coefficients behave linearly under max-linear combinations, in the sense of the following lemma.
Lemma A.1
Let \(X^{(t)}=(X^{(t)}_i)_{1\le i\le p}\), \(1\le t\le n\), be independent simple max-stable vectors with tail measures \(\mu ^{(t)}\), \(1\le t\le n\), and let \(\gamma _t\ge 0\), \(1\le t\le n\), be some nonnegative weights. Define \(\bar{X}=\bigvee _{t=1}^{n}\gamma _t X^{(t)}\). Then,
and
Proof of Lemma A.1
By the independence of \(X^{(t)}\), \(1\le t\le n\), and Proposition 2.6, we have
for all \(x \in \mathbb {R}_+^p \setminus \{0\}\), where in the last step the homogeneity of \(\mu ^{(t)}\) was used. Thus, \(\bar{X}\) has the tail measure \(\mu _{\bar{X}}=\sum _{t=1}^{n}\gamma _t \mu ^{(t)}\), i.e. the tail measure of the max-linear combination \(\bar{X}\) is the corresponding linear combination of the tail measures of its components. In particular, \(\bar{X}\) has 1-Fréchet marginals with scale coefficient \(\Vert \bigvee _{t=1}^{n}\gamma _t X^{(t)}_i\Vert _1=\sum _{t=1}^{n}\gamma _t\Vert X^{(t)}_i\Vert _1\).
Hence, by the definition of the spectral distance \(d_{\bar{X}}\) in Definition 2.13 we obtain
By the linear representation of \(\mu _{\bar{X}}\) and (2.12), it follows for the tail-dependence coefficients
A.2 Proofs for Section 3
Lemma A.2
For \(\lambda\) and \(\theta\) in (3.2) and \(\beta\) in (3.1), we have the inversion formulae (3.3), (3.4) as well as (3.5). Namely, the following formulae hold:
Proof
For simplicity, introduce the indicator functions \(I_i:= 1_{A_i}\), where \(A_i = \{x\in \mathbb R^p\,:\, x_i>1\}\). In view of (3.1) and (2.12), we have
as well as
This immediately entails
which proves (3.4).
The inclusion–exclusion formula for \(\theta\) in terms of the \(\lambda\)’s is immediate from (6.4) and the observation that
Now, using (6.4) we obtain
Observe that by Newton’s binomial formula:
and hence
Using the latter expression for the constant 1 in the righthand side of (6.5), we obtain
completing the proof of the inclusion–exclusion formula for the \(\lambda\)’s via the \(\theta\)’s in (3.5).
To complete the proof we need to establish the expression of \(\beta (J)\)’s via the \(\theta (K)\)’s in (3.3). We do so by passing through the \(\lambda (L)\)’s first. Namely, by the established (3.4) and (3.5), we have
Observe that \(C(K,J):= \sum _{L\,:\, (J\cup K)\subset L} (-1)^{|L|-|J\cup K|} =0\) if \((J\cup K)^c \not = \emptyset .\) Indeed, the latter sum is simply \((1+(-1))^{|[p] \setminus (J\cup K)|} = 0\). On the other hand, if \(J\cup K = [p]\), we trivially have \(C(K,J)=1\). This, since \(J\cup K = [p]\) is equivalent to \(J^c\subset K\), immediately implies
This proves (3.3).\(\square\)
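The inversion between the \(\beta (J)\)'s and the \(\lambda (L)\)'s is Möbius inversion over the subset lattice and can be checked numerically. In the Python sketch below (helper names are ours) we use \(\lambda (L)=\sum _{J\,:\,L\subset J}\beta (J)\), consistent with the relations in the proof, and the inverse \(\beta (J)=\sum _{L\,:\,J\subset L}(-1)^{|L|-|J|}\lambda (L)\); the round trip recovers \(\beta\) exactly.

```python
import itertools


def subsets(p):
    # all nonempty subsets of [p] = {0, ..., p-1} as frozensets
    for r in range(1, p + 1):
        for s in itertools.combinations(range(p), r):
            yield frozenset(s)


def lambdas_from_betas(beta, p):
    # lambda(L) = sum of beta(J) over all J containing L
    return {L: sum(b for J, b in beta.items() if L <= J) for L in subsets(p)}


def betas_from_lambdas(lam, p):
    # Moebius inversion: beta(J) = sum_{L >= J} (-1)^(|L| - |J|) * lambda(L)
    return {J: sum((-1) ** (len(L) - len(J)) * lam[L]
                   for L in subsets(p) if J <= L)
            for J in subsets(p)}
```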
Proof of Theorem 3.8
For this proof it is convenient to let \(\Vert u\Vert := \max _{i\in [p]} u_i\) be the sup-norm.
(‘if’) Suppose that all \(\beta (J)\)’s in (3.3) (or in (3.4)) are nonnegative and define \(X^*\) as in (3.12). Clearly, \(X^*\) is max-stable and we shall determine its spectral measure \(\sigma ^*\) with respect to the sup-norm. Observe that for all \(x\in (0,\infty )^p\) we have \(\beta (J) Z_J 1_J\le x\) if and only if \(Z_J \le \min _{i\in J} x_i /\beta (J)\), and since the \(Z_J\)’s are iid standard 1-Fréchet:
Letting \(\sigma ^*(du):= \sum _{\emptyset \ne J\subset [p]} \beta (J) \delta _{1_J}(du),\) we obtain that
This shows that
which in view of (2.11) shows that \(\sigma ^*\) is the spectral measure of \(X^*\), where \(\mu ^*\) denotes the tail measure of \(X^*\).
Let now \(\emptyset \not =J\subset [p]\) and define the set
We will argue that, with \(B_J, A_j\) as in (3.1),
Indeed, by (2.4), we have
Note that if \(x\in \widetilde{C}_J\), then \(x_i=\Vert x\Vert >1\) for all \(i\in J\), and \(x_j = 0 <1\), for all \(j\in J^c\), so that \(x\in B_J\). This means that \(\widetilde{C}_J\subset B_J\), and hence \(\sigma ^*(C_J) \le \mu ^*(B_J)\).
By the construction of \(\sigma ^*\), on the other hand, we have \(\sigma ^*(C_J) = \sigma ^*(\{1_J\}) = \beta (J)\) and since the \(B_J\)’s partition the set \(\{\Vert x\Vert >1\}\cap \mathbb R_+^p\), we get
where the last relation follows from (6.6) by setting \(x_i=1, i \in [p]\). Since \(\sigma ^*(S) = \mu ^*(\{\Vert x\Vert >1\})\) we obtain from the above inequality that \(\mu ^*(B_J) = \sigma ^*(C_J)=\beta (J)\), for all \(J\subset [p]\).
We have thus shown that the functionals \(\beta (J)\) that we started with are indeed the ones which determine the extremal (tail dependence) coefficients of \(X^*\) via (3.2). This completes the proof of the ‘if’ part.
(‘only if’) Conversely, let \(\{\theta (K),\ K\subset [p]\}\) (or \(\{\lambda (L),\ L\subset [p]\}\)) be the extremal coefficients (tail-dependence coefficients, respectively) of a max-stable vector X with tail measure \(\mu\). Then, as already argued above, (2.12) holds, and hence the \(\beta (J)\)’s defined as in (3.1) are nonnegative and satisfy Relations (3.3) (or (3.4)). This completes the proof.\(\square\)
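As a numerical illustration of the construction (a simulation sketch of ours, not part of the proof), one can sample \(X^*=\bigvee _{J}\beta (J)Z_J 1_J\) with iid standard 1-Fréchet \(Z_J\) by inverse transform and compare the empirical CDF with the closed form \(\mathbb P[X^*\le x]=\exp \{-\sum _{J}\beta (J)\max _{i\in J}x_i^{-1}\}\) obtained above.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_tm(beta, p, n):
    # beta: dict {frozenset J: beta(J) >= 0}; draws n samples of the TM model
    # X* = max over J of beta(J) * Z_J * 1_J, with the Z_J iid standard
    # 1-Frechet, sampled by inverse transform: Z = -1 / log(U).
    X = np.zeros((n, p))
    for J, b in beta.items():
        if b == 0:
            continue
        Z = -1.0 / np.log(rng.random(n))
        for i in J:
            X[:, i] = np.maximum(X[:, i], b * Z)
    return X


def tm_cdf(beta, x):
    # closed form: P[X* <= x] = exp(-sum_J beta(J) * max_{i in J} 1 / x_i)
    return float(np.exp(-sum(b * max(1.0 / x[i] for i in J)
                             for J, b in beta.items())))
```

The empirical and exact CDFs agree to Monte Carlo accuracy for any nonnegative choice of the \(\beta (J)\)'s.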
A.3 Proofs for Section 5
Proof of Lemma 5.3
Assume that for an input matrix d for the SDR problem with unconstrained, identical margins the answer is positive, which is equivalent to d being \(L^1\)-embeddable, see the proof of Theorem 5.1. According to Proposition 4.5 and Lemma A.1, we can then choose the realizing max-stable vector X as a (generalized) TM model with coefficients \(\beta (J), \emptyset \ne J \subset [p]\), such that
and
We note that the particular choice of \(\beta ([p]) \ge 0\) only affects the value of c and is not determined by (6.7), as \(|1_{[p]}(i)-1_{[p]}(j)|=0\). Thus, for each realizing max-stable vector X with \(\Vert X_i\Vert =c\) we can, for each \(\tilde{c}>c\), find a realizing max-stable vector with \(\Vert X_i\Vert =\tilde{c}\) by increasing the value of \(\beta ([p])\) in the generalized TM model.
Since all \(\beta (J)\)'s are nonnegative, (6.7) furthermore implies that for all \(\emptyset \ne J \subsetneq [p]\) the inequality
holds. Thus we know that for all \(i \in [p]\)
As any choice of \(\beta ([p])\ge 0\) leads to a realizing (generalized) TM model for the given d, we see that any value \(c \ge (2^p-2)\max _{i,j \in [p]}d(i,j)\) is attainable as the marginal scale of this model.\(\square\)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Janßen, A., Neblung, S. & Stoev, S. Tail-dependence, exceedance sets, and metric embeddings. Extremes (2023). https://doi.org/10.1007/s1068702300471z
Keywords
 Bernoulli compatibility
 Exceedance sets
 Line metrics
 Max-stable vectors
 Metric embedding
 Multivariate regular variation
 NP-completeness
 Tail-dependence (TD) matrix
 Tail-dependence coefficients
 Tawn-Molchanov models
 Realization problem for TD matrices
AMS 2000 Subject Classification
 Primary–60G70
 Secondary–51K05
 60E05
 68R12
 68Q25