Approximation and estimation of very small probabilities of multivariate extreme events

This article discusses modelling of the tail of a multivariate distribution function by means of a large deviation principle (LDP), and its application to the estimation of the probability of a multivariate extreme event from a sample of n iid random vectors, for probabilities bounded by powers of the sample size with exponents below −1. One way to view classical tail limits is as limits of probability ratios. In contrast, the tail LDP provides asymptotic bounds or limits for log-probability ratios. After standardising the marginals to standard exponential, dependence is represented by a homogeneous rate function. Furthermore, the tail LDP can be extended to represent both dependence and marginals, the latter implying marginal log-GW tail limits. A connection is established between the tail LDP and residual tail dependence (or hidden regular variation) and a recent extension of it: under a smoothness assumption, both are implied by the tail LDP. Based on the tail LDP, a simple estimator for very small probabilities of extreme events is formulated. It avoids estimation of the rate function by exploiting its homogeneity. Strong consistency, in the sense of convergence of log-probability ratios, is proven. Simulations and an application illustrate the difference between the classical approach and the LDP-based approach.

natural hazard assessment, and in operational risk assessment for financial institutions. Multivariate events with such low probabilities are also relevant to these fields of application. Examples are breaching of a flood protection consisting of multiple sections differing in exposure, design and maintenance along a shoreline or river bank (Vrouwenvelder & Struik (1990)), damage to an offshore structure caused by the combined effects of multiple environmental loads like water level, wave height, etc. (ISO (2005)), and operational losses suffered by banks in different business lines and due to various types of events (Embrechts & Puccetti (2007)).
Most research on estimation of probabilities of multivariate extreme events has been based on the regularity assumption that the multivariate distribution function F is in the domain of attraction of some extreme value distribution function (de Haan & Ferreira (2006); Resnick (1987)), employing the exponent measure or its properties to formulate estimators; see Coles & Tawn (1994), Bruun & Tawn (1998), de Haan & Sinha (1999), Drees & de Haan (2013), and Ch. 8 of de Haan & Ferreira (2006). If some components of the random vector X under consideration are asymptotically independent, these estimators may produce invalid results. To alleviate this problem, residual tail dependence was introduced as an additional regularity assumption on the tail of the multivariate survival function F^c defined by F^c(x) = P(X_i > x_i ∀i ∈ {1, .., m}); see e.g. Ledford & Tawn (1996), Ledford & Tawn (1997), Ledford & Tawn (1998) and Draisma et al. (2004). This approach was recently extended in Wadsworth & Tawn (2013).
If F satisfies a GW tail limit and q_F(∞) > 0, then the GW limit implies the log-GW limit defined by replacing q_F in (1.3) by log q_F, i.e., log q_F ∈ ERV (see de Valk (2014), Theorem 2(b)). The log-GW limit is a natural condition for estimation of quantiles with probabilities in the range (1.1): if U(∞) > 0, F has a GP tail limit, and the relative error of a GP approximation of U(t^λ) from U(t) vanishes as t → ∞ for all λ ≥ 1, then either lim_{t→∞} U(t^λ)/U(t) = 1 for all λ > 0 (so the GP approximation converges trivially), or log q_F ∈ ERV_{{0,1}}; see Proposition 1 in de Valk (2014). Since ERV_{{0,1}} is a rather exceptional subset of ERV, it makes sense to adopt the log-GW limit log q_F ∈ ERV as a regularity assumption instead of the GP tail limit.
One might expect that this regularity assumption would be equally useful for estimating probabilities of exceedance in the range (1.1), and by extension, for estimating probabilities of multivariate extreme events in the range (1.1). This paper looks at the viability of such an approach from a theoretical angle.
To keep the notation simple, only GW tail limits will be considered. This does not restrict applicability in any way, since a log-GW limit can always be converted to a GW limit by replacing the random variable X ∼ F under consideration by log X. In Section 7, we will return to this issue for a brief discussion of practical aspects.
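To illustrate this conversion numerically: for a Pareto random variable X with 1 − F(x) = x^{−α}, the quantile function q_F(y) = e^{y/α} grows exponentially in y, while log X is exponential with rate α, so q_{log X}(y) = y/α is linear in y — a GW-type tail. The following Python sketch (an illustration added here, not part of the article) checks this on simulated data.

```python
import math
import random

random.seed(2)
alpha = 2.0

# Pareto(alpha): 1 - F(x) = x**(-alpha) for x >= 1, so q_F(y) = exp(y / alpha).
# After the log transformation, log X is exponential with rate alpha, and its
# quantile function q(y) = y / alpha is linear in y = -log(1 - p).
x = [random.paretovariate(alpha) for _ in range(200_000)]
logx = sorted(math.log(v) for v in x)

n = len(logx)
for y in (2.0, 4.0, 6.0):
    q_emp = logx[math.floor(n * (1.0 - math.exp(-y)))]  # empirical quantile
    assert abs(q_emp - y / alpha) < 0.15
```

The linearity of the empirical quantiles of log X in y is exactly the property that makes the GW machinery applicable after the transformation.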
The remainder of this paper is organised as follows. As preparation, (1.5) is generalised to arbitrary Borel sets in R; this results in a simple large deviation principle, which can take different forms. These are generalised to the multivariate setting in Section 3, in order to provide bounds and limits for probabilities of multivariate extreme events. Section 4 makes the connection to classical multivariate extreme value theory and in particular, to residual tail dependence and related assumptions. Section 5 discusses how the representation of tail dependence can be changed by changing the standardisation of the marginal distributions, and in particular, how we can approximate probabilities of events which are extreme "in any direction" in this manner. Section 6 applies the theory to formulate and analyse estimators for probabilities of multivariate extreme events in the range (1.1), and Section 7 closes with a discussion of the results and of potential applications.

The univariate case
Consider a scalar random variable X ∼ F. To avoid distraction by technicalities, let F be continuous; this assumption does not seem restrictive for typical applications of extreme value analysis. The infimum of an (extended) real function f over a set S will be written as inf f(S) instead of the conventional but cumbersome inf_{x∈S} f(x).
Furthermore, inf f(∅) := ∞. The following holds without further assumptions on F: hence the lower bound in (2.1). The upper bound is proven similarly. ⊓⊔

Equation (2.1) is a somewhat trivial example of a large deviation principle (LDP) (e.g. Dembo & Zeitouni (1998)). In a similar manner, the GW limit (1.5) is readily generalised as well:

Proposition 2 Suppose that F satisfies the GW limit (1.5). Then (2.2) holds for every θ > 0 and every Borel set A ⊂ [θ, ∞).

Proof The proof is similar to that of Proposition 1 above. In particular, for the lower bound, for nonempty A^o, α := inf(A^o) ≥ θ, δ > 0 such that (α, α + δ] ⊂ A^o and ε ∈ (0, δ/2),

P((X − q_F(y))/g(y) ∈ h_ρ(A^o)) ≥ P((X − q_F(y))/g(y) ∈ h_ρ((α, α + δ])) ≥ e^{−y(α+ε)} (0 ∨ (1 − e^{−y(δ−2ε)}))

provided that y is large enough, with the last inequality a consequence of (1.5). ⊓⊔

For an extension to the multivariate setting, it would be desirable to replace (2.2) by bounds valid for all A ⊂ [0, ∞), since a multivariate event we might be concerned with could be extreme in one variable, but not in another. Therefore, defining the quantile approximation q̃_{F,y} for y > 0 by (2.3), with ρ a real number and g a positive function, the following combination of (2.2) and (2.1) would be useful: (2.4).

Proposition 3 Suppose that F satisfies the GW limit (1.5). Then (2.4) holds for every Borel set A ⊂ [0, ∞). The proof is omitted, as it is similar to the proofs of the previous propositions. It is straightforward to derive that (2.4) is equivalent to the GW limit, as it implies (1.5) and (1.3).
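The content of such LDP bounds can be checked in the simplest case (an added Python illustration, not part of the article): if X is standard exponential, then P(X/y ∈ (a, b)) = e^{−ay} − e^{−by}, and the normalised log-probability −y^{−1} log P(X/y ∈ A) tends to inf(A).

```python
import math

def normalised_log_prob(a: float, b: float, y: float) -> float:
    """-(1/y) * log P(X/y in (a, b)) for X standard exponential."""
    p = math.exp(-a * y) - math.exp(-b * y)
    return -math.log(p) / y

# The infimum over the set, not its "size", drives the exponential decay:
# as y grows, the normalised log-probability tends to inf((a, b)) = a.
vals = [normalised_log_prob(1.0, 3.0, y) for y in (5.0, 20.0, 80.0)]
assert abs(vals[-1] - 1.0) < 1e-3
assert vals[0] >= vals[1] >= vals[2]
```

Only the left endpoint of the set matters in the limit, which is the univariate prototype of the multivariate rate-function infima appearing in Section 3.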
Remark 1 By (3.6), I(x) = ̺(x) I(x/̺(x)) for every x ∈ R^m \ {0} and every norm ̺ on R^m. This gives for every norm a "spectral representation" of I, analogous to the spectral measures in classical extreme value theory (e.g. de Haan & Ferreira (2006), Section 6.1.4). For example, in the bivariate case with ̺ the 1-norm, the rate function can be represented as I(x_1, x_2) = (x_1 + x_2) I(t, 1 − t) with t = x_1/(x_1 + x_2) ∈ [0, 1], because of (3.5).
Remark 2 In the special case that I is subadditive, it is, by (3.6), convex and, furthermore, by (3.5), a norm.
A is called a continuity set of I (Dembo & Zeitouni (1998)) if

inf I(A^o) = inf I(Ā). (3.17)

For example, A is a continuity set of I if I is continuous and A is contained in the closure of its interior. Homogeneity (3.6) of I relaxes the conditions for being a continuity set. In the remainder of this article, we will discuss continuity sets of rate functions without considering particular conditions which make them continuity sets.

Connection to residual tail dependence, and short-range approximation of probabilities of tail events
There is an interesting connection between the theory of Section 3 and residual tail dependence (RTD), introduced in Ledford & Tawn (1996, 1997); see also de Haan & Ferreira (2006), Section 7.5. In the bivariate case, RTD offers a model of tail dependence within the classical domain of asymptotic independence of component-wise maxima (de Haan & Ferreira (2006), Section 7.6). For a random vector X on R^m with continuous marginals F_1, ..., F_m, it amounts to the assumption (4.1) for some positive function. The limit S satisfies S(1) = 1, with 1 the vector in R^m with all its components equal to 1. Furthermore, the denominator in (4.1) must be regularly varying, so S(1λ) = λ^{−1/η} for all λ > 0, with η ∈ (0, 1] the residual dependence index. By (4.1), RTD is an example of short-range tail regularity, dealing with extrapolation of the survival function over marginal distances corresponding to fixed factors 1/x_1, ..., 1/x_m in probability of exceedance. This stands in contrast to long-range tail regularity represented by the tail LDP, which concerns extrapolation over a fixed factor in the logarithm of the probability of a multivariate event (e.g. for a continuity set A of I). With Y defined by (3.1), let H_∧ be the distribution function of Y_∧. Define for all a ∈ R^m

A_a := {x ∈ R^m : x_j > a_j ∀j ∈ {1, .., m}}. (4.4)

For real functions, a one-sided smoothness condition L is defined as follows (cf. Bingham et al (1987), Subsection 1.7.6): f ∈ L if f is nondecreasing, absolutely continuous, and its derivative f′ satisfies (4.5). RTD and the tail LDP (3.9) are related as follows.
Proposition 4 shows that RTD implies a limited LDP-like condition and in turn, that the LDP (3.9) with an additional smoothness condition implies an RTD-like condition; the smoothness makes it possible to derive short-range regularity from the long-range regularity represented by the LDP.
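A Monte Carlo sketch of RTD in the simplest case may help fix ideas (an added illustration with assumed estimator choices, not from the article): with standard exponential marginals Y_j, the structure variable min_j Y_j has an approximately exponential upper tail with rate 1/η, so η can be read off as a mean excess above a high threshold; for independent components in R^m, η = 1/m.

```python
import random

random.seed(7)
n, m = 200_000, 2

# Independent components: Y_1, ..., Y_m iid standard exponential, so
# min_j Y_j is exponential with rate m and the residual dependence
# index is eta = 1/m = 0.5 in this bivariate case.
y_min = sorted(min(random.expovariate(1.0) for _ in range(m)) for _ in range(n))

# Mean-excess estimate of eta: by memorylessness, excesses of an exponential
# above a threshold are exponential with the same rate, so their mean is eta.
k = 2000                    # number of upper order statistics used (a tuning choice)
u = y_min[-k - 1]           # threshold: the (k+1)-th largest value
eta_hat = sum(v - u for v in y_min[-k:]) / k

assert abs(eta_hat - 0.5) < 0.1
```

The choice of k is the usual bias-variance trade-off of threshold-based tail estimation; it is fixed here purely for illustration.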
RTD provides only a partial description of the tail, since (4.1) only describes the joint survival function, and this only within domains of the form considered in (4.1) as t → ∞. In terms of Y with standard exponential marginals, (4.1) characterises the limiting behaviour of the joint survival function of Y within zones of finite width around the diagonal. As in Proposition 4(c), the LDP implies an alternative representation of short-range tail dependence:

Theorem 2 Assume (3.4), so the LDP (3.9) applies. Then to any Borel set A ⊂ R^m which is a continuity set of I and satisfies the smoothness condition (see (4.5)), the limit (4.9) applies, with I satisfying (3.5), (3.6), (3.10) and I(0) = 0. In particular, (4.10) holds for every a ∈ [0, ∞)^m such that A_a is a continuity set of I.

Proof For A a continuity set of I, (3.9) implies (3.17), and (4.9) is obtained in the same manner as in the proof of Proposition 4(c). In particular, A_a is a continuity set of I for every a ∈ [0, ∞)^m, so (4.10) follows directly from (4.9). ⊓⊔

Due to (3.6), application of (4.9) or (4.10) only requires knowledge of I on {x ∈ [0, ∞)^m : ̺(x) = 1} for some norm ̺ on R^m. The marginals are described by (4.9) as well, as can be seen from (4.10) and (3.10). In the special case of a = 1, (4.10) becomes equivalent to (4.1) with x = 1λ and η = 1/I(1), so on the diagonal, the limit (4.10) and RTD (4.1) agree; elsewhere, they differ.
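The contrast between (4.10) and (4.1) is easiest to see under independence, where the rate function is additive: I(a) = a_1 + a_2 for a ≥ 0 in the bivariate case. The Monte Carlo check below (an added illustration, not from the article) recovers I(a) in an off-diagonal direction as well as on the diagonal, where 1/η = I(1) = 2.

```python
import math
import random

random.seed(5)
n, y = 400_000, 3.0

def mc_rate(a1: float, a2: float) -> float:
    """Monte Carlo estimate of -(1/y) log P(Y1 > a1*y, Y2 > a2*y)
    for independent standard exponential Y1, Y2."""
    hits = 0
    for _ in range(n):
        if random.expovariate(1.0) > a1 * y and random.expovariate(1.0) > a2 * y:
            hits += 1
    return -math.log(hits / n) / y

# Under independence the rate is I(a) = a1 + a2 in every direction a,
# matching (4.10); RTD (4.1) by itself pins down only the diagonal a1 = a2.
assert abs(mc_rate(1.0, 1.0) - 2.0) < 0.1   # diagonal: 1/eta = 2
assert abs(mc_rate(1.5, 0.5) - 2.0) < 0.1   # off-diagonal, same a1 + a2
```

That the two directions give the same value reflects the level sets of the rate function, information which RTD alone does not provide.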
However, it appears that the limit (4.9), inherited from the LDP (3.9), provides a solution to this problem. It offers a straightforward recipe for approximation of the probabilities P(X ∈ Q(A log(tλ))) = P(V ∈ (tλ)^A) for all λ > 1 and all Borel sets A ⊂ R^m satisfying the conditions of Theorem 2, which is readily turned into an estimator (we will not pursue this further, focusing instead on estimation based on the LDP (3.9) in Section 6).
In (4.9), it is not κ but the rate function I which determines the attenuation. This condition is rather restrictive, as a rate function resembles a density more than it resembles a survival function; see definition (3.4). As an illustration, in the bivariate case with T_12 = T_21 =: t and x := (1, 0), this holds for every t ∈ (0, 1) by (3.10). More generally, in the bivariate case with differentiable κ, I(a) > κ(a) implies that κ_1(a) = 0 or κ_2(a) = 0 and therefore, (4.11) yields zero, so (4.12) is not valid.
After this excursion into the topic of short-range tail dependence, we now return to long-range tail dependence described by the LDP (3.9).

Changing the standardisation of the marginals
An LDP similar to (3.9) is obtained when we replace the standard exponential marginals of Y by another distribution G which is continuously increasing on its support (0, ∞) and satisfies a Weibull tail limit (cf. (1.6)) with index r > 0, so that q_G = G^{−1}(1 − e^{−Id}) satisfies q_G ∈ RV_r. Define the random vector Z = (Z_1, .., Z_m) on R^m by Z_j := q_G(Y_j) for j = 1, .., m, and for every x ∈ R^m and a > 0, define x^a := (|x_1|^a sgn(x_1), .., |x_m|^a sgn(x_m)).
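As a concrete example of such a standardisation (added here for illustration): the Weibull-type distribution G(x) = 1 − exp(−x^{1/r}) on (0, ∞) gives q_G(y) = y^r exactly, which is regularly varying with index r.

```python
def q_G(y: float, r: float) -> float:
    """q_G(y) = G^{-1}(1 - e^{-y}) for G(x) = 1 - exp(-x**(1/r)) on (0, inf).
    Solving exp(-x**(1/r)) = exp(-y) gives x = y**r exactly."""
    return y ** r

# q_G is regularly varying with index r: q_G(lam * y) / q_G(y) -> lam**r.
r, lam = 0.5, 3.0
for y in (10.0, 100.0, 1000.0):
    assert abs(q_G(lam * y, r) / q_G(y, r) - lam ** r) < 1e-12
```

For a general G satisfying a Weibull tail limit, the ratio only converges to λ^r as y → ∞; in this toy example the relation is exact for all y.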
Theorem 3 Suppose that {µ_y; y > 0} defined by (3.3) satisfies (3.4) with rate function I. Then for every continuously increasing distribution function G with support in (0, ∞) and Weibull tail limit with index r > 0: (a) {µ_{G,y}; y > 0} defined by (5.3) is exponentially tight and satisfies the LDP (5.4); {x ∈ R^m : x^r ∈ A} is a continuity set of I; (d) if all marginals of F satisfy GW tail limits, i.e., (3.14), then for all y > 0, Q̃_{G,y} := Q̃_y ∘ q_G^{−1}.

Theorem 3 above provides some freedom in choosing the standardised marginals in which to express tail dependence.
Theorem 4 For X a random vector in R^m with continuous marginals and L the Laplace distribution, assume that {µ_{L,y}; y > 0} satisfies (5.6) for all x ∈ R^m. Then for every continuously increasing symmetric distribution function G satisfying a Weibull tail limit with index r > 0, if (1.4) holds for F = F_j, g = g_j^+ and ρ = ρ_j^+, and for F = 1 − F_j(−Id), g = g_j^− and ρ = ρ_j^− (j = 1, .., m), then the LDP (5.6) holds with µ̃_{G,y} defined by (5.7), where, for all y > 0 and x ∈ R^m, Q̃_{G,y}(x) := (Q̃_{G,1,y}(x_1), .., Q̃_{G,m,y}(x_m)).

Proof Apart from readily verifiable identities, the proof involves a minor and straightforward adaptation of the derivation of (3.9) and the proof of Theorem 3. ⊓⊔

Remark 3 Theorem 4 remains valid after replacing L by any continuously increasing symmetric distribution function L′ satisfying a Weibull tail limit with index ̺ > 0, using I_G := I_{L′} ∘ Id^{̺/r} in (a).

Remark 4
The standard normal distribution Φ satisfies the conditions on G in Theorem 4. Φ seems a particularly attractive choice for G: if the components of X are independent, then I_Φ = ‖·‖_2^2, the squared Euclidean norm, which is rotation-invariant and convex.
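The rotation invariance in Remark 4 can be checked on a simple exit probability (an added numerical illustration; the chi-squared identity is standard, the rest is assumed for the sketch): for a pair of iid standard normals Z, ‖Z‖_2^2 is chi-squared with 2 degrees of freedom, so P(‖Z‖_2 > c q_Φ(y)) = exp(−c² q_Φ(y)²/2), and the normalised log-probability tends to c² = inf{‖x‖_2^2 : ‖x‖_2 ≥ c}, the same in every direction of exit.

```python
import math
from statistics import NormalDist

nd = NormalDist()

def q_phi(y: float) -> float:
    """q_Phi(y) = Phi^{-1}(1 - e^{-y}), computed via the lower tail for accuracy."""
    return -nd.inv_cdf(math.exp(-y))

def normalised_log_prob(c: float, y: float) -> float:
    """(-1/y) log P(||Z||_2 > c * q_phi(y)) for Z a pair of iid standard normals.
    ||Z||_2^2 is chi-squared(2), i.e. exponential with mean 2, so the
    probability equals exp(-(c * q_phi(y))**2 / 2) exactly."""
    return (c * q_phi(y)) ** 2 / (2.0 * y)

# The limit c**2 = inf{||x||_2^2 : ||x||_2 >= c} is consistent with the
# rotation-invariant rate function I_Phi = ||.||_2^2 under independence.
assert abs(normalised_log_prob(1.0, 200.0) - 1.0) < 0.05
assert abs(normalised_log_prob(2.0, 200.0) - 4.0) < 0.2
```

The slow O(log y / y) convergence visible here is typical of normal standardisation, since q_Φ(y)² = 2y − log y + O(1).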

Simple estimators for very small probabilities
We are now ready to apply the theory of Sections 3 and 5 to the problem of estimating probabilities of extreme events p_n satisfying (1.1) from X^{(1)}, ..., X^{(n)}, with X^{(1)}, X^{(2)}, ... a sequence of iid random vectors in R^m with a distribution function F having continuous marginals F_1, .., F_m. Generalising (1.1) to τ_2 > τ_1 > 0, we will focus on events of the form B_n := Q_G(q_G(ỹ_n)A) for certain A ⊂ R^m, with ỹ_n := −log p_n and G some distribution function as in Theorem 4. Under the conditions of Theorem 4, including marginal upper and lower GW tail limits as in Theorem 4(c), the tail LDP (5.6) with (5.7) and (5.10) applies. Therefore, for every Borel set A ⊂ R^m which is a continuity set of I_G satisfying inf I_G(A) ∈ (0, ∞), −log P(X ∈ B_n) ∼ ỹ_n inf I_G(A) (showing that eventually, P(X ∈ B_n) ∈ [n^{−t_2}, n^{−t_1}] for some t_2 > t_1 > 0) and −log P(Q̃_{G,ỹ_n}(Z/ℓ) ∈ B_n) ∼ ỹ_n inf I_G(Aℓ) for all ℓ > 0, so by (5.5),

lim_{n→∞} log P(X ∈ B_n) / (ℓ^{−1/r} log P(Q̃_{G,ỹ_n}(Z/ℓ) ∈ B_n)) = 1 for all ℓ > 0. (6.1)

This limit suggests that the logarithm of P(X ∈ B_n) could be estimated by modifying the denominator in (6.1) as follows: choose a particular distribution function G′ with Weibull tail index ̺ > 0 satisfying the conditions on G in Theorem 4, replace r by ̺, and replace P(Q̃_{G,ỹ_n}(Z/ℓ) ∈ B_n) by an estimator of P(Q̂_{G′,n}(Z′/ℓ) ∈ B_n), with

Z′ := Q_{G′}^{−1}(X), (6.2)

and Q̂_{G′,n} an estimator of Q_{G′}. The hope is that for each n, the value of ℓ can be chosen in such a manner that consistency is assured. Estimation of Q_{G′} (see (5.2)) boils down to a univariate quantile estimation problem, so we will proceed to examine this first. For real scalar iid random variables X^{(1)}, X^{(2)}, ... with continuous distribution function F satisfying the upper GW tail limit (1.5), and with X_{1:n} ≤ ...
≤ X_{n:n} the order statistics of the sample X^{(1)}, ..., X^{(n)}, define a quantile estimator q̂_{n,X} for q_F defined by (1.2) as

q̂_{n,X}(y) := X_{⌊n(1−e^{−y})⌋+1:n} if y ∈ (0, y_n], and q̂_{n,X}(y) := X_{n−k_0(n)+1:n} + ĝ(n) h_{ρ̂(n)}(y/y_n) if y > y_n, (6.3)

with ρ̂(n) and ĝ(n) estimators for ρ and g in (1.5), respectively, k_0 : N → N nondecreasing and such that k_0(n) ∈ {1, ..., n} for all n ∈ N, and y_n := log(n/k_0(n)).
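A minimal Python sketch of the estimator (6.3) follows (an added illustration; it assumes the GW auxiliary function h_ρ(λ) = (λ^ρ − 1)/ρ with h_0 = log, and takes ρ̂(n) and ĝ(n) as given, since their construction is a separate marginal tail fit not shown here):

```python
import math
import random

def h(rho: float, lam: float) -> float:
    """Assumed GW auxiliary function h_rho(lambda) = (lambda**rho - 1)/rho, h_0 = log."""
    return math.log(lam) if rho == 0.0 else (lam ** rho - 1.0) / rho

def q_hat(y: float, xs_sorted, k0: int, g_hat: float, rho_hat: float) -> float:
    """Quantile estimator (6.3): the empirical quantile for y <= y_n, and a GW
    tail extrapolation from the (n - k0 + 1)-th order statistic for y > y_n."""
    n = len(xs_sorted)
    yn = math.log(n / k0)
    if y <= yn:
        return xs_sorted[math.floor(n * (1.0 - math.exp(-y)))]  # X_{floor(n(1-e^-y))+1:n}
    return xs_sorted[n - k0] + g_hat * h(rho_hat, y / yn)       # X_{n-k0+1:n} + g*h

# Check on standard exponential data, where q_F(y) = y, so rho = 1 and
# g(y_n) = y_n in the GW representation q_F(y) ~ q_F(y_n) + g(y_n) h_1(y/y_n).
random.seed(1)
xs = sorted(random.expovariate(1.0) for _ in range(100_000))
yn = math.log(100_000 / 1000)
assert abs(q_hat(2.0, xs, 1000, yn, 1.0) - 2.0) < 0.1   # inside the sample
assert abs(q_hat(9.2, xs, 1000, yn, 1.0) - 9.2) < 0.5   # extrapolated tail
```

In practice ρ̂(n) and ĝ(n) would themselves be estimated from the top k_0(n) order statistics, which is where the condition (6.6) enters.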
Returning to the case of a sequence of iid copies X^{(1)}, X^{(2)}, ... of a random vector X in R^m, let, for j ∈ {1, .., m}, X_{j,1:n} ≤ ... ≤ X_{j,n:n} denote the marginal order statistics derived from the marginal sample X_j^{(1)}, ..., X_j^{(n)}. Now define the following GW-based estimator Q̂_{G,j,n} for Q_{G,j} (compare (5.10)), with q̂_{n,X} defined by (6.3), and let Q̂_{G,n}(x) := (Q̂_{G,1,n}(x_1), ..., Q̂_{G,m,n}(x_m)) ∈ R^m for every x ∈ R^m. Furthermore, define the following estimator of Z. Shortly, we will verify that if B_{n,τ} := Q_G(q_G(τ log n)A) is substituted for B, with G any continuous symmetric distribution function satisfying a Weibull tail limit, then under mild restrictions on A and k_0, this estimator converges in the large deviation sense for all τ > 0.
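The mechanics of the homogeneity trick can be sketched in a toy setting (an added illustration, not the estimator (6.18) itself: the marginals are taken exactly standard normal so that the standardisation step is the identity, with G = Φ and ̺ = 1/2, so that the rate scales as I_Φ(cx) = c² I_Φ(x)). A small probability P(Z_1 > b, Z_2 > b) is extrapolated from counts in the larger-probability set obtained by shrinking b to s:

```python
import math
import random
from statistics import NormalDist

random.seed(3)
nd = NormalDist()

n = 1_000_000
b, s = 3.5, 2.5   # target threshold b; shrunken threshold s where counting works

# Count sample points in the shrunken set {z : z_1 > s, z_2 > s}.
hits = 0
for _ in range(n):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    if z1 > s and z2 > s:
        hits += 1
p_s = hits / n

# Homogeneity of the rate in normal coordinates, I_Phi(c*x) = c**2 * I_Phi(x),
# suggests the extrapolation  log p  ~  (b/s)**2 * log p_s.
log_p_est = (b / s) ** 2 * math.log(p_s)

log_p_true = 2.0 * math.log(nd.cdf(-b))   # exact: (1 - Phi(b))**2
# Convergence holds for ratios of log-probabilities and is slow at such
# moderate thresholds, so only rough agreement should be expected here.
assert 0.7 < log_p_est / log_p_true < 1.5
```

At these thresholds the log-ratio is roughly 1.2 rather than 1, illustrating why the consistency results of this section are stated for log-probability ratios only.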
Theorem 5 Let X^{(1)}, X^{(2)}, ... be iid copies of a random vector X on R^m satisfying the conditions of Theorem 4, including continuous marginals with upper and lower GW tail limits as in Theorem 4(c). Let G′ be a continuously increasing and symmetric distribution function satisfying a Weibull tail limit with index ̺ > 0. For k_0 : N → N satisfying

0 < c′ := lim inf_{n→∞} log k_0(n) / log n ≤ lim sup_{n→∞} log k_0(n) / log n =: c < 1, (6.14)

consider the estimator (6.18) for P(X ∈ B), with the quantile estimator (6.3) satisfying (6.6) and θ ∈ (0, (1 − c′)^{−1}). Then (6.15) holds for B_{n,τ} := Q_G(q_G(τ log n)A), with G any continuously increasing and symmetric distribution function satisfying a Weibull tail limit, and A ⊂ R^m any Borel set which is a continuity set of I_G defined in Theorem 4(a) satisfying inf I_G(A) ∈ (0, ∞).

Proof The proof can be found in Subsection 8.2. ⊓⊔

Remark 5 By Theorem 4, P(X ∈ B_{n,τ}) = n^{−τ inf I_G(A)(1+o(1))} in (6.15), so the probability range (1.1) is covered by Theorem 5.
Remark 6 The distribution function defining B_{n,τ} in Theorem 5 may differ from the one chosen for the estimator; therefore, the latter choice is of no consequence for the estimator's consistency. The case of both distribution functions having support in (0, ∞), as in Theorem 3, can be handled by small modifications.

(6.19)
Proof The proof can be found in Subsection 8.1.

Discussion
Like similar methods in the classical multivariate extreme value setting (e.g. de Haan & Sinha (1999); Drees & de Haan (2013); Draisma et al. (2004)), the estimators (6.13) and (6.18) exploit homogeneity of a function describing tail dependence: in this case, homogeneity (5.5) of the rate function I_G. This offers the advantage that no explicit estimate of I_G is required. However, in certain situations there may be reasons to estimate I_G, for example if, for a given X, probabilities need to be estimated for multiple sets in a consistent and reproducible manner. Therefore, estimation of I_G remains a topic deserving elaboration. The limitation of A to continuity sets of I_G in Theorems 5 and 6 is less restrictive than it may seem, since homogeneity (5.5) of the rate function makes continuity sets relatively common, as noted at the end of Section 3. The other conditions on A are weak. To prove convergence of the estimators under such weak conditions, local uniformity in d of the convergence in (8.3) is employed, which is derived from uniformity in d of the convergence in (8.13). A variant of (8.13) involving only pointwise convergence is much easier to prove, using Hoeffding's inequality and the Borel–Cantelli lemma. Local uniformity in λ of the convergence in (8.3), also derived from (8.13), is used to prove local uniformity in τ of the convergence of the estimators in (6.15) and (6.19). In practice, this means that if such an estimator applied to a given dataset produces a fair estimate of P(X ∈ B_0) for some B_0 ⊂ R^m, then it may also be applied with confidence to the same dataset to estimate the probability of B_1 ⊂ R^m such that P(X ∈ B_1) ≥ P(X ∈ B_0)^τ for τ > 1 not too large, e.g. τ = 2. If P(X ∈ B_0) ≪ 1, e.g. P(X ∈ B_0) = 0.01, this amounts to extrapolation over several additional orders of magnitude in probability.
How far one can extrapolate in practice will depend on the rates of convergence to the marginal GW tail limits and in (5.4), which will differ from case to case.
Convergence of log-ratios of probabilities as in (6.15) and (6.19) is typical for the probability range (1.1). As observed in de Valk (2014), a stronger notion of convergence might be desirable, but would require restrictive additional assumptions which seem hard to justify. Rather, it is recommended to diagnose bias in estimates and to take this into account in estimates of uncertainty. For this reason, modelling of bias and rate of convergence deserves further study. Deriving asymptotic error distributions will require additional assumptions beyond those for Theorems 5 and 6 and is left for a follow-up study as well.
The assumption of marginal GW tail limits is not restrictive, since the more generally applicable log-GW tail limit can be converted to a GW tail limit by a logarithmic transformation (de Valk (2014)). This may also work for lower tail limits, as in the case of the lognormal distribution. In other cases, one may assume that for j ∈ {1, .., m}, numbers c_j and d_j in {0, 1} exist such that the distribution function of the random variable (7.1) has upper and lower GW tails, and replace X_j by (7.1). A rule for choosing c_j and d_j based on data of X_j is discussed in de Valk (2015) under assumptions weaker than the existence of (log-)GW tail limits. In practice, it will often be clear how to choose c_j and d_j, based on diagnostics of GW and log-GW tail estimates or on accumulated experience in the application domain. Potential applications of approximation based on the tail LDPs (5.4) and (5.6) and of estimators like (6.13) and (6.18) include analysis of financial and economic tail risk, flood hazard analysis, and structural reliability analysis (Ditlevsen & Madsen (2007)). FORM (Hasofer & Lind (1974)), the most widely applied method for approximation of failure probabilities in structural reliability analysis, can be regarded as a large-deviations method, although it is rarely presented as such. Compared to FORM, which requires transformation of X to U ∼ N(0, I) (Hasofer & Lind (1974); Ditlevsen & Madsen (2007)), the tail LDP (5.4) may offer simplification, as it involves only marginal transformations. In line with FORM, the standard normal distribution Φ could be chosen for G in Section 5 and for the estimator's standardising distribution in Section 6. Furthermore, approximation of failure probabilities and extreme value analysis, normally treated as two entirely separate topics in structural reliability analysis, can be combined within the same large deviation framework, as demonstrated by the estimators in Section 6.
Short-range approximation based on (4.9) may have merits in applications focused on the intermediate probability range, such as in the analysis of market risk of financial institutions; (4.9) can be applied together with classical univariate extreme value methods for the marginals.
Without much effort, the main results of this article can be generalised from a random vector in R^m to a random element of C_b(K), the space of continuous functions on a compact metric space K. Classical multivariate extreme value theory has been generalised to this setting earlier; see e.g. de Haan & Lin (2001), Part III of de Haan & Ferreira (2006) and Ferreira & de Haan (2014). For the theory presented here, the main difference between the R^m setting and the C_b(K) setting is that in the latter, exponential tightness of {µ_{G,y}; y > 0} no longer follows from the standardised marginals but is an independent assumption. In loose terms, it entails that all but an exponentially small probability mass is concentrated on equicontinuous sets of functions in C_b(K) (see e.g. Dembo & Zeitouni (1998)).

Proof of Theorem 5
Proof Following the proof of Theorem 6 in Subsection 8.1, (8.5) and (6.13) yield the corresponding expansion, and the result (6.15) follows as in the proof of Theorem 6. ⊓⊔
(b) If instead, G is symmetric and continuously increasing on R and satisfies a Weibull tail limit, then (8.9) holds as well.

⊓ ⊔
Lemma 5 Let G′ and G be symmetric, continuously increasing distribution functions satisfying Weibull tail limits with indices ̺ > 0 and r > 0, respectively. Let the random vector X on R^m satisfy upper and lower marginal GW tail limits as in Theorem 4, and let {µ_{G,y}; y > 0} (see (5.3)) satisfy the LDP (5.4) with good rate function I_G. Let the Borel set A ⊂ R^m be a continuity set of I_G satisfying inf I_G(A) ∈ (0, ∞). Let k_0 : N → N satisfy (6.14), and let the quantile estimator q̂_{n,X} given by (6.3) satisfy (6.6). For c′ defined in (6.14), let ∆ ∈ (0, ((1 − c′) inf I_G(A))^{−r}).