Approximation and estimation of very small probabilities of multivariate extreme events

This article discusses modelling of the tail of a multivariate distribution function by means of a large deviation principle (LDP), and its application to the estimation of the probability p_n of a multivariate extreme event from a sample of n iid random vectors, with p_n ∈ [n^{−τ_2}, n^{−τ_1}] for some τ_1 > 1 and τ_2 > τ_1. One way to view the classical tail limits is as limits of probability ratios. In contrast, the tail LDP provides asymptotic bounds or limits for log-probability ratios. After standardising the marginals to standard exponential, tail dependence is represented by a homogeneous rate function I. Furthermore, the tail LDP can be extended to represent both dependence and marginals, the latter implying marginal log-Generalised Weibull tail limits. A connection is established between the tail LDP and residual tail dependence (or hidden regular variation) and a recent extension of it. Under a smoothness assumption, they are implied by the tail LDP. Based on the tail LDP, a simple estimator for very small probabilities of extreme events is formulated. It avoids estimation of I by making use of its homogeneity. Strong consistency in the sense of convergence of log-probability ratios is proven. Simulations and an application illustrate the difference between the classical approach and the LDP-based approach.


Introduction
In this article, we will consider estimation of very small probabilities p_n of multivariate extreme events from a sample of size n, with

p_n ∈ [n^{−τ_2}, n^{−τ_1}] with τ_2 > τ_1 > 1,   (1.1)

motivated by applications requiring quantile estimates for probabilities p_n below 1/n in e.g. flood protection and, more generally, natural hazard assessment, and in operational risk assessment for financial institutions. Multivariate events with such low probabilities are also relevant to these fields of application. Examples are breaching of a flood protection consisting of multiple sections differing in exposure, design and maintenance along a shoreline or river bank (Steenbergen et al. 2004), damage to an offshore structure caused by the combined effects of multiple environmental loads like water level, wave height, etc. (ISO 2005), and operational losses suffered by banks in different business lines and due to various types of events (Embrechts and Puccetti 2007).
Most work on estimation of probabilities of extreme events is based on the regularity assumption that the distribution function F is in the domain of attraction of some extreme value distribution function (de Haan and Ferreira 2006; Resnick 1987). In the univariate case, this is equivalent to the generalised Pareto (GP) tail limit

lim_{t→∞} (U(tλ) − U(t))/w(t) = h_γ(λ) for all λ > 0   (1.2)

for some positive function w, with U(t) := F^{−1}(1 − 1/t) and with γ ∈ R the extreme value index, where

h_γ(λ) := (λ^γ − 1)/γ if γ ≠ 0 and h_0(λ) := log λ.   (1.3)

In the multivariate case, with F the distribution function of a random vector X = (X_1, .., X_m) with continuous marginals F_1, .., F_m, it implies that each marginal satisfies the GP tail limit (1.2) and that V := (V_1, .., V_m), the random vector with standard Pareto marginals defined by

V_i := 1/(1 − F_i(X_i)) for i = 1, .., m,   (1.4)

satisfies

lim_{t→∞} tP(V ∈ tA) = ν(A)   (1.5)

for every Borel set A ⊂ [0, ∞)^m such that inf_{x∈A} max(x_1, .., x_m) > 0 and ν(∂A) = 0, with ν a measure satisfying ν(Aa) = a^{−1}ν(A) for all these A and all a > 0. Based on the GP tail limit and the properties of the exponent measure ν, estimators for probabilities have been formulated; e.g. Smith et al. (1990), Tawn (1991, 1994), Joe et al. (1992), Bruun and Tawn (1998), de Haan and Sinha (1999) and Drees and de Haan (2015). If the maxima of some components of X under consideration are asymptotically independent, these estimators may produce invalid results. To alleviate this problem, residual tail dependence (RTD), also known as hidden regular variation, was introduced as an additional regularity assumption on the tail of the multivariate survival function F^c, defined by F^c(x) = P(X_i > x_i, i = 1, .., m); e.g. Ledford and Tawn (1996, 1997), Peng (1999), Resnick (2002), Draisma et al. (2004) and Heffernan and Resnick (2005). This model was recently extended in Wadsworth and Tawn (2013). Another approach, based on conditional limits, was proposed in Heffernan and Tawn (2004) and Heffernan and Resnick (2007).
The first-order tail regularity conditions (1.2) and (1.5) can be seen as limiting relations for probability ratios. As such, they only allow estimation of probabilities p_n vanishing slowly enough, that is,

p_n ≥ λk_n/n   (1.6)

for some λ > 0 and some intermediate sequence (k_n) (i.e., k_n → ∞ and k_n/n → 0); therefore, np_n → ∞ as n → ∞. For an iid sample, the empirical probability p̂_n is an unbiased estimator for such p_n, satisfying p̂_n/p_n → 1 in probability (from the binomial distribution of np̂_n). Therefore, estimators for these p_n which make use of tail regularity can at best achieve a reduction in variance when compared to p̂_n. To allow tail extrapolation to be carried further to more rapidly vanishing p_n, additional assumptions beyond (1.2) and (1.5) are introduced. Initially, e.g. in Smith et al. (1990), Tawn (1991, 1994) and Joe et al. (1992), the tail is assumed to follow the limiting distribution exactly above some thresholds, so likelihood methods can be employed. Later, e.g. in de Haan and Sinha (1999), Peng (1999), Drees and de Haan (2015), Draisma et al. (2004) and de Haan and Ferreira (2006), convergence to the limiting distribution and its effect on bias in estimates is explicitly considered. For the marginals, the additional assumptions on convergence to the limit (1.2) in these articles are identical to or stronger than those invoked for univariate quantile estimation. However, the latter appear to be restrictive when γ = 0, regardless of the precise nature of the assumption; see de Valk (2016), Proposition 1. For example, they exclude the normal and the lognormal distribution, but also, for all α ∈ (0, ∞) \ {1}, the distribution functions of Y^α with Y exponentially distributed and of exp((log V)^α) with V Pareto distributed.
To overcome these limitations, we will consider a different approach in this paper. Rather than imposing additional assumptions on convergence beyond the first-order limits (1.2) and (1.5), we will attempt to replace them by different types of first-order limits more suitable for the probability range (1.1). Let (k_n) be an intermediate sequence satisfying k_n ≤ n^c for some c ∈ (0, 1). Then (p_n) satisfying (1.1) does not satisfy (1.6), but

τ_1 ≤ lim inf_{n→∞} log p_n/log(k_n/n) ≤ lim sup_{n→∞} ... ≤ τ_2/(1 − c).   (1.7)

This suggests that replacing the classical limits of probability ratios by limits of log-probability ratios could provide a framework for constructing estimators for probabilities of extreme events in the range (1.1).
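The two-sided bound on the log-probability ratio is elementary; as a worked check:

```latex
% Check of the log-probability ratio bounds under (1.1) and k_n \le n^c.
% From p_n \in [n^{-\tau_2}, n^{-\tau_1}]:
%   -\tau_2 \log n \;\le\; \log p_n \;\le\; -\tau_1 \log n ,
% and from 1 \le k_n \le n^c with c \in (0,1):
%   -\log n \;\le\; \log(k_n/n) \;\le\; (c-1)\log n \;<\; 0 .
% Both quantities are negative, so dividing them gives, for all large n,
\tau_1 \;\le\; \frac{\log p_n}{\log(k_n/n)} \;\le\; \frac{\tau_2}{1-c} ,
% i.e. the log-probability ratio stays bounded away from 0 and \infty,
% while the probability ratio p_n/(k_n/n) tends to 0, so (1.6) fails.
```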
In the next section, we address the limiting behaviour of log-probability ratios in the univariate case as introduction to the multivariate case. We will find that this behaviour is described by a large deviation principle (LDP) (see e.g. Dembo and Zeitouni (1998)). It is generalised to the multivariate setting in Section 3. In Section 4, we establish a connection between this LDP and residual tail dependence and related assumptions. Section 5 returns to the basic LDP and applies it to formulate a simple estimator for probabilities of extreme events in the range (1.1) and to prove its consistency. In Section 6, this estimator is compared to its classical analogues in simulations, and an application of the LDP-based estimator is presented as illustration. Section 7 closes with a discussion of the results and of outstanding issues. Readers primarily interested in tail dependence could scan Section 2 for the approach and background, read the first part of Section 3 until Eq. 3.12, and then continue with Sections 4-7. Lemmas can be found in Section 8.
The following notation is adopted: Id denotes the identity. The interior of a set S is denoted by S^o and its closure by S̄. The image of a set S under a function f is written as f(S). The infimum of an (extended) real function f over S is written as inf f(S); by convention, inf f(∅) := ∞. To avoid tedious repetition, expressions of the form a ≤ lim inf_{y→∞} f(y) ≤ lim sup_{y→∞} f(y) ≤ b are abbreviated to a ≤ lim inf_{y→∞} f(y) ≤ lim sup_{y→∞} ... ≤ b.

Introducing the tail LDP: the univariate case
We begin by examining the univariate case in order to become acquainted with a particular type of large deviation principle (LDP) as a model of the tail of a distribution function.
Let X be a real-valued random variable and let {b_y, y > 0} be a family of real functions such that for D ⊂ [0, ∞), b_y(D) becomes more extreme in some sense when y is increased. In line with the classical limits (e.g. (1.2)), we could consider an affine function for b_y, i.e., b_y(x) = r(y) + g(y)x, with r some nondecreasing function and g some measurable positive function. Instead, for a reason to be explained later, we assume that F(0) < 1 and consider

b_y(x) = r(y)e^{g(y)x}   (2.1)

with g and r as above and r(∞) > 0. We examine the limiting behaviour of the normalised log-probability y^{−1} log P(X ∈ b_y(D)) (2.2) as y → ∞. Substituting y_n = −log(k_n/n) for y, this determines the behaviour of the log-probability ratio in Eq. 1.7 with p_n = P(X ∈ b_{y_n}(D)) as n → ∞. Generally speaking, normalised logarithms of probabilities like (2.2) do not need to satisfy limits, so we only assume that the limit (2.3) exists for (at least) D = (x, ∞) for all x ≥ 0, with J some monotonic set function taking values in [0, ∞]. Noting that ϕ(x) := −J((x, ∞)) is nondecreasing in x, we have the limit relation (2.4) at every continuity point x of ϕ in (0, ∞). Assume that ϕ is not constant. By Lemma 1.1.1 of de Haan and Ferreira (2006), Eq. 2.4 implies lim_{y→∞} (log q(yλ) − log r(y))/g(y) = ϕ^{−1}(λ) at every continuity point of the left-continuous inverse ϕ^{−1} of ϕ in (ϕ(0), ϕ(∞)). Therefore (cf. the proof of Theorem 1.1.3 in de Haan and Ferreira (2006)), we may take r = q, with

q := (y ↦ F^{−1}(1 − e^{−y})),   (2.5)

and choose g measurable and such that ϕ^{−1}(λ) = h_θ(λ) for some real θ (see (1.3)). As a result, and from (2.4),

lim_{y→∞} (log q(yλ) − log q(y))/g(y) = h_θ(λ) for all λ > 0.   (2.6)

Equation 2.6 states that log q is extended regularly varying with index θ. Equivalently, in terms of the survival function,

lim_{y→∞} −y^{−1} log(1 − F(q(y)e^{g(y)h_θ(λ)})) = λ for all λ > 0.   (2.7)

By Eq. 2.6, lim_{y→∞} g(yλ)/g(y) = λ^θ for all λ > 0, so g ∈ RV_θ (g is regularly varying with index θ); see Appendix B of de Haan and Ferreira (2006). This leads to Proposition 1, the univariate tail LDP

−inf h_θ^{−1}(D^o) ≤ lim inf_{y→∞} y^{−1} log P((log X − log q(y))/g(y) ∈ D) ≤ lim sup_{y→∞} ... ≤ −inf h_θ^{−1}(D̄).   (2.8)

For the lower bound, suppose that [h_θ(α), h_θ(α + δ)] ⊂ D^o for some α ≥ 0 and δ > 0; then for every ε ∈ (0, δ/2),

P((log X − log q(y))/g(y) ∈ D^o) ≥ F(e^{g(y)h_θ(α+δ)}q(y)) − F(e^{g(y)h_θ(α)}q(y)) ≥ e^{−y(α+ε)} − e^{−y(α+δ−ε)} ≥ e^{−y(α+ε)}(0 ∨ (1 − e^{−y(δ−2ε)})),

provided that y is large enough, as a consequence of (2.7).
As δ > 0 is arbitrary, this implies the lower bound in (2.8). The proof of the upper bound is similar and is therefore omitted.
The pair of equivalent limit relations (2.6) and (2.7) was named the log-Generalised Weibull (log-GW) tail limit in de Valk (2016), where it was proposed as a model for estimating high quantiles for probabilities in the range (1.1), as an alternative to the more familiar GP tail limit. If θ = 0 and g(y) → g ∞ > 0 as y → ∞, it reduces to the Weibull tail limit; see e.g. Broniatowski (1993) and Klüppelberg (1991).
The log-GW tail limit looks deceptively similar to a GP tail limit, but it is a very different object, primarily due to the logarithm in Eq. 2.7 (or equivalently, due to the exponent in (2.5)). Its domain of attraction covers a wide range of tail weights: a class of light tails having finite endpoints, tails with Weibull limits (such as the normal distribution), all tails with classical Pareto tail limits and, more generally, with log-Weibull tail limits. For the latter, F ∘ exp satisfies a Weibull tail limit; an example is the lognormal distribution. For estimation of high quantiles with probabilities (1.1) of distribution functions within the domain of attraction of the GP limit with γ = 0, the log-GW tail limit offers a continuum of limits instead of just one; as a consequence, it is much more widely applicable (see de Valk (2016)). Readers more comfortable with classical tail limits may consider focusing on tails with a Pareto tail limit (γ > 0), which have a log-GW limit with θ = 1 (so h_θ(λ) = λ − 1) and g(y) = γy. This may make reading of the rest of the article easier.
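For this Pareto case the correspondence can be checked in a few lines; a minimal sketch (the index value and evaluation points are illustrative choices, not taken from the paper):

```python
import math

gamma = 0.7  # Pareto extreme value index gamma > 0 (illustrative choice)

def q(y):
    """Quantile function q(y) = F^{-1}(1 - e^{-y}) for the Pareto
    distribution with 1 - F(x) = x^{-1/gamma}, x >= 1."""
    return math.exp(gamma * y)

def h(theta, lam):
    """h_theta(lambda) = (lambda^theta - 1)/theta, cf. (1.3)."""
    return (lam ** theta - 1.0) / theta

# With theta = 1 and g(y) = gamma*y, the log-GW relation
# log q(y*lam) - log q(y) = g(y) * h_theta(lam) holds exactly here,
# not merely in the limit.
y, lam = 3.0, 2.5
lhs = math.log(q(y * lam)) - math.log(q(y))
rhs = gamma * y * h(1.0, lam)
gap = abs(lhs - rhs)
print(lhs, rhs, gap)
```

For a Pareto tail the log-GW relation is thus attained exactly at finite levels, which is why this case is a convenient mental model for the rest of the article.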
An expression of the form (2.8) is an example of a large deviation principle (LDP); see Section 1.2 of Dembo and Zeitouni (1998) for a general background. The rate function of the LDP (2.8) is h_θ^{−1}. The bounds provided by an LDP are crude; for example, they are unaffected by multiplying the probability in Eq. 2.8 by a positive number. One could see this as the price to be paid for approximating probabilities over a very wide range. More precise bounds may exist, but such cases should be regarded as the exception rather than the rule. Observe also that the bounds do not involve integration and in fact, most of D does not even matter to the values of the bounds. The LDP (2.8) reduces to a limit only if D satisfies inf h_θ^{−1}(D^o) = inf h_θ^{−1}(D̄); such a D is called a continuity set of the rate function. Had we considered events of the form b_y(x) = r(y) + g(y)x instead of Eq. 2.1, then in the same way as above, we would have arrived at a different tail limit, the GW limit, defined by replacing log q by q in Eq. 2.6 (see de Valk (2016)). Its domain of attraction covers a much more limited range of tail weights. Furthermore, if F(0) < 1, then the GW limit implies a log-GW limit (cf. the proof of Lemma 3.5.1 in de Haan and Ferreira (2006)). Therefore, to ensure that the results of this article are sufficiently widely applicable, we focus on the log-GW limit.
The events considered in Eq. 2.8 with D ⊂ [0, ∞) imply that X is in the interval [q(y), ∞) for q(y) > 0. In a multivariate setting, it would be desirable to extend this interval to R, since a multivariate event could be extreme in one variable, but not in some other variable. This can be accomplished using a trick: define an approximation q̃_y of q (see (2.5)) for y ∈ q^{−1}((0, ∞)) by

q̃_y(z) := q(z) if z ≤ y and q̃_y(z) := q(y)e^{g(y)h_θ(z/y)} if z > y,   (2.9)

so for z > y, q̃_y(z) is the log-GW tail approximation; for z ≤ y, it is exact. A random variable Y with the standard exponential distribution satisfies

−inf Id(A^o) ≤ lim inf_{y→∞} y^{−1} log P(Y ∈ Ay) ≤ lim sup_{y→∞} ... ≤ −inf Id(Ā)   (2.10)

for every Borel set A ⊂ [0, ∞), which can be proven in a similar manner as Proposition 1. If F is continuous, then Y = −log(1 − F(X)) has the standard exponential distribution and q is increasing, so we can substitute P(X ∈ q(Ay)) for P(Y ∈ Ay) in (2.10). Under the assumptions of Proposition 1, we can substitute P(X ∈ q̃_y(Ay)) for P(Y ∈ Ay) in Eq. 2.10 as well, extending (2.8) to

−inf Id(A^o) ≤ lim inf_{y→∞} y^{−1} log P(X ∈ q̃_y(Ay)) ≤ lim sup_{y→∞} ... ≤ −inf Id(Ā).   (2.11)

Proposition 2 If F is continuous, then (2.8), (2.6) and (2.7) are all equivalent to (2.11).
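As a numerical illustration of (2.9) (an example constructed here, not taken from the paper): for the lognormal distribution, log q(y) = Φ^{−1}(1 − e^{−y}) behaves like √(2y), so θ = 1/2, and g(y) = √(y/2) is an asymptotically valid choice of auxiliary function. The sketch below extrapolates from exposure y to 2y and compares with the exact quantile:

```python
import math
from statistics import NormalDist

nd = NormalDist()  # standard normal

def q(y):
    """Exact quantile q(y) = F^{-1}(1 - e^{-y}) of the lognormal
    distribution of exp(Z), Z standard normal."""
    return math.exp(nd.inv_cdf(1.0 - math.exp(-y)))

def h(theta, lam):
    """h_theta(lambda) = (lambda^theta - 1)/theta, cf. (1.3)."""
    return (lam ** theta - 1.0) / theta

def q_tilde(y, z, theta, g):
    """Approximation (2.9): exact up to y, log-GW extrapolation beyond."""
    return q(z) if z <= y else q(y) * math.exp(g * h(theta, z / y))

y, z = 15.0, 30.0                    # extrapolate to double the exposure
theta, g = 0.5, math.sqrt(y / 2.0)   # asymptotic values for the lognormal
approx, exact = q_tilde(y, z, theta, g), q(z)
rel_err = abs(math.log(approx) - math.log(exact)) / math.log(exact)
print(rel_err)
```

Even though e^{−z} ≈ 10^{−13} is far beyond the range where q could be estimated empirically from a moderate sample, the relative error of log q̃_y(z) stays within a few percent in this example.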
When restricting A to [1, ∞), Eq. 2.11 is equivalent to Eq. 2.8 for D = h_θ(A). With A in [0, ∞), therefore, Eq. 2.11 provides the intended generalisation of (2.8). Note that the log-GW index θ and the auxiliary function g are now hidden in the approximation q̃_y in (2.9). However, they are as essential in Eq. 2.11 as they are in the more explicit (2.8).

Bounds and limits for probabilities of multivariate tail events
For the univariate tail, we obtained the LDP (2.11) in a form which closely resembles (2.10) for the standard exponential distribution. This suggests that for a multivariate generalisation, we examine first the case of a random vector Y := (Y_1, .., Y_m) with distribution function having standard exponential marginals. A straightforward multivariate generalisation of the LDP (2.10) would be

−inf I(A^o) ≤ lim inf_{y→∞} y^{−1} log P(Y ∈ Ay) ≤ lim sup_{y→∞} ... ≤ −inf I(Ā)   (3.1)

for every Borel set A ⊂ [0, ∞)^m, with I some rate function; we may regard (3.1) as the analogue of the classical expression (1.5). Further on, we will prove that Eq. 3.1 holds if the limit

I(x) := lim_{ε↓0} lim_{y→∞} −y^{−1} log P(Y ∈ B_ε(x)y)   (3.2)

exists for every x ∈ [0, ∞)^m, with B_ε(x) the open ball of radius ε centred at x. For now, we turn to the rate function I, defined by Eq. 3.2 as some kind of limiting density, with the probability of an open ball replaced by its logarithm. Several properties of I follow immediately from Eq. 3.2 and the exponential marginals of Y. For every ε > 0 and x ∈ R^m with x_j = λ > 0 for some j ∈ {1, .., m}, P(Y ∈ B_ε(x)y) ≤ P(Y_j > (λ − ε)y) = e^{−(λ−ε)y} (taking the maximum norm), so

I(x) ≥ max(x_1, .., x_m).   (3.3)

This implies that I is a good rate function, meaning that I^{−1}([0, a]) is compact for every a ∈ [0, ∞). Also, since B_ε(xλ) = λB_{ε/λ}(x),

I(xλ) = λI(x) for all λ > 0   (3.4)

for every x ∈ R^m \ {0} and every norm on R^m. This gives for every norm a "spectral representation" of I, analogous to the spectral measures in classical extreme value theory (e.g. de Haan and Ferreira (2006), Section 6.1.4). For example, in the bivariate case, the rate function can be represented as

I(x) = (x_1 + x_2)ψ(x_1/(x_1 + x_2)) with ψ(t) := I(t, 1 − t).

The similarity of ψ to the dependence function A of Pickands (1981) may be misleading, as a rate function defined by Eq. 3.2 and a distribution function are very different objects. Besides satisfying A(t) ≥ max(t, 1 − t) for all t ∈ [0, 1], Pickands' function A is convex, and A(1) = A(0) = 1. These latter conditions do not need to apply to ψ.
In the bivariate case with v_{12} = v_{21} =: ρ (see Example 1), if ρ > 0, then I is convex and therefore, ψ is convex. Figure 1 shows contour plots of I for ρ = 0.8 (left) and ρ = 0.2 (middle). On the right, the function ψ is plotted for these two values of ρ.

Remark 2 If lim_{y→∞} P(min_{j=1,..,m} Y_j > y)/P(Y_1 > y) > 0, then I(1) = 1 with 1 = (1, .., 1); the converse is not true. In the bivariate case, I(1) = 1 implies that ψ(1/2) = 1/2 for ψ in Remark 1. It does not fix ψ(t) at other t ∈ [0, 1] (cf. the bivariate normal distribution with standard marginals in Example 1).

Remark 3 If I is subadditive, then by Eq. 3.4, it is convex, and furthermore, by Eq. 3.3, it is a norm. Again, this condition does not need to be satisfied in general.

The family of distributions of the random variables {Y/y, y > 0} is exponentially tight (Dembo and Zeitouni 1998), which follows from Eq. 3.5 when taking E_α = {x ∈ R^m : ‖x‖_∞ ≤ α + ε} for some ε > 0. As a consequence, if the limits (3.2) exist, then Y satisfies the LDP (3.1) with good rate function I (Proposition 3). Proof By Theorem 4.1.11 in Dembo and Zeitouni (1998), the limits (3.2) together with exponential tightness imply the LDP (3.1). Then Eq. 3.7 follows from Eq. 3.1 and the exponential marginals of Y.
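These properties can be made concrete for bivariate normal dependence. The closed form of I below follows from standard Gaussian tail asymptotics after transformation to exponential marginals and is quoted here as an assumption of this illustration: a quadratic form inside the cone ρ² ≤ x_1/x_2 ≤ ρ^{−2}, with the larger coordinate dominating outside it. The sketch checks homogeneity (3.4), the spectral-type representation via ψ, and the diagonal value I(1,1) = 2/(1+ρ):

```python
import math

def gauss_rate(x1, x2, rho):
    """Rate function I(x) for bivariate normal dependence with correlation
    rho and standard exponential marginals (closed form from Gaussian tail
    asymptotics; an assumption of this illustration)."""
    lo, hi = sorted((x1, x2))
    if lo <= 0.0:
        return max(hi, 0.0)        # on the axes only the marginal rate remains
    if lo / hi < rho * rho:        # outside the "dependence cone"
        return hi                  # the larger coordinate dominates
    return (x1 + x2 - 2.0 * rho * math.sqrt(x1 * x2)) / (1.0 - rho * rho)

rho = 0.5

# Homogeneity (3.4): I(lambda * x) = lambda * I(x).
x1, x2, lam = 0.7, 1.3, 3.7
hom_gap = abs(gauss_rate(lam * x1, lam * x2, rho)
              - lam * gauss_rate(x1, x2, rho))

# Diagonal value I(1,1) = 2/(1+rho); cf. the residual dependence
# index eta = 1/I(1) in Section 4.
diag = gauss_rate(1.0, 1.0, rho)

# Spectral-type representation: with psi(t) := I(t, 1-t),
# I(x) = (x1 + x2) * psi(x1/(x1 + x2)).
s = x1 + x2
spec_gap = abs(gauss_rate(x1, x2, rho)
               - s * gauss_rate(x1 / s, x2 / s, rho))
print(hom_gap, diag, spec_gap)
```

Note that diag = 2/(1+ρ) > 1 for ρ < 1, consistent with Remark 2: for the bivariate normal, I(1) = 1 fails unless ρ = 1.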

Remark 4 Equation 3.3 is implied by (3.7).
For a continuity set A of I satisfying inf I(Ā) = inf I(A^o), the bounds in (3.1) reduce to a limit:

lim_{y→∞} y^{−1} log P(Y ∈ Ay) = −inf I(A).   (3.8)

A sufficient condition for a set A to be a continuity set of I is that I is continuous and A is contained in the closure of its interior. Homogeneity (3.4) of I allows this condition to be relaxed without assuming continuity of I; Fig. 2 illustrates the resulting criterion: let A^o be the grey set; if I attains its infimum over Ā on the part of its boundary drawn as a fat line (excluding the points indicated by circles), then A is a continuity set. In the remainder of this article, we will discuss continuity sets of rate functions without considering the particular conditions which make them so.
It is straightforward to extend Proposition 3 to a random vector X with a distribution function F having continuous marginals F_1, .., F_m. As in Eq. 2.5, let

q_i := (y ↦ F_i^{−1}(1 − e^{−y})) for i = 1, .., m,   (3.9)

and for every x ∈ [0, ∞)^m,

Q(x) := (q_1(x_1), .., q_m(x_m)).   (3.10)

With Y := (−log(1 − F_1(X_1)), .., −log(1 − F_m(X_m))), (3.11) which has standard exponential marginals, P(X ∈ Q(Ay)) = P(Y ∈ Ay), so (3.1) yields

−inf I(A^o) ≤ lim inf_{y→∞} y^{−1} log P(X ∈ Q(Ay)) ≤ lim sup_{y→∞} ... ≤ −inf I(Ā).   (3.12)

Having obtained a multivariate version of Eq. 2.10, we are now ready to generalise the univariate tail LDP (2.8) and its extension (2.11) to the multivariate context. Concerning the latter, one would expect its multivariate generalisation to be like (3.12), with Q replaced by an approximation. Let F_1, .., F_m satisfy log-GW tail limits with scaling functions g_1, .., g_m and log-GW indices θ_1, .., θ_m, respectively. As in Eq. 2.9, define marginal quantile approximations q̃_{1,y}, .., q̃_{m,y} (3.13) and

Q̃_y(x) := (q̃_{1,y}(x_1), .., q̃_{m,y}(x_m)).   (3.14)

Theorem 1 Let the random vector X = (X_1, .., X_m) have distribution function F with continuous marginals F_1, .., F_m having positive endpoints.
The proof can be found in Section 8.1.
Remark 5 This theorem justifies viewing (3.1) as representation of tail dependence within the context of the LDP (3.15), which also represents the marginal tails. The relationship between the LDPs (3.15) and (3.1) is the large deviations analogue of a similar relationship in classical extreme value theory; compare e.g. Resnick (1987), Propositions 5.10 and 5.15.
From the multivariate generalisation of Eq. 2.11, we can now also derive a multivariate version of Eq. 2.8, equivalent to the restriction of Eq. 3.15 to A ⊂ [1, ∞)^m; this is stated as (3.16). Proof See Section 8.1.
Note that Eq. 3.16 only addresses events within (q_1(y), ∞) × .. × (q_m(y), ∞), which is "covered" by all marginal log-GW tail approximations simultaneously. Just as Eq. 2.8, it can be extended somewhat. However, the main interest of Eq. 3.16 is that it shows the multivariate tail LDP explicitly as a pair of asymptotic bounds for the probabilities of extreme events defined in terms of affinely normalised logarithms of the components of X. For applications in statistics, Eq. 3.15 should be more useful, as it applies also to events which are not simultaneously extreme in every component of X.

A connection to residual tail dependence and related models
In this section, we digress from the main storyline to examine an interesting connection between the theory of Section 3 and earlier work on residual tail dependence (RTD) or hidden regular variation, introduced in Ledford and Tawn (1996, 1997) and studied in depth in Resnick (2002), amongst others. In the bivariate case, RTD offers a model of tail dependence within the classical domain of asymptotic independence of component-wise maxima (e.g. de Haan and Ferreira (2006), Section 7.6). For a random vector X on R^m with continuous marginals F_1, .., F_m, defining the random vector V := (V_1, .., V_m) with standard Pareto-distributed variables by Eq. 1.4, one way to describe RTD is that

lim_{t→∞} P(V_j > tx_j, j = 1, .., m)/P(V_j > t, j = 1, .., m) = S(x)   (4.1)

for some positive function S on (0, ∞)^m. The limiting function S satisfies S(1) = 1, with 1 the vector in R^m with all its components equal to 1. Furthermore, the denominator in Eq. 4.1 must be regularly varying, so S(λ1) = λ^{−1/η} for all λ > 0 with η ∈ (0, 1] the residual dependence index, and by Eq. 4.1,

S(λx) = λ^{−1/η}S(x) for all λ > 0 and x ∈ (0, ∞)^m.   (4.2)

Every regularly varying function f ∈ RV_α can be represented as

f(y) = c(y) exp(∫_1^y a(u)/u du)   (4.3)

with c(y) → c_0 > 0 and a(y) → α as y → ∞. A minor strengthening of regular variation is that f satisfies the Von Mises condition (see e.g. Proposition 1.15 of Resnick (1987)), which means that c in Eq. 4.3 can be taken equal to a positive number c_0; it implies that f is differentiable with derivative f′(y) = a(y)f(y)/y. Note that whenever the LDP (3.1) holds for Y given by Eq. 3.11 and inf I(A) ∈ (0, ∞) for a Borel continuity set A of I, then the function (y ↦ −log P(Y/y ∈ A)) is in RV_1. Therefore, within the context of the LDP (3.1), the statement that (y ↦ −log P(Y/y ∈ A)) satisfies the Von Mises condition makes sense as a smoothness condition. The following relates RTD to the tail LDP (3.1).

Proposition 4 (a) RTD (4.1) implies

lim_{y→∞} y^{−1} log P(Y_j > λy, j = 1, .., m) = −λ/η for all λ > 0,   (4.4)

with η the residual dependence index of X.
Proposition 4 shows that RTD implies a limited LDP-like condition and in turn, the LDP (3.1) with an additional smoothness condition implies an RTD-like condition.
If lim_{t→∞} P(V_j > t, j = 1, .., m)/t^{−1} = 0, then there is a discrepancy between the "hidden" regularity of the survival function in (0, ∞)^m described by Eq. 4.1 and the regularity of the marginals. In contrast, the LDP (3.1) provides a single consistent description of the multivariate tail which includes the marginal tails. Furthermore, the next theorem shows that under a smoothness assumption similar to the one in Proposition 4(b), the LDP (3.1) implies a useful extension of RTD. For all a ∈ R^m, let

A_a := {x ∈ R^m : x_j > a_j, j = 1, .., m}.   (4.5)

Theorem 2 (a) Assume that the LDP (3.1) applies. To any Borel set A ⊂ R^m which is a continuity set of I with (y ↦ −log P(Y/y ∈ A)) satisfying the Von Mises condition, the following limit relation applies:

lim_{y→∞} P(Y/(y + λ) ∈ A)/P(Y/y ∈ A) = e^{−λ inf I(A)} for all λ > 0,   (4.6)

with I satisfying (3.4), (3.7) and I(0) = 0. In particular, for every a ∈ [0, ∞)^m such that the function (y ↦ −log P(Y_j > ya_j, j = 1, .., m)) satisfies the Von Mises condition,

lim_{y→∞} P(Y_j > (y + λ)a_j, j = 1, .., m)/P(Y_j > ya_j, j = 1, .., m) = e^{−λ inf I(A_a)} for all λ > 0.   (4.7)

Proof For A a continuity set of I, (3.1) implies (3.8), and (4.6) is obtained in the same manner as in the proof of Proposition 4(b). In particular, A_a is a continuity set of I for every a ∈ [0, ∞)^m. Therefore, substituting A_a for A in Eq. 4.6, we obtain (4.7). This proves (a). For (b), note that f := (t ↦ 1/P(Y ∈ A log t)) ∈ RV_{inf I(A)}. Therefore, just as in the proof of Proposition 4(a), lim_{y→∞} y^{−1} log f(e^{yλ}) = λ inf I(A) for all λ ≥ 1, which implies (3.8).
Combining (a) and (b) in Theorem 2, we see that under the Von Mises condition (for A a Borel continuity set of I ), the limit relation (4.6) for a probability ratio, and the limit relation (3.8) for the normalised logarithm of a probability are equivalent.
In the special case of a = 1, Eq. 4.7 becomes equivalent to Eq. 4.1 with x = λ1 and η = 1/I(1), so on the diagonal, Eq. 4.7 and RTD (4.1) agree; elsewhere, they differ. Defining a function κ by κ(a) := inf I(A_a) for every a ∈ [0, ∞)^m, Eq. 4.7 becomes identical to an extension of RTD recently introduced in Wadsworth and Tawn (2013), who proposed this assumption to close the possible gap between (4.1) and the regularity of the marginal tails. It is curious that this condition, requiring the existence of separate limits of the survival function along chosen paths, is derivable from the simple LDP (3.1).
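To see what κ looks like in a concrete case, κ(a) = inf I(A_a) can be approximated numerically; the sketch below does this for bivariate normal dependence, using a closed-form rate function from Gaussian tail asymptotics (an assumption of this illustration, not taken from the paper). On the diagonal it recovers κ(1) = I(1) = 1/η = 2/(1+ρ):

```python
import math

def gauss_rate(x1, x2, rho):
    """Closed-form rate function I for bivariate normal dependence with
    correlation rho and standard exponential marginals (Gaussian tail
    asymptotics; an assumption of this illustration)."""
    lo, hi = sorted((x1, x2))
    if lo <= 0.0:
        return max(hi, 0.0)
    if lo / hi < rho * rho:     # outside the dependence cone
        return hi
    return (x1 + x2 - 2.0 * rho * math.sqrt(x1 * x2)) / (1.0 - rho * rho)

def kappa(a1, a2, rho, grid=300, width=4.0):
    """kappa(a) := inf I(A_a) over A_a = {x : x_j > a_j}, approximated by a
    grid search; a bounded window suffices because I(x) >= max(x1, x2)
    by (3.3), so the infimum cannot escape to infinity."""
    best = float("inf")
    for i in range(grid + 1):
        for j in range(grid + 1):
            x1 = a1 + width * i / grid
            x2 = a2 + width * j / grid
            best = min(best, gauss_rate(x1, x2, rho))
    return best

rho = 0.5
k_diag = kappa(1.0, 1.0, rho)   # expect 2/(1+rho) = 4/3, i.e. eta = 0.75
k_off = kappa(0.1, 1.0, rho)    # here the larger coordinate dominates
print(k_diag, k_off)
```

In this Gaussian example I happens to be coordinatewise nondecreasing on A_a, so the grid search simply returns I(a); in general κ(a) and I(a) can differ (see below).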
The generalisation of Eq. 4.7, with inf I(A_a) replaced by κ(a), to the apparently new limit relation (4.6) is not trivial. Another generalisation, proposed in Wadsworth and Tawn (2013), concerns local limits (4.8) for probabilities of events of the form Y ∈ B + a log t. However, the limiting behaviour of the probability of the event Y ∈ B + a log t as t → ∞ is determined by a in Eq. 4.8, not by B. Therefore, for estimating probabilities of extreme events, Eq. 4.6 seems more promising than the local limits (4.8) for chosen a.
In Eq. 4.6, it is not κ, but the rate function I which determines the attenuation rate. For any a ∈ [0, ∞)^m, I(a) and κ(a) are identical only if I(a + x) ≥ I(a) for all x ∈ [0, ∞)^m. This condition is rather restrictive, as a rate function resembles a density more than it resembles a survival function; see definition (3.2).

A simple estimator for very small probabilities
We are now going to apply the theory of Section 3 to the problem of estimation of probabilities of extreme events p_n satisfying (1.1) from X^{(1)}, .., X^{(n)}, with X^{(1)}, X^{(2)}, ... a sequence of iid copies of a random vector X in R^m with distribution function F having continuous marginals F_1, .., F_m. Denoting the underlying probability space as (Ω, F, P), let F_n ⊂ F be the σ-algebra generated by X^{(1)}, .., X^{(n)}.
Generalising (1.1) to τ_2 > τ_1 > 0, consider events of the form B_n := Q(A log n) with A ⊂ [0, ∞)^m and Q given by Eq. 3.10. Suppose that the tail LDP (3.1) applies. Then for every Borel set A ⊂ [0, ∞)^m which is a continuity set of I satisfying inf I(A) ∈ (0, ∞), homogeneity of I gives for every l > 0:

lim_{n→∞} log P(X ∈ B_n)/log P(Q(Y/l) ∈ B_n) = 1/l.   (5.1)

This suggests estimating the left-hand side of Eq. 5.1 by replacing Q on the right-hand side by an estimator Q̂_n and Y by an estimator Ŷ_n, and then choosing l small enough that P(Q̂_n(Ŷ_n/l) ∈ B_n) can be estimated nonparametrically by counting.
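This counting construction can be sketched end-to-end in a few lines, mimicking the bivariate-normal simulation of Section 6, but with exact marginal transformations in place of Q̂_n; the event, seed, bisection bracket and numerical guards are choices of this illustration. The cloud is scaled by 1/l until k_n points fall in B, and homogeneity then converts k_n/n back to the target range:

```python
import math
import random
from statistics import NormalDist

nd = NormalDist()
rng = random.Random(12345)

# Bivariate normal dependence (rho = 0.5) with standard exponential
# marginals, as in the simulations of Section 6.
rho, n, k = 0.5, 5000, 20
a1, a2 = 0.5, 1.0
sample = []
for _ in range(n):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    # exact marginal transformation Y_j = -log(1 - Phi(Z_j))
    sample.append((-math.log(nd.cdf(-z1)), -math.log(nd.cdf(-z2))))

# Halfspace event a.U > c on the Gaussian scale, with c chosen so that
# the true probability is 4e-8 (a1*Z1 + a2*Z2 is normal with variance
# a1^2 + 2*rho*a1*a2 + a2^2).
sd = math.sqrt(a1 * a1 + 2.0 * rho * a1 * a2 + a2 * a2)
c = -sd * nd.inv_cdf(4e-8)

def in_B(y, l):
    """Is the scaled point y/l inside the event? (back-transforms to the
    Gaussian scale; the clamp guards against floating-point underflow)."""
    u = [-nd.inv_cdf(max(math.exp(-t / l), 1e-15)) for t in y]
    return a1 * u[0] + a2 * u[1] > c

def count(l):
    return sum(in_B(y, l) for y in sample)

# l_hat = sup{l > 0 : count(l) >= k}, found by bisection: scaling the
# cloud by 1/l with smaller l pushes more points into the event.
lo, hi = 1e-3, 1.0
for _ in range(40):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if count(mid) >= k else (lo, mid)
l_hat = lo

# Homogeneity-based probability estimate; the true value is 4e-8.
estimate = (k / n) ** (1.0 / l_hat)
print(l_hat, estimate)
```

The estimate agrees with the true value in the sense in which consistency is proven: the ratio of log-probabilities is close to 1, even though the probability itself is far below 1/n.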
Estimation of Q boils down to a univariate quantile estimation problem, so we proceed to examine this first. Assume that every marginal satisfies a log-GW tail limit (i.e., the univariate tail LDP; see Section 2). Let X_{j,1:n} ≤ ... ≤ X_{j,n:n} be the marginal order statistics derived from the marginal sample X^{(1)}_j, .., X^{(n)}_j. For some intermediate sequence (k_n) and for n large enough that X_{j,n−k_n+1:n} > 0 for j = 1, .., m, define the following estimator q̂_{j,n} for q_j (compare (3.13)): with y_n := log(n/k_n),

q̂_{j,n}(z) := X_{j,n−k_n+1:n} exp(ĝ_{j,n}h_{θ̂_{j,n}}(z/y_n)) for z > y_n, and q̂_{j,n}(z) := X_{j,⌈n(1−e^{−z})⌉:n} for z ≤ y_n.

For z > y_n, q̂_{j,n}(z) follows a log-GW tail, with θ̂_{j,n} and ĝ_{j,n} estimators for θ_j and g_j in Eq. 3.13, respectively; for other z, the empirical quantile is used as estimator. The only assumption we make on the quantile estimator is that the probability-based quantile estimation error ν̂_{j,n}, defined by comparing q̂_{j,n} to q_j on [y_n, Ty_n], vanishes asymptotically for every T > 1.
The proof can be found in Section 8.3.
In practice, computing or approximating (5.9) may not be easy; for example, in engineering applications, it may involve running a complex numerical model for every datapoint. Therefore, it would be an advantage to replace ℓ̂_n^+(B) in Eq. 5.10 by an arbitrary value in some suitable interval. Define for some ϑ ∈ (0, ξ]

ℓ̂_n^−(B) := sup{l > 0 : p̂_n(Q̂_n(Ŷ_n/l) ∈ B) ≥ (k_n/n)^ϑ};   (5.14)

for the present analysis, it is sufficient to assume that ℓ̂_n(B) is a random variable satisfying ℓ̂_n^−(B) ≤ ℓ̂_n(B) ≤ ℓ̂_n^+(B). Now consider the following generalisation of the estimator (5.10):

π̂_n^{II}(B) := p̂_n(Q̂_n(Ŷ_n/ℓ̂_n(B)) ∈ B)^{1/ℓ̂_n(B)}.   (5.15)

The proof can be found in Section 8.2. The constraints on (k_n), ξ and ϑ ensure that for some ε ∈ (0, 1/2), eventually, n^ε < np̂_n(Q̂_n(Ŷ_n/ℓ̂_n(B_{n,τ})) ∈ B_{n,τ}) < n^{1−ε}. This does not seem restrictive for applications.
In practice, based on a few trial values of ℓ̂_n(B) which give "acceptable" numbers of np̂_n(Q̂_n(Ŷ_n/ℓ̂_n(B)) ∈ B), one could check the stability of π̂_n^{II}(B) with respect to np̂_n(Q̂_n(Ŷ_n/ℓ̂_n(B)) ∈ B).

Numerical examples
First, we discuss simulations, considering the case of a bivariate normal random vector U with standard normal marginals and correlation coefficient ρ = 0.5. We are not yet concerned with marginal estimation, so for X, we take the random vector with standard exponential marginals obtained from U by marginal transformations; therefore, X equals Y defined by Eq. 3.11 in this case.
As extreme events, we consider halfspaces, i.e., U ∈ {x ∈ R^2 : a_1x_1 + a_2x_2 > c} for some a ∈ R^2 and c > 0; their probabilities are easily calculated. In terms of X, these events are represented by X ∈ B with

B := {x ∈ [0, ∞)^2 : a_1Φ^{−1}(1 − e^{−x_1}) + a_2Φ^{−1}(1 − e^{−x_2}) > c},   (6.1)

with Φ the standard normal distribution function. Experiments were performed with a_2 = 1 and with several different values of a_1, with c in each case chosen to ensure that P(X ∈ B) = 4 · 10^{−8}. In all experiments, n = 5000, and the estimator (5.10) was applied with ξ = 1 and k_n = 20.

Fig. 3 Simulation with bivariate normal dependence and exponential marginals (see text). From left to right: (a) Ŷ_n (dots) and failure event B given by Eq. 6.1 (grey); (b) Ŷ_n/ℓ̂_n^+(B) and failure event; (c) the classical analogue Ŷ_n + λ̂_n(B)1 of (b) (see main text).

Our first case concerns a_1 = 0.5; B is shown as a grey patch in Fig. 3. Figure 3a shows Ŷ_n, which has no datapoint in B. The stretched data cloud Ŷ_n/ℓ̂_n^+(B) is shown in Fig. 3b, with k_n = 20 datapoints in B; ℓ̂_n^+(B) equals 0.334. According to Eq. 5.10, the probability of B is estimated as (20/5000)^{1/0.334} = 6.6 · 10^{−8}.
To appreciate how this estimator differs from the classical approach, an estimator similar to Eq. 5.10 but based on the classical multivariate tail limit (1.5) was applied as well: π̂_n^{cl}(B) := (k_n/n)e^{−λ̂_n(B)} with λ̂_n(B) := inf{l > 0 : p̂_n(Ŷ_n + l1 ∈ B) ≥ k_n/n}, with 1 = (1, 1). It is similar to the estimator considered in de Haan and Sinha (1999) and Drees and de Haan (2015), without marginal estimation. Figure 3c shows Ŷ_n + λ̂_n(B)1 with λ̂_n(B) = 8.92; the corresponding probability estimate equals 5.4 · 10^{−7}. The qualitative difference between Fig. 3b and c is striking.

Fig. 4 Simulations with bivariate normal dependence and exponential marginals (see text). From left to right: (a) boundaries of failure events (6.1) labelled by a_1, for a_2 = 1; (b) RMSE of the logarithm of probability as function of a_1 for estimator (5.10) (circles), its classical analogue (diamonds) and its classical analogue accounting for residual tail dependence (squares); (c) bias for the estimators as under (b).

For ρ < 1, the limiting measure in Eq. 1.5 is concentrated on the boundaries, so the classical estimator is not expected to do well in this case. Therefore, in addition, a correction of the classical estimator based on residual tail dependence (4.1) was applied, cf. Draisma et al. (2004).
The results in Fig. 4 indicate that the standard deviation is generally small in comparison to the bias, despite the small value of k_n used. The two classical estimators perform better or worse depending on the value of a_1, but the LDP-based estimator (5.10) performs consistently as well as or better than both classical estimators in all cases.
The probability estimator (5.10) can also be applied to estimate the survival function. This makes it possible to compare it to the estimator for the survival function proposed in Wadsworth and Tawn (2013) (Sections 5.1 and 5.2), based on Eq. 4.7 with inf I(A_a) replaced by κ(a), estimated using an approach employing the Hill estimator. With both estimators, the same simulations were carried out as reported in Section 5.3 of Wadsworth and Tawn (2013): for the X considered above, estimates of the survival function F^c(x_1, x_2) were made with x_2 = 1.5 log n and x_1/x_2 = 0.05, 0.10, .., 0.50. With n = 5000, k_n = 20, 500 realisations, and Ŷ_n replaced by the exact Y (see (3.11)) as in Wadsworth and Tawn (2013), the RMSE of the logarithm of probability for Eq. 5.10 was 11–17% higher than for the estimator from Wadsworth and Tawn (2013), which performed similarly to an estimator based on the conditional probability approach of Heffernan and Tawn (2004) (see Section 5.3 of Wadsworth and Tawn (2013)). This is an encouraging result for an estimator as simple and widely applicable as (5.10).
The final case is a trial application of the estimator (5.15) to an oceanographic dataset, in order to estimate the mean fraction of time that wave run-up reaches the crest of a fictitious seawall. Figure 5 (upper right) shows simultaneous 3-hourly values of wave period measured at the offshore site YM6 in the North Sea and surge level from the nearby harbour of IJmuiden, provided by Rijkswaterstaat, the Netherlands. The dataset covers 24 years (n = 70128). For this trial, a strongly simplified version of a model from TAW (2002) is used to approximate the run-up height of the 2 % highest waves on the seawall from wave period and still water level. The set B of wave period/water level combinations leading to wave run-up exceeding 15 m is indicated by the grey area in the same figure. In the model, the mean depth at the seawall base is 0 m and the seawall has a flat smooth 1:4 slope. The RMS wave height at the base is approximated by its upper bound from Ruessink et al. (2003). For the water level, we use surge data, ignoring the astronomical tide. Dependence on wave direction is ignored in marginals and nearshore wave transformation. Because of all these simplifications, the estimates obtained do not carry concrete relevance to coastal flood safety.
Contrary to the assumptions made earlier, the 3-hourly surge and wave period are serially dependent. Since we are estimating a fraction of time, serial dependence need not invalidate the estimate; its principal effect is that the estimate is less precise than it would have been if the process were iid. Imposing a minimum separation of 24 hours between storm events, the 41 datapoints moved into B in Fig. 5 (lower right) represent 18 distinct events, giving a mean duration per event of 6.8 hours. Using this value, the estimate π̂_n^II(B) can be converted to an estimate of the frequency of wave run-up exceeding 15 m; its value is 1.7 · 10^−4 per year. Evidently, this unconventional but intuitively appealing variation of the peaks-over-threshold approach would need formal underpinning by a model of serial dependence in order to be taken seriously.
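The conversion from a time-fraction estimate to an annual frequency uses only the mean event duration. A minimal sketch, where the time-fraction value 1.3·10^−7 is a hypothetical stand-in chosen to be consistent with the quoted frequency of 1.7·10^−4 per year:

```python
def events_per_year(time_fraction, mean_event_hours, hours_per_year=24 * 365.25):
    """Convert a mean fraction of time spent in B into an event frequency:
    frequency = (expected hours per year in B) / (mean hours per event)."""
    return time_fraction * hours_per_year / mean_event_hours

# Mean event duration: 41 three-hourly datapoints in B over 18 distinct events.
mean_dur = 41 * 3 / 18                # about 6.8 hours per event

# Hypothetical time-fraction estimate of order 1.3e-7 gives a frequency of
# roughly 1.7e-4 exceedances per year, matching the order quoted in the text.
freq = events_per_year(1.3e-7, mean_dur)
```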

Discussion
Like similar methods in the classical setting (e.g. de Haan and Sinha (1999), Drees and de Haan (2015), Draisma et al. (2004)), the estimators (5.10) and (5.15) exploit homogeneity of a function describing tail dependence, in this case the homogeneity (3.4) of the rate function I. This offers the advantage that no explicit estimate of I is required. However, in certain situations there may be good reasons to estimate I, for example when, for a given random vector X, probabilities of multiple sets need to be estimated in a consistent and reproducible manner. Therefore, estimation of I remains a topic deserving elaboration.
The limitation of A to continuity sets of I in Theorems 3 and 4 is less restrictive than it may seem, since the homogeneity of I makes continuity sets rather common, as noted in Section 3. The other conditions on A are weak.
To prove convergence of the estimators under such weak conditions, local uniformity in d of the convergence in Eq. 8.5 is employed, which is derived from uniformity in d of the convergence in (8.12). The latter also ensures local uniformity in λ of the convergence in Eq. 8.5, and therefore local uniformity in τ of the convergence of the estimators in Eqs. 5.12 and 5.16. In practice, this means that if such an estimator applied to a given dataset produces a fair estimate of P(X ∈ B_0) for some B_0 ⊂ R^m, then it may also be applied with confidence to the same dataset to estimate the probability of B_1 ⊂ R^m such that P(X ∈ B_1) ≥ P(X ∈ B_0)^τ for τ > 1 not too large, e.g. τ = 2. If P(X ∈ B_0) ≪ 1, e.g. P(X ∈ B_0) = 0.01, this amounts to extrapolation over several additional orders of magnitude in probability. How far one can extrapolate in practice will depend on the rates of convergence to the marginal log-GW tail limits and in Eq. 3.1, which will differ from case to case.
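A minimal numerical illustration of the extrapolation range described above (the numbers are the hypothetical ones from the text, not output of the estimator):

```python
# If the estimator is judged to work at a set B0 with P(X in B0) = 0.01,
# local uniformity in tau suggests it can be trusted for sets B1 with
# P(X in B1) >= P(X in B0) ** tau for moderate tau > 1.
p0 = 0.01
tau = 2.0
p1_floor = p0 ** tau   # about 1e-4: two further orders of magnitude
```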
Convergence of log-probability ratios as in Eqs. 5.12 and 5.16 is typical for the probability range (1.1). A stronger notion of convergence might be desirable, but would require restrictive additional assumptions which would be hard to justify in applications. Rather, it is recommended to diagnose bias in estimates and take this into account in estimates of uncertainty. For this reason, modelling of bias and rate of convergence deserves further study.
Deriving asymptotic error distributions will require additional assumptions beyond those for Theorems 3 and 4 and methods quite different from those employed in the present article. Because it is complex (see e.g. Drees and de Haan (2015) for a comparable problem), this important topic needs to be left for a follow-up study as well.
The theory is readily extended from events involving a high value of at least one of the variables to events extreme "in any direction", by replacing the exponential distribution as standard marginal by the Laplace distribution cf. Keef et al. (2013). Other choices of standard marginal are also possible, with minor adaptations to theory and estimator.
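A rank transform to standard Laplace marginals can be sketched as follows (the implementation details are ours): each column is mapped to (0, 1) by scaled ranks and then passed through the standard Laplace quantile function, which is log(2u) for u ≤ 1/2 and −log(2(1 − u)) otherwise.

```python
import numpy as np

def to_laplace_marginals(X):
    """Rank-transform each column of X to (approximately) standard Laplace
    marginals, suitable for events extreme 'in any direction'."""
    n = X.shape[0]
    # ranks 0..n-1 per column, shifted and scaled into (0, 1)
    U = (np.argsort(np.argsort(X, axis=0), axis=0) + 0.5) / n
    # standard Laplace quantile function applied elementwise
    return np.where(U <= 0.5, np.log(2 * U), -np.log(2 * (1 - U)))
```

The transform is strictly increasing in the ranks, so it preserves the within-column ordering of the data while making the marginals symmetric about zero.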
Furthermore, the main results of this article can be generalised straightforwardly from a random vector in R^m to a random element of C_b(K), the continuous functions on a compact metric space K. Classical multivariate extreme value theory and estimation have been generalised to this setting earlier; see e.g. de Haan and Lin (2001), Part III of de Haan and Ferreira (2006), Einmahl and Lin (2006) and de Haan (2014). For the theory presented here, the main difference between the R^m setting and the C_b(K) setting is that in the latter, exponential tightness of {P(Y/y ∈ ·), y > 0} no longer follows from the exponential marginals; it is an independent assumption. In loose terms, it entails that all but an exponentially small probability mass is concentrated on an equicontinuous set of functions in C_b(K) (see e.g. Dembo and Zeitouni (1998)).

Proof of Theorem 1 and Corollary 1
Convergence in Eq. 2.6 is locally uniform in λ (e.g. de Haan and Ferreira (2006)). For every y > 0, Q_y is injective, so we can define the random vector Ỹ_y, with Y defined by Eq. 3.11. By Eq. 8.1, there exists almost surely for every ℓ > 1 and δ > 0 some y_{ℓ,δ} > 0 such that for all y ≥ y_{ℓ,δ}, ‖Ỹ_y − Y‖_∞ > δy implies ‖Y‖_∞ > ℓy. Therefore, by Eq. 3.5, since ℓ > 1 is arbitrary, for all δ > 0, lim_{y→∞} y^{−1} log P(‖Ỹ_y − Y‖_∞ > δy) = −∞. (8.3) By Proposition 3, the distribution functions of {Y/y, y > 0} satisfy the LDP (3.1) with good rate function I, so Eq. 8.3 implies the same for the distribution functions of {Ỹ_y/y, y > 0}; see Theorem 4.2.13 of Dembo and Zeitouni (1998). Therefore, Eq. 3.15 follows from (8.2). To prove (b), note that by Eqs. 3.15 and 3.7, lim_{y→∞} y^{−1} log(1 − F_j(q_j(y) e^{g_j(y) h_{θ_j}(λ)})) = −λ for all λ ≥ 1 and j = 1, ..., m, so Eq. 2.6 holds with q = q_j, g = g_j and θ = θ_j for j = 1, ..., m. As in the proof of (a), this implies (8.3). Moreover, Eq. 3.7 implies (3.3), so I is a good rate function. An application of Theorem 4.2.13 of Dembo and Zeitouni (1998) completes the proof of the theorem.

Proof of Theorem 3
Following the proof of Theorem 4 in Section 8.2, Eqs. 8.7 and 5.10 yield lim_{n→∞} [y_n^{−1} log π̂_n^I(Q(y_n Aλ)) + λ inf I(A)] = 0, and the result Eq. 5.12 follows as in the proof of Theorem 4.