Semiparametric bivariate modelling with flexible extremal dependence

Inference over multivariate tails often requires a number of assumptions which may affect the assessment of the extreme dependence structure. Models are usually constructed in such a way that extreme components can either be asymptotically dependent or be independent of each other. Recently, there has been an increasing interest on modelling multivariate extremes more flexibly, by allowing models to bridge both asymptotic dependence regimes. Here we propose a novel semiparametric approach which allows for a variety of dependence patterns, be them extremal or not, by using in a model-based fashion the full dataset. We build on previous work for inference on marginal exceedances over a high, unknown threshold, by combining it with flexible, semiparametric copula specifications to investigate extreme dependence, thus separately modelling marginals and dependence structure. Because of the generality of our approach, bivariate problems are investigated here due to computational challenges, but multivariate extensions are readily available. Empirical results suggest that our approach can provide sound uncertainty statements about the possibility of asymptotic independence, and we propose a criterion to quantify the presence of either extreme regime which performs well in our applications when compared to others available. Estimation of functions of interest for extremes is performed via MCMC algorithms. Attention is also devoted to the prediction of new extreme observations. Our approach is evaluated through simulations, applied to real data and assessed against competing approaches. Evidence demonstrates that the bulk of the data do not bias and improve the inferential process for extremal dependence in our applications.


Introduction
Precise knowledge of the tail behaviour of a distribution as well as predicting capabilities about the occurrence of extremes are fundamental in many applications, as for instance environmental sciences and finance. Evidence points to an increasing trend of such extreme events in environmental applications with associated economic and insurance losses growing dramatically (Salvatori et al.

Electronic supplementary material
The online version of this article (https://doi.org/10.1007/s11222-019-09878-w) contains supplementary material, which is available to authorized users. ). In most cases, the analysis of such extreme events is inherently multivariate. Interest is then on the concomitant observation of extremes on a number of variables. For instance, the effects on the human respiratory system are particularly dramatic after exposure to high concentrations of both ozone O 3 and nitrogen dioxide NO 2 .
Since standard statistical methods do not guarantee precise extrapolation towards the tail of the distribution, a variety of methods tailored to inference about tails have been introduced under the general name of extreme-value theory. Whilst univariate models can be faithfully applied in most applications, since their underlying assumptions are flexible enough to be met in practice, the application of multivariate methods often requires a number of assumptions which may impact the inferential process. First, a large number of models assume that variables are asymptotically dependent, meaning that there exist a dependence between the marginal extreme events (e.g. Boldi and Davison 2007;Sabourin and Naveau 2014). Such models thus exclude the possibility that, although dependence between random variables exists at finite levels, their extremes will be independent in the limit, a condition usually referred to as asymptotic independence. Second, the application of such methods requires the arbitrary selection of data points considered "extreme", usually selected as those that exceed a fixed threshold. However, this choice can greatly affect the inferential process (Scarrott and MacDonald 2012;Wan and Davis 2019). Finally, data is usually transformed via the empirical cumulative function so that each marginal has the same distribution. However, this transformation is known to bias inference on extremes (Einmahl and Segers 2009).
A new sound, flexible approach is proposed here to model multivariate extremes that requires little, if no, assumptions and that has the capability of bridging the two possible cases of asymptotic dependence and independence (there has been an increasing interest in the development of models with this property, e.g. Huser et al. 2017;Huser and Wadsworth 2019;Wadsworth and Tawn 2012;Wadsworth et al. 2017). Our approach formally uses in a model-based fashion the full dataset, thus not requiring the arbitrary selection of an extreme region of points (similarly to Aulbach et al. 2012;Vrac et al. 2007). Our applications show that the bulk of the data does not bias the inferential ascertainment of the asymptotic dependence structure. The approach proposed here splits modelling into two tasks: modelling marginals using some recently developed methodology for univariate extremes justified by the asymptotic theory for tails, and modelling dependence using a flexible semiparametric copula structure which does not require any assumption about the tail dependence decay. Although the use of copula functions with specific marginals to build multivariate models is not new, we show below that our semiparametric specification has critical implications for inference on extremes.
Inference is carried out within the Bayesian paradigm using the MCMC machinery (Gamerman and Lopes 2006), enabling us to straightforwardly deliver a wide variety of estimates and predictions of quantities of interest, e.g. high quantiles. The flexibility of our approach makes it computationally intensive but easily handled. Here we thus restrict our attention to bivariate problems (for some recent references on bivariate extremes see e.g. Camilo and de Carvalho 2017;Engelke et al. 2018;Guillotte et al. 2011;Wadsworth et al. 2017). However, multivariate extensions of the approach are readily available and discussed in Sect. 8.
The code for the implementation of our approach is written in OX (Doornik 1996) 1 . Multi-purpose software for the application of the methods is currently being developed.
The paper is structured as follows. Section 2 reviews univariate models for exceedances and extreme-value mixture 1 The code implementing the models for the applications of Sect. 7 is freely available at https://github.com/manueleleonelli/Biv_ext_exc. models. Section 3 deals with multivariate extremes and measures of extreme dependence. Having described in Sect. 2 models for marginals, we introduce copulae and mixtures of these in Sect. 4. This section further provides an overview of the inferential results for mixtures we use. Our approach and inferential routines are described in Sect. 5. Section 6 presents a simulation study to both investigate their performance and address the issue of model choice. In Sect. 7 our methodology is applied to two real-world applications: river flows in Puerto Rico and NO 2 /O 3 concentrations in the city of Leeds. We conclude with a discussion.

Peaks over threshold approach
A common approach to model extremes, often referred to as peaks over threshold (POT), studies the exceedances over a threshold. A key result to apply this methodology is due to Pickands (1975) which states that, under general regularity conditions, the only possible non-degenerate limiting distribution of properly rescaled exceedances over a threshold u is the generalized Pareto distribution (GPD). Its cumulative distribution function (cdf) P is defined as for u, ξ ∈ R and σ ∈ R + , where the support is x ≥ u if ξ ≥ 0 and u ≤ x ≤ u − σ/ξ if ξ < 0. Therefore, the GPD is bounded if ξ < 0 and unbounded from above if ξ ≥ 0. The application of this result in practice entails first the selection of a threshold u beyond which the GPD approximation appears to be tenable and then the fit of a GPD over data points that exceed the threshold. The POT approach has two serious drawbacks. First, only a small subset of the data points, those beyond the threshold, are formally retained in a model-based approach during the inferential process. Thus, parameter estimates may not be reliable when the number of data points is small. Second, the choice of the threshold over which to fit a GPD is arbitrary. Although tools to guide this choice exist (e.g. Davison and Smith 1990), inference can greatly vary for different thresholds Scarrott and MacDonald 2012).

The MGPD model
A variety of models called extreme-value mixture models (Scarrott and MacDonald 2012) have been recently defined to formally take into account the full dataset and not require a fixed threshold. These combine a flexible model for the bulk of the data points, those below the threshold, a formally justifiable model for the tail and uncertainty measures for the threshold. A building block of our approach is the extreme-value mixture model of Nascimento et al. (2012), which we henceforth refer to as mixture of gamma Pareto distribution (MGPD). This consists of a finite mixture of gamma distributions for the bulk coupled with a GPD for the tail. The parametrization of the gamma suggested in Wiper et al. (2001) in terms of shape, η, and mean, μ, parameters is used. Its density, g, is g(x|μ, η) = Γ (η) −1 (η/μ) η x η−1 exp (−ηx/μ), and its cdf is denoted by G.
A finite mixture of these distributions is defined next. For n ∈ N, let [n] = {1, . . . , n}. The density h and the cdf H of a finite mixture of n gammas are formally defined as and w is such that w i ≥ 0 and i∈ [n] w i = 1. The density f of an MGPD then consists of a mixture of gamma densities h for the bulk and a GPD density p for the right tail. Formally, where Θ = {μ, η, w, ξ, σ, u}. An example of an MGPD density fitting simulated data is presented in Fig. 1, where it is clearly discernible that the bulk of the distribution consists of a mixture of 2 gammas, whilst beyond the threshold the density has GPD decay.
The cdf of an MGPD F is similarly defined in a piecewise fashion. Whilst below the threshold u, this is the cdf of the mixture of gammas H , over the threshold, i.e. for x > u, it can be written as F(x|Θ) = H (u|μ, η, w) + (1 − H (u|μ, η, w)) P (x|ξ, σ, u).
A great advantage of the MGPD model is that high quantiles beyond the threshold, i.e. q values such that P(X > Fig. 1 Example of a MGPD density fit consisting of a mixture of 2 gammas for the bulk: solid line-MGPD density; dashed line-threshold q|Θ) = 1 − p for p > F(u|Θ), have a closed-form expression. Specifically, this is a function q of both the probability p and the parameter Θ defined as Nascimento et al. (2012) showed that the MGPD can outperform standard POT methods in situations where determination of the threshold is difficult. So nothing is lost using this approach instead of considering only the extreme points as in the standard POT approach. The number of mixture components for the bulk can be safely estimated as shown by the extensive studies of Nascimento et al. (2011Nascimento et al. ( , 2012Nascimento et al. ( , 2016. More details on the estimation of mixtures are discussed in Sect. 4.

Priors for the MGPD model
The MGPD model definition is completed by an appropriate prior distribution for Θ, as given in Nascimento et al. (2012). A gamma prior with shape c i and mean d i is assigned to each η i , where these parameters may be chosen to achieve a large prior variance. The parameter space of μ is restricted to to ensure the parameters' identifiability (see Sect. 4 for details). To each μ i is assigned an inverse gamma prior with shape a i and mean b i , where again these parameters may be chosen to achieve a large variance. The prior for μ is thus π(μ) = K i∈ [n] and zero otherwise, f I G is the inverse gamma density and K −1 = C(μ) i∈ [n] The weights of the gamma mixture, w i , are assigned a Dirichlet D(1 n ) prior, where 1 n is a vector of ones of dimension n. The prior of the threshold u is normal with mean chosen around a high-order sample statistic, since the GPD approximation can be expected to hold only over the tail of the data. The variance is chosen so that the bulk of the prior distribution ranges roughly over data points in the upper half. Nascimento et al. (2012) empirically showed that for large datasets the value of the variance does not affect the estimation of the threshold, whilst for smaller datasets the above restriction may need to be imposed to correctly estimate the threshold u. This is the only partially informative prior used, but the hyper-parameter choices are guided by the theoretical results of Pickands (1975).
The hyper-parameters above can be changed to effectively include expert prior information without affecting our inferential routines.

Asymptotics and models
Modelling approaches for multivariate extremes rely on limiting results of componentwise maxima and are mainly due to de Haan and Resnick (1977). One of these limiting results is briefly discussed next (see, e.g. Beirlant et al. 2004, for a comprehensive review). Let , be independent and identically distributed random vectors with marginal unit Fréchet distributions with cdfs exp(−1/x), x ∈ R + . If M n = max i∈[n] X i j j∈ [d] converges in distribution as n → ∞ to a non-degenerate cdf R, then R( The main point here is that the limiting distribution of the componentwise maximum M n cannot be described in a parametric closed form, but consists of a nonparametric family characterized by the spectral functions respecting the "mean" constraint. The generality of this result has lead to the definition of a variety of approaches to model multivariate extreme observations. We can broadly identify three different strategies: -define a parametric submodel for either the exponent measure Tawn 1991, 1994) or the spectral measure (Ballani and Schlather 2011;Boldi and Davison 2007); -model in a nonparametric fashion the class of MEVD distributions (Einmahl and Segers 2009;Guillotte et al. 2011); -construct models based on other theoretical justifications (De Carvalho and Davison 2014;Ramos and Ledford 2009;Wadsworth et al. 2017). In all cases, data are usually transformed via the empirical cdf into Fréchet or uniform margins and then some of the data points, those considered "extreme", are formally retained for inference. Having already discussed the difficulty of assessing such a threshold in the univariate case, the identification of extreme data points becomes even more critical in multivariate applications since there is no unique definition of threshold.
To illustrate this, consider the different bivariate threshold choices in Fig. 2. Figure 2a, b shows that an observation is extreme if it is beyond the threshold in all or in at least one component, respectively. These thresholds are usually utilized when estimating simultaneously marginal and joint features of the data. The threshold in Fig. 2c describes as extreme an observation such that the sum of its components is larger than a specified value and is often used when only modelling dependence. The last threshold in Fig. 2d is associated with the so-called partially censored approach: an observation below a marginal threshold in any component is supposed to be censored at the threshold.
Although the theoretical limiting result of maxima can be expected to hold in the region specified by the threshold in Fig. 2a, all other thresholds are more commonly utilized to increase the sample size effectively retained for inference. Furthermore, the choice of such thresholds is often driven by the type of analysis required or computational simplifications. A flexible method that takes into account the full dataset is developed here to avoid making the arbitrary choices of thresholds location and type.

Extreme dependence
MEVDs are asymptotic distributions and the strength of dependence given by them represents a measure of the asymptotic dependence in a random vector. However, in many practical applications dependent variables are observed to be asymptotically independent (Davison et al. 2013;Ledford and Tawn 1997;Ramos and Ledford 2009) and many commonly used distributions exhibit this behaviour: for example, the bivariate normal with correlation ρ ∈ [−1, 1), ρ = 0. Due to a result of Berman (1961), multivariate extreme independence can be assessed by investigating all pairs of random variables. We thus focus on bivariate vectors. Sibuya (1960) proved that two random variables X 1 and X 2 with cdfs F 1 and F 2 are asymptotically independent iff the tail dependence χ is equal to zero, where χ = lim u→1 χ(u), and χ(u) = P(F 1 (X 1 ) > u|F 2 (X 2 ) > u). For instance, for a bivariate MEVD distribution χ = 0 iff X 1 and X 2 are independent, whilst χ = 0 for any bivariate Gaussian with dependence ρ = 1. To address this deficiency of the MEVD distribution, novel extreme models that can take into account asymptotic dependence and independence have been proposed (e.g. Heffernan and Tawn 2004;Ramos and Ledford 2009;Wadsworth et al. 2017).

Copulae and mixtures
Given that all issues about marginals have been addressed, all is left is modelling dependence. Copulae are flexible functions to model complex relationships in a simple way. These only model the dependence structure of a random vector and allow for marginals to be defined separately (see Nelsen 2006, for a review). Their use to model multivariate extremes has been increasing over the past few years (e.g. Huser and Wadsworth 2019;Huser et al. 2017). For a random vector X = (X i ) i∈ [d] with cdf F, whose margins have cdfs [d] is an instantiation of X. Sklar (1959) proved that such a C linking marginal and joint distributions always exists. Notice that C is a cdf itself and, if absolutely continuous, possesses a density c called copula density and defined as c where f i and f are the densities of X i and X respectively.
Copula functions and finite mixture models have recently been combined (e.g. Kim et al. 2013) to depict an even wider variety of patterns of dependence. Formally, a mixture of n copulae C i is defined for i∈ [n] Our approach described in detail in Sect. 5 below is based on a finite mixture of copulae functions with marginals given by MGPDs, thus including mixtures of gamma distributions. It therefore requires the identification of the number of components for each mixture as well as the estimation of their weights. However, this is easily handled. In practical applications, a small number of components, either copulae or gammas, is usually required as shown by Dey et al. (1995), Nascimento et al. (2012) and Rousseau and Mengersen (2011). In practice, this means that the weight of any extra component is estimated to be zero. Thus, mixture estimation can be carried out using two equivalent routines: by either separately fitting models with an increasing number of components until one is estimated to have zero weight, or by fitting one model only with a large-enough number of components so that only the required ones have nonzero weight (see Supplementary Material for an example). Alternatively, the choice of the number of components can be straightforwardly automated using a fully nonparametric approach, but this is typically not required (see, e.g. Fúquene Patiño 2015; De Waal and Van Gelder 2005).
A relevant aspect of mixture models is their inherent lack of identifiability, but this can be solved by imposing some restrictions over the parameters. Diebolt and Robert (1994) and Frühwith-Schnatter (2001) impose order restrictions over the means in Gaussian mixtures. The same procedure is applied by Wiper et al. (2001) and Nascimento et al. (2012) over the means of a mixture of gammas, as reported in equation (2). Similarly, in our approach, we impose such an order restriction over the copulae components by ordering them according to their correlation coefficient. This guarantees that all components of our approach can be correctly identified.

Likelihood
For each marginal, an MGPD with density and cdf f i and F i , respectively, and parameters are the parameters of a mixture of n i gammas as in equation (1). The dependence structure is modelled by a mixture of n copulae C i with weights w = (w i ) i∈ [n] and and its density f equals where c i is the associated copula density, i ∈ [n]. Although our approach does not require any restriction on the chosen copulae, in this work mixtures of elliptical copulae are used: more specifically, Gaussian (Song 2000), T (Demarta and McNeil 2005), skew-normal (Wu et al. 2014) and skew-T (Smith et al. 2012) copulae. Furthermore, all mixture components are assumed to belong to the same family, e.g. Gaussian. Such mixtures have the very convenient property of a known asymptotic behaviour: whilst mixtures of Gaussians and skew-normals have asymptotically independent extremes, Ts and skew-Ts exhibit extreme dependence (Kollo et al. 2017). We discuss below in our simulation study that mixtures combining different elliptical structures did not provide any additional information about the extreme structure.
Consider now bivariate vectors only. The specific form of our densities follows by substituting c i in equation (3) with the expressions in Supplementary Material. Whilst for mixtures of Gaussian copulae, no constraints are imposed, for the other models, we impose the following constraints on the likelihood in equation (3): -for T copulae, all components have the same number of degrees of freedom in R + . This is to ensure that all components lead to the same asymptotic dependence structure; -for skew-normal copulae, all components have the same skewness parameters. Our simulations showed that such parameters are particularly hard to estimate and could not be precisely estimated under more general settings; -for skew-T copulae, one single component with integer degrees of freedom is used. This greatly speeds up computations using the formulae of Dunnett and Sobel (1954) and ensure all parameters can be correctly estimated.
As well as having closed-form expressions for marginal quantiles, bivariate quantiles can be easily deduced in our models. However, these are not uniquely defined since there are infinitely many pairs (x 1 , x 2 ) such that P(X 1 > x 1 , X 2 > x 2 |Θ) is equal to a specified number. Thus we look at pairs (x 1 , x 2 ) and compute the associated probability of joint exceedance P(X 1 > x 1 , X 2 > x 2 |Θ). This is a function E of (x 1 , x 2 ) and Θ defined as Similarly, our approach leads to closed-form expressions for the probabilities χ(u|Θ) andχ(u|Θ) appearing in the (subasymptotic) tail dependences. This is because, for instance, χ(u|Θ) = E(F −1 1 (u|Θ 1 ), F −1 2 (u|Θ 2 ), Θ)/ P(F 1 (X 1 |Θ 1 ) > u) and these two probabilities have closedform expressions.

Prior distribution
Our approach is completed by the introduction of a prior distribution, defined by considering separate blocks of parameters. The independent prior of each marginal parameter vector Θ i is the one of Nascimento et al. (2012) and reported in Sect. 2.3. Our simulation study showed that to ensure the threshold location is correctly identified in all cases, 95% of the prior probability needs to roughly range between the 75th and 95th data quantile: such a prior is still in line with the theory of Pickands (1975) whilst giving enough uncertainty to allow the data to guide the choice of the threshold location.
For correlation coefficients ρ i , a continuous uniform U[−1, 1] is selected. The joint π(ρ) is defined over a restricted space as for the mean parameters of the gamma mixtures to ensure identifiability. For skew copulae, a continuous uniform U[−1+ , 1− ] is assigned to the skewness parameters δ j , for an close to zero. The copulae mixture weights w i are given a Dirichlet D(1 n ). These priors are chosen to give uninformative prior beliefs.
For the degrees of freedom v of the T copula, the uninformative prior of Fonseca et al. (2008) is used, defined as where γ is the trigamma function (Abramowitz et al. 1965).
For the skew-T copula with integer degrees of freedom a zero-truncated Poisson distribution with mean 25 is used. Sensitivity studies showed that this value enabled for the identification of both low and high number of degrees of freedom.

Posterior and predictive inference
For a sample x = (x i ) i∈ [m] , where x i = (x 1i , x 2i ), the posterior log-density is then Inference cannot be performed analytically, and approximating MCMC algorithms are used. Parameters are divided into blocks and updating of the blocks follows Metropolis-Hastings steps since full conditionals have no recognizable form. Proposal variances are tuned via an adaptive algorithm as suggested in Roberts and Rosenthal (2009). Details are given in Supplementary Material. All algorithms are implemented in OX (Doornik 1996). The posterior in equation (5) is proper, and all parameters can be identified by our MCMC inferential routines. Most quantities of interest in the analysis of extremes, e.g. χ(u|Θ), are highly nonlinear functions of the models' parameters. Thus, their posterior distribution cannot be derived analytically. However, the MCMC machinery enables us to derive an approximated distribution for any function of the models' parameters. For instance, for I draws Θ (i) , i ∈ [I ], from the posterior π(Θ|x), the values χ(u|Θ (i) ) approximate the posterior distribution of χ(u|Θ), given a sample x. An estimate of the posterior mean is then 1 I i∈I χ(u|Θ (i) ). Estimation is an important task in extreme-value theory as much as the prediction of a new observation x m+1 given a sample x. The likelihood of a new observation can be summarized by the predictive distribution of joint exceedance E(x m+1 |x) given by This corresponds to the expectation of equation (4) with respect to the posterior π(Θ|x). This expectation cannot be computed analytically, but our Bayesian approach enables us to derive an approximated Monte Carlo estimate equal to 1 I i∈ [I ] E(x m+1 |Θ (i) ).

Ascertainment of extreme independence
A critical task in the analysis of multivariate extremes is the determination of the asymptotic dependence structure. However, very few models are able to take into account both extreme dependence and independence, and consequently discriminate one from the other. Our semiparametric Bayesian approach enables us to introduce a fully probabilistic, new criterion for the ascertainment of asymptotic independence based on the posterior distribution of the number of degrees of freedom of the T copula.
Recall that T and skew-T copulae tend to Gaussian and skew-normal ones, respectively, when the number of degrees of freedom v → ∞, and consequently large posterior estimates of the number of degrees of freedom may indicate asymptotically independent extremes. Thus, for a fixed c ∈ R + , we define the criterion φ(c) = P(v ∈ [c, ∞)|x) which gives an uncertainty measure about the possibility that χ = 0 and thus that extremes are independent. Of course, the assessment of the extreme dependence behaviour depends on the choice of c, but our simulations below give some guidance on how to choose this value and demonstrate that a fairly large interval of c values lead to similar conclusions. Values of φ(c) close to zero give an indication towards asymptotic dependence, whilst for φ(c) close to one, the evidence is towards asymptotic independence. Henceforth, we report the estimates of φ(c) only for the T mixtures since for this model the prior of the number of degrees of freedom is non-informative.

Simulations
A simulation study, performed to validate selection criteria for our models, is summarized next. Importantly, this exercise enables us to validate the use of the number of degrees of freedom to assess extreme dependence.
Priors are chosen as in Sect. 5.2. Prior means of μ i and η i , i ∈ [2] are selected around the true values if available, or around reasonable values after visual investigation of the data histograms, but with large variances. The prior means of the thresholds are fixed at the 90th empirical quantile.
For all simulations, algorithms are run for 25000 iterations, with a burn-in of 5000 and thinning every 20, giving a posterior sample of 1000. Convergence is assessed by looking at trace plots of various functions of the parameters. In all cases, the number of gamma mixture components of each marginal is first chosen by fitting different MGPD models. The number of copula components is then identified by independently fitting models with an increasing number of components until the weight of one is estimated to be zero (as in Fig. 3). In all cases, no more than two components are required. Note, however, that all parameters, both those of the marginal MGPDs and those of the copula densities, are estimated jointly. Figure 3 reports the summary φ(c) as a function of c for mixtures of T copulae estimated over the simulated datasets. With the exception of the slowest decaying solid line, which is associated with data simulated from a mixture of T with 7 degrees of freedom, φ(c) decays at a clearly different rate for asymptotically dependent and independent data. Depending on how one measures the loss of a misspecification of either behaviour, then different values of c can be chosen to devise a criterion to discriminate the extreme regimes.
Here we take a neutral position and assume both regimes have the same probability (φ(c) = 0.5). For this choice, values of c roughly in [7, 15] discriminate the two asymptotic behaviours, where such discrimination appears to be strongest for c = 10. Thus, hereafter, φ = φ(10) denotes our summary of evidence towards asymptotic independence. This is confirmed by Table 1 which summarizes the posterior means and credibility intervals of the number of degrees of freedom: these are more concentrated around larger values in asymptotically independent datasets. Our chosen coefficient φ, taking notably larger values for asymptotically independent datasets in Table 1, gives a sound uncertainty statement about the possibility of asymptotic independence. The only exception is the dataset from a mixture of T copulae for which the true number of degrees of freedom is seven: thus a value for φ around 0.5 is not surprising. Table 1 includes the estimates of the parameter δ of the model of Huser and Wadsworth (2019) which discriminates between asymptotic dependence and independence: for δ < 0.5 extremes are independent whilst for δ > 0.5 they are dependent. To fit the model, the R package SpatialADAI is used. Data are first transformed to uniform margins via the empirical cdf and thresholds at values 0.95 for each margins are used as in Huser and Wadsworth (2019). Different threshold choices did not affect the results. In most cases, the parameter estimate is close to the boundary case of 0.5 and most 95% confidence intervals include the possibility of either asymptotic behaviour, thus providing little information about the tail behaviour.
Standard model selection criteria, e.g. BIC (Schwarz 1978) and DIC (Spiegelhalter et al. 2002), reported in Table 2 and calculated using the parameters' posterior means, although giving guidance on the presence of skewness, do not provide information about extreme dependence, possibly because these are mostly influenced by the bulk of the data.
A model consisting of a mixture of Gaussian and T copulae (where all T are constrained to have the same number of degrees of freedom) is further fitted to all simulated datasets. Gaussian and T are chosen both for computational simplicity and because BIC and DIC criteria are overall lower for these models (unless data comes from skew-normal and skew-T models). These choices are natural since we validated their use in our simulation study. In all cases the weights of the Gaussian components are estimated to be zero and the estimates of the number of degrees of freedom coincide with the ones in Table 1. This exercise confirms the that main factor discriminating between asymptotic behaviours is indeed The row φ reports our criterion for the model T and δ 95 reports the estimate of the δ parameter of Huser and Wadsworth (2019) using a 0.95 threshold. The datasets used are: mixture of 2 Gaussian copulae (2G); skew-normal copula (SN); Morgenstern copula (MO); bilogistic copula (BL); mixture of 2 T copulae (2T); skew-T copula (ST); asymmetric logistic copula (AL); and Cauchy copula (CA) In bold are reported the lowest scores for each column. The datasets used are: mixture of 2 Gaussian copulae (2G); skew-normal copula (SN); Morgenstern copula (MO); bilogistic copula (BL); mixture of 2 T copulae (2T); skew-T copula (ST); asymmetric logistic copula (AL); and Cauchy copula (CA) the number of degrees of freedom and consequently our φ criterion.

Applications
Two datasets from environmental applications are analysed next: -weekly maxima from August 1966 to June 2016 of the flows of Fajardo and Espiritu Santu rivers in Puerto Rico, comprising 2492 observations (Nascimento et al. 2012); -daily maxima of the hourly means during the winter months in 1994-1998 of NO 2 /O 3 concentrations in Leeds, comprising 532 observations (Heffernan and Tawn 2004).
The Puerto Rico rivers dataset (Fig. 4a) is freely available at waterdata.usgs.gov, whilst the Leeds pollutants dataset (Fig. 4b) can be found in the texmex R package (Southworth et al. 2017). These are chosen for their apparent different asymptotic dependence: in Fig. 4 the Puerto Rico rivers seem to have strong extreme dependence, whilst the Leeds pollutants appear to have independent extremes (as noted in Heffernan and Tawn 2004). Some of the data points are not used for model fitting but to test the predictive capabilities of our and other approaches. Specifically, 1000 and 100 observations are randomly selected and discarded from the Puerto Rico rivers and Leeds contaminants datasets, respectively. These numbers are chosen to have test datasets smaller than the fitting ones and so to easily compute quantiles.
Our approach is compared against the asymptotically independent multivariate Gaussian tail model of Bortot et al. (2000), the best asymptotically dependent model in the EVD R package (Stephenson 2002) and the models of Ramos and Ledford (2009) that can account for both dependent and independent extremes. For all these models, marginal thresholds were selected as in Ledford and Tawn (1997) at a high empirical quantile of the variable min(− log(F 1 (X 1 )) −1 , − log(F 2 (X 2 )) −1 ), whereF is the empirical cdf . In this study, different empirical quantiles of this variable are used, namely the 90, 95 and 97.5 quantiles 2 . For each threshold and marginal, a GPD is first fitted to the exceedances using a POT approach and then the data are transformed into Fréchet margins via empirical cdf for data below the threshold and GPD cdf otherwise. Bivariate extreme models are fitted over the resulting datasets.

Model choice and measures of asymptotic dependence
The first step of the data analysis is to decide the number of mixture components for both the marginal gammas and the copulae. This is straightforwardly handled by following the procedure summarized in Sect. 4. Then the best copula mixture for each dataset is determined. Figure 5, reporting the coefficient φ(c), gives a clear indication of the asymptotic behaviour of the data: asymptotic independence for the Leeds pollutants and dependence for the Puerto Rico rivers. This is confirmed in Table 3 by the estimates of the number of degrees of freedom. The table further reports the estimates of the parameter δ of Huser and Wadsworth (2019). Whilst for the Leeds pollutants data the indication of asymptotic independence is strong, for the Puerto Rico rivers estimates are close to 0.5. Furthermore, these heavily depend on the threshold chosen: the estimate of δ indicates asymptotic dependence for a 0.9 threshold, and asymptotic independence otherwise (although in these last two cases the confidence interval includes 0.5). BIC and DIC scores are reported in Table 4, which again strongly confirm asymptotic independence for Leeds pollutants. Since the posterior credibility intervals of the skewness parameters for all skew models include zero, we choose the Gaussian for the Leeds contaminants and the T for the Puerto Rico rivers as our favourite models. In Fig. 6a, b, the posterior estimates of χ(u|Θ), u ∈ [0.8, 1), for our chosen models are reported. For both applications, the posterior means give a good fit to the associated empirical estimates from the fitting and test datasets. These two diagrams give a further indication of asymptotic dependence for the Puerto Rico rivers, as χ(u|Θ) tends to a strictly positive value, and asymptotic independence for Leeds pollutants, as χ(u|Θ) goes to zero. Similar conclusions are drawn from the probabilitiesχ(u|Θ) in the subasymptotic tail dependence reported in Fig. 6c, d . To the limit these confirm the asymptotic behaviour shown by χ(u|Θ), since for instance for the Leeds pollutantsχ(u|Θ) goes to −0.1. Notice that our approach, utilizing the full dataset, can give model-based estimates of both χ(u|Θ) andχ(u|Θ) for any u ∈ (0, 1). These are reported in Supplementary Material. For the Leeds pollutants dataset, the threshold 0.8 is the largest one for which R did not return an error

Predictions
The performance in extreme predictions of our approach is studied next. Marginally, as already noted in Nascimento et al. (2012), the MGPD can outperform the POT methodology. This is reported in Table 5 for the Puerto Rico rivers. Importantly, the table shows that joint modelling gives not only much narrower posterior credibility intervals than a simpler MGPD model, but also predicted values closer to the empirical ones. The same observation is true when comparing our approach with the results from the POT estimates and their confidence intervals. The properties of the posterior distribution of E(x 1 , x 2 |Θ), the probability of joint exceedance of (x 1 , x 2 ), are summarized in Table 6 for a number of values of (x 1 , x 2 ) exceeding the used thresholds, together with estimates from the other approaches considered as well as the empirical probabilities of the test data. Our approach outperforms competing ones for the Leeds pollutant dataset in all pairs, since our estimates are closer to the empirical ones than those from the other approaches. For the Puerto Rico rivers dataset, our estimates are closer to the empirical ones for all pairs but the one associated with an exceedance probability of 0.005. In all cases, the 95% posterior credibility intervals from our approach include the empirical probability. The posterior distributions of E(x 1 , x 2 |Θ) at a number of fixed values (x 1 , x 2 ) are further reported in Fig. 7. Such distributions are in general not available using the approaches reviewed in Sect. 3.
Lastly, Fig. 8 reports the Monte Carlo estimates of the predictive probabilities of exceedance E(x m+1 |x). Each point (x 1 , x 2 ) of this map gives the probability of a future obser-vation that is larger than both x 1 and x 2 . These provide an intuitive description of the overall behaviour of the test datasets. Again, such predictive summaries are often not available for other approaches. Although the probabilities in Fig. 8b appear to be highly asymmetrical, this is mostly due to the effect of the marginals (see Supplementary Material for a map not affected by marginals). Thanks to the capability of our approach of modelling marginals and dependence simultaneously, exceedance probabilities can exhibit a variety of flexible forms.

Effect of the bulk on estimation of extreme dependence
An analysis over a subset of the full datasets, including only points considered extreme, is next carried out to ascertain whether the bulk of the data affects our tail estimation approach. The extreme points are selected as follows: first only observations that exceed the chosen thresholds in both marginals are retained (as in Fig. 2a); for the Puerto Rico rivers application, the threshold locations are chosen at the posterior means of the thresholds of the T copula model (giving 190 observations); for the Leeds pollutants the thresholds were selected to give a marginal probability of exceedance of 0.3 as in Heffernan and Tawn (2004) (giving 49 observations); lastly the margins of the resulting data points are transformed to the uniform scale via the empirical cdf.
Mixtures of T copulae are first fitted to these datasets to investigate whether the asymptotic dependence behaviour chosen by looking at the full dataset is confirmed when considering only extreme points. The results of this analysis, summarized in Table 7, confirm the asymptotic behaviours identified in Sect. 7.1, but give much larger posterior credibility intervals to the number of degrees of freedom and thus uncertainty about the true extreme regime.
Having assessed the asymptotic dependence structure over the extreme points only using our T model, the extreme-value copulae (Gudendorf and Segers 2010) associated with the T and Gaussian copulae are then fitted to the extreme datasets of the Puerto Rico rivers and Leeds pollutants applications, respectively. However, for the Puerto Rico rivers, a Gumbel copula given by and θ ∈ [1, +∞) is used instead since this has an almost identical extreme dependence structure to the one of the extreme T copula, whilst being much simpler to estimate as discussed by Demarta and McNeil (2005). For the Leeds pollutants, a Gaussian copula is used since the associated extreme copula would simply be an independent one. Table 8 summarizes the posterior distributions of the relevant coefficients of tail dependence when estimated using the full dataset or the extreme points only. In both cases, the posterior means are around the values of the empirical coefficients reported in Fig. 6, but importantly the credibility intervals are narrower for the full dataset.
The issue of model choice between model exhibiting different extreme dependence structures was investigated as well as the performance of our approach in extremes' predictions. Furthermore, great attention was devoted to the identification of the extreme dependence behaviour by defining the new criterion φ which gives a probabilistic judgement on the possibility of asymptotic dependence. The results suggest that our Bayesian semiparametric approach gives a strong alternative to other bivariate approaches: the φ criterion robustly identifies the extreme dependence structure in our simulation study as well as in our applications.
A natural extension of the approach described here could consider two different copulae specifications in disjoint subsets of R 2 + , as for example done in Aulbach et al. (2012) and Vrac et al. (2007). Such subsets might correspond to the ones defined by the thresholds illustrated in Fig. 2. Such distinction would allow for the use of the full dataset whilst specifying a different dependence pattern for the extreme region, should one wish to do so. So, for instance, the likelihood could be defined as Emp. Pred. is the empirical estimate from the test dataset; EVD is the estimated probability from the best model in the EVD package; Bortot is the estimate from the model of Bortot et al. (2000); Ramos is the estimate from the model of Ramos and Ledford (2009). All these models are fitted using thresholds at the 90, 95 and 97.5 empirical levels  where c b and c t are two different copula densities, K is a normalizing constant and B ⊂ R 2 is the region including non-extreme points. This more general specification brings in extra components and complications (K depends on model parameters in a non-trivial form) and handling them is not so straightforward. Solutions for these issues are the subject of ongoing research. Although in this paper the focus was mainly on bivariate problems, multivariate extensions are readily available. For instance, mixtures of d-variate elliptical copulae could be considered. A full definition of the approach would then be completed by an appropriate prior for the covariance matrix, for instance an inverse-Wishart, and an appropriate identification constraint for matrices, for example based on the determinant. But more interestingly, since different pairs of variables could be defined to have a different asymptotic dependence, the overall density could be defined via vine-copulae (Bedford and Cooke 2002). For instance, in the trivariate case the overall density via a vine-copula decomposition can be written as f (x 1 , x 2 , x 3 ) = c 12 (F 1 (x 1 ), F 2 (x 2 ))c 23 (F 2 (x 2 ), F 3 (x 3 )) c 13|2 (F 1|2 (x 1 |x 2 ), F 3|2 (x 3 |x 2 )) i∈ [3] f i (x i ) where F i| j (x i |x j ) = ∂C i j (F i (x i ), F j (x j ))/∂ F j (x j ) and the c's are bivariate copula densities. We are currently investigating such models.