1 Introduction

A common goal in environmental statistics is evaluating and assessing risk. Risk evaluation usually entails estimating marginal or joint extreme exceedance probabilities, sometimes for values not yet observed in the data. Similarly, risk assessment involves estimating the magnitude of an extreme event that has not been observed, which, by definition, has a small probability of occurrence. In both cases, extrapolation techniques that can accurately represent the tails of the distribution are required. Due to the non-stationary nature of environmental data, these techniques must be able to incorporate the influence of external factors that could play a critical role in understanding rare events and tail behaviours. Therefore, from a modelling perspective, marginal and conditional estimation of high quantiles and the estimation of joint and marginal tail probabilities are of utmost importance in risk management.

Here we present a combination of extreme value analysis (EVA), quantile regression, dependence modelling through copulae, and bootstrap methodologies to estimate these high conditional and marginal quantiles and probabilities associated with extreme events. EVA focuses on modelling the extreme tails of distributions, providing insights into events with low probabilities but high impact (Embrechts et al. 2013). Over the last few years, EVA has undergone important extensions, from purely independent and identically distributed (iid) settings to non-stationary multivariate, spatial, and spatio-temporal domains (Heffernan and Tawn 2004; Rootzén and Tajvidi 2006; Rootzén et al. 2018; Castro-Camilo et al. 2019; Simpson and Wadsworth 2021; Castro-Camilo et al. 2021). Quantile regression (Koenker and Bassett Jr 1978) extends traditional regression analysis by estimating different quantiles of the response variable, offering a comprehensive view of the conditional distribution. However, conventional quantile regression estimators are often unstable at the extreme tails due to data sparsity in the tail region. For example, the uniform consistency of the classical linear \(\tau \)-quantile regression estimator holds only for \(\tau \in [\epsilon ,1-\epsilon ]\), where \(\epsilon \) is arbitrarily chosen in (0, 0.5) (Gutenbrunner and Jurecková 1992; Koenker 2005). In recent years, there has been an increasing number of methodological developments in extreme quantile regression (Chernozhukov et al. 2017); see, e.g., Daouia et al. (2013), Velthoen et al. (2023), Wang and Li (2013), as well as applications in finance (Chernozhukov and Fernández-Val 2011) and environmental sciences (Wang et al. 2012).

The methods presented here also incorporate dependence modelling through copulae to capture complex relationships between variables, essential for understanding joint extreme events (Durante and Salvadori 2010; Gudendorf and Segers 2010). We also use bootstrap methods to assess the uncertainty associated with quantile estimates, which is especially crucial in scenarios with limited data or complex dependencies. We tailor these methods to the four tasks involving univariate and multivariate problems presented in the EVA (2023) Conference Data Challenge (Rohrbeck et al. 2023). The challenges (denoted by C1, C2, C3, and C4) were centred on an artificial country called Utopia, for which we lack general knowledge of environmental processes. However, the data contained some special properties, allowing a more tractable understanding of environmental extremes and their contributing factors.

The remainder of the paper is organised as follows. Section 2 contains a brief description of the data and data pre-processing. Section 3 presents two approaches considered to estimate extreme conditional quantiles. Section 4 details our approach to estimating marginal quantiles optimising a specific loss function. Sections 5 and 6 show multivariate approaches to estimate joint exceedance probabilities. Section 7 offers concluding remarks and reflects on the choices made in light of having the truth revealed in Rohrbeck et al. (2023).

2 Data and data pre-processing

The data provided include a univariate response variable Y for C1 and C2, and multidimensional responses for C3 (3 dimensions) and C4 (50 dimensions). For C1, we also have four covariates denoted by \(V_{1}\), \(V_{2}\), \(V_{3}\), and \(V_{4}\), as well as season, wind speed and direction, and atmosphere. For C3, only season and atmosphere are provided. No covariates are used in C2 and C4; see Rohrbeck et al. (2023) for a comprehensive description of the data. Two regions in Utopia contain 70 years of data, while a third contains 50 years. Every year contains twelve 25-day months, so the number of time points ranges between \(n = 50\times 12\times 25 = 15,000\) and \(n = 70\times 12\times 25 = 21,000\). We are advised that the data are expected to be stationary, with no large trends in time except for season and atmosphere, which are cyclical. Additionally, any spatial information and knowledge of environmental processes on Earth are irrelevant, and 11.7% of the observations have at least one missing value among the covariates other than atmosphere and season. These missing values were subsequently removed from the training dataset.

Let \(W=(W_1, W_2)^{\top }\) denote the random wind vector, where \(W_1\) represents wind speed and \(W_2\) represents wind direction. To avoid having to account explicitly for the circular nature of wind direction, we transform \(W_1\) and \(W_2\) to the eastward and northward wind components, represented by the variables \(W_E\) and \(W_N\), respectively, defined as

$$\begin{aligned} W_E = W_1\cos (W_2),\qquad W_N =W_1\sin (W_2). \end{aligned}$$
(1)

The meteorological convention is that the \(W_E\) component is positive for a west-to-east flow (eastward wind), and the \(W_N\) component is positive for a south-to-north flow (northward wind).
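To make the transformation in Eq. 1 concrete, the following R sketch computes the two components; it assumes the wind direction is supplied in radians (if it is given in degrees, it should first be multiplied by \(\pi /180\)), and the column names are illustrative placeholders rather than those of the challenge data.

# Eq. 1: eastward and northward wind components from speed (W1) and direction (W2)
wind_components <- function(W1, W2) {
  data.frame(
    WE = W1 * cos(W2),  # positive for west-to-east (eastward) flow
    WN = W1 * sin(W2)   # positive for south-to-north (northward) flow
  )
}
# Example: dat <- cbind(dat, wind_components(dat$wind_speed, dat$wind_direction))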

3 C1: extreme conditional quantile regression

The goal of this challenge is to provide pointwise estimation and associated uncertainty assessment of the 99.99% conditional quantile \(q_{\text {C1}}(\textbf{x}_i)\) of the response variable Y given covariates \(\textbf{x}_i\):

$$\begin{aligned} F_{Y\vert \textbf{X}=\textbf{x}_i}(q_{\text {C1}}(\textbf{x}_i)) = \text {Pr}(Y < q_{\text {C1}}(\textbf{x}_i)\mid \textbf{X} = \textbf{x}_i) = p_{\text {C1}} = 0.9999. \end{aligned}$$
(2)

Specifically, we aim to provide pointwise predictions for the conditional quantile of Y and its associated central 50% confidence interval for a test dataset comprising 100 different covariate combinations. The complexity of the problem lies in the lack of data to estimate the high quantile in Eq. 2, which makes usual models centred around mean values and classical validation tools such as k-fold cross-validation unsuitable. While discussing and testing different ways to tackle C1, we found two methods with competitive performance: the first assumes a generalised Pareto distribution (GPD) for exceedances over a high threshold, with parameters described using a generalised additive modelling (GAM) framework; we call this the GAM-GPD approach. The second method uses quantile extrapolation techniques via sequences of intermediate quantiles. In this section, we show the construction and results for both techniques and discuss why we use the GAM-GPD approach for our data challenge submission.

3.1 Method 1: GAM-GPD

A natural way to tackle C1 is to use threshold exceedance models based on the generalised Pareto distribution. Specifically, provided Y lies in the max-domain of attraction of a generalised extreme value distribution, exceedances of an iid sequence from Y over a high threshold are asymptotically distributed according to a GPD, i.e.,

$$\begin{aligned} \text {Pr}(Y-u>y\mid Y>u) = \left( 1+\frac{\xi y}{\sigma }\right) ^{-1/\xi }, \end{aligned}$$
(3)

where u is the threshold, \(y>0\) is such that \(1+\xi y/\sigma > 0\), and \(\sigma >0\) and \(\xi \in \mathbb {R}\) are the scale and shape parameters, respectively (the case \(\xi = 0\) is interpreted as the limit \(\exp (-y/\sigma )\)). There is no general consensus regarding the best way to select u, and this is still an active research area. The most common threshold validation techniques rely on graphical tools (Coles 2001), although recently, Murphy et al. (2023) proposed an automatic threshold selection procedure with integrated uncertainty quantification. In most applied problems, we need to capture potential non-stationarities when selecting the threshold, as ignoring them could bias the inference. Additionally, the magnitude of threshold exceedances could also be non-stationary, meaning that the size of exceedances could be explained through external factors or covariates. One way to account for non-stationarity in threshold exceedances is to let the parameters in Eq. 3 change with the values of covariates. Section 3.1.1 below details our approach to threshold selection, while Section 3.1.2 describes a variable selection procedure to choose the best GPD fits and details how we combine them for improved quantile estimation.
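For later use, the GPD tail in Eq. 3 can be written as a small R helper; this is a generic sketch (not the authors' code) that assumes a scalar shape parameter and handles the \(\xi \approx 0\) case through its exponential limit.

# Survival function of the GPD in Eq. 3, evaluated at an exceedance y = Y - u > 0;
# sigma may be a vector (covariate-dependent scale), xi is assumed scalar.
gpd_surv <- function(y, sigma, xi, tol = 1e-8) {
  stopifnot(all(y >= 0), all(sigma > 0))
  if (abs(xi) < tol) {
    exp(-y / sigma)                         # exponential limit as xi -> 0
  } else {
    pmax(1 + xi * y / sigma, 0)^(-1 / xi)   # positive part enforces the support
  }
}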

3.1.1 Threshold selection

We investigate different approaches for threshold selection defined through model-based quantiles, ranging from constant high quantiles to quantile regression within a generalised additive modelling framework that includes covariate effects. Validation plots (not shown here) indicate that, in general, using constant quantiles provides better fits than their covariate-dependent counterparts. For \(\alpha = 0.9, 0.93, 0.95, 0.99\), we estimate the \(\alpha \)-quantile of the data, denoted as \(u_{\alpha }\), exploiting a link between the quantile loss function and the asymmetric Laplace distribution (Yu and Moyeed 2001). Exceedances over the fitted \(\alpha \)-quantile \(\widehat{u}_{\alpha }\) are defined as \(Z_i = Y_i - \widehat{u}_{\alpha } \vert Y_i > \widehat{u}_{\alpha }\) and are modelled as

$$\begin{aligned} Z\mid Z>0, \textbf{X} = \textbf{x}_i&\sim F_{Z\vert Z>0}\equiv \text {GPD}(\sigma (\textbf{x}_i), \xi (\textbf{x}_i)),\nonumber \\ \log \{\sigma (\textbf{x}_i)\}&= \beta _{\sigma ,0} + \beta _{\sigma ,1}\times \text {Season}_i + \beta _{\sigma ,2}V_{1,i} + \beta _{\sigma ,3}V_{2,i} + \beta _{\sigma ,4}V_{3,i} + s_{\sigma ,4}(V_{4,i}) + \beta _{\sigma ,5}\text {Atm}_i \nonumber \\ \xi (\textbf{x}_i)&= \xi _0, \end{aligned}$$
(4)

where \(\beta _{\sigma ,0}\) is an intercept, \(\beta _{\sigma ,1},\ldots ,\beta _{\sigma ,5}\) are regression coefficients, and \(s_{\sigma ,4}\) is a flexible term represented using reduced rank smoothing splines (Wood 2017, Chapter 4). The constant shape in Eq. 4 was chosen for simplicity, as this model is only used to assess the selection of the threshold \(\widehat{u}_\alpha \). Defining \(\tilde{q}_\alpha = u - \widehat{u}_{\alpha }\) for some \(u > \widehat{u}_{\alpha }\), we have that

$$\begin{aligned} p_{\alpha }&= \Pr (Y<u\mid \textbf{X}=\textbf{x}_i) = \Pr (Z< \tilde{q}_\alpha \mid Z>0, \textbf{X} = \textbf{x}_i)\Pr (Z>0) + \Pr (Z<0) \nonumber \\&= F_{Z\vert Z>0}(\tilde{q}_\alpha )\zeta _Z + 1 - \zeta _Z, \end{aligned}$$
(5)

where \(\zeta _Z = \Pr (Z>0)=\Pr (Y>\widehat{u}_{\alpha })\) is the exceedance rate, assumed to be constant and empirically estimated (note that \(\widehat{\zeta }_Z\) should be approximately \(1-\alpha \)), and \(F_{Z\vert Z>0}(\tilde{q}_\alpha )\) is estimated using Eq. 4. We discuss the suitability of the assumption of constant exceedance rate in Section 7. Note that Eq. 5 provides a way to link the modelling of Z with that of the target conditional quantile \(q_{\text {C1}}(\textbf{x}_i)\). Indeed, defining \(\tilde{q}_{\text {C1}}(\textbf{x}_i) = q_{\text {C1}}(\textbf{x}_i) - \widehat{u}_{\alpha }>0\) and using Eqs. 2 and 5 we have that

$$\begin{aligned} p_{\text {C1}} = \text {Pr}(Y < q_{\text {C1}}(\textbf{x}_i)\mid \textbf{X} = \textbf{x}_i) = F_{Z\vert Z>0}(\tilde{q}_{\text {C1}}(\textbf{x}_i))\zeta _Z + 1 - \zeta _Z, \end{aligned}$$
(6)

with \(p_{\text {C1}} = 0.9999\). This will be used in Section 3.1.2 to provide an estimate for \(q_{\text {C1}}(\textbf{x}_i)\).
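The link in Eqs. 5 and 6 is straightforward to code once fitted GPD parameters are available. The sketch below reuses the gpd_surv() helper defined after Eq. 3; its inputs (fitted threshold, scale, shape, and exceedance rate) are assumed to come from the model in Eq. 4 and are not the authors' exact implementation.

# Eq. 5: Pr(Y < u | X = x_i) for a candidate level u > u_alpha, given the fitted
# threshold u_alpha, GPD parameters sigma(x_i) and xi, and exceedance rate zeta.
prob_below <- function(u, u_alpha, sigma, xi, zeta) {
  q_tilde <- u - u_alpha
  stopifnot(all(q_tilde > 0))
  F_exc <- 1 - gpd_surv(q_tilde, sigma, xi)   # F_{Z | Z > 0}(u - u_alpha)
  F_exc * zeta + 1 - zeta
}
# The pp-plots in Fig. 1 compare nominal p_alpha in [0.9995, 0.99999] with
# prob_below(quantile(y, p_alpha), u_alpha_hat, sigma_hat, xi_hat, zeta_hat).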

The threshold \(\widehat{u}_\alpha \) is then selected by assessing the goodness of fit of Eq. 5 when \(u = \widehat{F}_Y^{-1}(p_\alpha )\), where \(\widehat{F}_Y(\cdot )\) denotes the empirical cdf of Y and \(p_{\alpha }\in [0.9995, 0.99999]\). Note that here we use the marginal cdf (or, more precisely, the quantile function) of Y only to define u since, as mentioned above, our initial experiments show that GPD fits are better when using a constant threshold. Figure 1 shows probability-probability plots (pp-plots) constructed using the points \(\{(p_\alpha ^{(k)},\hat{p}_\alpha ^{(k)}),k=1,\ldots ,K\}\), where \(K=50\), \(0.9995\le p_{\alpha }^{(1)}\le \cdots \le p_{\alpha }^{(K)}\le 0.99999\), and \(\hat{p}_\alpha ^{(k)}\) are estimated using Eq. 5. The grey lines in Fig. 1 correspond to \(B=300\) block-bootstrap estimates, with the bootstrap average in blue and the estimates using the original data in black. Some bootstrap samples display large variability, especially for probabilities close to 0.9995, but the estimates using the original data and the bootstrap average are closely aligned with the nominal probabilities when using the \(\alpha = 0.95\) quantile, which offers a good trade-off between the number of exceedances available and the quality of the fit, especially around the target 99.99% quantile.

Fig. 1

Comparison of estimates of high probabilities \(p_\alpha \in [0.9995,0.99999]\) against \(p_\alpha \) computed using Eq. 5 based on exceedances over the \(\alpha =90\%, 93\%, 95\%\) and \(99\%\) empirical quantile. Black lines show the fitted \(p_{\alpha }\) using the original data, while grey lines show the \(B=300\) block-bootstrap estimates. The blue lines are the averages over all bootstrap samples

3.1.2 A bootstrap model averaging approach for improved quantile estimations

As mentioned in Section 3.1.1, the GPD fit in Eq. 4 is only used to select a suitable high threshold (\(\widehat{u}_{0.95}\)) based on estimates of high probabilities using Eq. 5. Therefore, in this section, we aim to choose the best model to fit exceedances over \(\widehat{u}_{0.95}\). Although, in principle, this could be done jointly with the threshold selection, we decided on a pragmatic approach for computational reasons. All GPD models tested follow a similar formulation to that in Eq. 4, but now the GPD shape parameter is also allowed to vary with covariates. Among all models tested, three models displayed competitive performance. The models, denoted as \(\mathcal {M}_m\), \(m=1,2,3\), are

$$\begin{aligned} \mathcal {M}_1:\quad \log \{\sigma (\textbf{x}_i)\}&= \beta _{\sigma ,0}^1 + \beta _{\sigma ,1}^1\times \text {Season}_i + s_{\sigma ,1}^1(\text {Atm}_i) + s_{\sigma ,2}^1(V_{1,i},V_{2,i}) \nonumber \\&\quad + s_{\sigma ,3}^1(V_{3,i},V_{4,i}) + s_{\sigma ,4}^1(W_{E,i},W_{N,i}) ,\nonumber \\ \xi (\textbf{x}_i)&= \beta _{\xi ,0}^1.\nonumber \\ \mathcal {M}_2:\quad \log \{\sigma (\textbf{x}_i)\}&= \beta _{\sigma ,0}^2 + \beta _{\sigma ,1}^2\times \text {Season}_i + \beta _{\sigma ,2}^2\times \text {Season}_i\times V_{4,i} + s_{\sigma ,1}^2(\text {Atm}_i) \nonumber \\&\quad + s_{\sigma ,2}^2(V_{1,i},V_{2,i})+ s_{\sigma ,3}^2(V_{3,i},V_{4,i}) + s_{\sigma ,4}^2(W_{E,i},W_{N,i}) ,\nonumber \\ \xi (\textbf{x}_i)&= \beta _{\xi ,0}^2 + \beta _{\xi ,1}^2\times \text {Season}_i.\nonumber \\ \mathcal {M}_3:\quad \log \{\sigma (\textbf{x}_i)\}&= \beta _{\sigma ,0}^3 + \beta _{\sigma ,1}^3\times \text {Season}_i + s_{\sigma ,1}^3(\text {Atm}_i) + s_{\sigma ,2}^3(V_{1,i},V_{2,i}) \nonumber \\&\quad + s_{\sigma ,3}^3(V_{3,i},V_{4,i}) + s_{\sigma ,4}^3(W_{E,i},W_{N,i}) ,\nonumber \\ \xi (\textbf{x}_i)&= \beta _{\xi ,0}^3 + \beta _{\xi ,1}^3\times \text {Season}_i + s_{\xi ,1}^3(\text {Atm}_i) \nonumber \\&\quad + s_{\xi ,2}^3(V_{1,i},V_{2,i}) + s_{\xi ,3}^3(V_{3,i},V_{4,i}) + s_{\xi ,4}^3(W_{E,i},W_{N,i}), \end{aligned}$$
(7)

where \(\beta _{\cdot ,0}^m\) are intercepts, \(\beta _{\cdot ,1}^m,\beta _{\cdot ,2}^m\) are regression coefficients, \(s_{\cdot ,1}^m\) are smooths represented using reduced rank smoothing splines as in Eq. 4, and \(s_{\cdot ,l}^m\) for \(l\ge 2\) are smooth interaction terms defined using the tensor product construction of Wood (2006). Figure 2 is constructed in the same way as Fig. 1 and shows the performance of these models (fitted over \(\widehat{u}_{0.95}\)) when estimating probabilities in [0.9995, 0.99999], using the original sample and \(B=300\) block-bootstrap samples.
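As an illustration of how a fit such as \(\mathcal {M}_1\) could be obtained in practice, the sketch below assumes the R package evgam, whose evgam() function takes a list of GAM formulas (log-scale first, then shape) with family = "gpd", and assumes that predict(..., type = "response") returns the fitted parameters; the variable names are placeholders for the challenge covariates, so this is a sketch rather than the authors' exact code.

library(evgam)

# M1 in Eq. 7: smooth log-scale with tensor-product interactions, constant shape;
# `excess` holds the exceedances z = y - u_0.95 > 0.
fmla_m1 <- list(
  excess ~ season + s(atm) + te(v1, v2) + te(v3, v4) + te(we, wn),  # log sigma(x)
  ~ 1                                                               # constant xi
)
fit_m1 <- evgam(fmla_m1, data = exc_data, family = "gpd")

# Fitted GPD parameters for new covariate values, later plugged into Eq. 8
par_m1 <- predict(fit_m1, newdata = test_data, type = "response")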

Fig. 2

Estimates of high probabilities computed using Eq. 5 based on exceedances over the fitted \(95\%\) quantile (\(\widehat{u}_{0.95}\)) for models \(\mathcal {M}_1,\mathcal {M}_2\) and \(\mathcal {M}_3\). Black lines show the fitted probabilities using the original data, while grey lines show the \(B=300\) block bootstrap-based estimates. The blue line is the average over all bootstrap samples

Among the three models, \(\mathcal {M}_1\) seems to better estimate the probabilities in [0.9995, 0.9999] when using the bootstrap average or the model fitted to the original data. Using the fitted GPD parameters from \(\mathcal {M}_1\), denoted by \(\widehat{\sigma }_{\mathcal {M}_1}(\textbf{x}_i)\) and \(\widehat{\xi }_{\mathcal {M}_1}(\textbf{x}_i)\), we can invert Eq. 6 with \(\widehat{u}_{\alpha } = \widehat{u}_{0.95}\) and estimate the target conditional quantile via

$$\begin{aligned} \widehat{q}_{\text {C1}}^{\mathcal {M}_1}(\textbf{x}_i) = \widehat{u}_{0.95} + \frac{\widehat{\sigma }_{\mathcal {M}_1}(\textbf{x}_i)}{\widehat{\xi }_{\mathcal {M}_1}(\textbf{x}_i)}\left[ \left\{ \frac{1-p_{\text {C1}}}{\widehat{\zeta }_Z}\right\} ^{-\widehat{\xi }_{\mathcal {M}_1}(\textbf{x}_i)} - 1\right] , \end{aligned}$$
(8)

where \(\widehat{\zeta }_Z\) is obtained using the empirical estimator and \(p_{\text {C1}} = 0.9999\). Alternatively, we can use the three models (\(\mathcal {M}_1, \mathcal {M}_2, \mathcal {M}_3\)) to predict the target conditional quantile \(q_{\text {C1}}(\textbf{x}_i)\) based on a bootstrap model average (BMA). Specifically, we adapt the methodology of Martin and Roberts (2006) to generate model-based block bootstrap samples as follows:

  1.

    Fit \(\mathcal {M}_1\) to the original exceedance observations, which we denote here by \({\textbf{y}_0 = (y_{01},\ldots ,y_{0n_e})}\), where \(n_e\) is the total number of exceedances. Let \({\textbf{X}_0 = (V_{1}^0,V_{2}^0,V_{3}^0,V_{4}^0,\text {Season}^0,W_{E}^0,W_{N}^0,\text {Atm}^0)}\) denote the \(n_e\times 8\) matrix of associated covariates, where \(V_{1}^0 = (V_{1,1}^0,\ldots ,V_{1,n_e}^0)\). The same notation applies to the remaining covariates.

  2.

    Obtain conditional generalised residuals \(\widehat{e}_{0j}=-\log \{1-\widehat{F}_{\mathcal {M}_{1}}({y_{0j}|\textbf{X}_{0j}})\}\), \(j=1,\ldots ,n_e\), where \(\widehat{F}_{\mathcal {M}_{1}}\) is the cdf of model \(\mathcal {M}_{1}\) and \(\textbf{X}_{0j}\) is the j-th row of \(\textbf{X}_{0}\).

  3.

    Generate \(B = 300\) block bootstrap samples from \(\widehat{e}_{01},\ldots ,\widehat{e}_{0n_e}\). Denote these samples by \(\widehat{\textbf{e}}_{b} = (\widehat{e}_{b1},\ldots ,\widehat{e}_{bn_e})\) and their corresponding covariate matrices by \(\textbf{X}_{b}\), \(b=1,\ldots ,B\).

  4.

    Construct bootstrap observations \(y_{bj} = \widehat{F}^{-1}_{\mathcal {M}_1}(1-\exp \{-\widehat{e}_{bj}\}|{\textbf{X}}_{bj})\), \(j=1,\ldots ,n_e\). A code sketch of steps 2-4 is given after this list.
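The following R sketch illustrates steps 2-4 for a single bootstrap replicate; it assumes a vector z of exceedances with fitted scale values sigma_hat (one per exceedance) and a scalar shape xi_hat from \(\mathcal {M}_1\), the covariate matrix X0 of step 1, and an illustrative block length, and it reuses gpd_surv() from Section 3.1.

# GPD cdf and quantile function implied by the fitted model
gpd_cdf  <- function(z, sigma, xi) 1 - gpd_surv(z, sigma, xi)
gpd_icdf <- function(p, sigma, xi) sigma * ((1 - p)^(-xi) - 1) / xi

# Step 2: conditional generalised (unit-exponential) residuals
e0 <- -log(1 - gpd_cdf(z, sigma_hat, xi_hat))

# Step 3: indices for one block bootstrap resample of the residuals
block_len <- 25                               # illustrative block length
starts <- sample(seq_len(length(e0) - block_len + 1),
                 ceiling(length(e0) / block_len), replace = TRUE)
idx <- as.vector(outer(0:(block_len - 1), starts, "+"))[seq_along(e0)]

# Step 4: reconstruct bootstrap exceedances through the fitted model, keeping
# the covariate rows attached to the resampled residuals
z_b <- gpd_icdf(1 - exp(-e0[idx]), sigma_hat[idx], xi_hat)
X_b <- X0[idx, ]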

We then fit models \(\mathcal {M}_m\), \(m=1,2,3\), to \(\mathcal {D}_b=(\textbf{y}_b,\textbf{X}_b)\) and choose the best fit via AIC (Akaike 1973). Let \(w_{m}\in [0,1]\) be model weights computed according to the prevalence of model \(\mathcal {M}_m\) among the B bootstrap samples. Then, the final point estimation of the target conditional quantile \(q_{\text {C1}}(\textbf{x}_i)\) using BMA is expressed as

$$\begin{aligned} \widehat{q}_{\text {C1}}^{\text {BMA}}(\textbf{x}_i) = {\widehat{u}_{0.95}} + \sum _{m=1}^3w_{m}\widehat{F}_{m,Z\vert Z>0}^{-1}\left( \frac{p_\text {C1}-1+\widehat{\zeta }_Z}{\widehat{\zeta }_Z}\mid \textbf{x}_i\right) := \widehat{u}_{0.95} + \widehat{A}_i, \end{aligned}$$
(9)

where \(\widehat{F}_{m,Z\vert Z>0}(\cdot )\) is the cdf of model m, \(m=1,2,3.\) We evaluate the uncertainty around the estimate \(\widehat{q}_{\text {C1}}^{\text {BMA}}(\textbf{x}_i)\) using the delta method. Specifically, following Burnham and Anderson (2002), the estimated standard error associated with \(\widehat{A}_i\) can be expressed as

$$\begin{aligned} \begin{aligned} \hat{se}\left( \widehat{A}_i\right)&= \sum _{m=1}^3w_m\sqrt{\left( \frac{\hat{se}_{m,i}t_{r_m}}{z_{0.25}}\right) ^2+\left[ \widehat{A}_i-\widehat{F}_{m,Z\vert Z>0}^{-1}\left( \frac{p_\text {C1}-1+\widehat{\zeta }_Z}{\widehat{\zeta }_Z}\mid \textbf{x}_i\right) \right] ^2}, \end{aligned} \end{aligned}$$
(10)

where \(\hat{se}_{m,i}=\hat{se}\left( \widehat{F}_{m,Z\vert Z>0}^{-1}\left( \frac{p_\text {C1}-1+\widehat{\zeta }_Z}{\widehat{\zeta }_Z}\mid \textbf{x}_i\right) \right) \) is the estimated standard error of the predictions by model m using covariates \(\textbf{x}_i\), \(t_{r_m}\) is the 25th percentile of a Student's t distribution with \(r_m\) degrees of freedom and \(z_{0.25}\) is the 25th percentile of a standard normal distribution. Due to the large sample size, Eq. 10 was computed using a normal approximation to \(t_{r_m}\). Finally, the 50% confidence interval for \(\widehat{q}_{\text {C1}}^{\text {BMA}}(\textbf{x}_i)\) obtained by the delta method is \(\widehat{q}_{\text {C1}}^{\text {BMA}}(\textbf{x}_i)\pm {z_{0.25}}\hat{se}\left( \widehat{A}_i\right) .\) For comparison purposes, we also computed a block bootstrap-based confidence interval using \(B=300\) samples. Figure 3 shows predictions of the target conditional quantile on the test dataset using confidence intervals constructed by the delta method (top) and bootstrap (bottom) for the best model \(\mathcal {M}_1\) (left column) and the bootstrap model average approach (BMA, right column). We can see some differences in pointwise estimates between BMA and \(\mathcal {M}_1\). Specifically, predictions obtained using \(\mathcal {M}_1\) are slightly higher, with wider CIs, than those obtained by BMA. Since BMA is an average of models, it is slightly more robust to model misspecification, so we prefer it over \(\mathcal {M}_1\). Regarding the uncertainty of the estimates, the delta method provides narrower intervals than the bootstrap-based confidence intervals.
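For a single covariate combination, Eqs. 9 and 10 reduce to a few lines of R. The sketch below assumes as inputs the per-model quantile estimates q_m (each already including the threshold \(\widehat{u}_{0.95}\)), their standard errors se_m, and the bootstrap model weights w; following the text, the Student-t quantile in Eq. 10 is replaced by its normal approximation, so the ratio \(t_{r_m}/z_{0.25}\) is taken as one.

# Eq. 9 (point estimate) and Eq. 10 (standard error) for one covariate vector x_i
bma_quantile <- function(q_m, se_m, w, level = 0.5) {
  z25 <- qnorm((1 - level) / 2)                  # z_{0.25}, used for the 50% CI
  q_bma <- sum(w * q_m)                          # weights sum to one
  se_bma <- sum(w * sqrt(se_m^2 + (q_bma - q_m)^2))
  c(estimate = q_bma,
    lower = q_bma + z25 * se_bma,
    upper = q_bma - z25 * se_bma)
}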

Fig. 3

Pointwise predictions of the target 99.99% quantile on the test dataset and associated central \(50\%\) confidence intervals using the GAM-GPD approach detailed in Section 3.1. These results are shown for model \(\mathcal {M}_1\) (left column) and bootstrap model average approach (right column). Confidence intervals were constructed using the delta method (top) and block bootstrap (bottom)

3.2 Method 2: extreme quantile estimation using intermediate quantiles and quantile extrapolation

This section presents an alternative, flexible non-parametric approach to estimating the target conditional quantile \(q_{\text {C1}}(\textbf{x}_i)\) using quantile regression and extrapolation techniques. Quantile regression generally performs well for central levels but becomes less robust when the target probability approaches 0 or 1, as in our case (\(p_{\text {C1}} = 0.9999\)). Although directly applying quantile regression to our estimation problem is not advisable due to the limited data available beyond the target conditional quantile \(q_{\text {C1}}(\textbf{x}_i)\), we can use quantile regression to predict quantiles that are smaller than \(q_{\text {C1}}(\textbf{x}_i)\) and then use extrapolation techniques to obtain \(q_{\text {C1}}(\textbf{x}_i)\).

Suppose we have two relatively high conditional quantiles \(q_{\tau _1}(\textbf{x}_i)\) and \(q_{\tau _2}(\textbf{x}_i)\), \(\tau _1 < \tau _2\) from \(F_{Y|\varvec{X}=\textbf{x}_i}\) and we wish to estimate the conditional quantile \(q_\beta (\textbf{x}_i)\), with \(\beta \ge \tau _2\). To avoid dealing with the upper endpoint directly when extrapolating, we work with the density-quantile function (Hutson 2002) defined as \(fq_{\beta }(\textbf{x}_i) = f_{Y|\varvec{X}=\textbf{x}_i}(q_{\beta }(\textbf{x}_i)|\textbf{x}_i )=1/q^{\prime }_{\beta }(\textbf{x}_i): [0,1] \rightarrow \mathbb {R}_+\), where \(q'_{\beta }\) is the derivative of \(q_{\beta }\) with respect to \(\beta \). This function is obtained by differentiating \(\beta =F_{Y|\varvec{X}=\textbf{x}_i}(q_{\beta }(\textbf{x}_i)|\textbf{x}_i)\) with respect to \(\beta \). Hutson (2002) proposed a linear interpolation on the tail part \(\{\beta :\tau _2< \beta < 1\}\) of \(fq_{\beta }(\textbf{x}_i)\) to derive an expression for \(q_{\beta }(\textbf{x}_i)\) for higher quantiles. Note that when \(\beta =1\), the density-quantile function at the upper endpoint, \(fq_{1}(\textbf{x}_i)\), is equal to 0 by properties of probability density functions. Additionally, \(fq_{\tau _2}(\textbf{x}_i)\), or equivalently, \(1/q_{\tau _2}^{\prime }(\textbf{x}_i)\) can be approximated by the reciprocal of the slope between the points \((\tau _1, q_{\tau _1}(\textbf{x}_i))\) and \((\tau _2, q_{\tau _2}(\textbf{x}_i))\) if they are close enough. Integrating the interpolated \(q'_{\beta }(\textbf{x}_i) = 1/fq_{\beta }(\textbf{x}_i)\) and setting \(\beta = p_{\text {C1}}\), we have the following extrapolation formula (Hutson 2002)

$$\begin{aligned} \hat{q}_{p_{\text {C1}}}(\textbf{x}_i)=q_{\tau _2}(\textbf{x}_i) + \frac{\tau _2-1}{\tau _2-\tau _1}[q_{\tau _2}(\textbf{x}_i)-q_{\tau _1}(\textbf{x}_i)]\log \frac{1-p_{\text {C1}}}{1-\tau _2}, \quad \tau _2 \le p_{\text {C1}}< 1. \end{aligned}$$
(11)

The choice of \(\tau _1\) and \(\tau _2\) involves a trade-off between the accuracy of the quantile regression and that of the tail extrapolation. Small values of \(\tau _1\) and \(\tau _2\) yield more accurate quantile estimates but poorer extrapolation, and vice versa for large values. Additionally, the difference between \(\tau _1\) and \(\tau _2\) affects the approximation accuracy of \(q_{\tau _2}^{\prime }(\textbf{x}_i)\). Here, we use two relatively large quantile levels defined as \(\tau _1 = (m-1)/(m+1)\) and \(\tau _2=m/(m+1)\), with \(m=30,90\). These two settings were designed to study the influence of different \(\tau _1\) and \(\tau _2\) on the extrapolation results.
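The extrapolation in Eq. 11 is a one-line formula once the two intermediate quantile estimates are available; the helper below also includes the quantile-crossing safeguard described after Eq. 12 and is offered as a sketch rather than the authors' exact code.

# Eq. 11 (Hutson 2002): extrapolate from (tau1, q1) and (tau2, q2) to level p
extrapolate_quantile <- function(q1, q2, tau1, tau2, p) {
  q2 <- pmax(q2, q1)   # guard against quantile crossing (see after Eq. 12)
  q2 + (tau2 - 1) / (tau2 - tau1) * (q2 - q1) * log((1 - p) / (1 - tau2))
}
# Example with m = 30: extrapolate_quantile(q1_hat, q2_hat, 29/31, 30/31, 0.9999)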

We estimate \(q_{\tau _2}(\textbf{x}_i)\) and \(q_{\tau _1}(\textbf{x}_i)\) using quantile generalised additive regression with the following form

$$\begin{aligned} \begin{aligned} \widehat{q}_{\tau _2}(\textbf{x}_i)&= \beta _{\tau _2,0} + \beta _{\tau _2,1}\times \text {Season}_i + s_{\tau _2,1}(\text {Atm}_i) + s_{\tau _2,2}(V_{1,i})+s_{\tau _2,3}(V_{2,i})\\ &+s_{\tau _2,4}(V_{3,i})+ s_{\tau _2,5}(V_{4,i}) +s_{\tau _2,6}(W_{E,i})+s_{\tau _2,7}(W_{N,i})\\ \widehat{q}_{\tau _1}(\textbf{x}_i)&= \beta _{\tau _1,0} + \beta _{\tau _1,1}\times \text {Season}_i + s_{\tau _1,1}(\text {Atm}_i) + s_{\tau _1,2}(V_{1,i})+s_{\tau _1,3}(V_{2,i}) \\&+s_{\tau _1,4}(V_{3,i})+ s_{\tau _1,5}(V_{4,i}) +s_{\tau _1,6}(W_{E,i})+s_{\tau _1,7}(W_{N,i}), \end{aligned} \end{aligned}$$
(12)

where \(\beta _{\tau _2,0}, \beta _{\tau _1,0}\) are intercepts, \(\beta _{\tau _2,1}, \beta _{\tau _1,1}\) are regression coefficients, and the \(s(\cdot )\) terms are reduced rank smoothing splines. As m increases, the chance of quantile crossing between \(\widehat{q}_{\tau _2}(\textbf{x}_i)\) and \(\widehat{q}_{\tau _1}(\textbf{x}_i)\) increases. To address this, we replace \(\widehat{q}_{\tau _2}(\textbf{x}_i)\) with \(\widehat{q}_{\tau _1}(\textbf{x}_i)\) whenever the former is smaller than the latter, as \(\widehat{q}_{\tau _1}(\textbf{x}_i)\) is generally more robust due to its closer proximity to the median.
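One way to obtain the intermediate quantile fits in Eq. 12 is through the R package qgam, which estimates GAM quantiles via the asymmetric-Laplace link mentioned in Section 3.1.1; the call below is a hedged sketch with placeholder variable names and additive smooths, not necessarily the authors' exact specification.

library(qgam)

m <- 30
tau1 <- (m - 1) / (m + 1); tau2 <- m / (m + 1)

form <- y ~ season + s(atm) + s(v1) + s(v2) + s(v3) + s(v4) + s(we) + s(wn)
fit_tau1 <- qgam(form, data = train, qu = tau1)
fit_tau2 <- qgam(form, data = train, qu = tau2)

q1_hat <- predict(fit_tau1, newdata = test)
q2_hat <- predict(fit_tau2, newdata = test)

# Extrapolate to the target level with the helper defined after Eq. 11
q_C1_hat <- extrapolate_quantile(q1_hat, q2_hat, tau1, tau2, p = 0.9999)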

To obtain the required central 50% confidence intervals of the target conditional quantile \(q_{\text {C1}}(\textbf{x}_i)\) over the test data, we implement a block bootstrap procedure where \(\widehat{q}_{\tau _2}(\varvec{x}_i)\) and \(\widehat{q}_{\tau _1}(\varvec{x}_i)\) are estimated using Eq. 12 in each bootstrap sample. Then, the target conditional quantile \(q_{\text {C1}}(\textbf{x}_i)\) is extrapolated using Eq. 11. The above process is repeated \(B=300\) times to approximate the sampling distribution of \(\widehat{q}_{\text {C1}}(\textbf{x}_i)\). Figure 4 shows pointwise predicted means and central 50% confidence intervals of the target conditional quantile \(q_{\text {C1}}(\textbf{x}_i)\) on the test dataset. This method provides overall different results from those obtained using the GAM-GPD approach (Fig. 3). Specifically, pointwise estimates using the intermediate quantile method tend to be larger than those obtained via the GAM-GPD, with narrower confidence intervals. Additionally, the effect of the tuning parameter m on the intermediate quantile approach is not completely clear from Fig. 4. The main differences between \(m=30\) and \(m=90\) are in terms of point estimates (and not so much in terms of uncertainty), but Fig. 4 does not provide a comprehensive understanding of the effect of m. A more formal way to select m would provide a better ground for comparison against the GAM-GPD approach. Considering the above, we relied on the theoretical foundations of the GPD and submitted our results based on the GAM-GPD approach of Section 3.1, using the bootstrap model average (BMA) with confidence intervals constructed via the delta method (upper right plot in Fig. 3).

Fig. 4

Pointwise predicted means of the target 99.99% conditional quantile on the test dataset and associated central \(50\%\) confidence intervals using \(m=30\) (left) and \(m=90\) (right) to define intermediate quantiles for the approach described in Section 3.2

4 C2: a GPD-based weighted bootstrap approach to estimate extreme marginal quantiles

In this challenge, we seek to estimate an extreme event of Y irrespective of covariate values. Specifically, the task is to estimate the quantile \(q_{\text {C2}}\) such that

$$\begin{aligned} \text {Pr}(Y>q_{\text {C2}}) = \frac{1}{300\times 200}. \end{aligned}$$

If samples from the process Y were independent and identically distributed, \(q_{\text {C2}}\) would correspond to an event occurring once every 200 years, on average (recall that each year contains 300 days). The point estimate of the return value, \(\hat{q}_{\text {C2}}\), should minimise the cost function

$$\begin{aligned} L(q_{\text {C2}},\hat{q}_{\text {C2}}) = {\left\{ \begin{array}{ll} 0.9(0.99q_{\text {C2}}-\hat{q}_{\text {C2}}) & \text { if } 0.99q_{\text {C2}}>\hat{q}_{\text {C2}}\\ 0 & \text { if } |q_{\text {C2}}-\hat{q}_{\text {C2}}|\le 0.01q_{\text {C2}}\\ 0.1(\hat{q}_{\text {C2}}-1.01q_{\text {C2}}) & \text { if } 1.01q_{\text {C2}} < \hat{q}_{\text {C2}}, \end{array}\right. } \end{aligned}$$
(13)

which represents a preference for over-estimation over under-estimation. Jonathan et al. (2021) performed a systematic review of return value estimators for peaks over thresholds with applications to ocean-environment engineering. Using the bias in the return value, in the exceedance probability, and in the log-exceedance probability, they concluded that the overall best estimator is the mean of different quantile estimates for the annual maximum event with a non-exceedance probability of \(1-1/r\), where r is the return period. In this section, we combine the approach in Jonathan et al. (2021) with a weighted bootstrap procedure for extreme value estimators by Gomes and Oliveira (2001) and de Haan and Zhou (2024), allowing for the extrapolation of return values and providing measures of uncertainty. Weighted bootstrap techniques have been used to exploit desired data properties and obtain more robust distribution estimation and confidence interval calibration (Hall and Maesono 2000). Kojadinovic and Yan (2012) show that the weighted bootstrap is a computationally efficient alternative to the parametric bootstrap for goodness-of-fit and hypothesis testing.
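The loss in Eq. 13 can be implemented directly and is reused in step 3 of the bootstrap procedure below; this is a plain transcription of Eq. 13, assuming positive quantile values.

# Eq. 13: zero loss within +/- 1% of the true quantile, with under-estimation
# penalised (factor 0.9) more heavily than over-estimation (factor 0.1)
loss_C2 <- function(q_true, q_hat) {
  ifelse(0.99 * q_true > q_hat, 0.9 * (0.99 * q_true - q_hat),
         ifelse(q_hat > 1.01 * q_true, 0.1 * (q_hat - 1.01 * q_true), 0))
}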

For the GPD in Eq. 3, the return value \(y_r\) associated with the return period r can be expressed as

$$\begin{aligned} y_r = u + \frac{\sigma }{\xi }[(rp_u)^\xi -1], \end{aligned}$$

where \(p_u = \text {Pr}(Y>u)\) is the exceedance probability. While Jonathan et al. (2021) used multiple measures of bias and exceedance probability to assess the accuracy of the return level estimate, we use the loss function provided in Eq. 13. Given that the quantile of interest \(q_{\text {C2}}\) is not observed, we compute the loss on threshold exceedances of the observed extreme values under the assumption that a high quantile observed in 70 years (q) is smaller than the equivalent quantile observed over 200 years (\(q_{\text {C2}}\)), i.e., \(q < q_{\text {C2}}\). As mentioned above, a weighted bootstrap procedure is used to provide a measure of uncertainty. The weights for the bootstrap sampling were chosen as a scaled version of \(\arctan (y_{(i)})\), where \(y_{(i)}\) is the i-th ordered observation, to induce higher weights for extreme observations and much smaller weights for smaller values, as seen in the left side of Fig. 5. Specifically, the sampling weight \(w_i\) for observation \(y_{(i)}\), \(i = 1,\ldots , n\), is

$$\begin{aligned} w_i = \frac{w^{*}_i}{\sum _{j=1}^{n}{w^{*}_j}}, \quad \text {where} \quad w^{*}_i = \frac{y^*_i - \min _j\{y^{*}_j\}}{\max _j\{y^{*}_j\} - \min _j\{y^{*}_j\}}\quad \text {and}\quad y_i^* = \arctan \left\{ \Phi ^{-1}(\widehat{F}_Y(y_{(i)}))\right\} , \end{aligned}$$
(14)

where \(\widehat{F}_Y(\cdot )\) is the empirical cdf of Y and \(\Phi (\cdot )\) is the cdf of a standard normal distribution, used here to obtain the desired preferential sampling of extreme values. In an application on precipitation exceedances, Varga et al. (2016) propose bootstrapping extremes by incorporating bootstrap weights coming from known distributions (multinomial or exponential) into a weighted GPD likelihood. However, this approach does not ensure that the largest values in the original sample are consistently sampled, potentially missing valuable information for extreme value analysis. With the weights in Eq. 14, we guarantee that the smallest observations receive very small weights, while extreme observations are far more likely to be sampled, resulting in samples that are more representative of the extremal behaviour of the data. The full GPD-based weighted bootstrap procedure is performed as follows:

  1.

    For each iteration of the bootstrap procedure, \(b = 1,\ldots ,B\), sample a set of n observations with replacement using \(w_i\) in Eq. 14 as the probability of drawing the i-th ordered observation from the original data, \(i=1,\ldots ,n\).

  2.

    Fit a stationary GPD model, \(\hat{F}_b\), to exceedances over the \(p_0=0.995\) empirical quantile. This threshold, denoted as \(u_0\), was chosen using mean residual life plots of the sampled data (not shown).

  3.

    Predict the high quantiles \(q_{(j)}<q_{\text {C2}}\), where \(q_{(j)} > u_0\) denotes the j-th exceedance of \(u_0\) sorted in increasing order, \(j=1,\ldots ,n_u\), and \(n_u\) is the number of exceedances over \(u_0\), via \(\hat{q}_{(j)} = \hat{F}_b^{-1}\left( \frac{\hat{F}_Y(q_{(j)})-p_0}{1-p_0}\right) \). Then, calculate \(L(q_{(j)},\hat{q}_{(j)})\) for each j using Eq. 13. Finally, compute the total loss \(L_b = \sum _{j=1}^{n_u} L(q_{(j)},\hat{q}_{(j)})\).

  4.

    Predict \(q_{\text {C2}}\) using \(\hat{F}_b\).

The above procedure is repeated for \(B=1000\) iterations, resulting in a range of bootstrap predictions assessed by their respective total loss, \(L_b\). The right-hand side of Fig. 5 shows these predictions for the threshold \(u_0=119.33\), corresponding to the \(99.5\%\) empirical quantile. The point estimate from the bootstrap sample with the smallest total loss \(L_b\) was taken as our final answer. In this application, that sample yielded a point estimate of \(\hat{q}_{\text {C2}}=239\), with a 95% confidence interval of (169.2, 363.2).
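The sketch below illustrates the sampling weights of Eq. 14 and a single weighted bootstrap draw (step 1); evaluating the empirical cdf as rank/(n + 1), so that the normal quantile stays finite, is an implementation detail assumed here, and the GPD fit of step 2 is only indicated, with ismev::gpd.fit() suggested as one possible (assumed) interface.

y_sorted <- sort(y)
n <- length(y_sorted)

# Eq. 14: weights increasing in the (transformed) rank of each observation
y_star <- atan(qnorm(rank(y_sorted) / (n + 1)))        # arctan{Phi^{-1}(F_hat)}
w_star <- (y_star - min(y_star)) / (max(y_star) - min(y_star))
w <- w_star / sum(w_star)

# Step 1: one bootstrap sample of size n, favouring the largest observations
y_b <- sample(y_sorted, size = n, replace = TRUE, prob = w)

# Step 2 (sketch): stationary GPD fit to exceedances of the 99.5% quantile
u0 <- quantile(y_b, 0.995)
# fit_b <- ismev::gpd.fit(y_b, threshold = u0)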

Fig. 5

Left: Bootstrap sampling weights before rescaling for transformed observations \(y_i^*\) where extreme observations are more likely to be sampled at every bootstrap iteration. Right: Bootstrapped quantile predictions based on a GPD fitted on exceedances of the 99.5% empirical quantile. The red points represent the best prediction, i.e., the sample that minimises the loss function, and the blue line is our final prediction for \(q_{\text {C2}}\)

5 C3: Using vine copulae for exceedance probabilities of low-dimensional random vectors

We aim to estimate the extreme probabilities \(p_{\text {C3,1}}=\text {Pr}(Y_1>y,Y_2>y,Y_3>y)\) and \(p_{\text {C3,2}}=\text {Pr}(Y_1>v,Y_2>v,Y_3<m)\) where \(Y_i\) are standard Gumbel random variables (\(i=1,2,3\)), \(y=6,v=7\) and \(m=-\log (\log 2)\). Sklar (1959) showed that a multivariate continuous distribution F can be written in terms of a cumulative distribution function with uniform margins, known as a copula. Specifically, any d-dimensional multivariate distribution \(F(y_1,\cdots ,y_d)\) with continuous margins \(F_i(y_i)\) has a unique copula C and we can write

$$\begin{aligned} F(y_1,\cdots ,y_d)=C\left( F_1(y_1),\cdots ,F_d(y_d)\right) . \end{aligned}$$

Since the \(F_i\) are known to be standard Gumbel, we only need to choose an appropriate copula C to obtain \(p_{\text {C3,1}}\) and \(p_{\text {C3,2}}\).

A single copula may not effectively capture the full dependence structure when different pairs of margins exhibit different dependence patterns. One way to increase the flexibility of the estimation is to utilise bivariate parametric copulae as building blocks to construct a multivariate copula, a process known as vine copula construction (Joe 2014; Simpson et al. 2021; Czado and Nagler 2022). A vine on d elements is a nested tree structure where the nodes of the second tree (T2) are the edges of the first tree (T1), the nodes of the third tree (T3) are the edges of the second tree, and so on. A widely used vine structure is the regular vine (Dissmann et al. 2013), which satisfies the proximity condition that two nodes in tree \(j+1\) are connected only if they share a common node in tree j, \(j=1,2,\cdots ,d-1\). A vine copula can be obtained by assigning bivariate copulae to the edges of a regular vine. Figure 6 illustrates the structure of a vine copula on three variables. Note that the structure is not unique. Indeed, we can select node 1 or node 2 as the central node in T1; T2 and T3 are then adjusted subject to the conditions of the regular vine. Based on this figure, we can express the density of (\(Y_1,Y_2,Y_3\)) as

$$\begin{aligned} f(y_1,y_2,y_3)=&c_{13}(F_1(y_1),{F_3(y_3)}) c_{23}(F_2(y_2),F_3(y_3)) \times c_{12|3}(F_{1|3}(y_1|y_3),F_{2|3}(y_2|y_3);y_3)\nonumber \\&\times f_1(y_1)f_2(y_2)f_3(y_3), \end{aligned}$$
(15)

where \(F_{1|3}\) and \(F_{2|3}\) are the marginal conditional distributions, \(c_{13}\) and \(c_{23}\) are copula densities, and \(c_{12|3}\) is the conditional copula density conditioning on variable 3. The pair-copulae \(c_{13}\), \(c_{23}\) and \(c_{12|3}\) can be chosen from known parametric pair-copula families (see, e.g., Aas et al. 2009, Czado 2010).

Fig. 6

A regular vine on three variables. \(c_{13}\) and \(c_{23}\) are copula densities and \(c_{12|3}\) is the conditional copula density conditioning on variable 3

Fig. 7

Survival probabilities in terms of \(U_i = F(Y_i)\) where F is the standard Gumbel cdf, zoomed in on \(p \in (0.9,1)\) (left) and on a logarithmic scale (right) to evaluate the model performance of our vine copula approach described in Section 5. The black line represents empirical values, and the red one corresponds to estimated probabilities computed using \(10^7\) samples from the copula model

To fit (15) incorporating covariates, we reparametrise the pair-copula parameters in terms of Kendall’s \(\tau \). Indeed, as noted by Genest et al. (2011), the Kendall’s \(\tau \) of a bivariate copula can be calculated by

$$\begin{aligned} \tau =4\int C(u_1,u_2) d C(u_1,u_2) -1, \end{aligned}$$

so we can establish a one-to-one mapping between Kendall’s \(\tau \) and one-parameter copulae. We incorporate the only two covariates available for C3, namely, season and atmosphere, in each copula in Fig. 6 using the following GAM structure

$$\begin{aligned} \tau _{j} = g(\beta _0 + \beta _1\times \text {Season} + s_j(\text {Atm})),\quad j=1,2,3, \end{aligned}$$
(16)

where j is a copula index following Fig. 6 from top to bottom (so \(j=1\) refers to copula \(c_{13}\), \(j=2\) to copula \(c_{23}\) and \(j=3\) to copula \(c_{12|3}\)), \(g(x)=(e^x-1)/(e^x+1)\) is a link function that ensures a valid value of \(\tau _{j}\), \(\beta _0\) is an intercept, \(\beta _1\) is a regression coefficient and \(s(\cdot )\) is a reduced rank smoothing spline, similar to the one in Eq. 4.
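For the one-parameter Gumbel family used below, the link g and the resulting map from Kendall's \(\tau \) to the copula parameter can be written in two lines; the mapping \(\tau = 1 - 1/\theta \) is the standard Gumbel relation, with \(|\tau |\) used for the rotated versions, which carry negative dependence.

# Link g in Eq. 16 (equivalently tanh(x/2)), mapping the linear predictor to (-1, 1)
g_link <- function(x) (exp(x) - 1) / (exp(x) + 1)

# Kendall's tau to Gumbel parameter: tau = 1 - 1/theta  =>  theta = 1/(1 - |tau|)
tau_to_gumbel_theta <- function(tau) 1 / (1 - abs(tau))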

We fit (15) sequentially, using different families of copulae for \(c_{13}, c_{23}, c_{12|3}\), each reparametrised with Kendall’s \(\tau \) expressed as in Eq. 16. Specifically, we fit parametric copulae with the GAM structure in Eq. 16 for each of \(c_{13}\) and \(c_{23}\) in T1 and choose the best via AIC. Using the fitted values of the best copulae for \(c_{13}\) and \(c_{23}\), we generate pseudo-observations from \(F_{1|3}(y_1|y_3)\) and \(F_{2|3}(y_2|y_3)\). Finally, the same estimation step as in T1 is employed to determine \(c_{12|3}\). The above procedure is repeated over all possible regular vines in three dimensions to find the globally optimal estimate (for higher-dimensional estimation, a heuristic method is discussed in Dissmann et al. (2013)). The estimation results show that the optimal regular vine structure is the one in Fig. 6. Under this structure, the most suitable copula family for \(c_{13}\) and \(c_{23}\) is the 270-degree rotated Gumbel copula, and the best one for \(c_{12|3}\) is the 90-degree rotated Gumbel copula. We evaluate the model performance by comparing the empirical and model-based survival probabilities \(\text {Pr}(U_1>p, U_2>p, U_3>p)\) and \(\text {Pr}(U_1>p, U_2>p \mid U_3<p_m)\), where \(U_i = F(Y_i)\), \(p_m = F(-\log (\log 2)) = 0.5\), F is the standard Gumbel cdf, and \(p\in [0,1]\). Specifically, we draw \(10^7\) samples from our copula model by resampling the covariates season and atmosphere and empirically compute the above survival probabilities for a sequence of p in [0, 1]. Figure 7 shows that the predicted survival probabilities are well aligned with the empirical values for most values of p. However, in the tail region, there is a slight overestimation of \(\text {Pr}(U_1>p, U_2>p, U_3>p)\) and underestimation of \(\text {Pr}(U_1>p, U_2>p \mid U_3<p_m)\). Our final predictions are \(\widehat{p}_{\text {C3,1}}=9.45\times 10^{-5}\) and \(\widehat{p}_{\text {C3,2}}=1.04 \times 10^{-5}\).
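As a simplified illustration of this Monte Carlo evaluation, the sketch below fits an unconditional regular vine (i.e., it ignores the covariate-dependent Kendall's \(\tau \) of Eq. 16) with the VineCopula R package and estimates the two target probabilities empirically from simulated samples; the exact probability integral transform is available because the margins are known to be standard Gumbel.

library(VineCopula)

# Y: n x 3 matrix of the C3 responses (standard Gumbel margins)
U <- exp(-exp(-Y))                              # Gumbel cdf applied columnwise
rvm <- RVineStructureSelect(U, familyset = NA)  # select vine structure and families

N  <- 1e7
Us <- RVineSim(N, rvm)                          # simulate from the fitted vine

y_u <- exp(-exp(-6)); v_u <- exp(-exp(-7)); m_u <- 0.5   # thresholds on the U scale
p_C3_1 <- mean(Us[, 1] > y_u & Us[, 2] > y_u & Us[, 3] > y_u)
p_C3_2 <- mean(Us[, 1] > v_u & Us[, 2] > v_u & Us[, 3] < m_u)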

6 C4: Using probabilistic PCA for exceedance probabilities of high-dimensional random vectors

For this challenge, we aim to estimate the joint exceedance probabilities

$$\begin{aligned} p_{\text {C4,1}}&= \text {Pr}(Y_1^{(1)}> u_1,\ldots ,Y_{25}^{(1)}> u_1, Y_1^{(2)}> u_2,\ldots ,Y_{25}^{(2)}> u_2),\nonumber \\ p_{\text {C4,2}}&= \text {Pr}(Y_1^{(1)}> u_1,\ldots ,Y_{25}^{(1)}> u_1, Y_1^{(2)}> u_1,\ldots ,Y_{25}^{(2)} > u_1), \end{aligned}$$
(17)

where \(Y^{(i)}\) are copies of Y observed at two distinct areas \(i=1,2\), and \(u_i\) is the \((1-\phi _i)\) quantile of a standard Gumbel distribution with \(\phi _1 = 1/300\) and \(\phi _2 = 12\times \phi _1\), i.e., \(u_1=5.702113\) and \(u_2=3.198534\). Different approaches were considered to estimate the probabilities in Eq. 17, from a combination of supervised and unsupervised learning to max-stable models for multivariate extremes (see, e.g., Segers 2012). For practical reasons, we chose to work under the probabilistic principal component analysis framework (PPCA; Tipping and Bishop 1999), which borrows ideas from factor analysis by assuming a low-dimensional latent Gaussian model so that the principal axes of a set of observed data vectors can be determined through maximum likelihood estimation. More specific approaches using PCA for extreme value analysis have been proposed (Cooley and Thibaud 2019; Jiang et al. 2020; Drees and Sabourin 2021), but they are designed for asymptotically dependent data, which does not seem to hold for all locations in our data, and therefore they may not be appropriate for our purposes. For PPCA, let \({Y} = \{\varvec{y}_1,\cdots ,\varvec{y}_n\}\), where \(\varvec{y}_i \in \mathbb {R}^d\). Under the usual factor analysis framework (Bartholomew 1987), we assume a linear relationship between \(\varvec{y}_i\) and an underlying latent variable \(\varvec{z}_i \in \mathbb {R}^k\), with \(k<d\). Specifically,

$$\varvec{y}_i = \varvec{W}\varvec{z}_i + \varvec{\epsilon },$$

where \(\varvec{W}\) is a \(d\times k\) matrix whose columns are known as the principal axes and \(\varvec{\epsilon }\) is an error term assumed to follow an isotropic Gaussian distribution \(N(0, \sigma ^2\varvec{I})\). Then, the data points \(\varvec{y}_i\) are described via the projection \(\varvec{y}_i|\varvec{z}_i \sim N(\varvec{W}\varvec{z}_i, \sigma ^2\varvec{I})\). PPCA aims to estimate the principal axes \(\varvec{W}\) and the noise variance \(\sigma ^2\), and achieves this by generalising PCA. Indeed, by integrating out the latent variable \(\varvec{z}_i\), we can see that \(\varvec{y}_i \sim N(0, \varvec{W}\varvec{W}^\top +\sigma ^2\varvec{I}).\) Then, estimates for \(\varvec{W}\) and \(\sigma ^2\) can be obtained via maximum likelihood. Note that the classical PCA framework is retrieved when \(\sigma ^2\rightarrow 0.\) We implemented PPCA and computed the probabilities in Eq. 17, obtaining the point estimates \(\widehat{p}_{\text {C4,1}} = 2.9\times 10^{-9}\) and \(\widehat{p}_{\text {C4,2}}=5.4\times 10^{-10}\), respectively.
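To make the PPCA step concrete, the sketch below computes the closed-form maximum likelihood estimates of \(\varvec{W}\) and \(\sigma ^2\) (Tipping and Bishop 1999) from the sample covariance and then evaluates a joint exceedance probability by naive Monte Carlo under the implied \(N(0, \varvec{W}\varvec{W}^\top +\sigma ^2\varvec{I})\) model. The marginal Gumbel-to-Gaussian transform, the number of latent dimensions, and the Monte Carlo step are assumptions about the pipeline rather than the authors' exact implementation; for probabilities as small as those in Eq. 17, far more draws (or importance sampling) would be needed.

# Closed-form PPCA MLE from the sample covariance of (approximately) Gaussian data
ppca_mle <- function(Y, k) {
  S  <- cov(Y)
  eg <- eigen(S, symmetric = TRUE)
  d  <- ncol(Y)
  sigma2 <- mean(eg$values[(k + 1):d])                       # noise variance MLE
  W <- eg$vectors[, 1:k] %*% diag(sqrt(eg$values[1:k] - sigma2), k)
  list(W = W, sigma2 = sigma2, C = W %*% t(W) + sigma2 * diag(d))
}

Z   <- qnorm(exp(-exp(-Y)))            # Gumbel margins mapped to Gaussian margins
fit <- ppca_mle(Z, k = 5)              # k = 5 latent dimensions, illustrative only
R   <- chol(fit$C)
Zs  <- matrix(rnorm(1e6 * ncol(Y)), nrow = 1e6) %*% R   # draws from N(0, C)

u1_z <- qnorm(exp(-exp(-5.702113)))    # u_1 mapped to the Gaussian scale
p_hat <- mean(rowSums(Zs > u1_z) == ncol(Y))   # naive estimate of p_C4,2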

Fig. 8

Results from a logistic regression model for the exceedance rate using GAMs over all the continuous covariates available in C1 (\(V_{1}, V_{2}, V_{3}, V_{4}\) and atmosphere as given by the organisers, and the \(W_E\) and \(W_N\) components computed following Eq. 1). The plots show fitted splines (continuous black line), 95% confidence intervals (grey polygons), and the reference horizontal line at 0 (dashed red line). In the y-axis, \(s_3(\cdot ,k)\) is a cubic spline with k degrees of freedom (df), while \(s_T(\cdot ,k)\) is a thin plate regression spline with k df, selected using the choose.k() function in the mgcv R package

7 Discussion

In this section, we discuss our results in light of having the truth revealed after the 2023 EVA Conference (Rohrbeck et al. 2023). The first challenge involved estimating an extreme conditional quantile, and we considered two approaches. For the first one, we borrowed information from nearby quantiles and used a generalised Pareto (GP) distribution with parameters described through GAMs, with a block bootstrap approach for threshold selection and model validation. We found that a GP threshold invariant to covariates provides a better fit than a covariate-dependent one. Additionally, the exceedance rate (denoted by \(\zeta _Z\) in Section 3.1.1) is also assumed to be invariant to covariates. To check this assumption, we fit a logistic regression model for the probability of exceedance using all available covariates. The covariates are included flexibly using GAMs, except for season, which is included as a categorical covariate. Figure 8 shows the individual effects of each continuous covariate on the exceedance rate. We can see that three of the covariates have no effect once uncertainty is accounted for, while the rest show only a modest, almost negligible effect on the exceedance probability.

Once the threshold was selected, a variable selection procedure was conducted to identify the best model for threshold exceedances. This procedure identified three models with competitive performance, and the best among them (denoted \(\mathcal {M}_1\)) was compared to a bootstrap model average (BMA) of the three. It is important to note that threshold selection for the GP model was conducted separately from model selection. This was done mainly for computational reasons and could be improved.

\(\mathcal {M}_1\) and BMA were compared using two techniques to compute confidence intervals: the delta method and a block-bootstrap approach. When compared to the truth, \(\mathcal {M}_1\) performs slightly better than BMA when using the bootstrap-based confidence intervals (bottom left in Fig. 3). We argue that this is because \(\mathcal {M}_2\) has a prevalence of \(w_2=56.33\%\), which strongly influences the BMA results (for comparison, \(\mathcal {M}_1\) has a prevalence of \(w_1=18.67\%\)). The high prevalence of \(\mathcal {M}_2\) results in an overall underestimation of the truth compared to the single model \(\mathcal {M}_1\). Additionally, by construction, the BMA approach avoids the strong influence of atypical observations and usually provides more stable estimates; thus, it shows narrower CIs than the single model \(\mathcal {M}_1\). Nonetheless, our submission based on the BMA approach with confidence intervals computed via the delta method shows that our estimates are reasonably well aligned with the true values overall.

Our second approach for C1 also borrows information from nearby quantiles, but in a different way: it estimates intermediate quantiles and then obtains the target quantile via quantile extrapolation. Specifically, linear interpolation is applied at the tail end of the density-quantile function for extrapolation, leading to an exponential tail. This could explain our biased estimates, given that the truth was generated from a GPD. Further improvements could be made by integrating the tail heaviness of the data into the interpolation and extrapolation process.

The second challenge required the estimation of marginal quantiles that optimise a particular loss function. Our approach minimised the loss function using a GPD model for the upper quantiles of extreme-weighted bootstrap samples. This choice of modelling approach was mainly guided by the lack of data to extrapolate to an extremely high quantile. Using the extreme-weighted bootstrap samples, we obtained multiple bootstrap datasets representing the extreme values in the data. However, the approach led to an overestimation of the quantile of interest. In part, this overestimation is due to the assumption that \(q_{\text {C2}}>q\), which, at the time we were solving the challenges, seemed reasonable, as it meant that a high quantile for a period of 70 years is lower than the equivalent quantile for a period of 200 years. Although useful, this assumption proved erroneous, as the true return value for the 200-year return period was observed in the training data. We present this as a cautionary tale about assumptions made prior to the modelling process that may deeply affect the results. Additionally, a more exhaustive exploration of different weighting strategies in the bootstrap approach could help determine its usefulness for extreme quantile extrapolation.

The third challenge involves two typical scenarios in multivariate extreme value modelling: estimating the probability that all components are extreme and estimating the probability that only some of the components are extreme. Given the known marginal distributions, we focus entirely on dependence estimation. Specifically, to estimate the dependence structure, we fit a regular vine copula to data transformed to a uniform scale. Our approach facilitates joint dependence modelling for both bulk and tail data. Although it effectively addresses this challenge, the estimation is dominated by the bulk dependence, potentially biasing the tail dependence estimation. An alternative is to consider a vine copula specifically tailored to the tail to enhance the precision of joint tail probability estimation. For example, Simpson et al. (2021) studied the properties of the tail dependence coefficient and the asymptotic shape of the sample cloud for commonly used vine copula classes.

Finally, the PPCA approach in the fourth challenge allowed us to quickly compute joint exceedance probabilities for high-dimensional random vectors using principal components based on a latent Gaussian structure. However, this resulted in an overestimation of the true probabilities. As noted in Section 6, we disregarded the use of PCA approaches tailored to extreme value analysis since some locations displayed evidence of potential asymptotic independence. After the true values and data-generating processes were revealed, it became clear that extreme PCA could, in fact, have been a useful tool for computing joint exceedance probabilities for the clusters of asymptotically dependent locations.