Modelling sub-daily precipitation extremes with the blended generalised extreme value distribution

A new method is proposed for modelling the yearly maxima of sub-daily precipitation, with the aim of producing spatial maps of return level estimates. Yearly precipitation maxima are modelled using a Bayesian hierarchical model with a latent Gaussian field, with the blended generalised extreme value (bGEV) distribution used as a substitute for the more standard generalised extreme value (GEV) distribution. Inference is made less wasteful with a novel two-step procedure that performs separate modelling of the scale parameter of the bGEV distribution using peaks over threshold data. Fast inference is performed using integrated nested Laplace approximations (INLA) together with the stochastic partial differential equation (SPDE) approach, both implemented in R-INLA. Heuristics for improving the numerical stability of R-INLA with the GEV and bGEV distributions are also presented. The model is fitted to yearly maxima of sub-daily precipitation from the south of Norway, and is able to quickly produce high-resolution return level maps with uncertainty. The proposed two-step procedure provides an improved model fit over standard inference techniques when modelling the yearly maxima of sub-daily precipitation with the bGEV distribution.


Introduction
Heavy rainfall over short periods of time can cause flash floods, large economic losses and immense damage to infrastructure. The World Economic Forum states that climate action failure and extreme weather events are perceived to be among the most likely and most impactful global risks in 2021 (World Economic Forum, 2021). Therefore, a better understanding of heavy rainfall can be of utmost importance for many decision-makers, e.g., those that are planning the construction or maintenance of important infrastructure. In this paper, we create spatial maps with estimates of large return levels for sub-daily precipitation in Norway. Estimation of return levels is best described within the framework of extreme value theory, where the most common methods are the block maxima method and the peaks over threshold method (e.g. Coles, 2001; Davison & Huser, 2015). Due to low data quality (see Section 2 for more details) and the difficulty of selecting high-dimensional thresholds, we choose to use the block maxima method for estimating the precipitation return levels. This method is based on modelling the maximum of a large block of random variables with the generalised extreme value (GEV) distribution, which is the only non-degenerate limit distribution for a standardised block maximum (Fisher & Tippett, 1928). When working with environmental data, blocks are typically chosen to have a size of one year (Coles, 2001). Inference with the GEV distribution is difficult, partially because its support depends on its parameter values. Castro-Camilo et al. (2021) propose to ease inference by substituting the GEV distribution with the blended generalised extreme value (bGEV) distribution, which has the right tail of a Fréchet distribution and the left tail of a Gumbel distribution, resulting in a heavy-tailed distribution with a parameter-free support. Both Castro-Camilo et al. (2021) and Vandeskog et al. 
(2021) demonstrate with simulation studies that the bGEV distribution performs well as a substitute for the GEV distribution when estimating properties of the right tail. Additionally, in this paper we develop a simulation study that shows how the parameter-dependent support of the GEV distribution can lead to numerical problems during inference, while inference with the bGEV distribution is more robust. This can be of crucial importance in complex and high-dimensional settings, and consequently we choose to model the yearly maxima of sub-daily precipitation using the bGEV distribution.
Modelling of extreme daily precipitation has been given much attention in the literature, and it is well established that precipitation is a heavy-tailed phenomenon (e.g. Katz et al., 2002; Papalexiou & Koutsoyiannis, 2013; Wilson & Toumi, 2005), which makes the bGEV distribution a possible model for yearly precipitation maxima. Spatial modelling of extreme daily precipitation has also received a great amount of interest. Cooley et al. (2007) combine Bayesian hierarchical modelling with a generalised Pareto likelihood for estimating large return values for daily precipitation. Similar methods are also applied by Davison et al. (2012), Geirsson et al. (2015), Opitz et al. (2018), and Sang and Gelfand (2009), using either the block maxima or the peaks over threshold approach. Using a multivariate peaks over threshold approach, Castro-Camilo and Huser (2020) propose local likelihood inference for a specific factor copula model to deal with complex non-stationary dependence structures of precipitation over the contiguous U.S. Spatial modelling of extreme sub-daily precipitation is more difficult, due to fewer available data sources. Consequently, this is often performed using intensity-duration-frequency relationships, where one pools together information from multiple aggregation times in order to estimate return levels (Koutsoyiannis et al., 1998; Lehmann et al., 2016; Ulrich et al., 2020; Wang & So, 2016). Spatial modelling of extreme hourly precipitation in Norway has previously been performed by Dyrrdal et al. (2015). Since their work was published, the number of observational sites for hourly precipitation in Norway has greatly increased. We aim to improve their return level estimates by including all the new data that have emerged in recent years. We model sub-daily precipitation using a spatial Bayesian hierarchical model with a bGEV likelihood and a latent Gaussian field. 
In order to keep our model simple, we do not pool together information from multiple aggregation times, making our model purely spatial. The model assumes conditional independence between observations, which makes it able to estimate the marginal distribution of extreme sub-daily precipitation at any location, but unable to successfully estimate joint distributions over multiple locations. In the case of hydrological processes such as precipitation, ignoring dependence might lead to an underestimation of the risk of flooding. However, Davison et al. (2012) find that models where the response variables are independent given some latent process can be a good choice when the aim is to estimate a spatial map of marginal return levels.
High-resolution spatial modelling can demand a lot of computational resources and be highly time-consuming. The framework of integrated nested Laplace approximations (INLA; Rue et al., 2009) allows for a considerable speed-up by using numerical approximations instead of sampling-based inference methods like Markov chain Monte Carlo (MCMC). Inference with a spatial Gaussian latent field can be sped up even further with the so-called stochastic partial differential equation (SPDE; Lindgren et al., 2011) approach of representing a Gaussian random field using a Gaussian Markov random field that is the approximate solution of a specific SPDE. Both INLA and the SPDE approach have been implemented in the R-INLA library, which is used for performing inference with our model (Bivand et al., 2015; Rue et al., 2017). R-INLA requires a log-concave likelihood to ensure numerical stability during inference. However, neither the GEV likelihood nor the bGEV likelihood is log-concave, which can cause inferential issues. We present heuristics for mitigating the risk of numerical instability caused by a lack of log-concavity.
A downside of the block maxima method is that inference can be somewhat wasteful compared to the peaks over threshold method. Additionally, most of the available weather stations in Norway that measure hourly precipitation are young and contain quite short time series. This data sparsity makes it challenging to place complex models on the parameters of the bGEV distribution in the hierarchical model. A promising method of accounting for data sparsity is the recently developed sliding block estimator, which allows for better data utilisation by not requiring that the block maxima used for inference come from disjoint blocks (Bücher & Segers, 2018; Zou et al., 2019). However, to the best of our knowledge, no theory has yet been developed for using the sliding block estimator on non-stationary time series, or for performing Bayesian inference with the sliding block estimator. Vandeskog et al. (2021) propose a new two-step procedure that allows for less wasteful and more stable inference with the block maxima method by separately modelling the scale parameter of the bGEV distribution using peaks over threshold data. Having modelled the scale parameter, one can standardise the block maxima so that the scale parameter can be considered constant, and then estimate the remaining bGEV parameters. Bücher and Zhou (2021) suggest that, when modelling stationary time series, the peaks over threshold technique is preferable over block maxima if the interest lies in estimating large quantiles of the stationary distribution of the time series. The opposite holds if the interest lies in estimating return levels, i.e. quantiles of the distribution of the block maxima. Thus, the two methods have different strengths, and by using this two-step procedure, one can take advantage of the merits of both methods while avoiding their pitfalls. 
We apply the two-step procedure for modelling sub-daily precipitation and compare the performance with that of a standard block maxima model where all the bGEV parameters are estimated jointly.
The remainder of the paper is organised as follows. Section 2 introduces the hourly precipitation data and all explanatory variables used for modelling. Section 3 presents the bGEV distribution and describes the Bayesian hierarchical model along with the two-step modelling procedure. Additionally, heuristics for improving the numerical stability of R-INLA are proposed, and a score function for evaluating model performance is presented. In Section 4 we perform modelling of the yearly precipitation maxima in Norway. A cross-validation study is performed for evaluating the model fit, and a map of return levels is estimated. Conclusions are presented in Section 5.

Hourly precipitation data
Observations of hourly aggregated precipitation from a total of 380 weather stations in the south of Norway are downloaded from an open archive of historical weather data from MET Norway (https://frost.met.no). The oldest weather stations contain observations from 1967, but approximately 90 percent of the available weather stations were established after 2000. Each observation comes with a quality code, but almost all observations from before 2005 are of unknown quality. An inspection of the time series with unknown quality detects unrealistic precipitation observations ranging from −300 mm/hour to 400 mm/hour. Other unrealistic patterns, like 50 mm/hour precipitation for more than three hours in a row, or no precipitation during more than half a year, are also detected. The data set contains large amounts of missing data, but these are often recorded as 0 mm/hour instead of being recorded as missing. Thus, there is no way of knowing which of the zeros represent missing data and which represent an hour without precipitation. Having detected all of this, we decide to remove all observations with unknown or bad quality flags, which accounts for approximately 14% of the total number of observations. Additionally, we clean the data by removing all observations from years with more than 30% missing data and from years where more than two months contain less than 20% of the possible observations. This data cleaning is performed to increase the probability that our observed yearly maxima are close to or equal to the true yearly maxima. Having cleaned the data, we are left with 72% of the original observations, distributed over 341 weather stations and spanning the years 2000 to 2020. The total number of usable yearly maxima is approximately 1900. Figure 1 displays the distribution of the number of usable yearly precipitation maxima per weather station. 
The majority of the weather stations contain five or fewer usable yearly maxima, and approximately 50 stations have more than 10 usable maxima. Figure 1 also displays the location of all the weather stations. A large number of the stations are located close to each other, in the southeast of Norway. Such spatial clustering can be an indicator of preferential sampling. However, we do not believe that preferential sampling is an issue for our data. The weather stations are mostly placed in locations with high population densities, and to the best of our knowledge there is no strong dependency between population density and extreme precipitation in Norway, as there are large cities located both in dry and wet areas of the country. Even though most stations are located in areas with high population densities, there is still good spatial coverage of the entire area of interest, including areas with low population densities.
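The two completeness rules used in the cleaning above can be sketched as a small filter on station-years. The per-month coverage summary and the function below are our own illustration, not part of the paper's pipeline (and taking the mean of monthly fractions only approximates the yearly fraction, since months differ in length):

```python
import numpy as np

def usable_year(obs_frac_per_month, max_missing=0.30,
                min_month_frac=0.20, max_bad_months=2):
    """Illustrative check of the two cleaning rules: discard a station-year
    if more than 30% of its hourly observations are missing, or if more than
    two months contain less than 20% of the possible observations."""
    f = np.asarray(obs_frac_per_month, dtype=float)
    if 1.0 - f.mean() > max_missing:       # too much of the year missing
        return False
    # at most `max_bad_months` months may fall below the coverage floor
    return int(np.sum(f < min_month_frac)) <= max_bad_months
```

A year with full coverage passes, while a year with three nearly empty months is discarded even if its overall missing fraction is below 30%.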
The yearly maxima of precipitation accumulated over 1, 2, . . . , 24 hours are computed for all locations and available years. A rolling window approach with a step size of 1 hour is used for locating the precipitation maxima. As noted by Robinson and Tawn (2000), a sampling frequency of one hour is not enough to observe the exact yearly maximum of hourly precipitation. With this sampling frequency, one only observes the precipitation during the periods 00:00-01:00, 01:00-02:00, etc., whereas the maximum precipitation might occur e.g. during the period 14:23-15:23. Approximately half of the available weather stations have a sampling frequency of one minute, while the other half only contain hourly observations. We therefore use a sampling frequency of one hour for all weather stations, as this allows us to use all the 341 weather stations without having to account for varying degrees of sampling frequency bias in our model. Dyrrdal et al. (2015) used the same data source for estimating return levels of hourly precipitation. They fitted their models to hourly precipitation maxima using only 69 weather stations from all over Norway. However, they received a cleaned data set from the Norwegian Meteorological Institute, resulting in time series with lengths up to 45 years. Our data cleaning approach is more strict than that of Dyrrdal et al. (2015) in the sense that it results in shorter time series by removing all data of uncertain quality. On the other hand, we include more locations and get a considerably better spatial coverage, by keeping all time series with at least one good year of observations.
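The rolling-window maxima described above can be computed efficiently with a cumulative sum; the function name is ours and the sketch ignores windows that cross year boundaries:

```python
import numpy as np

def rolling_sum_max(hourly, d):
    """Maximum precipitation accumulated over any d consecutive hours,
    sliding the window in steps of 1 hour as described in the text."""
    cs = np.concatenate(([0.0], np.cumsum(hourly, dtype=float)))
    # cs[i + d] - cs[i] is the sum of hours i, ..., i + d - 1
    return float(np.max(cs[d:] - cs[:-d]))
```

Applying this for d = 1, 2, ..., 24 to each station-year gives the yearly maxima for all aggregation times.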
The main focus of this paper is the novel methodology for fast and accurate estimation of return levels, and we believe that we have prepared the data well enough to give a good demonstration of our proposed model and to achieve reliable return level estimates for sub-daily precipitation. It is trivial to add more, or differently cleaned, data to improve the return level estimates at a later time.

Precipitation climatologies used as explanatory variables are taken from Crespi et al. (2018). The climatologies do not cover the years 2011-2020, from which most of the observations come. We assume that the precipitation patterns have not changed substantially and that they are still representative for the years 2011-2020. Hanssen-Bauer and Førland (1998) find that, in most southern regions of Norway, the only season with a significant increase in precipitation is autumn. This strengthens our assumption that the change in precipitation patterns is slow enough to not be problematic for us. Dyrrdal et al. (2015) include additional explanatory variables in their model, such as temperature, summer precipitation and the yearly number of wet days. They find mean summer precipitation to be one of the most important explanatory variables. We compute these explanatory variables at all station locations using the gridded seNorge2 data product (Lussana, Saloranta, et al., 2018). Our examination finds that yearly precipitation, summer precipitation and the yearly number of wet days are close to 90% correlated with each other. There is also a negative correlation between temperature and altitude of around −85%. Consequently, we choose not to use any more explanatory variables for modelling, as highly correlated variables might lead to identifiability issues during parameter estimation.

The bGEV distribution
Extreme value theory concerns the statistical behaviour of extreme events, possibly larger than anything ever observed. It provides a framework where probabilities associated with these events can be estimated by extrapolating into the tail of the distribution. This can be used for, e.g., estimating large quantiles, which is the aim of this work (e.g. Coles, 2001; Davison & Huser, 2015). A common approach in extreme value theory is the block maxima method. Assume that the limiting distribution of the standardised block maximum (Y_k − b_k)/a_k is non-degenerate, where Y_k = max{X_1, X_2, . . . , X_k} is the maximum over k random variables from a stationary stochastic process, and {b_k} and {a_k > 0} are some appropriate sequences of standardising constants. Then, for large enough block sizes k, the distribution of the block maximum Y_k is approximately equal to the GEV distribution (Coles, 2001; Fisher & Tippett, 1928), with distribution function

F(y; µ_k, σ_k, ξ) = exp{−[1 + ξ(y − µ_k)/σ_k]_+^{−1/ξ}},    (1)

where (a)_+ = max{a, 0}, σ_k > 0 and µ_k, ξ ∈ R, and the case ξ = 0 is interpreted as the Gumbel limit exp{−exp[−(y − µ_k)/σ_k]}. In most settings, k is fixed, so we denote σ = σ_k and µ = µ_k. A challenge with the GEV distribution is that its support depends on its parameters. This complicates inference procedures such as maximum likelihood estimation (e.g. Bücher & Segers, 2017; Smith, 1985), and can be particularly problematic in a covariate-dependent setting with spatially varying parameters, as it might also introduce artificial boundary restrictions such as an unnaturally large lower bound for yearly maximum precipitation. Castro-Camilo et al. (2021) propose the bGEV distribution as an alternative to the GEV distribution in settings where the tail parameter ξ is non-negative. The support of the bGEV distribution is parameter-free and infinite. This allows for more numerically stable inference, while also avoiding the possibility of estimated lower bounds that are larger than future observations. The bGEV distribution function is

H(y; µ, σ, ξ) = F(y; µ, σ, ξ)^{v(y; a, b)} G(y; µ̃, σ̃)^{1 − v(y; a, b)},    (2)

where F is a GEV distribution with ξ ≥ 0 and G is a Gumbel distribution. 
The weight function is equal to

v(y; a, b) = F_β((y − a)/(b − a); c_1, c_2),

where F_β(·; c_1, c_2) is the distribution function of a beta distribution with parameters c_1 = c_2 = 5, which leads to a symmetric and computationally efficient weight function. The weight v(y; a, b) is zero for y ≤ a and one for y ≥ b, meaning that the left tail of the bGEV distribution is equal to the left tail of G, while the right tail is equal to the right tail of F. The choice of the weight v(y; a, b) should not considerably affect inference if we let the difference between a and b be small. The parameters µ̃ and σ̃ are injective functions of (µ, σ, ξ) such that the bGEV distribution function is continuous and F(y; µ, σ, ξ) = G(y; µ̃, σ̃) for y ∈ {a, b}. Setting a = F^{−1}(p_a) and b = F^{−1}(p_b) with small probabilities p_a = 0.1, p_b = 0.2 makes it possible to model the right tail of the GEV distribution without any of the problems caused by a finite left tail. See Castro-Camilo et al. (2021) for guidelines on how to choose c_1, c_2, p_a and p_b.
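A direct numerical sketch of the blended distribution function follows, with the Gumbel parameters obtained from the continuity conditions G(a) = F(a) and G(b) = F(b) described above. The function name is ours, and note that scipy parametrises the GEV with the opposite sign convention, c = −ξ:

```python
import numpy as np
from scipy.stats import beta, genextreme, gumbel_r

def bgev_cdf(y, mu, sigma, xi, p_a=0.1, p_b=0.2, c1=5.0, c2=5.0):
    """Sketch of the blended GEV CDF: GEV (Frechet-type) right tail,
    Gumbel left tail, mixed through a beta-CDF weight."""
    F = genextreme(c=-xi, loc=mu, scale=sigma)  # scipy uses c = -xi
    a, b = F.ppf(p_a), F.ppf(p_b)
    # Match G to F at a and b on the Gumbel scale l(p) = -log(-log p)
    la, lb = -np.log(-np.log(p_a)), -np.log(-np.log(p_b))
    sigma_g = (b - a) / (lb - la)
    mu_g = a - sigma_g * la
    G = gumbel_r(loc=mu_g, scale=sigma_g)
    # Weight: 0 below a, 1 above b (scipy's beta CDF clamps outside [0, 1])
    v = beta(c1, c2).cdf((np.asarray(y, dtype=float) - a) / (b - a))
    return F.cdf(y) ** v * G.cdf(y) ** (1.0 - v)
```

Above b the blended CDF coincides with the GEV distribution F, so right-tail quantities such as return levels are unchanged, while below a it coincides with the Gumbel distribution G and therefore has unbounded support.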
In the supplementary material we present a simulation study where both the GEV distribution and the bGEV distribution are fitted to univariate samples from a GEV distribution. We demonstrate how a small change in initial values can cause large numerical problems for inference with the GEV distribution, and no noticeable difference for inference with the bGEV distribution. The fact that considerable numerical problems can arise for the GEV distribution in a univariate setting with large sample sizes and perfectly GEV-distributed data strongly indicates that the GEV distribution is not robust enough to be used reliably in complex, high-dimensional problems with noisy data. The bGEV distribution is more robust than the GEV distribution, and we therefore prefer it over the GEV distribution for modelling precipitation maxima in Norway.
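The fragility discussed above stems from the parameter-dependent support: with a positive tail parameter, the GEV density is exactly zero below a finite lower endpoint, so a single observation below that endpoint sends the log-likelihood to minus infinity. A minimal illustration with scipy (which parametrises the GEV with c = −ξ):

```python
import numpy as np
from scipy.stats import genextreme

# A GEV with mu = 10, sigma = 2, xi = 0.4 has lower endpoint
# mu - sigma/xi = 5, so any observation below 5 has zero density.
gev = genextreme(c=-0.4, loc=10.0, scale=2.0)  # scipy uses c = -xi
lower = gev.support()[0]    # finite lower endpoint (here 5)
loglik_bad = gev.logpdf(4.0)  # -inf: one such point ruins the likelihood
```

During optimisation, a parameter proposal whose implied lower endpoint exceeds the smallest observation produces such an infinite log-likelihood, which is one way poor initial values derail GEV inference; the bGEV has no lower endpoint, so this failure mode disappears.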
Although the bGEV distribution is more robust than the GEV distribution, it might still seem unnatural to model block maxima using the bGEV distribution, when it is known that the correct limiting distribution is the GEV distribution. However, we argue that the bGEV would be a good choice for modelling heavy-tailed block maxima even if it were not more robust than the GEV distribution. In multivariate extreme value theory it is common to assume that the tail parameter ξ of the GEV distribution is constant in time and/or space (e.g. Castro-Camilo et al., 2019; Koutsoyiannis et al., 1998; Opitz et al., 2018; Sang & Gelfand, 2010). This assumption is often made, not because one truly believes that it should be constant, but because estimation of ξ is difficult, and models with a constant ξ are often "good enough". The tail parameter is extremely important for the shape of the GEV distribution, and small changes in ξ can lead to large changes in return levels, and can even affect the existence of distributional moments. A model where ξ varies in space can therefore, e.g., provide model fits with a finite mean in one location and an infinite mean in a neighbouring location. Such a model can also give scenarios where a new observation at one location can change the existence of moments in other, possibly far away, locations. Thus, even though it might seem unnatural to use a constant tail parameter, these models often provide more natural fits to data than the models that allow ξ to vary in space. We claim that the bGEV distribution fulfils a similar role to a model with constant ξ, but for the model support instead of the moments. When ξ is positive, the support of the GEV distribution varies with its parameter values. In regression settings with covariates and finite amounts of data one can therefore experience unnatural lower bounds that are known to be wrong. 
Furthermore, if only one new observation is smaller than the estimated lower limit, the entire model fit will be invalidated. We therefore prefer the bGEV, which completely removes the lower bound while still having the right tail of the GEV distribution, thus yielding a model that is "good enough" for estimating return levels, but without the unwanted model properties in the left tail of the GEV distribution.
Naturally, the bGEV distribution can only be applied for modelling exponential- or heavy-tailed phenomena (ξ ≥ 0). However, it is well established that extreme precipitation should be modelled with a non-negative tail parameter. Cooley et al. (2007) perform Bayesian spatial modelling of extreme daily precipitation in Colorado and find that the tail parameter is positive and less than 0.15. Papalexiou and Koutsoyiannis (2013) examine more than 15000 records of daily precipitation worldwide and conclude that the Fréchet distribution performs the best. They propose that even when the data suggest a negative tail parameter, it is more reasonable to use a Gumbel or Fréchet distribution. Less information is available concerning the distribution of extreme sub-daily precipitation. However, Koutsoyiannis et al. (1998) argue that the distribution of precipitation should not have an upper bound for any aggregation period, so ξ must be non-negative. Van de Vyver (2012) estimates the distribution of yearly precipitation maxima in Belgium for aggregation times down to 1 minute, and finds that the estimates of ξ increase as the aggregation time decreases, meaning that the tail parameter for sub-daily precipitation should be larger than for daily precipitation. Dyrrdal et al. (2016) estimate ξ for daily precipitation in Norway from the seNorge1 data product (Mohr, 2009; Tveito et al., 2005) and conclude that the tail parameter estimates are non-constant in space and often negative. However, the authors do not provide confidence intervals or p-values and do not state whether the estimates are significantly different from zero. Based on our own exploratory analysis (results not shown) and the overwhelming evidence in the literature, we assume that sub-daily precipitation is a heavy-tailed phenomenon.
Following Castro-Camilo et al. (2021), we reparametrise the bGEV distribution from (µ, σ, ξ) to (µ_α, σ_β, ξ), where the location parameter µ_α is equal to the α quantile of the bGEV distribution if α ≥ p_b. The scale parameter σ_β, hereby denoted the spread parameter, is equal to the difference between the 1 − β/2 quantile and the β/2 quantile of the bGEV distribution if β/2 ≥ p_b. There is a one-to-one relationship between the new and the old parameters. The new parametrisation is advantageous as it is considerably easier to interpret than the old parametrisation. The parameters µ_α and σ_β are directly connected to the quantiles of the bGEV distribution, whereas µ and σ have no simple connections with any kind of moments or quantiles. Consequently, it is much easier to choose informative priors for µ_α and σ_β. Based on preliminary experiments, we find that α = 0.5 and β = 0.8 are good choices that make it easy to select informative priors. This is because the empirical quantiles close to the median have lower variance. We have also experienced that R-INLA is more numerically stable when the spread is small, i.e., when β is large.
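When α and β/2 are at least p_b, these quantiles lie in the Fréchet part of the bGEV distribution, so the map between the two parametrisations follows from the GEV quantile function Q(p) = µ + σ((−log p)^{−ξ} − 1)/ξ. A sketch of the inverse map, with function and argument names of our own choosing:

```python
import numpy as np
from scipy.stats import genextreme

def gev_params_from_quantile_spread(mu_alpha, sigma_beta, xi,
                                    alpha=0.5, beta_=0.8):
    """Invert the (quantile, spread, tail) parametrisation back to the
    usual GEV parameters (mu, sigma), assuming xi > 0 and that the
    relevant quantiles lie in the GEV part of the bGEV distribution."""
    l = lambda p: (-np.log(p)) ** (-xi)
    # spread = Q(1 - beta/2) - Q(beta/2);  location = Q(alpha)
    sigma = xi * sigma_beta / (l(1.0 - beta_ / 2.0) - l(beta_ / 2.0))
    mu = mu_alpha - sigma * (l(alpha) - 1.0) / xi
    return mu, sigma
```

With α = 0.5 and β = 0.8, a fitted µ_α is simply the median of the yearly maxima and σ_β the width of their central 80% interval, which is what makes prior elicitation straightforward.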

Models
Let y_t(s) denote the maximum precipitation at location s ∈ S during year t ∈ T, where S is the study area and T is the time period in focus. We assume a bGEV distribution for the yearly precipitation maxima, where all observations are assumed to be conditionally independent given the parameters µ_α(s), σ_β(s) and ξ(s). Correct estimation of the tail parameter is a difficult problem that strongly affects estimates of large quantiles. The tail parameter is assumed to be constant, i.e. ξ(s) = ξ. As discussed in Section 3.1, this is a common procedure, as inference for ξ is difficult with little data. The tail parameter is further restricted such that ξ < 0.5, resulting in a finite mean and variance for the yearly maxima. This restriction makes inference easier and more numerically stable. Exploratory analysis of our data supports the hypothesis of a spatially constant ξ < 0.5 and spatially varying µ_α(s) and σ_β(s) (results not shown). Two competing models are constructed for describing the spatial structure of µ_α(s) and σ_β(s).

The joint model
In the first model, denoted the joint model, both parameters are modelled using linear combinations of explanatory variables. Additionally, to draw strength from neighbouring stations, a spatial Gaussian random field is added to the location parameter. This gives the model

µ_α(s) = x_µ(s)^T β_µ + u_µ(s),    log σ_β(s) = x_σ(s)^T β_σ,    (3)

where x_µ(s) and x_σ(s) are vectors containing an intercept plus the explanatory variables described in Table 1, and β_µ and β_σ are vectors of regression coefficients. The term u_µ(s) is a zero-mean Gaussian field with Matérn correlation function, i.e.,

Corr(u_µ(s_i), u_µ(s_j)) = (2^{1−ν}/Γ(ν)) (√(8ν) d(s_i, s_j)/ρ)^ν K_ν(√(8ν) d(s_i, s_j)/ρ).

Here, d(s_i, s_j) is the Euclidean distance between s_i and s_j, ρ > 0 is the range parameter and ν > 0 is the smoothness parameter. The function K_ν is the modified Bessel function of the second kind and order ν. The Matérn family is a widely used class of covariance functions in spatial statistics due to its flexible local behaviour and attractive theoretical properties (Guttorp & Gneiting, 2006; Matérn, 1986; Stein, 1999). Its form also naturally appears as the covariance function of some models for the spatial structure of point rain rates (Sun et al., 2015). Efficient inference for high-dimensional Gaussian random fields can be achieved using the SPDE approach of Lindgren et al. (2011), which is implemented in R-INLA. It is common to fix the smoothness parameter ν instead of estimating it, as the parameter is difficult to identify from data. The SPDE approximation in R-INLA allows for 0 < ν ≤ 1.
We choose ν = 1 as this reflects our beliefs about the smoothness of the underlying physical process. Additionally, Whittle (1954) argues that ν = 1 is a more natural choice for spatial models than the less smooth exponential correlation function (ν = 1/2), and ν = 1 is also the most extensively tested value when using R-INLA with the SPDE approach (Lindgren & Rue, 2015). The joint model is similar to the models of Davison et al. (2012), Dyrrdal et al. (2015), and Geirsson et al. (2015). However, they all place a Gaussian random field in the linear predictor for the log-scale and for the tail parameter. Within the R-INLA framework, it is not possible to model the spread or the tail using Gaussian random fields. Based on the amount of available data and the difficulty of estimating the spread and tail parameters, we also believe that the addition of a spatial Gaussian field in either parameter would simply complicate parameter estimation without any considerable contributions to model performance. Consequently, we do not include any Gaussian random field in the spread or tail of the bGEV distribution.
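The Matérn correlation with ν = 1 can be evaluated directly; the sketch below uses the range parametrisation common in the SPDE literature, κ = √(8ν)/ρ, under which the correlation is roughly 0.1 at distance d = ρ (our reading of that convention; the function name is ours):

```python
import numpy as np
from scipy.special import gamma, kv

def matern_corr(d, rho, nu=1.0):
    """Matern correlation with range rho and smoothness nu, using the
    kappa = sqrt(8 * nu) / rho scaling from the SPDE literature."""
    x = np.sqrt(8.0 * nu) / rho * np.atleast_1d(np.asarray(d, dtype=float))
    out = np.ones_like(x)                 # correlation 1 at distance 0
    pos = x > 0
    out[pos] = (2.0 ** (1.0 - nu) / gamma(nu)
                * x[pos] ** nu * kv(nu, x[pos]))
    return out
```

For ν = 1 the expression reduces to (κd)K_1(κd), which decays smoothly and monotonically with distance, in contrast to the kink at the origin of the exponential correlation function (ν = 1/2).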

The two-step model
The second model is specifically tailored for sparse data with large block sizes. In such data-sparse situations, a large observation at a single location can be explained by a large tail parameter or a large spread parameter. In practice this might cause identifiability issues between σ β (s) and ξ, even though the parameters are identifiable in theory. In order to put a flexible model on the spread while avoiding such issues, Vandeskog et al. (2021) propose a model which borrows strength from the peaks over threshold method for separate modelling of σ β (s).
For some large enough threshold x_thr(s), the distribution of sub-daily precipitation X(s) larger than x_thr(s) is assumed to follow a generalised Pareto distribution (Davison & Smith, 1990),

P(X(s) > x_thr(s) + x | X(s) > x_thr(s)) = (1 + ξx/ζ(s))_+^{−1/ξ},    x > 0,

with tail parameter ξ and scale parameter ζ(s) = σ(s) + ξ(x_thr(s) − µ(s)), where µ(s) and σ(s) are the original GEV parameters from (1). Since ξ is assumed to be constant in space, all spatial variation in the bGEV distribution must stem from µ(s) or σ(s). We therefore assume that the difference x_thr(s) − µ(s) between the threshold and the location parameter is proportional to the scale parameter σ(s). This assumption leads to the spread σ_β(s) being proportional to the standard deviation of all observations larger than the threshold x_thr(s). Based on this assumption, it is possible to model the spatial structure of the spread parameter independently of the location and tail parameters. Denote by σ*_β a standardising constant and by σ*(s) the standard deviation of all observations larger than x_thr(s) at location s. Conditional on σ*(s), the block maxima can be standardised as y*_t(s) = y_t(s)/σ*(s).
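The conditional exceedance probability above is straightforward to evaluate; a small sketch (the function name is ours), which can be checked against scipy's generalised Pareto implementation:

```python
import numpy as np

def gpd_exceedance_prob(x, zeta, xi):
    """P(X > x_thr + x | X > x_thr) under a generalised Pareto model
    with scale zeta > 0 and tail parameter xi > 0."""
    return (1.0 + xi * np.asarray(x, dtype=float) / zeta) ** (-1.0 / xi)
```

For ξ > 0 this survival function decays polynomially rather than exponentially, which is the heavy-tail behaviour the bGEV inherits in its right tail.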
The standardised block maxima have a bGEV distribution with a constant spread parameter σ*_β and location parameter µ*_α(s) = µ_α(s)/σ*(s). Consequently, the second model is divided into two steps. First, we model the standard deviation of large observations at all locations. Second, we standardise the block maxima observations and model the remaining parameters of the bGEV distribution. We denote this as the two-step model. The two-step model shares some similarities with regional frequency analysis (Carreau et al., 2016; Dalrymple, 1960; Hosking & Wallis, 1997; Naveau et al., 2014), which is a multi-step procedure where the data are standardised and pooled together inside homogeneous regions. However, we standardise the data differently and do not pool together data from different locations. Instead, we borrow strength from nearby locations by adding a spatial Gaussian random field to our model and by keeping ξ constant for all locations.
The location parameter µ*_α(s) is modelled as a linear combination of explanatory variables x_µ(s) and a Gaussian random field u_µ(s), just as µ_α(s) in the joint model (3). For estimation of σ*(s), the threshold x_thr(s) is chosen as the 99% quantile of all observed precipitation at location s. The precipitation observations larger than x_thr(s) are declustered to account for temporal dependence, and only the cluster maximum of an exceedance is used for estimating σ*(s). This might sound counter-intuitive, as the aim of the two-step model is to use more data to simplify inference. However, even when only using the cluster maxima, inference is less wasteful than for the joint model. If we used all threshold exceedances for estimating σ*(s), we would need to account for the dependence within exceedance clusters, which would add another layer of complexity to the modelling procedure. Consequently, we have chosen not to model the temporal dependence and to only use the cluster maxima for inference in this paper. To avoid high uncertainty from locations with few observations, σ*(s) is only computed at stations with more than three years of data. In order to estimate σ*(s) at locations with few or no observations, a linear regression model is used, where the logarithm of σ*(s) is assumed to have a Gaussian distribution with precision τ and mean η(s) = x_σ(s)^T β_σ. The estimated posterior mean from the regression model is then used as an estimator for σ*(s) at all locations. Consequently, the complete two-step model is given as

y_t(s)/σ*(s) | µ*_α(s), σ*_β, ξ ~ bGEV(µ*_α(s), σ*_β, ξ),
µ*_α(s) = x_µ(s)^T β_µ + u_µ(s),
log σ*(s) ~ N(x_σ(s)^T β_σ, τ^{-1}).

Notice that the formulation of the two-step model makes it trivial to add more complex components for modelling the spread. One can therefore easily add a spatial Gaussian random field to the linear predictor of log(σ*(s)) while still using the R-INLA framework for inference, which is not possible with the joint model.
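The declustering of threshold exceedances and the regression for log σ*(s) can be illustrated with a small sketch. The run-length gap, the simulated data and the least-squares fit (a simplified frequentist stand-in for the paper's Bayesian Gaussian regression with precision τ) are assumptions made for this example.

```python
import numpy as np

def cluster_maxima(series, threshold, gap=24):
    """Runs declustering (sketch): exceedances separated by more than
    `gap` time steps are split into distinct clusters, and only each
    cluster maximum is kept."""
    idx = np.flatnonzero(series > threshold)
    if idx.size == 0:
        return np.array([])
    breaks = np.flatnonzero(np.diff(idx) > gap)   # start of each new cluster
    clusters = np.split(idx, breaks + 1)
    return np.array([series[c].max() for c in clusters])

def fit_log_spread(sigma_star, X):
    """Least-squares stand-in for the Gaussian regression of log sigma*(s)
    on standardised covariates X (intercept included)."""
    beta, *_ = np.linalg.lstsq(X, np.log(sigma_star), rcond=None)
    return beta

rng = np.random.default_rng(0)
series = rng.gamma(0.5, 2.0, size=5000)           # one station's time series
thr = np.quantile(series, 0.99)
cmax = cluster_maxima(series, thr)                # cluster maxima above thr

# Noise-free toy regression: log sigma*(s) = 1.0 + 0.5 * covariate.
x_cov = rng.normal(size=40)
X = np.column_stack([np.ones(40), x_cov])
beta = fit_log_spread(np.exp(1.0 + 0.5 * x_cov), X)
```

Predictions exp(X @ beta) then give σ*(s) at locations without observations.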
In Section 4 we perform modelling both with and without a Gaussian random field in the spread to test how it affects model performance.
The uncertainty in the estimator for σ * (s) is not propagated into the bGEV model for the standardised response, meaning that the estimated uncertainties from the two-step model are likely to be too small. This can be corrected with a bootstrapping procedure, where we draw B samples from the posterior of log(σ * (s)) and estimate (µ * α (s), σ * β , ξ) for each of the B samples. Vandeskog et al. (2021) show that the two-step model with 100 bootstrap samples is able to outperform the joint model in a simple setting.
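The bootstrap procedure can be sketched as follows, with a GEV fit from scipy standing in for the bGEV fit in R-INLA; the Gaussian posterior for log σ*, the sample sizes and the 20 year return period are assumptions made for the example.

```python
import numpy as np
from scipy.stats import genextreme   # scipy parametrises the GEV with c = -xi

def bootstrap_return_levels(maxima, mean_log_sigma, sd_log_sigma,
                            period=20, B=100, seed=0):
    """Draw B posterior samples of log sigma*, standardise the maxima,
    refit the extreme value model and pool the return level estimates,
    so that the uncertainty in sigma* is propagated."""
    rng = np.random.default_rng(seed)
    p = 1 - 1 / period                 # return level = p-quantile
    levels = []
    for _ in range(B):
        sigma = np.exp(rng.normal(mean_log_sigma, sd_log_sigma))
        c, loc, scale = genextreme.fit(maxima / sigma)
        levels.append(sigma * genextreme.ppf(p, c, loc=loc, scale=scale))
    return np.array(levels)

maxima = genextreme.rvs(-0.2, loc=10, scale=3, size=80,
                        random_state=np.random.default_rng(1))
levels = bootstrap_return_levels(maxima, mean_log_sigma=np.log(2.0),
                                 sd_log_sigma=0.1, B=20)
```

The spread of the pooled estimates then reflects both the sampling uncertainty of the fit and the uncertainty in σ*(s).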
It might seem contradictory to employ a model based on exceedances in our setting, since we claim that the data quality is too poor to use the peaks over threshold model for estimating return levels. However, merely estimating the standard deviation of all threshold exceedances is a much simpler task than estimating spatially varying parameters of the generalised Pareto distribution, including the tail parameter ξ. Thus, while we claim that the available data are not of good enough quality to estimate return levels in a similar fashion to Opitz et al. (2018), we also claim that they are of good enough quality for the simpler task of estimating the trends in the spread parameter. The estimation of all remaining parameters, including ξ, is performed using block maxima data, which we believe to be of better quality.

INLA
By placing a Gaussian prior on β_µ, both the joint and the two-step models fall into the class of latent Gaussian models. This is advantageous as it allows for inference using INLA with the R-INLA library (Bivand et al., 2015; Rue et al., 2009; Rue et al., 2017). The extreme value framework is a relatively recent addition to R-INLA, but in recent years several papers have appeared where extremes are modelled with INLA (e.g. Castro-Camilo et al., 2019; Opitz et al., 2018). R-INLA includes an implementation of the SPDE approximation for Gaussian random fields with a Matérn correlation function, which is used on the random field u_µ(s) for a considerable improvement in inference speed.
A requirement for using INLA is that the model likelihood is log-concave. Unfortunately, neither the GEV distribution nor the bGEV distribution has a log-concave likelihood when ξ > 0. This can cause severe problems for model inference. However, we find that these problems are mitigated by choosing slightly informative priors for the model parameters, which is possible because of the reparametrisation described in Section 3.1. Additionally, we find that R-INLA is more stable when given a response that is standardised such that the difference between its 95% quantile and its 5% quantile equals 1. In the authors' experience, a similar standardisation of the response is also common when using INLA to estimate Weibull distribution parameters within the field of survival analysis. We believe that the combination of slightly informative priors and standardisation of the response is enough to mitigate the problems caused by the lack of log-concavity and to ensure that R-INLA works well with the bGEV distribution.
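The quantile-based standardisation heuristic is simple to implement; a minimal sketch, where the function name and test data are illustrative only:

```python
import numpy as np

def scale_response(y):
    """Scale the response so that the difference between its 95% and 5%
    quantiles equals 1, a heuristic reported to improve numerical
    stability of the fit."""
    s = np.quantile(y, 0.95) - np.quantile(y, 0.05)
    return y / s, s

rng = np.random.default_rng(0)
y = rng.gumbel(loc=10, scale=3, size=1000)   # stand-in block maxima
y_scaled, s = scale_response(y)
```

After fitting on the scaled response, estimated location and spread parameters (and return levels) are transformed back by multiplying with s.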

Evaluation
Model performance can be evaluated using the continuous ranked probability score (CRPS; Friederichs & Thorarinsdottir, 2012; Gneiting & Raftery, 2007; Matheson & Winkler, 1976),

CRPS(F, y) = 2 ∫_0^1 ρ_p(y − F^{-1}(p)) dp, (5)

where F is the forecast distribution, y is an observation, ρ_p(x) = x(p − I(x < 0)) is the quantile loss function and I(·) is an indicator function. The CRPS is a strictly proper scoring rule, meaning that the expected value of CRPS(F, y) with respect to y ~ G is minimised as a function of F if and only if F = G. The importance of proper scoring rules when forecasting extremes is discussed by Lerch et al. (2017). From (5), one can see that the CRPS is equal to the integral of the quantile loss function over all possible quantiles. However, we are only interested in predicting large quantiles, and the model performance for small quantiles is of little importance to us. The threshold weighted CRPS (twCRPS; Gneiting & Ranjan, 2011) is a modification of the CRPS that allows for emphasis on specific areas of the forecast distribution,

twCRPS(F, y) = 2 ∫_0^1 ρ_p(y − F^{-1}(p)) w(p) dp,

where w(p) is a non-negative weight function. A possible choice of w(p) for focusing on the right tail is the indicator function w(p) = I(p > p_0). As described by Bolin and Wallin (2019), the mean twCRPS is not robust to outliers, and it gives more weight to forecast distributions with large variances, i.e. at locations far away from any weather station. A scaled version of the twCRPS, denoted the StwCRPS, is created using theorem 5 of Bolin and Wallin (2019):

StwCRPS(F, y) = S(F, y)/|S(F, F)| + log |S(F, F)|,

where S(F, y) is the twCRPS and S(F, F) is its expected value with respect to the forecast distribution, S(F, F) = ∫ S(F, y) dF(y).
The mean StwCRPS is more robust to outliers and varying degrees of uncertainty in forecast distributions, while still being a proper scoring rule (Bolin & Wallin, 2019).
Using R-INLA we are able to sample from the posterior distribution of the bGEV parameters at any location s. The forecast distribution at location s is therefore given as

F̂(y; s) = (1/m) Σ_{i=1}^m F(y; µ_α^{(i)}(s), σ_β^{(i)}, ξ^{(i)}), (8)

where F is the distribution function of the bGEV distribution and (µ_α^{(i)}(s), σ_β^{(i)}, ξ^{(i)}) are drawn from the posterior distribution of the bGEV parameters for i = 1, . . . , m, where m is a multiple of the number B of bootstrap samples. A closed-form expression is not available for the twCRPS when using the forecast distribution from (8). Consequently, we evaluate the twCRPS and StwCRPS using numerical integration.
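A sketch of this numerical evaluation follows, using a mixture of GEV distributions as a stand-in for the bGEV posterior mixture. The posterior parameter values, grid size and Monte Carlo sample size are assumptions made for the example, and the scaling applied is S(F, y)/|S(F, F)| + log |S(F, F)|.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import genextreme   # c = -xi in scipy's parametrisation

# Illustrative posterior parameter samples (c, location, scale).
params = np.array([(-0.15, 9.8, 3.1), (-0.20, 10.1, 2.9), (-0.18, 10.0, 3.0)])

def F(x):
    """Forecast CDF: equal-weight mixture of the posterior-sample GEVs."""
    return np.mean([genextreme.cdf(x, c, loc=m, scale=s) for c, m, s in params])

# Precompute mixture quantiles F^{-1}(p) on a midpoint grid over (p0, 1).
p0, n = 0.9, 200
ps = p0 + (np.arange(n) + 0.5) * (1 - p0) / n
qs = np.array([brentq(lambda x, p=p: F(x) - p, -1e3, 1e5) for p in ps])

def twcrps(y):
    """twCRPS(F, y) = 2 int rho_p(y - F^{-1}(p)) I(p > p0) dp, midpoint rule."""
    u = y - qs
    return 2 * np.mean(u * (ps - (u < 0))) * (1 - p0)

def stwcrps(y, n_mc=500, seed=0):
    """Scaled twCRPS, with S(F, F) estimated by Monte Carlo over draws from F."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(len(params), size=n_mc)
    draws = genextreme.rvs(*params[idx].T, random_state=rng)
    s_ff = np.mean([twcrps(d) for d in draws])
    return twcrps(y) / abs(s_ff) + np.log(abs(s_ff))
```

Precomputing the quantile grid once makes scoring many observations against the same forecast distribution cheap.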

Modelling sub-daily precipitation extremes in Norway
The models from Section 3 are applied for estimating return levels in the south of Norway. Table 1 shows which explanatory variables are used for modelling the location and spread parameters in both models. All explanatory variables are standardised to have zero mean and unit standard deviation before modelling. Inference for the two-step model is performed both with and without propagation of the uncertainty in σ*(s). The uncertainty propagation is achieved using 100 bootstrap samples, as described in Section 3.2.2. Additionally, we modify the two-step model and add a Gaussian random field u_σ(s) to the linear predictor of the log-spread, to test if this yields any considerable improvement in model performance. Just as u_µ(s), u_σ(s) has zero mean and a Matérn covariance function.

Prior selection
Priors must be specified before we can model the precipitation extremes. By construction, the location parameter µ_α is equal to the α quantile of the bGEV distribution. This allows us to place a slightly informative prior on β_µ, using quantile regression on y*(s) (Koenker, 2005, 2020). We choose a Gaussian prior for β_µ, centred at the α quantile regression estimates and with a precision of 10. The precision for β_µ is unitless because the block maxima have been standardised, as described in Section 3.3. The regression coefficients β_σ differ between the two-step and joint models. In the joint model, all the coefficients in β_σ except the intercept are given Gaussian priors with zero mean and a precision of 10^{-3}. The intercept coefficient, here denoted β_{0,σ}, is given a log-gamma prior with parameters such that exp(β_{0,σ}) has a gamma prior with mean equal to the empirical difference between the 1 − β/2 quantile and the β/2 quantile of the standardised block maxima. The precision of the gamma prior is 10. In the two-step model, all coefficients of β_σ are given Gaussian priors with zero mean and a precision of 10^{-3}, while the logarithm of σ*_β is given the same log-gamma prior as the intercept coefficient in the joint model.
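Centring the prior at quantile regression estimates can be sketched by minimising the pinball loss directly. This is a sketch using scipy's general-purpose optimiser rather than the quantreg machinery of Koenker (2005, 2020), and the simulated data are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize

def quantile_regression(X, y, alpha=0.5):
    """Estimate alpha-quantile regression coefficients by minimising the
    pinball (quantile) loss; a Gaussian prior for the location coefficients
    can then be centred at such estimates."""
    def pinball(beta):
        u = y - X @ beta
        return np.sum(u * (alpha - (u < 0)))
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS starting point
    return minimize(pinball, beta0, method="Nelder-Mead",
                    options={"xatol": 1e-8, "fatol": 1e-8}).x

rng = np.random.default_rng(3)
x = rng.normal(size=500)
X = np.column_stack([np.ones(500), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=500)
beta_hat = quantile_regression(X, y, alpha=0.5)       # close to (1, 2)
```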
The parameters of the Gaussian random fields u_µ and u_σ are given penalised complexity (PC) priors. The PC prior is a weakly informative prior distribution, designed to penalise model complexity by placing an exponential prior on the distance from some base model. Fuglstad et al. (2019) develop a joint PC prior for the range ρ > 0 and standard deviation ζ > 0 of a Gaussian random field, where the base model is defined to have infinite range and zero variance. The prior contains two penalty parameters, which can be determined by specifying the four parameters ρ_0, α_1, ζ_0 and α_2 such that P(ρ < ρ_0) = α_1 and P(ζ > ζ_0) = α_2. We choose α_1 = α_2 = 0.05. ρ_0 is given a value of 75 km for both random fields, meaning that we place a 95% probability on the range being larger than 75 km. To put this range into context, the study area has a dimension of approximately 730 × 460 km², and the mean distance from one station to its closest neighbour is 10 km. ζ_0 is given a value of 0.5 mm for u_σ, meaning that we place a 95% probability on the standard deviation being smaller than 0.5 mm. This seems to be a reasonable value because the estimated logarithm of σ*(s) lies in the range between 0.1 mm and 3.5 mm for all available weather stations and all examined aggregation times. For u_µ we set ζ_0 = 0.5, which is a reasonable value because of the standardisation of the response described in Section 3.3.
A PC prior is also placed on the tail parameter ξ. Opitz et al. (2018) develop a PC prior for the tail parameter of the generalised Pareto distribution, which is the default prior for ξ in R-INLA when modelling with the bGEV distribution. However, to the best of our knowledge, expressions for the PC priors for ξ in the GEV or bGEV distributions are not previously available in the literature. In the supplementary material we develop expressions for the PC prior of ξ ∈ [0, 1) with base model ξ = 0 for the GEV distribution and the bGEV distribution. Closed-form expressions do not exist, but the priors can be approximated numerically. Having computed the PC priors for the GEV distribution and the bGEV distribution, we find that they are similar to the PC prior of the generalised Pareto distribution, which has a closed-form expression and is already implemented in R-INLA. Consequently, we choose to model the tail parameter of the bGEV distribution with the PC prior for the generalised Pareto distribution (Opitz et al., 2018):

π(ξ) = λ√2 (1 − ξ/2)(1 − ξ)^{-3/2} exp(−λ√2 ξ(1 − ξ)^{-1/2}),

with 0 ≤ ξ < 1 and penalty parameter λ. Even though the prior is defined for values of ξ up to 1, a reparametrisation is performed within R-INLA such that 0 ≤ ξ < 0.5. Since the base model has ξ = 0, the prior places more weight on small values of ξ as λ increases. Based on the plots in Figure S2.1 in the supplementary material, we find a value of λ = 7 to give a good description of our prior beliefs, as we expect ξ to be positive but small.
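This prior corresponds to the distance d(ξ) = ξ√(2/(1 − ξ)) from the base model, so the density can be checked numerically. The following sketch evaluates the density and verifies that it integrates to one over [0, 1); the function name is chosen for the example.

```python
import numpy as np
from scipy.integrate import quad

def pc_prior_gp(xi, lam=7.0):
    """PC prior for the generalised Pareto tail parameter with base model
    xi = 0: pi(xi) = lam * exp(-lam * d(xi)) * d'(xi), where the distance
    from the base model is d(xi) = xi * sqrt(2 / (1 - xi))."""
    d = xi * np.sqrt(2.0 / (1.0 - xi))
    dd = np.sqrt(2.0) * (1.0 - xi / 2.0) * (1.0 - xi) ** -1.5   # d'(xi)
    return lam * np.exp(-lam * d) * dd

total, _ = quad(pc_prior_gp, 0.0, 1.0)   # should be very close to 1
```

With λ = 7 the density decreases strongly in ξ, reflecting the preference for small positive tail parameters.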

Cross-validation
Model performance is evaluated using five-fold cross-validation with the StwCRPS. The StwCRPS weight function is chosen as w(p) = I(p > 0.9). Both in-sample and out-of-sample performance are evaluated. The mean StwCRPS over all five folds is displayed in Table 2.
The two-step model outperforms the joint model for all aggregation times. This implies that threshold exceedances can provide valuable information when modelling block maxima. For in-sample estimation, the variant of the two-step model with a Gaussian field and without bootstrapping always outperforms the other variants. However, for out-of-sample estimation, this model performs worse than its competitors. This indicates a tendency to overfit when bootstrapping is not used to propagate the uncertainty in σ*(s) into the estimation of (µ*_α(s), σ*_β, ξ). The two variants of the two-step model that use bootstrapping perform best during out-of-sample estimation. While their model fits yield similar scores, their difference in complexity is considerable, as one model contains two spatial random fields and the other only one. This shows that there is little need for placing an overly complex model on the spread parameter. Consequently, for estimation of the bGEV parameters and return levels, we choose the two-step model with bootstrapping and without a spatial Gaussian random field in the spread.

Parameter estimates
The parameters of the two-step model are estimated for different aggregation times between 1 and 24 hours. Uncertainty is propagated using B = 100 bootstrap samples. Estimation of the posterior of (µ*_α(s), σ*_β, ξ) given some value of σ*(s) takes less than 2 minutes on a 2.4 GHz laptop with 16 GB RAM, and the 100 bootstrap fits can be computed in parallel. On a moderately sized computational server, inference can thus be performed in well under 10 minutes.
The estimated values of the regression coefficients β_µ and β_σ, the spread σ*_β and the standard deviation of the Gaussian field u_µ(s) for the standardised precipitation maxima are displayed in Table 3 for some selected temporal aggregations. These estimates are computed by drawing 20 samples from each of the 100 posterior distributions. The empirical mean, standard deviation and quantiles of these 2000 samples are then reported. There is strong evidence that all the explanatory variables in x_σ(s) affect the spread, with the northing being the most important explanatory variable. There is considerably less evidence that all our chosen explanatory variables have an effect on the location parameter. However, as the posterior distribution of β_µ is estimated using 100 different samples from the posterior of σ*(s), it might be that the different regression coefficients are more significant for some of the standardisations and less significant for others. The explanatory variable with the greatest effect on the location parameter seems to be the mean annual precipitation. Thus, at locations with large amounts of precipitation, we expect the extreme precipitation to be heavier than at locations with little precipitation. From the estimates for β_σ, we also expect more variance in the distribution of extreme precipitation in the south.

Table 2: Mean StwCRPS with weight function w(p) = I(p > 0.9) for five-fold cross-validation performed using out-of-sample estimation and in-sample estimation. The two-step method is tested with and without a Gaussian field in the spread and bootstrapping for propagation of uncertainty. Cross-validation is performed for precipitation aggregated over periods of 1 hour, 3 hours, 6 hours, 12 hours and 24 hours. The best scores are written in bold.

Table 3: Estimated regression coefficients β_µ, β_σ and estimated standard deviation SD(u_µ) of the Gaussian field u_µ(s) in the two-step model for yearly maximum precipitation at different temporal aggregations.

The standard deviation of u_µ(s) is of approximately the same magnitude as most of the regression coefficients in β_µ. Table 4 displays the posterior range of u_µ(s). For the available data, the median number of neighbours within a radius of 50 km is 17, and the median number of neighbours within a radius of 100 km is 36. Based on these numbers, one can see that the Gaussian field is able to introduce spatial correlation between a large number of different stations. The range of the Gaussian field is considerably reduced as the temporal aggregation increases. It seems that, for 1 hour precipitation, the regression coefficients are unable to explain some large-scale phenomenon that considerably affects the location parameter µ_α(s). To correct for this, the range of u_µ(s) has to be large. For longer aggregation periods, this phenomenon is not as important, and the regression coefficients are able to explain most of the large-scale trends. Consequently, the range of u_µ(s) is decreased. The posterior means of u_µ(s) for three different temporal aggregations are displayed over a 1 × 1 km² gridded map in Figure 2. It is known that extreme precipitation dominates in the southeast of Norway for short aggregation times because of its large amount of convective precipitation (see, e.g., Dyrrdal et al., 2015). Based on Figure 2 it becomes evident that our explanatory variables are unable to describe this regional difference when modelling hourly precipitation, and u_µ(s) has to do the job of separating between the east and the west.
As the temporal aggregations increase from one hour to three and six hours, the difference between east and west diminishes, and it seems that the explanatory variables do a better job of explaining the trends in the location parameter µ α (s).

The posterior distribution of ξ is also described in Table 4. The tail parameter decreases quickly as the aggregation time increases, and it is practically constant for precipitation over periods longer than 12 hours. This makes sense given the observation of Barbero et al. (2019) that most 24 hour annual maximum precipitation comes from rainstorms lasting less than 15 hours. Thus, the tail parameter for 24 hour precipitation should be close to the tail parameter for 12 hour precipitation. For 12 hours and up, the tail parameter is so small that one may wonder whether a Gumbel distribution would give a better fit to the data. However, this is not the case for the shorter aggregation times, where the tail parameter is considerably larger.

Return levels
We use the two-step model for estimating large return levels for the yearly precipitation maxima. Posterior distributions of the 20 year return levels are estimated on a grid with resolution 1 × 1 km². The posterior means and the widths of the 95% credible intervals are displayed in Figure 3. For a period of 1 hour the most extreme precipitation is found in the southeast of Norway, while for longer periods, the extreme precipitation moves over to the west coast. These results are expected since we know that the convective precipitation of the southeast dominates for short aggregation periods. At the same time, the southwest of Norway generally has more precipitation, making it the dominant region for longer aggregation times. The spatial structure and magnitude of the 20 year return levels for hourly precipitation are similar to the estimates of Dyrrdal et al. (2015), but with considerably narrower credible intervals. This makes sense as more data are available, and the two-step model is able to perform less wasteful inference. In addition, our model is much simpler: Dyrrdal et al. (2015) include a Gaussian random field in all three parameters, while we only include one in the location parameter. This can also lead to less uncertainty in the return level estimates.

Conclusion
The blended generalised extreme value (bGEV) distribution is applied as a substitute for the generalised extreme value (GEV) distribution for estimation of the return levels of sub-daily precipitation in the south of Norway. The bGEV distribution simplifies inference by introducing a parameter-free support, but can only be applied for modelling of heavytailed phenomena. Sub-daily precipitation maxima are modelled using a spatial Bayesian hierarchical model with a latent Gaussian field. This is implemented using both integrated nested Laplace approximations (INLA) and the stochastic partial differential equation (SPDE) approach, for fast inference. Inference is also made more stable and less wasteful by our novel two-step modelling procedure that borrows strength from the peaks over threshold method when modelling block maxima. Like the GEV distribution, the bGEV distribution suffers from a lack of log-concavity, which can cause problems when using INLA. We are able to mitigate any problems caused by a lack of log-concavity by choosing slightly informative priors and standardising the data. We find that the bGEV distribution performs well as a model for extreme precipitation. The two-step model successfully utilises the additional information provided by the peaks over threshold data and is able to outperform models that only use block maxima data for inference.

Funding
This publication is part of the World of Wild Waters (WoWW) project, which falls under the umbrella of Norwegian University of Science and Technology (NTNU)'s Digital Transformation initiative.

Code and data availability
The necessary code and data for achieving these results are available online at https://github.com/siliusmv/inlaBGEV. The unprocessed data are freely available online, as described in Section 2.

Supplementary material S1 Simulation study
A simulation study is conducted to compare the performance of the generalised extreme value (GEV) distribution and the blended generalised extreme value (bGEV) distribution in a univariate setting. Our ultimate goal when modelling block maxima is to estimate return levels. Thus, the two distributions are compared by evaluating the performance of their return level estimators. We sample n ∈ {25, 50, 100, 500, 1000} i.i.d. "block maxima" from a GEV distribution resembling the estimated distribution of the yearly maxima of hourly precipitation. From Table 3 and Table 4 in "Modelling sub-daily precipitation extremes with the blended generalised extreme value distribution" we find that the posterior mean of the intercept in β_µ is E[β_{µ,0}] = 0.661, and the posterior mean of the intercept in β_σ is E[β_{σ,0}] = 2.835. Consequently, we set α = 0.5, β = 0.8, µ_α = E[β_{µ,0}] · exp(E[β_{σ,0}]) ≈ 11.26, and σ_β = E[σ*_β] · exp(E[β_{σ,0}]) ≈ 2.01. The tail parameter is set equal to the posterior mean from Table 4: ξ = 0.178. Using the n block maxima, we compute maximum likelihood estimators for the parameters of the GEV and bGEV distributions, which are then used to compute return level estimators for periods of length 25, 50, 100, 250 and 500. The actual maximum likelihood estimation is performed on the (µ, σ, ξ) parametrisation, in which µ ≈ 10.05 and σ ≈ 3.21. Optimisation is performed using the optim function in R, where the true values of (µ, σ, ξ) are used as initial values. The entire procedure is repeated 500 times for each value of n. This allows us to estimate the distribution of the maximum likelihood estimators for all return levels in question. Figure S1.1 displays the distribution of all return level estimators using both the GEV and the bGEV distributions. The difference between the distributions of the GEV estimators and the bGEV estimators is negligible for small values of n.
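One replication of this simulation can be sketched with scipy's GEV routines standing in for optim in R; the seed and sample size are assumptions made for the example, and scipy parametrises the GEV with c = −ξ.

```python
import numpy as np
from scipy.stats import genextreme

mu, sigma, xi = 10.05, 3.21, 0.178   # (mu, sigma, xi) used in the supplement

def return_level(T, mu, sigma, xi):
    """T-period return level: the (1 - 1/T) quantile of the GEV."""
    return mu + sigma / xi * ((-np.log(1 - 1 / T)) ** -xi - 1)

rng = np.random.default_rng(0)
y = genextreme.rvs(-xi, loc=mu, scale=sigma, size=1000, random_state=rng)

# Maximum likelihood fit; note xi_hat = -c_hat in scipy's convention.
c_hat, mu_hat, sigma_hat = genextreme.fit(y)
est = return_level(50, mu_hat, sigma_hat, -c_hat)
truth = return_level(50, mu, sigma, xi)
```

Repeating this over many seeds and several values of n gives estimator distributions analogous to those summarised in Figure S1.1.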
For large return periods and large values of n, some slight differences appear. However, in practical situations one rarely encounters as many as 1000 block maxima, and the blocks are almost never large enough for the limit distribution to hold exactly. Thus, the differences in distribution can be considered negligible for large values of n in almost all real-life settings. Larger differences in model fit could theoretically occur for small blocks where the block maxima deviate further from the GEV distribution. However, in such settings, both the GEV and the bGEV distribution would be misspecified, and we have no reason to believe that the misspecification in the GEV distribution would be strictly better or worse than the misspecification in the bGEV distribution. Therefore, our results imply that we do not lose anything in performance by modelling GEV data with the bGEV distribution instead of the GEV distribution.

Figure S1.1: Distributions of return level estimators computed from model fits with the GEV and bGEV distributions to simulated data from a GEV distribution. The dark vertical lines display the true return levels. Return periods are denoted as T, with unit "block sizes", and the number of block maxima used for fitting the GEV and bGEV distributions is denoted as n.
In most real-life settings we do not have as much prior knowledge about the true underlying distribution, and we will likely choose less optimal initial values and/or prior distributions. To examine the effects of choosing less optimal initial values, we repeat the maximum likelihood estimation on the exact same data used for Figure S1.1, using different initial values. Figure S1.2 displays the distributions of the return level estimators when we use an initial value of 0.9 instead of 3.2 for σ, without changing the initial values for µ or ξ. For small values of n the two distributions yield comparable results. However, for large values of n, the performance of the GEV distribution quickly deteriorates, and the return level estimators are shifted to the left. The bGEV estimators, however, seem to have identical distributions in Figures S1.1 and S1.2, and are not affected by the slight change of initial values. This demonstrates how inference with the bGEV distribution is more robust than inference with the GEV distribution. The parameter-dependent support of the GEV complicates likelihood-based inference and makes it more difficult to locate the global maximum of the likelihood, even in simple, univariate settings with unrealistically large amounts of data that are perfectly GEV-distributed.
In such a simple setting, it is relatively easy to see that we chose bad initial values for (µ, σ, ξ). However, when modelling block maxima with a large number of covariates and in high-dimensional settings, it becomes considerably more difficult to choose good enough initial values and/or prior distributions. Additionally, it is not trivial to examine a maximum likelihood estimator or posterior distribution and find out if the inference procedure was successful or not. The GEV return level estimators in Figure S1.2 take on reasonable values and are large enough to seem realistic. Without knowing the truth, it would not be trivial for a practitioner to detect that they are faulty, and caused by numerical problems. Thus, if one performs likelihood-based inference using the GEV distribution, one should always perform extensive checks to evaluate whether the fit is reasonable, or if it is caused by numerical problems. The increased robustness of the bGEV distribution provides more safety, and most practitioners can be more confident in their model fits after fitting the bGEV distribution to block maxima data instead of the GEV distribution.
Similarly, while attempting to model return levels for precipitation in Norway using the gev family in R-INLA, we experienced that, e.g., adding or removing one covariate from the model could completely alter the model fits if our priors were too wide or the data were not correctly standardised. Examining which of the fits were the best required comprehensive and time-consuming cross-validation studies, and it was difficult to find out if the best model fit was "good enough", or if a slight change in priors or covariates would yield considerably better model fits. These problems were considerably mitigated by switching to the bgev family, which resulted in more stable inference, even with less informative priors and less optimal initial values.

Figure S1.2: Distributions of return level estimators computed from model fits with the GEV and bGEV distributions to simulated data from a GEV distribution, when using bad initial values. The dark vertical lines display the true return levels. Return periods are denoted as T, with unit "block sizes", and the number of block maxima used for fitting the GEV and bGEV distributions is denoted as n.

S2 PC prior for the tail parameter
In order to compute the PC prior for the GEV and bGEV distributions with ξ ≥ 0 and base model ξ = 0, one must first compute KLD(π_ξ, π_0) for the GEV distribution and the bGEV distribution. Notice that the base model is identical for both distributions. Writing the GEV distribution function as F(x) = exp(−h(x)), with h(x) = (1 + ξ(x − µ)/σ)^{-1/ξ}, gives the probability density function

f(x) = −h'(x) exp(−h(x)) = σ^{-1} h(x)^{1+ξ} exp(−h(x)).

The KLD for the GEV distribution is equal to

KLD(π_ξ, π_0) = ∫ π_ξ(x) log(π_ξ(x)/π_0(x)) dx = E[−h(X)] + E[log(−σ h'(X))] + E[exp(−(X − µ)/σ)] + E[(X − µ)/σ].

The fourth term of the KLD is simply equal to the expectation of (X − µ)/σ, which is known: E[(X − µ)/σ] = ξ^{-1}(Γ(1 − ξ) − 1). The first term is easily solvable with the substitution u = h(x), since h(X) then follows a standard exponential distribution, giving E[h(X)] = 1. Using the same substitution for the second term, with the knowledge that log(−σ h'(x)) = (1 + ξ) log(h(x)), gives the integral

E[log(−σ h'(X))] = (1 + ξ) ∫_0^∞ log(u) exp(−u) du = −(1 + ξ)γ,

where γ is the Euler-Mascheroni constant. We are unable to find a closed-form expression for the third term. However, using the same substitution u = h(x) gives

E[exp(−(X − µ)/σ)] = ∫_0^∞ exp(−(u^{-ξ} − 1)/ξ) exp(−u) du,

which is finite for ξ > 0. With a change of variables, this integral can be transformed to have finite limits, and it can then easily be approximated numerically. Summarising, the KLD for the GEV distribution is finite for 0 ≤ ξ < 1 and equal to

KLD(π_ξ, π_0) = −1 − (1 + ξ)γ + ξ^{-1}(Γ(1 − ξ) − 1) + ∫_0^∞ exp(−(u^{-ξ} − 1)/ξ) exp(−u) du.

The PC prior has probability density function π(d) = λ exp(−λd) with d = √(2 KLD(π_ξ, π_0)). Transforming the PC prior from a distribution on d to a distribution on ξ gives

π(ξ) = λ exp(−λ d(ξ)) |∂d(ξ)/∂ξ| = (λ/d(ξ)) exp(−λ d(ξ)) |∂KLD(π_ξ, π_0)/∂ξ|. (S2.2)

For the bGEV distribution, we are unable to find a closed-form expression for the KLD integral between the p_a quantile and the p_b quantile, where the Gumbel and Fréchet distributions are blended. Thus, we use numerical integration with µ = 0 and σ = 1 to compute KLD_Blending(π_ξ, π_0). The value of the integral does not depend on the values of µ and σ, but we are unable to compute it without choosing some value for them. Being unable to find an expression for the KLD of the bGEV distribution, we must also use numerical differentiation to estimate the derivative of the KLD needed in (S2.2).
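The numerical approximation described above can be sketched directly from the final KLD expression. The step size of the numerical derivative and the use of scipy's quad on the half-line (rather than the finite-limit change of variables) are simplifications made for this example.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as Gamma

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant

def kld_gev(xi):
    """KLD between the GEV with tail parameter xi and the Gumbel base model
    (mu = 0, sigma = 1): the closed-form terms plus the remaining integral,
    which is evaluated numerically."""
    integral, _ = quad(lambda u: np.exp(-(u ** -xi - 1) / xi - u), 0.0, np.inf)
    return -1 - (1 + xi) * EULER_GAMMA + (Gamma(1 - xi) - 1) / xi + integral

def pc_prior_gev(xi, lam=7.0, h=1e-5):
    """PC prior pi(xi) = (lam / d) * exp(-lam * d) * |dKLD/dxi| with
    d = sqrt(2 * KLD), using a central-difference derivative of the KLD."""
    d = np.sqrt(2.0 * kld_gev(xi))
    d_kld = (kld_gev(xi + h) - kld_gev(xi - h)) / (2.0 * h)
    return lam / d * np.exp(-lam * d) * abs(d_kld)
```

As a sanity check, the KLD tends to zero as ξ → 0 and increases with ξ, as a distance from the base model should.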
The PC priors for the GEV, bGEV and generalised Pareto distributions are displayed in Figure S2.1 for different values of the penalty parameter λ. For all the chosen values of λ, the three priors are so similar that their effects on a posterior distribution will probably be close to identical. The PC prior for the generalised Pareto distribution exists in closed form and is already implemented in R-INLA, while the other PC priors must be computed numerically and are not implemented in R-INLA. Consequently, we choose to use the PC prior of the generalised Pareto distribution for modelling the tail parameter of the bGEV distribution.