Functional and variables selection in extreme value models for regional flood frequency analysis

The problem of estimating return levels of river discharge, relevant in flood frequency analysis, is tackled by relying on extreme value theory. The Generalized Extreme Value (GEV) distribution is assumed to model annual maxima of river discharge registered at multiple gauging stations belonging to the same river basin. The specific features of the data from the Upper Danube basin drive the definition of the proposed statistical model. Firstly, Bayesian P-splines are considered to account for the non-linear effects of station-specific covariates on the GEV parameters. Secondly, the problem of functional and variable selection is addressed by imposing a grouped horseshoe prior on the coefficients, to encourage the shrinkage of non-relevant components to zero. A cross-validation study is organized to compare the proposed modeling solution to other models, showing its potential to reduce the uncertainty of the ungauged predictions without affecting their calibration.


Introduction
An effective prediction of flood phenomena is crucial for protecting and managing territories. A rigorous hydrological approach to the problem can be profitably supported by flood frequency analysis, a data-based framework aimed at providing reliable estimates of the expected return periods of flood events of a given magnitude. When data from multiple gauging stations placed in a target catchment or area are included in the analysis, a regional flood frequency analysis is carried out (Hosking and Wallis, 1997). The main advantage of this approach is the possibility of borrowing strength from the available stations to obtain calibrated and reliable estimates also for ungauged locations. In the framework of regional flood frequency analysis, a pioneering approach is the index flood method by Dalrymple (1960). Such a method is characterized by a multi-step procedure, whose main stages can be summarised as: i) classifying stations into homogeneous regions; ii) choosing a suitable frequency distribution for the locations included in a region; iii) estimating the distribution parameters, commonly relying on the L-moments algorithm. For an overview of this approach, see Hosking and Wallis (1997).
An alternative strategy that allows producing return level estimates also for ungauged locations relies on modeling a block maxima sequence through a Generalized Extreme Value (GEV) distribution, assuming that the parameters depend on covariates and/or other features connected with the stations. In flood frequency analysis, yearly maxima of the river discharge (m³/s) are considered for this purpose. The GEV distribution is a widespread tool of extreme value theory, and such models have already been exploited in regional flood frequency analysis, often adopting a Bayesian inferential approach. For example, Thorarinsdottir et al. (2018) use it to study floods in Norway, including some features of the stations and the related sub-catchments. A similar modeling strategy is also set by Jóhannesson et al. (2022), who propose a computationally efficient procedure for its estimation, exploiting its representation as a latent Gaussian model. In this branch of statistics, two main approaches can be pursued: block maxima and peaks-over-threshold (Coles, 2001; Beirlant et al., 2004). The first strategy considers the maxima of a time block sequence, which are used to estimate the parameters of the GEV distribution. The second procedure consists of two steps: i) the whole dataset is exploited to estimate a threshold above which observations are considered to be extremes; ii) the threshold exceedances are used to fit a Generalized Pareto distribution, which can be shown to be connected with the GEV.
In this paper, the block maxima approach is adopted. According to the Fisher-Tippett-Gnedenko theorem, the GEV distribution arises as the limiting distribution of a normalized sequence of block maxima and, hence, plays a relevant role in this framework. Indeed, this probabilistic result is exploited in extreme value theory, assuming that a sequence of recorded maxima over T distinct temporal blocks (e.g. years or days), denoted by y_t, t = 1, ..., T, is distributed as

y_t | µ, σ, ξ ~ind GEV(µ, σ, ξ), ∀t. (1)

Such a distribution is ruled by three parameters: µ ∈ R controls the location, σ ∈ R⁺ the scale and ξ ∈ R the shape, affecting the behavior of the distribution tails and, consequently, its support. In particular, ξ < 0 implies a short and finite right tail (y_t ∈ (−∞, µ − σ/ξ]), ξ = 0 a light right tail (y_t ∈ R) and ξ > 0 a heavy right tail (y_t ∈ [µ − σ/ξ, +∞)). In the latter case, ξ has an impact also on the existence of the distribution moments: the moment of order ρ is finite if ξ < 1/ρ. The GEV distribution is usually defined through its cumulative distribution function:

F(y; µ, σ, ξ) = exp{ −[1 + ξ(y − µ)/σ]_+^(−1/ξ) }, (2)

where [g]_+ = max(g, 0), and the case ξ = 0 is obtained as the limit F(y; µ, σ, 0) = exp{−exp(−(y − µ)/σ)}.
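As a quick numerical illustration of the CDF in (2), the formula can be coded directly; this is an illustrative sketch (function name and values are not from the paper), including the truncation [g]_+ = max(g, 0) and the Gumbel limit for ξ = 0:

```python
import numpy as np

def gev_cdf(y, mu, sigma, xi):
    """GEV cumulative distribution function; xi = 0 gives the Gumbel limit."""
    z = (y - mu) / sigma
    if abs(xi) < 1e-12:
        return float(np.exp(-np.exp(-z)))
    arg = max(1.0 + xi * z, 0.0)   # the [g]_+ = max(g, 0) truncation
    if arg == 0.0:
        return 0.0                 # below the finite lower endpoint (xi > 0)
    return float(np.exp(-arg ** (-1.0 / xi)))
```

For example, with µ = 1, σ = 2 and ξ = 0.1, `gev_cdf(3.0, 1.0, 2.0, 0.1)` evaluates to about 0.680; for ξ > 0 the support is bounded below by µ − σ/ξ, where the CDF is exactly 0.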
In the statistical analysis of extremes, the most typical output is the estimation of the return level associated with a return period R. It is defined as the quantile Q_p that has a probability p = 1/R of being exceeded in the chosen time block. In other words, the return level Q_{1/R} is expected to be surpassed once every R time blocks and, inverting (2), it is defined as

Q_{1/R} = µ + (σ/ξ)[(−log(1 − 1/R))^{−ξ} − 1]. (3)

Once the probabilistic setting is defined, some remarks about the inferential side of the problem are worthwhile. In this paper, a Bayesian approach is adopted: it is becoming increasingly popular in extreme value statistics thanks to the possibility of eliciting prior information and the natural ability to estimate model uncertainty, propagating it through the distinct steps of the analysis (Coles and Powell, 1996; Coles, 2001). For example, making inference on the return levels (3) requires combining the three distribution parameters that need to be estimated: if the Bayesian approach is chosen and Markov chain Monte Carlo (MCMC) methods are exploited, then draws from the posterior of the parameters can be combined to obtain the whole posterior distribution of Q_{1/R}.
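The draw-by-draw propagation can be sketched as follows; the parameter draws below are simulated stand-ins for MCMC output, not results from the paper:

```python
import numpy as np

def return_level(R, mu, sigma, xi):
    """Invert the GEV CDF at probability 1 - 1/R (formula (3), valid for xi != 0)."""
    y_R = -np.log(1.0 - 1.0 / R)
    return mu + sigma / xi * (y_R ** (-xi) - 1.0)

# Simulated stand-ins for B = 1000 MCMC draws of the three GEV parameters
rng = np.random.default_rng(1)
mu_d = rng.normal(100.0, 5.0, 1000)
sigma_d = np.exp(rng.normal(3.0, 0.1, 1000))
xi_d = rng.normal(0.1, 0.05, 1000)

# Applying (3) to each draw yields the posterior distribution of Q_{1/R}
q100 = return_level(100, mu_d, sigma_d, xi_d)
```

Summaries (posterior mean, credible intervals) of `q100` then quantify the uncertainty of the 100-year return level.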

Data on Danube river basin
The proposed strategy targets the estimation of return levels for the discharge of rivers belonging to the Upper Danube basin (i.e. the part located in Germany and Austria). The analysis considers data that are freely available from different sources, in order to propose a general procedure that can also be replicated in other river basins. In this section, the sources of information considered in the analysis are listed, together with some remarks about the data integration procedure.
The response variable required for implementing a flood frequency analysis is the river discharge, usually measured in m³/s. The time series with daily river discharge observations are retrieved from the GRDC portal (The Global Runoff Data Centre, 1988), selecting all the gauging stations present in the area under study. To determine the final set of locations, some data quality and reliability checks are performed: focusing on the period 1985-2017, only stations with a maximum of 2% daily missing observations in each year are selected. Furthermore, the coordinates of the gauges must lie correctly on the river network, to avoid location inconsistencies and possible mismatches. The shapefiles with the network of main rivers in the basin are retrieved from the River Network Database (Copernicus Programme, 2020). The final database consists of yearly maxima from S = 62 stations that satisfied the aforementioned requirements. The yearly maxima are denoted as y_st, referring to the gauging station s = 1, ..., 62 in the year t = 1, ..., 33. For brevity, y_s indicates the vector of maxima related to a single station s.
River discharge values are complemented by station-specific auxiliary variables, which are listed in Table 1. Most of them are derived from the EU Digital Elevation Model (EU-DEM, Copernicus Programme, 2016), starting from the determination of station sub-catchments through GIS-based tools within the whitebox R package (Lindsay, 2016). The spatial location is accounted for by latitude and longitude; the sub-catchment area is an important measure directly related to the river discharge magnitude; the features of the terrain characterizing the sub-catchment are computed by averaging indicators derived from the EU-DEM over the area (elevation, slope and aspect). In addition, the average rainfall (O'Donnell and Ignizio, 2012) and the proportion of area covered by buildings (from the CORINE land cover raster, Buchhorn et al., 2020) are considered.

Exploratory analysis
This section reports the results of an exploratory analysis aimed at pointing out the main motivations for the modeling strategies dealt with in the paper. A two-step analysis is performed: firstly, a GEV distribution is fitted on the maxima sequences y_s registered at each station s: y_st ~ind GEV(µ_s, σ_s, ξ_s), ∀t. Then, the estimates of the GEV parameters are used as responses in Bayesian semi-parametric additive models, to investigate how they are influenced by the station-specific covariates.
In the first step, S = 62 station-specific GEV models are fitted adopting the Bayesian approach. To retrieve the posterior distributions of the parameters, the model specification must be completed by choosing prior distributions. Given the exploratory purpose of this step, non-informative priors on the parameters are set; namely, zero-mean Gaussian distributions with scale 10,000 for µ_s and log(σ_s), and a standard normal for the shape parameter ξ_s. Samples from the posterior distribution of the GEV parameters are drawn by means of an MCMC algorithm implemented in Stan. As point estimates, the posterior means are then computed; they are denoted as (μ̂_s, σ̂_s, ξ̂_s), ∀s.
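As a rough frequentist counterpart to these station-specific Bayesian fits, a GEV can be fitted per station by maximum likelihood; the sketch below uses scipy on simulated maxima (values are illustrative, not GRDC data) purely to show the per-station fitting step. Note that scipy's genextreme parameterizes the shape as c = −ξ, the opposite sign convention to the paper.

```python
import numpy as np
from scipy.stats import genextreme

# Simulate one station's yearly maxima (illustrative values, not GRDC data)
true_mu, true_sigma, true_xi = 300.0, 80.0, 0.15
y_s = genextreme.rvs(c=-true_xi, loc=true_mu, scale=true_sigma,
                     size=500, random_state=np.random.default_rng(42))

# Maximum likelihood fit: a quick stand-in for the paper's MCMC fit in Stan
c_hat, mu_hat, sigma_hat = genextreme.fit(y_s)
xi_hat = -c_hat   # back to the paper's sign convention
```

With 500 simulated maxima, the fitted (μ̂, σ̂, ξ̂) recover the generating values closely; with only 33 yearly maxima per station, as in the application, the shape estimate is far noisier, which motivates the regional pooling developed later.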
Figure 2 reports the distributions of the estimated GEV parameters, which are contained in the following vectors: μ̂ = (μ̂_1, ..., μ̂_S)', σ̂, ξ̂. The first goal of this exploratory step is to assess whether the multivariate link function proposed in Jóhannesson et al. (2022) is reasonable for the discussed application. We first note that all the MCMC draws from the posteriors of µ_s, ∀s, are positive, and the distribution of μ̂ is markedly skewed. For this reason, a logarithmic transformation of the location parameter might be useful to reduce the skewness, noting that the induced positivity assumption is not restrictive. Furthermore, Pearson's linear correlation between log(μ̂) and log(σ̂) is 0.987: consequently, modeling the functional τ_s = log(σ_s/µ_s) as dispersion parameter in place of σ_s is convenient to reduce the dependency between model parameters. The multivariate link function by Jóhannesson et al. (2022) also encloses a transformation of the shape parameter, h(ξ_s), limiting its range to the interval (−0.5, 0.5). Such a transformation, which will be discussed in detail in the next section, is helpful in stabilizing the estimate of the shape parameter, and it appears to be tenable as only 4 stations show an estimate ξ̂_s higher than 0.5.
In the second step of the analysis, the transformations of the GEV parameter estimates, i.e. log(μ̂), log(σ̂/μ̂) and h(ξ̂), are used as responses in three distinct Bayesian Gaussian additive models implemented by the stan_gamm4() function of the rstanarm package (Goodrich et al., 2022). All the covariates summarized in Table 1, transformed according to the reported function, are included in the model as smooth terms through a Bayesian P-spline representation. This choice is motivated by the presence of non-linear relationships between the covariates and the target parameters (e.g., see Figure 3). The exploration of the posterior results supports the choice of specifying smooth effects for the covariates. The residuals of the fitted additive models are studied in order to check whether a residual spatial trend can be detected (Cooley et al., 2007). The variograms reported in Figure 4 relate both to the model with all the covariates (Full) and to the model without covariates (Null), and they do not point out a relevant residual spatial variation when the full model is considered.
For these reasons, our modeling proposal focuses mainly on the presence of non-linear relationships between the transformations of the GEV parameters and the covariates included in the analysis. On the other hand, the inclusion of spatially structured random effects is omitted to keep the model as simple as possible, focusing attention on the implementation of Bayesian semi-parametric GEV models.

The proposed modeling framework
Let us consider that a collection of N maxima y_st from s = 1, ..., S gauging stations is available for blocking times t = 1, ..., T. It is assumed that, conditionally on the site-specific parameters, the maxima are distributed as:

y_st | µ_s, σ_s, ξ_s ~ind GEV(µ_s, σ_s, ξ_s), ∀s, t. (4)

The assumption of conditional independence represents a simplification, but it is a quite standard one in the extreme value literature when marginal return levels need to be estimated (Dyrrdal et al., 2015; Thorarinsdottir et al., 2018; Jóhannesson et al., 2022). Alternatively, max-stable spatial processes can be considered, even if the complexity of the modeling framework sensibly increases (Asadi et al., 2015). As already hinted in Section 3.1, the multivariate link function by Jóhannesson et al. (2022) is adopted, to obtain transformed parameters for which it is convenient to specify regression models:

ψ_s = log(µ_s), τ_s = log(σ_s/µ_s), φ_s = h(ξ_s). (5)

The station-specific parameters are stored in the following vectors: ψ = (ψ_1, ..., ψ_S)', τ = (τ_1, ..., τ_S)' and φ = (φ_1, ..., φ_S)'. Applying the logarithmic transformation to the location parameter µ_s restricts its domain to R⁺ and, according to the exploratory results of Section 3.1, such an assumption is fulfilled in the discussed application; furthermore, the dependence between scale and location parameters is mitigated by modeling log(σ_s/µ_s) instead of log(σ_s). Lastly, the shape parameter is transformed through the function h(·), with constants (a_φ, b_φ, c_φ) = (0.062376, 0.39563, 0.8). Jóhannesson et al. (2022) proposed it to restrict the domain of ξ_s to the interval (−0.5, 0.5), keeping an approximately linear relationship with ξ_s around zero. The domain restriction for the shape parameter is motivated by the desire to guarantee that the variance of the GEV distribution is finite and that the upper bound of the distribution is greater than µ_s + 2σ_s. Note that the developments discussed later still hold even if a different link function is adopted. If a linear relationship between the covariates and the parameters is assumed, the following latent regression models are specified:

θ = β_0θ 1_S + X β_θ + u_θ, θ ∈ {ψ, τ, φ}, (6)

where X ∈ R^{S×M} is the matrix of the M standardized covariates. Each predictor, related to the generic parameter vector θ ∈ {ψ, τ, φ}, is constituted by an overall intercept β_0θ and a linear regression with coefficients stored in β_θ ∈ R^M. A vector of station-specific unstructured random effects u_θ ∈ R^S completes the equation, to account for possible residual variation. To keep the notation simple, all the model equations in (6) contain the same covariates, as will be the case in the application, but this assumption can be easily relaxed.
The model specification must be completed by setting prior distributions for the parameters. Firstly, weakly informative Gaussian priors are assumed for the coefficients. To account for the fact that the transformed GEV parameters have different magnitudes, these priors are calibrated by exploiting the results from the exploratory analysis of Section 3.1. More in detail, following the advice from the rstanarm package (Goodrich et al., 2022), the following prior is set for the intercepts:

β_0θ ~ N(m_θ, s²_θ), (7)

where m_θ and s²_θ are the mean and the variance of the generic vector of fitted parameters θ̂. Recalling that the covariates are standardized, independent zero-mean Gaussian priors with equal scales are specified for the regression coefficients:

β_θm ~ N(0, s²_β), m = 1, ..., M. (8)

Lastly, focusing on the vector of unstructured random effects, a spherical multivariate Gaussian prior with scale parameter κ_θ is set:

u_θ ~ N_S(0, κ²_θ I_S), κ_θ ~ N⁺(•, •), (9)

where N⁺(•, •) indicates a half-Normal distribution.

GEV regression with Bayesian P-splines
When evidence of non-linear relationships between covariates and responses is pointed out, it is natural to extend the linear models in (6) by allowing for flexible regression terms. Among the possible strategies, the Bayesian P-splines method by Lang and Brezger (2004) is implemented:

θ = β_0θ 1_S + Σ_{m=1}^M B_m γ_θm + u_θ, θ ∈ {ψ, τ, φ}. (10)

The predictors are characterized by the sum of M flexible regression terms, each defined as the product of a matrix B_m ∈ R^{S×K} of cubic B-spline basis functions evaluated at K knots and a vector of associated coefficients γ_θm ∈ R^K. In the P-splines approach, the smoothness of the fitted effect is encouraged by setting a second-order random walk prior on the spline coefficients:

p(γ_θm | ω_θm) ∝ exp{ −γ'_θm K_γ γ_θm / (2 ω²_θm) }. (11)

The matrix K_γ has rank K − 2 and it is a precision matrix describing a second-order random walk, whereas ω_θm is a scaling parameter. Due to the rank deficiency of the precision matrix, the prior is improper and the specification of linear constraints might be required. To better understand the features of the P-splines setting, the representation of (10) as a mixed model can be useful.
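For intuition, the second-order random walk precision can be built as K_γ = D₂'D₂ from the second-difference matrix D₂; a minimal numpy sketch (not the authors' code, names illustrative):

```python
import numpy as np

def rw2_precision(K):
    """Precision matrix K_gamma = D2' D2 of a second-order random walk prior
    on K spline coefficients, where D2 takes second differences."""
    D2 = np.diff(np.eye(K), n=2, axis=0)   # shape (K-2, K), rows (1, -2, 1)
    return D2.T @ D2

K_gamma = rw2_precision(10)
# Rank deficiency of 2: constant and linear coefficient vectors are unpenalized
```

Indeed, K_γ annihilates any linear sequence of coefficients, which is why the prior leaves a polynomial of order one unpenalized.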

Mixed model representation
The linear predictors defined in (10) can be reparameterized by exploiting the spectral decomposition of B_m K⁻_γ B'_m, i.e. the covariance matrix of B_m γ_θm. The model representation defined in the following is particularly suitable to perform functional selection, since the structured and improper prior on the spline coefficients in (11) is traced back to a proper spherical Gaussian prior on the coefficients associated with a matrix of orthonormal bases (Scheipl et al., 2012).
To set the notation, the spectral decomposition is defined as:

B_m K⁻_γ B'_m = U Λ U', (12)

where Λ is the diagonal matrix of eigenvalues, U_+ ∈ R^{S×(K−2)} is the orthogonal matrix with the eigenvectors associated with the positive eigenvalues, and U_0 ∈ R^{S×(S−K+2)} contains the eigenvectors that span the null space of B_m K⁻_γ B'_m. Combining the prior in (11) and the spectral decomposition (12), it is possible to split the generic flexible term B_m γ_θm into a penalized component and an unpenalized one:

B_m γ_θm = x_•m β_θm + B̃_m γ̃_θm. (13)

The unpenalized part is constituted by the term x_•m β_θm and it is strictly related to the null space of the structure matrix that defines the prior assumed for the spline coefficients. Indeed, under the considered second-order random walk, a polynomial of order one in the covariate is required, i.e. a constant term (already included in the overall intercept) and a linear trend in the covariate. Concerning the penalized component, B̃_m = U_+ Λ^{1/2}_+ ∈ R^{S×(K−2)} determines a matrix of orthonormal bases and γ̃_θm ∈ R^{K−2} constitutes the vector of related spline coefficients. Due to the orthogonalization procedure, such a vector of coefficients has a spherical prior:

γ̃_θm ~ N_{K−2}(0, ω²_θm I_{K−2}). (14)

Hence, the linear predictors in (10) can be expressed in the following way:

θ = β_0θ 1_S + Σ_{m=1}^M (x_•m β_θm + B̃_m γ̃_θm) + u_θ. (15)

The model specification can be completed by the already discussed priors (7), (8) and (9), whereas for the scaling parameter ω_θm the same half-Normal prior used for κ_θ in (9) can be set. In this way, a standard Bayesian P-splines model can be implemented, even if only proper priors are specified.
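A numerical sketch of this reparameterization (illustrative, with a random matrix standing in for an actual B-spline basis): eigendecompose the covariance B K⁻_γ B', keep the directions with positive eigenvalues and scale them, so that a spherical prior on the new coefficients reproduces the original covariance.

```python
import numpy as np

def penalized_basis(B, K_gamma, tol=1e-8):
    """Return B_tilde = U_+ Lambda_+^{1/2}: a scaled eigenvector basis such that
    gamma_tilde ~ N(0, I) gives B_tilde @ gamma_tilde the covariance B K^- B'."""
    cov = B @ np.linalg.pinv(K_gamma) @ B.T
    lam, U = np.linalg.eigh(cov)
    pos = lam > tol * lam.max()          # drop the null space (constant + linear)
    return U[:, pos] * np.sqrt(lam[pos])

rng = np.random.default_rng(0)
K = 8
D2 = np.diff(np.eye(K), n=2, axis=0)     # second-difference matrix
B = rng.normal(size=(20, K))             # stand-in for a B-spline basis (S = 20)
B_tilde = penalized_basis(B, D2.T @ D2)  # penalized basis, shape (20, K - 2)
```

By construction B̃B̃' equals B K⁻_γ B', so the improper random walk prior on γ is replaced by a proper spherical one on γ̃, which is what makes the grouped shrinkage priors of the next section applicable.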

Variable selection: the grouped horseshoe prior
When several covariates are available and their relationships with the modeled latent parameters are unknown, it can be useful to set a prior distribution that is able to shrink the non-relevant regressors to zero. Scheipl et al. (2012) proposed the use of spike-and-slab priors for functional selection. The behavior of such priors is also mimicked by horseshoe (HS) priors, which have been proposed in a hierarchical version to deal with the shrinkage of grouped regression terms (Xu et al., 2016). Since all the coefficients related to a covariate can be considered to form a group, a grouped HS prior appears suitable in this framework, rearranging the model in (13) to

θ = β_0θ 1_S + Σ_{m=1}^M X_m α_θm + u_θ,

where X_m = [x_•m, B̃_m] ∈ R^{S×(K−1)} and α_θm = (β_θm, γ̃'_θm)' ∈ R^{K−1}. To implement the grouped HS prior, the following hierarchy is necessary:

α_θm | η_θ, λ_θm, δ_θm ~ N_{K−1}(0, η²_θ λ²_θm diag(δ²_θ1m, ..., δ²_θ(K−1)m)),
δ_θkm ~ C⁺(0, 1), λ_θm ~ C⁺(0, 1), η_θ ~ C⁺(•, •),

where δ_θm = (δ_θ1m, ..., δ_θ(K−1)m)' and C⁺(•, •) denotes a half-Cauchy distribution. The parameter η_θ represents the global scale of the regression coefficients. For this reason, its prior scale is set equal to the standard deviation of the posterior estimates obtained from the station-specific exploratory GEV models, to account for the different magnitudes of the modeled quantities. The prior hierarchy is completed by the covariate-specific scale λ_θm, which controls the relevance of the whole effect, and the coefficient-specific parameter δ_θkm.
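To visualize the hierarchy, one can draw from the grouped HS prior by Monte Carlo; the sketch below (with illustrative sizes, not the paper's M and K) uses the fact that a half-Cauchy draw is the absolute value of a Cauchy draw:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 8, 12                    # illustrative numbers of covariates and knots

eta = abs(rng.standard_cauchy())                  # global scale eta_theta
lam = np.abs(rng.standard_cauchy(M))              # group scales lambda_theta_m
delta = np.abs(rng.standard_cauchy((M, K - 1)))   # coefficient scales delta_theta_km

# Conditionally Gaussian coefficients:
# alpha_theta_m ~ N(0, eta^2 * lam_m^2 * diag(delta_m^2)), one row per covariate
alpha = rng.normal(0.0, eta * lam[:, None] * delta)
```

A small λ_m shrinks the whole group of K − 1 coefficients of covariate m towards zero at once, while the heavy-tailed coefficient-specific δ_km lets individual coefficients escape the shrinkage, which is the behavior exploited for functional selection.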

Posterior inference and model comparison
As previously mentioned, an MCMC approach is adopted to draw B samples from the posterior distributions of the model parameters, by exploiting the Stan probabilistic language. After draws from the posteriors of the basic parameters are obtained, it is possible to consequently retrieve the posterior distributions of other useful quantities. More in detail, it is possible to obtain the posterior for the generic GEV parameter related to station s: θ_s | y. If the interest is in an out-of-sample location s', such a quantity cannot be computed directly, due to the presence of the station-specific random effect term. To propagate the uncertainty, it is possible to obtain a prediction of the GEV parameter defined as θ̃_s' | y = β_0θ + f(x'_s'•) + ũ_θ | y. The function of the covariates f(x'_s'•) depends on the kind of model that is analyzed (linear or spline regression), and the b-th replicate of the random effect term is generated as ũ^(b)_θ ~ N(0, (κ^(b)_θ)²). The samples from θ_s | y or θ̃_s' | y can be combined following (3) to obtain posterior distributions of the return levels, denoted by Q_{1/R,s} | y or Q̃_{1/R,s'} | y for the estimated and the predicted ones, respectively.
The posterior predictive distribution is another important quantity for making predictions and model assessments. It is possible to recover a random sample from it by exploiting the MCMC posterior samples of the GEV parameters. In particular, the b-th replicate from the posterior predictive y_rep_st | y, ∀s, t, is obtained by generating from GEV(µ^(b)_s, σ^(b)_s, ξ^(b)_s). Similarly, the posterior predictive distributions for out-of-sample stations, denoted as ỹ_rep_s't | y, ∀s', t, can be drawn by relying on the posteriors θ̃_s' | y.
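The generation of out-of-sample predictive replicates can be sketched as follows; all numerical values are simulated stand-ins for MCMC draws, and scipy's genextreme uses the opposite shape sign convention, c = −ξ:

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(7)
B = 2000                                   # number of MCMC draws (illustrative)

# Stand-in draws of the linear predictor beta_0 + f(x) and random effect scale
psi_lin = rng.normal(5.5, 0.1, B)
kappa = np.abs(rng.normal(0.1, 0.02, B))

# For each draw b, generate a fresh random effect u ~ N(0, kappa_b^2),
# then invert the log link to get the location parameter for the new station
psi_pred = psi_lin + rng.normal(0.0, kappa)
mu_pred = np.exp(psi_pred)

# Posterior predictive replicates of a yearly maximum (tau and xi held fixed
# for brevity; here tau = log(sigma/mu) = log(0.3))
sigma_pred = 0.3 * mu_pred
y_rep = genextreme.rvs(c=-0.1, loc=mu_pred, scale=sigma_pred, random_state=rng)
```

Each replicate pairs one posterior draw of the parameters with one GEV draw, so the resulting sample reflects both parameter and sampling uncertainty.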
The posterior predictive distribution constitutes the pillar of several model performance evaluation tools that are listed hereafter. In particular, as shown in the next section, a cross-validation study is carried out to assess and compare the performances of the models. The quantities introduced in the following are computed by relying on the posterior predictive ỹ_rep_st | y_−s, i.e., obtained after fitting the model without observations from station s.
To evaluate the calibration of predictions produced by Bayesian models, the probability integral transforms (PIT) are widely used (Dawid, 1984). In particular, they are defined as

PIT_st = P(ỹ_rep_st ≤ y_st | y_−s),

i.e. the cumulative probability of the posterior predictive distribution up to the observed value y_st. If the model predictions are calibrated, PIT values follow a uniform distribution. Bayesian p-values constitute another useful posterior predictive check. They can be flexibly defined, depending on the inferential goal characterizing the procedure. In extreme value estimation, GEV quantiles represent an important target quantity, since they determine return levels. For this reason, station-specific Bayesian p-values are defined for a given return period R:

P-val_{R,s} = P(Q̃_{1/R,s} ≥ q_{1/R}(y_s) | y_−s),

where q_{1/R}(y_s) is the sample quantile of the maxima of station s. In this case, good model performances are indicated by values of P-val_{R,s} near 0.5. Lastly, the continuous ranked probability score (CRPS) is largely used to evaluate probabilistic predictions under continuous densities, also in the extreme value literature (Friederichs and Thorarinsdottir, 2012). It is a score computed specifically for each observation y_st, indicated with CRPS(y_st), and the R package scoringRules can be exploited to evaluate it (Jordan et al., 2019). Note that the model showing lower scores is preferable in terms of calibration and sharpness of the predictions.
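Given replicates from the posterior predictive, each PIT value is simply the share of replicates not exceeding the observation; a minimal illustration with a perfectly calibrated toy model (standard normal data and predictive, not the paper's models):

```python
import numpy as np

def pit(y_obs, y_rep):
    """Empirical PIT: proportion of posterior predictive replicates <= y_obs."""
    return float(np.mean(y_rep <= y_obs))

# Toy check: when the predictive matches the data-generating law, PIT is uniform
rng = np.random.default_rng(3)
pits = [pit(rng.normal(), rng.normal(size=4000)) for _ in range(500)]
```

A histogram of `pits` is then approximately flat on (0, 1); U-shaped or hump-shaped histograms would instead signal under- or over-dispersed predictions.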

Application
The modeling strategies described in Section 4 are applied to the Danube basin data introduced in Section 3. In particular, the results of three different Bayesian models are compared: the one assuming linear effects for the covariates, labeled as Linear and defined by the equations in (6); the basic P-spline model of (10), labeled as Splines; and its extension that automatically performs model selection through grouped horseshoe priors (labeled as Splines-HS).
To assess the performances of the considered models, results from a folded cross-validation study are reported in Section 5.1, whereas the outcomes from the analysis carried out on the full dataset are discussed in Section 5.2.

Cross-validation study
The whole set of S = 62 stations is randomly partitioned into G = 31 groups constituted by a pair of stations each. A folded cross-validation study is executed, by repeatedly fitting the 3 compared models and excluding a pair of stations at each iteration. The quantities introduced in Section 4.2, PIT and CRPS in particular, are evaluated on the out-of-sample stations.
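The leave-two-out partition can be generated with a simple shuffle-and-reshape; a sketch under assumed names and an arbitrary seed:

```python
import numpy as np

# Randomly split the 62 station indices into G = 31 disjoint pairs;
# at fold g, the pair folds[g] is held out and the models are refitted
rng = np.random.default_rng(2024)
folds = rng.permutation(62).reshape(31, 2)
```

Every station appears in exactly one fold, so each station is predicted out-of-sample exactly once over the 31 runs.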
A first indication from the folded cross-validation study concerns the stability of the estimates with respect to the removal of stations. Given that the models are characterized by different parameterizations, the intercepts β_0θ are taken into consideration for this aspect. Figure 5 compares the distributions of the posterior means obtained in the 31 runs to the estimates of the intercepts in the models fitted considering all the stations. The estimation of such parameters seems to be stable: the estimates obtained exploiting the full dataset are often close to the median of the distribution and, in general, included in the boxes. The only exception concerns the φ parameter under Linear, remarking that such a parameter is also characterized by evident differences in the estimates across the models. This could be expected, due to the difficulties in identifying the shape parameter.
The calibration of the predictions produced by the compared models is firstly evaluated by exploring the distribution of PIT_st, recalling that a uniform distribution is required for a calibrated model. The kernel densities are shown in Figure 6, compared with the expected uniform distribution. In the models including flexible regression terms (Splines and Splines-HS), the PIT distributions are more compliant with the uniform than under the Linear model, where the excess of values far from 0 or 1 is more evident. Another indication about the calibration of predictions can be deduced from the Bayesian p-values P-val_{R,s}. To summarise them, P-val*_R denotes the proportion of Bayesian p-values far from the extremes, i.e. included in the interval (0.05, 0.95). Selecting R = 50, P-val*_R is equal to 0.85 for Splines, 0.87 for Splines-HS and 0.90 for Linear; concerning R = 100, a value of 0.87 is observed for Splines, and 0.92 for Linear and Splines-HS. To sum up, the three models show good results in terms of calibration of predictions, with the Linear model slightly penalized in terms of PIT and the Splines model in terms of Bayesian p-values for quantile predictions.
Lastly, further evaluations of the reliability and the sharpness of predictions are discussed. The average CRPS (ACRPS) is computed as a station-specific summary: ACRPS_s = T⁻¹ Σ_t CRPS(y_st), and the distributions across the stations are depicted in Figure 7. To set the proposed Splines-HS model as a benchmark, the values are rescaled by dividing them by the corresponding ACRPS observed under this model. The median of the distributions of the relative ACRPS is above 1 both for the Linear and the Splines models, where, respectively, 60% and 61% of stations have higher ACRPS than under the Splines-HS model. Note that 61% of stations have higher ACRPS under the Linear model than under the Splines one, pointing out the merits of introducing flexible effects in the model. Another indication about the sharpness of predictions can be deduced from the width of the 90% credible intervals of the posterior of the quantiles Q̃_{1/R,s} | y_−s, for R = {50, 100}. Also in this case, Figure 7 reports the distribution of the station-specific widths divided by those obtained under the Splines-HS model. It is interesting to stress that the Splines-HS model has intervals that are, in median, 26.6% and 42.6% narrower than the intervals retrieved with the Splines and Linear models, respectively.

Results
According to the results of the folded cross-validation study presented in the previous section, allowing for non-linear relationships between covariates and GEV parameters leads to some gains in terms of predictive ability. These improvements are even more noticeable when a prior able to automatically execute the variable selection step is assumed. Similar conclusions can also be drawn by comparing the models fitted on the whole dataset. As an overall measure of goodness of fit, the leave-one-out information criterion (LOOIC, Vehtari et al., 2017) is considered; it registers the lowest value for the Splines-HS model (22219.5), followed by the Splines one (22222.1) and, more detached, the Linear model (22245.7).
A first insight into the benefits brought by the models with Bayesian P-splines can be deduced from Table 2, reporting the posterior summaries of the random effects scale parameters κ_θ. Such quantities can be considered as measures of the amount of signal captured by the covariates in the regression models: the higher the values, the lower the variability explained by the covariates. The Linear model registers noticeably higher scales, especially for the random effects related to the parameters ψ and τ. Despite such differences, it is interesting to remark that the in-sample estimates of the station-specific GEV parameters µ_s and σ_s are similar across the considered models, whereas differences can be observed for the shape parameter ξ_s, for which the models induce different levels of shrinkage. These results are depicted by the boxplots in the first row of Figure 8, where the estimates obtained under the station-specific models are also added for benchmarking purposes. As a consequence, the inflation of the scales κ_θ might lead to over-dispersed out-of-sample predictions: such behavior is captured by the PIT distribution previously reported in Figure 6 and by the general increase of the width of the credible intervals (Figure 7).
Let us now shift the focus to the comparison between the two models that include flexible regression terms. Figure 9 shows how three selected covariates (area, elevation and slope) impact the transformations of the GEV parameters. As a first cue, the marked non-linearity of several effects (elevation and slope, primarily) can be noticed. The trends detected by the two models are similar; however, the impact of the grouped HS prior on the spline coefficients emerges. The shrinkage towards zero of negligible effects is evident under the Splines-HS model, especially when modeling φ, i.e. the function of the shape parameter. In this case, the Splines model identifies trends endowed with considerably higher uncertainty, producing intervals that include the 0 value almost everywhere. The decrease in the effect uncertainty is also detectable when modeling the parameters ψ and τ, even if less pronounced. Observing the effect of area on the location parameter ψ, it can be pointed out that the grouped HS prior is also able to firmly lead the flexible effect back to the linearity assumption.
The results concerning covariate effects can be put in relationship with those about the random effects scales shown in Table 2. Indeed, the combination of these outputs helps motivate the lower dispersion of the station-specific estimates of the shape parameters ξ_s under the Splines-HS model, already noticed in the boxplots of the first row of Figure 8. On the other hand, the Splines model produces scattered estimates. In light of the massive shrinkage induced by the grouped HS priors, such variability of the estimates could be poorly supported by the data, possibly leading to problems of instability. In fact, it is widely known that the identification of the shape parameter of the GEV distribution is a tricky task (see, e.g., Jóhannesson et al., 2022), and the grouped HS prior can help in avoiding over-fitting in this framework.
The second and third rows of Figure 8 allow us to deepen the connections between the parameter estimates obtained with the Splines-HS model and the GEV distributions fitted on the single stations. The results related to the in-sample estimates confirm that no relevant differences are detected in estimating µ_s and σ_s, while providing further evidence of the strong shrinkage affecting the estimates of ξ_s, which gather around 0.25. It is also interesting to explore how the GEV parameters are predicted when the data of a station are excluded from the fitting sample, taking the outcomes of the cross-validation study (third row). As expected, the predictions of µ_s and σ_s are more scattered with respect to the estimates from the single-station models, even if the correlation between estimates and predictions is strong. From these diagnostic plots, three stations, whose points are marked by a red triangle, are selected to investigate the inference on return levels under the different modeling strategies. To this aim, the distances between predictions and single-station estimates are considered, taking the stations with the maximum (#6242530), median (#6243240) and minimum (#6342610) distance, noting that these stations are also representative of different values of the shape parameter according to the single-station models.
To complete the analysis of the results, a brief discussion of the estimates and the out-of-sample predictions of river discharge return levels is carried out (outcomes reported in Figure 10). As expected, the in-sample estimates are generally characterized by lower levels of uncertainty than the predictions, whose variability is inflated by the presence of random effects generated from the prior, as described in Section 4.2. Another general trend worth pointing out is that the single-station models produce estimates with larger credible intervals, mainly due to the difficulties in estimating the shape parameters. Conversely, the models fitted on the overall basin allow borrowing strength across the stations, reducing such variability through the aforementioned shrinkage process on ξ_s. Besides, as already pointed out in Section 5.1, the Splines-HS model is also able to produce return level estimates with lower uncertainty than the other strategies, by combining lower variability in the identification of the effects (Figure 9) with lower random-effects scale parameters (Table 2). Despite the narrower bands, the points representing the observed values are included in the credible intervals, with the exception of the predictions for station #6242530, i.e. the one characterized by the maximum distance between predicted and estimated parameters.
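For reference, the T-year return level is the (1 − 1/T) quantile of the fitted GEV, available in closed form for ξ ≠ 0. The sketch below uses hypothetical parameter values (not estimates from the paper) and cross-checks the closed form against scipy's quantile function, recalling that scipy's shape parameter `c` equals −ξ.

```python
import numpy as np
from scipy.stats import genextreme

def gev_return_level(mu, sigma, xi, T):
    """T-year return level: the (1 - 1/T) quantile of the GEV.
    Closed form for xi != 0: mu + sigma/xi * ((-log(1 - 1/T))**(-xi) - 1)."""
    return mu + sigma / xi * ((-np.log(1.0 - 1.0 / T)) ** (-xi) - 1.0)

# Hypothetical GEV parameters (e.g. posterior means for one station)
mu, sigma, xi = 100.0, 30.0, 0.25

z100 = gev_return_level(mu, sigma, xi, T=100)
# Cross-check against scipy's quantile function (shape c = -xi)
z100_scipy = genextreme.ppf(1 - 1 / 100, c=-xi, loc=mu, scale=sigma)
```

Applying the function to each posterior draw of (µ_s, σ_s, ξ_s) yields the posterior distribution of the return level, from which credible intervals like those in Figure 10 are obtained.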

Concluding remarks
This paper aims to illustrate the potential of Bayesian models in introducing flexibility into extreme value analysis. In particular, the linearity assumption, often restrictive when dealing with complex phenomena such as environmental ones, is relaxed by proposing non-linear functional relationships. Furthermore, a suitable regularizing prior is introduced, allowing variable and functional selection steps to be incorporated within the model. The use of the popular Stan software to sample from the posteriors may also encourage practitioners to adopt more sophisticated statistical techniques.
The performances of the models considered in the paper are compared by means of a cross-validation study that evaluates their ability to predict return levels at ungauged locations. In doing so, the advantages brought by the use of spline regression coupled with a regularizing prior can be highlighted. Indeed, its use allows us to considerably reduce the uncertainty of the predictions without affecting model calibration, compared to the other model specifications considered.
Although the application tackles extreme value analysis from the block-maxima perspective, adopting the typical GEV distribution, the underlying idea of setting up a semi-parametric regression with regularizing priors can also be extended to other distributional assumptions and approaches of extreme value theory. Among others, we mention the Blended-GEV by Castro-Camilo et al. (2022), which overcomes the GEV problem of having a finite lower tail when the shape parameter is positive, or the widespread peaks-over-threshold approach. In the latter framework, the proposed strategy might help both in the threshold determination step and in the analysis of the exceedances through the Generalized Pareto distribution.
Lastly, it is worth stressing that the principle behind the use of a prior encouraging grouped variable selection can also be extended to other low-rank structured matrices such as tensors, which are useful to model spatially structured effects, interactions and also categorical variables (Scheipl et al., 2012).

Figure 2: Histograms of posterior means for the station-specific GEV parameters.

Figure 4: Variograms for the residuals of the models fitted with (Full) or without (Null) covariates on the transformations of the GEV parameters.

Figure 5: Boxplots of the G = 31 posterior means from the cross-validation, compared to the estimates obtained in the model fitted with all the stations.

Figure 6: Kernel densities of the distribution of PIT_st, ∀ s, t; the density of the Uniform distribution is shown as a dashed line.

Figure 8: Boxplots of the posterior means of the GEV parameters under the considered models (first row). Comparison between estimates (second row) and out-of-sample predictions (third row) from the Splines-HS model and the station-specific ones. The red triangles indicate the stations considered for the return levels of Figure 10.

Figure 1: Gauging stations of the upper Danube basin included in the analysis.
Table 1: Station-specific covariates used in the analysis.

Table 2: Posterior estimates of the random-effects scale parameters.