Study Area
The study is carried out in the Ahlergaarde catchment (1,055 km²), located in western Denmark (Fig. 1). The land surface elevation ranges from 150 m above sea level in the east to 25 m above sea level at the outlet of the catchment in the west (Sebok et al. 2016). The annual mean precipitation is 1,050 mm and the annual mean temperature is 8.2 °C. The shallow aquifer mostly consists of sandy and silty deposits (Houmark-Nielsen 1989). Intensive agriculture is the dominant land use (80%), followed by forests (10%), heath (6%) and urban areas (4%) (Ridler et al. 2014).
Climate Models
Sixteen GCM-RCM combinations (Table S1) from the EURO-CORDEX initiative (Jacob et al. 2014) are used. Each combination is forced by Representative Concentration Pathway (RCP) 8.5, with the RCM dynamically downscaling the GCM output to a spatial resolution of 0.11°.
The RCM outputs are remapped to match the observed temperature (20 km × 20 km) and precipitation (10 km × 10 km) grids produced by the Danish Meteorological Institute. Raw climate model outputs usually contain systematic errors relative to observations (Maraun 2016). Therefore, the RCM outputs are bias-corrected using a Distribution-Based Scaling (DBS) method (Seaby et al. 2013), which fits a double gamma distribution to correct precipitation and a normal distribution to correct temperature. For precipitation, the cut-off threshold between the two distributions is set at the observed 90th percentile. In an initial step, the number of observed and simulated days without precipitation is matched by setting a threshold below which the daily precipitation outputs are set to zero (Pastén-Zapata et al. 2019).
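As an illustration, the temperature part of a distribution-based scaling reduces to quantile mapping between two fitted normal distributions. The sketch below assumes this form; the function name and the use of Python's `statistics` module are illustrative, not the actual DBS implementation of Seaby et al. (2013):

```python
from statistics import NormalDist, mean, stdev

def dbs_temperature(obs, sim_hist, sim_future):
    """Illustrative distribution-based scaling for temperature.

    Fit a normal distribution to the observed and to the historical
    simulated series, then map each future simulated value through the
    simulated CDF and the observed inverse CDF (quantile mapping).
    """
    f_sim = NormalDist(mean(sim_hist), stdev(sim_hist))
    f_obs = NormalDist(mean(obs), stdev(obs))
    # Clamp probabilities away from 0 and 1 so inv_cdf stays defined.
    return [f_obs.inv_cdf(min(max(f_sim.cdf(x), 1e-9), 1 - 1e-9))
            for x in sim_future]
```

For a normal distribution this mapping is equivalent to a linear rescaling of the mean and variance; the double gamma correction for precipitation follows the same quantile-mapping idea but with two fitted gamma distributions split at the 90th percentile.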
Hydrological Model Setup, Calibration and Validation
The MIKE SHE code, a physically based, integrated and fully distributed model (Abbott et al. 1986; Graham and Butts 2005), is set up for the catchment at a spatial resolution of 500 m × 500 m. The MIKE SHE code is the basis of the national hydrological model of Denmark (Henriksen et al. 2003; Højberg et al. 2013; Stisen et al. 2019), in which the saturated zone, the unsaturated zone, river flow, evapotranspiration and overland flow are included. The dynamics of the unsaturated zone are of critical importance for the hydrological response to climate change.
The robustness of the results is assessed using three different model conceptualizations to simulate flow and evapotranspiration in the unsaturated zone: Richards’ equation, gravity flow and two-layer water balance. The models are described in supplementary Sect. S3 and their calibration and validation procedure and results are shown in Sect. S3.1.
Each bias-corrected GCM-RCM combination (16 in total) is used as the driving climate for each of the calibrated hydrological models (3 in total), producing 48 simulations in total. The average results of each climate model, across the three hydrological models, are used to estimate the projected absolute changes in discharge and groundwater head.
Climate Model Evaluation Metrics and Ranking
The simulation skill of the climate models is determined by comparing observations to simulations of precipitation from 1991 to 2010 following two different pathways (Fig. 2). The primary difference between pathways A and B is that in A, the ranking is based on the match of the raw climate model (CM) outputs to observed precipitation, while in B, the ranking is determined by the match of the bias-corrected CM outputs to observed precipitation following a five-fold cross-validation approach.
The cross-validation scheme evaluates whether the bias-correction method provides accurate results outside its training period (Gutiérrez et al. 2019). The five-fold cross-validation method divides the observation period into five equal-length, non-overlapping blocks; four of the blocks are used to estimate the bias-correction parameters, which are then applied to correct the remaining block. This is repeated for all blocks until a cross-validated time series of the same length as the observation period is produced.
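The block splitting can be sketched as follows (the function name and indexing are illustrative):

```python
def five_fold_blocks(n_days, k=5):
    """Split a period of n_days into k equal-length, contiguous,
    non-overlapping blocks and yield (train, test) index pairs:
    each block is held out once, the other k-1 blocks train the
    bias-correction parameters."""
    size = n_days // k
    blocks = [list(range(i * size, (i + 1) * size)) for i in range(k)]
    for held_out in range(k):
        train = [d for b in range(k) if b != held_out for d in blocks[b]]
        yield train, blocks[held_out]
```

Concatenating the corrected held-out blocks over all five folds yields the cross-validated series used for the pathway B ranking.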
It has been suggested that evaluating cross-validated outputs from free-running bias-corrected models (as pathway B) could give misleading results (Maraun and Widmann 2018). Nevertheless, these outputs are typically used to assess the impacts of climate change and cross-validation is employed to evaluate the reliability of the bias-correction method (Maraun 2016).
A set of nine metrics (9 m) is defined to evaluate the climate models with respect to precipitation (see Table S2). The metrics assess the simulation skill for the mean, 'moderate' extremes, 'high' extremes and variability of precipitation. The extreme metrics are taken from the daily extreme climate change indices (Zhang et al. 2011). To explore the importance of the selection of the metrics, subsets of six (6 m) and three metrics (3 m) are also included in the analysis. In 3 m, the mean behaviour and one 'moderate' extreme metric are evaluated, whereas 6 m includes more metrics on 'moderate' extremes and 9 m adds three 'high' extreme event metrics.
For each metric, the climate models are assigned a score between 1 (smallest bias) and 16 (largest bias). For the purpose of ranking, the scores of all metrics for each model are summed up. Based on the final sum, models are ranked to differentiate their overall relative simulation skills. A low relative value of the final sum represents a model with good simulation skill whereas a high relative value indicates poor simulation skill.
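The scoring and ranking scheme can be sketched as follows (names are hypothetical; ties in a metric are resolved here by input order, a convention the text does not specify):

```python
def rank_models(bias_table):
    """bias_table: {model: {metric: absolute_bias}}.

    For each metric, score the models from 1 (smallest bias) to N
    (largest bias), sum the scores per model, and return the models
    ordered from best (lowest sum) to worst (highest sum)."""
    models = list(bias_table)
    metrics = next(iter(bias_table.values())).keys()
    totals = {m: 0 for m in models}
    for metric in metrics:
        ordered = sorted(models, key=lambda m: bias_table[m][metric])
        for score, m in enumerate(ordered, start=1):
            totals[m] += score
    return sorted(models, key=lambda m: totals[m])
```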
Analysis of the Uncertainty in the Projection
The uncertainty of the hydrological projections is evaluated by analysing the projected ensemble mean river discharge and mean groundwater head of the 16 ensemble members by the end of the century (2071–2100). The analysis focuses on the 5th, 50th and 95th percentiles of each variable from the best performing RCMs still remaining at a given step in the analysis.
The change in the uncertainty is analysed in a series of steps (see Fig. 2). For each step, a set of phases is followed, as described next for step 'n':
- Phase 1. Estimate the standard deviation of the variable of interest (river discharge or groundwater head) considering the RCMs in the ensemble at the beginning of step 'n'.
- Phase 2. For all the RCMs in the current ensemble, evaluate their ability to simulate the historical precipitation using the different subsets of metrics (Table S2) and rank them, as described in Sect. 2.4.
- Phase 3. Remove the worst-ranked RCM from the ensemble.

After Phase 3, all three phases are repeated for the following step 'n + 1'. This approach is applied for nine steps, leaving seven models in the final ensemble. Previous analyses (e.g., Evans et al. 2013; Pennell and Reichler 2011) indicate that an ensemble of about seven models is an appropriate size.
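The stepwise procedure can be sketched as follows, where the `std_of` and `rank` callables stand in for Phases 1 and 2 (all names are hypothetical):

```python
def stepwise_prune(models, std_of, rank, n_steps=9):
    """Run the three-phase pruning loop for n_steps steps.

    Phase 1: record the ensemble spread via std_of(ensemble).
    Phase 2: rank the remaining models (best first) via rank(ensemble).
    Phase 3: drop the worst-ranked model.
    Returns the final ensemble and the spread recorded at each step."""
    ensemble = list(models)
    spreads = []
    for _ in range(n_steps):
        spreads.append(std_of(ensemble))   # Phase 1
        ordered = rank(ensemble)           # Phase 2
        ensemble.remove(ordered[-1])       # Phase 3
    return ensemble, spreads
```

Starting from 16 models, nine steps remove one model each, leaving the seven-member final ensemble described above.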
Comparison with Other Weighting Methods
The results of this approach are compared to other weighting methods. The reliability ensemble averaging (REA) and the upgraded reliability ensemble averaging (UREA) methods are selected for comparison because they can assign more strongly differentiated weights to the climate models (Wang et al. 2019; Chen et al. 2017). Both are multiple-criteria methods and can reduce the discharge uncertainty in the historical period to a larger extent than other weighting methods (Wang et al. 2019). In the present study, only the projections of precipitation are evaluated to define the climate model weights for these methods. REA assigns the weight of a model by assessing its reliability, which is the product of two components: its bias in the historical period and the convergence of its projection with the projection of the whole model ensemble (Giorgi and Mearns 2002). UREA removes the criterion of convergence and replaces it with the skill of each individual model in simulating the observed interannual precipitation variability (the equations used are shown in supplementary Sects. S4 and S5).
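A minimal sketch of REA-style weights is given below. It assumes the general form of Giorgi and Mearns (2002), in which each reliability factor is a natural-variability scale ε divided by the model's absolute bias (or its distance from the ensemble mean change), capped at one; the exact equations used in this study are those in Sect. S4, and the function name is hypothetical:

```python
def rea_weights(bias, convergence, eps):
    """Illustrative REA reliability weights.

    bias[i]: historical-period bias of model i.
    convergence[i]: distance of model i's projected change from the
    ensemble mean change.
    eps: natural variability scale. Each factor is capped at 1; the
    resulting reliabilities are normalised to sum to one."""
    raw = []
    for b, d in zip(bias, convergence):
        r_b = min(1.0, eps / abs(b)) if b else 1.0
        r_d = min(1.0, eps / abs(d)) if d else 1.0
        raw.append(r_b * r_d)
    total = sum(raw)
    return [r / total for r in raw]
```

UREA would replace the convergence factor with a factor measuring the skill in reproducing observed interannual variability, leaving the structure of the weighting unchanged.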
Uncertainty Assessment for the Different Methods
The uncertainty of the methods is assessed first using the standard deviation of the end-of-century projection and then using the signal-to-noise ratio (SNR) of the projected change. The standard deviation of the ensemble is estimated as the square root of the weighted sum of the squared differences between the projection of model i (Xi) and the projected mean of the ensemble (µ), where Wi is the weight of model i:
$$\sigma =\sqrt{{\sum}_{i=1}^{N}{W}_{i}\cdot {\left({X}_{i}-\mu \right)}^{2}}$$
(1)
The SNR estimates the uncertainty of the projected change from the reference period (1981 to 2010), to the future period (2071 to 2100). Thus, the SNR is estimated by dividing the projected mean change of the model ensemble (µ) by its standard deviation (σ):
$$SNR=\frac{\mu }{\sigma }$$
(2)
A larger SNR indicates a relatively small uncertainty in the projected change, whereas a smaller SNR indicates a larger uncertainty.
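Equations (1) and (2) can be sketched as follows (assuming the weights sum to one and µ is taken as the weighted ensemble mean; function names are illustrative):

```python
from math import sqrt

def weighted_std(x, w):
    """Eq. (1): weighted ensemble standard deviation.

    x: projections X_i of the N ensemble members.
    w: weights W_i summing to one.
    Returns (mu, sigma) with mu the weighted ensemble mean."""
    mu = sum(wi * xi for wi, xi in zip(w, x))
    sigma = sqrt(sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, x)))
    return mu, sigma

def snr(x, w):
    """Eq. (2): signal-to-noise ratio of the projected change."""
    mu, sigma = weighted_std(x, w)
    return mu / sigma
```

With equal weights (Wi = 1/N) Eq. (1) reduces to the unweighted population standard deviation of the ensemble.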