Comparison of two model calibration approaches and their influence on future projections under climate change in the Upper Indus Basin

This study performs a comparison of two model calibration/validation approaches and their influence on future hydrological projections under climate change by employing two climate scenarios (RCP2.6 and 8.5) projected by four global climate models. Two hydrological models (HMs), snowmelt runoff model + glaciers and variable infiltration capacity model coupled with a glacier model, were used to simulate streamflow in the highly snow and glacier melt–driven Upper Indus Basin. In the first (conventional) calibration approach, the models were calibrated only at the basin outlet, while in the second (enhanced) approach intermediate gauges, different climate conditions and glacier mass balance were considered. Using the conventional and enhanced calibration approaches, the monthly Nash-Sutcliffe Efficiency (NSE) for both HMs ranged from 0.71 to 0.93 and 0.79 to 0.90 in the calibration, while 0.57–0.92 and 0.54–0.83 in the validation periods, respectively. For the future impact assessment, comparison of differences based on the two calibration/validation methods at the annual scale (i.e. 2011–2099) shows small to moderate differences of up to 10%, whereas differences at the monthly scale reached up to 19% in the cold months (i.e. October–March) for the far future period. Comparison of sources of uncertainty using analysis of variance showed that the contribution of HM parameter uncertainty to the overall uncertainty is becoming very small by the end of the century using the enhanced approach. This indicates that enhanced approach could potentially help to reduce uncertainties in the hydrological projections when compared to the conventional calibration approach.


Introduction
River basins in high Asia with mountainous headwaters having seasonal storage in the form of snow and ice are vital because they contribute substantially to the water supply in the densely populated lowlands by forming a natural buffer against drought (Pritchard 2019;Vanham et al. 2008). The Upper Indus Basin (UIB) is one such example that supplies water to the vast Indus plain, often termed the bread basket of Pakistan (Clarke 2015). It plays an important role in feeding the population of 197 million of Pakistan. But this South Asian region is also a hotspot of climate change which will affect the future water availability (De Souza et al. 2015). The snow and glacier melt-dominated hydrological cycle of the UIB is likely to undergo drastic changes due to the adverse effects of climate change (Hasson 2016) and it is expected that the flow regime could be radically altered for the UIB (Mukhopadhyay 2012).
In the last decade, a number of studies have assessed the future water availability from the UIB. For example, Hasson et al. (2019) showed that when there is no change in glacier area, the annual water availability will increase by 34% and 43% on average in the Himalayan watersheds based on the global warming scenarios of 1.5°C and 2.0°C, respectively. But if glacier area is reduced then the overall water availability in the UIB will be reduced under the prevailing climatic conditions (Hasson 2016). Moreover, Lutz et al. (2014 and concluded that the future water availability for the UIB is highly uncertain but the basin-wide trends and patterns of seasonal shifts in water availability are consistent across climate change scenarios based on the glacio-hydrological model's simulations. Lutz et al. (2016) also projected an initial increase in water availability for the UIB in summer months in the early decades of the twenty-first century and a sharp decline at the end of the century due to diminishing glacier coverage. Similarly, Laghari et al. (2012) concluded that climate change will result in increased water availability in the short term (e.g. < 10 years); however, longterm water availability will decrease. Most of these studies suggested that the reduced water availability in the future will be intensified during the spring and summer months, while snowmelt will occur earlier than the main monsoon flows.
Despite these big modelling efforts in recent years, glacio-hydrological projections are still associated with large uncertainties due to insufficient observations and process understanding in snow-and glacier-dominated basins (Pellicciotti et al. 2012), as well as the coarse resolution of global climate models (GCMs) (Vetter et al. 2015). For example, Krysanova and Hattermann (2017) investigated the climate change impacts on river discharge for 12 large scale river basins and showed that larger uncertainties were associated with global climate projections than hydrological models; however, Vetter et al. (2015) showed that in meltdominated basins, hydrological models (HMs) also have relatively high contribution to uncertainties in hydrological projections. These uncertainties in the HMs can be attributed to model parameterization (Ragettli et al. 2013), implementation of hydrological processes (Orth et al. 2015) or model calibration and validation approach (Troin et al. 2016). The quantification of different sources of uncertainties is an important factor when it comes to interpreting the results of climate impact studies, because different uncertainty sources could influence projected water availability differently.
Judging the model performance before applying it to impact assessments can be one way to minimize the overall uncertainties but it is still questionable if good model performance in the historical time period should increase confidence in projected impacts and decrease the modelrelated uncertainties. For example, Coron et al. (2012) argue that good model performance in the historical time period might show lack of robustness when models are used under a changing climate, if the model parameterization is unable to tackle the non-stationarity of future climate data. However, some studies have shown that when the models are calibrated for longer records that include different hydro-climatic periods, they can reasonably simulate the hydrological processes under different climatic conditions (Guo et al. 2018). Some studies have used a variety of approaches to enhance prediction for non-stationary conditions (e.g. Vaze et al. 2010). These approaches, for example, include better representation of vegetation processes (Murray et al. 2011), more complex hydrological models (Barnard et al. 2019) or use of different approaches for parameterization and calibration (Merz et al. 2011;Smith et al. 2008). The choice of calibration method has a major impact on a model's ability to represent different processes, especially in glacierized basins, because the compensation effect between runoff generation processes can impact the overall results. Therefore, constraining the snow and glacier melt in hydrological modelling is essential for preventing internal process compensation (He et al. 2018;Tarasova et al. 2016).
In the past, most of the climate change impact assessment studies carried out specifically for the UIB only used a conventional calibration method in which model parameters were adjusted based only on comparison of simulated and observed discharges at the downstream outlet of the catchment (Hasson et al. 2019;Lutz et al. 2016) for one continuous time period. A common assumption implicit in most of these studies is that hydrological models calibrated over the historical period are valid for use in the future under the changed climate conditions. However, to our knowledge, there are no studies for the UIB investigating whether a model calibrated for different climatic conditions (dry/wet periods) will simulate different hydrologic response in future than the model parameterized using the conventional method.
In the present study, we have applied a five-step enhanced calibration/validation method (Krysanova et al. 2018) which (1) takes into account the quality of observational data, (2) utilizes contrasting climate data for calibration, (3) includes comparison of discharge for multiple sub-catchments (i.e. intermediate gauges), (4) includes multi-variable calibration for variables like evapotranspiration, glacier mass balance (GMB) or snow cover area (SCA) and (5) evaluates observed and simulated trends in the observational period.
The aim of this study is to compare two calibration/validation approaches applied to two HMs, and analyse their influence on future climate change impact projections and the HMrelated uncertainty. The specific objectives are (a) to apply and analyse two model calibration/validation methods (i.e. conventional and an enhanced approach); (b) to compare climate change impacts on discharge using the two model parameterisations; and (c) to assess and compare the contribution of representative concentration pathways (RCPs)', GCMs' and HMs' (based on two calibration/validation methods) uncertainties to the overall uncertainty. To achieve these objectives, two HMs with different model complexities were used. The lumped conceptual snowmelt runoff model + glaciers (SRM + G) (Schaper et al. 1999), which is a glacier-integrated enhanced version of SRM (Martinec 1975), and the semi-distributed process-based variable infiltration capacity (VIC) model (Liang et al. 1994) coupled with a glacier model (VIC-Glacier). Both HMs were applied to the UIB which has dominant snow and glacier melt regime. For climate change impact assessment the bias-corrected climate scenarios from four GCMs (GFDL-ESM2M, HadGEM2-ES, IPSL-CM5A-LR, MIROC5) driven by two representative concentration pathways (RCPs), which were provided by the ISIMIP project (Hempel et al. 2013;Frieler et al. 2017) were used.

Upper Indus Basin
The Indus River is one of the longest flowing rivers in Asia originating from the Tibetan Plateau and ending in the Arabian Sea. During its course, it runs through the Karakoram and Himalayan ranges. The UIB primarily feeds the largest reservoir Tarbela (Supplementary Fig. S1 1 ) in Pakistan (Fig. 1). The UIB upstream of Tarbela has an approximate drainage area of 173,345 km 2 spread over three countries, namely China, India and Pakistan (Ismail and Bogacki 2018). It has a very rugged mountainous terrain with more than 65% of the area above 4000 m.a.s.l. (Supplementary Fig. S2). The glacier area in the UIB is approximately 21,000 km 2 (~12% of the total UIB area) based on the estimates from the Global Land Ice Measurements from Space (GLIMS) database (Raup et al. 2007). In addition, two sub-basins (Hunza and Astore) were selected for enhanced calibration. More detailed information about the hydro-climatic characteristics of the UIB is given in Supplementary section SM1.

Hydrological models
Two HMs, SRM + G and VIC-Glacier with different levels of complexity in the implementation of modelled processes, were applied in this study. The overall general comparison between the two models has been summarized in Supplementary Table S1.

SRM + G
The Snowmelt Runoff Model (SRM) (Martinec 1975) is a lumped, temperature index (TI) model which was designed to simulate the runoff in snowmelt-dominated regions. It has been successfully applied in more than hundred snowmelt-driven basins around the globe (Martinec et al. 2011). In this study, the catchment is sub-divided into elevation zones with 500 m intervals (with the exception of the lowest zone which extends from 432 to 1000 m.a.s.l.). The input data for the SRM consist of daily air temperature, precipitation and SCA, assigned to each elevation zone. SRM calculates snowmelt using the degree-day method. An enhancement of the original SRM approach called SRM + G has been introduced by incorporating a separate glacier melt component that considers the area covered by exposed (i.e. not snow-covered) glaciers (Schaper et al. 1999). This extension requires the zonal exposed glacier area for each elevation zone as an additional daily input variable and respective model parameters.

VIC-Glacier
The variable infiltration capacity (VIC) model is a macroscale hydrologic model (Liang et al. 1994) which has been successfully applied to many basins in regional and global scale applications (e.g. Huang et al. 2018;Naz et al. 2016) with a spatial resolution ranging from 0.04°to 2°. The VIC model solves full water and energy balances independently for each grid cell within a watershed. A routing algorithm is used to simulate streamflow at a specified location by routing runoff and base flow from each grid cell (Lohmann et al. 1998;Cherkauer et al. 2003). The VIC snow algorithm uses a two-layer energy and mass balance approach to simulate snow accumulation and ablation, including explicit tracking of the energy used for snowmelt and refreezing, and changes to the snowpack heat content (Andreadis et al. 2009). Each VIC model grid cell can be sub-divided into snow elevation bands to represent the effect of sub-grid topography on snow accumulation and melt. As implemented here, the VIC model represents the areal extent of the snow cover based on the assumption that snow fully covers the fractional area of each snow elevation band. To explicitly simulate glacier melt in each grid cell, a novel approach as a third bottom ice layer is introduced to the two-layer VIC snow model ( Supplementary Fig. S3). During the model run, the ice thickness is updated at each model time step for each elevation band as a function of net mass balance changes using a similar approach to Naz et al. (2014) (Supplementary section SM3). It is the first time that an energy balance-based glacier model has been incorporated in the original VIC model, hereafter referred to as VIC-Glacier. The glacier movement and short-term storages of meltwater through englacial and sub-glacial drainage system as well as within a firn layer were not directly considered in the model.

Input weather data and climatic change scenarios
The input data required by both HMs include the altitudinal characteristics, temperature and precipitation while the VIC-Glacier model also requires humidity, wind speed and radiation. Ground observations from the Pakistan Meteorological Department (PMD) and the Water and Power Development Authority (WAPDA) were used to check the quality of the gridded EWEMBI 2 climate dataset (Lange 2018) in the historical period. For the climate impact assessment, the HMs were driven with outputs of four bias-corrected earth system models i.e. GFDL-ESM2M, HadGEM2-ES, IPSL-CM5A-LR and MIROC5 for two representative concentration pathways (RCPs) (i.e. RCP2.6 and RCP8.5).
For the VIC-Glacier model setup, all the above climatic input data were resampled to 0.25°resolution from 0.5°using bilinear interpolation and disaggregated to a 3-hourly temporal resolution. The 3-hourly temperature and downward short-and long-wave radiation data were derived from the daily temperature range and daily total precipitation using methods described in Nijssen et al. (2001) and daily precipitation data were temporally disaggregated by equally apportioning precipitation to 3-hourly intervals for each grid cell at 0.25°resolution.

Spatial model setup
In SRM + G, the whole basin was sub-divided into 11 elevation zones with an interval of 500 m using NASA's 1-arcsecond Shuttle Radar Topography Missions (SRTM) Digital Elevation Model (DEM) ( Supplementary Fig. S2). The MODIS/Terra Snow Cover Daily L3 Global 500 m Grid (MOD10A1) product (Hall et al. 2006) was used to determine the daily SCA after 2000 (Fig. S4). But the historical reference period used for climate change impact was 1976-2005. Therefore, the percentage of SCA before year 2000 and for future time periods was estimated using a new empirical temperature similarity approach based on the assumption that snow cover conditions in an elevation zone are best defined by temperature as described in Supplementary section SM5. Mean monthly historical and future temperature for each zone was first compared and then based on the best fit criteria using the optimum Kling-Gupta efficiency (KGE) (Kling et al. 2012) (Supplementary section SM6) the relevant zonal SCA was selected from the historical MODIS snow cover data (i.e. 2000-2016).
Historical glacier extent was derived from the GLIMS data. The evolution of the exposed glacier area ratio which is required for the SRM + G model is presented in Supplementary  Table S2 and Fig. S5. For future changes in the glacier area, a glacier area reduction scenario was assumed. In this scenario, the glacier area is reduced to 75% in the near future (2011-2040), 50% in the mid future (2041-2070) and 25% in the far future (2071-2099) periods under both RCPs. While in the historical time period, 100% glacier area was used (i.e. no area reduction). Similar assumptions of glacier area reduction for the UIB can be found in Hasson et al. (2019) and Hasson (2016). The assumption that the glacier area will be reduced to 25% at the end of the century is within the range of the results presented in Lutz et al. (2016), where the simulated mean remaining glacier area under two RCPs (i.e. 4.5 and 8.5) using 4 GCMs for the Hindukush-Himalayan-Karakoram (HKH) region by 2100 was 32% and 15%, respectively. Glacier area is reduced from lower to higher zones successively. SRM + G uses runoff coefficients for snow, glacier and rain in order to account for the losses, which are the differences between the available water volume and the outflow from the basin.
For the VIC-Glacier model, each grid cell is divided into a maximum of 25 elevation bands to represent the effect of topography on precipitation and temperature to account for variability within grid. Within each band, the fractional area, average elevation and fractional precipitation were calculated. The average elevation of the band was calculated based on the maximum and minimum elevation within each band using the 30 m SRTM DEM. To account for the effects of sub-grid topography on precipitation distribution within a grid cell, the precipitation fraction within each band was calculated using precipitation climatology data available at 30 arcsecond resolution from WorldClim (version 1.4, Hijmans et al. 2005) averaged over . Using the glacier cover extent, average ice thickness values were calculated using the surface slope information derived from the 30 m SRTM DEM for each band (Supplementary section SM3, Eq. 1). This information was then used as an input to the VIC-Glacier model in addition to fractional area, average elevation and fractional precipitation for the 25 elevation bands. The areal extent of glaciers in the VIC-Glacier model was based on the estimated extent of snow-covered glacier, clean ice and debris-covered areas of the glaciers from cloud-free Landsat scenes at the end of the melting season available between 1998 and 2002 based on Khan et al. (2015), which showed a good agreement (i.e. 91% accuracy) with the GLIMS inventory for the central Karakoram region.

Model calibration and validation
Two model calibration/validation methods (i.e. hereafter referred to as conventional and enhanced methods) were applied. Both the HMs were calibrated and validated using the observed inflows at Tarbela using a manual calibration method. For SRM + G, there were 11 parameters to calibrate including the three runoff coefficients and the separate degreeday factors (DDFs) for snow and glaciers (Supplementary Table S3). For the VIC-Glacier model, 5 parameters including the variable infiltration parameter, bi, the three base flow parameters (W s , D s , and D smax ),and the snow roughness length were used to calibrate the model. After identification of the sensitive and key parameters for both models, the HMs were first calibrated using the conventional method, i.e. only against discharge at the final outlet gauge as adopted in many previous climate impact studies Reed et al. 2004). The models were calibrated for 9 years (i.e. 2003-2011) and validated for 8 years over two periods (2000-2002 and 2012-2016). Model performance was judged only on the basis of the model's ability to simulate the discharge at the basin outlet. Secondly, for the enhanced calibration/validation method, the five calibration and validation steps suggested by Krysanova et al. (2018) were followed, as explained in the following sub-sections.

Checking the data quality
Station data was utilized to check the data quality by comparing the observed monthly temperature and precipitation time series with EWEMBI data for the calibration/validation period (i.e. 2000-2016). Both EWEMBI temperature (Supplementary section SM7-9) and precipitation (Supplementary section SM10) were bias-corrected. In case of temperature, quantile mapping method (Piani et al. 2010) was used for bias correction, while for the precipitation, altitude correction approach adopted by Immerzeel et al. (2015) was used to calculate the bias correction factor. After the bias correction, EWEMBI-based annual precipitation for the whole UIB averaged for the period 2000-2016 is about 604 mm/year. This value is in the range of previously reported e.g. 675 ± 100 mm/year (Reggiani and Rientjes 2014) and 608 mm/year (Khan and Koch 2018).

Calibration and validation for contrasting climatic periods
Two different climatic conditions (i.e. cold/dry and warm/wet) within the given time period were identified to calibrate and validate the HMs for these conditions. Calibration and validation of the model in contrasting climates will give more credibility to the fact that the model can perform well in a changing climate (i.e. considering climatic non-stationarity). The available time period was sub-divided into 'dry/cold' and 'wet/warm' periods based on the basin-averaged bias-corrected annual precipitation and air temperature. The periods 2000-2004 and 2008-2010 were then selected as 'dry/cold' and 'wet/warm' respectively ( Supplementary Fig. S7).

Multi-site calibration and validation
The third step involved the calibration and validation for intermediate gauges. The Daniyor (Hunza) and Doyian (Astore) gauges were selected as intermediate gauges (Fig. 1).
For multi-site calibration and for different climate periods, an optimal parameter set was identified through an objective function θ formulated as eq. 1: where KGE is calculated from the monthly discharge time series. The KGE for the two distinct climates is a weighted sum of KGEs of all selected gauges (Eqs. 2 and 3).
where n = 3 is the number of gauges selected in catchment.

Calibration for additional variables
The calibration/validation of an additional variable means that the model shall not only be restricted to observe discharge but also consider other observations such as snow cover area (SCA), glacier mass balance (GMB) or actual evapotranspiration (ET a ) so that internal consistency of the simulated processes can be ensured. In the UIB, the most important discharge component is meltwater generation through snow and glacier melt. Hence, it is essential to take into account the GMB while simulating discharge in order to prevent internal process compensation (Hagg et al. 2006;Finger et al. 2015;He et al. 2018). In case of SRM + G, degree-day factors for the glaciers (a g ) (Table S3) were manually adjusted to consider the GMB. These calibrated a g values are well within range compared to earlier studies (e.g. Lutz et al. 2016). For SRM + G, GMB is calculated annually by subtracting the glacier melt component from snowfall on to the glaciers. For VIC-Glacier, the net annual GMB is determined from the change in storage states of snow and ice water equivalent at the end of each hydrological year (Supplementary section SM11). The model parameters related to snow accumulation/ablation, snow roughness length and temperature lapse rate were manually calibrated to adjust GMB. Literature-based GMB values available for this region (Supplementary Table S4) were considered for comparing the mean annual mass balance values over the UIB (period: 2000UIB (period: -2016 in the model calibration/validation in order to avoid the compensation effect between glacier and non-glacier runoff. Secondly, basin-averaged ET a based on the Global Land Evaporation Amsterdam Model (GLEAM) (Martens et al. 2017;Miralles et al. 2011) was compared with the simulated ET a of VIC-Glacier and with the losses generated through the runoff coefficients in the SRM + G model.

Validation for trends
The Mann-Kendall Trend Test (Mann 1945& Kendall 1975) was applied to check for consistently increasing or decreasing monotonic trends in simulated and observed discharges. It should be noted that the hydrological trend analysis should be done with long-term data (e.g. 40 years) but here the trend analysis was performed with only 17 years of available data.

Analysis of uncertainties under climate scenarios
In this study, we have applied the analysis of variance (ANOVA, Gottschalk 2006) approach to decompose the sources of uncertainties (Kaufmann and Schering 2014). Here, we have three different sources of uncertainties including two HMs, four GCMs and two RCPs, so we have applied a three-way ANOVA. It gives the share of uncertainties related to a specific source as well as their interaction terms. A more detailed description of the ANOVA method can be found in Vetter et al. (2015).

Calibration and validation of hydrological models
The results for the conventional and enhanced calibration/validation methods based on KGE, NSE and Percent bias (Pbias) criteria are presented in Table 1. Overall for the UIB, the monthly KGE values in the calibration/validation period are around 0.86 for the SRM + G model, while for the VIC-Glacier model the KGE values ranged from 0.57 to 0.64 using the conventional method. SRM + G underestimates discharge in the first half of the year when snowmelt is dominating while it overestimates discharge in July-September when the glacier melt comes into effect. VIC-Glacier overestimates peak flows, with under-prediction of winter base flow in many of the calibration years ( Supplementary Fig. S8).
Using the enhanced method, the model calibration/validation was performed against discharge of the UIB and intermediate gauges of Astore and Hunza for different climate conditions. Additionally, GMB and ET a were also considered. The calibration periods of 2000-2004 and 2008-2010 were selected for the dry and wet climate conditions, respectively. For the model validation, the remaining years (2005-2007 and 2011-2016) were selected.
In general, for the UIB, both HMs perform well for dry and wet periods with KGE ≥ 0.65. For the validation years, the monthly KGE values are 0.91 and 0.65 for the SRM + G and VIC-Glacier HMs, respectively. However, both HMs show relatively weaker performance for the Hunza and Astore basins compared to the whole UIB. The monthly KGE values for both dry and wet periods from both HMs are greater than 0.38 and 0.49 for Hunza and Astore river basins, respectively. For the validation period, the monthly KGE value for SRM + G model is 0.52 in the Hunza basin, while VIC-Glacier model shows weaker performance with KGE value of 0.25. For the Astore basin, SRM + G shows satisfactory performance with KGE value of 0.64 while VIC-Glacier shows very good performance with KGE value of 0.79 in the validation time period. Some unsatisfactory results for the Hunza and Astore basins show the diversified hydrological regimes of the sub-catchments with Hunza being more glacierdominated and Astore as a snowmelt-driven catchment. In case of SRM + G, the DDFs controlling the snow and glacier melt were shifted and adjusted to achieve adequate results in both the Hunza and Astore basins as well as the whole UIB. Consequently, the results for the sub-catchments are not as good as obtained in previous studies (e.g. Tahir et al. 2011Tahir et al. , 2016 since they calibrated and validated each catchment separately, while in the present study a single set of parameters was applied to all sub-basins which explains some differences in results. The unsatisfactory performance of the VIC-Glacier model in the wet and validation periods in the Hunza catchment might be related to the biases in the forcing datasets. While the air temperature in the present study was bias-corrected with station data, for precipitation a single basin-wide multiplication factor was used. This can lead to degraded model performance (Islam and Déry 2017), especially in the sub-catchments with complex spatial and temporal variability in precipitation (Archer and Fowler 2004). Hydrological responses in the UIB can be improved with better quality data particularly for the models that resolve the full energy and water balances such as VIC-Glacier. In order to constrain melt parameters and achieve higher internal model consistency (Mayr et al. 2013), a comparison of simulated GMB with the literature-based GMB was performed. Simulation for calibration/validation period (i.e. 2000-2016) yields GMB values of about − 98 mm/year and − 83 mm/year for SRM + G and VIC-Glacier, respectively, which is in the range of many previous studies for the Karakoram ranges summarized in Supplementary  Table S4.
The validation for ET a for the SRM + G model was not a one-to-one comparison as explained earlier (Sect. 3.4.4). However, the average annual ET a modelled by GLEAM (220 mm/year) vs SRM + G losses (258 mm/year) for the calibration/validation period are relatively close (Supplementary Fig. S9). For the VIC-Glacier model, the simulated long-term average annual ET a is underestimated (125 mm/year) in comparison to GLEAM's ET a , but these results are well in range compared to the values 200 ± 100 mm/year found in the literature (Reggiani and Rientjes 2014). Seasonally, the GLEAM-estimated ET a is generally higher than the VIC-Glacier-simulated ET a (Supplementary Fig. S10). The differences in the ET a might be related to different estimation methods used by VIC-Glacier and GLEAM. VIC-Glacier uses an energy balance approach to account for sublimation, and GLEAM calculates sublimation over snow and ice-covered areas using a specific parameterization of the Priestley and Taylor equation (Martens et al. 2017), which highly relied on the quality of the data from satellites products.
The last step was to validate the model for the observed or lack of trend. The trend was calculated for the annual median flows for the outlet. The calculated Mann-Kendall Tau coefficient (τ) for the observed flows (i.e. 2000-2016) was 0.015 while it was 0.38 and 0.45 for SRM + G and VIC-Glacier, respectively. Similarly, the Sen's slope (S) (Sen 1968) was 1.9, 25.1 and 58.0 [m 3 s −1 year −1 ] for the observed, SRM + G and VIC-Glacier respectively. The positive values of τ and S for the simulated and observed discharges show that there are increasing trends. These trends are statistically significant for the simulated discharges (p < 0.05 significance level).
After the performance evaluation, a possibility of model weighting for the impact assessment was analysed. Model weighting has been recommended in some studies as a feasible approach in reducing uncertainties in climate impact studies (Greene et al. 2006;Zhu et al. 2013). This technique gives more weights to the more skilful models and lesser weights to the models that do not match the observed dynamics (Vetter et al. 2015). In this study, there are only two HMs so both HMs have been assigned equal weights instead of specific weights. If there were more than two HMs, then it would make more sense to apply specific weights based on model performance. The calibration/validation results have been summarized as Taylor diagrams (Taylor 2001) (Supplementary Fig. S11) showing extended performance metrics in addition to Table 1.

Evaluation of climate change scenarios
To assess the climate change scenarios, three periods 2011-2040 (near), 2041-2070 (mid) and 2071-2099 (far) were compared to the historical period 1976-2005. The changes in biascorrected precipitation and temperature were evaluated and compared between the historical and future periods for the UIB (Supplementary Fig. S12). The long-term annual median precipitation based on 4 GCMs in the historical period is about 620 mm. In the RCP2.6 scenario, the change in the average annual precipitation is in the range of 2 to 6% for all 3 future periods while it ranges between 6 and 15% for RCP8.5 with the highest increase at the end of the century.
Long-term changes in air temperature range from 1.4 to 2.0°C and 1.7 to 6.2°C for RCP2.6 and RCP8.5, respectively (historical median is close to 0°C). The highest projected increase in the median temperature is more than 6°C compared to the historical period for RCP8.5 by the end of the current century.

Impacts on the long-term mean annual and seasonal dynamics
Both HMs were run for the period 1976-2099 forced by four GCMs and two RCPs. In total, 24 time series were obtained for each HM (i.e. based on four GCMs, two RCPs, two calibration/validation methods, plus 8 historical runs for two calibration/validation methods), which were then analysed for the long-term mean annual and seasonal dynamics as shown in Fig. 2a and b.
The combined long-term projected changes in discharge from both HMs show contrasting trends for two RCPs and between two calibration/validation methods (Fig. 2a). The largest increases in discharge are projected for RCP8.5 in the mid-century (median: 15%) and far future (median: 19%) based on the enhanced method, and the median changes based on two methods differ in sign for RCP2.6 in all periods and for RCP8.5 in the near future.
The comparison of differences between the mean impacts based on two methods at the annual scale (Fig. 2a) shows minor to moderate differences in impacts based on these two methods for both RCPs and for all three future periods, with a maximum difference of about 9% for RCP2.6 and 10% under RCP8.5 in the far future scenario (Supplementary Table S5). However, at the monthly time scale, there are moderate to large differences in impacts between simulations based on the two calibration/validation methods, especially in the cold months (Supplementary Table S5). The largest differences were found in October-November (6-9%) in all periods under RCP2.6, and in March (− 8%, − 17%) and [18][19] in the mid and far future periods, respectively, under RCP8.5. The simulated seasonal impacts are different based on these two calibration/validation methods especially for RCP8.5.
The projected monthly changes in discharge (Fig. 2b, comparison of the long-term mean monthly flows for historical and future periods are also shown in Supplementary Fig. S13) indicate an increase in discharge in the low flow season (October-March) for all three future periods and both RCPs. An increase in flows is projected in the early spring months where most of the water is contributed by snowmelt. In the high flow season (July-September), there is a projected decrease in discharge for all three future periods. The projected decrease in late summer flows is mainly because of the decrease in glacier melt which is indicated by both the HMs (Table 2). This change in water availability for the high flow season will ultimately impact the overall water availability which is also evident in Fig. 2b. Figure S14 in Supplementary shows a comparison of projected changes in total discharge simulated by both HMs. There are rather small median changes (maximum: 7%) simulated by SRM + G for both RCPs in all periods and by VIC-Glacier for RCP2.6 in all periods and RCP8.5 in near future. However, quite significant increases (31-43%) are projected by VIC-Glacier under RCP8.5 in the mid and far future periods (Table 2). These differences reflect the differences in glacier parameterization in the two models. The future SCA percentages within SRM + G as estimated by the temperature similarity approach were 35% and 45% lower for RCP2.6 and RCP8.5, respectively, when compared to the historic time period (Supplementary section SM5), whereas, VIC-Glacier explicitly models the changes in snow/glacier area as a result of snow and glacier melt within a given elevation zone of a grid cell.

Discharge components for the UIB
In order to assess the hydrological regime changes for different flow components (i.e. snowmelt, glacier melt and rainfall) under the climate change scenarios, analysis of meltwater contribution to annual discharge has been performed separately for the two HMs ( Supplementary Fig. S15 and Table 2). For the SRM + G model, the snowmelt, glacier melt Fig. 2 a Projected changes in the long-term mean annual discharge under RCP2.6 and 8.5 for future periods based on both calibration/validation methods using two HMs and four GCMs. b Projected changes in the mean monthly discharge for the near, mid and far future periods compared to the historical period based on both calibration/validation methods under both RCPs for two HMs using four GCMs and rain contributions to river discharge at Tarbela for the historical time period  are about 48%, 25% and 27%, respectively. These contributions are similar to the earlier studies (e.g. Mukhopadhyay and Khan 2015). For RCP2.6, there is a projected increase in snowmelt ranging from 5 to 6% compared to the historical period ( Table 2). The rainfall contribution increased by 36-40%. In case of glacier melt, large decreases ranging between − 28 and − 79% are projected. Moreover, for RCP8.5, snowmelt contribution initially increased a little in the near future and then decreased by − 12 to − 36% in the mid and far future periods. The rainfall contribution is projected to increase by 20-67%, while the contribution due to glaciers decreases by − 19 to − 45%.
For the VIC-Glacier model, the snowmelt, glacier melt and rain contributions to total discharge are estimated to be about 29%, 31% and 40%, respectively, in the historical period. There is a projected increase in snowmelt contribution ranging from 35 to 42% under RCP2.6 and 35-68% under RCP8.5, whereas the contribution from the rainfall increases by 15-17% under RCP2.6 and 16-58% under RCP8.5. Glacier melt contributions in the future decrease by − 77 to − 93% under RCP2.6 and − 73 to − 77% under RCP8.5. These changes are consistent with the previous projections of changes in glacier and snowmelt in the UIB (e.g. Lutz et al. 2016).
Although two HMs apply different modelling approaches (Sect. 3.3), both show a similar behaviour towards the changes in the glacier melt contribution. It should be noted that SRM + G assumed a glacier area reduction to 25% till the far future period, whereas VIC-Glacier explicitly modelled the glacier area as a result of glacier ablation within a given elevation zone. An important contrast to highlight, in SRM + G, these decreases in glacier melt are mandated by the imposed decrease in glacier area, while in the VIC-Glacier model this reflects simulated change in storage. However, the contradictory trends in projected changes in snowmelt between the two HMs could be due to the fact that SRM + G uses temperature index while VIC-Glacier is employing the two snow layer energy balance approach and explicitly simulates the snow sublimation, as well as differences in the lapsing of temperature which controls the partitioning of precipitation into snow and rain components.

Contribution of sources of uncertainty
The sources of simulation uncertainty were analysed by the ANOVA method as applied by Vetter et al. (2015). The sum of squares for different sources of uncertainty for each 30-year period for the mean discharge has been summarized (Fig. 3). The uncertainty contributions from different sources (i.e. HMs, RCPs, GCMs and interactions) vary but the overall result shows that the uncertainty increases from near to far future. For the enhanced method, the uncertainty contribution from the HMs becomes very small towards the end of the century, compared to the conventional method (Fig. 3). But in the near future, HM uncertainty contribution is larger for the enhanced method which is most likely related to the differences in timings of peak flow in two HMs for two calibration methods. For example, VIC-Glacier simulates the peak discharge in July-August under both calibration methods, while SRM + G simulates the peak in July-August in the conventional and June-August in the enhanced method. This could explain the larger HM uncertainty based on the enhanced method in the first period. Regarding the RCPs, initially, the uncertainty contribution in the near future period is close to zero but it increases for the far future time period as the scenarios diverge. The GCMs uncertainty contribution is higher for the mid future period than the near and far future periods. The 1st and 2nd order interactions between the HMs, RCPs and GCMs also show an increased uncertainty contribution from near future to far future scenarios, whereas, the residual sum of squares which is the portion of total variability that is not explained by the model decreases from near to far future.

Discussion and conclusions
This paper analyses and quantifies the influence of two calibration/validation strategies on future projections for the UIB. By comparing the two approaches for the same catchment, the main outcome is increased confidence in the projections generated using the enhanced calibration/ validation method. The impact results based on the enhanced method are more credible because in this method, (i) calibration parameters were tested for intermediate gauges as well as for the contrasting climatic conditions, (ii) GMB was considered in model calibration/ validation and (iii) model parameters remained homogenous throughout the catchment. It is true that discharge for the intermediate gauges was better reproduced in previous studies (e.g. Hayat et al. 2019) but our weaker results can be attributed to the fact that in previous studies the intermediate gauges were separately calibrated and validated. Calibrating for the contrasting climate in the enhanced method can help to address climate non-stationarity (Vaze et al. 2010) by giving more assurance that the model with these parameters is more suitable to variable climate conditions. The enhanced method also addressed to some extent problems related to equifinality by validating for additional variables (e.g. ET a and GMB).
It can also be concluded that the uncertainty contribution from the HMs is getting smaller for projections simulated using the enhanced method compared to the conventional method, except for the near future (Sect. 4.5). The two different parameterizations for the HMs provided quite different impact assessment results under RCP8.5. As the enhanced method is considered more reliable, impacts based on this method are more trustworthy. Different HMs might give different signals for projected hydrological change, especially for the different flow components as shown here, where one model showed increased snowmelt and the other decreased. It is recommended to use a multimodel ensemble for impact assessment in order to minimize the model-related uncertainties.
The assessment of the hydrological regime changes in the UIB is a complex subject because it involves snow and glacier melt-dominated processes and uncertainties related to these methods. There exist different estimates on the contributions from glacier and snowmelt for the UIB in the literature. Recently, some studies for the UIB showed that there is an increase in overall discharge. For example, Hasson et al. (2019) suggested that in the UIB, future flow will increase from 34 to 43%, while assuming that glacier areas remain unchanged. But if it is assumed that glacier area will decline in the future then ultimately flows will show a negative trend (i.e. UIB discharge will reduce to roughly 8% under RCP4.5) Hasson (2016). Moreover, Lutz et al. (2016) also concluded that there could be both rising as well as falling trends for the water availability in the UIB ranging from −15 to 60% by the end of the twenty-first century under RCP4.5 and 8.5, based on a different GCM ensemble and for different areal extent of the UIB.
Bearing in mind that the focus of this study is the comparison of different calibration strategies in a mountainous catchment, we have to admit that some assumptions in the modelling had certain shortcomings (due to data scarcity and data quality problems in this region) for an accurate estimation of future contributions of snow and glacier to river discharge. For example, in the VIC-Glacier model, the change in glacier area is explicitly modelled as a result of glacier ablation within a given elevation zone of a grid cell, although no glacier movement to lower elevations is simulated. As a result, the decrease in glacier area is likely overestimated which may also contribute to the strong decrease in glacier melt contribution by end of the century (more than 70%). In SRM + G, this is overcome by forcing future glacier area reduction based on results of previous studies on glaciological modelling. We acknowledge that if different glacier area reduction scenarios had been considered for each RCP, then the outcome would have been different. Thus, a precise assessment of existing glaciers tipping point is critical for assessing the future water availability.
Long-term mean annual GMB as simulated by SRM + G in the historical period  is about − 86 mm/year, with similar trends in future periods ranging between − 52 and − 15 mm/year, and − 85 and − 74 mm/year for RCP2.6 and RCP8.5, respectively ( Supplementary Fig. S16). For VIC-Glacier, the long-term historical mean annual GMB is about 0.62 mm/year. While for the future period, it ranges between 22 and 44 mm/year and − 125 and 3 mm/year for RCP2.6 and RCP8.5, respectively ( Supplementary Fig. S16). Simulated GMB for the calibration/validation period (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016) is comparable to earlier studies (e.g. Bolch et al. 2017;Muhammad et al. 2019) for the Karakoram ranges (Supplementary Table S4). The use of different datasets and time periods as well as different size of study areas is among the potential reasons for differences between the modelled GMB results compared to previous studies. Although our knowledge of snow/glacier changes and their responses to climate forcing is still incomplete, consideration of GMB in the enhanced method makes the simulated glacier melt contribution more credible. However, it is important to note that presently the glacier evolution is not explicitly modelled in both HMs. A dynamic representation of glaciers in a HM would be better suited to simulate future glaciohydrological changes in such a catchment (Wortmann et al. 2019).
Moreover, the procedural gap like the unavailability of simulated actual evapotranspiration (ET a ) data from SRM + G model was handled by using model losses as a proxy for ET a . Besides, a new temperature similarity approach was used to estimate the future depletion of snow cover in the catchment which can be improved by taking into account seasonal snow accumulation. This is to say that the models used here can be further improved for a more accurate assessment of future water availability but are used here to demonstrate the influence of different calibration strategies on projection of discharge.
The main conclusions from this study can be summarized as follows: & The model parameterization achieved in the enhanced calibration/validation method is better suited for climate change impact assessment because these parameters were tested at three gauges under different climate conditions and GMB was also given due consideration. & Calibration/validation for multiple variables (i.e. discharge, GMB, ET a ) leads to more accurate simulation of hydrological processes with little sacrifice of model performance in terms of goodness of fit statistics. & Two different parameterizations of the HMs lead to different impact assessment results at the mean monthly and annual scales reaching up to 19% and 10%, respectively. We consider the impacts based on the enhanced method more credible. & According to the analysis of variance, the contribution of HM parameter uncertainty to the overall uncertainty becomes very small by the end of the century based on the enhanced calibration/validation approach.