1 Introduction

Water flows as well as water storage at and below the land surface of the Earth affect water availability for humans and ecosystems, result in hazards such as floods and affect atmospheric processes, sea level and global biogeochemical cycles. They have been increasingly altered by human actions including emissions of greenhouse gases, land use, water abstractions and the construction of dams and dykes (e.g., Vörösmarty and Sahagian 2000; Sterling et al. 2013). Ecosystems suffer from these alterations and, in many regions, human development is constrained by water scarcity. Freshwater systems including their natural and human components need to be characterized regarding both water quantity and quality to support a sustainable land and water management and a better understanding of the Earth system. Global-scale quantification of water flows and storage in freshwater systems under current and future conditions is of particular interest in a globalized world and can support the wise development of global-scale water management and governance (Vörösmarty et al. 2015).

Quantification is best achieved by combining in situ and remote sensing data with physical modelling. Global hydrological modelling serves to estimate water flows on the land areas of the globe such as evapotranspiration, river discharge or groundwater recharge as well as water storage (or only water storage variations) in different compartments, e.g., in soil or in groundwater and surface water bodies. It uses data on precipitation and other climate variables over land areas as input and computes water flows from the land areas to the oceans (or into internal sinks on the continents), thus covering the terrestrial part of the global water cycle. While global hydrological modelling has been refined and extended with respect to modelled processes (in particular regarding human impacts on natural water flows and storages) and computed indicators in the last decade, modelling uncertainties have not decreased, although they are now better known. These uncertainties are generally categorized into uncertainties due to model inputs (e.g., climate variables or soil properties), parameter values and model structure, but uncertainty of observations used for model validation or calibration also has to be considered (Sood and Smakhtin 2015). Different models compute contradictory estimates of, for example, mean annual evapotranspiration and low, mean and high flows in river basins (Gudmundsson et al. 2012) or groundwater depletion (Döll et al. 2014a). They result in strongly varying projected impacts of climate change on river discharge (Schewe et al. 2014) or irrigation water requirements (Wada et al. 2013). Even global mean annual evaporation estimates as derived from global hydrological modelling (or satellite observations) differ by almost a factor of 2 (Jiménez et al. 2011), which is an important obstacle for the detection and attribution of changes in evapotranspiration due to global warming (Douville et al. 2012). Reasons for the discrepant model output have not been sufficiently analysed.

With this paper, the authors wish to share their perspectives on important challenges of and prospects for modelling continental water flows and storages at the global scale. In the next section, we briefly present existing modelling approaches. In Sect. 3, we discuss seven challenges and illustrate them with results of two global hydrological models (GHMs), WaterGAP (Döll et al. 2003; Müller Schmied et al. 2014) and PCR-GLOBWB (Wada et al. 2014). In Sect. 4, we present three advancements that may help to better characterize freshwater flows and storages at the global scale in the future. Finally, we draw our conclusions.

2 Approaches for Modelling Global Hydrology

To understand and quantify natural and human-induced water flows and storage changes across large scales, a number of models that simulate the continental part of the hydrological cycle on a regional to global scale have been developed in recent decades. Models developed to simulate global hydrology can be roughly classified into GHMs, land surface models (LSMs) and dynamic global vegetation models (DGVMs). Most DGVMs, however, do not include lateral water flows or surface water bodies, and can therefore only be used to assess runoff but not discharge. GHMs focus on simulation of water resources; they have a comprehensive representation of continental hydrological processes and often take into account human water use as well as man-made reservoirs. LSMs serve as a module of global climate models (GCMs) and therefore model both water and energy balances at the land surface. Due to this, they often represent the soil with a higher vertical resolution than GHMs and represent evapotranspiration and snow melt in a less conceptual manner than GHMs. LSMs often lack a groundwater reservoir, lateral routing or consideration of surface water bodies, and in most cases, they do not model the impact of human water use or man-made reservoirs. Finally, some LSMs are able to also model vegetation dynamics, or DGVMs have been extended to simulate global hydrology including not only vertical but lateral water flows as well as human water use and man-made reservoirs. In addition, simulation of irrigation water use is not only done by some GHMs, LSMs and DGVMs but also by global crop models (e.g., Elliott et al. 2014).

In the following, we do not distinguish between GHMs, LSMs and DGVMs but summarily refer to all of them as GHMs (like Schewe et al. 2014 or Hagemann et al. 2013 did) because existing models cannot be strictly classified into the three categories and because we focus on their ability to simulate terrestrial water flows and storages. GHMs typically simulate the dynamics of soil moisture storage due to precipitation and evapotranspiration, the generation of runoff and the discharge through the river network. The majority of these models are based on the water balance concept and track the transfer of water through a number of storage compartments with time steps ranging from a month to less than 1 day. Conceptual models are chosen as they are deemed to be more robust than empirical models and more parsimonious in their data requirements than fully physically based models, while they maintain the ability to translate the effects of global change on water flows and storages in a consistent manner. Over time, process descriptions have become more physically based. Few models simulate human water use, which is essential to quantify river discharge, water availability and water stress, and even fewer models represent groundwater including groundwater recharge and abstractions, which is crucial to assess groundwater resources. Sood and Smakhtin (2015) presented an overview of 12 GHMs, and Bierkens et al. (2015) provided a table that describes the main features of GHMs and regional-scale hydrological models.
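
The water balance concept mentioned above can be illustrated with a minimal single-bucket sketch for one grid cell. It does not reproduce the scheme of any particular GHM: the function name, the storage capacity, the initial storage and the linear reduction of evapotranspiration with soil moisture are all assumptions chosen for illustration.

```python
def run_bucket(precip, pet, capacity=100.0, storage=50.0):
    """Daily soil water balance for one grid cell (all values in mm).

    Hypothetical illustration: actual evapotranspiration scales linearly
    with relative soil moisture, and water above capacity becomes runoff.
    """
    results = []
    for p, e_pot in zip(precip, pet):
        # Actual ET is limited by both atmospheric demand and stored water.
        aet = min(storage, e_pot * storage / capacity)
        storage = storage - aet + p
        # Saturation excess leaves the soil compartment as runoff.
        runoff = max(0.0, storage - capacity)
        storage -= runoff
        results.append((aet, runoff, storage))
    return results


# Three days: dry, one heavy rain event, dry (invented numbers).
days = run_bucket(precip=[0.0, 120.0, 0.0], pet=[4.0, 4.0, 4.0])
```

By construction, precipitation minus evapotranspiration minus runoff equals the change in storage, which is the bookkeeping that such models apply to each storage compartment.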

3 Challenges

Aiming at an improved representation of freshwater systems at the global scale, global hydrological modelling faces diverse challenges. We select some of the most important challenges that have been identified by the scientific community, i.e. constraints that lead to uncertain model output and thus limit the usefulness of global hydrological modelling for understanding freshwater systems.

3.1 Modelling Human Water Use

Human water use leads to anthropogenic water flows in the form of water abstractions from and return flows to surface water or groundwater bodies (Döll et al. 2012; Wada et al. 2011). Quantification of these flows is important for two reasons. On the one hand, water abstractions, consumptive water use (the part of the withdrawn water that evapotranspires during use) or net water abstractions (water abstractions minus return flows) are used in combination with estimates of water availability to compute indicators of water stress (or water scarcity). On the other hand, these anthropogenic water flows alter natural groundwater and surface water flows and storages (Döll et al. 2014a, b; Wada et al. 2012). It was estimated that around the year 2000, mean annual river discharge had been decreased due to water abstractions and man-made reservoirs by more than 10 % on one-sixth of the global land area (excluding Greenland and Antarctica), as compared to natural discharge (Döll et al. 2009). The strongest alterations, with, e.g., both decreases and increases of mean annual water storage (Döll et al. 2012), are found in semi-arid and arid areas of the globe, where irrigation is the dominant water use and alterations of river flow regimes by water abstractions are more important than alterations due to man-made reservoirs (Döll et al. 2009).

In global hydrological modelling, water abstractions and return flows are mostly estimated at a spatial resolution of 0.5° by 0.5° (55 km by 55 km at the equator) or 5′ by 5′ (9 km by 9 km). While water use for domestic and industrial purposes is assumed to vary negligibly throughout the year, monthly estimates of irrigation water use are required due to the often high seasonal variation of irrigation requirements. Modelling of water use for households and manufacturing strongly relies on statistical water abstraction data provided by countries, but data generally exist for a few years only. To derive annual time series by country, abstractions are modelled taking into account structural and technological change (Flörke et al. 2013). In addition, downscaling to the grid cell level is required and is mainly done based on urban and rural population in grid cells (Flörke et al. 2013; Vassolo and Döll 2005). Cooling water requirements for thermoelectric power plants are computed for each power plant as a function of power plant type and cooling system, as well as values of national thermal electricity production (Flörke et al. 2013).
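
The population-based downscaling step can be sketched as follows. This is a simplified illustration, not the procedure of the cited studies: it uses a single population count per cell, whereas the models distinguish urban and rural population, and the function name and cell identifiers are hypothetical.

```python
def downscale_by_population(country_total, cell_population):
    """Distribute a country total to grid cells in proportion to population.

    country_total: national water abstraction (any volume unit).
    cell_population: mapping of grid cell id to population in that cell.
    """
    total_pop = sum(cell_population.values())
    if total_pop <= 0:
        raise ValueError("country has no populated grid cells")
    return {cell: country_total * pop / total_pop
            for cell, pop in cell_population.items()}


# Invented example: 10 volume units spread over two cells.
cells = downscale_by_population(10.0, {"cell_a": 1000, "cell_b": 3000})
```

The cell values always sum to the national total, so the country statistics are preserved while the spatial pattern follows the population map.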

Unlike domestic and industrial water abstractions, irrigation water abstractions are very rarely measured, and statistical data for countries are generally based on either modelling or rough assumptions on per hectare irrigation water use. Consumptive irrigation water use in a grid cell is modelled as a function of irrigated area, crops and climate (e.g., Döll and Siebert 2002; Rost et al. 2008). Estimation of areas equipped for irrigation and even more estimation of areas actually irrigated are prone to large uncertainties as quality of statistical information is very heterogeneous (cf. information on map quality of the Global Map of Irrigation Areas GMIA v5.0, http://www.fao.org/nr/water/aquastat/irrigationmap/index40.stm). Assumed cropping patterns describing crop-specific growing periods throughout the year strongly impact the seasonality of estimated irrigation water demands (Zhou et al. 2015) and to a lesser degree annual values. Global consumptive irrigation water use as computed by six state-of-the-art global water use models varies between 1000 and 1500 km3/year (Zhou et al. 2015; Siebert and Döll 2010), but this range may underestimate total uncertainty as all models used the same map of irrigated areas (GMIA). Estimated consumptive use strongly depends on the algorithm to compute potential (or reference crop) evapotranspiration, with global values ranging from 1180 to 1450 km3/year just due to using three different equally plausible algorithms (FAO Penman–Monteith and two versions of Priestley–Taylor; Siebert and Döll 2010). To estimate water abstractions for irrigation, irrigation water use efficiencies (water consumption-to-abstraction ratios) need to be defined, e.g., for individual world regions (Döll and Siebert 2002), or simulated depending on the crop (in particular paddy rice; Wisser et al. 2008) or type of irrigation system (surface, sprinkler and drip; Jägermeyr et al. 2015). By modelling water flows in the three irrigation systems in a process-based manner for different crops, Jägermeyr et al. (2015) computed grid cell-specific irrigation water use efficiencies as well as beneficial consumptive use due to transpiration. They estimated that global abstractions for irrigation are twice as high as the global consumptive irrigation water use of 1260 km3/year, while 52 % of consumptive use is beneficial.
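
The relation between consumptive use, abstractions and efficiency can be made explicit with a short calculation. The sketch below uses only the global figures quoted above; the efficiency of 0.5 is implied by "twice as high" and is therefore a rounded value, and the function and key names are hypothetical.

```python
def irrigation_budget(consumptive_use, efficiency, beneficial_share):
    """Derive abstractions and related terms from consumptive use.

    efficiency: water consumption-to-abstraction ratio (0 to 1).
    beneficial_share: fraction of consumptive use that is transpired.
    """
    abstractions = consumptive_use / efficiency
    return {
        "abstractions": abstractions,
        # Abstracted water that does not evapotranspire during use.
        "not_consumed": abstractions - consumptive_use,
        "beneficial_use": consumptive_use * beneficial_share,
    }


# Global values in km3/year as quoted from Jägermeyr et al. (2015).
budget = irrigation_budget(1260.0, efficiency=0.5, beneficial_share=0.52)
```

With these inputs, abstractions amount to 2520 km3/year and beneficial consumptive use to about 655 km3/year, i.e. roughly a quarter of the abstracted volume.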

Irrigation water use is normally computed under the assumption that the crops receive enough irrigation water to allow actual evapotranspiration to become equal to the potential evapotranspiration value. However, it is likely that farmers irrigate less in case of water scarcity. Backed by independent information on worldwide groundwater depletion, application of a GHM indicated that in areas with groundwater depletion, farmers irrigated with only 70 % of the optimal water volume (Döll et al. 2014a).

An important uncertainty in human water use modelling relates to the distinction between groundwater and surface water abstractions, and also to the degree to which return flows caused by irrigation recharge groundwater. Currently, the distinction of the source of water abstraction appears to be more uncertain in the domestic and manufacturing sector than in the irrigation sector (Döll et al. 2012). Table 1 summarizes major uncertainties in modelling sectoral human water use at the global scale.

Table 1 Major challenges in modelling sectoral human water use at the global scale (livestock water use is neglected)

3.2 Uncertain Climate Input

A number of studies have shown the very strong dependence of computed continental water flows on applied climate input (Biemans et al. 2009; Döll and Fiedler 2008; Guo et al. 2006; Müller Schmied et al. 2014; Nasonova et al. 2011). Not only precipitation, but also radiation data are strong drivers of water flows and storages around the globe, while temperature data have the strongest impact in case of snow and ice. However, both historic and future climate information are prone to multiple types of uncertainty, which represents a major challenge for providing reliable GHM output.

3.2.1 Uncertainties in Historic Climate Information and Their Impact on Simulating Water Resources

Historic climate information applicable for global-scale studies suffers mainly from insufficient density of high-quality observations but also from observation errors. The effect of incomplete knowledge about the spatio-temporal distribution of climate variables during historic periods on global sums of water flows and storage changes is illustrated in Table 2 where simulation results of the GHM WaterGAP 2.2 (Müller Schmied et al. 2014) are listed. The considered model output encompasses global mean annual river discharge into oceans and inland sinks (equal to renewable water resources in the case of model runs that neglect human water abstractions), actual evapotranspiration, consumptive water use by humans and water storage changes for the time period 2000–2009. Three different climate data sets were used to generate the table, all of which are available at the spatial resolution of WaterGAP (0.5° grids). For the STANDARD model runs, the WFDEI data were used that combine observational data with ERA-Interim reanalysis data for the years 1979–2009, resulting in climate input with a daily resolution (Weedon et al. 2014). In variant CLIMATE, the monthly data set CRU TS 3.2 (Harris et al. 2014) was used, but monthly precipitation totals were replaced by the GPCC v6 precipitation monitoring product (Schneider et al. 2014) because it includes more observation stations. Downscaling to daily values was done within WaterGAP, in case of precipitation based on the number of wet days in each month. Precipitation input used to drive both STANDARD and CLIMATE variants is based on monthly precipitation data from GPCC, but correction of precipitation error and the GPCC product version differ, leading to small differences in global precipitation (Table 2). Further details on climate input for the two model variants can be found in Müller Schmied et al. (2014). The PRINCETON model variant is forced by the “Global Meteorological Forcing Dataset for land surface modelling v2” from Princeton University which is a combination of global observation-based datasets and the NCEP-NCAR reanalysis (Sheffield et al. 2006) and is available at http://hydrology.princeton.edu/data.pgf.php.
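
The idea of disaggregating a monthly precipitation total with the help of the number of wet days can be sketched as below. The text above only states that the number of wet days is used; spreading the total in equal amounts over evenly spaced wet days is an assumption made here for illustration and is not claimed to be the WaterGAP algorithm. The function name is hypothetical.

```python
def monthly_to_daily(monthly_total, n_wet_days, n_days):
    """Spread a monthly precipitation total (mm) over the wet days of a month.

    Assumption for illustration: wet days are evenly spaced and all receive
    the same amount. Returns a list with one value per day.
    """
    if not 0 <= n_wet_days <= n_days:
        raise ValueError("number of wet days must lie between 0 and n_days")
    daily = [0.0] * n_days
    if monthly_total <= 0:
        return daily
    # Keep the monthly total even if the wet-day data report zero wet days.
    n_wet = max(1, n_wet_days)
    amount = monthly_total / n_wet
    for i in range(n_wet):
        # Integer arithmetic yields distinct, evenly spaced day indices.
        daily[(i * n_days) // n_wet] = amount
    return daily


# Invented example: 90 mm on 3 wet days in a 30-day month.
series = monthly_to_daily(90.0, n_wet_days=3, n_days=30)
```

Whatever the placement rule, the daily series must sum to the monthly total so that the monthly observations are preserved in the model input.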

Table 2 Mean annual (2000–2009) water flows (precipitation P, discharge into oceans and inland sinks/total water resources Q, actual evapotranspiration AET, consumptive water use WCa) and change in total water storage (dTWS) calculated for different WaterGAP 2.2 model variants, in km3/year

WaterGAP 2.2 is calibrated against observed mean annual river discharge at 1319 stations worldwide, by adjusting 1–3 parameters in the basin cells. Table 2 shows that this calibration reduces the impact of uncertain climate variables on global river discharge into oceans and inland sinks Q, actual evapotranspiration AET, actual consumptive water use WCa and change in total water storage dTWS (compare the differences between columns 1 and 2 with the differences between columns 3 and 4 in Table 2). While the difference in global precipitation between CLIMATE and STANDARD of 1396 km3/year is increased to a difference in Q of 4255 km3/year without calibration, this difference, i.e. sensitivity to climate input, is reduced by calibration to 2671 km3/year. The difference is still appreciable even in the calibrated model version due to areas without observed river discharge. Global AET is much less sensitive to the different climate inputs. This is also seen when comparing the results of two uncalibrated model runs (no human water abstractions assumed) that are driven by either the Princeton or WFDEI climate data (columns 5 and 6). The Princeton climate data, with global precipitation estimated to be 7051 km3/year less than in case of the WFDEI data, result in global total water resources that are 5503 km3/year less than with the WFDEI data.

To perform a first sensitivity analysis regarding uncertain radiation data, daily net radiation in the uncalibrated STANDARD variant of WaterGAP 2.2 (without human water abstractions, column 6 in Table 2) was decreased or increased by 20 % in each grid cell (columns 7 and 8 in Table 2). At least at the scale of 0.5° grid cells, an uncertainty of 20 % for net radiation is not exaggerated. In case of −20 % net radiation, global actual evapotranspiration decreases by 7.2 %, while with +20 % net radiation, it increases by only 4.4 %. The asymmetry is due to water availability limiting actual evapotranspiration; radiation-limited areas become water limited when net radiation is increased. With a 20 % lower global net radiation on the continents, renewable water resources (i.e. runoff) would increase by 9.9 %, while with a 20 % higher global net radiation, it would decrease by 6.1 %. Figure 1 shows the spatial distribution of the changes in renewable water resources that are computed by WaterGAP 2.2 to result from a 20 % decrease or increase in net radiation (average for 2000–2009), in addition to a global map of the baseline net radiation computed by WaterGAP 2.2. While renewable water resources decrease in most grid cells by less than 10 % in case of +20 % net radiation, they increase by more than 10 % in many regions of the globe in case of −20 % net radiation. Changes in per cent of water resources are highest in radiation-limited areas, i.e. in many grid cells with surface water bodies (e.g., in Tibet) but also in tropical and cold regions (even without surface water bodies). In water-limited areas, simulated water resources vary less (Fig. 1).
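
The asymmetric response to decreased and increased net radiation can be reproduced qualitatively with a toy calculation. The sketch below is not WaterGAP: it assumes that potential evapotranspiration scales linearly with net radiation and that actual evapotranspiration in a cell is the smaller of potential evapotranspiration and available water. The three cells and all numbers are invented.

```python
def total_aet(cells, radiation_factor=1.0):
    """Sum of actual evapotranspiration over cells (mm/year).

    cells: list of (potential ET, available water) pairs per cell.
    radiation_factor: multiplier applied to potential ET (assumed to be
    proportional to net radiation).
    """
    return sum(min(pet * radiation_factor, water) for pet, water in cells)


# Hypothetical cells: radiation-limited, water-limited, close to the threshold.
cells = [(800.0, 1000.0), (1200.0, 500.0), (950.0, 1000.0)]

base = total_aet(cells)
change_low = (total_aet(cells, 0.8) - base) / base * 100.0   # -20 % radiation
change_high = (total_aet(cells, 1.2) - base) / base * 100.0  # +20 % radiation
```

In this toy setting, the decrease for lower radiation is larger in magnitude than the increase for higher radiation, because the third cell switches from radiation-limited to water-limited when radiation increases, while the water-limited cell does not respond at all.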

Fig. 1

Sensitivity of renewable water resources RWR during 2000–2009 to a decrease or increase in daily net radiation by 20 % as simulated by WaterGAP 2.2 (Müller Schmied et al. 2014). RWR, i.e. net cell runoff, as computed with uncalibrated model variant STANDARD (a) and the related net radiation (b), and RWR changes relative to STANDARD in case of decreased (c) and increased (d) net radiation. nc: Per cent change cannot be meaningfully computed in case of zero or negative values of renewable water resources for STANDARD NoCal NoUse and where the value changes the sign. Negative RWR values occur in cells with lakes or wetlands if evapotranspiration of water flowing in from upstream cells exceeds precipitation

3.2.2 Uncertainties in Global and Regional Climate Projections and Their Impact on Simulating Future Water Resources

Land and water management as well as assessment of climate change impacts in support of climate change mitigation require quantitative climate projections from seasonal to multi-decadal timescales that can be used as input to GHMs. These projections are subject to uncertainties that are different from the uncertainties of historic climate. The first uncertainty is related to future scenarios of anthropogenic perturbations including emissions of greenhouse gases and sulphate aerosols, land use change and water management itself, all of which affect future climate at least on multi-decadal timescales. The second uncertainty, which is related to model deficiencies as illustrated by the different GCM responses to identical radiative forcings, is important at all timescales. Model uncertainty is generally higher for precipitation than for temperature and arises partly from the limited spatial resolution of GCMs and the need to parameterize unresolved processes within atmosphere and ocean and at the land surface. Higher-resolution regional climate models still show significant biases when driven by atmospheric reanalyses and still require bias correction based on observations for use in impact studies. While there is hope to narrow model uncertainty by further increasing model resolution down to a few kilometres, at least for explicitly resolving atmospheric convective processes, the need for ensemble simulations at both seasonal and multi-decadal timescales will remain a major obstacle for at least a decade and raises the issue of both climate model formulation (structural model uncertainty) and model calibration (parameter uncertainty).

The need for large ensembles is related to the third source of uncertainty: the internal variability of the climate system, that is, the natural fluctuations that arise in the absence of any anthropogenic forcing (and any natural radiative forcing such as solar activity or volcanic eruptions). Appreciation of these fluctuations is an important matter for decision makers because they have the potential to reverse, for a decade or so, the longer-term trends that are associated with anthropogenic climate change (e.g., Douville et al. 2015). While they have been recognized as a fundamental limit to predictability since the very beginning of seasonal forecasting, their relevance in climate scenarios has been emphasized more recently and is probably still underestimated in most impact studies, which often use a single realization of a given GCM for driving the impact model.

The relative importance of the three sources of uncertainty varies with prediction lead time and with spatial and temporal averaging scale, but is also variable dependent. Focusing on seasonal precipitation at the regional scale from the CMIP3 archive, Hawkins and Sutton (2011) showed that internal variability contributes 50–90 % of the total uncertainty for all regions for precipitation projections of the next decade and is the most important uncertainty for many regions for lead times up to 30 years. Model uncertainty is generally dominant thereafter. Scenario uncertainty was found to be small over land areas. This is different for other climate inputs such as surface air temperature or surface radiation, and hydrological impacts of climate change differ appreciably between low and high emissions scenarios in the second half of the twenty-first century (Jiménez Cisneros et al. 2014).

The fourth source of uncertainty is related to the statistical bias correction of GCM outputs (e.g., Hagemann et al. 2011). Such techniques are sometimes included in statistical downscaling tools but, again, must also be implemented on top of dynamical downscaling tools. Statistical bias correction is commonly applied in climate impact modelling to correct GCM output for systematic deviations of the simulated historical data from observations. It has been found that evapotranspiration and river discharge as computed by GHMs that are driven by GCM output differ significantly for historic time periods if climate model output is not bias corrected; bias correction of precipitation and temperature is more important than bias correction of radiation, humidity and wind speed (Haddeland et al. 2012). Bias correction methods are generally based on transfer functions generated to map the distribution of the simulated historical GCM output to that of the observations. These transfer functions are subsequently applied to correct the future projections, thus making the assumption that GCM biases are constant over time. While such an assumption is not necessarily true, especially for precipitation biases (Chen et al. 2015), there are other challenging assumptions in most bias correction techniques.
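
A transfer function of the kind described above can be sketched as an empirical quantile mapping. This is one common family of bias correction methods and is shown here only as an illustration; it is not claimed to be the method of the cited studies. The function names and sample values are hypothetical, and the mapping is stepwise without interpolation.

```python
from bisect import bisect_right


def make_quantile_mapping(simulated_hist, observed):
    """Build a transfer function from simulated historical and observed samples."""
    sim = sorted(simulated_hist)
    obs = sorted(observed)

    def correct(value):
        # Rank of the value in the simulated historical distribution.
        rank = bisect_right(sim, value)
        # Index of the observed value with the same non-exceedance
        # probability (integer ceiling division avoids rounding issues).
        idx = (rank * len(obs) + len(sim) - 1) // len(sim) - 1
        # Values outside the historical range are clamped to the extremes.
        return obs[max(0, min(len(obs) - 1, idx))]

    return correct


# Invented example: the model simulates only half of the observed values.
correct = make_quantile_mapping([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
corrected_future = [correct(v) for v in [2.5, 3.0, 10.0]]
```

Applying `correct` to projected values uses the historical relation unchanged, which is exactly the assumption of time-invariant bias discussed above; the clamping of values beyond the historical range is a further simplification of this sketch.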

3.3 Quantification of the Role of Active Vegetation Under Changing Climate and CO2 Concentrations

When climate and atmospheric CO2 concentrations change due to anthropogenic climate change, vegetation changes too. Therefore, evapotranspiration and runoff are not only affected directly by changes in climatic variables but in addition by the vegetation reaction to changes in climatic variables and CO2 concentration. Rising CO2 concentration causes two counteracting effects (Gerten et al. 2014). On the one hand, the physiological effect reduces opening of leaf stomata, as less water is required to assimilate carbon in case of higher CO2 concentration; this decreases transpiration for the same climatic conditions. On the other hand, CO2 fertilization may cause increased plant growth leading to increased transpiration per unit area (structural effect). Current quantification of these effects is highly uncertain, owing both to diverse observational evidence and to diverse algorithms in DGVMs (Gerten et al. 2014). This is true not only for natural vegetation but also for crops (Elliott et al. 2014). In addition, the changing climate itself affects the vegetation, e.g., altering biomass production or vegetation cover or even leading to a biome shift.

It is uncertain how vegetation responses to increasing CO2 and changing climate will affect water flows, which adds uncertainty to the response of water flows to climate change. Effects may be large, in particular where the type of vegetation changes. A modelling study with a DGVM indicated that until the end of the twenty-first century the active vegetation may cause a relative increase in runoff in response to increased atmospheric CO2 concentration (physiological effect dominant), except in areas where grassland changes to deep-rooted vegetation in a warmer climate (Murray et al. 2012). In case of 4 °C global warming, the high temperatures lead to decreased vegetation cover, such that runoff as a ratio of precipitation is projected to increase worldwide (Murray et al. 2012). Large uncertainties also surround the response of vegetation to persistent droughts in both present-day and future climates, especially over the Amazon rainforest (Joetzjer et al. 2014).

Most GHMs, like most basin-scale hydrological models, do not model vegetation responses to changes in climate and atmospheric CO2 concentrations. Thus, if, for example, the physiological effect were dominant, those GHMs would underestimate future runoff and thus renewable water resources. Neglecting the reaction of crops to increased CO2 may lead to an overestimation of future irrigation water demand (Wada et al. 2013; Elliott et al. 2014; Gerten et al. 2014). In the multi-model study of Wada et al. (2013), the only model that considered CO2 effects on crop photosynthesis and transpiration showed a decreasing trend in future irrigation water demand (about 10 % by the end of this century) and increasing yields, while model runs without CO2 effect indicated pronounced increases in future irrigation water demand (>20 % by the end of this century).

Unfortunately, models that simulate vegetation responses disagree strongly with each other on the effect of active vegetation on evapotranspiration and runoff. In a multi-model study on projected runoff changes between 1981–2010 and 2070–2099, two DGVMs computed higher runoff, and two computed lower runoff compared to GHMs with passive vegetation (Fig. 2; Davie et al. 2013). Comparing the runs of the four models with elevated CO2 to runs with constant CO2, Davie et al. (2013) found that modelling the CO2 effect on vegetation contributes to an increased spread in runoff projections. The challenge is (1) to improve modelling of climatic and CO2 effects on vegetation with respect to evapotranspiration and runoff and (2) to include the effect of the active vegetation on evapotranspiration and runoff also in the majority of GHMs that do not model vegetation dynamics.

Fig. 2

Uncertain impact of vegetation response on runoff changes under future anthropogenic climate change: scatterplot of runoff change against precipitation change between 1981–2010 and 2070–2099 in mm/day for world regions as computed by global hydrological models not taking into account active vegetation (blue) and global vegetation models that do (green). All models are forced with HadGEM2-ES RCP8.5 climate. Figure by Davie et al. (2013)

3.4 Understanding of Why GHMs (Including Global Irrigation Models) Respond Differently to Changed Climate Input

Traditionally, GCMs have been considered as a major source of the uncertainty in future hydrological assessments. Therefore, hydrological studies on the impact of climate change applied not only the output of one but multiple GCMs as input to the hydrological model (Jiménez Cisneros et al. 2014). However, recent model intercomparison projects (WaterMIP and ISI-MIP) where various GHMs were driven by either standard historic climate data or the bias-corrected output of multiple GCMs showed that differences among GHMs are also a major source of uncertainty regarding evapotranspiration, runoff and discharge (Dankers et al. 2014; Davie et al. 2013; Gosling et al. 2011; Haddeland et al. 2011, 2014; Hagemann et al. 2011, 2013; Schewe et al. 2014) and irrigation water demand (Wada et al. 2013; Elliott et al. 2014). The uncertainty can be larger than that arising from GCMs, depending on the region and the output variable. Considering eight GHMs but only three GCMs, Hagemann et al. (2013) found that the spread in projected changes of actual evapotranspiration was dominantly caused by the different GHMs in most areas of the world, while the spread in projected runoff was dominantly caused either by the GCM or by the greenhouse gas emissions scenario (considering changes until the end of the twenty-first century). However, the small number of applied GCMs and bias correction of GCM output have limited the spread of GCM output. Applying eleven GHMs driven by five bias-corrected GCMs, Schewe et al. (2014) determined that GHMs were responsible for a larger spread in river discharge than the GCMs on most of the global land area. This can be explained by the fact that they compared model results not for a specific time period but for a specific global warming for which the GCM outputs are more similar. Projections of the impact of climate change on optimal irrigation water abstractions as computed by seven GHMs driven by the same GCMs were dominated by GHM uncertainty throughout the twenty-first century (Wada et al. 2014). With these studies, it has become state-of-the-art that an ensemble of model runs that has been generated by driving multiple GHMs with the output of multiple GCMs should be evaluated in water-related climate change impact studies. Analysis of such multi-model ensembles should not be restricted to the ensemble mean but also consider results of individual models that would imply a high risk, i.e. results that may have strong negative impacts due to high vulnerability (Döll et al. 2015).

Why do the responses of GHMs (and hydrological models in general) to climate change projections vary so widely? Possible reasons include (1) different model algorithms for the computation of potential and actual evapotranspiration as well as runoff generation, (2) modelling (or not) of energy balance in addition to water balance, (3) modelling irrigation water requirements based on soil water deficits as derived from soil water balances or as the difference between optimal evapotranspiration and available water, (4) different physiographic input parameters such as soil properties and land use and (5) different simulation of vegetation, including the CO2 effect on crops and other vegetation. The latter aspect has already been discussed in the previous section.

None of the multi-model studies has been able to analyse in depth the reasons for the discrepant results of the individual GHMs. Hagemann et al. (2013) concluded that large differences in projected changes between the GHMs may be attributed to different model formulations of evapotranspiration but provide no further detail. Different methods for computing potential evapotranspiration (e.g., taking into account only temperature, or also radiation or humidity and wind speed) may explain the disagreement not only in energy-limited regions (Haddeland et al. 2012). Among eleven hydrological models applied for the ISI-MIP project, a few models used temperature-based methods for simulating potential evapotranspiration (Schewe et al. 2014), which has likely contributed to the spread. Haddeland et al. (2011) analysed differences among eleven GHMs driven by the same historic climate input, discussing the impact of energy balance-based snow algorithms as compared to degree day-based snow algorithms (affecting seasonal flows) and the impact of differing parameter values. They found that regarding “the interannual variation in runoff and evapotranspiration, no major differences have been found between the models run at daily or subdaily time steps or between models using different evapotranspiration or runoff schemes” (Haddeland et al. 2011, p. 882).

Model intercomparison studies should go beyond the identification of the spread of the model ensemble and also try to understand reasons for the spread and identify routes towards model improvement. However, the challenge of understanding why a large number of complex GHMs with a high spatial and temporal resolution react differently to a spatially and temporally heterogeneous change in input variables may appear overwhelming. A step towards progress could be to devise feasible intercomparison strategies that aim at understanding the major drivers of the spread. Another step would be to assess the capability of the GHMs to simulate hydrological effects of past climate variability, under the assumption that a model that does a good job in simulating hydrological responses to, e.g., inter-annual climate variability is also able to better simulate impacts of climate change. Hydrologists should study lessons learnt in model intercomparison of climate (e.g., Knutti and Sedlacek 2013) or dynamic vegetation models (Sitch et al. 2008; Warszawski et al. 2013). Regarding GCMs, Knutti and Sedlacek (2013) found that model spread regarding future temperature change has not decreased in the new CMIP5 model ensemble as compared to the CMIP3 ensemble, i.e. after about 6 years of massive efforts in model improvements. They argue that GCMs have improved and now represent more processes in greater detail, which implies greater confidence in projections even if model spread, which is often called model uncertainty, has not decreased. A decreased spread might even be misleading if caused only by using more similar model input or more similar algorithms that come to be considered “state-of-the-art” within the scientific community even without firm scientific support. Regarding DGVMs, the study of Sitch et al. (2008, p. 2035) showed “the ability of models to satisfy contemporary global carbon cycle constraints, while future projections diverge markedly”, as many different parameter combinations allow recreating the historical record but lead to divergent future projections.

3.5 Modelling of Monthly Time Series of River Discharge and Human Water Use to Support More Meaningful Indicators of Water Stress for Both Humans and Ecosystems

Currently, indicators of water stress are mostly defined based on mean annual values of water availability and use (e.g., Kiguchi et al. 2015, Arnell and Lloyd-Hughes 2014). Water stress for humans and freshwater ecosystems as well as environmental flow requirements could be defined more meaningfully at the monthly timescale (Hoekstra and Mekonnen 2011) as use of mean annual values masks differences in seasonality and interannual variability. Consideration of mean annual values only leads to an underestimation of water stress in highly seasonal flow regimes, e.g., in monsoon regions, as well as in regions with high interannual flow variability, e.g., in semi-arid and arid regions.

However, WaterGAP and most likely all other GHMs are not capable of satisfactorily simulating monthly time series or even mean monthly values of river discharge and human water use. Müller Schmied et al. (2014) found that even for most of the 1319 gauging stations used for model calibration (that considers only mean annual river discharge), monthly (their Figs. 6 and 7, and Table 4) or mean monthly river discharge (their Fig. 5) was not well simulated. In case of the best model version (STANDARD), only 28 % of the basins showed a modelling efficiency (Nash–Sutcliffe coefficient) of more than 0.7, while 46 % of the basins had a value of less than 0.5. Simulation of seasonality of irrigation water use is known to be highly uncertain as cropping patterns and calendars are not well known (Portmann et al. 2010). Besides, cropping patterns and calendars would ideally be simulated as a function of climate. Therefore, we believe that it is currently not reasonable to compute, in global-scale studies, water stress indicators based on monthly water availability and use. The challenge is to improve both GHMs and their input data such that reliable monthly time series of river discharge and human water use are computed.
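The modelling efficiency quoted above can be made concrete with a minimal sketch of the Nash–Sutcliffe coefficient in its standard textbook form. The function name and the discharge values are hypothetical and serve illustration only; they are not taken from Müller Schmied et al. (2014).

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared model error, summed over all time steps
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Squared deviation of the observations from their own mean
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - residual / variance


# Hypothetical monthly discharge values (m3/s), for illustration only
obs = [10.0, 30.0, 80.0, 40.0, 15.0, 5.0]
sim_good = [12.0, 28.0, 75.0, 45.0, 14.0, 6.0]
sim_mean = [sum(obs) / len(obs)] * len(obs)  # "model" that only knows the mean
```

A value of 1 indicates a perfect fit, while a simulation that is no better than the observed mean (as `sim_mean` here) yields 0; this is why a correct mean annual discharge alone does not guarantee a high efficiency for monthly series.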

An alternative to indicators based on monthly discharge values is to consider statistical monthly low and high flows, e.g., Q 90 and Q 10, the river discharge that is exceeded in 9 out of 10 months and 1 out of 10 months, respectively. These statistical low and high flows include seasonal and inter-annual variability of monthly river discharge, which is particularly high in the dry regions of the globe, and are ecologically relevant indicators of the flow regime. They can be computed reasonably well by the GHMs WaterGAP and PCR-GLOBWB, at least for the gauging stations whose mean annual discharge was used for calibrating WaterGAP (Fig. 3). Simulation of high flows is better than simulation of low flows. In case of observed Q 90 values of less than 1 km3/month or 10 mm/month, Q 90 as simulated by WaterGAP may differ very strongly from the observed value even though WaterGAP was calibrated against mean annual river discharge of these stations (Fig. 3 top). WaterGAP tends to overestimate Q 90 (with 592 of the simulated 821 values being larger than the observed ones) and to underestimate Q 10 (with 608 of the simulated 821 values being smaller than the observed ones). PCR-GLOBWB shows the same behaviour but to a much lesser extent (Q 90: 497 of the simulated 821 values are larger than the observed ones; Q 10: 449 of the simulated 821 values are smaller than the observed ones). However, model results of PCR-GLOBWB show a lower fit to observed low and high flow values as the model is not calibrated against mean annual discharge at the depicted gauging stations. Water stress indicators based on monthly Q 90, where water use is often taken to be mean annual consumptive water use instead of mean annual water abstraction, were applied, e.g., by Alcamo et al. (2007) and Hanasaki et al. (2008).
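The definition of Q 90 and Q 10 as exceedance flows can be illustrated with a short sketch. It assumes linear interpolation between ranked values, which is one of several common quantile conventions and not necessarily the one used for Fig. 3; the function name and the sample series are hypothetical.

```python
def exceedance_flow(monthly_discharge, exceedance_fraction):
    """Discharge that is exceeded in the given fraction of months."""
    if not monthly_discharge:
        raise ValueError("discharge series must not be empty")
    if not 0.0 <= exceedance_fraction <= 1.0:
        raise ValueError("exceedance fraction must lie in [0, 1]")
    ordered = sorted(monthly_discharge)
    # A flow exceeded in a fraction p of months is the (1 - p) quantile
    position = (1.0 - exceedance_fraction) * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    # Linear interpolation between neighbouring ranked values (an assumption)
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


# Hypothetical monthly discharge series, for illustration only
series = [float(v) for v in range(1, 12)]
q90 = exceedance_flow(series, 0.9)  # low flow, exceeded in 9 out of 10 months
q10 = exceedance_flow(series, 0.1)  # high flow, exceeded in 1 out of 10 months
```

Because the statistic is taken over all months of a multi-year record, it reflects both seasonality and inter-annual variability without requiring the model to reproduce the timing of individual months.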

Fig. 3

Validation of monthly low and high flows Q 90 (left) and Q 10 (right) as simulated by the global hydrological models WaterGAP 2.2 (Müller Schmied et al. 2014) (top) and PCR-GLOBWB (Wada et al. 2014) (bottom) against observations at 821 of the 1319 WaterGAP calibration stations with at least 15 years of data and basin area of at least 20,000 km2. Note that, unlike WaterGAP, PCR-GLOBWB is not calibrated to mean annual observed discharge at the depicted gauging stations.

3.6 Simulation of Groundwater–Surface Water Interaction and Capillary Rise by Gradient-Based Groundwater Modelling

Groundwater is the largest store of freshwater available for human use. It is replenished by precipitation in the form of diffuse groundwater recharge through the soil and sometimes by concentrated recharge from surface water bodies (Taylor et al. 2013). Groundwater flows along gradients of hydraulic head, but this is not represented yet in GHMs. In particular, groundwater flow between grid cells is not simulated. If at all, groundwater is represented in GHMs mainly as linear storage compartments that discharge groundwater as baseflow into surface water bodies, baseflow being a function of groundwater storage. With a linear groundwater store, the relative temporal changes of groundwater storage can be computed, also as affected by groundwater abstractions (Döll et al. 2014a, b). Groundwater storage changes can be translated to variations of groundwater table elevations, but there is no information on the absolute elevation of the groundwater table or depth to groundwater table. An approach for taking into account groundwater in climate models by Niu et al. (2007) allows computing depth to groundwater table and capillary rise but does not simulate lateral groundwater flow (nor groundwater recharge from surface water bodies).
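The behaviour of such a linear groundwater store can be sketched as follows. This is a generic linear reservoir with baseflow proportional to storage, solved exactly over one time step for constant net recharge; it is not the code of any particular GHM, and the function name, the parameter values and the treatment of abstraction as part of net recharge are assumptions made for illustration.

```python
import math


def step_linear_store(storage, net_recharge, k, dt=1.0):
    """Advance a linear groundwater store by one time step.

    storage      -- groundwater storage at the start of the step (mm)
    net_recharge -- recharge minus abstraction, constant over the step (mm/day)
    k            -- outflow coefficient (1/day); baseflow rate = k * storage
    Returns the new storage (mm) and the baseflow volume of the step (mm).
    """
    decay = math.exp(-k * dt)
    # Exact solution of dS/dt = net_recharge - k * S for constant net_recharge
    new_storage = storage * decay + (net_recharge / k) * (1.0 - decay)
    # Baseflow follows from the water balance of the step
    baseflow = net_recharge * dt - (new_storage - storage)
    return new_storage, baseflow


# Hypothetical values: the store approaches the equilibrium net_recharge / k
storage = 0.0
for _ in range(2000):
    storage, baseflow = step_linear_store(storage, net_recharge=1.0, k=0.01)
```

The state variable is a storage volume per unit area without any reference elevation, which illustrates why such a store yields storage changes but no absolute groundwater table elevation.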

Without dynamic modelling of the elevation of the groundwater table, groundwater recharge from surface water bodies, which is particularly important in dry regions, cannot be represented well. Döll et al. (2014a) used a very rough estimate of groundwater recharge from lakes and wetlands in semi-arid and arid regions of the globe to avoid underestimation of groundwater recharge and thus overestimation of groundwater depletion. Equally, capillary rise from groundwater to the soil cannot be represented well if the distance of the groundwater table to the land surface is not simulated. Thus, for a simulation of groundwater–surface water interaction and capillary rise, it is necessary to estimate the temporally changing elevation of the groundwater table well; this can only be achieved if lateral groundwater flow driven by gradients of hydraulic head (or groundwater table) is computed (Fan et al. 2007; Jones et al. 2008; Kollet and Maxwell 2008; Krakauer et al. 2014; Maxwell et al. 2007; Maxwell et al. 2015). Lateral groundwater flow is described by a partial differential equation the solution of which is computationally more involved than solving the ordinary differential equation that is used to describe groundwater storage and outflow in typical hydrological models. However, the main reason for not including gradient-based groundwater flow modelling in GHMs may be the extreme lack of information on groundwater that is available for global-scale studies (Taylor et al. 2013; de Graaf et al. 2015).
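For orientation, the two formulations can be contrasted in a common textbook form. The notation is introduced here for illustration and is not taken from the cited studies: S is groundwater storage, R net recharge, k an outflow coefficient, h hydraulic head, T transmissivity, S_y specific yield and Q_sw the exchange flux with surface water bodies.

```latex
% Linear groundwater store (ordinary differential equation, one per grid cell)
\frac{\mathrm{d}S}{\mathrm{d}t} = R - kS

% Two-dimensional gradient-based groundwater flow (partial differential equation)
S_y \frac{\partial h}{\partial t}
  = \frac{\partial}{\partial x}\left( T \frac{\partial h}{\partial x} \right)
  + \frac{\partial}{\partial y}\left( T \frac{\partial h}{\partial y} \right)
  + R - Q_{sw}
```

The first equation can be solved independently for each grid cell, whereas the second couples neighbouring cells through the head gradients and requires spatially distributed estimates of T and S_y, which relates directly to the data limitation mentioned above.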

For North America, Miguez-Macho et al. (2007) linked a land surface scheme with a two-dimensional gradient-based groundwater model and computed, with a daily time step, gradient-based groundwater flow, water table elevation, groundwater–surface water interaction and capillary rise, using a spatial resolution of 12.5 km. One challenge was the determination of the river conductance that affects the degree of groundwater–surface water interaction. Capillary rise was computed using the Richards’ equation for a soil column reaching down to the groundwater table by soil layers of variable thickness; the model appears to overestimate capillary rise, which is computed to dominate in all flat regions during May–October. Vergnes et al. (2012, 2014) established a gradient-based groundwater model applicable for global-scale modelling and applied it to France with spatial resolutions of 0.5° and 5 arc-min. The transient model simulates two-dimensional groundwater flow dynamics and also accounts for groundwater–river exchange and capillary rise. It is currently being implemented in the GHM of CNRM (Centre National de Recherches Météorologiques, France). Fan et al. (2013) developed a high-resolution (30 arc-s) steady-state global groundwater flow model driven by diffuse groundwater recharge, taking into account land surface elevation. The results indicated that patterns in water table depth explain patterns in wetlands at the global scale and vegetation gradients at regional and local scales. This study was an important step towards simulating groundwater dynamics globally. However, in the chosen approach, neither the important hydraulic connection between rivers, surface water bodies and groundwater nor spatially distributed hydrogeological information was taken into account. A subsequent study by de Graaf et al. 
(2015) presented an alternative global-scale steady-state groundwater flow model of a shallow aquifer (spatial resolution 6 arc-min), estimating aquifer depths and using a global lithological map (Hartmann and Moosdorf 2012) in combination with estimates of lithology-specific hydraulic conductivity (Gleeson et al. 2014). The results showed the importance of lateral groundwater flows over catchment boundaries as inter-basin flow paths. Both models are likely to overestimate the depth to groundwater. They are not dynamic and are not coupled to a model that dynamically models groundwater recharge and surface water levels. Therefore, the simulation of groundwater–surface water interactions is very limited, and neither capillary rise nor groundwater pumping or groundwater recharge by irrigation return flows is considered.

At a continental scale, Maxwell et al. (2015) showed the possibility of setting up an integrated hydrological model that simulates surface and subsurface flow at a high spatial resolution (1 km). The model solves surface and subsurface flow simultaneously and is constructed entirely of available datasets including topography, soil texture and hydrogeology. However, the steady-state simulation did not take into consideration runoff generation, transient dynamics or human activities such as groundwater pumping that affect the quantity of surface water fluxes and groundwater recharge (Döll et al. 2014a, b). Maxwell et al. (2015) concluded that these limitations can be addressed within the current modelling framework but require additional computational resources. Advanced soil–groundwater–surface water modelling systems such as HydroGeoSphere (Brunner and Simmons 2012) that are widely applied at local scales would require not only faster computers and better calibration strategies but also good quality data to be applicable at the global scale.

Given the poor knowledge on the three-dimensional shape and distribution of aquifer bodies as well as restricted computational resources, gradient-based groundwater modelling cannot aim, in the near future, at modelling groundwater flows in three dimensions or at supporting the sustainable management of specific aquifers. The focus is on better representing groundwater–surface water interactions and capillary rise. Here, the major challenge is to achieve a reasonable representation even with relatively large grid sizes such that it is computationally feasible to perform transient groundwater flow simulations that are coupled to soil water and surface water dynamics.

3.7 Detection and Attribution of Observed Changes in Freshwater Systems

Detection is the process of demonstrating that an observed change cannot be explained by internal climate variability only. Attribution of a change to anthropogenic influence requires the additional demonstration that the detected change is consistent with the change simulated in response to a combination of external forcings. While detection is generally done using statistical methods, attribution almost always requires the use of models. Most often in hydrology, detected change, e.g., of river discharge, is attributed to observed changes in climate or CO2 concentrations (Jiménez Cisneros et al. 2014). Attribution of hydrological changes to human climate-altering activities is seldom attempted because it requires the application of climate models. An exception is the study of Pall et al. (2011) where citizens allowed the researchers to perform a very large ensemble of climate model runs on their computers such that it could be shown that increased greenhouse gas concentrations increased the likelihood of a specific historic flood event in the UK by approximately a factor of 2–3.

Detection of changes of water flows and storages is limited by availability of, e.g., river discharge data or data on groundwater recharge. In the case of river discharge, available data are inhomogeneous as records of many stations end in the 1990s or even earlier, and there are a large number of ungauged basins around the world so that there is no good global coverage of river discharge by gauging stations. In case of the important variable groundwater recharge, there are no measured time series at all. In addition, the strong spatio-temporal variability of hydrological variables makes it hard to detect changes, e.g., changes in flood frequency. Attribution of detected changes of river discharge is challenging because river discharge is affected not only by climatic (and CO2) changes but also by changes in land use and water abstractions (Jiménez Cisneros et al. 2014).

There are studies that tried to attribute changes in river discharge to changes in climate, CO2 and land use (Gedney et al. 2006) and, more recently, to the radiative effect of anthropogenic aerosols (Gedney et al. 2014). Such studies were based on the comparison between annual river discharge derived from offline land surface simulations on the one hand and from observed river discharges on the other hand. The river basins with significant irrigation were ignored so that the role of human water use was considered as negligible. Yet, the conclusions of Gedney et al. (2006) that CO2 increase played a large role in causing increased global river discharge were challenged. They are highly uncertain not only due to the specific model assumptions on physiological versus structural effects on evapotranspiration but also because the applied precipitation data set is not suitable for evaluating trends as it is based on a temporally varying number of precipitation gauging stations, and because a more recent compilation of observed river discharge resulted in a decrease in global river discharge over time (Gerten et al. 2014). Alkama et al. (2011) were successful in simulating recent river discharge trends without accounting for physiological effects and emphasized the possible relevance of permafrost thawing for capturing the discharge trends of northern high-latitude rivers.

As far as evapotranspiration (ET) is concerned, changes at the continental to global scale are even more difficult to analyse due to lack of direct measurements. Only relatively few monitoring sites operate around the world and the period of record is quite short. Two studies (Jung et al. 2010, Wang et al. 2010) have used such in situ measurements for tuning global empirical ET schemes based on remote sensing and standard meteorological data. They agreed on a global increase in annual mean ET by about 7 mm per year per decade from 1982 to the late 1990s. These results were compared with ET outputs of process-oriented land surface models and were found to be relatively robust (Jung et al. 2010). The 1982–2008 period was, however, too short for a formal detection and attribution. More recently, Douville et al. (2012) used several global ET reconstructions based on two land surface models driven by two precipitation forcings and attributed the reconstructed multi-decadal variations of annual mean ET in three latitudinal belts to anthropogenic climate change. The ET reconstructions neither accounted for direct CO2 effects on ET nor accounted for changes in land or water use, thereby allowing a fair comparison with GCM outputs and a more robust attribution of the effect of anthropogenic climate change. Yet, this strategy did not allow the authors to assess the possible impacts of other human activities on ET, such as water abstractions.

4 Prospects

The role of global hydrological modelling is to combine large amounts of diverse and mostly spatially and temporally resolved data in order to estimate continental water flows and storages and resulting policy-relevant indicators of the water situation worldwide. To decrease the uncertainty of the computed estimates, it will be fruitful to generate and utilize improved GHM input data such as climate data but also to make better use of observations of GHM output variables such as river discharge or total water storage variations. This can be done by multi-criteria validation, calibration or data assimilation. A higher spatial resolution of GHMs beyond the current 0.5° resolution (55 km × 55 km at the equator) will increase the policy relevance of GHM output, e.g., for supporting integrated water resources management at the scale of river basins.
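The grid cell size quoted for the 0.5° resolution follows from simple spherical geometry, as the following sketch shows. It assumes a spherical Earth with a mean radius of 6371 km and approximates the east-west extent by its value at the cell centre; the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation (assumption)


def cell_size_km(resolution_deg, latitude_deg):
    """North-south and east-west extent (km) of a grid cell centred at a latitude."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    north_south = resolution_deg * km_per_degree
    # Meridians converge towards the poles, so the east-west extent shrinks
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_equator, ew_equator = cell_size_km(0.5, 0.0)
ns_60, ew_60 = cell_size_km(0.5, 60.0)
```

At the equator both extents are about 55.6 km, while at 60° latitude the east-west extent is only half of that, so the effective resolution of a regular latitude-longitude grid varies considerably across the globe.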

4.1 Multi-criteria Validation Against River Discharge and Geodetic/Remote Sensing Observations

Observations of river discharge are ideally suited for validating macro-scale hydrological models because the point observation integrates over processes in the whole upstream basin of the gauging station. Besides, river discharge is a flow that can be related quite easily to water availability, which is the focus of many assessments. Finally, long time series of observational data exist for many stations around the globe and some of them are compiled by the Global Runoff Data Centre (GRDC). When validating model output against observed river discharge, measurement errors should be taken into account, ideally in a station- and discharge-specific manner. None of the 500 UK gauging stations has a discharge observation uncertainty of less than 10 % (for individual measurements at mean flow conditions) due to uncertain stage–discharge relationships, while 83 % of the stations for which uncertainty could be determined have an uncertainty of less than 40 % (Coxon et al. 2015).

Observational data of other components of the water cycle in addition to river discharge are needed to validate hydrological models due to the well-known equifinality problem (Beven and Freer 2001); more than one parameter combination (or model) can lead to a good fit between discharge observations and simulations, while other water flows or storages would be projected very differently by model variants that result in equally good simulations of the river discharge. Therefore, a multi-criteria validation that considers other observed flows or water storage variations or any other observations that are related to flows and storages can be expected to be highly informative.

As an example, total water storage (TWS) variations as modelled by GHMs can be validated against satellite-based geodetic observations of monthly gravity variations of the GRACE satellites (Tapley et al. 2004), at least if all important storage compartments like the groundwater and large surface water bodies are taken into account in the GHM. Simulated TWS variations can also be validated by continuous GPS observations at more than 200 network stations worldwide because water storage variations cause crustal deformations which lead to displacements of the GPS reference point (Döll et al. 2014b). Comparing simulated TWS variations against both GRACE and GPS, Döll et al. (2014b) identified regions where the WaterGAP GHM underestimates seasonal variability of TWS and found that maximum TWS occurs 1 month too early in WaterGAP for most land areas (based on GRACE only). Validating groundwater depletion as computed by WaterGAP against both in situ well observations and GRACE TWS allowed the conclusion that farmers in groundwater depletion areas irrigate with only 70 % of the optimal value (Döll et al. 2014a). In the future, combined validation of GHM output against river discharge and TWS (e.g., Alkama et al. 2010) should be intensified, and time series of lake or river water tables as measured by radar altimetry should serve as additional observational data sets.

4.2 Multi-criteria Calibration and Data Assimilation

Multi-criteria calibration and data assimilation go beyond multi-criteria model validation. In model calibration, model parameters are adjusted in a way that simulated water flux or storage (state) variables optimally match historic observations with respect to one or more performance criteria. A primary goal of calibration is to obtain a model (including parameter values) that allows simulations for periods without observation data, such as for simulations of the global water resources for the time period 1971–2000 or of future climate change impacts. GHMs have rarely been calibrated, with few exceptions such as WaterGAP (Döll et al. 2003; Hunger and Döll 2008) and WASMOD-M (Widén-Nilsson et al. 2007) for which one or more parameters were adjusted by evaluating simulation results against observed river discharge. Without basin-specific calibration, even mean annual simulated river discharge may differ strongly from the observed value (Müller Schmied et al. 2014; Haddeland et al. 2011).

Model calibration is usually hampered by parameter equifinality (see Sect. 4.1), and calibration against more than one observable and performance criterion has long been recognized as an option to allow adjustment of a larger number of model parameters and to constrain the number of plausible model realizations (e.g., Gupta et al. 1999). For continental to global-scale modelling, however, there is not yet much experience with multi-criteria calibration, presumably because adequate satellite-based observation data with sufficient spatial and temporal extent and resolution have become available only recently. While considerable uncertainties of remote sensing data products may still limit their value in a multi-criteria calibration strategy (Livneh and Lettenmaier 2012), large-scale calibration examples demonstrated the benefit of using, in addition to river discharge, satellite-based monitoring data, such as near-surface soil moisture from Envisat (Milzow et al. 2011), MODIS-based evapotranspiration (Livneh and Lettenmaier 2012), altimetry-based water levels (Milzow et al. 2011), MODIS-based snow cover (Parajka and Blöschl 2008) or total water storage (TWS) variations from GRACE (Werth et al. 2009, Xie et al. 2012). GRACE TWS and river discharge were incorporated into a multi-criteria calibration scheme for WaterGAP by Werth and Güntner (2010) by adjusting the most sensitive 6–8 parameters in the 28 largest river basins worldwide. Improved simulations of TWS variations and river discharge were achieved for most basins after calibration, but calibrated mean annual discharge was still poor compared to the observed values in some basins, and a better fit to GRACE TWS did not necessarily lead to a better fit of simulated discharge to observed discharge. In the study of Xie et al. (2012), model parameters appear to be less sensitive to TWS than to river discharge. 
While large trade-offs in model performance for different objective functions leave the model with considerable uncertainties, they can help to unravel deficiencies of the model structure (e.g., Duethmann et al. 2014).

Direct multi-criteria calibration for the essential terms of the continental water balance (river discharge and TWS changes, and evapotranspiration if monitoring data were available) is particularly appealing if one strives for a closed water balance model of the continental areas. Nevertheless, the particular nature of GRACE TWS data based on the Earth’s time-variable gravity field requires specific consideration of the storage compartments considered, data filtering and error terms to make the calibration scheme consistent between model and observations (Güntner 2008). Multi-criteria calibration should comprise more than two observables to further constrain the space of plausible model realizations. Besides the types of satellite-based data on states and fluxes mentioned above, information on water storage in surface water bodies has a high potential as a large-scale calibration constraint, based on currently available multi-sensor combination data (for example, Papa et al. 2013) and future satellite missions such as SWOT. With the development of multi-scale modelling and parameterization concepts (Samaniego et al. 2010), even observation data with a small spatial measurement support but a global coverage such as evapotranspiration from Fluxnet eddy-covariance sites (Jung et al. 2011) or near-surface soil moisture based on GNSS reflectometry (Larson et al. 2008), for instance, may inform parameter adjustment in global models within a multi-criteria calibration approach.

For observing systems that require complex operators to transform the sensor signal into a hydrological variable simulated by a GHM, an inverse strategy can be promising. In this case, the state or flux variable of the hydrological model is forward-transformed to a quantity at the sensor level and, thus, parameter adjustment is done by measuring the performance directly relative to the sensor signal. This avoids the need for using an operator which in turn often is a nonlinear model that is afflicted with uncertainty. An example is to convert simulated water storage variations of a hydrological model into K-Band range rate data which is the key observable of GRACE at the level of the twin satellites, i.e. the inter-satellite distance changes as determined by a K-Band Ranging (KBR) System between the two GRACE satellites (Krogh 2011). Parameters in the hydrological model are then adjusted by minimizing the difference between modelled and observed range rate data.

Data assimilation, i.e. an integration strategy of models and data that primarily adjusts state variables of the model, and possibly also parameters, may allow for an optimal quantification of the system status for a period where observations are available, provided that the respective errors can be adequately specified. With early developments of data assimilation for large-scale applications being tailored towards appropriate initial conditions in land surface schemes of weather forecasting systems, there are now numerous examples for continental to global-scale assimilation of a variety of data types into LSMs with the general aim of hydrological forecasting and of providing land surface hydrological states that are superior to satellite observations or model estimates alone (see, e.g., overviews in Li et al. 2012; Reichle et al. 2014; Lahoz and de Lannoy 2014). However, there are only very few examples of data assimilation for large-scale water cycle modelling and operational water resources assessment so far (Renzullo et al. 2014), presumably because of limited availability of usable data at this scale, but also because of the complexity and computational costs of these techniques (van Dijk et al. 2014). The most widely adopted technique is the ensemble Kalman filter (EnKF) or Smoother, where the otherwise unknown error characteristics of the model are estimated by a Monte Carlo-based ensemble approach to determine the error covariance matrix of the model. First assimilation studies using GRACE TWS show its value for informing simulated subsurface water storage (Zaitchik et al. 2008; Li et al. 2012; Houborg et al. 2012). Eicker et al. (2014) presented an EnKF approach for assimilating GRACE TWS into WaterGAP with combined state and parameter updating and a full error propagation from the monthly GRACE spherical harmonic coefficients. 
They showed that GRACE data inform the model even at higher spatial resolution than resolved by the GRACE data themselves, with varying gains in time, space and among the different storage compartments. Some of the major challenges for the further development of data assimilation techniques include strategies for conserving mass during the assimilation process, error characterization of model and observation data, and adequate mapping functions between observed and simulated variables.
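To make the ensemble idea concrete, the following is a minimal sketch of a single stochastic EnKF analysis step with combined state and parameter updating. It is a toy under stated assumptions and not the implementation of Eicker et al. (2014): the augmented state has only two components (one storage value and one parameter), only the storage is observed, and all names and numbers (`enkf_update`, `column_mean`, the synthetic prior ensemble) are hypothetical.

```python
import random


def enkf_update(ensemble, obs, obs_var, rng):
    """Stochastic EnKF analysis step for one scalar observation of component 0.

    ensemble: list of [storage, parameter] members (augmented state vector).
    The model error covariance is not prescribed; it is estimated from the
    spread of the ensemble itself.
    """
    n = len(ensemble)
    dim = len(ensemble[0])
    mean = [sum(m[i] for m in ensemble) / n for i in range(dim)]
    # Sample covariance of every component with the observed component.
    cov_with_obs = [
        sum((m[i] - mean[i]) * (m[0] - mean[0]) for m in ensemble) / (n - 1)
        for i in range(dim)
    ]
    # Kalman gain; cov_with_obs[0] is the ensemble variance of the observed storage.
    gain = [c / (cov_with_obs[0] + obs_var) for c in cov_with_obs]
    updated = []
    for m in ensemble:
        # Each member sees the observation perturbed by its assumed error.
        perturbed_obs = obs + rng.gauss(0.0, obs_var ** 0.5)
        innovation = perturbed_obs - m[0]
        updated.append([m[i] + gain[i] * innovation for i in range(dim)])
    return updated


def column_mean(ensemble, i):
    """Ensemble mean of component i."""
    return sum(m[i] for m in ensemble) / len(ensemble)


# Synthetic prior: storage depends on an uncertain parameter (assumed relation).
rng = random.Random(42)
prior = []
for _ in range(200):
    param = rng.gauss(1.0, 0.2)
    storage = 50.0 * param + rng.gauss(0.0, 2.0)
    prior.append([storage, param])

posterior = enkf_update(prior, obs=70.0, obs_var=4.0, rng=rng)
print(column_mean(prior, 0), column_mean(posterior, 0))
print(column_mean(prior, 1), column_mean(posterior, 1))
```

Because the parameter is correlated with the storage in the prior ensemble, the observation of storage also shifts the unobserved parameter. The sketch ignores the challenges named above, in particular mass conservation and realistic error characterization.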

Data assimilation or fusion techniques can also be applied in an offline mode to provide consistent water balance estimates of continental hydrology. A global water cycle reanalysis product has recently been presented by van Dijk et al. (2014), merging prior estimates of monthly water storage changes based on an ensemble of several LSM outputs and complementary data with GRACE TWS data in a sequential data assimilation framework.

4.3 Hyperresolution Global Hydrological Modelling

There is the vision that one day it may be feasible to perform global-scale hydrological modelling with an acceptable accuracy at a much higher resolution than today, with grid cells of 100 m to 1 km instead of the current 50 km (Wood et al. 2011). Then, global-scale modelling would allow improved global assessment of, e.g., food security and could support river basin management everywhere. This has particular relevance in developing countries where basin models are not yet available or are poorly constrained because of a lack of local data; in these cases, information about water resources derived from GHMs that can exploit non-local remote sensing data would be a great asset if it is locally relevant (Bierkens et al. 2015). In addition, highly resolved global-scale information on water flows and storages would be very beneficial for freshwater ecosystem management and for assessing global biogeochemical cycles.

The HyperHydro initiative (www.hyperhydro.org) aims at advancing hyperresolution global hydrological modelling. It is a network of scientists that is open to the broader scientific community and invites anyone who wishes to cooperate. Current efforts include the establishment of testbeds, the overcoming of computational challenges and the compilation of input data sets. For further information on the motivation, challenges and prospects of hyperresolution global hydrological modelling, please refer to Wood et al. (2011) (including a comment by Beven and Cloke 2012 and the reply by Wood et al. 2012) and Bierkens et al. (2015). Beven et al. (2015) provide valuable critical comments on hyperresolution modelling of water on the land areas of the globe, pointing out that unknown heterogeneities in the subsurface and ignorance about subsurface processes result in a lower gain of accuracy from increased resolution than is the case in atmosphere and ocean modelling. However, a prospect of hyperresolution modelling is that its output can be evaluated more meaningfully than current GHM output by local experts and stakeholders who can help identify model deficiencies.

5 Conclusions

The capabilities and the sheer number of GHMs have increased significantly over the last decade such that global-scale quantification of water resources has improved and uncertainties are better known. We conclude that major challenges remain until GHMs can serve as reliable tools for characterizing current and potential future water resources worldwide. We hope that our presentation of selected challenges not only informs on the state of the art of global hydrological modelling but also indicates fruitful research directions. As outlined in the previous section, we believe that major advancements will be possible if in situ and remotely sensed observational data of model output variables are utilized more efficiently in global hydrological modelling and if spatially more resolved model output can be provided with reasonable accuracy.