The Utility of Land-Surface Model Simulations to Provide Drought Information in a Water Management Context Using Global and Local Forcing Datasets

Drought diagnosis and forecasting are fundamental issues regarding hydrological management in Spain, where recurrent water scarcity periods are normal. Land-surface models (LSMs) could provide relevant information for water managers on how drought conditions evolve. Here, we explore the usefulness of LSMs driven by atmospheric analyses with different resolutions and accuracies in simulating drought and its propagation to precipitation, soil moisture and streamflow through the system. We perform simulations for the 19802014 period with SASER (5 km resolution) and LEAFHYDRO (2.5 km resolution), which are forced by the Spanish SAFRAN dataset (at 5km and 30km resolutions), and the global eartH2Observe datasets at 0.25 degrees (including the MSWEP precipitation dataset). We produce standardized indices for precipitation (SPI), soil moisture (SSMI) and streamflow (SSI). The results show that the model structure uncertainty remains an important issue in current generation large-scale hydrological simulations based on LSMs. This is true for both the SSMI and SSI. The differences between the simulated SSMI and SSI are large, and the propagation scales for drought regarding both soil moisture and streamflow are overly dependent on the model structure. Forcing datasets have an impact on the uncertainty of the results but, in general, this impact is not as large as the uncertainty due to model formulation. Concerning the global products, the precipitation product that includes satellite observations (MSWEP) represents a large improvement compared with the product that does not.


Introduction
Drought is an important climatic risk that can be exacerbated by anthropogenic warming Marx et al. 2018) and water management (Wanders and Wada 2015). It is the result of complex interactions among processes in the atmosphere, on the continental surface, and within human management (Van Loon 2015. Drought has large social impacts on the society of the Iberian Peninsula due to the scarcity and high degree of the utilization of water resources (MMA 2000). Due to the degree of exposure that Spanish society has to drought, it is necessary to improve our knowledge on drought. It is also necessary to develop tools that improve our monitoring and early warning capabilities to provide water managers with better information. Currently, there is still an insufficient amount of knowledge and adequate tools to manage drought effectively in many places.
The most common meteorological drought index is the standardized precipitation index (SPI) (McKee et al. 1993). For instance, the drought monitoring product of the AEMET (Agencia Estatal de Meteorología) is based on the SPI. There are other indices that also include reference evapotranspiration (ETo), such as the Palmer Drought Severity Index (PDSI) (Palmer 1965), Reconnaissance Drought Index (RDI) (Tsakiris et al. 2007), Standardized Precipitation Evapotranspiration Index (SPEI) (Vicente-Serrano et al. 2010;Begueria et al. 2014) and Standardized Palmer Drought Index (SPDI) (Ma et al. 2014). Begueria et al. (2014) and Vicente-Serrano et al. (2015) studied the sensitivity of drought indices to ETo and precipitation and found that each one has a different sensitivity to each variable, which must be taken into account for practical applications, specifically if they are related to climate change (Sheffield et al. 2012). Similar to the SPI, standardized indices of other variables, such as soil moisture or streamflow , can be generated. In addition, these can be used to study how drought propagates throughout the system by correlating indices at different time scales (Barker et al. 2015).
Standardized soil moisture, if it was widely observed, would be a good drought indicator, as it integrates the balance among actual evapotranspiration, effective precipitation and other relevant processes and, in addition, it is directly linked to vegetation and is the water source that plants use. However, observed data for this variable are rare (Seneviratne et al. 2010). AghaKouchak et al. (2015) considers that satellite data, which are not currently used for drought monitoring in most basins, offer interesting opportunities to improve early drought warning, and Linés et al. (2017) shows that satellite data can be used for the early warning of droughts in the Ebro basin. Concerning remotely sensed soil moisture, recent missions (Entekhabi et al. 2010;Kerr et al. 2012;Bindlish et al. 2015) and new high-resolution datasets (Escorihuela et al. 2012;Merlin et al. 2013;Molero et al. 2016) are producing even more possibilities.
Land-surface models (LSMs) explicitly simulate the water and energy exchanges at the interface of the soil with vegetation and the atmosphere. Compared with satellite data, the great advantage of these models is that they allow for long-term simulations that extend several decades into the past and offer a coherent image of the entire system, including variables that are difficult or impossible to observe from space, such as root zone soil moisture. LSMs are complex, which limits their use as hydrological models for daily basin management, where simple hydrological models offer better results in terms of streamflow simulation. However, because they are physical models, LSMs simulate most of the processes related to the propagation of drought through the system (Vidal et al. 2010b); thus, they are ideal for the study of processes related to drought. Furthermore, LSMs play an important role in drought prediction systems (Thober et al. 2015).
LSMs can be useful for drought monitoring and providing information regarding decision making Wood 2007, 2014). One caveat is that LSMs simulating soil moisture and its variability are affected by uncertainties in the forcing data, model structure Van Loon et al. 2012), parameters (Nearing et al. 2016) and resolution; as a consequence, uncertainties are still relevant (Yilmaz and Crow 2013;Wang et al. 2009;Fan et al. 2011;Tallaksen and Stahl 2014) and, thus, the use of ensembles in different models is recommended (Mo and Lettenmaier 2014). The assimilation of satellite soil moisture is also a possible way to reduce uncertainty (López López et al. 2016), but validation with satellite data is not always straightforward (Escorihuela and Quintana-Seguí 2016).
In this study, we use two different land-surface models and three forcing datasets at different resolutions in order to 1. evaluate the usefulness of land-surface models as tools to provide drought information (e.g., precipitation, soil moisture and streamflow) to water managers, including the temporal and spatial structures of drought and its propagation through the system; 2. evaluate uncertainties due to the forcing datasets and model structure, with special emphasis on the comparison of global low-resolution products and local high-resolution products; and 3. detect areas of improvement in land-surface modeling to better meet the needs of water managers.
This study improves our current knowledge on the capability of land-surface models to reproduce drought and its evolution (from precipitation to soil moisture and streamflow) and how the uncertainty of these processes depends on the forcing data at different resolutions and the model structure using two distinct land-surface models. Furthermore, this is the first comparison of the SURFEX and LEAFHYDRO land-surface models in Spain, and it is the first study about drought in Spain using high-resolution land-surface models.

Area of Study
This study is focused on mainland Spain, whose relief and main basins are depicted in Fig. 1. Spain is influenced by both Atlantic (dominated by synoptic-scale frontal systems) and Mediterranean meteorology (where mesoscale convective systems are common and often missed by global-scale and low-resolution reanalyses and datasets) which, together with complex orography, are the main factors that modulate the climate in this area. Spain has an heterogeneous distribution of the annual mean precipitation, with values ranging from 2000 mm y −1 in the Atlantic margin and Pyrenees to less than 100 mm y −1 in the SE. The main valleys, such as the Ebro basin, are dry due to the shadowing effect of the mountains. Runoff is mainly generated on the relief surrounding the main basins. Agriculture is located in downstream areas, and the infrastructure has been built to store water (dams) and distribute it to irrigated fields (canals). As a result, the streamflow is significantly influenced by water management.

Datasets and Models
In this section, we describe the forcing datasets (Section 3.1) used to force different landsurface models (Section 3.2). In Section 3.3, we describe the data used to validate the simulations performed.

Forcing Datasets
Off-line land-surface model simulations are performed by driving the LSM with a forcing dataset. In this study, our reference dataset is SAFRAN. Furthermore, two other global and lower resolution datasets are used: the forcing dataset of the EU-FP7 eartH2Observe project and MSWEP.
SAFRAN (Durand et al. 1993(Durand et al. , 1999 is a meteorological analysis system that produces gridded datasets of screen-level meteorological variables by combining the outputs of a meteorological model and all available observations using an optimal interpolation algorithm (OI) (Gandin 1966). SAFRAN has been extensively used in France (Quintana-Segui et al. 2008;Vidal et al. 2010a) and, more recently, it has also been applied in Spain by Quintana-Seguí et al. (2016, 2017. The Spanish application of SAFRAN uses observational data from the Spanish State Meteorological Agency (AEMET) and ERA-Interim (Dee et al. 2011) as the first guess. The resulting gridded product has a resolution of 5km but, for this study, we have also generated a lower resolution dataset (30km). We have labeled these products as SFR (SAFRAN) and SLR (SAFRAN Low Resolution), respectively. The high-resolution SAFRAN data used in this study are available in the HyMeX database.
The EU-FP7 eartH2Observe project has produced a global reanalysis of water resources by performing global off-line simulations based on hydrological and land-surface models. These simulations need a global meteorological forcing dataset, and the project has created its own based on ERA-Interim (Dee et al. 2011). The first version of this reanalysis, which has a resolution of 0.5 degrees, is described in Schellekens et al. (2017). eartH2Observe also produced a 0.25 degree forcing dataset, which is the one we used here. It is directly derived from the original ERA-Interim 3-hourly data driving ERA-Interim/Land (Balsamo et al. 2012). Hereafter, this forcing dataset is labeled as E2O.
Simulations have also been performed using E2O by substituting its precipitation dataset with the MSWEP precipitation dataset , which optimally merges the highest quality data sources available as a function of timescale and location. From now on, this dataset is labeled as MSW.

Land-Surface Models
This study is based on simulations performed with two land-surface models: SASER (Section 3.2.1) and LEAFHYDRO (Section 3.2.2). Both models incorporate a river routing scheme to transform the generated runoff into streamflow. Neither models simulate human processes, such as irrigation or dams. LEAFHYDRO includes a dynamic water table coupled via two-way fluxes into soil-vegetation and rivers.

SASER
SASER (SAfran-Surfex-Eaudysee-Rapid) is a hydrometeorological modeling platform based on the SURFEX land-surface model. Usually, SASER uses SAFRAN as the forcing dataset (Section 3.1) but, in this paper, it can use any other suitable dataset. SURFEX (SURFace EXternalisée) is Météo France's surface modeling platform (Masson et al. 2013). For natural soils, SURFEX uses the ISBA (Interaction Sol-Biosphère-Atmosphère) land surface scheme (Noilhan and Planton 1989;Noilhan and Mahfouf 1996), which has different versions. In this study, we have selected ISBA-3L (Boone et al. 1999) and ISBA-DIF (Boone 2000;Habets et al. 2003). The former considers a simple three-layer description of the soil by means of a force restore approach, while the latter uses a more complex multilayer approach. SURFEX lacks a routing scheme, which is the reason why SASER uses the RAPID routing model (David et al. 2011a, b). The connection between SURFEX and RAPID is made by means of Eau-dysée. 1 An important limitation of SASER is that it does not represent groundwater processes.
In this study, SURFEX has been run on the same grid as SAFRAN, which has a resolution of 5km. Simulations have been performed using ISBA-DIF and ISBA-3L. Hereafter, the SURFEX simulations that use ISBA-DIF are labeled as DIF, and the SURFEX simulations that use ISBA-3L are labeled as 3L.

LEAFHYDRO
LEAF (Land-Ecosystem-Atmosphere Feedback) is the land-surface component of RAMS (Regional Atmosphere Modeling System) (Walko et al. 2000). Several modifications were made to LEAF to incorporate groundwater processes, resulting in LEAFHYDRO (Miguez-Macho et al. 2007). The water table depth is diagnosed based on the soil moisture when it lies within the resolved soil layers. When the depth is deeper, soil columns are extended to the dynamic water table below, thereby acting as saturation boundary conditions and affecting the soil water fluxes above. There is lateral groundwater flow among adjacent cells, leading to divergence from the high ground and convergence with low valleys at multiple scales. The water table, once recharged by rain events or fed by lateral flow convergence, relaxes into rivers within a grid cell through fluxes between groundwater and rivers. These fluxes can be 2-way depending on the hydraulic gradient, representing both loosing (i.e., leaking into groundwater) and gaining (i.e., receiving groundwater) streams. River discharge, which is fed by surface runoff and groundwater, is routed out to the ocean through the channel network using the kinematic wave method. The sea level is set as the head boundary condition for groundwater, which allows it to influence coastal drainage.
The LEAFHYDRO simulations for this study were performed on a grid with a 2.5 km resolution that is nested in the 5 km SAFRAN grid, which allows each cell in the SAFRAN grid to exactly encompass 4 cells of the LEAFHYDRO grid. Afterwards, the results were aggregated to 5 km to facilitate the comparisons with the SASER-based simulations. Hereafter, LEAFHYDRO is labeled as LHD.

Streamflow Data
Daily streamflow data were obtained from the MAPAMA database. 2 To obtain monthly time series with few gaps, we selected all stations that had at least 95% of the daily data during the period of study.
Due to the high degree of human influence on streamflow in the area of study, an estimate of the monthly naturalized flow was necessary in order to validate our LSM simulations, which do not simulate any water management infrastructure. SIMPA (Estrela and Quintas 1996;Ruiz 1999) is a monthly, conceptual, distributed hydrological model used by MAPAMA and the authorities of most basins, which provide feedback during its development. We believe that it is currently the best reference dataset for naturalized flow for Spain. These data were provided to us by MAPAMA. Its streamflow time series ends in summer 2006 (see Section 4.1). Hereafter, SIMPA is also referred to as SMP.

Methodology
Hereafter we explain how the model simulations were performed (Section 4.1), how drought was quantified (Section 4.2) and how the simulated and observed data were compared (Section 4.3). Table 1 summarizes the simulations performed for this study. They encompass the period 1980-2013; however, all calculations based on streamflow have been performed in the subperiod . This was done for two reasons: (1) the SMP time series provided by the MAPAMA ends in summer 2006; and (2) in the LHD simulations, due to the spin-up process, the simulated streamflow was not realistic until 1982.

Land-Surface Model Simulations
To compare the simulated soil moisture among models, soil was divided into two layers: the root zone and deeper soil zone. SURFEX defines the root zone and deeper soil zone coherently in both 3L and DIF. To do the same in LHD, we used the root zone and total soil depth of 3L and then extracted the soil moisture for these same layers in LHD. Table 1 Combinations of forcing datasets and modelsAll simulations were performed by the authors of this study, except for the SMP, which was obtained from MAPAMA

Forcing Datasets
-LHD is not shown in the streamflow section of this study, as the corresponding streamflows were not generated

Calculation of drought indices
We have calculated drought indices following the SPI. This index is calculated using monthly data or, alternatively, a time series of the accumulated precipitation for the previous n months, where n represents the scale of the index. A similar operation can be done for soil moisture (SSMI) and streamflow (SSI). We use a nonparametric methodology, which can be applied to different climatic variables, including precipitation, soil moisture and relative humidity, without having to assume representative parametric distributions .

Drought propagation
The methodology we used to study drought propagation is inspired by Barker et al. (2015): (1) the standardize soil moisture index (SSMI) is calculated with a temporal accumulation of 1 month, making the SSMI equivalent to SSMI-1; (2) for precipitation, SPI-n is computed for n in [1, 2, ..., 24]; and (3) finally, for any given grid point, the n x scale maximizing the Pearson correlation coefficient between SPI-n and SSMI is found. The resulting n x can be interpreted as the scale at which drought propagates from precipitation anomalies to soil moisture anomalies. For streamflow, the methodology is mostly the same, with the difference being that the SPI is calculated with the areal mean of the basin precipitation corresponding to the gauging station of the studied streamflow.

Comparison and Validation Metrics
To compare standardized indices of the same variable (i.e. soil moisture) corresponding to two different products or to a product and the observations we have used the root mean square difference (RMSD) and Pearson Correlation coefficient (r ).

Results
In this section we compare how the different simulations represent soil moisture drought and how they propagate drought from precipitation to soil moisture (Section 5.1); we also evaluate how drought propagates from precipitation to streamflow (Section 5.2). With these comparisons, we assess the quality and uncertainty of the current forcing datasets and landsurface models in reproducing drought characteristics. The analysis of the forcing datasets and the validation of the modeled streamflow are presented as electronic supplementary material.

Soil Moisture Drought
In this section, we compare how simulations reproduce soil moisture drought (Section 5.1.1) and then compare the dynamics of the propagation of this drought from precipitation anomalies to soil moisture anomalies (Section 5.1.2).

Comparison of the Simulated Soil Moisture Drought
In this section, we quantify the model result differences and assess the intermodel consistency for soil moisture drought. Table 2 shows the root mean square difference (RMSD) and Pearson correlation (r ) calculated by comparing all simulations with each other. These values synthesize both the spatial and temporal aspects of the differences, as the comparisons include all data points (i.e., all grid points for all time steps). The table contains four subtables. The first row of subtables corresponds to the root zone, and the second row corresponds to the deep soil. Different color scales have been used for each score (RMSD and r ), but the same color scale (where greener is better) is used for each score for both soil layers. There is more uncertainty (i.e., redder) in the deep soil than the root zone results, which is expected, as the considered LSMs use different approaches to simulate deep soil water fluxes.

SFR-3L SFR-3L
Two scores have been computed, the root mean square difference (RMSD) and the Pearson correlation (r ), for the root zone SSMI and deep soil SSMI The disparity between simulations with the same model but different forcings is larger than that between simulations with the same forcing but different models, which implies that the forcing dataset plays an important role. For example, in simulations forced by SFR, the largest RMSD for the root zone soil moisture is 0.55, and the lowest correlation is 0.84 (both correspond to SFR-DIF vs. SFR-LHD). On the other hand, for all simulations using DIF, the largest RMSD is 0.61, and the lowest correlation is 0.80 (E2O-DIF vs. SFR-DIF). This result is identical if we compare E2O-LHD with SFR-LHD.
For the root zone soil moisture, the smallest difference between two simulations that use different forcing datasets and models is between E2O-DIF and SFR-3L (R M S D = 0.42 and r = 0.9). The largest difference is between E2O-LHD and SFR-DIF (R M S D = 0.74 and r = 0.71). To put these results into context, a RMSD of 0.74 is almost 3 4 of a standard deviation, which means that the drought status of a grid point at a given time probably changes category. The trend found in E2O increases the differences between simulations. If E2O is removed from the comparison, the simulations with the largest differences are This analysis shows that drought studies performed in the same area with different models and forcing datasets should be compared carefully, as the uncertainties are still high. This is something that, for example, has also been found in a climate change setting by Marx et al. (2018).

Propagation of drought to soil-moisture
If the drought status is important for a water manager, the time scales of drought propagation (from precipitation to soil moisture) are perhaps even more important, as their knowledge can be a tool for the forecasting of the soil moisture status in places where drought dynamics are relatively slow, which could benefit early warning systems. As stated in Section 4.2.2 we perform this analysis by finding the n x scale maximizing the correlation between SPI-n and SSMI. Figure 2 shows maps of n x for the root-zone soil moisture. For instance, by taking SFR-DIF (a) and SFR-LHD (b), it is apparent that the spatial pattern of n x is significantly different for each model. The spatial structure in DIF is overly homogeneous, which means that for this model, the SSMI correlates better with the SPI at scales of 2 months for almost all of the study area. In contrast, the LHD presents a richer pattern, with scales that extend from 3 to 12 months, as the dynamics are slower towards the south and faster towards the northwest. 3L is similar to DIF but with a NW-SE gradient that extends from scales of 1 month in the NW to 9 months in the SE. The conclusion of this first comparison is that the model formulations have a strong influence on the temporal scale at which the variability of precipitation affects the variability in soil moisture; thus, different models yield significantly different results. Another important question is how meteorological forcing affects n x . The first column in Fig. 2 (panels a, d, f, and h), showing the n x results for all simulations that use the DIF model, shows that the pattern is largely invariant. E2O-DIF and MSW-DIF have an area where n x is one month longer than that in SFR-DIF, but this difference is significantly lower than that from the changing model. The results for the LHD model (second column; panels b, e and g) indicate that the spatial patterns of n x for different forcings are more divergent than those in the case of DIF but still to a lesser degree than Fig. 2 Scale n x that maximizes the correlation between SPI-n and root-zone SSMI for different forcing and model combinations those of a different model. This suggests that, for some models, the forcing dataset has an impact on how precipitation variability propagates to soil moisture; however, this impact is still not as high as that of the model formulation itself.
A similar analysis can be performed for the deeper soil zone (below the root zone). In this case, (figures not shown), n x is higher for all models except 3L, indicating that soil moisture variability in the deeper soil is generally related to more persistent precipitation anomalies, which is expected. For DIF, n x is much larger in the deep soil than in the root zone when compared with the LHD results, where there is much less variation. This occurs perhaps because of the relatively fine resolution of the soil columns in the top 4 m of the soil and the coupling with the water table in this model.
These comparisons show that LSMs with different structures produce different drought dynamics in soil moisture. Unfortunately, we do not know what the reality is due to the lack of observations, but these large differences reveal that progress is needed in order to, first, improve the observational coverage of the soil moisture (particularly the deeper layers) and, second, bring models closer to the real behavior. Furthermore, the forcing dataset also introduces its own uncertainty.

Hydrological Drought
The next step of this study is to analyze streamflow drought. In this case, in situ observations are available and, thus, included in the analysis. In Section 5.2.1, we validate the standardized values; in Section 5.2.2, we evaluate how the models reproduce drought propagation from the atmosphere to the streamflow. The outputs of the hydrological model SIMPA are included in the comparison to be used as a reference for natural flows. The simulated streamflows (nonstandardized values) are validated in the electronic supplement.

Validation of the simulated SSI
Here, we evaluate the ability of the models to simulate streamflow drought (SSI). In this case, we compare the ability directly with that of the SMP-SMP, assuming that it provides a good estimate of natural flows. Figure 3 shows the RMSDs for the different simulations, with lower values indicating higher skills. SFR-DIF generally has the smallest RMSDs, ranging from 0.6 to 0.8, with maximum values between 0.8 and 1.0 in an area at the headwaters of the Tajo, Guadiana, and Júcar, which also includes the southernmost stations in the Ebro river basin. SFR-3L presents similar results, whereas the SFR-LHD exhibits larger RMSDs at most stations, with a maximum value of approximately 1.2. The aforementioned high baseflows in this model are likely the cause of these large RMSDs, which impact its capability to reproduce drought status. To put all of these results in the context of drought studies, an RMSD value of 1 represents one standard deviation of the index and, thus, even errors of 0.6, which is the best case for all simulations and may change the drought status of a given monthly flow (i.e., from normal to drought status). Concerning the effect of forcing, the maps show that the MSW has similar results to the SFR, while E2O, in general, yields somewhat larger RMSDs. Figure 4 depicts the Pearson correlation coefficient (r ) of the same simulations, with higher values representing better performances. The SFR-DIF is, again, the simulation with the best scores, followed by SFR-3L and SFR-LHD. In the last case, the correlations in some rivers are extremely low, especially in the southern half of the area of study, where summer flows are minimal and, thus, negatively affected by the high baseflow bias in the LHD. The impact of changing the forcing datasets remains low, as it was before. The conclusion of this comparison is that, in terms of the SSI, model formulation is the most important uncertainty source compared to the forcing dataset.

Drought propagation to streamflow
Finally, we evaluate how the models simulate drought propagation from precipitation to streamflow. We have studied the correlation between the SPI-n and SSI for different values of n (not shown). We found that, in general, the SMP-SMP has a behavior closer to the DIF and 3L behaviors compared with the observations in many basins. This means that even a well-tested hydrological model, such as the SMP, has difficulties producing the right Fig. 5 n x scale, in months, of the SPI-n, which better correlates with the SSI. The red dots indicate scales longer than 12 months correlations between the SPI-n and SSI. For its part, the LHD presents a different behavior due to the connection of rivers to the groundwater in this model. In all cases, the simulations are closer to the other simulations when using the same model and not the same forcing dataset, which indicates that model formulation largely determines the drought propagation behavior, and the forcing dataset has a reduced impact.
We now generalize for the whole study area. Figure 5 shows the scale n x , in months, of the SPI-n that better correlates with the SSI. The SFR-OBS (b) compares basin precipitation from the SFR and observed SSI at each station. There are many red dots in areas where the flow is highly intervened by water management, but other reasons cannot be excluded, such as karsts or groundwater influence, because some red dots correspond to rivers in the natural regime. The SMP-SMP (a) is the result of a well-tested hydrological model; however, we must keep in mind that it is not the ground truth, and it will not be able to correctly simulate the presence of large karst systems or other complex geological features; therefore, it does not have a perfect skill score, as discussed previously. The SMP-SMP shows some coherent Fig. 6 Pearson correlation between the SPI-n and SSI when n is the scale that maximizes the correlation (n x ) geographical patterns, with low n x values (1-3 months) in the north in most of the Ebro and in some areas of the Duero and Tajo basins. The highest values (reaching 1 year) are found in the Sistema Iberico mountain range, separating the Ebro and Duero basins. Moderate values occur in the SE and some areas of the Duero basin. With regard to the other models, The DIF and 3L present the lowest n x , as in the case of the root-zone soil moisture, indicating the generally fast response of rivers to precipitation deficits. The comparison with the SMP shows that these n x values are too low and too homogeneous in space. Forcing the DIF with the SLR instead of the SFR, MSW or E2O does not change the results significantly, suggesting that those results are mostly dependent on model formulation. Indeed, a model with a different approach, such as the LHD, presents a divergent behavior, with high values of n x in many of the studied basins, which is likely due to the role of groundwater buffering the streamflow response to drought. Furthermore, it is more sensitive to forcing than the other models, as was the case for soil moisture. Figure 6 shows the Pearson correlation between the SPI-n and SSI when n = n x . If we compare the SFR-OBS with the SMP-SMP, we see that in the real world, which is complex and influenced by water management, the correlations between the SPI-n x and SSI are lower than those in the non-human influenced and simplified world of a model. This result suggests that, in reality, there are many more factors driving streamflow variability, as defined by the SSI, than just precipitation. As expected, the DIF and 3L streamflow variabilities are too related to precipitation (i.e., high correlation), as they do not simulate management or groundwater processes; however, they are quite comparable to the SMP-SMP. The LHD, on the contrary, has much lower correlations, which means that the streamflow variability in this case is driven by other factors, such as groundwater processes.

Discussion
In this study, we have shown that, despite their limitations, global forcing datasets can be useful for drought studies. The MSW, which integrates diverse sources of data, including rain gauges and satellites, is a valuable dataset and represents a clear improvement compared with the E2O in both absolute and relative (drought) terms (see the electronic supplementary material). However, we should not forget that the MSW uses rain gauge data and, thus, is probably better in data rich countries, such as Spain, than in countries where observations are inexistent or unavailable.
Our study shows that our implementation of state-of-the-art LSMs in Spain is not yet ready to provide reliable information to water managers.
We compared the differences among the SSMI used in several simulations, concluding that changing the forcing dataset and keeping the same model has a larger impact on the results than the opposite, which implies that the forcing dataset plays an important role. These comparisons mixed both the spatial and temporal components of the error, but it is the spatial component that amplifies the divergences among the simulations, as there are large differences between how, for instance, E2O and SFR represent the spatial structures of drought.
The results also show that model formulations have a strong influence on the temporal scale n x at which the precipitation variability (SPI) affects the soil moisture variability (SSMI). This is an important issue because it implies that since there is no ground truth for comparison, we cannot trust land-surface models to understand how drought propagates to soil moisture. The spatial structures of n x are overly homogeneous in space in the 3L and DIF. Even though we do not know what the reality is, intuition tells us that there should be differences across various hydrological settings and climates. LHD, on the contrary, presents a rich spatial variability of n x , which shows the important effects that groundwater lateral flows and the water table have on soil moisture dynamics. This is especially important in semiarid areas, where the interaction between the water table and soil moisture sustains vegetation during the dry season and drought periods. Furthermore, the differences among models can be amplified by the choice of the forcing dataset, as the results for the LHD have shown that changing the forcing dataset alters how precipitation variability propagates to soil moisture. As a consequence, model formulation is the key factor explaining the uncertainty of drought propagation to soil moisture, while forcing explains most of the divergence when representing the spatial structures of drought, as global products are not able to reproduce the finer scale spatial details of the precipitation pattern.
The structural differences among models affect their ability to simulate the SSI; thus, model formulation is the most important uncertainty source in the SSI results. One of the interesting results in drought propagation is that, compared with the observations at the near natural flow stations, the correlation between the SPI-n x and SSI is generally too high for the SMP, DIF and 3L. This means that these models miss important processes affecting the SSI. As the model structure is the most important source of uncertainty, it is meaningful to use global forcing datasets for local (national or large basin scale) streamflow drought studies.
One of the main results of this study is that the land-surface models used in this study must be improved if they are going to be used to monitor and understand drought processes in Spain. This is mainly due to the large uncertainty in the simulation of soil-moisture and the needed improvement in groundwater processes and their interactions with soil and rivers. We believe that similar results would be obtained if more models were introduced in the comparison, as the main causes of the detected problems are shared by many LSMs. However, this should be the subject of a larger future study. Therefore, future work in improving offline LSMs is necessary not only to provide interesting data for water management (planning and monitoring) but to prepare for the next generation of earth-system models. In this respect, the methodology used in this study, which compares different kinds of drought and its propagation, is useful to reveal structural differences and problems in model simulations, providing new perspectives that go beyond the validation of the absolute values of the studied variables.

Conclusions and Perspectives
Land-surface models are tools that have the potential to provide valuable information to water managers. Simulated standardized soil moisture is especially interesting as a drought index because soil moisture integrates the effects of effective precipitation and actual evapotranspiration. LSMs can provide estimations of this index, compensating for the lack of observations. However, this study shows that the uncertainties due to model formulation remain an important issue in current generation large-scale hydrological simulations based on landsurface models. This is true for the results of both the standardized soil moisture (SSMI) and streamflow (SSI).
In the case of soil moisture, the differences among simulations are quite large both in terms of the RMSD and correlation. Furthermore, the differences in how drought propagates to soil moisture are also large and determined by the model structure. The problem is difficult to solve because there is no Spanish dataset with in situ root-zone soil moisture data that could be used to guide model improvement and, thus, models cannot be effectively constrained and validated, even though some good networks exist: REMEDHUS (https:// ismn.geo.tuwien.ac.at/networks/remedhus/) and the Valencia Anchor Station (Coll Pajarón 2017).
For streamflow, the quality of the simulated SSI is not as good, as it would otherwise be desirable. Furthermore, the LSMs are not able to correctly simulate the scales of drought propagation, and they miss processes that affect the SSI, as the correlations between precipitation (SPI-n x ) and streamflow (SSI) are too high (this is also true for the SMP).
The forcing datasets do have an impact on the uncertainty of the results but, in general, this impact is not as large as the impact of uncertainty due to model formulation. The MSW, which includes satellite precipitation data, represents a large improvement compared with E2O.
The methodologies used to study drought propagation are not only useful to understand this process but also to explore structural problems in models and guide model improvement. In the future, it is necessary to continue improving the current hydrological implementation of LSMs in Spain (SASER and LEAFHYDRO). The assimilation of the remote sensing of surface soil moisture can be a useful way to improve how models simulate root-zone soil moisture. Concerning streamflow, the main physical processes to be improved are those related to groundwater and lateral flows, which need to be introduced in SASER (3L and DIF) and improved in LEAFHYDRO. Finally, it would also be necessary to introduce human-related processes in models, such as irrigation and dam operation, in order to be able to compare them to reality and study the direct impacts of humans on drought.
Funding Information This is a contribution to the EU-FP7 eartH2Observe project, which received funding from the European Union's Seventh Programme for research, technological development and demonstration under grant agreement no. 603608. This work has been partially funded by the Spanish Ministry of Science, Innovation and Universities and the European Regional Development Fund through grants CGL2013-47261-R and CGL2017-85687-R. This work is a contribution to the HyMeX program (Hydrological Cycle in the Mediterranean Experiment; http://www.hymex.org).

Conflicts of Interest Statement None
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.