Introduction

A hydrological model is a quantitative and integrated approach to understand the spatial and temporal variation of hydrological cycle components and evaluate the possible impacts on water resources (Hrachowitz and Clark 2017). Hydrologic simulation with high accuracy and minimum error is essential for water resources management and planning. Hydrological models simplify the water behavior in nature (Devak and Dhanya 2017; Liu and Gupta 2007), and play a critical role in predicting water resources in cold regions, where glaciers and snow are key elements of the water cycle. However, hydrological models face significant challenges that can affect their performance and accuracy. One of the main challenges is the limited availability and quality of observed data for calibrating and validating hydrological models, particularly in cold regions where traditional measurement methods are difficult to conduct (Goswami et al. 2007; Belvederesi et al. 2022). In addition, meteorological predictions can be highly uncertain in cold regions due to complex physical processes such as snowmelt and freezing/thawing cycle, which further complicate model calibration and validation. Another challenge is the limited ability of models to effectively communicate their results for decision-making purposes (Bergström and Lindström, 2015; Chen et al. 2017; Xu et al. 2005). This can be due to a lack of clarity in the model structure and parameters, which can make it difficult to interpret model outputs and identify areas for improvement. Furthermore, uncertainty in model predictions is a pervasive issue, which can arise from various sources such as model structure, input data, and parameter estimation methods (Liu and Gupta 2007; Moges et al., 2020). To address these challenges, previous studies have explored the application of satellite snow cover area (SCA) data to enhance hydrological model calibration and boost its performance in cold regions. 
By incorporating high-resolution satellite data into hydrological models, researchers can obtain a more accurate depiction of the temporal and spatial distribution of SCA, which can lead to more reliable model predictions (Roy et al. 2010). The current study builds upon this research by investigating the potential of using satellite SCA data to enhance the calibration process and performance of a glacio-hydrological model in northern Sweden.

Glacierized catchments are characterized by the contrasting runoff dynamics of their glacier and non-glacier portions. Runoff from glaciers generally responds to fluctuations in temperature and solar radiation, whereas runoff from non-glaciated areas predominantly responds to differences in precipitation (van Tiel et al. 2020b). Thus, these two catchment portions can balance each other out across varying seasons (Chen and Ohmura 1990; Pritchard 2019). Both snowmelt and glacial melt mostly contribute to runoff during the subsequent summer months, and glacial melt helps to dampen the interannual variability of runoff. However, van Tiel et al. (2020a) demonstrated that this compensatory effect depends on more than just the catchment’s proportional glacier cover. In addition to the pronounced seasonality, diurnal fluctuations in summer runoff strongly reflect differences in temperature and radiation throughout the day. Furthermore, there is a strong relationship between long-term runoff patterns and the state of glaciers. The trend in total runoff is affected by the peak water trajectory resulting from the decreasing volumes of ice and retreating glaciers because of persistent negative glacier mass balance (GMB), which causes glacier melt to first increase and then decrease after crossing a tipping point (Immerzeel et al. 2013; Huss and Hock 2018). Therefore, to accurately simulate hydrological processes in glacierized catchments, appropriate calibration and validation techniques must be employed. These techniques are critical to ensuring that the model accurately represents the system and can provide valuable insights that are otherwise difficult to obtain from limited observations (Ajami et al. 2004). Reliable calibration approaches are thus essential for accurately modeling the complexities of glacierized catchments and predicting glacio-hydrological variables.

Implementing hydrological models accurately is a challenging task in mountainous regions where data are often limited (Chen et al. 2017). Previous studies also showed that the calibration procedure and parameter settings greatly influence the performance of (semi-)distributed hydrologic models (Farrag et al. 2021; Kavetski et al. 2006; Mizukami et al. 2017; Parajka and Blöschl 2008). Therefore, the efficiency of hydrological model simulations relies on the careful selection and implementation of the appropriate calibration technique during the hydrological modeling process. Calibration is the process of adjusting a hydrological model to reflect the characteristics of a specific catchment or system (Nemri and Kinnard 2020). It is a necessary step in accurately simulating runoff and other hydrological processes, as it ensures that the model parameters reflect the true conditions of the catchment. Moreover, uncertainties in the calibration process of glacio-hydrological modeling can arise due to various factors such as highly variable and nonlinear nature of snow accumulation and melt processes, the heterogeneity of snowpack properties, and the influence of topography on the distribution and dynamics of glacier melt. Challenges in model calibration can significantly contribute to the uncertainties in hydrological modeling (Boyle et al. 2000; Butts et al. 2004; Merz et al. 2009; Thirel et al. 2015). These issues can influence the applicability of model parameters in time and space, making it difficult to interpret the outcomes of hydrological modeling research (De Vos et al. 2010; Merz et al. 2011; Seiller et al. 2012; Brigode et al. 2013; Coron et al. 2014; Arsenault et al. 2018). Therefore, to improve the performance of the FLEXG model and minimize its uncertainty in hydrological simulations, the current study seeks to use a multi-variable calibration process.

Hydrological modeling typically relies on streamflow data for calibration, which may not be sufficient to represent and simulate all internal hydrological processes. To address this limitation, remote sensing data have been employed as an alternative or a complementary source of information in hydrological studies (Xu et al. 2014). Remote sensing data can provide observations of various hydrological variables, and several studies have utilized satellite data to enhance the performance of hydrological models (Wanders et al. 2014; Beck et al. 2017; Kittel et al. 2018; Kumar and Lakshmi 2018). However, the accuracy and quality of remote sensing data can vary across regions, and the use of satellite data alone for model calibration may not provide satisfactory results. Several studies have demonstrated the potential of satellite-derived SCA for improving hydrological modeling in cold regions. For example, Di Marco et al. (2021) incorporated MODIS SCA data and topographical information into a semi-distributed hydrological model (ICHYMOD) to improve hydrological modeling in the Eastern Italian Alps. Their findings showed that using MODIS SCA data can significantly reduce the uncertainty in snow-covered area estimation and thus improve the accuracy of hydrological modeling. Udnæs et al. (2007) focused on improving the hydrological modeling of the mountain areas of Norway using satellite-based SCA data. They found that including SCA data in the calibration process of HBV (Hydrologiska Byråns Vattenbalansavdelning) models, along with discharge data, resulted in discharge simulations that were nearly as accurate as those of models calibrated using discharge data alone. Moreover, the simulations of SCA were substantially enhanced when SCA was included in the calibration process.

Calibrating hydrological models based solely on gauged (measured) runoff data may not ensure reasonable simulation of the internal hydrological processes, which could lead to uncertainties in hydrological modeling tasks. This issue can be particularly challenging in ungauged catchments, where measured hydrological data such as streamflow, evapotranspiration, and soil moisture are lacking. As a result, the use of satellite-derived SCA data can provide valuable information in regions where gauged data are limited or non-existent (Duethmann et al. 2014). SCA data have been utilized in various ways for hydrological modeling. One way is to use them as forcing data to evaluate reproduced runoff from snowmelt models (Li and Williams 2008). Another way is to use them to update hydrological models in data assimilation techniques (Andreadis and Lettenmaier 2006; Zaitchik and Rodell 2009; Yatheendradas et al. 2012). SCA data have also been used to calibrate hydrological models (Corbari et al. 2009; Finger et al. 2011; Pellicciotti et al. 2012; Shrestha et al. 2014). The benefits of using SCA data as a reference for calibrating hydrological models have been confirmed by previous studies. For example, de Niet et al. (2020) used daily gauged streamflow and satellite MODIS SCA data for a multi-dataset calibration of Hydrological Projections for the Environment (HYPE) in a glaciated catchment in Iceland. They concluded that the multi-dataset calibration strategy may improve runoff and SCA simulations in glacierized catchments.

Due to the complexity of hydrological systems in cold regions, glacierized catchments are influenced by several factors, such as low temperatures, glaciers, snow cover, topography, aspect (e.g., mountain aspect), and other glacial variables (Arnold and Sharp 1992; Chen et al. 2017; Grünewald et al. 2013; Harris and Murton 2005; Helfricht et al. 2014; Somers and McKenzie 2020). Hydrological models in cold regions should incorporate these elements to comprehend such complex conditions. Previous studies have shown that glacio-hydrological models, with acceptable accuracy in hydrological modeling in glacierized basins, can fill this gap (Akhtar et al. 2008; Chen et al. 2018). The FLEXG model as a glacio-hydrological model has demonstrated successful applications in various glacierized basins, for example in the Urumqi glacier no. 1 catchment (China) (Gao et al. 2017) and in northern Sweden (Mohammadi et al. 2023). Despite the progress made in hydrological modeling, there are still research gaps that need to be addressed. A major limitation of hydrological models is the limited availability of data in ungauged catchments, which hinders the accurate calibration and validation processes. Furthermore, in cold regions, such as mountainous areas or glacierized catchment, the modeling of SCA and runoff dynamics presents unique challenges due to complex interactions between glacier, snow, and hydrological processes (Bavay et al. 2013; Engel et al. 2016). Although satellite remote sensing data have been increasingly used to improve hydrological modeling in these regions, the accuracy of such data and its applicability to specific regions remain a topic of discussion. Thus, the primary objective of our study was to improve the calibration and validation of glacio-hydrological modeling in glacierized catchment by incorporating satellite-based SCA data into the FLEXG model. 
Specifically, this study investigated the use of a satellite-derived SCA from MODIS for the FLEXG calibration and validation using different schemes. Additionally, this study compared the MODIS-based SCA, Landsat-8 based SCA, and AVHRR-based SCA for the studied glacierized catchment in northern Sweden.

Study area

We focused on the Torne River basin (Fig. 1) in northern Sweden as the study area. The Torne River basin is a glacierized basin that encompasses an area of 547.6 km², is situated at latitude 68°8′–68°24′ N and longitude 18°3′–18°56′ E, and ranges in elevation from 342 to 1781 m above mean sea level. According to the Köppen-Geiger climate classification, the studied catchment is located in the polar tundra climate zone (Kottek et al. 2006). Snow and glaciers are the primary water resources in the study area (Mohammadi et al. 2023). Based on data from 1995–2018, the Torne River basin recorded an average daily temperature of 0.36 °C, ranging from −32.1 to 23.1 °C. Likewise, the average daily maximum temperature over the same period was 4.29 °C, ranging from −29.2 to 32.8 °C. The average daily minimum temperature was −3.36 °C, ranging from −34.7 to 18.5 °C. The daily precipitation ranged from 0 to 61.9 mm, with an average of 0.94 mm. The minimum and maximum streamflows were recorded as 0.13 and 219 m³/s, respectively.

Fig. 1 Torne River basin in northern Sweden with aspect classifications, river networks, meteorological and streamflow stations

Figure 2 shows the distribution of elevations and aspects in both the glacierized and non-glacierized areas of the Torne River basin. The X-axis displays the areas of the elevation zones for both glacier and non-glacier parts of the basin. The aspects are indicated by different colors, with north-, south-, and east/west-facing aspects identified to account for their influence on snow and glacier melting. The studied catchment was split into 143 elevation zones at 10-m intervals to assess the effect of elevation on precipitation and temperature. Each of the 143 elevation zones was further classified by three aspects (north, south, and others) and two regions (glacier and non-glacier), with the area fractions summing to 1. Because the study area lacks multiple meteorological stations, a linear distribution approach was used to generate distributed inputs for the FLEXG model. This method allowed for the spatial distribution of meteorological variables across the entire model domain. Temperature and precipitation were assigned to each elevation zone by extrapolating from the meteorological station located at 393 m. Air temperature was assumed to decrease linearly with increasing elevation, following a lapse rate of −0.004 °C/m. Conversely, precipitation was assumed to increase linearly with elevation, following a locally appropriate gradient of 0.05%/m (Mohammadi et al. 2023).
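The linear distribution of the station forcing over the elevation zones can be sketched as follows. This is only an illustrative sketch, not the FLEXG implementation: the function name `distribute_forcing`, the multiplicative form of the precipitation correction, and the example zone elevations are assumptions; only the station elevation (393 m) and the two gradients come from the text.

```python
import numpy as np

STATION_ELEVATION_M = 393.0     # elevation of the meteorological station (from the text)
TEMP_LAPSE_C_PER_M = -0.004     # temperature lapse rate in °C/m (from the text)
PRECIP_GRADIENT_PER_M = 0.0005  # 0.05 %/m written as a fraction per metre


def distribute_forcing(t_station, p_station, zone_elevations):
    """Return per-zone temperature (°C) and precipitation (mm) for one day."""
    dz = np.asarray(zone_elevations, dtype=float) - STATION_ELEVATION_M
    t_zone = t_station + TEMP_LAPSE_C_PER_M * dz
    # Assumption: the gradient scales station precipitation multiplicatively,
    # and the result is clipped so that it can never become negative.
    p_zone = np.maximum(p_station * (1.0 + PRECIP_GRADIENT_PER_M * dz), 0.0)
    return t_zone, p_zone


# Hypothetical zone mid-elevations (m), chosen only for illustration.
zones = np.array([393.0, 893.0, 1393.0])
t_zone, p_zone = distribute_forcing(t_station=2.0, p_station=10.0, zone_elevations=zones)
```

With these assumed inputs, a zone 500 m above the station is 2 °C colder and receives 25% more precipitation than the station.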

Fig. 2 Area of each elevation zone in glacier and non-glacier regions of the Torne River basin

Data and methods

Meteorological and hydrological data

The Övre Abiskojokk is a streamflow gauge station located downstream of the Torne River basin (Fig. 1). Temperature and precipitation were observed at the Abisko meteorological station, which records the weather data for the Torne River basin area. The current study used daily meteorological and hydrological data from 1995 to 2018 from The Swedish Meteorological and Hydrological Institute (SMHI: https://www.smhi.se/).

Satellite-based data

The topography data from a Digital Elevation Model (DEM) with a cell size of 25 m were obtained from the European Union’s Earth observation program (Copernicus). Glacial area and the ice thickness information were obtained from ETH Zurich’s Research Collection (https://www.research-collection.ethz.ch/). The MODIS/Terra Cloud Gap Filled (CGF) (MOD10A1F version 61: https://nsidc.org/) satellite-based SCA dataset with a spatial resolution of ~ 500 m was utilized for obtaining SCA data during the study period (2000–2018). Previous studies confirmed the ability of MODIS SCA data in the snow-covered basins around the globe (Duethmann et al. 2014; Parajka and Blöschl 2008; Thirel et al. 2013; Zhou et al. 2013). In addition, another type of snow cover dataset was obtained from the Advanced Very High Resolution Radiometer (AVHRR) product during its available period between 2000 and 2018, namely daily global Snow Cover Fraction (snow on the ground (SCFG) version 2.0) with a spatial resolution of ~ 5 km. The satellite-derived SCA from the AVHRR dataset was used as independent satellite SCA data to assess the reliability of the FLEXG model’s SCA simulations.

Glacio-hydrological model: FLEXG

The glacio-hydrological model FLEXG incorporates snow and glacier accumulation and ablation processes. A temperature-index approach is used to estimate glacier runoff, and a storage capacity curve based on the Xinanjiang model (Zhao 1992) is used to estimate runoff generation in the non-glacier areas of the basin, where runoff is generated when the incoming water exceeds the storage capacity and cannot be held (Gao et al. 2017; Mohammadi et al. 2023). The type of precipitation, either snow (Ps) or rain (Pl), is determined by whether the average daily air temperature is below or above a threshold temperature. The snowpack (Sw) is represented as a porous medium capable of retaining liquid water from snowmelt or rainfall, and this liquid water can refreeze. In the temperature-index snow module, snowmelt is calculated with a degree-day factor (Fdd) expressed in mm °C⁻¹ day⁻¹. The amount of melted snow (Ms) is calculated via Eq. 1.

$$M_{s} = \begin{cases} F_{\text{dd}} (T - T_{t}), & T > T_{t} \\ 0, & T \le T_{t} \end{cases}$$
(1)

where T and Tt are the daily average air temperature and threshold temperature, respectively. It is important to highlight that the degree-day factor typically has a higher value for glaciers than for snow in the corresponding area (Seibert et al. 2021), mainly because of the lower albedo of ice compared to snow. This effect is accounted for in the model by using a multiplier Cg (Eq. 2). The runoff generated from the glacier parts of the basin (Qg) is then computed by routing the glacier melt (Mg) and rainfall (Pl) through a linear reservoir (Sf,g), regulated by a recession parameter Kf,g (Eqs. 3 and 4).

$$M_{g} = \begin{cases} F_{\text{dd}} C_{g} (T - T_{t}), & T > T_{t} \text{ and } S_{w} = 0 \\ 0, & T \le T_{t} \text{ or } S_{w} > 0 \end{cases}$$
(2)
$$\frac{\text{d}S_{f,g}}{\text{d}t} = P_{l} + M_{g} - Q_{g}$$
(3)
$${Q}_{g}=\frac{{S}_{f,g}}{{K}_{f,g}}$$
(4)
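A minimal sketch of Eqs. 1 to 4 for a single daily time step is given below. It is not the FLEXG source code: the function names, the explicit Euler update of the reservoir, and the parameter values in the example are assumptions made for illustration.

```python
def snow_melt(temp, t_thresh, f_dd):
    """Eq. 1: degree-day snowmelt (mm/day); zero at or below the threshold."""
    return f_dd * (temp - t_thresh) if temp > t_thresh else 0.0


def glacier_melt(temp, t_thresh, f_dd, c_g, snow_storage):
    """Eq. 2: ice melt occurs only when the glacier surface is snow-free."""
    if temp > t_thresh and snow_storage == 0.0:
        return f_dd * c_g * (temp - t_thresh)
    return 0.0


def glacier_reservoir_step(s_fg, p_l, m_g, k_fg, dt=1.0):
    """Eqs. 3 and 4: linear reservoir for the glacier part of the basin.

    Assumption: an explicit Euler step with the outflow evaluated from the
    storage at the start of the step.
    """
    q_g = s_fg / k_fg                          # Eq. 4
    s_fg_new = s_fg + dt * (p_l + m_g - q_g)   # Eq. 3
    return s_fg_new, q_g


# Illustrative (not calibrated) values: T = 5 °C, Tt = 0 °C,
# Fdd = 3 mm/°C/day, Cg = 1.5, snow-free glacier surface.
m_s = snow_melt(5.0, 0.0, 3.0)
m_g = glacier_melt(5.0, 0.0, 3.0, 1.5, snow_storage=0.0)
s_new, q_g = glacier_reservoir_step(s_fg=100.0, p_l=2.0, m_g=m_g, k_fg=10.0)
```

The condition on the snow storage means that ice melt starts only once the seasonal snowpack on the glacier has disappeared, which delays the glacier contribution to runoff.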

Runoff generation in the non-glacier area is calculated using the relative soil moisture (Su/Su,max) and the effective water input to the soil. Actual evaporation is estimated from the relative soil moisture and the potential evaporation (Seibert 1997). The surplus water replenishes two linear reservoirs (Sf and Ss), which represent the response mechanisms of subsurface storm flow (Qf) and groundwater discharge (Qs), respectively. The recharge is divided by a splitter parameter into Rf and Rs, the recharges to Sf and Ss, respectively. The fast recession parameter (Kf) and the slow recession parameter (Ks) indicate the rate at which the two reservoirs recede, as shown in Eqs. 5–8.

$$\frac{{\text{d}}{S}_{f}}{{\text{d}}t}={R}_{f}-{Q}_{f}$$
(5)
$$\frac{{\text{d}}{S}_{s}}{{\text{d}}t}={R}_{s}-{Q}_{s}$$
(6)
$${Q}_{f}=\frac{{S}_{f}}{{K}_{f}}$$
(7)
$${Q}_{s}=\frac{{S}_{s}}{{K}_{s}}$$
(8)
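Eqs. 5 to 8 can be sketched for one daily step as below. The way the splitter divides the recharge (a fraction `splitter` to the fast reservoir and the remainder to the slow one), the explicit Euler update, and all names and numbers are assumptions for illustration and are not taken from the FLEXG code.

```python
def route_non_glacier(recharge, s_f, s_s, k_f, k_s, splitter, dt=1.0):
    """Advance the fast and slow linear reservoirs by one time step.

    Assumption: `splitter` is the fraction of recharge sent to the fast
    reservoir; the remainder recharges the slow reservoir.
    """
    r_f = splitter * recharge
    r_s = (1.0 - splitter) * recharge
    q_f = s_f / k_f                     # Eq. 7
    q_s = s_s / k_s                     # Eq. 8
    s_f_new = s_f + dt * (r_f - q_f)    # Eq. 5
    s_s_new = s_s + dt * (r_s - q_s)    # Eq. 6
    return s_f_new, s_s_new, q_f, q_s


# Illustrative values only: 10 mm of recharge, Kf = 4 days, Ks = 50 days.
s_f, s_s, q_f, q_s = route_non_glacier(
    recharge=10.0, s_f=20.0, s_s=100.0, k_f=4.0, k_s=50.0, splitter=0.6
)
```

A larger recession parameter gives a smaller outflow for the same storage, so the slow reservoir sustains base flow while the fast reservoir shapes the storm response.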

Model calibration schemes

The FLEXG model was run at a daily time step over three periods: a warm-up period (1995–1999), a calibration period (2000–2009), and a validation period (2010–2018). The model was calibrated using three schemes. In scheme 1, the FLEXG model was calibrated with gauged streamflow as reference data. In scheme 2, it was calibrated with satellite-derived SCA data (MODIS snow cover product) as reference data. In scheme 3, it was calibrated with both gauged streamflow and the satellite-derived SCA dataset (MODIS) as reference data at the same time.

This study employed different objective functions depending on the calibration scheme (Table 1). Different evaluation metrics provide different insights into model performance, so multiple metrics need to be considered to ensure the model’s robustness and reliability. For scheme 1, the Kling-Gupta efficiency (KGE; Gupta et al. 2009) was employed as the objective function. The KGE is a comprehensive hydrological metric that accounts for the correlation, variability, and bias between the simulated and observed hydrographs; it captures different aspects of hydrological processes and has been widely used in hydrological model calibration (Gupta et al. 2009). For scheme 2, the objective function was a weighted combination of the coefficient of determination (R2) and the ratio of the root mean square error to the standard deviation of the observations (RSR): R2 expresses the proportion of variance explained by the model, while RSR indicates the goodness of fit between measured and simulated values. Finally, for scheme 3, a weighted combination of KGE, R2, and RSR was used as the objective function. This approach draws on the strengths of each metric and balances their contributions; similar integrated objective functions have been used in other studies to improve model calibration performance (Blasone et al. 2007, 2008; Yu et al. 2016; Drisya and Sathish Kumar 2018). The prior parameter ranges were sampled with a Monte Carlo strategy, drawing 10,000 parameter sets from a uniform distribution, and the parameter sets that met the predefined criteria were selected for further analysis. The predefined criteria in this study were based on the model’s performance, selecting the best iteration according to the objective function for each defined scheme (Table 1). Table 2 presents the initial parameter ranges and optimal parameter values of the FLEXG model for the three different calibration schemes.
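The Monte Carlo procedure described above can be illustrated with a small sketch that draws 10,000 parameter sets from uniform prior ranges and keeps the set with the best KGE, as in scheme 1. Everything apart from the sample count and the uniform sampling is an assumption: `toy_model` is a hypothetical stand-in for FLEXG, the parameter names and ranges are invented, and the KGE variant shown (with a coefficient-of-variation ratio as the variability term) is only one common formulation.

```python
import numpy as np


def kge(obs, sim):
    """Kling-Gupta efficiency; variability term taken as the CV ratio (assumption)."""
    r = np.corrcoef(obs, sim)[0, 1]
    beta = sim.mean() / obs.mean()
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)


def monte_carlo_calibrate(model, obs, bounds, n_samples=10_000, seed=0):
    """Sample parameters uniformly within `bounds` and return the best set."""
    rng = np.random.default_rng(seed)
    names = list(bounds)
    low = np.array([bounds[name][0] for name in names])
    high = np.array([bounds[name][1] for name in names])
    samples = rng.uniform(low, high, size=(n_samples, len(names)))
    scores = np.array([kge(obs, model(**dict(zip(names, row)))) for row in samples])
    best = int(np.argmax(scores))
    return dict(zip(names, samples[best])), float(scores[best])


# Hypothetical stand-in for the hydrological model: a linear response to forcing.
forcing = np.linspace(1.0, 10.0, 50)


def toy_model(a, b):
    return a * forcing + b


observed = toy_model(2.0, 1.0)                       # synthetic "observations"
prior_ranges = {"a": (0.1, 5.0), "b": (0.0, 5.0)}    # invented prior ranges
best_params, best_score = monte_carlo_calibrate(toy_model, observed, prior_ranges)
```

For schemes 2 and 3, the same loop would apply with `kge` replaced by the corresponding weighted objective function; the weights are not reproduced here because they are defined in Table 1.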

Table 1 Applied objective functions based on three defined calibration schemes for calibrating the FLEXG
Table 2 Optimized parameters of the FLEXG by the Monte Carlo method for each of the three schemes

Metrics for the model performance

Five statistical metrics were applied in the assessment of the FLEXG model, including R2, KGE, root mean square error (RMSE), RSR, and mean absolute error (MAE). Table 3 lists the equation of each used statistical metrics, where \({Q}_{m,i}\) and \({Q}_{s,i}\) refer to the ith measured and modeled runoff, respectively, and \({\overline{Q} }_{m}\) and \({\overline{Q} }_{s}\) represent the mean of measured and modeled runoff, respectively. Additionally, the variable “n” represents the total number of measured values, \({{\text{STDEV}}}_{m}\) indicates the standard deviation of measured runoff, and \({{\text{CV}}}_{m}\) and \({{\text{CV}}}_{s}\) refer to the coefficient of variation for measured and modeled runoff, respectively.

Table 3 List of the equations for the metrics that were employed in this study
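The five metrics can be computed along the following lines. This sketch uses commonly used definitions, which may differ in detail from the equations in Table 3: R2 is taken as the squared Pearson correlation, the standard deviation is the population form, and the KGE variability term is the ratio of the coefficients of variation. The function name and dictionary keys are hypothetical.

```python
import numpy as np


def performance_metrics(measured, modeled):
    """Return R2, KGE, RMSE, RSR, and MAE for paired measured/modeled series."""
    q_m = np.asarray(measured, dtype=float)
    q_s = np.asarray(modeled, dtype=float)
    error = q_s - q_m

    mae = np.mean(np.abs(error))
    rmse = np.sqrt(np.mean(error ** 2))
    rsr = rmse / np.std(q_m)            # population standard deviation (assumption)

    r = np.corrcoef(q_m, q_s)[0, 1]
    r2 = r ** 2                         # squared Pearson correlation (assumption)

    beta = q_s.mean() / q_m.mean()      # bias ratio
    cv_m = q_m.std() / q_m.mean()       # coefficient of variation, measured
    cv_s = q_s.std() / q_s.mean()       # coefficient of variation, modeled
    gamma = cv_s / cv_m                 # variability ratio
    kge = 1.0 - np.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)

    return {"R2": r2, "KGE": kge, "RMSE": rmse, "RSR": rsr, "MAE": mae}


# Small synthetic example: the modeled series is offset by +1 everywhere.
metrics = performance_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

In this example the constant offset leaves the correlation perfect but lowers the KGE through the bias and variability terms, which illustrates why the metrics are reported together.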

Results and discussion

Calibrated parameters from different schemes

Parameters of the FLEXG model were calibrated based on three schemes with the Monte Carlo method. Figure 3 demonstrates the optimal value of 13 free parameters during calibration period by scheme 1 (gauged streamflow-based calibration), scheme 2 (MODIS satellite SCA-based calibration), and scheme 3 (gauged and satellite-based calibration). The radar charts (shown in Fig. 3) provide a visual representation of the relative importance and interdependence of each parameter in each scheme. The calibration of the recession coefficient for the reservoir’s slow response (Ks) was conducted using three different calibration schemes. Scheme 2 resulted in the highest Ks value of approximately 178 days, while scheme 3 produced the lowest Ks value of 53 days. This suggests that a multi-variable calibration approach can help achieve a shorter response time for the reservoir. The use of MODIS SCA data, however, led to a longer response time in the catchment. Scheme 2 also yielded higher optimal values for ice melt factor, snow water holding capacity, and refreezing factor, while scheme 1 showed higher values for shape parameter, threshold temperature (Tt), and splitter.

Fig. 3 Optimal values of the FLEXG parameters using three different calibration schemes. Each colored polygon represents the parameter values for a particular scheme, with the scheme number and the parameter name

Runoff simulation (total runoff, glacier runoff, and non-glacier runoff) under three different calibration schemes

Table 4 presents the statistical metrics of the FLEXG model under the three different calibration schemes. Scheme 1 showed the most accurate performance in terms of runoff simulation, with an MAE of 7.25 m³/s and a KGE of 0.75 for the validation period. Scheme 3 also showed good performance, with an MAE of 7.43 m³/s and a KGE of 0.70 for the validation period. However, scheme 2 could not simulate runoff with acceptable accuracy, with an MAE of 9.73 m³/s and a KGE of 0.01 for the validation period. Generally, all schemes reported acceptable errors in terms of MAE, RMSE, and RSR, but only schemes 1 and 3 reported acceptable KGE values for runoff generation in both the calibration and validation periods. In terms of simulating both streamflow and SCA (see the following section), scheme 3 yielded the best performance in this study. This is consistent with previous studies that utilized satellite-derived snow cover products for calibrating hydrological models. For instance, Finger et al. (2011) examined the use of glacier mass balance, satellite snow cover images, and discharge data in improving the performance of a physically based distributed hydrological model (Topographic Kinematic Approximation and Integration model (TOPKAPI)) in a glaciated catchment in Switzerland. They found that the combination of streamflow and SCA can improve the performance of TOPKAPI in glaciated catchments.

Table 4 Evaluation of the FLEXG for simulating runoff using three calibration schemes at daily scales during the calibration (Cal) and validation (Val) periods

The FLEXG model successfully reproduced the hydrographs of total runoff at the basin outlet, as shown in Figs. 4 and 5 for the calibration and validation periods, respectively. The KGE values were 0.73, 0.03, and 0.71 for the calibration period and 0.75, 0.01, and 0.70 for the validation period for schemes 1, 2, and 3, respectively. Schemes 1 and 3 reproduced both the peak flow and the base flow of the runoff time series and showed the highest accuracy and lowest error in total runoff simulation. Scheme 2, which was calibrated with satellite SCA data only, had the worst performance and underestimated total runoff during both the calibration and validation periods. When calibrated using satellite SCA data alone, the hydrological model was not able to accurately simulate the peak flow of the runoff. However, scheme 2 was able to capture the baseflow component of the hydrograph during both the calibration and validation periods. In other words, scheme 2 was able to simulate the sustained, lower flow portions of the hydrograph, but was less successful in capturing the higher peak flows.

Fig. 4 Hydrographs of measured and simulated daily total runoff via the FLEXG using the three calibration schemes during the calibration period

Fig. 5 Hydrographs of measured and simulated daily total runoff via the FLEXG using the three calibration schemes during the validation period

Based on the statistical metrics listed in Table 5, the FLEXG model performed better for runoff simulation on a monthly scale than on a daily scale. Scheme 1 showed the highest accuracy in simulating total runoff on a monthly scale throughout the validation period, with KGE = 0.81, R2 = 0.78, and RMSE = 8.30 m³/s. Scheme 3 also performed well and was more accurate than scheme 2, with KGE = 0.78, R2 = 0.72, and RMSE = 9.54 m³/s during the validation period. It should be noted that the KGE values for scheme 2 were very low, indicating poor model performance. Additionally, scheme 2 had much higher MAE, RMSE, and RSR values than the other two schemes. Therefore, scheme 2 performed poorly in terms of monthly runoff simulation during both the calibration and validation periods. Overall, the results suggest that schemes 1 and 3 are more suitable for monthly runoff simulation than scheme 2, which was calibrated with satellite-derived SCA data alone.

Table 5 Evaluation of the FLEXG for simulating runoff using three calibration schemes at monthly scales during the calibration (Cal) and validation (Val) periods

Based on the results of the monthly runoff time series simulations, schemes 1 and 3 were more effective in detecting peak flows during the calibration (Fig. 6) and validation (Fig. 7) periods, except for three peak flows in 2000, 2015, and 2017 that were not detected accurately. Scheme 2, on the other hand, underestimated the maximum events (peak flows) in both the calibration and validation periods when compared to the other two schemes. Furthermore, the calibration of the FLEXG model using schemes 1 and 3 resulted in smoother hydrographs with less noise, which in turn reduced the bias of the simulated runoff. These findings suggest that the choice of calibration strategy can affect the accuracy of peak flow detection and runoff prediction in the FLEXG model. They also highlight the importance of selecting an appropriate calibration strategy to achieve more reliable and accurate simulations of runoff.

Fig. 6 Simulated and measured hydrographs of monthly runoff via the FLEXG using the three calibration schemes during the calibration period

Fig. 7 Simulated and measured hydrographs of monthly runoff via the FLEXG using the three calibration schemes during the validation period

The FLEXG model can simulate runoff separately for the glacierized and non-glacierized areas of the basin (Gao et al. 2017). The simulated runoff for both parts during the calibration and validation periods is presented in Figs. 8 and 9. The results indicate that using both gauged data and satellite remote sensing data simultaneously (scheme 3) resulted in less non-glacier runoff than scheme 1. This implies that scheme 3 is better at simulating non-glacier runoff, which is an important component of the total runoff in the study basin. Scheme 2 performed poorly in terms of glacier and non-glacier runoff simulation, as it could not reproduce the natural streamflow behavior well and only captured base flows. The results also showed that scheme 2 simulated significantly more non-glacier runoff than glacier runoff, indicating that using only satellite SCA data for calibration can result in a weaker performance of the glacio-hydrological model. The results highlight the importance of using both gauged data and satellite remote sensing data simultaneously when calibrating glacio-hydrological models to achieve better accuracy in simulating both glacier and non-glacier runoff.

Fig. 8 Simulated daily runoff for glacier and non-glacier parts of basin versus measured daily total runoff for three calibration schemes during the calibration period

Fig. 9 Simulated daily runoff for glacier and non-glacier parts of basin versus measured daily total runoff for three calibration schemes during the validation period

Snow cover area simulation under three different calibration schemes

Table 6 presents the evaluation of the FLEXG model in simulating SCA against the MODIS satellite remote sensing SCA product. The validation results showed that scheme 2 had the highest accuracy in simulating SCA, with the lowest RMSE of 22.19%, the highest R2 of 0.87, and the lowest RSR of 0.76 compared to the other schemes. Scheme 3 also showed better performance than scheme 1, with RMSE of 23.85%, R2 of 0.79, and RSR of 0.82 during the validation period. In contrast, scheme 1 had the poorest performance, with the highest RMSE of 26.05%, the lowest R2 of 0.78, and the highest RSR of 0.89. It can be concluded that the multi-variable calibration strategy improved the accuracy of the FLEXG model in SCA simulation, and scheme 2 performed better than the other schemes.

Table 6 Evaluation of the FLEXG for simulating SCA using three calibration schemes during the calibration (Cal) and validation (Val) periods
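The three scores reported in Table 6 can be illustrated with a short sketch. It is not the authors' code, and the exact formulations are assumptions: RMSE is taken as the root mean square error in percent of basin area, R² as the squared Pearson correlation between observed and simulated SCA, and RSR as RMSE divided by the standard deviation of the observations. The function name `sca_metrics` is hypothetical.

```python
import numpy as np


def sca_metrics(observed, simulated):
    """Return RMSE, R2 and RSR for paired SCA series (% of basin area).

    Assumed definitions (not confirmed by the paper):
      RMSE = sqrt(mean((sim - obs)^2))
      R2   = squared Pearson correlation coefficient
      RSR  = RMSE / population standard deviation of the observations
    """
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)

    # Keep only days where both series have a value
    # (e.g. satellite scenes can be missing on cloudy days).
    valid = ~(np.isnan(obs) | np.isnan(sim))
    obs, sim = obs[valid], sim[valid]

    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    r2 = np.corrcoef(obs, sim)[0, 1] ** 2
    rsr = rmse / np.std(obs)
    return {"rmse": rmse, "r2": r2, "rsr": rsr}


if __name__ == "__main__":
    # Made-up values for illustration only; they are not data from the study.
    scores = sca_metrics([0.0, 50.0, 100.0], [10.0, 50.0, 90.0])
    print(scores)
```

Lower RMSE and RSR and higher R² indicate closer agreement between simulated and observed SCA, which is how the schemes are ranked in the text above.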

The SCA simulated under the three calibration schemes is compared with the observed SCA (MODIS) in Fig. 10. The main difference between the two is their range: the FLEXG-simulated SCA spans 0 to 100%, meaning that the model allows the entire basin to be snow covered, whereas the MODIS-derived SCA never reaches 90%, suggesting that the study basin is never completely covered by snow. The simulations from all three schemes followed the same temporal pattern as the MODIS observations over the study period, which indicates that each scheme captured the overall dynamics of snow cover in the basin. The accuracy of the simulated SCA values nevertheless varied between the schemes, as shown in Table 6.

Fig. 10 Comparison of observed SCA (from MODIS) and simulated SCA by the FLEXG under the three different calibration schemes from 2000 to 2018

Comparison of simulated SCA with the SCA products from MODIS, AVHRR, and Landsat-8

To further assess the performance of MODIS and AVHRR in SCA monitoring in the studied basin, this study also calculated the Normalized Difference Snow Index (NDSI) from Landsat-8 imagery. NDSI is a widely used technique for mapping snow-covered area, based on the spectral contrast between snow and other surfaces in the visible and short-wave infrared regions of the electromagnetic spectrum. NDSI is defined as (Green - SWIR)/(Green + SWIR), where Green and SWIR are the reflectance values in the green and short-wave infrared bands, respectively. In this study, a threshold of 0.4 was applied to the NDSI to derive SCA from Landsat-8 imagery. This threshold was selected because previous studies have shown it to be effective in distinguishing snow from other surfaces, such as vegetation or soil, in different regions (Dozier 1989; Hall et al. 2010; Sankey et al. 2015).
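The NDSI thresholding described above can be sketched as follows. This is a minimal illustration rather than the processing chain used in the study: the function names, the boolean `basin_mask` argument, and the choice to count pixels with NDSI equal to the threshold as snow are all assumptions, and the reflectance arrays are expected to be supplied by the reader.

```python
import numpy as np


def ndsi(green, swir):
    """NDSI = (Green - SWIR) / (Green + SWIR), computed per pixel.

    Pixels where Green + SWIR is zero are returned as NaN.
    """
    green = np.asarray(green, dtype=float)
    swir = np.asarray(swir, dtype=float)
    total = green + swir
    out = np.full(total.shape, np.nan)
    np.divide(green - swir, total, out=out, where=total != 0)
    return out


def snow_cover_percent(green, swir, basin_mask, threshold=0.4):
    """Percentage of valid basin pixels classified as snow.

    basin_mask is a hypothetical boolean array marking pixels inside the
    basin. Treating NDSI >= threshold (rather than >) as snow is an
    assumption of this sketch.
    """
    index = ndsi(green, swir)
    valid = np.asarray(basin_mask, dtype=bool) & ~np.isnan(index)
    if not valid.any():
        return float("nan")
    snow = valid & (index >= threshold)
    return 100.0 * snow.sum() / valid.sum()


if __name__ == "__main__":
    # Tiny made-up reflectance grids, for illustration only.
    green = [[0.8, 0.5], [0.3, 0.0]]
    swir = [[0.1, 0.3], [0.3, 0.0]]
    mask = [[True, True], [True, True]]
    print(snow_cover_percent(green, swir, mask))
```

Expressing the result as a percentage of valid basin pixels makes it directly comparable with basin-scale SCA values from coarser products.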

The current study used MODIS SCA data as reference data to calibrate the FLEXG model. As in previous studies, however, the question of which data provide a more accurate estimate of actual SCA remains open. To address this, the SCA values derived from AVHRR and MODIS were compared with Landsat-8 imagery on a basin-wide scale, as shown in Fig. 11. We assumed that Landsat-8 imagery, with its higher spatial resolution, is better at detecting small-scale snow cover and is therefore useful for evaluating both the SCA simulations and satellite snow cover products at coarser spatial resolutions. The snow cover derived from Landsat-8 imagery with the NDSI method was used for this purpose. The four days with the largest differences between AVHRR and MODIS values were selected and compared with Landsat-8 imagery. Figure 11 shows that for all selected days (April 12, 2014, April 24, 2016, March 22, 2018, and April 5, 2018), the AVHRR data suggest that over 97% of the basin is covered by snow. These AVHRR values are, however, higher than those derived from Landsat-8 imagery. The AVHRR and MODIS SCA values show larger discrepancies during the spring season, while they are in excellent agreement in the other seasons for the studied basin. The comparison of AVHRR, MODIS, and Landsat-8 was therefore carried out for spring, using values from the same day as the Landsat-8 image or two days before it. Because of the natural features of the studied basin, some parts of the basin are likely to be snow-free at all times, so the MODIS and Landsat-8 values appear more plausible than the AVHRR SCA values.
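The selection of the days with the largest AVHRR and MODIS disagreement can be sketched as below. This is an assumed procedure for illustration, not the authors' implementation: the function name, the dictionary-based input, and the use of the absolute difference as the ranking criterion are all hypothetical choices.

```python
from datetime import date


def largest_disagreement_days(avhrr, modis, n_days=4):
    """Return the n_days dates with the largest |AVHRR - MODIS| SCA difference.

    avhrr and modis map a date to basin-scale SCA in percent. Only dates
    present in both products are considered. Ties keep chronological order.
    """
    common = sorted(set(avhrr) & set(modis))
    ranked = sorted(
        common,
        key=lambda day: abs(avhrr[day] - modis[day]),
        reverse=True,
    )
    return ranked[:n_days]


if __name__ == "__main__":
    # Made-up SCA values on arbitrary dates, for illustration only.
    avhrr = {date(2020, 4, 1): 98.0, date(2020, 4, 2): 97.0, date(2020, 4, 3): 60.0}
    modis = {date(2020, 4, 1): 80.0, date(2020, 4, 2): 95.0, date(2020, 4, 4): 50.0}
    print(largest_disagreement_days(avhrr, modis, n_days=1))
```

In practice the selected dates would then be matched to the nearest available Landsat-8 scene, which this sketch does not attempt.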

Fig. 11 NDSI calculated from Landsat-8 imagery and compared with AVHRR and MODIS snow cover area (SCA) values on the basin scale. A threshold of 0.4, a value commonly used in the literature, was applied to the NDSI to derive snow-covered regions

As noted above, the FLEXG model simulates SCA over the full range of 0 to 100%, whereas the maximum MODIS-derived SCA is less than 90%, suggesting that the study basin is never completely covered by snow. This discrepancy motivated a check of the suitability of MODIS SCA in the Torne River basin against other satellite-derived SCA products. The MODIS SCA values were found to be in better agreement with the Landsat-8 values than the AVHRR SCA values were. This could be attributed to the higher spatial resolution of MODIS images (~500 m) compared to AVHRR images (~1 km), which may allow snow cover to be better differentiated from other land cover types.

Limitations of this study and outlook for future studies

The findings of this study contribute to a better understanding of the advantages of a multi-variable calibration approach for improving the performance of a glacio-hydrological model in a glacierized catchment. However, several limitations should be noted:
(i) Data limitations: the current study relies on satellite-derived SCA data and gauged streamflow data for calibration, and uncertainties may arise from the accuracy and availability of these data sources. Integrating other glacio-hydrological data, such as snow water equivalent, can be explored in future studies to improve model calibration and validation. In addition, this study used a single meteorological station to run the glacio-hydrological model because multiple stations were not available in the studied basin. Data from a single station may not capture the spatial heterogeneity of meteorological conditions within the study area, potentially leading to uncertainties in model outputs. To mitigate this limitation, future research could incorporate data from multiple meteorological stations.
(ii) Model structure: the FLEXG model used in this study represents one specific model structure for glacio-hydrological modeling. The model appeared to overestimate SCA in the study area for some periods, suggesting the need to further evaluate and refine its representation of snow accumulation and melt processes. Given the availability of other models with different parameterizations and process representations, it would be worthwhile to compare several process-based models to assess the robustness of the findings and identify potential model-specific biases.
(iii) Uncertainty analysis: it is essential to quantify and address uncertainties associated with all relevant variables used in hydrological modeling. To improve the evaluation of model performance, future studies can focus on quantifying and communicating uncertainties related to input data, model parameters, and calibration/validation datasets.

Conclusions

Hydrological modeling in glacierized catchments is an essential task for water resources management in cold regions, and calibration is a crucial step in obtaining simulations with a satisfactory level of accuracy. This study evaluated a multi-variable calibration approach to improve the parametrization and performance of the glacio-hydrological model (FLEXG) in a glacierized catchment in northern Sweden by incorporating satellite snow cover area (SCA) data and gauged streamflow data. The FLEXG model was calibrated using three different schemes: scheme 1 used gauged streamflow data as the reference, scheme 2 used satellite-based SCA (MODIS product) as an alternative calibration variable, and scheme 3 combined gauged streamflow and satellite-based SCA data. The Monte Carlo method was used for the calibration, with different objective functions for each scheme. Our results showed that schemes 1 and 3 yielded accurate streamflow simulations. However, scheme 1 produced poor SCA simulations, whereas scheme 3 achieved acceptable accuracy in SCA simulations. Scheme 2, which relied solely on satellite SCA data for calibration, simulated SCA well but failed to simulate runoff accurately. These findings suggest that incorporating satellite SCA data along with streamflow data during calibration improves the representation of hydrological components in glacierized catchments. Comparisons between satellite SCA data from AVHRR, Landsat-8, and MODIS revealed discrepancies, particularly during the spring seasons. AVHRR data showed unusually high SCA values (90–100%) on some days, while Landsat-8 and MODIS data exhibited better agreement and lower values. Although the FLEXG model also simulated SCA values higher than the satellite data on these days, the good agreement between MODIS and Landsat-8 data suggested that it was implausible for the entire basin to be completely covered by snow.
Future research could explore the integration of additional variables, such as soil moisture, snow water equivalent, and glacier mass balance, to further improve the calibration and accuracy of hydrological models in glacierized catchments.