Introduction

The IPCC 2022 report (Pörtner et al. 2022) projects an increase in the frequency and severity of low flows in Southern Europe, resulting from increasing drought and water scarcity conditions. The population exposed to at least moderate water scarcity will grow by 18% and 54% for a rise of 1.5 °C and 2 °C in air temperature, respectively. Groundwater resources will be affected by an increase in abstraction rates and a decrease in recharge rates. Agriculture, which represents the main water use in the region, may be seriously limited by water availability. Thus, there is a need to improve water management at different scales to cope with the increasing scarcity. At the regional scale, this means the construction of dams and reservoirs to increase water storage, desalination, water reuse, and the adoption of water conservation measures. At the plot scale, this means reallocation to crops more resistant to drought conditions, the improvement of water use efficiency and performance of irrigation systems, and the implementation of soil water conservation practices (Jovanovic et al. 2020; Pereira et al. 2009).

Decision-support systems (DSSs) have been developed over the last few decades to improve water resource management at different spatial and temporal scales (Teodosiu et al. 2009). These tools commonly consist of interactive software-based systems where useful information from raw data sources, documents, simulation models, and other sources is aggregated to identify and solve problems and support decision-making. Considering the plot scale, Smart Irrigation Decision Support System (SIDSS, Navarro-Hellín et al. 2016) and IrrigaSys (Simionesei et al. 2020) are examples of DSSs for irrigation water management support. SIDSS estimates weekly irrigation needs based on data from soil sensors and/or weather stations using two machine learning techniques. IrrigaSys also estimates weekly irrigation needs using a physically based model fed by weather forecast and hindcast data. When considering larger scales, Zhang et al. (2015a) designed and developed a prototype of a DSS for watershed management by integrating open-source web-based geographical information systems, a modeling component, and a cloud computing platform. Ashrafi and Mahmoudi (2019) presented a DSS to assist decision-makers in examining the impacts of different operating policies at the basin scale. DSSs are also applied to reservoir flood control operations (Delaney et al. 2020) and to early warning and detection, follow-up, and early response to flood events and hazmat pollution occurrences in inland and transitional waters (HAZRUNOFF Project - Layman’s Report 2020).

As proposed by Miser and Quade (1985), one of the steps in designing a DSS is building models to predict consequences. A good hydrological and/or hydraulic model with reliable results and proven forecasting capacity is of paramount importance for water management DSSs. Its results can then feed other models in the DSS. For instance, modeled groundwater levels can be used to estimate irrigation needs, or simulated river flows can support flood forecasting. However, modeling results can also be directly used to support decision-making.

Models can be divided into three main groups according to their complexity: (i) empirical models; (ii) conceptual models; and (iii) physical models (Sitterson et al. 2017). Empirical models are based on linear and non-linear equations that relate inputs and outputs while ignoring the physical processes, and are considered the simplest type. Conceptual models are based on simplified equations to describe the hydrological processes and are characterized by an intermediate level of complexity. Physically based models, also known as process-based models, are the most complex and rely on physical principles, being suitable to provide insights into physical processes. Usually, physical models use finite difference equations and state variables that can be measured and are time- and space-dependent (Devia et al. 2015; Fatichi et al. 2016). However, their weakness lies in the large number of parameters required to describe the physical characteristics of the watershed, which leads to high complexity levels that make correct implementation difficult and the calibration and validation processes laborious (Devia et al. 2015; Abbott et al. 1986a, b; Ranatunga et al. 2016; Zhang et al. 2015b; Mehr et al. 2013).

The study presented here is part of a larger work aimed at developing a DSS for supporting water management in the Maranhão and Montargil reservoirs, in southern Portugal. These reservoirs store water that is used mainly for irrigation of the Sorraia Valley, which comprised a cultivated area of 21,280 ha and an irrigated area of 18,754 ha in 2021 (ARBVS 2023). With a 52% increase in the irrigated area over the last two decades (ARBVS 2023) and predictions of river flow decreases between 54 and 94% due to climate change (Almeida et al. 2018), accurate streamflow forecasting is of extreme importance to improve the management of water availability in the region. Taking the Maranhão reservoir as an example, the work presented here uses two different types of models to estimate the daily inflow to the reservoir and discusses the advantages and weaknesses of both approaches. The applied models were the physically based MOHID-Land model (Trancoso et al. 2009; Canuto et al. 2019; Oliveira et al. 2020) and a convolutional neural network (CNN) (Oliveira et al. 2023), i.e., a data-driven model. In both cases, the models were calibrated/trained and validated using data from a hydrometric station whose drainage area corresponds to 30% of the Maranhão watershed. Because there are no stations monitoring the entire watershed, despite the importance of this information for the sustainability of the irrigation district, this study also aims to analyze the capacity of both approaches to represent streamflow generation in the entire watershed. That analysis involved extending the model results from the referred sub-basin to the full basin scale, either by extending the calibrated parameters in MOHID-Land or by replacing the forcing variables in the CNN model. The results were then validated with a monthly reservoir mass balance.
Therefore, this study provides modeling tools for streamflow calculation in the Maranhão watershed, developed using two distinct modeling approaches. The ultimate aim is their integration into the DSS to support water managers in decision-making on water availability in the region.

Materials and methods

Description of the study area

The Maranhão dam is located at Ribeira da Seda, southern Portugal (39° 0′ 53.846″ N; 7° 58′ 33.149″ W). The corresponding reservoir has a total capacity of 205 hm3 and drains an area close to 2300 km2. The minimum, average, and maximum altitudes are 122, 261, and 723 m, respectively (EU-DEM 2019) (Fig. 1).

Fig. 1 Maranhão watershed: location, delineation, elevation, main rivers, and hydrometric stations

The climate is classified as Mediterranean hot-summer (Csa) according to the Köppen–Geiger climate classification (Agencia Estatal de Meteorología (España) 2011). The average annual precipitation is 608 mm. The minimum and maximum average monthly precipitation are 4 mm in July and August and 84 mm in December. The average monthly air temperature ranges from 9 °C in January to 24 °C in July and August, while the annual average is 16 °C. The main soil reference groups are Luvisols (67%), Regosols (18%), and Cambisols (11%) (Panagos et al. 2012). The main land uses are non-irrigated arable land and agro-forestry areas, both representing 28% of the watershed, broad-leaved forest, occupying 15%, and olive groves, with a representation of 11% (CLC 2012 2019).

The Maranhão watershed has four hydrometric stations (Fig. 1), with all measuring daily streamflow in natural regime. Table 1 presents a brief characterization of those stations.

Table 1 Characteristics of hydrometric stations: drained area, period of records and percentage of records within this period, minimum (Min), maximum (Max), median (Med), and average (Ave) streamflow values (source: SNIRH 2021)

Figure 2 shows the monthly patterns considering the daily streamflow values at the four stations. In accordance with the meteorological characterization, streamflow patterns show higher values between November and April, while lower values occur between May and September, with August presenting the lowest value.

Fig. 2 Monthly distribution of streamflow in the four hydrometric stations (source: SNIRH 2021)

The water stored in the Maranhão reservoir is mainly for irrigation of the Sorraia Valley (ARBVS 2023). Other uses include energy production, industrial supply, and recreation. The stored volumes normally increase during the wet period and decrease in the dry period as expected in hydroagricultural reservoirs (Fig. 3).

Fig. 3 Monthly pattern of stored volume in Maranhão reservoir (source: SNIRH 2021)

Convolutional neural network model description

A one-dimensional convolutional neural network (1D-CNN) was used to estimate daily streamflow at Ponte Vila Formosa. This 1D-CNN model was created, developed, optimized, and tuned in Python (version 3.8.10) using free, publicly available tools (Keras, Chollet et al. 2015; TensorFlow, Abadi et al. 2016; KerasTuner, O’Malley et al. 2019; Pandas, McKinney 2010; Scikit-learn, Pedregosa et al. 2011). A detailed description of the development of the 1D-CNN model is presented in Oliveira et al. (2023). In that study, the authors carried out a set of experiments where three different neural network models were tested for streamflow estimation, as well as several combinations of precipitation and air temperature values. The models’ structures and hyper-parameters were optimized and tuned using six different training algorithms. The batch size and the number of epochs were also optimized. The best solution for streamflow estimation was obtained with a 1D-CNN model composed of one input 1D convolutional (1D-Conv) layer with 16 filters and a kernel size equal to 1, and an output dense layer activated by a linear function. Between them, two 1D-Conv layers, each having 32 filters and a kernel size of 8, were applied. After each 1D-Conv layer, a MaxPooling1D layer with pool_size set to 2 was placed. The Nadam optimizer was the training algorithm with the best performance, combined with a learning rate of 1 × 10−3 and an ε (constant used for numerical stability) of 1 × 10−8. The batch size and the number of epochs were 20 and 200, respectively. Finally, the input variables were the daily precipitation values accumulated over 1, 2, 3, 4, 5, and 10 days and delayed by 1, 2, 3, 4, 5, 6, and 7 days.

The CNN model was tuned, trained, and validated considering the streamflow values available in Ponte Vila Formosa station (30% of the Maranhão watershed) for the period from 01/01/2001 to 01/01/2009. The model performance was considered good, reaching a Nash–Sutcliffe Efficiency (NSE) of 0.86, a coefficient of determination (R2) of 0.87, a percent bias (PBIAS) of 10.5%, and a root-mean-squared error (RMSE) of 4.2 m3 s−1 for the test dataset. Thus, in this study, the same 1D-CNN model was used by considering the precipitation of the entire Maranhão watershed instead of the sub-basin’s data as in the original version.

Input variables for 1D-CNN model

The precipitation data used to train the 1D-CNN model were obtained from the ERA5-Reanalysis dataset (Hersbach et al. 2017). This is a gridded product with a resolution of 31 km and an hourly timestep, making it an appropriate option for the implementation of the physically based model, which requires sub-daily precipitation in small watersheds like Maranhão. Precipitation data were extracted from the dataset considering all the cells within the limits of the watershed. Hourly precipitation values were then averaged within the watershed area and accumulated each day from 01/01/2001 to 31/12/2009. The daily precipitation values in the watershed accumulated over 1, 2, 3, 4, 5, and 10 days and delayed by 1, 2, 3, 4, 5, 6, and 7 days were considered. The average annual precipitation for the period considered in this study was 575 mm, with July (3 mm) and August (8 mm) presenting the minimum monthly values, and October (104 mm), November, and December (both with 67 mm) being the wettest months.
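The construction of the accumulated and delayed precipitation inputs can be illustrated with a minimal sketch. The window lengths and delays are those stated above; the function name, the dictionary keys, and the exact alignment (accumulation windows ending on the current day, delays taken as the daily total observed d days earlier) are assumptions made for illustration and may differ from the implementation in Oliveira et al. (2023).

```python
from typing import Dict, List, Optional

ACCUM_WINDOWS = (1, 2, 3, 4, 5, 10)  # accumulation lengths in days (from the text)
DELAYS = (1, 2, 3, 4, 5, 6, 7)       # delays in days (from the text)


def build_features(precip: List[float]) -> List[Optional[Dict[str, float]]]:
    """Return one feature dict per day, or None while the history is too short."""
    # Days needed before every accumulation window and delay is available.
    warmup = max(max(ACCUM_WINDOWS) - 1, max(DELAYS))
    rows: List[Optional[Dict[str, float]]] = []
    for t in range(len(precip)):
        if t < warmup:
            rows.append(None)
            continue
        row: Dict[str, float] = {}
        for w in ACCUM_WINDOWS:
            # Assumption: the window ends on day t and covers the w most recent days.
            row[f"acc_{w}d"] = sum(precip[t - w + 1 : t + 1])
        for d in DELAYS:
            # Assumption: a delay of d days is the daily total observed d days earlier.
            row[f"lag_{d}d"] = precip[t - d]
        rows.append(row)
    return rows
```

Under these assumptions each day is described by 13 values (6 accumulations and 7 delays), and the first 9 days of the series cannot be used because the 10-day window is incomplete.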

Estimation of Maranhão inflow with 1D-CNN

The Maranhão reservoir’s daily inflow was estimated considering the daily precipitation in the corresponding watershed and the trained 1D-CNN model. However, because of the intrinsic random behavior verified in randomly initialized neural networks (Duan et al. 2020; Alzubaidi et al. 2021), the 1D-CNN model was trained 100 times. Those 100 runs were performed using the same dataset and division into training, validation, and test datasets presented in Oliveira et al. (2023). After each run, the results were compared and evaluated considering the observed streamflow in Ponte Vila Formosa station. Based on the statistical evaluation, the model with the best performance was selected.

The selected 1D-CNN model was then exposed to Maranhão watershed daily precipitation, with results representing the daily surface flow generated in the watershed and flowing to the Maranhão reservoir. Those daily values were then aggregated by month and transformed into volume. The estimated monthly volume that reached Maranhão reservoir was incorporated into the reservoir mass balance to estimate the stored volume in the following month. The validation of inflow values was made through the comparison of estimated stored volumes and the corresponding observed values.
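The aggregation of daily flow into monthly volume can be sketched as follows. This is an illustrative example only: the function name is hypothetical, and it assumes that each daily value is a mean flow in m3 s−1 that holds for the full day, with volumes expressed in hm3 as for the reservoir capacity.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Tuple

SECONDS_PER_DAY = 86_400
M3_PER_HM3 = 1_000_000


def monthly_volume_hm3(start: date, daily_flow_m3s: List[float]) -> Dict[Tuple[int, int], float]:
    """Sum daily mean flows (m3/s) into monthly volumes (hm3), keyed by (year, month)."""
    volumes: Dict[Tuple[int, int], float] = defaultdict(float)
    for offset, flow in enumerate(daily_flow_m3s):
        day = start + timedelta(days=offset)
        # Assumption: the daily mean flow applies to the whole 24 h period.
        volumes[(day.year, day.month)] += flow * SECONDS_PER_DAY / M3_PER_HM3
    return dict(volumes)
```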

MOHID-Land model description

MOHID-Land is an open-source hydrological model, with the code available in an online repository (github.com/Mohid-Water-Modelling-System/Mohid). MOHID-Land (Trancoso et al. 2009; Canuto et al. 2019; Oliveira et al. 2020) is a fully distributed and physically based model. Considering the mass and momentum conservation equations and a finite volume approach, the model simulates the water movement between four main compartments: atmosphere, porous media, soil surface, and river network. To avoid instability problems and save computational time, the model time step is variable, being longer during dry seasons and shorter in wet periods, when water fluxes increase.

According to its finite volume approach, the domains in MOHID-Land are discretized by a regular grid in the surface plane and by a Cartesian coordinate system in the vertical direction. The land surface considers a 2D domain to simulate the water movement, while the porous media is represented by a 3D domain, which includes the same surface grid and is complemented with the vertical grid with variable thickness layers. Additionally, a 1D domain representing the river network can be derived from a digital terrain model represented in the horizontal grid. The water lines of the river network are then delineated by linking surface cell centers (nodes).

The four compartments referred to before are all explicitly simulated, except the atmosphere which is only responsible for providing the data needed for imposing surface boundary conditions. The atmospheric data can be space and/or time variant, and include precipitation, air temperature, relative humidity, wind velocity, solar radiation, and/or cloud cover.

The amount of water precipitated in each cell is divided into surface and subsurface flow considering the infiltration process and according to the soil saturation state. In this study, the infiltration rate (i, LT−1) was computed according to Darcy’s law:

$$i= {-K}_{sat}\left(\frac{\partial h}{\partial z}+1\right),$$
(1)

where Ksat is the saturated soil hydraulic conductivity (LT−1), h is the soil pressure head (L), and z is the vertical space coordinate (L).

The movement of infiltrated water in porous media was simulated using the Richards’ equation, which is applied to the whole subsurface domain and simulates saturated and unsaturated flow using the same grid

$$\frac{\partial \theta }{\partial t}=\frac{\partial }{{\partial x}_{i}}\left[K(\theta )\left(\frac{\partial h}{{\partial x}_{i}}+\frac{\partial z}{{\partial x}_{i}}\right)\right]-S\left(h\right),$$
(2)

where θ is the volumetric water content (L3L−3), xi represents the xyz directions (–), K is the hydraulic conductivity (LT−1), and S is the sink term representing root water uptake (L3L−3 T−1). The soil hydraulic parameters were described using the van Genuchten–Mualem functional relationships (Mualem 1976; van Genuchten 1980). When a cell reaches saturation, i.e., when soil moisture in a cell is above a threshold value defined by the user, the model considers the saturated conductivity to compute flow and pressure becomes hydrostatic, corrected by friction. The ratio between the horizontal and vertical hydraulic conductivities is defined by a factor (fh = Khor/Kver) that can also be tuned by the user.
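For reference, the van Genuchten–Mualem relationships mentioned above can be written as a short sketch. The functions follow the standard published forms (van Genuchten 1980; Mualem 1976); the function names, the use of n for the shape parameter, the pore-connectivity exponent of 0.5, and the treatment of h ≥ 0 as saturated are assumptions for illustration and do not reproduce MOHID-Land's internal code.

```python
def effective_saturation(h: float, alpha: float, n: float) -> float:
    """Effective saturation Se(h); h is the pressure head (negative when unsaturated)."""
    if h >= 0.0:
        return 1.0  # assumption: non-negative pressure head means saturation
    m = 1.0 - 1.0 / n
    return (1.0 + (alpha * abs(h)) ** n) ** (-m)


def water_content(h: float, theta_r: float, theta_s: float, alpha: float, n: float) -> float:
    """Volumetric water content from the van Genuchten retention curve."""
    return theta_r + (theta_s - theta_r) * effective_saturation(h, alpha, n)


def hydraulic_conductivity(h: float, k_sat: float, alpha: float, n: float,
                           pore_connectivity: float = 0.5) -> float:
    """Unsaturated conductivity from the Mualem model; 0.5 is the commonly used exponent."""
    se = effective_saturation(h, alpha, n)
    m = 1.0 - 1.0 / n
    return k_sat * se ** pore_connectivity * (1.0 - (1.0 - se ** (1.0 / m)) ** m) ** 2
```

The units of alpha must be the inverse of the units of h, and the conductivity is returned in the units of k_sat.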

The root water uptake was estimated considering the weather conditions and soil water contents. The reference evapotranspiration (ETo) rates were computed following the FAO Penman–Monteith method (Allen et al. 1998). The crop evapotranspiration (ETc) rates were then estimated by multiplying ETo by a crop coefficient (Kc). The Kc values vary as a function of the plant development stage, as follows:

$${K}_{c}=\left\{\begin{array}{c}{K}_{c,ini}, GFr<GFr1\\ {K}_{c,ini}+\frac{GFr-GFr1}{GFr2-GFr1}\left({K}_{c,mid}-{K}_{c,ini}\right), GFr1<GFr<GFr2\\ {K}_{c, mid}, GFr2<GFr<{GFrLAI}_{Sen}\\ {K}_{c,mid}+\frac{GFr-{GFrLAI}_{Sen}}{1.0-{GFrLAI}_{Sen}}\left({K}_{c,end}-{K}_{c,mid}\right), {GFrLAI}_{Sen}<GFr<1.0\\ {K}_{c,end}, GFr>1.0\end{array},\right.$$
(3)

where GFr, GFr1, GFr2, and GFrLAISen are the plant growth fractions in the simulated instant, the initial stage, the mid-season stage, and when the LAI senescence starts, respectively, and Kc,ini, Kc,mid, and Kc,end are the crop coefficients during the initial, mid-season and end-season stages, respectively. The plant growth stages are represented as a percentage of maturity heat units, and the values for GFr1, GFr2, and GFrLAISen are defined in the plant growth database of MOHID-Land. ETc values are then partitioned into potential soil evaporation (Es) and crop transpiration (Tc) as a function of the simulated leaf area index (LAI), which is computed using a modified version of the EPIC model (Neitsch et al. 2011; Williams et al. 1989) and considering the heat units approach for the plant to reach maturity, the crop development stages, and crop stress (Ramos et al. 2017). Following the macroscopic approach proposed by Feddes et al. (1978), root water uptake reductions (i.e., actual crop transpiration rates, Ta) are computed by distributing water extractions along the root zone domain and are estimated considering the presence of depth-varying stressors, such as water stress (Šimůnek and Hopmans 2009; Skaggs et al. 2006). Finally, the actual soil evaporation is calculated from potential soil evaporation by imposing a pressure head threshold value (ASCE 1996).
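The stage-dependent crop coefficient of Eq. 3 can be illustrated with a minimal sketch. The function name is hypothetical, and the late-season branch is assumed to interpolate from the growth fraction at which LAI senescence starts, so that the curve is continuous; this is an interpretation of Eq. 3 rather than a transcription of the MOHID-Land code.

```python
def crop_coefficient(gfr: float, gfr1: float, gfr2: float, gfr_lai_sen: float,
                     kc_ini: float, kc_mid: float, kc_end: float) -> float:
    """Piecewise-linear Kc as a function of the plant growth fraction (GFr)."""
    if gfr < gfr1:
        return kc_ini  # initial stage
    if gfr < gfr2:
        # Development stage: linear rise from Kc,ini to Kc,mid.
        return kc_ini + (gfr - gfr1) / (gfr2 - gfr1) * (kc_mid - kc_ini)
    if gfr < gfr_lai_sen:
        return kc_mid  # mid-season plateau
    if gfr < 1.0:
        # Assumption: the decline towards Kc,end starts at LAI senescence.
        return kc_mid + (gfr - gfr_lai_sen) / (1.0 - gfr_lai_sen) * (kc_end - kc_mid)
    return kc_end  # maturity reached
```

In MOHID-Land the growth-fraction thresholds come from the plant growth database, so any values passed to this sketch are placeholders.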

The amount of water that is not able to infiltrate is transformed into surface flow which is computed by solving the Saint–Venant equation in its conservative form, accounting for advection, pressure, and friction forces

$$\frac{{\partial Q}_{u}}{\partial t}+{{v}}_{v}\frac{{\partial Q}_{u}}{{\partial x}_{u}}=-gA\left(\frac{\partial H}{{\partial x}_{u}}+\frac{\left|Q\right|{Q}_{u}{n}^{2}}{{A}_{v}^{2}{R}_{h}^{4/3}}\right),$$
(4)

where Q is the water flow (L3T−1), A is the cross-sectional flow area (L2), g is the gravitational acceleration (LT−2), ν is the flow velocity (LT−1), H is the hydraulic head (L), n is the Manning coefficient (TL−1/3), Rh is the hydraulic radius (L), and subscripts u and v denote flow directions. The Saint–Venant equation is solved on a 2D domain considering the directions of the horizontal grid, except for the river network, where it is solved on the 1D domain comprising the water lines. There, the cross-section for each node of the river network is defined by the user.

The water exchanges between the river network and the soil surface are estimated according to a kinematic approach, neglecting bottom friction, and using an implicit algorithm to avoid instabilities. The water fluxes between the river network and the porous media are driven by the pressure gradient at the interface between these two media.

Model set-up

The MOHID-Land model was implemented using a grid with constant horizontal spacing and a resolution of 0.006° in the longitudinal and latitudinal directions (~520 × 666 m). To cover the modeled domain, the grid had 140 columns and 110 rows, with its origin located at 38° 45′ 16.5″ N and 8° 03′ 12.4″ W.

Elevation data were interpolated to the MOHID-Land grid from the digital elevation model (DEM) provided by the European Environment Agency (EU-DEM 2019), which has a resolution of approximately 30 m (0.00028°). The watershed’s minimum and maximum elevations after the interpolation process were 107 m and 725 m, respectively (Fig. 4a). The delineation of the watershed and the river network was performed considering the cell where the dam of Maranhão reservoir is located as the outlet. The minimum area to consider the existence of a waterline (minimum threshold area) was 10 km2. Additionally, a rectangular geometry was chosen to represent the river cross-sections, with width and height defined according to Andreadis et al. (2013). The cross-section dimensions were related to the drained area and were assigned to the river network according to Table 2. For the nodes where the drained area fell between the values presented in the table, the cross-section dimensions were linearly interpolated based on the given information.

Fig. 4 MOHID-Land inputs for Maranhão watershed: a digital terrain model and watershed and river network delineation; b Manning coefficient values; c types of vegetation; d identification number of the types of soil in surface horizon; e identification number of the types of soil in middle horizon; and f identification number of the types of soil in bottom horizon

Table 2 Cross sections dimensions according to drained area

The CORINE Land Cover 2012, with a resolution of 100 m (CLC 2012, 2019), was interpolated to the MOHID-Land’s grid and was used for representing land use in the watershed. Each land-use class was associated with: (i) a Manning coefficient, which was defined according to Pestana et al. (2013) (Fig. 4b), and (ii) a vegetation type class considering MOHID-Land’s database (Fig. 4c).

The Kc values were defined according to Allen et al. (1998) for agriculture (summer and winter crops), orchard, pasture, and brush, while pine, oak, and forest crop coefficients were defined based on the values proposed by Corbari et al. (2017) (Table 3).

Table 3 Crop coefficient values for initial stage (Kc ini), mid- (Kc mid) and late (Kc end) season for each type of vegetation

The Mualem–van Genuchten hydraulic parameters were obtained from the European Soil Hydraulic Database (EU Soil Database, Tóth et al. 2017). Although the database provides information at 7 different depths, with a resolution of 250 m, the present application only considered data from the 0.3, 1.0, and 2.0 m depths. The porous media was divided into 6 layers, with thicknesses of 0.3, 0.3, 0.7, 0.7, 1.5, and 1.5 m from surface to bottom (vertical grid), for a maximum total soil depth of 5.0 m. These layers were organized into 3 different horizons characterized by the soil hydraulic properties acquired from the selected depths of the EU Soil Database. The 2 surface layers (0–0.6 m) were associated with the data at 0.3 m depth, the 2 middle layers (0.6–2.0 m) took the values at 1.0 m depth, and the information at 2.0 m depth was representative of the 2 bottom layers (2.0–5.0 m) (Table 4). The spatial variation of soil properties in the surface, middle, and bottom horizons is shown in Fig. 4d, e, and f, respectively, with each ID corresponding to a different combination of soil hydraulic data. The fh parameter relating horizontal and vertical hydraulic conductivities was set to 10.

Table 4 Mualem–van Genuchten hydraulic parameters by soil horizon. θr, residual water content; θs, saturated water content; α and η, empirical shape parameters; Ks,vert, saturated hydraulic conductivity

As for the input variables used in the neural network model, meteorological data were obtained from ERA5-Reanalysis dataset (Hersbach et al. 2017). For the implementation of MOHID-Land, the meteorological properties incorporated were the total precipitation, air temperature, and dew point temperature (at 2 m height), u and v components of wind velocity (at 10 m height), surface solar radiation downwards, and total cloud cover. Wind velocity was adjusted from 10 to 2 m height and relative humidity was estimated from air and dew point temperatures according to Allen et al. (1998).
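The two meteorological conversions can be sketched with the standard FAO-56 expressions (Allen et al. 1998). The function names are hypothetical, and it is assumed here that the logarithmic wind profile and the saturation vapour pressure formula below are the ones applied; the text only states that the conversions followed Allen et al. (1998).

```python
import math


def wind_speed(u: float, v: float) -> float:
    """Wind speed (m/s) from the u and v components."""
    return math.hypot(u, v)


def wind_at_2m(speed_z: float, z: float = 10.0) -> float:
    """Adjust wind speed measured at height z (m) to 2 m with the FAO-56 log profile."""
    return speed_z * 4.87 / math.log(67.8 * z - 5.42)


def saturation_vapour_pressure(temp_c: float) -> float:
    """Saturation vapour pressure (kPa) at air temperature temp_c (deg C), FAO-56 form."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def relative_humidity(t_air_c: float, t_dew_c: float) -> float:
    """Relative humidity (%) from air and dew point temperatures (deg C)."""
    # Actual vapour pressure is the saturation value at the dew point temperature.
    rh = 100.0 * saturation_vapour_pressure(t_dew_c) / saturation_vapour_pressure(t_air_c)
    return min(rh, 100.0)  # assumption: cap at saturation
```

For a 10 m measurement height the adjustment factor is close to 0.75, so the wind speed at 2 m is about three quarters of the value at 10 m.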

Estimation of Maranhão inflow with MOHID-Land

MOHID-Land was directly implemented in the entire Maranhão watershed, but the lack of daily inflow data at the outlet only allowed model calibration and validation to be performed at Ponte Vila Formosa. There, the estimated daily streamflow was compared with the observed data and, once the model results were similar to the observed values, indicating a good representation of streamflow generation in that sub-basin, the calibrated parameters were assumed to be representative of the Maranhão watershed. Hence, the daily streamflow estimated by the model at the outlet section was considered to represent the Maranhão reservoir’s inflow and was transformed to monthly volume. The monthly volumes were then validated with a reservoir mass balance identical to the one presented for the validation of the 1D-CNN model’s results.

Models’ evaluation

MOHID-Land and 1D-CNN were calibrated/trained using the average daily streamflow at the Ponte Vila Formosa hydrometric station. Validation was performed with daily and monthly timesteps. The dataset was also divided into wet (October–March) and dry (April–September) periods, and the results were validated for each period, ignoring the division between calibration/training and validation.

In the case of MOHID-Land, the calibration period was from 01/01/2002 to 31/01/2003 and the validation was from 01/01/2004 to 31/12/2009. For the 1D-CNN model, each of the 100 runs was evaluated considering the same test dataset presented by Oliveira et al. (2023). For both models, streamflow estimation performance was evaluated in Ponte Vila Formosa station. The analysis was made with four different statistical parameters, namely, the R2, the PBIAS, the RMSE, and the NSE

$${R}^{2}={\left[\frac{{\sum }_{i=1}^{p}\left({X}_{i}^{obs}-{X}_{mean}^{obs}\right)\left({X}_{i}^{sim}-{X}_{mean}^{sim}\right)}{\sqrt{{\sum }_{i=1}^{p}{\left({X}_{i}^{obs}-{X}_{mean}^{obs}\right)}^{2}}\sqrt{{\sum }_{i=1}^{p}{\left({X}_{i}^{sim}-{X}_{mean}^{sim}\right)}^{2}}}\right]}^{2}$$
(5)
$$PBIAS=\frac{{\sum }_{i=1}^{p}\left({X}_{i}^{obs}-{X}_{i}^{sim}\right)}{{\sum }_{i=1}^{p}{X}_{i}^{obs}}\times 100$$
(6)
$$RMSE=\sqrt{\frac{1}{p}\sum_{i=1}^{p}{\left({X}_{i}^{obs}-{X}_{i}^{sim}\right)}^{2}}$$
(7)
$$NSE=1-\frac{{\sum }_{i=1}^{p}{\left({X}_{i}^{obs}-{X}_{i}^{sim}\right)}^{2}}{{\sum }_{i=1}^{p}{\left({X}_{i}^{obs}-{X}_{mean}^{obs}\right)}^{2}},$$
(8)

where Xiobs and Xisim are the flow values observed and estimated by the model on day i, respectively, Xmeanobs and Xmeansim are the average flow considering the observed and the modeled values in the analyzed period, and p is the total number of days/values in this period. According to Moriasi et al. (2007), a model is considered satisfactory when NSE > 0.5, PBIAS is within ± 25%, and R2 > 0.5, while the RMSE represents the standard deviation of the residuals, with lower values indicating better model performance.
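Equations 5 to 8 translate directly into code. The sketch below is a plain Python illustration with hypothetical function names; it assumes paired series of equal length without missing values, and the inclusive treatment of the ± 25% PBIAS limit in the satisfactory check is an assumption.

```python
import math
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def r2(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Coefficient of determination (Eq. 5): squared Pearson correlation."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    dev_obs = math.sqrt(sum((o - mo) ** 2 for o in obs))
    dev_sim = math.sqrt(sum((s - ms) ** 2 for s in sim))
    return (cov / (dev_obs * dev_sim)) ** 2


def pbias(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Percent bias (Eq. 6); positive values indicate underestimation."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root-mean-squared error (Eq. 7)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency (Eq. 8)."""
    mo = _mean(obs)
    sq_err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sq_dev = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sq_err / sq_dev


def is_satisfactory(obs: Sequence[float], sim: Sequence[float]) -> bool:
    """Thresholds quoted from Moriasi et al. (2007); the inclusive PBIAS limit is assumed."""
    return nse(obs, sim) > 0.5 and abs(pbias(obs, sim)) <= 25.0 and r2(obs, sim) > 0.5
```

Note that a constant offset between the series leaves R2 at 1 while degrading NSE and PBIAS, which is why the indicators are read together.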

Maranhão reservoir’s inflow was evaluated with a monthly timestep, since this is the frequency of the data available for the reservoir. Since the models were already calibrated, the validation of the reservoir’s inflow was done for the period between 01/01/2002 and 31/12/2009.

For the validation process, the monthly water volume reaching the reservoir was incorporated into a mass balance where the observed stored volume from the previous month and the water volume that leaves the reservoir in the current month were also considered

$${V}_{i}^{sim}={V}_{i-1}^{obs}+{VI}_{i}^{sim}-{VO}_{i}^{obs},$$
(9)

where Visim represents the estimated stored volume in month i, Vi-1obs represents the observed stored volume in the previous month, VIisim is the volume that enters the reservoir in month i resulting from the simulations, and VOiobs is the observed volume that leaves the reservoir. The stored volume estimated through the water balance was then compared to the observed stored volume of the corresponding month.
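The mass balance of Eq. 9 can be sketched as a one-step calculation per month. The function name is hypothetical, and the three series are assumed to be aligned month by month and expressed in the same volume unit.

```python
from typing import List, Sequence


def simulate_stored_volume(v_obs: Sequence[float],
                           inflow_sim: Sequence[float],
                           outflow_obs: Sequence[float]) -> List[float]:
    """Apply Eq. 9 for months 1..n-1; month 0 has no previous observed volume."""
    if not (len(v_obs) == len(inflow_sim) == len(outflow_obs)):
        raise ValueError("all series must have the same number of months")
    # Each estimate restarts from the observed volume of the previous month,
    # so errors in the simulated inflow do not accumulate over time.
    return [v_obs[i - 1] + inflow_sim[i] - outflow_obs[i] for i in range(1, len(v_obs))]
```

The returned list is compared with v_obs[1:], the observed stored volumes of the corresponding months.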

Performance assessment was made by a visual comparison, and it was complemented by the estimation of the R2, NSE, PBIAS, RMSE, and the RMSE-observation standard deviation ratio (RSR)

$$RSR=\frac{RMSE}{{STDEV}_{obs}}=\frac{\sqrt{{\sum }_{i=1}^{p}{\left({X}_{i}^{obs}-{X}_{i}^{sim}\right)}^{2}}}{\sqrt{{\sum }_{i=1}^{p}{\left({X}_{i}^{obs}-{X}_{mean}^{obs}\right)}^{2}}},$$
(10)

where Xiobs and Xisim are the stored volume values observed and estimated in month i, respectively, and Xmeanobs and Xmeansim are the average stored volume in the analyzed period. It is important to note that the typical approach for inflow validation, which considers the direct calculation of inflow values from a mass balance performed in the reservoir, was also tested. However, about 30% of the inflow values estimated with that approach were negative. Because of that, the referred approach was not considered in the study.
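As a brief illustration of Eq. 10, the RSR can be computed as below. The function name is hypothetical and the series are assumed to be paired and complete.

```python
import math
from typing import Sequence


def rsr(obs: Sequence[float], sim: Sequence[float]) -> float:
    """RMSE-observation standard deviation ratio (Eq. 10)."""
    mean_obs = sum(obs) / len(obs)
    sq_err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sq_dev = sum((o - mean_obs) ** 2 for o in obs)
    return math.sqrt(sq_err) / math.sqrt(sq_dev)
```

Because both terms share the sums used in Eq. 8, RSR equals the square root of (1 − NSE), so the two indicators carry the same information on different scales.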

Results

1D-CNN at Ponte Vila Formosa

Considering the set of 100 runs performed with the 1D-CNN model and the precipitation of Ponte Vila Formosa watershed, the four statistical parameters used to evaluate model’s performance were calculated for each run and considering the test dataset. Four sets of 100 values were obtained. For each of those sets, the minimum, maximum, average, standard deviation, median, and 1st and 3rd quartiles were estimated and are presented in Table 5.

Table 5 Goodness-of-fit indicators for 1D-CNN model in Ponte Vila Formosa hydrometric station

A wide range of results was obtained for the statistical parameters, with RMSE ranging from 1.44 to 3.13 m3 s−1, PBIAS from −40 to 67%, R2 from 0.59 to 0.90, and NSE from 0.42 to 0.88. Although some simulations did not reach the minimum requirements to be classified as satisfactory, most of them achieved acceptable values, with the 1st quartile presenting an NSE of 0.71 and an R2 of 0.75. This means that 75% of the simulations had higher values for NSE and R2. However, considering the PBIAS results, the table shows that the value of the 3rd quartile was 25%, which means that a quarter of the simulations presented a higher PBIAS. In turn, the 1st quartile of this statistical parameter was −3.5% and the minimum value was −40.3%, which indicates that, of the 25 simulations with lower PBIAS values, a significant part is still considered to have a satisfactory behavior.

The simulation considered as the best in fitting the observed streamflow at Ponte Vila Formosa station presented an NSE of 0.88, an R2 of 0.88, a PBIAS of −7.8%, and an RMSE of 1.44 m3 s−1 (Table 5). Although the R2 of this model was not the maximum presented in the table, the combined values of the four statistical parameters represented the best solution, since the simulation with the maximum R2 presented a PBIAS of 25%, which lies at the limit of the range for a satisfactory performance.

For an easier comparison with MOHID-Land, the four statistical parameters were also estimated considering the entire dataset, neglecting the first year (2001). Streamflow results show that the model outputs included negative values for 1.5% of the dataset. Since these negative values occurred in isolated days, they were replaced by simply averaging the estimated streamflow from the previous and the next days. Table 6 presents those statistical parameters, while Fig. 5 allows a visual assessment of model’s performance. Table 6 also presents the goodness-of-fit indicators when the simulated interval was divided into wet and dry periods and considering the average monthly streamflow.
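The correction of isolated negative estimates can be sketched as follows. The function name fill_isolated_negatives is hypothetical, and leaving consecutive negative values and series end points untouched is an assumption, since the study reports only isolated occurrences.

```python
def fill_isolated_negatives(flow):
    """Replace isolated negative values by the mean of the adjacent days."""
    fixed = list(flow)
    for i in range(1, len(flow) - 1):
        previous_day, next_day = flow[i - 1], flow[i + 1]
        # Only isolated negatives are corrected; runs of negatives are left as-is.
        if flow[i] < 0 and previous_day >= 0 and next_day >= 0:
            fixed[i] = 0.5 * (previous_day + next_day)
    return fixed
```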

Table 6 Goodness-of-fit indicators for 1D-CNN model in Ponte Vila Formosa hydrometric station
Fig. 5 Comparison between observed and estimated streamflow values (using the 1D-CNN model) in Ponte Vila Formosa between 01/01/2002 and 31/12/2009

When considering daily results, the 1D-CNN model demonstrated a very good performance, with the NSE and R2 reaching values of 0.65, the PBIAS being – 7.21%, and the RMSE being 4.75 m3 s−1. Results were better when average monthly streamflow was considered, with NSE, R2, PBIAS, and RMSE of 0.87, 0.87, 2.23%, and 2.01 m3 s−1, respectively. This is expected, because the estimation of the average monthly values smooths out the daily errors. Considering the dry and wet periods, the 1D-CNN model showed a much better performance for the wet period, with NSE and R2 both reaching 0.79 and a PBIAS of 8.62%. For the dry period, the NSE was only 0.26, the R2 decreased to 0.57, and the PBIAS was -53%.
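The monthly aggregation and the wet/dry split used in this evaluation can be sketched as below. The wet season is taken as October to March, as stated later in the text; whether the dry period is exactly the complementary set of months is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

WET_MONTHS = {10, 11, 12, 1, 2, 3}  # October-March; complement assumed to be dry


def monthly_means(dates, values):
    """Average daily values by (year, month)."""
    sums = defaultdict(lambda: [0.0, 0])
    for d, v in zip(dates, values):
        acc = sums[(d.year, d.month)]
        acc[0] += v
        acc[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


def split_wet_dry(dates, values):
    """Separate a daily series into wet-period and dry-period values."""
    wet = [v for d, v in zip(dates, values) if d.month in WET_MONTHS]
    dry = [v for d, v in zip(dates, values) if d.month not in WET_MONTHS]
    return wet, dry
```

Applying the same indicators to the output of monthly_means for the observed and simulated series gives the monthly scores, in which daily errors of opposite sign partly cancel.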

MOHID-Land at Ponte Vila Formosa

MOHID-Land’s calibration focused on a large number of different parameters related to the porous media, river network, and plant development processes. Among them, the fh factor and the soil hydraulic parameters were a calibration target. In the river network, the minimum threshold area, the cross-section dimensions, and the Manning coefficient were evaluated, and for the vegetation development, the Kc for different stages, and maximum root depth were also subjected to calibration.

The best solution obtained with MOHID-Land comprised a river Manning coefficient of 0.035 s m−1/3 and a minimum threshold area of 1 km2. The calibrated cross-section dimensions are presented in Table 2, being clearly larger than those of the model set-up. In porous media, fh was set to 500, while the saturated water content of each soil type was increased by 10%. Finally, the maximum root depth was 25% to 60% lower than the default values of MOHID-Land’s growth database.

The comparison between the streamflow values registered in Ponte Vila Formosa station and those estimated by MOHID-Land is presented in Fig. 6, with the corresponding statistical parameters shown in Table 7. Table 7 also shows NSE, R2, PBIAS, and RMSE for the average monthly streamflow and for the division of the analyzed period into wet and dry seasons.

Fig. 6 Comparison between observed and estimated streamflow values (using MOHID-Land model) in Ponte Vila Formosa between 01/01/2002 and 31/12/2009

Table 7 Goodness-of-fit indicators for MOHID-Land model in Ponte Vila Formosa hydrometric station

MOHID-Land’s results show the satisfactory performance obtained with this model. It reached an NSE and an R2 of 0.65 for the calibration period, with a slight decrease in the validation period (0.62 for NSE and 0.63 for R2). PBIAS demonstrated an underestimation of streamflow in calibration and an overestimation during validation, while RMSE values were similar in both periods. When considering the monthly aggregation, the model reached a very good performance, with NSE and R2 values above 0.85 in calibration and validation periods. The RMSE showed a decrease in both periods when compared with the daily values. Finally, PBIAS did not change significantly. During the wet period, the performance of the model was better than in the dry period. Although R2 showed a better value for the dry period, NSE and PBIAS demonstrated a marked decrease in the model’s performance in that period, with the first going from 0.61 to 0.39 and the second indicating an overestimation of about 9% in the wet period and an underestimation of about 30% in the dry period.

Maranhão reservoir’s inflow

The characterization of Maranhão reservoir’s inflow obtained with MOHID-Land and 1D-CNN models from 01/01/2002 until 31/12/2009 is presented in Table 8. The respective flow duration curves are presented in Fig. 7.

Table 8 Maranhão reservoir's inflow characterization for 1D-CNN and MOHID-Land models
Fig. 7 Flow duration curve for Maranhão reservoir's inflow estimated with MOHID-Land (blue line) and 1D-CNN (red line)

Results from Table 8 showed a very similar behavior for both models apart from the maximum streamflow value. In that case, the 1D-CNN model presented a maximum streamflow more than twice the maximum streamflow estimated by MOHID-Land. However, MOHID-Land had a slightly higher streamflow average. This indicates that, for the middle streamflow values, MOHID-Land tends to estimate higher values than the 1D-CNN model. This is also shown in Fig. 7, where it is possible to confirm that for streamflow values with non-exceedance probability between 0 and 0.3, higher values are observed for MOHID-Land.
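For reference, a flow duration curve such as those in Fig. 7 can be built from a daily series as in the sketch below. The Weibull plotting position, rank/(n + 1), is an assumption, since the study does not state which formula was used, and flow_duration_curve is a hypothetical name.

```python
def flow_duration_curve(flows):
    """Return (exceedance probability, flow) pairs, highest flow first."""
    ordered = sorted(flows, reverse=True)
    n = len(ordered)
    # Weibull plotting position (assumption): p = rank / (n + 1).
    return [(rank / (n + 1), q) for rank, q in enumerate(ordered, start=1)]
```

The non-exceedance probability is simply one minus the exceedance probability returned here.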

Regarding the validation of stored volumes considering the reservoir’s mass balance, NSE, R2, PBIAS, RMSE, and RSR were estimated for the entire period, and the results are presented in Table 9. Figure 8 presents the comparison between the stored volumes estimated with the two models and the observed stored volumes.

Table 9 NSE, R2, PBIAS, and RMSE values for MOHID-Land and 1D-CNN models for reservoir’s stored volume

Results showed good agreement between both models and observed values. In fact, 1D-CNN and MOHID-Land presented very similar R2 (1D-CNN: 0.84; MOHID-Land: 0.85) and RMSE (1D-CNN: 18.62 hm3; MOHID-Land: 18.61 hm3) values. NSE and RSR were equal in both cases, while PBIAS was the only parameter in which some difference was observed. With a PBIAS of -0.55% for the 1D-CNN model and -1.18% for the MOHID-Land model, both models slightly underestimated the reservoir’s inflow, with MOHID-Land showing a stronger tendency for that underestimation.
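To make the stored-volume validation concrete, the sketch below shows a generic reservoir mass balance driven by simulated inflow. It is not the balance used in the study: the single lumped outflow term, the daily time step, and the clipping of storage between zero and a maximum capacity are all assumptions, and simulate_storage is a hypothetical name.

```python
SECONDS_PER_DAY = 86_400
M3_PER_HM3 = 1_000_000


def simulate_storage(initial_hm3, inflow_m3s, outflow_m3s, capacity_hm3):
    """Daily stored volume (hm3) from inflow and lumped outflow (m3/s)."""
    volumes = []
    volume = initial_hm3
    for q_in, q_out in zip(inflow_m3s, outflow_m3s):
        # Convert the net daily flow from m3/s to hm3 per day.
        volume += (q_in - q_out) * SECONDS_PER_DAY / M3_PER_HM3
        # Assumption: excess water spills and storage cannot be negative.
        volume = min(max(volume, 0.0), capacity_hm3)
        volumes.append(volume)
    return volumes
```

Averaging the resulting daily volumes by month would give a series comparable with monthly observed stored volumes.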

Discussion

1D-CNN model

The adequacy of the 1D-CNN model to predict streamflow in the sub-basin of Ponte Vila Formosa station had already been demonstrated in Oliveira et al. (2023). The approach presented here, where 100 simulations were performed with the same 1D-CNN structure, made it possible to slightly improve the results obtained in that study. Thus, the best solution had an NSE and an R2 of 0.88, a PBIAS of – 7.80%, and an RMSE of 1.44 m3 s−1, considering the test dataset. Results also show that half of the 100 simulations obtained an NSE higher than 0.74 and/or an R2 above 0.79. The same number of simulations had a PBIAS lower than 9.52%. This indicates the suitability of the developed structure for streamflow estimation.

The results of the 1D-CNN model are in accordance with the results of several authors. Barino et al. (2020) used two 1D-CNN models to predict multi-day ahead river flow in Madeira River, a tributary of the Amazon River, Brazil. One of those models considered only the river flow in previous days, while the other considered that same variable combined with the turbidity. Both models obtained NSE and R2 values higher than 0.92, while mean absolute percentage error (MAPE) and normalized RMSE were lower than 25% and 0.20, respectively. Among the models analyzed by Huang et al. (2020), two CNN models were studied to forecast streamflow one day ahead. Considering the lagged streamflow values of the past 16 days at the site to be forecasted and in the neighborhood, a generic CNN model and a CNN model trained with a transfer learning procedure were tested. For the four locations studied in the United Kingdom, the generic CNN model obtained MAPE values between 14.36% and 41.95%, while the MAPE of the other CNN model lay between 12.29% and 32.17%. Duan et al. (2020) considered the watersheds within the Catchment Attributes for Large-Sample Studies (CAMELS) dataset, in California, USA, to test a temporal CNN model. The model was developed for long-term streamflow projection and consisted of a one-dimensional network that used dilated causal convolutions. As input variables, the authors selected precipitation, temperature, and solar radiation and tested different time window sizes to lag the values. After performing 15 runs for each watershed in the CAMELS dataset, the average NSE was 0.55, while the average NSE for the best run over all basins was 0.65. Finally, a CNN model was employed by Song (2020) to estimate daily streamflow in Heuk River watershed, in South Korea. Using rainfall, runoff, soil map, and land-use data, the author generated a hydrological image based on the curve number method to feed the neural network and estimate streamflow in the watershed. Model evaluation resulted in a correlation coefficient of 0.87 and an NSE of 0.60.

Usually, in machine learning methods, better results are obtained when antecedent streamflow is considered as a forcing variable (Barino et al. 2020; Khosravi et al. 2022). However, when the model is used in the simulation of future scenarios or periods when no observed data are available, the antecedent streamflow values that feed the model are those already calculated by the model in the previous iterations. Consequently, the propagation and amplification of errors in the estimates can degrade the results in the long term. Other types of machine learning methods for streamflow estimation have also emerged in the last few years. For instance, Si et al. (2021) considered a graphical convolutional GRU model to predict the streamflow in the next 36 h, while Szczepanek (2022) used three different models, namely, XGBoost, LightGBM, and CatBoost, for daily streamflow forecast. Additionally, hybrid solutions combining different machine learning algorithms, such as those of Di Nunno et al. (2023) and Yu et al. (2023), are becoming widely used and show improved results.
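The error-propagation issue described for models forced by antecedent streamflow can be illustrated with the generic sketch below, in which each new estimate is fed back as an input for the following step. The callable predict_next stands for any trained one-step model and is purely hypothetical, as are the other names; it does not represent the 1D-CNN used in this study, which is forced by precipitation.

```python
def recursive_forecast(predict_next, history, forcing, n_lags):
    """Multi-step estimation in which past estimates replace missing observations."""
    window = list(history[-n_lags:])  # last observed streamflow values
    estimates = []
    for rain in forcing:
        q = predict_next(window, rain)
        estimates.append(q)
        # The estimate, including its error, becomes an input of the next step.
        window = window[1:] + [q]
    return estimates
```

After n_lags steps the input window contains only model estimates, so any bias in predict_next is carried forward for the rest of the simulation.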

MOHID-Land model

MOHID-Land daily results proved to be satisfactory. With an NSE and an R2 of at least 0.62 and 0.63, respectively, a PBIAS between – 7% and 4%, and an average RMSE of 5.6 m3 s−1, these results were substantially better than those presented by Almeida et al. (2018) for the same study area. Using the Soil and Water Assessment Tool (SWAT), the authors compared the daily streamflow also in Ponte Vila Formosa station. They obtained an NSE, an R2, a bias, and an RMSE of – 3.05, 0.31, 2.93, and 12.61 m3 s−1, respectively, for the calibration period. For the validation, the NSE was 0.11, the R2 was 0.24, and the bias and RMSE were – 0.46 and 15.21 m3 s−1, respectively. Almeida et al. (2018) also made a daily comparison in Moinho Novo hydrometric station, which is located in Montargil watershed, a watershed that borders Maranhão watershed and is very similar to it. For Moinho Novo station, the authors obtained for calibration and validation periods, respectively, an NSE of 0.22 and 0.39, an R2 of 0.41 in both cases, a bias of 0.90 and – 1.07, and an RMSE of 13.1 and 16.6 m3 s−1. Bessa Santos et al. (2019) estimated the daily streamflow in Sabor River watershed, located in Northeast Portugal and with an area of 3170 km2. Using the SWAT model, they compared the modeled and observed river flow values, and the results reached an NSE of 0.62 and 0.61 for calibration and validation periods, respectively, and an R2 for those same periods of 0.63 and 0.80. The PBIAS was 2.7% for calibration and -24% for validation, while RSR for calibration and validation was 0.62 and 0.63, respectively. Considering Pracana watershed, located in Central Portugal, Demirel et al. (2009) also used the SWAT model to predict daily streamflow. The authors classified the model as having a poor peak magnitude estimation.

Considering the monthly values, MOHID-Land’s performance increased substantially when compared with the daily values. The results reached an NSE of 0.85 and 0.92 and a R2 of 0.86 and 0.95 for calibration and validation periods, respectively. PBIAS and RMSE also demonstrated the very good behavior of the model. Those parameters obtained very good results for the calibration and validation periods, with PBIAS indicating a slight underestimation during calibration (-6.59%) and an overestimation (4.15%) during validation, and the RMSE being about 2 m3 s−1 for both periods. In line with this work, Brito et al. (2018) used SWAT for long-term forecasts of monthly Enxoé reservoir’s inflow. With that watershed located in South Portugal and draining an area of 60 km2, authors reached an NSE of 0.78 and an R2 of 0.77. Almeida et al. (2018) also presented a monthly analysis for Ponte Vila Formosa station, with SWAT obtaining an NSE of – 1.26 and 0.40 for calibration and validation periods. For calibration and validation, respectively, R2 reached values of 0.58 and 0.54, the bias was 2.97 and – 0.42, and the RMSE 6.04 and 5.93 m3 s−1. Ponte Vila Formosa streamflow was also modeled by van der Laan et al. (2023) with SWAT model. They obtained an NSE, an R2, and a PBIAS for calibration period of 0.76, 0.77, and – 7.1%, respectively. For the validation period, the NSE was 0.89, the R2 was 0.9, and PBIAS was 15%.

The comparisons presented above lead to the conclusion that MOHID-Land’s performance is in line with the other studies carried out in Portuguese watersheds for daily streamflow estimation. The exception was the study performed by Almeida et al. (2018), where the simulation of the same sub-basin modeled here obtained a much poorer performance. When monthly streamflow was considered, MOHID-Land’s performance surpassed the results obtained with the SWAT model for the same or similar sub-basins. The difference in the performance of the models is explained by the fact that SWAT is more empirically parametrized than MOHID-Land. For instance, MOHID-Land explicitly estimates the infiltration and porous media fluxes based on Darcy’s law and Richards equation, respectively, with the remaining water transformed into surface runoff, where fluxes are estimated based on the Saint–Venant equation. On the other hand, in SWAT, a baseflow factor, which is a direct index of groundwater flow response to changes in recharge, or a surface runoff lag coefficient to control the fraction of the total available water that will be allowed to enter the reach on 1 day, needs to be defined. The empirical parametrization of some processes prevents a more accurate representation of reality, leading to more errors in estimates and the degradation of the overall performance, especially beyond the period of calibration.

Nonetheless, MOHID-Land has its own limitations. On the one hand, the implementation effort is significantly high, with several parameters needing to be defined, such as the six hydraulic parameters of all the soil types, the crop coefficients for each type of vegetation, the surface and the river Manning coefficients, and others. The large amount of input data, parameters, and variables that the user should define leads to an extremely high number of parameters that can be calibrated, which can be time-consuming. A consequence of this is reflected in the number of simulations performed to reach the best solution. In this study, more than 70 simulations were made to test the sensitivity of MOHID-Land to parameters other than those studied by Oliveira et al. (2020), and to obtain the combination that allows a good fit between modeled and observed streamflow. On the other hand, the empirical representation of parts of the hydrological processes or the generalization of some parameters can make the representation of the modeled system difficult, leading to values of the calibrated parameters outside the normal ranges. That is the case here for the crop coefficients calibrated for the summer and winter crops, which are considered too low.

Models’ comparison

Overall, the 1D-CNN model demonstrated a better performance than MOHID-Land model for daily streamflow estimation in Ponte Vila Formosa station. However, when the results are aggregated by month, MOHID-Land’s performance surpassed the 1D-CNN results.

Focusing on wet and dry periods, it is interesting to verify that the results of both models complement each other. The 1D-CNN performed better than MOHID-Land in the wet period, whereas MOHID-Land performed better in the dry period. In the wet period, both models achieved satisfactory performances, but the 1D-CNN, with an NSE and R2 of 0.79, was better than MOHID-Land, which obtained an NSE of 0.61 and an R2 of 0.63. In the dry period, both models experienced a decrease in their performances, but MOHID-Land, with an NSE of 0.39 and an R2 of 0.69, performed better than the 1D-CNN model, which obtained an NSE of 0.26 and an R2 of 0.56. These results highlight the difficulty of MOHID-Land in estimating the peak flow events, but also its better ability to simulate the transitions between the wet and dry periods when compared to the 1D-CNN. This can also be verified in Figs. 5 and 6, where the results for MOHID-Land demonstrate a more natural behavior than those obtained for the 1D-CNN model.

The more irregular behavior of the 1D-CNN model is in part explained by the fact that these types of models lack a physical basis, which means that the streamflow estimation does not consider physical laws or limitations. This characteristic of neural network models also explains the difficulty in avoiding negative streamflow values. Although other authors did not refer to this issue, it was verified in this study and should not be ignored, since it can limit the application of the model.

Models’ extension to Maranhão watershed

The extension of the 1D-CNN and MOHID-Land models to the entire Maranhão watershed was made by adapting the trained and calibrated models to that watershed. The 1D-CNN presents a maximum inflow value substantially higher than the maximum predicted by MOHID-Land, which is related to the fact that MOHID-Land demonstrated some difficulty in reproducing peak flows (Table 8). The remaining statistics are similar between both models, with the minimum streamflow near 0 m3 s−1, the average between 3.6 and 3.9 m3 s−1, and the median at 1.9 and 1.6 m3 s−1 for 1D-CNN and MOHID-Land, respectively.

The evaluation of the inflow values based on the mass balance at the reservoir scale showed a very good performance when using 1D-CNN and MOHID-Land (Table 9). Both models have NSE and RSR of 0.79 and 0.46, respectively. R2 is 0.84 for 1D-CNN and 0.85 for MOHID-Land, and the RMSE is 18.6 hm3 for both models. The largest difference in the statistical parameters is in the PBIAS, with the 1D-CNN presenting an underestimation of – 0.55% and MOHID-Land a slightly larger underestimation of about – 1.18%. Visually, it is also possible to verify slight differences between the stored volume estimated with inflow from the 1D-CNN model and from the MOHID-Land model (Fig. 8), with the main differences occurring in the wet season (October–March).

Fig. 8 Comparison between observed stored volume (black line) and stored volumes estimated considering the streamflow simulated by MOHID-Land (blue line) and 1D-CNN model (red line)

In a similar approach, but considering the continuous simulation of the stored water in two reservoirs included in the same modeled watershed, Rocha et al. (2020) found similar results. Applying the SWAT model to Monte Novo and Vigia reservoirs, in South Portugal, the authors validated the stored volume of both reservoirs with a monthly timestep, obtaining an NSE of 0.44 and a PBIAS of 6.3% for Monte Novo reservoir and an NSE of 0.70 and a PBIAS of 10.1% for Vigia reservoir.

In this case, models were extended to an ungauged watershed whose physical characteristics and rainfall regime are similar to those of the sub-basin where the models were trained or calibrated. In that sense, the question that arises from this study concerns the behavior of this expansion approach when larger watersheds, marked by diversified characteristics and rainfall regimes, are the target of the study. In those cases, for typical hydrological models, the calibrated parameters may not be representative of, or even represented in, the expanded area. For data-driven models, the differences in the rainfall regime over the expanded area may not be correctly related to the runoff values, as already noted by Parisouj et al. (2020).

Finally, it is important to note that several sources of uncertainty are involved in modeling Ponte Vila Formosa watershed, but also in expanding the optimized models to the entire watershed. Besides difficulties in correctly considering the differences between monitored and unmonitored areas, models also have their own uncertainty. On one hand, the limitations of model developers and users in correctly representing real systems through the structure of a hydrologic model and approximations made by numerical methods result in residual model errors and, therefore, in model output uncertainties (Loucks and van Beek 2017). However, the attempt to improve the representation of reality by increasing model complexity adds to the cost of data collection and may also introduce more parameters needing to be defined, which can then result in more potential sources of error in model output. On the other hand, Gal and Ghahramani (2016) focused on the high levels of uncertainty when using deep learning tools for regression and classification, even with simple modeling structures. In that sense, further investigation should be carried out concerning the expansion of both models and the involved uncertainty. For a better understanding, for example, several instances of the same model, with slight but coherent differences in the parametrization, can be taken into account, with the streamflow calculated from the combination of those instances and accompanied by the estimation of confidence intervals.
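The ensemble idea suggested above could be sketched as follows, combining several model instances into a mean estimate with an empirical band for each time step. The 5th and 95th percentiles are an arbitrary choice, the band is an empirical spread of the instances rather than a formal confidence interval, and ensemble_band is a hypothetical name.

```python
import statistics


def ensemble_band(runs, lower_pct=5, upper_pct=95):
    """Per-time-step (lower, mean, upper) from equal-length series of several runs."""
    band = []
    for values in zip(*runs):  # values of all instances at one time step
        # 99 cut points; index k - 1 holds the k-th percentile.
        cuts = statistics.quantiles(values, n=100, method="inclusive")
        band.append(
            (cuts[lower_pct - 1], statistics.fmean(values), cuts[upper_pct - 1])
        )
    return band
```

The 100 runs of the 1D-CNN performed in this study, or a set of MOHID-Land simulations with perturbed parameters, could serve as input, provided each run covers the same dates.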

Conclusions

The proposed approach showed the adequacy of implementing a 1D-CNN model and a physically based model for estimating daily streamflow generation at the outlet of an ungauged watershed after prior calibration/validation of those models in a sub-basin of the same catchment. Considering the sub-basin modeling, the 1D-CNN model demonstrated a better performance than MOHID-Land when considering the daily values and the wet period. The MOHID-Land model showed a better performance in estimating streamflow values during dry periods and for a monthly analysis. When the validation of the reservoir mass balance was considered, the results showed a nearly identical behavior for both models, with only a slight difference in the PBIAS. That difference indicates a smaller underestimation of inflow by the 1D-CNN than by MOHID-Land.

Although the results were considered from satisfactory to very good in all the steps taken during the validation process, the generation of negative values by the 1D-CNN is of concern. In that sense, the model presented here should be a target of improvement in future applications. In turn, MOHID-Land model revealed a lower performance for daily streamflow estimation, but its physical basis contributes to avoiding unpredictable and incomprehensible results.

Finally, it is worth noting that neural network models are developed and trained for present and/or past conditions, and their application to future scenarios can be limited. Also, the prediction of events that go beyond the observations can be problematic. This limitation is mainly related to their lack of capacity to absorb information about future conditions in cases where the neural networks were not prepared to be forced by variables that include the impact of those future changes. In contrast, the changes in future conditions can be easily imposed in physically based models, with the main problems being: (i) the detail of the characterization of future conditions, which most of the time is too coarse for the detail adopted in physical models; and (ii) the high computational time needed to run long-term simulations, usually performed in the analysis of future scenarios. Thus, hybrid solutions, combining different types of models or different models, can be used to incorporate the predicted changes in neural network models.