Introduction

Building dams and reservoirs is one of the oldest branches of engineering. Historically, human civilization developed along rivers. As humanity expanded and advanced worldwide, the number of constructed dams has increased in nearly every region with water bodies (Olden and Naiman 2010; Rheinheimer et al. 2015). Among the common and significant roles that dams play are water storage, water volume control, and flood protection; however, their effects on the ecology of the global riverine system are not yet fully understood. Reservoirs and dams and their operation can affect riverine ecology, including by changing riverine thermal regimes and water temperature fluctuation along the rivers (Olden and Naiman 2010; Rheinheimer et al. 2015). Water temperature can affect the health, distribution, and functions of aquatic species (Jiang et al. 2018); therefore, as the number of constructed dams increases, understanding water temperature variation has become a priority for ecological researchers (Murchie et al. 2008; Olden and Naiman 2010). Indeed, it was shown that the dam’s water release mechanism is the major factor controlling the water temperature downstream of dams (Tao et al. 2020). Generally, a high volume of cold water is released through “deep portals” beneath the thermocline, i.e., from the hypolimnetic layer (Olden and Naiman 2010; Kushwaha and Bhardwaj 2016). Less commonly, water is released from above the thermocline, i.e., from the epilimnetic layer, causing an increase in downstream water temperatures (Cheng et al. 2020).

Reservoirs impact the seasonal and annual thermal patterns of downstream water temperature. Indeed, it was demonstrated that water temperature fluctuation in large reservoirs tended to decrease during spring and summer, whereas negligible change was observed in winter. Compared with well-documented natural rivers, the temperature maxima were significantly delayed (Olden and Naiman 2010). Similar phenomena have been reported for many dams worldwide, among them Hills Creek Dam in the USA (Angilletta Jr et al. 2008), a controlled dam in Scotland (Jackson et al. 2007), and Burrendong Dam in Australia (Ryan et al. 2001; Preece 2004).

The reservoirs impounded behind dams significantly influence the temperature regime along the dammed river. Water is released through the dam from the upstream reservoir (Ali et al. 2019b). Temperature gradients observed over a long period are generally used as a proxy to assess how impoundment alters a free-flowing river’s natural thermal dynamics. These alterations significantly affect aquatic life upstream and downstream of the impoundment. Consequently, aquatic life is highly vulnerable to temperature fluctuation: organisms must adapt, relocate, or perish in response to thermal regime modification.

The Three Gorges Dam (TGD) has the greatest electric power production in the world. It also has the largest stored water volume (Wu et al. 2012). Owing to its hydraulic, hydrological, and ecological importance, a large number of investigations have addressed the TGD, e.g., its hydrological alteration (Gao et al. 2012; Yu et al. 2017b; Wang et al. 2017). In their investigation of streamflow variation, Gao et al. (2012) highlighted that the TGD has significantly contributed to the decrease in downstream flow. It has also helped reduce peak flows (Ali et al. 2019c).

Over the past two decades, artificial intelligence (AI) and machine learning techniques have been successfully developed and widely used for estimation and prediction (Citakoglu and Coşkun 2022), in particular for modeling non-linear hydrologic and agricultural systems (Shukla et al. 2021), meteorological droughts and the standardized precipitation index (SPI) (Malik et al. 2021; Xu et al. 2022), lake water level (Zhu et al. 2020), rainfall forecasting (Luk et al. 2001; Olsson et al. 2004; Abbot and Marohasy 2012; Lee et al. 2018; Mirabbasi et al. 2019; Adnan et al. 2020; Armin et al. 2021; Khosravi et al. 2022), streamflow forecasting (Yaseen et al. 2016; Shukla et al. 2021; Khodakhah et al. 2022), hydrological drought (Shamshirband et al. 2020; Aghelpour et al. 2021; Muhammad et al. 2021; Almikaeel et al. 2022), pan evaporation forecasting (Shiri and Özgur 2011; Mohammad et al. 2019; Malik et al. 2020; Al-Mukhtar 2021; Kushwaha et al. 2021), evapotranspiration (Granata 2019; Wu et al. 2019; Tikhamarine et al. 2019, 2020; Chen et al. 2020; Chia et al. 2020; Ferreira and da Cunha 2020; Elbeltagi et al. 2022b), water level forecasting (Daliakopoulos et al. 2005; Nayak et al. 2006; Ali Ghorbani et al. 2010; Kisi et al. 2012; Buyukyildiz et al. 2014; Seo et al. 2015, 2017), velocity predictions in compound channels with vegetated floodplains (Harris et al. 2003), suspended sediment load prediction (Melesse et al. 2011; Rajaee et al. 2011; Azamathulla et al. 2013; Kakaei Lafdani et al. 2013; Gupta et al. 2021), soil temperature (Yang and Wang 2008; Bilgili 2010; Singh et al. 2018; Penghui et al. 2020), water quality (Singh et al. 2021b), groundwater quality variables (Esmaeilbeiki et al. 2020; Che Nordin et al. 2021; El Bilali et al. 2021; Singha et al. 2021; Shiri et al. 2021; Singh et al. 2022), soil permeability (Singh et al. 2020, 2021a; Özçoban et al. 2022), soil hydraulic conductivity (Allah et al. 2014; Sihag et al. 2019a; Singh et al. 2019; Araya and Ghezzehei 2019), runoff and suspended sediment simulation (Sharma et al. 2015; Kumar et al. 2019), soil infiltration (Kashi et al. 2014; Sihag et al. 2019b; Panahi et al. 2021; Sayari et al. 2021; Angelaki et al. 2021), global solar radiation (Hassan et al. 2017; Voyant et al. 2017; Cornejo-Bueno et al. 2019; Feng et al. 2019; Ağbulut et al. 2021), dew point temperature (Naganna et al. 2019; Qasem et al. 2019; Alizamir et al. 2020), the Chézy resistance coefficient in corrugated channels (Giustolisi 2004), Manning’s roughness coefficient in flows (Bahramifar et al. 2013; Pradhan and Khatua 2017; Mohanta et al. 2018), and drought- and stress-tolerance (Kumar et al. 2022).

The main aim of this work is to provide an experimental evaluation of the effect of dams on river water temperature fluctuation. The study considered river water temperature over several years before and after the impact of the selected reservoirs (Kuriqi et al. 2020). The findings are expected to allow users to establish the direct effect of the TGD on the river’s thermal regime. They also provide insight for future development projects; for instance, they can offer valuable a priori information to support engineers and practitioners in implementing structures designed to cope with floods and droughts under prevailing climatic events. The findings can be beneficial in the planning and management of water resources of the Yangtze River.

Materials and methods

Study area and climate characterization

China is blessed with an abundant number of rivers flowing from north to south, including the Yangtze River. The Yangtze River is one of the longest rivers in the world and collects water from several catchments. This paper uses the Yangtze River in China (latitude 29.7204° N, longitude 112.6501° E) as a case study. It flows from the southwest corner of Qinghai to the north end of Shanghai. The river basin is approximately 1.8 million km² in size. It provides approximately 892 km³ of water, calculated as river discharge over the period from 1950 to 2010 (Yang and Lu 2012; Liu et al. 2018). The monsoon is a dominant component in this region. It transports moist air, starting from the east and ending in the South China Sea, according to spatiotemporal rainfall data along the river basin (Li et al. 2014; Wu et al. 2018), and precipitation patterns vary over time (Zhao and Shepherd 2012). Summers receive a large amount of precipitation, leading to floods (Wu et al. 2012; Zhao and Shepherd 2012). The river is about 6400 km long and is Asia’s longest river (Vezzoli et al. 2016; Ali et al. 2019b). Owing to the river’s length, nearly 50,000 reservoirs of various sizes have been built. The sources of nitrogen and phosphorus were highly influenced by the spatiotemporal fluctuation of the Yangtze River (Liu et al. 2018; Ali et al. 2019a). Several investigations have shown that, from year to year, the natural aquatic habitat has been significantly affected by the TGD project, both upstream and downstream of the dam (Yu et al. 2017a). As a result, three stations upstream of the dam were chosen for investigation in this study; Fig. 1 depicts the stations’ positions. All stations were chosen according to their geographical situation and the availability of high-quality data. The Hydrologic Data Centre of China’s Ministry of Water Resources provided the mean daily data for the river stations on the Yangtze River.

Fig. 1 Locations of the hydrologic stations and the Yangtze River

Mann-Kendall trend analysis

The Mann-Kendall statistical test for trend is used to assess whether a set of data values is increasing or decreasing over time and whether the trend in either direction is statistically significant. The Mann-Kendall test does not assess the magnitude of change. Several trend assessment approaches are available in the literature; however, the Mann-Kendall test is the most widely used for assessing trends in hydro-climatic studies. The test (Ahmed et al. 2017; Ali et al. 2019c), which is recommended by the World Meteorological Organization (WMO), is often used because it has several advantages: it does not require the data to follow a particular distribution, and it can cope with outliers (Ali et al. 2019c). For a time series of data points Y = {x1, x2, x3, x4, x5, …, xn} with n > 10, the Mann-Kendall test statistic S is calculated as (Haktanir and Citakoglu 2014; Tefaruk and Hatice 2015; Citakoglu and Minarecioglu 2021)

$$S=\sum_{k=1}^{n-1}\sum_{j=k+1}^n\operatorname{sgn}\left({x}_j-{x}_k\right)$$
(1)

where n is the number of data points and sgn(xj - xk) is calculated as

$$\operatorname{sgn}\;\left({\mathrm x}_{\mathrm j}-{\mathrm x}_{\mathrm k}\right)=\left\{\begin{array}{c}1\;\mathrm{for}\;\left({\mathrm x}_{\mathrm j}-{\mathrm x}_{\mathrm k}\right)\;>\;0\\0\;\mathrm{for}\left({\mathrm x}_{\mathrm j}-{\mathrm x}_{\mathrm k}\right)\;=\;0\\-1\;\mathrm{for}\left({\mathrm x}_{\mathrm j}-{\mathrm x}_{\mathrm k}\right)\;<\;0\end{array}\right.$$
(2)

If we assume that the selected data points are independent and randomly ordered, the mean of S is 0 and the variance of the Mann-Kendall statistic, Var(S), is given by

$$\mathrm{Var}\left(\mathrm{S}\right)=\frac{\left[n\left(n-1\right)\left(2n+5\right)-\sum_{p=1}^{q}{t}_{p}\left({t}_{p}-1\right)\left(2{t}_{p}+5\right)\right]}{18}$$
(3)

where n is the number of data points and q is the number of tied groups, the pth of which contains tp tied observations. A tied group is a set of identical values in the selected dataset. The standard normal test statistic (Z) is calculated as

$$\mathrm{Z}=\frac{S-\operatorname{sgn}(S)}{\sqrt{\mathrm{Var}(S)}}$$
(4)
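
As an illustration of Eqs. 1–4, the test can be sketched in a few lines of Python. This is a minimal sketch and not the implementation used in the study: the function names `sign`, `mann_kendall`, and `two_sided_p` are hypothetical, the series is assumed to be a plain list of numbers ordered in time, and the two-sided p-value is taken from the standard normal distribution.

```python
import math
from collections import Counter


def sign(v):
    """Return 1, 0, or -1 according to the sign of v (Eq. 2)."""
    return (v > 0) - (v < 0)


def mann_kendall(x):
    """Return (S, Var(S), Z) for the time-ordered series x (Eqs. 1, 3, 4)."""
    n = len(x)
    # Eq. 1: sum the signs of all pairwise differences x_j - x_k with j > k.
    s = sum(sign(x[j] - x[k]) for k in range(n - 1) for j in range(k + 1, n))
    # Eq. 3: variance of S, corrected for groups of tied values.
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18
    # Eq. 4: continuity-corrected standard normal statistic.
    z = (s - sign(s)) / math.sqrt(var_s) if var_s > 0 else 0.0
    return s, var_s, z


def two_sided_p(z):
    """Two-sided p-value of Z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

A trend would typically be called significant when `two_sided_p(z)` falls below the chosen significance level; the level itself is an analyst's choice and is not specified here.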

Sen’s slope (S.S.) estimates the magnitude of the trend as the change in measurement per unit change in time:

$$S.S.=\operatorname{Median}\left[\frac{w_j-{w}_i}{j-i}\right]$$
(5)

where wj and wi are the data values at times j and i, respectively, for all i < j.
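
Eq. 5 can be sketched as follows. The function name `sens_slope` is hypothetical, and the sketch assumes equally spaced observations so that list indices can stand in for the times i and j.

```python
from statistics import median


def sens_slope(w):
    """Median of the pairwise slopes (w_j - w_i) / (j - i) over all i < j (Eq. 5)."""
    n = len(w)
    slopes = [(w[j] - w[i]) / (j - i) for i in range(n - 1) for j in range(i + 1, n)]
    return median(slopes)
```

Because the estimate is a median of pairwise slopes, a single outlying observation has little effect on it, which is consistent with the robustness to outliers noted for the Mann-Kendall test.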

Based on the Mann-Kendall test, the significance of monthly, yearly, and seasonal temperature trends at the dam was assessed and is tabulated in Table 1. Table 1 indicates an increasing temperature trend in January, February, April, June, October, November, and December and a decreasing trend in the remaining months; in both cases, the trends were not statistically significant. Fig. 2 depicts the water temperature fluctuation at three time scales: monthly, yearly, and seasonal. The statistical test (Table 1) confirmed that a positive trend could be highlighted. In addition, the yearly and monthly water temperatures follow a rapidly ascending curve during the period of record, which is statistically significant after the dam project’s realization. Taking into account the water temperature anomalies, a linear fit to the mean yearly water temperature detected a non-significant but high trend of approximately 0.072°C per year for the average water temperature, and the seasonal water temperature increased by approximately 0.082°C during the period from 2010 to 2015. During autumn (September to October) and winter (November to March), positive trends of 0.165 and 0.206°C, respectively, were detected, whereas spring (April to May) and summer (June to August) showed negative trends of −0.030 and −0.015°C, respectively.

Table 1 Statistic and change percentage (2010–2015)
Fig. 2 Fluctuation and trend of monthly, yearly, and seasonal temperature at the Yangtze River in China (2010–2015)

Factory sites around the city, a rise in human activity, and a lack of green spaces and parks all contribute to the city’s warming. Furthermore, the mountains surrounding the city act as natural windshields, impeding air circulation and contributing to the city’s heat. The temperature simulation for the reservoirs essentially agrees with the recorded data and thus satisfies the accuracy requirement, allowing the developed model to accurately simulate the trends and evolution of the water temperature structure over space and time at the Xiangjiaba and Xiluodu reservoirs.

Water temperature stratification in the Xiangjiaba Reservoir varies seasonally. It was visible from April to August and vanished in the other months. The surface water temperature rises rapidly in spring, and the stratification steadily intensifies. Owing to the flood inflow to the reservoir, the bottom water temperature rose rapidly in summer. The thickness of the isothermal layer increased, and the temperature gradient decreased owing to strong vertical turbulent diffusion. With reservoir storage, the stratification dissolved in autumn. The study found that the water temperature distribution in the Xiangjiaba Reservoir is affected by the inflow temperature, meteorological elements, and intake elevation. The inflow temperature affects only the magnitude of the water temperature in the Xiangjiaba Reservoir, whereas the intake elevation and discharge mode have a more notable influence on the vertical water temperature structure in front of the dam. Meteorological elements control the water temperature within 10 m of the surface.

A lagging heating process was visible in spring after the impoundment of the Xiluodu Reservoir. The decrease in water temperature was relatively smooth in fall and winter. The daily amplitude of water temperature variation was reduced. The inflow temperature is nearly identical to the water temperature in front of the dam. The velocity in the Xiluodu Reservoir increased greatly owing to the increased inflow. The seasonal stratification of vertical water temperature in the Xiluodu Reservoir was noticeable and could be divided into epilimnion, thermocline, and hypolimnion. As inflow and water temperature increased in spring, the epilimnion depth increased, the thermocline thickness decreased, and the stratification strengthened. Owing to the deep-hole spillway, the thermocline moves slowly downward during the flood season. The hypolimnion range shrank gradually, while its water temperature remained stable at 14–15°C. The inflow temperature had little influence on the vertical water temperature structure in front of the dam, whereas the intake elevation significantly affected the thermocline depth in the Xiluodu Reservoir.

The Xiangjiaba Reservoir is located at the end of the cascade of reservoirs on the Jinsha River. Affected by the impoundment of the Xiluodu Reservoir, its water temperature change process lagged, and its heating and cooling processes were smoother. The congestion time and the spatially accumulated effect on water temperature were more significant. The impact of cascade reservoirs on downstream river water temperature can be summarized in two ways: the first is the homogenization effect, which refers to the decrease in the amplitude of annual water temperature variation, and the second is the lagging effect, which refers to the apparent delay in water temperature change.

According to the cumulative impacts on water temperature of the Xiangjiaba and Xiluodu reservoirs, the temperature of the water discharged from the Xiangjiaba Reservoir was below the lower limit of demand from March to May. The control measures are as follows: in the Xiangjiaba Reservoir, the left power station will be tested in operation from March to May, and the right power station from August to February. In the Xiluodu Reservoir, the stop-log gate will be enabled in March. The water level drawdown should then start from January and, as far as possible, reach 540 m before May 1.

Dataset

In the present study, we examine the variation of water temperature in the upper and middle reaches of the Yangtze River at Cuntan from 2010 to 2015. Two scenarios were analyzed in depth: pre-impact and post-impact. For Cuntan station, the period 2010–2012 was considered pre-impact, while 2013–2015 was considered post-impact. The descriptive statistics of the data for the two scenarios during the training and testing periods are reported in Table 2, and the inter-correlation among input variables is shown in Tables 3 and 4 for the pre- and post-impact scenarios, respectively.

Table 2 Statistics of measured daily water temperature at study stations
Table 3 Correlation matrix and multicollinearity statistics analysis result from pre-impact intercomparison input combination (variables) characteristics
Table 4 Correlation matrix and multicollinearity statistics analysis result from post-impact intercomparison input combination (variables) characteristics

Machine learning models

Random Subspace (RSS)

The RSS generates several representations that can create a wide diversity of decision agents (Li et al. 2011; Pham et al. 2018). Like bagging, RSS modifies the training set; more precisely, the change is made in the feature space and not in the example space. For a given p-dimensional vector Zj from the calibration dataset, i.e., (zj1, zj2, …, zjp), P features (P < p) are randomly chosen. This selection yields a P-dimensional random subspace of the original p-dimensional feature space. A new calibration dataset Z = (z1, z2, z3, …, zn) is then built from the initially p-dimensional training instances restricted to the selected features. Base-level classifiers are constructed on these subspaces, and a voting mechanism is used to obtain the final prediction.

This technique is adopted to boost the accuracy achieved by weak classifiers (Plumpton et al. 2012). The RSS introduces randomness into the problem formulation by selecting, at random, the variables used by each classifier (Li et al. 2011). The RSS algorithm is a robust ensemble of several different classifiers (Plumpton et al. 2012); integrating these weak classifiers yields a robust model (Al-rimy et al. 2019).

Furthermore, in line with stochastic discrimination theory, the RSS is similar to the bagging method in that random selection from the calibration dataset is adopted (García-Pedrajas and Ortiz-Boyer 2008); in the RSS, however, each learner is calibrated on a fixed-size subset of attributes (Hong et al. 2017). When building an RSS model, M patterns of size L were randomly chosen, without replacement, to train the aggregated classifiers. Each candidate example combines several single subsets representing an R subspace. A classifier is then calibrated using a single subset of the full training set (Pham et al. 2018). The parameters selected for modeling pre- and post-impact in the RSS algorithm are presented in Table 5.
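
The random subspace idea described above can be sketched as follows. This is a minimal illustration under stated assumptions and not the configuration reported in Table 5: the function names are hypothetical, the base learner is a simple 1-nearest-neighbour regressor chosen only for brevity, and, because water temperature is a continuous target, the base predictions are averaged instead of voted.

```python
import random


def fit_random_subspace(X, y, n_models=10, n_features=2, seed=0):
    """Draw one random feature subset (without replacement) per base learner."""
    rng = random.Random(seed)
    p = len(X[0])
    subsets = [sorted(rng.sample(range(p), n_features)) for _ in range(n_models)]
    # Each base learner is a 1-nearest-neighbour regressor, so "training" only
    # stores the training rows projected onto that learner's feature subset.
    return [([[row[f] for f in feats] for row in X], list(y), feats) for feats in subsets]


def predict_random_subspace(models, x):
    """Average the base-learner predictions for a single input vector x."""
    preds = []
    for X_sub, y, feats in models:
        q = [x[f] for f in feats]  # project the query onto the same subspace
        dists = [sum((a - b) ** 2 for a, b in zip(row, q)) for row in X_sub]
        preds.append(y[dists.index(min(dists))])
    return sum(preds) / len(preds)
```

Every base learner sees all training rows but only a subset of the columns, which is what distinguishes this scheme from bagging.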

Table 5 Machine learning algorithm parameters used for pre- and post-dam construction water temperature modeling

Reduced Error Pruning Tree (REPTree)

The REPTree is a decision tree learning algorithm. It builds a decision tree (DT) model by choosing splits that reduce the variance (or maximize the information gain), and reduced-error pruning is the critical goal of the training process. Because it divides the available instances, the REPTree can successfully handle missing data. Building a REPTree model requires four pieces of information: (i) the minimum number of instances for each leaf of the tree, (ii) the maximum tree depth, (iii) the minimum proportion of the training-set variance required for a split, and (iv) the number of folds used for pruning (Srinivasan and Mekala 2014; Witten et al. 2016).

It employs regression tree logic to generate multiple trees over successive iterations and then selects the best one (Rajesh and Karthikeyan 2017). Several authors have used the REPTree model to predict air pollution concentration (Oprea et al. 2016; Vitkar 2017). Furthermore, the REPTree employs a validation dataset to estimate the generalization error (Nhu et al. 2020; Pham et al. 2021). From a computational point of view, counteracting overfitting by pruning backward from the leaves is the sole responsibility of the pruning process. The essential benefit of the REPTree technique is that it reduces model complexity, avoids over-fitting during the learning phase, and maintains accuracy (Khosravi et al. 2018). The parameters selected for modeling pre- and post-impact in the REPTree algorithm are presented in Table 5.
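
The pruning step can be illustrated with a small sketch of reduced-error pruning on a one-feature regression tree. It is not the REPTree implementation used in the study: the names `grow`, `predict`, and `prune`, the dictionary-based tree, and the squared-error criterion are assumptions made for illustration. The tree is grown on one set of (x, y) pairs and pruned with a separate held-out set.

```python
def mean(v):
    return sum(v) / len(v)


def sse(rows, value):
    """Sum of squared errors when `value` is predicted for every (x, y) in rows."""
    return sum((y - value) ** 2 for _, y in rows)


def grow(rows, min_leaf=1):
    """Grow a regression tree on a single feature by minimising squared error."""
    node = {"value": mean([y for _, y in rows])}
    best = None
    xs = sorted({x for x, _ in rows})
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between neighbouring x values
        left = [r for r in rows if r[0] <= t]
        right = [r for r in rows if r[0] > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        err = sse(left, mean([y for _, y in left])) + sse(right, mean([y for _, y in right]))
        if best is None or err < best[0]:
            best = (err, t, left, right)
    # Split only if the best split strictly reduces the training error.
    if best is not None and best[0] < sse(rows, node["value"]):
        node.update(thr=best[1], left=grow(best[2], min_leaf), right=grow(best[3], min_leaf))
    return node


def predict(node, x):
    while "thr" in node:
        node = node["left"] if x <= node["thr"] else node["right"]
    return node["value"]


def prune(node, rows):
    """Bottom-up reduced-error pruning: collapse a subtree into a leaf when doing
    so does not increase the squared error on the held-out pruning rows."""
    if "thr" not in node:
        return node
    node["left"] = prune(node["left"], [r for r in rows if r[0] <= node["thr"]])
    node["right"] = prune(node["right"], [r for r in rows if r[0] > node["thr"]])
    subtree_err = sum((y - predict(node, x)) ** 2 for x, y in rows)
    if sse(rows, node["value"]) <= subtree_err:
        return {"value": node["value"]}
    return node
```

Note that `prune` modifies the tree it is given, so the unpruned tree should be copied first if both versions are needed.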

Random Forest (RF)

Random Forest (RF) is a powerful artificial intelligence technique developed by Breiman (2001) for measuring the importance of predictive parameters and producing accurate results without overfitting issues (Devasena 2014). It is a classifier composed of a collection of classification trees, each built on randomly selected variables. Every tree votes for a class, and the votes are then aggregated; the majority vote gives the outcome (Pavey et al. 2017). RF is used in both classification and regression situations. The algorithm can learn from complicated, large datasets.

In regression, a forest is grown from numerous regression trees, which are put together to build an ensemble (Breiman 2001). All trees are characterized by equal bias; however, the variance can be reduced by lowering the correlation between the trees (Hastie et al. 2009). The results are numerical values, and the training samples are assumed to be statistically independent.

The main advantages of the RF technique can be summarized as follows: (i) high generalization capacity, (ii) low sensitivity to the attribute values, and (iii) easy calibration using cross-validation. Mohsenzadeh Karimi et al. (2020) studied the ability of the RF methodology to simulate long-term monthly air temperature and examined its accuracy; their application examples showed the advantageous characteristics of the RF, including its higher accuracy. Several other researchers favor RF machine learning models for judging relevance, namely, in climatological, hydrological, and environmental studies (Rahman and Islam 2019; Salam et al. 2021; Saha et al. 2020). Islam et al. (2020) employed the RF model to investigate whether variables impact COVID-19 mortality in Bangladesh cities. The architecture and parameters selected for modeling pre- and post-impact in the RF algorithm are presented in Table 5.

M5 Pruned (M5P)

Model trees were developed by Quinlan (1992). Among the model trees developed, the M5P is the best-known algorithm for regression problems. Linear functions are used at the leaves instead of discrete class labels; M5P assumes that the functional dependence is not constant across the whole domain but can be treated as linear within smaller subdomains (Demir 2022). M5P is an upgraded version of the M5 technique. Its major feature is the efficient handling of large datasets with high dimensionality. If the training set is limited, the classification error rate may be large compared to the number of classes. The M5P method does not require parameter configuration and, as a result, does not require prior knowledge discovery. In hydrology, the M5P model has been used to model the stage-discharge relationship (Ajmera and Goyal 2012), for long-term streamflow forecasting (Yaseen et al. 2016) and lake level forecasting (Demir 2022), and to simulate the rainfall-runoff process (Solomatine and Xue 2004).

It is quick, straightforward, and accurate. M5P fits multivariate linear regression models at the leaves of a regression tree. As a result, it can reduce variation within a specific subspace. These model trees are reminiscent of piecewise linear functions. The M5P algorithm is also regarded as robust when dealing with missing data. The parameters selected for modeling pre- and post-impact in the M5P algorithm are presented in Table 5.

Statistical performance assessment

Performance evaluation has numerous real-world applications. When a consumer wants to buy a computer, for example, they must compare costs, CPU speed, RAM, pre-installed software, and other factors among several options before deciding which one to purchase. Similarly, when retrieving information on the Internet, we may ask which search engine will return the most relevant information for the given searches. In performance evaluation, hypotheses are selected or ranked based on a comparison of their performance on sample data (Leighton and Srivastava 1999). Performance measurements of hypotheses are numerical values that must be derived from sample data and may contain noise. Furthermore, in real-world applications, evaluating all hypotheses is typically impractical or impossible owing to time and resource restrictions. As a result, statistical measures are used to evaluate the performance of hypotheses efficiently using a small quantity of sample data. A variety of statistical metrics are available, and their conclusions depend on a number of criteria, including the size of the sample data and the distribution of the hypotheses’ performance measurements. It is therefore difficult to choose the most appropriate statistical measures.

Model evaluation and the comparison of actual and forecasted water temperature data were based on several performance metrics, namely, (i) the Pearson correlation coefficient (PCC), (ii) the mean absolute error (MAE), (iii) the root mean square error (RMSE), (iv) the relative absolute error (RAE), (v) the coefficient of determination (R2), and (vi) the root-relative square error (RRSE), calculated as follows (Shukla et al. 2021; Vishwakarma et al. 2022):

$$\mathrm{PCC}=\frac{\sum_{i=1}^{N}\left({\mathrm{Tw}}_{\mathrm{Obs},i}-{\overline{\mathrm{Tw}}}_{\mathrm{Obs}}\right)\left({\mathrm{Tw}}_{\mathrm{Est},i}-{\overline{\mathrm{Tw}}}_{\mathrm{Est}}\right)}{\sqrt{\sum_{i=1}^{N}{\left({\mathrm{Tw}}_{\mathrm{Obs},i}-{\overline{\mathrm{Tw}}}_{\mathrm{Obs}}\right)}^2\ \sum_{i=1}^{N}{\left({\mathrm{Tw}}_{\mathrm{Est},i}-{\overline{\mathrm{Tw}}}_{\mathrm{Est}}\right)}^2}}$$
(6)
$${R}^2={\left[\frac{\sum_{i=1}^{N}\left({\mathrm{Tw}}_{\mathrm{Obs},i}-{\overline{\mathrm{Tw}}}_{\mathrm{Obs}}\right)\left({\mathrm{Tw}}_{\mathrm{Est},i}-{\overline{\mathrm{Tw}}}_{\mathrm{Est}}\right)}{\sqrt{\sum_{i=1}^{N}{\left({\mathrm{Tw}}_{\mathrm{Obs},i}-{\overline{\mathrm{Tw}}}_{\mathrm{Obs}}\right)}^2\ \sum_{i=1}^{N}{\left({\mathrm{Tw}}_{\mathrm{Est},i}-{\overline{\mathrm{Tw}}}_{\mathrm{Est}}\right)}^2}}\right]}^2$$
(7)
$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^N{\left({\mathrm{Tw}}_{\mathrm{Obs},i}-{\mathrm{Tw}}_{\mathrm{Est},i}\right)}^2}$$
(8)
$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^N\left|{\mathrm{Tw}}_{\mathrm{Obs},i}-{\mathrm{Tw}}_{\mathrm{Est},i}\right|$$
(9)
$$\mathrm{RAE}=\frac{\sum_{i=1}^N\left|{\mathrm{Tw}}_{\mathrm{Obs},i}-{\mathrm{Tw}}_{\mathrm{Est},i}\right|}{\sum_{i=1}^N\left|{\mathrm{Tw}}_{\mathrm{Obs},i}-{\overline{\mathrm{Tw}}}_{\mathrm{Obs}}\right|}\times 100$$
(10)
$$\mathrm{RRSE}=\frac{\sqrt{\sum_{i=1}^N{\left({\mathrm{Tw}}_{\mathrm{Obs},i}-{\mathrm{Tw}}_{\mathrm{Est},i}\right)}^2}}{\sqrt{\sum_{i=1}^N{\left({\mathrm{Tw}}_{\mathrm{Obs},i}-{\overline{\mathrm{Tw}}}_{\mathrm{Obs}}\right)}^2}}$$
(11)

where \({\mathrm{Tw}}_{\mathrm{Obs},i}\) is the ith measured/actual value, \({\overline{\mathrm{Tw}}}_{\mathrm{Obs}}\) is the average of the observed/actual values, \({\mathrm{Tw}}_{\mathrm{Est},i}\) is the ith calculated value, \({\overline{\mathrm{Tw}}}_{\mathrm{Est}}\) is the average of the estimated/predicted values, and N is the total number of observations. The values of RMSE, MAE, RAE, and RRSE range from 0 to ∞, PCC from −1 to 1, and R2 from 0 to 1. Good forecasting accuracy corresponds to values of PCC and R2 nearly equal to 1, while the other metrics should be close to zero (Yaseen et al. 2016, 2018; Ayele et al. 2017; Shukla et al. 2021; Demir 2022; Pham et al. 2022; Vishwakarma et al. 2022). The performance interpretation of each of these measures is as follows:

  • The lower the values of MAE and RMSE, and the closer the MBE is to zero, the better the model performance (Vishwakarma et al. 2022).

  • For R2:

    Very good (0.7 < R2 ≤ 1); good (0.6 < R2 ≤ 0.7); satisfactory (0.5 < R2 ≤ 0.6); and unsatisfactory (R2 ≤ 0.5) (Ayele et al. 2017).

In the RAE, the total absolute error is normalized by dividing it by the total absolute error of a simple predictor (the mean of the observed values), whereas in the RRSE, the total squared error is normalized by dividing it by the total squared error of the same simple predictor. Taking the square root of the relative squared error reduces the error to the same dimensions as the quantity being predicted. Taylor diagrams, radar charts, and box plots were also used to visually compare model performance (Taylor 2001; Citakoglu 2021; Başakın et al. 2022; Görkemli et al. 2022). More details about model evaluation and comparison can be found in Kushwaha et al. (2021), Elbeltagi et al. (2022a), and Vishwakarma et al. (2022).
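
The six metrics can be computed together as sketched below. The function name `evaluate` is hypothetical. The sketch assumes the usual definitions: RMSE and MAE compare each observed value with its estimate, RAE and RRSE are normalized by the errors of a predictor that always returns the observed mean, and RAE is expressed as a percentage.

```python
import math


def evaluate(obs, est):
    """Return PCC, R2, RMSE, MAE, RAE (%), and RRSE for paired observed/estimated values."""
    n = len(obs)
    mo, me = sum(obs) / n, sum(est) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    so = sum((o - mo) ** 2 for o in obs)  # squared deviations of the observations
    se = sum((e - me) ** 2 for e in est)  # squared deviations of the estimates
    abs_err = sum(abs(o - e) for o, e in zip(obs, est))
    sq_err = sum((o - e) ** 2 for o, e in zip(obs, est))
    pcc = cov / math.sqrt(so * se)
    return {
        "PCC": pcc,
        "R2": pcc ** 2,
        "RMSE": math.sqrt(sq_err / n),
        "MAE": abs_err / n,
        "RAE": 100 * abs_err / sum(abs(o - mo) for o in obs),
        "RRSE": math.sqrt(sq_err / so),
    }
```

A constant bias illustrates why several metrics are reported together: estimates that are uniformly 0.5°C too high still give a PCC of 1, while RMSE and MAE both equal 0.5.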

Results and discussion

Temperature is one of the most significant parameters for evaluating the water environment, since water temperature fluctuation governs several freshwater processes. Reservoir impoundment alters the fluctuation and distribution of water temperature within the reservoir, as well as the annual water temperature change in the downstream river. At the same time, because reservoirs are relatively densely spaced in the cascade development mode, the influence of each single reservoir on water temperature inevitably combines with that of the others, resulting in a cumulative effect on water temperature. In this paper, two laterally averaged two-dimensional models were built for the Xiangjiaba and Xiluodu reservoirs in the downstream reach of the Jinsha River to simulate the water temperature. The model parameters were calibrated using 2014 temperature data and then validated using 2015 data. These models are well suited to simulating the hydrodynamic processes and spatial water temperature distributions of the two reservoirs.

Using simulation findings, the present research examined water temperature fluctuation over space and time in Xiangjiaba and Xiluodu reservoirs.

Furthermore, a cumulative effect evaluation method was built. The characteristics of cumulative effects of water temperature over space and time in the Xiangjiaba and Xiluodu reservoirs were evaluated. As a result, the downstream control approach for cumulative effects was developed.

Input variables selection for modeling of pre- and post-impact on water temperature

The success of machine learning models is mainly governed by a good selection of the best predictors, i.e., the best input variables (Malik et al. 2019; Shukla et al. 2021; Kushwaha et al. 2021; Elbeltagi et al. 2022a, b; Kumar et al. 2022). Given the available input variables, we believe that testing several input combinations is the most suitable procedure for obtaining the best final model; in addition, it provides a range of alternative models with different structures. As reported in Tables 6 (for pre-impact) and 7 (for post-impact), eight scenarios with different input variables were analyzed in the present study. The best input combinations are reported in bold. All combinations were ranked using several indices, namely Amemiya's prediction criterion (APC), Schwarz's Bayesian criterion (SBC), Akaike's information criterion (AIC), the MSE, Mallows' Cp (M-Cp), the R2, and the adjusted R2 (A-R2). According to Table 6, for the pre-impact scenario, the best model corresponds to the third input combination, which uses the first, second, and seventh lag times, i.e., (t-1), (t-2), and (t-7), and exhibits the most favorable statistical indices, with MSE, R2, A-R2, M-Cp, AIC, SBC, and APC values of approximately 0.145, 0.994, 0.994, 4.311, −2111.308, −2091.310, and 0.006, respectively. Similarly, for the post-impact scenario, as reported in Table 7, the best model was obtained when the input variables were the first eight successive lag times excluding the fifth, i.e., (t-1) to (t-4) in addition to (t-6) to (t-8), for which the MSE, R2, A-R2, M-Cp, AIC, SBC, and APC values were approximately 0.192, 0.992, 0.992, 7.002, −1801.833, −1761.838, and 0.008, respectively.

Table 6 The summary of best subset regression variables in (pre-impact)
Table 7 The summary of best subset regression variables in (post-impact)
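As a rough illustration of how lag combinations can be ranked with these indices, the sketch below fits an ordinary least squares model for every subset of lagged water temperatures and computes the MSE, R2, adjusted R2, Mallows' Cp, AIC, SBC, and Amemiya's PC. The function names are hypothetical, and the criterion formulas are common textbook forms assumed here; the software used in the study may define them slightly differently.

```python
from itertools import combinations

import numpy as np


def lag_matrix(series, max_lag):
    """Return (X, y): column k-1 of X holds the value at t-k, y holds the value at t."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    y = series[max_lag:]
    X = np.column_stack([series[max_lag - k:n - k] for k in range(1, max_lag + 1)])
    return X, y


def fit_sse(X, y):
    """Sum of squared residuals of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def best_subsets(series, max_lag=8):
    """Score every non-empty subset of lags 1..max_lag; best (lowest AIC) first."""
    X, y = lag_matrix(series, max_lag)
    n = len(y)
    sst = float(((y - y.mean()) ** 2).sum())
    # Error variance of the full model, used as the reference in Mallows' Cp.
    sigma2_full = fit_sse(X, y) / (n - max_lag - 1)
    rows = []
    for size in range(1, max_lag + 1):
        for lags in combinations(range(1, max_lag + 1), size):
            sse = fit_sse(X[:, [k - 1 for k in lags]], y)
            p = size + 1                      # parameters, including the intercept
            r2 = 1.0 - sse / sst
            rows.append({
                "lags": lags,
                "mse": sse / (n - p),
                "r2": r2,
                "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - p),
                "cp": sse / sigma2_full - n + 2 * p,
                "aic": n * np.log(sse / n) + 2 * p,
                "sbc": n * np.log(sse / n) + p * np.log(n),
                "apc": (1.0 - r2) * (n + p) / (n - p),
            })
    return sorted(rows, key=lambda row: row["aic"])
```

With eight candidate lags this evaluates 255 models, which is inexpensive; ranking by AIC is only one possible choice, and the same table can be sorted by any of the other indices.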

Sensitivity analysis

From the input variable selection reported above, it is clear that the contribution of each variable differs and that the input selection strongly influences model performance. Tables 8 and 9 and Figs. 3 and 4 present the standardized coefficients of the linear regression (SC-LR). According to Table 8, for the pre-impact scenario, the input variables corresponding to the three lag times, i.e., (t-1), (t-2), and (t-7), exhibited the highest absolute standardized coefficients, i.e., 1.084, 0.068, and −0.019, respectively. Similarly, for the post-impact simulation, the SC-LR values of the seven selected lags were 0.911, 0.080, 0.015, 0.047, 0.025, 0.006, and 0.005, respectively (Table 9).
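For readers unfamiliar with standardized coefficients, the following sketch shows one common way to obtain them: fit an ordinary least squares model and rescale each slope by the ratio of the predictor's standard deviation to that of the target. The function name `standardized_coefficients` is hypothetical, and this is an assumed definition rather than the exact procedure used to produce Tables 8 and 9.

```python
import numpy as np


def standardized_coefficients(X, y):
    """Return OLS slopes rescaled by sd(x_j) / sd(y), one per column of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])      # add the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Drop the intercept and express each slope in standard-deviation units.
    return beta[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
```

Because the rescaled slopes are unit-free, their absolute values can be compared directly across lags; with a single predictor the standardized coefficient reduces to the Pearson correlation.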

Table 8 Standardized coefficients and sensitivity analysis of linear regression of different input combinations in pre-impact
Table 9 Standardized coefficients and sensitivity analysis of linear regression of different input combinations in post-impact
Fig. 3 The standardized coefficients of the input variables for sensitivity analysis (pre-impact)

Fig. 4 The standardized coefficients of the input variables for sensitivity analysis (post-impact)

Modeling of pre- and post-impact on water temperature

The hybrid models, i.e., RSS, REPTree, RF, and M5P, were calibrated using the best input variables selected according to the findings reported in Tables 6 and 7. The selected models were calibrated on these input variables using 75% of the daily observed data and validated using the remaining 25%. Both goodness-of-fit measures and graphical presentations were used to assess the models' performance. Tables 10 and 11 describe the overall performance of all AI-based models throughout the calibration and testing stages for the estimation of daily observed water temperature at all stations using six statistical indicators. The statistical measures are also shown as radar charts in Figs. 5 and 6.
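The data preparation implied by this setup can be sketched as follows: lagged inputs are built from the daily series, and the samples are divided 75%/25%. The function names are hypothetical, and the split is assumed to be chronological (earliest samples for calibration), which the text does not state explicitly.

```python
def make_lagged_samples(series, lags):
    """Pair each value at time t with the values at t-k for every k in lags."""
    max_lag = max(lags)
    samples = []
    for t in range(max_lag, len(series)):
        inputs = [series[t - k] for k in lags]
        samples.append((inputs, series[t]))
    return samples


def chronological_split(samples, train_fraction=0.75):
    """Keep the first part for calibration and the remainder for validation."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


if __name__ == "__main__":
    daily_tw = [float(v) for v in range(20)]       # placeholder series, not real data
    pre_impact_lags = (1, 2, 7)                    # best pre-impact combination
    samples = make_lagged_samples(daily_tw, pre_impact_lags)
    train, test = chronological_split(samples)
    print(len(train), len(test))
```

A chronological split avoids using future observations to predict earlier ones; a random split would be an alternative if the original study shuffled the samples.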

Table 10 Statistical measurements of the proposed methods to forecast water temperature in pre-impact span
Table 11 Statistical measurements of the proposed methods to forecast water temperature in post-impact span
Fig. 5 Radar charts displaying the goodness-of-fit measures of the Random Subspace, REPTree, Random Forest, and M5P models during the (a) training and (b) testing periods for pre-impact water temperature

Fig. 6 Radar charts displaying the goodness-of-fit measures of the Random Subspace, REPTree, Random Forest, and M5P models during the (a) training and (b) testing periods for post-impact water temperature

Evaluation of the developed models in pre-impact water temperature forecasting

Using various assessment criteria, we examined the robustness of the proposed models during the calibration and testing stages (Table 10). All soft computing models were trained and evaluated using identical statistical techniques, and no overfitting was observed in any of them. As shown in Table 10, the M5P model has the highest accuracy during both the calibration and testing stages. Extremely strong prediction performance (R2 > 0.9) was achieved using all models. The highest performance was obtained using M5P, which exhibited the largest R2 value (0.9920); RF (0.9872) came second, REPTree (0.9872) third, and RSS (0.9862) fourth during the validation stage. In terms of RMSE, the M5P model obtained the lowest value, corresponding to the highest predictive accuracy (RMSE ≈ 0.3349); RF (0.4356) came second, REPTree (0.4365) third, and RSS (0.455) fourth during the validation stage. Similarly, based on the MAE criterion, the M5P model (MAE ≈ 0.2384) worked best, RF (0.3206) came second, REPTree (0.3232) third, and RSS (0.9938) fourth during the validation stage.

The RAE and RRSE were likewise lowest for M5P (6.2573 and 8.0288, respectively), followed by RF (8.404 and 10.1421, respectively), REPTree (8.4729 and 10.163, respectively), and RSS (8.9076 and 10.592, respectively). The statistical metrics are also presented as a radar chart in Fig. 5. All four hybrid models performed excellently, and their performance lines overlap on the radar chart, showing that the models perform similarly to one another. However, a closer examination of the data indicated that M5P outperformed the remaining models in estimating daily water temperature on all six statistics at all study locations during the pre-impact span.

The time series and scatter plots of the measured and estimated daily water temperature in the calibration and testing stages for all proposed models are depicted in Figs. 7 and 8, showing a good match during the two stages. All models achieved high predictive accuracy, but M5P followed the fluctuation of the water temperature most closely in both the training and testing stages of the pre-impact span.

Fig. 7 Comparison of the measured/actual and predicted water temperature for the training and testing datasets using a–b Random Subspace; c–d REPTree; e–f Random Forest; and g–h M5P for the pre-impact time series

Fig. 8 Scatter plots of observed vs. predicted water temperature in the training and testing phases: a–b Random Subspace; c–d REPTree; e–f Random Forest; and g–h M5P during the pre-impact time

Fig. 9 Comparison of the measured/actual and predicted water temperature for the training and testing datasets using a–b Random Subspace; c–d REPTree; e–f Random Forest; and g–h M5P for the post-impact time series

Fig. 10 Scatter plots of observed vs. predicted water temperature in the training and testing phases: a–b Random Subspace; c–d REPTree; e–f Random Forest; and g–h M5P during the post-impact time

We further analyzed model efficiency using box and whisker plots (Fig. 11a) and Taylor diagrams (Fig. 12). In the box and whisker plots, the maximum and minimum values predicted by M5P were approximately equal to those of the measured data. In contrast, RSS, REPTree, and RF slightly underestimated water temperature. The quartiles, median, mean, and standard deviation of the predictions of all models were close to those of the measured data, with M5P showing the best agreement.

Fig. 11 Box and whisker plots of the models: (a) pre-impact and (b) post-impact

Fig. 12 Taylor diagrams of the models during the pre-impact span: (a) pre-impact training and (b) pre-impact testing

In Taylor diagrams (Fig. 12), the closer a model's point lies to the observed point, the better its performance. All models had a strong predictive capacity in this case. However, the M5P approach provided the highest R and lowest RMSE values. The SD of the M5P model was close to the SD of the actual values, whereas the SD of the RSS and RF models was lower, followed by the REPTree model.
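The three quantities summarized by a Taylor diagram can be computed as in the sketch below, which also reflects the geometric relation that makes the diagram possible (Taylor 2001): the squared centered RMS difference equals σ_obs² + σ_est² − 2·σ_obs·σ_est·R. The function name `taylor_statistics` is hypothetical. Note that the centered RMS difference removes the mean bias, so it coincides with the RMSE only when observed and predicted means are equal.

```python
import math


def taylor_statistics(obs, est):
    """Return (sd_obs, sd_est, r, centered_rms) for paired series."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_est = sum(est) / n
    dev_obs = [o - mean_obs for o in obs]
    dev_est = [e - mean_est for e in est]
    # Population standard deviations (divide by n) keep the identity exact.
    sd_obs = math.sqrt(sum(d * d for d in dev_obs) / n)
    sd_est = math.sqrt(sum(d * d for d in dev_est) / n)
    r = sum(a * b for a, b in zip(dev_obs, dev_est)) / (n * sd_obs * sd_est)
    centered_rms = math.sqrt(
        sum((b - a) ** 2 for a, b in zip(dev_obs, dev_est)) / n
    )
    return sd_obs, sd_est, r, centered_rms


if __name__ == "__main__":
    observed = [10.0, 12.0, 15.0, 19.0, 22.0]      # illustrative values only
    predicted = [10.5, 11.5, 15.5, 18.0, 21.0]
    sd_o, sd_e, r, crms = taylor_statistics(observed, predicted)
    # Law-of-cosines identity underlying the diagram.
    print(abs(crms ** 2 - (sd_o ** 2 + sd_e ** 2 - 2 * sd_o * sd_e * r)) < 1e-9)
```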

Evaluation of the developed models in post-impact water temperature forecasting

The models' performance during the post-impact span is summarized in Table 11 in terms of six statistical metrics. All four hybrid models performed significantly better than the baseline model in predicting water temperature on all six statistics in the post-impact phase. The comparison of the relative performances of the hybrid models showed that their predictions were quite close to the observed values. Our findings (Table 11) revealed that these models are acceptable and provide good results on the testing data. However, considering the R2 and PCC, M5P was the most accurate, with values of approximately 0.9708 and 0.9853, followed by RSS and RF, which are equal (R2 = 0.9704, PCC = 0.9851), and REPTree (0.9661 and 0.9829). The MAE, RMSE, RAE, and RRSE were 0.4212, 0.5969, 11.3469, and 13.9353, respectively, for RSS; 0.439, 0.6006, 11.8284, and 14.0229 for RF; and 0.464, 0.6442, 12.5022, and 15.0396 for REPTree. Low MAE, RMSE, RAE, and RRSE values together with near-ideal R2 and PCC values indicate better predictive performance. As indicated in Table 11, the M5P model performs excellently in estimating daily water temperature for the post-impact span.

As seen in Fig. 6, the statistical measures are also provided as a radar chart. Excellent accuracies were achieved using the four proposed models. The radar chart shows that the performance lines of all four models overlap, suggesting that the models perform similarly to one another in terms of overall performance. However, an in-depth examination of the data indicated a slight superiority of the M5P model over the others in projecting daily water temperature according to the six statistical metrics at all study sites during the post-impact span.

Figures 9 and 10 show that the values predicted by the suggested soft computing algorithms are consistent with the observed values in both the time series and the scatter plots. These plots demonstrate that the proposed models can accurately predict water temperature. With the M5P model, the measured and predicted data points lay close to one another, almost one on top of the other, indicating high fitting capability. We also examined model efficiency using box and whisker plots (Fig. 11b) and Taylor diagrams (Fig. 13). Figure 11b shows the box and whisker plots of the models. As in the pre-impact case, the maximum and minimum values predicted by M5P were very close to the actual values, whereas Random Subspace, REPTree, and Random Forest slightly underestimated the water temperature. The M5P model also outperformed the others in terms of the quartiles, median, mean, and standard deviation.

Fig. 13 Taylor diagrams of the models during the post-impact span: (a) post-impact training and (b) post-impact testing

In the Taylor diagrams of Fig. 13, a model should be considered better the nearer its point lies to the observed point. On this basis, the M5P algorithm was the strongest model in terms of forecasting capability and performance, which is reflected by its high PCC and lowest RMSE. In addition, M5P was the only model with an SD approximately equal to that of the measured data, whereas the RSS, RF, and REPTree models all exhibited a smaller SD.

Discussion

The results obtained in the present study are very encouraging and promising. Although all models were more accurate for the pre-impact span than for the post-impact span, the overall numerical performances revealed the suitability of the proposed machine learning models as a robust tool for water temperature prediction. According to the results discussed above, the mean PCC, MAE, and RMSE values were 0.994, 0.306 °C, and 0.418 °C for the pre-impact span and 0.985, 0.438 °C, and 0.568 °C for the post-impact span. These are superior to the PCC, RMSE, and MAE values reported by Heddam et al. (2020), i.e., 0.980, 1.413 °C, and 1.085 °C, respectively, and the superiority of the M5P, RSS, RF, and REPTree models is most evident in the error metrics, i.e., the RMSE and MAE values. In a recently published paper, Heddam et al. (2022) reported that river water temperature can be predicted with sufficient accuracy by hybrid machine learning combined with signal decomposition; the best PCC, RMSE, and MAE values they obtained were 0.980, 1.304 °C, and 1.018 °C, respectively, which indicate lower accuracy than the values obtained in the present study. In another study, Yousefi and Toffolon (2022) compared long short-term memory (LSTM), RF, ERT, K-nearest neighbor (KNN), decision tree (DT), adaptive neuro fuzzy inference system (ANFIS), multi-layer perceptron neural network (MLPNN), and support vector regression (SVR) for predicting river water temperature, and they reported that none of the models was able to reduce the RMSE below 1.400 °C, which highlights the value of the modelling framework reported in the present study.

Conclusions

According to the numerical results obtained in this study, we can conclude that dam reservoirs contributed significantly to the alteration of the thermal water regimes. In particular, they are responsible for the continuous and progressive water heating in the downstream river environment. Consequently, models that simulate reservoir behavior, together with continuous monitoring and control of water temperature, are needed. Accurate forecasting of the water temperature variation in dams and lakes may help in building and managing dams and in planning the use of dam and lake water. To predict the daily water temperature fluctuation of the Yangtze River at Cuntan, China, we developed and tested several artificial intelligence models, namely RSS, REPTree, RF, and M5P, using several input variable combinations. The best input combination was found to be water temperature measured at three lag times, i.e., (t-1), (t-2), and (t-7), for pre-impact and (t-1) to (t-8) with the exclusion of (t-5) for post-impact. Our findings indicated that M5P outperformed all other models, exhibiting the best forecasting accuracy with the lowest MAE, RMSE, RAE, and RRSE and the greatest R2 and PCC. Furthermore, model validation based on graphical analysis revealed that histograms and scatter plots of the data points demonstrate the superiority of M5P, for which the data were less scattered than for the other models, indicating that it has potential for broader use in water temperature prediction.