Introduction

Freshwater resources play an indispensable role in sustaining our ecosystems due to their multifaceted contributions that include agricultural irrigation, economic progress through tourism, preservation of natural aesthetics, preservation of biodiversity, and many other important factors (Barzegar et al. 2021; Buyukyildiz et al. 2014; Choi et al. 2019; García Molinos et al. 2015; Zhu et al. 2020a). Despite providing myriad benefits to human societies and biodiversity, freshwater resources are inherently limited. Freshwater, which is essential to support life on Earth, makes up approximately 2.5% of the water that covers about 70% of the planet’s surface (Shiklomanov 2000). Freshwater is often challenging to access due to its predominant presence in glaciers and ice caps, which account for approximately 68.7% of the global freshwater supply (Gleick 1993; Shiklomanov 2000). Freshwater lakes hold about 20.9% of the 1.2% of Earth’s freshwater that is available as surface water (Gleick 1993). According to the United States Environmental Protection Agency, the Great Lakes of North America, constituting 21% of the world’s surface freshwater, are the largest group of freshwater systems on Earth (US EPA 2023). The Great Lakes, encompassing Lake Erie, Lake Huron-Michigan, Lake Ontario, Lake Superior, and Lake St. Clair, possess a combined volume of approximately 22,671 km³. Native Americans have historically relied heavily on these lakes as a source of water. The Great Lakes region was home to a large population of American Indians who fished, hunted, and used water for all manner of purposes (Hudson and Ziegler 2014). However, they respected and cherished these lakes, and as a result they kept them pristine. Both the surroundings near the lakes and the lakes themselves have seen significant change during the past few hundred years. 
The greatest contributors to environmental change near the Great Lakes have been industrialization leading to commercial and agricultural growth, followed by climate change and its potential long-term effects (Hartig et al. 2020; Mortsch 1998; Mortsch and Quinn 1996).

Several decades ago, Keddy and Reznicek (1986) discussed how water level fluctuations can alter ecosystems, especially vegetation. In their study, they compared periods of low water level with periods of high water level. Their observations suggest that during low water level periods, the soil becomes less anoxic, plants attempt to adapt to living in drier conditions, and vegetation changes generally as plants that cannot survive in dry conditions gradually give way to species that tolerate them. The opposite process can be seen at times of high water. According to earlier studies, the water level in some Great Lakes could rise or fall (depending on the pattern) by 4.5 feet (1.37 m) over the next hundred years (Annin 2006; Hall and Stuntz 2008). Water level changes are a particularly important issue due to potential environmental changes, so monitoring and recording the water level in these large freshwater reservoirs over an extended period of time is essential. In addition to having a significant impact on businesses, homes, and habitats of aquatic species in the Great Lakes region, these changes have the potential to change the weather, resulting in prolonged rain. Therefore, analyzing water level records can help manage the water resources of the Great Lakes effectively. The Great Lakes Environmental Research Laboratory (GLERL), a facility of the National Oceanic and Atmospheric Administration (NOAA) Office of Oceanic and Atmospheric Research (OAR), has been involved in a number of observation and monitoring initiatives undertaken so far (https://www.glerl.noaa.gov/data/dashboard/data/). Starting in the year 1860, GLERL has been collecting information on the water levels of all the lakes in the Great Lakes basin, with the exception of Lake St. Clair, where water levels have been monitored since 1898. 
In addition to water levels data, NOAA has also been collecting data on air temperature, evaporation and precipitation in the Great Lakes basin over an extended period of time.

In recent years, significant research has been conducted to study water level fluctuations of lakes in various parts of the world, including the Great Lakes, using machine learning methods. Machine learning (ML) is a subfield of artificial intelligence (AI) that allows us to train the computer to accurately predict the output for given inputs (Ciaburro and Iannace 2021; Janiesch et al. 2021; Jordan and Mitchell 2015). Recently, ML algorithms have become very popular, especially in engineering and science, and are used in numerous applications such as image recognition, medicine, language translation, computer vision, social media platforms, and others (Demir and Yaseen 2023; Injadat et al. 2021; Jordan and Mitchell 2015; Mirzania et al. 2023a; Sarker 2021; Sharma et al. 2021; Zhou et al. 2023). However, the application of ML algorithms to predict water levels in lakes and reservoirs is still under development. Zhu et al. (2023a) developed two deep data-driven models, namely gated recurrent unit (GRU) and long short-term memory (LSTM), coupled with an attention mechanism for forecasting daily lake water levels in Poland. Their study revealed that LSTM with attention mechanism generally outperforms GRU with attention mechanism, although on average across different lakes, GRU emerges as the best-performing deep learning model. Furthermore, zero-order forecast models excel at one-day-ahead prediction, while deep learning models demonstrate improved performance with longer prediction horizons. Zhu et al. (2020b) employed a feedforward neural network (FFNN) and a deep learning (DL) technique to predict monthly lake water levels in Poland. Their results demonstrated that the FFNN model slightly outperformed the DL model, suggesting that traditional ML models are sufficient for predicting water levels if properly trained. Saroughi et al. (2024) utilized the Shannon entropy method and developed a hybrid model for predicting groundwater level (GWL) in the Tabriz plain of Iran. 
They reported that the performance of the standalone model significantly improved with the proposed method, with the Honey Badger algorithm (HBA)-ANN performing marginally better than the Coot-ANN model. Additionally, Mirzania et al. (2023b) employed a hybrid algorithm of innovative gunner-support vector regression (AIG-SVR) model and a standalone SVR model to estimate daily reference evapotranspiration (ET0) in Australia. Their results showed that the AIG-SVR provides better results than the classic SVR. Similarly, Mirzania et al. (2023c) developed a hybrid COOT-ANN predictive model and evaluated its performance in predicting ET0 in Australia against that of the standalone ANN model. Their findings indicated that the COOT-ANN hybrid model surpasses the ANN model in performance. Bonakdari et al. (2019) employed several AI models such as the Minimax Probability Machine Regression (MPMR), Relevance Vector Machine (RVM), Gaussian Process Regression (GPR), and Extreme Learning Machine (ELM) to predict water level fluctuations in Lake Huron. The study reported that the MPMR is the best model for predicting water level fluctuations in Lake Huron. Similarly, Altunkaynak (2014) estimated water level fluctuations in Lake Michigan-Huron using a combination of three methods: wavelet transform, fuzzy logic, and multilayer perceptron (artificial neural network). In addition, Wang and Wang (2020) used a set of ML algorithms, namely Gaussian process (GP), multiple linear regression (MLR), multilayer perceptron (MLP), M5P model tree, random forest (RF), and k-nearest neighbor (KNN), to predict the water level in Lake Erie. While these studies are useful for estimating the water levels of Lake Huron, Lake Michigan-Huron, and Lake Erie, they do not provide complete statistical information about all the lakes in the Great Lakes basin. Furthermore, Coulibaly (2010) compared water level predictions for only the major lakes (excluding Lake St. Clair) in the Great Lakes region using several neural networks such as echo state network (ESN), recurrent neural networks (RNN), and Bayesian neural networks (BNN).

This study addresses the correlation between water levels and key meteorological features, including air temperature, evaporation, and precipitation, to accurately predict fluctuations in water levels in the Great Lakes. To achieve this goal, the study employs various models, namely multiple linear regression (MLR), the nonlinear autoregressive network with exogenous inputs (NARX), Facebook Prophet (FB-Prophet), and long short-term memory (LSTM), all of which are based on historical datasets. The selection of these models for water level prediction was based on their respective strengths and suitability for the task. MLR, chosen for its simplicity and interpretability, captures linear relationships between predictor variables and water levels effectively. NARX models accommodate nonlinear relationships and temporal dependencies, which are crucial for capturing the complex dynamics of water level fluctuations (Zhu et al. 2023b). LSTM’s capability to capture long-term dependencies in sequential data is essential for modeling the intricate temporal relationships inherent in water level prediction tasks. These models were preferred over others due to their alignment with the specific requirements and objectives of the study, highlighting their effectiveness in addressing the complexities of water level forecasting. In this investigation, air temperature, evaporation, precipitation, and lagged water levels from the preceding month were selected as the independent variables for analysis. Simultaneously, current water levels were identified as the dependent variable, serving as the target for predictive modeling efforts within the Great Lakes basin. Based on the comprehensive evaluation metrics and accurately predicted results, the findings strongly suggest that the NARX model emerges as a well-suited and reliable tool for predicting lake water levels in the Great Lakes. 
These results hold significant implications for enhancing our understanding of the region’s hydrological dynamics and can be instrumental in guiding effective water resource management strategies. Although prior studies have contributed valuable insights into water level prediction within the Great Lakes basin, they have not comprehensively addressed statistical information for all lakes within the region. To the best of my knowledge, this study is the first to provide detailed assessments for all lakes, including Lake St. Clair, in the Great Lakes basin using a variety of models.

Materials and methods

Study area

Freshwater is a vital necessity for the survival of human life and a wide range of biota. The distribution of global water, including freshwater, is illustrated in Fig. 1(a) (USGS 1993). The Great Lakes, also known as the Laurentian Great Lakes, straddle the border between the United States (59%) and Canada (41%) and account for 20.9% of global surface freshwater (Bonakdari et al. 2019; Gleick 1993; Xue et al. 2022). These lakes are Lake Erie, Lake Huron-Michigan, Lake Ontario, Lake Superior, and Lake St. Clair, as shown in Fig. 1(b).

Fig. 1

(a) Distribution of global water and (b) A map of the study area showing the location of the Great Lakes

It is important to note that Lake Huron and Lake Michigan have the same hydrological characteristics and are connected to each other by the Straits of Mackinac, and hence are treated as a single lake (Anderson and Schwab 2013). The Great Lakes basin covers an area of approximately 764,046 km² (295,000 mi²) (Neff and Nicholas 2005). It stretches for about 1126 km (700 miles) from north to south and about 1448 km (900 miles) from the west to the outlet of Lake Ontario at Cornwall and Massena in the east. The region encompasses eight states (Minnesota, Wisconsin, Illinois, Indiana, Michigan, Ohio, Pennsylvania, and New York) and one province, Ontario, in addition to the Great Lakes (Neff and Nicholas 2005; Wilcox et al. 2007). The Great Lakes basin is home to approximately 33 million people, representing roughly 10% of the U.S. population and 30% of the Canadian population (Danz et al. 2007; Wilcox et al. 2007). Lake Superior is the largest individual lake in terms of volume and surface area in the Great Lakes basin, with a surface area of approximately 82,100 km² (31,700 mi²). However, when Lake Michigan and Lake Huron are combined into a single lake, they not only become the largest lake in the region but also claim the title of the world’s largest freshwater lake by surface area, with a total surface area of approximately 117,400 km² (45,300 mi²). On the other hand, Lake St. Clair, which is considered part of the Great Lakes system, is the smallest lake, with a surface area of approximately 1,114 km² (430 mi²). The physical characteristics and basic statistical parameters of all lakes in the Great Lakes basin are given in Table 1 (Inn and Port Huron 1999; US EPA 2023).

Table 1 Physical and statistical characteristics of lakes in the Great Lakes basin

Data source

A time series, comprising chronologically ordered observations, serves as a valuable resource for researchers across diverse fields, including engineering, physical sciences, and social sciences (Ghaderpour et al. 2021; Moraffah et al. 2021; Parzen 1961). In the present study, monthly mean time series data of water level, air temperature, evaporation, and precipitation of the Great Lakes, including Lake St. Clair, for the period from 1950 to 2010 were used. These monthly time series data were acquired from the NOAA website. All analyses were conducted in Python (version 3.9.13) and MATLAB (version R2022a) environments. The flowchart of the proposed methodology for water level prediction is illustrated in Fig. 2.

Fig. 2

The flowchart of the study

When working with ML models, there isn’t a single, universally applicable method for input selection (Babel and Shinde 2011). The body of research demonstrates that researchers have thus far employed heuristic methodologies, sensitivity analysis, and linear cross-correlation techniques (Bowden et al. 2005; Piasecki et al. 2015). As hydrological factors such as air temperature, evaporation, and precipitation have an impact on lake water levels, the selection of input for forecasting Great Lakes water levels was made depending on the specific ML models utilized.

Before using the raw data for some ML models, the data were normalized using the min-max normalization method and scaled between 0 and 1 with the following equation:

$${x}_{norm}=\frac{x-{x}_{min}}{{x}_{max}-{x}_{min}}$$
(1)

Here, \({x}_{norm}\) represents the normalized data, \(x\) represents the observed data, and \({x}_{min}\) and \({x}_{max}\) denote the minimum and maximum values of the observed data, respectively.
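As an illustration of Eq. 1, the following minimal Python sketch scales a series to the range 0 to 1 and inverts the scaling so that predictions can be returned to the original units. The function names and the sample values are hypothetical and are not taken from the study's code; mapping a constant series to zero is an added assumption, since Eq. 1 is undefined in that case.

```python
def min_max_normalize(values):
    """Scale a sequence of numbers to the range [0, 1] following Eq. 1."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Constant series: Eq. 1 is undefined, so map everything to 0.0 (assumption).
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


def min_max_denormalize(normalized, x_min, x_max):
    """Invert Eq. 1 to recover values in the original units."""
    return [x * (x_max - x_min) + x_min for x in normalized]


# Hypothetical monthly mean water levels in metres (illustrative values only).
levels = [174.0, 174.5, 175.0, 174.25]
scaled = min_max_normalize(levels)
restored = min_max_denormalize(scaled, min(levels), max(levels))
```

In practice, the minimum and maximum would usually be taken from the training portion only and reused for the validation and testing portions, so that no information from the test period leaks into the scaling.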

There are several approaches to employing data partitioning in ML. When training models, the given data is typically divided into three distinct sets: training, validation, and testing, usually with ratios of 70/15/15 or 80/10/10, respectively. In this study, the datasets were randomly split into training, validation, and testing with a ratio of 70/15/15.
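A 70/15/15 partition can be sketched as follows. This example assumes an ordered (chronological) split in which the three blocks are contiguous in time, which is a common choice for time series; it is only one possible reading of the partitioning, and the function name and default fractions are illustrative.

```python
def chronological_split(series, train_frac=0.70, val_frac=0.15):
    """Split an ordered series into train/validation/test blocks without shuffling."""
    n = len(series)
    n_train = int(n * train_frac)          # first block: training
    n_val = int(n * val_frac)              # second block: validation
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]        # remainder: testing
    return train, val, test


# 61 years of monthly records (1950 to 2010) give 732 time steps.
months = list(range(732))
train, val, test = chronological_split(months)
```

Because the block sizes are rounded down, the testing block absorbs any leftover samples.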

Prediction model

ML has recently gained popularity and proven to be a useful tool for classification, clustering, pattern recognition, and prediction in a variety of fields (Lee et al. 2018; Mirzania et al. 2023a; Saroughi et al. 2024; Wu et al. 2014). Additionally, artificial neural networks (ANNs), a subset of ML, serve as nonlinear statistical data models that mimic the functionality of biological neurons (Ghimire et al. 2021; Zhu et al. 2023b). While numerous studies have significantly benefited from the use of classical approaches, ML methods, particularly ANNs, offer several advantages over them in time series forecasting because they can handle nonlinear data that do not follow a normal distribution (Hansen et al. 1999). Moreover, ML can efficiently process extensive datasets and identify potential interactions among predictor variables (Ghiasi and Koushki 2020).

In the course of this study, diverse models, particularly MLR, NARX, FB-Prophet, and LSTM, were employed to predict lake water levels within the Great Lakes basin. The optimization of each model was carried out by adjusting the hyperparameters (Jamous et al. 2021). Each of the models for predicting water levels is briefly described below.

Multiple linear regression (MLR)

MLR, an extension of linear regression, is a statistical approach employed in data analysis and modeling to investigate the correlation between a dependent variable and two or more independent predictors (Choden et al. 2022; Uyanık and Güler 2013). The following expression presents the equation that characterizes the MLR model:

$$Y={\beta }_{0}+{\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+{\beta }_{3}{X}_{3}+\dots +{\beta }_{p}{X}_{p}+\epsilon$$
(2)

where \(Y\) is the estimated value of the dependent variable, \({\beta }_{0}\) is the \(Y\)-intercept, \({\beta }_{1},{\beta }_{2},{\beta }_{3},\dots ,{\beta }_{p}\) represent regression coefficients of independent predictors \({X}_{1},{X}_{2},{X}_{3}, \dots , {X}_{p}\), and \(\epsilon\) is the error term. In the context of MLR, water level values with a one-month lag were employed for predicting the water levels in lakes.
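To make Eq. 2 concrete, the sketch below fits the coefficients by ordinary least squares (solving the normal equations) with a one-month lagged water level and one meteorological variable as predictors. It is a pure-Python illustration under stated assumptions: the helper names are hypothetical, the data are synthetic and were generated from known coefficients, and the study's actual implementation is not reproduced here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_mlr(rows, targets):
    """Least-squares fit of Eq. 2; returns [beta_0, beta_1, ..., beta_p]."""
    design = [[1.0] + list(r) for r in rows]         # leading 1.0 models the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_mlr(beta, row):
    """Evaluate Eq. 2 (without the error term) for one predictor row."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Synthetic series built from level[t] = 1.0 + 0.5 * level[t-1] + 2.0 * precip[t].
levels = [2.0, 4.0, 3.0, 4.5, 4.25, 3.125]
precip = [0.0, 1.0, 0.0, 1.0, 0.5, 0.0]
rows = [(levels[t - 1], precip[t]) for t in range(1, len(levels))]  # one-month lag
targets = levels[1:]
beta = fit_mlr(rows, targets)
```

Since the synthetic data follow the generating relation exactly, the fitted coefficients should recover the intercept 1.0 and the slopes 0.5 and 2.0 up to rounding error.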

Nonlinear autoregressive network with exogenous inputs (NARX)

The NARX model extends the nonlinear autoregressive (NAR) model by adding one or more relevant time series as exogenous inputs to the forecasting model. The NARX, which is based on the linear ARX model, is a recurrent dynamic neural network with feedback connections enclosing several layers of the network (Demuth et al. 1992). The equation describing the NARX model is given by the following expression:

$$y(t)=f\left(y(t-1),\,y(t-2),\dots ,\,y(t-{n}_{y}),\,u(t-1),\,u(t-2),\dots ,\,u(t-{n}_{u})\right)$$
(3)

where \(y\left(t\right)\) is the dependent (predicted) output value, \({n}_{u}\) is the number of time delays in the input, \({n}_{y}\) is the number of time delays in the output, \(y\left(t-1\right),y\left(t-2\right),\dots ,y\left(t-{n}_{y}\right)\) are previous values of the output, \(u\left(t-1\right),u\left(t-2\right),\dots ,u\left(t-{n}_{u}\right)\) are previous values of the independent (exogenous) input, and \(f\) is typically a nonlinear function. In the NARX model, the dependent output value \(y\left(t\right)\) is predicted based on previous values of the output and previous values of the independent (exogenous) input, as shown in Eq. 3. The NARX model can be implemented by approximating the function \(f\) with a feedforward neural network. The architecture of the NARX neural network is shown in Fig. 3. Here, IW1,1 is the connection weight between the input neuron and hidden neuron; LW1,3 is the connection weight between the hidden neuron and output feedback neuron; LW2,1 is the connection weight between the hidden neuron and predicted output; b1 is the bias of the hidden neuron; b2 is the bias of the predicted output; f1 is the hidden layer activation function; and f2 is the output layer activation function. In this architecture, the approximation is performed using a two-layer feedforward network (Demuth et al. 1992). In this model, previous water level values along with the aforementioned independent exogenous inputs were used to predict future values of the lake water levels. The model was trained with 10 neurons in the hidden layer and with input and feedback delays of 1:2. The network was trained with the Levenberg-Marquardt backpropagation algorithm, which is discussed in detail elsewhere (Lv et al. 2017). Although the maximum number of epochs was set to 1000, the validation criteria were met between 15 and 35 epochs for all lakes.
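The structure of Eq. 3 can be illustrated by assembling the regressor that is passed to \(f\) at each time step. The following Python sketch only builds these lagged inputs and their targets; the function name is hypothetical, the default delays of two steps are an assumed reading of the 1:2 delay setting, and the neural network that approximates \(f\) is deliberately left out.

```python
def build_narx_samples(y, u, n_y=2, n_u=2):
    """Assemble (regressor, target) pairs for Eq. 3.

    y: output series (e.g. water level).
    u: list of exogenous series, each the same length as y.
    Each regressor holds y(t-1)..y(t-n_y) followed by u_k(t-1)..u_k(t-n_u)
    for every exogenous series u_k.
    """
    start = max(n_y, n_u)                  # first t with a full set of lags
    inputs, targets = [], []
    for t in range(start, len(y)):
        row = [y[t - d] for d in range(1, n_y + 1)]
        for series in u:
            row.extend(series[t - d] for d in range(1, n_u + 1))
        inputs.append(row)
        targets.append(y[t])
    return inputs, targets


# Tiny illustrative series: one output and one exogenous input.
y = [10, 11, 12, 13, 14]
u = [[1, 2, 3, 4, 5]]
inputs, targets = build_narx_samples(y, u)
```

During training, the lagged outputs are taken from observations as above; in multi-step forecasting, they would instead be replaced by the network's own previous predictions through the feedback connection.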

Fig. 3

Schematic diagram of nonlinear autoregressive network with exogenous inputs (NARX) architecture

Facebook prophet (FB-Prophet)

FB-Prophet, developed by Facebook’s data science team as open source, is a forecasting tool designed for time series analysis and forecasting (Battineni et al. 2020; ChikkaKrishna et al. 2022). FB-Prophet finds extensive application across various domains, including business, finance, and economics, for forecasting purposes. The primary methodology employed by FB-Prophet involves a decomposable time series model that encompasses three key model components: trend, seasonality, and holidays (Daraghmeh et al. 2021; Garlapati et al. 2021). The combination of these three components can be mathematically represented by the following equation:

$$y\left(t\right)=g\left(t\right)+s\left(t\right)+h\left(t\right)+{\epsilon }_{t}$$
(4)

where \(y\left(t\right)\) represents the output of the additive regression model, \(g\left(t\right)\) represents the trend function (or growth term), \(s\left(t\right)\) describes seasonality, \(h\left(t\right)\) captures the effects of holidays, and \({\epsilon }_{t}\) is an error term. For training the FB-Prophet model, the interval width, changepoint prior scale, seasonality prior scale, changepoint range, and number of uncertainty samples were set to 0.9, 2, 9, 0.6, and 1000, respectively.

Long short-term memory (LSTM)

The LSTM represents a specialized variant within the domain of recurrent neural networks (RNNs) (Sherstinsky 2020; Wunsch et al. 2021). It is specifically engineered for the management and analysis of sequential data, rendering it highly suitable for applications associated with time series data, natural language processing, and a variety of other sequential data processing tasks (Cao et al. 2018; Sagheer and Kotb 2019; Wunsch et al. 2021; Yang and Zhang 2022). The LSTM addresses certain limitations present in conventional RNNs, including the vanishing gradient problem, which has the potential to prevent deep networks from being trained on long sequences. In that sense, the LSTM is distinguished from traditional RNNs by its ability to capture and remember long-range dependencies in data. A typical LSTM unit comprises four main components: a cell, an input gate, an output gate, and a forget gate, which allow them to selectively store and retrieve information over time (Mohan and Gaitonde 2018; Wunsch et al. 2021). The schematic diagram of the recurrent neural network (top) and the LSTM architecture (bottom) is shown in Fig. 4, and the corresponding LSTM equations are as follows:

$${f}_{t}=\sigma \left({W}_{f} \cdot \left[{h}_{t-1},{x}_{t}\right]+{b}_{f}\right)$$
(5)
$${i}_{t}=\sigma \left({W}_{i} \cdot \left[{h}_{t-1},{x}_{t}\right]+{b}_{i}\right)$$
(6)
$${\stackrel{\sim}{C}}_{t}=tanh\left({W}_{C} \cdot \left[{h}_{t-1},{x}_{t}\right]+{b}_{C}\right)$$
(7)
$${C}_{t}={f}_{t}\odot {C}_{t-1}+{i}_{t}\odot {\stackrel{\sim}{C}}_{t}$$
(8)
$${o}_{t}=\sigma \left({W}_{o} \cdot \left[{h}_{t-1},{x}_{t}\right]+{b}_{o}\right)$$
(9)
$${h}_{t}={o}_{t}\odot \text{tanh}\left({C}_{t}\right)$$
(10)

Here, \({f}_{t}\) represents the forget gate, which specifies what information from the prior cell state \(\left({C}_{t-1}\right)\) should be forgotten. The input gate \({i}_{t}\) controls what new information should be added to the cell state \(\left({C}_{t}\right)\), and \({\stackrel{\sim}{C}}_{t}\) represents the new candidate values to be added to the cell state. The cell state \({C}_{t}\) is updated by combining the retained part of the previous cell state with the new information. The output gate \({o}_{t}\) regulates what information from the cell state \(\left({C}_{t}\right)\) should be output as the hidden state \(\left({h}_{t}\right)\), which is computed from the cell state and the output gate. In the above equations, \(\sigma\) is the sigmoid activation function, tanh is the hyperbolic tangent activation function, \({W}_{f}, {W}_{i},{W}_{C}, {W}_{o}\) represent weight matrices for the forget gate, input gate, candidate cell state, and output gate, respectively, \({b}_{f},{b}_{i},{b}_{C},{b}_{o}\) represent the corresponding bias vectors, \({x}_{t}\) is the input at time step \(t\), and \({h}_{t-1}\) is the hidden state at the previous time step \(t-1\). The products in Eqs. 8 and 10 are taken element by element. These equations provide a comprehensive description of how an LSTM cell manages sequential data, maintains and refreshes information, and governs the transmission of data to subsequent time steps or the output. In this model, one-month lagged water level values and three stacked LSTM layers with varying units (128, 64, and 64) were utilized to predict the water levels of lakes. The Rectified Linear Unit (ReLU) and Adam were used as the activation function and optimizer, respectively, with 100 epochs and a batch size of 64.
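For readers who prefer code to notation, a single forward step of an LSTM cell following Eqs. 5 to 10 can be sketched in plain Python as below. This is an illustrative sketch only: the function names and the dictionary layout of the parameters are hypothetical, the weights in the example are set to zero purely to keep the arithmetic transparent, and the stacked three-layer network used in the study is not reproduced.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vector):
    """Return W . v + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of an LSTM cell following Eqs. 5-10."""
    concat = h_prev + x_t                                   # [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(params["W_f"], params["b_f"], concat)]      # Eq. 5
    i_t = [sigmoid(z) for z in affine(params["W_i"], params["b_i"], concat)]      # Eq. 6
    c_hat = [math.tanh(z) for z in affine(params["W_C"], params["b_C"], concat)]  # Eq. 7
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]          # Eq. 8
    o_t = [sigmoid(z) for z in affine(params["W_o"], params["b_o"], concat)]      # Eq. 9
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                            # Eq. 10
    return h_t, c_t


# One hidden unit and one input feature; zero weights make every gate equal 0.5.
zero_params = {name: [[0.0, 0.0]] for name in ("W_f", "W_i", "W_C", "W_o")}
zero_params.update({name: [0.0] for name in ("b_f", "b_i", "b_C", "b_o")})
h_t, c_t = lstm_step(x_t=[0.3], h_prev=[0.1], c_prev=[2.0], params=zero_params)
```

With all gates at 0.5 and a zero candidate value, the cell state is simply halved, which makes the role of the forget gate in Eq. 8 easy to verify by hand.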

Fig. 4

The schematic diagram of the recurrent neural network (top) and the LSTM architecture (bottom)

The parameters of each algorithm used for predicting water levels in the Great Lakes region are given in Table 2.

Table 2 Parameters of each model for the Great Lakes water level prediction

Model performance evaluation

In this study, four widely used statistical indicators were employed to assess the effectiveness of the models used to forecast lake water levels in the Great Lakes basin. These four indicators are defined as follows:

i. Root mean square error (RMSE) measures the residuals between the predicted values and the observed values. The RMSE is non-negative and is expressed in the same units as the data, with a value of 0 indicating a perfect fit. Higher prediction accuracy is associated with a lower RMSE value. The mathematical expression for the RMSE is given by the following equation:

$$RMSE={\left[\sum _{i=1}^{n}\frac{{\left({y}_{i}-\widehat{{y}_{i}}\right)}^{2}}{n}\right]}^{\frac{1}{2}}$$
(11)

where n is the total number of data points, \({y}_{i}\) is the observed (actual) value, and \(\widehat{{y}_{i}}\) denotes the predicted value.

ii. Mean absolute error (MAE) measures the average magnitude of the errors between the observed and predicted values. A lower MAE signifies higher model performance. The equation for MAE can be expressed as:

$$MAE=\left(\frac{1}{n}\right)\sum _{i=1}^{n}\left|{y}_{i}-\widehat{{y}_{i}}\right|$$
(12)

where n is the total number of data points, \({y}_{i}\) is the observed value, and \(\widehat{{y}_{i}}\) is the predicted value.

iii. The mean absolute percentage error (MAPE) measures the average absolute percentage difference between the observed values and the predicted values. A lower MAPE signifies an improved predictive model because it indicates that the predictions are closer to the actual values in terms of percentage error. The equation for MAPE can be written as:

$$MAPE=\left(\frac{1}{n}\right)\sum _{i=1}^{n}\left|\frac{{y}_{i}-\widehat{{y}_{i}}}{{y}_{i}}\right|\times 100\%$$
(13)

where n is the total number of data points, \({y}_{i}\) is the observed value, and \(\widehat{{y}_{i}}\) is the predicted value.

iv. Coefficient of determination (R2), which ranges from 0 to 1, expresses how accurately a statistical model forecasts an outcome. Forecasting performance improves as the value of R2 approaches 1. The expression for R2 is given by:

$${R}^{2}={\left[\frac{\sum _{i=1}^{n}\left({y}_{i}-\overline{y}\right)\left(\widehat{{y}_{i}}-\overline{\widehat{y}}\right)}{\sqrt{\sum _{i=1}^{n}{\left({y}_{i}-\overline{y}\right)}^{2}\sum _{i=1}^{n}{\left(\widehat{{y}_{i}}-\overline{\widehat{y}}\right)}^{2}}}\right]}^{2}$$
(14)

Here, n is the total number of data points, and \({y}_{i}\), \(\overline{y}\), \(\widehat{{y}_{i}}\), and \(\overline{\widehat{y}}\) represent the observed value, the mean of the observed values, the predicted value, and the mean of the predicted values, respectively. The bracketed ratio is the Pearson correlation coefficient between the observed and predicted values, so its square lies between 0 and 1.
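The four indicators can be computed with a few lines of Python, as in the sketch below. The function names and the sample values are hypothetical. R2 is computed here as the squared Pearson correlation between the observed and predicted values, which is an assumption about the intended definition, and MAPE is undefined whenever an observed value equals zero.

```python
import math


def rmse(obs, pred):
    """Root mean square error (Eq. 11)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error (Eq. 12)."""
    return sum(abs(y - p) for y, p in zip(obs, pred)) / len(obs)


def mape(obs, pred):
    """Mean absolute percentage error in percent (Eq. 13); obs must be nonzero."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((y - mean_obs) * (p - mean_pred) for y, p in zip(obs, pred))
    ss_obs = sum((y - mean_obs) ** 2 for y in obs)
    ss_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (ss_obs * ss_pred)


# Illustrative observed and predicted values (not data from the study).
obs = [2.0, 4.0, 6.0, 8.0]
pred = [2.5, 3.5, 6.5, 7.5]
scores = {
    "RMSE": rmse(obs, pred),
    "MAE": mae(obs, pred),
    "MAPE": mape(obs, pred),
    "R2": r_squared(obs, pred),
}
```

Because lake water levels are large in absolute terms relative to their month-to-month variation, MAPE values tend to be very small for such data, so RMSE and MAE are often the more informative error measures.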

Results

To estimate the monthly water level fluctuations in Lakes Erie, Huron-Michigan, Ontario, Superior, and St. Clair, located within the Great Lakes basin, a data partitioning strategy was employed. The dataset was subdivided into three distinct sets, consisting of a training set (70%), a validation set (15%), and a testing set (15%). The initial training phase of each model utilized the training dataset, spanning from 01/01/1950 to 01/10/2001. During this phase, the model learned how to process information and make predictions. Subsequently, hyperparameter optimization was performed using the validation dataset. Once the model was constructed and optimized using the training and validation datasets, test data covering the period from 01/11/2001 to 01/12/2010 were employed to assess its predictive capabilities. A diverse set of performance indicators, including RMSE, MAE, MAPE, and R2, was employed to assess the effectiveness and accuracy of the various models.

Figure 5 displays graphical representations illustrating the partitioning of data for all lakes in the Great Lakes region. The training and testing parts are represented by solid magenta and solid blue lines, respectively. For the sake of clarity, a vertical black dashed line is used to distinguish the training section from the testing section.

Fig. 5

Data partitioning of monthly water level of lakes in the Great Lakes region. The training parts are represented by solid magenta while the test parts are denoted by solid blue lines. A vertical black dashed line is used to distinguish the training section from the test section

Figure 6 shows regression fits for a comparative analysis between the observed and estimated monthly water levels for each model across all lakes in the Great Lakes basin. Each row corresponds to a lake, while each column is associated with a model. The results obtained from the MLR, NARX, and LSTM models are in good agreement with the observed water levels, yet the best performance is achieved by the NARX model for all lakes. The regression lines of the NARX model for all lakes have a slope of almost unity, indicating that the predicted values closely match the observed values.

Fig. 6

Scatter plots of observed versus predicted water levels for each model during the testing period

Referring to the performance metrics in Table 3 for all models across each lake, the NARX model consistently exhibits the best values for RMSE, MAE, MAPE, and R2. Additionally, application of the NARX model yields the most favorable performance metrics for Lake Huron-Michigan, with the lowest values recorded for RMSE (0.029), MAE (0.022), and MAPE (0.013%), alongside the highest R2 (0.995) value, when compared to all other lakes.

Table 3 Performance metrics of each model for the Great Lakes water level prediction

The performance of each model is visually presented in Fig. 7, where bar graphs depict the RMSE, MAE, and MAPE for all models across every lake, excluding R2 for clarity due to its significantly larger values compared to other performance indicators. While comparing performance metrics for all lakes, it is evident that the NARX outperforms other models. However, the least satisfactory water level predictions across all lakes are observed with FB-Prophet.

Fig. 7

Performance metrics for the prediction capabilities of each model

Similarly, Fig. 8 displays quantitative analysis plots for the observed and predicted water levels as a function of time for various models across all lakes. The blue solid line shows the observed water level, while the green dotted line, red dashed line, purple square, and mustard dash-dotted line represent the predicted water levels of MLR, NARX, FB-Prophet, and LSTM, respectively. The time series plots for all lakes clearly demonstrate that the predicted water level values for all models except FB-Prophet are in excellent agreement with the observed water level values, as the predicted lines closely follow the trend of the observed line.

Fig. 8

Scatter plots of observed versus predicted water levels as a function of time during the testing period

Figure 9 shows notched boxplots illustrating the comparison between observed and predicted lake water levels for each model. The notched boxplots clearly demonstrate that each model, across all lakes, could accurately predict median water levels, as their notches overlap with those of the observed water levels. Similarly, each model, with the exception of FB-Prophet, nearly captured both the first (Q25) and third (Q75) quartiles of the observed water level values and accurately predicted the maximum values of the observed water levels for all lakes. As for the minimum water level values across all lakes, the best prediction was obtained using the NARX and LSTM models. Furthermore, an outlier beyond the whiskers is clearly visible for MLR and LSTM in the case of Lake St. Clair. Although all models except FB-Prophet provided good accuracy for water level prediction in all lakes, the overall best performance was achieved using the NARX model.

Fig. 9

Notched boxplots showing comparison between observed and predicted lake water levels for each model
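The notch comparison described above can be made concrete with a short sketch. It assumes the widely used convention in which the notch spans median ± 1.57 × IQR / √n, so that overlapping notches suggest the two medians do not differ appreciably. Whether the plotting software used in this study follows exactly this convention is an assumption, and the function names and sample values are hypothetical.

```python
import math
import statistics

def notch_interval(values):
    """Return (lower, upper) notch bounds around the median.

    Uses the common convention median +/- 1.57 * IQR / sqrt(n);
    this is an assumed convention, not confirmed by the source.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return med - half_width, med + half_width

def notches_overlap(a, b):
    """True if the notch intervals of samples a and b intersect."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a

# Hypothetical water levels (m); NOT data from this study.
observed = [174.05, 174.10, 174.20, 174.30, 174.35, 174.25]
predicted = [174.07, 174.12, 174.18, 174.28, 174.33, 174.27]

print(notch_interval(observed))
print(notches_overlap(observed, predicted))
```

Overlapping notches are a visual heuristic rather than a formal test, so they complement, but do not replace, the error metrics reported in Table 3.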

Discussion

The prediction and monitoring of freshwater levels are essential for both foreseeing and addressing the challenges posed by water scarcity, floods, and environmental degradation. This enables the effective management of resources and promotes sustainable development. There has been great interest in studies focusing on predicting freshwater levels of the Great Lakes. According to the evaluation metric values presented in Table 3, the predictions generated by the ML models in this study demonstrate an acceptable level of accuracy when compared to findings from previous studies. For instance, Coulibaly (2010) utilized various neural networks to forecast water levels for the major lakes in the Great Lakes region, except for Lake St. Clair, and assessed the accuracy of the predictions using RMSE and correlation coefficient (r). In his study, the best RMSE values obtained for Lakes Erie, Huron-Michigan, Ontario, and Superior are 0.06, 0.04, 0.08, and 0.03, respectively. These values are higher than those presented in Table 3 in this study, with the exception of Lake Superior, where the value is almost the same. In addition, Altunkaynak (2014) predicted water levels for Lake Michigan-Huron, attaining an RMSE value of 0.112 by employing a combination of three methods. This value significantly exceeds the one presented in Table 3 in this study. In a recently published paper, Wang and Wang (2020) used various ML models to predict Lake Erie water levels and evaluated them using RMSE, MAE, r, and mutual information (MI) performance metrics. The RMSE and MAE values achieved for Lake Erie in their study are 0.02 and 0.01, respectively, which closely resemble those obtained in this study. Similarly, Barzegar et al. (2021) conducted water level forecasting in Lakes Ontario and Michigan utilizing various ML models, which were evaluated based on numerous metrics. Their best outcomes for Lake Ontario were RMSE and MAE values of 0.082 and 0.064, respectively, which are higher than the values obtained in this study.

The findings of this study suggest that the NARX model has produced highly promising and encouraging results compared to other models. Performance assessments across various metrics during the testing period revealed significant improvements with the NARX model. Specifically, for Lake Erie, RMSE decreased by up to 21.3% and 11.1%, MAE decreased by up to 25% and 18.2%, MAPE decreased by up to 22.2% and 16%, and R2 increased by up to 10.4% and 7.6% compared to the MLR and LSTM models, respectively. Similarly, for Lake Huron-Michigan, RMSE decreased by up to 32.6% and 29.3%, MAE decreased by up to 37.1% and 33.3%, MAPE decreased by up to 35% and 31.6%, and R2 increased by up to 6.8% and 5.9% compared to the MLR and LSTM models, respectively. Furthermore, for Lake Ontario, RMSE decreased by up to 39.6% and 34.4%, MAE decreased by up to 42% and 37.3%, MAPE decreased by up to 43.1% and 38%, and R2 increased by up to 16.4% and 12.7% compared to the MLR and LSTM models, respectively. Concerning Lake Superior, RMSE decreased by up to 21% and 30.6%, MAE decreased by up to 16.1% and 27.8%, MAPE decreased by up to 17.6% and 30%, and R2 increased by up to 5.3% and 8.3% compared to the MLR and LSTM models, respectively. Lastly, for Lake St. Clair, RMSE decreased by up to 14.6% and 13.6%, MAE decreased by up to 23.1% and 25.4%, MAPE decreased by up to 21.6% and 23.7%, and R2 increased by up to 33.8% and 32.2% compared to the MLR and LSTM models, respectively.
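The relative improvements quoted above follow from simple percentage-change arithmetic, sketched below under stated assumptions. The NARX RMSE of 0.048 for Lake Erie is the value reported in this study; the baseline of 0.061 is a hypothetical placeholder chosen for illustration and is not read from Table 3. The function names are likewise illustrative.

```python
def pct_decrease(baseline, improved):
    """Percentage reduction of an error metric relative to a baseline model."""
    return 100.0 * (baseline - improved) / baseline

def pct_increase(baseline, improved):
    """Percentage gain of a score such as R2 relative to a baseline model."""
    return 100.0 * (improved - baseline) / baseline

narx_rmse = 0.048              # reported for Lake Erie in this study
assumed_baseline_rmse = 0.061  # ASSUMPTION: illustrative value, not from Table 3

reduction = pct_decrease(assumed_baseline_rmse, narx_rmse)
print(f"RMSE reduction relative to the assumed baseline: {reduction:.1f}%")
```

Error metrics (RMSE, MAE, MAPE) improve when they decrease, whereas R2 improves when it increases, which is why the two directions are handled by separate helper functions.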

Furthermore, to provide a comprehensive understanding of water level prediction performance across different models, visual comparisons are presented in both Figs. 6 and 8. These figures offer detailed insights into how the MLR, NARX, FB-Prophet, and LSTM models perform in capturing the underlying trends of the actual data. From the visualization, it is evident that all three models (MLR, NARX, and LSTM) exhibit commendable abilities to capture the overall trend of the observed data. However, upon closer examination, the NARX model emerges as the most accurate in fitting the data, showcasing its superior predictive capabilities. Notably, although the performance of all models, except for FB-Prophet, is considered satisfactory, particularly for Lakes Erie, Huron-Michigan, and Superior, it is the NARX model that consistently outperforms the others across all lakes in the Great Lakes basin. This highlights the robustness and reliability of the NARX model in predicting water levels across various hydrological settings.

Conclusions

The sustainability of our ecosystems significantly depends on the vital contribution of freshwater, highlighting the crucial need for monitoring water levels and ensuring the effective management of freshwater resources. The present study reports the prediction of monthly mean water levels for lakes in the Great Lakes basin based on historical datasets, including air temperature, evaporation, and precipitation, using a variety of models. By utilizing a comprehensive range of evaluation metrics, which encompass RMSE, MAE, MAPE, and R2, the effectiveness of four prominent models (MLR, NARX, FB-Prophet, and LSTM) was systematically evaluated across five significant lakes: Erie, Huron-Michigan, Ontario, Superior, and St. Clair. The results of the current study reveal that the predictions of all models, except for FB-Prophet, are in good agreement with the observed water levels, particularly for Lakes Erie, Huron-Michigan, and Superior. However, the predictive performance of MLR and LSTM is diminished for Lakes Ontario and St. Clair (Table 3). While MLR and LSTM offer satisfactory performance in predicting water levels in the Great Lakes, the NARX model achieves the best overall performance across all lakes. In the case of Lake Erie, NARX emerges as the standout performer, with the lowest RMSE (0.048), MAE (0.036), and MAPE (0.021%) values, coupled with an R2 score of 0.977. Similarly, for Lake Huron-Michigan, NARX once again stands out, with an RMSE of 0.029, MAE of 0.022, MAPE of 0.013%, and an R2 value of 0.995. For Lake Ontario, NARX again demonstrates strong performance, with an RMSE of 0.061, MAE of 0.047, MAPE of 0.062%, and R2 of 0.960, despite exhibiting slightly higher error metrics compared to other lakes. This observation underscores the complex interplay between the dynamics of the model and the unique characteristics of each lake. Meanwhile, in the case of Lake Superior, both NARX and LSTM exhibit commendable predictive accuracy, underscoring their efficacy in capturing the complex hydrological dynamics inherent to the region (Table 3). However, prediction is more challenging for Lake St. Clair, where predictive performance encounters significant obstacles across all models. Despite this, NARX maintains relatively strong performance metrics with an RMSE of 0.076, MAE of 0.050, MAPE of 0.029%, and R2 of 0.953, solidifying its position as the leading model for water level prediction in the Great Lakes basin. The findings of this research suggest that the current study can help effectively manage water resources and advance the knowledge of water level prediction in the Great Lakes region. This study focuses solely on the correlation between water levels and meteorological features such as air temperature, evaporation, and precipitation for water level prediction in the Great Lakes basin. Other relevant variables like wind speed and direction, humidity, atmospheric pressure, and solar radiation, which could impact water levels, have not been taken into account. Further studies will therefore be performed to investigate the inclusion of these factors to enhance the understanding of their correlation with water levels and their influence on prediction accuracy. Future research will also include the development of machine learning hybrid models to enhance the analysis of water level prediction in the Great Lakes basin.