The development of dissolved oxygen forecast model using hybrid machine learning algorithm with hydro-meteorological variables

Dissolved oxygen (DO) forecasting is essential for aquatic managers responsible for maintaining ecosystem health and managing water bodies affected by water quality parameters. This paper forecasts DO concentration in the Surma River using a multivariate adaptive regression spline (MARS) hybrid model coupled with the maximal overlap discrete wavelet transform (MODWT) as a feature decomposition approach, driven by a set of water quality and hydro-meteorological variables. The proposed hybrid model is compared with several machine learning methods, namely Bayesian ridge regression (BNR), k-nearest neighbours (KNN), kernel ridge regression (KRR), random forest (RF), and support vector regression (SVR). The results show that the proposed MODWT-MARS model predicts better than the benchmark models and its standalone counterparts (r = 0.981, WI = 0.990, RMAE = 2.47%, and MAE = 0.089). This hybrid method may serve to forecast water quality variables with fewer predictor variables. Supplementary Information: The online version contains supplementary material available at 10.1007/s11356-022-22601-z.


Introduction
The deterioration of the quality of water sources throughout the world is considered a wide-reaching issue of importance. Because of the rapid rise of communities and the diversity of their activities, this deterioration is speeding up, and it could constitute a severe threat to the aquatic environment and human health (Henderson et al. 2009; Hur and Cho 2012; Mouri et al. 2011; Su et al. 2011).
The dissolved oxygen (DO) in water is a critical water quality variable for the proper functioning of the aquatic ecosystem (Ranković et al. 2010). DO indicates the level of water pollution in rivers (Heddam and Kisi 2018; Mohan and Kumar 2016) and the state of river ecosystems (Mellios et al. 2015; Ranković et al. 2010). The DO concentration in an aquatic system reflects the metabolism of that system, that is, the transient balance between the oxygen supply and metabolic activity. It is affected by a variety of parameters, including salinity, temperature, and pressure (US-Geological-Survey 2016). Because the dynamics of DO are nonlinear, researchers have investigated its concentration and variation over the last decade (Kisi et al. 2020). Water resource managers therefore need a DO model for rivers that can reliably quantify and predict DO concentrations from hydro-meteorological variables.
Various methods are available for estimating the DO concentration, but most are time-consuming and expensive because they require numerous parameters that are often not readily available (Suen and Eheart 2003). Moreover, conventional data processing techniques are no longer appropriate for water quality modelling, because many parameters affecting water quality interact in a complicated, nonlinear manner (Ahmed 2017; Xiang et al. 2006). Developing a water quality model for small streams or rivers is particularly difficult because of the lack of available data and investment and the many different inputs to consider. Well-known water quality models, such as the United States Environmental Protection Agency (USEPA) models QUAL2E and QUAL2K, and WASP6, require a great deal of information that is not always readily available (Ahmed 2017). These models are also complex and sensitive and, therefore, difficult to identify.
This study investigates the use of multivariate adaptive regression splines (MARS) (Friedman 1991) to describe the intrinsically nonlinear and multidisciplinary relationships in DO dynamics. Like neural networks, MARS requires no prior information on the form of the numerical function. Its benefit is that it handles complex data by grouping related data, which makes the resulting model easy to understand (Zhang and Goh 2016). Owing to these attributes, the MARS model has been used in hydrology (Deo et al. 2017b; Heddam and Kisi 2018; Kisi and Parmar 2016; Yin et al. 2018) and the energy sector (Al-Musaylh et al. 2019). Heddam and Kisi (2018) applied the least-square support vector machine (LSSVM), multivariate adaptive regression splines, and the M5 model tree (M5T) to daily dissolved oxygen forecasting. The authors found MARS to be an effective forecasting approach with a limited number of predictor variables. Incorporating hybrid approaches and a suitable feature selection algorithm may therefore further improve the forecasts. Nevertheless, hybrid MARS models have yet to be applied at study sites in Bangladesh.
Multi-resolution analysis (MRA), a technique for extracting data features, can significantly enhance prediction performance. Empirical mode decomposition (EMD) decomposes a signal, in the spirit of the Fourier series, into a specific number of components. In CEEMDAN-based decomposition, a coefficient representing Gaussian white noise with unit variance is added sequentially to the time series to reduce its complexity (Di et al. 2014; Prasad et al. 2018). Previous studies have used CEEMDAN to forecast soil moisture (Ahmed et al. 2021a; Prasad et al. 2018, 2019), with an earlier version (i.e. EEMD) used to forecast stream-flow (Seo and Kim 2016) and rainfall (Beltrán-Castro et al. 2013; Jiao et al. 2016; Ouyang et al. 2016). The discrete wavelet transform (DWT) has been employed in different fields of hydrology (Deo and Sahin 2016; Nourani et al. 2009, 2014). However, the DWT cannot extract all the features of the predictors in their entirety. An enhanced DWT, such as the MODWT, can overcome this limitation (Cornish et al. 2006; Prasad et al. 2017; Rathinasamy et al. 2014). Al-Musaylh et al. (2020) successfully used MODWT to decompose the short-term electricity demand of Australia. That study applied the MODWT separately to the training, validation, and testing partitions to calculate the detail and approximation components, as Quilty and Adamowski (2018) prescribed. The potential of MODWT is further supported by Prasad et al. (2017), who used it to forecast stream-flow. However, neither the MODWT nor the DWT has been combined with the MARS model for DO forecasting, as attempted in this study.
Neighbourhood component analysis (NCA) for regression was used as the feature selection technique in this investigation. Extraneous and redundant features slow the algorithm down and make the prediction model less accurate (Arhami et al. 2013), so different feature selection methods have been utilised in predictive models (Ahmed et al. 2021a; Prasad et al. 2017, 2019). Ahmed et al. (2021a) successfully applied the NCA method to forecast surface soil moisture, demonstrating that the feature weights calculated by NCA were effective, as did Ghimire et al. (2019b), who applied NCA to solar radiation forecasting. Combining a machine learning method with NCA feature selection and feature decomposition is therefore expected to substantially improve DO forecasting performance.
To the author's knowledge, there has been no systematic comparison of feature decomposition strategies for improving MARS performance in daily DO estimation. The fundamental contribution of this study is the selection of an appropriate feature decomposition algorithm (i.e. MODWT, DWT, CEEMDAN, EEMD, or EMD) for a tailored MARS model for DO prediction. While effective adjustment of MARS parameters via feature decomposition algorithms can increase prediction accuracy, combining feature selection with feature decomposition can help decision-makers choose the best prediction model. Because attempting all available optimisation techniques is practically impossible, the scope of the current study is limited to a few promising algorithms to be merged with MARS. The goals of this study are therefore to (1) use five feature decomposition techniques to improve the predictive ability of MARS, (2) compare the performance of the hybridised MARS models, and (3) rank the hybridised MARS models using hydro-meteorological variables. The findings of this work can provide valuable information for better water management.

Multivariate adaptive regression spline
The multivariate adaptive regression spline (MARS), a non-parametric and nonlinear regression technique introduced by Friedman (1991), was utilised in this investigation. MARS fits multiple spline segments joined at knots (Friedman 1991) and makes no assumption about the underlying functional link between inputs and outputs. The data in each spline segment are represented by basis functions (BFs), each of which can be expressed as a single equation between two knots. Two adjacent data domains meet at a knot, where the output is continuous. An adaptive regression algorithm is used to select the knots (Heddam and Kisi 2018). The MARS model thus depicts the relationship between the input and output variables as a piecewise function composed of numerous line segments. Over-fitting of the training data is avoided by setting a predefined minimum number of observations between knots (Heddam and Kisi 2018).
Let y be the target output and x = (x_1, …, x_p) the vector of p input variables. The data are presumed to be generated from an unknown 'true' model. For a continuous response, this is

$$y = f(x_1, \ldots, x_p) + \varepsilon = f(x) + \varepsilon$$

in which $\varepsilon$ is the model error and n is the number of training data points. MARS approximates f(·) by adding sufficient BFs. These are piecewise linear functions of the form max(0, x − t), where a knot exists at position t (Zhang and Goh 2016). The max(·) operator means that only the positive part of its argument is used; otherwise a zero value is given:

$$\max(0, x - t) = \begin{cases} x - t, & \text{if } x \ge t \\ 0, & \text{otherwise} \end{cases}$$

Thus, f(x) is constructed as a linear combination of the BFs:

$$f(x) = \beta_0 + \sum_{m=1}^{M} \beta_m \, \mathrm{BF}_m(x)$$

The coefficients $\beta_0, \ldots, \beta_M$ are constants estimated by least squares. The model is fitted to the input data in a forward-backward stepwise process that determines the knot positions where the function value changes (Deo et al. 2017b). At the end of the forward stage, a large model that over-fits the training data has been built. In the backward stage, the model is pruned by deleting the least useful basis functions one at a time according to the generalised cross-validation (GCV) criterion. For training data with n observations, the GCV of a model is computed as

$$\mathrm{GCV} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left[y_i - f(x_i)\right]^2}{\left[1 - \frac{M + d\,(M - 1)/2}{n}\right]^2}$$

where M is the number of BFs, d is the penalising parameter, n is the number of observations, and f(x_i) denotes the MARS model's predicted values.
MARS is a non-parametric regression modelling technique that is flexible and does not make any assumptions about the relationships between the variables (Stull et al. 2014). The model is simple to understand and interpret (Kuhn and Johnson 2013). MARS models typically exhibit a favourable bias-variance trade-off. While the models are sufficiently flexible to account for nonlinearity and variable interactions (and so have a relatively low bias), the limited nature of the MARS basis functions precludes excessive flexibility (thus, MARS models have relatively low variance).
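The hinge basis functions and the GCV criterion described in this section can be illustrated with a minimal sketch in plain Python. The function names, the `(beta, knot, sign)` layout of the terms, and the default penalty `d = 3` are assumptions made for illustration only; they are not taken from the study or from any particular MARS package.

```python
def hinge(x, t, sign=1):
    """Hinge basis function: max(0, x - t) for sign=+1, max(0, t - x) for sign=-1."""
    return max(0.0, sign * (x - t))


def mars_predict(x, intercept, terms):
    """Evaluate f(x) = beta_0 + sum_m beta_m * BF_m(x) for one scalar input.

    `terms` is a list of (beta, knot, sign) tuples (hypothetical layout).
    """
    return intercept + sum(beta * hinge(x, t, s) for beta, t, s in terms)


def gcv(y_obs, y_hat, n_basis, penalty=3.0):
    """Generalised cross-validation score for a model with `n_basis` BFs.

    `penalty` plays the role of d; the value 3 is an assumed default.
    """
    n = len(y_obs)
    mse = sum((o - p) ** 2 for o, p in zip(y_obs, y_hat)) / n
    effective_params = n_basis + penalty * (n_basis - 1) / 2.0
    return mse / (1.0 - effective_params / n) ** 2
```

During backward pruning, a candidate model with a lower `gcv` value would be preferred, since the denominator penalises models that use many basis functions relative to the number of observations.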

Maximal overlap discrete wavelet transforms
The maximal overlap discrete wavelet transform (MODWT) is a modified version of the discrete wavelet transform (DWT) (Li et al. 2017a). The MODWT has appealing properties for time series analysis: it performs no subsampling, so no information is lost. Its ability to extract additional information is enhanced because the number of coefficients in each decomposition level equals the length of the original time series. MODWT passes the time-series data through high-pass and low-pass filters, producing two feature sets. The high-pass (detail) components can be further broken down into several levels depending on the time scale of interest (He et al. 2017). The low-pass component reflects the underlying pattern of the time series and is called the approximation. The signal m is decomposed through a wavelet low-pass filter π_m and a high-pass detail filter h_m and reconstructed by digital reconstruction filters that complement the decomposition filters.
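As a rough illustration of the decomposition described above, the following sketch implements a MODWT with the Haar filter and a circular boundary in plain Python. The choice of the Haar filter, the circular boundary treatment, and the function name `haar_modwt` are assumptions for illustration; the study does not state them at this point, and its actual filter may differ.

```python
def haar_modwt(x, levels):
    """Haar MODWT with circular boundary (illustrative sketch).

    Returns (details, approximation): `details[j-1]` holds the level-j
    high-pass coefficients and `approximation` the final low-pass
    coefficients. Every component has the same length as `x`.
    """
    v = [float(value) for value in x]
    n = len(v)
    details = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)  # the filter is dilated by 2^(j-1) at level j
        lagged = [v[(t - shift) % n] for t in range(n)]  # v[t - shift], wrapped
        details.append([(a - b) / 2.0 for a, b in zip(v, lagged)])  # high-pass
        v = [(a + b) / 2.0 for a, b in zip(v, lagged)]  # low-pass
    return details, v
```

With this particular filter, the detail components and the approximation sum back to the original series point by point, and each component keeps the full series length because nothing is subsampled. The first coefficients of every level depend on values wrapped around from the end of the series, which is one reason boundary handling matters when such components are used as forecasting inputs.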

Comparing models
In this study, we propose a MODWT-MARS model to predict the dissolved oxygen of a running river. To identify a practical combination of machine learning and feature decomposition methods, a pool of six machine learning models and five feature decomposition methods was also considered. The proposed algorithms (i.e. MODWT and MARS) were described in the previous sections; this section gives a short overview of the comparison algorithms. Breiman (2001) proposed the random forest (RF) algorithm, which includes methods for regression and classification. The bootstrap resampling procedure generates new sets of training data from the initial training sample set N, and K decision trees are then built on these bootstrap sets to form the random forest. The full specification of the RF model is given in Ali et al. (2020a). The random forest approach has become a prominent tool for classification, prediction, investigating variable relevance, variable selection, and outlier identification. RF comprises a group (ensemble) of basic tree predictors, each of which generates a response given a collection of predictor values (Jui et al. 2022; Yu et al. 2017).
Kernel ridge regression (KRR) combines regularisation with the kernel technique to reduce over-fitting (Saunders et al. 1998). The kernel technique generates a nonlinear form of ridge regression, extending the general framework to allow nonlinear prediction. Linear, polynomial, and Gaussian kernels are among the many options available for enhancing overall performance (You et al. 2018). The fundamental advantage of KRR is that it learns a global function and predicts any target variable using a regularised variant of least squares.
The Bayesian modelling approach is suited to hierarchical data (Huang and Abdel-Aty 2010). In Bayesian ridge regression (BNR), the regularisation parameter is not fixed in advance but tuned to the data. The fit corresponds to a maximum a posteriori estimate under a Gaussian prior over the coefficients w with precision λ⁻¹, and λ is treated as a random variable to be estimated from the data rather than set manually. In contrast, most decision-making analyses based on maximum likelihood estimation entail determining the values of parameters that may significantly impact the analysis outcome and for which there is considerable uncertainty. The capacity to include prior information is one of the primary advantages of the Bayesian technique (Saqib 2021).
Support vector regression (SVR) is a kernel-based machine learning method that can be used for various purposes, including time series forecasting. With a kernel, SVR can also learn nonlinear trends in the training data. Three SVR variants are available, each with a different kernel (RBF, polynomial, and linear). It should also be noted that the KRR model in its generic form has been used in many studies, including the forecasting of precipitation (Ali et al. 2020b), drought (Ali et al. 2019), wind speed (Alalami et al. 2019; Douak et al. 2013; Mishra et al. 2019; Naik et al. 2018; Zhang et al. 2019), and solar power (Dash et al. 2020).
The k-nearest neighbours (KNN) algorithm is based on instance-based learning, which serves two purposes: (1) estimating the density function of the test data and (2) categorising the test data according to the test patterns (Shabani et al. 2020). Choosing the number of neighbours (k) is a crucial step, because the method's efficiency depends on selecting the nearest (most similar) samples from the reference database. If k is large, points from other classes can fall inside the neighbourhood (Wu et al. 2008). The KNN method has been successfully applied previously (Ghiassi et al. 2017; Liu et al. 2020).
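For a continuous target such as DO, a common regression variant of KNN averages the target values of the k closest training samples. The sketch below is a minimal plain-Python version of that idea; the function name, the Euclidean distance, and the unweighted mean are illustrative assumptions rather than the configuration used in the study.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a continuous target as the mean of the k nearest neighbours.

    `train_x` is a sequence of feature tuples, `train_y` the matching targets,
    and `query` a single feature tuple. Distances are Euclidean (assumed).
    """
    distances = sorted(
        (math.dist(row, query), target) for row, target in zip(train_x, train_y)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)
```

A small k follows local fluctuations closely, while a large k pulls in more distant and less similar samples, which is the trade-off the paragraph above describes.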
This study incorporated five decomposition methods (i.e. DWT, EMD, EEMD, MODWT, and CEEMDAN) and six machine learning methods (i.e. MARS, RF, BNR, SVR, KNN, and KRR) to address the problem of predicting dissolved oxygen concentration. The DWT has been used for hyperspectral feature decomposition, with the features evaluated for their efficacy in discriminating between subtly different ground covers (Bruce et al. 2002). The theory of the method is explained by other researchers (Agbinya 1996; Fowler 2005; Shensa 1992). Huang et al. (1998) developed the empirical mode decomposition (EMD) method for analysing the information contained in data derived from non-stationary and nonlinear systems. This algorithm decomposes the signal into a series of 'well-behaved' oscillatory functions, referred to as intrinsic mode functions (IMFs). EMD is a powerful adaptive tool that behaves as a dyadic filter bank (Flandrin et al. 2004) and is useful for filtering out noise in the measurement domain (Khaldi et al. 2008). Torres et al. (2011) introduced the CEEMDAN procedure to reduce the computational cost while retaining the ability to eliminate mode mixing. Readers are referred to previous studies (Ahmed et al. 2021a; Zhang et al. 2017; Zhou et al. 2019) for further information on CEEMDAN.

Study area and data
Daily water quality data were obtained from the Surma River, Bangladesh. Figure 1 depicts the Surma River monitoring stations. This river drains one of the heaviest runoffs in the Surma-Meghna Basin system (Chowdhury and Ali 2006). The Surma River originates in Assam's Cachar district, flows through Bangladesh's Sylhet and Sunamganj districts, joins the Meghna River near Bhairab Bazar, Kishoreganj, and empties into the Bay of Bengal. Many studies have addressed its water quality (Ahmed 2017; Ahmed and Shah 2017a, b), riverbank erosion (Islam and Hoque 2014), stream flows (Ahmed and Shah 2017b), and water level modelling (Biswas et al. 2009). The water quality variables used in this study were collected at the Keane Bridge station on the Surma River between January 2017 and December 2019, sampled 15 cm to 20 cm below the surface.
The selection of prospective predictor variables is critical for predictive modelling. Various studies reveal that some variables predict DO better than others (Ahmed 2017; Tomic et al. 2018). Ahmed (2017) used biological oxygen demand (BOD) and chemical oxygen demand (COD) to predict the dissolved oxygen of the Surma River. Kisi and Ay (2012) observed that temperature, pH, and electrical conductivity are highly influential at Fountain Creek, Colorado. Ranković et al. (2010), however, reported that pH and water temperature have a meaningful relationship with DO, whereas nitrates, chloride, and total phosphate are only weakly related. pH is the variable most commonly used for predicting DO with ANN, followed by temperature. Along with pH and temperature, some authors have used oxygen-containing variables (PO₄³⁻, NO₃-N) or oxygen-demanding variables (NH₄-N, COD, and BOD) (Wen et al. 2013). Turbidity (Iglesias et al. 2014) and total solids can be considered essential water quality parameters, as high values typically indicate high values of other water quality parameters. Missing values were interpolated from the two adjacent values. The basic statistics of the input variables are given in Table 1.
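The gap-filling step mentioned above can be sketched as follows. This plain-Python example assumes that a missing value is marked as `None` and is replaced by the mean of its two immediate neighbours; the exact interpolation rule used in the study is not stated, so both the marker and the averaging rule are assumptions.

```python
def fill_from_neighbours(series):
    """Replace isolated missing values (None) with the mean of both neighbours.

    Gaps at the ends of the series, or gaps longer than one sample, are left
    as None because they do not have two observed adjacent values.
    """
    filled = list(series)
    for i, value in enumerate(series):
        if value is None and 0 < i < len(series) - 1:
            left, right = series[i - 1], series[i + 1]
            if left is not None and right is not None:
                filled[i] = (left + right) / 2.0
    return filled
```

Longer gaps would need a different rule, such as linear interpolation over the whole gap, which this sketch deliberately does not attempt.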

Development of MODWT-MARS model
The multi-phase MODWT-MARS model and the benchmark models were created in Python using the scikit-learn machine learning platform (Pedregosa et al. 2011b). All simulations were performed on a machine with an Intel i7 processor running at 3.6 GHz and 16 GB of RAM. MATLAB 2020 was employed for feature selection using neighbourhood component analysis (NCA). In addition, matplotlib (Barrett et al. 2004) and seaborn (Waskom et al. 2020) were employed to visualise the forecasted DO. Figure 2 depicts the workflow of the proposed MODWT-MARS model.
To create the MODWT-MARS model, the MODWT wavelet transformation was applied to the predictor variables filtered by the NCA approach. Identifying the wavelet-scaling filter type and the decomposition level is vital in building a sound wavelet transformation model. Because there is no single approach for choosing the optimal filter, Al-Musaylh et al. (2020) used a trial-and-error strategy. Quilty and Adamowski (2018) identified errors in forecast model inputs caused by incorrect wavelet decomposition in wavelet-based forecasting models. The inaccuracy can be traced back to the boundary conditions of the decomposition process. They identified three problems: (1) improper use of future data, (2) unsuitable selection of decomposition levels and filters, and (3) incorrect division of validation and calibration data. Readers are encouraged to consult Quilty and Adamowski (2018) for more information. These concerns about the MODWT and DWT decompositions were addressed in this study. Figure 3 displays the time series of the intrinsic mode functions (IMFs) and residual components, together with the decomposed components of MODWT, obtained by decomposing the DO series to resolve more detailed information for the MODWT-MARS model.
There is no formula for verifying whether a model contains valid predictors (Tiwari and Adamowski 2013). Although the literature describes three input selection strategies for picking the lagged memories of DO and the predictors for an optimum model, it does not specify which method should be used. The three approaches are the autocorrelation function (ACF), the partial autocorrelation function (PACF), and the cross-correlation function (CCF). DO at the Keane Bridge station shows substantial antecedent behaviour in terms of its lags. The cross-correlation function identifies which antecedent lags of each predictor should be selected as input signals (Adamowski et al. 2012), and it establishes the statistical similarity between the predictors and the target variable. The cross-correlation function between the predictors and the DO for the River Surma is depicted in Fig. 5a. A set of significant input combinations was then determined by assessing the r_cross of each predictor with DO. In this plot, the blue line marks the 95% confidence level for a statistically significant r_cross. Figure 5a shows that the correlation of the respective data with DO was highest at lag zero for all stations (r_cross ≈ 0.25-0.45). A similar procedure was followed for the decomposed predictor variables. Figure 5b-f show the r_cross values between #d 1 (DO) and #d n (Predictors) and their respective residuals (n = 1 to 4). Figure 5 shows that r_cross ranged between 0.25 and 0.50, exceeding the 95% confidence level. The predictor data sets were normalised between 0 and 1 (Ahmed 2017; Ali et al. 2019) so that no single variable is over-weighted.
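The cross-correlation screening described above can be sketched in plain Python as a lagged Pearson correlation compared against an approximate 95% significance bound. The bound 1.96/√n is a common large-sample approximation and is an assumption here, as are the function names; the study does not state how its confidence line was computed.

```python
import math


def cross_correlation(predictor, target, lag=0):
    """Pearson correlation between predictor[t - lag] and target[t]."""
    if lag > 0:
        x, y = predictor[:-lag], target[lag:]
    else:
        x, y = predictor, target
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def significance_bound(n, z=1.96):
    """Approximate 95% bound for a correlation estimated from n samples."""
    return z / math.sqrt(n)
```

A lag would be retained as a candidate input when the absolute value of `cross_correlation(...)` exceeds `significance_bound(n)` for the number of paired samples n.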
Python-based scikit-learn (Pedregosa et al. 2011a) was used to build the SVR, RF, KRR, BNR, and KNN models of this study. The SVR model was developed with the radial basis function (RBF) kernel (Suykens et al. 2002). The RBF kernel trains quickly and captures nonlinearities between the target and predictor variables (Goyal et al. 2014; Lin 2003; Maity et al. 2010). Creating an accurate SVR model required identifying three parameters (C, σ, and ε) (Hoang et al. 2014). This is why the NCA algorithm was used to select the parameters with the smallest weight value (Pedregosa et al. 2011a). The MARS model, in turn, was built with the Python-based py-earth package (Rudy and Cherti 2017). MARS models can use either piecewise cubic or piecewise linear functions. This study used a piecewise cubic model because it provided a smoother response. The generalised recursive partitioning regression was also adopted since it can handle multiple predictors. Forward and backward selection was used for optimisation. Initially, the algorithm ran with a "naïve" model that contained only the intercept term. The training MSE was then reduced by iteratively adding reflected pairs of basis functions (Table 2).
The hybrid MARS models and the other comparison models were constructed using piecewise cubic and linear regression functions, respectively. The best MARS model was selected using the lowest generalised cross-validation (GCV) value (Lin 2003); the MODWT-MARS model yielded the lowest RMSE and the highest LM, demonstrating the most accurate predictions. The optimum tuning parameters of the machine learning methods are given in Table 3.

Results
In this study, MARS models optimised with a feature decomposition approach were used to forecast DO time series from hydro-meteorological variables. To screen for the optimal DO forecasting model, the study combined conventional machine learning models (i.e. MARS, RF, SVR, KNN, and KRR), feature decomposition methods (i.e. MODWT, CEEMDAN, EEMD, EMD, and DWT), and a feature selection method (i.e. NCA). Because no single metric can unambiguously identify the best alternative, multiple performance evaluation measures were used. The hybrid MARS models outperformed the hybrid and standalone BNR, KNN, KRR, and RF models for all decomposition methods. The performance of MODWT-MARS shows that the NCA algorithm helped choose the relevant features, allowing MARS to better emulate future DO concentrations. MODWT improved the key performance metrics, such as r, NSE, WI, RMSE, and MAE. The MODWT-MARS model outperformed all the other tested models.
This study used the NCA algorithm to screen the appropriate predictor variables for the model. Table 2 provides the input combinations for forecasting DO. It is worth mentioning that the KRR, KNN, and RF models perform comparatively poorly. Figure 6 presents box plots of the forecasted vs observed DO and of the absolute forecasting error for all hybrid models. The absolute forecast error was determined as |FE| = |DO_for − DO_obs|. The box plots show the dispersion of the observed (DO_obs) and forecasted (DO_for) DO for the proposed machine learning approach and the comparison models. Figure 6b, c, and e show quartile ranges with distinctly larger outliers. Each box spans from the lower quartile (25th percentile) to the upper quartile (75th percentile). The MODWT-MARS model gives a prediction similar to that of MODWT-SVR, although the SVR model has larger outliers. A closer inspection of the absolute forecast error (|FE|) of the hybrid MODWT-MARS model further supports the suitability of the hybrid MARS approach for predicting the DO of the Surma River, since it has the narrowest error distribution of all the models. The MODWT-MARS model has 98% of its |FE| values in the first error bracket (0 < |FE| < 0.25), compared with 95% for the MODWT-SVR model. The empirical cumulative distribution function (ECDF) orders the forecast errors from lowest to highest and shows how they are distributed across the whole dataset. Figure 7 presents the ECDF for all six models, covering the objective and comparison models. The hybrid MODWT-MARS model compared well against the other models: its errors were markedly lower, mostly falling between 0 and 0.25 mg/l. For models such as KNN, KRR, and RF, the error distribution was comparatively wider. The analysis also revealed that the standalone models showed a poor distribution, indicating that MODWT-MARS was the most precise and responsive model.
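The error-bracket percentages and the ECDF discussed above can be reproduced from paired forecasts and observations with a few lines of plain Python. The function names are hypothetical, and the bracket test below includes zero errors (lower ≤ |FE| < upper), which is a small assumption relative to the strict inequality quoted in the text.

```python
def absolute_errors(forecast, observed):
    """Absolute forecast error |FE| = |DO_for - DO_obs| for each sample."""
    return [abs(f - o) for f, o in zip(forecast, observed)]


def share_within(errors, lower, upper):
    """Percentage of errors falling in the bracket lower <= |FE| < upper."""
    inside = sum(1 for e in errors if lower <= e < upper)
    return 100.0 * inside / len(errors)


def ecdf(errors):
    """Empirical CDF as (error, cumulative fraction) pairs in ascending order."""
    ordered = sorted(errors)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]
```

A model whose ECDF rises to 1 at small error values has a narrow error distribution, which is the property used above to compare the models.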
To further analyse the robustness of the proposed MODWT-MARS model, the forecasting performance of all tested models was assessed using RRMSE and MAPE, as shown in Fig. 8. The RRMSE and MAPE of the objective model (MODWT-MARS) are markedly lower, which confirms the merits of the proposed model. The best RRMSE (3.6%) and MAPE (2.2%) were found for the MODWT-MARS model, followed by the MODWT-BNR model with a moderate RRMSE (4.0%) and MAPE (3%). The KNN, KRR, and RF models with MODWT showed higher RRMSE (11.5% to 13.5%) and MAPE (9.5% to 12%) values, demonstrating poor performance.
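For reference, the two relative metrics can be computed as in the sketch below. It assumes the commonly used definitions, RRMSE as the RMSE divided by the mean of the observations and MAPE as the mean absolute relative error, both in percent; the study's exact formulas are not given in this section, so these definitions and the function names are assumptions.

```python
import math


def rrmse(forecast, observed):
    """Relative RMSE (%): RMSE divided by the mean of the observations."""
    n = len(observed)
    rmse = math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n)
    return 100.0 * rmse / (sum(observed) / n)


def mape(forecast, observed):
    """Mean absolute percentage error (%); observations must be non-zero."""
    n = len(observed)
    return 100.0 * sum(abs((f - o) / o) for f, o in zip(forecast, observed)) / n
```

Because both metrics are scaled by the observations, they allow models to be compared on a percentage basis rather than in mg/l.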
The Taylor diagram (Taylor 2001) in Fig. 9 compares the proposed model with the standalone models. It shows that the MODWT-MARS model with the NCA algorithm is closer to the observations than the comparison models. Again, the forecasted DO confirms that the proposed model is better suited than the standalone and benchmark models. The scatter plot of the forecasted and observed DO for the proposed MODWT-MARS model gives a detailed comparison of DO forecasting (Fig. 10). The scatter plots include the coefficient of determination (R²) as a goodness-of-fit measure between forecasted and observed DO, together with a least-squares fitting line and its equation, DO_for = m × DO_obs + C, where m is the gradient and C is the y-intercept. Figure 10 shows that the proposed model performs well, with a higher R² value. DO forecasting with the hybrid machine learning model (i.e. MODWT-MARS) performed significantly better than the other models. The magnitudes obtained from the hybrid MODWT-MARS model were the closest to unity, with (m|R²) pairs of 0.978|0.976, followed by MODWT-SVR (0.939|0.965). The CEEMDAN-SVR (0.699|0.795) and CEEMDAN-MARS (0.700|0.794) models gave comparatively lower pairs. The y-intercept (ideal value = 0) of the proposed model was close to zero, i.e. 0.084. For the other models, the y-intercept deviated further from the ideal value, with more outliers.
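The gradient m, intercept C, and R² quoted for the scatter plots follow from an ordinary least-squares fit of forecasted on observed DO. The plain-Python sketch below shows one way to compute them; the function name is hypothetical, and R² is taken as the squared Pearson correlation, which is the usual choice for a simple linear fit but is an assumption about how the study computed it.

```python
def fit_line(observed, forecast):
    """Least-squares fit forecast = m * observed + c, plus R^2 of the fit."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(forecast) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in forecast)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, forecast))
    m = sxy / sxx                      # gradient
    c = mean_y - m * mean_x            # y-intercept
    r2 = (sxy * sxy) / (sxx * syy)     # squared Pearson correlation
    return m, c, r2
```

A perfect forecast would give m = 1, C = 0, and R² = 1, which is why the text compares each model's (m|R²) pair against unity and its intercept against zero.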
To view the accuracy of the proposed MODWT-MARS model from a different angle, a time series plot is used to assess its forecasting ability. Figure 11 shows the time series of forecasted and observed DO for MODWT-MARS compared with the standalone MARS model. The forecasts of the proposed MODWT-MARS model lie close to the observed DO, indicating high predictive accuracy. Applying the NCA algorithm for feature selection and MODWT for feature decomposition improves the forecasted DO.
Notably, five decomposition algorithms (EMD, EEMD, CEEMDAN, DWT, and MODWT) were incorporated to enhance the MARS-based predictive model. In terms of the r, LM, and APB of DO forecasting, MODWT effectively improves the forecasts (Fig. 12). For the MARS model, MODWT increased r and LM by ~19% and ~20%, respectively, and decreased APB by ~68%. Similarly, for the BNR model, MODWT feature decomposition increased r and LM by up to ~21% and ~59%, respectively, and decreased APB by ~57%. For the MARS model with CEEMDAN, r and LM increased by ~15% and ~50%, respectively. The inclusion of DWT, EMD, and EEMD also substantially improved the r, LM, and APB values.

Discussion
According to the findings of this study, different input combinations have varying effects on the outcomes. Several input variables must therefore be analysed, and the most appropriate set of variables must be employed to optimise the results. Every model should have its own ideal combination, yet a single combination that is most effective across the various models is rare. Al-Musaylh et al. (2019) used the hybrid MARS model in forecasting electricity demand with good performance. This study demonstrated accurate forecasting of dissolved oxygen (DO) concentration. The proposed model forecast DO better than any other algorithm evaluated, in both standalone and hybrid versions. We propose further studies that forecast DO using wet and dry season datasets and compare the results with the findings for the whole dataset.
Fig. 9 Taylor diagram representing the correlation coefficient and the standard deviation difference for the proposed hybrid models vs benchmark models
Different pre-processing techniques could also enhance the forecasting accuracy of the MARS model. First, a suitable feature selection approach such as the NCA algorithm (Ahmed et al. 2021a; Ghimire et al. 2019a) can be used to pick the input variables that significantly affect the model. In this study, predictor variables were added one by one, from the highest to the lowest feature weight calculated using neighbourhood component analysis (NCA), to improve model performance. The optimum combination of input parameters for the proposed hybrid MARS model was identified in this way. MARS creates flexible models by fitting piecewise linear regressions: it approximates the nonlinearity of a relationship using separate linear regression slopes in different intervals of the independent variable space. Given a collection of predictor variables, MARS fits the model as an expansion in product spline basis functions of the predictors, selected through a forward and backward recursive partitioning procedure.
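The piecewise linear idea behind MARS can be illustrated with a pair of hinge basis functions, max(0, x - t) and max(0, t - x), which change slope at a knot t. The sketch below is not the MARS algorithm itself: it fixes one knot by hand and fits the coefficients by ordinary least squares, whereas MARS searches for knots and prunes terms in its forward and backward passes. The function name `hinge`, the knot location, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def hinge(x, knot, sign=1):
    """Hinge basis function: max(0, x - knot) if sign = +1, max(0, knot - x) if sign = -1."""
    return np.maximum(0.0, sign * (x - knot))

# Synthetic piecewise linear target with a slope change at x = 4 (illustrative only)
x = np.linspace(0.0, 10.0, 101)
y = 2.0 + 1.5 * hinge(x, 4.0, +1) - 0.5 * hinge(x, 4.0, -1)

# Design matrix: intercept plus the mirrored pair of hinge functions at the knot
B = np.column_stack([np.ones_like(x), hinge(x, 4.0, +1), hinge(x, 4.0, -1)])

# Least-squares fit of the basis-function coefficients
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
print(np.round(coef, 3))
```

Because each hinge is zero on one side of its knot, every coefficient acts only within its own interval of the predictor, which is how separate regression slopes arise in different regions. Products of hinges on different predictors extend the same idea to interactions.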
The time complexity of machine learning models is important for their practical application. All the ML models used in our study have a low training cost, with training times of less than 2 min for almost all the models. The incorporation of five feature decomposition approaches is vital to understanding the diverse implementation of the models in association with data pre-processing (i.e. feature selection and feature decomposition). The results showed that the inclusion of feature decomposition methods such as MODWT, CEEMDAN, EEMD, EMD, and DWT increased the performance of DO forecasting compared to the respective standalone methods. MODWT can handle any sample size, and its filters produce a more asymptotically efficient wavelet variance estimator than the DWT. Among the hybrid models, MODWT-MARS was found to be the optimum. Other researchers have reported similar results, in which MODWT data decomposition improved performance (Li et al. 2017a; Prasad et al. 2017).

Conclusions
This study developed hybrid machine learning models incorporating neighbourhood component analysis (NCA) as a feature selection method, multivariate adaptive regression splines (MARS) as a predictive model, and MODWT as a feature decomposition method for DO forecasting of the river Surma. The study used five distinct feature decomposition approaches (i.e. MODWT, CEEMDAN, EEMD, EMD, and DWT) and six machine learning models (i.e. BNR, KNN, KRR, MARS, RF, and SVR) to develop the optimum model. A new approach to the DO forecasting model was created using an antecedent-lagged memory framework to explain the forecasting problem more appropriately and its consequences afterwards. The proposed MODWT-MARS approach provides the optimal performance among the benchmarked models. From this analysis, the following observations can be made.
In addition to providing scientific benefits, the low input requirement of the MARS models, combined with their substantial predictive capability, also provides significant practical benefits. They enable the development of a station-specific, prudent predictive model of DO for monitoring river health at minimal cost, and the development of region-specific management plans across a range of land use and land cover gradients in a cost-effective manner.

Acknowledgements
The authors want to thank Leading University for allowing us to conduct laboratory testing of water parameters.
Funding Open Access funding enabled and organized by CAUL and its Member Institutions.
Data availability This article presents an original research work executed by the authors, so all the data presented depend on their findings and analysis techniques. The datasets used in this article are available from the corresponding author on reasonable request.

Declarations
Ethics approval Not applicable. This research does not report on or involve the use of any animal or human data or tissue.

Consent for publication Not applicable.
Consent to participate Not applicable.

Conflict of interest The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.