Comparative evaluation of classical and SARIMA-BL time series hybrid models in predicting monthly qualitative parameters of Maroon river

In the present study, seasonal autoregressive integrated moving average (SARIMA) time series models, nonlinear BL model, multi-layer perceptron artificial neural network and SARIMA-bilinear hybrid models were employed to predict the quality parameters of total dissolved solids (TDS), sodium adsorption ratio (SAR), electrical conductivity (EC) in the Maroon basin in Khuzestan Province. For fitting the mentioned models, the monthly data were used for the calibration (1970–2000), confirmation (2001–2011) and prediction of the model (2012–2018). The appropriate models of SARIMA, the bilinear BL and SARIMA-BL hybrid models of the rationalized parameters of the above-mentioned quality parameters were selected based on the adequacy tests, such as the Akaike criterion and the independence test of the model residuals (Ljung–Box). To determine the effective input parameters of the network, the partial mutual information (PMI) algorithm was used to model the three monthly EC, TDS, and SAR parameters. Also, for input and output layers linear transfer function and for hidden layer various active functions with the back-propagation learning algorithm were used for modeling and predicting this water quality parameters. Comparison of the models showed the tangible superiority of SARIMA-bilinear hybrid models than the artificial neural network with effective input parameters based on PMI algorithm, SARIMA and bl models in performed prediction of the all three monthly qualitative parameters in all three stages, training-validation-test in Maroon basin.


Introduction
One of the most important issues in hydrological science is water quality issue as it is valuable in suppling water for the agricultural or drinking and industrial purposes. Today, most of the researchers study on water quality and focus on the surface and groundwater contamination issues. The pollution problem not only in industrialized countries but also in developing countries is important. In the field of water quality management, several models such as artificial intelligence models and artificial neural networks, hydrological models such as MIKEH11 and QUAL2K, WASP, QUAN2E, WAT-EVAL, hec-5Q and time series models have been developed. Also due to the complexity and lack of sufficient knowledge about the physical processes in the hydrological cycle, the production of statistical models and their extension to processes has always been an engineer's concerns. The basis of many decisions in hydrological processes is the exploitation of water resources, which is based on the prediction and analysis of time series. Accordingly, some international researches have been carried on to predict the river water quality with different models. Singh et al. (2009) developed a neural network model for estimating the amount of dissolved oxygen (DO) and biochemical oxygen demand (BOD) levels in the Gomti River (India). In this study, both the models employed eleven input water quality variables measured in river water over a monthly period of 10 years at eight different sites. Jiang et al. (2018) studied the temperature, electrical conductivity and pH of groundwater as well as microbial and hydrochemical analyses in a Karst valley in northwestern China. Dieng et al. (2017) revealed the presence of estuary in the entrance of the Saloum River in Senegal, significantly effecting the water quality, which caused the intrusion of saline water in the aquifers of the region. Narany et al. (2017) based on 25-year data investigated the quantitative effect of deforestation on the groundwater quality in northwestern Kelantan, Malaysia, and concluded that the amount of nitrate annually increased by 8.1%. Nosrati and Eeckhaut (2012) in their research conducted multivariate statistical analyses to assess groundwater quality using multiple statistical techniques in Hashtgerd Plain and concluded that the main factors affecting the quality of groundwater associated with natural pollutants are household sewage, industrial and agriculture materials. Ahmad and Ali Shah (2017) used an adaptive neural fuzzy inference system to estimate the need for BOD in the Surma River in Bangladesh. The results showed that the ANFIS model can predict water quality with high precision. Asadollahfardi et al (2011) reviewed the ability of two models of artificial neural network and recurrent neural network to estimate the water quality index of the Talkhe River in East-Azerbaijan Province to estimate the TDS parameter. Chowdhury et al. (2010) evaluated the performance of artificial neural network and ordinary Kriging for interpolation of arsenic values. The founds showed that the artificial neural network model estimates 15% better than Kriging method of arsenic values. Zou et al (2010) compared two models of ARIMA and artificial neural networks to predict soil salt and water content, and the results showed the superiority of ARIMA model. Finally, the performance of the comparative neural fuzzy inference model in comparison with the neural network indicates the superiority of the fuzzy inference model compared to the neural network. Sandhu and Finch (1995) have also confirmed the ability of artificial neural networks to predict daily and actual salinity levels in different waters of the watersheds and the ability to estimate the concentration of cations, anions, EC and TDS in these basins. The main purpose of this study is to compare the performance of the classic multilayer perceptron neural network, linear (SARIMA), nonlinear (BL) time series and SARIMA-BL time series hybrid model in predicting and modeling the monthly qualitative parameters such as TDS, SAR and EC of the Maroon river. In addition, in this research, PMI algorithm was used to introduce the input parameters to the neural networks.

Area of study
Maroon basin with an area of about 3824 km 2 is located in the geographic coordinates of longitude 49° 50′ to 51° 10′ N and latitude of 30° 30′ to 31° 20′ E at the mountains of Behbahan city in Khuzestan Province. The Maroon drainage basin is surrounded by watersheds of Zohreh and Karun rivers in Khuzestan and Kohgiluyeh and Boyer-Ahmad Provinces. The average annual rainfall in the Maroon basin varied from about 150 mm in the coastal plain to about 900 mm in the northern mountains, and the rainfall is Mediterranean, with the maximum precipitation occurring between December and March. Table 1 presents the statistical characteristics of the time series of the monthly quality parameters of the Idanak hydrometric station used in this study. Also, Fig. 1 shows the main rivers of maroon basin. Box et al (1994) developed the ARIMA model for seasonal time series. If, in a time series, periodical behavior is seen in definite interval (S), this time series is seasonal, and the SARIMA model is used to model it. This model is shown in the form of SARIMA (p, d, q) (P, D, Q) s in which their structure is P, D, Q, the seasonal component of the model and (p, d, q) nonseasonal component of the model and S is the length of the season. Using the backward shift operator B, the general form of the model is shown as follows:

SARIMA model
where (B) and (B) are polynomials of order p and q, respectively. Φ(B s ) and Θ(B s ) are polynomials of order P and Q in B s , respectively. p is nonseasonal auto-correlation order, d is nonseasonal difference, q is nonseasonal moving mean order, P is seasonal auto-correlation order, D is the number of seasonal differences, Q is seasonal moving mean order, and S is the season duration. ∇ d is nonseasonal operator and ∇ D s is seasonal operator. Time series models consist of this step process: Model identification This step is started by plotting the autocorrelation function (ACF) and the partial autocorrelation function (PACF) charts, and the stationary in the mean and variance of the data is evaluated. Autocorrelation function (ACF) is one of the most important tools for testing data dependency. This function often gives us an insight into the probabilistic pattern that produces the data. This is used to identify and fit the appropriate stochastic model for data. Model fit (estimation of parameters): At this step, by identifying the appropriate models in the previous step, to compare several models and select the best ones, the following equation is used to calculate Akaike criterion: where n is the total number of data, m = (p + q + P + Q) , RSS is the root of the sum of the squares of the residuals. A model is chosen that has the lowest value of AIC.
Model recognition: To check the accuracy of the model, the residual chart is evaluated for normal and stationary aspects. Therefore, by plotting the ACF and PACF charts of the residuals of fitted models, if the values of the auto-correlation coefficients and partial auto-correlation coefficients are within the 95% confidence level, the fitted model is adequate.

Bilinear model (BL)
The bilinear model was introduced by Granger and Andersen (1978). Linear models of time series are Taylor's first-order extension, and the main idea of the bilinear model is the Taylor's second rank extension, which is defined as follows: where Z t is the required time series, and p, q, r, and s are integers that represent the rank of the bilinear model, which is denoted by the form BL (p, q, r, s). The bilinear model is actually the same as the extended linear ARMA model, which the nonlinear term is added to its right side. In this term, the product of two variables Z t−i−j , t−j , which are varied with time, eliminates the linear state and the model is turned to be nonlinear. There are two steps to fit the bilinear model; the first is to determine the order of the model (p, q, r, s) and the other is to estimate , , coefficients. Determining the model parameters is performed the Akaike method. Each rank with a smaller acacia value is selected as the appropriate model order (Akaike 1974). One of the tests used to express the model's adequacy of linear ( SARIMA) and nonlinear (BL) and time series hybrids models' (SARIMA-BL) is Ljung-Box test. This test is applied by computing Q statistics that follows ( 2 k−m ) Chi-squared distribution: According to this equation, N is the number of samples, L is the number of delays of auto-correlation function, and r 2 k is the square of ε residual time series autocorrelation in the delay k. Q statistic was compared with the value of Chi-squared extracted from table at a significant level of 5%. The adequacy of the model was confirmed if Q value was lower than table Chi-squared value.

Artificial neural networks
An artificial neural network is a computational mechanism that can produce a series of new information by gathering information and calculating them. One of the most commonly used neural networks used in the hydrological process is the multilayer perceptron neural network. This network consists of an input layer, an output layer and one or more layers between them that are not directly connected to the input data and output results. Input layer units only have the task of distributing the input levels of the next layer, and the output layer also responds to the output signals. In these two layers, the number of neurons is equal to the inputs and outputs, and the hidden layer is responsible for the relationship between them. One of the steps in numerical preparation and calculations for feeding the neural networks of data normalization is to increase the sensitivity of data and increase the learning power, which increases the predictive ability. In this research, input and output vectors of the network were standardized using (6) in the interval (1 and 0).
In the above equation, X is the given data, X ➝ mean of data, X max : maximum data, X min : minimum data and y: normalized data.

Determining effective input variables in data-based models
There are various methods to determine input variables in models based on data processing, especially in terms of artificial neural networks. These include trial-and-error method, exploration method, expert knowledge-based method, statistical-based method of analysis and combination of different methods (Bowden et al. 2005). The trial-and-error method is two major flaws: 1. Although many combinations of variables are considered as inputs in models based on data processing, there is no guarantee of choosing the best combination of input variables, especially for modeling phenomena that include many input variables. 2. Given that, different models of ANNs should be trained and tested for each combination of input variables which takes a lot of time and complex computing. In the heuristic, the word variables are selected step by step. This method is often used in order to avoid over-estimating all subsets of input variables. The standard step-by-step approach is forward selection and backward selection. In the forward method, in each successive step with a set of input variables selected, the variable that improves the model performance is added as the best final input variable. In the backward approach, the program starts by selecting all input variables and successively input variables that reduce the model performance. The main drawback of the exploration methods is that there are no guarantees to choose the best combination of input variables like trial and error. Used to determine the initial set of input variables, this method has been approved by the American Society of Civil Engineers (ASEC) Committee for the Application of ANN to Hydrology Studies (Sharma 2000). Well understanding of the hydrological system is important to proper select the input variables in the model. If effective input variables not be selected and considered correctly, some information about the system may be lost which led to a confusion in the ANN model training process. In the combination of different methods, the various methods mentioned above are used to determine the input variables in the modeling of the system studied with the ANN models. Although each of these methods has weaknesses for determining effective input variables, the statistical method is a suitable method for determining effective input variables. Since that it is not limited to a specific application, so there is a good potential for providing appropriate algorithms to be added to the ANN model structure as a component, based on data analysis rather than expert knowledge or heuristic only nonlinear variables selection algorithm. The input to determine the effective input variables in the models is based on the data processing PMI algorithm.

Introduction of PMI algorithm based on input selection (PMIS)
The PMI-based input selection algorithm was initially developed by Sharma (2000) to identify the effective input variables in hydrology models. The PMI algorithm performs each iteration by considering an input (C) and an output (Y) and finding the Cs that maximizes the PMI value according to the output variable (according to inputs that are preselected). The statistical concept that PMI estimates for Cs is based on the confidence ranges defined by the distribution generated by a bootstrap loop. If the input is significant, the Cs is added to S and the selection will continue, so that no significant input remains and the algorithm is stopped subsequently.

Estimation of partial mutual information (PMI)
Given a stochastic variable Y, there is some uncertainty about y observation which can be defined according to the Shannon H entropy (Shannon 1948). Assuming a random input variable X, where Y is dependent on it, mutual observations (x, y) reduce this uncertainty, since knowing x allows the derivation of y, and vice versa. According to the definition of mutual information I (X; Y), the reduction in the uncertainty of the variable Y is due to the observation of X (Cover and Thomas 1991). This problem is represented as a common part between two circles in Fig. 2. This common part is where the reduced uncertainty around X and Y is specified by the conditional entropy H (X | Y) and H (Y | X), respectively. Mutual information (MI) can be calculated directly by the following formula: where p(y) and p(x) are the marginal probability density functions (pdfs) of X and Y, respectively, and p(x, y) is the joint probability density function. However, in practice, the correct form of probability density functions in (6) is unknown. Hence, estimates of probability densities are used instead of it. By placing estimates of probability density with the numerical approximation of the integral in (5) we have: where ƒ denotes the estimated density based on the sample of n observations of (x, y). It is noteworthy that various logarithms are used, but usually 2 or e is used. If the basis of logarithm is not mentioned, then the logarithm is natural. As shown in Eq. (6), it could be said that the accurate and effective estimation of MI depends on the method used to estimate the marginal and joint probability density functions. Totally, there are three criteria of PMI algorithm stopping as: 1. tabulated critical values, 2. the criterion based on Akaike information criterion (AIC), 3. the Hamplel test criterion (Shannon 1948;David 1966;Davies and Gather 1993;Maier and Dandy 2000;Sharma 2000;Pearson 2002;Goebel et al. 2005).

SARIMA-BL hybrid models
To construct a hybrid SARIMA-bilinear models first seasonal ARIMA model, the time series data are processed using the very well-known Box-Jenkins method, which consists of three steps: identification, estimation and verification. The details regarding Box-Jenkins method are given in the model description. After verification, the residuals of the best SARIMA fitted model are used as inputs for the bilinear model. The purpose of this final step is to forecast the error sequence of the fit best SARIMA model, which cannot be defined using a linear method like SARIMA. The construction of the SARIMA-bilinear hybrid model includes six steps that are referred to below.
1. Determine the adequacy of data: One way to test the adequacy of data is to use the Hurst coefficient. The Hurst coefficient is a criterion for measuring the adequacy of the data information in terms of the length of the statistical period. 2. Data normalization test: The monthly electrical conductivity of the Maroon basin, in which the Landa coefficient of Box-Box conversion was used after normalizing the data. 3. Fitting SARIMA models on monthly qualitative flow data of the Idenak hydrometer station. 4. Confirmation of the fitting of SARIMA models based on the behavior of the residual. Auto-correlation function (ACF), partial correlation (PACF) and the Manteo port statistics (Ljung-Box) are common steps in the modeling of both the SARIMA and the hybrid model. 5. Selection of the best SARIMA model based on the minimum values of the AIC and Schwartz statistics 6. Introduction of the residual SARIMA models as inputs to the bilinear time series models to build a hybrid model.
In addition, considering the nonlinear structure of the data in the SARIMA model, in the condition where residual data of the model contain nonlinear data, the use of the bilinear time series models model would be appropriate to describe the residual SARIMA models. Therefore, the results of bilinear models can be considered as a substitute for SARIMA model errors. Then, the sum of the results of both SARIMA and bilinear forecasts is considered to be the final forecast. To fit the qualitative data with the SARIMA model, the graphical beauty of the software output mini tab software was used. Then, to construct the hybrid model, the residuals obtained from fitting the SARIMA models to the monthly qualitative data were fitted with the bilinear time series R software.

Model evaluation criteria
In this paper, in order to evaluate the performance of each mentioned model, a number of numerical criteria were used to determine the performance of the models. These criteria include: • Average absolute error • Root-mean-squared error • The coefficient of determination In the above equations, Q i is the observed monthly qualitative parameter values Q i the predicted monthly qualitative parameter values, Q the average monthly qualitative parameter values, and Q the average of the predicted monthly qualitative parameter values.

Results and discussion
As mentioned, one of the methods for testing the adequacy of the fitted model for the monthly discharge time series is to investigate the residual auto-correlation function. Figures 2,  3 and 4 represent the autocorrelation function and partial correlation of the residuals of fitted SARIMA models based on EC, SAR and TDS monthly qualitative parameters. According to Figs. 2, 3 and 4, the ACF values of the residuals are located in the permitted range of 95% confidence interval ( ±1.96∕ √ n ), so the residuals were stationary and did not show any trend. Also, the Q values of the Ljung-Box test confirmed the zero hypothesis that the residuals series is stochastic (Table 2). Also, the best fitted SARIMA, BL and SARIMA-BL Hybrid models were selected based on the lowest AIC information criterion, the Portmanteau statistics and the model residuals test (in order to evaluate the adequacy of the residuals of the best model). Table 3 shows a summary of the parameters of the fitted bilinear models on the monthly data of TDS, EC, SAR. Given the values of the Ljung-Box statistics, BL bilinear models and time series hybrid models (SARIMA-bilinear) comparing them with the corresponding values of the Chi-square of table which handle to be sure of the validity of fitting bilinear models, finally, bilinear model BL (5,1,1,1) with an Akaike value of 379.18 as the best model for modeling and predicting the SAR parameter, the bilinear model (12,0,1,1) with an Akaike value of 396.24 for modeling and predicting the TDS qualitative parameter… and the bilinear model BL (7,2,1,1) having Akaike value of 385.97 were used to predict and model the qualitative parameter of EC. It is worth to mention that, by increasing the moving average of bilinear models, their ability becomes weaker for model and predict. On the other hand, in the SARIMA linear model, as the order of the model increases, the model performance in forecasting as well as the accuracy of the model greatly reduces. In the nonlinear BL and SARIMA-BL hybrid models, it is true about the models with higher order of p and lower values of q, r and s which have better performance and more fitting accuracy.

Determination of effective input variables on qualitative variables in artificial neural network
To estimate the qualitative parameters of EC, TDS and SAR flow at the Idanak hydrometric station with artificial neural network, we initially attempt to determine the effective input variables. To provide the potential set of input variables, including sodium, magnesium, calcium, sulfate, HC and PH, each variable was considered up to three months ago. Then input variables affecting output variables including EC, TDS and SAR were obtained using PMI algorithm. Table 4 presents the qualitative parameters affecting the monthly electrical conductivity parameter based on the Hampel and Akaike criterion, where the sodium quality parameter (with a delay of one month), temperature (with a delay of two months), flow rate (with a delay of two months), magnesium (with two months delay) and acidity (with a delay of one month) were supposed as effective parameters. Table 5 shows the qualitative parameters affecting the monthly TDS parameter based on the Hampel and Akaike criterion, in which the parameters of sodium, magnesium, calcium, flow rate, temperature (with a delay of one month) were introduced as an effective input parameter to the neural network for modeling and advance monthly TDS parameters. Also parameters of    sodium, magnesium, calcium and acidity (with one month delay) parameters were introduced as effective input parameters in the modeling and prediction of the monthly parameter based on the Hampel and Akaike criterion. Also Based on the results in Table 6, monthly parameters of calcium. sodium Magnesium without time delay and ph parameter with one time delay based on Hempel and Akaike criteria were introduced as effective input parameters for modeling and forecasting the monthly qualitative parameters of SAR by PMI algorithm to the artificial neural networks. Unlike Zou et al. (2010) who found the superiority of ARIMA model than ANN in predicting EC parameter, our results are agreed with Sandhu and Finch (1995) results, who found the ability of neural network in predicting the hydrological parameters. Different models were evaluated to determine the optimum levels of hidden layers and transmission functions in artificial neural networks. The number of layers as well as the number of hidden layer neurons was determined based on the trial-and-error test. In the artificial neural network model, various scenarios were tested for the three parameters TDS, SAR, EC. Then, according to the model's studied criteria, the scenarios with the least error and the highest correlation with the observational data of the qualitative parameter related to the optimal structure were selected. For modeling of the neural network, data are usually divided into train and test groups. Therefore, in this study, to avoid over-fitting which results in nongeneralizability for new data, the data were divided into three train data , validation data (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011) and test data (2012)(2013)(2014)(2015)(2016)(2017)(2018). Among the available transfer functions, the function for the input and output layer of the linear function and in the hidden layer of various functions such as sigmoid tangent function, sigmoid log, etc., error back-propagation train algorithm was applied. The calculations related to optimal models showed that the number of neurons 8,9,8 in the hidden layer was for modeling and predicting EC, TDS and SAR, respectively, as being applied as optimal values. It should be noted that by increasing or decreasing the number of neurons in the three layers (the input, middle and output) the performance of the network was not significantly changed. In addition, in order to evaluate the performance of appropriately fitted models, different statistics were used. In Table 7, the numerical criteria of performance evaluation of the models were used to predict monthly qualitative parameters of EC, SAR and TDS. Also, among various active function the hyperbolic tangent active function was selected as the superior active function of the artificial neural network for predicting and modeling monthly qualitative parameters for EC, and the best active function for SAR was log sigmoid; similarly, tangent sigmoid was the best active function determined for the TDS parameters. Evaluation of the performance criteria of the models in Table 7 shows the gradual decline in the performance of the SARIMA models used for predicting and modeling the TDS and SAR monthly qualitative parameters from the training stage to the test stage so that the model performance in TDS prediction. The coefficient of determination was 0.81 at the calibration stage and reached to 0.76. But the performance of the SARIMA model in predicting EC parameterization shows a gradual improvement of the model performance. So that, in the validation stage the model performance was better than in the calibration stage; however, in the reduced model performance testing phase, this is due to the numerical values of the model performance evaluation. On the other hand, artificial neural network model with passive structures and active functions is used to predict monthly qualitative parameters and showed the highest coefficient of determination and lowest root-mean-square error and mean error compared to SARIMA time series models and binary BL models have better performance.  Table 7 shows numerical criteria for evaluating the predicted values of monthly qualitative parameters using linear models of SARIMA and nonlinear bilinear time series, neural network, SARIMA, bilinear hybrid models. On the other hand, the evaluation of the performance of the models in Table 7 according to the coefficient of determination and average absolute error in the training-validation-test stages was investigated. Besides, the results showed that the values of the corresponding tests decreased and the root-mean-square error increased. In addition, determining both the effective input parameters to the neural network by PMI algorithm and the optimal structure along with active function is very time-consuming and exhausting task. Therefore, using the SARIMA-BL hybrid model is a suitable solution to this problem.

Conclusion
So in this study, SARIMA, bilinear time series models, multi-layer perceptron neural network and SARIMA-BL hybrid models were used to predict monthly qualitative parameters of SAR, TDS and EC. The validity of the SARIMA fit models for modeling the aforementioned monthly qualitative parameters was confirmed based on the Portmanteau statistic, the residual of autocorrelation and partial autocorrelation. The validity of the SARIMA-BL hybrid and bi-linear models was verified using numerical values of Portmanteau statistics. Finally in this study, PMI algorithm was used to improve the performance of artificial neural network based on the results of PMI algorithm with different time delay as input to network to predict monthly qualitative parameters. In addition, evaluated criteria of model showed that BL models performed better than SARIMA models in predicting monthly EC-TDS-SAR quality parameters. So that, BL models have higher coefficient of determination and lower mean square root. They had training-validation-testing in all three phases compared to the SARIMA models. On the other hand, the evaluation of model for the studied criteria shows a gradual decrease in the performance of neural network models. BL models and hybrid models are from training to testing, respectively; the comparison of four models showed the tangible superiority of SARIMA-bilinear hybrid models in comparing to artificial neural network with effective input parameters based on PMI algorithm in prediction of the all there monthly qualitative parameters in Maroon basin. SARIMA-BL hybrid models performed better in all three stages of training-validation-test despite using lower input parameters than artificial neural networks with the least square root mean and the coefficient of determination.
Funding The author(s) received no specific funding for this work.

Conflict of interest
The authors declare that they have no conflict of interest. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.