Estimating discharge coefficient of side weirs in trapezoidal and rectangular flumes using outlier robust extreme learning machine

Using the outlier robust extreme learning machine (ORELM) method, the discharge coefficient of side weirs placed on rectangular and trapezoidal canals was simulated for the first time in this study. The parameters governing the discharge coefficient of side weirs were first identified: the Froude number (Fr), the ratio of the weir length to the main channel width (L/b), the ratio of the flow depth upstream of the side weir to the main channel width (y1/b), the ratio of the crest height of the side weir to the flow depth upstream of the side weir (W/y1), the ratio of the weir length to the flow depth upstream of the side weir (L/y1), and the side wall slope parameter (m). Using these governing parameters, eight different input combinations were defined. By random selection, 65% of the data were used to train the ORELM models and the remaining samples were used to test them. The correlation coefficient, Nash–Sutcliffe efficiency coefficient, and Scatter Index for the superior model were calculated to be 0.937, 0.869 and 0.092, respectively. The sensitivity analysis indicated that the ORELM model was more sensitive to W/y1 and L/b than to Fr and y1/b. The results of the ORELM model were also compared with the support vector machine optimized with genetic algorithm (SVM-GA), the extreme learning machine (ELM) and four multiple linear regression models; the ORELM model demonstrated higher precision and correlation with experimental values.


Introduction
As one of the main hydraulic structures, a side weir is installed on the sidewall of a main channel to divert and regulate flow. Such structures have many applications in irrigation and drainage networks, urban runoff collection systems and water and wastewater treatment plants (Bagheri et al. 2014).
The discharge coefficient is treated as the most significant parameter for designing a side weir, and many researchers have conducted analytical and numerical studies on it (Akhbari et al. 2017;Mirzaei and Sheibani 2020). For channels with different slopes, they reported the changes in the discharge coefficient versus the Froude number, suggesting that the discharge coefficient decreased with increasing Froude number. Furthermore, an experimental study on the discharge coefficient of weirs located on rectangular and trapezoidal canals was carried out by Keshavarzi and Ball (2014). They concluded that the discharge coefficient was a function of the Froude number, the ratio of the crest height of the side weir to the flow depth upstream of the weir, and the wall slope of the main channel. Moreover, Bagheri et al. (2014) evaluated the discharge coefficient of rectangular side weirs experimentally. The effects of hydraulic and geometric parameters on the changes in the discharge coefficient were evaluated and the variations in the free flow surface were calculated. The limitation of the multiple linear regressions (MLRs) applied in experimental studies to relate input and output parameters is their low generalizability when estimating samples that had no role in model calibration. Indeed, the MLR fits the model to the target values, and there is no training phase that prepares the model for unseen samples. To overcome this drawback, artificial intelligence models have been utilized as accurate and reliable tools for simulating and estimating different phenomena. The discharge coefficient of side weirs has been simulated with various artificial intelligence algorithms and models throughout recent decades (Salmasi and Sattari 2017;Niazkar and Afzali 2018;Olyaie et al. 2019).
Using the Gene Expression Programming (GEP) model, Ebtehaj et al. (2015a) estimated the discharge coefficient of rectangular side weirs. They proposed a formula using hydraulic and geometric parameters to determine the discharge coefficient. They also compared the results of the developed GEP with existing models and showed that the GEP was more accurate. Using the group method of data handling (GMDH), Ebtehaj et al. (2015b) estimated the discharge coefficient of side weirs. They also compared the results with an Artificial Neural Network (ANN) and showed that the GMDH was more accurate. Moreover, Khoshbin et al. (2016) presented an optimized hybrid model for estimating the discharge coefficient of side weirs through the combination of the adaptive neuro-fuzzy inference system (ANFIS), the genetic algorithm (GA) and the singular value decomposition (SVD). By conducting a sensitivity analysis, Azimi et al. (2017a) investigated the parameters influencing the discharge coefficient of side weirs located on trapezoidal channels. They defined the superior model and the most significant input parameter using the extreme learning machine (ELM). Azimi et al. (2017b) used the GEP to simulate the discharge coefficient of side weirs on trapezoidal channels under subcritical conditions. They provided an equation for calculating the discharge coefficient. Subsequently, Azimi et al. (2019a) developed six different models for estimating the discharge coefficient of weirs located on a trapezoidal channel by means of the support vector machine approach. They identified the superior model for determining the discharge coefficient by performing a sensitivity analysis. Bagherifar et al. (2020) simulated the flow field within circular flumes with rectangular side weirs through a computational fluid dynamics (CFD) model. The results showed that the CFD model estimated the flow characteristics with a reasonable performance. The authors demonstrated that the specific energy upstream and downstream of the side weir was approximately constant.
Owing to its unique characteristics, the extreme learning machine (ELM) (Huang et al. 2006), an efficient and effective machine learning algorithm (Huang 2014) for solving complex nonlinear problems, has attracted many researchers' attention (Azimi et al. 2017a;Ebtehaj et al. 2018;Zeynoddin et al. 2018;Bonakdari et al. 2019;Azimi and Shiri 2021). Some of the advantages of this method are as follows. (1) In addition to the ability to approximate the estimator function, it can map the training input variables to the corresponding output and can perform fast and accurate parallel computations during the testing and training processes. (2) Various experimental studies showed that the ELM technique has better generalization and scalability performance than classical methods such as the multilayer perceptron and the support vector machine (Huang et al. 2011;Ebtehaj et al. 2016;Azimi and Shiri 2021). (3) The modeling speed of the ELM is noticeably high, while other classical methods are burdened with increased computational costs for training the model. In fact, this feature is the most noticeable advantage of the method over classical machine learning algorithms: all parameters relevant to hidden nodes (i.e., biases and input weights) are produced randomly, without reference to the training samples and without tuning (Huang et al. 2006).
In the current study, a novel version of the ELM known as the Outlier Robust ELM (ORELM) (Zhang and Luo 2015) is applied for estimating the discharge coefficient of side weirs for the first time. The novelty of the present study is threefold. (1) The ORELM is applied for the first time to the discharge coefficient of side weirs. (2) A review of the literature shows that no previous study on side weirs has carried out a comparative analysis of the probable input combinations for the discharge coefficient of side weirs situated on rectangular and trapezoidal channels. In the current study, eight different input combinations with four to six input variables are examined as ORELM 1 to ORELM 8. (3) Most previous equations were proposed based on restricted database ranges. In this study, however, a wide-ranging database combining four different experimental datasets is used. Besides, the results of the ORELM are compared with existing artificial intelligence-based methods for estimating the discharge coefficient. The best ORELM model is compared with three artificial intelligence (AI) and four empirical approaches. According to the performed analyses, the superior ORELM model performs better than these AI-based and empirical models.

Discharge coefficient of side weir
The parameters influencing the discharge coefficient ($C_d$) of rectangular side weirs are written as follows (Azimi et al. 2017b):

$$C_d = f_1\left(Fr, L, b, W, y_1, S_0\right) \tag{1}$$

where Fr is the Froude number of the flow upstream of the side weir, L is the side weir length, b is the main channel width, W is the crest height of the side weir, $y_1$ is the flow depth upstream of the side weir and $S_0$ is the slope of the main channel bed. To make the introduced parameters dimensionless, the ratio of the side weir length to the main channel width ($L/b$), the ratio of the side weir length to the flow depth upstream of the side weir ($L/y_1$), and the ratio of the side weir crest height to the flow depth upstream of the side weir ($W/y_1$) are defined (Azimi et al. 2017b):

$$C_d = f_2\left(Fr, \frac{L}{b}, \frac{L}{y_1}, \frac{W}{y_1}, S_0\right) \tag{2}$$

Also, Borghei et al. (1999) reported that the effects of the main channel bed slope are marginal and can be ignored in the subcritical flow regime. In this study, the main channel is a trapezoidal channel, and the side wall slope (m) is an effective factor on the discharge coefficient (Azimi et al. 2017b). Thus, Eq. (2) is expressed as follows:

$$C_d = f_3\left(Fr, \frac{L}{b}, \frac{L}{y_1}, \frac{W}{y_1}, m\right) \tag{3}$$

In addition, to study all parameters influencing the discharge coefficient of side weirs located on trapezoidal channels, the influence of the ratio of the flow depth upstream of the side weir to the trapezoidal channel bed width ($y_1/b$) is taken into account. So, Eq. (3) is written in the form of Eq. (4) (Azimi et al. 2017b):

$$C_d = f_4\left(Fr, \frac{L}{b}, \frac{L}{y_1}, \frac{W}{y_1}, \frac{y_1}{b}, m\right) \tag{4}$$

Therefore, to develop the artificial intelligence models, the parameters of Eq. (4) are utilized. In Fig. 1, the input parameter combinations of the various ORELM models are shown.
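As an illustration of how the dimensionless inputs listed in this section could be assembled from raw measurements, the following minimal sketch may help. The function names, the example values and the hydraulic-depth definition of the Froude number (Fr = V / sqrt(g A / T), with m taken as the horizontal-to-vertical side slope) are assumptions made for illustration; the text does not state how Fr was computed in the underlying experiments.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def froude_trapezoidal(Q, b, y, m):
    """Froude number Fr = V / sqrt(g * A / T) for a trapezoidal section.

    Assumption: m is the side slope (horizontal : 1 vertical), so m = 0
    reduces the section to a rectangle of width b.
    """
    area = (b + m * y) * y          # flow area A
    top_width = b + 2.0 * m * y     # free-surface width T
    velocity = Q / area             # mean velocity V
    return velocity / math.sqrt(G * area / top_width)

def dimensionless_inputs(Q, b, y1, L, W, m):
    """Collect the six candidate model inputs named in the text."""
    return {
        "Fr": froude_trapezoidal(Q, b, y1, m),
        "L/b": L / b,
        "L/y1": L / y1,
        "W/y1": W / y1,
        "y1/b": y1 / b,
        "m": m,
    }

# Hypothetical measurement (SI units), not taken from the cited datasets.
example = dimensionless_inputs(Q=0.05, b=0.5, y1=0.2, L=0.75, W=0.1, m=1.0)
print(example)
```

Each ORELM input combination would then simply select a subset of these dictionary entries.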

Data sets used in this study
A detailed database is used in this paper to model the discharge coefficient of side weirs. To this end, four different experimental models including Cheong (1991), Emiroglu et al. (2011), Keshavarzi and Ball (2014) and Bagheri et al. (2014) are implemented. Cheong's (1991) model involves a straight trapezoidal channel with a length of 10 m and a bed width of 0.67 m in which the side weir is placed on the sidewall at a distance of two-thirds of the main channel length from the inlet. Slope of the trapezoidal channel sidewalls in Cheong (1991) [...]
Approximately 65% of the experimental samples are randomly selected to train the artificial intelligence models and the remaining 35% are used to test them. Figure 2 demonstrates the layout of the experimental models used in this analysis. The maximum, minimum, and average values of the applied experimental measurements are tabulated in Table 1.
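The random 65%/35% partition described above can be sketched as follows. The helper name `random_split`, the seed and the rounding rule are assumptions for illustration; the sample count of 314 is the total reported later in the paper.

```python
import numpy as np

def random_split(n_samples, train_fraction=0.65, seed=0):
    """Return shuffled, disjoint index arrays for training and testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)              # random order of sample indices
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]

train_idx, test_idx = random_split(314)
print(len(train_idx), len(test_idx))
```

Fixing the seed makes the split reproducible, which matters when several input combinations are compared on the same test samples.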

Outlier robust extreme learning machine (ORELM)
A method for training the single-layer feed-forward neural network (SLFFNN) is the Extreme Learning Machine (ELM) technique (Huang et al. 2006). Consider Z training samples $\{(k_i, q_i)\}_{i=1}^{Z}$, where $k_i \in \mathbb{R}^n$ is the vector of problem inputs and $q_i \in \mathbb{R}$ is the corresponding target. If the proposed model is able to map $k_i$ to $q_i$ with reasonable accuracy, the ELM with the activation function f(k) and N hidden layer neurons can be expressed as follows (Huang et al. 2006):

$$\sum_{j=1}^{N} \gamma_j \, f\left(G_j \cdot k_i + b_j\right) = q_i, \quad i = 1, 2, \ldots, Z \tag{5}$$

where $\gamma_j$ is the output weight connecting the jth hidden layer neuron to the output neuron (target variable), f(k) is the activation function, $G_j = [g_{j1}, g_{j2}, g_{j3}, \ldots, g_{jn}]$ is the input weight vector connecting the jth hidden layer neuron to the input neurons, and $b_j$ is the bias of the jth hidden layer neuron. In addition, $G_j \cdot k_i$ is the inner product of $G_j$ and $k_i$. Expressing the Z relationships of Eq. (5) in matrix form gives the following linear system (Huang et al. 2006):

$$W\gamma = q$$

Here,

$$W = \begin{bmatrix} f(G_1 \cdot k_1 + b_1) & \cdots & f(G_N \cdot k_1 + b_N) \\ \vdots & \ddots & \vdots \\ f(G_1 \cdot k_Z + b_1) & \cdots & f(G_N \cdot k_Z + b_N) \end{bmatrix}_{Z \times N}, \quad \gamma = [\gamma_1, \ldots, \gamma_N]^T, \quad q = [q_1, \ldots, q_Z]^T$$

According to the above relationship, the only parameter that needs to be calculated is the output weight matrix ($\gamma$); the other ones (the biases and the input weight matrix) are constant. The matrix W is non-square in most cases, so there might be no $\gamma$ that satisfies $W\gamma = q$ exactly (Huang et al. 2006). To overcome this issue, the optimal answer is obtained using the least-squares solution. To this end, the main aim is to minimize the loss function:

$$\min_{\gamma} \|e\|_2^2 = \min_{\gamma} \|q - W\gamma\|_2^2$$

Eventually, the optimal response of the problem for minimizing the $l_2$-norm is as follows:

$$\hat{\gamma} = W^{+} q$$

where $W^{+}$ is the Moore-Penrose generalized inverse (MPGI) of W (Rao and Mitra 1971). As the number of training samples is greater than the number of hidden layer nodes (Z > N), it is possible to rewrite the above equation as follows:

$$\hat{\gamma} = \left(W^T W\right)^{-1} W^T q$$

Since outliers make up only a small part of the training samples, the training error e can be characterized by sparsity. Sparsity is reflected better by the $l_0$-norm than by the $l_2$-norm. Therefore, in the ORELM, we seek the output weight matrix ($\gamma$) with the least value of the $l_2$-norm such that e is sparse:

$$\min_{\gamma} \; C\|e\|_0 + \|\gamma\|_2^2 \quad \text{subject to} \quad q - W\gamma = e \tag{9}$$

where C is a regularization parameter.

Fig. 2 Layout of used experimental models
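The above relationship is a non-convex programming problem. Since the sparse term can be obtained using the $l_1$-norm (Chuang et al. 2002;Daszykowski et al. 2007), replacing the $l_0$-norm by the $l_1$-norm in Eq. (9) makes the overall minimization convex while preserving the sparsity feature. Thus, Eq. (9) is rewritten as follows:

$$\min_{\gamma} \; \|e\|_1 + \frac{1}{C}\|\gamma\|_2^2 \quad \text{subject to} \quad q - W\gamma = e$$

The above equation is a constrained convex problem that suits the augmented Lagrange multiplier (ALM) approach. Hence, the ALM function is provided as follows:

$$L_{\eta}(e, \gamma, \alpha) = \|e\|_1 + \frac{1}{C}\|\gamma\|_2^2 + \alpha^T\left(q - W\gamma - e\right) + \frac{\eta}{2}\left\|q - W\gamma - e\right\|_2^2$$

where C is the regularization parameter, $\eta$ is the penalty parameter and $\alpha \in \mathbb{R}^Z$ is the Lagrange multiplier vector. Also, $\eta = 2Z/\|q\|_1$ (Yang and Zhang 2011). The ALM algorithm yields the optimal answers ($\gamma$, e) and the value of $\alpha$ through an iterative minimization of the augmented Lagrangian:

$$\left(e_{k+1}, \gamma_{k+1}\right) = \arg\min_{e, \gamma} L_{\eta}\left(e, \gamma, \alpha_k\right), \qquad \alpha_{k+1} = \alpha_k + \eta\left(q - W\gamma_{k+1} - e_{k+1}\right)$$

To produce the next iterates through the minimization process, the following relationships are solved by the ALM:

$$\gamma_{k+1} = \left(W^T W + \frac{2}{C\eta} I\right)^{-1} W^T\left(q - e_k + \frac{\alpha_k}{\eta}\right)$$

$$e_{k+1} = \operatorname{shrink}\left(q - W\gamma_{k+1} + \frac{\alpha_k}{\eta}, \frac{1}{\eta}\right), \qquad \operatorname{shrink}(v, t) = \operatorname{sign}(v) \max\left(|v| - t, 0\right)$$

Although the developed ORELM has advantages, including a high ability to map nonlinearly between inputs and outputs, a rapid training time that overcomes the limitation of classical time-consuming approaches, minimal user intervention and high generalization, it also has a disadvantage. The main disadvantage of this method is the random generation of the input weights and biases of the hidden neurons, which can affect the generalization ability of the developed model. To overcome this drawback, it is recommended to run the method several times and check its generalizability on testing samples that had no role in model calibration.
The flowchart of the ORELM model is presented in Fig. 3.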
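The ELM least-squares solution and the ALM iterations of the ORELM described in this section can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation: it assumes a sigmoid activation, the closed-form updates commonly given for this formulation (a ridge-type solve for the output weights and soft-thresholding for the sparse error), and hypothetical values for the regularization parameter C, the iteration count and the synthetic data.

```python
import numpy as np

def hidden_layer(K, G, b):
    """Hidden-layer output matrix W (Z x N) with a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(K @ G.T + b)))

def elm_fit(W, q):
    """ELM output weights from the Moore-Penrose inverse: gamma = W^+ q."""
    return np.linalg.pinv(W) @ q

def shrink(v, t):
    """Soft-thresholding operator associated with the l1-norm term."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def orelm_fit(W, q, C=1e3, n_iter=200):
    """ORELM output weights via augmented Lagrange multiplier iterations.

    C and n_iter are illustrative defaults, not values from the paper.
    """
    Z, N = W.shape
    eta = 2.0 * Z / np.sum(np.abs(q))   # penalty parameter, 2Z / ||q||_1
    e = np.zeros(Z)                     # sparse training error
    alpha = np.zeros(Z)                 # Lagrange multiplier vector
    # Fixed factor (W^T W + 2/(C*eta) I)^-1 W^T, reused at every iteration.
    P = np.linalg.solve(W.T @ W + (2.0 / (C * eta)) * np.eye(N), W.T)
    for _ in range(n_iter):
        gamma = P @ (q - e + alpha / eta)
        e = shrink(q - W @ gamma + alpha / eta, 1.0 / eta)
        alpha = alpha + eta * (q - W @ gamma - e)
    return gamma, e

# Synthetic demonstration with injected outliers (not the paper's data).
rng = np.random.default_rng(0)
Z, n, N = 200, 1, 20
K = rng.uniform(-1.0, 1.0, size=(Z, n))
clean = 0.5 + 0.2 * np.sin(2.0 * K[:, 0])
q = clean.copy()
outliers = rng.choice(Z, size=30, replace=False)
q[outliers] += 5.0

G = rng.uniform(-3.0, 3.0, size=(N, n))  # random input weights, never trained
b = rng.uniform(-3.0, 3.0, size=N)       # random hidden biases, never trained
W = hidden_layer(K, G, b)

gamma_elm = elm_fit(W, q)
gamma_orelm, e = orelm_fit(W, q)

rmse_elm = np.sqrt(np.mean((W @ gamma_elm - clean) ** 2))
rmse_orelm = np.sqrt(np.mean((W @ gamma_orelm - clean) ** 2))
print(f"RMSE vs clean targets: ELM={rmse_elm:.3f}, ORELM={rmse_orelm:.3f}")
```

Because the input weights and biases are drawn at random, repeating the fit with different seeds and comparing the test errors is one simple way to follow the recommendation above to run the method several times.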

Goodness of fit
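The correlation coefficient (R), variance accounted for (VAF), Root Mean Square Error (RMSE), Scatter Index (SI), Mean Absolute Relative Error (MARE) and Efficiency of Nash-Sutcliffe (NSC) are used as follows in this study (Azimi and Shiri 2020a):

$$R = \frac{\sum_{i=1}^{n}\left(F_i - \bar{F}\right)\left(O_i - \bar{O}\right)}{\sqrt{\sum_{i=1}^{n}\left(F_i - \bar{F}\right)^2 \sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}}$$

$$VAF = \left(1 - \frac{\operatorname{var}\left(O_i - F_i\right)}{\operatorname{var}\left(O_i\right)}\right) \times 100$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(F_i - O_i\right)^2}$$

$$SI = \frac{RMSE}{\bar{O}}$$

$$MARE = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|F_i - O_i\right|}{O_i}$$

$$NSC = 1 - \frac{\sum_{i=1}^{n}\left(O_i - F_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}$$

where $O_i$ denotes the observed values, $F_i$ the values predicted by the numerical models, $\bar{O}$ and $\bar{F}$ the averages of the observed and predicted values, and n the number of observed values. In the current study, five criteria were applied: the correlation of the ORELM models was assessed using the R and NSC indices, whereas the relative errors of the models were evaluated by means of the SI and MARE criteria. Moreover, the absolute errors were examined through the RMSE indicator. The total number of experimental measurements used was 314 cases, of which 65% were used for training the ORELM models and the remaining 35% for testing them.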
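The six indices named in this section can be computed as in the sketch below, which assumes their commonly used definitions (Pearson correlation for R, RMSE divided by the observed mean for SI, and so on). The function name and the sample values are hypothetical.

```python
import numpy as np

def goodness_of_fit(observed, predicted):
    """Return R, VAF, RMSE, SI, MARE and NSC for paired observations/predictions."""
    O = np.asarray(observed, dtype=float)
    F = np.asarray(predicted, dtype=float)
    err = F - O
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "R": np.corrcoef(O, F)[0, 1],                        # correlation coefficient
        "VAF": (1.0 - np.var(O - F) / np.var(O)) * 100.0,     # variance accounted for, %
        "RMSE": rmse,
        "SI": rmse / O.mean(),                               # scatter index
        "MARE": np.mean(np.abs(err) / O),                    # assumes O > 0
        "NSC": 1.0 - np.sum(err ** 2) / np.sum((O - O.mean()) ** 2),
    }

# Hypothetical discharge coefficients, for illustration only.
obs = [0.40, 0.50, 0.60]
pred = [0.50, 0.60, 0.70]
scores = goodness_of_fit(obs, pred)
print(scores)
```

The example shifts every prediction by a constant, which leaves R and VAF at their ideal values while RMSE, SI, MARE and NSC all degrade; this is why correlation-type and error-type indices are reported together.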

Results and discussion
In the following sections, the number of hidden layer neurons is first optimized and various activation functions are then investigated. After that, the superior model and the most effective input parameters are identified through a sensitivity analysis. In addition, the superior ORELM model is compared with several artificial intelligence and regression models. An uncertainty analysis and a reliability analysis are also performed for these models. Finally, a partial derivative sensitivity analysis (PDSA) is conducted for the superior model. It should be noted that only the testing mode results are presented in this research.

Number of hidden layer neurons
The number of ORELM hidden layer neurons is investigated in this section. Selecting the optimal number of neurons increases the artificial intelligence model's efficiency in terms of modeling accuracy and computational time (Azimi and Shiri 2021). The number of hidden layer neurons is initially set to 5 and is increased gradually to 24. The optimal number of hidden layer neurons is found to be 22. The values of the different statistical indices measured for all numbers of hidden layer neurons are presented in Fig. 4

Activation function
In the following, the ORELM activation functions are studied. In this paper, five activation functions, namely sig, sin, hardlim, tribas and radbas, are utilized for the ORELM model. Figure 6 displays [...] Moreover, the RMSE and MARE statistical indices for the activation function tribas are obtained to be 0.089 and 0.159, respectively. In addition, for the activation function radbas, the values of R, SI and NSC are calculated to be 0.846, 0.167 and 0.713, respectively. Thus, among the different activation functions, sig is identified as the superior one and is used in the subsequent modeling process for simulating the discharge coefficient. The discharge coefficients simulated with the activation function sig and their comparison with the experimental values are shown in Fig. 7.
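For reference, the five activation functions named above can be written out as follows. The definitions assume the conventional meaning of these short names (sigmoid, sine, hard limit, triangular basis and radial basis); the dictionary name `ACTIVATIONS` is a hypothetical choice for this sketch.

```python
import numpy as np

ACTIVATIONS = {
    "sig": lambda x: 1.0 / (1.0 + np.exp(-x)),                # sigmoid
    "sin": np.sin,                                            # sine
    "hardlim": lambda x: (x >= 0).astype(float),              # hard limit: 1 if x >= 0, else 0
    "tribas": lambda x: np.maximum(1.0 - np.abs(x), 0.0),     # triangular basis
    "radbas": lambda x: np.exp(-x ** 2),                      # radial basis
}

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for name, f in ACTIVATIONS.items():
    print(name, f(x))
```

Note that hardlim has a zero derivative almost everywhere, which is not a problem for ELM-type training because the hidden-layer parameters are never tuned by gradient descent.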

Sensitivity analysis
The performance of the different ORELM models is evaluated in this section by performing a sensitivity analysis. Figure 8 displays the results of the various statistical indices calculated for these models. Based on the performed sensitivity analysis, the ORELM 8 and ORELM 2 models have the highest and the lowest accuracies, respectively. Furthermore, eliminating the dimensionless parameters W/y1 and L/b reduces the modeling accuracy considerably. So, these dimensionless parameters are identified as the most influential input parameters for simulating the discharge coefficient with the ORELM model. Therefore, the performed sensitivity analysis demonstrated that ORELM 8 was the best model for estimating the discharge coefficient. After ORELM 8, ORELM 5, ORELM 3, ORELM 4, ORELM 1, ORELM 7 and ORELM 6 were, respectively, identified as the second, third, fourth, fifth, sixth and seventh-best models for estimating the target function. ORELM 2 showed the worst performance in modeling the discharge coefficient of side weirs.
Furthermore, W/y1 had the greatest effect on the ORELM network's prediction of the discharge coefficient, while the L/b, Fr, m, and y1/b factors were, respectively, recognized as the second, third, fourth and fifth most important input parameters. It is worth noting that the side wall slope (m) was an insignificant input variable for approximating the target value.

Comparison of ORELM model with AI-based and regression-based models
In this section, the superior model (ORELM 8) is compared with three artificial intelligence models developed by Roushangar et al. (2016) and Azimi et al. (2017a), as well as four regression models defined by Singh et al. (1994) (Reg 1), Borghei et al. (1999) [...] Azimi et al. (2017a, b) applied the measurements presented by Cheong (1991) for training and testing the ELM model. It should be stated that almost all previous studies estimated the discharge coefficient rather than the discharge, and the empirical and artificial intelligence-based equations were likewise presented to approximate the discharge coefficient, not the discharge. Therefore, evaluating the side weir discharge coefficient is reasonable in the current study. Figure 10 provides a comparison of the values of the various statistical indices for the artificial intelligence and regression models. Also, Fig. 11 compares the discharge coefficients simulated by the different artificial intelligence and regression models. Based on the comparison, the values of R for the ELM, and SVM-GA models are computed to be 0.162, -0.484 and 0.482, respectively. Also, for the Reg 1, Reg 2 and Reg 3 models, the NSC values are obtained equal to 0.135, -0.617 and -0.256, respectively. In addition, the values of RMSE, SI and MARE for the Reg 4 model are equal to 0.142, 0.288 and 0.207, respectively.
In comparison with the other studies carried out on the simulation of the discharge coefficient so far, the ORELM 8 model therefore has the highest precision and the lowest error. In other words, the ORELM 8 model is more flexible than the other artificial intelligence and regression models in simulating the discharge coefficient. For instance, the accuracy of the ORELM model was roughly 87% greater than that of the ELM model, whereas the correlation of the ORELM model was nearly 94% higher than that of the SVM-GA model. Moreover, the precision of the ORELM model was approximately 74%, 69% and 61% better than that of the Reg 1, Reg 2 and Reg 3 models, respectively. The comparison showed a good generalization ability of the applied ORELM model relative to the previous investigations: the ORELM algorithm was applied to a wide range of experimental measurements, while the other regression and AI-based equations were proposed only for some specific experimental values. Therefore, the presented ORELM model was a more general algorithm than its counterparts. Additionally, the ORELM model had a high level of precision and correlation with the experimental values.
In addition, a 95% uncertainty analysis is performed for the artificial intelligence and regression models and the results are given in Table 2. The 95% uncertainty is calculated following Saberi-Movahed et al. (2020), where its mathematical details can be found. The 95% uncertainty values for the ELM, and SVM-GA models are 0.0022, 0.0057 and 0.0019, respectively. Moreover, the 95% uncertainty values for the Reg 1, Reg 2, Reg 3 and Reg 4 models are obtained to be 0.0014, 0.0013, 0.0012 and 0.0012, respectively, while this value for the ORELM 8 model is estimated to be 0.0009. Based on these results, the ORELM 8 model has the lowest uncertainty.
In the following, a reliability analysis is conducted for the artificial intelligence and regression models. The mathematical details of the reliability can be found in Saberi-Movahed et al. (2020).
The reliability is computed as follows:

$$\text{Reliability} = \frac{100\%}{n}\sum_{i=1}^{n} k_i$$

Here, $k_i$ is determined in two phases. Firstly, the relative average error (RAE) is calculated as a vector whose ith component is as below:

$$RAE_i = \left|\frac{O_i - F_i}{O_i}\right|$$

Secondly, if $RAE_i \le \Delta$, then $k_i = 1$; otherwise, $k_i = 0$. Here, $\Delta$ is the threshold value of the target function, so that the sum of the $k_i$ counts the number of times the value of RAE is less than or equal to $\Delta$. According to the Chinese standards, the optimal value of $\Delta$ is 0.2 (Saberi-Movahed et al. 2020). The results of the analysis are listed in Table 3. For the ELM and SVM-GA models, the reliability value is estimated to be 13.245% and 2.649%, respectively. In addition, this value for the Reg 1, Reg 2, Reg 3 and Reg 4 models is computed to be 39.073%, 31.788%, 65.894% and 57.947%, respectively. It is worth noting that for the ORELM 8 model, the reliability value is computed to be 93.378%, indicating the high accuracy and reliability of the ORELM 8 model.
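The two-phase reliability calculation described above (relative error per sample, then the share of samples at or below the threshold) can be sketched as follows. The function name and the sample values are hypothetical; the threshold of 0.2 is the value quoted in the text.

```python
import numpy as np

def reliability(observed, predicted, delta=0.2):
    """Percentage of samples whose relative error does not exceed delta."""
    O = np.asarray(observed, dtype=float)
    F = np.asarray(predicted, dtype=float)
    rae = np.abs((O - F) / O)          # phase 1: relative error of each sample
    k = rae <= delta                   # phase 2: k_i = 1 if RAE_i <= delta, else 0
    return 100.0 * np.mean(k)

# Hypothetical values: relative errors of 0.1, 0.3, 0.1 and 0.5.
obs = [1.0, 1.0, 1.0, 1.0]
pred = [1.1, 1.3, 0.9, 0.5]
print(reliability(obs, pred))
```

Unlike RMSE, this index is insensitive to how large the failing errors are, so it complements rather than replaces the error-based criteria.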

Partial derivative sensitivity analysis (PDSA)
For the superior model (ORELM 8), a partial derivative sensitivity analysis (PDSA) is performed in this section. The PDSA is typically used to measure the effect of input parameters on the target parameter (Azimi et al. 2019b;Azimi and Shiri 2020b). In other words, the PDSA is a method for identifying how the objective parameter changes with the input parameters. A positive PDSA implies an increase in the objective function (discharge coefficient), while a negative sign implies a decrease in the target function. In this process, the partial derivative of the target function with respect to each input parameter is computed. According to the PDSA findings, the PDSA increases with increasing Froude number. Also, by increasing the parameter L/b, the PDSA decreases. Furthermore, by increasing the input parameters y1/b and W/y1, the PDSA decreases. The PDSA results for the ORELM 8 model are illustrated in Fig. 12.
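One way to approximate such partial derivatives for a fitted model is by central finite differences, as in the sketch below. This is an assumption about the numerical procedure, which the text does not specify; `toy_model` is a hypothetical stand-in for a trained model, not the ORELM 8 relationship.

```python
import numpy as np

def partial_derivatives(model, X, h=1e-4):
    """Central-difference estimate of d(model)/d(x_j) at every row of X."""
    X = np.asarray(X, dtype=float)
    grads = np.empty_like(X)
    for j in range(X.shape[1]):
        step = np.zeros(X.shape[1])
        step[j] = h                                   # perturb only input j
        grads[:, j] = (model(X + step) - model(X - step)) / (2.0 * h)
    return grads

def toy_model(X):
    """Hypothetical response: rises with input 0, falls with input 1."""
    return 0.6 + 0.1 * X[:, 0] - 0.2 * X[:, 1] ** 2

X = np.array([[0.3, 0.5], [0.6, 1.0]])
pd_values = partial_derivatives(toy_model, X)
print(pd_values)
```

A positive column indicates that the target increases with that input at the sampled point, and a negative column indicates a decrease, which matches the sign interpretation given above.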

Superior ORELM model
ORELM 8 was identified as the best model to simulate the discharge coefficient in the current study. Thus, a relationship for ORELM 8 is presented in terms of InW, InV, BHN and OutW, which are, respectively, the matrices of input weights, input variables, the bias of hidden neurons and output weights. The sensitivity analysis showed that Fr, L/b, y1/b and W/y1 had a significant influence on modeling the discharge coefficient using the ORELM algorithm. Hence, ORELM 8 was developed as the superior model to estimate the discharge coefficient in the present work.
The performed analyses showed that ORELM 8 had the highest level of precision and correlation and that this model could simulate the discharge coefficient of side weirs with a low level of uncertainty. Moreover, the side weir height to flow depth ratio (W/y1) and the side weir length to main channel width ratio (L/b) were identified as the most influential input factors for estimating the target function.
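To show how a relationship built from InW, InV, BHN and OutW could be evaluated, the sketch below assumes the usual single-hidden-layer form with the sigmoid activation selected earlier, i.e. the output is the sigmoid of (InW × InV + BHN) multiplied by OutW. The exact form and the fitted matrices are not reproduced here, so the shapes (22 hidden neurons, four inputs) and the random placeholder values are assumptions for illustration only.

```python
import numpy as np

def orelm_output(InW, InV, BHN, OutW):
    """Evaluate a single-hidden-layer relationship with a sigmoid activation.

    Assumed shapes: InW (N, n), InV (n,), BHN (N,), OutW (N,).
    """
    hidden = 1.0 / (1.0 + np.exp(-(InW @ InV + BHN)))   # hidden-neuron outputs
    return float(hidden @ OutW)                         # weighted sum -> scalar

# Placeholder matrices (NOT the fitted ORELM 8 values).
rng = np.random.default_rng(1)
N, n = 22, 4                                  # hidden neurons, inputs
InW = rng.uniform(-1.0, 1.0, size=(N, n))
BHN = rng.uniform(-1.0, 1.0, size=N)
OutW = rng.uniform(-0.1, 0.1, size=N)
InV = np.array([0.4, 1.5, 0.4, 0.5])          # hypothetical Fr, L/b, y1/b, W/y1

print(orelm_output(InW, InV, BHN, OutW))
```

With the real matrices substituted for the placeholders, the same function would return the estimated discharge coefficient for one set of input variables.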

Conclusion
The discharge coefficient of side weirs located on rectangular and trapezoidal channels was simulated for the first time using a modern artificial intelligence model called the "outlier robust extreme learning machine (ORELM)" in this study. Initially, a comprehensive database composed of four different experimental models was utilized for validating the artificial intelligence models. It is worth remembering that 65% of the observed samples were used for training the ORELM models and the remaining 35% were used for testing them. The most important results are summarized as follows:
• The optimal number of hidden layer neurons was selected to be 22 through a trial and error process.
• The sigmoid was chosen as the best activation function for the ORELM model.
• Eight distinct ORELM models were produced using the effective dimensionless parameters, and ORELM 8 was identified as the superior model.
• By performing a sensitivity analysis, the parameters W/y1 and L/b were identified as the most influential input parameters.
• For the ORELM 8 model, the values of R, RMSE and MARE were approximately 0.937, 0.045 and 0.081, respectively.
• To compare the results of the best ORELM-based model (ORELM 8) with the existing ones, three nonlinear artificial intelligence methods (support vector machine optimized by genetic algorithm, and extreme learning machine) and four multiple linear regression equations were employed. The results indicated that the ORELM 8 model performs better.
• Furthermore, the uncertainty and reliability of the ORELM 8 model were computed to be 0.0009 and 93%, respectively.
• Finally, for the ORELM 8 model, a partial derivative sensitivity analysis was performed; the PDSA increased with increasing Froude number and decreased with increasing L/b.
This study showed that the ORELM model, which is calibrated with a large number of samples, has more flexibility and better efficiency. For future work, it is recommended to check the performance of other strong artificial intelligence methods, such as adaptive neuro-fuzzy inference systems optimized with newly developed evolutionary algorithms such as gray wolf optimization. Additionally, the provided ORELM-based equation may be used for estimating the discharge coefficient in practical applications. It is also suggested that the results of the ORELM model be compared with computational fluid dynamics (CFD) tools.
Funding The author(s) received no specific funding for this work.

Conflict of interest The authors declare no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.