Introduction

The study of hydraulic phenomena begins with identifying the effective parameters. To this end, the influential parameters, such as fluid properties and hydraulic and geometric variables, are collected, and dimensionless parameters are derived by dimensional analysis, for example with the Buckingham π theorem (Dehdar-behbahani and Parsaie 2016; Chen 2015). The influence of the independent parameters on the dependent parameter is usually determined with design of experiments (DOE) techniques. In this approach, the effect of one independent parameter on the dependent parameter is isolated by holding the other parameters constant during the experiments (Antony 2014). With the advance of data mining approaches such as neural network models in almost all areas of water engineering (Azamathulla et al. 2016; Parsaie 2016a, b), researchers have attempted to use these techniques for predicting and modeling hydraulic and hydrologic phenomena (Tayfur 2014). As the name suggests, data mining models are developed from data sets; investigators have therefore tried to collect the related data from various reliable sources such as peer-reviewed articles, handbooks and books (Araghinejad 2013). During data collection, identifying the most effective independent parameters is sometimes difficult, so several mathematical approaches, such as principal component analysis (PCA) and other multivariate analysis techniques, have been proposed for this purpose. These approaches help to identify the parameters that most affect the phenomenon of interest (Remesan and Mathew 2014). Since this research focuses on the side weir discharge coefficient, the remainder of this review concentrates on that subject.
A side weir is a weir set in the side wall of a channel, in most cases parallel to the flow direction (Haghiabi 2012; Heidarpour et al. 2008). Side weirs are used to remove excess flow from hydro-systems such as irrigation and drainage networks and sewer systems (Bagheri et al. 2014; Haddadi and Rahimpour 2012; Parsaie et al. 2015a). Experimental, analytical and artificial intelligence (AI) techniques have been used for calculating and predicting the side weir discharge coefficient (Vatankhah 2013a, b; Parsaie and Haghiabi 2014). In experimental studies, researchers have tried to improve the performance of the side weir; to this end, various crest shapes have been proposed, most of which are categorized as nonlinear crests. In the field of numerical modeling, computational fluid dynamics and AI techniques have been used (Aydin and Emiroglu 2013). In computational hydraulics, the water surface profile and flow properties have been studied (Parsaie and Haghiabi 2015a, b). The side weir discharge coefficient has been predicted and modeled with several types of neural network techniques, such as the multilayer perceptron (MLP) neural network, the adaptive neuro-fuzzy inference system (ANFIS) and the group method of data handling (GMDH) (Ebtehaj et al. 2015a; Emiroglu et al. 2011b; Kisi et al. 2012). According to these reports, such models are considerably more accurate than the empirical formulas. Using AI models together with numerical methods increases the accuracy of the numerical simulation (Parsaie et al. 2015b; Parsaie and Haghiabi 2015a, c). Although AI techniques can model complex systems, finding the optimal structure of these models is an important issue in the model development process.
Several mathematical approaches, such as the gamma test, Monte Carlo simulation and multivariate analysis techniques such as principal component analysis (PCA), have been proposed for this purpose (Martinez et al. 2010). In this paper, PCA is used to identify the parameters that most affect the side weir discharge coefficient (\({\text{Cd}}_{\text{sw}}\)). The PCA results are then used to evaluate the performance of the empirical formulas that have been proposed for \({\text{Cd}}_{\text{sw}}\). Finally, an ANFIS model is developed based on the PCA results.

Method and materials

The discharge coefficient of a side weir depends on hydraulic and geometric parameters. Figure 1 shows a schematic of the side weir and the most important parameters in the subcritical flow condition.

Fig. 1 Sketch of side weir at subcritical flow condition

As seen in Fig. 1, the most important parameters are the flow velocity (\(v_{1}\)), side weir length (L), channel width (b), upstream flow depth (\(h_{1}\)), weir height (P), diversion angle of the flow (\(\psi\)) and longitudinal slope of the channel (\(s_{0}\)). Equation (1) collects these parameters.

$${\text{Cd}}_{\text{sw}} = f\left( {v_{1} ,L,b,h_{1} ,P,\psi ,s_{0} } \right)$$
(1)

Applying the Buckingham π theorem yields the dimensionless parameters that form the basis for developing the empirical formulas and AI models. The result is given in Eq. (2) (Emiroglu et al. 2011a).

$${\text{Cd}}_{\text{sw}} = f_{2} \left( {{\text{Fr}}_{1} ,\frac{L}{b},\frac{L}{{h_{1} }},\frac{P}{{h_{1} }}} \right)$$
(2)
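As an illustration of Eq. (2), the following minimal sketch computes the four dimensionless groups from raw measurements. It assumes the usual Froude number definition for a rectangular channel, \({\text{Fr}}_{1} = v_{1}/\sqrt{g h_{1}}\), which the text does not state explicitly. The function name and the numerical values are hypothetical and are not taken from the collected data.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def dimensionless_groups(v1, L, b, h1, P):
    """Return the four dimensionless groups of Eq. (2).

    Assumption: Fr1 = v1 / sqrt(g * h1), the Froude number of a
    rectangular channel evaluated at the upstream section.
    All lengths are in metres and v1 is in m/s.
    """
    fr1 = v1 / math.sqrt(G * h1)
    return {"Fr1": fr1, "L/b": L / b, "L/h1": L / h1, "P/h1": P / h1}


# Illustrative values only (not from the data set of Table 2).
groups = dimensionless_groups(v1=0.5, L=0.75, b=0.5, h1=0.2, P=0.12)
for name, value in groups.items():
    print(f"{name} = {value:.3f}")
```

A Froude number below 1 in such a calculation corresponds to the subcritical flow condition sketched in Fig. 1.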

To calculate \({\text{Cd}}_{\text{sw}}\), some of the best-known empirical formulas were collected; they are given in Table 1.

Table 1 Some empirical formulas to calculate the side weir discharge coefficient

As mentioned in the previous section, AI models are developed from data sets; therefore, about 477 data points for the parameters of Eq. (2) were collected from reliable peer-reviewed journals, and their ranges are given in Table 2. To calculate the side weir discharge coefficient with an empirical formula, the values of the parameters that the formula requires are taken from the collected data (Table 2) and substituted into the formula.

Table 2 Range of collected data related to the side weir discharge coefficient

Principal component analysis (PCA)

Principal component analysis (PCA) is an advanced technique within factor analysis and is commonly used for data reduction in engineering. Its main applications are the compression and classification of data. It reduces the dimensionality of a data set (sample) by finding a new set of variables, smaller than the original set, that nonetheless retains most of the sample’s information (Camacho et al. 2015; Martinez et al. 2010).
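The dimensionality reduction described above can be sketched with NumPy as follows. This is a minimal illustration, assuming that PCA is performed by eigen-decomposition of the correlation matrix of the standardized inputs; the paper does not state which variant or software was used. The data are synthetic placeholders with the same shape as the collected data set (477 rows, four dimensionless inputs), not the measured values.

```python
import numpy as np


def pca(X):
    """PCA by eigen-decomposition of the correlation matrix.

    X has one row per observation and one column per variable.
    Returns eigenvalues (descending), eigenvectors (as columns),
    the explained-variance ratios and the component scores.
    """
    # Standardize so that each variable has zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)           # correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    scores = Z @ eigvecs
    return eigvals, eigvecs, explained, scores


# Synthetic placeholder data: 477 observations of 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(477, 4))
X[:, 3] = 0.8 * X[:, 0] + 0.2 * X[:, 3]  # make two columns correlated

eigvals, eigvecs, explained, scores = pca(X)
print("eigenvalues:", np.round(eigvals, 3))
print("cumulative explained variance:", np.round(np.cumsum(explained), 3))
```

Plotting the eigenvalues against the component number gives a scree plot, and the absolute loadings in the leading eigenvectors indicate which input variables dominate each component.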

Adaptive neuro-fuzzy inference systems (ANFIS)

The adaptive neuro-fuzzy inference system (ANFIS) is a powerful tool for modeling complex systems from input and output data. ANFIS is realized by an appropriate combination of neural and fuzzy systems, which makes it possible to use both the verbal and numeric power of intelligent systems. In fuzzy systems, different fuzzification and defuzzification strategies with different rules are considered for the input parameters. Determining the effect of fuzzy logic on the input data involves three stages, the first of which is selecting the membership function (MF) for each input variable; a Gaussian function may be considered for each input variable at this stage. Figure 2 shows a fuzzy reasoning process. For simplicity, a fuzzy system with two input variables and one output is considered. Suppose that the rule base contains two fuzzy if–then rules:

Fig. 2 ANFIS model structure

$${\text{Rule}}1:{\text{if}}\,x\,{\text{is}}\,A_{1} \,{\text{and}}\,y\,{\text{is}}\,B_{1} \,{\text{then}}\,f_{1} \, = p_{1} x + q_{1} y + r_{1}$$
$${\text{Rule}}2:{\text{if}}\,x\,{\text{is}}\,A_{2} \,{\text{and}}\,y\,{\text{is}}\,B_{2} \,{\text{then}}\,f_{2} \, = p_{2} x + q_{2} y + r_{2} ,$$

where \(A_{1}\), \(A_{2}\) and \(B_{1}\), \(B_{2}\) are the membership functions (MFs) for inputs x and y, respectively, and \(p_{1}\), \(q_{1}\), \(r_{1}\) and \(p_{2}\), \(q_{2}\), \(r_{2}\) are the parameters of the output functions. The ANFIS architecture is presented in Fig. 2: in the first layer, each input variable is assigned a membership grade by its membership function; in layer 2, the membership grades are multiplied together to give the firing strength of each rule; in layer 3, the firing strengths are normalized; in layer 4, the contribution of each rule is computed; and in the last layer, the output variable is computed as the weighted average of the rule outputs (Riahi-Madvar et al. 2009).
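The five layers can be traced in a short forward pass for the two-rule system above. The sketch below assumes Gaussian membership functions parameterized by a center and a width; all parameter values and function names are hypothetical and chosen only to make the arithmetic easy to follow. Training of the parameters is not shown.

```python
import math


def gauss_mf(x, center, sigma):
    """Gaussian membership grade of x for a given center and width."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(x, y, mf_x, mf_y, consequents):
    """Forward pass of a first-order Sugeno ANFIS, one rule per MF pair.

    mf_x, mf_y:   lists of (center, sigma) for A_i and B_i
    consequents:  list of (p_i, q_i, r_i) for f_i = p_i*x + q_i*y + r_i
    """
    # Layer 1: membership grades of the inputs.
    mu_a = [gauss_mf(x, c, s) for c, s in mf_x]
    mu_b = [gauss_mf(y, c, s) for c, s in mf_y]
    # Layer 2: firing strength of rule i is the product of its grades.
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalize the firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: contribution of each rule, w_bar_i * f_i.
    contrib = [wb * (p * x + q * y + r)
               for wb, (p, q, r) in zip(w_bar, consequents)]
    # Layer 5: overall output as the sum of the contributions.
    return sum(contrib), w_bar


# Hypothetical parameters for Rule 1 and Rule 2.
mf_x = [(0.0, 1.0), (1.0, 1.0)]   # A1, A2
mf_y = [(0.0, 1.0), (1.0, 1.0)]   # B1, B2
consequents = [(1.0, 1.0, 0.0), (2.0, -1.0, 1.0)]

output, w_bar = anfis_forward(0.5, 0.5, mf_x, mf_y, consequents)
print(output, w_bar)
```

At x = y = 0.5 both rules fire equally, so the output is the plain average of \(f_{1} = 1.0\) and \(f_{2} = 1.5\).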

Results and discussion

Empirical formula results

The performance of the empirical formulas was evaluated by comparison with the measured data. The results of each empirical formula were plotted against the measured data and are shown in Fig. 3. Standard error indices, namely the coefficient of determination (\(R^{2}\)) and the root mean square error (RMSE), were calculated to assess the performance of the empirical formulas and are given in Table 3. As is clear from Fig. 3 and Table 3, the Emiroglu formula, with \(R^{2}\) = 0.64 and RMSE = 0.03, is the most accurate of the empirical approaches.
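The two error indices can be computed as in the following sketch. It assumes that \(R^{2}\) is the squared Pearson correlation between measured and predicted values, since the text does not give the exact definition used; the sample values are illustrative and are not taken from Table 3.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(measured)
    return math.sqrt(
        sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n
    )


def r_squared(measured, predicted):
    """Squared Pearson correlation (an assumed definition of R^2)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p)
              for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_m * var_p)


# Illustrative discharge coefficients (not measured data).
measured = [0.40, 0.45, 0.50, 0.55]
predicted = [0.42, 0.44, 0.52, 0.54]

print("RMSE =", round(rmse(measured, predicted), 4))
print("R^2  =", round(r_squared(measured, predicted), 4))
```

A formula that reproduces the measurements exactly would give RMSE = 0 and \(R^{2}\) = 1, so a lower RMSE and a higher \(R^{2}\) indicate better agreement.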

Fig. 3 Performance of the empirical formulas to calculate the \({\text{Cd}}_{\text{sw}}\)

Table 3 The performance of empirical formulas

PCA result

To identify the parameters that most affect \({\text{Cd}}_{\text{sw}}\), the PCA technique was applied to the collected data set, the ranges of which are given in Table 2. The results of the PCA are given in Fig. 4 and Table 4. As shown in Fig. 4, the Froude number (\({\text{Fr}}_{1}\)) and the ratio of the weir height to the flow depth (\(P/h_{1}\)) are the most important parameters for predicting \({\text{Cd}}_{\text{sw}}\). Comparing the PCA results with the results of the empirical formulas in Fig. 3 shows that the empirical formulas that give more weight to \({\text{Fr}}_{1}\) and especially \(P/h_{1}\), such as the Emiroglu formula, are more accurate than the other empirical formulas.

Fig. 4 The scree plot resulting from the PCA technique

Table 4 Component variances resulting from the PCA technique

ANFIS model development

Like other neural network models, ANFIS models are developed from a data set. For this purpose, the collected data set, whose ranges are given in Table 2, was divided at random into two groups, training and testing. Designing the structure of the ANFIS involves defining the number of membership functions, hidden layer(s), activation function and learning algorithm. The number of hidden layers and the other structural choices are largely made by trial and error, although the experience of the designer and the recommendations of other investigators who conducted similar studies are useful. Another way to develop an optimal structure for the ANFIS model is to use a mathematical approach such as PCA. The PCA results show that \({\text{Fr}}_{1}\) and \(P/h_{1}\) are the most important parameters in predicting \({\text{Cd}}_{\text{sw}}\). Developing the ANFIS structure on the basis of the PCA therefore means assigning more neurons to \({\text{Fr}}_{1}\) and \(P/h_{1}\). Compared with other ANN models such as the multilayer perceptron (MLP) neural network, the ANFIS model has a main advantage at the structure design stage: the number of membership functions assigned to each input variable can be specified according to its influence on the output. Because the PCA results can guide this choice, a parameter that has more effect on the output can be given more membership functions, which leads to a more optimal and more reliable model. The results of the ANFIS model in predicting \({\text{Cd}}_{\text{sw}}\) are shown in Figs. 5 and 6. As mentioned above, the data set was randomly divided into training and testing sets: about 80 % of the collected data were used for training and the remaining 20 % for testing.
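A randomized 80/20 division of this kind can be sketched as follows. The function name, the fixed seed and the use of integer placeholders in place of the 477 collected records are assumptions made for illustration; the paper does not describe how the random selection was implemented.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=0):
    """Randomly divide records into training and testing subsets."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    indices = list(range(len(records)))
    rng.shuffle(indices)
    n_train = round(train_fraction * len(records))
    train = [records[i] for i in indices[:n_train]]
    test = [records[i] for i in indices[n_train:]]
    return train, test


# Placeholder records standing in for the 477 collected data points.
records = list(range(477))
train, test = train_test_split(records)
print(len(train), len(test))
```

Every record ends up in exactly one of the two subsets, so the testing data are never seen during training.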
The structure of the ANFIS with the best performance is given in Table 5; as shown there, the Gaussian function (gaussmf) was used as the membership function and the weighted average (wtaver) approach as the defuzzification method. The histogram and distribution of the errors are also plotted in Figs. 5 and 6 to assess the performance of the ANFIS model in the training and testing stages. As is clear from Table 5, \({\text{Fr}}_{1}\) and \(P/h_{1}\) have more neurons than the other parameters. Overall, as shown in Figs. 5 and 6, the ANFIS model is suitable for predicting the values of \({\text{Cd}}_{\text{sw}}\) in both the training and testing stages, and it also performs suitably in predicting the maximum values of \({\text{Cd}}_{\text{sw}}\). The results of this study support those of Ebtehaj et al. (2015a) and Ebtehaj et al. (2015b). Ebtehaj et al. (2015a) stated that, for predicting the side weir discharge coefficient with the GMDH, \({\text{Fr}}_{1}\) and \(P/h_{1}\) are the most important parameters. A review of the studies conducted by Ebtehaj et al. (2015b) and Emiroglu et al. (2011a, b) likewise shows that they gave more weight to both parameters during model development.

Fig. 5 The performance of ANFIS model during the training stage

Fig. 6 The performance of ANFIS model during the testing stage

Table 5 Structure and summary of the ANFIS model

Conclusion

Predicting the discharge coefficient of weirs, especially side weirs, plays a key role in hydro-system management. With recent advances in neural network techniques in water engineering studies, hydraulic phenomena can be modeled more accurately. Although ANN models are highly capable of predicting the discharge coefficients of hydraulic structures, especially side weirs, the optimal design of the model structure is an important factor in increasing the reliability of the ANN model. Mathematical techniques such as principal component analysis (PCA) help to identify the parameters that most influence the phenomenon of interest. In this paper, PCA showed that the Froude number and the ratio of the weir height to the upstream flow depth (\(P/h_{1}\)) are the most influential parameters for \({\text{Cd}}_{\text{sw}}\). Therefore, more membership functions were assigned to these parameters during the ANFIS model development. Overall, using the PCA results helps in preparing an optimal structure for the ANFIS model.