Introduction

The weir is one of the oldest structures of low-water hydraulics. Weirs are built in watercourses for numerous reasons, such as to measure water flow, raise the upstream water level, and control flow velocity (Razmi et al. 2022). Local scour downstream of weirs is an important research topic and has been the subject of numerous studies in recent decades (Bormann and Julien 1991; Gaudio et al. 2000; Lenzi et al. 2002; D’Agostino and Ferro 2004; Lu et al. 2013; Pagliara and Kurdistani 2013). Scouring is the process by which turbulent water flow removes sediment around the structure (Habib et al. 2021), exposing the structure to corrosion and possible failure. Scouring begins upstream of the downstream edge when the flow creates a shear stress greater than the critical shear stress of the sediment downstream of the weir. In addition, the vertical dimensions of the scour hole change much faster than the horizontal ones (Salih et al. 2020). In the initial stage, sediment is transported only in suspension. As the vertical dimensions of the scour hole change, sediment is transported as a combination of bed load and suspended load. Equilibrium is reached asymptotically, as the scour rate decreases to zero. This process is influenced by the flow properties of the water and the physical properties of the soil.

Adequate protection measures can only be constructed if the location and depth of scour downstream are accurately estimated and understood (Guven and Gunal 2008a; Deng and Cai 2010; Carvalho et al. 2019). However, the correct estimation of scour depth, \({D}_{s}\), is difficult because of the complexity of water flow patterns around hydraulic structures. Therefore, in recent decades, scour processes have been modeled using deterministic models of varying complexity and precision (Sharafati et al. 2020c). Considering the different flow, time, and material properties in field and laboratory tests, several empirical equations have been developed to predict \({D}_{s}\) (Bormann and Julien 1991; D’Agostino and Ferro 2004; Chee and Padiyar 1969; Mossa 1998; Olsen and Kjellesvig 1998; Marion et al. 2004). However, these studies mostly rely on traditional descriptions of scouring based on physical models, which limits the understanding of the overall scouring process. In addition, the use of empirical equations can lead to under- or overestimation of scour depth (Goel and Pal 2009; Tafarojnoruz 2012; Sharafati et al. 2021).

Another way to predict scour depth is to use a numerical model. Different numerical techniques have been used to evaluate the scour process by solving the sediment transport equations together with the Navier–Stokes equations (Sharafati et al. 2020b). The advancement of numerical models (Mohd Yusof et al. 2023) (e.g., FLOW-3D, ANSYS Fluent, and OpenFOAM) has made them an effective option for modeling scour depth (Sharafati et al. 2020b). However, their use is limited by the high computational cost of complicated problems (Tafarojnoruz 2012). On the other hand, soft computing (SC) approaches provide appealing features for simulating complicated relationships between input and output variables. SC approaches learn the desired trends from observational data, unlike empirical calculations that may not reflect the complexity of the scouring process (Najafzadeh et al. 2017; Sharafati et al. 2020b, 2021). In SC approaches, the predictive model is generated automatically and does not rely on the user's knowledge. In addition, SC models can recognize implicit relationships between features (Bagheri et al. 2024), while empirical equations rely on explicit data before building models to explore the relationships between different parameters. Moreover, SC models provide less expensive and more flexible methods for studying complicated issues (Gholami et al. 2016). SC models have been applied effectively to \({D}_{s}\) problems in many studies in recent decades (Azamathulla et al. 2008a, b; Guven and Gunal 2008a; Altan and Hacıoğlu 2020; Sharafati et al. 2021). Table 1 provides some examples of the application of various SC models to predict \({D}_{s}\).

Table 1 Examples of different SC models in modeling scouring depth

Accurately predicting scour depth is crucial for the longevity of water infrastructure. Yet, the task is difficult (Abudallah Habib et al. 2021) due to factors like complex shapes, water behavior, and sediment motion (Tao et al. 2021). The limitations of existing models and traditional equations, which result in inaccuracies (Fuladipanah et al. 2023; Le and Thu Hien 2024), emphasize the need for better approaches to guide sustainable and safe hydraulic design. Therefore, researchers have developed several statistical and AI-based models to predict scour depth (Marulasiddappa et al. 2024). As shown in Table 1, several bio-inspired algorithms have been used to hybridize AI models so that they capture the nonlinear relationship between the input variables and the associated targets by efficiently tuning the parameters that have a large impact on model accuracy (Moayedi et al. 2019; Xu et al. 2020; Zhou et al. 2020). Although predictive models have made significant progress, their structure is becoming more complicated due to the combination of AI techniques with complicated algorithms, which makes their results and behavior more difficult to understand. In addition to this complexity, some machine learning-based models have several shortcomings, such as generalization problems and difficulties in understanding and explaining their decisions. Moreover, machine learning models usually require multiple techniques to determine and select their hyperparameters (Mohammadi 2023), such as the type of transfer function, the number of hidden neurons, the learning rate, and the kernel function, and these parameters are usually determined by trial and error (Wang et al. 2018; Liu et al. 2020; Panda and Panda 2020). These problems can be addressed by an alternative model, the high-order response surface method (HORSM), for predicting scour depth.
It is important to note that HORSM is an improved version of the classical response surface methodology model. Numerous researchers have used the traditional response surface methodology (RSA) to efficiently develop predictive models in various fields such as construction and materials (Hameed et al. 2021a), water resources (Keshtegar et al. 2018), and hydrology (Keshtegar and Kisi 2017). In addition, an improved version of RSA called HORSM has been introduced in recent years to solve challenging environmental and water resources problems (Keshtegar and Kisi 2017; Keshtegar et al. 2019). Despite the successes achieved with HORSM, its applications to civil engineering problems in hydraulics and hydrology are still limited, to the best of the author's knowledge. Therefore, in this study, HORSM models based on different polynomial functions are used to predict scour depth downstream of weirs. Moreover, the reported results are validated and compared with ANN models to investigate the performance of the applied methodology in solving a significant hydraulic problem.

The rest of the paper is organized as follows. Sect. “Methodology” gives a detailed description of the experimental data, the applied models (HORSM and ANN), and the statistical measures. The results obtained from the HORSM model and the other comparable models are evaluated and analyzed in Sect. “Results and discussion.” Finally, the conclusion and research recommendations are provided in Sect. “Conclusion.”

Methodology

Data collection

Several factors influence scour development downstream of weirs, e.g., flow conditions, bed material, tailwater depth, and weir geometry (Goel and Pal 2009; Najafzadeh 2015). Therefore, it is possible to estimate scour depth using the following relationship:

$$D_{s} = f\left( { v, \rho , g, h_{o} ,h_{t} , \cup_{o} , \rho_{s} , d_{50} ,\sigma_{g} , \cup_{c} , z, b} \right),$$
(1)

where \(v\) is the kinematic viscosity of the fluid, \(\rho\) is the water density, \(g\) is the gravitational acceleration, \({h}_{o}\) is the average approach flow depth, \({h}_{t}\) is the tailwater depth, \({\cup }_{o}\) is the average approach flow velocity, \({\rho }_{s}\) is the density of the bed particles, \({d}_{50}\) is the median size of the bed particles, \({\sigma }_{g}\) is the geometric standard deviation of the bed particle size, \({\cup }_{c}\) is the critical average approach flow velocity, z is the weir height, and b is the weir width. The influence of non-dimensional parameters such as \(\frac{{\cup }_{o}}{{\cup }_{c}}\), \(\frac{{d}_{50}}{{h}_{t}}\), \(\frac{z}{{h}_{t}}\) has been investigated in many studies. These studies found that the non-dimensional parameters performed better than the dimensional parameters in estimating the scour depth (Najafzadeh 2015; Karbasi and Azamathulla 2017). Other studies show that variations in approach flow and tailwater have a significant effect on scour depth (Guven 2011; Onen 2014; Najafzadeh 2015). Guan et al. (2016) found that decreasing tailwater depth (\(\frac{z}{{h}_{t}}\)) and increasing flow intensity (\(\frac{{\cup }_{o}}{{\cup }_{c}}\)) increase scour depth. In addition, other studies have shown that the width of the weir (\(\frac{b}{{h}_{t}}\)) has a large effect on scour depth (Najafzadeh 2015; Roushangar et al. 2016). Therefore, based on dimensional analysis, the following equation is used (Guan et al. 2016):

$$\frac{{D_{s} }}{{h_{t} }} = f\left( {\frac{{d_{50} }}{{h_{t} }}, \frac{z}{{h_{t} }}, \frac{{ \cup_{o} }}{{ \cup_{c} }}} \right)$$
(2)

where \(\frac{{d}_{50}}{{h}_{t}}\) represents the effect of particle size in the bed, \(\frac{z}{{h}_{t}}\) represents tailwater depth, and \(\frac{{\cup }_{o}}{{\cup }_{c}}\) represents flow conditions.
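As a minimal sketch of how the dimensionless groups in Eq. (2) can be formed from raw measurements, the Python snippet below computes the target ratio and the three predictors for one sample. The function name, the argument names, and the numerical values are illustrative assumptions and are not taken from the dataset used in this study.

```python
def dimensionless_groups(D_s, h_t, d_50, z, U_o, U_c):
    """Return the target D_s/h_t and the three predictors of Eq. (2).

    All lengths must share one unit, and both velocities must share one unit.
    """
    if h_t <= 0 or U_c <= 0:
        raise ValueError("h_t and U_c must be positive")
    return {
        "Ds_ht": D_s / h_t,    # relative scour depth (target)
        "d50_ht": d_50 / h_t,  # effect of bed particle size
        "z_ht": z / h_t,       # weir height relative to tailwater depth
        "Uo_Uc": U_o / U_c,    # flow intensity
    }


# Hypothetical sample, values chosen for illustration only
sample = dimensionless_groups(D_s=0.12, h_t=0.20, d_50=0.002, z=0.10, U_o=0.45, U_c=0.30)
```

Working with ratios of this kind keeps the predictors independent of the unit system, provided each ratio is formed from quantities in the same unit.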

Data from four published studies (Veronese 1937; Falciai and Giacomin 1978; D’agostino 1994; D’Agostino and Ferro 2004) with different acquisition conditions were combined into a dataset of 186 experimental data points to evaluate the capabilities of the high-order response surface method in predicting scour depth. The statistical description of the data can be found in Table 2 and Fig. 1. Furthermore, the data used in this research are provided in the supplementary file accompanying this study (Appendix 1).

Table 2 Statistical description of the used variables

Fig. 1 The correlation matrix of the used variables

High-order response surface method (HORSM)

In general, the response surface approach (RSA) is an empirical, mathematical, and statistical modeling approach used to study multiple regression (MR) by using quantitative data from specific observations to simultaneously solve multivariate equations. RSA quantifies the relationships between one or more measured responses and the input variables that primarily influence the response. Moreover, RSA uses a simple and explicit function to approximate an originally complicated and implicit limit state function. However, the correctness of the results strongly depends on how well the approximation function represents the properties of the original function. The adequacy of the generated RSA depends mainly on the appropriate positioning of the so-called sample points used to approximate the response function with a typical regression approach. In general, the RSA uses a second-order polynomial form with cross terms, which can give a reasonable prediction. However, the prediction of the downstream scour depth (\({D}_{s}\)) is a complex and highly nonlinear process, and a second-order RSA polynomial may not provide reliable predictions (Keshtegar et al. 2016). Therefore, a new version of RSA is proposed, based on a high-order polynomial function combined with highly nonlinear cross terms, to form the high-order response surface method (HORSM). The HORSM can be represented by Eq. (3), considering several independent variables \({\varvec{R}}\left\{{r}_{1},{r}_{2},{r}_{3},\dots \right\}\):

$${\check{S} }\left( r \right) = \gamma_{0} + \mathop \sum \limits_{i = 1}^{n} \gamma_{i} r_{i} + \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = i}^{n} \gamma_{ij} r_{i} r_{j} + \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{n} \alpha_{ij} r_{i} r_{j}^{2} + \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{n} b_{ij} r_{i} r_{j}^{3} + \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{n} d_{ij} r_{i} r_{j}^{4} + \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{n} g_{ij} r_{i} r_{j}^{5} ,$$
(3)

where \({\check{S} }\left( r \right)\) represents the high-order calculated RSA for ….., \(n\) is the number of input predictors, and \({\gamma }_{0}\), \({\gamma }_{i}\), \({\gamma }_{ij}\), \({\alpha }_{ij}\), \({b}_{ij}\), \({d}_{ij}\), and \({g}_{ij}\) are the unknown coefficients, the number of which is given by Eq. (4) as follows (Keshtegar et al. 2016):

$$\frac{{\left( {n + 1} \right)\left( {n + 2} \right)}}{2} + \left( {\mu - 2} \right)n^{2}$$
(4)

Here, \(\mu\) is the order of RSA, ranging from 2 to 6 in this study. Calibration of multiple experimental designs is considered in the calculation of coefficients. In general, the unknown coefficients in Eq. (3) are calculated using the least-squares approach by minimizing the error between the actual \(S\left( r \right)\) and the calculated \({\check{S} }(r)=P{(r)}^{T}a\) values of \({D}_{s}\). Equation (5) represents the error as follows:

$$E(r) = [S(r) - P(r)^{T} a]^{T} [S(r) - P(r)^{T} a]$$
(5)

where \({\varvec{S}}\left(r\right)=[{S}_{1}, {S}_{2}, {S}_{3}, \ldots , {S}_{NI}]\) is the vector of actual values and \(P{(r)}^{T}=\left[P\left({S}_{1}\right), P\left({S}_{2}\right), P\left({S}_{3}\right), \ldots , P\left({S}_{NI}\right)\right]\) collects the high-order polynomial basis vectors from which the predicted values are computed for the \(NI\) observed data points. Equation (6) shows the polynomial basis used to predict scour depth downstream of weirs with a 6th-order RSA for several variables (three variables are assumed).

$$P\left( S \right) = \left[ {1, S_{1} , S_{2} , S_{3} , S_{1}^{2} , S_{1} S_{2} , S_{1} S_{3} , S_{2}^{2} , \ldots , S_{1}^{3} , S_{1} S_{2}^{2} , S_{1} S_{3}^{2} , \ldots , S_{1}^{4} , S_{1} S_{2}^{3} , S_{1} S_{3}^{3} , \ldots , S_{1}^{5} , S_{1} S_{2}^{4} , S_{1} S_{3}^{4} , \ldots } \right]$$
(6)

Minimizing the error function in Eq. (5) with respect to the unknown coefficients \(a\) yields a linear system, and the value of \({D}_{s}\) can then be predicted using Eq. (7) as follows:

$${\check{S} }\left( r \right) = P\left( r \right)^{T} \left[ {P\left( r \right)P\left( r \right)^{T} } \right]^{ - 1} \left[ {P\left( r \right)S\left( r \right)} \right]$$
(7)

As mentioned earlier, using high-order polynomial RSA can produce a more accurate prediction of \({D}_{s}\) than the use of second-order polynomials in a strongly nonlinear process. The prediction of \({D}_{s}\) using HORSM can be summarized as follows:

  • Set the input data for \({D}_{s}\) with the predictors \({r}_{1},{r}_{2},{r}_{3},\dots\).

  • Set the order of RSA and use Eqs. (2) and (4) to build the polynomial basis \(P\left(S\right)\) from the training data.

  • Compute the polynomial basis vector for all samples (\(P\left({r}_{all}\right)\)), for both training and test data.

  • Use Eq. (8) to obtain the predicted \({D}_{s}\) \(\left( {\check{S} (r)} \right)\):

    $${\check{S} }\left( r \right) = P\left( {r_{all} } \right)^{T} \left[ {P\left( r \right)P\left( r \right)^{T} } \right]^{ - 1} \left[ {P\left( r \right)S\left( r \right)} \right]$$
    (8)
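The steps above can be sketched in a few lines of code. The study itself used MATLAB; the Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, the training data are synthetic, and the least-squares coefficients are obtained by solving the normal equations that follow from minimizing Eq. (5) with plain Gaussian elimination. For high polynomial orders and real data, a numerically more robust solver (for example, one based on QR factorization) would likely be preferable.

```python
from itertools import combinations_with_replacement


def horsm_basis(r, order):
    """Polynomial basis of Eq. (3) for one sample r = [r1, ..., rn]."""
    n = len(r)
    terms = [1.0] + list(r)  # constant and linear terms
    # second-order terms r_i * r_j with j >= i
    terms += [r[i] * r[j] for i, j in combinations_with_replacement(range(n), 2)]
    # higher-order cross terms r_i * r_j**(k - 1) for k = 3 .. order
    for k in range(3, order + 1):
        terms += [r[i] * r[j] ** (k - 1) for i in range(n) for j in range(n)]
    return terms


def n_coefficients(n, order):
    """Number of unknown coefficients according to Eq. (4)."""
    return (n + 1) * (n + 2) // 2 + (order - 2) * n * n


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, m):
            f = M[i][c] / M[c][c]
            for k in range(c, m + 1):
                M[i][k] -= f * M[c][k]
    x = [0.0] * m
    for i in reversed(range(m)):
        x[i] = (M[i][m] - sum(M[i][k] * x[k] for k in range(i + 1, m))) / M[i][i]
    return x


def fit_horsm(R, S, order):
    """Least-squares coefficients a that minimize the error of Eq. (5)."""
    B = [horsm_basis(r, order) for r in R]  # one basis row per training sample
    m = len(B[0])
    A = [[sum(row[p] * row[q] for row in B) for q in range(m)] for p in range(m)]
    rhs = [sum(row[p] * s for row, s in zip(B, S)) for p in range(m)]
    return solve(A, rhs)


def predict_horsm(a, R, order):
    """Evaluate the fitted surface for every sample in R."""
    return [sum(c * t for c, t in zip(a, horsm_basis(r, order))) for r in R]


def truth(r):
    """Synthetic response used only to exercise the code."""
    return 1 + 2 * r[0] - r[1] + 0.5 * r[0] * r[1] + 0.3 * r[0] * r[1] ** 2


grid = [0.0, 0.5, 1.0, 1.5]
R_train = [[x, y] for x in grid for y in grid]
S_train = [truth(r) for r in R_train]
a = fit_horsm(R_train, S_train, order=3)
pred = predict_horsm(a, [[0.25, 1.25]], order=3)
```

With three predictors, Eq. (4) gives 10, 19, 28, 37, and 46 coefficients for orders 2 to 6, so the number of fitted parameters grows by nine with each additional order.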

In this study, the HORSM code was written in MATLAB following the four steps above. Figure 2 illustrates the framework of the proposed HORSM model.

Fig. 2 The structure of HORSM

Feedforward neural network

A feedforward neural network (FFNN) is an algorithm organized in layers of processing units that are loosely comparable to human neurons. Each unit (node) in an FFNN layer is connected to all units in the adjacent layers (Hameed et al. 2021b). These connections are not all identical, as the weight and strength of each connection can change (see Fig. 3). Information processing in a neural network (NN) involves feeding data to the input units and passing it from one layer to the next through the network until it reaches the output units (Hagan and Menhaj 1994; Üstün et al. 2020). In an FFNN, the data are transmitted in one direction only, from the input nodes, through the hidden nodes (if any), to the output nodes (Hertz 2018). The output values are computed by the neural network based on the input values, while the intermediate results of the computations are associated with the hidden layer nodes.

Fig. 3 The structure of FFNN

Each j-node receives input from each i-node in the previous layer, and each input signal (\({X}_{i}\)) is associated with a weight (\({R}_{ij}\)). The incoming signal (\({E}_{j}\)) is the weighted sum of all input signals plus the neuron's threshold (\({t}_{j}\)).

$$E_{j} = \mathop \sum \limits_{i = 1}^{n} X_{i} R_{ij} + t_{j}$$
(9)

To generate the node's outgoing signal (\({y}_{j}\)), the incoming signal (\({E}_{j}\)) is passed through a nonlinear activation function. The logistic sigmoid function is the most commonly used activation function in this type of network. This transfer function is continuously differentiable, monotonic, symmetric, and bounded between 0 and 1 (Hecht-Nielsen 1990). It can be expressed mathematically by Eq. (10) (Lipu et al. 2021). However, in this study, a linear transfer function is used for the output layer. The ANN was trained using the Levenberg–Marquardt algorithm, which allows for effective optimization and learning.

$$f\left( {E_{j} } \right) = \frac{1}{{1 + e^{{ - E_{j} }} }}$$
(10)
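To make Eqs. (9) and (10) concrete, the following Python sketch evaluates one forward pass through a network with a single sigmoid hidden layer and a linear output node, as described above. The function names, the network size, and all weights are illustrative assumptions; the hidden weights are set to zero only so that the result is easy to check by hand. Training with the Levenberg–Marquardt algorithm is not shown.

```python
import math


def sigmoid(e):
    """Logistic sigmoid of Eq. (10)."""
    return 1.0 / (1.0 + math.exp(-e))


def ffnn_forward(x, hidden_weights, hidden_thresholds, output_weights, output_threshold):
    """Forward pass: sigmoid hidden layer followed by a linear output node.

    hidden_weights[j][i] is the weight R_ij from input i to hidden node j.
    """
    hidden = []
    for w_j, t_j in zip(hidden_weights, hidden_thresholds):
        e_j = sum(x_i * w for x_i, w in zip(x, w_j)) + t_j  # Eq. (9)
        hidden.append(sigmoid(e_j))                          # Eq. (10)
    # Linear transfer function at the output layer
    return sum(h * w for h, w in zip(hidden, output_weights)) + output_threshold


# Hypothetical network: 3 inputs, 2 hidden nodes, hand-picked weights
y = ffnn_forward(
    x=[0.01, 0.5, 1.5],
    hidden_weights=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    hidden_thresholds=[0.0, 0.0],
    output_weights=[1.0, 1.0],
    output_threshold=0.2,
)
```

Because every hidden node receives a zero incoming signal here, each outputs 0.5, and the linear output node returns 0.5 + 0.5 + 0.2.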

Model development

In this study, a classical RSM model, four HORSM models with polynomial orders ranging from 3 to 6, and ANN models were constructed to predict the depth of scour downstream of weirs. The models were built from 186 experimental samples. Of these, 66% (123 samples) were randomly selected for training the models, and the remaining 63 samples were used for testing. All prediction models were developed in MATLAB. The training samples were used to adjust the models' parameters. For the ANN models, the Levenberg–Marquardt algorithm was chosen for training, and the number of hidden nodes was varied from 2 to 15. The testing data were used to evaluate predictive performance based on several statistical metrics and graphical figures. Finally, a sensitivity analysis was carried out to select the most effective predictors, i.e., those with a large impact on the depth of scour.
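The random 66%/34% partition described above can be sketched as follows. This is a minimal Python illustration, not the MATLAB code used in the study; the function name and the random seed are assumptions, so the indices it selects will differ from the split actually used.

```python
import random


def train_test_split_indices(n_samples, train_fraction, seed):
    """Randomly partition sample indices into training and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)         # reproducible shuffle
    n_train = round(train_fraction * n_samples)  # 0.66 * 186 rounds to 123
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# The seed is an arbitrary assumption; the study does not report one
train_idx, test_idx = train_test_split_indices(n_samples=186, train_fraction=0.66, seed=0)
```

Fixing the seed makes the split reproducible, which matters when several models are compared on the same training and testing samples.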

Performance evaluation

Several statistical parameters, such as the coefficient of determination (\({R}^{2}\)), root mean square error (RMSE), mean absolute error (MAE), correlation coefficient (R), root mean square relative error (RMSRE), Willmott index (WI), mean absolute percentage error (MAPE), and relative root mean square error (RRMSE), are used to investigate the accuracy of the proposed models. The following mathematical formulas describe each parameter (Hameed et al. 2022).

$$R^{2} = 1 - \frac{{\mathop \sum \nolimits_{{{\text{r}} = 1}}^{{\text{n}}} \left( {D_{s}^{r,m} - D_{s}^{r,c} } \right)^{2} }}{{\mathop \sum \nolimits_{{{\text{r}} = 1}}^{{\text{n}}} \left( {D_{s}^{r,m} - \overline{{D_{s} }}^{r,m} } \right)^{2} }}$$
(11)
$${\text{RMSE}} = \sqrt {\frac{1}{n}\mathop \sum \limits_{r = 1}^{n} \left( {D_{s}^{r,m} - D_{s}^{r,c} } \right)^{2} }$$
(12)
$${\text{MAE}} = \frac{1}{n}\mathop \sum \limits_{{{\text{r}} = { }1}}^{n} \left| {D_{s}^{r,m} - D_{s}^{r,c} } \right|$$
(13)
$$R = \frac{{\mathop \sum \nolimits_{{{\text{r}} = 1}}^{n} \left( {D_{s}^{r,m} - \overline{{D_{s} }}^{r,m} } \right)\left( {D_{s}^{r,c} - \overline{{D_{s} }}^{r,c} } \right)}}{{\sqrt {\mathop \sum \nolimits_{r = 1}^{n} \left( {D_{s}^{r,m} - \overline{{D_{s} }}^{r,m} } \right)^{2} \mathop \sum \nolimits_{{{\text{r}} = 1}}^{n} \left( {D_{s}^{r,c} - \overline{{D_{s} }}^{r,c} } \right)^{2} } }}$$
(14)
$${\text{RMSRE}} = \sqrt {\frac{1}{{\text{n}}}\mathop \sum \limits_{{{\text{r}} = 1}}^{n} \left( {\frac{{D_{s}^{r,m} - D_{s}^{r,c} }}{{D_{s}^{r,m} }}} \right)^{2} }$$
(15)
$${\text{WI}} = 1 - { }\frac{{\mathop \sum \nolimits_{r = 1}^{n} (D_{s}^{r,m} - D_{s}^{r,c} )^{2} }}{{\mathop \sum \nolimits_{r = 1}^{n} (\left| {D_{s}^{r,c} - { }\overline{{D_{s} }}^{r,m} } \right| + { }\left| {D_{s}^{r,m} - { }\overline{{D_{s} }}^{r,m} } \right|)^{2} { }}}$$
(16)
$${\text{MAPE}} = \frac{1}{n}\frac{{\mathop \sum \nolimits_{r = 1}^{n} \left| {D_{s}^{r,m} - D_{s}^{r,c} } \right|}}{{\mathop \sum \nolimits_{r = 1}^{n} D_{s}^{r,c} }}$$
(17)
$${\text{RRMSE}} = \frac{{\sqrt {\frac{1}{n}\mathop \sum \nolimits_{r = 1}^{n} \left( {D_{s}^{r,m} - D_{s}^{r,c} } \right)^{2} } }}{{\mathop \sum \nolimits_{r = 1}^{n} D_{s}^{r,m} }} \times 100$$
(18)

where \({D}_{s}^{r,m}\) is the actual value of the scour depth (\({D}_{s}\)) at the \({r}^{{\text{th}}}\) sample and \({\overline{{D}_{s}}}^{r,m}\) is the mean of the actual values; \({D}_{s}^{r,c}\) is the predicted value of \({D}_{s}\) at the \({r}^{{\text{th}}}\) sample and \({\overline{{D}_{s}}}^{r,c}\) is the mean of the predicted values. These metrics are commonly employed for model validation (Mamata et al. 2023). Some of them, namely RMSE, RMSRE, MAPE, RRMSE, and MAE, quantify prediction errors, so a value close to zero indicates a precise prediction. On the other hand, R, \({R}^{2}\), and WI assess how well the predicted values align with the observed values during validation. The ideal value for these metrics is one (Hameed et al. 2024), which indicates a perfect match between the model's predictions and the observed values. Therefore, a value close to one for these metrics signifies a good model.
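As an illustration of how several of these metrics can be evaluated, the Python sketch below implements Eqs. (11) to (16) as written, with `m` holding measured values and `c` holding calculated (predicted) values. The function names and the four-point example are assumptions made for illustration and do not correspond to the study's data.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def r_squared(m, c):
    """Coefficient of determination, Eq. (11)."""
    mean_m = _mean(m)
    sse = sum((mi - ci) ** 2 for mi, ci in zip(m, c))
    return 1.0 - sse / sum((mi - mean_m) ** 2 for mi in m)


def rmse(m, c):
    """Root mean square error, Eq. (12)."""
    return math.sqrt(_mean([(mi - ci) ** 2 for mi, ci in zip(m, c)]))


def mae(m, c):
    """Mean absolute error, Eq. (13)."""
    return _mean([abs(mi - ci) for mi, ci in zip(m, c)])


def correlation(m, c):
    """Correlation coefficient R, Eq. (14)."""
    mean_m, mean_c = _mean(m), _mean(c)
    num = sum((mi - mean_m) * (ci - mean_c) for mi, ci in zip(m, c))
    den = math.sqrt(sum((mi - mean_m) ** 2 for mi in m) * sum((ci - mean_c) ** 2 for ci in c))
    return num / den


def rmsre(m, c):
    """Root mean square relative error, Eq. (15); measured values must be nonzero."""
    return math.sqrt(_mean([((mi - ci) / mi) ** 2 for mi, ci in zip(m, c)]))


def willmott_index(m, c):
    """Willmott index, Eq. (16)."""
    mean_m = _mean(m)
    num = sum((mi - ci) ** 2 for mi, ci in zip(m, c))
    den = sum((abs(ci - mean_m) + abs(mi - mean_m)) ** 2 for mi, ci in zip(m, c))
    return 1.0 - num / den


# Hypothetical measured and predicted values, for illustration only
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = {
    "R2": r_squared(measured, predicted),
    "RMSE": rmse(measured, predicted),
    "MAE": mae(measured, predicted),
    "R": correlation(measured, predicted),
    "RMSRE": rmsre(measured, predicted),
    "WI": willmott_index(measured, predicted),
}
```

For this small example the squared errors sum to 0.10 and the total sum of squares is 5, so \({R}^{2}\) is 0.98 and MAE is 0.15.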

Results and discussion

Two types of models were created to predict the maximum scour depth \(\frac{ds}{ht}\) downstream of the weirs, namely the high-order response surface method (HORSM) model and the ANN model. For the HORSM models, different high-order polynomial functions (third, fourth, fifth, and sixth order) were tested alongside the classical second-order model to find the best predictive model. For model construction, 66% of the samples were used to train the proposed models, and the rest were used for testing. Finally, a sensitivity analysis was performed to select the most effective predictors, i.e., those with a large impact on \(\frac{ds}{ht}\).

Graphical representations and statistical measures were used to evaluate the predictive ability of each model. The performance of the applied models in the training phase is shown in Table 3. Overall, most of the applied models achieved good predictions for \(\frac{ds}{ht}\), with the exception of the classical RSA (RSM2) and ANN models. For example, the ANN and the classical RSM models were found to have higher prediction errors: MAE of 0.121 and 0.141; RMSE of 0.176 and 0.194; MAPE of 0.237 and 0.269; RRMSE of 29.163 and 32.077; R of 0.825 and 0.781; and WI of 0.902 and 0.867, respectively. On the other hand, \({{\text{HORSM}}}_{5}\) and \({{\text{HORSM}}}_{6}\) provided excellent predictions, with lower MAE (0.090, 0.087), RMSE (0.133, 0.131), MAPE (0.188, 0.184), and RRMSE (22.018, 21.709), and higher R (0.904, 0.906) and WI (0.948, 0.949). Also, based on the results presented in Table 3, HORSM3 and HORSM4 exhibit relatively similar performance in predicting the depth of scour: the MAE is 0.104/0.097, RMSE is 0.147/0.138, RRMSE is 24.57/22.752, R is 0.881/0.897, MAPE is 0.216/0.202, and WI is 0.934/0.943 for HORSM3/HORSM4, respectively. Finally, the statistical analysis shows that the performance of \({{\text{HORSM}}}_{5}\) and \({{\text{HORSM}}}_{6}\) is very similar in the training phase.

Table 3 The performance metrics used to assess the models’ efficiency through the training phase

In this study, the performance of each proposed model is also compared visually with the observed targets. The scatter plots in Fig. 4 provide further information on both the performance of the applied models and the deviation between predicted and observed ds/ht in the training phase. The plots show the relationship between predicted and measured scour depth, which helps to assess how closely the predicted values align with the actual measurements. They also indicate the variability in the models' predictions by showing how the predicted points are spread around the measured values. Additionally, the scatter plots report \({R}^{2}\), a statistical measure of how well the model fits the data. It can be observed that the classical RSM and ANN models provide poor predictions that lie further from the corresponding actual values, and the predicted samples generated by both models deviate significantly from the fit line in the scatter plot. These two models also have a low predictive capacity in terms of the coefficient of determination, with \({R}^{2}\) values of 0.611 and 0.680, respectively. In contrast, \({{\text{HORSM}}}_{3}\), \({{\text{HORSM}}}_{4}\), \({{\text{HORSM}}}_{5}\), and \({{\text{HORSM}}}_{6}\) provide higher values of \({R}^{2}\). Additionally, the estimated points of these models show less scattering around the ideal (1:1) line, indicating a higher quality of prediction. The HORSM6 model exhibits the highest accuracy with an \({R}^{2}\) value of 0.822, followed by HORSM5 with 0.817, HORSM4 with 0.804, and HORSM3 with 0.775.

Fig. 4 A comparison between the predictions of the applied model and the actual values of \(\frac{ds}{ht}\) during the training phase

To further examine the performance of the models visually, we constructed both a Taylor plot (Taylor 2001) and a violin plot (Hintze and Nelson 1998). Three statistics, namely the root mean square difference (RMSD), the normalized standard deviation, and the correlation coefficient, are used to create the Taylor plot, which is a polar plot. The closer a model lies to the observed point in the Taylor plot, the closer the model values are to the observed values (Sigaroodi et al. 2014). A violin plot is a boxplot combined with a kernel density plot to show the distribution of the data. From Fig. 5, it can be seen that the classical RSM and ANN models have a major problem in predicting \(\frac{ds}{ht}\). However, the HORSM models with high-order polynomial functions provide satisfactory predictions. From Fig. 6, it can be seen that the \({{\text{HORSM}}}_{4}\), \({{\text{HORSM}}}_{5}\), and \({{\text{HORSM}}}_{6}\) models are very close to the location of the observed \(\frac{ds}{ht}\), which means that there is very good agreement between the predicted and observed values. Furthermore, the \({{\text{HORSM}}}_{5}\) and \({{\text{HORSM}}}_{6}\) models are the closest to the benchmark (the measured point). According to the Taylor diagram, these models also exhibit the highest correlation, the lowest RMSD, and a standard deviation closer to that of the observations. Overall, the \({{\text{HORSM}}}_{5}\) and \({{\text{HORSM}}}_{6}\) models are the best models according to the quantitative and visual analysis, and their performance in the training phase is very similar.
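The three statistics behind a Taylor plot can be computed as in the Python sketch below. It assumes that the RMSD is the centered (bias-removed) difference used by Taylor (2001), in which case the three quantities are linked by a law-of-cosines relation, and this is what allows them to be drawn on a single polar plot. The function name and the example values are illustrative assumptions.

```python
import math


def taylor_statistics(obs, pred):
    """Standard deviations, correlation, and centered RMSD of a prediction."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred) / n)
    corr = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / (n * std_o * std_p)
    # Centered RMSD: the mean of each series is removed before differencing
    crmsd = math.sqrt(sum(((p - mean_p) - (o - mean_o)) ** 2 for o, p in zip(obs, pred)) / n)
    return {
        "std_obs": std_o,
        "std_pred": std_p,
        "normalized_std": std_p / std_o,
        "correlation": corr,
        "centered_rmsd": crmsd,
    }


# Hypothetical observed and predicted values, for illustration only
stats = taylor_statistics([0.2, 0.5, 0.9, 1.4], [0.3, 0.4, 1.0, 1.2])
law_of_cosines = (
    stats["std_obs"] ** 2
    + stats["std_pred"] ** 2
    - 2 * stats["std_obs"] * stats["std_pred"] * stats["correlation"]
)
```

Here `law_of_cosines` equals the squared centered RMSD up to rounding, so a model with high correlation and a standard deviation close to the observed one necessarily plots near the observed point.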

Fig. 5 Violin plots showing visual comparison between actual and predicted values of \(\frac{ds}{ht}\) during the training phase

Fig. 6 The Taylor diagram shows the similarity between the applied model predictions and the observed values of \(\frac{ds}{ht}\) during the training phase

In order to select the best and most reliable predictive models, some recent studies have shown that the training step is not sufficient and may lead to misleading judgments due to different conditions that may occur in the calibration process (Alomar et al. 2020; Hameed et al. 2021c, a). Therefore, the testing phase is crucial for identifying the most accurate model, and the best model should perform well in both the training and testing phases. The quantitative evaluation of all applied models in the testing phase is summarized in Table 4. According to Table 4, the \({{\text{HORSM}}}_{5}\) model has the lowest prediction error (MAE = 0.083, MAPE = 0.135, RMSE = 0.115, and RRMSE = 18.079) and the highest similarity with the observed scour values (R = 0.955 and WI = 0.972) compared to the other models. As in the training phase, HORSM3 and HORSM4 exhibit similar performance in the testing phase, as shown in Table 4, although HORSM4 demonstrates slightly better accuracy. The statistical performance of HORSM3 and HORSM4 includes MAE (0.097 and 0.095), RMSE (0.139 and 0.134), MAPE (0.156 and 0.164), and RRMSE (21.818 and 21.036). Additionally, both models achieve good accuracy with R (0.954 and 0.935) and WI (0.954 and 0.962). Overall, the excellent ability of the \({{\text{HORSM}}}_{5}\) model is shown by the reduction of RMSE by 44.17%, 17.27%, 14.18%, 36.46%, and 29.01% compared to the classical RSM, \({{\text{HORSM}}}_{3}\), \({{\text{HORSM}}}_{4}\), \({{\text{HORSM}}}_{6}\), and ANN models, respectively. Moreover, the scatter plots (see Fig. 7) show that the \({{\text{HORSM}}}_{5}\) model has the highest similarity between predictions and actual values, with an \({R}^{2}\) of 0.912, followed by \({{\text{HORSM}}}_{3}\) (\({R}^{2}\) = 0.894), ANN (\({R}^{2}\) = 0.886), \({{\text{HORSM}}}_{4}\) (\({R}^{2}\) = 0.874), the classical RSM model (\({R}^{2}\) = 0.810), and \({{\text{HORSM}}}_{6}\) (\({R}^{2}\) = 0.874).
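As a check on how the quoted percentages follow from the testing-phase RMSE values, and assuming the reduction is measured relative to the RMSE of the competing model, the figures for \({{\text{HORSM}}}_{3}\) and \({{\text{HORSM}}}_{4}\) can be reproduced as:

$$\frac{0.139 - 0.115}{0.139} \times 100 \approx 17.27\% ,\qquad \frac{0.134 - 0.115}{0.134} \times 100 \approx 14.18\%$$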

Table 4 The performance metrics used to evaluate the efficiency of the models in the testing phase
Fig. 7 A comparison between applied model predictions and actual values of \(\frac{ds}{ht}\) during the testing phase

On the other hand, Fig. 7 shows that although the \({{\text{HORSM}}}_{6}\) model has excellent accuracy in the training phase, it has the lowest predictive capacity in the testing phase. In addition to its poor predictions, \({{\text{HORSM}}}_{6}\) also produces negative estimates. This suggests that the model overfits in the training phase and therefore fails to provide reliable estimates in the testing phase. The violin diagram (Fig. 8) shows the ability of each model to predict scour depth. Some of the applied models, such as \({{\text{HORSM}}}_{6}\) and ANN, provided very poor estimates. The ANN model is not able to estimate the higher values of scour depth efficiently, whereas the \({{\text{HORSM}}}_{6}\) model fails to simulate the lower values. However, the \({{\text{HORSM}}}_{5}\) model is superior to the other models and manages to simulate both the highest and the lowest values of the scour depth.

Fig. 8 Violin plots showing a visual comparison between actual and predicted values of \(\frac{ds}{ht}\) during the testing phase

Finally, a Taylor diagram was created for the test data set (Fig. 9) to allow a more comprehensive evaluation and to select the best predictive model. The results in Fig. 9 are consistent with the previous quantitative and visual evaluations. The \({{\text{HORSM}}}_{5}\) model is the best model because its predictions agree very well with the observed values. Moreover, all previous evaluations have shown that HORSM is very sensitive to the order of the polynomial. HORSM with a fifth-degree polynomial gives the best predictions and outperforms the AI model (ANN).

Fig. 9 The Taylor diagram shows the similarity between the applied model predictions and the observed values of \(\frac{ds}{ht}\) during the testing phase

Comparison of the attained results to sophisticated models and empirical relations

The results obtained from the best predictive model in this study (\({{\text{HORSM}}}_{5}\)) are validated against well-established empirical equations and other AI models. Comparing the performance of the proposed model in predicting the scour depth with models developed in the literature gives a clear indication of its efficiency and validity.

Sharafati et al. (2020a) developed a hybrid model that integrates a bio-inspired algorithm (invasive weed optimization (IWO)) with an adaptive neuro-fuzzy inference system (ANFIS) to estimate the scouring depth. The applied model (ANFIS-IWO) provided satisfactory prediction accuracy (R = 0.932); however, it also generated a relatively higher prediction error (RMSE = 0.148 and MAE = 0.108). Furthermore, an empirical equation-based model was created by (Guan et al. 2016; Sharafati et al. 2020a) to predict the scouring depth. This model was less accurate than expected, giving a higher prediction error with RMSE = 0.447 and MAE = 0.395. Based on the reported results from both models, it can be concluded that the pattern of the scouring depth is complicated and that the existing models could not efficiently capture the underlying relations between the predictors and the scouring depth.

For further quantitative comparison, the accuracy of the proposed model was compared with more advanced models developed in previous studies (Elkiki 2018; Guven and Gunal 2008b; Goel and Pal 2009; Guven 2011; Najafzadeh 2015; Sattar et al. 2018) in terms of the correlation coefficient (R). Figure 10 summarizes the prediction accuracies of eight different AI modeling techniques. The reviewed models yielded good estimates, with high similarity between their predictions and the actual scouring depth. The correlation coefficients of these models vary between 0.905 and 0.948, while the proposed model (\({{\text{HORSM}}}_{5}\)) showed the highest accuracy (R = 0.955). This validation illustrates that the \({{\text{HORSM}}}_{5}\) model achieved higher prediction accuracy than the other models.

Fig. 10 Comparison between the accuracy of the proposed model and other models developed in previous studies throughout the testing stage

Sensitivity analysis

In hydraulic engineering, it is important to determine which parameters have a significant effect on scour depth (\(\frac{ds}{ht}\)). In this section, we explain the approach used to select the most important predictors. After selecting the best predictive model (\({{\text{HORSM}}}_{5}\)), different input combinations were used to train the model. All possible input combinations were considered when performing the sensitivity analysis (see Table 5). As before, the model is trained with 66% of the data and the remaining data are used for testing. According to the results of the testing phase (see Fig. 11), the most critical case is the second one (combination II): the prediction accuracy decreases drastically when the factor z/ht is not present. The factor z/ht is therefore the most important for the dynamics of the scour depth compared to the other factors. The second and third most important parameters are Uo/Uc and d50/ht, respectively.

Table 5 Different input combinations used to perform the sensitivity analysis
Fig. 11 Results of sensitivity analysis for different combinations

Discussion

The primary focus of this paper is to explore the possibility of utilizing a suitable and efficient prediction model for accurately estimating the depth of scouring downstream of weirs. The accurate estimation of scouring depth holds significant importance in the field of water resources as it ensures the sustainability of hydraulic structures. However, predicting the depth of scouring poses a considerable challenge in the hydraulic discipline due to the involvement of various complex parameters, such as irregular geometry, the natural characteristics of water flow dynamics, and the nonlinear movement of sediments. The limitations of existing empirical formulas necessitate the development of advanced models capable of accurately predicting the impact of these parameters on scouring patterns. Additionally, a reliable predictive model would greatly assist in the sustainable design of hydraulic structures. In this study, the HORSM was employed with polynomial functions of orders ranging from two to six. The findings of this research indicate that HORSM5 is the most effective model, demonstrating superior performance compared to the AI-based model (ANN). The fifth-order HORSM polynomial function provides the most precise predictions, exhibiting a higher R2 of 0.912 and WI of 0.972 when compared to the values obtained using ANN (R2 = 0.886 and WI = 0.927). Furthermore, the accuracy of the predictions is evidenced by a reduction in root mean square error of 44.17% and 29.01% in comparison with classical RSM and ANN, respectively.

In addition, the proposed model, HORSM5, was validated against various sophisticated models and empirical models developed in previous works, including DNN, hybrid model (GMDP-PSO), SVR, and other ML-based models (refer to Fig. 10) (Elkiki 2018; Guven and Gunal 2008b; Goel and Pal 2009; Guven 2011; Najafzadeh 2015; Sattar et al. 2018). The correlation coefficients of these models ranged from 0.905 to 0.948, while the HORSM5 model demonstrated the highest accuracy, with a correlation coefficient of 0.955. The validation results indicate that the HORSM5 model outperforms the other models in terms of prediction accuracy.

In the comparative analysis, HORSM5 outperformed ANN by reducing the RMSE by a notable 29%, showing its ability to capture complex patterns and interdependencies within the data, which translated into better prediction precision. The lower accuracy of ANN may be linked to several issues. Firstly, the training data may have lacked quality or been a poor fit for the model's learning process. Secondly, there may have been difficulties in fine-tuning the hyperparameters effectively. Thirdly, the algorithm might have encountered obstacles in avoiding local minima. Finally, the limited number of training samples could have hindered the performance of ANN. More training samples would have allowed ANN to learn more efficiently and to better capture the underlying patterns and relationships in the data. Collectively, these factors may have contributed to the weaker predictive performance of ANN observed in this research.

Conclusion

In this study, an improved version of the RSM model was used to predict scour depth downstream of weirs, which is considered one of the most important parameters in the hydraulic field. The traditional RSM was extended with high-order polynomial functions of degree two to six, which gave rise to the HORSM approach. The traditional RSM and HORSM models were trained and validated using 186 experimental data points. In addition, these models were benchmarked against the ANN model as an efficient form of artificial intelligence model. The experimental samples were randomly divided into training and testing sets, with a ratio of 66% for training and 34% for testing. Various statistical metrics, including RMSE, R, MAE, R2, MAPE, WI, and RRMSE, were employed to evaluate the prediction accuracy. The study found that the HORSM equipped with fifth-order polynomial functions provided the best prediction accuracy and outperformed the standard RSM and ANN models. The proposed model demonstrated a reduced prediction error for scour depth compared to other models, with values of MAE = 0.083, RMSE = 0.115, and MAPE = 0.135. Additionally, it exhibited high accuracy, as reflected in R = 0.955 and WI = 0.972. A sensitivity analysis was also performed to assess the impact of various hydraulic and geometric parameters on scour depth prediction. The absence of the factor z/ht significantly decreased prediction accuracy, highlighting its crucial role in scour depth dynamics. Additionally, Uo/Uc and d50/ht were identified as the second and third most important parameters, respectively. The proposed model can be used to estimate the marginal power functions in reliability planning and to predict other parameters in the hydraulic domain, such as the discharge coefficient and scour depth around bridge piers. Moreover, the proposed approach is very powerful for large data sets.
However, when only a few data points are available (\(N<\frac{(n+1)(n+2)}{2}+\left(Q-2\right){n}^{2}\)), the HORSM may not provide satisfactory predictions, which in turn limits its use. Therefore, it is important in future work to improve the HORSM approach so that it can solve engineering problems with few samples (observations) and large input vectors. Thus, this study proposes the following recommendations:

  1. Employing principal component analysis (PCA) to decrease the dimensionality of the processed data for higher-order response surface methodology (HORSM).

  2. Utilizing a regularization parameter to improve the performance of the HORSM model.
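
The sample-size condition stated before the recommendations can be checked with a short calculation. The sketch below is a minimal illustration that assumes n denotes the number of input variables and Q the polynomial order; the function name `min_samples` and the example value of n are hypothetical, while the 186 samples and the 66% training share are taken from the text.

```python
def min_samples(n_inputs, order):
    """Threshold (n+1)(n+2)/2 + (Q-2)*n^2 from the sample-size condition."""
    if order < 2:
        raise ValueError("the expression is written for orders of two or more")
    # (n+1)(n+2) is a product of consecutive integers, so the division is exact.
    return (n_inputs + 1) * (n_inputs + 2) // 2 + (order - 2) * n_inputs ** 2


n_train = round(0.66 * 186)  # training share of the 186 experimental samples
n_inputs = 4                 # hypothetical number of predictors, for illustration only
for order in range(2, 7):
    needed = min_samples(n_inputs, order)
    status = "enough" if n_train >= needed else "too few"
    print(f"order {order}: threshold = {needed}, training samples = {n_train} ({status})")
```

For order two the expression reduces to (n+1)(n+2)/2, the number of coefficients in a full quadratic response surface, and each additional order raises the threshold by n squared. This is why dimensionality reduction or regularization becomes attractive when samples are scarce.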