Estimation of Mean Velocity Upstream and Downstream of a Bridge Model Using Metaheuristic Regression Methods

This study compares four data-driven methods, Gaussian process regression (GPR), multivariate adaptive regression spline (MARS), M5 model tree (M5Tree), and multilinear regression (MLR), in estimating mean velocity upstream and downstream of bridges. Data were obtained through multiple experiments in a rectangular laboratory flume with glass walls, 9.5 m long, 0.6 m wide, and 0.6 m deep. Four different bridge models were placed at the 6th meter of the channel to determine the average velocities upstream and downstream. The data-driven models were implemented with different combinations of effective parameters as input. They were evaluated and compared using root mean square error (RMSE), mean absolute relative error (MARE), and Nash–Sutcliffe efficiency (NSE). The results showed that the MARS had the best efficiency in estimating the mean velocity upstream of the bridge model, while the M5Tree provided the highest performance in estimating the mean velocity downstream. In the test phase, the MARS method improved the upstream estimation accuracy of GPR, M5Tree, and MLR by 23.8%, 45.1%, and 47.4%, respectively, in terms of RMSE. Downstream, the M5Tree improved the RMSE accuracy by 31.8%, 70.4%, and 75.5% compared to MARS, GPR, and MLR, respectively. The study recommends the MARS and M5Tree for estimating mean velocities upstream and downstream of the bridge.


Introduction
Bridges are important components of the transportation system that provide daily public mobility, food, medical, and other supplies, welfare, commerce, industry, and various cultural activities at different spatial scales (Chang et al. 2012). Therefore, these structures require careful planning, design, and maintenance. Despite significant improvements and the development of new recommendations and guidelines for bridge design, the number of bridge failures worldwide remains high (Zhang et al. 2022). Collapsed bridges range in age from 1 to over 100 years, depending on many factors (Wardhana and Hadipriono 2003). Bridge failure significantly impacts the [...]

Published online: 22 September 2023. Water Resources Management (2023) 37:5559-5580

[...] have also reported that data-driven techniques perform better than regression models in scour depth prediction (Bonakdari et al. 2020; Tola et al. 2023). As far as we know, most studies using data-driven techniques to investigate flow-structure interactions in bridges are mainly related to phenomena such as scour and drawdown but not to specific hydraulic parameters such as Froude number or flow velocity (Atashi et al. 2023; Tola et al. 2023). Therefore, the main objective and contribution of this study are to evaluate four data-driven methods, multivariate adaptive regression spline (MARS), M5 model tree (M5Tree), Gaussian process regression (GPR), and multilinear regression (MLR), for predicting the mean velocity upstream and downstream of a bridge model structure built in a channel using monitoring data from experiments. The remainder of the study is organized as follows: Sect. 2 explains the experimental procedures and gives brief information about the data-driven techniques; Sect. 3 shows the main results and discusses the relevance of our findings compared to similar studies. Finally, Sect. 4 summarizes the study's main results and contribution and provides recommendations for future research.

Experimental Setup
Experiments were conducted in a rectangular laboratory channel with glass walls, 0.6 m wide, 9.5 m long, and 0.6 m deep, at the Hydraulics Laboratory, Erciyes University, Turkey. Water flow through the flume was measured using a UFM-600 ultrasonic current meter mounted on the pipe carrying water from a constant head tank to the inlet of the flume. A tripod-mounted point meter that can freely move in 3D was used for measuring velocities and water surface profiles (Fig. 1a). A Streamflow Velocity Meter 400 type "Low-Speed Propeller Probe" was used to measure flow velocities. To accurately determine point velocities during the measurements, the average frequency was obtained on the digital display every 10 s; this procedure was repeated multiple times at every point, and velocities were then resolved using the average of the multiple frequencies.
Bridge models with rectangular cross-sections and four opening ratios, M = b/B = 0.58, 0.67, 0.75, and 0.83, were utilized, where B and b are the width of the channel (here 60 cm) and the span of the bridge, respectively. The width of the bridge deck ($W_b$), made of wood, is 5 cm. The bridge models were positioned at the 6th meter of the channel to observe the mean velocities upstream and downstream and to investigate the effects of the bridge structure on the variations of the water level profile. Velocities were measured 1 cm from the upstream and downstream sides of the bridge section, at the midpoint of the bridge span (Fig. 1b). As can be observed from Table 1, velocity measurements were done for five distinct steady-state flow conditions (Hadi and Ardiclioglu 2018).
Here, $V$ (= $Q/A$) is the average flow velocity, $h_n$ is the uniform water depth, $Q$ is the discharge, $A$ is the wetted area of the cross-section, $S$ is the slope of the channel, $Re$ (= $4VR/\nu$) is the Reynolds number with hydraulic radius $R$ (= $A/P$), $P$ is the wetted perimeter, $\nu$ is the kinematic viscosity, and $Fr$ (= $V/(gh)^{0.5}$) is the Froude number, with $g$ the acceleration due to gravity. Mean velocities were observed for five different discharges with four distinct b/B ratios at the upstream and downstream sides at the midpoint of the spans. The average velocities for different discharges and openings are given in Fig. 2.
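The definitions above can be turned into a short calculation. The following is a minimal sketch, assuming a rectangular cross-section and a kinematic viscosity of about 1.0e-6 m²/s (water near 20 °C); the function name and the example discharge and depth are hypothetical and are not taken from Table 1.

```python
import math

G = 9.81            # acceleration due to gravity (m/s^2)
NU_WATER = 1.0e-6   # assumed kinematic viscosity of water (m^2/s)


def hydraulic_parameters(discharge, width, depth, nu=NU_WATER):
    """Return V, R, Re and Fr for a rectangular open channel."""
    area = width * depth                      # wetted area A
    perimeter = width + 2.0 * depth           # wetted perimeter P
    velocity = discharge / area               # V = Q/A
    radius = area / perimeter                 # R = A/P
    reynolds = 4.0 * velocity * radius / nu   # Re = 4VR/nu
    froude = velocity / math.sqrt(G * depth)  # Fr = V/(g h)^0.5
    return {"V": velocity, "R": radius, "Re": reynolds, "Fr": froude}


# Hypothetical flow condition in the 0.6 m wide flume
params = hydraulic_parameters(discharge=0.02, width=0.6, depth=0.10)
print(params)
```

For this hypothetical case the Froude number is below 1, so the flow would be subcritical.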
The average velocity ($V$) in a vertical is determined from velocity observations at different points in each vertical. In the vertical velocity curve method, measurements were made for each selected vertical at points well distributed between the bed and the water surface. The mean velocity in the vertical ($V_u$ upstream or $V_d$ downstream) is calculated by obtaining the area between the ordinate axis and the velocity curve and then computing the ratio of this area to the flow depth in that vertical using Eq. (1).

Fig. 1 The experimental flume and equipment for measurement

In Eq. (1), $v_i$ and $v_{i+1}$ are two successive velocities, and $h_i$ indicates the distance between successive velocity measurement points. The mean velocities downstream and upstream were determined by taking the mean values of the average velocities obtained for each opening using Eq. (2).
In this equation, $V_u$ is the average upstream velocity, $V_d$ is the average downstream velocity, and $j$ is the number of openings. Measured average velocities for five different discharges and four different opening mid-sections are given in Table 2.
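Eqs. (1) and (2) are not reproduced in this text, so the sketch below shows only one plausible reading of the description: the area under the velocity profile is approximated with the trapezoidal rule from successive velocities $v_i$, $v_{i+1}$ and spacings $h_i$, divided by the flow depth, and the results are then averaged over the openings. The function names and the sample profile are hypothetical.

```python
def depth_averaged_velocity(elevations, velocities):
    """Area under the velocity profile (trapezoidal rule) divided by flow depth.

    elevations -- heights of the measurement points above the bed (m),
                  increasing from the bed to the water surface
    velocities -- point velocities at those heights (m/s)
    """
    if len(elevations) != len(velocities) or len(elevations) < 2:
        raise ValueError("need at least two matching elevation/velocity points")
    area = 0.0
    for i in range(len(elevations) - 1):
        h_i = elevations[i + 1] - elevations[i]  # spacing between successive points
        area += 0.5 * (velocities[i] + velocities[i + 1]) * h_i
    flow_depth = elevations[-1] - elevations[0]
    return area / flow_depth


def mean_over_openings(vertical_means):
    """Arithmetic mean of the vertical-average velocities over the j openings."""
    return sum(vertical_means) / len(vertical_means)


# Hypothetical profile in a 0.10 m deep flow
profile_mean = depth_averaged_velocity(
    [0.00, 0.02, 0.05, 0.08, 0.10],
    [0.00, 0.30, 0.36, 0.40, 0.40],
)
overall_mean = mean_over_openings([0.30, 0.32, 0.34, 0.36])
print(profile_mean, overall_mean)
```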

Multivariate Adaptive Regression Splines (MARS) Approach
MARS is a nonlinear, nonparametric method capable of modeling nonlinear systems. This method does not assume a functional relationship between independent and dependent variables. MARS is composed of piecewise linear segments or splines that are seamlessly connected. These splines (e.g., polynomials) are called basis functions (BFs), which provide flexibility in handling linear or nonlinear behaviors. The connections between the pieces are named nodes; a node marks the end of one data region and the beginning of another (Friedman 1991). Candidate nodes are randomly positioned within each input range.
The MARS can map complex and high-dimensional data. It can provide a simple interpretable model and calculate the contribution of each input variable. The main aim of this method is to estimate the values of a continuous dependent variable, $y$ $(n \times 1)$, from independent explanatory variables, $x$ $(n \times p)$. The model can be given as follows:

$$y = f(x) + e$$

where $f$ refers to a weighted sum of basis functions depending on $x$, and $e$ indicates the error vector with an $(n \times 1)$ dimension.
MARS generates BFs by stepwise searching all possible univariate candidate nodes and interactions between all considered variables. It uses an adaptive regression method to choose node positions automatically. The MARS has two phases: forward and backward. In the forward phase, candidate nodes are randomly positioned within each input range to provide pairs of BFs. At every step, the model adjusts the node and the related pair of BFs to reduce the residual sum of squares. In the forward phase, excessive BFs can be added to reduce error, and this can cause overfitting. This problem is solved in the backward phase by eliminating the BFs having the least contributions (Zhang and Goh 2016).
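A single forward step of this procedure can be illustrated as follows. This is a minimal sketch rather than the implementation used in the study: it handles one input variable, takes the observed interior input values as candidate nodes (a simplification of the random placement described above), and omits the backward pruning phase. All names and the synthetic data are hypothetical.

```python
def hinge_pair(x, node):
    """Mirrored pair of piecewise-linear basis functions for one node."""
    return max(0.0, x - node), max(0.0, node - x)


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_with_node(xs, ys, node):
    """Least-squares fit of y = c0 + c1*max(0, x-node) + c2*max(0, node-x)."""
    rows = [[1.0, *hinge_pair(x, node)] for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    coef = solve(ata, aty)  # normal equations
    sse = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, y in zip(rows, ys))
    return coef, sse


def forward_step(xs, ys):
    """Pick the candidate node whose BF pair gives the smallest residual sum of squares."""
    candidates = sorted(set(xs))[1:-1]  # interior values only
    best_node = min(candidates, key=lambda t: fit_with_node(xs, ys, t)[1])
    coef, sse = fit_with_node(xs, ys, best_node)
    return best_node, coef, sse


# Synthetic data with a change of slope at x = 3
xs = [0.5 * i for i in range(13)]
ys = [1.0 + 2.0 * max(0.0, x - 3.0) - 0.5 * max(0.0, 3.0 - x) for x in xs]
best_node, coef, sse = forward_step(xs, ys)
print(best_node, coef, sse)
```

A full MARS fit repeats such steps over all inputs and their interactions, then prunes the least useful BFs.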

M5 Model Tree (M5Tree) Approach
The M5 model tree, first developed by Quinlan (1992), is a data mining method. This method uses a binary decision tree with linear equations at the terminal (or leaf) nodes. Using such equations, a relationship is estimated between dependent and independent variables. It can handle quantitative data (Mitchell 2007). Like the MARS method, constructing the M5 model tree requires two distinct phases (Solomatine and Xue 2004). In the first phase, data are partitioned into subsets, and a decision tree is generated. The split criterion treats the standard deviation (SD) of the class values reaching a node as an error measure at that node and computes the expected reduction in this error due to testing each attribute. M5 is a recursive algorithm that constructs the regression tree by partitioning the space using the SD reduction (SDR) factor, the maximum reduction in output error after branching. The SDR is expressed as:

$$SDR = sd(T) - \sum_{i} \frac{|T_i|}{|T|} \, sd(T_i)$$

where $T$ stands for the set of examples reaching the node, $T_i$ for the subset of examples having the $i$-th result of the potential set, and $sd$ for the SD. Due to the split process, the SD of the data in the child nodes (lower nodes) is lower than that in the parent node. Among the considered splits, the one maximizing the expected error reduction is selected. However, this splitting often results in a large tree-like structure that can lead to overfitting or poor generalization. To overcome this, the second phase prunes the oversized tree and then replaces the pruned subtrees with linear equations (Rahimikhoob et al. 2013).
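The split criterion can be illustrated with a few lines of code. The sketch below assumes the population standard deviation and a single numeric attribute; the opening ratios are those used in the experiments, while the velocity values and function names are hypothetical.

```python
import statistics


def sdr(parent, subsets):
    """Standard deviation reduction: sd(T) - sum(|T_i|/|T| * sd(T_i))."""
    n = len(parent)
    weighted = sum(len(s) / n * statistics.pstdev(s) for s in subsets)
    return statistics.pstdev(parent) - weighted


def best_split(xs, ys):
    """Return (threshold, SDR) of the binary split on xs that maximizes the SDR."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold between identical attribute values
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = sdr(ys, [left, right])
        if best is None or gain > best[1]:
            best = (threshold, gain)
    return best


# Opening ratios b/B from the experiments with hypothetical velocities (m/s)
ratios = [0.58, 0.58, 0.67, 0.67, 0.75, 0.75, 0.83, 0.83]
velocities = [0.50, 0.52, 0.42, 0.44, 0.30, 0.32, 0.28, 0.30]
threshold, gain = best_split(ratios, velocities)
print(threshold, gain)
```

In a full M5 tree this search runs over every attribute at every node, and the recursion stops when the SD at a node becomes small or too few examples remain.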

Gaussian Process Regression (GPR)
The GPR is a nonparametric model for solving nonlinear regression problems (Williams 1997). This method regresses the inputs and output by directly defining a prior probability distribution over a latent function. The following equation can express it:

$$f(x) \sim GP\left(m(x), k(x, x')\right)$$

where $m(x)$ is the mean function (MF) and $k(x, x')$ is a covariance kernel function (CKF). The MF, which encodes the central tendency of the function, is generally taken as 0 (Zhang et al. 2016). The CKF encodes information about the expected function's structure and shape. The following equation defines the relationship between inputs and outputs:

$$y = f(x) + \varepsilon$$

where $\varepsilon$ is noise that is assumed to be independent and Gaussian distributed with zero mean and variance $\sigma_n^2$:

$$\varepsilon \sim N(0, \sigma_n^2)$$

From Eq. (5), the likelihood can be provided as follows:

$$p(\mathbf{y} \mid \mathbf{f}) = N(\mathbf{y} \mid \mathbf{f}, \sigma_n^2 I)$$

where $\mathbf{y} = [y_1, \ldots, y_M]^T$, $\mathbf{f} = [f(x_1), \ldots, f(x_M)]^T$, and $I$ is a unit matrix with an $M \times M$ dimension. In the GPR method, the kernel function is selected based on the assumptions about the model, and the Gaussian kernel is mostly used. Hyper-parameters of the kernel function are computed by maximum likelihood estimation (Karbasi 2018; Shadrin et al. 2021).
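To make the roles of the kernel and the noise variance concrete, the following sketch computes the GPR posterior mean and variance for one input variable with a zero mean function and a Gaussian kernel. It is an illustration only: the hyper-parameters (`length`, `signal`, `noise_std`) are fixed by hand instead of being estimated by maximum likelihood, and all names and data are hypothetical.

```python
import math


def gaussian_kernel(a, b, length=1.0, signal=1.0):
    """Gaussian (squared-exponential) covariance k(a, b)."""
    return signal ** 2 * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gpr_predict(x_train, y_train, x_new, length=1.0, signal=1.0, noise_std=0.01):
    """Posterior mean and variance of the latent function at each point of x_new."""
    n = len(x_train)
    # K + sigma_n^2 I
    k = [[gaussian_kernel(x_train[i], x_train[j], length, signal)
          + (noise_std ** 2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(k, list(y_train))  # (K + sigma_n^2 I)^-1 y
    results = []
    for xs in x_new:
        k_star = [gaussian_kernel(xs, xt, length, signal) for xt in x_train]
        mean = sum(ks * a for ks, a in zip(k_star, alpha))
        v = solve(k, k_star)
        var = gaussian_kernel(xs, xs, length, signal) - sum(
            ks * vi for ks, vi in zip(k_star, v))
        results.append((mean, max(var, 0.0)))
    return results


x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [math.sin(x) for x in x_train]
predictions = gpr_predict(x_train, y_train, [2.0, 10.0])
print(predictions)
```

Near the training data the posterior variance is small, whereas far from it the prediction reverts to the zero prior mean with variance close to `signal ** 2`.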

Results
In this study, four data-driven methods, multivariate adaptive regression spline (MARS), M5 model tree (M5Tree), Gaussian process regression (GPR), and multilinear regression (MLR), were implemented to estimate the mean velocity upstream and downstream of a bridge model using data from experiments. The models were applied using MATLAB software. The models were tested to estimate the mean velocity with different combinations of influential input parameters such as $h$, $V_{up}$, $y$, $b/B$, $Fr$, $Re$, and $B/h_n$. The input combinations were built by adding one variable at a time to determine the influence of the variables on the mean velocity. The models were evaluated using the RMSE, MARE, and NSE statistics given in Eqs. (4)-(6), where $V_m$ is the mean of measured velocity, $V_{ic}$ is calculated velocity, $V_{im}$ is measured velocity, and $N$ refers to the quantity of data.

The performance statistics of the implemented data-driven methods are summarized in Tables 4, 5, 6, 7, 8, 9, 10 and 11 for estimating the mean velocity upstream of the bridge model using different input combinations. The model with minimum inputs ($h$, $V_{up}$) performs the worst for the MARS method. In contrast, the models with input combination iv have the smallest RMSE and MARE and the highest NSE in the training and testing phases. The three MARS models with 5 to 7 input variables differ only marginally. In this study, we selected the MARS model with input combination iv as the best model because it requires fewer inputs than the other two alternatives.
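The three evaluation statistics can be sketched as below using their usual definitions, since the equations themselves are not reproduced in this text. Expressing MARE as a percentage is an assumption suggested by the magnitude of the reported values; the function names and sample velocities are hypothetical.

```python
import math


def rmse(measured, calculated):
    """Root mean square error, in the units of the data (here m/s)."""
    n = len(measured)
    return math.sqrt(sum((m - c) ** 2 for m, c in zip(measured, calculated)) / n)


def mare(measured, calculated):
    """Mean absolute relative error, expressed in percent (assumed convention)."""
    n = len(measured)
    return 100.0 / n * sum(abs(m - c) / abs(m) for m, c in zip(measured, calculated))


def nse(measured, calculated):
    """Nash-Sutcliffe efficiency; 1 indicates a perfect match."""
    mean_measured = sum(measured) / len(measured)
    sse = sum((m - c) ** 2 for m, c in zip(measured, calculated))
    sst = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - sse / sst


# Hypothetical measured and calculated velocities (m/s)
v_measured = [0.30, 0.35, 0.40, 0.45]
v_calculated = [0.31, 0.34, 0.41, 0.44]
print(rmse(v_measured, v_calculated),
      mare(v_measured, v_calculated),
      nse(v_measured, v_calculated))
```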
The performance statistics of the M5Tree (Table 4) clearly show that the model with inputs $h$, $V_{up}$, $y$, $b/B$, and $Fr$ has the best accuracy in estimating the mean velocity upstream of the bridge model in both training (RMSE = 0.082 m/s, MARE = 1.052, NSE = 0.9914) and testing (RMSE = 0.0222 m/s, MARE = 3.836, NSE = 0.9417). Beyond this combination, the accuracy of the M5Tree does not improve, as was also the case for the MARS method.

The best data-driven models are compared in Fig. 3 with their velocity estimates. From the figure, the estimates of the MARS are closer to the observed values compared to the M5Tree, GPR, and MLR. The superiority of GPR over M5Tree and MLR can also be seen in this figure. The estimates of the same models are compared in Fig. 4 in the form of a scatter plot. The MARS method has the least scattered velocity estimates upstream of the bridge model, with the highest coefficient of determination ($R^2$ = 0.985). In contrast, the MLR model performs the worst.
Fig. 3 Comparison of observed and predicted upstream mean velocities by data-driven methods in the test stage

Tables 7-11 show the training and testing performances of the four data-driven methods in estimating the mean velocity downstream of the bridge model. Again, the MARS model with five inputs (input combination iv) provided the best accuracy, and adding more inputs did not improve the accuracy. Comparison of input combinations iv and v shows that including the Froude number improved the accuracy in the test phase by 58.7%, 53%, and 37.1% in terms of RMSE, MARE, and NSE, respectively. Table 7 shows that the M5Tree model provided the lowest RMSE and MARE and the highest NSE in the training and testing phases for the 5th input combination ($h$, $V_{down}$, $y$, $b/B$, $Fr$, $Re$). Also, for this model, adding the Froude number improved the model accuracy in the test phase by 67.8%, 74.5%, and 31.4% in terms of RMSE, MARE, and NSE, respectively. Like the MARS model, the GPR model with inputs $h$, $V_{down}$, $y$, $b/B$, and $Fr$ (input combination iv) was also the best (Table 8) in estimating the mean velocity downstream of the bridge model. For this method, including the Froude number in the model inputs improved the RMSE, MARE, and NSE by 37.6%, 41.9%, and 152.5%, respectively, in the test phase. As can be seen in Table 9, the MLR model offered the best performance for input combination iv. Again, the MARS, M5Tree, and GPR outperformed the MLR in estimating the mean velocity downstream of the bridge model. Among the data-driven methods, the M5Tree performed better than the others. The improvements in the accuracy of the best MARS, GPR, and MLR models by implementing the M5Tree model are 31.8%, 70.4%, and 75.5%, respectively, regarding the RMSE in the test phase.
Figure 5 compares the mean velocity estimates of the best data-driven models downstream of the bridge model. The M5Tree estimates are closer to the observed values than the other methods, and GPR and MLR do not capture the measurements well. The scatter plots of the data-driven methods are shown and compared in Fig. 6. The least scattered estimates belong to the M5Tree, with the highest coefficient of determination ($R^2$ = 0.977). In contrast, the GPR and MLR methods provide inadequate estimates. Figure 9 illustrates the errors of the data-driven methods in estimating the mean velocities upstream and downstream of the bridge model. The figure shows that the MARS and M5Tree methods have the best accuracy in estimating the velocities. In contrast, as expected, the MLR method produces the highest errors because the relationship between velocity and the influential parameters is nonlinear. Taylor diagrams were employed to facilitate a comprehensive assessment of the models' performance, as illustrated in Fig. 7. These diagrams offer a valuable visualization tool for evaluating the accuracy of the models in terms of RMSE, standard deviation, and correlation. Upon examination, it becomes evident that the MARS method has the strongest correlation and the smallest squared error when estimating mean velocities upstream and downstream. Moreover, model predictions and observed values were compared through violin charts, as depicted in Fig. 8. This graphical representation contrasts the distributions of the observations and of the predictions generated by the various models. A notable observation from this figure is that the MARS model closely resembles the observed values in terms of mean, [...]

The results were subjected to additional validation using a one-way analysis of variance (ANOVA) to assess the robustness of the models in terms of the significance of discrepancies between the estimated and observed values. Both tests were conducted with a confidence level of 95%. In specific terms, deviations between the predicted and actual values were deemed significant when the resulting p-value fell below 0.05, employing a two-tailed significance approach. The statistical outcomes of these tests are presented in Table 11. In the case of upstream velocity estimation, the MARS model demonstrated a small testing value (0.0053) alongside a notably high significance level (0.942) (Table 3). Conversely, for downstream velocity, the M5Tree model exhibited the smallest testing value (0.0051) coupled with the highest significance level (0.943), followed by the MARS model. These test findings suggest that, in estimating the mean velocity at the bridge model, the MARS and M5Tree methods exhibit higher robustness than the other methods.
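The testing value of a one-way ANOVA is the F statistic, which can be sketched as follows. This illustration computes only F and its degrees of freedom; obtaining the p-value additionally requires the F distribution, for which a library routine such as `scipy.stats.f_oneway` could be used. The function name and the sample data are hypothetical and are not the study's measurements.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # variation of the values around their own group mean
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical observed and predicted velocities (m/s) with nearly equal means
observed = [0.30, 0.35, 0.40, 0.45]
predicted = [0.31, 0.34, 0.41, 0.44]
f_close, df_b, df_w = one_way_anova_f(observed, predicted)
print(f_close, df_b, df_w)
```

A small F (and hence a large p-value) indicates that the group means of the predictions and observations do not differ significantly, which is how the values reported above are read.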

Discussion
The study aimed to estimate the mean velocity upstream and downstream of a bridge model using four data-driven methods: multivariate adaptive regression spline (MARS), M5 model tree (M5Tree), Gaussian process regression (GPR), and multilinear regression (MLR). These methods were implemented and evaluated based on their accuracy in estimating velocities using experimental data. The study found that the MARS method outperformed the other methods in estimating the upstream mean velocity of the bridge model, whereas the M5Tree was the most accurate downstream. This was particularly notable when comparing the input combinations of variables. Adding the Froude number (Fr) as an input parameter substantially impacted the accuracy of the MARS, M5Tree, and GPR methods. Upstream, the improvement in accuracy ranged from 41.4% to 82.7% in terms of RMSE, which is a significant enhancement. Comparing the methods upstream, the MARS method performed the best in accuracy, followed by GPR and M5Tree. MLR consistently yielded the lowest accuracy among the tested methods. The nonlinearity of the investigated phenomenon can explain this.
The study also employed various visualizations to understand the results better. Figures 4 to 9 showcase the comparison between different methods' velocity estimates, scatter plots of the estimates, variation graphs, and error illustrations. As previously highlighted in other research (Adnan et al. 2020), the data-driven methods (MARS, M5Tree, GPR, and MLR) are reliant on the amount of available data. Within the training dataset, extreme velocities are limited, which hinders the models' ability to grasp the underlying phenomenon fully. This challenge could potentially be mitigated by incorporating a larger volume of experimental data.
It is important to understand the behavior of flow and turbulence characteristics near bridge piers. Measurements of upstream and downstream flow help explain how velocity changes affect scour development around the base of bridge piers. These velocities provide important information for understanding scour mechanisms and designing future structures to increase bridge safety and resilience (Carnacina et al. 2019). The applications show that the MARS and M5Tree models can successfully estimate the mean velocity using the input parameters $h$, $V_{up}$, $y$, $b/B$, $Fr$, and $Re$.

Conclusions
This study investigated the ability of four data-driven methods to estimate mean velocity upstream and downstream of bridges using experimental data and influential parameters ($h$, $V_{up}$, $y$, $b/B$, $Fr$, $Re$, $B/h_n$). Various combinations of these parameters were used as inputs to the MARS, M5Tree, GPR, and MLR models, considering the correlations between inputs and outputs. The Froude number was very effective in estimating mean velocity upstream and downstream. Including Fr improved the test-phase RMSE of the MARS, M5Tree, and GPR models by 41.4%, 42.8%, and 82.7% upstream and by 58.7%, 67.7%, and 37.6% downstream, respectively. Evaluation of the methods showed that the MARS model with inputs of $h$, $V_{up}$, $y$, $b/B$, and $Fr$ provided the best accuracy in estimating the mean velocity upstream of the bridge. In contrast, the M5Tree model had the highest performance in estimating downstream mean velocity. The MLR model did not model the mean velocities well due to the complexity of the phenomenon studied. In the test phase, the MARS reduced the RMSE relative to the GPR, M5Tree, and MLR models by 23.8%, 45.1%, and 47.4% upstream, respectively, while the M5Tree reduced it relative to the MARS, GPR, and MLR models by 31.8%, 70.4%, and 75.5% downstream, respectively. This study recommends the MARS and M5Tree models for estimating the mean velocities upstream and downstream of the bridge.
In the presented study, the effect of bridge piers on the flow in rectangular channels was investigated.In future studies, trapezoidal channels can also be investigated, and the implemented methods can be assessed using more experimental data to improve efficiency.

Fig. 2 Upstream and downstream average velocities for different discharge and opening

Fig. 4 Scatterplots of observed and predicted upstream mean velocities in the test stage by data-driven methods

Fig. 6 Scatterplots of observed and predicted downstream mean velocities in the test stage by data-driven methods

Fig. 7 Taylor diagram of the metaheuristic regression approaches and MLR in the test stage, upstream and downstream: A: Observed, B: MARS, C: M5Tree, D: GPR, and E: MLR

Fig. 9 Prediction errors produced by the metaheuristic regression approaches and MLR in the test stage a upstream and b downstream

Among the GPR models listed in Table 5, the model with 5 inputs ($h$, $V_{up}$, $y$, $b/B$, $Fr$) has the lowest RMSE and MARE and the highest NSE in the training and testing phases. The results of the MARS, M5Tree, and GPR methods show that including the Froude number in the model input significantly improves the accuracy in estimating the mean velocity. The improvement in accuracy of MARS, M5Tree, and GPR from input combination iii to iv is 41.4%, 42.8%, and 82.7%, respectively, in terms of RMSE in the test phase. The performance statistics of the simple MLR method are summarized in Table 6. Unlike the previous methods, MLR provided the best accuracy (RMSE = 0.0232 m/s, MARE = 5.263, NSE = 0.936) for the full set of input variables ($h$, $V_{up}$, $y$, $b/B$, $Fr$, $Re$, $B/h_n$). A comparison of the four methods shows that the MLR is the worst method for estimating the mean velocity upstream of the bridge model. In contrast, the MARS method provided the best accuracy, followed by the GPR and M5Tree. The MARS model with inputs $h$, $V_{up}$, $y$, $b/B$, and $Fr$ improves the estimation accuracy of the best GPR, M5Tree, and MLR by 23.8%, 45.1%, and 47.4%, respectively, in terms of RMSE in the test phase.
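The text does not state how the percentage improvements are computed. A minimal sketch, assuming they are relative RMSE reductions with respect to the weaker model, is given below; the function names are hypothetical. Under this assumption, the test-phase RMSE values reported for the M5Tree (0.0222 m/s) and MLR (0.0232 m/s), combined with the reported improvements of 45.1% and 47.4%, should imply nearly the same RMSE for the best MARS model.

```python
def rmse_improvement(rmse_reference, rmse_best):
    """Relative RMSE reduction (%) of the best model with respect to a reference model."""
    return 100.0 * (rmse_reference - rmse_best) / rmse_reference


def implied_best_rmse(rmse_reference, improvement_percent):
    """RMSE of the best model implied by a reference RMSE and an improvement (%)."""
    return rmse_reference * (1.0 - improvement_percent / 100.0)


# Upstream, test phase: values reported in the text for M5Tree and MLR
mars_from_m5tree = implied_best_rmse(0.0222, 45.1)
mars_from_mlr = implied_best_rmse(0.0232, 47.4)
print(mars_from_m5tree, mars_from_mlr)
```

The two implied values agree to within rounding, which is consistent with the assumed definition.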

Table 3
Analysis of variance for estimating the mean velocity of the bridge model

[...] the findings and highlighted that the MARS and M5Tree methods consistently yielded the most accurate estimates. At the same time, GPR and MLR fell short of capturing the actual velocity patterns. Nevertheless, it was evident that none of the four methods could accurately capture certain extreme velocity values downstream and upstream.

Table 4
Error statistics of MARS in estimating the mean velocity of bridge upstream using different input combinations

Table 5
Error statistics of M5Tree in estimating the mean velocity of bridge upstream using different input combinations

Table 6
Error statistics of GPR in estimating the mean velocity of bridge upstream using different input combinations

Table 7
Error statistics of MLR in estimating the mean velocity of bridge upstream using different input combinations

Table 8
Error statistics of MARS in estimating the mean velocity of the bridge downstream using different input combinations

Table 9
Error statistics of M5Tree in estimating the mean velocity of the bridge downstream using different input combinations

Table 10
Error statistics of GPR in estimating the mean velocity of the bridge downstream using different input combinations