Predicting the behavior of reinforced concrete columns confined by fiber reinforced polymers using data mining techniques

Wrapping reinforced concrete (RC) structures with Fiber Reinforced Polymer (FRP) has become a popular technology. Most studies of FRP-wrapped RC columns in the literature ignore the internal steel reinforcement. This paper aims to develop a model for the axial compressive strength and axial strain of FRP-confined concrete columns with internal steel reinforcement. The impact of the FRP, transverse, and longitudinal reinforcement is studied. Two non-destructive analysis methods are explored: Artificial Neural Networks (ANNs) and Regression Analysis (RA). The database used in the analysis contains the experimental results, available in the literature, of sixty-four concrete columns under concentric compressive load. The results show that both models can predict the column's compressive stress and strain reasonably well, with low error and high accuracy. FRP has the greatest effect on the confined compressive stress and strain compared to the other materials, while the longitudinal steel actively contributes to the compressive strength and the transverse steel actively contributes to the compressive strain.

O': The predicted output using the model
Ō: The average of the actual outputs
Ō': The average of the predicted outputs
R²: The squared multiple correlation coefficient
RMSE: The Root Mean Square Error
s: The center to center spacing between transverse steel
t_f: The total thickness of FRP wrap
X: The input attributes in regression analysis
α: The confidence level

Introduction
The use of Fiber Reinforced Polymer (FRP) to wrap reinforced concrete (RC) structures has become a popular technology [1]. For RC columns, FRP is used to wrap the concrete externally. Compared to unconfined columns, this practice increases the columns' ductility, moment capacity, ultimate compressive load capacity, ultimate deformability, and energy absorption [2,3].
Most studies about RC columns wrapped with FRP in literature ignored the internal steel reinforcement [4] or estimated the total confinement pressure as the sum of the confinement pressure due to the external FRP and the confinement pressure due to the internal steel [5], with few models to account for both [6,7].
One of the challenges facing a designer of FRP-wrapped RC columns is estimating the compressive strength and strain at failure. This is usually achieved either by destructive methods through lab testing or by non-destructive methods such as analytical models.
Recently, researchers used the power of artificial neural networks (ANNs) in modeling several civil engineering systems, and several studies used ANNs to estimate concrete compressive stress and strain successfully.
Hola and Schabowicz [8] used ANNs to assess the compressive strength of concrete. They concluded that this is a viable method for estimating concrete compressive strength.
Oreta and Kawashima [9] used the ANN method to predict the compressive strength and strain of circular columns confined by internal reinforcement. They found that the ANN model performs well compared to other analytical models. In 2014, Pham et al. [10] used ANNs to predict the stress and strain of FRP-confined rectangular columns. The study yielded marginal errors, approximately half the errors of other existing models. Several regression analysis methods have also been used in the literature to predict the compressive strength of concrete [11-13].
Concrete testing procedures are usually time intensive. A standard compression test is done approximately 28 days after the concrete has been placed. It is, therefore, necessary to be able to model the strength analytically without repeating the tests.
The purpose of this paper is to estimate the axial compressive strength and axial strain for FRP confined concrete circular columns with internal steel reinforcement. Two non-destructive analysis methods are explored: Artificial Neural Networks (ANNs) and Regression Analysis (RA). The database used in the analysis contains the experimental results of sixty-four concrete columns wrapped with FRP under the compressive concentric load available in the literature. The analysis was conducted using parameters that depend on the materials' properties and columns' setup. The impact of each parameter in the confined column compressive strength and strain is studied. Throughout the paper, it should be noted that the terms compressive strength and strain should be understood as axial compressive strength and axial strain, respectively.
To achieve this objective, a comprehensive literature review is carried out to collect experimental data on circular concrete columns reinforced with steel and FRP. The experimental data of the columns are summarized, and parameters of interest that are not reported are calculated. The collected data are used to develop analytical models for the columns' axial compressive strength and axial strain. In this study, two techniques are used: Artificial Neural Networks (ANNs) and Regression Analysis (RA). The results of each model are provided, and a comparison between the two techniques is carried out.
The paper is organized as follows. Section 2 highlights the research significance, while Sect. 3 provides the details of the research methodology. Section 4 explains the parameters related to the concrete columns' setup and the materials used in the models. Section 5 describes the extensive search for experimental data on steel-FRP confined circular concrete columns, which yielded a total of 64 test results.
The estimation techniques used in the paper are detailed in Sect. 6. The section includes an overview of the Artificial Neural Networks (ANNs) and the Regression Analysis (RA). The details of the statistical metrics used to evaluate the techniques' performance are also included. Section 7 consists of the results of both methods used in the paper. These results are evaluated using the statistical metrics discussed in Sect. 6 and compared to each other. Finally, Sect. 8 concludes this research study, along with the study limitations.

Research significance
Most studies of FRP-wrapped RC columns ignore the internal steel reinforcement, and few models account for both. In the literature, the compressive strength and strain at failure of FRP-wrapped RC columns are usually estimated by destructive methods through lab testing, which requires a high level of effort and funds. This study explores advanced computational methods to analyze the behavior of FRP-RC columns based on material properties. The study considers the properties of all materials in the analysis (concrete, FRP, and steel) and the contribution of each material to the compressive strength and strain.

Methodology
The study carried out a comprehensive literature review of research on steel, FRP, and steel-FRP confined circular concrete columns. Based on the review, the parameters that may influence the behavior of the confined columns are specified. The focus is on studies that performed experimental research on circular RC columns externally wrapped with FRP, in order to collect the experimental data needed for the parameters.
Afterward, two modeling techniques are used to analyze the collected data, the Artificial Neural Networks (ANNs) and Regression Analysis (RA) methods. Each model's details are established based on the best practices in literature and the experimental data setup. Finally, the results of each technique are discussed and compared. A summary of the research methodology is shown in Fig. 1.

Parameters
Two categories of parameters are collected for this study: input (predictor) parameters and output (response) parameters, whose values are influenced by the inputs.
The objective of the study is to estimate analytically the axial compressive stress of confined concrete (f'_cc) and the confined concrete ultimate axial strain (ε_ccu) corresponding to f'_cc. The 28-day compressive strength of unconfined concrete (f'_c) and the axial strain at the unconfined concrete's peak stress (ε'_c) play a key role in the properties of the concrete when confined.
Several established models for steel- or FRP-confined concrete are expressed in terms of a strength ratio and a strain ratio [14,15]. The strength ratio (f'_cc/f'_c) is the ratio of the confined concrete strength to the unconfined concrete strength, while the strain ratio (ε_ccu/ε'_c) is the ratio of the confined concrete ultimate axial strain to the axial strain at the peak stress of unconfined concrete. These ratios are used throughout the study so that the unconfined concrete properties are included.
The input parameters are the material and geometrical characteristics that influence the behavior of steel-FRP confined columns.
Both steel and FRP provide lateral confining pressure on the concrete. As mentioned before, most researchers combine the effects of steel and FRP lateral confinement, but concrete confined with steel behaves differently from concrete confined with FRP [4]. The compressive stress-strain curve for concrete confined with steel shows an initial linear-elastic stage followed by a yielding plateau where the tensile strain increases with little rise in stress (Fig. 2a). On the other hand, the compressive stress-strain curve of concrete confined with FRP composites stays linear elastic until the final brittle failure occurs (Fig. 2b).
The behavior of steel-confined concrete depends on the maximum lateral confining pressure due to transverse steel (f_ls). This behavior is affected by the distribution of longitudinal reinforcement, the lateral reinforcement configuration, the transverse steel spacing and size, and the transverse steel characteristics [16]. The steel maximum lateral confining pressure can be calculated using Eq. 1, proposed by Mander [15]:
$$f_{ls} = \frac{2 A_{st} f_y}{s\, d_s} \qquad (1)$$
where f_y is the yield strength of non-prestressed steel reinforcement; s is the center to center spacing between transverse steel; A_st is the transverse steel area; and d_s is the concrete core diameter to the centerline of transverse steel (Fig. 3).
The steel maximum lateral confining pressure f_ls is an important parameter that should be included in the model. Because the compressive strength of the concrete specimens varies, the confinement ratio of the steel-confined specimen (f_ls/f'_c) is used, where f'_c is the compressive strength of unconfined concrete at 28 days.
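As a worked illustration of Eq. 1, the following minimal Python sketch computes f_ls and the confinement ratio f_ls/f'_c. The function names and the numerical values (tie area, yield strength, spacing, core diameter, and concrete strength) are hypothetical and are not taken from the database in Table 1.

```python
def steel_confining_pressure(a_st, f_y, s, d_s):
    """Eq. 1: f_ls = 2 * A_st * f_y / (s * d_s).

    With a_st in mm^2, f_y in MPa, and s and d_s in mm, the result is in MPa.
    """
    if s <= 0 or d_s <= 0:
        raise ValueError("spacing and core diameter must be positive")
    return 2.0 * a_st * f_y / (s * d_s)


def steel_confinement_ratio(a_st, f_y, s, d_s, f_c):
    """Confinement ratio f_ls / f'_c used as a model input."""
    return steel_confining_pressure(a_st, f_y, s, d_s) / f_c


# Hypothetical column: 8 mm ties (about 50.3 mm^2), f_y = 400 MPa,
# s = 100 mm, d_s = 200 mm, f'_c = 30 MPa.
f_ls = steel_confining_pressure(50.3, 400.0, 100.0, 200.0)
ratio_steel = steel_confinement_ratio(50.3, 400.0, 100.0, 200.0, 30.0)
print(f"f_ls = {f_ls:.3f} MPa, f_ls/f'_c = {ratio_steel:.4f}")
```

Halving the tie spacing s doubles f_ls, which is why the spacing appears among the factors that govern steel confinement.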

Fig. 1 Summary of the research methodology
The behavior of FRP-confined concrete depends on the maximum lateral confining pressure due to FRP (f_lf), which depends directly on the FRP material properties, namely the FRP modulus of elasticity (E_f) and the design rupture strain of the FRP wrap (ε_fu), and on geometrical properties, namely the total thickness of the FRP wrap (t_f) and the circular column diameter (D). If f_lf is not reported directly, it is calculated using Eq. 2.
The FRP maximum lateral confining pressure (f_lf) is another critical parameter that should be included in the model. Because the compressive strength of the concrete specimens varies, the confinement ratio of the FRP-confined specimen (f_lf/f'_c) is used.
Lastly, the RC columns include longitudinal steel reinforcement. While the longitudinal reinforcement affects the steel confinement [15], its contribution to the behavior of confined concrete columns has been studied only briefly [6]. The longitudinal reinforcement is included in the model through the term ρ_cc, the ratio of the longitudinal reinforcement area to the area of the concrete core of the section.
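The two remaining inputs can be sketched in the same way. The body of Eq. 2 is not shown in the text, so the sketch below assumes the widely used thin-walled jacket expression f_lf = 2 E_f ε_fu t_f / D, which uses exactly the four quantities listed above. It also assumes that the core area is that of a circle of diameter d_s. Both are assumptions, and the function names and numbers are hypothetical.

```python
import math


def frp_confining_pressure(e_f, eps_fu, t_f, d):
    """Assumed form of Eq. 2: f_lf = 2 * E_f * eps_fu * t_f / D (MPa)."""
    if d <= 0:
        raise ValueError("column diameter must be positive")
    return 2.0 * e_f * eps_fu * t_f / d


def longitudinal_ratio(a_sl, d_s):
    """rho_cc: longitudinal steel area over core area (core taken as pi * d_s^2 / 4)."""
    core_area = math.pi * d_s ** 2 / 4.0
    return a_sl / core_area


# Hypothetical wrap: E_f = 230000 MPa, eps_fu = 0.015, t_f = 0.5 mm, D = 250 mm.
f_lf = frp_confining_pressure(230000.0, 0.015, 0.5, 250.0)
# Hypothetical cage: six 12 mm bars (about 678.6 mm^2 in total), d_s = 200 mm.
rho_cc = longitudinal_ratio(678.6, 200.0)
print(f"f_lf = {f_lf:.2f} MPa, rho_cc = {rho_cc:.4f}")
```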

Experimental data
An extensive search for experimental data on steel-FRP confined circular concrete columns was conducted. Only a small number of tests on concrete confined by both steel and FRP have been reported in the literature; most experiments were conducted on concrete specimens without steel reinforcement [14]. Tests that were not fully and clearly documented were excluded. The values of the parameters are taken from the original study where possible. If not available, they are calculated from the material properties and experimental setup reported in the study.
With these exclusions, a total of 64 test results are available for the assessment of strength models and are listed in Table 1. These data are all for FRP fully confined reinforced concrete circular columns reported in nine studies [7, 17-24].
The statistical data of the columns studied are summarized in Table 2.

Overview of the artificial neural networks
An artificial neural network (ANN) is a computational model that mimics the learning and decision-making process of the human brain [25]. An ANN is a data modeling tool that depends on various parameters and learning methods. The advantage of this method is that the network does not need to be explicitly programmed: starting from a random choice of initial weights, it learns by adjusting the weights to minimize the prediction error. The network is typically organized in layers made up of several interconnected neurons (nodes), which process the information in parallel to solve the problem of interest. A matrix of numerical weights is initialized for each neuron and is adjusted during learning using analytical update formulas.
Patterns are presented to the network through the input layer, which communicates with the hidden layers, and the hidden layers in turn link to the output layer [26]. This stage is called training. The network needs a sufficient amount of valid data to be trained to predict the output in the future. Another set of data is usually used to test how well the trained network predicts the output. This process is called testing.

Artificial neural networks: current study parameters
In this study, the machine learning software WEKA [27] was used to run the ANN algorithm. Three input variables are used, f_ls/f'_c, ρ_cc, and f_lf/f'_c, to test the contribution of the transverse steel reinforcement, the longitudinal steel reinforcement, and the FRP wrap, respectively. The output variables are f'_cc/f'_c and ε_ccu/ε'_c. Figure 4 shows the structure of the ANN model built in WEKA [27].
For this study, one hidden layer is used, since a single hidden layer can generate a very precise approximation [28]. Two neurons were used in the hidden layer, and one neuron was used at the output layer for each output, as shown in Fig. 4. The number of hidden neurons was generated by the software based on the complexity and number of data sets used for training.
The holdout method is a widely used method for model evaluation in which the data are partitioned into two sets, a training set and a testing set [25,29]. The training set must be large enough for the model to learn from the data and to be robust, with high generalization ability [25]. A split of approximately 80% for training and approximately 20% for testing is empirically the best division of the data sets [30]. For this study, the database of 64 records of concrete columns was divided into a training dataset of 52 records and a testing dataset of 12 records.
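The holdout partition can be sketched as a shuffled index split. This is a minimal NumPy illustration, not the WEKA procedure used in the study; the function name and the random seed are assumptions, and the actual assignment of columns to the two sets is not reported here.

```python
import numpy as np


def holdout_split(n_records, n_test, seed=0):
    """Return (train_idx, test_idx) after shuffling the record indices."""
    if not 0 < n_test < n_records:
        raise ValueError("n_test must be between 1 and n_records - 1")
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_records)
    return order[n_test:], order[:n_test]


# 64 records split into 52 for training and 12 for testing (about 81% / 19%).
train_idx, test_idx = holdout_split(64, 12)
print(len(train_idx), len(test_idx))
```

Keeping the two index sets disjoint is the essential property: a record used to fit the model must not also be used to score it.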
Once the network has been designed, it is trained. For each neuron j, each input i is multiplied by the corresponding weight W_ij, and the weighted inputs are summed. A sigmoid activation function, the same function used in logistic regression, is then applied at each neuron in the hidden layer, and a linear activation function is applied at the output neuron.
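The forward pass described above can be written in a few lines. The sketch below assumes a network with three inputs, two sigmoid neurons in the intermediate layer, and one linear output neuron; the weights, biases, and input values are arbitrary placeholders, not the trained WEKA model.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Weighted sum plus sigmoid at each hidden neuron, then a linear output."""
    hidden = sigmoid(w_hidden @ x + b_hidden)  # one value per hidden neuron
    return float(w_out @ hidden + b_out)       # linear activation at the output


# Placeholder inputs: [f_ls/f'_c, rho_cc, f_lf/f'_c]
x = np.array([0.07, 0.02, 0.46])
w_hidden = np.array([[0.5, -1.0, 2.0],
                     [1.5, 0.3, -0.7]])  # row j holds the weights W_ij of neuron j
b_hidden = np.array([0.1, -0.2])
w_out = np.array([1.2, 0.8])
b_out = 0.5

y_hat = forward(x, w_hidden, b_hidden, w_out, b_out)
print(f"predicted ratio = {y_hat:.3f}")
```

Training consists of adjusting w_hidden, b_hidden, w_out, and b_out so that the outputs match the training records; the forward pass itself stays unchanged.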
Four statistical measures were used to evaluate the performance of the algorithm: the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE), the Squared Multiple Correlation Coefficient (R²), and the Two-Sample t-Test.
The Mean Absolute Error (MAE) is the average of the absolute errors for all points. For n points, the MAE is calculated using Eq. 3:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |e_i| \qquad (3)$$
The Root Mean Square Error (RMSE) is the square root of the average of the squared errors for all points. For n points, the RMSE is calculated using Eq. 4:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2} \qquad (4)$$
where, in both equations, e_i is the difference between the actual (observed) output (O) and the predicted output using the model (O').
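Both error metrics follow directly from their verbal definitions. The sketch below is a minimal NumPy version; the function names and the three sample values are illustrative only.

```python
import numpy as np


def mae(observed, predicted):
    """Mean of the absolute errors e_i = O_i - O'_i."""
    e = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(e)))


def rmse(observed, predicted):
    """Square root of the mean of the squared errors."""
    e = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))


observed = [2.0, 3.0, 4.0]
predicted = [2.5, 2.5, 5.0]  # errors: -0.5, 0.5, -1.0
print(mae(observed, predicted), rmse(observed, predicted))
```

Because the errors are squared before averaging, RMSE is never smaller than MAE, and the gap widens when a few points carry large errors.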
The Squared Multiple Correlation Coefficient (R²) measures the model's precision [31]. It estimates how well the observed results are replicated by the model, based on the proportion of the total output variance explained by the model [32]. It can be calculated using Eq. 5, where O is the actual (observed) output, O' is the predicted output using the model, Ō is the average of the actual outputs, and Ō' is the average of the predicted outputs.
The Two-Sample t-Test determines whether the means of the experimental and predicted data differ. The p-value obtained from the test is compared to the significance level α, which is taken as 0.05. If p-value ≤ α, the difference between the means is statistically significant. If p-value > α, the difference between the means is not statistically significant, and the proposed model is in good agreement with the experimental data.
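One common reading of this coefficient, assumed here because the body of Eq. 5 is not shown, is the squared Pearson correlation between the observed and predicted outputs, which uses the two averages of O and O'. The function name and sample values below are illustrative.

```python
import numpy as np


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted outputs."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    do = o - o.mean()  # deviations from the average of the actual outputs
    dp = p - p.mean()  # deviations from the average of the predicted outputs
    return float((do @ dp) ** 2 / ((do @ do) * (dp @ dp)))


r2_demo = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(f"R^2 = {r2_demo:.4f}")
```

Under this definition a prediction that is perfectly correlated with the data but biased or rescaled still scores 1, so the coefficient is best read together with MAE and RMSE.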

Regression analysis (RA)
Regression is a statistical technique that can be used to model functions with continuous values, and it helps predict unavailable numerical data values [25,29]. Regression is used to model the relationship between the dependent variable (output) and the independent variables (inputs). The goal is to find the model that can relate the output value to the input values [25]. Linear regression is one of the most widely used models in statistical analysis. In this case, the regression equation is a generalized linear model that can be fitted to the dataset [25] as follows:
$$O = I_0 + I_1 X_1 + I_2 X_2 + I_3 X_3 + \dots + I_n X_n$$
Polynomial regression is another form of regression, used to fit a nonlinear relationship between the inputs and the output with a polynomial equation of the nth degree.
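The coefficients I_0 to I_n of the linear form can be estimated by ordinary least squares. The sketch below uses NumPy as a stand-in for the fitting step and checks itself on synthetic, noise-free data; the coefficient values and helper names are hypothetical and unrelated to the fitted models reported later.

```python
import numpy as np


def fit_linear(x, y):
    """Least-squares estimate of [I_0, I_1, ..., I_n] for O = I_0 + sum(I_k * X_k)."""
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones(len(x)), x])  # leading column of ones for I_0
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_linear(coef, x):
    """Evaluate the fitted linear equation on new inputs."""
    return coef[0] + np.asarray(x, dtype=float) @ coef[1:]


# Synthetic check: generate outputs from known coefficients and recover them.
rng = np.random.default_rng(0)
x_demo = rng.uniform(0.0, 1.0, size=(20, 3))
true_coef = np.array([1.0, 2.0, 0.5, 3.0])
y_demo = true_coef[0] + x_demo @ true_coef[1:]
coef = fit_linear(x_demo, y_demo)
print(np.round(coef, 6))
```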
In this study, three models were implemented using the Python programming language and the scikit-learn machine learning library [33]. The first one is a linear model, and the other two models are polynomial models of 2nd and 3rd degrees.
Similar to the ANN model, the statistical metrics used to evaluate the model are the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE), the Squared Multiple Correlation Coefficient (R 2 ), and the Two-Sample t-Test. Also, the number of terms in the generated equation will be compared, and the p-value for each input parameter is calculated.
The probability value (p-value) indicates how likely it is that an apparent relationship between variables arose by chance, and it is commonly used as an indicator of the significance of each variable in the model. The p-value is compared with the significance level α. A significance level of 0.05 has been accepted as a cutoff value in many disciplines and is adopted in this study. If the variable's p-value is less than 0.05, the relationship is strong enough to be noteworthy [31]. If the variable's p-value is more than 0.05, the variable is eliminated from the equation, and MAE, RMSE, and R² are recalculated to evaluate the proposed hypothesis.

Artificial neural networks (ANNs)
In the training phase, the model was trained on the training dataset to predict the values of the outputs, f'_cc/f'_c and ε_ccu/ε'_c. The resultant trained model was used in the testing phase to predict the outputs for the testing dataset containing 12 concrete columns. The results generated by the algorithm for the concrete columns in the testing group are shown in Fig. 5.
The horizontal axis represents the model results, and the vertical axis represents the experimental results for f'_cc/f'_c (Fig. 5a) and ε_ccu/ε'_c (Fig. 5b). The closer a point is to the diagonal line, the better the prediction. The points are clustered around the diagonal line, even for points with high f'_cc/f'_c and ε_ccu/ε'_c. MAE, RMSE, and R² are calculated for both outputs, and the results are listed in Table 3. The MAE values are 0.466 and 2.937 for f'_cc/f'_c and ε_ccu/ε'_c, respectively, and the RMSE values are 0.56 and 3.319, respectively. These two metrics show that the ANN model generated the f'_cc/f'_c outputs with lower error values than the ε_ccu/ε'_c outputs. That could be due to differences in the methods and locations used to measure strain in the lab, whereas stress measurements tend to be more consistent across equipment and lab setups. The R² values are 0.803 and 0.80 for f'_cc/f'_c and ε_ccu/ε'_c, respectively, which means that the model predicted both outputs with almost the same accuracy based on this metric.
The mean and standard deviation of the experimental values of f'_cc/f'_c for the testing database are 2.54 and 1.313, respectively, while those of the model values are 2.59 and 1.173. The ANN model mean is slightly higher than the experimental mean, and the standard deviation of the model predictions is lower than that of the experimental data. For the second output, ε_ccu/ε'_c, the mean and standard deviation of the experimental data are 10.74 and 7.12, respectively, while those of the model values are 11.89 and 5.69. As with the first output, the mean of the predicted data is slightly higher than the experimental mean, but the standard deviation of the model predictions is lower. To check that these differences are not significant, the model's results for the testing data were compared to the experimental data using the two-sample t-test. The p-values were found to be 0.92 and 0.668 for stress and strain, respectively, both higher than 0.05. Therefore, the difference between the means of the two sets of data is not statistically significant, and the proposed model is in good agreement with the experimental data.

Regression analysis (RA)
Three models are derived using this method: linear, quadratic, and cubic models. These models have four, ten, and twenty terms, respectively.
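The term counts follow from enumerating every monomial of the three inputs up to the chosen degree, including the constant term. The short sketch below performs that enumeration; the function name is illustrative.

```python
from itertools import combinations_with_replacement
from math import comb


def polynomial_terms(n_inputs, degree):
    """List the monomials of total degree 0..degree as tuples of input indices.

    The empty tuple is the constant term; (0, 2) stands for X_1 * X_3.
    """
    terms = []
    for d in range(degree + 1):
        terms.extend(combinations_with_replacement(range(n_inputs), d))
    return terms


counts = [len(polynomial_terms(3, d)) for d in (1, 2, 3)]
print(counts)  # number of terms in the linear, quadratic, and cubic models
```

The same counts are given in closed form by the binomial coefficient C(3 + d, d) for degree d.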
The values of MAE, RMSE, and R² for each model are listed in Table 4. The models' performance improved using quadratic and cubic regression to model f'_cc/f'_c, and the improvement is even more significant for ε_ccu/ε'_c. However, the complexity of the ten-term and twenty-term equations makes these two models impractical for engineering design. Therefore, only the linear model is discussed herein.
Based on the linear regression model, f'_cc/f'_c and ε_ccu/ε'_c [...] (Table 4), ensuring that the eliminated term did not affect the model's accuracy. The same applies to the second output, ε_ccu/ε'_c: MAE, RMSE, and R² are recalculated, and the difference between the values for the original equation and those for the adjusted equation is also negligible (Table 4).
Figure 6 compares the regression model predictions with the actual data for each data record. The horizontal axis represents the model results, and the vertical axis represents the experimental results for f'_cc/f'_c (Fig. 6a) and ε_ccu/ε'_c (Fig. 6b). The closer a point is to the diagonal line, the better the prediction. Figure 6 shows that the model results for the first output, f'_cc/f'_c, are concentrated around the diagonal line, while the results for the second output are more scattered. That agrees with the R² values obtained for each model: 0.895 and 0.786 for f'_cc/f'_c and ε_ccu/ε'_c, respectively. Along with the metrics in Table 4, these values show that the model proposed for the confined compressive strength is more accurate in predicting the output than the model proposed for the confined compressive strain.
The MAE and RMSE results in Table 4 support this observation and show that the linear regression models generated the f'_cc/f'_c outputs with lower error values than the ε_ccu/ε'_c outputs.
For the confined concrete strength, the mean and standard deviation of the model data are 2.62 and 1.44, respectively. The model mean matches the experimental mean exactly, but the standard deviation is lower. For the second output, ε_ccu/ε'_c, these values are 10.25 and 6.817, respectively. As with the first output, the mean is exactly the same as that of the experimental data, but the model's standard deviation is lower. Because the model means equal the experimental means for both outputs, the t-test was expected to return a p-value of 1.
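The matching means are a general property of least-squares fits that include an intercept: the residuals sum to zero over the fitted data, so the predictions and the observations share the same mean, and the spread of the predictions cannot exceed that of the observations. The sketch below illustrates this on synthetic data; the data, seed, and helper name are assumptions, and the pooled-variance statistic is one common form of the two-sample t-test.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=(64, 3))
y = 1.0 + x @ np.array([0.3, 2.0, 1.5]) + rng.normal(0.0, 0.2, size=64)

design = np.column_stack([np.ones(len(x)), x])  # intercept column plus inputs
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
pred = design @ coef


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled * (1.0 / na + 1.0 / nb))


t_stat = two_sample_t(y, pred)
print(f"mean difference = {y.mean() - pred.mean():.2e}, t = {t_stat:.2e}")
print(f"std observed = {y.std(ddof=1):.3f}, std predicted = {pred.std(ddof=1):.3f}")
```

A t statistic of zero corresponds to a two-sided p-value of 1 for any number of degrees of freedom, so the test cannot distinguish the two means.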

Comparison between models
Although the ANN model required more computer science knowledge to implement, both modeling techniques required comparable effort once the experimental database was ready. Figure 7 compares the performance of the ANN model and the regression model using the statistical metrics adopted in the study. The linear regression model is superior for the first output (f'_cc/f'_c): R²_RA is higher than R²_ANN, while MAE_RA and RMSE_RA are lower than MAE_ANN and RMSE_ANN, respectively. These are indications that the regression model gave better results, with higher accuracy and lower overall error, than the ANN model. For the second output (ε_ccu/ε'_c), R²_RA is very close to R²_ANN, while MAE_RA and RMSE_RA are higher than MAE_ANN and RMSE_ANN, respectively. These are indications that the ANN model gave better results, with higher accuracy and lower overall error, than regression analysis.
Both models reproduced the mean and standard deviation of the experimental database closely, but the RA model shares exactly the same mean as the original data for both outputs.
The two models' predictions and the experimental database for the testing group are shown in Fig. 8 [...]. Using the linear regression model [...] f'_c in multiple columns. This observation was not accurate for ε_ccu/ε'_c as well. It should be noted that the linear regression method produced two equations that can be used to predict the compressive strength and strain of RC circular columns wrapped with FRP. Any interested designer can use these equations directly, without any background in the machine learning algorithms used to obtain the regression model. On the other hand, the ANN model generated using machine learning algorithms must be run again to predict the compressive strength and strain of future columns.

Summary and conclusions
The estimation of RC columns wrapped with FRP compressive strength and strain at failure is usually achieved by destructive methods through lab testing, requiring a high level of effort and funds. The purpose of this paper is to develop a model for the axial compressive strength and axial strain for FRP confined circular concrete columns with internal steel reinforcement using parameters that depend on the materials' properties and geometry.
Two non-destructive analysis methods are explored: Artificial Neural Networks (ANNs) and Regression Analysis (RA). The database used in the analysis contains the experimental compressive strength results of 64 concrete columns available in the literature. Several statistical metrics are used to evaluate the performance of both methods: The Mean Absolute Error (MAE), the Root Mean Square Error (RMSE), the Squared Multiple Correlation Coefficient (R 2 ), and the Two-Sample t-Test.
Experimental results show that the generated ANN model can predict the column compressive stress and strain reasonably with low error and high accuracy. The model predicted both f ' cc /f c ' and ε ccu /ε c ' with the same accuracy level.
Three models are derived using the regression analysis method: linear, quadratic, and cubic models. Although the models' performance improved using the quadratic and cubic models, their complexity made them impractical for engineering applications.
Using the statistical metrics listed above, it is found that the lateral confining pressure due to the FRP has the highest contribution to the compressive strength and strain. The transverse steel is found to have a minimal contribution to the confined column compressive strength, whereas the longitudinal steel reinforcement is found to have a minimal contribution to the confined column compressive strain.
The proposed linear regression model can also predict the column compressive stress and strain reasonably well, with low error and high accuracy. The model is more accurate in predicting f'_cc/f'_c than ε_ccu/ε'_c. This may be due to variation in the location and method of reading and recording strains in the experimental studies.
For the first output (f'_cc/f'_c), the statistical metrics show that the regression model gave better results, with higher accuracy and lower overall error, than the ANN model. For the second output (ε_ccu/ε'_c), the metrics indicate that the ANN model gave better results, with higher accuracy and lower overall error, than regression analysis. This paper proposed a novel approach to accurately predict the axial compressive strength and axial strain of FRP-confined circular concrete columns with internal steel reinforcement. The proposed methodology is important for designers, who can obtain these values without destructive lab testing. The regression model is beneficial if the user is looking for a set of equations to be used directly or programmed in a data sheet. The ANN model is more dynamic: as a learning algorithm, it learns from the input data to predict the required values. Both techniques can be applied to many other engineering problems.
The models' findings are limited to the experimental data collected. As these data are limited, so are the models. A larger experimental database would increase the accuracy of both techniques. Also, because the parameters incorporated in the models are collected from several studies in the literature, they may vary with the sampling and testing equipment used in the experiments. In the future, more modeling techniques can be used and compared to the two techniques discussed in this paper.
Data availability All data generated or analyzed during this study are included in this published article.

Compliance with ethical standards
Conflicts of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.