1 Introduction

Reinforced concrete (RC) is a composite material composed of concrete and rebar, and the bond between the two materials is a major factor in determining the stiffness, deformation, and load-carrying capacity of RC members (Alharbi et al., 2021; Darwin et al., 1992; Eligehausen et al., 1982; Tepfers, 1979). Therefore, various empirical models to estimate bond strength (\(\tau_{\max }\)) have been proposed (Esfahani & Kianoush, 2005; Harajli et al., 1995; Orangun et al., 1977; Wu & Zhao, 2013; Xu, 1990). On the other hand, estimation models for bond–slip (\(\tau - s\)) were very rare (Wu & Zhao, 2013), and bond–slip (\(\tau - s\)) models presented by most previous research and design codes require experimental data, which makes it difficult to apply them to the numerical analysis of RC members without testing. (Comité Euro-International du Béton, 1993; Eligehausen et al., 1982; Martin, 1973; Mirza & Houde, 1979; Nilson, 1968; Rehm, 1961).

However, the test data available for the derivation of \(\tau - s\) are quite limited, and the reason is explained in Fig. 1. The pull-out test is a simple bond testing method by directly pulling out a reinforcing bar embedded in concrete. In this case, tensile stress occurs in the rebar, and compressive stress arises in the surrounding concrete (Alharbi et al., 2021). The other bond test method is the beam test in which a reinforcing bar in an RC beam is pulled out by an external moment. In this case, both the rebar and the surrounding concrete are subjected to tensile stress (Alharbi et al., 2021). Owing to this difference in the stress state, the \(\tau - s\) developed in the RC beam may be quite different from the \(\tau - s\) obtained from the pull-out specimen (Alharbi et al., 2021). Nevertheless, because of the simplicity of the pull-out test, most of the \(\tau - s\) data are pull-out test results, and the beam test data are very limited.

Fig. 1
figure 1

Stress distribution in pull-out and beam specimens (Tastani & Pantazopoulou, 2002).

If a mapping function that can convert the \(\tau - s\) of pull-out specimens to those of beam specimens is developed, the \(\tau - s\) curve of RC beams can be estimated using the large amount of \(\tau - s\) data obtained from the pull-out test. Therefore, a mapping function between the \(\tau - s\) of pull-out and beam specimens was developed in this study. The overall research process complies with the data science pipeline consisting of data acquisition, analysis, modeling, and verification steps. In the next section, the background and existing models are described. The data and machine-learning methods used in this study are introduced in Sect. 3 and the data analysis results and proposed models are presented in Sect. 4. The verification of the accuracy of the proposed model is described in Sect. 5.

2 Backgrounds

2.1 Bond and stress components

Fig. 2 shows the bond and stress components of the rebar embedded in concrete. As shown in Fig. 2a, the bond components are chemical adhesion, friction, and mechanical interlock (Lutz & Gergely, 1967; Tepfers, 1979). Chemical adhesion is a bond component arising from physical and chemical bonding between concrete and rebar and makes little contribution to bond strength, because it disappears when a slip occurs (Alharbi et al., 2021). Friction is caused by the roughness of the surface and normal stress between the concrete and rebar. Lutz and Gergely reported that normal stress is caused by the shrinkage of concrete (Lutz & Gergely, 1967). Mechanical interlocking is a bond component arising from the shear resistance of the rib and adjacent concrete, which has the greatest effect on the bond strength of the deformed bar (Alharbi et al., 2021; Lutz & Gergely, 1967).

Fig. 2
figure 2

Bond mechanism.

Fig. 2b shows the stress component caused by the bond. The stress that occurs in the longitudinal direction of the rebar is shear stress (\(\tau_{l}\)). The average shear stress (\(\tau\)) caused by the bond is calculated as

$$\tau = \frac{{{\Delta }T}}{{2\pi d_{{\text{b}}} {\Delta }x}},$$

where \(d_{b}\) is the rebar diameter, and \({\Delta }T\) is the increment of the tensile force generated in a unit length (\({\Delta }x\)) of rebar. The stress generated in the horizontal direction of the rebar section is divided into radial stress (\(\sigma_{{\text{r}}}\)) and tangential stress (\(\sigma_{\theta }\)), and the relationship between \(\sigma_{r}\) and \(\tau\) is (Tepfers, 1979)

$$\sigma_{{\text{r}}} = \tau \tan \alpha ,$$

where \(\alpha\) is the angle between the principal compressive bond stress and the axis of the reinforcing bar, which is approximately 45° (Tepfers, 1979). Based on the thick-wall cylinder theory (Timoshenko, 2002), the tangential stress (\(\sigma_{\theta }\)) can be represented using the following equation:

$$\sigma_{\theta } = \frac{{\left( {\frac{{d_{{\text{b}}} }}{2}} \right)\sigma_{{\text{r}}} }}{{\left( {c + \frac{{d_{{\text{b}}} }}{2}} \right)^{2} - \left( {\frac{{d_{{\text{b}}} }}{2}} \right)^{2} }}\left[ {1 + \frac{{\left( {c + \frac{{d_{{\text{b}}} }}{2}} \right)^{2} }}{{r^{2} }}} \right],$$

where \(c\) is the cover depth, and \(r\) is the radial distance from the center of the rebar. As shown in Eq. (3) and Fig. 2b, \(\sigma_{\theta }\) is maximum at the surface of the reinforcing bar, and it decreases as the \(r\) increases.

2.2 Failure Mode, Bond Strength, and Bond–Slip Behavior

Fig. 3 shows the bond–slip curve according to the pull-out failure or splitting failure (Mazumder & Gilbert, 2019). Pull-out failure is a failure mode in which concrete around the rib is crushed. This mainly occurs when the cover depth (\(c\)) is great, the compressive strength of concrete (\(f^{\prime}_{{\text{c}}}\)) is low, or the stress of the concrete near the rib is large, because the embedded length \(\left( {L_{{\text{e}}} } \right)\) is small. Splitting failure is a failure mode in which splitting cracks occur in the longitudinal direction when \(\sigma_{\theta }\) exceeds the tensile strength of concrete (\(f_{{\text{t}}}\)). This mainly occurs when \(c\) is shallow or \(f_{{\text{t}}}\) is small. As shown in Fig. 3, splitting failure shows a rapid decrease in bond strength (\(\tau_{{{\text{max}}}}\)) compared with pull-out failure. There is almost no residual strength in the unconfined specimen, whereas the residual strength is partially maintained in the confined specimen. In addition, the bond strength (\(\tau_{{{\text{max}}}}\)) of the confined specimen is larger than that of the unconfined specimen. Therefore, the bond characteristic of rebar varies depending on various influencing parameters and ultimately has a great impact on the behavior of the flexural member.

Fig. 3
figure 3

Failure modes and bond–slip curve.

2.3 Influencing Parameters

Various tests have been conducted on the influencing factors of bonds (Darwin et al., 1992; Eligehausen et al., 1982; Esfahani & Kianoush, 2005; Harajli et al., 1995; Orangun et al., 1977; Soroushian & Choi, 1989; Walker et al., 1997), including \(f^{\prime}_{{\text{c}}}\), \(c\), \(d_{{\text{b}}}\), the cross-sectional area of one leg of transverse reinforcement (\(A_{{{\text{st1}}}}\)), and the spacing of transverse reinforcement (\(S_{{{\text{st}}}}\)). Here, \(f^{\prime}_{{\text{c}}}\) is the most important influencing factor that increases \(\tau_{{{\text{max}}}}\)(Wu & Zhao, 2013), and the effect on \(\tau_{{{\text{max}}}}\) is mainly proportional to \(\sqrt {f^{\prime}_{{\text{c}}} }\) (Esfahani & Kianoush, 2005; Harajli et al., 1995; Orangun et al., 1977). Furthermore, \(c\) is an influencing factor that increases \(\tau_{{{\text{max}}}}\), but the effect gradually decreases as \(c\) increases (Walker et al., 1997). \(d_{{\text{b}}}\) serves as a factor that decreases \(\tau_{{{\text{max}}}}\) and is often reflected in the form of \(c/d_{{\text{b}}}\) (Ichinose et al., 2004). \(A_{{{\text{st1}}}}\) and \(S_{{{\text{st}}}}\) are influencing factors confining longitudinal reinforcing bars, and \(\tau_{{{\text{max}}}}\) tends to increase as \(A_{{{\text{st1}}}}\) increases or \(S_{{{\text{st}}}}\) decreases.

2.4 Existing Models

The empirical models to estimate \(\tau_{\max }\) are presented in Table 1. Orangun et al. (1977) proposed an empirical model based on non-linear regression analyses of bond test data. In this model, the influences of \(c\) and \(f_{{{\text{yt}}}}\) (the yield strength of rebar) on the bond strength are reflected linearly. Xu (1990) and Harajli et al. (1995) proposed simpler models, compared to Orangun's model, where the influence of \(c\) is reflected non-linearly and \(f_{{{\text{yt}}}}\) is excluded. Esfahani and Kianoush (2005) proposed an empirical model with high complexity, where the median of side cover, bottom cover, and rebar spacing (\(c_{{{\text{med}}}}\)) is implemented. Wu and Zhao (2013) also proposed a complex bond strength model that provides a unified result with their bond–slip model.

Table 1 Empirical bond strength models.

The bond–slip (\(\tau - s\)) models are shown in Table 2. The top six models require \(\tau - s\) test data for completing the curve. On the other hand, Wu and Zhao’s model (2013) gives a complete \(\tau - s\) curve without bond test data once the influencing factors, such as compressive strength of concrete (\(f^{\prime}_{{\text{c}}}\)), cover depth (\(c\)), rebar diameter (\(d_{{\text{b}}}\)), cross-sectional area (\(A_{{{\text{st}}}}\)), and spacing of transverse reinforcement (\(S_{{{\text{st}}}}\)), etc., are provided.

Table 2 Empirical bond–slip models.

3 Materials and Methods

3.1 Materials

As shown in Table 3, 255 and 31 \(\tau - s\) data were collected from the previous pull-out and beam tests, respectively. Seven data points were extracted from each \(\tau - s\) curve. The data included the points at which slip (\(s\)) is the maximum and the minimum, the points at which it is \(\tau = \tau_{{{\text{max}}}}\), and the points which are the closest points to each of 4 points splitting the maximum and the minimum of slip with equal distance. As a result, 1785 and 217 data points were collected from the pull-out and beam test data. The \(f^{\prime}_{{\text{c}}}\) of pull-out specimens ranged from 5 to 150 MPa, the \(c\) from 15 to 96 mm, the \(d_{{\text{b}}}\) from 6 to 32 mm, the \(A_{{{\text{st1}}}} /S_{{{\text{st}}}}\) from 0 to 3.6, the s from 0 to 25 mm, and the \(\tau\) from 0 to 48 MPa. Similarly, the \(f^{\prime}_{{\text{c}}}\) of the beam specimens ranged from 24 to 93 MPa, the \(c\) from 15 to 59 mm, the \(d_{{\text{b}}}\) from 8 to 40 mm, the \(A_{{{\text{st1}}}} /S_{{{\text{st}}}}\) from 0 to 2, the s from 0 to 4.2 mm, and the \(\tau\) from 0 to 28 MPa. In addition, as shown at the bottom of Table 3, 44 beam test data with only \(\tau_{{{\text{max}}}}\) reported were collected to compensate for the lack of data in the beam test. The \(f^{\prime}_{{\text{c}}}\) of the data ranged from 29 to 93 MPa, the \(c\) from 15 to 76 mm, the \(d_{b}\) from 12 to 26 mm, the \(A_{{{\text{st1}}}} /S_{{{\text{st}}}}\) from 0 to 1.6, and the \(\tau\) from 4 to 18 MPa. Fig. 4 shows the histograms of \(\tau_{{{\text{max}}}}\) data. The features and counts are presented in x- and y-axes, respectively. The highest frequency occurred at \(f_{{{\text{ck}}}}\) = 30 MPa, \(c\) = 67 mm, \(d_{{\text{b}}}\) = 16 mm, \(A_{{{\text{st1}}}} /S_{{{\text{st}}}}\) = 0 mm, and \(\tau_{\max }\) = 13 MPa.

Table 3 Summary of data.
Fig. 4
figure 4

Histograms of the data sets.

3.2 Methods

3.2.1 Random Forest

A regression tree (RT) (Breiman et al., 1984) can be used advantageously in modeling and data analysis, because it can provide a regression model with moderate accuracy and feature importance. Fig. 5a shows the regression model of data consisting of two features (\(x_{1,i} ,x_{2,i} \in X\)) and a target (\(y_{i} \in Y\)) to explain the learning process of RT. As shown in Fig. 5b, the learning process of an RT is to build up splits (\(s_{p},\) i.e., if-else function) that can minimize the impurity (\(I\)) of data from the root node.

Fig. 5
figure 5

Regression tree.

The first step in the learning process is to define the impurity (\(I\)), and the \(I\) that a node \(t\) has in a regression problem can be defined as variance (\({\text{Var}}\left( t \right)\)), as follows:

$$I\left( t \right) = {\text{Var}}\left( t \right) = \frac{1}{N\left( t \right)}\mathop \sum \limits_{{x_{i} \in t}}^{{}} \left( {y_{i} - \overline{y}\left( t \right)} \right)^{2} ,$$

where \(N\left( t \right)\) is the number of data in node \(t\), and \(\overline{y}\left( t \right)\) is the average of \(y\) in node \(t\). The next step is to define a split set (\(S_{p}\), \(s_{p} \in S_{p}\)), which is created by sorting the values of each feature in order and adopting the midpoint. For example, when the \(x_{1}\) of the data included in \(t\) are 0.5, 1.5, and 2.5 and the \(x_{2}\) are 20, 40, 60, and 80, \(S_{p} = \left\{ {\left( {x_{1} :1} \right),\left( {x_{1} :2} \right),\,\left( {x_{2} :30} \right),\left( {x_{2} :50} \right),\left( {x_{2} :70} \right)} \right\}\). The next step is to derive the \(s_{{{\text{p,max}}}} \in S_{p}\) that can minimize the impurity change (\({\Delta }I\)), and \({\Delta }I\) can be calculated as

$${\Delta }I\left( {s_{p},t} \right) = I\left( t \right) - I\left( {t_{{\text{L}}} } \right) - I\left( {t_{{\text{R}}} } \right),$$

where \(t_{{\text{L}}}\) and \(t_{{\text{R}}}\) are the left and right child nodes generated from \(s_{p}\), respectively. For each \(s_{p}\), \({\Delta }I\) is calculated, and \(s_{{{\text{p,max}}}}\) is derived based on the \(\mathop {\arg \max }\limits_{s_{p}\, \in \,S_{p}} {\Delta }I\) value. Through the repetition of the above process, an RT is built, and the feature importance is derived based on the \({\Delta }I\).

The RT is prone to overfitting, because it can build a regression model that fits the training set perfectly. In addition, because the RT selects \(s_{p}\) based on variance, it is greatly affected by the data distribution. One method devised to compensate for these problems is the random forest (RF) (Breiman, 2001), and the main concept of this method is the bagging and random selection of features. Bagging is a method of creating multiple trees using various data sets generated through sampling with replacements and averaging the predicted values of all trees to determine the final predicted value. The random selection of features is a method of limiting the number of features that can be used to generate \(S_{p}\), and it serves to decorrelate the trees that make up the RF. The regression model derived from this method exhibits superior generalization performance to that of the RT (Breiman, 2001).

3.2.2 K-Means Clustering

Clustering is the task of grouping similar data and is useful in data mining, because it can summarize data (Xu & Wunsch, 2008). K-means clustering, one of the most common methods used in clustering, is an algorithm that groups a given data set into a set of K clusters (Ding & He, 2004). Fig. 6 shows the process of K-means clustering, where \(\times\) is each datum, and \({\text{o}}\) is the center of the cluster. The dotted line represents the distance (\(D\)) between a cluster center and a datum, and the Euclidean distance (\(L_{2}\)) can be used to measure \(D\), as shown below:

$$D\left( {x_{1} ,x_{2} } \right) = \left( {\left| {x - \mu } \right|^{\frac{1}{2}} } \right)^{2} ,$$
Fig. 6
figure 6

K-means clustering.

where \(x\) is the datum, and \(\mu\) is the center of the cluster. The objective of K-means clustering is to minimize the variance between \(\mu\) and \(x\) of each cluster, which can be represented as

$$\mathop {{\text{argmin}}}\limits_{C} \mathop \sum \limits_{i = 1}^{k} \mathop \sum \limits_{{x \in C_{i} }} (x - \mu_{i} )^{2}$$

To do this, the algorithm repeats the following process.

  1. 1.

    Determine which cluster the data belong to: Identify the \(D\) between \(x\) and \(\mu\) and then determine the cluster to which each \(x\) belongs based on Eq. (7).

  2. 2.

    Move the centroid of the cluster: Move \(\mu\) to the centroid of each cluster.

The process continues until the centroid of the cluster no longer changes.

3.2.3 Genetic Programming

Genetic programming (GP) is an optimization algorithm that simulates evolution and can be used for symbolic regression, classification, and feature selection (Koza, 1994). Fig. 7 shows the symbolic regression using GP. The algorithm creates several expression trees (a formula expressed as a tree type) to generate a population. Subsequently, it evaluates the fitness (\(f\), i.e., estimation error) of each tree, where the mean squared error (MSE) or mean absolute error (MAE) can be used to calculate \(f\) based on true (yi) and prediction values (ypi):

$${\text{MSE}}\left( {y_{{\text{i}}} - y_{{{\text{pi}}}} } \right) = \frac{1}{N}\sum_{i=1}^N\left( {y_{{\text{i}}} - y_{{{\text{pi}}}} } \right)^{2} .$$
$${\text{MAE}}\left( {y_{{\text{i}}} - y_{{{\text{pi}}}} } \right) = \frac{1}{N}\sum_{i=1}^N\left| {y_{{\text{i}}} - y_{{{\text{pi}}}} } \right|.$$
Fig. 7
figure 7

Symbolic regression using genetic programming.

Based on the previously evaluated \(f\), two excellent parent trees are selected to perform a crossover that exchanges a part of each tree. Subsequently, a mutation is carried out to change a part of the tree to a random tree. The generated offspring tree is merged into the next-generation population, and this process continues until a user-defined termination condition is met.

4 Results and Discussion

4.1 Parametric Study

4.1.1 Feature Importance

There are some machine learning-based methods to analyze the feature importance of the data set (Kuhn & Johnson, 2019). For example, recursive feature elimination (RFE), which observes the change of the model performance by recursively eliminating a variable in the training data set, can be used for deriving feature importance. However, RFE can underestimate the importance of the features with colinearity (Granitto et al., 2006). Meanwhile, although impurity-based feature importance, such as RF tends to underestimate the importance of categorical features, it is less affected by collinearity than RFE (Deng et al., 2011). In this paper, there was no categorical feature in the data sets, but there was collinearity between some features. Thus, the feature importance of \(\tau_{{{\text{max}}}}\) was analyzed using the RF introduced in Sect. 3.2.1. To confirm the difference in the feature importance of the pull-out and beam specimens, two data sets were trained separately, and the ratio of training data to test data was 2:1. Hyperparameter optimization was performed to derive the feature importance from the optimized RF model. For the pull-out data, the lowest test error occurred when the number of trees was 100, the depth of the tree was 20, and the maximum number of features was 3. The optimal result was obtained when they were 1000, 10, and 4, respectively, for the beam data set.

Fig. 8a, b shows the feature importance results of the pull-out and beam specimens derived by RF, respectively. As shown in Fig. 8a, \(f^{\prime}_{{\text{c}}}\) had a significant influence on the bond strength of the pull-out specimens (\(\tau_{{\text{p,max}}} )\). The level of importance was the highest in \(f^{\prime}_{{\text{c}}}\), followed by \(c\), \(d_{{\text{b}}}\), and \(\frac{{A_{{{\text{st1}}}} }}{{S_{{{\text{st}}}} }}\). However, the degree of influence of \(f^{\prime}_{{\text{c}}}\) on the bond strength of the beam specimens (\(\tau_{{\text{b,max}}}\)) was reduced compared with that of \(\tau_{{\text{p,max}}}\), and the influence of \(\frac{{A_{{{\text{st1}}}} }}{{S_{{{\text{st}}}} }}\) increased significantly. As a result, the level of importance was the highest in \(f^{\prime}_{{\text{c}}}\), followed by \(\frac{{A_{{{\text{st1}}}} }}{{S_{{{\text{st}}}} }}\), \(c\), and \(d_{{\text{b}}}\), for the bond strength of the beam specimens (\(\tau_{{\text{b,max}}}\)).

Fig. 8
figure 8

Feature importance.

4.1.2 Correlation Analysis

The correlation between the features and \(\tau_{{{\text{max}}}}\) was analyzed in this section. To identify the correlation between a target and a selected feature more clearly, the influence of other features should be minimized. Thus, K-mean clustering, the distance-based clustering method introduced in Sect. 3.2.2, was used to cluster the data that are close to each other. The number of clusters was selected based on the silhouette coefficient (Rousseeuw, 1987), and the highest silhouette coefficient was obtained from four and three clusters, respectively, for the pull-out and beam specimens.

Fig. 9a, b shows the results of the correlation analysis between the target and the features constituting the pull-out and beam specimens. The x-axis of each graph represents the features shown at the bottom, and the y-axis represents \(\tau_{{{\text{max}}}}\). The scatter plot and linear regression model of the unclustered data and clustered data are shown at the top and bottom of the figure, respectively. The graphs are numbered in the upper-left corner of each graph.

Fig. 9
figure 9

Correlation matrix (\(\tau_{{{\text{max}}}}\)).

As shown in Fig. 9a Graph 1, the variable with the greatest slope in the pull-out specimens was \(f^{\prime}_{{\text{c}}}\), and the effect of \(f^{\prime}_{{\text{c}}}\) decreased slightly with increasing \(f^{\prime}_{{\text{c}}}\), as shown in Graph 5. As \(c\), which showed the second-highest feature importance, increased, \(\tau_{{\text{p,max}}}\) increased (Graph 2), and the increase in \(\tau_{{\text{p,max}}}\) gradually decreased as \(c\) increased (Graph 6). In addition, \(d_{b}\) showed different tendencies for each cluster (Graph 7), but \(\tau_{{\text{p,max}}}\) decreased slightly as \(d_{b}\) increased for unclustered data (Graph 3). Moreover, \(A_{{{\text{st1}}}} /S_{{{\text{st}}}}\) slightly increased \(\tau_{{\text{p,max}}}\) (Graph 4), and a consistent tendency was found in all clusters (Graph 8).

In the beam specimens, \(f^{\prime}_{{\text{c}}}\) increased \(\tau_{{\text{b,max}}}\) as in pull-out specimens, but the increase was smaller than that of the pull-out specimens (Fig. 9b Graph 1). The influence of \(c\), which showed the third-highest degree of influence, was obscure (Graph 2 and 6). Unlike the pull-out specimens in which \(d_{{\text{b}}}\) had the lowest degree of influence, \(d_{{\text{b}}}\) showed a distinct influence to decrease \(\tau_{{\text{b,max}}}\) (Graph 7). Finally, \(A_{{{\text{st1}}}} /S_{{{\text{st}}}}\), which showed the second-largest feature importance, was found to increase \(\tau_{{\text{b,max}}}\) (Graph 4).

Tables 4 and 5 show the average feature values of each cluster obtained from the pull-out and beam specimens, respectively. As shown in Table 4, clusters 2 and 3 with underlines showed the lowest and highest τp,max, respectively, and there were significant differences in f'c and c/db. It can be confirmed that the effect of f'c was much larger than that of c/db, when comparing the differences in clusters 2, 3, and 4. In the beam specimens, clusters 2 and 3 with underlines in Table 5 showed the lowest and highest τmax,b, respectively, and major differences occurred in f'c and Ast1/Sst. Although cluster 1 had the highest f'c, the cluster with the highest τb,max was cluster 3 which has the second-best f'c and first-best Ast1/Sst. This is because the influence of f'c decreased and because the influence of Ast1/Sst increased in the beam specimens, as shown in Figs. 8b and 9b.

Table 4 Feature values of each cluster (pull-out data set).
Table 5 Feature values of each cluster (beam data set).

4.2 Derivation of Mapping Function

4.2.1 Evolutionary Parameter

In this section, the mapping function (\(\tau - s\) relationship between pull-out and beam specimens) was modeled based on data, and the GP introduced in Sect. 3.2.3 was used to do this. Before deriving the mapping function, a generalized \(\tau - s\) model (\(\tau_{{\text{p}}} \left( s \right)\)) of the pull-out specimen was derived. The evolutionary parameters used in the GP are summarized in Table 6. The instructions used to model \(\tau_{{\text{p}}} \left( s \right)\) were \(f^{\prime}_{{\text{c}}}\), \(c\), \(d_{{\text{b}}}\), \(\frac{{A_{{{\text{st1}}}} }}{{S_{{{\text{st}}}} }}\), \(s\), and arithmetic operators (i.e., \(+\), \(-\), \(\div\), \(\times\)). Since the mapping function between the bond–slip model of the beam specimens (\(\tau_{{\text{b}}}\)) and \(\tau_{{\text{p}}}\) is

$$\tau_{{\text{b}}} = m\left( {\tau_{{\text{p}}} ,\,f^{\prime}_{{\text{c}}} \,{\text{, c, d}}_{{\text{b}}} ,\,A_{{{\text{st1}}}} /S_{{{\text{st}}}} } \right),$$
Table 6 Evolutionary parameters.

\(\tau_{{\text{p}}}\), \(f^{\prime}_{{\text{c}}}\), \(c\), \(d_{{\text{b}}}\), \(\frac{{A_{{{\text{st1}}}} }}{{S_{{{\text{st}}}} }}\), and the arithmetic operators were used as instructions. The ramped half-and-half (Koza, 1994) method was used to generate the initial population, and the size of the tree was set at 1 to 4. Because a larger training data set requires a higher computational cost for evaluation, the population size was set to 1000 in the \(\tau_{{\text{p}}}\) modeling, while it was set to 100,000 in the modeling of \(m\left( {\tau_{{\text{p}}} } \right)\). Tournament selection (Miller & Goldberg, 1995) was used as a selection method, and the tournament size was set to 7. The crossover and mutation probabilities were set to 90% and 10%, respectively, and the maximum size of the tree was set to 40 and 30, respectively, for \(\tau_{{\text{p}}} \left( s \right)\) and \(m\left( {\tau_{{\text{p}}} } \right)\) modeling to derive a concise equation. In \(\tau_{{\text{p}}} \left( s \right)\) modeling, the MAE of the GP did not decrease significantly after approximately 250 generations, and the algorithm was thus terminated in the 250th generation. Even in \(m\left( {\tau_{{\text{p}}} } \right)\) modeling, the algorithm was terminated in the 50th generation for the same reason. As with the RF, the ratio of the training and test data was 2:1.

4.2.2 Bond–Slip and Bond Strength Model for Pull-Out Specimens

The \(\tau_{{\text{p}}} \left( s \right)\) obtained using GP is shown below.:

$$\tau_{{\text{p}}} = \frac{{2\left( {f^{\prime}_{{\text{c}}} + \frac{155}{{f^{\prime}_{{\text{c}}} }} + \frac{155}{c} + 45} \right)s}}{{2s^{2} + \left( {\frac{155}{{f^{\prime}_{{\text{c}}} }} + \frac{310}{c}} \right)s + 1}}$$

To simplify Eq. (11), \(\left( {\frac{155}{{f^{\prime}_{{\text{c}}} }} + \frac{310}{c}} \right)s\) is removed from the denominator of Eq. (11), and constant optimization is performed using the Levenberg–Marquardt (Levenberg, 1944; Marquardt, 1963) algorithm. The modified \(\tau_{{\text{p}}} \left( s \right)\) is expressed as

$$\tau_{{\text{p}}} = \left( {f^{\prime}_{{\text{c}}} - \frac{130}{{f^{\prime}_{{\text{c}}} }} - \frac{630}{c} + 50} \right)\frac{s}{{2\left( {s^{2} + 1} \right)}}.$$

In Eq. (12), \(f^{\prime}_{{\text{c}}}\) is modeled in the form of \(f^{\prime}_{{\text{c}}} - \frac{\alpha }{{f^{\prime}_{{\text{c}}} }}\)(where \(\alpha\) is a constant), which is a combination of a rational function and a linear function. This is the result of reflecting the effect that the increase in bond strength decreased with increasing \(f^{\prime}_{{\text{c}}}\), as shown in Fig. 9a Graph 5. In addition, \(c\) appears in the form of \(- \frac{\alpha }{c}\) (where \(\alpha\) is a constant), which is a result of reflecting the decrease of influence on \(\tau_{{\text{p}}}\) as \(c\) increase. However, \(d_{{\text{b}}} ,\) and \(A_{{{\text{st1}}}} /S_{{{\text{st}}}}\) are not reflected in Eqs. (11) and (12). This is because their feature importance is low, as shown in Fig. 8a, and their influence on bond strength is relatively low, as shown in Fig. 9a. \(\tau_{{\text{p}}}\) is differentiated with respect to \(s\) to derive the bond strength of the pull-out specimens (\(\tau_{{\text{p,max}}}\)) from Eq. (12). The \(d\tau_{{\text{p}}} /ds\) is

$$\frac{{d\tau_{{\text{p}}} }}{ds} = \left( {f^{\prime}_{{\text{c}}} - \frac{130}{{f^{\prime}_{{\text{c}}} }} - \frac{630}{c} + 50} \right)\frac{{\left( { - s^{2} + 1} \right)}}{{2\left( {s^{2} + 1} \right)^{2} }}.$$

Equation 13 confirms that a maximum point is obtained at \(s = 1\). \(s = 1\) is substituted into Eq. (12) to obtain the bond strength of the pull-out specimens (\(\tau_{{\text{p,max}}}\)). The \(\tau_{{\text{p,max}}}\) is

$$\tau_{{\text{p,max}}} = \frac{1}{4}\left( {f^{\prime}_{{\text{c}}} - \frac{130}{{f^{\prime}_{{\text{c}}} }} - \frac{630}{c} + 50} \right).$$

4.2.3 Mapping Function Between Pull-Out and Beam Bond–Slip Data

The \(m\left( {\tau_{{\text{p}}} } \right)\) derived using GP is

$$\tau_{{\text{b}}} = m\left( {\tau_{{\text{p}}} } \right) = \frac{{\left( {A_{{{\text{st1}}}} /S_{{{\text{st}}}} + 1} \right)f^{\prime}_{{\text{c}}} + 14}}{{d_{{\text{b}}} + 7\tau_{{\text{p}}} }}\tau_{{\text{p}}} + \frac{c}{{d_{{\text{b}}} + 7\tau_{{\text{p}}} }}.$$

Unlike in Eq. (11), the effects of \(d_{{\text{b}}}\) and \(A_{{{\text{st1}}}} /S_{{{\text{st}}}}\) are reflected in Eq. (15). This is because, as shown in Fig. 9b, their influence increase in the beam tests. Meanwhile, in Eq. (15), \(\tau_{b}\) is not equal zero at \(s = 0\) because of the \(\frac{c}{{d_{{\text{b}}} + 7\tau_{{\text{p}}} }}\) term. Accordingly, the \(\frac{c}{{d_{{\text{b}}} + 7\tau_{{\text{p}}} }}\) term is removed from Eq. (15), and the influence of \(c\) is then reflected in the denominator of \(d_{{\text{b}}}\) to introduce \(c/d_{{\text{b}}}\), a term used widely in the \(\tau_{{{\text{max}}}}\) prediction model. In addition, a constant term is added to the denominator to build a model with a higher level of accuracy, and constant optimization is performed. The modified \(m\left( {\tau_{{\text{p}}} } \right)\) is

$$\tau_{{\text{b}}} = m\left( {\tau_{{\text{p}}} } \right) = \frac{{\left( {A_{{{\text{st1}}}} /S_{{{\text{st}}}} + 2.7} \right)f^{\prime}_{{\text{c}}} - 16}}{{210\frac{{d_{{\text{b}}} }}{c} + 7\tau_{{\text{p}}} - 10}}\tau_{{\text{p}}} .$$

In Eq. (16), the influence of \(A_{{{\text{st1}}}} /S_{{{\text{st}}}}\) is reflected linearly, and \(d_{{\text{b}}}\) is reflected in the form of a rational function that gradually reduce \(\tau_{{\text{b}}}\). \(\tau_{{\text{b}}}\) is differentiated with respect to \(s\) to derive the bond strength of the beam specimens (\(\tau_{{\text{b,max}}}\)) from Eq. (16) as below:

$$\tau_{{\text{b}}} = \frac{dm}{{ds}} = \frac{dm}{{d\tau_{p} }}\frac{{d\tau_{{\text{p}}} }}{ds}.$$

As in Eq. (17), \(\tau_{{\text{b}}}\) has a maximum point at the same position (\(s = 1\)) as \(\tau_{{\text{p}}}\) according to the chain rule, and the bond strength of the beam specimens (\(\tau_{{\text{b,max}}}\)) is

$$\tau_{{\text{b,max}}} = \frac{{\left( {A_{{{\text{st1}}}} /S_{{{\text{st}}}} + 2.7} \right)f^{\prime}_{{\text{c}}} - 16}}{{210\frac{{d_{{\text{b}}} }}{c} + 7\tau_{{\text{p,max}}} - 10}}\tau_{{\text{p,max}}} .$$

A safety factor (\({\upgamma }\)) is introduced to use Eq. (18) as a design equation as below:

$$\gamma \tau_{{\text{b,max}}} = \gamma \frac{{\left( {A_{{{\text{t1}}}} /S_{{\text{t}}} + 2.7} \right)f^{\prime}_{{\text{c}}} - 16}}{{210\frac{{d_{{\text{b}}} }}{c} + 7\tau_{{\text{p,max}}} - 10}}\tau_{{\text{p,max}}} .$$

5 Verification

5.1 Accuracy of Proposed Models

Fig. 10a shows the prediction accuracy of the bond–slip models for the pull-out and beam specimens in Eqs. (12) and (16), respectively. One-third of 1785 data points from the pull-out tests and one-third of 217 data points from the beam tests were used for the validation of the proposed models. (i.e., 590 data points from the pull-out tests and 72 data points from the beam tests). In the figure, the black dots indicate the \(\tau\) of the pull-out specimens predicted using Eq. (12), while the red dots indicate the \(\tau\) of the beam specimens predicted using Eq. (16). As shown in the figure, both models predicted \(\tau\) with high levels of accuracy, and Eqs. (12) and (16) showed MAEs of 2.04 and 3.11 MPa, respectively.

Fig. 10
figure 10

Comparison of test results and values calculated by proposed models.

Fig. 10b shows the \(\tau_{{{\text{max}}}}\) prediction accuracy of Eqs. (14) and (18) for 101 data points from the pull-out test and 56 data points from the beam test (including additional 44 data points of \(\tau_{{{\text{max}}}}\)) except for the \(\tau_{{{\text{max}}}}\) used for training. The black dots indicate the \(\tau_{{{\text{max}}}}\) of the pull-out specimens predicted using Eq. (14), while the red dots indicate the \(\tau_{{{\text{max}}}}\) of the beam specimens predicted using Eq. (18). As shown in the figure, Eqs. (14) and (18) provided prediction results close to the ideal curve and showed MAEs of 2.80 and 1.53 MPa, respectively.

A comparison between the accuracy of Eqs. (12) and (14) and that of Eqs. (16) and (18) for beam specimens reveals the accuracy improvement effect owing to the mapping function. The black and red dots in Fig. 11a indicate the \(\tau\) of the beam specimens predicted using Eqs. (12) and (16), respectively. When the \(\tau\) of the beam specimens was predicted using Eq. (12), the MAE was 4.43 MPa. When the mapping function was used for prediction, the MAE was 3.11 MPa, reducing the error by approximately 30%.

Fig. 11
figure 11

Accuracy improvement by mapping function.

The black and red dots in Fig. 11b indicate the \(\tau_{{{\text{max}}}}\) of the beam specimens predicted using Eqs. (14) and (18), respectively. In general, the measured \(\tau_{{{\text{max}}}}\) values of the pull-out specimens are larger than those of the beam specimens. Therefore, Eq. (14), trained using pull-out data, evaluated the \(\tau_{{{\text{max}}}}\) of the beam specimens as very unsafe. As a result, the MAE of Eq. (14) was 8.26 MPa, and the MAE of Eq. (18) was 1.53 MPa, reducing the error by 81%.

In Fig. 12, the \(\tau\) prediction accuracy of the model proposed by Wu and Zhao (2013) and that of Eq. (16) are compared. The mean, standard deviation (SD), and coefficient of variation (COV) are statistical values based on the values which are divided the predicted value into the measured value. The model by Wu and Zhao showed an MAE of 3.6 MPa, while Eq. (16) had an MAE of 2.9 MPa, exhibiting a predicted behavior closer to the ideal curve. The proposed model provides conservative prediction results at the interval with a lower \(\tau\) than that of Wu and Zhao’s model. Consequently, the mean of those were 0.9 and 0.8, and the SD and COV of the two models were 0.5 and 0.6, respectively.

Fig. 12
figure 12

Accuracy comparison for \(\tau - s\) of beam specimens.

Fig. 13 compares the \(\tau - s\) curve obtained by the model proposed by Wu and Zhao (2013) and Eq. (16) with the beam test data. Both models exhibited excellent accuracy. However, in the case of Fig. 13a, c with a high \(\tau_{{{\text{max}}}}\), the proposed model simulated the \(\tau - s\) behavior more closely, whereas, in the case of Fig. 13b, d with low \(\tau_{{{\text{max}}}}\), the model proposed by Wu and Zhao (2013) showed results that were slightly closer to the experimental results. The low accuracy in the high \(\tau\) region of Wu and Zhao's model was due to the lack of high-strength specimen data used for modeling. On the other hand, the reason for the low accuracy in the low \(\tau\) region of the proposed model seemed to be due to the limited model complexity during evolution.

Fig. 13
figure 13

Comparison of bond–slip prediction.

Fig. 14 shows comparison results of \(\tau_{{{\text{max}}}}\) obtained from the existing models, Eqs. (18), and (19). In the model proposed by Xu (Xu, 1990), \(f_{{\text{t}}}\) was calculated (ACI committee318, 2019) as

$$f_{t} = 0.5\sqrt {f^{\prime}_{{\text{c}}} }$$
Fig. 14
figure 14

Accuracy comparison for \(\tau_{{{\text{max}}}}\) of beam specimens.

In the model used by Orangun et al. (1977), \(f_{{{\text{yt}}}} =\) 400 MPa was applied to specimens for which \(f_{yt}\) was not reported. As shown in Table 8, 0.7, which is a value at 0.08 fractile, was applied as a safety factor (\({\upgamma }\)) in Eq. (19).

Table 8 Relationship between safety factor and fractile

As shown in Fig. 14h, Eq. (18) showed the lowest MAE (1.8 MPa) and a mean of 1.0, and Eq. (19) provided conservative \(\tau_{{{\text{max}}}}\) values. Equations (18) and (19) showed the lowest SD and COV, and the models of Esfahani and Kianoush (2005) and Orangun et al. (1977) also showed a low SD and COV. These results indicate that Eq. (18) can predict bond–slip and bond strength close to the measured values and also provide bond strength on the safe side by introducing a safety factor.

5.2 Bond–Slip Behavior of Proposed Models

In this section, Eq. (12) (i.e., the \(\tau - s\) model for pull-out specimens) and Eq. (16) (i.e., the \(\tau - s\) model for beam specimens), the accuracies of which were verified in Sect. 5.1, were used to analyze the change in \(\tau - s\) resulting from the influence of each variable and to compare the difference in the \(\tau - s\) behavior between the pull-out and beam specimens. In Fig. 15, the black and red lines indicate the \(\tau - s\) behavior of Eqs. (12) and (16), and the values of each feature increase in the order of solid, dashed, and dotted lines. Fig. 15a shows the effect of \(f^{\prime}_{{\text{c}}}\). As \(f^{\prime}_{{\text{c}}}\) increased, the \(\tau\) values of Eqs. (12) and (16) were found to increase significantly. In addition, the \(\tau_{{{\text{max}}}}\) of Eq. (16) was approximately 6–7 MPa lower than that of Eq. (12), and Eq. (16) showed a less sharp decrease in \(\tau\) at the post-peak stage compared with that of Eq. (12). Fig. 15b shows the effect of \(c\). In both models, \(\tau\) increased with increasing \(c\). However, the rate of increase in \(\tau\) gradually decreased as \(c\) increased. The effects of \(d_{{\text{b}}}\) and \(A_{{{\text{st1}}}} /S_{{{\text{st}}}}\) are shown in Fig. 15c, d, respectively. Because the effects of \(d_{{\text{b}}}\) and \(A_{{{\text{st1}}}} /S_{{{\text{st}}}}\) were not reflected in the pull-out bond–slip curve of Eq. (12), the \(\tau - s\) behavior was the same, even when \(d_{{\text{b}}}\) and \(A_{{{\text{st1}}}} /S_{{{\text{st}}}}\) changed—that is, it is represented by a single curve in the graphs in Fig. 15c, d. However, the effects of \(d_{{\text{b}}}\) and \(A_{{{\text{st1}}}} /S_{{{\text{st}}}}\) were reflected in Eq. (16). As a result, the \(\tau\) of Eq. (16) decreased as \(d_{{\text{b}}}\) increased, and the rate of decrease gradually decreased. In addition, the \(\tau\) of Eq. (16) increased almost linearly with increasing \(A_{{{\text{st1}}}} /S_{{{\text{st}}}}\).

Fig. 15
figure 15

Bond–slip behavior according to influencing parameters.

6 Conclusions

In this study, a mapping function was developed to simplify estimating \(\tau - s\) in RC beams. A total of 255 pull-out specimen data and 75 beam specimen data were collected, and the feature importance and correlation of the two groups were analyzed based on the data. In addition, the \(\tau - s\) model for pull-out specimens (base model) and the \(\tau - s\) mapping function between the pull-out and beam specimens were derived using GP. The rationality of the proposed model was verified by comparing its accuracy with that of existing models. The mapping function proposed can be used to estimate the \(\tau - s\) of the beam through a relatively simple pull-out test. In addition, if the mapping function is used together with the bond-slip model of pull-out specimens, the \(\tau - s\) curve can be derived without experimental data. The following conclusions were drawn from the results of this study.

  1. 1.

    It was found that \(f^{\prime}_{{\text{c}}}\) had the greatest influence on the \(\tau\) of the pull-out specimens, and the degree of influence was the highest in \(f^{\prime}_{{\text{c}}}\), followed by \(c\), \(d_{{\text{b}}}\), and \(\frac{{A_{{{\text{st1}}}} }}{{S_{{{\text{st}}}} }}\). When compared with the pull-out specimens, the influence of \(f^{\prime}_{{\text{c}}}\) was slightly decreased in the beam specimens. In addition, the influence of \(A_{st1} /S_{st}\) increased significantly, and \(f^{\prime}_{{\text{c}}}\) had the highest degree of influence, followed by \(\frac{{A_{{{\text{st1}}}} }}{{S_{{{\text{st}}}} }}\), \(c\), and \(d_{{\text{b}}}\).

  2. 2.

    While the MAEs for the \(\tau - s\) and \(\tau_{max}\) of the proposed model were 2.04 and 2.80 MPa, respectively, for the pull-out specimens, the MAEs for the \(\tau - s\) and \(\tau_{{{\text{max}}}}\) of the proposed model were 3.11 and 1.53 MPa, respectively, for the beam specimens. The proposed model exhibited the lowest error when compared with the existing models.

  3. 3.

    The proposed model was used to compare the \(\tau - s\) behaviors of the two groups. The comparison revealed that the \(\tau_{{{\text{max}}}}\) of the beam specimens was lower than that of the pull-out specimens, and the beam specimens exhibited a more gradual decrease in \(\tau\) at the post-peak stage when compared with the pull-out specimens.