1 Introduction

The vulnerability of a building can be evaluated either by in-situ data analysis with non-destructive methods, known as structural health monitoring, or by numerical analysis of structural models. The main idea of such methods is to evaluate the building performance under operating conditions. Although in-situ techniques can provide a wide range of data, practical limitations, such as installing sensors or actuators and mechanical problems over time, can prevent the performance assessment of structures [1,2,3]. Therefore, this approach can be complemented by response prediction methods for buildings subjected to seismic excitations.

Nowadays, the probabilistic seismic assessment of a building requires complicated analyses using a precise finite element model, which may be time-consuming when different limit states are evaluated (e.g., see [4, 5]). Due to the unpredictable nature of ground motions, it is essential to predict the nonlinear structural response under seismic loads in order to take precautions that reduce the probability of collapse. Several approaches can be employed to perform nonlinear analysis. Nonlinear static analysis, also known as pushover analysis, provides the relation between base shear and top floor displacement, whereas nonlinear time history analysis uses pre-recorded earthquakes and performs the analysis with scale factors defined from the acceleration spectrum prescribed by the design code. The most accurate estimates of seismic response are obtained by nonlinear time history analysis and Incremental Dynamic Analysis (IDA) using prior seismic events and finite element methods [6,7,8]. Predicting the seismic response with these approaches requires complex models and time-consuming analyses, while simplified models (e.g., a single-degree-of-freedom model) are computationally efficient but reproduce the behavior of real structures less accurately. Therefore, there is a need for a novel Machine Learning (ML)-based method to efficiently and accurately predict the seismic response of RC frames.

Finding the seismic capacity of buildings helps engineers obtain a preliminary prediction of the performance levels of the designed building. Kazemi et al. [9, 10] proposed factors for modifying and estimating the collapse capacity of colliding steel Moment-Resisting Frames (MRFs) and of colliding Reinforced Concrete (RC) and steel frames [11]. It should be noted that the proposed factors were obtained from complex modeling and analysis; therefore, there is a need for a prediction model that avoids such prohibitively complex analysis. Recently, ML algorithms have been applied in many civil engineering areas, such as the failure mode of steel base-plate connections [12], damage identification of bridges [13], and the damage state of steel frames [14] and RC beams [15]. ML methods are divided into two main groups, supervised and unsupervised algorithms; seismic response prediction can be treated as supervised learning using training and testing datasets, with the possibility of assuming n-features for n-samples. Therefore, in this method, it is possible to take the important features into account [16,17,18]. Huang et al. [19] proposed a backpropagation neural network to predict the seismic response of structures. Yinfeng et al. [20] used the Support Vector Machine (SVM) algorithm for predicting the nonlinear time history response of structures. Then, Lagaros and Papadrakakis [21] improved neural networks for predicting the nonlinear time history response of a three-dimensional building using six seismic excitations. De Lautour and Omenzetter [22] developed a methodology for estimating structural responses using pattern recognition of damage. ML algorithms have also been used for nonlinear modal analysis [23], for predicting seismic responses to obtain fragility curves [24], and for predicting the maximum displacements of an isolated pendulum system [25]. Oh et al. [26] developed a neural network model for predicting the seismic response of buildings based on the correlation of records using 2700 artificial records. Luo and Paal [27] proposed a novel artificial methodology for seismic response prediction of RC structures using a dataset of 272 RC columns.

There is no unique formula for predicting the Maximum Interstory Drift Ratio (IDRmax) and the Median of IDA curves (M-IDAs) for any type of RC building. The purpose of this research is to develop a powerful ML-based tool that employs innovative data sampling and hyperparameter optimization methods, such as the fine-tune method, halving search strategy, grid search method, and k-fold cross-validation. For this purpose, a wide range of data points covering 165 RC MRFs with different bay lengths and numbers of bays was numerically determined to prepare the training dataset. The ML-based prediction model can then be used for estimating the seismic response and seismic limit-state capacities of RC buildings, and further applied for a preliminary estimation of IDRmax and M-IDAs of existing and newly constructed buildings. The seismic response prediction results help designers understand the behavior of the designed building and, based on that behavior, control the performance of structural elements to postpone seismic damage. In other words, the estimated IDRmax can be used for predicting the maximum deformation of buildings, and the predicted Sa(T1) of M-IDAs can be applied for the assessment of seismic performance levels. Finally, the results of the research were used to introduce an estimation tool based on the developed ML algorithms.

2 Structural response prediction model

2.1 Artificial neural network

Due to the high prediction ability of Artificial Neural Networks (ANNs), they can be trained for different problems, such as positioning site facilities [28], the seismic limit-state performance of bridge piers [29], estimating the fracture toughness of rocks [30], optimizing energy consumption [31], estimating the compressive strength of steel fiber-reinforced concrete [32], seismic vulnerability assessment of RC frames [33], and seismic response prediction of structures [34]. ANNs contain three main parts, the input layer, the hidden layers, and the output layer, which are connected by nonlinear functions with adjustable weights. The weight of each neuron can increase or decrease the strength of a connection in order to minimize the loss function or error (i.e., the difference between the predicted and actual values). Forward and backward propagation are used to recalculate the weights of each neuron from the previous iteration to minimize the error; the process is then repeated with the newly adjusted weights to achieve a reliable model. The backward propagation method is presented in Fig. 1. In this study, IDRmax and Sa(T1) were defined as targets for the backward and forward propagation ANNs. Moreover, the Multi-layer Perceptron Regressor (MLPReg) uses a linear function to predict the seismic responses of RC structures.

Fig. 1 Functioning of the backpropagation method in ANNs
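The forward and backward propagation steps described above can be illustrated with a minimal sketch. It is not the network used in this study: the single input, the four tanh hidden neurons, the learning rate, and the toy samples are all assumptions chosen only to show how the weights are adjusted to reduce the error between predicted and actual values.

```python
import math
import random


def train_tiny_ann(samples, hidden=4, lr=0.05, epochs=500, seed=0):
    """Train a 1-input, 1-output network with one tanh hidden layer (illustrative only)."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                                   # hidden biases
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                              # output bias
    history = []                                          # mean squared error per epoch
    for _ in range(epochs):
        squared_error = 0.0
        for x, y in samples:
            # Forward propagation: input -> hidden layer -> linear output
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            y_hat = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = y_hat - y
            squared_error += err * err
            # Backward propagation: gradients of 0.5 * err**2 for each weight
            for j in range(hidden):
                grad_hidden = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_hidden * x
                b1[j] -= lr * grad_hidden
            b2 -= lr * err
        history.append(squared_error / len(samples))

    def predict(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return sum(w2[j] * h[j] for j in range(hidden)) + b2

    return predict, history


# Hypothetical (input, target) pairs, not data from this study
toy_samples = [(0.0, 0.0), (0.25, 0.5), (0.5, 1.0), (0.75, 1.5), (1.0, 2.0)]
predict, history = train_tiny_ann(toy_samples)
print(f"mean squared error: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
```

The error recorded in the last epoch is lower than in the first, which is the behavior the iterative weight adjustment is meant to produce.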

2.2 Random decision forest

Random decision Forest (RF) can be employed for both regression and classification problems. The RF algorithm builds an ensemble of bagging models in parallel, each trained on a different subset of the training data, and obtains the final result by majority voting for classification or by averaging for regression. Figure 2 presents the RF algorithm with the bagging principle.

Fig. 2 RF algorithm with the bagging principle
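The bagging principle can be sketched as follows. This is a simplified illustration under stated assumptions: the base learner is a one-split regression tree on a single feature, the data are hypothetical, and the random feature selection that an actual RF adds on top of bagging is omitted.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) on a single feature."""
    best = None
    for threshold in sorted(set(xs))[1:]:  # every split leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        m_left, m_right = mean(left), mean(right)
        sse = sum((y - m_left) ** 2 for y in left) + sum((y - m_right) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, m_left, m_right)
    if best is None:  # all x values identical: fall back to a constant prediction
        constant = mean(ys)
        return lambda x: constant
    _, threshold, m_left, m_right = best
    return lambda x: m_left if x < threshold else m_right


def bagged_predictor(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and average the predictions."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(tree(x) for tree in trees)


# Hypothetical data: a step in the response at x = 5
xs = [float(i) for i in range(10)]
ys = [1.0 if x < 5 else 3.0 for x in xs]
predict = bagged_predictor(xs, ys)
print(predict(0.0), predict(9.0))
```

Because each tree sees a different bootstrap sample, the individual split locations vary, and averaging them smooths the final prediction.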

Although the RF algorithm can be classified as a decision-tree-based method, it considers subsets of the data and selects random observations instead of a set of formulas, which mitigates the overfitting problem [35]. It should be noted that different parameters were selected by trial and error to balance bias and variance, overcome the overfitting problem, and achieve an optimized prediction model. Moreover, several variants of the RF algorithm were used to find the best prediction model: the Extra-Trees Regressor (ETReg), which fits randomly selected decision trees to the input data; the Extremely Randomized Tree Regressor (ERTReg), which uses random tree selection to improve calculation speed [36]; and the Bagging Regressor (BReg), which aggregates individual predictions [37].

2.3 Boosting algorithms

The boosting principle is another way of combining decision trees. In this principle, weak learners are combined in sequential order to create a strong model with higher prediction accuracy. The Adaptive Boosting (AdaBoost) algorithm combines weak base learners, such as decision trees with a single split, and weights the data points to improve the accuracy of estimation [38]. The Gradient Boosting Machine (GBM) builds on the idea of improving weak learners to enhance their final results by minimizing the loss function. Moreover, Histogram-based Gradient Boosting Regression (HistGBR) uses a quantization method for splitting the features, which gives faster prediction than GBM. To control the accuracy of the results, the following formula can be used with an initial probability equal to 0.5; in each step, the value can be compared with that of the previous step to find an optimized model.

$${\text{Value}} = \frac{{\sum\nolimits_{i = 1}^{n} {{\text{Residual}}_{i} } }}{{\sum\nolimits_{i = 1}^{n} {({\text{Previous Probability}}_{i} \times (1 - {\text{Previous Probability}}_{i} ))} }},$$
(1)
$${\text{Residual}}_{i} = {\text{Actual}}\,({\text{IDR}}_{{{\text{max}}\;i}} ) - {\text{Predicted}}\,({\text{IDR}}_{{{\text{max}}\;i}} ).$$
(2)

Extreme Gradient Boosting (XGBoost) is an improved version of GBM with a regularization factor, λ, that reduces the influence of small leaves [39, 40]. In this study, a fine-tuned XGBoost model was used to vary the number of trees and other parameters to find the best target based on the following formula:

$${\text{Value}} = \frac{{\sum\nolimits_{i = 1}^{n} {{\text{Residual}}_{i} } }}{{\sum\nolimits_{i = 1}^{n} {({\text{Previous Probability}}_{i} \times (1 - {\text{Previous Probability}}_{i} ))} + \lambda }}.$$
(3)
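Equations (1) to (3) can be checked numerically with a short sketch. The function name and the sample values are hypothetical and serve only to show the effect of the regularization factor λ on the computed value; setting λ to zero recovers Eq. (1).

```python
def leaf_value(actual, predicted, previous_probabilities, lam=0.0):
    """Evaluate Eq. (1) when lam == 0 and Eq. (3) when lam > 0."""
    residuals = [a - p for a, p in zip(actual, predicted)]               # Eq. (2)
    denominator = sum(q * (1.0 - q) for q in previous_probabilities) + lam
    return sum(residuals) / denominator


# Hypothetical values with the initial probability of 0.5 for every data point
actual = [1.0, 0.0, 1.0]
predicted = [0.5, 0.5, 0.5]
previous = [0.5, 0.5, 0.5]

print(leaf_value(actual, predicted, previous))            # Eq. (1): 0.5 / 0.75
print(leaf_value(actual, predicted, previous, lam=1.0))   # Eq. (3): 0.5 / 1.75
```

With λ greater than zero the denominator grows, so the value obtained from a leaf with few data points is pulled toward zero.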

2.4 Support vector machine

The Support Vector Machine (SVM) was selected as a decision boundary method that uses a hyperplane based on the marginal distances in two-dimensional and three-dimensional spaces [41]. In addition, Nu-Support Vector Regression (NuSVR), which uses the ν parameter to control the number of support vectors [42], and Linear Support Vector Regression (LSVR), which considers functions for loss and penalties [43], were assumed to find a suitable model for estimating IDRmax and Sa(T1). To enhance the performance of the ML methods during training and reduce the risk of losing important data, k-fold cross-validation was employed. Figure 3 presents the k-fold cross-validation methodology, in which the training and testing datasets are 70–80% and 30–20% of the total data points, respectively [44]. It is worth mentioning that k-fold cross-validation with different values of k was employed for the assumed ML algorithms to find the k with the highest performance.

Fig. 3 Architecture of k-fold cross-validation
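The splitting logic behind k-fold cross-validation can be sketched with the standard library alone. The function name is hypothetical, and shuffling of the data points before splitting is omitted for brevity. With k = 5, each fold uses 80% of the data points for training and 20% for testing, which falls within the range quoted above.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Distribute any remainder over the first folds so that sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for train, test in folds:
    print(f"train={train} test={test}")
```

Every data point appears in exactly one testing fold, so no part of the dataset is excluded from validation.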

2.5 Regression models

Some important regression algorithms can be used for IDRmax prediction as supervised regression models although they do not fall into the categories above. These models use neither hidden layers (as ANNs do) nor boosting (as XGBoost does); therefore, this subsection covers the remaining ML algorithms used in this research, which have different prediction abilities. The Voting Regressor (VReg) predicts the response as the average of the individual results, while K-Nearest Neighbor Regression (KNR) estimates the target from the mean of the neighboring data points. On the other hand, Gaussian Process Regression (GPReg) normalizes the targets to zero mean and maximizes the log marginal likelihood of the data points. Linear Regression (LReg) fits a linear model by minimizing the residual sum of squares between the predicted and actual values. In addition, the Gamma Regressor (GReg) combines data points with an inverse function and their logarithmic unit deviance [45]. The Stacking Regressor (SReg) uses the strength of several estimators to build a final estimator for the prediction model (for more detail see [46]). Partial Least Squares Regression (PLSReg) is another regression model, which seeks the multidimensional directions of the data points that capture the fundamental relations between inputs and outputs [47]. Since Python libraries offer extensive support for developing ML algorithms and Python is freely available, Python, a general-purpose programming language, was selected for implementing the ML methods. Therefore, all assumed ML algorithms were developed in Python, and different tuning and resampling strategies, such as the fine-tune method, halving search strategy, grid search method, and k-fold cross-validation, were used to improve them as prediction models.

3 Modeling process

To train the ML algorithms, eleven types of RC buildings with two to twelve stories (i.e., 2- to 12-Story buildings) and three bay lengths (i.e., 5 m, 6.1 m, and 7.6 m), with the plan presented in Fig. 4, were assumed. All buildings were modeled in ETABS software assuming soil type D, acceleration parameters of SD1 = 0.6 g and SDS = 1.0 g for a high-seismicity construction site, and design parameters of R = 8, Cd = 5.5, and Ω = 3 in accordance with ASCE 7‐16 [48]. It is noteworthy that the acceleration parameters of the construction site were obtained from the USGS website [49]. In addition, a floor dead load of 8.4 kN/m² and a floor live load of 2.4 kN/m² were applied to all floor levels of the buildings. To design the structural elements, a concrete compressive strength of 34.5 MPa (i.e., 5 ksi, see Table 6–2 in reference [50]) was used [51]. Details of the structural elements of the RC frames with a bay length of 6.1 m are presented in Figs. 5, 6 and 7. To perform collapse analysis, all buildings were modeled as two-dimensional RC frames in OpenSees [52], with a leaning column representing the gravity columns not included in the models in order to consider the P-delta effects [53,54,55,56]. In addition, the two-dimensional frames were modeled and verified against their corresponding buildings following the modeling procedures used by Haselton and Deierlein [50] and Kazemi et al. [9,10,11, 57, 58]. According to these procedures, the plastic hinge models for simulating seismic collapse presented in Fig. 4 were developed by Ibarra et al. [59] and Altoontash [60]. It should be noted that, to represent the real condition of RC buildings, all panel zones were modeled, and concentrated plastic hinge models capable of capturing seismic collapse were used at the ends of the structural elements (for more detail on modeling see [50]).

Fig. 4 RC MRFs plan and concentrated plasticity approach employed to model buildings

Fig. 5 Structural documentation of the 2-Story, 3-Story, 4-Story, 5-Story, and 9-Story RC MRFs

Fig. 6 Structural documentation of the 10-Story, 11-Story, and 12-Story RC MRFs

Fig. 7 Structural documentation of the 6-Story, 7-Story, and 8-Story RC MRFs

To train the ML algorithms, 165 RC MRFs were assumed, with one, two, three, four, and five bays, 2- to 12-Story elevations, and bay lengths of 5 m, 6.1 m, and 7.6 m. To assess IDRmax at different intensity measures and the seismic limit-state curves of all 165 RC MRFs, IDAs were performed with the spectral acceleration at the fundamental period of the structure, Sa(T1), as the intensity measure and IDRmax as the engineering demand parameter, considering the near-fault Pulse-like (PL) and No-Pulse (NP) records introduced by FEMA-P695 [61]. To perform the IDAs, an algorithm implementing the hunt and fill methodology was developed using both OpenSees [52] and MATLAB [62] to reduce the analysis time. It is worth mentioning that the programming code was developed in MATLAB [62] to control the entire analysis procedure and to post-process the results of the analysis. Figure 8 presents the IDA curves of the 2-Story, 4-Story, 8-Story, and 12-Story RC frames having three bays with 6.1 m length subjected to NP records. It should be noted that no restriction was placed on the increments of the intensity measure in this study; therefore, the results are distributed over different ranges of Sa(T1).
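The idea behind a hunt and fill tracing of one IDA curve can be outlined with a simplified sketch. It is written in Python for consistency with the ML part of this study and is not the MATLAB code that was developed. The step sizes, the one-third bracketing rule, the run budget, and the `toy_frame` response function that stands in for a nonlinear time history analysis are all assumptions.

```python
def hunt_and_fill(run_analysis, first_sa=0.005, step=0.1, step_growth=0.05,
                  collapse_gap=0.05, max_runs=20):
    """Trace one IDA curve; run_analysis(sa) returns IDRmax, or None at collapse."""
    converged = {}        # Sa(T1) -> IDRmax for the runs that did not collapse
    collapse_sa = None    # lowest Sa(T1) at which collapse was observed
    runs = 0
    sa = first_sa
    # Hunt: raise the intensity with growing steps until the first collapse
    while runs < max_runs and collapse_sa is None:
        idr = run_analysis(sa)
        runs += 1
        if idr is None:
            collapse_sa = sa
        else:
            converged[sa] = idr
            sa += step
            step += step_growth
    # Bracket: narrow the gap between the highest converged run and collapse
    while collapse_sa is not None and runs < max_runs:
        highest_ok = max(converged, default=0.0)
        gap = collapse_sa - highest_ok
        if gap <= collapse_gap:
            break
        sa = highest_ok + gap / 3.0
        idr = run_analysis(sa)
        runs += 1
        if idr is None:
            collapse_sa = sa
        else:
            converged[sa] = idr
    # Fill: spend the remaining runs in the largest gaps between converged runs
    while runs < max_runs and len(converged) >= 2:
        levels = sorted(converged)
        gap, low = max((b - a, a) for a, b in zip(levels, levels[1:]))
        sa = low + gap / 2.0
        idr = run_analysis(sa)
        runs += 1
        if idr is None:
            break
        converged[sa] = idr
    return sorted(converged.items()), collapse_sa


def toy_frame(sa):
    """Hypothetical response: collapse at Sa(T1) >= 1.2 g, smooth drift growth below it."""
    if sa >= 1.2:
        return None
    return 0.02 * sa * (1.0 + sa ** 2)


curve, collapse_sa = hunt_and_fill(toy_frame)
print(f"{len(curve)} converged points, collapse bracketed at Sa(T1) = {collapse_sa:.3f} g")
```

Because the intensity levels are chosen adaptively, the resulting points are not equally spaced in Sa(T1), which is consistent with the unrestricted intensity increments noted above.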

Fig. 8 IDA curves of the RC frames having three bays with 6.1 m bay length subjected to NP records

The training datasets were prepared with the following important features, which were selected by trial and error: weight, aspect ratio, reinforcement ratio of beams and columns, number of stories, bay length and total height of the RC frames, Sa(T1), the direction and RSN number of the record, fundamental period (T1), and IDRmax in each step of the analysis. In addition, for the seismic response prediction models, the IDRmax of the selected RC frames was considered as the target in the testing dataset, and for the seismic limit-state capacity prediction models, the Sa(T1) of the M-IDAs of the selected RC frames was considered as the target of prediction in the testing dataset. Therefore, two main datasets were considered to train and test the prediction models. In total, the training dataset contained 92,400 data points obtained by performing IDAs.
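The count of 165 RC MRFs follows from combining the eleven story numbers, five bay numbers, and three bay lengths given in this section, as the short enumeration below confirms. The dictionary keys are hypothetical names, and the average number of data points per frame is simple arithmetic rather than a reported property of the dataset.

```python
from itertools import product

story_numbers = range(2, 13)      # 2- to 12-Story frames
bay_numbers = range(1, 6)         # one to five bays
bay_lengths_m = (5.0, 6.1, 7.6)   # bay lengths in meters

frames = [
    {"stories": stories, "bays": bays, "bay_length_m": length}
    for stories, bays, length in product(story_numbers, bay_numbers, bay_lengths_m)
]

total_data_points = 92_400
print(len(frames))                       # 11 x 5 x 3 = 165 frames
print(total_data_points // len(frames))  # 560 data points per frame on average
```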

4 Analytical procedure

The main purpose of this study is to train ML algorithms for accurate prediction of the IDRmax and the seismic limit-state capacity of RC frames using M-IDAs (e.g., presented in pink color in Fig. 8). M-IDAs can be used to estimate the seismic performance levels of structures for the different IDRmax thresholds introduced by seismic provisions. The analytical procedure presented in Fig. 9 depicts the four main parts used for preparing the prediction models. The first part, in blue, is the modeling and validation of the RC MRFs using ETABS and OpenSees [52] software (see Sect. 3). The green part explains the preparation of the training and testing datasets based on IDRmax and M-IDAs as targets of prediction. In the red part, the ML algorithms were implemented in Python and improved using several innovative methodologies for the prediction of the two aforementioned targets. After validation of the prediction models, some important ML algorithms were selected for the violet part, which shows the second validation of the prediction models on a new RC building to demonstrate the capability of the proposed ML-based model.

Fig. 9 Analytical procedure used to achieve a high-accuracy model

4.1 Data selection method

Although many features can influence the response prediction of structures, introducing all of them reduces the speed of calculations and increases the possibility of overfitting. Therefore, it is necessary to keep only the important features, provided that the prediction accuracy remains unchanged during validation. To do this, different feature selection methods, such as filter methods and wrapper methods, the latter including forward feature selection, backward feature elimination, and exhaustive feature selection, were used to determine the importance of the input features. Figure 10 presents the relative importance of the seven features with the highest scores, obtained by trial and error using the aforementioned methods. The other features were removed since their relative importance was lower than that of these features. For estimating the M-IDA curve, three main features, the number of bays, the fundamental period of the frame, and IDRmax, have higher scores than the other features. On the other hand, for predicting IDRmax as a target, five features, the number of stories, weight, fundamental period of the frame, number of bays, and Sa(T1), scored more than 10%. According to Fig. 10, these seven features were selected in the training and testing datasets for the prediction models.

Fig. 10 Relative importance achieved by trial and error for predicting M-IDA curve (left) and IDRmax (right) as targets of models

It is noteworthy that, to enhance the ability of the methods, the feature selection approaches were used together with an embedded method to reduce the influence of data points that have little effect on the prediction of the selected target. In other words, the developed embedded method reduces the number of data points to keep the computational cost reasonable, while it increases the capability of the ML algorithms and prevents overfitting, which is the most important issue in the performance of the models. Therefore, all ML methods were improved with the developed embedded method in order to increase their ability.

To compare the reliability and capability of the aforementioned ML algorithms, the statistical metrics presented in Table 1 were used. The coefficient of determination, R², is widely used for presenting the accuracy of prediction and can take values between 0.0 and 1.0 (or 0.0% and 100%) to show the spread of the predicted and actual data points around the x = y line. The other metrics compare the actual and predicted values to show the capability of the models for minimizing the error, which is the difference between the actual and predicted values.

Table 1 Statistical metrics used for evaluating the ML models
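Typical metrics of this kind can be computed as in the sketch below. Only R² is named explicitly in the text above; the mean absolute error (MAE) and root mean squared error (RMSE) are included as assumed examples of error metrics, and the sample values are hypothetical.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical actual and predicted values
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(f"R2   = {r_squared(actual, predicted):.3f}")
print(f"MAE  = {mean_absolute_error(actual, predicted):.3f}")
print(f"RMSE = {root_mean_squared_error(actual, predicted):.3f}")
```

For these values the residual sum of squares is 0.10 and the total sum of squares is 5.0, which gives R² = 0.98.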

Twenty ML algorithms were implemented in Python and used as prediction models. A sensitivity analysis was performed using the 3-Story RC frame with three bays of 5.0 m length subjected to PL records for both prediction models, with IDRmax and Sa(T1) as targets. Table 2 compares the statistical metrics used to evaluate the performance of the ML algorithms for predicting IDRmax. Most ML algorithms achieved high values of R², which shows their accuracy. With IDRmax as the target of the testing dataset, eight methods, PLSReg, SReg, VReg, LReg, GReg, MLPReg, SVM, and LSVR, had R² values of 0.384, 0.386, 0.585, 0.350, 0.160, 0.205, 0.259, and 0.232, respectively. Although their prediction accuracy on the training dataset was higher than approximately 90%, their performance on the testing dataset is lower than that of the other algorithms, and they cannot be considered reliable models. In addition, with Sa(T1) as the target of the testing dataset, five algorithms, LReg, PLSReg, LSVR, SReg, and GReg, had R² values of 0.775, 0.774, 0.743, 0.614, and 0.313, respectively. Therefore, these algorithms, whose R² values do not exceed 0.775, cannot be considered reliable models. Comparing the metrics provides good information about the capability of the models and their power for estimating the targets, and these tables can also be used for selecting the best ML methods. To better compare the metrics, a score marker was used, which assigns a number from 1 to 20 to rank the ML methods for each metric. Then, for each ML method, the scores over all metrics were determined to compare their capability. According to the results of Table 2, the BReg, HistGBR, ETReg, RF, ERTReg, GBM, and XGBoost methods achieved scores of 49, 49, 80, 82, 83, 86, and 98, respectively, and are introduced as the best methods. Moreover, the methods of PLSReg, LReg, NuSVR, LSVR, MLPReg, GReg, and SVM had scores of 175, 176, 190, 199, 212, 219, and 243, respectively, at the end of the ranking list.
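One possible reading of the score marker is sketched below: each method receives its rank for every metric, and the ranks are summed so that a lower total indicates a better method. The summation rule, the method names, and the metric values are assumptions made for illustration; ties are not treated specially.

```python
def rank_scores(metrics, higher_is_better):
    """Sum the per-metric ranks of each method (rank 1 = best for that metric)."""
    methods = list(next(iter(metrics.values())))
    totals = {method: 0 for method in methods}
    for name, values in metrics.items():
        ordered = sorted(methods, key=lambda m: values[m], reverse=higher_is_better[name])
        for rank, method in enumerate(ordered, start=1):
            totals[method] += rank
    return totals


# Hypothetical metric values for three hypothetical methods
metrics = {
    "R2": {"A": 0.95, "B": 0.90, "C": 0.60},
    "RMSE": {"A": 0.20, "B": 0.10, "C": 0.50},
}
higher_is_better = {"R2": True, "RMSE": False}

scores = rank_scores(metrics, higher_is_better)
print(sorted(scores.items(), key=lambda item: item[1]))
```

In this toy case, methods A and B share the lowest total, while method C ranks last on both metrics and therefore receives the highest total.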

Table 2 Comparison of statistical metrics assuming the 3-Story RC frame with three bays having bay lengths of 5.0 m as test data for predicting IDRmax

According to the results of Table 3, the ANNs, HistGBR, XGBoost, RF, NuSVR, BReg, and ETReg methods achieved scores of 49, 49, 66, 73, 81, 86, and 93, respectively, and are introduced as the best models, while the methods of VReg, PLSReg, LReg, LSVR, SReg, and GReg, with scores of 190, 215, 222, 236, 244, and 250, respectively, are introduced as weak prediction models. The statistical indicators used for calculating the error depend on the actual and predicted values; therefore, a higher error value indicates a larger dispersion of the predicted values. Although the SVM method performed poorly in predicting IDRmax of the 3-Story RC frame, it achieved an R² value of 0.987 for predicting Sa(T1), which shows acceptable performance for this target.

Table 3 Comparison of statistical metrics assuming the 3-Story RC frame with three bays having bay lengths of 5.0 m as test data for predicting Sa(T1)

5 Performance of prediction models

The most important part of building the prediction models is to prepare the datasets according to the important features. The seven important features related to each type of prediction (i.e., Sa(T1) or IDRmax) were plotted in Fig. 10. According to these targets, the training dataset contained 92,400 data points obtained by performing IDAs. In other words, 92,400 nonlinear time history analyses were performed with increasing intensity measures (i.e., IDA) to prepare a large database for prediction. After preparing suitable datasets, the ML algorithms with higher prediction accuracy (see Tables 2 and 3) were selected for the seismic response prediction models. Figures 11 and 12 present the prediction results of IDRmax for the 6-Story and 8-Story RC MRFs with five numbers of bays subjected to PL records. It should be noted that the selected RC MRFs were removed from the training datasets during the prediction. For the 6-Story RC MRFs with one, two, three, four, and five bays, the best-performing ML algorithms (HistGBR, ANNs, and BReg) achieved prediction accuracies of 90.2%, 93.5%, 94%, 95.4%, and 96.3%, respectively. For the 8-Story RC MRFs with one, two, three, four, and five bays, the best-performing ML algorithms (ETReg, BReg, and ANNs) achieved prediction accuracies of 93.8%, 94.3%, 93.4%, 95%, and 95.3%, respectively. In all results, the algorithms gave the most precise predictions for IDRmax lower than 4.0%, where the points lie close to the blue lines. Therefore, the mentioned algorithms can be used as precise prediction models for IDRmax lower than 4.0% in all types of RC MRFs.

Fig. 11 IDRmax prediction results for the 6-Story RC MRFs as testing datasets including PL records

Fig. 12 IDRmax prediction results for the 8-Story RC MRFs as testing datasets including PL records

To present the estimation accuracy of the M-IDA curve models, a high value of R² alone is not enough, because each data point of the curve is related to the points before and after it. Therefore, the best way to present the power of an algorithm is to plot both the actual and predicted curves. Figures 13 and 14 show the predicted M-IDAs versus the actual M-IDA curves of the 3-Story and 7-Story RC MRFs with five numbers of bays subjected to PL records. The two most precise predicted M-IDAs were plotted; they show the accuracy of the prediction models used in this study, which can be used for a preliminary prediction of the M-IDA curves of RC MRFs.

Fig. 13 Predicted M-IDAs of the 3-Story RC MRFs as testing datasets including PL records

Fig. 14 Predicted M-IDAs of the 7-Story RC MRFs as testing datasets including PL records

6 Generality of prediction models

In Sect. 5, the capability of the ML algorithms for predicting the IDRmax and Sa(T1) of the aforementioned RC frames was presented. To present the overall accuracy of the proposed ML-based prediction of IDRmax and of Sa(T1) as a target for the M-IDA curve, four case study RC buildings with different structural parameters were assumed to show the reliability and applicability of the prediction models. Figure 15 presents the structural plan and the documentation of beams and columns of a five-Story RC frame that was used for the performance evaluation of the prediction models. It should be added that the testing dataset prepared for this RC frame must have the same important features as the training dataset of the prediction models (see Fig. 10). Therefore, the selected RC frame was modeled in ETABS and OpenSees [52] software, and IDAs were performed for the targets of Sa(T1) and IDRmax using the assumed seismic records. The results of the analysis were prepared as a testing dataset; then, the trained prediction models were used to estimate IDRmax and Sa(T1) as targets.

Fig. 15 Five-Story RC frame used for the performance evaluation of trained prediction models

Given that no experimental sample is available to validate the prediction models, four cases of the selected RC building with different input features were assumed to challenge the ability of the proposed ML-based models. In Case A, the bay length of the five-Story RC frame was set to 6.5 m. In Case B, the bay length and story elevation of the five-Story RC frame were set to 6.5 m and 3.8 m, respectively. For Cases C and D, the weight of the five-Story RC frame was reduced by 10% and 20%, respectively, compared to the loads assumed in Sect. 3, while the bay length and story elevation were set to 6.5 m and 3.8 m, respectively. These four cases have different input features to challenge the possibility of using the proposed ML-based models for any type of RC frame under the two record subsets. The fundamental periods of Case A, Case B, Case C, and Case D were equal to 1.351 s, 1.291 s, 1.225 s, and 1.156 s, respectively. Therefore, all input features of the assumed cases are different from those of the training models. Figure 16 presents the comparison of R² for the ML algorithms used to predict IDRmax of the five-Story RC frames assuming PL records. The four algorithms BReg, ETReg, ERTReg, and ANNs had the highest prediction accuracies, equal to 95.7%, 93.19%, 90.27%, and 90%, respectively, for the prediction of IDRmax in Case A, and equal to 92.78%, 90.31%, 87.85%, and 90.1%, respectively, for the prediction of IDRmax in Case B. Moreover, in Case C, the ANNs and BReg algorithms achieved prediction accuracies of 92.9% and 89.76%, respectively, while in Case D, the BReg, ETReg, and ANNs algorithms had prediction accuracies of 92.5%, 89.93%, and 87.32%, respectively. Figure 17 depicts the scatter plots of the predicted IDRmax of the four cases of the five-Story RC frames for the best ML algorithm including PL records. It should be noted that similar results were observed for NP records; only the results for PL records are presented for brevity.

Fig. 16 Pie charts of ML algorithms for predicting IDRmax of the five-Story RC frames assuming PL records

Fig. 17 Predicted IDRmax of the five-Story RC frames in the best ML algorithm including PL records

Figure 18 presents the pie charts of the ML-based models for estimating the M-IDA curve of the five-Story RC frames assuming PL records. The ML methods achieved R² values higher than 0.97 for predicting the testing datasets of the four cases. Although the pie charts show R² values of more than 0.97 for the predicted M-IDA curves, some of the ML algorithms cannot fit the actual M-IDA curve of the RC frames. Therefore, the ML algorithms were improved to achieve the best fitting curves. Figure 19 presents the predicted M-IDAs fitted by the improved ML algorithms. The ANNs and XGBoost algorithms had the best fitting curves and can be considered the most reliable prediction models.

Fig. 18 Pie charts of ML methods for estimating the M-IDA curve of the five-Story RC frames assuming PL records

Fig. 19 Predicted M-IDA curves of the five-Story RC frames in the best ML methods including PL records

To determine the seismic performance levels of the five-Story RC frames, allowable IDRmax values of 1.0%, 2.0%, and 4.0% were assumed for the Immediate Occupancy (IO), Life Safety (LS), and Collapse Prevention (CP) performance levels, respectively. These limit states follow Table C1–3 of FEMA 356 [63], which limits the damage states of the primary structural elements of the lateral force-resisting system. For these performance levels, Table 4 lists the actual values obtained from the M-IDAs of the RC frames and the values predicted by the improved ML algorithms. The predicted values at all performance levels are very close to the actual values; thus, the prediction models are reliable and can be used by researchers to predict the seismic performance levels of RC frames.

Table 4 Predicted seismic performance levels of the five-Story RC frames based on the M-IDA curves including PL records
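To illustrate how a performance-level value can be read from an M-IDA curve, the sketch below linearly interpolates Sa(T1) at the IDRmax limits of 1.0%, 2.0%, and 4.0% (IO, LS, CP). Linear interpolation between curve points is an assumption made here for illustration; the study does not state how the values in Table 4 were extracted. The function name `sa_at_idr` and the curve points are hypothetical.

```python
def sa_at_idr(idr_curve, sa_curve, idr_limit):
    """Interpolate Sa(T1) at the first point where the curve reaches idr_limit.

    idr_curve and sa_curve are paired points of an M-IDA curve, ordered by
    increasing Sa(T1). Returns None if the curve never reaches idr_limit.
    """
    points = list(zip(idr_curve, sa_curve))
    for (idr0, sa0), (idr1, sa1) in zip(points, points[1:]):
        if idr0 <= idr_limit <= idr1:
            if idr1 == idr0:
                return sa0  # flat segment: take its first point
            fraction = (idr_limit - idr0) / (idr1 - idr0)
            return sa0 + fraction * (sa1 - sa0)
    return None


# Allowable IDRmax (%) per performance level, as assumed in the text.
limits = {"IO": 1.0, "LS": 2.0, "CP": 4.0}

# Hypothetical M-IDA curve points (IDRmax in %, Sa(T1) in g); not study data.
idr_points = [0.0, 0.5, 1.5, 3.0, 5.0]
sa_points = [0.0, 0.2, 0.5, 0.8, 1.0]

capacities = {level: sa_at_idr(idr_points, sa_points, limit)
              for level, limit in limits.items()}
for level, sa in capacities.items():
    print(f"{level}: Sa(T1) = {sa:.3f} g")
```

Applying the same function to the actual and the predicted M-IDA curves gives the pairs of values that a comparison such as Table 4 relies on.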

7 Graphical user interface

A preliminary estimate of the performance levels helps designers identify weaknesses in the designed buildings and can be used in the vulnerability assessment of structures. To make the results of this research more accessible, a Graphical User Interface (GUI) was introduced. The GUI receives input parameters related to the RC frame and the seismic performance-level limits, and provides the predicted Sa(T1) for the seismic limit-state performance levels of RC MRFs prescribed by FEMA 356 [63]. The reliability of the prediction models was discussed in Sect. 6, and the GUI can plot the predicted ML-based M-IDA curve while reducing the need for complex modeling and analyses. The input parameters can be easily obtained for a given structure, and the period of the structure can be calculated with the formulas provided by seismic provisions (e.g., ASCE 7-16 [48]).
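As one example of a code-based period formula, the sketch below evaluates the empirical approximate period Ta = Ct · hn^x of ASCE 7-16, using the metric coefficients for concrete moment-resisting frames (Ct = 0.0466, x = 0.9, hn in meters). Whether the GUI expects this particular expression is not stated in the text, so its use here is an assumption; the function name `approximate_period` and the uniform story height are also hypothetical.

```python
def approximate_period(height_m, ct=0.0466, x=0.9):
    """Approximate fundamental period Ta = Ct * hn**x, in seconds.

    Default ct and x are the ASCE 7-16 metric coefficients for concrete
    moment-resisting frames; height_m is the structural height hn in meters.
    """
    if height_m <= 0:
        raise ValueError("structural height must be positive")
    return ct * height_m ** x


# Assumption for illustration: five stories, each 3.8 m high.
n_stories = 5
story_height_m = 3.8
ta = approximate_period(n_stories * story_height_m)
print(f"Approximate period Ta = {ta:.3f} s")
```

This expression is an empirical estimate, so it need not match a period obtained from a structural model of the same frame.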

8 Conclusions

Recent studies confirm that complex modeling and analysis are needed to determine the seismic responses and seismic performance levels of RC structures, while most of these analyses are time-consuming and require high-performance computer systems. In addition, the unpredictable nature of seismic events is another factor that affects the assessment of seismic performance. To overcome these issues, this research proposed ML-based prediction models to estimate the IDRmax and the Sa(T1) of the M-IDA curve of RC frames. The results can be summarized as follows:

  • With IDRmax as the prediction target, eight algorithms (PLSReg, SReg, VReg, LReg, GReg, MLPReg, SVM, and LSVR) had low R2 values (i.e., less than 65%) and cannot be used as prediction models. With Sa(T1) as the target, eight algorithms (KNR, PLSReg, SReg, LReg, GReg, MLPReg, SVM, and LSVR) had low R2 values (i.e., less than 77%). In addition, for allowable IDRmax values lower than 4.0%, the predictions of the ML algorithms lay on the x = y line, which shows the ability of the proposed methods to estimate IDRmax in all RC MRFs.

  • With the curve-fitting ability of the ML methods improved for the allowable performance levels (i.e., IDRmax values of 1.0%, 2.0%, and 4.0%), three algorithms (XGBoost, ANNs, and NuSVR) can predict the seismic performance levels of the five-Story RC frame from the predicted M-IDA curves. Therefore, they are proposed as prediction models for any type of RC frame.

  • Four case-study RC buildings were assumed to check the reliability of the prediction models. In Case A, the BReg, ETReg, ERTReg, and ANNs algorithms predicted the IDRmax with accuracies of 95.7%, 93.19%, 90.27%, and 90%, respectively, and in Case B, the same algorithms achieved accuracies of 92.78%, 90.31%, 87.85%, and 90.1%, respectively. In Case C, the ANNs and BReg algorithms, with accuracies of 92.9% and 89.76%, respectively, and in Case D, the BReg, ETReg, and ANNs algorithms, with accuracies of 92.5%, 89.93%, and 87.32%, respectively, can be considered the best prediction models.

  • A Graphical User Interface (GUI) was proposed for the preliminary estimation of the seismic performance levels of RC frames based on the most important features, which are introduced as input parameters. In addition, the GUI can plot the predicted M-IDA curve for both types of seismic events, which facilitates the seismic vulnerability assessment of RC buildings. Moreover, there is no limit on the allowable IDRmax thresholds that can be introduced, and users can obtain the prediction results for any selected IDRmax.

  • In operation, the GUI (a) receives the main structural features that affect the seismic response and seismic limit-state capacities, (b) receives the IDRmax values selected by the user (e.g., four IDRmax values are shown in Fig. 20), (c) predicts the M-IDA curve of the introduced RC frame, and (d) presents the Sa(T1) corresponding to each selected IDRmax.

Fig. 20

GUI introduced for predicting the seismic limit-state capacity of RC MRFs