Machine learning-based seismic response and performance assessment of reinforced concrete buildings

The complexity and unpredictable nature of earthquakes make them unique external loads, and no single formula exists for predicting seismic responses. Hence, this research aims to implement the most well-known Machine Learning (ML) methods in Python to propose a prediction model for the seismic response and performance assessment of Reinforced Concrete Moment-Resisting Frames (RC MRFs). To prepare a training dataset of 92,400 data points for developing data-driven techniques, Incremental Dynamic Analyses (IDAs) were performed on 165 RC MRFs with two- to twelve-Story elevations and bay lengths of 5.0 m, 6.1 m, and 7.6 m, assuming near-fault seismic excitations. Then, important structural features were included in the datasets to train and test the ML-based prediction models, which were improved with innovative techniques. The results show that the improved algorithms have higher R² values for estimating the Maximum Interstory Drift Ratio (IDRmax), and that two improved algorithms, artificial neural networks and extreme gradient boosting, can estimate the Median of IDA curves (M-IDAs) of RC MRFs, which can be used to estimate the seismic limit-state capacity and assess the performance of existing or newly constructed RC buildings. To validate the generality and accuracy of the proposed ML-based prediction model, a five-Story RC building with different input features was used, and the results are promising. Finally, a graphical user interface is introduced as a user-friendly tool to help researchers estimate the seismic limit-state capacity of RC buildings while reducing the computational cost and analytical effort.


Introduction
The vulnerability of a building can be evaluated either by in-situ data analysis with non-destructive methods, known as structural health monitoring, or by numerical analysis of structural models. The main idea of such methods is to evaluate the building performance under operating conditions. Although in-situ techniques can provide a wide range of data, practical limitations, such as installing sensors or actuators and mechanical problems over time, can hinder the performance assessment of structures [1][2][3]. Therefore, this approach can be complemented by response prediction methods for buildings subjected to seismic excitations.
Nowadays, the seismic probabilistic assessment of a building requires complicated analyses using a precise finite element model, which can be a time-consuming process when evaluating different limit states (e.g., see [4,5]). Due to the unpredictable nature of ground motions, it is essential to predict the nonlinear structural response under seismic loads so that precautions can be taken to reduce the probability of collapse. Several approaches can be employed to perform nonlinear analysis. Nonlinear static analysis, also known as pushover analysis, provides information about the base shear versus top floor displacement, whereas nonlinear time history analysis uses pre-recorded earthquakes scaled according to the acceleration spectrum prescribed by the design code. The most accurate estimates of seismic response are obtained from nonlinear time history analysis and Incremental Dynamic Analysis (IDA) using prior seismic events and finite element methods [6][7][8]. Predicting the seismic response with these approaches requires complex models and time-consuming analyses, while simplified models (e.g., single-degree-of-freedom models) are computationally efficient but reproduce the behavior of real structures less accurately. Therefore, there is a need to introduce a novel Machine Learning (ML)-based method to efficiently and accurately predict the seismic response of RC frames.
Finding the seismic capacity of buildings can help engineers obtain a preliminary prediction of the performance levels of a designed building. Kazemi et al. [9,10] proposed factors for modifying and estimating the collapse capacity of colliding steel Moment-Resisting Frames (MRFs) and of colliding Reinforced Concrete (RC) and steel frames [11]. It should be noted that the proposed factors were obtained from complex modeling and analysis; therefore, there is a need for a prediction model that avoids such prohibitively complex analysis. Recently, ML algorithms have been applied in many civil engineering areas, such as the failure mode of steel base-plate connections [12], damage identification of bridges [13], and the damage state of steel frames [14] and RC beams [15]. ML methods are divided into two main groups, supervised and unsupervised algorithms; seismic response prediction can be treated as supervised learning using training and testing datasets, with the possibility of assuming n features for n samples. Therefore, in this method, it is possible to take the important features into account [16][17][18]. Huang et al. [19] proposed a backpropagation neural network to predict the seismic response of structures. Yinfeng et al. [20] used the Support Vector Machine (SVM) algorithm for predicting the nonlinear time history response of structures. Then, Lagaros and Papadrakakis [21] improved neural networks for predicting the nonlinear time history response of a three-dimensional building using six seismic excitations. De Lautour and Omenzetter [22] developed a methodology for estimating structural responses using pattern recognition of damage. ML algorithms have also attracted researchers' interest for nonlinear modal analysis [23], predicting seismic responses for deriving fragility curves [24], and predicting the maximum displacements of isolated pendulum systems [25]. Oh et al. [26] developed a neural network model for predicting the seismic response of buildings based on the correlation of records using 2700 artificial records. Luo and Paal [27] proposed a novel artificial methodology for seismic response prediction of RC structures using datasets of 272 RC columns.
It is confirmed that there is no unique formula for predicting the Maximum Interstory Drift Ratio (IDRmax) and the Median of IDA curves (M-IDAs) for any type of RC building. The purpose of this research is to develop a powerful ML-based tool by employing innovative data sampling and hyperparameter optimization methods, such as the fine-tune method, halving search strategy, grid search method, and k-fold cross-validation. For this purpose, a wide range of data points covering 165 RC MRFs with different bay lengths and numbers of bays was numerically determined to prepare the training dataset. The ML-based prediction model can then be used to estimate the seismic response and seismic limit-state capacities of RC buildings, providing a preliminary estimate of the IDRmax and M-IDAs of existing and newly constructed buildings. The seismic response prediction results would help designers understand the behavior of the designed building and, based on that behavior, control the performance of structural elements to postpone seismic damage. In other words, estimating the IDRmax can be used for predicting the maximum deformation of buildings, and predicting the Sa(T1) of M-IDAs can be applied to the assessment of seismic performance levels. Finally, the results of this research were used to introduce an estimation tool based on the developed ML algorithms.

Artificial neural network
Due to the strong predictive ability of Artificial Neural Networks (ANNs), they can be trained for different problems, such as positioning site facilities [28], the seismic limit-state performance of bridge piers [29], estimating the fracture toughness of rocks [30], optimizing the consumption of energy [31], estimating the compressive strength of steel fiber-reinforced concrete [32], seismic vulnerability assessment of RC frames [33], and seismic response prediction of structures [34]. ANNs contain three main parts, the input layer, hidden layers, and output layer, which are connected by nonlinear functions with adjustable weights. The weight of each neuron can increase or decrease the strength of a connection for the purpose of minimizing the loss function or error (i.e., the difference between the predicted and actual values). Backward and forward propagation methods can be used to recalculate the weights of each neuron from the previous iteration to minimize the error; the process is then repeated with the newly adjusted weights to achieve a reliable model. The backward propagation method is presented in Fig. 1. In this study, IDRmax and Sa(T1) were defined as targets for the backward and forward propagation ANNs. Moreover, the Multi-layer Perceptron Regressor (MLPReg) considers a linear function to predict the seismic responses of RC structures.
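The forward and backward propagation steps described above can be illustrated with a minimal sketch. It is not the network used in this study: the single input, the four tanh hidden neurons, the learning rate, and the name train_tiny_mlp are all assumptions chosen only to keep the example short and dependency-free.

```python
import math
import random


def train_tiny_mlp(samples, hidden=4, lr=0.1, epochs=500, seed=0):
    """Train a 1-input, 1-output network with one tanh hidden layer.

    samples is a list of (x, target) pairs. Returns (predict, losses),
    where losses[i] is the mean squared error before the i-th update.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output
    b2 = 0.0
    n = len(samples)
    losses = []

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    for _ in range(epochs):
        gw1, gb1, gw2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        for x, target in samples:
            h, y = forward(x)                 # forward propagation
            err = y - target
            loss += err * err / n
            d_y = 2.0 * err / n               # derivative of the loss w.r.t. y
            gb2 += d_y
            for j in range(hidden):           # backward propagation (chain rule)
                gw2[j] += d_y * h[j]
                d_pre = d_y * w2[j] * (1.0 - h[j] ** 2)
                gw1[j] += d_pre * x
                gb1[j] += d_pre
        losses.append(loss)
        for j in range(hidden):               # gradient-descent weight update
            w1[j] -= lr * gw1[j]
            b1[j] -= lr * gb1[j]
            w2[j] -= lr * gw2[j]
        b2 -= lr * gb2

    return (lambda x: forward(x)[1]), losses
```

With a hypothetical toy dataset such as samples = [(i / 10, (i / 10) ** 2) for i in range(11)], the recorded loss decreases as the weights are adjusted. In practice a library implementation would replace this hand-written loop.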

Random decision forest
Random decision Forest (RF) can be employed for both regression and classification problems. The RF algorithm builds an ensemble of multiple bagging models in parallel, each trained on a different subset of the training data, and obtains the final result by majority voting (or, in regression, by averaging the individual predictions). Figure 2 presents the RF algorithm with the bagging principle.
Although the RF algorithm can be classified as a decision tree method, it considers subsets of data to solve the overfitting problem while selecting random observations instead of a set of formulas [35]. It should be noted that different parameters were selected by trial and error to balance bias and variance, overcome the overfitting problem, and achieve an optimized prediction model. Moreover, different types of RF algorithms were considered, namely the Extra-Trees Regressor (ETReg), which randomly selects decision trees to fit the input data, and the Extremely Randomized Tree Regressor (ERTReg), which uses random tree selection to improve the calculations.
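The bagging principle behind RF can be sketched as follows. This is a simplified illustration, not the configuration used in this study: it uses one-split trees (stumps) on a single feature, and the names fit_stump and fit_bagged_stumps are hypothetical. Each stump is trained on a bootstrap sample, and the regression output is the average of the individual predictions.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) on a single feature."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:                      # the largest x cannot split the data
            continue
        lm, rm = mean(left), mean(right)
        sse = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:                       # all x identical: constant model
        constant = mean(ys)
        return lambda x: constant
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def fit_bagged_stumps(xs, ys, n_estimators=25, seed=0):
    """Train stumps on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(stump(x) for stump in stumps)
```

A full random forest additionally draws a random subset of features at each split, which this single-feature sketch cannot show.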

Boosting algorithms
The boosting principle is another way of combining decision trees. In this principle, weak learners are combined in sequential order to create a strong model with higher prediction accuracy. The Adaptive Boosting (AdaBoost) algorithm combines weak base learners, such as decision trees with a single split, and weights the data points to improve the accuracy of estimation [38]. The Gradient Boosting Machine (GBM) comes from the idea of improving weak learners to enhance their final results by minimizing the loss function. Moreover, Histogram-based Gradient Boosting Regression (HistGBR) uses a quantization method for splitting the features, which gives a higher prediction speed compared to GBM. To control the accuracy of the results, the following formula can be used considering the initial probability equal to 0.5, and in each step, the value can be compared with the previous step to find an optimized model.
Extreme Gradient Boosting (XGBoost) is an improved GBM algorithm with a regularization factor, λ, that reduces the influence of small leaves [39,40]. In this study, a fine-tuned XGBoost model was used, in which the number of trees and other parameters were varied to find the best target based on the following formula:
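As an illustration of the sequential principle described in this section, the sketch below boosts one-split trees on the residuals of a squared-error loss. It is a simplified stand-in, not the XGBoost library or the fine-tuned model of this study. The leaf value sum(residuals) / (count + reg_lambda) mirrors the way a regularization factor λ shrinks sparsely populated leaves; the split criterion (plain squared error), the function names, and the default values are assumptions.

```python
from statistics import mean


def fit_stump(xs, residuals, reg_lambda=1.0):
    """One-split regression tree; needs at least two distinct x values.

    The split is chosen by squared error. Each leaf value is
    sum(residuals) / (count + reg_lambda), so small leaves are shrunk.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:   # skip max so the right leaf is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lm, rm = mean(left), mean(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold,
                    sum(left) / (len(left) + reg_lambda),
                    sum(right) / (len(right) + reg_lambda))
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1, reg_lambda=1.0):
    """Fit stumps sequentially, each one on the current residuals.

    Returns (predict, history); history[i] is the training mean squared
    error before round i.
    """
    base = mean(ys)
    preds = [base] * len(ys)
    stumps, history = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        history.append(mean(r * r for r in residuals))
        stump = fit_stump(xs, residuals, reg_lambda)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    return predict, history
```

For squared-error loss the training error recorded in history never increases from one round to the next, which is the "compare with the previous step" check mentioned above in its simplest form.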

Support vector machine
Support Vector Machine (SVM) is selected as a decision boundary method capable of using a hyperplane based on marginal distances in two-dimensional and three-dimensional spaces [41]. In addition, Nu-Support Vector Regression (NuSVR), which uses the ν parameter to control the number of support vectors [42], and Linear Support Vector Regression (LSVR), which considers functions for loss and penalties [43], were assumed in order to find a suitable model for estimating IDRmax and Sa(T1). To enhance the performance of the ML methods during training and reduce the risk of losing important data, k-fold cross-validation was employed. Figure 3 presents the k-fold cross-validation methodology, in which the training and testing datasets are 70-80% and 30-20% of the total data points, respectively [44]. It is worth mentioning that k-fold cross-validation with different values of k was applied to the assumed ML algorithms to find the k with the highest performance.
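The k-fold splitting described above can be sketched in a few lines. The function name k_fold_indices and the shuffling seed are assumptions for illustration; with k = 5, each fold uses 80% of the data for training and 20% for testing, which is one end of the range quoted above.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]      # k nearly equal parts
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

Every data point appears in exactly one testing fold, so no sample is left out of the evaluation. Repeating the procedure for several values of k and comparing the averaged scores is one way to select k.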

Regression models
Some important regression algorithms that can be used for IDRmax prediction are supervised regression models that do not belong to the abovementioned categories. For example, these models do not use hidden layers (as ANNs do) or boosting (as XGBoost does); therefore, this subsection covers the remaining ML algorithms used in this research, which have different prediction abilities. The response prediction of the Voting Regressor (VReg) is based on the average of the individual results, while K-Nearest Neighbor Regression (KNR) bases its estimate on the mean of the neighboring data points. On the other hand, Gaussian Process Regression (GPReg) renormalizes the targets to a zero mean and maximizes the log marginal likelihood of the data points. Linear Regression (LReg) fits a linear estimation model by minimizing the residual sum of squares between the predicted and actual values. In addition, the Gamma Regressor (GReg) uses the strategy of combining data points with an inverse function and their logarithmic unit deviance [45]. The algorithm that uses the strength of individual estimators to find the final estimator of the prediction model is known as the Stacking Regressor (SReg) (for more detail, see [46]). Partial Least Squares Regression (PLSReg) is another regression model that can assume the maximum multidimensional direction for data points to capture the fundamental relations between inputs and outputs [47]. Since Python is freely accessible and its libraries offer great possibilities for developing ML algorithms, Python was selected as a general-purpose programming language for implementing the ML methods. Therefore, all assumed ML algorithms were developed in Python, and different resampling strategies, such as the fine-tune method, halving search strategy, grid search method, and k-fold cross-validation, were used to improve them as prediction models.
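The halving search strategy mentioned above can be sketched as a successive-halving loop: all candidate hyperparameter sets are scored on a small budget (for example, a small number of training samples), the best fraction is kept, and the budget is increased. The sketch below is a generic illustration under these assumptions, not the implementation used in this study; halving_search and the score callable are hypothetical names.

```python
def halving_search(candidates, score, min_budget, max_budget, factor=2):
    """Successive halving over a list of candidate parameter sets.

    score(candidate, budget) must return a number where higher is better.
    In each round the best 1/factor of the survivors is kept and the
    budget is multiplied by factor.
    """
    survivors = list(candidates)
    budget = min_budget
    while len(survivors) > 1 and budget <= max_budget:
        ranked = sorted(survivors, key=lambda c: score(c, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // factor)]
        budget *= factor
    return survivors[0]
```

A plain grid search corresponds to scoring every candidate once on the full budget. The halving variant spends most of the budget on the promising candidates only, at the risk of discarding a candidate that would have improved with more data.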

Modeling process
To train the ML algorithms, eleven types of RC buildings with two- to twelve-floor elevations (i.e., 2- to 12-Story buildings) and three bay lengths (i.e., 5 m, 6.1 m, and 7.6 m), with the plan presented in Fig. 4, were assumed. All buildings were modeled in ETABS software assuming soil type D, acceleration parameters of SD1 = 0.6 g and SDS = 1.0 g for a construction site of high seismicity, and design parameters of R = 8, Cd = 5.5, and Ω = 3 in accordance with ASCE 7-16 [48]. It is noteworthy that the acceleration parameters of the construction site were obtained from the USGS website [49]. In addition, a floor dead load of 8.4 kN/m² and a floor live load of 2.4 kN/m² were applied to all floor levels of the buildings. To design the structural elements, a concrete compressive strength of 34.5 MPa (i.e., 5 ksi, see Table 6-2 in reference [50]) was used [51]. Details of the structural elements of the RC frames with a bay length of 6.1 m are presented in Figs. 5, 6 and 7. To perform collapse analysis, all buildings were modeled as two-dimensional RC frames in OpenSees [52], with a leaning column representing the gravity columns not included in the models to account for the P-delta effects [53][54][55][56].
In addition, the two-dimensional frames were modeled and verified against their corresponding buildings following the modeling procedures used by Haselton and Deierlein [50] and Kazemi et al. [9-11, 57, 58]. According to these procedures, the plastic hinge models for simulating seismic collapse presented in Fig. 4 were developed by Ibarra et al. [59] and Altoontash [60]. It should be noted that, to represent the real condition of RC buildings, all panel zones were modeled, and concentrated plastic hinge models capable of capturing seismic collapse were used at the ends of structural elements (for more detail on modeling, see [50]).
To train the ML algorithms, 165 RC MRFs were assumed, having one, two, three, four, and five bays, 2- to 12-Story elevations, and bay lengths of 5 m, 6.1 m, and 7.6 m. To assess IDRmax at different intensity measures and the seismic limit-state curves of all 165 RC MRFs, IDAs were performed with the spectral acceleration at the fundamental period of the structure, Sa(T1), as the intensity measure and IDRmax as the engineering demand parameter, considering the near-fault Pulse-like (PL) and No-Pulse (NP) records introduced by FEMA-P695 [61]. To perform the IDAs, an algorithm was developed to implement the hunt and fill methodology using both OpenSees [52] and MATLAB [62] to reduce the analysis time. It is worth mentioning that the programming code was developed in MATLAB [62] to control the [...] It should be noted that there is no restriction on the increment steps of the intensity measure in this study; therefore, the results are distributed over different ranges of Sa(T1). The training datasets were prepared with the important features of weight, aspect ratio, reinforcement ratio for beams and columns, story number, bay length and total height of the RC frames, Sa(T1), the direction and RSN number of the record, fundamental period (T1), and IDRmax in each step of the analysis, which were selected by trial and error. In addition, for the seismic response prediction models, the IDRmax of the selected RC frames was considered as the target in the testing dataset, and for the seismic limit-state capacity prediction models, the Sa(T1) of the M-IDAs of the selected RC frames was considered as the target of prediction in the testing dataset. Therefore, two main datasets were considered to train and test the prediction models. In total, the training dataset contained 92,400 data points obtained by performing IDAs.
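The hunt and fill idea can be sketched as three phases: hunt upward with growing steps until collapse is flagged, bracket the collapse intensity from the last stable run, and fill the widest remaining gaps. The sketch below is a simplified illustration and not the OpenSees/MATLAB code developed in this study. The collapse threshold of 10% drift, the step sizes, the run budget, and the function toy_frame (a made-up softening response standing in for a nonlinear time history analysis) are all assumptions.

```python
def hunt_and_fill(run_analysis, collapse_idr=0.10, first_im=0.1,
                  step=0.2, step_growth=0.1, resolution=0.1, max_runs=12):
    """Trace one IDA curve with a simplified hunt-and-fill scheme.

    run_analysis(im) must return IDRmax for the intensity measure im.
    Returns (points, collapse_im): the sorted non-collapsed (im, idr)
    pairs and the lowest intensity at which collapse was flagged.
    """
    points, runs, im, collapse_im = [], 0, first_im, None

    # Hunt: raise the intensity with growing steps until collapse is flagged.
    while runs < max_runs and collapse_im is None:
        idr = run_analysis(im)
        runs += 1
        if idr >= collapse_idr:
            collapse_im = im
        else:
            points.append((im, idr))
            im += step
            step += step_growth

    # Bracket: approach collapse from the last stable run in one-third steps.
    while runs < max_runs and collapse_im is not None and points:
        last_stable = max(p[0] for p in points)
        if collapse_im - last_stable < resolution:
            break
        trial = last_stable + (collapse_im - last_stable) / 3.0
        idr = run_analysis(trial)
        runs += 1
        if idr >= collapse_idr:
            collapse_im = trial
        else:
            points.append((trial, idr))

    # Fill: spend the remaining runs halving the widest gap below collapse.
    while runs < max_runs and points:
        ims = [0.0] + sorted(p[0] for p in points)
        gap, left = max((b - a, a) for a, b in zip(ims, ims[1:]))
        trial = left + gap / 2.0
        points.append((trial, run_analysis(trial)))
        runs += 1

    return sorted(points), collapse_im


def toy_frame(im):
    """Hypothetical softening response, not a structural model."""
    return float("inf") if im >= 2.0 else 0.02 * im / (1.0 - im / 2.0)
```

Because the step sizes are not fixed in advance, each record ends up with its own set of intensity levels, which is consistent with the observation above that the results are distributed over different ranges of Sa(T1).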

Analytical procedure
The main purpose of this study is to train ML algorithms for accurate prediction of the IDRmax and the seismic limit-state capacity of RC frames using M-IDAs (e.g., presented in pink color in Fig. 8). M-IDAs can be used to estimate the seismic performance levels of structures assuming different thresholds of IDRmax introduced by seismic provisions. The analytical procedure presented in Fig. 9 depicts the four main parts used for preparing the prediction models. The first part, in blue, is the modeling and validation of the RC MRFs using ETABS and OpenSees [52] (see Sect. 3). The green part explains the preparation of the training and testing datasets based on IDRmax and M-IDAs as targets of prediction. In the red part, ML algorithms were implemented in Python and improved with some innovative methodologies for the prediction of the two aforementioned targets. After validation of the prediction models, some important ML algorithms were selected for the violet part, which shows the second validation of the prediction models on a new RC building to demonstrate the capability of the proposed ML-based model.
Fig. 5 Structural documentation of the 2-Story, 3-Story, 4-Story, 5-Story, and 9-Story RC MRFs
Figure (structural documentation of the 10-Story, 11-Story, and 12-Story RC MRFs)
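One common way to obtain a median IDA curve from a set of single-record IDA curves is to interpolate every curve at common Sa(T1) levels and take the median IDRmax across records. The sketch below illustrates this under stated assumptions: the exact procedure used in this study is not given in the text, the function names are hypothetical, and values beyond the last point of a curve are simply held constant, which ignores collapse.

```python
from statistics import median


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


def median_ida(curves, sa_levels):
    """Median IDRmax across records at each requested Sa(T1) level.

    curves is a list with one entry per record; each entry is a list of
    (Sa, IDRmax) pairs sorted by Sa.
    """
    result = []
    for sa in sa_levels:
        idrs = [interpolate(sa, [p[0] for p in c], [p[1] for p in c])
                for c in curves]
        result.append((sa, median(idrs)))
    return result
```

The interpolation step is needed because the hunt and fill procedure produces different intensity levels for each record.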

Data selection method
Although many features can influence the response prediction of structures, introducing all of them can reduce the speed of calculations and increase the possibility of overfitting. Therefore, it is necessary to retain the important features while keeping the prediction accuracy unchanged during validation. To do this, different feature selection methods, such as filter and wrapper methods, which include the more suitable methods of forward feature selection, backward feature elimination, and exhaustive feature selection, were used to determine the importance of the input features. Figure 10 presents the relative importance of the seven features with the highest scores, obtained by trial and error using the aforementioned methods. Other features were removed since their relative importance was lower than that of these features. For estimating the M-IDA curve, three main features, the number of bays, the fundamental period of the frame, and IDRmax, have higher scores than the other features. On the other hand, for predicting IDRmax as a target, five features, the number of stories, weight, fundamental period of the frame, number of bays, and Sa(T1), scored more than 10%. According to Fig. 10, these seven features were selected in the training and testing datasets for the prediction models. It is noteworthy that, to enhance the ability of the methods, the feature selection approaches were used simultaneously with an embedded method to reduce the effects of data points with little influence on the predictions of the selected target.
To compare the reliability and capability of the aforementioned ML algorithms, the statistical metrics presented in Table 1 were used. The coefficient of determination, R², is widely used for presenting the accuracy of prediction and typically takes values between 0.0 and 1.0 (or 0.0% and 100%) to show the spread of the predicted and actual data points around the x = y line.
Other metrics compare the actual and predicted values to show the capability of the models for minimizing the error, which is the difference between the actual and predicted values. Twenty ML algorithms were implemented in Python and used as prediction models. A sensitivity analysis was performed using the 3-Story RC frame with three bays and a bay length of 5.0 m subjected to PL records for both prediction models, with IDRmax and Sa(T1) as targets. Table 2 shows the comparison of statistical metrics for the performance evaluation of ML algorithms for predicting IDRmax. It can be seen that most ML algorithms achieved higher values of R², which shows the accuracy of these algorithms. With IDRmax as the target of the testing dataset, eight methods, PLSReg, SReg, VReg, LReg, GReg, MLPReg, SVM, and LSVR, had R² values of 0.384, 0.386, 0.585, 0.350, 0.160, 0.205, 0.259, and 0.232, respectively. Although their prediction accuracy on the training dataset was higher than approximately 90%, their performance on the testing dataset is lower than that of the other algorithms, and they cannot be considered reliable models. In addition, with Sa(T1) as the target of the testing dataset, five algorithms, LReg, PLSReg, LSVR, SReg, and GReg, had R² values of 0.775, 0.774, 0.743, 0.614, and 0.313, respectively. Therefore, these algorithms, which cannot achieve R² values higher than approximately 0.77, are not considered reliable models. Comparing the metrics provides good information about the capability of the models and their power for estimating the targets. These tables can also be used for selecting the best ML methods. To better compare the metrics, a score marker was used, which assigns a number from 1 to 20 to rank the ML methods for each metric. Then, for each ML method, the scores over all metrics were summed to compare their capability.
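For reference, three of the metrics discussed above can be written out as follows. These are common textbook definitions given as a sketch; the exact expressions of Table 1 are not reproduced in the text, so the formulas and function names here are assumptions.

```python
import math


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error, in the units of the target."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def relative_rmse(actual, predicted):
    """Root mean square of the relative errors (actual values non-zero)."""
    n = len(actual)
    return math.sqrt(sum(((a - p) / a) ** 2
                         for a, p in zip(actual, predicted)) / n)
```

With this definition, a model that always predicts the mean of the actual values scores R² = 0, and a model that is worse than the mean scores below zero, which can happen on a testing dataset.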
According to the results of Table 2, the BReg, HistGBR, ETReg, RF, ERTReg, GBM, and XGBoost methods achieved scores of 49, 49, 80, 82, 83, 86, and 98, respectively, and are introduced as the best methods. Moreover, the methods of PLSReg, LReg, NuSVR, LSVR, MLPReg, GReg, and SVM had scores of 175, 176, 190, 199, 212, 219, and 243, respectively, placing them at the end of the ranking list.
According to the results of Table 3, the ANNs, HistGBR, XGBoost, RF, NuSVR, BReg, and ETReg methods achieved scores of 49, 49, 66, 73, 81, 86, and 93, respectively, and are introduced as the best models, while the methods of VReg, PLSReg, LReg, LSVR, SReg, and GReg, with scores of 190, 215, 222, 236, 244, and 250, respectively, are introduced as weak prediction models. The statistical indicators used for calculating the error of the methods depend on the actual and predicted values; therefore, a higher error value indicates a larger dispersion of the predicted values. Although the SVM method had lower performance for predicting the IDRmax of the 3-Story RC frame, it achieved an R² value of 0.987 for predicting Sa(T1), which demonstrates the acceptable performance of this method.
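The score-marker ranking used above can be sketched as a rank sum: for each metric the models are ranked from 1 (best) upward, and the ranks are added per model, so a lower total indicates a better model. The function name rank_scores, the tie handling (ties keep their input order), and the placeholder values in the usage example are assumptions, not values from Tables 2 and 3.

```python
def rank_scores(metrics, higher_is_better):
    """Sum per-metric ranks for each model (lower total is better).

    metrics maps model name -> {metric name: value}.
    higher_is_better maps metric name -> True for metrics such as R2
    and False for error metrics.
    """
    totals = {model: 0 for model in metrics}
    for name, better_high in higher_is_better.items():
        ordered = sorted(metrics, key=lambda m: metrics[m][name],
                         reverse=better_high)
        for rank, model in enumerate(ordered, start=1):
            totals[model] += rank
    return totals


# Placeholder values for three hypothetical models.
example = {
    "model_a": {"R2": 0.95, "RMSE": 0.10},
    "model_b": {"R2": 0.90, "RMSE": 0.20},
    "model_c": {"R2": 0.50, "RMSE": 0.15},
}
totals = rank_scores(example, {"R2": True, "RMSE": False})
```

Here model_a ranks first on both metrics, while model_b and model_c end up with the same total because each beats the other on one metric.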

Performance of prediction models
The most important part of building the prediction models is to prepare the datasets according to the important features. The seven important features related to each type of prediction (i.e., Sa(T1) or IDRmax) were plotted in Fig. 10. For these targets, the training dataset contained 92,400 data points obtained by performing IDAs. In other words, 92,400 nonlinear time history analyses were carried out with increasing intensity measures (i.e., IDA) to prepare the large database for prediction. After preparing suitable datasets, the selected ML algorithms with higher prediction accuracy (see Tables 2 and 3) were used for the seismic response prediction models. Figures 11 and 12 [...]
To present the estimation accuracy of the M-IDA curve models, having high values of R² alone is not enough because of the relations between the values of consecutive data points. Therefore, the best way to present the power of an algorithm is to plot both the actual and predicted curves. Figures 13 and 14 show the predicted M-IDAs versus the actual M-IDA curves of the 3-Story and 7-Story RC MRFs with five types of bays subjected to PL records. The two most precise predicted M-IDAs were plotted; they show the accuracy of the prediction models used in this study and can be used as a preliminary prediction of the M-IDA curves of RC MRFs.

Generality of prediction models
In Sect. 5, the capability of ML algorithms for predicting the IDRmax and Sa(T1) of the aforementioned RC frames was presented. To present the overall accuracy of the proposed ML-based prediction of IDRmax and of Sa(T1) as a target for the M-IDA curve, four case study RC buildings with different structural parameters were assumed to show the reliability and applicability of the prediction models. Figure 15 presents the structural plan and the documentation of beams and columns of a five-Story RC frame that was used for the performance evaluation of the prediction models. It should be added that the testing dataset prepared for this RC frame should have the same important features as the training dataset of the prediction models (see Fig. 10). Therefore, the selected RC frame was modeled in ETABS and OpenSees [52], and IDAs were performed for the targets of Sa(T1) and IDRmax using the assumed seismic records. The results of the analysis were prepared as a testing dataset; then, the trained prediction models were used to estimate IDRmax and Sa(T1) as targets.
Given that it is not possible to have an experimental sample to validate the prediction models, four cases of the selected RC building with different input features were assumed to challenge the ability of the proposed ML-based models. In Case A, the bay length of the five-Story RC frame was set to 6.5 m. In Case B, the bay length and story elevation of the five-Story RC frame were set to 6.5 m and 3.8 m, respectively. For Cases C and D, the weight of the five-Story RC frame was reduced by 10% and 20%, respectively, compared to the loads assumed in Sect. 3, while the bay length and story elevation were set to 6.5 m and 3.8 m, respectively. These four cases have different input features to test whether the proposed ML-based models can be used for any type of RC frame, including the two record subsets. The fundamental periods of Case A, Case B, Case C, and Case D were equal to 1.351, 1.291, 1.225, and 1.156 s, respectively. Therefore, all input features of the assumed cases are different from those of the training models. Figure 16 presents the comparison of R² for the ML algorithms predicting the IDRmax of the five-Story RC frames assuming PL records. Four algorithms, BReg, ETReg, ERTReg, and ANNs, had the higher prediction accuracies of 95.7%, 93.19%, 90.27%, and 90%, respectively, for the prediction of IDRmax in Case A, and of 92.78%, 90.31%, 87.85%, and 90.1%, respectively, for the prediction of IDRmax in Case B. Moreover, in Case C, the ANNs and BReg algorithms achieved prediction accuracies of 92.9% and 89.76%, respectively, and in Case D, the BReg, ETReg, and ANNs algorithms achieved 92.5%, 89.93%, and 87.32%, respectively. Figure 17 depicts the scatter plots of the predicted IDRmax of the four cases of the five-Story RC frames for the best ML algorithm under PL records. It should be noted that similar results were observed for NP records; only the results for PL records are presented for brevity.
Figure 18 presents the pie charts of the ML-based models for estimating the M-IDA curve of the five-Story RC frames assuming PL records. The ML methods achieved R² values higher than 0.97 for predicting the testing datasets of the four cases. Although the pie charts show the highest values of the predicted M-IDA curve with R² of more than 0.97, some of the ML algorithms cannot fit the actual M-IDA curve of the RC frames. Therefore, the ML algorithms were improved to achieve the best fitting curves. Figure 19 presents the M-IDAs fitted by the improved ML algorithms. The ANNs and XGBoost algorithms had the best fitting curves and can be considered the most reliable prediction models.
To determine the seismic performance levels of the five-Story RC frames, the structural performance levels were defined based on allowable IDRmax values of 1.0%, 2.0%, and 4.0%, corresponding to the Immediate Occupancy (IO), Life Safety (LS), and Collapse Prevention (CP) performance levels, respectively. It is noteworthy that the limit states were described according to Table C1-3 in FEMA 356 [63] for limiting the damage states of the primary structural elements of the lateral force-resisting system. According to the allowable performance levels, Table 4 presents the actual values obtained from the M-IDAs of the RC frames and those predicted by the improved ML algorithms. According to Table 4, the [...]

Graphical user interface
A preliminary estimation of the performance levels can help designers identify the weaknesses of designed buildings; therefore, the results can be used for the vulnerability assessment of structures. To make the results of this research more accessible, a Graphical User Interface (GUI) was introduced that receives input parameters related to the RC frame and the seismic limits of the performance levels and provides the predicted Sa(T1) for the seismic limit-state performance levels of RC MRFs prescribed by FEMA 356 [63]. It should be noted that the reliability of the prediction models was discussed in Sect. 6, and the introduced GUI can plot the predicted ML-based M-IDA curve while mitigating the need for complex modeling and analyses. It is noteworthy that the input parameters can be easily obtained for the assumed structure; in addition, the period of the structure can be calculated with the formulas provided by seismic provisions (e.g., ASCE 7-16 [48]).
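The two calculations mentioned above can be sketched as follows. The first function uses the approximate period expression of ASCE 7-16, Ta = Ct hn^x, with the SI coefficients for concrete moment-resisting frames (Ct = 0.0466, x = 0.9, hn in meters). The second reads Sa(T1) off an M-IDA curve by linear interpolation at the IO, LS, and CP drift limits of 1.0%, 2.0%, and 4.0%. This is not the GUI code: the function names and the numerical curve in the usage example are hypothetical.

```python
def approximate_period(height_m, ct=0.0466, x=0.9):
    """Approximate fundamental period Ta = Ct * hn**x (hn in meters)."""
    return ct * height_m ** x


def sa_at_idr(m_ida, idr_limit):
    """Interpolate Sa(T1) on an M-IDA curve at a given IDRmax.

    m_ida is a list of (IDRmax, Sa) pairs with strictly increasing IDRmax.
    """
    for (d0, s0), (d1, s1) in zip(m_ida, m_ida[1:]):
        if d0 <= idr_limit <= d1:
            return s0 + (s1 - s0) * (idr_limit - d0) / (d1 - d0)
    raise ValueError("IDR limit outside the range of the curve")


PERFORMANCE_LIMITS = {"IO": 0.01, "LS": 0.02, "CP": 0.04}

# Hypothetical predicted M-IDA curve, for illustration only.
example_curve = [(0.0, 0.0), (0.01, 0.4), (0.02, 0.7), (0.04, 1.0)]
capacities = {level: sa_at_idr(example_curve, limit)
              for level, limit in PERFORMANCE_LIMITS.items()}
```

Since the user can select any IDRmax threshold inside the range of the curve, the same interpolation applies to limits other than the three listed here.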

Conclusions
Recent studies confirm that complex modeling and analysis are required to determine the seismic responses and seismic performance levels of RC structures, while most analyses are time-consuming and require high-speed computer systems. In addition, the unpredictable nature of seismic events is another factor that affects seismic performance. To overcome this issue, this research proposed ML-based prediction models to estimate the IDRmax and the Sa(T1) for the M-IDA curve of RC frames. The analysis results can be summarized as follows:
• With IDRmax as the prediction target, eight algorithms (PLSReg, SReg, VReg, LReg, GReg, MLPReg, SVM, and LSVR) had low R2 values (i.e., less than 65%) and cannot be used as prediction models. Similarly, eight algorithms (KNR, PLSReg, SReg, LReg, GReg, MLPReg, SVM, and LSVR) had low R2 values (i.e., less than 77%) for predicting Sa(T1) as the target. In addition, for allowable IDRmax values lower than 4.0%, the predicted values of the ML algorithms fell on the x = y line, which shows the ability of the proposed methods to estimate IDRmax in all RC MRFs.
• Considering the curve-plotting ability added to the ML methods based on the allowable performance levels (i.e., IDRmax values of 1.0%, 2.0%, and 4.0%), three algorithms (XGBoost, ANNs, and NuSVR) can predict the seismic performance levels of the five-Story RC frame using the predicted M-IDA curves. Therefore, they can be considered as the proposed prediction models for any type of RC frame.
• Four case study RC buildings were assumed to check the reliability of the prediction models. In Case A, the BReg, ETReg, ERTReg, and ANNs algorithms predicted the IDRmax with accuracies of 95.7%, 93.19%, 90.27%, and 90%, respectively, and in Case B, they achieved accuracies of 92.78%, 90.31%, 87.85%, and 90.1%, respectively. In Case C, the ANNs and BReg algorithms, with accuracies of 92.9% and 89.76%, respectively, and in Case D, the BReg, ETReg, and ANNs algorithms, with accuracies of 92.5%, 89.93%, and 87.32%, respectively, can be considered the best prediction models.
• A Graphical User Interface (GUI) was proposed for the preliminary estimation of the seismic performance levels of RC frames based on the most important features, which are introduced as input parameters. In addition, the GUI can plot the predicted M-IDA curve regarding both seismic events and facilitates the seismic vulnerability assessment of RC buildings. Moreover, there is no limit on the thresholds of the allowable IDRmax, and users can obtain the prediction results for any selected IDRmax.
• In operation, the GUI (a) receives the most important structural features that affect the seismic response and seismic limit-state capacities, (b) receives the IDRmax selected by the user (e.g., four main IDRmax values were shown in Fig. 20), (c) predicts the M-IDA curve of the introduced RC frame, and (d) presents the Sa(T1) corresponding to the selected IDRmax.
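Steps (b) to (d) of the GUI workflow amount to reading Sa(T1) off a predicted M-IDA curve at a chosen IDRmax. The sketch below shows one simple way this lookup could be done, using linear interpolation between curve points. It is not the GUI's actual implementation; the function name is hypothetical, and the curve points are placeholder values chosen for illustration, not results from this study.

```python
from typing import Sequence


def sa_at_idr(idr_curve: Sequence[float],
              sa_curve: Sequence[float],
              idr_target: float) -> float:
    """Linearly interpolate Sa(T1) at idr_target on an M-IDA curve.

    idr_curve must be strictly increasing, with sa_curve holding the
    matching Sa(T1) values.
    """
    if len(idr_curve) != len(sa_curve) or len(idr_curve) < 2:
        raise ValueError("curve needs at least two matching points")
    if any(b <= a for a, b in zip(idr_curve, idr_curve[1:])):
        raise ValueError("idr_curve must be strictly increasing")
    if not idr_curve[0] <= idr_target <= idr_curve[-1]:
        raise ValueError("idr_target lies outside the curve range")
    for i in range(1, len(idr_curve)):
        if idr_target <= idr_curve[i]:
            x0, x1 = idr_curve[i - 1], idr_curve[i]
            y0, y1 = sa_curve[i - 1], sa_curve[i]
            return y0 + (y1 - y0) * (idr_target - x0) / (x1 - x0)
    return sa_curve[-1]  # not reached for valid input


# Placeholder M-IDA curve (IDRmax in %, Sa(T1) in g); illustrative values only.
idr_points = [0.0, 0.5, 1.0, 2.0, 4.0, 6.0]
sa_points = [0.0, 0.30, 0.55, 0.90, 1.30, 1.45]

# Allowable IDRmax thresholds considered in the conclusions (1.0%, 2.0%, 4.0%).
for limit in (1.0, 2.0, 4.0):
    print(f"IDRmax = {limit:.1f}% -> Sa(T1) = "
          f"{sa_at_idr(idr_points, sa_points, limit):.2f} g")
```

Because the threshold is an argument rather than a fixed set of levels, any user-selected IDRmax within the range of the curve can be queried in the same way.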

Author contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by FK, NA, and RJ. The first draft of the manuscript was written by FK and NA, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript. FK: writing-original draft preparation, conceptualization, software, analysis, methodology, modeling. NA: writing-original draft preparation, modeling, software, analysis, and investigation. RJ: writing-review and editing, supervision.

Funding
The authors did not receive support from any organization for the submitted work. The authors have no relevant financial or nonfinancial interests to disclose.

Data availability
Data will be made available on request.

Conflict of interest
The authors declare that there is no conflict of interest in relation to the paper "Machine learning-based seismic probabilistic prediction of reinforced concrete buildings" submitted for publication in Archives of Civil and Mechanical Engineering.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.