1 Introduction

Shear walls are typically utilized as the primary elements to resist lateral loads in reinforced concrete buildings. In line with capacity design principles, shear walls are designed to exhibit ductile behavior by providing adequate reinforcement and proper detailing. However, experimental studies have shown that walls with an aspect ratio smaller than 1.5 (i.e., squat walls), as well as walls with poor reinforcement and detailing despite a higher aspect ratio, end up exhibiting brittle failure (e.g., diagonal tension, web crushing) (ASCE-41 2017; Massone and Wallace 2004; Sittipunt et al. 2001). Such walls are often encountered in buildings not designed according to modern seismic codes and are prone to severe damage (Wallace et al. 2012; Arnold et al. 2006). As performance-based design and assessment have gained importance in parallel with hazard mitigation efforts, there has been an increasing need for reliable models to predict structural behavior under seismic actions. This objective is particularly important for walls that exhibit shear behavior, as the nonlinear deformation capacity of such walls is assumed to be zero, potentially leading to technical and economic over-conservatism. More realistic solutions can be achieved if their behavior is accurately estimated and considered in seismic performance evaluation.

The prediction of structural behavior has traditionally been achieved through predictive equations or models developed from available experimental data. Recently, machine learning (ML) methods have gained significant attention in the structural/earthquake engineering field, showing promising results even though the available data are limited compared to domains such as computer vision and image processing. However, while black-box models often capture the input–output relationship more accurately in classification and regression tasks, they may not always align with the underlying physical behavior. This issue has been highlighted in various scientific and engineering applications, where black-box models have led to misleading conclusions (Lazer et al. 2014; Douglas and Aochi 2008; Karpatne et al. 2022). Consequently, despite their high accuracy, black-box models are not universally accepted within the earthquake engineering community. To leverage the benefits of advancements in artificial intelligence while respecting the physical behavior, recent research efforts have focused on integrating black-box machine learning methods with domain-specific knowledge (Karpatne et al. 2022; Luo and Paal 2022; Zhou et al. 2022; Aladsani et al. 2022). This study takes a step further by employing an explainable machine learning approach, in contrast to black-box models, and incorporates the existing physics-based understanding of seismic behavior to estimate the deformation capacity of non-ductile shear walls.

2 Related research

Research efforts in the literature to estimate wall deformation capacity have produced empirical models, some recently adopted by building codes (Abdullah and Wallace 2019); however, they are relatively limited compared to models for other behavior features such as shear strength or failure mode. Earlier models were mainly developed using a limited number of experimental results (Paulay et al. 1982; Kazaz et al. 2012) or were calibrated on a single dataset; that is, they were not trained and tested on separate, non-overlapping data (Abdullah and Wallace 2019; Grammatikou et al. 2015). Over time, as machine learning has been embraced in the earthquake engineering field (Mangalathu et al. 2020; Zhang et al. 2022; Deger and Taskin 2022; Deger and Taskin Kaya 2022; Aladsani et al. 2022) and new experiments have been conducted, more advanced models have been developed. Yet, two main issues are encountered: (i) Some models used simple approaches such as linear regression for the sake of interpretability (Deger and Basdogan 2019) and sacrificed overall accuracy (or had large dispersion). One might think that accurate models for relatively complicated behavior attributes can only be achieved by increasing model complexity; however, studies have shown that added complexity can interact poorly with the structure and distribution of the data (Johnson and Khoshgoftaar 2019; Kailkhura et al. 2019). More importantly, urging the model to develop complex relationships to achieve higher performance typically leads to black-box models whose internal mechanisms involve highly nonlinear, complex relations. (ii) Such black-box models achieve high overall accuracy at the cost of explainability (Zhang et al. 2022). Researchers who acknowledge the significance of interpretability have employed model-agnostic, local or global explanation methods, such as SHapley Additive exPlanations (SHAP) (Feng et al. 2021) and Local Interpretable Model-agnostic Explanations (LIME) (Ribeiro et al. 2016), to interpret the decision mechanisms of their models (Aladsani et al. 2022). However, such algorithms are not fully verified (Kumar et al. 2020; Rudin 2019) and are, by construction, approximate. In addition, because each model-agnostic method employs different criteria, the interpretations derived from these methods can vary across approaches, leading to different conclusions and insights. Apart from all this, despite their broadening use and high accuracy, black-box models are not entirely accepted in the earthquake engineering community, primarily due to their opaque internal relationships and occasional lack of reliability (Molnar 2020).

Concerns regarding the trustworthiness and transparency of black-box models have led to the development of explainable artificial intelligence (XAI) (Adadi and Berrada 2018; Lipton 2018). The strategies in XAI can be classified into two main categories: post-hoc explainability, which involves using explanatory algorithms to explain existing black-box models, and the generation of transparent glass-box models (Deger and Taskin Kaya 2022) that are fully comprehensible and interpretable by humans. In fields where critical decisions are paramount, the adoption of transparent models is of utmost importance; however, their prevalence remains limited in comparison to black-box models (Doshi-Velez and Kim 2017). This discrepancy can be attributed to the superior decision-making capabilities demonstrated by black-box models. The use of highly complex functions during the training of these machine learning models allows them to capture the underlying nonlinear structure present in the data, thereby enabling them to make significantly better decisions than transparent models. Transparent models, on the other hand, aim to construct learning models with simpler functions in order to maximize performance while also facilitating human understanding of the model (Selbst and Barocas 2018). Linear regression, logistic regression, and decision trees are widely recognized as transparent methods in the field of machine learning. However, because such methods rely on simpler functions or rules during the learning phase, the performance of transparent models is typically lower than that of black-box models (Linardatos et al. 2020). To improve performance, the structure of the functions used to construct predictive models can be intentionally made more complex. This can be achieved by leveraging the framework of generalized additive models (GAMs) (Hastie et al. 2009). GAMs allow for flexible modeling by incorporating smooth, nonlinear functions of the input variables while still maintaining interpretability. This flexibility enables the model to capture complex relationships and interactions between input variables. Approaches developed based on the GAM framework include the Neural Additive Model (NAM) (Agarwal et al. 2021), GAMI-Net (Yang et al. 2021), and the Enhanced Explainable Neural Network (ExNN) (Yang et al. 2020). Another method, the Explainable Boosting Machine (EBM), is a tree-based alternative that integrates the interpretability of GAMs with the predictive capabilities of boosting (Nori et al. 2019). Notably, EBM has been reported in several studies as a competitive alternative to powerful boosting methods like XGBoost and LightGBM, for example in detecting common flaws in data (Chen et al. 2021), diagnosing COVID-19 from blood test variables (Thimoteo et al. 2022), and predicting diseases such as Alzheimer's (Sarica et al. 2021) and Parkinson's (Sarica et al. 2022).

Understanding how the model makes the decision/estimation is critical to (i) verify that the model is physically meaningful, (ii) develop confidence in the predictive capabilities of the model, and (iii) broaden existing scientific knowledge with new insights. This study addresses this requirement and fills a crucial research gap in earthquake engineering by leveraging domain-specific knowledge to evaluate and validate the decisions made by machine learning methods. Unlike the existing ML-based predictive models (Aladsani et al. 2022; Zhang et al. 2022), the proposed model aims to estimate the deformation capacity of non-ductile reinforced concrete (RC) shear walls and inherently possesses transparency and interpretability. The methodology is developed based on the framework of generalized additive models and incorporates engineering knowledge to improve accuracy and reliability, providing valuable insights for structural analysis and design in earthquake engineering. The inputs of the predictive model are designated as the shear wall design properties (e.g., wall geometry, reinforcing ratio), whereas the output is one of the constitutive components of the nonlinear wall behavior, that is, the deformation capacity. The main contributions of this research are highlighted as follows:

  • A fully transparent and interpretable predictive model is developed to estimate the deformation capacity of RC shear walls that fail in pure shear or shear-flexure interaction.

  • The proposed model meets all desired properties, i.e., decomposability, algorithmic transparency, and simulatability, without compromising high performance.

  • This study integrates novel computational methods and domain-based knowledge to formalize complex engineering knowledge. The proposed model’s overall consistency with a physics-based understanding of seismic behavior is verified.

3 The RC shear wall database

The experimental data used in this research is a subset of the wall test database utilized in Deger and Taskin (2022), with 30 additional data points (Tokunaga and Nakachi 2012; Hirosawa 1975). As the main focus is to estimate the deformation capacity of walls governed by shear or shear-flexure interaction, walls whose reported failure modes did not show shear-failure indications are excluded from the database, leaving 286 specimens for use in this research. Shear-failure indications are mainly identified as diagonal tension failure and web crushing damage, which imply that the shear strength was reached. All specimens were tested under quasi-static cyclic loading, and none was retrofitted and re-tested. The database consists of wall design parameters, depicted in Fig. 1, which are herein designated as the input variables of the machine learning problem, namely: wall geometry (\(t_w\), \(l_w\), \(h_w\)), shear span ratio (\(M/Vl_w\)), concrete compressive strength (\(f_c\)), yield strength of longitudinal and transverse reinforcing steel at the web (\(f_{yl}\), \(f_{yt}\)), yield strength of longitudinal and transverse reinforcing steel at the boundary elements (\(f_{ybl}\), \(f_{ysh}\)), longitudinal and transverse reinforcing ratios at the web (\(\rho _{yl}\), \(\rho _{yt}\)), longitudinal and transverse reinforcing ratios at the boundary elements (\(\rho _{ybl}\), \(\rho _{ysh}\)), axial load ratio (\(P/(A_gf_c)\)), shear demand (or strength) at the section (\(V_{max}\)), cross-section type (rectangular, barbell-shaped, or flanged; XSec), and curvature type (single or double; CT). It is noted that single and double curvature correspond to the end conditions of the specimen, i.e., cantilever and fixed-fixed, respectively. Distributions of the input variables are presented in Fig. 2 along with their box plots (shown in blue).
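
As a concrete illustration of how such a database can be organized for the learning problem, the sketch below assembles a design matrix in Python (the environment used throughout this study); the file name, column labels, and coding of the categorical variables are hypothetical placeholders rather than the actual database schema.

```python
# Minimal sketch of assembling the input matrix; "wall_database.csv" and the
# column labels are hypothetical placeholders for the database described above.
import pandas as pd

df = pd.read_csv("wall_database.csv")  # 286 shear/shear-flexure specimens

# Continuous design parameters (geometry, materials, reinforcement, loads)
numeric_cols = ["t_w", "l_w", "h_w", "M_Vlw", "f_c",
                "rho_l", "rho_t", "rho_bl", "rho_sh",
                "P_Agfc", "V_max"]

# Cross-section type and curvature type enter the model as binary-coded flags
df["XSec_rect"] = (df["XSec"] == "rectangular").astype(int)
df["CT_double"] = (df["CT"] == "double").astype(int)

X = df[numeric_cols + ["XSec_rect", "CT_double"]]
y = df["delta_u"]  # deformation capacity (ultimate displacement), see below
```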

Fig. 1
figure 1

Wall design parameters designated as input variables of the machine learning problem

Fig. 2
figure 2

Distribution of the input variables in the database

The output variable of the ML problem, the deformation capacity, is taken directly as the reported ultimate displacement prior to failure if the specimen was tested until failure. Otherwise, it is taken as the displacement corresponding to a 20% drop in lateral strength (i.e., the displacement at \(0.8V_{max}\)), as suggested by Park (1989). It is noted that the failure displacement was taken as the total wall top displacement and was not separated into shear and flexural deformation components.
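
For clarity, this rule can be expressed as a small helper applied to a specimen's backbone (envelope) response; the snippet is an illustrative reading of the Park (1989) criterion under the stated assumptions, not code from the study, and the argument names are hypothetical.

```python
import numpy as np

def deformation_capacity(disp, force, tested_to_failure, v_max):
    """Return the ultimate displacement if the test reached failure;
    otherwise the displacement where the resistance drops to 0.8*V_max."""
    disp, force = np.asarray(disp, float), np.asarray(force, float)
    if tested_to_failure:
        return disp[-1]                      # reported ultimate displacement
    peak = int(np.argmax(force))             # look only at the post-peak branch
    below = np.where(force[peak:] <= 0.8 * v_max)[0]
    return disp[peak + below[0]] if below.size else disp[-1]
```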

4 Generalized additive models

The Generalized Additive Model (GAM) (Hastie and Tibshirani 1987) was initially proposed as a representation that models an n-dimensional complex mathematical function, or a black-box problem, as a linear combination of first-order basis functions. To improve the model representation and capture additional complexity, second-order terms, including the pairwise effects between variables, were later incorporated into the model. This extended version of GAM, known as GA\(^2\)Ms (Generalized Additive Models with second-order terms) (Lou et al. 2012), is expressed in Eq. 1.

$$\begin{aligned} g(E[y]) = f_0 + \sum _{i=1}^n f_i(x_i) + \sum _{k=1}^K f_k(x_{k_1},x_{k_2}) \end{aligned}$$
(1)

where \(f_0\) denotes the intercept term, calculated as the mean response over all outputs, while each \(f_i\) represents the univariate shape function for the i-th variable, capturing the individual effect of \(x_i\) on the model output. Additionally, as mentioned earlier, the model incorporates K pairs of features \((x_{k_1}, x_{k_2})\) to account for their combined effects through the bivariate terms \(f_k(x_{k_1},x_{k_2})\).

Notably, since the shape functions are trained independently for each input variable, the model is additive (decomposable), enabling the separate analysis of the effect of each input variable on the model output. Thus, the effect of each feature on the model’s predictions, which is captured by the corresponding univariate shape function, can be interpreted through visualization of these shape functions (algorithmically transparent). An example of such visualization can be observed in Fig. 3, where the shape functions provide insights into the influence of different features on the model’s output.

Fig. 3
figure 3

Inference phase of GAM model representation

Similarly, each pairwise interaction \(f_k(x_{k_1},x_{k_2})\) can be rendered as a heatmap on the two-dimensional \(x_{k_1}\), \(x_{k_2}\)-plane. In the inference phase, all the terms in Eq. 1 are added up, yielding the final prediction of the model output, i.e., the deformation capacity. Moreover, the GAM offers both local and global explanations of the learning model, as the importance of each variable can be estimated as the average absolute value of its predicted scores. It should be noted that additional interactions can be incorporated for better performance; however, this may result in a more complex model with lower generalization performance due to the increased number of model parameters to be trained. This increased complexity can also make the final predictive model less comprehensible (less simulatable). Given this trade-off, the number of interactions is set to two for the proposed model.
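
A toy numerical sketch of this inference step is given below; the intercept, shape functions, and pairwise term are invented stand-ins chosen only to show how the additive terms of Eq. 1 are looked up and summed.

```python
# Toy illustration of Eq. 1 at inference time (all functions are stand-ins).
f0 = 35.5                                    # intercept: mean response
shape = {                                    # univariate shape functions f_i
    "M_Vlw": lambda v: 8.0 * (v - 1.2),
    "t_w":   lambda v: 0.05 * (v - 100.0),
}
pair = {                                     # one pairwise term f_k(x_k1, x_k2)
    ("M_Vlw", "P_Agfc"): lambda a, b: -10.0 * a * b,
}

x = {"M_Vlw": 1.8, "t_w": 120.0, "P_Agfc": 0.10}
prediction = (f0
              + sum(f(x[name]) for name, f in shape.items())
              + sum(f(x[a], x[b]) for (a, b), f in pair.items()))
# Each term can be reported separately, which is what makes the model
# decomposable and its predictions explainable.
print(prediction)
```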

The framework of generalized additive models offers a flexible and interpretable approach for modeling complex relationships. It achieves this by decomposing the overall function into additive components represented by the shape functions. Each shape function captures a specific aspect of the relationship between the input variables and the response, enabling the modeling of nonlinearity and interactions. The determination of these basis functions can be accomplished using various methods, resulting in different representation techniques. While alternative analytical functions like splines or orthogonal polynomials can be employed to define the shape functions, they may yield less accurate representations of nonlinear models (McLean et al. 2014). Alternatively, the shape functions can be modeled using black-box methods such as trees, boosting, or neural networks. This approach is particularly advantageous when the underlying relationship between variables is complex and nonlinear. Among these methods, the Explainable Boosting Machine (EBM), developed on the GAM framework, is a relatively new approach that employs tree bagging and boosting techniques to represent the shape functions (Nori et al. 2019). The EBM effectively models nonlinearity and has demonstrated high performance in the literature, competing with other powerful boosting methods such as LightGBM and XGBoost (Nori et al. 2021). GAMs have certain advantages over tree-based models such as XGBoost, particularly in the context of interpretability, even when SHAP is used to explain the tree models. GAMs inherently provide a clear and direct understanding of how each feature influences the outcome, as they model the relationship between predictors and response in an additive and smooth manner. This smoothness is crucial for capturing nonlinear trends in data, a feature that tree-based models, with their step-wise data splits, might not capture as effectively. Moreover, while tree models can handle complex interactions, GAMs maintain interpretability even when modeling interactions between variables. Additionally, GAMs provide more stable predictions under minor variations in the input data and can inherently model uncertainty, providing confidence intervals around predictions. This contrasts with tree-based models, where predictions can change more abruptly due to discrete data splits. Due to these strengths, the EBM, a model based on the GAM framework, is utilized in this study for modeling the deformation capacity, incorporating domain knowledge, and ensuring physical consistency with seismic behavior.
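
A minimal sketch of fitting such a model with the open-source interpretml package is shown below, assuming the design matrix X and target y assembled earlier; argument and function names follow the package's public API but may differ between versions.

```python
# Fit an EBM with two pairwise interaction terms and inspect its shape functions.
from interpret.glassbox import ExplainableBoostingRegressor
from interpret import show

ebm = ExplainableBoostingRegressor(interactions=2, random_state=0)
ebm.fit(X, y)

global_explanation = ebm.explain_global()   # per-term shape functions / importances
show(global_explanation)                    # interactive plots of f_i and f_k
```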

5 Experimental analysis

To assess whether the method compromises accuracy for the sake of interpretability, the performance of the proposed model, utilizing the EBM, is compared to three state-of-the-art black-box machine learning models, namely XGBoost (Chen and Guestrin 2016), Gradient Boost (Friedman 2001), and Random Forest (Breiman 2001), and to two glass-box models, namely Ridge Linear Regression (Hastie et al. 2009) and Decision Tree (Breiman et al. 2017). All implementations are carried out in a Python environment. To ensure a fair comparison, the entire database, including all twelve input variables (ten variables from Fig. 2 and two binary-coded variables for curvature type and cross-section type), is randomly split into training and test datasets at a ratio of 90% to 10%, respectively. The performance of each model is influenced by hyperparameters such as the learning rate, number of leaves, and number of interactions. A tenfold cross-validation technique is employed to optimize these hyperparameters: the training data are split into ten subsets, one subset is held out as the validation set in turn, and the model is trained with different hyperparameter settings on the remaining nine subsets. This helps prevent overfitting to the training dataset during hyperparameter tuning. The procedure is not only standard but recommended in the machine learning community, as it provides a comprehensive assessment over multiple train-validation splits and thus a more accurate picture of the model’s performance in practical scenarios. By integrating this well-established method, we aim to address potential concerns regarding the reliability of our analysis.
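
The comparison protocol described above can be sketched as follows; the hyperparameter grids are illustrative examples rather than the grids tuned in the study, and X, y refer to the design matrix assembled earlier.

```python
# 90/10 train-test split with tenfold cross-validated hyperparameter tuning.
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor
from interpret.glassbox import ExplainableBoostingRegressor

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10,
                                                    random_state=0)

candidates = {
    "EBM":           (ExplainableBoostingRegressor(),
                      {"learning_rate": [0.01, 0.05], "interactions": [2, 5]}),
    "XGBoost":       (XGBRegressor(), {"learning_rate": [0.05, 0.1],
                                       "max_depth": [3, 5]}),
    "GradientBoost": (GradientBoostingRegressor(), {"n_estimators": [200, 500]}),
    "RandomForest":  (RandomForestRegressor(), {"n_estimators": [200, 500]}),
    "Ridge":         (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "DecisionTree":  (DecisionTreeRegressor(), {"max_depth": [3, 5, None]}),
}

test_r2 = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=10, scoring="r2")  # tenfold CV
    search.fit(X_train, y_train)
    test_r2[name] = search.best_estimator_.score(X_test, y_test)
print(test_r2)
```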

For performance evaluations, the following three metrics are used over “unseen” (i.e., not used in the training process) test datasets of ten random train-test data splittings: coefficient of determination (\(R^2\)), relative error (RE), and prediction accuracy (PA), as given in Eqs. 2, 3, and 4, respectively.

$$\begin{aligned} R^2 & = 1-\frac{\sum _i (y_i -{\hat{y}}_i)^2}{\sum _i (y_i -{\bar{y}})^2} \end{aligned}$$
(2)
$$\begin{aligned} RE & = \frac{\sum _{i} |\hat{y}_i - y_i|}{\sum _{i} |y_i|} \times 100\% \end{aligned}$$
(3)
$$\begin{aligned} PA & = \frac{1}{m}\sum _{i} \frac{\hat{y}_i}{y_i} \end{aligned}$$
(4)

where \(y_i\), \(\bar{y}\), \(\hat{y}_i\), and m refer to the actual output, the mean value of the \(y_i\), the predicted output of the corresponding regression model, and the number of samples in the test dataset, respectively.
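
The three metrics can be implemented directly from Eqs. 2–4; the prediction-accuracy function below follows the mean predicted-to-actual ratio used in the discussion (values above 1.0 indicate overestimation).

```python
import numpy as np

def r_squared(y_true, y_pred):
    # Coefficient of determination (Eq. 2)
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

def relative_error(y_true, y_pred):
    # Aggregate relative error in percent (Eq. 3)
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.sum(np.abs(y_pred - y_true)) / np.sum(np.abs(y_true))

def prediction_accuracy(y_true, y_pred):
    # Mean predicted-to-actual ratio (Eq. 4); > 1.0 means overestimation
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(y_pred / y_true)
```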

The EBM method was initially employed on the dataset without enforcing any form of physical meaningfulness in order to evaluate its performance, especially against other boosting-based methods. Table 1 presents the mean performance scores (evaluation metrics) of all the machine learning models, while Fig. 4 illustrates their dispersion through box plots. The results demonstrate that the EBM achieves performance comparable to its black-box counterparts, with a coefficient of determination (\(R^2\)) of 0.83, a relative error of 0.41%, and a prediction accuracy (PA) of 1.21. The box plots in Fig. 4 also show that the EBM exhibits low deviations in \(R^2\), relative error (RE), and prediction accuracy (PA), indicating that the model is reliable and robust across different train-test splits. Moreover, the mean prediction accuracy (PA) reveals an overestimation of approximately 20% for both the EBM and the black-box methods, suggesting the presence of potential noise in some input variables. In comparison to transparent models, the EBM outperforms both the Decision Tree (DT) and Ridge Linear Regression (RLR) across all three metrics, demonstrating its superiority over traditional glass-box approaches.

Table 1 Mean performance scores based on the test datasets over ten random splittings
Fig. 4
figure 4

Comparison of performance scores of ML methods utilizing all wall features for test samples based on ten random train-test splittings

Considering the achieved results, the most remarkable advantage of the EBM method over the others is that it provides full explainability without sacrificing accuracy. Unlike other methods, EBM enables the user to understand how the prediction is made and which parameters are essential in the decision-making process. Therefore, the EBM method is selected as the baseline algorithm for the rest of the analysis to propose a prediction model for estimating the deformation capacity based on the following criteria: developing a model with fewer input variables (high simulatability), achieving high accuracy, and ensuring physical consistency.

5.1 The proposed deformation capacity model

To reduce the number of variables in the prediction model without sacrificing prediction accuracy, it is necessary to identify the most influential variables through a variable importance analysis. This approach aims to propose a model that is not only more interpretable and simple but also practical and easily implementable, with a reduced number of variables for potential users. Therefore, the importance of the wall properties in predicting the deformation capacity is evaluated based on the additive term contributions visualized in Fig. 5. The results reveal that \(t_w\) and \(M/Vl_w\) (or \(h_w/l_w\)) have the greatest impact on individual predictions. This is consistent with the mechanics of the behavior, as walls with smaller thickness have been shown to be more susceptible to lateral stiffness degradation due to concrete spalling, leading to failure caused by lateral instabilities or out-of-plane buckling (Vallenas et al. 1979; Oesterle et al. 1976). The shear span ratio (or aspect ratio), on the other hand, has a significant impact on deformation capacity: the higher the shear span ratio, the more slender the wall, and the larger the deformations it can typically reach prior to failure. The least important wall parameters are identified as curvature type, cross-section type, and concrete compressive strength.
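
With the interpretml implementation, the mean-absolute-score importances plotted in Fig. 5 can be extracted directly from the fitted model; the sketch below assumes the ebm object fitted in the earlier sketch, and the attribute names follow recent versions of the package.

```python
# Rank additive terms by their mean absolute contribution to the prediction.
importances = ebm.term_importances()          # mean |score| per additive term
ranking = sorted(zip(ebm.term_names_, importances), key=lambda t: -t[1])
for name, value in ranking:
    print(f"{name:25s} {value:7.3f}")
```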

Fig. 5
figure 5

Global interpretation, in terms of mean absolute score, of the model comprising twelve input features. XSec and CT stand for cross-section and curvature type, respectively

One objective of this research is to propose a model that is practical as well as accurate and transparent. To develop the predictive model with as few input variables as possible, various combinations of input features, selected based on engineering knowledge, are thoroughly assessed to achieve performance scores comparable to those obtained when using all twelve features. Domain knowledge-based feature selection emphasizes two key features: (i) the shear stress demand (\(V_{max}\)) and (ii) the axial load ratio (\(P/(A_gf_c)\)). These two features have been highlighted for their significant impact in various research studies [e.g., Abdullah and Wallace (2019), Lefas et al. (1990), Netrattana et al. (2017)] and are underscored by ASCE 41-17 (ASCE-41 2017) for accurately modeling wall deformation. Table 2 presents various input feature combinations along with the corresponding mean coefficient of determination (\(R^2\)) and mean prediction accuracy (PA) on the test datasets over ten random splittings. As shown in Table 2, the EBM can achieve performance scores comparable to using all features by focusing on just four key features: \(V_{max}\), \(M/Vl_w\), \(P/(A_gf_c)\), and \(t_w\). Integrating additional features (e.g., \(\rho _l\), \(\rho _{bl}\), \(\rho _{sh}\)) deemed influential by experimental findings (Tasnimi 2000; Hube et al. 2014) has demonstrated only a marginal impact on the overall performance.
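
The feature-combination study summarized in Table 2 can be reproduced in spirit with a simple loop such as the one below; the subsets and column names are placeholders, and the metric helpers are those defined after Eqs. 2–4.

```python
# Evaluate candidate feature subsets with the same EBM configuration.
from interpret.glassbox import ExplainableBoostingRegressor

subsets = {
    "all twelve features": list(X_train.columns),
    "Vmax, M/Vlw, P/Agfc, tw": ["V_max", "M_Vlw", "P_Agfc", "t_w"],
    "four key + steel ratios": ["V_max", "M_Vlw", "P_Agfc", "t_w",
                                "rho_l", "rho_bl", "rho_sh"],
}

for label, cols in subsets.items():
    model = ExplainableBoostingRegressor(interactions=2, random_state=0)
    model.fit(X_train[cols], y_train)
    pred = model.predict(X_test[cols])
    print(label, r_squared(y_test, pred), prediction_accuracy(y_test, pred))
```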

Table 2 Mean performance scores based on the test datasets over ten random splittings for various input feature combinations

The proposed predictive model is selected to achieve the highest \(R^2\) with a prediction accuracy as close to 1.0 as possible on the validation dataset. The correlation plots for the training and test datasets are presented in Fig. 6, where the scattered data are concentrated along the \(y=x\) line, demonstrating that the proposed model makes accurate predictions. Additionally, the distribution of the residuals is concentrated around zero. It should be noted that overfitting occurs when a model is excessively complex and learns not only the underlying patterns in the training data but also its noise, leading to poor generalization on new, unseen data. As can be seen in Fig. 6a, the \(R^2\) of our model on the training data is 96%, while on the test data it is 92%. These results indicate that the model has not overfitted: it exhibits consistent performance across both training and test datasets, suggesting that it generalizes well to new, unseen data without being overly tailored to the specific patterns of the training set.

Fig. 6
figure 6

Correlations of the model outputs with the actual values for a training and b test datasets

Following these analyses, the proposed model is illustrated in Fig. 7. As discussed above, the proposed model is an additive model in which each relevant feature is assigned a quantitative term contribution, enabling the user to examine the contribution of each feature through univariate (Fig. 7a–d) and bivariate shape functions (Fig. 7e–f). The values, called scores, are read from these shape functions and from the heat maps representing pairwise interactions (i.e., between two features) and are summed up to calculate the prediction. The gray zones displayed along the shape functions represent error bars, indicating the model’s uncertainty and sensitivity to the data. These error bars are particularly visible in regions with sparse data or outliers. Furthermore, the resulting shape functions exhibit jump-like, piece-wise constant characteristics instead of smooth curves with gradual inclines and declines. This behavior can be attributed to the fact that the shape functions of the proposed model are derived from multiple decision-tree learning models.

Fig. 7
figure 7

Shape functions of the proposed deformation model, including individual (a–d) and pairwise interactions (e–f). Note that the intercept \(f_0= 35.528\)

Fig. 8
figure 8

Variable contribution estimates for a well-predicted, b averagely-predicted samples

The shape functions in Fig. 7 also indicate, in graphical form, how each feature correlates with the output. For example, nonlinear patterns that cannot be observed in linear approaches can be easily interpreted (Zschech et al. 2022), which provides new insights to broaden existing experiment-based knowledge. Experimental results (Corley et al. 1981) demonstrate that the shear strength \(V_{max}\) (Fig. 7d) reduces ductility and thus the deformation capacity, aligning with the suggestions of the ASCE 41-17 acceptance criteria, whereas the collection of relevant experimental data in the literature has revealed a highly nonlinear pattern (Aladsani et al. 2022; Deger and Basdogan 2019). This nonlinearity can be observed in the shape function suggested by the proposed method. Considering the other input variables (\(M/Vl_w\), \(t_w\), \(P/(A_gf_c)\)), the proposed model is also consistent with experimental results in the literature: \(M/Vl_w\) (Fig. 7a) and \(t_w\) (Fig. 7b) have a positive impact, as discussed above, whereas \(P/(A_gf_c)\) (Fig. 7c) has an adverse influence (Lefas et al. 1990; Farvashany et al. 2008). The reason the shape functions suggest a negative contribution up to a certain point (\(M/Vl_w \approx 1.2\), \(t_w \approx 60\) cm, \(P/(A_gf_c) \approx 0.08\)) is that the model has an intercept value (\(f_0\), Eq. 1); specimens with deformation capacities smaller than \(f_0 = 35.528\) are predicted by adding up negative scores. The unexpected spikes in \(t_w\) are likely due to a sudden increase in the number of data points at \(t_w = 100\) mm and \(t_w = 200\) mm (64 and 44 specimens, respectively). These particular thickness values are commonly employed in experiments. Consequently, when the output varies while the input remains the same across a significant number of specimens, decision-making can become challenging.

As mentioned earlier, generalized additive models provide flexibility over the structure of the developed model by allowing modifications such as adjusting the number of pairwise interactions. This allows the GAM to suggest more than one model for the same input–output configuration on a particular train-test dataset. Reducing the number of interactions brings simplicity to the model; however, it typically reduces accuracy, as the GAM relies on its automatically determined interactions in the decision-making process.

5.2 Sample-based verification of proposed model

To further examine the physical meaningfulness of the proposed model, the prediction of deformation capacity was examined for two example specimens, one predicted well and one predicted relatively poorly. The prediction for the first specimen showed excellent accuracy with almost zero error (Fig. 8a), while the prediction for the other specimen had an error of approximately 15% (Fig. 8b). The contribution of each feature to the prediction is presented for each specimen such that the intercept is constant and shown in gray, the additive terms with a positive impact are marked in orange, and the additive terms decreasing the output are shown in blue. Each contribution estimate is extracted from the shape functions and two-dimensional heat maps (Fig. 7) based on the input values of the specific specimen. Overall, the model is consistent with physical knowledge, except that \(V_{max}\) has an unexpected positive impact on the output for the relatively worse prediction (Fig. 8b). This is a key advantage of the proposed method; that is, the user can prudently understand how the prediction is made for a new sample and develop confidence in the predictive model (versus blind acceptance in black-box models).
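
Per-specimen decompositions such as those in Fig. 8 can be obtained from the fitted EBM's local explanations; the sketch below assumes the ebm, X_test, and y_test objects from the earlier sketches, and the structure of the returned dictionary may differ between interpretml versions.

```python
# Additive term contributions for two test specimens (cf. Fig. 8).
local_explanation = ebm.explain_local(X_test[:2], y_test[:2])
for i in range(2):
    data = local_explanation.data(i)          # term names and additive scores
    print("intercept:", data["extra"]["scores"][0])
    for name, score in zip(data["names"], data["scores"]):
        print(f"{name:25s} {score:+8.2f}")    # positive/negative contributions
```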

5.3 Comparisons with current code provisions

For validation and verification of the predictive model, performance scores are compared with those obtained using current code provisions. ASCE 41-17 and ACI 369-17 (ACI-369 2017) provide recommended deformation capacities for nonlinear modeling purposes, where shear walls are classified into two categories based on their aspect ratio: shear-controlled (\(h_w/l_w \le 1.5\)) and flexure-controlled (\(h_w/l_w \ge 3.0\)). The deformation capacity of shear-controlled walls is given as a total drift ratio such that \(\Delta _u/h_w = 1.0\%\) if the wall axial load level is greater than 0.05 and \(\Delta _u/h_w = 2.0\%\) otherwise (ASCE-41 2017) (Table 3).
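
For the comparison that follows, the shear-controlled drift limits quoted above can be encoded as a small lookup; the threshold and limit values are taken from the text, so they should be checked against the ASCE 41-17 / ACI 369-17 tables before any practical use.

```python
def code_drift_limit(axial_load_level):
    """Acceptable total drift (% of wall height) for shear-controlled RC walls,
    following the simplified rule quoted above from ASCE 41-17 / ACI 369-17."""
    return 1.0 if axial_load_level > 0.05 else 2.0

print(code_drift_limit(0.10))  # -> 1.0 (%): higher axial load, lower drift capacity
```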

Table 3 Acceptable Total Drift (%) for Nonlinear Procedures—RC Shear Walls Controlled by Shear (ASCE-41 2017)
Fig. 9
figure 9

Comparisons of proposed deformation capacity model predictions with code provisions

Deformation capacity predictions based on the proposed model are compared to the ASCE 41-17 provisions in Fig. 9 for the test dataset. The predicted-to-actual ratios are \(1.06\pm 0.49\) and \(6.42\pm 3.17\) for the proposed model and the code provisions, respectively. The results imply that the traditional approach may overestimate deformation capacities and lead to unsafe assessments.

5.4 Conclusions

The utilization of machine learning techniques has gained significant attention in earthquake engineering, providing promising solutions to various challenges and yielding reasonably reliable and accurate predictions. However, the decision-making process of machine learning models remains ambiguous, mainly due to their inherent complexity, resulting in opaque black-box models. To address this issue, a fully transparent predictive model is developed in this study to estimate the deformation capacity of reinforced concrete shear walls that fail in pure shear or shear-flexure interaction. The proposed method is constructed based on a generalized additive model such that each relevant feature (wall parameter) is assigned a quantitative term contribution. The inputs of the model are the shear wall design properties (e.g., wall geometry, axial load ratio), and the output is the ultimate wall displacement. The conclusions derived from this study are summarized as follows:

  • The importance of the wall properties in predicting the deformation capacity is evaluated based on additive term contributions. The features \(t_w\) and \(M/Vl_w\) (or \(h_w/l_w\)) have the greatest effect on individual predictions, whereas the least relevant ones are identified as curvature type, cross-section type, and concrete compressive strength.

  • Compared to three black-box models (XGBoost, Gradient Boost, Random Forest), the model constructed based on the EBM achieves similar or better performance in terms of the coefficient of determination (\(R^2\)), relative error (RE), and prediction accuracy (PA; the ratio of the predicted to the actual value). The EBM-based model achieves a mean \(R^2\) of 0.83 and a mean RE of 0.41% using twelve input variables based on ten random train-test splittings.

  • Compared to two glass-box methods (Decision Tree (DT) and Ridge Linear Regression (RLR)), the EBM outperforms both methods across all three metrics.

  • The dispersion of performance metrics of the EBM-based model is small, implying that the model is robust and the performance is relatively less data-dependent.

  • Compared to the EBM-based model using all available features, the proposed method achieves competitive performance scores using only four input variables: \(M/Vl_w\), \(P/(A_gf_c)\), \(t_w\), and \(V_{max}\). Using these four features, the proposed model achieves an \(R^2\) of 0.92 and a PA of 1.05 on the test dataset. Using fewer variables makes the model more simulatable, more practical, and more comprehensible, and reduces the computational cost.

  • It is important to note that the decision-making process developed by the proposed EBM-based model has overall consistency with scientific knowledge despite several exceptions detected in sample-based inferences. This is an excellent advantage of the proposed model; that is, the user can assess and evaluate the prediction process before developing confidence in the result (versus blindly accepting it as in black-box models).

  • This model delivers exact intelligibility, i.e., there is no need to use local explanation methods (e.g., SHAP, LIME) to interpret the learning model, which obviates the uncertainties associated with their approximations.

The proposed model is valuable in that it is simultaneously accurate, explainable, and consistent with scientific knowledge. The proposed method’s ability to provide interpretable and transparent results would allow engineers to better understand the factors that affect the deformation capacity of non-ductile RC shear walls and make informed design decisions. The use of the EBM to estimate deformation capacity would improve the reliability and efficiency of structural analysis and design processes, leading to safer and more cost-effective buildings.