1 Introduction

Advancing the discovery and application of knowledge is a primary commitment of scientists and engineers. In a broad sense, knowledge is advanced by investigating hypotheses, which can in turn be verified via tests. The outcomes of such tests can then be used to establish theories. A theory postulates the inner workings of a phenomenon and gives us a recipe to map/predict that phenomenon with some degree of certainty (Pearson, 1892). Arriving at a theory ensures a fairly unified means of describing how and why a particular phenomenon occurs (i.e., to articulate its causes and effects). For example, the modified compression field theory (MCFT) defines an analytical model to predict the load-deformation response of reinforced concrete (RC) elements subjected to shear (Vecchio & Collins, 1986). This theory was initially derived via tests on 30 RC beams and has been, along with its derivatives (Bentz et al., 2006; Sadeghian & Vecchio, 2018), a cornerstone in the design of concrete structures.

The success of the MCFT, together with that of most similar theories in our domain, can be traced to the methodical procedures practiced by structural engineers (as well as engineers in general) (Boothby & Clough, 2017; Bulleit et al., 2015; Gray, 1965; Randolph, 2003). In this procedure, engineers design controlled experiments wherein a benchmark specimen is first examined, followed by a series of altered specimens. Each altered specimen is modified by changing a single parameter while holding all other parameters identical to the benchmark. Allowing only one parameter (e.g., X) to vary gives us insight into how that parameter affects the observed response (e.g., the difference between the outcome of the altered specimen and that of the benchmark specimen is directly caused by the alteration of X). Simply put, a change in parameter X may be found to lead to a change in the outcome ceteris paribus (other things being equal) (Chambliss & Schutt, 2013).

In the event that two or more parameters are altered simultaneously, the causal link becomes harder to grasp, and instead of reaching a direct (clear-cut) causation, it may only be possible to draw an indirect causation. Given the ambiguity of indirect causation and the additional steps required to untangle it, it comes as no surprise that much of the work in this domain favors direct causation, a synonym for empiricism (which regards experience as the source of knowledge, with a focus on induction) and the foundation for defining rationalism (which regards reason as the source of knowledge, with a focus on deduction).

Noting this reliance on physical tests implies that we tend to be constrained to examining a limited number of specimens (whether due to costs arising from specimen preparation, the availability of equipment, or other factors). In addition, it is unlikely that a single experimental program is sufficient to explore the full search space (e.g., altering all parameters and their combinations) involved in a given phenomenon. A deep dive into this limitation highlights a philosophical concern that imposes fundamental challenges to accelerating knowledge discovery within this domain. Despite such a concern, the widely adopted empirical and rationalist approaches continue to flourish, as demonstrated by the many notable works published annually (including research articles, reports, committee/codal proceedings, etc.).

Given that devising a comprehensive experimental program is unlikely, two possible solutions are proposed: (1) to combine results from multiple experimental programs; and (2) to augment experimental tests with numerically generated results. These two solutions aim to grow the number of observations that could be examined to investigate new hypotheses and theories. Combining experimental programs is a favorable method, as it retains the essence of physical testing. However, one fact should be noted: not all tests are conducted in a uniform and consistent manner, and test results are often hard to quantify; as a result, variations can arise from the feasibility of testing, the availability of equipment, etc. On the numerical front, there is still no standardized approach to developing numerical models. Thus, any outcome stemming from such models may not only be resource-expensive but also skewed toward the modelers' preferred finite element (FE) software, mesh sensitivity, solution criteria, and other factors. In a practical sense, the aforenoted arguments can be assumed to have only a minor adverse influence.

Hence, regardless of the solutions adopted, the outcome of the enhanced investigations (i.e., observations obtained from tests on both benchmark and altered specimens, whether by combining multiple studies or by augmenting them with numerical works) is often analyzed with traditional statistical methods. To properly apply statistical methods, conditions such as preset assumptions and requirements on data size, distribution, quality, etc. must be met (Ziegel, 2003). If one or more assumptions are breached, the resulting statistical inference could become weak or untenable.

Perhaps one of the primary limitations of statistical inference lies in the fact that a statistical model must first be fitted (i.e., its parameters estimated) before it can predict the response of a given phenomenon (Christodoulou et al., 2019). Yet, real data may not comply with the requirements of commonly adopted statistical models, especially when the number of parameters grows or their relationships become complex (Bzdok et al., 2018). As such, although the reliance on statistical methods may provide some tools to chart approximate solutions to the problems at hand, it may still not unlock the mechanics of such problems. In a way, statistical inference is rarely equivalent to causal inference (Holland, 1986).

To overcome the limitations of classical statistics, nonparametric methods based on artificial intelligence (AI) have become attractive tools (Sun et al., 2021). With AI, data generally no longer need to satisfy preset criteria or model assumptions. Unlike statistical methods, AI can directly learn from data to search for patterns that tie input parameters to a phenomenon's output (hence becoming applicable in scenarios where data comprise highly nonlinear/high-dimensional interactions) (Hand, 2013). This development paves the way to integrating AI into structural engineering; in fact, the open literature reveals that predictions from data-driven AI methods outperform those from statistical methods and the approaches adopted in codal provisions (Bijelić et al., 2019; Degtyarev, 2021; Mousavi et al., 2012; Naser et al., 2021; Solhmirzaei et al., 2020; Xu et al., 2021).

Building on the success of AI, an optimist might ask: (1) Would leveraging AI [given its superior predictivity and the fact that it utilizes more modern, more diverse, and larger data pools than those used in deriving codal provisions] overcome the challenges associated with traditional investigation methods? And (2) if so, would we be able to accelerate knowledge discovery if we supplement our empiricism and rationalism with AI? These questions are counterfactual in nature, and counterfactuals are an essential part of the theories of the school of causality. This study is motivated by these questions.

At this point, many of the existing AI models remain data-driven, primarily black boxes with limited potential to establish causality (Pearl, 2009). In other words, AI-based methods can easily be applied to predict an answer to a given problem from a data-driven perspective. Yet, the same models may not be able to identify the cause-and-effect functional relations that govern such a problem, simply because these models are black boxes, which fail to disclose the relationships between the governing parameters and, most importantly, are not developed to follow causal principles. This answers the questions raised above.

Over recent years, advances on the front of explainability have made it possible to convert black-box AI models into eXplainable AI (XAI) or white-box models (Naser, 2021b; Rudin, 2019). With the capability to display the influence of each input parameter on the predicted outcome, white-box models can, for instance, articulate how parameter X, along with other factors, influences the model's predictions. Understanding how the input parameters influence the model's predictions helps us uncover possible hidden links from a data-driven perspective. When supplemented with structural engineering domain knowledge, such links can be turned into causal relationships (i.e., mapping functions) that map cause-and-effect associations. Such functions encode real behaviors (i.e., not confined by working assumptions and/or linearity principles, among others) and hence offer higher value than those obtained from purely statistical methods.

This study takes a philosophical look at causality from a structural engineering perspective. Specifically, it starts with a discussion of causality and how the principles of causality and XAI can be adopted in structural engineering. It then establishes an approach and criteria for deriving causal laws and expressions (i.e., mapping functions) by adopting XAI and combining induction- and deduction-based methods, so as to describe structural engineering phenomena. The proposed approach is then examined through a series of case studies to derive causal expressions and predict the sectional capacities of load-bearing members.

2 Causality: overview and examples

The concept of causality is domain-independent; nevertheless, establishing specific causality is likely to be a domain-dependent task, because each research area leverages its own domain knowledge in pursuit of causality. How, then, do we establish proper causality? In a broad sense, there are two schools of causal analysis: one is referred to as the Potential Outcomes approach (Rubin, 2005), and the other the Directed Acyclic Graph approach (Pearl & Mackenzie, 2018). The former strives to analyze data (the results of randomized experiments or observational studies) so as to estimate population-level causal effects, as opposed to unit-level effects, since only one state can be observed for each unit in reality. By applying a form of difference between the treated and control states, this approach tries to quantify average causal effects and establish causality (i.e., the observed responses of the treated units minus the observed responses of the control units).

On the other hand, the second approach emphasizes adopting graphical methods to identify possible links (→) between causes and effects. Such links portray a directional argument that goes from cause(s) to effect(s). In this approach, causality is regarded as a three-level hierarchy that begins with association [purely statistical relationships as defined by data, i.e., how is a unit change in X associated with a change in the observed response?], followed by intervention [how will changing one parameter to a specific value (say, X to 5X) affect the observed response?] and further by counterfactuals [the ability to answer questions in what-if form, i.e., what would happen if one or more parameters were altered? Or, what parameters have to be changed to realize a predefined response?] (Imbens, 2020; Pearl, 2009). An overview of research on causality is summarized in Table 1, and some noteworthy resources on the origin and philosophical views of causality that go beyond the realm of structural engineering can be found elsewhere (Bunge, 2017; Leuridan & Weber, 2012).

Table 1 Causality based on Brady (2008)

Recent progress in the field of causality results from investigations often carried out in non-engineering domains (such as epidemiology, medicine, policy research, and the social sciences). The problems in such domains are fundamentally different from those in structural engineering. For instance, social science is interested in social phenomena, which are often complex and do not have clear-cut boundaries (for instance, the influence of higher education on wages). The same is also true of the number of parameters involved, or perhaps more appropriately, covariates (e.g., age, race, parental background) and units of investigation (i.e., workers).

There are other differences between social and engineering phenomena. For example, the former may contain unobserved or poorly defined parameters, with an emphasis on randomization to establish causality (Robins, 2020). Suppose that a researcher is interested in investigating the effect of higher education on wages. When examining the wage of a worker with higher education, it is impossible to observe how that worker's wage would have differed had the worker not received higher education. In other words, no means is available to observe the same worker (a unit of investigation) under two parallel conditions (or in two parallel worlds, as commonly noted in social science). This impossibility further complicates causality in social science, since "causal inference is inherently a missing data problem" (Imbens & Rubin, 2015).

On the other hand, structural engineers can readily compare the states of two similar units of investigation (e.g., beams) in a given experimental setting. In fact, as noted in the introduction, a typical testing program contains a benchmark specimen and a series of altered specimens, wherein each specimen is likely to vary in one parameter only. The nature of our investigations, while limited since we are able to explore neither all parameters nor all their interactions, still allows us to implement causal-like investigations in a structured and attainable manner.

The following quote is often attributed to Leonardo da Vinci: "There is no result in nature without a cause; understand the cause, and you will have no need of experiments." From the structural engineering perspective, these words emphasize the need to understand the relationship between causes and results (or effects), while also highlighting the reliance on experiments and their inherent limitations. Finally, this quote drives us to understand the causal link(s) between causes and effects, so as to leverage such links to advance knowledge at an affordable and accelerated rate.

Consider the following example. Beams are load-bearing members designed to resist actions applied transverse to their axis, and they are likely to fail due to bending and/or shear effects. For simplicity, assume that a W-shaped beam, Beam A, is made from Grade 50 ksi structural steel and can only fail through bending once its moment capacity, M50ksi, is exceeded. To find out whether Beam B, a beam identical to Beam A, would have failed at a similar moment level had it been made from Grade 36 ksi steel, the terms "would have" and "had it been made" imply the need for a causal investigation. To answer this question, Beam B is fabricated and tested. It is noted that Beam B fails once its capacity reaches M36ksi (which happens to be < M50ksi). This ends our investigation by answering the counterfactual question. But a second question arises: was it necessary to test Beam B?

From an empirical view and ceteris paribus, beams made from stronger construction materials (in this case, Grade 50 ksi steel) tend to feature larger moment capacities than identical beams made from inferior materials. In parallel, from a rational view, Eq. 1 reveals that the moment capacity of a typical W-shaped steel beam is a function of the plastic section modulus, Z, and the yield strength, fy, of the structural steel:

$$M={f}_{y}\times Z$$
(1)

Since the yield strength of Grade 36 ksi steel is lower than that of Grade 50 ksi steel, it logically follows that M36ksi < M50ksi. Now, the difference |M50ksi − M36ksi| is the direct causal effect of altering the material grade on the gap in moment capacity between the above two beams. Despite its simplicity, the same analysis infers that we do not have to test Beam B, for the following two reasons: (1) empiricism, by observation, generates a qualitative outcome that answers our question, and (2) rationalism, by substitution into Eq. 1, generates a quantitative outcome that also answers our question.
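To make the rationalist computation concrete, the following minimal Python sketch evaluates Eq. 1 for both grades and reports the resulting direct causal effect; the value of Z is a hypothetical placeholder, not a value taken from this study.

```python
# Minimal sketch of Eq. 1 for a hypothetical W-shaped section (Z is assumed for illustration).
Z = 54.9           # plastic section modulus, in^3 (hypothetical value)
M_50ksi = 50 * Z   # moment capacity with Grade 50 ksi steel, kip-in
M_36ksi = 36 * Z   # moment capacity with Grade 36 ksi steel, kip-in

# Direct causal effect of changing the steel grade, ceteris paribus
effect = abs(M_50ksi - M_36ksi)
print(f"M_50ksi = {M_50ksi:.0f} kip-in, M_36ksi = {M_36ksi:.0f} kip-in, effect = {effect:.0f} kip-in")
```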

Point 1 suggests that the empirical observations acquired in labs (for example, elements of higher-grade materials tend to have larger capacities, or beams tend to deflect under loading) encode causality within them. Revisiting the above example implies that Beams A and B would not have failed in the described manner were it not for the combination of their geometric and material properties. In other words, due to the difference in yield strength, Beam B would likely not fail at the same moment level as Beam A.

Here is another example. Suppose that we are now interested in exploring the difference in the load-deformation history between Beams A and B. Then, instead of using Eq. 1 as a basis for comparison, these two beams need to be tested, since no expression is available that enables us to plot the load-deformation history of the two beams directly. Similar to the above example, the observations we record while testing Beams A and B also encode causality (i.e., they represent the natural events of load-deformation as they happened, without any assumption). The difference between the obtained plots is the direct causal effect of Beam A being made from Grade 50 ksi steel and Beam B from Grade 36 ksi steel. Thus, if the causal mechanism behind the load-deformation history of beams is decoded, it may be possible to derive an expression that can predict the load-deformation history of any beam, without the need for physical tests (which are resource-intensive) or numerical simulations (which are computationally expensive and, most importantly, entail working assumptions). Unraveling the causality behind natural phenomena enables us to minimize our reliance on physical tests/simulations and derive improved and realistic theories at an accelerated rate.

Point 2 implies that the rationalism adopted in our domain is elemental to attaining causality, by articulating the governing factors of a phenomenon and, in many cases, by providing a quantifiable measure of the influence of such factors. For example, Eq. 1 shows that the geometric features and the material properties contribute jointly to the moment capacity of W-shaped steel beams, and it displays the relationship between these factors (i.e., a functional relationship in multiplicative form), indicating that an increase in either parameter (or both) leads to an increase in the moment capacity. Finally, this equation is a mathematical function, which can be documented and used with ease (without the need to program a set of software/algorithms for each individual case).

A deeper dive into Points 1 and 2 reveals an opportunity to tie empiricism (induction) and rationalism (deduction) together so as to arrive at causal laws (see Table 2). For this purpose, observations must be examined, and thankfully, the structural engineering literature provides plenty of them. Analyzing such observations, however, is likely to consume tremendous computational resources. Thus, this study maintains that XAI, which can overcome many limitations of traditional AI and statistical methods, should be adopted as an attractive solution. The following section provides more details on this front.

Table 2 Comparison and convergence between empiricism (induction) and rationalism (deduction)

2.1 Empiricism → Rationalism → Mapping Functions through XAI

The previous section gives an overview of the principles of causality while charting a path through the realms of empiricism and rationalism. This path starts from observing a re-occurring phenomenon (e.g., under certain conditions, simply supported beams may fail in bending). Re-occurrence of a phenomenon means that it appears consistently in tests and/or simulations. By examining such observations, it is possible to identify the causal mechanisms behind this consistency. To uncover such mechanisms (e.g., the cause behind an effect), the parameters governing the phenomenon must first be identified, and how such parameters are consistently tied together to enable the re-occurrence of the phenomenon must be understood. This exercise combines the strengths of empiricism (induction) and rationalism (deduction).

Figure 1 presents a flowchart of the proposed approach. Steps 1, 2 and 3 are self-explanatory and have been implicitly discussed in the above sections. Therefore, the following discussion begins with Step 4, which combines data from a series of experiments. Ideally, these data comprise observations (e.g., sectional capacities, load-deformation history plots) and their governing parameters (e.g., geometric, material, loading and restraint features). In this instance, a quick refresher on structural engineering principles comes in handy, as shown in the following example.

Fig. 1 Outline of the proposed approach

Generally speaking, sectional structural engineering phenomena (e.g., the moment or shear capacity) are governed by two groups of components: the geometric features of a structural member and the properties of the materials comprising this member, as seen in Eqs. 1 and 2; the latter illustrates the moment capacity of a typical rectangular reinforced concrete (RC) beam with tensile reinforcement. Note that Eqs. 1 and 2 are arrived at from mechanics principles and present a functional relation that arises from the geometry and material properties of beams. Thus, at the sectional level, the governing parameters are expected to fall into the classes of geometric features and material properties.

$$M= {A}_{s}\times {f}_{y}\left(d-\frac{a}{2}\right)$$
(2)
$$\mathrm{with}\ a=\frac{{A}_{s}\times {f}_{y}}{0.85{f}_{c}\times b}$$

where, As: area of the tensile steel reinforcement; d: effective depth of the concrete beam; b: width of the concrete beam; fc: compressive strength of the concrete; and fy: yield strength of the tensile steel reinforcement.
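As a worked illustration of Eq. 2, the short Python sketch below evaluates the moment capacity of a hypothetical rectangular RC beam; all numerical values are assumed for illustration and do not come from this study.

```python
# Minimal sketch of Eq. 2 for a hypothetical rectangular RC beam (all values assumed).
A_s = 3.0    # area of tensile reinforcement, in^2
f_y = 60.0   # yield strength of reinforcement, ksi
f_c = 4.0    # concrete compressive strength, ksi
b = 12.0     # beam width, in
d = 21.5     # effective depth, in

a = (A_s * f_y) / (0.85 * f_c * b)   # depth of the equivalent stress block, in
M = A_s * f_y * (d - a / 2)          # nominal moment capacity, kip-in
print(f"a = {a:.2f} in, M = {M:.0f} kip-in")
```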

On the other hand, behavioral structural engineering phenomena (e.g., those occurring at the element level, such as load-deformation history or the buckling of columns) require additional governing parameters beyond the geometric features and material properties. Most notably, such factors arise from the applied loading and/or boundary conditions (given that both influence the behavioral response, such as the deformation response of loaded members; in other words, under the same loading conditions, two identical beams, one simply supported and the other restrained, will deliver different deformation patterns. The same is true for two identical, simply supported beams, one loaded in flexure and the other in shear). Intuitively, it can be inferred that: (1) complex phenomena are likely to require a larger number of parameters, and (2) the classes of parameters are mutually independent.

Unlike Eqs. 1 and 2, there is currently no expression available to plot the complete load-deformation history of beams (as opposed to expressions that predict the elastic deflection of beams). On a more positive note, such a history can be arrived at via testing, FE simulation or moment-curvature analysis. While these methods are well established, they remain costly and cumbersome. Therefore, it is necessary to develop a new approach to predicting behavioral structural engineering phenomena (e.g., the load-deformation history of beams or of any member, for that matter).

In this approach, a surrogate comprises one or more AI algorithms combined into an ensemble. The use of an ensemble is favorable, as an ensemble averages the predictions of two or more algorithms, thereby minimizing the bias or poor predictivity that may result from relying on a single algorithm (Fernández-Delgado et al., 2019). Results of an ensemble are showcased in the next section; the outlined approach is algorithm-agnostic and could still be carried out with either individual algorithms or ensembles.
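The following minimal sketch shows what such an averaging ensemble could look like in Python with scikit-learn; the member algorithms, their settings, and the synthetic data are illustrative assumptions rather than the exact configuration used in this study.

```python
# Minimal sketch of an averaging ensemble; member algorithms and data are illustrative only.
import numpy as np
from sklearn.ensemble import VotingRegressor, GradientBoostingRegressor, AdaBoostRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
Z = rng.uniform(10, 500, size=(200, 1))        # hypothetical plastic section moduli, in^3
fy = rng.choice([36.0, 50.0], size=(200, 1))   # two steel grades, ksi
X, y = np.hstack([Z, fy]), (Z * fy).ravel()    # synthetic target following Eq. 1

ensemble = VotingRegressor([                   # VotingRegressor averages the member predictions
    ("gbt", GradientBoostingRegressor()),
    ("ada", AdaBoostRegressor()),
    ("nn", MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)),
])
ensemble.fit(X, y)
print(ensemble.predict([[100.0, 50.0]]))       # true value from Eq. 1 would be 5000 kip-in
```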

In Step 5, the selected ensemble is examined with explainability measures (AI → XAI), which indicate the importance of each parameter in the ensemble's predictions (i.e., how heavily a parameter is used in each prediction) and the partial effect of each parameter as it varies across its range while the other parameters are held fixed. Knowing the importance of a parameter helps visualize its significance from a data-driven perspective. Similarly, visualizing how the influence of a parameter changes over the course of a phenomenon sheds light on the roles the parameter may play in predicting that phenomenon, as discussed for Eqs. 1 and 2.

The same is also true for complex history-based or time-dependent phenomena. To better visualize this, imagine how the cracking strength of concrete is expected to play a more dominant role during the early stages of the load-deformation history of RC beams (prior to and up to cracking) than in the post-cracking regime [since the post-cracking portion of the load-deformation plot tends to be governed by the response of the tensile reinforcement and the compressive strength of the concrete (Chern et al., 1992)]. Hence, a proper XAI ensemble should yield results that match our domain-knowledge expectations and possibly pinpoint new knowledge that we have not yet acquired. In other words, we are comparing our domain knowledge with that obtained from XAI, in order to establish the ensemble's validity and, thereby, its causal capability.

In Step 6, the AI ensemble is converted into a mapping function. Mapping a phenomenon indicates that the relationship between the governing parameters and the observed responses can be derived with good approximation and sufficient consistency; in this way, a mapping function can be applied to predict a given phenomenon with confidence (Naser, 2021a). Simply put, as an expression that ties the input(s) to the output(s) of a phenomenon, a mapping function can be acquired through nonparametric techniques, such as genetic algorithms or generalized additive models (Gomez-Rubio, 2018; Sivanandam & Deepa, 2008). Mapping functions are favored in this study over XAI algorithms/ensembles because of (1) their transparency; (2) their resemblance to the design expressions engineers are familiar with; and (3) their independence from AI coding/software. For an in-depth discussion of the rationale behind mapping functions, readers are referred to recent explorations (Naser, 2021c).
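To illustrate how a mapping function could be extracted by a genetic (evolutionary) technique, the sketch below uses the gplearn symbolic-regression library as one possible tool; the library choice, its settings, and the synthetic data are assumptions, not the exact software or data used in this study.

```python
# Minimal sketch of deriving a mapping function via genetic programming (gplearn assumed).
import numpy as np
from gplearn.genetic import SymbolicRegressor

rng = np.random.default_rng(0)
Z = rng.uniform(10, 500, size=200)
fy = rng.choice([36.0, 50.0], size=200)
X, y = np.column_stack([Z, fy]), Z * fy        # synthetic observations following Eq. 1

sr = SymbolicRegressor(population_size=2000, generations=20,
                       function_set=("add", "sub", "mul"), random_state=0)
sr.fit(X, y)
print(sr._program)   # ideally recovers a form equivalent to mul(X0, X1), i.e., M = Z * fy
```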

Finally, in Step 7, the derived mapping function is tested to explore causality based on the Potential Outcomes and Directed Acyclic Graph approaches. In this step, the mapping function can be examined by exploring how the responses of two identical elements (e.g., beams) vary when a particular parameter, X, is altered in one beam (BeamAlt) relative to a benchmark beam (BeamBench). The difference (e.g., in absolute terms) between the sectional capacity or load-deformation of BeamAlt and that of BeamBench, as obtained from the mapping function, bears a resemblance to the average treatment effect acquired in the Potential Outcomes approach. In other words, the response of BeamAlt differs from that of BeamBench because parameter X has been altered (e.g., the alteration of X is the cause of the newly observed effect).

The same mapping function can also be examined based on the principles outlined in the Directed Acyclic Graph approach (e.g., by exploring association → intervention → counterfactuals), where one can apply the mapping function to examine each of these concepts and advance knowledge of the phenomenon. For example, the mapping function could be used to explore how varying parameter X (or any other parameter) by one unit is associated with a variation in the observed response. Similarly, the same function could be used to predict a new response when parameter X (or any other parameter) is intervened on (set) to a new value. Finally, the same function can also be utilized to answer hypothetical questions, for example: what would happen if X, or X, Y and Z, were altered simultaneously? Or, which parameters should be changed to realize a predefined response?
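The short sketch below illustrates, under the simplifying assumption that Eq. 1 serves as the mapping function, how such association-, intervention- and counterfactual-style queries could be posed in code; the numerical values are hypothetical.

```python
# Minimal sketch of interrogating a mapping function (Eq. 1 used here for simplicity).
def mapping_function(Z, fy):
    return fy * Z   # kip-in, with Z in in^3 and fy in ksi

Z0, fy0 = 100.0, 50.0
baseline = mapping_function(Z0, fy0)

# Association: how a unit change in Z relates to a change in the predicted response
association = mapping_function(Z0 + 1.0, fy0) - baseline

# Intervention: set fy to a new value and observe the new prediction
intervention = mapping_function(Z0, 36.0)

# Counterfactual-style query: what Z would be needed to keep the baseline capacity at fy = 36 ksi?
Z_needed = baseline / 36.0

print(association, intervention, Z_needed)
```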

Revisiting Steps 5–7 helps infer the criteria for possible mapping functions. Such criteria establish the degree of describe-ability (the ability only to describe trends in data, i.e., data-driven trends) and/or cause-ability (the ability to allow causal interpretation by combining induction and deduction and satisfying the proposed criteria) associated with a given mapping function. Five criteria are proposed herein for the latter:

  • To be able to answer causal questions in terms of potential outcomes as well as associations, interventions, and counterfactuals.

  • It is preferable to contain most, if not all, of the parameters that, based on our domain knowledge, are commonly accepted to govern the phenomenon.

  • It is preferable to take the same functional form as that of an equivalent existing expression, if such an expression exists (e.g., generated from mechanics or codal provisions).

  • It is preferable to have the same range of applicability (with the ability to cater to a larger range).

  • It is preferable to be compact and easy to use/apply.

3 Artificial intelligence analysis: ensemble and explainability measures

This section presents details pertaining to ensemble development, explainability measures, and the AI analysis. The developed ensemble averages the predictions of three algorithms [namely, Extreme Gradient Boosted Trees (XGBoost), AdaBoost Regressor (AdaBoost), and a Keras Neural Network (KNN)]. A brief description of the selected algorithms is provided herein for illustrative purposes, since detailed descriptions can be found in their respective references. To maintain transparency and allow easy replication of this analysis, the algorithms are primarily implemented in their default settings, which are available from their original references.

3.1 Ensemble details

3.1.1 Extreme gradient boosted trees (XGBoost)

XGBoost is a tree-based algorithm that optimizes an arbitrary differentiable loss function over weak predictors (Freund & Schapire, 1997). It fits consecutive trees to the residual errors, so as to concentrate training on the targets that are most challenging to predict. The code for the XGBoost implementation used here can be found online (XGBoost Python Package, 2020). The selected algorithm uses pre-tuned settings: a learning rate of 0.05, the least-squares regression loss function, a maximum tree depth of 3, a subsample fraction of 1.0, and a total of 1,000 boosting stages.
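For readers wishing to replicate these settings, a minimal sketch of the corresponding configuration in the XGBoost Python package is given below; the fit/predict calls are left as comments since the training data depend on the case study.

```python
# Minimal sketch of an XGBoost regressor with the settings listed above.
from xgboost import XGBRegressor

xgb_model = XGBRegressor(
    n_estimators=1000,             # 1,000 boosting stages
    learning_rate=0.05,
    max_depth=3,
    subsample=1.0,
    objective="reg:squarederror",  # least-squares regression loss
)
# xgb_model.fit(X_train, y_train); xgb_model.predict(X_test)
```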

3.1.2 AdaBoost Regressor (AdaBoost)

The AdaBoost regressor fits the data with weights adjusted according to the prediction errors (Freund & Schapire, 1997). In addition, AdaBoost builds a group of regressors (rather than a single regressor) to analyze the data. The code script for this algorithm can be found at Scikit (2021). The adopted algorithm uses a linear loss function, with 600 estimators and a learning rate of 0.05.
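A minimal scikit-learn sketch with the stated settings follows; as above, fitting is left as a comment because it depends on the case-study data.

```python
# Minimal sketch of the AdaBoost regressor with the settings listed above.
from sklearn.ensemble import AdaBoostRegressor

ada_model = AdaBoostRegressor(
    n_estimators=600,
    learning_rate=0.05,
    loss="linear",
)
# ada_model.fit(X_train, y_train); ada_model.predict(X_test)
```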

3.1.3 Keras neural network (KNN)

Keras is a neural network library (Li et al., 2018). In this neural network, data are connected to the observed response through an optimized loss function. A learning rate of 0.03, a PReLU activation function, an Adam optimizer, and two layers of 512 neurons each are used in this network. Keras can be readily found at Keras (2020).
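The sketch below shows one possible Keras definition consistent with these settings; the input dimension of 2 and the mean-squared-error loss are assumptions for illustration, since neither is specified above.

```python
# Minimal sketch of the Keras network with the settings listed above
# (input dimension and loss function are assumed for illustration).
from tensorflow import keras
from tensorflow.keras import layers

knn_model = keras.Sequential([
    keras.Input(shape=(2,)),          # two input features assumed (e.g., Z and fy)
    layers.Dense(512), layers.PReLU(),
    layers.Dense(512), layers.PReLU(),
    layers.Dense(1),
])
knn_model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.03), loss="mse")
# knn_model.fit(X_train, y_train, validation_data=(X_val, y_val))
```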

3.2 Explainability measures

Once an ensemble is developed, two explainability measures are added to provide insights into the governing parameters. These measures are feature importance and partial dependence plots.

3.2.1 Feature importance and SHapley Additive exPlanations (SHAP) values

The developed ensemble examines a series of features. Since each feature is used to predict a given response, the ensemble's predictions can be interpreted by tracing the influence of each feature. In the traditional sense, a feature is deemed important if permuting its values increases the error of the obtained predictions; this effectively implies that the ensemble relies heavily on this particular feature to deliver good predictions (Altmann et al., 2010). More recently, a more unified method has been introduced, referred to as the SHapley Additive exPlanations (SHAP) method, which explains feature importance via Shapley values (Lundberg & Lee, 2017; Lundberg et al., 2018). In SHAP, features with large absolute Shapley values are deemed important (Molnar, 2019).
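As one common way to obtain such values in practice, the sketch below uses the shap library on a fitted tree-based model; the model name (xgb_model), the feature matrix X, and the feature names are assumptions carried over from the earlier sketches rather than the exact objects used in this study.

```python
# Minimal sketch of SHAP-based feature importance for a fitted tree-based model
# (assumes a fitted `xgb_model` and feature matrix `X`, e.g., from the sketches above).
import shap

explainer = shap.TreeExplainer(xgb_model)    # tree SHAP for boosted-tree models
shap_values = explainer.shap_values(X)       # one contribution per feature per prediction
shap.summary_plot(shap_values, X, feature_names=["Z", "fy"])  # global importance view
```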

3.2.2 Partial dependence plots

Partial dependence indicates a feature's marginal effect on the ensemble's predictions when the other features are held constant (Friedman, 2001). It is often displayed via plots [Partial Dependence Plots (PDPs)], which assume the independence of the feature of interest from the other features. Overall, a PDP illustrates the global impact of a feature over the full range of the observed response.
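A minimal sketch of generating such plots with scikit-learn is shown below; the fitted ensemble, feature matrix, and feature names are assumptions reused from the earlier sketches.

```python
# Minimal sketch of partial dependence plots via scikit-learn
# (assumes the fitted `ensemble` and feature matrix `X` from the earlier sketch).
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

PartialDependenceDisplay.from_estimator(ensemble, X, features=[0, 1],
                                        feature_names=["Z", "fy"])
plt.show()
```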

3.3 Analysis procedure

Once developed, the ensemble is fed with randomly shuffled observations comprising three subsets (T: training; V: validation; and S: testing). The first two subsets are used to train and validate the ensemble, while the third is used to cross-check its predictions independently. The training, validation and testing of the ensemble follow a tenfold cross-validation procedure (Xiong et al., 2020). In addition, three dedicated performance indicators [namely, Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Coefficient of Determination (R2)] are applied to quantify the performance of the ensemble at each phase of its development, as indicated in Eqs. 3–5 (Naser & Alavi, 2021). Additional tests recommended by Smith (1986) [i.e., a correlation coefficient (R) > 0.8 together with low error metrics (e.g., MAE) indicates a strong correlation between the predictions and the actual measurements] and an external predictability indicator (Rm > 0.5) proposed by Roy and Roy (2008) are also applied to add another layer of validation.

$$\mathrm{MAE}= \frac{\sum_{i=1}^{n}\left|{E}_{i}\right|}{n}$$
(3)
$$\mathrm{RMSE}= \sqrt{\frac{\sum_{i=1}^{n}{{E}_{i}}^{2}}{n}}$$
(4)
$${R}^{2}=1-\frac{\sum_{i=1}^{n}{\left({P}_{i}-{A}_{i}\right)}^{2}}{\sum_{i=1}^{n}{\left({A}_{i}-{A}_{\mathrm{mean}}\right)}^{2}}$$
(5)

where, A: actual measurements, P: predictions, n: number of data points, and E = A − P.

$$R=\frac{{\sum }_{i=1}^{n}({A}_{i}-{\overline{A}}_{i})({P}_{i}-{\overline{P}}_{i})}{\sqrt{{\sum }_{i=1}^{n}{({A}_{i}-{\overline{A}}_{i})}^{2}{\sum }_{i=1}^{n}{({P}_{i}-{\overline{P}}_{i})}^{2}}}$$
(6)
$${R}_{\mathrm{m}}={R}^{2}\times \left(1-\sqrt{\left|{R}^{2}-{R}_{o}^{2}\right|}\right)$$
(7)

where

$${R}_{o}^{2}=1-\frac{\sum_{i=1}^{n}{\left({\mathrm{predicted}}_{i}-{\mathrm{updated}}_{i}^{o}\right)}^{2}}{\sum_{i=1}^{n}{\left({\mathrm{predicted}}_{i}-\mathrm{mean\ of\ predictions}\right)}^{2}},\quad \mathrm{with}\ {\mathrm{updated}}_{i}^{o}=k\times {\mathrm{predicted}}_{i}$$
(8)

and k is the slope of the regression line of the actual measurements against the predictions.
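The sketch below implements these indicators in NumPy; the slope k is computed as a regression through the origin of actuals against predictions, which is an assumed (though common) reading of the definition above.

```python
# Minimal sketch of the performance indicators in Eqs. 3-8 using NumPy
# (the slope k is computed as a regression through the origin; an assumption).
import numpy as np

def performance(A, P):
    A, P = np.asarray(A, float), np.asarray(P, float)
    E = A - P
    mae = np.mean(np.abs(E))                                         # Eq. 3
    rmse = np.sqrt(np.mean(E**2))                                    # Eq. 4
    r2 = 1 - np.sum((P - A)**2) / np.sum((A - A.mean())**2)          # Eq. 5
    r = np.corrcoef(A, P)[0, 1]                                      # Eq. 6
    k = np.sum(A * P) / np.sum(P**2)                                 # slope of actuals vs. predictions (assumed)
    ro2 = 1 - np.sum((P - k * P)**2) / np.sum((P - P.mean())**2)     # Eq. 8
    rm = r2 * (1 - np.sqrt(abs(r2 - ro2)))                           # Eq. 7
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "R": r, "Rm": rm}

print(performance([100, 200, 300, 400], [110, 190, 295, 405]))       # illustrative values
```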

Once the ensemble is developed, it is augmented with a genetic algorithm surrogate so as to translate it into a mathematical function (a mapping function), which is then examined via the above metrics to ensure its predictivity (Goldberg & Holland, 1988; Sivanandam & Deepa, 2008). Specifically, the mapping function is examined by articulating association, intervention and counterfactuals, in order to explore possible causality. It is worth noting that additional indicators, algorithms or surrogate techniques can also be used as part of the proposed approach (Babanajad et al., 2017; Zhang et al., 2018). Interested readers are encouraged to explore these areas further.

4 Case study

This section describes a case study to implement the proposed approach at the sectional level of load-bearing members.

Sectional phenomenon: moment capacity of W-shaped steel beams.

This case study investigates a sectional phenomenon: the moment capacity of compact W-shaped steel beams. A database of compact W-shaped steel beams obtained from the AISC manual (AISC, 2017) is developed. In this database, the sections and their geometric properties (i.e., Z), as well as the yield strength taken as either 36 ksi or 50 ksi, are compiled. Each section has one Z value and two fy values (and hence two values for its moment capacity). Table 3 details this database and shows the association between Z, fy and the moment capacity, as obtained from Pearson correlation analysis and mutual information analysis.
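The sketch below illustrates how such a database and its statistical insights could be assembled in Python; the Z values are placeholders for illustration, not the actual AISC listings, and the statistics printed are therefore not those of Table 3.

```python
# Minimal sketch of assembling the case-study database and its statistical insights
# (placeholder Z values; not the actual AISC data used in this study).
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

sections = pd.DataFrame({"Z": [20.1, 54.9, 101.0, 224.0, 370.0]})        # placeholder Z values, in^3
db = sections.merge(pd.DataFrame({"fy": [36.0, 50.0]}), how="cross")     # each section at both grades
db["M"] = db["Z"] * db["fy"]                                             # moment capacity per Eq. 1, kip-in

print(db.corr(method="pearson"))                                         # Pearson correlation
print(mutual_info_regression(db[["Z", "fy"]], db["M"], random_state=0))  # mutual information
```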

Table 3 Statistical insights from the collected database

The above database is then fed into the developed ensemble following the proposed approach, so as to derive mapping functions for the moment capacity of compact W-shaped steel beams. Figure 2 and Table 4 present the results of this analysis. Figure 2 shows the moment capacity predictions obtained from the ensemble and the mapping functions. As expected, the derived function given in Eq. 9 is identical to Eq. 1, the simplest possible form that can be reached.

Fig. 2 Comparison of predictions from the ensemble and mapping functions (Note: bounds represent a constant ±10% error bound)

Table 4 Performance of the ensemble and mapping functions for training/validation/testing regimes
$$M={f}_{y}\times Z$$
(9)
$$M=927.88 + 0.95Z{f}_{y}$$
(10)

The precursor form shown in Eq. 10 was obtained before Eq. 9. As can be seen, this form of the derived mapping function resembles Eqs. 1 and 9. However, adopting this form implies a constant (i.e., taken-for-granted) moment capacity of 927.88 kip·in even for fictitious steel sections (i.e., with Z = 0 and/or fy = 0). This form violates some of the earlier criteria for selecting mapping functions and, if adopted, needs to be used with caution. It is interesting to note that both functions score favorable metrics in terms of MAE, RMSE and R2, as well as the additional tests recommended by Smith (R > 0.80) and Roy and Roy (Rm > 0.5), as seen in Table 4 and Fig. 2.

A deeper exploration of this case study reveals the influence of the Z and fy parameters on the predictions from the ensemble and the mapping functions. The influence of each feature is plotted in Fig. 3. This figure shows good agreement between the ensemble and the mapping functions and also demonstrates the heavy influence of Z on the predictions of both. Such reliance on Z can be attributed to fy being limited to 36 ksi or 50 ksi (in other words, the parameter Z is elemental in delivering each prediction). In fact, Fig. 3 illustrates that the practical range of fy skews the functional relationship between Z and fy in favor of Z, while demonstrating that the partial dependence on Z is linear in the mapping functions and nonlinear in the ensemble (possibly due to the more complex relationships that the ensemble has devised for Z).

Fig. 3 Feature importance (top), and PD plots for Z (middle) and fy (bottom) from the ensemble vs. mapping functions (Eqs. 9 and 10)

It could be argued that the high performance of the ensemble and the derived functions is a reflection of the simplicity of the problem at hand. In this instance, simplicity refers to the existence of only two features, Z and fy, in predicting the moment capacity of compact steel beams, with fy reduced to two values (36 ksi and 50 ksi).

To complement this case study, an effort is made to add a new level of complexity to the above problem by breaking down the composite plastic section modulus Z into its basic components: depth, d; flange width, bf; flange thickness, tf; and web thickness, tw. The purpose of this operation is to identify whether the proposed approach can successfully uncover the composite relationship between Z and its components. Accordingly, the database is extended by adding the basic components and removing Z (see Table 5 for the updated statistics of the new database). A comparison between Tables 3 and 5 indicates that the association between fy and the geometric features is weak or non-existent (mixed results relative to the findings in Table 3).

Table 5 Statistical insights from the extended database

The new database was then analyzed using the proposed approach to generate a new mapping function (Eq. 11), which was then compared with Eq. 12, which substitutes Z in Eq. 1 with an expression that accounts for the geometric parameters comprising Z (derived from the principles of mechanics). The performance indicators for Eq. 11 are listed in Table 4 (also see Fig. 2 for a visual representation). These indicators are in good agreement with the other mapping functions, Eq. 1, and the ensemble. To give a complete picture, Fig. 4 showcases the feature importance (top) and PD plots for the newly examined features. Agreement can be observed in terms of feature importance and the PD relationships embedded within the ensemble and Eq. 11.

Fig. 4 Feature importance (top), and PD plots for Z (middle) and fy (bottom) from the ensemble vs. mapping function (Eq. 11)

$$M=6.24d{t}_{f}{f}_{y}+0.494d{b}_{f}{t}_{w}{f}_{y}+0.0194{b}_{f}{t}_{f}{f}_{y}{d}^{2}-0.063{t}_{w}{t}_{f}{f}_{y}{d}^{2}$$
(11)
$$M={f}_{y} \times \left({b}_{f}{t}_{f}\left(d-{t}_{f}\right)+0.25{t}_{w}{\left(d-2{t}_{f}\right)}^{2}\right)$$
(12)

To complete this case study, Table 6 lists the answers to the causal questions outlined in the Directed Acyclic Graph approach (e.g., exploring association → intervention → counterfactuals). Despite minor differences, it is interesting to note how well the proposed approach agrees with the expressions derived from pure mechanics principles. This adds a new layer of confidence to the proposed approach, showcases its applicability, and paves the way for future testing and examination.

Table 6 Further investigation into the ensemble and mapping functions

4.1 Further insights into causality and where to go from here

From a philosophical perspective, establishing causality unavoidably faces a domain-specific dilemma. For example, investigating causal questions often turns into decades-long campaigns (for instance: how will adopting a particular tax policy affect voters' preferences in the future? What is the influence of race, gender or education on wages? Would applying a medical procedure, X, be the best course of action for treating a disease, Y? And what are the effects of community outreach and public engagement on future crime rates?). Such problems are associated with an array of factors/confounders, many of which may not be observed in a timely manner, if at all. Thus, establishing causality in such domains may be tedious and exhaustive, and ultimately an admirable undertaking (Leuridan & Weber, 2012; Marini & Singer, 1988).

On the other hand, many of our current problems resemble a closed-loop system with a smaller search space than its aforementioned counterparts. For example, the geometric features of load-bearing members do not deviate beyond a certain range (i.e., there is a practical size range outside of which very small or very large members are rarely, if ever, used), and these members comprise a handful of construction materials and restraint conditions that vary between two extremes (simply supported and fixed). Regardless of their geometric features, material properties, loading conditions or other factors, load-bearing members deflect when stressed and fail when their sectional capacities are exceeded. This consistency narrows our search space and smooths our search for causality. This study maintains that exploring causality in structural engineering is perhaps a more attainable task than in other domains. Addressing such tasks is not only possible but probably of the highest merit for advancing our knowledge horizons.

This study illustrates the overall rationale of the proposed approach with a case study, leaving further investigations to future work. The selected case study articulates the applicability, usefulness, and validity of the proposed approach, and evidently, additional testing will help solidify the rationale behind it. Future explorations are therefore encouraged to examine, test and extend the ideas presented herein to other problems (of regression and classification nature) within this domain. A priority would be to investigate the full capacity of the Potential Outcomes and Directed Acyclic Graph approaches. Future examinations could also explore time-dependent and/or system-level phenomena (Naser, 2022; Naser & Ciftcioglu, 2022). Other research ideas may include developing causal and counterfactual measures tailored to structural engineering problems. It would also be advisable to revisit the techniques often adopted to establish causality in epidemiology, medicine and social science, including instrumental variables, structural/causal equation modeling, etc.

5 Conclusions

This paper presents a new look at causality in structural engineering that hopes to tie induction (empiricism) to deduction (rationalism) via mapping functions and XAI. The proposed approach starts by analyzing commonly observed, re-occurring phenomena so as to identify possible patterns/trends. Then, the arsenal of domain knowledge is revisited to pinpoint the governing factors associated with such phenomena. An XAI investigation is implemented to derive mapping functions that can describe the phenomena at hand in a causal manner. In summary, the following inferences can be drawn from the findings of this study:

  • Causality is the science of cause and effect; as a result, establishing causality is elemental to advancing the state of structural engineering.

  • Patterns identified from re-occurring structural engineering phenomena, especially consistent ones, can be used to integrate induction and deduction.

  • Explainability measures can assist in exploring algorithmic logic and establishing a basis for mapping functions.