1 Introduction

Reducing the errors of an existing computer model of a structure is typically performed through a process known as model updating. Updated computer models are beneficial for predicting structural response and for checking the design under various configurations. However, structural models always contain errors, regardless of how they are created. These errors take various forms, such as parametric uncertainty in the stiffness, mass, and damping of finite element (FE) models, errors introduced when the governing differential equations and boundary conditions are discretised into an FE model, and other modeling errors (Khodaparast et al. 2008; Soize 2013). In the early years, the natural frequencies of a structure and their corresponding mode shapes were the main modal parameters used to quantify model uncertainty. Since then, the application of structural model updating to damage detection (Ng et al. 2009) and health monitoring of diverse structures (Kuok and Yuen 2012; Ng 2014; Yin et al. 2009) has become well established.

Model updating is mostly approached as a statistical inference problem that can be solved by deterministic or probabilistic approaches. In the probabilistic setting, deterministic models of a structure are embedded in a set of probability models, so that each model yields a predictable part and an uncertain part in the form of a prediction error. Jaishi and Ren (2005) treated model updating as a constrained optimization problem: the discrepancy between the response predicted by the model and the measured response was reduced through an objective function combining mode shape, residual modal flexibility, and frequency terms, and the approach was validated with an ambient vibration test. Another example of deterministic methods can be found in Datta (2002). While deterministic methods are entirely appropriate for updating models, they face some limitations. For instance, by focusing on a single solution, most deterministic methods ignore the multiple plausible solutions that arise from modeling error and the incomplete nature of measurements. As the number of parameters to be updated grows, the dimensionality increases and the optimization process becomes complicated.

Whenever a complex structure is analyzed, the input parameters often contain errors caused by the absence of proper measurements. Unlike in computer modeling, where all degrees of freedom (DOFs) of a structure can be represented, it is not possible to measure all of these DOFs in practice. Modeling and measurement errors therefore induce different orders of uncertainty in model updating. Probabilistic models are flexible and account for these uncertainties of unpredictable origin. Bayesian model updating methods are probability-based and can handle the multiple types of uncertainty mentioned above. The Bayesian approach provides optimized values of the uncertain modal parameters by updating their probability density function (PDF), thereby helping engineers understand the system more precisely.

1.1 Background and motivation

Probabilistic model updating with a specific model reduction procedure has been performed on incomplete data in the literature, where identification and measurement of the parameters help to recover the modal quantities. A random-walk Markov Chain Monte Carlo (MCMC) model was used to estimate the uncertainties (Sun and Büyüköztürk 2015). MCMC has also been applied to compute the posterior probability density function (PDF) in Bayesian model updating, which helped to estimate the uncertainty and update the finite element model of the structure. In another work, efficient parameters were identified for model updating, and errors were reduced using sensitivity-based clustering (Jang and Smyth 2017). The measurement error in experimental data (frequencies and mode shapes) has also been propagated to the updated structural parameters using Bayes' rule; the collective error was modeled with Gaussian distributions, and the methodology was investigated on synthetic data along with its applications (Zhang et al. 2020). Bayesian optimisation has likewise been used in a constrained optimisation problem with multiple available sources (Ghoreishi and Allaire 2019).

Structural optimisation has been approached with various algorithms, such as the firefly algorithm and the cuckoo search (CS) algorithm (Gandomi et al. 2011, 2013). Non-deterministic optimization problems involving more than one parameter typically do not exploit causal information in decision-making; in a nutshell, the traditional optimization method implicitly treats every variable to be optimized as having a direct causal relationship with the outcome variable. Recent work in causal inference (Aglietti et al. 2020) has shown that optimization methods that use causal information can solve the same optimization problem in fewer iterations. In this work, we utilised causal information in the decision-making process.

Bayesian optimisation (BO) is an efficient method to estimate the posterior but suffers from costly iterations and dependence on the experimental budget. Hence, we used a CBO engine, an extended form of BO that exploits causal information in the optimization process and has proven cost-effective by reducing the number of iterations in various global optimisation problems (Aglietti et al. 2020). We also replaced the traditional Gaussian process with the causal Gaussian process and the acquisition function with the causal expected improvement (Aglietti et al. 2020). A previous work evaluated counterfactuals of random policies to build causal models (Buesing et al. 2018). Non-parametric Bayesian methods place priors on stochastic processes, which allows extremely flexible priors and posteriors whose complexity grows with the data size. Since prior beliefs can differ, or rather are subjective, different priors can be assigned to the same problem. Moreover, the selection of the prior significantly affects the solution, because the prior can be used to shrink the search space of a model. The most commonly used surrogates in non-parametric Bayesian methods are the Dirichlet process and Gaussian processes (GPs). GPs define a prior on the space of functions and are known as universal approximators (Patel and Oberai 2019).

The Bayesian approach behaves poorly when errors of various kinds are entangled, making it difficult to quantify each type of error. In this case, a multidimensional integration is required to evaluate the posterior probability density function (PDF). Simulation techniques such as MCMC (Parno and Marzouk 2018) and combined MCMC with Metropolis-Hastings (MH) (Lam et al. 2015) are used to avoid this integration and draw samples instead. However, Monte Carlo sampling can be inefficient, since it spends effort in regions where the likelihood takes small values (Lam et al. 2015). Depending on the identifiability of the problem, Gaussian processes provide a convenient way to evaluate the posterior PDF (Stuart 2010; Marzouk and Najm 2009). In the causal Gaussian process, we trade off observation (what we already know) against intervention in the regions where the uncertainty is likely to lie. The posterior updated through do-calculus helps in estimating the maximum likelihood, and we are generally more interested in regions of the search space where uncertainty is high than in regions where we are already certain.

CBO utilises the causal dependencies between the parameters to minimise the total number of iterations required to reach a global optimum. Once it determines the causal dependencies, it breaks the single surrogate model into multiple surrogate models in which the variables are causally related. CBO trains this new set of surrogate models using a prior obtained from observational data and a posterior from interventional data; the resulting model is called the causal Gaussian process (cGP) because it integrates observational and interventional data. Once the surrogate models are trained, a causal acquisition function balances exploration and exploitation. In this work, following the methodology of Aglietti et al. (2020), we employed causal expected improvement (CEI) as the acquisition function, which is built on the standard expected improvement. This means that CBO evaluates an acquisition function for each of the surrogate models and then selects only one of them to evaluate in any given optimisation iteration. It is well known (Wang et al. 2016) that the performance of standard BO degrades as the dimension of the parameters to optimise increases. A causal optimisation model like CBO helps to reason about the effective dimensionality of the problem. We do not claim that our problem is high-dimensional; rather, we explored the possibility of exploiting the effective dimensionality of the problem to reduce the number of optimisation iterations and, hence, save experimental budget. This is the main reason why CBO outperforms BO: it breaks the single surrogate model into multiple surrogate models based on the causal structure of the parameters, making it easier to learn the probability distribution for each surrogate model individually.

Since the cGP needs a prior from observational data, we investigated whether it is possible to synthetically create more observational data for a better prior and to compare the resulting efficiency. To create synthetic observational data, we used Tabular Generative Adversarial Networks (TGANs), trained with sufficient data, to obtain a prior distribution very close to the real distribution. The generated synthetic data are used to train the cGP when the posterior PDF is calculated. We used TGANs, a variant of generative adversarial networks (GANs), to obtain a faithfully generated dataset. GANs provide a well-organised way to sample the posterior PDF and have an extraordinary ability to estimate the correct distribution (Patel and Oberai 2019; Goodfellow et al.; Ma et al. 2017; Wang et al. 2017).

Two complementary ways to create a good model are subsequently described. First, using a large model with ample experimental data, we avoid overfitting and automatically obtain a model with less uncertainty. Second, when data are insufficient, the right variables affecting the outcome variable can be selected through a causal graph so that accurate decision-making is still achieved. Having sufficient data and causal knowledge helps the model learn the non-deterministic function in unknown regions. In most cases, fit-for-purpose real-world data augmentation is expensive, time-consuming, complicated, or simply impossible, so minimising the need to access real data by substituting data that resembles the real data is important. Adding synthetic, artificially created data to existing datasets often significantly reduces overfitting and improves the accuracy and generalization of trained models.

1.2 Summary of contributions

The major contributions of this work can be summarised as follows:

  • Synthetic and observational data were used to provide prior knowledge to the CBO engine, thus balancing the trade-off between observation and intervention (a typical trade-off in causal inference).

  • Synthetic data were generated using TGANs.

  • Synthetic data were evaluated using a table evaluator to find the generated set with the best resemblance to real data.

  • The frequency corresponding to the minimum error in the data was determined.

  • Results obtained from the model were compared with measured responses of the structure reported in the literature, and the gap between them was minimised.

The formulation of the posterior PDF is discussed in Sect. 2. Section 3 describes the generative model, how the synthetic data were generated, and the evaluation of the synthetic data against various metrics. The implementation of CBO is presented in Sect. 4. A comparative study between BO and causal BO and its formulation is described in Sect. 5. The description of the structure and the use of field test data from the literature for verification are provided in Sect. 6, along with the complete workflow and final results. Section 7 presents the concluding remarks of this work.

2 Problem formulations

Model updating of the considered model class is performed using algorithms that can exploit causal information to learn the probability distribution efficiently. To balance the trade-off between observation and intervention, we generated synthetic data and used it as part of the prior in the causal BO to approximate the results. By Bayes' theorem, the conditional posterior distribution is obtained by multiplying the likelihood function by the prior and dividing by the evidence, as given in Eq. 1. In this model, D is a matrix of measured modal data comprising the modelling error, the coefficient of variation (COV) of frequency, the COV of mode shape, the variance of frequency, and the variance of mode shape obtained from the literature (Lam et al. 2014; Au et al. 2012). Our prime objective is to minimise each error, specifically the errors in frequency and mode shape, and to find the corresponding natural frequency. Thus, the following equation is obtained:

$$p(\alpha |D) = \frac{p(D|\alpha )p(\alpha )}{p(D)}$$
(1)

where \(p(\alpha )\) represents the prior PDF of the model parameters, which depends on experience and expert knowledge and for which Gaussian distributions are a common choice (Lam et al. 2004); and \(p(D|\alpha )\) is the likelihood PDF, i.e. the conditional probability of obtaining the actual data or evidence given the set of uncertain modal parameters \(\alpha\). The measured frequencies and corresponding mode shapes are the uncertain modal parameters. The posterior PDF becomes even more complicated when a class of models is considered; by neglecting the model class, the equations simplify, as given by Lam et al. (2015). Sampling from such a posterior distribution is challenging because \(\alpha\) is a high-dimensional vector. Considering a single value for each parameter in D (modelling error \(x_1\), COV of frequency \(x_2\), COV of mode shape \(x_3\), variance of frequency \(x_4\), and variance of mode shape \(x_5\)), the normalised form of Eq. 1 becomes:

$$p(\alpha |D=x) = \frac{1}{Z}p(D=\{x_1,x_2,x_3,x_4,x_5\}|\alpha )p(\alpha )$$
(2)

where Z is a normalizing constant that makes the distribution integrate to 1. Let \(e_{f^*,a}\) be the prediction error of the natural frequency, with \(f^*_a\) defined in Eq. 3; a similar idea is adopted for the identified mode shapes \(\phi ^*_a\). As presented by Lam et al. (2015), the prediction error \(e_{{f^*,a}}\) depends on the mode index. The prediction error of the measured frequency is determined as follows:

$$e_{{f^*,a}} = {f^*_{a}} - {f^*_{a}}(\alpha )$$
(3)

where \({f^*_a}\) denotes the measured frequency and \({f^*_a}(\alpha )\) is the model-predicted frequency, which depends on the uncertain model parameters. In this work, the error between the measured response and the model-predicted response is highlighted.

In the literature, the variance of the measured data was taken to equal the variance of the prediction error, which underestimates the modeling errors (Zhang et al. 2013). A fast Bayesian FFT model identification was previously applied to obtain the associated variance of the prediction error analytically (Au 2011).

The prediction error of the observed mode shapes \(e_{\Phi ^*,a}\) as in Lam et al. (2015) can be modeled as:

$$e_{\Phi ^*,a} = {\Phi }^*_a - \Gamma _a\Psi _{a}(\alpha )$$
(4)

The most probable value (MPV) of the mode shape at mode index a is represented by \({\Phi }^*_a\), while the model-predicted mode is represented by \(\Gamma _a\Psi _a(\alpha )\). Here, \(\Gamma _{a}\) is a selection matrix of binary digits (0, 1) that picks out the measured DOFs from the mode shape predicted by the model. Both measured and model-predicted mode shapes are normalized to unit length by default. Following Lam et al. (2014), the variance \(\delta ^2_{a}\) of the mode shape prediction error can be determined from the following equation:

$$\delta ^2_{a} = 2(1-E({\text{MAC}}_a)) \approx 2\left\{ 1 - \left( 1 + \sum _{i=1}^{r} \lambda ^2_{i} \right) ^{-1/2} \right\}$$
(5)

where \([\lambda ^2_{1},\lambda ^2_{2},\ldots ,\lambda ^2_{r}]\) are the eigenvalues of the posterior covariance matrix of the mode shape (Au and Zhang 2011). Equation 5 shows how the variance can be approximated by approximating the expected MAC value from the obtained eigenvalues. The modal assurance criterion (MAC) quantifies the closeness between two vectors and is used here to express the fundamental error between the calculated and measured mode shapes (Ewins 2000); it is calculated as follows:

$${\text{MAC}}(\Phi ,\Psi ) = \frac{\Phi ^T \Psi }{||\Phi ||||\Psi ||}\,\,\,\,\, {\text{for}} \, 0\leqslant {\text{MAC}} \leqslant 1$$
(6)

where || || denotes the second (Euclidean) norm of a vector, and \(\Psi\) and \(\Phi\) are the model-predicted and measured mode shape vectors, respectively. The posterior PDF of the uncertain modal parameters, with the prediction error of each observed modal parameter evaluated independently, is given by Eq. 7:

$$p(\alpha |D=x) = \gamma \,p(\alpha )\exp \left( -\frac{1}{2}J(\alpha )\right)$$
(7)

where \(\gamma\) is a normalizing constant; J is the measure-of-fit function, which captures the error in both the frequencies and the mode shapes; and \(\sigma ^2_a\) and \(\delta ^2_a\) are the variances of the frequency and mode shape prediction errors, respectively (Lam et al. 2014). J sums, over all measured modes, the error of each parameter divided by its variance; it therefore aggregates all the errors in the model and is minimised for a better model.

$$J(\alpha ) = \sum _{a=1}^{r}\left( \frac{(f^*_{a}-f^*_{a}(\alpha ))^2}{\sigma ^2_{a}} + \frac{2(1-{\text{MAC}}(\Phi ^*_{a},\Gamma _{a}\Psi _{a}(\alpha ) ))}{\delta ^2_{a}} \right)$$
(8)

To obtain the optimized values of the input parameters, we minimised the measure-of-fit function given in Eq. 8, which quantifies the discrepancy between the model-predicted and measured data. Herein, a single model was targeted, and the posterior PDF was estimated by generating samples together with their weighting factors.
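To make the computation of Eq. 8 concrete, the following Python sketch evaluates \(J(\alpha )\) for a set of identified modes. It is a minimal illustration under stated assumptions rather than the implementation used in this work; the predicted quantities are assumed to come from an external FE solver.

```python
import numpy as np

def mac(phi, psi):
    """Modal assurance criterion of Eq. 6: normalized inner product of two mode shape vectors."""
    return float(phi @ psi) / (np.linalg.norm(phi) * np.linalg.norm(psi))

def measure_of_fit(f_meas, f_pred, phi_meas, phi_pred, sigma2, delta2):
    """Measure-of-fit J(alpha) of Eq. 8, summed over the r identified modes.

    f_meas, f_pred   : (r,) measured and model-predicted natural frequencies
    phi_meas, phi_pred : lists of r mode shape vectors, reduced to the measured DOFs
    sigma2, delta2   : (r,) variances of the frequency and mode shape prediction errors
    """
    J = 0.0
    for a in range(len(f_meas)):
        J += (f_meas[a] - f_pred[a]) ** 2 / sigma2[a]                  # frequency misfit
        J += 2.0 * (1.0 - mac(phi_meas[a], phi_pred[a])) / delta2[a]   # mode shape misfit
    return J
```

Per Eq. 7, the posterior is then proportional to the prior times \(\exp (-J/2)\), so minimising J maximises the posterior for a flat prior.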

3 Description of generative models and generating synthetic data

3.1 Generative adversarial networks

Generative adversarial networks (GANs) are generative models that can generate new content from scratch. A GAN consists of two deep networks, a generator and a discriminator. The generator creates synthetic samples, or fake data, whereas the discriminator classifies real data against fake data produced by the generator and estimates the probability that a given sample is real. As training progresses, the difference between the real and generated data shrinks because the discriminator keeps detecting ever smaller gaps between them. Examples of GANs include conditional GANs (cGANs), deep convolutional GANs (DCGANs), discovery GANs (DiscoGANs), and TGANs. In this work, TGANs were used because of their capability to generate tabular datasets.
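As an illustration of the generator-discriminator interplay described above, the sketch below implements one training step of a generic GAN in PyTorch. It is not the TGAN architecture used later in this work; the layer sizes, learning rates, and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 5  # hypothetical sizes; data_dim matches one table row

# Generator: maps latent noise z to a synthetic sample
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
# Discriminator: outputs the probability that a sample is real
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

def training_step(real_batch):
    n = real_batch.size(0)
    fake = G(torch.randn(n, latent_dim))
    # Discriminator step: push real samples towards 1 and fake samples towards 0
    opt_d.zero_grad()
    loss_d = bce(D(real_batch), torch.ones(n, 1)) + bce(D(fake.detach()), torch.zeros(n, 1))
    loss_d.backward()
    opt_d.step()
    # Generator step: try to fool the discriminator (fake samples towards 1)
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(n, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```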

A table contains discrete (multinomial) random variables and continuous random variables that follow an unknown joint distribution. A synthetic table is assembled from the samples produced by the generative model. The synthetic table is generated such that the mutual information between any pair of variables is similar to that of the real table, and a model trained on it can achieve the same accuracy on a real test table (Xu and Veeramachaneni 2018).

Fig. 1 Variation of hyperparameters against similarity scores

3.2 Generation via TGANs and evaluation of data

To generate synthetic data using TGANs, an AdamOptimizer was employed to minimise the network loss. In the generator, TGANs use a recurrent neural network with a learning rate of 0.001 and 100 features; in the discriminator, they apply a neural network classifier (SoftMax at the final layer) with 100 hidden nodes. The whole network runs for five epochs and emits nearly equal losses after three epochs. Herein, we calculated the similarity score between the synthetic and real data as the mean of five separately computed similarity values (\(S_{\text{basic}}\), \(S_{\text{corr}}\), \(S_{\text{mirr}}\), \(S_{\text{pca}}\), \(S_{\text{est}}\)). Specifically, \(S_{\text{basic}}\) is the Pearson correlation coefficient between the means and standard deviations of the two distributions. \(S_{\text{corr}}\) is the Pearson correlation coefficient of the association matrices. \(S_{\text{mirr}}\) is the mean association of both datasets. \(S_{\text{pca}}\) is a mean absolute percentage error (MAPE) value representing the average percentage difference between the real and synthetic data. \(S_{\text{est}}\) is the Pearson correlation coefficient of the root mean squared error (RMSE) scores, i.e. the square root of the difference in squared mean values, of corresponding columns of the two distributions. The scores range over \([-1,1]\): a score close to 1 reveals similarity between the synthetic and real data (near positive correlation), and a negative score indicates an inversely proportional relationship between the two datasets (Xu and Veeramachaneni 2018). We further evaluated the hyperparameters of the generator and discriminator of TGANs (features, hidden nodes, learning rate, and optimizer) against the similarity scores in Fig. 1 by varying each hyperparameter while keeping the others constant; the results guided the final choice of hyperparameter values. For each hyperparameter value, we evaluated the similarity score 15 times (i.e. 15 observations) and report the mean and error in each bar of Fig. 1. The similarity scores for TGANs, RBMs, and VAEs were 0.897, 0.704, and 0.311, respectively. The synthetic data were further inspected with a table evaluator, which reports the following metrics:

  • Cumulative sums—this metric supports visual, column-wise inspection of the data distributions. For each continuous column, the distribution can be understood with a single plot, revealing which kinds of values and which columns are harder for the generator to reproduce than others.

  • Column correlations—this metric presents the association information in tabular form for both the real and synthetic data. Column correlations indicate the difficulty faced by the generative model while learning the probability distribution by showing where the synthetic data diverge, and they also help in understanding the associations within the data.

  • Similarity score—this single-value score indicates the closeness between the synthetic dataset and the real dataset. When the real and synthetic data resemble each other, the scores of each metric should be very similar. The score ranges over [-1,1], where 1 means a positive correlation, 0 indicates no relationship, and -1 reveals a negative correlation.
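As an illustration of how two of the five component scores could be computed, the sketch below gives one plausible reading of \(S_{\text{basic}}\) and \(S_{\text{corr}}\) and of the overall mean score; it is not the table evaluator's actual implementation, and the remaining components (\(S_{\text{mirr}}\), \(S_{\text{pca}}\), \(S_{\text{est}}\)) are omitted.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

def s_basic(real: pd.DataFrame, synth: pd.DataFrame) -> float:
    """Pearson correlation of per-column means and standard deviations (one reading of S_basic)."""
    stats_real = np.concatenate([real.mean().values, real.std().values])
    stats_synth = np.concatenate([synth.mean().values, synth.std().values])
    return pearsonr(stats_real, stats_synth)[0]

def s_corr(real: pd.DataFrame, synth: pd.DataFrame) -> float:
    """Pearson correlation between the flattened column-association matrices (one reading of S_corr)."""
    return pearsonr(real.corr().values.ravel(), synth.corr().values.ravel())[0]

def similarity_score(components):
    """Overall score: mean of the five component scores, each lying in [-1, 1]."""
    return float(np.mean(components))
```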

4 Implementation and evaluation of CBO

4.1 Improvement in causal Gaussian process (cGP)

We improved the existing cGP model, a standard GP that takes its prior from observational data and builds its posterior from interventional data (Aglietti et al. 2020), by forming the prior from observational as well as synthetic data (from a generative model). The generative model is first trained on a sample set, and the generated data are then used to build the prior over which do-calculus operates. In this work, we trained TGANs (a generative model) on the existing observational data, which helped to incorporate a better prior distribution (Patel and Oberai 2019) and to improve the posterior formed when performing do-calculus. Our model employs observational and synthetic data to form the prior of the causal Gaussian process and uses interventional data to form the posterior. For example, if an increase in an input variable produces an increase in the outcome variable, CBO can evaluate the change in the outcome variable when the same input variable decreases (its counterfactual) via the predictions of do-calculus. In this section, we connect the synthetically generated prior with the posterior.

Let \(p^{\text{true}}_\alpha (\alpha )\) be the true distribution and \(\alpha\) a vector sampled from it. Also, g(z) represents the generator of the generative model, trained on real data, and \(z\sim p_z(z)\) denotes the latent vector space. A generative model can learn the true distribution of the real data given sufficient data and infinite capacity (Goodfellow et al.), as stated in Eq. 9:

$$p^{\text{true}}_\alpha (\alpha )=p^{\text{gen}}_Z(\alpha )$$
(9)

where \(p^{\text{gen}}_Z(\alpha )\) can be defined as:

$$\alpha \sim p^{\text{gen}}_Z(\alpha )\Rightarrow \alpha =g(z),\ z\sim p_z(z)$$
(10)

where the components of the multivariate distribution \(p_z(z)\) are independent and identically distributed (iid), following a uniform or Gaussian distribution. The above equations signify that a generative model samples z from \(p_z\) and thereby creates synthetic samples of \(\alpha\).
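A minimal sketch of the sampling mechanism of Eq. 10, assuming a trained generator g (e.g. the TGAN generator) that maps a latent vector to a parameter vector \(\alpha\); the function name and signature are illustrative.

```python
import numpy as np

def sample_synthetic_alpha(g, n, latent_dim, rng=np.random.default_rng(0)):
    """Draw synthetic parameter samples per Eq. 10: z ~ p_z(z) iid Gaussian, alpha = g(z).

    g : trained generator mapping a latent vector z to a parameter vector alpha
    """
    z = rng.standard_normal((n, latent_dim))   # iid Gaussian latent vectors
    return np.array([g(zi) for zi in z])       # alpha = g(z)
```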

In this problem, we consider cGP regression in a customized manner. The idea of learning a generative distribution and using it in Bayesian inference has been demonstrated in the literature (Patel and Oberai 2019). Using Eq. 9, the prior in Eq. 2 is replaced by \(p_Z^{\text{gen}}(\alpha )\), which can be handled efficiently by rewriting it in terms of the latent vector z. The updated posterior from Eq. 2 is then formulated as:

$$p_Z^{\text{post}}(\alpha |D=x) = \frac{1}{Z'}p(D=\{x_1,x_2,x_3,x_4,x_5\}|\alpha )p_Z^{\text{gen}}(\alpha )$$
(11)

The data used for training the generative models were taken from Lam et al. (2014) and Au et al. (2012) and comprise the modelling error \(x_1\), COV of frequency \(x_2\), COV of mode shape \(x_3\), variance of frequency \(x_4\), and variance of mode shape \(x_5\). The expectation of the posterior distribution for \(\alpha\) is equivalent to the expectation for z. \(Z'\) is another normalising constant that makes the probability integrate to one. Sampling from the posterior then proceeds as:

$$\alpha \sim p_Z^{\text{post}}(\alpha |D)\Rightarrow \alpha =g(z),\ z\sim p_Z^{\text{post}}(z|D)$$
(12)
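The following sketch illustrates one way to realise such a cGP, assuming the do-calculus estimate obtained from the observational and synthetic data is available as a callable prior_mean; it follows the spirit of Aglietti et al. (2020) rather than their exact implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

class CausalGP:
    """Minimal causal-GP sketch: prior mean estimated from (observational + synthetic)
    data via do-calculus, with a GP fitted on the interventional residuals."""

    def __init__(self, prior_mean):
        self.prior_mean = prior_mean  # callable m(X) built from observational/synthetic data
        self.gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())

    def fit(self, X_int, y_int):
        # The GP models the discrepancy between interventional outcomes and the prior mean.
        self.gp.fit(X_int, y_int - self.prior_mean(X_int))
        return self

    def predict(self, X):
        mu, std = self.gp.predict(X, return_std=True)
        return self.prior_mean(X) + mu, std  # posterior mean and std at X
```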

4.2 Evaluating cGP

After evaluating the synthetic data, we used it together with the observational data as the prior of the causal GP and trained the model with interventional data. We compared priors of the surrogate model (cGP) generated by TGANs against priors from variational autoencoders (VAEs) and restricted Boltzmann machines (RBMs). To select the best model for generating synthetic data, we evaluated each generative prior with the following metrics:

  • Negative Log Predictive Density (NLPD)—by generating samples that approximate the maximum likelihood value, we created a Parzen-window distribution. The calculation of NLPD follows Bengio et al. (2013) and Aggarwal et al. (2019); NLPD can take negative values because a probability density can exceed one. The hyperparameters were tuned to obtain the minimum NLPD, and NLPD is the main metric of interest in this work.

  • Mean Absolute Error (MAE)—for any sample (x, y) provided to a regressor, MAE is the absolute difference between y and the mean of the predictive distribution returned by the regressor (Aggarwal et al. 2019). For a GP, this mean is the value at which the predictive PDF is maximal.

We calculated the NLPD and MAE for each prior from the different generative models and evaluated the efficiency of the cGP. The results were obtained by running the evaluation for each metric 15 times, and the mean and standard deviation of the results are summarized in Table 1. The minimum NLPD and MAE are observed when using the TGANs prior. Although the MAE values did not vary much between models, TGANs still gave the best result. The VAEs show an abnormal NLPD value, from which we conclude that the discrepancy is due to a larger deviation of the VAE-generated data from the observational dataset.
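A minimal sketch of the two metrics for a Gaussian predictive distribution such as that returned by a GP; the Parzen-window construction applied to the generative samples is omitted here.

```python
import numpy as np

def nlpd(y_true, mu, sigma):
    """Negative log predictive density under a Gaussian predictive distribution.
    Can be negative when the predictive density exceeds one."""
    var = sigma ** 2
    return float(np.mean(0.5 * np.log(2 * np.pi * var) + (y_true - mu) ** 2 / (2 * var)))

def mae(y_true, mu):
    """Mean absolute error between observations and the predictive mean."""
    return float(np.mean(np.abs(y_true - mu)))
```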

Table 1 Evaluation of cGP

5 Causal Bayesian optimisation

5.1 CBO and BO

Causal Bayesian optimisation combines BO with causal inference. BO has produced efficient results in non-deterministic problems with optimisation objectives, but every iteration of an optimiser is costly, so reducing the number of iterations is itself a route to efficiency. CBO is an extended form of BO with two major advantages. First, CBO considers the causal dependency between the input variables and the outcome variable. For example, our optimisation problem could be shaped such that an individual input variable, like the COV of mode shape, directly changes the outcome variable, whereas another variable, such as the variance of mode shape, does not affect the outcome variable directly but instead forms a dependency graph (akin to a Bayesian network) with the COV of mode shape. Hence, a change in the variance of mode shape would change the COV of mode shape rather than the outcome variable directly. Different causal graphs representing possible causal relationships between the input and outcome variables are illustrated in Fig. 2.

Forming a causal graph can require extensive domain knowledge and considerable effort. Several open-source works and tools attempt to learn the causal graph from observational data (Zheng et al. 2018; Py-causal 2021); such tools give a major boost to black-box systems, or systems where the user or administrator lacks domain expertise. In our work, we employed such tools to form a causal graph from observational data alone. The graph in Fig. 2a was prepared with the "DAGs with NO TEARS" tool, and the graph in Fig. 2b was created with the py-causal tool. We also built a manual causal graph (Fig. 2c) based on our experience with the optimization problem variables. The three graphs were subsequently evaluated.

The second major advantage of CBO is the computation and evaluation of counterfactuals. Collecting a very large amount of experimental data could, in principle, cover all possible variations of each feature in the dataset; however, experimentation is costly and is sometimes infeasible in certain regions. CBO can study such variations by evaluating their counterfactuals and checking the impact on the outcome variable, which saves a great deal of experimentation and allows a better analysis of the outcome variable even with less data. This is a major advantage of CBO that BO does not possess. Causal BO uses do-calculus to predict the outcome variable at the point to be intervened on (counterfactual data) from observational and interventional data; it uses the observational data to predict the effect of an intervention without actually performing it (Pearl 1995). The allocation of new points and of the variables to intervene on in the CBO process is done using causal expected improvement (CEI).

Fig. 2 Causal graphs: a \(C_1\), causal graph using DAG (Zheng et al. 2018); b \(C_2\), causal graph using py-causal (Py-causal 2021); c \(C_3\), manual causal graph

We utilised a minimal intervention set (MIS), the smallest set of variables that must be intervened on to obtain the optimum outcome (Aglietti et al. 2020). The variables in each set depend entirely on how the causal graph has been built. For each causal graph in Fig. 2, we assume either that the variables act between the input and the outcome variable or that some confounders have been observed. The variables considered in the optimisation therefore change with each causal graph, and the results are compared to determine the most accurate outcome. The first graph (\(C_1\)) (Fig. 2a) was created such that the COV of frequency does not affect the outcome variable directly but causes a change in the variance of frequency, which in turn changes the modelling error and the outcome variable. \({\text{MIS}}_{C_1}\) denotes the minimal intervention set for the first causal graph, and x is the complete set of input variables, i.e. \(x_1\), \(x_2\), \(x_3\), \(x_4\), \(x_5\). The MIS for the three causal graphs (\(C_1, C_2, C_3\)) was determined as follows:

$${\text{MIS}}_{C_1} = \{\{\phi \}, \{x_1\}, \{x_4\}, \{x_5\}, \{x_1,x_4\}, \{x_1,x_5\}\}$$
(13)
$${\text{MIS}}_{C_2} = \{\{\phi \}, \{x_1\}\}$$
(14)
$${\text{MIS}}_{C_3} = \{\{\phi \}, \{x_1\}, \{x_1,x_2\}, \{x_1,x_3\}, \{x_1,x_2,x_3\}, \{x_4\}, \{x_5\}\}$$
(15)

In a nutshell, apart from the causal graph, we need to determine the variables to be intervened on to meet the requirements of CBO (Aglietti et al. 2020). We use the mapped distribution in Eq. 11 (which contains the complete set of variables) for traditional Bayesian optimisation and select the variables from the MIS in Eqs. 13-15 for the three causal graphs, respectively.

The term \(p_Z^{\text{gen}}\) in Eq. 11 represents the updated prior distribution. The variables in Eq. 11 comprise all the parameters in the measured modal data, whereas the variables in the causal formulation are chosen according to the respective causal graph.

For the interventional set selected from the causal graph, an acquisition function is needed to balance exploration and exploitation. We used the standard definition of expected improvement (EI) (Stuart 2010), as shown below:

$${EI(a) = {\mathbb{E}} \max(f(a) - f(a^+), 0)}$$
(16)

where \(a^+\) is the position of the best input value found so far. Under a GP model, the EI can be computed in closed form as follows:

$${EI(a) = (\mu (a) - f(a^+) - \xi )\Phi (Z) + \sigma (a)\phi (Z)}$$
(17)

where Z is calculated as follows:

$${Z = \frac{\mu (a) - f(a^+) - \xi }{\sigma (a)}}$$
(18)

In the above EI equations, \(\xi\) is the key parameter balancing exploration and exploitation. This EI is computed for every element of the MIS (i.e. for each MIS element we have a trained GP surrogate model). The causal EI (CEI) of Aglietti et al. (2020) is computed by taking the maximum EI over all elements of the MIS. In an optimisation iteration, only the MIS element with the maximum EI is considered for evaluation, while the other GP models are not used.

It is also important to note that the calculation of the EI in Eq. 17 needs the mean and variance from a GP. In our case, each GP gets its prior from observational data and its posterior from interventional data; once the posterior is formed, these trained GP models provide the mean and variance at any sampled point.
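The sketch below illustrates Eqs. 16-18 and the CEI selection over the MIS, assuming each MIS element already has a trained surrogate exposing a predict method that returns the predictive mean and standard deviation (as in the cGP sketch of Sect. 4.1); the names and signatures are illustrative.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """Closed-form EI of Eqs. 17-18 for a maximisation problem (sigma > 0 assumed)."""
    z = (mu - f_best - xi) / sigma
    return (mu - f_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def causal_expected_improvement(models, candidates, f_best):
    """CEI sketch: evaluate EI under the surrogate of each MIS element, keep the best.

    models     : dict mapping each MIS element (tuple of variables) to its trained causal GP
    candidates : dict mapping each MIS element to its array of candidate intervention points
    """
    best = None
    for mis_set, gp in models.items():
        mu, sigma = gp.predict(candidates[mis_set])
        ei = expected_improvement(mu, sigma, f_best)
        i = int(np.argmax(ei))
        if best is None or ei[i] > best[0]:
            best = (ei[i], mis_set, candidates[mis_set][i])
    return best  # (max EI, variables to intervene on, intervention point)
```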

6 Model updating of coupled slab using causal Bayesian inference

6.1 Description of structure

A 3-story coupled-slab concrete structure housing a recreational center and public library was used for verification purposes, as illustrated in Fig. 3. This structure was targeted because its use for various activities induces random, human-induced vibrations. The slabs on the second and third floors share the same plan and comprise four orthogonal and eight planar trusses. The building is 40 m high and contains six columns. The slabs (\(35\,{\text{m}}\,\times \, 35\,{\text{m}}\)) on the second and third floors are coupled: they are connected by columns attached to the truss system by braces, resulting in a stiff linkage that couples the dynamic behaviour of the two floors. The third floor contains two basketball courts, and the second floor contains a children's play area and a large multipurpose room.

Fig. 3 Plan of the slab structure

6.2 Ambient vibration test and modal identification

A total of 126 measurement locations was considered, resulting in \(126 \times 3 = 378\) measured DOFs. The test was divided into 35 different setups. Six triaxial accelerometers were used together with a signal processing unit. The measurements were taken in the time domain and later converted to the frequency domain by the Fourier transform, where they are represented by singular value spectra. The recorded data gave the frequencies of the structure in three different modes, together with their corresponding uncertainties. We first used the data from the literature and then built the finite element model of the coupled-slab structure in ANSYS, creating the geometry from the information given in the literature (Au et al. 2012). The natural frequencies of the three vibration modes were obtained, and the errors between the measured and predicted data were used to define the modelling error in our dataset, which we aimed to minimise in the fewest possible iterations. Ultimately, a time- and cost-effective model with higher accuracy was created.

6.3 Proposed methodology and results

Fig. 4 Flowchart presenting the overall workflow for optimizing the model parameters

The flowchart in Fig. 4 presents the overall flow for updating the model parameters in the fewest iterations using causal BO. The first stage of the flowchart shows how the database was constructed; for example, x, y, z represent the unknown parameters of interest. The measured modal data contain the coefficients of variation and variances of the various model parameters, where the coefficient of variation (COV) and the variance represent the deviation of a parameter from its mean. We then generated synthetic data using TGANs (tabular GANs), mapping the data as closely as possible to the real data: TGANs take the observational (experimental) data from the database and generate synthetic data. We evaluated the synthetic data using the metrics described earlier and constructed the priors. The generation and evaluation of synthetic data (which took 10-15 minutes) were conducted in a Colab setup as a one-time cost, after which the data can be reused as many times as needed. The synthetic data, together with the observational data, were passed to the subsequent optimisation steps. The causal Gaussian process in the causal BO (CBO) was trained with the manipulative and outcome variables based on the causal graphs and the prior (captured here either by observational data alone or by observational plus synthetic data). The CBO iterations were then started to determine the parameter values. The causal expected improvement served as the acquisition function, requesting the next sampling point to evaluate; the Gaussian process is then ready to evaluate the function at any point required by the acquisition function. The acquisition function also determines which variables to intervene on, which amounts to selecting one among the multiple surrogate models. The causal Bayesian algorithm then runs in error-minimisation mode, where the error is the difference between the measured and FE-predicted frequencies summed with the error in the mode shapes (Eq. 8), which we minimise for a better model. If the total number of iterations allowed by the experimental budget is exhausted, the loop ends with an optimal solution; the loop also ends when the optimal value of the objective function remains constant. If further evaluation is required, the loop can be re-run until an optimal solution is obtained.
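To tie the flowchart together, the sketch below outlines one plausible form of the outer optimisation loop; causal_expected_improvement is the CEI sketch from Sect. 5, and evaluate_objective is a hypothetical wrapper around the FE run that returns the measure-of-fit J of Eq. 8 (negated so that the maximisation form of EI applies directly).

```python
import numpy as np

def cbo_loop(models, candidates, evaluate_objective, budget, f_best=-np.inf):
    """Hypothetical CBO outer loop; helper names follow the earlier sketches."""
    X_int = {m: [] for m in models}   # interventional inputs per MIS element
    y_int = {m: [] for m in models}   # interventional outcomes per MIS element
    for _ in range(budget):
        # 1. Let the causal acquisition function pick the variables and point to intervene on
        _, mis_set, x_next = causal_expected_improvement(models, candidates, f_best)
        # 2. Intervene: run the FE model and evaluate the (negated) measure-of-fit J of Eq. 8
        y_next = -evaluate_objective(mis_set, x_next)
        # 3. Refit only the surrogate whose variables were intervened on
        X_int[mis_set].append(x_next)
        y_int[mis_set].append(y_next)
        models[mis_set].fit(np.array(X_int[mis_set]), np.array(y_int[mis_set]))
        # 4. Track the optimum
        f_best = max(f_best, y_next)
    return f_best
```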

For the model updating of the coupled-slab structure using the suggested method, an FE model was established in the ANSYS Workbench based on data in Jang and Smyth (2017) and Aglietti et al. (2020). The natural frequencies of the structure and the measured mode shapes were obtained. The difference between the measured and predicted data was termed the modelling error, which was minimised in this model; we thereby also minimised the gap between the model-predicted and measured responses. We determined the frequency at the point of minimum error and compared it with the mean measured frequency. The model was then updated by calculating the posterior PDF, with the prior taken from do-calculus, which was trained on the different measured modal data distributions (via the causal Gaussian process). The Gaussian process feeds the acquisition function that guides the selection of the next evaluation point; the next point is evaluated, and the process continues until the minimum (or maximum) is found. If the number of iterations is too small, the function is not fully explored; if it is too large, the optimised value becomes constant after a certain iteration.

We designed a causal BO engine that uses the combination of synthetic and observational data as the prior distribution and builds the posterior using interventional data. The convergence graphs in Figs. 5, 6, and 7 reveal that the errors decreased as the number of iterations increased; for each model, the error dropped to a certain level, after which the variation became negligible. We performed the accuracy testing based on the three causal graphs in Fig. 2. In Fig. 5, we used the first causal graph, created with the DAG tool; CBO with synthetic data took the fewest iterations to solve the problem. We also evaluated CBO with a default Gaussian prior and with priors generated from TGANs (similarity score = 0.897), RBMs (similarity score = 0.704), and VAEs (similarity score = 0.311). The plots show that, as the similarity score increased, the optimum was reached in fewer iterations for each causal graph, and that BO took the most iterations. The results further reveal that the Bayesian engine performed almost identically in all three cases, since it does not depend on the causal graphs. In Fig. 7, CBO took more iterations in the third case, as the causal graph was plotted manually. Nevertheless, compared with traditional BO engines, the performance of CBO was very cost-effective and accurate, saving iterations. Figures 8 and 9 show the optimised frequency and MAC (for the mode shapes) at the point of minimum error; when the global optimum is reached, that is, when the variation in error becomes nearly zero as the model runs, the updated values of the parameters are obtained. We ran the model 15 times for each dataset and calculated the standard deviation of the values obtained. The optimal results, i.e. the measured frequencies, the frequencies and mode shapes calculated by the various models, and the error associated with each model, are shown in Table 2.

Table 2 Optimal frequencies and mode shapes obtained by each model and their associated errors

Fig. 5 Convergence of CBO and BO for \(C_1\)

Fig. 6 Convergence of CBO and BO for \(C_2\)

Fig. 7 Convergence of CBO and BO for \(C_3\)

Fig. 8 Optimal frequency at the \(7{\text{th}}\) iteration using CBO with synthetic data

Fig. 9 Optimal MAC number at the \(7{\text{th}}\) iteration using CBO with synthetic data

7 Conclusion

Causal Bayesian model updating was performed in this work and validated through empirical evaluations; adding synthetic data achieved optimal results in terms of accuracy and a reduced number of iterations. We generated the synthetic data using TGANs, compared and evaluated them against real data, and showed how they can be used to address causal Bayesian model updating problems; the results were confirmed by the optimised values obtained. Special attention was given to the formulation of the frequency and mode shape errors. This work also demonstrates the good performance of the regressor with TGANs, which reduced the computational cost by decreasing the number of iterations by nearly \(30\%\). We further plotted three different causal graphs to contrast the performance of CBO with previous models. The optimised frequency obtained by the proposed model was compared with experimental data and existing models, verifying that CBO with synthetic data is the most cost-efficient and accurate. The enhanced results are primarily due to the reflection of the homogeneous frequency distribution in the prior. This case study provides a useful template for the model updating needed in structural health monitoring.