1 Introduction

Finite Element Model Updating (FEMU) refers to all strategies and algorithms intended for the calibration of an existing FE model based on experimental evidence, especially vibration data [1]. Data used for such calibration or updating purposes can be acquired from occasional in situ surveys or by an embedded, permanent monitoring system. For the specific case considered here—and as very commonly used in real-life circumstances—modal parameters, extracted from acceleration time histories, are utilised. This is a well-known example of the indirect FEMU method [2], where the input parameters are varied to match the output results (natural frequencies and mode shapes). At its core, this represents an optimisation problem.

Therefore, the aim is to estimate the mechanical properties to be assigned to the numerical model, given its geometry. Once calibrated, the FE model can be used in several ways. The most obvious application is predictive analysis, e.g., to estimate the remaining resilience of the structure in case of strong motions or other potentially dangerous events. If a predictive model was already available before a specific damaging event (e.g., a major earthquake), updating that FE model and comparing the estimated stiffnesses before and after the seismic event can be used for model-based Structural Health Monitoring (SHM) and damage assessment [3, 4]. This allows not only basic damage detection but also advanced tasks such as damage localisation and severity assessment. A third use is for hybrid simulations [5, 6]; in these applications, a target structure is divided into experimental and numerical substructures, due to practical limitations or to save costs. FEMU allows one to match the response of the numerical components to their experimental counterparts. Finally, in the case of continuously monitored structures and infrastructures, constantly re-updating a detailed Finite Element Model represents an enabling technology for Digital Twins. This can help the decision maker evaluate the current and future structural condition of the assets under management. More details about the basic and general concepts of FEMU can be found in the works of Friswell and Mottershead—e.g., [7] and [8].

1.1 Efficient Bayesian sampling for finite element model updating

Arguably, one of the major issues about the FEMU procedure described so far is that the optimisation problem can become computationally expensive and very time-consuming, especially when dealing with complex FE models.

On the one hand, numerical models are becoming increasingly complex and, consequently, computationally demanding. The need for very efficient optimisation techniques suitable for potentially highly demanding tasks is therefore clear. At the same time, an optimisation algorithm should discern the global minimum across the function domain, thereby circumventing the risk of being trapped in local minima. Unfortunately, sampling efficiency and global search capabilities are somewhat conflicting goals. Consequently, global optimisation techniques that require high sampling volumes to search the space for the global optimum are frequently employed.

For these reasons, the approach proposed here employs Bayesian Sampling Optimisation [9] in the framework of FEMU. As will be described in detail in the Methodology section, Bayesian Sampling Optimisation (or simply Bayesian Optimisation, BO, for short) uses the basics of Bayes’ Theorem to infer the best sampling strategy in the search domain. This greatly increases the computational efficiency of the procedure, vastly reducing the sampling volume required to attain a solution, especially when a larger number of parameters needs to be estimated at once, as in the case of damaged structures and infrastructures, where multiple areas can be affected by different levels of damage.

In common practice, also according to the visible crack pattern, the target system is divided into macro-areas, under the assumption that these substructures will have different mechanical properties [10]. In the most common case, the local parameters of these macro-areas (Young’s moduli, etc.) must be jointly estimated to match the global dynamic response of the structure. Hence, the dimensionality of the search space of the optimisation function, defined by these numerous parameters (easily ten or more [11]), can become very high. As a note, it is important to remark that this is the intended use of the term ‘Bayesian’ in this work; other research works, for example [12,13,14,15] and [16] among many others, use the same adjective to refer to the estimated output. Instead, this paper focuses solely on Bayesian sampling and its effectiveness for the optimisation of expensive functions in high-dimensional search spaces.

1.2 Applications of FEMU to historical architectural heritage and earthquake engineering

The proposed Bayesian Optimization-based FEMU strategy is validated on a case study of interest for Structural Dynamics purposes, the bell tower of the Santa Maria Maggiore Cathedral in Mirandola. This historical high-rise masonry building suffered extensive damage after the 2012 Emilia Earthquake and has been the subject of several research studies throughout the years—see e.g., [1]. Both numerically simulated and experimental data were employed, thereby allowing the proposed approach to be benchmarked against a known ground truth and in a controlled fashion.

Indeed, regarding this specific application, FEMU is especially important for Earthquake Engineering. After major seismic events, reliable and predictive FE models are required as soon as possible to design and evaluate the temporary interventions that should be deployed in the immediate aftermath to secure the damaged structures. However, the calibration of such FE models is not trivial. The situation is even more challenging for masonry structures, where even the properties of the original (pristine) structure are more difficult to estimate than for homogeneous materials such as structural steel. In architectural and cultural heritage (CH) sites, even the pre-earthquake material properties are often unknown due to the lack of historical records; moreover, such structures have notoriously low mechanical resistance due to their centuries-old ageing. These aspects make these unique and irreplaceable structures strongly vulnerable. Among them, historical bell towers are at particular risk during seismic events, due to various factors such as their relative slenderness, several potential failure mechanisms, and building material (bricks and mortar) [17]. These considerations further underscore the importance of implementing robust monitoring strategies to detect and track damage development in such structures [18].

Thus, FEMU represents a precious tool for CH. Moreover, even after the first phase of a post-earthquake emergency, model updating is an important tool for vibration-based continuous monitoring and/or periodic dynamic investigations [19]. In the short to medium term, seismic aftershocks can cause more damage than the main shock, as strong motions act on already accumulated damage; in the long run, the initial cracks can expose structural vulnerabilities to external environmental factors.

Some noteworthy examples of FEMU applications can be found in [17, 20,21,22,23] and [24]. A broader, up-to-date review of Structural Health Monitoring (SHM) techniques successfully applied to CH structures is given by [25], while [26] specifically delves into the historical and contemporary advancements in SHM concerning the Garisenda tower in Bologna, Italy, a heritage structure similar in many aspects to the case study under examination. Similarly, Refs. [27] and [28] thoroughly analyse the Civic Tower of Ostra, Italy, and the Civic Clock tower of Rotella, respectively, using detailed numerical models and experimental data to assess the structural condition and establish standards for ongoing maintenance, with a particular emphasis on the use of Genetic Algorithms.

The remainder of this paper is organised as follows. In Sect. 2, the theoretical background of Finite Element Model Updating and Bayesian Sampling Optimisation is discussed in detail. In Sect. 3, the specific methodology of the algorithm implemented for this research work is reported. The three optimisation algorithms used for the comparison of the results are also briefly recalled. Section 4 describes the case study. Section 5 comments on the results, comparing the BO estimates with the three benchmark algorithms and with the findings retrieved from the published scientific literature on the same case study. Finally, Sect. 6 concludes this paper.

2 Theoretical background

Parametric models (such as finite element models) are described by a vector of model parameters \({\varvec{\theta}}\). Thus, with \(M\) denoting the model operator, \({\varvec{y}}={\varvec{M}}({\varvec{x}},{\varvec{\theta}})\) returns the output vector \({\varvec{y}}\) for a given input vector \({\varvec{x}}\). For obvious reasons, in model updating, it is preferable to adopt outputs that are independent of the input and depend on the model parameters only (such as modal features). Under this assumption, the \({\varvec{x}}\) vector can be dropped, and the input–output relationship is simply represented by \({\varvec{y}}={\varvec{M}}({\varvec{\theta}})\).

Finite element model updating methods fall into two categories: direct methods and iterative methods (the latter also called deterministic). Direct methods try to improve the agreement between observed and computed data by directly changing the mass and stiffness matrices; this leads to little physical meaning (no correlation with physical model parameters), problems with element connectivity, and fully populated stiffness matrices. For these reasons, they are seldom used in common structural engineering applications. Iterative methods attempt to obtain results that fit the observations by iteratively changing the model parameters: this enables retaining a good physical understanding of the model and does not present the above-mentioned problems. The degree of correlation is determined by a penalty function (or cost function): optimising this function requires the problem to be solved iteratively, which means computing the output of the numerical model (i.e., performing an FE analysis) at each iteration. Hence, a higher computational cost is the major drawback of iterative methods.

Many FEMU methods have been proposed and successfully used: sensitivity-based methods, [29, 30], and [31]; eigenstructure-assignment methods, [32] and [33]; uncertainty quantification methods [34]; sensitivity-independent iterative methods, [35]; and many more [36].

As described, model updating is an inverse problem, as it aims at inverting the relationship between model parameters and the model output to find the optimal set of parameters \({\varvec{\uptheta}}\) that minimises the difference between computed data and measured data.

In this sense, model updating can be simply considered as the following constrained optimisation problem:

$${\varvec{\uptheta}}^{*}=\underset{{\varvec{\uptheta}}\in D}{\text{arg min}}\,{\text{F}}\left(M\left({\varvec{\uptheta}}\right),{\varvec{f}}\right),$$
(1)

where \({{\varvec{\uptheta}}}^{*}\) is the set of optimal parameters, \(D\) is the parameter space, \(F\) is the cost function and \({\varvec{f}}\) is the measured data.

The whole process of solving \({\text{F}}\left(M\left({\varvec{\uptheta}}\right),{\varvec{f}}\right)\) – the output of the numerical model “post-processed” in some way by a cost function – may be conceived as computing an unknown (non-linear) objective function of the model parameters \({\varvec{\uptheta}}\), which constitute the sole input of the numerical model to be updated. Typically, this objective function is non-convex and expensive to evaluate. The output surface of the objective function lies in a \(d\)-dimensional space, where \(d\) is the number of parameters to be optimised. The required sampling volume grows exponentially with \(d\), thus posing an implicit restriction on the number of parameters that can be optimised.
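
To make the cost of each evaluation explicit, the sketch below (Python; the function run_fe_modal_analysis is a purely hypothetical stand-in for a call to an external FE solver, not part of the original implementation) wraps a model run as a black-box objective and counts evaluations; it also illustrates why exhaustive sampling is infeasible, since a naive grid with only 10 levels per parameter already requires \(10^{d}\) model runs.

```python
import numpy as np

def run_fe_modal_analysis(theta):
    """Hypothetical stand-in for an expensive FE run (e.g., an external
    solver call); returns 'natural frequencies' for the parameter set theta."""
    # Placeholder dynamics: NOT a real FE model, just a smooth test function.
    return 1.0 + np.cumsum(np.abs(theta)) ** 0.5

n_evaluations = 0

def objective(theta, target_frequencies):
    """Black-box cost F(M(theta), f): one call = one full FE analysis."""
    global n_evaluations
    n_evaluations += 1
    freqs = run_fe_modal_analysis(theta)
    return np.sum(np.abs((target_frequencies - freqs) / target_frequencies))

# A naive full-factorial search over d parameters with m levels each
# needs m**d evaluations, e.g. 10 levels and d = 11 updating parameters:
d, m = 11, 10
print(f"a full grid search would need {m**d:,} FE runs")  # 100,000,000,000
```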

Many optimisation algorithms have been developed in the last decades, each of them with its peculiar strengths and weaknesses. Among them, three of the best-known and most extensively used are Generalized Pattern Search (GPS) algorithms, Genetic Algorithms (GA), and Simulated Annealing (SA) algorithms.

In recent years, BO has proven itself to be a powerful strategy for finding the global minimum of non-linear functions that are expensive to evaluate, non-convex, and whose derivatives are burdensome to access. Furthermore, Bayesian sampling optimisation techniques distinguish themselves as being among the most efficient approaches in terms of the number of objective evaluations [37,38,39,40,41].

The essence of the Bayesian approach lies in the interpretation of the optimisation problem provided by Bayes’ Theorem:

$$P(M|E)\propto P(E|M) P\left(M\right),$$
(2)

which mathematically states that the conditional probability of event \(M\) occurring given that event \(E\) is true is proportional to the conditional probability of event \(E\) occurring if event \(M\) is true, multiplied by the probability of \(M\). Here, \(P(M|E)\) is seen as the posterior probability of the model \(M\) given the evidence (or observations) \(E\), \(P(E|M)\) as the likelihood of \(E\) given \(M\), and \(P(M)\) as the prior probability of the model \(M\). Essentially, the prior, \(P(M)\), represents the existing beliefs about the type of possible objective functions, possibly based on the observations already available. The posterior \(P(M|E)\), on the other hand, represents the updated beliefs about the objective function, given the new observations. The process basically aims at estimating the objective function by means of a statistical surrogate function, or surrogate model.

Many stochastic regression models can be used as a surrogate: the model must be able to describe a predictive distribution that represents the uncertainty in the reconstruction of the objective function, in practice by providing a mean and a variance.

To efficiently select the next sampling point, the proposed approach makes use of an acquisition function defined over the statistical moments of the posterior distribution given by the surrogate. The role of the acquisition function is crucial, since it governs the trade-off between exploration (the aptitude for a global search of the minimum) and exploitation (the aptitude for sampling regions where the function is expected to be low) in the optimisation process. Probability of Improvement (PI), Expected Improvement (EI) and Upper Confidence Bound (UCB) are among the most popular acquisition functions in Bayesian optimisation applications.

2.1 Finite element model updating

As mentioned, when iterative model updating methods are involved, the solution to the problem described by Eq. (1) entails the optimisation of a highly non-convex, high-dimensional cost function. In this case, modal features have been chosen to evaluate the degree of correlation between experimental and theoretical results, by employing both natural frequencies and associated mode shapes.

The selection of the parameters to be updated is a crucial step to reduce the optimisation complexity, retain good physical understanding and ensure the well-posedness of the problem [42]. Generally, good practices to avoid ill-conditioning or ill-posedness are (1) choosing updating parameters that adequately affect the model output and (2) reducing the number of parameters to limit the occurrence of under-determinacy issues in the updating problem [43]. The first task can be accomplished by using sensitivity-based methods to discard non-sensitive parameters, and the second by dividing the structure into sub-parts with the same material properties. Additionally, the richness and the nature of the measured data, relative to the degree of discretisation of the finite element model, place a limit on the type and number of parameters that can be updated while retaining physical meaningfulness.

Various issues of ill-conditioning or rank-deficiency can arise in relation to the specific optimisation technique used. In the case of the BO approach, the rank of the covariance matrix of the Gaussian Process (i.e., the kernel matrix) may be a source of concern. The matrix can become nearly singular if (i) the original function being optimised is so smooth and predictable that it leads to a high correlation between sampling points, thereby generating columns of near-one values, and/or (ii) the sampled points are very close to one another (which typically happens towards the end of the optimisation process), thereby generating several columns that are almost identical [44].

2.2 Bayesian sampling optimisation algorithm

For highly non-convex cost functions and problems denoted by high dimensionality, traditional optimization algorithms may encounter difficulty in identifying the global optimum or fail to converge, even within the framework of well-posed problem sets. In this study, we undertake a comparison between the performance of the proposed Bayesian sampling optimization approach and the outcomes derived from the aforementioned three classical alternatives. These will be discussed later in a dedicated paragraph.

When dealing with expensive and non-convex functions to optimise, both efficiency (in terms of sampling) and global search capabilities are paramount. Indeed, several global optimisation techniques have been developed over the years, but very few perform well when the number of function evaluations is kept to a minimum. One way to deal with expensive functions is to use surrogate optimisation techniques. This approach consists of substituting the objective function with a fast surrogate model, which is then used to search for the optimum and speed up the optimisation process. Of course, the validity of the surrogate model, that is to say, its capability to represent the behaviour of the underlying objective function, is of utmost importance to obtain good and reliable results. Unfortunately, when a linear regression of the form

$$y\left({\mathbf{x}}^{(i)}\right)=\sum_{h} {\beta }_{h}{f}_{h}\left({\mathbf{x}}^{(i)}\right)+{\epsilon }^{(i)} \left(i=1,\dots ,n\right),$$
(3)

is used to fit the data (where \({\mathbf{x}}^{(i)}\) is the i-th sampled point out of a total of \(n\), \(y\left({\mathbf{x}}^{(i)}\right)\) is the associated objective value, \({f}_{h}(\mathbf{x})\) are basis functions of \(\mathbf{x}\), \({\beta }_{h}\) are coefficients to be estimated, and \({\epsilon }^{(i)}\) are the independent, normally distributed errors), it is arduous to determine which functional form should be employed if little or no a priori information about the function of interest is available. As such, these strategies are often impracticable for model updating optimisation problems.

The approach of Bayesian sampling optimisation consists of a change of paradigm. Instead of trying to minimise the error \({\epsilon }^{(i)}\) by selecting some functional form that aligns with the data, the focus is placed on modelling the error by means of a stochastic process, so that the surrogate model is of the form:

$$y\left({\mathbf{x}}^{(i)}\right)=\mu +\epsilon \left({\mathbf{x}}^{(i)}\right) \left(i=1,\dots ,n\right),$$
(4)

where \(\mu\) is the regression term (the functional form is just a constant), and the error term \(\epsilon \left({\mathbf{x}}^{(i)}\right)\) is a stochastic process with mean zero (in other words, a set of correlated random variables indexed by space). This change of perspective about the surrogate function is comprehensively described in one of the most interesting papers on modern Bayesian optimisation, [38], where the proposed method is called Efficient Global Optimisation, EGO. Besides modelling the surrogate as a stochastic process, the Bayesian sampling optimisation method makes use of an acquisition function to perform a utility-based selection of the points to be sampled. These (a stochastic predictive/surrogate model combined with the acquisition function) are in fact the two key elements of Bayesian optimisation.

BO has gained widespread attention only in recent decades. However, the first works on the topic were published in the early 1960s by [45]. After some developments by [46], who used Wiener processes, the concept of Bayesian optimisation using Gaussian Processes as the surrogate model was first applied in the EGO formulation, combined with the expected improvement (EI) concept [47].

In recent years, several research works have proven the advantages of using Bayesian optimisation with expensive non-convex functions [48], making it a popular and well-known global optimisation technique.

Fitting a surrogate model to the data requires carrying out an additional optimisation process to determine its hyperparameters. Furthermore, the next point to be sampled is found by searching for the maximum of the acquisition function. Hence, the BO approach entails two secondary (arguably fast-computing) optimisation problems to be solved at each iteration: this results in a rather sophisticated and potentially heavy algorithm, which is worthwhile only if the objective function is considerably expensive.

In the following, this notation is often used:

$${\mathcal{D}}_{1:t}=\left\{{\mathbf{x}}_{1:t},f\left({\mathbf{x}}_{1:t}\right)\right\},$$
(5)

where \({\mathcal{D}}_{1:t}\) denotes the observation set, or sample, made of \(t\) observations in total. \({\mathbf{x}}_{i}\) is the input point vector of the \(i\)-th observation; in other words, this vector contains the updating parameters (in the input domain). The length of \({\mathbf{x}}_{i}\) equals \(d\), the dimensionality of the updating problem, i.e., the number of updating parameters. Finally, \(f\left({\mathbf{x}}_{1:t}\right)\), also abbreviated as \({\mathbf{f}}_{t}\), are the observed values of the objective function at \({\mathbf{x}}_{1:t}\), i.e., the outputs of the cost function at each set of updating parameters \({\mathbf{x}}_{i}\).

While any probabilistic model can be adopted to describe the prior and the posterior, it should be (i) relatively light and fast, to provide quick access to predictions and related uncertainties, (ii) able to adequately fit the objective function with a small number of observations, since sampling efficiency is pursued, and (iii) such that the conditional variance vanishes if and only if the distance between an observation and the prediction point is zero, as this is one condition to ensure the convergence of the BO method [49].

Given these requisites, Gaussian Process priors are the chosen probabilistic model in the majority of modern Bayesian optimisation implementations. To mention some popular alternatives, [50] worked with random forests, [51] with deep neural networks, [52] made use of Bayesian neural networks, while [53] used Mondrian trees. GPs are well-suited for model updating problems where the penalty function to be minimised is continuous.

Given a Gaussian Process (seen as a continuous collection of random variables, any finite number of which have consistent joint Gaussian distributions [54]), of the form:

$$f\left(\mathbf{x}\right)\sim \mathcal{G}\mathcal{P}\left(m\left(\mathbf{x}\right),k\left(\mathbf{x},{\mathbf{x}}^{\mathrm{^{\prime}}}\right)\right),$$
(6)

where \(m(\mathbf{x})\) is the mean function, and \(k\left(\mathbf{x},{\mathbf{x}}^{\mathrm{^{\prime}}}\right)\) is the covariance function (which models the level of correlation between two observations, \({f}_{i}\) and \({f}_{j}\), relative to the distance between the points \({\mathbf{x}}_{i}\) and \({\mathbf{x}}_{j}\)), the covariance can be computed for each pair of sampled points and conveniently arranged in matrix form:

$$\mathbf{K}=\left[\begin{array}{ccc}k\left({\mathbf{x}}_{1},{\mathbf{x}}_{1}\right)& \dots & k\left({\mathbf{x}}_{1},{\mathbf{x}}_{t}\right)\\ \vdots & \ddots & \vdots \\ k\left({\mathbf{x}}_{t},{\mathbf{x}}_{1}\right)& \dots & k\left({\mathbf{x}}_{t},{\mathbf{x}}_{t}\right)\end{array}\right].$$
(7)

Many covariance functions \(k\left(\mathbf{x},{\mathbf{x}}^{\mathrm{^{\prime}}}\right)\) (or kernel functions) can be chosen, as decreasing functions of the distance between points \({x}_{i}\) and \({x}_{j}\) in the input space.
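
As a minimal illustration of Eq. (7), the Python/NumPy sketch below assembles \(\mathbf{K}\) from any such kernel function; an isotropic squared exponential kernel (analogous to the one introduced later in Eq. (13)) is assumed purely for the example, and this is not the authors' MATLAB implementation.

```python
import numpy as np

def sq_exp_kernel(xi, xj, sigma_f=1.0, length_scale=0.25):
    """Isotropic squared exponential kernel: a decreasing function of the
    Euclidean distance between two input points (cf. Eq. (13))."""
    r2 = np.sum((xi - xj) ** 2)
    return sigma_f**2 * np.exp(-0.5 * r2 / length_scale**2)

def kernel_matrix(X, kernel):
    """Covariance matrix K of Eq. (7) for the t sampled points in X (t x d)."""
    t = X.shape[0]
    K = np.empty((t, t))
    for i in range(t):
        for j in range(t):
            K[i, j] = kernel(X[i], X[j])
    return K

# Example: 5 random points in a 2-parameter updating problem
X = np.random.default_rng(0).uniform(size=(5, 2))
K = kernel_matrix(X, sq_exp_kernel)
print(K.shape, np.allclose(K, K.T))  # (5, 5) True
```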

Considering the joint Gaussian distribution:

$$\left[\begin{array}{c}{\mathbf{f}}_{1:t}\\ {f}_{*}\end{array}\right]\sim \mathcal{N}\left(0,\left[\begin{array}{cc}\mathbf{K}& \mathbf{k}\\ {\mathbf{k}}^{T}& k\left({\mathbf{x}}_{*},{\mathbf{x}}_{*}\right)\end{array}\right]\right),$$
(8)

where \({f}_{*}\) is the objective output at \({\mathbf{x}}_{*}\), that is \({f}_{*}=f\left({\mathbf{x}}_{*}\right)\), and

$$\mathbf{k}=\left[\begin{array}{c}k\left({\mathbf{x}}_{*},{\mathbf{x}}_{1}\right) k\left({\mathbf{x}}_{*},{\mathbf{x}}_{2}\right) \cdots k\left({\mathbf{x}}_{*},{\mathbf{x}}_{t}\right)\end{array}\right],$$
(9)

the following predictive distribution can be derived—for a full analytical derivation, see [55]:

$$P\left({f}_{*}\mid {\mathcal{D}}_{1:t},{\mathbf{x}}_{*}\right)=\mathcal{N}\left({\mu }_{t}\left({\mathbf{x}}_{*}\right),{\sigma }_{t}^{2}\left({\mathbf{x}}_{*}\right)\right),$$
(10)

where:

$${\mu }_{t}\left({\mathbf{x}}_{*}\right)={\mathbf{k}}^{T}{\mathbf{K}}^{-1}{\mathbf{f}}_{1:t}$$
(11)
$${\sigma }_{t}^{2}\left({\mathbf{x}}_{*}\right)=k\left({\mathbf{x}}_{*},{\mathbf{x}}_{*}\right)-{\mathbf{k}}^{T}{\mathbf{K}}^{-1}\mathbf{k}.$$
(12)

In the above set of equations, \({\mu }_{t}\left({\mathbf{x}}_{*}\right)\) is the prediction over the objective function value at any chosen point \({\mathbf{x}}_{*}\), and \({\sigma }_{t}^{2}\left({\mathbf{x}}_{*}\right)\) is the variance of the prediction at \({\mathbf{x}}_{*}\) (the subscripts here denote that the prediction and its variance come from a GP trained with the \({\mathcal{D}}_{1:t}=\left\{{\mathbf{x}}_{1:t},f\left({\mathbf{x}}_{1:t}\right)\right\}\) data sample).

To compute the prediction and the variance at \({\mathbf{x}}_{*}\) from Eqs. (11) and (12) (by means of which exact inference is computed), it is necessary to invert the kernel matrix \(\mathbf{K}\). This operation has a computational complexity of \(\mathcal{O}\left({N}^{3}\right)\), where \(N\) is the size of the (square) kernel matrix (which equals the number of observations, \(t\)). While this operation is relatively fast on its own, it can lead to computationally burdensome workflows as (i) the BO approach entails the maximisation of the acquisition function, a task that may require computing thousands of predictions (especially in high dimensional problems), and (ii) the number of observations keeps increasing (and so the size of \(\mathbf{K}\)) as the optimisation advances and new points are added to the observations set.
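
A compact sketch of exact GP inference according to Eqs. (10)-(12) is given below (Python/NumPy). As assumptions of this sketch only, the explicit inverse \(\mathbf{K}^{-1}\) is replaced by a Cholesky factorisation, a mathematically equivalent but numerically safer route, and a small diagonal noise term is added, as also done in the implementation described in Sect. 3.1.

```python
import numpy as np

def rbf(A, B, sigma_f=1.0, ell=0.25):
    """Squared exponential kernel evaluated between two sets of points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / ell**2)

def gp_predict(X_train, f_train, X_star, kernel, noise=1e-8):
    """Predictive mean and variance of a zero-mean GP at the points X_star,
    following Eqs. (11)-(12); K is factorised once per training set."""
    K = kernel(X_train, X_train) + noise * np.eye(len(X_train))     # t x t
    L = np.linalg.cholesky(K)                                       # O(t^3)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f_train))       # K^-1 f
    k_star = kernel(X_train, X_star)                                # t x m
    mu = k_star.T @ alpha                                           # Eq. (11)
    v = np.linalg.solve(L, k_star)
    var = kernel(X_star, X_star).diagonal() - np.sum(v * v, axis=0)  # Eq. (12)
    return mu, var

rng = np.random.default_rng(1)
X = rng.uniform(size=(20, 2))
f = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
mu, var = gp_predict(X, f, rng.uniform(size=(5, 2)), rbf)
print(mu, var)
```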

Therefore, when using Gaussian Processes, BO scales poorly with the number of observations. One way to mitigate this problem consists of limiting the number of observations used to fit the GP to a certain amount (e.g., defining an “active set” size of a few hundred), by randomly choosing a new set of training points from the sample at each iteration of the algorithm. Indeed, this practice is applied in the implementation used within this work, with an active set size of 300.

The choice of the kernel function deeply affects the smoothness properties of a GP. This must be coherent with the features of the underlying objective function to obtain a quality surrogate model. Moreover, as each problem has its own specifics, the kernel function must be properly scaled. To this end, kernel functions are generalised by introducing hyperparameters. In the case of a squared exponential kernel function, this results in the equation:

$$k\left( {{\varvec{x}}_{i} ,{\varvec{x}}_{j} } \right) = \sigma_{f}^{2} {\text{ exp}}\left( { - \frac{1}{{2\theta^{2} }}\left\| {{\varvec{x}}_{i} - {\varvec{x}}_{j} } \right\|^{2} } \right),$$
(13)

where \({\sigma }_{f}\) is the vertical scale, i.e., the GP’s standard deviation (describing the vertical scaling of the GP’s variance), and the hyperparameter \(\theta\) is the characteristic length scale, which defines how far apart the input points \({{\varvec{x}}}_{i}\) can be for the outputs to become uncorrelated. When dealing with anisotropic problems (as is often the case in model updating), it is much more convenient to use separate length scales, one for each parameter. This is typically done with automatic relevance determination (ARD) kernels, which use a vector of hyperparameters \({\varvec{\theta}}\) whose size equals \(d\).

In practical terms, when a specific length scale \({\theta }_{l}\) assumes a significantly higher value compared to other length scales, the kernel matrix becomes independent of the \(l\)-th parameter.

The optimal set of hyperparameters \({\varvec{\theta}}\) is computed by maximisation of the marginal log-likelihood of the evidence \({\mathcal{D}}_{1:t}=\left\{{\mathbf{x}}_{1:t},f\left({\mathbf{x}}_{1:t}\right)\right\}\) given \({\varvec{\theta}}\):

$${\rm log}(p({{\varvec{f}}}_{1:t}\mid {{\varvec{x}}}_{1:t},{{\varvec{\theta}}}^{+}))=-\frac{1}{2}{{{\varvec{f}}}_{1:t}}^{\top }{{\varvec{K}}}^{-1}{{\varvec{f}}}_{1:t}-\frac{1}{2}{\rm log}|{\varvec{K}}|-\frac{t}{2}{\rm log}(2\pi ),$$
(14)

where the \({{\varvec{\theta}}}^{+}\) vector contains the length scales \({{\varvec{\theta}}}_{1:d}\), the vertical scale \({\sigma }_{f}\), and the mean \({\mu }_{0}\) (i.e., the constant regression term) of the GP (and therefore all the \(d+2\) hyperparameters), so that \({{\varvec{\theta}}}^{+}:=({{\varvec{\theta}}}_{1:d},{\mu }_{0},{\sigma }_{f})\). In the previous equation, the dependency on \({{\varvec{\theta}}}^{+}\) obviously enters through the kernel matrix \({\varvec{K}}\).
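
For illustration, the hyperparameter fit can be written as the minimisation of the negative of Eq. (14) with a general-purpose optimiser. The sketch below (Python/SciPy) is a simplified stand-in for the actual implementation, assuming a single isotropic length scale \(\theta\), the vertical scale \(\sigma_f\) (both handled in log space to keep them positive) and the constant mean \(\mu_0\).

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(params, X, f):
    """Negative of Eq. (14) for an isotropic squared exponential kernel with
    hyperparameters theta (length scale), sigma_f (vertical scale) and mu0
    (constant mean); theta and sigma_f are passed as logarithms."""
    theta, sigma_f = np.exp(params[:2])
    mu0 = params[2]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = sigma_f**2 * np.exp(-0.5 * d2 / theta**2) + 1e-8 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    r = f - mu0
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    t = len(X)
    return 0.5 * r @ alpha + 0.5 * log_det + 0.5 * t * np.log(2 * np.pi)

rng = np.random.default_rng(2)
X = rng.uniform(size=(30, 3))
f = np.sin(4 * X[:, 0]) + X[:, 1] - 0.5 * X[:, 2]
res = minimize(neg_log_marginal_likelihood, x0=np.zeros(3),
               args=(X, f), method="L-BFGS-B")
theta_opt, sigma_f_opt = np.exp(res.x[:2])
print(theta_opt, sigma_f_opt, res.x[2])
```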

By employing this approach, a sort of sensitivity analysis of the parameters over the sampled points is performed. This built-in feature of the Bayesian optimisation technique is certainly useful for structural model updating problems, where the system sensitivity to the updating parameters is often dissimilar and usually unknown.

Four different ARD kernels will be employed, two of which are in the form of Matérn functions [56]. These functions are defined as

$$k\left( {{\mathbf{x}}_{i} ,{\mathbf{x}}_{j} } \right) = \frac{1}{{2^{{\varsigma - 1}} \Gamma \left( \varsigma \right)}}\left( {2\sqrt \varsigma \left\| {{\mathbf{x}}_{i} - {\mathbf{x}}_{j} } \right\|} \right)^{\varsigma } H_{\varsigma } \left( {2\sqrt \varsigma \left\| {{\mathbf{x}}_{i} - {\mathbf{x}}_{j} } \right\|} \right),$$
(15)

where \(\varsigma\) is a smoothness coefficient, while \(\Gamma (\bullet )\) and \({H}_{\varsigma }\left(\bullet \right)\) are the Gamma function and the Bessel function of order \(\varsigma\), respectively. As the smoothness coefficient \(\varsigma\) tends towards infinity, the Matérn function reduces to the squared exponential function; for \(\varsigma =1/2\), it reduces to the unsquared exponential function. The four employed kernels are listed below (a code sketch of the Matérn 5/2 variant follows the list):

  • an ARD unsquared exponential kernel:

    $$\begin{aligned} k\left( {{\varvec{x}}_{i} ,{\varvec{x}}_{j} {\mid }{\varvec{\theta}}^{ + } } \right) & = \sigma_{f}^{2} \exp \left( { - D} \right) \\ D & = \sqrt {\mathop \sum \limits_{l = 1}^{d} \frac{{\left( {x_{i,l} - x_{j,l} } \right)^{2} }}{{\theta_{l}^{2} }}} \\ \end{aligned}$$
    (16.a)
  • an ARD squared exponential kernel:

    $$\begin{aligned} k\left( {{\varvec{x}}_{i} ,{\varvec{x}}_{j} {\mid }{\varvec{\theta}}^{ + } } \right) & = \sigma_{f}^{2} \exp \left( { - \frac{{D^{2} }}{2}} \right) \\ D & = \sqrt {\mathop \sum \limits_{l = 1}^{d} \frac{{\left( {x_{i,l} - x_{j,l} } \right)^{2} }}{{\theta_{l}^{2} }}} \\ \end{aligned}$$
    (16.b)
  • an ARD Matérn 3/2 kernel (\(\varsigma =\frac{3}{2}\)):

    $$\begin{aligned} k\left( {{\varvec{x}}_{i} ,{\varvec{x}}_{j} {\mid }{\varvec{\theta}}^{ + } } \right) & = \sigma_{f}^{2} \left( {1 + \sqrt 3 D} \right)\exp \left( { - \sqrt 3 D} \right) \\ D & = \sqrt {\mathop \sum \limits_{l = 1}^{d} \frac{{\left( {x_{i,l} - x_{j,l} } \right)^{2} }}{{\theta_{l}^{2} }}} \\ \end{aligned}$$
    (16.c)
  • an ARD Matérn 5/2 kernel (\(\varsigma =\frac{5}{2}\)):

    $$\begin{aligned} k\left( {{\varvec{x}}_{i} ,{\varvec{x}}_{j} {\mid }{\varvec{\theta}}^{ + } } \right) & = \sigma_{f}^{2} \left( {1 + \sqrt 5 D + \frac{5}{3} D^{2} } \right)\exp \left( { - \sqrt 5 D} \right) \\ D & = \sqrt {\mathop \sum \limits_{l = 1}^{d} \frac{{\left( {x_{i,l} - x_{j,l} } \right)^{2} }}{{\theta_{l}^{2} }}} \\ \end{aligned}$$
    (16.d)
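
For reference, a possible NumPy implementation of the ARD Matérn 5/2 kernel of Eq. (16.d) is sketched below (an illustrative assumption rather than the authors' code); the other three kernels differ only in how the scaled distance \(D\) is mapped to a correlation value.

```python
import numpy as np

def ard_matern_52(xi, xj, length_scales, sigma_f=1.0):
    """ARD Matern 5/2 kernel, Eq. (16.d): one length scale per updating
    parameter, so poorly influential parameters receive large scales."""
    D = np.sqrt(np.sum(((xi - xj) / length_scales) ** 2))
    return sigma_f**2 * (1 + np.sqrt(5) * D + 5.0 / 3.0 * D**2) * np.exp(-np.sqrt(5) * D)

# Example with d = 3 updating parameters and anisotropic length scales
xi, xj = np.array([0.2, 1.5, 0.1]), np.array([0.3, 1.0, 0.4])
print(ard_matern_52(xi, xj, length_scales=np.array([0.5, 2.0, 0.25])))
```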

The difference between the four kernel functions is visible in Fig. 1. The exponential kernel stands out due to its rapid decline in correlation as distance increases. Consequently, function samples drawn from a GP constructed with the exponential kernel exhibit notably rugged features, whereas those generated using Matérn 3/2, Matérn 5/2, and squared exponential kernel functions display increasingly smoother characteristics.

Fig. 1
figure 1

Correlation between observations \({{\varvec{x}}}_{i}\) and \({{\varvec{x}}}_{j}\) according to the four different kernel functions, plotted against the distance (\(\theta =0.25\))

As mentioned already, two different approaches can be followed to search for the optimum: the exploitative approach and the explorative approach. The automatic trade-off between exploitation and exploration is taken care of by the acquisition function.

Historically, Bayesian Optimization has been formulated in the scientific literature primarily as a methodology for the maximization of objective functions. Consequently, acquisition functions are conventionally formulated to yield high values in regions where the objective is deemed to exhibit a high value. Thus, when seeking the minimum of a function \(f\left({\varvec{x}}\right)\), as in the case of model updating, it is sufficient to consider the equivalent problem:

$$\begin{aligned}{{\text{argmax}}}_{{\varvec{x}}} g\left({\varvec{x}}\right)\\ g\left({\varvec{x}}\right)=-f\left({\varvec{x}}\right)\end{aligned}$$
(17)

The next point \({{\varvec{x}}}_{t+1}\) that will be chosen for sampling is found by maximising the acquisition function \(a({\varvec{x}})\) according to the optimisation problem:

$${{\varvec{x}}}_{t+1}={{\text{argmax}}}_{{\varvec{x}}} a\left({\varvec{x}}|{\mathcal{D}}_{1:t}\right)$$
(18)

Four acquisition functions were tested on a preliminary numerical case study, consisting of the updating of a simple 2D shear-frame (see Fig. 2): Probability of Improvement (PI), Expected Improvement (EI), a modified version of Expected Improvement [57], and Upper Confidence Bound (UCB). Among these, UCB was selected for implementation in the Bayesian sampling algorithm used in this study, as it was found to strike the best balance between exploitation and exploration. In particular, UCB was found to be about 40%, 20%, and 70% better than PI, EI, and EI (Bull) in terms of accuracy, respectively, given the same initial seed and the same total sampling volume.

Fig. 2
figure 2

Comparative analysis of four distinct acquisition functions. The plots illustrate the trajectory of the optimal objective function value throughout the optimization iterations for the updating of a basic 2D shear-frame

Upper Confidence Bound (or Lower Confidence Bound, LCB, if minimisation is involved), first proposed by [58] in the “Sequential Design for Optimisation” algorithm (SDO), consists of a very simple yet very effective approach. The UCB function is defined as:

$${\text{UCB}}\left(\mathbf{x}\right)=\mu \left(\mathbf{x}\right)+\kappa \sigma \left(\mathbf{x}\right)$$
(19)

where \(\kappa\) is typically a positive integer, which controls the width of the bound identified by the standard deviation \(\sigma \left(\mathbf{x}\right)\) and therefore the propensity of the algorithm to explore the search space. Often, \(\kappa\) is taken equal to 2, so that the confidence bound is about 95% (indeed, this is the value used in the following case study).
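
A minimal sketch of the UCB rule of Eqs. (18) and (19) is given below (Python). As assumptions of the sketch, the predictive mean and standard deviation arrays come from whatever GP implementation is in use, and a candidate-based maximisation is adopted as a simplified stand-in for the acquisition search described later in Sect. 3.1.

```python
import numpy as np

def ucb(mu, sigma, kappa=2.0):
    """Upper Confidence Bound of Eq. (19); with the maximisation convention
    of Eq. (17), the model-updating cost is negated before calling this."""
    return mu + kappa * sigma

def next_sample_point(candidates, mu, sigma, kappa=2.0):
    """Pick the candidate with the highest acquisition value, Eq. (18)."""
    return candidates[np.argmax(ucb(mu, sigma, kappa))]

# Toy usage: predictions and standard deviations at 1000 random candidates
rng = np.random.default_rng(3)
candidates = rng.uniform(size=(1000, 11))   # 11 updating parameters
mu = rng.normal(size=1000)                  # GP predictive mean (stand-in)
sigma = rng.uniform(0.1, 1.0, size=1000)    # GP predictive std (stand-in)
x_next = next_sample_point(candidates, mu, sigma, kappa=2.0)
print(x_next)
```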

The “intelligent” sampling performed by the BO approach when using UCB is displayed in Fig. 3. Here, a simple numerically simulated updating problem consisting of a 3-DOF shear-type system is considered. All levels have the same stiffness \(k\), while the lumped masses are \({m}_{1}\), \({m}_{2}\) and \({m}_{3}\). The parameters being updated are the stiffness \(k\) and the mass \({m}_{2}\), whose target values are known in advance (as well as the target response of the system). The associated 2D penalty function is sampled at 9 randomly chosen points, a Gaussian Process is fitted to the observations, and the UCB function is computed from \(\mu \left(\mathbf{x}\right)\) and \(\sigma \left(\mathbf{x}\right)\) by means of Eqs. (11) and (12). The minimum according to the surrogate model and the acquisition function maximum (i.e., the succeeding sampling point) are visible. When using UCB (with \(\kappa =2\)), the choice of the next sampling point is heavily influenced by the high level of uncertainty in the predictions. The acquisition maximum is found in an area far from other observations, where uncertainty is very high, while the predicted objective is still reasonably low. Furthermore, a tendency to explore the optimisation space at the early stages of the procedure, when the number of observations is low and the uncertainty is high, followed by a gradually more exploitative behaviour as the overall uncertainty decreases, is a remarkable aspect of UCB functions for locating the global optimum, as both exploration and exploitation needs are upheld.

Fig. 3
figure 3

The upper visualization shows the GP mean (surface in red), the nine observations \({{\varvec{f}}}_{1:9}\) used for training, the model minimum (i.e., the believed lowest function value according to the GP), and the next sampling point selected by the UCB acquisition function. The lower depicts the acquisition function surface \(UCB({\varvec{x}})\) and the next chosen sampling point, which corresponds to the acquisition function maximum

3 Methodology

In this section, the implementation of the Bayesian optimisation strategy used in this study is described. For completeness, the main technical details of the optimisation techniques used for comparison are discussed as well. These algorithms are in fact very susceptible to specific implementation choices and initial parameter values, as the optimisation outcome is affected both in terms of sampling efficiency and accuracy. This is particularly true for Simulated Annealing and the Genetic Algorithm.

3.1 Bayesian optimisation: the proposed algorithm

Technical details about the implementation of the employed Bayesian sampling optimisation procedure are summarized as follows, according to the flowchart represented in Fig. 4 (a minimal code sketch of the resulting loop is given after the list).

  1. (1)

    The optimisation procedure is initialized by computing the objective function at the seed points, which are randomly chosen within the optimisation domain, defined by the search bounds of each input parameter. The seed size should be sufficiently large to avoid overfitting when selecting the optimal set of kernel hyperparameters through log-likelihood maximisation. As a rule of thumb, [38] suggest setting the initial seed size at \(10\cdot d\) at least, where \(d\) is the number of dimensions of the optimisation problem (i.e., updating parameters). Indeed, this criterion will be followed for the presented case study.

  2. (2)

    The fitting of the Gaussian Process (i.e., the surrogate model at iteration \(i\)) occurs by maximising the marginal log-likelihood, which enables the selection of the optimal set of hyperparameters \({{\varvec{\theta}}}^{+}\). Moreover, a small amount of Gaussian noise \({\sigma }^{2}\) is added to the observations (such that the prior distribution has covariance \(\mathbf{K}({\varvec{x}},{\varvec{x}}^{\prime};{\varvec{\theta}})+{\sigma }^{2}{\varvec{I}}\)).

  3. (3)

    To maximise the acquisition function, several thousands of predictions \({\mu }_{t}\left({\mathbf{x}}_{*}\right)\) are computed at points \({\mathbf{x}}_{*}\) randomly chosen within the optimisation space. Then, some of the best points are further improved with local search (for this application, the “fmincon” MatLab® function is used), among which the best point is finally selected.

  4. (4)

    The objective function is computed at the point corresponding to the acquisition function maximum.
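
Putting steps (1)-(4) together, the loop below is a minimal, self-contained Python sketch run on a toy objective. As simplifying assumptions, the kernel hyperparameters are kept fixed and no local refinement of the acquisition maximum is performed, whereas the actual implementation re-fits the GP (step (2)) and refines the best candidates at every iteration.

```python
import numpy as np

def rbf(A, B, sigma_f=1.0, ell=0.3):
    """Fixed-hyperparameter squared exponential kernel (sketch only)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / ell**2)

def gp_posterior(X, f, X_star, noise=1e-6):
    """Zero-mean GP predictive mean and standard deviation (Eqs. (11)-(12))."""
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f))
    k_s = rbf(X, X_star)
    mu = k_s.T @ alpha
    v = np.linalg.solve(L, k_s)
    var = np.clip(1.0 - (v * v).sum(0), 1e-12, None)  # k(x*,x*) = 1 here
    return mu, np.sqrt(var)

def objective(x):                 # toy stand-in for the expensive FEMU cost
    return np.sum((x - 0.3) ** 2)

rng = np.random.default_rng(4)
d, bounds = 2, (0.0, 1.0)
X = rng.uniform(*bounds, size=(10 * d, d))          # (1) random seed, ~10*d
y = np.array([objective(x) for x in X])

for _ in range(30):                                 # optimisation loop
    cand = rng.uniform(*bounds, size=(2000, d))     # (3) random candidates
    mu, sd = gp_posterior(X, -y, cand)              # (2) GP on the negated cost
    x_next = cand[np.argmax(mu + 2.0 * sd)]         # (3) UCB maximum, kappa = 2
    X = np.vstack([X, x_next])                      # (4) evaluate and append
    y = np.append(y, objective(x_next))

print("best parameters found:", X[np.argmin(y)], "cost:", y.min())
```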

Fig. 4
figure 4

Bayesian sampling optimisation algorithm flowchart

By following this workflow, a newly fitted GP is used at each algorithm iteration. In fact, the objective function value computed at each \(i-{\text{th}}\) iteration is added to the set of observations at iteration \(i+1\), which is then employed to train the GP used to model the objective function, by determining a new set of hyperparameters via log-likelihood maximisation.

Before starting the actual procedure (step (2)), the code performs cross-validation tests to determine which configuration of the GP is most suitable for the specific updating problem, according to the following procedure. First, non-exhaustive cross-validation tests are performed to choose whether to log-transform the input variables (i.e., the updating parameters), as this is often found to improve the quality of the GP regression. To this end, the validation loss is computed for two GPs, one fitted using non-transformed variables and the other fitted using log-transformed variables. Secondly, after choosing whether to transform the input variables or not, the GP is fitted four different times using the four kernel functions previously introduced. Once more, the cross-validation loss is computed to establish which kernel is best suited to modelling the objective function; a minimal sketch of such a kernel-selection test is given below. Once the input-variable transformation is established and the GP kernel chosen, the algorithm is actually initialized.
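
A compact way of running such a kernel-selection test is sketched below, using scikit-learn's GP regressor purely for illustration (an assumption of this sketch; the study itself was implemented in MATLAB); the four kernel families are compared through their tenfold cross-validated RMSE on a stand-in dataset.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.uniform(size=(120, 4))                     # seed points (inputs)
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 - X[:, 2]   # stand-in objective values

# ARD behaviour is obtained by passing one length scale per input dimension
kernels = {
    "exponential":  Matern(nu=0.5, length_scale=np.ones(4)),
    "Matern 3/2":   Matern(nu=1.5, length_scale=np.ones(4)),
    "Matern 5/2":   Matern(nu=2.5, length_scale=np.ones(4)),
    "squared exp.": RBF(length_scale=np.ones(4)),
}
for name, k in kernels.items():
    gp = GaussianProcessRegressor(kernel=k, normalize_y=True, alpha=1e-6)
    rmse = -cross_val_score(gp, X, y, cv=10,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{name:>12s}: CV RMSE = {rmse:.4f}")
```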

3.2 Benchmark algorithms: GPS, SA, and GA

Generalized Pattern Search is a relatively simple traditional optimisation algorithm, while Simulated Annealing, the Genetic Algorithm and BO are generally considered to be “computational intelligence” optimisation techniques. All four algorithms have in common that no use of derivatives is made; hence, the function is not required to be differentiable. Besides, despite the different approaches and backgrounds, Simulated Annealing, GA and Bayesian sampling optimisation share many elements: all are designed to carry out a global search for the minimum, avoiding local minima; they are appropriate for non-linear and non-smooth functions; finally, they are especially suitable for black-box models, where establishing in advance a functional form that effectively aligns with the data is often impossible. The key difference between the BO approach and the other techniques is that the former requires a much smaller sampling volume to achieve comparable optimisation performance, greatly enhancing computational efficiency when expensive cost functions are involved. Nonetheless, this greater sampling efficiency comes at the expense of a more sophisticated algorithm, which requires computationally intensive operations at each iteration. One shared drawback of GA, Simulated Annealing and Bayesian optimisation is that these algorithms tend to give results close to the global minimum, although not highly accurate ones. On the other hand, GPS can achieve a highly accurate solution.

The selection of parameters proves to be a critical aspect across all techniques. However, parameter selection for GPS, GA, and SA, as found by the authors, presents greater challenges compared to BO. The optimization performance exhibits heightened sensitivity to factors such as initial temperature, cooling schedule, and acceptance probability function for SA, crossover and mutation operators for GA and mesh shape and size in the case of GPS. On the contrary, while BO also demands user-defined parameters as described in Sect. 2.2—most notably, the kernel function—these do not pose significant difficulty and demonstrate robust generalization within the optimization framework of model updating cost functions. Furthermore, kernel hyperparameters are automatically optimized through maximization of the marginal log-likelihood, as already mentioned.

Reference can be made to [59] for a first implementation of a GPS algorithm, to [60] who initially proposed the GA algorithm, and to [61] for a first application of the SA concept to optimisation problems.

Given the diverse nature of the optimisation techniques employed, it is essential to exercise particular care in selecting each algorithm's implementation strategy to ensure a fair comparison, necessary to evaluate the performance of the Bayesian sampling optimisation approach in model updating applications. In particular, GPS, SA and GA are allowed to sample the objective function 1000 times, while BO is stopped at 500 function evaluations. This is necessary since the former techniques typically need a much greater sampling volume to achieve sufficient levels of accuracy.

The specific technical details of each alternative are briefly described as follows. The GPS algorithm used in the case studies adheres to the standard procedure first introduced by Hooke & Jeeves and is set according to the following details:

  1. (1)

    Input parameters are linearly scaled to the interval \([0, 100]\), according to the optimisation bounds: this is needed since the employed mesh size is equal in all dimensions.

  2. (2)

    The mesh size is multiplied by a factor of 2 at every successful poll, and it is divided by the same factor after any unsuccessful poll.

  3. (3)

    The algorithm stops if the maximum allowed number of objective function evaluations is reached.

Simulated annealing is implemented in its most common formulation as proposed by Kirkpatrick et al. Specifically, it is set according to the following strategy:

  1. (1)

    Input parameters are linearly scaled to the interval \([0, 1]\).

  2. (2)

    The initial temperature \({T}_{0}\) is set at 50.

  3. (3)

    The temperature gradually decreases at each iteration according to the cooling schedule \(T={T}_{0}/k\), where \(k\) is a parameter equal to the iteration number.

  4. (4)

    Each new sampled point, if its objective is higher than the current one, is accepted according to the acceptance function \(\frac{1}{1+{\text{exp}}\left(\frac{\Delta }{{\text{max}}(T)}\right)}\), where \(\Delta\) is the difference between the objective values (at the incumbent point and the newly sampled one).

Finally, the Genetic Algorithm implementation used is based on the following principles:

  1. (1)

    Compliance with optimisation bounds is enforced by ensuring each individual is generated within the constraints at each generation through proper crossover and mutation operators.

  2. (2)

    The initial population, necessary to initialize the algorithm, consists of points randomly chosen within the space defined by the optimisation bounds of each parameter. As the population size should increase with the number of dimensions, the initial population size was set to 200.

  3. (3)

    The choice of parents is made according to their fitness value. In particular, the chances of breeding are higher for higher fitness values.

  4. (4)

    The crossover fraction is set to 0.8.

  5. (5)

    The elite size is set at 5% of the population size.

  6. (6)

    The mutation fraction varies dynamically, according to the genetic diversity of each generation.

3.3 The objective function

A well-known objective function is employed, based on the difference between the estimated and the actual values of natural frequencies and on the Modal Assurance Criterion—MAC [62] between target and computed modes. This can be defined as follows:

$$P=\sum_{i=1}^{N}\left|\frac{{\omega }_{i}^{targ/id}-{\omega }_{i}^{calc}}{{\omega }_{i}^{targ/id}}\right| +\sum_{i=1}^{N} \left(1-{\text{diag}}\left({\text{MAC}}\left({{\varvec{\phi}}}_{i}^{\text{calc }},{{\varvec{\phi}}}_{i}^{targ/id}\right)\right)\right)$$
(20)

where \({\omega }_{i}^{targ/id}\) and \({\omega }_{i}^{calc}\) are, respectively, the \(i\)-th target (or identified) natural angular frequency and the \(i\)-th computed natural angular frequency out of the \(N\) modes used for updating, and \(MAC\left({\phi }_{i}^{\text{calc }},{\phi }_{i}^{targ/id}\right)\) is the MAC value between the \(i\)-th computed mode shape \({\phi }_{i}^{\text{calc}}\) and the \(i\)-th target (or identified) mode shape \({\phi }_{i}^{targ/id}\). This objective function includes both natural frequencies and mode shapes, with equal weights.
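
For concreteness, a direct transcription of Eq. (20) might look as follows (a Python sketch, not the authors' implementation); it assumes that the computed modes have already been paired with the corresponding target modes.

```python
import numpy as np

def mac(phi_a, phi_b):
    """Modal Assurance Criterion between two mode-shape vectors [62]."""
    num = np.abs(phi_a.conj() @ phi_b) ** 2
    return num / ((phi_a.conj() @ phi_a) * (phi_b.conj() @ phi_b))

def objective_P(w_target, w_calc, Phi_target, Phi_calc):
    """Cost of Eq. (20): relative frequency errors plus (1 - MAC) terms,
    with equal weights; mode shapes are the columns of Phi_*."""
    P = np.sum(np.abs((w_target - w_calc) / w_target))
    for i in range(len(w_target)):
        P += 1.0 - mac(Phi_calc[:, i], Phi_target[:, i])
    return P

# Toy usage with N = 3 modes measured at 8 channels
rng = np.random.default_rng(6)
w_t = np.array([7.5, 8.1, 25.3])              # target natural frequencies
w_c = w_t * (1 + 0.02 * rng.normal(size=3))   # computed natural frequencies
Phi_t = rng.normal(size=(8, 3))               # target mode shapes
Phi_c = Phi_t + 0.05 * rng.normal(size=(8, 3))
print(objective_P(w_t, w_c, Phi_t, Phi_c))
```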

3.4 Performance metrics

The root mean square relative error (RMSRE) is considered as a global metric for the accuracy of the optimisation procedure, both in the input domain (i.e., for each updating parameter) and in the output domain (i.e., for the natural frequencies and mode shapes). In this latter case, this is computed as:

$${{\text{RMSRE}}}_{{\text{output}}}=\sqrt{\frac{1}{n}\cdot \sum_{i=1}^{n}\Delta {f}_{{\text{rel}},i}^{2}+\frac{1}{n}\cdot \sum_{i=1}^{n} {\left(1-{MAC}_{i}\right)}^{2}}$$
(21)

where \(\Delta {f}_{{\text{rel}},i}\) is the relative error of the i-th natural frequency, \({MAC}_{i}\) is the MAC value of the i-th mode shape and \(n\) is the number of modes considered for updating. In the former case, for the parameters estimated in the input space, the RMSRE is instead given by:

$${{\text{RMSRE}}}_{{\text{input}}}=\sqrt{\frac{1}{n}\cdot \sum_{i=1}^{n}\Delta {X}_{{\text{rel}},i}^{2}}$$
(22)

where in this case \(\Delta {X}_{{\text{rel}},i}\) is the relative error between the i-th updating parameter and its target value, and \(n\) is the number of updating parameters considered. Obviously, this calculation was only possible for the numerical dataset, where the inputs of the numerically-generated results are user-defined and thus known and comparable.
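
A literal implementation of Eqs. (21) and (22) is straightforward; the short Python sketch below assumes the relative frequency errors, MAC values, and parameter estimates are already available.

```python
import numpy as np

def rmsre_output(freq_rel_errors, mac_values):
    """Eq. (21): RMSRE in the output domain from the relative frequency
    errors and the MAC values of the n modes used for updating."""
    return np.sqrt(np.mean(np.asarray(freq_rel_errors) ** 2)
                   + np.mean((1.0 - np.asarray(mac_values)) ** 2))

def rmsre_input(theta_est, theta_target):
    """Eq. (22): RMSRE in the input domain (updating parameters), usable
    only when the target parameters are known (numerical dataset)."""
    rel = (np.asarray(theta_est) - np.asarray(theta_target)) / np.asarray(theta_target)
    return np.sqrt(np.mean(rel ** 2))

print(rmsre_output([0.01, -0.02, 0.005], [0.99, 0.97, 0.995]))
print(rmsre_input([2.1e9, 1.4e9], [2.0e9, 1.5e9]))
```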

Finally, the total computational time is reported as well, specifically as a way to compare BO with the other algorithms—GPS, SA, and GA. This demonstrates the time-saving advantage of using a very efficient optimisation technique such as Bayesian sampling optimisation.

4 The case study: the Mirandola Bell Tower

The proposed approach has been validated on a well-known CH case study, the Mirandola bell tower, resorting to data collected from on-site surveys [63, 64]. Indeed, historical CH structures require very specific monitoring and assessment strategies; a complete overview can be found in [65].

Both experimental data and numerically-simulated data have been employed. This latter case is intended for assessing the algorithm capabilities in a more controlled fashion. The experimental validation, on the other hand, proves its feasibility for data collected from operational (output-only) Ambient Vibration (AV) tests.

The masonry-made bell tower of the Santa Maria Maggiore Cathedral in Mirandola (Emilia-Romagna, Italy) is pictured in Fig. 5.a. For this application, model updating is used as a means of structural damage assessment, as mentioned in the Introduction. The results from this procedure, carried out through the use of a Bayesian sampling optimisation approach, will be compared to the damage analysis of the bell tower structure conducted in [63]. The data analysed here originate from the same dataset, which will be discussed later.

Fig. 5
figure 5

Pictures of the Santa Maria Maggiore Cathedral (a), the bell tower located on its south-east corner (b), the discretisation in five substructures (c), and the sensor layout with the location and orientation of the recording channels (d). Adapted from [63]

The target structure represents an important piece of historical and cultural heritage. Built during the late fourteenth century, it underwent several structural modifications throughout the centuries, especially in the seventeenth century when the height of the original tower was tripled [66]; the existing portions were reinforced as well to withstand the additional weights. The octagonal stone roofing was finally added in the eighteenth century. From a geometric perspective, the tower has a square plan (5.90 \(\times\) 5.90 m) for a total height of 48 m with four levels of openings (Fig. 5b).

Historically, the area of interest was only considered to be at modest seismic risk. However, two events linked to the 2012 Emilia Earthquake (specifically on the 20th and 29th of May) caused significant damage to the tower, as detailed in [64]. The damage pattern influenced the FE modelling as well, as will be discussed in the next subsections. More details from the visual inspection and in-situ surveys can be found in [64] and [63].

4.1 Acquisition setup

The dynamic testing discussed here was performed by the laboratory of Strength of Materials (LabSCo) of the IUAV (Istituto Universitario di Venezia) in August 2012 [64], considering the post-earthquake situation before the installation of the provisional safety interventions. An Operational Modal Analysis (OMA) was conducted, relying on the structural response from ambient vibrations to identify the modal properties of the target structure [67].

Eight uniaxial piezoelectric accelerometers (PCB Piezotronics type 393C) were deployed as portrayed in Fig. 5.d, using metal bases to attach them to the masonry walls [1]. All recorded signals had a length of \(\sim\) 300 s and a sampling frequency \({f}_{s}=192\) Hz; this was then reduced to 48 Hz in post-processing, focusing on the first eight modes, to expedite the output-only identification procedure. This was done by employing the Stochastic Subspace Identification (SSI) algorithm [68] on the detrended and filtered data. However, only the first six modes (reported in Table 1) were employed, as done in [63], since the 7th and the 8th ones were deemed less reliable.

Table 1 Identified natural frequencies (OMA)

4.2 Crack pattern and finite element model

The FE model (Fig. 6) was organised in substructures (i.e. macro-areas), following the structural partition of the tower structure and reflecting the strong localisation of damage. This macro-zoning procedure reflects what is generally done for similar structures after seismic damage—see the similar case of the Fossano bell tower in [69]. The rationale is to encompass areas that are expected to show similar and homogeneous mechanical properties.

Fig. 6
figure 6

Finite Element Model of the Mirandola bell tower. a The whole structure b detail of the linear springs elements (c) the idealized connections with the nearby buildings, shown in plan-view. Adapted from [63]

In this case, the tower base (subsection 1, highlighted in red in Fig. 5.c), i.e., the portion from ground level (0.00 m) to + 9.50 m, suffered minimal to no damage, as did the tower top (subsection 4) and the belfry (subsection 5). These are highlighted respectively in yellow, between + 30.5 m and + 37.5 m, and green, from + 37.5 m to + 48.0 m. The most severe and extensive damage was found in subsection 2 (dark orange, + 9.50 m to + 21.0 m) and subsection 3 (light orange, + 21.0 m to + 30.5 m). This latter floor represents a structural peculiarity of this case study, as it includes very large window openings on all façades. The reason is that this portion corresponds to the first belfry, built before the addition of the second one on top of it. In any case, this layout is quite different from the most common cases, where the opening size decreases moving from the tower bottom. This affects the local and global dynamic response of the whole structure [70]. In fact, during the seismic events of interest, this locally more flexible portion underwent a twisting rotation, which caused deep diagonal cracks to arise right below this order of openings, extending down to the (less wide) openings of the underlying level on all four sides. Importantly, the second and third portions were also the ones covered by the eight metal tie rods (four per portion) installed as provisional safety interventions.

As for the previous case study, the model was realized in ANSYS Mechanical APDL. SHELL181 elements were applied for all façades in all macro-areas, as well as for the stone roof and the masonry vaults at the basement level, for a total of 1897 elements. The interactions with the Cathedral of Santa Maria Maggiore and the rectory were modelled by 104 COMBIN14 linear springs, distributed along the whole contact surface in correspondence to the apse arches, the nave walls, and the rectory wall on the East, West, and North side (Fig. 6b). That is intended to simulate the in-plane stiffness of the attached masonry walls, thus removing these external elements and replacing their reaction forces with springs’ elastic forces, acting as boundary conditions as also suggested in [63].

The complete FE model consisted of 2052 nodes.

4.3 Model updating setup

The following eleven parameters were considered for updating (see Table 2):

  • \({E}_{1}, {E}_{2}, {E}_{3}, {E}_{4}\): the Young’s moduli of the damaged masonry in the four sub-structures (according to Fig. 6).

  • \({\nu }_{mas}\): the Poisson’s ratio of the masonry, assumed constant everywhere.

  • \({k}_{1},{k}_{2},{k}_{3},{k}_{4},{k}_{5},{k}_{6}\): the linear stiffnesses of the six distributed springs used to model the connections with the nearby structures (see Fig. 6).

Table 2 Parameters selected for updating and associated optimisation bounds

These parameters are exactly the same as those considered in [63], thus enabling a direct comparison of the results. No sensors were available on the belfry roof; therefore, for lack of reliable data, the 5th macro-area was not considered in the FE updating.

Table 2 shows these input parameters and the assumed optimisation bounds. Notice that the optimisation range of the link-element parameters spans several orders of magnitude, generating an extremely wide optimisation space. This reflects the high uncertainty about the boundary conditions, which might significantly affect the dynamic response of the structure. The optimisation bounds of the elasticity moduli consider the values suggested by the literature and Italian regulations for brick masonry, while being wide enough to capture the level of damage suffered by the structure.
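To fix ideas, the sketch below shows how such a search space can be organised and log-transformed. The numerical bounds are purely illustrative placeholders (the actual values are those of Table 2, not reproduced here); the log10 mapping anticipates the logarithmic transformation of the inputs discussed in Sect. 5.1.

```python
import numpy as np

# Purely illustrative placeholder bounds (the actual values are those of Table 2)
bounds = {
    # Young's moduli of the damaged masonry in the four sub-structures [Pa]
    "E1": (5.0e8, 5.0e9), "E2": (5.0e8, 5.0e9),
    "E3": (5.0e8, 5.0e9), "E4": (5.0e8, 5.0e9),
    # Poisson's ratio of the masonry [-]
    "nu_mas": (0.1, 0.3),
    # Link-element stiffnesses [N/m]: ranges spanning several orders of magnitude
    **{f"k{i}": (1.0e4, 1.0e9) for i in range(1, 7)},
}

# Wide, strictly positive ranges are conveniently searched on a log10 scale
log10_bounds = {p: (np.log10(lo), np.log10(hi)) for p, (lo, hi) in bounds.items()}
```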

The (arbitrarily chosen) target input parameters and the related system-output parameters (in terms of frequency only) used for the numerically simulated data setup are reported in Table 3. The system-output parameters used for updating in the experimental data setup are the identified natural frequencies shown in Table 1 (and the related mode shapes).

Table 3 Target natural frequencies (generated by the set of target parameters)

5 Results

The results for BO and each benchmark optimisation algorithm are first presented for the numerical dataset, with considerations on accuracy and computational time. These are followed by the results for the experimental dataset, which allow the performance of BO in a real application to be assessed. Furthermore, concerning BO, the use of several kernels is investigated, and the benefits of accessing the parameters’ length scales when using ARD kernel functions in the framework of model updating are highlighted.

5.1 Numerically-simulated data results

For the numerical case, the initial seed size is set to 220 points (20 times the number of parameters being updated). A logarithmic transformation is applied to the input parameters, as it was observed to improve the quality of the GP regression. The choice of the kernel function is driven by a cross-validation test, since some kernels may be better suited to modelling the underlying objective function of this specific updating problem, resulting in surrogate models with enhanced validity. The outcomes of a tenfold cross-validation test of four different Gaussian Processes, fitted using the ARD exponential, the ARD Matérn 3/2, the ARD Matérn 5/2 and the ARD squared exponential kernels, are shown in Table 4. Whilst all kernel functions return reliable regression models, the ARD Matérn 5/2 kernel is found to be the most suitable, returning excellent validation results. Fitting the GP to the initial seed also allows information on system sensitivity to be retrieved through the optimised hyperparameters of the selected ARD kernel (Table 5); a sketch of this step, including the retrieval of the length scales, is given below.

Table 4 The root mean square error (RMSE) is reported for each kernel, as the average of the validation losses obtained with tenfold cross-validation tests
Table 5 Parameters' length-scales (hyperparameters) obtained by maximisation of the marginal log-likelihood
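The following sketch illustrates the kernel-selection step, assuming scikit-learn's Gaussian Process implementation with anisotropic (ARD) kernels and placeholder seed data; the original implementation of the study is not reproduced here. The last lines refit the best-scoring kernel and print its optimised length scales, i.e., the kind of quantities reported in Table 5. In the actual procedure, X_seed and y_seed would be the 220 log-transformed parameter vectors and the corresponding objective values computed from the FE model.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, RBF
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_seed = rng.uniform(size=(220, 11))   # placeholder for the 220-point log-scaled seed
y_seed = rng.normal(size=220)          # placeholder objective values

ard = np.ones(11)                      # one length scale per parameter (ARD)
kernels = {
    "ARD exponential":         Matern(length_scale=ard, nu=0.5),
    "ARD Matern 3/2":          Matern(length_scale=ard, nu=1.5),
    "ARD Matern 5/2":          Matern(length_scale=ard, nu=2.5),
    "ARD squared exponential": RBF(length_scale=ard),
}

# Tenfold cross-validation of the four candidate surrogates (RMSE, cf. Table 4)
for name, k in kernels.items():
    gp = GaussianProcessRegressor(kernel=ConstantKernel() * k, normalize_y=True)
    scores = cross_val_score(gp, X_seed, y_seed, cv=10,
                             scoring="neg_root_mean_squared_error")
    print(f"{name:24s} CV RMSE = {-scores.mean():.4f}")

# Refit the best kernel and inspect its optimised ARD length scales (cf. Table 5);
# a small length scale indicates a highly influential parameter.
gp_best = GaussianProcessRegressor(
    kernel=ConstantKernel() * Matern(length_scale=ard, nu=2.5),
    normalize_y=True).fit(X_seed, y_seed)
print(gp_best.kernel_.k2.length_scale)
```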

As expected, the eleven length scales differ by several orders of magnitude due to the anisotropy of the problem. The parameters that most affect the system response are the elasticity moduli: this behaviour is foreseeable, as material elasticity is known to significantly impact the modal properties. However, the kernel hyperparameters suggest that the elasticity modulus of the fourth sub-structure has low sensitivity: this is likely due to the scarcity of sensors at that level of the building, which prevents the necessary vibration information from being captured. Therefore, reliable estimates of \({E}_{4}\) should not be expected, neither in the simulated-data setup nor in the experimental one. The problem also appears to be scarcely sensitive to changes in the Poisson’s ratio: this makes sense, as the Poisson’s ratio has only a marginal effect on the modal properties. As for the springs, these are generally found to have lesser effects on the modal response compared to the elasticity moduli of the first three sub-structures. In particular, as the hyperparameters suggest, \({k}_{5}\) and \({k}_{6}\) have a lower impact on the system modal response, while \({k}_{1}\), \({k}_{3}\) and \({k}_{4}\) feature higher sensitivity.

The Bayesian sampling optimisation is carried out using the upper confidence bound (UCB) acquisition function, which was generally found to be the most effective, as it provides a good balance between exploitation and exploration. In Fig. 7 (top), it is clear how the first selected sampling point already represents a large improvement over the best cost function value obtained from the seed, suggesting that the Gaussian Process models the objective function remarkably well. As the GP is updated with newly sampled points, UCB steadily converges towards a minimum, gradually improving the accuracy of the optimisation solution. For additional clarity, the best-computed objective is plotted against the iteration number at the bottom of the figure. The best objective obtained is 0.0742, which is fairly close to zero, the (known) global optimum in the output space. It is also noticeable that, in this case, BO struggles to improve the result further after about 350 iterations.
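As a rough illustration of the point-selection step, the sketch below assumes the usual UCB-type criterion written for a minimisation problem, \(\mu(x)-\kappa \sigma(x)\) (sometimes called the lower confidence bound), evaluated over a random candidate set; dedicated implementations use more sophisticated acquisition optimisers, and the actual settings of the present study are not reproduced here.

```python
import numpy as np

def ucb_step(gp, objective, lb, ub, kappa=2.0, n_candidates=10_000, rng=None):
    """One BO iteration with a UCB-type criterion for minimisation:
    acq(x) = mu(x) - kappa * sigma(x), minimised over random candidates."""
    if rng is None:
        rng = np.random.default_rng()
    # In practice a dedicated optimiser searches the acquisition; dense random
    # sampling is used here only to keep the sketch short.
    X_cand = rng.uniform(lb, ub, size=(n_candidates, len(lb)))
    mu, sigma = gp.predict(X_cand, return_std=True)
    acq = mu - kappa * sigma            # optimistic estimate of the objective
    x_next = X_cand[np.argmin(acq)]     # most promising point to evaluate
    y_next = objective(x_next)          # expensive FE-model evaluation
    return x_next, y_next
```

After each iteration, the new pair would be appended to the training set and the GP refitted, producing the progressive refinement visible in Fig. 7.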

Fig. 7

Top: Bayesian optimisation progress over iterations for the numerical dataset. The objective value at the randomly sampled seed points is displayed in red, while the objective at points selected by UCB is displayed in blue. Bottom: best objective function value over iterations

The results in the output space, i.e., the raw (best) cost function value and the estimated modal data, are shown in Table 6 for each optimisation technique. For all modes, the updated and target values are reported, as well as the relative error and the MAC value. The RMSRE and the best achieved objective function value are also reported for each algorithm. These results highlight how GPS, SA and GA fail to minimise the cost function under the proposed conditions: all three algorithms, and especially SA, return objective values that are far from zero. Bayesian optimisation, on the contrary, achieves very accurate results with only 500 evaluations. The error in the frequencies is kept to a minimum (only the second mode shows a larger deviation), as is the error in the mode shapes. The relative errors are computed, for each \(n\)-th mode, as \(({f}_{n}^{{\text{UPD}}}-{f}_{n}^{{\text{TAR}}})/{f}_{n}^{{\text{TAR}}}\), where \({f}_{n}^{{\text{UPD}}}\) is the updated value and \({f}_{n}^{{\text{TAR}}}\) is the corresponding target.
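For reference, the sketch below shows how these metrics can be computed. The RMSRE is assumed here to be the root mean square of the per-mode relative errors, and the MAC follows its standard definition for real-valued mode shapes.

```python
import numpy as np

def relative_errors(f_upd, f_tar):
    """Per-mode relative frequency errors, (f_upd - f_tar) / f_tar."""
    f_upd, f_tar = np.asarray(f_upd, float), np.asarray(f_tar, float)
    return (f_upd - f_tar) / f_tar

def rmsre(f_upd, f_tar):
    """Root mean square of the per-mode relative errors (assumed definition)."""
    return float(np.sqrt(np.mean(relative_errors(f_upd, f_tar) ** 2)))

def mac(phi_upd, phi_tar):
    """Modal Assurance Criterion between two real-valued mode-shape vectors."""
    phi_upd, phi_tar = np.asarray(phi_upd, float), np.asarray(phi_tar, float)
    return (phi_upd @ phi_tar) ** 2 / ((phi_upd @ phi_upd) * (phi_tar @ phi_tar))
```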

Table 6 Optimisation results in the output space, obtained with GPS, Simulated annealing, Genetic Algorithm and Bayesian optimisation

The resulting updated parameters (i.e., those that generate the updated modal features just discussed) are displayed in Table 7, together with the target values and the RMSRE computed over the four elasticity moduli. Up to 1000 observations are used for the former three algorithms, while BO employs only 500 observations, leading to comparable total optimisation times. Among all parameters, the most interesting are the elasticity moduli of the four sub-structures, as they address the main goal of the updating problem, i.e., assessing the level of damage to the structure. The stiffnesses of the six springs are of secondary interest, since they are introduced in the updating procedure only because the degree of support provided by the adjacent architectural elements, and its impact on the dynamic response of the building, is unknown. Moreover, given their extremely wide optimisation range, identifying the right order of magnitude can already be considered a satisfactory result. In light of the above, GPS, SA and GA show quite poor results, failing to attain a good estimate of the first four parameters (and providing even worse estimates for the rest). On the contrary, Bayesian optimisation returns acceptable errors on the estimated values, with particularly good agreement for the first four. Although a ~ 25% error in the elasticity moduli may appear significant, these results are in fact remarkable given the intricate nature of the optimization problem under consideration. Indeed, this level of precision allows for informed assessments regarding which portions of the structure likely experienced the most damage and which sections remain structurally sound. The order of magnitude of the springs is in some cases recognized as well, except for \({k}_{2}\) and \({k}_{6}\) (and, to a lesser extent, \({k}_{1}\) and \({k}_{4}\)), suggesting once more that the algorithm could have run into a local minimum. Had Bayesian optimisation been able to properly identify the right scale of \({k}_{2}\) and \({k}_{6}\), it would likely have returned even better accuracy for the parameters of interest (i.e., \({E}_{1}\), \({E}_{2}\), \({E}_{3}\) and \({E}_{4}\)).

Table 7 Optimisation results in the input space, obtained with GPS, SA, GA and BO

In the damage assessment study of 2017, the updating procedure was carried out in batches: at first, the springs were calibrated while holding the elasticity moduli constant; afterwards, \({E}_{1}\), \({E}_{2}\), \({E}_{3}\) and \({E}_{4}\) were optimised using the previously estimated link-element stiffness values. As traditional optimisation techniques were employed, this approach aimed to facilitate the optimisation procedure by lowering the dimensionality of the problem. Given the outcome of this numerical test, such an approach can be avoided when using BO, as this technique is powerful enough to consider all parameters at once, reducing computational time and increasing the chances of ending up close enough to the global minimum.

The optimisation time employed by each algorithm is reported in Table 8. As the number of observations is relatively high, the secondary optimisation problems (i.e., the maximisation of the marginal log-likelihood and the maximisation of the acquisition function) are relatively burdensome tasks, significantly extending the total optimisation time (Fig. 8). As GP predictions require the inversion of the covariance matrix of the observations, with a computational complexity of \(\mathcal{O}\left({{\text{N}}}^{3}\right)\), the modelling and point-selection time gradually increases as the number of observations grows. These two tasks are crucial to obtain high sampling efficiency, which in turn allows the total number of observations to be kept to a minimum. In fact, with only 500 iterations (of which, in this case, about 350 would have been sufficient to obtain the same final result), BO still saves some computational time compared to the other techniques, even if negatively affected by the increasingly burdensome secondary optimisation problems, while retaining far superior accuracy.
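The cubic cost stems from factorising the \(N \times N\) covariance matrix of the observed points at every refit. The following sketch (assuming a Cholesky-based implementation, not necessarily the one used by the employed toolbox, with 'kernel' being any callable returning the covariance matrix, e.g., an ARD kernel object) shows where this cost arises.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gp_refit(X, y, kernel, noise=1e-6):
    """Refit step of a GP surrogate: the Cholesky factorisation of the N x N
    covariance matrix dominates the cost and scales as O(N^3)."""
    K = kernel(X, X) + noise * np.eye(len(X))   # N x N covariance matrix
    L = cho_factor(K, lower=True)               # O(N^3) factorisation
    alpha = cho_solve(L, y)                     # O(N^2); reused by every prediction
    return L, alpha
```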

Table 8 Elapsed optimisation time for each technique. The sampling volume comprises 500 observations for BO and 1000 observations for GPS, SA, and GA. Minor variations in the time required to evaluate the objective across algorithms stem from differences in machine performance among different runs
Fig. 8

Total optimisation time of BO, as the sum of the objective evaluation time (the cumulative time employed for evaluating the objective function) and the modelling and point selection time (the cumulative time employed for maximising the marginal log-likelihood and the acquisition function)

5.2 Experimental data results

Differently from the numerically simulated data, the experimental case study is affected by both the inherent limitations of the FE model and the measurement noise of the acquisitions. Thus, when using real data, the minimum of the cost function is never found at zero, since a significant misfit between the experimental and computed modal properties will always remain.

The same kernel and acquisition function of the numerical case are used, while the seed size is set to 250 points. For a given total number of observations (in this case 500), using larger seeds leads to shorter computational times (as the number of algorithm iterations is reduced, meaning less time is spent on the secondary optimisation problems) and potentially improved surrogate quality, at the expense of reduced exploitation of the algorithm’s “intelligent” sampling capabilities. In a way, one relies more on the surrogate represented by the GP and less on the point-selection process operated by the acquisition function.

Table 9 provides the length scales obtained when fitting the GP to the new 250-point seed. As different observations are employed here, the hyperparameters differ only marginally from those obtained for the numerically simulated dataset, and a substantial correlation between the two sets is noticeable. As such, the previous considerations about the length scales apply in this case as well.

Table 9 Parameters' length scales obtained by maximising the marginal log-likelihood for the experimental dataset. GP fitted on 250 seed points

The results obtained through the optimisation process (shown in Fig. 9) reveal how, in this case, the optimiser cannot be expected to converge to zero, since FE modelling deficiencies, coupled with inaccuracies in the identified modal data, lead to an unavoidable misfit between the computed and measured modal response. For additional clarity, the best-computed objective is plotted against the iteration number as well. The value associated with the function minimum is 0.8343.

Fig. 9

Top: Bayesian optimisation progress over iterations for the experimental dataset. Once again, the objective value at the randomly sampled seed points is displayed in red, while the objective value at points selected by UCB is displayed in blue. Bottom: best objective function value over iterations

The results in the output space, i.e., the raw (best) cost function value and the estimated modal data, are shown in Table 10, along with the modal parameters obtained in the reference study. Once again, for each considered mode, the updated and identified values are reported, as well as the relative error and the MAC value. Looking at the updated modal parameters, Bayesian optimisation gives results that are consistent with those obtained in the 2017 study. Generally, the natural frequencies obtained through the Bayesian sampling optimisation show good agreement with the identified ones. The modes exhibiting the largest errors are the third and the fifth: this was already the case in the former study. The MAC values of the first three modes suggest a good correlation with the identified mode shapes of the tower, while the last three denote some degree of incoherence (especially the fifth and the sixth). This issue is shared with the 2017 study, indicating problems probably due to the quality of the measurements for the higher modes, or to the inadequacy of the FE model in capturing the dynamic behaviour of the bell tower at higher frequencies. Indeed, for these reasons, fitting the system response with modal features beyond the third mode is often impractical in many FEMU applications [71].

Table 10 Optimisation results in the output space, obtained with Bayesian optimisation. Results from 2017 are reported for reference. Also, the RMSRE is computed for both BO and the former paper

The results relative to the input space (i.e., the estimated parameters) are reported in Table 11, as the estimated values obtained through BO and the estimates of the former damage assessment study. Focusing on the elasticity moduli of the four sub-structures, the results stemming from the Bayesian sampling optimisation approach are mostly consistent with the former study, except for the elasticity modulus of the third sub-structure.

Table 11 Optimisation results in the input space, obtained with Bayesian optimisation

These findings can be used to assess the damage condition of the bell tower. The low values obtained for the two lowest levels suggest that the bell tower has probably endured high levels of damage in these areas, which mostly affect the lower modes of the building. Concerning these two subparts, a slightly higher estimate of the second elasticity modulus is the only marginal difference between the two results. On the contrary, a substantial difference can be seen for the elasticity of the walls at the third level of the structure: the former study highlighted a much higher level of damage. Judging by the value estimated through BO, this specific part of the tower could have been either less damaged by the seismic event or originally characterized by walls built with stiffer, better-quality material. This hypothesis is made more plausible by considering that the building, whose construction started in the late fourteenth century, was severely altered in the seventeenth century, when the height of the bell tower was tripled and the original structure reinforced [63]. Finally, the walls of the fourth level were found to be significantly stiffer than the rest of the structure. However, estimates of the parameters concerning this level should not be considered reliable, for the reasons stated before (scarcity of sensors).

Regarding the spring elements, which model the degree of constraint enforced by the adjacent structures, the results of the BO approach agree with the estimates of the former analysis, particularly for the value of \({k}_{5}\), which in both cases stands out from the other link-element parameters by a factor of \({10}^{3}\). The greater stiffness of the fifth spring element suggests that the architectural element with the greatest impact on the dynamic response of the bell tower is the easternmost apse arch. All the other elements (particularly those modelled by \({k}_{2}\), \({k}_{4}\) and \({k}_{6}\), hence the nave walls and the rectory wall) seem to provide little restraint to the motion of the bell tower when considering small-amplitude environmental vibrations.

6 Conclusions

When expensive FE models are involved, the optimisation algorithm represents one of the most critical aspects of FEMU applications that make use of iterative methods. The optimisation technique is a key element of the updating procedure, as it should feature good sampling efficiency, global search capabilities, and adequate accuracy to cope with non-convex and complex cost functions.

This research presented and validated a Bayesian sampling optimisation (BO) approach for such a task, with an application to a real case study (the Mirandola bell tower), which represents an interesting example of post-seismic assessment of a historical building of cultural and architectural relevance. Overall, the proposed procedure proved to be well suited to this challenging task. In particular, BO outperformed the other well-established global optimisation techniques selected as benchmarks (namely Generalized Pattern Search, Simulated Annealing, and the Genetic Algorithm), featuring far superior sampling efficiency, greater accuracy, and a better capability of finding the global function minimum, particularly as the dimensionality of the problem increases. These results were achieved with half the objective function evaluations allowed for GPS, SA, and GA, which in practice translates into shorter computational times and lower costs.

One major drawback of SA and GA is that both techniques rely on large sampling volumes. Furthermore, these algorithms tend to provide results characterised by poor accuracy. The GPS algorithm, on the other hand, may exhibit poor global search capabilities, as it was found either to converge too quickly or to fail to find the global function optimum. The proposed BO algorithm is not affected by any of these drawbacks.

Regarding the implementation of the described technique, a logarithmic transformation of the input variables was found to improve the quality of the Gaussian Process regression, albeit marginally. Furthermore, this research revealed that all four investigated kernels can be successfully used, although the ARD Matérn 5/2 kernel provided the best results in terms of validation of the surrogate model for this specific case study. When using automatic relevance determination (ARD) kernels, it is possible to retrieve the hyperparameters (i.e., the length scales) of the GP, gathering useful information about the sensitivity of the problem to each parameter. Additionally, ARD kernels automatically discard irrelevant dimensions from the optimisation procedure, making this approach well suited to highly anisotropic problems. This sort of “built-in” sensitivity analysis is particularly advantageous in structural model updating applications, where information about the relevance of the parameters (often scarcely known in advance) can be considerably useful for a better understanding of the structural behaviour.