1 Introduction

Data-based approaches to structural health monitoring (SHM) are often limited by the range of labelled health-state data available for training. This lack of labelled health-state data means that most approaches are restricted to novelty detection, or to identifying only previously-seen health-states. Population-based structural health monitoring (PBSHM) is a branch of SHM that seeks to overcome these challenges by expanding the set of available labelled health-state data through considering a population of structures [1,2,3,4]. By pooling datasets from across a population, more labelled information is likely to be available, increasing the potential to diagnose a wider range of health-states of interest.

A significant challenge in adopting PBSHM is that the feature spaces of different structures will not be aligned; this occurs for several reasons. First, the structures themselves can differ, whether because of manufacturing variations in homogeneous populations (i.e. nominally-identical structures), or because the population is heterogeneous [1,2,3,4]. Second, the monitoring setup is not exactly equivalent for each member of the population. Third, the environmental and operational conditions differ between members, meaning each member of the population may be observing a different part of the feature space. These shifts in the feature space (also known as domain shift) mean that labelled data cannot be naïvely shared by directly applying a classifier trained on one member of the population to other members. Instead, a mapping must be inferred between the feature spaces, such that a classifier can be trained on a harmonised dataset and generalise across all members of the population.

Transfer learning, and more specifically domain adaptation, is a branch of machine learning that aims to achieve this goal of identifying a mapping between different domains (i.e. different feature spaces) [5,6,7]. By inferring a mapping that harmonises these domains based on some criterion (typically using statistical distances [8,9,10,11,12,13,14] and/or manifold assumptions [12, 15]), a classifier can be trained on a source domain where labelled information is known, and applied to unlabelled target domains. Several domain adaptation methods have been used within the SHM literature, with most focussing on identifying a latent space in which the two feature datasets are harmonised [8,9,10,11,12,13,14,15,16,17]. The majority of these approaches are deterministic in nature, and utilise a distance metric to infer the mapping. This paper takes a different viewpoint, instead seeking to infer a mapping from the target domain directly onto the source domain (rather than onto a latent space). In addition, the method outlined in this paper is probabilistic, and uses a maximum likelihood criterion to infer the mapping.

The method outlined in this paper, named the domain-adapted Gaussian mixture model (DA-GMM), is an extension of work proposed by Paaßen et al. [18], which inferred a linear mapping between a fully-labelled target dataset and a source Gaussian mixture model. The novelty in this paper is that the method has been extended to the scenario where the target is unlabelled (and even to the scenario where the source is unlabelled as well). This extension makes the approach practical for PBSHM scenarios, with the method demonstrated on three case studies: an artificial dataset, a population of two numerical shear-building structures, and the Z24 [19] and KW51 [20] bridge datasets. A MATLAB implementation accompanies this paper at https://github.com/pagard/EngineeringTransferLearning.

The outline of this paper is as follows. Sect. 2 introduces the DA-GMM and the expectation-maximisation algorithm used to solve the model. An artificial dataset is introduced in Sect. 3, to demonstrate the flexibility of the linear mapping in the context of two-dimensional rigid rotations. A case study inferring the mapping between a population of two numerical shear-building structures is presented in Sect. 4, where the approach is benchmarked against several other GMM models. In Sect. 5, the method is applied in a completely unsupervised manner between two bridges, the Z24 and KW51. Finally, conclusions are drawn in Sect. 6.

2 Domain-adapted Gaussian mixture model

The domain-adapted Gaussian mixture model (DA-GMM), an extension of the work by Paaßen et al. [18], seeks to learn a mapping from a target dataset onto a finite Gaussian mixture model (GMM) learnt in a supervised manner from a labelled source domain dataset \(\{X_s,{\varvec{y}}_s\}\) (where \(X_s \in \mathbb {R}^{D \times N_s}\) are the feature data and \({\varvec{y}}_s \in \mathbb {R}^{1 \times N_s}\) are the corresponding labelsFootnote 1). Specifically, the approach seeks to learn a linear mapping via a projection matrix \(H \in \mathbb {R}^{D\times D}\) that transforms the distribution of the unlabelled target feature data \(X_t \in \mathbb {R}^{D\times N_t}\) such that the likelihood of the transformed data \(\hat{X}_t=H X_t \in \mathbb {R}^{D\times N_t}\) being generated by the K-component source GMM is maximised. The underlying distribution of the transformed target data is, therefore, assumed to be defined by a K-component mixture model, where each component is defined by a Gaussian distribution,

$$\begin{aligned} p(\hat{{\varvec{x}}}_t^{(i)} \;\vert \; z^{(i)} = k) = \mathcal {N}(H{\varvec{x}}_t^{(i)} \;\vert \; {\varvec{\mu }}_s^{(k)}, \Sigma _s^{(k)}), \end{aligned}$$

where \(k\in \{1,\dots ,K\}\) indexes the class group, the latent variable \(z^{(i)} \in \{1,\dots ,K\}\) represents the mixture component for the feature data \(\hat{{\varvec{x}}}_t^{(i)}\), and the mean \({\varvec{\mu }}_s^{(k)}\) and covariance \(\Sigma _s^{(k)}\) of each component are defined by the supervised source domain GMM, inferred from the labelled dataset \(\{X_s,{\varvec{y}}_s\}\) (and hence fixed in the following inference).

The latent variable that governs the mixture \(z^{(i)}\in \{1,\dots ,K\}\) is categorically distributed such that,

$$\begin{aligned} P(z^{(i)} = k) = \pi ^{(k)} \end{aligned}$$

and \(\sum ^K_{k=1} \pi ^{(k)} = 1\), where \({\varvec{\pi }} = \{\pi ^{(1)},\dots ,\pi ^{(K)}\}\) defines the mixing proportions.

The posterior probability of each class, given a transformed data point is,

$$\begin{aligned} p(z^{(i)} = k \;\vert \; \hat{{\varvec{x}}}_t^{(i)}) = \frac{p(\hat{{\varvec{x}}}_t^{(i)} \;\vert \; z^{(i)} = k) P(z^{(i)} = k)}{\sum _{k=1}^K p(\hat{{\varvec{x}}}_t^{(i)} \;\vert \; z^{(i)} = k) P(z^{(i)} = k)}, \end{aligned}$$
$$\begin{aligned} p(z^{(i)} = k \;\vert \; \hat{{\varvec{x}}}_t^{(i)}) = \frac{\pi ^{(k)} \mathcal {N}(H{\varvec{x}}_t^{(i)} \;\vert \; {\varvec{\mu }}_s^{(k)}, \Sigma _s^{(k)})}{\sum _{k=1}^K \pi ^{(k)} \mathcal {N}(H{\varvec{x}}_t^{(i)} \;\vert \; {\varvec{\mu }}_s^{(k)}, \Sigma _s^{(k)})}, \end{aligned}$$

where \(p(z^{(i)} = k \;\vert \; H{\varvec{x}}_t^{(i)}) = r^{(i,k)}\), aptly named the responsibility matrix. A maximum likelihood estimate of the labels \(\hat{{\varvec{y}}}_t\) can be obtained by selecting the class with the highest probability in the responsibility matrix for each observation,

$$\begin{aligned} \hat{y}_t^{(i)} = {\mathop {{{\,\mathrm{arg\,max}\,}}}\limits _{k\in {1,\dots ,K}}} \left[ p(z^{(i)} = k \;\vert \; H{\varvec{x}}_t^{(i)})\right] . \end{aligned}$$
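Although the paper's accompanying implementation is in MATLAB, the E-step quantities above are straightforward to sketch. The following NumPy snippet (with illustrative function and variable names of our own, not taken from the authors' code) computes the responsibility matrix and the maximum-likelihood labels for a fixed projection H:

```python
import numpy as np

def log_gauss(X, mu, Sigma):
    """Column-wise log-density of X (D x N) under N(mu, Sigma)."""
    D = X.shape[0]
    diff = X - mu[:, None]
    L = np.linalg.cholesky(Sigma)
    sol = np.linalg.solve(L, diff)           # whitened residuals
    quad = np.sum(sol ** 2, axis=0)          # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (D * np.log(2.0 * np.pi) + logdet + quad)

def e_step(H, Xt, pis, mus, Sigmas):
    """Responsibilities r[i, k] = p(z_i = k | H x_i) and MLE labels."""
    Xh = H @ Xt                               # transformed target data
    logp = np.stack([np.log(pk) + log_gauss(Xh, mu, S)
                     for pk, mu, S in zip(pis, mus, Sigmas)], axis=1)
    logp -= logp.max(axis=1, keepdims=True)   # stabilise before exponentiating
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)
    return r, np.argmax(r, axis=1)
```

The shift by the row-wise maximum before exponentiating guards against numerical underflow when components are well separated.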

Parameter inference can be performed for the projection matrix H, as well as the mixing proportions \({\varvec{\pi }}\) of the transformed target data, using a maximum likelihood approach. The log likelihood of the projected target data given the parameter set \({\varvec{\theta }} = \{H,{\varvec{\pi }}\}\) (with the latent variables Z marginalised out) is defined as,

$$\begin{aligned} \log p(\hat{X}_t\; \vert \; {\varvec{\theta }}) = \sum _{i=1}^N \log \sum ^{K}_{k=1} \pi ^{(k)} \mathcal {N}(H{\varvec{x}}_t^{(i)} \;\vert \; {\varvec{\mu }}_s^{(k)}, \Sigma _s^{(k)}), \end{aligned}$$

which can be maximised using Expectation Maximisation (EM).

EM is a maximum likelihood technique for performing inference when missing values cause the marginal likelihood to be intractable [21], as is the case with the DA-GMM, where the latent variables Z are unobserved and their distribution is unknown. EM iterates between an expectation step (E-step) and a maximisation step (M-step) to find parameter estimates that maximise the marginal likelihood. The E-step requires the definition of an auxiliary function \(Q({\varvec{\theta }},{\varvec{\theta }}_0)\), which, using the current parameter estimates \({\varvec{\theta }}_0\), is the expectation of the complete-data log likelihood with respect to the current posterior distribution of the latent variables. The M-step then finds a new set of parameter estimates \({\varvec{\theta }}\) by maximising the auxiliary function with respect to the parameters. The steps are repeated until convergence criteria are met [21].

The auxiliary function (the expected complete data log likelihood) for the DA-GMM is,

$$\begin{aligned} Q({\varvec{\theta }},{\varvec{\theta }}_0) = \mathbb {E}_{Z\vert \hat{X}_t, {\varvec{\theta }}_0}\left[ \log p(\hat{X}_t,Z\; \vert \; {\varvec{\theta }}) \right] , \end{aligned}$$

where the expectation can be rewritten as,

$$\begin{aligned} Q({\varvec{\theta }},{\varvec{\theta }}_0) = \sum _Z p(Z\; \vert \; \hat{X}_t, {\varvec{\theta }}_0) \log p(\hat{X}_t,Z\; \vert \; {\varvec{\theta }}), \end{aligned}$$

which in turn can be simplified using an indicator function \(\mathcal {I}(\cdot )\) for the mixing component for the \(i^{th}\) data point and substituting Eq. (6),

$$\begin{aligned} Q({\varvec{\theta }},{\varvec{\theta }}_0) = \sum _{i=1}^N p(z^{(i)}\; \vert \; \hat{{\varvec{x}}}_t^{(i)}, {\varvec{\theta }}_0) \sum _{k=1}^K \mathcal {I}(z^{(i)}=k) \log \left[ \pi ^{(k)} \mathcal {N}(H{\varvec{x}}_t^{(i)} \;\vert \; {\varvec{\mu }}_s^{(k)}, \Sigma _s^{(k)})\right] , \end{aligned}$$

where \(p(z^{(i)}\; \vert \; \hat{{\varvec{x}}}_t^{(i)}, {\varvec{\theta }}_0)\, \mathcal {I}(z^{(i)}=k) = p(z^{(i)} = k \;\vert \; \hat{{\varvec{x}}}_t^{(i)}) = r^{(i,k)}\). This expression simplifies to,

$$\begin{aligned} Q({\varvec{\theta }},{\varvec{\theta }}_0) = \sum _{i=1}^N \sum _{k=1}^K r^{(i,k)} \left( \log \left[ \pi ^{(k)}\right] + \log \left[ \mathcal {N}(H{\varvec{x}}_t^{(i)} \;\vert \; {\varvec{\mu }}_s^{(k)}, \Sigma _s^{(k)})\right] \right) . \end{aligned}$$

The auxiliary function can be maximised with respect to the mixing proportions \({\varvec{\pi }}\) analytically, by adding a Lagrange multiplier (to ensure that \(\sum ^K_{k=1} \pi ^{(k)} = 1\)) and setting the partial derivative to zero,

$$\begin{aligned} \pi ^{(k)} = \frac{1}{N} \sum _{i=1}^N r^{(i,k)}. \end{aligned}$$

The maximisation step with respect to the projection matrix H only requires maximising the part of the auxiliary function dependent on H,

$$\begin{aligned} \max _H \sum _{i=1}^N \sum _{k=1}^K r^{(i,k)} \log \left[ \mathcal {N}(H{\varvec{x}}_t^{(i)} \;\vert \; {\varvec{\mu }}_s^{(k)}, \Sigma _s^{(k)})\right] , \end{aligned}$$

which can be expanded to,

$$\begin{aligned} \max _H \sum _{i=1}^N \sum _{k=1}^K r^{(i,k)} \left( \log \left[ (2\pi )^{-\frac{D}{2}} \vert \Sigma _s^{(k)}\vert ^{-\frac{1}{2}}\right] - \frac{1}{2}(H{\varvec{x}}_t^{(i)} - {\varvec{\mu }}_s^{(k)})^{T}(\Sigma _s^{(k)})^{-1}(H{\varvec{x}}_t^{(i)} - {\varvec{\mu }}_s^{(k)}) \right) , \end{aligned}$$

where the constants with respect to H can be omitted, leading to the minimisation of a weighted quadratic error,

$$\begin{aligned} \min _H \sum _{i=1}^N \sum _{k=1}^K r^{(i,k)} (H{\varvec{x}}_t^{(i)} - {\varvec{\mu }}_s^{(k)})^{T}(\Sigma _s^{(k)})^{-1}(H{\varvec{x}}_t^{(i)} - {\varvec{\mu }}_s^{(k)}) =: \mathbb {E}_Q(H). \end{aligned}$$

It can be shown that \(\mathbb {E}_Q(H)\) is convex [18], and the gradient is,

$$\begin{aligned} \nabla _H \mathbb {E}_Q(H) = 2 \sum _{k=1}^K (\Sigma _s^{(k)})^{-1} \sum _{i=1}^N r^{(i,k)} (H{\varvec{x}}_t^{(i)} - {\varvec{\mu }}_s^{(k)}) ({\varvec{x}}_t^{(i)})^{T}, \end{aligned}$$

meaning that the maximisation step with respect to H can be solved using a gradient-based optimiser. In the case where \(\Sigma _s^{(k)} = \Sigma \;\; \forall k\), the projection matrix H can be found in closed form as,

$$\begin{aligned} H = M R X_t^T (X_t X_t^T)^{-1}, \end{aligned}$$

where \(M = \left[ {\varvec{\mu }}_s^{(1)},\dots ,{\varvec{\mu }}_s^{(K)}\right] \in \mathbb {R}^{D \times K}\) and \(R \in \mathbb {R}^{K \times N}\), with \(R(k,i) = r^{(i,k)}\). This result is similar to a least-squares solution for a multiple-output linear regression, and, therefore, opens up the possibility of extending the mapping to a nonlinear one and of adding a prior over H.
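As a sketch (in Python/NumPy rather than the accompanying MATLAB code, with function names of our own), the objective \(\mathbb {E}_Q(H)\), its gradient, and the shared-covariance closed form might be implemented as follows; when all components share one covariance, the closed form should drive the gradient to (numerically) zero:

```python
import numpy as np

def EQ(H, Xt, r, mus, Sig_invs):
    """Weighted quadratic error E_Q(H) = sum_ik r_ik (Hx_i - mu_k)' S_k^-1 (Hx_i - mu_k)."""
    val = 0.0
    for k, (mu, Si) in enumerate(zip(mus, Sig_invs)):
        diff = H @ Xt - mu[:, None]                 # D x N residuals to component k
        val += np.sum(r[:, k] * np.einsum('dn,de,en->n', diff, Si, diff))
    return val

def grad_EQ(H, Xt, r, mus, Sig_invs):
    """Gradient: 2 sum_k S_k^-1 sum_i r_ik (Hx_i - mu_k) x_i'."""
    G = np.zeros_like(H)
    for k, (mu, Si) in enumerate(zip(mus, Sig_invs)):
        diff = H @ Xt - mu[:, None]
        G += 2.0 * Si @ ((r[:, k] * diff) @ Xt.T)   # columns weighted by r_ik
    return G

def closed_form_H(Xt, R, M):
    """Closed-form minimiser when every component shares one covariance.

    Xt : D x N data, R : K x N responsibilities (R[k, i] = r_ik), M : D x K means.
    """
    return M @ R @ Xt.T @ np.linalg.inv(Xt @ Xt.T)
```

Cross-checking the analytical gradient against finite differences is a cheap sanity test before handing `grad_EQ` to a gradient-based optimiser.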

The full procedure is shown in Algorithm 1. The linear projection matrix needs to be initialised, either randomly or using prior knowledge about the mapping. The number of classes K is defined completely by the labels if the source GMM is supervised, or should be selected via a model-selection approach if an unsupervised GMM is utilised.

[Algorithm 1: the DA-GMM expectation-maximisation procedure]
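To make the EM loop concrete, the following is a minimal, illustrative NumPy reimplementation under the shared-covariance assumption (so the closed-form M-step for H applies); the function name, the identity initialisation, and the iteration count are our assumptions, not the authors' implementation:

```python
import numpy as np

def da_gmm_em(Xt, pis, mus, Sigma, n_iter=50):
    """Illustrative DA-GMM EM loop with a single shared source covariance.

    Xt    : D x N unlabelled target data.
    pis   : length-K initial mixing proportions.
    mus   : list of K source component means (fixed).
    Sigma : shared source covariance (fixed), so H has a closed-form update.
    """
    D, N = Xt.shape
    K = len(pis)
    Si = np.linalg.inv(Sigma)
    M = np.column_stack(mus)                    # D x K matrix of source means
    H = np.eye(D)                               # illustrative initialisation
    for _ in range(n_iter):
        # E-step: responsibilities under the current projection
        Xh = H @ Xt
        logp = np.empty((N, K))
        for k in range(K):
            d = Xh - mus[k][:, None]
            logp[:, k] = np.log(pis[k]) - 0.5 * np.einsum('dn,de,en->n', d, Si, d)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: mixing proportions and closed-form projection update
        pis = r.mean(axis=0)
        H = M @ r.T @ Xt.T @ np.linalg.inv(Xt @ Xt.T)
    return H, pis, np.argmax(r, axis=1)
```

On a toy two-component rotation problem of the kind considered in Sect. 3, this loop recovers (approximately) the inverse of the rotation applied to the target data.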

3 Artificial dataset

To demonstrate the capability of the linear transform H in the DA-GMM, an artificial dataset is utilised to evaluate the method's ability to infer linear rotation matrices. The dataset consists of four two-dimensional Gaussian components, as given in Table 1. The source dataset was obtained by directly sampling from these distributions, \(X_s = X\), whereas the target data were created by applying a rotation to data sampled from the same distributions, i.e. \(X_t = HX\), where H is parametrised by some angle \(\alpha\), as,

$$\begin{aligned} H(\alpha ) = \begin{bmatrix} \cos (\alpha ) &{} -\sin (\alpha ) \\ \sin (\alpha ) &{} \cos (\alpha ) \end{bmatrix}. \end{aligned}$$
Table 1 Gaussian components of the artificial dataset and number of training data points in the source and target domains (\(N_s\) and \(N_t\), respectively). Gaussian distributions are parametrised by a mean \(\mu\) and covariance \(\Sigma\), \(\mathcal {N}(\mu ,\Sigma )\)

The target test data are obtained from 1000 observations of each class transformed by the same rotation matrix (i.e. \(N_t^{test} = 4000\)).

Figure 1 shows the testing accuracy on the target domain dataset against the rotation in degrees (where \(\alpha \in \{5,10,\dots ,355\}\)). The results shown correspond to the greatest accuracy from 25 random initialisations of the EM algorithm (with the training and testing data remaining the same for all 25 repeats), reducing the risk of converging to a poor local optimum. Despite the stochastic nature of the random initial conditions, the test accuracies are consistent across rotations (all \(\approx 100\%\)). This is clear evidence that the DA-GMM approach is robust to rotations in two dimensions. As an illustration of the approach, Fig. 2 depicts the source and target datasets, along with the transformed target dataset from the DA-GMM, for a rotation of 80\(^\circ\) (i.e. \(X_t = H(80)X\)).

Fig. 1

Comparison of testing accuracies (—) on the target dataset for different rotations of the artificial dataset

Fig. 2

An example of the artificial dataset when \(\alpha =80^\circ\). Panels (a) and (b) present the source data \(X_s\) (\(\cdot\)), and the untransformed target data \(X_t\) (\(\triangleleft\)), respectively. Panels (c) and (d) show the DA-GMM predictions on the transformed target \(\hat{X}_t\) (\(\triangle\)) training and testing data, respectively (where Panel (c), includes the source data for reference). Panels (a), (c) and (d) include visualisations of the inferred source Gaussian mixture model clusters denoted by \(\mu\) \((+)\) and \(2\Sigma\) (—)

4 Case study: shear-building structures

Within the field of PBSHM, a challenging problem is to identify mappings between members of heterogeneous populations. This difficulty arises because, as structures become increasingly different, their feature spaces become more dissimilar, typically making the mapping more complex. This case study considers a population of two numerical four degree-of-freedom shear-building structures, constructed from different material properties (one aluminium and the other steel) and geometries.

The numerical simulations are obtained from four degree-of-freedom lumped-mass models, where the outputs of the model were four damped natural frequencies \(\{\omega _1,\omega _2,\omega _3,\omega _4\}\) (calculated in a similar way to [3, 12]). The mass of each floor was parametrised by a length \(l_f\), width \(w_f\), thickness \(t_f\) and density \(\rho\). The stiffness between each floor is provided by four rectangular cross-sectioned cantilever beams, parametrised by length \(l_b\), width \(w_b\), thickness \(t_b\) and Young's modulus E. Damage is introduced to the beams via an open crack, using the stiffness-reduction model proposed by Christides and Barr [22]. For each structure, five damage scenarios were considered, based on a localisation problem: the undamaged condition \(y=0\), and an open crack on one of the beams at each of the four floors (\(y=\{1,2,3,4\}\), numbered from the ground floor upwards). For each location, the crack had a length of 5% of the beam width and was located at 10% of the way up the beam's length.

To introduce variability, the material properties \(\{\rho ,E\}\) and damping coefficients c were defined as random variables, where each output observation was obtained by a random draw from an underlying distribution. The properties for the two structures are shown in Table 2. The labelled training data for Structure One consisted of 100 observations of \(y=0\), and 75 observations for each of the four damage classes \(y=\{1,2,3,4\}\) (i.e. \(N_s = 400\)). The unlabelled training data for Structure Two comprised 50 observations of \(y=0\) and 30 observations of each damage class \(y=\{1,2,3,4\}\) (i.e. \(N_t = 170\)). The level of class imbalance was chosen to reflect SHM problems, where typically more normal-condition data (\(y=0\)) are available than for each damage class individually. For both structures, the test datasets comprised 1000 observations of each class (\(N_s^{test} = N_t^{test} = 5000\)).

Table 2 Geometric and material properties of the shear-building structures (1 and 2). Gaussian distributions are parametrised by a mean \(\mu\) and variance \(\sigma ^2\), \(\mathcal {N}(\mu ,\sigma ^2)\), and Gamma distributions by a shape \(\kappa\) and scale \(\theta\), \(\mathcal {G}(\kappa ,\theta )\)

The PBSHM scenario was to infer a mapping from Structure One to Structure Two, such that the labelled data in Structure One can aid in classifying the unlabelled data from Structure Two, i.e. Structure One is the source domain and Structure Two the target. To aid visualisation, the features in this analysis were the first two principal components of the four damped natural frequencies (calculated from the training data from each domain individually), i.e. \(X_s \in \mathbb {R}^{2\times N_s}\) and \(X_t \in \mathbb {R}^{2 \times N_t}\), for structures One and Two, respectively. The feature spaces are visualised in Fig. 3, where it can be seen that the source and target domains are very different and there is a large amount of overlap between classes \(y=\{1,2,4\}\) in the target domain. It is hoped that the DA-GMM will aid both in labelling the target domain and improving separability between these classes.
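The feature-extraction step here is standard PCA, fitted separately on each domain's training data; a minimal NumPy sketch (our own, not taken from the paper's code) is:

```python
import numpy as np

def pca_features(W, n_components=2):
    """Project features onto their first principal components.

    W : D0 x N array (here, four damped natural frequencies per observation),
    fitted on the training data of a single domain. Returns the
    n_components x N feature matrix and the projection basis.
    """
    Wc = W - W.mean(axis=1, keepdims=True)          # centre each feature
    U, s, _ = np.linalg.svd(Wc, full_matrices=False)
    basis = U[:, :n_components]                     # principal directions
    return basis.T @ Wc, basis
```

Fitting the basis per domain, as described in the text, means the two-dimensional source and target features live in separately-derived coordinate systems, which is precisely why a mapping between them must then be inferred.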

Fig. 3

Comparison of Gaussian mixture models and their predictions for the shear-building case study on the training data; \(y=0\) is the normal condition, and \(y=\{1,2,3,4\}\) denote damage at each floor. Source data \(X_s\) \((\cdot )\), target data \(X_t\) \((\lhd )\) and transformed target data \(\hat{X}_t\) \((\triangle )\) are depicted against the inferred Gaussian mixture model clusters denoted by \(\mu\) \((+)\) and \(2\Sigma\) (—). Panel (a) shows the source model applied to the target data and Panel (b) displays the DA-GMM predictions. Panels (c) and (d) show the supervised and unsupervised Gaussian mixture models inferred from the target data (where the unsupervised labels are determined by the proximity of the inferred clusters to the supervised model)

Five scenarios were examined, demonstrating the comparative performance of the DA-GMM approach against conventional maximum likelihood GMMs, with their accuracies compared in Table 3. The first scenario (a prerequisite for the DA-GMM) was training the supervised GMM on the source domain, where a testing accuracy of 100% was achieved. The second scenario applied the source domain GMM directly to the untransformed target data, as seen in Figure 3a, where a testing classification accuracy of 44.0% was achieved. This result shows that there is a need for performing domain adaptation, as there is domain shift between the source and target datasets. The third scenario, shown in Figure 3b, is where the target data have been transformed using the DA-GMM. The target data have been mapped onto the source GMM, improving the testing accuracy to 81.5%. It is noted that the misclassification that occurs is because of the overlap in the untransformed target data and its effect on the mapping.

The final two scenarios consider the target data alone (i.e. there is no attempt to transfer knowledge), with Figure 3c showing the results of a fully supervised GMM trained on a labelled target dataset, and Figure 3d displaying the unsupervised GMMFootnote 2 results (where classes have been assigned by the proximity of the inferred unsupervised clusters to the supervised model). These results show that with perfect label information, a testing accuracy of 95.6% is achieved, and with no label information, the best accuracy one could achieve (assuming classes can be assigned with inspection knowledge) would be 76.8%. This shows that the DA-GMM provides better classification accuracy than an unsupervised model on the target domain (even if labels could be assigned), and is a robust way of labelling the target domain from source domain observations.

Table 3 Training and testing accuracies for the different Gaussian mixture model scenarios. GMM\(_a\)(\(X_b\)) denotes a supervised Gaussian mixture model trained on domain a and applied to dataset \(X_b\), where the superscript \(^{un}\) denotes an unsupervised mixture model. It is noted that the unsupervised model labels are determined by the proximity of the inferred clusters to the supervised model, and the accuracies are used here as a reference measure

5 Case study: Z24 and KW51 bridges

Inferring mappings between structures such that label information can be shared is an important aspect of PBSHM. By linking structures via mappings, any labelled data obtained for one structure can be applied to the rest of the population. In this section, domain adaptation is performed, such that a mapping is obtained in a heterogeneous population of two partially-labelledFootnote 3 bridge datasets, from the Z24 [19] and KW51 bridges [20]. By learning a mapping between the two bridge datasets, any future label information obtained for one structure can be directly applied to the other. This is particularly important for managers of bridge infrastructure, as any observation of damage, from any member of the population, can be used in diagnosing that health-state for any bridge in the population (given that a mapping can be inferred).

5.1 Z24 and KW51 datasets

The Z24 bridge dataset has been well-studied within the literature, with numerous analyses managing to identify the key events in the dataset [23,24,25,26,27,28,29,30,31,32]. The Z24 bridge, located in the canton of Bern near Solothurn, Switzerland, was a concrete highway bridge that was used as an experimental structure for an SHM campaign before its demolition in 1998, as part of the SIMCES project [19]. The monitoring campaign occurred during the year before demolition, in which a range of environmental data, as well as the acceleration response at 16 locations, were measured. The acceleration responses were processed using operational modal analysis (OMA), such that the first four natural frequencies of the structure were obtained. Damage was introduced incrementally into the bridge, with relatively small-scale damage beginning on the 10\(^{th}\) August 1998, when the pier was incrementally lowered by a few centimetres; more substantial damage occurred after the 31\(^{st}\) August 1998, beginning with the failure of the concrete hinge. For a complete description of the benchmark dataset, the reader is referred to [19]. It is noted that model-updating approaches [23, 26, 27] have been able to detect when the pier was lowered by 80–95 mm, which occurred on the 17–18\(^{th}\) August. In contrast, several data-based approaches [24, 28,29,30,31,32] have detected the onset of the smallest damage introduced to the Z24 bridge, a lowering of the pier by 20 mm on the 10\(^{th}\) August, with several even detecting when the installation equipment was brought onto the bridge (9\(^{th}\) August).

The KW51 bridge is a steel bowstring railway bridge in Leuven, Belgium. A 15-month monitoring campaign occurred between 2018 and 2019, in which the acceleration response, the strains in the deck and the rails, the displacement at the bearings, as well as environmental data, were all recorded [20]. The acceleration responses were processed using OMA to obtain the first 14 natural frequencies of the structure. During the monitoring campaign, every diagonal member was retrofitted with a steel box to strengthen the design of the bridge, with details of the retrofit specified in [20]. The bridge condition before the retrofit was measured from the 2\(^{nd}\) October 2018 until the 15\(^{th}\) May 2019, at which point the retrofitting process was carried out. The retrofit was completed on the 27\(^{th}\) September 2019. Novelty detection has been successfully performed on this dataset using robust principal component analysis (PCA) and linear regression methods [33]. For a complete overview of the dataset, the reader is referred to [20].

Although the two bridges are very different in design (forming a heterogeneous population), there are similarities in their modal responsesFootnote 4. Despite different absolute values in natural frequencies, the first and third natural frequencies of the Z24 have correspondence with the tenth and twelfth natural frequencies of the KW51; both represent vertical bending modes of the deck, with effectively the same nodal and anti-nodal pattern (see [19] and [20] for a visualisation of the Z24 and KW51 mode shapes, respectively). In addition, both bridges undergo a stiffening effect caused by cold temperature conditions. Figures 4 and 5 show the two datasets against ambient temperature, where it can be seen that the below-freezing conditions lead to a ‘stiffening-effect’ on the natural frequencies; in the Z24, the asphalt stiffens, compared to freezing of the rail ballast in the KW51. It is noted that, although these low-temperature conditions correlate to a below-freezing condition, there are no ‘ground truth’ labels available for these conditions in either dataset. Given the similarities in modal response and the existence of at least two normal conditions (normal ambient and low temperature), the datasets form a transfer learning problem which can be solved using the DA-GMM.

Fig. 4

Z24 dataset. Ambient temperature, first (\(\omega _1\)) and third (\(\omega _3\)) natural frequencies. Horizontal line indicates 0 °C and the vertical black line the training and testing data split, with the red solid line denoting the onset of damage

Fig. 5

KW51 dataset. Ambient temperature, tenth (\(\omega _{10}\)) and twelfth (\(\omega _{12}\)) natural frequencies. Horizontal line indicates \(0^\circ\)C and the vertical black line the training and testing data split, with the red solid lines denoting the start and end of the retrofit period

The datasets were divided into training and testing data for the DA-GMM mapping. The first 99 days of monitoring data from the Z24 are used as training data, as this time period covers both normal condition behaviours; the remaining data, including when damage occurred, are used to test the mapping. The KW51 training data were formed from the first 1500 data points covering the two temperature conditions, with the remaining pre-retrofit and post-retrofit data being used as testing data for the mapping. The training and testing divisions for both datasets are indicated on Figs. 4 and 5.

5.2 Statistic alignment and the unsupervised domain-adapted Gaussian mixture model

The aim of the analysis in this section was to find a mapping, using the DA-GMM approach, that aligns the data from these two bridges for the two environmental conditions in a completely unsupervised manner (as no ‘ground truth’ labels are known for either dataset for these conditions). Although completely unlabelled, the Z24 is considered as the source dataset (i.e. it forms the dataset on which the GMM is inferred) and the KW51 is considered as the target domain (i.e. the KW51 is mapped onto the Z24 dataset). This change means that line one of Algorithm 1 is modified such that the source means and covariances are inferred via an unsupervised GMM, fitted with an EM maximum-likelihood approach (i.e. no source labels \({\varvec{y}}_s\) are required).

Before applying the DA-GMM, pre-processing is performed to aid transfer, in a similar manner to the normalisation performed as best practice in machine learning algorithms. The pre-processing performs statistic alignment [7], a process of aligning the lower-order statistics of the feature spaces. In this case, the statistics of the first 100 data points (\(N=100\)) are aligned to the target domain, with the aim of centring both datasets around the normal condition, i.e.,

$$\begin{aligned} X_t = \frac{{\varvec{\omega }}_t - \mu _{1:N}({\varvec{\omega }}_t)}{\sigma _{1:N}({\varvec{\omega }}_t)}, \end{aligned}$$
$$\begin{aligned} X_s = \left( \frac{{\varvec{\omega }}_s - \mu _{1:N}({\varvec{\omega }}_s)}{\sigma _{1:N}({\varvec{\omega }}_s)} \right) \sigma _{1:N}(X_t) + \mu _{1:N} (X_t), \end{aligned}$$

where \({\varvec{\omega }}_s\) and \({\varvec{\omega }}_t\) are the set of natural frequencies for each bridge, \(\mu _{1:N}(\cdot )\) and \(\sigma _{1:N}(\cdot )\) find the mean and standard deviation of the first N data points of their arguments, and \(X_s\) and \(X_t\) are the source and target pre-processed features. The statistically-aligned features are presented in Fig. 6, where it can be seen that the offset between the source and target has been removed (removing the need for \(X_t\) to be augmented by a matrix of ones), and the problem has been simplified to one of learning a rotation matrix between the source and target datasets. The pre-processing steps mean that the DA-GMM algorithm can be modified, with the parameter set becoming \({\varvec{\theta }} = \{\alpha ,{\varvec{\pi }}\}\), where the linear projection matrix is parametrised as in Eq. (17) and \(\alpha\) is the angle of rotation; this reduces the number of parameters that need to be inferred by the DA-GMM model.
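A minimal sketch of the statistic-alignment pre-processing (Python/NumPy; the function name is ours) follows the two equations above. Note that, because \(X_t\) is standardised over its own first N points, the trailing rescale of the source is numerically the identity; it is kept here for fidelity to the equations:

```python
import numpy as np

def statistic_align(omega_s, omega_t, N=100):
    """Statistic alignment of source/target frequency features (D x T arrays).

    The target is standardised by the mean and standard deviation of its first
    N points; the source is standardised likewise and then rescaled onto the
    target's early statistics, following the equations in the text.
    """
    mu_t = omega_t[:, :N].mean(axis=1, keepdims=True)
    sd_t = omega_t[:, :N].std(axis=1, keepdims=True)
    Xt = (omega_t - mu_t) / sd_t
    mu_s = omega_s[:, :N].mean(axis=1, keepdims=True)
    sd_s = omega_s[:, :N].std(axis=1, keepdims=True)
    # After standardisation, Xt's first-N mean/std are 0/1, so this rescale
    # is numerically the identity; kept for fidelity to the equations.
    Xs = ((omega_s - mu_s) / sd_s) * Xt[:, :N].std(axis=1, keepdims=True) \
         + Xt[:, :N].mean(axis=1, keepdims=True)
    return Xs, Xt
```

After alignment, both feature sets have (approximately) zero mean and unit standard deviation over their first N points, centring each dataset on its normal condition.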

Fig. 6

Statistic-aligned Z24 (source) \(X_s\), and KW51 (target) \(X_t\), features. Dashed line indicates data points used to statistically align the data, the solid black lines indicate training and testing data splits and the solid red lines denote the onset of damage and the retrofit states, respectively

Figures 4 and 5 indicate that there is class imbalance between the ambient and low temperature classes. This prior assumption, that there will be fewer data points in the low temperature class when compared to the ambient temperature class, can be encoded into the DA-GMM to learn an optimal mapping that equally prioritises both classes. In this case study, the maximisation step is modified such that Eq. (14) becomes,

$$\begin{aligned} \mathbb {E}_Q(H) = \min _H \sum _{k=1}^K w_k \sum _{i=1}^N r^{(i,k)} (H{\varvec{x}}_t^{(i)} - {\varvec{\mu }}_s^{(k)})^{T}(\Sigma _s^{(k)})^{-1}(H{\varvec{x}}_t^{(i)} - {\varvec{\mu }}_s^{(k)}), \end{aligned}$$

where \(w_k\) is a prior imbalance weight for each class. In this case study, the values of the weight vector \({\varvec{w}}\) were assigned based on the proportion of the target training dataset below 0\(^\circ\)C, with the ambient condition down-weighted and the low temperature condition up-weighted; \({\varvec{w}} = \{0.055,\; 0.945\}\) for the ambient and low temperature conditions, respectively.
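The class-weighted objective in the modified maximisation step can be illustrated with a short sketch. This is a simplified stand-in, not the paper's algorithm: the mapping \(H\) is parametrised as a two-dimensional rotation by \(\alpha\) (as in the pre-processed case), the responsibilities `r`, source moments `mu_s`/`Sigma_s`, and the grid-search optimiser are all illustrative assumptions:

```python
import numpy as np

def rotation(alpha):
    # two-dimensional rotation matrix parametrising the mapping H
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, -s], [s, c]])

def weighted_objective(alpha, X_t, r, mu_s, Sigma_s, w):
    """Class-weighted sum of responsibility-weighted Mahalanobis
    distances between the projected target data and each source
    component (the objective of the modified M-step)."""
    H = rotation(alpha)
    Z = X_t @ H.T                       # projected target data, H x_t
    total = 0.0
    for k, wk in enumerate(w):
        d = Z - mu_s[k]                 # residual to source mean
        P = np.linalg.inv(Sigma_s[k])   # source component precision
        total += wk * np.sum(r[:, k] * np.einsum('ij,jk,ik->i', d, P, d))
    return total

def best_angle(X_t, r, mu_s, Sigma_s, w, n_grid=360):
    # a coarse grid search over the angle stands in for the
    # gradient-based optimisation of the full method
    grid = np.linspace(0.0, 2 * np.pi, n_grid, endpoint=False)
    return min(grid, key=lambda a: weighted_objective(a, X_t, r, mu_s, Sigma_s, w))
```

Up-weighting the minority (low temperature) class in this sum prevents the mapping from being dominated by the abundant ambient data.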

5.3 Domain adaptation results

A two-class unsupervised DA-GMM was inferred on the Z24 (unlabelled source) and KW51 (unlabelled target) statistically-aligned feature data. The inferred projection and source GMM are presented in Fig. 7a, with the natural frequencies of the bridges labelled according to the predictive classes of the GMM displayed in Fig. 7b and c. It can be seen in Fig. 7a that the target data have been well aligned with the source data, and that the inferred feature space retains physical meaning. The ambient normal condition is centred around the origin and can be viewed as the ‘baseline’ relative natural frequencies, with the low temperature effect causing an increase in the feature values, corresponding to stiffening behaviour. The GMM has identified the transition from the ambient to low temperature classes for both bridges, as shown in Fig. 7b and c, meaning that one GMM can be used to diagnose behaviour on both bridges. This result demonstrates the power of the DA-GMM: two bridges of different design can be mapped onto a single feature space where classification can be performed for the complete population.

Fig. 7

Unsupervised domain-adapted Gaussian mixture model predictions of the Z24 and KW51 bridge datasets. Panel (a) is a comparison of the two features for each bridge against the probability of being in either class, with the inferred (source) unsupervised Gaussian mixture model for reference (\(\mu\) \((+)\) and \(2\Sigma\) (—)). The Z24 is denoted \(X_s\) (\(\cdot\)) and the KW51 \(\hat{X}_t\) (\(\triangle\)). Panels (b) and (c) are the Z24 (\(\omega _1\) and \(\omega _3\)) and KW51 (\(\omega _{10}\) and \(\omega _{12}\)) natural frequencies against sample point

Once the mapping has been inferred, any future data can be projected onto the harmonised feature space. The testing data for the Z24 and KW51 were mapped into this space and an unsupervised GMM inferred using maximum-likelihood EM. Given that the number of classes is unknown, model selection was performed. The number of components in the GMMs was varied from two to nine, and the Bayesian information criterion (BIC) was used to select the appropriate model (given the criterion’s ability to penalise model complexity along with assessing the model’s fit). The Bayesian information criterion is,

$$\begin{aligned} BIC = m \ln {N} - 2 \ln {\hat{L}}, \end{aligned}$$

where m is the number of parameters in the model, N the number of data points used in training and \(\hat{L}\) the estimate of the complete data likelihood of the unsupervised GMM. Figure 8 presents the BIC for each of the models considered, where ten repeats were performed to account for the variability in the EM algorithm arising from random initialisation of the parameters. A seven-component mixture model produced the lowest BIC and, therefore, was selected as the most appropriate model. It is noted that analysis using an infinite Gaussian mixture model (also known as a Dirichlet process Gaussian mixture model) on the complete four-dimensional Z24 dataset automatically selected seven distinct Gaussian clusters [30], highlighting that a seven-component model is appropriate for explaining the joint Z24 and KW51 datasets on the aligned two-dimensional feature space. The seven-component model with the lowest BIC out of the ten repeats was used to analyse the bridge datasets.

Fig. 8

Comparison of the BIC for unsupervised GMMs, with the number of components ranging from two to nine. The GMMs were trained on the joint Z24 and KW51 datasets in the transformed space. The bars denote the minimum BIC over ten random initialisations, with the error bar denoting the maximum estimated BIC

The predictions from the unsupervised GMM are displayed in Fig. 9. The results indicate that, once a mapping is inferred, both bridges can be diagnosed together, with class information being shared between both bridges. Fig. 9b and c show that the key damage and retrofit states have been well identified, with the GMM identifying the onset of the smallest damage extent on the Z24, a lowering of the pier by 20 mm on the 10\(^{th}\) August 1998 (red line). This result is a significant achievement, given that the model has been inferred on a reduced dataset (only the first and third natural frequencies), and provides state-of-the-art performance when compared to existing methods applied to the Z24 in the literature [23,24,25,26,27,28,29,30,31,32]. As found in previous analysis of the Z24 dataset using an infinite Gaussian mixture model [30], the Z24 ambient normal condition is non-Gaussian and can be captured by two additional components, Classes Five and Six. These Gaussian clusters show that the normal ambient condition (Class Four) starts to rotate and drift as the bridge experiences warmer environmental conditions (see Fig. 4), which are captured by Classes Five and Six. In addition, the low temperature condition can be described by more than one Gaussian component [30], Classes Two and Three in this model. Finally, it is noted that the feature space inferred by the DA-GMM has retained physical meaning, something that is particularly useful in avoiding problems associated with negative transfer: the damage class shows a softening effect away from the origin, and the retrofit shows a new stiffened behaviour.

Fig. 9

Unsupervised seven-component Gaussian mixture model predictions on the transformed Z24 and KW51 datasets. Panel (a) is a comparison of the two features for each bridge against the probability of being in one of seven classes, with the inferred (source and target) unsupervised Gaussian mixture model for reference (\(\mu\) \((+)\) and \(2\Sigma\) (—)). The Z24 is denoted \(X_s\) (\(\cdot\)) and the KW51 \(\hat{X}_t\) (\(\triangle\)). Panels (b) and (c) are the Z24 (\(\omega _1\) and \(\omega _3\)) and KW51 (\(\omega _{10}\) and \(\omega _{12}\)) natural frequencies against sample point. The black vertical lines in panels (b) and (c) denote the separation of the training data in the DA-GMM mapping, and the red vertical lines indicate the beginning of damage on the Z24 bridge — lowering of the pier by 20 mm, which occurred on 10/08/1998 — and the start of the retrofit state for the KW51 bridge

6 Conclusions

Population-based SHM is a branch of structural health monitoring that seeks to utilise information from across a population of structures to improve diagnostic capabilities. Specifically, PBSHM seeks to transfer label information between members of the population. One method of transferring label information is via domain adaptation, where a mapping can be inferred such that the source and target datasets are harmonised, meaning a classifier trained on one domain will generalise to others in the population. This paper developed and demonstrated the potential of a domain adaptation approach constructed from a Gaussian mixture model formulation, for use in a PBSHM context; namely, the domain-adapted Gaussian mixture model.

The domain-adapted Gaussian mixture model seeks to identify a linear mapping from a target dataset onto a source Gaussian mixture model. The approach was demonstrated on three datasets. The first was a numerical dataset, displaying the method’s ability to infer linear mappings in the form of two-dimensional rotations. The second dataset presented a numerical case study involving a heterogeneous population of two shear-building structures. The approach was shown to outperform naïvely applying the source GMM to the untransformed target dataset, and to be an improvement on an unsupervised GMM trained on the target domain. The final dataset, involving the Z24 and KW51 bridges, showed the approach in a completely unsupervised setting. An unsupervised GMM was inferred from the Z24 dataset, which was utilised in identifying a mapping for the KW51 bridge dataset onto the Z24 dataset. The inferred mapping aligned the two datasets, allowing class information to be shared between the two bridges. The inferred feature space also retained physical meaning, something not possible with many existing domain adaptation technologies. The case study also demonstrated the potential of a PBSHM approach even when no labels are known initially, as any future labels from one structure can be used in diagnosing the others in the population.

The DA-GMM presents an alternative to existing domain adaptation approaches, using a probabilistic framework and inferring a mapping directly from the target to the source domain. Future research should seek to extend the mapping to be nonlinear, whether from a regression viewpoint, via a basis-function approach or the kernel trick, or via normalising flows [34]. In addition, more robust estimates of the projection parameters and mixing proportions could be obtained if a Bayesian model were constructed by introducing priors on both sets of parameters. The model could then be solved using a variational inference approach that may be more robust to initial conditions. The DA-GMM is a promising tool for PBSHM, and could also be used for more traditional concept-drift scenarios on a single structure.