1 Introduction

There has been an exponential growth of data generated from sensors and computing devices connected to the Internet, known as the Internet of Things (IoT). The IoT has pervasively penetrated most aspects of human life, such as civil infrastructure, health-care centres and transportation, where smart services are used to continuously monitor activities in real time. In the field of civil infrastructure, the IoT has provided flexibility and added value to Structural Health Monitoring (SHM) applications to generate actionable insights.

SHM applications aim to provide an automated process for damage detection in complex structures such as bridges, using sensing data collected through multiple networked sensors attached to the structure [1]. This data is then utilized to gain insight into the health of the structure and to make timely and economic decisions about its maintenance. One traditional approach to structural damage detection, known as model-driven, constructs a numerical model of the structure based on finite element analysis [2]. However, a numerical model can be impractical, as it cannot always sensibly capture the behaviour of the real structure. On the other hand, a modern technique known as the data-driven approach has been successfully adopted in SHM and its applications. The data-driven approach uses machine learning algorithms to construct a model from measured data and then makes predictions for new measured responses to detect structural damage. This approach has brought a concrete aspect to IoT in SHM and enabled smart IoT applications.

Bridges are critical to our society as they connect separated locations and allow the flow of people and goods within cities. Bridges are influenced by several factors such as environmental conditions (wind, ambient temperature changes, etc.) and various loads (constant and temporary), which make them prone to damage and potential failure. Any problem with such a structure, from small damage to catastrophic failure, can result in significant economic and life losses. Most structural maintenance approaches are time-based: visual inspections are carried out on a predefined regular schedule, which can be either too early or too late to detect damage. SHM, in contrast, is a condition-based approach; it uses continuously sensed data to provide real-time monitoring so that the necessary maintenance actions can be taken once damage, or an abnormal change in the structural behaviour, occurs.

Two data-driven mechanisms have been commonly used for damage detection in complex structures. The first relies on a centralized machine learning model, which requires transmitting the sensing data generated by the deployed sensors to a central processing unit that assesses the structural condition. The data is then either aggregated or fused into one data structure to capture the correlations and relations between the variables measured by all sensors, and to learn the different aspects of the data (temporal, spatial, and feature) at the same time [3, 4]. This mechanism captures the underlying structural aspects using multi-way sensing data, which has made it successful in achieving good damage detection accuracy. However, such a centralized model has drawbacks, especially for SHM. It is not practical in a real-time monitoring and resource-constrained environment, since sensors collect vibration measurements with accelerometers at high frequency over a time period and therefore produce sequences of thousands of data points that must be transmitted very frequently to the central model. Moreover, wireless transmission costs more energy than local processing, which poses several challenges for battery-powered wireless nodes.

The alternative mechanism, which addresses the restrictions of the centralized model, is to perform the learning from the sensed data in a distributed environment, in the spirit of the promising edge and distributed computing paradigms [5]. The distributed learning approach has a number of benefits, including reducing the intensive data transmission over the network, conserving the energy of sensor nodes, and reducing the workload on the central server on which data processing (learning) occurs.

One challenge that emerges with the distributed learning approach is that the measured data in SHM are often multi-way, highly redundant, and correlated, i.e., many sensors at different locations simultaneously collect data over time. Therefore, an analysis on a single sensor node cannot capture all of these correlations and relationships together in a distributed learning model. In contrast, centralized learning allows learning from such data in multiple modes at the same time. The work of Mehri et al. [5] thoroughly investigated and compared the performance of centralized prediction models with that of single-sensor-node (decentralized) prediction models. Their experimental results show that the centralized model was able to successfully detect the presence of very small damage in a structure and to monitor its progress over time. On the other hand, the distributed learning models constructed for each sensor not only reduced the sensitivity of the models but also failed to monitor the progress of damage in the monitored structure.

Therefore, developing an effective and efficient damage detection model for SHM applications requires information derived from many spatially distributed locations covering various points of the monitored structure. However, consolidating this data in a centralized learning model can be computationally complex and costly. This motivates the development of a more advanced model that retains the benefits of centralized learning without the need to transmit the frequently measured data to a single processing unit. In this study, we propose a novel approach to overcome the above challenges of centralized and decentralized learning mechanisms in SHM. Our approach is based on federated learning (FL) augmented with tensor data fusion for damage detection in complex structures such as bridges.

Our approach is derived from an auto-encoder neural network (ANN) as a damage detection model and employs tensor data analysis to perform data fusion for wired connected sensors in SHM applications, thereby reducing the communication in the FL network. The emerging federated learning (FL) concept was initially proposed by Google for improving security and preventing data leakage in distributed environments [6]. FL allows the central machine learning model to build its learning from a broad range of data sets located at different locations. It aims to train a shared centralized machine learning model using datasets stored and distributed across multiple devices or sensors. In the context of our study, we devise an FL approach that enables multiple IoT devices (sensors) to collaborate on the development of a central learning model without needing to directly share or pool the data measured by the individual sensors. Our approach works efficiently and effectively by sharing only the model coefficients of each client model rather than the whole data collected by all participating sensors at each period of time. The effectiveness of the model learning continues to improve over the course of several training iterations, during which the shared models get exposed to a significantly wider range of data than any single sensor node possesses in-house. Our proposed FL-based approach decentralizes machine learning by removing the need to pool data into a single location and by training the centralized model in multiple iterations at different locations (where the sensors are deployed).

Although FL reduces data transmission and improves data security, it raises significant challenges in dealing with non-IID (non-independent and identically distributed) data and statistical diversity. In fact, Stochastic Gradient Descent (SGD), which is widely used in training models, may lean towards one distribution more than another, leading to a non-generalized model. The work in [7, 8] thoroughly investigated the performance of the global model on clients’ local data and showed how the accuracy decreases as data diversity increases. To address the non-IID challenge in FL, we employ Moreau envelopes [9] as a regularized loss function in the learning process of the clients’ models. The rationale behind the Moreau envelope is to leverage the central model in optimizing the clients’ models not only through Federated Averaging (FedAvg), but also by personalizing each client model with respect to its local data distribution. The contribution of this study is twofold.

  • This paper addresses the problem of non-IID distribution of data in FL for better generalization of the clients’ models.

  • We extensively evaluate the performance of personalized FL augmented with a tensor data fusion method to demonstrate the effectiveness of the proposed method in constructing generalized clients’ models.

The remainder of this paper is organized as follows. Section 2 reviews the related work and Sect. 3.1 provides a short overview of the original auto-encoder neural network method. Our novel online personalized FL with tensor algorithm is described in Sect. 4. Section 5 describes the experimental setup and Sect. 6 presents the experimental results. Finally, we summarize our contributions and conclude the paper in Sect. 7.

2 Related work

Federated learning has gained a lot of interest in recent years and, as a result, has attracted AI researchers as a new and promising machine learning approach [10, 11]. The federated learning approach suits several practical problems and application areas due to its intrinsic settings, where data needs to be decentralized and privacy preserved. However, only a few applications reported in the literature utilize federated learning to construct a global model. For instance, Bonawitz et al. [12] adopted the federated learning setting to develop a system that solves the problem of next-word prediction on mobile devices. On the other hand, several other studies focus on addressing the challenges of training a central model that supports all local data training, especially when the distribution of data across clients is highly non-IID (independent and identically distributed). For example, Hanzely et al. [13] argue that a central model is still too far from the typical usage of an individual client. Similarly, Kairouz et al. [10] discuss the broad challenges and open problems in applying federated learning to different problems.

McMahan et al. [14] proposed the first federated learning algorithm, named FedAvg. It uses local Stochastic Gradient Descent (SGD) updates to build a global model by averaging model coefficients from a subset of clients with non-IID data. Subsequently, Guha et al. [15] proposed a method to learn a global model in a single round of communication between the clients and the central model. Other studies such as [16, 17] address the communication limitations among distributed models in federated learning settings by performing periodic averaging, partial device participation, and quantized message-passing. Meanwhile, other studies such as [18,19,20] suggest multiple local optimization rounds before reporting the learning from the distributed models to the central server.

Several other methods have been proposed to achieve personalization in federated learning. Recently, Smith et al. [21] proposed MOCHA, an algorithm based on a multi-task learning (MTL) framework, to address the non-IID challenge in federated learning. The author of [13] proposes the L2GD algorithm, which combines the optimization of the local and global models. Similarly, Deng et al. [8] developed an adaptive personalized federated learning (APFL) algorithm which mixes each user’s local model with the global model. Another personalization method, FedPer, proposed by [22], divides the layers of a neural network into base and personalized layers; both are trained by the clients, but only the base layers are trained by the server.

In this paper, we address the communication problem that arises in a federated learning environment where several distributed models (clients) communicate with the central model (the server) to report their learning. Our approach is centred around employing a tensor as a data fusion method for wired connected clients in SHM applications. Tensor analysis has been successfully applied in many application domains, such as civil engineering, social network analysis, and computer vision [23,24,25], and has produced promising results. In our proposed approach, we also address the non-IID challenge in federated learning settings by developing a personalized model which is optimized for each distributed data model. To the best of our knowledge, our federated-learning-based approach and its application to the structural health monitoring (SHM) domain constitute new research contributions.

3 Background

3.1 Autoencoder neural network

An autoencoder neural network (ANN) is an unsupervised learning model which has the ability to learn from one-class data. It is an extension of the traditional neural network, which is essentially designed for supervised learning when class labels are given with the training examples. The rationale of an autoencoder is to force the network to learn a lower-dimensional space for the input features and then try to reconstruct the original feature space. In other words, it sets the target values to be approximately equal to the original inputs. In this sense, the main objective of autoencoders is to learn to reproduce input vectors \(\{x^1, x^2, x^3, \dots , x^m\}\) as outputs \(\{\hat{x}^1, \hat{x}^2, \hat{x}^3, \dots , \hat{x}^m\}\). The architecture of the ANN is made up of L layers (\(L=3\) for simplicity), denoted the input, hidden and output layers. Each layer consists of a set of nodes. Layer \(l_1\) is the input layer, which represents the features; these are encoded into the hidden layer \(l_2\) and then decoded into the output layer \(l_3\). The learning process of the ANN starts by successively computing the output of each node in the network. For a node i in layer l, the value \(z^{(l)}_i\) is calculated as the total weighted sum of the input values, which also includes the bias term, using the following equation:

$$\begin{aligned} z_i^{(l)} = \sum _{j=1}^{n}\theta _{ij}^{(l-1)} a_j^{(l-1)} \end{aligned}$$
(1)

where \(\theta\) is the coefficient weight, written as \(\theta _{ij}\) when associated with the connection between node j in layer \(l-1\) and node i in layer l, and \(a_j^{(l-1)}\) is the output value of node j in layer \(l-1\). The resultant value is then passed through an activation function to produce the node output \(a^{(l)}_i\) as follows:

$$\begin{aligned} a_i^{(l)} = \mathcal {F}(z_i^{(l)}) \end{aligned}$$
(2)

where \(\mathcal {F}(\cdot )\) is the activation function. The most common activation functions in the hidden layers are the sigmoid and the hyperbolic tangent. However, in the autoencoder setting, a linear function is used in the output layer since we do not scale the output of the network to a specific interval.

The autoencoder uses the backpropagation algorithm to learn the parameters \(\theta\) such that \(\hat{x} \approx x\). In each iteration of the training process, we calculate the cost error \({\mathcal {L}}(\theta _{ij}^{(l)}; x_i)\) using Eq. (3) and then propagate it backward through the network layers.

$$\begin{aligned} {\mathcal {L}}(\theta _{ij}^{(l)}; x_i) =\frac{1}{M} \sum _{m=1}^{M} \Bigg ( \frac{1}{2} \Vert x^m - \hat{x}^m\Vert ^2 \Bigg ) \end{aligned}$$
(3)

In this setting, we perform a stochastic gradient descent step to update the learning parameters \(\theta _{ij}^{(l)}\). This is done by computing the partial derivative of the cost function \({\mathcal {L}}(\theta _{ij}^{(l)}; x_i)\) (defined in Eq. (3)) as follows:

$$\begin{aligned} \theta _{ij}^{(l)} := \theta _{ij}^{(l)} - \eta \frac{\partial }{\partial \theta _{ij}^{(l)}} {\mathcal {L}}(\theta _{ij}^{(l)}; x_i) \end{aligned}$$
(4)

The complete steps are summarized in Algorithm 1.

Algorithm 1

Once the autoencoder has been successfully trained, the network is able to reconstruct new incoming positive data, while it fails on anomalous data. This is judged based on the reconstruction error (RE), which is measured by applying the Euclidean norm to the difference between the input and the reconstructed output, as shown in Eq. (5).

$$\begin{aligned} \mathrm{{RE}}(x) = \Vert x_i - \hat{x}_i\Vert ^2 \end{aligned}$$
(5)

The measured RE value is used as an anomaly score for a given new sample. Intuitively, examples from a distribution similar to the training data should have a low reconstruction error, whereas anomalies should have a high anomaly score.
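To make the training and scoring procedure above concrete, the following is a minimal NumPy sketch of a single-hidden-layer autoencoder trained with per-sample SGD (Eqs. 1–4) and scored with the reconstruction error of Eq. (5). The layer sizes, learning rate, epoch count and placeholder data are illustrative assumptions rather than the settings used in our experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Autoencoder:
    """Single-hidden-layer autoencoder: sigmoid hidden layer, linear output."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_hidden, n_in))   # encoder weights
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_in, n_hidden))    # decoder weights
        self.b2 = np.zeros(n_in)

    def forward(self, x):
        a1 = sigmoid(self.W1 @ x + self.b1)   # hidden activation (Eqs. 1-2)
        x_hat = self.W2 @ a1 + self.b2        # linear reconstruction
        return a1, x_hat

    def sgd_step(self, x, lr=0.01):
        a1, x_hat = self.forward(x)
        err = x_hat - x                        # gradient of 1/2||x_hat - x||^2 w.r.t. x_hat
        # Backpropagation (Eq. 4): decoder gradients, then encoder gradients.
        dW2 = np.outer(err, a1)
        db2 = err
        delta1 = (self.W2.T @ err) * a1 * (1 - a1)
        dW1 = np.outer(delta1, x)
        db1 = delta1
        self.W2 -= lr * dW2
        self.b2 -= lr * db2
        self.W1 -= lr * dW1
        self.b1 -= lr * db1
        return 0.5 * np.sum(err ** 2)          # per-sample loss term of Eq. (3)

def reconstruction_error(model, x):
    """Anomaly score of Eq. (5): squared distance between input and output."""
    _, x_hat = model.forward(x)
    return np.sum((x - x_hat) ** 2)

# Usage: train on healthy feature vectors only, then score new events.
healthy = np.random.rand(200, 30)              # placeholder healthy features
ae = Autoencoder(n_in=30, n_hidden=10)
for epoch in range(50):
    for x in healthy:
        ae.sgd_step(x, lr=0.05)
scores = [reconstruction_error(ae, x) for x in healthy[:5]]
```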

4 Personalised federated learning framework for SHM

Our approach comprises a number of components, which are detailed in this section. First, we describe how the sensor data is monitored and modelled in a distributed environment (in SHM structures such as bridges). Second, we define how personalized federated learning is formulated in terms of distributed models (clients and a central server) and the communication among them. In our approach, we also propose aggregating the data points measured from multiple sensors into one client node. Third, we present our tensor data fusion method to perform this step. Fourth, we describe our formal model of updating the learning experienced by the clients and the server to detect structural damage.

4.1 Data modelling

Consider a set of sensor nodes S mounted at different locations of a bridge to measure and transmit sensing data related to a structural event. Each sensor node \(\mathcal {S}\) can perform computation on the sensed acceleration data to detect damage in its vicinity. The data points collected for the vibration responses are assumed to form a vector \(X_i = \big [ x_1,x_2,\dots , x_n \big ]\), where n is the total number of data points sensed by a sensor node \(\mathcal {S}\) over a time duration. Due to the lack of available data from the damaged state of the structure in most cases, the acceleration measurements collected from a bridge often correspond to its healthy condition. This data covers various environmental and ambient conditions as well as operational conditions, such as traffic loading. Therefore, in the training phase, we construct a one-class model by extracting statistical features from the raw acceleration data in the healthy condition of the bridge. The trained model is later used to classify raw acceleration measurements from unknown conditions of the bridge as either healthy or damaged. Each healthy training sample \(x_i \in \{X_i\}_{i=1}^n\) is an m-dimensional feature vector \(x_i = (x^1, x^2,\dots , x^m)\), where \(x^j, j= 1, \dots , m\), are the statistical features extracted from the sensed acceleration data in the healthy condition of the bridge. The total number of features m depends on the sampling frequency and sampling window, and the total number of data points n in \(X_i\) depends on the number of events. In our approach, we use an auto-encoder neural network as an anomaly detection method to fit a one-class model using healthy data. We now formulate this model in a federated learning (FL) setting with the help of tensor data analysis as a data fusion method.
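As an illustration of this feature extraction step, the sketch below converts one raw acceleration event into a frequency-domain feature vector by normalizing the signal and keeping the FFT magnitudes, in the spirit of the preprocessing used for the case studies in Sect. 5; the event length and the number of retained frequency bins are illustrative assumptions only.

```python
import numpy as np

def event_features(acceleration, n_features=600):
    """Convert one raw acceleration event into a frequency-domain feature vector.

    acceleration : 1-D array of accelerometer readings for one event.
    n_features   : number of FFT magnitude bins kept as features (assumed).
    """
    a = np.asarray(acceleration, dtype=float)
    a = (a - a.mean()) / a.std()               # zero mean, unit standard deviation
    spectrum = np.abs(np.fft.rfft(a))          # magnitude spectrum
    return spectrum[:n_features]

# X_i collects the n events sensed by one node into an (n, m) feature matrix.
events = [np.random.randn(1200) for _ in range(10)]   # placeholder healthy events
X_i = np.stack([event_features(e) for e in events])
```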

4.2 Problem formulation in personalized federated learning

In the FL setting, a set of clients S is connected to a central server to solve the following problem:

$$\begin{aligned} \underset{w \in \mathbb {R}^d }{\min } f(w) := \frac{1}{S} \sum _{s=1}^{S}f_s(w), \end{aligned}$$
(6)

where \(f_s\) is the expected loss over the data distribution of the sensor node client \(\mathcal {S}\), which is defined as follows:

$$\begin{aligned} f_s(w) := \mathbb {E}_{x_i} [{\mathcal {L}}_s(w;x_i)] \end{aligned}$$
(7)

where \({\mathcal {L}}_s(w;x_i)\) measures the error of the model with parameters w given the input \(x_i\), as defined in Eq. (3).

The stochastic gradient descent (SGD) method solves the problem defined in Eq. (6) locally by repeatedly updating the client model parameters \(\theta\) to minimize \({\mathcal {L}}_s(\theta ; x_i)\). It starts with some initial value \(\theta ^{(t)}\) and then repeatedly performs the following update:

$$\begin{aligned} \theta ^{(t+1)} := \theta ^{(t)} - \eta \frac{\partial {\mathcal {L}}}{\partial \theta } (x_i^{(t)} ,\theta ^{(t)} ). \end{aligned}$$
(8)

In FL, each client performs E epochs at each round to compute the gradient of the loss over its local data and sends the model parameters \(\theta _s^{(t+1)}\) to the server. The central server aggregates these updates and computes the global model parameters by taking the average of the resulting model parameters as follows:

$$\begin{aligned} w^{(t+1)} := \frac{1}{S} \sum _{s=1}^{S} \theta _s^{(t+1)}. \end{aligned}$$
(9)

The server then sends \(w^{(t+1)}\) to all clients, each of which performs another round to update \(\theta ^{(t+1)}\), but with \(\theta ^{(t)} = w^{(t+1)}\), as defined in traditional FL-FedAvg. However, such a simple averaging method may not be practical in real-world applications where some clients may have very few local data points. Therefore, we propose an ensemble learning method that combines multiple learners by selecting a random subset of clients, under the condition that clients only share their local models if they reach a minimum accuracy on their local validation data.
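The following sketch illustrates the server-side aggregation of Eq. (9) combined with the accuracy-gated client selection described above; the representation of the parameters as a list of layer arrays and the minimum-accuracy threshold are illustrative assumptions.

```python
import numpy as np

def fedavg_aggregate(client_params, client_val_acc, min_acc=0.8):
    """Average client model parameters (Eq. 9), keeping only clients whose
    local validation accuracy reaches `min_acc` (threshold is illustrative)."""
    selected = [p for p, acc in zip(client_params, client_val_acc) if acc >= min_acc]
    if not selected:
        raise ValueError("no client met the minimum validation accuracy")
    # Each client's parameters are a list of weight arrays; average layer-wise.
    return [np.mean(layer_stack, axis=0) for layer_stack in zip(*selected)]

# Example: three clients, each reporting two parameter arrays.
clients = [[np.ones((2, 2)) * i, np.ones(2) * i] for i in (1.0, 2.0, 3.0)]
w_global = fedavg_aggregate(clients, client_val_acc=[0.9, 0.7, 0.95])
```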

We further investigate the problem of non-IID data in FL settings. In fact, our approach so far may work well when the clients (sensors) have similar, i.i.d. data. However, this assumption is unrealistic, since in FL settings data may come from different environments and contexts and thus be non-i.i.d. Therefore, we employ the Moreau envelope (ME) [9] as a regularized loss function, which helps decouple personalized model optimization from global model learning in a bi-level problem stylized for personalized FL. With this scheme, instead of updating the clients’ models \(\theta ^{(t+1)}\) entirely based on the server model \(w^{(t+1)}\), we let the clients pursue their own models based on \(\theta ^{(t+1)}\) but add an \(l_2\)-norm regularized loss term (i.e., the ME) based on \(w^{(t+1)}\), so that \(\theta _s\) does not deviate far from w. Geometrically, the global model can be considered a “central point” where all clients agree to meet, and the personalized models are points in different directions that clients follow according to their heterogeneous data distributions. In this setting, the client model update becomes:

$$\begin{aligned} \theta ^{(t+1)} := \theta ^{(t)} - \eta \frac{\partial }{\partial \theta } \Big ( {\mathcal {L}} (x_i^{(t)} ,\theta ^{(t)} ) + \lambda \Vert \theta ^{(t)} - w^{(t+1)} \Vert ^2 \Big ), \end{aligned}$$
(10)

where \(\lambda\) is a regularization parameter that controls the strength of the \(l_2\)-norm penalty.
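A minimal sketch of this personalized client update is given below, reading Eq. (10) as a gradient step on the ME-regularized local loss; the flattened parameter representation and the values of \(\eta\) and \(\lambda\) are illustrative assumptions.

```python
import numpy as np

def personalized_step(theta, w_global, grad_loss, lr=0.01, lam=0.1):
    """One client update following Eq. (10): a gradient step on the local loss
    plus the penalty lambda * ||theta - w_global||^2, which pulls the
    personalized model back towards the global model.

    theta, w_global : flattened parameter vectors (illustrative representation).
    grad_loss       : gradient of the local reconstruction loss at theta.
    """
    penalty_grad = 2.0 * lam * (theta - w_global)   # gradient of the l2 penalty
    return theta - lr * (grad_loss + penalty_grad)

# Example: the larger lam is, the closer the personalized model stays to w.
theta = np.array([1.0, -2.0])
w = np.array([0.5, -1.0])
g = np.array([0.2, 0.1])
theta_new = personalized_step(theta, w, g, lr=0.1, lam=0.5)
```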

4.3 Tensor data fusion

Our proposed approach also incorporates a data fusion step that merges data from a set of connected sensor nodes \(\mathcal {S}\) into one client node. Sensor measurements usually have high redundancy and correlation, and two-way matrix analysis may fail to capture all of these correlations and relationships together. A naive approach would simply concatenate the features obtained from the different connected clients. However, unfolding the multi-way data and analyzing it using two-way methods may result in information loss and misinterpretation, since it breaks the modular structure inherent in the data. Therefore, we present a method for data fusion using a tensor data structure that arranges the data from a set of connected sensor nodes as one single client node \(\mathcal {T}\), which we call a tensor node. This tensor node \(\mathcal {T}\) holds data in the form of a three-way tensor \(\mathcal {X} \in \mathbb {R}^{I \times J \times K}\), where I is the number of connected clients, J is the number of features in each client, and K is the total number of data points sensed by a sensor node \(\mathcal {S}\). The structure of this tensor is shown in Fig. 1.

Fig. 1 Connected clients fused in a tensor

Once we arrange the data in tensor form, we apply a tensor decomposition to extract latent information in each dimension of the tensor \(\mathcal {X}\). This work adopts the CP (CANDECOMP/PARAFAC) decomposition method due to its ease of interpretation compared with the Tucker method [26]. CP decomposes \(\mathcal {X} \in \mathbb {R}^{I \times J \times K}\) into three matrices \(A \in \mathbb {R}^{I \times R}\), \(B \in \mathbb {R}^{J \times R}\) and \(C \in \mathbb {R}^{K \times R}\), where R is the number of latent factors. Each matrix represents the latent information of one mode or dimension. The decomposition can be written as follows:

$$\begin{aligned} \mathcal {X} \approx \sum _{r=1}^{R}A_{r} \circ B_{r} \circ C_{r} \end{aligned}$$
(11)

where “\(\circ\)” is the vector outer product and \(A_{r}\), \(B_{r}\), \(C_{r}\) denote the r-th columns of A, B, and C, respectively.

We formulate the problem as follows:

$$\begin{aligned} \min _{A,B,C} \Vert \mathcal {X} - \sum _{r=1}^R \lambda _r \ A_{r} \circ B_{r} \circ C_{r} \Vert ^2_f, \end{aligned}$$
(12)

where \(\Vert \mathcal {X}\Vert ^2_f\) is the squared norm of \(\mathcal {X}\), i.e., the sum of squares of all its elements, and the subscript f denotes the Frobenius norm. \(A_{r}\), \(B_{r}\) and \(C_{r}\) are the rth columns of the component matrices \(A \in \mathbb {R}^{I \times R}\), \(B \in \mathbb {R}^{J \times R}\) and \(C \in \mathbb {R}^{K \times R}\), and \(\lambda _r\) is the weight of the rth rank-one component.

We applied the alternating least squares (ALS) technique to solve the CP decomposition problem. It iteratively solves for each factor matrix by fixing the other two and using a least-squares update, until a convergence criterion is met [27]. The ALS technique is described in Algorithm 2.

Algorithm 2
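The sketch below gives a minimal NumPy implementation of CP-ALS in the spirit of Algorithm 2; the normalization of the factor columns (the \(\lambda _r\) weights in Eq. (12)) is omitted and the convergence test is simplified to the change in reconstruction error.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n matricization of a 3-way tensor (Kolda-Bader convention)."""
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order='F')

def khatri_rao(U, V):
    """Column-wise Kronecker product of two factor matrices."""
    (I, R), (J, _) = U.shape, V.shape
    return (U[:, None, :] * V[None, :, :]).reshape(I * J, R)

def cp_als(X, rank, n_iter=100, tol=1e-7, seed=0):
    """Rank-R CP decomposition by alternating least squares (cf. Algorithm 2).

    Returns factor matrices A (I x R), B (J x R), C (K x R) such that
    X[i, j, k] is approximated by sum_r A[i, r] * B[j, r] * C[k, r].
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = [rng.standard_normal((d, rank)) for d in (I, J, K)]
    prev_err = np.inf
    for _ in range(n_iter):
        # Solve for one factor at a time while fixing the other two.
        A = unfold(X, 0) @ khatri_rao(C, B) @ np.linalg.pinv((C.T @ C) * (B.T @ B))
        B = unfold(X, 1) @ khatri_rao(C, A) @ np.linalg.pinv((C.T @ C) * (A.T @ A))
        C = unfold(X, 2) @ khatri_rao(B, A) @ np.linalg.pinv((B.T @ B) * (A.T @ A))
        # Reconstruction error used as a simple convergence criterion.
        err = np.linalg.norm(unfold(X, 0) - A @ khatri_rao(C, B).T)
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return A, B, C

# Usage: fuse 4 sensors x 600 features x n events and keep the time-mode matrix C.
X = np.random.rand(4, 600, 100)
A, B, C = cp_als(X, rank=2)   # C (100 x 2) is the time-mode matrix
```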

Once the convergence criteria are met, the ALS algorithm returns the three matrices A, B, and C. As mentioned before, the matrix \(C \in \mathbb {R}^{K \times R}\), which is associated with the time mode, will be used later for constructing the central model. This matrix has K rows, each of which represents a data instance aggregated from all the clients of a tensor node \(\mathcal {T}\) at a specific time.

4.4 The client–server learning phase

Based on the FL problem formulation and the tensor data fusion described above, we now present our structural damage detection approach. Our method uses the FL approach augmented with a tensor data fusion method and an auto-encoder neural network model for structural damage detection. Each tensor node \(\mathcal {T}\) on the client side initially fuses the sensor data into a tensor \(\mathcal {X}\) and applies the CP algorithm using ALS to decompose \(\mathcal {X}\) into three matrices A, B, and C. The matrix C, which represents the time mode, is then used in the learning process. Our auto-encoder neural network uses the stochastic gradient descent algorithm to learn reconstructions \(\hat{C}\) that are close to the original input C. At each round, each client \(\mathcal {T}\) performs E epochs to update the model parameters and reports them to the central server. Algorithm 3 explains the learning phase at a given tensor node \(\mathcal {T}\).

Algorithm 3
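The sketch below outlines one local round at a tensor node in the spirit of Algorithm 3, reusing the cp_als and Autoencoder sketches given earlier; the hidden-layer size, epoch count and initialization from the received global parameters are illustrative assumptions, and the ME-regularized personalization step of Eq. (10) is omitted for brevity.

```python
import numpy as np

def client_round(sensor_events, global_params=None, rank=2, epochs=5, lr=0.05):
    """One local round at a tensor node (cf. Algorithm 3).

    sensor_events : array of shape (n_sensors, n_features, n_events);
                    only the time-mode matrix C is learned from.
    global_params : [W1, b1, W2, b2] received from the server, or None.
    Returns the updated local parameters to report back (no raw data is shared).
    """
    _, _, C = cp_als(sensor_events, rank)          # fuse sensors, keep time mode
    model = Autoencoder(n_in=rank, n_hidden=2)     # sizes are illustrative
    if global_params is not None:                   # start from the server model
        model.W1, model.b1, model.W2, model.b2 = [p.copy() for p in global_params]
    for _ in range(epochs):                          # E local epochs
        for row in C:
            model.sgd_step(row, lr=lr)
    return [model.W1, model.b1, model.W2, model.b2]

# Each cable-stayed-bridge client fuses a 4 x 600 x n_events tensor.
params = client_round(np.random.rand(4, 600, 100))
```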

Each client with a tensor node \(\mathcal {T}\) reports the model parameters \(\theta ^{(t+1)}\) back to the central server. Once the updates are received, the central server aggregates them using Eq. (9) and sends the result back to all client tensor nodes \(\mathcal {T}\). Figure 2 gives an overview of the architecture of our federated learning network and Algorithm 4 explains the learning phase at the central server.

Fig. 2 The architecture of the federated learning network with deep auto-encoder model

Algorithm 4

5 Experimental setup

5.1 Data collection

We conducted experiments on two case studies using structural vibration datasets acquired from networks of accelerometers mounted on two bridges in Australia: a cable-stayed bridge (see Fig. 3) and an arch bridge. For all experiments, three hidden layers were used in our ANN, and accuracy was measured by the F-score (FS), defined as \(\mathrm{{F\text {-}score}} = 2 \cdot \dfrac{\text {Precision} \times \text {Recall} }{\text {Precision} + \text {Recall}}\), where \(\text {Precision} = \dfrac{\text {TP} }{\text {TP} + \text {FP}}\) and \(\text {Recall} = \dfrac{\text {TP} }{\text {TP} + \text {FN}}\) (the numbers of true positives, false positives and false negatives are abbreviated as TP, FP and FN, respectively). The core consistency diagnostic (CORCONDIA) technique described in [28] was used to determine the number of rank-one tensors R when decomposing with the CP-ALS method described in Algorithm 2. CORCONDIA suggested \(R=2\) for all experimental data sets.

5.1.1 The cable-stayed bridge

Fig. 3 Side view of the cable-stayed bridge from our first case study, Western Sydney, Australia (source: Google Earth)

This bridge is 46 m long and carries one traffic lane and one pedestrian lane. It is composed of a single deck, 0.16 m thick and 6.3 m wide, supported by four I-beam steel girders and 16 stay cables. These cables are connected to the 33 m mast of the cable-stayed bridge. Figure 3 shows a side view of this bridge. We instrumented the cable-stayed bridge with 24 uniaxial accelerometers and 28 strain gauges, but we only used the acceleration data collected from sensors Ai with \(i\in [1;24]\). Figure 4 shows the locations of these 24 sensors on the bridge deck. These sensors are connected to an HBM Quantum-X data logger attached to an embedded computer on one side of the bridge. This embedded device provides time synchronization for the data and stores it temporarily before forwarding it via WiFi to a gateway on a nearby building. The gateway then forwards the data over a VPN to our laboratory. The acceleration data are collected at 600 Hz, with a range of 2 G and a sensitivity of 2 V/G. Each set of sensors along one line of the bridge (e.g., A1:A4) is connected to one client node and fused into a tensor node \(\mathcal {T}\) representing one client in our FL network. This results in six tensor nodes \(\mathcal {T}\).

Fig. 4 The locations on the bridge’s deck of the 24 Ai accelerometers used in this study. The cross girder j of the bridge is displayed as CGj [4]

For the experiments, we emulated two different types of damage on this bridge by placing a large static load (a vehicle) at different locations of the structure. Three scenarios were considered: no vehicle placed on the bridge (healthy state), a light vehicle with an approximate mass of 3 t placed close to location A10 (“Car-Damage”), and a bus with an approximate mass of 12.5 t placed at location A14 (“Bus-Damage”). These emulate the slight and severe damage cases used in our evaluation in Sect. 6.1.

This experiment generated 262 samples (a.k.a. events), each consisting of acceleration data recorded for a period of 2 s at a sampling rate of 600 Hz. We separated the 262 data instances into two groups: 125 samples related to the healthy state and 137 samples related to the damage state. The 137 damage examples were further divided into two damage cases: the “Car-Damage” samples (107), generated when the stationary car was placed on the bridge, and the “Bus-Damage” samples (30), emulated by the stationary bus.

For each reading of the uniaxial accelerometer, we normalized the signal to have zero mean and unit standard deviation. The fast Fourier transform (FFT) was then used to represent the data in the frequency domain. Each event now has a feature vector of 600 attributes representing its frequencies. The resultant data at each tensor node \(\mathcal {T}\) has a structure of 4 sensors \(\times\) 600 features \(\times\) 262 events.

5.1.2 The arch bridge

This case study used acceleration data captured by a network of accelerometers deployed on the arch bridge. The bridge has 800 jack arches supporting its joints, each of which was instrumented with three tri-axial accelerometers mounted on the left, middle and right side of the joint, as shown in Fig. 5. We conducted two different experiments using this data. The first experiment uses six joints (named 1–6), of which only one (joint four) was known to be cracked. The data used in this study contains 36,952 events, as shown in Table 1, collected over three months. Each event is recorded by a sensor node when a vehicle passes over a jack arch, for 1.6 s at a sampling rate of 375 Hz, resulting in a feature vector of 600 attributes in the time domain. All the events in datasets 1, 2, 3, 5, and 6 are labelled positive (healthy events), whereas all the events in dataset 4 (joint 4) are labelled negative (damaged events). For each reading of the tri-axial accelerometer (x, y, z), we calculated the magnitude of the three vectors, and the data of each event was then normalized to have zero mean and unit standard deviation. Since the accelerometer data is recorded in the time domain, it is transformed to the frequency domain using the Fourier transform. The resultant six datasets (using the middle sensor of each joint) have 300 features that represent the frequencies of each event. The data collected from the three sensors of a joint are fused into one tensor node \(\mathcal {T}\), which represents one client node in our FL network.

Fig. 5 Evaluated joints on the arch bridge

Table 1 Number of samples in each joint of the arch bridge dataset

The second experiment involves a larger set of 85 joints located at different structural locations on the bridge. In addition to the event responses, each node at a joint also collects continuous ambient vibration data at 1500 Hz during midnight. This data is filtered to a 1-min continuous record of ambient responses, i.e., a period in which no vehicle was driving over the joint. Note that for ambient data collection, only one sensor (the middle one) in each node collects data. In the scheme of the FL network, we want to use this ambient data to group bridge substructures with similar behaviour into one client and then examine whether our personalized FL-tensor with the ME method can learn from non-i.i.d. data. This leads us to apply the k-means clustering technique described in the following section.

5.2 Substructure grouping using k-means

k-means clustering is a popular clustering technique. It partitions data into k clusters so that the within-cluster sum of squares is minimized. This optimization problem is typically addressed using an iterative technique, and convergence is reached when the centroids no longer change. The k-means algorithm is sensitive to the centroid initialization and may converge to a sub-optimal solution. We used the k-means++ method proposed by Arthur and Vassilvitskii [29] to optimize the centroid initialization before applying the standard k-means technique. For selecting the number of clusters k, we use the Silhouette coefficient proposed in [30]. For each cluster group obtained from this process, we fuse all the event data of the joints in the same cluster into one tensor node \(\mathcal {T}\), which represents one client node in our FL network.
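A minimal sketch of this grouping step using scikit-learn is shown below; the range of k values, the placeholder ambient features and the function name are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def group_substructures(ambient_features, k_range=range(2, 13), seed=0):
    """Cluster per-joint ambient-vibration feature vectors with k-means++ and
    pick k by the highest Silhouette coefficient (k range is illustrative)."""
    best_k, best_score, best_labels = None, -1.0, None
    for k in k_range:
        km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=seed)
        labels = km.fit_predict(ambient_features)
        score = silhouette_score(ambient_features, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_score, best_labels

# 85 joints, each summarized by a feature vector from its 1-min ambient record.
joints = np.random.rand(85, 300)           # placeholder ambient features
k, score, labels = group_substructures(joints)
```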

6 Experimental results and discussions

In this section, we present the performance evaluation of our proposed federated learning method in terms of detecting damage in civil structures. Our experimental evaluation is based on two real datasets: the cable-stayed bridge and the arch bridge.

6.1 The cable-stayed bridge

As described in Sect. 5.1.1, the 262 data instances were separated into 125 healthy samples and 137 damage samples, the latter comprising 107 “Car-Damage” and 30 “Bus-Damage” events.

We randomly selected eighty percent of the healthy events (100 samples) from each tensor node \(\mathcal {T}\) to form a training multi-way tensor \(\mathcal {X} \in \mathbb {R}^{4 \times 600 \times 100}\) (i.e., the training set). The 137 examples related to the two damage cases were added to the remaining 20% of the healthy data to form a testing set, which was later used for model evaluation.

At each client node \(\mathcal {T}\), we initially applied Algorithm 2 to decompose the tensor \(\mathcal {X}\) into three matrices A, B, and C, of which C was used to learn the ANN model at each client as well as the central model using Algorithms 3 and 4. Although no data from the damaged state was employed to construct the central model, each personalized local client model was able to identify the damage events related to “Car-Damage” and “Bus-Damage” with an average F-score of \(0.92\pm 0.01\). We compared the results of our PFL-tensor with FL-tensor [31] and traditional FedAvg [14]. Table 2 shows the resulting FP, TP and F-score accuracies. As can be seen, PFL-tensor produced better results than FL-tensor and FedAvg. Furthermore, the personalized client models were also able to separate the two damage cases (“Car-Damage” and “Bus-Damage”), where the reconstruction error values further increased for the samples related to the more severe “Bus-Damage” case. This is what we anticipated, since the tensor decomposition can extract damage-sensitive features. Moreover, our approach reduces the communication time by reducing the number of clients in the FL network.

6.2 The arch bridge

6.2.1 Six joints experiments

For each dataset, we randomly selected 80% of the positive events for training, with the remaining 20% plus the unhealthy events of dataset 4 used for testing. At each client/joint \(\mathcal {T}\), we fused the data from the three sensors into a multi-way tensor \(\mathcal {X} \in \mathbb {R}^{3 \times 300 \times n}\), where n is the number of training samples defined in Table 1. Similar to the previous case study, we applied Algorithms 2, 3 and 4 to decompose the tensor \(\mathcal {X}\) and to learn the ANN model at each client as well as the central model, respectively. Each personalized client model was able to identify its local healthy samples with an average F-score of \(0.88\pm 0.02\). The model at client/joint 4 was also able to identify 0.86 of the damage samples. These results demonstrate that the PFL-tensor approach, without data sharing, is still able to identify damage events even though these events were not involved in the training process. This work also demonstrates that learning from massively decentralized data remains challenging and needs improvement, especially in the prediction accuracy for damage events at joint 4. Table 2 shows the resulting FP, TP, and F-score accuracies of PFL-tensor compared to the FL-tensor and FedAvg methods. Again, PFL-tensor outperformed these two methods in damage prediction accuracy with lower false alarm rates (Table 2).

Table 2 Comparison of the TP, FP, and F-score between our PFL-Tensor, FL-Tensor and FedAvg

6.2.2 85 joints experiments

Our initial experiment here was to perform substructure grouping of the 85 joints using the k-means++ method, with the Silhouette coefficient used to select the best value of k. We ran k-means++ with different values of k in the range [2, 12]. This range was guided by domain knowledge of SHM applications: in practice, the maximum k could be set equal to the number of structural spans of a bridge (see Fig. 8); for example, k could be set to 12 for a bridge that has 12 different structural spans. Our experimental results suggest that the joints can be clustered into 9 groups, with the highest Silhouette coefficient of 0.71, as shown in Fig. 6. Analyzing these clusters, we found that joints with similar substructure, such as joints located in the middle of the bridge, were clustered into one group, and most of the substructures located at the edges were clustered into another group, as shown in Fig. 7.

Fig. 6 Silhouette for different k parameters

Fig. 7 Node ID and substructure ID, ordered from South to North

Fig. 8 Overview of the arch bridge

For each resulting cluster, we fuse all the event vibration data of the joints in that cluster into one tensor node \(\mathcal {T}\) representing one client node in our FL network, and we randomly selected 80% of the positive events for training and 20% for testing, in addition to the unhealthy events related to joint 4, which was grouped with other joints in cluster 1. Each client/cluster now exists in the form of a multi-way tensor \(\mathcal {X} \in \mathbb {R}^{\mathcal {J} \times 300 \times n}\), where n is the smallest number of training samples among the joints within the same cluster, and \(\mathcal {J}\) is the number of joints within the cluster. We again applied Algorithms 2, 3 and 4 to decompose the tensor \(\mathcal {X}\) and to learn the ANN model at each client as well as the central model, respectively. Each personalized client model was able to identify its local healthy samples with an average F-score of \(0.87\pm 0.02\). The model at client 1, which contains the damage data related to joint 4, was also able to identify 0.85 of the damage samples (Fig. 8).

7 Conclusions

In this paper, we presented a novel machine learning approach for detecting damage in structural health monitoring. Our method employs a federated learning (FL) approach and a tensor data fusion technique. We model damage detection as an FL problem in which data collected from several distributed sensors attached to a complex structure (clients) is learned from continuously in local settings, without the need to share the data with a centralized learning model (server). Furthermore, our approach models the correlations and relationships among the various sensor nodes and shares the learning with a central model. Our experimental evaluation on two real bridge datasets showed promising damage detection accuracy under different damage scenarios. On the cable-stayed bridge dataset, our FL-based method achieved an accuracy of 94–97%. Our centralized model based on shared model learning also showed that we were able to monitor the progress of damage in the structure, with increasing reconstruction error values for the samples related to “Bus-Damage” events. On the arch bridge dataset, our FL-based method achieved 86% damage detection accuracy. The experimental results of these case studies demonstrate the capability of our FL-based damage detection approach, with the incorporation of the tensor data fusion method, to improve damage detection accuracy and to avoid the problems of transmitting sensed data over the network (network traffic, energy consumption of the sensor nodes, and vulnerability of the data). Our future work aims to improve the prediction accuracy and the optimization of the personalized federated learning approach in our FL network, and to apply it to different application domains.