1 Introduction

Nowadays, structural health monitoring (SHM) is a fundamental research field in a society where historical and modern infrastructures must coexist. To preserve the integrity of thousands of structures, contain maintenance costs, and increase safety, the early detection of anomalies before severe damage occurs is a cornerstone of civil engineering [1].

As far as bridges are concerned, some statistics highlight the relevance of the problem. For example, in Italy almost 2000 bridges currently require continuous and accurate monitoring; in France, 4000 bridges need to be restored and 840 are considered in critical condition; in Germany, 800 bridges are considered critical; in the United States of America, according to a conservative estimate, at least \(9\%\) of the 600,000 bridges are considered deficient [2]. In this context, SHM offers several solutions for anomaly detection [3,4,5].

In the literature, numerous damage detection and localization strategies have been proposed and tested [6, 7]. Part of them focuses on extracting the most significant damage-sensitive features of the structure under analysis. However, data management requires further investigation to determine how the sensing parameters impact the performance of the anomaly detectors. Generally, damage detection techniques can be divided into model-free and model-based: in the former, the only information available is that gathered by measurements (e.g., acceleration, temperature, position) [8], while in the latter, information comes from measurements and prior knowledge of the model of the structure [9, 10]. Model-based approaches tend to outperform model-free ones thanks to the prior knowledge of the structure; however, their solutions are not easily generalizable because of the tight coupling with the model of a specific structure. The effect of environmental parameters, such as temperature, wind, and humidity [11,12,13,14], and the influence of traffic loading [15] are usually taken into account to describe the structure’s behaviour exhaustively. In this work, environmental and traffic dependencies are extracted directly from the data by the anomaly detectors. The constraint on the knowledge of traffic and environmental parameters is relaxed thanks to the analysis of long-term continuous monitoring measurements, which explore all the possible operational configurations of the structure.

Since the monitoring procedure can be complex and requires fine-tuning of several parameters that depend on the structure under analysis, the adoption of machine learning (ML) techniques to detect changes in the damage-sensitive features has recently received increasing interest [16,17,18,19,20]. In particular, in [21] a convolutional neural network (NN) is adopted to perform automatic feature extraction and damage detection simultaneously, reducing the computational cost of the procedure. A multi-layer perceptron NN is used in [22] to evaluate the effectiveness of a previous feature extraction procedure, and a NN is adopted in [23] to predict bridge accelerations in order to extract damage-sensitive features. These approaches are typically tested on scaled or simulated structures, without considering the amount of data to be managed in a real-world monitoring scenario.

Sensor placement is another widely investigated topic in SHM. Usually, the proposed strategies start from a model of the structure and place the sensors so as to minimize a cost function [24,25,26]. In this work, we propose a different paradigm: starting from an oversized number of sensors, we first evaluate the effect of each accelerometer on the overall data set of measurements, and then select the subset of sensors necessary to preserve high performance on the anomaly detection task. Moreover, data management in SHM is still an open problem, which can be addressed at the sensor network level [27, 28] or at the structure network level [29, 30]. In this work, we provide fundamental guidelines to deploy the monitoring system (at both the sensor network and structure network levels), taking into account the constraints introduced by the available resources.

The proposed framework starts with the extraction of the fundamental frequencies from accelerometric measurements through stochastic subspace identification (SSI), cleaning, and clustering [9, 16, 31,32,33,34,35,36], and then performs modal frequency tracking in the time domain [19]. The first two fundamental frequencies are used as a feature space to train one-class classifiers to perform anomaly detection, where an anomaly is intended as any non-negligible deterioration of the structure that affects its standard behaviour. In this work, we investigate the impact of data shrinking strategies on damage detection in bridges. In particular, we derive the performance of ML-based anomaly detection techniques varying the number of sensors, samples, and resolution bits to minimize the data storage/transmission requirements, in view of large-scale bridge monitoring.

This goal is crucial for several reasons: (i) If damage detection is performed locally on the bridge, reducing the size of the data set can limit the energy consumed for processing and allow the use of low-cost computational units. (ii) If data are processed remotely and a battery-powered wireless network connects the sensors to the remote server, reducing the amount of data can increase the network lifetime. Moreover, several internet of things (IoT) solutions have limited throughput, so reducing the volume of collected data may pave the way for the use of IoT networks in bridge monitoring. (iii) Continuous monitoring over the years generates a huge amount of data; therefore, to contain the database size, it is recommended to use the minimum amount of information necessary.

To summarize, the main contributions are the following:

  • The performance of several ML algorithms for anomaly detection, such as principal component analysis (PCA), kernel principal component analysis (KPCA), Gaussian mixture model (GMM), and one-class classifier neural network (OCCNN)\(^2\), is compared in terms of accuracy, precision, recall, and \(F_1\) score.

  • The effect of the number of sensors on algorithms’ performance is investigated.

  • The impact of the number of samples and the resolution bits (bits per sample) on the classification accuracy is quantified.

  • To account for low-cost sensors in typical large-scale monitoring scenarios, the effect of the measurement noise power on damage detection is investigated [37].

  • The combined effects of the number of sensors, number of samples, and resolution bits are analyzed to find operational limits that ensure a predefined performance of the classification task.

The performance of the proposed solution is investigated on a real structure data set using the accelerometric data available for the Z-24 bridge [38, 39].

Throughout this paper, capital boldface letters denote matrices and tensors, lowercase bold letters denote vectors, \((\cdot )^{\mathrm{T}}\) stands for transposition, \((\cdot )^+\) indicates the Moore–Penrose pseudoinverse operator, \(||\cdot ||\) is the \(\ell _2\)-norm of a vector, \(\mathfrak {R}\{\cdot \}\) and \(\mathfrak {I}\{\cdot \}\) are the real and imaginary parts of a complex number, respectively, \({\mathbb {V}}\{\cdot \}\) is the variance operator, and \(\mathbbm {1}\{a,b\}\) is the indicator function equal to 1 when \(a=b\), and zero otherwise.

This paper is organized as follows. In Sect. 2, a brief overview of the acquisition system, the accelerometers setup, and the monitoring scenario is presented. The fundamental frequencies extraction technique adopted is described in Sect. 3. A survey of anomaly detection techniques is reported in Sect. 4. The volume of data generated by the acquisition system and some possible strategies to reduce it are presented in Sect. 5. Numerical results are given in Sect. 6. Finally, conclusions are drawn in Sect. 7.

2 System configuration

The Z-24 bridge was located in the Swiss canton of Bern. It was part of the road connection between Koppigen and Utzenstorf, overpassing the A1 highway between Bern and Zurich. It was a classical post-tensioned concrete two-cell box-girder bridge with a main span of \(30\,\text {m}\) and two side spans of \(14\,\text {m}\). The bridge was built as a freestanding frame, with the approaches backfilled later. Both abutments consisted of triple concrete columns connected with concrete hinges to the girder. Both intermediate supports were concrete piers clamped into the girder. An extension of the bridge girder at the approaches provided a sliding slab. All supports were rotated with respect to the longitudinal axis, which yielded a skew bridge. The bridge was demolished at the end of 1998 [38]. During the year before its demolition, the bridge was subjected to long-term continuous monitoring to quantify the environmental variability of its dynamics. Moreover, progressive damage tests took place over a month, shortly before the complete demolition of the bridge, alternated with short-term monitoring tests while the continuous monitoring system was still running. The tests proved experimentally that realistic damage has a measurable influence on bridge dynamics.

Fig. 1
figure 1

Data acquisition setup along the Z-24 bridge: the selected accelerometers, their positions, and the measured acceleration direction [40]

2.1 Data collection and pre-processing

The accelerometers’ positions and their measurement axes are shown in Fig. 1. In this work, we considered \(l=8\) accelerometers, identified as 03, 05, 06, 07, 10, 12, 14, and 16, which are present both in the long-term continuous monitoring phase and in the progressive damage one. The accelerometer orientation is highlighted in Fig. 1 with different colors: red, green, and blue stand, respectively, for transversal, vertical, and longitudinal orientation. Every hour, \(N_{\mathrm{s}}=65,536\) samples are acquired from each sensor with sampling frequency \(f_{\mathrm{samp}}=100\,\text {Hz}\), which corresponds to an acquisition time \(T_{\mathrm{a}}=655.36\,\text {s}\). Each accelerometer has a built-in antialiasing filter. Since the measurements are not always available, \(N_{\mathrm{a}}=4107\) acquisitions were collected over a period of 44 weeks.

The block diagram depicted in Fig. 2 represents the sequence of tasks performed by the fully automatic anomaly detection approach presented in this work. Some pre-processing steps are applied to the data to reduce disturbances, the computational cost, and the memory occupation of the subsequent processing. This is particularly useful when the computational resources are limited and in wireless networks, where the amount of data to be stored and transmitted represents an important constraint. First, a decimation by a factor of 2 is applied to each acquisition; hence, the sampling frequency is scaled to \(f_{\mathrm{samp}}=50\,\text {Hz}\). Such a sampling frequency is sufficient because the Z-24 fundamental frequencies fall in the \([2.5,\,20]\,\text {Hz}\) range [38]. After decimation, data are processed with a bandpass finite impulse response (FIR) filter of order 30 with band \([2.5,\,20]\,\text {Hz}\) to remove out-of-band disturbances. At the end of the decimation step, the number of samples for each acquisition is already halved (\(N_{\mathrm{dec}} = N_{\mathrm{s}}/2 = 32,768\)), which represents a first important step in the data management process.

To keep the notation compact, from now on we consider the data organized in a tensor \(\varvec{{\mathcal {D}}}\) of dimensions \(N_{\mathrm{a}} \times l \times N_{\mathrm{dec}}\).
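As an illustration of the pre-processing chain just described (decimation by 2 followed by the 30th-order band-pass FIR filter over \([2.5,\,20]\,\text {Hz}\)), a minimal Python sketch is reported below; the function and variable names are illustrative and not taken from the original implementation.

```python
import numpy as np
from scipy import signal

F_SAMP = 50.0        # Hz, sampling frequency after decimation by 2
BAND = (2.5, 20.0)   # Hz, band containing the Z-24 fundamental frequencies

def preprocess(acq: np.ndarray) -> np.ndarray:
    """acq: (l, N_s) raw acceleration samples -> (l, N_s/2) filtered samples."""
    # Decimation by 2 (scipy applies an anti-aliasing filter internally)
    dec = signal.decimate(acq, q=2, axis=-1)
    # 30th-order band-pass FIR (31 taps) to remove out-of-band disturbances
    taps = signal.firwin(31, BAND, pass_zero=False, fs=F_SAMP)
    return signal.lfilter(taps, 1.0, dec, axis=-1)

# Example: one hourly acquisition with l = 8 sensors and N_s = 65,536 samples
acq = np.random.randn(8, 65_536)     # placeholder raw data
D_a = preprocess(acq)                # shape (8, 32768), one slice of the tensor D
```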

Fig. 2
figure 2

Block diagram for signal acquisition, processing, feature extraction, tracking, and anomaly detection

3 Fundamental frequencies extraction

In this section, we address the problem of extracting damage-sensitive features from the accelerometric measurements of the structure. Among a wide set of algorithms, we select SSI, a data-driven strategy able to provide damage-sensitive features without a priori information about the structure [9]. After this procedure, a mode selection phase is performed to distinguish physical modes from spurious ones; four widely known metrics, presented in the following, are used to accomplish this task [32, 34, 35, 41]. Finally, the K-means algorithm is applied to cluster the data [16, 31], followed by a tracking algorithm applied to the first two fundamental frequencies to filter outliers [19].

3.1 Stochastic subspace identification

SSI requires the selection of a model order \(n\in {\mathbb {N}}\) and a time lag \(i\ge 1\). To correctly apply the algorithm, the constraint \(l\cdot i \ge n\) must be satisfied [9]. In this application we consider the model order n unknown, so it is varied in the range \(n \in [2,\,160]\) (with step 2), while the time lag is \(i=60\) [9].

First of all, we define the block Toeplitz matrix for a given time-lag i, shift s, and acquisition a

$$\begin{aligned} \mathbf {T}_{s|i}^{(a)} =\begin{bmatrix} \mathbf {R}^{(a)}_i & \mathbf {R}^{(a)}_{i-1} & \dots & \mathbf {R}^{(a)}_{s+1} & \mathbf {R}^{(a)}_{s} \\ \mathbf {R}^{(a)}_{i+1} & \mathbf {R}^{(a)}_{i} & \dots & \mathbf {R}^{(a)}_{s+2} & \mathbf {R}^{(a)}_{s+1} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \mathbf {R}^{(a)}_{2i-1} & \mathbf {R}^{(a)}_{2i-2} & \dots & \mathbf {R}^{(a)}_{i+1} & \mathbf {R}^{(a)}_{i} \\ \end{bmatrix} \end{aligned}$$
(1)

of dimensions \(li \times li\) where

$$\begin{aligned} \mathbf {R}^{(a)}_i=\frac{1}{N_{\mathrm{dec}}-i}\,{\mathbf {D}}_{(a,:,i+1:N_{\mathrm{dec}})} {\mathbf {D}}_{(a,:,1:N_{\mathrm{dec}}-i)}^{\mathrm{T}} \end{aligned}$$
(2)

is the correlation matrix at lag i, of dimension \(l \times l\), where the matrix \({\mathbf {D}}_{(a,:,b:c)}\) is extracted from the data tensor \({\varvec{{\mathcal {D}}}}\) by selecting acquisition a, all the l sensors, and the time samples from b to c. We drop the index a to simplify the notation; for this reason, all the following tasks are repeated for each acquisition. In order to factorize the block Toeplitz matrix (1) with \(s=1\), we apply the singular value decomposition (SVD) as follows

$$\begin{aligned} \mathbf {T}_{1|i} = \mathbf {U}^{(n)}\varvec{\Sigma }^{(n)} \mathbf {V}^{(n)\text {T}} \end{aligned}$$
(3)

where \(\mathbf {U}^{(n)}\) is an \(li \times n\) matrix that contains the first n left singular vectors arranged in columns, \(\mathbf {V}^{(n)}\) is an \(li \times n\) matrix that contains the first n right singular vectors arranged in columns, and \(\varvec{\Sigma }^{(n)}\) is an \(n \times n\) diagonal matrix that contains the corresponding singular values on its diagonal, sorted in descending order. We also drop the index n, so that the next steps are applied for each model order. Selecting the correct number of singular values from the SVD, the matrix \(\mathbf {T}_{1|i}\) can be split into two parts

$$\begin{aligned} \mathbf {T}_{1|i} = \mathbf {O}_i\varvec{\Gamma }_i \,\,\,\text {with}\,\,\, \mathbf {O}_i = \mathbf {U}\varvec{\Sigma }^{1/2}\mathbf {S} \,\,\,\text {and}\,\,\, \varvec{\Gamma }_i = \mathbf {S}^{-1}\varvec{\Sigma }^{1/2}\mathbf {V}^{\mathrm{T}} \end{aligned}$$
(4)

\(\mathbf {O}_i = [\mathbf {C}\;\; \mathbf {C}\mathbf {A} \dots \mathbf {C}\mathbf {A}^{i-1}]^{\mathrm{T}}\) and \(\varvec{\Gamma }_i =[\mathbf {A}^{i-1}\mathbf {G} \dots \mathbf {A}\mathbf {G} \;\; \mathbf {G}]\) represent, respectively, the observability matrix and the reversed controllability matrix. In (4), the matrix \(\mathbf {S}\) is set equal to the identity matrix \(\mathbf {I}\), since it only plays the role of a similarity transformation applied to the state-space model. The matrices \(\mathbf {A}\), \(\mathbf {C}\), and \(\mathbf {G}\) represent the state matrix, the output influence matrix, and the next state-output covariance matrix, respectively. Matrices \(\mathbf {C}\) and \(\mathbf {G}\) can be easily extracted from \(\mathbf {O}_i\) and \(\varvec{\Gamma }_i\) (as the first block row and the last block column, respectively); consequently, \(\mathbf {A}\) can be computed from the one-lag-shifted block Toeplitz matrix \(\mathbf {T}_{2|i+1}\), built as in (1), as \(\mathbf {A} = \mathbf {O}_i^+ \mathbf {T}_{2|i+1} \varvec{\Gamma }_i^+\). Applying now the eigenvalue decomposition to \(\mathbf {A}\), we get

$$\begin{aligned} \mathbf {A} = \varvec{\Psi } \varvec{\Omega } \varvec{\Psi }^{-1} \end{aligned}$$
(5)

where \(\varvec{\Psi }\) is an \(n \times n\) matrix that contains the eigenvectors arranged in columns, and \(\varvec{\Omega } =\text {diag}(\widetilde{\lambda }_1,\ldots ,\widetilde{\lambda }_n)\) is an \(n \times n\) diagonal matrix that contains the n eigenvalues of the state matrix. Reintroducing now the previously dropped indices, we can estimate the continuous-time damage-sensitive parameters of the pth mode as follows:

  • eigenvalues \(\lambda _p^{(a,n)} = f_{\mathrm{samp}} \, \ln (\widetilde{\lambda }_p^{(a,n)})\);

  • natural frequencies \(\mu _p^{(a,n)} =|{\lambda }_p^{(a,n)}|/(2\pi )\);

  • damping ratios \({\delta }_p^{(a,n)} =-\mathfrak {R}\{{\lambda }_p^{(a,n)}\}/|{\lambda }_p^{(a,n)}|\);

  • mode shapes \(\varvec{\phi }_p^{(a,n)} =\mathbf {C}^{(a,n)}\varvec{\psi }_p^{(a,n)}\);

where \(\varvec{\phi }_p^{(a,n)}\) is an \(l \times 1\) vector, and \(\varvec{\psi }_p^{(a,n)}\) is the pth column vector of \({\varvec{\Psi }}^{(a,n)}\) defined in (5). Figure 3a reports the stabilization diagram obtained by extracting the natural frequencies through the described procedure for the first acquisition (\(a=1\)), varying the model order n.
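A minimal numerical sketch of the covariance-driven SSI chain of this subsection (correlation matrices, block Toeplitz matrix, truncated SVD with \(\mathbf {S}=\mathbf {I}\), and extraction of natural frequencies, damping ratios, and mode shapes) is reported below. Variable names and the synthetic input are illustrative; in the actual processing the routine is run for every acquisition and for every model order to build the stabilization diagram.

```python
import numpy as np

def ssi_cov(Y: np.ndarray, i: int, n: int, f_samp: float):
    """Covariance-driven SSI sketch (Sect. 3.1).
    Y: (l, N) pre-processed accelerations of one acquisition,
    i: time lag, n: model order, f_samp: sampling frequency in Hz."""
    l, N = Y.shape
    assert l * i >= n, "the constraint l*i >= n must hold"
    # Correlation matrices R_0 ... R_{2i}, Eq. (2)
    R = [Y[:, k:] @ Y[:, :N - k].T / (N - k) for k in range(2 * i + 1)]
    # Block Toeplitz matrix T_{1|i} of Eq. (1) and its one-lag-shifted version
    T1 = np.block([[R[i + r - c] for c in range(i)] for r in range(i)])
    T2 = np.block([[R[i + 1 + r - c] for c in range(i)] for r in range(i)])
    # Truncated SVD, Eqs. (3)-(4): observability and reversed controllability matrices
    U, s, Vt = np.linalg.svd(T1)
    U, s, Vt = U[:, :n], s[:n], Vt[:n, :]
    O = U @ np.diag(np.sqrt(s))
    G = np.diag(np.sqrt(s)) @ Vt
    A = np.linalg.pinv(O) @ T2 @ np.linalg.pinv(G)   # state matrix
    C = O[:l, :]                                     # output influence matrix
    # Discrete-time eigenvalues -> continuous-time modal parameters
    lam_d, Psi = np.linalg.eig(A)
    lam = f_samp * np.log(lam_d.astype(complex))
    freqs = np.abs(lam) / (2 * np.pi)                # natural frequencies [Hz]
    damp = -np.real(lam) / np.abs(lam)               # damping ratios
    shapes = C @ Psi                                 # mode shapes, l x n
    return freqs, damp, shapes

# Example with synthetic data (l = 8 sensors, time lag i = 60, model order n = 40)
rng = np.random.default_rng(0)
freqs, damp, shapes = ssi_cov(rng.standard_normal((8, 4096)), i=60, n=40, f_samp=50.0)
```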

3.2 Mode selection

The SSI algorithm generates a broad set of modes; some of these are physical, while others are spurious and must be discarded. In the literature, several approaches exist to accomplish this task [32]. In this work, we use four metrics to evaluate whether a mode is physical or spurious: the modal assurance criterion (MAC), the mean phase deviation (MPD), a damping ratio check, and a complex conjugate poles check. In the following, we briefly describe each metric.

MAC. It is a dimensionless squared correlation coefficient between mode shapes, defined as [41]

$$\begin{aligned} \text {MAC}(\varvec{\phi }_p^{(a,n)}, \varvec{\phi }_q^{(a,j)}) =\frac{|\varvec{\phi }_p^{(a,n)\text {T}} \varvec{\phi }_q^{(a,j)}|^2}{||\varvec{\phi }_p^{(a,n)} ||_2^2||\varvec{\phi }_q^{(a,j)}||_2^2} \end{aligned}$$
(6)

with values between 0 and 1. When MAC is greater than 0.9, the mode is considered physical; otherwise, it is discarded. A metric based on MAC can also be defined as follows [39]

$$\begin{aligned} d_{\mathrm{m}}(a,n,j,p,q)&= \frac{|\lambda _p^{(a,n)} -\lambda _q^{(a,j)}|}{\max (|\lambda _p^{(a,n)}|,|\lambda _q^{(a,j)}|)}\nonumber \\&\quad +1-\text {MAC}(\varvec{\phi }_p^{(a,n)}, \varvec{\phi }_q^{(a,j)}) \end{aligned}$$
(7)

where the first term measures the relative distance between the pth and qth eigenvalues, estimated at model orders n and j, respectively. Using (7), a mode can be considered physical when \(d_{\mathrm{m}}(a,n,j,p,q) < 0.1\).

Fig. 3
figure 3

Example of stabilization diagram for the first hour of monitoring: a through SSI, b after mode selection and clustering. Vertical blue lines represent the estimated frequencies after the clustering procedure

MPD. It measures the deviation of the mode shape components from the mean phase (MP) of all the components. The MP is usually evaluated through the SVD [39]

$$\begin{aligned} \mathbf {P{\varvec{\Lambda }}Q}^T = [\mathfrak {R}\{\varvec{\phi }_p^{(a,n)}\} \;\; \mathfrak {I}\{\varvec{\phi }_p^{(a,n)}\}] \end{aligned}$$
(8)

where \(\mathbf {P}\) is \(l \times 2\), \({\varvec{\Lambda }}\) is \(2 \times 2\), and \(\mathbf {Q}\) is \(2 \times 2\). The MPD can, therefore, be evaluated as follows

$$\begin{aligned}&\text {MPD}(\varvec{\phi }_p^{(a,n)}) = \nonumber \\&\quad \frac{\sum _{r=1}^{l} |\phi _{r,p}^{(a,n)}|\arccos \Big | \frac{\mathfrak {R}\{ \phi _{r,p}^{(a,n)}\} q_{22} -\mathfrak {I}\{ \phi _{r,p}^{(a,n)} \} q_{12}}{\sqrt{q_{12}^2 + q_{22}^2}\,|\phi _{r,p}^{(a,n)}|} \Big |}{\sum _{r=1}^{l} |\phi _{r,p}^{(a,n)}|}. \end{aligned}$$
(9)

When the ratio \(\text {MPD}(\varvec{\phi }_p^{(a,n)}) /(\pi /2)>0.75\), the mode is considered spurious and is discarded; otherwise, it is considered physical.

Damping ratio and complex conjugate poles. In an actual structure, the damping ratio evaluated for each mode must be positive and lower than 0.2; for this reason, only modes with \(0<\delta _p^{(a,n)}<0.2\) are retained. Furthermore, if \(\mathfrak {R}\big \{{\lambda }_p^{(a,n)}\big \}>0\), the mode represents an unstable system and is therefore discarded.

In Fig. 3b, the stabilization diagram after mode selection is shown; from now on, the remaining modes are denoted by \(\bar{\mu }_p^{(a,n)}\).
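As an illustration of how the thresholds of this subsection can be combined in practice, the sketch below implements the MAC of Eq. (6) and a simplified physical-mode check. The MPD test is omitted for brevity, the reference mode (lam_ref, phi_ref) is assumed to come from a neighbouring model order as in Eq. (7), and the Hermitian product is used for complex mode shapes.

```python
import numpy as np

def mac(phi_p: np.ndarray, phi_q: np.ndarray) -> float:
    """Modal assurance criterion, Eq. (6) (Hermitian product for complex shapes)."""
    num = np.abs(np.vdot(phi_p, phi_q)) ** 2
    return float(num / (np.linalg.norm(phi_p) ** 2 * np.linalg.norm(phi_q) ** 2))

def is_physical(lam, phi, lam_ref, phi_ref, delta) -> bool:
    """Simplified mode check of Sect. 3.2; (lam_ref, phi_ref) is a mode estimated
    at a neighbouring model order, delta is the damping ratio of the mode."""
    d_m = (abs(lam - lam_ref) / max(abs(lam), abs(lam_ref))
           + 1.0 - mac(phi, phi_ref))                 # Eq. (7)
    return (mac(phi, phi_ref) > 0.9                   # MAC check
            and d_m < 0.1                             # eigenvalue/MAC distance check
            and 0.0 < delta < 0.2                     # damping ratio check
            and lam.real < 0)                         # stable-pole check
```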

Fig. 4
figure 4

First two natural frequencies estimated after the density-based tracking algorithm. Blue and green backgrounds highlight the acquisitions made during the bridge’s normal condition, used respectively as training and test sets, while the red background denotes the damaged-condition acquisitions used in the test phase. The red vertical dashed line indicates the instant of damage detection performed by OCCNN\(^2\), which corresponds to acquisition 3283, 29 h after the introduction of the actual damage

3.3 Data partitioning for damage detection

At the end of the mode selection procedure, a phase that performs clustering and tracking is applied to extract the temporal profile of the first two fundamental frequencies \(\mathbf {f}_s =\big \{f_s^{(a)}\big \}_{a=1}^{N_{\mathrm{a}}}\) with \(s \in \{1,2\}\) [16, 19, 31, 42]. The result is stored in the following matrix (see also Figs. 3b, 4)

$$\begin{aligned} \mathbf {F} =\begin{bmatrix} \mathbf {f}_1\\ \mathbf {f}_2 \end{bmatrix}^{\mathrm{T}} = \begin{bmatrix} f_1^{(1)} & f_1^{(2)} & \dots & f_1^{(N_{\mathrm{a}})}\\ f_2^{(1)} & f_2^{(2)} & \dots & f_2^{(N_{\mathrm{a}})}\\ \end{bmatrix}^{\mathrm{T}}. \end{aligned}$$

As reference parameters of the structure under normal conditions, the average values of the two fundamental frequencies evaluated from the first 100 measurements are \(\bar{f}_1 =4.00\,\text {Hz}\) and \(\bar{f}_2 = 5.12\,\text {Hz}\). At this point, the extracted fundamental frequencies must be divided into training, test in standard condition, and test in damaged condition sets. As described in [38], the damage is introduced at acquisition \(a=N_{\mathrm{d}} = 3253\), corresponding to the installation of a lowering system. Therefore, from now on, the matrix \(\bar{\mathbf {X}} = {\mathbf {F}}_{1:2N_{\mathrm{d}}-N_{\mathrm{a}}-1,:}\) contains the training points (blue background in Fig. 4b), \(\bar{\mathbf {Y}} = \mathbf {F}_{2N_{\mathrm{d}}-N_{\mathrm{a}}:N_{\mathrm{d}}-1,:}\) contains the test points in standard condition (green background in Fig. 4b), and \(\bar{\mathbf {U}}= \mathbf {F}_{N_{\mathrm{d}}:N_{\mathrm{a}},:}\) contains the test points in damaged condition (red background in Fig. 4b). The three subsets of acquisitions that correspond to training, standard test, and damaged test points are, respectively, \({{\mathcal {I}}}_{\mathrm{x}}=\{1,...,2N_{\mathrm{d}}-N_{\mathrm{a}}-1\}\), \({{\mathcal {I}}}_{\mathrm{y}}=\{2N_{\mathrm{d}}-N_{\mathrm{a}},...,N_{\mathrm{d}}-1\}\), and \({{\mathcal {I}}}_{\mathrm{u}}=\{N_{\mathrm{d}},...,N_{\mathrm{a}}\}\).

Let us define the offset \(\hat{\mathbf {x}}\) as the vector containing the mean of each column (feature) of the matrix \(\bar{\mathbf {X}}\), and the rescaling factor \(x_{\mathrm{m}} = \max _{a,s}|\bar{x}_{a,s} -\hat{x}_s|\). Before proceeding with the anomaly detection, the matrices \(\bar{\mathbf {X}}\), \(\bar{\mathbf {Y}}\), and \(\bar{\mathbf {U}}\) are centered and normalized by subtracting the offset \(\hat{\mathbf {x}}\) from each row and dividing each entry by the rescaling factor \(x_{\mathrm{m}}\). The resulting data matrices are \(\mathbf {X}\), \(\mathbf {Y}\), and \(\mathbf {U}\), of size \(N_{\mathrm{x}} \times D\), \(N_{\mathrm{y}} \times D\), and \(N_{\mathrm{u}} \times D\), respectively, with \(D=2\) features. The result of this procedure is depicted on the left of Fig. 5.
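A compact sketch of the partitioning and normalization steps just described is given below; the placeholder matrix F stands for the tracked frequency matrix, and the 1-based index sets of the text are shifted to Python’s 0-based convention (so the exact cardinalities may differ by one from those quoted in Sect. 6).

```python
import numpy as np

N_a, N_d = 4107, 3253                      # total acquisitions, damage onset (Sect. 3.3)
rng = np.random.default_rng(0)
F = rng.random((N_a, 2))                   # placeholder for the tracked (f1, f2) matrix

# Training, normal-condition test, and damaged-condition test sets
X_bar = F[: 2 * N_d - N_a - 1, :]
Y_bar = F[2 * N_d - N_a - 1 : N_d - 1, :]
U_bar = F[N_d - 1 :, :]

# Centering and rescaling using statistics of the training set only
x_hat = X_bar.mean(axis=0)                 # per-feature offset
x_m = np.max(np.abs(X_bar - x_hat))        # rescaling factor
X = (X_bar - x_hat) / x_m
Y = (Y_bar - x_hat) / x_m
U = (U_bar - x_hat) / x_m
```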

4 Survey of anomaly detection techniques

In this section, we briefly review PCA, KPCA, and GMM, which are often adopted for one-class classification (OCC), and introduce OCCNN\(^2\), a recently presented neural network-based approach [43,44,45,46,47].

4.1 Principal component analysis

This technique remaps the training data from the feature space \({\mathbb {R}}^D\) into a subspace \({\mathbb {R}}^P\) (where \(P<D\) is the number of selected components) that minimizes the Euclidean distance between the data in the feature space and their projection onto the chosen subspace [48]. To find the best subspace onto which to project the training data, the evaluation of the \(D \times D\) sample covariance matrix

$$\begin{aligned} \varvec{\Sigma }_{\mathrm{x}} = \frac{\mathbf {X}^{\mathrm{T}} \mathbf {X}}{N_{\mathrm{x}}-1} \end{aligned}$$
(10)

is needed. The sample covariance matrix \(\varvec{\Sigma }_{\mathrm{x}}\) can be factorized by eigenvalue decomposition as \(\varvec{\Sigma }_{\mathrm{x}} = \mathbf {V}_{\mathrm{x}} \varvec{\Lambda }_{\mathrm{x}} {\mathbf {V}_{\mathrm{x}}}^{\mathrm{T}}\), where \(\mathbf {V}_{\mathrm{x}}\) is an orthonormal matrix whose columns are the eigenvectors, and \(\varvec{\Lambda }_{\mathrm{x}}\) is a diagonal matrix that contains the D eigenvalues. The magnitude of each eigenvalue measures the importance of the direction pointed by the corresponding eigenvector. In our setting, we select the largest component, hence \(P=1\); therefore, the best linear subspace of dimension one is spanned by \(\mathbf {v}_{\mathrm{P}}\), the eigenvector associated with the largest eigenvalue of \(\varvec{\Sigma }_{\mathrm{x}}\). The projection onto the subspace is obtained by multiplying the data by \(\mathbf {v}_{\mathrm{P}}\), i.e., \(\mathbf {x}_{\mathrm{P}} =\mathbf {X}\mathbf {v}_{\mathrm{P}}\), \(\mathbf {y}_{\mathrm{P}} =\mathbf {Y}\mathbf {v}_{\mathrm{P}}\), and \(\mathbf {u}_{\mathrm{P}} =\mathbf {U}\mathbf {v}_{\mathrm{P}}\). The data are then reconstructed in the original feature space as \(\widetilde{\mathbf {X}} = \mathbf {x}_{\mathrm{P}}\mathbf {v}_{\mathrm{P}}^{\mathrm{T}}\), \(\widetilde{\mathbf {Y}} = \mathbf {y}_{\mathrm{P}} \mathbf {v}_{\mathrm{P}}^{\mathrm{T}}\), and \(\widetilde{\mathbf {U}} =\mathbf {u}_{\mathrm{P}}\mathbf {v}_{\mathrm{P}}^{\mathrm{T}}\). After the reconstruction, the error is calculated as the Euclidean distance between the original and the reconstructed data points, and anomalies are detected by thresholding this error.

Unfortunately, PCA can be ineffective when the number of frequencies considered is low. Moreover, the variability of the estimated frequencies due to environmental effects can affect its performance [49]. This is because PCA finds only linear boundaries in the original feature space; therefore, it is recommended when the dimensionality of the problem is high and the classes can be well separated via hyperplanes.
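A possible implementation of the PCA-based one-class classifier described above is sketched below, assuming (as in Sect. 6.1) that the decision threshold is set for a 1% false-alarm rate on the training reconstruction errors; the data matrices are placeholders.

```python
import numpy as np

def pca_occ(X_train: np.ndarray, Z: np.ndarray, P: int = 1, far: float = 0.01):
    """One-class classification by PCA reconstruction error (Sect. 4.1).
    X_train: (N_x, D) normal-condition training data; Z: (N, D) points to classify."""
    # Sample covariance, Eq. (10), and its eigendecomposition
    Sigma = X_train.T @ X_train / (X_train.shape[0] - 1)
    eigval, eigvec = np.linalg.eigh(Sigma)       # eigenvalues in ascending order
    V_P = eigvec[:, -P:]                         # P leading eigenvectors
    # Project, reconstruct, and measure the Euclidean reconstruction error
    err_train = np.linalg.norm(X_train - X_train @ V_P @ V_P.T, axis=1)
    err_z = np.linalg.norm(Z - Z @ V_P @ V_P.T, axis=1)
    thr = np.quantile(err_train, 1.0 - far)      # threshold at the chosen false-alarm rate
    return err_z > thr                           # True -> anomaly

# Example usage with placeholder matrices of the sizes quoted in Sect. 6
rng = np.random.default_rng(0)
X, U = rng.normal(size=(2399, 2)), rng.normal(size=(854, 2))
labels_damaged = pca_occ(X, U, P=1)
```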

4.2 Kernel principal component analysis

Due to the inability of PCA to find non-linear boundaries, here we consider KPCA as an alternative [50]. KPCA first maps the data with a non-linear function, named kernel, and then applies standard PCA to find a linear boundary in the new feature space. The kernel function, applied to the linear boundary, makes it non-linear in the original feature space. A delicate point in the development of the KPCA algorithm is the choice of the kernel function. In [51], the radial basis function (RBF) kernel is proposed as a suitable candidate when the data distribution is unknown. Given a generic point \(\mathbf {z}\), corresponding to a \(1 \times D\) vector, we can apply the RBF as

$$\begin{aligned} K^{(\mathbf {z})}_{n}=e{^{-\gamma ||\mathbf {z} -\mathbf {x}_n||^2}}, \;\;\;\; \text {with} \;\;\;\; n=1,2,\dots ,N_{\mathrm{x}} \end{aligned}$$
(11)

where \(\gamma\) is a kernel parameter (which controls the width of the Gaussian function) that must be set properly, \(\mathbf {x}_n\) is the nth row of \(\mathbf {X}\), and \(K^{(\mathbf {z})}_n\) is the nth component of the point \(\mathbf {z}\) in the kernel space. Overall, the vector \(\mathbf {z}\) is mapped into the vector \(\mathbf {k}^{({\mathbf {z}})} =[K^{(\mathbf {z})}_{1}, K^{(\mathbf {z})}_{2},\dots ,K^{(\mathbf {z})}_{N_{\mathrm{x}}}]\). Remapping all the data into the kernel space, we obtain the matrices \(\mathbf {K}_{\mathrm{x}}\) of size \(N_{\mathrm{x}} \times N_{\mathrm{x}}\) for training, \(\mathbf {K}_{\mathrm{y}}\) of size \(N_{\mathrm{y}} \times N_{\mathrm{x}}\) for the test in normal condition, and \(\mathbf {K}_{\mathrm{u}}\) of size \(N_{\mathrm{u}} \times N_{\mathrm{x}}\) for the test in damaged condition, respectively.

Applying now the PCA to the new data set, it is possible to find non-linear boundaries in the original feature space.
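The RBF mapping of Eq. (11) can be computed directly as in the sketch below; the resulting kernel matrices are then fed to the PCA routine (with \(P=3\) and \(\gamma =8\) as in Sect. 6). Kernel-matrix centering, required by a strict KPCA formulation, is omitted for brevity, and the data matrices are placeholders.

```python
import numpy as np

def rbf_map(Z: np.ndarray, X_train: np.ndarray, gamma: float = 8.0) -> np.ndarray:
    """Map the rows of Z into the kernel space defined by the training set, Eq. (11);
    returns an (N_z, N_x) matrix whose rows are the vectors k^(z)."""
    d2 = ((Z[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(2399, 2))    # placeholder training features
Y = rng.normal(size=(854, 2))     # placeholder test features, normal condition
U = rng.normal(size=(854, 2))     # placeholder test features, damaged condition
K_x = rbf_map(X, X)               # (N_x, N_x), training
K_y = rbf_map(Y, X)               # (N_y, N_x), test in normal condition
K_u = rbf_map(U, X)               # (N_u, N_x), test in damaged condition
```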

4.3 Gaussian mixture model

Another well-known data analysis tool, the GMM, has been used to solve OCC problems in the literature [52]. This approach assumes that the data can be represented by a mixture of \({\mathcal {M}}\) multivariate Gaussian distributions. The outputs of the algorithm are the covariance matrices \(\varvec{\Sigma }_m\) and the mean values \(\varvec{\mu }_m\) of the Gaussian components, with \(m =1,2,\dots ,{\mathcal {M}}\). The GMM algorithm finds the set of parameters \(\varvec{\Sigma }_m\) and \(\varvec{\mu }_m\) that best fits the data distribution through iterative algorithms, such as stochastic gradient descent or Newton–Raphson [16, 17].
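A minimal GMM-based one-class classifier is sketched below with scikit-learn (which fits the mixture by expectation–maximization rather than the solvers cited above); the mixture order \({\mathcal {M}}=10\) and the 1% false-alarm threshold follow Sect. 6, and the data matrices are placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(2399, 2))    # placeholder normalized training features
U = rng.normal(size=(854, 2))     # placeholder damaged-condition test features

M = 10                            # mixture order used in Sect. 6
gmm = GaussianMixture(n_components=M, covariance_type="full", random_state=0).fit(X)
thr = np.quantile(gmm.score_samples(X), 0.01)   # log-likelihood threshold (1% false alarms)
is_anomaly = gmm.score_samples(U) < thr         # True -> damaged condition flagged
```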

4.4 One-class classifier neural network\(^2\)

This algorithm exploits the flexibility of standard feed-forward NNs in an anomaly detection problem. It is based on the OCCNN method [45], which generates artificial anomalous points with a spatial density proportional to the one inferred by Pollard’s estimator [53]. Such anomalous points are then used during training to estimate the class boundaries, and the procedure is repeated several times to refine the edges step by step. Unfortunately, Pollard’s estimator may exhibit accuracy degradation when the distribution of the data set points deviates from a Poisson distribution. Based on these considerations, OCCNN\(^2\) shares the same strategy as OCCNN, but the first boundary estimation is made by an autoassociative neural network (ANN), which is less sensitive to deviations from the Poisson distribution [19].
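The sketch below conveys the two-step idea of OCCNN\(^2\) with off-the-shelf scikit-learn models: an autoencoder-like MLP provides a first boundary estimate via its reconstruction error, after which artificial anomalous points are generated and a feed-forward classifier refines the boundary. The uniform generation of artificial points and the single refinement pass are simplifications of the original method, which matches their density to Pollard’s estimator and iterates.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier, MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2399, 2))                 # placeholder normalized training set

# Step 1: autoassociative NN (layer sizes as in Sect. 6); points with large
# reconstruction error are excluded from the first boundary estimate.
ann = MLPRegressor(hidden_layer_sizes=(50, 20, 10, 1, 10, 20, 50),
                   activation="relu", max_iter=5000).fit(X, X)
err = np.linalg.norm(X - ann.predict(X), axis=1)
inside = err <= np.quantile(err, 0.99)

# Step 2: artificial anomalous points around the data + feed-forward classifier.
lo, hi = X.min(axis=0) - 0.5, X.max(axis=0) + 0.5
fake = rng.uniform(lo, hi, size=X.shape)
Z = np.vstack([X[inside], fake])
labels = np.hstack([np.zeros(inside.sum()), np.ones(len(fake))])
occ = MLPClassifier(hidden_layer_sizes=(50, 50), activation="relu",
                    max_iter=5000).fit(Z, labels)
# occ.predict(new_points) == 1 flags an anomaly; OCCNN^2 iterates this refinement.
```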

Fig. 5
figure 5

Examples of feature transformation due to the effect of a low number of sensors, a low number of bits, and a low number of samples with respect to the standard measurement condition reported on the left

5 Data management

This section analyzes the amount of data that must be stored or transmitted to perform anomaly detection and proposes some strategies to reduce such a volume of data. Considering a network of \(l=8\) synchronized sensors connected to a coordinator that stores the accelerometric measurements, if each sensor collects \(N_{\mathrm{s}}=65,536\) samples per acquisition with \(N_{\mathrm{b}}=16\) resolution bits, the total amount of data stored by the coordinator is \(M_{\mathrm{t}}= N_{\mathrm{s}} N_{\mathrm{b}} N_{\mathrm{a}} l \simeq 32\,\text {Gbit} = 4\,\text {GB}\) for \(N_{\mathrm{a}}=4107\) acquisitions. This considerable amount of data was stored in a year of non-continuous measurements, where the actual acquisition time is \(T_{\mathrm{t}} = T_{\mathrm{a}} N_{\mathrm{a}} \simeq 44{,}860\,\text {min} \simeq 748\,\text {h}\). The volume of data produced by a continuous measurement system over one year would be around \(47\,\text {GB}\). To reduce the amount of data, the first step is decimation. Considering that the fundamental frequencies of the bridge fall in the interval \([0, 20]\,\text {Hz}\), to comply with the sampling theorem with a guard band of \(5\,\text {Hz}\), a sampling frequency \(f_{\mathrm{samp}}=50\,\text {Hz}\) is enough to capture the bridge oscillations. Since the measurements are acquired by accelerometers with \(f_{\mathrm{samp}}=100\,\text {Hz}\), a decimation by a factor of 2 can be adopted so that the data volume is halved: \(M_{\mathrm{d}}=M_{\mathrm{t}}/2 \simeq 2\,\text {GB}\). Starting from the decimated waveforms, three parameters can be tuned to trade off the volume of data against the performance of the OCC algorithms:

  • The number of sensors l; this also reduces the network costs.

  • The number of samples \(N_{\mathrm{s}}\) or equivalently the acquisition time \(T_{\mathrm{a}}\); this has benefits also on the energy consumption and network lifetime in battery-powered sensors [54, 55].

  • The number of bits \(N_{\mathrm{b}}\); this also reduces the accelerometer cost.

How the solutions described above influence the anomaly detectors’ accuracy will be presented and discussed in detail in the next section. In Fig. 5, some working points of the system are reported and compared with the reference working condition after decimation (\(l=8\), \(N_{\mathrm{dec}}=32,768\), \(N_{\mathrm{b}}=16\)). To clarify the effect of the proposed solutions on the amount of data that must be stored to monitor the structure effectively, some acceptable working conditions are reported in Table 1.
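As a quick sanity check on the volumes quoted above and in Sect. 6, the figures can be reproduced with a few lines (using the decimated record length \(N_{\mathrm{dec}}\) where the text does, and the convention 1 GB = \(8\cdot 2^{30}\) bit):

```python
# Data volumes of Sect. 5-6; all parameter values are taken from the text.
N_s, N_dec, N_b, N_a, l = 65_536, 32_768, 16, 4107, 8
GB = 8 * 2**30                                           # bits per gigabyte

print(f"M_t   ~ {N_s   * N_b * N_a * l / GB:.1f} GB")    # raw data           ~ 4.0 GB
print(f"M_d   ~ {N_dec * N_b * N_a * l / GB:.1f} GB")    # after decimation   ~ 2.0 GB
print(f"M_sen ~ {N_dec * N_b * N_a * 3 / GB:.2f} GB")    # 3 sensors          ~ 0.75 GB
print(f"M_sam ~ {600   * N_b * N_a * l / GB:.3f} GB")    # 600 samples        ~ 0.04 GB
print(f"M_bit ~ {N_dec * 8   * N_a * l / GB:.1f} GB")    # 8 bits per sample  ~ 1.0 GB
```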

Table 1 Memory occupation varying decimation, number of sensors, number of samples, and number of bits

6 Numerical results

In this section, the proposed algorithms are applied to the Z-24 bridge data set to detect anomalies based on the estimated fundamental frequencies [9, 56, 57] and a reduced number of features. The performance is evaluated through accuracy, precision, recall, and F\(_1\) score, considering only the test sets:

$$\begin{aligned} \text {Accuracy}&= \frac{T_{\mathrm{P}}+T_{\mathrm{N}}}{T_{\mathrm{P}} +T_{\mathrm{N}}+F_{\mathrm{P}}+F_{\mathrm{N}}}\\ \text {Precision}&= \frac{T_{\mathrm{P}}}{T_{\mathrm{P}}+F_{\mathrm{P}}}\\ \text {Recall}&= \frac{T_{\mathrm{P}}}{T_{\mathrm{P}}+F_{\mathrm{N}}}\\ \text {F}_1\ \text {score}&= 2\cdot \frac{\text {Recall} \cdot \text {Precision}}{\text {Recall}+\text {Precision}} \end{aligned}$$

where \(T_{\mathrm{P}}\), \(T_{\mathrm{N}}\), \(F_{\mathrm{P}}\), and \(F_{\mathrm{N}}\) represent, respectively, the true positive, true negative, false positive, and false negative predictions. Such indicators are obtained by comparing the actual labels \([\zeta ^{(1)},\dots , \zeta ^{(N_{\mathrm{a}})}]\) with those predicted by the OCC, \([\widehat{\zeta }^{(1)},\dots , \widehat{\zeta }^{(N_{\mathrm{a}})}]\). In this application, the labels are 0 for normal condition and 1 for anomalous condition, respectively. Therefore

$$\begin{aligned} T_{\mathrm{P}} = \sum _{a \in {{\mathcal {I}}}_{\mathrm{u}}} \mathbbm {1}\left\{ \zeta ^{(a)},\widehat{\zeta }^{(a)}\right\} \,\,\,\text {and}\,\,\, T_{\mathrm{N}} =\sum _{a \in {{\mathcal {I}}}_{\mathrm{y}}} \mathbbm {1}\left\{ \zeta ^{(a)},\widehat{\zeta }^{(a)}\right\} \end{aligned}$$

with \(F_{\mathrm{N}} = N_{\mathrm{u}}-T_{\mathrm{P}}\), and \(F_{\mathrm{P}} =N_{\mathrm{y}}-T_{\mathrm{N}}\).
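The metrics above can be computed directly from the index sets \({{\mathcal {I}}}_{\mathrm{y}}\) and \({{\mathcal {I}}}_{\mathrm{u}}\); a short sketch (with 0-based indices and illustrative names) is reported below.

```python
import numpy as np

def occ_metrics(zeta_true, zeta_pred, I_y, I_u):
    """Accuracy, precision, recall, and F1 on the test sets only (Sect. 6).
    Labels: 0 = normal condition, 1 = anomaly; I_y, I_u are index arrays."""
    zt, zp = np.asarray(zeta_true), np.asarray(zeta_pred)
    TP = int(np.sum(zt[I_u] == zp[I_u]))    # damaged acquisitions correctly flagged
    TN = int(np.sum(zt[I_y] == zp[I_y]))    # normal acquisitions correctly accepted
    FN = len(I_u) - TP
    FP = len(I_y) - TN
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    f1 = 2 * recall * precision / (recall + precision)
    return accuracy, precision, recall, f1
```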

Fig. 6
figure 6

Comparison of the anomaly detection algorithms in terms of F\(_1\) score, recall, precision, and accuracy

The feature space has dimension \(D = 2\), and the three data sets used for training, test in normal condition, and test in damaged condition have cardinality \(N_{\mathrm{x}}=2399\), \(N_{\mathrm{y}}=854\), and \(N_{\mathrm{u}}=854\), respectively. For PCA, the number of selected components is \(P = 1\). For KPCA, after several tests, the values of P and \(\gamma\) that ensure the minimum reconstruction error are \(P=3\) and \(\gamma =8\). For GMM, the model order that maximizes performance is \({\mathcal {M}}=10\). Regarding OCCNN\(^2\), the first-step boundary estimation is made by a fully connected ANN with 7 layers of, respectively, 50, 20, 10, 1, 10, 20, and 50 neurons with ReLU activation functions, while a fully connected NN with 2 hidden layers of \(L = 50\) neurons each is used for the second step. All the NNs are trained for \(N_{\mathrm{e}}=5000\) epochs with a learning rate \(\rho = 0.05\). The error adopted to evaluate the displacement of the points in the feature space from their original position due to the different configurations is the root mean square error (RMSE), defined as

$$\begin{aligned} {\mathcal {E}}_f = \frac{1}{\sqrt{N_{\mathrm{a}} D}}\sqrt{\sum ^{D}_{s=1} \sum ^{N_{\mathrm{a}}}_{n=1}(f_s^{(n)}-{\bar{f}_s}^{(n)})^2} \end{aligned}$$
(12)

where \(D=2\) is the number of features, \(f_s^{(n)}\) is the sth feature of the nth acquisition in the initial configuration, and \(\bar{f}_s^{(n)}\) is the corresponding data point in the modified configuration.
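Equation (12) is simply the root mean square of the element-wise differences between the reference and modified feature matrices; a short sketch with illustrative names:

```python
import numpy as np

def feature_rmse(F_ref: np.ndarray, F_mod: np.ndarray) -> float:
    """RMSE of Eq. (12); both inputs are (N_a, D) feature matrices."""
    return float(np.sqrt(np.mean((F_ref - F_mod) ** 2)))
```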

6.1 Performance comparison

The performance comparison of the algorithms is reported in Fig. 6. As can be seen, considering the F\(_1\) score and accuracy, OCCNN\(^2\) outperforms the other detectors. The confusion matrices of the four OCCs are reported in Table 2. It is important to notice that \(F_{\mathrm{P}}\) is always small; this is because the decision threshold is set on the false alarm rate (chosen to ensure a false alarm rate equal to 0.01 on the training set).

Table 2 Confusion matrices of the four anomaly detectors analyzed

6.2 Sensors’ relevance

Before evaluating the effect of reducing the number of sensors, it is informative to assess the importance of each one in the modal frequency estimation. It is widely known in the literature that the sensor position strongly affects the mode estimation [9]. To verify the sensors’ relevance, we removed the sensors one by one and evaluated the RMSE of the feature-space points with respect to the standard condition, as outlined in the sketch below. As can be seen in Fig. 7, sensor \(S_{10}\) generates the most significant error in the fundamental frequency extraction when removed. With this approach, it is possible to sort the sensors from the most relevant to the least relevant as follows: \(S_{10}, S_{03}, S_{16}, S_{14}, S_{05}, S_{12}, S_{06}, S_{07}\). To evaluate the effect of the number of sensors, they are removed in the same order indicated above. This way, we always consider the worst possible condition (i.e., the set of accelerometers that results in the worst performance) for a given number of sensors.
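A sketch of this leave-one-out procedure follows; here `extract_features` is a hypothetical stand-in for the whole SSI + mode selection + tracking chain of Sects. 3.1–3.3, assumed to map an \(N_{\mathrm{a}} \times l \times N_{\mathrm{dec}}\) tensor to an \(N_{\mathrm{a}} \times 2\) feature matrix; it is not part of the original code.

```python
import numpy as np

def sensor_relevance(D_tensor: np.ndarray, extract_features, sensor_names):
    """Rank sensors by the feature-space RMSE induced when each one is removed."""
    F_ref = extract_features(D_tensor)               # features with all l sensors
    rmse = {}
    for k, name in enumerate(sensor_names):
        reduced = np.delete(D_tensor, k, axis=1)     # drop the k-th accelerometer
        F_mod = extract_features(reduced)
        rmse[name] = float(np.sqrt(np.mean((F_ref - F_mod) ** 2)))   # Eq. (12)
    # Most relevant sensor first (largest induced error)
    return sorted(rmse, key=rmse.get, reverse=True)
```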

Fig. 7
figure 7

RMSE caused by removing the selected sensor

Fig. 8
figure 8

RMSE and accuracy varying the number of sensors

6.3 Number of sensors

Once the sensor relevance has been identified, we can analyze the performance by varying the number of sensors used for SSI. As can be noticed in Fig. 8, the accuracy remains almost the same as long as the number of sensors is greater than two. In particular, the RMSE shows a significant increase when the number of sensors drops from 3 to 2. Thus, we can deduce that the minimum number of sensors that must be used to monitor the Z-24 bridge is 3. In this configuration, the amount of stored data is reduced to \(M_{\mathrm{sen}}= N_{\mathrm{dec}} N_{\mathrm{b}} N_{\mathrm{a}} \cdot 3 \simeq \,0.8\,\text {GB}\).

6.4 Number of samples

To evaluate the effect of the acquisition time on the anomaly detection performance, we progressively reduced the number of samples used to extract the fundamental frequencies. As can be seen in Fig. 9, the performance of the detectors remains almost constant as long as the number of samples is greater than 600, which corresponds to an acquisition time of \(12\,\text {s}\) with a sampling frequency \(f_{\mathrm{samp}} =50\,\text {Hz}\). By drastically reducing the acquisition time, a significant reduction in data occupation is obtained with no performance degradation; in this configuration, the data volume is \(M_{\mathrm{sam}}= 600\, N_{\mathrm{b}} N_{\mathrm{a}} l \simeq \,0.04\,\text {GB}\).

Fig. 9
figure 9

RMSE and accuracy varying the number of samples

6.5 Number of bits

The number of bits per sample used to encode the waveforms acquired by the accelerometers can also be reduced to lower the volume of stored data and the cost of the sensors. The impact of this choice is reported in Fig. 10. The RMSE remains limited as long as the number of bits per sample is greater than 6; likewise, as expected, the accuracy of the OCCs remains high as long as the error is small. Several low-cost accelerometers with a resolution of \(N_{\mathrm{b}}=8\) bits are available on the market, and these results show that this type of sensor could accomplish the anomaly detection task. In this case, the data occupation is \(M_{\mathrm{bit}}= 8\, N_{\mathrm{dec}} N_{\mathrm{a}} l \simeq \,1\,\text {GB}\). This significant reduction of the number of resolution bits is possible because the anomaly detectors can cope with the error introduced in the modal frequency estimation (depicted in red in Fig. 10) by quantization.
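Lower-resolution sensors can be emulated by re-quantizing the recorded waveforms; a uniform quantizer sketch is shown below (the full-scale choice is an assumption, not specified in the text).

```python
import numpy as np

def requantize(x: np.ndarray, n_bits: int) -> np.ndarray:
    """Uniform re-quantization of an acceleration record to n_bits per sample."""
    full_scale = np.max(np.abs(x))                   # assumed full-scale value
    levels = 2 ** (n_bits - 1)
    step = full_scale / levels
    return np.clip(np.round(x / step), -levels, levels - 1) * step

x_8bit = requantize(np.random.randn(32_768), n_bits=8)   # emulate an 8-bit sensor
```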

Fig. 10
figure 10

RMSE and accuracy varying the number of bits

6.6 Intrinsic noise

A significant effect that should be investigated when designing a sensor network with low-cost devices is the intrinsic noise of the sensors. This kind of noise is strictly related to the type of technology adopted. For example, for micro-electromechanical systems (MEMS) accelerometers, the intrinsic noise can be modeled as a combination of thermal noise, flicker noise, and shot noise [58, 59]; for microcantilevers at high frequencies, dominant noise sources are adsorption–desorption processes, temperature fluctuations, and Johnson noise, whereas the adsorption–desorption noise dominates at low frequencies [60].

To maintain generality with respect to the type of sensor, we consider thermal noise, which is always present in electro-mechanical systems, as the dominant effect, modeled as additive white Gaussian noise with zero mean and variance \(\sigma _N^2\) depending on the sensor characteristics. To evaluate the impact of noise, we assess the performance of the algorithms varying the signal-to-noise ratio (SNR) of the system. Considering the accelerometric measurements gathered by the actual sensors as ideal, we progressively increase the noise variance \(\sigma _N^2\) to obtain an SNR excursion between \(-22\) and 15 dB, as reported in Fig. 11. As can be seen, the RMSE is tolerable when the SNR is greater than \(-12\) dB; likewise, the accuracy of the detectors remains high as long as the error is limited.
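The SNR sweep can be reproduced by adding zero-mean white Gaussian noise whose variance is set from the measured signal power, treating the recorded signal as noise-free (as assumed above); a sketch with illustrative names:

```python
import numpy as np

def add_noise(x: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean white Gaussian noise so that the record reaches the target SNR."""
    sigma2 = np.mean(x ** 2) / (10 ** (snr_db / 10))   # noise variance for target SNR
    return x + rng.normal(scale=np.sqrt(sigma2), size=x.shape)

rng = np.random.default_rng(0)
x = np.random.randn(32_768)                            # placeholder acceleration record
noisy = {snr: add_noise(x, snr, rng) for snr in range(-22, 16)}   # sweep as in Fig. 11
```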

Fig. 11
figure 11

RMSE and accuracy in the presence of intrinsic noise varying the SNR

Fig. 12
figure 12

RMSE [Hz] of fundamental frequencies estimation varying the number of bits, sensors, and samples. Red regions represent combinations of parameters that lead to high feature extraction error (RMSE above 0.09 Hz); blue ones identify parameters settings that result in low RMSE

6.7 Joint effect of sensors, samples, and bits

Due to the interdependence between the analyzed sensing parameters, they cannot be adjusted independently to guarantee a predefined performance. Therefore, in Fig. 12, the feature extraction RMSE is reported for different combinations of the number of bits, sensors, and samples. Red regions represent combinations of parameters that lead to a high feature extraction error, while blue ones identify parameter settings that result in a low RMSE. For example, reducing the number of sensors l from 8 to 3 and the number of bits \(N_{\mathrm{b}}\) from 16 to 8, we comply with the limits reported in Figs. 8 and 10; however, as depicted in Fig. 12, the combination of these two values results in an unacceptable RMSE. Although the interplay between the acquisition parameters is not straightforward, important guidelines can be derived from the heat maps in Fig. 12:

  • From the first row of plots, it can be seen that the number of sensors l necessary to guarantee a low RMSE decreases by lowering the acquisition time.

  • From the middle row of plots, we deduce that the number of bits is not a critical parameter in the feature extraction process due to the similar shapes reported in the heat maps.

  • From the last row, we can infer that 4 sensors are enough to ensure low extraction error when the number of samples \(N_{\mathrm{s}}\) is greater than 1000.

  • From the first plot of the first row, we can see that there are no possible configurations that contain the error with a number of samples \(N_{\mathrm{s}}\) lower than 500.

7 Conclusions

In this paper, we presented an SHM system that aims to extract damage-sensitive features with the minimum amount of resources necessary for damage detection with high accuracy. In addition, an overview of some widely used anomaly detection algorithms is provided. Three different approaches are proposed to reduce the volume of data stored or transmitted, limiting the costs of sensors and network infrastructure: (i) when the goal is to reduce the amount of stored data, it is good practice to reduce the observation time and use several high-resolution sensors; (ii) if the target is to minimize the sensor cost, a good practice is to adopt several low-resolution sensors combined with a long observation time; (iii) when the objective is to contain the network infrastructure cost, it is recommended to adopt high-resolution sensors and a long observation time. We show that the best practice to reduce the total amount of data, and hence the memory occupation, without affecting the performance is to reduce the observation time. In the considered scenario, this approach reduces the memory occupation from 4 to 0.04 GB.

RMSE and accuracy are used to evaluate the error introduced by the data containment strategies and the corresponding performance of the algorithms. The results show that such strategies, when properly designed, can be adopted without a significant loss of performance. In fact, all the algorithms except PCA ensure an anomaly detection accuracy greater than 94% in all of the proposed configurations, with the best performance reached by OCCNN\(^2\), whose accuracy never drops below 95%.