The impact of sensing parameters on data management and anomaly detection in structural health monitoring

The massive and autonomous structural health monitoring (SHM) of bridges is a problem of growing interest due to its importance and topicality. However, a considerable amount of data must be processed and managed in such an application. This paper proposes a set of machine learning (ML) tools to detect anomalies in a bridge from vibrational measurements using the minimum amount of data. The proposed framework starts from the fundamental frequencies extracted through operational modal analysis (OMA) and clustering, followed by a density-based time-domain tracking algorithm. The extracted fundamental frequencies are then fed to one-class classification (OCC) algorithms that perform anomaly detection. Then, to reduce the amount of data, we analyze the effect of the number of sensors, the number of bits per sample, the observation time, and the measurement noise on damage detection performance. As a case study, the Z-24 bridge is considered because of its extensive database of accelerometric measurements in both standard and damaged conditions. A comparison of OCC algorithms, such as principal component analysis (PCA), kernel principal component analysis (KPCA), Gaussian mixture model (GMM), and one-class classifier neural network (OCCNN²), is performed, and their robustness to data shrinking is evaluated. In many cases, OCCNN² improves the performance with respect to classical anomaly detection techniques in terms of accuracy.


Introduction
Nowadays, structural health monitoring (SHM) represents a fundamental research field in a society where historical and modern infrastructures must coexist harmoniously. To preserve the integrity of thousands of structures, contain maintenance costs, and increase safety, early detection of anomalies before severe damages occur is a cornerstone of civil engineering [1].
As far as bridges are concerned, some statistics highlight the relevance of the problem. For example, in Italy there are currently almost 2000 bridges that require continuous and accurate monitoring; in France, 4000 bridges need to be restored and 840 are considered in critical condition; in Germany, 800 bridges are considered critical; in the United States of America, among the 600,000 bridges, according to a conservative estimate, at least 9% are considered deficient [2]. In this context, SHM offers several solutions for anomaly detection [3][4][5].
In the literature, numerous damage detection and localization strategies have been proposed and tested [6,7]. Part of them focuses on extracting the most significant damage-sensitive features of the structure under analysis. However, data management requires further investigation to determine how the sensing parameters impact the anomaly detectors' performance. Generally, damage detection techniques can be divided into model-free and model-based: in the former, the only information available is the one gathered by measurements (e.g., acceleration, temperature, position) [8], while in the latter, information comes from measurements and prior knowledge of the model of the structure [9,10]. Model-based approaches tend to outperform model-free ones because of the prior knowledge of the structure; however, their solutions for a specific case are not easily generalizable due to the tight coupling with the model. The effects of environmental parameters, such as temperature, wind, and humidity [11][12][13][14], and the influence of traffic loading [15] are usually taken into account to describe the structure's behaviour exhaustively. In this work, environmental and traffic dependencies have been extracted directly from the data by the anomaly detectors. The constraint on the knowledge about traffic and environmental parameters has been relaxed thanks to the analysis of long-term continuous monitoring measurements, which explore all the possible operational configurations of the structure. Since the monitoring procedure can be complex and requires fine-tuning of several parameters that depend on the structure under analysis, the adoption of machine learning (ML) techniques to detect changes in the damage-sensitive features has recently received increasing interest [16][17][18][19][20].
In particular, in [21] a convolutional neural network (NN) is adopted to perform automatic feature extraction and damage detection simultaneously, reducing the computational cost of the procedure. A multi-layer perceptron NN is used to evaluate the effectiveness of a previous feature extraction procedure in [22], and a NN is adopted to predict bridge accelerations in order to extract damage-sensitive features in [23]. These approaches are typically tested on scaled or simulated structures, without considering the amount of data to be managed in a real-world monitoring scenario. Sensor placement is a widely investigated topic in SHM. Usually, the proposed strategies start from a model of the structure and place the sensors to minimize a cost function [24][25][26]. In this work, we propose a different paradigm. Starting from an oversized number of sensors, we first evaluate the effect of each accelerometer on the overall data set of measurements, then we select the subset of sensors necessary to preserve high performance on the anomaly detection task. Moreover, data management in SHM is still an open problem that can be addressed at the sensor network level [27,28] or at the structure network level [29,30]. In this work, we provide fundamental guidelines to deploy the monitoring system (at both the sensor network level and the structure network level), taking into account the constraints introduced by the available resources.
The proposed framework starts with the extraction of the fundamental frequencies from accelerometric measurements through stochastic subspace identification (SSI), cleaning, and clustering [9,16,[31][32][33][34][35][36], and then performs modal frequency tracking in the time domain [19]. The first two fundamental frequencies are used as a feature space to train one-class classifiers that perform anomaly detection, intended as the detection of any non-negligible deterioration of the structure that affects its standard behaviour. In this work, we investigate the impact of data shrinking strategies on damage detection in bridges. In particular, we derive the performance of ML-based anomaly detection techniques varying the number of sensors, samples, and resolution bits to minimize the data storage/transmission requirements, in view of large-scale bridge monitoring.
This goal is crucial for several reasons: (i) If damage detection is performed locally on the bridge, reducing the size of the data set can contain the energy consumption for processing and allow the use of low-cost computational units. (ii) If data are processed remotely and a battery-powered wireless network enables the connection with the remote server, reducing the amount of data can increase the network lifetime. Moreover, several internet of things (IoT) solutions have limited throughput, so reducing the volume of data collected may pave the way for the use of IoT networks in bridge monitoring. (iii) Continuous monitoring over the years generates a huge amount of data. Therefore, to contain the database size, it is recommended to use the minimum amount of information necessary.
To summarize, the main contributions are the following:
- The performance of several ML algorithms for anomaly detection, such as principal component analysis (PCA), kernel principal component analysis (KPCA), Gaussian mixture model (GMM), and one-class classifier neural network (OCCNN²), is compared in terms of accuracy, precision, recall, and F1 score.
- The effect of the number of sensors on the algorithms' performance is investigated.
- The impact of the number of samples and the resolution bits (bits per sample) on the classification accuracy is quantified.
- To account for low-cost sensors in a typical large-scale monitoring, the effect of measurement noise power on damage detection is investigated [37].
- The combined effects of the number of sensors, number of samples, and resolution bits are analyzed to find operational limits that ensure a predefined performance of the classification task.
The performance of the proposed solution is investigated on a real structure data set using the accelerometric data available for the Z-24 bridge [38,39]. Throughout this paper, capital boldface letters denote matrices and tensors, lowercase bold letters denote vectors, (⋅)^T stands for transposition, (⋅)^+ indicates the Moore-Penrose pseudoinverse operator, ‖⋅‖ is the ℓ2-norm of a vector, ℜ{⋅} and ℑ{⋅} are the real and imaginary parts of a complex number, respectively, 𝕍{⋅} is the variance operator, and 1{a, b} is the indicator function, equal to 1 when a = b and zero otherwise. This paper is organized as follows. In Sect. 2, a brief overview of the acquisition system, the accelerometer setup, and the monitoring scenario is presented. The fundamental frequencies extraction technique adopted is described in Sect. 3. A survey of anomaly detection techniques is reported in Sect. 4. The volume of data generated by the acquisition system and some possible strategies to reduce it are presented in Sect. 5. Numerical results are given in Sect. 6. Finally, conclusions are drawn in Sect. 7.

System configuration
The Z-24 bridge was located in the Swiss canton of Bern. The bridge was part of the road connection between Koppigen and Utzenstorf, overpassing the A1 highway between Bern and Zurich. It was a classical post-tensioned concrete two-cell box girder bridge with a main span of 30 m and two side spans of 14 m. The bridge was built as a freestanding frame, with the approaches backfilled later. Both abutments consisted of triple concrete columns connected with concrete hinges to the girder. Both intermediate supports were concrete piers clamped into the girder. An extension of the bridge girder at the approaches provided a sliding slab. All supports were rotated with respect to the longitudinal axis, which yielded a skew bridge. The bridge was demolished at the end of 1998 [38]. During the year before its demolition, the bridge was subjected to long-term continuous monitoring to quantify the environmental variability of the bridge dynamics. Moreover, progressive damage tests took place over a month, shortly before the complete demolition of the bridge, alternated with short-term monitoring tests while the continuous monitoring system was still running. The tests proved experimentally that realistic damage has a measurable influence on bridge dynamics.

Data collection and pre-processing
The accelerometers' positions and their measurement axes are shown in Fig. 1. In this work, we considered l = 8 accelerometers, identified as 03, 05, 06, 07, 10, 12, 14, and 16, which are present both in the long-term continuous monitoring phase and in the progressive damage one. The accelerometer orientation is highlighted in Fig. 1 with different colors: red, green, and blue, denoting transversal, vertical, and longitudinal orientation, respectively. Every hour, N_s = 65,536 samples are acquired from each sensor with sampling frequency f_samp = 100 Hz, which corresponds to an acquisition time T_a = 655.36 s. Each accelerometer has a built-in anti-aliasing filter. Since measurements are not always available, there are N_a = 4107 acquisitions, collected over a period of 44 weeks.
The block diagram depicted in Fig. 2 represents the sequence of tasks performed for the fully automatic anomaly detection approach presented in this work: signal acquisition, processing, feature extraction, tracking, and anomaly detection. Some pre-processing steps have been applied to the data to reduce disturbances, the computational cost, and the memory occupation of the subsequent elaborations. This can be particularly useful when computational resources are limited, and in wireless networks, where the amount of data to be stored and transmitted represents an important constraint. First, a decimation by a factor of 2 is applied to each acquisition; hence, the sampling frequency is scaled to f_samp = 50 Hz. Such a sampling frequency is considered sufficient because the Z-24 fundamental frequencies fall in the [2.5, 20] Hz frequency range [38]. After decimation, data are processed with a band-pass finite impulse response (FIR) filter of order 30 with band [2.5, 20] Hz, to remove out-of-band disturbances. At the end of the decimation step, the number of samples for each acquisition is already halved (N_dec = N_s/2 = 32,768), which represents a first important step in the data management process.
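The decimation and band-pass filtering stage described above can be sketched as follows (a minimal illustration using SciPy; the function name and structure are ours, while the parameter values are taken from the text):

```python
import numpy as np
from scipy import signal

def preprocess(acc, fs=100.0, band=(2.5, 20.0), fir_order=30):
    """Decimate by 2, then band-pass filter in [2.5, 20] Hz.

    acc : 1-D array of raw accelerometer samples at fs Hz.
    Returns the filtered signal and the new sampling frequency.
    """
    dec = signal.decimate(acc, 2)                 # anti-aliased decimation by 2
    fs_dec = fs / 2                               # 50 Hz after decimation
    taps = signal.firwin(fir_order + 1, band,     # FIR of order 30 -> 31 taps
                         pass_zero=False, fs=fs_dec)
    return signal.lfilter(taps, 1.0, dec), fs_dec

x = np.random.randn(65536)                        # one hour of raw samples
y, fs_dec = preprocess(x)                         # y has N_dec = 32,768 samples
```

Note that `scipy.signal.decimate` applies its own anti-aliasing filter before downsampling, mirroring the role of the built-in analog filter mentioned above.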
To keep the notation compact, from now on we consider the data organized in a tensor D of dimensions N a × l × N dec .

Fundamental frequencies extraction
In this section, we approach the problem of extracting damage-sensitive features from the accelerometric measurements of the structure. Among a wide set of algorithms, we select the SSI, a data-driven strategy able to provide damage-sensitive features without a priori information about the structure [9]. After this procedure, a mode selection phase is performed to distinguish physical modes from spurious ones. Four widely known metrics, presented in the following, are used to accomplish this task [32,34,35,41]. Finally, the K-means algorithm is applied to cluster the data [16,31], followed by a tracking algorithm performed on the first two fundamental frequencies to filter outliers [19].

Stochastic subspace identification
SSI requires the selection of a model order n ∈ ℕ and a time-lag i ≥ 1. The constraint l · i ≥ n must be ensured to correctly apply the algorithm [9]. In this application, we consider the model order n unknown, so it is varied in the range n ∈ [2, 160] (with step 2), while the time-lag is i = 60 [9].
First of all, for a given time-lag i, shift s, and acquisition a, we define the li × li block Toeplitz matrix

T_s|s+i−1^(a) = [ R_s+i−1 R_s+i−2 … R_s ; R_s+i R_s+i−1 … R_s+1 ; … ; R_s+2i−2 R_s+2i−3 … R_s+i−1 ]   (1)

where R_τ is a correlation matrix of dimension l × l, estimated from the matrix D(a, :, :) extracted from the data tensor D by selecting a particular acquisition a. We drop the index a to simplify the notation; for this reason, all the following tasks will be repeated for each acquisition. In order to factorize the block Toeplitz matrix (1) with s = 1, we apply the singular value decomposition (SVD)

T_1|i = U^(n) S^(n) (V^(n))^T   (2)

where U^(n) is an li × n matrix that contains the left singular vectors arranged in columns, (V^(n))^T is an n × li matrix that contains the right singular vectors arranged in rows, and S^(n) is an n × n diagonal matrix that contains the singular values on its diagonal sorted in descending order. We also drop the index n, so that the next steps will be applied for each model order. Selecting the correct number of singular values from the SVD, the matrix T_1|i can be split in two parts

T_1|i = O_i Γ_i   (3)

where

O_i = U S^{1/2} T,   Γ_i = T^{−1} S^{1/2} V^T   (4)

represent, respectively, the observability matrix and the reversed controllability matrix. In (4) the matrix T is set equal to the identity matrix I, because it plays the role of a similarity transformation applied to the state-space model. The matrices A, C, and G represent the state matrix, the output influence matrix, and the next state-output covariance matrix, respectively. The matrices C and G can be easily extracted from O_i and Γ_i; consequently, A can be calculated from (1) as A = O_i^+ T_2|i+1 Γ_i^+. Applying now the eigenvalue decomposition to A, we get

A = Ψ Λ Ψ^{−1}   (5)

where Ψ is a matrix that contains the eigenvectors arranged in columns, and Λ = diag(λ̃_1, …, λ̃_n) is an n × n diagonal matrix that contains the n eigenvalues of the state matrix. Reintroducing now the previously dropped indices, we can estimate the continuous-time damage-sensitive parameters of the pth mode as follows:

λ_p^(a,n) = f_samp ln λ̃_p^(a,n),   f_p^(a,n) = |λ_p^(a,n)| / (2π),   ζ_p^(a,n) = −ℜ{λ_p^(a,n)} / |λ_p^(a,n)|,   φ_p^(a,n) = C^(a,n) ψ_p^(a,n),

where φ_p^(a,n) is an l × 1 mode shape vector, and ψ_p^(a,n) is the pth column vector of Ψ^(a,n) defined in (5).
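The factorization steps above can be condensed into a toy covariance-driven SSI routine (a simplified sketch of our own, not the authors' code; the similarity transformation is set to the identity and no stabilization logic is included):

```python
import numpy as np

def ssi_cov(Y, i, n, fs):
    """Toy covariance-driven SSI for a fixed model order n.

    Y  : (l, N) array of sensor outputs; i : time lag; fs : sampling rate (Hz).
    Returns natural frequencies (Hz), damping ratios, and mode shapes.
    """
    l, N = Y.shape
    # output correlation matrices R_s, s = 1 .. 2i
    R = [Y[:, s:] @ Y[:, :N - s].T / (N - s) for s in range(1, 2 * i + 1)]
    # block Toeplitz T_{1|i}: (j, k)-th l x l block is R_{i+j-k}
    T1 = np.block([[R[i + j - k - 1] for k in range(i)] for j in range(i)])
    # shifted block Toeplitz T_{2|i+1}: (j, k)-th block is R_{i+1+j-k}
    T2 = np.block([[R[i + j - k] for k in range(i)] for j in range(i)])
    U, s, Vt = np.linalg.svd(T1)
    S = np.diag(np.sqrt(s[:n]))
    Oi = U[:, :n] @ S                         # observability matrix
    Gi = S @ Vt[:n]                           # reversed controllability matrix
    A = np.linalg.pinv(Oi) @ T2 @ np.linalg.pinv(Gi)   # state matrix
    C = Oi[:l]                                # output influence matrix
    lam_d, Psi = np.linalg.eig(A)
    lam = np.log(lam_d.astype(complex)) * fs  # continuous-time poles
    f = np.abs(lam) / (2 * np.pi)             # natural frequencies (Hz)
    zeta = -lam.real / np.abs(lam)            # damping ratios
    phi = C @ Psi                             # mode shapes, one per column
    return f, zeta, phi
```

On a clean two-sensor sinusoid at 5 Hz, this sketch recovers the oscillation frequency from the estimated state matrix; the paper's pipeline additionally sweeps n to build the stabilization diagram of Fig. 3a.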
Figure 3a reports the stabilization diagram obtained by extracting the natural frequencies through the described procedure for the first acquisition (a = 1), varying the model order n.

Mode selection
The SSI algorithm generates a broad set of modes; some of these are real, others are spurious and must be ignored. Several approaches are present in the literature to accomplish this task [32]. In this work, we use four metrics to evaluate whether a mode is real or spurious: modal assurance criterion (MAC), mean phase deviation (MPD), damping ratio check, and complex conjugate poles check. In the following, we briefly describe each metric.
MAC. It is a dimensionless squared correlation coefficient between mode shapes, defined as [41]

MAC(φ_p^(a,n), φ_q^(a,j)) = |(φ_p^(a,n))^H φ_q^(a,j)|² / ( ((φ_p^(a,n))^H φ_p^(a,n)) ((φ_q^(a,j))^H φ_q^(a,j)) )   (6)

with values between 0 and 1. When the MAC is greater than 0.9, the mode is considered physical; otherwise, it is discarded. A metric based on the MAC can also be defined as follows [39]:

d_m(a, n, j, p, q) = |λ̃_p^(a,n) − λ̃_q^(a,j)| / max( |λ̃_p^(a,n)|, |λ̃_q^(a,j)| ) + 1 − MAC(φ_p^(a,n), φ_q^(a,j))   (7)

where the first term measures the distance between the pth and qth eigenvalues. Using (7), a mode can be considered physical when d_m(a, n, j, p, q) < 0.1.
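The MAC can be computed in a few lines as a direct transcription of its definition (function and variable names are ours):

```python
import numpy as np

def mac(phi_p, phi_q):
    """Modal assurance criterion between two (possibly complex) mode shapes."""
    num = np.abs(np.vdot(phi_p, phi_q)) ** 2          # |phi_p^H phi_q|^2
    return num / (np.vdot(phi_p, phi_p).real * np.vdot(phi_q, phi_q).real)
```

By construction, MAC equals 1 for collinear mode shapes (any scaling of the same shape) and 0 for orthogonal ones, which is what makes the 0.9 threshold a scale-invariant test.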
MPD. It measures the deviation of the mode shape components from the mean phase (MP) of all its components. Usually, to evaluate the MP, the SVD of the l × 2 matrix [ℜ{φ_p^(a,n)}, ℑ{φ_p^(a,n)}] = U S V^T is adopted [39], where U is l × 2, S is 2 × 2, and V is 2 × 2; the MP is the phase angle identified by the right singular vector associated with the smallest singular value. The MPD can therefore be evaluated as the mean deviation of the phase of each mode shape component from the MP. When the ratio MPD(φ_p^(a,n))/(π/2) > 0.75, a mode is considered spurious and is neglected; otherwise, it is considered a physical one.
Damping ratio and complex conjugate poles. In an actual structure, the damping ratio evaluated for each mode must be positive and lower than 0.2; for this reason, only modes with 0 < ζ_p^(a,n) < 0.2 are considered. Furthermore, if ℜ{λ_p^(a,n)} > 0, the mode represents an unstable structure and hence is ignored.
In Fig. 3b, the stabilization diagram after mode selection is shown; from now on, the remaining modes are denoted with a bar, e.g., λ̄_p^(a,n).

Data partitioning for damage detection
At the end of the mode selection procedure, a clustering and tracking phase is applied to extract the temporal profile of the first two fundamental frequencies of each acquisition, which form the two-dimensional feature vector used for damage detection. (Fig. 3 shows an example of stabilization diagram for the first hour of monitoring: (a) through SSI, (b) after mode selection and clustering; vertical blue lines represent the estimated frequencies after the clustering procedure.)

Principal component analysis
This technique remaps the training data from the feature space ℝ^D into a subspace ℝ^P (where P < D is the number of components selected) that minimizes the Euclidean distance between the data in the feature space and their projection onto the chosen subspace [48]. To find the best subspace to project the training data, the evaluation of the D × D sample covariance matrix Σ_x is needed. The sample covariance matrix Σ_x can be factorized by eigenvalue decomposition as

Σ_x = V_x Λ_x V_x^T

where V_x is an orthonormal matrix whose columns are the eigenvectors, while Λ_x is a diagonal matrix that contains the D eigenvalues. The eigenvalue magnitude measures the importance of the direction pointed by the corresponding eigenvector. In our setting, we select the largest component, hence P = 1; therefore, the best linear subspace of dimension one is spanned by V_P, the eigenvector related to the largest eigenvalue of Σ_x. The projection onto the subspace is obtained by multiplying the data by V_P, i.e., X_P = X V_P, Y_P = Y V_P, and U_P = U V_P. The error is evaluated by reconstructing the data in the original feature space, i.e., X̃ = X_P V_P^T, Ỹ = Y_P V_P^T, and Ũ = U_P V_P^T. After the reconstruction, it is possible to calculate the error as the Euclidean distance between the original and reconstructed data.
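The project-and-reconstruct scoring described above can be sketched as follows (our illustrative code on synthetic data; in the paper the rows would be the D = 2 tracked frequencies):

```python
import numpy as np

def pca_anomaly_score(X_train, Z, P=1):
    """Euclidean reconstruction error of PCA with P principal components.

    X_train : (N, D) training features; Z : (M, D) points to score.
    """
    mu = X_train.mean(axis=0)
    cov = np.cov(X_train - mu, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)      # eigenvalues in ascending order
    V = eigvec[:, -P:]                        # top-P eigenvectors
    Zc = Z - mu
    Z_rec = (Zc @ V) @ V.T                    # project onto subspace, reconstruct
    return np.linalg.norm(Zc - Z_rec, axis=1) # per-point reconstruction error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.1])   # elongated normal cloud
on_axis = pca_anomaly_score(X, np.array([[5.0, 0.0]]))
off_axis = pca_anomaly_score(X, np.array([[0.0, 5.0]]))
```

A point lying along the principal direction reconstructs almost perfectly, while a point off that axis yields a large error and would be flagged as anomalous.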
Unfortunately, PCA can be ineffective when the number of frequencies considered is low. Moreover, the variability of the estimated frequencies due to environmental effects can affect its performance [49]. This is because PCA finds only linear boundaries in the original feature space; therefore, it is recommended when the dimensionality of the problem is high and classes can be well separated via hyperplanes.

Kernel principal component analysis
Due to the inability of PCA to find non-linear boundaries, here we propose KPCA as an alternative [50]. KPCA first maps the data with a non-linear function, named kernel, and then applies standard PCA to find a linear boundary in the new feature space. The kernel function, applied to the linear boundary, makes it non-linear in the original feature space. A delicate point in the development of the KPCA algorithm is the choice of the kernel function. In [51], where the data distribution is unknown, the radial basis function (RBF) kernel is proposed as the right candidate. Given a generic point z that corresponds to a 1 × D vector, we can apply the RBF as

K(z)_n = exp( −‖z − x_n‖² / (2σ²) )

where σ is a kernel parameter (which controls the width of the Gaussian function) that must be set properly, x_n is the nth row of X, and K(z)_n is the nth component of the point in the kernel space. Overall, the vector z is mapped into the vector [K(z)_1, …, K(z)_{N_x}]. Remapping all the data in the kernel space, we obtain the matrices K_x of size N_x × N_x for training, K_y of size N_y × N_x for validation, and K_u of size N_u × N_x for test, respectively. Applying now the PCA to the new data set, it is possible to find non-linear boundaries in the original feature space. (Fig. 5 shows examples of feature transformation due to the effect of a low number of sensors, a low number of bits, and a low number of samples with respect to the standard measurement condition.)
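The explicit RBF mapping described above can be sketched as follows (our illustrative code; σ = 8 is the value reported later in the numerical results):

```python
import numpy as np

def rbf_features(Z, X_train, sigma=8.0):
    """Map each row z of Z to [K(z)_1, ..., K(z)_Nx] with an RBF kernel.

    Z : (M, D) points; X_train : (Nx, D) training set; sigma : kernel width.
    Returns an (M, Nx) matrix of kernel features.
    """
    # pairwise squared Euclidean distances between rows of Z and X_train
    d2 = ((Z[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 1.0]])
K_x = rbf_features(X, X)         # N_x x N_x training kernel matrix
```

Applying `rbf_features` to the training, validation, and test sets yields exactly the K_x, K_y, and K_u matrices of the text, on which standard PCA is then run.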

Gaussian mixture model
Another well-known data analysis tool, the GMM, has been used to solve OCC problems in the literature [52]. This approach assumes that data can be represented by a mixture of M multivariate Gaussian distributions. The outputs of the algorithm are the covariance matrices Σ_m and the mean vectors μ_m of the Gaussian components, with m = 1, 2, …, M. The GMM algorithm finds the set of parameters Σ_m and μ_m of the Gaussian mixture that best fits the data distribution through iterative algorithms, such as stochastic gradient descent or Newton-Raphson [16,17].
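A minimal sketch of GMM-based one-class scoring is shown below (using scikit-learn, which fits the mixture by expectation-maximization rather than the solvers cited above; M = 10 and the 1% false-alarm threshold follow the numerical results section, while the data are synthetic):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 2))                   # synthetic "normal" features

# Fit an M = 10 component mixture on normal-condition data only
gmm = GaussianMixture(n_components=10, random_state=0).fit(X_train)

# Threshold the log-likelihood at the 1% quantile of the training scores,
# i.e., a 1% false-alarm rate on the training set
thr = np.quantile(gmm.score_samples(X_train), 0.01)

far_point = np.array([[8.0, 8.0]])                    # clearly out-of-distribution
flagged = gmm.score_samples(far_point)[0] < thr       # True -> anomaly
```

Setting the threshold on the training-score quantile is the same false-alarm-rate criterion used for all detectors in the performance comparison.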

One-class classifier neural network²
This algorithm exploits the flexibility of the standard feed-forward NN in an anomaly detection problem. It is based on the OCCNN method [45], which generates artificial anomalous points with a spatial density proportional to the one inferred by Pollard's estimator [53]. Such anomalous points are then used during training to estimate the class boundaries. This procedure is repeated several times to refine the edges step-by-step. Unfortunately, Pollard's estimator may exhibit accuracy degradation when the distribution of the data set points deviates from Poisson. Based on these considerations, OCCNN² shares the same strategy as OCCNN, but the first boundary estimation is made by an autoassociative neural network (ANN), which is less sensitive to deviations from the Poisson distribution [19].

Data management
This section analyzes the amount of data that must be stored or transmitted to perform anomaly detection and proposes some strategies to reduce such volume of data.
Considering a network of l = 8 synchronized sensors interconnected to a coordinator that stores the accelerometric measurements, it is easy to observe that if each sensor collects N_s = 65,536 samples per acquisition with N_b = 16 resolution bits, the total amount of data stored by the coordinator is M_t = N_s N_b N_a l ≃ 32 Gbit = 4 GB for N_a = 4107 acquisitions. This considerable amount of data has been stored in a year of non-continuous measurements, where the actual acquisition time is T_t = T_a N_a ≃ 44,860 min ≃ 748 h. The volume of data in a continuous measurement system would be around 47 GB per year. To reduce the amount of data, the first step is decimation. Considering that in this application the fundamental frequencies of the bridge fall in the interval [0, 20] Hz, to comply with the sampling theorem with a guard band of 5 Hz, a sampling frequency f_samp = 50 Hz is enough to capture the bridge oscillations. Since the measurements are acquired by accelerometers with f_samp = 100 Hz, a decimation by a factor of 2 can be adopted so that the data volume is halved: M_d = M_t/2 ≃ 2 GB. Starting from the decimated waveforms, three parameters can be tuned to trade off between the volume of data and the performance of the OCC algorithms:
- the number of sensors l, which also reduces the network costs;
- the number of samples N_s or, equivalently, the acquisition time T_a, which also benefits energy consumption and network lifetime in battery-powered sensors [54,55];
- the number of bits N_b, which also reduces the accelerometer cost.
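The storage figures above can be checked with a few lines of arithmetic:

```python
# Parameters from the text: samples per acquisition, bits per sample,
# number of acquisitions, number of sensors
N_s, N_b, N_a, l = 65536, 16, 4107, 8

M_t_bits = N_s * N_b * N_a * l           # total raw volume in bits
M_t_GB = M_t_bits / 8 / 1024**3          # -> about 4 GB
M_d_GB = M_t_GB / 2                      # after decimation by 2 -> about 2 GB
```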
The solutions described above and how they influence the anomaly detectors' accuracy will be presented and widely discussed in the next section. In Fig. 5, some working points of the system are reported and compared with the reference working condition after decimation (l = 8, N_dec = 32,768, N_b = 16). To clarify the effect of the proposed solutions on the amount of data that must be stored to monitor the structure effectively, some acceptable working conditions are reported in Table 1.

Numerical results
In this section, the proposed algorithms are applied to the Z-24 bridge data set to detect anomalies based on the fundamental frequencies estimation [9,56,57] and a reduced number of features. The performance is evaluated through accuracy, precision, recall, and F1 score, considering only the test set:

Accuracy = (TP + TN) / (TP + TN + FP + FN),   Precision = TP / (TP + FP),   Recall = TP / (TP + FN),   F1 = 2 · Precision · Recall / (Precision + Recall)

where TP, TN, FP, and FN represent, respectively, the number of true positive, true negative, false positive, and false negative predictions. Such indicators are obtained by comparing the actual labels [c^(1), …, c^(N_a)] with those predicted by the OCC [ĉ^(1), …, ĉ^(N_a)]. In this application, labels are 0 for normal condition and 1 for anomalous condition, respectively; therefore FN = N_u − TP and FP = N_y − TN. The feature space has dimension D = 2, and the three data sets used for training, test in normal condition, and test in damaged condition have cardinality N_x = 2399, N_y = 854, and N_u = 854, respectively. For PCA, the number of components selected is P = 1. For KPCA, after several tests, the values of P and σ that ensure the minimum reconstruction error are P = 3 and σ = 8. For GMM, the model order that maximizes performance is M = 10. Regarding OCCNN², the first-step boundary estimation is made by a fully connected ANN with 7 layers of, respectively, 50, 20, 10, 1, 10, 20, and 50 neurons with ReLU activation functions, and a fully connected NN with 2 hidden layers of L = 50 neurons each for the second step. All the NNs are trained for N_e = 5000 epochs with a learning rate of 0.05. The error adopted to evaluate the displacement of the feature-space points is the root mean square error (RMSE)

RMSE = sqrt( (1/(N_a N_s)) Σ_{n=1}^{N_a} Σ_{s=1}^{N_s} ( f_s^(n) − f̃_s^(n) )² )

where N_s is the number of features (N_s = 2), f_s^(n) is the sth feature of the nth acquisition in the initial configuration, and f̃_s^(n) is the corresponding data point in the modified configuration.
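The four metrics can be computed directly from the confusion-matrix counts (the counts in the usage example below are hypothetical, chosen only to be consistent with N_y = N_u = 854; they are not the paper's results):

```python
def occ_metrics(TP, TN, FP, FN):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    acc = (TP + TN) / (TP + TN + FP + FN)
    prec = TP / (TP + FP)
    rec = TP / (TP + FN)
    f1 = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f1

# Hypothetical counts: TP + FN = N_u = 854, TN + FP = N_y = 854
acc, prec, rec, f1 = occ_metrics(TP=800, TN=845, FP=9, FN=54)
```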

Performance comparison
The performance comparison of the algorithms is reported in Fig. 6. As we can see, considering the F1 score and accuracy, OCCNN² outperforms the other detectors. The confusion matrices for the four OCCs are reported in Table 2. It is important to notice that FP is always small because the threshold is set on the false alarm rate (chosen to ensure a false alarm rate equal to 0.01 on the training set).

Sensors' relevance
Before evaluating the effect of reducing the number of sensors, it is informative to assess each sensor's importance in the modal frequencies estimation. It is widely known in the literature that the sensor position strongly affects the mode estimation [9]. To verify the sensors' relevance, we removed sensors one by one and evaluated the RMSE between the feature-space points and those of the standard condition. As can be seen in Fig. 7, sensor S10 generates the most significant error in the fundamental frequencies extraction when removed. With this approach, it is possible to sort the sensors from the most relevant to the least relevant as follows: S10, S03, S16, S14, S05, S12, S06, S07. To evaluate the effect of the number of sensors, they are removed in the same order indicated above. This way, we always consider the worst possible condition (i.e., the set of accelerometers that results in the worst performance) for a given number of sensors.
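The leave-one-out relevance ranking can be sketched as follows (our illustrative code; re-extracting the fundamental frequencies without each sensor is assumed to happen upstream, via the SSI pipeline):

```python
import numpy as np

def rank_sensors(features_full, features_without):
    """Rank sensors by the feature-space RMSE their removal induces.

    features_full    : (N, 2) frequencies extracted with all l sensors.
    features_without : dict sensor_id -> (N, 2) frequencies re-extracted
                       with that sensor removed.
    Returns sensor ids sorted from most to least relevant.
    """
    rmse = {s: np.sqrt(np.mean((features_full - f) ** 2))
            for s, f in features_without.items()}
    return sorted(rmse, key=rmse.get, reverse=True)   # largest RMSE first

# Toy usage: removing "S10" perturbs the features more than removing "S07"
F = np.zeros((4, 2))
variants = {"S10": F + 1.0, "S07": F + 0.1}
order = rank_sensors(F, variants)
```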

Number of sensors
Once the sensor relevance is identified, we can analyze the performance by varying the number of sensors used for SSI. As can be noticed in Fig. 8, the accuracy remains almost the same as long as the number of sensors is greater than two. In particular, the RMSE shows a significant increase when the number of sensors drops from 3 to 2. Thus, we can deduce that the minimum number of sensors that must be used to monitor the Z-24 bridge is 3. In this configuration, the amount of data stored is reduced to M_sen = N_dec N_b N_a · 3 ≃ 0.8 GB.

Number of samples
To evaluate the effect of the acquisition time on the anomaly detection performance, we progressively reduced the number of samples used to extract the fundamental frequencies.
As we can see in Fig. 9, the performance of the detectors remains almost constant as long as the number of samples N_s is greater than 600, which corresponds to an acquisition time of 12 s with a sampling frequency f_samp = 50 Hz. By drastically reducing the acquisition time, a significant reduction in data occupation is obtained, which in this configuration is M_sam = 600 N_b N_a l ≃ 0.04 GB, with no performance degradation.

Number of bits
The number of bits per sample used to encode the waveforms extracted from the accelerometers can also be reduced to decrease the volume of data stored and the cost of the sensor. Such an impact is reported in Fig. 10. The RMSE remains limited as long as the number of bits per sample is greater than 6; likewise, as expected, the accuracy of the OCCs remains high as long as the error is small. Several low-cost accelerometers are available on the market with a resolution N_b = 8, and these results show that this type of sensor could accomplish the anomaly detection task. In this case, the data occupation is M_bit = 8 N_dec N_a l ≃ 1 GB. This relevant reduction of the number of resolution bits is possible because of the anomaly detectors' capability to cope with the error introduced in the modal frequencies estimation (depicted in red in Fig. 10) caused by quantization.
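The effect of a reduced resolution can be emulated by re-quantizing the recorded waveforms (a simple uniform-quantizer sketch of ours; the symmetric full-scale range is an assumption, as real accelerometers specify their own ADC range):

```python
import numpy as np

def requantize(x, n_bits, full_scale):
    """Uniformly re-quantize a waveform to n_bits per sample.

    full_scale : assumed symmetric range [-full_scale, +full_scale].
    """
    levels = 2 ** n_bits
    step = 2 * full_scale / levels                      # quantization step
    q = np.clip(np.round(x / step), -levels // 2, levels // 2 - 1)
    return q * step

t = np.linspace(0, 1, 1000)
x = np.sin(2 * np.pi * 5 * t)
err8 = np.sqrt(np.mean((x - requantize(x, 8, 1.0)) ** 2))   # RMS error, 8 bits
err4 = np.sqrt(np.mean((x - requantize(x, 4, 1.0)) ** 2))   # RMS error, 4 bits
```

Each dropped bit roughly doubles the quantization step, and hence the RMS quantization error, which is the error the detectors must absorb.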

Intrinsic noise
A significant effect that should be investigated when designing a sensor network with low-cost devices is the intrinsic noise of the sensors. This kind of noise is strictly related to the type of technology adopted. For example, for microelectromechanical systems (MEMS) accelerometers, the intrinsic noise can be modeled as a combination of thermal noise, flicker noise, and shot noise [58,59]; for microcantilevers at high frequencies, dominant noise sources are adsorption-desorption processes, temperature fluctuations, and Johnson noise, whereas the adsorption-desorption noise dominates at low frequencies [60].
To maintain generality about the type of sensor, we consider thermal noise, always present in electro-mechanical systems, as the dominant effect, modeled as additive white Gaussian noise with zero mean and variance σ_N² depending on the sensor characteristics. To evaluate the impact of noise, we assess the performance of the algorithms varying the signal-to-noise ratio (SNR) of the system. Considering the accelerometric measurements gathered by the actual sensors as ideal, we progressively increase the noise variance σ_N² to obtain an SNR excursion between −22 and 15 dB, as reported in Fig. 11. As we can see, the RMSE is tolerable when the SNR is greater than −12 dB; likewise, the accuracy of the detectors remains high as long as the error is limited.
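Noise injection at a target SNR can be sketched as follows (our illustrative code; the noise variance is derived from the measured signal power so that the requested SNR is attained):

```python
import numpy as np

def add_awgn(x, snr_db, seed=0):
    """Add zero-mean white Gaussian noise so that x attains the given SNR (dB)."""
    rng = np.random.default_rng(seed)
    p_sig = np.mean(x ** 2)                          # empirical signal power
    sigma2_n = p_sig / 10 ** (snr_db / 10)           # noise variance for target SNR
    return x + rng.normal(scale=np.sqrt(sigma2_n), size=x.shape)

x = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 10000))
y = add_awgn(x, snr_db=-12.0)                        # SNR at the tolerability limit
```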

Joint effect of sensors, samples, and bits
Due to the interdependence between the analyzed sensing parameters, it is not possible to adjust them independently to guarantee a predefined performance. Therefore, in Fig. 12, the feature extraction RMSE varying the number of bits, sensors, and samples is reported for different combinations of the acquisition parameters. Red regions represent combinations of parameters that lead to high feature extraction error, while blue ones identify parameter settings that result in low RMSE. For example, reducing the number of sensors l from 8 to 3 and the number of bits N_b from 16 to 8, we comply with the limits reported in Figs. 8 and 10, but as depicted in Fig. 12, the combination of these two values results in an unacceptable RMSE. Although the interplay between acquisition parameters is not straightforward, important guidelines can be derived from the heat maps in Fig. 12:
- From the first row of plots, it can be seen that the number of sensors l necessary to guarantee a low RMSE decreases by lowering the acquisition time.
- From the middle row of plots, we deduce that the number of bits is not a critical parameter in the feature extraction process, due to the similar shapes reported in the heat maps.
- From the last row, we can infer that 4 sensors are enough to ensure low extraction error when the number of samples N_s is greater than 1000.
- From the first plot of the first row, we can see that there is no possible configuration that contains the error when the number of samples N_s is lower than 500.

Conclusions
In this paper, we presented an SHM system that aims to extract damage-sensitive features with the minimum amount of resources necessary for damage detection with high accuracy. In addition, an overview of some widely used anomaly detection algorithms is provided. Three different approaches are proposed to reduce the volume of data stored or transmitted, to limit costs for sensors and network infrastructure: (i) when the goal is to reduce the amount of data stored, it is good practice to reduce the observation time; (ii) if the target is to minimize the sensor cost, a good practice is to adopt several low-resolution sensors combined with a long observation time; (iii) when the objective is to contain the network infrastructure cost, it is recommended to adopt high-resolution sensors and a long observation time. We show that the best practice to reduce the total amount of data, and hence the memory occupation, without affecting the performance is to reduce the observation time. In the considered scenario, this approach reduces the memory occupation from 4 to 0.04 GB. RMSE and accuracy are used to evaluate the error introduced by the data containment strategies and the corresponding performance of the algorithms. The results show that such strategies, when properly designed, can be adopted without significant loss of performance. In fact, all the algorithms except PCA ensure an anomaly detection accuracy greater than 94% in all of the proposed configurations, with the best performance reached by OCCNN², whose accuracy never drops below 95%.