Data mining algorithms for bridge health monitoring: Kohonen clustering and LSTM prediction approaches
Abstract
In recent years, bridge health monitoring systems have been widely deployed, and they produce massive amounts of data as monitoring time grows. How to use these data effectively to analyze the state of a bridge comprehensively and provide early warning of structural changes is an important topic in bridge engineering research. This paper applies two algorithms to such data, namely the Kohonen neural network and the long short-term memory (LSTM) neural network. The main contribution of this study is the use of these two algorithms for health state evaluation of bridges. The Kohonen clustering method is shown to be effective for obtaining the classification pattern under normal operating conditions and is straightforward to use for outlier detection. In addition, the LSTM prediction method predicts future deflection values with good accuracy and low mean square error. The predicted deflections agree with the true deflections, which indicates that the LSTM method can be used to obtain the deflection value of a structure. Moreover, the changing trend of the bridge structure can be observed by comparing the predicted value with its limit value under normal operation.
Keywords
Health monitoring, Structural assessment, Kohonen clustering, Time series, Long short-term memory
1 Introduction
Changes in weather, environmental erosion, natural disasters and increasing traffic loading can continually modify the behavior of bridges and even cause their deterioration over long-term service [1]. Bridge health monitoring (BHM) has been recognized as an attractive tool to improve the health and safety of bridges and to provide early warning of structural damage [2, 3, 4]. A typical BHM system provides various kinds of real-time information such as temperature, wind, crack, deflection and stress. This information can be used to judge the bridge health status with a corresponding diagnostic method. Thus, bridge health monitoring and safety assessment are inevitable requirements for the sustainable development of bridge engineering.
The detection efficiency of sensors installed on a bridge gradually deteriorates during long-term operation, which affects the reliability of the monitoring data. In addition, severe environmental conditions have a serious impact on the quality of the collected data [2]. Moreover, the collected data are of many different types and their amount is huge. Therefore, the difficulty in current research and application of bridge health monitoring systems is often not collecting the monitoring data, but extracting useful information from the massive data accumulated over time [5]. However, most bridge management systems currently use the data collected from the sensors without analysis, which greatly affects the quality of the data and leads bridge structure damage identification and fault diagnosis technology into a “data disaster” dilemma. The purpose of this paper is to use data mining methods to overcome the problems of subjectivity, high computational complexity, low sensitivity and complicated technology in traditional analysis.
With the uninterrupted work of the bridge monitoring data acquisition system, a large amount of data is transmitted and stored in the database every day, so the data accumulated over the entire bridge operation period are massive. Traditional database technology has shown its weakness in storing and processing such a large amount of data. The emergence of technologies such as Hadoop provides an excellent solution for massive data processing.
Data processing is a core component of a bridge health monitoring system. High-quality data are the basis for accurately judging the health status of bridges. With the increasing scale of bridge monitoring data, traditional database technology has become less capable of storing and processing the data. To meet the challenges brought by data growth in the bridge monitoring system, combining the bridge data monitoring and prediction functions with Hadoop has theoretical significance and practical value. In this work, we built a three-node Hadoop cluster and tested the proposed algorithms in a distributed environment.
In this paper, we focus on processing the monitoring data to obtain structured data. Then, based on the clustering theory of data mining, we use the Kohonen neural network to group the structured data and interpret the points in each cluster. We have also established an LSTM network model that predicts the deflection, which can reflect the health of the bridge structure, by analyzing the historical monitoring data. Both methods can be used to evaluate the health status of bridges to some degree. The former obtains the classification pattern of the data collected under normal operating conditions and then provides corresponding early warning through the analysis of outliers generated when new data are input to the network. The main steps of this work are as follows:
1. We preprocess the data collected by the sensors of the bridge health monitoring system. Firstly, we fill the missing values by analyzing the range of data values as well as the historical and empirical data. Secondly, we analyze the cause of abnormal data and repair them. Finally, we merge the data that are closest in acquisition time and then standardize each record.
2. We use the Kohonen neural network to self-adaptively cluster the processed structured data and obtain the classification pattern of the data collected under normal operating conditions. Then, we analyze the outliers generated when new data are input to the network and provide corresponding early warning according to the results.
3. We use the time series prediction model established by the LSTM neural network to predict the deflection value. We can then obtain the health status by comparing the predicted value with its limit value under normal operation.
2 Literature review
To ensure the safety and durability of bridges, research and evaluation systems for bridge health monitoring were developed in the twentieth century and have produced extensive achievements [6, 7, 8]. The UK first developed long-term health monitoring devices and a data acquisition and processing system in the early 1980s [9]. This research was applied to the Foyle Bridge, a continuous steel box girder bridge in Northern Ireland. This system is the earliest relatively complete structural monitoring system. In the mid-to-late 1980s, the USA began to install sensors on a number of bridges, such as the Sunshine Skyway Bridge in Florida, on which more than 500 sensors were installed to measure temperature, strain and displacement.
BHM in China began in the 1990s, when monitoring systems of different scales were installed as a large number of bridges were built [10]. In 2000, the Hong Kong Highways Department installed a large number of sensors for temperature, wind, strain, displacement and acceleration monitoring on the Tsing Ma Bridge and the Ting Kau Bridge. On the Tsing Ma Bridge alone, more than 800 sensors were installed, including GPS surveying instruments and anemometers. The Nanjing Yangtze River Bridge, Xupu Bridge and Humen Bridge were all equipped with monitoring systems to monitor the structural response shortly after their completion.
Hadoop, an open-source distributed framework launched by Apache in 2007, is a big data solution widely promoted and applied by academia and industry. Hadoop mainly includes the Hadoop Distributed File System (HDFS) and MapReduce, a parallel computing model. Hadoop is widely used for large-scale data processing on clusters. In terms of productivity and maturity, it is better to run such frameworks on a supercomputer than to develop a new framework from scratch. Many supercomputing centers have begun to officially support the Hadoop environment [11].
Because it is easy to carry out, the environmental vibration test is widely used in bridge structure detection, but it can only obtain basic modal parameters such as frequency, mode shape and damping ratio [12, 13]. Such methods often provide a global analysis of structural integrity but do not give the location of the damage, which leads to ineffective decision-making for structural management. To overcome this limitation and address the challenge of damage detection, many authors have proposed data mining approaches for health assessment in BHM. Researchers have used time series and chaos theory to predict the mid-span deflection of the Ma Sangxi Bridge [14]. They found that multistep recurrent BP neural networks and RBF neural networks are effective in predicting chaotic time series, and their results show that a multistep recurrent BP neural network can predict the health of bridges. Another study proposes a clustering method that groups the nodes with similar behavior on bridges and then detects the abnormal joints [15]. This method learns the most representative behavioral models from historical data and detects potential damage in the structure using the distance from the normal model. All in all, data mining technology has great advantages in dealing with the massive data collected by bridge health monitoring systems.
Time series analysis is the study of data collected in chronological order. In general, a time series contains data sequences obtained at a fixed sampling interval. At present, time series prediction is widely used. A practical time series prediction-based nonstationarity detection method was proposed in [16]. Chen et al. [17] introduced a wind speed prediction method based on a nonlinear-learning ensemble of LSTMs, a support vector regression machine (SVRM) and extremal optimization (EO); the proposed system achieved satisfactory wind speed prediction performance. In [18], machine learning algorithms (MLAs) were used to predict the wind speed of the Osorio wind farm. The optimal intelligence model was identified for different time intervals, with past data as the input and future data as the MLA output. Additionally, in [19], the authors proposed a multi-level attention-based network to predict geo-sensory time series based on heterogeneous data from multiple domains.
3 Method
In bridge health monitoring, it is unclear what kind of relationships exist in a large amount of historical data. Classification results will not be very good because classification rules are not available in advance. Therefore, clustering is more appropriate for health monitoring data because it does not require prior knowledge of the attribute relationships in the data. In addition, bridge structure damage forms gradually in the course of operation, and we should mine the law by which structural performance changes with time from a large number of data samples. Therefore, a long short-term memory neural network is used to predict the values of some important parameters that reflect the bridge health state.
3.1 Kohonen neural network

Step 1: Randomly initialize all weights \(W_{ij}\) (\(i=1,2,\dots ,m\), where m is the total number of input nodes; \(j=1,2,\dots ,n\), where n is the total number of output nodes).

Step 2: Compute the Euclidean distance \(\Vert X_i - W_j\Vert\) between each input sample and the weight vectors, where \(X_i\) represents the \(i_{\mathrm{th}}\) sample.

Step 3: Define the neuron c with the smallest Euclidean distance, \(\Vert X_i - W_c\Vert = \min _j \Vert X_i - W_j\Vert\), as the winner. We denote the winner’s neighborhood as \(N_c(t)\).

Step 4: Adapt the weights of the winner and its specified neighbors:
$$\begin{aligned} \left\{ \begin{array}{ll} W_{ij}(t+1) = W_{ij}(t) + \alpha (t)[X_i - W_{ij}(t)], &{}\quad j\in N_c(t)\\ W_{ij}(t+1) = W_{ij}(t),&{}\quad j\notin N_c(t) \end{array} \right. \end{aligned}$$(1)
where \(W_{ij}(t)\) represents the weight between neuron i and neuron j at time t, \(\alpha (t) = \alpha (0)(1 - t/T)\) is the learning rate and T is the total training time. Only the weights of neurons within the neighborhood \(N_c(t)\) are moved toward the input; all other weights are left unchanged.

Step 5: Adjust the neighborhood size and learning rate. The network is considered to have converged when the learning rate \(\alpha (t)\) decreases to zero or \(N_c(t)\) shrinks to an acceptable neighborhood range; the result is then recorded.

Step 6: Continue training the network in the aforementioned manner until it satisfies the convergence conditions.

After sufficient training, the Kohonen network will provide a graph in which several independent regions are formed to represent specific clusters.
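The six steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names `train_som` and `best_matching_unit`, the rectangular output grid, the linearly shrinking neighborhood radius and the default values of `epochs` and `alpha0` are hypothetical choices. Only the linearly decaying learning rate \(\alpha (t) = \alpha (0)(1 - t/T)\) and the rule that moves the winner and its neighbors toward the input sample follow the description above.

```python
import math    # math.dist requires Python 3.8 or later
import random


def best_matching_unit(weights, x):
    """Steps 2-3: return the grid position of the node closest to sample x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(samples, n_rows, n_cols, epochs=20, alpha0=0.5, seed=0):
    """Train a small Kohonen map; returns {grid position: weight vector}."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Step 1: random initial weights, one vector per output node.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(n_rows) for c in range(n_cols)}
    radius0 = max(n_rows, n_cols) / 2.0   # assumed initial neighborhood size
    total = epochs * len(samples)         # T, the total number of updates
    t = 0
    for _ in range(epochs):               # Step 6: repeat until training ends
        for x in samples:
            # Step 5: learning rate and neighborhood both shrink linearly.
            alpha = alpha0 * (1.0 - t / total)
            radius = radius0 * (1.0 - t / total)
            winner = best_matching_unit(weights, x)
            # Step 4: move the winner and its grid neighbors toward x.
            for node, w in weights.items():
                if math.dist(node, winner) <= radius:
                    for k in range(dim):
                        w[k] += alpha * (x[k] - w[k])
            t += 1
    return weights
```

After training, `best_matching_unit(weights, x)` gives the cluster (output node) of a record `x`, so records that land on the same node or on adjacent nodes can be read as one region of the map.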
3.2 Long short-term memory neural network
4 Discussion and experiments
The bridge monitoring system operates 24 h a day after installation, and the monitoring data are uploaded and stored in real time. Assuming that 5 GB of data are stored per day, the storage required for 1 year is about 1825 GB. Storing all of this information but rarely using it would not only be meaningless but would also consume a lot of storage space. Therefore, in this section, we analyze the data collected by the bridge monitoring system in order to mine useful information from the massive data.
Before the experiment, we built and configured the Hadoop distributed cluster with three nodes. The master node was named lab01, and the two child nodes were named lab02 and lab03. A root user was created on each computer.
4.1 Preprocessing bridge monitoring data

In the health monitoring system, sensors with different attributes start collecting data at different times and sample at different frequencies. To merge the data, we take the item with the highest acquisition frequency as the reference and combine the records that are closest in acquisition time. The total number of samples after merging is 8735.

In addition, data may be missing because of short-term severe environmental changes or interference during data transmission. The missing values can be replaced by the record of the previous time step because the parameters of a bridge will not change very much in a short time.
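The merging and filling steps described above can be illustrated with a short sketch. It assumes each sensor stream is a time-sorted list of `(timestamp_in_seconds, value)` pairs and that missing readings are stored as `None`; the function names `merge_nearest` and `forward_fill` are hypothetical and not taken from the monitoring system.

```python
from bisect import bisect_left


def forward_fill(values):
    """Replace each missing value (None) with the previous valid record."""
    filled, last = [], None
    for v in values:
        if v is None:
            v = last          # stays None if the series starts with a gap
        filled.append(v)
        last = v
    return filled


def merge_nearest(base, other):
    """For every record in `base`, attach the `other` value closest in time.

    Both arguments are non-empty lists of (timestamp, value) sorted by time.
    """
    times = [t for t, _ in other]
    merged = []
    for t, v in base:
        pos = bisect_left(times, t)
        # Only the neighbors just before and just after t can be the nearest.
        candidates = [k for k in (pos - 1, pos) if 0 <= k < len(other)]
        k = min(candidates, key=lambda idx: abs(times[idx] - t))
        merged.append((t, v, other[k][1]))
    return merged
```

Here `base` would be the stream with the highest acquisition frequency, so that no record of the densest series is dropped during merging.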

Analyzing the variation trend of the local maximum and minimum values of each parameter helps us understand its variation characteristics over local time periods and track the change trend of the bridge structure. As shown in Fig. 3, the maximum value of deflection_{1} tends to decrease. To facilitate future data analysis, we need to find the maximum, minimum and average value of each parameter. The statistics of some properties are shown in Table 1.
The monitoring values from different types of sensors have different ranges, so attributes with small values may be neglected in modeling and training. Therefore, to improve the effectiveness of the model and balance the influence of each attribute on the model, it is necessary to standardize the values of each attribute. The standardization process is as follows:
$$\begin{aligned} y_i = \frac{x_i - A_{{\mathrm{min}}}}{A_{{\mathrm{max}}} - A_{{\mathrm{min}}}} \end{aligned}$$(9)
where \(x_i\) and \(y_i \,(i=1,2,3,\dots ,n)\) are the original and normalized values, respectively, and \(A_{{\mathrm{min}}}\) and \(A_{{\mathrm{max}}}\) represent the minimum and maximum values of attribute A, respectively. The standardized values of some attributes are shown in Table 2.
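The min-max standardization described above maps every attribute to the interval [0, 1]. A minimal sketch follows; the function name `min_max_scale` and the handling of a constant attribute (returning zeros) are assumptions made for illustration.

```python
def min_max_scale(values, a_min=None, a_max=None):
    """Scale values to [0, 1] using (x - A_min) / (A_max - A_min).

    a_min and a_max default to the minimum and maximum of `values`; pass them
    explicitly to reuse bounds computed on historical data.
    """
    a_min = min(values) if a_min is None else a_min
    a_max = max(values) if a_max is None else a_max
    span = a_max - a_min
    if span == 0:
        # Assumed convention: a constant attribute carries no information.
        return [0.0 for _ in values]
    return [(x - a_min) / span for x in values]
```

For example, with the Deflection_{1} bounds of 9.6 mm and 44.5 mm, a hypothetical reading of 34.13 mm is scaled to about 0.703, the same order of magnitude as the \(d_1\) values listed after standardization.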

Principal component analysis (PCA) creates a smaller replacement set of variables that captures the essence of the original attributes, and it can be used for compression and dimensionality reduction of bridge monitoring data. Through PCA we redefine each property and obtain new variables that are uncorrelated with each other, which is beneficial for the precise training of the model. In this study, we use PCA to reduce the original data to 200 dimensions as the input of the Kohonen clustering network, because in this case the clustering time is shorter and the Calinski–Harabasz index of the clustering result is higher.
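The index used above to compare clustering results is the ratio of between-cluster dispersion to within-cluster dispersion, each divided by its degrees of freedom; a higher value indicates more compact and better separated clusters. The sketch below computes it from scratch so that the definition is explicit. The function name `calinski_harabasz` is hypothetical, and the code assumes at least two clusters and a nonzero within-cluster dispersion.

```python
def calinski_harabasz(points, labels):
    """Return [B / (k - 1)] / [W / (n - k)] for a labeled set of points.

    B is the between-cluster dispersion, W the within-cluster dispersion,
    n the number of points and k the number of clusters.
    """
    n, dim = len(points), len(points[0])

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Group the points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)

    overall = centroid(points)
    between = 0.0
    within = 0.0
    for rows in clusters.values():
        c = centroid(rows)
        between += len(rows) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in rows)
    return (between / (k - 1)) / (within / (n - k))
```

Running the clustering once per candidate dimensionality and keeping the setting with the highest index is one way to make the choice of the number of retained components reproducible.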
Table 1 The statistical values of some properties
Parameter  Max  Min  Average

Crack_{1} (mm)  0.07  −0.17  −0.01
Crack_{2} (mm)  0.18  −0.05  0.09
Deflection_{1} (mm)  44.5  9.6  27.03
Deflection_{2} (mm)  43.4  6.6  25.23
Humidity_{1} (%)  83.3  36.09  60.39
Humidity_{2} (%)  40.96  4.08  22.70
Temperature_{1} (°C)  37.29  1.59  19.39
Temperature_{2} (°C)  37.06  2.1  19.62
Table 2 Some data records after standardization
Time  \(d_1\)  \(d_2\)  \(t_1\)  \(t_2\) 

2017/7/10 1:00  0.703  0.694  0.722  0.688 
2017/7/10 2:00  0.701  0.694  0.710  0.686 
2017/7/10 3:00  0.701  0.694  0.705  0.682 
2017/7/10 4:00  0.700  0.692  0.702  0.680 
2017/7/10 5:00  0.698  0.691  0.681  0.678 
2017/7/10 6:00  0.697  0.690  0.669  0.674 
2017/7/10 7:00  0.696  0.688  0.684  0.668 
2017/7/10 8:00  0.695  0.686  0.694  0.667 
2017/7/10 9:00  0.693  0.684  0.684  0.667 
2017/7/10 10:00  0.693  0.683  0.671  0.671 
2017/7/10 11:00  0.693  0.681  0.662  0.678 
2017/7/10 12:00  0.693  0.682  0.662  0.687 
2017/7/10 13:00  0.692  0.683  0.687  0.698 
2017/7/10 14:00  0.695  0.681  0.713  0.710 
2017/7/10 15:00  0.697  0.682  0.718  0.723 
4.2 Clustering analysis model based on Kohonen neural network
Analysis of the clustering result: distance between clusters
Cluster pair  Distance

(0, 1)  2.43 
(0, 2)  2.78 
(0, 3)  1.06 
(1, 2)  2.79 
(1, 3)  1.67 
(2, 3)  1.93 
The number and proportion of records in each cluster
Label  Number  Proportion (%) 

0  2493  28.54 
1  2316  26.51 
2  1659  18.99 
3  2267  25.96 
The average value of some attributes in different clusters
Label  \(d_1\) (mm)  \(t_1\) (°C)  \(c_1\) (mm) 

0  34.661  26.638  0.012 
1  18.481  11.052  −0.002
2  19.200  12.140  −0.027
3  33.072  25.221  −0.037
The standard deviation of some attributes in different clusters
Label  \(d_1\) (mm)  \(t_1\) (°C)  \(c_1\) (mm) 

0  5.784  5.687  0.028 
1  16.796  16.221  0.025 
2  16.453  15.587  0.046 
3  4.091  4.211  0.056 
We can also see from Fig. 5 that Cluster 1 and Cluster 2 have relatively large dispersion, because the records in Cluster 1 and Cluster 2 differ considerably. The temperature and deflection values are lower in Cluster 1, whereas they are relatively high in Cluster 2. This is consistent with the principle of clustering based on similarity and dissimilarity in the data, which means that clustering can reveal the inner characteristics and distribution rules of the data.
The Kohonen network clustering above forms four distinct clusters, which generally contain the patterns of monitoring data under normal conditions, and each cluster represents a category with similar properties. We can then input new data into the Kohonen neural network and compare the clustering results (including the proportion of each cluster and the average and standard deviation of each property) with the results obtained from the normal data. If there is a large deviation, the data may be abnormal. Therefore, the Kohonen clustering model can be used for outlier detection and can provide an early warning reference for the bridge structure.
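One simple way to quantify such a deviation is to compare the share of records falling into each cluster for a new batch against the share observed under normal operation. The sketch below does this for the cluster proportions only; the function names `proportions` and `proportion_drift` are hypothetical, and any warning threshold applied to the result would have to be calibrated on real data, since no threshold is specified here.

```python
from collections import Counter


def proportions(labels):
    """Return {cluster label: share of records assigned to that cluster}."""
    counts = Counter(labels)
    total = len(labels)
    return {lab: counts[lab] / total for lab in counts}


def proportion_drift(baseline, new):
    """Largest absolute change in cluster share between two batches.

    Clusters that are missing from one batch count as a share of 0.
    """
    all_labels = set(baseline) | set(new)
    return max(abs(baseline.get(lab, 0.0) - new.get(lab, 0.0))
               for lab in all_labels)
```

The same comparison can be repeated for the per-cluster average and standard deviation of each property, so that a warning is raised when any of the three summaries moves well outside its normal range.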
4.3 Time series prediction based on LSTM
In the bridge structure, girder deflection can truly reflect the bridge health condition: it is not prone to noise and is sensitive to structural damage, so it allows a timely response to the condition of the bridge structure under the joint action of load and external environment. Therefore, we use the LSTM neural network to predict the deflection in BHM.
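How a deflection series can be presented to an LSTM may be easier to follow with a small sketch. The block below builds sliding-window training pairs and runs one forward step of a standard LSTM cell with a forget gate, as introduced by Gers et al. [22]. Everything specific in it is an assumption for illustration: the names `make_windows` and `lstm_step`, the `lookback` length and the parameter layout are hypothetical, and the architecture and hyperparameters of the network used in this study are not reproduced. A practical model would be built and trained with a deep learning library rather than written by hand.

```python
import math


def make_windows(series, lookback):
    """Turn a series into (past `lookback` values, next value) training pairs."""
    return [(series[t - lookback:t], series[t])
            for t in range(lookback, len(series))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell.

    x       : input vector at the current time step
    h_prev  : previous hidden state, c_prev: previous cell state
    params  : {"f" | "i" | "o" | "g": (W, U, b)} with W of shape
              hidden x input, U of shape hidden x hidden, b of length hidden
    """
    def affine(gate):
        W, U, b = params[gate]
        return [sum(w * xi for w, xi in zip(W[k], x))
                + sum(u * hi for u, hi in zip(U[k], h_prev))
                + b[k]
                for k in range(len(b))]

    f = [sigmoid(z) for z in affine("f")]    # forget gate
    i = [sigmoid(z) for z in affine("i")]    # input gate
    o = [sigmoid(z) for z in affine("o")]    # output gate
    g = [math.tanh(z) for z in affine("g")]  # candidate cell state
    # New cell state: keep part of the old memory, add part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed view of the cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

Feeding the values of one window through `lstm_step` in order and mapping the final hidden state to a scalar gives a one-step-ahead deflection estimate; training adjusts the parameters so that this estimate matches the target of each window.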
MSE and the proportion of predictions within different error bounds for some deflections
Parameter  Error within 0.5 mm (%)  Error within 1 mm (%)  Error within 2 mm (%)  MSE

Deflection_{22}  95  98  100  0.083 
Deflection_{23}  85  97  99  0.192 
Deflection_{24}  99  100  100  0.024 
Deflection_{25}  83  95  99  0.227 
Deflection_{26}  98  100  100  0.043 
Deflection_{27}  74  95  99  0.319 
For deflection_{25}, 83% of the differences between the predicted and real values are within 0.5 mm, 95% are within 1 mm and 99% are within 2 mm. We can also see that the smaller the MSE, the closer the predicted deflection is to the real deflection. The bridge health state can be identified through the predicted deflection value. First, we take the maximum deflection obtained under the combined effects of the serviceability limit state and the bearing capacity limit state as the threshold. We then compare the predicted value with the threshold. Finally, we provide early warning if the predicted value goes above the threshold.
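The evaluation and warning logic just described can be written down compactly. The sketch below is illustrative only: the function names `evaluate` and `exceeds_limit` are hypothetical, treating "within" as an absolute error less than or equal to the tolerance is an assumption, and the limit value must come from the structural design checks, not from this code.

```python
def evaluate(predicted, actual, tolerances=(0.5, 1.0, 2.0)):
    """Return (MSE, {tolerance in mm: percentage of predictions within it})."""
    errors = [p - a for p, a in zip(predicted, actual)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    within = {tol: 100.0 * sum(abs(e) <= tol for e in errors) / n
              for tol in tolerances}
    return mse, within


def exceeds_limit(predicted, limit):
    """Indices of predicted deflections above the limit value (warning points)."""
    return [k for k, p in enumerate(predicted) if p > limit]
```

Applied to the predictions for one measuring point, `evaluate` yields the same kind of summary as one row of the accuracy table, and a non-empty result from `exceeds_limit` marks the time steps at which an early warning would be issued.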
5 Conclusion
With the continuous development of bridge construction and more and more bridges being put into operation, the amount of data generated in bridge monitoring keeps increasing and the structure of the data is becoming more complex. The problem that urgently needs to be solved is how to extract effective information from these data to guide the maintenance and operation of bridges.
In this paper, we analyze the data collected by sensors in the bridge monitoring system and give a method of data preprocessing based on the theory of data mining. We use the Kohonen neural network to self-adaptively group the processed structured data and obtain the classification pattern of the data collected under normal operating conditions, which can provide corresponding early warning and reveal abnormal changes of the bridge structure through the analysis of outliers. In addition, we use the time series prediction model established by the LSTM neural network to predict the deflection value. At the best-performing point, 99% of the data have an error of less than 0.5 mm, and at the worst-performing point 74% of the data meet this accuracy requirement. Moreover, at every deflection point at least 95% of the data show a prediction error of less than 1 mm. Therefore, LSTM can be used to predict the value of monitoring parameters in BHM. We can observe the changing trend of the bridge structure by comparing the predicted value with the limit value under normal operation. All of the work above is still at an initial stage of exploration. In future work, we will focus on transferring it from theory to engineering practice.
Notes
Author’s contribution
AG conceived and designed the experiments; AJ analyzed the data; JL performed the experiments; XL wrote the paper.
Funding
This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
Compliance with ethical standards
Conflict of interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship and publication of this article.
References
 1. Li A, Ding Y, Wang H, Guo T (2012) Analysis and assessment of bridge health monitoring mass data—progress in research/development of “structural health monitoring”. Sci China Technol Sci 55(8):2212–2224
 2. Worden K, Cross E (2018) On switching response surface models, with applications to the structural health monitoring of bridges. Mech Syst Signal Process 98:139–156
 3. Xia Q, Cheng Y, Zhang J, Zhu F (2016) In-service condition assessment of a long-span suspension bridge using temperature-induced strain data. J Bridge Eng 22(3):04016124
 4. Zhou G, Li A, Li J, Duan M (2018) Structural health monitoring and time-dependent effects analysis of self-anchored suspension bridge with extra-wide concrete girder. Appl Sci 8(1):115
 5. Forstner E, Wenzel H (2011) The application of data mining in bridge monitoring projects: exploiting time series data of structural health monitoring. In: 2011 22nd International Workshop on Database and Expert Systems Applications. IEEE, pp 297–301
 6. Alokita S, Rahul V, Jayakrishna K, Kar V, Rajesh M, Thirumalini S, Manikandan M (2019) Recent advances and trends in structural health monitoring. In: Jawaid M, Thariq M, Saba N (eds) Structural health monitoring of biocomposites, fibre-reinforced composites and hybrid composites. Elsevier, Amsterdam, pp 53–73
 7. Zambon I, Vidović A, Strauss A, Matos J (2019) Condition prediction of existing concrete bridges as a combination of visual inspection and analytical models of deterioration. Appl Sci 9(1):148
 8. Carnevale M, Collina A, Peirlinck T (2019) A feasibility study of the drive-by method for damage detection in railway bridges. Appl Sci 9(1):160
 9. Cornwell P, Farrar CR, Doebling SW, Sohn H (1999) Environmental variability of modal properties. Exp Tech 23(6):45–48
 10. Guofang C, Ning L (2011) Research on the health monitoring for long span bridge in China. In: 2011 International Conference on Electric Technology and Civil Engineering (ICETCE). IEEE, pp 6814–6816
 11. Dao TC, Chiba S (2016) HPC-Reuse: efficient process creation for running MPI and Hadoop MapReduce on supercomputers. In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). IEEE, pp 342–345
 12. Bayraktar A, Altunişik AC, Sevim B, Özşahin TŞ (2014) Environmental effects on the dynamic characteristics of the Gülburnu Highway Bridge. Civ Eng Environ Syst 31(4):347–366
 13. Sevim B, Atamturktur S, Altunişik AC, Bayraktar A (2016) Ambient vibration testing and seismic behavior of historical arch bridges under near and far fault ground motions. Bull Earthq Eng 14(1):241–259
 14. Yang J, Zhou Y, Zhou J, Chen Y (2013) Prediction of bridge monitoring information chaotic using time series theory by multi-step BP and RBF neural networks. Intell Autom Soft Comput 19(3):305–314
 15. Diez A, Khoa NLD, Alamdari MM, Wang Y, Chen F, Runcie P (2016) A clustering approach for structural health monitoring on bridges. J Civ Struct Health Monit 6(3):429–445
 16. Koesdwiady A, Karray F (2018) SAFE: spectral evolution analysis feature extraction for non-stationary time series prediction. arXiv preprint arXiv:1803.01364
 17. Chen J, Zeng GQ, Zhou W, Du W, Lu KD (2018) Wind speed forecasting using nonlinear-learning ensemble of deep learning time series prediction and extremal optimization. Energy Convers Manag 165:681–695
 18. Khosravi A, Machado L, Nunes R (2018) Time-series prediction of wind speed using machine learning algorithms: a case study Osorio wind farm, Brazil. Appl Energy 224:550–566
 19. Liang Y, Ke S, Zhang J, Yi X, Zheng Y (2018) GeoMAN: multi-level attention networks for geo-sensory time series prediction. In: IJCAI, pp 3428–3434
 20. Wu CY, Ahmed A, Beutel A, Smola AJ, Jing H (2017) Recurrent recommender networks. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, pp 495–503
 21. Lyu H, Lu H, Mou L (2016) Learning a transferable change rule from a recurrent neural network for land cover change detection. Remote Sens 8(6):506
 22. Gers FA, Schmidhuber J, Cummins F (1999) Learning to forget: continual prediction with LSTM. IET
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.