Deep convolutional neural network for multi-level non-invasive tunnel lining assessment

In recent years, great attention has focused on the development of automated procedures for infrastructure control. Many efforts have aimed at greater speed and reliability compared to traditional methods of assessing structural conditions. The paper proposes a multi-level strategy, designed and implemented on the basis of periodic structural monitoring oriented to a cost- and time-efficient tunnel control plan. This strategy leverages the high capacity of convolutional neural networks to identify and classify potential critical situations. In a supervised learning framework, Ground Penetrating Radar (GPR) profiles and the revealed structural phenomena have been used as input and output to train and test such networks. Image-based analysis and integrative investigations involving video-endoscopy, core drilling, jacking and pull-out testing have been exploited to define the structural conditions linked to GPR profiles and to create the database. The degree of detail and accuracy achieved in identifying a structural condition is high. As a result, this strategy appears of value to infrastructure managers who need to reduce the amount and invasiveness of testing, and thus also to reduce the time and costs associated with inspections made by highly specialized technicians.


Introduction and related works
Structural assessment of civil infrastructure, such as bridges and tunnels, is of paramount importance to ensure a high level of safety and the optimal management of economic resources. The development of engineering tools for making this task automatic is crucial for owners and managers of complex infrastructural assets.
Even limiting the analysis field to the Italian scenario, we need to deal with an infrastructural heritage consisting of approximately 33500 bridges and 2500 tunnels. Due to the magnitude of this asset, there is a clear need for new automatic control plans. The urgency of their development and implementation is further accentuated by the current level of infrastructure ageing. Indeed, most of the infrastructures date back to the 1960s and thus are extremely prone to deterioration due to ageing.
Structural Health Monitoring (SHM) techniques based on image recognition are often used to recognize the presence and the nature of potential damage to infrastructure [10]. Nowadays, Artificial Intelligence (AI) is transforming the way in which a wide range of sectors operate thanks to advanced learning architectures and the capacity to transfer and collect a huge amount of data. Most recently, deep learning techniques have been found to be effective in carrying out complex classification tasks for automatic image analysis. Notably, convolutional neural network (CNN) and transfer learning techniques have been exploited to obtain better results through the use of pre-trained deep networks. One of the main advantages of such AI networks is the direct extraction of data features. Some applications of structural damage classification based on transfer learning with convolutional networks can be found in Refs. [11][12][13][14][15][16][17]. It is worth noting that in most literature studies the types of categorized defects are somewhat limited, and the data often correspond to mere ideal laboratory conditions.
Among the non-destructive structural monitoring techniques for tunnel control [18], Ground Penetrating Radar (GPR) is one of the most used. It allows a multi-defect interpretation of the tunnel lining [19], improving on visual inspection techniques, which are suitable only for detecting surface defects [20]. Nevertheless, the process of GPR data interpretation is generally computationally expensive [21] because data are usually manually scaled and interpreted or stored and only subsequently processed off-line.
This paper presents a new multi-level strategy, based on deep CNNs, for automated concrete damage detection and classification. Its main contribution is the creation of a rapid and robust tool capable of providing a decision aid for tunnels (DAT) during the maintenance phase. It classifies GPR profiles into 14 categories, thus covering a wide range of defects. The findings are considered satisfactory both in terms of accuracy and robustness. In addition, an investigation of the sample-wise double descent phenomenon [22,23] has been carried out in an optimization and improvement assessment of the results.

Convolutional neural network and transfer learning
Neural networks are one of the most extensively utilized image-based categorization approaches. In this study, the automated attribution of a particular structural state to the analyzed image has been obtained by training a CNN. As mentioned in the previous section, such networks avoid the need for human-made feature extraction. Hundreds of layers, analogous to the biological structure of the visual cortex, define the network's structure. Each layer learns some features from the images. The network architecture is composed of four types of layers: convolution, activation, pooling, and fully connected [24]. The first one contains neurons placed in a feature map and connected to the adjacent ones of the next layer through convolution kernels. The second one is introduced within the network architecture to extract nonlinear features. The third one reduces the size of the convolved feature to improve the algorithm performance and decrease the computational cost. The last one interprets the characteristics previously extracted and creates a vector containing the probability of belonging to each class.
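The four layer types described above can be illustrated with a minimal pure-Python sketch. It is a toy forward pass on a 5x5 image, not the network used in this study; the kernel, the fully connected weights, and all function names are hypothetical values chosen only for illustration.

```python
import math

def conv2d(image, kernel):
    """Valid 2-D cross-correlation (the operation CNN libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(fmap):
    """Activation layer: keeps positive responses, zeroes the rest."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]

def fully_connected_softmax(fmap, weights, biases):
    """Flattens the feature map and returns one probability per class."""
    flat = [v for row in fmap for v in row]
    logits = [sum(w * x for w, x in zip(ws, flat)) + b
              for ws, b in zip(weights, biases)]
    peak = max(logits)  # subtracted for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # hypothetical vertical-edge detector

features = max_pool(relu(conv2d(image, kernel)))
# Hypothetical weights of a two-class fully connected layer.
weights = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
probabilities = fully_connected_softmax(features, weights, [0.0, 0.0])
print(features, probabilities)
```

In a trained CNN the kernel and the fully connected weights are learned from the data rather than fixed by hand, which is what removes the need for human-made feature extraction.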
Deep learning can be used without a huge dataset and a very long training period by means of the transfer learning approach. It consists of re-training existing networks, already trained on their own datasets, for different classification scenarios. Fine-tuning a network through transfer learning is quicker than training a network from the ground up, and it delivers excellent accuracy even with less training data.
In this study a pretrained neural network exploiting very large datasets was used for new classification scenarios. Such a network was pre-trained on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012-2017 image classification and localization dataset. In this way, the network could classify 1000 object classes through 1281167 training images, 50000 validation images, and 100000 test images [25,26].
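The idea behind transfer learning can be sketched on a toy problem. The example below is an assumption-laden illustration, not the MATLAB procedure used in this study: a fixed function stands in for the pretrained layers, and only a new binary output layer (a logistic head) is trained. Freezing all pretrained layers is one common variant; whether any layers were frozen here is not stated. All names and data are hypothetical.

```python
import math

def frozen_features(x):
    """Stand-in for the pretrained layers: fixed weights, never updated."""
    x0, x1 = x
    return [max(0.0, x0), max(0.0, -x0), max(0.0, x1), max(0.0, -x1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, epochs=200, lr=0.5):
    """Trains only the new binary head by batch gradient descent."""
    feats = [frozen_features(x) for x in samples]
    w = [0.0] * len(feats[0])
    b = 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            for j, fj in enumerate(f):
                grad_w[j] += err * fj / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    """Returns 1 when the head's probability exceeds 0.5, else 0."""
    z = sum(wi * fi for wi, fi in zip(w, frozen_features(x))) + b
    return 1 if z > 0 else 0

# Hypothetical binary task: class 1 when the first coordinate is positive.
samples = [(x0, x1) for x0 in (-2, -1, 1, 2) for x1 in (-1, 0, 1)]
labels = [1 if x0 > 0 else 0 for x0, _ in samples]

w, b = train_head(samples, labels)
accuracy = sum(predict(x, w, b) == y for x, y in zip(samples, labels)) / len(samples)
print(accuracy)
```

Because the feature extractor is reused, only a handful of parameters are fitted, which is why the approach can work with comparatively few training images.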
The chosen pretrained network was adapted to perform binary classifications using the hyperparameters listed in Table 1.
The classification performance depends on several factors: the environmental conditions and the instrumentation typology, the database size, and some implementation details can all play a relevant role. In this section, such aspects are described.

Instrumentation: Ground Penetrating Radar
The images used for the damage classification were the output of a GPR campaign. Such technology is a generally non-destructive screening method [27] used in civil engineering applications [28], specifically for assessing a tunnel's structural conditions [29]. It is used in a wide range of applications including concrete void location [30], underground utility tracking [31], railway ballast optimization and evaluation [32], and landmine detection [33]. This instrument is known for its strong penetration capacity and its ease of use and transport [34]. Such features make it a valuable tool for damage detection and localization.

Operating principles and survey methodology
GPR is a geophysical [35] survey methodology and is based on the transmission of high-frequency electromagnetic wave impulses into a material by means of an antenna with a frequency ranging from 10 to 2600 MHz.
The propagation of such an impulse depends on the dielectric properties of the material. For this reason, the quality of the representation is greatly influenced by some elements, e.g., water. Water causes reflection and attenuation of part of the signal, producing a less clear and meaningful rendering. Two types of GPR were used in the presented survey, performed by the RINA company. The first involved the use of a dual-frequency antenna, the second the use of a high-frequency one. Tables 2 and 3 summarize the technical aspects of both.
The dual-frequency antenna was used to capture longitudinal profiles, whose minimum number and layout in the tunnel cross-section depended on the number of lanes and on the tunnel size (Fig. 1). On the other hand, GPR scans with a high-frequency antenna could be either longitudinal or transversal, depending on the degree of required detail.
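The dependence of the impulse propagation on the dielectric properties can be made concrete with the standard low-loss approximation, in which the wave velocity is the speed of light divided by the square root of the relative permittivity, and the reflector depth is half the two-way travel time multiplied by that velocity. The sketch below assumes a homogeneous, non-magnetic, low-loss medium; the permittivity values are illustrative and are not taken from the survey.

```python
import math

C_M_PER_NS = 0.299792458  # speed of light in vacuum, metres per nanosecond

def wave_velocity(rel_permittivity):
    """Wave velocity (m/ns) in a low-loss, non-magnetic medium."""
    return C_M_PER_NS / math.sqrt(rel_permittivity)

def reflector_depth(two_way_time_ns, rel_permittivity):
    """Depth (m) of a reflector from the two-way travel time of its echo."""
    return wave_velocity(rel_permittivity) * two_way_time_ns / 2.0

# Illustrative values only: the same 6 ns echo in a drier and a wetter medium.
dry_depth = reflector_depth(6.0, 9.0)
wet_depth = reflector_depth(6.0, 16.0)
print(dry_depth, wet_depth)
```

A higher permittivity, as found in wet material, slows the wave, so the same travel time corresponds to a shallower reflector. This is one reason why moisture complicates the reading of a profile.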

Database: engineering judgement
The GPR campaign described so far was carried out in tunnels belonging to several highway routes in Italy where visual inspections had already revealed criticalities. Attention to them had been growing due to the age of such tunnels, most dated between 1960 and 1980. To assess the structural conditions of the linings, mapping of tunnel lining thicknesses, identification of ribs, survey of the presence of intrados reinforcement, verification of the presence and position of possible voids, discontinuities, situations of degradation or inhomogeneity and analysis of the coating cortical state became the general objectives of the assessment.
Interpretations of GPR longitudinal profiles were performed by both image-based analysis (IBA), i.e., through visual recognition of specific patterns by trained inspectors, and a variety of supplemental tests, such as transverse GPR, jack, pull-out, core drilling, and video-endoscopy, which supported the classification process.
GPR profiles are characterized by a vertical axis representing the depth of the investigated thickness and a horizontal axis representing the progressive distance from the beginning of the structure. An example of a GPR profile with interpretations is reported in Fig. 2.

Algorithm and implementation details
ResNet50, a supervised learning algorithm, was the core of the developed methodology. As already stated, the algorithm received the "basic/filtered" GPR longitudinal profiles as the input and provided the corresponding interpretations as the output.

Image pre-processing
The processed GPR longitudinal profiles were acquired by means of B-scan visualization. To use them as input of the algorithm, the axes (described in Section 3.2) were removed. GPR profiles are affected by noise, sound tails, and interferences. Environmental noise can complicate the interpretation of GPR profiles, whether this is done by experts or by an algorithm. For this reason, several filters were applied by the RINA company. In detail, four types of filters were used. The first ("Move start time") was used to remove the portion of the signal between air and the investigated medium in order to correctly interpret the depth of the analysis. The second ("Background removal"), the third ("Bandpass filter"), and the fourth ("Smoothed gain") attenuated the noise, attenuated the high frequencies, and equalized the power, respectively.
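Two of the filters named above can be sketched in a few lines. The settings used by the RINA company are not reported, so the code below only shows the textbook form of each operation: the start-time shift discards the samples recorded before the wave enters the medium, and background removal subtracts the mean trace, which suppresses horizontal bands common to all traces. The data layout and function names are hypothetical.

```python
def move_start_time(bscan, first_sample):
    """Drops the time samples recorded before the wave enters the medium.

    bscan[t][x] is the amplitude at time sample t of trace x.
    """
    return [row[:] for row in bscan[first_sample:]]

def remove_background(bscan):
    """Subtracts, at each time sample, the mean amplitude over all traces."""
    cleaned = []
    for row in bscan:
        mean = sum(row) / len(row)
        cleaned.append([value - mean for value in row])
    return cleaned

# Toy B-scan: one air-gap row, one constant band, one row with a local reflector.
raw = [
    [7, 7, 7, 7],  # air gap, removed by the start-time shift
    [5, 5, 5, 5],  # horizontal band, removed as background
    [1, 1, 9, 1],  # local reflector, preserved
]
filtered = remove_background(move_start_time(raw, 1))
print(filtered)
```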
Starting from such filtered GPR profiles, the first pre-processing operation performed was the profile cutting. Each profile was divided into elements varying in size between 112-600 horizontal pixels and 110-564 vertical pixels. This operation was carried out using the free online tools from PineTools. Then, to improve the classification performance, a data augmentation technique was used. As highlighted by several studies in the literature [36], this technique turned out to be very effective. The horizontal flip augmentation [37,38], namely the mirroring of images about the vertical axis, was performed. This operation was carried out using Microsoft Office Picture Manager. It was executed for the images of all the classes, except for the ones related to the healthy conditions. This choice was justified by the high number of images already belonging to this class.
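The cutting and flipping steps were performed with graphical tools, but the same operations can be expressed programmatically. The sketch below is only an illustration under simplifying assumptions: tiles have a fixed width (whereas the actual elements varied in size), images are nested lists of pixel values, and the label strings are hypothetical.

```python
def cut_profile(profile, tile_width):
    """Splits a profile (list of pixel rows) into consecutive tiles.

    A trailing remainder narrower than tile_width is dropped.
    """
    width = len(profile[0])
    return [
        [row[start:start + tile_width] for row in profile]
        for start in range(0, width - tile_width + 1, tile_width)
    ]

def horizontal_flip(image):
    """Mirrors an image about its vertical axis."""
    return [row[::-1] for row in image]

def augment(dataset, skip_labels=("healthy",)):
    """Adds a flipped copy of every image whose label is not in skip_labels."""
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        if label not in skip_labels:
            augmented.append((horizontal_flip(image), label))
    return augmented

profile = [[1, 2, 3, 4, 5, 6],
           [7, 8, 9, 10, 11, 12]]
tiles = cut_profile(profile, 3)
dataset = augment([(tiles[0], "crack"), (tiles[1], "healthy")])
print(len(dataset))
```

Skipping the already abundant healthy class, as done in this study, keeps the augmentation from widening the gap between class sizes.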
The database was created by associating the i-th image with its class. This operation was carried out by comparing the filtered GPR profiles without interpretations with the produced reports containing the GPR profiles with their interpretations.

Pretrained neural network: ResNet-50
Of the available selection of pretrained neural networks (e.g., AlexNet, SqueezeNet, ShuffleNet, ResNet-18, GoogLeNet, ResNet-50, MobileNet-v2, and NASNet-Mobile), ResNet-50 was chosen and used within the MATLAB R2020b programming environment. It is a CNN designed in 2015 by He et al. [39]. ResNet-50 has about 25 million parameters. It is composed of 177 layers, of which 49 are convolutional and 1 is fully connected. It exploits the Rectified Linear Unit (ReLU) and the softmax as activation functions and it is defined as a "feed forward" neural network with "residual/skip connections".
It stemmed from the observation of a non-intuitive phenomenon: "by increasing the depth of the network layers there is a risk of making the network worse". Intuitively, deeper neural networks should perform better than shallower ones or, at least, should show better results in the training phase, since overfitting does not come into play in this phase. Examples of deep neural networks showing excellent results are present in very recent studies [40,41]. However, it is known that, as the depth of the network increases, the accuracy does not always increase and a degradation problem occurs. The innovative element that makes ResNet perform better than similar counterparts is its residual unit (skip connection), which makes it capable of learning the differences between the input and output layers. In this way, it is possible to mitigate the problems arising from excessive depth. The high depth of the network and the relatively low computational cost are two of the reasons why ResNet was selected to address the classification problems at hand [42].
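The role of the skip connection can be shown with a deliberately simplified residual unit. The sketch uses two small linear layers in place of the convolutions and batch normalization of the real ResNet-50 blocks, so it should be read as an illustration of the principle only; the function names and weights are hypothetical.

```python
def relu(vector):
    return [max(0.0, value) for value in vector]

def linear(vector, weights):
    """Dense layer without bias: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def residual_block(x, w1, w2):
    """Computes relu(F(x) + x), where F is the learned residual mapping."""
    residual = linear(relu(linear(x, w1)), w2)          # F(x)
    return relu([r + v for r, v in zip(residual, x)])   # skip connection adds x

x = [1.0, 2.0, 0.5]
zero_weights = [[0.0] * 3 for _ in range(3)]

# With all weights at zero the block reduces to the identity for non-negative x.
identity_output = residual_block(x, zero_weights, zero_weights)
print(identity_output)
```

Since a block can fall back to the identity simply by driving its weights towards zero, adding further blocks should not, in principle, make the network worse than its shallower counterpart. This is the mechanism that mitigates the degradation problem.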

Methodology: multi-level damage classification
To perform tunnel lining condition rating, the proposed methodology was developed in six levels, as depicted in the flowchart in Fig. 3. Moving from the lower to the higher levels, it is possible to achieve more detailed knowledge about the presence and the type of structural damage. This approach aimed to associate an increasing level of attention with the criticalities that deserved an in-depth examination of the ongoing structural decay. This concept is the same as that reported in the "New guidelines for the classification and management of risk, safety assessment and monitoring of existing bridges", recently approved in Italy (2020).
When the i-th GPR profile is analyzed, it can be associated with one of 14 classes, described below.
C1: Healthy and Reinforcement. This class is composed of images associated with a healthy structural condition and with the possible presence of reinforcement, namely covering centring.
C2: Damaged. This class is composed of images with at least one type of damage.
C3: Healthy. This class is formed by images associated with a healthy structural condition.
C4: Reinforcement. This class includes images with reinforcement, namely covering centring.
C5: Warning mix. Images in this class are characterized by the combination of two or more types of damage.
C6: Warning all. This class is composed of images corresponding to the presence of a single type of damage. The potential damage types are anomalies, cracks, simply voids, detachments or excavations.
C7: Crack. Images in this class are characterized by the presence of cracks.
C8: Images in this class can present anomalies, simply voids, detachment or excavation.
C9: Anomaly. Images in this class present anomalies, namely inhomogeneity within the covering casting. Some of the causes of this phenomenon are: aging of concrete, temperature changes, presence of problems in the casting, crawl spaces, and reduced injuries.
C10: Mixed voids. Images in this class show the presence of voids of different nature.
C11: Simply voids. Images in this class are associated with the presence of voids of medium size and depth.
C12: Images in this class are related to the detachment and excavation phenomena. A more detailed description is reported under classes C13 and C14, respectively.
C13: Detachment. This phenomenon produces an external void, also presenting some cracks.
C14: Excavation. This phenomenon leads to a large internal void.
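The multi-level logic can be sketched as a cascade of binary classifiers, each one refining the decision of the previous one. The pairing of classes used below is an assumption inferred from the class descriptions alone, since the flowchart of Fig. 3 is not reproduced here; the actual arrangement of the six levels may differ. The classifiers are hypothetical stubs standing in for the trained networks.

```python
# Assumed pairing of the 14 classes into binary decisions (inferred, not
# taken from Fig. 3): each node is split into two more specific classes.
CHILDREN = {
    "root": ("C1", "C2"),   # healthy/reinforcement vs damaged
    "C1": ("C3", "C4"),     # healthy vs reinforcement
    "C2": ("C5", "C6"),     # several damage types vs a single type
    "C6": ("C7", "C8"),     # crack vs other single damage
    "C8": ("C9", "C10"),    # anomaly vs voids
    "C10": ("C11", "C12"),  # simply voids vs detachment/excavation
    "C12": ("C13", "C14"),  # detachment vs excavation
}

def classify(profile, classifiers):
    """Walks the cascade and returns the sequence of assigned classes.

    classifiers[node](profile) must return 0 or 1, selecting one of the
    two children of that node.
    """
    path = []
    node = "root"
    while node in CHILDREN:
        choice = classifiers[node](profile)
        node = CHILDREN[node][choice]
        path.append(node)
    return path

# Hypothetical stubs in place of the trained binary networks.
always_first = {node: (lambda profile: 0) for node in CHILDREN}
always_second = {node: (lambda profile: 1) for node in CHILDREN}

print(classify(None, always_first))
print(classify(None, always_second))
```

A cascade of this kind stops as soon as a terminal class is reached, so a healthy profile requires only the first decisions, while a damaged one is progressively characterized in more detail.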

Results and discussion
Tunnel linings dating from 1890 up to 1992 were analyzed, following the procedure described so far. The accuracy achieved for each level was satisfactory as it was always greater than 90% and on average was equal to 94.5%. Tables 4 to 10 show the confusion matrices for each level.

Subtotal results: confusion matrix for each level
To evaluate the algorithm classification performance for each level, a confusion matrix and a value of accuracy were used. The confusion matrix rows showed the actual classes and the columns showed the predicted labels. The values placed on the diagonal of the matrix correspond to a correct classification. The accuracy value was defined as the ratio between the confusion matrix trace and the total sum of the matrix values. The displayed confusion matrices and the corresponding accuracies were obtained as an arithmetic mean of the results from several (10, as explained below) test folds obtained by means of a K-fold validation technique. In addition, for each test fold, an error estimation through the RMSE (Root Mean Square Error) index was performed, and the average of these values was calculated and used as the final indicator.
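The accuracy definition given above (trace divided by the total sum) and the averaging of the confusion matrices over the folds can be written compactly. The matrices in the sketch are invented numbers used only to exercise the functions; they are not results of this study.

```python
def accuracy(confusion):
    """Ratio between the trace of the matrix and the sum of all its entries."""
    trace = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return trace / total

def mean_confusion_matrix(matrices):
    """Element-wise arithmetic mean of the confusion matrices of the folds."""
    count = len(matrices)
    size = len(matrices[0])
    return [
        [sum(m[r][c] for m in matrices) / count for c in range(size)]
        for r in range(size)
    ]

# Invented two-class matrices (rows: actual class, columns: predicted class).
fold_a = [[45, 5], [3, 47]]
fold_b = [[47, 3], [5, 45]]

averaged = mean_confusion_matrix([fold_a, fold_b])
print(averaged, accuracy(averaged))
```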
It is worth noting that the same number of samples for both classes was used in the training of the algorithm for the six levels. Such homogeneity avoided specific methodologies that would otherwise have been required to overcome problems of imbalance between classes [43].
Following the K-fold validation methodology previously mentioned, the data were randomly divided into k groups (folds), where one fold is used for testing, one for validation, and (k-2) folds for training [44,45]. A split value of k equal to 10 was chosen for the cross-validation. As empirically proven, such a value produces test error rate estimates that suffer from neither high bias nor large variance [46].
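The splitting scheme described above, with one test fold, one validation fold, and k-2 training folds, can be sketched as follows. Which fold serves as the validation fold in each round is not specified, so taking the fold that follows the test fold is an assumption, as are the function name and the seed.

```python
import random

def kfold_splits(n_samples, k=10, seed=0):
    """Yields (train, validation, test) index lists for each of the k rounds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_fold = (i + 1) % k                    # assumed choice of validation fold
        train = [idx for j, fold in enumerate(folds)
                 if j not in (i, val_fold) for idx in fold]
        yield train, folds[val_fold], folds[i]

splits = list(kfold_splits(100, k=10))
train, validation, test = splits[0]
print(len(train), len(validation), len(test))
```

Over the k rounds every sample is used exactly once for testing, which is what allows the fold results to be averaged into a single estimate.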
Finally, the convergence graph (loss/accuracy versus number of iterations) was used as an additional tool to evaluate the models. An example, representative of the general behavior, is reported in Fig. 4. It shows the loss/accuracy versus the number of iterations for one of the 10 cases related to Level 1 and highlights the agreement of the trend with the expected behavior.
The total number of samples per class was equal to 936. The values of accuracy and RMSE were 91.8% and 25.6%, respectively.

Level 5
The total number of samples per class was equal to 1080. The values of accuracy and RMSE were 98.3% and 5.2%, respectively.

Level 6
The total number of samples per class was equal to 408. The values of accuracy and RMSE were 95.3% and 17.1%, respectively.

Optimization perspective: double descent phenomenon
In an optimization and improvement perspective, an investigation of the sample-wise double descent phenomenon was carried out. To speed up the investigation process, the analyses reported here were based on splitting the overall dataset into only two parts: training and testing. As is well known, for a fixed model and training procedure, the variation of the number of the training samples has an important effect on the error found in the test set. As recently highlighted [22,23], the behavior of such error is not monotonically descending as the number of training samples increases. This is due to the sample-wise double descent phenomenon. Such behavior shows three phases: two decreasing (the first and third) and one increasing (the second). Knowledge of the phase to which the error belongs allows potential improvement that is still to be understood. Figures 5 and 6 show the trend of test error (expressed as the complement of the accuracy) as a function of the number of samples per class and of the training percentage for Level 1 and Level 5, respectively. For Level 1, using a number of samples per class close to that used in Section 5.1.1, for training percentages from 50 to 90, the error is already descending in the third phase. Consequently, by increasing the training set the expected improvements will be slight. On the other hand, Level 5 shows a behavior that is not well defined due to the small number of available samples. A significant increase of the number of available samples could produce substantial improvements.

Conclusions
This paper proposes an automated multi-level strategy for the identification and classification of damage in tunnel linings. The potential outcomes, stemming from the use of innovative pre-trained neural networks in this research, are: 1) the automatic categorization of a wide range of defects, 2) the decrease of the time and cost caused by employing highly specialized inspectors in the interpretation of GPR profiles, 3) the reduction of additional invasive tests to be coupled to GPR for the characterization of defects, with a consequent minimization of assessment invasiveness, 4) the construction of a methodology that can be integrated into a holistic maintenance plan. Despite some intrinsic limitations of the methodology, linked to the training times and to the accumulation of data associated with more categories of defects, advances of the proposed approach are expected. Future developments of the work foresee: 1) the integration of the CNN results with laboratory tests for the creation of a holistic tunnel control strategy, 2) the extension of the database, 3) an increase in the number of damage classes, and 4) the comparison of the results with the ones obtainable from other CNN architectures.