Introduction

The infrastructure of German ports, both on the coast and inland, is aging and requires new technologies and methods for managing its remaining service life. Traditional inspection processes are labour-intensive and time-consuming; they need to be replaced with automated, smart, and innovative measurement and analysis processes that improve transparency, efficiency, and reliability and enable more accurate lifetime predictions.

Port infrastructure degrades over time due to human activities and environmental factors, in particular the impact of saltwater on the building materials at seaports. This can damage concrete structures, sheet pile walls, and wooden constructions. Identifying this damage and assessing its severity is essential for taking timely maintenance measures and preventing costly repairs or even structural collapse.

Testing and monitoring of port structures typically covers both the above-water and underwater parts. The above-water portion is usually assessed through manual visual inspections, while the underwater portion is far more difficult to examine. Divers must inspect the structure by touch, a method that is unreliable and subjective and makes it difficult to classify and track damage consistently. In addition, underwater inspections are typically limited in scope, so only a small percentage of the structure is actually examined. As a result, damage is often not detected until it has become severe, or is not discovered at all.

Structural health monitoring with a kinematic multi-sensor system

Sea and inland ports need to be inspected comprehensively on a regular basis. However, it is nearly impossible to visually inspect underwater areas, especially in river regions due to the high sediment content.

For this reason, we use a kinematic multi-sensor system (k-MSS) to record the object’s surface both above and below water.

Only in this way is it possible to scan the entire structure, reliably detect damage, and subsequently assess its current condition in detail. Accurately recording the structure's geometry and condition requires high-resolution 3D data for both the underwater and above-water parts.

The carrier platform, a boat-like vehicle, plays a central role in recording the measurement data but is exposed to wind, waves, and currents. It also holds the sensor platform, and its design aims to prevent flow-related influences from causing quality-reducing deformation and constraining forces on the sensor platform. For this purpose, drift and torsion compensation is carried out.

In addition to recording data, structural analyses are conducted to assess the service life and lifecycle of the infrastructures.

Port operators can use the results of these analyses to plan maintenance concepts and construction measures in line with the structural inspection as transparently as possible. This procedure significantly reduces cost-intensive maintenance measures and long downtimes.

The k-MSS integrates various sensors for object recording, including a high-resolution hydroacoustic underwater multibeam echo sounder, a surface profile laser scanner, and five HDR cameras. Positioning is achieved through a combined IMU/GNSS solution, with the option of hybrid positioning using an automatically measuring total station from the shore (see Fig. 1).

Fig. 1  A 3D recording of a port facility above and below water (Hesse et al. 2019)

The underwater multibeam echo sounder and the profile laser scanner measure 2D profiles, and the movement of the carrier platform creates an unstructured point cloud. The expected noise of the point clouds is in the millimetre range above water and in the centimetre range underwater.

In modern data processing, damage detection is often performed using pattern recognition methods (see Hesse et al. 2019). This work focuses on ensuring the quality, completeness, reproducibility, and automation of damage detection from 3D point clouds of port infrastructure above and under water.

The main focus of this study is the detection of geometric damage in point clouds. We deliberately restrict the analysis to point cloud data in order to examine this question in depth and to avoid confounding effects from additional data sources; the images captured by the installed cameras were therefore not included. The selection of damage types depends strongly on the intended use and the specific application. We test the methodology on two data sets: a synthetic set and a real set from the harbour of Lübeck, Germany. The results of these tests help us assess the effectiveness of the methodology. The following review of the state of the art is limited to damage detection and does not cover data collection and mapping (for details on these, see Hesse et al. 2019).

Structural monitoring using detection methods has been addressed in several publications, either above water or in clear offshore areas.

Damage detection is mostly divided into two approaches: classification based on images or on point clouds. For point clouds, as in the present work, a distinction is made between point-based, area-based, and geometry-based methods (Neuner et al. 2016). Kalenjuk et al. (2021) presented deformation monitoring and damage detection of large retaining structures using motor-vehicle-based mobile mapping systems. Hadavandsiri et al. (2019) introduced a new approach for automatic, preliminary detection of damage in concrete structures using terrestrial laser scanners and a systematic threshold in a point-based approach. Zhang et al. (2013) implemented a geometry-based approach (using quadrics) for the segmentation of point clouds.

Image-based methods comprise a variety of approaches. In recent years, neural networks have increasingly been used for classification and segmentation tasks. Tung et al. (2013) take repeated images of a retaining wall with a standard camera to perform digital image correlation. O'Byrne et al. (2013) detect disturbances through texture segmentation of colour images, while Gatys et al. (2015) showed that neural networks trained on natural images learn to represent textures in a way that enables synthesising realistic textures and even whole scenes. Neural networks used as feature extractors are thus preferred over hand-crafted features (Yosinski et al. 2014; Carvalho et al. 2017; Abati et al. 2019).

When detecting damage in point clouds or images, one very often faces the problem that damaged areas are rare while undamaged areas are abundant, leading to a strong class imbalance. One way around this problem is to look for the instances that differ most from the majority of the data. Methods that distinguish such non-normal instances from the majority are commonly called anomaly detection (AD).

Anomaly detection with transfer learning

Anomaly detection is a technique for identifying unusual or suspicious events or data points. It was first mentioned by Grubbs (1969) for identifying outlying observations, but the definition has since broadened to cover a wider range of applications. Anomalies are now understood as data that differ from the norm and are rare compared to the other instances in a data set. Two commonly used ingredients for anomaly detection are transfer learning and the local outlier factor (LOF). Transfer learning uses pre-trained neural networks to build models more quickly and efficiently (see, e.g. Andrews et al. 2016), while the LOF evaluates an object based on its isolation from the other data in its local neighbourhood (Breunig et al. 2000).
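As a minimal illustration of the LOF (not part of the presented pipeline), the following sketch applies scikit-learn's LocalOutlierFactor to a toy 2D data set; the data and the neighbourhood size are illustrative assumptions.

```python
# Minimal LOF sketch with scikit-learn; toy data, illustrative parameters.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # dense majority
outliers = rng.uniform(low=-6.0, high=6.0, size=(5, 2))  # rare, isolated points
X = np.vstack([normal, outliers])

lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)

# negative_outlier_factor_ is close to -1 for inliers and becomes more
# negative the more isolated a sample is within its local neighbourhood.
scores = lof.negative_outlier_factor_
print("most anomalous indices:", np.argsort(scores)[:5])
```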

Anomaly detection algorithms are now used in many application domains. García-Teodoro et al. (2009) describe their use for network-based intrusion detection; in this context, anomaly detection is often called behavioural analysis, and such systems typically use simple but fast algorithms. Other scenarios include fraud detection (Li et al. 2012), medical applications such as patient monitoring during electrocardiography (Lin et al. 2005), data leakage prevention (Sigholm and Raciti 2012), and more specialised applications such as movement detection in surveillance cameras (Basharat et al. 2008) or document authentication in forensics (Gebhardt et al. 2013).

Contribution

This work focuses on using anomaly detection methods to segment point clouds into two classes: damaged and undamaged. Because there are significantly more areas without damage, the classes are highly unequal in size. Additionally, the amount of available training data with true labels is limited, so we use pre-trained networks as generic feature generators. Before applying the anomaly detection algorithms, the point clouds must be pre-processed: we create a height field (also known as a digital elevation model, DEM; Skidmore 1989) from the point clouds. Unlike natural images, the statistics of height fields depend on the scan resolution of the sensor, which makes transferring pre-trained networks difficult. The anomaly detection method we present successfully differentiates between damaged and undamaged areas in point-cloud-derived height fields. This is a novel approach to automated damage detection in structural monitoring, and it provides a foundation for further research in this field.

Methodology

In this paper, we convert point clouds into heightfields and use them as input to a deep neural network (DNN). The features extracted by the DNN are then used to calculate the local outlier factor (LOF) for individual sections ("tiles") of the heightfield. By applying a threshold to the LOF, each tile is classified as a damaged or undamaged region.

Pre-processing of the point clouds

To evaluate the results accurately, the damage (concrete spalling) in the existing data must be labelled manually. Additionally, objects such as fenders or ladders appear in the data and should not enter the calculation of the heightfield; these objects are also manually labelled and assigned an index. The result is a point cloud with different indices for the quay wall, damage, and other objects (an example is shown in the Data section). This labelling is necessary for accurate and reliable evaluation.

Generation of the heightfield

To create the heightfield, regular shapes are fitted to the point cloud and the distance of the points to the geometry is calculated. In this study, we use a simple plane according to Drixler (1993) as the reference geometry. We first rotate the point clouds into a consistent orientation using principal component analysis and then cut regular square sections from the point cloud, overlapping by 50% in the X and Y directions. This makes fitting geometries into the point cloud easier. After cutting, a plane is estimated in each section.
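A minimal sketch of this alignment and cutting step is given below, assuming the point cloud is an (N, 3) NumPy array; the 0.5 m section size mirrors the 50 × 50 cm segmentation used later for the real data, everything else is an assumption.

```python
# Sketch: PCA-based alignment and overlapping square sections.
import numpy as np

def align_with_pca(points):
    """Rotate the point cloud so its principal axes coincide with X, Y, Z."""
    centered = points - points.mean(axis=0)
    # Right-singular vectors of the centered cloud are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T  # largest variance along X, smallest along Z

def cut_sections(points, size=0.5, overlap=0.5):
    """Yield square sections of side `size` (metres) with 50% overlap in X and Y."""
    step = size * (1.0 - overlap)
    x0, y0 = points[:, 0].min(), points[:, 1].min()
    x1, y1 = points[:, 0].max(), points[:, 1].max()
    for x in np.arange(x0, x1, step):
        for y in np.arange(y0, y1, step):
            mask = ((points[:, 0] >= x) & (points[:, 0] < x + size) &
                    (points[:, 1] >= y) & (points[:, 1] < y + size))
            if mask.any():
                yield points[mask]
```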

When estimating the plane, we only use the points corresponding to the quay wall and damaged areas, taking into account the different indices. We then calculate the distance of each point to the estimated plane using the formula:

$$\mathrm{dist} = \frac{n_x p_x + n_y p_y + n_z p_z - d}{\left|\vec{n}\right|}$$

This process enables us to generate a heightfield from the point cloud data.

The vector \(\vec{n}\) is the normal vector of the plane with entries \(n_x\), \(n_y\), and \(n_z\); \(d\) is the distance of the plane to the origin; and \(p_x\), \(p_y\), and \(p_z\) are the coordinates of the point. An example of the determined distances is shown in Fig. 2: blue represents a small distance to the plane, and the colours green, yellow, and red indicate increasing distance.

Fig. 2  Distances to the plane
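One possible implementation of the plane estimation and of the distance formula above is sketched here; the least-squares fit via SVD is our choice for illustration, while the paper itself refers to Drixler (1993).

```python
# Sketch: plane fit and signed point-to-plane distances for one section.
import numpy as np

def fit_plane(section):
    """Least-squares plane through an (N, 3) section; returns unit normal n and offset d."""
    centroid = section.mean(axis=0)
    _, _, vt = np.linalg.svd(section - centroid, full_matrices=False)
    n = vt[-1]           # direction of smallest variance = plane normal
    d = n @ centroid     # plane equation: n . p = d
    return n, d

def plane_distances(points, n, d):
    """Signed distances following dist = (n . p - d) / |n|."""
    return (points @ n - d) / np.linalg.norm(n)
```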

To make deviations due to damage more noticeable, the distance to the plane is manually set to small values for points that belong to additional objects. The point clouds are then converted into a heightfield by rasterizing them. The raster size should be selected according to the resolution of the point cloud, and distances in overlapping areas are averaged during rasterization. The resulting heightfield is represented as a grayscale image, with black indicating a distance of zero and lighter shades indicating increasing distance. A corresponding label image is also created: a binary image showing points with damage indices in white and all other points in black. Examples of the heightfield and label images are shown in Figs. 3 and 4.
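A sketch of this rasterization is given below, assuming xy holds the in-plane point coordinates and dist their plane distances; the 1 cm cell size matches the point spacing quoted for the simulated data and is otherwise an assumption.

```python
# Sketch: rasterize plane distances into a heightfield, averaging within cells.
import numpy as np

def rasterize(xy, dist, cell=0.01):
    """Average all distances falling into each raster cell of size `cell` (metres)."""
    ij = np.floor((xy - xy.min(axis=0)) / cell).astype(int)
    shape = tuple(ij.max(axis=0) + 1)
    total = np.zeros(shape)
    count = np.zeros(shape)
    np.add.at(total, (ij[:, 0], ij[:, 1]), dist)   # sum of distances per cell
    np.add.at(count, (ij[:, 0], ij[:, 1]), 1)      # number of points per cell
    # Grayscale heightfield: 0 = on the plane, brighter = farther away.
    return np.divide(total, count, out=np.zeros(shape), where=count > 0)
```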

Fig. 3  Simulated data with heightfield and label image

Fig. 4  Real data with heightfield and label image

Convolutional neural network

Feature extraction from the heightfield is carried out using the VGG19 DNN (Simonyan and Zisserman 2014). Afterwards, the local outlier factor (LOF) is calculated from the resulting feature maps.

VGG19 is a variant of the VGG network, which achieved very good results at the ImageNet Large Scale Visual Recognition Challenge in 2014. It consists of 19 weight layers: 16 convolutional and 3 fully connected layers. Because of these results, we use the VGG19 DNN with weights pre-trained on the ImageNet data set (Deng et al. 2009); no adjustment of the weights is carried out. For the extension with the LOF, the DNN is truncated after layer pool_4, where the feature maps are extracted. Small sections ("tiles") of the original depth image serve as input to the DNN, so one output is obtained per tile. The tile size is a variable parameter, and the tiles overlap by 50%.
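A sketch of this feature extraction with torchvision is shown below; the slice index 28 (all layers up to and including pool_4) and the replication of the gray channel to RGB are our assumptions, not details specified in the paper.

```python
# Sketch: frozen VGG19 truncated after pool_4 as a generic feature generator.
import torch
from torchvision.models import vgg19, VGG19_Weights

backbone = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:28]  # up to pool_4
backbone.eval()  # pre-trained ImageNet weights, no fine-tuning

def tile_features(tile):
    """tile: (32, 32) heightfield patch -> flattened pool_4 feature vector."""
    x = torch.as_tensor(tile, dtype=torch.float32)
    x = x.expand(3, -1, -1).unsqueeze(0)  # replicate gray channel to 3 input channels
    with torch.no_grad():
        return backbone(x).flatten().numpy()
```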

The LOF is then calculated from these feature maps for each tile, and each tile is assigned its value. In overlapping areas, the average of the calculated LOF values is used. The actual binary classification is carried out by applying a threshold to the resulting LOF values of the point cloud sections.
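Combined, the classification step could look as follows; the neighbourhood size and the row-wise feature normalization are assumptions, while the threshold of -1.5 anticipates the experiments reported below.

```python
# Sketch: one LOF score per tile, thresholded into damaged / undamaged.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def classify_tiles(feature_vectors, threshold=-1.5, n_neighbors=20):
    X = np.asarray(feature_vectors, dtype=np.float64)
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # normalized features
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    lof.fit(X)
    scores = lof.negative_outlier_factor_
    return scores < threshold  # True = tile flagged as suspected damage
```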

Data

In this work, we use two data sets: one consisting of simulated sheet pile walls with randomly distributed damage, and one consisting of terrestrially surveyed quay walls from the port of Lübeck, Germany. The simulated data set demonstrates the suitability of the chosen methodology, while the real data set confirms that the method can be applied in practice.

Figure 3 shows a point cloud simulating quay walls above the water surface. It is divided into three categories: quay wall (blue), concrete spalling (green), and additional objects (red); these categories can also be seen in the accompanying label image. The point spacing varies but is generally about 1 cm, and the noise is in the range of 1-2 mm. Figure 4 displays the classified real point cloud along with its corresponding depth and label images.
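For illustration, a hypothetical generator for such a synthetic wall patch might look as follows; only the roughly 1 cm point spacing and the 1-2 mm noise are taken from the description above, all other parameters are invented.

```python
# Illustrative sketch: synthetic wall with noise and spalling-like indentations.
import numpy as np

rng = np.random.default_rng(42)
x, y = np.meshgrid(np.arange(0, 5, 0.01), np.arange(0, 2, 0.01))  # 5 m x 2 m wall, 1 cm grid
z = rng.normal(scale=0.0015, size=x.shape)                        # ~1.5 mm measurement noise

labels = np.zeros(x.shape, dtype=int)
for _ in range(5):                                                # five random damage spots
    cx, cy = rng.uniform(0, 5), rng.uniform(0, 2)
    r = rng.uniform(0.05, 0.2)                                    # 5-20 cm radius (invented)
    mask = (x - cx) ** 2 + (y - cy) ** 2 < r ** 2
    z[mask] -= rng.uniform(0.01, 0.05)                            # 1-5 cm deep (invented)
    labels[mask] = 1                                              # damage index

points = np.column_stack([x.ravel(), y.ravel(), z.ravel()])
```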

Experiments

In the following section, the results of various experiments on both simulated and real data are presented. A total of 500 simulated images and 395 generated heightfield and label images were used as the basis for the experiments. The purpose of the experiments is to determine whether the proposed method, which combines a DNN with the LOF, effectively detects damage on quay walls. Two primary criteria must be considered when assessing damage detection: precision, the proportion of regions identified as damaged that are actually damaged, and recall, the proportion of actual damage that is identified. In addition, we report accuracy (the percentage of correctly classified regions in the entire data set), the F1 score (the harmonic mean of precision and recall), and the Matthews correlation coefficient (MCC). The MCC takes all four entries of the confusion matrix into account and ranges from -1 (strong misclassification) to +1 (excellent classification), with 0 indicating random classification (Chicco and Jurman 2020).
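All five measures follow directly from the four entries of the confusion matrix, as the following sketch shows.

```python
# Evaluation measures from the binary confusion matrix (tp, fp, fn, tn).
import numpy as np

def evaluation_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)                  # flagged damage that is real
    recall = tp / (tp + fn)                     # real damage that is found
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # correctly classified share
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return precision, recall, accuracy, f1, mcc
```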

Simulated data

The best results were obtained using tile sizes of 32 × 32 pixels and normalized features as DNN output. Figure 5 shows the results for different threshold values. Using a threshold of -1.5, the confusion matrix in Table 1 was obtained, resulting in an accuracy of 99.9%, a precision of 96.3%, and a recall of 96.2%, which yields an F1 score of 96.3%. The MCC of 0.96 indicates a good classification. Figure 6 shows the results for two of the simulated images, with green rectangles indicating true positives (TP), red false negatives (FN), and yellow false positives (FP). Overall, the results on simulated data are very good, and the LOF appears effective for the automated classification of quay walls.

Table 1  Confusion matrix for threshold -1.5 for simulated data
Fig. 5  Simulated data using tile size 32 × 32 px and normalized features

Fig. 6  Examples of classification results on test data. Correctly classified damage is coloured green, yellow marks false positives (classified damage where there is none), and red marks false negatives (damage that was not detected)

Real data

The method has shown promise in identifying potential damage areas, so the next step is to test it on the real data set using various tile sizes. The normalized features from the DNN are used as output. The best result was achieved with a tile size of 32 × 32 pixels, as shown in Fig. 7.

Fig. 7  Real data generated with a segmentation of 50 × 50 cm and tile size 32 × 32 px, normalized features

A threshold of -1.55 was selected; the resulting confusion matrix is shown in Table 2. At this threshold, the accuracy is 90.5%, the precision 72.2%, the recall 72.6%, and the F1 score 72.4%. The MCC of 0.66 indicates a good classification, albeit slightly lower than on the simulated data.

Table 2  Confusion matrix for threshold -1.55 for real data

The classification results for two images are shown in Fig. 8. Green, red, and yellow again indicate TP, FN, and FP. In the middle, the original heightfield is shown again with increased contrast to make grey-value differences more visible to the human eye.

Fig. 8  Example classification on real data; in the middle, the depth image with increased contrast

Although the results were not as good as those obtained using simulated data, the method still appears to be effective when applied to real data.

The presented methodology is intended to automatically create a suspicion map of potentially damaged regions from point clouds. To be usable in practice, as many damage regions as possible must be found, and ideally only genuinely damaged regions should be flagged; precision and recall together should therefore be as high as possible. The methodology was first tested on simulated data and then applied to real data.

The method demonstrated strong performance on simulated data, with an F1 score of 96.3% and an MCC of 0.96. These results indicate that the method is suitable for generating suspicion maps of damage on quay walls, meeting the desired criteria for accuracy and completeness.

The results on real data were somewhat worse, with an F1 score of 72.4% and an MCC of 0.66. There are several potential reasons for this discrepancy. One factor is the presence of additional objects such as ladders, fenders, plants, and ropes in the real data, which affect the measurements. Additionally, the data were not cleaned or filtered, so outliers and sensor artefacts may influence the results. The noise in the real data may also not be normally distributed and could contain systematic components, in contrast to the simulated data. Finally, a pre-existing CAD model was used to determine distances for the simulated data, while a model had to be created for the real data. Despite these differences, the results are still valuable for automated damage detection and digitally assisted building inspection.

Conclusion and outlook

Identifying damaged areas is important for maintaining aging port infrastructure. In this study, point clouds collected with a multibeam echo sounder and profile laser scanners were converted into depth images and processed using a pre-trained CNN with an LOF extension. The automated method of creating heightfields from point clouds and classifying them using a DNN in combination with the LOF proved effective. The experiments yielded very good results on the simulated data; the classification of the real data was slightly worse but still sufficient. Overall, the method is useful for detecting damage and can be applied to improve inspection procedures. In the future, classification may be improved by removing additional objects from the data more effectively. Further research will also focus on combining 3D point clouds and colour images to detect small cracks and non-geometric damage such as rust or efflorescence, as well as on developing more advanced, knowledge-based data cleansing techniques.