Introduction

Sewerage systems are important infrastructure for improving public health and maintaining water quality. In Japan, however, the total length of sewer pipes that have exceeded their 50-year service life has reached approximately 22,000 km, and this figure increases every year [1]. In 2019, approximately 2900 road subsidence incidents were reported in Japan [1]. To prevent such collapses, the inside of the pipes must be inspected. For this reason, worm-shaped robots for pipe inspection, such as the one shown in Fig. 1, and the associated inspection methods are being studied [2].

Fig. 1 Earthworm robot [2]

In this research, we focus on pressure pipes, which account for 5 to 10% of all sewer pipes inspected in Japan. Compared to conventional sewer pipes, pressure pipes are less restricted by topography and can be laid relatively freely [3]. For this reason, inspection methods for them have not yet been established. The defects that occur in pressure pipes differ depending on the material: vinyl chloride pipes deform under soil pressure, while cast iron pipes develop rust on their inner surface. These defects must be detected and localized from inspection images.

In recent years, with the development of deep learning, research on anomaly detection, such as defect inspection of industrial products, has been active [4]. In general, supervised learning is difficult for anomaly detection because there is very little anomalous data compared to normal data. For this reason, unsupervised learning on non-defect data is often used, with deviations from the normal distribution considered abnormal. It is therefore considered effective to formulate pipe inspection as an anomaly detection problem. Oyama et al. [5] proposed a method for detecting abnormalities in piping using a Variational AutoEncoder (VAE) [6] and a Residual Network (ResNet) [7]. Specifically, the VAE is trained only on non-defect images, and the degree of abnormality is calculated from the difference between the input image and the generated image. The method also estimates the anomaly location by feeding the input image and the generated image to ResNet and comparing their intermediate-layer features. However, VAE has the disadvantage of producing blurred images, so the location of anomalies could not be estimated reliably. In addition, the experiment was conducted on a self-made dataset imitating rust, and no verification was performed on actual cast iron pipe images.

The objective of this paper is to detect rust anomalies in cast iron pipes and to estimate their locations. Specifically, we use a deep generative model, the Generative Adversarial Network (GAN) [8]. Anomaly detection and anomaly location estimation are performed by combining f-AnoGAN [9], a GAN-based anomaly detection method, with Lightweight GAN [10], a model for image generation. The validity of the method is confirmed using actual cast iron pipe images captured by a camera mounted on an earthworm robot. Validation is also performed on Sewer-ML [11], a public dataset.

Proposed method

Outline of proposed methodology

Figure 2 shows the flow of the proposed method. First, the GAN is trained on the non-defect images among those taken by the worm-shaped robot. Next, the GAN parameters are fixed, and the Encoder is trained on the same non-defect images. The trained model is used to generate an image, and the difference from the input image is computed. If the difference is small, the input is judged to be normal; if it is large, it is judged to be abnormal. When an abnormality is detected, its location is estimated from the subtraction image. In this research, we face the constraint of having only a small amount of training data, since it is still difficult to conduct experiments with actual robots.
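For illustration, this overall pipeline can be summarized in the following PyTorch-style sketch. The helper names (train_lightweight_gan, train_encoder, normal_loader) and the threshold are hypothetical placeholders, not the exact implementation used in this work.

# Illustrative outline of the training/inference pipeline (names are placeholders).
# Step 1: train the GAN on non-defect images only.
generator, discriminator = train_lightweight_gan(normal_loader)

# Step 2: freeze the GAN and train the Encoder on the same non-defect images.
for p in generator.parameters():
    p.requires_grad = False
encoder = train_encoder(generator, normal_loader)

# Step 3: inference -- reconstruct a test image and threshold the difference.
def inspect(x, threshold):
    x_hat = generator(encoder(x))      # reconstruction of the input
    diff = (x - x_hat).abs()           # per-pixel subtraction image
    score = diff.sum().item()          # anomaly score (sum of differences)
    label = "abnormal" if score > threshold else "normal"
    return label, diff                 # diff is also used to localize anomalies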

Fig. 2 Outline of proposed method

Generative adversarial networks

GAN consists of two neural networks, a Generator and a Discriminator, which are trained adversarially. The Generator takes noise as input and generates images that resemble the training data. The Discriminator takes either generated images or training images as input and identifies whether they are real or fake. The Generator aims to make the Discriminator misclassify its outputs as real. Through this adversarial training, the Generator learns to produce images that resemble the training data.
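As a rough illustration of this alternating optimization (assuming PyTorch, a generic Generator G and Discriminator D that output raw scores, and the standard non-saturating logistic loss rather than the hinge loss used later in this paper), one update step might look like:

import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, z_dim=256, device="cpu"):
    # Discriminator update: real images should score high, fakes low.
    z = torch.randn(real.size(0), z_dim, device=device)
    fake = G(z).detach()
    loss_D = F.softplus(-D(real)).mean() + F.softplus(D(fake)).mean()
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator update: try to make D classify newly generated fakes as real.
    z = torch.randn(real.size(0), z_dim, device=device)
    loss_G = F.softplus(-D(G(z))).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()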

Lightweight GAN

Lightweight GAN [10] is a model that can generate high-resolution images from a small amount of training data in a short training time. In general, tens to hundreds of thousands of images are required to train a GAN that generates high-resolution images. Lightweight GAN achieves this with little data and a short training time by using a Skip-Layer channel-wise Excitation (SLE) module in the Generator and a self-supervised Discriminator.

The structure of the Generator is shown in Fig. 3. The blue arrows indicate upsampling and convolution. The red arrows indicate the inputs and outputs of the SLE module, which takes a small feature map and a large feature map as inputs and fuses them. This allows gradients to propagate between distant layers at low computational cost.
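A minimal sketch of the SLE module, following the description in [10], is shown below; the exact channel sizes and kernel settings are assumptions for illustration.

import torch.nn as nn

class SLE(nn.Module):
    """Skip-Layer channel-wise Excitation (sketch based on [10]).
    Gates the channels of a large feature map using a small one."""
    def __init__(self, ch_small, ch_large):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(4),                 # squeeze the small map to 4x4
            nn.Conv2d(ch_small, ch_large, 4, 1, 0),  # reduce to 1x1 spatial size
            nn.LeakyReLU(0.1),
            nn.Conv2d(ch_large, ch_large, 1),
            nn.Sigmoid(),
        )

    def forward(self, feat_small, feat_large):
        # Channel-wise weights broadcast over the spatial dims of the large map.
        return feat_large * self.gate(feat_small)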

Fig. 3 The structure of the generator

The structure of the Discriminator is shown in Fig. 4. The Discriminator is trained in a self-supervised manner. The blue arrows indicate down-sampling and 5 × 5 resizing. The difference between the resized training data and the images reconstructed from the Discriminator's feature maps is used as a loss.

Fig. 4 The structure of the discriminator

In addition, an 8 × 8 region is randomly cropped from the 16 × 16 feature map and fed to a Simple Decoder; its output is \(I^{\prime}_{part}\). The 8 × 8 feature map is also fed to a Simple Decoder, and its output is \(I^{\prime}\). The corresponding processing is applied to the training image, and the results are denoted \(I\) and \({I}_{part}\), respectively.

The differences between \(I^{\prime}\) and \(I\), and between \(I^{\prime}_{part}\) and \({I}_{part}\), are used as the reconstruction error. This is expected to capture both the whole image and local features.

The loss function is a hinge-type adversarial loss, and the reconstruction error \({L}_{recons}\) is added when the training data is input to the Discriminator. The Discriminator loss \({L}_{D}\) and the Generator loss \({L}_{G}\) are expressed by the following equations.

$$L_{D} = - {\text{E}}_{x\sim I_{real}} \left[ {\min \left( {0, - 1 + D\left( x \right)} \right)} \right] - {\text{E}}_{\hat{x}\sim G\left( z \right)} \left[ {\min \left( {0, - 1 - D\left( \hat{x} \right)} \right)} \right] + L_{recons}$$
(1)
$$L_{G} = - {\text{E}}_{z\sim N} \left[ {D\left( {G\left( z \right)} \right)} \right]$$
(2)

D is the Discriminator, G is the Generator, x is a sample of the training data, z is the noise sampled from the latent space, and \(\hat{x}\) is the data generated by the Generator.
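The adversarial parts of Eqs. 1 and 2 are the standard hinge losses; a direct PyTorch translation (with the reconstruction term of Eq. 1 added separately) could look like the following sketch.

import torch.nn.functional as F

def d_hinge_loss(d_real, d_fake):
    # Eq. 1 without the reconstruction term:
    # -E[min(0, -1 + D(x))] - E[min(0, -1 - D(x_hat))]
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake):
    # Eq. 2: the Generator tries to maximize the Discriminator's score on fakes.
    return -d_fake.mean()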

The reconstruction error \({L}_{recons}\) is

$$L_{recons} = {\text{E}}_{f\sim D_{encode}\left( x \right),\;x\sim I_{real}} \left[ {\left\| {g\left( f \right) - T\left( x \right)} \right\|} \right]$$
(3)

Here, f is an intermediate feature map of the Discriminator, g is the Decoder's processing of f, and T is the processing (resizing and cropping) applied to x. Learned Perceptual Image Patch Similarity (LPIPS) [12] is used to compute the reconstruction error. LPIPS is a measure of the perceptual difference between two images.
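A sketch of this reconstruction loss is given below, assuming the lpips Python package for [12] and that the caller supplies the image region (crop_box) corresponding to the randomly cropped feature patch; these details are assumptions, not the exact implementation.

import torch.nn.functional as F
import lpips  # pip install lpips; perceptual distance of [12]

percep = lpips.LPIPS(net="vgg")  # expects images scaled to [-1, 1]

def recon_loss(decoded_full, decoded_part, real, crop_box):
    # decoded_full / decoded_part: Simple Decoder outputs I' and I'_part.
    # real: the training image x; T(x) here is resizing (and cropping) to the decoder size.
    i = F.interpolate(real, size=decoded_full.shape[-2:], mode="bilinear",
                      align_corners=False)
    top, left, h, w = crop_box  # image region matching the cropped 8x8 feature patch
    i_part = F.interpolate(real[:, :, top:top + h, left:left + w],
                           size=decoded_part.shape[-2:], mode="bilinear",
                           align_corners=False)
    return percep(decoded_full, i).mean() + percep(decoded_part, i_part).mean()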

In addition, Differentiable Augmentation [13], a data augmentation method for GANs, enables training with small amounts of data. Differentiable Augmentation applies data augmentation not only to the real input images but also to the images generated by the Generator before they are passed to the Discriminator.
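The sketch below illustrates the idea with two simple differentiable augmentations (random brightness and random translation); it is not the full augmentation policy of [13].

import torch

def diff_augment(x):
    # Both operations are differentiable with respect to x, so gradients still
    # flow back to the Generator when augmented fakes are fed to the Discriminator.
    x = x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)       # brightness
    shift = torch.randint(-8, 9, (2,))
    x = torch.roll(x, shifts=(int(shift[0]), int(shift[1])), dims=(2, 3))  # translation
    return x

# usage: d_real = D(diff_augment(real)); d_fake = D(diff_augment(G(z)))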

f-AnoGAN

f-AnoGAN is an anomaly detection method based on GAN. Although a GAN has no mechanism for reconstructing its inputs, this model enables anomaly detection by introducing an Encoder that maps images back to the latent space.

The flow of the method is as follows. At training time, the GAN is first trained on non-defect data. Next, the Encoder is trained using the trained GAN: data x is input to the Encoder to obtain a latent vector z = E(x), which is then fed to the trained Generator to produce G(E(x)). The Encoder is trained to minimize the reconstruction error between the input data x and the generated data G(E(x)). During inference, the test data are input to the trained model, and the reconstruction error between the input data x and the generated data G(E(x)) is taken as the anomaly score.
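A minimal sketch of the Encoder training step is shown below, assuming PyTorch and a data loader that yields non-defect image tensors; the reconstruction error corresponds to Eq. 4 given later.

import torch
import torch.nn.functional as F

def train_encoder(E, G, loader, epochs=500, lr=1e-4, device="cpu"):
    # G is the trained Generator and is kept frozen; only the Encoder is updated.
    G.eval()
    for p in G.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(E.parameters(), lr=lr)
    for _ in range(epochs):
        for x in loader:                  # non-defect images only
            x = x.to(device)
            x_hat = G(E(x))               # reconstruction through the latent space
            loss = F.mse_loss(x_hat, x)   # reconstruction error (Eq. 4)
            opt.zero_grad(); loss.backward(); opt.step()
    return E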

Proposed model

In this paper, we use a model that combines f-AnoGAN and Lightweight GAN. The proposed model is shown in Fig. 5. The original f-AnoGAN uses WGAN [14], which has the disadvantage of long training time. Therefore, Lightweight GAN is used instead of WGAN. Using a model specialized for image generation is expected to allow more detailed anomaly locations to be estimated from the subtraction images, and to be more robust when training on small amounts of data. As in f-AnoGAN, a 4-layer ResNet is used for the Encoder.

Fig. 5 Proposed method (image generation by Lightweight GAN)

Loss function

The GAN loss function uses Eqs. 1 and 2.

Mean Squared Error is used for the loss function \({L}_{E}\) of the Encoder.

$$L_{E} \left( x \right) = \frac{1}{n}\left\| {x - G\left( {E\left( x \right)} \right)} \right\|^{2}$$
(4)

n is the total number of pixels, x is the input data, and G(E(x)) is the generated data.

Calculation of anomaly score

The anomaly score is calculated by taking the difference between the input image and the generated image and summing the values of all pixels in the subtraction image. A threshold is then set on the calculated score to determine whether the image is normal or abnormal.
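The following minimal sketch shows this scoring, assuming the absolute per-pixel difference is summed and that the threshold is chosen separately (e.g., on validation data).

import torch

def anomaly_score(x, x_hat):
    # Sum of per-pixel absolute differences between input and generated image.
    return (x - x_hat).abs().sum().item()

def is_abnormal(x, x_hat, threshold):
    # The threshold value is not fixed by the method itself.
    return anomaly_score(x, x_hat) > threshold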

Estimation of abnormal locations

The subtraction image between the input image and the generated image is used to estimate the anomaly location. The subtraction image is visualized with a colormap in which blue indicates small difference values and red indicates large values. Red areas in the subtraction image are therefore estimated to be anomaly locations.
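Such a visualization can be produced, for example, with matplotlib's blue-to-red "jet" colormap, as in the sketch below (x and x_hat are assumed to be (C, H, W) tensors).

import matplotlib.pyplot as plt

def save_anomaly_map(x, x_hat, path="anomaly_map.png"):
    # Per-pixel absolute difference averaged over color channels, shown so that
    # large differences (candidate anomaly locations) appear red.
    diff = (x - x_hat).abs().mean(dim=0).cpu().numpy()
    plt.imshow(diff, cmap="jet")
    plt.colorbar()
    plt.axis("off")
    plt.savefig(path, bbox_inches="tight")
    plt.close()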

Experiment

Experimental condition 1

Experiments were conducted with the proposed method to detect anomalies and estimate their locations in actual cast iron pipe images. The training data consisted of approximately 2200 normal images of actual cast iron pipes and approximately 2200 normal images of new cast iron pipes, for a total of approximately 4400 images. Examples of the training data are shown in Figs. 6 and 7. The test data consisted of 200 images in total: 50 normal images of actual cast iron pipes, 50 normal images of new cast iron pipes, and 100 abnormal images of actual cast iron pipes. Examples of abnormal test images are shown in Fig. 8. All training and test images were resized to 256 × 256 pixels.

Fig. 6 Example of normal cast iron pipe images

Fig. 7 Example of normal new cast iron pipe images

Fig. 8 Example of abnormal cast iron pipe images

The evaluation metric is the Area Under the Receiver Operating Characteristic curve (AUROC), i.e., the area under the ROC curve. Its values range from 0 to 1, and a value closer to 1 indicates better model performance. No abnormality threshold needs to be set when evaluating with AUROC.
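For reference, AUROC can be computed directly from the anomaly scores, e.g., with scikit-learn; the values below are purely illustrative.

from sklearn.metrics import roc_auc_score

# Illustrative toy values: 0 = normal, 1 = abnormal; scores come from the anomaly score above.
y_true  = [0, 0, 0, 1, 1, 1]
y_score = [0.12, 0.08, 0.30, 0.85, 0.60, 0.95]
print(f"AUROC = {roc_auc_score(y_true, y_score):.3f}")  # 1.0 for this toy example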

The number of training iterations for the GAN was set to 50,000 with a batch size of 8. The Encoder was trained for 500 epochs with a batch size of 16.

Experimental results 1

Figures 9, 11 and 13 show the experimental results of the proposed GAN-based method, and Figs. 10, 12 and 14 show those of the VAE. In Figs. 9, 10, 11, 12, 13 and 14, the images are, from left to right, the input image, the generated image, and the subtraction image. In the subtraction images, blue indicates small difference values and red indicates large difference values. Figures 9 and 10 show a normal image of an actual cast iron pipe, Figs. 11 and 12 show a normal image of a new cast iron pipe, and Figs. 13 and 14 show the results for an abnormal image of an actual cast iron pipe. In the histograms of anomaly scores in Figs. 15 and 16, the blue distribution represents normal data and the orange distribution represents abnormal data. The AUROC of the proposed method was 0.986, while the AUROC of the VAE was 1.0.

Fig. 9 Experimental results of normal cast iron pipe (GAN)

Fig. 10 Experimental results of normal cast iron pipe (VAE)

Fig. 11 Experimental results of normal new cast iron pipe (GAN)

Fig. 12 Experimental results of normal new cast iron pipe (VAE)

Fig. 13 Experimental results of rust anomaly cast iron pipe (GAN)

Fig. 14 Experimental results of rust anomaly cast iron pipe (VAE)

Fig. 15 Histogram of anomaly score 1 (GAN)

Fig. 16 Histogram of anomaly score 1 (VAE)

The high AUROC value of 0.986 suggests that anomaly detection on this test data is feasible.

The generated images for normal cast iron pipes in Figs. 9 and 11 closely resemble their input images. This suggests that normal input images can be correctly judged as normal, regardless of whether they show a new pipe or an actual pipe.

Figures 10 and 12 suggest that the VAE also generates correct images for both actual and new pipes.

The generated image for the abnormal cast iron pipe in Fig. 13 shows that the image is generated as if the input image contained no abnormal areas. In the subtraction image, the abnormal white areas appear reddish, which indicates that the anomaly location can be estimated.

On the other hand, the subtraction image obtained from the VAE's ResNet features in Fig. 14 shows that the anomaly location cannot be estimated.

The histogram of anomaly scores in Fig. 15 shows that the normal and abnormal distributions are separated to some extent, although they partially overlap. This overlap occurs because some images could not be generated correctly, possibly because the Encoder's batch size was too small, resulting in poor feature extraction.

The histogram for the VAE in Fig. 16 shows a complete separation of the normal and abnormal distributions. This suggests that the VAE correctly distinguishes normal from abnormal images on this dataset.

Experimental condition 2

Experiments were conducted with the proposed method to detect anomalies and estimate their locations on Sewer-ML, a public dataset. Sewer-ML is a multi-label sewer defect classification dataset. The training data consisted of approximately 14,000 normal images; examples are shown in Fig. 17. The test data consisted of 2000 images in total: 1000 normal images and 1000 abnormal images. Examples of abnormal test images are shown in Fig. 18.

Fig. 17 Example of normal pipe images

Fig. 18 Example of abnormal pipe images

As in Experiment 1, AUROC is used as the evaluation metric.

Experimental results 2

Figures 19 and 21 show the experimental results of the proposed GAN-based method, and Figs. 20 and 22 show those of the VAE. In Figs. 19, 20, 21 and 22, the images are, from left to right, the input image, the generated image, and the subtraction image. Figures 19 and 20 show a normal image, and Figs. 21 and 22 show the results for an abnormal image. In the histograms of anomaly scores in Figs. 23 and 24, the blue distribution represents normal data and the orange distribution represents abnormal data. The AUROC of the proposed method was 0.634, while the AUROC of the VAE was 0.649.

Fig. 19 Experimental results of normal pipe (GAN)

Fig. 20 Experimental results of normal pipe (VAE)

Fig. 21 Experimental results of obstacle anomaly pipe (GAN)

Fig. 22 Experimental results of obstacle anomaly pipe (VAE)

Fig. 23 Histogram of anomaly score 2 (GAN)

Fig. 24 Histogram of anomaly score 2 (VAE)

The AUROC of 0.634 indicates that anomaly detection on Sewer-ML is difficult for the proposed method, and the AUROC of 0.649 for the VAE indicates that it is similarly difficult.

The generated image for the normal pipe in Fig. 19 shows that the proposed method is able to generate an image similar to the input image. Similarly, the VAE in Fig. 20 is able to generate an image similar to the input image.

The generated image for the abnormal pipe in Fig. 21 shows that the image is generated as if there were no obstacle in the lower left corner, which is the anomaly location. In the subtraction image, the anomaly appears red, which suggests that its location can be estimated. On the other hand, the generated image of the VAE in Fig. 22 reproduces the anomalous region, so the subtraction image from the ResNet features does not reveal the anomaly location.

The histograms of anomaly scores in Figs. 23 and 24 do not show much separation between the normal and abnormal distributions in either case, which suggests that anomaly detection on Sewer-ML is difficult. This is due to the large variation in pipe types; dividing the pipes by material could lead to better results. Areas where the conduit branches, as shown in Fig. 25, look different from place to place, so there is little training data for such examples and it is difficult to generate accurate images. Straight conduits, for which more training data are available, are considered more suitable.

Fig. 25 Example of difficulties in detecting anomalies

Conclusion

In this paper, we proposed a GAN-based anomaly detection method for detecting anomalies in piping. The f-AnoGAN and Lightweight GAN models are combined and trained on non-defect images; anomaly detection is performed by taking the difference between input images and generated images, and anomalous locations are estimated from the subtraction images. Experiments were conducted on actual cast iron pipe images, and the AUROC was as high as 0.986, confirming the effectiveness of the proposed method. We also validated the method on Sewer-ML, a public dataset. A comparison with the conventional method showed that the proposed method was superior in estimating anomalous locations. For practical applications, it is difficult to collect data covering the wide variety of defect types. Compared to anomaly detection based on object detection, the proposed method therefore has the advantage that it does not require training on defective images; otherwise, pixel-wise labelling of defective areas would be required, which is a very tedious task. A disadvantage is that if the defect area is small, the difference will also be small, and the image may not be judged as abnormal. Future work includes improving the model so that it can generate more accurate non-defect images.