Abstract
Anomaly detection is an important branch of computer vision, and a variety of deep learning models are now applied to it. However, the scarcity of abnormal samples makes supervised learning difficult to apply. In this paper, we study anomaly detection based on unsupervised learning and propose a Fully-Nested Encoder-decoder Framework. The core of the proposed generating model is a generator and a discriminator, which are adversarially trained on normal data samples only. To improve the image reconstruction capability of the generator, we design a Fully-Nested Residual Encoder-decoder Network to encode and decode the images. In addition, we add residual structures to both the encoder and the decoder, which reduces the risk of overfitting and enhances the feature expression ability. In the test phase, a distance measurement model determines whether a test sample is abnormal. Experimental results on the CIFAR-10 dataset demonstrate the excellent performance of our method: compared with existing models, it achieves state-of-the-art results.
1 Introduction
Anomaly detection is becoming increasingly important in visual tasks. In industrial production, detecting faults in machine parts by means of anomaly detection can greatly improve production efficiency. Over the years, scholars have done a large amount of preliminary work [1,2,3,4,5,6] to explore the development of this field. The development of CNNs offers new ideas for image anomaly detection. From the proposal of the LeNet [7] structure, to AlexNet [8], to VGG [9] and the Inception series [10,11,12], the performance of CNNs has improved steadily. Supervised learning methods based on CNNs have been widely used to detect anomalies. However, in some engineering areas, the lack of anomaly samples hinders the development of supervised anomaly detection methods. Without abnormal samples, traditional approaches such as object detection, semantic segmentation and image classification cannot be trained. Therefore, anomaly detection methods that rely only on normal samples are urgently needed.
The development of GAN in recent years has provided new ideas for anomaly detection based on normal samples. As an unsupervised generative method, GAN was proposed by Goodfellow et al. [13] in 2014. Subsequently, methods such as LAPGAN, CGAN, InfoGAN, and CycleGAN [14,15,16,17] gradually enhanced the performance of GAN. AnoGAN [18] applied GAN to image anomaly detection and realized anomaly detection without abnormal samples: it trains a DCGAN [19] on normal samples only and introduces an image distance measurement model to judge whether a sample is abnormal. After that, Efficient-GAN [20], ALAD [21] and f-AnoGAN [22] further improved the performance of GAN-based anomaly detection models.
Building on GAN backbone networks, Akcay et al. proposed GANomaly [23], which trains an autoencoder with an adversarial mechanism to perform image reconstruction. Skip-GANomaly [24] adds skip connections between the encoding and decoding parts of the generator on the basis of GANomaly to reduce information loss and enhance model performance. However, in some small-target anomaly detection tasks, such as the bird class in the CIFAR-10 dataset [25], the performance of f-AnoGAN, Skip-GANomaly and GANomaly is not satisfactory. Moreover, current encoder-decoder networks lack stability and robustness in training.
In this paper, we study anomaly detection based on unsupervised learning and propose a Fully-Nested Encoder-decoder Framework. The method consists of a generating model and a distance measurement model. The generating model includes a generator and a discriminator, while the distance measurement model is used to detect data anomalies. As the generator, we design a Fully-Nested Residual Encoder-decoder Network. Taking into account that different datasets require different network depths, the generator nests encoding-decoding networks of several depths, which lets each dataset select the best-depth encoding-decoding network. The discriminant network of DCGAN serves as the discriminator of the model. Experiments on the CIFAR-10 dataset demonstrate the excellent performance of our method.
2 Proposed Method
This paper proposes a Fully-Nested Encoder-decoder Framework for anomaly detection. As shown in Fig. 1, the method consists of two parts: a generating model and a distance measurement model. The generating model learns the distribution of normal data to reconstruct normal samples. During training, the model uses a classification network as a discriminator and trains the generator with the adversarial mechanism. Furthermore, we introduce the distance measurement model, a distance calculation method: in the test phase, the distance between the reconstructed image and the real image determines whether the test sample is abnormal.
2.1 Generating Model
The generating model reconstructs the image by learning the distribution of normal samples. Choosing a high-performance encoder-decoder network is essential for image reconstruction, since the composition of the encoder and decoder directly affects the quality of the reconstructed image.
In the generating model, the generator is a fully nested residual network divided into an encoding part and a decoding part, as shown in Fig. 2. The network can be regarded as multiple encoding-decoding networks of different scales nested together. The encoder is a shared branch. The decoder decodes the deep semantic feature maps of four different scales generated by the encoder, producing four parallel decoding branches. The generating model uses a classification network as the discriminator and is trained with the adversarial mechanism. Throughout the network, Batch Normalization [26] and ReLU activation functions [27] are used.
The encoder is the shared part, as shown in the black dotted box in Fig. 1, represented as \({G}_{E}\). It reads in the input image \({x}_{real}\) and generates the deep semantic feature maps \(z=({z}_{1},{z}_{2},{z}_{3},{z}_{4})\), as shown in Formula (1),

\(z=({z}_{1},{z}_{2},{z}_{3},{z}_{4})={G}_{E}\left({x}_{real}\right)\)     (1)
The decoder network decodes \(({z}_{1},{z}_{2},{z}_{3},{z}_{4})\) through four parallel branches, \({D}_{1}, {D}_{2}, {D}_{3}\) and \({D}_{4}\), collectively denoted \({G}_{D}\), as shown in the red dotted box in Fig. 1. Moreover, the internal decoding branches use dense skip connections to connect to adjacent external decoding branches for feature fusion. Skip connections enhance the transfer of detailed information between branches, greatly reducing information loss. The final layer of the outermost decoding branch outputs the reconstructed image \({x}_{fake}\) of the generator, as shown in Formula (2),

\({x}_{fake}={G}_{D}\left(z\right)={G}_{D}({z}_{1},{z}_{2},{z}_{3},{z}_{4})\)     (2)
We add residual structures to both the encoder and the decoder to improve the feature expression ability and reduce the risk of overfitting. Through back propagation over the four nested scales, the model can independently select a network of suitable depth for each dataset.
We add a classification network after the generator as the discriminator of the model, which is the classification network of DCGAN model, denoted by \(D\left( \cdot \right)\). For the input image, the discriminator network identifies whether it is normal sample \({x}_{real}\) or the image \({x}_{fake}\) reconstructed by the generator.
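As a concrete illustration, the following is a simplified, hypothetical PyTorch sketch (not the authors' exact implementation) of a residual block and a two-scale nested encoder-decoder with a skip connection between branches; the real generator nests four scales, and the layer widths (`base=32`) are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Conv block with a residual shortcut, Batch Normalization and ReLU."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))  # residual shortcut

class NestedEncoderDecoder(nn.Module):
    """Two-scale sketch: the shared encoder produces features at each depth
    (analogues of z1, z2), and the outer decoding branch fuses the encoder
    features with the upsampled inner branch via a skip connection."""
    def __init__(self, in_ch=3, base=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), ResBlock(base))
        self.down = nn.Conv2d(base, base * 2, 4, stride=2, padding=1)        # 32 -> 16
        self.enc2 = ResBlock(base * 2)
        self.up = nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1)  # 16 -> 32
        self.dec = nn.Sequential(nn.Conv2d(base * 2, base, 3, padding=1), ResBlock(base))
        self.out = nn.Sequential(nn.Conv2d(base, in_ch, 3, padding=1), nn.Tanh())

    def forward(self, x):
        e1 = self.enc1(x)               # shallow features (z1 analogue)
        e2 = self.enc2(self.down(e1))   # deeper features (z2 analogue)
        d = self.dec(torch.cat([e1, self.up(e2)], dim=1))  # skip connection fusion
        return self.out(d)

x_real = torch.randn(2, 3, 32, 32)
x_fake = NestedEncoderDecoder()(x_real)  # same shape as the input
```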
The dataset is divided into a training set \({D}_{train}\) and a test set \({D}_{test}\). The training set \({D}_{train}\) contains only normal samples, while the test set \({D}_{test}\) contains both normal and abnormal samples. In the training phase, the model uses only normal samples to train the generator and discriminator. In the test phase, the distance between each test image and its reconstruction generated by the generator is calculated to determine whether it is abnormal.
2.2 Distance Measurement Model
In the test phase, we calculate an anomaly score for each test image to measure whether it is abnormal. Given the test set \({D}_{test}\) and an input \({x}_{test}\), the anomaly score is defined as \(A\left({x}_{test}\right)\). We use two kinds of distances to measure the difference between \({x}_{test}\) and \({x}_{fake}\). First, the \({L}_{1}\) distance between \({x}_{test}\) and \({x}_{fake}\), denoted \(R\left({x}_{test}\right)\), describes the pixel-level difference between the reconstructed image and the input image. Second, the \({L}_{2}\) distance between \(f\left({x}_{test}\right)\) and \(f\left({x}_{fake}\right)\), denoted \(L\left({x}_{test}\right)\), describes the difference in semantic features. The formulas for \(A\left({x}_{test}\right)\), \(R\left({x}_{test}\right)\), and \(L\left({x}_{test}\right)\) are as follows,

\(A\left({x}_{test}\right)=\lambda R\left({x}_{test}\right)+\left(1-\lambda \right)L\left({x}_{test}\right)\)     (3)

\(R\left({x}_{test}\right)={\Vert {x}_{test}-{x}_{fake}\Vert }_{1}\)     (4)

\(L\left({x}_{test}\right)={\Vert f\left({x}_{test}\right)-f\left({x}_{fake}\right)\Vert }_{2}\)     (5)
where \(\lambda \) is the weight to balance the two distances \(R\left({x}_{test}\right)\) and \(L\left({x}_{test}\right)\). In the proposed model, \(\lambda \) is set to 0.9.
To better measure whether an input image is abnormal, the anomaly scores of all images in the test set \({D}_{test}\) calculated by Formula (3) are normalized. Let \(A=\{{A}_{i}:A\left({x}_{test,i}\right),\ {x}_{test,i}\in {D}_{test}\}\) be the set of anomaly scores over the test set \({D}_{test}\). The model maps the set of anomaly scores \(A\) to the interval [0, 1] by Formula (6),

\({A}^{^{\prime}}\left({x}_{test}\right)=\frac{A\left({x}_{test}\right)-\mathrm{min}\left(A\right)}{\mathrm{max}\left(A\right)-\mathrm{min}\left(A\right)}\)     (6)
We set a threshold for \({A}^{^{\prime}}\left({x}_{test}\right)\). Samples with an anomaly score greater than the threshold are judged abnormal; the rest are judged normal.
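The scoring pipeline above can be sketched in NumPy as follows. The arrays are synthetic stand-ins (the feature arrays play the role of the discriminator features \(f(\cdot)\)), and the weighting convention \(A=\lambda R+(1-\lambda )L\) with \(\lambda =0.9\) is an assumption following the Skip-GANomaly line of work.

```python
import numpy as np

def anomaly_scores(x_test, x_fake, f_test, f_fake, lam=0.9):
    """Weighted sum of the L1 image distance and the L2 feature distance,
    followed by min-max normalization of the scores to [0, 1]."""
    n = x_test.shape[0]
    R = np.abs(x_test - x_fake).reshape(n, -1).sum(axis=1)            # L1, per sample
    L = np.sqrt(((f_test - f_fake).reshape(n, -1) ** 2).sum(axis=1))  # L2, per sample
    A = lam * R + (1.0 - lam) * L
    return (A - A.min()) / (A.max() - A.min() + 1e-12)                # normalize

rng = np.random.default_rng(0)
x_test = rng.normal(size=(8, 3, 32, 32))
x_fake = x_test + rng.normal(scale=0.1, size=x_test.shape)  # mock reconstructions
f_test = rng.normal(size=(8, 128))                          # stand-in features
f_fake = rng.normal(size=(8, 128))

scores = anomaly_scores(x_test, x_fake, f_test, f_fake)
is_abnormal = scores > 0.5  # threshold on the normalized score
```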
2.3 Training Strategy
The loss function of the model consists of three kinds of loss functions, which are Adversarial Loss, Contextual Loss, and Latent Loss.
In order to maximize the reconstruction ability of the model during the training phase and ensure that the generator reconstructs the normal image \({x}_{real}\) as realistically as possible, the discriminator should distinguish the normal image \({x}_{real}\) from the reconstructed image \({x}_{fake}\) generated by the generator as accurately as possible. The Adversarial Loss is defined with cross entropy, as shown in Formula (7),

\({\mathcal{L}}_{adv}=\mathbb{E}\left[\mathrm{log}D\left({x}_{real}\right)\right]+\mathbb{E}\left[\mathrm{log}\left(1-D\left({x}_{fake}\right)\right)\right]\)     (7)
In order to make the reconstructed image \({x}_{fake}\) generated by the generator obey the data distribution of normal images as much as possible and conform to the image context, the model defines the Contextual Loss as the SmoothL1 Loss [28] between the normal image and the reconstructed image, as shown in Formula (8):

\({\mathcal{L}}_{con}={S}_{L1}\left({x}_{real},{x}_{fake}\right)\)     (8)
where \({S}_{L1}\) represents the SmoothL1 Loss function.
In order to pay more attention to the differences between the reconstructed image \({x}_{fake}\) and the normal image \({x}_{real}\) in the latent space, the model uses the last convolution layer of the discriminator to extract the bottleneck features \(f\left({x}_{real}\right)\) and \(f\left({x}_{fake}\right)\), and takes the SmoothL1 loss between the two bottleneck features as the Latent Loss, as shown in Formula (10),

\({\mathcal{L}}_{lat}={S}_{L1}\left(f\left({x}_{real}\right),f\left({x}_{fake}\right)\right)\)     (10)
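To make the three loss terms concrete, here is a small NumPy sketch of the loss mathematics: a hand-rolled SmoothL1 (with the PyTorch-default beta of 1) and a binary cross-entropy adversarial term. All sampled inputs and feature shapes are synthetic stand-ins for illustration.

```python
import numpy as np

def smooth_l1(a, b, beta=1.0):
    """SmoothL1 loss S_L1: quadratic for small errors, linear for large ones."""
    d = np.abs(a - b)
    return np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta).mean()

def adversarial_loss(d_real, d_fake, eps=1e-12):
    """Cross entropy with target 1 for normal images, target 0 for reconstructions."""
    return -(np.log(d_real + eps) + np.log(1.0 - d_fake + eps)).mean()

rng = np.random.default_rng(1)
x_real = rng.normal(size=(4, 3, 32, 32))
x_fake = x_real + rng.normal(scale=0.1, size=x_real.shape)  # mock reconstructions
f_real, f_fake = rng.normal(size=(4, 128)), rng.normal(size=(4, 128))

loss_con = smooth_l1(x_real, x_fake)   # Contextual Loss on images
loss_lat = smooth_l1(f_real, f_fake)   # Latent Loss on bottleneck features
loss_adv = adversarial_loss(rng.uniform(0.6, 1.0, 4), rng.uniform(0.0, 0.4, 4))
```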
In the training phase, the model adopts the adversarial mechanism. First, the parameters of the generator are fixed, and the discriminator is optimized by maximizing the Adversarial Loss \({\mathcal{L}}_{adv}\). The objective function is

\(\underset{D}{\mathrm{max}}\,{\mathcal{L}}_{adv}\)
Then, the parameters of the discriminator are fixed, and the generator is optimized by the objective function

\(\underset{G}{\mathrm{min}}\,{w}_{adv}{\mathcal{L}}_{adv}+{w}_{con}{\mathcal{L}}_{con}+{w}_{lat}{\mathcal{L}}_{lat}\)
where \({w}_{adv}\), \({w}_{con}\) and \({w}_{lat}\) are the weight parameters of \({\mathcal{L}}_{adv}\), \({\mathcal{L}}_{con}\) and \({\mathcal{L}}_{lat}\).
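The alternating scheme can be sketched as follows with toy stand-in networks. The optimizer hyperparameters match the settings reported in Sect. 3.2; the tiny linear networks and the use of the discriminator's pre-activation layer as the feature extractor are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real networks, just to show the alternation.
G = nn.Sequential(nn.Linear(16, 16))               # "generator"
D = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())  # "discriminator"
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce, sl1 = nn.BCELoss(), nn.SmoothL1Loss()
w_adv, w_con, w_lat = 1.0, 5.0, 1.0                # weights from Sect. 3.2

x_real = torch.randn(8, 16)
for _ in range(2):
    # (1) Fix G, update D: push D(x_real) toward 1 and D(x_fake) toward 0.
    x_fake = G(x_real).detach()
    loss_d = bce(D(x_real), torch.ones(8, 1)) + bce(D(x_fake), torch.zeros(8, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # (2) Fix D, update G: weighted adversarial + contextual + latent terms,
    # using D's pre-activation layer D[0] as a stand-in feature extractor f.
    x_fake = G(x_real)
    loss_g = (w_adv * bce(D(x_fake), torch.ones(8, 1))
              + w_con * sl1(x_fake, x_real)
              + w_lat * sl1(D[0](x_fake), D[0](x_real).detach()))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```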
3 Experiments
All experiments in this paper are implemented with the PyTorch 1.1.0 framework on an Intel Xeon E5-2664 v4 Gold CPU and an NVIDIA Tesla P100 GPU.
3.1 Dataset
To evaluate the proposed anomaly detection model, this paper conducted experiments on the CIFAR-10 [25] dataset.
The CIFAR-10 dataset consists of 60,000 color images of size 32 × 32, divided into 10 classes of 6,000 images each. When implementing anomaly detection experiments on CIFAR-10, we regard one class as the abnormal class and the other 9 classes as the normal class. Specifically, we use 45,000 normal images from the 9 normal classes as training samples, and the remaining 9,000 normal images together with the 6,000 images of the abnormal class as test samples.
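The one-vs-rest split can be sketched as follows. The label arrays here are synthetic stand-ins for the actual CIFAR-10 train/test labels (e.g. from torchvision), and class 2 ("bird") is an arbitrary choice of abnormal class for illustration.

```python
import numpy as np

# Synthetic stand-ins for CIFAR-10's 50,000 train and 10,000 test labels.
rng = np.random.default_rng(0)
train_labels = rng.integers(0, 10, size=50000)
test_labels = rng.integers(0, 10, size=10000)
abnormal_class = 2  # e.g. "bird"

# Training set: normal samples only (the 9 remaining classes, ~45,000 images).
train_idx = np.flatnonzero(train_labels != abnormal_class)

# Test set: the remaining normal images (~9,000) plus all abnormal images
# (~6,000, drawn from both original splits), with binary evaluation labels.
normal_test_idx = 50000 + np.flatnonzero(test_labels != abnormal_class)
abnormal_idx = np.concatenate([
    np.flatnonzero(train_labels == abnormal_class),
    50000 + np.flatnonzero(test_labels == abnormal_class)])
y_true = np.concatenate([np.zeros(len(normal_test_idx), dtype=int),
                         np.ones(len(abnormal_idx), dtype=int)])
```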
3.2 Implementation Details
Model Parameters Setting.
The model is trained for 15 epochs and optimized by Adam [29] with an initial learning rate \(lr=0.0002\) under lambda learning-rate decay, and momentums \({\beta }_{1}=0.5\), \({\beta }_{2}=0.999\). The weighting parameters of the loss function are set to \({w}_{adv}=1\), \({w}_{con}=5\), \({w}_{lat}=1\), and the weighting parameter \(\lambda \) of the distance metric is empirically chosen as 0.9.
Metrics.
In this paper, AUROC and AUPRC are used to assess the performance of our method. Concretely, AUROC is the area under the ROC curve (Receiver Operating Characteristic curve), which plots the TPR (true positive rate) against the FPR (false positive rate) under varying threshold values. AUPRC is the area under the PR curve (Precision-Recall curve), which plots Precision against Recall under varying threshold values.
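Both areas can be computed without an external metrics library; below is a hypothetical NumPy implementation, with AUROC via the rank (Mann-Whitney U) formulation, assuming no tied scores, and AUPRC via average precision.

```python
import numpy as np

def auroc(y_true, scores):
    """Rank-based AUROC (equivalent to the Mann-Whitney U statistic)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def auprc(y_true, scores):
    """Area under the precision-recall curve via average precision."""
    order = np.argsort(-scores)          # descending by score
    y = y_true[order]
    tp = np.cumsum(y)
    precision = tp / np.arange(1, len(y) + 1)
    recall = tp / y.sum()
    return np.sum(np.diff(np.concatenate([[0.0], recall])) * precision)

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
roc = auroc(y_true, scores)
prc = auprc(y_true, scores)
```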
Results and Discussion.
To demonstrate the performance of our method, we compare our method with Skip-GANomaly, GANomaly and f-AnoGAN on the CIFAR-10 dataset. The parameter settings of Skip-GANomaly and GANomaly are consistent with our experimental parameter settings in this paper, and the parameters of f-AnoGAN are the same as the settings in [22].
Table 1 and Fig. 3 show the experimental results on the CIFAR-10 dataset under the AUROC indicator, and Table 2 and Fig. 4 show the results under the AUPRC indicator. It is apparent from Tables 1 and 2 and Figs. 3 and 4 that the proposed method is significantly better than the other methods on each anomaly class of the CIFAR-10 dataset, achieving the best accuracy under both the AUROC and AUPRC indicators. Moreover, the proposed method performs best on three object classes, airplane, frog, and ship, with almost 100% accuracy for anomaly detection. In addition, for the most challenging abnormal classes in CIFAR-10, bird and horse, the best AUROC of the other methods is 0.658 and 0.672, and the best AUPRC is 0.558 and 0.501, respectively. Significantly, the AUROC of the proposed method on bird and horse is 0.876 and 0.866, an increase of 21.8% and 19.4%, and the AUPRC is 0.818 and 0.775, an increase of 26.0% and 27.4%.
Figure 5 shows the histograms of anomaly scores of Skip-GANomaly and the proposed model on the CIFAR-10 dataset when the bird class is treated as abnormal. It can be seen that, compared with Skip-GANomaly, our method better distinguishes the normal from the abnormal and achieves a good anomaly detection effect. Taking the bird class as the abnormal class, Fig. 6 illustrates the reconstruction effect of our method on objects of the CIFAR-10 dataset in the test phase.
In conclusion, the anomaly detection performance of the method proposed in this paper on the CIFAR-10 dataset is better than the previous related methods.
4 Conclusion
In this paper, we introduce a Fully-Nested Encoder-decoder Framework for general anomaly detection within an adversarial training scheme. The generator of the proposed model is a novel fully-residual encoder-decoder network that can independently select a network of suitable depth for each dataset through four nested scales. Residual structures are added to the generator to reduce the risk of overfitting and improve the feature expression ability. Comparative experiments on the CIFAR-10 dataset show that the proposed method improves greatly over previous related work.
References
Hodge, V., Austin, J.: A survey of outlier detection methodologies. Artif. Intell. Rev. 22(2), 85–126 (2004)
Niu, Z., Shi, S., Sun, J., He, X.: A survey of outlier detection methodologies and their applications. In: Deng, H., Miao, D., Lei, J., Wang, F.L. (eds.) AICI 2011. LNCS (LNAI), vol. 7002, pp. 380–387. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23881-9_50
Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection. ACM Comput. Surv. 41(3), 1–58 (2009)
Ahmed, M., Naser Mahmood, A., Hu, J.: A survey of network anomaly detection techniques. J. Netw. Comput. Appl. 60, 19–31 (2016)
Ma, J., Dai, Y., Hirota, K.: A survey of video-based crowd anomaly detection in dense scenes. J. Adv. Comput. Intell. Intell. Inform. 21(2), 235–246 (2017)
Kwon, D., Kim, H., Kim, J., Suh, S.C., Kim, I., Kim, K.J.: A survey of deep learning-based network anomaly detection. Clust. Comput. 22(1), 949–961 (2017). https://doi.org/10.1007/s10586-017-1117-8
Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25(2), 1097–1105 (2012)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the Inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, inception-ResNet and the impact of residual connections on learning. In: Proceedings of 31st AAAI Conference on Artificial Intelligence, pp. 4278–4284 (2017)
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al.: Generative adversarial networks. arXiv:1406.2661 (2014)
Denton, E., Chintala, S., Szlam, A., Fergus, R.: Deep generative image models using a Laplacian pyramid of adversarial networks. In: Proceedings of 28th International Conference on Neural Information Processing Systems, pp. 1486–1494 (2015)
Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv:1411.1784 (2014)
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P.: InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In: Proceedings of 30th International Conference on Neural Information Processing Systems, pp. 2180–2188 (2016)
Zhu, J., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2242–2251 (2017)
Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G.: Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: Niethammer, M., et al. (eds.) IPMI 2017. LNCS, vol. 10265, pp. 146–157. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59050-9_12
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434 (2015)
Zenati, H., Foo, C.S., Lecouat, B., Manek, G., Chandrasekhar, V.R.: Efficient GAN-based anomaly detection. arXiv:1802.06222 (2018)
Zenati, H., Romain, M., Foo, C.S., Lecouat, B., Chandrasekhar, V.R.: Adversarially learned anomaly detection. In: Proceedings of 2018 IEEE International Conference on Data Mining, pp. 727–736 (2018)
Schlegl, T., Seeböck, P., Waldstein, S., Langs, G., Schmidt-Erfurth, U.: f-AnoGAN: fast unsupervised anomaly detection with generative adversarial networks. Med. Image Anal. 54, 30–44 (2019)
Akcay, S., Atapour-Abarghouei, A., Breckon, T.P.: GANomaly: semi-supervised anomaly detection via adversarial training. In: Jawahar, C.V., Li, H., Mori, G., Schindler, K. (eds.) ACCV 2018. LNCS, vol. 11363, pp. 622–637. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20893-6_39
Akçay, S., Atapour-Abarghouei, A., Breckon, T.P.: Skip-GANomaly: skip connected and adversarially trained encoder-decoder anomaly detection. In: Proceedings of 2019 International Joint Conference on Neural Networks, pp. 1–8 (2019)
Krizhevsky, A.: Learning multiple layers of features from tiny images. Tech Report (2009)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of International Conference on Machine Learning, pp. 448–456 (2015)
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of 14th International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011)
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the International Conference on Learning Representations (2015)
Acknowledgement
This research is supported by Major Special Project (18-A02) of China Railway Construction Corporation in 2018 and Science and Technology Program (201809164CX5J6C6, 2019421315KYPT004JC006) of Xi’an.
Ethics declarations
The authors declare that there are no competing interests regarding the publication of this paper.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
Cite this paper
Gong, Y., Jing, W. (2022). A Fully-Nested Encoder-Decoder Framework for Anomaly Detection. In: Qian, Z., Jabbar, M., Li, X. (eds) Proceeding of 2021 International Conference on Wireless Communications, Networking and Applications. WCNA 2021. Lecture Notes in Electrical Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-19-2456-9_75
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2455-2
Online ISBN: 978-981-19-2456-9