1 Introduction

The abundance and availability of unlabeled image data both enables and encourages the development of unsupervised methods in many areas of computer vision. In this paper, we address the problem of detecting and localizing anomalous regions in natural images without any prior knowledge of the nature and appearance of potential anomalies.

This problem has attracted increased attention from the research community and has applications in numerous fields, including active learning (Mackowiak et al. 2018; Yoo and Kweon 2019), medical imaging (Baur et al. 2019; Zhou et al. 2020; Schlegl et al. 2019), autonomous driving (Blum et al. 2019; Lis et al. 2019), and industrial inspection (Bergmann et al. 2021, 2019b; Cohen and Hoshen 2020).

The present work builds on the observation that deviations from the anomaly-free training data can manifest themselves in very different ways. On the one hand, entirely new local structures can occur that are not present during training. On the other hand, an image can also be considered anomalous if certain underlying logical or geometrical constraints of the training data are violated. To illustrate the difference between these two, we created a synthetic toy dataset. All anomaly-free images display exactly one black circle at a random location on a flat white background. We introduced two different types of anomalies. The first one is a simple color variation. The second type of anomaly is characterized by the fact that there are two black circles in a single image instead of one. Detailed information on the creation of the toy dataset is found in Appendix 1.
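
To make this concrete, the following minimal sketch generates images in the spirit of the toy dataset. All parameters (image size, circle radius, defect color) are illustrative and do not reproduce the exact protocol of Appendix 1.

```python
import numpy as np
from PIL import Image, ImageDraw

def draw_circle(draw, center, radius=20, fill=(0, 0, 0)):
    x, y = center
    draw.ellipse([x - radius, y - radius, x + radius, y + radius], fill=fill)

def toy_image(kind="normal", size=256, rng=None):
    """kind: 'normal', 'color' (structural anomaly), or 'extra' (logical anomaly)."""
    rng = rng or np.random.default_rng()
    img = Image.new("RGB", (size, size), (255, 255, 255))  # flat white background
    draw = ImageDraw.Draw(img)
    margin = 30
    center = tuple(int(c) for c in rng.integers(margin, size - margin, 2))
    # A color variation introduces a new local structure (structural anomaly).
    fill = (230, 190, 0) if kind == "color" else (0, 0, 0)
    draw_circle(draw, center, fill=fill)
    if kind == "extra":
        # A second circle violates the constraint that exactly one circle
        # is present in every anomaly-free image (logical anomaly).
        second = tuple(int(c) for c in rng.integers(margin, size - margin, 2))
        draw_circle(draw, second)
    return np.asarray(img)
```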

A number of existing state-of-the-art unsupervised anomaly detection methods model the distribution of local features extracted from pretrained networks (Bergmann et al. 2020; Burlina et al. 2019). They excel at the detection of anomalies such as the color defect in our toy dataset. They are, however, inherently limited to the information inside the receptive field of their descriptors. This makes it difficult to detect anomalies that violate long-range dependencies. In Fig. 1, we demonstrate this by considering three test images from our toy dataset: one anomaly-free, one showing a color defect, and one containing an additional circle. The center row shows anomaly maps calculated by the Student–Teacher method (Bergmann et al. 2020). This method clearly identifies and localizes the color defect. The two circles, however, are not predicted as anomalous because each individual circle does not constitute an anomaly and the receptive field of the method is not large enough to capture the long-range relationships in the image.

Variational autoencoders (VAEs) (An and Cho 2015; Vasilev et al. 2019) or Generative Adversarial Networks (GANs) (Goodfellow et al. 2014; Schlegl et al. 2017) have the potential to capture information from the entire image (Liu et al. 2020). Consequently, they are potentially able to detect anomalies such as the extra black circle in our toy dataset. However, they also tend to produce blurry and inaccurate reconstructions, which leads to an increase in false positives, and they are often outperformed by the local methods mentioned above. The bottom row of Fig. 1 shows anomaly maps calculated by a VAE on our toy dataset. The VAE manages to identify the two circles as anomalous but produces many false positives in the anomaly-free test image.

Fig. 1 Qualitative results of the Student–Teacher (S–T) method and a variational autoencoder (VAE) on a simple toy dataset. Anomaly maps are shown for an anomaly-free image, an image containing a structural anomaly (a color defect), and an image containing a logical anomaly (two circles being present instead of one). S–T inspects local image regions and therefore only detects the color defect. The VAE captures the global context of images in its bottleneck. It finds both anomalies, but also produces many false positives due to its inaccurate reconstructions

Motivated by these observations, we classify an anomaly as either a structural anomaly or a logical anomaly and demonstrate that existing methods indeed perform very differently on these two classes. We define structural anomalies as new visual structures that occur in locally confined regions and that do not exist in the anomaly-free data. Logical anomalies, on the other hand, violate underlying logical constraints in the data and potentially require a method to capture long-range dependencies. In our toy example, we would classify the color defect as a structural anomaly since the yellow color adds a local structure that has never been observed during training. The additional circle in the top right corner of Fig. 1 does not introduce any new local structure. The anomaly manifests itself through the violation of the logical constraint that there should always be exactly one circle in the image. Hence, we classify it as a logical anomaly. Note that it is not always straightforward to make a clear distinction between structural and logical anomalies and corner cases may exist.

Existing datasets (Bergmann et al. 2021; Carrera et al. 2017; Huang et al. 2018; Song and Yan 2013) identify the task of visual inspection of industrially manufactured products as a typical real-world example for unsupervised anomaly detection. Nevertheless, all of them focus on the detection of structural anomalies and therefore favor methods that perform well on this type of anomaly. Logical anomalies, however, do occur in manufacturing processes, e.g., as an incorrect wiring of a circuit, a shift in the fill level of a vial, or the absence of an essential component. The development of methods that are capable of detecting logical anomalies is hindered by the lack of suitable data. This creates the need for a dataset that takes both structural and logical anomalies into account with equal importance. We address this need by introducing a new dataset that is likewise inspired by industrial inspection scenarios but balances the number of logical and structural anomalies. An illustrative example is shown in Fig. 2.

Fig. 2 Difference between structural (left) and logical anomalies (right). While the former introduce novel local structures (i.e., the metal piece on the left), the latter violate logical constraints of the training data (i.e., the additional pushpin in the top right compartment). Our proposed method successfully localizes the anomaly in both images

This new dataset has enabled us to develop a new method that is capable of detecting both types of anomalies. In summary, we make three key contributions:

  • We introduce a new dataset for the evaluation of unsupervised anomaly localization algorithms that covers both structural and logical anomalies. It contains 3644 images of five distinct object categories inspired by real-world industrial inspection scenarios. Structural anomalies occur as scratches, dents, or contaminations in the manufactured products. Logical anomalies violate underlying constraints, e.g., a permissible object being present in an invalid location or a required object not being present at all. We hope that this dataset will help the research community to develop and test their own algorithms in the future.

  • In order to compare the performance of different methods on our dataset, a suitable performance measure is needed. We find that commonly used metrics are not directly applicable to assessing the capability of methods to detect logical anomalies. We therefore introduce a performance metric that takes the different modalities of the defects present in our dataset into account. This performance measure is a generalization of an established measure for unsupervised anomaly detection.

  • We propose a new method for the unsupervised pixel-precise localization of anomalies. It improves the results of the joint detection of structural and logical anomalies compared to existing methods. Our method consists of a local and a global branch, which we show to be primarily responsible for the detection of structural and logical anomalies, respectively. Motivated by the recent success of using local features of pretrained networks for anomaly detection, our local branch contains a regression network that matches such local descriptors. The global branch of our method is designed to overcome the difficulty of capturing the entire context of an input image by learning a globally consistent representation of the training data through a bottleneck. During inference, regression errors in the two branches indicate anomalies. Extensive evaluations against state-of-the-art methods show the superiority of our approach in the detection of logical anomalies, as well as in the combined localization of both anomaly types.

2 Related Work

We first discuss existing datasets for unsupervised anomaly localization and show the need for our newly introduced dataset. We then give an overview of relevant approaches to unsupervised anomaly localization. Pang et al. (2020) provide a more comprehensive review of both subjects.

2.1 Datasets

The availability of challenging datasets such as ImageNet (Krizhevsky et al. 2012), MS-COCO (Lin et al. 2014), or Cityscapes (Cordts et al. 2016) has largely contributed to recent successes in various fields of computer vision. For the task of unsupervised anomaly localization, however, comparatively few datasets exist and all of them are primarily designed for the detection of what we refer to as structural anomalies.

Huang et al. (2018) introduce a surface inspection dataset of magnetic tiles. It contains 1344 grayscale images of a single texture. Test images contain various structural anomalies such as cracks or uneven areas. Similarly, Carrera et al. (2017) present NanoTWICE, a dataset of 45 grayscale images of a nanofibrous material acquired by a scanning electron microscope. Anomalies occur in the form of flattened areas or specks of dust. Both datasets only provide textured images, which require a method to focus on local repetitive patterns. Hence, these datasets are inherently unsuited for assessing the ability of a method to capture long-range dependencies and logical constraints.

The Fishyscapes dataset (Blum et al. 2019) is intended to assess the anomaly detection performance of semantic segmentation algorithms for autonomous driving. The task is to train a supervised model on the Cityscapes dataset and, during inference, to localize anomalous objects that were inserted artificially into the test images. The anomalies only consist of objects not present in the training set. This enables their detection based on local, patch-based visual features.

The MVTec Anomaly Detection dataset (MVTec AD) comprises five texture and ten object categories from industrial inspection scenarios (Bergmann et al. 2021). The 1258 test images contain 73 types of anomalies, such as contaminations or scratches on the manufactured products. The vast majority (97%) of anomalies in the dataset matches our definition of structural anomalies. Hence, an evaluation on this dataset alone does not give sufficient insight into how well a method detects logical anomalies.

To date, there exists no comprehensive dataset that explicitly focuses on the detection of structural as well as logical anomalies and that requires a model to understand the underlying logical or geometrical relationships in the anomaly-free data. To fill this void, we introduce the Logical Constraints Anomaly Detection dataset. It represents industrial inspection scenarios and equally covers both types of anomalies.

2.2 Methods

The diversity of methods for unsupervised anomaly detection and localization is high, and numerous approaches have been introduced to tackle the problem. Ehret et al. (2019) give a comprehensive review of existing work. Here, we restrict ourselves to a brief overview and only cover methods that are capable of performing a pixel-precise localization of anomalies in natural images.

Autoencoder-based methods attempt to reconstruct input images through a low-dimensional bottleneck. They rely on the assumption that anomalies cannot be reconstructed during inference. Pixelwise anomaly scores are derived by comparing the input to the reconstruction. While their latent representations have the potential to capture the global context of the training data, autoencoders tend to produce blurry and inaccurate reconstructions. This leads to an increase in false positives. They might also learn to simply copy parts of the input data, which would allow them to reconstruct anomalous features during inference. To discourage this behavior, Park et al. (2020) introduce MNAD, an autoencoder with an integrated memory module. It stores a set of prototypical latent features during training that need to be reused for reconstruction during inference. In our experiments, we observed that this indeed helps in the detection of structural anomalies but impairs the detection of logical ones (see Fig. 8).

Similar to autoencoders, GAN-based methods attempt to reconstruct anomaly-free images by finding suitable latent representations as input for the generator network. Schlegl et al. (2019) propose f-AnoGAN, for which an encoder network is trained to output the latent vectors that best reconstruct the training data. A pixelwise comparison of the input image and the reconstruction yields an anomaly score. Since GAN-based methods are difficult to train on high-resolution images (Gulrajani et al. 2017), f-AnoGAN processes images at a resolution of \(64 \times 64\) pixels, which results in very coarse anomaly maps.

Methods that leverage features of pretrained networks tend to outperform autoencoder- or GAN-based methods that are trained from scratch (Burlina et al. 2019). They achieve this by modeling the distribution of local features obtained from spatially resolved activation layers of a pretrained network. Cohen and Hoshen (2020) introduce the SPADE method which utilizes the feature space of a deep CNN. During inference, the method first identifies a certain number of anomaly-free training images that are closest to the test image. A separate 1-NN classifier is then introduced for each pixel in the feature maps extracted from the selected training images. This makes the algorithm computationally expensive, which might prevent it from being used in practical applications.

Bergmann et al. (2020) propose a Student–Teacher framework in which an ensemble of student networks matches local descriptors of pretrained teacher networks on anomaly-free data. Anomalies are detected by increased regression errors and predictive variances in the students’ predictions. The networks employed exhibit a limited receptive field, which prevents this method from detecting global inconsistencies that fall outside the receptive field’s range.

3 The Logical Constraints Anomaly Detection Dataset

To be able to compare the ability of anomaly detection methods to understand logical constraints, we need suitable datasets. As discussed in Sect. 2.1, very few datasets exist for unsupervised anomaly detection in general. Industrial inspection scenarios have been identified as a prime example for unsupervised anomaly detection tasks. This is underlined by the fact that the majority of the existing datasets (Bergmann et al. 2019a, b; Carrera et al. 2017; Huang et al. 2018; Song and Yan 2013) are inspired by such applications.

None of them, however, sets an explicit focus on the joint detection of structural and logical anomalies. To address this, we introduce the MVTec Logical Constraints Anomaly Detection (MVTec LOCO AD) dataset.

3.1 Description of the Dataset

MVTec LOCO AD consists of five object categories from industrial inspection scenarios. We have selected the objects and designed our acquisition setup in such a way that they are as close as possible to real-world applications. In machine vision applications, an object is usually located in a defined position. This is often realized by a mechanical alignment system. The illumination is chosen to best suit the task or is specifically designed for it. The same is true for the employed camera and lens. For more details on typical machine vision setups, we refer to Steger et al. (2018).

We provide a total of 1772 images for training, 304 for validation, and 1568 for testing. Figure 3 shows example images for each of the dataset categories. The training sets consist of anomaly-free images only. Machine learning methods typically require data for validating their performance during training or for adjusting hyperparameters. To ensure that the choice of the validation data does not add a bias to evaluations and benchmarks, we define a specific validation set. Like the training images, the validation images are free of any anomalies. The test set contains anomaly-free images and images with various types of logical and structural anomalies. All three sets are independent of each other in the sense that they consist of images of distinct physical objects and that there is no overlap between them. An overview of the image statistics of our dataset is shown in Table 1, including the number and size of training, validation, and test images as well as the number of different defect types for each category.

Table 1 Statistical overview of the MVTec LOCO AD dataset. For each category, the number of training, validation, and test images is given
Fig. 3 Example images of the MVTec LOCO AD dataset for each of the five dataset categories. Each category contains anomaly-free train, validation, and test images. Additional test images contain various structural and logical anomalies. Pixel-precise ground truth annotations are provided for all anomalies

Each dataset category possesses certain logical constraints. Anomaly-free images of the category breakfast box always contain exactly two tangerines and one nectarine that are always located on the left-hand side of the box. Furthermore, the ratio and relative position of the cereals and the mix of banana chips and almonds on the right-hand side are fixed. A screw bag contains exactly two washers, two nuts, one long screw, and one short screw. Each compartment of the box of pushpins contains exactly one pushpin. In the category splicing connectors, exactly two splicing connectors with the same number of cable clamps are linked by exactly one cable. In addition, the number of clamps has a one-to-one correspondence to the color of the cable, and the cable has to terminate in the same relative position on its two ends such that the whole construction exhibits a mirror symmetry. Each juice bottle is filled with one of three differently colored liquids and carries exactly two labels. The first label is attached to the center of the bottle and displays an icon that determines the type of liquid. The second is attached to the lower part of the bottle with the text “100% Juice” written on it. The fill level is the same for each bottle. Violations of any of these constraints constitute logical anomalies.

The third row of Fig. 3 shows examples of logical defects, which manifest themselves in the following ways. The breakfast box contains too many banana chips and almonds. The screw bag contains two long screws and lacks a short one. One compartment of the box of pushpins does not contain any pushpin. For the splicing connectors, we show three different types of defects. On the left, the two splicing connectors do not have the same number of clamps; in the center, the color of the cable does not match the number of clamps; and on the right, the cable terminates in different positions. We also present three different types of defects for the juice bottle. On the left, the icon does not match the type of juice. In the center, the icon is slightly misplaced. Finally, on the right, the fill level of the bottle is too high.

The center row of Fig. 3 depicts examples of structural anomalies. They manifest themselves as a damaged tangerine, a broken screw, a bent pushpin, a corrupt insulation of a cable, and a contamination inside a juice bottle.

3.2 Annotations and Labeling Policies

For all anomalies present in the dataset, we provide pixel-precise ground-truth annotations.

Structural anomalies are typically straightforward to annotate. Each pixel of an anomalous image that introduces a local visual structure not present in the anomaly-free images is marked as anomalous. In the example of the damaged tangerine in Fig. 3, all pixels that fall into the damaged region are annotated. Labeling logical defects, however, proves to be a challenging task. As an example, Fig. 3 depicts a pushpin missing in one of the compartments. Consider two methods: one marks the whole compartment as anomalous, while the other marks only a region with the size and shape of a pushpin inside the compartment. In this case, one would probably consider both methods as equally successful.

Our labeling policy and the newly introduced evaluation metric take such ambiguities into account. In our dataset, the union of all areas of the image that could potentially be the cause for the anomaly is labeled as anomalous. To achieve a perfect score, however, a method is not necessarily required to predict the whole ground truth area as anomalous. To reflect this, we introduce a suitable performance metric.

Our metric is a generalization of the per-region overlap (PRO), an established metric for evaluating anomaly localization algorithms (Bergmann et al. 2021; Cohen and Hoshen 2020; Napoletano et al. 2018). To calculate the per-region overlap, real-valued anomaly scores are thresholded to obtain a binary prediction for each pixel in the test set. Then, the percentage of correctly predicted pixels is computed for each annotated defect region in the ground truth. The average over all defect regions yields the final PRO value. Note that PRO is very similar to computing the average true positive rate (TPR) over all pixels. The advantage of PRO is that it weights defect regions of different sizes as equally important.

In our dataset, we do not necessarily require a method to segment all pixels of an annotated area. Continuing the example of the missing pushpin, it is sufficient for a method to segment an area the size of one pushpin within the empty compartment. To meet this requirement, we propose a generalized version of the PRO metric that saturates once the overlap with the ground truth exceeds a certain saturation threshold.

3.3 The Saturated Per-Region Overlap (sPRO)

Let \(\{A_1, \dots , A_m\}\) be the set of all defect ground truth regions and \(\{s_1, \dots , s_m\}\) be a set of corresponding saturation thresholds such that \(0 < s_i \le |A_i|\) for all \(i \in \{1, \dots , m\}\). For a set P of pixels in the dataset that are predicted as anomalous, we define the saturated per-region overlap (sPRO) as

$$\begin{aligned} \mathrm {sPRO}(P) = \frac{1}{m} \sum _{i=1}^m \ \min \left( \frac{|A_i \cap P|}{s_i}, 1\right) . \end{aligned}$$
(1)

Note that this is indeed a generalization of the PRO metric because \(\mathrm {sPRO}(P) = \mathrm {PRO}(P)\) if \(s_i = |A_i|\) for all \(i \in \{1, \dots , m\}\). An illustrative example of the sPRO metric with a single ground-truth region is shown in Fig. 4. Here, one pushpin is missing in one of the box compartments. The annotated area A comprises the entire compartment while the saturation threshold s is set to the predetermined size of a single pushpin, which is much smaller than |A|. Hence, all predictions P for which the overlap with A exceeds s fully solve the segmentation task, i.e., \(\mathrm {sPRO}(P) = 1\).
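
For concreteness, Eq. (1) can be sketched in a few lines of NumPy. The mask-based interface and the function name are our own choices.

```python
import numpy as np

def spro(prediction, regions, thresholds):
    """Saturated per-region overlap, Eq. (1).

    prediction: boolean (h, w) array of pixels predicted as anomalous (the set P).
    regions:    list of boolean (h, w) arrays, one mask per ground truth region A_i.
    thresholds: list of saturation thresholds s_i with 0 < s_i <= |A_i|.
    """
    scores = []
    for region, s in zip(regions, thresholds):
        overlap = np.logical_and(region, prediction).sum()  # |A_i ∩ P|
        scores.append(min(overlap / s, 1.0))                # saturate at 1
    return float(np.mean(scores))

# Setting s_i = |A_i| for every region recovers the original PRO metric.
```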

Fig. 4 Schematic illustration of the introduced sPRO evaluation metric. For an annotated anomaly A, a saturation threshold s is selected. Once the overlap of the predicted region with the ground truth A exceeds s, we consider the anomaly segmentation task solved

Similar to the TPR and PRO, sPRO does not take false positive predictions into account. Hence, we report its value together with the associated false positive rate (FPR). False positive predictions are defined as all pixels that are predicted as anomalous but are not covered by any annotated region. To obtain evaluation results that are independent of the binarization value used to turn real-valued anomaly scores into binary predictions, we make use of the sPRO curve. It is created analogously to the common ROC curve by computing the sPRO value for various binarization thresholds and plotting them against the corresponding FPR value. As our main performance measure, we compute the area under the sPRO curve up to a limited false positive rate and normalize it to obtain a score between 0 and 1. This is motivated by the fact that anomaly segmentation results at large false positive rates are no longer meaningful. They should, therefore, be excluded from the computation of a performance measure such as the area under the sPRO curve.
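
The following sketch illustrates how the sPRO curve and the normalized area under it can be computed. It builds on the spro function sketched above, operates on a single anomaly map for brevity (in practice, the curve is computed over all test images jointly), and omits interpolation at the integration limit; these simplifications are ours.

```python
import numpy as np

def spro_curve(anomaly_map, regions, sat_thresholds, fp_mask, n_steps=200):
    """(FPR, sPRO) pairs for a sweep of binarization thresholds.

    fp_mask marks pixels not covered by any annotated region; positive
    predictions there count as false positives. spro() is the function
    from the sketch in Sect. 3.3.
    """
    fprs, spros = [], []
    for t in np.linspace(anomaly_map.max(), anomaly_map.min(), n_steps):
        pred = anomaly_map >= t
        fprs.append(np.logical_and(pred, fp_mask).sum() / fp_mask.sum())
        spros.append(spro(pred, regions, sat_thresholds))
    return np.array(fprs), np.array(spros)

def auc_spro(fprs, spros, fpr_limit=0.05):
    """Area under the sPRO curve up to fpr_limit, normalized to [0, 1]."""
    keep = fprs <= fpr_limit
    return np.trapz(spros[keep], fprs[keep]) / fpr_limit
```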

3.4 Selection of Saturation Thresholds

We selected suitable saturation thresholds for each of the 89 individual defect types that occur in our dataset. They are listed in Appendix D. The following paragraphs provide further details on our labeling process and the selection of saturation thresholds for various types of anomalies.

Structural anomalies. For structural anomalies, the entire annotated area should be segmented. We therefore set \(s = |A|\), which yields the original PRO metric. An example is the broken screw in the second row of Fig. 3, for which the entire broken area should be segmented as anomalous.

Fig. 5 Schematic overview of our approach. A global feature encoder \(E_\mathrm {glo}\) is trained against descriptors from a pretrained local feature encoder \(E_\mathrm {loc}\) through a bottleneck to capture the global context of the anomaly-free training data. Each encoder is assigned a high-capacity regression network \(R_\mathrm {glo}\) and \(R_\mathrm {loc}\), respectively, that matches the output of its respective feature encoder. The joint training of \(E_\mathrm {glo}\) and \(R_\mathrm {glo}\) facilitates the accurate matching of higher-dimensional features through a low-dimensional bottleneck

Missing objects. For missing objects, we annotated the entire area in which the object could potentially occur. The corresponding saturation threshold is chosen to be equal to the area of the missing object. We determined the distribution of the area of an object in our dataset by manually annotating numerous instances of the same object. We then selected a value for s from the lower end of this distribution. In the bottom row of Fig. 3, a pushpin is missing in one of the compartments. Since pushpins can potentially occur at every location in the compartment, its entire area is annotated. The corresponding saturation threshold is set to the size of a single pushpin.

Additional objects. Some test images contain too many instances of an object. In such cases, all instances of the object are annotated. The saturation threshold is set to the area of the extraneous objects. An example is shown in the second row of Fig. 10, where an additional cable is present between the two splicing connectors. Since it is not clear which of the two cables represents the anomaly, we annotate both of them. The corresponding saturation threshold is set to the area of one cable, i.e., half of the annotated region. On the one hand, this allows a method to obtain a perfect score even if it only marks one of the two cables as an anomaly. On the other hand, a method which marks both of them is not penalized.

Violation of other logical constraints. Besides the presence of additional or the absence of required objects, our MVTec LOCO AD dataset contains various test images that violate a different form of logical constraints. One example is shown in the last row of Fig. 3, where the juice bottle filled with orange juice carries the label of the cherry juice. Both the orange juice and the label with the cherry are present in the training set. The logical anomaly arises due to the erroneous combination of the two in the same image. One could either mark the area filled with juice or the cherry as anomalous. Hence, our annotation is given by the union of the two regions. Since the segmentation of the cherry within the image is sufficient to solve the anomaly localization task, s is selected as the area of the cherry.

4 Description of Our Method

In addition to the MVTec LOCO AD dataset, we introduce GCAD (Global Context Anomaly Detection), a new method for the unsupervised localization of anomalies that improves the joint detection of structural and logical anomalies compared to existing methods.

Given a training dataset of anomaly-free images, our goal is to localize anomalies in test images by assigning a real-valued anomaly score to each image pixel. All images \(I \in {\mathbb {R}}^{w \times h \times n}\) are of width w, height h, and possess the same number of channels n. Our method consists of two main branches, one of which is primarily responsible for the localization of structural anomalies and the other one for the localization of logical anomalies. The following paragraphs give details about the two branches and highlight the characteristics that enable them to detect the two different anomaly types. A schematic overview of our approach is given in Fig. 5.

Local Model Branch. Our first branch is motivated by the recent success of anomaly segmentation methods that model the distribution of local features extracted from pretrained CNNs. Such methods achieve state-of-the-art performance on established anomaly localization benchmarks, in which the majority of anomalies match our definition of structural anomalies. In particular, we base this branch of our model on the Student–Teacher method. Since this method computes anomaly scores for locally confined image regions independent of their spatial position in the input image, we refer to this branch as the local branch of our model.

It consists of an encoder network \(E_\mathrm {loc}\) which is pretrained on a large number of image patches cropped from the ImageNet dataset. During pretraining, \(E_\mathrm {loc}\) is encouraged to extract expressive descriptors for local image patches via knowledge distillation from a pretrained classification network. We distill the knowledge of a ResNet-18 (He et al. 2016) classifier trained on ImageNet into a dense patch descriptor network via fast dense feature extraction (Bailer et al. 2017). A detailed description of the network architecture of \(E_\mathrm {loc}\) and the pretraining protocol on ImageNet can be found in the original Student–Teacher paper. After pretraining, the weights of \(E_\mathrm {loc}\) remain fixed when optimizing our anomaly detection model. Formally, \(E_\mathrm {loc}\) produces a descriptor of dimension \(d_\mathrm {loc}\) at each pixel location, i.e., \(E_\mathrm {loc}(I) \in {\mathbb {R}}^{w \times h \times d_\mathrm {loc}}\). Each feature describes a local patch of size \(p \times p\) within the original input image. This is achieved by choosing an architecture for \(E_\mathrm {loc}\) with a limited receptive field.

The local branch additionally contains a regression network \(R_\mathrm {loc}\) that is initialized with random weights and is trained to match the output of \(E_\mathrm {loc}\) on the anomaly-free training data. It outputs a feature map of a shape identical to the one produced by \(E_\mathrm {loc}\), i.e., \(R_\mathrm {loc}(I) \in {\mathbb {R}}^{w \times h \times d_\mathrm {loc}}\). We use a high-capacity network with skip connections for this task and minimize the squared Frobenius norm

$$\begin{aligned} {\mathcal {L}}_\mathrm {loc}(I) = \left\| E_\mathrm {loc}(I) - R_\mathrm {loc}(I)\right\| ^2_F . \end{aligned}$$
(2)

If, during inference, an image contains novel local structures that have not been observed during training and that fall within the receptive field of the pretrained feature extractor, \(E_\mathrm {loc}\) will produce novel local descriptors with which \(R_\mathrm {loc}\) is unfamiliar. This leads to large regression errors. Hence, we expect the local branch of our model to perform well in the detection of structural anomalies.

Global Model Branch. \(E_\mathrm {loc}\) inspects only a limited receptive field of size \(p \times p\) pixels and, in particular, does not encode the positional composition of the extracted training features. Hence, our local branch is inherently ill-suited for the detection of anomalies that violate long-range dependencies, which is characteristic of many logical anomalies such as missing or additional objects in the input image. To compensate for this, we add a second branch to our model that analyzes the global context of the entire input image. Therefore, we refer to this branch as the global branch of our model. Its design is inspired by the observation in Fig. 1 that methods that compress the input data to a low-dimensional bottleneck possess the ability to capture logical constraints and fail to reproduce input images that violate them.

Our global branch consists of two networks, \(E_\mathrm {glo}\) and \(R_\mathrm {glo}\). The first is an encoder network that produces a descriptor of dimension \(d_\mathrm {glo}\) at each pixel location, \(E_\mathrm {glo}(I) \in {\mathbb {R}}^{w \times h \times d_\mathrm {glo}}\). Similar to an autoencoder, \(E_\mathrm {glo}\) is encouraged to produce feature maps that are globally consistent with the training data. To this end, \(E_\mathrm {glo}\) produces its encoding over a low-dimensional bottleneck of dimension g. Contrary to autoencoders, \(E_\mathrm {glo}\) does not reconstruct the input image. It is trained by distilling the knowledge of the local feature encoder \(E_\mathrm {loc}\) into the global branch. In order to let the descriptors of \(E_\mathrm {glo}\) match the output dimension of \(E_\mathrm {loc}\), we introduce an upsampling network U that performs a series of 1\(\times \)1 convolutions. For training, we minimize

$$\begin{aligned} {\mathcal {L}}_\mathrm {kd}(I) = \left\| E_\mathrm {loc}(I) - U(E_\mathrm {glo}(I))\right\| ^2_F. \end{aligned}$$
(3)

In principle, anomaly scores could be computed by comparing the features of \(E_\mathrm {loc}\) directly to those of \(U \circ E_\mathrm {glo}\). However, our ablation studies show that the high-dimensional and detailed feature maps of \(E_\mathrm {loc}\) can only be approximately reproduced by \(E_\mathrm {glo}\) due to its low-dimensional bottleneck. A direct comparison would lead to many false positives in the anomaly maps due to inaccurate feature reconstructions. To circumvent this problem, the second network of the global branch, \(R_\mathrm {glo}\), is trained to match the output of \(E_\mathrm {glo}\) using the loss term

$$\begin{aligned} {\mathcal {L}}_\mathrm {glo}(I) = \left\| E_\mathrm {glo}(I) - R_\mathrm {glo}(I)\right\| ^2_F . \end{aligned}$$
(4)

\(R_\mathrm {glo}\) is intended to accurately transform local image regions into the corresponding feature vectors without taking into account the underlying logical constraints of the training data. To make this possible, \(R_\mathrm {glo}\) does not contain any bottleneck and is designed as a high-capacity network with skip connections.

The difference in architecture between \(E_\mathrm {glo}\) and \(R_\mathrm {glo}\) is crucial to our method. The high capacity of \(R_\mathrm {glo}\) allows it to accurately reproduce the features of \(E_\mathrm {glo}\), which reduces the number of false positive detections compared to the reconstruction error between \(E_\mathrm {glo}\) and \(U \circ E_\mathrm {glo}\). The skip connections enable \(R_\mathrm {glo}\) to solve the regression task without capturing the global context of the training data. Thus, the outputs of \(E_\mathrm {glo}\) and \(R_\mathrm {glo}\) differ for anomalous test images that violate global constraints. This allows the localization of logical anomalies that require the analysis of long-range dependencies.

Combination of the Two Branches. We train the whole model end-to-end using the sum of the individual loss terms normalized by the respective depth of the matched features, i.e.,

$$\begin{aligned} {\mathcal {L}}(I) = \tfrac{1}{d_\mathrm {loc}} {\mathcal {L}}_\mathrm {kd}(I) + \tfrac{1}{d_\mathrm {glo}} {\mathcal {L}}_\mathrm {glo}(I) + \tfrac{1}{d_\mathrm {loc}} {\mathcal {L}}_\mathrm {loc}(I) . \end{aligned}$$
(5)

Due to the joint optimization of \({\mathcal {L}}_\mathrm {kd}\) and \({\mathcal {L}}_\mathrm {glo}\), the global feature encoder is encouraged to learn meaningful descriptors for the training data and simultaneously output a representation that can be easily matched by the feature regression network. Computing residuals in a learned feature space facilitates the accurate matching of higher-dimensional features through a low-dimensional bottleneck.
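
For illustration, a minimal PyTorch sketch of one training step over Eqs. (2)-(5) is given below. The network constructors are placeholders for the architectures described in Sect. 5.2, and schedule details (skip-weight fading, the delayed start of \(R_\mathrm {glo}\)) are omitted.

```python
import torch
import torch.nn.functional as F

def training_step(image, E_loc, E_glo, R_loc, R_glo, U, optimizer):
    """One optimization step of Eq. (5). E_loc is pretrained and kept frozen."""
    with torch.no_grad():
        f_loc = E_loc(image)          # (b, d_loc, h, w), fixed local descriptors

    f_glo = E_glo(image)              # (b, d_glo, h, w), through the bottleneck
    d_loc, d_glo = f_loc.shape[1], f_glo.shape[1]

    # Squared Frobenius norms, summed over the batch.
    loss_kd = F.mse_loss(U(f_glo), f_loc, reduction="sum")       # Eq. (3)
    loss_glo = F.mse_loss(R_glo(image), f_glo, reduction="sum")  # Eq. (4), jointly
    loss_loc = F.mse_loss(R_loc(image), f_loc, reduction="sum")  # Eq. (2)

    loss = loss_kd / d_loc + loss_glo / d_glo + loss_loc / d_loc  # Eq. (5)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that the gradient of \({\mathcal {L}}_\mathrm {glo}\) flows into both \(R_\mathrm {glo}\) and \(E_\mathrm {glo}\), which realizes the joint optimization described above.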

Fig. 6 Visualization of anomaly maps \( A_\mathrm {loc}\) and \( A_\mathrm {glo}\) for a structural and a logical anomaly. The damaged label in the upper row is an example of a structural anomaly. The local branch is able to detect this type of anomaly while the global one does not contribute much information. In the lower row, a wrong fill level constitutes a logical anomaly. The local branch is not able to detect this because no new local structure is present in the image. Since the global branch takes the entire image content into account, it is able to successfully segment the anomalous region

Scoring Functions for Anomaly Localization. During inference, pixelwise anomaly scores for a test image \(J \in {\mathbb {R}}^{w \times h \times n}\) can be computed by comparing the features of the image encoder networks to the features of the respective regression network, i.e., by computing \( A_\mathrm {loc} = ||E_\mathrm {loc}(J)-R_\mathrm {loc}(J)||^2 \in {\mathbb {R}}^{w \times h}\) and \(A_\mathrm {glo} = ||E_\mathrm {glo}(J) - R_\mathrm {glo}(J)||^2 \in {\mathbb {R}}^{w \times h}\), respectively. Here, the norm is taken over the respective feature dimension (\(d_\mathrm {loc}\) and \(d_\mathrm {glo}\)). Large regression errors indicate anomalous pixels. \(A_\mathrm {loc}\) is mainly responsible for detecting structural anomalies, while \(A_\mathrm {glo}\) enables the network to detect logical anomalies, as illustrated in Fig. 6. Since the weights of both \(E_\mathrm {glo}\) and \(R_\mathrm {glo}\) are randomly initialized, there exists no training incentive for the two networks to behave differently for structural anomalies. Our experiments show that the global branch is indeed mainly responsible for the detection of logical anomalies, while the local branch performs much better at the detection of structural anomalies.

To obtain an anomaly map for the entire model, we calculate \(A_\mathrm {loc}(I)\) and \(A_\mathrm {glo}(I)\) for all images I in the validation set after training the model. We then compute the respective means, \(\mu _\mathrm {loc}\) and \(\mu _\mathrm {glo}\), and standard deviations, \(\sigma _\mathrm {loc}\) and \(\sigma _\mathrm {glo}\), of all resulting scores. During inference, we normalize the individual anomaly maps and define the combined anomaly map by \( A = \frac{A_\mathrm {loc} - \mu _\mathrm {loc}}{\sigma _\mathrm {loc}} + \frac{A_\mathrm {glo} - \mu _\mathrm {glo}}{\sigma _\mathrm {glo}}\). Note that the validation set of our dataset only contains anomaly-free images. Here, we use the corresponding anomaly maps only to adjust the scale of the anomaly scores of the two network branches.
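
A sketch of this normalization and combination step, assuming the per-branch anomaly maps of the validation images have already been computed (function names are ours):

```python
import numpy as np

def fit_normalization(val_maps):
    """Mean and standard deviation of the anomaly scores of one branch,
    estimated on the anomaly-free validation images."""
    scores = np.concatenate([m.ravel() for m in val_maps])
    return scores.mean(), scores.std()

def combined_anomaly_map(a_loc, a_glo, stats_loc, stats_glo):
    """Normalize the two branch outputs and sum them."""
    mu_loc, sigma_loc = stats_loc
    mu_glo, sigma_glo = stats_glo
    return (a_loc - mu_loc) / sigma_loc + (a_glo - mu_glo) / sigma_glo
```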

Anomaly Detection on Multiple Scales. The choice of the receptive field p for \(E_\mathrm {loc}\) can have a significant impact on the anomaly localization performance, especially when anomalies vary greatly in size. To be less dependent on the particular choice of the receptive field, we train multiple models with varying values of p. The anomaly maps of the models are combined by computing their pixelwise average. Let P be the set of all evaluated receptive fields and \(A^{(p)}\) be the anomaly map obtained from a model with receptive field \(p \in P\). The maps of different receptive fields are combined by computing \(\frac{1}{|P|}\sum _{p \in P} A^{(p)}\).

5 Experiments on the LOCO Dataset

We benchmarked our GCAD method against recently introduced as well as established methods for anomaly localization on the LOCO AD dataset and the MVTec AD dataset (see Sect. 6). We compared our method against a deterministic autoencoder (AE), a variational autoencoder (VAE), and the memory-guided autoencoder (MNAD). All autoencoders localize anomalies by an \(\ell _2\)-comparison of the input with its reconstruction. We further evaluated f-AnoGAN as a representative for GAN-based methods. For methods that leverage features of pretrained networks, we evaluated SPADE as well as the Student–Teacher anomaly detection model. As an additional baseline, we included the Variation Model (Steger et al. 2018, Chapter 3.4.1.4), which computes a mean and a standard deviation for each image pixel and channel and detects anomalies by strong deviations from the calculated pixelwise statistics.

To facilitate the training of data-hungry deep learning models, we designed the acquisition of our dataset in a way that permits an easy augmentation of the images.

5.1 Dataset Augmentation

In our experiments, we used the following image augmentations:

  • Vertical flip with probability \(\tfrac{1}{2}\),

  • Horizontal flip with probability \(\tfrac{1}{2}\),

  • Random rotation by up to 3\(^\circ \) around the center of the image,

  • Random jitter of brightness, contrast, and saturation of the image.
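
As an illustration, this augmentation pipeline could be realized with torchvision transforms as follows. The jitter magnitudes are placeholder values, not the ones used in our experiments.

```python
import torchvision.transforms as T

# Illustrative pipeline for the augmentations listed above. The jitter
# magnitudes (0.1) are placeholder values.
augment = T.Compose([
    T.RandomVerticalFlip(p=0.5),
    T.RandomHorizontalFlip(p=0.5),
    T.RandomRotation(degrees=3),  # up to 3 degrees around the image center
    T.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1),
])

# usage: augmented = augment(pil_image)
```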

Not all of these augmentations are suited for every type of object in our dataset. We provide an overview of the augmentations applied to each object in Table 2. The augmented datasets were used for the training of our GCAD method, all three autoencoders, and f-AnoGAN. SPADE, the Student–Teacher model, and the Variation Model did not require augmented data.

Table 2 Overview of the dataset augmentation techniques applied during training to each of the object categories present in our dataset

5.2 Training and Evaluation Protocols

We begin by giving detailed information on the training and evaluation for each method.

Our Method (GCAD). All input images were zoomed to \(w=h=256\) pixels. For optimization, we used Adam (Kingma and Ba 2015) with an initial learning rate of \(10^{-4}\) and a weight decay of \(10^{-5}\). We trained our method on the augmented training images for 500 epochs. \(E_\mathrm {glo}\) outputs feature maps of depth \(d_\mathrm {glo}=10\) and the capacity of its global context vector was set to \(g = 32\). For \(E_\mathrm {loc}\), we used the same network architecture and training protocol as in (Bergmann et al. 2020). Its feature vectors are of depth \(d_\mathrm {loc}=128\). We trained our method using two receptive fields of sizes \(p \in \{17, 33\}\) and combined their outputs for multi-scale anomaly detection.

Fig. 7 Architecture of \(E_\mathrm {glo}\) with a g-dimensional bottleneck. Transposed convolutions are denoted by “upconv.” All 4\(\times \)4 convolutions use a stride of 2 and are followed by a Leaky ReLU activation. The 1\(\times \)1 convolutions in the skip connections have the same number of feature maps in their output as in their input. Their outputs are scaled by the respective skip weight. Then, they are added element-wise to the output of the corresponding transposed convolution

Figure 7 shows the architecture of the global feature encoder \(E_\mathrm {glo}\). We initialized the five skip weights u, v, w, x, and y with a value of 1 prior to training. Then, we linearly decreased the skip weights after each epoch, starting with the upper levels. After 100 epochs, all skip weights were set to a value of 0, meaning that information could only flow through the g-dimensional bottleneck. We empirically observed that progressively fading out the weights of the skip connections facilitated the optimization, yielding lower values of \({\mathcal {L}}_\mathrm {kd}\) on the training and validation set.
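
The exact fading schedule is a design choice. The following sketch shows one way to stagger the linear decay; the 20-epoch window per skip weight is our assumption, chosen so that all five weights reach zero after 100 epochs.

```python
def skip_weights(epoch, n_weights=5, window=20):
    """Linearly fade out skip weights, starting with the upper levels.

    Weight k (k = 0 is the uppermost skip connection) decays from 1 to 0
    over epochs [k * window, (k + 1) * window); after 100 epochs all are 0.
    The staggered 20-epoch window is an assumption; the text only states
    that the decay is linear, starts with the upper levels, and finishes
    after 100 epochs.
    """
    weights = []
    for k in range(n_weights):
        start = k * window
        w = 1.0 - (epoch - start) / window
        weights.append(min(max(w, 0.0), 1.0))
    return weights

# Epoch 30: the first weight is 0, the second is halfway, the rest still 1.
assert skip_weights(30) == [0.0, 0.5, 1.0, 1.0, 1.0]
```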

The \(d_\mathrm {glo}\)-dimensional features output by \(E_\mathrm {glo}\) are transformed into \(d_\mathrm {loc}\)-dimensional features by an upsampling network U. It consists of three 1\(\times \)1 convolutions with nonlinearities in between. The output of U is matched with the descriptors given by the pretrained network \(E_\mathrm {loc}\). Finally, the two regression networks \(R_\mathrm {loc}\) and \(R_\mathrm {glo}\) have an architecture similar to U-Net (Ronneberger et al. 2015). We use a publicly accessible implementation with five downsampling blocks, five upsampling blocks, and a bottleneck of size 16 \(\times \) 16 \(\times \) 1024.

Prior to training, we normalize the features of the pretrained network \(E_\mathrm {loc}\). For each of the \(d_\mathrm {loc}\) feature dimensions, we compute the mean and the standard deviation of all descriptors on the training dataset. We then update the weights in the final layer of \(E_\mathrm {loc}\) to output normalized features.
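
This folding step can be sketched as follows, under our assumption that the final layer of \(E_\mathrm {loc}\) is a 1\(\times \)1 convolution with bias; variable names are ours.

```python
import torch

@torch.no_grad()
def fold_normalization(last_conv, mu, sigma):
    """Rescale the final 1x1 convolution so that its outputs are normalized
    to zero mean and unit variance per feature dimension.

    mu, sigma: tensors of shape (d_loc,) estimated over all training descriptors.
    """
    # y' = (y - mu) / sigma with y = W x + b  =>  W' = W / sigma, b' = (b - mu) / sigma
    last_conv.weight /= sigma.view(-1, 1, 1, 1)
    last_conv.bias.copy_((last_conv.bias - mu) / sigma)
```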

For the first 50 epochs, we only trained the global feature encoder \(E_\mathrm {glo}\), the upsampling network U, and the local regression network \(R_\mathrm {loc}\). In the remaining 450 epochs, the global regression network \(R_\mathrm {glo}\) was optimized as well.

Deterministic and Variational Autoencoders. For both autoencoders, we use the same base architecture as for our global feature encoder \(E_\mathrm {glo}\), depicted in Fig. 7. An additional batch normalization layer is inserted after each convolution and transposed convolution layer, respectively, except after the last one. For the VAE, the last convolution layer of the encoder is duplicated to estimate the variance.

We trained for 500 epochs, gradually fading out the skip connections over the first 100 epochs. For optimization, we used Adam with an initial learning rate of \(10^{-4}\), a weight decay of \(10^{-5}\), and a batch size of 16. The latent dimension of the autoencoders was set to \(g=32\). During inference, anomaly scores are derived by a pixelwise comparison of the input images and their reconstructions.

Table 3 Quantitative results on the MVTec LOCO AD dataset

f-AnoGAN. We used the publicly available implementation of the original authors. As required by their method, we zoomed all images to size 64 \(\times \) 64 pixels and converted them to grayscale prior to training and evaluation.

For the training of the GAN, the dimension of the latent space was set to 128. The optimization was done using Adam with an initial learning rate of \(10^{-4}\), no weight decay, and a batch size of 64. The GAN was trained for 100 epochs. After each training iteration of the generator, the discriminator was trained for 5 iterations.

The encoder network was trained with the RMSProp optimizer with an initial learning rate of \(5 \times 10^{-5}\), no weight decay, and a batch size of 64, and ran for \(5 \times 10^4\) iterations. During inference, anomaly scores are derived by a pixelwise comparison between the input and the reconstructed image.

MNAD. We used the publicly available implementation of the original authors with a small modification. Instead of predicting a future video frame, we implemented the reconstruction of the original input images. The memory module was initialized with 10 memory items of dimension 512. The output dimension of the image encoder was set to 32. For optimization, we used Adam with an initial learning rate of \(2 \times 10^{-5}\), no weight decay, and a batch size of 4. The weights for feature compactness and feature separateness were set to \(\lambda _c = \lambda _s = 10^{-1}\). The training was run for 500 epochs in reconstruction mode on images of size 256 \(\times \) 256 pixels.

SPADE. We used our own implementation of the SPADE method. As a feature extractor, we used a Wide ResNet50-2 pretrained on ImageNet. The images were zoomed to a size of 224 \(\times \) 224 pixels. For feature extraction, we used the last convolution layers of the first, second, and third block of the network. For the image-level nearest-neighbor computation, we used \(K=50\) nearest neighbors. On the pixel level, we used \(\kappa =1\) nearest neighbor. The resulting anomaly maps were smoothed using a Gaussian filter with \(\sigma =4\).
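
As an illustration, the pixel-level scoring step could look as follows. This sketch assumes that feature maps have already been extracted and upsampled to a common resolution, and it searches neighbors only at the same spatial location, which is a simplification of ours rather than a faithful reproduction of the original method.

```python
import torch

def spade_pixel_scores(test_feat, gallery_feats, kappa=1):
    """Pixelwise anomaly scores via nearest neighbors in feature space.

    test_feat:     (d, h, w) features of the test image.
    gallery_feats: (K, d, h, w) features of the K nearest training images.
    Returns an (h, w) anomaly map: for each pixel, the mean distance to its
    kappa nearest neighbors among the gallery features at that location.
    """
    dists = torch.linalg.vector_norm(
        gallery_feats - test_feat.unsqueeze(0), dim=1)  # (K, h, w)
    # kappa smallest distances per pixel; with kappa=1 this is the 1-NN distance.
    knn = dists.topk(kappa, dim=0, largest=False).values
    return knn.mean(dim=0)
```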

Student–Teacher. We used our own implementation of the Student–Teacher method. All images were zoomed to a size of 256 \(\times \) 256 pixels prior to training and evaluation. The student networks were trained with 3 different receptive fields of sizes \(p\in \{17, 33, 65\}\) pixels. For each receptive field, we used an ensemble of 3 students, which resulted in a total of 9 trained models per object category. For optimization, we used Adam with an initial learning rate of \(10^{-4}\), weight decay of \(10^{-5}\), and a batch size of 1. As anomaly score, we evaluated the predictive variance of the student networks and their regression errors with respect to the pretrained teacher network.

Variation Model. The Variation Model (Steger et al. 2018, Chapter 3.4.1.4) calculates the mean and standard deviation at each pixel location over the entire training set of each object in our dataset. This works best if the images show aligned objects. In the MVTec LOCO AD dataset, the breakfast boxes are already aligned. We aligned the pushpins and juice bottles using shape-based matching (Steger 2001, 2002). The screw bags and splicing connectors were not transformed at all for our experiments.

The pixels of the anomaly map show the absolute difference of the test image to the mean of the training images in multiples of the standard deviation of the training images. This is done separately for each channel, and we obtain the overall anomaly map as the average over all channels. If a spatial transformation is applied during inference, some pixels might not overlap with the mean and deviation images. For such pixels, no meaningful anomaly score can be computed, and we therefore set their score to the minimum attainable value of 0.
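
Since the Variation Model is simple, we can sketch it in a few lines of NumPy; the interface is ours, and the images are assumed to be aligned and stacked in a single array.

```python
import numpy as np

def fit_variation_model(train_images, eps=1e-6):
    """Per-pixel, per-channel mean and standard deviation over the training set.

    train_images: (n, h, w, c) array of aligned anomaly-free images.
    """
    mean = train_images.mean(axis=0)
    std = train_images.std(axis=0) + eps  # eps guards against zero deviation
    return mean, std

def variation_anomaly_map(test_image, mean, std, valid_mask=None):
    """Absolute deviation from the mean in multiples of the standard deviation,
    averaged over channels. Pixels outside valid_mask (e.g., not covered after
    a spatial transformation) are set to the minimum attainable score of 0."""
    scores = (np.abs(test_image - mean) / std).mean(axis=-1)
    if valid_mask is not None:
        scores = np.where(valid_mask, scores, 0.0)
    return scores
```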

5.3 Experiment Results

To assess the difference in performance between the detection of structural and logical anomalies, we split the test set into two subsets. Each subset exclusively contains defective test images with structural or logical anomalies, respectively. The anomaly-free test images are included in both sets. For each subset, we computed the area under the sPRO curve up to a false positive rate of 0.05. We chose this integration limit because larger false positive rates yield segmentation results that are not meaningful in practical applications. For completeness, we report the values for several integration limits in Table 7. The joint localization performance for both types of anomalies was measured by the average of the two individual areas.

Table 3 shows the results of all evaluated methods on our dataset for each dataset category. Our method consistently outperforms all other evaluated methods for all but one of the dataset categories. We also observe that methods that leverage feature descriptors from pretrained networks, i.e., SPADE and Student–Teacher, outperform the generative methods based on autoencoders or GANs.

Fig. 8 Difference in anomaly localization performance for both structural and logical anomalies on the MVTec LOCO AD dataset

Fig. 9 Qualitative results of our method on the MVTec LOCO AD dataset for both structural and logical anomalies. The damaged tangerine, the blue pushpin, and the broken connector are structural anomalies. The wrong ratio of cereals and banana chips in the breakfast box, the additional yellow pushpin, and the missing cable between the two connectors constitute logical anomalies

Figure 8 displays the performance of the methods when structural or logical anomalies are treated separately. The corresponding numerical values can be found in Table 5. All evaluated methods except ours show a bias towards the detection of one type of anomaly. Our method significantly outperforms all other approaches in the detection of logical anomalies, while maintaining a high performance at the detection of structural anomalies. In particular, our method performs best when considering the average performance for both anomaly types. Qualitative results of our method for structural and logical anomalies are shown in Fig. 9.

Fig. 10 Qualitative results for each evaluated method on our MVTec LOCO AD dataset. The first and third rows contain examples of structural anomalies, i.e., the flipped connector and the contamination in the juice bottle. The second and fourth rows contain examples of logical anomalies, i.e., a second cable being present between the two connectors and the banana label on the bottle filled with orange juice

Figure 10 provides additional qualitative results for all evaluated methods. Anomaly images are shown for four test images of our MVTec LOCO AD dataset. Two of them contain structural anomalies, i.e., the flipped splicing connector and the contamination in the juice bottle. The other two contain logical anomalies, i.e., the additional red cable between the two splicing connectors and the banana logo on the bottle filled with orange juice.

Our method performed well for all of the displayed examples. While the Student–Teacher approach detected the structural anomalies reliably, it failed to detect the logical anomalies due to its limited receptive field. The SPADE method, on the other hand, failed to detect the flipped splicing connector, while it managed to localize the remaining three anomalies. The deterministic and the variational autoencoder both yielded large residuals in the parts of the images that are challenging to reconstruct, e.g., on the cables between the two splicing connectors. While the memory module in MNAD reduced the number of false positive predictions and improved upon the basic autoencoders in the detection of structural anomalies, it failed to detect the logical anomalies. Similar to the deterministic and variational autoencoders, f-AnoGAN yielded numerous false positive predictions in areas that are difficult to reconstruct accurately. The Variation Model requires a pixel-precise alignment of the inspected objects. Since this is not possible for the splicing connectors, it did not perform well for this dataset category. For the juice bottle, it managed to detect parts of the structural anomaly.

Figure 11 shows some failure cases of our method. Our method might fail when anomalies are very small in size, e.g., for the broken pushpin in the top left compartment. It might also fail to capture very challenging logical constraints, such as enforcing a fixed number of objects that can potentially appear almost anywhere in the input image. The second row of Fig. 11 depicts such an example in which the screw bag contains an additional washer. We show a third failure case of our method in which anomalies manifest themselves in very subtle and intricate differences compared to the anomaly-free images. In the last row of Fig. 11, no almonds are mixed into the banana chips in the bottom right compartment.

5.4 Ablation Studies

To assess the sensitivity of our method with respect to the chosen hyperparameters, we performed various ablation studies. The results are shown in Fig. 12.

Global Context Dimension. We begin by analyzing the impact of the global context dimension g of the global feature encoder \(E_\mathrm {glo}\). When the dimension of the latent space was too small, \(E_\mathrm {glo}\) struggled to output meaningful feature maps and the anomaly detection performance declined for both types of anomalies. When the global context dimension was increased, the overall detection performance was not affected substantially. However, we observed a slight decline in the detection of logical anomalies, while the localization of structural anomalies improved. The increased capacity of \(E_\mathrm {glo}\) led to fewer false positives in the global anomaly detection branch while, at the same time, \(E_\mathrm {glo}\) captured the global context of the data less reliably. This is due to the fact that choosing a latent dimension that is too large allows the global feature encoder to copy parts of its input directly into the latent representation. This phenomenon can also be observed in other bottleneck architectures, such as autoencoders. While the mean performance is slightly better for \(g=64\), the best balance between the detection of structural and logical anomalies is achieved for \(g=32\).

Fig. 11 Qualitative examples for which our GCAD method fails to localize anomalies

Fig. 12 Performance of our algorithm when varying different hyperparameters during training or evaluation

Receptive Field. We also assessed the performance of our proposed method with respect to the size p of the receptive field of the local feature encoder \(E_\mathrm{loc}\). Figure 12 shows the difference in performance when evaluating our approach for single receptive fields of sizes \(p = 17\), 33, and 65, as well as when combining the anomaly maps of multiple receptive fields. Our method yielded a similar mean performance for receptive fields of size 17 and 33, while the performance dropped for very large values of p. Combining the anomaly maps of multiple receptive fields enhanced the performance for both structural and logical anomaly detection.
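One straightforward way to combine anomaly maps from different receptive fields is to standardize each map with statistics estimated on anomaly-free validation images and then average. The sketch below illustrates that idea; the validation statistics are assumed to be precomputed, and the exact combination rule used in the paper may differ.

```python
import numpy as np

def combine_maps(maps, val_means, val_stds):
    """Fuse anomaly maps computed with different receptive fields.

    Each map is standardized with the mean and standard deviation of its
    scores on anomaly-free validation images (assumed given) so that maps
    on different scales become comparable, then the results are averaged.
    """
    standardized = [(m - mu) / sd for m, mu, sd in zip(maps, val_means, val_stds)]
    return np.mean(standardized, axis=0)

# Placeholder maps for receptive fields of size 17, 33, and 65.
maps = [np.random.rand(256, 256) for _ in range(3)]
combined = combine_maps(maps,
                        val_means=[0.10, 0.12, 0.15],
                        val_stds=[0.05, 0.06, 0.08])
```

The same normalize-and-average scheme is also a natural way to fuse the local and global anomaly maps \(A_\mathrm{loc}\) and \(A_\mathrm{glo}\) discussed next.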

Model Branch. Figure 12 also evaluates the contribution of the different branches of our method to the anomaly localization performance. We compared the performance of the local anomaly maps \(A_\mathrm{loc}\) to that of the global anomaly maps \(A_\mathrm{glo}\) and saw that, indeed, \(A_\mathrm{loc}\) performed much better in the detection of structural anomalies. While our local branch is similar to the Student–Teacher approach, we do not train a computationally expensive ensemble to additionally evaluate the intrinsic uncertainty of \(R_\mathrm{loc}\). This comes at a small cost in structural anomaly detection performance. \(A_\mathrm{glo}\), on the other hand, yielded a better performance on the logical anomalies. Combining \(A_\mathrm{loc}\) and \(A_\mathrm{glo}\) improved the performance for both structural and logical anomalies.

This indicates that some of the logical anomalies are better detected by the local branch and some structural ones by the global branch. We illustrate this in Fig. 13. Certain logical anomalies can be detected by the local branch, e.g., two pushpins being present in a single compartment, since both pushpins fall into the receptive field of the local feature extractor \(E_\mathrm{loc}\). The global branch also detects this anomaly. However, it tends to produce more false positive predictions than the local branch since it has to reconstruct the entire input image through a low-dimensional bottleneck. In this case, the combined prediction benefits from the performance of the local branch on this logical anomaly. There also exist cases in which the global branch contributes to a better detection of structural anomalies. In the bottom row of Fig. 13, a piece of a tangerine is present as a contamination in the breakfast box. Since the texture of the contamination matches that of a tangerine, the local branch does not detect this anomaly. The global branch, however, analyzes the entire image context and can encode that two tangerines are already present in the input image. Therefore, it manages to localize this structural anomaly.

Fig. 13 Qualitative examples for which the local branch works better in the detection of a logical anomaly than the global branch and vice versa. In the top row, the global branch produces more false positive predictions than the local branch in the detection of the two pushpins. In the bottom row, the local branch fails to localize the contamination in the breakfast box

Feature Regression vs. Reconstruction. Next, we compare the use of \(R_\mathrm{glo}\) for the detection of logical anomalies with simply evaluating the reconstruction error of the global feature encoder \(E_\mathrm{glo}\) with respect to the pretrained features after upsampling. Figure 12 shows that evaluating the reconstruction error performed significantly worse than our feature regression approach. This is because reconstructing 128-dimensional pretrained features through a small bottleneck is challenging and leads to many false positives. Our approach circumvents this problem by shifting the feature matching task to a lower-dimensional, learned feature space.
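The contrast between the two error signals can be sketched as follows. The sketch assumes that the regression network's output is compared against the low-dimensional features of \(E_\mathrm{glo}\) and that the baseline decodes back to the 128-dimensional pretrained features; both are simplifications of the actual training setup, and all shapes are placeholders.

```python
import torch

def reconstruction_error(decoded, target_128d):
    """Per-pixel squared error when reconstructing the 128-dimensional
    pretrained features through the bottleneck (the baseline variant)."""
    return (decoded - target_128d).pow(2).mean(dim=1)

def regression_error(r_glo_out, e_glo_out):
    """Per-pixel squared error between the regression network's output
    and the (fixed) low-dimensional features of E_glo (our variant)."""
    return (r_glo_out - e_glo_out.detach()).pow(2).mean(dim=1)

# Placeholder shapes: batch 1, 32x32 feature maps; 128 pretrained channels
# vs. a 32-channel learned feature space.
a_baseline = reconstruction_error(torch.randn(1, 128, 32, 32),
                                  torch.randn(1, 128, 32, 32))
a_ours = regression_error(torch.randn(1, 32, 32, 32),
                          torch.randn(1, 32, 32, 32))
```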

Descriptor Dimension of \(E_\mathrm {glo}\). We investigate the impact of the output dimension \(d_\mathrm {glo}\) of the global feature encoder \(E_\mathrm {glo}\) on the anomaly detection performance. The plot on the left-hand side of Fig. 14 indicates that our method performed well for various values of \(d_\mathrm {glo}\) and is not highly sensitive to this parameter.

Knowledge Distillation. Finally, we assess the benefit of distilling knowledge of pretrained descriptors into the global branch of our method. For comparison, we distilled knowledge from the original input images by changing \({\mathcal {L}}_\mathrm {kd}\) to

$$\mathcal{L}_\mathrm{kd}(I) = \left\| I - U(E_\mathrm{glo}(I))\right\|_F^2. \qquad (6)$$

The plot on the right-hand side of Fig. 14 shows that the distillation of features from pretrained networks into \(E_\mathrm {glo}\) greatly enhanced the anomaly localization performance for both structural and logical anomalies.
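For reference, both distillation targets can be written down in a few lines. The sketch below treats \(E_\mathrm{glo}\), the upsampling module \(U\), and the frozen pretrained network as given callables, and implements the squared Frobenius norm as a plain sum of squared entries.

```python
import torch

def kd_loss_image(image, e_glo, up):
    """Eq. (6): distill from the raw input image (comparison variant).
    Squared Frobenius norm written as a sum of squared entries."""
    return (image - up(e_glo(image))).pow(2).sum()

def kd_loss_pretrained(image, e_glo, up, pretrained):
    """Default variant: distill from the descriptors of a frozen
    pretrained network instead of from the image itself."""
    with torch.no_grad():
        target = pretrained(image)   # pretrained net stays fixed
    return (target - up(e_glo(image))).pow(2).sum()
```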

Variation of Saturation Thresholds. We also analyzed the sensitivity of the sPRO metric with respect to the manually selected saturation thresholds. We evaluated each method in our benchmark ten times with thresholds sampled uniformly from an interval ranging from 0.5 to 1.5 times the original threshold. In the case of defects for which the saturation threshold was chosen to be equal to the annotated area, we did not vary the threshold. The ranking of the evaluated methods was stable across all ten runs, with the exception of two runs in which two methods switched between the sixth and seventh rank.
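The sampling protocol is simple enough to sketch. The defect names, threshold values, and the helper below are hypothetical and only illustrate the procedure.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Original saturation thresholds per defect type (placeholder values);
# defects whose threshold equals the annotated area are kept fixed.
thresholds = {"additional_pushpin": 0.3, "missing_object": 1.0}
fixed = {"additional_pushpin": False, "missing_object": True}

def sample_thresholds():
    """Draw one set of perturbed thresholds for a benchmark run, scaling
    each free threshold by a factor drawn uniformly from [0.5, 1.5]."""
    return {d: t if fixed[d] else t * rng.uniform(0.5, 1.5)
            for d, t in thresholds.items()}

runs = [sample_thresholds() for _ in range(10)]  # ten evaluation runs
```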

5.5 Image-Level Classification

In addition to deciding whether a certain pixel is anomalous, it is often important in practical applications to make an image-level binary decision. We derive image-level anomaly scores for each evaluated method by computing the maximum anomaly score over all pixels in a given anomaly map. We then compute the area under the ROC curve (AU-ROC) for each dataset category, again separating logical and structural anomalies. The top bar chart in Fig. 15 shows our results. Similar to our experiments on anomaly localization, our GCAD method performs significantly better than all other evaluated methods in the detection of logical anomalies and in the joint detection of structural and logical anomalies. The Student–Teacher method performs best in the detection of structural anomalies; however, its performance on the logical ones is significantly lower. All other methods show a balanced classification performance between logical and structural anomalies. The AU-ROC values depicted in the bar plot can be found in Table 6 in the appendix.
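This evaluation protocol is easy to state in code. The sketch below uses scikit-learn's roc_auc_score together with random placeholder maps and labels.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def image_level_score(anomaly_map):
    """Reduce a pixel-wise anomaly map to a single image-level score
    by taking the maximum over all pixels."""
    return anomaly_map.max()

# Placeholder data: four maps, labels 0 = anomaly-free, 1 = anomalous.
maps = [np.random.rand(256, 256) for _ in range(4)]
labels = np.array([0, 1, 0, 1])
scores = np.array([image_level_score(m) for m in maps])
auc = roc_auc_score(labels, scores)   # AU-ROC over the image-level scores
```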

Fig. 14 The left bar plot examines the impact of the output dimension of \(E_\mathrm{glo}\) on the anomaly localization performance. The plot on the right shows the difference in performance for different knowledge distillation targets

Fig. 15 Image-level classification results on the presented MVTec LOCO dataset (top) and the MVTec AD dataset (bottom)

6 Experiments on the MVTec AD Dataset

In addition to the experiments on our MVTec LOCO AD dataset, we performed experiments on MVTec AD. We split all test images of the dataset into two subsets. The first contains only images with defects that match our definition of structural anomalies. The second comprises all images that contain at least one logical anomaly. Of the 1258 anomalous test images, we identified 37 that contain defects matching our definition of logical anomalies. We list them in Table 4. For each of the logical anomalies, the saturation threshold for the sPRO metric was chosen to be the whole area of the ground truth label. We evaluated each method separately on structural and logical anomalies. For all methods, we used the same hyperparameters as on the MVTec LOCO AD dataset. The data augmentation strategies for each evaluated object are listed in Appendix C.

Figure 16 shows a bar chart of our results. The corresponding numerical values are listed in Table 8. The results are similar to the ones on the MVTec LOCO AD dataset. Our method outperformed all other methods at the combined detection of structural and logical anomalies. The Student–Teacher approach performed slightly better at the detection of structural anomalies. However, its performance dropped significantly for the logical anomalies in the dataset.

Table 4 Images of the MVTec AD dataset that match our description of logical anomalies
Fig. 16 Difference in anomaly localization performance for both structural and logical anomalies on the MVTec AD dataset

Fig. 17 Qualitative results for each evaluated method on the MVTec AD dataset. The first and third row contain examples of structural anomalies, i.e., the damaged transistor and the bent wires in the cable cross section. The second and fourth row contain examples of logical anomalies, i.e., the transistor being entirely missing and a blue cable being present instead of a yellow one

Figure 17 shows qualitative results for all evaluated methods on the object categories transistor and cable. The first and third row contain examples of structural anomalies, i.e., a damaged transistor surface and bent wires in a cable cross section. The other two show examples of logical anomalies. In the second row, the transistor is entirely missing. In the fourth row, the top yellow cable has been replaced by a blue one. Our method reliably detects all four defects. The Student–Teacher model performs well on the structural anomalies but entirely fails to localize the logical anomalies due to its limited receptive field. The SPADE method manages to detect the missing transistor and performs well on the structural anomalies, but has difficulties localizing the more subtle logical anomaly in the image of the cable. All methods based on autoencoders tend to yield increased anomaly scores on the structural anomalies. However, they also produce many false positives in areas that are difficult to reconstruct accurately, e.g., the reflections on the wires of the cable. Both the VAE and the deterministic AE show a tendency to detect the two logical anomalies. This is not the case for MNAD, for which the high-capacity memory module allows it to reconstruct the areas that contain logical anomalies. Similar to the autoencoders, f-AnoGAN yields many false positives in areas that are challenging to reconstruct. For the missing transistor, however, it manages to capture the logical constraint that a transistor should always be present. The Variation Model manages to detect parts of the damaged transistor as well as its absence. It also yields increased anomaly scores on the bent wires. However, it fails to localize the logical anomaly on the cable.

As for MVTec LOCO, we also compute the AU-ROC values for the image-level classification task on the MVTec AD dataset. The bottom bar chart in Fig. 15 shows our results. While our proposed method performs slightly worse in the detection of structural anomalies than the Student–Teacher method and SPADE, it excels in the classification of the logical anomalies on this dataset. Exact numbers for the AU-ROC values reported in the bar plot can be found in Table 9 in the appendix.

7 Conclusions

This paper is based on the observation that anomalies in natural images can manifest themselves in many different ways. We defined two categories of anomalies, which we call structural and logical anomalies. Previous work predominantly concentrated on the development of datasets and methods for the detection of structural anomalies. We therefore created a new dataset for the unsupervised localization of anomalies that focuses on the detection of both structural and logical anomalies. Pixel-precise ground truth annotations are provided for each anomalous test image. Furthermore, we introduced a new performance metric that takes the different modalities of the two anomaly types into account.

In addition, we developed a new method that permits the joint localization of both anomaly types. It consists of two branches that are primarily intended for the detection of structural and logical anomalies, respectively. The first is based on an existing method that excels at the localization of structural anomalies. The second learns an embedding of the anomaly-free training data that captures its underlying logical constraints. This is achieved by compressing the input images through a low-dimensional bottleneck.

We performed extensive experiments on our new dataset as well as a suitable subset of the MVTec AD dataset. Our results showed that existing methods tend to be biased towards the detection of one of the two types of anomalies. Our approach performed equally well in the detection of structural and logical anomalies and improved the state of the art in the joint detection of both. Nevertheless, due to the complexity of our new dataset, there is still room for future improvement.