Abstract
Anomaly detection is of great interest in fields where abnormalities need to be identified and corrected (e.g., medicine and finance). Deep learning methods for this task often rely on autoencoder reconstruction error, sometimes in conjunction with other penalties. We show that this approach exhibits intrinsic biases that lead to undesirable results: reconstruction-based methods can show low error on simple-to-reconstruct points that are not part of the training data, such as the all-black image. We instead introduce a new unsupervised Lipschitz anomaly discriminator (LAD) that does not suffer from these biases. Like the discriminator of a GAN, our anomaly discriminator is trained to detect the difference between the training data and corruptions of the training data. We show that this procedure successfully detects unseen anomalies, with guarantees for those that lie at a certain Wasserstein distance from the data or the corrupted training set. Further, LAD does not require decoding back to the original data space, which makes anomaly detection possible in domains where it is difficult to define a decoder, such as irregular graph-structured data. Empirically, we show that this framework leads to improved performance on image (MNIST and CIFAR10), health record, and graph data.
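The reconstruction bias described in the abstract can be illustrated with a small sketch. It is not the authors' experiment: it assumes a bias-free linear autoencoder (equivalent to uncentered PCA, computed here with an SVD) as a stand-in for a deep autoencoder, and synthetic 20-dimensional data in place of images. All names (`train`, `components`, `reconstruction_error`) are hypothetical. Under these assumptions the all-zeros input, the analogue of an all-black image, is reconstructed perfectly even though it lies far from the training data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "training data" (an assumption for illustration): 500 points in
# 20 dimensions lying near a 3-dimensional subspace, offset away from the origin.
basis = rng.normal(size=(3, 20))
codes = rng.normal(loc=2.0, size=(500, 3))
train = codes @ basis + 0.05 * rng.normal(size=(500, 20))

# A linear autoencoder without bias terms: the best rank-3 encoder/decoder pair
# is spanned by the top 3 right singular vectors of the data (uncentered PCA).
_, _, vt = np.linalg.svd(train, full_matrices=False)
components = vt[:3]  # shape (3, 20)


def reconstruction_error(x):
    """Squared reconstruction error of each row of x under the linear autoencoder."""
    x = np.atleast_2d(x)
    recon = (x @ components.T) @ components  # encode, then decode
    return np.sum((x - recon) ** 2, axis=1)


train_error = reconstruction_error(train).mean()
blank_error = reconstruction_error(np.zeros(20))[0]          # "all-black" input
noise_error = reconstruction_error(rng.normal(size=20))[0]   # unstructured input

print(f"mean training error: {train_error:.4f}")
print(f"all-zeros error:     {blank_error:.4f}")
print(f"random-noise error:  {noise_error:.4f}")
```

A detector that thresholds reconstruction error would score the all-zeros point as more normal than the training data itself, which is the failure mode the abstract attributes to reconstruction-based methods.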
Acknowledgements
The authors would like to thank Matthew Amodio and Ronald Coifman for productive discussions and feedback on this project, as well as the anonymous reviewers who helped improve this work. This research was partially funded by IVADO Professor funds, CIFAR AI Chair, and NSERC Discovery grant 03267 [G.W.]; Chan-Zuckerberg Initiative grants 182702 & CZF2019-002440 [S.K.]; NSF CAREER grant 2047856 [S.K.]; Sloan Fellowship FG-2021-15883 [S.K.]; and NIH grants R01GM135929 & R01GM130847 [S.K.]. The content provided here is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.
Cite this article
Tong, A., Wolf, G. & Krishnaswamy, S. Fixing Bias in Reconstruction-based Anomaly Detection with Lipschitz Discriminators. J Sign Process Syst 94, 229–243 (2022). https://doi.org/10.1007/s11265-021-01715-6
DOI: https://doi.org/10.1007/s11265-021-01715-6