Abstract
Outlier detection is an important field in data mining. For high-dimensional data the task is particularly challenging because of the so-called “curse of dimensionality”: the notion of neighborhood becomes meaningless, and points typically show their outlying behavior only in subspaces. As a result, traditional approaches are ineffective. Because real-world data lacks both a ground truth and a priori knowledge about the characteristics of potential outliers, outlier detection should be treated as an unsupervised learning problem. In this paper, we examine the usefulness of unsupervised artificial neural networks – autoencoders, self-organising maps and restricted Boltzmann machines – for detecting outliers in high-dimensional data in a fully unsupervised way. Each of these approaches aims to learn an approximate representation of the data. We show that one can effectively measure the “outlierness” of objects by their deviation from the learned representation. Our experiments show that neural-based approaches outperform the current state of the art in terms of both runtime and accuracy.
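The scoring idea described in the abstract – learn an approximate representation of the data, then rank objects by their deviation from it – can be sketched for the autoencoder case as reconstruction error. The sketch below is illustrative only: the function name, the tiny linear single-hidden-layer autoencoder, and all hyperparameters are assumptions for demonstration, not the architecture evaluated in the paper.

```python
import numpy as np

def autoencoder_outlier_scores(X, n_hidden=2, epochs=500, lr=0.01, seed=0):
    """Train a minimal linear autoencoder by gradient descent and score
    each row of X by its squared reconstruction error (higher = more
    outlying). Illustrative sketch, not the paper's model."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, n_hidden))
    W_dec = rng.normal(scale=0.1, size=(n_hidden, d))
    for _ in range(epochs):
        H = X @ W_enc            # encode into the low-dimensional code
        err = H @ W_dec - X      # decode and compare to the input
        # gradient steps on the mean squared reconstruction error
        W_dec -= lr * (H.T @ err) / n
        W_enc -= lr * (X.T @ (err @ W_dec.T)) / n
    residual = (X @ W_enc) @ W_dec - X
    return np.sum(residual ** 2, axis=1)
```

Points that lie in the subspace the network has learned reconstruct well and receive low scores; a point off that subspace cannot be compressed through the bottleneck and receives a high score, matching the abstract's notion of deviation from the learned representation.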
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Popovic, D., Fouché, E., Böhm, K. (2019). Unsupervised Artificial Neural Networks for Outlier Detection in High-Dimensional Data. In: Welzer, T., Eder, J., Podgorelec, V., Kamišalić Latifić, A. (eds) Advances in Databases and Information Systems. ADBIS 2019. Lecture Notes in Computer Science, vol. 11695. Springer, Cham. https://doi.org/10.1007/978-3-030-28730-6_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28729-0
Online ISBN: 978-3-030-28730-6