Unsupervised Artificial Neural Networks for Outlier Detection in High-Dimensional Data

  • Daniel Popovic
  • Edouard Fouché
  • Klemens Böhm
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11695)

Abstract

Outlier detection is an important field in data mining. For high-dimensional data the task is particularly challenging because of the so-called “curse of dimensionality”: the notion of neighborhood becomes meaningless, and points typically reveal their outlying behavior only in subspaces. As a result, traditional approaches are ineffective. Because real-world data lacks both a ground truth and a priori knowledge about the characteristics of potential outliers, outlier detection should be treated as an unsupervised learning problem. In this paper, we examine the usefulness of unsupervised artificial neural networks – autoencoders, self-organising maps and restricted Boltzmann machines – for detecting outliers in high-dimensional data in a fully unsupervised way. Each of these approaches aims to learn an approximate representation of the data. We show that one can effectively quantify the “outlierness” of objects by measuring their deviation from this learned representation. Our experiments show that neural-based approaches outperform the current state of the art in terms of both runtime and accuracy.
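
To make the scoring idea concrete, here is a minimal sketch of the reconstruction-error principle described above, using a small autoencoder trained on synthetic data with tf.keras. The layer sizes, optimizer and training settings are illustrative assumptions, not the configuration evaluated in the paper.

```python
# Minimal sketch of reconstruction-error outlier scoring (not the paper's code).
# Assumptions: a tf.keras autoencoder, synthetic data, illustrative hyperparameters.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50)).astype("float32")  # 1000 objects, 50 dimensions
X[:10] += 6.0                                      # plant 10 obvious outliers

dim = X.shape[1]
autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(dim,)),  # bottleneck
    tf.keras.layers.Dense(dim),                                        # reconstruction
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=20, batch_size=64, verbose=0)  # learn to reproduce X

# Outlierness = how badly the learned representation reproduces each object.
errors = np.mean((X - autoencoder.predict(X, verbose=0)) ** 2, axis=1)
print(np.argsort(errors)[::-1][:10])  # indices of the ten highest-scoring objects
```

For self-organising maps and restricted Boltzmann machines, one could score objects analogously, e.g. by the distance to the best-matching unit or by the model's free energy.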

Keywords

Unsupervised learning · Outlier detection · Neural networks

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
