
On the Correlation Between Local Intrinsic Dimensionality and Outlierness

  • Michael E. Houle
  • Erich Schubert
  • Arthur Zimek
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11223)

Abstract

Data mining methods for outlier detection are usually based on some variant of non-parametric density estimation. Here we argue for the use of local intrinsic dimensionality as a measure of outlierness, and demonstrate empirically that it is a meaningful alternative and complement to classic methods.
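
As an illustration only, the idea of scoring points by local intrinsic dimensionality (LID) can be sketched with the widely used Hill-type maximum-likelihood estimator, computed from each point's k nearest-neighbor distances. The abstract does not say which estimator the authors use, so the choice of estimator, the function name `lid_mle`, the parameter `k`, and the brute-force neighbor search are all assumptions made for this sketch.

```python
import numpy as np


def lid_mle(data, k=20):
    """Hill-type maximum-likelihood LID estimate for every row of `data`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    data = np.asarray(data, dtype=float)
    # Brute-force pairwise Euclidean distances (fine for small data sets).
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row; column 0 is a zero distance (the point itself),
    # so keep columns 1..k as the k nearest-neighbor distances.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    # Guard against zero distances caused by duplicate points.
    knn = np.maximum(knn, 1e-12)
    # Log-ratio of each neighbor distance to the k-th (largest) one.
    mean_log = np.log(knn / knn[:, -1:]).mean(axis=1)
    # The estimate is the negative reciprocal of the mean log-ratio;
    # clamp to avoid division by zero when all k distances are equal.
    return -1.0 / np.minimum(mean_log, -1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A 2-dimensional uniform sheet embedded in 5 dimensions.
    sheet = np.zeros((500, 5))
    sheet[:, :2] = rng.uniform(size=(500, 2))
    scores = lid_mle(sheet, k=20)
    print("median LID estimate:", np.median(scores))
```

On data that lies on a low-dimensional sheet, the estimates should cluster near the sheet's dimension rather than the dimension of the embedding space. How such per-point values are turned into an outlier ranking is not specified in the abstract.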

Keywords

Outlier detection, Intrinsic dimensionality, Comparison

Notes

Acknowledgments

M. E. Houle was supported by JSPS Kakenhi Kiban (B) Research Grant 18H03296.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. National Institute of Informatics, Chiyoda-ku, Japan
  2. Heidelberg University, Heidelberg, Germany
  3. University of Southern Denmark, Odense M, Denmark
