Beyond Outliers and on to Micro-clusters: Vision-Guided Anomaly Detection

  • Wenjie Feng
  • Shenghua Liu
  • Christos Faloutsos
  • Bryan Hooi
  • Huawei Shen
  • Xueqi Cheng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11439)

Abstract

Given a heatmap for millions of points, what patterns exist in the distributions of point characteristics, and how can we detect them and separate anomalies in a way similar to human vision? In this paper, we propose a vision-guided algorithm, EagleMine, to recognize and summarize point groups in the feature spaces. EagleMine uses a water-level tree to capture group structures at multiple resolutions according to vision-based intuition, and adopts statistical hypothesis tests to determine the optimal groups along the tree. Moreover, EagleMine can identify anomalous micro-clusters (i.e., micro-size groups), whose points exhibit very similar behavior but deviate from the majority. Extensive experiments on large graphs show that our method recognizes intuitive node groups as human vision does, and achieves the best summarization performance compared to baselines. In terms of anomaly detection, EagleMine also outperforms state-of-the-art graph-based methods, significantly improving accuracy on synthetic and microblog datasets.
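
As a rough illustration of the water-level idea, the sketch below raises a threshold over a small heatmap of cell counts, collects the connected "islands" that remain above each level, and links each island to the island one level lower that contains it. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names `islands` and `water_level_tree`, the 4-neighbour connectivity, the toy heatmap, and the chosen levels are all hypothetical, and the hypothesis tests that select groups along the tree are not modeled here.

```python
from collections import deque


def islands(heatmap, level):
    """Return connected groups (4-neighbour) of cells whose count >= level."""
    rows, cols = len(heatmap), len(heatmap[0])
    seen = set()
    groups = []
    for r in range(rows):
        for c in range(cols):
            if heatmap[r][c] < level or (r, c) in seen:
                continue
            # Breadth-first flood fill from an unvisited cell above the level.
            queue = deque([(r, c)])
            seen.add((r, c))
            cells = set()
            while queue:
                y, x = queue.popleft()
                cells.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and heatmap[ny][nx] >= level
                            and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            groups.append(frozenset(cells))
    return groups


def water_level_tree(heatmap, levels):
    """Build (nodes, parent): nodes are (level, cells) pairs, and parent maps
    each island to the island at the previous (lower) level that contains it."""
    nodes = []
    parent = {}
    previous = []  # node indices found at the previous level
    for level in sorted(levels):
        current = []
        for cells in islands(heatmap, level):
            idx = len(nodes)
            nodes.append((level, cells))
            for p in previous:
                if cells <= nodes[p][1]:  # island lies inside a lower island
                    parent[idx] = p
                    break
            current.append(idx)
        previous = current
    return nodes, parent


# Toy heatmap (hypothetical counts): two dense regions joined by a sparse bridge.
heatmap = [
    [5, 6, 1, 0, 0],
    [6, 9, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 4, 4],
    [0, 0, 0, 4, 7],
]
nodes, parent = water_level_tree(heatmap, levels=[1, 3, 8])
for idx, (level, cells) in enumerate(nodes):
    print(idx, level, len(cells), parent.get(idx))
```

In this toy case the single island at the lowest level splits into two islands once the level rises past the sparse bridge, which is the kind of multi-resolution group structure the abstract describes.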

Notes

Acknowledgments

This material is based upon work supported by the Strategic Priority Research Program of CAS (XDA19020400), NSF of China (61772498, 61425016, 91746301, 61872206), and the Beijing NSF (4172059).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
  3. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
