On Detecting Clustered Anomalies Using SCiForest

  • Fei Tony Liu
  • Kai Ming Ting
  • Zhi-Hua Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6322)

Abstract

Detecting local clustered anomalies is a difficult problem for many existing anomaly detection methods. Distance-based and density-based methods are inherently restricted by their basic assumptions: anomalies are either far from normal points or sparse. Clustered anomalies can avoid detection because they defy these assumptions by being dense and, in many cases, close to normal instances. In this paper, we propose a new method called SCiForest that detects clustered anomalies without using any density or distance measure. SCiForest separates clustered anomalies from normal points effectively, even when the clustered anomalies are very close to normal points. It retains the ability of existing methods to detect scattered anomalies, and it has better time and space complexity than existing distance-based and density-based methods.
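The claim that clustered anomalies evade distance-based detection can be illustrated with a toy sketch. The code below is not SCiForest and is not taken from the paper; it is a plain k-th-nearest-neighbour distance score applied to made-up one-dimensional data. The function name `knn_distance_scores` and all data values are assumptions chosen only for illustration.

```python
def knn_distance_scores(points, k=1):
    """Score each 1-D point by the distance to its k-th nearest neighbour.

    A larger score means "more anomalous" under the distance-based assumption.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, smallest first.
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical toy data (not from the paper).
normal = [float(x) for x in range(11)]   # evenly spaced normal points 0.0 .. 10.0
clustered = [11.5, 11.51, 11.52]         # dense anomaly cluster close to the normal points
scattered = [30.0]                       # a single isolated anomaly

points = normal + clustered + scattered
scores = knn_distance_scores(points, k=1)

normal_scores = scores[:11]
clustered_scores = scores[11:14]
scattered_score = scores[14]

print("max clustered score:", max(clustered_scores))
print("min normal score:   ", min(normal_scores))
print("scattered score:    ", scattered_score)
```

On this toy data the scattered point receives the highest score, while the three clustered points score lower than every normal point, because each has a very close neighbour inside its own cluster. A distance-based ranking would therefore treat them as the least anomalous points in the set.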

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Fei Tony Liu (1)
  • Kai Ming Ting (1)
  • Zhi-Hua Zhou (2)
  1. Gippsland School of Information Technology, Monash University, Victoria, Australia
  2. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
