
Unsupervised Ensemble Learning for Mining Top-n Outliers

  • Jun Gao
  • Weiming Hu
  • Zhongfei(Mark) Zhang
  • Ou Wu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7301)

Abstract

Outlier detection is an important problem in knowledge discovery in large datasets. Rather than deciding whether a single object is an outlier, we study the detection of the n most outstanding outliers, i.e., top-n outlier detection. We further consider the problem of combining the top-n outlier lists produced by different individual detection methods. We propose a general framework for ensemble learning in top-n outlier detection based on rank aggregation techniques. Within this framework, we propose a score-based aggregation approach, which relies on a normalization method for outlier scores, and an order-based aggregation approach built on the distance-based Mallows model. Together they accommodate the different scales and characteristics of outlier scores from different detection methods. Extensive experiments on several real datasets demonstrate that the proposed approaches consistently deliver stable and effective performance across datasets, with good scalability, in comparison with state-of-the-art methods.
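
The two aggregation styles named in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the z-score normalization and the Borda count used here are simple stand-ins chosen for illustration (the paper's own normalization and its Mallows-model-based aggregation are not described on this page), and the function names `zscore`, `score_based_top_n`, and `borda_top_n` are hypothetical.

```python
from math import sqrt


def zscore(scores):
    """Standardize one detector's scores to zero mean and unit variance.

    Assumption: z-score normalization stands in for the paper's
    (unspecified here) normalization of outlier scores.
    """
    n = len(scores)
    mean = sum(scores) / n
    std = sqrt(sum((s - mean) ** 2 for s in scores) / n)
    if std == 0:
        return [0.0] * n  # a constant detector carries no ranking information
    return [(s - mean) / std for s in scores]


def score_based_top_n(score_lists, n):
    """Score-based aggregation: average the normalized scores per object,
    then return the indices of the n objects with the largest average."""
    normalized = [zscore(scores) for scores in score_lists]
    num_objects = len(score_lists[0])
    combined = [
        sum(detector[i] for detector in normalized) / len(normalized)
        for i in range(num_objects)
    ]
    # Ties are broken by object index so the result is deterministic.
    return sorted(range(num_objects), key=lambda i: (-combined[i], i))[:n]


def borda_top_n(score_lists, n):
    """Order-based aggregation: ignore score magnitudes and use ranks only.

    Assumption: a Borda count replaces the distance-based Mallows model,
    which would require fitting a probabilistic model over rankings.
    """
    num_objects = len(score_lists[0])
    points = [0] * num_objects
    for scores in score_lists:
        order = sorted(range(num_objects), key=lambda i: (-scores[i], i))
        for rank, obj in enumerate(order):
            points[obj] += num_objects - rank  # top rank earns the most points
    return sorted(range(num_objects), key=lambda i: (-points[i], i))[:n]


# Two hypothetical detectors scoring the same five objects on different scales.
detector_a = [0.1, 0.2, 5.0, 0.3, 4.0]
detector_b = [10, 12, 90, 11, 95]

top_by_score = score_based_top_n([detector_a, detector_b], n=2)
top_by_order = borda_top_n([detector_a, detector_b], n=2)
```

The score-based variant keeps magnitude information but depends on the scores being made comparable first; the order-based variant sidesteps scale differences entirely at the cost of discarding how far apart the scores are.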



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jun Gao (1)
  • Weiming Hu (1)
  • Zhongfei(Mark) Zhang (2)
  • Ou Wu (1)
  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. Dept. of Computer Science, State Univ. of New York at Binghamton, Binghamton, USA
