RKOF: Robust Kernel-Based Local Outlier Detection

  • Jun Gao
  • Weiming Hu
  • Zhongfei (Mark) Zhang
  • Xiaoqin Zhang
  • Ou Wu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6635)

Abstract

Outlier detection is an important and attractive problem in knowledge discovery in large data sets. Most recent work in outlier detection follows the framework of the Local Outlier Factor (LOF), which is based on density estimation. However, LOF has two disadvantages that restrict its performance in outlier detection. First, the local density estimate of LOF is not accurate enough to detect outliers in complex and large databases. Second, the performance of LOF depends on the parameter k, which determines the scale of the local neighborhood. Our approach adopts a variable kernel density estimate to address the first disadvantage and a weighted neighborhood density estimate to improve robustness to variations of the parameter k, while keeping the same framework as LOF. In addition, we propose a novel kernel function, named the Volcano kernel, which is more suitable for outlier detection. Experiments on several synthetic and real data sets demonstrate that our approach not only substantially increases detection performance, but is also relatively scalable on large data sets in comparison to state-of-the-art outlier detection methods.
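
The general idea of scoring a point by comparing its kernel density estimate with that of its neighbors can be sketched as follows. This is a minimal illustration under stated assumptions and not the RKOF method itself: it uses a Gaussian kernel (the Volcano kernel is not defined in this abstract), an unweighted average over neighbors rather than the weighted neighborhood density estimate, and a per-sample bandwidth set to the distance to the k-th nearest neighbor. The function names `knn`, `local_kernel_density`, and `kernel_outlier_scores` and the multiplier `h` are hypothetical.

```python
import numpy as np

def knn(X, k):
    """Return (distances, indices) of the k nearest neighbors of each row, excluding the row itself."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]    # indices of the k closest points
    return np.take_along_axis(dist, idx, axis=1), idx

def local_kernel_density(X, k, h=1.0):
    """Variable-bandwidth Gaussian density estimate restricted to each point's k neighbors.

    Assumption: the bandwidth of sample j is h times its k-th neighbor distance.
    """
    n, d = X.shape
    nn_dist, nn_idx = knn(X, k)
    bandwidth = np.maximum(h * nn_dist[:, -1], 1e-12)  # guard against duplicate points
    bw = bandwidth[nn_idx]                             # (n, k): bandwidth of each neighbor
    gauss = np.exp(-0.5 * (nn_dist / bw) ** 2) / ((2 * np.pi) ** (d / 2) * bw ** d)
    return gauss.mean(axis=1), nn_idx

def kernel_outlier_scores(X, k=5, h=1.0):
    """Ratio of the neighbors' mean density to the point's own density (higher = more outlying)."""
    density, nn_idx = local_kernel_density(X, k, h)
    return density[nn_idx].mean(axis=1) / np.maximum(density, 1e-300)
```

Calling `kernel_outlier_scores(X, k=5)` on an array `X` of shape (n, d) returns one score per row; points lying in regions much sparser than their neighbors' regions receive larger scores. The pairwise distance matrix makes this sketch quadratic in n, so it says nothing about the scalability claims in the abstract.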

Keywords

Outlier detection, Kernel methods, Local density estimate

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jun Gao
    • 1
  • Weiming Hu
    • 1
  • Zhongfei (Mark) Zhang
    • 2
  • Xiaoqin Zhang
    • 3
  • Ou Wu
    • 1
  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. Dept. of Computer Science, State Univ. of New York at Binghamton, Binghamton, USA
  3. College of Mathematics & Information Science, Wenzhou University, Zhejiang, China