An Improved Outlier Detection Algorithm Based on Reverse K-Nearest Neighbors of Adaptive Parameters

  • Xie Fangfang
  • Xu Liancheng
  • Chi Xuezhi
  • Zhu Zhenfang
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 269)

Abstract

Outlier detection based on reverse k-nearest neighbors can detect isolated points. However, finding the k-nearest neighbors takes O(kN^2) time, which is unsuitable for large data sets, and the choice of the parameter k strongly affects which outliers are found. This paper uses an adaptive method to determine the parameter k and proposes an efficient pruning method based on the triangle inequality, which reduces the computation needed to detect outliers. Theoretical analysis and experimental results demonstrate the feasibility and efficiency of the algorithm.
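The general idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract gives neither the adaptive rule for k nor the exact pruning scheme, so k is fixed here, and the single-pivot lower bound |d(p, pivot) - d(q, pivot)| <= d(p, q) is only one common way to apply the triangle inequality. The function names `knn_with_pruning` and `reverse_knn_counts`, the `prune` flag, and the sample points are all hypothetical.

```python
import math

def knn_with_pruning(points, k, pivot_index=0, prune=True):
    """Return (neighbour lists, number of distances computed).

    Assumption: one fixed pivot. By the triangle inequality,
    |d(p, pivot) - d(q, pivot)| is a lower bound on d(p, q), so a
    candidate q can be skipped when that bound already reaches the
    current k-th nearest distance of p.
    """
    pivot = points[pivot_index]
    to_pivot = [math.dist(p, pivot) for p in points]
    neighbours, computed = [], 0
    for i, p in enumerate(points):
        best = []  # sorted (distance, index) pairs, at most k entries
        for j, q in enumerate(points):
            if j == i:
                continue
            lower = abs(to_pivot[i] - to_pivot[j])
            if prune and len(best) == k and lower >= best[-1][0]:
                continue  # q cannot beat the current k-th neighbour
            d = math.dist(p, q)
            computed += 1
            if len(best) < k or d < best[-1][0]:
                best.append((d, j))
                best.sort()
                del best[k:]
        neighbours.append([j for _, j in best])
    return neighbours, computed

def reverse_knn_counts(neighbours):
    """Count how many points list each point among their k nearest neighbours."""
    counts = [0] * len(neighbours)
    for nbrs in neighbours:
        for j in nbrs:
            counts[j] += 1
    return counts

# Hypothetical data: five clustered points and one isolated point.
points = [(0.0, 0.0), (1.0, 0.2), (0.3, 1.1), (1.2, 1.3), (0.6, 0.5), (8.0, 8.0)]
neighbours, computed = knn_with_pruning(points, k=4)
print(reverse_knn_counts(neighbours))  # [4, 5, 5, 5, 5, 0]
print(computed, "of", len(points) * (len(points) - 1), "distances computed")
```

A point that appears in few or no neighbour lists, such as the last point here, is a candidate outlier. Pruning skips only candidates that could not have entered the neighbour list, so the result matches the brute-force search.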

Keywords

Adaptive parameters; k-nearest neighbors; Outliers detection; Reverse k-nearest neighbors

Notes

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (60873247) and the Science and Technology Plan in Colleges and Universities of Shandong Province (J12LN21).

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  • Xie Fangfang (1, 2)
  • Xu Liancheng (1, 2)
  • Chi Xuezhi (3)
  • Zhu Zhenfang (4)

  1. School of Information Science & Engineering, Shandong Normal University, Jinan, China
  2. Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
  3. Shandong Police College, Jinan, China
  4. School of Information Science and Electric Engineering, Shandong Jiaotong University, Jinan, China
