An Improved Outlier Detection Algorithm Based on Reverse K-Nearest Neighbors of Adaptive Parameters
The outlier detection algorithm based on reverse k-nearest neighbors can detect isolated points. However, finding the k-nearest neighbors has a time complexity of O(kN^2), which is unsuitable for large data sets, and the choice of the parameter k has a great impact on which outliers are found in large data sets. This paper uses an adaptive method to determine the parameter k and proposes an efficient pruning method based on the triangle inequality, which reduces the computation required to detect outliers. Theoretical analysis and experimental results demonstrate the feasibility and efficiency of the algorithm.
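The idea summarized above can be illustrated with a minimal sketch. It is not the paper's algorithm: the adaptive choice of k is not reproduced (k is passed in by hand), and the pruning shown is a generic single-pivot use of the triangle inequality, chosen here only as an assumption about how such a bound can skip distance computations. The function names, the `min_count` threshold, and the `pivot_index` parameter are all hypothetical.

```python
import math


def knn_indices(points, k):
    """Brute-force baseline: for each point, indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def knn_indices_pruned(points, k, pivot_index=0):
    """Same result as knn_indices, but skips some distance computations.

    Assumption (not from the paper): a single pivot point is used. By the
    triangle inequality, |d(p, pivot) - d(q, pivot)| <= d(p, q), so if this
    lower bound is already no better than the current k-th best distance,
    q cannot enter the k-nearest list of p and d(p, q) is never computed.
    """
    pivot = points[pivot_index]
    to_pivot = [math.dist(p, pivot) for p in points]
    neighbors = []
    for i, p in enumerate(points):
        best = []  # sorted list of (distance, index), at most k entries
        for j, q in enumerate(points):
            if j == i:
                continue
            lower_bound = abs(to_pivot[i] - to_pivot[j])
            if len(best) == k and lower_bound >= best[-1][0]:
                continue  # pruned: q cannot beat the current k-th neighbor
            best.append((math.dist(p, q), j))
            best.sort()
            del best[k:]
        neighbors.append([j for _, j in best])
    return neighbors


def reverse_knn_counts(points, k, knn=knn_indices_pruned):
    """Count, for each point, how many other points list it among their k nearest."""
    counts = [0] * len(points)
    for nbrs in knn(points, k):
        for j in nbrs:
            counts[j] += 1
    return counts


def rknn_outliers(points, k, min_count=1):
    """Flag points whose reverse k-nearest-neighbor count is below min_count."""
    counts = reverse_knn_counts(points, k)
    return [i for i, c in enumerate(counts) if c < min_count]


# Four clustered points and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
outliers = rknn_outliers(sample, k=2)
```

On this sample, no clustered point lists the isolated point (10, 10) among its two nearest neighbors, so its reverse neighbor count is zero and it is the only point flagged. The pruned search still degrades to quadratic work in the worst case; how much it saves depends on the data and on the pivot.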
Keywords: Adaptive parameters; k-nearest neighbors; Outlier detection; Reverse k-nearest neighbors
This work is supported in part by the National Natural Science Foundation of China (60873247) and the Science and Technology Plan in Colleges and Universities of Shandong Province (J12LN21).