
Local Outlier Detection Algorithm Based on Gaussian Kernel Density Function

  • Zhongping Zhang
  • Jiaojiao Liu
  • Chuangye Miao
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 986)

Abstract

With the rapid development of information technology, data resources are becoming increasingly complex in structure, and outlier mining is attracting more and more attention. Based on the Gaussian kernel function, this paper considers three kinds of neighbors: k-nearest neighbors, reverse k-nearest neighbors, and shared nearest neighbors, and proposes a local outlier detection algorithm built on the Gaussian kernel function. First, the algorithm stores the neighbors of each data object in kNN maps, covering the k-nearest neighbors, reverse k-nearest neighbors, and shared nearest neighbors, which together form a kernel neighbor set S. Second, the density of each data object is estimated with the kernel density estimation (KDE) method. Finally, the relative density outlier factor (RDOF) is used to measure the degree to which a data object deviates from its neighborhood and thus to decide whether it is an outlier. The effectiveness of the algorithm is demonstrated on real and synthetic data sets.
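To make the procedure described in the abstract more concrete, the following Python sketch (not taken from the paper) illustrates one plausible reading of it: the k-nearest-neighbor, reverse k-nearest-neighbor, and shared-nearest-neighbor sets are merged into a kernel neighbor set S, a Gaussian kernel density is estimated over S, and an RDOF-style score is taken as the ratio of the average neighbor density to the point's own density. The bandwidth choice, the exact RDOF formula, and the helper name kernel_neighbor_outlier_scores are assumptions made only for illustration.

# Minimal sketch of the abstract's pipeline; the RDOF definition and the
# Gaussian bandwidth below are illustrative assumptions, not the paper's exact formulas.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def kernel_neighbor_outlier_scores(X, k=10, bandwidth=1.0):
    """Score each point by how far its Gaussian-kernel density falls below
    the densities of its kernel neighbor set (larger score = more outlier-like)."""
    n = X.shape[0]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                 # the query point itself is typically returned first
    knn = [set(row[1:]) for row in idx]       # k-nearest neighbors of each point

    # Reverse k-nearest neighbors: q is an RNN of p if p lies among q's kNN.
    rnn = [set() for _ in range(n)]
    for q in range(n):
        for p in knn[q]:
            rnn[p].add(q)

    # Shared nearest neighbors: points whose kNN sets overlap with p's kNN set.
    snn = [set() for _ in range(n)]
    for p in range(n):
        for q in range(n):
            if q != p and knn[p] & knn[q]:
                snn[p].add(q)

    # Kernel neighbor set S(p) = kNN ∪ RNN ∪ SNN, as described in the abstract.
    S = [knn[p] | rnn[p] | snn[p] for p in range(n)]

    # Gaussian kernel density estimate of point p over a given neighbor set.
    def kde(p, neighbors):
        if not neighbors:
            return 1e-12
        d2 = np.sum((X[list(neighbors)] - X[p]) ** 2, axis=1)
        return np.mean(np.exp(-d2 / (2.0 * bandwidth ** 2)))

    density = np.array([kde(p, S[p]) for p in range(n)])

    # Assumed RDOF-style score: average neighbor density divided by the
    # point's own density; values well above 1 suggest local outliers.
    rdof = np.empty(n)
    for p in range(n):
        neigh = list(S[p])
        rdof[p] = np.mean(density[neigh]) / density[p] if neigh else np.inf
    return rdof

On a small synthetic set, for example 200 Gaussian points plus one isolated point far from the cluster, the isolated point would receive the largest score under this reading; the thresholding of RDOF scores and the choice of k would follow the paper's experimental setup.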

Keywords

Data mining · Outliers · Gaussian kernel function · Kernel density · Kernel neighbor


Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, China
  2. The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, China
