Outlier Detection Based on Cluster Outlier Factor and Mutual Density

  • Zhongping Zhang
  • Mengfan Zhu
  • Jingyang Qiu
  • Cong Liu
  • Debin Zhang
  • Jie Qi
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 986)

Abstract

Outlier detection is an important data mining task with numerous applications. In recent years, research on outlier detection has been very active, and many algorithms have been proposed, including clustering-based ones. However, most clustering-based outlier detection algorithms require parameters, and it is very difficult to select suitable parameter values for different data sets. To solve this problem, this paper proposes an outlier detection algorithm based on cluster outlier factor and mutual density, which combines the natural neighbor search of the Natural Outlier Factor (NOF) algorithm with the Density and Distance Cluster (DDC) algorithm. The mutual density and the γ density are used to construct a decision graph. Data points whose γ density is anomalously large in the decision graph are treated as cluster centers. The algorithm detects the boundary of outlier clusters using the Cluster Outlier Factor (COF) and can determine the parameter automatically. Experiments show that the method achieves good performance in both clustering and outlier detection.
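The decision-graph idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define mutual density or γ density, so the code below is only an assumption-laden stand-in: it uses a plain k-nearest-neighbor density for ρ and the product γ = ρ·δ from density-peaks clustering (Rodriguez and Laio), where δ is the distance to the nearest point of higher density. The function names `decision_graph` and `top_gamma` and the parameter `k` are hypothetical and are not taken from the paper.

```python
import math


def decision_graph(points, k=3):
    """Return (rho, delta, gamma) lists for a small data set.

    rho   : k-NN density stand-in (ASSUMPTION, not the paper's mutual density)
    delta : distance to the nearest point of strictly higher density
    gamma : rho * delta, as in density-peaks clustering
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Density: inverse of the mean distance to the k nearest neighbors.
    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        rho.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))

    # Delta: distance to the closest point with strictly higher density.
    # A point with no denser neighbor gets its largest distance instead.
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))

    gamma = [r * d for r, d in zip(rho, delta)]
    return rho, delta, gamma


def top_gamma(points, n_centers, k=3):
    """Indices of the n_centers points with the largest gamma."""
    _, _, gamma = decision_graph(points, k)
    order = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)
    return order[:n_centers]


# Two tight clusters plus one isolated point (illustrative data only).
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05),
    (10.0, 0.0),
]
centers = top_gamma(points, n_centers=2)
```

On this toy data the two largest γ values belong to the middle point of each cluster, while the isolated point has a large δ but a low ρ and therefore a small γ. How the paper's mutual density and COF refine this picture is not stated in the abstract and is not modeled here.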

Keywords

Data mining; Outlier; Mutual density; γ density; Cluster outlier factor

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, China
  2. The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, China
  3. Hebei Education Examinations Authority, Shijiazhuang, China
  4. The First Middle School of Qian An Country, Qian’an, China
