A New Clustering Algorithm Based on Probability

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 298)

Abstract

Clustering is an active research topic in data mining. After studying the existing classical clustering algorithms, this paper proposes a new probability-based clustering algorithm and gives new definitions of clustering and of outlier. The algorithm determines the initial cluster center automatically from the distribution characteristics of the sample data, and it eliminates outliers during the clustering process. Experimental results on the IRIS data set show that the algorithm clusters effectively.
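
The abstract does not spell out the procedure, so the following is only a minimal sketch of how a mean and standard deviation rule might separate outliers from the bulk of a sample. It is not the paper's algorithm: the function name `flag_outliers`, the threshold factor `k`, and the use of a single center are all assumptions made for illustration.

```python
import math
import statistics


def flag_outliers(points, k=2.0):
    """Split points into inliers and outliers with a mean + k*std distance rule.

    Hypothetical helper: the rule and the factor k are illustrative
    assumptions, not taken from the paper.
    """
    dims = len(points[0])
    # Sample mean of each coordinate, used as an estimate of the expectation.
    center = [statistics.fmean(p[d] for p in points) for d in range(dims)]
    # Euclidean distance from every point to that center.
    dists = [math.dist(p, center) for p in points]
    mu = statistics.fmean(dists)       # mean distance
    sigma = statistics.pstdev(dists)   # population standard deviation
    threshold = mu + k * sigma
    inliers = [p for p, d in zip(points, dists) if d <= threshold]
    outliers = [p for p, d in zip(points, dists) if d > threshold]
    return center, inliers, outliers


# Five points near the unit square plus one distant point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
center, inliers, outliers = flag_outliers(sample, k=1.5)
print(outliers)
```

With so few points, a single extreme value inflates both the mean and the standard deviation, which is why the usage example picks a smaller `k` than the default. A practical method would presumably need a more robust estimate or an iterative scheme.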

Keywords

Clustering, Outlier, DBSCAN Algorithm, Mathematical expectation, Standard deviation

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Software College, Shenyang Normal University, Shenyang, China
