A New Clustering Algorithm Based on Probability
Clustering is an active research topic in data mining. After studying the existing classical clustering algorithms, this paper proposes a new probability-based clustering algorithm and gives new definitions of clustering and outlier. The algorithm determines the initial cluster centers automatically from the distribution characteristics of the sample data, and it eliminates outliers during the clustering process. Experimental results on the IRIS data set show that the algorithm clusters effectively.
Keywords: Clustering, Outlier, DBSCAN, Algorithm, Mathematical expectation, Standard deviation
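The abstract does not spell out the procedure, so the following is only a minimal sketch of one common way that an expectation and a standard deviation can be used to flag outliers. It is not the paper's algorithm. The function name `flag_outliers`, the parameter `k`, and the rule "distance to the sample mean greater than mean distance plus k standard deviations" are all assumptions made for illustration.

```python
import math
import statistics


def flag_outliers(points, k=2.0):
    """Split points into (kept, outliers) with a mean + k * std distance rule.

    Hypothetical illustration only; the threshold rule is an assumption,
    not the definition of outlier used in the paper.
    """
    dims = len(points[0])
    # Sample mean of each coordinate (the empirical expectation).
    center = [statistics.fmean(p[d] for p in points) for d in range(dims)]
    # Euclidean distance from every point to that center.
    dists = [math.dist(p, center) for p in points]
    # Mean and population standard deviation of the distances.
    mu = statistics.fmean(dists)
    sigma = statistics.pstdev(dists)
    threshold = mu + k * sigma
    kept = [p for p, d in zip(points, dists) if d <= threshold]
    outliers = [p for p, d in zip(points, dists) if d > threshold]
    return kept, outliers
```

With a single extreme point among n samples, its distance can be at most about sqrt(n - 1) standard deviations above the mean, so a threshold of `k = 2.0` only has a chance of firing when there are more than five points. The choice of `k` is a tuning assumption.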