Abstract
Clustering analysis is an unsupervised method for discovering hidden structures in datasets. Most partitional clustering algorithms are sensitive to the selection of initial exemplars and to outliers and noise. In this paper, a novel technique called the data competition algorithm is proposed to address these problems. First, the concept of an aggregation field model is defined to describe the partitional clustering problem. Next, exemplars are identified through data competition. Then, the remaining members are assigned to suitable clusters. The data competition algorithm avoids the poor solutions caused by unlucky initializations, outliers, and noise, and can be used to detect co-expressed genes, cluster images, diagnose diseases, distinguish varieties, etc. The experimental results validate the feasibility and effectiveness of the proposed scheme and show that the data competition algorithm is simple, stable, and efficient. They also show that data competition clustering outperforms three of the most well-known clustering algorithms: K-means, affinity propagation, and hierarchical clustering.
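The two-stage procedure outlined above (identify exemplars through competition among data points, then assign each member to an exemplar's cluster) can be sketched as follows. This is a minimal illustrative stand-in, not the authors' actual data competition algorithm: the density-style competition score, the median-distance scale, and the greedy exemplar selection are assumptions introduced purely for illustration.

```python
import numpy as np

def data_competition_sketch(X, n_clusters):
    """Illustrative exemplar-based partitional clustering.

    NOT the paper's data competition algorithm -- a hedged sketch:
    points "compete" via a density-style score (how strongly each
    point attracts its neighbours), the winners become exemplars,
    and every member joins its nearest exemplar's cluster.
    """
    # Pairwise squared Euclidean distances between all points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Competition score: attraction exerted on all other points,
    # measured at a characteristic scale (median pairwise distance).
    scale = np.median(d2)
    score = np.exp(-d2 / scale).sum(axis=1)
    # Greedily pick exemplars: highest score first, skipping any
    # candidate that lies too close to an already chosen exemplar.
    order = np.argsort(-score)
    exemplars = [order[0]]
    for i in order[1:]:
        if len(exemplars) == n_clusters:
            break
        if all(d2[i, e] > scale for e in exemplars):
            exemplars.append(i)
    exemplars = np.array(exemplars)
    # Assignment step: each member joins its nearest exemplar.
    labels = np.argmin(d2[:, exemplars], axis=1)
    return exemplars, labels
```

Because the exemplars emerge from the data itself rather than from a random initialization, a scheme of this shape does not depend on lucky starting seeds, which is the property the abstract claims for the actual algorithm.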
Lu, Z., Zhang, Q. Clustering by data competition. Sci. China Inf. Sci. 56, 1–13 (2013). https://doi.org/10.1007/s11432-012-4627-2