Abstract
Clustering analysis is an unsupervised method for discovering hidden structures in datasets. Most partitional clustering algorithms are sensitive to the selection of initial exemplars and to outliers and noise. In this paper, a novel technique called the data competition algorithm is proposed to address these problems. First, the concept of an aggregation field model is defined to describe the partitional clustering problem. Next, exemplars are identified through data competition. Then, the remaining members are assigned to suitable clusters. The data competition algorithm avoids the poor solutions caused by unlucky initializations, outliers, and noise, and can be used to detect co-expressed genes, cluster images, diagnose diseases, distinguish varieties, etc. The experimental results validate the feasibility and effectiveness of the proposed scheme and show that the data competition algorithm is simple, stable, and efficient. They also show that data competition clustering outperforms three of the most well-known clustering algorithms: K-means, affinity propagation, and hierarchical clustering.
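The two-stage procedure outlined above (identify exemplars through competition among data points, then assign each member to an exemplar's cluster) can be sketched as follows. This is a minimal illustrative stand-in, not the authors' actual data competition algorithm: the density-style competition score, the median-distance scale, and the greedy exemplar selection are assumptions introduced purely for illustration.

```python
import numpy as np

def data_competition_sketch(X, n_clusters):
    """Illustrative exemplar-based partitional clustering.

    NOT the paper's data competition algorithm -- a hedged sketch:
    points "compete" via a density-style score (how strongly each
    point attracts its neighbours), the winners become exemplars,
    and every member joins its nearest exemplar's cluster.
    """
    # Pairwise squared Euclidean distances between all points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Competition score: attraction exerted on all other points,
    # measured at a characteristic scale (median pairwise distance).
    scale = np.median(d2)
    score = np.exp(-d2 / scale).sum(axis=1)
    # Greedily pick exemplars: highest score first, skipping any
    # candidate that lies too close to an already chosen exemplar.
    order = np.argsort(-score)
    exemplars = [order[0]]
    for i in order[1:]:
        if len(exemplars) == n_clusters:
            break
        if all(d2[i, e] > scale for e in exemplars):
            exemplars.append(i)
    exemplars = np.array(exemplars)
    # Assignment step: each member joins its nearest exemplar.
    labels = np.argmin(d2[:, exemplars], axis=1)
    return exemplars, labels
```

Because the exemplars emerge from the data itself rather than from a random initialization, a scheme of this shape does not depend on lucky starting seeds, which is the property the abstract claims for the actual algorithm.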
Lu, Z., Zhang, Q. Clustering by data competition. Sci. China Inf. Sci. 56, 1–13 (2013). https://doi.org/10.1007/s11432-012-4627-2