Clustering by data competition

  • Research Paper
Science China Information Sciences

Abstract

Cluster analysis is an unsupervised method for finding hidden structure in datasets. Most partitional clustering algorithms are sensitive to the choice of initial exemplars and to outliers and noise. This paper proposes a novel technique, the data competition algorithm, to address these problems. First, an aggregation field model is defined to describe the partitional clustering problem. Next, exemplars are identified through data competition. Finally, the members are assigned to suitable clusters. The data competition algorithm avoids poor solutions caused by unlucky initializations, outliers, and noise, and can be applied to tasks such as detecting co-expressed genes, clustering images, diagnosing diseases, and distinguishing varieties. Experimental results validate the feasibility and effectiveness of the proposed scheme and show that the algorithm is simple, stable, and efficient. They also show that data competition clustering outperforms three of the best-known clustering algorithms: K-means clustering, affinity propagation clustering, and hierarchical clustering.
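
The sensitivity to initial exemplars mentioned in the abstract can be illustrated with a small sketch. The code below is not the data competition algorithm, whose details are not given here; it is plain Lloyd-style K-means in Python. The four-point dataset, the function names `squared_distance` and `kmeans`, and the two initializations are assumptions chosen only for illustration. It shows the same data converging to two different partitions, one with a much larger within-cluster sum of squared errors (SSE), depending solely on which points are used as initial centers.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centers, max_iter=100):
    """Plain Lloyd-style K-means (illustrative; not the paper's method).

    Returns the final centers, the cluster label of each point, and the
    within-cluster sum of squared errors (SSE).
    """
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


# Hypothetical toy data: two tight pairs of points, far apart horizontally.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Initial centers taken from different pairs: recovers the left/right split.
good = kmeans(points, [(0, 0), (4, 1)])

# Initial centers taken from the same pair: stuck in a top/bottom split.
bad = kmeans(points, [(0, 0), (0, 1)])

print(good[1], good[2])  # [0, 0, 1, 1] 1.0
print(bad[1], bad[2])    # [0, 1, 0, 1] 16.0
```

Both runs satisfy the K-means convergence criterion, yet the second is sixteen times worse in SSE. This is the kind of initialization-dependent poor solution that, according to the abstract, the proposed approach is designed to avoid.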

Author information

Corresponding author

Correspondence to ZhiMao Lu.

About this article

Cite this article

Lu, Z., Zhang, Q. Clustering by data competition. Sci. China Inf. Sci. 56, 1–13 (2013). https://doi.org/10.1007/s11432-012-4627-2

  • DOI: https://doi.org/10.1007/s11432-012-4627-2
