Part of the book series: Studies in Computational Intelligence (SCI, volume 126)

Summary

Cluster ensemble methods attempt to find better and more robust clustering solutions by fusing information from several data partitionings. In this chapter, we address the different phases of this recent approach, from the generation of the partitions (the clustering ensemble) to the combination and validation of the combined result. While giving an overall review of the state of the art in the area, we focus on our own work on the subject. In particular, the Evidence Accumulation Clustering (EAC) paradigm is detailed and analyzed. For the validation and selection of the final partition, we focus on metrics that quantitatively measure the consistency between the ensemble partitions and the combined result, thus enabling the choice of the best results without the use of additional information. Information-theoretic measures, in conjunction with a variance analysis using bootstrapping, are detailed and empirically evaluated. Experimental results throughout the chapter, using synthetic and real data and involving both vectorial and string-based data representations, illustrate the various concepts and methods addressed. We show that the clustering ensemble approach can be used in very distinct contexts with state-of-the-art quality results.
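The pipeline summarized above has three stages: generate an ensemble, combine it by evidence accumulation, and score the result by its consistency with the ensemble. The following is a minimal Python sketch of those stages, and it rests on several assumptions. The ensemble is produced by k-means with a random number of clusters and random initialization, which is one common generation strategy but not necessarily the chapter's. The co-association matrix records the fraction of partitions in which each pair of objects is grouped together. The consensus is extracted by linking every pair whose co-association exceeds a fixed threshold of 0.5. Up to ties at the threshold, this is the same as cutting a single-link dendrogram built on 1 - C at height 0.5; the chapter may use a different extraction criterion. Consistency is scored with normalized mutual information (NMI) normalized by the mean of the two entropies, and other normalizations exist. All function names are hypothetical, and the code is not the authors' implementation.

```python
import math
import random
from collections import Counter

# Hypothetical helper names for illustration; not the authors' code.


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one integer label per point."""
    centers = rng.sample(points, k)  # random initialization from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels


def co_association(partitions, n):
    """C[i][j] = fraction of partitions placing objects i and j in the same cluster."""
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def extract_consensus(C, threshold=0.5):
    """Connected components of pairs with C[i][j] > threshold (union-find).
    Assumed extraction rule: a fixed-threshold single-link cut on 1 - C."""
    n = len(C)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if C[i][j] > threshold:
                parent[find(i)] = find(j)
    roots = {}
    # Relabel components 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def nmi(a, b):
    """NMI = 2 * I(a, b) / (H(a) + H(b)) between two labelings of the same objects."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(v / n * math.log(v / n) for v in ca.values())
    h_b = -sum(v / n * math.log(v / n) for v in cb.values())
    mi = sum(v / n * math.log(v * n / (ca[x] * cb[y])) for (x, y), v in cab.items())
    return 1.0 if h_a + h_b == 0 else 2 * mi / (h_a + h_b)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data for illustration only: two well-separated Gaussian blobs.
    points = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)] + \
             [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(30)]
    # Ensemble: 30 k-means runs, each with a random k in [5, 10].
    ensemble = [kmeans(points, rng.randint(5, 10), rng) for _ in range(30)]
    consensus = extract_consensus(co_association(ensemble, len(points)))
    print("clusters found:", len(set(consensus)))
    # Consistency of the consensus with the ensemble: mean NMI over all partitions.
    avg = sum(nmi(consensus, p) for p in ensemble) / len(ensemble)
    print("average NMI to ensemble:", round(avg, 3))
```

The 0.5 threshold keeps only the pairs that a majority of the partitions group together. This makes it the simplest extraction rule, but it ignores the rest of the dendrogram, so it is not necessarily the best one.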

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Fred, A., Lourenço, A. (2008). Cluster Ensemble Methods: from Single Clusterings to Combined Solutions. In: Okun, O., Valentini, G. (eds) Supervised and Unsupervised Ensemble Methods and their Applications. Studies in Computational Intelligence, vol 126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78981-9_1

  • DOI: https://doi.org/10.1007/978-3-540-78981-9_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78980-2

  • Online ISBN: 978-3-540-78981-9

  • eBook Packages: Engineering, Engineering (R0)