Cluster Ensemble Methods: from Single Clusterings to Combined Solutions

Fred, Ana; Lourenço, André

doi:10.1007/978-3-540-78981-9_1

Part of the book series: Studies in Computational Intelligence (SCI, volume 126)

Summary

Cluster ensemble methods attempt to find better and more robust clustering solutions by fusing information from several data partitionings. In this chapter, we address the different phases of this recent approach: from the generation of the partitions (the clustering ensemble) to the combination of those partitions and the validation of the combined result. While giving an overall review of the state of the art in the area, we focus on our own work on the subject. In particular, the Evidence Accumulation Clustering (EAC) paradigm is detailed and analyzed. For the validation and selection of the final partition, we focus on metrics that quantitatively measure the consistency between the partitions of the ensemble and the combined results, thus enabling the choice of the best results without the use of additional information. Information-theoretic measures, in conjunction with a variance analysis using bootstrapping, are detailed and empirically evaluated. Experimental results throughout the chapter illustrate the various concepts and methods addressed, using synthetic and real data and involving both vectorial and string-based data representations. We show that the clustering ensemble approach can be used in very distinct contexts and yields results of state-of-the-art quality.
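
The summary does not spell out the steps of EAC, so the following is only a minimal sketch of the idea as it is commonly described: each partition votes on whether a pair of samples belongs together, the votes are accumulated in a co-association matrix, and a consensus partition is extracted from that matrix. The function names, the toy partitions, the fixed 0.5 threshold, and the square-root normalization of the mutual information are illustrative assumptions, not the chapter's exact procedure.

```python
from collections import Counter
from math import log, sqrt


def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_partition(coassoc, threshold=0.5):
    """Link pairs whose co-association exceeds the threshold and return the
    connected components (a single-link cut at that level; the threshold
    value is an assumption made for this sketch)."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def nmi(a, b):
    """Normalized mutual information between two labelings."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())
    ha = -sum(c / n * log(c / n) for c in pa.values())
    hb = -sum(c / n * log(c / n) for c in pb.values())
    if ha == 0 or hb == 0:
        # Degenerate case: at least one labeling has a single cluster.
        return 1.0 if ha == hb else 0.0
    return mi / sqrt(ha * hb)


# Toy ensemble: four hypothetical partitions of six samples.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
]
C = co_association(partitions)
consensus = consensus_partition(C)
# Average agreement between the consensus and the ensemble members.
mean_nmi = sum(nmi(consensus, p) for p in partitions) / len(partitions)
```

The average normalized mutual information between the consensus and the individual partitions is one example of a consistency measure that needs no ground-truth labels, which is the kind of information-theoretic criterion the summary refers to.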

Author information

Authors and Affiliations

Instituto de Telecomunicações, Instituto Superior Técnico, Lisboa, Portugal
Ana Fred
Instituto de Telecomunicações, Instituto Superior de Engenharia de Lisboa, Portugal
André Lourenço

Editor information

Editors and Affiliations

Machine Vision Group, Infotech Oulu, Finland
Oleg Okun
Department of Electrical and Information Engineering, University of Oulu, P.O. Box 4500, FI-90014, Oulu, Finland
Oleg Okun
Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano, Via Comelico 39, 20135, Milano, Italy
Giorgio Valentini

About this chapter

Cite this chapter

Fred, A., Lourenço, A. (2008). Cluster Ensemble Methods: from Single Clusterings to Combined Solutions. In: Okun, O., Valentini, G. (eds) Supervised and Unsupervised Ensemble Methods and their Applications. Studies in Computational Intelligence, vol 126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78981-9_1

DOI: https://doi.org/10.1007/978-3-540-78981-9_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78980-2
Online ISBN: 978-3-540-78981-9
eBook Packages: Engineering, Engineering (R0)