Summary
Cluster ensemble methods attempt to find better and more robust clustering solutions by fusing information from several data partitionings. In this chapter, we address the different phases of this recent approach: from the generation of the partitions (the clustering ensemble) to the combination of those partitions and the validation of the combined result. While giving an overview of the state of the art in the area, we focus on our own work on the subject. In particular, the Evidence Accumulation Clustering (EAC) paradigm is detailed and analyzed. For the validation and selection of the final partition, we focus on metrics that quantitatively measure the consistency between the partitions in the ensemble and the combined results, thus enabling the choice of the best results without the use of additional information. Information-theoretic measures, in conjunction with a variance analysis using bootstrapping, are detailed and empirically evaluated. Experimental results throughout the chapter illustrate the various concepts and methods addressed, using synthetic and real data and involving both vectorial and string-based data representations. We show that the clustering ensemble approach can be used in very distinct contexts and yields results of state-of-the-art quality.
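The summary names the phases without showing them, so a toy illustration may help. The following is a minimal Python sketch, not the chapter's implementation. It assumes that evidence accumulation can be illustrated by a co-association matrix (the fraction of partitions in which two samples share a cluster), that a consensus partition can be read off by linking pairs whose co-association exceeds a fixed threshold of 0.5, and that consistency can be scored with normalized mutual information. The function names, the threshold, and the toy ensemble are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import log, sqrt


def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_labels(coassoc, threshold=0.5):
    """Connected components of the graph linking pairs with co-association
    above `threshold` (an assumed, simplified extraction rule)."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def nmi(a, b):
    """Normalized mutual information I(a;b) / sqrt(H(a) H(b)).
    Returns 0.0 when either partition has a single cluster."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * log(c / n) for c in pa.values())
    h_b = -sum(c / n * log(c / n) for c in pb.values())
    if h_a == 0.0 or h_b == 0.0:
        return 0.0
    mi = sum(c / n * log((c / n) / ((pa[x] / n) * (pb[y] / n)))
             for (x, y), c in pab.items())
    return mi / sqrt(h_a * h_b)


# Hypothetical toy ensemble: three partitions of six samples.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping, different label names
    [0, 0, 1, 1, 2, 2],  # a disagreeing partition
]
coassoc = co_association(partitions)
consensus = consensus_labels(coassoc)
# Average consistency between the consensus and the ensemble members.
mean_nmi = sum(nmi(consensus, p) for p in partitions) / len(partitions)
```

The co-association matrix does not depend on how each partition names its clusters, which is why the first two partitions above contribute identical evidence. Averaging the consistency score over the ensemble gives one number per candidate consensus, which can be compared across candidates without ground-truth labels.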
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Fred, A., Lourenço, A. (2008). Cluster Ensemble Methods: from Single Clusterings to Combined Solutions. In: Okun, O., Valentini, G. (eds) Supervised and Unsupervised Ensemble Methods and their Applications. Studies in Computational Intelligence, vol 126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78981-9_1
DOI: https://doi.org/10.1007/978-3-540-78981-9_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78980-2
Online ISBN: 978-3-540-78981-9
eBook Packages: Engineering, Engineering (R0)