Combining Multiple Clustering Systems

Boulis, Constantinos; Ostendorf, Mari

doi:10.1007/978-3-540-30116-5_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3202)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Three methods for combining multiple clustering systems are presented and evaluated, focusing on the problem of finding the correspondence between clusters of different systems. The clusters of the individual systems are represented in a common space, and their correspondence is estimated either by “clustering clusters” or with Singular Value Decomposition. The approaches are evaluated on the task of topic discovery using three major corpora and eight different clustering algorithms. Experiments show that combination schemes almost always offer gains over single systems, but that the gains depend on the underlying clustering systems.
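
To make the "clustering clusters" idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each cluster is represented as a 0/1 membership vector over the documents (one possible common space), that clusters are grouped with a small k-means-style loop seeded with the first system's clusters, and that each document takes the label of the meta-cluster that claims it most strongly. The function names `indicator_matrix` and `combine_by_clustering_clusters` are hypothetical.

```python
import numpy as np

def indicator_matrix(labels, n_clusters):
    """Return an (n_clusters, n_docs) 0/1 matrix; row c marks the docs in cluster c."""
    labels = np.asarray(labels)
    return (labels[None, :] == np.arange(n_clusters)[:, None]).astype(float)

def combine_by_clustering_clusters(systems, k, n_iter=10):
    """Group the clusters of all systems into k meta-clusters, then relabel docs."""
    # Every cluster of every system becomes one row in a shared document space.
    rows = np.vstack([indicator_matrix(s, k) for s in systems])
    # Assumption: seed the meta-clusters with the first system's clusters.
    centroids = rows[:k].copy()
    for _ in range(n_iter):
        # Assign each cluster to its nearest meta-cluster (squared Euclidean distance).
        dists = ((rows[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for m in range(k):
            if np.any(assign == m):
                centroids[m] = rows[assign == m].mean(axis=0)
    # A centroid entry is the fraction of its member clusters that contain the doc;
    # each doc is labeled with the meta-cluster holding the largest fraction.
    return centroids.argmax(axis=0)

# Three toy systems over six documents; the second uses swapped cluster IDs.
systems = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1]]
consensus = combine_by_clustering_clusters(systems, k=2)
print(consensus.tolist())  # [0, 0, 0, 1, 1, 1]
```

In this toy run the second system's swapped labels are matched to the first system's clusters automatically, and the third system's disagreement on one document is outvoted. The sketch assumes every system uses the same number of clusters, which real systems need not do.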

Author information

Authors and Affiliations

Department of Electrical Engineering, University of Washington, Seattle, WA, 98195, USA
Constantinos Boulis & Mari Ostendorf

Editor information

Editors and Affiliations

INSA-Lyon, LIRIS CNRS UMR5205, F-69621, Villeurbanne, France
Jean-François Boulicaut
Dipartimento di Informatica, Università degli Studi di Bari
Floriana Esposito
Pisa KDD Laboratory, ISTI - CNR, Area della Ricerca di Pisa, Via Giuseppe Moruzzi 1, Pisa, Italy
Fosca Giannotti
Dipartimento di Informatica, Via F. Buonarroti 2, 56127, Pisa, Italy
Dino Pedreschi

About this paper

Cite this paper

Boulis, C., Ostendorf, M. (2004). Combining Multiple Clustering Systems. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Knowledge Discovery in Databases: PKDD 2004. Lecture Notes in Computer Science, vol 3202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30116-5_9

DOI: https://doi.org/10.1007/978-3-540-30116-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23108-0
Online ISBN: 978-3-540-30116-5
eBook Packages: Springer Book Archive
