Abstract
Three methods for combining multiple clustering systems are presented and evaluated, focusing on the problem of finding the correspondence between clusters of different systems. The clusters of individual systems are represented in a common space, and their correspondence is estimated either by “clustering clusters” or with Singular Value Decomposition. The approaches are evaluated on the task of topic discovery using three major corpora and eight different clustering algorithms. Experiments show that combination schemes almost always offer gains over single systems, but that the gains depend on the underlying clustering systems.
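The correspondence problem mentioned in the abstract can be illustrated with a small sketch. The code below is not one of the paper's three methods; it is a deliberately simplified stand-in that assumes hard cluster labels, represents each cluster as an indicator vector over the items (one possible "common space"), maps every cluster to the most similar cluster of a reference system by Jaccard overlap, and then takes a majority vote. The function names (`cluster_indicators`, `align_to_reference`, `combine_by_vote`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def cluster_indicators(labels):
    """Return a (n_clusters, n_items) 0/1 matrix with one row per cluster."""
    labels = np.asarray(labels)
    ids = np.unique(labels)
    return (labels[None, :] == ids[:, None]).astype(float)

def align_to_reference(ref_labels, other_labels):
    """Relabel other_labels with the reference cluster of highest Jaccard overlap."""
    ref = cluster_indicators(ref_labels)
    oth = cluster_indicators(other_labels)
    inter = oth @ ref.T  # pairwise intersection sizes, shape (n_other, n_ref)
    union = oth.sum(axis=1)[:, None] + ref.sum(axis=1)[None, :] - inter
    best = (inter / union).argmax(axis=1)  # best reference cluster per other cluster
    ref_ids = np.unique(ref_labels)
    oth_ids = np.unique(other_labels)
    mapping = dict(zip(oth_ids.tolist(), ref_ids[best].tolist()))
    return np.array([mapping[l] for l in np.asarray(other_labels).tolist()])

def combine_by_vote(systems):
    """Align every system to the first one, then take a per-item majority vote."""
    ref = np.asarray(systems[0])
    aligned = np.stack([ref] + [align_to_reference(ref, s) for s in systems[1:]])
    ids = np.unique(ref)
    # votes[c, i] counts how many systems assign item i to reference cluster ids[c]
    votes = (aligned[None, :, :] == ids[:, None, None]).sum(axis=1)
    return ids[votes.argmax(axis=0)]

if __name__ == "__main__":
    a = [0, 0, 0, 1, 1, 1]
    b = [1, 1, 1, 0, 0, 0]  # same partition as a, labels swapped
    c = [0, 0, 1, 1, 1, 1]  # disagrees with a on one item
    print(combine_by_vote([a, b, c]))
```

Aligning to a single reference system is the simplest possible choice and makes the result depend on which system is listed first; treating all clusters symmetrically, as "clustering clusters" or a decomposition of the stacked indicator matrix would, avoids that dependence.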
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boulis, C., Ostendorf, M. (2004). Combining Multiple Clustering Systems. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Knowledge Discovery in Databases: PKDD 2004. Lecture Notes in Computer Science, vol 3202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30116-5_9
DOI: https://doi.org/10.1007/978-3-540-30116-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23108-0
Online ISBN: 978-3-540-30116-5
eBook Packages: Springer Book Archive