Comparing Non-parametric Ensemble Methods for Document Clustering

Gonzàlez, Edgar; Turmo, Jordi

doi:10.1007/978-3-540-69858-6_25

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5039)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

The biases of individual algorithms for non-parametric document clustering can lead to non-optimal solutions. Ensemble clustering methods may overcome this limitation, but have not been applied to document collections. This paper presents a comparison of strategies for non-parametric document ensemble clustering.

Author information

Authors and Affiliations

TALP Research Center, Universitat Politècnica de Catalunya
Edgar Gonzàlez & Jordi Turmo

Editor information

Epaminondas Kapetanios, Vijayan Sugumaran, Myra Spiliopoulou

About this paper

Cite this paper

Gonzàlez, E., Turmo, J. (2008). Comparing Non-parametric Ensemble Methods for Document Clustering. In: Kapetanios, E., Sugumaran, V., Spiliopoulou, M. (eds) Natural Language and Information Systems. NLDB 2008. Lecture Notes in Computer Science, vol 5039. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69858-6_25

DOI: https://doi.org/10.1007/978-3-540-69858-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69857-9
Online ISBN: 978-3-540-69858-6
eBook Packages: Computer Science, Computer Science (R0)
