Abstract
We have carried out experiments in clustering a news corpus. In these experiments we used two partitional methods, varying two different parameters of the clustering tool. In addition, we worked both with the whole document (news item) and with representative parts of the document. We obtained good results working with a representative part of the document. The experiments were carried out with news in Spanish and Basque in order to compare the results in both languages.
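The abstract does not name the two partitional methods, the clustering tool or its parameters, or how the "representative part" of a news item is chosen. The sketch below is therefore only a minimal illustration under stated assumptions, not the authors' method. It uses spherical k-means over TF-IDF vectors, a standard partitional technique that is not necessarily one of those used in the paper. It also uses a hypothetical `representative_part` (headline plus first body paragraph) and four invented toy Spanish news items. The sketch clusters the same collection once from whole documents and once from representative parts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-cased word tokens; \\w is Unicode-aware, so accented letters are kept."""
    return re.findall(r"\w+", text.lower())


def whole_document(item):
    return item["title"] + " " + item["body"]


def representative_part(item):
    """HYPOTHETICAL 'representative part': headline plus first body paragraph."""
    lead = item["body"].split("\n\n")[0]
    return item["title"] + " " + lead


def normalise(vec):
    """Scale a {term: weight} dict to unit Euclidean length (empty stays empty)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(token_lists):
    """Unit-length TF-IDF vectors with smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return [
        normalise({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                   for t, c in Counter(tokens).items()})
        for tokens in token_lists
    ]


def cosine(a, b):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def spherical_kmeans(vectors, k, max_iter=20):
    """Assign each document to its most similar centroid, recompute centroids, repeat."""
    # Deterministic farthest-first seeding (an assumption made for this sketch).
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    centroids = [vectors[s] for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        for j in range(k):
            summed = {}
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        summed[t] = summed.get(t, 0.0) + w
            if summed:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = normalise(summed)
    return labels


# Invented toy corpus: two sports items and two budget items.
news = [
    {"title": "Athletic gana partido de fútbol",
     "body": "El equipo celebra la victoria en el estadio.\n\nLos aficionados llenan las calles de Bilbao."},
    {"title": "Real Sociedad gana partido de fútbol",
     "body": "El equipo logra la victoria en casa.\n\nEl entrenador elogia a los jugadores."},
    {"title": "Parlamento aprueba el presupuesto",
     "body": "El gobierno vota la ley de presupuesto.\n\nLa oposición critica las cuentas."},
    {"title": "Gobierno presenta el presupuesto",
     "body": "El parlamento debate la ley de presupuesto.\n\nLos sindicatos piden más gasto."},
]

for view in (whole_document, representative_part):
    vecs = tfidf_vectors([tokenize(view(item)) for item in news])
    print(view.__name__, spherical_kmeans(vecs, k=2))
```

On this toy data, both views group the two sports items together and the two budget items together. Shorter representative parts reduce the weight of incidental shared words such as "los" or "las", which is one plausible reason a well-chosen part of each item can cluster as well as the full text. Whether that explains the paper's results is not stated in the abstract.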
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Casillas, A., de González Lena, M., Martínez, R. (2003). Partitional Clustering Experiments with News Documents. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2003. Lecture Notes in Computer Science, vol 2588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36456-0_68
DOI: https://doi.org/10.1007/3-540-36456-0_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00532-2
Online ISBN: 978-3-540-36456-6
eBook Packages: Springer Book Archive