Abstract
Clustering is a powerful tool for knowledge discovery in text collections. The quality of document clustering depends not only on the clustering algorithm but also on the model used to represent documents. We develop a hierarchical document clustering algorithm based on a tolerance rough set model (TRSM) for representing documents, which offers a way of taking semantic relatedness between documents into account. The results of validating and evaluating this method suggest that the clustering algorithm can be well adapted to text mining.
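As a rough illustration of the idea above, here is a minimal Python sketch. It assumes the tolerance-class construction commonly described for TRSM: two terms are treated as related when they co-occur in at least θ documents, and a document is represented by its upper approximation, i.e. every term whose tolerance class overlaps the document's own terms. To keep it short, it uses binary term sets with cosine similarity instead of a TRSM term-weighting scheme, and plain average-link agglomerative clustering. The threshold `theta`, the cluster count `k`, and all function names are hypothetical choices; this is not the authors' algorithm.

```python
# Sketch only: simplified TRSM document representation + average-link
# agglomerative clustering. Binary term sets stand in for TRSM weights (assumption).
from itertools import combinations
from math import sqrt


def cooccurrence_counts(docs):
    """f_D(ti, tj): number of documents in which terms ti and tj both occur."""
    counts = {}
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


def tolerance_classes(docs, theta):
    """I_theta(t): t itself plus every term co-occurring with t in >= theta documents."""
    classes = {t: {t} for doc in docs for t in doc}
    for (a, b), n in cooccurrence_counts(docs).items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """All terms whose tolerance class shares at least one term with the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


def cosine(a, b):
    """Cosine similarity of two binary term vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def hac_average_link(reps, k):
    """Merge the most similar pair of clusters (average link) until k clusters remain."""
    clusters = [[i] for i in range(len(reps))]

    def link(c1, c2):
        total = sum(cosine(reps[i], reps[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


def trsm_cluster(docs, theta=2, k=2):
    """Hypothetical pipeline: tolerance classes -> upper approximations -> HAC."""
    classes = tolerance_classes(docs, theta)
    reps = [upper_approximation(d, classes) for d in docs]
    return hac_average_link(reps, k)


docs = [
    ["rough", "set", "approximation"],
    ["rough", "set", "tolerance"],
    ["tolerance", "approximation", "set"],
    ["neural", "network", "training"],
    ["neural", "network", "layer"],
    ["network", "layer", "training"],
]
print(trsm_cluster(docs, theta=2, k=2))  # [[0, 1, 2], [3, 4, 5]]
```

With `theta=2`, each document is enriched with related terms it does not itself contain (the first document, for example, gains "tolerance"), so documents that share few literal terms can still be grouped together. A real TRSM implementation would also weight the added terms rather than treat all terms equally.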
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kawasaki, S., Binh, N., Bao, T. (2000). Hierarchical Document Clustering Based on Tolerance Rough Set Model. In: Zighed, D.A., Komorowski, J., Żytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2000. Lecture Notes in Computer Science, vol 1910. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45372-5_51
DOI: https://doi.org/10.1007/3-540-45372-5_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41066-9
Online ISBN: 978-3-540-45372-7
eBook Packages: Springer Book Archive