Abstract
A statistical method of document classification driven by a hierarchical topic dictionary is proposed. The method uses a dictionary with a simple structure and is insensitive to inaccuracies in the dictionary. Two kinds of weights of dictionary entries are discussed, namely relevance weights and discrimination weights. Relevance weights are associated with the links between words and topics and between the nodes in the tree, while discrimination weights depend on the user's database. A common-sense-compliant way of assigning these weights to the topics is presented. A text classification system, Classifier, based on this method is described.
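The abstract gives no formulas, so the following is only a minimal sketch of how a weighted topic hierarchy could drive classification, under assumed choices that are not taken from the paper. Relevance weights sit on word-to-topic links and on child-to-parent links. A topic's word evidence is propagated to its ancestors by multiplying link weights. A discrimination weight per topic is computed from the user's document collection with an IDF-style formula (a swapped-in stand-in, not the authors' method). The names `Topic`, `relevance_scores`, `discrimination_weights`, and `classify`, and the toy data, are hypothetical and do not describe the actual Classifier system.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    """A node in the topic tree (hypothetical structure for illustration)."""
    name: str
    parent: Optional["Topic"] = None
    # Assumed relevance weight of the link from this node to its parent.
    link_weight: float = 1.0
    # Assumed relevance weights of the links between words and this topic.
    words: dict = field(default_factory=dict)


def relevance_scores(tokens, topics):
    """Accumulate weighted word evidence on each topic and pass it up the tree."""
    counts = Counter(tokens)
    scores = {t.name: 0.0 for t in topics}
    for topic in topics:
        direct = sum(counts[w] * rel for w, rel in topic.words.items())
        node, factor = topic, 1.0
        while node is not None:
            scores[node.name] += direct * factor
            # Evidence is attenuated by each link weight on the way to the root.
            factor *= node.link_weight
            node = node.parent
    return scores


def discrimination_weights(database, topics):
    """IDF-style stand-in (an assumption): a topic found in many documents of
    the user's database discriminates poorly and gets a small weight."""
    n = len(database)
    doc_scores = [relevance_scores(doc, topics) for doc in database]
    weights = {}
    for t in topics:
        df = sum(1 for s in doc_scores if s[t.name] > 0)
        weights[t.name] = math.log((1 + n) / (1 + df))
    return weights


def classify(tokens, topics, discrimination):
    """Combine relevance (from the dictionary) with discrimination (from the database)."""
    rel = relevance_scores(tokens, topics)
    return {name: s * discrimination[name] for name, s in rel.items()}


# Toy hierarchy and user database (hypothetical data).
science = Topic("science")
computing = Topic("computing", parent=science, link_weight=0.5,
                  words={"algorithm": 1.0, "compiler": 1.0})
biology = Topic("biology", parent=science, link_weight=0.5,
                words={"cell": 1.0, "gene": 0.8})
topics = [science, computing, biology]
database = [["algorithm", "compiler"], ["gene", "cell"], ["cell"], ["poem"]]

disc = discrimination_weights(database, topics)
scores = classify(["compiler", "algorithm", "cell"], topics, disc)
print(max(scores, key=scores.get))  # computing
```

In this sketch the general root topic "science" receives evidence from every child, but because it occurs in most documents of the toy database its discrimination weight is low, so a more specific topic wins. This mirrors the abstract's split between dictionary-bound relevance weights and database-dependent discrimination weights, without claiming to reproduce the paper's actual weighting scheme.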
This work was partially supported by DEPI-IPN, CONACyT (26424-A), and REDII, Mexico.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gelbukh, A., Sidorov, G., Guzman-Arénas, A. (1999). Use of a Weighted Topic Hierarchy for Document Classification. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_24
DOI: https://doi.org/10.1007/3-540-48239-3_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66494-9
Online ISBN: 978-3-540-48239-0
eBook Packages: Springer Book Archive