Use of a Weighted Topic Hierarchy for Document Classification

Gelbukh, Alexander; Sidorov, Grigori; Guzman-Arénas, Adolfo

doi:10.1007/3-540-48239-3_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1692)

Included in the following conference series:

International Workshop on Text, Speech and Dialogue

Abstract

A statistical method of document classification driven by a hierarchical topic dictionary is proposed. The method uses a dictionary with a simple structure and is insensitive to inaccuracies in the dictionary. Two kinds of weights for dictionary entries, namely relevance and discrimination weights, are discussed. Weights of the first kind are associated with the links between words and topics and between nodes in the tree, while weights of the second kind depend on the user's database. A way of assigning these weights to the topics that agrees with common sense is presented. Classifier, a text classification system based on this method, is described.
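The abstract gives no formulas, so the following is only a minimal sketch of how classification over a weighted topic hierarchy might look, not the authors' algorithm. It assumes that each word's relevance weight is added to its linked topic and passed up the tree, scaled by the link weight at each step, and that the final score of a topic is multiplied by its discrimination weight. The toy hierarchy, all weight values, and the names `PARENT`, `WORD_TOPICS`, `DISCRIMINATION`, `score_topics`, and `classify` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: child topic -> (parent topic, link relevance weight).
PARENT = {
    "soccer": ("sports", 0.9),
    "tennis": ("sports", 0.9),
    "sports": ("ROOT", 0.5),
    "elections": ("politics", 0.8),
    "politics": ("ROOT", 0.5),
}

# Hypothetical dictionary entries: word -> [(topic, relevance weight)].
WORD_TOPICS = {
    "goal": [("soccer", 0.6)],
    "referee": [("soccer", 0.8), ("tennis", 0.3)],
    "racket": [("tennis", 0.9)],
    "ballot": [("elections", 0.9)],
}

# Hypothetical discrimination weights. In the paper these depend on the
# user's database; here they are fixed by hand, and broad topics get less.
DISCRIMINATION = {
    "soccer": 1.0, "tennis": 1.0, "elections": 1.0,
    "sports": 0.7, "politics": 0.7, "ROOT": 0.0,
}


def score_topics(words):
    """Accumulate relevance per topic, then scale by discrimination weight."""
    raw = defaultdict(float)
    for word in words:
        # Words missing from the dictionary contribute nothing.
        for topic, relevance in WORD_TOPICS.get(word, []):
            node, contribution = topic, relevance
            while True:
                raw[node] += contribution
                if node not in PARENT:
                    break  # reached the root of the tree
                node, link_weight = PARENT[node]
                contribution *= link_weight  # attenuate on the way up
    return {t: s * DISCRIMINATION.get(t, 1.0) for t, s in raw.items()}


def classify(words):
    """Return the best-scoring topic, or None if no word is in the dictionary."""
    scores = score_topics(words)
    return max(scores, key=scores.get) if scores else None
```

Under these assumptions, `classify(["goal", "referee", "ballot"])` favors the specific topic `soccer` over its parent `sports`, because the parent's accumulated score is damped by its lower discrimination weight.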

This work was partially supported by DEPI-IPN, CONACyT (26424-A), and REDII, Mexico.

Author information

Authors and Affiliations

Natural Language Laboratory, Center for Computing Research (CIC), National Polytechnic Institute (IPN), Av. Juan de Dios Bátiz, CP 07738, Zacatenco, Mexico City, Mexico
Alexander Gelbukh, Grigori Sidorov & Adolfo Guzman-Arénas

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia in Plzeň, Universitní 22, 306 14, Plzeň, Czech Republic
Václav Matousek, Pavel Mautner & Jana Ocelíková
Department of Programming Systems and Communication, Faculty of Informatics, Masaryk University Brno, Botanická 68a, 602 00, Brno, Czech Republic
Petr Sojka

About this paper

Cite this paper

Gelbukh, A., Sidorov, G., Guzman-Arénas, A. (1999). Use of a Weighted Topic Hierarchy for Document Classification. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_24

DOI: https://doi.org/10.1007/3-540-48239-3_24
Published: 01 October 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66494-9
Online ISBN: 978-3-540-48239-0
eBook Packages: Springer Book Archive
