Use of a Weighted Topic Hierarchy for Document Classification

  • Alexander Gelbukh
  • Grigori Sidorov
  • Adolfo Guzman-Arénas
Conference paper

DOI: 10.1007/3-540-48239-3_24

Part of the Lecture Notes in Computer Science book series (LNCS, volume 1692)
Cite this paper as:
Gelbukh A., Sidorov G., Guzman-Arénas A. (1999) Use of a Weighted Topic Hierarchy for Document Classification. In: Matousek V., Mautner P., Ocelíková J., Sojka P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg

Abstract

A statistical method of document classification driven by a hierarchical topic dictionary is proposed. The method uses a dictionary with a simple structure and is insensitive to inaccuracies in the dictionary. Two kinds of weights of dictionary entries, namely relevance weights and discrimination weights, are discussed. The first type of weight is associated with the links between words and topics and between the nodes in the tree, while the weights of the second type depend on the user's database. A common-sense-compliant way of assigning these weights to the topics is presented. A text classification system, Classifier, based on the discussed method is described.
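The abstract does not give the paper's weighting formulas, so the following is only a minimal sketch of the general idea of scoring topics over a weighted hierarchy, not the authors' Classifier. Everything in it is assumed for illustration: the toy dictionary `WORD_TOPICS` (word-to-topic relevance weights), the toy tree `PARENT` (child-to-parent link weights), the function `topic_scores`, and the aggregation rule (each word occurrence adds its relevance weight to a topic, and that contribution propagates to every ancestor, multiplied by the link weights along the path). Discrimination weights are simply passed in by the caller, since the abstract says only that they depend on the user's database.

```python
from collections import Counter, defaultdict

# Hypothetical toy dictionary: word -> list of (topic, relevance weight).
WORD_TOPICS = {
    "goal": [("soccer", 0.9)],
    "ball": [("soccer", 0.5), ("tennis", 0.5)],
    "racket": [("tennis", 0.9)],
    "election": [("politics", 1.0)],
}

# Hypothetical topic tree: child topic -> (parent topic, link weight).
PARENT = {
    "soccer": ("sports", 0.8),
    "tennis": ("sports", 0.8),
    "sports": ("root", 1.0),
    "politics": ("root", 1.0),
}


def topic_scores(words, discrimination=None):
    """Score every topic reached by the words of a tokenized document.

    Assumed rule (illustrative only): each occurrence of a word adds its
    relevance weight to the linked topic, and the contribution is passed up
    to every ancestor, multiplied by each link weight on the way. Optional
    discrimination weights then scale the final scores; topics missing from
    that mapping keep a weight of 1.0.
    """
    discrimination = discrimination or {}
    scores = defaultdict(float)
    for word, count in Counter(words).items():
        for topic, relevance in WORD_TOPICS.get(word, []):
            contribution = count * relevance
            node = topic
            while True:
                scores[node] += contribution
                if node not in PARENT:  # reached the root of the tree
                    break
                node, link_weight = PARENT[node]
                contribution *= link_weight
    return {t: s * discrimination.get(t, 1.0) for t, s in scores.items()}
```

For example, `topic_scores(["goal", "goal", "ball", "racket"])` gives "soccer" a higher score than "tennis" (about 2.3 versus 1.4), while "sports" accumulates evidence from both children. Because inner nodes collect their descendants' evidence, one would typically compare topics at the same level (or exclude the root) when picking a label.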

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Alexander Gelbukh (1)
  • Grigori Sidorov (1)
  • Adolfo Guzman-Arénas (1)

  (1) Natural Language Laboratory, Center for Computing Research (CIC), National Polytechnic Institute (IPN), Zacatenco, Mexico City, Mexico