Use of a Weighted Topic Hierarchy for Document Classification
- Cite this paper as:
- Gelbukh A., Sidorov G., Guzman-Arénas A. (1999) Use of a Weighted Topic Hierarchy for Document Classification. In: Matousek V., Mautner P., Ocelíková J., Sojka P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg
A statistical method of document classification driven by a hierarchical topic dictionary is proposed. The method uses a dictionary with a simple structure and is insensitive to inaccuracies in the dictionary. Two kinds of weights of dictionary entries, namely relevance weights and discrimination weights, are discussed. Weights of the first type are associated with the links between words and topics and between the nodes in the tree, while weights of the second type depend on the user's database. A common-sense-compliant way of assigning these weights to the topics is presented. Classifier, a text classification system based on the discussed method, is described.
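To make the two kinds of weights concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract gives no formulas, so every detail here is an assumption: the toy hierarchy and word links (`PARENT`, `WORD_TOPICS`), the rule that a word's contribution is multiplied by each link's relevance weight as it moves up the tree, and the use of the standard deviation of a topic's score across the user's documents as its discrimination weight.

```python
from collections import Counter
from statistics import pstdev

# Hypothetical toy hierarchy: child topic -> (parent topic, link relevance weight).
PARENT = {
    "soccer": ("sport", 0.9),
    "tennis": ("sport", 0.8),
    "sport": ("root", 1.0),
    "banking": ("economy", 0.9),
    "economy": ("root", 1.0),
}

# Hypothetical dictionary: word -> [(topic, word-to-topic relevance weight)].
WORD_TOPICS = {
    "goal": [("soccer", 0.7)],
    "racket": [("tennis", 1.0)],
    "match": [("soccer", 0.5), ("tennis", 0.5)],
    "loan": [("banking", 1.0)],
}


def topic_scores(text):
    """Accumulate relevance-weighted word counts up the topic tree (assumed rule)."""
    counts = Counter(text.lower().split())
    scores = Counter()
    for word, n in counts.items():
        for topic, weight in WORD_TOPICS.get(word, []):
            contribution = n * weight
            node = topic
            while node != "root":
                scores[node] += contribution
                parent, link_weight = PARENT[node]
                contribution *= link_weight  # attenuate along the link to the parent
                node = parent
    return dict(scores)


def discrimination_weights(documents):
    """Assumed measure: spread of each topic's score over the user's documents."""
    per_doc = [topic_scores(d) for d in documents]
    topics = {t for scores in per_doc for t in scores}
    return {t: pstdev([s.get(t, 0.0) for s in per_doc]) for t in topics}


def classify(text, discrimination):
    """Return the topic with the highest score times discrimination weight, or None."""
    scores = topic_scores(text)
    if not scores:
        return None
    return max(scores, key=lambda t: scores[t] * discrimination.get(t, 0.0))


if __name__ == "__main__":
    corpus = ["goal match goal", "loan loan", "racket match"]
    weights = discrimination_weights(corpus)
    print(classify("loan", weights))
```

In this sketch a topic whose score barely varies across the collection receives a discrimination weight near zero, which is one possible reading of weights that "depend on the user's database"; the paper itself may define them differently.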