Information Retrieval, Volume 1, Issue 3, pp 193–216

Exploiting Hierarchy in Text Categorization

  • Andreas S. Weigend
  • Erik D. Wiener
  • Jan O. Pedersen


With the recent dramatic increase in electronic access to documents, text categorization—the task of assigning topics to a given document—has moved to the center of the information sciences and knowledge management. This article uses the structure that is present in the semantic space of topics in order to improve performance in text categorization: according to their meaning, topics can be grouped together into “meta-topics”, e.g., gold, silver, and copper are all metals. The proposed architecture matches the hierarchical structure of the topic space, as opposed to a flat model that ignores the structure. It accommodates both single and multiple topic assignments for each document. Its probabilistic interpretation allows its predictions to be combined in a principled way with information from other sources. The first level of the architecture predicts the probabilities of the meta-topic groups. This allows the individual models for each topic on the second level to focus on finer discriminations within the group. Evaluating the performance of a two-level implementation on the Reuters-22173 testbed of newswire articles shows the most significant improvement for rare classes.
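The two-level idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the combination rule, so the product rule P(topic | doc) = P(meta-topic | doc) × P(topic | meta-topic, doc) is an assumption here, as are the "grains" group, the function names, the threshold, and all probability values, which stand in for the outputs of trained models.

```python
from typing import Dict, List

# Hypothetical topic hierarchy. Only the "metals" grouping comes from the
# abstract; "grains" is an invented second group for illustration.
HIERARCHY: Dict[str, List[str]] = {
    "metals": ["gold", "silver", "copper"],
    "grains": ["wheat", "corn"],
}


def combine(meta_probs: Dict[str, float],
            topic_given_meta: Dict[str, float],
            hierarchy: Dict[str, List[str]] = HIERARCHY) -> Dict[str, float]:
    """Combine first-level and second-level outputs for one document.

    Assumed rule: P(topic | doc) = P(meta | doc) * P(topic | meta, doc).
    """
    scores: Dict[str, float] = {}
    for meta, topics in hierarchy.items():
        for topic in topics:
            scores[topic] = meta_probs[meta] * topic_given_meta[topic]
    return scores


def assign_multiple(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multiple-topic assignment: keep every topic at or above the threshold."""
    return sorted(t for t, p in scores.items() if p >= threshold)


def assign_single(scores: Dict[str, float]) -> str:
    """Single-topic assignment: keep only the most probable topic."""
    return max(scores, key=scores.get)


# Made-up model outputs for a single document.
meta_probs = {"metals": 0.9, "grains": 0.1}
topic_given_meta = {"gold": 0.8, "silver": 0.7, "copper": 0.1,
                    "wheat": 0.9, "corn": 0.5}

scores = combine(meta_probs, topic_given_meta)
multi = assign_multiple(scores)
single = assign_single(scores)
```

In this sketch a topic such as "wheat" scores low despite a high second-level output, because the first level considers its group unlikely. That is one way the second-level models can concentrate on distinctions within a group rather than across the whole topic space.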

Keywords: information retrieval, text mining, topic spotting, text categorization, knowledge management, problem decomposition, machine learning, neural networks, probabilistic models, hierarchical models, performance evaluation





Copyright information

© Kluwer Academic Publishers 1999
