Information Retrieval, Volume 5, Issue 1, pp 87–118

Hierarchical Text Categorization Using Neural Networks

  • Miguel E. Ruiz
  • Padmini Srinivasan
Abstract

This paper presents the design and evaluation of a text categorization method based on the Hierarchical Mixture of Experts model. This model uses a divide-and-conquer principle to define smaller categorization problems based on a predefined hierarchical structure. The final classifier is a hierarchical array of neural networks. The method is evaluated using the UMLS Metathesaurus as the underlying hierarchical structure and the OHSUMED test set of MEDLINE records. Comparisons are provided with an optimized version of the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers. The results show that the use of the hierarchical structure improves text categorization performance with respect to an equivalent flat model. The optimized Rocchio algorithm achieves performance comparable to that of the hierarchical neural networks.
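The divide-and-conquer idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: each node is a single hand-set logistic unit standing in for a trained network, and the category names, term weights, and the 0.5 threshold are all hypothetical. Internal nodes act as gates that decide whether a document is passed down to their children; leaf nodes act as experts for one category.

```python
import math
from dataclasses import dataclass, field


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class Node:
    """One node of a predefined category hierarchy.

    Internal nodes act as gates; leaves act as experts for one category.
    A single logistic unit stands in for a trained neural network.
    """
    name: str
    weights: dict  # term -> weight (hypothetical, hand-set values)
    bias: float = 0.0
    children: list = field(default_factory=list)

    def score(self, doc: dict) -> float:
        z = self.bias + sum(w * doc.get(t, 0.0) for t, w in self.weights.items())
        return sigmoid(z)


def categorize(node: Node, doc: dict, threshold: float = 0.5) -> list:
    """Return the leaf categories reached when every node on the path fires."""
    if node.score(doc) < threshold:
        return []  # gate closed: the whole subtree is skipped
    if not node.children:
        return [node.name]  # leaf expert accepted the document
    found = []
    for child in node.children:
        found.extend(categorize(child, doc, threshold))
    return found


# Hypothetical two-level hierarchy; names and weights are illustrative only.
heart = Node("Heart Diseases", {"cardiac": 2.0, "infarction": 2.0}, bias=-1.0)
vascular = Node("Vascular Diseases", {"artery": 2.0, "hypertension": 2.0}, bias=-1.0)
root = Node(
    "Cardiovascular Diseases",
    {"cardiac": 1.5, "artery": 1.5, "hypertension": 1.5},
    bias=-1.0,
    children=[heart, vascular],
)

doc = {"cardiac": 1.0, "infarction": 1.0}  # toy term-weight vector
print(categorize(root, doc))
```

Because a closed gate prunes its entire subtree, each expert only has to separate documents that already reached its branch, which is the sense in which the hierarchy defines smaller categorization problems than a flat classifier would face.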

Keywords: automatic text categorization, applied neural networks, hierarchical classifiers

Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Miguel E. Ruiz (1)
  • Padmini Srinivasan (1)
  1. School of Library and Information Science, The University of Iowa, Iowa City, USA
