Information Retrieval, Volume 11, Issue 4, pp. 287–313

Boosting multi-label hierarchical text categorization

  • Andrea Esuli
  • Tiziano Fagni
  • Fabrizio Sebastiani


Hierarchical Text Categorization (HTC) is the task of generating (usually by means of supervised learning algorithms) text classifiers that operate on hierarchically structured classification schemes. Although most large-sized classification schemes for text have a hierarchical structure, text classification researchers have so far focused mostly on algorithms for “flat” classification, i.e. algorithms that operate on non-hierarchical classification schemes. When applied to a hierarchical classification problem, these algorithms cannot take advantage of the information inherent in the class hierarchy, and may thus be suboptimal in terms of efficiency and/or effectiveness. In this paper we propose TreeBoost.MH, a multi-label HTC algorithm that is a hierarchical variant of AdaBoost.MH, a well-known member of the family of “boosting” learning algorithms. TreeBoost.MH embodies several intuitions that had previously emerged within HTC, e.g. that both feature selection and the selection of negative training examples should be performed “locally”, i.e. by paying attention to the topology of the classification scheme. It also embodies the novel intuition that the weight distribution that boosting algorithms update at every boosting round should likewise be updated “locally”. TreeBoost.MH embodies all these intuitions in an elegant and simple way, i.e. it is defined as a recursive algorithm that uses AdaBoost.MH as its base step and that recurs over the tree structure. We present the results of experimenting with TreeBoost.MH on three HTC benchmarks, and analytically discuss its computational cost.
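To make the recursive structure described above concrete, here is a minimal Python sketch written under our own assumptions; it is not the authors' TreeBoost.MH implementation. The base step is a real-valued AdaBoost.MH in the style of Schapire and Singer, with one-term decision stumps. The smoothing constant `EPS`, the helper names (`adaboost_mh`, `score`, `tree_boost`, `classify`, `Node`, `subtree`), the toy hierarchy, and the top-down rule of descending into a child whenever its score is positive are all hypothetical. Locality appears in three places: each internal node gets its own weight distribution over (document, child) pairs, negatives are drawn only from documents in that node's subtree, and candidate terms are restricted to those occurring in the local training set (a crude stand-in for proper local feature selection, which is not reproduced here).

```python
import math
from dataclasses import dataclass, field

EPS = 1e-6  # hypothetical smoothing constant for the stump confidences

def adaboost_mh(docs, labels, n_labels, rounds, vocab):
    """Real-valued AdaBoost.MH with one-term stumps (sketch, not the paper's code).
    docs: list of term sets; labels: list of sets of label indices."""
    m = len(docs)
    y = [[1 if l in labels[i] else -1 for l in range(n_labels)] for i in range(m)]
    # One weight per (document, label) pair, initially uniform.
    D = [[1.0 / (m * n_labels)] * n_labels for _ in range(m)]
    model = []
    for _ in range(rounds):
        best = None
        for t in vocab:
            # W[j][l][s]: weight of pairs with term presence j and sign s (1 = positive).
            W = [[[0.0, 0.0] for _ in range(n_labels)] for _ in range(2)]
            for i in range(m):
                j = 1 if t in docs[i] else 0
                for l in range(n_labels):
                    W[j][l][1 if y[i][l] > 0 else 0] += D[i][l]
            Z = 2 * sum(math.sqrt(W[j][l][0] * W[j][l][1])
                        for j in range(2) for l in range(n_labels))
            if best is None or Z < best[0]:
                best = (Z, t, W)
        _, t, W = best
        c = [[0.5 * math.log((W[j][l][1] + EPS) / (W[j][l][0] + EPS))
              for l in range(n_labels)] for j in range(2)]
        model.append((t, c))
        # Reweight the (document, label) pairs, then renormalise to sum 1.
        for i in range(m):
            j = 1 if t in docs[i] else 0
            for l in range(n_labels):
                D[i][l] *= math.exp(-y[i][l] * c[j][l])
        total = sum(map(sum, D))
        D = [[w / total for w in row] for row in D]
    return model

def score(model, doc, l):
    """Sum of the stump outputs for label l on a term set."""
    return sum(c[1 if t in doc else 0][l] for t, c in model)

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)

def subtree(node):
    """Names of all categories in the subtree rooted at node."""
    names = {node.name}
    for ch in node.children:
        names |= subtree(ch)
    return names

def tree_boost(node, docs, cats, rounds, idx=None):
    """Train one local AdaBoost.MH per internal node, then recur on the children."""
    if idx is None:
        idx = list(range(len(docs)))  # the root sees every training document
    if not node.children or not idx:
        return {}
    kid_sets = [subtree(k) for k in node.children]
    local_docs = [docs[i] for i in idx]
    # Label k is positive iff the document belongs to child k's subtree;
    # documents outside this node's subtree are never used as negatives.
    local_labels = [{k for k, s in enumerate(kid_sets) if cats[i] & s} for i in idx]
    vocab = sorted(set().union(*local_docs))  # only terms seen locally
    models = {node.name: (adaboost_mh(local_docs, local_labels,
                                      len(node.children), rounds, vocab),
                          node.children)}
    for k, kid in enumerate(node.children):
        sub = [i for i in idx if cats[i] & kid_sets[k]]
        models.update(tree_boost(kid, docs, cats, rounds, sub))
    return models

def classify(models, node, doc):
    """Top-down (assumed rule): accept a child when its score is positive, then descend."""
    out = set()
    if node.name not in models:
        return out
    model, kids = models[node.name]
    for k, kid in enumerate(kids):
        if score(model, doc, k) > 0:
            out |= {kid.name} | classify(models, kid, doc)
    return out

# Hypothetical toy hierarchy and training data.
root = Node("root", [Node("Sports", [Node("Soccer"), Node("Tennis")]), Node("Politics")])
docs = [{"goal", "match", "ball"}, {"goal", "striker"}, {"racket", "match", "serve"},
        {"serve", "ace"}, {"vote", "election"}, {"vote", "senate"}]
cats = [{"Soccer"}, {"Soccer"}, {"Tennis"}, {"Tennis"}, {"Politics"}, {"Politics"}]
models = tree_boost(root, docs, cats, rounds=5)
print(sorted(classify(models, root, {"goal", "ball"})))  # ['Soccer', 'Sports']
```

Because each call to `adaboost_mh` sees only the documents of one subtree and only that node's children as labels, every per-node problem is smaller than the corresponding flat problem; this is one plausible reading of why local training can be cheaper, and the sketch makes no claim about the paper's own cost analysis.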


Keywords: Hierarchical text classification; Boosting



This work has been partially supported by Project “Networked Peers for Business” (NeP4B), funded by the Italian Ministry of University and Research (MIUR) under the “Fondo per gli Investimenti della Ricerca di Base” (FIRB) funding scheme.



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Andrea Esuli, Tiziano Fagni, and Fabrizio Sebastiani: Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy
