TreeBoost.MH: A Boosting Algorithm for Multi-label Hierarchical Text Categorization

  • Andrea Esuli
  • Tiziano Fagni
  • Fabrizio Sebastiani
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4209)

Abstract

In this paper we propose TreeBoost.MH, an algorithm for multi-label Hierarchical Text Categorization (HTC) that is a hierarchical variant of AdaBoost.MH. TreeBoost.MH embodies several intuitions that had previously arisen within HTC, e.g. the intuitions that both feature selection and the selection of negative training examples should be performed “locally”, i.e. by paying attention to the topology of the classification scheme. It also embodies the novel intuition that the weight distribution that boosting algorithms update at every boosting round should likewise be updated “locally”. We present the results of experimenting with TreeBoost.MH on two HTC benchmarks, and analytically discuss its computational cost.
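The abstract combines three ingredients: AdaBoost.MH as the base learner, feature selection and negative-example selection restricted to the classification scheme's topology, and a boosting weight distribution that is likewise kept local. The Python sketch below is one possible reading of those ideas, not the authors' TreeBoost.MH. It assumes real-valued AdaBoost.MH with one-term decision stumps (in the style of BoosTexter), one independent learner per internal node trained only on documents filed in that node's subtree, and plain document frequency as a stand-in feature-selection score. The function names (`adaboost_mh`, `train_tree`, `classify`), the `EPS` smoothing value and the toy `TREE`/`DOCS`/`CATS` data are all hypothetical.

```python
import math
from collections import Counter

EPS = 1e-6  # smoothing for the stump confidences (an assumed value)


def adaboost_mh(docs, labels, targets, vocab, rounds):
    """Real-valued AdaBoost.MH with one-term decision stumps.

    docs: list of term sets; targets: list of label sets, one per doc.
    Returns a list of (term, c) where c[j][l] is the confidence for
    label l when the term is absent (j=0) or present (j=1).
    """
    m, k = len(docs), len(labels)
    y = [[1 if lab in targets[i] else -1 for lab in labels] for i in range(m)]
    d = [[1.0 / (m * k)] * k for _ in range(m)]  # weights over (doc, label) pairs
    model = []
    for _ in range(rounds):
        best = None
        for term in vocab:
            # w[j][s][l]: weight in block j with sign s (0: positive, 1: negative)
            w = [[[0.0] * k for _ in range(2)] for _ in range(2)]
            for i, doc in enumerate(docs):
                j = 1 if term in doc else 0
                for l in range(k):
                    w[j][0 if y[i][l] > 0 else 1][l] += d[i][l]
            z = 2 * sum(math.sqrt(w[j][0][l] * w[j][1][l])
                        for j in range(2) for l in range(k))
            if best is None or z < best[0]:
                best = (z, term, w)
        _, term, w = best
        c = [[0.5 * math.log((w[j][0][l] + EPS) / (w[j][1][l] + EPS))
              for l in range(k)] for j in range(2)]
        model.append((term, c))
        # Boosting update: raise weights of pairs the stump gets wrong, then normalise.
        for i, doc in enumerate(docs):
            j = 1 if term in doc else 0
            for l in range(k):
                d[i][l] *= math.exp(-y[i][l] * c[j][l])
        total = sum(map(sum, d))
        d = [[v / total for v in row] for row in d]
    return model


def subtree(tree, node):
    """All nodes in the subtree rooted at node; tree maps node -> children."""
    out = {node}
    for ch in tree.get(node, []):
        out |= subtree(tree, ch)
    return out


def train_tree(tree, docs, cats, rounds=10, n_feats=50):
    """Train one independent AdaBoost.MH learner per internal node (hypothetical)."""
    models = {}
    for node, children in tree.items():
        # Local training set: only documents filed somewhere in node's subtree,
        # so the negatives for a child come from its siblings' documents.
        idx = [i for i, c in enumerate(cats) if c & subtree(tree, node)]
        local_docs = [docs[i] for i in idx]
        local_t = [{ch for ch in children if cats[i] & subtree(tree, ch)} for i in idx]
        # Local feature selection; document frequency is only a stand-in score.
        df = Counter(t for doc in local_docs for t in doc)
        vocab = [t for t, _ in df.most_common(n_feats)]
        if children and vocab:
            # Each call keeps its own weight distribution over local examples only.
            models[node] = (children, adaboost_mh(local_docs, children, local_t, vocab, rounds))
    return models


def classify(models, doc, node):
    """Descend top-down into every child whose local score is positive."""
    if node not in models:
        return set()
    children, model = models[node]
    out = set()
    for l, ch in enumerate(children):
        score = sum(c[1 if term in doc else 0][l] for term, c in model)
        if score > 0:
            out |= {ch} | classify(models, doc, ch)
    return out


# Hypothetical toy taxonomy and corpus, for illustration only.
TREE = {"root": ["sport", "politics"], "sport": ["football", "tennis"]}
DOCS = [{"goal", "match", "striker"}, {"goal", "penalty", "match"},
        {"serve", "match", "racket"}, {"serve", "ace", "racket"},
        {"vote", "election", "party"}, {"vote", "minister", "party"}]
CATS = [{"football"}, {"football"}, {"tennis"}, {"tennis"}, {"politics"}, {"politics"}]

if __name__ == "__main__":
    models = train_tree(TREE, DOCS, CATS, rounds=5)
    print(classify(models, {"goal", "match"}, "root"))
```

In this sketch each internal node owns its learner, so the weight vector `d` inside each `adaboost_mh` call ranges only over that node's local (document, child) pairs: mistakes made at the root never reweight the examples the `sport` node uses to separate football from tennis. Classification is greedy and top-down, so a document reaches a node only if every ancestor on the path scored it positively.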

Keywords

Feature Selection · Internal Node · Weak Learner · Positive Training · Negative Training


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Andrea Esuli (1)
  • Tiziano Fagni (1)
  • Fabrizio Sebastiani (1)

  1. Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy
