Learning to Rank from Structures in Hierarchical Text Classification

  • Qi Ju
  • Alessandro Moschitti
  • Richard Johansson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7814)


In this paper, we model learning-to-rank algorithms based on structural dependencies in hierarchical multi-label text categorization (TC). Our method uses the classification probabilities of the binary classifiers of a standard top-down approach to generate k-best hypotheses. These hypotheses are generated according to their global probability while at the same time satisfying the structural constraints between father and children nodes. The ranking is then refined using Support Vector Machines and tree kernels applied to a structural representation of the hypotheses, i.e., a hierarchy tree in which the outcome of the binary one-vs-all classifiers is marked directly in its nodes. Our extensive experiments on the whole Reuters Corpus Volume 1 show that our models significantly improve over the state of the art in TC, thanks to the use of structural dependencies.
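
The hypothesis-generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the global probability of a hypothesis is the product of independent per-node probabilities, that the structural constraint means a child category can be active only if its father is active, and it enumerates all assignments by brute force, which is feasible only for toy hierarchies. The function name `k_best_hypotheses`, the category names, and the probabilities are hypothetical.

```python
from itertools import product
from math import prod


def k_best_hypotheses(parent, prob, k):
    """Return the k highest-scoring label assignments that respect the hierarchy.

    parent: maps each category to its father category (None for top-level nodes).
    prob:   maps each category to the binary classifier's probability of "active".
    """
    nodes = sorted(prob)
    scored = []
    for bits in product([0, 1], repeat=len(nodes)):
        assign = dict(zip(nodes, bits))
        # Assumed constraint: an active child requires an active father.
        violates = any(
            assign[n] and parent[n] is not None and not assign[parent[n]]
            for n in nodes
        )
        if violates:
            continue
        # Assumed global probability: product of independent node probabilities.
        score = prod(prob[n] if assign[n] else 1.0 - prob[n] for n in nodes)
        scored.append((score, assign))
    # Sort by score only, so that dictionaries are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy hierarchy with made-up classifier outputs.
parent = {"A": None, "A1": "A", "A2": "A", "B": None}
prob = {"A": 0.9, "A1": 0.7, "A2": 0.2, "B": 0.4}

hyps = k_best_hypotheses(parent, prob, k=3)
for score, assign in hyps:
    print(round(score, 4), assign)
```

In the approach the abstract describes, a list of this kind would then be passed to a reranker (Support Vector Machines with tree kernels over the marked hierarchy tree), which this sketch does not cover.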


Support Vector Machine, Text Categorization, Good Hypothesis, Category Assignment, Tree Kernel
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Qi Ju (1)
  • Alessandro Moschitti (1)
  • Richard Johansson (2)
  1. DISI, University of Trento, Italy
  2. Department of Swedish, University of Gothenburg, Sweden
