Training Classifiers for Tree-structured Categories with Partially Labeled Data

  • M. Ortega-Moral
  • D. Gutiérrez-González
  • M. L. De-Pablo
  • J. Cid-Sueiro


In this paper we propose a new method for training classifiers for multi-class problems in which classes are not necessarily mutually exclusive and may be related through a probabilistic tree structure. The method is based on a Bayesian model relating network parameters, feature vectors and categories, and learning is formulated as maximum likelihood estimation of the classifier parameters. The proposed algorithm is especially suited to situations where each training sample is labeled with respect to only one, or only some, of the categories in the tree. Our experiments on information retrieval scenarios show the advantages of the proposed method.
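The abstract does not spell out the model's equations. Purely as an illustration, here is a minimal sketch of one simple way a probabilistic category tree can use partially labeled data. It assumes that a category can be true only if its parent is true, that P(node | parent, x) = sigmoid(w_node · x), and that sibling categories are not mutually exclusive. Under these assumptions, a sample labeled with respect to a single node contributes only the marginal P(node = y | x) to the likelihood. The tree `PARENT`, the sigmoid parameterization, the functions `node_prob`, `log_likelihood` and `train`, the batch gradient-ascent optimizer and the toy data are all hypothetical and are not the authors' network or training procedure.

```python
import math

# Hypothetical category tree (child -> parent); "root" stands for "any document".
PARENT = {"sports": "root", "football": "sports", "tennis": "sports", "politics": "root"}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def path_to_root(node):
    """Nodes from `node` up to, but excluding, the root."""
    path = []
    while node != "root":
        path.append(node)
        node = PARENT[node]
    return path


def node_prob(node, x, weights):
    """Marginal P(node = 1 | x) under the assumed model: a node can be true only
    if its parent is true, with P(node | parent, x) = sigmoid(w_node . x)."""
    p = 1.0
    for k in path_to_root(node):
        p *= sigmoid(sum(w * xi for w, xi in zip(weights[k], x)))
    return p


def log_likelihood(samples, weights):
    """Each sample (x, node, y) is labeled w.r.t. a single node only; all other
    categories are marginalized out, so only P(node = y | x) enters."""
    ll = 0.0
    for x, node, y in samples:
        p = node_prob(node, x, weights)
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll


def train(samples, dim, lr=0.5, epochs=200):
    """Maximum likelihood via plain batch gradient ascent (an illustrative choice)."""
    weights = {k: [0.0] * dim for k in PARENT}
    for _ in range(epochs):
        grad = {k: [0.0] * dim for k in PARENT}
        for x, node, y in samples:
            p = node_prob(node, x, weights)
            for k in path_to_root(node):
                s = sigmoid(sum(w * xi for w, xi in zip(weights[k], x)))
                # d log p / d w_k = (1 - s) x ; d log(1 - p) / d w_k = -p (1 - s) x / (1 - p)
                coef = (1.0 - s) if y == 1 else -p * (1.0 - s) / (1.0 - p)
                for i, xi in enumerate(x):
                    grad[k][i] += coef * xi
        for k in PARENT:
            weights[k] = [w + lr * g for w, g in zip(weights[k], grad[k])]
    return weights


# Toy data: x = [bias, feature_1, feature_2]; each sample is labeled for one node only.
samples = [
    ([1.0, 0.9, 0.1], "football", 1),
    ([1.0, 0.8, 0.0], "sports", 1),
    ([1.0, 0.0, 1.0], "sports", 0),
    ([1.0, 0.1, 0.9], "politics", 1),
    ([1.0, 0.9, 0.0], "politics", 0),
]
w0 = {k: [0.0] * 3 for k in PARENT}
w = train(samples, dim=3)
print(log_likelihood(samples, w0), log_likelihood(samples, w))
```

In this sketch a document labeled only as "football" also updates the "sports" weights, because both lie on the same root path; this is one way partial labels can share information across the tree, stated here as an assumption rather than as the paper's design.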


Keywords: training classifier, Bayesian model, probabilistic tree structure



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • M. Ortega-Moral (1)
  • D. Gutiérrez-González (1)
  • M. L. De-Pablo (1)
  • J. Cid-Sueiro (1)
  1. Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Leganés-Madrid, Spain
