Hierarchical Multi-classification with Predictive Clustering Trees in Functional Genomics

  • Jan Struyf
  • Sašo Džeroski
  • Hendrik Blockeel
  • Amanda Clare
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3808)


This paper investigates how predictive clustering trees can be used to predict gene function in the genome of the yeast Saccharomyces cerevisiae. We consider the MIPS FunCat classification scheme, in which each gene is annotated with one or more classes selected from a given functional class hierarchy. This setting presents two important challenges to machine learning: (1) each instance is labeled with a set of classes instead of just one class, and (2) the classes are structured in a hierarchy; ideally the learning algorithm should also take this hierarchical information into account. Predictive clustering trees generalize decision trees and can be applied to a wide range of prediction tasks by plugging in a suitable distance metric. We define an appropriate distance metric for hierarchical multi-classification and present experiments evaluating this approach on a number of data sets that are available for yeast.
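
The abstract does not spell out the distance metric, so the following is only a minimal sketch of one plausible form it could take. It assumes a weighted Euclidean distance over binary class vectors, with a weight that shrinks with class depth so that disagreements near the top of the hierarchy cost more. The toy hierarchy, the names `PARENT`, `to_vector` and `hmc_distance`, and the decay parameter `w0` are all hypothetical and not taken from the paper.

```python
import math

# Hypothetical toy hierarchy (not the real FunCat scheme):
# class id -> parent id, with None marking a top-level class.
PARENT = {"1": None, "1/1": "1", "1/2": "1", "2": None, "2/1": "2"}
CLASSES = sorted(PARENT)


def depth(c):
    """Number of edges between class c and its top-level ancestor."""
    d = 0
    while PARENT[c] is not None:
        c = PARENT[c]
        d += 1
    return d


def with_ancestors(labels):
    """Close a label set upward: a gene in class 1/1 is also in class 1."""
    closed = set()
    for c in labels:
        while c is not None:
            closed.add(c)
            c = PARENT[c]
    return closed


def to_vector(labels):
    """Encode a label set as a 0/1 vector indexed by CLASSES."""
    closed = with_ancestors(labels)
    return [1.0 if c in closed else 0.0 for c in CLASSES]


def hmc_distance(a, b, w0=0.75):
    """Weighted Euclidean distance; the weight w0**depth is an assumption."""
    va, vb = to_vector(a), to_vector(b)
    total = sum(
        (w0 ** depth(c)) * (x - y) ** 2 for c, x, y in zip(CLASSES, va, vb)
    )
    return math.sqrt(total)


d_same = hmc_distance({"1/1"}, {"1/1"})
d_sibling = hmc_distance({"1/1"}, {"1/2"})
d_far = hmc_distance({"1/1"}, {"2/1"})
```

Under these assumptions, two genes annotated with sibling classes (`1/1` and `1/2`) come out closer than two genes in different top-level branches (`1/1` and `2/1`), which is the kind of behaviour a hierarchy-aware metric is meant to provide. A tree learner could then pick splits that minimize the within-cluster variance induced by this distance.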


Keywords (machine-generated, not supplied by the authors): Average Precision; Yeast Gene; Prediction Task; Multitask Learning; Hierarchical Information





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Jan Struyf (1)
  • Sašo Džeroski (2)
  • Hendrik Blockeel (1)
  • Amanda Clare (3)

  1. Dept. of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
  2. Dept. of Knowledge Technologies, Jozef Stefan Institute, Ljubljana, Slovenia
  3. Dept. of Computer Science, The University of Wales, Aberystwyth, Penglais, Aberystwyth, Ceredigion, UK
