True Path Rule Hierarchical Ensembles

  • Giorgio Valentini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5519)

Abstract

Hierarchical classification problems have gained increasing attention within the machine learning community, and several methods for hierarchically structured taxonomies have recently been proposed, with applications ranging from the classification of web documents to bioinformatics. In this paper we propose a novel ensemble algorithm for multilabel, multi-path, tree-structured hierarchical classification problems, based on the true path rule borrowed from the Gene Ontology. Local base classifiers, each specialized to recognize a single class of the hierarchy, exchange information to reach a global “consensus” ensemble decision. A two-way asymmetric flow of information crosses the tree-structured ensemble: positive predictions for a node influence its ancestors, while negative predictions influence its offspring. The resulting True Path Rule hierarchical ensemble is applied to the prediction of gene function in yeast, using the FunCat taxonomy and biomolecular data obtained from high-throughput biotechnologies.
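The abstract describes the two-way flow of information only informally. The Python sketch below shows one plausible way such a scheme could work on a tree of per-class scores; it is an illustration under stated assumptions, not the paper's exact algorithm. It assumes that each node already carries a score in [0, 1] from its local base classifier, that a single decision threshold is shared by all nodes (the hypothetical `threshold`, default 0.5), and that the names `tpr_ensemble`, `children` and `scores` are purely illustrative. In the bottom-up pass a node's score is averaged with the consensus scores of its positively predicted children, so positive evidence moves toward the ancestors. In the top-down pass a node judged negative caps the scores of its descendants, so the output never labels a class positive while labelling its parent negative, which is the consistency the true path rule demands.

```python
from typing import Dict, List


def tpr_ensemble(
    children: Dict[str, List[str]],
    root: str,
    scores: Dict[str, float],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Hypothetical sketch of a two-way (bottom-up, then top-down) ensemble
    over a tree of per-class scores; not the paper's exact algorithm."""
    consensus: Dict[str, float] = {}

    def bottom_up(node: str) -> None:
        # Visit children first (post-order) so their consensus is known.
        kids = children.get(node, [])
        for child in kids:
            bottom_up(child)
        # Positive predictions flow upward: average the node's own score
        # with the consensus of its positively predicted children only.
        positive = [consensus[c] for c in kids if consensus[c] > threshold]
        consensus[node] = (scores[node] + sum(positive)) / (1 + len(positive))

    def top_down(node: str) -> None:
        # Negative predictions flow downward: a node at or below the
        # threshold caps the scores of its children (and, recursively,
        # of all its descendants).
        for child in children.get(node, []):
            if consensus[node] <= threshold:
                consensus[child] = min(consensus[child], consensus[node])
            top_down(child)

    bottom_up(root)
    top_down(root)
    return consensus


if __name__ == "__main__":
    # Toy tree and made-up base-classifier scores, for illustration only.
    tree = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
    base = {"root": 0.9, "A": 0.4, "A1": 0.8, "A2": 0.3, "B": 0.2, "B1": 0.7}
    result = tpr_ensemble(tree, "root", base)
    print({node: round(score, 2) for node, score in result.items()})
```

On the toy example, node A (own score 0.4) is lifted to about 0.6 by its positive child A1, whereas node B (own score 0.2) ends at about 0.45 even with its positive child B1. Because B is now negative, B1 is pulled down from 0.7 to 0.45 in the top-down pass. Capping descendants with `min`, rather than simply copying the parent's value, is a design choice of this sketch: it restores consistency without ever raising a child's score.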

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Giorgio Valentini, DSI, Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano, Milano, Italy