Prediction of Human Gene - Phenotype Associations by Exploiting the Hierarchical Structure of the Human Phenotype Ontology

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9043)

Abstract

The Human Phenotype Ontology (HPO) provides a conceptualization of phenotype information and a tool for the computational analysis of human diseases. It covers a wide range of phenotypic abnormalities encountered in human diseases, and its terms (classes) are structured as a directed acyclic graph (DAG). In this context, predicting the phenotypic abnormalities associated with human genes is a key tool for stratifying patients into disease subclasses that share a common biological or pathophysiological basis. Methods are being developed to predict the HPO terms associated with a given disease or disease gene, but most of them adopt a simple "flat" approach: they do not take into account the hierarchical relationships of the HPO, thus losing important a priori information about HPO terms. In this contribution we propose a novel Hierarchical Top-Down (HTD) algorithm that associates a specific learner with each HPO term and then corrects the predictions according to the hierarchical structure of the underlying DAG. Genome-wide experimental results on a complex HPO DAG including more than 4000 HPO terms show that the proposed hierarchy-aware approach significantly improves on the predictions obtained with flat methods, especially in terms of precision/recall results.
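
The abstract does not spell out the correction rule, so the following is only a minimal sketch of one plausible top-down correction, not the authors' implementation. It assumes that each term already has a "flat" score from its own learner, and that the correction caps a term's score at the minimum corrected score of its parents, visiting parents before children. The function name `htd_correct`, the toy term names, and the scores are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def htd_correct(parents, flat_scores):
    """Cap each term's flat score by the lowest corrected score among its parents.

    parents:     dict mapping a term to the list of its parent terms (a DAG)
    flat_scores: dict mapping every term to the score of its own learner
    Assumption: this capping rule is an illustrative choice, not taken from the paper.
    """
    # TopologicalSorter expects node -> predecessors, so parents are emitted first.
    order = TopologicalSorter(parents).static_order()
    corrected = {}
    for term in order:
        score = flat_scores[term]
        term_parents = parents.get(term, ())
        if term_parents:
            # A child may never score higher than its least likely parent.
            score = min(score, min(corrected[p] for p in term_parents))
        corrected[term] = score
    return corrected


# Hypothetical toy DAG: term "C" has two parents, "A" and "B".
toy_parents = {"A": ["root"], "B": ["root"], "C": ["A", "B"]}
toy_scores = {"root": 0.9, "A": 0.4, "B": 0.8, "C": 0.7}

corrected = htd_correct(toy_parents, toy_scores)
print(corrected["C"])  # 0.4: capped by parent "A"
```

Under this assumed rule the corrected scores never increase along any path from the root to a leaf, which is the kind of hierarchical consistency that a flat approach cannot guarantee.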




Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Valentini, G., Köhler, S., Re, M., Notaro, M., Robinson, P.N. (2015). Prediction of Human Gene - Phenotype Associations by Exploiting the Hierarchical Structure of the Human Phenotype Ontology. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2015. Lecture Notes in Computer Science, vol 9043. Springer, Cham. https://doi.org/10.1007/978-3-319-16483-0_7

  • DOI: https://doi.org/10.1007/978-3-319-16483-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16482-3

  • Online ISBN: 978-3-319-16483-0

  • eBook Packages: Computer Science (R0)
