Prediction of Human Gene - Phenotype Associations by Exploiting the Hierarchical Structure of the Human Phenotype Ontology

Valentini, Giorgio; Köhler, Sebastian; Re, Matteo; Notaro, Marco; Robinson, Peter N.

doi:10.1007/978-3-319-16483-0_7

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9043)

Abstract

The Human Phenotype Ontology (HPO) provides a conceptualization of phenotype information and a tool for the computational analysis of human diseases. It covers a wide range of phenotypic abnormalities encountered in human diseases, and its terms (classes) are structured as a directed acyclic graph (DAG). In this context, predicting the phenotypic abnormalities associated with human genes is a key tool for stratifying patients into disease subclasses that share a common biological or pathophysiological basis. Methods are being developed to predict the HPO terms associated with a given disease or disease gene, but most of them adopt a simple "flat" approach: they do not take into account the hierarchical relationships of the HPO, thus losing important a priori information about HPO terms. In this contribution we propose a novel Hierarchical Top-Down (HTD) algorithm that associates a specific learner with each HPO term and then corrects the predictions according to the hierarchical structure of the underlying DAG. Genome-wide experimental results on a complex HPO DAG including more than 4000 HPO terms show that the proposed hierarchy-aware approach significantly improves on the predictions obtained with flat methods, especially in terms of precision/recall results.
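
The abstract does not spell out the correction rule, so the following is only a minimal sketch of one common way to make per-term scores consistent with a DAG: visit the terms from the root downwards and cap each term's score at the lowest corrected score among its parents. The function name `top_down_correct`, the term labels, and the scores are illustrative assumptions, not taken from the paper, and the authors' HTD algorithm may differ in its details.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def top_down_correct(flat_scores, parents):
    """Cap each term's score at the minimum corrected score of its parents.

    flat_scores: dict mapping term -> score from that term's own "flat" learner
    parents:     dict mapping term -> list of parent terms in the DAG
    Assumes every term mentioned in `parents` also has a flat score.
    """
    # TopologicalSorter takes node -> predecessors, so parents come out first.
    graph = {term: parents.get(term, ()) for term in flat_scores}
    corrected = {}
    for term in TopologicalSorter(graph).static_order():
        term_parents = parents.get(term, ())
        if not term_parents:
            # Root term: nothing above it, keep the flat score.
            corrected[term] = flat_scores[term]
        else:
            # A child may not score higher than any of its parents.
            cap = min(corrected[p] for p in term_parents)
            corrected[term] = min(flat_scores[term], cap)
    return corrected


# Hypothetical toy DAG: root -> A, root -> B, and C has two parents (A and B).
parents = {"A": ["root"], "B": ["root"], "C": ["A", "B"]}
flat = {"root": 0.9, "A": 0.4, "B": 0.7, "C": 0.8}

corrected = top_down_correct(flat, parents)
print(corrected["C"])  # the flat score 0.8 is capped by parent A's 0.4
```

In this toy example the flat learner for C is more confident than the learner for its parent A, which is inconsistent with the hierarchy; the top-down pass removes the inconsistency while leaving already consistent scores unchanged.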

Author information

Authors and Affiliations

AnacletoLab - DI, Dipartimento di Informatica, Università degli Studi di Milano, Italy
Giorgio Valentini & Matteo Re
Institut für Medizinische Genetik und Humangenetik, Charité - Universitätsmedizin Berlin, Germany
Sebastian Köhler & Peter N. Robinson
Dipartimento di Bioscienze, Università degli Studi di Milano, Italy
Marco Notaro
Institute of Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Germany
Peter N. Robinson

Editor information

Editors and Affiliations

Dpto. de Arquitectura y Tecnología de Computadores (ATC), E.T.S. de Ingenierías en Informática y Telecomunicación, CITIC-UGR, Universidad de Granada, c/ Periodista Daniel Saucedo Aranda s/n, 18071, Granada, Spain
Francisco Ortuño
E.T.S. Ingenierías Informática y de Telecomunicación, Dpto. Arquitectura y Tecnología de Computadores, CITIC-UGR, Universidad de Granada, C Periodista Rafael Gómez Montero, 18071, Granada, Spain
Ignacio Rojas

About this paper

Cite this paper

Valentini, G., Köhler, S., Re, M., Notaro, M., Robinson, P.N. (2015). Prediction of Human Gene - Phenotype Associations by Exploiting the Hierarchical Structure of the Human Phenotype Ontology. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2015. Lecture Notes in Computer Science, vol 9043. Springer, Cham. https://doi.org/10.1007/978-3-319-16483-0_7

DOI: https://doi.org/10.1007/978-3-319-16483-0_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16482-3
Online ISBN: 978-3-319-16483-0
eBook Packages: Computer Science, Computer Science (R0)
