Abstract
The Human Phenotype Ontology (HPO) provides a standard categorization of the phenotypic abnormalities encountered in human diseases and of the semantic relationship between them. Quite surprisingly the problem of the automated prediction of the association between genes and abnormal human phenotypes has been widely overlooked, even if this issue represents an important step toward the characterization of gene-disease associations, especially when no or very limited knowledge is available about the genetic etiology of the disease under study. We present a novel ensemble method able to capture the hierarchical relationships between HPO terms, and able to improve existing hierarchical ensemble algorithms by explicitly considering the predictions of the descendant terms of the ontology. In this way the algorithm exploits the information embedded in the most specific ontology terms that closely characterize the phenotypic information associated with each human gene. Genome-wide results obtained by integrating multiple sources of information show the effectiveness of the proposed approach.
Keywords
- Human Phenotype Ontology
- Hierarchical multi-label classification
- Hierarchical ensemble methods
- Gene-abnormal phenotype prediction
This is a preview of subscription content, access via your institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Amberger, J., Bocchini, C., Amosh, A.: A new face and new challenges for online mendelian inheritance in man (OMIM). Hum. Mutat. 32, 564–7 (2011)
Ashburner, M., et al.: Creating the gene ontology resource: design and implementation. Genome Res. 11(8), 1425–1433 (2001)
Bolstad, B.M., Irizarry, R.A., Astrand, M., Speed, T.P.: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185–193 (2003)
Chatr-Aryamontri, A., et al.: The BioGRID interaction database: 2013 update. Nucleic Acids Res. 41, 816–823 (2013)
Cormen, T., Leiserson, C., Rivest, R.L., Stein, S.: Introduction to Algorithms. MIT Press, Boston (2009)
Franceschini, A., et al.: STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 41, 808–815 (2013)
Goldstein, B., Polley, E., Briggs, F.: Random forests for genetic association studies. Stat. Appl. Genet. Mol. Biol. 10(1) (2011). https://doi.org/10.2202/1544-6115.1691
Jiang, Y., et al.: An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biol. 17, 184 (2016)
Kohler, S., Vasilevsky, N., Engelstad, M., et al.: The human phenotype ontology in 2017. Nucleic Acids Res. 45, D865 (2017)
Moreau, Y., Tranchevent, L.: Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nature Rev. Genet. 13, 523–536 (2012)
Notaro, M., Schubach, M., Robinson, P.N., Valentini, G.: Prediction of human phenotype ontology terms by means of hierarchical ensemble methods. BMC Bioinform. 18(1), 449:1–449:18 (2017). http://dblp.uni-trier.de/db/journals/bmcbi/bmcbi18.html#NotaroSRV17
Re, M., Mesiti, M., Valentini, G.: A fast ranking algorithm for predicting gene functions in biomolecular networks. IEEE/ACM Trans. Comput. Biol. Bioinf. 9, 1812–1818 (2012)
Robinson, P.N., Frasca, M., Köhler, S., Notaro, M., Re, M., Valentini, G.: A hierarchical ensemble method for DAG-structured taxonomies. In: Schwenker, F., Roli, F., Kittler, J. (eds.) MCS 2015. LNCS, vol. 9132, pp. 15–26. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-20248-8_2
Saito, T., Rehmsmeier, M.: The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS ONE 10, 1–21 (2015)
Schubach, M., Re, M., Robinson, P., Valentini, G.: Imbalance-aware machine learning for predicting rare and common disease-associated non-coding variants. Sci. Rep. 7(2959) (2017). https://doi.org/10.1038/s41598-017-03011-5
Smedley, D., et al.: A whole-genome analysis framework for effective identification of pathogenic regulatory variants in Mendelian disease. Am. J. Hum. Genet. 99, 595–606 (2016)
Valentini, G.: True Path Rule hierarchical ensembles for genome-wide gene function prediction. IEEE/ACM Trans. Comput. Biol. Bioinf. 8, 832–847 (2011)
Valentini, G., Armano, G., Frasca, M., Lin, J., Mesiti, M., Re, M.: RANKS: a flexible tool for node label ranking and classification in biological networks. Bioinformatics 32, 2872 (2016)
Valentini, G., Köhler, S., Re, M., Notaro, M., Robinson, P.N.: Prediction of human gene - phenotype associations by exploiting the hierarchical structure of the human phenotype ontology. In: Ortuño, F., Rojas, I. (eds.) IWBBIO 2015. LNCS, vol. 9043, pp. 66–77. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16483-0_7
Valentini, G., Paccanaro, A., Caniza, H., Romero, A., Re, M.: An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods. Artif. Intell. Med. 61, 63–78 (2014)
Wang, P., et al.: Inference of gene-phenotype associations via protein-protein interaction and orthology. PLoS ONE 8, 1–8 (2013)
Zemojtel, T., et al.: Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome. Sci. Transl. Med. 6, 252ra123 (2014)
Acknowledgments
We acknowledge partial support from the project “Discovering Patterns in Multi-Dimensional Data” (2016–2017) funded by Università degli Studi di Milano.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Notaro, M., Schubach, M., Frasca, M., Mesiti, M., Robinson, P.N., Valentini, G. (2019). Ensembling Descendant Term Classifiers to Improve Gene - Abnormal Phenotype Predictions. In: Bartoletti, M., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2017. Lecture Notes in Computer Science(), vol 10834. Springer, Cham. https://doi.org/10.1007/978-3-030-14160-8_8
Download citation
DOI: https://doi.org/10.1007/978-3-030-14160-8_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14159-2
Online ISBN: 978-3-030-14160-8
eBook Packages: Computer ScienceComputer Science (R0)