Ensembling Descendant Term Classifiers to Improve Gene - Abnormal Phenotype Predictions

Part of the Lecture Notes in Computer Science book series (LNBI, volume 10834)

Abstract

The Human Phenotype Ontology (HPO) provides a standard categorization of the phenotypic abnormalities encountered in human diseases and of the semantic relationships between them. Surprisingly, the automated prediction of associations between genes and abnormal human phenotypes has been widely overlooked, even though it is an important step toward characterizing gene-disease associations, especially when little or nothing is known about the genetic etiology of the disease under study. We present a novel ensemble method that captures the hierarchical relationships between HPO terms and improves on existing hierarchical ensemble algorithms by explicitly considering the predictions for the descendant terms in the ontology. In this way the algorithm exploits the information embedded in the most specific ontology terms, which closely characterize the phenotypic information associated with each human gene. Genome-wide results obtained by integrating multiple sources of information show the effectiveness of the proposed approach.
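
The general idea of letting descendant-term predictions inform an ancestor term can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes a toy ontology, per-term "flat" classifier scores in [0, 1], a simple average over descendants whose score exceeds a threshold, and a top-down step that caps each term by its parents so that scores stay hierarchically consistent. All names (`descendants`, `descendant_ensemble`, `flat_scores`, `threshold`) are hypothetical.

```python
from typing import Dict, List, Set


def descendants(children: Dict[str, List[str]], node: str) -> Set[str]:
    """Return every term reachable from `node` by following child edges."""
    seen: Set[str] = set()
    stack = list(children.get(node, []))
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(children.get(term, []))
    return seen


def descendant_ensemble(
    flat_scores: Dict[str, float],
    children: Dict[str, List[str]],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Combine each term's score with its positively predicted descendants.

    Illustrative assumption: a plain average in the bottom-up step and a
    min-over-parents cap in the top-down step.
    """
    # Bottom-up step: average the term's own score with the scores of
    # descendants predicted as positive (score above the threshold).
    bottom_up: Dict[str, float] = {}
    for term, score in flat_scores.items():
        positive = [
            flat_scores[d]
            for d in descendants(children, term)
            if flat_scores[d] > threshold
        ]
        bottom_up[term] = (score + sum(positive)) / (1 + len(positive))

    # Build a parent lookup from the child lists.
    parents: Dict[str, List[str]] = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)

    # Top-down step: a term may not score higher than any of its parents.
    final: Dict[str, float] = {}

    def resolve(term: str) -> float:
        if term not in final:
            cap = min(
                (resolve(p) for p in parents.get(term, [])),
                default=float("inf"),
            )
            final[term] = min(bottom_up[term], cap)
        return final[term]

    for term in flat_scores:
        resolve(term)
    return final


# Toy DAG: "c" is the most specific term and has two parents.
toy_children = {"root": ["a", "b"], "a": ["c"], "b": ["c"]}
toy_scores = {"root": 0.4, "a": 0.3, "b": 0.6, "c": 0.9}
print(descendant_ensemble(toy_scores, toy_children))
```

In the toy example, the confident prediction for the specific term `c` raises the scores of its ancestors `a` and `root`, which a flat classifier had scored below the threshold. The top-down cap then guarantees that no term scores higher than any of its parents.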

Keywords

  • Human Phenotype Ontology
  • Hierarchical multi-label classification
  • Hierarchical ensemble methods
  • Gene-abnormal phenotype prediction

Acknowledgments

We acknowledge partial support from the project “Discovering Patterns in Multi-Dimensional Data” (2016–2017) funded by Università degli Studi di Milano.

Corresponding author

Correspondence to Giorgio Valentini.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Notaro, M., Schubach, M., Frasca, M., Mesiti, M., Robinson, P.N., Valentini, G. (2019). Ensembling Descendant Term Classifiers to Improve Gene - Abnormal Phenotype Predictions. In: Bartoletti, M., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2017. Lecture Notes in Computer Science, vol 10834. Springer, Cham. https://doi.org/10.1007/978-3-030-14160-8_8

  • DOI: https://doi.org/10.1007/978-3-030-14160-8_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14159-2

  • Online ISBN: 978-3-030-14160-8

  • eBook Packages: Computer Science, Computer Science (R0)