Disease–Genes Must Guide Data Source Integration in the Gene Prioritization Process

Part of the Lecture Notes in Computer Science book series (LNBI, volume 10834)

Abstract

One of the main issues in detecting the genes involved in the etiology of human genetic diseases is the integration of the different types of available functional relationships between genes. Numerous approaches have exploited the complementary evidence encoded in heterogeneous data sources, such as functional profiles or expression quantitative trait loci, to prioritize disease-genes, but none of them, to our knowledge, has treated the scarcity of known disease-genes as a feature of its integration methodology. Nevertheless, in contexts where data are unbalanced, that is, where one class is largely under-represented, imbalance-unaware approaches may suffer a strong decrease in performance. We claim that imbalance-aware integration is a key requirement for boosting the performance of gene prioritization (GP) methods. To support our claim, we propose an imbalance-aware integration algorithm for the GP problem and compare it on benchmark data with other state-of-the-art integration methodologies.
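
As a rough illustration of what letting the scarce disease-genes guide integration could look like, the following Python sketch weights each gene network by how much more strongly the known positives are connected to each other than to the remaining genes, and then combines the networks with those weights. This is not the algorithm proposed in the paper: the scoring rule, the function names `network_score` and `integrate`, and the toy matrices are all assumptions made for illustration only.

```python
def network_score(W, positives):
    """Hypothetical imbalance-aware score for one network.

    W is a symmetric adjacency matrix (list of lists); positives holds the
    indices of the few known disease-genes. The score looks only at edges
    that touch a positive, so the large negative class cannot dominate it.
    """
    n = len(W)
    pos = set(positives)
    # Edge weights between pairs of known positives.
    pp = [W[i][j] for i in pos for j in pos if i != j]
    # Edge weights between a positive and any other gene.
    pn = [W[i][j] for i in pos for j in range(n) if j not in pos]
    mean_pp = sum(pp) / len(pp) if pp else 0.0
    mean_pn = sum(pn) / len(pn) if pn else 0.0
    return max(mean_pp - mean_pn, 0.0)


def integrate(networks, positives):
    """Weighted sum of networks, with weights proportional to their scores."""
    scores = [network_score(W, positives) for W in networks]
    total = sum(scores)
    if total == 0:
        # No network separates the positives: fall back to uniform weights.
        weights = [1.0 / len(networks)] * len(networks)
    else:
        weights = [s / total for s in scores]
    n = len(networks[0])
    combined = [
        [sum(w * W[i][j] for w, W in zip(weights, networks)) for j in range(n)]
        for i in range(n)
    ]
    return weights, combined


# Toy data (assumed): 4 genes, of which genes 0 and 1 are known disease-genes.
W_a = [  # links the two positives strongly, everything else weakly
    [0.0, 1.0, 0.1, 0.1],
    [1.0, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
W_b = [  # uniform connectivity: carries no information about the positives
    [0.0, 0.5, 0.5, 0.5],
    [0.5, 0.0, 0.5, 0.5],
    [0.5, 0.5, 0.0, 0.5],
    [0.5, 0.5, 0.5, 0.0],
]
weights, combined = integrate([W_a, W_b], positives=[0, 1])
print(weights)  # [1.0, 0.0]
```

In this toy case the uninformative network receives zero weight, so the combined network equals `W_a`. An unweighted average would instead dilute the only edge that links the two known disease-genes.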

Keywords

  • Medical Subject Headings
  • Gene prioritization
  • Imbalance-aware integration
  • Network integration

Notes

  1. http://www.nlm.nih.gov/mesh/

Acknowledgments

This work was funded by the grant "Machine learning algorithms to handle label imbalance in biomedical taxonomies", code PSR2017_DIP_010_MFRAS, Università degli Studi di Milano.

Author information

Corresponding author

Correspondence to Marco Frasca.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Frasca, M. et al. (2019). Disease–Genes Must Guide Data Source Integration in the Gene Prioritization Process. In: Bartoletti, M., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2017. Lecture Notes in Computer Science, vol 10834. Springer, Cham. https://doi.org/10.1007/978-3-030-14160-8_7

  • DOI: https://doi.org/10.1007/978-3-030-14160-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14159-2

  • Online ISBN: 978-3-030-14160-8

  • eBook Packages: Computer Science, Computer Science (R0)