Abstract
A central challenge in identifying the genes involved in the etiology of human genetic diseases is integrating the different types of functional relationships available between genes. Numerous approaches have exploited the complementary evidence encoded in heterogeneous data sources, such as functional profiles or expression quantitative trait loci, to prioritize disease genes, but to our knowledge none of them has accounted for the scarcity of known disease genes in its integration methodology. Yet when data are unbalanced, that is, when one class is heavily under-represented, imbalance-unaware approaches may suffer a marked decrease in performance. We claim that imbalance-aware integration is a key requirement for improving the performance of gene prioritization (GP) methods. To support this claim, we propose an imbalance-aware integration algorithm for the GP problem and compare it on benchmark data with other state-of-the-art integration methodologies.
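The abstract does not specify the proposed algorithm, so the following is only a minimal sketch of what imbalance-aware integration could look like, not the authors' method. It assumes each data source is a symmetric gene-by-gene weight matrix, scores each source with a leave-one-out neighbor vote, and weights it by average precision, a measure driven by the rare positive class. The function names (`neighbor_scores`, `average_precision`, `integrate`) and the toy data are hypothetical.

```python
import numpy as np

def neighbor_scores(W, y):
    """Guilt-by-association score: total edge weight from each gene
    to the known disease genes, excluding the gene itself."""
    W = np.array(W, dtype=float)   # work on a copy
    np.fill_diagonal(W, 0.0)       # a gene must not vote for itself
    return W @ y

def average_precision(scores, y):
    """Mean precision at the rank of each positive gene.
    Ties are broken by index order (a simplification)."""
    order = np.argsort(-scores, kind="stable")
    hits = y[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(y) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())

def integrate(networks, y):
    """Weighted sum of networks; each weight reflects how well the
    network ranks the (scarce) disease genes. Assumes y has at
    least one positive."""
    weights = np.array(
        [average_precision(neighbor_scores(W, y), y) for W in networks]
    )
    weights = weights / weights.sum()
    combined = sum(w * np.asarray(W, dtype=float)
                   for w, W in zip(weights, networks))
    return combined, weights

# Toy data (assumed): 6 genes, only genes 0 and 1 are known disease genes.
y = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])

informative = np.zeros((6, 6))
informative[0, 1] = informative[1, 0] = 1.0    # links the two disease genes
informative[2, 3] = informative[3, 2] = 1.0

uninformative = np.zeros((6, 6))
uninformative[0, 4] = uninformative[4, 0] = 1.0  # disease genes linked only
uninformative[1, 5] = uninformative[5, 1] = 1.0  # to non-disease genes

combined, weights = integrate([informative, uninformative], y)
print(weights)
```

In this toy setting the network that connects the two disease genes receives the larger weight. An imbalance-unaware criterion such as plain accuracy would be dominated by the four negative genes and would separate the two sources far less clearly.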
Keywords
- Medical Subject Headings
- Gene prioritization
- Imbalance-aware integration
- Network integration
Acknowledgments
This work was funded by the grant "Machine learning algorithms to handle label imbalance in biomedical taxonomies" (code PSR2017_DIP_010_MFRAS), Università degli Studi di Milano.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Frasca, M. et al. (2019). Disease–Genes Must Guide Data Source Integration in the Gene Prioritization Process. In: Bartoletti, M., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2017. Lecture Notes in Computer Science, vol 10834. Springer, Cham. https://doi.org/10.1007/978-3-030-14160-8_7
DOI: https://doi.org/10.1007/978-3-030-14160-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14159-2
Online ISBN: 978-3-030-14160-8
eBook Packages: Computer Science, Computer Science (R0)