Abstract
Finding genes associated with human genetic disorders is one of the most challenging problems in biomedicine. To guide researchers toward the most reliable candidate causative genes for a disease of interest, gene prioritization methods provide necessary support by automatically ranking genes according to their involvement in the disease under study. The problem is characterized by highly unbalanced classes (few causative genes and many more non-causative ones) and requires cost-sensitive techniques to achieve reliable solutions. In this work we propose a network-based methodology for disease-gene prioritization expressly designed to cope with this data imbalance. Its validation on a benchmark of 708 selected Medical Subject Headings (MeSH) diseases shows that our approach is competitive with state-of-the-art methodologies, and its reduced time complexity makes it feasible to apply to large datasets.
Notes
- 1.
- 2. Actually, the number of predictors, including the two-way interaction term (i.e. the product of the two features), is equal to 3.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Frasca, M., Bassis, S. (2016). Gene-Disease Prioritization Through Cost-Sensitive Graph-Based Methodologies. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2016. Lecture Notes in Computer Science, vol 9656. Springer, Cham. https://doi.org/10.1007/978-3-319-31744-1_64
DOI: https://doi.org/10.1007/978-3-319-31744-1_64
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31743-4
Online ISBN: 978-3-319-31744-1
eBook Packages: Computer Science, Computer Science (R0)