Abstract
Candidate gene association studies remain the most practical and frequently employed approach to disease gene investigation for complex disorders, but selecting suitable genes to test is a challenge. Several computational approaches are available for selecting and prioritizing disease candidate genes. Most of these tools are based on the guilt-by-association principle, in which novel disease candidate genes are identified and prioritized by their functional or topological similarity to known disease genes. In this chapter we review the prioritization criteria and algorithms, along with use cases that demonstrate how these tools can be used to identify and rank human disease candidate genes.
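As a rough illustration of the guilt-by-association idea in its topological form, the sketch below scores each candidate gene by the fraction of its direct interaction partners that are known disease genes. It is a minimal toy example, not the method of any tool reviewed in the chapter: the function name `rank_candidates`, the gene labels, and the edge list are all hypothetical, and the tools reviewed in the chapter may use quite different similarity measures.

```python
from collections import defaultdict


def rank_candidates(edges, seeds, candidates):
    """Rank candidates by the share of their neighbors that are seed genes.

    edges      -- iterable of (gene_a, gene_b) undirected interactions (toy data)
    seeds      -- known disease genes (the "guilty" set)
    candidates -- genes to be prioritized
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seed_set = set(seeds)
    scores = {}
    for gene in candidates:
        partners = neighbors.get(gene, set())
        # Score = fraction of direct partners that are known disease genes.
        scores[gene] = len(partners & seed_set) / len(partners) if partners else 0.0

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy network: S1 and S2 are known disease genes.
toy_edges = [("S1", "C1"), ("S2", "C1"), ("S1", "C2"), ("C2", "X"), ("C3", "X")]
ranked = rank_candidates(toy_edges, seeds=["S1", "S2"], candidates=["C1", "C2", "C3"])
for gene, score in ranked:
    print(gene, score)
```

Normalizing by the number of partners is one design choice among many; it keeps highly connected genes from ranking first merely because they have more neighbors.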
Cheng Zhu and Chao Wu contributed equally to this work.
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Zhu, C., Wu, C., Aronow, B.J., Jegga, A.G. (2014). Computational Approaches for Human Disease Gene Prediction and Ranking. In: Maltsev, N., Rzhetsky, A., Gilliam, T. (eds) Systems Analysis of Human Multigene Disorders. Advances in Experimental Medicine and Biology, vol 799. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8778-4_4
DOI: https://doi.org/10.1007/978-1-4614-8778-4_4
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8777-7
Online ISBN: 978-1-4614-8778-4
eBook Packages: Biomedical and Life Sciences (R0)