Abstract
Candidate gene identification is typically labour intensive, because laboratory experiments are required to corroborate or disprove the hypothesis that a nominated candidate gene is the causative gene. The traditional approach to reducing the number of candidate genes entails fine-mapping studies using markers and pedigrees. Gene prioritization ranks candidate genes by their relevance to the biological process of interest, so that the most promising genes can be selected for further analysis. To date, many computational methods have focused on predicting candidate genes by analysing their inherent sequence characteristics, their similarity to known disease genes, and their functional annotation. In the last decade, several computational tools for prioritizing candidate genes have been proposed. Many of them are web-based tools, while others are standalone applications that are installed and run locally. This review takes a close look at gene prioritization criteria and candidate gene prioritization algorithms, and thus provides a comprehensive synopsis of the subject matter.
Acknowledgments
We thank Joseph Hannon Bozorgmehr for help with the English editing of the manuscript.
Additional information
Communicated by J. Graw.
An erratum to this article is available at http://dx.doi.org/10.1007/s00438-015-1117-4.
This article has been retracted by the Editor-in-Chief as it contains previously published figures and tables that have been reproduced without permission from the original authors and publishers. Moreover, the article contains significant portions of other authors' writings on the same topic in other publications, without sufficient attribution to these earlier works being given. The principal author of the paper has acknowledged that contents from various publications and online sources were used in this review without permission and/or proper reference to the original sources.
The authors apologize for their negligence.
About this article
Cite this article
Masoudi-Nejad, A., Meshkin, A., Haji-Eghrari, B. et al. RETRACTED ARTICLE: Candidate gene prioritization. Mol Genet Genomics 287, 679–698 (2012). https://doi.org/10.1007/s00438-012-0710-z
DOI: https://doi.org/10.1007/s00438-012-0710-z