Abstract
Common complex diseases remain a major health challenge and involve the interaction of multiple genes and environmental factors. Although it is known that disease risk can originate from variation in an individual’s genome, discovering the relevant genes is difficult. In silico tools can significantly improve the detection of both genes and variation. Data mining and automated tracking of new knowledge facilitate locus mapping. At the gene search stage, in silico prioritization of candidate genes plays an indispensable role in dealing with linked or associated loci. In silico analysis can also differentiate subtle consequences of coding DNA variants, and it remains the major tool for predicting the potential effects of non-coding DNA variants on gene transcription and/or pre-mRNA splicing.
Acknowledgments
The author thanks Professor Ronald J. Trent and Dr Julia M. Morahan for their helpful discussion and comments on the manuscript. This work was partially supported by the Australian Research Council Discovery Grant DP0452019.
Cite this article
Yu, B. Role of In Silico Tools in Gene Discovery. Mol Biotechnol 41, 296–306 (2009). https://doi.org/10.1007/s12033-008-9134-8
DOI: https://doi.org/10.1007/s12033-008-9134-8