Role of In Silico Tools in Gene Discovery

  • Review
Molecular Biotechnology

Abstract

Common complex diseases remain a major health challenge and involve interactions among multiple genes and environmental factors. Although disease risk is known to arise from variation in an individual's genome, discovering the relevant genes is difficult. In silico tools can significantly improve the detection of such genes and their variants. Data mining and automated tracking of new knowledge facilitate locus mapping. At the gene search stage, in silico prioritization of candidate genes plays an indispensable role in handling linked or associated loci. In silico analysis can also distinguish subtle consequences of coding DNA variants, and it remains the main tool for predicting the potential effects of non-coding DNA variants on gene transcription and/or pre-mRNA splicing.
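As an illustration of how a non-coding variant's effect on transcription can be predicted in silico, the sketch below uses one common approach, a position weight matrix (PWM) scan: it scores the best-matching window for a transcription factor binding motif in the reference and alternate sequences and reports the change. This is a minimal sketch under stated assumptions, not the author's method. The motif counts, the uniform background, the pseudocount, the forward-strand-only scan, and the helper names (`log_odds_matrix`, `best_window_score`, `allele_score_change`) are all hypothetical.

```python
import math

# Hypothetical position frequency matrix: counts of A, C, G, T at each of
# 4 motif positions. Illustrative only; not taken from any real motif database.
TOY_COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 1, "G": 0, "T": 8},
]

BACKGROUND = {b: 0.25 for b in "ACGT"}  # assumption: uniform base composition


def log_odds_matrix(counts, pseudocount=0.5):
    """Convert per-position counts into log2-odds scores against the background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / BACKGROUND[b])
            for b in "ACGT"
        })
    return pwm


def best_window_score(seq, pwm):
    """Highest PWM score over all motif-length windows (forward strand only)."""
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def allele_score_change(ref_seq, alt_seq, pwm):
    """Alternate minus reference best score; strongly negative suggests site loss."""
    return best_window_score(alt_seq, pwm) - best_window_score(ref_seq, pwm)


pwm = log_odds_matrix(TOY_COUNTS)
ref = "TTACGTTT"  # hypothetical reference sequence containing the consensus ACGT
alt = "TTACATTT"  # hypothetical G>A substitution inside the putative site
print(round(allele_score_change(ref, alt, pwm), 2))
```

A real analysis would typically use curated motif matrices, scan both strands, and apply a score threshold; a large drop in score is only a hypothesis about altered binding that still needs experimental confirmation.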

Fig. 1
Fig. 2

Acknowledgments

The author thanks Professor Ronald J. Trent and Dr Julia M. Morahan for their helpful discussion and comments on the manuscript. This work was partially supported by the Australian Research Council Discovery Grant DP0452019.

Author information

Correspondence to Bing Yu.

About this article

Cite this article

Yu, B. Role of In Silico Tools in Gene Discovery. Mol Biotechnol 41, 296–306 (2009). https://doi.org/10.1007/s12033-008-9134-8

  • DOI: https://doi.org/10.1007/s12033-008-9134-8
