Molecular Biotechnology, Volume 41, Issue 3, pp 296–306

Role of In Silico Tools in Gene Discovery



Common complex diseases remain a major health challenge and involve the interaction of multiple genes and environmental factors. Discovering the relevant genes is difficult, even though it is known that disease risk can originate from variation in an individual’s genome. In silico tools can significantly improve the detection of genes and variation. Data mining and automated tracking of new knowledge facilitate locus mapping. At the gene search stage, in silico prioritization of candidate genes plays an indispensable role in dealing with linked or associated loci. In silico analysis can also differentiate the subtle consequences of coding DNA variants, and it remains the major tool for predicting the potential effects of non-coding DNA variants on gene transcription and/or pre-mRNA splicing.


Keywords: Gene discovery; Complex disease; Data mining; Prediction; Data hosting; In silico prioritization; Haplotype inference; Simulation



Copyright information

© Humana Press 2008

Authors and Affiliations

  1. Department of Molecular & Clinical Genetics, Royal Prince Alfred Hospital and Central Clinical School, University of Sydney, Camperdown, Australia
