Bioinformatic Tools for Identifying Disease Gene and SNP Candidates

  • Sean D. Mooney
  • Vidhya G. Krishnan
  • Uday S. Evani
Part of the Methods in Molecular Biology book series (MIMB, volume 628)


As databases of genome data continue to grow, so does our understanding of the functional elements of the genome. Many genetic changes have now been discovered and characterized, including both disease-causing mutations and neutral polymorphisms. Over the past decade, alongside experimental approaches to characterizing specific variants, there has been intense bioinformatic research into the molecular effects of these genetic changes. These bioinformatic efforts have focused on two general areas. First, researchers have annotated genetic variation data with molecular features that are likely to affect function. Second, statistical methods have been developed to predict which mutations are likely to have a molecular effect. This protocol chapter reviews and describes methods for understanding the molecular functions of single nucleotide polymorphisms (SNPs) and mutations. Its intent is to provide an introduction to online tools that are both easy to use and useful.

Key words

Single nucleotide polymorphism, SNP, Genetic disease, Candidate gene, Genome, Bioinformatics, Machine learning



We are graciously supported by K22LM009135 (PI: Mooney), R01LM009722 (PI: Mooney), P01AG018397 (PI: Econs), U01GM061373 (PI: Flockhart), and the Indiana Genomics Initiative. The Indiana Genomics Initiative (INGEN) is supported in part by the Lilly Endowment.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Sean D. Mooney (1)
  • Vidhya G. Krishnan (1)
  • Uday S. Evani (1)
  1. Department of Medical and Molecular Genetics, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, USA
