Computational analysis of human genome polymorphism

  • To the Anniversary of the Institute of Molecular Biology
  • Molecular Biology

Abstract

While genome-era technologies focused on complete genome sequencing in various organisms, post-genome technologies aim to understand the mechanisms of genetic information processing and to elucidate within-species variation. Single nucleotide polymorphisms (SNPs) are the most common source of genome variation in the human population. Nonsynonymous SNPs, which occur in coding regions of genes and result in amino acid substitutions, are of particular interest: such SNPs are thought to be responsible for phenotypic variation, quantitative traits, and the etiology of common diseases. PolyPhen is a computational tool that predicts putatively functional nonsynonymous SNPs by combining several types of information. The application areas of PolyPhen and similar methods include the genetics of complex diseases and congenital defects, the identification of functional mutations in model organisms, and evolutionary genetics.
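
The distinction between synonymous and nonsynonymous coding SNPs drawn in the abstract can be illustrated with a minimal Python sketch. It is not part of PolyPhen and makes no prediction about functional impact; it only checks, using the standard genetic code, whether a single-base change in a codon alters the encoded amino acid. The function name `classify_coding_snp` and its return labels are hypothetical, chosen for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base substitution within one codon.

    codon    -- reference codon on the coding strand, e.g. "CCA"
    position -- 0-based index (0, 1 or 2) of the substituted base
    alt_base -- the alternative base at that position
    Returns (label, reference amino acid, alternative amino acid).
    """
    codon = codon.upper()
    alt_base = alt_base.upper()
    if codon not in CODON_TABLE or alt_base not in BASES:
        raise ValueError("codon and alt_base must use the letters T, C, A, G")
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1 or 2")

    mutated = codon[:position] + alt_base + codon[position + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]

    if ref_aa == alt_aa:
        label = "synonymous"
    elif alt_aa == "*":
        label = "nonsense"      # substitution introduces a stop codon
    elif ref_aa == "*":
        label = "stop-loss"     # substitution removes a stop codon
    else:
        label = "missense"      # nonsynonymous amino acid substitution
    return label, ref_aa, alt_aa


if __name__ == "__main__":
    print(classify_coding_snp("CCA", 0, "G"))  # proline codon, first base changed
    print(classify_coding_snp("CCA", 2, "G"))  # same codon, third base changed
```

Only substitutions labelled `missense` here correspond to the amino acid substitutions that tools such as PolyPhen go on to assess; deciding whether such a substitution is functional requires the additional information the abstract refers to.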

Author information

Corresponding author

Correspondence to V. E. Ramensky.

Additional information

Original Russian Text © V.E. Ramensky, S.R. Sunyaev, 2009, published in Molekulyarnaya Biologiya, 2009, Vol. 43, No. 2, pp. 286–294.

About this article

Cite this article

Ramensky, V.E., Sunyaev, S.R. Computational analysis of human genome polymorphism. Mol Biol 43, 260–268 (2009). https://doi.org/10.1134/S0026893309020095

  • DOI: https://doi.org/10.1134/S0026893309020095
