
Comparing genetic variants detected in the 1000 genomes project with SNPs determined by the International HapMap Consortium

  • RESEARCH ARTICLE
Journal of Genetics

Abstract

Single-nucleotide polymorphisms (SNPs) determined with SNP arrays by the International HapMap Consortium (HapMap) and the genetic variants detected in the 1000 Genomes Project (1KGP) can serve as two references for genomewide association studies (GWAS). We conducted comparative analyses to provide a means of assessing concerns about SNP array-based GWAS findings and of realistically bounding expectations for next-generation sequencing (NGS)-based GWAS. For the 622 individuals common to both references, we calculated and compared base composition, transition-to-transversion ratio, minor allele frequency and heterozygous rate for SNPs from HapMap and 1KGP. We also analysed the genotype discordance between HapMap and 1KGP to assess the consistency of the SNPs from the two references. Of the 36,817,799 SNPs detected in 1KGP, 90.58% were not measured in HapMap. More SNPs with minor allele frequencies below 0.01 were found in 1KGP than in HapMap. The two references have low discordance (generally below 0.02) in genotypes of common SNPs, with most discordance arising from heterozygous SNPs. Our study demonstrated that SNP array-based GWAS findings were reliable and useful, although only a small portion of genetic variance was explained. NGS can detect not only common but also rare variants, supporting the expectation that NGS-based GWAS will be able to incorporate a much larger portion of genetic variance than SNP array-based GWAS.
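
The per-SNP metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which this page does not describe. The function names and the genotype encoding (two-letter strings such as "AG", with None for a missing call) are assumptions made for illustration only.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Transition-to-transversion ratio for a list of (ref, alt) biallelic SNPs."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")


def minor_allele_frequency(genotypes):
    """MAF of one biallelic SNP; genotypes are strings like 'AG' (None = missing)."""
    counts = Counter(allele for g in genotypes if g for allele in g)
    total = sum(counts.values())
    if total == 0:
        return None          # no called genotypes
    if len(counts) < 2:
        return 0.0           # monomorphic in this sample
    return min(counts.values()) / total


def heterozygous_rate(genotypes):
    """Fraction of called genotypes whose two alleles differ."""
    called = [g for g in genotypes if g]
    if not called:
        return None
    return sum(1 for g in called if g[0] != g[1]) / len(called)


def genotype_discordance(calls_a, calls_b):
    """Fraction of individuals whose genotypes differ between two sources.

    Allele order is ignored ('AG' equals 'GA'); individuals with a missing
    call in either source are skipped (an assumption of this sketch).
    """
    compared = mismatched = 0
    for a, b in zip(calls_a, calls_b):
        if not a or not b:
            continue
        compared += 1
        if sorted(a) != sorted(b):
            mismatched += 1
    return mismatched / compared if compared else None


# Toy data, invented for illustration only.
array_calls = ["AA", "AG", "GG", None]
seq_calls = ["AA", "GA", "AG", "AA"]
print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
print(minor_allele_frequency(["AA", "AA", "AG", "AA"]))
print(heterozygous_rate(["AA", "AA", "AG", "AA"]))
print(genotype_discordance(array_calls, seq_calls))
```

In this toy example the only mismatch is a homozygous call in one source against a heterozygous call in the other, the kind of disagreement the abstract reports as the main source of discordance.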



Acknowledgements

This research was supported in part by an appointment to the research participation programme at the National Center for Toxicological Research (Wenqian Zhang, Hui Wen Ng and Heng Luo) administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the US Department of Energy and the US Food and Drug Administration. The content is solely the responsibility of the authors and does not necessarily represent the official views of the Food and Drug Administration.

Author information

Corresponding author

Correspondence to HUIXIAO HONG.

Additional information

[Zhang W., Ng H. W., Shu M., Luo H., Su Z., Ge W., Perkins R., Tong W. and Hong H. 2015 Comparing genetic variants detected in the 1000 genomes project with SNPs determined by the International HapMap Consortium. J. Genet. 94, xx–xx]

The findings and conclusions in this paper have not been formally disseminated by the US Food and Drug Administration (FDA) and should not be construed to represent the FDA determination or policy.

Cite this article

ZHANG, W., NG, H.W., SHU, M. et al. Comparing genetic variants detected in the 1000 genomes project with SNPs determined by the International HapMap Consortium. J Genet 94, 731–740 (2015). https://doi.org/10.1007/s12041-015-0588-8

  • DOI: https://doi.org/10.1007/s12041-015-0588-8
