Human Genetics, Volume 125, Issue 2, pp 163–171

A comprehensive evaluation of SNP genotype imputation

  • Michael Nothnagel
  • David Ellinghaus
  • Stefan Schreiber
  • Michael Krawczak
  • Andre Franke
Original Investigation

Abstract

Genome-wide association studies have contributed significantly to the genetic dissection of complex diseases. To further increase the power of existing marker sets, methods have been proposed to predict individual genotypes at untyped loci from other marker sets by imputation, usually employing HapMap data as a reference. Although various imputation algorithms are already in practical use, a comprehensive evaluation and comparison of these approaches, using genome-wide SNP data from a single population, is still lacking. We therefore investigated four publicly available programs for genotype imputation (BEAGLE, IMPUTE, MACH, and PLINK) using data from 449 German individuals genotyped in our laboratory for three genome-wide SNP sets [Affymetrix 5.0 (500 k), Affymetrix 6.0 (1,000 k), and Illumina 550 k]. We observed that HapMap-based imputation in a northern European population is powerful and reliable, even in highly variable genomic regions such as the extended MHC on chromosome 6p21. However, while genotype predictions were found to be highly accurate with all four programs, the number of SNPs for which imputation was actually carried out (‘imputation efficacy’) varied substantially. BEAGLE, IMPUTE, and MACH yielded nearly identical trade-offs between imputation accuracy and efficacy, whereas PLINK consistently performed worse. We nevertheless recommend either MACH or BEAGLE for practical use because these two programs are more user-friendly and generally require less memory than IMPUTE.
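
The trade-off between accuracy and efficacy mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each imputed genotype comes with a confidence score and that a genotype counts as imputed only if its score reaches a chosen threshold; efficacy is then the fraction of genotypes imputed, and accuracy the fraction of those that match the true genotype. The function name, the 0/1/2 genotype coding, and the per-genotype (rather than per-SNP) counting are assumptions made for illustration and are not taken from the paper or from any of the four programs.

```python
from typing import List, Optional, Tuple


def accuracy_and_efficacy(
    true_genotypes: List[int],
    imputed_genotypes: List[int],
    confidences: List[float],
    threshold: float,
) -> Tuple[Optional[float], float]:
    """Return (accuracy, efficacy) at a given confidence threshold.

    Hypothetical helper: genotypes are coded as 0, 1 or 2 copies of an
    allele, and a genotype is treated as imputed only if its confidence
    score is at least `threshold` (an assumed calling rule).
    """
    if not (len(true_genotypes) == len(imputed_genotypes) == len(confidences)):
        raise ValueError("all inputs must have the same length")
    if not true_genotypes:
        raise ValueError("inputs must not be empty")

    # Keep only the genotypes that pass the confidence threshold.
    called = [
        (truth, guess)
        for truth, guess, conf in zip(true_genotypes, imputed_genotypes, confidences)
        if conf >= threshold
    ]

    # Efficacy: share of all genotypes for which a call was made.
    efficacy = len(called) / len(true_genotypes)

    # Accuracy: share of calls that match the truth; undefined without calls.
    accuracy = (
        sum(truth == guess for truth, guess in called) / len(called)
        if called
        else None
    )
    return accuracy, efficacy


if __name__ == "__main__":
    truth = [0, 1, 2, 1]
    imputed = [0, 1, 1, 1]
    conf = [0.99, 0.95, 0.60, 0.80]
    for t in (0.5, 0.9):
        print(t, accuracy_and_efficacy(truth, imputed, conf, t))
```

Raising the threshold in this toy example discards the low-confidence calls, so accuracy rises while efficacy falls, which is the kind of trade-off the comparison of programs refers to.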

Notes

Acknowledgments

The authors wish to thank all probands for participating in the study. We also thank Alfred Wagner and Simone Knief of the Computational Centre, Christian-Albrechts University Kiel, Germany, for their support. Thomas Wienker and Michael Steffens (IMBIE, University of Bonn, Germany) are acknowledged for performing the initial quality control of the genotype data. Marcus Will, Michael Wittig (both at the Institute of Clinical Molecular Biology, Kiel) and Olaf Junge (Institute of Medical Informatics and Statistics, Kiel) are gratefully acknowledged for expert technical help. We would like to thank Shaun Purcell (PNGU, Massachusetts General Hospital, Boston, MA, USA), Goncalo Abecasis and Yun Li (both at the Center for Statistical Genetics, University of Michigan, MI, USA), Brian Browning (Department of Statistics, University of Auckland, New Zealand), and Tim Becker (IMBIE, University of Bonn, Germany) for providing access to the latest versions of their software and for helpful discussions. This study was supported by the German Ministry of Education and Research (BMBF) through the National Genome Research Network (NGFN). The project received infrastructure support through the DFG excellence cluster “Inflammation at Interfaces”.

Supplementary material

Supplementary material 1 (439_2008_606_MOESM1_ESM.pdf, PDF, 7.37 MB)

Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  • Michael Nothnagel (1)
  • David Ellinghaus (2)
  • Stefan Schreiber (2, 3)
  • Michael Krawczak (1, 3)
  • Andre Franke (2)

  1. Institute of Medical Informatics and Statistics, Christian-Albrechts University, Kiel, Germany
  2. Institute of Clinical Molecular Biology, Christian-Albrechts University, Kiel, Germany
  3. PopGen Biobank, Christian-Albrechts University, Kiel, Germany
