Human Genetics

, Volume 131, Issue 1, pp 111–119 | Cite as

Artifact due to differential error when cases and controls are imputed from different platforms

Original Investigation

Abstract

Including previously genotyped controls in a genome-wide association study can provide cost-savings, but can also create design biases. When cases and controls are genotyped on different platforms, the imputation needed to provide genome-wide coverage will introduce differential measurement error and may lead to false positives. We compared genotype frequencies of two healthy control groups from the Nurses’ Health Study genotyped on different platforms [Affymetrix 6.0 (n = 1,672) and Illumina HumanHap550 (n = 1,038)]. Using standard imputation quality filters, we observed 9,841 single-nucleotide polymorphisms (SNPs) out of 2,347,809 (0.4%) significant at the 5 × 10−8 level. We explored three methods for controlling for this Type I error inflation. One method was to remove platform effects using principal components; another was to restrict to SNPs of highest quality imputation; and a third was to genotype some controls alongside cases to exclude SNPs that are statistical artifact. The first method could not reduce the Type I error rate; the other two could dramatically reduce the error rate, although both required that a portion of SNPs be excluded from analysis. Ideally, the biases we describe would be eliminated at the design stage, by genotyping sufficient numbers of cases and controls on each platform. Researchers using imputation to combine samples genotyped on different platforms with severely unbalanced case–control ratios should be aware of the potential for inflated Type I error rates and apply appropriate quality filters. Every SNP found with genome-wide significance should be validated on another platform to verify that its significance is not an artifact of study design.

References

  1. Alberts B (2010) Editorial expression of concern. Science 330(6006):912. doi:10.1126/science.330.6006.912-b PubMedCrossRefGoogle Scholar
  2. Beecham GW, Martin ER, Gilbert JR, Haines JL, Pericak-Vance MA (2010) APOE is not associated with alzheimer disease: a cautionary tale of genotype imputation. Ann Hum Genet 74(3):189–94. doi:10.1111/j.1469-1809.2010.00573.x PubMedCrossRefGoogle Scholar
  3. Carmichael M (2010) The little flaw in the longevity-gene study that could be a big problem. http://www.newsweek.com/2010/07/07/the-little-flaw-in-the-longevity-gene-study-that-could-be-a-big-problem.html
  4. Devlin B, Roeder K (1999) Genomic control for association studies. Biometrics 55(4):997–1004PubMedCrossRefGoogle Scholar
  5. Fallin MD, Szymanski M, Wang R, Gherman A, Bassett SS, Avramopoulos D (2010) Fine mapping of the chromosome 10q11-q21 linkage region in Alzheimer’s disease cases and controls. Neurogenetics 11(3):335–48. doi:10.1007/s10048-010-0234-9 PubMedCrossRefGoogle Scholar
  6. Ho LA, Lange EM (2010) Using public control genotype data to increase power and decrease cost of case–control genetic association studies. Hum Genet 128(6):597–608. doi:10.1007/s00439-010-0880-x PubMedCrossRefGoogle Scholar
  7. Hom G, Graham RR, Modrek B, Taylor KE, Ortmann W, Garnier S, Lee AT, Chung SA, Ferreira RC, Pant PVK, Ballinger DG, Kosoy R, Demirci FY, Kamboh MI, Kao AH, Tian C, Gunnarsson I, Bengtsson AA, Rantapaa-Dahlqvist S, Petri M, Manzi S, Seldin MF, Ronnblom L, Syvanen AC, Criswell LA, Gregersen PK, Behrens TW (2008) Association of systemic lupus erythematosus with C8orf13-BLK and ITGAM-ITGAX. N Engl J Med 358(9):900–909. doi:10.1056/NEJMoa0707865 PubMedCrossRefGoogle Scholar
  8. Howie BN, Donnelly P, Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 5(6):e1000,529. doi:10.1371/journal.pgen.1000529
  9. Hunter DJ, Kraft P, Jacobs KB, Cox DG, Yeager M, Hankinson SE, Wacholder S, Wang Z, Welch R, Hutchinson A, Wang J, Yu K, Chatterjee N, Orr N, Willett WC, Colditz GA, Ziegler RG, Berg CD, Buys SS, McCarty CA, Feigelson HS, Calle EE, Thun MJ, Hayes RB, Tucker M, Gerhard DS, Fraumeni JFJ, Hoover RN, Thomas G, Chanock SJ (2007) A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet 39(7):870–874. doi:10.1038/ng2075 PubMedCrossRefGoogle Scholar
  10. Li Y, Willer C, Sanna S, Abecasis G (2009) Genotype imputation. Annu Rev Genomics Hum Genet 10:387–406. doi:10.1146/annurev.genom.9.081307.164242 PubMedCrossRefGoogle Scholar
  11. Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR (2010) MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol 34(8):816–834. doi:10.1002/gepi.20533 PubMedCrossRefGoogle Scholar
  12. Luca D, Ringquist S, Klei L, Lee AB, Gieger C, Wichmann HE, Schreiber S, Krawczak M, Lu Y, Styche A, Devlin B, Roeder K, Trucco M (2008) On the use of general control samples for genome-wide association studies: genetic matching highlights causal variants. Am J Hum Genet 82(2):453–63. doi:10.1016/j.ajhg.2007.11.003 PubMedCrossRefGoogle Scholar
  13. Marchini J, Howie B (2010) Genotype imputation for genome-wide association studies. Nat Rev Genet 11(7):499–511. doi:10.1038/nrg2796 PubMedCrossRefGoogle Scholar
  14. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JPA, Hirschhorn JN (2008) Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9(5):356–369. doi:10.1038/nrg2344 PubMedCrossRefGoogle Scholar
  15. Moskvina V, Craddock N, Holmans P, Owen MJ, O’Donovan MC (2006) Effects of differential genotyping error rate on the type I error probability of case–control studies. Hum Hered 61(1):55–64. doi:10.1159/000092553 PubMedCrossRefGoogle Scholar
  16. Patterson N, Price AL, Reich D (2006) Population structure and eigenanalysis. PLoS Genet 2(12):e190. doi:10.1371/journal.pgen.0020190 PubMedCrossRefGoogle Scholar
  17. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38(8):904–909. doi:10.1038/ng1847 PubMedCrossRefGoogle Scholar
  18. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81(3):559–575. doi:10.1086/519795 PubMedCrossRefGoogle Scholar
  19. Qi L, Cornelis MC, Kraft P, Stanya KJ, Linda Kao WH, Pankow JS, Dupuis J, Florez JC, Fox CS, Pare G, Sun Q, Girman CJ, Laurie CC, Mirel DB, Manolio TA, Chasman DI, Boerwinkle E, Ridker PM, Hunter DJ, Meigs JB, Lee CH, Hu FB, van Dam RM (2010) Genetic variants at 2q24 are associated with susceptibility to type 2 diabetes. Hum Mol Genet 19(13):2706–2715. doi:10.1093/hmg/ddq156 PubMedCrossRefGoogle Scholar
  20. R Development Core Team (2009) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, URL http://www.R-project.org, ISBN 3-900051-07-0
  21. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU, Prokunina-Olsson L, Ding CJ, Swift AJ, Narisu N, Hu T, Pruim R, Xiao R, Li XY, Conneely KN, Riebow NL, Sprau AG, Tong M, White PP, Hetrick KN, Barnhart MW, Bark CW, Goldstein JL, Watkins L, Xiang F, Saramies J, Buchanan TA, Watanabe RM, Valle TT, Kinnunen L, Abecasis GR, Pugh EW, Doheny KF, Bergman RN, Tuomilehto J, Collins FS, Boehnke M (2007) A genome-wide association study of type 2 diabetes in finns detects multiple susceptibility variants. Science 316(5829):1341–1345. doi:10.1126/science.1142382 PubMedCrossRefGoogle Scholar
  22. Sebastiani P, Solovieff N, Puca A, Hartley S, Melista E, Andersen S, Dworkis D, Wilk J, Myers R, Steinberg M, Montano M, Baldwin C, Perls T (2010) Genetic signatures of exceptional longevity in humans. Science doi:10.1126/science.1190532
  23. Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447(7145):661–678. doi:10.1038/nature05911 Google Scholar
  24. Wrensch M, Jenkins RB, Chang JS, Yeh RF, Xiao Y, Decker PA, Ballman KV, Berger M, Buckner JC, Chang S, Giannini C, Halder C, Kollmeyer TM, Kosel ML, LaChance DH, McCoy L, O’Neill BP, Patoka J, Pico AR, Prados M, Quesenberry C, Rice T, Rynearson AL, Smirnov I, Tihan T, Wiemels J, Yang P, Wiencke JK (2009) Variants in the CDKN2B and RTEL1 regions are associated with high-grade glioma susceptibility. Nat Genet 41(8):905–908. doi:10.1038/ng.408 PubMedCrossRefGoogle Scholar
  25. Zhuang JJ, Zondervan K, Nyberg F, Harbron C, Jawaid A, Cardon LR, Barratt BJ, Morris AP (2010) Optimizing the power of genome-wide association studies by using publicly available reference samples to expand the control group. Genet Epidemiol 34(4):319–326. doi:10.1002/gepi.20482 PubMedCrossRefGoogle Scholar

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. 1.Department of BiostatisticsHarvard School of Public HealthBostonUSA
  2. 2.Department of EpidemiologyHarvard School of Public HealthBostonUSA

Personalised recommendations