Original Investigation

Human Genetics, Volume 131, Issue 1, pp 111-119

Artifact due to differential error when cases and controls are imputed from different platforms

  • Jennifer A. Sinnott, Department of Biostatistics, Harvard School of Public Health
  • Peter Kraft, Department of Biostatistics and Department of Epidemiology, Harvard School of Public Health


Including previously genotyped controls in a genome-wide association study can reduce costs, but it can also introduce design biases. When cases and controls are genotyped on different platforms, the imputation needed to provide genome-wide coverage introduces differential measurement error and may lead to false positives. We compared genotype frequencies of two healthy control groups from the Nurses’ Health Study genotyped on different platforms [Affymetrix 6.0 (n = 1,672) and Illumina HumanHap550 (n = 1,038)]. Because both groups are healthy controls from the same cohort, significant differences between them reflect artifact rather than disease association. Using standard imputation quality filters, we observed 9,841 single-nucleotide polymorphisms (SNPs) out of 2,347,809 (0.4%) significant at the 5 × 10⁻⁸ level. We explored three methods for controlling this Type I error inflation: removing platform effects using principal components; restricting the analysis to SNPs with the highest imputation quality; and genotyping some controls alongside cases in order to exclude SNPs that are statistical artifacts. The first method could not reduce the Type I error rate. The other two reduced it dramatically, although both required that a portion of SNPs be excluded from analysis. Ideally, the biases we describe would be eliminated at the design stage by genotyping sufficient numbers of cases and controls on each platform. Researchers who use imputation to combine samples genotyped on different platforms with severely unbalanced case–control ratios should be aware of the potential for inflated Type I error rates and should apply appropriate quality filters. Every SNP that reaches genome-wide significance should be validated on another platform to verify that its significance is not an artifact of study design.
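
To put the reported count in context, the following minimal Python sketch computes how many SNPs would be expected to pass the 5 × 10⁻⁸ threshold by chance if the tests were well calibrated, and shows one simple way to compare allele frequencies between two control groups. The abstract does not state which test the authors used; the 1-df allelic chi-square test here, and the names `allelic_chi2_pvalue`, `expected_false_positives`, and `observed_fraction`, are illustrative assumptions rather than the paper's method.

```python
import math

GENOME_WIDE_ALPHA = 5e-8   # significance threshold quoted in the abstract
N_SNPS_TESTED = 2_347_809  # SNPs passing standard imputation filters
N_SNPS_SIGNIFICANT = 9_841 # SNPs significant between the two control groups


def allelic_chi2_pvalue(alt_a, n_a, alt_b, n_b):
    """P-value of a 1-df chi-square test for equal allele frequency.

    Assumption: this allelic test stands in for whatever test the paper used.
    alt_a, alt_b: alternate-allele counts (may be fractional dosage sums).
    n_a, n_b: number of individuals in each group (two alleles each).
    """
    alleles_a, alleles_b = 2 * n_a, 2 * n_b
    pooled = (alt_a + alt_b) / (alleles_a + alleles_b)
    if pooled <= 0.0 or pooled >= 1.0:
        return 1.0  # monomorphic SNP: no evidence of a difference
    diff = alt_a / alleles_a - alt_b / alleles_b
    variance = pooled * (1.0 - pooled) * (1.0 / alleles_a + 1.0 / alleles_b)
    chi2 = diff * diff / variance
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Expected number of chance findings if every test were well calibrated.
expected_false_positives = N_SNPS_TESTED * GENOME_WIDE_ALPHA
# Fraction of SNPs that were actually significant between the control groups.
observed_fraction = N_SNPS_SIGNIFICANT / N_SNPS_TESTED

if __name__ == "__main__":
    print(f"expected by chance: {expected_false_positives:.3f} SNPs")
    print(f"observed: {N_SNPS_SIGNIFICANT} SNPs ({observed_fraction:.2%})")
```

Under this calculation, fewer than one SNP would be expected to reach the threshold by chance across all 2,347,809 tests, so the 9,841 observed are far more than calibrated tests would produce. In practice the function would be applied per SNP to the two control groups, for example `allelic_chi2_pvalue(alt_a, 1672, alt_b, 1038)`, with the allele counts taken from the imputed data.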