Genotype imputation performance of three reference panels using African ancestry individuals
Genotype imputation estimates unobserved genotypes from genome-wide markers, increasing genome coverage and power for genome-wide association studies. Imputation has been successful for European ancestry populations, for which very large reference panels are available. Far fewer individuals of African descent are represented in the available panels: 1000 Genomes (1000G), the Consortium on Asthma among African ancestry Populations in the Americas (CAAPA) and the Haplotype Reference Consortium (HRC). We compared the performance of these reference panels when imputing variation in 3747 African Americans (AA) from two cohorts (HCV and COPDGene) genotyped using Illumina Omni microarrays. The haplotypes of 2504 (1000G), 883 (CAAPA) and 32,470 (HRC) individuals were used as reference. We compared the number of imputed variants, imputation quality, imputation accuracy and coverage between panels. In both cohorts, 1000G imputed 1.5–1.6× more variants than CAAPA and 1.2× more than HRC. Similar findings were observed for variants with imputation R² > 0.5 and for rare, low-frequency and common variants. When the imputed variants of the three panels were merged, the total number was 62–63 M, with 20 M overlapping variants imputed by all three panels and 5–15 M variants imputed exclusively by a single panel. For overlapping variants, imputation quality was highest for HRC, followed by 1000G, then CAAPA, and improved as the minor allele frequency increased. 1000G, HRC and CAAPA provided high performance and accuracy for imputation of African American individuals, increasing the number of variants available for subsequent analyses. These panels are complementary, and imputation in African ancestry populations would benefit from the development of an integrated African reference panel.
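The panel comparison summarised above rests on a few simple operations: filtering imputed variants by an imputation-quality threshold, binning them by minor allele frequency, and intersecting variant sets across panels to find shared and panel-exclusive variants. The Python sketch below illustrates these steps under stated assumptions; it is not the pipeline used in this study. The R² estimator follows the MaCH/Minimac-style definition (observed variance of imputed allele dosages divided by the variance expected if alleles were known), which is an assumption because the imputation software is not named here. The 1% and 5% cut-offs for rare, low-frequency and common variants, the `chrom:pos:ref:alt` variant IDs, the function names and the toy data are all hypothetical.

```python
from collections import Counter


def minimac_rsq(hap_dosages):
    """Imputation R² in the style of MaCH/Minimac (assumed, not confirmed here).

    hap_dosages: imputed probability of the alternate allele, one value per
    haplotype (two per individual). R² is the observed variance of these
    dosages divided by p(1 - p), the variance expected if alleles were known.
    """
    n = len(hap_dosages)
    p = sum(hap_dosages) / n
    if p <= 0.0 or p >= 1.0:
        return 0.0  # monomorphic in the imputed data: R² is undefined, report 0
    observed_var = sum((h - p) ** 2 for h in hap_dosages) / n
    return observed_var / (p * (1.0 - p))


def maf_class(alt_freq):
    """Assign a frequency bin; the 1% and 5% cut-offs are assumed conventions."""
    maf = min(alt_freq, 1.0 - alt_freq)
    if maf < 0.01:
        return "rare"
    if maf < 0.05:
        return "low-frequency"
    return "common"


def compare_panels(panels, min_rsq=None):
    """Count imputed variants per panel, their union, and their overlaps.

    panels: {panel_name: {variant_id: (alt_freq, rsq)}} (hypothetical layout).
    Variant IDs are assumed to be harmonised across panels.
    min_rsq: keep only variants with R² strictly above this value; None keeps all.
    """
    kept = {
        name: {vid for vid, (_, rsq) in variants.items()
               if min_rsq is None or rsq > min_rsq}
        for name, variants in panels.items()
    }
    union = set().union(*kept.values())
    shared = set.intersection(*kept.values())
    # Variants kept by one panel and by none of the others.
    exclusive = {
        name: ids - set().union(*(other for o, other in kept.items() if o != name))
        for name, ids in kept.items()
    }
    # Frequency-bin counts of the kept variants, using each panel's own frequency.
    by_maf = {
        name: Counter(maf_class(panels[name][vid][0]) for vid in ids)
        for name, ids in kept.items()
    }
    return {
        "per_panel": {name: len(ids) for name, ids in kept.items()},
        "union": len(union),
        "shared_by_all": len(shared),
        "exclusive": {name: len(ids) for name, ids in exclusive.items()},
        "by_maf": by_maf,
    }


# Hypothetical toy data: {variant_id: (alt allele frequency, imputation R²)}
toy_panels = {
    "1000G": {"1:100:A:G": (0.30, 0.95), "1:200:C:T": (0.02, 0.60),
              "1:300:G:A": (0.004, 0.40), "1:400:T:C": (0.50, 0.90)},
    "CAAPA": {"1:100:A:G": (0.31, 0.85), "1:200:C:T": (0.02, 0.55)},
    "HRC": {"1:100:A:G": (0.29, 0.97), "1:400:T:C": (0.49, 0.92),
            "1:500:A:C": (0.008, 0.70)},
}

summary = compare_panels(toy_panels, min_rsq=0.5)
print(summary["per_panel"])  # {'1000G': 3, 'CAAPA': 2, 'HRC': 3}
print(summary["union"], summary["shared_by_all"])  # 4 1
# Fold difference in variant counts, as in "1.5× more variants than CAAPA"
print(summary["per_panel"]["1000G"] / summary["per_panel"]["CAAPA"])  # 1.5
```

Matching on harmonised variant IDs matters in practice: panels on different genome builds or with different allele coding would need liftover and allele alignment before intersection counts are meaningful.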
Compliance with ethical standards
Conflict of interest
M.H.C. has received grant support from GSK. The remaining authors declare that they have no conflict of interest.