Genotype imputation performance of three reference panels using African ancestry individuals

  • Candelaria Vergara
  • Margaret M. Parker
  • Liliana Franco
  • Michael H. Cho
  • Ana V. Valencia-Duarte
  • Terri H. Beaty
  • Priya Duggal
Original Investigation

Abstract

Genotype imputation estimates unobserved genotypes from genome-wide markers to increase genome coverage and power for genome-wide association studies. Imputation has been successful for European ancestry populations, for which very large reference panels are available. Smaller subsets of African descent populations are available in 1000 Genomes (1000G), the Consortium on Asthma among African ancestry Populations in the Americas (CAAPA) and the Haplotype Reference Consortium (HRC). We compared the performance of these reference panels when imputing variation in 3747 African Americans (AA) from two cohorts (HCV and COPDGene) genotyped using Illumina Omni microarrays. The haplotypes of 2504 (1000G), 883 (CAAPA) and 32,470 (HRC) individuals were used as reference. We compared the number of variants, imputation quality, imputation accuracy and coverage between panels. In both cohorts, 1000G imputed 1.5–1.6× more variants than CAAPA and 1.2× more than HRC. Similar findings were observed for variants with imputation R² > 0.5 and for rare, low-frequency, and common variants. When the imputed variants of the three panels were merged, the total number was 62–63 M, with 20 M overlapping variants imputed by all three panels and 5–15 M variants imputed exclusively by a single panel. For overlapping variants, imputation quality was highest for HRC, followed by 1000G, then CAAPA, and improved as the minor allele frequency increased. 1000G, HRC and CAAPA provided high performance and accuracy for imputation of African American individuals, increasing the number of variants available for subsequent analyses. These panels are complementary and would benefit from the development of an integrated African reference panel.
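The panel comparison described in the abstract (filtering on imputation R² > 0.5, then counting variants shared by all panels versus variants exclusive to one) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `filter_by_rsq` and `summarize_overlap`, the variant identifiers and the R² values below are all hypothetical and chosen only to show the set logic.

```python
from typing import Dict, Set


def filter_by_rsq(rsq_by_variant: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Return the IDs of variants whose imputation R2 exceeds the threshold."""
    return {vid for vid, rsq in rsq_by_variant.items() if rsq > threshold}


def summarize_overlap(panels: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count merged, shared-by-all and panel-exclusive variants."""
    union = set().union(*panels.values())
    shared = set.intersection(*panels.values())
    summary = {"union": len(union), "shared_by_all": len(shared)}
    for name, variants in panels.items():
        # Everything imputed by any panel other than the current one.
        others = set().union(*(v for n, v in panels.items() if n != name))
        summary[f"only_{name}"] = len(variants - others)
    return summary


# Toy input (made up for illustration, not study data): variant ID -> R2.
toy = {
    "1000G": {"chr1:100:A:G": 0.92, "chr1:200:C:T": 0.61,
              "chr1:300:G:A": 0.40, "chr1:400:T:C": 0.75},
    "CAAPA": {"chr1:100:A:G": 0.85, "chr1:500:A:C": 0.66},
    "HRC": {"chr1:100:A:G": 0.97, "chr1:200:C:T": 0.70,
            "chr1:600:G:T": 0.30},
}

passing = {name: filter_by_rsq(rsq) for name, rsq in toy.items()}
result = summarize_overlap(passing)
print(result)
```

In this sketch, variants are matched across panels by an identifier string of the form chromosome:position:ref:alt; that matching rule is an assumption, and a real comparison would also need to harmonise genome build, strand and allele coding before intersecting.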

Notes

Compliance with ethical standards

Conflict of interest

M.H.C. has received grant support from GSK. The remaining authors declare that they have no conflict of interest.

Supplementary material

Supplementary material 1: 439_2018_1881_MOESM1_ESM.docx (DOCX, 1082 kb)
Supplementary material 2: 439_2018_1881_MOESM2_ESM.docx (DOCX, 14 kb)

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Johns Hopkins University, School of Medicine, Baltimore, USA
  2. Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, USA
  3. National School of Public Health, Universidad de Antioquia, Medellín, Colombia
  4. School of Medicine, Universidad Pontificia Bolivariana, Medellín, Colombia
  5. Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, USA
  6. Johns Hopkins University, Bloomberg School of Public Health, Baltimore, USA
