Human Genetics, Volume 125, Issue 3, pp 295–303

Use of weighted reference panels based on empirical estimates of ancestry for capturing untyped variation

  • Matthew R. L. Egyud
  • Zofia K. Z. Gajdos
  • Johannah L. Butler
  • Sam Tischfield
  • Loic Le Marchand
  • Laurence N. Kolonel
  • Christopher A. Haiman
  • Brian E. Henderson
  • Joel N. Hirschhorn
Original Investigation

Abstract

Many association methods use a subset of genotyped single nucleotide polymorphisms (SNPs) to capture or infer genotypes at other, untyped SNPs. We and others previously showed that tag SNPs selected to capture common variation using HapMap data (The International HapMap Consortium in Nature 437:1299–1320, 2005; The International HapMap Consortium in Nature 449:851–861, 2007) could also capture variation in populations of similar ancestry to the HapMap reference populations (de Bakker et al. in Nat Genet 38:1298–1303, 2006; González-Neira et al. in Genome Res 16:323–330, 2006; Montpetit et al. in PLoS Genet 2:282–290, 2006; Mueller et al. in Am J Hum Genet 76:387–398, 2005). To capture variation in admixed populations or in populations less similar to the HapMap panels, a “cosmopolitan approach,” in which all HapMap samples are used as a single reference panel, was proposed. Here we refine this suggestion and show that a “weighted reference panel,” constructed from empirical estimates of ancestry in the target population (relative to the available reference panels), is more efficient than the cosmopolitan approach. Across the five populations of the Multiethnic Cohort, weighted reference panels capture, on average, only slightly fewer common variants (minor allele frequency > 5%) than the cosmopolitan approach (mean r² = 0.977 vs. 0.989; 94.5% vs. 96.8% of variation captured at r² > 0.8), but require approximately 25% fewer tag SNPs per panel (average 538 vs. 718). These results extend a recent study in two Indian populations (Pemberton et al. in Ann Hum Genet 72:535–546, 2008). Weighted reference panels are potentially useful both for selecting tag SNPs in diverse populations and, perhaps, for designing reference panels for imputation of untyped genotypes in genome-wide association studies of admixed populations.
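The abstract does not spell out how a weighted reference panel is assembled, so the following is only a minimal sketch of one plausible reading: haplotypes are resampled from each available reference panel in proportion to the estimated ancestry of the target population, and pairwise r² is then computed on the pooled panel. The function names, the population labels, the 80/20 ancestry split, and the toy two-SNP haplotypes are all hypothetical and are not taken from the paper.

```python
import random


def panel_counts(ancestry, panel_size):
    """Split panel_size haplotypes across reference panels in proportion
    to the ancestry estimates, using largest-remainder rounding."""
    total = sum(ancestry.values())
    shares = {pop: panel_size * w / total for pop, w in ancestry.items()}
    counts = {pop: int(share) for pop, share in shares.items()}
    leftover = panel_size - sum(counts.values())
    # Give the remaining slots to the populations with the largest fractional parts.
    by_remainder = sorted(shares, key=lambda pop: shares[pop] - counts[pop], reverse=True)
    for pop in by_remainder[:leftover]:
        counts[pop] += 1
    return counts


def weighted_panel(reference, ancestry, panel_size, seed=0):
    """Pool haplotypes from each reference panel according to panel_counts.
    Resampling with replacement is an assumption of this sketch."""
    rng = random.Random(seed)
    panel = []
    for pop, n in panel_counts(ancestry, panel_size).items():
        panel.extend(rng.choices(reference[pop], k=n))
    return panel


def r_squared(panel, i, j):
    """Pairwise r^2 between SNPs i and j from phased 0/1 haplotypes."""
    n = len(panel)
    p_i = sum(h[i] for h in panel) / n
    p_j = sum(h[j] for h in panel) / n
    p_ij = sum(h[i] * h[j] for h in panel) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return (p_ij - p_i * p_j) ** 2 / denom


# Toy data: two hypothetical reference panels, two SNPs per haplotype.
reference = {
    "panel_A": [(0, 0), (1, 1), (0, 0), (1, 1)],
    "panel_B": [(0, 1), (1, 0), (0, 0), (1, 1)],
}
ancestry = {"panel_A": 0.8, "panel_B": 0.2}  # hypothetical ancestry estimates

panel = weighted_panel(reference, ancestry, panel_size=120)
print(panel_counts(ancestry, 120), round(r_squared(panel, 0, 1), 3))
```

Tag SNPs would then be chosen on the pooled panel with any r²-based tagger. The reported efficiency gain follows directly from the abstract's figures: 1 − 538/718 ≈ 0.25, that is, about 25% fewer tag SNPs per panel.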

Supplementary material

Supplementary tables: 439_2009_627_MOESM1_ESM.doc (DOC, 40 kb)

References

  1. Barrett JC, Fry B, Maller J, Daly MJ (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21:263–265
  2. Bersaglieri T, Sabeti PC, Patterson N, Vanderploeg T, Schaffner SF, Drake JA, Rhodes M, Reich DE, Hirschhorn JN (2004) Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet 74:1111–1120
  3. Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A, Chen Z, Chu J, Carcassi C, Contu L, Du R, Excoffier L, Ferrara GB, Friedlaender JS, Groot H, Gurwitz D, Jenkins T, Herrera RJ, Huang X, Kidd J, Kidd KK, Langaney A, Lin AA, Mehdi SQ, Parham P, Piazza A, Pistillo MP, Qian Y, Shu Q, Xu J, Zhu S, Weber JL, Greely HT, Feldman MW, Thomas G, Dausset J, Cavalli-Sforza LL (2002) A human genome diversity cell line panel. Science 296:261–262
  4. Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, Nickerson DA (2004) Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet 74:106–120
  5. Chapman JM, Cooper JD, Todd JA, Clayton DG (2003) Detecting disease associations due to linkage disequilibrium using haplotype tags: a class of tests and determinants of statistical power. Hum Hered 56:18–31
  6. Clayton D, Chapman J, Cooper J (2004) Use of unphased multilocus genotype data in indirect association studies. Genet Epidemiol 27:415–428
  7. Coriell Institute for Medical Research. Available at http://ccr.coriell.org/Sections/Collections/NHGRI/hapmap.aspx?PgId=266 Accessed 25 Jun 2008
  8. de Bakker PI, Yelensky R, Pe’er I, Gabriel SB, Daly MJ, Altshuler D (2005) Efficiency and power in genetic association studies. Nat Genet 37:1217–1223
  9. de Bakker PI, Burtt N, Graham RR, Guiducci C, Yelensky R, Drake JA, Bersaglieri T, Penney KL, Butler J, Young S, Onofrio RC, Lyon HN, Stram DO, Haiman CA, Freedman ML, Zhu X, Cooper R, Groop L, Kolonel LN, Henderson BE, Daly MJ, Hirschhorn JN, Altshuler D (2006) Transferability of tag SNPs in genetic association studies in multiple populations. Nat Genet 38:1298–1303
  10. González-Neira A, Ke X, Lao O, Calafell F, Navarro A, Comas D, Cann H, Bumpstead S, Ghori J, Hunt S, Deloukas P, Dunham I, Cardon LR, Bertranpetit J (2006) The portability of tag SNPs across populations: a worldwide survey. Genome Res 16:323–330
  11. Haiman CA, Patterson N, Freedman ML, Myers SR, Pike MC, Waliszewska A, Neubauer J, Tandon A, Schirmer C, McDonald GJ, Greenway SC, Stram DO, Le Marchand L, Kolonel LN, Frasco M, Wong D, Pooler LC, Ardlie K, Oakley-Girvan I, Whittemore AS, Cooney KA, John EM, Ingles SA, Altshuler D, Henderson BE, Reich D (2007) Multiple regions within 8q24 independently affect risk for prostate cancer. Nat Genet 39:638–644
  12. Haploview (2008) Available at http://www.broad.mit.edu/mpg/haploview/. Accessed 08 Jun 2008
  13. Johnson GC, Esposito L, Barratt BJ, Smith AN, Heward J, Di Genova G, Ueda H, Cordell HJ, Eaves IA, Dudbridge F, Twells RC, Payne F, Hughes W, Nutland S, Stevens H, Carr P, Tuomilehto-Wolf E, Tuomilehto J, Gough SC, Clayton DG, Todd JA (2001) Haplotype tagging for the identification of common disease genes. Nat Genet 29:233–237
  14. Kolonel LN, Henderson B, Hankin JH, Nomura AM, Wilkens LR, Pike MC, Stram DO, Monroe KR, Earle ME, Nagamine FS (2000) A Multiethnic Cohort in Hawaii and Los Angeles: baseline characteristics. Am J Epidemiol 151:346–357
  15. Li Y, Willer CJ, Ding J, Sheet P, Abecasis GR (2008) Rapid Markov chain haplotyping and genotype inference (Submitted)
  16. Marchini J, Howie B, Myers S, McVean G, Donnelly P (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39:906–913
  17. Maresso K, Broeckel U (2008) Genotyping platforms for mass-throughput genotyping with SNPs, including human genome-wide scans. Adv Genet 60:107–139
  18. Montpetit A, Nelis M, Laflamme P, Magi R, Ke X, Remm M, Cardon L, Hudson TJ, Metspalu A (2006) An evaluation of the performance of tag SNPs derived from HapMap in a Caucasian population. PLoS Genet 2:282–290
  19. Mueller JC, Lõhmussaar E, Mägi R, Remm M, Bettecken T, Lichtner P, Biskup S, Illig T, Pfeufer A, Luedemann J, Schreiber S, Pramstaller P, Pichler I, Romeo G, Gaddi A, Testa A, Wichmann HE, Metspalu A, Meitinger T (2005) Linkage disequilibrium patterns and tag SNP transferability among European populations. Am J Hum Genet 76:387–398
  20. Pemberton TJ, Jakobsson M, Conrad DF, Coop G, Wall JD, Pritchard JK, Patel PI, Rosenberg NA (2008) Using population mixtures to optimize the utility of genomic databases: linkage disequilibrium and association study design in India. Ann Hum Genet 72:535–546
  21. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959
  22. Sequenom (2008) Available at http://www.sequenom.com. Accessed 12 Jul 2008
  23. Smith MW, Patterson N, Lautenberger JA, Truelove AL, McDonald GJ, Waliszewska A, Kessing BD, Malasky MJ, Scafe C, Le E, De Jager PL, Mignault AA, Yi Z, De The G, Essex M, Sankale JL, Moore JH, Poku K, Phair JP, Goedert JJ, Vlahov D, Williams SM, Tishkoff SA, Winkler CA, De La Vega FM, Woodage T, Sninsky JJ, Hafler DA, Altshuler D, Gilbert DA, O’Brien SJ, Reich D (2004) A high-density admixture map for disease gene discovery in African Americans. Am J Hum Genet 74:1001–1013
  24. The International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437:1299–1320
  25. The International HapMap Consortium (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449:851–861
  26. Tian C, Hinds D, Shigeta R, Kittles R, Ballinger D, Seldin M (2006) A genome-wide single-nucleotide-polymorphism panel with high ancestry information for African American admixture mapping. Am J Hum Genet 79:640–649
  27. Zeggini E, Scott L, Saxena R, Voight BF, Marchini JL, Hu T, de Bakker PI, Abecasis GR, Almgren P, Andersen G, Ardlie K, Boström KB, Bergman RN, Bonnycastle LL, Borch-Johnsen K, Burtt NP, Chen H, Chines PS, Daly MJ, Deodhar P, Ding CJ, Doney AS, Duren WL, Elliott KS, Erdos MR, Frayling TM, Freathy RM, Gianniny L, Grallert H, Grarup N, Groves CJ, Guiducci C, Hansen T, Herder C, Hitman GA, Hughes TE, Isomaa B, Jackson AU, Jørgensen T, Kong A, Kubalanza K, Kuruvilla FG, Kuusisto J, Langenberg C, Lango H, Lauritzen T, Li Y, Lindgren CM, Lyssenko V, Marvelle AF, Meisinger C, Midthjell K, Mohlke KL, Morken MA, Morris AD, Narisu N, Nilsson P, Owen KR, Palmer CN, Payne F, Perry JR, Pettersen E, Platou C, Prokopenko I, Qi L, Qin L, Rayner NW, Rees M, Roix JJ, Sandbaek A, Shields B, Sjögren M, Steinthorsdottir V, Stringham HM, Swift AJ, Thorleifsson G, Thorsteinsdottir U, Timpson NJ, Tuomi T, Tuomilehto J, Walker M, Watanabe RM, Weedon MN, Willer CJ, Wellcome Trust Case Control Consortium, Illig T, Hveem K, Hu FB, Laakso M, Stefansson K, Pedersen O, Wareham NJ, Barroso I, Hattersley AT, Collins FS, Groop L, McCarthy MI, Boehnke M, Altshuler D (2008) Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet 40:638–645

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • Matthew R. L. Egyud (1, 2)
  • Zofia K. Z. Gajdos (1, 3, 4)
  • Johannah L. Butler (1, 3)
  • Sam Tischfield (1, 3)
  • Loic Le Marchand (5)
  • Laurence N. Kolonel (5)
  • Christopher A. Haiman (6)
  • Brian E. Henderson (6)
  • Joel N. Hirschhorn (1, 3, 4)
  1. Program in Genomics and Division of Endocrinology, Children’s Hospital, Boston, USA
  2. Boston University School of Medicine, Boston, USA
  3. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, USA
  4. Department of Genetics, Harvard Medical School, Boston, USA
  5. Cancer Research Center, University of Hawaii, Honolulu, USA
  6. Department of Preventative Medicine, Keck School of Medicine, University of Southern California, Los Angeles, USA
