Human Genetics, Volume 128, Issue 2, pp 165–177

Self-reported ethnicity, genetic structure and the impact of population stratification in a multiethnic study

  • Hansong Wang
  • Christopher A. Haiman
  • Laurence N. Kolonel
  • Brian E. Henderson
  • Lynne R. Wilkens
  • Loïc Le Marchand
  • Daniel O. Stram
Original Investigation

Abstract

It is well known that population substructure may lead to confounding in case–control association studies. Here, we examined genetic structure in a large racially and ethnically diverse sample consisting of five ethnic groups of the Multiethnic Cohort study (African Americans, Japanese Americans, Latinos, European Americans and Native Hawaiians) using 2,509 SNPs distributed across the genome. Principal component analysis on 6,213 study participants, 18 Native Americans and 11 HapMap III populations revealed four important principal components (PCs): the first two separated Asians, Europeans and Africans, and the third and fourth corresponded to Native American and Native Hawaiian (Polynesian) ancestry, respectively. Individual ethnic composition derived from self-reported parental information matched well to genetic ancestry for Japanese and European Americans. STRUCTURE-estimated individual ancestral proportions for African Americans and Latinos are consistent with previous reports. We quantified the East Asian (mean 27%), European (mean 27%) and Polynesian (mean 46%) ancestral proportions for the first time, to our knowledge, for Native Hawaiians. Simulations based on realistic settings of case–control studies nested in the Multiethnic Cohort found that the effect of population stratification was modest and readily corrected by adjusting for race/ethnicity or by adjusting for top PCs derived from all SNPs or from ancestry informative markers; the power of these approaches was similar when averaged across causal variants simulated based on allele frequencies of the 2,509 genotyped markers. The bias may be large in case-only analysis of gene-by-gene interactions, but it can be corrected by top PCs derived from all SNPs.

Supplementary material

Supplementary material 1: 439_2010_841_MOESM1_ESM.pdf (PDF, 344 kb)

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Hansong Wang (1)
  • Christopher A. Haiman (2)
  • Laurence N. Kolonel (1)
  • Brian E. Henderson (2)
  • Lynne R. Wilkens (1)
  • Loïc Le Marchand (1)
  • Daniel O. Stram (3)

  1. Epidemiology Program, Cancer Research Center of Hawaii, University of Hawaii, Honolulu, USA
  2. Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, USA
  3. Division of Biostatistics and Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, USA
