Self-reported ethnicity, genetic structure and the impact of population stratification in a multiethnic study
It is well-known that population substructure may lead to confounding in case–control association studies. Here, we examined genetic structure in a large racially and ethnically diverse sample consisting of five ethnic groups of the Multiethnic Cohort study (African Americans, Japanese Americans, Latinos, European Americans and Native Hawaiians) using 2,509 SNPs distributed across the genome. Principal component analysis on 6,213 study participants, 18 Native Americans and 11 HapMap III populations revealed four important principal components (PCs): the first two separated Asians, Europeans and Africans, and the third and fourth corresponded to Native American and Native Hawaiian (Polynesian) ancestry, respectively. Individual ethnic composition derived from self-reported parental information matched well to genetic ancestry for Japanese and European Americans. STRUCTURE-estimated individual ancestral proportions for African Americans and Latinos are consistent with previous reports. We quantified the East Asian (mean 27%), European (mean 27%) and Polynesian (mean 46%) ancestral proportions for the first time, to our knowledge, for Native Hawaiians. Simulations based on realistic settings of case–control studies nested in the Multiethnic Cohort found that the effect of population stratification was modest and readily corrected by adjusting for race/ethnicity or by adjusting for top PCs derived from all SNPs or from ancestry informative markers; the power of these approaches was similar when averaged across causal variants simulated based on allele frequencies of the 2,509 genotyped markers. The bias may be large in case-only analysis of gene by gene interactions but it can be corrected by top PCs derived from all SNPs.
KeywordsAfrican American European Ancestry African American Admixture Mapping Ancestry Informative Marker
We thank the researchers and participants of the Multiethnic Cohort study. We also thank Dr. David Van Den Berg, Loreall Pooled and Xin Sheng for their technical assistance in genotyping as well as Dr. Gary Chen and Christian Caberto for help in genotype pre-processing and data management. This work was supported by grants from the National Institutes of Health (R37CA54281, P01CA33619, R01CA63464, and U01CA98758).
Conflict of interest statement
The authors declare that they have no conflict of interest.
- Beechert ED (1985) Working in Hawaii: a labor history. University of Hawaii Press, HonoluluGoogle Scholar
- Fejerman L, Haiman CA, Reich D, Tandon A, Deo RC, John EM, Ingles SA, Ambrosone CB, Bovbjerg DH, Jandorf LH, Davis W, Ciupak G, Whittemore AS, Press MF, Ursin G, Bernstein L, Huntsman S, Henderson BE, Ziv E, Freedman ML (2009) An admixture scan in 1, 484 African American women with breast cancer. Cancer Epidemiol Biomarkers Prev 18:3110–3117CrossRefPubMedGoogle Scholar
- Freedman ML, Haiman CA, Patterson N, McDonald GJ, Tandon A, Waliszewska A, Penney K, Steen RG, Ardlie K, John EM, Oakley-Girvan I, Whittemore AS, Cooney KA, Ingles SA, Altshuler D, Henderson BE, Reich D (2006) Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men. Proc Natl Acad Sci USA 103:14068–14073CrossRefPubMedGoogle Scholar
- Haiman CA, Patterson N, Freedman ML, Myers SR, Pike MC, Waliszewska A, Neubauer J, Tandon A, Schirmer C, McDonald GJ, Greenway SC, Stram DO, Le Marchand L, Kolonel LN, Frasco M, Wong D, Pooler LC, Ardlie K, Oakley-Girvan I, Whittemore AS, Cooney KA, John EM, Ingles SA, Altshuler D, Henderson BE, Reich D (2007) Multiple regions within 8q24 independently affect risk for prostate cancer. Nat Genet 39:638–644CrossRefPubMedGoogle Scholar
- Haiman CA, Hsu C, de Bakker PIW, Frasco M, Sheng X, Van Den Berg D, Casagrande JT, Kolonel LN, Le Marchand L, Hankinson SE, Han J, Dunning AM, Pooley KA, Freedman ML, Hunter DJ, Wu AH, Stram DO, Henderson BE (2008) Comprehensive association testing of common genetic variation in DNA repair pathway genes in relationship with breast cancer risk in multiple populations. Hum Mol Genet 17:825–834CrossRefPubMedGoogle Scholar
- Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung HC, Szpiech ZA, Degnan JH, Wang K, Guerreiro R, Bras JM, Schymick JC, Hernandez DG, Traynor BJ, Simon-Sanchez J, Matarin M, Britton A, van de Leemput J, Rafferty I, Bucan M, Cann HM, Hardy JA, Rosenberg NA, Singleton AB (2008) Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451:998–1003CrossRefPubMedGoogle Scholar
- Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, New YorkGoogle Scholar
- Kosoy R, Nassir R, Tian C, White PA, Butler LM, Silva G, Kittles R, Alarcon-Riquelme ME, Gregersen PK, Belmont JW, De La Vega FM, Seldin MF (2009) Ancestry informative marker sets for determining continental origin and admixture proportions in common populations in America. Hum Mutat 30:69–78CrossRefPubMedGoogle Scholar
- Martinez-Marignac VL, Valladares A, Cameron E, Chan A, Perera A, Globus-Goldberg R, Wacher N, Kumate J, McKeigue P, O’Donnell D, Shriver MD, Cruz M, Parra EJ (2007) Admixture in Mexico City: implications for admixture mapping of type 2 diabetes genetic risk factors. Hum Genet 120:807–819CrossRefPubMedGoogle Scholar
- Nordyke EC (1989) The peopling of Hawaii. University Press of Hawaii, HonoluluGoogle Scholar
- Plenge RM, Seielstad M, Padyukov L, Lee AT, Remmers EF, Ding B, Liew A, Khalili H, Chandrasekaran A, Davies LR, Li W, Tan AK, Bonnard C, Ong RT, Thalamuthu A, Pettersson S, Liu C, Tian C, Chen WV, Carulli JP, Beckman EM, Altshuler D, Alfredsson L, Criswell LA, Amos CI, Seldin MF, Kastner DL, Klareskog L, Gregersen PK (2007) TRAF1-C5 as a risk locus for rheumatoid arthritis—a genomewide study. N Engl J Med 357:1199–1209CrossRefPubMedGoogle Scholar
- Price AL, Patterson N, Yu F, Cox DR, Waliszewska A, McDonald GJ, Tandon A, Schirmer C, Neubauer J, Bedoya G, Duque C, Villegas A, Bortolini MC, Salzano FM, Gallo C, Mazzotti G, Tello-Ruiz M, Riba L, Aguilar-Salinas CA, Canizales-Quinteros S, Menjivar M, Klitz W, Henderson B, Haiman CA, Winkler C, Tusie-Luna T, Ruiz-Linares A, Reich D (2007) A genomewide admixture map for Latino populations. Am J Hum Genet 80:1024–1036CrossRefPubMedGoogle Scholar
- Reich D, Patterson N, De Jager PL, McDonald GJ, Waliszewska A, Tandon A, Lincoln RR, DeLoa C, Fruhan SA, Cabre P, Bera O, Semana G, Kelly MA, Francis DA, Ardlie K, Khan O, Cree BA, Hauser SL, Oksenberg JR, Hafler DA (2005) A whole-genome admixture scan finds a candidate locus for multiple sclerosis susceptibility. Nat Genet 37:1113–1118CrossRefPubMedGoogle Scholar
- Silva-Zolezzi I, Hidalgo-Miranda A, Estrada-Gil J, Fernandez-Lopez JC, Uribe-Figueroa L, Contreras A, Balam-Ortiz E, del Bosque-Plata L, Velazquez-Fernandez D, Lara C, Goya R, Hernandez-Lemus E, Davila C, Barrientos E, March S, Jimenez-Sanchez G (2009) Analysis of genomic diversity in Mexican Mestizo populations to develop genomic medicine in Mexico. Proc Natl Acad Sci USA 106:8611–8616CrossRefPubMedGoogle Scholar
- Smith MW, Patterson N, Lautenberger JA, Truelove AL, McDonald GJ, Waliszewska A, Kessing BD, Malasky MJ, Scafe C, Le E, De Jager PL, Mignault AA, Yi Z, De The G, Essex M, Sankale JL, Moore JH, Poku K, Phair JP, Goedert JJ, Vlahov D, Williams SM, Tishkoff SA, Winkler CA, De La Vega FM, Woodage T, Sninsky JJ, Hafler DA, Altshuler D, Gilbert DA, O’Brien SJ, Reich D (2004) A high-density admixture map for disease gene discovery in african americans. Am J Hum Genet 74:1001–1013CrossRefPubMedGoogle Scholar
- Wacholder S, Rothman N, Caporaso N (2002) Counterpoint: bias from population stratification is not a major threat to the validity of conclusions from epidemiological studies of common polymorphisms and cancer. Cancer Epidemiol Biomarkers Prev 11:512–520Google Scholar
- Yeager M, Chatterjee N, Ciampa J, Jacobs KB, Gonzalez-Bosquet J, Hayes RB, Kraft P, Wacholder S, Orr N, Berndt S, Yu K, Hutchinson A, Wang Z, Amundadottir L, Feigelson HS, Thun MJ, Diver WR, Albanes D, Virtamo J, Weinstein S, Schumacher FR, Cancel-Tassin G, Cussenot O, Valeri A, Andriole GL, Crawford ED, Haiman CA, Henderson B, Kolonel L, Le Marchand L, Siddiq A, Riboli E, Key TJ, Kaaks R, Isaacs W, Isaacs S, Wiley KE, Gronberg H, Wiklund F, Stattin P, Xu J, Zheng SL, Sun J, Vatten LJ, Hveem K, Kumle M, Tucker M, Gerhard DS, Hoover RN, Fraumeni JF Jr, Hunter DJ, Thomas G, Chanock SJ (2009) Identification of a new prostate cancer susceptibility locus on chromosome 8q24. Nat Genet 41:1055–1057CrossRefPubMedGoogle Scholar