Human Genetics, Volume 128, Issue 2, pp 165–177

Self-reported ethnicity, genetic structure and the impact of population stratification in a multiethnic study

  • Hansong Wang
  • Christopher A. Haiman
  • Laurence N. Kolonel
  • Brian E. Henderson
  • Lynne R. Wilkens
  • Loïc Le Marchand
  • Daniel O. Stram
Original Investigation

DOI: 10.1007/s00439-010-0841-4

Cite this article as:
Wang, H., Haiman, C.A., Kolonel, L.N. et al. Hum Genet (2010) 128: 165. doi:10.1007/s00439-010-0841-4

Abstract

It is well known that population substructure may lead to confounding in case–control association studies. Here, we examined genetic structure in a large racially and ethnically diverse sample consisting of five ethnic groups of the Multiethnic Cohort study (African Americans, Japanese Americans, Latinos, European Americans and Native Hawaiians) using 2,509 SNPs distributed across the genome. Principal component analysis on 6,213 study participants, 18 Native Americans and 11 HapMap III populations revealed four important principal components (PCs): the first two separated Asians, Europeans and Africans, and the third and fourth corresponded to Native American and Native Hawaiian (Polynesian) ancestry, respectively. Individual ethnic composition derived from self-reported parental information matched genetic ancestry well for Japanese and European Americans. STRUCTURE-estimated individual ancestral proportions for African Americans and Latinos are consistent with previous reports. We quantified the East Asian (mean 27%), European (mean 27%) and Polynesian (mean 46%) ancestral proportions for the first time, to our knowledge, for Native Hawaiians. Simulations based on realistic settings of case–control studies nested in the Multiethnic Cohort found that the effect of population stratification was modest and readily corrected by adjusting for race/ethnicity or by adjusting for top PCs derived from all SNPs or from ancestry-informative markers; the power of these approaches was similar when averaged across causal variants simulated based on allele frequencies of the 2,509 genotyped markers. The bias may be large in case-only analysis of gene-by-gene interactions, but it can be corrected by adjusting for top PCs derived from all SNPs.
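The idea of recovering ancestry from the top principal components of a genotype matrix can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are simulated, the two populations and their allele frequencies are invented for illustration, and the helper names `simulate_genotypes` and `top_pcs` are hypothetical. It assumes only NumPy and shows that, when allele frequencies differ between groups, the first PC tends to separate them.

```python
import numpy as np

def simulate_genotypes(rng, n_per_pop, freqs):
    """Draw 0/1/2 genotypes for each population from its allele frequencies."""
    blocks = [rng.binomial(2, p, size=(n_per_pop, p.size)) for p in freqs]
    labels = np.repeat(np.arange(len(freqs)), n_per_pop)
    return np.vstack(blocks).astype(float), labels

def top_pcs(genotypes, k):
    """Return the top-k PC scores of a column-standardized genotype matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    sd = centered.std(axis=0)
    keep = sd > 0  # drop markers with no variation in the sample
    z = centered[:, keep] / sd[keep]
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]

rng = np.random.default_rng(0)
n_snps = 500  # illustrative only; far fewer than the 2,509 SNPs in the study

# Assumption: two hypothetical populations whose allele frequencies
# are random perturbations of a shared baseline.
base = rng.uniform(0.1, 0.9, n_snps)
freqs = [np.clip(base + rng.normal(0, 0.1, n_snps), 0.05, 0.95) for _ in range(2)]

G, labels = simulate_genotypes(rng, 200, freqs)
pcs = top_pcs(G, 2)

# Compare the mean PC1 score within each simulated population.
for pop in (0, 1):
    print(pop, pcs[labels == pop, 0].mean())
```

In an association analysis, scores such as `pcs` would typically be included as covariates in the regression model, which is the sense in which the abstract speaks of adjusting for top PCs.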

Supplementary material

Supplementary material 1: 439_2010_841_MOESM1_ESM.pdf (PDF, 344 kb)

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Hansong Wang (1)
  • Christopher A. Haiman (2)
  • Laurence N. Kolonel (1)
  • Brian E. Henderson (2)
  • Lynne R. Wilkens (1)
  • Loïc Le Marchand (1)
  • Daniel O. Stram (3)
  1. Epidemiology Program, Cancer Research Center of Hawaii, University of Hawaii, Honolulu, USA
  2. Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, USA
  3. Division of Biostatistics and Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, USA