Population Stratification Analysis in Genome-Wide Association Studies

  • Erika Salvi
  • Alessandro Orro
  • Guia Guffanti
  • Sara Lupoli
  • Federica Torri
  • Cristina Barlassina
  • Steven Potkin
  • Daniele Cusi
  • Fabio Macciardi
  • Luciano Milanesi


Differences in genetic background between two or more populations are an important source of confounding in case–control association studies. When populations of different ethnic groups are analyzed together, allele frequency differences between case and control samples may reflect ancestry rather than a real association with the disease under study. This can easily produce a large number of false-positive and false-negative results. The problem has become particularly important in recent statistical genetics research because of the growing need to combine data sets from different studies in order to increase statistical power. Several correction strategies have been proposed, but there is currently no consensus on a single, powerful strategy to adjust for population stratification. In this chapter, we discuss the state of the art in strategies that correct genome-wide association statistics by taking the ancestral structure of the population into account. After a short review of the most important methods and tools available, we show the results obtained on two real data sets and discuss the advantages and disadvantages of each algorithm.
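The confounding described above can be illustrated with a small worked example. The sketch below is not taken from the chapter: the two populations, their sample sizes, and their allele counts are hypothetical numbers chosen so that the marker has no association with disease inside either population, yet the pooled sample shows a strong allelic association because the populations differ in both allele frequency and case proportion.

```python
def allelic_chi2(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 allele-count table."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Hypothetical allele counts (two alleles per person).
# Population A: minor allele frequency 0.2; 800 cases, 200 controls.
pop_a = {"case_minor": 320, "case_major": 1280, "ctrl_minor": 80, "ctrl_major": 320}
# Population B: minor allele frequency 0.6; 200 cases, 800 controls.
pop_b = {"case_minor": 240, "case_major": 160, "ctrl_minor": 960, "ctrl_major": 640}

# Pooling ignores ancestry: simply add the counts cell by cell.
pooled = {key: pop_a[key] + pop_b[key] for key in pop_a}

chi2_pop_a = allelic_chi2(**pop_a)     # no association within population A
chi2_pop_b = allelic_chi2(**pop_b)     # no association within population B
chi2_pooled = allelic_chi2(**pooled)   # spurious association after pooling

print(f"Population A: chi2 = {chi2_pop_a:.1f}")
print(f"Population B: chi2 = {chi2_pop_b:.1f}")
print(f"Pooled:       chi2 = {chi2_pooled:.1f}")
```

With these assumed counts, the statistic is zero within each population and 240 in the pooled sample, so the apparent signal comes entirely from ancestry. Analyzing each stratum separately, or adjusting for ancestry in some other way, removes it.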


Keywords: Population Stratification; Population Substructure; Allele Frequency Difference; Genomic Control; Control Association Study



This work was supported by FIRB Italia-Israele [RBIN04SWHR], a fellowship of the Doctorate School of Molecular Medicine, University of Milan, POCEMON [FP7-ICT-2007-216088], HYPERGENES [HEALTH-F4-2007-201550], InGenious HyperCare [LSHM-CT-2006-037093], the Israel Science Foundation [Israel Academy of Sciences, Grant #348/09], Enabling Grids for E-sciencE [INFSO-RI-222667], and the CNR-BIOINFORMATICS, ITALBIONET, and Italian-Canada FIRB-MUR projects.



Copyright information

© Springer New York 2011

Authors and Affiliations

  1. CNR – Institute for Biomedical Technologies, Segrate, Italy
