
Analysis of Population Structure

Chapter in Human Population Genomics

Abstract

For humans, as for any sexually reproducing diploid organism, mating may be random in the sense that any two individuals are equally likely to mate and produce offspring. Such a view of a population, often called panmixia, has been important in population genetics as a basis for modeling and analysis. Population structure denotes any deviation from panmixia, regardless of its cause. In this chapter, we briefly discuss random mating, populations, and population structure, and review various methods and practices for inferring population structure among individuals from empirical genome-wide data.
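As a minimal illustration of how population structure shows up as a departure from panmixia, the sketch below computes Nei's G_ST = (H_T - H_S) / H_T for a single biallelic locus from subpopulation allele frequencies. If differentiated subpopulations are pooled and treated as one randomly mating population, the Hardy-Weinberg heterozygosity expected from the pooled frequency (H_T) exceeds the average heterozygosity expected within subpopulations (H_S). The function names, inputs, and equal-weight default are illustrative assumptions, not a method taken from the chapter; real analyses of sampled genotypes would use a sample-size-corrected estimator such as Weir and Cockerham's rather than these population-level frequencies.

```python
from typing import Optional, Sequence


def expected_heterozygosity(p: float) -> float:
    """Hardy-Weinberg expected heterozygosity 2p(1 - p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def nei_gst(freqs: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs   -- frequency of one allele in each subpopulation (illustrative input)
    weights -- relative subpopulation sizes; equal weights if omitted (an assumption)
    """
    k = len(freqs)
    if weights is None:
        weights = [1.0] * k
    total = float(sum(weights))
    w = [x / total for x in weights]  # normalise so the weights sum to 1

    # H_T: heterozygosity expected if all subpopulations mated at random as one
    p_bar = sum(wi * pi for wi, pi in zip(w, freqs))
    h_t = expected_heterozygosity(p_bar)

    # H_S: weighted average of the heterozygosity expected within each subpopulation
    h_s = sum(wi * expected_heterozygosity(pi) for wi, pi in zip(w, freqs))

    # A monomorphic locus carries no information about structure
    if h_t == 0.0:
        return 0.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Identical allele frequencies: no structure at this locus, G_ST is zero
    print(nei_gst([0.3, 0.3]))
    # Strongly diverged subpopulations: G_ST is about 0.64
    print(nei_gst([0.1, 0.9]))
```

The heterozygote deficit H_T - H_S equals twice the weighted variance of allele frequencies among subpopulations (the Wahlund effect), so for a polymorphic locus G_ST is zero only when all subpopulations share the same allele frequency.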

Author information

Correspondence to Mattias Jakobsson.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Sjödin, P., Gattepaille, L., Skoglund, P., Schlebusch, C., Jakobsson, M. (2021). Analysis of Population Structure. In: Lohmueller, K.E., Nielsen, R. (eds) Human Population Genomics. Springer, Cham. https://doi.org/10.1007/978-3-030-61646-5_3
