Introduction to Population Genomics Methods

  • Protocol
Molecular Plant Taxonomy

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2222))

Abstract

High-throughput sequencing technologies have provided an unprecedented opportunity to study the different evolutionary forces that have shaped present-day patterns of genetic diversity, with important implications for many directions in plant biology research. To manage such massive quantities of sequencing data, however, biologists need additional skills in informatics and statistics. In this chapter, our objective is to introduce population genomics methods to beginners, following a learning-by-doing strategy that helps readers analyze sequencing data by themselves. The analyses cover several main areas of evolutionary biology, such as an initial description of the evolutionary history of a given species or the identification of genes targeted by natural or artificial selection. In addition to practical advice, we reanalyzed two case studies with different kinds of data: a domesticated cereal (African rice) and a non-domesticated tree species (sessile oak). All the code needed to replicate this work is publicly available on GitHub (https://github.com/ThibaultLeroyFr/Intro2PopGenomics/).

Acknowledgments

The analyses benefited from the Montpellier Bioinformatics Biodiversity (MBB) platform services, the genotoul bioinformatics platform Toulouse Midi-Pyrenees (Bioinfo Genotoul), the Bird Platform of the University of Nantes and Compute Canada (Graham servers). This work draws on a diverse range of research contributions and projects we carried out over the last 5 years. During this period, TL was supported by different postdoctoral fellowships from the French Agence Nationale de la Recherche (ANR, Genoak project, PI: Christophe Plomion, 11-BSV6-009-021 and BirdIslandGenomic, PI: Benoit Nabholz, ANR-14-CE02-0002), from the European Research Council (ERC, Treepeace, PI: Antoine Kremer, Grant Agreement no. 339728), and from the University of Vienna, Austria (PI: Christian Lexer). QR was supported by the government of Canada through Genome Canada, Genome British Columbia and Genome Quebec. QR wants to thank Louis Bernatchez for the opportunity to develop various projects during his postdoctoral research. We want to thank Jean-Marc Aury, Antoine Kremer, and Christophe Plomion for providing access to the oak sequencing data. We also thank Philippe Vigouroux and Philippe Cubry for information concerning the African rice data and Pierre-Alexandre Gagnaire and Nicolas Bierne for discussions on TreeMix. This book chapter is dedicated to Prof. Christian Lexer, who, throughout his career, greatly advanced our knowledge of population genomics and evolutionary botany.

Author information

Corresponding author

Correspondence to Thibault Leroy.

Copyright information

© 2021 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Leroy, T., Rougemont, Q. (2021). Introduction to Population Genomics Methods. In: Besse, P. (eds) Molecular Plant Taxonomy. Methods in Molecular Biology, vol 2222. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0997-2_16

  • DOI: https://doi.org/10.1007/978-1-0716-0997-2_16

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-0996-5

  • Online ISBN: 978-1-0716-0997-2

  • eBook Packages: Springer Protocols
