Abstract
High-throughput sequencing technologies have provided an unprecedented opportunity to study the various evolutionary forces that have shaped present-day patterns of genetic diversity, with important implications for many areas of plant biology research. To manage such massive quantities of sequencing data, however, biologists need new skills in informatics and statistics. In this chapter, our objective is to introduce population genomics methods to beginners through a learning-by-doing strategy, to help readers analyze sequencing data by themselves. The analyses cover several major areas of evolutionary biology, such as an initial description of the evolutionary history of a given species or the identification of genes targeted by natural or artificial selection. In addition to practical advice, we re-analyze two case studies with different kinds of data: a domesticated cereal (African rice) and a non-domesticated tree species (sessile oak). All the code needed to replicate this work is publicly available on GitHub (https://github.com/ThibaultLeroyFr/Intro2PopGenomics/).
Acknowledgments
The analyses benefited from the Montpellier Bioinformatics Biodiversity (MBB) platform services, the genotoul bioinformatics platform Toulouse Midi-Pyrenees (Bioinfo Genotoul), the Bird Platform of the University of Nantes and Compute Canada (Graham servers). This work draws on a diverse range of research contributions and projects we carried out over the last 5 years. During this period, TL was supported by several postdoctoral fellowships: from the French Agence Nationale de la Recherche (ANR; Genoak project, PI: Christophe Plomion, 11-BSV6-009-021; BirdIslandGenomic project, PI: Benoit Nabholz, ANR-14-CE02-0002), from the European Research Council (ERC, Treepeace, PI: Antoine Kremer, Grant Agreement no. 339728), and from the University of Vienna, Austria (PI: Christian Lexer). QR was supported by the government of Canada through Genome Canada, Genome British Columbia and Genome Quebec. QR thanks Louis Bernatchez for the opportunity to develop various projects during his postdoctoral research. We thank Jean-Marc Aury, Antoine Kremer, and Christophe Plomion for providing access to the oak sequencing data. We also thank Philippe Vigouroux and Philippe Cubry for information concerning the African rice data, and Pierre-Alexandre Gagnaire and Nicolas Bierne for discussions on TreeMix. This book chapter is dedicated to Prof. Christian Lexer, who throughout his career greatly advanced our knowledge of population genomics and evolutionary botany.
Copyright information
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Leroy, T., Rougemont, Q. (2021). Introduction to Population Genomics Methods. In: Besse, P. (eds) Molecular Plant Taxonomy. Methods in Molecular Biology, vol 2222. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0997-2_16
DOI: https://doi.org/10.1007/978-1-0716-0997-2_16
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0996-5
Online ISBN: 978-1-0716-0997-2
eBook Packages: Springer Protocols