Purpose of Review
Detecting gene flow between populations or species is a fundamental goal of population genetics and speciation research and is also central for a thorough understanding of the demographic history of lineages. While population genomic data offer an unparalleled opportunity to study gene flow and other evolutionary processes at high resolution, extracting meaningful patterns from such large and complex datasets is rarely straightforward. Recent advances in both theory and methodology have led to a number of newly proposed analytical tools and frameworks for inferring genome-wide patterns of introgression and admixture that can more efficiently leverage population genomic data. Here, we provide an overview of several recent contributions to the problem of estimating gene flow, discuss advantages and potential pitfalls of these approaches, and provide an outlook for future developments.
Recent Findings
Three prominent areas of recent research progress include (1) improving upon existing test statistics to detect and measure gene flow, (2) developing efficient frameworks for demographic model testing, and (3) applying supervised machine learning to identify introgressed loci across genomes. Over the past several years, contributions to these three areas have greatly enhanced our ability to study gene flow at various scales (i.e., species, populations, and individual genomes). Here, we highlight six relevant studies within these focal areas that represent particularly novel contributions to the goal of gene flow estimation from genome-scale data.
Summary
The inference of gene flow is a notoriously challenging statistical problem that is an integral component of population genomic research. Our survey of the literature revealed a number of important recent contributions to this problem, from the improvement of admixture tests to demographic model testing and inference of specific regions of the genome likely to have crossed boundaries between populations and species. Although these studies represent only a sampling of the current literature, their contributions, along with those from numerous studies in the expanding field of population genomics, are markers of considerable progress in recent years toward addressing the issue of genomic inference of gene flow.
Support was provided by an NSF grant to TAC (DEB-1655571) and Phi Sigma Support to RHA. Additionally, both the Lonestar and Stampede compute systems of the Texas Advanced Computing Center (TACC) were used for these analyses.
Conflict of Interest
The authors have no conflicts of interest to disclose.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.
This article is part of the Topical Collection on Population Genetics
Adams, R.H., Schield, D.R. & Castoe, T.A. Recent Advances in the Inference of Gene Flow from Population Genomic Data. Curr Mol Bio Rep 5, 107–115 (2019). https://doi.org/10.1007/s40610-019-00120-0
- Migration introgression
- Next-generation sequencing