Background

Introgression is the incorporation of genetic material from one (sub)species into the gene pool of another by means of hybridization and backcrossing (Arnold 2006). For several reasons, it is a common phenomenon in birds: the widespread occurrence of avian hybridization (Ottenburghs et al. 2015) and the slow evolution of intrinsic postzygotic isolation (i.e. hybrids between distantly related species are still fertile), which enables backcrossing, increase the potential for introgression (Fitzpatrick 2004). Indeed, numerous studies have documented the exchange of genetic material between bird species (Rheindt and Edwards 2011). We searched Thomson Reuters’ Web of Science™ and Elsevier Scopus® for genetic studies on avian introgression. Our literature search resulted in 165 studies published between 1987 and 2017 (see Additional file 1: Table S1). Studies on avian introgression have not been restricted to ornithological journals. The majority of papers appeared in general journals, indicating that the study of avian hybridization provides insights into broad biological questions concerning evolution and ecology (Fig. 1). Intriguingly, from 2000 onwards, avian introgression studies have also started appearing in conservation-orientated journals. This trend is possibly driven by the Allendorf et al. (2001) paper ‘The problem with hybrids: setting conservation guidelines’ and emphasizes the importance of understanding avian hybridization in relation to conservation.

Fig. 1 Percentage of journals that published papers on avian introgression in three time periods (1987‒2000, 2001‒2010, and 2011‒2017)

Most studies focused on members of the Passeriformes, Galliformes, Anseriformes and Charadriiformes, bird orders that display high levels of hybridization (Ottenburghs et al. 2015). Furthermore, several hybridizing species pairs have become model systems in the study of avian hybridization and introgression: among others, Collared Flycatcher (Ficedula albicollis) and Pied Flycatcher (F. hypoleuca), White-collared Manakin (Manacus candei) and Golden-collared Manakin (M. vitellinus), and Red-legged Partridge (Alectoris rufa) and Chukar Partridge (A. chukar). The study of these model systems mirrors the development of molecular techniques. For instance, the flycatcher system has been studied using mtDNA (Tegelström and Gelter 1990), microsatellites (Saetre et al. 2001), SNPs (Borge et al. 2005) and whole genome data (Ellegren et al. 2012). In general, the toolkit for studying introgression has expanded over the years, following the progress in molecular markers (Fig. 2).

Fig. 2 Use of different genetic markers in studies on avian introgression in three time periods (1987‒2000, 2001‒2010, and 2011‒2017). From 2001 onwards, genomic techniques are being applied. Traditional markers, such as microsatellites and mtDNA, remain popular

In this review, we focus on the final step in this methodological progress: genomic data, the use of which has become standard practice in ornithology (Kraus and Wink 2015; Jarvis 2016; Oyler-McCance et al. 2016; Toews et al. 2016). Following Toews et al. (2016), we consider the following next-generation sequencing techniques as genomic tools: genome sequencing and resequencing, reduced representation techniques (genotyping-by-sequencing [GBS] and restriction-site-associated DNA sequencing [RADseq]), sequence capture and RNA sequencing. We explore how genomic data impacts the different aspects in the study of avian introgression, such as detecting introgression, hybrid zone studies, and the role of introgression in genome architecture.

Detecting hybrids

One of the first steps in avian hybridization research is the identification of admixed individuals. To tackle this issue several software packages have been developed, such as NewHybrids (Anderson and Thompson 2002), AFLPOP (Duchesne and Bernatchez 2002), BAPS (Corander et al. 2004) and HYBRIDLAB (Nielsen et al. 2006). But the most widely used software package is STRUCTURE (Pritchard et al. 2000). Indeed, 62 out of 87 studies (about 71%) on avian introgression applied this software package (Additional file 1: Table S1). STRUCTURE uses multilocus genotype data and is based on a clustering algorithm that assigns individuals to populations. Early studies used microsatellites, often in combination with mitochondrial markers (e.g., Barilani et al. 2005). The rapid progress in sequencing techniques introduced the application of SNPs and other genome-wide markers to this method (Saetre et al. 2003; Kraus et al. 2012). A new and updated version, fastSTRUCTURE, allows for the analysis of large SNP datasets (Raj et al. 2014; Elgvin et al. 2017).

Although STRUCTURE has remained popular for over a decade, it has limitations with respect to the underlying population genetic model, such as the assumption of Hardy–Weinberg and linkage equilibria (Jombart et al. 2010). Alternatives to STRUCTURE have been developed to address these issues, such as ADMIXTURE (Alexander et al. 2009) and Discriminant Analysis of Principal Components or DAPC (Jombart et al. 2010). Ultimately, STRUCTURE, ADMIXTURE and DAPC (and other similar software) are best utilized alongside each other (Frosch et al. 2014). However, all genomic studies in our literature search relied solely on STRUCTURE.

Phylogenetic discordance

Hybridization and consequent exchange of genetic material can become apparent in phylogenetic analyses with different loci resulting in discordant gene trees (Degnan and Rosenberg 2009). Hence, this gene tree discordance, also referred to as phylogenetic incongruence, can be used to detect introgression. For example, in a phylogenetic analysis of woodpeckers, Fuchs et al. (2013) attributed conflicting topologies between several loci to an ancient hybridization event between members of the Campephilus and the melanerpine (genera Melanerpes and Sphyrapicus) lineages.

However, introgression is not the only process culminating in phylogenetic incongruence. Other biological processes, such as incomplete lineage sorting and gene duplication, can amount to similar patterns (Maddison 1997; Degnan and Rosenberg 2009). Particularly, incomplete lineage sorting, when lineages fail to coalesce in the ancestral population of two species, seems to be pervasive (Pamilo and Nei 1988). Hence, phylogenetic discordance should not be seen as definitive proof of introgression, but rather as a starting point for further analyses. It is advised to provide other lines of evidence to show that gene tree discordance is the outcome of introgressive hybridization, not incomplete lineage sorting.
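To give a sense of how readily incomplete lineage sorting alone produces discordance, the following minimal sketch uses the standard three-taxon coalescent result that a gene tree differs from the species tree with probability (2/3)·exp(−T), where T is the length of the internal branch in coalescent units (generations divided by 2Ne for a diploid population). The function names and the example branch lengths are illustrative assumptions, not values taken from any study discussed here.

```python
import math

def prob_discordant(t_coalescent):
    """Probability that a three-taxon gene tree disagrees with the species
    tree under incomplete lineage sorting alone (no gene flow)."""
    if t_coalescent < 0:
        raise ValueError("branch length must be non-negative")
    return (2.0 / 3.0) * math.exp(-t_coalescent)

def prob_each_discordant_topology(t_coalescent):
    """Without gene flow, the two discordant topologies are equally likely."""
    return prob_discordant(t_coalescent) / 2.0

# Hypothetical internal branch lengths, in units of 2Ne generations
for t in (0.1, 0.5, 1.0, 2.0):
    print(f"T = {t}: P(discordant) = {prob_discordant(t):.3f}")
```

Short internal branches thus yield discordance at a large fraction of loci without any hybridization. It is an asymmetry between the two discordant topologies, rather than discordance as such, that points towards introgression.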

One way to discriminate between introgression and incomplete lineage sorting is Patterson’s D-statistic (Durand et al. 2011), a statistical test that was first employed to quantify the amount of genetic exchange between Neanderthals and modern humans (Green et al. 2010). The D-statistic considers ancestral (‘A’) and derived (‘B’) alleles across the genomes of four taxa (Fig. 3). Under the scenario of incomplete lineage sorting without gene flow, two particular allelic patterns ‘ABBA’ and ‘BABA’ should occur equally frequently. An excess of either ABBA or BABA, resulting in a D-statistic that is significantly different from zero, is indicative of gene flow between two taxa. This approach has been extended to more taxa, which allows for identifying the direction of gene flow (Eaton and Ree 2013; Pease and Hahn 2015).
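The logic of the ABBA-BABA test can be sketched in a few lines. The example below is a simplified illustration, assuming one sampled allele per taxon per site, taxa ordered as (P1, P2, P3, outgroup), and the outgroup allele taken as ancestral; D is computed as (nABBA − nBABA) / (nABBA + nBABA). The function names, the toy data and the simple delete-one block jackknife are our own assumptions and do not reproduce any particular software package.

```python
import math

def count_patterns(sites):
    """Count ABBA and BABA sites; each site is a tuple (P1, P2, P3, outgroup)."""
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if p3 == out or p1 == p2:
            continue  # P3 must be derived and P1, P2 must differ
        if p1 == out and p2 == p3:
            abba += 1  # P2 shares the derived allele with P3
        elif p2 == out and p1 == p3:
            baba += 1  # P1 shares the derived allele with P3
    return abba, baba

def d_statistic(sites):
    """Patterson's D from site-pattern counts (NaN if no informative sites)."""
    abba, baba = count_patterns(sites)
    if abba + baba == 0:
        return float("nan")
    return (abba - baba) / (abba + baba)

def jackknife_se(sites, n_blocks=10):
    """Delete-one block jackknife standard error of D over contiguous blocks."""
    size = len(sites) // n_blocks
    estimates = []
    for b in range(n_blocks):
        kept = sites[:b * size] + sites[(b + 1) * size:]
        estimates.append(d_statistic(kept))
    mean = sum(estimates) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((e - mean) ** 2 for e in estimates)
    return math.sqrt(variance)

# Toy data (hypothetical): 30 ABBA sites, 10 BABA sites, 60 uninformative sites
toy = [("A", "G", "G", "A")] * 30 + [("G", "A", "G", "A")] * 10 \
    + [("A", "A", "G", "A")] * 60
d = d_statistic(toy)
se = jackknife_se(toy)
print(f"D = {d:.2f}, jackknife SE = {se:.3f}")
```

In practice, significance is usually assessed from the ratio of D to its jackknife standard error, with blocks chosen to be long enough to exceed the extent of linkage disequilibrium.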

Fig. 3 The D-statistic considers ancestral (‘A’) and derived (‘B’) alleles across the genomes of four taxa. Under the scenario of incomplete lineage sorting without gene flow, two particular allelic patterns ‘ABBA’ and ‘BABA’ should occur equally frequently. An excess of either ABBA or BABA, resulting in a D-statistic that is significantly different from zero, is indicative of gene flow between two taxa. The table below shows the range of D-statistics for different bird taxa

However, an excess of ABBA or BABA can also arise from other processes, such as non-random mating in the ancestral population due to population structure (Eriksson and Manica 2012). Also, the D-statistic was originally developed to infer introgression on a genome-wide or chromosome-wide scale (Green et al. 2010). Calculating this statistic for small genomic regions or specific loci in order to characterize patterns of introgression across the genome can lead to unreliable results, because significant D-statistics tend to cluster in regions of reduced genetic diversity (Martin et al. 2014). It is, therefore, advised to apply the D-statistic for the detection of genome-wide or chromosome-wide introgression, not to single out possible introgressed regions.

We found five studies on avian introgression that adopted the D-statistic to study introgression patterns in Zimmerius flycatchers (Rheindt et al. 2014), Corvus crows (Poelstra et al. 2014), Darwin’s Finches (Lamichhaney et al. 2015), Aphelocoma jays (Zarza et al. 2016), and Passer sparrows (Elgvin et al. 2017). Most studies applied the D-statistic in the appropriate way (i.e. to infer genome-wide introgression); only the study on Zimmerius flycatchers seems to fall into the trap of using the ABBA-BABA-test to pinpoint introgressed loci. In this study, Rheindt et al. (2014) attempt to reconstruct the evolutionary history of a phenotypically mosaic population of the Peruvian Tyrannulet (Z. viridiflavus), which shares plumage characteristics with the Golden-faced Tyrannulet (Z. chrysops). Mapping the results from the ABBA-BABA-test to the Zebra Finch (Taeniopygia guttata) genome, they find that some ABBA-favoured SNPs are close to genes involved in cell projection and plasma membranes. Given the connection between cell membranes and plumage coloration, the authors speculate that there might have been introgression of plumage alleles between these populations. However, as discussed above, these loci have not necessarily introgressed; they might instead represent genomic regions of low genetic diversity due to purifying selection. Examining the level of divergence in putative introgressed regions is recommended here.

Another way to deal with introgression and incomplete lineage sorting is to switch from phylogenetic trees to phylogenetic networks (Huson and Bryant 2006; Ottenburghs et al. 2016, 2017). Some methods for constructing phylogenetic networks allow for hybridization while ignoring incomplete lineage sorting (Beiko and Hamilton 2006), whereas other methods take into account incomplete lineage sorting while ignoring hybridization (Maddison and Knowles 2006; Than and Nakhleh 2009). New methods are being developed to reconstruct phylogenetic networks taking both incomplete lineage sorting and hybridization into account (Joly et al. 2009; Kubatko 2009; Meng and Kubatko 2009; Yu et al. 2013; Wen et al. 2016). This network approach has not been applied to avian genomic data yet, but promises to be a fruitful strategy for the detection of introgression.

Genomic clines in hybrid zones

Apart from the descriptive nature of detecting introgression, patterns of interspecific genetic exchange can also be used to study the genetics of reproductive isolation and speciation (Harrison and Larson 2014; Payseur and Rieseberg 2016). The extent of introgression of a specific allele depends on several factors, such as hybrid fitness, reproductive isolation and genetic linkage (Barton 1979; Wu 2001; Payseur 2010). Alleles can be roughly divided into three categories: (1) neutral alleles that are free to flow between species, (2) alleles that confer an adaptive advantage and introgress quickly, and (3) alleles that lead to reduced fitness and inhibit gene flow. Hybrid genomes are a mosaic of these three categories, mingled by migration and recombination (Payseur 2010; Wang et al. 2011). Hence, most species boundaries are semipermeable: some genomic regions (e.g., those leading to reduced hybrid fitness) show restricted gene flow while other regions (e.g., comprising neutral or advantageous alleles) are allowed to flow freely.

Hybrid zones, regions where two genetically distinct populations interbreed, are excellent natural laboratories to explore these locus-specific patterns of introgression (Hewitt 1988; Harrison 1990; Harrison and Larson 2016), especially in combination with the application of geographical cline theory (Barton and Hewitt 1985). Cline theory provides a framework to analyse changes in traits or allele frequencies as a function of geographic distance across a hybrid zone transect. Several characteristics of the observed clines can be used to make inferences about hybrid zone dynamics. For instance, cline width in combination with dispersal rates allows for estimation of selection pressures (Barton and Gale 1993). Alleles and traits under similar selective pressures will show concordant cline widths and centres, whereas those subject to different selection pressures will show displaced cline centres compared to the majority of the other clines (Barton 1983).
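As a minimal sketch of geographical cline analysis, the example below assumes a symmetric sigmoid (tanh) cline, in which the expected allele frequency at position x is p(x) = ½[1 + tanh(2(x − c)/w)], with c the cline centre and w the cline width (the inverse of the maximum slope). Centre and width are estimated by a crude least-squares grid search. The conversion from width to selection uses the commonly cited approximation w ≈ √(8σ²/s) for a cline maintained by selection against heterozygotes, with σ the dispersal distance; whether that approximation applies depends on the type of selection acting in a given hybrid zone. All function names and the transect data are hypothetical.

```python
import math

def cline(x, centre, width):
    """Expected allele frequency at transect position x for a tanh cline."""
    return 0.5 * (1.0 + math.tanh(2.0 * (x - centre) / width))

def fit_cline(positions, freqs, centres, widths):
    """Grid search for the (centre, width) pair minimizing squared error."""
    best = None
    for c in centres:
        for w in widths:
            sse = sum((cline(x, c, w) - f) ** 2 for x, f in zip(positions, freqs))
            if best is None or sse < best[0]:
                best = (sse, c, w)
    return best[1], best[2]

def selection_from_width(width, sigma):
    """Selection against heterozygotes implied by w = sqrt(8 * sigma^2 / s)."""
    return 8.0 * sigma ** 2 / width ** 2

# Hypothetical transect: sampling sites (km) and observed allele frequencies
positions = [-50, -30, -10, 0, 10, 30, 50]
observed = [0.01, 0.03, 0.20, 0.42, 0.70, 0.97, 0.99]
centre, width = fit_cline(positions, observed,
                          centres=list(range(-20, 21)),
                          widths=list(range(5, 61)))
print(f"centre = {centre} km, width = {width} km")
print(f"implied s for sigma = 1 km: {selection_from_width(width, 1.0):.4f}")
```

Fitting the same model to several loci or traits and comparing the estimated centres and widths is what allows concordant and displaced clines to be distinguished.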

Numerous avian hybrid zones have been studied using geographical cline theory (Table 1), often combining morphological and genetic data (e.g., Parsons et al. 1993; Gay et al. 2007; Seneviratne et al. 2016). Although geographical cline analysis is a powerful tool to study hybridization and speciation, it has several limitations. First, it assumes a monotonic change in allele frequency across a linear transect, even though hybrid zones can display mosaic and patchy distributions (e.g., Walsh et al. 2016). Second, there is no clear optimal scale for geographical cline analysis and the scale of sampling can potentially influence the results (Gompert and Buerkle 2011). An alternative method that circumvents these limitations is genomic cline analysis, which is based on the frequency of locus-specific genotypes across a genome-wide admixture gradient. Loci that are potentially involved in reproductive isolation can be identified by discordance of genomic clines with a null model (Gompert and Buerkle 2009, 2011; Fitzpatrick 2013).

Table 1 Genomic studies on avian hybrid zones

Characterizing an avian hybrid zone by means of geographical or genomic cline analyses can provide important insights into the genetic underpinnings of reproductive isolation between the hybridizing species. But each hybrid zone study is just a single snapshot of a complex and continuously changing interaction. To capture the dynamic nature of hybrid zones, one can study a particular hybrid zone across different temporal and/or spatial scales. For example, by comparing historical (2000‒2002) and recent (2010‒2012) genetic data, Taylor et al. (2014) showed that the hybrid zone between Black-capped Chickadee (Poecile atricapillus) and Carolina Chickadee (P. carolinensis) in Pennsylvania has moved north due to changing winter temperatures. However, a temporal comparison is only possible when historical data is available or can be obtained from museum specimens (Spurgin et al. 2014; Linck et al. 2017). An alternative strategy is to compare different hybrid zones between the same species (Schaefer et al. 2016; Kingston et al. 2017; Lackey and Boughman 2017). On the one hand, concordant patterns of introgression and differentiation across multiple hybrid zones might pinpoint genes that are important in reproductive isolation or adaptation, regardless of any environmental differences between the hybrid zones. On the other hand, discordant patterns of introgression and adaptation might be used to identify loci under environment-specific selection or drift within the respective hybrid zones. Finally, the spatial comparison of different hybrid zones can be extended to multiple pairs of closely related species. By analysing contact zones between these hybridizing species, patterns of introgression and differentiation can be related to the age of the interacting species, providing insights into the build-up of reproductive isolation and differentiation over time (Grossen et al. 2016; Vijay et al. 2016).

Exploring the genomic landscape

Genomic data allows researchers to zoom out from locus-specific introgression and differentiation uncovered in hybrid zones and study patterns across the whole genome. Here, genome scans provide a powerful approach (Haasl and Payseur 2016): align the genomes of two species, slide a window across them and calculate a divergence statistic, most commonly FST. The resulting picture is a genomic landscape with islands of differentiated regions in a sea of neutral variation. These islands are commonly referred to as ‘differentiation islands’ (Harr 2006) or ‘speciation islands’ (Turner et al. 2005). As the latter term suggests, these islands have often been related to speciation, in particular to the genic view of speciation (Wu 2001) in combination with divergence-with-gene-flow (Pinho and Hey 2010). This view holds that divergent selection against gene flow is initially restricted to a few loci. These loci contribute to reproductive isolation and are less likely to introgress compared to selectively neutral loci. Hence, these loci and closely linked genomic regions are expected to diverge while gene flow homogenizes the remainder of the genome (Feder et al. 2012; Via 2012). Furthermore, if genomic islands of divergence are the outcome of reduced gene flow, one expects to find genes contributing to reproductive isolation within them. An alternative explanation for the formation of genomic islands is that they arose through positive and purifying selection (including background selection at linked sites, referred to as ‘linked selection’) in allopatry, independent of reproductive isolation and gene flow (Cruickshank and Hahn 2014).
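The sliding-window procedure behind a genome scan can be sketched as follows. This is a deliberately simplified illustration: it assumes known population allele frequencies at biallelic SNPs, uses a basic heterozygosity-based estimator, FST = (HT − HS)/HT, without any sample-size correction, and combines sites within a window as a ratio of sums. The function names, window settings and SNP data are hypothetical.

```python
def fst_components(p1, p2):
    """Return (HT - HS, HT) for one biallelic SNP with frequencies p1 and p2."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                     # total heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return h_t - h_s, h_t

def windowed_fst(snps, window, step):
    """Slide a window along a scaffold; snps is a list of (position, p1, p2).

    Returns a list of (window_start, window_end, FST) for windows containing
    at least one polymorphic site.
    """
    results = []
    if not snps:
        return results
    last_position = max(pos for pos, _, _ in snps)
    start = 0
    while start <= last_position:
        end = start + window
        numerator = denominator = 0.0
        for pos, p1, p2 in snps:
            if start <= pos < end:
                num, den = fst_components(p1, p2)
                numerator += num
                denominator += den
        if denominator > 0:
            results.append((start, end, numerator / denominator))
        start += step
    return results

# Hypothetical SNPs: (position in bp, frequency in species 1, frequency in species 2)
snps = [(100, 0.30, 0.30), (600, 0.50, 0.50), (1200, 1.00, 0.00), (1700, 0.90, 0.10)]
for start, end, fst in windowed_fst(snps, window=1000, step=1000):
    print(f"{start}-{end} bp: FST = {fst:.2f}")
```

Windows in which FST stands out against the genome-wide background are the candidate ‘islands’; overlapping windows (step smaller than window) give a smoother landscape at the cost of non-independence between adjacent values.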

The genomic landscape of several avian study systems has been mapped (Table 2). Ellegren et al. (2012) were the first to explore the genomic landscape of divergence in birds. By comparing the genomes of Collared and Pied Flycatcher, they uncovered about 50 ‘islands of differentiation’. These islands were not only characterized by elevated levels of FST, they also displayed reduced levels of nucleotide diversity, skewed spectra of allele frequencies, and reduced proportions of shared alleles. Combined, these summary statistics are suggestive of selection. Indeed, further exploration of the genomic landscape of flycatchers indicated that the origin of ‘islands of differentiation’ is mainly driven by linked selection, although heterogeneous gene flow cannot be excluded (Burri et al. 2015). The results from these studies highlight the use of other summary statistics apart from FST. Most genome scans rely on FST to quantify genetic distance along the genome, but FST is a relative measure of differentiation that is dependent on the underlying genetic diversity within the population. Other summary statistics provide different perspectives on the processes that sculpted the genomic landscape (Cruickshank and Hahn 2014; Wolf and Ellegren 2017).
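The dependence of FST on within-population diversity can be made concrete with a short worked example. The sketch below assumes one common formulation, FST = 1 − π_within / d_XY, where π_within is the mean nucleotide diversity within the two populations and d_XY is the absolute divergence between them. The function name and all numerical values are hypothetical and chosen only for illustration.

```python
def relative_fst(pi_1, pi_2, d_xy):
    """Relative differentiation from within-population diversity and d_XY."""
    if d_xy <= 0:
        raise ValueError("d_xy must be positive")
    pi_within = (pi_1 + pi_2) / 2
    return 1.0 - pi_within / d_xy

# Two hypothetical genomic regions with identical absolute divergence (d_XY)
d_xy = 0.010
background = relative_fst(0.008, 0.008, d_xy)  # typical diversity
low_diversity = relative_fst(0.002, 0.002, d_xy)  # diversity reduced, e.g. by linked selection

print(f"background region:    FST = {background:.2f}")
print(f"low-diversity region: FST = {low_diversity:.2f}")
```

Both regions are equally diverged in absolute terms, yet the low-diversity region appears as an ‘island’ when only FST is considered. Reporting d_XY and nucleotide diversity alongside FST helps to separate reduced gene flow from reduced diversity.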

Table 2 Avian studies on genomic islands

If ‘islands of differentiation’ play a pivotal role in the origin of new species, one expects to find genes involved in reproductive isolation within these islands. This expectation has been confirmed for Carrion Crows (Corvus corone corone) and Hooded Crows (C. c. cornix). A genome scan uncovered a highly differentiated genomic region which contains genes involved in pigmentation and visual perception. This result was further corroborated by differential patterns of gene expression (Poelstra et al. 2014). Genomic analyses of contact zones between other Corvus subspecies (cornix, corone, orientalis and pectoralis) revealed clustering of certain pigmentation genes, albeit in different genomic islands (Vijay et al. 2016). Other studies, however, have shown that genes potentially involved in reproductive isolation do not always cluster together; instead, they are scattered across the genome (Parchman et al. 2013; Ruegg et al. 2014; but see Delmore et al. 2015). These contrasting findings (candidate ‘speciation genes’ clustered in genomic islands versus scattered throughout the genome) suggest that the genetic basis of speciation is highly species-specific and context-dependent.

Future directions

The approaches discussed above (e.g., phylogenetic discordance, genome scans and genomic cline analysis) are best utilized alongside each other to achieve a complete picture of the speciation and hybridization process. For instance, Parchman et al. (2013) combined genomic cline analyses and genome scans (based on FST) to investigate the Panamanian hybrid zone between White-collared Manakin (Manacus candei) and Golden-collared Manakin (M. vitellinus). They showed that differentiated loci with high FST and loci with discordant clines did not cluster together in the genome. This is in line with the hypothesis that genetic regions involved in adaptive divergence and reproductive isolation are scattered throughout the genome. Most studies on avian introgression, however, relied on one approach (only 58 of the 165 studies [35%] applied multiple methods).

In addition, findings from different methods can be incorporated into specific speciation scenarios (e.g., allopatric divergence with secondary contact, divergence-with-gene-flow, etc.), which can consequently be tested using a modelling approach (Fig. 4). Several multilocus studies on avian introgression relied on Isolation-with-Migration (IM) models to infer gene flow parameters, along with population divergence times and effective population sizes (Hey and Nielsen 2004; Hey 2010; Pinho and Hey 2010). However, these models only estimate the amount of gene flow, not the timing. Furthermore, recent analyses suggested that false positives might be common (Cruickshank and Hahn 2014; Hey et al. 2015). In contrast to IM models, Approximate Bayesian Computation (ABC) modelling does allow for the comparison of multiple scenarios that differ in the amount and timing of gene flow (Beaumont 2010). This way, it is possible to discriminate among distinct speciation scenarios (Yeung et al. 2011; Raposo do Amaral et al. 2013; Nadachowska-Brzyska et al. 2015; Nater et al. 2015; Smyth et al. 2015). Running these models with genomic datasets, while taking into account findings from other analyses, will broaden our understanding of the role of introgressive hybridization in avian evolution.
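The principle of ABC model choice can be illustrated with a toy rejection sampler. The sketch below is not a realistic analysis: it assumes a single summary statistic (FST), two strongly simplified scenarios (‘migration’, using the island-model approximation FST ≈ 1/(1 + 4Nm), and ‘isolation’, using the drift approximation FST ≈ 1 − exp(−T) with T in units of 2N generations), uniform priors chosen arbitrarily, and Gaussian noise standing in for sampling variance. Real applications simulate genomic data under each scenario with a coalescent simulator and use many summary statistics. All names and values are hypothetical.

```python
import math
import random

def simulate_fst(model, rng):
    """Toy simulator: draw parameters from the prior, return a noisy FST."""
    if model == "migration":
        nm = rng.uniform(0.5, 10.0)         # migrants per generation (assumed prior)
        expected = 1.0 / (1.0 + 4.0 * nm)   # island-model approximation
    elif model == "isolation":
        t = rng.uniform(0.05, 3.0)          # divergence time in 2N generations (assumed prior)
        expected = 1.0 - math.exp(-t)       # drift-only approximation
    else:
        raise ValueError(f"unknown model: {model}")
    return expected + rng.gauss(0.0, 0.02)  # noise standing in for sampling variance

def abc_model_choice(observed, n_sims=4000, tolerance=0.02, seed=1):
    """Rejection ABC: approximate posterior model probabilities as the share of
    accepted simulations contributed by each model (equal prior probability)."""
    rng = random.Random(seed)
    models = ("migration", "isolation")
    accepted = {m: 0 for m in models}
    for _ in range(n_sims):
        model = rng.choice(models)
        if abs(simulate_fst(model, rng) - observed) <= tolerance:
            accepted[model] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulations accepted; increase tolerance or n_sims")
    return {m: accepted[m] / total for m in models}

posterior = abc_model_choice(observed=0.05)
for model, prob in posterior.items():
    print(f"P({model} | data) ~ {prob:.2f}")
```

The same accept-or-reject logic extends to scenarios that differ in the timing of gene flow (for example, ancient versus recent migration), provided that the chosen summary statistics are informative about that difference.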

Fig. 4 The findings from different methods (admixture test, phylogenetic discordance, cline analysis and genomic islands) can be incorporated into specific speciation scenarios, which can consequently be tested using a modelling approach, such as ABC modelling, which allows for the comparison of different scenarios

To discover and describe hybrid zones in the past, many studies relied on ‘randomly’ sampled specimens, mostly collected within and outside the putative hybrid zone. Sampling strategies that take into account the resolution of genomic data will need to be developed. Moreover, each hybrid zone study is just a single snapshot of a complex and continuously changing interaction. To capture the dynamic nature of hybrid zones in a genomic context, sampling should be carried out across different temporal and spatial scales.

Due to the increasing amount of genetic data, biology is moving from a hypothesis-driven to a data-driven science. The same pattern can be observed in ornithological research. This shift brings many computational challenges. Most software packages that are routinely used in avian introgression studies (e.g., STRUCTURE, IMa) cannot cope with large genomic datasets (Darriba et al. 2015). Therefore, new computational tools will need to be developed to handle the increasing amount of genomic data. For instance, ExaML, a computationally more efficient version of the maximum likelihood program RAxML, was developed to analyse 48 complete avian genomes (Kozlov et al. 2015). Jarvis et al. (2014) indicated that ‘these computationally intensive analyses were conducted on more than 9 supercomputer centers and required the equivalent of > 400 years of computing using a single processor.’

Apart from developing new computational methods, analyses should be conducted more efficiently. For example, the newest computer hardware technology, such as graphics processing units (GPUs) and multicore central processing units (CPUs), can be implemented to parallelize calculations (Ayres et al. 2011). In addition, new techniques from Artificial Intelligence and deep learning could be applied to ‘text mine’ genomes which may yield cases of hybridization where we had not expected them (Fogel 2008; Angermueller et al. 2016; Leung et al. 2016).

Conclusions

In this review, we showed how genomic data can be applied to the study of avian introgression. First, the detection of hybrids and backcrosses has improved dramatically, although the monopoly of STRUCTURE should be broken up by applying other software in concert with it (e.g., DAPC or ADMIXTURE). Another way of detecting introgressive hybridization, phylogenetic discordance (i.e. different loci resulting in discordant gene trees), should be regarded as a starting point for further analyses, not as a definitive proof of introgression. Specifically, disentangling introgression from incomplete lineage sorting remains a challenging endeavour, although new techniques, such as the D-statistic, are being developed. Furthermore, with the advent of genomic data, phylogenetics might require a shift from trees to networks (Edwards et al. 2016; Ottenburghs et al. 2016).

The study of hybrid zones has led to important insights into the complex interplay between hybridization and speciation (Harrison and Larson 2016). Genomic data provide the opportunity to augment the resolution of geographical cline analysis. In addition, genomic cline analysis provides a fresh perspective on hybrid zone dynamics, circumventing several limitations of geographical cline analysis (Gompert and Buerkle 2009). It is, however, important to keep in mind that each hybrid zone study is just a single snapshot of a complex and continuously changing interaction. To capture the dynamic nature of hybrid zones, they should be studied across different temporal and spatial scales.

Genome scans, which uncovered a highly heterogeneous genomic landscape, have become a powerful tool in the genomic toolbox (Nosil and Feder 2012). When performing these genome-wide comparisons, one should realize that there is more to life than FST. Other summary statistics provide different perspectives on the processes that sculpted the genomic landscape (Cruickshank and Hahn 2014; Wolf and Ellegren 2017). The debate over which evolutionary processes underlie the genomic landscape, linked selection or reduced gene flow, is still ongoing. Also, the question of whether loci involved in reproductive isolation cluster together in ‘islands of speciation’ or are scattered throughout the genome remains to be answered. Exploring the genomic landscapes across the avian tree of life represents an exciting field for further research.

Finally, the findings from different methods should be incorporated into specific speciation scenarios, which can consequently be tested using a modelling approach. In particular, the application of ABC modelling, which allows for the comparison of different scenarios, will increase our understanding of avian speciation, which turns out to be more complex than the classical Mayrian triumvirate of allopatric, sympatric and parapatric speciation. Although there are many computational challenges ahead, this genomic perspective on avian hybridization and speciation will further our understanding of evolution in general.