# Adaptive gene introgression after secondary contact


## Abstract

By hybridization and backcrossing, alleles can surmount species boundaries and be incorporated into the genome of a related species. This introgression of genes is of particular evolutionary relevance if it involves the transfer of adaptations between populations. However, any beneficial allele will typically be associated with other alien alleles that are often deleterious and hamper the introgression process. To describe the introgression of an adaptive allele, we set up a stochastic model with an explicit genetic makeup of linked and unlinked deleterious alleles. Based on the theory of reducible multitype branching processes, we derive a recursive expression for the establishment probability of the beneficial allele after a single hybridization event. We furthermore study the probability that slightly deleterious alleles hitchhike to fixation. The key to the analysis is a split of the process into a stochastic phase, in which the advantageous allele establishes, and a deterministic phase, in which it sweeps to fixation. We then apply the theory to a set of biologically relevant scenarios, such as introgression in the presence of many unlinked or few closely linked deleterious alleles. A comparison with computer simulations shows that the approximations work well over a large parameter range.

## Keywords

Branching processes, Gene introgression, Adaptation, Hybridization, Genetic hitchhiking

## Mathematics Subject Classification

60J85, 92D15

## 1 Introduction

Hybridization between related species is a common phenomenon. Indeed, Mallet (2005) estimates that at least \(25\,\%\) of plant species and \(10\,\%\) of animal species still interbreed. The disappearance of natural habitat barriers following environmental change, the introduction of foreign species, the escape of domesticated animals into the wild, and the cultivation of crops all create new regions of range overlap and consequently cause high rates of hybridization. Despite reproductive barriers, hybridization between related species is often not completely prevented and can produce viable and fertile offspring. In the course of backcrossing with a parental species, alien genetic material is lost, but some of it may be permanently incorporated into the genome of the sister species. The introgression of genes from one species into another occurs over a wide range of taxa (Rhymer and Simberloff 1996; Lindner et al. 1998; Arnold et al. 1999; Arnold 2004; Miller et al. 2012; Ellstrand et al. 2013). The introgression of genes from feral to wild animals (Adams et al. 2003; Beaumont et al. 2001; Gottelli et al. 1994; Rhymer and Simberloff 1996) or from introduced to native species (Rhymer and Simberloff 1996; Fitzpatrick et al. 2010) poses ecological risks and, if extensive, may entail a loss of biodiversity.

In addition, evidence for the transfer of adaptations across species boundaries is growing (Arnold et al. 1999; Arnold 2004; Whitney et al. 2006; Schwenk et al. 2008; Arnold and Martin 2009; The *Heliconius* Genome Consortium 2012; Hedrick 2013). Introgression can hence directly influence the evolutionary trajectory of a species and speed up adaptation. For example, the introduced sunflower species *Helianthus annuus* has likely acquired resistance genes from the native and locally adapted species *H. debilis* in Texas, allowing it to expand its range southwards (Heiser 1951; Whitney et al. 2006). Similarly, Abi-Rached et al. (2011) suggest that positively selected immune system alleles from Neanderthals and Denisovans might have introgressed into modern humans. In agriculture, adaptive gene introgression can pose a major risk: genes conferring resistance to herbivores, insecticides, or pathogens in (possibly genetically modified) crops can spread to wild relatives, severely complicating weed control (Snow 2002; Snow et al. 2003; Stewart et al. 2003). Importantly, Snow et al. (2003) show that a transgene can reduce herbivory and increase fitness in a wild sunflower under natural conditions.

Early-generation hybrids, even if not entirely infertile or inviable, frequently suffer from strongly reduced fitness. Often, hybrids display an intermediate phenotype that is maladapted to either parental niche. The low hybrid fitness can also result from genetic incompatibilities. By backcrossing with one of the parental species, alleles that prove to be deleterious in the foreign genetic background or cause maladaptation to the parental niches can be purged and fitness restored (Heiser 1951; Arnold et al. 1999). The probability of successful gene introgression critically depends on the strength of this fitness bottleneck.

Theoretical models of adaptive gene introgression that take a reduction in hybrid fitness into account usually assume that a predefined number of backcrosses is required to lose the deleterious material and obtain a positively selected type (Demon et al. 2007; Gosh and Haccou 2010; Gosh et al. 2012a, b). This assumes that the deleterious effects are homogeneously spread over the genome of a diploid organism and that an appreciable number of deleterious alleles is required to have a measurable impact on fitness. Focusing on other aspects of gene introgression, such as the impact of a temporally varying environment (Gosh et al. 2012b) or life history traits (Demon et al. 2007), these models greatly simplify the underlying genetics. A step towards more realistic population genetic models has been made by Gosh et al. (2012a). Their analysis, however, remains restricted to the most basic scenario, in which a single deleterious allele is linked to the locus under positive selection (see also Barton 1979). For a neutral marker locus, in contrast, the impact of a genetic barrier consisting of an arbitrary number of deleterious alleles has been investigated (Bengtsson 1985; Barton and Bengtsson 1986).

In this paper, we focus on a single hybridization event and examine the impact of linked and unlinked deleterious alleles on the introgression process of an adaptive allele. The deleterious effect of alleles is caused by maladaptation to the new environment and is independent of the genetic background. We first set up a Moran-like model that describes the evolution of the population by genetic drift, selection, and recombination. In the first part of the model analysis, we apply the theory of reducible multitype branching processes to determine by how much deleterious alleles reduce the introgression probability of a favorable allele as a function of the strength of selection and linkage. The second part considers the probability that closely linked deleterious alleles "hitchhike" to fixation. The analysis relies on a separation of the process into a strongly stochastic phase, in which a haplotype carrying the beneficial allele establishes, and a deterministic phase, in which it sweeps through the population, possibly losing deleterious material by recombination with wildtype individuals. These recombination events and the subsequent establishment or loss of haplotypes with fewer deleterious alleles are again subject to strong stochasticity, and for their analysis we again resort to the theory of branching processes. The derived approximations are applied to a variety of biological scenarios and complemented by computer simulations. We close the paper with a discussion.

## 2 Full model and simulations

We consider a large population of \(N\) haploid individuals. The theory also applies to diploids without dominance if we can assume Hardy-Weinberg equilibrium, but we use the haploid formalism throughout the paper. Through a single hybridization event, a hybrid individual is introduced into the population (for diploids, the alien alleles arrive in the foreign habitat at the haploid stage, i.e., for plants, by pollen dispersal). The hybrid carries an adaptive allele as well as a number of deleterious alleles. Deleterious alleles are either physically linked to the adaptive allele or unlinked. We assume that this initial hybrid haplotype carries \(I\) and \(J\) linked deleterious alleles to the left and to the right of the beneficial allele, respectively, and \(F\) unlinked alleles. By recombination with wildtype individuals, haplotypes with fewer introgressed alleles can be generated, leading to a hybrid swarm. Selection on the deleterious alleles results from maladaptation to the environment and is independent of the genomic context. We assume that all wildtype individuals have the same fitness and that introgressed alleles interact identically with all wildtype backgrounds. The fitness of an individual is thus fully determined by the introgressed alleles that it carries.

The evolution of the population is described by the following scheme, which represents a Moran model with recombination: At rate \(N\), two individuals are chosen to reproduce and generate a single offspring. During reproduction, recombination can take place. To simplify the bookkeeping of genotypes in the analytical treatment, we restrict ourselves to single crossovers among the linked alleles. Multiple crossovers are unlikely over short recombination distances \(r\), for which \(r^2\ll r\), so the model captures scenarios of tight linkage. Considering larger recombination distances (where multiple crossovers become likely) or gene conversion requires a straightforward extension of the formalism, in which a larger number of genotypes can be generated by recombination with wildtype individuals. Unlinked alleles are inherited with probability one-half. The offspring replaces an individual that is chosen based on its fitness. For notational simplicity, we number the individuals from \(1\) to \(N\). Individual \(k\) is then chosen with probability \(\frac{1-\sigma ^{(k)}}{\sum \nolimits _{i=1}^N (1-\sigma ^{(i)})}\), where \(\sigma ^{(k)}=0\) for wildtype individuals; that is, \(\sigma ^{(k)}\) is the Malthusian fitness of individual \(k\) in a wildtype population. All three individuals involved in a reproduction event are chosen with replacement, i.e., the same individual might be chosen twice.
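The event scheme above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' C++ simulation program: it tracks linked loci only (unlinked alleles, each inherited with probability one-half, are omitted), it assumes that the Malthusian fitness \(\sigma\) of a haplotype is the sum of hypothetical per-locus effects of its introgressed alleles, and it parameterizes single crossovers by hypothetical per-interval probabilities. All function and parameter names are illustrative.

```python
import random

def malthusian_fitness(hap, effects):
    """Assumed additive fitness: sum of effects of introgressed alleles (coded 1); wildtype is 0."""
    return sum(e for allele, e in zip(hap, effects) if allele)

def recombine(p1, p2, crossover_probs, rng):
    """Single-crossover recombination of two haplotypes (tuples of 0/1).

    crossover_probs[l] is the assumed probability of a crossover between loci l and l + 1;
    at most one crossover occurs per reproduction event, so sum(crossover_probs) <= 1.
    """
    if rng.random() < 0.5:  # random orientation: which parent contributes the left part
        p1, p2 = p2, p1
    u, acc = rng.random(), 0.0
    for l, c in enumerate(crossover_probs):
        acc += c
        if u < acc:
            return p1[: l + 1] + p2[l + 1:]
    return p1  # no crossover: the offspring copies one parent

def moran_event(pop, effects, crossover_probs, rng):
    """One event: two parents (chosen with replacement) produce one offspring, which replaces
    individual k with probability (1 - sigma_k) / sum_i (1 - sigma_i)."""
    n = len(pop)
    a, b = rng.randrange(n), rng.randrange(n)
    child = recombine(pop[a], pop[b], crossover_probs, rng)
    weights = [1.0 - malthusian_fitness(h, effects) for h in pop]  # must all be > 0
    victim = rng.choices(range(n), weights=weights)[0]
    pop[victim] = child

def run(N, hybrid, effects, crossover_probs, ben_locus, seed=None, max_events=10**7):
    """Start from one hybrid among N - 1 wildtypes; return True if the beneficial allele fixes."""
    rng = random.Random(seed)
    pop = [tuple(hybrid)] + [(0,) * len(hybrid)] * (N - 1)
    for _ in range(max_events):
        carriers = sum(h[ben_locus] for h in pop)
        if carriers == 0:
            return False
        if carriers == N:
            return True
        moran_event(pop, effects, crossover_probs, rng)
    raise RuntimeError("beneficial allele neither lost nor fixed within max_events")
```

For example, `run(20, (1, 1), (0.1, -0.02), [0.05], 0)` introduces one hybrid carrying a beneficial allele (effect 0.1) at locus 0 and a deleterious allele (effect -0.02) at locus 1; averaging the Boolean outcome over many seeds estimates the introgression probability. Like the program described next, the sketch skips the waiting times between events.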

The simulation program implements the successive events without tracking the waiting times between them. As we are only interested in probabilities, this does not influence the results. The number of replicates is chosen such that the error bars are smaller than the symbols in all plots. The simulation program is written in C++ and makes use of the *GNU Scientific Library* (Galassi et al. 2009).

The full model does not allow for an analytical treatment. In the following sections, we therefore consider approximations to the introgression process.

## 3 The early phase of spread

A single evolutionary step involves three individuals: two individuals that reproduce and one that dies. In a large population, as long as introgressed haplotypes are rare, it is unlikely that more than one hybrid individual is involved in a single event (or that the same individual is involved twice). Formally, this means that terms of order \((n_{\text {intro}}/N)\) in the transition rates, where \(n_{\text {intro}}\) denotes the number of individuals with introgressed alleles, are negligible. Consequently, hybrids have (nearly) independent fates in the early phase of spread, and the process is well described by a multitype branching process. The branching process is strictly recovered in the limit \(N \rightarrow \infty \).

The lack of interaction among hybrids entails that types with introgressed material only recombine with wildtype individuals. This implies that by recombination, they can only lose, not gain deleterious alleles, and we encounter a special instance of a reducible multitype branching process (cf. also Barton and Bengtsson 1986; Demon et al. 2007; Gosh et al. 2012b). By recombination, types that carry only deleterious alleles but not the beneficial allele are generated. We do not consider these types in the following (within a branching process approach, they are doomed to extinction) but focus on carriers of the advantageous allele. In the main text of the paper, we assume that all unlinked deleterious alleles have the same effect size. A generalization of the main results to arbitrary effect sizes is given in Appendix D.

## 4 The probability of adaptive gene introgression

First, we focus on the probability that the beneficial allele establishes in the population. Once the beneficial allele is sufficiently frequent, it is very unlikely to be lost again. The extinction probability of the branching process as described in the previous section is thus a good approximation for the extinction probability of the beneficial allele in the full model (and the introgression probability is the complementary probability). We denote by \(Q_{(i,j;f)}\) the extinction probability of the process that is initiated by exactly one individual of type \((i,j;f)\).
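The recursive structure of the extinction probabilities can be illustrated by a first-step analysis. The following is a hedged sketch for linked alleles only (\(F=0\)), not a transcription of Theorem 1: it assumes that a type-\((i,j)\) carrier gives birth at rate \(1\), a fraction \(r_{(i,j)}\) of its offspring being recombinants with fewer deleterious alleles, and dies at rate \(1-\sigma_{(i,j)}\). These rates are an assumption of the sketch, chosen because they reproduce Eq. (21) and Remark 1 below. Conditioning on the next event gives a quadratic equation for \(Q_{(i,j)}\) whose coefficients involve only the extinction probabilities of types with fewer deleterious alleles, so the table can be filled recursively. The names `extinction_table`, `c_left`, and `c_right` are hypothetical.

```python
import math
from functools import lru_cache

def extinction_table(I, J, sigma, c_left, c_right):
    """Extinction probabilities Q[(i, j)] for 0 <= i <= I, 0 <= j <= J (linked alleles only).

    Hypothetical parameterization (an assumption of this sketch):
      sigma(i, j): Malthusian fitness of type (i, j), must be < 1
      c_left[k]:   rate r^{(i,j)}_{(k,j)} of recombinants keeping k left alleles (k < i)
      c_right[k]:  rate r^{(i,j)}_{(i,k)} of recombinants keeping k right alleles (k < j)
    """
    @lru_cache(maxsize=None)
    def Q(i, j):
        # Recombinant types reachable from (i, j) and their extinction probabilities.
        recs = [(c_left[k], Q(k, j)) for k in range(i)] + [(c_right[k], Q(i, k)) for k in range(j)]
        r = sum(rate for rate, _ in recs)
        s = sigma(i, j)
        # First-step analysis over the next event (total rate 2 - s):
        # (2 - s) Q = (1 - r) Q^2 + sum_k r_k Q Q_k + (1 - s)
        a = 1.0 - r
        b = sum(rate * q for rate, q in recs) - (2.0 - s)
        c = 1.0 - s
        # The smallest root lies in (0, 1]; this form avoids cancellation.
        return 2.0 * c / (-b + math.sqrt(max(0.0, b * b - 4.0 * a * c)))

    return {(i, j): Q(i, j) for i in range(I + 1) for j in range(J + 1)}

# Hypothetical example: s_ben = 0.02, one left deleterious allele of effect 0.03.
table = extinction_table(1, 0, lambda i, j: 0.02 - 0.03 * i - 0.01 * j, [0.001], [])
```

In this sketch, `1 - table[(I, J)]` plays the role of the introgression probability for the initial hybrid; for the type without deleterious alleles it returns \(Q_{(0,0)}=1-\sigma_{(0,0)}\).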

### **Theorem 1**

We only give an illustrative derivation of Eq. (2) here and move the full proof to Appendix A.

In Appendix B, we relate our results to results by Bengtsson (1985) and Barton and Bengtsson (1986) on the impact of a cline on the spread of a neutral marker allele. Our results are consistent with Bengtsson (1985) and Barton and Bengtsson (1986) when all loci are only loosely linked or unlinked but deviate for tight linkage of the deleterious allele.

## 5 The hitchhiking probability

### 5.1 General idea

We first give a derivation for the case without unlinked deleterious material (\(F=0\)), and subsequently generalize the approximation to \(F>0\).

### 5.2 The stochastic phase

As a first step, we determine which haplotype “rescues” the introgression process given that the process does not go extinct. For this initial phase, we again resort to the multitype branching process as defined in Eq. (1). As before, the process is initiated by a single individual of type \((I,J)\). If \(\sigma _{(I,J)}-r_{(I,J)}>0\), type \((I,J)\) has the chance to establish a permanent lineage of its own type. If \(\sigma _{(I,J)}-r_{(I,J)}\le 0\), type \((I,J)\) itself will go extinct with probability \(1\) (ignoring the possibility of fixation by drift). However, until extinction, recombinant offspring with fewer deleterious alleles can be generated and rescue the process. In that case, to determine the “rescue type”, we can consider all recombination pathways that lead to establishment of the beneficial allele and determine with which (relative) probability the various paths are realized. This idea is key for the derivation of the approximation in this section.

Throughout the analysis, the total number of recombination events from type \((I,J)\) to any other type until extinction of type \((I,J)\) constitutes a central quantity. This follows Serra (2006) and Serra and Haccou (2007). For \(\sigma _{(I,J)}-r_{(I,J)}<0\), we denote the corresponding probability generating function (p.g.f.) by \(h(s)\). For \(\sigma _{(I,J)}-r_{(I,J)}>0\) and extinction of type \((I,J)\), we consider the number of recombination events conditioned on extinction of type \((I,J)\) and denote the p.g.f. by \(\hat{h}(s)\). \(h(s)\) and \(\hat{h}(s)\) can be explicitly calculated for our model and are given by Lemma 6 in Appendix E.

As a first step, we derive an alternative expression for the survival probability of the process. To do so, we group the recombinant offspring of type \((I,J)\) individuals into two classes: (1) individuals that found processes that survive and (2) individuals that found processes that go extinct. We denote by \(Y_{+}\) and \(Y_{-}\) the random number of recombination events from type \((I,J)\) to class (1) and class (2) individuals, respectively. In the following lemma, we rewrite the survival probability of the process in terms of the expected number of successful recombinant lineages and an error term.

### **Lemma 1**

### *Proof*

### *Remark*

For \(\sigma _{(I,J)} - r_{(I,J)} > 0\), we can consider the process conditioned on extinction of type \((I,J)\). In this case, an analogous result holds if we replace \(h(s)\) by \(\hat{h}(s)\).

Results of a similar structure, which approximate weak recombination, appear in Iwasa et al. (2004a, Eq. (5)) and Serra and Haccou (2007, Eq. (8)).
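For a single linked deleterious allele, the role of the error term can be seen numerically. The sketch below compares, under the same assumed rates as the branching-process sketches (birth rate \(1\) with recombinant fraction \(r\), death rate \(1-\sigma\)), the exact survival probability \(1-Q_{(0,1)}\) with the first-order value \(\mathbb{E}[Y_+]\), i.e., the expected number of recombination events \(r/(r-\sigma_{(0,1)})\) from Remark 1 times the survival probability \(\sigma_{(0,0)}\) of a recombinant. Since \(P(Y_+\ge 1)\le \mathbb{E}[Y_+]\), the first-order value always overshoots; the gap is the error term of Lemma 1. Function names and parameter values are hypothetical.

```python
import math

def exact_survival(sigma1, sigma0, r):
    """1 - Q_(0,1) for a type of fitness sigma1 whose recombinants (rate r) have fitness sigma0 > 0."""
    a = 1.0 - r
    b = r * (1.0 - sigma0) - (2.0 - sigma1)  # recombinants go extinct with probability 1 - sigma0
    c = 1.0 - sigma1
    return 1.0 - 2.0 * c / (-b + math.sqrt(max(0.0, b * b - 4.0 * a * c)))

def expected_successful_recombinants(sigma1, sigma0, r):
    """First-order value E[Y_+] = r / (r - sigma1) * sigma0, valid for sigma1 < r."""
    return r * sigma0 / (r - sigma1)

for sigma1 in (-0.1, -0.01):
    exact = exact_survival(sigma1, 0.02, 0.001)
    approx = expected_successful_recombinants(sigma1, 0.02, 0.001)
    print(sigma1, exact, approx, (approx - exact) / exact)
```

With these values, the relative overshoot is well below \(1\,\%\) for \(\sigma_{(0,1)}=-0.1\) but roughly \(15\,\%\) for \(\sigma_{(0,1)}=-0.01\): close to criticality, type \((0,1)\) lineages persist long enough to produce several recombinants, and the first-order approximation degrades.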

In order to proceed, we need a formal definition of a “rescue type”. Analogous to the lemma, we can then derive a recursive formula for the probability that an individual of type (i, j) rescues the process.

### **Definition 1**

- (1)
it founds an infinite lineage of type \((i,j)\) individuals,

- (2)
there is no individual in its ancestry that founds an infinite lineage of its own type.

That is, \(P_{(i,j)}^{(I,J)}\) gives the probability that there exists an \((i,j)\) rescue type conditioned on survival of the process. A priori, this does not exclude the simultaneous existence of several rescue types. For the following theorem, we again group the recombinant offspring of a type \((I,J)\) individual into two classes: (1) individuals that found a lineage resulting in at least one individual of type \((i,j,+)\) and (2) all other recombinants. We denote the numbers of recombinants in the first and second class by \(Y_{(i,j,+)}\) and \(Y_{(i,j,-)}\), respectively.

### **Theorem 2**

- (1) Let \(\sigma _{(I,J)}-r_{(I,J)}< 0\). It holds that
  $$P_{(i,j)}^{(I,J)} = \frac{\left( \sum\nolimits_{k=0}^{I-1}r^{(I,J)}_{(k,J)} (1-Q_{(k,J)})P_{(i,j)}^{(k,J)}+ \sum\nolimits_{k=0}^{J-1} r^{(I,J)}_{(I,k)} (1-Q_{(I,k)}) P_{(i,j)}^{(I,k)}\right) \frac{\frac{\mathrm{d}}{\mathrm{d}s}h(s)|_{s=1}}{r_{(I,J)}} - R_1 }{\left( \sum\nolimits_{k=0}^{I-1}r^{(I,J)}_{(k,J)} (1-Q_{(k,J)})+ \sum\nolimits_{k=0}^{J-1} r^{(I,J)}_{(I,k)} (1-Q_{(I,k)})\right) \frac{\frac{\mathrm{d}}{\mathrm{d}s}h(s)|_{s=1}}{r_{(I,J)}} - R_2 }, \tag{16}$$
  where \(R_1\) is defined as before and \(R_2\) is given by
  $$R_2 = \frac{\partial }{\partial s_0} \left( \frac{h_2(s_0,s_1)-h_2(0,s_1)}{s_0}\right) \bigg|_{s_0=s_1=1} \tag{17}$$
  with
  $$h_2(s_0,s_1) = h\left(P_{(i,j,+)}s_0 +(1-P_{(i,j,+)})s_1\right) \tag{18}$$
  and
  $$P_{(i,j,+)}=\frac{\sum_{k=0}^{I-1} r_{(k,J)}^{(I,J)} (1-Q_{(k,J)})P_{(i,j)}^{(k,J)} + \sum_{k=0}^{J-1} r_{(I,k)}^{(I,J)} (1-Q_{(I,k)})P_{(i,j)}^{(I,k)}}{r_{(I,J)}}. \tag{19}$$
- (2) For \(\sigma _{(I,J)}-r_{(I,J)}>0\), it holds that
  $$P_{(I,J)}^{(I,J)} = \frac{1-q_{(I,J)}}{1-Q_{(I,J)}}, \tag{20}$$
  where \(q_{(I,J)}\) is the unconditioned probability that type \((I,J)\) itself goes extinct, with
  $$1-q_{(I,J)}=\frac{\sigma _{(I,J)}-r_{(I,J)}}{1-r_{(I,J)}}. \tag{21}$$
  For \((i,j)\ne (I,J)\), it holds that
  $$\begin{aligned} P_{(i,j)}^{(I,J)} &= \left( 1-\frac{1-q_{(I,J)}}{1-Q_{(I,J)}}\right) \\ &\quad \times \frac{\left( \sum\nolimits_{k=0}^{I-1}r^{(I,J)}_{(k,J)} (1-Q_{(k,J)})P_{(i,j)}^{(k,J)}+ \sum\nolimits_{k=0}^{J-1} r^{(I,J)}_{(I,k)} (1-Q_{(I,k)}) P_{(i,j)}^{(I,k)}\right) \frac{\frac{\mathrm{d}}{\mathrm{d}s}\hat{h}(s)|_{s=1}}{r_{(I,J)}} - \hat{R}_1 }{\left( \sum\nolimits_{k=0}^{I-1}r^{(I,J)}_{(k,J)} (1-Q_{(k,J)})+ \sum\nolimits_{k=0}^{J-1} r^{(I,J)}_{(I,k)} (1-Q_{(I,k)})\right) \frac{\frac{\mathrm{d}}{\mathrm{d}s} \hat{h}(s)|_{s=1}}{r_{(I,J)}} - \hat{R}_2 }. \end{aligned} \tag{22}$$

### *Proof*

### *Remark 1*

For \(\sigma _{(I,J)}-r_{(I,J)}<0\), we have \(\frac{\mathrm {d}}{\mathrm {d}s}h(s)|_{s=1}=\frac{r_{(I,J)}}{r_{(I,J)}-\sigma _{(I,J)}}\). For \(\sigma _{(I,J)}-r_{(I,J)}>0\), we have \(\frac{\mathrm {d}}{\mathrm {d}s}\hat{h}(s)|_{s=1}=\frac{r_{(I,J)}}{\sigma _{(I,J)}-r_{(I,J)}}\).
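Remark 1 and Eq. (21) can be checked with a small Monte Carlo experiment on the single-type process of type \((I,J)\) individuals. The sketch assumes, per individual, same-type births at rate \(1-r\), recombinant births at rate \(r\) (counted but not followed), and deaths at rate \(1-\sigma\). Because all rates are proportional to the number of individuals, the embedded jump chain is a random walk and waiting times can be skipped. Lineages that reach a hypothetical cap of 40 individuals are counted as surviving; for the supercritical values below, a lineage of that size goes extinct with probability \((0.7/0.9)^{40}\approx 4\cdot 10^{-5}\). Function names and parameter values are illustrative.

```python
import random

def recombinations_until_extinction(sigma, r, rng, cap=40):
    """Jump chain of the type-(I,J) process started by one individual.

    Returns (went_extinct, number_of_recombination_events)."""
    total = 2.0 - sigma                      # (1 - r) + r + (1 - sigma)
    p_birth, p_recomb = (1.0 - r) / total, r / total
    n, count = 1, 0
    while 0 < n < cap:
        u = rng.random()
        if u < p_birth:
            n += 1                           # same-type birth
        elif u < p_birth + p_recomb:
            count += 1                       # recombinant birth: type (I,J) count unchanged
        else:
            n -= 1                           # death
    return n == 0, count

def summarize(sigma, r, reps=20000, seed=1):
    """Survival probability of type (I,J) and mean recombination count given its extinction."""
    rng = random.Random(seed)
    extinct_counts = []
    for _ in range(reps):
        extinct, count = recombinations_until_extinction(sigma, r, rng)
        if extinct:
            extinct_counts.append(count)
    return 1.0 - len(extinct_counts) / reps, sum(extinct_counts) / len(extinct_counts)

print(summarize(-0.2, 0.3))  # mean should be close to r / (r - sigma) = 0.6
print(summarize(0.3, 0.1))   # survival close to (sigma - r) / (1 - r) = 0.222, mean close to r / (sigma - r) = 0.5
```

Conditioning on extinction simply means keeping the runs that hit zero, which matches the definition of \(\hat{h}(s)\).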

### **Corollary 1**

The proof for relation Eq. (32) is given in Appendix C.

### 5.3 The deterministic phase

It remains to determine whether the haplotype that establishes in the stochastic phase rises to fixation or if types with less deleterious material can establish during the sweep of the beneficial allele. In order to arrive at an approximation for the deterministic phase, we apply and extend an approach developed in Hartfield and Otto (2011). Hartfield and Otto (2011) determined the hitchhiking probability of a single deleterious allele which is closely linked to a beneficial one. For a single hitchhiker, their method can easily be adapted to our model, as shown below. In the Appendix, we further argue that the approach can be extended to a larger number of hitchhikers. Explicit results for two hitchhikers are derived in Appendix F.
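The Poisson logic behind this type of argument can be sketched numerically for a single hitchhiker. This is a rough illustration under our own simplifying assumptions, not the paper's Eq. (40): the hitchhiker haplotype \((0,1)\) with fitness \(\sigma_{(0,1)}>0\) sweeps logistically from a frequency \(x_0\) (assumed \(1/N\)); \((0,0)\) recombinants arise at rate \(N r x(1-x)\) (matings between \((0,1)\) and wildtype with a crossover between the loci, half of whose offspring carry the beneficial allele without the deleterious one); a recombinant born at time \(t\) survives with probability \(p(t)\), obtained from a time-inhomogeneous birth-death process with birth rate \(1\) and death rate \(1-s(t)\), where \(s(t)=\sigma_{(0,0)}-\sigma_{(0,1)}x(t)\) to first order in the selection coefficients; and the hitchhiker fixes if no such recombinant establishes. The function name and default step sizes are hypothetical.

```python
import math

def hitchhiking_probability(N, r, sigma0, sigma1, x0=None, dt=0.05, eps=1e-8):
    """P(no (0,0) recombinant establishes during the sweep), Poisson approximation.

    sigma1: fitness of the sweeping hitchhiker haplotype (0,1), assumed > 0
    sigma0: fitness of the recombinant (0,0), assumed > sigma1
    """
    if x0 is None:
        x0 = 1.0 / N
    a = (1.0 - x0) / x0
    t_end = math.log(a / eps) / sigma1        # x(t_end) is about 1 - eps
    n = int(t_end / dt) + 1
    # Logistic sweep x(t) = 1 / (1 + a exp(-sigma1 t)) on a regular time grid.
    xs = [1.0 / (1.0 + a * math.exp(-sigma1 * k * dt)) for k in range(n + 1)]
    # Backward equation dp/dt = -s(t) p + p^2, integrated from the post-sweep value sigma0 - sigma1.
    p = [0.0] * (n + 1)
    p[n] = sigma0 - sigma1
    for k in range(n, 0, -1):
        s = sigma0 - sigma1 * xs[k]
        p[k - 1] = p[k] + dt * (s * p[k] - p[k] ** 2)
    # Expected number of successful (0,0) recombinants over the sweep.
    lam = dt * sum(N * r * x * (1.0 - x) * pk for x, pk in zip(xs, p))
    return math.exp(-lam)
```

Because \(\int x(1-x)\,\mathrm{d}t = 1/\sigma_{(0,1)}\) for a logistic sweep, the exponent is roughly \(Nr\) times a ratio of selection coefficients, in line with the dependence on \(Nr\) discussed in Sect. 7.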

### 5.4 Concatenation of the stochastic and the deterministic phase

## 6 Application to various biological scenarios

### 6.1 The impact of unlinked alleles

### 6.2 The impact of a single linked deleterious allele

A comparison of Panels A/B with Panels C/D of Fig. 4 shows that the introgression probability changes only slightly over the depicted range of recombination, while the hitchhiking probability decreases significantly with increasing recombination distance; the recombination scale over which this decrease occurs depends strongly on the population size.

### 6.3 The impact of a second deleterious allele

In Panel D, the second deleterious allele can hitchhike to fixation, too (\(\sigma _{(0,2)}-r_{(0,2)}>0\)). The total hitchhiking probability of the closest deleterious alleles (i.e., the probability that either type \((0,1)\) or type \((0,2)\) fixes) is then only moderately reduced. For \(r_{(0,1)}^{(0,2)}\) small, both alleles fix. With increasing recombination distance the analytical result underestimates the true hitchhiking probability (see Appendix H).

Figures 7 and 8 show how the selective disadvantage of a second closely linked deleterious allele affects the hitchhiking probability. If it is on the same side of the beneficial allele as the first one (Fig. 7), the hitchhiking probability of the first one is greatly reduced unless the selective disadvantage is very slight (\(\sigma _{(0,2)}\approx \sigma _{(0,1)}\)). The reduction is greatest for intermediate values of the selection coefficient (see Fig. 7a). In this parameter regime, type \((0,2)\) significantly increases in frequency before a successful lineage of type \((0,0)\) or \((0,1)\) is generated. As a consequence, the time to fixation of the beneficial allele is relatively long. Even if a successful lineage of type \((0,1)\) can establish, it is therefore likely that later, a successful lineage of type \((0,0)\) is generated (see Fig. 7b for an illustration of this reasoning). If the beneficial allele is flanked at equal small recombination distances by two deleterious alleles as in Fig. 8, the total hitchhiking probability of the deleterious allele to the right (fixation of type \((0,1)\) or type \((1,1)\)) is barely influenced by the presence of the second deleterious allele, irrespective of the selective disadvantage of the latter. Note, however, that recombination is weak in Fig. 8. For strong recombination and \(\sigma _{(0,1)}\approx \sigma _{(1,0)}\), it is not unlikely that both a successful lineage of type \((0,1)\) and of type \((1,0)\) establish and coexist for a long time, making the production of a successful \((0,0)\) recombinant very likely (see Fig. 13 in Appendix H).

### 6.4 The impact of several linked deleterious alleles

As another example, consider the special case \(J=1\), \(\sigma _{(0,1)}-r_{(0,1)} >0\), and \(\sigma _{(i,j)}-r_{(i,j)}<0\), \(i>0\). If linkage is tight, the hitchhiking probability is barely reduced by additional deleterious alleles.

## 7 Discussion

Gene flow between related species is frequent in nature. Although many foreign alleles burden their carrier with a selective disadvantage, exchange of genetic material between populations is often still possible. If neutral or advantageous alleles survive the fitness bottleneck caused by linked or unlinked deleterious alleles, they can become permanently incorporated into the genome of the sister species. Picking up locally adaptive alleles from an indigenous species can help species to expand their range into previously uninhabitable regions (e.g., Heiser 1951; Whitney et al. 2006). Adaptive gene introgression is hence an evolutionary mechanism that can speed up adaptation to novel environments. Human activities create ample opportunity for hybridization between domestic animals or crop plants and their wild relatives (e.g., Fitzpatrick et al. 2010; Ellstrand et al. 1999, 2013). In this context, the introgression of alleles from genetically modified organisms (e.g., insecticide resistance genes) into weedy species is recognized as a risk that can cause permanent ecological damage. A quantitative analysis of the introgression process is essential both to assess the importance of adaptive gene introgression as an evolutionary pathway to adaptation and to estimate the ecological risks associated with unwanted hybridization.

The flow of adaptive alleles among species is hampered by the reduced fitness of inter-species crosses. This reduction in fitness is due to alleles from the donor species that are deleterious in the new environment (which we consider in this paper) and/or the new genomic background. If their compound effect outweighs the benefits of the adaptation, some deleterious alleles must be eliminated by recombination in hybrid back-crosses before the favorable allele can establish. However, closely linked slightly deleterious alleles might be dragged along to fixation. In this paper, we developed a framework to investigate the role of linked and unlinked deleterious alleles in adaptive gene introgression. The model accounts for an explicit genetic structure and describes the genetic evolution of a haploid population under the influence of selection, recombination, and drift after a single hybridization event. The analysis is based on the theory of branching processes. The early phase of spread of the advantageous allele is approximated by a reducible multitype branching process with a special structure: within the branching process approximation, offspring are either of the same type as their parent or carry fewer deleterious alleles (for similar setups see Barton and Bengtsson 1986; Demon et al. 2007; Gosh and Haccou 2010; Gosh et al. 2012a, b; Yanchukov and Proulx 2014). The fate of the recombinants that are generated after the initial establishment and until fixation of the beneficial allele is modeled by a time-inhomogeneous single-type branching process. For the analysis of the first phase, we make use of methods developed by Serra (2006) and Serra and Haccou (2007). The analysis of the second phase builds on work by Hartfield and Otto (2011). The combination of the results from both phases allows for the analytical treatment of the entire process.

*The introgression probability* How likely is it that the advantageous allele can establish itself in the population? For large populations, the probability of adaptive gene introgression only depends on the early phase of the spread, where the dynamics are well described by a branching process. Technically, this is similar to studies on scenarios where adaptation relies on the accumulation of new mutations via stochastic tunneling. The gain of adaptive alleles by mutation in these scenarios, which include the crossing of fitness valleys (e.g., Weissman et al. 2009; Proulx 2011), tumor initiation (e.g., Iwasa et al. 2003, 2004a, b), or adaptation of a pathogen to a new host (e.g., Antia et al. 2003), corresponds to the removal of deleterious alleles by recombination in models of gene introgression. The survival probability of a multitype branching process is in general difficult to determine, and one has to resort to approximate formulas and numerical methods (e.g., Barton 1995; Iwasa et al. 2003, 2004b; Serra and Haccou 2007). However, in our special case, a recursive solution can be derived and readily permits the calculation of the introgression probability for any given allele configuration.

We find that both linked and unlinked deleterious alleles can significantly hamper the introgression of an adaptive allele. However, the characteristic of this barrier depends on whether linkage between the beneficial and the deleterious alleles is loose or tight. Loosely linked and unlinked deleterious alleles reduce the introgression probability by a factor that is roughly independent of the strength of the beneficial allele. In this parameter range, our results are analogous to those of Bengtsson (1985) and Barton and Bengtsson (1986), who derive a so-called gene-flow factor or barrier strength to describe the effect of a genetic barrier on the flow of a neutral marker allele (compare Appendix B). The relative reduction due to a single loosely linked allele is approximately \(s_{\text {del}}/r\), where \(s_{\text {del}}\) is the deleterious effect and \(r\) the recombination probability (see Eq. (50), Sect. 6.2). Strongly deleterious alleles (\(s_{\text {del}} > 0.05\)) can have a substantial effect even if unlinked (\(r=0.5\)). In agreement with Bengtsson (1985), we find that the influence of several unlinked deleterious alleles is well approximated by a single deleterious allele of the compound effect [Eq. (45)]. The simple picture of a gene-flow factor holds for \(r \gg s_{\text {ben}}\) (where \(s_{\text {ben}}\) is the effect of the beneficial allele), but breaks down for tighter linkage. For small \(r\), the relative reduction scales as \(1 - r/(|s_{\text {ben}}-s_{\text {del}}|)\) and thus depends explicitly on the strength of the beneficial allele [cf. Eq. (49)].

Our results on adaptive gene introgression can also be compared with results by Barton (1995) on the reduction of the fixation probability of a new beneficial mutation due to interference with standing deleterious variation. Barton (1995) assumes that the deleterious alleles segregate under mutation-selection balance in the population when the advantageous allele appears. Consider first the limiting case where a deleterious allele segregates at a single locus with frequency \(u\). The beneficial mutation can arise on a genome that does or does not carry the deleterious allele. Technically, the introgression probability in our model corresponds most closely to the fixation probability of the beneficial mutation given that it arises on a genome with the deleterious allele [denoted by \(P_u\) in Barton (1995)]. In that case, its fixation probability can be significantly reduced. However, the reduction due to segregating deleterious variation is generally much weaker than in our case, where both alleles enter the population via a single introgression event. This is because the relative fitness of the double mutant is higher if the deleterious allele segregates in the population. A numerical comparison confirms that the result \(P_u/(2\sigma _{(0,0)})\) as given by Eqs. (16) and (17a) in Barton (1995) converges to \((1-Q_{(0,1)})/\sigma _{(0,0)}\) [cf. Eq. (47c)] as the mutation rate, and hence the frequency of the deleterious allele, tends to zero. (Note that the results in Barton (1995) are based on a Poisson distribution of the offspring number, such that the establishment probability of an isolated beneficial allele is \(2\sigma _{(0,0)}\), while it is \(\sigma _{(0,0)}\) in our model.) Our results for the introgression probability hence represent limiting cases of the results in Barton (1995) (\(P_u\) in the limit \(u\rightarrow 0\)). 
However, for more than one deleterious allele, the equations in Barton (1995) do not allow for an analytical solution, while this is possible for the case of introgression, as shown by our results. The total fixation probability in the segregating-alleles case is a weighted average over the cases in which the beneficial allele appears on a genome with and without the deleterious allele; the weighting factor depends on the mutation rate. This can lead to widely diverging conclusions compared with the introgression scenario. For example, for \(\sigma _{(0,1)}<0\) and complete linkage, the weighted fixation probability of the beneficial mutation is reduced by \(u\) relative to its value in the absence of the deleterious allele [Barton 1995, Eq. (17b)]. In contrast, the introgression probability (as well as \(P_u\)) is zero in that case. For \(\sigma _{(0,1)}>0\), the relative reduction is at most \(u(s_{\text {del}}/{\sigma _{(0,0)}})^2\), i.e., much smaller than for introgression (\(s_{\text {del}}/{\sigma _{(0,0)}}\)). Note that the reduction in the weighted fixation probability is caused by the recurrent generation of deleterious alleles that appear on genomes carrying the adaptation. The presence of deleterious alleles itself even slightly increases the weighted fixation probability [this term is very small and neglected in Barton (1995)]. Finally, Barton (1995) finds that for two loci flanking the beneficial mutation, the effects of the two deleterious alleles approximately multiply. This does not hold for the different biological scenario of adaptive gene introgression.

Linked deleterious alleles can render successful introgression after a single hybridization event extremely unlikely. To assess whether introgression probabilities as low as \(10^{-8}\) to \(10^{-6}\) are still evolutionarily relevant, it is helpful to compare them with the probability of adaptation by de novo mutation. With a point mutation probability of \(\sim 10^{-8}\) and a selective advantage of \(1\,\%\), the probability that a specific mutation occurs in a specific individual and subsequently rises to fixation is \(\sim 10^{-10}\). For complex adaptations, this probability is even lower. Depending on the rate of hybridization, adaptive gene introgression can hence be a relevant evolutionary process. Hybridization rates are potentially high, and even if the success probability of each single hybridization event is low, the probability that at least one hybridization event is followed by adaptive gene introgression is appreciable. This consideration is particularly important in an agricultural context, where (genetically modified) crops grow next to wild plants over large areas all over the world for many years. Gosh and Haccou (2010) and Gosh et al. (2012a, b) therefore suggest the so-called hazard rate as a measure for risk assessment, as the hazard rate takes both the hybridization rate and the introgression probability into account.
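The comparison can be made quantitative with a back-of-the-envelope sketch. It assumes, as a simplification not stated in the text, that hybridization events are independent and that each leads to introgression with the same probability \(p\), so that at least one of \(n\) events succeeds with probability \(1-(1-p)^n\). The values of \(p\) and \(n\) below are hypothetical and only mirror the orders of magnitude discussed above.

```python
import math


def prob_any_introgression(p_single: float, n_events: int) -> float:
    """P(at least one of n independent events succeeds) = 1 - (1 - p)^n.

    log1p/expm1 keep the result accurate when p is tiny and n is large.
    """
    return -math.expm1(n_events * math.log1p(-p_single))


# Order-of-magnitude probability of de novo adaptation used in the text:
# point mutation probability ~1e-8 times a selective advantage of ~1 %.
mu, s_ben = 1e-8, 0.01
p_de_novo = mu * s_ben  # ~1e-10

# Hypothetical: introgression probability 1e-7 per event, one million hybridization events.
p_intro = prob_any_introgression(1e-7, 1_000_000)
print(p_de_novo, p_intro)  # p_intro is close to 1 - exp(-0.1), i.e. about 0.095
```

Even a per-event success probability three orders of magnitude above the de novo value thus translates into a substantial cumulative probability once hybridization is frequent, which is the quantity the hazard-rate approach is designed to capture.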

*The hitchhiking probability* Weakly deleterious alleles that are closely linked to the adaptive allele can hitchhike to fixation. We developed a framework to estimate which haplotype eventually fixes in the population, depending on the alien haplotype that was originally introduced. The approach is based on splitting the process into two phases: the establishment phase of the adaptive allele and the sweep, during which further deleterious alleles can be lost. How relevant are the stochastic and the deterministic phase, respectively, in this scenario? In the simplest case, there are only two loci under selection: one carrying the advantageous allele and one carrying a deleterious allele that can hitchhike to fixation. We can then distinguish two parameter regimes. If the product of the deleterious selection coefficient and the population size, \(N s_{\text {del}}\), is of order \(1\) or smaller, the impact of the stochastic phase is significant. The probability that the hitchhiker survives this phase depends strongly on the selection coefficients of both the beneficial and the deleterious allele and on the recombination rate, but is independent of the population size [\(1-r s_{\text {ben}}/(s_{\text {ben}}-s_{\text {del}})^2\), cf. Eq. (54)]. If, however, the product of selection coefficient and population size is large, the stochastic phase can be ignored and the behavior is dominated by the deterministic phase. Hitchhikers survive this phase if no haplotype without the deleterious allele can establish. Since the duration of the deterministic phase is \(\sim 1/(s_{\text {ben}}-s_{\text {del}})\) and the number of successful new recombinants per generation is roughly \(\sim s_{\text {ben}}Nr\), the hitchhiking probability depends strongly on \(Nr\), while the effect of selection partly cancels. This is confirmed by our more precise calculations [Eq. (40)]. The situation is different if additional deleterious alleles render the initial haplotype itself deleterious. 
In that case, establishment of the adaptation is contingent on the early loss of deleterious alleles, and, depending on the allelic configuration, the stochastic establishment phase has a strong impact on hitchhiking irrespective of the population size. To a good approximation, all alleles that cause serious maladaptation and are located on the same side of the beneficial allele can be combined into a single allele with their compound effect, which reduces the dimensionality of the problem. The impact of these additional alleles fades quickly with increasing recombination distance. In contrast to the introgression probability, the hitchhiking probability conditioned on successful introgression is not visibly affected by unlinked alleles. These insights essentially generalize to more than one potential deleterious hitchhiker.
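The contrast between the two regimes of the two-locus case can be illustrated with a short sketch. It evaluates the stochastic-phase survival expression quoted above (a small-\(r\) approximation, clipped at zero here) and the order-of-magnitude number of successful recombinants during the sweep, obtained by multiplying the rate \(\sim s_{\text {ben}}Nr\) by the duration \(\sim 1/(s_{\text {ben}}-s_{\text {del}})\). The second quantity is only this scaling argument and not Eq. (40); the function names and parameter values are hypothetical.

```python
def stochastic_phase_survival(r: float, s_ben: float, s_del: float) -> float:
    """Approximate probability that the deleterious hitchhiker survives the
    establishment phase, 1 - r * s_ben / (s_ben - s_del)**2 (small-r approximation)."""
    if not 0 < s_del < s_ben:
        raise ValueError("expects 0 < s_del < s_ben")
    return max(0.0, 1.0 - r * s_ben / (s_ben - s_del) ** 2)


def expected_recombinants_during_sweep(N: float, r: float, s_ben: float, s_del: float) -> float:
    """Scaling estimate: (~ s_ben * N * r per generation) * (~ 1 / (s_ben - s_del) duration).

    Rewritten as N * r / (1 - s_del / s_ben): only N*r and the ratio s_del/s_ben enter.
    """
    return N * r / (1.0 - s_del / s_ben)


r, s_ben, s_del = 1e-4, 0.05, 0.01
print(stochastic_phase_survival(r, s_ben, s_del))  # 1 - 5e-6 / 1.6e-3, about 0.997
for N in (1e3, 1e4, 1e5):
    # Values well above 1 suggest that a recombinant without the hitchhiker is likely to establish.
    print(N, expected_recombinants_during_sweep(N, r, s_ben, s_del))
```

Doubling both selection coefficients leaves the deterministic estimate unchanged but raises the stochastic-phase survival, mirroring the statement that the absolute strength of selection matters only in the stochastic phase.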

Hartfield and Otto (2011) analyze the hitchhiking probability of a single deleterious allele in the absence of other deleterious alleles. They present two approaches to the problem: a semi-deterministic approach based on branching process theory (which also serves as the basis for our analysis of the deterministic phase), and a diffusion approach. In both cases, however, they condition on establishment of type \((0,1)\). Their analysis thus ignores aspects of the stochastic establishment phase and consequently applies only to the regime of large \(Ns_{\text {del}}\).

*Selection and recombination in the introgression process* In summary, we can identify three fundamentally different genomic scales, in units of the recombination rate, that matter for adaptive gene introgression. First, deleterious alleles on the introgression haplotype affect the introgression probability of the beneficial allele across distances on the order of the deleterious selection coefficient (\(r \sim s_{\text {del}}\); cf. Eq. (50)). Strongly deleterious alleles thus still matter even if they are unlinked. Importantly, the absolute strength of selection is crucial for the failure or success of introgression. For the hitchhiking probability of a single deleterious allele, we find two relevant scales that stem from the stochastic and the deterministic phase of the hitchhiking process, respectively. In the stochastic phase, the scale is set by the selection coefficient of the haplotype carrying both the beneficial and the deleterious allele (\(r \sim s_{\text {ben}}-s_{\text {del}}\); cf. Eq. (54)). As for the introgression probability, it is the absolute strength of selection that matters. In the deterministic phase, by contrast, the scale is primarily set by the inverse population size (\(r \sim 1/N\)). The rate at which the deleterious allele is lost during the deterministic phase is also affected by selection; however, unlike in the stochastic phase, only the ratio of the selection coefficients (\(s_{\text {del}}/s_{\text {ben}}\)) enters, not their absolute strength [Eq. (40)]. Usually (but not always), effects from the deterministic phase are relevant over much shorter distances than effects from the stochastic phase and will therefore dominate. Finally, a third hitchhiking scale for the distance \(r\) between the beneficial allele and a deleterious hitchhiker becomes relevant if hitchhiking is compromised by further deleterious alleles in the genomic background. 
For cases in which an additional deleterious allele at distance \(R\) from the hitchhiking allele pushes the Malthusian fitness of the original introgression haplotype below zero, we find that the relevant scale for hitchhiking success is primarily set by the relative size of the two recombination rates, i.e., \(r \sim R\) [cf. Eqs. (56), (60)].
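For orientation, the sketch below tabulates the characteristic recombination scales listed above for one hypothetical parameter set. It only restates the orders of magnitude from the text and is not a substitute for Eqs. (40), (50), (54), (56), or (60); the function name and the dictionary keys are illustrative.

```python
from typing import Dict


def recombination_scales(N: float, s_ben: float, s_del: float, R: float) -> Dict[str, float]:
    """Order-of-magnitude recombination distances at which each effect becomes relevant."""
    return {
        "introgression (r ~ s_del)": s_del,
        "hitchhiking, stochastic phase (r ~ s_ben - s_del)": s_ben - s_del,
        "hitchhiking, deterministic phase (r ~ 1/N)": 1.0 / N,
        # Only applies if a background allele at distance R pushes the Malthusian
        # fitness of the initial introgression haplotype below zero.
        "hitchhiking with deleterious background (r ~ R)": R,
    }


scales = recombination_scales(N=1e4, s_ben=0.05, s_del=0.01, R=1e-3)
for name, value in sorted(scales.items(), key=lambda kv: kv[1]):
    print(f"{value:.1e}  {name}")
```

With these values the deterministic-phase scale \(1/N\) is the shortest, consistent with the observation that it usually dominates; for very small populations or very weak selection the ordering can change.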

*Limitations and extensions* The mathematical analysis of the model has two major restrictions. The first is the common constraint of branching process approximations for establishment probabilities of beneficial alleles: for small populations, the branching process approach underestimates the true probability. In particular, our multitype branching process can contain supercritical, critical, and subcritical types. If the population size is small, fixation of deleterious haplotypes by genetic drift can be more likely than survival of the branching process. The second restriction concerns the analysis of the hitchhiking probability. The derived approximations rely on the assumption that a single haplotype first establishes and starts sweeping before haplotypes with fewer deleterious alleles can establish. Particularly when linkage is not tight, this assumption is not necessarily satisfied. However, this introduces only a small error, and the approximation usually still yields good results (cf. Fig. 12). Relaxing the assumption would require assessing the stochastic dynamics of two or more types that establish simultaneously in the population at different random speeds, which would severely complicate the calculations. Furthermore, the analysis of the deterministic phase increases in complexity with the number of positively selected haplotypes. To keep it tractable, we assume that at most two successful recombinant haplotypes are generated during the fixation process of the adaptive allele. While this assumption is again well justified for tight linkage, it leads to strong deviations as recombination gets stronger (cf. Fig. 13).
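The first restriction can be illustrated in the simplest single-type setting, which serves as a stand-in and not as the multitype process analyzed here. The sketch compares the exact fixation probability of a single mutant with relative fitness \(w\) in a Moran population of size \(N\), \((1-1/w)/(1-w^{-N})\), with the survival probability \(1-1/w\) of a birth-death branching process with birth rate \(w\) and death rate 1. The symbol \(w\) is used for relative fitness to avoid confusion with the recombination rate \(r\).

```python
def moran_fixation(w: float, N: int) -> float:
    """Exact fixation probability of one mutant with relative fitness w in a Moran population of size N."""
    if w == 1.0:
        return 1.0 / N
    return (1.0 - 1.0 / w) / (1.0 - w ** (-N))


def branching_survival(w: float) -> float:
    """Survival probability of a linear birth-death process with birth rate w and death rate 1."""
    return max(0.0, 1.0 - 1.0 / w)


for w in (1.05, 0.99):
    for N in (20, 200, 2000):
        print(w, N, moran_fixation(w, N), branching_survival(w))
```

For a beneficial mutant (\(w>1\)) the two values agree for large \(N\), while for small \(N\) the branching approximation is too low; for a deleterious mutant (\(w<1\)) the branching process never survives, although drift can still fix the mutant in a finite population.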

The model assumes a population of haploid individuals, or of diploids without dominance. The results for the introgression probability generalize directly to diploids with dominance unless the beneficial allele is completely recessive: as long as introgressed alleles are rare in the population, they appear only in heterozygotes. Equation (2) therefore still applies if fitness refers to heterozygote fitness. As soon as the introgressed alleles become more frequent, both chromosome copies of an individual may carry introgressed material. The sweep of a selectively favored haplotype is therefore strongly altered if the alleles are not codominant. The length and shape of this frequency trajectory are, however, crucial for the hitchhiking probability. Our approach can be extended to include this element (see Hartfield and Glémin 2014 for a single deleterious hitchhiker with dominance). The model furthermore assumes that selection against the deleterious alleles is independent of the genetic background. Recombination therefore increases the fitness of later-generation hybrids. However, recombination can also generate incompatibilities. In that case, later hybrids display lower fitness than early-generation hybrids until further recombination removes the incompatible alleles.

Beyond these genetic limitations, the second class of model extensions concerns the ecological setting. The model assumes that the alien genome arrives by long-range dispersal in a panmictic population. An important extension, which would considerably affect the results, is the incorporation of spatial structure. A situation very different from the one analyzed in this paper arises if dispersal is local and a hybrid zone is built up by strong, recurrent gene flow. Under these circumstances, as discussed in Barton (1979), a single-locus cline does not significantly hamper the spread of a beneficial allele from one population to the other.

To summarize, we set up a minimal model in order to reveal fundamental principles that are effective in the introgression process of a favorable allele. In particular, the analysis helps to build an intuitive understanding of how deleterious alleles impact adaptive gene introgression.

## Notes

### Acknowledgments

We thank Matthew Hartfield, Nick Barton, Lindi Wahl, and two anonymous referees for helpful comments on the manuscript. This work was made possible with financial support by the Vienna Science and Technology Fund (WWTF), by the Deutsche Forschungsgemeinschaft (DFG), Research Unit 1078 *Natural selection in structured populations*, by the Austrian Science Fund (FWF) via funding for the Vienna Graduate School for Population Genetics, and by a “For Women in Science” fellowship (L’Oréal Österreich in cooperation with the Austrian Commission for UNESCO and the Austrian Academy of Sciences with financial support from the Federal Ministry for Science and Research Austria).

## References

- Abi-Rached L et al (2011) The shaping of modern human immune systems by multiregional admixture with archaic humans. Science 334:89–94
- Adams JR, Leonard JA, Waits LP (2003) Widespread occurrence of a domestic dog mitochondrial DNA haplotype in southeastern US coyotes. Mol Ecol 12(2):541–546
- Antia R, Regoes RR, Koella JC, Bergstrom CT (2003) The role of evolution in the emergence of infectious diseases. Nature 426:658–661
- Arnold ML (2004) Transfer and origin of adaptations through natural hybridization: Were Anderson and Stebbins right? Plant Cell 16:562–570
- Arnold ML, Martin NH (2009) Adaptation by introgression. J Biol 8:82
- Arnold ML, Bulger MR, Burke JM, Hempel AL, Williams JH (1999) Natural hybridization: How low can you go and still be important? Ecology 80(2):371–381
- Athreya KB, Ney PE (1972) Branching Processes. Springer, Berlin
- Barton NH (1995) Linkage and the limits to natural selection. Genetics 140:821–841
- Barton NH (1979) Gene flow past a cline. Heredity 43(3):333–339
- Barton N, Bengtsson BO (1986) The barrier to genetic exchange between hybridising populations. Heredity 56:357–376
- Beaumont M et al (2001) Genetic diversity and introgression in the Scottish wildcat. Mol Ecol 10:319–336
- Bengtsson BO (1985) The flow of genes through a genetic barrier. In: Greenwood JJ, Harvey PH, Slatkin M (eds) Evolution essays in honour of John Maynard Smith. Cambridge University Press, Cambridge, pp 31–42
- Demon I, Haccou P, van den Bosch F (2007) Introgression of resistance genes between populations: a model study of insecticide resistance in *Bemisia tabaci*. Theor Popul Biol 72:292–304
- Desai MM, Fisher DS (2007) Beneficial mutation-selection balance and the effect of linkage on positive selection. Genetics 176:1759–1798
- Ellstrand NC, Prentice HC, Hancock JF (1999) Gene flow and introgression from domesticated plants into their wild relatives. Annu Rev Ecol Systematics 30:539–563
- Ellstrand NC et al (2013) Introgression of crop alleles into wild or weedy populations. Annu Rev Ecol Evol Syst 44:325–345
- Fitzpatrick BM et al (2010) Rapid spread of invasive genes into a threatened native species. Proc Natl Acad Sci 107(8):3606–3610
- Galassi M, Davies J, Theiler J, Gough B, Jungman G, Alken P, Booth M, Rossi F (2009) GNU Scientific Library Reference Manual, 3rd edn. Network Theory Ltd., Bristol
- Gosh A, Haccou P (2010) Quantifying stochastic introgression processes with hazard rates. Theor Popul Biol 77:171–180
- Gosh A, Meirmans PG, Haccou P (2012a) Quantifying introgression risk with realistic population genetics. Proc R Soc B 279:4747–4754
- Gosh A, Serra MC, Haccou P (2012b) Quantifying time-inhomogeneous stochastic introgression processes with hazard rates. Theor Popul Biol 81:253–263
- Gottelli D et al (1994) Molecular genetics of the most endangered canid: the Ethiopian wolf *Canis simensis*. Mol Ecol 3(4):301–312
- Hartfield M, Glémin S (2014) Hitchhiking of deleterious alleles and the cost of adaptation in partially selfing species. Genetics. doi:10.1534/genetics.113.158196
- Hartfield M, Otto SP (2011) Recombination and hitchhiking of deleterious alleles. Evolution 65(9):2421–2434
- Hedrick PW (2013) Adaptive introgression in animals: examples and comparison to new mutation and standing genetic variation as sources of adaptive variation. Mol Ecol 22:4606–4618
- Heiser CB Jr (1951) Hybridization in the annual sunflowers: *Helianthus annuus* \(\times \) *H. debilis* var. *cucumerifolius*. Evolution 5(1):42–51
- Iwasa Y, Michor F, Nowak MA (2003) Evolutionary dynamics of escape from biomedical intervention. Proc R Soc B 270:2573–2578
- Iwasa Y, Michor F, Nowak MA (2004a) Evolutionary dynamics of invasion and escape. J Theor Biol 226:205–214
- Iwasa Y, Michor F, Nowak MA (2004b) Stochastic tunnels in evolutionary dynamics. Genetics 166(3):1571–1579
- Lindner CR, Taha I, Seiler GJ, Snow AA, Rieseberg LH (1998) Long-term introgression of crop genes into wild sunflower populations. Theor Appl Genet 96:339–347
- Mallet J (2005) Hybridization as an invasion of the genome. Trends Ecol Evol 20(5):229–237
- Miller W et al (2012) Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change. Proc Natl Acad Sci E2382–E2390
- Proulx SR (2011) The rate of multi-step evolution in Moran and Wright-Fisher populations. Theor Popul Biol 80:197–207
- Rhymer JM, Simberloff D (1996) Extinction by hybridization and introgression. Annu Rev Ecol Systematics 27:83–109
- Schwenk K, Brede N, Streit B (2008) Introduction. Extent, processes and evolutionary impact of interspecific hybridization in animals. Philos Trans R Soc B 363:2805–2811
- Serra MC (2006) On the waiting time to escape. J Appl Prob 43:296–302
- Serra MC, Haccou P (2007) Dynamics of escape mutations. Theor Popul Biol 72:167–178
- Sewastjanow BA (1974) Verzweigungsprozesse. Akademie-Verlag, Berlin
- Snow AA (2002) Transgenic crops: why gene flow matters. Nat Biotechnol 20:542
- Snow AA et al (2003) A Bt transgene reduces herbivory and enhances fecundity in wild sunflowers. Ecol Appl 13(2):279–286
- Stewart CN, Halfhill MD, Warwick SI (2003) Genetic modification: transgene introgression from genetically modified crops to their wild relatives. Nat Rev Genet 4:806–817
- The Heliconius Genome Consortium (2012) Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature 487:95–98
- Uecker H, Hermisson J (2011) On the fixation process of a beneficial mutation in a variable environment. Genetics 188:915–930
- Weissman DB, Desai MM, Fisher DS, Feldman MW (2009) The rate at which asexual populations cross fitness valleys. Theor Popul Biol 75:286–300
- Whitney KD, Randell RA, Rieseberg LH (2006) Adaptive introgression of herbivore resistance traits in the weedy sunflower *Helianthus annuus*. Am Nat 167(6):794–807
- Yanchukov A, Proulx SR (2014) Migration-selection balance at multiple loci and selection on dominance and recombination. PLoS One 9(2):e88651

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.