Adaptive gene introgression after secondary contact

By hybridization and backcrossing, alleles can surmount species boundaries and be incorporated into the genome of a related species. This introgression of genes is of particular evolutionary relevance if it involves the transfer of adaptations between populations. However, any beneficial allele will typically be associated with other alien alleles that are often deleterious and hamper the introgression process. To describe the introgression of an adaptive allele, we set up a stochastic model with an explicit genetic makeup of linked and unlinked deleterious alleles. Based on the theory of reducible multitype branching processes, we derive a recursive expression for the establishment probability of the beneficial allele after a single hybridization event. We furthermore study the probability that slightly deleterious alleles hitchhike to fixation. The key to the analysis is a split of the process into a stochastic phase, in which the advantageous allele establishes, and a deterministic phase, in which it sweeps to fixation. We then apply the theory to a set of biologically relevant scenarios, such as introgression in the presence of many unlinked or a few closely linked deleterious alleles. A comparison with computer simulations shows that the approximations work well over a large parameter range.


Introduction
Hybridization between related species is a common phenomenon. Indeed, Mallet (2005) estimates that at least 25 % of plant species and 10 % of animal species still interbreed. The disappearance of natural habitat barriers following environmental change, the introduction of foreign species, the escape of domesticated animals into the wild, and the cultivation of crops all create new regions of species range overlap and consequently cause high rates of hybridization. Despite reproductive barriers, hybridization between related species is often not completely prevented and can produce viable and fertile offspring. In the course of backcrossing with a parental species, alien genetic material is lost, but some of it may be permanently incorporated into the genome of the sister species. The introgression of genes from one species into another occurs over a wide range of taxa (Rhymer and Simberloff 1996; Lindner et al. 1998; Arnold et al. 1999; Arnold 2004; Miller et al. 2012; Ellstrand et al. 2013). The introgression of genes from feral to wild animals (Adams et al. 2003; Beaumont et al. 2001; Gottelli et al. 1994; Rhymer and Simberloff 1996) or from introduced to native species (Rhymer and Simberloff 1996; Fitzpatrick et al. 2010) poses ecological risks and, if extensive, may entail a loss of biodiversity.
In addition, evidence for the transfer of adaptations across species boundaries is growing (Arnold et al. 1999; Arnold 2004; Whitney et al. 2006; Schwenk et al. 2008; Arnold and Martin 2009; The Heliconius Genome Consortium 2012; Hedrick 2013). Introgression of genes can hence directly influence the evolutionary trajectory of a species and speed up adaptation. For example, the introduced sunflower species Helianthus annuus has likely acquired resistance genes from the native and locally adapted species H. debilis in Texas, allowing it to expand its species range southwards (Heiser 1951; Whitney et al. 2006). Similarly, Abi-Rached et al. (2011) suggest that positively selected immune system alleles from Neanderthals and Denisovans might have introgressed into modern humans. In agriculture, adaptive gene introgression can pose a major risk: herbivore, insecticide, or pathogen resistance genes from (possibly genetically modified) crops can spread to wild relatives and severely complicate weed control (Snow 2002; Snow et al. 2003; Stewart et al. 2003). Importantly, Snow et al. (2003) show that a transgene can reduce herbivory and increase fitness in a wild sunflower under natural conditions. Early-generation hybrids, even if not entirely infertile or inviable, frequently suffer from strongly reduced fitness. Often, hybrids display an intermediate phenotype that is maladapted to either parental niche. Low hybrid fitness can also result from genetic incompatibilities. By backcrossing with one of the parental species, alleles that prove to be deleterious in the foreign genetic background or cause maladaptation to the parental niches can be purged and fitness restored (Heiser 1951; Arnold et al. 1999).
The probability of successful gene introgression critically depends on the strength of this fitness bottleneck.
Theoretical models of adaptive gene introgression that take a reduction in hybrid fitness into account usually assume that a predefined number of backcrosses is required to lose the deleterious material and obtain a positively selected type (Demon et al. 2007; Gosh and Haccou 2010; Gosh et al. 2012a,b). This assumes that the deleterious effects are homogeneously spread over the genome of a diploid organism and that an appreciable number of deleterious alleles is required to have a measurable impact on fitness. Focusing on other aspects of gene introgression, such as the impact of a temporally varying environment (Gosh et al. 2012b) or life history traits (Demon et al. 2007), these models greatly simplify the underlying genetics. A step towards more realistic population genetic models has been made by Gosh et al. (2012a). Their analysis, however, remains restricted to the most basic scenario, in which a single deleterious allele is linked to the locus under positive selection (see also Barton 1979). For a neutral marker locus, in contrast, the impact of a genetic barrier consisting of an arbitrary number of deleterious alleles has been investigated (Bengtsson 1985; Barton and Bengtsson 1986).
In this paper, we focus on a single hybridization event and examine the impact of linked and unlinked deleterious alleles on the introgression process of an adaptive allele. The deleterious effect of alleles is caused by maladaptation to the new environment and is independent of the genetic background. We first set up a Moran-like model that describes the evolution of the population by genetic drift, selection, and recombination. In the first part of the model analysis, we apply the theory of reducible multitype branching processes to determine by how much deleterious alleles reduce the introgression probability of a favorable allele as a function of the strength of selection and linkage. The second part considers the probability that closely linked deleterious alleles "hitchhike" to fixation. The analysis relies on a separation of the process into a strongly stochastic phase, in which a haplotype carrying the beneficial allele establishes, and a deterministic phase, in which it sweeps through the population, possibly losing deleterious material by recombination with wildtype individuals. These recombination events and the subsequent establishment or loss of haplotypes with fewer deleterious alleles are again strongly stochastic, and we again resort to the theory of branching processes to analyze them. The derived approximations are applied to a variety of biological scenarios and complemented by computer simulations. We close the paper with a discussion.

Full model and simulations
We consider a large population of N haploid individuals. The theory also applies to diploids without dominance if we can assume Hardy-Weinberg equilibrium, but we use the haploid formalism throughout the paper. Through a single hybridization event, a hybrid individual is introduced into the population (for diploids, the alien alleles arrive in the foreign habitat at the haploid stage, i.e., for plants, by pollen dispersal). The hybrid carries an adaptive allele as well as a number of deleterious alleles. Deleterious alleles are either physically linked to the adaptive allele or unlinked. We assume that this initial hybrid haplotype carries I and J linked deleterious alleles to the left and to the right of the beneficial allele, respectively, and F unlinked deleterious alleles. By recombination with wildtype individuals, haplotypes with fewer introgressed alleles can be generated, leading to a hybrid swarm. Selection on the deleterious alleles results from maladaptation to the environment and is independent of the genomic context. We assume that all wildtype individuals have the same fitness and that introgressed alleles interact identically with all wildtype backgrounds. The fitness of an individual is thus fully determined by the introgressed alleles that it carries.
The evolution of the population is described by the following scheme, which represents a Moran model with recombination: At rate N, two individuals are chosen to reproduce and generate a single offspring. During reproduction, recombination can take place. To simplify the bookkeeping of genotypes in the analytical treatment, we restrict ourselves to a single crossover among the linked alleles. Multiple crossovers are unlikely over recombination distances r with $r^2 \ll r$, so the model captures scenarios of tight linkage. Considering larger recombination distances (where multiple crossovers become likely) or gene conversion requires a straightforward extension of the formalism, in which a larger number of genotypes can be generated by recombination with wildtype individuals. Unlinked alleles are inherited with probability one-half. The offspring replaces an individual that is chosen based on its fitness. For notational simplicity, we number the individuals from 1 to N. Individual k is then chosen with probability $\frac{1-\sigma^{(k)}}{\sum_{l=1}^{N}(1-\sigma^{(l)})}$, where $\sigma^{(k)} = 0$ for wildtype individuals, i.e., $\sigma^{(k)}$ is the Malthusian fitness of individual k in a wildtype population. All three individuals involved in a reproduction event are chosen with replacement, i.e., the same individual might be chosen twice.
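The fitness-dependent choice of the individual that is replaced can be sketched in a few lines. This is a minimal illustration, not the authors' C++ simulation. The function names are hypothetical. Fitness values are assumed to be stored in a plain list with σ = 0 for wildtype individuals, and σ < 1 is assumed so that all weights are positive.

```python
import random

def death_probabilities(sigmas):
    """Probability that individual k is the one replaced: (1 - sigma_k) / sum_l (1 - sigma_l)."""
    weights = [1.0 - s for s in sigmas]
    total = sum(weights)
    return [w / total for w in weights]

def choose_event(sigmas, rng=random):
    """Draw the two parents uniformly and the replaced individual by fitness,
    all with replacement (the same individual may be drawn more than once)."""
    n = len(sigmas)
    parent_a = rng.randrange(n)
    parent_b = rng.randrange(n)
    dying = rng.choices(range(n), weights=[1.0 - s for s in sigmas], k=1)[0]
    return parent_a, parent_b, dying

# 99 wildtype individuals (sigma = 0) and one hybrid with sigma = 0.08
population = [0.0] * 99 + [0.08]
probs = death_probabilities(population)
print(probs[0], probs[-1])   # the fitter hybrid is slightly less likely to be replaced
print(choose_event(population, random.Random(1)))
```

Only the probabilities of events matter for the quantities studied here, so a sketch like this can ignore the waiting times between events altogether.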
The simulation program implements the successive events without keeping track of the time spans between them. Since we are only interested in probabilities, this does not affect the results. The number of replicates is chosen such that the error bars are smaller than the plot symbols in all plots. The simulation program is written in C++ and uses the GNU Scientific Library (Galassi et al. 2009).
The full model does not allow for an analytical treatment. In the following sections, we therefore consider approximations to the introgression process.

The early phase of spread
A single evolutionary step involves three individuals: two individuals that reproduce and one that dies. In a large population, as long as introgressed haplotypes are rare, it is unlikely that more than one hybrid individual is involved in a single event (or that the same individual is involved twice). Formally, this means that terms of order $n_{\mathrm{intro}}/N$ in the transition rates, where $n_{\mathrm{intro}}$ denotes the number of individuals with introgressed alleles, are negligible. Consequently, hybrids suffer (nearly) independent fates in the early phase of spread, and the process is well described by a multitype branching process, which is recovered exactly in the limit N → ∞. The lack of interaction among hybrids entails that types with introgressed material only recombine with wildtype individuals. Hence, by recombination they can only lose, not gain, deleterious alleles, and we encounter a special instance of a reducible multitype branching process (cf. also Barton and Bengtsson 1986; Demon et al. 2007; Gosh et al. 2012b). Recombination also generates types that carry deleterious alleles but not the beneficial allele. We do not consider these types in the following (within a branching process approach, they are doomed to extinction) but focus on carriers of the advantageous allele. In the main text, we assume that all unlinked deleterious alleles have the same effect size; a generalization of the main results to arbitrary effect sizes is given in Appendix D.
[Fig. 1: type (1, 4), fitness $\sigma_{(1,4)}$; type (1, 2), fitness $\sigma_{(1,2)}$]
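We call an individual that carries the beneficial allele with i deleterious alleles to the left, j deleterious alleles to the right, and f unlinked deleterious alleles an individual of type (i, j; f) (cf. Fig. 1a). Its net selection coefficient is denoted by $\sigma_{(i,j;f)}$, and we set $(i,j) \equiv (i,j;0)$. The recombinant offspring of a type (i, j) individual are either of type (i, k) with k < j or of type (k, j) with k < i. Such recombination events happen with probabilities $r^{(i,j)}_{(i,k)}$ and $r^{(i,j)}_{(k,j)}$, respectively. An instance of repeated recombination events is depicted in Fig. 1b. The overall probability that a recombination event takes place is given by the sum
$$r_{(i,j)} = \sum_{k=0}^{j-1} r^{(i,j)}_{(i,k)} + \sum_{k=0}^{i-1} r^{(i,j)}_{(k,j)}.$$
The number of unlinked deleterious alleles that an offspring individual inherits is binomially distributed with parameters f and 1/2. We obtain the following per capita transition rates of the possible events in the branching process (with $0 \le g \le f$):
$$
\begin{aligned}
&\text{Death:} && (i,j;f) \to \emptyset && \text{at rate } 1-\sigma_{(i,j;f)},\\
&\text{Birth (with coupled recombination):} && (i,j;f) \to (i,j;f)+(i,j;g) && \text{at rate } \tbinom{f}{g}2^{-f}\,(1-r_{(i,j)}),\\
& && (i,j;f) \to (i,j;f)+(i,k;g) && \text{at rate } \tbinom{f}{g}2^{-f}\,r^{(i,j)}_{(i,k)},\quad 0\le k<j,\\
& && (i,j;f) \to (i,j;f)+(k,j;g) && \text{at rate } \tbinom{f}{g}2^{-f}\,r^{(i,j)}_{(k,j)},\quad 0\le k<i.
\end{aligned}
\tag{1}
$$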
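Tabulating the per-capita rates for one type is a quick way to check the bookkeeping. The helper `transition_rates` and its arguments are hypothetical. `r_left[k]` is the probability that a recombinant offspring is of type (k, j), and `r_right[k]` is the probability that it is of type (i, k). Both are passed in directly rather than derived from a map of crossover positions.

```python
from math import comb

def transition_rates(sigma, i, j, f, r_left, r_right):
    """Per-capita event rates of a type (i, j; f) individual in the branching process.

    sigma      : net selection coefficient of type (i, j; f)
    r_left[k]  : probability that a recombinant offspring is of type (k, j), 0 <= k < i
    r_right[k] : probability that a recombinant offspring is of type (i, k), 0 <= k < j
    Keys are ("death",) or ("birth", (i', j', g)) giving the type of the new offspring.
    """
    if len(r_left) != i or len(r_right) != j:
        raise ValueError("need one recombination probability per possible recombinant")
    r_total = sum(r_left) + sum(r_right)
    rates = {("death",): 1.0 - sigma}
    for g in range(f + 1):
        w = comb(f, g) / 2 ** f  # Binomial(f, 1/2): g unlinked alleles are passed on
        rates[("birth", (i, j, g))] = w * (1.0 - r_total)
        for k in range(j):
            rates[("birth", (i, k, g))] = w * r_right[k]
        for k in range(i):
            rates[("birth", (k, j, g))] = w * r_left[k]
    return rates

rates = transition_rates(0.05, 2, 1, 3, r_left=[0.01, 0.02], r_right=[0.005])
print(sum(rates.values()))
```

The birth rates add up to 1 and the death rate is 1 − σ. The total event rate is therefore 2 − σ, which is the denominator of the event probabilities used in the derivation of Theorem 1.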

The probability of adaptive gene introgression
First, we focus on the probability that the beneficial allele establishes in the population. Once the beneficial allele is sufficiently frequent, it is very unlikely to be lost again. The extinction probability of the branching process as described in the previous section is thus a good approximation for the extinction probability of the beneficial allele in the full model (and the introgression probability is the complementary probability). We denote by Q (i, j; f ) the extinction probability of the process that is initiated by exactly one individual of type (i, j; f ).

Theorem 1
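The extinction probability $Q_{(I,J;F)}$ can be calculated by recursively solving the system of quadratic equations
$$
\frac{1-r_{(i,j)}}{2^{f}}\,Q_{(i,j;f)}^{2}
-\Big[2-\sigma_{(i,j;f)}
-\sum_{g=0}^{f-1}\binom{f}{g}\frac{1-r_{(i,j)}}{2^{f}}\,Q_{(i,j;g)}
-\sum_{g=0}^{f}\binom{f}{g}\frac{1}{2^{f}}\Big(\sum_{k=0}^{j-1}r^{(i,j)}_{(i,k)}Q_{(i,k;g)}+\sum_{k=0}^{i-1}r^{(i,j)}_{(k,j)}Q_{(k,j;g)}\Big)\Big]Q_{(i,j;f)}
+1-\sigma_{(i,j;f)}=0
\tag{2}
$$
for $0\le i\le I$, $0\le j\le J$, $0\le f\le F$, where the smaller root of each equation has to be used.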
We only give an illustrative derivation of Eq. (2) here and defer the full proof to Appendix A.
Consider a branching process initiated by an individual of type (i, j; f). With probability $\frac{1-\sigma_{(i,j;f)}}{2-\sigma_{(i,j;f)}}$, the founding individual dies before it reproduces, in which case the lineage is immediately extinct. With probability $\binom{f}{g}\frac{1}{2^f}\frac{1-r_{(i,j)}}{2-\sigma_{(i,j;f)}}$, it reproduces and generates a non-recombinant offspring with g unlinked alleles, i.e., an offspring of type (i, j; g).
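With probability $\binom{f}{g}\frac{1}{2^f}\frac{r^{(i,j)}_{(i,k)}}{2-\sigma_{(i,j;f)}}$ (or $\binom{f}{g}\frac{1}{2^f}\frac{r^{(i,j)}_{(k,j)}}{2-\sigma_{(i,j;f)}}$), it reproduces and gives birth to a type (i, k; g) (or (k, j; g)) individual with 0 ≤ k < j (or 0 ≤ k < i). After reproduction, there are two individuals, each of which is again the founding individual of a lineage. For the original lineage to go extinct, both of these lineages have to die out. Hence, the extinction probability $Q_{(i,j;f)}$ satisfies
$$
Q_{(i,j;f)} = \frac{1-\sigma_{(i,j;f)}}{2-\sigma_{(i,j;f)}} + \frac{Q_{(i,j;f)}}{2-\sigma_{(i,j;f)}}\sum_{g=0}^{f}\binom{f}{g}\frac{1}{2^f}\Big[(1-r_{(i,j)})\,Q_{(i,j;g)} + \sum_{k=0}^{j-1} r^{(i,j)}_{(i,k)}Q_{(i,k;g)} + \sum_{k=0}^{i-1} r^{(i,j)}_{(k,j)}Q_{(k,j;g)}\Big].
$$
Rearranging terms yields Eq. (2). In the absence of unlinked deleterious alleles (F = 0) and in the absence of linked deleterious alleles (I = J = 0), respectively, Eq. (2) yields:
$$
(1-r_{(i,j)})\,Q_{(i,j)}^2 - \Big[2-\sigma_{(i,j)} - \sum_{k=0}^{j-1} r^{(i,j)}_{(i,k)}Q_{(i,k)} - \sum_{k=0}^{i-1} r^{(i,j)}_{(k,j)}Q_{(k,j)}\Big]Q_{(i,j)} + 1-\sigma_{(i,j)} = 0,
\tag{4}
$$
$$
\frac{1}{2^f}\,Q_f^2 - \Big[2-\sigma_f - \sum_{g=0}^{f-1}\binom{f}{g}\frac{Q_g}{2^f}\Big]Q_f + 1-\sigma_f = 0,
\tag{5}
$$
where $Q_f \equiv Q_{(0,0;f)}$ and $\sigma_f \equiv \sigma_{(0,0;f)}$.
Implications of these results are discussed in Sect. 6. In Appendix B, we relate our results to those of Bengtsson (1985) and Barton and Bengtsson (1986) on the impact of a cline on the spread of a neutral marker allele. Our results are consistent with theirs when all loci are only loosely linked or unlinked but deviate for tight linkage of the deleterious allele.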

General idea
If the effects of closely linked deleterious alleles are not too harmful, namely if $\sigma_{(i,j)} - r_{(i,j)} > 0$, the beneficial allele can drag (some of) these deleterious alleles along to fixation. In this section, we develop a framework for determining the hitchhiking probabilities conditioned on fixation of the beneficial allele. The approach is based on splitting the process into two phases. After the original hybridization event, the beneficial allele must establish in the population; we call this the "stochastic phase". In the previous section, we were concerned with the establishment probability itself. Here, we further derive which haplotype (i, j) escapes stochastic loss in this initial phase, conditioned on survival of the process. We assume that only one haplotype escapes. Under many circumstances, this is a very likely outcome of the stochastic phase because the establishment probability of each type is low. Since establishment happens while the introgressed types are rare, we base the derivation on the multitype branching process as before. Once established, type (i, j) increases in frequency approximately as predicted by deterministic growth. If no further recombination events were to happen, it would rise to fixation following the logistic equation
$$\dot{x}_{(i,j)} = \sigma_{(i,j)}\,x_{(i,j)}\,\big(1-x_{(i,j)}\big). \tag{6}$$
However, during the sweep, types with fewer deleterious alleles can still be generated by recombination. If one of these types establishes, it outcompetes type (i, j). Building on theory by Hartfield and Otto (2011), we describe the production and possible establishment of these types by a time-inhomogeneous branching process with immigration. Although the generation and establishment of new haplotypes is subject to strong stochasticity, we refer to this phase as the "deterministic phase" because we model the frequency paths of the established haplotypes deterministically.
We first give a derivation for the case without unlinked deleterious material (F = 0), and subsequently generalize the approximation to F > 0.

The stochastic phase
As a first step, we determine which haplotype "rescues" the introgression process given that the process does not go extinct. For this initial phase, we again resort to the multitype branching process as defined in Eq. (1). As before, the process is initiated by a single individual of type (I, J ). If σ (I,J ) − r (I,J ) > 0, type (I, J ) has the chance to establish a permanent lineage of its own type. If σ (I,J ) − r (I,J ) ≤ 0, type (I, J ) itself will go extinct with probability 1 (ignoring the possibility of fixation by drift). However, until extinction, recombinant offspring with fewer deleterious alleles can be generated and rescue the process. In that case, to determine the "rescue type", we can consider all recombination pathways that lead to establishment of the beneficial allele and determine with which (relative) probability the various paths are realized. This idea is key for the derivation of the approximation in this section.
Throughout the analysis, the total number of recombination events from type (I, J) to any other type until extinction of type (I, J) is a central quantity (following Serra 2006 and Serra and Haccou 2007). For $\sigma_{(I,J)} - r_{(I,J)} < 0$, we denote the corresponding probability generating function (p.g.f.) by h(s). For $\sigma_{(I,J)} - r_{(I,J)} > 0$, we consider the number of recombination events conditioned on extinction of type (I, J) and denote its p.g.f. by ĥ(s). Both h(s) and ĥ(s) can be calculated explicitly for our model and are given by Lemma 6 in Appendix E.
As a first step, we derive an alternative expression for the survival probability of the process. To this end, we group the recombinant offspring of type (I, J) individuals into two classes: (1) individuals that found processes that survive and (2) individuals that found processes that go extinct. We denote by $Y^+$ and $Y^-$ the random numbers of recombination events from type (I, J) to class 1 and class 2 individuals, respectively. In the following lemma, we rewrite the survival probability of the process in terms of the expected number of successful recombinant lineages and an error term.
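Lemma 1 Let $\sigma_{(I,J)} - r_{(I,J)} < 0$. The survival probability of the process can be written as
$$1 - Q_{(I,J)} = P_{\mathrm{success}}\,h'(1) - R_1,$$
where the error term $R_1$ is given by
$$R_1 = \sum_{n\ge 2}(n-1)\,\Pr(Y^+ = n)$$
and
$$P_{\mathrm{success}} = \frac{\sum_{k=0}^{J-1} r^{(I,J)}_{(I,k)}\big(1-Q_{(I,k)}\big) + \sum_{k=0}^{I-1} r^{(I,J)}_{(k,J)}\big(1-Q_{(k,J)}\big)}{r_{(I,J)}}.$$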
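Proof A recombinant offspring of a type (I, J) individual founds an infinite lineage with probability $P_{\mathrm{success}}$ and a lineage that goes extinct with probability $1 - P_{\mathrm{success}}$. According to Lemma 7 in Appendix E, the joint p.g.f. of $Y^+$ and $Y^-$ is given by
$$E\big[s_1^{Y^+}s_2^{Y^-}\big] = h\big(P_{\mathrm{success}}\,s_1 + (1-P_{\mathrm{success}})\,s_2\big),$$
and we obtain for the expected number of class 1 individuals
$$E[Y^+] = P_{\mathrm{success}}\,h'(1).$$
Now note that type (I, J) itself goes extinct, so the process survives if and only if $Y^+ \ge 1$:
$$1 - Q_{(I,J)} = \Pr(Y^+ \ge 1) = E[Y^+] - R_1$$
with $R_1$ as above. I.e., if $\Pr(Y^+ > 1) \approx 0$, the expected number of recombination events from type (I, J) individuals to individuals with fewer deleterious alleles that found a successful lineage approximates the survival probability of the process.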
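A quick Monte Carlo experiment makes the central quantity h′(1) concrete. This is the expected number of recombination events produced by a type (I, J) lineage before it dies out. For the subcritical case, a first-step argument on the branching-process rates gives h′(1) = r/(r − σ), writing σ and r for the fitness and total recombination probability of type (I, J). This is our own short calculation, not the expression from Lemma 6. Under the lemma, it suggests $1 - Q_{(I,J)} \approx P_{\mathrm{success}}\, r/(r-\sigma)$ whenever two or more successful recombinants are unlikely. The function name and the parameter values are hypothetical.

```python
import random

def mean_recombination_events(sigma, r, replicates, seed=1):
    """Monte Carlo estimate of h'(1) for sigma - r < 0.

    Each type (I, J) individual has an own-type birth at rate 1 - r, a recombinant
    birth at rate r and death at rate 1 - sigma. All individuals are identical, so
    only the embedded jump chain is simulated (waiting times are not needed).
    """
    rng = random.Random(seed)
    total = 2.0 - sigma
    p_birth = (1.0 - r) / total
    p_death = (1.0 - sigma) / total
    events = 0
    for _ in range(replicates):
        n = 1
        while n > 0:
            u = rng.random()
            if u < p_birth:
                n += 1
            elif u < p_birth + p_death:
                n -= 1
            else:
                events += 1  # a recombinant offspring is produced
    return events / replicates

sigma, r = -0.05, 0.1
print(mean_recombination_events(sigma, r, 20_000), r / (r - sigma))
```

The estimate should agree with r/(r − σ) up to Monte Carlo error.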
Remark For $\sigma_{(I,J)} - r_{(I,J)} > 0$, we can consider the process conditioned on extinction of type (I, J). In this case, an analogous result holds if we replace h(s) by ĥ(s).
In order to proceed, we need a formal definition of a "rescue type". Analogous to the lemma, we can then derive a recursive formula for the probability that an individual of type (i, j) rescues the process.

Definition 1
We call an (i, j) individual an (i, j) "rescue type", denoted as (i, j, +), if (1) it founds an infinite lineage of type (i, j) individuals, (2) there is no individual in its ancestry that founds an infinite lineage of its own type.
We denote by $X^{(k,l)}_{(i,j,+)}$ the number of rescue types (i, j, +) in a process that is founded by an individual of type (k, l).
We define
$$P^{(I,J)}_{(i,j)} := \Pr\big(X^{(I,J)}_{(i,j,+)} \ge 1 \,\big|\, \text{survival of the process}\big).$$
That is, $P^{(I,J)}_{(i,j)}$ gives the probability that there exists an (i, j) rescue type conditioned on survival of the process. A priori, this does not exclude the simultaneous existence of several rescue types. For the following theorem, we again group the recombinant offspring of a type (I, J) individual into two classes: (1) individuals that found a lineage resulting in at least one individual of type (i, j, +) and (2) individuals that do not. We denote the numbers of recombinants of the first and second class by $Y_{(i,j,+)}$ and $Y_{(i,j,-)}$, respectively.
Proof We first prove part (1) of the theorem. With probability P (i, j,+) , a recombinant offspring founds a lineage that generates at least one individual of type (i, j, +). Analogous to before, we obtain with It holds: Substituting 1 − Q (I,J ) by the approximation Eq. (7) yields Eq. (16).
If σ (I,J ) − r (I,J ) > 0, type (I, J ) establishes a lineage of its own type with probability (cf. Lemma 4). It therefore holds: The probability that type (I, J ) goes extinct, conditioned on survival of the process, is accordingly given by We can now repeat the proof of the first part of the theorem for the process conditioned on extinction of type (I, J ).
For σ (I,J ) − r (I,J ) < 0 not too close to zero, it is likely that only one of the few recombinant offspring of type (I, J ) individuals founds an infinite lineage, and we can approximate P(Y + ≥ 2) ≈ 0 and consequently also P(Y (i, j,+) ≥ 2) ≈ 0. This implies that the error terms R 1 and R 2 can be ignored. For σ (I,J ) −r (I,J ) > 0 and r (I,J ) small, survival of the process is with high probability contingent on establishment of type (I, J ) so that We can therefore formulate the following corollary: Corollary 1 For σ (I,J ) − r (I,J ) < 0 not too close to zero and close linkage, we can approximate with for σ (i, j) − r (i, j) > 0. In this approximation, the P The proof for relation Eq. (32) is given in Appendix C. The approximation [Eqs. (30) and (31)] implies that exactly one rescue type establishes in the population during the stochastic phase (types with fewer deleterious alleles can still arise later during the deterministic phase), i.e., This assumption appears to be justified for a large parameter region with tightly linked deleterious alleles. It is also the basis for most of our analytical treatment of particular cases below. As discussed in Appendix H, the approximation is less accurate if deleterious alleles are relatively loosely linked and/or haplotypes are only slightly deleterious, but in most cases, it introduces only a small error. We can extend the approximation to include unlinked deleterious alleles and obtain for σ (I,J ) − r (I,J ) < 0: For σ (I,J ) − r (I,J ) > 0, we approximate:

The deterministic phase
It remains to determine whether the haplotype that establishes in the stochastic phase rises to fixation or whether types with less deleterious material can establish during the sweep of the beneficial allele. To approximate the deterministic phase, we apply and extend an approach developed by Hartfield and Otto (2011), who determined the hitchhiking probability of a single deleterious allele closely linked to a beneficial one. For a single hitchhiker, their method can easily be adapted to our model, as shown below. In the Appendix, we further argue that the approach can be extended to a larger number of hitchhikers; explicit results for two hitchhikers are derived in Appendix F. For a single potential hitchhiker, assume that type (0, 1) with $\sigma_{(0,1)} - r_{(0,1)} > 0$ has been introduced and has established in the population. Its further growth is well described deterministically by the differential equation Eq. (6). In the initial phase, however, it will on average have grown faster than the deterministic path predicts. Following Uecker and Hermisson (2011), we account for this fast initial increase by an "effective initial population size" ν, which we use as the initial condition for the solution of Eq. (6) (cf. also Desai and Fisher 2007). ν is an exponentially distributed random variable with density
$$p_{\mathrm{est}}\,e^{-p_{\mathrm{est}}\nu}, \qquad \nu \ge 0,$$
where $p_{\mathrm{est}} = 1 - q_{(0,1)}$ (cf. Eq. (26)) denotes the establishment probability of a single type (0, 1) individual in a wildtype population (Uecker and Hermisson 2011, Eq. (40)). To leading order, we can approximate the distribution by its mean $\bar\nu = 1/p_{\mathrm{est}}$. For the relative frequency of type (0, 1), it then holds (ignoring recombination):
$$x_{(0,1)}(t) = \frac{\bar\nu\,e^{\sigma_{(0,1)}t}}{N - \bar\nu + \bar\nu\,e^{\sigma_{(0,1)}t}}.$$
This is a good approximation up to frequency $x_{(0,1)} \approx 1 - \bar\nu/N$. At higher frequencies, the frequency again grows faster. Individuals of type (0, 0) are recurrently generated by recombination at rate $r_{(0,1)}\,N\,x_{(0,1)}\,(1 - x_{(0,1)})$.
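The sketch below shows how the effective initial size feeds into the sweep. It draws ν from an exponential distribution with mean $\bar\nu = 1/p_{\mathrm{est}}$ and evaluates the logistic path started at $\bar\nu/N$. The establishment probability $p_{\mathrm{est}} = (\sigma_{(0,1)} - r_{(0,1)})/(1 - r_{(0,1)})$ is an assumption. It is the survival probability of a linear birth-death process with own-type birth rate 1 − r and death rate 1 − σ, and it may differ from the exact form of Eq. (26). The helper names and parameter values are hypothetical.

```python
import math
import random

def own_type_establishment(sigma, r):
    """Assumed form of 1 - q_(0,1): survival of a linear birth-death process
    with own-type birth rate 1 - r and death rate 1 - sigma."""
    return max(0.0, (sigma - r) / (1.0 - r))

def sweep_frequency(t, sigma, nu_bar, N):
    """Solution of the logistic equation started at frequency nu_bar / N."""
    x0 = nu_bar / N
    return 1.0 / (1.0 + (1.0 - x0) / x0 * math.exp(-sigma * t))

sigma01, r01, N = 0.06, 0.001, 10_000
p_est = own_type_establishment(sigma01, r01)
nu_bar = 1.0 / p_est                                  # mean effective initial size
t_half = math.log((N - nu_bar) / nu_bar) / sigma01    # time at which x = 1/2

# nu itself is exponentially distributed with rate p_est (mean 1 / p_est)
rng = random.Random(2)
samples = [rng.expovariate(p_est) for _ in range(20_000)]
print(nu_bar, sum(samples) / len(samples), t_half)
```

Because $\bar\nu > 1$, the sweep reaches frequency 1/2 earlier than a deterministic path started from a single individual.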
As long as they are rare, their dynamics are dominated by stochasticity and can once again be approximated by a branching process. Their fitness depends on $x_{(0,1)}(t)$ and hence on time. The dynamics are thus described by a time-inhomogeneous branching process with birth rate 1 and death rate $1 - \sigma_{(0,0)} + x_{(0,1)}(t)\,\sigma_{(0,1)}$. Following Eq. (16a) in Uecker and Hermisson (2011), the establishment probability of a single individual of type (0, 0) generated at time T is given by
$$\pi(T) = \left[1 + \int_T^\infty \big(1 - \sigma_{(0,0)} + x_{(0,1)}(t)\,\sigma_{(0,1)}\big)\exp\!\Big(-\int_T^t \big(\sigma_{(0,0)} - x_{(0,1)}(\tau)\,\sigma_{(0,1)}\big)\,d\tau\Big)\,dt\right]^{-1}.$$
"Successful" individuals of type (0, 0) are hence generated at rate $r_{(0,1)}\,N\,x_{(0,1)}(T)\,\big(1 - x_{(0,1)}(T)\big)\,\pi(T)$. Using this, we obtain for the probability that type (0, 1) fixes in the population: …, where the approximation requires $\bar\nu < 0.5N$. For a single introgressed individual at time t = 0 in a large population, we can approximate $\bar\nu/N \approx 0$ and obtain … Eq. (41) corresponds to Eq. (5) in Hartfield and Otto (2011) up to a model-specific factor of 2 in the exponent if we identify … If … is small, or if there are already other introgressed haplotypes sweeping in the population as in the generalization to more potential hitchhikers, it makes a quantitative difference whether one accounts for the fast initial increase or not, and we cannot approximate $\bar\nu/N \approx 0$ (see Appendix F). Alternatively, one can resort to a diffusion approach for these cases (see Hartfield and Otto 2011 and Appendix G). Note that both approaches assume that recombination is so weak that, by itself, it does not influence the frequency path of type (0, 1) (i.e., $\sigma_{(0,1)} \gg r_{(0,1)}$).
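The deterministic-phase argument can be evaluated numerically. The sketch below is our own illustration and rests on two assumptions. First, successful (0, 0) recombinants arise as a Poisson process with rate $r_{(0,1)} N x(1-x)\pi(T)$, so type (0, 1) fixes only if none occurs, giving $P \approx \exp\big(-\int_0^\infty r_{(0,1)} N x(1-x)\pi(T)\,dT\big)$. Second, π(T) is the standard survival probability of a birth-death process with birth rate 1 and time-dependent death rate $1 - \sigma_{(0,0)} + x_{(0,1)}(t)\sigma_{(0,1)}$. The frequency path is the logistic solution started at $\bar\nu/N$. This is not the paper's closed-form result. The function name, step size, and integration cutoff are arbitrary choices, and extreme parameters may underflow.

```python
import math

def hitchhiking_probability(sigma00, sigma01, r01, N, nu_bar=1.0, dt=0.05):
    """P(type (0,1) fixes) ~ exp(-integral of r01 * N * x (1 - x) * pi(T) dT)."""
    s_del = sigma00 - sigma01
    if not (s_del > 0.0 and sigma01 > 0.0):
        raise ValueError("requires sigma00 > sigma01 > 0")
    x0 = nu_bar / N
    t_max = (math.log((1.0 - x0) / x0) + 40.0) / sigma01   # well past the end of the sweep
    steps = int(t_max / dt) + 1
    ts = [k * dt for k in range(steps)]
    # logistic path of type (0,1), written in an overflow-free form
    x = [1.0 / (1.0 + (1.0 - x0) / x0 * math.exp(-sigma01 * t)) for t in ts]
    d = [1.0 - sigma00 + xi * sigma01 for xi in x]   # death rate of (0,0); birth rate is 1
    # E(t) = exp(-sigma00 t) (1 - x0 + x0 exp(sigma01 t)), so exp(-int_T^t (1 - d)) = E(t) / E(T)
    E = [(1.0 - x0) * math.exp(-sigma00 * t) + x0 * math.exp(-s_del * t) for t in ts]
    # J(T) = int_T^inf d(t) E(t) dt: analytic tail (x ~ 1) plus backward trapezoid rule
    J = [0.0] * steps
    J[-1] = (1.0 - s_del) * x0 * math.exp(-s_del * ts[-1]) / s_del
    for k in range(steps - 2, -1, -1):
        J[k] = J[k + 1] + 0.5 * dt * (d[k] * E[k] + d[k + 1] * E[k + 1])
    pi = [1.0 / (1.0 + J[k] / E[k]) for k in range(steps)]   # establishment of (0,0) born at T
    rate = [r01 * N * x[k] * (1.0 - x[k]) * pi[k] for k in range(steps)]
    integral = sum(0.5 * dt * (rate[k] + rate[k + 1]) for k in range(steps - 1))
    return math.exp(-integral)

for r in (0.0, 1e-5, 1e-4, 1e-3):
    print(r, hitchhiking_probability(0.08, 0.06, r, 10_000))
```

For late birth times, the computed π(T) tends to $s_{\mathrm{del}}$, the establishment probability of (0, 0) once (0, 1) is fixed. This provides a simple sanity check on the integration.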

Concatenation of the stochastic and the deterministic phase
To determine which haplotype fixes in the population, we need to concatenate the stochastic and the deterministic phase. Let A be the set of all types with positive fitness. Type (k, l) establishes in the stochastic phase with probability $P^{(I,J)}_{(k,l)}$, as derived and discussed in Sect. 5.2, and hence enters the deterministic phase. We always assume that only one type does so (and that it does so in a single lineage, i.e., there is only one "rescue individual"). Given establishment of type (k, l), we denote by $P_{\det}^{((k,l)\to(i,j))}$ the probability that type (i, j) ∈ A is generated during the deterministic phase and finally fixes in the population. Summing over all (k, l) ∈ A yields the probability that type (i, j) fixes:
$$\Pr\big(\text{type } (i,j) \text{ fixes}\big) = \sum_{(k,l)\in A} P^{(I,J)}_{(k,l)}\,P_{\det}^{((k,l)\to(i,j))}. \tag{43}$$
Unless stated otherwise, the results presented in Sect. 6 are based on Eq. (43) with $P^{(I,J)}_{(k,l)}$ obtained from Eqs. (30) and (31). The recursions are evaluated by a program written in C. Approximations for $P_{\det}$ … All integrals that appear in these approximations are evaluated numerically in Mathematica (Wolfram Research, Champaign, USA). The accuracy of the approach and the appropriateness of the assumptions are addressed in Appendix H.
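The concatenation in Eq. (43) is a weighted sum over rescue types. The stochastic-phase probabilities act as weights, and the deterministic-phase probabilities give the outcome for each possible rescue type. The dictionaries below are purely illustrative numbers for an initially introduced type (0, 1), not results from the paper. The function name is hypothetical.

```python
def fixation_probability(target, p_stochastic, p_deterministic):
    """Eq. (43): sum over rescue types (k, l) of
    P[(k, l) rescues in the stochastic phase] * P_det[(k, l) -> target]."""
    return sum(p * p_deterministic.get((rescue, target), 0.0)
               for rescue, p in p_stochastic.items())

# Illustrative numbers only: rescue by (0,1) or (0,0); (0,1) may lose its allele during the sweep
p_stochastic = {(0, 1): 0.7, (0, 0): 0.3}
p_deterministic = {((0, 1), (0, 1)): 0.6, ((0, 1), (0, 0)): 0.4, ((0, 0), (0, 0)): 1.0}
print(fixation_probability((0, 1), p_stochastic, p_deterministic),
      fixation_probability((0, 0), p_stochastic, p_deterministic))
```

Missing entries are treated as probability zero. This reflects that recombination with wildtype individuals can only remove deleterious alleles, never add them.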

The impact of unlinked alleles
If I = J = 0, the extinction probability is given by Eq. (5). For $Q_0$, $Q_1$ and $Q_2$, we obtain: … How does the number of unlinked deleterious alleles affect the introgression probability if their total effect is kept constant? Comparing $1 - Q_1$ with $\sigma_1 = \sigma_0 - 2s_{\mathrm{del}}$ to $1 - Q_2$ with $\sigma_1 = \sigma_0 - s_{\mathrm{del}}$ and $\sigma_2 = \sigma_0 - 2s_{\mathrm{del}}$ shows that, unless the deleterious effect is very strong, the establishment probability is approximately the same in both scenarios (either one deleterious allele of effect $2s_{\mathrm{del}}$ or two deleterious alleles, each of effect $s_{\mathrm{del}}$). Figure 2 generalizes this result to F > 2. Unlinked alleles significantly reduce the introgression probability; however, it is irrelevant whether there is one strongly deleterious allele or many slightly deleterious alleles. By how much do unlinked deleterious alleles of compound effect $S_{\mathrm{del}}$ reduce the introgression probability? Given the previous observation, it suffices to consider a single unlinked allele of effect $S_{\mathrm{del}}$. A Taylor expansion to second order in $\sigma_0$ and $S_{\mathrm{del}}$ yields
$$1 - Q_1 \approx \sigma_0\,\big(1 - 2S_{\mathrm{del}}\big),$$
i.e., unlinked alleles reduce the introgression probability approximately by a factor that is independent of $\sigma_0$.

Fig. 2 The introgression probability as a function of the number of unlinked deleterious alleles. The total effect on fitness is kept constant; $\sigma_F$ denotes the Malthusian fitness of a haplotype carrying the adaptive allele and F unlinked deleterious alleles. The introgression probability is approximately the same whether the effect is distributed over a few strongly or many slightly deleterious alleles. The advantageous allele has Malthusian fitness $\sigma_0 = 0.08$; in the absence of deleterious alleles, it would establish with probability $1 - Q_0 = \sigma_{(0,0)} = 0.08$. The crosses denote simulation results; each simulation point is the average of $10^6$ introgression attempts.

[Figure caption fragment] … $r_{(\cdot,\cdot)} = 0.0001$. The crosses denote simulation results; each simulation point is the average of 2,000 successful introgression events.
While unlinked alleles have a significant impact on the probability of adaptive gene introgression (∼10-50 % in Fig. 2), they do not visibly influence the hitchhiking probability of closely linked deleterious alleles (cf. Fig. 3).
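The comparison between one strongly and several slightly deleterious unlinked alleles can be reproduced directly. The sketch solves the quadratic $2^{-f}Q_f^2 - \big[2 - \sigma_f - 2^{-f}\sum_{g<f}\binom{f}{g}Q_g\big]Q_f + 1 - \sigma_f = 0$ for f = 0, 1, 2, …, keeping the smaller root. This is the I = J = 0 case of Theorem 1 with equal effects per allele. It then checks our second-order expansion $1 - Q_1 \approx \sigma_0(1 - 2S_{\mathrm{del}})$. The parameter values are chosen for illustration, and `extinction_unlinked` is a hypothetical name.

```python
from math import comb, sqrt

def extinction_unlinked(sigmas):
    """Q_0, ..., Q_F for I = J = 0; sigmas[f] is the fitness of the haplotype
    carrying the adaptive allele and f unlinked deleterious alleles."""
    Q = []
    for f, s in enumerate(sigmas):
        a = 1.0 / 2 ** f
        b = 2.0 - s - sum(comb(f, g) * Q[g] for g in range(f)) / 2 ** f
        c = 1.0 - s
        Q.append((b - sqrt(b * b - 4.0 * a * c)) / (2.0 * a))  # smaller root
    return Q

sigma0, S_del = 0.08, 0.02
one_allele = extinction_unlinked([sigma0, sigma0 - S_del])                  # one allele of effect S
two_alleles = extinction_unlinked([sigma0, sigma0 - S_del / 2, sigma0 - S_del])  # two of S/2
print(1 - one_allele[-1], 1 - two_alleles[-1], sigma0 * (1 - 2 * S_del))
```

For these values, the two architectures give establishment probabilities that agree to about four decimal places, and both lie close to the expansion.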

The impact of a single linked deleterious allele
In this section, we consider the impact of a single linked deleterious allele (cf. also Iwasa et al. 2004a). From Eq. (4), $Q_{(0,1)}$ is the smaller root of
$$(1-r_{(0,1)})\,Q_{(0,1)}^2 - \big[2 - \sigma_{(0,1)} - r_{(0,1)}(1-\sigma_{(0,0)})\big]\,Q_{(0,1)} + 1 - \sigma_{(0,1)} = 0,$$
and we obtain
$$Q_{(0,1)} \approx 1 - \sigma_{(0,1)} - r_{(0,1)}\,\frac{(1-\sigma_{(0,1)})(\sigma_{(0,0)} - \sigma_{(0,1)})}{\sigma_{(0,1)}}.$$
The approximation is a first-order Taylor expansion in $r_{(0,1)}$, which yields accurate results for small $r_{(0,1)}$ if $\sigma_{(0,1)}$ is not too close to zero. Due to the assumption of a single crossover only, $Q_{(0,1)}$ exactly corresponds to $Q_1$ for $r_{(0,1)} = 0.5$ (where $\sigma_{(0,0)} \equiv \sigma_0$ and $\sigma_{(0,1)} \equiv \sigma_1$). How does a single deleterious allele affect the probability of adaptive gene introgression? We can measure the impact by the relative reduction of the introgression probability,
$$P = 1 - \frac{1 - Q_{(0,1)}}{1 - Q_{(0,0)}}.$$
If P is close to zero, the deleterious allele has a weak impact; if P is close to one, it has a strong impact. The influence is obviously strongest for $r_{(0,1)} = 0$. If $\sigma_{(0,1)} > 0$, the maximum relative reduction in the introgression probability is given by $s_{\mathrm{del}}/\sigma_{(0,0)}$ with $s_{\mathrm{del}} := \sigma_{(0,0)} - \sigma_{(0,1)}$. For $\sigma_{(0,1)} \le 0$ and $r_{(0,1)} = 0$, the advantageous allele cannot introgress at all (except through fixation by drift, which is not considered here).
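To see how well the first-order expansion performs, the sketch computes $Q_{(0,1)}$ in two ways. The first is as the smaller root of $(1-r)Q^2 - [2 - \sigma_{(0,1)} - r(1-\sigma_{(0,0)})]Q + 1 - \sigma_{(0,1)} = 0$, which is the one-linked-allele case with $Q_{(0,0)} = 1 - \sigma_{(0,0)}$. The second is via the expansion $1 - \sigma_{(0,1)} - r(1-\sigma_{(0,1)})s_{\mathrm{del}}/\sigma_{(0,1)}$. The sketch also checks the stated equivalence with a single unlinked allele at r = 0.5. The function names and parameter values are illustrative.

```python
from math import sqrt

def smaller_root(a, b, c):
    """Smaller root of a*Q^2 - b*Q + c = 0 with a > 0."""
    return (b - sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def q01_exact(sigma00, sigma01, r):
    """Q_(0,1) for one linked deleterious allele at recombination probability r."""
    return smaller_root(1.0 - r, 2.0 - sigma01 - r * (1.0 - sigma00), 1.0 - sigma01)

def q01_taylor(sigma00, sigma01, r):
    """First-order expansion in r (requires sigma01 not too close to zero)."""
    s_del = sigma00 - sigma01
    return 1.0 - sigma01 - r * (1.0 - sigma01) * s_del / sigma01

def q1_unlinked(sigma0, sigma1):
    """Q_1 for one unlinked deleterious allele, using Q_0 = 1 - sigma0."""
    return smaller_root(0.5, 2.0 - sigma1 - 0.5 * (1.0 - sigma0), 1.0 - sigma1)

sigma00, sigma01 = 0.08, 0.06
for r in (0.0, 0.001, 0.01):
    print(r, q01_exact(sigma00, sigma01, r), q01_taylor(sigma00, sigma01, r))
print(q01_exact(sigma00, sigma01, 0.5), q1_unlinked(sigma00, sigma01))
```

At r = 0, the relative reduction equals $s_{\mathrm{del}}/\sigma_{(0,0)}$, which is 0.25 for these values.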
For tight linkage, the foregoing Taylor expansion gives
$$P \approx \frac{s_{\mathrm{del}}}{\sigma_{(0,0)}}\left(1 - r_{(0,1)}\,\frac{1-\sigma_{(0,1)}}{\sigma_{(0,1)}}\right).$$
The maximum impact is strongest for a weak beneficial mutation (where introgression is easily reduced to zero for tight linkage). The impact weakens with increasing recombination on the scale of $\sigma_{(0,1)}$. I.e., if either $\sigma_{(0,0)}$ … $s_{\mathrm{del}}$, or if $s_{\mathrm{del}}$ … $\sigma_{(0,0)}$, the impact declines only slowly. To determine the behavior for strong recombination, we perform a Taylor expansion of P in $s_{\mathrm{del}}$:
$$P \approx \frac{s_{\mathrm{del}}}{\sigma_{(0,0)} + r_{(0,1)}\,(1 - \sigma_{(0,0)})}.$$
For $r_{(0,1)} \gg \sigma_{(0,0)}$, the relative reduction becomes independent of the strength of the beneficial allele, in strong contrast to the tight-linkage case. The impact declines on a scale of $s_{\mathrm{del}}$ and becomes irrelevant if $r_{(0,1)} \gg s_{\mathrm{del}}$. An unlinked deleterious allele leads to a relative reduction of $2s_{\mathrm{del}}$, which can still be appreciable if the deleterious mutation has a strong effect.
If $\sigma_{(0,1)} - r_{(0,1)} > 0$, the deleterious allele can hitchhike to fixation. To assess its hitchhiking probability, we do not follow the simple approximation (31) but instead give a detailed analysis of the stochastic establishment phase. Type (0, 1) establishes with probability
$$1 - q_{(0,1)} = \frac{\sigma_{(0,1)} - r_{(0,1)}}{1 - r_{(0,1)}},$$
i.e.,
$$P^{(0,1)}_{(0,1)} = \frac{1 - q_{(0,1)}}{1 - Q_{(0,1)}} \quad\text{and}\quad P^{(0,1)}_{(0,0)} = 1 - P^{(0,1)}_{(0,1)}.$$
To assess the relevance of the stochastic and the deterministic phases, we perform a first-order Taylor expansion of $P^{(0,1)}_{(0,1)}$ in $r_{(0,1)}$:
$$P^{(0,1)}_{(0,1)} \approx 1 - r_{(0,1)}\,\frac{\sigma_{(0,0)}\,(1-\sigma_{(0,1)})}{\sigma_{(0,1)}^2}.$$
I.e., changes in $P^{(0,1)}_{(0,1)}$ occur on the scale of $r_{(0,1)} \sim \sigma_{(0,1)}^2/\sigma_{(0,0)}$. For the deterministic phase [Eq. (41)], the scale is set by $r_{(0,1)} \sim \sigma_{(0,1)}^2/(N\sigma_{(0,0)}s_{\mathrm{del}})$. This allows us to distinguish two parameter regimes: if $N s_{\mathrm{del}} \gg 1$, the deterministic phase dominates; if $N s_{\mathrm{del}} \approx 1$ or smaller, the stochastic phase cannot be ignored. Figures 4C and 4D illustrate how the stochastic and deterministic phases combine to form the probability of hitchhiking for N = 10,000 and N = 500, respectively. For N = 10,000, the behavior is dominated by the deterministic phase, which decays quickly as a function of $r_{(0,1)}$.
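The sketch below compares the relative reduction P and the stochastic-phase rescue probability against the exact quadratic for $Q_{(0,1)}$. The expansions coded here are our own first-order expansions of that quadratic. For tight linkage, $P \approx (s_{\mathrm{del}}/\sigma_{(0,0)})(1 - r(1-\sigma_{(0,1)})/\sigma_{(0,1)})$. To first order in $s_{\mathrm{del}}$, $P \approx s_{\mathrm{del}}/(\sigma_{(0,0)} + r(1-\sigma_{(0,0)}))$. The own-type establishment probability $(\sigma_{(0,1)} - r)/(1 - r)$ is an assumption based on a linear birth-death process with own-type birth rate 1 − r and death rate $1 - \sigma_{(0,1)}$. The ratio of the two recombination scales is $N s_{\mathrm{del}}$. All names and values are hypothetical.

```python
from math import sqrt

def q01_exact(sigma00, sigma01, r):
    a, b, c = 1.0 - r, 2.0 - sigma01 - r * (1.0 - sigma00), 1.0 - sigma01
    return (b - sqrt(b * b - 4.0 * a * c)) / (2.0 * a)   # smaller root

def relative_reduction(sigma00, sigma01, r):
    """P = 1 - (1 - Q_(0,1)) / (1 - Q_(0,0)), with 1 - Q_(0,0) = sigma00."""
    return 1.0 - (1.0 - q01_exact(sigma00, sigma01, r)) / sigma00

def reduction_tight(sigma00, sigma01, r):
    return (sigma00 - sigma01) / sigma00 * (1.0 - r * (1.0 - sigma01) / sigma01)

def reduction_loose(sigma00, sigma01, r):
    return (sigma00 - sigma01) / (sigma00 + r * (1.0 - sigma00))

def rescue_by_01(sigma00, sigma01, r):
    """P^(0,1)_(0,1) = (1 - q_(0,1)) / (1 - Q_(0,1)) with the assumed 1 - q_(0,1)."""
    return (sigma01 - r) / (1.0 - r) / (1.0 - q01_exact(sigma00, sigma01, r))

def rescue_by_01_taylor(sigma00, sigma01, r):
    return 1.0 - r * sigma00 * (1.0 - sigma01) / sigma01 ** 2

def recombination_scales(sigma00, sigma01, N):
    """Scales on which the stochastic and the deterministic phase change."""
    s_del = sigma00 - sigma01
    return sigma01 ** 2 / sigma00, sigma01 ** 2 / (N * sigma00 * s_del)

s00, s01 = 0.08, 0.06
for r in (0.001, 0.01, 0.5):
    print(r, relative_reduction(s00, s01, r), reduction_tight(s00, s01, r),
          reduction_loose(s00, s01, r), rescue_by_01(s00, s01, min(r, s01 / 2)))
print(recombination_scales(s00, s01, 500))
```

As expected, the tight-linkage form is accurate for small r, and the form linear in $s_{\mathrm{del}}$ takes over once r is well above $\sigma_{(0,0)}$.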
In the parameter range where $P^{((0,1)\to(0,1))}_{\mathrm{det}}$ is appreciable, the influence of the stochastic phase can be ignored. For N = 500, however, $P^{((0,1)\to(0,1))}_{\mathrm{det}}$ decays slowly, and the stochastic phase has a non-negligible impact on hitchhiking. If … is small, we have to account for deviations from the deterministic path. These deviations can be accounted for via the parameter ν as in Eq. (40) or via a diffusion approach as in Hartfield and Otto (2011). In Appendix G, we give the diffusion equation adjusted to our model and compare the result to Eq. (40).
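For the stochastic phase, here is a minimal sketch under one possible reading of "type (0, 1) establishes". If the (0, 0) recombinants it sheds are ignored, a (0, 1) lineage is a linear birth-death process with birth rate $1 - r_{(0,1)}$ and death rate $1 - \sigma_{(0,1)}$. It therefore survives with probability $\max\{0, (\sigma_{(0,1)} - r_{(0,1)})/(1 - r_{(0,1)})\}$. Dividing by the introgression probability $1 - Q_{(0,1)}$, computed with the same assumed rates, conditions on successful introgression. This is an illustration, not the paper's own expression, and the function names are hypothetical. The check compares it with the first-order form $1 - r\,s_{\mathrm{ben}}/(s_{\mathrm{ben}} - s_{\mathrm{del}})^2$ quoted in the Discussion (Eq. (54)).

```python
import math


def q_single_linked(sigma00, sigma01, r):
    """Extinction probability Q_(0,1) under the assumed birth-death rates
    (birth 1, death 1 - sigma, (0,0) recombinant with probability r)."""
    q00 = 1.0 - sigma00 if sigma00 > 0 else 1.0
    a, b, c = 1.0 - r, -(2.0 - sigma01 - r * q00), 1.0 - sigma01
    disc = max(b * b - 4.0 * a * c, 0.0)
    return min((-b - math.sqrt(disc)) / (2.0 * a), 1.0)  # smaller root


def p_hitchhike_stochastic(sigma00, sigma01, r):
    """Probability that a (0,1) lineage itself establishes, given introgression.

    Assumption: ignoring the (0,0) recombinants it sheds, the (0,1) lineage is a
    linear birth-death process with birth rate 1 - r and death rate 1 - sigma01.
    A surviving (0,1) lineage implies introgression, so we divide by 1 - Q_(0,1).
    """
    survive01 = max(0.0, (sigma01 - r) / (1.0 - r))
    p_intro = 1.0 - q_single_linked(sigma00, sigma01, r)
    return survive01 / p_intro if p_intro > 0.0 else 0.0


def first_order(sigma00, sigma01, r):
    """First-order form 1 - r * sigma00 / sigma01**2 (sigma00 plays s_ben)."""
    return 1.0 - r * sigma00 / sigma01 ** 2


if __name__ == "__main__":
    for r in (0.0, 1e-5, 1e-4, 1e-3):  # hypothetical recombination values
        print(r, round(p_hitchhike_stochastic(0.01, 0.005, r), 4),
              round(first_order(0.01, 0.005, r), 4))
```

For $r_{(0,1)} \ll \sigma_{(0,1)}^2/\sigma_{(0,0)}$ the two functions agree closely. For larger recombination the exact ratio falls off more gently than the linear form, which eventually becomes negative.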
A comparison between Panels A/B and Panels C/D of Fig. 4 shows that the introgression probability changes only slightly over the depicted range of recombination, whereas the hitchhiking probability decreases significantly with increasing recombination distance. The scale of this decrease is strongly affected by the population size.
In this section, we investigate how a second deleterious allele affects the introgression and the hitchhiking probability, depending on the strength of selection, the genetic architecture, and linkage. The results are summarized in Figs. 5, 6, 7, 8. Figures 5 and 6 consider the dependence on linkage. A strongly deleterious allele significantly affects the introgression probability even if it is only loosely linked. Its impact on the hitchhiking probability is more subtle, and several cases must be distinguished. We first turn to Fig. 5, in which the second deleterious allele is on the same side of the beneficial allele as the first one. Panel C shows the behavior for $\sigma_{(0,2)} - r_{(0,2)} < 0$. In that case, applying Eq. (30) with Eq. (31), it approximately holds: where we have used $Q_{(0,1)} \approx 1 - \sigma_{(0,1)}$. Hence: where $P^{(\cdot,\cdot)}_{\mathrm{hitchhiking}}$ denotes the probability that type (0, 1) fixes in the population given that type (·, ·) was initially introduced. That is, for close linkage of the second deleterious allele, the hitchhiking probability is strongly reduced. As linkage gets looser, however, it quickly converges to its value in the absence of the second allele. Let c denote the factor by which the second deleterious allele reduces the hitchhiking probability of the first one: By rearranging terms, we obtain: Importantly, the strength of selection of the second deleterious allele has no effect (as long as it is strong enough for our approximation to apply). The other selection coefficients enter only via a term which is independent of the selection coefficients if $\sigma_{(0,0)} \gg s_{\mathrm{del}}$.
The hitchhiking probability is crucially determined by the ratio of recombination distances. In Panel D, the second deleterious allele can hitchhike to fixation, too ($\sigma_{(0,2)} - r_{(0,2)} > 0$). The total hitchhiking probability of the closest deleterious allele (i.e., the probability that either type (0, 1) or type (0, 2) fixes) is then only moderately reduced. For small $r^{(0,2)}_{(0,1)}$, both alleles fix. With increasing recombination distance, the analytical result underestimates the true hitchhiking probability (see Appendix H).
In Fig. 6, the beneficial allele is flanked by the two deleterious alleles. If one of the alleles is strongly deleterious, the hitchhiking probability of the other one is not visibly reduced (Panel C). This is because for successful introgression, the strongly deleterious allele has to recombine away very early. The situation in Panel D ($\sigma_{(0,1)} - r_{(0,1)} > 0$, $\sigma_{(1,0)} - r_{(1,0)} > 0$, but $\sigma_{(1,1)} - r_{(1,1)} < 0$) looks similar to Fig. 5C. We have: This is formally similar to Eq. (56). There is, however, an important difference: now the selection coefficient of the second deleterious allele is crucial. Its influence ceases as its strength increases and $\sigma_{(1,0)}$ approaches zero. If both deleterious alleles have approximately the same effect ($\sigma_{(0,1)} \approx \sigma_{(1,0)}$), the behavior is again determined by the ratio of the recombination distances $r^{(1,1)}_{(0,1)}/r^{(1,1)}_{(1,0)}$. Figures 7 and 8 show how the selective disadvantage of a second closely linked deleterious allele affects the hitchhiking probability. If it is on the same side of the beneficial allele as the first one (Fig. 7), the hitchhiking probability of the first one is greatly reduced unless the selective disadvantage is very slight ($\sigma_{(0,2)} \approx \sigma_{(0,1)}$). The reduction is greatest for intermediate values of the selection coefficient (see Fig. 7a).
In this parameter regime, type (0, 2) increases significantly in frequency before a successful lineage of type (0, 0) or (0, 1) is generated. As a consequence, the time to fixation of the beneficial allele is relatively long. Even if a successful lineage of type (0, 1) establishes, it is therefore likely that a successful lineage of type (0, 0) is generated later (see Fig. 7b for an illustration of this reasoning). If the beneficial allele is flanked at equal, small recombination distances by two deleterious alleles as in Fig. 8, the total hitchhiking probability of the deleterious allele to the right (fixation of type (0, 1) or type (1, 1)) is barely influenced by the presence of the second deleterious allele, irrespective of the latter's selective disadvantage. Note, however, that recombination is weak in Fig. 8. For strong recombination and $\sigma_{(0,1)} \approx \sigma_{(1,0)}$, it is not unlikely that successful lineages of both type (0, 1) and type (1, 0) establish and coexist for a long time, making the production of a successful (0, 0) recombinant very likely (see Fig. 13 in Appendix H).

The impact of several linked deleterious alleles
To start with, assume $\sigma_{(I,J)} - r_{(I,J)} > 0$. If all recombination distances are small, we can generalize Eq. (41). Analogous to its derivation, we obtain for the probability that all deleterious alleles hitchhike to fixation: We now turn to $\sigma_{(I,J)} - r_{(I,J)} < 0$ and study several selected scenarios that impart a general intuition. Consider first the special case $I = 0$, $\sigma_{(0,j)} - r_{(0,j)} > 0$, and $\sigma_{(0,l)} - r_{(0,l)} < 0$ for $l > j$. In this case, all deleterious alleles are located on one side of the adaptive allele and at most j of them can potentially hitchhike to fixation. For tight linkage, we can again approximate $1 - Q_{(0,k)} \approx \sigma_{(0,k)}$ for $k \le j$ and obtain (proof by induction): That is, the selection coefficients of the (j + 1)th, (j + 2)th, …, Jth alleles do not enter the result. Furthermore, it does not matter where the deleterious alleles beyond the (j + 1)th are located. As another example, consider the special case $J = 1$, $\sigma_{(0,1)} - r_{(0,1)} > 0$, and $\sigma_{(i,j)} - r_{(i,j)} < 0$ for $i > 0$. If linkage is tight, the hitchhiking probability is barely reduced by additional deleterious mutations.
Each simulation point is the average of 1,000 successful introgression events. The introgression probability for the scenarios shown in the plot ranges from ≈ $1.3 \cdot 10^{-7}$ (I = J = 10) to ≈ 0.002 (I = 0, J = 2), i.e., 1,000 successful introgression events correspond to ≈ 500,000 to $6 \cdot 10^{9}$ introgression attempts; simulations for complex scenarios are hence impractical.
Finally, let $\sigma_{(0,1)} - r_{(0,1)} > 0$, $I, J \ge 1$ arbitrary, and $\sigma_{(i,j)} - r_{(i,j)} < 0$ for $(i,j) \notin \{(0,0), (0,1)\}$. Figure 9 shows how additional deleterious alleles that cannot themselves hitchhike to fixation influence the hitchhiking probability of a slightly deleterious allele.
The pattern can be understood by considering the various paths that lead to establishment of the beneficial allele. Unless I = 0 (or J = 1), at least two recombination events are necessary to generate a type with positive Malthusian fitness. The position of the first successful recombination event depends on the fitness of the types that are generated by recombination. Since type (1, 0) carries only a slightly deleterious allele, the first recombination event is likely to generate this type if I = 1. In this case, the hitchhiking probability is strongly reduced (cf. the dip in Fig. 9). For I > 1, however, type (1, 0) cannot be generated via a single recombination event, and the reduction is less pronounced, becoming smaller with increasing I. For large I, adding more deleterious alleles to the left or the right has only a weak effect. Generally, deleterious alleles that render haplotypes strongly disfavored have to be lost as quickly as possible, and the pathway to establishment of the beneficial allele usually does not involve tunneling via more strongly deleterious haplotypes than necessary. On either side of the adaptive allele, consider the set of alleles that must be lost for establishment. As a rule of thumb, each of the two sets can be replaced by a "virtual" allele of the respective compound effect. To have the same effect as the set of actual alleles, this virtual allele has to be located at the position of the deleterious allele (out of the set) that is closest to the adaptation.

Discussion
Gene flow between related species is frequent in nature. Although many foreign alleles burden their carriers with a selective disadvantage, exchange of genetic material between populations is often still possible. If neutral or advantageous alleles survive the fitness bottleneck caused by linked or unlinked deleterious alleles, they can become permanently incorporated into the genome of the sister species. Picking up locally adaptive alleles from an indigenous species can help species to expand their range into previously uninhabitable regions (e.g., Heiser 1951; Whitney et al. 2006). Adaptive gene introgression is hence an evolutionary mechanism that can speed up adaptation to novel environments. Human activities create ample opportunity for hybridization between domestic animals or crop plants and their wild relatives (e.g., Fitzpatrick et al. 2010; Ellstrand et al. 1999, 2013). In this context, the introgression of alleles from genetically modified organisms (e.g., insecticide resistance genes) into weedy species is recognized as a risk that can cause permanent ecological damage. A quantitative analysis of the introgression process is essential both to assess the importance of adaptive gene introgression as an evolutionary pathway to adaptation and to estimate the ecological risks associated with unwanted hybridization.
The flow of adaptive alleles among species is hampered by the reduced fitness of inter-species crosses. This reduction in fitness is due to alleles from the donor species that are deleterious in the new environment (which we consider in this paper) and/or in the new genomic background. If their compound effect outweighs the benefits of the adaptation, some deleterious alleles must be eliminated by recombination in hybrid backcrosses before the favorable allele can establish. However, closely linked, slightly deleterious alleles might be dragged along to fixation. In this paper, we developed a framework to investigate the role of linked and unlinked deleterious alleles in adaptive gene introgression. The model accounts for an explicit genetic structure and describes the genetic evolution of a haploid population under the influence of selection, recombination, and drift after a single hybridization event. The analysis is based on the theory of branching processes. The early phase of spread of the advantageous allele is approximated by a reducible multitype branching process with a special structure: within the branching process approximation, offspring are either of the same type as their parent or carry fewer deleterious alleles (for similar setups, see Barton and Bengtsson 1986; Demon et al. 2007; Gosh and Haccou 2010; Gosh et al. 2012a, b; Yanchukov and Proulx 2014). The fate of the recombinants that are generated after the initial establishment and until fixation of the beneficial allele is modeled by a time-inhomogeneous single-type branching process. For the analysis of the first phase, we make use of methods developed by Serra (2006) and Serra and Haccou (2007). The analysis of the second phase builds on work by Hartfield and Otto (2011). Combining the results from both phases allows for an analytical treatment of the entire process.
The introgression probability
How likely is it that the advantageous allele can establish itself in the population? For large populations, the probability of adaptive gene introgression depends only on the early phase of the spread, where the dynamics are well described by a branching process. Technically, this is similar to studies of scenarios in which adaptation relies on the accumulation of new mutations via stochastic tunneling.
The gain of adaptive alleles by mutation in these scenarios, which include the crossing of fitness valleys (e.g., Weissman et al. 2009; Proulx 2011), tumor initiation (e.g., Iwasa et al. 2003, 2004a), or adaptation of a pathogen to a new host (e.g., Antia et al. 2003), corresponds to the removal of deleterious alleles by recombination in models of gene introgression. The survival probability of a multitype branching process is in general difficult to determine, and one has to resort to approximate formulas and numerical methods (e.g., Barton 1995; Iwasa et al. 2003, 2004b; Serra and Haccou 2007). In our special case, however, a recursive solution can be derived that readily permits the calculation of the introgression probability for any given allele configuration.
We find that both linked and unlinked deleterious alleles can significantly hamper the introgression of an adaptive allele. However, the nature of this barrier depends on whether linkage between the beneficial and the deleterious alleles is loose or tight. Loosely linked and unlinked deleterious alleles reduce the introgression probability by a factor that is roughly independent of the strength of the beneficial allele. In this parameter range, our results are analogous to those of Bengtsson (1985) and Barton and Bengtsson (1986), who derive a so-called gene-flow factor or barrier strength to describe the effect of a genetic barrier on the flow of a neutral marker allele (compare Appendix B). The relative reduction due to a single loosely linked allele is approximately $s_{\mathrm{del}}/r$, where $s_{\mathrm{del}}$ is the deleterious effect and r the recombination probability (see Eq. (50), Sect. 6.2). Strongly deleterious alleles ($s_{\mathrm{del}} > 0.05$) can have a substantial effect even if unlinked (r = 0.5). In agreement with Bengtsson (1985), we find that the influence of several unlinked deleterious alleles is well approximated by a single deleterious allele of the compound effect [Eq. (45)]. The simple picture of a gene-flow factor holds for $r \gg s_{\mathrm{ben}}$ (where $s_{\mathrm{ben}}$ is the effect of the beneficial allele), but breaks down for tighter linkage. For small r, the relative reduction scales as $1 - r/|s_{\mathrm{ben}} - s_{\mathrm{del}}|$ and thus depends explicitly on the strength of the beneficial allele [cf. Eq. (49)].
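A minimal sketch of the compound-effect statement, under the following assumptions: equal effects, additive Malthusian fitness $\sigma_f = s_{\mathrm{ben}} - f s_{\mathrm{del}}$, birth rate 1, death rate $1 - \sigma_f$, and a newborn that keeps each unlinked allele of its parent independently with probability 1/2 (the origin of the factor $(1/2)^f$ in Appendix A). Under these assumptions, a first-order expansion of the recursion gives a relative reduction of $2 f s_{\mathrm{del}}$ in two cases: f alleles of effect $s_{\mathrm{del}}$ each, and a single unlinked allele of effect $f s_{\mathrm{del}}$. The code compares the two exact recursions. Function names and parameter values are hypothetical.

```python
import math


def q_unlinked_equal(sigma_ben, s_del, n_alleles):
    """Extinction probabilities Q_f (f = 0..n_alleles) for f unlinked deleterious
    alleles of equal effect s_del. Assumed rates (sketch): birth rate 1, death rate
    1 - sigma_f with sigma_f = sigma_ben - f * s_del; a newborn keeps each of its
    parent's unlinked alleles independently with probability 1/2.
    """
    q = []
    for f in range(n_alleles + 1):
        sigma = sigma_ben - f * s_del
        keep_all = 0.5 ** f  # probability that the newborn is of the parent's type
        # newborns that lost some of the f alleles, weighted by their Q_g (g < f)
        s = sum(math.comb(f, g) * 0.5 ** f * q[g] for g in range(f))
        a, b, c = keep_all, -(2.0 - sigma - s), 1.0 - sigma
        disc = max(b * b - 4.0 * a * c, 0.0)
        q.append(min((-b - math.sqrt(disc)) / (2.0 * a), 1.0))  # smaller root
    return q


def relative_reduction(sigma_ben, q):
    """P = 1 - (1 - Q) / sigma_ben, the relative reduction of introgression."""
    return 1.0 - (1.0 - q) / sigma_ben


sigma_ben, s_del, n = 0.05, 0.002, 5  # hypothetical parameter values
p_many = relative_reduction(sigma_ben, q_unlinked_equal(sigma_ben, s_del, n)[n])
# a single unlinked allele carrying the compound effect n * s_del
p_compound = relative_reduction(sigma_ben, q_unlinked_equal(sigma_ben, n * s_del, 1)[1])
print(f"{n} alleles: P = {p_many:.4f}; compound allele: P = {p_compound:.4f}; "
      f"first order 2*n*s_del = {2 * n * s_del:.4f}")
```

With one allele (f = 1), the quadratic coincides with the single-linked-allele case at $r_{(0,1)} = 0.5$, which mirrors the correspondence between $Q_{(0,1)}$ and $Q_1$ noted in the results.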
Our results on adaptive gene introgression can also be compared with results by Barton (1995) on the reduction of the fixation probability of a new beneficial mutation due to interference with standing deleterious variation. Barton (1995) assumes that the deleterious alleles segregate under mutation-selection balance in the population when the advantageous allele appears. Consider first the limiting case where a deleterious allele segregates at a single locus with frequency u. The beneficial mutation can arise on a genome that does or does not carry the deleterious allele. Technically, the introgression probability in our model corresponds most closely to the fixation probability of the beneficial mutation given that it arises on a genome with the deleterious allele [denoted by $P_u$ in Barton (1995)]. In that case, its fixation probability can be significantly reduced. However, the reduction due to segregating deleterious variation is generally much weaker than in our case, where both alleles enter the population via a single introgression event. This is because the relative fitness of the double mutant is higher if the deleterious allele segregates in the population. A numerical comparison confirms that the result $P_u/(2\sigma_{(0,0)})$ as given by Eqs. (16) and (17a) in Barton (1995) converges to $(1 - Q_{(0,1)})/\sigma_{(0,0)}$ [cf. Eq. (47c)] as the mutation rate, and hence the frequency of the deleterious allele, tends to zero. (Note that the results in Barton (1995) are based on a Poisson distribution of the offspring number, such that the establishment probability of an isolated beneficial allele is $2\sigma_{(0,0)}$, whereas it is $\sigma_{(0,0)}$ in our model.) Our results for the introgression probability hence represent limiting cases of the results in Barton (1995) ($P_u$ in the limit $u \to 0$). However, for more than one deleterious allele, the equations in Barton (1995) do not allow for an analytical solution, while this is possible for the case of introgression, as shown by our results.
The total fixation probability in the segregating-alleles case is a weighted average of the cases in which the beneficial allele appears on a genome with and without the deleterious allele; the weighting factor depends on the mutation rate. This can lead to widely diverging conclusions compared with the introgression scenario. For example, for $\sigma_{(0,1)} < 0$ and complete linkage, the weighted fixation probability of the beneficial mutation is reduced by u relative to its value in the absence of the deleterious allele [Barton 1995, Eq. (17b)]. In contrast, the introgression probability (as well as $P_u$) is zero in that case. For $\sigma_{(0,1)} > 0$, the relative reduction is at most $u(s_{\mathrm{del}}/\sigma_{(0,0)})^2$, i.e., much smaller than for introgression ($s_{\mathrm{del}}/\sigma_{(0,0)}$). Note that the reduction in the weighted fixation probability is caused by the recurrent generation of deleterious alleles that appear on genomes carrying the adaptation. The presence of deleterious alleles itself even slightly increases the weighted fixation probability [this term is very small and neglected in Barton (1995)]. Finally, Barton (1995) finds that for two loci flanking the beneficial mutation, the effects of the two deleterious alleles approximately multiply. This does not hold for the different biological scenario of adaptive gene introgression.
Linked deleterious alleles can render successful introgression after a single hybridization event extremely unlikely. To assess whether introgression probabilities of the order of $10^{-8}$ to $10^{-6}$ are still evolutionarily relevant, it is helpful to compare these values to the probability of adaptation by de novo mutation. With a point mutation probability of $\sim 10^{-8}$ and a selective advantage of 1 %, the probability that a specific mutation occurs in a specific individual and thereafter rises to fixation is $\sim 10^{-10}$. For complex adaptations, the probability is even lower. Depending on the probability of hybridization, adaptive gene introgression can hence be a relevant evolutionary process. Hybridization rates are potentially high, and even if the success probability of each single hybridization event is low, the probability that any hybridization event is followed by adaptive gene introgression can be appreciable. This consideration is particularly important in an agricultural context, where (genetically modified) crops grow next to wild plants in large areas all over the world for many years. Gosh and Haccou (2010) and Gosh et al. (2012a, b) therefore suggest the so-called hazard rate as a measure for risk assessment, as it takes both the hybridization rate and the introgression probability into account.
The hitchhiking probability
Weakly deleterious alleles that are closely linked to the adaptive allele can hitchhike to fixation. We developed a framework to estimate which haplotype finally fixes in the population, depending on the alien haplotype that was originally introduced. The approach is based on a split of the process into two phases: the establishment phase of the adaptive allele and the sweep during which further deleterious alleles can be lost. What is the respective relevance of the stochastic and the deterministic phase in this scenario?
In the simplest case, there are only two loci under selection: one locus with the advantageous allele and one locus with a deleterious allele that can hitchhike to fixation. We can then distinguish two parameter regimes. If the product of the selection coefficient and the population size, $N s_{\mathrm{del}}$, is of order 1 or smaller, the impact of the stochastic phase is significant. The probability for the hitchhiker to survive this phase depends strongly on the selection coefficients of both the beneficial and the deleterious allele and on the recombination rate, but is independent of the population size ($1 - r s_{\mathrm{ben}}/(s_{\mathrm{ben}} - s_{\mathrm{del}})^2$, cf. Eq. (54)). If, however, the product of selection coefficient and population size is large, the stochastic phase can be ignored and the behavior is dominated by the deterministic phase. Hitchhikers survive this phase if no haplotype without the deleterious allele can establish. The duration of the deterministic phase is $\sim 1/(s_{\mathrm{ben}} - s_{\mathrm{del}})$, and the number of successful new recombinants per generation is roughly $\sim s_{\mathrm{ben}} N r$. The hitchhiking probability therefore depends strongly on $Nr$, while the effect of selection partly cancels. This is confirmed by our more precise calculations (Eq. (40)). The situation is different if additional deleterious alleles render the initial haplotype itself deleterious. In that case, establishment of the adaptation is contingent on the early loss of deleterious alleles. Depending on the allelic configuration, the stochastic establishment phase will then have a strong impact on hitchhiking irrespective of the population size. To a good approximation, all alleles that cause serious maladaptation and are located on the same side of the beneficial allele can be summarized by a single allele of the compound effect, reducing the dimensionality of the problem. The impact of these additional alleles fades quickly with increasing recombination distance.
In contrast to their effect on introgression, unlinked alleles have no visible effect on the hitchhiking probability conditioned on successful introgression. These insights essentially generalize to more than one possible deleterious hitchhiker. Hartfield and Otto (2011) analyze the hitchhiking probability of a single deleterious allele in the absence of other deleterious alleles. They present two approaches to the problem: a semi-deterministic approach based on branching process theory (which also serves as the basis for our analysis of the deterministic phase), and a diffusion approach. In both cases, however, they condition on establishment of type (0, 1). Their analysis thus ignores aspects of the stochastic establishment phase and consequently applies only to the regime of large $N s_{\mathrm{del}}$.
Selection and recombination in the introgression process
Summarizing, we can identify three fundamentally different genomic scales, in units of the recombination rate, that matter for adaptive gene introgression. First, deleterious alleles on the introgression haplotype affect the introgression probability of the beneficial allele across distances on the order of the deleterious selection coefficient ($r \sim s_{\mathrm{del}}$; cf. Eq. (50)). Strongly deleterious alleles thus still matter even if they are unlinked. Importantly, the absolute strength of selection is crucial for the failure or success of introgression. For the hitchhiking probability of a single deleterious allele, we find two relevant scales that stem from the stochastic and the deterministic phases of the hitchhiking process, respectively. In the stochastic phase, this scale is set by the selection coefficient of the haplotype containing both the beneficial and the deleterious allele ($r \sim s_{\mathrm{ben}} - s_{\mathrm{del}}$; cf. Eq. (54)). As for the introgression probability, it is the strength of selection that matters. In contrast, for the deterministic phase, the scale is primarily set by the inverse population size ($r \sim 1/N$). The rate of loss of the deleterious allele during the deterministic phase is also affected by selection. However, in contrast to the stochastic phase, only the ratio of selection coefficients ($s_{\mathrm{del}}/s_{\mathrm{ben}}$) enters, not the absolute strength [Eq. (40)]. Usually (but not always), effects from the deterministic phase are relevant over much shorter distances than effects from the stochastic phase and will therefore dominate. Finally, a third scale for the distance r between the beneficial allele and a deleterious hitchhiker becomes relevant if hitchhiking is compromised by further deleterious alleles in the genomic background.
For cases where an additional deleterious allele at distance R from the hitchhiking allele pushes the Malthusian fitness of the original introgression haplotype below zero, we find that the relevant scale for hitchhiking success is primarily set by the relative size of the recombination rates, i.e., $r \sim R$ [cf. Eqs. (56), (60)].
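Take at face value the two hitchhiking scales stated for a single linked deleterious allele: $r_{(0,1)} \sim \sigma_{(0,1)}^2/\sigma_{(0,0)}$ for the stochastic phase and $r_{(0,1)} \sim \sigma_{(0,1)}^2/(N\sigma_{(0,0)} s_{\mathrm{del}})$ for the deterministic phase. Identifying $\sigma_{(0,0)}$ with $s_{\mathrm{ben}}$ and $\sigma_{(0,1)}$ with $s_{\mathrm{ben}} - s_{\mathrm{del}}$, a short calculation makes the comparison above explicit. The labels $r^{\mathrm{stoch}}$ and $r^{\mathrm{det}}$ are introduced here only for this purpose:

```latex
r^{\mathrm{stoch}} \sim \frac{\sigma_{(0,1)}^2}{\sigma_{(0,0)}}, \qquad
r^{\mathrm{det}} \sim \frac{\sigma_{(0,1)}^2}{N\,\sigma_{(0,0)}\,s_{\mathrm{del}}}
  = \frac{1}{N}\cdot\frac{\sigma_{(0,1)}}{\sigma_{(0,0)}}\cdot\frac{\sigma_{(0,1)}}{s_{\mathrm{del}}}, \qquad
\frac{r^{\mathrm{det}}}{r^{\mathrm{stoch}}} \sim \frac{1}{N\,s_{\mathrm{del}}}.
```

The deterministic scale is thus $1/N$ multiplied only by ratios of selection coefficients, and it is the shorter of the two whenever $N s_{\mathrm{del}} \gg 1$. The stochastic scale reduces to $r \sim s_{\mathrm{ben}} - s_{\mathrm{del}}$ when $\sigma_{(0,1)}$ and $\sigma_{(0,0)}$ are of similar size.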

Limitations and extensions
The mathematical analysis of the model has two major restrictions. The first one is the common constraint of branching process approximations for establishment probabilities of beneficial alleles: for small populations, the branching process approach underestimates the true probability. In particular, our multitype branching process can contain supercritical, critical, and subcritical types. If the population size is small, fixation of deleterious haplotypes by genetic drift can be more likely than survival of the branching process. The second restriction concerns the analysis of the hitchhiking probability. The derived approximations rely on the assumption that initially, a single haplotype establishes and starts sweeping before haplotypes with fewer deleterious alleles possibly establish. In particular when linkage is not tight, this assumption is not necessarily satisfied. However, this introduces only a small error, and the approximation usually still yields good results (cf. Fig. 12). Relaxing the assumption would require the assessment of the stochastic dynamics of two or more types that simultaneously establish in the population at different random speeds, which would severely complicate the calculations. The analysis of the deterministic phase increases in complexity with the number of positively selected haplotypes. In order to keep it tractable, we assume that at most two successful recombinant haplotypes are generated during the fixation process of the adaptive allele. While this assumption is again well justified for tight linkage, it leads to strong deviations as recombination gets stronger (cf. Fig. 13).
The model assumes a population of haploid individuals or diploids without dominance. The results for the introgression probability can be directly generalized to diploids with dominance unless the beneficial allele is completely recessive: as long as introgressed alleles are rare in the population, they only appear in heterozygotes. Equation (2) therefore still applies if fitness refers to heterozygote fitness. As soon as the introgressed alleles become more frequent, both copies of an individual's chromosome might carry introgressed material. The sweep of a selectively favored haplotype is therefore strongly altered if the alleles are not co-dominant. The length and shape of this frequency path are, however, crucial for the hitchhiking probability. Our approach can be extended by this element (see Hartfield and Glémin 2014 for a single deleterious hitchhiker with dominance). The model furthermore assumes that selection against the deleterious alleles is independent of the genetic background. Recombination therefore increases the fitness of later generations of hybrids. However, recombination can also generate incompatibilities. In that case, later-generation hybrids display a lower fitness than early-generation hybrids until further recombination removes the incompatible alleles.
Beyond these genetic restrictions, the second class of model extensions concerns the ecological setting. The model assumes that the alien genome arrives by long-range dispersal in a panmictic population. An important extension, which would considerably affect the results, is the incorporation of spatial structure. A very different situation from the one analyzed in this paper arises if dispersal is local and strong and recurrent gene flow builds up a hybrid zone. Under these circumstances, as discussed in Barton (1979), a single-locus cline does not significantly hamper the spread of a beneficial allele from one population to the other.
To summarize, we set up a minimal model in order to reveal fundamental principles that are effective in the introgression process of a favorable allele. In particular, the analysis helps to build an intuitive understanding of how deleterious alleles impact adaptive gene introgression.

Appendix A: Proof of Theorem 1
We can reinterpret the branching process as follows: an individual of type (i, j; f) "dies" at rate $2 - \sigma_{(i,j;f)}$ and, at death, produces either zero or two offspring, one of which is of its own type. We now consider the embedded discrete-time process:
- With probability …$/(2 - \sigma_{(i,j;f)})$, it produces an offspring of its own type and an offspring of type (i, j; g) with g ≤ f.
- With probability …$/(2 - \sigma_{(i,j;f)})$, it produces an offspring of its own type and an offspring of type (i, k; g) with g ≤ f and k < j.
- With probability …, it produces an offspring of its own type and an offspring of type (k, j; g) with g ≤ f and k < i.
Within this scheme, the offspring generating function of an individual of type (i, j; f) is given by: where $\mathbf{s}$ is a vector with elements $s_{(k_1,k_2;g)}$; $0 \le k_1 \le I$, $0 \le k_2 \le J$, $0 \le g \le F$. Let $\mathbf{G}(\mathbf{s})$ be the vector whose components are the offspring generating functions of all possible types. According to the general theory of multitype branching processes, the extinction probability is given by the root of $\mathbf{G}(\mathbf{s}) = \mathbf{s}$ [Eq. (64)] in the unit cube that is closest to the origin (Sewastjanow 1974, p. 115). First note that Eq. (64) is equivalent to Eq. (3) (identifying the solution of Eq. (64) with the vector with elements $Q_{(k_1,k_2;g)}$; $0 \le k_1 \le I$, $0 \le k_2 \le J$, $0 \le g \le F$).
Equation (64) can be solved recursively, starting with type (0, 0; 0), for which we obtain: where we have used that $(1 - r_{(i,j)})(1/2)^f \le 1 - \ldots$. Since a > 0, we conclude that both roots are positive. It remains to prove that $q_2 \le 1$. Note that 1 is a root of the equation since $\mathbf{G}(\mathbf{1}) = \mathbf{1}$. We furthermore see that $q_2$ is a decreasing function of … and thus of all $Q_{(k,j)}$ and $Q_{(i,k)}$, if c > 0. For c = 0, it holds that $q_2 = 0$. It is hence clear that $q_2 \le 1$. From Sewastjanow (1974), we even know that $q_2 < 1$ (because $Q_{(0,0;0)} < 1$).
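The recursive structure of Theorem 1 can be illustrated with a short sketch for linked alleles only (F = 0). The parameterization is hypothetical and not taken from the paper:
- alleles on each side are given as (recombination probability to the adaptive locus, selective effect) pairs, ordered by distance;
- fitness is additive;
- under a single-crossover assumption, a newborn of type (i, j) loses the alleles beyond a crossover point that falls between consecutive loci, with probability equal to the difference of their recombination probabilities.

Each $Q_{(i,j)}$ is then the smaller root of a quadratic whose coefficients involve only types with fewer deleterious alleles, mirroring the order of the recursion above.

```python
import math


def introgression_linked(s_ben, left, right):
    """Extinction probabilities q[i][j] for linked deleterious alleles (sketch).

    Hypothetical parameterization (not the paper's notation):
      left  = [(d_1, s_1), (d_2, s_2), ...]  alleles left of the adaptive locus,
      right = [(d_1, s_1), ...]               alleles to the right,
    ordered by increasing recombination probability d to the adaptive locus.
    Type (i, j) carries the i closest left and j closest right alleles, with
    additive Malthusian fitness sigma = s_ben - (sum of their effects).
    Assumed rates: birth 1, death 1 - sigma; a newborn of type (i, j) becomes
    (k, j), k < i, with probability d_{k+1} - d_k (d_0 = 0), likewise on the
    right. Each q[i][j] is the smaller root of
        (1 - r) Q^2 - (2 - sigma - S) Q + (1 - sigma) = 0,
    where r = d_i + d_j and S sums recombinant probabilities times their Q.
    """
    dl = [0.0] + [d for d, _ in left]
    dr = [0.0] + [d for d, _ in right]
    sl = [0.0] + [s for _, s in left]
    sr = [0.0] + [s for _, s in right]
    n_left, n_right = len(left), len(right)
    q = [[1.0] * (n_right + 1) for _ in range(n_left + 1)]
    for i in range(n_left + 1):
        for j in range(n_right + 1):
            sigma = s_ben - sum(sl[: i + 1]) - sum(sr[: j + 1])
            r = dl[i] + dr[j]
            if r >= 1.0:
                raise ValueError("total recombination probability must be < 1")
            # recombinants only have fewer alleles, so their q is already known
            s = sum((dl[k + 1] - dl[k]) * q[k][j] for k in range(i))
            s += sum((dr[k + 1] - dr[k]) * q[i][k] for k in range(j))
            a, b, c = 1.0 - r, -(2.0 - sigma - s), 1.0 - sigma
            disc = max(b * b - 4.0 * a * c, 0.0)
            q[i][j] = min((-b - math.sqrt(disc)) / (2.0 * a), 1.0)
    return q


if __name__ == "__main__":
    q = introgression_linked(0.01, left=[], right=[(0.001, 0.005)])
    print("introgression probability:", 1.0 - q[0][1])
```

Every type produces only offspring of its own type or of types with fewer deleterious alleles. The double loop therefore visits each (i, j) after all types it can produce, so no fixed-point iteration is needed.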
Note that instead of considering the embedded discrete-time process, we could have resorted directly to the corresponding result for the extinction probability of the continuous-time process (Sewastjanow 1974, p. 116), but the present route seemed more illustrative.

Appendix B: Approximation of the establishment probability via an effective migration rate
Bengtsson (1985) and Barton and Bengtsson (1986) determine by how much deleterious alleles hamper the flow of a neutral marker locus from one population into another; Bengtsson (1985) mainly focuses on unlinked and Barton and Bengtsson (1986) on linked deleterious alleles. The impact is expressed via a so-called gene flow factor, the ratio of an effective migration rate $m_e$ to the actual migration rate $m$ (Eqs. (1), (3), and (5) in Bengtsson (1985); Eqs. (A3) and (A7) in Barton and Bengtsson (1986); the inverse of the gene flow factor gives the strength of the barrier to gene flow). Building on these results for neutral marker loci, the introgression probability of a single adaptive allele with selective advantage $s$ can be approximated by multiplying its establishment probability in the absence of deleterious alleles, $s$, by the gene flow factor:
$$P \approx \frac{m_e}{m}\, s. \qquad (75)$$
Figure 10 compares this approximation to the establishment probability obtained from the branching process approach when there is just one deleterious allele. The deviation is large for tight linkage and decreases as the recombination distance between the loci increases; for loose linkage, it is negligible. This observation generalizes to scenarios with more deleterious alleles. Deviations for tight linkage are to be expected. As an example, consider the basic scenario $I = 0$, $J = 1$, and $s_{\mathrm{del}} < \sigma^{(0,0)}$. For $r = 0$, the introgression probability of the adaptive allele equals $\sigma^{(0,0)} - s_{\mathrm{del}}$. A neutral marker allele that is equally tightly linked, however, establishes with probability ≈ 0 (within the approximations of Bengtsson (1985) and Barton and Bengtsson (1986), its establishment probability equals zero; in a finite population, it might fix by drift).
We now turn to the other extreme, in which all deleterious alleles are unlinked. A first-order Taylor expansion in $s_{\mathrm{del}}$ of the result for $m_e/m$ with $I = 0$, $F = 1$ found by Bengtsson (1985) gives
$$\frac{m_e}{m} = 1 - 2 s_{\mathrm{del}} + O(s_{\mathrm{del}}^2).$$
This is in agreement with Eq. (46). Bengtsson (1985) furthermore finds that, to good approximation, what matters is the compound effect of all deleterious alleles rather than their individual effects. Again, this accords with our findings [cf. Sect. 6.1].
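To make the comparison behind Figure 10 concrete, the following minimal Python sketch evaluates the gene-flow-factor approximation under two stated assumptions: that Eq. (75) is the product $(m_e/m)\,s$, and that for a single unlinked deleterious allele the first-order expansion $m_e/m \approx 1 - 2 s_{\mathrm{del}}$ applies. For the tight-linkage contrast it takes $\sigma^{(0,0)} = s$ purely for illustration. The function names are hypothetical.

```python
"""Sketch of the gene-flow-factor approximation P ~ (m_e/m) * s.

Assumptions: Eq. (75) is read as the product of the gene flow factor m_e/m and
the establishment probability s of the adaptive allele alone; for one unlinked
deleterious allele (I = 0, F = 1) the first-order expansion m_e/m = 1 - 2*s_del
is used. Function names are hypothetical.
"""


def gene_flow_factor_unlinked(s_del: float) -> float:
    """First-order gene flow factor m_e/m for a single unlinked deleterious allele."""
    return 1.0 - 2.0 * s_del


def introgression_probability(s: float, gene_flow_factor: float) -> float:
    """Approximate introgression probability of the adaptive allele, (m_e/m) * s."""
    return gene_flow_factor * s


if __name__ == "__main__":
    s, s_del = 0.05, 0.01
    factor = gene_flow_factor_unlinked(s_del)
    print(f"unlinked: m_e/m = {factor:.3f}, P = {introgression_probability(s, factor):.4f}")
    # Tight linkage (r = 0): a neutral marker has m_e/m = 0, so the approximation
    # predicts 0, whereas the branching process gives sigma^(0,0) - s_del
    # (here sigma^(0,0) = s is an illustrative assumption).
    print(f"r = 0: approximation {introgression_probability(s, 0.0):.4f}, "
          f"branching process {s - s_del:.4f}")
```

The second printout shows the mechanism behind the large deviation for tight linkage: the neutral-marker factor vanishes at $r = 0$, while the adaptive allele itself still establishes with probability $\sigma^{(0,0)} - s_{\mathrm{del}}$.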
Inductive step: Suppose the hypothesis holds for all pairs (k, m) with k < n and (n, k) with k < m. We show that it then holds for (n, m): The proof works analogously if the number of unlinked alleles is larger than 0.

Appendix D: Unlinked alleles of arbitrary effect
For the main text, we assumed that all unlinked alleles have the same selection coefficient. Here, we generalize to arbitrary effects. Let $\mathcal{F} = \{e = (e_1, e_2, \ldots, e_F) : e_i \in \{0, 1\}\}$ and define $|e| = \sum_{i=1}^{F} e_i$. The set of unlinked deleterious alleles carried by an individual can be characterized by a vector in $\mathcal{F}$, where 1 and 0 at a given position denote the presence or absence of a specific deleterious allele. In this notation, the transitions of the branching process read … with $|e_f| = f$ … $r^{(i,j)}$) for $e_f - e_g \in \mathcal{F}$. The recursive equation for the extinction probability becomes … The proof is analogous to before. Similarly, we can generalize approximation Eq.
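The vector notation can be illustrated with a short Python sketch. It enumerates the set $\mathcal{F}$ and, under an assumption not legible in the transitions above, assigns offspring probabilities: each unlinked deleterious allele of the parent is transmitted independently with probability 1/2, so every $e_g$ with $e_f - e_g \in \mathcal{F}$ occurs with probability $(1 - r^{(i,j)})(1/2)^{|e_f|}$, mirroring the factor $(1 - r^{(i,j)})(\tfrac{1}{2})^{f}$ used in Appendix A. All names are hypothetical.

```python
"""Sketch: unlinked deleterious alleles of arbitrary effect as 0/1 vectors.

Assumption: a parent carrying e_f passes each of its unlinked deleterious
alleles on independently with probability 1/2, so every e_g with
e_f - e_g in F (componentwise e_g <= e_f) occurs with probability
(1 - r_ij) * (1/2)**|e_f|, where r_ij stands for r^(i,j). Names are hypothetical.
"""
from fractions import Fraction
from itertools import product


def configurations(F: int) -> list:
    """The set F = {e = (e_1, ..., e_F) : e_i in {0, 1}}."""
    return list(product((0, 1), repeat=F))


def weight(e: tuple) -> int:
    """|e| = e_1 + ... + e_F, the number of unlinked deleterious alleles carried."""
    return sum(e)


def offspring_transitions(e_f: tuple, r_ij) -> dict:
    """Map each offspring configuration e_g to its (assumed) transition probability."""
    p = (1 - r_ij) * Fraction(1, 2) ** weight(e_f)
    # An allele present in e_f may or may not be transmitted; absent alleles stay absent.
    choices = [(0, 1) if allele else (0,) for allele in e_f]
    return {e_g: p for e_g in product(*choices)}


if __name__ == "__main__":
    print(len(configurations(3)))  # 2**3 configurations
    for e_g, p in offspring_transitions((1, 0, 1), Fraction(1, 10)).items():
        print(e_g, p)
```

Exact fractions make it easy to check that the probabilities over all $e_g$ sum to $1 - r^{(i,j)}$, i.e., the non-recombinant share.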

Appendix E: General lemmata
Here we summarize some general lemmata that we use in the derivation of the hitchhiking probability. Although the lemmata are either not new or follow immediately from known results, we briefly sketch the proofs. We consider a multitype branching process with $N + 1$ types on $(\Omega, \mathcal{F}, P)$. The number of individuals of type $i$ ($i = 0, 1, \ldots, N$) in generation $n$ is $Z_n^{(i)}$; the process is started by an individual of type $N$, and the term "generation" refers to the distance to this founding individual. Individuals of type $i$ reproduce at rate 1 and die at rate $1 - \sigma_i$. Let the offspring of a type $N$ individual be of type $i$ with probability $r_i$. If a type $N$ individual gives birth to an individual of type $i < N$, we call this a recombination event from type $N$ to type $i$. We define $r := \sum_{k=0}^{N-1} r_k = 1 - r_N$.

Lemma 2 The probability generating function (p.g.f.) f (s) of the offspring distribution of a type N individual is given by
Proof Individuals have a geometric offspring distribution; for the random offspring number $X$ of a type $N$ individual, it holds … and therefore …
Lemma 3 The joint p.g.f. of $(Z_1^{(0)}, Z_1^{(1)}, \ldots, Z_1^{(N)})$ is given by …
Proof We restrict to $N = 1$; the generalization to larger $N$ is straightforward. Let $p(k)$ be the probability that an individual of type 1 has $k$ offspring and $p(k_0, k_1)$ the probability that it has $k_0$ offspring of type 0 and $k_1$ offspring of type 1. It holds (cf. Serra 2006): …
Lemma 4 The extinction probability $q_N$ of type $N$ is given by …
Proof The p.g.f. for the number of offspring of its own type is given by $f(r + (1 - r)s)$.
From the general theory of single-type branching processes, it follows that $q_N$ is the smallest root of $s = f(r + (1 - r)s)$ in $[0, 1]$.
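Lemmas 2 and 4 can be checked numerically. The sketch below assumes that, with birth rate 1 and death rate $1 - \sigma$, the lifetime offspring number is geometric with p.g.f. $f(s) = (1 - \sigma)/(2 - \sigma - s)$. This formula is derived here from the stated rates; the display of Lemma 2 is not legible in this copy. The extinction probability is then found by iterating $q \mapsto f(r + (1 - r)q)$ from $q = 0$.

```python
"""Sketch of Lemmas 2 and 4 under the stated rates.

Assumption: with birth rate 1 and death rate 1 - sigma, the lifetime offspring
number is geometric, P(X = k) = (1/(2 - sigma))**k * (1 - sigma)/(2 - sigma),
whose p.g.f. is f(s) = (1 - sigma) / (2 - sigma - s).
"""


def offspring_pgf(s: float, sigma: float) -> float:
    """p.g.f. f(s) of the lifetime offspring number of a type-N individual."""
    return (1.0 - sigma) / (2.0 - sigma - s)


def extinction_probability(sigma: float, r: float, tol: float = 1e-13,
                           max_iter: int = 1_000_000) -> float:
    """Smallest root of q = f(r + (1 - r) q) in [0, 1] (Lemma 4).

    Iterating the p.g.f. from q = 0 increases monotonically to the smallest
    fixed point, which is the extinction probability of the type-N lineage.
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = offspring_pgf(r + (1.0 - r) * q, sigma)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


if __name__ == "__main__":
    sigma, r = 0.1, 0.02
    print(extinction_probability(sigma, r))   # numerical root
    print(min(1.0, (1 - sigma) / (1 - r)))    # closed form of the same root
```

With this $f$, the fixed-point equation becomes the quadratic $(1 - r)q^2 - (2 - \sigma - r)q + (1 - \sigma) = 0$, whose roots are 1 and $(1 - \sigma)/(1 - r)$. Hence $q_N = \min\{1, (1 - \sigma)/(1 - r)\}$, and type $N$ survives with probability $(\sigma - r)/(1 - r)$ whenever $\sigma > r$.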
Lemma 5 Let the offspring of type $i$ be of type $j \le i$ only. Consider the process on the set $B = \{\omega : Z_n^{(N)}(\omega) \to 0 \text{ as } n \to \infty\}$, i.e., the set of paths where type $N$ goes extinct. The probability of the event $B$ is denoted by $q_N$. The joint p.g.f. of $(Z_1^{(0)}, Z_1^{(1)}, \ldots, Z_1^{(N)})$ in the conditioned process is given by …
Proof For simplicity, we stick to $N = 1$; the generalization to $N > 1$ is straightforward. It holds (cf. Athreya and Ney 1972, p. 52f): …
Proof We first prove the relation $\hat h(s) = F(s, \hat h(s))$ (cf. Serra 2006). Let $\hat p(k)$ be the probability that the total number of recombination events is $k$, and let $p(k_r, k_N)$ be the probability that an individual of type $N$ has $k_N$ offspring of type $N$ and $k_r$ offspring of other types. First, note that … Furthermore … Using this, … It thus holds … This leads to a quadratic equation for $\hat h(s)$, which has two solutions. Since $\hat h(1) = 1$ must hold, we can exclude one of them and obtain Eq. (92).
If q N = 1, we denote the probability generating function by h(s) in the main text.
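For the case $q_N = 1$, a closed form for $h(s)$ can be sketched under an explicit assumption: every offspring of a type $N$ individual is a recombinant with probability $r$ (contributing a factor $s$) and otherwise a type $N$ individual whose own subtree contributes $h(s)$, so that $h(s) = f(rs + (1 - r)h(s))$ with the geometric $f$ of Lemma 2. We take this as the $q_N = 1$ case of Lemma 6, but Eq. (92) itself is not legible in this copy, so the sketch is illustrative only.

```python
"""Sketch: p.g.f. h(s) of the total number of recombination events from type N.

Assumptions: type N goes extinct almost surely (sigma <= r, i.e. q_N = 1), and
h(s) = f(r*s + (1 - r)*h(s)) with the geometric f of Lemma 2. This yields the
quadratic (1 - r) h^2 - (2 - sigma - r s) h + (1 - sigma) = 0; the root with
h(1) = 1 is kept.
"""
import math


def f(u: float, sigma: float) -> float:
    """Geometric offspring p.g.f. of Lemma 2, derived from the stated rates."""
    return (1.0 - sigma) / (2.0 - sigma - u)


def h(s: float, sigma: float, r: float) -> float:
    """Root of (1 - r) h^2 - (2 - sigma - r s) h + (1 - sigma) = 0 with h(1) = 1."""
    if not (sigma <= r < 1.0):
        raise ValueError("requires sigma <= r < 1 (type N goes extinct almost surely)")
    b = 2.0 - sigma - r * s
    disc = max(b * b - 4.0 * (1.0 - r) * (1.0 - sigma), 0.0)
    return (b - math.sqrt(disc)) / (2.0 * (1.0 - r))


if __name__ == "__main__":
    sigma, r = -0.05, 0.1
    print(h(1.0, sigma, r))                      # normalization h(1) = 1
    s = 0.3
    print(h(s, sigma, r) - f(r * s + (1 - r) * h(s, sigma, r), sigma))  # should be ~0
    eps = 1e-6
    mean = (h(1 + eps, sigma, r) - h(1 - eps, sigma, r)) / (2 * eps)
    print(mean, r / (r - sigma))                 # E[#recombination events] = r/(r - sigma)
```

The minus sign in front of the square root selects the root with $h(1) = 1$, in line with the exclusion argument above. Implicit differentiation gives a mean number of recombination events of $h'(1) = r/(r - \sigma)$, which the central difference reproduces.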
Lemma 7 With the assumptions of the previous lemma, the joint p.g.f. of the number of recombination events from type N to type 0, 1, 2, . . . , N − 1 is given by Proof analogous to the proof of Lemma 5.

Appendix F: The deterministic phase
Appendix F1: The process is initiated by an individual of type (0, 2)
We now focus on three types, (0, 0), (0, 1), and (0, 2), all of which have a positive selection coefficient. All other types are assumed to go extinct. A lineage of type (0, 2) starts sweeping. Throughout, we assume that recombination is weak enough not to alter the deterministic frequency path of a type. The deterministic frequency path is given by … $1 - r^{(0,2)}$ …. Recombination events that found successful lineages occur at rate … with … where $p$ … We assume in the following that we can set $\nu_0/N \approx 0$ in the integration boundaries. Analogous to Eq. (41), the probability that no successful recombination event takes place is then given by … The probability that no successful type (0, 0) individual is generated up to $t_R$ is given by … The probability that no successful type (0, 1) individual is generated up to time $t_R$ is given by … The probability that, as a first event, a successful individual of type (0, 0) is generated is given by … Equivalently, the probability that, as a first event, a successful individual of type (0, 1) is generated is given by … In that case, type (0, 1) might either rise to fixation, or a successful type (0, 0) individual might be generated later in the process. We now calculate the probabilities of these two events given that a type (0, 1) individual was generated at time $t_R$. To calculate the probability that a successful type (0, 0) individual is generated, we need the frequencies of the three types present (wildtype, type (0, 2), and type (0, 1)) at time $t$ after the recombination event. We have to take into account that, in its early phase of growth, type (0, 1) increases faster than predicted by the deterministic path. As before, we therefore replace its initial frequency by the mean of the effective initial population size $\nu$. We assume that during establishment, type (0, 1) individuals exclusively replace wildtype individuals. This assumption has no visible influence on the results.
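The "first event" probabilities above have the structure of competing inhomogeneous Poisson processes. The Python sketch below computes $P(\text{first event is } A) = \int_0^T \lambda_A(t)\, e^{-\Lambda_A(t) - \Lambda_B(t)}\, dt$ and $P(\text{no event}) = e^{-\Lambda_A(T) - \Lambda_B(T)}$ numerically. The rates in the usage example, a logistic frequency path times hypothetical constants $N r_k p_k$, are illustrative assumptions and not the expressions of this appendix.

```python
"""Sketch: which successful recombinant appears first during a sweep.

Successful recombinants of two kinds, A and B (e.g. types (0, 0) and (0, 1)),
are modelled as independent inhomogeneous Poisson processes with rates
lambda_A(t) and lambda_B(t). The rates in the example (logistic frequency x(t)
times hypothetical constants N * r_k * p_k) are illustrative assumptions.
"""
import math


def first_event_probabilities(rate_a, rate_b, t_max, n_steps=20_000):
    """Return (P(A first), P(B first), P(no event)) on [0, t_max] via the trapezoid rule."""
    dt = t_max / n_steps
    cum_hazard = 0.0
    p_a = p_b = 0.0
    la_prev, lb_prev, surv_prev = rate_a(0.0), rate_b(0.0), 1.0
    for k in range(1, n_steps + 1):
        t = k * dt
        la, lb = rate_a(t), rate_b(t)
        cum_hazard += 0.5 * (la_prev + lb_prev + la + lb) * dt
        surv = math.exp(-cum_hazard)  # P(no event of either kind up to t)
        p_a += 0.5 * (la_prev * surv_prev + la * surv) * dt
        p_b += 0.5 * (lb_prev * surv_prev + lb * surv) * dt
        la_prev, lb_prev, surv_prev = la, lb, surv
    return p_a, p_b, surv_prev


def logistic(t, x0, sigma):
    """Deterministic frequency of a sweeping type with advantage sigma."""
    e = math.exp(sigma * t)
    return x0 * e / (1.0 - x0 + x0 * e)


if __name__ == "__main__":
    N, sigma = 10_000, 0.05            # hypothetical population size and advantage
    rp_a, rp_b = 2e-6, 5e-6            # hypothetical r_k * p_k for kinds A and B
    t_end = 2 * math.log(N) / sigma    # roughly the duration of the sweep

    def x(t):
        return logistic(t, 1.0 / N, sigma)

    print(first_event_probabilities(lambda t: N * x(t) * rp_a,
                                    lambda t: N * x(t) * rp_b, t_end))
```

A useful sanity check is that the three returned probabilities sum to one up to discretization error. For constant rates $a$ and $b$, the first entry reduces to $\frac{a}{a+b}(1 - e^{-(a+b)T})$.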
We obtain for the frequencies of wildtype individuals, type (0, 1) individuals, and type (0, 2) individuals, respectively, … $x_{01}(t \mid x^{(0,2)} \ldots)$, $x_{02}(t \mid x^{(0,2)} \ldots)$. The time-dependent selection coefficient of type (0, 0) is given by … The establishment probability of a single individual of type (0, 0) that arises at time $t$ after $t_R$ can be calculated as before and is given by … The rate of successful type (0, 0) individuals is … (Remember: $r^{(0,1)}_{(0,0)} = r^{(0,2)}_{(0,0)}$.) We assume that we can ignore deviations from the deterministic frequency path for large $t$ close to fixation of the beneficial allele. The probability that no successful recombination event takes place is then given by $\mathrm{Prob}(\text{no rec} \mid x^{(0,2)} \ldots)$ … The probability that type (0, 1) fixes is hence … The overall probability that type (0, 0) fixes is obtained as …
7.1 The process is initiated by an individual of type (1, 1)
We now focus on four types, (0, 0), (0, 1), (1, 0), and (1, 1), all of which have a positive selection coefficient. An individual of type (1, 1) starts sweeping. We derive an approximation that is valid if no more than two successful recombination events happen during the process, i.e., for weak recombination. We again assume that recombination does not significantly reduce the number of offspring of an individual's own type. The deterministic path of type (1, 1) is given by … $1 - r^{(1,1)}$ …. Recombination events that generate successful lineages occur at rate … with … where $p^{(1,0)}_{\mathrm{est}}$ and $p^{(0,1)}_{\mathrm{est}}$ are the establishment probabilities of an individual of type (1, 0) and (0, 1), respectively, calculated by suitable substitutions in Eq. (38). Analogous to Eq. (41), the probability that no successful recombination event takes place is given by …
The probability that no successful type (1, 0) individual is generated up to $t_R$ is given by … The probability that no successful type (0, 1) individual is generated up to $t_R$ is given by … The probability that, as a first event, a successful type (1, 0) individual is generated is given by … Equivalently, the probability that, as a first event, a successful type (0, 1) individual is generated is given by … We now choose, without loss of generality, $\sigma^{(1,0)} \le \sigma^{(0,1)}$. We first consider the case that, as a first event, an individual of type (0, 1) is generated. In that case, type (0, 1) might either rise to fixation, or a successful individual of type (0, 0) might be generated later in the process. We ignore the possibility that an individual of type (1, 0) is generated and sweeps temporarily until it goes extinct. We now calculate the probabilities of the two events given that a successful individual of type (0, 1) has been generated at time $t_R$. The type (0, 1) individual will rise to fixation unless a successful type (0, 0) individual is generated. To calculate this probability, we need the frequencies of the three types present (wildtype, type (0, 1), and type (1, 1)) at time $t$ after the recombination event: $x_{01}(t \mid x^{(1,1)} \ldots)$, … The time-dependent selection coefficient of type (0, 0) is given by … The establishment probability of a type (0, 0) individual that arises at time $t$ after $t_R$ can be calculated as before and is given by … The rate of successful type (0, 0) individuals is … The probability that no successful recombination event takes place is therefore given by … The probability that type (0, 1) fixes is therefore given by … $\mathrm{Prob}(\text{no rec} \mid x^{(1,1)}(t_R))\,dt_R$.
The rate of successful type (0, 0) individuals is … The rate of successful type (0, 1) individuals is … The probability that no successful recombination event takes place is therefore given by … The probability that type (1, 0) fixes is therefore given by … The probability that a successful type (0, 0) individual is generated next at time $\tau_R$ is given by … The probability that a successful type (0, 0) individual is generated next at any time is thus … The probability that a successful type (0, 1) individual is generated next is given by … As the numerical evaluation of these integrals is computationally expensive, we introduce the following approximation: we calculate the probability that at least one individual of type (0, 0) is generated, ignoring type (0, 1) (and vice versa), i.e., …
To make the fixation probabilities of the various types sum to one, we subsequently introduce a normalization factor, which overall leads to the approximation $P^{(0,0)}_{\text{upper bound}} \times \ldots$ for the probability that type (0, 0) is generated next at any time.
For the probability that type (0, 1) is generated next at any time, we approximate accordingly … After establishment of type (0, 1), a successful individual of type (0, 0) might still be generated. We ignore this possibility. This is a good approximation when the probability of three successful recombination events is low, i.e., in particular when recombination is weak.
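Why the "at least one event" probabilities must be normalized can be seen in a small Python sketch. It assumes two kinds of successful recombinants with total cumulative hazards $\Lambda_{00}$ and $\Lambda_{01}$ and proportional rates, so that the exact first-event probabilities are known in closed form. The normalization used here, which rescales the upper bounds so that they sum to $P(\text{at least one event})$, is one simple choice made for illustration; the normalization factor of the text is not legible in this copy.

```python
"""Sketch: why the 'at least one event' upper bounds need normalizing.

Assumptions (illustrative only): two kinds of successful recombinants with
total cumulative hazards lam_00 and lam_01 and proportional rates, so exact
first-event probabilities are available. Rescaling the upper bounds to sum to
P(at least one event) is one simple normalization, not necessarily the text's.
"""
import math


def upper_bound(lam: float) -> float:
    """P(at least one event of this kind), ignoring the competing kind."""
    return 1.0 - math.exp(-lam)


def exact_first(lam_self: float, lam_other: float) -> float:
    """P(this kind occurs first) when the two rates are proportional in time."""
    total = lam_self + lam_other
    return lam_self / total * (1.0 - math.exp(-total))


def normalized(lam_00: float, lam_01: float) -> tuple:
    """Rescale both upper bounds so that they sum to P(any event)."""
    u00, u01 = upper_bound(lam_00), upper_bound(lam_01)
    p_any = 1.0 - math.exp(-(lam_00 + lam_01))
    c = p_any / (u00 + u01)  # normalization factor (assumed form)
    return c * u00, c * u01


if __name__ == "__main__":
    for lam_00, lam_01 in [(0.05, 0.02), (0.5, 0.2), (2.0, 1.0)]:
        print(lam_00, lam_01, upper_bound(lam_00),
              exact_first(lam_00, lam_01), normalized(lam_00, lam_01)[0])
```

For small hazards, the upper bound, the exact value, and the normalized value nearly coincide. This matches the remark above that the approximation works best when recombination is weak.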

Appendix H: Accuracy of the approximations
Appendix H.1: The hitchhiking probability: the stochastic phase
Our approximation (43) is based on the assumption that only one "rescue individual" is generated. Here, we investigate the limits of this assumption. As an example, we consider the case $I = 0$, $J = 2$ with $\sigma^{(0,2)} < 0$ in more detail. We denote by $\tilde h_1(s_0, s_1)$ the joint p.g.f. of the numbers of "successful" ($Y^+$) and "unsuccessful" ($Y^-$) recombination events until the extinction of type (0, 2), conditioned on survival of the process. A successful recombination event means that the recombinant founds an infinite lineage. It holds … with $h_1(s_0, s_1)$ given by Eq. (11) and Lemma 6. The probability that more than one successful recombination event takes place is given by
$$P(Y^+ > 1 \mid \text{survival}) = 1 - \frac{\partial}{\partial s_0} \tilde h_1(s_0, s_1)\Big|_{s_0 = 0,\, s_1 = 1} = 1 - \frac{P_{\text{success}}}{1 - Q^{(0,2)}}\, h'(1 - P_{\text{success}}).$$
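The formula for $P(Y^+ > 1 \mid \text{survival})$ can be evaluated with a short Python sketch. It assumes that each recombination event from type (0, 2) is independently successful with probability $P_{\text{success}}$ and that $h$ solves the geometric recursion $h(s) = f(rs + (1 - r)h(s))$, valid for $\sigma^{(0,2)} \le r$. Here $h$ stands in for Eq. (11) and Lemma 6, which are not legible in this copy, and $Q^{(0,2)} = h(1 - P_{\text{success}})$ is the probability that no successful recombinant arises.

```python
"""Sketch: probability of more than one rescue individual (Appendix H.1).

Assumptions: each recombination event from type (0, 2) is independently
successful with probability P_success, and h solves the geometric recursion
h(s) = f(r*s + (1 - r)*h(s)) (valid for sigma <= r). sigma plays the role of
sigma^(0,2) and r of the total recombination probability of type (0, 2).
"""
import math


def h_and_derivative(s: float, sigma: float, r: float) -> tuple:
    """Return h(s) and h'(s); h' follows from implicit differentiation of the quadratic."""
    b = 2.0 - sigma - r * s
    root = math.sqrt(max(b * b - 4.0 * (1.0 - r) * (1.0 - sigma), 0.0))
    h = (b - root) / (2.0 * (1.0 - r))
    dh = r * h / root  # since b - 2(1 - r)h equals root
    return h, dh


def prob_more_than_one_success(p_success: float, sigma: float, r: float) -> float:
    """P(Y+ > 1 | survival) = 1 - P h'(1 - P) / (1 - h(1 - P))."""
    if not (0.0 < p_success <= 1.0 and sigma <= r and 0.0 < r < 1.0):
        raise ValueError("requires 0 < p_success <= 1 and sigma <= r with 0 < r < 1")
    h, dh = h_and_derivative(1.0 - p_success, sigma, r)
    q_02 = h  # Q^(0,2): probability that no successful recombinant arises
    return 1.0 - p_success * dh / (1.0 - q_02)


if __name__ == "__main__":
    for p in (1e-3, 1e-2, 0.1, 0.5):
        print(p, prob_more_than_one_success(p, sigma=-0.05, r=0.1))
```

As $P_{\text{success}} \to 0$, the value tends to zero. This makes precise the intuition that the single-rescue assumption is safest when successful recombinants are rare.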
The smaller $P(Y^+ > 1 \mid \text{survival})$, the better the assumption is fulfilled. Panels A and B of Fig. 12 show $P(Y^+ > 1 \mid \text{survival})$ as a function of recombination and of the fitness of type (0, 2), respectively. Note that $P(Y^+ > 1 \mid \text{survival})$ displays a maximum as a function of the recombination probability (not shown). For low recombination, an offspring is a recombinant with low probability. For high recombination, the number of offspring that are themselves of type (0, 2) is substantially reduced by recombination, leading to faster extinction of type (0, 2); this in turn leads to few recombination events even though the recombination probability is high. Panels C and D of Fig. 12 show the hitchhiking probabilities corresponding to the scenarios of Panels A and B. The deviation between theory and simulations is smaller than expected from $P(Y^+ > 1 \mid \text{survival})$. For $\sigma^{(0,2)}$ close to zero, type (0, 2) can survive for a long time before it goes extinct, and $P(Y^+ > 1 \mid \text{survival})$ is large. However, successful recombination events are rare. This implies that, within the multitype branching process, the time between the generation of the first and the second rescue individual can be large. Assume that a rescue individual of type (0, 1) is generated first. By the time the second rescue individual is generated, type (0, 1) has already reached such a high frequency that the multitype branching process description is no longer valid. I.e., even if the … $P^{(1,1)\to(0,1)}_{\mathrm{det}}$ … rely on the assumption that no more than two successful recombination events happen until fixation of the adaptive allele (and additionally on an approximation of two integrals; see Eqs. (144) and (145)). This assumption is only justified if recombination is weak.
In particular, if recombination is strong, it is likely that a lineage of type (0, 1) individuals and a lineage of type (1, 0) individuals segregate simultaneously in the population and recombine to found a successful lineage of type (0, 0) individuals. In this case, our approximation strongly overestimates the fixation probability of type (0, 1) (assuming that $\sigma^{(0,1)} > \sigma^{(1,0)}$) and underestimates the fixation probability of type (0, 0). This can be clearly seen in Fig. 13. Note that with increasing recombination, the fixation probability of type (0, 1) attains a maximum and then decreases in favor of fixation of type (0, 0), due to the mechanism described above (not shown). This behavior is not captured by our theory. The prediction for fixation of type (1, 0) or (1, 1), however, remains highly accurate. Our theory is based on a clear separation of the process into two phases. In some cases, however, the two phases blend together.
In Fig. 5, Panel d, deviations between theory and simulations arise with increasing recombination. This has two reasons. First, the approximation $P^{(0,2)}_{(0,2)} \approx 1$ becomes worse as recombination increases. Second (and more importantly here), type (0, 1) and type (0, 2) may well establish simultaneously, whereas our approximation assumes that type (0, 1) establishes after type (0, 2). A slight shift in the time of establishment, as introduced by our approach, can lead to appreciable deviations of the modeled frequency paths of both types from their true paths. This, in turn, has a non-negligible impact on the estimated hitchhiking probability.