On the eigenvalue effective size of structured populations

A general theory is developed for the eigenvalue effective size (NeE\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N_{eE}$$\end{document}) of structured populations in which a gene with two alleles segregates in discrete time. Generalizing results of Ewens (Theor Popul Biol 21:373–378, 1982), we characterize NeE\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N_{eE}$$\end{document} in terms of the largest non-unit eigenvalue of the transition matrix of a Markov chain of allele frequencies. We use Perron–Frobenius Theorem to prove that the same eigenvalue appears in a linear recursion of predicted gene diversities between all pairs of subpopulations. Coalescence theory is employed in order to characterize this recursion, so that explicit novel expressions for NeE\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N_{eE}$$\end{document} can be derived. We then study NeE\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N_{eE}$$\end{document} asymptotically, when either the inverse size and/or the overall migration rate between subpopulations tend to zero. It is demonstrated that several previously known results can be deduced as special cases. In particular when the coalescence effective size NeC\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N_{eC}$$\end{document} exists, it is an asymptotic version of NeE\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N_{eE}$$\end{document} in the limit of large populations.


Introduction
The effective size N e was introduced by Wright (1931Wright ( , 1938 as the size of an ideal homogeneous population with the same rate of loss of heterozygosity per generation as the studied population. It has become one of the most important parameters O. Hössjer (B) Divsion of Mathematical Statistics, Department of Mathematics, Stockholm University, Stockholm, Sweden e-mail: ola@math.su.se in population genetics and conservation biology, as reviewed for instance by Crow and Denniston (1988), Orrive (1993), Caballero (1994), Wang and Caballero (1999), Waples (2002) and Charlesworth (2009).
Several closely related variants of N e exist, and Crow (1954) first distinguished between the inbreeding effective size N eI , the quantity originally defined by Wright, and the variance effective size N eV . He also introduced a random extinction parameter that quantifies the long term rate at which genetic variants are lost. It is equivalent to the eigenvalue effective size N eE , defined in terms of the largest non-unit eigenvalue of a Markov chain of allele frequencies (Ewens 1982(Ewens , 2004. The nucleotide diversity or mutation effective size N eπ is essentially the expected coalescence time of a pair of haploid individuals (Ewens 1989;Durrett 2008), whereas the coalescent effective size N eC is defined for populations such that the ancestral tree of any finite number of individuals converges to a Kingman coalescent in the limit of large populations (Kingman 1982;Nordborg and Krone 2002;Sjödin et al. 2005;Wakeley and Sargsyan 2009;Hössjer 2011).
In this paper we provide a general theory of N eE for mutation free structured populations, in which a selectively neutral marker (referred to as a gene) with two variants or alleles segregates. The population consists of s homogeneous subpopulations (geographic sites, age classes, sexes or combinations thereof) and evolves in discrete time, with constant census sizes of all subpopulations. Ewens (2004) reviewed results on N eE for homogeneous populations, showing that it agrees with N eI and N eV for the Wright-Fisher model (Wright 1931;Fisher 1958), Kimura's multi-hypergeometric model (Kimura 1957), conditional branching process models (Karlin and McGregor 1965) or more generally models in which offspring numbers are exchangeable (Cannings 1974).
Results on N eE for structured populations are less complete. Ewens (1982) showed that N eE may differ from N eI and N eV for two-sex models. Cabellero and Hill (1992) and Nagylaki (1995) considered a number of diploid models and derived formulas for an effective size based on the long term decay of heterozygosity. Chesser et al. (1993) and Wang (1997a, b) analyzed the island model with two sexes. They derived linear recursion formulas for the inbreeding coefficient and the coancestry of individuals from the same and different subpopulations, and computed an effective size from the largest eigenvalue of this recursion. Felsenstein (1971) computed the effective size for models with s age classes, and found the effective size from the largest eigenvalue of a linear recursion of s 2 non-identity by descent probabilities of genes drawn with replacement from all pairs of age groups. Maruyama (1970a) derived a similar effective number for the circular stepping stone model under large population and small migration rate limits. Tufto et al. (1996) and Tufto and Hindar (2003) defined the eigenvalue effective size from a linear recursion of covariances between of all pairs of subpopulations.
All these notions of effective size are derived in terms the largest eigenvalue λ of linear recursions of covariances or probabilities of identity by descent or state. Whitlock and Barton (1997) showed that these linear recursions are closely related. They also argued briefly that the transition matrix of the Markov chain of allele frequencies has its largest non-unit eigenvalue equal to λ, and therefore all effective sizes of the previous paragraph agree with N eE .
Motivated by the argument of Whitlock and Barton (1997), our main purpose in this paper is to provide a general framework for exact and asymptotic computation of N eE for a large class of structured populations with stochastic backward migration and exchangeable reproduction within subpopulations. In Sect. 2 we introduce the population genetic model and Ewens' definition of N eE in terms of the rate at which the Markov process of allele frequencies in all subpopulations reaches an absorbing state, quantified by the largest non-unit eigenvalue λ of its transition matrix. In Sect. 3 we focus on gene diversities, i.e. probabilities of genes not being identical by state. We introduce an s 2 -dimensional deterministic process of predicted gene diversities and prove that whenever it is a linear recursion, λ also equals the largest eigenvalue of the matrix A of this recursion. In Sect. 4 we show how the elements of A are obtained from coalescence theory in new settings that generalize previous work, as illustrated with several examples in Sect. 5. Asymptotic approximations of λ and N eE are obtained from perturbation theory of eigenvalues of matrices in Sect. 6, when either the population gets large and/or the migration rate gets small. This gives novel asymptotic expressions for N eE that, for instance, in the limit of large populations agrees with the coalescence effective size N eC when the latter exists. A discussion follows in Sect. 7 and proofs are collected in the "Appendix".

Model of reproduction, migration and allele frequency change
Consider a population of N individuals, divided into s subpopulations of constant sizes N 1 = N u 1 , . . . , N s = N u s , with u i ≥ 0 and i∈I u i = 1. Each individual carries two copies of a selectively neutral gene so that subpopulation i has a total of 2N i genes. The population evolves in discrete time (not necessarily generations) t = 0, 1, . . ., with the genes of each subpopulation k ∈ I at time t − 1 numbered g = 1, . . . , 2N k , and ν tkig referring to the number of offspring of gene g that migrate to subpopulation i at time t. The total gene flow from k to i between t − 1 and t is summarized by the backward migration rate i.e. the fraction of genes a time t and subpopulation i that originate from k at time t −1. The matrix B t = (B ti j ) is referred to as the observed backward migration matrix at time t. Since its row sums are one, it is the transition matrix of a Markov chain with state space I.
Let ν tkg = (ν tk1g , . . . , ν tksg ) summarize the frequency distribution of the offspring of gene g of subpopulation k at time t −1 in all subpopulations. Assume that {ν tkg } 2N k g=1 are exchangeable random vectors, conditionally on B t , and that {B t } are independent and identically distributed random matrices with where B = (B ik ), the expected backward migration matrix, is the transition matrix of a Markov chain with state space I. We assume that the B is irreducible and aperiodic, with a unique equilibrium distribution γ = (γ 1 , . . . , γ s ) satisfying γ i ≥ 0, s i=1 γ i = 1 and It follows from (2) that the observed forward migration rate between k and i, i.e. the expected number of offspring of the genes in subpopulation k at time t − 1 that at time t end up in subpopulation i, is In order to keep subpopulation sizes constant over time, the total contribution in (5) from all parental populations k must be constant, i.e.
Let M t = (M tki ) be the observed forward migration matrix of time t. It follows from (3) and (5) that the corresponding expected forward migration matrix M = E(M t ) has elements M ki = E(ν tkig ) related to those of B as for 1 ≤ k, i ≤ s. Taking expectations on both sides of (6), we find that the vector u = (u 1 , . . . , u s ) of relative subpopulation proportions satisfies u = uM; a left eigenvector of M with eigenvalue 1. The two vectors u and γ are identical for conservative migration (Nagylaki 1980), but in general they differ. Consider a biallelic genetic marker, and let X t = (X t1 , . . . , X ts ) be a column vector of (relative) frequencies of one of the two alleles in all subpopulations at time t, where prime denotes transposition. Since {ν tkg } 2N k g=1 are exchangeable, we may number the genes of subpopulation k and time t − 1 so that the first 2N k X t−1,k have the specified allele. Then the allele frequency drift from one time point to the next can be summarized as The following result is a simple consequence of (3), (5) and (9): where x is a column vector of allele frequencies of length s.
We can rephrase Proposition 1 as {X t } being a vector-valued autoregressive process of order 1 (Brockwell and Davis 1991). This process will be heteroscedastic, since the covariance matrix Var(X t |X t−1 = x) varies with x. The dynamics of {X t ; t ≥ 0} is described more generally by means of a time homogeneous Markov chain with a state space , and a transition kernel P = (P(x, y)), with elements P(x, y) = P(X t = y|X t−1 = x) for all x, y ∈ X . Since our model is free of mutations and no subpopulation is isolated, sooner or later one of the two alleles will be fixed in all subpopulations. This can be phrased as {X t } having two absorbing states 0 = (0, . . . , 0) and 1 = (1, . . . , 1), so that P is reducible with two stationary distributions π 1 (x) = 1 {0} (x) and π 2 (x) = 1 {1} (x), one for each of the absorbing states, with 1 Y (x) the indicator function of Y ⊂ X . Write π i = (π i (x); x ∈ X ) for the corresponding two row vectors of length |X |. Since π i = π i P for i = 1, 2, they are left eigenvectors of P with eigenvalue 1. We divide X = ∪ n i=1 X i into components X 1 = {0}, X 2 = {1}, X 3 , . . . , X n that induce a block form of the transition matrix, with zero blocks above the diagonal. For any function φ : be a column vector of function values. Then P acts as an operator φ → Pφ on R X , as where E x denotes expectation conditionally on X 0 = x, cf. e.g. Norris (2008). In particular, the column vectors generated from are both right eigenvectors of P with eigenvalue 1, i.e. φ i = Pφ i for i = 1, 2. Indeed, this follows from (4) and (10), since and similarly for φ 1 . In order to find the rate of fixation of one of the two alleles, we need to look at P t for large t. We apply Markov chain theory and find this rate among all possibly complex-valued eigenvalue of P, as the largest non-unit one. More specifically, in the "Appendix" we use Perron-Frobenius Theorem (see for instance Cox and Miller 1965) as a main ingredient for establishing the following: Theorem 1 Suppose the square submatrices P ii in (11) along the diagonal are irreducible and aperiodic with at least one row sum less than one, for i = 3, . . . , n. Then the eigenvalues λ i = λ i ( P) of P (including multiplicity), can be ordered as with Moreover, if the maximum in (14) is attained uniquely for i = k, then is a row vector and φ k = (φ k (x); x ∈ X k ) a column vector, both with strictly positive elements, R i j has non-negative elements for j ≥ 3, and the remainder term is a matrix with all its |X | 2 elements of smaller order than λ t 3 .
The requirement on P in Theorem 1 is very weak, essentially that X 3 , . . . , X n contain transient states, so that no subpopulation is isolated and eventually one of the two alleles will be fixed in all subpopulations. When m = 3 there is only one component of transient states, and migration is then possible within a finite number of time steps, back and forth between any pair of subpopulations. A recursive formula is provided in the "Appendix" for all R i j , and for R kk we can normalize the two vectors φ k and q k so that x∈X k q k (x) = x∈X k φ k (x)q k (x) = 1. Then q k is the quasi equilibrium distribution of X t conditionally on starting and remaining in X k (Darroch and Seneta 1965;Collet and Martinez 2013). The quasi equilibrium distributions for the other X 3 , . . . , X n are part of the remainder term of (15).
The following important corollary of Theorem 1 deals with the asymptotic decay rate of the expected value of φ(X t ) for a large class of functions: where X k is the component for which the maximum in (14) is attained. Then y)) is the matrix in (15), and E π denotes expected value conditional on P π (X 0 = x) = π(x). In particular, the right hand side of (18) is strictly positive if π(X k ) > 0.
Corollary 1 shows that the largest non-unit eigenvalue λ 3 = λ 3 ( P) determines the rate of decrease of the expected value E π (φ(X t )) as t → ∞. Putting φ(x) = 1 {x / ∈X 1 ∪X 2 } , we notice that the probability of non-fixation decreases with t at this rate, so that λ 3 is the rate of fixation and the eigenvalue of main interest. We will often simplify notation and write The Wright-Fisher (WF) model is a homogeneous population (s Feller (1951) found all the eigenvalues of the transition matrix for the WF model, and in particular For an allele frequency process {X t } with transition matrix P, we define the eigenvalue effective size as the size of a WF population for which the largest non-unit eigenvalue in (20) is the same as for the studied population.

Rate of decay of predicted gene diversities
Following Nei (1973Nei ( , 1987, we define the gene diversity between subpopulations i and j at time t as the probability that two randomly chosen genes from i and j (picked with replacement if i = j) are different by state, i.e. have different types of alleles. Regarding t = 0 as present and t > 0 as future, let be the predicted gene diversity between i and j at time t, given an initial distribution X 0 ∼ π . We collect the predicted gene diversities between all pairs of subpopulations into a column vector of length s 2 , where vec is the vectorization operator that converts a matrix into a column vector by stacking its columns on top of each other. In order to compute linear combinations of the elements of H t , we define weights W i j for all pairs of subpopulations, and prove the following: is a row vector of length s 2 with nonnegative weights W i j ≥ 0 satisfying the symmetry condition W i j = W ji for all i, j, and let φ W (x) = 2 s i, j=1 W i j x i (1 − x j ) be a quadratic functional of x = (x 1 , . . . , x s ) . Then To see the importance of Proposition 2, we notice that a sufficient condition for φ W to satisfy (17) is that all W i j > 0. It therefore follows from Corollary 1 that we can find λ = λ 3 ( P) and hence also N eE from the rate of decrease to 0 of linear combinations of H ti j . It is therefore of interest to study the time dynamics of H t , and we will assume that a non-negative square matrix A of order s 2 exists, so that H t satisfies the linear recursion for t = 1, 2, . . .. It will be convenient to introduce the set of all pairs of subpopulations (cf. (1)), and write the elements of A as where i j and kl is short hand notation for the row and column numbers obtained from the stacking procedure of the vec operation. We can always divide (26) into m irreducible components I 2 = C 1 ∪· · ·∪C m . After a possible reordering of the elements of I 2 , this gives a corresponding block decomposition of (27), with A aa = (A i j,kl ) i j,kl∈C a irreducible for a = 1, . . . , m. Typically, the pair of ancestors of two genes, picked from any pair I 2 of subpopulations, will ultimately belong to C 1 , provided the ancestry is traced sufficiently far back in time. The following fundamental result clarifies the importance of A: Theorem 2 Let {X t } satisfy the conditions of Theorem 1, with λ = λ 3 ( P) the largest non-unit eigenvalue of its transition matrix P, defined in (19). Assume that (25) holds, with A having non-negative elements. Then where A aa are the diagonal matrices of (28). If the maximum in (29) is attained for a unique 1 ≤ c ≤ m, then with ρ = vec (ρ i j ) i j∈I 2 and r = vec (r i j ) i j∈I 2 left and right eigenvectors ρ A = λρ, of A with eigenvalue λ. Explicit expressions for ρ and r are provided in the "Appendix", and their components can be normalized so that ρ i j , r i j > 0 for i j ∈ C c , i j∈C c ρ i j = i j∈C c ρ i j r i j = 1, ρ i j = 0 for i j ∈ C c+1 ∪ · · · ∪ C m and r i j = 0 for i j ∈ C 1 ∪ · · · ∪ C c−1 .
The following key result follows from (21) and Theorems 1-2: Corollary 2 Suppose {X t } satisfies the conditions of Theorem 1, with a linear recursion (25) for predicted gene diversities in terms of a non-negative quadratic matrix A. Then the eigenvalue effective size is . (32)

Coalescence probabilities
In this section we derive a linear recursion (25) for the predicted gene diversity vector H t , using probabilities that ancestral lineages of two genes coalesce. This enables us to compute N eE from (32). Many authors have derived linear recursions for identity by descent probabilities, gene diversities, covariances or coalescence probabilities of subdivided populations with spatial, age, sex or some other structure, with or without mutations. This includes results in Malécot (1951), Hill (1972), Li (1976), Sawyer (1976), Felsenstein (1982), Slatkin (1991), Nagylaki (2000), Ryman and Leimar (2008), Durrett (2008), ,  and other papers mentioned in the introduction. They utilize coalescence probabilities more or less explicitly. We will generalize several of these results for constant subpopulation sizes, allowing backward migration rates to be stochastic and reproduction within subpopulations to have a general form.
Consider a structured coalescent (Notohara 1990;Herbots 1997;Wakeley 1998) for two genes, with coalescence events formulated hierarchically in two steps, first for subpopulations and then for genes. We draw two different genes from the population at time t and consider their joint ancestral subpopulation history (I τ , J τ ) T τ =0 at times {t − τ } T τ =0 until they find their most recent common ancestor at t − T , where T is the coalescence time. Let be the probability that the parents of two genes drawn from subpopulations i and j at time t have parents from subpopulations k and l, conditionally on B t . Since there are 2N u i B tik genes of subpopulation i at time that originate from subpopulation k, it follows that We gather all these probabilities into an observed backward migration matrix Q t = (Q t,i j,kl ) i j∈I 2 ,kl∈I 2 of order s 2 for pairs of subpopulations. By averaging with respect to B t , we then define the unconditional probabilities that the parents of two genes from i and j belong to k and l, and collect them into a square matrix Q = (Q i j,kl ) i j∈I 2 ,kl∈I 2 of order s 2 . The following result summarizes the role of Q t and Q and is stated without proof: Proposition 3 Suppose segregation is independent between generations and condition on that coalescence events do not take place. Then the joint subpopulation ancestry {(I τ , J τ )} t τ =0 of a pair of different genes from time t back to time 0 is a Markov chain with state space I 2 that (a) conditionally on {B t−τ ; τ ≥ 0} has time varying transition matrices {Q t−τ } t−1 τ =0 , (b) unconditionally has a time invariant transition matrix Q. The next step is to incorporate coalescence events. To this end, let be the probability that two genes from subpopulations i and j at time t have the same parent, given that the parent belongs to k, and conditionally on B t . The corresponding unconditional coalescence probability p i jk that two genes from i and j that both have their parents in k, have the same parent, is It will be helpful to introduce the quantities in order to get a more explicit expression for p i jk , where ν tkig in (2) is the number of offspring of gene g in subpopulation k at time t − 1 that end up in i at time t. The variables {V ki j } s i, j=1 quantify the average variability of reproductive success among individuals in subpopulation k, and the coalescence probabilities are closely related to standardized versions of them: Theorem 3 Suppose the backward subpopulation ancestry of two different genes before coalescence is a Markov chain, with transition matrix Q = (Q i j,kl ) as in (35), and define coalescence probabilities p i jk as in (37). Then with V ki j defined in (38), and the components (22) of the gene diversity vectors

j=1 form a multivariate autoregressive process
of order one for t = 0, 1, . . ., with ti j satisfying E( ti j |X t−1 ) = 0, and The expected gene diversity H t satisfies (25), with A = (A i j,kl ) as in (41).
It turns out that the right eigenvector of P corresponding to its third largest eigenvalue λ 3 can be found explicitly, whenever (40) holds:

Corollary 3 Suppose the conditions of Theorems 1 and 2 hold, with a gene diversity process H t satisfying (40). Then the transition matrix P of the Markov chain X t has a right eigenvector
where X k is the component of X 1 ∪ · · · X n for which the maximum in (14) is uniquely attained. In particular, the restriction of φ 3 to X k agrees (up to a multiplicative constant) with the function φ k of Theorem 1.

Examples
The key formula (41) provides a general way to find A and hence also N eE through (32). We study its two main ingredients; expected backward migration rates Q i j,kl for pairs of genes in Subsect. 5.1, and coalescence probabilities p i jk in Subsect. 5.2. Then we apply these findings to a number of models in Subsect. 5.3 in order to show the generality of Theorem 3, explain how to apply it and as a preparation for the asymptotics of Sect. 6.

Backward migration
Example 1 (Fixed backward migration). When the observed backward migration rates are non-random, we must have in order to satisfy (3). It follows from (34), (35) and (44) that Following the nomenclature of Sved and Latter (Sved and Latter (1977)), we refer to (44) as fixed migration rates.

Example 2 (Dirichlet multinomial backward migration). Denote the i:th rows of B and B t by B
that are independent for i = 1, . . . , s, and assume are conditionally independent and multinomially distributed random vectors, given allB ti . Combining (46) and (47), the rows of B t are independent random vectors with Dirichlet multinomial distributions. The parameters α i quantify the amount of variability of the rows ofB t . We will also extend (46) to α i ∈ {0, ∞}, which in conjunction with (47) gives two degenerate cases within the Dirichlet multinomial family: so that when time proceeds backwards, the ancestral history of all genes within a subpopulation will vary according to the same Markov chain with transition matrix B. From a forward perspective, the latter system has s subpopulation, but the backwards scenario will be identical to that of a single population whose size varies between 2N u 1 , . . . , 2N u s , according to a Markov chain with transition kernel B, see for instance Jagers and Sagitov (2004), Sampson (2006), Pollak (2010), Kaj and Krone (2003) and Sano et al. (2004).
in accordance with (3). Since two genes of subpopulations i and j are drawn independently with multinomial distributions from rows i and j ofB t , it follows from (34) that E Q ti j,kl |B t =B tikBt jl . Since the rows ofB t have independent Dirichlet distributions (46); which simplifies to when α i ≡ ∞ and the rows of B t have multinomial distributions.

Coalescence probabilities
Example 3 (Mixed multinomial reproduction). Coalescence probabilities require that a reproduction scheme is specified. A fairly general scheme is defined by dividing the time interval between t − 1 and t into three steps. In a first fertilization step, a gamete pool of infinite size is created for each parental subpopulation k, to which the 2N k genes contribute in fractions ω tk1 , . . . , ω tk,2N k that are exchangeable random variables summing to one. In a second migration migration step, the s gamete pools mix, so that the infinitely sized post-migration gamete pool i is a mixture of pre-migration pools 1, . . . , s in proportionsB ti1 , . . . ,B tis . In the final reproduction step, subpopulation k at time t is formed by drawing 2N k genes from post migration gamete pool k. Then (47) holds, and regardless of the distribution ofB tik , independently between all pairs of parental and offspring subpopulations k and i, where ω tk = (ω tk1 , . . . , ω tk,2N k ). When (52) holds, we evaluate the expected value in (38) by conditioning on B tik and ω tk and then insert into (39). By second moment properties of the multinomial distribution; this yields and this can also (more easily) be obtained by a direct argument. The coalescence probability p i jk = p k in (53) only depends on the parental subpopulation k, since ω tk is the same, regardless of the offsprings' subpopulations i and j. In the second step of (53) we used E(ω tk1 ) = 1/(2N k ), since the components of ω tk are exchangeable. We interpret N eI,k ≤ N k as a local inbreeding effective size of subpopulation k, with equality if and only if ω tk is non-random.

Example 4 (Survival indicators).
Consider the genes g = 1, . . . , N k of subpopulation k at time t − 1, and assume that transitions k → i from time t − 1 to a specific subpopulation i at time t represent their survival, with ν tkig ∈ {0, 1} an indicator for gene g to be still alive at time t. Then follows from (38) to (39). For all other subpopulations j = i we let ν tk jg refer to the number of offspring of g.
where M k, j|i is the expected number of offspring in j for a gene that survives, possibly different from the expected number of offspring M k j = E(ν tk j1 ) of any gene in k. Since (34)-(35) imply Q i j,kk = B ik B jk , it follows from (7) and (39) that the coalescence probability is the inverse of the local census size of k times a correction factor that quantifies how correlated survival is with number of progeny in j.

Example models
Example 5 (Cannings models). For a homogeneous population (s = 1), we may drop subpopulation index and write ν t11g = ν tg , with ν t1 , . . . , ν t,2N exchangeable random variables. Since subpopulation sizes and backward migration probabilities are trivial (u 1 = Q 11,11 = 1), we only need to specify coalescence probabilities p = p 111 . It follows from (38)-(39) that and A = A 11,11 = 1 − p due to (41). Insertion into (32) yields in agreement with Section 3.7 of Ewens (2004). Notice that (56) Example 6 (Sink and source populations). This population has a source of size N 1 and a sink of size N 2 . It is assumed that on average N emig individuals emigrate from 1 to 2 per generation. The migration scheme is degenerate in the sense that the expected forward and backward migration matrices are both reducible; although the equilibrium distribution γ = (1, 0) of B is still unique. We assume multinomial backward migration (51), and reproduction where parents are drawn uniformly and independently from the parental subpopulation, corresponding to (53). If subpopulations are ordered as I 2 = {11, 12, 21, 22}, we find from (41) that It is evident that A has a block structure (28), with C 1 = 11, C 2 = 12, C 3 = 21 and C 4 = 22. Combining the expression for A with (29) and (32), we obtain For large enough migration rates N emig ≥ N 2 /(2N 1 ), the source population will determine the effective size, whereas for small migration rates N emig < N 2 /(2N 1 ), the two populations get increasingly isolated, and the effective size tends to infinity. It turns out that Theorem 1 applies with n = 5 components, of which the three nonabsorbing ones are For large migration rates, the eigenvalue λ of P and A is found within components X 5 and C 1 , and (42) simplifies to φ 3 (x) = ρ 11 x 1 (1 − x 1 ). For small migration rates, we find λ within X 3 , X 4 and C 2 , C 3 respectively.
Example 7 (Combined age and spatial structure). Age structured models have been studied by Felsenstein (1971), Hill (1972), Kaj et al. (2001), Sagitov and Jagers (2005) and Hössjer (2011). Here we consider a population with s = 2z that has two demes with z age classes each. Subpopulations are numbered so that i (z + i) corresponds to age class i = 1, . . . , z of deme 1 (deme 2). The expected forward and backward migration matrices have a block structure, with M cd and B cd describing migration between demes c and d. All blocks of the forward matrix have the same form, e.g.
with a first column containing expected number of offspring in deme 1 of all age classes in deme 1, and a superdiagonal with probabilities of surviving to the next age class and not migrating to deme 2. The blocks of the backward matrix a similar structure, for instance with probabilities in the first row that the parent of a newborn in deme 1 originates from the various age classes in deme 1, and probabilities along the subdiagonal that genes of the various adult age classes in deme 1 did not migrate during the last time step. If parental subpopulations are chosen independently for all genes from B, it follows that Q i j,kl is given by (51). Coalescence probabilities are obtained for all triples i jk of age classes for which Q i j,kk is nonzero, with a(i) = i mod z the age class of subpopulation i, assuming in the first row that parents of newborns in a particular deme are chosen from a mixed multinomial distribution, with coalescence probabilities as in (53). For the second row we used that the coalescence probability (54) for survival is zero, and in third row we assumed that survival is independent of number of offspring, cf. (55).
Example 8 (Extended Moran model). Eldon andWakeley (2006, 2009) extended the Moran model to allow for more skewed offspring distributions, for a homogeneous population and for the island model. Here generalize their model to populations with conservative migration. The reproduction cycle between time points t − 1 and t is divided into two parts. In the first reproduction step one gene within each subpopulation Then Y k − 1 other, randomly chosen genes from the same subpopulation die. In the second migration step we assume that (44) holds, so that the forward and backward migration rates are non-random and fixed. The conservative migration assumption implies that exactly 2N k genes from subpopulation k migrate in (randomly chosen) groups of sizes 2N 1 B 1k , . . . 2N s B sk to subpopulations 1, . . . , s. The coalescence probability for any triple i, j, k of subpopulations is as shown either by a direct argument, or from (38), (39) and (45). We finally obtain N eE by inserting (45) and (61) into (41) and (32).
Example 9 (Dioecious population). Consider a population with N m males, N f = N − N m females and sex ratio ξ = N m /N . Inheritance at an autosomal locus is modeled with s = 4 subpopulations; gametes within males inherited from the father (i = 1) and mother (i = 2), and gametes within females inherited from the father (i = 3) and mother (i = 4), so that the relative subpopulation sizes are According to Mendelian laws, each gamete is inherited, with equal probability 0.5, either from a grandpaternal or a grandmaternal gamete within the father or mother. This gives an observed backward migration matrix B t with a multinomial distribution (48). In view of (7) and γ = (1/4, 1/4, 1/4, 1/4) follows from (4). Pairwise backward migration probabilities Q i j,kl are given by (51), and in order to find the coalescence probabilities p i jk , we follow the notation of Hill (1979) and let m and f represent male and female sexes, write τ 2 rs for the variance of the number of children of sex s of an individual of sex r , and τ rr,rs for the covariance of the number of children of sex r and s of an individual of sex r . It is shown in the "Appendix" that the nonzero coalescence probabilities when k = 1 are and analogous formulas hold for k = 2, 3, 4. In particular, if a sperm or ova gamete chooses its parent randomly among all N m or N f parental genes, the number of male and female offspring of a male are two independent binomial random variables Bin(2N u 1 , (2N u 1 ) −1 ) and Bin(2N u 3 , (2N u 1 ) −1 ), with variances τ 2 mm and τ 2 m f respectively, and covariance τ mm,m f = 0. It is easily seen that in this case, all three probabilities in (63)-(65) equal 1/ (2N u 1 ). We finally obtain N eE by inserting (62) into (51), and then (51), (63)-(65) and the analogous coalescence probabilities for k = 2, 3, 4 into (41) and (32).

Asymptotics
According to (32), we find N eE from the largest eigenvalue λ of A, for which we derived very explicit expressions in Sect. 4. Here will use this approach and give asymptotic formulas for N eE when depends on a positive parameter ε → 0, with a Taylor expansion andȦ = (Ȧ i j,kl ) i j∈I 2 ,kl∈I 2 a matrix of order s 2 . For each fixed ε > 0, A(ε) satisfies the conditions of Theorem 2, so that in particular its unique largest eigenvalue is λ = λ(ε). The limiting matrix A(0) is assumed to have a largest, not necessarily unique, eigenvalue λ(0) = 1. As in Maruyama (1970a) and Nagylaki (1980Nagylaki ( , 1995, we use perturbation theory of matrices (Horn and Johnson 1985;Friswell 1996), in order to deduce for some negative constantλ < 0. Inserting (68) into (21), we find the asymptotic rate by which the eigenvalue effective size tends to infinity. The following result gives a very explicit formula forλ, see for instance Aa et al. (2007) and references therein for a proof: Theorem 4 Suppose the above conditions hold, with A(0) having a largest eigenvalue λ(0) = 1 of multiplicity 1 ≤ v ≤ s 2 , with corresponding left and right eigenvectors ρ 1 (0), . . . , ρ v (0) and r 1 (0), . . . , r v (0). If also the perturbed left eigenvectors ρ α (ε) and eigenvalues λ α (ε) are differentiable functions of ε at 0 for α = 1, . . . , v, it holds that λ(ε) = max α=1,...,v λ α (ε) for small enough ε > 0, and (68) is satisfied, witḣ In particular, suppose A(0) is the transition matrix of a Markov chain with a unique equilibrium distribution ρ(0) = (ρ i j (0)), the left eigenvector corresponding to λ (0) = 1, and a right eigenvector r(0 In the following three subsections, the small perturbation parameter ε will either correspond to an inverse population size, a migration rate or both. We will use (41) and establish a Taylor expansion (67) for each case based on when population sizes N i (ε) = N (ε)u i (ε), backward migration rates Q i j,kl (ε) and/or coalescence probabilities p i jk (ε) depend on the perturbation parameter ε → 0. The asymptotic expression for N eE is then obtained from (69) and (70).

Large populations
We assume that the population size N tends to infinity, while the relative subpopulations sizes u, forward and backward migration matrices M and B are kept fixed. Introduce as a perturbation parameter, with 0 < β ≤ 1 a fixed constant. In order to verify a Taylor expansion of A(ε) in (73), we first consider the backward migration matrix Q(ε) = (Q i j,kl (ε)) = E(Q t (ε)) for pairs of genes. It does not depend on ε for Dirichlet multinomial backward migration in (50), whereas for fixed backward migration (45) it does. In order to keep generality we assume that Q(ε) may depend on ε, with a Taylor expansion for some matrixQ = (Q i j,kl ). It will be seen though thatQ does not influence the asymptotic behavior of N eE . In contrast, the asymptotic behavior of the coalescence probabilities p i jk = p i jk (ε) is crucial and depends on how variable reproductive success is between individuals that migrate from one subpopulation (k) to other pairs of subpopulations (i, j). The limits are assumed to exist for all triples i jk, with V ki j defined in (38). It follows from (39), (74) and (76) that the coalescence probabilities admit Taylor expansions for all i, j, k. We refer toṗ i jk as the coalescence rate when two lines from i and j are merged into k and time is measured in units of ε −1 = 2N β generations. The coalescence rate σ i jk takes the size of the parental subpopulation k into account and measures time in units of 2(N u k ) β instead. Taking the ε → 0 limit in (73), it follows from (75) and (77) that We assume that Q(0) is the transition kernel of a Markov chain that is not necessarily irreducible (it may contain some transient states), but has a unique equilibrium distribution ρ(0) = (ρ i j (0)), which is also the left eigenvector of Q(0) corresponding to its unique largest eigenvalue λ(0) = 1. Hence formula (72) of Theorem 4 applies, and we obtain the following: (73)-(74). Assume the population size N → ∞ so that (75) holds and the limits in (76) exist for some 0 < β ≤ 1. Then A(ε) satisfies Taylor expansion (67), with A(0) as in (78) anḋ

Theorem 5 Define ε and A = A(ε) as in
If the differentiability conditions on λ(·) and ρ(·) in Theorem 4 hold, then Suppose I = I τ and J = J τ are the subpopulations of the ancestors, taken from the same generation t − τ of two genes sampled at a fixed time t. Let K = I τ +1 and L = J τ +1 refer to the subpopulations of their two parents. Arguing as in Möhle (1998a), it will take many generations before coalescence in a large population, so that (I, J, K , L) will first attain its equilibrium distribution ρ i j (0)Q i j,kl (0). Therefore, (81) can be interpreted as a coalescence rate at equilibrium if time is counted in units of 2N β . In view of this, we may expect that for large populations, N eE is asymptotically equivalent to the coalescence effective size N eC whenever the latter exists. In a number of examples below, we will indeed verify that with C the same constant as in (81). To this end, we first need the following:

Corollary 4 Suppose the conditions of Theorem 5 hold. Then asymptotically as N → ∞, N eE is given by
for fixed backward migration (44), and for Dirichlet multinomial backward migration (46)-(47).
Theorem 2 and (85) imply that A has m = 1 irreducible component C 1 = I 2 for Dirichlet multinomial backward migration when B is irreducible and at least one α i > 0, whereas A has m = 2 components C 1 = {(i, i); i = 1, . . . , s} and C 2 = {(i, i); i = j} when α i ≡ 0. Then the joint ancestry of two genes are confined to lie in the same subpopulation after a few generations, once their ancestral subpopulation lineages merge for the first time, although they may not yet have coalesced at the gene level.
Corollary 5 Assume the conditions of Theorem 5 hold, with Dirichlet multinomial backward migration and α i ≡ 0. Then the equilibrium distribution ρ(0) = (ρ i j (0)) of Q(0) is supported on the diagonal of I 2 , with elements Moreover, N eE is asymptotically given by (80), with V kii = E (ν tki1 (ν tki1 − 1)|K ti = k) and K ti the subpopulation to which all parents of the genes in i at time t belong. Jagers and Sagitov (2004) and Pollak (2010) studied populations with rapidly varying sizes, which can be viewed as a special case of the Dirichlet multinomial backward distribution with α i ≡ 0 (see Example 2). They showed that N eC satisfies (83), with β = 1 and C as in (87).
It is also possible to get explicit expressions for C when backward migration is fixed or multinomial: (84) and (85) with α i ≡ ∞ respectively, and the equilibrium distribution ρ(0) = (ρ i j (0)) has elements

Corollary 6 Assume the conditions of Theorem 5 hold, with fixed or multinomial (α i ≡ ∞) backward migration. Then Q i j,kl (0) = B ik B jl follows from
Moreover, N eE is asymptotically given by (80)-(81), with and V ki j as defined in (38). Felsenstein (1971) seems to have been first to use (89) for weighting pairs of subpopulations. Hössjer (2011) studied models with fixed backward migration, and showed that N eC satisfies (83) when β = 1, with C as in (90).
Example 10 (Local subpopulation sizes). When the coalescence probability p i jk (ε) = p k (ε) only depends on the parental subpopulation k for all ε > 0, the size standardized coalescence rate (76) satisfies as for the mixed multinomial reproduction scheme of Example 3. We deduce from (53) that p k ≥ 1/(2N u k ), and with C larger and N eE smaller the more variable the gene weights ω tkg are. When (92) and Theorem 5 hold, it follows that N eE is given by (80), with using i j ρ i j (0)Q i j,kl (0) = ρ kl (0) in the last step, since ρ(0) is the left eigenvector of Q(0) with eigenvalue 1. In particular, if all genes within each subpopulation have parents from the same subpopulation, (86) implies whereas for fixed or multinomial backward migration, we deduce from (89). In particular, if offspring pick their parents uniformly and independently within the parental subpopulation k, we have p i jk = p k = 1/(2N k ), so that the local inbreeding effective size N eI,k in (53) equals the local census size N k . Asymptotically, this corresponds to having β = 1 in (74) and σ k = 1 in (92). For fixed backward migration, we can therefore use (80) and (96), and deduce N eE Notohara (1993) and Nordborg and Krone (2002) showed that the coalescence effective size satisfies N eC = N /C, with C as in (97). Whenever N eI,k = N k , The first inequality of (98) shows how much stochastically varying migration lowers N eE at most. Then Cauchy-Schwarz inequality shows how much a variable long term reproductivity γ k /u k between subpopulations lowers N eE , with equality for conservative migration γ k = u k (Nagylaki 1980). The circular stepping stone model has conservative migration with uniform population sizes u k = 1/s, and Maruyama (1970a) found that N eE /N → 1 as N → ∞, in agreement with the right hand side of (98).
Example 11 (Multiple mergers). Other limiting ancestral processes with multiple mergers (Pitman 1999;Sagitov 1999) are possible. Let p i jh,k be the probability that three genes from subpopulations i, j, h that all have their parents in subpopulation k, have the same parent. We will only consider models for which the pairwise and triple coalescence probabilities p i jk = p k and p k only depend on the source population. Then we must have in order for the limiting process to be Kingman's coalescent, see Theorem 3.2 of Möhle (2000). It is not possible to violate (99) in Example 3. Indeed, as N → ∞, since ω tk1 is bounded by 1 and tends to zero in probability. On the other hand, the Moran model of Example 8 allows for multiple mergers for an appropriate choice of the offspring size Y t−1,k = Y k of the reproducing gene of subpopulation k. As in Eldon and Wakeley (2006), we let for some 0 < ψ ≤ 1 and β > 0. Then one shows analogously as in (61). Since as N → ∞, it follows from (61) and the last two displayed equations that we can violate (99) when 0 < β ≤ 1, with σ i jk = σ k = ψ 2 2 1−β . Since the extended Moran model has fixed backward migration and conservative migration γ k = u k , it follows that N eE satisfies (96) with Notice that this expression equals 1 when β = ψ = 1, since then the coalescence probability is asymptotically equivalent to 1/(2N k ).
Other applications of Theorem 5 with β = 1 includes a single deme with s age classes. Explicit formulas for the constant C in N eE = N /C + o(N ) can be found under general assumptions on how reproductivity varies randomly between and within age classes, thereby extending results of Felsenstein (1971), Sagitov and Jagers (2005) and Hössjer (2011).
The expression for C is then somewhat different, since males only have one copy of an X -chromosome, and only s = 3 subpopulations are needed. Overlapping generations within a dioecious population (Pollak 2011) requires s = 4z (s = 3z) subpopulations for inheritance at an autosomal (X-linked) locus with z age classes. See also Möhle (1998b), for coalescence theory of two-sex models.

Small migration rates
Assume that the subpopulations in (1) divide into m ≤ s demes with deme d containing the subpopulations of I(d). We will introduce a migration parameter ε → 0 that quantifies the amount of migration between the demes (not within them) while the total population size N is kept fixed. In order to obtain an expression for N eE as ε → 0, the crucial part is to find how all Q i j,kl (ε) in (73) depend on ε. Although the relative subpopulation sizes u i (ε) and coalescence probabilities p i jk (ε) may vary with ε to some extent, this will have no asymptotic impact on N eE as ε → 0. We will assume that the backward migration matrix depends on 0 ≤ ε ≤ ε max , where ε max is chosen to guarantee that B(ε) remains a non-negative matrix. The demes are isolated when ε = 0, so that B(0) has a block diagonal structure B(0) = diag(B 11 (0), . . . , B mm (0)), with B dd (0) = (B ik (0)) i,k∈I (d) describing backward migration within deme d. Since B(ε) is a transition matrix of a Markov chain for all ε, the row sums ofḂ must be zero, and this holds, for instance, ifḂ for all i ∈ I(d) and d = 1, . . . , m. If M(ε) and u(ε) are computed for each ε > 0 from (7) and (8), it follows from (102) that if all u i (ε) are differentiable at 0, withṀ = (Ṁ ki ) having elementṡ The migration parameter ε is such B(ε) =Ḃε + o(ε) and M(ε) =Ṁε + o(ε) as ε → 0 for some positive constantsḂ andṀ, where is the backward migration rate between demes, i.e. the average number of parents of ancestors far back in time that originate from another deme than their children, and is the forward migration rate, i.e. the fraction of all offspring today whose parents reside in another deme. Backward migration B(ε) is somewhat easier to analyze theoretically, but often M(ε) is of more interest in applications. In order to find explicit expressions forḂ andṀ, we introduce γ d = (γ di ) i∈I(d) as the equilibrium distribution of B dd (0), and the matrix G = (G ab ) m a,b=1 with elements It is the infinitesimal generator of a continuous time Markov process with state space {1, . . . , m} and an equilibrium distribution θ = (θ 1 , . . . , θ m ) satisfying In the next lemma we assume ε is small, so that migration is faster within than between demes, and subpopulations within a deme form a macro state. The backward ancestry of a gene then attains its equilibrium distribution within a deme before any transitions between demes occur, and then the backward deme ancestry is a continuous time Markov process with generator G: for i = 1, . . . , s, and the backward migration rate (106) where γ i (0) is the limit of the right hand side of (110). If all u i (ε) are differentiable at 0, the forward rate (107) has a similar expansion In order to find the asymptotic behaviour of N eE as ε → 0 by means of (69) and Theorem 4, we derive an expression for A(ε) in (73), findȦ, show that A(0) has a largest eigenvalue λ(0) = 1, find its multiplicity v and corresponding left and right eigenvectors. Because all demes are isolated when ε = 0, it is easy to see that the ancestors of i ∈ I(a) and j ∈ I(b) must belong to k ∈ I(a) and l ∈ I(b) respectively. Therefore Q(0) has a block diagonal structure with Q ab (0) = Q i j,kl i,k∈I(a), j,l∈I(b) a square matrix of order |I(a)||I(b)| containing all backward transitions when one gene and its parent are from deme a and the other gene and its parent are from deme b. It follows from (73) that A(0) has a block diagonal structure

as well, with A ab (ε) = A i j,kl (ε) i,k∈I(a), j,l∈I(b) having elements
for any ε because of (73), for subpopulations i and j that reside in different demes.
In particular, A ab (0) = Q ab (0) has a unique largest eigenvalue 1 when a = b, and associated left and right eigenvectors ρ ab (0) = vec (ρ ab,i j (0)) i j∈I 2 and r ab (0) = vec (r ab,i j (0)) i j∈I 2 with components Since coalescence events are possible within each deme, even when ε = 0, it follows that A dd (0) differs from Q dd (0), with a largest eigenvalue strictly smaller than one. Therefore, the largest eigenvalue 1 of A(0) has multiplicity In order to apply Theorem 4 we must also find the entries of the matrix˙ = follows from (114) and differentiation of (115) with respect to ε. Invoking the definition of˙ in (71), we find thaṫ Then we insert (117) into (118), make use of (108) and obtaiṅ In particular, when each subpopulation is a deme, m = s and I(d) = {d} for d = 1, . . . , s, so that (108) implies G =Ḃ, and˙ = (˙ i j,kl ) 1≤i = j≤s,1≤k =l≤s is of order v = s(s − 1), with elementṡ Combining (111), (112) and (119) with Theorem 4, we obtain the following: Theorem 6 Suppose subpopulations are divided into m demes, as in (101), whose isolation is quantified by the matrixḂ = (Ḃ ik ) in (102). Then with B = B(ε) the backward migration rate between demes in (106), and˙ = (˙ ab,cd ) the matrix in (119), which simplifies to (120) when m = s. If all u k (ε) are differentiable at ε = 0 as well, then with M = M(ε) the forward migration rate in (107) andṀ ki defined in (105).
Example 12 (Island model). The island model (Wright 1943;Maruyama 1970b) is the most well known example of a population with spatial substructure, having m = s demes, and a forward migration matrix where 1 is a column vector of s ones. Migration is symmetric, so that the migration rate M ki = ε/(s − 1) from each k to any other deme i = k is the same. It follows by symmetry from (4), (7) and (8) Insertion of (124) into (120) yieldṡ so that by symmetry, the largest eigenvalue of˙ corresponds to an eigenvector 1 v = (1, . . . , 1) that is a column vector with v = s(s − 1) ones. Hence we find, from any of the row sums of˙ , that We finally apply (122) and arrive at as M → 0. The accuracy of this formula is illustrated in Fig. 1.
Example 13 (Circular stepping stone model). The circular stepping stone model (Kimura 1953;Kimura and Weiss 1964;Maruyama 1970a) is a spatial model with m = s demes located along the perimeter of a circle, where migration from any deme is only possible to one of its two nearest neighbors. The elements of the expected forward migration matrix are where δ(i, j) is the shortest distance between demes i and j along the circle perimeter, when the distance between two neighboring demes is normalized to 1. It follows from (4), (7) and (8) and from (120) we find that the matrix˙ = (˙ i j,kl ) has elementṡ Since s k=1 u k (0) i;i =kṀ ki = 1, we finally deduce from (122) that It seems difficult to obtain an explicit expression for the multiplicative constant in (127), although Maruyama (1970a) derived an approximation for even s. In Table 1 we compare (127) and (128) for different s and find a very good agreement.
Example 14 (System with five subpopulations). A system with five subpopulations of varying size is shown in Fig. 2, with number of migrants in each generation depicted next to the arrows. The forward migration matrix is and the relative subpopulation size vector u = (2, 4, 0.5, 4, 1)/11.5. We let the forward migration rates depend on a perturbation parameter ε according to so that u = u(ε) does not depend on ε, whereas the forward migration rate M(ε) = εM is proportional to ε. It follows from (105) thaṫ Combining (122) and (129), we find that with C = (1 − k u k M kk )/(−2λ max (˙ )) and˙ derived from (120) and (130). The numerically computed value C = 1.419 is justified in Fig. 2.
Example 15 (Combined spatial and age structure). Continuing Example 7, we assume that the forward migration matrix depends on ε as so that the two demes are isolated when ε = 0. For brevity, write M ki = M ki (0). The nonzero elements of the two off-diagonal blocks ofṀ arė where c 1 , . . . , c z are non-negative constants, of which at least one is strictly positive. The migration rates in (132) are chosen so that u k = u k (ε) does not depend on ε. Intuitively, a fraction εc 1 u z+1 of all offspring in deme 1 end up in deme 2, and a fraction εc k+1 u z+k+1 of all genes of age class k of deme 1 that survive, migrate to deme 2, and similarly for the other two equations of (132). In the "Appendix" we verify that as M → 0. When c i increases with i, older individuals will migrate more, and this will increase N eE if older individuals are less reproductive, and decrease N eE if they reproduce more. Conservative migration is the intermediate case when all age groups are equally reproductive, with γ 1i = u i /U (1), γ 2i = u z+i /U (2) and U (d) = i∈I(d) u i the relative size of deme d. Insertion into (133) gives as M → 0 for conservative migration, independently of the age dependency of the migration pattern. In particular, if both demes are equally large, we get the same multiplicative constant C = U (1)U (2) = (1/2) 2 = 1/4 as for an island model (125) with s = 2.

Large populations and small migration rates
We let the inverse population size and the backward migration rate both tend to zero at the same speed, so that with ε → 0 and c a constant. This can be viewed as an asymptotic scenario intermediate between (74) (with β = 1) and (102). The asymptotic expression for N eE is derived similarly as in the previous subsection, so we only highlight the differences. Since the population size tends to infinity, the coalescence probabilities p i jk will tend to zero, as described in (77), and this modifies (73) to where σ i jk is the size standardized coalescence rate in (77). Consequently, is a block diagonal matrix with blocks A ab (0) given by (115) When a = b, the left and right eigenvectors of A ab (0) are as in (116). The same is true when a = b if we add the assumption of fixed or multinomial backward migration proportions. Differentiating (135) with respect to ε we find thaṫ withQ i j,kl as in (117), but without the restriction a = b. Therefore, inserting (116) and (136) into the definition of˙ = (˙ ab,cd ) 1≤a,b≤m,1≤c,d≤m in (71), we find, after some computations, that this matrix has elementṡ with G ab as in (108) and is a coalescence rate between the lines of deme a that can be interpreted as a local version of (90). In particular, when each subpopulation i is a deme, G ik =Ḃ ik , and (137) reduces to 630 O. Hössjeṙ Equipped with (137) and (138), we apply (69) and Theorem 4, and deduce: Proposition 4 Suppose the migration rate between demes and the inverse population size tend to zero simultaneously as in (134) when ε → 0, with a backward migration that is either fixed (44) or multinomial (48). The eigenvalue effective size then has an asymptotic expansion with the elements of˙ as in (137), or (138) when each deme contains one single subpopulation.
Example 16 (Island model.) We will assume that where σ = (N /s)/N eI can be interpreted as a ratio between the (constant) local census and effective size of each deme. This local inbreeding effective size N eI = N eI,i is similar to (53), although we only consider triplets i = j = k of demes here. It follows from (124), (138) and (140) thaṫ Letλ = λ max (˙ ) be the largest eigenvalue of˙ , with x = vec (x i j ) s i, j=1 the corresponding right eigenvector satisfying˙ x =λx. By symmetry we must have x i j = y when i = j and x i j = z when i = j, and a 2 × 2 system of equations for y and z. It will be convenient to introduce the parameter κ = c/(sσ ) = 4N M/(sσ ) =: 4N eI M. Then we apply (139) and find that is the largest root of the (quadratic) characteristic equation inλ obtained from (142). Figure 3 verifies numerically fast convergence in (143), for three combinations of s and κ (notice the narrow scales of the y-axes). We have that lim κ→0 (−2λ(κ)) = 4/(s −1), in agreement with (125). On the other hand, and when the migration rate dominates the inverse population size, we get lim κ→∞ C(κ) = σ for σ k = σ and γ k = u k = 1/s, in agreement with (96).

Discussion
In this paper we developed a general theory which enables computation of the eigenvalue effective size N eE for a large class of structured populations with stochastic backward migration and exchangeable reproduction within subpopulations, exactly or asymptotically when either the inverse population size and/or migration rates between subpopulations tend to zero. Our work can be extended in several ways. First, subpopulation sizes could be time varying. Existence of λ then requires extra conditions, e.g. sizes that either vary as a Markov chain or cyclically. Several authors have studied this problem for homogeneous or age structured models, see Karlin (1968), Jagers and Sagitov (2004), Pollak (1980Pollak ( , 2002 and Wang and Pollak (2000a, b). For cyclically varying populations with period τ , the matrix A = A t of the predicted gene diversity recursion will depend cyclically on time. Whitlock and Barton (1997) argued that this deterministic process tends to zero at a rate as formally proved in . It is straightforward to extend Theorem 1 by measuring time in units of τ , so that the allele frequency Markov process has kernel P 1 · . . . · P τ . Then (144) equals the rate of fixation λ = λ 3 ( P 1 · . . . · P τ ) 1/τ of alleles in units of time step one. Second, we have included two-sex models, defining subpopulations in terms of male and female gametes. It would also be of interest to define subpopulations in terms of individuals, as for an island model with diploid monoecious or dioecious individuals (Chesser et al. 1993;Wang 1997a, b). This would require some changes in the way the elements of A are characterized in terms of coalescence probabilities, requiring modifications of (34) and Theorem 3.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Appendix
Proof of Theorem 1 Formula (14) follows from the lower block diagonal decomposition (13) of P, which implies that the spectrum of eigenvalues of P is the union of the spectrum of eigenvalues of all P ii . Moreover, for each i ≥ 3 we have that λ max ( P ii ) < 1, since P ii has non-negative elements, row sums less or equal to one, with at least one row sum strictly less than one. In conjunction with (14), this implies λ 3 < 1.
In order to prove (15), we recall from (12) that φ 2 (x) = γ x. It follows from (10) that {φ 2 (X t )} is a martingale. With τ = min{t; X t ∈ X 1 ∪X 2 } a stopping time, we can use the Optional Stopping Theorem (cf. Chapter 7 of Grimmett and Stirzaker 2001) to deduce and similarly P t (x, 0) → 1 − φ 2 (x) = φ 1 (x), for any x ∈ X . This establishes the leading two terms on the right hand side of (15). In order motivate the next term of order λ t 3 , we notice that P kk is non-negative, irreducible and aperiodic, and therefore has a unique largest eigenvalue λ 3 . But since the maximum in (14) is attained uniquely for i = k, λ 3 must also be a simple eigenvalue for P, so that |λ i | < λ 3 for i = 4, . . . , |X |. In conjunction with (145), this implies for some matrix R as t → ∞. Since P has a lower triangular block decomposition by (11), so has P t = ( P (t) i j ) and R = (R i j ). Moreover, P t has non-negative elements and φ 1 π 1 + φ 2 π 2 has all but its first two columns equal to zero, and therefore each R i j with j ≥ 3 is non-negative. Apply Perron-Frobenius Theorem to the non-negative, irreducible and aperiodic matrix P kk in order to deduce P , with φ k and q k the right and left eigenvectors for the leading eigenvalue λ 3 with strictly positive components, normalized for instance so that x∈X k q k (x) = x∈X k φ k (x)q k (x) = 1. This proves that R k = φ k q k , and since all other block diagonal P ii have leading eigenvalues smaller than λ 3 , the corresponding matrices R ii must be zero.
The remaining submatrices R i j of R (i > j ≥ 1, i ≥ 3) can be computed as follows. We use (15) and the two recursions P il P l j and let t → ∞. After some computations this leads to with I p an identity matrix of order p. The upper and lower recursions of (147) is applicable for i = k and j = k, and all R i j can be found from them.
Proof of Corollary 1 It follows from (15) and (17) that where the last inequality holds if π(X k ) > 0, since q k , φ k and φ are strictly positive functions on X k .

Proof of Proposition 2 The proposition follows from
where in the second step we used the symmetry condition W i j = W ji , and in the last step the definition of H ti j in (23).
Proof of Theorem 2 Assume that all elements W i j of W are strictly positive, and that the symmetry condition W ji = W i j holds. Then φ W satisfies (17), and it follows from (18) and Proposition 2 that with C > 0. On the other hand we can use (25) repeatedly t times to deduce The block decomposition (28) implies that λ max ( A) = max a=1,...,m λ max ( A aa ), and since (148)-(149) hold for any vector W with non-negative components and any admissible H 0 , (29) follows. In the sequel, we therefore write λ = λ max ( A).
If the maximum in (29) is attained for a unique 1 ≤ c ≤ m, we lump I 2 =C 1 ∪ C 2 ∪C 3 into (at most) three componentsC 1 = ∪ c−1 a=1 C a ,C 2 = C c andC 3 = ∪ m a=c+1 C a . In particular, we putC 1 = ∅ if c = 1, andC 3 = ∅ if c = m. The corresponding block decomposition of A is It can be shown that matrices C ab exist for 1 ≤ b ≤ a ≤ 3 such that as t → ∞. We will identify the components of C = (C ab ), and in this process find the right and left eigenvectors r and ρ of A.
Starting with the diagonal submatrices of C, it follows that C 11 = C 33 = 0, since λ max (Ā aa ) < λ for a = 1, 3. On the other hand, sinceĀ 22 = A cc is irreducible with λ max (Ā aa ) = λ, Perron-Frobenius Theorem and (149) imply that λ is a simple eigenvalue ofĀ 22 with periodicity 1. Thus we can find a right eigenvector r 2 = r 2i j ; i j ∈C 2 , and a left eigenvector ρ 2 = ρ 2i j ; i j ∈C 2 ofĀ 22 with strictly positive components, normalized so that i j∈C 2 ρ 2i j = i j∈C 2 r 2i j ρ 2i j = 1, with For the non-diagonal elements of C we use the three recursions insert (151) into (153), divide both sides of all three equations in (153) by λ t and let t → ∞. After some computations, this yields with I p an identity matrix of order p. Finally, inserting (152) into (154), we find that C = rρ = (0 , r 2 , r 3 ) (ρ 1 , ρ 2 , 0), with Both r 3 and ρ 1 must have non-negative elements, since r 2 and ρ 2 have strictly positive elements, allĀ ab are non-negative andĀ 11 ,Ā 33 have all their eigenvalues less than λ. Finally, since r 2 and ρ 2 are right and left eigenvectors ofĀ 22 with eigenvalue λ, it follows after some computations from (155) and the block decomposition (150) of A, that r and ρ are right and left eigenvectors of A with eigenvalue λ.

Proof of Theorem 3
Recall that H ti j in (22) is the probability that two genes at time t have different types of alleles when picked independently from subpopulations i and j, with replacement if i = j. Conditionally on B t and X t−1 , we compute the expected value of this probability by conditioning on the parental subpopulations k and l of i and j, and then take into account whether the two parental genes are identical or not. We find that since the probability of picking two different genes at time t is 1 − 1/(2N u i ) when i = j and 1 when i = j. Then the probability that the two parental genes from subpopulations k and l are different is 1 when k = l and 1 − P ti jk when k = l. In the former k = l case, the probability is H t−1,kl that the parental genes are different by state, and when k = l, the probability is H t−1,kk /(1 − (2N u k ) −1 ) that the two parental genes are different by state.
We can express the coalescence probability p i jk in (37) in terms of P ti jk as Then average with respect to B t on the left and right hand sides of (156), use independence between B t and X t−1 , invoke (35), (41) and (157), to find that in accordance with (40). As a next step we average both sides of (158) with respect to X t−1 , using starting distribution π for X 0 , and get H ti j = E π E(H ti j |X t−1 ) = k,l A i j,kl E π (H t−1,kl ) = k,l A i j,kl H t−1,kl , which is equivalent to (25). To verify (39), we first compute and introduce the variables which are conditional versions of V ki j in (38). Then average with respect to the exchangeable random vectors {ν tkg } 2N u k g=1 in (160) to deduce that P ti jk = P(T = 1|I 0 = i, J 0 = j, We can rewrite (34) as and taking the product of the last two displayed equations, we find that P ti jk Q ti j,kk = 1 Formula (39) then follows from (157) by averaging with respect to B t in (163), using that E(V tki j ) = V ki j , and finally dividing by Q i j,kk .
Proof of Corollary 3 Define φ 3 as in (42)  In the fifth step we used (40), and in the seventh step that ρ is a left eigenvector of A with eigenvalue λ. Formula (43) is proved in the same way as the corresponding result for the right eigenvector r of A in Theorem 2.
Verifying (63), (64) and (65) We first insert (7) and the expression for Q i j,kl in (51) into (39), and find that p i jk = V ki j /(M ki M k j ) /(2N u k − 1 {i= j} u k /u i ). Since E(ν tki1 ) = M ki , it follows from (38) that In order to compute Var(ν tki1 ) and Cov(ν tki1 , ν tk j1 ) for k = 1, the subpopulation of grandpaternally inherited alleles of the fathers of time t − 1, let ζ tmmg = ν t11g + ν t21g and ζ tm f g = ν t13g + ν t23g be the total number of children that are males and females respectively, of the male in time t − 1 whose paternally and maternally inherited genes have been assigned number g and g = g (g) from 1, . . . , N m = 2N u 1 . (Due to the convention (9) that the first 2N u k X t−1,k genes of subpopulation k have the specified allele, we cannot assume g = g.) Due to exchangeability, τ 2 mm = Var(ζ tmm1 ), τ 2 m f = Var(ζ tm f 1 ) and τ mm,m f = Cov(ζ tmm1 , ζ tm f 1 ).

Proof of Theorem 5 Formulas
where in the last step we employed that ρ(0) is a left eigenvector of Q(0) with eigenvalue 1, and moreover that kl Q i j,kl (ε) = 1 for all ε ≥ 0, which implies kl Q i j,kl (0) = 1 and klQ i j,kl = 0.
Motivation of Lemma 1. We first sketch a proof of (110), using similar calculations as in Möhle (1998a). By the definition of an equilibrium distribution of a Markov chain, B(ε) t → 1γ (ε) as t → ∞, where 1 is a column vector of ones of length s. Hence we will study B(ε) t for large t and use the approximation which can be shown to be accurate in the limit ε → 0, with where γ (0) = (γ i (0)). We use (106), (108) and (110) to prove (111);