Quasi equilibrium, variance effective size and fixation index for populations with substructure

Hössjer, Ola; Ryman, Nils

doi:10.1007/s00285-013-0728-9

Published: 15 October 2013

Volume 69, pages 1057–1128, (2014)
Journal of Mathematical Biology

Abstract

In this paper, we develop a method for computing the variance effective size $N_{eV}$, the fixation index $F_{ST}$ and the coefficient of gene differentiation $G_{ST}$ of a structured population under equilibrium conditions. The subpopulation sizes are constant in time, with migration and reproduction schemes that can be chosen with great flexibility. Our quasi equilibrium approach is conditional on non-fixation of alleles. This is of relevance when migration rates are of a larger order of magnitude than the mutation rates, so that new mutations can be ignored before equilibrium balance between genetic drift and migration is obtained. The vector valued time series of subpopulation allele frequencies is divided into two parts; one corresponding to genetic drift of the whole population and one corresponding to differences in allele frequencies among subpopulations. We give conditions under which the first two moments of the latter, after a simple standardization, are well approximated by quantities that can be explicitly calculated. This enables us to compute approximations of the quasi equilibrium values of $N_{eV}$, $F_{ST}$ and $G_{ST}$. Our findings are illustrated for several reproduction and migration scenarios, including the island model, stepping stone models and a model where one subpopulation acts as a demographic reservoir. We also make detailed comparisons with a backward approach based on coalescence probabilities.

References

Allendorf F, Ryman N (2002) The role of genetics in population viability analysis. In: Beissinger SR, McCullough DR (eds) Population viability analysis. The University of Chicago Press, Chicago
Allendorf FW, Luikart G (2007) Conservation and the genetics of populations. Blackwell, Malden
Barton NH, Slatkin M (1986) A quasi-equilibrium theory of the distribution of rare alleles in a subdivided population. Heredity 56:409–415
Brockwell PJ, Davis RA (1987) Time series: theory and methods. Springer, New York
Caballero A (1994) Developments in the prediction of effective population size. Heredity 73:657–679
Cannings C (1974) The latent roots of certain Markov chains arising in genetics: a new approach. I. Haploid models. Adv Appl Prob 6:260–290
Caswell H (2001) Matrix population models, 2nd edn. Sinauer, Sunderland
Cattiaux P, Collet P, Lambert A, Martínez SM, Martín JS (2009) Quasi-stationary distributions and diffusion models in population dynamics. Ann Probab 37(5):1926–1969
Chakraborty R, Leimar O (1987) Genetic variation within a subdivided population. In: Ryman N, Utter F (eds) Population genetics and fishery management. Washington Sea Grant Program, Seattle, WA. Reprinted 2009 by The Blackburn Press, Caldwell
Collet P, Martinez S (2013) Quasi stationary distributions, Markov chains, diffusions and dynamical systems. Springer, Berlin
Cox DR, Miller HD (1965) The theory of stochastic processes. Methuen & Co Ltd, London
Crow JF (2004) Assessing population subdivision. In: Wasser SP (ed) Evolutionary theory and processes: modern horizons. Papers in Honour of Eviatar Nevo. Springer Science+Business Media Dordrecht, Berlin, pp 35–42
Crow JF, Aoki K (1982) Group selection for a polygenic behavioral trait: a differential proliferation model. Proc Natl Acad Sci 79:2628–2631
Crow JF, Aoki K (1984) Group selection for a polygenic behavioral trait: estimating the degree of population subdivision. Proc Natl Acad Sci 81:6073–6077
Crow JF, Kimura M (1970) An introduction to population genetics theory. The Blackburn Press, Caldwell
Durrett R (2008) Probability models for DNA sequence evolution, 2nd edn. Springer, New York
Engen S, Lande R, Saether B-E (2005a) Effective size of a fluctuating age-structured population. Genetics 170:941–954
Engen S, Lande R, Saether B-E, Weimerskirch H (2005b) Extinction in relation to demographic and environmental stochasticity in age-structured models. Math Biosci 195:210–227
Engle RF, Granger CWJ (1987) Co-integration and error correction: Representation, estimation and testing. Econometrica 55:251–276
Ethier SN, Nagylaki T (1980) Diffusion approximation of Markov chains with two time scales and applications to genetics. Adv Appl Prob 12:14–49
Ewens WJ (1982) On the concept of effective population size. Theoret Popul Biol 21:373–378
Ewens WJ (2004) Mathematical Population Genetics. I. Theoretical introduction, 2nd edn. Springer, New York
Felsenstein J (1971) Inbreeding and variance effective numbers in populations with overlapping generations. Genetics 68:581–597
Fisher RA (1958) The genetical theory of natural selection, 2nd edn. Dover, New York
Granger CWJ (1981) Some properties of time series data and their use in econometric model specification. J Econom 16:121–130
Hardy OJ, Vekemans X (1999) Isolation by distance in a continuous population: reconciliation between spatial autocorrelation analysis and population genetics models. Heredity 83:145–154
Hardy OJ, Vekemans X (2002) SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol Ecol Notes 2:618–620
Hare MP, Nunney L, Schwartz MK, Ruzzante DE, Burford M, Waples R, Ruegg K, Palstra F (2011) Understanding and estimating effective population size for practical applications in marine species management. Conserv Biol 25(3):438–449
Hössjer O (2011) Coalescence theory for a general class of structured populations with fast migration. Adv Appl Probab 43(4):1027–1047
Hössjer O (2013) Spatial autocorrelation for subdivided populations with invariant migration schemes. Methodol Comput Appl Probab. doi:10.1007/s11009-013-9321-3
Hössjer O, Jorde PE, Ryman N (2013) Quasi equilibrium approximations of the fixation index of the island model under neutrality. Theoret Popul Biol 84:9–24
Jamieson IG, Allendorf FW (2012) How does the 50/500 rule apply to MVPs? Trends Ecol Evol 27(10):578–584
Jorde P-E, Ryman N (2007) Unbiased estimator of genetic drift and effective population size. Genetics 177:927–935
Karlin S (1966) A first course in stochastic processes. Academic Press, New York
Kimura M (1953) ‘Stepping stone’ model of population. Ann Rep Natl Inst Genet Japan 3:62–63
Kimura M (1955) Solution of a process of random genetic drift with a continuous model. Proc Natl Acad Sci USA 41:141–150
Kimura M (1964) Diffusion models in population genetics. J Appl Prob 1:177–232
Kimura M (1971) Theoretical foundations of population genetics at the molecular level. Theor Popul Biol 2:174–208
Kimura M, Weiss GH (1964) The stepping stone model of population structure and the decrease of genetic correlation with distance. Genetics 61:763–771
Kingman JFC (1982) The coalescent. Stoch Proc Appl 13:235–248
Latter BDH, Sved JA (1981) Migration and mutation in stochastic models of gene frequency change. II. Stochastic migration with a finite number of islands. J Math Biol 13:95–104
Leviyang S (2011a) The distribution of $F_{ST}$ for the island model in the large population, weak mutation limit. Stoch Anal Appl 28:577–601
Leviyang S (2011b) The distribution of $F_{ST}$ and other genetic statistics for a class of population structure models. J Math Biol 62:203–289
Leviyang S, Hamilton MB (2011) Properties of Weir and Cockerham’s $F_{ST}$ estimator and associated bootstrap confidence intervals. Theoret Popul Biol 79:39–52
Malécot G (1946) La consanguinité dans une population limitée. C R Acad Sci (Paris) 222:841–843
Maruyama T (1970a) On the rate of decrease of heterozygosity in circular stepping stone models of populations. Theor Popul Biol 1:101–119
Maruyama T (1970b) Effective number of alleles in subdivided populations. Theor Popul Biol 1:273–306
Möhle M (2010) Looking forwards and backwards in the multi-allelic neutral Cannings population model. J Appl Prob 47:713–731
Nagylaki T (1980) The strong migration limit in geographically structured populations. J Math Biol 9:101–114
Nagylaki T (1982) Geographical invariance in population genetics. J Theor Biol 99:159–172
Nagylaki T (1998) The expected number of heterozygous sites in a subdivided population. Genetics 149:1599–1604
Nagylaki T (2000) Geographical invariance and the strong-migration limit in subdivided populations. J Math Biol 41:123–142
Nei M (1973) Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci USA 70:3321–3323
Nei M (1975) Molecular evolution and population genetics. North-Holland, Amsterdam
Nei M (1977) $F$-statistics and analysis of gene diversity in subdivided populations. Ann Hum Genet 41:225–233
Nei M, Chakravarti A, Tateno Y (1977) Mean and variance of $F_{ST}$ in a finite number of incompletely isolated populations. Theoret Popul Biol 11:291–306
Nei M, Kumar S (2000) Molecular evolution and phylogenetics. Oxford University Press, Oxford
Nei M, Tajima F (1981) Genetic drift and estimation of effective population size. Genetics 98:625–640
Nordborg M, Krone S (2002) Separation of time scales and convergence to the coalescent in structured populations. In: Slatkin M, Veuille M (eds) Modern development in theoretical population genetics. Oxford Univ Press, Oxford, pp 194–232
Nunney L (1999) The effective size of a hierarchically-structured population. Evolution 53:1–10
Olsson F, Hössjer O, Laikre L, Ryman N (2013) Variance effective population size of populations in which size and age composition fluctuate. Theoret Popul Biol (to appear)
Orive ME (1993) Effective population size in organisms with complex life-histories. Theoret Popul Biol 44:316–340
Palstra FP, Ruzzante DE (2008) Genetic estimates of contemporary effective population size: what can they tell us about the importance of genetic stochasticity for wild populations persistence? Mol Ecol 17:3428–3447
Rottenstreich S, Miller JR, Hamilton MB (2007) Steady state of homozygosity and $G_{ST}$ for the island model. Theoret Popul Biol 72:231–244
Ryman N, Allendorf FW, Jorde PE, Laikre L, Hössjer O (2013) Samples from structured populations yield biased estimates of effective size that overestimate the rate of loss of genetic variation. Mol Ecol Resour (to appear)
Ryman N, Leimar O (2008) Effect of mutation on genetic differentiation among nonequilibrium populations. Evolution 62(9):2250–2259
Sagitov S, Jagers P (2005) The coalescent effective size of age-structured populations. Ann Appl Probab 15(3):1778–1797
Sampson KY (2006) Structured coalescent with nonconservative migration. J Appl Prob 43:351–362
Sjödin P, Kaj I, Krone S, Lascoux M, Nordborg M (2005) On the meaning and existence of an effective population size. Genetics 169:1061–1070
Slatkin M (1981) Estimating levels of gene flow in natural populations. Genetics 99:323–335
Slatkin M (1985) Rare alleles as indicators of gene flow. Evolution 39:53–65
Slatkin M (1991) Inbreeding coefficients and coalescence times. Genet Res 58:167–175
Slatkin M, Arter HE (1991) Spatial autocorrelation methods in population genetics. Am Nat 138(2):499–517
Sved JA, Latter BDH (1977) Migration and mutation in stochastic models of gene frequency change. J Math Biol 5:61–73
Sokal RR, Oden NL, Thomson BA (1997) A simulation study of microevolutionary inferences by spatial autocorrelation analysis. Biol J Linnean Soc 60:73–93
Takahata N (1983) Gene identity and genetic differentiation of populations in the finite island model. Genetics 104(3):497–512
Takahata N, Nei M (1984) $F_{ST}$ and $G_{ST}$ statistics in the finite island model. Genetics 107(3):501–504
Van der AA NP, Ter Morsche HG, Mattheij RRM (2007) Computation of eigenvalue and eigenvector derivatives for a general complex-valued eigensystem. Electron J Linear Algebra 16:300–314
Wakeley J (1999) Nonequilibrium migration in human history. Genetics 153:1863–1871
Wakeley J, Takahashi T (2004) The many-demes limit for selection and drift in a subdivided population. Theoret Popul Biol 66:83–91
Wang J, Caballero A (1999) Developments in predicting the effective size of subdivided populations. Heredity 82:212–226
Waples RS (1989) A generalized approach for estimating effective population size from temporal changes of allele frequency. Genetics 121:379–391
Waples RS (2002) Definition and estimation of effective population size in the conservation of endangered species. In: Beissinger SR, McCullough DR (eds) Population viability analysis. The University of Chicago Press, Chicago
Waples RS, Gaggiotti O (2006) What is a population? An empirical evaluation of some genetic methods for identifying the number of gene pools and their degree of connectivity. Mol Ecol 15:1419–1439
Waples RS, Yokota M (2007) Temporal estimates of effective population size in species with overlapping generations. Genetics 175:219–233
Ward RD, Woodward M, Skibinski DOF (1994) A comparison of genetic diversity levels in marine, freshwater and anadromous fishes. J Fish Biol 44:213–232
Weir BS, Cockerham CC (1984) Estimating $F$-statistics for the analysis of population structure. Evolution 38(6):1358
Weiss GH, Kimura M (1965) A mathematical analysis of the stepping stone model of genetic correlation. J Appl Probab 2:129–149
Whitlock MC, Barton NH (1997) The effective size of a subdivided population. Genetics 145:427–441
Wilkinson-Herbots HM (1998) Genealogy and subpopulation differentiation under various models of population structure. J Math Biol 37:535–585
Wright S (1931) Evolution in Mendelian populations. Genetics 16:97–159
Wright S (1938) Size of population and breeding structure in relation to evolution. Science 87:430–431
Wright S (1943) Isolation by distance. Genetics 28:114–138
Wright S (1946) Isolation by distance under diverse systems of mating. Genetics 31:39–59
Wright S (1951) The general structure of populations. Ann Eugenics 15:323–354
Wright S (1978) Variability within and among genetic populations. Evolution and the genetics of populations, vol 4. University of Chicago Press, Chicago

Acknowledgments

Ola Hössjer’s research was financially supported by the Swedish Research Council, contract nr. 621-2008-4946, and the Gustafsson Foundation for Research in Natural Sciences and Medicine. Nils Ryman’s research was supported by grants from the Swedish Research Council, the BONUS Baltic Organisations’ Network for Funding Science EEIG (the BaltGene research project), and through a grant to his colleague Linda Laikre from the Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (Formas). The authors want to thank an associate editor, two referees, Anders Martin-Löf, and Fredrik Olsson for valuable comments on the work.

Author information

Authors and Affiliations

Division of Mathematical Statistics, Department of Mathematics, Stockholm University, Stockholm, Sweden
Ola Hössjer
Division of Population Genetics, Department of Zoology, Stockholm University, Stockholm, Sweden
Nils Ryman

Corresponding author

Correspondence to Ola Hössjer.

Appendices

Appendix A: Orthogonal decomposition of allele frequency process

Jordan canonical form of ${\varvec{B}}$ and motivation of (22). Let ${\varvec{B}}= {\varvec{Q}}\varvec{\Lambda }{\varvec{Q}}^{-1}$ be the Jordan canonical form of ${\varvec{B}}$, with

$$\begin{aligned} \varvec{\Lambda }= \left( \begin{array}{c@{\quad }c@{\quad }c} \varvec{\Lambda }_1 &{} \ldots &{} 0 \\ \vdots &{} \ddots &{} \vdots \\ 0 &{} \ldots &{} \varvec{\Lambda }_r \end{array}\right) \end{aligned}$$

a block diagonal matrix containing the (possibly complex-valued) eigenvalues of ${\varvec{B}}$ along the diagonal. For each $l=1,\ldots ,r$, the square matrix

$$\begin{aligned} \varvec{\Lambda }_l = \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} \lambda _l &{} 1 &{} 0 &{} \ldots &{} \\ 0 &{} \lambda _l &{} 1 &{} 0 &{} \ldots \\ 0 &{} 0 &{} \lambda _l &{} 1 &{} 0 &{} \ldots \\ \vdots &{} &{} \ddots &{} \ddots &{} \ddots &{} \ddots \end{array}\right) \end{aligned}$$

(104)

occupies rows and columns $j_{l-1}+1,\ldots ,j_l$ of $\varvec{\Lambda }$, with diagonal entries equal to $\lambda _l$, all entries along the superdiagonal equal to 1 and all other entries of $\varvec{\Lambda }_l$ equal to 0. Hence $\lambda _l$ is an eigenvalue of ${\varvec{B}}$ which appears $j_l-j_{l-1}$ times along the diagonal of $\varvec{\Lambda }_l$, with $0=j_0 < j_1 < \cdots < j_r=s$. In particular, $\varvec{\Lambda }$ is diagonal when all eigenvalues of ${\varvec{B}}$ are distinct and $r=s$. Then the rows of ${\varvec{Q}}^{-1}$ contain the left eigenvectors of ${\varvec{B}}$ and the columns ${\varvec{q}}_1,\ldots ,{\varvec{q}}_s$ of ${\varvec{Q}}$ the right eigenvectors. See for instance Cox and Miller (1965).

Regardless of whether $\varvec{\Lambda }$ is diagonal or not, since ${\varvec{B}}$ is the transition matrix of a Markov chain, ${\varvec{q}}_1 = \varvec{1}$ is a right eigenvector with eigenvalue $\lambda _1=1$. By the assumed irreducibility and aperiodicity of this Markov chain, it follows from the Perron-Frobenius theorem that $|\lambda _l|<1$ for $l=2,\ldots ,r$, and without loss of generality, we may assume $|\lambda _2|\ge |\lambda _3|\ge \cdots \ge |\lambda _r|\ge 0$.
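As a small numerical illustration of these spectral properties, the following sketch builds a hypothetical island-model migration matrix and checks that $\varvec{1}$ is a right eigenvector with eigenvalue 1 while all remaining eigenvalues have modulus below 1. The function name `island_migration_matrix` and the values $s=4$, $m=0.1$ are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def island_migration_matrix(s, m):
    """Hypothetical s-deme island model: rows sum to 1.

    A fraction 1 - m of genes stays in its subpopulation and the
    remaining m is spread evenly over the other s - 1 subpopulations.
    """
    B = np.full((s, s), m / (s - 1))
    np.fill_diagonal(B, 1.0 - m)
    return B

B = island_migration_matrix(s=4, m=0.1)
ones = np.ones(4)

# Moduli of the eigenvalues, largest first, so index 0 plays the role of lambda_1.
moduli = np.sort(np.abs(np.linalg.eigvals(B)))[::-1]

print("B 1 = 1:", np.allclose(B @ ones, ones))
print("eigenvalue moduli:", moduli)
```

For this symmetric example the non-unit eigenvalues all coincide at $1-ms/(s-1)$, so the matrix has repeated eigenvalues but is still diagonalizable.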

Introduce the inner product

$$\begin{aligned} ({\varvec{x}},{\varvec{y}})=\sum _{i=1}^s \gamma _i\bar{x}_iy_i \end{aligned}$$

(105)

for possibly complex-valued column vectors ${\varvec{x}}=(x_i)$ and ${\varvec{y}}=(y_i)$ of length $s$, with $\bar{x}_i$ the complex conjugate of $x_i$. Then, we have the following result:

Proposition 5

The columns ${\varvec{q}}_2,\ldots ,{\varvec{q}}_s$ of ${\varvec{Q}}$ are all orthogonal to ${\varvec{q}}_1=\varvec{1}$ with respect to inner product (105), i.e.

$$\begin{aligned} (\varvec{1},{\varvec{q}}_j) = 0, \quad j=2,\ldots ,s. \end{aligned}$$

Proof

We have that

$$\begin{aligned} (\varvec{1},{\varvec{q}}_j)&= \sum _{i=1}^s \gamma _i q_{ij}\\&= \langle \varvec{\gamma },{\varvec{q}}_j\rangle , \end{aligned}$$

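where ${\varvec{q}}_j=(q_{ij})$ and $\langle {\varvec{x}},{\varvec{y}}\rangle =\sum _{i=1}^s x_iy_i$ is the standard inner product. The result follows since $\varvec{\gamma }$ is the first row of ${\varvec{Q}}^{-1}$ and ${\varvec{q}}_j$ is column number $j$ (with $j\ge 2$) of ${\varvec{Q}}$, so that $\langle \varvec{\gamma },{\varvec{q}}_j\rangle $ is an off-diagonal entry of ${\varvec{Q}}^{-1}{\varvec{Q}}={\varvec{I}}$. $\square $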
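Proposition 5 can be checked numerically in the diagonalizable case. The sketch below uses a hypothetical, non-symmetric 3-deme migration matrix (the entries are illustrative assumptions), reads $\varvec{\gamma }$ off as the first row of ${\varvec{Q}}^{-1}$ after rescaling ${\varvec{q}}_1$ to $\varvec{1}$, and evaluates the inner products $(\varvec{1},{\varvec{q}}_j)$ for $j\ge 2$.

```python
import numpy as np

# Hypothetical 3-deme migration matrix: rows sum to 1, not symmetric.
B = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.7, 0.2],
              [0.0, 0.3, 0.7]])

eigvals, Q = np.linalg.eig(B)
order = np.argsort(-np.abs(eigvals))      # put lambda_1 = 1 first
eigvals, Q = eigvals[order], Q[:, order]
Q[:, 0] = Q[:, 0] / Q[0, 0]               # rescale q_1 to the vector of ones

gamma = np.linalg.inv(Q)[0]               # first row of Q^{-1}
inner = gamma @ Q[:, 1:]                  # (1, q_j) for j = 2, ..., s

print("gamma:", gamma)
print("inner products:", inner)
```

With this normalization $\varvec{\gamma }$ sums to one and satisfies $\varvec{\gamma }^T{\varvec{B}}=\varvec{\gamma }^T$, so it is the stationary distribution of the chain.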

Define $\varvec{\Lambda }^0=\text{ diag }(0,\varvec{\Lambda }_2,\ldots ,\varvec{\Lambda }_r)$ as the block diagonal matrix obtained by replacing $\varvec{\Lambda }_1=\lambda _1=1$ in $\varvec{\Lambda }$ by $0$ (or by any other number with modulus less than or equal to $|\lambda _2|$), and put

$$\begin{aligned} {\varvec{B}}^0 = {\varvec{Q}}\varvec{\Lambda }^0{\varvec{Q}}^{-1}. \end{aligned}$$

(106)
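For a diagonalizable ${\varvec{B}}$, definition (106) can be evaluated directly. The sketch below, for a hypothetical 3-deme migration matrix, also compares the result with the rank-one correction ${\varvec{B}}-\varvec{1}\varvec{\gamma }^T$; the two agree because $\varvec{\Lambda }-\varvec{\Lambda }^0$ has a single nonzero entry, ${\varvec{q}}_1=\varvec{1}$, and $\varvec{\gamma }$ is the first row of ${\varvec{Q}}^{-1}$. The matrix entries are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical 3-deme migration matrix with distinct real eigenvalues.
B = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.7, 0.2],
              [0.0, 0.3, 0.7]])

eigvals, Q = np.linalg.eig(B)
order = np.argsort(-np.abs(eigvals))      # lambda_1 = 1 first
eigvals, Q = eigvals[order], Q[:, order]
Q[:, 0] /= Q[0, 0]                        # q_1 = vector of ones
Q_inv = np.linalg.inv(Q)
gamma = Q_inv[0]                          # first row of Q^{-1}

Lambda0 = np.diag(eigvals)
Lambda0[0, 0] = 0.0                       # replace lambda_1 = 1 by 0
B0 = Q @ Lambda0 @ Q_inv                  # definition (106)

rank_one = np.outer(np.ones(3), gamma)    # the matrix 1 gamma^T
spectral_radius = np.max(np.abs(np.linalg.eigvals(B0)))

print("B0 equals B - 1 gamma^T:", np.allclose(B0, B - rank_one))
print("spectral radius of B0:", spectral_radius, "|lambda_2|:", abs(eigvals[1]))
```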

It then follows that the largest eigenvalue modulus of ${\varvec{B}}^0$ is $|\lambda _2|<1$, and ${\varvec{B}}^0$ enters into the time dynamics of the allele frequency process as follows:

Proposition 6

The recursive autoregressive equation (10) for ${\varvec{P}}_t$ can be decomposed into one genetic drift term for the overall allele frequency of the whole population, and one recursion part for the allele frequency fluctuations among subpopulations, as

$$\begin{aligned} P_{t+1}&= P_t + \varepsilon _{t+1},\nonumber \\ {\varvec{P}}_{t+1}^0&= {\varvec{B}}{\varvec{P}}_t^0 + \varvec{\varepsilon }_{t+1}^0 = {\varvec{B}}^0 {\varvec{P}}_t^0 + \varvec{\varepsilon }_{t+1}^0, \end{aligned}$$

(107)

with $\varvec{\varepsilon }_{t+1}^0$ as defined in (22).

Proof

The upper part of (107) follows immediately from (10), since

$$\begin{aligned} P_{t+1} = (\varvec{1},{\varvec{P}}_{t+1}) = (\varvec{1},{\varvec{B}}{\varvec{P}}_t + \varvec{\varepsilon }_{t+1}) = P_t + \varepsilon _{t+1}. \end{aligned}$$

Define, for any vector ${\varvec{x}}=(x_1,\ldots ,x_s)$, ${\varvec{x}}^0 = {\varvec{x}}- (\varvec{1},{\varvec{x}})\varvec{1}$. Then, since $({\varvec{x}}+{\varvec{y}})^0 = {\varvec{x}}^0+{\varvec{y}}^0$, we have that

$$\begin{aligned} {\varvec{P}}_{t+1}^0 = ({\varvec{B}}{\varvec{P}}_t + \varvec{\varepsilon }_{t+1})^0 = ({\varvec{B}}{\varvec{P}}_t)^0 + \varvec{\varepsilon }_{t+1}^0 = {\varvec{B}}{\varvec{P}}_t^0 + \varvec{\varepsilon }_{t+1}^0 = {\varvec{B}}^0{\varvec{P}}_t^0 + \varvec{\varepsilon }_{t+1}^0. \qquad \quad \end{aligned}$$

(108)

The third equality of (108) follows since

$$\begin{aligned} ({\varvec{B}}{\varvec{P}}_t)^0&= \left( {\varvec{B}}(P_t\varvec{1}+ {\varvec{P}}_t^0)\right) ^0\\&= \left( P_t\varvec{1}+ {\varvec{B}}{\varvec{P}}_t^0\right) ^0\\&= P_t\varvec{1}^0 + ({\varvec{B}}{\varvec{P}}_t^0)^0\\&= 0 + {\varvec{B}}{\varvec{P}}_t^0\\&= {\varvec{B}}{\varvec{P}}_t^0, \end{aligned}$$

where in the second equality we used ${\varvec{B}}\varvec{1}=\varvec{1}$, and in the second-to-last step we used that since ${\varvec{P}}_t^0$ is a linear combination of ${\varvec{q}}_2,\ldots ,{\varvec{q}}_s$, so is ${\varvec{B}}{\varvec{P}}_t^0$, and hence orthogonal to $\varvec{1}$ by Proposition 5, so that $({\varvec{B}}{\varvec{P}}_t^0)^0={\varvec{B}}{\varvec{P}}_t^0$.

The fourth equality of (108) follows since ${\varvec{Q}}^{-1}{\varvec{P}}_t^0$ is a linear combination of ${\varvec{e}}_2,\ldots ,{\varvec{e}}_s$, where ${\varvec{e}}_i=(0,\ldots ,0,1,0,\ldots ,0)^T$ has 1 in position $i$ and zeros elsewhere. Hence $\varvec{\Lambda }{\varvec{Q}}^{-1}{\varvec{P}}_t^0 = \varvec{\Lambda }^0{\varvec{Q}}^{-1}{\varvec{P}}_t^0$ and ${\varvec{B}}{\varvec{P}}_t^0 = {\varvec{B}}^0 {\varvec{P}}_t^{0}$. $\square $
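The deterministic part of the decomposition in Proposition 6 can be illustrated as follows. This is a minimal sketch that ignores the noise terms $\varvec{\varepsilon }_{t+1}$, uses a hypothetical migration matrix and allele frequency vector, and takes ${\varvec{B}}^0={\varvec{B}}-\varvec{1}\varvec{\gamma }^T$, which coincides with (106) because ${\varvec{q}}_1=\varvec{1}$ and $\varvec{\gamma }$ is the first row of ${\varvec{Q}}^{-1}$. The helper name `center` is an assumption.

```python
import numpy as np

# Hypothetical 3-deme migration matrix (rows sum to 1).
B = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.7, 0.2],
              [0.0, 0.3, 0.7]])

# gamma: left eigenvector of B for eigenvalue 1, normalised to sum to 1.
w, V = np.linalg.eig(B.T)
gamma = np.real(V[:, np.argmax(np.real(w))])
gamma = gamma / gamma.sum()

def center(x):
    """Return x^0 = x - (1, x) 1 with the gamma-weighted inner product (105)."""
    return x - (gamma @ x) * np.ones_like(x)

B0 = B - np.outer(np.ones(3), gamma)

rng = np.random.default_rng(0)
P = rng.uniform(0.1, 0.9, size=3)     # illustrative subpopulation frequencies
P_bar = gamma @ P                     # overall allele frequency P_t
P0 = center(P)                        # deviations P_t^0

print("mean preserved:", np.isclose(gamma @ (B @ P), P_bar))
print("(B P)^0 = B P^0:", np.allclose(center(B @ P), B @ P0))
print("B P^0 = B^0 P^0:", np.allclose(B @ P0, B0 @ P0))
```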

Appendix B: Proofs from Sect. 5

Proof of Proposition 1

We notice that

$$\begin{aligned} E(H_{t+1,ij}|{\varvec{P}}_t)&= E\left( P_{t+1,i}(1-P_{t+1,j}) + P_{t+1,j}(1-P_{t+1,i})|{\varvec{P}}_t\right) \\&= E(P_{t+1,i}|{\varvec{P}}_t)\left( 1-E(P_{t+1,j}|{\varvec{P}}_t)\right) \\&+ E(P_{t+1,j}|{\varvec{P}}_t)\left( 1-E(P_{t+1,i}|{\varvec{P}}_t)\right) - 2 \text{ Cov }(P_{t+1,i},P_{t+1,j}|{\varvec{P}}_t)\\&= ({\varvec{B}}{\varvec{P}}_t)_i \left( 1-({\varvec{B}}{\varvec{P}}_t)_j\right) + ({\varvec{B}}{\varvec{P}}_t)_j \left( 1-({\varvec{B}}{\varvec{P}}_t)_i\right) - 2\Omega ({\varvec{P}}_t)_{ij}\\&= \sum _{k,l=1}^s b_{ik}b_{jl}\left( P_{tk}(1-P_{tl}) + (1-P_{tk})P_{tl}\right) - 2 \Omega ({\varvec{P}}_t)_{ij}, \end{aligned}$$

from which it easily follows that the two recursions in (25) and (26) are equivalent, with $A_{ij,kl}$ and $U_{ij,kl}$ related as in (27).
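The last equality in the display above relies only on the rows of ${\varvec{B}}$ summing to one. The following sketch checks it for a randomly generated row-stochastic matrix and frequency vector; both are illustrative assumptions, and the covariance term $\Omega ({\varvec{P}}_t)_{ij}$, which appears unchanged on both sides, is left out.

```python
import numpy as np

rng = np.random.default_rng(1)
s = 4

B = rng.random((s, s))
B /= B.sum(axis=1, keepdims=True)     # make every row sum to 1
P = rng.random(s)                     # illustrative allele frequencies

BP = B @ P
# (B P)_i (1 - (B P)_j) + (B P)_j (1 - (B P)_i)
lhs = np.outer(BP, 1 - BP) + np.outer(1 - BP, BP)

# H_kl = P_k (1 - P_l) + (1 - P_k) P_l, then sum_{k,l} b_ik b_jl H_kl
H = np.outer(P, 1 - P) + np.outer(1 - P, P)
rhs = B @ H @ B.T

print("identity holds:", np.allclose(lhs, rhs))
```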

Next we will show that (25) and (28) are equivalent. Clearly (25) implies (28), so it remains to establish the reverse implication. Hence we assume that (28) is satisfied and we want to show that (25) holds for a unique square matrix ${\varvec{U}}=(U_{ij,kl})$ of order $s^2$ with $U_{ij,kl}=U_{ij,lk}$. Indeed, since $\varvec{\Omega }({\varvec{P}})$ is a quadratic function of ${\varvec{P}}$ with $\varvec{\Omega }({\varvec{0}})={\varvec{0}}$, there is a unique such matrix ${\varvec{U}}$ and a unique set of coefficients $c_{ij,k}$ satisfying

$$\begin{aligned} \Omega ({\varvec{P}})_{ij} = \sum _k c_{ij,k}P_k - \sum _{k,l} U_{ij,kl}P_kP_l \end{aligned}$$

(109)

for all $i,j$. On the other hand, according to the lower part of (28),

$$\begin{aligned} \Omega (\varvec{1}-{\varvec{P}})_{ij} = \sum _k c_{ij,k}(1-P_k) - \sum _{k,l} U_{ij,kl}(1-P_k)(1-P_l) \end{aligned}$$

(110)

should agree with (109). The quadratic terms of (109) and (110) are clearly identical, but in order for the linear and constant terms to agree as well,

$$\begin{aligned} c_{ij,k} = \sum _{l} U_{ij,kl} \end{aligned}$$

(111)

must hold for all $k$ (recall that $U_{ij,kl}=U_{ij,lk}$). On the other hand, we can add and subtract linear terms in (109) according to

$$\begin{aligned} \Omega ({\varvec{P}})_{ij} = \frac{1}{2} \sum _{k,l} U_{ij,kl}\left( P_k(1-P_l)+P_l(1-P_k)\right) + \sum _{k} (c_{ij,k}-d_{ij,k})P_k, \qquad \quad \end{aligned}$$

(112)

where

$$\begin{aligned} d_{ij,k} = \sum _{l} U_{ij,kl} \end{aligned}$$

for all $k$. But $d_{ij,k}=c_{ij,k}$ according to (111), so that the second sum in (112) vanishes, and the proposition is proved. $\square $
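The algebra behind (109)–(112) can be verified for a single pair $(i,j)$. In the sketch below, `U` is a randomly generated symmetric matrix standing in for $(U_{ij,kl})_{k,l}$ and `c` is set through (111); both are assumptions for illustration only. The check confirms that $\Omega $ is then invariant under ${\varvec{P}}\mapsto \varvec{1}-{\varvec{P}}$ and equals the heterozygosity form in (112).

```python
import numpy as np

rng = np.random.default_rng(2)
s = 4

U = rng.random((s, s))
U = (U + U.T) / 2                 # symmetry U_kl = U_lk for a fixed (i, j)
c = U.sum(axis=1)                 # condition (111): c_k = sum_l U_kl

def omega(P):
    """Right-hand side of (109) for the fixed pair (i, j)."""
    return c @ P - P @ U @ P

def omega_het(P):
    """First sum of (112); the second sum vanishes since c = d."""
    H = np.outer(P, 1 - P) + np.outer(1 - P, P)
    return 0.5 * np.sum(U * H)

P = rng.random(s)
print(omega(P), omega(1 - P), omega_het(P))
```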

Proof of Proposition 2

First of all, since $\sum _{\tau =0}^\infty ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau $ is assumed to converge, it can be seen by insertion that (36) provides a solution to (33).

In order to prove (37), we get from the Cauchy–Schwarz inequality

$$\begin{aligned} |V_{t,ij}| \le \sqrt{V_{t,ii}V_{t,jj}} \le \max (V_{t,ii},V_{t,jj}), \end{aligned}$$

for all pairs $i,j$. This implies

$$\begin{aligned} |{\varvec{V}}_t|_\infty = \max _{1\le i,j\le s} |V_{t,ij}| = \max _{1\le i \le s} V_{t,ii} = \max _{1\le i \le s} \frac{E_c\left( (P_{ti}-P_t)^2|P_t\right) }{P_t(1-P_t)}. \end{aligned}$$

We then use the definitions of $|\cdot |_\infty $ and $\Vert \cdot \Vert $ in Table 2, the triangle inequality and the matrix norm inequality $\Vert ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau \varvec{\Pi }\Vert \le \Vert ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau \Vert \Vert \varvec{\Pi }\Vert $ in order to prove (38), since

$$\begin{aligned} |{\varvec{V}}|_\infty&= |\text{ vec }({\varvec{V}})|_\infty \\&= \left| \sum _{\tau =0}^\infty ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau \varvec{\Pi }{\varvec{U}}{\underline{{\mathbf{1}}}}\right| _\infty \\&\le \sum _{\tau =0}^\infty \left| ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau \varvec{\Pi }{\varvec{U}}{\underline{{\mathbf{1}}}}\right| _\infty \\&\le \sum _{\tau =0}^\infty \Vert ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau \Vert \Vert \varvec{\Pi }\Vert |{\varvec{U}}{\underline{{\mathbf{1}}}}|_\infty \\&= \text{ Mixtime } \Vert \varvec{\Pi }\Vert |{\varvec{U}}{\underline{{\mathbf{1}}}}|_\infty . \end{aligned}$$

We also have that

$$\begin{aligned} \Vert \varvec{\Pi }\Vert&= \max _{i,j} \sum _{1\le k,l\le s} |\Pi _{ij,kl}|\\&\le \max _{i,j} \sum _{1\le k,l\le s} |1_{\{(k,l)=(i,j)\}} - \gamma _{k}1_{\{j=l\}} - \gamma _l 1_{\{i=k\}} + \gamma _k\gamma _l|\\&\le \max _{i,j} \left( 1 + 2\sum _k \gamma _k + \sum _{k,l=1}^s \gamma _k\gamma _l\right) \\&= 4. \end{aligned}$$
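The bound $\Vert \varvec{\Pi }\Vert \le 4$ can be illustrated numerically. The sketch assumes that $\varvec{\Pi }$ has the entries displayed inside the absolute value above, which factor as $\Pi _{ij,kl}=(1_{\{i=k\}}-\gamma _k)(1_{\{j=l\}}-\gamma _l)$, so that $\varvec{\Pi }$ is a Kronecker product. The weight vector `gamma` is a hypothetical choice.

```python
import numpy as np

gamma = np.array([0.5, 0.3, 0.2])     # hypothetical weights summing to 1
s = gamma.size

# C_ik = 1{i=k} - gamma_k
C = np.eye(s) - np.outer(np.ones(s), gamma)

# Pi_{ij,kl} = C_ik C_jl, with rows indexed by (i, j) and columns by (k, l)
Pi = np.kron(C, C)

# Maximum absolute row sum, the norm used in the proof
norm_Pi = np.linalg.norm(Pi, ord=np.inf)
print("||Pi|| =", norm_Pi)
```

Under this assumption the norm equals $\left( 2(1-\min _i\gamma _i)\right) ^2$, which is strictly below 4 whenever every $\gamma _i$ is positive.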

Finally, (39)–(40) are proved in the same way as (37)–(38). $\square $

Proof of Proposition 3

In order to prove (41), we introduce for each pair of integers $\tau ,\alpha $ with $0\le \alpha \le \tau $ the set ${\mathcal {N}}_{\tau \alpha } = \{{\varvec{n}}= (n_0,n_1,\ldots ,n_{\alpha +1})\}$ of $\binom{\tau }{\alpha }$ sequences ${\varvec{n}}$ such that $0=n_0 < n_1 < \cdots < n_\alpha < n_{\alpha +1} = \tau +1$. Then

$$\begin{aligned} ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau = \sum _{\alpha =0}^\tau (-1)^\alpha \sum _{{\varvec{n}}\in {\mathcal {N}}_{\tau \alpha }} \left( \prod _{i=1}^{\alpha +1} ({\varvec{U}}^{\{i>1\}}({\varvec{G}}^0)^{n_i-n_{i-1}-1}\varvec{\Pi }^{\{i<\alpha +1\}})\right) , \qquad \quad \end{aligned}$$

(113)

where the terms in ${\mathcal {N}}_{\tau \alpha }$ correspond to all possible ways of picking $\alpha $ terms $\varvec{\Pi }{\varvec{U}}$ and $\tau -\alpha $ terms ${\varvec{G}}^0$. Taking the matrix norm of (113) and multiplying by $\Vert {\varvec{U}}\Vert $ from the left and $\Vert \varvec{\Pi }\Vert $ from the right, it follows from matrix norm inequalities that

$$\begin{aligned}&\Vert {\varvec{U}}\Vert \Vert ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau \Vert \Vert \varvec{\Pi }\Vert \nonumber \\&\quad \le \sum _{\alpha =0}^\tau \sum _{{\varvec{n}}\in {\mathcal {N}}_{\tau \alpha }} \Vert {\varvec{U}}\Vert \left\| \prod _{i=1}^{\alpha +1} ({\varvec{U}}^{\{i>1\}} ({\varvec{G}}^0)^{n_i-n_{i-1}-1}\varvec{\Pi }^{\{i<\alpha +1\}}) \right\| \Vert \varvec{\Pi }\Vert \nonumber \\&\quad \le \sum _{\alpha =0}^\tau \sum _{{\varvec{n}}\in {\mathcal {N}}_{\tau \alpha }} \Vert {\varvec{U}}\Vert \prod _{i=1}^{\alpha +1} \left( \Vert {\varvec{U}}^{\{i>1\}} ({\varvec{G}}^0)^{n_i-n_{i-1}-1}\varvec{\Pi }^{\{i<\alpha +1\}}\Vert \right) \Vert \varvec{\Pi }\Vert \nonumber \\&\quad \le \sum _{\alpha =0}^\tau \sum _{{\varvec{n}}\in {\mathcal {N}}_{\tau \alpha }} \Vert {\varvec{U}}\Vert \prod _{i=1}^{\alpha +1} \left( \Vert {\varvec{U}}\Vert ^{\{i>1\}} \Vert ({\varvec{G}}^0)^{n_i-n_{i-1}-1}\Vert \Vert \varvec{\Pi }^{\{i<\alpha +1\}}\Vert \right) \Vert \varvec{\Pi }\Vert \nonumber \\&\quad = \sum _{\alpha =0}^\tau \sum _{{\varvec{n}}\in {\mathcal {N}}_{\tau \alpha }} \prod _{i=1}^{\alpha +1} (\Vert {\varvec{U}}\Vert \Vert ({\varvec{G}}^0)^{n_i-n_{i-1}-1}\Vert \Vert \varvec{\Pi }\Vert ). \end{aligned}$$

(114)
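The non-commutative binomial expansion (113) that underlies (114) can be checked by brute force for small $\tau $. In the sketch below the matrices `G`, `Pi` and `U` are random stand-ins (assumptions, not the matrices of the paper), and the function name `expansion` is hypothetical; it enumerates the index sequences in ${\mathcal {N}}_{\tau \alpha }$ with `itertools.combinations`.

```python
import numpy as np
from itertools import combinations
from math import comb

def expansion(G, Pi, U, tau):
    """Evaluate the right-hand side of (113) by explicit enumeration."""
    d = G.shape[0]
    total = np.zeros((d, d))
    for alpha in range(tau + 1):
        # choose 0 < n_1 < ... < n_alpha < tau + 1
        for inner in combinations(range(1, tau + 1), alpha):
            n = (0, *inner, tau + 1)
            term = np.eye(d)
            for i in range(1, alpha + 2):
                factor = np.linalg.matrix_power(G, n[i] - n[i - 1] - 1)
                if i > 1:               # leading U for every factor but the first
                    factor = U @ factor
                if i < alpha + 1:       # trailing Pi for every factor but the last
                    factor = factor @ Pi
                term = term @ factor
            total += (-1) ** alpha * term
    return total

rng = np.random.default_rng(3)
d, tau = 3, 4
G, Pi, U = (rng.standard_normal((d, d)) for _ in range(3))

direct = np.linalg.matrix_power(G - Pi @ U, tau)
counts = [len(list(combinations(range(1, tau + 1), a))) for a in range(tau + 1)]
print("expansion matches:", np.allclose(expansion(G, Pi, U, tau), direct))
```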

Summing (114) over $\tau $, then changing the order of summation between $\alpha $ and $\tau $, and finally substituting $m_i=n_i-n_{i-1}-1$, we find that

$$\begin{aligned} \Vert \varvec{\Pi }\Vert \Vert {\varvec{U}}\Vert \text{ Mixtime }&= \Vert {\varvec{U}}\Vert \sum _{\tau =0}^\infty \Vert ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau \Vert \Vert \varvec{\Pi }\Vert \nonumber \\&\le \sum _{\tau =0}^\infty \sum _{\alpha =0}^\tau \sum _{{\varvec{n}}\in {\mathcal {N}}_{\tau \alpha }} \prod _{i=1}^{\alpha +1} \left( \Vert {\varvec{U}}\Vert \Vert ({\varvec{G}}^0)^{n_i-n_{i-1}-1}\Vert \Vert \varvec{\Pi }\Vert \right) \nonumber \\&= \sum _{\alpha =0}^\infty \sum _{\tau =\alpha }^\infty \sum _{{\varvec{n}}\in {\mathcal {N}}_{\tau \alpha }} \prod _{i=1}^{\alpha +1} \left( \Vert {\varvec{U}}\Vert \Vert ({\varvec{G}}^0)^{n_i-n_{i-1}-1}\Vert \Vert \varvec{\Pi }\Vert \right) \nonumber \\&= \sum _{\alpha =0}^\infty \sum _{m_1=0}^\infty \cdots \sum _{m_{\alpha +1}=0}^\infty \prod _{i=1}^{\alpha +1} \left( \Vert {\varvec{U}}\Vert \Vert ({\varvec{G}}^0)^{m_i}\Vert \Vert \varvec{\Pi }\Vert \right) \nonumber \\&= \sum _{\alpha =0}^\infty \prod _{i=1}^{\alpha +1} \left( \Vert {\varvec{U}}\Vert \sum _{m=0}^\infty \Vert ({\varvec{G}}^0)^m \Vert \Vert \varvec{\Pi }\Vert \right) \nonumber \\&= \sum _{\alpha =0}^\infty \left( \Vert {\varvec{U}}\Vert \sum _{m=0}^\infty \Vert ({\varvec{G}}^0)^m \Vert \Vert \varvec{\Pi }\Vert \right) ^{\alpha + 1}\nonumber \\&= \Vert \varvec{\Pi }\Vert \Vert {\varvec{U}}\Vert \sum _{m=0}^\infty \Vert ({\varvec{G}}^0)^m\Vert / (1 - \Vert \varvec{\Pi }\Vert \Vert {\varvec{U}}\Vert \sum _{m=0}^\infty \Vert ({\varvec{G}}^0)^m\Vert ). \qquad \qquad \end{aligned}$$

(115)
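As a sanity check on (115), the bound can be evaluated numerically for small matrices. The sketch below is only illustrative: the matrices `G0`, `Pi` and `U` are hypothetical stand-ins with prescribed row sums (not the matrices of any particular migration model), the norm is the maximum absolute row sum used in (116), and the infinite series are truncated after 200 terms.

```python
import numpy as np

def row_sum_norm(M):
    # Maximum absolute row sum of a matrix.
    return np.linalg.norm(M, np.inf)

def series_of_norms(M, n_terms=200):
    # Truncated sum of ||M^tau|| for tau = 0, ..., n_terms - 1.
    total, power = 0.0, np.eye(M.shape[0])
    for _ in range(n_terms):
        total += row_sum_norm(power)
        power = power @ M
    return total

rng = np.random.default_rng(0)
n = 4                                            # stands in for s^2
G0 = 0.5 * rng.dirichlet(np.ones(n), size=n)     # nonnegative, rows sum to 0.5
Pi = 0.2 * rng.dirichlet(np.ones(n), size=n)     # ||Pi|| = 0.2
U = 0.5 * rng.dirichlet(np.ones(n), size=n)      # ||U|| = 0.5

# x = ||Pi|| ||U|| sum_m ||(G0)^m||, the quantity in the last line of (115).
x = row_sum_norm(Pi) * row_sum_norm(U) * series_of_norms(G0)
lhs = row_sum_norm(U) * series_of_norms(G0 - Pi @ U) * row_sum_norm(Pi)
rhs = x / (1.0 - x)
print(lhs, rhs)
```

With these choices $x\approx 0.2<1$, so the geometric series in (115) converges and `lhs` should not exceed `rhs`.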

It can be seen that $({\varvec{G}}^0)^\tau \text{ vec }({\varvec{V}}) = \text{ vec }(({\varvec{B}}^0)^\tau {\varvec{V}}(({\varvec{B}}^0)^T)^\tau )$, by induction with respect to $\tau $. Writing $({\varvec{G}}^0)^\tau =(G_{ij,kl}^{0(\tau )})$ and $({\varvec{B}}^0)^\tau = (b_{ik}^{0(\tau )})$, this yields

$$\begin{aligned} G_{ij,kl}^{0(\tau )} = b_{ik}^{0(\tau )}b_{jl}^{0(\tau )}, \end{aligned}$$

and

$$\begin{aligned} \Vert ({\varvec{G}}^0)^\tau \Vert&= \max _{i,j} \sum _{k,l} |G_{ij,kl}^{0(\tau )}|\nonumber \\&= \max _{i,j} \sum _{k,l=1}^s |b_{ik}^{0(\tau )}| |b_{jl}^{0(\tau )}|\nonumber \\&= \max _i \sum _k |b_{ik}^{0(\tau )}| \cdot \max _j \sum _l |b_{jl}^{0(\tau )}|\nonumber \\&= \Vert ({\varvec{B}}^0)^\tau \Vert ^2. \end{aligned}$$

(116)
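The Kronecker structure behind (116) is easy to confirm numerically. The following sketch assumes a row-major vec operator, so that entry $((i,j),(k,l))$ of `np.kron(B0, B0)` equals $b_{ik}b_{jl}$; the matrix `B0` is a random stand-in for ${\varvec{B}}^0$, not a matrix from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
s = 3
B0 = rng.normal(size=(s, s)) / s     # hypothetical stand-in for B^0
V = rng.normal(size=(s, s))

# With row-major vec, entry ((i, j), (k, l)) of kron(B0, B0) is b_ik * b_jl.
G0 = np.kron(B0, B0)

def inf_norm(M):
    # Maximum absolute row sum.
    return np.linalg.norm(M, np.inf)

tau = 4
B0_tau = np.linalg.matrix_power(B0, tau)
G0_tau = np.linalg.matrix_power(G0, tau)

# (G^0)^tau vec(V) versus vec((B^0)^tau V ((B^0)^T)^tau).
vec_identity_gap = np.max(
    np.abs(G0_tau @ V.reshape(-1) - (B0_tau @ V @ B0_tau.T).reshape(-1))
)
# ||(G^0)^tau|| versus ||(B^0)^tau||^2.
norm_gap = abs(inf_norm(G0_tau) - inf_norm(B0_tau) ** 2)
print(vec_identity_gap, norm_gap)
```

Both gaps should vanish up to rounding error, in line with the induction argument and with (116).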

Formula (41) then follows from (115) and (116). In order to verify (42), we use (106) and the Jordan decomposition (104) to deduce

$$\begin{aligned} ({\varvec{B}}^0)^\tau = {\varvec{Q}}\text{ diag }(0,\varvec{\Lambda }_2^\tau ,\ldots ,\varvec{\Lambda }_r^\tau ) {\varvec{Q}}^{-1}, \end{aligned}$$

where the middle matrix on the right hand side is block diagonal, with $\Vert \varvec{\Lambda }_l^\tau \Vert = O(\tau ^{j_l-j_{l-1}-1}|\lambda _l|^\tau )$ as $\tau \rightarrow \infty $, and $j_l-j_{l-1}$ the order of the square matrix $\varvec{\Lambda }_l$, see Cox and Miller (1965) for details. In particular, this implies that $\Vert \varvec{\Lambda }_l^\tau \Vert $ converges to zero at a faster rate than $(|\lambda _2|+\epsilon )^\tau $ as $\tau \rightarrow \infty $ for any $0<\epsilon < 1-|\lambda _2|$. Then (42) follows, since

$$\begin{aligned} \Vert ({\varvec{B}}^0)^\tau \Vert \le \Vert {\varvec{Q}}\Vert \left( \max _{2\le l \le r} \Vert \varvec{\Lambda }_l^\tau \Vert \right) \Vert {\varvec{Q}}^{-1}\Vert . \end{aligned}$$

Finally, (43) is a simple consequence of (41) and (42), since

$$\begin{aligned} \sum _{\tau =0}^\infty \Vert ({\varvec{B}}^0)^\tau \Vert ^{2}&\le C^2 \sum _{\tau =0}^\infty (|\lambda _2| + \epsilon )^{2\tau }\\&= \frac{C^2}{1-(|\lambda _2|+\epsilon )^2}. \end{aligned}$$

$\square $

Appendix C: Proof of Theorem 1

We start by showing that $\text{ vec }({\varvec{V}}_t)$ and $\text{ vec }(\varvec{\Sigma }_t)$ satisfy a similar system of equations as (33). To this end, since $\varvec{\varepsilon }_t^0 = ({\varvec{I}}-\varvec{1}\varvec{\gamma })\varvec{\varepsilon }_t$, the lower part of (22) implies a recursion

$$\begin{aligned} {\varvec{V}}_{t+1}&= \frac{E_c\left( \varvec{\varepsilon }_{t+1}^0(\varvec{\varepsilon }_{t+1}^0)^T|P_{t}\right) }{P_{t}(1-P_{t})} + \frac{E_c\left( {\varvec{{\varvec{B}}}}^0 {\varvec{{\varvec{P}}}}_t^0 \left( {\varvec{{\varvec{B}}}}^0{\varvec{{\varvec{P}}}}_t^0\right) ^T|P_t\right) }{P_t(1-P_t)} + \varvec{\xi }_{t+1}\nonumber \\&= ({\varvec{I}}-\varvec{1}\varvec{\gamma })\varvec{\Sigma }_t ({\varvec{I}}-\varvec{1}\varvec{\gamma })^T + {\varvec{B}}^0 {\varvec{V}}_t ({\varvec{B}}^0)^T + \varvec{\xi }_{t+1}, \end{aligned}$$

(117)

where $\varvec{\xi }_{t+1}$ is a remainder term that is nonzero since we conditioned on $P_t$ rather than $P_{t+1}$ and divided by $P_t(1-P_t)$ rather than $P_{t+1}(1-P_{t+1})$ on the right hand side of (117). Any departure of $E_c(\varvec{\varepsilon }_{t+1}^0|P_t)$ from $E(\varvec{\varepsilon }_{t+1}^0|P_t)={\varvec{0}}$ implies, in addition, that a cross covariance term is added to $\varvec{\xi }_{t+1}$.

In vec format we may rewrite (117) as

$$\begin{aligned} \text{ vec }({\varvec{V}}_{t+1}) = \varvec{\Pi }\text{ vec }(\varvec{\Sigma }_t) + {\varvec{G}}^0\text{ vec }({\varvec{V}}_t) + \text{ vec }(\varvec{\xi }_{t+1}), \end{aligned}$$

(118)

with $\varvec{\Pi }$ and ${\varvec{G}}^0$ matrices defined by $\varvec{\Pi }\text{ vec }(\varvec{\Sigma }_t)=\text{ vec }(({\varvec{I}}-\varvec{1}\varvec{\gamma })\varvec{\Sigma }_t({\varvec{I}}-\varvec{1}\varvec{\gamma })^T)$ and ${\varvec{G}}^0\text{ vec }({\varvec{V}}_t)=\text{ vec }({\varvec{B}}^0{\varvec{V}}_t({\varvec{B}}^0)^T)$ respectively. Hence their entries are as in (34) and (35).

For the standardized genetic drift covariance matrix, we first expand (31) as

(119)

where the remainder term $\varvec{\zeta }_t$ occurs when replacing the inner expectation $E$ by $E_c$. Then we expand $\varvec{\Omega }(P_t{\varvec{1}} + {\varvec{P}}_t^0)$ as in (30) and take expectation conditionally on $P_t$, and switch index from $t$ to $t+1$, to deduce that

$$\begin{aligned} \text{ vec }(\varvec{\Sigma }_{t+1})&= {\varvec{U}}{\underline{{\mathbf{1}}}} - {\varvec{U}}\text{ vec }({\varvec{V}}_{t+1}) + {\varvec{U}}_{t+1}\varvec{\mu }_{t+1} + \text{ vec }(\varvec{\zeta }_{t+1}) \nonumber \\&= {\varvec{U}}{\underline{{\mathbf{1}}}} - {\varvec{U}}\text{ vec }({\varvec{V}}_{t+1}) + \varvec{\eta }_{t+1}, \end{aligned}$$

(120)

where ${\varvec{U}}_t = (U_{tij,k})$ is an $s^2\times s$ matrix, whose elements are defined as $U_{tij,k}=(1-2P_t)\sum _l U_{ij,kl}$, so that the last term on the right hand side of (30) can be written as ${\varvec{U}}_t{\varvec{P}}_t^0$. The last term on the right hand side of (120) is defined by

$$\begin{aligned} \varvec{\eta }_{t} = {\varvec{U}}_t \varvec{\mu }_t + \text{ vec }(\varvec{\zeta }_t) \end{aligned}$$

with $\varvec{\mu }_t$ as in (44).

Now (118) and (120) define a system of equations which differs from (33) only in that the remainder terms $\text{ vec }(\varvec{\xi }_{t+1})$ and $\varvec{\eta }_{t+1}$ have been added. For simplicity of notation, we write $\tilde{\varvec{\xi }}_t = \text{ vec }(\varvec{\xi }_t)=\text{ vec }(\xi _{t,ij};1\le i,j\le s)$, a column vector of length $s^2$. Combining (118) and (120), we get

$$\begin{aligned} \left( \begin{array}{c} \text{ vec }(\varvec{\Sigma }_{t+1}) \\ \text{ vec }({\varvec{V}}_{t+1}) \end{array} \right) = {\varvec{T}}\left( \begin{array}{c} \text{ vec }(\varvec{\Sigma }_{t}) \\ \text{ vec }({\varvec{V}}_{t}) \end{array} \right) + \left( \begin{array}{c} {\varvec{U}}{\underline{{\mathbf{1}}}} \\ {\varvec{0}}\end{array} \right) + \left( \begin{array}{c} \varvec{\eta }_{t+1}-{\varvec{U}}\tilde{\varvec{\xi }}_{t+1} \\ \tilde{\varvec{\xi }}_{t+1} \end{array} \right) , \end{aligned}$$

(121)

where

$$\begin{aligned} {\varvec{T}}= \left( \begin{array}{c@{\quad }c} {\varvec{0}}&{} -{\varvec{U}}\\ {\varvec{0}}&{} {\varvec{I}}\end{array}\right) \left( \begin{array}{c@{\quad }c} {\varvec{I}}&{} {\varvec{0}}\\ \varvec{\Pi }&{} {\varvec{G}}^0 \end{array}\right) . \end{aligned}$$

On the other hand, it follows from (33) that

$$\begin{aligned} \left( \begin{array}{c} \text{ vec }(\varvec{\Sigma }) \\ \text{ vec }({\varvec{V}}) \end{array} \right) = {\varvec{T}}\left( \begin{array}{c} \text{ vec }(\varvec{\Sigma }) \\ \text{ vec }({\varvec{V}}) \end{array} \right) + \left( \begin{array}{c} {\varvec{U}}{\underline{{\mathbf{1}}}} \\ {\varvec{0}}\end{array} \right) . \end{aligned}$$

(122)

Taking the difference of (121) and (122), we find that

$$\begin{aligned} \varvec{\delta }_{t} = \left( \begin{array}{c} \text{ vec }(\Delta \varvec{\Sigma }_t) \\ \text{ vec }(\Delta {\varvec{V}}_t) \end{array} \right) , \end{aligned}$$

satisfies

$$\begin{aligned} \varvec{\delta }_{t+1} = {\varvec{T}}\varvec{\delta }_{t} + \left( \begin{array}{c} \varvec{\eta }_{t+1} -{\varvec{U}}\tilde{\varvec{\xi }}_{t+1} \\ \tilde{\varvec{\xi }}_{t+1} \end{array} \right) \Longrightarrow \varvec{\delta }_t = \sum _{\tau =0}^\infty {\varvec{T}}^\tau \left( \begin{array}{c} \varvec{\eta }_{t-\tau }-{\varvec{U}}\tilde{\varvec{\xi }}_{t-\tau } \\ \tilde{\varvec{\xi }}_{t-\tau } \end{array} \right) , \end{aligned}$$

(123)

provided that the series converges. It can be shown by induction with respect to $\tau $ that

$$\begin{aligned} {\varvec{T}}^\tau = \left( \begin{array}{c@{\quad }c} -{\varvec{U}}({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^{\tau -1}\varvec{\Pi }&{} -{\varvec{U}}({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^{\tau -1}{\varvec{G}}^0 \\ ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^{\tau -1}\varvec{\Pi }&{} ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^{\tau -1}{\varvec{G}}^0 \end{array}\right) \end{aligned}$$

for all $\tau \ge 1$. Inserting this formula into (123), one obtains

$$\begin{aligned} \varvec{\delta }_t = \left( \begin{array}{c} \varvec{\eta }_t \\ {\varvec{0}}\end{array}\right) + \left( \begin{array}{c} -{\varvec{U}}\\ {\varvec{I}}\end{array}\right) \sum _{\tau =0}^\infty ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^{\tau }\left( \varvec{\Pi }\varvec{\eta }_{t-\tau -1} + \tilde{\varvec{\xi }}_{t-\tau }\right) . \end{aligned}$$

(124)
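The block formula for ${\varvec{T}}^\tau $ stated before (124) can be verified numerically for small $\tau $. In the sketch below, `U`, `Pi` and `G0` are random square matrices used purely as stand-ins (their common dimension `n` plays the role of $s^2$); the check concerns only the matrix algebra, not any particular population model.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4                                   # stands in for s^2
U, Pi, G0 = (rng.normal(size=(n, n)) / n for _ in range(3))
I, Z = np.eye(n), np.zeros((n, n))

# T as the product of the two block matrices in its definition.
T = np.block([[Z, -U], [Z, I]]) @ np.block([[I, Z], [Pi, G0]])

def T_power_closed_form(tau):
    # Block formula for T^tau, stated for tau >= 1.
    A = np.linalg.matrix_power(G0 - Pi @ U, tau - 1)
    return np.block([[-U @ A @ Pi, -U @ A @ G0], [A @ Pi, A @ G0]])

max_gap = max(
    np.max(np.abs(np.linalg.matrix_power(T, tau) - T_power_closed_form(tau)))
    for tau in range(1, 8)
)
print(max_gap)
```

The maximal entrywise discrepancy over $\tau =1,\ldots ,7$ should be of the order of rounding error.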

Since $\tilde{\varvec{\xi }}_t$ contains the same elements as $\varvec{\xi }_t$, we have that $|\tilde{\varvec{\xi }}_t|_\infty = |\varvec{\xi }_t|_\infty $, and moreover, $|\varvec{\eta }_t|_\infty \le |\varvec{\zeta }_t|_\infty + \Vert {\varvec{U}}_t\Vert |\varvec{\mu }_t|_\infty $. Hence it follows, by taking the $|\cdot |_\infty $-norm of the upper and lower part of (124), that

$$\begin{aligned} |\Delta \varvec{\Sigma }_t|_\infty&\le |\varvec{\zeta }_t|_\infty \!+\! \Vert {\varvec{U}}_t\Vert |\varvec{\mu }_t|_\infty \nonumber \\&\!+ \Vert {\varvec{U}}\Vert \sum _{\tau =0}^\infty \Vert ({\varvec{G}}^0\!-\!\varvec{\Pi }{\varvec{U}})^{\tau }\Vert \left( \Vert \varvec{\Pi }\Vert (|\varvec{\zeta }_{t-\tau -1}|_\infty \!+\! \Vert {\varvec{U}}_{t-\tau -1}\Vert |\varvec{\mu }_{t-\tau -1}|_\infty ) \!+\! |\varvec{\xi }_{t-\tau }|_\infty \right) \nonumber \\ \end{aligned}$$

(125)

and

$$\begin{aligned} |\Delta {\varvec{V}}_t|_\infty \le \sum _{\tau =0}^\infty \Vert ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^{\tau }\Vert \left( \Vert \varvec{\Pi }\Vert (|\varvec{\zeta }_{t-\tau -1}|_\infty + \Vert {\varvec{U}}_{t-\tau -1}\Vert |\varvec{\mu }_{t-\tau -1}|_\infty ) + |\varvec{\xi }_{t-\tau }|_\infty \right) . \nonumber \\ \end{aligned}$$

(126)

Since

$$\begin{aligned} \sum _k |U_{tij,k}| \le \sum _k \left| \sum _l U_{ij,kl}\right| \le \sum _{kl} |U_{ij,kl}|, \end{aligned}$$

where the first inequality uses $|1-2P_t|\le 1$, it follows that $\Vert {\varvec{U}}_t\Vert \le \Vert {\varvec{U}}\Vert $. Hence we may replace $\Vert {\varvec{U}}_t\Vert $ and $\Vert {\varvec{U}}_{t-\tau -1}\Vert $ in (125)–(126) by their upper bound $\Vert {\varvec{U}}\Vert $, take conditional expectation $E_c$ on both sides of these two inequalities, and finally let $t\rightarrow \infty $, thereby obtaining (48) and (49). $\square $

Appendix D: Verifying formulas for $\Omega ({\varvec{P}}_t)$ and $N_{eV}^\mathrm{{ appr}}$ for various reproduction and migration models.

We will start by verifying (30) (and hence also (120)) separately for reproduction scenarios 1, 2 and 3.

Reproduction scenario 1. For this reproduction scenario, we write

$$\begin{aligned} P^*_{tki} = P_{tk} + (\tilde{P}_{tk}-P_{tk}) + (P^*_{tki}-\tilde{P}_{tk}). \end{aligned}$$

It follows from (10) and (53) that

$$\begin{aligned} \varepsilon _{t+1,i} = \sum _{k=1}^s b_{ik}(\tilde{P}_{tk}-P_{tk}) + \sum _{k=1}^s b_{ik}(P^*_{tki}-\tilde{P}_{tk}). \end{aligned}$$

We further have that

$$\begin{aligned} \text{ Var }(\tilde{P}_{tk}-P_{tk}|{\varvec{P}}_t) = \left( \frac{1}{2N_{ek}} - \frac{1}{2Nu_k}\right) P_{tk}(1-P_{tk})(1+o(1)) \end{aligned}$$

(127)

and

$$\begin{aligned} \text{ Var }(P^*_{tki}-\tilde{P}_{tk}|{\varvec{P}}_t) = \frac{P_{tk}(1-P_{tk})}{2Nu_km_{ki}}(1+o(1)). \end{aligned}$$

Combining the last three displayed expressions, we arrive at (54). $\square $

Reproduction scenario 2. Write

$$\begin{aligned} \varepsilon _{t+1,i} = \sum _{k=1}^s b_{ik}(P^*_{tki}-P_{tk}), \end{aligned}$$

(128)

introduce $C_{kij} = \text{ Cov }(\nu _{ki}^l,\nu _{kj}^l)$ and $\tilde{C}_{kij}=\text{ Cov }(\nu _{ki}^l,\nu _{kj}^{l^\prime })$ when $l\ne l^\prime $. Because of the assumed exchangeability of $\{\varvec{\nu }_k^l\}_{l=1}^{2Nu_k}$, $C_{kij}$ and $\tilde{C}_{kij}$ do not depend on $l$ and $(l,l^\prime )$ respectively. Since (2) holds exactly, with remainder term $o(1)$ equal to zero, the variance of the left hand side must be zero, and this implies $\tilde{C}_{kij}=-C_{kij}/(2Nu_k-1)$. Therefore, it follows from (55) that

$$\begin{aligned} \text{ Cov }(P^*_{tki},P^*_{tkj}|{\varvec{P}}_t)&= \frac{ 2Nu_kP_{tk}C_{kij} + 2Nu_kP_{tk}(2Nu_kP_{tk}-1)\tilde{C}_{kij}}{(2Nu_k)^2m_{ki}m_{kj}}\\&\sim \frac{C_{kij}}{m_{ki}m_{kj}} \frac{P_{tk}(1-P_{tk})}{2Nu_k}. \end{aligned}$$

Combining this with (128), we arrive at

$$\begin{aligned} \varvec{\Omega }({\varvec{P}}_t)_{ij} = \sum _{k=1}^s b_{ik}b_{jk}\frac{C_{kij}}{2Nu_k m_{ki}m_{kj}} P_{tk}(1-P_{tk}), \end{aligned}$$

which is equivalent to (56). $\square $
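For completeness, the asymptotic equivalence used in the covariance display for reproduction scenario 2 can be spelled out. This is only a worked version of a step already taken, using nothing beyond $\tilde{C}_{kij}=-C_{kij}/(2Nu_k-1)$:

$$\begin{aligned} 2Nu_kP_{tk}C_{kij} + 2Nu_kP_{tk}(2Nu_kP_{tk}-1)\tilde{C}_{kij}&= 2Nu_kP_{tk}C_{kij}\left( 1-\frac{2Nu_kP_{tk}-1}{2Nu_k-1}\right) \\&= 2Nu_kP_{tk}(1-P_{tk})C_{kij}\,\frac{2Nu_k}{2Nu_k-1}. \end{aligned}$$

Dividing by $(2Nu_k)^2m_{ki}m_{kj}$ gives $C_{kij}P_{tk}(1-P_{tk})/(2Nu_km_{ki}m_{kj})$ times the factor $2Nu_k/(2Nu_k-1)$, which tends to one as $N\rightarrow \infty $.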

Reproduction scenario 3. In order to verify (120), we first notice from (10) and (57) that

$$\begin{aligned} \varepsilon _{t+1,i} = (P_{t+1,i}-\check{P}_{ti}) + \sum _{k=1}^s b_{ik}(\tilde{P}_{tk}-P_{tk}) + \sum _{k=1}^s (B_{ik}-b_{ik})P_{tk} + \text{ rem }, \qquad \quad \end{aligned}$$

(129)

with $\text{ rem } = \sum _{k=1}^s (B_{ik}-b_{ik})(\tilde{P}_{tk}-P_{tk})$ a remainder term that vanishes when $N_{ek}=Nu_k$ for all $k$ and which is otherwise asymptotically negligible when $\alpha _i\rightarrow \infty $ as $N\rightarrow \infty $. It follows from (57) and (59) that

$$\begin{aligned} \text{ Var }(P_{t+1,i}-\check{P}_{ti}|{\varvec{P}}_t) \sim \frac{({\varvec{B}}{\varvec{P}}_t)_i(1-({\varvec{B}}{\varvec{P}}_t)_i)}{2Nu_i}(1+o(1)), \end{aligned}$$

and

$$\begin{aligned} \text{ Var }\left( \sum _{k=1}^s (B_{ik}-b_{ik})P_{tk}|{\varvec{P}}_t\right)&= \frac{1}{\alpha _i+1}\sum _{k=1}^s P_{tk}^2 b_{ik} - \frac{1}{\alpha _i+1} \sum _{k,l=1}^s P_{tk}P_{tl}b_{ik}b_{il}\\&= \frac{1}{\alpha _i+1}\sum _{k=1}^s \left( P_{tk} - ({\varvec{B}}{\varvec{P}}_t)_i \right) ^2 b_{ik}. \end{aligned}$$

In conjunction with (127) and (129), this proves (60). $\square $
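The variance identity for the random migration weights can be checked directly. The sketch below assumes, as the factor $1/(\alpha _i+1)$ suggests, that row $i$ of the random migration matrix is Dirichlet distributed with parameter vector $\alpha _i(b_{i1},\ldots ,b_{is})$; this distributional form and the numerical values of `s`, `alpha`, `b` and `p` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
s, alpha = 5, 20.0                    # illustrative values only
b = rng.dirichlet(np.ones(s))         # expected migration weights b_i1, ..., b_is
p = rng.uniform(size=s)               # subpopulation frequencies P_t1, ..., P_ts

# Exact covariance matrix of a Dirichlet(alpha * b) vector (assumed model).
cov_B = (np.diag(b) - np.outer(b, b)) / (alpha + 1.0)

# Variance of sum_k (B_ik - b_ik) P_tk computed as a quadratic form ...
var_direct = p @ cov_B @ p
# ... and via the closed form with weighted squared deviations from (B P_t)_i.
var_formula = np.sum(b * (p - b @ p) ** 2) / (alpha + 1.0)
print(var_direct, var_formula)
```

The two numbers agree because $\sum _k b_{ik}=1$, which is exactly the algebra behind the second equality in the variance display.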

Verifying (65). The reproduction scenario 3 expression for $\varvec{\Sigma }$ is obtained by combining the upper equation of (33) with the relevant entries for $U_{ij,kl}$ in Table 3. When $N_{ek}=Nu_k$ for $k=1,\ldots ,s$, all non-diagonal ($i\ne j$) terms vanish and then the denominator of (62) can be written as

$$\begin{aligned} 2{\varvec{u}}\varvec{\Sigma }{\varvec{u}}^T&= 2\sum _{i,j,k,l} u_iu_j U_{ij,kl}- 2\sum _{i,j} u_iu_j \sum _{k,l} U_{ij,kl} V_{kl}\\&= 2\sum _{i,k,l} u_i^2 U_{ii,kl}- 2\sum _{i} u_i^2 \sum _{k,l} U_{ii,kl} V_{kl}\\&= \frac{1}{N}\left( 1 - \sum _i u_i ({\varvec{B}}{\varvec{V}}{\varvec{B}}^T)_{ii}\right) + 2\sum _i \frac{u_i^2}{\alpha _i+1}\left( \sum _k b_{ik}V_{kk} - ({\varvec{B}}{\varvec{V}}{\varvec{B}}^T)_{ii}\right) , \end{aligned}$$

which yields (65). $\square $

Deriving explicit expressions of $N_{eV}^\mathrm{{ appr}}$ and $F_{ST}^\mathrm{{ appr}}$ for the island model. Since $\varvec{\gamma }={\varvec{u}}$ for the island model, we can apply (62) and (63), with ${\varvec{u}}=\varvec{1}^T/s$, to deduce

$$\begin{aligned} N_{eV}^\mathrm{{ appr}} = \frac{1}{2\varvec{1}^T\varvec{\Sigma }\varvec{1}/s^2} \end{aligned}$$

(130)

and

$$\begin{aligned} F_{ST}^\mathrm{{ appr}} = \frac{1}{s}\text{ tr }({\varvec{V}}). \end{aligned}$$

(131)

We will start by giving a more explicit expression for ${\varvec{V}}$. It follows from (66) that ${\varvec{B}}{\varvec{q}}= (1-m){\varvec{q}}$ for any vector ${\varvec{q}}$ with $({\varvec{q}},\varvec{1})=0$. Hence $\lambda _2=\cdots =\lambda _s=1-m$. In this case it is particularly convenient to put $\lambda _1^0=1-m$ in the definition of ${\varvec{B}}^0$, since then, according to (106), ${\varvec{B}}^0=(1-m){\varvec{I}}$. The lower part of (33) can be written as ${\varvec{V}}={\varvec{B}}^0{\varvec{V}}({\varvec{B}}^0)^T + \tilde{\varvec{\Sigma }}$, where $\tilde{\varvec{\Sigma }} = ({\varvec{I}}-\varvec{1}\varvec{\gamma })\varvec{\Sigma }({\varvec{I}}-\varvec{1}\varvec{\gamma })^T$. We can repeatedly apply this equation to deduce that

$$\begin{aligned} {\varvec{V}}= \sum _{r=0}^{\infty } (1-m)^{2r}\tilde{\varvec{\Sigma }} = \frac{\tilde{\varvec{\Sigma }}}{1-(1-m)^2}, \end{aligned}$$

and hence (131) can be rewritten as

$$\begin{aligned} \left( 1-(1-m)^2\right) F_{ST}^\mathrm{{ appr}} = \frac{1}{s}\text{ tr }(\tilde{\varvec{\Sigma }}) = \frac{1}{s}\left( \text{ tr }(\varvec{\Sigma }) - \frac{1}{s}\varvec{1}^T\varvec{\Sigma }\varvec{1}\right) . \end{aligned}$$

(132)

Therefore, in view of (130) and (132), it remains to find $\varvec{\Sigma }$.
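The two linear-algebra facts used for the island model, namely the closed form of ${\varvec{V}}$ and the trace identity in (132), can be confirmed numerically. In the sketch below `Sigma` is an arbitrary symmetric matrix treated as given (in the paper $\varvec{\Sigma }$ itself depends on ${\varvec{V}}$, which is ignored here), and the values of `s` and `m` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
s, m = 4, 0.1                               # illustrative values only
A = rng.normal(size=(s, s))
Sigma = A @ A.T / s                         # arbitrary symmetric stand-in for Sigma
one = np.ones((s, 1))
gamma = np.full((1, s), 1.0 / s)            # gamma = u = 1^T / s for the island model

C = np.eye(s) - one @ gamma                 # centering matrix I - 1 gamma
Sigma_tilde = C @ Sigma @ C.T
B0 = (1.0 - m) * np.eye(s)

# Closed form obtained by iterating V = B0 V B0^T + Sigma_tilde.
V = Sigma_tilde / (1.0 - (1.0 - m) ** 2)

fixed_point_gap = np.max(np.abs(V - (B0 @ V @ B0.T + Sigma_tilde)))
trace_gap = abs(
    np.trace(Sigma_tilde) - (np.trace(Sigma) - (one.T @ Sigma @ one).item() / s)
)
print(fixed_point_gap, trace_gap)
```

Both gaps should be zero up to rounding error.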

For reproduction scenario 1, it can be deduced from (120) that (54) simplifies to

$$\begin{aligned} \Sigma _{ij}&= \left( \frac{1}{2N_e}-\frac{1}{2N/s}\right) \left( \frac{2m-m^2}{s} + (1-m)^2 1_{\{i=j\}}\right) + \frac{1_{\{i=j\}}}{2N/s}\\&-\left( \frac{1}{2N_e}\!-\!\frac{1}{2N/s}\right) \left( \frac{m^2}{s^2}\text{ tr }({\varvec{V}}) \!+\! \frac{V_{ii}\!+\!V_{jj}}{2}\left( 2\frac{m}{s}(1-m) \!+\! 1_{\{i=j\}}(1-m)^2\right) \right) \\&- \frac{1_{\{i=j\}}}{2N/s}\left( \frac{m}{s}\text{ tr }({\varvec{V}}) + (1-m)V_{ii}\right) \end{aligned}$$

for the island model, so that

$$\begin{aligned} \frac{2}{s^2}\varvec{1}^T\varvec{\Sigma }\varvec{1}= \frac{1}{sN_e}\left( 1-\frac{1}{s}\text{ tr }({\varvec{V}})\right) = \frac{1}{sN_e}(1-F_{ST}^\mathrm{{ appr}}) \end{aligned}$$

(133)

and

$$\begin{aligned} \frac{1}{s}\text{ tr }(\tilde{\varvec{\Sigma }}) = \frac{s-1}{s}\frac{1}{2\tilde{N}}(1-F_{ST}^\mathrm{{ appr}}). \end{aligned}$$

(134)

Combining (130) and (133) we arrive at (67), and inserting (134) into (132) and solving for $F_{ST}^\mathrm{{ appr}}$ we arrive at (68).

For reproduction scenario 3, a similar simplification of (60) leads to

$$\begin{aligned} \frac{2}{s^2}\varvec{1}^T\varvec{\Sigma }\varvec{1}&= \frac{1}{sN_e} - \left( \frac{1}{N_e}-\frac{1-(1-m)^2}{N/s}\right) \frac{1}{s^2}\text{ tr }({\varvec{V}}) + \frac{2\left( 1-(1-m)^2\right) }{\alpha +1}\frac{1}{s^2}\text{ tr }({\varvec{V}})\nonumber \\&= \frac{1}{sN_e} - \left( \frac{1}{N_e}-\frac{1-(1-m)^2}{N/s}\right) \frac{1}{s}F_{ST}^\mathrm{{ appr}} + \frac{2\left( 1-(1-m)^2\right) }{\alpha +1}\frac{1}{s}F_{ST}^\mathrm{{ appr}}. \nonumber \\ \end{aligned}$$

(135)

and

$$\begin{aligned} \frac{1}{s}\text{ tr }(\tilde{\varvec{\Sigma }})&= \frac{s-1}{s}\left( \frac{1}{2\tilde{N}}-\frac{(1-m)^2}{2N_e}\frac{1}{s}\text{ tr }({\varvec{V}}) + \frac{1-(1-m)^2}{\alpha +1} \frac{1}{s}\text{ tr }({\varvec{V}})\right) \nonumber \\&= \frac{s-1}{s}\left( \frac{1}{2\tilde{N}}-\frac{(1-m)^2}{2N_e}F_{ST}^\mathrm{{ appr}} + \frac{1-(1-m)^2}{\alpha +1} F_{ST}^\mathrm{{ appr}}\right) . \end{aligned}$$

(136)

Inserting (135) into (130) we arrive at (69), and plugging (136) into (132) and solving for $F_{ST}^\mathrm{{ appr}}$ we arrive at (70). $\square $

Appendix E: Proof of Theorem 2

In order to prove Theorem 2, we first need two lemmas, which we state for a single biallelic locus:

Lemma 1

In the one locus biallelic definitions (12) and (13) of $N_{eV,t}^{{\varvec{w}}}=Y/X$ and $F_{ST,t}^{{\varvec{w}}}=Z/Y$, the conditional expected values of the numerators and denominators equal

$$\begin{aligned} E_c(Y|P_t)&= E_c\left( P_t^{{\varvec{w}}}(1-P_t^{{\varvec{w}}})|P_t\right) \nonumber \\&= \left( 1 - ({\varvec{w}}-\varvec{\gamma }){\varvec{V}}_t({\varvec{w}}-\varvec{\gamma })^T + (1-2P_t)({\varvec{w}}-\varvec{\gamma })\varvec{\mu }_t\right) P_t(1-P_t)\nonumber \\&= (1-\text{ tr }({\varvec{C}}_Y{\varvec{V}}_t) + {\varvec{c}}_Y\varvec{\mu }_t)P_t(1-P_t), \end{aligned}$$

(137)

$$\begin{aligned} E_c(Z|P_t)&= E_c\left( \sum _{i=1}^s w_i (P_{ti}-P_t^{{\varvec{w}}})^2|P_t\right) \nonumber \\&= \left( \sum _{i=1}^s w_i {\varvec{V}}_{tii} - ({\varvec{w}}-\varvec{\gamma }){\varvec{V}}_t({\varvec{w}}-\varvec{\gamma })^T\right) P_t(1-P_t)\nonumber \\&= \text{ tr }({\varvec{C}}_Z{\varvec{V}}_t)P_t(1-P_t) \end{aligned}$$

(138)

and

$$\begin{aligned} E_c(X|P_t)&= 2E_c\left( E((P_{t+1}^{{\varvec{w}}}-P_t^{{\varvec{w}}})^ 2|P_t^{{\varvec{w}}})|P_t\right) \nonumber \\&= 2\left( {\varvec{w}}({\varvec{B}}-{\varvec{I}})({\varvec{V}}_t-\varvec{\varsigma }_t)({\varvec{B}}-{\varvec{I}})^T{\varvec{w}}^T + {\varvec{w}}(\varvec{\Sigma }_t-\varvec{\zeta }_t){\varvec{w}}^T\right) P_t(1-P_t)\nonumber \\&= \left( \text{ tr }\left( {\varvec{C}}_X({\varvec{V}}_t-\varvec{\varsigma }_t)\right) + \text{ tr }\left( {\varvec{C}}_X^\prime (\varvec{\Sigma }_t-\varvec{\zeta }_t)\right) \right) P_t(1-P_t) \end{aligned}$$

(139)

respectively, where ${\varvec{C}}_Y = ({\varvec{w}}-\varvec{\gamma })^T({\varvec{w}}-\varvec{\gamma })$, ${\varvec{c}}_Y = (1-2P_t)({\varvec{w}}-\varvec{\gamma })$, ${\varvec{C}}_Z = \text{ diag }({\varvec{w}}) - {\varvec{w}}^T{\varvec{w}}$, ${\varvec{C}}_X = 2({\varvec{B}}-{\varvec{I}})^T ({\varvec{w}}-\varvec{\gamma })^T({\varvec{w}}-\varvec{\gamma })({\varvec{B}}-{\varvec{I}})$, ${\varvec{C}}_X^\prime = 2{\varvec{w}}^T{\varvec{w}}$, $\varvec{\mu }_t$ and $\varvec{\zeta }_t$ are the remainder terms defined in (44) and (119), and $\varvec{\varsigma }_t$ another remainder term defined below, in (140).

Proof

We only prove the first parts of (137)–(139), and leave the second part to the reader. Starting with (137), we find that

$$\begin{aligned}&E_c\left( P_t^{{\varvec{w}}}(1-P_t^{{\varvec{w}}})|P_t\right) \\&\qquad = P_t(1-P_t) - E_c\left( (P_t^{{\varvec{w}}}-P_t)^2|P_t \right) + (1-2P_t)E_c\left( P_t^{{\varvec{w}}}-P_t|P_t \right) \\&\qquad = P_t(1-P_t) - E_c\left( (({\varvec{w}}-\varvec{\gamma }){\varvec{P}}_t^0)^2|P_t\right) + (1-2P_t)E_c\left( ({\varvec{w}}-\varvec{\gamma }){\varvec{P}}_t^0|P_t \right) \\&\qquad = P_t(1-P_t)\left( 1- ({\varvec{w}}-\varvec{\gamma }){\varvec{V}}_t({\varvec{w}}-\varvec{\gamma })^T + (1-2P_t)({\varvec{w}}-\varvec{\gamma })\varvec{\mu }_t\right) , \end{aligned}$$

where in the second equality we used $P_t^{{\varvec{w}}} - P_t = ({\varvec{w}}-\varvec{\gamma }){\varvec{P}}_t = ({\varvec{w}}-\varvec{\gamma }){\varvec{P}}_t^0$. For (139) we use (21) and $({\varvec{B}}-{\varvec{I}})\varvec{1}= {\varvec{0}}$ to deduce

$$\begin{aligned} P_{t+1}^{{\varvec{w}}}&= {\varvec{w}}{\varvec{P}}_{t+1}\\&= {\varvec{w}}{\varvec{B}}{\varvec{P}}_t + {\varvec{w}}\varvec{\varepsilon }_{t+1}\\&= P_t^{{\varvec{w}}} + {\varvec{w}}({\varvec{B}}-{\varvec{I}}){\varvec{P}}_t + {\varvec{w}}\varvec{\varepsilon }_{t+1}\\&= P_t^{{\varvec{w}}} + {\varvec{w}}({\varvec{B}}-{\varvec{I}}){\varvec{P}}_t^0 + {\varvec{w}}\varvec{\varepsilon }_{t+1}. \end{aligned}$$

We introduce the ascertainment bias term

$$\begin{aligned} \varvec{\varsigma }_t = \frac{E_c\left( E_c\left( {\varvec{P}}_t^0({\varvec{P}}_t^0)^ T|P_t^{{\varvec{w}}},P_t\right) |P_t\right) -E_c\left( E \left( {\varvec{P}}_t^0({\varvec{P}}_t^0)^T|P_t^{{\varvec{w}}},P_t\right) |P_t\right) }{P_t(1-P_t)}, \qquad \end{aligned}$$

(140)

which quantifies the effect of replacing the inner expectation $E$ of ${\varvec{P}}_t^0({\varvec{P}}_t^0)^T$ by $E_c$. Then we can write

$$\begin{aligned}&E_c\left( E\left( (P_{t+1}^{{\varvec{w}}}-P_t^{{\varvec{w}}})^2|P_t^{{\varvec{w}}} \right) |P_t\right) \\&\quad = E_c\left( E\left( (P_{t+1}^{{\varvec{w}}}-P_t^{ {\varvec{w}}})^2|P_t^{{\varvec{w}}},P_t\right) |P_t\right) \\&\quad = {\varvec{w}}({\varvec{B}}-{\varvec{I}})E_c\left( E\left( {\varvec{P}}_t^0({\varvec{P}}_t^0)^T|P_t^{{\varvec{w}}},P_t \right) |P_t\right) ({\varvec{B}}-{\varvec{I}})^T{\varvec{w}}^T\\&\quad \quad + {\varvec{w}}E_c\left( E\left( \varvec{\varepsilon }_{t+1}\varvec{\varepsilon }_{t+1}^T|P_t^{{\varvec{w}}},P_t\right) |P_t\right) {\varvec{w}}^T\\&\quad = {\varvec{w}}({\varvec{B}}-{\varvec{I}})E_c\left( {\varvec{P}}_t^0({\varvec{P}}_t^0)^T|P_t\right) ({\varvec{B}}-{\varvec{I}})^T{\varvec{w}}^T - {\varvec{w}}({\varvec{B}}-{\varvec{I}})\varvec{\varsigma }_t({\varvec{B}}-{\varvec{I}})^T{\varvec{w}}^TP_t(1-P_t)\\&\quad \quad + {\varvec{w}}E_c\left( \varvec{\varepsilon }_{t+1}\varvec{\varepsilon }_{t+1}^T|P_t\right) {\varvec{w}}^T - {\varvec{w}}\varvec{\zeta }_t{\varvec{w}}^TP_t(1-P_t)\\&\quad = P_t(1-P_t) \left( {\varvec{w}}({\varvec{B}}-{\varvec{I}})({\varvec{V}}_t-\varvec{\varsigma }_t)({\varvec{B}}-{\varvec{I}})^T{\varvec{w}}^T + {\varvec{w}}(\varvec{\Sigma }_t-\varvec{\zeta }_t){\varvec{w}}^T\right) . \end{aligned}$$

In order to verify (138), we first write

$$\begin{aligned} {\varvec{P}}_t - P_t^{{\varvec{w}}}\varvec{1}= ({\varvec{I}}-\varvec{1}{\varvec{w}}){\varvec{P}}_t = ({\varvec{I}}-\varvec{1}{\varvec{w}}){\varvec{P}}_t^0, \end{aligned}$$

which leads to

$$\begin{aligned} E_c\left( (P_{ti}-P_t^{{\varvec{w}}})^2|P_t\right) = P_t(1-P_t)\left( ({\varvec{I}}-\varvec{1}{\varvec{w}}){\varvec{V}}_t ({\varvec{I}}-\varvec{1}{\varvec{w}})^T\right) _{ii}, \end{aligned}$$

and then (138) follows since

$$\begin{aligned} \sum _{i=1}^s w_i \left( ({\varvec{I}}-\varvec{1}{\varvec{w}}){\varvec{V}}_t ({\varvec{I}}-\varvec{1}{\varvec{w}})^T\right) _{ii} = \sum _{i=1}^s w_i {\varvec{V}}_{tii} - ({\varvec{w}}-\varvec{\gamma }){\varvec{V}}_t({\varvec{w}}-\varvec{\gamma })^T. \end{aligned}$$

$\square $
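The matrix ${\varvec{C}}_Z=\text{ diag }({\varvec{w}})-{\varvec{w}}^T{\varvec{w}}$ arises from the weighted sum of squared deviations in (138). A short numerical sketch, with randomly generated weights `w` and frequencies `P` as illustrative inputs, confirms the underlying identities.

```python
import numpy as np

rng = np.random.default_rng(5)
s = 4
w = rng.dirichlet(np.ones(s)).reshape(1, s)   # row vector of weights summing to one
P = rng.uniform(size=(s, 1))                  # column vector of subpopulation frequencies
one = np.ones((s, 1))

C_Z = np.diag(w.ravel()) - w.T @ w
D = np.eye(s) - one @ w                       # the matrix I - 1 w

# (I - 1w)^T diag(w) (I - 1w) should reduce to diag(w) - w^T w.
matrix_gap = np.max(np.abs(D.T @ np.diag(w.ravel()) @ D - C_Z))

# sum_i w_i (P_i - P^w)^2 should equal tr(C_Z P P^T).
P_w = (w @ P).item()
weighted_var = np.sum(w.ravel() * (P.ravel() - P_w) ** 2)
trace_form = np.trace(C_Z @ P @ P.T)
print(matrix_gap, weighted_var, trace_form)
```

Since ${\varvec{C}}_Z\varvec{1}={\varvec{0}}$, the same trace is obtained when ${\varvec{P}}_t$ is replaced by ${\varvec{P}}_t^0$.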

Lemma 2

Let ${\varvec{c}}$ be a $1\times s$ vector, ${\varvec{C}}$ an $s\times s$ matrix, and define

$$\begin{aligned} \epsilon = {\varvec{c}}({\varvec{P}}_t^0-\varvec{\mu }_tP_t(1-P_t)) + \text{ tr }\left( {\varvec{C}}({\varvec{P}}_t^0({\varvec{P}}_t^0)^T - {\varvec{V}}_t P_t(1-P_t))\right) . \end{aligned}$$

Then

$$\begin{aligned} E_c(\epsilon ^2|P_t) \le 2|{\varvec{c}}|_1^2 |{\varvec{V}}_t|_\infty P_t(1-P_t) + 2|{\varvec{C}}|_1^2 \kappa _t, \end{aligned}$$

(141)

with

$$\begin{aligned} \kappa _t = \max _{1\le i \le s} E_c((P_{ti}^0)^4|P_t). \end{aligned}$$

Proof

Put ${\varvec{c}}=(c_1,\ldots ,c_s)$ and ${\varvec{C}}=(C_{ij})_{i,j=1}^s$. For simplicity, we omit conditioning on $P_t$ in the notation, writing $E_c(\cdot ) = E_c(\cdot |P_t)$. Then

$$\begin{aligned} E_c(\epsilon ^2)&\le 2E_c\left( \left( \sum _i c_i (P_{ti}^0-\mu _{ti}P_t(1-P_t))\right) ^2\right) + 2E_c\left( \left( \sum _{ij} C_{ji}P_{ti}^0P_{tj}^0\right) ^2\right) \\&\le 2E_c \left( \left( \sum _i c_i P_{ti}^0\right) ^2\right) + 2E_c\left( \left( \sum _{ij} C_{ji}P_{ti}^0P_{tj}^0\right) ^2\right) \\&\le 2\sum _{i,j} |c_i||c_j| E_c(|P_{ti}^0P_{tj}^0|) + 2\sum _{ijkl} |C_{ji}||C_{lk}| E_c\left( \left| P_{ti}^0 P_{tj}^0 P_{tk}^0 P_{tl}^0\right| \right) \\&\le \sum _{i,j} |c_i||c_j| \left( E_c((P_{ti}^0)^2) + E_c((P_{tj}^0)^2)\right) \\&\quad +\, 0.5\sum _{ijkl} |C_{ji}||C_{lk}|\left( E_c((P_{ti}^0)^4) + E_c((P_{tj}^0)^4) + E_c((P_{tk}^0)^4) + E_c((P_{tl}^0)^4)\right) \\&\le 2\sum _{i,j} |c_i||c_j| |{\varvec{V}}_t|_\infty P_t(1-P_t) + 2\sum _{ijkl} |C_{ji}||C_{lk}|\kappa _t, \end{aligned}$$

using the Cauchy Schwarz Inequality in the fourth step. The last line is identical to the right hand side of (141). $\square $
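The bound (141) holds for any distribution of ${\varvec{P}}_t^0$ with finite fourth moments, so it can be illustrated with an empirical distribution. In the sketch below the rows of `X` are hypothetical draws of ${\varvec{P}}_t^0$, and the empirical first and second moments play the roles of $\varvec{\mu }_tP_t(1-P_t)$ and ${\varvec{V}}_tP_t(1-P_t)$. We assume that $|\cdot |_1$ denotes the sum of absolute entries and $|\cdot |_\infty $ the largest absolute entry.

```python
import numpy as np

rng = np.random.default_rng(6)
s, n_draws = 3, 2000
X = rng.uniform(-0.2, 0.2, size=(n_draws, s))   # hypothetical draws of P_t^0
c = rng.normal(size=s)                          # arbitrary 1 x s vector
C = rng.normal(size=(s, s))                     # arbitrary s x s matrix

mean = X.mean(axis=0)                           # role of mu_t * P_t(1 - P_t)
second = X.T @ X / n_draws                      # role of V_t * P_t(1 - P_t)
kappa = np.max((X ** 4).mean(axis=0))           # largest fourth moment

# tr(C P0 P0^T) equals the quadratic form P0^T C P0 for each draw.
quad = np.einsum("ni,ij,nj->n", X, C, X)
eps = (X - mean) @ c + quad - np.trace(C @ second)

lhs = np.mean(eps ** 2)
rhs = (2 * np.sum(np.abs(c)) ** 2 * np.max(np.abs(second))
       + 2 * np.sum(np.abs(C)) ** 2 * kappa)
print(lhs, rhs)
```

In such experiments the right hand side is typically much larger than the left, which reflects that (141) is a crude but convenient bound.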

Proof of Theorem 2

When all loci are biallelic ($n(x)\equiv 2$), formulas (75) and (78) simplify to

$$\begin{aligned} G_{ST,t}^{{\varvec{w}}}&= Z/Y,\\ N_{eV,t}^{{\varvec{w}}}&= Y/X, \end{aligned}$$

respectively, where

$$\begin{aligned} X&= 2 \sum _{x=1}^n E\left( \left( P_{t+1}^{{\varvec{w}}}(x) - P_t^{{\varvec{w}}}(x)\right) ^2 | P_t^{{\varvec{w}}}(x)\right) ,\\ Y&= \sum _{x=1}^n P_t^{{\varvec{w}}}(x)(1-P_t^{{\varvec{w}}}(x)),\\ Z&= \sum _{x=1}^n \text{ tr }\left( {\varvec{C}}_Z {\varvec{P}}_t^0(x){\varvec{P}}_t^0(x)^T\right) \end{aligned}$$

are multilocus extensions of the corresponding numerators and denominators $X$, $Y$, $Z$ of Lemma 1, where also ${\varvec{C}}_Z$ is defined. We assume that $P_{ti}(x)$ is the value of the allele frequency $P_{ti}(x,a)$ for one (arbitrarily chosen) of the two alleles $a=1,2$ at locus $x$ in subpopulation $i=1,\ldots ,s$, and we put $P_t(x)=\sum _{i=1}^s \gamma _i P_{ti}(x)$, $P_{ti}^0(x) = P_{ti}(x)-P_t(x)$ and ${\varvec{P}}_t^0(x) = (P_{ti}^0(x);i=1,\ldots ,s)^T$.

It will be convenient to condition on the allele frequency spectrum ${\mathcal {P}}_t = \{P_t(x); \, x=1,\ldots ,n\}$, writing

$$\begin{aligned} \begin{aligned}&G_{ST,t}^{{\varvec{w}}} = (\bar{Z}+ \epsilon _Z)/(\bar{Y}+ \epsilon _Y),\\&N_{eV,t}^{{\varvec{w}}} = (\bar{Y}+ \epsilon _Y)/(\bar{X}+ \epsilon _X), \end{aligned} \end{aligned}$$

(142)

where

$$\begin{aligned} \bar{X}&= E_c(X|{\mathcal {P}}_t)\\&= \sum _{x} P_t(x)(1-P_t(x))\left( \text{ tr }\left( {\varvec{C}}_X({\varvec{V}}_t(x)-\varvec{\varsigma }_t(x))\right) + \text{ tr }\left( {\varvec{C}}_X^\prime (\varvec{\Sigma }_t(x)-\varvec{\zeta }_t(x))\right) \right) ,\\ \bar{Y}&= E_c(Y|{\mathcal {P}}_t) = \sum _{x} P_t(x)(1-P_t(x))\left( 1- \text{ tr }\left( {\varvec{C}}_Y{\varvec{V}}_t(x)\right) +{\varvec{c}}_Y(x)\varvec{\mu }_t(x)\right) ,\\ \bar{Z}&= E_c(Z|{\mathcal {P}}_t) = \sum _{x} P_t(x)(1-P_t(x))\text{ tr }({\varvec{C}}_Z{\varvec{V}}_t(x)), \end{aligned}$$

can be deduced from Lemma 1, using the same definitions of ${\varvec{C}}_X$, ${\varvec{C}}_X^\prime $ and ${\varvec{C}}_Y$ as there. Moreover, ${\varvec{V}}_t(x)$, $\varvec{\Sigma }_t(x)$, ${\varvec{c}}_Y(x)=(1-2P_t(x))({\varvec{w}}-\varvec{\gamma })$, $\varvec{\mu }_t(x)$, $\varvec{\zeta }_t(x)$ and $\varvec{\varsigma }_t(x)$ are the values of ${\varvec{V}}_t$, $\varvec{\Sigma }_t$, ${\varvec{c}}_Y$, $\varvec{\mu }_t$, $\varvec{\zeta }_t$ and $\varvec{\varsigma }_t$ at locus $x$. The remaining three quantities of (142) are the residual terms

$$\begin{aligned} \epsilon _X&= 2 \sum _{x}E\left( \left( P_{t+1}^{{\varvec{w}}}(x)- P_t^{{\varvec{w}}}(x)\right) ^2 | P_t^{{\varvec{w}}}(x)\right) \nonumber \\&\quad - 2 \sum _{x}E_c\left( E\left( \left( P_{t+1}^{{\varvec{w}}} (x)-P_t^{{\varvec{w}}}(x)\right) ^2|P_t^{{\varvec{w}}}(x)\right) |P_t(x)\right) ,\nonumber \\ \epsilon _Y&= \sum _x {\varvec{c}}_Y(x)({\varvec{P}}_t^0(x)-\varvec{\mu }_t(x) P_t(x)(1-P_t(x)))\\&\quad - \sum _{x} \text{ tr }\left( {\varvec{C}}_Y({\varvec{P}}_t^0(x)({\varvec{P}}_t^0(x))^T - {\varvec{V}}_t(x) P_t(x)(1-P_t(x)))\right) ,\nonumber \\ \epsilon _Z&= \sum _{x} \text{ tr }\left( {\varvec{C}}_Z({\varvec{P}}_t^0(x)({\varvec{P}}_t^0(x))^T - {\varvec{V}}_t(x) P_t(x)(1-P_t(x)))\right) .\nonumber \end{aligned}$$

(143)

It follows from the definitions of $G_{ST}^\mathrm{{ appr},{\varvec{w}}}$ and $N_{eV}^\mathrm{{ appr},{\varvec{w}}}$ in (51) and (50) that we can write

$$\begin{aligned} \begin{aligned} G_{ST}^\mathrm{{ appr},{\varvec{w}}}&= Z^\mathrm{{ appr}}/Y^\mathrm{{ appr}},\\ N_{eV}^\mathrm{{ appr},{\varvec{w}}}&= Y^\mathrm{{ appr}}/X^\mathrm{{ appr}} \end{aligned} \end{aligned}$$

(144)

with

$$\begin{aligned} X^\mathrm{{ appr}}&= \sum _{x} P_t(x)(1-P_t(x))\left( \text{ tr }\left( {\varvec{C}}_X{\varvec{V}}\right) + \text{ tr }\left( {\varvec{C}}_X^\prime \varvec{\Sigma }\right) \right) ,\\ Y^\mathrm{{ appr}}&= \sum _{x} P_t(x)(1-P_t(x))\left( 1- \text{ tr }\left( {\varvec{C}}_Y{\varvec{V}}\right) \right) ,\\ Z^\mathrm{{ appr}}&= \sum _{x}P_t(x)(1-P_t(x))\text{ tr }({\varvec{C}}_Z{\varvec{V}}). \end{aligned}$$

Taking the difference of (142) and (144), we find that

$$\begin{aligned} G_{ST,t}^{{\varvec{w}}} - G_{ST}^\mathrm{{ appr},{\varvec{w}}}&= \frac{\bar{Z}}{\bar{Y}} - \frac{Z^\mathrm{{ appr}}}{Y^\mathrm{{ appr}}} + \frac{\bar{Z}+\epsilon _Z}{\bar{Y}+\epsilon _Y} - \frac{\bar{Z}}{\bar{Y}}\nonumber \\&\approx \frac{\bar{Z}}{\bar{Y}} - \frac{Z^\mathrm{{ appr}}}{Y^\mathrm{{ appr}}} + \frac{1}{\bar{Y}}\epsilon _Z - \frac{\bar{Z}}{\bar{Y}^2}\epsilon _Y - \frac{1}{\bar{Y}^2}\epsilon _Y\epsilon _Z + \frac{\bar{Z}}{\bar{Y}^3}\epsilon _Y^2, \qquad \qquad \end{aligned}$$

(145)

where in the last step we made a second order Taylor expansion. The first term on the right hand side of (145) can be further approximated as

$$\begin{aligned} \frac{\bar{Z}}{\bar{Y}} - \frac{Z^\mathrm{{ appr}}}{Y^\mathrm{{ appr}}}&= \frac{1}{\bar{Y}}(\bar{Z}- Z^\mathrm{{ appr}}) - \frac{Z^\mathrm{{ appr}}}{\bar{Y}Y^\mathrm{{ appr}}} (\bar{Y}- Y^\mathrm{{ appr}})\nonumber \\&\approx \frac{1}{Y^\mathrm{{ appr}}}(\bar{Z}- Z^\mathrm{{ appr}}) - \frac{Z^\mathrm{{ appr}}}{(Y^\mathrm{{ appr}})^2} (\bar{Y}- Y^\mathrm{{ appr}})\nonumber \\&\approx \frac{2}{H_{T}^\mathrm{{ eq}}\left( 1- \text{ tr }\left( {\varvec{C}}_Y{\varvec{V}}\right) \right) }\cdot \frac{1}{n}(\bar{Z}-Z^\mathrm{{ appr}})\\&\quad - \frac{2 \text{ tr }({\varvec{C}}_Z{\varvec{V}})}{H_{T}^\mathrm{{ eq}} \left( 1- \text{ tr }\left( {\varvec{C}}_Y{\varvec{V}}\right) \right) ^2}\cdot \frac{1}{n} (\bar{Y}- Y^\mathrm{{ appr}})\nonumber \\&=: \frac{C_1}{n}(\bar{Z}- Z^\mathrm{{ appr}}) - \frac{C_2}{n} (\bar{Y}- Y^\mathrm{{ appr}}),\nonumber \end{aligned}$$

(146)

where in the second step we replaced $\bar{Y}$ and $\bar{Z}$ by $Y^\mathrm{{ appr}}$ and $Z^\mathrm{{ appr}}$, in the third step we approximated the gene diversity

$$\begin{aligned} H_{Tt} = H_{Tt}^{{\varvec{\varvec{\gamma }}}} = \frac{2}{n}\sum _{x} P_t(x)(1-P_t(x)) \approx H_T^\mathrm{{ eq}}, \end{aligned}$$

in (76) by its quasi equilibrium limit (20), which is accurate, by a Law of Large Numbers argument, for large $n$. In the last step of (146) we introduced the constants $C_1$ and $C_2$ in order to simplify notation.
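The second order expansion of the ratio $(\bar{Z}+\epsilon _Z)/(\bar{Y}+\epsilon _Y)$ used in (145) can be checked numerically. The following is a minimal sketch, not part of the original proof; the function names and the numerical values of $\bar{Z}$, $\bar{Y}$ and the perturbations are illustrative assumptions. Since the neglected terms are of third order, halving the perturbations should reduce the error by a factor of about eight.

```python
def ratio_exact(z_bar, y_bar, eps_z, eps_y):
    """Exact value of (z_bar + eps_z) / (y_bar + eps_y)."""
    return (z_bar + eps_z) / (y_bar + eps_y)


def ratio_taylor2(z_bar, y_bar, eps_z, eps_y):
    """Second order Taylor expansion around (eps_z, eps_y) = (0, 0), as in (145)."""
    return (
        z_bar / y_bar
        + eps_z / y_bar
        - z_bar * eps_y / y_bar**2
        - eps_y * eps_z / y_bar**2
        + z_bar * eps_y**2 / y_bar**3
    )


def taylor_error(scale, z_bar=0.2, y_bar=1.0):
    """Absolute error of the expansion for perturbations proportional to `scale`.

    The values z_bar = 0.2 and y_bar = 1.0 are hypothetical, chosen only for illustration.
    """
    eps_z, eps_y = 0.5 * scale, 1.0 * scale
    exact = ratio_exact(z_bar, y_bar, eps_z, eps_y)
    approx = ratio_taylor2(z_bar, y_bar, eps_z, eps_y)
    return abs(exact - approx)


if __name__ == "__main__":
    for scale in (0.04, 0.02, 0.01):
        print(scale, taylor_error(scale))
```

The printed errors shrink roughly cubically in the perturbation size, which is the behaviour expected of the remainder in (145).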

By the definition of $\epsilon _Y$ and $\epsilon _Z$ we have $E_c(\epsilon _Y|{\mathcal {P}}_t) = E_c(\epsilon _Z|{\mathcal {P}}_t)=0$, and, since $\bar{Y}$ and $\bar{Z}$ are both functions of ${\mathcal {P}}_t$, it follows that the two terms of (145) that are linear in $\epsilon _Z$ and $\epsilon _Y$ have zero mean. Since all loci are in linkage equilibrium, the terms on the right hand sides of all three equations in (143) are independent for different $x$. By Lemma 2 it then follows, after some computations, that

$$\begin{aligned} |E_c \left( - \frac{1}{\bar{Y}^2}\epsilon _Y\epsilon _Z + \frac{\bar{Z}}{\bar{Y}^3}\epsilon _Y^2\right) | \le \frac{C_3^\prime }{n} \end{aligned}$$

(147)

for some constant $C_3^\prime $, independently of $n$. Combining (145), (146) and (147), using $P_t(x)(1-P_t(x))\le 1/4$, $|\text{ tr }({\varvec{C}}_Y({\varvec{V}}_t(x)-{\varvec{V}}))|\le |{\varvec{C}}_Y|_1 |{\varvec{V}}_t(x)-{\varvec{V}}|_\infty $ and analogous estimates for all $x=1,\ldots ,n$, we find that

$$\begin{aligned} |E_c(G_{ST,t}^{{\varvec{w}}}) - G_{ST}^\mathrm{{ appr},{\varvec{w}}}|&\le \frac{C_1}{n}E_c|\bar{Z}- Z^\mathrm{{ appr}}| + \frac{C_2}{n} E_c|\bar{Y}- Y^\mathrm{{ appr}}|+ \frac{C_3^\prime }{n}\\&\le \frac{C_1|{\varvec{C}}_Z|_1+C_2|{\varvec{C}}_Y|_1}{4 n} \sum _x E_c(|{\varvec{V}}_t(x)-{\varvec{V}}|_\infty ) \\&+ \frac{C_2|{\varvec{{\varvec{w}}}}-{\varvec{\gamma }}|_1}{4n} \sum _x E_c(|\varvec{\mu }_t(x)|_\infty ) + \frac{C_3^\prime }{n}. \end{aligned}$$

Then we use $|{\varvec{C}}_Z|_1\le 2$ and $|{\varvec{C}}_Y|_1\le |{\varvec{w}}-\varvec{\gamma }|_1^2$ and let $t\rightarrow \infty $, in order to deduce that

$$\begin{aligned} \lim _{t\rightarrow \infty } |E_c(G_{ST,t}^{{\varvec{w}}}) \!-\! G_{ST}^\mathrm{{ appr},{\varvec{w}}}|&\le \frac{2C_1+C_2|{\varvec{{\varvec{w}}}}-{\varvec{\gamma }}|_1^2}{4} |\Delta {\varvec{V}}|^\mathrm{{ eq}} \!+\! \frac{C_2|{\varvec{{\varvec{w}}}}-{\varvec{\gamma }}|_1}{4} |\varvec{\mu }|^\mathrm{{ eq}} \!+\! \frac{C_3^\prime }{n}\\&=: C_1^\prime |\Delta {\varvec{V}}|^\mathrm{{ eq}} + C_2^\prime |\varvec{\mu }|^\mathrm{{ eq}} + \frac{C_3^\prime }{n}, \end{aligned}$$

since, for instance, the limit $\lim _{t\rightarrow \infty } E_c(|{\varvec{V}}_t(x)-{\varvec{V}}|_\infty ) = |\Delta {\varvec{V}}|^\mathrm{{ eq}}$ in (48) exists for all $x$. A similar analysis shows that

$$\begin{aligned}&\lim _{t\rightarrow \infty } |E_c(N_{eV,t}^{{\varvec{w}}}) - N_{eV}^\mathrm{{ appr},{\varvec{w}}}| \le \frac{C_3}{n}\lim _{t\rightarrow \infty } E_c|\bar{Y}- Y^\mathrm{{ appr}}|\\&\quad + \frac{C_4}{n} \lim _{t\rightarrow \infty } E_c|\bar{X}- X^\mathrm{{ appr}}| + \frac{C_9^\prime }{n}\\&\qquad \le \frac{C_3|{\varvec{{\varvec{C}}}}_Y|_1 + C_4|{\varvec{{\varvec{C}}}}_X|_1}{4}|\Delta {\varvec{V}}|^\mathrm{{ eq}} + \frac{C_4|{\varvec{{\varvec{C}}}}_X^\prime |_1}{4} |\Delta \varvec{\Sigma }|^\mathrm{{ eq}} + \frac{C_3|{\varvec{{\varvec{w}}}}-{\varvec{\gamma }}|_1}{4}|\varvec{\mu }|^\mathrm{{ eq}}\\&\quad \quad \quad + \frac{C_4|{\varvec{{\varvec{C}}}}_X|_1}{4}|\varvec{\varsigma }|^\mathrm{{ eq}} + \frac{C_4|{\varvec{{\varvec{C}}}}_X^\prime |_1}{4}|\varvec{\zeta }|^\mathrm{{ eq}} + \frac{C_9^\prime }{n}\\&\quad \quad \le \frac{C_3|{\varvec{{\varvec{w}}}}-{\varvec{\gamma }}|_1^2 + 4C_4|{\varvec{{\varvec{w}}}}-{\varvec{\gamma }}|_1^2}{4}|\Delta {\varvec{V}}|^\mathrm{{ eq}} + \frac{2C_4}{4} |\Delta \varvec{\Sigma }|^\mathrm{{ eq}} + \frac{C_3|{\varvec{{\varvec{w}}}}-{\varvec{\gamma }}|_1}{4}|\varvec{\mu }|^\mathrm{{ eq}}\\&\quad \quad \quad + \frac{4C_4|{\varvec{{\varvec{w}}}}-{\varvec{\gamma }}|_1^2}{4}|\varvec{\varsigma }|^\mathrm{{ eq}} + \frac{2C_4}{4}|\varvec{\zeta }|^\mathrm{{ eq}} + \frac{C_9^\prime }{n}\\&\quad \quad =: C_4^\prime |\Delta {\varvec{V}}|^\mathrm{{ eq}} + C_5^\prime |\Delta \varvec{\Sigma }|^\mathrm{{ eq}} + C_6^\prime |\varvec{\mu }|^\mathrm{{ eq}} + C_7^\prime |\varvec{\varsigma }|^\mathrm{{ eq}} + C_8^\prime |\varvec{\zeta }|^\mathrm{{ eq}} + \frac{C_9^\prime }{n}, \end{aligned}$$

where

$$\begin{aligned} |\varvec{\varsigma }|^\mathrm{{ eq}} = \lim _{t\rightarrow \infty } E_c(|\varvec{\varsigma }_t|_\infty ) \end{aligned}$$

(148)

is an asymptotic upper bound for the remainder terms $\varvec{\varsigma }_t(x)$, defined in the same way as (45)–(47), and

$$\begin{aligned} C_3&= 2/\left( H_T^\mathrm{{ eq}}(\text{ tr }({\varvec{C}}_X{\varvec{V}})+ \text{ tr }({\varvec{C}}_X^\prime \varvec{\Sigma }))\right) ,\\ C_4&= 2\left( 1- \text{ tr }({\varvec{C}}_Y{\varvec{V}})\right) /\left( H_T^\mathrm{{ eq}}(\text{ tr }({\varvec{C}}_X{\varvec{V}})+ \text{ tr }({\varvec{C}}_X^\prime \varvec{\Sigma }))^2\right) . \end{aligned}$$

$\square $

Appendix F: Details from Sect. 11

Proof of Proposition 4

Let $Q_{ij,kl}$ denote the probability that two different genes from subpopulations $i$ and $j$ have their parents in subpopulations $k$ and $l$ respectively, and let $p_{ijk}$ be the coalescence probability defined in (84).

We can compute $q_{t+1,ij}$ by conditioning on the parental subpopulations $k$ and $l$ one generation back in time, and then looking at the ancestry of the parents $t$ generations back in time. Since coalescence can only occur when $k=l$, we find that

$$\begin{aligned} q_{t+1,ij} = \sum _{k,l} Q_{ij,kl}(1-p_{ijk})^{\{k=l\}}q_{t,kl}. \end{aligned}$$

This equals the recursion in (82), with

$$\begin{aligned} D_{ij,kl}=Q_{ij,kl}(1-p_{ijk})^{\{k=l\}}. \end{aligned}$$

(149)

On the other hand, we can rewrite the gene diversity recursion (26) as

$$\begin{aligned} E(H_{t+1,ij}|{\varvec{P}}_t) = \left( 1-\frac{1}{2Nu_i}\right) ^{\{i=j\}} \sum _{k,l} Q_{ij,kl}(1-p_{ijk})^{\{k=l\}} \frac{H_{tkl}}{ \left( 1-\frac{1}{2Nu_k}\right) ^{\{k=l\}}}, \end{aligned}$$

since $(1-1/(2Nu_i))^{\{i=j\}}$ is the probability that two genes, drawn with replacement from subpopulations $i$ and $j$ in generation $t+1$ are different, and $H_{tkl}/(1-1/(2Nu_k))^{\{k=l\}}$ is the probability that two different genes from subpopulations $k$ and $l$ in generation $t$ have different alleles. Hence we see from (26) that

$$\begin{aligned} A_{ij,kl} = \frac{\left( 1-\frac{1}{2Nu_i}\right) ^{\{i=j\}}}{\left( 1-\frac{1}{2Nu_k}\right) ^{\{k=l\}}} Q_{ij,kl}(1-p_{ijk})^{\{k=l\}}, \end{aligned}$$

from which (83) follows. $\square $

We will derive explicit expressions for the matrix elements $D_{ij,kl}$ in Proposition 4. To this end, one could calculate the coefficients $U_{ij,kl}$ of the covariance matrix expansion (25), and then use Propositions 1 and 4 in order to find $D_{ij,kl}$. Alternatively, one may employ coalescence probabilities and obtain the elements of ${\varvec{D}}$ directly from (82). We use the latter approach in order to prove the following:

Proposition 7

Asymptotically, for large populations and reproduction scenario 2, the elements of ${\varvec{D}}$ have the form

$$\begin{aligned} D_{ij,kl} = b_{ik}\left( \frac{b_{il}-\frac{1_{\{k=l\}}}{2Nu_i}}{1-\frac{1}{2Nu_i}}\right) ^{\{i=j\}} b_{jl}^{\{i\ne j\}} \left( 1-p_{ijk}\right) ^{\{k=l\}} + o(N^{-1}), \end{aligned}$$

(150)

where $p_{ijk}$ is the coalescence probability (84) that two genes from subpopulations $i$ and $j$, that have their parents in $k$, have the same parent, and

$$\begin{aligned} \sigma _{ijk}(N) = \frac{1}{m_{ki}m_{kj}} \cdot \left\{ \begin{array}{l@{\quad }l} E\left( \nu _{ki}^l (\nu _{ki}^l-1)\right) , &{} i=j,\\ E\left( \nu _{ki}^l\nu _{kj}^l\right) , &{} i\ne j. \end{array}\right. \end{aligned}$$

(151)

For reproduction scenario 3 with $\alpha _i\equiv \infty $, it holds that

$$\begin{aligned} D_{ij,kl} = b_{ik}b_{jl} \left( 1-p_{ijk}\right) ^{\{k=l\}} + o(N^{-1}), \end{aligned}$$

(152)

with coalescence probability $p_{ijk}=1/(2N_{ek})$, so that $\sigma _{ijk}(N)$ in (84) equals

$$\begin{aligned} \sigma _{ijk}(N) = \frac{Nu_k}{N_{ek}}. \end{aligned}$$

(153)

Nagylaki (2000) has derived a recursion that generalizes (152) when $N_{ek}=Nu_k$ for probabilities that concern not only the time when but also the subpopulation where coalescence of two genes from subpopulations $i$ and $j$ occurs. The constant $\sigma _{ijk}(N)$ was defined in Hössjer (2011). As mentioned in Sect. 11.1, it can be interpreted as the coalescence rate of a pair of lines from subpopulations $i$ and $j$, when both of these migrate backwards to $k$.

Proof of Proposition 7

In order to establish (150) and (152), we will use (149), and hence we need to find expressions for $Q_{ij,kl}$ and $p_{ijk}$. Starting with reproduction scenario 2, we have

$$\begin{aligned} Q_{ij,kl} = \left\{ \begin{array}{l@{\quad }l} b_{ik}b_{jl}, &{} i\ne j,\\ 2Nu_ib_{ik}(2Nu_ib_{ik}-1)/(2Nu_i(2Nu_i-1)), &{} i=j,k=l,\\ 2Nu_ib_{ik}\cdot 2Nu_ib_{il}/(2Nu_i(2Nu_i-1)), &{} i=j,k\ne l, \end{array}\right. \qquad \quad \end{aligned}$$

(154)

since the two genes are drawn without replacement, and an exact fraction $b_{ik}$ of the parents of the offspring genes of subpopulation $i$ originate from subpopulation $k$, and similarly, an exact fraction $b_{jl}$ of the genes in $j$ have their parent in $l$. We can rewrite (154) more compactly as

$$\begin{aligned} Q_{ij,kl} = b_{ik}\left( \frac{b_{il}-\frac{1_{\{k=l\}}}{2Nu_i}}{1-\frac{1}{2Nu_i}}\right) ^{\{i=j\}} b_{jl}^{\{i\ne j\}}. \end{aligned}$$

It follows for instance from Hössjer (2011) that the coalescence probability $p_{ijk}$ has the form (84), and this completes the proof of (150).

For reproduction scenario 3 with $\alpha _i\equiv \infty $, we simply have

$$\begin{aligned} Q_{ij,kl} = b_{ik}b_{jl}, \end{aligned}$$

since the parental subpopulations are drawn independently for two genes of subpopulations $i$ and $j$, from the probability distributions corresponding to rows $i$ and $j$ of ${\varvec{B}}$. Moreover, the coalescence probability is $1/(2N_{ek})$, since this is the probability that the two parents in $k$ originate from the same gene of a breeder, and this completes the proof of (152). $\square $
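The agreement between the three cases of (154) and the compact formula for $Q_{ij,kl}$ under reproduction scenario 2 can be checked numerically. The sketch below is an illustration only; the backward migration matrix, the relative sizes $u_i$ and the total size $N$ are hypothetical values, chosen so that $2Nu_ib_{ik}$ is an integer, in line with the exact-fraction assumption of scenario 2. It also checks that $\sum _{k,l} Q_{ij,kl}=1$ for each pair $(i,j)$.

```python
B = [[0.9, 0.1], [0.2, 0.8]]   # illustrative backward migration matrix (assumption)
U = [0.5, 0.5]                 # illustrative relative subpopulation sizes (assumption)
N_TOTAL = 50                   # illustrative N, so that 2*N*u_i = 50 genes per subpopulation


def q_piecewise(i, j, k, l, b=B, u=U, n_total=N_TOTAL):
    """Q_{ij,kl} from the three cases of (154)."""
    if i != j:
        return b[i][k] * b[j][l]
    size = 2 * n_total * u[i]      # number of genes in subpopulation i
    from_k = size * b[i][k]        # genes of i with parent in k
    if k == l:
        return from_k * (from_k - 1) / (size * (size - 1))
    return from_k * (size * b[i][l]) / (size * (size - 1))


def q_compact(i, j, k, l, b=B, u=U, n_total=N_TOTAL):
    """Q_{ij,kl} from the compact formula stated after (154)."""
    if i != j:
        return b[i][k] * b[j][l]
    inv = 1.0 / (2 * n_total * u[i])
    indicator = inv if k == l else 0.0
    return b[i][k] * (b[i][l] - indicator) / (1.0 - inv)


def max_abs_difference():
    """Largest discrepancy between the two expressions over all index combinations."""
    s = len(B)
    idx = range(s)
    return max(abs(q_piecewise(i, j, k, l) - q_compact(i, j, k, l))
               for i in idx for j in idx for k in idx for l in idx)


def row_sums():
    """Sum over (k, l) of Q_{ij,kl} for every pair (i, j); each should equal 1."""
    s = len(B)
    idx = range(s)
    return [sum(q_compact(i, j, k, l) for k in idx for l in idx)
            for i in idx for j in idx]


if __name__ == "__main__":
    print(max_abs_difference(), row_sums())
```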

Proof of Theorem 3

We will use (87) in order to prove (88). By Perron–Frobenius’ Theorem, there exists a unique largest eigenvalue $\lambda $ of ${\varvec{D}}$, with corresponding left and right eigenvectors ${\varvec{l}}=(l_{ij})$ and ${\varvec{r}}=(r_{ij})$, which can be normalized so that

$$\begin{aligned} \sum _{ij} l_{ij}&= 1,\\ \sum _{ij} l_{ij}r_{ij}&= 1. \end{aligned}$$

By a Jordan decomposition of ${\varvec{D}}$, it follows that

$$\begin{aligned} {\varvec{D}}^\tau = \lambda ^\tau {\varvec{r}}{\varvec{l}}+ o(\lambda ^\tau ) \text{ as } \tau \rightarrow \infty . \end{aligned}$$

Our asymptotic analysis $N\rightarrow \infty $ is equivalent to letting the perturbation parameter

$$\begin{aligned} \varepsilon =\frac{1}{2N} \end{aligned}$$

tend to zero. In order to highlight the dependence of ${\varvec{D}}={\varvec{D}}(\varepsilon )$ on $\varepsilon $, we Taylor expand its elements around $\varepsilon =0$, as

$$\begin{aligned} D_{ij,kl} = D_{ij,kl}(\varepsilon ) = b_{ik}b_{jl} + \dot{D}_{ij,kl}\varepsilon + o(\varepsilon ). \end{aligned}$$

It follows from (150) that $\dot{{\varvec{D}}}=(\dot{D}_{ij,kl})$ has elements

$$\begin{aligned} \dot{D}_{ij,kl} = -1_{\{k=l\}} u_k^{-1}b_{ik}b_{jl}\sigma _{ijk} + 1_{\{i=j\}}u_i^{-1}b_{ik}(b_{il}-1_{\{k=l\}}) \end{aligned}$$

for reproduction scenario 2 and

$$\begin{aligned} \dot{D}_{ij,kl} = -1_{\{k=l\}} u_k^{-1}b_{ik}b_{jl}\sigma _{ijk} \end{aligned}$$

for reproduction scenario 3 with $\alpha _i\equiv \infty $. Clearly, ${\varvec{D}}(0)={\varvec{B}}\otimes {\varvec{B}}$ is the Kronecker product of ${\varvec{B}}$ with itself for either reproduction scenario. It has largest eigenvalue $\lambda (0)=1$, since ${\varvec{B}}$ is the transition matrix of an irreducible Markov chain, with a unique largest eigenvalue 1. Moreover, the form of the left and right eigenvectors ${\varvec{l}}={\varvec{l}}(\varepsilon )$ and ${\varvec{r}}={\varvec{r}}(\varepsilon )$ can be deduced from the left and right eigenvectors of ${\varvec{B}}$ when $\varepsilon =0$, as

$$\begin{aligned} l_{ij}(0)&= \gamma _i\gamma _j,\\ r_{ij}(0)&= 1. \end{aligned}$$

It follows from perturbation theory of matrices (see for instance Nagylaki 1980 and Van der Aa et al. 2007) that

$$\begin{aligned} \lambda (\varepsilon ) = 1 + \dot{\lambda }\varepsilon + o(\varepsilon ) \text{ as } \varepsilon \rightarrow 0, \end{aligned}$$

where

$$\begin{aligned} \dot{\lambda }&= {\varvec{l}}(0)\dot{{\varvec{D}}}{\varvec{r}}(0)\\&= -\sum _{ijk} \gamma _i\gamma _j u_k^{-1}b_{ik}b_{jk}\sigma _{ijk} + \sum _{ikl} \gamma _i^2u_i^{-1}b_{ik}(b_{il}-1_{\{k=l\}})\\&= -C + \sum _{i} \gamma _i^2u_i^{-1} (1-1)\\&= -C, \end{aligned}$$

for reproduction scenario 2, with $C$ as defined in (89). A similar (but simpler) analysis shows that $\dot{\lambda }= -C$ for reproduction scenario 3 with $\alpha _i\equiv \infty $. In view of (87), this implies

$$\begin{aligned} N_{e\pi }&= \frac{1}{2}{\varvec{W}}_T({\varvec{I}}-{\varvec{D}})^{-1}{\underline{{\mathbf{1}}}}\nonumber \\&= \frac{1}{2}{\varvec{W}}_T \left( \sum _{\tau =0}^\infty {\varvec{D}}^\tau \right) {\underline{{\mathbf{1}}}}\nonumber \\&= \frac{1}{2}{\varvec{W}}_T \left( \sum _{\tau =0}^\infty \left( \lambda ^\tau {\varvec{r}}{\varvec{l}}+ o(\lambda ^\tau )\right) \right) {\underline{{\mathbf{1}}}}\nonumber \\&= \frac{1}{2} \sum _{\tau =0}^\infty \left( ({\varvec{W}}_T{\varvec{r}}{\varvec{l}}{\underline{{\mathbf{1}}}})\lambda ^\tau + o(\lambda ^\tau )\right) \nonumber \\&= \frac{1}{2} \sum _{\tau =0}^\infty \left( {\varvec{W}}_T({\underline{{\mathbf{1}}}} + o(1))\lambda ^\tau + o(\lambda ^\tau )\right) \\&= \frac{1}{2} \sum _{\tau =0}^\infty \left( \lambda ^\tau + o(\lambda ^\tau )\right) \nonumber \\&= \frac{1}{2(1-\lambda )}(1+o(1))\nonumber \\&= \frac{1}{2C\varepsilon }(1+o(1))\nonumber \\&= \frac{N}{C}(1+o(1))\nonumber \end{aligned}$$

(155)

as $\varepsilon \rightarrow 0$, or equivalently, as $N\rightarrow \infty $, thereby proving (88). In the fifth equality of (155) we used that ${\varvec{r}}={\varvec{r}}(\varepsilon )={\underline{{\mathbf{1}}}} + o(1)$ as $\varepsilon \rightarrow 0$, and in the sixth equality ${\varvec{W}}_T{\underline{{\mathbf{1}}}} = \sum _{i,j} w_iw_j = 1$, regardless of the choice of weight vector ${\varvec{w}}$.
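The first order perturbation result $\lambda (\varepsilon ) = 1 - C\varepsilon + o(\varepsilon )$ and the resulting approximation $1/(2(1-\lambda ))\approx N/C$ can be illustrated numerically. The sketch below makes several assumptions that are not part of the proof: it uses reproduction scenario 3 with $\alpha _i\equiv \infty $ and $N_{ek}=Nu_k$ (so that $\sigma _{ijk}=1$), a hypothetical symmetric two-subpopulation migration matrix, and power iteration as a simple way to obtain the largest eigenvalue of ${\varvec{D}}$.

```python
B = [[0.9, 0.1], [0.1, 0.9]]   # illustrative symmetric backward migration matrix (assumption)
U = [0.5, 0.5]                 # illustrative relative subpopulation sizes (assumption)
GAMMA = [0.5, 0.5]             # stationary distribution of this particular B
N_TOTAL = 1000                 # illustrative total population size N


def build_d(b, u, n_total):
    """D_{ij,kl} = b_ik b_jl (1 - 1/(2 N u_k))^{k=l}, i.e. (152) with N_ek = N u_k."""
    s = len(b)
    pairs = [(i, j) for i in range(s) for j in range(s)]
    d = []
    for (i, j) in pairs:
        row = []
        for (k, l) in pairs:
            entry = b[i][k] * b[j][l]
            if k == l:
                entry *= 1.0 - 1.0 / (2 * n_total * u[k])
            row.append(entry)
        d.append(row)
    return d


def largest_eigenvalue(matrix, iterations=2000):
    """Power iteration for a positive matrix, started from the all-ones vector."""
    x = [1.0] * len(matrix)
    lam = 0.0
    for _ in range(iterations):
        y = [sum(a * v for a, v in zip(row, x)) for row in matrix]
        lam = max(y)               # x has maximum 1, so max(y) estimates lambda
        x = [v / lam for v in y]
    return lam


def drift_constant(b, u, gamma):
    """C = sum_{ijk} gamma_i gamma_j b_ik b_jk sigma_ijk / u_k with sigma_ijk = 1."""
    s = len(b)
    return sum(gamma[i] * gamma[j] * b[i][k] * b[j][k] / u[k]
               for i in range(s) for j in range(s) for k in range(s))


if __name__ == "__main__":
    lam = largest_eigenvalue(build_d(B, U, N_TOTAL))
    c = drift_constant(B, U, GAMMA)
    print("lambda:", lam, "first order prediction:", 1 - c / (2 * N_TOTAL))
    print("1/(2(1-lambda)):", 1 / (2 * (1 - lam)), "N/C:", N_TOTAL / c)
```

For these illustrative values $C=1$, so the largest eigenvalue should be close to $1-1/(2N)$ and $1/(2(1-\lambda ))$ close to $N$.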

We now turn to the proof of (90). It follows from Table 3 that $\Vert {\varvec{U}}\Vert =O(N^{-1})$ for both reproduction scenarios 2 and 3 (with $\alpha _i\equiv \infty $). Invoking the upper part of (33) and (38), we deduce that

$$\begin{aligned} \text{ vec }(\varvec{\Sigma }) = {\varvec{U}}{\underline{{\mathbf{1}}}} \left( 1+ O(N^{-1}\text{ Mixtime })\right) = {\varvec{U}}{\underline{{\mathbf{1}}}} \left( 1+ O(N^{-1})\right) , \end{aligned}$$

where the last step follows from Proposition 3 and the fact that the migration rates are kept fixed. Inserting the last expression into (50), we find that

$$\begin{aligned} N_{eV}^\mathrm{{ appr}} = \frac{N}{C^\prime } + o(N), \end{aligned}$$

(156)

where

$$\begin{aligned} C^\prime = 2N\sum _{i,j=1}^s \gamma _i\gamma _j({\varvec{U}}{\underline{{\mathbf{1}}}})_{ij}. \end{aligned}$$

(157)

It thus remains to verify, for both reproduction scenarios, that $C^\prime = C$. Starting with reproduction scenario 2, we find from Table 3 that

$$\begin{aligned} ({\varvec{U}}{\underline{{\mathbf{1}}}})_{ij} = \sum _{k,l=1}^s U_{ij,kl} = \sum _{k=1}^s \frac{C_{kij}u_k}{2Nu_iu_j}, \end{aligned}$$

(158)

with $C_{kij}=\text{ Cov }(\nu _{ki}^l,\nu _{kj}^l)$. By the assumptions of the theorem, the quantities $\sigma _{ijk}(N)$ in (151) will converge as $N\rightarrow \infty $. Since the migration rates in ${\varvec{M}}$ are fixed, it follows that the covariances $C_{kij}=C_{kij}(N)$ will converge as well. With a slight abuse of notation, we write $C_{kij}$ also for the asymptotic $N\rightarrow \infty $ limits. Inserting (158) into (157), we find that

$$\begin{aligned} C^\prime = \sum _{i,j,k=1}^s \gamma _i\gamma _j \frac{C_{kij}u_k}{u_iu_j}. \end{aligned}$$

On the other hand, it follows from the definition of $\sigma _{ijk}$ in (151), that each covariance term $C_{kij}$ can be rewritten as

$$\begin{aligned} C_{kij} = \sigma _{ijk}m_{ki}m_{kj} - m_{ki}m_{kj} + m_{ki}1_{\{i=j\}}. \end{aligned}$$

(159)

Inserting (159) into (157), it follows, after some computations, that

$$\begin{aligned} C^\prime&= \sum _{ijk} \gamma _i\gamma _j u_k^{-1}b_{ik}b_{jk}\left( \sigma _{ijk}-1 + m_{ki}^{-1}1_{\{i=j\}}\right) \\&= \sum _{ijk} \gamma _i\gamma _ju_k^{-1}b_{ik}b_{jk}\sigma _{ijk} - \sum _{ijk} \gamma _i\gamma _ju_k^{-1}b_{ik}b_{jk} + \sum _{ik} \gamma _i^2u_k^{-1}m_{ki}^{-1}b_{ik}^2\\&= C - \sum _k u_k^{-1}\gamma _k^2 + \sum _i u_i^{-1}\gamma _i^2\\&= C, \end{aligned}$$

and in view of (156), this proves (90).

For reproduction scenario 3 with $\alpha _i\equiv \infty $, it follows from Table 3 that

$$\begin{aligned} ({\varvec{U}}{\underline{{\mathbf{1}}}})_{ij} = \sum _{k,l=1}^s U_{ij,kl} = \sum _{k=1}^s b_{ik}b_{jk}\left( \frac{1}{2N_{ek}} - \frac{1}{2Nu_k}\right) + \frac{1_{\{i=j\}}}{2Nu_i}. \end{aligned}$$

Insertion of this expression into (157) leads to

$$\begin{aligned} C^\prime&= 2N\sum _{i,j,k=1}^s \gamma _i\gamma _j b_{ik}b_{jk}\left( \frac{1}{2N_{ek}} - \frac{1}{2Nu_k}\right) + 2N\sum _{i=1}^s \frac{\gamma _i^2}{2Nu_i}\nonumber \\&= 2N\sum _{k=1}^s \gamma _k^2 \left( \frac{1}{2N_{ek}} - \frac{1}{2Nu_k}\right) + 2N\sum _{i=1}^s \frac{\gamma _i^2}{2Nu_i}\nonumber \\&= \sum _{k=1}^s u_k^{-1}\gamma _k^2\cdot \frac{2Nu_k}{2N_{ek}} \\&= \sum _{k=1}^s u_k^{-1}\gamma _k^2 \sigma _k\nonumber \\&= C,\nonumber \end{aligned}$$

(160)

where $\sigma _k=\sigma _{ijk}$ is defined in (153). The last step of (160) follows easily by adding a term $\sigma _k$ on both sides of Eq. (91). $\square $
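The identity $C^\prime = \sum _k u_k^{-1}\gamma _k^2\sigma _k$ in (160) for reproduction scenario 3 relies only on $\varvec{\gamma }{\varvec{B}}=\varvec{\gamma }$, and can be verified numerically. The sketch below is an illustration under assumed values: the three-subpopulation matrix ${\varvec{B}}$, the sizes $u_k$ and the local effective sizes $N_{ek}$ are hypothetical, and the stationary vector $\varvec{\gamma }$ is obtained by repeated multiplication rather than by an eigenvalue routine.

```python
B = [[0.80, 0.15, 0.05],
     [0.10, 0.85, 0.05],
     [0.20, 0.10, 0.70]]       # illustrative row-stochastic backward migration matrix
U = [0.5, 0.3, 0.2]            # illustrative relative subpopulation sizes (assumption)
N_TOTAL = 1000                 # illustrative total size N
N_E = [400, 250, 150]          # illustrative local effective sizes N_ek (assumption)


def stationary(b, iterations=5000):
    """Left eigenvector gamma with gamma B = gamma, by repeated multiplication."""
    s = len(b)
    gamma = [1.0 / s] * s
    for _ in range(iterations):
        gamma = [sum(gamma[i] * b[i][k] for i in range(s)) for k in range(s)]
    return gamma


def c_prime(b, u, n_total, n_e):
    """C' of (157), with (U 1)_ij taken from the scenario 3 expression."""
    s = len(b)
    gamma = stationary(b)
    total = 0.0
    for i in range(s):
        for j in range(s):
            u1 = sum(b[i][k] * b[j][k]
                     * (1 / (2 * n_e[k]) - 1 / (2 * n_total * u[k]))
                     for k in range(s))
            if i == j:
                u1 += 1 / (2 * n_total * u[i])
            total += gamma[i] * gamma[j] * u1
    return 2 * n_total * total


def c_direct(b, u, n_total, n_e):
    """sum_k gamma_k^2 sigma_k / u_k with sigma_k = N u_k / N_ek, as in (153)."""
    gamma = stationary(b)
    return sum(g * g / u_k * (n_total * u_k / n_ek)
               for g, u_k, n_ek in zip(gamma, u, n_e))


if __name__ == "__main__":
    print(c_prime(B, U, N_TOTAL, N_E), c_direct(B, U, N_TOTAL, N_E))
```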

Given two random variables $X$ and $Y$, we put $E_0(Y/X)^*=E_0(Y)/E_0(X)$, where $E_0(X)=E(X|{\varvec{P}}_0=P_0\varvec{1})$, a prediction of $Y/X$ given that the allele frequencies of the founder generation are the same in all subpopulations. The following proposition shows that $\bar{f}_{ST}^{{\varvec{w}}}$ and $f_{ST}^{{\varvec{w}}}$ are weighted averages over $t$ of $E_0(\bar{F}_{ST,t}^{{\varvec{w}}})^{*}$ and $E_0(F_{ST,t}^{{\varvec{w}}})^{*}$ respectively:

Proposition 8

The matrix $\bar{{\varvec{H}}}_t = (\bar{H}_{tij})_{i,j=1}^s$ of gene diversities, defined for a pair of distinct genes, satisfies

$$\begin{aligned} E_0\left( \text{ vec }(\bar{{\varvec{H}}}_t)\right) = 2P_0(1-P_0){\varvec{D}}^t\left( {\underline{{\mathbf{1}}}} +O(N^{-1})\right) , \end{aligned}$$

(161)

and the fixation index in (98) is a weighted average

$$\begin{aligned} \bar{f}_{ST}^{{\varvec{w}}} = \sum _{t=0}^\infty \bar{\omega }_t E_0\left( \bar{F}_{ST,t}^{{\varvec{w}}}\right) ^{*} + O(N^{-1}) = \sum _{t=0}^\infty \bar{\omega }_t \frac{E_0\left( \bar{H}_{Tt}^{{\varvec{w}}}-\bar{H}_{St}^{{\varvec{w}}}\right) }{E_0\left( \bar{H}_{Tt}^{{\varvec{w}}}\right) } + O(N^{-1}), \nonumber \\ \end{aligned}$$

(162)

of predictions $E_0\left( \bar{F}^{{\varvec{w}}} _{ST,t}\right) ^{*}$ of the fixation index (95) over different time horizons $t$, with weights

$$\begin{aligned} \bar{\omega }_t = \frac{{\varvec{W}}_T{\varvec{D}}^t{\underline{{\mathbf{1}}}}}{\sum _{\tau =0}^\infty {\varvec{W}}_T{\varvec{D}}^\tau {\underline{{\mathbf{1}}}}}. \end{aligned}$$

Analogously, the matrix ${\varvec{H}}_t = (H_{tij})_{i,j=1}^s$ of gene diversities, when the pair of genes is drawn with replacement, satisfies

$$\begin{aligned} E_0\left( \text{ vec }({\varvec{H}}_t)\right) = 2P_0(1-P_0){\varvec{A}}^t{\underline{{\mathbf{1}}}}, \end{aligned}$$

(163)

and the fixation index (99) is a weighted average

$$\begin{aligned} f_{ST}^{{\varvec{w}}} = \sum _{t=0}^\infty \omega _t E_0\left( F_{ST,t}^{{\varvec{w}}}\right) ^{*} = \sum _{t=0}^\infty \omega _t \frac{E_0\left( H_{Tt}^{{\varvec{w}}}-H_{St}^{{\varvec{w}}}\right) }{E_0\left( H_{Tt}^{{\varvec{w}}}\right) }, \end{aligned}$$

(164)

with weights

$$\begin{aligned} \omega _t = \frac{{\varvec{W}}_T{\varvec{A}}^t{\underline{{\mathbf{1}}}}}{\sum _{\tau =0}^\infty {\varvec{W}}_T{\varvec{A}}^\tau {\underline{{\mathbf{1}}}}}. \end{aligned}$$

(165)

It is implicit from the proof of Theorem 3 that the weights (165) correspond to a probability distribution with mean $O(N)$, as discussed in Subsection 11.2.
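To illustrate that such weights form a probability distribution with mean of order $N$, the following sketch computes the weights $\bar{\omega }_t$ for a small example; the weights $\omega _t$ in (165) behave analogously with ${\varvec{A}}$ in place of ${\varvec{D}}$. All numerical values are hypothetical, and ${\varvec{D}}$ is taken from (152) under the additional assumption $N_{ek}=Nu_k$. The infinite sum is truncated once the terms ${\varvec{W}}_T{\varvec{D}}^t{\underline{{\mathbf{1}}}}$ fall below a small tolerance.

```python
B = [[0.9, 0.1], [0.1, 0.9]]   # illustrative backward migration matrix (assumption)
U = [0.5, 0.5]                 # illustrative relative subpopulation sizes (assumption)
W = [0.5, 0.5]                 # illustrative weight vector w (assumption)
N_TOTAL = 100                  # illustrative total population size N


def build_d(b, u, n_total):
    """D_{ij,kl} = b_ik b_jl (1 - 1/(2 N u_k))^{k=l}; rows and columns ordered as (i, j)."""
    s = len(b)
    pairs = [(i, j) for i in range(s) for j in range(s)]
    return [[b[i][k] * b[j][l] * ((1 - 1 / (2 * n_total * u[k])) if k == l else 1.0)
             for (k, l) in pairs] for (i, j) in pairs]


def weights(w, d_matrix, tol=1e-14):
    """Normalised terms W_T D^t 1, t = 0, 1, ..., truncated when a term drops below tol."""
    w_t = [wi * wj for wi in w for wj in w]    # W_T as a row vector over pairs (i, j)
    x = [1.0] * len(d_matrix)                  # x holds D^t 1
    terms = []
    while True:
        a_t = sum(p * v for p, v in zip(w_t, x))
        if a_t < tol:
            break
        terms.append(a_t)
        x = [sum(a * v for a, v in zip(row, x)) for row in d_matrix]
    total = sum(terms)
    return [a / total for a in terms]


def mean_time(omega):
    """Mean of the distribution (omega_t) over t = 0, 1, 2, ..."""
    return sum(t * p for t, p in enumerate(omega))


if __name__ == "__main__":
    omega = weights(W, build_d(B, U, N_TOTAL))
    print("sum of weights:", sum(omega), "mean / N:", mean_time(omega) / N_TOTAL)
```

For this example the mean is roughly $2N$, consistent with a geometric decay of the terms at rate $\lambda \approx 1-1/(2N)$.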

Proof of Proposition 8

By means of an expansion $({\varvec{I}}-{\varvec{D}})^{-1}=\sum _{t=0}^\infty {\varvec{D}}^t$, it is clear that (98) can be rewritten as

$$\begin{aligned} \bar{f}_{ST}^{{\varvec{w}}} = \sum _{t=0}^\infty \bar{\omega }_t \frac{({\varvec{W}}_T-{\varvec{W}}_S){\varvec{D}}^{t}{\underline{{\mathbf{1}}}}}{{\varvec{W}}_T{\varvec{D}}^{t}{\underline{{\mathbf{1}}}}}, \end{aligned}$$

(166)

given an assumption that the $\mu \rightarrow 0$ approximation in (98) is exact. On the other hand, as in the proof of (82), it follows that we get a gene diversity recursion

$$\begin{aligned} E\left( \text{ vec }(\bar{{\varvec{H}}}_{t+1})|{\varvec{P}}_t\right) = {\varvec{D}}\text{ vec }(\bar{{\varvec{H}}}_t) \end{aligned}$$

(167)

instead of (29) when two genes are drawn without replacement. We prove (161) by repeated use of (167). This yields

$$\begin{aligned} E_0(\text{ vec }(\bar{{\varvec{H}}}_t))&= E(\text{ vec }(\bar{{\varvec{H}}}_t)|{\varvec{P}}_0=P_0\varvec{1})\\&= {\varvec{D}}^t\text{ vec }(\bar{{\varvec{H}}}_0)\\&= 2P_0(1-P_0){\varvec{D}}^t({\underline{{\mathbf{1}}}} + O(N^{-1})), \end{aligned}$$

applying (94) with $t=0$ in the last step. Inserting (161) into the definitions of $\bar{H}_{Tt}^{{\varvec{w}}}$ and $\bar{H}_{St}^{{\varvec{w}}}$, we obtain

$$\begin{aligned} E_0\left( \bar{H}_{Tt}^{{\varvec{w}}}-\bar{H}_{St}^{{\varvec{w}}}\right)&= 2P_0(1-P_0) ({\varvec{W}}_T-{\varvec{W}}_S){\varvec{D}}^t{\underline{{\mathbf{1}}}} + O(N^{-1}),\\ E_0\left( \bar{H}_{Tt}^{{\varvec{w}}}\right)&= 2P_0(1-P_0) {\varvec{W}}_T{\varvec{D}}^t{\underline{{\mathbf{1}}}} \left( 1 + O(N^{-1})\right) , \end{aligned}$$

where the last step follows as in the proof of Theorem 3 (see in particular (155)), since

$$\begin{aligned} {\varvec{W}}_T{\varvec{D}}^t\left( {\underline{{\mathbf{1}}}} +O(N^{-1})\right)&= \lambda ^t{\varvec{W}}_T{\varvec{r}}{\varvec{l}}({\underline{{\mathbf{1}}}} +O(N^{-1})) + o(\lambda ^t)\\&= \lambda ^t{\varvec{W}}_T{\varvec{r}}(1+O(N^{-1})) + o(\lambda ^t)\\&= \lambda ^t (1+O(N^{-1})) + o(\lambda ^t)\\&= {\varvec{W}}_T{\varvec{D}}^t{\underline{{\mathbf{1}}}} \left( 1 + O(N^{-1})\right) . \end{aligned}$$

Hence it follows that

$$\begin{aligned} E_0\left( \bar{F}_{ST,t}^{{\varvec{w}}}\right) ^{*} = \frac{({\varvec{W}}_T-{\varvec{W}}_S){\varvec{D}}^t{\underline{{\mathbf{1}}}} + O(N^{-1})}{{\varvec{W}}_T{\varvec{D}}^t{\underline{{\mathbf{1}}}} \left( 1 + O(N^{-1})\right) } = \frac{({\varvec{W}}_T-{\varvec{W}}_S){\varvec{D}}^t{\underline{{\mathbf{1}}}}}{{\varvec{W}}_T{\varvec{D}}^t{\underline{{\mathbf{1}}}}} + O(N^{-1}). \end{aligned}$$

By inserting the last equation into (166) we arrive at (162).

Equations (163) and (164) are derived analogously, although the proof is simpler. The reason is that the $O(N^{-1})$ remainder terms vanish, since $\text{ vec }({\varvec{H}}_0)=2P_0(1-P_0){\underline{{\mathbf{1}}}}$ holds exactly when ${\varvec{P}}_0=P_0\varvec{1}$. $\square $

Proof of Theorem 4

It will be convenient to rewrite (27) as

$$\begin{aligned} {\varvec{A}}= {\varvec{B}}\otimes {\varvec{B}}- {\varvec{U}}= {\varvec{G}}- {\varvec{U}}, \end{aligned}$$

(168)

where ${\varvec{G}}=(G_{ij,kl})$ has elements $G_{ij,kl}=b_{ik}b_{jl}$. The Jordan decomposition of ${\varvec{B}}$ in Appendix A implies that ${\varvec{B}}^0{\varvec{B}}^{t-1} = {\varvec{B}}^{t-1}{\varvec{B}}^0 = ({\varvec{B}}^0)^{t}$ for any positive integer $t$. Since ${\varvec{G}}^0 = {\varvec{B}}^0\otimes {\varvec{B}}^0$ and ${\varvec{G}}={\varvec{B}}\otimes {\varvec{B}}$, it is easy to see that this implies

$$\begin{aligned} {\varvec{G}}^0{\varvec{G}}^{t-1} = {\varvec{G}}^{t-1}{\varvec{G}}^0 = ({\varvec{G}}^0)^{t}. \end{aligned}$$

(169)

A calculation similar to that in the proof of Theorem 3 (see in particular (155)) yields

$$\begin{aligned} \sum _{t=0}^\infty ({\varvec{G}}-{\varvec{U}})^t {\underline{{\mathbf{1}}}} = (1-\lambda )^{-1}{\underline{{\mathbf{1}}}} + o\left( (1-\lambda )^{-1}\right) , \end{aligned}$$

(170)

where $\lambda $ is the unique largest eigenvalue of ${\varvec{G}}-{\varvec{U}}$. We will also make use of the fact that

$$\begin{aligned} ({\varvec{W}}_T-{\varvec{W}}_S){\varvec{G}}= ({\varvec{W}}_T-{\varvec{W}}_S){\varvec{G}}^0 = - {\varvec{W}}_S{\varvec{G}}^0, \end{aligned}$$

(171)

which follows since ${\varvec{w}}=\varvec{\gamma }$ and

$$\begin{aligned} {\varvec{W}}_T{\varvec{G}}&= \text{ vec }\left( (\varvec{\gamma }{\varvec{B}})\otimes (\varvec{\gamma }{\varvec{B}})\right) ^T\\&= \text{ vec }(\varvec{\gamma }\otimes \varvec{\gamma })^T\\&= {\varvec{W}}_T,\\ {\varvec{W}}_S{\varvec{G}}&= {\varvec{W}}_S\left( (\varvec{1}\varvec{\gamma })\otimes (\varvec{1}\varvec{\gamma })\right) + {\varvec{W}}_S\left( (\varvec{1}\varvec{\gamma })\otimes {\varvec{B}}^0\right) + {\varvec{W}}_S({\varvec{B}}^0\otimes (\varvec{1}\varvec{\gamma })) + {\varvec{W}}_S{\varvec{G}}^0\\&= {\varvec{W}}_T + 0 + 0 + {\varvec{W}}_S{\varvec{G}}^0\\&= {\varvec{W}}_T + {\varvec{W}}_S{\varvec{G}}^0, \end{aligned}$$

with $\varvec{1}$ a column vector of length $s$, and

$$\begin{aligned} {\varvec{W}}_T{\varvec{G}}^0&= \text{ vec }\left( (\varvec{\gamma }{\varvec{B}}^0)\otimes (\varvec{\gamma }{\varvec{B}}^0)\right) ^T\\&= \text{ vec }\left( \mathbf{0} \otimes \mathbf{0} \right) ^T\\&= \mathbf{0}. \end{aligned}$$

Based on these preliminaries, we can rewrite the numerator of (99) as

$$\begin{aligned}&({\varvec{W}}_T-{\varvec{W}}_S)\left( {\varvec{I}}-({\varvec{G}}-{\varvec{U}})\right) ^{-1}{\underline{{\mathbf{1}}}} = ({\varvec{W}}_T-{\varvec{W}}_S)\sum _{t=0}^\infty ({\varvec{G}}-{\varvec{U}})^t{\underline{{\mathbf{1}}}}\\&\quad = ({\varvec{W}}_S-{\varvec{W}}_T)\sum _{t=0}^\infty \left( -{\varvec{G}}^t+\sum _{\tau =0}^{t-1} {\varvec{G}}^\tau {\varvec{U}}({\varvec{G}}-{\varvec{U}})^{t-\tau -1}\right) {\underline{{\mathbf{1}}}}\\&\quad = ({\varvec{W}}_S-{\varvec{W}}_T)\sum _{t=0}^\infty \sum _{\tau =0}^{t-1} ({\varvec{G}}^0)^\tau {\varvec{U}}({\varvec{G}}-{\varvec{U}})^{t-\tau -1}{\underline{{\mathbf{1}}}}\\&\quad = ({\varvec{W}}_S-{\varvec{W}}_T)\sum _{\tau =0}^\infty ({\varvec{G}}^0)^\tau {\varvec{U}}\sum _{\alpha =0}^\infty ({\varvec{G}}-{\varvec{U}})^\alpha {\underline{{\mathbf{1}}}}\\&\quad = (1-\lambda )^{-1}({\varvec{W}}_S-{\varvec{W}}_T)({\varvec{I}}-{\varvec{G}}^0)^{-1}{\varvec{U}}{\underline{{\mathbf{1}}}} + o\left( \Vert {\varvec{U}}\Vert (1-\lambda )^{-1}\right) , \end{aligned}$$

using (169), (171) and the fact that $({\varvec{W}}_S-{\varvec{W}}_T){\varvec{G}}^t{\underline{{\mathbf{1}}}} = ({\varvec{W}}_S-{\varvec{W}}_T){\underline{{\mathbf{1}}}} = 0$ in the third step, a change of variables $\alpha = t-\tau -1$ in the fourth step and (170) in the fifth step. Formula (170) also implies that the denominator of (99) equals

$$\begin{aligned} {\varvec{W}}_T\left( {\varvec{I}}-({\varvec{G}}-{\varvec{U}})\right) ^{-1}{\underline{{\mathbf{1}}}} = (1-\lambda )^{-1} + o\left( (1-\lambda )^{-1}\right) . \end{aligned}$$

In view of (168), we obtain formula (100) by taking the ratio of the last two displayed equations. In order to prove that $F_{ST}^\mathrm{{ appr}}$ equals the right hand side of (100) as well, it follows, by the definition of $\varvec{\Pi }$ in (34), that

$$\begin{aligned} ({\varvec{W}}_S\varvec{\Pi })_{kl}&= \sum _{i,j=1}^s \gamma _i1_{\{i=j\}}\Pi _{ij,kl}\\&= \sum _{i=1}^s \gamma _i\Pi _{ii,kl}\\&= \sum _{i=1}^s \gamma _i\left( 1_{\{(i,i)=(k,l)\}}- \gamma _k1_{\{i=l\}}-\gamma _l1_{\{i=k\}}+\gamma _k\gamma _l\right) \\&= \gamma _k1_{\{k=l\}} - \gamma _k\gamma _l\\&= ({\varvec{W}}_S-{\varvec{W}}_T)_{kl}, \end{aligned}$$

which we can rewrite in vector format, as

$$\begin{aligned} {\varvec{W}}_S\varvec{\Pi }= {\varvec{W}}_S-{\varvec{W}}_T. \end{aligned}$$

(172)

A similar calculation shows that

$$\begin{aligned} ({\varvec{G}}^0\varvec{\Pi })_{ij,kl}&= \sum _{m,n=1}^s (G^0)_{ij,mn}\Pi _{mn,kl}\\&= \sum _{m,n=1}^s b_{im}^0 b_{jn}^0 \left( 1_{\{(m,n)=(k,l)\}}-\gamma _k1_{\{m=l\}}-\gamma _l1_{\{n=k\}}+ \gamma _k\gamma _l\right) \\&= b_{ik}^0 b_{jl}^0 - \gamma _k b_{il}^0 \sum _{n=1}^s b_{jn}^0 - \gamma _l b_{jk}^0 \sum _{m=1}^s b_{im}^0 + \gamma _k\gamma _l \sum _{m=1}^s b_{im}^0\sum _{n=1}^s b_{jn}^0\\&= b_{ik}^0 b_{jl}^0\\&= G_{ij,kl}^0, \end{aligned}$$

which we rewrite as

$$\begin{aligned} {\varvec{G}}^0\varvec{\Pi }= {\varvec{G}}^0. \end{aligned}$$

(173)

This yields

$$\begin{aligned} F_{ST}^\mathrm{{ appr},{\varvec{\gamma }}}&= \sum _{i=1}^s \gamma _iV_{ii}\\&= {\varvec{W}}_S\text{ vec }({\varvec{V}})\\&= {\varvec{W}}_S \sum _{\tau =0}^\infty ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau \varvec{\Pi }{\varvec{U}}{\underline{{\mathbf{1}}}}\\&= {\varvec{W}}_S \sum _{\tau =0}^\infty ({\varvec{G}}^0)^\tau \varvec{\Pi }{\varvec{U}}{\underline{{\mathbf{1}}}} + O(\Vert {\varvec{U}}\Vert ^2)\\&= ({\varvec{W}}_S-{\varvec{W}}_T)\sum _{\tau =0}^\infty ({\varvec{G}}^0)^\tau {\varvec{U}}{\underline{{\mathbf{1}}}} + O(N^{-2})\\&= ({\varvec{W}}_S-{\varvec{W}}_T)\left( {\varvec{I}}-{\varvec{G}}^0\right) ^{-1}{\varvec{U}}{\underline{{\mathbf{1}}}} + O(N^{-2}), \end{aligned}$$

where in the first step we used the definition (51) of $F_{ST}^\mathrm{{ appr},{\varvec{\gamma }}}$, in the third step the expansion (36) of $\text{ vec }({\varvec{V}})$ and in the fifth step the assumption $\Vert {\varvec{U}}\Vert =O(N^{-1})$, (172), (173) and the second part of (171).

Finally, formula (101) is proved in the same way as (100), replacing ${\varvec{U}}$ by $\bar{{\varvec{U}}}={\varvec{B}}\otimes {\varvec{B}}-{\varvec{D}}$ everywhere. $\square $
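The second step in the computation of the numerator of (99) above rests on the algebraic identity $({\varvec{G}}-{\varvec{U}})^t = {\varvec{G}}^t - \sum _{\tau =0}^{t-1} {\varvec{G}}^\tau {\varvec{U}}({\varvec{G}}-{\varvec{U}})^{t-\tau -1}$, which holds for any pair of square matrices of equal size. A minimal numerical check is sketched below. The two matrices are arbitrary illustrative values, not derived from a migration model, and powers in the code are ordinary matrix powers (the zeroth power is the identity, not the matrix ${\varvec{G}}^0={\varvec{B}}^0\otimes {\varvec{B}}^0$ of the text).

```python
G = [[0.9, 0.1], [0.2, 0.8]]          # arbitrary illustrative matrix (assumption)
U = [[0.010, 0.000], [0.002, 0.005]]  # arbitrary small perturbation (assumption)


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(a, p):
    """Ordinary matrix power a^p for an integer p >= 0 (a^0 is the identity)."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(p):
        result = matmul(result, a)
    return result


def subtract(a, b):
    """Entrywise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def telescoped(g, u, t):
    """Right hand side: G^t - sum_{tau=0}^{t-1} G^tau U (G - U)^(t - tau - 1)."""
    g_minus_u = subtract(g, u)
    result = matpow(g, t)
    for tau in range(t):
        term = matmul(matmul(matpow(g, tau), u), matpow(g_minus_u, t - tau - 1))
        result = subtract(result, term)
    return result


def max_gap(t, g=G, u=U):
    """Largest entrywise difference between (G - U)^t and the telescoped expression."""
    left = matpow(subtract(g, u), t)
    right = telescoped(g, u, t)
    return max(abs(x - y) for rl, rr in zip(left, right) for x, y in zip(rl, rr))


if __name__ == "__main__":
    print([max_gap(t) for t in range(6)])
```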

In order to compare the sizes of the fixation indices when genes are drawn with and without replacement, we formulate the following result:

Proposition 9

The fixation index in (99) can be written as

$$\begin{aligned} f_{ST}^{{\varvec{w}}} = \frac{\bar{h}_T^{{\varvec{w}}}-\bar{h}_S^{{\varvec{w}}} + \sum _{i=1}^s \frac{w_i-w_i^2}{2Nu_i}\bar{h}_{ii}}{\bar{h}_T^{{\varvec{w}}} - \sum _{i=1}^s \frac{w_i^2}{2Nu_i}\bar{h}_{ii}}. \end{aligned}$$

(174)

In particular, for a strong migration limit where $N\rightarrow \infty $ while the migration rates in ${\varvec{M}}$ are kept fixed, it holds that

$$\begin{aligned} f_{ST}^{{\varvec{w}}}&= \bar{f}_{ST}^{{\varvec{w}}} + \sum _{i=1}^s \frac{w_i-w_i^2}{2Nu_i} + o(N^{-1})\nonumber \\&\mathop {=}\limits ^{w_i=u_i=1/s}\bar{f}_{ST}^{{\varvec{w}}} + \frac{s-1}{2N} + o(N^{-1}). \end{aligned}$$

(175)

In order to illustrate this result, consider the island model under panmixia ($m=1$), for which it is well known that $\bar{f}_{ST}=0$ for the canonical and uniform weighting scheme $w_i=1/s$, reflecting the fact that subpopulations on the average are identical. However, even under panmixia, there will still be small differences between subpopulations. It is shown in Hössjer (2013) (see also Latter and Sved 1981) that the replacement version $f_{ST}$ of the fixation index captures this, in terms of a nonzero value $f_{ST}=(s-1)/(2N) + o(N^{-1})$. It also follows from Hössjer et al. (2013) or (68) that the replacement version of the quasi equilibrium approximation of the fixation index satisfies $F_{ST}^\mathrm{{ appr}}= (s-1)/(2N)$ under panmixia.

Proof of Proposition 9

We have that

$$\begin{aligned} h_{ij}^{{\varvec{w}}} = \left( 1-\frac{1}{2Nu_i}\right) ^{\{i=j\}}\bar{h}_{ij}^{{\varvec{w}}}, \end{aligned}$$

since the probability is $\left( 1-1/(2Nu_i)\right) ^{\{i=j\}}$ that two genes are not the same when drawn with replacement, and given this, they are different by state with probability $\bar{h}_{ij}^{{\varvec{w}}}$, as defined in (96). It then follows from (97), and the analogous definitions of $h_S^{{\varvec{w}}}$ and $h_T^{{\varvec{w}}}$ in terms of $h_{ij}^{{\varvec{w}}}$, that

$$\begin{aligned} h_S^{{\varvec{w}}}&= \bar{h}_S^{{\varvec{w}}} - \sum _{i=1}^s \frac{w_i}{2Nu_i}\bar{h}_{ii},\\ h_T^{{\varvec{w}}}&= \bar{h}_T^{{\varvec{w}}} - \sum _{i=1}^s \frac{w_i^2}{2Nu_i}\bar{h}_{ii}. \end{aligned}$$

By inserting these two equations into (99), we arrive at (174).
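
For readers who want the intermediate step, the substitution can be spelled out as follows. This sketch assumes, as the form of (174) suggests, that (99) reads $f_{ST}^{{\varvec{w}}} = (h_T^{{\varvec{w}}}-h_S^{{\varvec{w}}})/h_T^{{\varvec{w}}}$; that definition is not restated in this appendix, so it should be treated as an assumption here. Subtracting the two displayed equations gives

$$\begin{aligned} h_T^{{\varvec{w}}} - h_S^{{\varvec{w}}}&= \left( \bar{h}_T^{{\varvec{w}}} - \sum _{i=1}^s \frac{w_i^2}{2Nu_i}\bar{h}_{ii}\right) - \left( \bar{h}_S^{{\varvec{w}}} - \sum _{i=1}^s \frac{w_i}{2Nu_i}\bar{h}_{ii}\right) \\&= \bar{h}_T^{{\varvec{w}}} - \bar{h}_S^{{\varvec{w}}} + \sum _{i=1}^s \frac{w_i-w_i^2}{2Nu_i}\bar{h}_{ii}, \end{aligned}$$

which is the numerator of (174), while $h_T^{{\varvec{w}}}$ itself is the denominator.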

When migration rates are fixed and $N\rightarrow \infty $, we have $\bar{h}_{ij}=\bar{h}_T^{{\varvec{w}}}(1+O(N^{-1}))$ for all $i,j$, and hence (174) implies

$$\begin{aligned} f_{ST}^{{\varvec{w}}} = \bar{f}_{ST}^{{\varvec{w}}} + \frac{\bar{h}_T^{{\varvec{w}}}\sum _{i=1}^s \frac{w_i-w_i^2}{2Nu_i} + O(N^{-2})}{\bar{h}_T^{{\varvec{w}}}\left( 1 + O(N^{-1})\right) }, \end{aligned}$$

which can be simplified to (175). $\square $
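
The algebra in this proof can also be checked numerically. The following is a minimal Python sketch, not part of the original paper: the function names, the example matrix of $\bar{h}_{ij}$ values and the weights are all hypothetical, and the code assumes that $h_S^{{\varvec{w}}}=\sum _i w_i h_{ii}$, $h_T^{{\varvec{w}}}=\sum _{i,j} w_iw_j h_{ij}$ and $f_{ST}^{{\varvec{w}}}=(h_T^{{\varvec{w}}}-h_S^{{\varvec{w}}})/h_T^{{\varvec{w}}}$, which is consistent with the displayed formulas but not restated in this appendix. It computes $f_{ST}^{{\varvec{w}}}$ once from the replacement-adjusted $h_{ij}$ and once from (174), and evaluates the correction term of (175).

```python
def fst_replacement_direct(h_bar, w, u, N):
    """f_ST from replacement-adjusted h_ij (route taken in the proof).

    Assumes h_S = sum_i w_i h_ii, h_T = sum_ij w_i w_j h_ij and
    f_ST = (h_T - h_S) / h_T; these definitions are assumptions here.
    """
    s = len(w)
    # h_ij = (1 - 1/(2 N u_i))^{i=j} * h_bar_ij: only the diagonal is rescaled.
    h = [[h_bar[i][j] * (1.0 - 1.0 / (2.0 * N * u[i]) if i == j else 1.0)
          for j in range(s)] for i in range(s)]
    h_S = sum(w[i] * h[i][i] for i in range(s))
    h_T = sum(w[i] * w[j] * h[i][j] for i in range(s) for j in range(s))
    return (h_T - h_S) / h_T


def fst_replacement_174(h_bar, w, u, N):
    """f_ST evaluated directly from the right-hand side of (174)."""
    s = len(w)
    hbar_S = sum(w[i] * h_bar[i][i] for i in range(s))
    hbar_T = sum(w[i] * w[j] * h_bar[i][j] for i in range(s) for j in range(s))
    num = hbar_T - hbar_S + sum(
        (w[i] - w[i] ** 2) / (2.0 * N * u[i]) * h_bar[i][i] for i in range(s))
    den = hbar_T - sum(
        w[i] ** 2 / (2.0 * N * u[i]) * h_bar[i][i] for i in range(s))
    return num / den


def correction_term(w, u, N):
    """Leading-order correction sum_i (w_i - w_i^2) / (2 N u_i) from (175)."""
    return sum((wi - wi ** 2) / (2.0 * N * ui) for wi, ui in zip(w, u))


# Hypothetical example values (illustration only, not taken from the paper).
h_bar_example = [[0.30, 0.34, 0.36],
                 [0.34, 0.31, 0.35],
                 [0.36, 0.35, 0.32]]
w_example = [0.2, 0.3, 0.5]   # subpopulation weights w_i
u_example = [0.5, 0.3, 0.2]   # relative subpopulation sizes u_i
N_example = 100

print(fst_replacement_direct(h_bar_example, w_example, u_example, N_example))
print(fst_replacement_174(h_bar_example, w_example, u_example, N_example))
print(correction_term([0.25] * 4, [0.25] * 4, 500))
```

The first two printed values should agree up to floating point rounding, since (174) is an exact identity under the stated assumptions. The third value is the uniform case $w_i=u_i=1/s$ with $s=4$ and $N=500$, for which (175) gives $(s-1)/(2N)$.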

About this article

Cite this article

Hössjer, O., Ryman, N. Quasi equilibrium, variance effective size and fixation index for populations with substructure. J. Math. Biol. 69, 1057–1128 (2014). https://doi.org/10.1007/s00285-013-0728-9

Received: 22 October 2012
Revised: 11 September 2013
Published: 15 October 2013
Issue Date: November 2014
DOI: https://doi.org/10.1007/s00285-013-0728-9
