
Quasi equilibrium, variance effective size and fixation index for populations with substructure

Published in: Journal of Mathematical Biology

Abstract

In this paper, we develop a method for computing the variance effective size \(N_{eV}\), the fixation index \(F_{ST}\) and the coefficient of gene differentiation \(G_{ST}\) of a structured population under equilibrium conditions. The subpopulation sizes are constant in time, with migration and reproduction schemes that can be chosen with great flexibility. Our quasi equilibrium approach is conditional on non-fixation of alleles. This is of relevance when migration rates are of a larger order of magnitude than the mutation rates, so that new mutations can be ignored before an equilibrium balance between genetic drift and migration is reached. The vector-valued time series of subpopulation allele frequencies is divided into two parts: one corresponding to genetic drift of the whole population, and one corresponding to differences in allele frequencies among subpopulations. We give conditions under which the first two moments of the latter, after a simple standardization, are well approximated by quantities that can be explicitly calculated. This enables us to compute approximations of the quasi equilibrium values of \(N_{eV}\), \(F_{ST}\) and \(G_{ST}\). Our findings are illustrated for several reproduction and migration scenarios, including the island model, stepping stone models and a model where one subpopulation acts as a demographic reservoir. We also make detailed comparisons with a backward approach based on coalescence probabilities.


References

  • Allendorf F, Ryman N (2002) The role of genetics in population viability analysis. In: Beissinger SR, McCullough DR (eds) Population viability analysis. The University of Chicago Press, Chicago

  • Allendorf FW, Luikart G (2007) Conservation and the genetics of populations. Blackwell, Malden

  • Barton NH, Slatkin M (1986) A quasi-equilibrium theory of the distribution of rare alleles in a subdivided population. Heredity 56:409–415

  • Brockwell PJ, Davis RA (1987) Time series: theory and methods. Springer, New York

  • Caballero A (1994) Developments in the prediction of effective population size. Heredity 73:657–679

  • Cannings C (1974) The latent roots of certain Markov chains arising in genetics: a new approach. I. Haploid models. Adv Appl Prob 6:260–290

  • Caswell H (2001) Matrix population models, 2nd edn. Sinauer, Sunderland

  • Cattiaux P, Collet P, Lambert A, Martínez SM, Martín JS (2009) Quasi-stationary distributions and diffusion models in population dynamics. Ann Probab 37(5):1926–1969

  • Chakraborty R, Leimar O (1987) Genetic variation within a subdivided population. In: Ryman N, Utter F (eds) Population genetics and fishery management. Washington Sea Grant Program, Seattle, WA. Reprinted 2009 by The Blackburn Press, Caldwell

  • Collet P, Martinez S (2013) Quasi stationary distributions, Markov chains, diffusions and dynamical systems. Springer, Berlin

  • Cox DR, Miller HD (1965) The theory of stochastic processes. Methuen & Co Ltd, London

  • Crow JF (2004) Assessing population subdivision. In: Wasser SP (ed) Evolutionary theory and processes: modern horizons. Papers in honour of Eviatar Nevo. Springer, Dordrecht, pp 35–42

  • Crow JF, Aoki K (1982) Group selection for a polygenic behavioral trait: a differential proliferation model. Proc Natl Acad Sci 79:2628–2631

  • Crow JF, Aoki K (1984) Group selection for a polygenic behavioral trait: estimating the degree of population subdivision. Proc Natl Acad Sci 81:6073–6077

  • Crow JF, Kimura M (1970) An introduction to population genetics theory. The Blackburn Press, Caldwell

  • Durrett R (2008) Probability models for DNA sequence evolution, 2nd edn. Springer, New York

  • Engen S, Lande R, Saether B-E (2005a) Effective size of a fluctuating age-structured population. Genetics 170:941–954

  • Engen S, Lande R, Saether B-E, Weimerskirch H (2005b) Extinction in relation to demographic and environmental stochasticity in age-structured models. Math Biosci 195:210–227

  • Engle RF, Granger CWJ (1987) Co-integration and error correction: representation, estimation and testing. Econometrica 55:251–276

  • Ethier SN, Nagylaki T (1980) Diffusion approximation of Markov chains with two time scales and applications to genetics. Adv Appl Prob 12:14–49

  • Ewens WJ (1982) On the concept of effective population size. Theoret Popul Biol 21:373–378

  • Ewens WJ (2004) Mathematical population genetics. I. Theoretical introduction, 2nd edn. Springer, New York

  • Felsenstein J (1971) Inbreeding and variance effective numbers in populations with overlapping generations. Genetics 68:581–597

  • Fisher RA (1958) The genetical theory of natural selection, 2nd edn. Dover, New York

  • Granger CWJ (1981) Some properties of time series data and their use in econometric model specification. J Econom 16:121–130

  • Hardy OJ, Vekemans X (1999) Isolation by distance in a continuous population: reconciliation between spatial autocorrelation analysis and population genetics models. Heredity 83:145–154

  • Hardy OJ, Vekemans X (2002) SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol Ecol Notes 2:618–620

  • Hare MP, Nunney L, Schwartz MK, Ruzzante DE, Burford M, Waples R, Ruegg K, Palstra F (2011) Understanding and estimating effective population size for practical applications in marine species management. Conserv Biol 25(3):438–449

  • Hössjer O (2011) Coalescence theory for a general class of structured populations with fast migration. Adv Appl Probab 43(4):1027–1047

  • Hössjer O (2013) Spatial autocorrelation for subdivided populations with invariant migration schemes. Methodol Comput Appl Probab. doi:10.1007/s11009-013-9321-3

  • Hössjer O, Jorde PE, Ryman N (2013) Quasi equilibrium approximations of the fixation index of the island model under neutrality. Theoret Popul Biol 84:9–24

  • Jamieson IG, Allendorf FW (2012) How does the 50/500 rule apply to MVPs? Trends Ecol Evol 27(10):578–584

  • Jorde P-E, Ryman N (2007) Unbiased estimator of genetic drift and effective population size. Genetics 177:927–935

  • Karlin S (1966) A first course in stochastic processes. Academic Press, New York

  • Kimura M (1953) ‘Stepping stone’ model of population. Ann Rep Natl Inst Genet Japan 3:62–63

  • Kimura M (1955) Solution of a process of random genetic drift with a continuous model. Proc Natl Acad Sci USA 41:141–150

  • Kimura M (1964) Diffusion models in population genetics. J Appl Prob 1:177–232

  • Kimura M (1971) Theoretical foundations of population genetics at the molecular level. Theor Popul Biol 2:174–208

  • Kimura M, Weiss GH (1964) The stepping stone model of population structure and the decrease of genetic correlation with distance. Genetics 61:763–771

  • Kingman JFC (1982) The coalescent. Stoch Proc Appl 13:235–248

  • Latter BDH, Sved JA (1981) Migration and mutation in stochastic models of gene frequency change. II. Stochastic migration with a finite number of islands. J Math Biol 13:95–104

  • Leviyang S (2011a) The distribution of \(F_{ST}\) for the island model in the large population, weak mutation limit. Stoch Anal Appl 28:577–601

  • Leviyang S (2011b) The distribution of \(F_{ST}\) and other genetic statistics for a class of population structure models. J Math Biol 62:203–289

  • Leviyang S, Hamilton MB (2011) Properties of Weir and Cockerham’s \(F_{ST}\) estimator and associated bootstrap confidence intervals. Theoret Popul Biol 79:39–52

  • Malécot G (1946) La consanguinité dans une population limitée. C R Acad Sci (Paris) 222:841–843

  • Maruyama T (1970a) On the rate of decrease of heterozygosity in circular stepping stone models of populations. Theor Popul Biol 1:101–119

  • Maruyama T (1970b) Effective number of alleles in subdivided populations. Theor Popul Biol 1:273–306

  • Möhle M (2010) Looking forwards and backwards in the multi-allelic neutral Cannings population model. J Appl Prob 47:713–731

  • Nagylaki T (1980) The strong migration limit in geographically structured populations. J Math Biol 9:101–114

  • Nagylaki T (1982) Geographical invariance in population genetics. J Theor Biol 99:159–172

  • Nagylaki T (1998) The expected number of heterozygous sites in a subdivided population. Genetics 149:1599–1604

  • Nagylaki T (2000) Geographical invariance and the strong-migration limit in subdivided populations. J Math Biol 41:123–142

  • Nei M (1973) Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci USA 70:3321–3323

  • Nei M (1975) Molecular evolution and population genetics. North-Holland, Amsterdam

  • Nei M (1977) \(F\)-statistics and analysis of gene diversity in subdivided populations. Ann Hum Genet 41:225–233

  • Nei M, Chakravarti A, Tateno Y (1977) Mean and variance of \(F_{ST}\) in a finite number of incompletely isolated populations. Theoret Popul Biol 11:291–306

  • Nei M, Kumar S (2000) Molecular evolution and phylogenetics. Oxford University Press, Oxford

  • Nei M, Tajima F (1981) Genetic drift and estimation of effective population size. Genetics 98:625–640

  • Nordborg M, Krone S (2002) Separation of time scales and convergence to the coalescent in structured populations. In: Slatkin M, Veuille M (eds) Modern developments in theoretical population genetics. Oxford University Press, Oxford, pp 194–232

  • Nunney L (1999) The effective size of a hierarchically-structured population. Evolution 53:1–10

  • Olsson F, Hössjer O, Laikre L, Ryman N (2013) Variance effective population size of populations in which size and age composition fluctuate. Theoret Popul Biol (to appear)

  • Orive ME (1993) Effective population size in organisms with complex life-histories. Theoret Popul Biol 44:316–340

  • Palstra FP, Ruzzante DE (2008) Genetic estimates of contemporary effective population size: what can they tell us about the importance of genetic stochasticity for wild population persistence? Mol Ecol 17:3428–3447

  • Rottenstreich S, Miller JR, Hamilton MB (2007) Steady state of homozygosity and \(G_{ST}\) for the island model. Theoret Popul Biol 72:231–244

  • Ryman N, Allendorf FW, Jorde PE, Laikre L, Hössjer O (2013) Samples from structured populations yield biased estimates of effective size that overestimate the rate of loss of genetic variation. Mol Ecol Resour (to appear)

  • Ryman N, Leimar O (2008) Effect of mutation on genetic differentiation among nonequilibrium populations. Evolution 62(9):2250–2259

  • Sagitov S, Jagers P (2005) The coalescent effective size of age-structured populations. Ann Appl Probab 15(3):1778–1797

  • Sampson KY (2006) Structured coalescent with nonconservative migration. J Appl Prob 43:351–362

  • Sjödin P, Kaj I, Krone S, Lascoux M, Nordborg M (2005) On the meaning and existence of an effective population size. Genetics 169:1061–1070

  • Slatkin M (1981) Estimating levels of gene flow in natural populations. Genetics 99:323–335

  • Slatkin M (1985) Rare alleles as indicators of gene flow. Evolution 39:53–65

  • Slatkin M (1991) Inbreeding coefficients and coalescence times. Genet Res 58:167–175

  • Slatkin M, Arter HE (1991) Spatial autocorrelation methods in population genetics. Am Nat 138(2):499–517

  • Sokal RR, Oden NL, Thomson BA (1997) A simulation study of microevolutionary inferences by spatial autocorrelation analysis. Biol J Linnean Soc 60:73–93

  • Sved JA, Latter BDH (1977) Migration and mutation in stochastic models of gene frequency change. J Math Biol 5:61–73

  • Takahata N (1983) Gene identity and genetic differentiation of populations in the finite island model. Genetics 104(3):497–512

  • Takahata N, Nei M (1984) \(F_{ST}\) and \(G_{ST}\) statistics in the finite island model. Genetics 107(3):501–504

  • Van der Aa NP, ter Morsche HG, Mattheij RRM (2007) Computation of eigenvalue and eigenvector derivatives for a general complex-valued eigensystem. Electron J Linear Algebra 16:300–314

  • Wakeley J (1999) Nonequilibrium migration in human history. Genetics 153:1863–1871

  • Wakeley J, Takahashi T (2004) The many-demes limit for selection and drift in a subdivided population. Theoret Popul Biol 66:83–91

  • Wang J, Caballero A (1999) Developments in predicting the effective size of subdivided populations. Heredity 82:212–226

  • Waples RS (1989) A generalized approach for estimating effective population size from temporal changes of allele frequency. Genetics 121:379–391

  • Waples RS (2002) Definition and estimation of effective population size in the conservation of endangered species. In: Beissinger SR, McCullough DR (eds) Population viability analysis. The University of Chicago Press, Chicago

  • Waples RS, Gaggiotti O (2006) What is a population? An empirical evaluation of some genetic methods for identifying the number of gene pools and their degree of connectivity. Mol Ecol 15:1419–1439

  • Waples RS, Yokota M (2007) Temporal estimates of effective population size in species with overlapping generations. Genetics 175:219–233

  • Ward RD, Woodward M, Skibinski DOF (1994) A comparison of genetic diversity levels in marine, freshwater and anadromous fishes. J Fish Biol 44:213–232

  • Weir BS, Cockerham CC (1984) Estimating \(F\)-statistics for the analysis of population structure. Evolution 38(6):1358

  • Weiss GH, Kimura M (1965) A mathematical analysis of the stepping stone model of genetic correlation. J Appl Probab 2:129–149

  • Whitlock MC, Barton NH (1997) The effective size of a subdivided population. Genetics 145:427–441

  • Wilkinson-Herbots HM (1998) Genealogy and subpopulation differentiation under various models of population structure. J Math Biol 37:535–585

  • Wright S (1931) Evolution in Mendelian populations. Genetics 16:97–159

  • Wright S (1938) Size of population and breeding structure in relation to evolution. Science 87:430–431

  • Wright S (1943) Isolation by distance. Genetics 28:114–138

  • Wright S (1946) Isolation by distance under diverse systems of mating. Genetics 31:39–59

  • Wright S (1951) The genetical structure of populations. Ann Eugenics 15:323–354

  • Wright S (1978) Variability within and among natural populations. Evolution and the genetics of populations, vol 4. University of Chicago Press, Chicago


Acknowledgments

Ola Hössjer’s research was financially supported by the Swedish Research Council, contract nr. 621-2008-4946, and the Gustafsson Foundation for Research in Natural Sciences and Medicine. Nils Ryman’s research was supported by grants from the Swedish Research Council, the BONUS Baltic Organisations’ Network for Funding Science EEIG (the BaltGene research project), and through a grant to his colleague Linda Laikre from the Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (Formas). The authors want to thank an associate editor, two referees, Anders Martin-Löf, and Fredrik Olsson for valuable comments on the work.

Author information

Corresponding author

Correspondence to Ola Hössjer.

Appendices

Appendix A: Orthogonal decomposition of allele frequency process

Jordan canonical form of \({\varvec{B}}\) and motivation of (22). Let \({\varvec{B}}= {\varvec{Q}}\varvec{\Lambda }{\varvec{Q}}^{-1}\) be the Jordan canonical form of \({\varvec{B}}\), with

$$\begin{aligned} \varvec{\Lambda }= \left( \begin{array}{ccc} \varvec{\Lambda }_1 &{} \ldots &{} 0 \\ \vdots &{} \ddots &{} \vdots \\ 0 &{} \ldots &{} \varvec{\Lambda }_r \end{array}\right) \end{aligned}$$

a block diagonal matrix containing the (possibly complex-valued) eigenvalues of \({\varvec{B}}\) along the diagonal. For each \(l=1,\ldots ,r\), the square matrix

$$\begin{aligned} \varvec{\Lambda }_l = \left( \begin{array}{ccccc} \lambda _l &{} 1 &{} 0 &{} \cdots &{} 0 \\ 0 &{} \lambda _l &{} 1 &{} \ddots &{} \vdots \\ \vdots &{} &{} \ddots &{} \ddots &{} 0 \\ &{} &{} &{} \lambda _l &{} 1 \\ 0 &{} \cdots &{} &{} 0 &{} \lambda _l \end{array}\right) \end{aligned}$$
(104)

occupies rows and columns \(j_{l-1}+1,\ldots ,j_l\) of \(\varvec{\Lambda }\), with diagonal entries equal to \(\lambda _l\), all entries along the superdiagonal equal to 1 and all other entries of \(\varvec{\Lambda }_l\) equal 0. Hence \(\lambda _l\) is an eigenvalue of \({\varvec{B}}\) which appears \(j_l-j_{l-1}\) times along the diagonal of \(\varvec{\Lambda }_l\), with \(0=j_0 < j_1 < \cdots < j_r=s\). In particular, \(\varvec{\Lambda }\) is diagonal when all eigenvalues of \({\varvec{B}}\) are distinct and \(r=s\). Then the rows of \({\varvec{Q}}^{-1}\) contain the left eigenvectors of \({\varvec{B}}\) and the columns \({\varvec{q}}_1,\ldots ,{\varvec{q}}_s\) of \({\varvec{Q}}\) the right eigenvectors. See for instance Cox and Miller (1965).

Regardless of whether \(\varvec{\Lambda }\) is diagonal or not, since \({\varvec{B}}\) is a transition matrix of a Markov chain, \({\varvec{q}}_1 = \varvec{1}\) is a right eigenvector with eigenvalue \(\lambda _1=1\). By the assumed irreducibility and aperiodicity of this Markov chain, it follows from the Perron-Frobenius theorem that \(|\lambda _l|<1\) for \(l=2,\ldots ,r\), and without loss of generality we may assume \(|\lambda _2|\ge |\lambda _3|\ge \cdots \ge |\lambda _r|\ge 0\).
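The paper itself contains no code, but these spectral properties are easy to probe numerically. The following Python sketch (our own illustration, not from the paper; the deme number \(s\) and migration rate \(m\) are arbitrary choices) builds an island-model transition matrix \({\varvec{B}}\) and checks that \(\varvec{1}\) is a right eigenvector with eigenvalue \(\lambda _1=1\), while all remaining eigenvalues have modulus below one:

```python
# Illustrative check (not from the paper): an island-model transition
# matrix B with s demes and migration rate m.  B is row-stochastic, so
# q_1 = 1 is a right eigenvector with eigenvalue lambda_1 = 1, and by
# Perron-Frobenius all other eigenvalues satisfy |lambda_l| < 1.
import numpy as np

s, m = 3, 0.1
B = (1 - m) * np.eye(s) + (m / (s - 1)) * (np.ones((s, s)) - np.eye(s))

eigvals = np.linalg.eigvals(B)
eigvals = eigvals[np.argsort(-np.abs(eigvals))]   # largest modulus first

assert np.allclose(B @ np.ones(s), np.ones(s))    # B 1 = 1
assert np.isclose(eigvals[0].real, 1.0)           # lambda_1 = 1
assert np.all(np.abs(eigvals[1:]) < 1)            # |lambda_l| < 1, l >= 2
```

For this symmetric island model the subdominant eigenvalue \(1-ms/(s-1)\) has multiplicity \(s-1\), so \(\varvec{\Lambda }\) is not diagonal with distinct entries, yet the Perron-Frobenius conclusions above still hold.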

Introduce the inner product

$$\begin{aligned} ({\varvec{x}},{\varvec{y}})=\sum _{i=1}^s \gamma _i\bar{x}_iy_i \end{aligned}$$
(105)

for possibly complex-valued column vectors \({\varvec{x}}=(x_i)\) and \({\varvec{y}}=(y_i)\) of length \(s\), with \(\bar{x}_i\) the complex conjugate of \(x_i\). Then, we have the following result:

Proposition 5

The columns \({\varvec{q}}_2,\ldots ,{\varvec{q}}_s\) of \({\varvec{Q}}\) are all orthogonal to \({\varvec{q}}_1=\varvec{1}\) with respect to inner product (105), i.e.

$$\begin{aligned} (\varvec{1},{\varvec{q}}_j) = 0, \quad j=2,\ldots ,s. \end{aligned}$$

Proof

We have that

$$\begin{aligned} (\varvec{1},{\varvec{q}}_j)&= \sum _{i=1}^s \gamma _i q_{ij}\\&= \langle \varvec{\gamma },{\varvec{q}}_j\rangle , \end{aligned}$$

where \(q_{ij}\) is element \(i\) of \({\varvec{q}}_j\) and \(\langle {\varvec{x}},{\varvec{y}}\rangle =\sum _{i=1}^s x_iy_i\) is the standard inner product. The result follows since \(\varvec{\gamma }\) is the first row of \({\varvec{Q}}^{-1}\) and \({\varvec{q}}_j\) is column number \(j\) (with \(j\ge 2\)) of \({\varvec{Q}}\), so that \(\langle \varvec{\gamma },{\varvec{q}}_j\rangle \) equals element \((1,j)\) of \({\varvec{Q}}^{-1}{\varvec{Q}}={\varvec{I}}\), which is zero. \(\square \)
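Proposition 5 can also be verified numerically. The sketch below (an illustration of ours, not from the paper; \({\varvec{B}}\) is a random row-stochastic matrix) scales \({\varvec{Q}}\) so that \({\varvec{q}}_1=\varvec{1}\), takes \(\varvec{\gamma }\) as the first row of \({\varvec{Q}}^{-1}\), and checks the orthogonality relations with respect to the inner product (105):

```python
# Numeric check (a sketch, not from the paper) of Proposition 5: with
# gamma taken as the first row of Q^{-1} (after scaling so q_1 = 1),
# the remaining columns q_2,...,q_s of Q are orthogonal to q_1 under
# the gamma-weighted inner product (x,y) = sum_i gamma_i conj(x_i) y_i.
import numpy as np

rng = np.random.default_rng(0)
s = 4
B = rng.random((s, s))
B = B / B.sum(axis=1, keepdims=True)        # make B row-stochastic

eigvals, Q = np.linalg.eig(B)
order = np.argsort(-np.abs(eigvals))
Q = Q[:, order]
Q[:, 0] = Q[:, 0] / Q[0, 0]                 # rescale so q_1 = (1,...,1)^T
gamma = np.linalg.inv(Q)[0, :]              # first row of Q^{-1}

assert np.allclose(Q[:, 0], 1.0)            # q_1 is the all-ones vector
for j in range(1, s):                       # (1, q_j) = sum_i gamma_i q_ij
    assert abs(np.sum(gamma * Q[:, j])) < 1e-8
assert np.isclose(np.sum(gamma), 1.0)       # (1, q_1) = 1
```

With probability one the random matrix has distinct eigenvalues, so \({\varvec{Q}}\) here is a genuine eigenvector matrix rather than a general Jordan basis.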

Define \(\varvec{\Lambda }^0=\text{ diag }(0,\varvec{\Lambda }_2,\ldots ,\varvec{\Lambda }_r)\) as the block diagonal matrix obtained by replacing \(\varvec{\Lambda }_1=\lambda _1=1\) in \(\varvec{\Lambda }\) by \(0\) (or by any other number of modulus at most \(|\lambda _2|\)), and put

$$\begin{aligned} {\varvec{B}}^0 = {\varvec{Q}}\varvec{\Lambda }^0{\varvec{Q}}^{-1}. \end{aligned}$$
(106)

It then follows that \({\varvec{B}}^0\) has spectral radius \(|\lambda _2|<1\), and it enters into the time dynamics of the allele frequency process as follows:

Proposition 6

The recursive autoregressive equation (10) for \({\varvec{P}}_t\) can be decomposed into one genetic drift term for the overall allele frequency of the whole population, and one recursion part for the allele frequency fluctuations among subpopulations, as

$$\begin{aligned} P_{t+1}&= P_t + \varepsilon _{t+1},\nonumber \\ {\varvec{P}}_{t+1}^0&= {\varvec{B}}{\varvec{P}}_t^0 + \varvec{\varepsilon }_{t+1}^0 = {\varvec{B}}^0 {\varvec{P}}_t^0 + \varvec{\varepsilon }_{t+1}^0, \end{aligned}$$
(107)

with \(\varvec{\varepsilon }_{t+1}^0\) as defined in (22).

Proof

The upper part of (107) follows immediately from (10), since

$$\begin{aligned} P_{t+1} = (\varvec{1},{\varvec{P}}_{t+1}) = (\varvec{1},{\varvec{B}}{\varvec{P}}_t + \varvec{\varepsilon }_{t+1}) = P_t + \varepsilon _{t+1}. \end{aligned}$$

Define, for any vector \({\varvec{x}}=(x_1,...,x_s)\), \({\varvec{x}}^0 = {\varvec{x}}- (\varvec{1},{\varvec{x}})\varvec{1}\). Then, since \(({\varvec{x}}+{\varvec{y}})^0 = {\varvec{x}}^0+{\varvec{y}}^0\), we have that

$$\begin{aligned} {\varvec{P}}_{t+1}^0 = ({\varvec{B}}{\varvec{P}}_t + \varvec{\varepsilon }_{t+1})^0 = ({\varvec{B}}{\varvec{P}}_t)^0 + \varvec{\varepsilon }_{t+1}^0 = {\varvec{B}}{\varvec{P}}_t^0 + \varvec{\varepsilon }_{t+1}^0 = {\varvec{B}}^0{\varvec{P}}_t^0 + \varvec{\varepsilon }_{t+1}^0. \qquad \quad \end{aligned}$$
(108)

The third equality of (108) follows since

$$\begin{aligned} ({\varvec{B}}{\varvec{P}}_t)^0&= \left( {\varvec{B}}(P_t\varvec{1}+ {\varvec{P}}_t^0)\right) ^0\\&= \left( P_t\varvec{1}+ {\varvec{B}}{\varvec{P}}_t^0\right) ^0\\&= P_t\varvec{1}^0 + ({\varvec{B}}{\varvec{P}}_t^0)^0\\&= 0 + {\varvec{B}}{\varvec{P}}_t^0\\&= {\varvec{B}}{\varvec{P}}_t^0, \end{aligned}$$

where in the second last step we used that since \({\varvec{P}}_t^0\) is a linear combination of \({\varvec{q}}_2,\ldots ,{\varvec{q}}_s\), so is \({\varvec{B}}{\varvec{P}}_t^0\), and hence orthogonal to \(\varvec{1}\) by Proposition 5, so that \(({\varvec{B}}{\varvec{P}}_t^0)^0={\varvec{B}}{\varvec{P}}_t^0\).

The fourth equality of (108) follows since \({\varvec{Q}}^{-1}{\varvec{P}}_t^0\) is a linear combination of \({\varvec{e}}_2,\ldots ,{\varvec{e}}_s\), where \({\varvec{e}}_i=(0,\ldots ,0,1,0,\ldots ,0)^T\) has 1 in position \(i\) and zeros elsewhere. Hence \(\varvec{\Lambda }{\varvec{Q}}^{-1}{\varvec{P}}_t^0 = \varvec{\Lambda }^0{\varvec{Q}}^{-1}{\varvec{P}}_t^0\) and \({\varvec{B}}{\varvec{P}}_t^0 = {\varvec{B}}^0 {\varvec{P}}_t^{0}\). \(\square \)
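When \({\varvec{B}}\) is diagonalizable, the construction (106) reduces to \({\varvec{B}}^0={\varvec{B}}-\varvec{1}\varvec{\gamma }^T\), and the key fact used in the proof, that \({\varvec{B}}\) and \({\varvec{B}}^0\) act identically on centered vectors \({\varvec{x}}^0\), can be checked directly. The sketch below is our own illustration (symmetric island model, arbitrary \(s\) and \(m\)), not code from the paper:

```python
# Sketch of Proposition 6's key step: for a diagonalizable B, the
# matrix B^0 = B - 1 gamma^T agrees with B on centered vectors
# x^0 = x - (1,x) 1, and its spectral radius equals |lambda_2|.
# The island-model B and its parameters are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
s, m = 4, 0.2
B = (1 - m) * np.eye(s) + (m / (s - 1)) * (np.ones((s, s)) - np.eye(s))
gamma = np.full(s, 1.0 / s)                # left eigenvector for lambda_1 = 1
B0 = B - np.outer(np.ones(s), gamma)       # B^0 = B - 1 gamma^T

x = rng.random(s)
x0 = x - np.sum(gamma * x) * np.ones(s)    # centered part x^0

assert np.allclose(B @ x0, B0 @ x0)        # B and B^0 agree on x^0
lam2 = 1 - m * s / (s - 1)                 # subdominant eigenvalue here
assert np.isclose(np.max(np.abs(np.linalg.eigvals(B0))), abs(lam2))
```

The agreement on centered vectors holds because \(\varvec{\gamma }^T{\varvec{x}}^0=0\), which is exactly the orthogonality of Proposition 5 in coordinates.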

Appendix B: Proofs from Sect. 5

Proof of Proposition 1.

We notice that

$$\begin{aligned} E(H_{t+1,ij}|{\varvec{P}}_t)&= E\left( P_{t+1,i}(1-P_{t+1,j}) + P_{t+1,j}(1-P_{t+1,i})|{\varvec{P}}_t\right) \\&= E(P_{t+1,i}|{\varvec{P}}_t)\left( 1-E(P_{t+1,j}|{\varvec{P}}_t)\right) \\&+ E(P_{t+1,j}|{\varvec{P}}_t)\left( 1-E(P_{t+1,i}|{\varvec{P}}_t)\right) - 2 \text{ Cov }(P_{t+1,i},P_{t+1,j}|{\varvec{P}}_t)\\&= ({\varvec{B}}{\varvec{P}}_t)_i \left( 1-({\varvec{B}}{\varvec{P}}_t)_j\right) + ({\varvec{B}}{\varvec{P}}_t)_j \left( 1-({\varvec{B}}{\varvec{P}}_t)_i\right) - 2\Omega ({\varvec{P}}_t)_{ij}\\&= \sum _{k,l=1}^s b_{ik}b_{jl}\left( P_{tk}(1-P_{tl}) + (1-P_{tk})P_{tl}\right) - 2 \Omega ({\varvec{P}}_t)_{ij}, \end{aligned}$$

from which it easily follows that the two recursions in (25) and (26) are equivalent, with \(A_{ij,kl}\) and \(U_{ij,kl}\) related as in (27).

Next we will show that (25) and (28) are equivalent. Clearly (25) implies (28), so it remains to establish the reverse implication. Hence we assume that (28) is satisfied and we want to show that (25) holds for a unique square matrix \({\varvec{U}}=(U_{ij,kl})\) of order \(s^2\) with \(U_{ij,kl}=U_{ij,lk}\). Indeed, since \(\varvec{\Omega }({\varvec{P}})\) is a quadratic function of \({\varvec{P}}\) with \(\varvec{\Omega }({\varvec{0}})={\varvec{0}}\), there is a unique such matrix \({\varvec{U}}\) and a unique set of coefficients \(c_{ij,k}\) satisfying

$$\begin{aligned} \Omega ({\varvec{P}})_{ij} = \sum _k c_{ij,k}P_k - \sum _{k,l} U_{ij,kl}P_kP_l \end{aligned}$$
(109)

for all \(i,j\). On the other hand, according to the lower part of (28),

$$\begin{aligned} \Omega (\varvec{1}-{\varvec{P}})_{ij} = \sum _k c_{ij,k}(1-P_k) - \sum _{k,l} U_{ij,kl}(1-P_k)(1-P_l) \end{aligned}$$
(110)

should agree with (109). The quadratic terms of (109) and (110) are clearly identical, but in order for the linear and constant terms to agree as well,

$$\begin{aligned} c_{ij,k} = \sum _{l} U_{ij,kl} \end{aligned}$$
(111)

must hold for all \(k\) (recall that \(U_{ij,kl}=U_{ij,lk}\)). On the other hand, we can add and subtract linear terms in (109) according to

$$\begin{aligned} \Omega ({\varvec{P}})_{ij} = \frac{1}{2} \sum _{k,l} U_{ij,kl}\left( P_k(1-P_l)+P_l(1-P_k)\right) + \sum _{k} (c_{ij,k}-d_{ij,k})P_k, \qquad \quad \end{aligned}$$
(112)

where

$$\begin{aligned} d_{ij,k} = \sum _{l} U_{ij,kl} \end{aligned}$$

for all \(k\). But \(d_{ij,k}=c_{ij,k}\) according to (111), so that the second sum in (112) vanishes, and the proposition is proved. \(\square \)

Proof of Proposition 2

First of all, since \(\sum _{\tau =0}^\infty ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau \) is assumed to converge, it can be seen by insertion that (36) provides a solution to (33).

In order to prove (37), we get from the Cauchy–Schwarz inequality

$$\begin{aligned} |V_{t,ij}| \le \sqrt{V_{t,ii}V_{t,jj}} \le \max (V_{t,ii},V_{t,jj}), \end{aligned}$$

for all pairs \(i,j\). This implies

$$\begin{aligned} |{\varvec{V}}_t|_\infty = \max _{1\le i,j\le s} |V_{t,ij}| = \max _{1\le i \le s} V_{t,ii} = \max _{1\le i \le s} \frac{E_c\left( (P_{ti}^0-P_t)^2|P_t\right) }{P_t(1-P_t)}. \end{aligned}$$

We then use the definitions of \(|\cdot |_\infty \) and \(\Vert \cdot \Vert \) in Table 2, the triangle inequality and the matrix norm inequality \(\Vert ({\varvec{G}}-\varvec{\Pi }{\varvec{U}})^\tau \varvec{\Pi }\Vert \le \Vert ({\varvec{G}}-\varvec{\Pi }{\varvec{U}})^\tau \Vert \Vert \varvec{\Pi }\Vert \) in order to prove (38), since

$$\begin{aligned} |{\varvec{V}}|_\infty&= |\text{ vec }({\varvec{V}})|_\infty \\&= \left| \sum _{\tau =0}^\infty ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau \varvec{\Pi }{\varvec{U}}{\underline{{\mathbf{1}}}}\right| _\infty \\&\le \sum _{\tau =0}^\infty \left| ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau \varvec{\Pi }{\varvec{U}}{\underline{{\mathbf{1}}}}\right| _\infty \\&\le \sum _{\tau =0}^\infty \Vert ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau \Vert \Vert \varvec{\Pi }\Vert |{\varvec{U}}{\underline{{\mathbf{1}}}}|_\infty \\&= \text{ Mixtime } \Vert \varvec{\Pi }\Vert |{\varvec{U}}{\underline{{\mathbf{1}}}}|_\infty . \end{aligned}$$

We also have that

$$\begin{aligned} \Vert \varvec{\Pi }\Vert&= \max _{i,j} \sum _{1\le k,l\le s} |\Pi _{ij,kl}|\\&\le \max _{i,j} \sum _{1\le k,l\le s} |1_{\{(k,l)=(i,j)\}} - \gamma _{k}1_{\{j=l\}} - \gamma _l 1_{\{i=k\}} + \gamma _k\gamma _l|\\&\le \max _{i,j} \left( 1 + 2\sum _k \gamma _k + \sum _{k,l=1}^s \gamma _k\gamma _l\right) \\&= 4. \end{aligned}$$

Finally, (39)–(40) are proved in the same way as (37)–(38). \(\square \)
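The final bound \(\Vert \varvec{\Pi }\Vert \le 4\) uses only that the weights \(\gamma _k\) are nonnegative and sum to one, so it can be checked numerically for an arbitrary weight vector. The sketch below (our own illustration; the random \(\varvec{\gamma }\) and dimension \(s\) are arbitrary) assembles \(\varvec{\Pi }\) entrywise and evaluates its max-absolute-row-sum norm:

```python
# Sketch of the bound ||Pi|| <= 4 from the proof of Proposition 2:
# Pi_{ij,kl} = 1{(k,l)=(i,j)} - gamma_k 1{j=l} - gamma_l 1{i=k}
#            + gamma_k gamma_l, with gamma_k >= 0 summing to 1.
# The triangle inequality gives row sums at most 1 + 1 + 1 + 1 = 4.
import numpy as np

rng = np.random.default_rng(0)
s = 4
gamma = rng.random(s)
gamma = gamma / gamma.sum()                 # nonnegative, sums to one

Pi = np.zeros((s * s, s * s))
for i in range(s):
    for j in range(s):
        for k in range(s):
            for l in range(s):
                Pi[i * s + j, k * s + l] = (
                    float((k, l) == (i, j))
                    - gamma[k] * (j == l)
                    - gamma[l] * (i == k)
                    + gamma[k] * gamma[l]
                )

rownorm = np.max(np.sum(np.abs(Pi), axis=1))
assert rownorm <= 4.0 + 1e-12               # ||Pi|| <= 4
```

In practice the norm is typically strictly below 4, since the four terms inside the absolute value partially cancel; 4 is only the worst-case triangle-inequality bound.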

Proof of Proposition 3

In order to prove (41), we introduce for each pair of integers \(\tau ,\alpha \) with \(0\le \alpha \le \tau \) the set \({\mathcal {N}}_{\tau \alpha } = \{{\varvec{n}}= (n_0,n_1,\ldots ,n_{\alpha +1})\}\) of \(\binom{\tau }{\alpha }\) sequences \({\varvec{n}}\) such that \(0=n_0 < n_1 < \cdots < n_\alpha < n_{\alpha +1} = \tau +1\). Then

$$\begin{aligned} ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau = \sum _{\alpha =0}^\tau (-1)^\alpha \sum _{{\varvec{n}}\in {\mathcal {N}}_{\tau \alpha }} \left( \prod _{i=1}^{\alpha +1} ({\varvec{U}}^{\{i>1\}}({\varvec{G}}^0)^{n_i-n_{i-1}-1}\varvec{\Pi }^{\{i<\alpha +1\}})\right) , \qquad \quad \end{aligned}$$
(113)

where the terms in \({\mathcal {N}}_{\tau \alpha }\) correspond to all possible ways of picking \(\alpha \) terms \(\varvec{\Pi }{\varvec{U}}\) and \(\tau -\alpha \) terms \({\varvec{G}}^0\). Taking the matrix norm of (113) and multiplying by \({\varvec{U}}\) from the left and \(\varvec{\Pi }\) from the right, it follows from matrix norm inequalities that

$$\begin{aligned}&\Vert {\varvec{U}}\Vert \Vert ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau \Vert \Vert \varvec{\Pi }\Vert \nonumber \\&\quad \le \sum _{\alpha =0}^\tau \sum _{{\varvec{n}}\in {\mathcal {N}}_{\tau \alpha }} \Vert {\varvec{U}}\Vert \left\| \prod _{i=1}^{\alpha +1} ({\varvec{U}}^{\{i>1\}} ({\varvec{G}}^0)^{n_i-n_{i-1}-1}\varvec{\Pi }^{\{i<\alpha +1\}}) \right\| \Vert \varvec{\Pi }\Vert \nonumber \\&\quad \le \sum _{\alpha =0}^\tau \sum _{{\varvec{n}}\in {\mathcal {N}}_{\tau \alpha }} \Vert {\varvec{U}}\Vert \prod _{i=1}^{\alpha +1} \left( \Vert {\varvec{U}}^{\{i>1\}} ({\varvec{G}}^0)^{n_i-n_{i-1}-1}\varvec{\Pi }^{\{i<\alpha +1\}}\Vert \right) \Vert \varvec{\Pi }\Vert \nonumber \\&\quad \le \sum _{\alpha =0}^\tau \sum _{{\varvec{n}}\in {\mathcal {N}}_{\tau \alpha }} \Vert {\varvec{U}}\Vert \prod _{i=1}^{\alpha +1} \left( \Vert {\varvec{U}}\Vert ^{\{i>1\}} \Vert ({\varvec{G}}^0)^{n_i-n_{i-1}-1}\Vert \Vert \varvec{\Pi }^{\{i<\alpha +1\}}\Vert \right) \Vert \varvec{\Pi }\Vert \nonumber \\&\quad = \sum _{\alpha =0}^\tau \sum _{{\varvec{n}}\in {\mathcal {N}}_{\tau \alpha }} \prod _{i=1}^{\alpha +1} (\Vert {\varvec{U}}\Vert \Vert ({\varvec{G}}^0)^{n_i-n_{i-1}-1}\Vert \Vert \varvec{\Pi }\Vert ). \end{aligned}$$
(114)

Summing (114) over \(\tau \), then changing the order of summation between \(\alpha \) and \(\tau \), and finally substituting \(m_i=n_i-n_{i-1}-1\), we find that

$$\begin{aligned} \Vert \varvec{\Pi }\Vert \Vert {\varvec{U}}\Vert \text{ Mixtime }&= \Vert {\varvec{U}}\Vert \sum _{\tau =0}^\infty \Vert ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau \Vert \Vert \varvec{\Pi }\Vert \nonumber \\&\le \sum _{\tau =0}^\infty \sum _{\alpha =0}^\tau \sum _{{\varvec{n}}\in {\mathcal {N}}_{\tau \alpha }} \prod _{i=1}^{\alpha +1} \left( \Vert {\varvec{U}}\Vert \Vert ({\varvec{G}}^0)^{n_i-n_{i-1}-1}\Vert \Vert \varvec{\Pi }\Vert \right) \nonumber \\&= \sum _{\alpha =0}^\infty \sum _{\tau =\alpha }^\infty \sum _{{\varvec{n}}\in {\mathcal {N}}_{\tau \alpha }} \prod _{i=1}^{\alpha +1} \left( \Vert {\varvec{U}}\Vert \Vert ({\varvec{G}}^0)^{n_i-n_{i-1}-1}\Vert \Vert \varvec{\Pi }\Vert \right) \nonumber \\&= \sum _{\alpha =0}^\infty \sum _{m_1=0}^\infty \cdots \sum _{m_{\alpha +1}=0}^\infty \prod _{i=1}^{\alpha +1} \left( \Vert {\varvec{U}}\Vert \Vert ({\varvec{G}}^0)^{m_i}\Vert \Vert \varvec{\Pi }\Vert \right) \nonumber \\&= \sum _{\alpha =0}^\infty \prod _{i=1}^{\alpha +1} \left( \Vert {\varvec{U}}\Vert \sum _{m=0}^\infty \Vert ({\varvec{G}}^0)^m \Vert \Vert \varvec{\Pi }\Vert \right) \nonumber \\&= \sum _{\alpha =0}^\infty \left( \Vert {\varvec{U}}\Vert \sum _{m=0}^\infty \Vert ({\varvec{G}}^0)^m \Vert \Vert \varvec{\Pi }\Vert \right) ^{\alpha + 1}\nonumber \\&= \Vert \varvec{\Pi }\Vert \Vert {\varvec{U}}\Vert \sum _{m=0}^\infty \Vert ({\varvec{G}}^0)^m\Vert / (1 - \Vert \varvec{\Pi }\Vert \Vert {\varvec{U}}\Vert \sum _{m=0}^\infty \Vert ({\varvec{G}}^0)^m\Vert ). \qquad \qquad \end{aligned}$$
(115)

By induction with respect to \(\tau \), it can be seen that \(({\varvec{G}}^0)^\tau \text{ vec }({\varvec{V}}) = \text{ vec }(({\varvec{B}}^0)^\tau {\varvec{V}}(({\varvec{B}}^0)^T)^\tau )\). Writing \(({\varvec{G}}^0)^\tau =(G_{ij,kl}^{0(\tau )})\) and \(({\varvec{B}}^0)^\tau = (b_{ik}^{0(\tau )})\), this yields

$$\begin{aligned} G_{ij,kl}^{0(\tau )} = b_{ik}^{0(\tau )}b_{jl}^{0(\tau )}, \end{aligned}$$

and

$$\begin{aligned} \Vert ({\varvec{G}}^0)^\tau \Vert&= \max _{i,j} \sum _{k,l} |G_{ij,kl}^{0(\tau )}|\nonumber \\&= \max _{i,j} \sum _{k,l=1}^s |b_{ik}^{0(\tau )}| |b_{jl}^{0(\tau )}|\nonumber \\&= \max _i \sum _k |b_{ik}^{0(\tau )}| \cdot \max _j \sum _l |b_{jl}^{0(\tau )}|\nonumber \\&= \Vert ({\varvec{B}}^0)^\tau \Vert ^2. \end{aligned}$$
(116)
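The factorization in (116) reflects that \({\varvec{G}}^0\) acts on vec-format matrices as the Kronecker product \({\varvec{B}}^0\otimes {\varvec{B}}^0\). As an illustrative numerical check, outside the formal argument and with an arbitrary matrix standing in for \({\varvec{B}}^0\), the maximum absolute row sum norm of powers factorizes accordingly:

```python
import numpy as np

rng = np.random.default_rng(0)
s, tau = 4, 3
B = rng.normal(size=(s, s))          # arbitrary stand-in for B^0

def inf_norm(M):
    # maximum absolute row sum, the matrix norm used in (116)
    return np.abs(M).sum(axis=1).max()

G = np.kron(B, B)                    # G^0 acts as vec(V) -> vec(B^0 V (B^0)^T)
lhs = inf_norm(np.linalg.matrix_power(G, tau))
rhs = inf_norm(np.linalg.matrix_power(B, tau)) ** 2
assert np.isclose(lhs, rhs)
```

The identity holds for any real matrix, since \(|b_{ik}^{0(\tau )}b_{jl}^{0(\tau )}|\) factorizes over the row index pair \((i,j)\).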

Formula (41) then follows from (115) and (116). In order to verify (42), we use (106) and the Jordan decomposition (104) to deduce

$$\begin{aligned} ({\varvec{B}}^0)^\tau = {\varvec{Q}}\text{ diag }(0,\varvec{\Lambda }_2^\tau ,\ldots ,\varvec{\Lambda }_r^\tau ) {\varvec{Q}}^{-1}, \end{aligned}$$

where the middle matrix on the right hand side is block diagonal, with \(\Vert \varvec{\Lambda }_l^\tau \Vert = O(\tau ^{j_l-j_{l-1}-1}|\lambda _l|^\tau )\) as \(\tau \rightarrow \infty \), and \(j_l-j_{l-1}\) the order of the square matrix \(\varvec{\Lambda }_l\); see Cox and Miller (1965) for details. In particular, this implies that \(\Vert \varvec{\Lambda }_l^\tau \Vert \) converges to zero at a faster rate than \((|\lambda _2|+\epsilon )^\tau \) as \(\tau \rightarrow \infty \), for any \(0<\epsilon < 1-|\lambda _2|\). Then (42) follows, since

$$\begin{aligned} \Vert ({\varvec{B}}^0)^\tau \Vert \le \Vert {\varvec{Q}}\Vert \left( \max _{2\le l \le r} \Vert \varvec{\Lambda }_l^\tau \Vert \right) \Vert {\varvec{Q}}^{-1}\Vert . \end{aligned}$$

Finally, (43) is a simple consequence of (41) and (42), since

$$\begin{aligned} \sum _{\tau =0}^\infty \Vert ({\varvec{B}}^0)^\tau \Vert ^{2}&\le C^2 \sum _{\tau =0}^\infty (|\lambda _2| + \epsilon )^{2\tau }\\&= \frac{C^2}{1-(|\lambda _2|+\epsilon )^2}. \end{aligned}$$

\(\square \)
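The geometric decay underlying (43) can also be observed numerically. The following sketch (illustrative only, with a random matrix rescaled to spectral radius \(0.9\) standing in for \({\varvec{B}}^0\)) shows the partial sums of \(\sum _\tau \Vert ({\varvec{B}}^0)^\tau \Vert ^2\) stabilizing:

```python
import numpy as np

rng = np.random.default_rng(1)
s = 4
B0 = rng.normal(size=(s, s))
B0 *= 0.9 / np.max(np.abs(np.linalg.eigvals(B0)))   # rescale so the spectral radius is 0.9 < 1

inf_norm = lambda M: np.abs(M).sum(axis=1).max()
terms = [inf_norm(np.linalg.matrix_power(B0, t)) ** 2 for t in range(200)]
partial = np.cumsum(terms)
# geometric decay of ||(B^0)^tau|| makes the series in (43) converge
assert np.isclose(partial[-1], partial[len(partial) // 2], rtol=1e-6)
```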

Appendix C: Proof of Theorem 1

We start by showing that \(\text{ vec }({\varvec{V}}_t)\) and \(\text{ vec }(\varvec{\Sigma }_t)\) satisfy a system of equations similar to (33). To this end, since \(\varvec{\varepsilon }_t^0 = ({\varvec{I}}-\varvec{1}\varvec{\gamma })\varvec{\varepsilon }_t\), the lower part of (22) implies the recursion

$$\begin{aligned} {\varvec{V}}_{t+1}&= \frac{E_c\left( \varvec{\varepsilon }_{t+1}^0(\varvec{\varepsilon }_{t+1}^0)^T|P_{t}\right) }{P_{t}(1-P_{t})} + \frac{E_c\left( {\varvec{{\varvec{B}}}}^0 {\varvec{{\varvec{P}}}}_t^0 \left( {\varvec{{\varvec{B}}}}^0{\varvec{{\varvec{P}}}}_t^0\right) ^T|P_t\right) }{P_t(1-P_t)} + \varvec{\xi }_{t+1}\nonumber \\&= ({\varvec{I}}-\varvec{1}\varvec{\gamma })\varvec{\Sigma }_t ({\varvec{I}}-\varvec{1}\varvec{\gamma })^T + {\varvec{B}}^0 {\varvec{V}}_t ({\varvec{B}}^0)^T + \varvec{\xi }_{t+1}, \end{aligned}$$
(117)

where \(\varvec{\xi }_{t+1}\) is a remainder term that is nonzero since we conditioned on \(P_t\) rather than \(P_{t+1}\) and divided by \(P_t(1-P_t)\) rather than \(P_{t+1}(1-P_{t+1})\) on the right hand side of (117). Any departure of \(E_c(\varvec{\varepsilon }_{t+1}^0|P_t)\) from \(E(\varvec{\varepsilon }_{t+1}^0|P_t)={\varvec{0}}\) implies, in addition, that a cross covariance term is added to \(\varvec{\xi }_{t+1}\).

In vec format we may rewrite (117) as

$$\begin{aligned} \text{ vec }({\varvec{V}}_{t+1}) = \varvec{\Pi }\text{ vec }(\varvec{\Sigma }_t) + {\varvec{G}}^0\text{ vec }({\varvec{V}}_t) + \text{ vec }(\varvec{\xi }_{t+1}), \end{aligned}$$
(118)

with \(\varvec{\Pi }\) and \({\varvec{G}}^0\) matrices defined by \(\varvec{\Pi }\text{ vec }(\varvec{\Sigma }_t)=\text{ vec }(({\varvec{I}}-\varvec{1}\varvec{\gamma })\varvec{\Sigma }_t({\varvec{I}}-\varvec{1}\varvec{\gamma })^T)\) and \({\varvec{G}}^0\text{ vec }({\varvec{V}}_t)=\text{ vec }({\varvec{B}}^0{\varvec{V}}_t({\varvec{B}}^0)^T)\) respectively. Hence their entries are as in (34) and (35).
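Both defining relations are instances of the vec/Kronecker identity \(\text{ vec }({\varvec{A}}{\varvec{X}}{\varvec{A}}^T)=({\varvec{A}}\otimes {\varvec{A}})\text{ vec }({\varvec{X}})\) for the column-major vec operator. A small numerical sketch, with arbitrary matrices standing in for \(\varvec{\gamma }\), \({\varvec{B}}^0\), \(\varvec{\Sigma }_t\) and \({\varvec{V}}_t\), illustrates this:

```python
import numpy as np

rng = np.random.default_rng(2)
s = 4
gamma = rng.dirichlet(np.ones(s))[None, :]   # 1 x s weight vector with entries summing to one
one = np.ones((s, 1))
B0 = rng.normal(size=(s, s))
Sigma_t = rng.normal(size=(s, s))
V_t = rng.normal(size=(s, s))

A = np.eye(s) - one @ gamma                  # I - 1 gamma
Pi = np.kron(A, A)                           # Pi vec(Sigma) = vec(A Sigma A^T)
G0 = np.kron(B0, B0)                         # G^0 vec(V) = vec(B^0 V (B^0)^T)

vec = lambda M: M.flatten(order="F")         # column-major vec, matching the kron identity
assert np.allclose(Pi @ vec(Sigma_t), vec(A @ Sigma_t @ A.T))
assert np.allclose(G0 @ vec(V_t), vec(B0 @ V_t @ B0.T))
```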

For the standardized genetic drift covariance matrix, we first expand (31) as

(119)

where the remainder term \(\varvec{\zeta }_t\) occurs when replacing the inner expectation \(E\) by \(E_c\). Then we expand \(\varvec{\Omega }(P_t{\varvec{1}} + {\varvec{P}}_t^0)\) as in (30), take the expectation conditional on \(P_t\), and switch the index from \(t\) to \(t+1\), to deduce that

$$\begin{aligned} \text{ vec }(\varvec{\Sigma }_{t+1})&= {\varvec{U}}{\underline{{\mathbf{1}}}} - {\varvec{U}}\text{ vec }({\varvec{V}}_{t+1}) + {\varvec{U}}_{t+1}\varvec{\mu }_{t+1} + \text{ vec }(\varvec{\zeta }_{t+1}) \nonumber \\&= {\varvec{U}}{\underline{{\mathbf{1}}}} - {\varvec{U}}\text{ vec }({\varvec{V}}_{t+1}) + \varvec{\eta }_{t+1}, \end{aligned}$$
(120)

where \({\varvec{U}}_t = (U_{tij,k})\) is an \(s^2\times s\) matrix, whose elements are defined as \(U_{tij,k}=(1-2P_t)\sum _l U_{ij,kl}\), so that the last term on the right hand side of (30) can be written as \({\varvec{U}}_t{\varvec{P}}_t^0\). The last term on the right hand side of (120) is defined by

$$\begin{aligned} \varvec{\eta }_{t} = {\varvec{U}}_t \varvec{\mu }_t + \text{ vec }(\varvec{\zeta }_t) \end{aligned}$$

with \(\varvec{\mu }_t\) as in (44).

Now (118) and (120) define a system of equations which only differs from (33) in that the remainder terms \(\text{ vec }(\varvec{\xi }_{t+1})\) and \(\varvec{\eta }_{t+1}\) have been added. For simplicity of notation, we write \(\tilde{\varvec{\xi }}_t = \text{ vec }(\varvec{\xi }_t)=\text{ vec }(\xi _{t,ij};1\le i,j\le s)\), a column vector of length \(s^2\). Combining (118) and (120), we get

$$\begin{aligned} \left( \begin{array}{c} \text{ vec }(\varvec{\Sigma }_{t+1}) \\ \text{ vec }({\varvec{V}}_{t+1}) \end{array} \right) = {\varvec{T}}\left( \begin{array}{c} \text{ vec }(\varvec{\Sigma }_{t}) \\ \text{ vec }({\varvec{V}}_{t}) \end{array} \right) + \left( \begin{array}{c} {\varvec{U}}{\underline{{\mathbf{1}}}} \\ {\varvec{0}}\end{array} \right) + \left( \begin{array}{c} \varvec{\eta }_{t+1}-{\varvec{U}}\tilde{\varvec{\xi }}_{t+1} \\ \tilde{\varvec{\xi }}_{t+1} \end{array} \right) , \end{aligned}$$
(121)

where

$$\begin{aligned} {\varvec{T}}= \left( \begin{array}{c@{\quad }c} {\varvec{0}}&{} -{\varvec{U}}\\ {\varvec{0}}&{} {\varvec{I}}\end{array}\right) \left( \begin{array}{c@{\quad }c} {\varvec{I}}&{} {\varvec{0}}\\ \varvec{\Pi }&{} {\varvec{G}}^0 \end{array}\right) . \end{aligned}$$

On the other hand, it follows from (33) that

$$\begin{aligned} \left( \begin{array}{c} \text{ vec }(\varvec{\Sigma }) \\ \text{ vec }({\varvec{V}}) \end{array} \right) = {\varvec{T}}\left( \begin{array}{c} \text{ vec }(\varvec{\Sigma }) \\ \text{ vec }({\varvec{V}}) \end{array} \right) + \left( \begin{array}{c} {\varvec{U}}{\underline{{\mathbf{1}}}} \\ {\varvec{0}}\end{array} \right) . \end{aligned}$$
(122)

Taking the difference of (121) and (122), we find that

$$\begin{aligned} \varvec{\delta }_{t} = \left( \begin{array}{c} \text{ vec }(\Delta \varvec{\Sigma }_t) \\ \text{ vec }(\Delta {\varvec{V}}_t) \end{array} \right) , \end{aligned}$$

satisfies

$$\begin{aligned} \varvec{\delta }_{t+1} = {\varvec{T}}\varvec{\delta }_{t} + \left( \begin{array}{c} \varvec{\eta }_{t+1} -{\varvec{U}}\tilde{\varvec{\xi }}_{t+1} \\ \tilde{\varvec{\xi }}_{t+1} \end{array} \right) \Longrightarrow \varvec{\delta }_t = \sum _{\tau =0}^\infty {\varvec{T}}^\tau \left( \begin{array}{c} \varvec{\eta }_{t-\tau }-{\varvec{U}}\tilde{\varvec{\xi }}_{t-\tau } \\ \tilde{\varvec{\xi }}_{t-\tau } \end{array} \right) , \end{aligned}$$
(123)

provided that the series converges. It can be shown by induction with respect to \(\tau \) that

$$\begin{aligned} {\varvec{T}}^\tau = \left( \begin{array}{c@{\quad }c} -{\varvec{U}}({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^{\tau -1}\varvec{\Pi }&{} -{\varvec{U}}({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^{\tau -1}{\varvec{G}}^0 \\ ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^{\tau -1}\varvec{\Pi }&{} ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^{\tau -1}{\varvec{G}}^0 \end{array}\right) \end{aligned}$$

for all \(\tau \ge 1\). Inserting this formula into (123), one obtains

$$\begin{aligned} \varvec{\delta }_t = \left( \begin{array}{c} -\varvec{\eta }_t \\ {\varvec{0}}\end{array}\right) + \left( \begin{array}{c} -{\varvec{U}}\\ {\varvec{I}}\end{array}\right) \sum _{\tau =0}^\infty ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^{\tau }\left( \varvec{\Pi }\varvec{\eta }_{t-\tau -1} + \tilde{\varvec{\xi }}_{t-\tau }\right) . \end{aligned}$$
(124)
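The block formula for \({\varvec{T}}^\tau \) used in deriving (124) can be verified numerically for small \(\tau \). The following sketch is illustrative only, with arbitrary square matrices standing in for \({\varvec{U}}\), \(\varvec{\Pi }\) and \({\varvec{G}}^0\):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3                                   # dimension of the vec'd blocks (s^2 in the text)
U, Pi, G0 = (rng.normal(size=(d, d)) for _ in range(3))
I, Z = np.eye(d), np.zeros((d, d))

# T as defined after (121)
T = np.block([[Z, -U], [Z, I]]) @ np.block([[I, Z], [Pi, G0]])
M = G0 - Pi @ U                         # G^0 - Pi U

for tau in range(1, 6):
    Mp = np.linalg.matrix_power(M, tau - 1)
    T_tau = np.block([[-U @ Mp @ Pi, -U @ Mp @ G0],
                      [Mp @ Pi,       Mp @ G0]])
    assert np.allclose(np.linalg.matrix_power(T, tau), T_tau)
```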

Since \(\tilde{\varvec{\xi }}_t\) contains the same elements as \(\varvec{\xi }_t\), we have that \(|\tilde{\varvec{\xi }}_t|_\infty = |\varvec{\xi }_t|_\infty \), and moreover, \(|\varvec{\eta }_t|_\infty \le |\varvec{\zeta }_t|_\infty + \Vert {\varvec{U}}_t\Vert |\varvec{\mu }_t|_\infty \). Hence it follows, by taking the \(|\cdot |_\infty \)-norm of the upper and lower part of (124), that

$$\begin{aligned} |\Delta \varvec{\Sigma }_t|_\infty&\le |\varvec{\zeta }_t|_\infty \!+\! \Vert {\varvec{U}}_t\Vert |\varvec{\mu }_t|_\infty \nonumber \\&\!+ \Vert {\varvec{U}}\Vert \sum _{\tau =0}^\infty \Vert ({\varvec{G}}^0\!-\!\varvec{\Pi }{\varvec{U}})^{\tau }\Vert \left( \Vert \varvec{\Pi }\Vert (|\varvec{\zeta }_t|_\infty \!+\! \Vert {\varvec{U}}_{t-\tau -1}\Vert |\varvec{\mu }_{t-\tau -1}|_\infty ) \!+\! |\varvec{\xi }_{t-\tau }|_\infty \right) \nonumber \\ \end{aligned}$$
(125)

and

$$\begin{aligned} |\Delta {\varvec{V}}_t|_\infty \le \sum _{\tau =0}^\infty \Vert ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^{\tau }\Vert \left( \Vert \varvec{\Pi }\Vert (|\varvec{\zeta }_t|_\infty + \Vert {\varvec{U}}_{t-\tau -1}\Vert |\varvec{\mu }_{t-\tau -1}|_\infty ) + |\varvec{\xi }_{t-\tau }|_\infty \right) . \nonumber \\ \end{aligned}$$
(126)

Since

$$\begin{aligned} \sum _k |U_{tij,k}| \le \sum _k \left| \sum _l U_{ij,kl}\right| \le \sum _{kl} |U_{ij,kl}|, \end{aligned}$$

it follows that \(\Vert {\varvec{U}}_t\Vert \le \Vert {\varvec{U}}\Vert \). Hence we may replace \(\Vert {\varvec{U}}_t\Vert \) and \(\Vert {\varvec{U}}_{t-\tau -1}\Vert \) in (125)–(126) by their upper bound \(\Vert {\varvec{U}}\Vert \), take the conditional expectation \(E_c\) on both sides of these two inequalities, and finally let \(t\rightarrow \infty \), thereby obtaining (48) and (49). \(\square \)

Appendix D: Verifying formulas for \(\Omega ({\varvec{P}}_t)\) and \(N_{eV}^\mathrm{{ appr}}\) for various reproduction and migration models

We will start by verifying (30) (and hence also (120)) separately for reproduction scenarios 1, 2 and 3.

Reproduction scenario 1. For this reproduction scenario, we write

$$\begin{aligned} P^*_{tki} = P_{tk} + (\tilde{P}_{tk}-P_{tk}) + (P^*_{tki}-\tilde{P}_{tk}). \end{aligned}$$

It follows from (10) and (53) that

$$\begin{aligned} \varepsilon _{t+1,i} = \sum _{k=1}^s b_{ik}(\tilde{P}_{tk}-P_{tk}) + \sum _{k=1}^s b_{ik}(P^*_{tki}-\tilde{P}_{tk}). \end{aligned}$$

We further have that

$$\begin{aligned} \text{ Var }(\tilde{P}_{tk}-P_{tk}|{\varvec{P}}_t) = \left( \frac{1}{2N_{ek}} - \frac{1}{2Nu_k}\right) P_{tk}(1-P_{tk})(1+o(1)) \end{aligned}$$
(127)

and

$$\begin{aligned} \text{ Var }(P^*_{tki}-\tilde{P}_{tk}|{\varvec{P}}_t) = \frac{P_{tk}(1-P_{tk})}{2Nu_km_{ki}}(1+o(1)). \end{aligned}$$

Combining the last three displayed expressions, we arrive at (54). \(\square \)

Reproduction scenario 2. Write

$$\begin{aligned} \varepsilon _{t+1,i} = \sum _{k=1}^s b_{ik}(P^*_{tki}-P_{tk}), \end{aligned}$$
(128)

and introduce \(C_{kij} = \text{ Cov }(\nu _{ki}^l,\nu _{kj}^l)\) and \(\tilde{C}_{kij}=\text{ Cov }(\nu _{ki}^l,\nu _{kj}^{l^\prime })\) when \(l\ne l^\prime \). Because of the assumed exchangeability of \(\{\varvec{\nu }_k^l\}_{l=1}^{2Nu_k}\), \(C_{kij}\) and \(\tilde{C}_{kij}\) do not depend on \(l\) and \((l,l^\prime )\) respectively. Since (2) holds exactly, with remainder term \(o(1)\) equal to zero, the variance of the left hand side must be zero, and this implies \(\tilde{C}_{kij}=-C_{kij}/(2Nu_k-1)\). Therefore, it follows from (55) that

$$\begin{aligned} \text{ Cov }(P^*_{tki},P^*_{tkj}|{\varvec{P}}_t)&= \frac{ 2Nu_kP_{tk}C_{kij} + 2Nu_kP_{tk}(2Nu_kP_{tk}-1)\tilde{C}_{kij}}{(2Nu_k)^2m_{ki}m_{kj}}\\&\sim \frac{C_{kij}}{m_{ki}m_{kj}} \frac{P_{tk}(1-P_{tk})}{2Nu_k}. \end{aligned}$$

Combining this with (128), we arrive at

$$\begin{aligned} \varvec{\Omega }({\varvec{P}}_t)_{ij} = \sum _{k=1}^s b_{ik}b_{jk}\frac{C_{kij}}{2Nu_k m_{ki}m_{kj}} P_{tk}(1-P_{tk}), \end{aligned}$$

which is equivalent to (56). \(\square \)

Reproduction scenario 3. In order to verify (120), we first notice from (10) and (57) that

$$\begin{aligned} \varepsilon _{t+1,i} = (P_{t+1,i}-\check{P}_{ti}) + \sum _{k=1}^s b_{ik}(\tilde{P}_{tk}-P_{tk}) + \sum _{k=1}^s (B_{ik}-b_{ik})P_{tk} + \text{ rem }, \qquad \quad \end{aligned}$$
(129)

with \(\text{ rem } = \sum _{k=1}^s (B_{ik}-b_{ik})(\tilde{P}_{tk}-P_{tk})\) a remainder term that vanishes when \(N_{ek}=Nu_k\) for all \(k\) and which is otherwise asymptotically negligible when \(\alpha _i\rightarrow \infty \) as \(N\rightarrow \infty \). It follows from (57) and (59) that

$$\begin{aligned} \text{ Var }(P_{t+1,i}-\check{P}_{ti}|{\varvec{P}}_t) = \frac{({\varvec{B}}{\varvec{P}}_t)_i(1-({\varvec{B}}{\varvec{P}}_t)_i)}{2Nu_i}(1+o(1)), \end{aligned}$$

and

$$\begin{aligned} \text{ Var }\left( \sum _{k=1}^s (B_{ik}-b_{ik})P_{tk}|{\varvec{P}}_t\right)&= \frac{1}{\alpha _i+1}\sum _{k=1}^s P_{tk}^2 b_{ik} - \frac{1}{\alpha _i+1} \sum _{k,l=1}^s P_{tk}P_{tl}b_{ik}b_{il}\\&= \frac{1}{\alpha _i+1}\sum _{k=1}^s \left( P_{tk} - ({\varvec{B}}{\varvec{P}}_t)_i \right) ^2 b_{ik}. \end{aligned}$$

In conjunction with (127) and (129), this proves (60). \(\square \)
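The last equality in the variance computation above, rewriting \(\sum _k P_{tk}^2 b_{ik} - \sum _{k,l} P_{tk}P_{tl}b_{ik}b_{il}\) as the weighted variance \(\sum _k (P_{tk}-({\varvec{B}}{\varvec{P}}_t)_i)^2 b_{ik}\), can be checked numerically. A sketch with a random probability vector standing in for row \(i\) of \({\varvec{B}}\):

```python
import numpy as np

rng = np.random.default_rng(4)
s = 6
b = rng.dirichlet(np.ones(s))      # row i of B: nonnegative entries summing to one
P = rng.uniform(size=s)            # stand-in for the allele frequencies P_t

mean = b @ P                       # (B P_t)_i for this row
lhs = (b * P ** 2).sum() - sum(P[k] * P[l] * b[k] * b[l]
                               for k in range(s) for l in range(s))
rhs = (b * (P - mean) ** 2).sum()  # weighted variance form
assert np.isclose(lhs, rhs)
```

The identity is the usual decomposition of a second moment into variance plus squared mean, weighted by \(b_{ik}\).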

Verifying (65). The reproduction scenario 3 expression for \(\varvec{\Sigma }\) is obtained by combining the upper equation of (33) with the relevant entries for \(U_{ij,kl}\) in Table 3. When \(N_{ek}=Nu_k\) for \(k=1,\ldots ,s\), all non-diagonal (\(i\ne j\)) terms vanish and then the denominator of (62) can be written as

$$\begin{aligned} 2{\varvec{u}}\varvec{\Sigma }{\varvec{u}}^T&= 2\sum _{i,j,k,l} u_iu_j U_{ij,kl}- 2\sum _{i,j} u_iu_j \sum _{k,l} U_{ij,kl} V_{kl}\\&= 2\sum _{i,k,l} u_i^2 U_{ii,kl}- 2\sum _{i} u_i^2 \sum _{k,l} U_{ii,kl} V_{kl}\\&= \frac{1}{N}\left( 1 - \sum _i u_i ({\varvec{B}}{\varvec{V}}{\varvec{B}}^T)_{ii}\right) + 2\sum _i \frac{u_i^2}{\alpha _i+1}\left( \sum _k b_{ik}V_{kk} - ({\varvec{B}}{\varvec{V}}{\varvec{B}}^T)_{ii}\right) , \end{aligned}$$

which yields (65). \(\square \)

Deriving explicit expressions for \(N_{eV}^\mathrm{{ appr}}\) and \(F_{ST}^\mathrm{{ appr}}\) in the island model. Since \(\varvec{\gamma }={\varvec{u}}\) for the island model, we can apply (62) and (63), with \({\varvec{u}}=\varvec{1}^T/s\), to deduce

$$\begin{aligned} N_{eV}^\mathrm{{ appr}} = \frac{1}{2\varvec{1}^T\varvec{\Sigma }\varvec{1}/s^2} \end{aligned}$$
(130)

and

$$\begin{aligned} F_{ST}^\mathrm{{ appr}} = \frac{1}{s}\text{ tr }({\varvec{V}}). \end{aligned}$$
(131)

We will start by giving a more explicit expression for \({\varvec{V}}\). It follows from (66) that \({\varvec{B}}{\varvec{q}}= (1-m){\varvec{q}}\) for any vector \({\varvec{q}}\) with \(({\varvec{q}},\varvec{1})=0\). Hence \(\lambda _2=\cdots =\lambda _s=1-m\). In this case it is particularly convenient to put \(\lambda _1^0=1-m\) in the definition of \({\varvec{B}}^0\), since then, according to (106), \({\varvec{B}}^0=(1-m){\varvec{I}}\). The lower part of (33) can be written as \({\varvec{V}}={\varvec{B}}^0{\varvec{V}}({\varvec{B}}^0)^T + \tilde{\varvec{\Sigma }}\), where \(\tilde{\varvec{\Sigma }} = ({\varvec{I}}-\varvec{1}\varvec{\gamma })\varvec{\Sigma }({\varvec{I}}-\varvec{1}\varvec{\gamma })^T\). We can repeatedly apply this equation to deduce that

$$\begin{aligned} {\varvec{V}}= \sum _{r=0}^{\infty } (1-m)^{2r}\tilde{\varvec{\Sigma }} = \frac{\tilde{\varvec{\Sigma }}}{1-(1-m)^2}, \end{aligned}$$

and hence (131) can be rewritten as

$$\begin{aligned} \left( 1-(1-m)^2\right) F_{ST}^\mathrm{{ appr}} = \frac{1}{s}\text{ tr }(\tilde{\varvec{\Sigma }}) = \frac{1}{s}\left( \text{ tr }(\varvec{\Sigma }) - \frac{1}{s}\varvec{1}^T\varvec{\Sigma }\varvec{1}\right) . \end{aligned}$$
(132)
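The geometric series formula for \({\varvec{V}}\) used above solves the fixed point equation \({\varvec{V}}={\varvec{B}}^0{\varvec{V}}({\varvec{B}}^0)^T+\tilde{\varvec{\Sigma }}\) with \({\varvec{B}}^0=(1-m){\varvec{I}}\). A quick numerical sketch, with an arbitrary symmetric matrix standing in for \(\tilde{\varvec{\Sigma }}\):

```python
import numpy as np

rng = np.random.default_rng(5)
s, m = 4, 0.1
S = rng.normal(size=(s, s))
S = S + S.T                        # symmetric stand-in for tilde-Sigma

V = np.zeros((s, s))
for _ in range(500):               # iterate V <- B^0 V (B^0)^T + tilde-Sigma, B^0 = (1-m) I
    V = (1 - m) ** 2 * V + S
assert np.allclose(V, S / (1 - (1 - m) ** 2))
```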

Therefore, in view of (130) and (132), it remains to find \(\varvec{\Sigma }\).

For reproduction scenario 1, it can be deduced from (120) that (54) simplifies to

$$\begin{aligned} \Sigma _{ij}&= \left( \frac{1}{2N_e}-\frac{1}{2N/s}\right) \left( \frac{2m-m^2}{s} + (1-m)^2 1_{\{i=j\}}\right) + \frac{1_{\{i=j\}}}{2N/s}\\&-\left( \frac{1}{2N_e}\!-\!\frac{1}{2N/s}\right) \left( \frac{m^2}{s^2}\text{ tr }({\varvec{V}}) \!+\! \frac{V_{ii}\!+\!V_{jj}}{2}\left( 2\frac{m}{s}(1-m) \!+\! 1_{\{i=j\}}(1-m)^2\right) \right) \\&- \frac{1_{\{i=j\}}}{2N/s}\left( \frac{m}{s}\text{ tr }({\varvec{V}}) + (1-m)V_{ii}\right) \end{aligned}$$

for the island model, so that

$$\begin{aligned} \frac{2}{s^2}\varvec{1}^T\varvec{\Sigma }\varvec{1}= \frac{1}{sN_e}\left( 1-\frac{1}{s}\text{ tr }({\varvec{V}})\right) = \frac{1}{sN_e}(1-F_{ST}^\mathrm{{ appr}}) \end{aligned}$$
(133)

and

$$\begin{aligned} \frac{1}{s}\text{ tr }(\tilde{\varvec{\Sigma }}) = \frac{s-1}{s}\frac{1}{2\tilde{N}}(1-F_{ST}^\mathrm{{ appr}}). \end{aligned}$$
(134)

Combining (130) and (133) we arrive at (67), and inserting (134) into (132) and solving for \(F_{ST}^\mathrm{{ appr}}\) we arrive at (68).

For reproduction scenario 3, a similar simplification of (60) leads to

$$\begin{aligned} \frac{2}{s^2}\varvec{1}^T\varvec{\Sigma }\varvec{1}&= \frac{1}{sN_e} - \left( \frac{1}{N_e}-\frac{1-(1-m)^2}{N/s}\right) \frac{1}{s^2}\text{ tr }({\varvec{V}}) + \frac{2\left( 1-(1-m)^2\right) }{\alpha +1}\frac{1}{s^2}\text{ tr }({\varvec{V}})\nonumber \\&= \frac{1}{sN_e} - \left( \frac{1}{N_e}-\frac{1-(1-m)^2}{N/s}\right) \frac{1}{s}F_{ST}^\mathrm{{ appr}} + \frac{2\left( 1-(1-m)^2\right) }{\alpha +1}\frac{1}{s}F_{ST}^\mathrm{{ appr}}, \nonumber \\ \end{aligned}$$
(135)

and

$$\begin{aligned} \frac{1}{s}\text{ tr }(\tilde{\varvec{\Sigma }})&= \frac{s-1}{s}\left( \frac{1}{2\tilde{N}}-\frac{(1-m)^2}{2N_e}\frac{1}{s}\text{ tr }({\varvec{V}}) + \frac{1-(1-m)^2}{\alpha +1} \frac{1}{s}\text{ tr }({\varvec{V}})\right) \nonumber \\&= \frac{s-1}{s}\left( \frac{1}{2\tilde{N}}-\frac{(1-m)^2}{2N_e}F_{ST}^\mathrm{{ appr}} + \frac{1-(1-m)^2}{\alpha +1} F_{ST}^\mathrm{{ appr}}\right) . \end{aligned}$$
(136)

Inserting (135) into (130) we arrive at (69), and plugging (136) into (132) and solving for \(F_{ST}^\mathrm{{ appr}}\) we arrive at (70). \(\square \)

Appendix E: Proof of Theorem 2

In order to prove Theorem 2, we first need two lemmas, which we state for a single biallelic locus:

Lemma 1

In the one locus biallelic definitions (12) and (13) of \(N_{eV,t}^{{\varvec{w}}}=Y/X\) and \(F_{ST,t}^{{\varvec{w}}}=Z/Y\), the conditional expected values of the numerators and denominators equal

$$\begin{aligned} E_c(Y|P_t)&= E_c\left( P_t^{{\varvec{w}}}(1-P_t^{{\varvec{w}}})|P_t\right) \nonumber \\&= \left( 1 - ({\varvec{w}}-\varvec{\gamma }){\varvec{V}}_t({\varvec{w}}-\varvec{\gamma })^T + (1-2P_t)({\varvec{w}}-\varvec{\gamma })\varvec{\mu }_t\right) P_t(1-P_t)\nonumber \\&= (1-\text{ tr }({\varvec{C}}_Y{\varvec{V}}_t) + {\varvec{c}}_Y\varvec{\mu }_t)P_t(1-P_t), \end{aligned}$$
(137)
$$\begin{aligned} E_c(Z|P_t)&= E_c\left( \sum _{i=1}^s w_i (P_{ti}-P_t^{{\varvec{w}}})^2|P_t\right) \nonumber \\&= \left( \sum _{i=1}^s w_i {\varvec{V}}_{tii} - ({\varvec{w}}-\varvec{\gamma }){\varvec{V}}_t({\varvec{w}}-\varvec{\gamma })^T\right) P_t(1-P_t)\nonumber \\&= \text{ tr }({\varvec{C}}_Z{\varvec{V}}_t)P_t(1-P_t) \end{aligned}$$
(138)

and

$$\begin{aligned} E_c(X|P_t)&= 2E_c\left( E((P_{t+1}^{{\varvec{w}}}-P_t^{{\varvec{w}}})^ 2|P_t^{{\varvec{w}}})|P_t\right) \nonumber \\&= 2\left( {\varvec{w}}({\varvec{B}}-{\varvec{I}})({\varvec{V}}_t-\varvec{\varsigma }_t)({\varvec{B}}-{\varvec{I}})^T{\varvec{w}}^T + {\varvec{w}}(\varvec{\Sigma }_t-\varvec{\zeta }_t){\varvec{w}}^T\right) P_t(1-P_t)\nonumber \\&= \left( \text{ tr }\left( {\varvec{C}}_X({\varvec{V}}_t-\varvec{\varsigma }_t)\right) + \text{ tr }\left( {\varvec{C}}_X^\prime (\varvec{\Sigma }_t-\varvec{\zeta }_t)\right) \right) P_t(1-P_t) \end{aligned}$$
(139)

respectively, where \({\varvec{C}}_Y = ({\varvec{w}}-\varvec{\gamma })^T({\varvec{w}}-\varvec{\gamma })\), \({\varvec{c}}_Y = (1-2P_t)({\varvec{w}}-\varvec{\gamma })\), \({\varvec{C}}_Z = \text{ diag }({\varvec{w}}) - {\varvec{w}}^T{\varvec{w}}\), \({\varvec{C}}_X = 2({\varvec{B}}-{\varvec{I}})^T ({\varvec{w}}-\varvec{\gamma })^T({\varvec{w}}-\varvec{\gamma })({\varvec{B}}-{\varvec{I}})\), \({\varvec{C}}_X^\prime = 2{\varvec{w}}^T{\varvec{w}}\), \(\varvec{\mu }_t\) and \(\varvec{\zeta }_t\) are the remainder terms defined in (44) and (119), and \(\varvec{\varsigma }_t\) is another remainder term, defined in (140) below.

Proof

We only prove the first parts of (137)–(139), and leave the second parts to the reader. Starting with (137), we find that

$$\begin{aligned}&E_c\left( P_t^{{\varvec{w}}}(1-P_t^{{\varvec{w}}})|P_t\right) \\&\qquad = P_t(1-P_t) - E_c\left( (P_t^{{\varvec{w}}}-P_t)^2|P_t \right) + (1-2P_t)E_c\left( P_t^{{\varvec{w}}}-P_t|P_t \right) \\&\qquad = P_t(1-P_t) - E_c\left( (({\varvec{w}}-\varvec{\gamma }){\varvec{P}}_t^0)^2|P_t\right) + (1-2P_t)E_c\left( ({\varvec{w}}-\varvec{\gamma }){\varvec{P}}_t^0|P_t \right) \\&\qquad = P_t(1-P_t)\left( 1- ({\varvec{w}}-\varvec{\gamma }){\varvec{V}}_t({\varvec{w}}-\varvec{\gamma })^T + (1-2P_t)({\varvec{w}}-\varvec{\gamma })\varvec{\mu }_t\right) , \end{aligned}$$

where in the second equality we used \(P_t^{{\varvec{w}}} - P_t = ({\varvec{w}}-\varvec{\gamma }){\varvec{P}}_t = ({\varvec{w}}-\varvec{\gamma }){\varvec{P}}_t^0\). For (139) we use (21) and \(({\varvec{B}}-{\varvec{I}})\varvec{1}= {\varvec{0}}\) to deduce

$$\begin{aligned} P_{t+1}^{{\varvec{w}}}&= {\varvec{w}}{\varvec{P}}_{t+1}\\&= {\varvec{w}}{\varvec{B}}{\varvec{P}}_t + {\varvec{w}}\varvec{\varepsilon }_{t+1}\\&= P_t^{{\varvec{w}}} + {\varvec{w}}({\varvec{B}}-{\varvec{I}}){\varvec{P}}_t + {\varvec{w}}\varvec{\varepsilon }_{t+1}\\&= P_t^{{\varvec{w}}} + {\varvec{w}}({\varvec{B}}-{\varvec{I}}){\varvec{P}}_t^0 + {\varvec{w}}\varvec{\varepsilon }_{t+1}. \end{aligned}$$

We introduce the ascertainment bias term

$$\begin{aligned} \varvec{\varsigma }_t = \frac{E_c\left( E_c\left( {\varvec{P}}_t^0({\varvec{P}}_t^0)^ T|P_t^{{\varvec{w}}},P_t\right) |P_t\right) -E_c\left( E \left( {\varvec{P}}_t^0({\varvec{P}}_t^0)^T|P_t^{{\varvec{w}}},P_t\right) |P_t\right) }{P_t(1-P_t)}, \qquad \end{aligned}$$
(140)

which quantifies the effect of replacing the inner expectation \(E\) of \({\varvec{P}}_t^0({\varvec{P}}_t^0)^T\) by \(E_c\). Then we can write

$$\begin{aligned}&E_c\left( E\left( (P_{t+1}^{{\varvec{w}}}-P_t^{{\varvec{w}}})^2|P_t^{{\varvec{w}}} \right) |P_t\right) \\&\quad = E_c\left( E\left( (P_{t+1}^{{\varvec{w}}}-P_t^{ {\varvec{w}}})^2|P_t^{{\varvec{w}}},P_t\right) |P_t\right) \\&\quad = {\varvec{w}}({\varvec{B}}-{\varvec{I}})E_c\left( E\left( {\varvec{P}}_t^0({\varvec{P}}_t^0)^T|P_t^{{\varvec{w}}},P_t \right) |P_t\right) ({\varvec{B}}-{\varvec{I}})^T{\varvec{w}}^T\\&\quad \quad + {\varvec{w}}E_c\left( E\left( \varvec{\varepsilon }_{t+1}\varvec{\varepsilon }_{t+1}^T|P_t^{{\varvec{w}}},P_t\right) |P_t\right) {\varvec{w}}^T\\&\quad = {\varvec{w}}({\varvec{B}}-{\varvec{I}})E_c\left( {\varvec{P}}_t^0({\varvec{P}}_t^0)^T|P_t\right) ({\varvec{B}}-{\varvec{I}})^T{\varvec{w}}^T - {\varvec{w}}({\varvec{B}}-{\varvec{I}})\varvec{\varsigma }_t({\varvec{B}}-{\varvec{I}})^T{\varvec{w}}^TP_t(1-P_t)\\&\quad \quad + {\varvec{w}}E_c\left( \varvec{\varepsilon }_{t+1}\varvec{\varepsilon }_{t+1}^T|P_t\right) {\varvec{w}}^T - {\varvec{w}}\varvec{\zeta }_t{\varvec{w}}^TP_t(1-P_t)\\&\quad = P_t(1-P_t) \left( {\varvec{w}}({\varvec{B}}-{\varvec{I}})({\varvec{V}}_t-\varvec{\varsigma }_t)({\varvec{B}}-{\varvec{I}})^T{\varvec{w}}^T + {\varvec{w}}(\varvec{\Sigma }_t-\varvec{\zeta }_t){\varvec{w}}^T\right) . \end{aligned}$$

In order to verify (138), we first write

$$\begin{aligned} {\varvec{P}}_t - P_t^{{\varvec{w}}}\varvec{1}= ({\varvec{I}}-\varvec{1}{\varvec{w}}){\varvec{P}}_t = ({\varvec{I}}-\varvec{1}{\varvec{w}}){\varvec{P}}_t^0, \end{aligned}$$

which leads to

$$\begin{aligned} E_c\left( (P_{ti}-P_t^{{\varvec{w}}})^2|P_t\right) = P_t(1-P_t)\left( ({\varvec{I}}-\varvec{1}{\varvec{w}}){\varvec{V}}_t ({\varvec{I}}-\varvec{1}{\varvec{w}})^T\right) _{ii}, \end{aligned}$$

and then (138) follows since

$$\begin{aligned} \sum _{i=1}^s w_i \left( ({\varvec{I}}-\varvec{1}{\varvec{w}}){\varvec{V}}_t ({\varvec{I}}-\varvec{1}{\varvec{w}})^T\right) _{ii} = \sum _{i=1}^s w_i {\varvec{V}}_{tii} - ({\varvec{w}}-\varvec{\gamma }){\varvec{V}}_t({\varvec{w}}-\varvec{\gamma })^T. \end{aligned}$$

\(\square \)
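The final identity in the proof uses that \(\varvec{\gamma }{\varvec{V}}_t = {\varvec{V}}_t\varvec{\gamma }^T = {\varvec{0}}\), so that \({\varvec{w}}{\varvec{V}}_t{\varvec{w}}^T = ({\varvec{w}}-\varvec{\gamma }){\varvec{V}}_t({\varvec{w}}-\varvec{\gamma })^T\). A numerical sketch, constructing a matrix with this property from arbitrary ingredients, illustrates the identity:

```python
import numpy as np

rng = np.random.default_rng(6)
s = 5
w = rng.dirichlet(np.ones(s))[None, :]   # 1 x s weights, entries sum to one
g = rng.dirichlet(np.ones(s))[None, :]   # stand-in for gamma
one = np.ones((s, 1))

A = rng.normal(size=(s, s))
A = A @ A.T                               # arbitrary symmetric matrix
V = (np.eye(s) - one @ g) @ A @ (np.eye(s) - one @ g).T   # ensures gamma V = V gamma^T = 0

M = (np.eye(s) - one @ w) @ V @ (np.eye(s) - one @ w).T
lhs = float(w.ravel() @ np.diag(M))       # sum_i w_i ((I - 1w) V (I - 1w)^T)_{ii}
rhs = float(w.ravel() @ np.diag(V) - ((w - g) @ V @ (w - g).T).item())
assert np.isclose(lhs, rhs)
```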

Lemma 2

Let \({\varvec{c}}\) be a \(1\times s\) vector, \({\varvec{C}}\) an \(s\times s\) matrix, and define

$$\begin{aligned} \epsilon = {\varvec{c}}({\varvec{P}}_t^0-\varvec{\mu }_tP_t(1-P_t)) + \text{ tr }\left( {\varvec{C}}({\varvec{P}}_t^0({\varvec{P}}_t^0)^T - {\varvec{V}}_t P_t(1-P_t))\right) . \end{aligned}$$

Then

$$\begin{aligned} E_c(\epsilon ^2|P_t) \le 2|{\varvec{c}}|_1^2 |{\varvec{V}}_t|_\infty P_t(1-P_t) + 2|{\varvec{C}}|_1^2 \kappa _t, \end{aligned}$$
(141)

with

$$\begin{aligned} \kappa _t = \max _{1\le i \le s} E_c((P_{ti}^0)^4|P_t). \end{aligned}$$

Proof

Put \({\varvec{c}}=(c_1,\ldots ,c_s)\) and \({\varvec{C}}=(C_{ij})_{i,j=1}^s\). For simplicity, we omit conditioning on \(P_t\) in the notation, writing \(E_c(\cdot ) = E_c(\cdot |P_t)\). Then

$$\begin{aligned} E_c(\epsilon ^2)&\le 2E_c\left( \left( \sum _i c_i (P_{ti}^0-\mu _{ti}P_t(1-P_t))\right) ^2\right) + 2E_c\left( \left( \sum _{ij} C_{ji}P_{ti}^0P_{tj}^0\right) ^2\right) \\&\le 2E_c \left( \left( \sum _i c_i P_{ti}^0\right) ^2\right) + 2E_c\left( \left( \sum _{ij} C_{ji}P_{ti}^0P_{tj}^0\right) ^2\right) \\&\le 2\sum _{i,j} |c_i||c_j| E_c(|P_{ti}^0P_{tj}^0|) + 2\sum _{ijkl} |C_{ji}||C_{lk}| E_c\left( \left| P_{ti}^0 P_{tj}^0 P_{tk}^0 P_{tl}^0\right| \right) \\&\le \sum _{i,j} |c_i||c_j| \left( E_c((P_{ti}^0)^2) + E_c((P_{tj}^0)^2)\right) \\&\quad + 0.5\sum _{ijkl} |C_{ji}||C_{lk}|\left( E_c((P_{ti}^0)^4) + E_c((P_{tj}^0)^4) + E_c((P_{tk}^0)^4) + E_c((P_{tl}^0)^4)\right) \\&\le 2\sum _{i,j} |c_i||c_j| |{\varvec{V}}_t|_\infty P_t(1-P_t) + 2\sum _{ijkl} |C_{ji}||C_{lk}|\kappa _t, \end{aligned}$$

using the Cauchy–Schwarz inequality in the fourth step. The last line is identical to the right hand side of (141). \(\square \)

Proof of Theorem 2

When all loci are biallelic (\(n(x)\equiv 2\)), formulas (75) and (78) simplify to

$$\begin{aligned} G_{ST,t}^{{\varvec{w}}}&= Z/Y,\\ N_{eV,t}^{{\varvec{w}}}&= Y/X, \end{aligned}$$

respectively, where

$$\begin{aligned} X&= 2 \sum _{x=1}^n E\left( \left( P_{t+1}^{{\varvec{w}}}(x) - P_t^{{\varvec{w}}}(x)\right) ^2 | P_t^{{\varvec{w}}}(x)\right) ,\\ Y&= \sum _{x=1}^n P_t^{{\varvec{w}}}(x)(1-P_t^{{\varvec{w}}}(x)),\\ Z&= \sum _{x=1}^n \text{ tr }\left( {\varvec{C}}_Z {\varvec{P}}_t^0(x){\varvec{P}}_t^0(x)^T\right) \end{aligned}$$

are multilocus extensions of the corresponding numerators and denominators \(X\), \(Y\), \(Z\) of Lemma 1, where also \({\varvec{C}}_Z\) is defined. We let \(P_{ti}(x)\) denote the value of the overall allele frequency \(P_{ti}(x,a)\) for one (arbitrarily chosen) of the two alleles \(a=1,2\) at locus \(x\) in subpopulation \(i=1,\ldots ,s\), and put \(P_t(x)=\sum _{i=1}^s \gamma _i P_{ti}(x)\), \(P_{ti}^0(x) = P_{ti}(x)-P_t(x)\) and \({\varvec{P}}_t^0(x) = (P_{ti}^0(x);i=1,\ldots ,s)^T\).

It will be convenient to condition on the allele frequency spectrum \({\mathcal {P}}_t = \{P_t(x); \, x=1,\ldots ,n\}\), writing

$$\begin{aligned} \begin{aligned}&G_{ST,t}^{{\varvec{w}}} = (\bar{Z}+ \epsilon _Z)/(\bar{Y}+ \epsilon _Y),\\&N_{eV,t}^{{\varvec{w}}} = (\bar{Y}+ \epsilon _Y)/(\bar{X}+ \epsilon _X), \end{aligned} \end{aligned}$$
(142)

where

$$\begin{aligned} \bar{X}&= E_c(X|{\mathcal {P}}_t)\\&= \sum _{x} P_t(x)(1-P_t(x))\left( \text{ tr }\left( {\varvec{C}}_X({\varvec{V}}_t(x)-\varvec{\varsigma }_t(x))\right) + \text{ tr }\left( {\varvec{C}}_X^\prime (\varvec{\Sigma }_t(x)-\varvec{\zeta }_t(x))\right) \right) ,\\ \bar{Y}&= E_c(Y|{\mathcal {P}}_t) = \sum _{x} P_t(x)(1-P_t(x))\left( 1- \text{ tr }\left( {\varvec{C}}_Y{\varvec{V}}_t(x)\right) +{\varvec{c}}_Y(x)\varvec{\mu }_t(x)\right) ,\\ \bar{Z}&= E_c(Z|{\mathcal {P}}_t) = \sum _{x} P_t(x)(1-P_t(x))\text{ tr }({\varvec{C}}_Z{\varvec{V}}_t(x)), \end{aligned}$$

can be deduced from Lemma 1, using the same definitions of \({\varvec{C}}_X\), \({\varvec{C}}_X^\prime \) and \({\varvec{C}}_Y\) as there. Moreover, \({\varvec{V}}_t(x)\), \(\varvec{\Sigma }_t(x)\), \({\varvec{c}}_Y(x)=(1-2P_t(x))({\varvec{w}}-\varvec{\gamma })\), \(\varvec{\mu }_t(x)\), \(\varvec{\zeta }_t(x)\) and \(\varvec{\varsigma }_t(x)\) are the values of \({\varvec{V}}_t\), \(\varvec{\Sigma }_t\), \({\varvec{c}}_Y\), \(\varvec{\mu }_t\), \(\varvec{\zeta }_t\) and \(\varvec{\varsigma }_t\) at locus \(x\). The remaining three quantities of (142) are the residual terms

$$\begin{aligned} \epsilon _X&= 2 \sum _{x}E\left( \left( P_{t+1}^{{\varvec{w}}}(x)- P_t^{{\varvec{w}}}(x)\right) ^2 | P_t^{{\varvec{w}}}(x)\right) \nonumber \\&- 2 \sum _{x}E_c\left( E\left( \left( P_{t+1}^{{\varvec{w}}} (x)-P_t^{{\varvec{w}}}(x)\right) ^2|P_t^{{\varvec{w}}}(x)\right) |P_t(x)\right) ,\nonumber \\ \epsilon _Y&= \sum _x {\varvec{c}}_Y(x)({\varvec{P}}_t^0(x)-\varvec{\mu }_t(x) P_t(x)(1-P_t(x)))\nonumber \\&- \sum _{x} \text{ tr }\left( {\varvec{C}}_Y({\varvec{P}}_t^0(x)({\varvec{P}}_t^0(x))^T - {\varvec{V}}_t(x) P_t(x)(1-P_t(x)))\right) ,\nonumber \\ \epsilon _Z&= \sum _{x} \text{ tr }\left( {\varvec{C}}_Z({\varvec{P}}_t^0(x)({\varvec{P}}_t^0(x))^T - {\varvec{V}}_t(x) P_t(x)(1-P_t(x)))\right) .\nonumber \end{aligned}$$
(143)

It follows from the definitions of \(G_{ST}^\mathrm{{ appr},{\varvec{w}}}\) and \(N_{eV}^\mathrm{{ appr},{\varvec{w}}}\) in (51) and (50) that we can write

$$\begin{aligned} \begin{aligned} G_{ST}^\mathrm{{ appr},{\varvec{w}}}&= Z^\mathrm{{ appr}}/Y^\mathrm{{ appr}},\\ N_{eV}^\mathrm{{ appr},{\varvec{w}}}&= Y^\mathrm{{ appr}}/X^\mathrm{{ appr}} \end{aligned} \end{aligned}$$
(144)

with

$$\begin{aligned} X^\mathrm{{ appr}}&= \sum _{x} P_t(x)(1-P_t(x))\left( \text{ tr }\left( {\varvec{C}}_X{\varvec{V}}\right) + \text{ tr }\left( {\varvec{C}}_X^\prime \varvec{\Sigma }\right) \right) ,\\ Y^\mathrm{{ appr}}&= \sum _{x} P_t(x)(1-P_t(x))\left( 1- \text{ tr }\left( {\varvec{C}}_Y{\varvec{V}}\right) \right) ,\\ Z^\mathrm{{ appr}}&= \sum _{x}P_t(x)(1-P_t(x))\text{ tr }({\varvec{C}}_Z{\varvec{V}}). \end{aligned}$$

Taking the difference of (142) and (144), we find that

$$\begin{aligned} G_{ST,t}^{{\varvec{w}}} - G_{ST}^\mathrm{{ appr},{\varvec{w}}}&= \frac{\bar{Z}}{\bar{Y}} - \frac{Z^\mathrm{{ appr}}}{Y^\mathrm{{ appr}}} + \frac{\bar{Z}+\epsilon _Z}{\bar{Y}+\epsilon _Y} - \frac{\bar{Z}}{\bar{Y}}\nonumber \\&\approx \frac{\bar{Z}}{\bar{Y}} - \frac{Z^\mathrm{{ appr}}}{Y^\mathrm{{ appr}}} + \frac{1}{\bar{Y}}\epsilon _Z - \frac{\bar{Z}}{\bar{Y}^2}\epsilon _Y - \frac{1}{\bar{Y}^2}\epsilon _Y\epsilon _Z + \frac{\bar{Z}}{\bar{Y}^3}\epsilon _Y^2, \qquad \qquad \end{aligned}$$
(145)

where in the last step we made a second order Taylor expansion. The first term on the right hand side of (145) can be further approximated as

$$\begin{aligned} \frac{\bar{Z}}{\bar{Y}} - \frac{Z^\mathrm{{ appr}}}{Y^\mathrm{{ appr}}}&= \frac{1}{\bar{Y}}(\bar{Z}- Z^\mathrm{{ appr}}) - \frac{Z^\mathrm{{ appr}}}{\bar{Y}Y^\mathrm{{ appr}}} (\bar{Y}- Y^\mathrm{{ appr}})\nonumber \\&\approx \frac{1}{Y^\mathrm{{ appr}}}(\bar{Z}- Z^\mathrm{{ appr}}) - \frac{Z^\mathrm{{ appr}}}{(Y^\mathrm{{ appr}})^2} (\bar{Y}- Y^\mathrm{{ appr}})\nonumber \\&\approx \frac{2}{H_{T}^\mathrm{{ eq}}\left( 1- \text{ tr }\left( {\varvec{C}}_Y{\varvec{V}}\right) \right) }\cdot \frac{1}{n}(\bar{Z}-Z^\mathrm{{ appr}}) - \frac{2 \text{ tr }({\varvec{C}}_Z{\varvec{V}})}{H_{T}^\mathrm{{ eq}} \left( 1- \text{ tr }\left( {\varvec{C}}_Y{\varvec{V}}\right) \right) ^2}\cdot \frac{1}{n} (\bar{Y}- Y^\mathrm{{ appr}})\nonumber \\&:= \frac{C_1}{n}(\bar{Z}- Z^\mathrm{{ appr}}) - \frac{C_2}{n} (\bar{Y}- Y^\mathrm{{ appr}}),\nonumber \end{aligned}$$
(146)

where in the second step we replaced \(\bar{Y}\) and \(\bar{Z}\) by \(Y^\mathrm{{ appr}}\) and \(Z^\mathrm{{ appr}}\), in the third step we approximated the gene diversity

$$\begin{aligned} H_{Tt} = H_{Tt}^{{\varvec{\varvec{\gamma }}}} = \frac{2}{n}\sum _{x} P_t(x)(1-P_t(x)) \approx H_T^\mathrm{{ eq}}, \end{aligned}$$

in (76) by its quasi equilibrium limit (20), which is accurate, by a Law of Large Numbers argument, for large \(n\). In the last step of (146) we introduced the constants \(C_1\) and \(C_2\) in order to simplify notation.

By the definition of \(\epsilon _Y\) and \(\epsilon _Z\) we have \(E_c(\epsilon _Y|{\mathcal {P}}_t) = E_c(\epsilon _Z|{\mathcal {P}}_t)=0\), and, since \(\bar{Y}\) and \(\bar{Z}\) are both functions of \({\mathcal {P}}_t\), it follows that the two terms of the last line of (145) that are linear in \(\epsilon _Y\) and \(\epsilon _Z\) have zero mean. Since all loci are in linkage equilibrium, the terms on the right hand sides of all three equations in (143) are independent for different \(x\). By Lemma 2 it then follows, after some computations, that

$$\begin{aligned} |E_c \left( - \frac{1}{\bar{Y}^2}\epsilon _Y\epsilon _Z + \frac{\bar{Z}}{\bar{Y}^3}\epsilon _Y^2\right) | \le \frac{C_3^\prime }{n} \end{aligned}$$
(147)

for some constant \(C_3^\prime \), independently of \(n\). Combining (145), (146) and (147), using \(P_t(x)(1-P_t(x))\le 1/4\), \(|\text{ tr }({\varvec{C}}_Y({\varvec{V}}_t(x)-{\varvec{V}}))|\le |{\varvec{C}}_Y|_1 |{\varvec{V}}_t(x)-{\varvec{V}}|_\infty \) and analogous estimates for all \(x=1,\ldots ,n\), we find that

$$\begin{aligned} |E_c(G_{ST,t}^{{\varvec{w}}}) - G_{ST}^\mathrm{{ appr},{\varvec{w}}}|&\le \frac{C_1}{n}E_c|\bar{Z}- Z^\mathrm{{ appr}}| + \frac{C_2}{n} E_c|\bar{Y}- Y^\mathrm{{ appr}}|+ \frac{C_3^\prime }{n}\\&\le \frac{C_1|{\varvec{C}}_Z|_1+C_2|{\varvec{C}}_Y|_1}{4 n} \sum _x E_c(|{\varvec{V}}_t(x)-{\varvec{V}}|_\infty ) \\&+ \frac{C_2|{\varvec{w}}-{\varvec{\gamma }}|_1}{4n} \sum _x E_c(|\varvec{\mu }_t(x)|_\infty ) + \frac{C_3^\prime }{n}. \end{aligned}$$

Then we use \(|{\varvec{C}}_Z|_1\le 2\) and \(|{\varvec{C}}_Y|_1\le |{\varvec{w}}-\varvec{\gamma }|_1^2\) and let \(t\rightarrow \infty \), in order to deduce that

$$\begin{aligned} \lim _{t\rightarrow \infty } |E_c(G_{ST,t}^{{\varvec{w}}}) - G_{ST}^\mathrm{{ appr},{\varvec{w}}}|&\le \frac{2C_1+C_2|{\varvec{w}}-{\varvec{\gamma }}|_1^2}{4} |\Delta {\varvec{V}}|^\mathrm{{ eq}} + \frac{C_2|{\varvec{w}}-{\varvec{\gamma }}|_1}{4} |\varvec{\mu }|^\mathrm{{ eq}} + \frac{C_3^\prime }{n}\\&=: C_1^\prime |\Delta {\varvec{V}}|^\mathrm{{ eq}} + C_2^\prime |\varvec{\mu }|^\mathrm{{ eq}} + \frac{C_3^\prime }{n}, \end{aligned}$$

since, for instance, the limit \(\lim _{t\rightarrow \infty } E_c(|{\varvec{V}}_t(x)-{\varvec{V}}|_\infty ) = |\Delta {\varvec{V}}|^\mathrm{{ eq}}\) in (48) exists for all \(x\). A similar analysis shows that

$$\begin{aligned}&\lim _{t\rightarrow \infty } |E_c(N_{eV,t}^{{\varvec{w}}}) - N_{eV}^\mathrm{{ appr},{\varvec{w}}}| \le \frac{C_3}{n}\lim _{t\rightarrow \infty } E_c|\bar{Y}- Y^\mathrm{{ appr}}| + \frac{C_4}{n} \lim _{t\rightarrow \infty } E_c|\bar{X}- X^\mathrm{{ appr}}| + \frac{C_9^\prime }{n}\\&\quad \le \frac{C_3|{\varvec{C}}_Y|_1 + C_4|{\varvec{C}}_X|_1}{4}|\Delta {\varvec{V}}|^\mathrm{{ eq}} + \frac{C_4|{\varvec{C}}_X^\prime |_1}{4} |\Delta \varvec{\Sigma }|^\mathrm{{ eq}} + \frac{C_3|{\varvec{w}}-{\varvec{\gamma }}|_1}{4}|\varvec{\mu }|^\mathrm{{ eq}}\\&\qquad + \frac{C_4|{\varvec{C}}_X|_1}{4}|\varvec{\varsigma }|^\mathrm{{ eq}} + \frac{C_4|{\varvec{C}}_X^\prime |_1}{4}|\varvec{\zeta }|^\mathrm{{ eq}} + \frac{C_9^\prime }{n}\\&\quad \le \frac{C_3|{\varvec{w}}-{\varvec{\gamma }}|_1^2 + 4C_4|{\varvec{w}}-{\varvec{\gamma }}|_1^2}{4}|\Delta {\varvec{V}}|^\mathrm{{ eq}} + \frac{2C_4}{4} |\Delta \varvec{\Sigma }|^\mathrm{{ eq}} + \frac{C_3|{\varvec{w}}-{\varvec{\gamma }}|_1}{4}|\varvec{\mu }|^\mathrm{{ eq}}\\&\qquad + \frac{4C_4|{\varvec{w}}-{\varvec{\gamma }}|_1^2}{4}|\varvec{\varsigma }|^\mathrm{{ eq}} + \frac{2C_4}{4}|\varvec{\zeta }|^\mathrm{{ eq}} + \frac{C_9^\prime }{n}\\&\quad =: C_4^\prime |\Delta {\varvec{V}}|^\mathrm{{ eq}} + C_5^\prime |\Delta \varvec{\Sigma }|^\mathrm{{ eq}} + C_6^\prime |\varvec{\mu }|^\mathrm{{ eq}} + C_7^\prime |\varvec{\varsigma }|^\mathrm{{ eq}} + C_8^\prime |\varvec{\zeta }|^\mathrm{{ eq}} + \frac{C_9^\prime }{n}, \end{aligned}$$

where

$$\begin{aligned} |\varvec{\varsigma }|^\mathrm{{ eq}} = \lim _{t\rightarrow \infty } E_c(|\varvec{\varsigma }_t|_\infty ) \end{aligned}$$
(148)

is an asymptotic upper bound for the remainder terms \(\varvec{\varsigma }_t(x)\), defined in the same way as (45)–(47), and

$$\begin{aligned} C_3&= 2/\left( H_T^\mathrm{{ eq}}(\text{ tr }({\varvec{C}}_X{\varvec{V}})+ \text{ tr }({\varvec{C}}_X^\prime \varvec{\Sigma }))\right) ,\\ C_4&= 2\left( 1- \text{ tr }({\varvec{C}}_Y{\varvec{V}})\right) /\left( H_T^\mathrm{{ eq}}(\text{ tr }({\varvec{C}}_X{\varvec{V}})+ \text{ tr }({\varvec{C}}_X^\prime \varvec{\Sigma }))^2\right) . \end{aligned}$$

\(\square \)

Appendix F: Details from Sect. 11

Proof of Proposition 4

Let \(Q_{ij,kl}\) denote the probability that two different genes from subpopulations \(i\) and \(j\) have their parents in subpopulations \(k\) and \(l\) respectively, and let \(p_{ijk}\) be the coalescence probability defined in (84).

It is possible to compute \(q_{t+1,ij}\) by conditioning on the parental subpopulations \(k\) and \(l\) one generation back in time, and then looking at the ancestry of the parents \(t\) generations back in time. Since coalescence can only occur when \(k=l\), we find that

$$\begin{aligned} q_{t+1,ij} = \sum _{k,l} Q_{ij,kl}(1-p_{ijk})^{\{k=l\}}q_{t,kl}. \end{aligned}$$

This equals the recursion in (82), with

$$\begin{aligned} D_{ij,kl}=Q_{ij,kl}(1-p_{ijk})^{\{k=l\}}. \end{aligned}$$
(149)
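The index pairing \((ij,kl)\) in this recursion is easy to get wrong in an implementation. The following sketch (Python with NumPy, using an arbitrary two-subpopulation example with hypothetical \(Q\) and \(p\); the numbers carry no biological meaning) checks that the double sum above agrees with the flattened matrix product of \({\varvec{D}}\) with \(q_t\):

```python
import numpy as np

s = 2
rng = np.random.default_rng(0)

# Hypothetical ingredients (illustrative numbers only):
# Q[i, j, k, l] = P(parents of a pair from (i, j) lie in (k, l)),
# p[i, j, k]    = coalescence probability when both parents lie in k.
Q = rng.random((s, s, s, s))
Q /= Q.sum(axis=(2, 3), keepdims=True)        # each (i, j) block sums to 1
p = rng.random((s, s, s)) * 0.01

# D from (149): D_{ij,kl} = Q_{ij,kl} (1 - p_{ijk})^{1{k=l}}.
D = Q.copy()
for i in range(s):
    for j in range(s):
        for k in range(s):
            D[i, j, k, k] *= 1.0 - p[i, j, k]

q = rng.random((s, s))                        # no-coalescence probabilities at time t
q_next = np.einsum('ijkl,kl->ij', D, q)       # the double sum over (k, l)

# Flattened version: row index i*s + j, column index k*s + l.
Dmat = D.reshape(s * s, s * s)
print(np.allclose(q_next.reshape(-1), Dmat @ q.reshape(-1)))  # True
```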

On the other hand, we can rewrite the gene diversity recursion (26) as

$$\begin{aligned} E(H_{t+1,ij}|{\varvec{P}}_t) = \left( 1-\frac{1}{2Nu_i}\right) ^{\{i=j\}} \sum _{k,l} Q_{ij,kl}(1-p_{ijk})^{\{k=l\}} \frac{H_{tkl}}{ \left( 1-\frac{1}{2Nu_k}\right) ^{\{k=l\}}}, \end{aligned}$$

since \((1-1/(2Nu_i))^{\{i=j\}}\) is the probability that two genes, drawn with replacement from subpopulations \(i\) and \(j\) in generation \(t+1\), are different, and \(H_{tkl}/(1-1/(2Nu_k))^{\{k=l\}}\) is the probability that two different genes from subpopulations \(k\) and \(l\) in generation \(t\) have different alleles. Hence we see from (26) that

$$\begin{aligned} A_{ij,kl} = \frac{\left( 1-\frac{1}{2Nu_i}\right) ^{\{i=j\}}}{\left( 1-\frac{1}{2Nu_k}\right) ^{\{k=l\}}} Q_{ij,kl}(1-p_{ijk})^{\{k=l\}}, \end{aligned}$$

from which (83) follows. \(\square \)

We will now derive explicit expressions for the matrix elements \(D_{ij,kl}\) of Proposition 4. To this end, one could either calculate the coefficients \(U_{ij,kl}\) of the covariance matrix expansion (25), and then use Propositions 1 and 4 in order to find \(D_{ij,kl}\), or one may employ coalescence probabilities and obtain the elements of \({\varvec{D}}\) directly from (82). We use the latter approach in order to prove the following:

Proposition 7

Asymptotically, for large populations and reproduction scenario 2, the elements of \({\varvec{D}}\) have the form

$$\begin{aligned} D_{ij,kl} = b_{ik}\left( \frac{b_{il}-\frac{1_{\{k=l\}}}{2Nu_i}}{1-\frac{1}{2Nu_i}}\right) ^{\{i=j\}} b_{jl}^{\{i\ne j\}} \left( 1-p_{ijk}\right) ^{\{k=l\}} + o(N^{-1}), \end{aligned}$$
(150)

where \(p_{ijk}\) is the coalescence probability (84) that two genes from subpopulations \(i\) and \(j\), that have their parents in \(k\), have the same parent, and

$$\begin{aligned} \sigma _{ijk}(N) = \frac{1}{m_{ki}m_{kj}} \cdot \left\{ \begin{array}{l@{\quad }l} E\left( \nu _{ki}^l (\nu _{ki}^l-1)\right) , &{} i=j,\\ E\left( \nu _{ki}^l\nu _{kj}^l\right) , &{} i\ne j. \end{array}\right. \end{aligned}$$
(151)

For reproduction scenario 3 with \(\alpha _i\equiv \infty \), it holds that

$$\begin{aligned} D_{ij,kl} = b_{ik}b_{jl} \left( 1-p_{ijk}\right) ^{\{k=l\}} + o(N^{-1}), \end{aligned}$$
(152)

with coalescence probability \(p_{ijk}=1/(2N_{ek})\), so that \(\sigma _{ijk}(N)\) in (84) equals

$$\begin{aligned} \sigma _{ijk}(N) = \frac{Nu_k}{N_{ek}}. \end{aligned}$$
(153)

Nagylaki (2000) has derived a recursion that generalizes (152) when \(N_{ek}=Nu_k\), for probabilities that concern not only the time when, but also the subpopulation where, coalescence of two genes from subpopulations \(i\) and \(j\) occurs. The constant \(\sigma _{ijk}(N)\) was defined in Hössjer (2011). As mentioned in Sect. 11.1, it can be interpreted as the coalescence rate of a pair of lines from subpopulations \(i\) and \(j\), when both of these migrate backwards to \(k\).

Proof of Proposition 7

In order to establish (150) and (152), we will use (149), and hence we need to find expressions for \(Q_{ij,kl}\) and \(p_{ijk}\). Starting with reproduction scenario 2, we have

$$\begin{aligned} Q_{ij,kl} = \left\{ \begin{array}{l@{\quad }l} b_{ik}b_{jl}, &{} i\ne j,\\ 2Nu_ib_{ik}(2Nu_ib_{ik}-1)/(2Nu_i(2Nu_i-1)), &{} i=j,k=l,\\ 2Nu_ib_{ik}\cdot 2Nu_ib_{il}/(2Nu_i(2Nu_i-1)), &{} i=j,k\ne l, \end{array}\right. \qquad \quad \end{aligned}$$
(154)

since the two genes are drawn without replacement, an exact fraction \(b_{ik}\) of the parents of the offspring genes of subpopulation \(i\) originate from subpopulation \(k\), and similarly, an exact fraction \(b_{jl}\) of the genes in \(j\) have their parents in \(l\). We can rewrite (154) more compactly as

$$\begin{aligned} Q_{ij,kl} = b_{ik}\left( \frac{b_{il}-\frac{1_{\{k=l\}}}{2Nu_i}}{1-\frac{1}{2Nu_i}}\right) ^{\{i=j\}} b_{jl}^{\{i\ne j\}}. \end{aligned}$$

It follows for instance from Hössjer (2011) that the coalescence probability \(p_{ijk}\) has the form (84), and this completes the proof of (150).

For reproduction scenario 3 with \(\alpha _i\equiv \infty \), we simply have

$$\begin{aligned} Q_{ij,kl} = b_{ik}b_{jl}, \end{aligned}$$

since the parental subpopulations are drawn independently for two genes of subpopulations \(i\) and \(j\), from the probability distributions corresponding to rows \(i\) and \(j\) of \({\varvec{B}}\). Moreover, the coalescence probability is \(1/(2N_{ek})\), since this is the probability that the two parents in \(k\) originate from the same gene of a breeder, and this completes the proof of (152). \(\square \)

Proof of Theorem 3

We will use (87) in order to prove (88). By the Perron–Frobenius theorem, there exists a unique largest eigenvalue \(\lambda \) of \({\varvec{D}}\), with corresponding left and right eigenvectors \({\varvec{l}}=(l_{ij})\) and \({\varvec{r}}=(r_{ij})\), which can be normalized so that

$$\begin{aligned} \sum _{ij} l_{ij}&= 1,\\ \sum _{ij} l_{ij}r_{ij}&= 1. \end{aligned}$$

By a Jordan decomposition of \({\varvec{D}}\), it follows that

$$\begin{aligned} {\varvec{D}}^\tau = \lambda ^\tau {\varvec{r}}{\varvec{l}}+ o(\lambda ^\tau ) \text{ as } \tau \rightarrow \infty . \end{aligned}$$
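The dominance of the Perron term \(\lambda ^\tau {\varvec{r}}{\varvec{l}}\) can be checked numerically. A minimal sketch, using an arbitrary positive matrix in place of \({\varvec{D}}\) and the normalization \(\sum _{ij} l_{ij}=1\), \(\sum _{ij} l_{ij}r_{ij}=1\) from above:

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.random((4, 4)) * 0.2 + 0.05       # an arbitrary positive matrix

# Dominant (Perron) eigenvalue with its right and left eigenvectors.
vals, R = np.linalg.eig(D)
k = np.argmax(vals.real)
lam = vals[k].real
r = R[:, k].real
valsL, L = np.linalg.eig(D.T)
l = L[:, np.argmax(valsL.real)].real

# Normalize as in the text: sum(l) = 1 and l . r = 1.
l = l / l.sum()
r = r / (l @ r)

# For large tau, D^tau is dominated by lam^tau * r l^T.
tau = 50
approx = lam**tau * np.outer(r, l)
exact = np.linalg.matrix_power(D, tau)
rel_err = np.abs(exact - approx).max() / np.abs(exact).max()
print(rel_err < 1e-8)                     # True: subdominant terms are o(lam^tau)
```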

Our asymptotic analysis as \(N\rightarrow \infty \) is equivalent to letting the perturbation parameter

$$\begin{aligned} \varepsilon =\frac{1}{2N} \end{aligned}$$

tend to zero. In order to highlight the dependence of \({\varvec{D}}={\varvec{D}}(\varepsilon )\) on \(\varepsilon \), we Taylor expand its elements around \(\varepsilon =0\), as

$$\begin{aligned} D_{ij,kl} = D_{ij,kl}(\varepsilon ) = b_{ik}b_{jl} + \dot{D}_{ij,kl}\varepsilon + o(\varepsilon ). \end{aligned}$$

It follows from (150) that \(\dot{{\varvec{D}}}=(\dot{D}_{ij,kl})\) has elements

$$\begin{aligned} \dot{D}_{ij,kl} = -1_{\{k=l\}} u_k^{-1}b_{ik}b_{jl}\sigma _{ijk} + 1_{\{i=j\}}u_i^{-1}b_{ik}(b_{il}-1_{\{k=l\}}) \end{aligned}$$

for reproduction scenario 2 and

$$\begin{aligned} \dot{D}_{ij,kl} = -1_{\{k=l\}} u_k^{-1}b_{ik}b_{jl}\sigma _{ijk} \end{aligned}$$

for reproduction scenario 3 with \(\alpha _i\equiv \infty \). Clearly, \({\varvec{D}}(0)={\varvec{B}}\otimes {\varvec{B}}\) is the Kronecker product of \({\varvec{B}}\) with itself for either reproduction scenario. It has largest eigenvalue \(\lambda (0)=1\), since \({\varvec{B}}\) is the transition matrix of an irreducible Markov chain, with a unique largest eigenvalue 1. Moreover, the form of the left and right eigenvectors \({\varvec{l}}={\varvec{l}}(\varepsilon )\) and \({\varvec{r}}={\varvec{r}}(\varepsilon )\) can be deduced from the left and right eigenvectors of \({\varvec{B}}\) when \(\varepsilon =0\), as

$$\begin{aligned} l_{ij}(0)&= \gamma _i\gamma _j,\\ r_{ij}(0)&= 1. \end{aligned}$$
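These unperturbed eigenvectors can be verified directly. A sketch with a hypothetical \(3\times 3\) backward migration matrix \({\varvec{B}}\) (any irreducible stochastic matrix will do; the entries below are illustrative):

```python
import numpy as np

# A hypothetical backward migration matrix B (rows sum to 1, irreducible).
B = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])

# Stationary distribution gamma, solving gamma B = gamma.
vals, L = np.linalg.eig(B.T)
gamma = L[:, np.argmax(vals.real)].real
gamma = gamma / gamma.sum()

G = np.kron(B, B)                 # D(0) = B (x) B
lvec = np.kron(gamma, gamma)      # l(0), entries gamma_i gamma_j, sums to 1
rvec = np.ones(9)                 # r(0), the all-ones vector

print(np.allclose(lvec @ G, lvec))   # left eigenvector, eigenvalue 1
print(np.allclose(G @ rvec, rvec))   # right eigenvector, eigenvalue 1
```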

It follows from perturbation theory of matrices (see for instance Nagylaki 1980; Van der Aa et al. 2007) that

$$\begin{aligned} \lambda (\varepsilon ) = 1 + \dot{\lambda }\varepsilon + o(\varepsilon ) \text{ as } \varepsilon \rightarrow 0, \end{aligned}$$

where

$$\begin{aligned} \dot{\lambda }&= {\varvec{l}}(0)\dot{{\varvec{D}}}{\varvec{r}}(0)\\&= -\sum _{ijk} \gamma _i\gamma _j u_k^{-1}b_{ik}b_{jk}\sigma _{ijk} + \sum _{ikl} \gamma _i^2u_i^{-1}b_{ik}(b_{il}-1_{\{k=l\}})\\&= -C + \sum _{i} \gamma _i^2u_i^{-1} (1-1)\\&= -C, \end{aligned}$$

for reproduction scenario 2, with \(C\) as defined in (89). A similar (but simpler) analysis shows that \(\dot{\lambda }= -C\) for reproduction scenario 3 with \(\alpha _i\equiv \infty \). In view of (87), this implies

$$\begin{aligned} N_{e\pi }&= \frac{1}{2}{\varvec{W}}_T({\varvec{I}}-{\varvec{D}})^{-1}{\underline{{\mathbf{1}}}}\nonumber \\&= \frac{1}{2}{\varvec{W}}_T \left( \sum _{\tau =0}^\infty {\varvec{D}}^\tau \right) {\underline{{\mathbf{1}}}}\nonumber \\&= \frac{1}{2}{\varvec{W}}_T \left( \sum _{\tau =0}^\infty \left( \lambda ^\tau {\varvec{r}}{\varvec{l}}+ o(\lambda ^\tau )\right) \right) {\underline{{\mathbf{1}}}}\nonumber \\&= \frac{1}{2} \sum _{\tau =0}^\infty \left( ({\varvec{W}}_T{\varvec{r}}{\varvec{l}}{\underline{{\mathbf{1}}}})\lambda ^\tau + o(\lambda ^\tau )\right) \nonumber \\&= \frac{1}{2} \sum _{\tau =0}^\infty \left( {\varvec{W}}_T({\underline{{\mathbf{1}}}} + o(1))\lambda ^\tau + o(\lambda ^\tau )\right) \\&= \frac{1}{2} \sum _{\tau =0}^\infty \left( \lambda ^\tau + o(\lambda ^\tau )\right) \nonumber \\&= \frac{1}{2(1-\lambda )}(1+o(1))\nonumber \\&= \frac{1}{2C\varepsilon }(1+o(1))\nonumber \\&= \frac{N}{C}(1+o(1))\nonumber \end{aligned}$$
(155)

as \(\varepsilon \rightarrow 0\), or equivalently, as \(N\rightarrow \infty \), thereby proving (88). In the fifth equality of (155) we used that \({\varvec{r}}={\varvec{r}}(\varepsilon )={\underline{{\mathbf{1}}}} + o(1)\) as \(\varepsilon \rightarrow 0\), and in the sixth equality that \({\varvec{W}}_T{\underline{{\mathbf{1}}}} = \sum _{i,j} w_iw_j = 1\), regardless of the choice of weight vector \({\varvec{w}}\).
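The geometric-series mechanics behind (155) can be illustrated numerically. In the sketch below, the true \({\varvec{D}}(\varepsilon )\) is replaced by the toy matrix \((1-\varepsilon ){\varvec{B}}\otimes {\varvec{B}}\), which has dominant eigenvalue \(\lambda =1-\varepsilon \) by construction; this is a stand-in for the model, not the actual \({\varvec{D}}\) of either reproduction scenario:

```python
import numpy as np

B = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
G = np.kron(B, B)

eps = 1e-3                         # plays the role of 1/(2N)
D = (1.0 - eps) * G                # toy stand-in with dominant eigenvalue 1 - eps

w = np.array([0.5, 0.3, 0.2])      # any weights summing to 1
WT = np.kron(w, w)                 # row vector W_T, entries w_i w_j

one = np.ones(9)
Ne_pi = 0.5 * WT @ np.linalg.solve(np.eye(9) - D, one)

lam = 1.0 - eps
print(Ne_pi)                       # close to 1/(2*(1 - lam)) = 1/(2*eps) = 500
```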

We now turn to the proof of (90). It follows from Table 3 that \(\Vert {\varvec{U}}\Vert =O(N^{-1})\) for both reproduction scenarios 2 and 3 (with \(\alpha _i\equiv \infty \)). Invoking the upper part of (33) and (38), we deduce that

$$\begin{aligned} \text{ vec }(\varvec{\Sigma }) = {\varvec{U}}{\underline{{\mathbf{1}}}} \left( 1+ O(N^{-1}\text{ Mixtime })\right) = {\varvec{U}}{\underline{{\mathbf{1}}}} \left( 1+ O(N^{-1})\right) , \end{aligned}$$

where the last step follows from Proposition 3 and the fact that the migration rates are kept fixed. Inserting the last expression into (50), we find that

$$\begin{aligned} N_{eV}^\mathrm{{ appr}} = \frac{N}{C^\prime } + o(N), \end{aligned}$$
(156)

where

$$\begin{aligned} C^\prime = 2N\sum _{i,j=1}^s \gamma _i\gamma _j({\varvec{U}}{\underline{{\mathbf{1}}}})_{ij}. \end{aligned}$$
(157)

It thus remains to verify, for both reproduction scenarios, that \(C^\prime = C\). Starting with reproduction scenario 2, we find from Table 3 that

$$\begin{aligned} ({\varvec{U}}{\underline{{\mathbf{1}}}})_{ij} = \sum _{k,l=1}^s U_{ij,kl} = \sum _{k=1}^s \frac{C_{kij}u_k}{2Nu_iu_j}, \end{aligned}$$
(158)

with \(C_{kij}=\text{ Cov }(\nu _{ki}^l,\nu _{kj}^l)\). By the assumptions of the theorem, the quantities \(\sigma _{ijk}(N)\) in (151) converge as \(N\rightarrow \infty \). Since the migration rates in \({\varvec{M}}\) are fixed, it follows that the covariances \(C_{kij}=C_{kij}(N)\) converge as well. With a slight abuse of notation, we write \(C_{kij}\) also for the limits as \(N\rightarrow \infty \). Inserting (158) into (157), we find that

$$\begin{aligned} C^\prime = \sum _{i,j,k=1}^s \gamma _i\gamma _j \frac{C_{kij}u_k}{u_iu_j}. \end{aligned}$$

On the other hand, it follows from the definition of \(\sigma _{ijk}\) in (151), that each covariance term \(C_{kij}\) can be rewritten as

$$\begin{aligned} C_{kij} = \sigma _{ijk}m_{ki}m_{kj} - m_{ki}m_{kj} + m_{ki}1_{\{i=j\}}. \end{aligned}$$
(159)

Inserting (159) into (157), it follows, after some computations, that

$$\begin{aligned} C^\prime&= \sum _{ijk} \gamma _i\gamma _j u_k^{-1}b_{ik}b_{jk}\left( \sigma _{kij}-1 + m_{ki}^{-1}1_{\{i=j\}}\right) \\&= \sum _{ijk} \gamma _i\gamma _ju_k^{-1}b_{ik}b_{jk}\sigma _{kij} - \sum _{ijk} \gamma _i\gamma _ju_k^{-1}b_{ik}b_{jk} + \sum _{ik} \gamma _i^2u_k^{-1}m_{ki}^{-1}b_{ik}^2\\&= C - \sum _k u_k^{-1}\gamma _k^2 + \sum _i u_i^{-1}\gamma _i^2,\\&= C, \end{aligned}$$

and in view of (156), this proves (90).

For reproduction scenario 3 with \(\alpha _i\equiv \infty \), it follows from Table 3 that

$$\begin{aligned} ({\varvec{U}}{\underline{{\mathbf{1}}}})_{ij} = \sum _{k,l=1}^s U_{ij,kl} = \sum _{k=1}^s b_{ik}b_{jk}\left( \frac{1}{2N_{ek}} - \frac{1}{2Nu_k}\right) + \frac{1_{\{i=j\}}}{2Nu_i}. \end{aligned}$$

Insertion of this expression into (157) leads to

$$\begin{aligned} C^\prime&= 2N\sum _{i,j,k=1}^s \gamma _i\gamma _j b_{ik}b_{jk}\left( \frac{1}{2N_{ek}} - \frac{1}{2Nu_k}\right) + 2N\sum _{i=1}^s \frac{\gamma _i^2}{2Nu_i}\nonumber \\&= 2N\sum _{k=1}^s \gamma _k^2 \left( \frac{1}{2N_{ek}} - \frac{1}{2Nu_k}\right) + 2N\sum _{i=1}^s \frac{\gamma _i^2}{2Nu_i}\nonumber \\&= \sum _{k=1}^s u_k^{-1}\gamma _k^2\cdot \frac{2Nu_k}{2N_{ek}} \\&= \sum _{k=1}^s u_k^{-1}\gamma _k^2 \sigma _k\nonumber \\&= C,\nonumber \end{aligned}$$
(160)

where \(\sigma _k=\sigma _{ijk}\) is defined in (153). The last step of (160) follows easily by adding a term \(\sigma _k\) on both sides of Eq. (91). \(\square \)

Given two random variables \(X\) and \(Y\), we put \(E_0(Y/X)^*=E_0(Y)/E_0(X)\), where \(E_0(X)=E(X|{\varvec{P}}_0=P_0\varvec{1})\), a prediction of \(Y/X\) given that the allele frequencies of the founder generation are the same in all subpopulations. The following proposition shows that \(\bar{f}_{ST}^{{\varvec{w}}}\) and \(f_{ST}^{{\varvec{w}}}\) are weighted averages over \(t\) of \(E_0(\bar{F}_{ST,t}^{{\varvec{w}}})^{*}\) and \(E_0(F_{ST,t}^{{\varvec{w}}})^{*}\) respectively:

Proposition 8

The matrix \(\bar{{\varvec{H}}}_t = (\bar{H}_{tij})_{i,j=1}^s\) of gene diversities, defined for a pair of distinct genes, satisfies

$$\begin{aligned} E_0\left( \text{ vec }(\bar{{\varvec{H}}}_t)\right) = 2P_0(1-P_0){\varvec{D}}^t\left( {\underline{{\mathbf{1}}}} +O(N^{-1})\right) , \end{aligned}$$
(161)

and the fixation index in (98) is a weighted average

$$\begin{aligned} \bar{f}_{ST}^{{\varvec{w}}} = \sum _{t=0}^\infty \bar{\omega }_t E_0\left( \bar{F}_{ST,t}^{{\varvec{w}}}\right) ^{*} + O(N^{-1}) = \sum _{t=0}^\infty \bar{\omega }_t \frac{E_0\left( \bar{H}_{Tt}^{{\varvec{w}}}-\bar{H}_{St}^{{\varvec{w}}}\right) }{E_0\left( \bar{H}_{Tt}^{{\varvec{w}}}\right) } + O(N^{-1}), \nonumber \\ \end{aligned}$$
(162)

of predictions \(E_0\left( \bar{F}^{{\varvec{w}}} _{ST,t}\right) ^{*}\) of the fixation index (95) over different time horizons \(t\), with weights

$$\begin{aligned} \bar{\omega }_t = \frac{{\varvec{W}}_T{\varvec{D}}^t{\underline{{\mathbf{1}}}}}{\sum _{\tau =0}^\infty {\varvec{W}}_T{\varvec{D}}^\tau {\underline{{\mathbf{1}}}}}. \end{aligned}$$

Analogously, the matrix \({\varvec{H}}_t = (H_{tij})_{i,j=1}^s\) of gene diversities, when the pair of genes is drawn with replacement, satisfies

$$\begin{aligned} E_0\left( \text{ vec }({\varvec{H}}_t)\right) = 2P_0(1-P_0){\varvec{A}}^t{\underline{{\mathbf{1}}}}, \end{aligned}$$
(163)

and the fixation index (99) is a weighted average

$$\begin{aligned} f_{ST}^{{\varvec{w}}} = \sum _{t=0}^\infty \omega _t E_0\left( F_{ST,t}^{{\varvec{w}}}\right) ^{*} = \sum _{t=0}^\infty \omega _t \frac{E_0\left( H_{Tt}^{{\varvec{w}}}-H_{St}^{{\varvec{w}}}\right) }{E_0\left( H_{Tt}^{{\varvec{w}}}\right) }, \end{aligned}$$
(164)

with weights

$$\begin{aligned} \omega _t = \frac{{\varvec{W}}_T{\varvec{A}}^t{\underline{{\mathbf{1}}}}}{\sum _{\tau =0}^\infty {\varvec{W}}_T{\varvec{A}}^\tau {\underline{{\mathbf{1}}}}}. \end{aligned}$$
(165)

It is implicit from the proof of Theorem 3 that the weights (165) correspond to a probability distribution with mean \(O(N)\), as discussed in Sect. 11.2.
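To see the \(O(N)\) mean concretely, one can compute the weights (165) for a toy matrix. Again replacing \({\varvec{A}}\) by \((1-\varepsilon ){\varvec{B}}\otimes {\varvec{B}}\) with \(\varepsilon =10^{-3}\) (a stand-in playing the role of \(1/(2N)\), not the model's \({\varvec{A}}\)), the weights become geometric with mean \((1-\varepsilon )/\varepsilon \), which is of order \(N\):

```python
import numpy as np

B = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
eps = 1e-3
A = (1.0 - eps) * np.kron(B, B)    # toy stand-in with dominant eigenvalue 1 - eps

w = np.array([0.5, 0.3, 0.2])
WT = np.kron(w, w)
one = np.ones(9)

# omega_t = W_T A^t 1 / sum_tau W_T A^tau 1, truncated at T terms.
T = 20000
terms = np.empty(T)
v = one.copy()
for t in range(T):
    terms[t] = WT @ v
    v = A @ v
omega = terms / terms.sum()

mean_t = float(omega @ np.arange(T))
print(round(mean_t))               # 999, i.e. (1 - eps)/eps: a mean of order N
```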

Proof of Proposition 8

By means of an expansion \(({\varvec{I}}-{\varvec{D}})^{-1}=\sum _{t=0}^\infty {\varvec{D}}^t\), it is clear that (98) can be rewritten as

$$\begin{aligned} \bar{f}_{ST}^{{\varvec{w}}} = \sum _{t=0}^\infty \bar{\omega }_t \frac{({\varvec{W}}_T-{\varvec{W}}_S){\varvec{D}}^{t}{\underline{{\mathbf{1}}}}}{{\varvec{W}}_T{\varvec{D}}^{t}{\underline{{\mathbf{1}}}}}, \end{aligned}$$
(166)

under the assumption that the \(\mu \rightarrow 0\) approximation in (98) is exact. On the other hand, arguing as in the proof of (82), we obtain a gene diversity recursion

$$\begin{aligned} E\left( \text{ vec }(\bar{{\varvec{H}}}_{t+1})|{\varvec{P}}_t\right) = {\varvec{D}}\text{ vec }(\bar{{\varvec{H}}}_t) \end{aligned}$$
(167)

instead of (29) when two genes are drawn without replacement. We prove (161) by repeated use of (167). This yields

$$\begin{aligned} E_0(\text{ vec }(\bar{{\varvec{H}}}_t))&= E(\text{ vec }(\bar{{\varvec{H}}}_t)|{\varvec{P}}_0=P_0\varvec{1})\\&= {\varvec{D}}^t\text{ vec }(\bar{{\varvec{H}}}_0)\\&= 2P_0(1-P_0){\varvec{D}}^t({\underline{{\mathbf{1}}}} + O(N^{-1})), \end{aligned}$$

applying (94) with \(t=0\) in the last step. Inserting the definitions of \(\bar{H}_{Tt}^{{\varvec{w}}}\) and \(\bar{H}_{St}^{{\varvec{w}}}\) into (161), we obtain

$$\begin{aligned} E_0\left( \bar{H}_{Tt}^{{\varvec{w}}}-\bar{H}_{St}^{{\varvec{w}}}\right)&= 2P_0(1-P_0) ({\varvec{W}}_T-{\varvec{W}}_S){\varvec{D}}^t{\underline{{\mathbf{1}}}} + O(N^{-1}),\\ E_0\left( \bar{H}_{Tt}^{{\varvec{w}}}\right)&= 2P_0(1-P_0) {\varvec{W}}_T{\varvec{D}}^t{\underline{{\mathbf{1}}}} \left( 1 + O(N^{-1})\right) , \end{aligned}$$

where the last step follows as in the proof of Theorem 3 (see in particular (155)), since

$$\begin{aligned} {\varvec{W}}_T{\varvec{D}}^t\left( {\underline{{\mathbf{1}}}} +O(N^{-1})\right)&= \lambda ^t{\varvec{W}}_T{\varvec{r}}{\varvec{l}}({\underline{{\mathbf{1}}}} +O(N^{-1})) + o(\lambda ^t)\\&= \lambda ^t{\varvec{W}}_T{\varvec{r}}(1+O(N^{-1})) + o(\lambda ^t)\\&= \lambda ^t (1+O(N^{-1})) + o(\lambda ^t)\\&= {\varvec{W}}_T{\varvec{D}}^t{\underline{{\mathbf{1}}}} \left( 1 + O(N^{-1})\right) . \end{aligned}$$

Hence it follows that

$$\begin{aligned} E_0\left( \bar{F}_{ST,t}^{{\varvec{w}}}\right) = \frac{({\varvec{W}}_T-{\varvec{W}}_S){\varvec{D}}^t{\underline{{\mathbf{1}}}} + O(N^{-1})}{{\varvec{W}}_T{\varvec{D}}^t{\underline{{\mathbf{1}}}} \left( 1 + O(N^{-1})\right) } = \frac{({\varvec{W}}_T-{\varvec{W}}_S){\varvec{D}}^t{\underline{{\mathbf{1}}}}}{{\varvec{W}}_T{\varvec{D}}^t{\underline{{\mathbf{1}}}}} + O(N^{-1}). \end{aligned}$$

By inserting the last equation into (166) we arrive at (162).

Equations (163) and (164) are derived analogously, although the proof is simpler. The reason is that the \(O(N^{-1})\) remainder terms vanish, since \(\text{ vec }({\varvec{H}}_0)=2P_0(1-P_0){\underline{{\mathbf{1}}}}\) holds exactly when \({\varvec{P}}_0=P_0\varvec{1}\). \(\square \)

Proof of Theorem 4

It will be convenient to rewrite (27) as

$$\begin{aligned} {\varvec{A}}= {\varvec{B}}\otimes {\varvec{B}}- {\varvec{U}}= {\varvec{G}}- {\varvec{U}}, \end{aligned}$$
(168)

where \({\varvec{G}}=(G_{ij,kl})\) has elements \(G_{ij,kl}=b_{ik}b_{jl}\). The Jordan decomposition of \({\varvec{B}}\) in Appendix A implies that \({\varvec{B}}^0{\varvec{B}}^{t-1} = {\varvec{B}}^{t-1}{\varvec{B}}^0 = ({\varvec{B}}^0)^{t}\) for any non-negative integer \(t\). Since \({\varvec{G}}^0 = {\varvec{B}}^0\otimes {\varvec{B}}^0\) and \({\varvec{G}}={\varvec{B}}\otimes {\varvec{B}}\), it is easy to see that this implies

$$\begin{aligned} {\varvec{G}}^0{\varvec{G}}^{t-1} = {\varvec{G}}^{t-1}{\varvec{G}}^0 = ({\varvec{G}}^0)^{t}. \end{aligned}$$
(169)

A calculation similar to the one in the proof of Theorem 3 (see in particular (155)) yields

$$\begin{aligned} \sum _{t=0}^\infty ({\varvec{G}}-{\varvec{U}})^t {\underline{{\mathbf{1}}}} = (1-\lambda )^{-1}{\underline{{\mathbf{1}}}} + o\left( (1-\lambda )^{-1}\right) , \end{aligned}$$
(170)

where \(\lambda \) is the unique largest eigenvalue of \({\varvec{G}}-{\varvec{U}}\). We will also make use of the fact that

$$\begin{aligned} ({\varvec{W}}_T-{\varvec{W}}_S){\varvec{G}}= ({\varvec{W}}_T-{\varvec{W}}_S){\varvec{G}}^0 = - {\varvec{W}}_S{\varvec{G}}^0, \end{aligned}$$
(171)

which follows since \({\varvec{w}}=\varvec{\gamma }\) and

$$\begin{aligned} {\varvec{W}}_T{\varvec{G}}&= \text{ vec }\left( (\varvec{\gamma }{\varvec{B}})\otimes (\varvec{\gamma }{\varvec{B}})\right) ^T\\&= \text{ vec }(\varvec{\gamma }\otimes \varvec{\gamma })^T\\&= {\varvec{W}}_T,\\ {\varvec{W}}_S{\varvec{G}}&= {\varvec{W}}_S\left( (\varvec{1}\varvec{\gamma })\otimes (\varvec{1}\varvec{\gamma })\right) + {\varvec{W}}_S\left( (\varvec{1}\varvec{\gamma })\otimes {\varvec{B}}^0\right) + {\varvec{W}}_S({\varvec{B}}^0\otimes (\varvec{1}\varvec{\gamma })) + {\varvec{W}}_S{\varvec{G}}^0\\&= {\varvec{W}}_T + 0 + 0 + {\varvec{W}}_S{\varvec{G}}^0\\&= {\varvec{W}}_T + {\varvec{W}}_S{\varvec{G}}^0, \end{aligned}$$

with \(\varvec{1}\) a column vector of length \(s\), and

$$\begin{aligned} {\varvec{W}}_T{\varvec{G}}^0&= \text{ vec }\left( (\varvec{\gamma }{\varvec{B}}^0)\otimes (\varvec{\gamma }{\varvec{B}}^0)\right) ^T\\&= \text{ vec }\left( \mathbf{0} \otimes \mathbf{0} \right) ^T\\&= \mathbf{0}. \end{aligned}$$

Based on these preliminaries, we can rewrite the numerator of (99) as

$$\begin{aligned}&({\varvec{W}}_T-{\varvec{W}}_S)\left( {\varvec{I}}-({\varvec{G}}-{\varvec{U}})\right) ^{-1}{\underline{{\mathbf{1}}}} = ({\varvec{W}}_T-{\varvec{W}}_S)\sum _{t=0}^\infty ({\varvec{G}}-{\varvec{U}})^t{\underline{{\mathbf{1}}}}\\&\quad = ({\varvec{W}}_S-{\varvec{W}}_T)\sum _{t=0}^\infty \left( -{\varvec{G}}^t+\sum _{\tau =0}^{t-1} {\varvec{G}}^\tau {\varvec{U}}({\varvec{G}}-{\varvec{U}})^{t-\tau -1}\right) {\underline{{\mathbf{1}}}}\\&\quad = ({\varvec{W}}_S-{\varvec{W}}_T)\sum _{t=0}^\infty \sum _{\tau =0}^{t-1} ({\varvec{G}}^0)^\tau {\varvec{U}}({\varvec{G}}-{\varvec{U}})^{t-\tau -1}{\underline{{\mathbf{1}}}}\\&\quad = ({\varvec{W}}_S-{\varvec{W}}_T)\sum _{\tau =0}^\infty ({\varvec{G}}^0)^\tau {\varvec{U}}\sum _{\alpha =0}^\infty ({\varvec{G}}-{\varvec{U}})^\alpha {\underline{{\mathbf{1}}}}\\&\quad = (1-\lambda )^{-1}({\varvec{W}}_S-{\varvec{W}}_T)({\varvec{I}}-{\varvec{G}}^0)^{-1}{\varvec{U}}{\underline{{\mathbf{1}}}} + o\left( \Vert {\varvec{U}}\Vert (1-\lambda )^{-1}\right) , \end{aligned}$$

using (169), (171) and the fact that \(({\varvec{W}}_S-{\varvec{W}}_T){\varvec{G}}^t{\underline{{\mathbf{1}}}} = ({\varvec{W}}_S-{\varvec{W}}_T){\underline{{\mathbf{1}}}} = 0\) in the third step, a change of variables \(\alpha = t-\tau -1\) in the fourth step and (170) in the fifth step. Formula (170) also implies that the denominator of (99) equals

$$\begin{aligned} {\varvec{W}}_T\left( {\varvec{I}}-({\varvec{G}}-{\varvec{U}})\right) ^{-1}{\underline{{\mathbf{1}}}} = (1-\lambda )^{-1} + o\left( (1-\lambda )^{-1}\right) . \end{aligned}$$

In view of (168), we obtain formula (100) by taking the ratio of the last two displayed equations. In order to prove that \(F_{ST}^\mathrm{{ appr}}\) also equals the right hand side of (100), we first note, by the definition of \(\varvec{\Pi }\) in (34), that

$$\begin{aligned} ({\varvec{W}}_S\varvec{\Pi })_{kl}&= \sum _{i,j=1}^s \gamma _i1_{\{i=j\}}\Pi _{ij,kl}\\&= \sum _{i=1}^s \gamma _i\Pi _{ii,kl}\\&= \sum _{i=1}^s \gamma _i\left( 1_{\{(i,i)=(k,l)\}}- \gamma _k1_{\{i=l\}}-\gamma _l1_{\{i=k\}}+\gamma _k\gamma _l\right) \\&= \gamma _k1_{\{k=l\}} - \gamma _k\gamma _l\\&= ({\varvec{W}}_S-{\varvec{W}}_T)_{kl}, \end{aligned}$$

which we can rewrite in vector format, as

$$\begin{aligned} {\varvec{W}}_S\varvec{\Pi }= {\varvec{W}}_S-{\varvec{W}}_T. \end{aligned}$$
(172)

A similar calculation shows that

$$\begin{aligned} ({\varvec{G}}^0\varvec{\Pi })_{ij,kl}&= \sum _{m,n=1}^s (G^0)_{ij,mn}\Pi _{mn,kl}\\&= \sum _{m,n=1}^s b_{im}^0 b_{jn}^0 \left( 1_{\{(m,n)=(k,l)\}}-\gamma _k1_{\{m=l\}}-\gamma _l1_{\{n=k\}}+ \gamma _k\gamma _l\right) \\&= b_{ik}^0 b_{jl}^0 - \gamma _k b_{il}^0 \sum _{n=1}^s b_{jn}^0 - \gamma _l b_{jk}^0 \sum _{m=1}^s b_{im}^0 + \gamma _k\gamma _l \sum _{m=1}^s b_{im}^0\sum _{n=1}^s b_{jn}^0\\&= b_{ik}^0 b_{jl}^0\\&= G_{ij,kl}^0, \end{aligned}$$

which we rewrite as

$$\begin{aligned} {\varvec{G}}^0\varvec{\Pi }= {\varvec{G}}^0. \end{aligned}$$
(173)

This yields

$$\begin{aligned} F_{ST}^\mathrm{{ appr},{\varvec{\gamma }}}&= \sum _{i=1}^s \gamma _iV_{ii}\\&= {\varvec{W}}_S\text{ vec }({\varvec{V}})\\&= {\varvec{W}}_S \sum _{\tau =0}^\infty ({\varvec{G}}^0-\varvec{\Pi }{\varvec{U}})^\tau \varvec{\Pi }{\varvec{U}}{\underline{{\mathbf{1}}}}\\&= {\varvec{W}}_S \sum _{\tau =0}^\infty ({\varvec{G}}^0)^\tau \varvec{\Pi }{\varvec{U}}{\underline{{\mathbf{1}}}} + O(\Vert {\varvec{U}}\Vert ^2)\\&= ({\varvec{W}}_S-{\varvec{W}}_T)\sum _{\tau =0}^\infty ({\varvec{G}}^0)^\tau {\varvec{U}}{\underline{{\mathbf{1}}}} + O(N^{-2})\\&= ({\varvec{W}}_S-{\varvec{W}}_T)\left( {\varvec{I}}-{\varvec{G}}^0\right) ^{-1}{\varvec{U}}{\underline{{\mathbf{1}}}} + O(N^{-2}), \end{aligned}$$

where in the first step we used the definition (51) of \(F_{ST}^\mathrm{{ appr},{\varvec{\gamma }}}\), in the third step the expansion (36) of \(\text{ vec }({\varvec{V}})\) and in the fifth step the assumption \(\Vert {\varvec{U}}\Vert =O(N^{-1})\), (172), (173) and the second part of (171).
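The projection identities (172) and (173) used above are easy to confirm numerically from the elementwise formula for \(\varvec{\Pi }\) displayed in the proof. A sketch with a hypothetical migration matrix \({\varvec{B}}\), where \({\varvec{B}}^0={\varvec{B}}-\varvec{1}\varvec{\gamma }\):

```python
import numpy as np

B = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
s = B.shape[0]

vals, L = np.linalg.eig(B.T)
gamma = L[:, np.argmax(vals.real)].real
gamma = gamma / gamma.sum()

B0 = B - np.outer(np.ones(s), gamma)   # B^0 = B - 1 gamma; rows of B0 sum to 0
G0 = np.kron(B0, B0)                   # G^0 = B^0 (x) B^0

# Pi from its elementwise formula in the proof above.
Pi = np.zeros((s * s, s * s))
for m in range(s):
    for n in range(s):
        for k in range(s):
            for l in range(s):
                Pi[m * s + n, k * s + l] = (
                    float((m, n) == (k, l))
                    - gamma[k] * (m == l) - gamma[l] * (n == k)
                    + gamma[k] * gamma[l]
                )

WS = np.zeros(s * s)
for i in range(s):
    WS[i * s + i] = gamma[i]           # W_S with w = gamma
WT = np.kron(gamma, gamma)             # W_T

print(np.allclose(WS @ Pi, WS - WT))   # identity (172)
print(np.allclose(G0 @ Pi, G0))        # identity (173)
```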

Finally, formula (101) is proved in the same way as (100), replacing \({\varvec{U}}\) by \(\bar{{\varvec{U}}}={\varvec{B}}\otimes {\varvec{B}}-{\varvec{D}}\) everywhere. \(\square \)

In order to compare the sizes of the fixation indices when genes are drawn with and without replacement, we formulate the following result:

Proposition 9

The fixation index in (99) can be written as

$$\begin{aligned} f_{ST}^{{\varvec{w}}} = \frac{\bar{h}_T^{{\varvec{w}}}-\bar{h}_S^{{\varvec{w}}} + \sum _{i=1}^s \frac{w_i-w_i^2}{2Nu_i}\bar{h}_{ii}}{\bar{h}_T^{{\varvec{w}}} - \sum _{i=1}^s \frac{w_i^2}{2Nu_i}\bar{h}_{ii}}. \end{aligned}$$
(174)

In particular, for a strong migration limit where \(N\rightarrow \infty \) while the migration rates in \({\varvec{M}}\) are kept fixed, it holds that

$$\begin{aligned} f_{ST}^{{\varvec{w}}}&= \bar{f}_{ST}^{{\varvec{w}}} + \sum _{i=1}^s \frac{w_i-w_i^2}{2Nu_i} + o(N^{-1})\nonumber \\&\mathop {=}\limits ^{w_i=u_i=1/s}\bar{f}_{ST}^{{\varvec{w}}} + \frac{s-1}{2N} + o(N^{-1}). \end{aligned}$$
(175)

In order to illustrate this result, consider the island model under panmixia (\(m=1\)), for which it is well known that \(\bar{f}_{ST}=0\) for the canonical and uniform weighting scheme \(w_i=1/s\), reflecting the fact that subpopulations on average are identical. However, even under panmixia, there will still be small differences between subpopulations. It is shown in Hössjer (2013) (see also Latter and Sved 1981) that the replacement version \(f_{ST}\) of the fixation index captures this, in terms of a nonzero value \(f_{ST}=(s-1)/(2N) + o(N^{-1})\). It also follows from Hössjer et al. (2013) or (68) that the replacement version of the quasi equilibrium approximation of the fixation index satisfies \(F_{ST}^\mathrm{{ appr}}= (s-1)/(2N)\) under panmixia.
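The panmictic value \((s-1)/(2N)\) also drops out of (174) by direct evaluation: with \(w_i=u_i=1/s\), \(\bar{f}_{ST}=0\) and all \(\bar{h}_{ij}\) equal to a common value, (174) reduces to \(((s-1)/(2N))/(1-1/(2N))\). A quick check, with arbitrary illustrative values \(s=5\), \(N=1000\) and common gene diversity \(0.4\):

```python
# Panmictic island model check of (174): w_i = u_i = 1/s, all hbar_ij equal.
s, N = 5, 1000
w = [1.0 / s] * s
u = [1.0 / s] * s
hbar = 0.4                              # common gene diversity (illustrative)

hbar_T = hbar                           # weighted total diversity
hbar_S = hbar                           # weighted subpopulation diversity
num = hbar_T - hbar_S + sum((w[i] - w[i] ** 2) / (2 * N * u[i]) * hbar for i in range(s))
den = hbar_T - sum(w[i] ** 2 / (2 * N * u[i]) * hbar for i in range(s))
f_ST = num / den

print(abs(f_ST - (s - 1) / (2 * N)) < 1e-5)   # True: f_ST = (s-1)/(2N) + o(1/N)
```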

Proof of Proposition 9

We have that

$$\begin{aligned} h_{ij}^{{\varvec{w}}} = \left( 1-\frac{1}{2Nu_i}\right) ^{\{i=j\}}\bar{h}_{ij}^{{\varvec{w}}}, \end{aligned}$$

since the probability is \(\left( 1-1/(2Nu_i)\right) ^{\{i=j\}}\) that two genes drawn with replacement are not the same gene, and, given this, they differ by state with probability \(\bar{h}_{ij}^{{\varvec{w}}}\), as defined in (96). It then follows from (97), and the analogous definitions of \(h_S^{{\varvec{w}}}\) and \(h_T^{{\varvec{w}}}\) in terms of \(h_{ij}^{{\varvec{w}}}\), that

$$\begin{aligned} h_S^{{\varvec{w}}}&= \bar{h}_S^{{\varvec{w}}} - \sum _{i=1}^s \frac{w_i}{2Nu_i}\bar{h}_{ii},\\ h_T^{{\varvec{w}}}&= \bar{h}_T^{{\varvec{w}}} - \sum _{i=1}^s \frac{w_i^2}{2Nu_i}\bar{h}_{ii}. \end{aligned}$$

By inserting these two equations into (99), we arrive at (174).

When migration rates are fixed and \(N\rightarrow \infty \), we have \(\bar{h}_{ij}=\bar{h}_T^{{\varvec{w}}}(1+O(N^{-1}))\) for all \(i,j\), and hence (174) implies

$$\begin{aligned} f_{ST}^{{\varvec{w}}} = \bar{f}_{ST}^{{\varvec{w}}} + \frac{\bar{h}_T^{{\varvec{w}}}\sum _{i=1}^s \frac{w_i-w_i^2}{2Nu_i} + O(N^{-2})}{\bar{h}_T^{{\varvec{w}}}\left( 1 + O(N^{-1})\right) }, \end{aligned}$$

which can be simplified to (175). \(\square \)


Hössjer, O., Ryman, N. Quasi equilibrium, variance effective size and fixation index for populations with substructure. J. Math. Biol. 69, 1057–1128 (2014). https://doi.org/10.1007/s00285-013-0728-9
