Analysis of Random Processes of Isonymy: II. Dynamics of Population Divergence

  • MATHEMATICAL MODELS AND METHODS
Russian Journal of Genetics

Abstract

The random dynamics of the surname composition of a finite population in discrete time with non-overlapping generations is considered. Surnames are assumed to be passed to descendants along patrilineal lines. The dynamics is analyzed over a short effective time interval t/Ñe(t), where Ñe(t) is the harmonic mean effective population size over t generations. Since systematic pressures can be neglected in this case, surname microevolution approximately corresponds to a process of random drift similar to the genetic drift proceeding synchronously in the same population, whose intensity is four times lower than that for surnames. As in the genetic drift model, the surname composition of the next generation τ is a random sample of size Ne(τ)/2 drawn from the surnames of the male part of the parental population; i.e., the sample is four times smaller than the sample of 2Ne(τ) gametes under genetic drift (Ne(τ) is the effective population size in generation τ). The dynamics of the probability of a random encounter of namesakes and of the probability of a random encounter of individuals with different surnames are studied. These probabilities are analogous to the concentrations of homozygotes and heterozygotes, respectively, in the analysis of genetic structure. Exact time dependences are presented for these probabilities, for the variances of the surname concentrations, and for the surname analog of the inbreeding coefficient. The exact dependences are approximated by simpler ones for short effective time t/Ñe(t), where the surname divergence is four times faster than the genetic divergence. The results do not imply surname monophyly; they describe a speculative theoretical set of replica populations, as if these had re-experienced the microevolutionary history of the population in question under the same conditions.
The use of a time that is small compared to the population size is justified by the recent emergence of the majority of surnames in Russia and by the fact that the elapsed time in generations is much smaller than the typical population size. In real subdivided populations, an estimate of the inbreeding coefficient based on surname concentrations does not make it possible to distinguish a mechanical mixture of subpopulations from subpopulations of common origin.

Author information

Correspondence to V. P. Passekov.

Ethics declarations

The author declares that he has no conflict of interests. This article does not contain any studies involving animals or human participants performed by the author.

Additional information

Translated by N. Maleeva

APPENDIX

SUMMARY OF RESULTS FOR POPULATION GENETIC ANALYSIS

To make it easier to see the similarities and differences between the main features of the surname and genetic states of a population, we collect in one place the results (some of them implicit) scattered throughout this article for the population genetic model of an elementary diploid population of limited size with non-overlapping generations and random mating. Let the genetic composition of a diploid population be analyzed at one autosomal locus with k alleles. Then the genetic state (composition) of the population can be described by the vector x = {xi} of allele concentrations. In the general case, the dynamics of the state is determined by systematic pressures and random genetic drift. When the number of generations of microevolution t is small compared to the harmonic mean effective population size Ñe(t) over this period, the pressure of systematic factors on the genetic state of the population can be neglected, and the influence of random genetic drift is decisive. Drift causes both the divergence of the population state from the initial one and the divergence of populations of common origin from each other. This stage is the initial one in the microevolution of initially identical related populations.

Description of Divergence in Statics

At a fixed point in time, the divergence pattern can be characterized by the following indices. Let pi be the initial concentration of the ith allele in a population with random mating subjected to random genetic drift. Then, after a certain period of time, the expected concentration of heterozygotes with the ith allele E{Hi(x(t))} in a population randomly sampled from a speculative universe of replica populations (as if repeating the microevolution of the considered one) with theoretically possible states is found as

$$\begin{gathered} E\left\{ H_{i}(\mathbf{x}(t)) \right\} \equiv E\left\{ x_{i}(t)\left( 1 - x_{i}(t) \right) \right\} \\ = p_{i}(1 - p_{i}) - V(x_{i}(t)) = p_{i}(1 - p_{i})(1 - F(t)). \\ \end{gathered} $$

Here E{.} is the symbol of the mathematical expectation (the operation of obtaining the mean) of the random variable in curly brackets, V(.) is the variance of the random variable in parentheses (in our case, the variance of the concentration in replicas), F is the population inbreeding coefficient, and pi = xi(0). At the same time, provided that the concentration of the ith allele in a particular population of this universe is equal to xi, the proportion of the considered heterozygotes in it, taking into account the order of alleles, according to the Hardy–Weinberg law, theoretically is equal to xi(1 – xi).
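The middle equality in the formula above can be spelled out in one line. The sketch below assumes, as is standard for pure drift without systematic pressures, that the expected concentration is unchanged, E{xi(t)} = pi; this assumption is implicit in the formulas of this appendix rather than stated separately.

$$\begin{gathered} E\left\{ x_{i}(1 - x_{i}) \right\} = E\left\{ x_{i} \right\} - E\left\{ x_{i}^{2} \right\} = p_{i} - \left( p_{i}^{2} + V(x_{i}) \right) \\ = p_{i}(1 - p_{i}) - V(x_{i}) = p_{i}(1 - p_{i})\left( 1 - \frac{V(x_{i})}{p_{i}(1 - p_{i})} \right). \\ \end{gathered} $$

Identifying the ratio V(xi)/(pi(1 – pi)) with F then gives the last form, pi(1 – pi)(1 – F).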

The expected concentrations of heterozygotes of all types H(x) and homozygotes in a population randomly sampled from a speculative universe are found as

$$\begin{gathered} E\left\{ H(\mathbf{x}) \right\} \equiv E\left\{ 1 - \sum\limits_{i = 1}^k x_{i}^{2} \right\} = H(\mathbf{p})(1 - F), \\ F = \frac{\sum\nolimits_{i = 1}^k V(x_{i})}{H(\mathbf{p})}. \\ \end{gathered} $$
$$\begin{gathered} E\left\{ x_{i}^{2} \right\} = p_{i}^{2} + F p_{i}(1 - p_{i}) \\ = p_{i}^{2} + V(x_{i}) = p_{i}^{2} + F H_{i}(\mathbf{p}),\,\,\,\,i = 1,\,\,2,\,\, \ldots ,\,\,k, \\ \end{gathered} $$
$$\begin{gathered} E\left\{ \sum\limits_{i = 1}^k x_{i}^{2} \right\} = \sum\limits_{i = 1}^k p_{i}^{2} + V(\mathbf{x}) = \sum\limits_{i = 1}^k p_{i}^{2} + F H(\mathbf{p}) \\ = 1 - H(\mathbf{p}) + F H(\mathbf{p}),\,\,\,\,V(\mathbf{x}) \equiv \sum\limits_{i = 1}^k V(x_{i}). \\ \end{gathered} $$

Here V(xi) is the variance of the concentration xi of the ith allele (i = 1, 2, …, k) across replica populations. The population inbreeding coefficient F can be expressed either through the variance V(xi) of an individual allele concentration or, taking the concentrations of all alleles into account, through the total variance V(x) in the theoretical universe of replica populations:

$$\begin{gathered} F = \frac{V(x_{i})}{p_{i}(1 - p_{i})},\,\,\,\,i = 1,\,\,2,\,\, \ldots ,\,\,k, \\ F = \frac{H_{i}(\mathbf{p}) - E\left\{ H_{i}(\mathbf{x}) \right\}}{H_{i}(\mathbf{p})}, \\ F = \frac{V(\mathbf{x})}{1 - \sum\nolimits_{i = 1}^k p_{i}^{2}} = \frac{V(\mathbf{x})}{H(\mathbf{p})}, \\ V(\mathbf{x}) \equiv \sum\limits_{i = 1}^k V(x_{i}). \\ \end{gathered} $$

Similarly, the divergence of real subpopulations in a subdivided population can be described both in terms of the variance of the individual allele concentration distribution and in terms of the random inbreeding coefficient F.

The presented formulas characterize an inbred population. Formally, such a population behaves as if it were subdivided, i.e., as if it consisted either of speculative replica populations or of real subpopulations. In the latter case, the above formulas are equally applicable to a universe of populations with common origin and to an arbitrary mechanical mixture of populations with random mating. Here it is assumed that the mathematical expectations and variances of concentrations refer to the subdivided population under consideration, which plays the role of the theoretical universe. For a real group of subpopulations, mathematical expectations and variances are calculated in the standard way, as means and as mean squared deviations from these means. As a result, although the Hardy–Weinberg proportions hold within each separate subpopulation, they are violated in the subdivided population as a whole (Wahlund effect). Such a population is characterized by a deficit of heterozygotes compared to what is expected under total panmixia. The deficit caused by differences (variance) in allele concentrations between subpopulations corresponds to a certain value of the random inbreeding coefficient F (more precisely, FST).
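The Wahlund relation described above can be checked numerically. The following is a minimal sketch, assuming a single allele and subpopulations of equal size (so that plain unweighted means are appropriate); the function name `wahlund_summary` and the example concentrations are hypothetical and not taken from the article.

```python
from typing import Sequence, Tuple


def wahlund_summary(freqs: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (mean concentration, variance, mean within-subpopulation
    heterozygote share x(1 - x), F) for equally sized subpopulations."""
    k = len(freqs)
    if k == 0:
        raise ValueError("need at least one subpopulation")
    p_bar = sum(freqs) / k
    # Variance as the mean squared deviation from the mean concentration.
    variance = sum((x - p_bar) ** 2 for x in freqs) / k
    # Heterozygote share (ordered alleles) averaged over subpopulations,
    # each of which is in Hardy-Weinberg proportions.
    h_within = sum(x * (1.0 - x) for x in freqs) / k
    # Share expected under total panmixia of the pooled population.
    h_panmictic = p_bar * (1.0 - p_bar)
    f = variance / h_panmictic
    return p_bar, variance, h_within, f


# Hypothetical allele concentrations in two subpopulations.
p_bar, variance, h_within, f = wahlund_summary([0.2, 0.8])
print(p_bar, variance, h_within, f)
```

In this sketch the mean concentration plays the role of p, and the heterozygote deficit relative to panmixia equals F times the panmictic share, which is the same identity E{x(1 – x)} = p(1 – p)(1 – F) written for real subpopulations.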

Both for a mechanical mixture of populations and for a group of populations with common origin, this inbreeding coefficient is numerically equal to the statistical correlation between homologous genes of uniting gametes in the genotypes. Obtaining its estimate does not make it possible to determine which of the two situations the researcher is facing. Only when the populations have a common origin and independent, identical microevolutionary histories is the statistical correlation directly associated with the probability of identity by descent of a pair of alleles at an autosomal locus of a diploid genotype. In the general case, related populations are not independent, and the procedure for evaluating the inbreeding coefficient as identity by descent of homologous alleles should take into account the origin of the populations and the conditions of their microevolution.

Note that the statistical correlation between homologous genes of uniting gametes is important, since it is used to determine the concentrations of genotypes and thereby, for example, the change in the genetic structure as a result of selection.

Dynamics of the Main Characteristics of Divergence

We recall that t is the time in generations, Ñe(t) ≡ \(t / \sum\nolimits_{\tau = 1}^{t} (1 / N_{e}(\tau))\) is the harmonic mean effective population size over t generations, and Ne(τ) is the effective population size in generation τ. For a sufficiently short effective time \(t / \tilde{N}_{e}(t)\), the model of microevolution is approximated by the process of random genetic drift. This process leads to a random deviation of the population state from the initial one, an increase in its inbreeding coefficient, and an increase in the divergence of possible states (divergence of populations with common origin). In this case, the following time dependences of the main characteristics of divergence hold (they are exact within the framework of the model):
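The definition of the harmonic mean effective size can be illustrated with a short computation. This is a minimal sketch; the function name `harmonic_mean_ne` and the sequence of effective sizes are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def harmonic_mean_ne(ne: Sequence[float]) -> float:
    """Harmonic mean effective size over t = len(ne) generations:
    t divided by the sum of 1 / Ne(tau) for tau = 1..t."""
    if not ne:
        raise ValueError("need at least one generation")
    return len(ne) / sum(1.0 / n for n in ne)


# Hypothetical effective sizes Ne(1), ..., Ne(4).
sizes = [100, 400, 400, 100]
ne_tilde = harmonic_mean_ne(sizes)
# Effective time t / Ne~(t) for the same period.
effective_time = len(sizes) / ne_tilde
print(ne_tilde, effective_time)
```

Generations with a small effective size dominate the harmonic mean, so it lies below the arithmetic mean of the same sequence whenever the sizes differ.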

$$\begin{gathered} E\left\{ H_{i}(\mathbf{x}(t)) \right\} \equiv E\left\{ x_{i}(t)\left( 1 - x_{i}(t) \right) \right\} \\ = p_{i}(1 - p_{i})\prod\limits_{\tau = 1}^t \left( 1 - \frac{1}{2N_{e}(\tau)} \right) \mathop \to \limits_{t \to \infty } 0,\,\,\,\,i = 1,\,\,2,\,\, \ldots ,\,\,k, \\ \end{gathered} $$
$$\begin{gathered} E\left\{ H(\mathbf{x}(t)) \right\} \equiv E\left\{ 1 - \sum\limits_{i = 1}^k x_{i}^{2}(t) \right\} \\ = H(\mathbf{p})\prod\limits_{\tau = 1}^t \left( 1 - \frac{1}{2N_{e}(\tau)} \right) \mathop \to \limits_{t \to \infty } 0, \\ \end{gathered} $$
$$E\left\{ x_{i}^{2}(t) \right\} = p_{i} - p_{i}(1 - p_{i})\prod\limits_{\tau = 1}^t \left( 1 - \frac{1}{2N_{e}(\tau)} \right) \mathop \to \limits_{t \to \infty } p_{i},$$
$$\begin{gathered} V(x_{i}(t)) = p_{i}(1 - p_{i}) \\ \times \,\,\left( 1 - \prod\limits_{\tau = 1}^t \left( 1 - \frac{1}{2N_{e}(\tau)} \right) \right) \mathop \to \limits_{t \to \infty } p_{i}(1 - p_{i}), \\ i = 1,\,\,2,\,\, \ldots ,\,\,k, \\ \end{gathered} $$
$$\begin{gathered} V(\mathbf{x}(t)) \equiv \sum\limits_{i = 1}^k V(x_{i}(t)) \\ = H(\mathbf{p})\left( 1 - \prod\limits_{\tau = 1}^t \left( 1 - \frac{1}{2N_{e}(\tau)} \right) \right) \mathop \to \limits_{t \to \infty } H(\mathbf{p}), \\ \end{gathered} $$
$$F(\mathbf{x}(t)) = 1 - \prod\limits_{\tau = 1}^t \left( 1 - \frac{1}{2N_{e}(\tau)} \right) \mathop \to \limits_{t \to \infty } 1.$$

Here, the limits exist when the population size is constant (N) or when Ne(τ) grows (for example, while remaining bounded from above by a constant). The formulas are simpler (and more familiar) for a constant population size N, in which case \(\prod\nolimits_{\tau = 1}^{t} \left( 1 - 1/(2N_{e}(\tau)) \right) = \left( 1 - 1/(2N) \right)^{t}\).
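The exact dependences above all share the same product over generations, so they can be evaluated with a few lines of code. The sketch below is illustrative only; the function names and the example inputs are hypothetical, and the effective sizes are passed as a plain sequence Ne(1), ..., Ne(t).

```python
from typing import Sequence


def drift_product(ne: Sequence[float]) -> float:
    """Product over generations of (1 - 1 / (2 * Ne(tau)))."""
    prod = 1.0
    for n in ne:
        prod *= 1.0 - 1.0 / (2.0 * n)
    return prod


def inbreeding_exact(ne: Sequence[float]) -> float:
    """F(t) = 1 - product of (1 - 1 / (2 * Ne(tau)))."""
    return 1.0 - drift_product(ne)


def expected_heterozygosity(p: Sequence[float], ne: Sequence[float]) -> float:
    """E{H(x(t))} = H(p) * product, with H(p) = 1 - sum of p_i squared."""
    h0 = 1.0 - sum(pi * pi for pi in p)
    return h0 * drift_product(ne)


# Hypothetical example: three alleles, constant size N = 50, t = 10.
p0 = [0.5, 0.3, 0.2]
sizes = [50] * 10
print(inbreeding_exact(sizes), expected_heterozygosity(p0, sizes))
```

For a constant size the product reduces to a power, so `drift_product([N] * t)` agrees with `(1 - 1 / (2 * N)) ** t` up to rounding.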

Approximation of Results at Short Effective Time of Divergence

At a relatively short stage of divergence (at small \(t / \tilde{N}_{e}(t)\)), these dependences simplify further to ones that are linear in the effective time \(t / (2\tilde{N}_{e}(t))\):

$$\begin{gathered} E\left\{ H(\mathbf{x}(t)) \right\} \sim H(\mathbf{p})\left( 1 - \frac{t}{2\tilde{N}_{e}(t)} \right) \\ = H(\mathbf{p})\left( 1 - F(\mathbf{x}(t)) \right),\,\,\,\,F(\mathbf{x}(t)) \sim \frac{t}{2\tilde{N}_{e}(t)}, \\ \end{gathered} $$
$$\begin{gathered} V(x_{i}(t)) \sim p_{i}(1 - p_{i}) \times \frac{t}{2\tilde{N}_{e}(t)}, \\ V(\mathbf{x}(t)) \sim \left( 1 - \sum\limits_{i = 1}^k p_{i}^{2} \right) \times \frac{t}{2\tilde{N}_{e}(t)} \\ = H(\mathbf{p}) \times \frac{t}{2\tilde{N}_{e}(t)} = H(\mathbf{p})F(\mathbf{x}(t)). \\ \end{gathered} $$

When the population size is constant, for instance, equal to N, then Ñe(t) is replaced by N.
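How close the linear approximation is to the exact dependence can be checked directly. The sketch below is illustrative; the function names and the example sizes are hypothetical. It uses the identity t/(2Ñe(t)) = Σ 1/(2Ne(τ)), which follows from the definition of the harmonic mean.

```python
from typing import Sequence


def inbreeding_exact(ne: Sequence[float]) -> float:
    """Exact F(t) = 1 - product of (1 - 1 / (2 * Ne(tau)))."""
    prod = 1.0
    for n in ne:
        prod *= 1.0 - 1.0 / (2.0 * n)
    return 1.0 - prod


def inbreeding_linear(ne: Sequence[float]) -> float:
    """Linear approximation F(t) ~ t / (2 * Ne~(t)) = sum of 1 / (2 * Ne(tau))."""
    return sum(1.0 / (2.0 * n) for n in ne)


# Hypothetical case: t = 10 generations at constant size N = 1000.
sizes = [1000] * 10
exact = inbreeding_exact(sizes)
linear = inbreeding_linear(sizes)
print(exact, linear, linear - exact)
```

The linear value never falls below the exact one, and the gap is of second order in the effective time, so it is negligible when t is much smaller than the harmonic mean effective size.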

Since asymptotically F(t) ~ \(t / (2\tilde{N}_{e}(t))\), the inbreeding coefficient coincides with the effective time and increases monotonically with the time t in generations. The effective time can therefore be measured by the value of the inbreeding coefficient, and vice versa. This asymptotic behavior holds for the small inbreeding coefficients typical of human populations.

The random process of sampling drift remains a drift process when alleles are pooled into arbitrary groups, in particular, when one group consists of a single allele, for example, with concentration x, and the other group contains all the remaining alleles. The concentration of the latter group is equal to 1 – x; it can be discarded as a dependent variable, and the dynamics of a single allele concentration can be studied separately. In this case, the inbreeding coefficient F(t) can be expressed in terms of the expected concentration E{x(t)(1 – x(t))} of heterozygotes (taking into account the order of alleles) in a population randomly sampled from a theoretical universe of replica populations with the same demographic history:

$$\begin{gathered} F(t) = \frac{p(1 - p) - E\left\{ x(t)\left( 1 - x(t) \right) \right\}}{p(1 - p)} \\ = \frac{H(p) - E\left\{ H(x(t)) \right\}}{H(p)},\,\,\,\,p = x(0). \\ \end{gathered} $$

Here E{x(t)(1 – x(t))} means averaging of x(t)(1 – x(t)) over possible values of x(t) in replica populations of theoretical universe. Note that the last expression for F(t) is also true when all alleles are taken into account.

It should be stressed that the above formula does not make it possible to estimate inbreeding from typical data, which consist only of the current concentrations of heterozygotes x(t)(1 – x(t)) in the studied population. The formula refers to the expected concentration of heterozygotes in a population randomly sampled from the theoretical universe of replica populations. The possible genetic compositions of the replicas differ at random. When the microevolutionary history of the population under study is repeated under the same conditions, the replicas diverge from each other owing to the sampling nature of random drift with its inherent sampling errors.

In addition, the found expression for F(t) depends on the unknown initial state p of the studied population (and its speculative replicas), which cannot be found from the current state.

The random inbreeding coefficient can also be represented as

$$F(t) = \frac{V(x(t))}{p(1 - p)} = \frac{V(x(t))}{H(p)}.$$

Here V is interpopulation variance of allele concentrations in the theoretical universe of replicas (variance of possible results of realizations of the drift process under the same conditions with the same initial states). This approach also requires information on the unknown initial value of x(0) = p.
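The notion of a theoretical universe of replicas can be made concrete with a small Monte Carlo experiment. This is a minimal sketch under stated assumptions: one allele, constant size N, each generation formed by binomial sampling of 2N gametes, and a known initial concentration p. The function names, parameter values, and seed are hypothetical choices for illustration and are not the author's procedure.

```python
import random
from typing import List


def simulate_final_concentrations(p: float, n: int, t: int,
                                  replicas: int, seed: int = 0) -> List[float]:
    """Run independent replicas of drift for t generations and return
    the final allele concentration x(t) in each replica."""
    rng = random.Random(seed)
    two_n = 2 * n
    finals = []
    for _replica in range(replicas):
        x = p
        for _generation in range(t):
            # Binomial sampling: each of the 2N gametes carries the
            # allele with probability equal to the current concentration.
            copies = sum(1 for _gamete in range(two_n) if rng.random() < x)
            x = copies / two_n
        finals.append(x)
    return finals


def inbreeding_from_variance(finals: List[float], p: float) -> float:
    """Estimate F(t) = V(x(t)) / (p(1 - p)), taking the variance as the
    mean squared deviation from the known initial concentration p."""
    v = sum((x - p) ** 2 for x in finals) / len(finals)
    return v / (p * (1.0 - p))


def inbreeding_exact(n: int, t: int) -> float:
    """Exact F(t) for constant size N: 1 - (1 - 1 / (2N)) ** t."""
    return 1.0 - (1.0 - 1.0 / (2.0 * n)) ** t


finals = simulate_final_concentrations(p=0.5, n=10, t=5, replicas=4000, seed=1)
print(inbreeding_from_variance(finals, 0.5), inbreeding_exact(10, 5))
```

The estimate approaches the exact value only because the initial concentration p and many replicas are available in the simulation; for a single real population neither is observed, which is the limitation noted in the text.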

Cite this article

Passekov, V.P. Analysis of Random Processes of Isonymy: II. Dynamics of Population Divergence. Russ J Genet 57, 1337–1347 (2021). https://doi.org/10.1134/S1022795421110119

  • DOI: https://doi.org/10.1134/S1022795421110119
