A Geometrical Framework for f-Statistics

  • Original Article
Bulletin of Mathematical Biology

Abstract

A detailed derivation of the f-statistics formalism is made from a geometrical framework. It is shown that the f-statistics appear when a genetic distance matrix is constrained to describe a four-population phylogenetic tree. The choice of genetic metric is crucial and plays a central role in the treeness criterion. Lack of treeness is interpreted in the formalism as the presence of population admixture. In this respect, four formulas are given to estimate the admixture proportions. One of them is the so-called \(f_4\)-ratio estimate, and we show that a second one is related to a known result developed in terms of the fixation index \(F_{\mathrm{ST}}\). An illustrative numerical simulation of admixture proportion estimates is included. Relationships of the formalism with coalescence times and pairwise sequence differences are also provided.

Acknowledgements

We are indebted to J.M. Bordes for illuminating discussions about the geometry of f-statistics. GOG thanks The Leverhulme Trust for financial support. JAO's work was partially supported by the Spanish MINECO (Grant Numbers AYA2016-81065-C2-2 and PID2019-109592GB-100).

Author information

Corresponding author

Correspondence to José-Angel Oteo.

Lower Bound to Reliability of \(f_3\) Admixture Test

A lower bound may be given to the Euclidean distance r travelled in \(\mathbb {R}^m\) by three points, representing an admixed population and its two contributors, before they reach the right-angled triangle configuration. If the rate at which allele frequencies change in the dataset were known, the time t elapsed since the birth of the admixture could be estimated on the basis that r is proportional to \(\sqrt{t}\), assuming that the vectors describe Brownian trajectories in \(\mathbb {R}^m\). These bounds are of interest to assess the validity of the \(f_3\) admixture test.

Fig. 11 Scheme to find a lower bound to the reliability of the \(f_3\) admixture test

In Fig. 11, the point \(A'\) represents the admixed population, and A and \(A''\) stand for the two equal-weight contributors to the linear combination that defines the birth of \(A'\). The goal is to give just a lower bound, so we do not deal with a generic proportion. Moreover, A and \(A''\) are assumed to be one unit apart and so define the genetic distance scale. In \(\mathbb {R}^m\), these points evolve in time along Brownian-like trajectories (red lines), and the linear distance reached, r, is proportional to \(\sqrt{t}\). All the points on a circumference are reached, on average, in the same time span. For instance, the trajectory that starts at A at the initial time wanders in \(\mathbb {R}^m\) until it crosses the circumference at B.
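The \(\sqrt{t}\) scaling assumed here can be illustrated numerically. The following is a minimal sketch, not part of the original derivation: it simulates independent Gaussian random walks in \(\mathbb {R}^m\) and compares the root-mean-square distance reached after t and after 4t steps. The function name `rms_displacement`, the step size `sigma`, the dimension and the number of walkers are all illustrative assumptions.

```python
import math
import random


def rms_displacement(steps, dims=3, walkers=2000, sigma=1.0, seed=0):
    """Root-mean-square distance from the origin after `steps` Gaussian steps.

    Each walker starts at the origin of R^dims and adds an independent
    N(0, sigma^2) increment to every coordinate at every step
    (a discrete stand-in for Brownian motion; hypothetical parameters).
    """
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(walkers):
        pos = [0.0] * dims
        for _ in range(steps):
            for k in range(dims):
                pos[k] += rng.gauss(0.0, sigma)
        total_sq += sum(x * x for x in pos)
    return math.sqrt(total_sq / walkers)


r_short = rms_displacement(25)    # distance scale after t = 25 steps
r_long = rms_displacement(100)    # distance scale after 4t = 100 steps
ratio = r_long / r_short          # expected to be near sqrt(4) = 2
print(f"r(25) = {r_short:.3f}, r(100) = {r_long:.3f}, ratio = {ratio:.3f}")
```

Quadrupling the elapsed time should roughly double the distance reached, which is the sense in which a known drift rate would turn a bound on r into a bound on t.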

We are interested in the situation where the angle at vertex \(B'\) becomes a right angle. We assume the favourable case in which the particular locations \(B\), \(B'\) and \(B''\) are reached simultaneously. This is the configuration depicted, where the lines \(\overline{B'B}\) and \(\overline{B'B''}\) are tangent to the circumferences of radius r centred at A and \(A''\), respectively. We want to ascertain the radius r at which this happens, because the right-angled triangle \(BB'B''\) marks the limiting case of the triangle criterion (22). The proof relies on the fact that the triangles ABC and \(CA'B'\) are congruent, with C the point at which \(\overline{B'B}\) crosses \(\overline{AA'}\). Thus, \(|\overline{CA'}|=r\) and \(|\overline{AC}|=|\overline{CB'}|=\sqrt{2}r\). Since \(|\overline{AA''}|=1\), we then have \(2(|\overline{AC}|+|\overline{CA'}|)=2r(1+\sqrt{2})=1\), and therefore \(r=(\sqrt{2}-1)/2\simeq 0.207\). Of course, this picture corresponds to an extreme situation in which the three wandering trajectories arrive simultaneously at the points B, \(B'\) and \(B''\). This is why \(21\%\) of the initial distance between contributors is just a lower bound. Beyond that, if the dynamics lead to an acute-angled triangle, the \(f_3\) admixture test will fail.
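The construction can be checked with explicit coordinates. The sketch below is only an illustration under assumed coordinates, which the text does not fix: A at the origin, \(A''\) at (1, 0), \(A'\) at their midpoint, \(B'\) directly above \(A'\), and B and \(B''\) at the tangency points below the line \(\overline{AA''}\). The helper names `sub`, `dot` and `dist` are hypothetical.

```python
import math

# Assumed planar coordinates (not given in the text): contributors one unit apart.
A = (0.0, 0.0)
A1 = (0.5, 0.0)   # A', equal-weight admixture of A and A''
A2 = (1.0, 0.0)   # A''

r = (math.sqrt(2.0) - 1.0) / 2.0   # radius derived in the text

k = r / math.sqrt(2.0)
B1 = (A1[0], A1[1] + r)            # B' on the circle around A'
B = (A[0] + k, A[1] - k)           # B, tangency point on the circle around A
B2 = (A2[0] - k, A2[1] - k)        # B'', tangency point on the circle around A''


def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Every displaced point lies at distance r from its starting point.
radius_ok = all(abs(dist(c, p) - r) < 1e-12
                for c, p in ((A, B), (A1, B1), (A2, B2)))
# Tangency: the radius AB is perpendicular to the line B'B (same for A''B'').
tangent_left = dot(sub(B, A), sub(B1, B))
tangent_right = dot(sub(B2, A2), sub(B1, B2))
# Right angle at B': the two sides meeting at B' are perpendicular.
right_angle = dot(sub(B, B1), sub(B2, B1))

print(f"r = {r:.6f}")
print(f"tangency dot products: {tangent_left:.2e}, {tangent_right:.2e}")
print(f"dot product at B': {right_angle:.2e}")
```

All three dot products vanish up to rounding, which confirms that the value \(r=(\sqrt{2}-1)/2\) produces two tangent lines meeting at a right angle at \(B'\).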

An algebraic proof is also possible by minimizing the expression for r with respect to several variables, but the algebra is considerably more involved. It yields the very same result for the radius r.
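As a rough numerical companion to that statement, one can search over the positions of the three points by brute force. The sketch below is not the algebraic minimization mentioned above. It assumes that the three points stay in a plane, places A, \(A'\) and \(A''\) at (0, 0), (0.5, 0) and (1, 0), and scans a coarse 15-degree grid of directions on each circle of radius r. The function name `max_dot` is hypothetical. A positive value of \((B-B')\cdot (B''-B')\) means that the angle at \(B'\) is acute.

```python
import math


def max_dot(r, n=24):
    """Largest (B - B') . (B'' - B') over an n-point angular grid per circle.

    B, B', B'' lie on circles of radius r centred at the assumed positions
    A = (0, 0), A' = (0.5, 0), A'' = (1, 0). Planar search only.
    """
    step = 2.0 * math.pi / n
    units = [(math.cos(i * step), math.sin(i * step)) for i in range(n)]
    best = -math.inf
    for ux, uy in units:                 # B   = A   + r * u
        bx, by = r * ux, r * uy
        for vx, vy in units:             # B'  = A'  + r * v
            px, py = 0.5 + r * vx, r * vy
            for wx, wy in units:         # B'' = A'' + r * w
                qx, qy = 1.0 + r * wx, r * wy
                d = (bx - px) * (qx - px) + (by - py) * (qy - py)
                if d > best:
                    best = d
    return best


r_star = (math.sqrt(2.0) - 1.0) / 2.0
below = max_dot(0.20)       # radius slightly smaller than the bound
at_bound = max_dot(r_star)  # radius equal to the bound
above = max_dot(0.215)      # radius slightly larger than the bound
print(f"max dot product: r=0.200 -> {below:.5f}, "
      f"r=r* -> {at_bound:.2e}, r=0.215 -> {above:.5f}")
```

On this grid the largest dot product stays negative just below \(r\simeq 0.207\) and turns positive just above it, consistent with the geometric argument. A finer grid or a search in higher dimension would be needed for anything beyond a sanity check.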

Cite this article

Oteo-García, G., Oteo, JA. A Geometrical Framework for f-Statistics. Bull Math Biol 83, 14 (2021). https://doi.org/10.1007/s11538-020-00850-8

  • DOI: https://doi.org/10.1007/s11538-020-00850-8
