A balance index for phylogenetic trees based on rooted quartets

Journal of Mathematical Biology

Abstract

We define a new balance index for rooted phylogenetic trees based on the symmetry of the evolutionary history of every set of 4 leaves. This index is well defined for multifurcating trees, and it can be computed in time linear in the number of leaves. We determine its maximum and minimum values for arbitrary and bifurcating trees, and we provide exact formulas for its expected value and variance on bifurcating trees under Ford’s \(\alpha \)-model and Aldous’ \(\beta \)-model and on arbitrary trees under the \(\alpha \)\(\gamma \)-model.
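For bifurcating trees, the linear-time claim can be illustrated with the recurrence quoted from Lemma 7 in Appendix A.1, \( rQIB (T\star T')= rQIB (T)+ rQIB (T')+\left( {\begin{array}{c}|L(T)|\\ 2\end{array}}\right) \left( {\begin{array}{c}|L(T')|\\ 2\end{array}}\right) \). The following is a minimal sketch under stated assumptions, not the authors' algorithm for the general multifurcating index: the tree encoding (nested 2-tuples, with any non-tuple object as a leaf) and the function name `rqib` are hypothetical choices made for this example.

```python
from math import comb


def rqib(tree):
    """Return (number of leaves, rQIB value) of a bifurcating tree.

    Assumed encoding: a leaf is any non-tuple object and an internal
    node is a 2-tuple (left, right). Each node is visited once, so the
    number of arithmetic steps is linear in the size of the tree.
    """
    if not isinstance(tree, tuple):
        return 1, 0  # a single leaf contributes nothing
    left, right = tree
    a, value_left = rqib(left)
    b, value_right = rqib(right)
    # Recurrence from Lemma 7: add binom(a, 2) * binom(b, 2) at the root.
    return a + b, value_left + value_right + comb(a, 2) * comb(b, 2)


caterpillar = (((("a", "b"), "c"), "d"), "e")
symmetric_four = (("a", "b"), ("c", "d"))
two_plus_three = (("a", "b"), (("c", "d"), "e"))

print(rqib(caterpillar))
print(rqib(symmetric_four))
print(rqib(two_plus_three))
```

The recursive form is kept for readability; for very deep trees (long caterpillars) an explicit stack would avoid Python's recursion limit.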

Notes

  1. See http://evolution.genetics.washington.edu/phylip/newicktree.html.

References

  • Abramowitz M, Stegun IAS (1972) Handbook of mathematical functions with formulas, graphs, and mathematical tables. Dover, New York

  • Aldous D (1996) Probability distributions on cladograms. In: Aldous D, Pemantle R (eds) Random discrete structures. The IMA volumes in mathematics and its applications, vol 76. Springer, New York, pp 1–18

  • Aldous D (2001) Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today. Stat Sci 16:23–34

  • Blum MGB, François OF (2005) On statistical tests of phylogenetic tree imbalance: the Sackin and other indices revisited. Math Biosci 195:141–153

  • Cardona G, Mir A, Rosselló F (2013) Exact formulas for the variance of several balance indices under the Yule model. J Math Biol 67:1833–1846

  • Cavalli-Sforza LL, Edwards A (1967) Phylogenetic analysis: models and estimation procedures. Evolution 21:550–570

  • Chen B, Ford D, Winkel M (2009) A new family of Markov branching trees: the alpha–gamma model. Electron J Probab 14:400–430

  • Colless DH (1982) Review of “phylogenetics: the theory and practice of phylogenetic systematics”. Syst Zool 31:100–104

  • Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms, 3rd edn. The MIT Press, Cambridge

  • Coronado TM, Mir A, Rosselló F (2018) The probabilities of trees and cladograms under Ford’s \(\alpha \)-model. Sci World J 2018:1916094

  • Dearlove BL, Frost SD (2015) Measuring asymmetry in time-stamped phylogenies. PLoS Comput Biol 11(7):e1004312

  • Felsenstein J (2004) Inferring phylogenies. Sinauer Associates Inc., Sunderland

  • Ford D (2005) Probabilities on cladograms: introduction to the alpha model. arXiv preprint arXiv:math/0511246

  • Harding E (1971) The probabilities of rooted tree-shapes generated by random bifurcation. Adv Appl Probab 3:44–77

  • Keller-Schmidt S, Tuğrul M, Eguíluz VM, Hernández-García E, Klemm K (2015) Anomalous scaling in an age-dependent branching model. Phys Rev E 91:022803

  • Kirkpatrick M, Slatkin M (1993) Searching for evolutionary patterns in the shape of a phylogenetic tree. Evolution 47:1171–1181

  • Macdonald IG (1995) Symmetric functions and Hall polynomials, 2nd edn. Oxford University Press, Oxford

  • Matsen F (2007) Optimization over a class of tree shape statistics. IEEE/ACM Trans Comput Biol Bioinform (TCBB) 4:506–512

  • McKenzie A, Steel M (2000) Distributions of cherries for two models of trees. Math Biosci 164:81–92

  • Mir A, Rosselló F, Rotger L (2013) A new balance index for phylogenetic trees. Math Biosci 241:125–136

  • Mir A, Rotger L, Rosselló F (2018) Sound Colless-like balance indices for multifurcating trees. PLoS ONE 13(9):e0203401

  • Mooers A, Heard SB (1997) Inferring evolutionary process from phylogenetic tree shape. Q Rev Biol 72:31–54

  • Pinelis I (2003) Evolutionary models of phylogenetic trees. Proc R Soc Lond B Biol Sci 270:1425–1431

  • Rosen DE (1978) Vicariant patterns and historical explanation in biogeography. Syst Biol 27:159–188

  • Sackin MJ (1972) Good and “bad” phenograms. Syst Zool 21:225–226

  • Shao KT, Sokal R (1990) Tree balance. Syst Zool 39:226–276

  • Sloane NJA (2010) The on-line encyclopedia of integer sequences. http://oeis.org/. Accessed 30 Apr 2019

  • Wu T, Choi KP (2016) On joint subtree distributions under two evolutionary models. Theor Popul Biol 108:13–23

  • Yule GU (1924) A mathematical theory of evolution based on the conclusions of Dr J. C. Willis. Philos Trans R Soc Lond Ser B 213:21–87

  • Zhu S, Degnan JH, Steel M (2011) Clades, clans and reciprocal monophyly under neutral evolutionary models. Theor Popul Biol 79:220–227

  • Zhu S, Than C, Wu T (2015) Clades and clans: a comparison study of two evolutionary models. J Math Biol 71:99–124


Acknowledgements

A preliminary version of this paper was presented at the Workshop on Algebraic and combinatorial phylogenetics held in Barcelona (June 26–30, 2017). We thank Mike Steel, Gabriel Riera, Seth Sullivant, the anonymous reviewers and the associate editor for their helpful suggestions on several aspects of this paper. This research was partially supported by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund through Project DPI2015-67082-P (MINECO/FEDER).

Author information

Corresponding author

Correspondence to Francesc Rosselló.

Appendices

A.1 An alternative derivation of the variance of \( rQIB _n\) under the Yule model

In this section we give an alternative proof of the following result.

Proposition 4

Under the Yule model,

$$\begin{aligned} Var_{Y}( rQIB _n)=\left( {\begin{array}{c}n\\ 4\end{array}}\right) \frac{5 n^4+ 30 n^3+ 118 n^2 + 408 n+ 630}{33075}. \end{aligned}$$

Proof

By Lemma 7, \( rQIB \) on \({\mathcal {BT}}_n\) is a bifurcating recursive tree shape statistic satisfying the recurrence

$$\begin{aligned} rQIB (T\star T')= rQIB (T)+ rQIB (T')+f_{ rQIB }(|L(T)|,|L(T')|) \end{aligned}$$

with \(f_{ rQIB }(a,b)=\left( {\begin{array}{c}a\\ 2\end{array}}\right) \left( {\begin{array}{c}b\\ 2\end{array}}\right) \). It therefore satisfies the hypotheses of Corollary 1 of Cardona et al. (2013) with

$$\begin{aligned} \begin{aligned} \displaystyle \varepsilon (a,b-1)&\displaystyle =f_{ rQIB }(a,b)-f_{ rQIB }(a,b-1)\\&\displaystyle =\left( {\begin{array}{c}a\\ 2\end{array}}\right) \left( {\begin{array}{c}b\\ 2\end{array}}\right) -\left( {\begin{array}{c}a\\ 2\end{array}}\right) \left( {\begin{array}{c}b-1\\ 2\end{array}}\right) =(b-1)\left( {\begin{array}{c}a\\ 2\end{array}}\right) \\ \displaystyle R(n-1)&\displaystyle =E_{Y}( rQIB _n)-E_{Y}( rQIB _{n-1})\\&\displaystyle =\frac{1}{3}\left( {\begin{array}{c}n\\ 4\end{array}}\right) -\frac{1}{3}\left( {\begin{array}{c}n-1\\ 4\end{array}}\right) =\frac{1}{3}\left( {\begin{array}{c}n-1\\ 3\end{array}}\right) \end{aligned} \end{aligned}$$
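Both simplifications in the display above are instances of Pascal's rule, \(\left( {\begin{array}{c}m\\ j\end{array}}\right) -\left( {\begin{array}{c}m-1\\ j\end{array}}\right) =\left( {\begin{array}{c}m-1\\ j-1\end{array}}\right) \). Spelled out in the notation of this proof (a worked restatement, not an additional hypothesis):

$$\begin{aligned} \left( {\begin{array}{c}b\\ 2\end{array}}\right) -\left( {\begin{array}{c}b-1\\ 2\end{array}}\right)&=\left( {\begin{array}{c}b-1\\ 1\end{array}}\right) =b-1,\\ \left( {\begin{array}{c}n\\ 4\end{array}}\right) -\left( {\begin{array}{c}n-1\\ 4\end{array}}\right)&=\left( {\begin{array}{c}n-1\\ 3\end{array}}\right) . \end{aligned}$$

In particular, \(\varepsilon (k,n-k-1)=(n-k-1)\left( {\begin{array}{c}k\\ 2\end{array}}\right) \) and \(R(n-k-1)=\frac{1}{3}\left( {\begin{array}{c}n-k-1\\ 3\end{array}}\right) \), which are the forms substituted into the recurrence for \(E_{Y}( rQIB _n^2)\).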

Since \(E_{Y}( rQIB _1)=0\) and \(f_{ rQIB }(n-1,1)=0\), applying the aforementioned result from Cardona et al. (2013), we have that

$$\begin{aligned} E_{Y}( rQIB _n^2) \displaystyle= & {} \frac{n}{n-1} E_{Y}( rQIB _{n-1}^2)+\frac{4}{n-1} \sum _{k=1}^{n-2} \varepsilon (k,n-k-1) E_{Y}( rQIB _k) \\&\displaystyle +\frac{2}{n-1} \sum _{k=1}^{n-2} R(n-k-1)E_{Y}( rQIB _k) \\&\displaystyle +\frac{1}{n-1}\sum _{k=1}^{n-2} (f_{ rQIB }(k,n-k)^2- f_{ rQIB }(k,n-k-1)^2)\\ \displaystyle= & {} \frac{n}{n-1} E_{Y}( rQIB _{n-1}^2)+\frac{4}{3(n-1)} \sum _{k=1}^{n-2} (n-k-1)\left( {\begin{array}{c}k\\ 2\end{array}}\right) \left( {\begin{array}{c}k\\ 4\end{array}}\right) \\&\displaystyle +\frac{2}{9(n-1)} \sum _{k=1}^{n-2} \left( {\begin{array}{c}n-k-1\\ 3\end{array}}\right) \left( {\begin{array}{c}k\\ 4\end{array}}\right) \\&\displaystyle +\frac{1}{n-1}\sum _{k=1}^{n-2} \left( {\begin{array}{c}k\\ 2\end{array}}\right) ^2\Bigg (\left( {\begin{array}{c}n-k\\ 2\end{array}}\right) ^2-\left( {\begin{array}{c}n-k-1\\ 2\end{array}}\right) ^2\Bigg )\\ \displaystyle= & {} \frac{n}{n-1} E_{Y}( rQIB _{n-1}^2) +\frac{n}{3}\left( {\begin{array}{c}n-2\\ 4\end{array}}\right) \frac{15 n^2 - 35 n + 6}{420}\\&\displaystyle +\frac{n}{9} \left( {\begin{array}{c}n-2\\ 4\end{array}}\right) \frac{n^2 - 13 n + 42}{840}\\&\displaystyle + n\left( {\begin{array}{c}n-2\\ 2\end{array}}\right) \frac{3 n^4 - 18 n^3 + 41 n^2 - 42 n + 36}{1680}\\ \displaystyle= & {} \frac{n}{n-1} E_{Y}( rQIB _{n-1}^2) \\&\displaystyle + \frac{n(n-2)(n-3)(253 n^4-2014 n^3 +6119 n^2-7430 n+ 3504)}{181440} \end{aligned}$$
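The three closed forms in the derivation above, and their combination into a single polynomial term, can be checked numerically. The sketch below evaluates each sum term by term with exact rational arithmetic and compares it with the corresponding closed form; it is a sanity check for a range of \(n\), not a proof, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def sums(n):
    """The three sums of the recurrence, evaluated term by term (n >= 2)."""
    ks = range(1, n - 1)  # k = 1, ..., n-2
    s1 = Fraction(4, 3 * (n - 1)) * sum(
        (n - k - 1) * comb(k, 2) * comb(k, 4) for k in ks)
    s2 = Fraction(2, 9 * (n - 1)) * sum(
        comb(n - k - 1, 3) * comb(k, 4) for k in ks)
    s3 = Fraction(1, n - 1) * sum(
        comb(k, 2) ** 2 * (comb(n - k, 2) ** 2 - comb(n - k - 1, 2) ** 2)
        for k in ks)
    return s1, s2, s3


def closed_forms(n):
    """The closed forms stated for the same three sums."""
    c1 = Fraction(n, 3) * comb(n - 2, 4) * Fraction(15 * n**2 - 35 * n + 6, 420)
    c2 = Fraction(n, 9) * comb(n - 2, 4) * Fraction(n**2 - 13 * n + 42, 840)
    c3 = n * comb(n - 2, 2) * Fraction(
        3 * n**4 - 18 * n**3 + 41 * n**2 - 42 * n + 36, 1680)
    return c1, c2, c3


def combined(n):
    """The single term appearing in the last line of the derivation."""
    poly = 253 * n**4 - 2014 * n**3 + 6119 * n**2 - 7430 * n + 3504
    return Fraction(n * (n - 2) * (n - 3) * poly, 181440)


for n in range(2, 31):
    assert sums(n) == closed_forms(n)
    assert sum(closed_forms(n)) == combined(n)
print("closed forms agree for n = 2, ..., 30")
```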

Dividing both sides of this expression for \(E_{Y}( rQIB ^2_n)\) by \(n\) and setting \(y_n=E_{Y}( rQIB ^2_n)/n\), we obtain the recurrence

$$\begin{aligned} y_n =\displaystyle y_{n-1}+ \frac{(n-2)(n-3)(253 n^4-2014 n^3 +6119 n^2-7430 n+ 3504)}{181440}. \end{aligned}$$

Since \(y_0=y_1=0\), its solution is

$$\begin{aligned} y_n \displaystyle= & {} \sum _{k=2}^n \frac{(k-2)(k-3)(253 k^4-2014 k^3 +6119 k^2-7430 k+ 3504)}{181440}\\ \displaystyle= & {} \frac{(n - 3) (n - 2) (n - 1) (1265 n^4 - 7110 n^3 + 14419 n^2 - 4086 n + 5040)}{6350400} \end{aligned}$$
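The closed form for \(y_n\) can be compared with the direct summation of the increments for a range of \(n\). A small sketch with exact rational arithmetic follows; the function names are illustrative, and the check covers only the tested values of \(n\).

```python
from fractions import Fraction


def increment(k):
    """The k-th increment y_k - y_{k-1} of the recurrence."""
    poly = 253 * k**4 - 2014 * k**3 + 6119 * k**2 - 7430 * k + 3504
    return Fraction((k - 2) * (k - 3) * poly, 181440)


def y_by_summation(n):
    """y_n obtained by adding the increments for k = 2, ..., n (y_1 = 0)."""
    return sum((increment(k) for k in range(2, n + 1)), Fraction(0))


def y_closed(n):
    """The closed form stated for y_n."""
    poly = 1265 * n**4 - 7110 * n**3 + 14419 * n**2 - 4086 * n + 5040
    return Fraction((n - 3) * (n - 2) * (n - 1) * poly, 6350400)


for n in range(1, 41):
    assert y_by_summation(n) == y_closed(n)
print(y_closed(4))
```

For \(n=4\) this gives \(y_4=1/12\), so \(E_{Y}( rQIB _4^2)=4y_4=1/3\), as expected for an index that takes the value 1 with probability 1/3 and 0 otherwise on trees with 4 leaves.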

from which we obtain

$$\begin{aligned} E_{Y}( rQIB _n^2)&=ny_n\\&\displaystyle = \left( {\begin{array}{c}n\\ 4\end{array}}\right) \frac{1265 n^4 - 7110 n^3 + 14419 n^2 - 4086 n + 5040}{264600}. \end{aligned}$$

Finally,

$$\begin{aligned} Var_{Y}( rQIB _n) \displaystyle= & {} E_{Y}( rQIB _n^2)-E_{Y}( rQIB _n)^2\\ \displaystyle= & {} \left( {\begin{array}{c}n\\ 4\end{array}}\right) \frac{1265 n^4 - 7110 n^3 + 14419 n^2 - 4086 n + 5040}{264600}-\frac{1}{9}\left( {\begin{array}{c}n\\ 4\end{array}}\right) ^2\\ \displaystyle= & {} \left( {\begin{array}{c}n\\ 4\end{array}}\right) \frac{5 n^4 + 30 n^3 + 118 n^2 + 408 n + 630}{33075}, \end{aligned}$$

as we claimed. \(\square \)
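Proposition 4 can also be checked by brute force for small \(n\). The sketch below is not part of the authors' proof. It assumes the standard description of the Yule model in which the root of a tree with \(n\) leaves splits them into sizes \((k,n-k)\) with probability \(1/(n-1)\) for each \(k=1,\ldots ,n-1\), the two subtrees then being independent Yule trees, and it combines this with the recurrence for \( rQIB \) quoted from Lemma 7 to build the exact distribution of \( rQIB _n\). The function names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def yule_distribution(n):
    """Exact distribution {value: probability} of rQIB_n under the Yule model."""
    if n == 1:
        return {0: Fraction(1)}
    dist = {}
    for k in range(1, n):  # root split (k, n-k), each with probability 1/(n-1)
        extra = comb(k, 2) * comb(n - k, 2)  # f_rQIB(k, n-k)
        for a, pa in yule_distribution(k).items():
            for b, pb in yule_distribution(n - k).items():
                value = a + b + extra
                dist[value] = dist.get(value, Fraction(0)) + pa * pb / (n - 1)
    return dist


def moments(n):
    """Return (mean, variance) of rQIB_n computed from the exact distribution."""
    dist = yule_distribution(n)
    mean = sum(v * p for v, p in dist.items())
    second = sum(v * v * p for v, p in dist.items())
    return mean, second - mean**2


def variance_formula(n):
    """The closed form of Proposition 4."""
    poly = 5 * n**4 + 30 * n**3 + 118 * n**2 + 408 * n + 630
    return Fraction(comb(n, 4) * poly, 33075)


for n in range(1, 10):
    mean, var = moments(n)
    assert mean == Fraction(comb(n, 4), 3)
    assert var == variance_formula(n)
print(moments(5))
```

For \(n=5\) the distribution is concentrated on the values 0, 1 and 3 with probabilities 1/3, 1/6 and 1/2, which gives mean 5/3 and variance 17/9, in agreement with the formula.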

A.2 Some tables used in Sect. 5

See Tables 3, 4 and 5.

Table 3 Coefficients of the probabilities of the trees in \({\mathcal {BT}}_k^*\), for \(k=5,6,7,8\), in the formula for the variance of \( rQIB _n\)
Table 4 Probabilities under the \(\alpha \)-model of the trees involved in the formula for the variance of \( rQIB _n\)
Table 5 Probabilities under the \(\beta \)-model of the trees involved in the formula for the variance of \( rQIB _n\)

Cite this article

Coronado, T.M., Mir, A., Rosselló, F. et al. A balance index for phylogenetic trees based on rooted quartets. J. Math. Biol. 79, 1105–1148 (2019). https://doi.org/10.1007/s00285-019-01377-w

  • DOI: https://doi.org/10.1007/s00285-019-01377-w
