On the convergence of the maximum likelihood estimator for the transition rate under a 2-state symmetric model

Journal of Mathematical Biology

Abstract

Maximum likelihood estimators (MLEs) are used extensively to estimate unknown parameters of stochastic trait evolution models on phylogenetic trees. Although the MLE is known to converge to the true value when samples are independent, that result does not apply here because trait values of different species are correlated through shared evolutionary history. In this paper, we consider a 2-state symmetric model for a single binary trait and investigate the theoretical properties of the MLE for the transition rate in the large-tree limit, a theoretical scenario in which the number of taxa increases to infinity and the trait values of all species are observed. Specifically, we prove that the MLE converges to the true value under some regularity conditions. These conditions ensure that the tree shape is not too irregular, and they hold for many practical scenarios such as trees with bounded edges, trees generated from the Yule (pure birth) process, and trees generated from the coalescent point process. Our result also provides an upper bound for the distance between the MLE and the true value.

References

  • Ané C (2008) Analysis of comparative data with hierarchical autocorrelation. Ann Appl Stat 2(3):1078–1102

  • Ané C, Ho LST, Roch S (2017) Phase transition on the convergence rate of parameter estimation under an Ornstein–Uhlenbeck diffusion on a tree. J Math Biol 74(1–2):355–385

  • Felsenstein J (1981) Evolutionary trees from gene frequencies and quantitative characters: finding maximum likelihood estimates. Evolution 35(6):1229–1242

  • Felsenstein J (1985) Phylogenies and the comparative method. Am Nat 125(1):1–15

  • Harmon LJ, Weir JT, Brock CD, Glor RE, Challenger W (2007) GEIGER: investigating evolutionary radiations. Bioinformatics 24(1):129–131

  • Ho LST, Ané C (2013) Asymptotic theory with hierarchical autocorrelation: Ornstein–Uhlenbeck tree models. Ann Stat 41(2):957–981

  • Ho LST, Ané C (2014) Intrinsic inference difficulties for trait evolution with Ornstein–Uhlenbeck models. Methods Ecol Evol 5(11):1133–1146

  • Jammalamadaka SR, Janson S (1986) Limit theorems for a triangular scheme of U-statistics with applications to inter-point distances. Ann Probab 14(4):1347–1358

  • Lambert A, Stadler T (2013) Birth-death models and coalescent point processes: the shape and probability of reconstructed phylogenies. Theor Popul Biol 90:113–128

  • Li G, Steel M, Zhang L (2008) More taxa are not necessarily better for the reconstruction of ancestral character states. Syst Biol 57(4):647–653

  • Lipton RJ, Tarjan RE (1979) A separator theorem for planar graphs. SIAM J Appl Math 36(2):177–189

  • Mooers A, Schluter D (1999) Reconstructing ancestor states with maximum likelihood: support for one- and two-rate models. Syst Biol 48(3):623–633

  • Mossel E, Steel M (2014) Majority rule has transition ratio 4 on Yule trees under a 2-state symmetric model. J Theor Biol 360:315–318

  • Pennell MW, Eastman JM, Slater GJ, Brown JW, Uyeda JC, FitzJohn RG, Alfaro ME, Harmon LJ (2014) geiger v2.0: an expanded suite of methods for fitting macroevolutionary models to phylogenetic trees. Bioinformatics 30(15):2216–2218

  • Sagitov S, Bartoszek K (2012) Interspecies correlation for neutrally evolving traits. J Theor Biol 309:11–19

  • Tuffley C, Steel M (1997) Links between maximum likelihood and maximum parsimony under a simple model of site substitution. Bull Math Biol 59(3):581–607

  • Van Erven T, Harremoës P (2014) Rényi divergence and Kullback–Leibler divergence. IEEE Trans Inf Theory 60(7):3797–3820

  • Yule GU (1925) A mathematical theory of evolution, based on the conclusions of Dr. JC Willis, FRS. Philos Trans R Soc Lond Ser B 213:21–87

Author information

Corresponding author

Correspondence to Lam Si Tung Ho.

Additional information

LSTH was supported by startup funds from Dalhousie University, the Canada Research Chairs program, and the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant RGPIN-2018-05447. FAM was supported by CISE-1564137, and in part by a Faculty Scholar grant from the Howard Hughes Medical Institute and the Simons Foundation. MAS was supported by National Science Foundation Grant DMS1264153 and National Institutes of Health grant R01 AI107034.

A Proofs

A.1 Proof of Lemma 2

Note that by symmetry, we have \({\mathbb {P}}({\mathbf {Y}} = {\mathbf {y}}~|~\rho = 0) = {\mathbb {P}}(1 - {\mathbf {Y}} = {\mathbf {y}}~|~ \rho = 1)\). We deduce that

$$\begin{aligned} {\mathbb {P}}(h({\mathbf {Y}}) = x~|~\rho = 0)&= {\mathbb {P}}({\mathbf {Y}} \in h^{-1}(x)~|~\rho = 0) \\&= {\mathbb {P}}({\mathbf {1}} - {\mathbf {Y}} \in h^{-1}(x)~|~\rho = 1) \\&= {\mathbb {P}}(h({\mathbf {1}} - {\mathbf {Y}}) = x~|~\rho = 1) \\&= {\mathbb {P}}(h({\mathbf {Y}}) = x~|~\rho = 1) \end{aligned}$$

which completes the proof.

A.2 Proof of Lemma 3

Denote \(P^{(u)}_v = {\mathbb {P}}({\mathbf {Y_u}} ~|~ {\mathbb {T}}_u, \mu , \rho _u = v)\) for \(u \in \{ 1, 2\}\), \(v \in \{ 0,1 \}\). We have

$$\begin{aligned} P_{{\mathbb {T}}_1,\mu }({\mathbf {Y_1}}) P_{{\mathbb {T}}_2,\mu }({\mathbf {Y_2}}) = \frac{1}{4} \sum _{u,v \in \{ 0, 1\}}{P^{(1)}_u P^{(2)}_v}. \end{aligned}$$

Moreover

$$\begin{aligned} P_{{\mathbb {T}},\mu }({\mathbf {Y}}) = \frac{1 + e^{-2 \mu d}}{4} \sum _{u \in \{ 0, 1 \}}{P^{(1)}_u P^{(2)}_u} + \frac{1 - e^{-2 \mu d}}{4} \sum _{u \in \{ 0, 1 \}}{P^{(1)}_u P^{(2)}_{1-u}}. \end{aligned}$$

Since \(1 - e^{-2 \mu d} \le 1 \pm e^{-2 \mu d} \le 2\), comparing the two displays term by term gives

$$\begin{aligned} \left( 1 - e^{-2 \mu d}\right) P_{{\mathbb {T}}_1,\mu }({\mathbf {Y_1}}) P_{{\mathbb {T}}_2,\mu }({\mathbf {Y_2}}) \le P_{{\mathbb {T}},\mu }({\mathbf {Y}}) \le 2 P_{{\mathbb {T}}_1,\mu }({\mathbf {Y_1}}) P_{{\mathbb {T}}_2,\mu }({\mathbf {Y_2}}). \end{aligned}$$
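The comparison between the joined-tree likelihood and the product of subtree likelihoods can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, `p1[u]` and `p2[v]` stand for \(P^{(1)}_u\) and \(P^{(2)}_v\), and `d` is assumed to be the length of the edge joining the two subtrees, as the factor \(e^{-2 \mu d}\) suggests. Dividing the expression for \(P_{{\mathbb {T}},\mu }({\mathbf {Y}})\) by the product shows that the ratio is a weighted average of \(1 + e^{-2 \mu d}\) and \(1 - e^{-2 \mu d}\), so it should lie between those two values (and in particular below 2).

```python
import itertools
import math


def joined_likelihood(p1, p2, mu, d):
    """Likelihood on the joined tree from root-conditional likelihoods p1, p2."""
    decay = math.exp(-2.0 * mu * d)
    same = sum(p1[u] * p2[u] for u in (0, 1))      # both roots in the same state
    diff = sum(p1[u] * p2[1 - u] for u in (0, 1))  # roots in different states
    return (1.0 + decay) / 4.0 * same + (1.0 - decay) / 4.0 * diff


def product_likelihood(p1, p2):
    """Product of the two subtree likelihoods, each with a uniform root."""
    return 0.25 * sum(p1[u] * p2[v] for u in (0, 1) for v in (0, 1))


def ratio_within_bounds(p1, p2, mu, d, tol=1e-12):
    """Check 1 - exp(-2 mu d) <= ratio <= 1 + exp(-2 mu d) <= 2."""
    decay = math.exp(-2.0 * mu * d)
    ratio = joined_likelihood(p1, p2, mu, d) / product_likelihood(p1, p2)
    return (1.0 - decay) - tol <= ratio <= (1.0 + decay) + tol


# Arbitrary positive test values (assumptions, not data from the paper).
grid = (0.01, 0.3, 0.9)
all_ok = all(
    ratio_within_bounds((a, b), (c, e), mu=0.7, d=0.4)
    for a, b, c, e in itertools.product(grid, repeat=4)
)
print(all_ok)
```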

A.3 Proof of Lemma 4

Without loss of generality, we assume that \(\mu _1 < \mu _2\). By the mean value theorem, there exists \({{\tilde{\mu }}}_{uv} \in (\mu _1, \mu _2)\) for any \(u, v \in \{ 0, 1\}\) such that

$$\begin{aligned} \left| \log [{\mathbf {P}}_{\mu _1}(t)]_{uv} - \log [{\mathbf {P}}_{\mu _2}(t)]_{uv} \right| = \frac{ t e^{- 2 {{\tilde{\mu }}}_{uv} t}}{[{\mathbf {P}}_{{{\tilde{\mu }}}_{uv}}(t)]_{uv}} |\mu _1 - \mu _2| \le \frac{ 2 t e^{- 2 {{\tilde{\mu }}}_{uv} t}}{1 - e^{- 2 {{\tilde{\mu }}}_{uv} t}} |\mu _1 - \mu _2|, \end{aligned}$$

where the inequality uses \([{\mathbf {P}}_{\mu }(t)]_{uv} \ge (1 - e^{-2 \mu t})/2\). We observe that there exists a \(C_{{\underline{\mu }}, {\overline{\mu }}}>0\) such that

$$\begin{aligned} \sup _{t > 0; {{\tilde{\mu }}}_{uv} \in ({\underline{\mu }}, {\overline{\mu }})} {\frac{ 2 t e^{- 2 {{\tilde{\mu }}}_{uv} t}}{1 - e^{- 2 {{\tilde{\mu }}}_{uv} t}}} \le C_{{\underline{\mu }}, {\overline{\mu }}}. \end{aligned}$$
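One explicit choice of the constant can be sketched as follows, assuming \({\underline{\mu }} > 0\) (an assumption, since the lemma statement is not reproduced in this appendix). Because \(e^{x} - 1 \ge x\) for \(x \ge 0\),

$$\begin{aligned} \frac{ t e^{- 2 {{\tilde{\mu }}} t}}{1 - e^{- 2 {{\tilde{\mu }}} t}} = \frac{t}{e^{2 {{\tilde{\mu }}} t} - 1} \le \frac{1}{2 {{\tilde{\mu }}}} < \frac{1}{2 {\underline{\mu }}} \quad \text {for all } t > 0 \text { and } {{\tilde{\mu }}} > {\underline{\mu }}, \end{aligned}$$

so the supremum is finite and, for instance, \(C_{{\underline{\mu }}, {\overline{\mu }}} = 1/{\underline{\mu }}\) suffices.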

Therefore,

$$\begin{aligned} | \log [{\mathbf {P}}_{\mu _1}(t)]_{uv} - \log [{\mathbf {P}}_{\mu _2}(t)]_{uv} | \le C_{{\underline{\mu }}, {\overline{\mu }}} |\mu _1 - \mu _2|. \end{aligned}$$

This implies that

$$\begin{aligned}{}[{\mathbf {P}}_{\mu _1}(t)]_{uv} \le e^{C_{{\underline{\mu }}, {\overline{\mu }}} |\mu _1 - \mu _2|} [{\mathbf {P}}_{\mu _2}(t)]_{uv}. \end{aligned}$$
(5)

Note that

$$\begin{aligned} P_{{\mathbb {T}},\mu }({\mathbf {Y}}) = \frac{1}{2} \sum _{y}{\left( \prod _{(u,v)\in E}{[{\mathbf {P}}_\mu ( d_{uv})}]_{y_u y_v} \right) }. \end{aligned}$$

By applying (5) for all \(2n-3\) edges on the tree, we deduce that

$$\begin{aligned} P_{{\mathbb {T}},\mu _1}({\mathbf {Y}}) \le e^{(2n-3) C_{{\underline{\mu }}, {\overline{\mu }}} |\mu _1 - \mu _2| } P_{{\mathbb {T}},\mu _2}({\mathbf {Y}}) . \end{aligned}$$

Hence,

$$\begin{aligned} |\ell _{{\mathbb {T}},\mu _1}({\mathbf {Y}}) - \ell _{{\mathbb {T}},\mu _2}({\mathbf {Y}})| \le (2n-3) C_{{\underline{\mu }}, {\overline{\mu }}} |\mu _1 - \mu _2|, \end{aligned}$$

which validates the lemma.
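The Lipschitz bound can be checked by brute force on a small tree. The following Python sketch is illustrative only: the quartet tree, its edge lengths, the rate values and the function names are all hypothetical, and the explicit constant \(C = 1/{\underline{\mu }}\) is an assumed choice (justified by \(e^{x} - 1 \ge x\)), not a value taken from the paper. The likelihood is computed directly from the sum-over-internal-states formula above.

```python
import itertools
import math

# Hypothetical quartet tree: leaves 0-3, internal nodes 4 and 5,
# so n = 4 and there are 2n - 3 = 5 edges, stored as (u, v, length).
EDGES = [(0, 4, 0.3), (1, 4, 1.2), (4, 5, 0.5), (2, 5, 0.8), (3, 5, 2.0)]
N_LEAVES = 4


def transition(mu, t, a, b):
    """Entry [P_mu(t)]_{ab} of the 2-state symmetric transition matrix."""
    decay = math.exp(-2.0 * mu * t)
    return 0.5 * (1.0 + decay) if a == b else 0.5 * (1.0 - decay)


def log_likelihood(mu, leaves):
    """log of (1/2) * sum over internal states of the product over edges."""
    total = 0.0
    for internal in itertools.product((0, 1), repeat=2):
        state = tuple(leaves) + internal
        term = 1.0
        for u, v, d in EDGES:
            term *= transition(mu, d, state[u], state[v])
        total += term
    return math.log(0.5 * total)


def max_log_likelihood_gap(mu1, mu2):
    """Largest |l(mu1) - l(mu2)| over all 2^n leaf patterns."""
    return max(
        abs(log_likelihood(mu1, y) - log_likelihood(mu2, y))
        for y in itertools.product((0, 1), repeat=N_LEAVES)
    )


mu_lower, mu1, mu2 = 0.2, 0.5, 0.9  # assumed values with mu_lower < mu1 < mu2
bound = (2 * N_LEAVES - 3) * (1.0 / mu_lower) * abs(mu1 - mu2)
gap = max_log_likelihood_gap(mu1, mu2)
print(gap, "<=", bound)
```

The bound is far from tight on this example, which is expected: it is a worst case over all edge lengths and all rates in the allowed interval.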

A.4 Proof of Lemma 7

For all x, y, we have \(|u(x) - u(y)| \le |v(x)-v(y)| + 2c\). Let Y be an independent copy of X with the same distribution. We have

$$\begin{aligned} 2\mathrm {Var}[u(X)]&= {\mathbb {E}}_{X}[u(X)^2] + {\mathbb {E}}_{Y}[u(Y)^2] - 2 {\mathbb {E}}_{X}[u(X)] {\mathbb {E}}_{Y}[u(Y)]\\&= {\mathbb {E}}_{X, Y}\left( u(X)^2 + u(Y)^2 - 2 u(X) u(Y)\right) \\&= {\mathbb {E}}_{X, Y}\left( [u(X)-u(Y)]^2\right) \\&\le {\mathbb {E}}_{X, Y}\left( [|v(X)-v(Y)| + 2c]^2 \right) . \end{aligned}$$

Note that for all \(z, c \in {\mathbb {R}}\) and \(\omega >1\),

$$\begin{aligned} (z + 2c)^2 \le \omega z^2 + \frac{4\omega }{\omega -1} c^2. \end{aligned}$$
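This elementary inequality follows from \(2ab \le a^2 + b^2\) applied with \(a = \sqrt{\omega - 1}\, z\) and \(b = 2c/\sqrt{\omega - 1}\), which gives \(4zc \le (\omega - 1) z^2 + \frac{4}{\omega - 1} c^2\). Hence

$$\begin{aligned} (z + 2c)^2 = z^2 + 4zc + 4c^2 \le \omega z^2 + 4 c^2 \left( 1 + \frac{1}{\omega - 1} \right) = \omega z^2 + \frac{4\omega }{\omega -1} c^2. \end{aligned}$$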

Therefore,

$$\begin{aligned} 2\mathrm {Var}[u(X)]&\le \omega {\mathbb {E}}_{X, Y}\left( [v(X)-v(Y)]^2 \right) +\frac{4\omega }{\omega -1}c^2\\&= 2 \omega \mathrm {Var}[v(X)] +\frac{4\omega }{\omega -1}c^2. \end{aligned}$$
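Dividing the last display by 2 gives \(\mathrm {Var}[u(X)] \le \omega \mathrm {Var}[v(X)] + \frac{2\omega }{\omega -1}c^2\). A minimal numerical sketch of this bound in Python follows. It assumes the lemma's hypothesis is \(|u(x) - v(x)| \le c\) for all x (inferred from the first line of the proof, since the lemma statement is not reproduced here); the distribution and the functions u and v are arbitrary illustrative choices.

```python
import math


def variance(values, probs):
    """Exact variance of a finitely supported random variable."""
    mean = sum(p * x for x, p in zip(values, probs))
    return sum(p * (x - mean) ** 2 for x, p in zip(values, probs))


def variance_bound_sides(u_vals, v_vals, probs, omega):
    """Return (Var[u], omega * Var[v] + 2 * omega / (omega - 1) * c^2)."""
    c = max(abs(u - v) for u, v in zip(u_vals, v_vals))  # sup |u - v|
    lhs = variance(u_vals, probs)
    rhs = omega * variance(v_vals, probs) + 2.0 * omega / (omega - 1.0) * c ** 2
    return lhs, rhs


# Illustrative example: X uniform on {0, ..., 9}, v(x) = x,
# u(x) = x + 0.5 * sin(3x), so that |u - v| <= 0.5.
support = range(10)
probs = [0.1] * 10
v_vals = [float(x) for x in support]
u_vals = [x + 0.5 * math.sin(3 * x) for x in support]

for omega in (1.01, 1.5, 2.0, 10.0):
    lhs, rhs = variance_bound_sides(u_vals, v_vals, probs, omega)
    print(omega, lhs <= rhs)
```

Values of \(\omega \) close to 1 keep the factor on \(\mathrm {Var}[v(X)]\) small at the cost of a large multiple of \(c^2\), and large \(\omega \) does the opposite.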

A.5 Proof of Eq. (4)

To establish Eq. (4), we use the following lemma.

Lemma 11

(Remark 3.4 in Jammalamadaka and Janson (1986)) Let \(X_1, X_2, \ldots , X_n\) be an i.i.d. sequence of random variables and \(f_n(x, y)\) be an indicator function on \({\mathbb {R}}^2\) such that

$$\begin{aligned} n^3 E[f_n(X_1, X_2) f_n(X_1, X_3)] \rightarrow 0 \quad \mathrm{and}\quad \frac{1}{2}n^2E[f_n(X_1, X_2)] \rightarrow \lambda \end{aligned}$$

for some constant \(\lambda >0\). Define \(U_n =\sum _{1 \le i< j \le n}{f_n(X_i, X_j)}\).

Then \(U_n \rightarrow _d \text {Poisson}(\lambda )\).

We apply this lemma with \(f_n(x, y) = I\{|x-y| < r_n\}\), where \(r_n = \epsilon /n^2\), to the sequence \(t_1, t_2, \ldots , t_n\) of the coalescent point process. Note that by Equation (4.3) in Jammalamadaka and Janson (1986),

$$\begin{aligned} \frac{1}{2}n^2E[f_n(t_1, t_2)] \rightarrow c \epsilon \int _{0}^T{\phi (x)^2 dx} \end{aligned}$$

for some constant \(c>0\). On the other hand, we have

$$\begin{aligned} E[f_n(t_1, t_2) f_n(t_1, t_3)]&= E[(E[f_n(t_1, t_2) \mid t_1] )^2]\\&= \int _{0}^T{\left( \int _{t - r_n}^{t+r_n}{\phi (\tau )d\tau }\right) ^2\phi (t) dt}\le \frac{4 \Vert \phi \Vert _\infty ^2 \epsilon ^2}{n^4}. \end{aligned}$$

Therefore, \(n^3 E[f_n(t_1, t_2) f_n(t_1, t_3)] \rightarrow 0\). Hence,

$$\begin{aligned} {\mathbb {P}}\left( n^2 \min _{1 \le i < j \le n-1}{|t_i - t_j|} \le \epsilon \right) = P(U_n \ge 1) = 1 - P(U_n = 0) \rightarrow 1 - \exp \left( -c \epsilon \int _0^T{\phi ^2} \right) . \end{aligned}$$
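The Poisson limit for the smallest gap can be illustrated by simulation. The Python sketch below is a stand-in under stated assumptions, not the coalescent point process itself: it draws i.i.d. uniform points on [0, 1], so that \(\phi \equiv 1\), \(\int \phi ^2 = 1\) and the constant c equals 1 in this special case. The function name and all parameter values are hypothetical.

```python
import math
import random


def min_gap_probability(n, eps, trials, seed=0):
    """Monte Carlo estimate of P(n^2 * min_{i<j} |t_i - t_j| <= eps)."""
    rng = random.Random(seed)
    threshold = eps / n ** 2
    hits = 0
    for _ in range(trials):
        points = sorted(rng.random() for _ in range(n))
        # After sorting, the minimum pairwise distance is the smallest
        # gap between neighbouring points.
        min_gap = min(b - a for a, b in zip(points, points[1:]))
        if min_gap <= threshold:
            hits += 1
    return hits / trials


estimate = min_gap_probability(n=200, eps=1.0, trials=2000)
limit = 1.0 - math.exp(-1.0)  # 1 - exp(-eps * int phi^2) with eps = 1
print(f"simulated {estimate:.3f}, Poisson limit {limit:.3f}")
```

The estimate carries Monte Carlo error of roughly 0.01 at this number of trials, so agreement with the limit should only be expected to about two decimal places.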

About this article

Cite this article

Ho, L.S.T., Dinh, V., Matsen, F.A. et al. On the convergence of the maximum likelihood estimator for the transition rate under a 2-state symmetric model. J. Math. Biol. 80, 1119–1138 (2020). https://doi.org/10.1007/s00285-019-01453-1

  • DOI: https://doi.org/10.1007/s00285-019-01453-1
