Ancestral Sequence Reconstruction with Maximum Parsimony

  • Original Article
  • Bulletin of Mathematical Biology

Abstract

One of the main aims in phylogenetics is the estimation of ancestral sequences based on present-day data such as DNA alignments. One way to estimate the data of the last common ancestor of a given set of species is to first reconstruct a phylogenetic tree with some tree inference method and then to apply a method of ancestral state inference to that tree. One of the best-known methods both for tree inference and for ancestral sequence inference is Maximum Parsimony (MP). In this manuscript, we focus on this method and on ancestral state inference for fully bifurcating trees. In particular, we investigate a conjecture published by Charleston and Steel in 1995 concerning the number of species that need to have a particular state, say a, at a particular site in order for MP to unambiguously return a as an estimate for the state of the last common ancestor. We prove the conjecture for all even numbers of character states, which is the most relevant case in biology. We also show that the conjecture does not hold in general for odd numbers of character states, and present some positive results for this case.

Notes

  1. Note that inferring ancestral sequences with MP is sometimes referred to as the ‘small parsimony’ problem. In contrast, the ‘big parsimony’ problem refers to inferring most parsimonious trees.

References

  • Cai W, Pei J, Grishin NV (2004) Reconstruction of ancestral protein sequences and its applications. BMC Evolut Biol 4:33

  • Felsenstein J (2004) Inferring phylogenies. Sinauer Associates, Inc., Sunderland

  • Fischer M, Liebscher V (2015) On the balance of unrooted trees. Preprint. arXiv:1510.07882

  • Fitch WM (1971) Toward defining the course of evolution: minimum change for a specific tree topology. Syst Zool 20:406–416

  • Gascuel O, Steel M (2010) Inferring ancestral sequences in taxon-rich phylogenies. Math Biosci 227:125–153

  • Gascuel O, Steel M (2014) Predicting the ancestral character changes in a tree is typically easier than predicting the root state. Syst Biol 63:421–435

  • Goulden IP, Jackson DM (1983) Combinatorial enumeration. Wiley, New York

  • Griffith OW, Blackburn DG, Brandley MC, Van Dyke JU, Whittington CM, Thompson MB (2015) Ancestral state reconstructions require biological evidence to test evolutionary hypotheses: a case study examining the evolution of reproductive mode in squamate reptiles. J Exp Zool 324:493–503

  • Li G, Steel M, Zhang L (2008) More taxa are not necessarily better for the reconstruction of ancestral character states. Syst Biol 57:647–653

  • Liberles DA (ed) (2007) Ancestral sequence reconstruction. Oxford University Press, New York

  • Semple C, Steel M (2003) Phylogenetics. Oxford University Press, New York

  • Steel M, Charleston M (1995) Five surprising properties of parsimoniously colored trees. Bull Math Biol 57:367–375

Acknowledgements

We thank Mike Steel for bringing this topic to our attention. Moreover, we thank two anonymous reviewers for their helpful comments on an earlier version of this manuscript. The first author also thanks the Ernst-Moritz-Arndt-University Greifswald for the Landesgraduiertenförderung studentship, under which this work was conducted.

Author information

Corresponding author

Correspondence to Mareike Fischer.

Appendix

Proof

(Proof of Lemma 1) (i) Let A, B be such that \(a \in A \cap B\) and \(|A| = |B|\). Then A can be transformed into B by renaming all states that are not elements of \(A \cap B\). After this renaming \(A=B\), which yields \(f_{k,r}^A=f_{k,r}^B\).

(ii) Let \(k=0\). In this case, our tree consists of one leaf, which is at the same time the root. This vertex has to be assigned a to obtain \(X_\rho =\{a\}\). Hence \(f_{0,r}=1\) for all r.

(iii) Let \(k \ge 1\). Then

$$\begin{aligned} f_{k,r}&= \min \left\{ f_{k-1,r}^B+f_{k-1,r}^C: B*C=\{a\} \right\} \\&= \min \left\{ f_{k-1,r}^B+f_{k-1,r}^C: B \cap C=\{a\} \right\} , \\&\text {since } B \cup C \text { would not result in } X_\rho =\{a\} \text { as }B,C\ne \emptyset . \end{aligned}$$

Therefore, \(a \in B\) and \(a \in C\), so each of the two subtrees contains at least one leaf labelled a, which results in \(f_{k,r} \ge 2\). \(\square \)
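The operation \(B*C\) in part (iii) can be made concrete with a short sketch. It assumes that \(*\) is Fitch's parsimony operation (intersection if non-empty, otherwise union) and that \(T_k\) is the fully balanced binary tree with \(2^k\) leaves; neither definition is restated in this appendix. The helper names `fitch_op` and `root_estimate` are hypothetical and chosen only for illustration.

```python
def fitch_op(b, c):
    """Parsimony operation B * C: intersection if non-empty, else union."""
    common = b & c
    return common if common else b | c


def root_estimate(leaves):
    """MP root state estimate X_rho for the balanced tree T_k.

    `leaves` lists the 2**k leaf states from left to right; adjacent
    entries are treated as siblings (an assumption about leaf ordering).
    """
    level = [frozenset([state]) for state in leaves]
    if not level or len(level) & (len(level) - 1):
        raise ValueError("number of leaves must be a power of two")
    # Combine sibling sets bottom-up until only the root set remains.
    while len(level) > 1:
        level = [fitch_op(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


print(root_estimate(["a"]))                 # k = 0: the single leaf is the root
print(root_estimate(["a", "a"]))            # k = 1: both leaves carry a
print(root_estimate(["a", "b"]))            # k = 1: only one a-leaf, union case
print(root_estimate(["a", "b", "a", "c"]))  # k = 2: {a, b} * {a, c}
```

The third call illustrates why the union case is excluded in (iii): two disjoint non-empty sets always combine to a set with at least two states, so the root estimate cannot be \(\{a\}\).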

Proof

(Proof of Lemma 2) Let \(r= 2p\ge 2\), i.e. \(p\ge 1\).

  1. We start with the case \(k\le p\). We have \(f_{1,r}=2\) for all r (see Fig. 8). By Theorem 3, we know that \(f_{k,r}\) is monotonically non-decreasing in k, and thus \(2=f_{1,r} \le f_{p,r}\). We now use the standard decomposition of \(T_p\) to obtain its two maximal pendant rooted subtrees \(T_{p-1}\) with roots \(\rho _1\) and \(\rho _2\). We use the same construction as in the proof of Observation 1, which is depicted in Fig. 6. Thus, we can achieve \(X_{\rho _1}=A_p=\{a,c_1, \ldots , c_{p-1}\}\) and \(X_{\rho _2}=A_p^{'}=\{a,c_p, \ldots , c_{2p-2}\}\) by assigning a to one leaf in each subtree \(T_{p-1}\), respectively, where \(R=\{a,c_1,\ldots ,c_{2p-1}\}\), and such that all other subtrees use one state each which is unique to this subtree. Thus, the root of \(T_p\) will have the MP root state estimate \(A_p \cap A_p^{'}=\{a\}\). (Note that no leaf is assigned character state \(c_{2p-1}\); we do not even require all states. We will need this fact later.) We conclude \(f_{p,r}\le 2\), which together with \(f_{p,r}\ge 2\), as explained above, yields \(f_{p,r}=2\). Together with Theorem 3, we obtain \(f_{k,r}=2\) for all \(k=1,\ldots ,p\).

  2. Now consider the case \(k=p+1\). In this case, (7) gives

    $$\begin{aligned} f_{p+1,r}=f_{p+1,2p}&=\min \left\{ f_{p,2p}+f_{p,2p}^{A_{2p}},f_{p,2p}^{A_{2}}+f_{p,2p}^{A_{2p-1}},\ldots , f_{p,2p}^{A_{p}}+f_{p,2p}^{A_{p+1}} \right\} \\&=\min \left\{ f_{p,2p}+f_{p,2p}^{A_{2p}},f_{p-1,2p}+f_{p,2p}^{A_{2p-1}},\ldots , f_{1,2p}+f_{0,2p} \right\}&\text {by Theorem } 4 \\&=\min \left\{ 2+f_{p,2p}^{A_{2p}},2+f_{p,2p}^{A_{2p-1}},\ldots , 2+1 \right\}&\text {by Lemma 2, part 1 } \\&=3. \end{aligned}$$

    The last equality holds because \(1 \le f_{p,2p}^{A_{2p}} \le f_{p,2p}^{A_{2p-1}} \le \cdots \le f_{p,2p}^{A_{p+1}}\) (at least one leaf has to be labelled a if a is to appear in the MP root state estimate).

  3. Now consider the case \(k=r\). We can proceed as above and assign \(X_{\rho _1}=A_p=\{a,c_1, \ldots , c_{p-1}\}\) to the first of the two \(T_{r-1}\) subtrees induced by the standard decomposition, and \(X_{\rho _2}=A_{p+1}^{'}=\{a,c_p, \ldots , c_{2p-1}\}\) to the other one, where again \(R=\{a,c_1,\ldots ,c_{2p-1}\}\). Then by (7), we can conclude \(f_{r,r}\le f_{r-1,r}^{A_p} + f_{r-1,r}^{A_{p+1}}\). Note that \(f_{r-1,r}^{A_p} + f_{r-1,r}^{A_{p+1}}= f_{p,r}+f_{p-1,r}\) by Theorem 4, so that \(f_{r,r}\le f_{p,r}+f_{p-1,r} = 2+2\), where the last equality holds by the first part of Lemma 2 (for \(p=1\) the second summand is \(f_{0,r}=1\), which only lowers the bound). Altogether, \(f_{r,r}\le 4\). By the monotonicity of Theorem 3, we obtain \(f_{k,r} \le 4\) for all \(p+1 < k \le r\). \(\square \)

About this article

Cite this article

Herbst, L., Fischer, M. Ancestral Sequence Reconstruction with Maximum Parsimony. Bull Math Biol 79, 2865–2886 (2017). https://doi.org/10.1007/s11538-017-0354-6

  • DOI: https://doi.org/10.1007/s11538-017-0354-6
