Topological Estimation Biases with Covarion Evolution

Journal of Molecular Evolution

Abstract

Covarion processes allow the evolutionary rate at a site to change along the branches of a phylogenetic tree. Covarion-like evolution is increasingly recognized as an important mode of protein evolution. Several recent reports suggest that maximum likelihood estimation employing covarion models may support different optimal topologies than estimation using standard rates-across-sites (RAS) models. However, it remains to be demonstrated that ignoring covarion evolution will generally result in topological misestimation. In this study we performed analytical and theoretical studies of limiting distances under the covarion model, together with four-taxon tree simulations, to investigate the extent to which the covarion process affects phylogenetic estimation. In particular, we assessed the limits of an RAS model-based maximum likelihood method in recovering the phylogeny when the sequence data were simulated under covarion processes. We find that, when ignored, covarion processes can induce systematic errors in phylogeny reconstruction. Surprisingly, when sequences are evolved under a covarion process but an RAS model is used for estimation, we find that a long-branch repel bias occurs.
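
To make the idea of a site changing its rate along a branch concrete, the following is a minimal sketch of one common way to formalize a covarion process: each site switches between an "on" (variable) and an "off" (invariable) state, and substitutions occur only while the site is on. It is not the simulation procedure used in this study. The four-letter nucleotide alphabet, the equal-rate substitution scheme, the function names `evolve_site` and `observed_difference`, and all rate values are assumptions chosen for illustration.

```python
import random

NUCLEOTIDES = "ACGT"  # assumed alphabet; the study itself concerns protein data


def evolve_site(state, is_on, branch_length, rng,
                sub_rate=1.0, on_to_off=0.5, off_to_on=0.5):
    """Evolve one site along one branch (Gillespie-style event simulation).

    While the site is "on" it substitutes at rate sub_rate, with all other
    nucleotides equally likely; while "off" it cannot substitute.
    Returns the (state, is_on) pair at the end of the branch.
    """
    t = 0.0
    while True:
        switch_rate = on_to_off if is_on else off_to_on
        total_rate = switch_rate + (sub_rate if is_on else 0.0)
        if total_rate == 0.0:
            return state, is_on  # nothing can happen any more
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t >= branch_length:
            return state, is_on
        if rng.random() < switch_rate / total_rate:
            is_on = not is_on  # rate switch: the site turns on or off
        else:
            # Substitution: only reachable when the site is on.
            state = rng.choice([n for n in NUCLEOTIDES if n != state])


def observed_difference(n_sites, branch_length, rng,
                        sub_rate=1.0, on_to_off=0.5, off_to_on=0.5):
    """Proportion of sites whose end state differs from their start state."""
    p_on = off_to_on / (on_to_off + off_to_on)  # stationary share of "on" sites
    changed = 0
    for _ in range(n_sites):
        start = rng.choice(NUCLEOTIDES)
        is_on = rng.random() < p_on
        end, _ = evolve_site(start, is_on, branch_length, rng,
                             sub_rate, on_to_off, off_to_on)
        changed += end != start
    return changed / n_sites


if __name__ == "__main__":
    rng = random.Random(1)
    for length in (0.1, 1.0, 10.0):
        print(length, observed_difference(2000, length, rng))
```

In this sketch the set of variable sites drifts over time, so a site that is invariable on one branch may be variable on another. A standard RAS model has no such switching, since it assigns each site a single rate for the whole tree.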

Acknowledgments

We thank the two reviewers for useful comments. This research was supported by Discovery grants awarded to E.S. and A.J.R. by the Natural Sciences and Engineering Research Council of Canada. A.J.R. and E.S. are fellows of the Canadian Institute for Advanced Research Program in Evolutionary Biology. A.J.R. is supported by a fellowship from the Peter Lougheed New Investigator Award from the Canadian Institutes of Health Research and the E.W.R. Steacie fellowship from NSERC.

Author information

Corresponding author

Correspondence to Huai-Chun Wang.

About this article

Cite this article

Wang, HC., Susko, E., Spencer, M. et al. Topological Estimation Biases with Covarion Evolution. J Mol Evol 66, 50–60 (2008). https://doi.org/10.1007/s00239-007-9062-4

  • DOI: https://doi.org/10.1007/s00239-007-9062-4
