Phylogenetic analysis using parsimony and likelihood methods

Journal of Molecular Evolution

Abstract

The assumptions underlying the maximum-parsimony (MP) method of phylogenetic tree reconstruction were intuitively examined by studying the way the method works. Computer simulations were performed to corroborate the intuitive examination. Parsimony appears to involve very stringent assumptions concerning the process of sequence evolution, such as constancy of substitution rates between nucleotides, constancy of rates across nucleotide sites, and equal branch lengths in the tree. For practical data analysis, the requirement of equal branch lengths means similar substitution rates among lineages (the existence of an approximate molecular clock), relatively long interior branches, and also few species in the data. However, a small amount of evolution is neither a necessary nor a sufficient requirement of the method. The difficulties involved in the application of current statistical estimation theory to tree reconstruction were discussed, and it was suggested that the approach proposed by Felsenstein (1981, J. Mol. Evol. 17:368–376) for topology estimation, as well as its many variations and extensions, differs fundamentally from the maximum likelihood estimation of a conventional statistical parameter. Evidence was presented showing that the Felsenstein approach does not share the asymptotic efficiency of the maximum likelihood estimator of a statistical parameter. Computer simulations were performed to study the probability that MP recovers the true tree under a hierarchy of models of nucleotide substitution; its performance relative to the likelihood method was especially noted. The results appeared to support the intuitive examination of the assumptions underlying MP. When a simple model of nucleotide substitution was assumed to generate data, the probability that MP recovers the true topology could be as high as, or even higher than, that for the likelihood method. When the assumed model became more complex and realistic, e.g., when substitution rates were allowed to differ between nucleotides or across sites, the probability that MP recovers the true topology, and especially its performance relative to that of the likelihood method, generally deteriorated. As the complexity of the process of nucleotide substitution in real sequences is well recognized, the likelihood method appears preferable to parsimony. However, the development of a statistical methodology for the efficient estimation of the tree topology remains a difficult open problem.
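
The kind of simulation the abstract refers to can be illustrated with a minimal sketch. It is not the author's simulation design: the four-taxon tree ((A,B),(C,D)), the Jukes-Cantor (1969) substitution model, the branch lengths, and all function names below are assumptions chosen for illustration. For four taxa, only sites with two pairs of matching nucleotides distinguish the three unrooted topologies, so the MP tree is the topology supported by the most such sites; ties are credited fractionally.

```python
import math
import random

NUCLEOTIDES = "ACGT"
TRUE_TOPOLOGY = "AB|CD"  # assumed true tree: ((A,B),(C,D))


def jc69_change_prob(t):
    """Probability that a site differs across a branch of length t
    (expected substitutions per site) under the Jukes-Cantor model."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))


def evolve(base, t, rng):
    """Evolve one nucleotide along a branch of length t under JC69."""
    if rng.random() < jc69_change_prob(t):
        # Under JC69 the three other nucleotides are equally likely.
        return rng.choice([n for n in NUCLEOTIDES if n != base])
    return base


def simulate_site(branches, rng):
    """Simulate one site; branches = (a, b, c, d, internal)."""
    a, b, c, d, internal = branches
    left = rng.choice(NUCLEOTIDES)        # node joining A and B
    right = evolve(left, internal, rng)   # node joining C and D
    return (evolve(left, a, rng), evolve(left, b, rng),
            evolve(right, c, rng), evolve(right, d, rng))


def parsimony_winners(sites):
    """Return the topologies with the most informative-site support."""
    support = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for a, b, c, d in sites:
        if a == b and c == d and a != c:
            support["AB|CD"] += 1
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    best = max(support.values())
    return [topo for topo, count in support.items() if count == best]


def recovery_probability(branches, n_sites=500, n_reps=200, seed=0):
    """Estimate the probability that MP recovers the true topology."""
    rng = random.Random(seed)
    credit = 0.0
    for _ in range(n_reps):
        sites = [simulate_site(branches, rng) for _ in range(n_sites)]
        winners = parsimony_winners(sites)
        if TRUE_TOPOLOGY in winners:
            credit += 1.0 / len(winners)  # split credit among tied trees
    return credit / n_reps


if __name__ == "__main__":
    equal = (0.1, 0.1, 0.1, 0.1, 0.1)        # hypothetical branch lengths
    unequal = (1.5, 0.05, 1.5, 0.05, 0.05)   # long branches to A and C
    print("equal branches:  ", recovery_probability(equal))
    print("unequal branches:", recovery_probability(unequal))
```

With equal, short branches the estimated recovery probability should be close to 1, whereas with two long branches leading to non-sister taxa parsimony tends to group the long branches together, the situation analysed by Felsenstein (1978). Extending the sketch to unequal rates between nucleotides or across sites would require a richer substitution model than the one assumed here.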

References

  • Bishop MJ, Friday AE (1985) Evolutionary trees from nucleic acid and protein sequences. Proc R Soc Lond [Biol] 226:271–302

  • Bishop MJ, Friday AE (1987) Tetrapod relationships: the molecular evidence. In: Patterson C (ed) Molecules and morphology in evolution: conflict or compromise? Cambridge University Press, Cambridge, pp 123–129

  • Brown WM, Prager EM, Wang A, Wilson AC (1982) Mitochondrial DNA sequences of primates: tempo and mode of evolution. J Mol Evol 18:225–239

  • Camin J, Sokal R (1965) A method for deducing branching sequences in phylogeny. Evolution 19:311–326

  • Cavalli-Sforza LL, Edwards AWF (1967) Phylogenetic analysis: models and estimation procedures. Evolution 21:550–570

  • DeBry RW (1992) The consistency of several phylogeny-inference methods under varying evolutionary rates. Mol Biol Evol 9:537–551

  • Eck RV, Dayhoff MO (1966) Inference from protein sequence comparisons. In: Dayhoff MO (ed) Atlas of protein sequence and structure. National Biomedical Research Foundation, Silver Spring, MD, pp 161–202

  • Edwards AWF (1970) Estimation of the branch points of a branching diffusion process with discussion. J R Stat Soc B 32:155–174

  • Edwards AWF, Cavalli-Sforza LL (1963) The reconstruction of evolution. Heredity 18:553

  • Farris J (1973) On the use of the parsimony criterion for inferring evolutionary trees. Syst Zool 22:250–256

  • Farris J (1977) Phylogenetic analysis under Dollo's law. Syst Zool 26:77–88

  • Felsenstein J (1973) Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Syst Zool 22:240–249

  • Felsenstein J (1978) Cases in which parsimony and compatibility methods will be positively misleading. Syst Zool 27:401–410

  • Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376

  • Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791

  • Felsenstein J (1988) Phylogenies from molecular sequences: inference and reliability. Annu Rev Genet 22:521–565

  • Felsenstein J, Kishino H (1993) Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Syst Biol 42:193–200

  • Felsenstein J, Sober E (1986) Parsimony and likelihood: an exchange. Syst Zool 35:617–626

  • Fitch WM (1971) Toward defining the course of evolution: minimum change for a specific tree topology. Syst Zool 20:406–416

  • Fukami-Kobayashi K, Tateno Y (1991) Robustness of maximum likelihood tree estimation against different patterns of base substitution. J Mol Evol 32:79–91

  • Gaut BS, Lewis PO (1995) Success of maximum likelihood phylogeny inference in the four-taxon case. Mol Biol Evol 12:152–162

  • Goldman N (1990) Maximum likelihood inference of phylogenetic trees, with special reference to a Poisson process model of DNA substitution and to parsimony analysis. Syst Zool 39:345–361

  • Goldman N, Yang Z (1994) A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol 11:725–736

  • Hartigan JA (1973) Minimum evolution fits to a given tree. Biometrics 29:53–65

  • Hasegawa M, Fujiwara M (1993) Relative efficiencies of the maximum likelihood, maximum parsimony, and neighbor-joining methods for estimating protein phylogeny. Mol Phyl Evol 2:1–5

  • Hasegawa M, Yano T (1984) Maximum likelihood method of phylogenetic inference from DNA sequence data. Bull Biomet Soc Jpn 5:1–7

  • Hasegawa M, Kishino H, Yano T (1985) Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160–174

  • Hendy MD, Penny D (1989) A framework for the quantitative study of evolutionary trees. Syst Zool 38:297–309

  • Hill ID (1973) Algorithm AS 66: the normal integral. Appl Stat 22:424–427

  • Huelsenbeck JP, Hillis DM (1993) Success of phylogenetic methods in the four-taxon case. Syst Biol 42:247–264

  • Jin L, Nei M (1990) Limitations of the evolutionary parsimony method of phylogenetic analysis. Mol Biol Evol 7:82–102

  • Johnson NL, Kotz S, Kemp AW (1992) Univariate discrete distributions, 2nd ed. Wiley, New York

  • Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro HN (ed) Mammalian protein metabolism. Academic Press, New York, pp 21–123

  • Katz L (1946) On the class of functions defined by difference equation (x+1)f(x+1)=(a+bx)f(x) (abstract). Ann Math Stat 17:501

  • Katz L (1965) Unified treatment of a broad class of discrete probability distributions. In: Patil GP (ed) Classical and contagious discrete distributions. Pergamon Press, Oxford, pp 175–182

  • Kendall M, Stuart A (1979) Advanced theory of statistics, vol 2. Charles Griffin, London

  • Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16:111–120

  • Kishino H, Miyata T, Hasegawa M (1990) Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J Mol Evol 31:151–160

  • Kuhner MK, Felsenstein J (1994) A simulation of phylogeny algorithms under equal and unequal evolutionary rates. Mol Biol Evol 11:459–468

  • Li WH, Zharkikh A (1994) What is the bootstrap technique? Syst Biol 43:424–430

  • Maddison WP, Maddison DR (1992) MacClade: analysis of phylogeny and character evolution, version 3. Sinauer, Sunderland, MA

  • Muse SV, Gaut BS (1994) A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol Biol Evol 11:715–724

  • Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York

  • Pauling L, Zuckerkandl E (1963) Chemical paleogenetics: molecular “restoration studies” of extinct forms of life. Acta Chem Scand 17:S9-S16

  • Penny D, Hendy MD, Henderson IM (1987) The reliability of evolutionary trees. Cold Spring Harb Symp Quant Biol 52:857–862

  • Saitou N, Imanishi T (1989) Relative efficiencies of the Fitch-Margoliash, maximum parsimony, maximum likelihood, minimum-evolution, and neighbor-joining methods of phylogenetic tree construction in obtaining the correct tree. Mol Biol Evol 6:514–525

  • Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425

  • Schöniger M, von Haeseler A (1993) A simple method to improve the reliability of tree reconstructions. Mol Biol Evol 10:471–483

  • Sober E (1988) Reconstructing the past: parsimony, evolution, and inference. MIT Press, Cambridge, MA

  • Sourdis J, Krimbas C (1987) Accuracy of phylogenetic trees estimated from DNA sequence data. Mol Biol Evol 4:159–166

  • Swofford DL (1993) Phylogenetic analysis using parsimony (PAUP), version 3.1. University of Illinois, Champaign

  • Swofford DL, Olsen GJ (1990) Phylogeny reconstruction. In: Hillis DM, Moritz C (eds) Molecular systematics. Sinauer, Sunderland, MA, pp 411–501

  • Tajima F, Takezaki N (1994) Estimation of evolutionary distance for reconstructing molecular phylogenetic trees. Mol Biol Evol 11:277–286

  • Takezaki N, Nei M (1994) Inconsistency of the maximum parsimony method when the rate of nucleotide substitution is constant. J Mol Evol 39:210–218

  • Tateno Y, Takezaki N, Nei M (1994) Relative efficiencies of the maximum-likelihood, neighbor-joining and maximum-parsimony methods when substitution rate varies with site. Mol Biol Evol 11:261–277

  • Thompson EA (1975) Human evolutionary trees. Cambridge University Press, Cambridge

  • Wald A (1949) Note on the consistency of the maximum likelihood estimate. Ann Math Statist 20:595–601

  • Wiley E (1975) Karl P. Popper, systematics, and classification: a reply to Walter Bock and other evolutionary taxonomists. Syst Zool 24:233–242

  • Yang Z (1993) Maximum likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol Biol Evol 10:1396–1401

  • Yang Z (1994a) Estimating the pattern of nucleotide substitution. J Mol Evol 39:105–111

  • Yang Z (1994b) Statistical properties of the maximum likelihood method of phylogenetic estimation and comparison with distance matrix methods. Syst Biol 43:329–342

  • Yang Z (1994c) Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J Mol Evol 39:306–314

  • Yang Z (1995a) A space-time process model for the evolution of DNA sequences. Genetics 139:993–1005

  • Yang Z (1995b) Evaluation of several methods for estimating phylogenetic trees when substitution rates differ over nucleotide sites. J Mol Evol 40:689–697

  • Yang Z, Goldman N, Friday AE (1994) Comparison of models for nucleotide substitution used in maximum likelihood phylogenetic estimation. Mol Biol Evol 11:316–324

  • Yang Z, Goldman N, Friday AE (1995) Maximum likelihood trees from DNA sequences: a peculiar statistical estimation problem. Syst Biol 44:384–399

  • Zharkikh A, Li WH (1992) Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences: I. Four taxa with a molecular clock. Mol Biol Evol 9:1119–1147

  • Zharkikh A, Li WH (1993) Inconsistency of the maximum-parsimony method: the case of five taxa with a molecular clock. Syst Biol 42:113–125

  • Zuckerkandl E (1964) Further principles of chemical paleogenetics as applied to the evolution of hemoglobin. In: Peeters H (ed) Protides of the biological fluids. Elsevier, Amsterdam, pp 102–109

Cite this article

Yang, Z. Phylogenetic analysis using parsimony and likelihood methods. J Mol Evol 42, 294–307 (1996). https://doi.org/10.1007/BF02198856
