Estimating Maximum Likelihood Phylogenies with PhyML

  • Stéphane Guindon
  • Frédéric Delsuc
  • Jean-François Dufayard
  • Olivier Gascuel
Part of the Methods in Molecular Biology book series (MIMB, volume 537)


Our understanding of the origins, functions, and structures of biological sequences depends strongly on our ability to decipher the mechanisms of molecular evolution. These complex processes can be described by comparing homologous sequences in a phylogenetic framework. Moreover, phylogenetic inference provides sound statistical tools for revealing the main features of molecular evolution from the analysis of actual sequences. This chapter focuses on phylogenetic tree estimation under the maximum likelihood (ML) principle. Phylogenies inferred under this probabilistic criterion are usually reliable, and important biological hypotheses can be tested by comparing different models. Estimating ML phylogenies is computationally demanding, and careful examination of the results is warranted. We concentrate on PhyML, a software package that implements recent ML phylogenetic methods and algorithms, and illustrate the strengths and pitfalls of this program through the analysis of a real data set. PhyML v3.0 is available from

Key words

DNA and protein sequences, molecular evolution, sequence comparisons, phylogenetics, statistics, maximum likelihood, Markov models, algorithms, software, PhyML



This work was supported by the “MITOSYS” grant from ANR. This chapter is contribution 2007–08 of the Institut des Sciences de l'Evolution (UMR5554-CNRS).



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Stéphane Guindon (1, 2)
  • Frédéric Delsuc (3)
  • Jean-François Dufayard (1)
  • Olivier Gascuel (1)
  1. Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), UMR 5506-CNRS, Université Montpellier II, Montpellier, France
  2. Department of Statistics, University of Auckland, Auckland, New Zealand
  3. Institut des Sciences de l’Evolution de Montpellier (ISEM), UMR 5554-CNRS, Université Montpellier II, Montpellier, France
