Estimating Phylogenies from Molecular Data

  • Daniele Catanzaro


Phylogenetic estimation from aligned DNA, RNA or amino acid sequences has attracted increasing attention in recent years because of its importance in the analysis of fine-scale genetic data. Its applications now range from medical research and drug discovery to epidemiology, systematics and population dynamics. Estimating phylogenies involves solving an optimization problem, called the phylogenetic estimation problem (PEP), whose versions depend on the criterion used to select a phylogeny among plausible alternatives. This chapter offers an overview of PEP and discusses the most important versions that occur in the literature.





Daniele Catanzaro acknowledges support from the Belgian National Fund for Scientific Research (F.N.R.S.), of which he is “Chargé de Recherches.” He also thanks Raffaele Pesenti and the anonymous reviewers for their valuable comments on previous versions of the manuscript. Finally, thanks go to Prof. Mike Steel and Dr. Rosa Maria Lo Presti for helpful and exciting discussions.



Copyright information

© Springer New York 2011

Authors and Affiliations

  1. Service Graphes and Mathematical Optimization, Computer Science Department, Université Libre de Bruxelles (U.L.B.), Brussels, Belgium
