High-Performance Phylogenetic Inference

  • David A. Bader
  • Kamesh Madduri
Part of the Computational Biology book series (COBO, volume 29)


Software tools based on the maximum likelihood method and Bayesian methods are widely used for phylogenetic tree inference. This article surveys recent research on parallelization and performance optimization of state-of-the-art tree inference tools. We outline advances in shared-memory multicore parallelization, optimizations for efficient Graphics Processing Unit (GPU) execution, as well as large-scale distributed-memory parallelization.


Keywords: Phylogenetic tree inference; Maximum likelihood; Bayesian inference; Parallel algorithms; Algorithm engineering



This work is supported in part by the National Science Foundation awards #1339745, #1439057, and #1535058.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Georgia Institute of Technology, Atlanta, USA
  2. Pennsylvania State University, University Park, USA
