High-Performance Phylogenetic Inference

  • David A. Bader
  • Kamesh Madduri
Chapter
Part of the Computational Biology book series (COBO, volume 29)

Abstract

Software tools based on the maximum likelihood method and Bayesian methods are widely used for phylogenetic tree inference. This article surveys recent research on parallelization and performance optimization of state-of-the-art tree inference tools. We outline advances in shared-memory multicore parallelization, optimizations for efficient Graphics Processing Unit (GPU) execution, as well as large-scale distributed-memory parallelization.

Keywords

Phylogenetic tree inference; Maximum likelihood; Bayesian inference; Parallel algorithms; Algorithm engineering

Acknowledgements

This work is supported in part by the National Science Foundation awards #1339745, #1439057, and #1535058.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Georgia Institute of Technology, Atlanta, USA
  2. Pennsylvania State University, University Park, USA
