A Review of Approaches for Optimizing Phylogenetic Likelihood Calculations

Part of the book series: Computational Biology (COBO, volume 29)

Abstract

The execution times of likelihood-based phylogenetic inference tools for Maximum Likelihood or Bayesian inference are dominated by the Phylogenetic Likelihood Function (PLF). The PLF is executed millions of times in such analyses and accounts for 85–95% of overall run time. In addition, storing the Conditional Likelihood Vectors (CLVs) required for computing the PLF largely determines memory consumption: CLVs account for approximately 80% of the overall, and typically large, memory footprint of likelihood-based tree inference tools. In this chapter, we review recent technical as well as algorithmic advances for accelerating PLF calculations and for saving CLV memory. We cover topics such as algorithmic techniques for optimizing PLF computations and low-level optimization on modern x86 architectures. We conclude with an outlook on potential future technical and algorithmic developments.
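To make the terms PLF and CLV concrete, the following is a minimal sketch of one likelihood evaluation for a single alignment site, together with a rough estimate of CLV memory. It is not taken from the chapter: the Jukes-Cantor (JC69) model, the toy three-taxon tree, the branch lengths, and all function names are assumptions chosen for illustration, and production tools use far more elaborate, vectorized kernels.

```python
import math

def jc69_matrix(t):
    """JC69 transition probabilities for branch length t (assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def tip_clv(base):
    """CLV of a tip: 1.0 for the observed nucleotide, 0.0 otherwise."""
    return [1.0 if s == base else 0.0 for s in "ACGT"]

def inner_clv(left, t_left, right, t_right):
    """Combine two child CLVs into the parent CLV (one pruning step)."""
    pl, pr = jc69_matrix(t_left), jc69_matrix(t_right)
    return [
        sum(pl[i][j] * left[j] for j in range(4))
        * sum(pr[i][j] * right[j] for j in range(4))
        for i in range(4)
    ]

def site_likelihood(root_clv):
    """Weight the root CLV by the JC69 base frequencies (0.25 each)."""
    return sum(0.25 * x for x in root_clv)

def clv_memory_bytes(taxa, sites, states=4, rate_categories=4, bytes_per_value=8):
    """Bytes needed for the CLVs of the taxa - 2 inner nodes of an unrooted binary tree."""
    return (taxa - 2) * sites * states * rate_categories * bytes_per_value

# Hypothetical toy tree ((A,C),G) with every branch length set to 0.1.
clv_ac = inner_clv(tip_clv("A"), 0.1, tip_clv("C"), 0.1)
root = inner_clv(clv_ac, 0.1, tip_clv("G"), 0.1)
likelihood = site_likelihood(root)
```

Under these assumptions, `clv_memory_bytes(1000, 1_000_000)` gives roughly 128 GB for 1,000 taxa and one million DNA sites with four rate categories in double precision, which illustrates why CLV storage tends to dominate the memory footprint.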

Acknowledgements

The author gratefully acknowledges the support of the Klaus Tschira Foundation and the support he received from Bernard Moret over all those years.

Author information

Corresponding author

Correspondence to Alexandros Stamatakis.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Stamatakis, A. (2019). A Review of Approaches for Optimizing Phylogenetic Likelihood Calculations. In: Warnow, T. (eds) Bioinformatics and Phylogenetics. Computational Biology, vol 29. Springer, Cham. https://doi.org/10.1007/978-3-030-10837-3_1

  • DOI: https://doi.org/10.1007/978-3-030-10837-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-10836-6

  • Online ISBN: 978-3-030-10837-3

  • eBook Packages: Computer Science (R0)
