A Review of Approaches for Optimizing Phylogenetic Likelihood Calculations

  • Alexandros Stamatakis
Chapter
Part of the Computational Biology book series (COBO, volume 29)

Abstract

The execution times of likelihood-based phylogenetic inference tools, whether for Maximum Likelihood or Bayesian inference, are dominated by the Phylogenetic Likelihood Function (PLF). The PLF is executed millions of times in such analyses and accounts for 85–95% of overall run time. In addition, the Conditional Likelihood Vectors (CLVs) required for computing the PLF largely determine memory consumption: storing them accounts for approximately 80% of the overall, and typically large, memory footprint of likelihood-based tree inference tools. In this chapter, we review recent technical and algorithmic advances for accelerating PLF calculations and for saving CLV memory. We cover topics such as algorithmic techniques for optimizing PLF computations and low-level optimization on modern x86 architectures. We conclude with an outlook on potential future technical and algorithmic developments.
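
To make the terms PLF and CLV concrete, the following is a minimal sketch of a likelihood evaluation on a tiny rooted binary tree. It is not taken from the chapter or from any of the tools it reviews. It assumes the Jukes-Cantor substitution model with uniform base frequencies, unambiguous DNA characters, and a hypothetical tree encoding as nested tuples; the names `jc_matrix`, `clv`, and `log_likelihood` are illustrative only.

```python
import math

STATES = "ACGT"


def jc_matrix(t):
    """Jukes-Cantor transition probabilities for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def clv(node, site):
    """Return the 4-entry conditional likelihood vector of `node` at one site.

    Assumed encoding: a tip is a sequence string; an inner node is a tuple
    (left, left_branch_length, right, right_branch_length).
    """
    if isinstance(node, str):
        # Tip: probability 1 for the observed state, 0 for all others.
        return [1.0 if s == node[site] else 0.0 for s in STATES]
    left, t_left, right, t_right = node
    p_left, p_right = jc_matrix(t_left), jc_matrix(t_right)
    l_left, l_right = clv(left, site), clv(right, site)
    # Combine the two child CLVs across their branches.
    return [
        sum(p_left[x][y] * l_left[y] for y in range(4))
        * sum(p_right[x][y] * l_right[y] for y in range(4))
        for x in range(4)
    ]


def log_likelihood(root, n_sites):
    """Sum per-site log likelihoods, weighting root states uniformly."""
    total = 0.0
    for site in range(n_sites):
        total += math.log(sum(0.25 * v for v in clv(root, site)))
    return total


# Three tips with two sites each; branch lengths are arbitrary.
tree = (("AC", 0.1, "AC", 0.1), 0.05, "AG", 0.2)
print(log_likelihood(tree, 2))
```

This sketch recomputes every CLV on each call and keeps none of them. A tool that instead stores one CLV per inner node and per alignment site can reuse them between evaluations, which gives a sense of why CLV storage dominates the memory footprint described above.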

Keywords

Phylogenetic inference; Likelihood calculations; Performance optimization; Parallel computing; Terraces in tree space

Notes

Acknowledgements

The author gratefully acknowledges the support of the Klaus Tschira Foundation and the support he received from Bernard Moret over all those years.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  2. Karlsruhe Institute of Technology, Karlsruhe, Germany
