MrBayes for Phylogenetic Inference Using Protein Data on a GPU Cluster

  • Shuai Pang
  • Rebecca J. Stones
  • Ming-ming Ren (corresponding author)
  • Gang Wang
  • Xiaoguang Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9530)

Abstract

MrBayes is a widely used software package for Bayesian phylogenetic inference: given biological sequence data from various taxa, MrBayes returns an estimate of the phylogenetic tree that gave rise to those taxa. This paper presents ta(MC)\(^{3}\), a successor to a(MC)\(^{3}\) that, for protein datasets, improves computational efficiency and overcomes major obstacles to analyzing larger datasets on HPC systems with multiple Graphics Processing Units (GPUs). The major improvements are (a) a new task mapping strategy, (b) the use of Kahan summation to resolve non-convergence issues, and (c) the introduction of 64-bit variables. We evaluate ta(MC)\(^{3}\) on real-world protein datasets on both a desktop server and the Tianhe-1A supercomputer. With a single GPU, ta(MC)\(^{3}\) is nearly 90 times faster than the serial version of MrBayes, up to around 9 times faster than MrBayes utilizing a GPU via the BEAGLE library, and up to 2.5 times faster than a(MC)\(^{3}\). On larger datasets with 64 nodes (GPUs) on Tianhe-1A, ta(MC)\(^{3}\) can achieve a speedup of more than \(1000\times\) over serial MrBayes.
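As an illustration of improvement (b), the following is a minimal sketch of Kahan (compensated) summation [5]. It is a generic example and not the ta(MC)\(^{3}\) implementation: the abstract does not say where in the likelihood computation the technique is applied, the function names `naive_sum` and `kahan_sum` are hypothetical, and the sketch uses Python's double-precision floats rather than GPU arithmetic. It shows how many small terms that are individually lost when added to a large running total are recovered by carrying a correction term.

```python
def naive_sum(values):
    """Plain left-to-right accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation: carries the rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the rounding error made so far
    for x in values:
        y = x - compensation            # apply the correction to the next term
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # measure what was actually lost
        total = t
    return total


# One large term followed by many terms that are each too small
# to change the running total on their own (illustrative data only).
values = [1.0] + [1e-16] * 1_000_000

print(naive_sum(values))  # prints 1.0: every small term is rounded away
print(kahan_sum(values))  # close to 1.0 + 1e-10
```

In the naive loop each addition of `1e-16` to `1.0` rounds back to `1.0`, so the small terms never accumulate. The compensated loop keeps the lost low-order part in `compensation` and folds it into the next term, so the accumulated contribution of the small terms survives.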

Keywords

MrBayes, GPU, Protein, Task mapping strategy

Notes

Acknowledgements

A biology-focused version of this paper has been published [10]. This work is partially supported by NSF of China (grant numbers: 61373018, 11301288), Program for New Century Excellent Talents in University (grant number: NCET130301) and the Fundamental Research Funds for the Central Universities (grant number: 65141021). Stones was supported by her NSF China Research Fellowship for International Young Scientists (grant number: 11450110409). We would also like to thank Hongju Xia, Jianfu Zhou, Jie Bao and Prof. Qiang Xie for their valuable input.

References

  1. Altekar, G., Dwarkadas, S., Huelsenbeck, F., Ronquist, J.P.: Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics 20, 407–415 (2004)
  2. Bao, J., Xia, J., Zhou, J., Liu, X.G., Wang, G.: Efficient implementation of MrBayes on multi-GPU. Mol. Biol. Evol. 30, 1471–1479 (2013)
  3. Farber, R.: CUDA Application Design and Development. Morgan Kaufmann, San Francisco (2011)
  4. Felsenstein, J.: Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17, 368–376 (1981)
  5. Kahan, W.: Pracniques: further remarks on reducing truncation errors. Commun. ACM 8(1), 40 (1965). http://doi.acm.org/10.1145/363707.363723
  6. Larget, B., Simon, D.L.: Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol. Biol. Evol. 16, 750–759 (1999)
  7. Li, S., Pearl, D.K., Doss, H.: Phylogenetic tree construction using Markov chain Monte Carlo. J. Am. Statist. Assoc. 95, 493–508 (2000)
  8. Mau, B., Newton, M.A.: Phylogenetic inference for binary data on dendrograms using Markov chain Monte Carlo. J. Comp. Graph. Stat. 6, 122–131 (1997)
  9. NVIDIA: CUDA C Programming Guide (2013)
  10. Pang, S., Stones, R.J., Ren, M.M., Liu, X.G., Wang, G., Xia, H., Wu, H.Y., Liu, Y., Xie, Q.: GPU MrBayes v3.1: GPU MrBayes on graphics processing units for protein sequence data. Mol. Biol. Evol. 32(9), 2496–2497 (2015)
  11. Pratas, F., Trancoso, P., Stamatakis, A., Sousa, L.: Fine-grain parallelism using multi-core, Cell/BE, and GPU systems: accelerating the phylogenetic likelihood function. In: 42nd International Conference on Parallel Processing, pp. 9–17 (2009)
  12. Rannala, B., Yang, Z.: Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. J. Mol. Evol. 43, 304–311 (1996)
  13. Saitou, N., Nei, M.: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987)
  14. Schmidt, H., Strimmer, K., Vingron, M., Haeseler, A.: Tree-puzzle: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502–504 (2002)
  15. Thuiller, W., Lavergne, S., Roquet, C., Boulangeat, I., Lafourcade, B., Araujo, M.B.: Parallel algorithms for Bayesian phylogenetic inference. J. Parallel Distrib. Comput. 63, 707–718 (2003)
  16. Xie, Q., Bu, W., Zheng, L.: The Bayesian phylogenetic analysis of the 18S RNA sequences from the main lineages of Trichophora (Insecta: Heteroptera: Pentatomomorpha). Mol. Biol. Evol. 34, 448–451 (2005)
  17. Yang, Z.: Phylogenetic analysis using parsimony and likelihood methods. J. Mol. Evol. 42(2), 294–307 (1996)
  18. Zhou, J., Liu, X.G., Stones, D.S., Xie, Q., Wang, G.: MrBayes on a graphics processing unit. Bioinformatics 27, 1255–1261 (2011)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Shuai Pang (1)
  • Rebecca J. Stones (1)
  • Ming-ming Ren (1, corresponding author)
  • Gang Wang (1)
  • Xiaoguang Liu (1)

  1. College of Computer and Control Engineering, Nankai University, Tianjin, China