Exploring New Search Algorithms and Hardware for Phylogenetics: RAxML Meets the IBM Cell

  • A. Stamatakis
  • F. Blagojevic
  • D. S. Nikolopoulos
  • C. D. Antonopoulos

Abstract

Phylogenetic inference is considered one of the grand challenges in bioinformatics because of its immense computational requirements. RAxML is currently among the fastest and most accurate programs for phylogenetic tree inference under the Maximum Likelihood (ML) criterion. First, we introduce new tree search heuristics that accelerate RAxML by a factor of 2.43 while returning equally good trees. The performance of the new search algorithm has been assessed on 18 real-world datasets comprising between 148 and 4,843 DNA sequences. We then present the implementation, optimization, and evaluation of RAxML on the IBM Cell Broadband Engine. We address the problems of, and provide solutions for, optimizing floating-point code, control flow, communication, and the scheduling of multi-level parallelism on the Cell.

Keywords

phylogenetic inference, maximum likelihood, RAxML, IBM Cell

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • A. Stamatakis (1)
  • F. Blagojevic (2)
  • D. S. Nikolopoulos (2)
  • C. D. Antonopoulos (3)

  1. School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  2. Department of Computer Science, Center for High-end Computing Systems, Virginia Tech, Blacksburg, USA
  3. Department of Computer and Communications Engineering, University of Thessaly, Volos, Greece
