Exploring New Search Algorithms and Hardware for Phylogenetics: RAxML Meets the IBM Cell
Phylogenetic inference is considered one of the grand challenges in bioinformatics because of its immense computational requirements. RAxML is currently among the fastest and most accurate programs for phylogenetic tree inference under the Maximum Likelihood (ML) criterion. First, we introduce new tree search heuristics that accelerate RAxML by a factor of 2.43 while returning equally good trees. The performance of the new search algorithm has been assessed on 18 real-world datasets comprising 148 to 4,843 DNA sequences. We then present the implementation, optimization, and evaluation of RAxML on the IBM Cell Broadband Engine. We identify the problems that arise in optimizing floating-point code, control flow, communication, and the scheduling of multi-level parallelism on the Cell, and provide solutions to them.
Keywords: phylogenetic inference, maximum likelihood, RAxML, IBM Cell