Experiences with Parallelizing a Bio-informatics Program on the Cell BE

  • Hans Vandierendonck
  • Sean Rul
  • Michiel Questier
  • Koen De Bosschere
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4917)


The Cell Broadband Engine Architecture is a new heterogeneous multi-core architecture targeted at compute-intensive workloads. The Cell BE has several features that are unique among high-performance general-purpose processors, including static instruction scheduling, extensive support for vectorization, scratch pad memories, explicit programming of DMAs, mailbox communication, and multiple processor cores. Obtaining high performance requires making explicit use of these features. Yet little work reports on how to apply them or on how much each of them contributes to performance.

This paper presents our experiences with programming the Cell BE architecture. Our test application is Clustal W, a bio-informatics program for multiple sequence alignment. We report on how we apply the unique features of the Cell BE to Clustal W and how important each is for obtaining high performance. By making extensive use of vectorization and by parallelizing the application across all cores, we speed up the pairwise alignment phase of Clustal W by a factor of 51.2 over PPU (superscalar) execution. The progressive alignment phase is sped up by a factor of 5.7 over PPU execution, resulting in an overall speedup of 9.1.
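To illustrate why the pairwise alignment phase lends itself to parallelization across cores, the sketch below scores every pair of input sequences as an independent task. It is a minimal illustration under stated assumptions, not the paper's implementation: the scoring function is a simplified Smith-Waterman local alignment with a linear gap penalty (Clustal W's own scoring is more elaborate), the names `local_alignment_score` and `all_pairs_scores` and the scoring parameters are hypothetical, and a Python thread pool merely stands in for distributing work over processor cores.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Simplified Smith-Waterman score with a linear gap penalty.

    The scoring parameters are illustrative assumptions, not Clustal W's.
    Only the previous row of the dynamic-programming matrix is kept.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def all_pairs_scores(seqs, workers=4):
    """Score all N*(N-1)/2 sequence pairs; each pair is an independent task."""
    pairs = list(combinations(range(len(seqs)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = pool.map(
            lambda p: local_alignment_score(seqs[p[0]], seqs[p[1]]), pairs
        )
        return dict(zip(pairs, scores))


if __name__ == "__main__":
    for pair, score in all_pairs_scores(["ACGT", "ACGT", "TTTT"]).items():
        print(pair, score)
```

Because no pair depends on the result of another, the tasks can be handed out in any order. In CPython the thread pool will not yield a real speedup for this pure-Python loop; it only makes the task independence explicit.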


Keywords: Pairwise Alignment, Loop Nest, Loop Iteration, Loop Body, Progressive Alignment





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Hans Vandierendonck (1)
  • Sean Rul (1)
  • Michiel Questier (1)
  • Koen De Bosschere (1)
  1. Department of Electronics and Information Systems/HiPEAC, Ghent University, Gent, Belgium
