International Journal of Parallel Programming, Volume 35, Issue 3, pp 263–298

Scientific Computing Kernels on the Cell Processor

  • Samuel Williams
  • John Shalf
  • Leonid Oliker
  • Shoaib Kamil
  • Parry Husbands
  • Katherine Yelick
Special Issue on High-End Computing

DOI: 10.1007/s10766-007-0034-5

Cite this article as:
Williams, S., Shalf, J., Oliker, L. et al. Int J Parallel Prog (2007) 35: 263. doi:10.1007/s10766-007-0034-5

In this work, we examine the potential of using the recently released STI Cell processor as a building block for future high-end scientific computing systems. Our work contains several novel contributions. First, we introduce a performance model for Cell and apply it to several key numerical kernels: dense matrix multiply, sparse matrix-vector multiply, stencil computations, and 1D/2D FFTs. Next, we validate our model by comparing results against published hardware data, as well as our own Cell blade implementations. Additionally, we compare Cell performance to benchmarks run on leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1E) architectures. Our work also explores several different kernel implementations and demonstrates a simple and effective programming model for Cell’s unique architecture. Finally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall, the results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.
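For readers unfamiliar with the kernels named in the abstract, the following is a minimal sketch of one of them, sparse matrix-vector multiply (SpMV), assuming the matrix is stored in compressed sparse row (CSR) form. The storage format, the function name `spmv_csr`, and the small example matrix are illustrative assumptions, not taken from the article, and the sketch says nothing about how the kernel is mapped onto Cell.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Return y = A @ x for a sparse matrix A held in CSR form.

    values  : nonzero entries of A, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i inside `values`
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Visit only the stored nonzeros of this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

y = spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

The indirect access `x[col_idx[k]]` is what makes SpMV irregular and typically memory-bound, which is one reason it is commonly used as a benchmark kernel alongside dense matrix multiply.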

Keywords

  • Cell processor
  • GEMM
  • SpMV
  • sparse matrix
  • FFT
  • Stencil
  • three level memory

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Samuel Williams (1)
  • John Shalf (1)
  • Leonid Oliker (1)
  • Shoaib Kamil (1)
  • Parry Husbands (1)
  • Katherine Yelick (1)

  1. Lawrence Berkeley National Laboratory, CRD/NERSC, Berkeley, USA