Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4935)

Abstract

The Cell processor is a heterogeneous multi-core processor with one Power Processing Engine (PPE) core and eight Synergistic Processing Engine (SPE) cores. Each SPE has a small, directly accessible local memory (256 KB) and reaches system memory through DMA operations. Cell programming is complicated both by the need to explicitly manage DMA data transfers for SPE computation and by the multiple layers of parallelism in the architecture: heterogeneous cores, multiple SPE cores, multithreading, SIMD units, and multiple instruction issue. A significant amount of ongoing research in programming models and tools attempts to make it easier to exploit the computational power of the Cell architecture. In our work, we explore supporting OpenMP, a widely used API for parallel programming, on the Cell processor. Supporting OpenMP is attractive because programmers can continue using a familiar programming model and existing code can be reused. We base our work on IBM’s XL compiler, which already supports OpenMP for AIX multi-processor systems built with Power processors. We developed new components in the XL compiler and a new runtime library for Cell OpenMP that uses the Cell SDK libraries to target specific features of the new hardware platform. To describe the design of our Cell OpenMP implementation, we focus on three major issues: 1) how to use the heterogeneous cores and synchronization support in the Cell to optimize OpenMP threads; 2) how to generate thread code targeting the different instruction sets of the PPE and SPE from within a compiler that takes single-source input; and 3) how to implement the OpenMP memory model on the Cell memory system. We present experimental results for some SPEC OMP 2001 and NAS benchmarks to demonstrate the effectiveness of this approach. We also observe detailed runtime event sequences using the visualization tool Paraver, and we use the resulting insight into actual thread and synchronization behavior to direct further optimizations.
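
As a rough illustration of the programming burden the abstract describes, the sketch below shows a single-source OpenMP loop in C that stages data through a small private buffer before computing on it. It is not the authors' implementation and does not use the Cell SDK: `memcpy` merely stands in for the DMA get and put operations an SPE would issue, and the names `scale_and_sum` and `TILE_ELEMS` are hypothetical, chosen only for this example. On a conventional OpenMP compiler the pragma distributes tiles across threads; without OpenMP enabled the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tile size: stands in for the share of a small local
   memory that a runtime might reserve for data buffers. */
#define TILE_ELEMS 1024

/* Multiply every element of data[0..n) by factor and return the sum of
   the scaled values. The array plays the role of system memory; each
   tile is copied into a private buffer, processed there, and copied
   back, mimicking an explicit get/compute/put cycle. */
double scale_and_sum(double *data, size_t n, double factor)
{
    double total = 0.0;
    long ntiles = (long)((n + TILE_ELEMS - 1) / TILE_ELEMS);

    #pragma omp parallel for reduction(+:total)
    for (long t = 0; t < ntiles; t++) {
        double local[TILE_ELEMS];   /* private buffer, one per iteration */
        size_t begin = (size_t)t * TILE_ELEMS;
        size_t count = (n - begin < TILE_ELEMS) ? n - begin : TILE_ELEMS;
        assert(count <= TILE_ELEMS);

        /* Stand-in for a DMA get: system memory -> local buffer. */
        memcpy(local, data + begin, count * sizeof(double));

        for (size_t i = 0; i < count; i++) {
            local[i] *= factor;
            total += local[i];
        }

        /* Stand-in for a DMA put: local buffer -> system memory. */
        memcpy(data + begin, local, count * sizeof(double));
    }
    return total;
}
```

In plain OpenMP on a shared-memory machine the two copies are unnecessary; they are kept here only to make visible the kind of data movement that a compiler and runtime targeting SPE local memory would have to generate or manage on the programmer's behalf.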

Editor information

Barbara Chapman, Weiming Zheng, Guang R. Gao, Mitsuhisa Sato, Eduard Ayguadé, Dongsheng Wang

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

O’Brien, K., O’Brien, K., Sura, Z., Chen, T., Zhang, T. (2008). Supporting OpenMP on Cell. In: Chapman, B., Zheng, W., Gao, G.R., Sato, M., Ayguadé, E., Wang, D. (eds) A Practical Programming Model for the Multi-Core Era. IWOMP 2007. Lecture Notes in Computer Science, vol 4935. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69303-1_6

  • DOI: https://doi.org/10.1007/978-3-540-69303-1_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69302-4

  • Online ISBN: 978-3-540-69303-1

  • eBook Packages: Computer Science, Computer Science (R0)
