Supporting OpenMP on Cell

O’Brien, Kevin; O’Brien, Kathryn; Sura, Zehra; Chen, Tong; Zhang, Tao

doi:10.1007/978-3-540-69303-1_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4935)

Included in the following conference series:

International Workshop on OpenMP

Abstract

The Cell processor is a heterogeneous multi-core processor with one Power Processing Engine (PPE) core and eight Synergistic Processing Engine (SPE) cores. Each SPE has a small, directly accessible local memory (256 KB) and accesses system memory through DMA operations. Cell programming is complicated both by the need to explicitly manage DMA data transfers for SPE computation and by the multiple layers of parallelism the architecture provides: heterogeneous cores, multiple SPE cores, multithreading, SIMD units, and multiple instruction issue. A significant amount of ongoing research in programming models and tools aims to make it easier to exploit the computational power of the Cell architecture. In our work, we explore supporting OpenMP, a widely used API for parallel programming, on the Cell processor. Supporting OpenMP is attractive because programmers can continue using a familiar programming model and existing code can be reused. We base our work on IBM’s XL compiler, which already supports OpenMP for AIX multiprocessor systems built with Power processors. We developed new components in the XL compiler and a new runtime library for Cell OpenMP that uses the Cell SDK libraries to target specific features of the new hardware platform. To describe the design of our Cell OpenMP implementation, we focus on three major issues: 1) how to use the heterogeneous cores and synchronization support in the Cell to optimize OpenMP threads; 2) how to generate thread code targeting the different instruction sets of the PPE and SPE from within a compiler that takes single-source input; and 3) how to implement the OpenMP memory model on the Cell memory system. We present experimental results for some SPEC OMP 2001 and NAS benchmarks to demonstrate the effectiveness of this approach. We also observe detailed runtime event sequences using the visualization tool Paraver, and we use the resulting insight into actual thread and synchronization behavior to direct further optimizations.
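As an illustration of the gap the abstract describes, the following C sketch contrasts a plain OpenMP reduction loop with a hand-staged version of the same computation. It is not the authors' implementation and uses no Cell SDK calls: the names `sum_squares_omp`, `sum_squares_staged`, and `CHUNK` are hypothetical, and `memcpy` merely stands in for a DMA transfer from system memory into an SPE's local memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical staging-buffer size in elements. A real SPE local memory
 * is 256 KB and must also hold code and stack, so this is illustrative. */
#define CHUNK 1024

/* Shared-memory view: the programmer writes one OpenMP loop and every
 * thread reads x directly. If OpenMP is disabled at compile time, the
 * pragma is ignored and the loop runs serially with the same result. */
double sum_squares_omp(const double *x, size_t n)
{
    double sum = 0.0;
    long i;

    assert(x != NULL || n == 0);
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; i++) {
        sum += x[i] * x[i];
    }
    return sum;
}

/* Local-memory view: data is first copied, chunk by chunk, into a small
 * local buffer and only then used. The memcpy is an assumed stand-in for
 * the DMA "get" that code running on an SPE would have to issue. */
double sum_squares_staged(const double *x, size_t n)
{
    double local[CHUNK];
    double sum = 0.0;

    assert(x != NULL || n == 0);
    for (size_t base = 0; base < n; base += CHUNK) {
        size_t len = (n - base < CHUNK) ? (n - base) : CHUNK;

        memcpy(local, x + base, len * sizeof local[0]);
        for (size_t j = 0; j < len; j++) {
            sum += local[j] * local[j];
        }
    }
    return sum;
}
```

Both functions compute the same value. The second shows the kind of explicit data movement that a programmer would otherwise write by hand, and that an OpenMP implementation for Cell has to hide behind the first form.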

Author information

Authors and Affiliations

IBM T.J. Watson Research Center, Yorktown Heights, NY 10598
Kevin O’Brien, Kathryn O’Brien, Zehra Sura, Tong Chen & Tao Zhang

Editor information

Barbara Chapman, Weiming Zheng, Guang R. Gao, Mitsuhisa Sato, Eduard Ayguadé, Dongsheng Wang

Cite this paper

O’Brien, K., O’Brien, K., Sura, Z., Chen, T., Zhang, T. (2008). Supporting OpenMP on Cell. In: Chapman, B., Zheng, W., Gao, G.R., Sato, M., Ayguadé, E., Wang, D. (eds) A Practical Programming Model for the Multi-Core Era. IWOMP 2007. Lecture Notes in Computer Science, vol 4935. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69303-1_6

DOI: https://doi.org/10.1007/978-3-540-69303-1_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69302-4
Online ISBN: 978-3-540-69303-1
eBook Packages: Computer Science, Computer Science (R0)
