The Cell Broadband Engine: Exploiting Multiple Levels of Parallelism in a Chip Multiprocessor

Gschwind, Michael

doi:10.1007/s10766-007-0035-4

Special Issue on High-End Computing
Published: 06 April 2007

Volume 35, pages 233–262, (2007)
International Journal of Parallel Programming

As CMOS feature sizes continue to shrink and traditional microarchitectural methods for delivering high performance (e.g., deep pipelining) become too expensive and power-hungry, chip multiprocessors (CMPs) become an exciting new direction by which system designers can deliver increased performance. Exploiting parallelism in such designs is the key to high performance, and we find that parallelism must be exploited at multiple levels of the system: the thread-level parallelism that has become popular in many designs does not, by itself, capture all the parallelism available in many workloads for CMP systems. We describe the Cell Broadband Engine and the multiple levels at which its architecture exploits parallelism: data-level, instruction-level, thread-level, memory-level, and compute-transfer parallelism. By taking advantage of opportunities at all levels of the system, this CMP revolutionizes parallel architectures to deliver previously unattained levels of single-chip performance. We describe how the heterogeneous cores make this performance achievable by parallelizing computation-intensive application code and offloading it onto the Synergistic Processor Element (SPE) cores using a heterogeneous thread model. We also give an example of scheduling code to tolerate memory latency using software pipelining techniques in the SPE.
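The abstract names compute-transfer parallelism without showing it, so the following is a minimal sketch of the general idea (double buffering), not the article's own code. It is written in Python rather than SPE code, and everything in it is an assumption made for illustration: the helper names `fetch_block`, `compute`, and `double_buffered_sum` are hypothetical, and a single worker thread stands in for an asynchronous transfer engine. The point is only the schedule: the transfer of the next block is started before the computation on the current block begins.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_block(memory, start, size):
    """Hypothetical stand-in for an asynchronous transfer into local storage."""
    return memory[start:start + size]


def compute(block):
    """Hypothetical compute kernel: sum of squares over one block."""
    return sum(x * x for x in block)


def double_buffered_sum(memory, block_size):
    """Start the fetch of each block before computing on the previous one."""
    total = 0
    # One worker thread plays the role of the transfer engine (an assumption).
    with ThreadPoolExecutor(max_workers=1) as transfer_engine:
        pending = None  # transfer issued in the previous iteration, if any
        for start in range(0, len(memory), block_size):
            # Issue the transfer for this block first ...
            next_fetch = transfer_engine.submit(fetch_block, memory, start, block_size)
            # ... then compute on the previous block while that transfer runs.
            if pending is not None:
                total += compute(pending.result())
            pending = next_fetch
        # Drain the last outstanding transfer.
        if pending is not None:
            total += compute(pending.result())
    return total
```

Calling `double_buffered_sum(list(range(10)), 4)` processes three blocks and gives the same result as a plain sum of squares over the whole list; only the order in which transfers and computation are issued differs.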

References

V. Srinivasan, D. Brooks, M. Gschwind, P. Bose, P. Emma, V. Zyuban, and P. Strenski, Optimizing Pipelines for Power and Performance, in Proc. 35th International Symposium on Microarchitecture (December 2002).
R. Dennard, F. Gaensslen, H.-N. Yu, L. Rideout, E. Bassous, and A. LeBlanc. Design of Ion-Implanted MOSFETs with Very Small Physical Dimensions, IEEE J. Solid State Circuits, SC-9:256–268 (1974).
C. Christensen. The Innovator’s Dilemma, McGraw-Hill, New York (1997).
J. Kahle, M. Day, P. Hofstee, C. Johns, T. Maeurer, and D. Shippy. Introduction to the Cell Multiprocessor. IBM J. Res. Dev., 49(4/5):589–604 (September 2005).
P. Hofstee. Power Efficient Processor Architecture and the Cell Processor, in Proc. 11th International Symposium on High-Performance Computer Architecture (February 2005).
M. Gschwind, P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and T. Yamazaki. A Novel SIMD Architecture for the CELL Heterogeneous Chip-Multiprocessor, in Hot Chips 17, Palo Alto, CA (August 2005).
M. Gschwind, P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and T. Yamazaki. Synergistic Processing in Cell’s Multicore Architecture, IEEE Micro, 26(2):10–24 (March 2006).
B. Flachs, S. Asano, S. Dhong, P. Hofstee, G. Gervais, R. Kim, T. Le, P. Liu, J. Leenstra, J. Liberty, B. Michael, H.-J. Oh, S. Mueller, O. Takahashi, A. Hatakeyama, Y. Watanabe, N. Yano, D. Brokenshire, M. Peyravian, V. To, and E. Iwata. The microarchitecture of the Synergistic Processor for a Cell Processor, IEEE J. Solid State Circuits, 41(1):63–70 (January 2006).
T. Karkhanis and J. E. Smith. A Day in the Life of a Data Cache Miss, in Workshop on Memory Performance Issues (2002).
V. Salapura, R. Bickford, M. Blumrich, A. A. Bright, D. Chen, P. Coteus, A. Gara, M. Giampapa, M. Gschwind, M. Gupta, S. Hall, R. A. Haring, P. Heidelberger, D. Hoenicke, G. V. Kopcsay, M. Ohmacht, R. A. Rand, T. Takken, and P. Vranas. Power and Performance Optimization at the System Level, in Proc. ACM Computing Frontiers 2005 (May 2005).
V. Salapura, R. Walkup, and A. Gara. Exploiting Workload Parallelism for Power and Performance Optimization in Blue Gene, IEEE Micro, 26(5):67–81 (September 2006).
W. Wulf and S. McKee. Hitting the Memory Wall: Implications of the Obvious, Comput. Archit. News, 23(1):20–24 (March 1995).
A. Glew. MLP yes! ILP no!, in ASPLOS Wild and Crazy Idea Session ’98 (October 1998).
The Blue Gene team. Blue Gene: A Vision for Protein Science Using a Petaflop Supercomputer. IBM Syst. J., 40(2):310–327 (2001).
C. Cascaval, J. Castanos, L. Ceze, M. Denneau, M. Gupta, D. Lieber, J. Moreira, K. Strauss, and H. Warren. Evaluation of a Multithreaded Architecture for Cellular Computing, in Proc. Eighth International Symposium on High-Performance Computer Architecture (2002).
Y. Chou, B. Fahs, and S. Abraham. Microarchitecture Optimizations for Exploiting Memory-Level Parallelism, in Proc. 31st Annual International Symposium on Computer Architecture (June 2004).
L. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing, in Proc. 27th Annual International Symposium on Computer Architecture (June 2000).
E. Altman, P. Capek, M. Gschwind, P. Hofstee, J. Kahle, R. Nair, S. Sathaye, and J.-D. Wellman. Method and system for maintaining coherency in a multiprocessor system by broadcasting TLB invalidated entry instructions. U.S. Patent 6970982 (November 2005).
M. Gschwind. Chip multiprocessing and the Cell Broadband Engine, in Proc. ACM Computing Frontiers 2006 (May 2006).
C. McNairy and R. Bhatia. Montecito: A Dual-Core, Dual-Thread Itanium Processor, IEEE Micro, 25(2):10–20 (March 2005).
S. Clark, K. Haselhorst, K. Imming, J. Irish, D. Krolak, and T. Ozguner. Cell Broadband Engine Interconnect and Memory Interface, in Hot Chips 17, Palo Alto, CA (August 2005).
C. Click. A Tour Inside the Azul 384-way Java Appliance, Tutorial at the 14th International Conference on Parallel Architectures and Compilation Techniques (September 2005).
A. Eichenberger, K. O’Brien, K. O’Brien, P. Wu, T. Chen, P. Oden, D. Prener, J. Shepherd, B. So, Z. Sura, A. Wang, T. Zhang, P. Zhao, and M. Gschwind. Optimizing Compiler for the Cell Processor, in Proc. 14th International Conference on Parallel Architectures and Compilation Techniques (September 2005).

Author information

Authors and Affiliations

IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Michael Gschwind

Corresponding author

Correspondence to Michael Gschwind.

Additional information

This paper is based in part on “Chip multiprocessing and the Cell Broadband Engine”, ACM Computing Frontiers 2006.

About this article

Cite this article

Gschwind, M. The Cell Broadband Engine: Exploiting Multiple Levels of Parallelism in a Chip Multiprocessor. Int J Parallel Prog 35, 233–262 (2007). https://doi.org/10.1007/s10766-007-0035-4

Published: 06 April 2007
Issue Date: June 2007
DOI: https://doi.org/10.1007/s10766-007-0035-4
