The Cell Broadband Engine: Exploiting Multiple Levels of Parallelism in a Chip Multiprocessor

  • Special Issue on High-End Computing
  • International Journal of Parallel Programming

As CMOS feature sizes continue to shrink and traditional microarchitectural methods for delivering high performance (e.g., deep pipelining) become too expensive and power-hungry, chip multiprocessors (CMPs) offer an exciting new direction by which system designers can deliver increased performance. Exploiting parallelism in such designs is the key to high performance, and we find that parallelism must be exploited at multiple levels of the system: thread-level parallelism, which has become popular in many designs, does not by itself exploit all the parallelism available in many CMP workloads. We describe the Cell Broadband Engine and the multiple levels at which its architecture exploits parallelism: data-level, instruction-level, thread-level, memory-level, and compute-transfer parallelism. By taking advantage of opportunities at all levels of the system, this CMP revolutionizes parallel architectures to deliver previously unattained levels of single-chip performance. We describe how the heterogeneous cores achieve this performance by parallelizing computation-intensive application code and offloading it onto the Synergistic Processor Element (SPE) cores using a heterogeneous thread model. We also give an example of scheduling code to tolerate memory latency using software pipelining techniques in the SPE.
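To make the idea of compute-transfer parallelism and software-pipelined, latency-tolerant scheduling concrete, the following is a minimal sketch in portable C and not the example from the article. It assumes a double-buffered schedule in which the transfer of chunk i+1 is issued before chunk i is computed. The names `CHUNK`, `fetch_chunk`, `scale_chunk`, and `scale_double_buffered` are hypothetical, and `fetch_chunk` is a synchronous `memcpy` standing in for what would be an asynchronous DMA request on an SPE, so the sketch shows only the ordering of operations and not any actual overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4 /* elements per transfer; hypothetical size chosen for illustration */

/* Number of elements in chunk i of an n-element array (the last chunk may be short). */
static size_t chunk_len(size_t n, size_t i) {
    size_t remaining = n - i * CHUNK;
    return remaining < CHUNK ? remaining : CHUNK;
}

/* Stand-in for an asynchronous transfer into a local buffer.
 * ASSUMPTION: on real hardware this would be a non-blocking DMA request;
 * here it is a plain, blocking memcpy. */
static void fetch_chunk(float *local, const float *remote, size_t count) {
    memcpy(local, remote, count * sizeof(float));
}

/* The "compute" stage: scale every element of one chunk by k. */
static void scale_chunk(float *out, const float *local, size_t count, float k) {
    for (size_t j = 0; j < count; j++) {
        out[j] = local[j] * k;
    }
}

/* Double-buffered loop: prefetch chunk i+1 into the idle buffer, then compute chunk i. */
void scale_double_buffered(float *out, const float *in, size_t n, float k) {
    float buf[2][CHUNK];
    size_t nchunks = (n + CHUNK - 1) / CHUNK;

    assert(in != NULL && out != NULL);
    if (nchunks == 0) {
        return;
    }

    /* Prologue: start the first transfer before entering the steady state. */
    fetch_chunk(buf[0], in, chunk_len(n, 0));

    for (size_t i = 0; i < nchunks; i++) {
        size_t cur = i & 1; /* buffer holding chunk i */

        /* Issue the next transfer into the other buffer before computing. */
        if (i + 1 < nchunks) {
            fetch_chunk(buf[cur ^ 1], in + (i + 1) * CHUNK, chunk_len(n, i + 1));
        }

        /* With a truly asynchronous transfer, a wait for chunk i would go here. */
        scale_chunk(out + i * CHUNK, buf[cur], chunk_len(n, i), k);
    }
}
```

Calling `scale_double_buffered(out, in, n, 2.0f)` on two caller-provided arrays of length `n` writes each input element multiplied by 2 into `out`. The design choice worth noting is the two-buffer rotation: because the compute stage only ever reads the buffer that is not being filled, a real asynchronous transfer could proceed concurrently with computation.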



Author information

Corresponding author

Correspondence to Michael Gschwind.

Additional information

This paper is based in part on “Chip multiprocessing and the Cell Broadband Engine”, ACM Computing Frontiers 2006.

About this article

Cite this article

Gschwind, M. The Cell Broadband Engine: Exploiting Multiple Levels of Parallelism in a Chip Multiprocessor. Int J Parallel Prog 35, 233–262 (2007). https://doi.org/10.1007/s10766-007-0035-4

  • DOI: https://doi.org/10.1007/s10766-007-0035-4
