Heterogeneous Multi-core Processors: The Cell Broadband Engine

  • H. Peter Hofstee
Part of the Integrated Circuits and Systems book series (ICIR)


The Cell Broadband Engine™ Architecture defines a heterogeneous chip multi-processor (HCMP). Heterogeneous processors can achieve higher degrees of efficiency and performance than homogeneous chip multi-processors (CMPs), but they also place a larger burden on software. In this chapter, we describe the Cell Broadband Engine Architecture and its implementations. We discuss how memory flow control and the synergistic processor unit architecture extend the Power Architecture™ to allow the creation of heterogeneous implementations that attack the greatest sources of inefficiency in modern microprocessors. We discuss aspects of the micro-architecture and implementation of the Cell Broadband Engine and PowerXCell 8i processors. Next, we survey portable approaches to programming the Cell Broadband Engine, and we discuss aspects of its performance.


Local Store; Multicore Processor; Shared System Memory; Cell Processor; Task Queue
These keywords were added by machine and not by the authors.



Copyright information

© Springer-Verlag US 2009

Authors and Affiliations

  1. IBM Systems and Technology Group, Austin, USA
