Performance Impact of Task Mapping on the Cell BE Multicore Processor

  • Jörg Keller
  • Ana Lucia Varbanescu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6161)


Current multicores present themselves as symmetric to programmers with a bus as communication medium, but are known to be non-symmetric because their interconnect is more complex than a bus. We report on our experiments to map a simple application with communication in a ring to SPEs of a Cell BE processor such that performance is optimized. We find that low-level tricks for static mapping do not necessarily achieve optimal performance. Furthermore, we ran exhaustive mapping experiments, and we observed that (1) performance variations can be significant between consecutive runs, and (2) performance forecasts based on intuitive interconnect behavior models are far from accurate even for a simple communication pattern.
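As a rough illustration of what an exhaustive mapping experiment over a ring communication pattern involves, the sketch below enumerates every placement of tasks onto core positions and scores each one with a naive hop-count model. Everything in it is an assumption made for illustration: the interconnect is modelled as a plain bidirectional ring of numbered positions, and the names `ring_hops`, `mapping_cost` and `exhaustive_search` are hypothetical. It is not the authors' experimental setup, and the abstract itself reports that intuitive models of this kind are far from accurate on the real hardware.

```python
from itertools import permutations


def ring_hops(a, b, n):
    """Shortest hop distance between positions a and b on an n-node ring
    (assumed model, not a description of the real interconnect)."""
    d = abs(a - b) % n
    return min(d, n - d)


def mapping_cost(mapping, n):
    """Total hop count when task i sends to task (i + 1) mod k and
    task i is placed on position mapping[i]."""
    k = len(mapping)
    return sum(
        ring_hops(mapping[i], mapping[(i + 1) % k], n) for i in range(k)
    )


def exhaustive_search(num_tasks, num_positions):
    """Score every placement of num_tasks tasks on num_positions positions.

    Returns (best_mapping, best_cost, worst_mapping, worst_cost)
    under the naive hop-count model above.
    """
    costs = {
        m: mapping_cost(m, num_positions)
        for m in permutations(range(num_positions), num_tasks)
    }
    best = min(costs, key=costs.get)
    worst = max(costs, key=costs.get)
    return best, costs[best], worst, costs[worst]


if __name__ == "__main__":
    # 8 tasks on 8 positions gives 8! = 40320 candidate mappings.
    best, best_cost, worst, worst_cost = exhaustive_search(8, 8)
    print("best:", best, best_cost)
    print("worst:", worst, worst_cost)
```

A model like this only ranks mappings by predicted cost. Comparing that ranking against measured runtimes, repeated over several runs, is what would expose the run-to-run variation and the prediction errors described above.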


Keywords (added by machine, not by the authors): Task Mapping, Average Runtime, Multicore Processor, Cell Processor, Unidirectional Ring





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jörg Keller (1)
  • Ana Lucia Varbanescu (2)
  1. University of Hagen, Hagen, Germany
  2. Delft University of Technology, Delft, The Netherlands
