Mapping and Synchronizing Streaming Applications on Cell Processors

  • Maik Nijhuis
  • Herbert Bos
  • Henri E. Bal
  • Cédric Augonnet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5409)


Developing streaming applications on heterogeneous multiprocessor architectures like the Cell is difficult. Currently, application developers need to know hardware details to deal with issues such as scheduling, memory management, and communication/synchronization. Worse, with multiple alternatives for communication available, developers spend significant time picking the most appropriate one, and a poor choice often results in bad performance. With Cell-Space, we shield users from hardware details without compromising performance. Its runtime is based on an evaluation of the different communication primitives. In Cell-Space, developers specify a streaming application as a data flow graph of interacting components. Both task and data parallelism are easily expressed, and advanced features such as dynamic reconfiguration are fully supported. Beneath a simple interface we include a slew of optimizations not present in other Cell runtime environments. We demonstrate the impact of these optimizations and show that Cell-Space applications can efficiently exploit the resources offered by the Cell.



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Maik Nijhuis (1)
  • Herbert Bos (1)
  • Henri E. Bal (1)
  • Cédric Augonnet (2)
  1. Vrije Universiteit, Amsterdam, The Netherlands
  2. INRIA - LaBRI, Université Bordeaux 1, France
