Exploiting Fine-Grained Parallelism on Cell Processors

  • Ralf Hoffmann
  • Andreas Prell
  • Thomas Rauber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6272)


Driven by increasing specialization, multicore integration will soon enable large-scale chip multiprocessors (CMPs) with many processing cores. In order to take advantage of increasingly parallel hardware, independent tasks must be expressed at a fine level of granularity to maximize the available parallelism and thus potential speedup. However, the efficiency of this approach depends on the runtime system, which is responsible for managing and distributing the tasks. In this paper, we present a hierarchically distributed task pool for task parallel programming on Cell processors. By storing subsets of the task pool in the local memories of the Synergistic Processing Elements (SPEs), access latency and thus overheads are greatly reduced. Our experiments show that only a worker-centric runtime system that utilizes the SPEs for both task creation and execution is suitable for exploiting fine-grained parallelism.
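The idea of a hierarchically distributed task pool can be illustrated with a small, single-threaded simulation. This is only a sketch under stated assumptions, not the authors' runtime system: the class `HierarchicalTaskPool`, the function `run`, the parameters `num_workers` and `local_capacity`, and the Fibonacci-shaped task tree are all hypothetical names chosen for illustration. Each worker stands in for an SPE with a small bounded local pool, a shared pool stands in for main memory, and a counter records how often the shared (expensive) level is touched. Workers both create and execute tasks, as in the worker-centric design described above.

```python
from collections import deque


class HierarchicalTaskPool:
    """Two-level pool: one shared pool plus a bounded local pool per worker."""

    def __init__(self, num_workers, local_capacity):
        self.shared = deque()  # stand-in for the pool kept in main memory
        self.local = [deque() for _ in range(num_workers)]  # stand-ins for SPE local stores
        self.local_capacity = local_capacity
        self.shared_accesses = 0  # number of accesses to the "slow" shared level

    def put(self, worker, task):
        """Store a newly created task locally if there is room, else spill it."""
        if len(self.local[worker]) < self.local_capacity:
            self.local[worker].append(task)
        else:
            self.shared.append(task)
            self.shared_accesses += 1

    def get(self, worker):
        """Prefer the newest local task; fall back to the shared pool."""
        if self.local[worker]:
            return self.local[worker].pop()
        if self.shared:
            self.shared_accesses += 1
            return self.shared.popleft()
        return None  # nothing to do for this worker right now


def run(root, num_workers=4, local_capacity=8):
    """Process a Fibonacci-shaped task tree; workers create and execute tasks."""
    pool = HierarchicalTaskPool(num_workers, local_capacity)
    pool.put(0, root)
    leaves = 0
    executed = [0] * num_workers
    while True:
        progressed = False
        for w in range(num_workers):  # round-robin stands in for parallel workers
            task = pool.get(w)
            if task is None:
                continue
            progressed = True
            executed[w] += 1
            if task < 2:
                leaves += 1  # leaf task: no children
            else:
                pool.put(w, task - 1)  # the executing worker creates the child tasks
                pool.put(w, task - 2)
        if not progressed:  # every pool is empty, so all tasks are done
            return leaves, executed, pool.shared_accesses
```

Calling `run(10, local_capacity=0)` forces every task through the shared pool, while `run(10, local_capacity=8)` keeps most tasks in the local pools, so comparing the third return value gives a rough feel for why local storage reduces overhead. The sketch spills tasks to the shared pool only when a local pool is full and does not model DMA transfers, synchronization, or any load-balancing policy of the real system.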


Keywords: Load Balance, Runtime System, Task Creation, Task Size, Cell Processor





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Ralf Hoffmann (1)
  • Andreas Prell (1)
  • Thomas Rauber (1)
  1. Department of Computer Science, University of Bayreuth, Germany
