Support for OpenMP Tasks on Cell Architecture

  • Qian Cao
  • Changjun Hu
  • Haohu He
  • Xiang Huang
  • Shigang Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6082)


The task construct is the most significant feature of the new OpenMP specification, providing a way to express unstructured parallelism. This paper presents a runtime library that implements the task model on the Cell heterogeneous multicore processor and is designed to exploit the architecture's advantages as fully as possible. We also propose two optimizations: an original scheduling strategy and an adaptive cut-off technique. The scheduling strategy combines the breadth-first and work-first strategies. The adaptive cut-off chooses between two cut-off techniques, maximum number of tasks and maximum task recursion level, according to application characteristics. Performance evaluations indicate that our scheme achieves speedups of 3.4 to 7.2 over serial execution.
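The two cut-off techniques named in the abstract can be illustrated with a small simulation. The sketch below is not the paper's runtime: it models a task pool as a FIFO queue (an assumption made here for simplicity) and runs a recursive Fibonacci computation, where a cut-off decides whether a call is split into two child tasks or executed serially inside the current task. All names (`run`, `max_level`, `max_tasks`, `serial_fib`) are hypothetical, and the heuristic the paper uses to pick between the two cut-offs is not given in the abstract, so it is not modelled.

```python
from collections import deque


def serial_fib(n):
    """Plain recursive Fibonacci, used once a cut-off stops task creation."""
    return n if n < 2 else serial_fib(n - 1) + serial_fib(n - 2)


def max_level(limit):
    """Cut-off on recursion depth: create tasks only above `limit` levels."""
    return lambda depth, pending: depth < limit


def max_tasks(limit):
    """Cut-off on pool size: create tasks only while fewer than `limit` wait."""
    return lambda depth, pending: pending < limit


def run(n, may_spawn):
    """Compute fib(n) with a simulated task pool.

    Returns (result, number_of_tasks_created). The FIFO pool is an
    illustrative assumption, not the scheduler described in the paper.
    """
    pool = deque([(n, 0)])  # entries are (argument, recursion depth)
    total = 0
    spawned = 0
    while pool:
        m, depth = pool.popleft()
        if m < 2:
            total += m  # base case contributes its own value
        elif may_spawn(depth, len(pool)):
            # Below the cut-off: split the work into two child tasks.
            pool.append((m - 1, depth + 1))
            pool.append((m - 2, depth + 1))
            spawned += 2
        else:
            # Cut-off reached: finish this subtree serially, no new tasks.
            total += serial_fib(m)
    return total, spawned
```

Calling `run(15, max_level(3))` and `run(15, max_tasks(4))` gives the same Fibonacci value but different task counts, which is the trade-off a cut-off controls: fewer tasks means less runtime overhead, at the cost of less available parallelism.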


Keywords: Task, OpenMP, parallel, Cell architecture





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Qian Cao (1)
  • Changjun Hu (1)
  • Haohu He (1)
  • Xiang Huang (1)
  • Shigang Li (1)

  1. University of Science and Technology Beijing, Beijing, China
