Practical Many-Core Programming



In the previous chapters we introduced the foundations for programming many-core chips: current and expected hardware architectures, operating system designs, the foundations of parallel programming, and the basic programming models that we believe are the most promising in the context of many-core chips. This chapter focuses on the concrete technologies available today that we believe will stand the test of time and provide a solid background for programming the many-core chips of the future. Specifically, we cover several task-based models (Cilk, Grand Central Dispatch, OpenMP, Threading Building Blocks, and Microsoft’s Task Parallel Library), data-parallel models (illustrated through OpenCL, which also supports the task model), and a well-established representative of the actor model (Erlang). The goal of this chapter is to provide a map that guides the choice of the most suitable programming model and implementation library when seeking the best solution for a specific application domain.


Keywords: Task Graph, Runtime System, Work Item, Parallel Region, Task Scheduler



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Oy L M Ericsson Ab, Jorvas, Finland
