High-Level Support for Pipeline Parallelism on Many-Core Architectures

  • Siegfried Benkner
  • Enes Bajrovic
  • Erich Marth
  • Martin Sandrieser
  • Raymond Namyst
  • Samuel Thibault
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7484)


With the increasing architectural diversity of many-core systems, the challenges of parallel programming and code portability will rise sharply. The EU project PEPPHER addresses these issues with a component-based approach to application development on top of a task-parallel execution model. Central to this approach are multi-architectural components, which encapsulate different implementation variants of application functionality tailored to different core types. An intelligent runtime system selects and dynamically schedules component implementation variants for efficient parallel execution on heterogeneous many-core architectures. On top of this model we have developed language, compiler, and runtime support for a specific class of applications that can be expressed using the pipeline pattern. We propose C/C++ language annotations for specifying pipeline patterns and describe the associated compilation and runtime infrastructure. Experimental results indicate that our high-level approach achieves performance comparable to manual parallelization.


Keywords: Graphics Processing Unit · Pipeline Stage · Runtime System · Execution Unit · Implementation Variant





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Siegfried Benkner¹
  • Enes Bajrovic¹
  • Erich Marth¹
  • Martin Sandrieser¹
  • Raymond Namyst²
  • Samuel Thibault²

  1. Research Group Scientific Computing, University of Vienna, Austria
  2. LaBRI-INRIA Bordeaux Sud-Ouest, University of Bordeaux, Talence, France
