Pipeline Patterns on Top of Task-Based Runtimes

  • Enes Bajrovic
  • Siegfried Benkner
  • Jiri Dokulil
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 931)


Task-based runtime systems have gained a lot of interest in recent years since they support separating the specification of parallel computations from the concrete mapping onto a parallel architecture. This separation of concerns is considered key to coping with the increased complexity, performance variability, and heterogeneity of future parallel systems and to facilitating portability of applications across different architectures. In this paper we present our work on a programming framework that enables the expression of pipeline patterns at a high level of abstraction by adding pragma directives to sequential C++ code. These high-level abstractions are then transformed to a runtime coordination layer, which utilizes different task-based runtime systems, including StarPU and OCR, to realize efficient parallel execution on single-node multi-core architectures. We describe the major aspects of our approach for mapping pipeline patterns to task-based runtimes and present experimental results for a real-world face-detection application, indicating that performance competitive with low-level programming approaches can be achieved.
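To make the notion of a pipeline pattern concrete, the following is a minimal sketch in plain C++17 of a three-stage pipeline whose stages run concurrently and communicate through queues. It is not the framework described in the paper: it uses neither the pragma directives nor StarPU or OCR, and the names `Channel` and `run_pipeline` are hypothetical, introduced only for illustration. It shows the kind of hand-written coordination code that a high-level pipeline abstraction is meant to hide.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the paper's framework):
// a minimal blocking queue that connects two pipeline stages.
template <typename T>
class Channel {
public:
    // Enqueue one item and wake a waiting consumer.
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(value));
        }
        cv_.notify_one();
    }

    // Signal that no further items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Block until an item is available; return std::nullopt
    // once the channel is closed and fully drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        T value = std::move(q_.front());
        q_.pop();
        return value;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

// Hypothetical three-stage pipeline: produce 0..n-1, square each item,
// collect the results. Each stage runs on its own thread.
std::vector<int> run_pipeline(int n) {
    assert(n >= 0);
    Channel<int> produced;
    Channel<int> squared;
    std::vector<int> out;

    std::thread producer([&] {
        for (int i = 0; i < n; ++i) produced.push(i);
        produced.close();
    });
    std::thread worker([&] {
        while (auto v = produced.pop()) squared.push(*v * *v);
        squared.close();
    });
    std::thread consumer([&] {
        while (auto v = squared.pop()) out.push_back(*v);
    });

    producer.join();
    worker.join();
    consumer.join();
    return out;
}
```

Because each stage in this sketch runs on exactly one thread, items leave the pipeline in the order they entered, so `run_pipeline(5)` returns 0, 1, 4, 9, 16. A task-based runtime would instead schedule stage executions as tasks, which is the mapping the paper addresses.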


Keywords: Parallel programming, Runtime systems, Multicore architectures



The work was supported in part by the Austrian Science Fund (FWF) project P 29783 Dynamic Runtime System for Future Parallel Architectures.



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Enes Bajrovic (1, corresponding author)
  • Siegfried Benkner (1)
  • Jiri Dokulil (1)
  1. Faculty of Computer Science, University of Vienna, Vienna, Austria
