Volume 96, Issue 12, pp 1195–1211

The PEPPHER composition tool: performance-aware composition for GPU-based systems



Abstract

The PEPPHER (EU FP7 project) component model defines the notions of component, interface and meta-data for homogeneous and heterogeneous parallel systems. In this paper, we describe and evaluate the PEPPHER composition tool, which explores the application’s components and their implementation variants, generates the necessary low-level code for interacting with the runtime system, and coordinates the native compilation and linking of the various code units, composing the overall application code for optimized performance. We discuss the concept of smart containers and its benefits for reducing dispatch overhead, exploiting implicit parallelism across component invocations, and optimizing data transfers at runtime. In an experimental evaluation with several applications, we demonstrate that the composition tool provides a high-level programming front-end while effectively utilizing the task-based PEPPHER runtime system (StarPU) underneath, for different usage scenarios on GPU-based systems.
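To illustrate the core idea of performance-aware composition, the following is a minimal, hypothetical C++ sketch (not the actual PEPPHER syntax or API): a component interface has several implementation variants, each paired with a simplified cost model, and a dispatcher selects the variant with the lowest predicted execution time for the given problem size. The names `Variant` and `selectVariant` are illustrative assumptions; in PEPPHER this selection is driven by meta-data and delegated to the StarPU runtime scheduler.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// One implementation variant of a component, annotated with a
// (simplified) cost model mapping problem size to predicted time.
struct Variant {
    std::string name;                        // e.g. "cpu_seq", "openmp", "cuda"
    std::function<double(int)> predictCost;  // problem size -> predicted time
};

// Dispatcher: pick the variant whose cost model predicts the smallest
// execution time for problem size n (assumes a non-empty variant list).
const Variant& selectVariant(const std::vector<Variant>& variants, int n) {
    const Variant* best = &variants.front();
    double bestCost = std::numeric_limits<double>::max();
    for (const auto& v : variants) {
        double c = v.predictCost(n);
        if (c < bestCost) {
            bestCost = c;
            best = &v;
        }
    }
    return *best;
}
```

For example, with a CPU variant costing `1.0 * n` and a CUDA variant costing `500.0 + 0.01 * n` (a fixed transfer/launch overhead plus a small per-element cost), small inputs dispatch to the CPU variant and large inputs to the GPU variant; the smart containers discussed above reduce exactly this kind of per-call transfer overhead by caching valid copies on the device across component invocations.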


Keywords

PEPPHER project · Component model · GPU-based systems · Performance portability · Dynamic scheduling

Mathematics Subject Classification

68N20 Compilers and interpreters 



This work was funded by EU FP7, project PEPPHER, grant #248481, and by SeRC. We would like to thank the University of Vienna for providing access to their machine.



Copyright information

© Springer-Verlag Wien 2013

Authors and Affiliations

  1. PELAB, Department of Computer and Information Science, Linköping University, Linköping, Sweden
