Adaptive Implementation Selection in the SkePU Skeleton Programming Library

  • Usman Dastgeer
  • Lu Li
  • Christoph Kessler
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8299)


In earlier work, we developed the SkePU skeleton programming library for modern multicore systems equipped with one or more programmable GPUs. The library internally provides four types of implementations (implementation variants) for each skeleton: serial C++ and OpenMP targeting CPU execution, and CUDA and OpenCL targeting GPU execution. Which implementation runs fastest for a given skeleton call depends on the computation, the problem size(s), the system architecture and data locality.
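To make the dependence on problem size and data locality concrete, the following is a minimal sketch of a toy cost model in Python. It is not SkePU code and not the authors' model: the function names, the overhead and per-element costs, and the transfer cost are all hypothetical values chosen only to show how the fastest variant can change with the input size and with where the operand data currently resides.

```python
# Toy cost model. Every constant below is a hypothetical value invented for
# illustration; none of them is a measurement from the paper or from SkePU.
PER_ELEMENT_NS = {"serial": 10.0, "openmp": 2.0, "cuda": 0.1, "opencl": 0.12}
LAUNCH_OVERHEAD_NS = {
    "serial": 0.0,
    "openmp": 5_000.0,
    "cuda": 20_000.0,
    "opencl": 30_000.0,
}
TRANSFER_NS_PER_ELEMENT = 1.0  # assumed cost of moving one element host <-> GPU
GPU_VARIANTS = {"cuda", "opencl"}


def predicted_time_ns(variant: str, n: int, data_on_gpu: bool = False) -> float:
    """Estimate the run time of one skeleton call under the toy model."""
    time = LAUNCH_OVERHEAD_NS[variant] + PER_ELEMENT_NS[variant] * n
    # A transfer is needed whenever the data is not where the variant runs.
    runs_on_gpu = variant in GPU_VARIANTS
    if runs_on_gpu != data_on_gpu:
        time += TRANSFER_NS_PER_ELEMENT * n
    return time


def fastest_variant(n: int, data_on_gpu: bool = False) -> str:
    """Return the variant with the lowest predicted time for this call."""
    return min(
        PER_ELEMENT_NS,
        key=lambda variant: predicted_time_ns(variant, n, data_on_gpu),
    )


if __name__ == "__main__":
    for size in (100, 10_000, 1_000_000):
        print(size, fastest_variant(size), fastest_variant(size, data_on_gpu=True))
```

Under these assumed constants, small inputs favor the serial variant, mid-sized inputs favor OpenMP when the data is in host memory, and the same mid-sized call favors CUDA once the data already resides on the GPU. Real crossover points differ per machine and per computation, which is why they have to be learned rather than hard-coded.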

In this paper, we present our work on automatic selection between these implementation variants by an offline machine learning method that generates a compact decision tree with low training overhead. The proposed selection mechanism is flexible yet high-level, allowing a skeleton programmer to control different training choices at a higher abstraction level. We have evaluated our optimization strategy with nine applications/kernels ported to our skeleton library and achieve, on average, more than 94% (90%) accuracy with just 0.53% (0.58%) training space exploration on two systems (the second system's figures are given in parentheses). Moreover, we discuss one application scenario where local optimization that considers a single skeleton call can prove sub-optimal, and we propose a heuristic for bulk implementation selection that considers more than one skeleton call to address such scenarios.
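The general idea of offline, tree-based selection can be sketched as follows. This is a minimal illustration in Python, not the training algorithm from the paper: the tree is built over a single feature (problem size), the split criterion is plain misclassification count, and the training samples and all names (`build_tree`, `select`, `TRAINING`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    variant: str  # implementation variant predicted for this region


@dataclass
class Node:
    threshold: int  # sizes <= threshold go left, larger sizes go right
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]
Sample = Tuple[int, str]  # (problem size, fastest variant observed offline)


def _errors(part: List[Sample]) -> int:
    """Samples that a single majority-label leaf would misclassify."""
    counts = Counter(variant for _, variant in part)
    return len(part) - max(counts.values())


def build_tree(samples: List[Sample]) -> Tree:
    """Recursively split the sampled sizes until each leaf is pure."""
    if not samples:
        raise ValueError("at least one training sample is required")
    samples = sorted(samples)
    if len({variant for _, variant in samples}) == 1:
        return Leaf(samples[0][1])
    best = None  # (misclassified count, split index)
    for i in range(1, len(samples)):
        if samples[i - 1][0] == samples[i][0]:
            continue  # cannot place a threshold between equal sizes
        err = _errors(samples[:i]) + _errors(samples[i:])
        if best is None or err < best[0]:
            best = (err, i)
    if best is None:  # conflicting labels for one size: use the majority
        majority = Counter(v for _, v in samples).most_common(1)[0][0]
        return Leaf(majority)
    i = best[1]
    threshold = (samples[i - 1][0] + samples[i][0]) // 2
    return Node(threshold, build_tree(samples[:i]), build_tree(samples[i:]))


def select(tree: Tree, size: int) -> str:
    """Walk the tree at call time; costs a few integer comparisons."""
    while isinstance(tree, Node):
        tree = tree.left if size <= tree.threshold else tree.right
    return tree.variant


# Hypothetical offline measurements: six sampled sizes out of a much larger
# space of possible problem sizes, each labeled with its fastest variant.
TRAINING: List[Sample] = [
    (10, "serial"), (1_000, "serial"),
    (10_000, "openmp"), (100_000, "openmp"),
    (1_000_000, "cuda"), (10_000_000, "cuda"),
]
TREE = build_tree(TRAINING)
```

A call such as `select(TREE, 50_000)` then returns the variant predicted for that size without any further measurement. The sketch also shows why sparse sampling can suffice: only the sizes near a change of the fastest variant influence where the thresholds end up.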


Keywords: Skeleton programming, GPU programming, implementation selection, adaptive offline learning, automated performance tuning





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Usman Dastgeer (1)
  • Lu Li (1)
  • Christoph Kessler (1)
  1. IDA, Linköping University, Linköping, Sweden
