International Conference on High Performance Computing for Computational Science

VECPAR 2012: High Performance Computing for Computational Science - VECPAR 2012 pp 329-345

Adaptive Off-Line Tuning for Optimized Composition of Components for Heterogeneous Many-Core Systems

  • Lu Li
  • Usman Dastgeer
  • Christoph Kessler
Conference paper

DOI: 10.1007/978-3-642-38718-0_32

Volume 7851 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Li L., Dastgeer U., Kessler C. (2013) Adaptive Off-Line Tuning for Optimized Composition of Components for Heterogeneous Many-Core Systems. In: Daydé M., Marques O., Nakajima K. (eds) High Performance Computing for Computational Science - VECPAR 2012. VECPAR 2012. Lecture Notes in Computer Science, vol 7851. Springer, Berlin, Heidelberg

Abstract

In recent years, heterogeneous multi-core systems have received much attention. However, performance optimization on these platforms remains a big challenge. Optimizations performed by compilers are often limited by the lack of dynamic information and of knowledge of the run-time environment, so applications are often not performance portable. One current approach is to provide multiple implementations of the same interface that can be used interchangeably depending on the call context, and to expose the composition choices to a compiler, a deployment-time composition tool and/or a run-time system. Off-line machine-learning techniques make it possible to improve the precision and reduce the overhead of run-time composition, which improves performance portability. In this work we extend the run-time composition mechanism in the PEPPHER composition tool with off-line composition and present an adaptive machine-learning algorithm for generating compact and efficient dispatch data structures with low training time. As the dispatch data structure we propose an adaptive decision tree, which implies an adaptive training algorithm that makes it possible to control the trade-off between training time, dispatch precision and run-time dispatch overhead.

We have evaluated our optimization strategy with simple kernels (matrix multiplication and sorting) as well as applications from the RODINIA benchmark on two GPU-based heterogeneous systems. On average, the precision of the composition choices reaches 83.6 percent with approximately 34 minutes of off-line training time.
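
The idea of an adaptive decision tree used as a dispatch structure can be illustrated with a minimal sketch. It is not the PEPPHER composition tool's implementation: the two cost functions are synthetic stand-ins for off-line timing measurements, the input is reduced to a single problem-size parameter, and all names (`train`, `dispatch`, `precision`, `Node`) are hypothetical. The sketch assumes that if the same variant wins at both end points of an interval, it wins throughout that interval; only intervals whose end points disagree are split further, up to a depth limit that bounds training effort and tree size.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Synthetic cost models (assumptions) standing in for measured run times.
def cpu_cost(n: int) -> float:
    return 1.0 * n            # no start-up overhead, slower per element

def gpu_cost(n: int) -> float:
    return 500.0 + 0.1 * n    # fixed launch/transfer overhead, faster per element

VARIANTS: Dict[str, Callable[[int], float]] = {"cpu": cpu_cost, "gpu": gpu_cost}

def best_variant(n: int) -> str:
    """Off-line 'measurement': name of the cheapest variant for size n."""
    return min(VARIANTS, key=lambda name: VARIANTS[name](n))

@dataclass
class Node:
    lo: int
    hi: int
    choice: Optional[str] = None      # set on leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def train(lo: int, hi: int, depth: int) -> Node:
    """Sample only the interval end points; refine where they disagree."""
    w_lo, w_hi = best_variant(lo), best_variant(hi)
    if w_lo == w_hi or depth == 0:
        # Leaf: assume the winner at `lo` holds for the whole interval.
        return Node(lo, hi, choice=w_lo)
    mid = (lo + hi) // 2
    return Node(lo, hi,
                left=train(lo, mid, depth - 1),
                right=train(mid + 1, hi, depth - 1))

def dispatch(node: Node, n: int) -> str:
    """Run-time lookup: walk from the root to the leaf covering n."""
    while node.choice is None:
        node = node.left if n <= node.left.hi else node.right
    return node.choice

def precision(tree: Node, lo: int, hi: int) -> float:
    """Fraction of sizes in [lo, hi] for which the tree picks the true winner."""
    hits = sum(dispatch(tree, n) == best_variant(n) for n in range(lo, hi + 1))
    return hits / (hi - lo + 1)

if __name__ == "__main__":
    for depth in (2, 6, 12):
        tree = train(1, 4096, depth)
        print(depth, precision(tree, 1, 4096))
```

Raising the depth limit in this sketch increases the number of sampled points and tree nodes, and can only narrow the interval in which the wrong variant is chosen, which mirrors the trade-off between training time, dispatch precision and dispatch overhead named in the abstract.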

Keywords

Autotuning, Heterogeneous architecture, GPU

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Lu Li (1)
  • Usman Dastgeer (1)
  • Christoph Kessler (1)
  1. PELAB, IDA, Linköping University, Linköping, Sweden