High Performance Computing for Computational Science - VECPAR 2012

Volume 7851 of the series Lecture Notes in Computer Science pp 329-345

Adaptive Off-Line Tuning for Optimized Composition of Components for Heterogeneous Many-Core Systems

  • Lu Li (PELAB, IDA, Linköping University)
  • Usman Dastgeer (PELAB, IDA, Linköping University)
  • Christoph Kessler (PELAB, IDA, Linköping University)


In recent years, heterogeneous multi-core systems have received much attention. However, performance optimization on these platforms remains a major challenge. Optimizations performed by compilers are often limited by a lack of dynamic information about the run-time environment, so applications are often not performance portable. One current approach is to provide multiple implementations of the same interface that can be used interchangeably depending on the call context, and to expose the composition choices to a compiler, a deployment-time composition tool, and/or a run-time system. Off-line machine-learning techniques make it possible to improve the precision and reduce the overhead of run-time composition, and thus to improve performance portability. In this work we extend the run-time composition mechanism in the PEPPHER composition tool with off-line composition and present an adaptive machine-learning algorithm for generating compact and efficient dispatch data structures with low training time. As the dispatch data structure we propose an adaptive decision tree, which implies an adaptive training algorithm that allows control over the trade-off between training time, dispatch precision, and run-time dispatch overhead.
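The paper's decision-tree dispatch idea can be illustrated with a minimal sketch. The code below is a hypothetical simplification, not the paper's actual algorithm: it trains a depth-bounded tree over one call-context feature (problem size) from off-line measurements of which implementation variant was fastest, then walks the tree at call time to select a variant. The `max_depth` parameter loosely mirrors the trade-off between training time, dispatch precision, and run-time dispatch overhead described in the abstract; the sample data and variant names are invented for illustration.

```python
# Hypothetical sketch of a dispatch decision tree over one call-context
# feature (problem size). Leaves store the variant measured fastest;
# internal nodes split on a size threshold.

class Leaf:
    def __init__(self, variant):
        self.variant = variant  # e.g. "cpu" or "gpu"

class Node:
    def __init__(self, threshold, left, right):
        self.threshold = threshold  # sizes < threshold go left
        self.left, self.right = left, right

def majority(samples):
    # Most frequent best-variant label among (size, variant) samples.
    variants = [v for _, v in samples]
    return max(set(variants), key=variants.count)

def build(samples, max_depth):
    # Stop splitting when all samples agree or the depth budget is spent;
    # a smaller max_depth gives a more compact tree with coarser dispatch.
    if max_depth == 0 or len({v for _, v in samples}) == 1:
        return Leaf(majority(samples))
    sizes = sorted(s for s, _ in samples)
    threshold = sizes[len(sizes) // 2]  # median split on problem size
    left = [s for s in samples if s[0] < threshold]
    right = [s for s in samples if s[0] >= threshold]
    if not left or not right:
        return Leaf(majority(samples))
    return Node(threshold, build(left, max_depth - 1), build(right, max_depth - 1))

def dispatch(tree, size):
    # Walk the tree at call time to select an implementation variant.
    while isinstance(tree, Node):
        tree = tree.left if size < tree.threshold else tree.right
    return tree.variant

# Invented off-line training data: (problem size, fastest variant) pairs.
training = [(64, "cpu"), (128, "cpu"), (256, "cpu"),
            (4096, "gpu"), (8192, "gpu"), (16384, "gpu")]
tree = build(training, max_depth=4)
print(dispatch(tree, 100))    # small problems stay on the CPU
print(dispatch(tree, 10000))  # large problems go to the GPU
```

The design point is that dispatch cost is a handful of threshold comparisons, so the trained structure can be consulted on every call with negligible overhead.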

We have evaluated our optimization strategy on simple kernels (matrix multiplication and sorting) as well as on applications from the RODINIA benchmark suite, on two GPU-based heterogeneous systems. On average, the precision of composition choices reaches 83.6 percent, with approximately 34 minutes of off-line training time.


Keywords: Autotuning, Heterogeneous architecture, GPU