International Journal of Parallel Programming, Volume 44, Issue 3, pp 506–530

Smart Containers and Skeleton Programming for GPU-Based Systems

DOI: 10.1007/s10766-015-0357-6

Cite this article as:
Dastgeer, U. & Kessler, C. Int J Parallel Prog (2016) 44: 506. doi:10.1007/s10766-015-0357-6

Abstract

In this paper, we discuss the role, design and implementation of smart containers in the SkePU skeleton library for GPU-based systems. These containers provide an interface similar to C++ STL containers but internally perform runtime optimization of data transfers and runtime memory management for their operand data on the different memory units. We discuss how these containers can help in achieving asynchronous execution for skeleton calls while providing implicit synchronization capabilities in a data-consistent manner. Furthermore, we discuss the limitations of the original, already optimizing memory management mechanism implemented in SkePU containers, and propose and implement a new mechanism that provides stronger data consistency and improves performance by reducing communication and memory allocations. With several applications, we show that our new mechanism can achieve significantly (up to 33.4 times) better performance than the initial mechanism for page-locked memory on a multi-GPU-based system.
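The general idea of a container that tracks where its valid copy lives, and transfers data only when a stale copy is about to be accessed, can be illustrated with a minimal C++ sketch. This is not SkePU's implementation or API: the class `SmartVector`, its methods, and the transfer counter are hypothetical names introduced for illustration, and the "device" is simulated by a second host buffer so that the sketch runs without a GPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration only: NOT SkePU's actual container or API.
// The "device" memory is simulated by a second host-side buffer.
class SmartVector {
public:
    explicit SmartVector(std::size_t n, float init = 0.0f)
        : host_(n, init), device_(n), host_valid_(true), device_valid_(false) {}

    std::size_t size() const { return host_.size(); }

    // Host read: copy back from the device only if the host copy is stale.
    float get(std::size_t i) {
        sync_to_host();
        return host_[i];
    }

    // Host write: afterwards the host copy is the only valid one.
    void set(std::size_t i, float v) {
        sync_to_host();
        host_[i] = v;
        device_valid_ = false;
    }

    // Stand-in for a skeleton call executed on the device (a map that scales).
    void device_scale(float factor) {
        sync_to_device();
        for (float& x : device_) x *= factor;
        host_valid_ = false;  // the result now lives only on the device
    }

    // Number of simulated host<->device transfers performed so far.
    int transfers() const { return transfers_; }

private:
    void sync_to_host() {
        if (!host_valid_) { host_ = device_; host_valid_ = true; ++transfers_; }
    }
    void sync_to_device() {
        if (!device_valid_) { device_ = host_; device_valid_ = true; ++transfers_; }
    }

    std::vector<float> host_, device_;
    bool host_valid_, device_valid_;
    int transfers_ = 0;
};
```

In this sketch, two consecutive `device_scale` calls cost a single upload, and the download happens only when the host first reads an element. A real implementation would additionally have to handle asynchronous execution, multiple devices, and partial copies, which this sketch leaves out.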

Keywords

SkePU, Smart containers, Skeleton programming, Memory management, Runtime optimizations, GPU-based systems

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. PELAB, Department of Computer and Information Science, Linköping University, Linköping, Sweden
