Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing
When sharing work among GPUs and CPU cores on GPU-equipped clusters, keeping the load balanced across these heterogeneous computing resources is a critical issue. We have been developing a run-time system that addresses this problem for the PGAS language XcalableMP-dev/StarPU. Through this development, we found that adaptive load balancing is necessary for GPU/CPU work sharing to achieve the best performance across various application codes.
In this paper, we enhance our language system XcalableMP-dev/StarPU with a new feature that dynamically controls the size of the tasks assigned to these heterogeneous resources during application execution. Performance evaluation on several benchmarks confirms that the proposed feature works correctly, and that heterogeneous work sharing delivers up to about 40% higher performance than GPU-only execution, even for relatively small problem sizes.
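The paper's actual control mechanism is not described in this abstract, but the general idea of dynamic task size control can be sketched as follows: measure each resource's throughput on the previous batch and split the next batch proportionally. All names and the proportional-split rule below are illustrative assumptions, not the implementation in XcalableMP-dev/StarPU:

```python
# Hedged sketch of adaptive GPU/CPU task size control (illustrative only):
# resize the next batch's GPU/CPU shares from measured per-resource throughput.
def rebalance(total_rows, gpu_time, gpu_rows, cpu_time, cpu_rows):
    """Split the next batch proportionally to observed rows-per-second rates."""
    gpu_rate = gpu_rows / gpu_time          # GPU throughput on the last batch
    cpu_rate = cpu_rows / cpu_time          # CPU throughput on the last batch
    gpu_share = gpu_rate / (gpu_rate + cpu_rate)
    gpu_next = round(total_rows * gpu_share)
    return gpu_next, total_rows - gpu_next  # (GPU task size, CPU task size)

# Example: the GPU did 800 rows in 1 s, the CPU 200 rows in 2 s,
# so the GPU receives 8/9 of the next 900-row batch.
print(rebalance(900, 1.0, 800, 2.0, 200))   # -> (800, 100)
```

In a real run-time system such a rule would be applied repeatedly, so the split converges as measured rates stabilize; a smoothing factor is commonly added to damp oscillation.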
Keywords: Load Balance, Runtime System, High Level Program, Dynamic Load Balance, Task Size