
Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing

  • Tetsuya Odajima
  • Taisuke Boku
  • Mitsuhisa Sato
  • Toshihiro Hanawa
  • Yuetsu Kodama
  • Raymond Namyst
  • Samuel Thibault
  • Olivier Aumage
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8286)

Abstract

For work sharing among GPUs and CPU cores on GPU-equipped clusters, keeping the load balanced across these heterogeneous computing resources is a critical issue. We have been developing a run-time system to address this problem on the PGAS language XcalableMP-dev/StarPU [1]. Through that development, we found that adaptive load balancing is necessary for GPU/CPU work sharing to achieve the best performance across a variety of application codes.

In this paper, we enhance our language system XcalableMP-dev/StarPU with a new feature that dynamically controls the size of the tasks assigned to these heterogeneous resources during application execution. Performance evaluation on several benchmarks confirms that the proposed feature works correctly and that heterogeneous work sharing delivers up to about 40% higher performance than GPU-only execution, even for relatively small problem sizes.
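
As a rough illustration of the idea summarized above, the following minimal C sketch re-splits an iteration range between a GPU worker and the CPU cores using the execution times observed in the previous step, so that both sides finish at about the same time. This is not the authors' XcalableMP-dev/StarPU implementation and does not use the StarPU API; the names (adapt_ratio, gpu_ratio) and the simulated device speeds are hypothetical, chosen only to show how a task-size split could be adjusted at run time.

    /* Sketch: adaptive GPU/CPU work split driven by measured step times.
     * All names and constants are illustrative, not from the paper. */
    #include <stdio.h>

    static double gpu_ratio = 0.5;   /* fraction of the range given to the GPU */

    /* Re-split so that, at the observed speeds, both sides finish together. */
    static void adapt_ratio(double t_gpu, double t_cpu)
    {
        double s_gpu = gpu_ratio / t_gpu;           /* work fraction per second */
        double s_cpu = (1.0 - gpu_ratio) / t_cpu;
        gpu_ratio = s_gpu / (s_gpu + s_cpu);
        if (gpu_ratio < 0.05) gpu_ratio = 0.05;     /* keep both sides busy */
        if (gpu_ratio > 0.95) gpu_ratio = 0.95;
    }

    int main(void)
    {
        const long n = 1L << 24;                    /* iterations per step */
        /* Simulated device speeds (iterations/second); a real run-time
         * would only observe the execution times, not these constants. */
        const double GPU_SPEED = 8.0e8, CPU_SPEED = 1.5e8;

        for (int step = 0; step < 4; step++) {
            long gpu_chunk = (long)(n * gpu_ratio);
            long cpu_chunk = n - gpu_chunk;
            printf("step %d: GPU %ld iters, CPUs %ld iters (ratio %.2f)\n",
                   step, gpu_chunk, cpu_chunk, gpu_ratio);

            double t_gpu = gpu_chunk / GPU_SPEED;   /* "measured" step times */
            double t_cpu = cpu_chunk / CPU_SPEED;
            adapt_ratio(t_gpu, t_cpu);              /* re-balance next step */
        }
        return 0;
    }

Starting from an even 50/50 split, the printed ratio converges toward the relative device speeds after the first step, which is the behaviour an adaptive task-size controller aims for.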

Keywords

Load Balance, Runtime System, High Level Program, Dynamic Load Balance, Task Size


References

  1. Odajima, T., Boku, T., Hanawa, T., Lee, J., Sato, M.: GPU/CPU Work Sharing with Parallel Language XcalableMP-dev for Parallelized Accelerated Computing. In: Sixth International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), pp. 97–106 (September 2012)
  2.
  3. Lee, J., MinhTuan, T., Odajima, T., Boku, T., Sato, M.: An Extension of XcalableMP PGAS Language for Multi-node GPU Clusters. In: HeteroPar 2011 (with Euro-Par 2011), pp. 429–439 (2011)
  4.
  5. Lee, J., Sato, M.: Implementation and Performance Evaluation of XcalableMP: A Parallel Programming Language for Distributed Memory Systems. In: Third International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), pp. 413–420 (September 2010)
  6. High Performance Fortran Version 2.0, http://www.hpfpc.org/jahpf/spec/hpf-v20-j10.pdf
  7. Texas Advanced Computing Center - GotoBlas2, http://www.tacc.utexas.edu/tacc-projects/gotoblas2
  8.
  9.
  10. Agullo, E., Augonnet, C., Dongarra, J., Ltaief, H., Namyst, R., Thibault, S., Tomov, S.: Faster, Cheaper, Better - a Hybridization Methodology to Develop Linear Algebra Software for GPUs. In: GPU Computing Gems, vol. 2 (September 2010)
  11. Augonnet, C., Thibault, S., Namyst, R.: StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines. Concurrency Computat.: Pract. Exper. (March 2010)

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Tetsuya Odajima (1)
  • Taisuke Boku (1, 2)
  • Mitsuhisa Sato (1, 2)
  • Toshihiro Hanawa (2)
  • Yuetsu Kodama (1, 2)
  • Raymond Namyst (3)
  • Samuel Thibault (3)
  • Olivier Aumage (3)
  1. Graduate School of Systems and Information Engineering, University of Tsukuba, Japan
  2. Center for Computational Sciences, University of Tsukuba, Japan
  3. University of Bordeaux - LaBRI - INRIA Bordeaux Sud-Ouest, France
