Identifying Optimization Opportunities Within Kernel Execution in GPU Codes

  • Robert Lim
  • Allen Malony
  • Boyana Norris
  • Nick Chaimov
Conference paper

DOI: 10.1007/978-3-319-27308-2_16

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9523)
Cite this paper as:
Lim R., Malony A., Norris B., Chaimov N. (2015) Identifying Optimization Opportunities Within Kernel Execution in GPU Codes. In: Hunold S. et al. (eds) Euro-Par 2015: Parallel Processing Workshops. Euro-Par 2015. Lecture Notes in Computer Science, vol 9523. Springer, Cham

Abstract

Tuning codes for GPGPU architectures is challenging because few performance tools can pinpoint the exact causes of execution bottlenecks. While profiling applications can reveal execution behavior on a particular architecture, the abundance of collected information can also overwhelm the user. Moreover, performance counters provide cumulative values but do not attribute events to code regions, which makes identifying performance hot spots difficult. This research focuses on characterizing the behavior of GPU application kernels and their performance at the node level by providing a visualization and metrics display that indicates the behavior of the application with respect to the underlying architecture. We demonstrate the effectiveness of our techniques with LAMMPS and LULESH application case studies on a variety of GPU architectures. By sampling instruction mixes for kernel execution runs, we reveal a variety of intrinsic program characteristics relating to computation, memory and control flow.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Robert Lim (1)
  • Allen Malony (1)
  • Boyana Norris (1)
  • Nick Chaimov (1)
  1. Performance Research Laboratory, High-Performance Computing Laboratory, University of Oregon, Eugene, USA
