Debugging CUDA Accelerated Parallel Applications with TotalView

  • Chris Gottbrath
  • Royd Lüdtke
Conference paper


CUDA introduces developers to a number of concepts (such as kernels, streams, warps, and explicit multi-level memory) beyond those they are used to in serial, parallel, and multi-threaded applications. Visibility into these elements is critical for troubleshooting and tuning applications that make use of CUDA. This paper highlights CUDA concepts implemented in CUDA 3.0–4.0, the complications they introduce for troubleshooting, and how TotalView helps the user deal with these new CUDA-specific constructs.


Keywords (added by machine, not by the authors): High Performance Computing; Global Memory; Host Processor; Cell Processor; Streaming Multiprocessor

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Rogue Wave Software, Boulder, USA
