Theoretical Parallel Computing Models for GPU Computing



The latest GPUs are designed for general-purpose computing and attract the attention of many application developers. The main purpose of this chapter is to introduce two theoretical parallel computing models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM), that capture the essence of CUDA-enabled GPUs. These models have three parameters: the number p of threads, the width w of the memory, and the memory access latency l. As examples of parallel algorithms on these theoretical models, we present fundamental algorithms for computing the sum and the prefix-sums of n numbers. We first show that the sum of n numbers can be computed in \(O( \frac{n} {w} + \frac{\mathit{nl}} {p} + l\log n)\) time units on the DMM and the UMM. We then show that \(\varOmega ( \frac{n} {w} + \frac{\mathit{nl}} {p} + l\log n)\) time units are necessary to compute the sum, so this algorithm is optimal. We also present a simple parallel algorithm for computing the prefix-sums that runs in \(O(\frac{n\log n} {w} + \frac{nl\log n} {p} + l\log n)\) time units on the DMM and the UMM; clearly, this algorithm is not optimal. We therefore present an optimal parallel algorithm that computes the prefix-sums of n numbers in \(O( \frac{n} {w} + \frac{\mathit{nl}} {p} + l\log n)\) time units on the DMM and the UMM. Finally, we show several experimental results on a GeForce Titan GPU.
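To make the structure of these bounds concrete, the following sketch simulates, in sequential Python, the round structure of two algorithms of the kind the abstract describes: a tree-based reduction for the sum (ceil(log2 n) rounds, which is where an l·log n latency term arises when every round pays the memory latency l) and a simple Hillis–Steele-style scan for the prefix-sums, whose log2 n rounds each touch up to n elements, matching the non-optimal O(n log n / w + nl log n / p + l log n) bound quoted above. This is an illustrative sketch of the general techniques, not the chapter's actual DMM/UMM algorithms; the function names are our own.

```python
def parallel_sum(values):
    """Tree-based reduction: each round halves the number of active
    elements, so ceil(log2 n) rounds suffice.  On a latency-l machine,
    each round costs at least l, giving an l*log n term."""
    a = list(values)
    n = len(a)
    rounds = 0
    stride = 1
    while stride < n:
        # On a GPU, all of these pair-sums would execute in parallel
        # within one round; here we loop over them sequentially.
        for i in range(0, n - stride, 2 * stride):
            a[i] += a[i + stride]
        stride *= 2
        rounds += 1
    return a[0], rounds


def simple_prefix_sums(values):
    """Hillis-Steele-style scan: log2 n rounds, each reading/writing up
    to n elements, i.e. O(n log n) total work -- the non-optimal
    prefix-sums algorithm's cost structure."""
    a = list(values)
    n = len(a)
    stride = 1
    while stride < n:
        # One parallel round: element i adds element i - stride, if any.
        a = [a[i] + (a[i - stride] if i >= stride else 0) for i in range(n)]
        stride *= 2
    return a


total, rounds = parallel_sum(range(1, 9))
print(total, rounds)                    # 36 3  (3 = log2(8) rounds)
print(simple_prefix_sums([1, 2, 3, 4])) # [1, 3, 6, 10]
```

The optimal prefix-sums algorithm mentioned in the abstract avoids the extra log n factor in the work terms; a common route (as in work-efficient scans) is to have each of the p threads first reduce a contiguous block sequentially, scan the p block sums, and then expand, keeping total work at O(n).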

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Hiroshima University, Higashi-Hiroshima, Japan