Abstract
The latest GPUs are designed for general-purpose computing and attract the attention of many application developers. The main purpose of this chapter is to introduce theoretical parallel computing models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM), which capture the essence of CUDA-enabled GPUs. These models have three parameters: the number p of threads, the width w of the memory, and the memory access latency l. As examples of parallel algorithms on these theoretical models, we show fundamental algorithms for computing the sum and the prefix-sums of n numbers. We first show that the sum of n numbers can be computed in \(O( \frac{n} {w} + \frac{nl} {p} + l\log n)\) time units on the DMM and the UMM. We then show that \(\varOmega ( \frac{n} {w} + \frac{nl} {p} + l\log n)\) time units are necessary to compute the sum, so this algorithm is optimal. We also present a simple parallel algorithm for computing the prefix-sums that runs in \(O(\frac{n\log n} {w} + \frac{nl\log n} {p} + l\log n)\) time units on the DMM and the UMM; clearly, this algorithm is not optimal. We therefore present an optimal parallel algorithm that computes the prefix-sums of n numbers in \(O( \frac{n} {w} + \frac{nl} {p} + l\log n)\) time units on the DMM and the UMM. Finally, we show several experimental results on a GeForce Titan GPU.
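To make the prefix-sums bound concrete, the following sketch simulates the classic work-efficient two-phase (up-sweep/down-sweep) scan; this is a standard illustration of how prefix-sums can be computed in O(log n) parallel rounds with O(n) total work, not the chapter's specific DMM/UMM algorithm. Each `while` iteration corresponds to one parallel round, and all iterations of the inner `for` loop are independent, so they could run as concurrent threads.

```python
def prefix_sums(a):
    """Inclusive prefix-sums of a (length assumed a power of two),
    computed in 2 log n rounds; each inner loop is one parallel round."""
    n = len(a)
    t = list(a)
    # Up-sweep: build partial sums in a binary-tree pattern.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):  # independent updates
            t[i] += t[i - d]
        d *= 2
    # Down-sweep: distribute the partial sums so that
    # t[i] = a[0] + a[1] + ... + a[i] for every i.
    d = n // 2
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):  # independent updates
            t[i] += t[i - d]
        d //= 2
    return t
```

On the memory machine models, the cost of each round additionally depends on how the p threads' accesses align with the memory width w and latency l, which is exactly what the O(n/w + nl/p + l log n) bound accounts for.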
© 2014 Springer International Publishing Switzerland
Cite this chapter
Nakano, K. (2014). Theoretical Parallel Computing Models for GPU Computing. In: Koç, Ç. (eds) Open Problems in Mathematics and Computational Science. Springer, Cham. https://doi.org/10.1007/978-3-319-10683-0_14
DOI: https://doi.org/10.1007/978-3-319-10683-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10682-3
Online ISBN: 978-3-319-10683-0