Theoretical Parallel Computing Models for GPU Computing

Abstract

The latest GPUs are designed for general-purpose computing and attract the attention of many application developers. The main purpose of this chapter is to introduce theoretical parallel computing models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM), that capture the essence of CUDA-enabled GPUs. These models have three parameters: the number p of threads, the width w of the memory, and the memory access latency l. As examples of parallel algorithms on these models, we present fundamental algorithms for computing the sum and the prefix-sums of n numbers. We first show that the sum of n numbers can be computed in \(O(\frac{n}{w} + \frac{nl}{p} + l\log n)\) time units on the DMM and the UMM. We then show that \(\varOmega(\frac{n}{w} + \frac{nl}{p} + l\log n)\) time units are necessary to compute the sum, so this algorithm is optimal. We also present a simple parallel algorithm for computing the prefix-sums that runs in \(O(\frac{n\log n}{w} + \frac{nl\log n}{p} + l\log n)\) time units on the DMM and the UMM; clearly, this algorithm is not optimal. We then present an optimal parallel algorithm that computes the prefix-sums of n numbers in \(O(\frac{n}{w} + \frac{nl}{p} + l\log n)\) time units on the DMM and the UMM. Finally, we show several experimental results on a GeForce Titan GPU.
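The sum algorithm referred to in the abstract is the standard balanced-tree reduction, and the simple (non-optimal) prefix-sums algorithm corresponds to the well-known Hillis–Steele scan, which performs about log n rounds in which every element adds the value a power-of-two distance to its left; the extra factor of log n in its bound comes from doing n additions in each of the log n rounds. The following is a minimal sequential sketch of both ideas (Python is used purely for illustration; the chapter's algorithms are defined and analyzed on the DMM/UMM models, not on a real GPU, and the function names here are this sketch's own):

```python
def tree_sum(a):
    """Balanced-tree reduction: about log2(n) rounds of pairwise additions.
    On the DMM/UMM, all additions within one round run in parallel."""
    a = list(a)
    n = len(a)
    step = 1
    while step < n:
        # Combine elements that are `step` apart; halves the active count.
        for i in range(0, n - step, 2 * step):
            a[i] += a[i + step]
        step *= 2
    return a[0]

def hillis_steele_scan(a):
    """Simple (non-optimal) inclusive prefix-sums: in each round, every
    element adds the value `step` positions to its left.  log2(n) rounds
    of n additions each give the n*log(n) work term in the time bound."""
    a = list(a)
    n = len(a)
    step = 1
    while step < n:
        a = [a[i] + (a[i - step] if i >= step else 0) for i in range(n)]
        step *= 2
    return a
```

For example, `tree_sum([1, 2, 3, 4, 5, 6, 7, 8])` returns 36, and `hillis_steele_scan([1, 1, 1, 1, 1, 1, 1, 1])` returns [1, 2, 3, 4, 5, 6, 7, 8]. The optimal prefix-sums algorithm of the chapter avoids the extra log n work factor while keeping the l log n latency term.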



Correspondence to Koji Nakano.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Nakano, K. (2014). Theoretical Parallel Computing Models for GPU Computing. In: Koç, Ç. (ed.) Open Problems in Mathematics and Computational Science. Springer, Cham. https://doi.org/10.1007/978-3-319-10683-0_14

  • DOI: https://doi.org/10.1007/978-3-319-10683-0_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10682-3

  • Online ISBN: 978-3-319-10683-0

  • eBook Packages: Computer Science (R0)
