
Abstract

This chapter provides a brief overview of the evolution of the graphics processor, covering its hardware architecture, the CUDA abstraction, and the parallel programming paradigm. The important characteristics of this computing platform are considered, including its massive number of cores, SIMT execution model, high memory bandwidth, concurrent engines, and dynamic parallelism. In addition, multi-threading programming techniques for CPUs are discussed.
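As a minimal sketch of the CPU multi-threading techniques the abstract mentions (OpenMP-style loop parallelism), the hypothetical function below distributes a vector addition across threads with a single pragma. The function name `vec_add` is illustrative, not from the chapter; when compiled without OpenMP support the pragma is ignored and the loop simply runs serially, producing the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative example (not from the chapter): element-wise vector
   addition c[i] = a[i] + b[i], parallelized across CPU threads with
   OpenMP. Each loop iteration is independent, so the runtime may
   assign disjoint index ranges to different threads. */
static void vec_add(const double *a, const double *b, double *c, size_t n)
{
    /* The pragma requests that iterations be divided among threads.
       A compiler without OpenMP ignores it, leaving a correct serial loop. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i)  /* signed index for broad OpenMP compatibility */
        c[i] = a[i] + b[i];
}
```

Because every iteration writes a distinct element of `c`, no synchronization is needed; this independence is the same property that makes such loops well suited to the GPU's SIMT execution model.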




Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Dinavahi, V., Lin, N. (2022). Many-Core Processors. In: Parallel Dynamic and Transient Simulation of Large-Scale Power Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-86782-9_1


  • DOI: https://doi.org/10.1007/978-3-030-86782-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86781-2

  • Online ISBN: 978-3-030-86782-9

  • eBook Packages: Energy (R0)
