Abstract
This chapter provides a brief overview of the evolution of the graphics processor, its hardware architecture, the CUDA abstraction, and the parallel programming paradigm. It considers the key characteristics of this computing system: massive numbers of cores, SIMT (single-instruction, multiple-thread) execution, memory bandwidth, the CUDA abstraction, concurrent engines, and dynamic parallelism. Multi-threading programming techniques for CPUs are also discussed.
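The abstract only names CPU multi-threading techniques. The following is a generic, minimal sketch of one common pattern, not code from the chapter: a data-parallel reduction in standard C++ that splits an array into contiguous chunks and sums each chunk on its own `std::thread`. The function name `parallel_sum` and the chunking scheme are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper for illustration; not taken from the chapter.
// Splits `data` into contiguous chunks, sums each chunk on its own thread,
// then combines the per-thread partial sums on the calling thread.
double parallel_sum(const std::vector<double>& data, unsigned num_threads) {
    assert(num_threads > 0 && "need at least one worker thread");
    const std::size_t n = data.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceiling division

    // Each thread writes only to its own slot, so no locking is required.
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin, data.begin() + end, 0.0);
        });
    }
    for (auto& w : workers) w.join();  // wait for every worker to finish

    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

For example, calling `parallel_sum(v, 4)` on a vector holding the values 1 through 1000 returns 500500. Giving each thread a private slot in `partial` avoids locks entirely; the only serial step is combining the partial sums after `join()`.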
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Dinavahi, V., Lin, N. (2022). Many-Core Processors. In: Parallel Dynamic and Transient Simulation of Large-Scale Power Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-86782-9_1
DOI: https://doi.org/10.1007/978-3-030-86782-9_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86781-2
Online ISBN: 978-3-030-86782-9
eBook Packages: Energy, Energy (R0)