Abstract
In this paper, we use graphics processing units (GPUs) to accelerate sparse and arbitrarily structured neural networks. Sparse networks contain nodes that are not fully connected to the nodes in preceding and following layers, and arbitrarily structured neural networks have a different number of nodes in each layer. Sparse neural networks with arbitrary structures typically arise from processes such as neural network pruning and evolutionary machine learning strategies. We show that we can gain significant speedup for the full activation of such neural networks using GPUs. We first run a preprocessing step that determines dependency groups for all the nodes in a network, and we use that information to guide the progression of activation through the network. We then compute the activation of each node in its own GPU thread, which allows for massive parallelization. We implement our approach with the CUDA framework and compare the results of sequential and GPU implementations. Our results show that the activation of sparse neural networks lends itself very well to GPU acceleration and can help speed up machine learning strategies that generate such networks, as well as other processes with similar structure.
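To make the two steps described above concrete, the following is a minimal CUDA sketch of the approach as it can be reconstructed from the abstract alone. The CSR-style incoming-edge arrays, the sigmoid activation, the block size of 256, and all identifiers (computeGroups, activateGroup, edgeStart, srcNode, and so on) are hypothetical choices for illustration, not the authors' actual code.

// Hypothetical sketch of dependency-grouped activation; the data layout and
// all names are illustrative assumptions, not the paper's implementation.
#include <cuda_runtime.h>
#include <math.h>

// Preprocessing (host, sequential): assign each node a dependency group equal
// to one plus the maximum group of its predecessors, so every input of a node
// lies in a strictly earlier group. Assumes nodes are indexed in topological
// order; input nodes with no incoming edges land in group 0.
void computeGroups(int numNodes, const int *edgeStart, const int *srcNode,
                   int *group) {
    for (int i = 0; i < numNodes; ++i) {
        int g = 0;
        for (int e = edgeStart[i]; e < edgeStart[i + 1]; ++e)
            if (group[srcNode[e]] + 1 > g)
                g = group[srcNode[e]] + 1;
        group[i] = g;
    }
}

// One thread per node of the current group. Node i's incoming edges occupy
// srcNode[edgeStart[i] .. edgeStart[i+1]) with matching weights in weight[].
__global__ void activateGroup(const int *groupNodes, int groupSize,
                              const int *edgeStart, const int *srcNode,
                              const float *weight, const float *bias,
                              float *activation) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= groupSize) return;
    int node = groupNodes[t];
    float sum = bias[node];
    for (int e = edgeStart[node]; e < edgeStart[node + 1]; ++e)
        sum += weight[e] * activation[srcNode[e]];   // inputs already computed
    activation[node] = 1.0f / (1.0f + expf(-sum));   // sigmoid
}

// Full activation (host): groups run in ascending order; all nodes within a
// group activate concurrently. d_groupNodes[g] is a device array holding the
// node indices of group g.
void activateNetwork(int numGroups, int *const *d_groupNodes,
                     const int *groupSizes, const int *d_edgeStart,
                     const int *d_srcNode, const float *d_weight,
                     const float *d_bias, float *d_activation) {
    for (int g = 0; g < numGroups; ++g) {
        int blocks = (groupSizes[g] + 255) / 256;
        activateGroup<<<blocks, 256>>>(d_groupNodes[g], groupSizes[g],
                                       d_edgeStart, d_srcNode, d_weight,
                                       d_bias, d_activation);
        // Launches on the default stream already serialize; the explicit sync
        // just makes the group-by-group ordering visible here.
        cudaDeviceSynchronize();
    }
}

The only sequential constraint in this sketch is between groups: by construction no node depends on another node in its own group, so each thread can read its predecessors' activations and write its own without synchronization.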
Acknowledgements
This material is based in part upon work supported by the National Science Foundation under grant number IIA-1301726. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Gajurel, A., Louis, S.J., Wu, R., Barford, L., Harris, F.C. (2021). GPU Acceleration of Sparse Neural Networks. In: Latifi, S. (eds) ITNG 2021 18th International Conference on Information Technology-New Generations. Advances in Intelligent Systems and Computing, vol 1346. Springer, Cham. https://doi.org/10.1007/978-3-030-70416-2_41
DOI: https://doi.org/10.1007/978-3-030-70416-2_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70415-5
Online ISBN: 978-3-030-70416-2
eBook Packages: Engineering (R0)