Abstract
Data parallelism and model parallelism are regarded as the two major parallelism strategies for deep neural networks (DNNs). However, both achieve acceleration mainly through coarse-grained parallelization at the level of the whole network model, so neither can fully exploit the parallelism available in network models and in many-core systems such as GPUs. In this work, we propose a novel fine-grained parallelism strategy based on layer-wise parallelization, named FiLayer, which comprises inter-layer parallelism and intra-layer parallelism. The former allows several adjacent layers in a network model to be processed in a pipelined manner. The latter divides the operations in one layer into several parts and processes them in parallel. Both forms of fine-grained parallelism are realized with CUDA streams. FiLayer is implemented by extending Caffe and evaluated on several typical datasets. The experimental results indicate that FiLayer helps Caffe achieve speedups of \(1.58\times\)–\(2.19\times\).
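The two forms of layer-wise parallelism described in the abstract can be illustrated with a minimal sketch. This is not FiLayer's implementation, which per the abstract uses CUDA streams inside Caffe. Here plain Python threads stand in for streams, and the toy layers and all function names (`dense`, `relu`, `intra_layer`, `inter_layer`) are hypothetical. Because of Python's global interpreter lock the sketch shows only the structure of the two schemes, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from threading import Thread


def dense(rows, weights):
    """Toy layer: dot each input row with every weight column."""
    return [[sum(x * w for x, w in zip(row, col)) for col in weights]
            for row in rows]


def relu(rows):
    """Toy layer: element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in rows]


def intra_layer(rows, weights, parts=2):
    """Intra-layer split: divide one layer's work into parts, run them concurrently."""
    size = max(1, (len(rows) + parts - 1) // parts)
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # map() yields results in submission order, so rows stay aligned.
        results = list(pool.map(lambda c: dense(c, weights), chunks))
    return [row for chunk in results for row in chunk]


def inter_layer(micro_batches, weights):
    """Inter-layer pipeline: relu handles chunk k while dense produces chunk k+1."""
    handoff = Queue()
    out = []

    def stage1():
        for mb in micro_batches:
            handoff.put(dense(mb, weights))
        handoff.put(None)  # sentinel: no more chunks

    def stage2():
        while True:
            item = handoff.get()
            if item is None:
                break
            out.extend(relu(item))

    threads = [Thread(target=stage1), Thread(target=stage2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Both functions are meant to return exactly what the sequential computation would return; only the scheduling differs. In a GPU setting, each chunk or pipeline stage would instead be issued on its own stream, with synchronization taking the place of the queue hand-off.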
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant No. 61672250.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Jiang, W., Zhang, Y., Liu, P., Ye, G., Jin, H. (2018). FiLayer: A Novel Fine-Grained Layer-Wise Parallelism Strategy for Deep Neural Networks. In: Kůrková, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds) Artificial Neural Networks and Machine Learning – ICANN 2018. ICANN 2018. Lecture Notes in Computer Science, vol 11141. Springer, Cham. https://doi.org/10.1007/978-3-030-01424-7_32
DOI: https://doi.org/10.1007/978-3-030-01424-7_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01423-0
Online ISBN: 978-3-030-01424-7
eBook Packages: Computer Science, Computer Science (R0)