Abstract
Deep learning has been widely applied across domains, especially big data analysis, but the computation it requires keeps growing in scale and complexity. To accelerate the training of large-scale deep networks, various distributed parallel training protocols have been proposed. In this paper, we design a novel asynchronous training protocol, Weighted Asynchronous Parallel (WASP), to update neural network parameters more effectively. The core of WASP is "gradient staleness", a metric based on parameter version numbers that is used to weight gradients and reduce the influence of stale parameters. Moreover, by periodically forcing a synchronization of parameters, WASP combines the advantages of synchronous and asynchronous training models and speeds up training while maintaining a rapid convergence rate. We evaluate WASP with two classical convolutional neural networks, LeNet-5 and ResNet-101, on the Tianhe-2 supercomputer, and the results show that WASP achieves much higher speedup than existing asynchronous parallel training protocols.
This research is partially supported by the National Key Research and Development Program of China (Nos. 2016YFB0200404 and 2018YFB0203803), the National Natural Science Foundation of China (No. U1711263), the MOE-CMCC Joint Research Fund of China (No. MCM20160104), and the Program of Science and Technology of Guangdong (No. 2015B010111001).
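The abstract outlines WASP's two mechanisms: weighting each gradient by a staleness value derived from parameter version numbers, and periodically forcing all workers to resynchronize. The single-process Python sketch below illustrates how such a scheme could work on a toy least-squares problem. It is only a minimal sketch under stated assumptions: the 1/(staleness + 1) decay, the synchronization period K = 8, and all names are illustrative, since the abstract does not give WASP's exact formulas.

```python
# Minimal single-process sketch of staleness-weighted asynchronous SGD.
# The 1/(s + 1) weighting and sync period K are assumptions for illustration;
# the WASP paper's actual rules may differ.
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize ||Xw - y||^2 over w.
X = rng.normal(size=(256, 10))
w_true = rng.normal(size=10)
y = X @ w_true

w = np.zeros(10)            # global parameters held by the "server"
version = 0                 # incremented on every parameter update
lr = 0.01                   # learning rate (illustrative)
K = 8                       # forced-synchronization period (assumed)
NUM_WORKERS = 4

# Each simulated worker caches a (possibly stale) parameter copy
# together with the version number it last pulled.
worker_params = [w.copy() for _ in range(NUM_WORKERS)]
worker_version = [0] * NUM_WORKERS

def gradient(params, idx):
    """Mini-batch gradient of the mean squared error."""
    xb, yb = X[idx], y[idx]
    return 2.0 * xb.T @ (xb @ params - yb) / len(idx)

for step in range(2000):
    k = int(rng.integers(NUM_WORKERS))        # an arbitrary worker finishes next
    idx = rng.choice(len(X), size=32, replace=False)
    g = gradient(worker_params[k], idx)       # computed on the worker's stale copy

    staleness = version - worker_version[k]   # parameter-version-based staleness
    weight = 1.0 / (staleness + 1)            # assumed decay; down-weights stale gradients
    w -= lr * weight * g                      # server applies the weighted gradient
    version += 1

    worker_params[k] = w.copy()               # worker pulls fresh parameters
    worker_version[k] = version

    # Periodic forced synchronization: every K updates, all workers refresh,
    # bounding how stale any cached copy can become.
    if version % K == 0:
        for j in range(NUM_WORKERS):
            worker_params[j] = w.copy()
            worker_version[j] = version

print("parameter error:", np.linalg.norm(w - w_true))
```

In a real deployment the weighting would be applied by the parameter server as gradients arrive from concurrent workers, and the forced synchronization would be a barrier across machines rather than the in-process loop shown here.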
Cite this paper
Ye, Y., Chen, M., Yan, Z., Wu, W., Xiao, N. (2018). More Effective Distributed Deep Learning Using Staleness Based Parameter Updating. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol. 11335. Springer, Cham. https://doi.org/10.1007/978-3-030-05054-2_32