Distributed Deep Learning on Heterogeneous Computing Resources Using Gossip Communication
With the increased usage of deep neural networks, their structures have naturally evolved, growing in size and complexity. With currently used networks often containing millions of parameters and hundreds of layers, there have been many attempts to leverage the capabilities of various high-performance computing architectures. Most approaches focus either on using parameter servers or a fixed communication network, or on exploiting particular capabilities of specific computational resources. However, few experiments have been conducted under relaxed communication-consistency requirements with a dynamic, adaptive way of exchanging information.
Gossip communication is a peer-to-peer communication approach that can minimize the overall data traffic between computational agents by providing a weaker guarantee on data consistency: eventual consistency. In this paper, we present a framework for gossip-based communication, suitable for heterogeneous computing resources, and apply it to the problem of parallel deep learning using artificial neural networks. We present different approaches to gossip-based communication in a heterogeneous computing environment consisting of CPUs and MIC-based co-processors, and implement gossiping via both shared and distributed memory. We also provide a simple approach to load balancing in a heterogeneous computing environment that proves efficient for the case of parallel deep neural network training.
Further, we explore several approaches to communication exchange and resource allocation for parallel deep learning on heterogeneous computing resources, and evaluate their effect on the convergence of the distributed neural network.
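To illustrate the core idea behind gossip-based parameter exchange, the following sketch shows randomly paired workers averaging their local model parameters each round, so that all replicas drift toward the global average without any central parameter server. This is a minimal illustration of pairwise gossip averaging, not the paper's implementation; all names (`Worker`, `gossip_round`) are hypothetical, and real workers would interleave these rounds with local SGD steps on their own data shards.

```python
# Minimal sketch of pairwise gossip averaging of model parameters.
# Hypothetical names; not the framework described in the paper.
import random
import numpy as np

class Worker:
    """Holds a flat parameter vector; in real training this would be
    the weights of a local neural-network replica."""
    def __init__(self, params):
        self.params = np.asarray(params, dtype=float)

    def local_step(self, grad, lr=0.1):
        # One local SGD update on this worker's replica.
        self.params -= lr * np.asarray(grad, dtype=float)

def gossip_round(workers):
    """One gossip round: shuffle workers into random pairs and have each
    pair average their parameters (push-pull gossip). Repeated rounds
    drive all replicas toward the global average (eventual consistency)."""
    order = random.sample(workers, len(workers))
    for a, b in zip(order[::2], order[1::2]):
        avg = (a.params + b.params) / 2.0
        a.params = avg.copy()
        b.params = avg.copy()

if __name__ == "__main__":
    random.seed(0)
    # Eight workers with parameters 0..7; the global average is 3.5.
    workers = [Worker([float(i)]) for i in range(8)]
    for _ in range(20):
        gossip_round(workers)
    print([round(w.params[0], 3) for w in workers])  # all near 3.5
```

Because each pairwise exchange preserves the sum of the two participants' parameters, the global average is invariant across rounds, while the spread between replicas shrinks, which is the eventual-consistency behavior the gossip approach relies on.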
Keywords: Deep learning · Gossip communication · Heterogeneous high-performance computing
This work was supported by the Bulgarian Ministry of Education and Science under the National Research Programme “Environmental Protection and Reduction of Risks of Adverse Events and Natural Disasters”, approved by the RCM #577/17.08.2018 and signed Agreement DO-#230/06.12.2018, as well as by the “Program for Support of Young Scientists and PhD Students at the Bulgarian Academy of Sciences - 2017” under the project DFNP-17-91.