Abstract
We introduce PyDTNN, a framework for training deep neural networks on clusters of computers with the following appealing properties: (1) it is developed in Python, exposing a friendly interface that offers an accessible entry point for newcomers; (2) it is extensible, providing a customizable tool for more advanced deep learning users; (3) it covers the main functionality found in convolutional neural networks; and (4) it delivers reasonable inter-node parallel performance by exploiting data parallelism, leveraging MPI via MPI4Py for communication and NumPy for the efficient execution of (multithreaded) numerical kernels.
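To make the data-parallel scheme concrete, the sketch below shows how per-rank gradients can be averaged with an MPI allreduce using MPI4Py and NumPy. This is an illustrative sketch of the general pattern, not PyDTNN's actual API; the layer size, gradient values, and learning rate are invented for the example.

```python
# Minimal sketch of data-parallel training with MPI4Py + NumPy.
# Illustrative only: the names and sizes below are not PyDTNN's API.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()

# Stand-in for the gradient a rank computes on its local batch shard.
local_grad = np.random.rand(1024).astype(np.float32)

# Sum gradients across all ranks in place, then average, so every
# rank ends up holding the same global gradient.
comm.Allreduce(MPI.IN_PLACE, local_grad, op=MPI.SUM)
local_grad /= size

# Each rank applies the identical SGD update, keeping replicas in sync.
weights = np.zeros(1024, dtype=np.float32)
weights -= 0.01 * local_grad
```

Run with, for example, `mpirun -np 4 python train_sketch.py`; each MPI rank processes its own shard of the mini-batch and all ranks converge on the same weights.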
Notes
The source code for PyDTNN is available at https://github.com/hpca-uji/PyDTNN.
Acknowledgements
This work was supported by Project TIN2017-82972-R from the Spanish Ministerio de Ciencia, Innovación y Universidades. M. F. Dolz was supported by project CDEIGENT/2018/014 from the Generalitat Valenciana.
Cite this article
Barrachina, S., Castelló, A., Catalán, M. et al. PyDTNN: A user-friendly and extensible framework for distributed deep learning. J Supercomput 77, 9971–9987 (2021). https://doi.org/10.1007/s11227-021-03673-z