Abstract
Training deep neural networks on large amounts of data is resource-intensive, and such problems often cannot be solved on a single computing device in reasonable time. Distributed computing systems can be applied to deep learning problems; such systems may consist of heterogeneous computing nodes with different computing power. To implement deep learning on a distributed heterogeneous system, all available resources must be utilized, which requires configuring the task delivery subsystem of the distributed system. To expand the number of participating computing nodes, virtualization must be used. This article examines two types of virtualization for grid systems applied to deep learning problems, describes the implementation of computational applications that train deep neural networks for image classification, presents the results of distributed deep learning on a public grid system, and gives a comparative analysis of the two virtualization approaches.
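The coordination pattern the abstract refers to — a server distributing training tasks to heterogeneous worker nodes and combining their results each round — can be illustrated with a minimal sketch of synchronous data-parallel gradient averaging. This is a generic illustration, not the paper's implementation: the linear model, the function names (`local_gradient`, `train_step`), and the use of NumPy are all assumptions for the sake of a self-contained example.

```python
import numpy as np

def local_gradient(w, X, y):
    """Mean-squared-error gradient computed on one worker's data shard."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)

def train_step(w, shards, lr=0.1):
    """One synchronous round: every simulated node reports its gradient,
    the coordinator averages them and applies a single update."""
    grads = [local_gradient(w, X, y) for X, y in shards]
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ true_w

# Split the dataset across 4 simulated heterogeneous nodes.
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))

w = np.zeros(2)
for _ in range(200):
    w = train_step(w, shards)
```

In a real desktop grid the "shards" live on volunteer machines and results arrive over the network with varying latency, which is why task delivery and scheduling (the subject of the paper) matter far more than in this idealized loop.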
Acknowledgements
This work was funded by the Russian Science Foundation (grant № 22-11-00317).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Kurochkin, I., Papanov, V. (2023). Using Virtualization Approaches to Solve Deep Learning Problems in Voluntary Distributed Computing Projects. In: Voevodin, V., Sobolev, S., Yakobovskiy, M., Shagaliev, R. (eds) Supercomputing. RuSCDays 2023. Lecture Notes in Computer Science, vol 14389. Springer, Cham. https://doi.org/10.1007/978-3-031-49435-2_6
DOI: https://doi.org/10.1007/978-3-031-49435-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49434-5
Online ISBN: 978-3-031-49435-2
eBook Packages: Computer Science, Computer Science (R0)