Distributed Training of Large-Scale Deep Learning Models in Commodity Hardware

  • Conference paper
  • First Online:
  • Proceedings: Inventive Systems and Control

Abstract

Running deep learning models on a single computer is often resource-intensive and time-consuming. Deep learning models require high-performance GPUs to train on big data, and even with such GPUs, training on large datasets can take days or even months. This paper presents an affordable solution for training models within a reasonable time. We propose a system for distributing large-scale deep learning models across commodity hardware. Our approach builds distributed computing clusters from open-source software alone, delivering performance comparable to high-performance computing (HPC) clusters even in the absence of GPUs. Hadoop clusters are created by connecting servers over an SSH network, which interconnects the machines and enables continuous data transfer between them. We then set up Apache Spark on the Hadoop cluster and run BigDL on top of Spark; BigDL is a high-performance distributed deep learning library for Spark that scales to massive datasets, lets us run large deep learning models locally from a Jupyter Notebook, and simplifies cluster computing and resource management. This environment delivers computation up to 70% faster than single-machine execution, with the option to scale for model training, data throughput, hyperparameter search, and resource utilization.
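To make the training pipeline concrete, the sketch below shows how a small convolutional network could be trained with BigDL's Python API on a running Spark/Hadoop cluster of the kind described above. It is a minimal illustration rather than the authors' actual code: the module paths follow the classic BigDL 0.x Python API (they differ in newer releases such as bigdl-dllib), and the synthetic RDD of random samples stands in for real training data read from HDFS.

    # Minimal sketch (not the authors' code): train a LeNet-style CNN with
    # BigDL on an existing Spark/Hadoop cluster. Module paths follow the
    # classic BigDL 0.x Python API and may differ in newer releases such as
    # bigdl-dllib. The random RDD below is a stand-in for data read from HDFS.
    import numpy as np
    from pyspark import SparkContext
    from bigdl.util.common import create_spark_conf, init_engine, Sample
    from bigdl.nn.layer import (Sequential, Reshape, SpatialConvolution,
                                SpatialMaxPooling, Tanh, Linear, LogSoftMax)
    from bigdl.nn.criterion import ClassNLLCriterion
    from bigdl.optim.optimizer import Optimizer, SGD, MaxEpoch

    sc = SparkContext(appName="bigdl-commodity-cluster",
                      conf=create_spark_conf())
    init_engine()  # initialise the BigDL engine on the driver and executors

    # Synthetic 28x28 grayscale samples; ClassNLLCriterion uses 1-based labels.
    train_rdd = sc.parallelize(range(2048)).map(
        lambda _: Sample.from_ndarray(
            np.random.rand(28, 28).astype("float32"),
            np.array([float(np.random.randint(1, 11))])))

    model = Sequential()
    model.add(Reshape([1, 28, 28]))
    model.add(SpatialConvolution(1, 6, 5, 5))   # 28x28 -> 24x24
    model.add(Tanh())
    model.add(SpatialMaxPooling(2, 2, 2, 2))    # 24x24 -> 12x12
    model.add(SpatialConvolution(6, 12, 5, 5))  # 12x12 -> 8x8
    model.add(Tanh())
    model.add(SpatialMaxPooling(2, 2, 2, 2))    # 8x8 -> 4x4
    model.add(Reshape([12 * 4 * 4]))
    model.add(Linear(12 * 4 * 4, 100))
    model.add(Tanh())
    model.add(Linear(100, 10))
    model.add(LogSoftMax())

    # Distributed synchronous mini-batch SGD across the Spark executors;
    # batch_size must be divisible by the total number of executor cores.
    optimizer = Optimizer(model=model,
                          training_rdd=train_rdd,
                          criterion=ClassNLLCriterion(),
                          optim_method=SGD(learningrate=0.01),
                          end_trigger=MaxEpoch(5),
                          batch_size=256)
    trained_model = optimizer.optimize()

In practice, such a script is launched through spark-submit (or a Jupyter kernel configured the same way) with the BigDL jar and Python package distributed to the workers, so that every node of the SSH-connected Hadoop cluster participates in the distributed training.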


Author information

Correspondence to Md. Motaharul Islam.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ahmad, J., Navin, T.E., Al Awsaf, F., Arafat, M.Y., Hossain, M.S., Islam, M.M. (2023). Distributed Training of Large-Scale Deep Learning Models in Commodity Hardware. In: Suma, V., Lorenz, P., Baig, Z. (eds) Inventive Systems and Control. Lecture Notes in Networks and Systems, vol 672. Springer, Singapore. https://doi.org/10.1007/978-981-99-1624-5_52
