Abstract
Gradient descent (GD) lies at the heart of neural network training, and the development of GD optimization algorithms has significantly accelerated the advancement of deep learning. Much deep learning research builds on GD methods, and some projects have attempted to combine multiple training approaches to improve network performance; however, these combinations remain largely empirical and lack theoretical guidance. This paper develops a framework for combining various GD optimization methods by analyzing different learning rates and several adaptive methods. The aim is to show how multiple GD optimization methods can be applied in stages within a single training run, exploring a deep learning model through a multistage mixing of GD optimization techniques. The approach is motivated by the principles of SGDR (stochastic gradient descent with warm restarts), learning-rate warm-up, and CLR (cyclical learning rates). Training experiments on a large deep neural network validate the efficiency of the technique. The experiments were implemented in Python on Google Colab.
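The schedule families named in the abstract can be illustrated with a short sketch: a linear warm-up phase followed by SGDR-style cosine annealing with warm restarts. All hyperparameter values below (`base_lr`, `warmup_steps`, `cycle_steps`, `min_lr`) are illustrative assumptions, not values taken from the paper.

```python
import math

def warmup_cosine_restart_lr(step, base_lr=0.1, warmup_steps=500,
                             cycle_steps=2000, min_lr=1e-4):
    """Learning rate for a given training step: linear warm-up, then
    cosine annealing with warm restarts in the style of SGDR
    (Loshchilov & Hutter, 2016). Hyperparameters are illustrative."""
    if step < warmup_steps:
        # Linear warm-up from near 0 to base_lr, as popularized for
        # large-minibatch training (Goyal et al., 2017).
        return base_lr * (step + 1) / warmup_steps
    # Position within the current cosine cycle; the schedule "warm
    # restarts" back to base_lr every cycle_steps steps after warm-up.
    t = (step - warmup_steps) % cycle_steps
    cos_decay = 0.5 * (1 + math.cos(math.pi * t / cycle_steps))
    return min_lr + (base_lr - min_lr) * cos_decay
```

In a multistage combination of optimizers, a schedule like this would be queried once per step and fed to whichever GD variant is active in the current stage; a triangular CLR schedule can be substituted for the cosine term without changing the surrounding loop.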
Availability of Data and Materials
The data used in this paper are available from the corresponding author upon reasonable request.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Singha, A.K., Zubair, S. Combination of Optimization Methods in a Multistage Approach for a Deep Neural Network Model. Int. j. inf. tecnol. 16, 1855–1861 (2024). https://doi.org/10.1007/s41870-023-01568-1