Abstract
The backpropagation neural network (BPNN) method, an important algorithm in machine learning, has been applied to a wide range of real-world problems such as pattern recognition, optimization, approximation, classification, and data clustering. The BPNN algorithm has been widely used in age estimation, pedestrian gender classification, traffic sign recognition, character recognition, water pollution forecasting models, heart disease classification, breast cancer detection [1], remote sensing, and image classification [2]. The algorithm uses the steepest descent (gradient) method and suffers from limitations such as convergence to local minima and a slow convergence velocity of learning. This research proposes a solution to the slow convergence velocity by implementing learning rate annealing, which anneals the learning rate (declines it as training progresses) rather than keeping it constant throughout training. The problem of local minima can be addressed by momentum, which is calculated by adding a fraction of the past weight update to the current weight update.
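The two techniques the abstract describes can be sketched as follows. This is a minimal NumPy illustration (not the authors' implementation): the learning rate is annealed with an assumed inverse-time-decay schedule lr_t = lr0 / (1 + decay * t), and momentum retains a fraction `beta` of the past weight update. All hyperparameter names and values here are illustrative assumptions.

```python
import numpy as np

def train_sgd(X, y, epochs=500, lr0=0.2, decay=0.01, beta=0.9):
    """Gradient descent on a linear least-squares model, illustrating
    learning rate annealing and momentum (hyperparameters are assumed).

    lr0   : initial learning rate
    decay : annealing rate; lr_t = lr0 / (1 + decay * t)
    beta  : momentum coefficient (fraction of the past update retained)
    """
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])
    velocity = np.zeros_like(w)  # accumulated past weight updates
    for t in range(epochs):
        lr = lr0 / (1.0 + decay * t)             # annealed (time-decayed) rate
        grad = 2 * X.T @ (X @ w - y) / len(y)    # MSE gradient
        velocity = beta * velocity - lr * grad   # momentum: keep a fraction of
        w = w + velocity                         # the previous update
    return w

# Toy usage: recover w_true = [2.0, -3.0] from noiseless synthetic data
X = np.random.default_rng(1).normal(size=(100, 2))
y = X @ np.array([2.0, -3.0])
w = train_sgd(X, y)
```

Decaying the rate lets early iterations take large steps while late iterations settle near a minimum, and the momentum term smooths updates across iterations, which is the mechanism the abstract credits with escaping shallow local minima.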
References
Keskar NS, Mudigere D, Nocedal J, Smelyanskiy M, Tang PTP (2016) On large-batch training for deep learning: generalization gap and sharp minima. arXiv preprint arXiv:1609.04836
He T, Zhang Z, Zhang H, Zhang Z, Xie J, Li M (2019) Bag of tricks for image classification with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 558–567
Mutasem KSA, Khairuddin O, Shahrul AN (2009) Back propagation algorithm: the best algorithm among the multi-layer perceptron algorithm. Int J Comput Sci Netw Secur 9(4):378–383
Shamsuddin SM et al (2009) Study of cost functions in three term backpropagation for classification problems. In: World congress on nature & biologically inspired computing, NaBIC 2009. IEEE
Sharda R, Delen D (2006) Predicting box-office success of motion pictures with neural networks. Expert Syst Appl 30:243–254
Whitney TM, Meany RK (2006) Two algorithms related to the method of steepest descent. SIAM
Olson RS, Moore JH (2019) TPOT: a tree-based pipeline optimization tool for automating machine learning. In: Hutter F, Kotthoff L, Vanschoren J (eds) Automated machine learning. The springer series on challenges in machine learning. Springer, Cham
Majdi A, Beiki M (2010) Evolving neural network using genetic algorithm for predicting the deformation modulus of rock masses. Int J Rock Mech Min Sci 47(2):246–253
Bista R, Thapa A (2020) Handbook of wireless sensor networks: issues and challenges in current scenario’s, advances in intelligent systems and computing, vol 1132. Springer, Cham, Switzerland, pp 239–259
Wang X (2008) Method of steepest descent and its applications. IEEE Microwave Wirel Compon Lett
Burse K, Manoria M, Kirar VPS (2010) Improved back propagation algorithm to avoid local minima in multiplicative neuron model. World Acad Sci Eng Technol Int J Electr Comput Eng 4(12)
Jastrzębski S, Kenton Z, Arpit D, Ballas N, Fischer A, Bengio Y, Storkey A (2017) Three factors influencing minima in SGD. arXiv preprint arXiv:1711.04623
Vora K, Yagnik S (2014) A new technique to solve local minima problem with large number of hidden nodes on feed forward neural network. IJEDR 2(2). ISSN: 2321-9939
Im DJ, Tao M, Branson K (2016) An empirical analysis of deep network loss surfaces. arXiv preprint arXiv:1612.04010
Ng SC, Leung SH, Luk A (1999) Fast convergent generalized back propagation algorithm with constant learning rate. Neural Process Lett 9:13–23
Liu H, Simonyan K, Yang Y (2019) DARTS: differentiable architecture search. In: International conference on learning representations (ICLR)
Singh PK, Bhargava BK, Paprzycki M, Kaushal NC, Hong WC (2020) Handbook of wireless sensor networks: issues and challenges in current scenario’s, advances in intelligent systems and computing, vol 1132. Springer, Cham, Switzerland, pp 155–437
Singh PK, Kar AK, Singh Y, Kolekar MH, Tanwar S (2020) Proceedings of ICRIC 2019, recent innovations in computing, 2020, vol 597. Lecture notes in electrical engineering. Springer, Cham, Switzerland, pp 3–920
Yan H, Jiang Y, Zheng J, Peng C, Li Q (2006) A multilayer perceptron-based medical decision support system for heart disease diagnosis. Expert Syst Appl 30(2):272–281
Zou H, Xia G, Yang F, Yang H (2007) A neural network model based on the multi-stage optimization approach for short-term food price forecasting in China. Expert Syst Appl 33(2):347–356
He Y, Lin J, Liu Z, Wang H, Li L-J, Han S (2018) AMC: AutoML for model compression and acceleration on mobile devices. In: Proceedings of the European conference on computer vision (ECCV), pp 784–800
Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861
Singh P, Paprzycki M, Bhargava B, Chhabra J, Kaushal N, Kumar Y (2018) Futuristic trends in network and communication technologies, FTNCT 2018. Communications in computer and information science, vol 958, pp 3–509
Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2017) Pruning filters for efficient ConvNets. In: International conference on learning representations (ICLR)
Ratner A, Bach SH, Ehrenberg HS et al (2020) Rapid training data creation with weak supervision. VLDB J 29:709–730
Nashed MZ (1970) Steepest descent for singular linear operator equations. SIAM J Numer Anal 7(3):358–362
McInerney JM, Haines KG, Biafore S, Hecht-Nielsen R (1992) Can backpropagation error surfaces have non-global minima? In: Proceedings of the IEEE-IJCNN 8911-627
Gori M, Tesi A (1992) On the problem of local minima in backpropagation. IEEE Trans Pattern Anal Mach Intell 14(1):76–86
Chang LY (2005) Analysis of freeway accident frequencies: negative binomial regression versus artificial neural network. Saf Sci 43:541–557
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4700–4708
Zhang NM, Wu W, Zheng GF (2006) Deterministic convergence of gradient method with momentum for two-layer feed forward neural networks. IEEE Trans Neural Netw 17(2):522–525
Nikolopoulos K, Goodwin P, Patelis A, Assimakopoulos V (2007) Forecasting with cue information: a comparison of multiple regression with alternative forecasting approaches. Eur J Oper Res 180(1):354–368
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Trivedi, U.B., Mishra, P. (2021). Improving Steepest Descent Method by Learning Rate Annealing and Momentum in Neural Network. In: Singh, P.K., Noor, A., Kolekar, M.H., Tanwar, S., Bhatnagar, R.K., Khanna, S. (eds) Evolving Technologies for Computing, Communication and Smart World. Lecture Notes in Electrical Engineering, vol 694. Springer, Singapore. https://doi.org/10.1007/978-981-15-7804-5_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-7803-8
Online ISBN: 978-981-15-7804-5