Abstract
The backpropagation neural network (BPNN) method, an important algorithm in machine learning, has been applied to a wide range of real-world problems such as pattern recognition, optimization, approximation, classification, and data clustering. The BPNN algorithm has been widely used in age estimation, pedestrian gender classification, traffic sign recognition, character recognition, water pollution forecasting models, heart disease classification, breast cancer detection [1], remote sensing, and image classification [2]. The algorithm uses the steepest descent (gradient) method and suffers from limitations such as convergence to local minima and a slow convergence velocity of learning. This research proposes a solution to the slow convergence velocity by implementing learning rate annealing, which anneals the learning rate (declines it as training progresses) rather than keeping it constant throughout training. The problem of local minima can be addressed by momentum, which is calculated by adding a fraction of the past weight update to the current weight update.
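The two techniques the abstract describes can be sketched as follows. This is a minimal NumPy illustration (not the authors' implementation): the learning rate is annealed with an assumed inverse-time-decay schedule lr_t = lr0 / (1 + decay * t), and momentum retains a fraction `beta` of the past weight update. All hyperparameter names and values here are illustrative assumptions.

```python
import numpy as np

def train_sgd(X, y, epochs=500, lr0=0.2, decay=0.01, beta=0.9):
    """Gradient descent on a linear least-squares model, illustrating
    learning rate annealing and momentum (hyperparameters are assumed).

    lr0   : initial learning rate
    decay : annealing rate; lr_t = lr0 / (1 + decay * t)
    beta  : momentum coefficient (fraction of the past update retained)
    """
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])
    velocity = np.zeros_like(w)  # accumulated past weight updates
    for t in range(epochs):
        lr = lr0 / (1.0 + decay * t)             # annealed (time-decayed) rate
        grad = 2 * X.T @ (X @ w - y) / len(y)    # MSE gradient
        velocity = beta * velocity - lr * grad   # momentum: keep a fraction of
        w = w + velocity                         # the previous update
    return w

# Toy usage: recover w_true = [2.0, -3.0] from noiseless synthetic data
X = np.random.default_rng(1).normal(size=(100, 2))
y = X @ np.array([2.0, -3.0])
w = train_sgd(X, y)
```

Decaying the rate lets early iterations take large steps while late iterations settle near a minimum, and the momentum term smooths updates across iterations, which is the mechanism the abstract credits with escaping shallow local minima.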
References
Keskar NS, Mudigere D, Nocedal J, Smelyanskiy M, Tang PTP (2016) On large-batch training for deep learning: generalization gap and sharp minima. arXiv preprint arXiv:1609.04836
He T, Zhang Z, Zhang H, Zhang Z, Xie J, Li M (2019) Bag of tricks for image classification with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 558–567
Mutasem KSA, Khairuddin O, Shahrul AN (2009) Back propagation algorithm: the best algorithm among the multi-layer perceptron algorithm. Int J Comput Sci Netw Secur 9(4):378–383
Shamsuddin SM et al (2009) Study of cost functions in three term backpropagation for classification problems. In: World congress on nature & biologically inspired computing, NaBIC 2009. IEEE
Sharda R, Delen D (2006) Predicting box-office success of motion pictures with neural networks. Expert Syst Appl 30:243–254
Whitney TM, Meany RK (2006) Two algorithms related to the method of steepest descent. SIAM
Olson RS, Moore JH (2019) TPOT: a tree-based pipeline optimization tool for automating machine learning. In: Hutter F, Kotthoff L, Vanschoren J (eds) Automated machine learning. The springer series on challenges in machine learning. Springer, Cham
Majdi A, Beiki M (2010) Evolving neural network using genetic algorithm for predicting the deformation modulus of rock masses. Int J Rock Mech Min Sci 47(2):246–253
Bista R, Thapa A (2020) Handbook of wireless sensor networks: issues and challenges in current scenario’s, advances in intelligent systems and computing, vol 1132. Springer, Cham, Switzerland, pp 239–259
Wang X (2008) Method of steepest descent and its applications. IEEE Microwave Wirel Compon Lett
Burse K, Manoria M, Kirar VPS (2010) Improved back propagation algorithm to avoid local minima in multiplicative neuron model. World Acad Sci Eng Technol Int J Electr Comput Eng 4(12)
Jastrzębski S, Kenton Z, Arpit D, Ballas N, Fischer A, Bengio Y, Storkey A (2017) Three factors influencing minima in SGD. arXiv preprint arXiv:1711.04623
Vora K, Yagnik S (2014) A new technique to solve local minima problem with large number of hidden nodes on feed forward neural network. IJEDR 2(2). ISSN: 2321-9939
Im DJ, Tao M, Branson K (2016) An empirical analysis of deep network loss surfaces. arXiv preprint arXiv:1612.04010
Ng SC, Leung SH, Luk A (1999) Fast convergent generalized back propagation algorithm with constant learning rate. Neural Process Lett 9:13–23
Liu H, Simonyan K, Yang Y (2019) DARTS: differentiable architecture search. In: International conference on learning representations (ICLR)
Singh PK, Bhargava BK, Paprzycki M, Kaushal NC, Hong WC (2020) Handbook of wireless sensor networks: issues and challenges in current scenario’s, advances in intelligent systems and computing, vol 1132. Springer, Cham, Switzerland, pp 155–437
Singh PK, Kar AK, Singh Y, Kolekar MH, Tanwar S (2020) Proceedings of ICRIC 2019, recent innovations in computing, 2020, vol 597. Lecture notes in electrical engineering. Springer, Cham, Switzerland, pp 3–920
Yan H, Jiang Y, Zheng J, Peng C, Li Q (2006) A multilayer perceptron-based medical decision support system for heart disease diagnosis. Expert Syst Appl 30(2):272–281
Zou H, Xia G, Yang F, Yang H (2007) A neural network model based on the multi-stage optimization approach for short-term food price forecasting in China. Expert Syst Appl 33(2):347–356
He Y, Lin J, Liu Z, Wang H, Li L-J, Han S (2018) AMC: AutoML for model compression and acceleration on mobile devices. In: Proceedings of the European conference on computer vision (ECCV), pp 784–800
Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861
Singh P, Paprzycki M, Bhargava B, Chhabra J, Kaushal N, Kumar Y (2018) Futuristic trends in network and communication technologies, FTNCT 2018. Communications in computer and information science, vol 958, pp 3–509
Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2017) Pruning filters for efficient ConvNets. In: International conference on learning representations (ICLR)
Ratner A, Bach SH, Ehrenberg HS et al (2020) Rapid training data creation with weak supervision. VLDB J 29:709–730
Nashed MZ (1970) Steepest descent for singular linear operator equations. SIAM J Numer Anal 7(3):358–362
McInerney JM, Haines KG, Biafore S, Hecht-Nielsen R (1992) Can backpropagation error surfaces have non-global minima? In: Proceedings of the IEEE-IJCNN 8911-627
Gori M, Tesi A (1992) On the problem of local minima in backpropagation. IEEE Trans Pattern Anal Mach Intell 14(1):76–86
Chang LY (2005) Analysis of freeway accident frequencies: negative binomial regression versus artificial neural network. Saf Sci 43:541–557
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4700–4708
Zhang NM, Wu W, Zheng GF (2006) Deterministic convergence of gradient method with momentum for two-layer feed forward neural networks. IEEE Trans Neural Netw 17(2):522–525
Nikolopoulos K, Goodwin P, Patelis A, Assimakopoulos V (2007) Forecasting with cue information: a comparison of multiple regression with alternative forecasting approaches. Eur J Oper Res 180(1):354–368
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Trivedi, U.B., Mishra, P. (2021). Improving Steepest Descent Method by Learning Rate Annealing and Momentum in Neural Network. In: Singh, P.K., Noor, A., Kolekar, M.H., Tanwar, S., Bhatnagar, R.K., Khanna, S. (eds) Evolving Technologies for Computing, Communication and Smart World. Lecture Notes in Electrical Engineering, vol 694. Springer, Singapore. https://doi.org/10.1007/978-981-15-7804-5_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-7803-8
Online ISBN: 978-981-15-7804-5