
Improving Steepest Descent Method by Learning Rate Annealing and Momentum in Neural Network

  • Conference paper
  • First Online:
Evolving Technologies for Computing, Communication and Smart World

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 694))


Abstract

The backpropagation neural network (BPNN) method, an important algorithm in machine learning, has been applied to a wide range of real-world problems such as pattern recognition, optimization, approximation, classification, and data clustering. The BPNN algorithm has been widely used in age estimation, pedestrian gender classification, traffic sign recognition, character recognition, water pollution forecasting, heart disease classification, breast cancer detection (Keskar et al. in On large-batch training for deep learning: Generalization gap and sharp minima, 2016) [1], remote sensing, and image classification (He et al. in Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 558–567, 2019) [2]. The algorithm uses the steepest descent (gradient) method and suffers from limitations such as convergence to local minima and slow convergence of learning. This research proposes a solution to the slow convergence by implementing learning rate annealing, in which the learning rate is annealed (declines as training progresses) rather than held constant throughout training. The problem of local minima can be addressed by momentum, which adds a fraction of the past weight update to the calculation of the current weight update.
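The two modifications described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the time-based decay schedule `eta0 / (1 + decay * t)`, and the toy quadratic loss are illustrative assumptions. It shows (a) learning rate annealing, where the step size decays over iterations instead of staying constant, and (b) momentum, where a fraction `beta` of the previous weight update is carried into the current one.

```python
import numpy as np

def train(w0, grad_fn, eta0=0.5, decay=0.01, beta=0.9, steps=200):
    """Steepest descent with annealed learning rate and momentum (sketch)."""
    w = np.asarray(w0, dtype=float)
    velocity = np.zeros_like(w)              # accumulated past weight updates
    for t in range(steps):
        eta = eta0 / (1.0 + decay * t)       # annealed (time-decayed) learning rate
        g = grad_fn(w)                        # gradient of the loss at w
        velocity = beta * velocity - eta * g  # momentum: add fraction of past update
        w = w + velocity
    return w

# Toy loss L(w) = 0.5 * ||w - target||^2, whose gradient is (w - target).
target = np.array([3.0, -2.0])
w_final = train([0.0, 0.0], lambda w: w - target)
```

On this convex toy problem both runs converge; the claimed benefits (faster convergence from annealing, escaping local minima via momentum) matter on the non-convex loss surfaces of multilayer networks discussed in the paper.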


References

  1. Keskar NS, Mudigere D, Nocedal J, Smelyanskiy M, Tang PTP (2016) On large-batch training for deep learning: generalization gap and sharp minima. arXiv preprint arXiv:1609.04836

  2. He T, Zhang Z, Zhang H, Zhang Z, Xie J, Li M (2019) Bag of tricks for image classification with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 558–567

  3. Mutasem KSA, Khairuddin O, Shahrul AN (2009) Back propagation algorithm: the best algorithm among the multi-layer perceptron algorithm. Int J Comput Sci Netw Secur 9(4):378–383

  4. Shamsuddin SM et al (2009) Study of cost functions in three term backpropagation for classification problems. In: World congress on nature & biologically inspired computing, NaBIC 2009. IEEE

  5. Sharda R, Delen D (2006) Predicting box-office success of motion pictures with neural networks. Expert Syst Appl 30:243–254

  6. Whitney TM, Meany RK (2006) Two algorithms related to the method of steepest descent. SIAM

  7. Olson RS, Moore JH (2019) TPOT: a tree-based pipeline optimization tool for automating machine learning. In: Hutter F, Kotthoff L, Vanschoren J (eds) Automated machine learning. The Springer series on challenges in machine learning. Springer, Cham

  8. .

  9. Majdi A, Beiki M (2010) Evolving neural network using genetic algorithm for predicting the deformation modulus of rock masses. Int J Rock Mech Min Sci 47(2):246–253

  10. Bista R, Thapa A (2020) Handbook of wireless sensor networks: issues and challenges in current scenario's, advances in intelligent systems and computing, vol 1132. Springer, Cham, Switzerland, pp 239–259

  11. Wang X (2008) Method of steepest descent and its applications. IEEE Microwave Wirel Compon Lett

  12. Burse K, Manoria M, Kirar VPS (2010) Improved back propagation algorithm to avoid local minima in multiplicative neuron model. World Acad Sci Eng Technol Int J Electr Comput Eng 4(12)

  13. Jastrzębski S, Kenton Z, Arpit D, Ballas N, Fischer A, Bengio Y, Storkey A (2017) Three factors influencing minima in SGD. arXiv preprint arXiv:1711.04623

  14. Vora K, Yagnik S (2014) A new technique to solve local minima problem with large number of hidden nodes on feed forward neural network. IJEDR 2(2). ISSN: 2321-9939

  15. Im DJ, Tao M, Branson K (2016) An empirical analysis of deep network loss surfaces. arXiv preprint arXiv:1612.04010

  16. Ng SC, Leung SH, Luk A (1999) Fast convergent generalized back propagation algorithm with constant learning rate. Neural Process Lett 9:13–23

  17. Liu H, Simonyan K, Yang Y (2019) DARTS: differentiable architecture search. In: International conference on learning representations (ICLR)

  18. Singh PK, Bhargava BK, Paprzycki M, Kaushal NC, Hong WC (2020) Handbook of wireless sensor networks: issues and challenges in current scenario's, advances in intelligent systems and computing, vol 1132. Springer, Cham, Switzerland, pp 155–437

  19. Singh PK, Kar AK, Singh Y, Kolekar MH, Tanwar S (2020) Proceedings of ICRIC 2019, recent innovations in computing, 2020, vol 597. Lecture notes in electrical engineering. Springer, Cham, Switzerland, pp 3–920

  20. Yan H, Jiang Y, Zheng J, Peng C, Li Q (2006) A multilayer perceptron-based medical decision support system for heart disease diagnosis. Expert Syst Appl 30(2):272–281

  21. Zou H, Xia G, Yang F, Yang H (2007) A neural network model based on the multi-stage optimization approach for short-term food price forecasting in China. Expert Syst Appl 33(2):347–356

  22. He Y, Lin J, Liu Z, Wang H, Li L-J, Han S (2018) AMC: AutoML for model compression and acceleration on mobile devices. In: Proceedings of the European conference on computer vision (ECCV), pp 784–800

  23. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861

  24. Singh P, Paprzycki M, Bhargava B, Chhabra J, Kaushal N, Kumar Y (2018) Futuristic trends in network and communication technologies, FTNCT 2018. Communications in computer and information science, vol 958, pp 3–509

  25. Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2017) Pruning filters for efficient ConvNets. In: International conference on learning representations (ICLR)

  26. Ratner A, Bach SH, Ehrenberg HS et al (2020) Rapid training data creation with weak supervision. VLDB J 29:709–730

  27. Nashed MZ (1970) Steepest descent for singular linear operator equations. SIAM J Numer Anal 7(3):358–362

  28. Mcinerney JM, Haines KG, Biafore S, Hecht-Nielsen R (1992) Can backpropagation error surfaces have non-global minima? In: Proceedings of the IEEE-IJCNN 8911-627

  29. Gori M, Tesi A (1992) On the problem of local minima in backpropagation. IEEE Trans Pattern Anal Mach Intell 14(1):76–86

  30. Chang LY (2005) Analysis of freeway accident frequencies: negative binomial regression versus artificial neural network. Saf Sci 43:541–557

  31. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4700–4708

  32. Zhang NM, Wu W, Zheng GF (2006) Deterministic convergence of gradient method with momentum for two-layer feed forward neural networks. IEEE Trans Neural Netw 17(2):522–525

  33. Nikolopoulos K, Goodwin P, Patelis A, Assimakopoulos V (2007) Forecasting with cue information: a comparison of multiple regression with alternative forecasting approaches. Eur J Oper Res 180(1):354–368


Author information


Corresponding author

Correspondence to Udai Bhan Trivedi.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Trivedi, U.B., Mishra, P. (2021). Improving Steepest Descent Method by Learning Rate Annealing and Momentum in Neural Network. In: Singh, P.K., Noor, A., Kolekar, M.H., Tanwar, S., Bhatnagar, R.K., Khanna, S. (eds) Evolving Technologies for Computing, Communication and Smart World. Lecture Notes in Electrical Engineering, vol 694. Springer, Singapore. https://doi.org/10.1007/978-981-15-7804-5_14

Download citation

  • DOI: https://doi.org/10.1007/978-981-15-7804-5_14

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-7803-8

  • Online ISBN: 978-981-15-7804-5

  • eBook Packages: Computer Science; Computer Science (R0)
