Abstract
Stochastic optimizers such as Momentum, AdaGrad, and AdaDelta require hyper-parameters to be chosen, and in many cases these hyper-parameters are tuned tediously by experience, making the process more of an art than a science. We introduce AdaSmooth, a novel per-dimension learning rate method for stochastic gradient optimization. The method is insensitive to its hyper-parameters and therefore, unlike Momentum, AdaGrad, and AdaDelta, requires no manual tuning. We show promising results compared to other methods on different convolutional neural networks, multi-layer perceptrons, and other machine learning tasks. Empirical results demonstrate that AdaSmooth works well in practice and compares favourably to other stochastic optimization methods for neural networks.
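The paper's title indicates the method is driven by an "effective ratio" in the spirit of Kaufman's efficiency ratio from the technical-trading literature. As a rough, illustrative sketch only — the function name, the endpoint constants `fast`/`slow`, and the bookkeeping below are our assumptions, not the authors' exact AdaSmooth update rule — a per-dimension step driven by such a ratio might look like:

```python
import numpy as np

def adasmooth_like_step(x, grad, state, lr=0.2, eps=1e-6,
                        fast=0.5, slow=0.01):
    """One illustrative per-dimension adaptive step (a sketch, not the
    authors' exact AdaSmooth rule).

    state holds: 'x_prev' (parameters at the start of the window),
    'path' (accumulated absolute parameter movement since then), and
    'E_g2' (running average of squared gradients, per dimension).
    """
    # Kaufman-style efficiency ratio per dimension:
    # |net displacement| / (total path length); lies in [0, 1).
    er = np.abs(x - state['x_prev']) / (state['path'] + eps)
    # Map the ratio to a smoothing constant between a slow and a fast
    # decay; the endpoint values `fast` and `slow` are our assumptions.
    c = (er * (fast - slow) + slow) ** 2
    # Per-dimension exponential moving average of squared gradients,
    # with a data-dependent smoothing constant: steady progress (high
    # ratio) adapts quickly, erratic progress adapts slowly.
    state['E_g2'] = c * grad ** 2 + (1.0 - c) * state['E_g2']
    step = lr * grad / (np.sqrt(state['E_g2']) + eps)
    state['path'] = state['path'] + np.abs(step)
    return x - step, state
```

On a simple quadratic, initializing `E_g2` from the first squared gradient and iterating this update drives the parameters toward the minimum, with the effective ratio steering how fast the squared-gradient average adapts in each dimension.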
Notes
- 1.
The Census Income data set contains 48,842 samples in total; 70% of them are used as the training set in our case: https://archive.ics.uci.edu/ml/datasets/Census+Income.
- 2.
The MNIST data set has a training set of 60,000 examples, and a test set of 10,000 examples.
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Lu, J. (2023). AdaSmooth: An Adaptive Learning Rate Method Based on Effective Ratio. In: Shakya, S., Du, KL., Ntalianis, K. (eds) Sentiment Analysis and Deep Learning. Advances in Intelligent Systems and Computing, vol 1432. Springer, Singapore. https://doi.org/10.1007/978-981-19-5443-6_21
DOI: https://doi.org/10.1007/978-981-19-5443-6_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-5442-9
Online ISBN: 978-981-19-5443-6
eBook Packages: Intelligent Technologies and Robotics (R0)