Abstract
Deep learning has achieved strong results in image recognition, and the optimizer plays a key role in training a deep learning network. In this work, dynamical system models of optimizers are established, and the influence of parameter adjustments on the dynamic performance of the resulting systems is analyzed; this provides a useful supplement to control-theoretic models of optimizers. First, the system control model is derived from the iterative formula of the optimizer: the optimizer is expressed by differential equations, and its control equation is established. Second, based on this system control model, the phase trajectories of the optimizer and the influence of different hyperparameters on the system performance of the learning model are analyzed. Finally, controllers built from different optimizers and hyperparameters are used to classify the MNIST and CIFAR-10 datasets to verify the effect of each optimizer on learning performance and to compare with related methods. Experimental results show that selecting an appropriate optimizer accelerates model convergence and improves recognition accuracy. Furthermore, the convergence speed and performance of the stochastic gradient descent (SGD) optimizer are better than those of the stochastic gradient descent with momentum (SGD-M) and Nesterov accelerated gradient (NAG) optimizers.
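For context, the modeling step described above starts from the standard discrete update rules of the three optimizers and passes to their continuous-time limits. The following is a minimal sketch in generic notation, with $f$ the loss, $\eta$ the learning rate, $\mu$ the momentum coefficient, and $a$, $b$, $c(t)$ damping/scaling coefficients determined by $\eta$, $\mu$, and the step interval; it illustrates the kind of differential-equation model being analyzed, not the paper's exact state-space derivation.

% Sketch: discrete updates (left) and commonly used continuous-time analogues (right).
% Coefficients a, b, c(t) are placeholders fixed by the learning rate, momentum, and step interval.
\begin{align*}
\text{SGD:}\quad & \theta_{k+1} = \theta_k - \eta\,\nabla f(\theta_k),
  & & \dot{\theta}(t) = -\nabla f(\theta(t)); \\
\text{SGD-M:}\quad & v_{k+1} = \mu v_k - \eta\,\nabla f(\theta_k),\ \ \theta_{k+1} = \theta_k + v_{k+1},
  & & \ddot{\theta}(t) + a\,\dot{\theta}(t) + b\,\nabla f(\theta(t)) = 0; \\
\text{NAG:}\quad & v_{k+1} = \mu v_k - \eta\,\nabla f(\theta_k + \mu v_k),\ \ \theta_{k+1} = \theta_k + v_{k+1},
  & & \ddot{\theta}(t) + c(t)\,\dot{\theta}(t) + \nabla f(\theta(t)) = 0.
\end{align*}

Viewed this way, changing $\eta$ and $\mu$ changes the damping of a second-order system, which is what the phase-trajectory analysis of hyperparameter effects visualizes.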
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61972160) and the Natural Science Foundation of Guangdong Province (Grant No. 2021A1515012301).
Cite this article
Yao, B., Li, G. & Wu, W. State space representation and phase analysis of gradient descent optimizers. Sci. China Inf. Sci. 66, 142102 (2023). https://doi.org/10.1007/s11432-022-3539-8