Abstract
Convolutional neural networks (CNNs) have driven much of the recent progress in machine learning. However, most CNN architectures are designed by hand, a process that is empirical, time-consuming, and opaque. In this paper, we aim to offer better insight into CNN models from the perspective of optimization theory. We propose a unified framework, referred to as Newton design, for understanding and designing CNN architectures with the family of Newton's methods. Specifically, we observe that the standard feedforward CNN model (PlainNet) solves an optimization problem via a kind of quasi-Newton method. Interestingly, the residual network (ResNet) can also be derived if a more general quasi-Newton method is used to solve the same problem. Building on these observations, we solve the problem with a stronger method, the Newton-conjugate-gradient (Newton-CG) method, which inspires Newton-CGNet. In the network design, we translate binary-valued terms in the optimization schemes into dropout layers, so dropout modules appear naturally at specific locations in the derived CNN structures, rather than being an empirical training strategy. Extensive experiments on image classification and text categorization tasks verify that Newton-CGNets perform very competitively. In particular, Newton-CGNets surpass their ResNet counterparts by over 4% on CIFAR-10 and over 10% on CIFAR-100.
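To make the correspondence in the abstract concrete, the following is a minimal sketch; the objective $f$, the iterate $x_k$, and the mappings $H_k$, $\mathcal{F}_k$, and $\mathcal{G}_k$ are generic placeholders rather than the paper's exact formulation. A quasi-Newton update takes the form
\[
x_{k+1} = x_k - H_k \nabla f(x_k), \qquad H_k \approx \big[\nabla^2 f(x_k)\big]^{-1}.
\]
Absorbing the correction $-H_k \nabla f(x_k)$ into a learned mapping $\mathcal{F}_k$ gives
\[
x_{k+1} = x_k + \mathcal{F}_k(x_k),
\]
which is exactly the identity-plus-residual form of a ResNet block, while an update without the explicit identity term, $x_{k+1} = \mathcal{G}_k(x_k)$, matches a plain feedforward (PlainNet) layer.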
Acknowledgements
This work was supported by National Key R&D Program of China (Grant No. 2022ZD0160302), Major Key Project of PCL, China (Grant No. PCL2021A12), and National Natural Science Foundation of China (Grant No. 62276004).
Supporting information
Appendixes A and B. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.
Cite this article
Shen, Z., Yang, Y., She, Q. et al. Newton design: designing CNNs with the family of Newton’s methods. Sci. China Inf. Sci. 66, 162101 (2023). https://doi.org/10.1007/s11432-021-3442-2