Abstract
Convolutional neural networks (CNNs) have driven much of the recent progress in machine learning. However, most CNN architectures are designed by hand, a process that is empirical, time-consuming, and opaque. In this paper, we aim to offer better insight into CNN models from the perspective of optimization theory. We propose a unified framework, referred to as Newton design, for understanding and designing CNN architectures with the family of Newton's methods. Specifically, we observe that the standard feedforward CNN model (PlainNet) solves an optimization problem via a kind of quasi-Newton method. Interestingly, the residual network (ResNet) can also be derived if a more general quasi-Newton method is used to solve the same problem. Building on these observations, we solve the problem with a stronger method, the Newton-conjugate-gradient (Newton-CG) method, which inspires Newton-CGNet. In the network design, we translate binary-valued terms in the optimization schemes into dropout layers, so dropout modules appear naturally at specific locations in the derived CNN structures, rather than being an empirical training strategy. Extensive experiments on image classification and text categorization tasks verify that Newton-CGNets perform very competitively. In particular, Newton-CGNets surpass their ResNet counterparts by over 4% on CIFAR-10 and over 10% on CIFAR-100.
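To make the correspondence in the abstract concrete, the following is a minimal sketch; the objective $f$, the iterate $x_k$, and the mappings $H_k$, $\mathcal{F}_k$, and $\mathcal{G}_k$ are generic placeholders rather than the paper's exact formulation. A quasi-Newton update takes the form
\[
x_{k+1} = x_k - H_k \nabla f(x_k), \qquad H_k \approx \big[\nabla^2 f(x_k)\big]^{-1}.
\]
Absorbing the correction $-H_k \nabla f(x_k)$ into a learned mapping $\mathcal{F}_k$ gives
\[
x_{k+1} = x_k + \mathcal{F}_k(x_k),
\]
which is exactly the identity-plus-residual form of a ResNet block, while an update without the explicit identity term, $x_{k+1} = \mathcal{G}_k(x_k)$, matches a plain feedforward (PlainNet) layer.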
Acknowledgements
This work was supported by National Key R&D Program of China (Grant No. 2022ZD0160302), Major Key Project of PCL, China (Grant No. PCL2021A12), and National Natural Science Foundation of China (Grant No. 62276004).
Supporting information
Appendixes A and B. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.
Cite this article
Shen, Z., Yang, Y., She, Q. et al. Newton design: designing CNNs with the family of Newton’s methods. Sci. China Inf. Sci. 66, 162101 (2023). https://doi.org/10.1007/s11432-021-3442-2