
Newton design: designing CNNs with the family of Newton’s methods

  • Research Paper
  • Published in Science China Information Sciences

Abstract

Convolutional neural networks (CNNs) have driven much of the recent progress in machine learning. However, most CNN architectures are obtained by manual design, which is empirical, time-consuming, and non-transparent. In this paper, we aim to offer better insight into CNN models from the perspective of optimization theory. We propose a unified framework, referred to as Newton design, for understanding and designing CNN architectures with the family of Newton's methods. Specifically, we observe that the standard feedforward CNN model (PlainNet) solves an optimization problem via a kind of quasi-Newton method. Interestingly, the residual network (ResNet) can also be derived if the same problem is solved with a more general quasi-Newton method. Building on these observations, we solve the problem with a better method, the Newton-conjugate-gradient (Newton-CG) method, which inspires Newton-CGNet. In the network design, binary-valued terms in the optimization schemes are translated into dropout layers, so dropout modules naturally appear at specific locations in the derived CNN structures rather than being an empirical training strategy. Extensive experiments on image classification and text categorization tasks verify that Newton-CGNets perform very competitively. In particular, Newton-CGNets surpass their ResNet counterparts by over 4% on CIFAR-10 and over 10% on CIFAR-100.
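To make the optimization-to-architecture mapping described above concrete, the following is a minimal, hypothetical sketch in PyTorch. It is not the authors' Newton-CGNet (whose exact block structure is defined in the paper); it only illustrates the general idea of unrolling one update step of an iterative method as a residual-style CNN block, with a dropout layer standing in for a binary-valued term in the update scheme. All layer choices, names, and hyperparameters are illustrative assumptions.

import torch
import torch.nn as nn

class UnrolledUpdateBlock(nn.Module):
    """Hypothetical residual-style block: x_{k+1} = ReLU(x_k + step(x_k)).

    The convolutional stack stands in for one inner update of an iterative
    optimization method; the dropout layer plays the role of a binary-valued
    term in the update rule (each unit of the step is kept or zeroed).
    """

    def __init__(self, channels: int, drop_p: float = 0.1):
        super().__init__()
        self.inner = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.drop = nn.Dropout2d(p=drop_p)  # binary-valued term modeled as dropout

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        step = self.drop(self.inner(x))     # correction step for this iteration
        return torch.relu(x + step)         # identity path + step, ResNet-like

if __name__ == "__main__":
    block = UnrolledUpdateBlock(channels=16)
    y = block(torch.randn(2, 16, 32, 32))
    print(y.shape)  # torch.Size([2, 16, 32, 32])

Under this reading, dropping the identity path would correspond to a plain feedforward stack, keeping it yields the ResNet-like form, and the paper replaces the inner step with a Newton-CG-inspired update to obtain Newton-CGNet.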


Acknowledgements

This work was supported by National Key R&D Program of China (Grant No. 2022ZD0160302), Major Key Project of PCL, China (Grant No. PCL2021A12), and National Natural Science Foundation of China (Grant No. 62276004).

Author information


Corresponding authors

Correspondence to Jinwen Ma or Zhouchen Lin.

Additional information

Supporting information

Appendixes A and B. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.



About this article


Cite this article

Shen, Z., Yang, Y., She, Q. et al. Newton design: designing CNNs with the family of Newton’s methods. Sci. China Inf. Sci. 66, 162101 (2023). https://doi.org/10.1007/s11432-021-3442-2
