
Optimization-inspired manual architecture design and neural architecture search

  • Research Paper
  • Science China Information Sciences

Abstract

Neural architecture has been a research focus in recent years because it largely determines the performance of deep networks. Representative architectures include the residual network (ResNet) with skip connections and the dense network (DenseNet) with dense connections. However, theoretical guidance for manual architecture design and neural architecture search (NAS) is still lacking. In this paper, we propose a manual architecture design framework inspired by optimization algorithms. It is based on the conjecture that an optimization algorithm with a good convergence rate may imply a neural architecture with good performance. Concretely, we prove under certain conditions that forward propagation in a deep neural network is equivalent to the iterative procedure of the gradient descent algorithm minimizing a cost function. Inspired by this correspondence, we derive neural architectures from fast optimization algorithms, including the heavy ball algorithm and Nesterov's accelerated gradient descent algorithm. Surprisingly, ResNet and DenseNet can be viewed as special cases of these optimization-inspired architectures. The architectures offer not only theoretical guidance but also good performance in image recognition on multiple datasets, including CIFAR-10, CIFAR-100, and ImageNet. Moreover, we show that our method is also useful for NAS by providing a good initial search point or guiding the design of the search space.
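To make the correspondence described above concrete, the sketch below (not the authors' code; the toy quadratic cost, step sizes, and function names are illustrative assumptions) shows how one gradient-descent iteration reads as a residual, ResNet-style update, and how the heavy-ball iteration adds a momentum term, i.e., an extra skip connection from the iterate two steps back.

```python
# Minimal sketch of the abstract's correspondence, under assumed toy settings.
import numpy as np

def grad_f(x):
    # Gradient of an assumed toy quadratic cost f(x) = 0.5 * ||x||^2.
    return x

def resnet_like_step(x, alpha=0.1):
    # Gradient descent: x_{k+1} = x_k - alpha * grad_f(x_k),
    # which reads as a residual block x_{k+1} = x_k + F(x_k) with F = -alpha * grad_f.
    return x + (-alpha * grad_f(x))

def heavy_ball_like_step(x, x_prev, alpha=0.1, beta=0.9):
    # Heavy ball: x_{k+1} = x_k - alpha * grad_f(x_k) + beta * (x_k - x_{k-1}),
    # suggesting a block with an additional skip connection from the previous layer's input.
    return x - alpha * grad_f(x) + beta * (x - x_prev)

if __name__ == "__main__":
    print(resnet_like_step(np.ones(4)))  # one ResNet-style (gradient descent) update
    x_prev = x = np.ones(4)
    for _ in range(5):
        x_next = heavy_ball_like_step(x, x_prev)
        x_prev, x = x, x_next
    print(x)  # heavy-ball iterates shrink toward the minimizer of the toy cost
```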



Acknowledgements

This work was supported by the National Key R&D Program of China (Grant No. 2022ZD0160302) and the National Natural Science Foundation of China (Grant No. 62276004).

Author information


Corresponding author

Correspondence to Zhouchen Lin.

Additional information

Supporting information

Appendixes A–D. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.

Supplementary File


About this article


Cite this article

Yang, Y., Shen, Z., Li, H. et al. Optimization-inspired manual architecture design and neural architecture search. Sci. China Inf. Sci. 66, 212101 (2023). https://doi.org/10.1007/s11432-021-3527-7

