Abstract
Neural architecture search (NAS) has achieved great success in automatically designing high-performance neural networks for given tasks, but early NAS approaches suffered from excessive computational cost. Recently, some NAS approaches, such as gradient-based ones, have reduced this cost significantly. However, gradient-based methods exhibit a significant deviation in architecture selection because they simply use the parameter values of the corresponding architectures as an importance index; as a result, the architecture selected from the search space generally falls into a sub-optimal state. To address this problem, we propose architecture saliency as a new selection criterion for optimal architectures. Concretely, we define the saliency of an architecture as the squared change in network loss induced by removing that architecture from the neural network. This saliency directly reflects the contribution of a candidate architecture to network performance, so the proposed criterion eliminates the deviation in architecture selection. Furthermore, we approximate architecture saliency with a Taylor series expansion for a more efficient implementation. Extensive experiments show that our approach achieves competitive or even better model evaluation performance than other NAS approaches on multiple datasets.
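The selection criterion described above can be illustrated with a minimal sketch. This is not the paper's implementation: the `taylor_saliency` helper and the toy loss are hypothetical, gradients are estimated with finite differences rather than backpropagation, and the saliency of each candidate is approximated by the squared first-order Taylor estimate of the loss change when its architecture parameter is set to zero, i.e. `(dL/dalpha_i * alpha_i)**2`.

```python
import numpy as np

def taylor_saliency(loss_fn, alpha, eps=1e-6):
    """First-order Taylor approximation of architecture saliency.

    The saliency of candidate i is approximated as (dL/dalpha_i * alpha_i)**2,
    the squared first-order estimate of the loss change induced by removing
    candidate i (setting alpha_i to zero). Gradients are estimated with
    central finite differences for self-containedness.
    """
    alpha = np.asarray(alpha, dtype=float)
    grads = np.zeros_like(alpha)
    for i in range(alpha.size):
        e = np.zeros_like(alpha)
        e[i] = eps
        grads[i] = (loss_fn(alpha + e) - loss_fn(alpha - e)) / (2.0 * eps)
    return (grads * alpha) ** 2

# Toy "supernet" loss: a quadratic in a weighted mixture of two candidate ops.
def toy_loss(alpha):
    return (2.0 * alpha[0] + 0.1 * alpha[1] - 1.0) ** 2

alpha = np.array([0.6, 0.4])
saliency = taylor_saliency(toy_loss, alpha)
print(saliency)
```

Note that the second candidate has a non-trivial architecture parameter (0.4 vs. 0.6) yet a far smaller saliency, because removing it barely changes the loss; ranking candidates by raw parameter values alone would overstate its importance, which is exactly the deviation the saliency criterion is meant to avoid.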
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61772120.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
All authors have seen the manuscript and approved its submission to the journal.
Informed consent
We confirm that the content of the manuscript has not been published or submitted for publication elsewhere.
Cite this article
Hao, J., Cai, Z., Li, R. et al. Saliency: a new selection criterion of important architectures in neural architecture search. Neural Comput & Applic 34, 1269–1283 (2022). https://doi.org/10.1007/s00521-021-06418-4