Abstract
Neural architecture search (NAS) methods have been proposed to relieve human experts from tedious architecture engineering. However, most current methods are confined to small-scale search because of their huge computational cost. Meanwhile, directly applying architectures searched on small datasets to large datasets offers no performance guarantee, owing to the discrepancy between datasets. This limitation impedes the wide use of NAS on large-scale tasks. To overcome this obstacle, we propose an elastic architecture transfer mechanism for accelerating large-scale NAS (EAT-NAS). In our implementation, architectures are first searched on a small dataset, e.g., CIFAR-10, and the best one is chosen as the basic architecture. The search process on a large dataset, e.g., ImageNet, is then initialized with the basic architecture as the seed, which accelerates the large-scale search. We thus propose not only a NAS method but also a mechanism for architecture-level transfer learning. In our experiments, we obtain two final models, EATNet-A and EATNet-B, which achieve competitive accuracies of 75.5% and 75.6%, respectively, on ImageNet. Both models also surpass models searched from scratch on ImageNet under the same settings. In terms of computational cost, EAT-NAS takes fewer than 5 days on 8 TITAN X GPUs, significantly less than that of state-of-the-art large-scale NAS methods.
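The core seeding idea described above can be sketched in a few lines: the winner of the small-dataset search becomes the seed of the large-dataset search population, with the rest of the population filled by mutated variants of it. This is a minimal illustrative sketch only; the list-of-operations encoding, the operator set `OPS`, and the mutation rate below are hypothetical stand-ins, not the paper's actual search space or evolutionary algorithm.

```python
import random

def mutate(arch, ops, rate=0.3):
    """Return a perturbed copy of an architecture (a list of op names),
    resampling each position independently with the given probability."""
    return [random.choice(ops) if random.random() < rate else op
            for op in arch]

def init_population(basic_arch, ops, size):
    """Seed the large-scale search with the small-dataset winner:
    keep the basic architecture itself and fill the remainder of the
    population with mutated variants of it, instead of starting
    from random architectures."""
    population = [list(basic_arch)]
    while len(population) < size:
        population.append(mutate(basic_arch, ops))
    return population

# Toy example: a 5-layer chain architecture found on the small dataset.
OPS = ["conv3x3", "conv5x5", "dwconv3x3", "skip"]
basic = ["conv3x3", "dwconv3x3", "conv5x5", "skip", "conv3x3"]
pop = init_population(basic, OPS, size=8)
```

Because every individual starts near an architecture already known to perform well, the evolutionary search on the large dataset explores a much smaller neighborhood than a from-scratch search, which is the source of the acceleration the abstract reports.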
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (NSFC) (Grant Nos. 61876212, 61976208, 61733007), Zhejiang Lab (Grant No. 2019NB0AB02), and the HUST-Horizon Computer Vision Research Center. We thank Liangchen Song and Guoli Wang for the discussion and assistance.
Fang, J., Chen, Y., Zhang, X. et al. EAT-NAS: elastic architecture transfer for accelerating large-scale neural architecture search. Sci. China Inf. Sci. 64, 192106 (2021). https://doi.org/10.1007/s11432-020-3112-8