
Single Path One-Shot Neural Architecture Search with Uniform Sampling

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12361)

Abstract

We revisit the one-shot Neural Architecture Search (NAS) paradigm and analyze its advantages over existing NAS approaches. Existing one-shot methods, however, are hard to train and not yet effective on large-scale datasets like ImageNet. This work proposes a Single Path One-Shot model to address the challenges in training. Our central idea is to construct a simplified supernet in which all architectures are single paths, so that the weight co-adaptation problem is alleviated. Training is performed by uniform path sampling. All architectures (and their weights) are trained fully and equally.
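The uniform-path-sampling idea can be illustrated with a minimal sketch (the function names and the toy update counter below are illustrative, not from the paper's code): at each training step, one candidate block per layer is drawn uniformly at random, and only the weights on that single path would receive a gradient update, so in expectation every candidate block is trained equally often.

```python
import random

def sample_path(num_layers, num_choices, rng=random):
    """Uniformly sample a single-path architecture:
    one candidate-block index per supernet layer."""
    return [rng.randrange(num_choices) for _ in range(num_layers)]

def train_supernet(num_layers=4, num_choices=3, steps=600, seed=0):
    """Toy training loop: stand in for gradient updates by
    counting how often each candidate block lies on the
    sampled path. Uniform sampling makes the counts (and
    hence the amount of training each block receives)
    roughly equal across choices."""
    rng = random.Random(seed)
    update_counts = [[0] * num_choices for _ in range(num_layers)]
    for _ in range(steps):
        path = sample_path(num_layers, num_choices, rng)
        for layer, choice in enumerate(path):
            update_counts[layer][choice] += 1
    return update_counts

counts = train_supernet()
```

Because sampling is uniform and independent of any architecture parameters, the supernet training is fully decoupled from the later architecture search.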

Comprehensive experiments verify that our approach is flexible and effective. It is easy to train and fast to search. It effortlessly supports complex search spaces (e.g., building blocks, channels, mixed-precision quantization) and different search constraints (e.g., FLOPs, latency). It is thus convenient to use for various needs. It achieves state-of-the-art performance on the large-scale ImageNet dataset.
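Because supernet training is decoupled from architecture selection, search constraints can be enforced purely at search time. A simplified sketch, assuming a hypothetical per-block FLOPs table and a toy `evaluate` score in place of the paper's actual supernet-accuracy evaluation and evolutionary search: sample paths uniformly, discard those exceeding the budget, and keep the best survivor.

```python
import random

def search_under_constraint(num_layers, num_choices, flops_table,
                            max_flops, evaluate, num_samples=200, seed=0):
    """Toy constrained search: uniformly sample candidate paths,
    drop those over the FLOPs budget, keep the highest-scoring
    feasible path under the supplied evaluate() function."""
    rng = random.Random(seed)
    best_path, best_score = None, float("-inf")
    for _ in range(num_samples):
        path = [rng.randrange(num_choices) for _ in range(num_layers)]
        flops = sum(flops_table[l][c] for l, c in enumerate(path))
        if flops > max_flops:
            continue  # constraint violated; reject before evaluation
        score = evaluate(path)
        if score > best_score:
            best_path, best_score = path, score
    return best_path, best_score

flops_table = [[1, 2, 3]] * 3               # hypothetical per-block costs
path, score = search_under_constraint(
    3, 3, flops_table, max_flops=6,
    evaluate=lambda p: sum(p))              # toy stand-in score
```

Swapping `evaluate` for inherited-weight validation accuracy, and the uniform sampler for an evolutionary loop, recovers the structure of search as described in the abstract.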

Acknowledgement

This work is supported by the National Key Research and Development Program of China (No. 2017YFA0700800) and the Beijing Academy of Artificial Intelligence (BAAI).

Supplementary material

Supplementary material 1 (PDF, 240 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. MEGVII Technology, Beijing, China
  2. Tsinghua University, Beijing, China
  3. Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
