Advertisement

BigNAS: Scaling up Neural Architecture Search with Big Single-Stage Models

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12352)

Abstract

Neural architecture search (NAS) has shown promising results discovering models that are both accurate and fast. For NAS, training a one-shot model has become a popular strategy to rank the relative quality of different architectures (child models) using a single set of shared weights. However, while one-shot model weights can effectively rank different network architectures, the absolute accuracies from these shared weights are typically far below those obtained from stand-alone training. To compensate, existing methods assume that the weights must be retrained, finetuned, or otherwise post-processed after the search is completed. These steps significantly increase the compute requirements and complexity of the architecture search and model deployment. In this work, we propose BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to get good prediction accuracies. Without extra retraining or post-processing steps, we are able to train a single set of shared weights on ImageNet and use these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs. Our discovered model family, BigNASModels, achieve top-1 accuracies ranging from 76.5% to 80.9%, surpassing state-of-the-art models in this range including EfficientNets and Once-for-All networks without extra retraining or post-processing. We present ablative study and analysis to further understand the proposed BigNASModels.

Keywords

Efficient neural architecture search AutoML 

Supplementary material

504444_1_En_41_MOESM1_ESM.pdf (182 kb)
Supplementary material 1 (pdf 181 KB)

References

  1. 1.
    Bender, G., Kindermans, P.J., Zoph, B., Vasudevan, V., Le, Q.: Understanding and simplifying one-shot architecture search. In: International Conference on Machine Learning, pp. 549–558 (2018)Google Scholar
  2. 2.
    Berman, M., Pishchulin, L., Xu, N., Blaschko, M.B., Medioni, G.: Aows: adaptive and optimal network width search with latency constraints. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020Google Scholar
  3. 3.
    Brock, A., Lim, T., Ritchie, J., Weston, N.: SMASH: one-shot model architecture search through hypernetworks. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=rydeCEhs-
  4. 4.
    Cai, H., Gan, C., Wang, T., Zhang, Z., Han, S.: Once-for-all: train one network and specialize it for efficient deployment. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=HylxE1HKwS
  5. 5.
    Cai, H., Zhu, L., Han, S.: ProxylessNAS: direct neural architecture search on target task and hardware. In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=HylVB3AqYm
  6. 6.
    Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., Le, Q.V.: Autoaugment: learning augmentation strategies from data. In: Proceedings of the IEEE Conference on Computer vision and Pattern Recognition, pp. 113–123 (2019)Google Scholar
  7. 7.
    Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009, pp. 248–255. IEEE (2009)Google Scholar
  8. 8.
    Goyal, P., et al.: Accurate, large minibatch SGD: training imagenet in 1 hour. arXiv preprint arXiv:1706.02677 (2017)
  9. 9.
    Guo, Z., et al.: Single path one-shot neural architecture search with uniform sampling. arXiv preprint arXiv:1904.00420 (2019)
  10. 10.
    He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)Google Scholar
  11. 11.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)Google Scholar
  12. 12.
    He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 630–645. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46493-0_38CrossRefGoogle Scholar
  13. 13.
    He, T., Zhang, Z., Zhang, H., Zhang, Z., Xie, J., Li, M.: Bag of tricks for image classification with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 558–567 (2019)Google Scholar
  14. 14.
    Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
  15. 15.
    Howard, A., et al.: Searching for mobilenetv3. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1314–1324 (2019)Google Scholar
  16. 16.
    Howard, A.G., et al.: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
  17. 17.
    Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)Google Scholar
  18. 18.
    Hu, S., et al.: Dsnas: direct neural architecture search without parameter retraining. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12084–12092 (2020)Google Scholar
  19. 19.
    Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
  20. 20.
    Liu, C., et al.: Progressive neural architecture search. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 19–34 (2018)Google Scholar
  21. 21.
    Liu, H., Simonyan, K., Vinyals, O., Fernando, C., Kavukcuoglu, K.: Hierarchical representations for efficient architecture search. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=BJQRKzbA-
  22. 22.
    Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=S1eYHoC5FX
  23. 23.
    Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)
  24. 24.
    Ma, N., Zhang, X., Zheng, H.T., Sun, J.: Shufflenet v2: practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)Google Scholar
  25. 25.
    Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: ICML, pp. 4092–4101 (2018). http://proceedings.mlr.press/v80/pham18a.html
  26. 26.
    Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions (2018). https://openreview.net/forum?id=SkBYYyZRZ
  27. 27.
    Real, E., Aggarwal, A., Huang, Y., Le, Q.V.: Regularized evolution for image classifier architecture search. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 4780–4789 (2019)Google Scholar
  28. 28.
    Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation. arXiv preprint arXiv:1801.04381 (2018)
  29. 29.
    Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)MathSciNetzbMATHGoogle Scholar
  30. 30.
    Stamoulis, D., et al.: Single-path NAS: designing hardware-efficient convnets in less than 4 hours. arXiv preprint arXiv:1904.02877 (2019)
  31. 31.
    Tan, M., Chen, B., Pang, R., Vasudevan, V., Le, Q.V.: Mnasnet: platform-aware neural architecture search for mobile. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)Google Scholar
  32. 32.
    Tan, M., Le, Q.V.: Efficientnet: rethinking model scaling for convolutional neural networks (2019)Google Scholar
  33. 33.
    Wu, B., et al.: FBNET: hardware-aware efficient convnet design via differentiable neural architecture search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10734–10742 (2019)Google Scholar
  34. 34.
    Yang, Z., et al.: Cars: continuous evolution for efficient neural architecture search. In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020Google Scholar
  35. 35.
    Yu, J., Huang, T.: Network slimming by slimmable networks: towards one-shot architecture search for channel numbers. arXiv preprint arXiv:1903.11728 (2019)
  36. 36.
    Yu, J., Huang, T.: Universally slimmable networks and improved training techniques. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1803–1811. IEEE (2019)Google Scholar
  37. 37.
    Yu, J., Yang, L., Xu, N., Yang, J., Huang, T.: Slimmable neural networks. In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=H1gMCsAqY7
  38. 38.
    Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578 (2016)
  39. 39.
    Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8697–8710 (2018)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.Google BrainNew YorkUSA
  2. 2.University of Illinois at Urbana-ChampaignChampaignUSA

Personalised recommendations