BigNAS: Scaling up Neural Architecture Search with Big Single-Stage Models

Yu, Jiahui; Jin, Pengchong; Liu, Hanxiao; Bender, Gabriel; Kindermans, Pieter-Jan; Tan, Mingxing; Huang, Thomas; Song, Xiaodan; Pang, Ruoming; Le, Quoc

doi:10.1007/978-3-030-58571-6_41

Jiahui Yu¹²,¹³,
Pengchong Jin¹²,
Hanxiao Liu¹²,
Gabriel Bender¹²,
Pieter-Jan Kindermans¹²,
Mingxing Tan¹²,
Thomas Huang¹³,
Xiaodan Song¹²,
Ruoming Pang¹² &
Quoc Le¹²

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12352)

Included in the following conference series:

European Conference on Computer Vision


Abstract

Neural architecture search (NAS) has shown promising results in discovering models that are both accurate and fast. For NAS, training a one-shot model has become a popular strategy to rank the relative quality of different architectures (child models) using a single set of shared weights. However, while one-shot model weights can effectively rank different network architectures, the absolute accuracies obtained from these shared weights are typically far below those obtained from stand-alone training. To compensate, existing methods assume that the weights must be retrained, finetuned, or otherwise post-processed after the search is completed. These steps significantly increase the compute requirements and complexity of architecture search and model deployment. In this work, we propose BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to get good prediction accuracies. Without extra retraining or post-processing steps, we are able to train a single set of shared weights on ImageNet and use these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs. Our discovered model family, BigNASModels, achieves top-1 accuracies ranging from 76.5% to 80.9%, surpassing state-of-the-art models in this range, including EfficientNets and Once-for-All networks, without extra retraining or post-processing. We present an ablation study and analysis to further understand the proposed BigNASModels.
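The idea of child models that share a single set of weights can be illustrated with a toy sketch. This is not the BigNAS implementation: the slicing scheme (a narrower child uses the leading rows of a shared weight matrix), the layer sizes, and the names `shared_w` and `child_layer` are all assumptions made for illustration only.

```python
import random

# Hypothetical shared weight matrix for a single layer (MAX_OUT x MAX_IN).
# The sizes are arbitrary and chosen only for this illustration.
MAX_OUT, MAX_IN = 8, 6
random.seed(0)
shared_w = [[random.uniform(-1.0, 1.0) for _ in range(MAX_IN)]
            for _ in range(MAX_OUT)]


def child_layer(x, out_width):
    """Apply the layer using only the first `out_width` rows of shared_w.

    Assumption: a narrower child model reuses the leading slice of the
    shared weights instead of owning separate parameters.
    """
    return [sum(shared_w[o][i] * x[i] for i in range(len(x)))
            for o in range(out_width)]


x = [1.0] * MAX_IN
small = child_layer(x, 4)  # narrow child: 4 output units
big = child_layer(x, 8)    # widest child: all 8 output units

# Both children read the same parameters, so the narrow child's outputs
# coincide with the first four outputs of the wide child.
print(small == big[:4])
```

In this sketch no child is retrained or post-processed: each one is obtained by reading a slice of the same parameters, which is the property the abstract describes for models taken directly from the shared weights.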



Author information

Authors and Affiliations

Google Brain, New York, USA
Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Xiaodan Song, Ruoming Pang & Quoc Le
University of Illinois at Urbana-Champaign, Champaign, USA
Jiahui Yu & Thomas Huang


Corresponding author

Correspondence to Jiahui Yu.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 181 KB)


Cite this paper

Yu, J. et al. (2020). BigNAS: Scaling up Neural Architecture Search with Big Single-Stage Models. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12352. Springer, Cham. https://doi.org/10.1007/978-3-030-58571-6_41


DOI: https://doi.org/10.1007/978-3-030-58571-6_41
Published: 09 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58570-9
Online ISBN: 978-3-030-58571-6
eBook Packages: Computer Science, Computer Science (R0)
