
Joslim: Joint Widths and Weights Optimization for Slimmable Neural Networks

Part of the Lecture Notes in Computer Science book series (LNAI, volume 12977)

Abstract

Slimmable neural networks provide a flexible trade-off front between prediction error and computational requirement (such as the number of floating-point operations or FLOPs) with the same storage requirement as a single model. They reduce the maintenance overhead of deploying models to devices with different memory constraints and help optimize the efficiency of systems that run many CNNs. However, existing slimmable network approaches either do not optimize layer-wise widths or optimize the shared weights and the layer-wise widths independently, leaving significant room for improvement through joint width and weight optimization. In this work, we propose a general framework that enables joint optimization of both the width configurations and the weights of slimmable networks. Our framework subsumes conventional and NAS-based slimmable methods as special cases and provides the flexibility to improve over existing methods. From a practical standpoint, we propose Joslim, an algorithm that jointly optimizes both the widths and weights of slimmable nets and outperforms existing methods for optimizing slimmable networks across various networks, datasets, and objectives. Quantitatively, improvements of up to 1.7% and 8% in top-1 accuracy on the ImageNet dataset can be attained for MobileNetV2 considering FLOPs and memory footprint, respectively. Our results highlight the potential of optimizing the channel counts of different layers jointly with the weights of slimmable networks. Code is available at https://github.com/cmu-enyac/Joslim.
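
As a rough illustration of what "slimmable" means in practice (this is not the authors' implementation), the PyTorch-style sketch below shows how a single shared convolution kernel can serve several widths by slicing its leading channels; the class name SlimmableConv2d and all sizes are assumptions made for this example only.

```python
# Minimal sketch of a width-slimmable convolution: one full-width kernel is
# stored, and narrower sub-networks reuse its leading channel slices.
import torch
import torch.nn.functional as F


class SlimmableConv2d(torch.nn.Module):
    def __init__(self, max_in, max_out, kernel_size, stride=1, padding=0):
        super().__init__()
        # Single shared weight tensor at the maximum width.
        self.weight = torch.nn.Parameter(
            torch.randn(max_out, max_in, kernel_size, kernel_size) * 0.01
        )
        self.stride, self.padding = stride, padding

    def forward(self, x, out_channels):
        # Keep only the first x.shape[1] input channels and the first
        # out_channels output channels of the shared kernel.
        w = self.weight[:out_channels, : x.shape[1]]
        return F.conv2d(x, w, stride=self.stride, padding=self.padding)


conv = SlimmableConv2d(max_in=32, max_out=64, kernel_size=3, padding=1)
x = torch.randn(1, 16, 8, 8)   # input already slimmed to 16 channels
y = conv(x, out_channels=48)   # run the layer at 48 of its 64 filters
print(y.shape)                 # torch.Size([1, 48, 8, 8])
```

Because every width shares the same storage, choosing the per-layer channel counts (the "width configuration") and training the shared weights are coupled problems, which is the coupling Joslim optimizes jointly.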

Keywords

  • Model compression
  • Slimmable neural networks
  • Channel optimization
  • Efficient deep learning


Notes

  1. Since we only search for channel counts, the progressive shrinking strategy proposed in OFA does not apply. As a result, both OFA and BigNAS have the same approach.
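
To make this footnote concrete, the sketch below shows one way a channel-count-only search space can be sampled, with an independent width multiplier per layer; the multiplier grid and layer widths are illustrative assumptions, not values from the paper. With such a space, there is no sub-operator structure for progressive shrinking to exploit, so OFA-style and BigNAS-style training both reduce to drawing configurations like these for the shared weights.

```python
# Minimal sketch of sampling a width configuration (channel count per layer)
# from a per-layer multiplier grid. All constants are illustrative.
import random

WIDTH_MULTS = [0.25, 0.5, 0.75, 1.0]   # candidate fractions of the full width
FULL_WIDTHS = [32, 64, 128, 256]       # full channel counts of a toy 4-layer net


def sample_width_config(rng=random):
    """Draw one width configuration: one channel count per layer."""
    return [max(1, int(w * rng.choice(WIDTH_MULTS))) for w in FULL_WIDTHS]


print(sample_width_config())  # e.g. [8, 64, 96, 256]
```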

References

  1. Balandat, M., et al.: BoTorch: a framework for efficient Monte-Carlo Bayesian optimization. In: NeurIPS (2020)


  2. Bender, G., Kindermans, P.J., Zoph, B., Vasudevan, V., Le, Q.: Understanding and simplifying one-shot architecture search. In: ICML, pp. 550–559 (2018)


  3. Bender, G., et al.: Can weight sharing outperform random architecture search? An investigation with TuNAS. In: CVPR, pp. 14323–14332 (2020)


  4. Berman, M., Pishchulin, L., Xu, N., Medioni, G., et al.: AOWS: adaptive and optimal network width search with latency constraints. In: CVPR (2020)


  5. Bolukbasi, T., Wang, J., Dekel, O., Saligrama, V.: Adaptive neural networks for efficient inference. In: ICML (2017)


  6. Cai, H., Gan, C., Wang, T., Zhang, Z., Han, S.: Once-for-all: train one network and specialize it for efficient deployment. In: ICLR (2020)


  7. Cheng, A.C., et al.: Searching toward Pareto-optimal device-aware neural architectures. In: ICCAD, pp. 1–7 (2018)


  8. Chin, T.W., Ding, R., Zhang, C., Marculescu, D.: Towards efficient model compression via learned global ranking. In: CVPR (2020)


  9. Chin, T.W., Marculescu, D., Morcos, A.S.: Width transfer: on the (in)variance of width optimization. In: CVPR Workshops, pp. 2990–2999 (2021)


  10. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)


  11. Dong, J.-D., Cheng, A.-C., Juan, D.-C., Wei, W., Sun, M.: DPP-Net: device-aware progressive search for Pareto-optimal neural architectures. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 540–555. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_32


  12. Elsken, T., Metzen, J.H., Hutter, F.: Efficient multi-objective neural architecture search via Lamarckian evolution. arXiv preprint arXiv:1804.09081 (2018)

  13. Gordon, A., et al.: MorphNet: fast & simple resource-constrained structure learning of deep networks. In: CVPR, pp. 1586–1595 (2018)


  14. Guo, S., Wang, Y., Li, Q., Yan, J.: DMCP: differentiable Markov channel pruning for neural networks. In: CVPR, pp. 1539–1547 (2020)


  15. Guo, Z., et al.: Single path one-shot neural architecture search with uniform sampling. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12361, pp. 544–560. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58517-4_32


  16. He, Y., Liu, P., Wang, Z., Hu, Z., Yang, Y.: Filter pruning via geometric median for deep convolutional neural networks acceleration. In: CVPR, pp. 4340–4349 (2019)


  17. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)

  18. Huang, G., Chen, D., Li, T., Wu, F., van der Maaten, L., Weinberger, K.Q.: Multi-scale dense networks for resource efficient image classification. In: ICLR (2018)


  19. Kaya, Y., Hong, S., Dumitras, T.: Shallow-deep networks: understanding and mitigating network overthinking. In: ICML (2019)


  20. Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710 (2016)

  21. Li, H., Zhang, H., Qi, X., Yang, R., Huang, G.: Improved techniques for training adaptive deep networks. In: ICCV, pp. 1891–1900 (2019)


  22. Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. arXiv preprint arXiv:1806.09055 (2018)

  23. Liu, Z., et al.: MetaPruning: meta learning for automatic neural network channel pruning. In: ICCV (2019)


  24. Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., Zhang, C.: Learning efficient convolutional networks through network slimming. In: ICCV, pp. 2736–2744 (2017)


  25. Louizos, C., Welling, M., Kingma, D.P.: Learning sparse neural networks through \(L_0\) regularization. arXiv preprint arXiv:1712.01312 (2017)

  26. Lu, Z., et al.: NSGA-Net: neural architecture search using multi-objective genetic algorithm. In: GECCO, pp. 419–427 (2019)


  27. Ma, X., Triki, A.R., Berman, M., Sagonas, C., Cali, J., Blaschko, M.B.: A Bayesian optimization framework for neural network compression. In: ICCV (2019)


  28. Paria, B., Kandasamy, K., Póczos, B.: A flexible framework for multi-objective Bayesian optimization using random scalarizations. In: Globerson, A., Silva, R. (eds.) UAI (2019)


  29. Pham, H., Guan, M., Zoph, B., Le, Q., Dean, J.: Efficient neural architecture search via parameters sharing. In: ICML, pp. 4095–4104. PMLR (2018)


  30. Rasmussen, C.E.: Gaussian processes in machine learning. In: Bousquet, O., von Luxburg, U., Rätsch, G. (eds.) ML 2003. LNCS (LNAI), vol. 3176, pp. 63–71. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28650-9_4


  31. Ruiz, A., Verbeek, J.: Adaptative inference cost with convolutional neural mixture models. In: ICCV, pp. 1872–1881 (2019)


  32. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: CVPR, pp. 4510–4520 (2018)


  33. Stamoulis, D., et al.: Single-Path NAS: designing hardware-efficient convnets in less than 4 hours. In: ECML-PKDD (2019)


  34. Tan, M., et al.: MnasNet: platform-aware neural architecture search for mobile. In: CVPR, pp. 2820–2828 (2019)


  35. Wang, D., Gong, C., Li, M., Liu, Q., Chandra, V.: AlphaNet: improved training of supernet with alpha-divergence. In: ICML (2021)


  36. Yang, T., Zhu, S., Chen, C., Yan, S., Zhang, M., Willis, A.: MutualNet: adaptive ConvNet via mutual learning from network width and resolution. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 299–315. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_18


  37. Yang, Z., et al.: CARS: continuous evolution for efficient neural architecture search. In: CVPR (2020)


  38. Ye, J., Lu, X., Lin, Z., Wang, J.Z.: Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers. In: ICLR (2018)


  39. Yu, J., Huang, T.: AutoSlim: towards one-shot architecture search for channel numbers. arXiv preprint arXiv:1903.11728 (2019)

  40. Yu, J., Huang, T.S.: Universally slimmable networks and improved training techniques. In: ICCV, pp. 1803–1811 (2019)


  41. Yu, J., et al.: BigNAS: scaling up neural architecture search with big single-stage models. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 702–717. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_41


  42. Yu, J., Yang, L., Xu, N., Yang, J., Huang, T.: Slimmable neural networks. In: ICLR (2019)


  43. Zhang, C., Bengio, S., Singer, Y.: Are all layers created equal? arXiv preprint arXiv:1902.01996 (2019)


Acknowledgement

This research was supported in part by NSF CCF Grant No. 1815899, NSF CSR Grant No. 1815780, and NSF ACI Grant No. 1445606 at the Pittsburgh Supercomputing Center (PSC).

Author information


Corresponding author

Correspondence to Ting-Wu Chin.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4065 KB)


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Chin, T.W., Morcos, A.S., Marculescu, D. (2021). Joslim: Joint Widths and Weights Optimization for Slimmable Neural Networks. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12977. Springer, Cham. https://doi.org/10.1007/978-3-030-86523-8_8


  • DOI: https://doi.org/10.1007/978-3-030-86523-8_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86522-1

  • Online ISBN: 978-3-030-86523-8

  • eBook Packages: Computer Science, Computer Science (R0)