Abstract
Slimmable neural networks provide a flexible trade-off front between prediction error and computational cost (e.g., the number of floating-point operations, or FLOPs) while requiring the same storage as a single model. They reduce the maintenance overhead of deploying models to devices with different memory constraints and help optimize the efficiency of systems with many CNNs. However, existing slimmable network approaches either do not optimize layer-wise widths or optimize the shared weights and the layer-wise widths independently, leaving significant room for improvement through joint width and weight optimization. In this work, we propose a general framework that enables joint optimization of both the width configurations and the weights of slimmable networks. Our framework subsumes conventional and NAS-based slimmable methods as special cases and provides the flexibility to improve over existing methods. From a practical standpoint, we propose Joslim, an algorithm that jointly optimizes the widths and weights of slimmable nets and outperforms existing methods for optimizing slimmable networks across various networks, datasets, and objectives. Quantitatively, for MobileNetV2 on the ImageNet dataset, top-1 accuracy improvements of up to 1.7% and 8% can be attained when considering FLOPs and memory footprint, respectively. Our results highlight the potential of optimizing the channel counts of different layers jointly with the weights of slimmable networks. Code is available at https://github.com/cmu-enyac/Joslim.
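For readers unfamiliar with the shared-weight setup the abstract builds on, below is a minimal PyTorch-style sketch of training one set of weights under several sampled width configurations. It illustrates the general slimmable-training idea only, not the Joslim algorithm itself; the `set_widths` hook, the `width_choices` structure, and the number of widths sampled per step are assumptions made for illustration.

```python
# Minimal sketch (not the paper's actual procedure) of shared-weight training
# over sampled width configurations. `set_widths(model, widths)` is assumed to
# switch each layer of `model` to the given channel counts without allocating
# new parameters; `width_choices` lists the admissible channel counts per layer.
import random
import torch

def slimmable_train_step(model, set_widths, width_choices, batch, optimizer,
                         criterion=None, n_sampled_widths=4):
    """One shared-weight update accumulated over several sampled width configs."""
    if criterion is None:
        criterion = torch.nn.CrossEntropyLoss()
    inputs, targets = batch
    optimizer.zero_grad()
    for _ in range(n_sampled_widths):
        # Draw one per-layer width configuration from the search space.
        widths = [random.choice(choices) for choices in width_choices]
        set_widths(model, widths)
        loss = criterion(model(inputs), targets)
        loss.backward()  # gradients from every sampled width accumulate in the shared weights
    optimizer.step()
```

Joint width and weight optimization, as pursued by Joslim, additionally steers which width configurations get sampled rather than drawing them from a fixed distribution; the uniform `random.choice` above is only a placeholder for that choice.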
Keywords
- Model compression
- Slimmable neural networks
- Channel optimization
- Efficient deep learning
Notes
1. Since we only search for channel counts, the progressive shrinking strategy proposed in OFA does not apply. As a result, both OFA and BigNAS have the same approach.
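As a concrete illustration of what "only search for channel counts" means, the sketch below samples one channel count per layer from a fixed, single-stage supernet; the specific layer options are hypothetical and stand in for whatever width choices the search space actually exposes.

```python
# Illustrative sketch of a channel-count-only search space: every candidate
# architecture differs from the full model only in per-layer widths, so it can
# be sampled directly from one supernet without progressive shrinking stages.
# The concrete channel options below are hypothetical.
import random

CHANNEL_OPTIONS = [
    [16, 24, 32],    # layer 1: admissible output channel counts
    [32, 48, 64],    # layer 2
    [64, 96, 128],   # layer 3
]

def sample_width_config(options=CHANNEL_OPTIONS):
    """Uniformly sample one channel count per layer."""
    return [random.choice(layer_opts) for layer_opts in options]
```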
References
Balandat, M., et al.: BoTorch: a framework for efficient Monte-Carlo Bayesian optimization. In: NeurIPS (2020)
Bender, G., Kindermans, P.J., Zoph, B., Vasudevan, V., Le, Q.: Understanding and simplifying one-shot architecture search. In: ICML, pp. 550–559 (2018)
Bender, G., et al.: Can weight sharing outperform random architecture search? An investigation with TuNAS. In: CVPR, pp. 14323–14332 (2020)
Berman, M., Pishchulin, L., Xu, N., Medioni, G., et al.: AOWS: adaptive and optimal network width search with latency constraints. In: CVPR (2020)
Bolukbasi, T., Wang, J., Dekel, O., Saligrama, V.: Adaptive neural networks for efficient inference. In: ICML (2017)
Cai, H., Gan, C., Wang, T., Zhang, Z., Han, S.: Once-for-all: train one network and specialize it for efficient deployment. In: ICLR (2020)
Cheng, A.C., et al.: Searching toward Pareto-optimal device-aware neural architectures. In: ICCAD, pp. 1–7 (2018)
Chin, T.W., Ding, R., Zhang, C., Marculescu, D.: Towards efficient model compression via learned global ranking. In: CVPR (2020)
Chin, T.W., Marculescu, D., Morcos, A.S.: Width transfer: on the (in)variance of width optimization. In: CVPR Workshops, pp. 2990–2999 (2021)
Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
Dong, J.-D., Cheng, A.-C., Juan, D.-C., Wei, W., Sun, M.: DPP-Net: device-aware progressive search for Pareto-optimal neural architectures. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 540–555. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_32
Elsken, T., Metzen, J.H., Hutter, F.: Efficient multi-objective neural architecture search via Lamarckian evolution. arXiv preprint arXiv:1804.09081 (2018)
Gordon, A., et al.: MorphNet: fast & simple resource-constrained structure learning of deep networks. In: CVPR, pp. 1586–1595 (2018)
Guo, S., Wang, Y., Li, Q., Yan, J.: DMCP: differentiable Markov channel pruning for neural networks. In: CVPR, pp. 1539–1547 (2020)
Guo, Z., et al.: Single path one-shot neural architecture search with uniform sampling. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12361, pp. 544–560. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58517-4_32
He, Y., Liu, P., Wang, Z., Hu, Z., Yang, Y.: Filter pruning via geometric median for deep convolutional neural networks acceleration. In: CVPR, pp. 4340–4349 (2019)
Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
Huang, G., Chen, D., Li, T., Wu, F., van der Maaten, L., Weinberger, K.Q.: Multi-scale dense networks for resource efficient image classification. In: ICLR (2018)
Kaya, Y., Hong, S., Dumitras, T.: Shallow-deep networks: understanding and mitigating network overthinking. In: ICML (2019)
Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710 (2016)
Li, H., Zhang, H., Qi, X., Yang, R., Huang, G.: Improved techniques for training adaptive deep networks. In: ICCV, pp. 1891–1900 (2019)
Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. arXiv preprint arXiv:1806.09055 (2018)
Liu, Z., et al.: MetaPruning: meta learning for automatic neural network channel pruning. In: ICCV (2019)
Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., Zhang, C.: Learning efficient convolutional networks through network slimming. In: ICCV, pp. 2736–2744 (2017)
Louizos, C., Welling, M., Kingma, D.P.: Learning sparse neural networks through \(L_0\) regularization. arXiv preprint arXiv:1712.01312 (2017)
Lu, Z., et al.: NSGA-Net: neural architecture search using multi-objective genetic algorithm. In: GECCO, pp. 419–427 (2019)
Ma, X., Triki, A.R., Berman, M., Sagonas, C., Cali, J., Blaschko, M.B.: A Bayesian optimization framework for neural network compression. In: ICCV (2019)
Paria, B., Kandasamy, K., Póczos, B.: A flexible framework for multi-objective Bayesian optimization using random scalarizations. In: Globerson, A., Silva, R. (eds.) UAI (2019)
Pham, H., Guan, M., Zoph, B., Le, Q., Dean, J.: Efficient neural architecture search via parameters sharing. In: ICML, pp. 4095–4104. PMLR (2018)
Rasmussen, C.E.: Gaussian processes in machine learning. In: Bousquet, O., von Luxburg, U., Rätsch, G. (eds.) ML 2003. LNCS (LNAI), vol. 3176, pp. 63–71. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28650-9_4
Ruiz, A., Verbeek, J.: Adaptative inference cost with convolutional neural mixture models. In: ICCV, pp. 1872–1881 (2019)
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: CVPR, pp. 4510–4520 (2018)
Stamoulis, D., et al.: Single-Path NAS: designing hardware-efficient convnets in less than 4 hours. In: ECML-PKDD (2019)
Tan, M., et al.: MnasNet: platform-aware neural architecture search for mobile. In: CVPR, pp. 2820–2828 (2019)
Wang, D., Gong, C., Li, M., Liu, Q., Chandra, V.: AlphaNet: improved training of supernet with alpha-divergence. In: ICML (2021)
Yang, T., Zhu, S., Chen, C., Yan, S., Zhang, M., Willis, A.: MutualNet: adaptive ConvNet via mutual learning from network width and resolution. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 299–315. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_18
Yang, Z., et al.: CARS: continuous evolution for efficient neural architecture search. In: CVPR (2020)
Ye, J., Lu, X., Lin, Z., Wang, J.Z.: Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers. In: ICLR (2018)
Yu, J., Huang, T.: AutoSlim: towards one-shot architecture search for channel numbers. arXiv preprint arXiv:1903.11728 (2019)
Yu, J., Huang, T.S.: Universally slimmable networks and improved training techniques. In: ICCV, pp. 1803–1811 (2019)
Yu, J., et al.: BigNAS: scaling up neural architecture search with big single-stage models. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 702–717. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_41
Yu, J., Yang, L., Xu, N., Yang, J., Huang, T.: Slimmable neural networks. In: ICLR (2019)
Zhang, C., Bengio, S., Singer, Y.: Are all layers created equal? arXiv preprint arXiv:1902.01996 (2019)
Acknowledgement
This research was supported in part by NSF CCF Grant No. 1815899, NSF CSR Grant No. 1815780, and NSF ACI Grant No. 1445606 at the Pittsburgh Supercomputing Center (PSC).
About this paper
Cite this paper
Chin, T.W., Morcos, A.S., Marculescu, D. (2021). Joslim: Joint Widths and Weights Optimization for Slimmable Neural Networks. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds.) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol. 12977. Springer, Cham. https://doi.org/10.1007/978-3-030-86523-8_8
DOI: https://doi.org/10.1007/978-3-030-86523-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86522-1
Online ISBN: 978-3-030-86523-8