
Explicit Model Size Control and Relaxation via Smooth Regularization for Mixed-Precision Quantization

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13672)


Abstract

While quantization of Deep Neural Networks (DNNs) significantly reduces computational and storage costs, it also reduces model capacity and therefore usually leads to an accuracy drop. One possible way to overcome this issue is to use different quantization bit-widths for different layers. The main challenge of the mixed-precision approach is to choose a bit-width for each layer while staying within memory and latency requirements. Motivated by this challenge, we introduce a novel technique for explicit complexity control of DNNs quantized to mixed precision, which uses smooth optimization on the surface containing neural networks of constant size. Furthermore, we introduce a family of smooth quantization regularizers that can be used jointly with our complexity control method for both post-training mixed-precision quantization and quantization-aware training. Our approach can be applied to any neural network architecture. Experiments show that the proposed techniques reach state-of-the-art results.

V. Chikin and K. Solodskikh contributed equally to this work.
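The two ingredients named in the abstract can be illustrated concretely: a smooth, differentiable penalty that vanishes exactly on a uniform quantization grid, and an explicit model-size measure for a mixed-precision bit-width assignment. The sketch below is generic (in the spirit of sinusoidal quantization regularizers from the literature), not the authors' exact formulation; the function names `sin_quant_reg` and `model_size_bits` are hypothetical.

```python
import numpy as np

def sin_quant_reg(w, step=1.0):
    """Smooth sinusoidal penalty: zero exactly on the uniform
    quantization grid {k * step}, positive and differentiable
    everywhere else. A generic sketch, not the paper's method."""
    return float(np.sum(np.sin(np.pi * w / step) ** 2))

def model_size_bits(layer_sizes, bit_widths):
    """Explicit model size in bits for a mixed-precision
    assignment: sum over layers of (#params * bit-width)."""
    return sum(n * b for n, b in zip(layer_sizes, bit_widths))

# The penalty vanishes for weights already on the integer grid
# and is positive for weights between grid points.
w_on_grid  = np.array([-2.0, 0.0, 1.0, 3.0])
w_off_grid = np.array([-1.7, 0.3, 1.5, 2.9])
assert sin_quant_reg(w_on_grid) < 1e-12
assert sin_quant_reg(w_off_grid) > 0

# Explicit size of a hypothetical 3-layer mixed-precision model.
sizes = [1000, 5000, 2000]   # params per layer
bits  = [8, 4, 2]            # bit-width per layer
print(model_size_bits(sizes, bits))  # 8000 + 20000 + 4000 = 32000
```

Because the penalty is smooth, it can be added to a training loss and minimized with ordinary gradient descent, while the size measure makes the memory budget an explicit quantity rather than an implicit side effect of the bit-width search.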



Author information


Correspondence to Vladimir Chikin.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 15,480 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Chikin, V., Solodskikh, K., Zhelavskaya, I. (2022). Explicit Model Size Control and Relaxation via Smooth Regularization for Mixed-Precision Quantization. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13672. Springer, Cham. https://doi.org/10.1007/978-3-031-19775-8_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-19775-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19774-1

  • Online ISBN: 978-3-031-19775-8

  • eBook Packages: Computer Science, Computer Science (R0)
