Abstract
While quantization of Deep Neural Networks (DNNs) significantly reduces computational and storage costs, it also reduces model capacity and therefore usually leads to an accuracy drop. One possible way to overcome this issue is to use different quantization bit-widths for different layers. The main challenge of this mixed-precision approach is to choose the bit-width of each layer while staying within memory and latency requirements. Motivated by this challenge, we introduce a novel technique for explicit complexity control of DNNs quantized to mixed precision, which uses smooth optimization on a surface containing neural networks of constant size. Furthermore, we introduce a family of smooth quantization regularizers, which can be used jointly with our complexity control method for both post-training mixed-precision quantization and quantization-aware training. Our approach can be applied to any neural network architecture. Experiments show that the proposed techniques reach state-of-the-art results.
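The abstract does not give the exact form of the smooth quantization regularizers, so the following is only an illustrative sketch of one well-known member of this class: a sinusoidal penalty (in the spirit of SinReQ and of Naumov et al.) that is differentiable everywhere and vanishes exactly on a uniform quantization grid. The function `sin_quant_penalty`, the grid step, and the penalty weight below are hypothetical choices for the illustration, not the formulation proposed in the paper.

```python
import torch

def sin_quant_penalty(w: torch.Tensor, step: float) -> torch.Tensor:
    # Smooth surrogate for quantization error: sin^2(pi * w / step)
    # is zero exactly when w lies on the uniform grid {k * step} and
    # positive elsewhere, so its gradient pulls weights toward grid
    # points without using the non-differentiable rounding operator.
    return torch.sin(torch.pi * w / step).pow(2).mean()

# Toy quantization-aware training step (hypothetical setup):
w = torch.randn(64, requires_grad=True)
step = 2.0 / (2**4 - 1)                  # 4-bit uniform grid on [-1, 1]
task_loss = (w**2).mean()                # stand-in for the real task loss
loss = task_loss + 0.1 * sin_quant_penalty(w, step)
loss.backward()                          # w is pulled toward the grid
```

In a mixed-precision setting, `step` would differ per layer according to its assigned bit-width; the smoothness of such a penalty is what allows it to be optimized jointly with a differentiable complexity-control term.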
V. Chikin and K. Solodskikh contributed equally to this work.
Electronic supplementary material
Supplementary material is available with the online version of this paper.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chikin, V., Solodskikh, K., Zhelavskaya, I. (2022). Explicit Model Size Control and Relaxation via Smooth Regularization for Mixed-Precision Quantization. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13672. Springer, Cham. https://doi.org/10.1007/978-3-031-19775-8_1
DOI: https://doi.org/10.1007/978-3-031-19775-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19774-1
Online ISBN: 978-3-031-19775-8
eBook Packages: Computer Science, Computer Science (R0)