
DPQ: dynamic pseudo-mean mixed-precision quantization for pruned neural network


Abstract

Deep neural networks keep growing in depth and in the number of hyper-parameters, producing large-scale networks trained on massive amounts of data. However, such networks are difficult to deploy on resource-constrained edge devices. Mixed-precision quantization is a promising yet challenging way to prune and compress deep neural network models while discovering the optimal bit width for each layer. To address this challenge, we propose dynamic pseudo-mean mixed-precision quantization (DPQ), which introduces two-bit scaling factors to compensate for quantization errors. Furthermore, we propose an activation quantization scheme named random parameters clipping (RPC), which quantizes only part of the activations to reduce the loss of accuracy. DPQ dynamically adjusts the bit precision of weight quantization according to the distribution of the weights, yielding a quantization scheme that is more robust than previous methods. Extensive experiments demonstrate that DPQ achieves a 15.43\(\times\) compression rate for ResNet20 on the CIFAR-10 dataset with a 0.22% increase in accuracy, and a 35.25\(\times\) compression rate for ResNet56 on the SVHN dataset with a 0.12% increase in accuracy.
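To make the ingredients concrete, below is a minimal, self-contained Python/NumPy sketch of per-layer weight quantization with a scaling factor, a toy rule for picking a bit width from the weight distribution, and clipped activation quantization. It is an illustration only, not the authors' DPQ or RPC implementation: the function names, the bit-width heuristic, and the clipping threshold are assumptions made for the example.

```python
# Illustrative sketch only, not the authors' DPQ/RPC implementation.
# The bit-width heuristic, clipping threshold, and function names are
# assumptions used to show the general shape of per-layer mixed-precision
# weight quantization with a scaling factor and clipped activation quantization.
import numpy as np

def quantize_weights(w, bits):
    """Uniform symmetric quantization to `bits` bits with a per-layer scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax               # per-layer scaling factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale, scale                        # de-quantized weights, scale

def choose_bits(w, low=2, high=8):
    """Toy rule: widely spread weights get more bits (assumed heuristic)."""
    spread = np.std(w) / (np.mean(np.abs(w)) + 1e-12)
    return int(np.clip(round(low + spread), low, high))

def clip_and_quantize_activations(a, clip_val=6.0, bits=8):
    """Clip activations to [0, clip_val], then quantize uniformly."""
    a = np.clip(a, 0.0, clip_val)
    step = clip_val / (2 ** bits - 1)
    return np.round(a / step) * step

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.05, size=(64, 64))        # stand-in layer weights
    bits = choose_bits(w)
    w_q, scale = quantize_weights(w, bits)
    print(f"bits={bits}, scale={scale:.5f}, mse={np.mean((w - w_q) ** 2):.2e}")
```

In this sketch the per-layer scaling factor plays the error-compensation role described in the abstract, and the clipping step mirrors the idea of bounding activations before quantization; the actual DPQ and RPC rules are defined in the paper itself.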



Code availability

The source code of the relevant algorithms is referenced in the paper. The code for reproducing the experiments will be made available by the authors upon request.


Acknowledgements

The authors would like to thank the anonymous reviewers for their invaluable comments. This work was partially funded by the National Natural Science Foundation of China under Grant No. 61975124, Shanghai Natural Science Foundation (20ZR1438500), State Key Laboratory of Computer Architecture (ICT, CAS) under Grant No. CARCHA202111, and Engineering Research Center of Software/Hardware Co-design Technology and Application, Ministry of Education, East China Normal University under Grant No. OP202202. Any opinions, findings and conclusions expressed in this paper are those of the authors and do not necessarily reflect the views of the sponsors.

Funding

This work was partially funded by the National Natural Science Foundation of China under Grant No. 61975124, Shanghai Natural Science Foundation (20ZR1438500), State Key Laboratory of Computer Architecture (ICT, CAS) under Grant No. CARCHA202111, and Engineering Research Center of Software/Hardware Co-design Technology and Application, Ministry of Education, East China Normal University under Grant No. OP202202.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by MC, XY, HX and WQ. The method was proposed by SP, JW and BZ. The first draft of the manuscript was written by SP and JW. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Songwen Pei.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Consent to participate

All authors agreed with the content and gave explicit consent to submit.

Consent for publication

All authors consent to the submission and publication of this manuscript.

Ethics approval

The manuscript has not been submitted to more than one journal for simultaneous consideration. This work is original and has not been published elsewhere in any form or language.

Additional information

Editors: Feida Zhu, Bin Yang, João Gama

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Pei, S., Wang, J., Zhang, B. et al. DPQ: dynamic pseudo-mean mixed-precision quantization for pruned neural network. Mach Learn 113, 4099–4112 (2024). https://doi.org/10.1007/s10994-023-06453-3


  • DOI: https://doi.org/10.1007/s10994-023-06453-3
