
DPQ: dynamic pseudo-mean mixed-precision quantization for pruned neural network


Abstract

Deep neural networks keep growing in depth and in the number of hyper-parameters, producing large-scale networks trained on massive amounts of data. However, such networks are difficult to deploy on resource-constrained edge devices. Mixed-precision quantization is a promising yet challenging way to prune and compress deep neural network models while discovering the optimal bit width for each layer. To address this challenge, we propose dynamic pseudo-mean mixed-precision quantization (DPQ), which introduces two-bit scaling factors to compensate for quantization errors. Furthermore, we propose an activation quantization scheme named random parameters clipping (RPC), which quantizes only part of the activations to reduce the loss of accuracy. DPQ dynamically adjusts the bit precision of weight quantization according to the distribution of the weights, yielding a quantization scheme that is more robust than previous methods. Extensive experiments demonstrate that DPQ achieves a 15.43\(\times\) compression rate for ResNet20 on the CIFAR-10 dataset with a 0.22% increase in accuracy, and a 35.25\(\times\) compression rate for ResNet56 on the SVHN dataset with a 0.12% increase in accuracy.
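To make the ingredients concrete, below is a minimal, self-contained Python/NumPy sketch of per-layer weight quantization with a scaling factor, a toy rule for picking a bit width from the weight distribution, and clipped activation quantization. It is an illustration only, not the authors' DPQ or RPC implementation: the function names, the bit-width heuristic, and the clipping threshold are assumptions made for the example.

```python
# Illustrative sketch only, not the authors' DPQ/RPC implementation.
# The bit-width heuristic, clipping threshold, and function names are
# assumptions used to show the general shape of per-layer mixed-precision
# weight quantization with a scaling factor and clipped activation quantization.
import numpy as np

def quantize_weights(w, bits):
    """Uniform symmetric quantization to `bits` bits with a per-layer scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax               # per-layer scaling factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale, scale                        # de-quantized weights, scale

def choose_bits(w, low=2, high=8):
    """Toy rule: widely spread weights get more bits (assumed heuristic)."""
    spread = np.std(w) / (np.mean(np.abs(w)) + 1e-12)
    return int(np.clip(round(low + spread), low, high))

def clip_and_quantize_activations(a, clip_val=6.0, bits=8):
    """Clip activations to [0, clip_val], then quantize uniformly."""
    a = np.clip(a, 0.0, clip_val)
    step = clip_val / (2 ** bits - 1)
    return np.round(a / step) * step

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.05, size=(64, 64))        # stand-in layer weights
    bits = choose_bits(w)
    w_q, scale = quantize_weights(w, bits)
    print(f"bits={bits}, scale={scale:.5f}, mse={np.mean((w - w_q) ** 2):.2e}")
```

In this sketch the per-layer scaling factor plays the error-compensation role described in the abstract, and the clipping step mirrors the idea of bounding activations before quantization; the actual DPQ and RPC rules are defined in the paper itself.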



Code availability

The source code of the relevant algorithms is referenced in the paper. The code for reproducing the experiments will be made available by the authors upon request.


Acknowledgements

The authors would like to thank the anonymous reviewers for their invaluable comments. This work was partially funded by the National Natural Science Foundation of China under Grant No. 61975124, Shanghai Natural Science Foundation (20ZR1438500), State Key Laboratory of Computer Architecture (ICT, CAS) under Grant No. CARCHA202111, and Engineering Research Center of Software/Hardware Co-design Technology and Application, Ministry of Education, East China Normal University under Grant No. OP202202. Any opinions, findings and conclusions expressed in this paper are those of the authors and do not necessarily reflect the views of the sponsors.

Funding

This work was partially funded by the National Natural Science Foundation of China under Grant No. 61975124, Shanghai Natural Science Foundation (20ZR1438500), State Key Laboratory of Computer Architecture (ICT, CAS) under Grant No. CARCHA202111, and Engineering Research Center of Software/Hardware Co-design Technology and Application, Ministry of Education, East China Normal University under Grant No. OP202202.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by MC, XY, HX and WQ. The method was proposed by SP, JW and BZ. The first draft of the manuscript was written by SP and JW. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Songwen Pei.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Consent to participate

All authors agreed with the content and gave explicit consent to submit.

Consent for publication

All authors consent to the submission and publication of this manuscript.

Ethics approval

The manuscript has not been submitted to more than one journal for simultaneous consideration. This work is original and has not been published elsewhere in any form or language.

Additional information

Editors: Feida Zhu, Bin Yang, João Gama

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Pei, S., Wang, J., Zhang, B. et al. DPQ: dynamic pseudo-mean mixed-precision quantization for pruned neural network. Mach Learn 113, 4099–4112 (2024). https://doi.org/10.1007/s10994-023-06453-3


  • DOI: https://doi.org/10.1007/s10994-023-06453-3
