Multiple Residual Quantization of Pruning

Zhou, Yuee; Kang, HaiDong; Zhang, Tian; Ma, LianBo; Xing, TieJun

doi:10.1007/978-981-19-9297-1_16

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1744)

Included in the following conference series:

International Conference on Data Mining and Big Data

Abstract

Model compression techniques compress deep neural networks by quantizing their full-precision weights into low-bit ones, which accelerates the network. However, most existing quantization methods rely on simple thresholding, which leads to serious precision loss. In this paper, we propose a new quantization framework combined with pruning, called Multiple Residual Quantization of Pruning (MRQP), to obtain a higher-precision quantized neural network (QNN). MRQP recursively quantizes the full-precision weights by repeatedly combining a low-bit weight stem with residual parts. This minimizes the error between the quantized and full-precision weights and thereby yields higher-precision quantization. At the same time, MRQP prunes weights that have less impact on the loss function to further reduce model size.
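The abstract states the idea but not the exact stem/residual quantizer or the pruning criterion, so the following is only a minimal sketch of one plausible reading, not the authors' MRQP. It assumes each quantization term is a sign code scaled by the mean absolute value of the current residual (a common residual binarization scheme), and it assumes a first-order saliency |w·g| as the "impact on the loss function", where `g` is a per-weight loss gradient supplied by the caller. All function names (`residual_quantize`, `dequantize`, `prune_mask`, `compress`) are hypothetical.

```python
from typing import List, Tuple


def residual_quantize(
    weights: List[float], num_terms: int
) -> Tuple[List[float], List[List[int]]]:
    """Approximate weights as sum_k alphas[k] * codes[k], codes[k][i] in {-1, +1}.

    Term 0 acts as the stem; each later term binarizes the residual left by
    the earlier terms. Hypothetical scheme, not necessarily the one in MRQP.
    """
    if num_terms < 1 or not weights:
        raise ValueError("need at least one term and one weight")
    residual = list(weights)
    alphas: List[float] = []
    codes: List[List[int]] = []
    for _ in range(num_terms):
        # Sign code of the current residual (0 is mapped to +1).
        code = [1 if r >= 0 else -1 for r in residual]
        # For a fixed sign code, mean |r| is the least-squares optimal scale.
        alpha = sum(abs(r) for r in residual) / len(residual)
        alphas.append(alpha)
        codes.append(code)
        # Remove what this term explains; the next term quantizes the rest.
        residual = [r - alpha * c for r, c in zip(residual, code)]
    return alphas, codes


def dequantize(alphas: List[float], codes: List[List[int]]) -> List[float]:
    """Rebuild approximate weights by summing the stem and residual terms."""
    n = len(codes[0])
    return [sum(a * code[i] for a, code in zip(alphas, codes)) for i in range(n)]


def prune_mask(weights: List[float], grads: List[float], sparsity: float) -> List[bool]:
    """Keep the weights with the largest assumed saliency |w * g|.

    |w * g| is a first-order estimate of the loss change if w is zeroed;
    using it here is an assumption, not the paper's stated criterion.
    """
    scores = [abs(w * g) for w, g in zip(weights, grads)]
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    return [i not in pruned for i in range(len(weights))]


def compress(
    weights: List[float], grads: List[float], sparsity: float, num_terms: int
) -> List[float]:
    """Prune first, then residual-quantize only the surviving weights."""
    mask = prune_mask(weights, grads, sparsity)
    kept = [w for w, keep in zip(weights, mask) if keep]
    if not kept:
        return [0.0] * len(weights)
    alphas, codes = residual_quantize(kept, num_terms)
    approx = iter(dequantize(alphas, codes))
    # Pruned positions stay exactly zero; kept positions get their approximation.
    return [next(approx) if keep else 0.0 for keep in mask]


if __name__ == "__main__":
    w = [0.9, -0.4, 0.05, -1.2, 0.3, 0.02]
    for k in (1, 2, 3):
        alphas, codes = residual_quantize(w, k)
        err = sum((x - y) ** 2 for x, y in zip(w, dequantize(alphas, codes)))
        print(k, "terms, squared error:", round(err, 4))
    print(compress(w, grads=[0.1] * 6, sparsity=0.5, num_terms=2))
```

Because the scale is least-squares optimal for a fixed sign code, each added term changes the squared error from ‖r‖² to ‖r‖² − nα², so under this sketch the reconstruction error never grows as residual terms are added. Quantizing only the surviving weights keeps pruned positions exactly zero instead of letting the sign code turn them back into ±α.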

Similar content being viewed by others

HFPQ: deep neural network compression by hardware-friendly pruning-quantization

Article 23 February 2021

DPQ: dynamic pseudo-mean mixed-precision quantization for pruned neural network

Article 31 January 2024

Bit-Quantized-Net: An Effective Method for Compressing Deep Neural Networks

Article 20 January 2021


Acknowledgments

This work is partially supported by the NSFC under Grant No. 62172083 and by the Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

College of Software, Northeastern University, Shenyang, China
Yuee Zhou, HaiDong Kang, Tian Zhang & LianBo Ma
Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China
LianBo Ma
Neusoft Corporation, Shenyang, China
TieJun Xing

Corresponding author

Correspondence to LianBo Ma.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Ying Tan
Southern University of Science and Technology, Shenzhen, China
Yuhui Shi

About this paper

Cite this paper

Zhou, Y., Kang, H., Zhang, T., Ma, L., Xing, T. (2022). Multiple Residual Quantization of Pruning. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2022. Communications in Computer and Information Science, vol 1744. Springer, Singapore. https://doi.org/10.1007/978-981-19-9297-1_16

DOI: https://doi.org/10.1007/978-981-19-9297-1_16
Published: 20 January 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-9296-4
Online ISBN: 978-981-19-9297-1
eBook Packages: Computer Science, Computer Science (R0)
