Abstract
Deep convolutional neural networks (DCNNs) contain millions of parameters and require a tremendous amount of computation, so they are poorly suited to resource-constrained edge devices. We propose a two-stage model compression method, channel pruning and group vector quantization (CP-GVQ), to alleviate this restriction. Channel pruning removes many channels from the DCNN's layers, reducing the model size and improving inference speed. GVQ, which builds on vector quantization (VQ), represents the parameters of grouped layers with shared group codebooks and code matrices, greatly reducing the model size. Together, the two stages both dramatically decrease model size and improve inference speed; after each stage, the model is fine-tuned to recover the original accuracy. When applied to a model that classifies filament indices in microscopic images of activated sludge, classification accuracy dropped marginally from 0.99 to 0.97, while the model size decreased by 99% and inference speed improved by 42%.
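The quantization stage described above can be illustrated with a minimal sketch: a layer's weights are split into short sub-vectors, a codebook is learned (here with plain k-means, standing in for the codebook learning the paper describes), and each sub-vector is replaced by the index of its nearest codeword. All names, sizes, and the k-means choice below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def vq_compress(weights, dim=4, k=16, iters=20, seed=0):
    """Quantize a flat weight array into a (codebook, codes) pair.

    weights : 1-D array whose length is a multiple of `dim`
    dim     : length of each sub-vector
    k       : number of codewords in the codebook
    """
    rng = np.random.default_rng(seed)
    vecs = weights.reshape(-1, dim)                    # split into sub-vectors
    codebook = vecs[rng.choice(len(vecs), k, replace=False)]
    for _ in range(iters):                             # plain k-means
        dist = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        codes = dist.argmin(1)                         # nearest codeword index
        for j in range(k):                             # update non-empty clusters
            mask = codes == j
            if mask.any():
                codebook[j] = vecs[mask].mean(0)
    return codebook, codes

def vq_decompress(codebook, codes):
    """Rebuild an approximate weight array from codebook and code indices."""
    return codebook[codes].reshape(-1)

w = np.random.default_rng(1).normal(size=1024).astype(np.float32)
cb, codes = vq_compress(w, dim=4, k=16)
w_hat = vq_decompress(cb, codes)
# Storage drops from 1024 floats to a 16x4 codebook plus 256 small indices.
```

In the paper's GVQ, the same idea is applied per group of layers, so several layers share one codebook and are stored only as code matrices; the sketch above shows the single-layer case for clarity.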
Data availability
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the National Key R&D Program of China under Grant 2018YFB1700200, the 2020 Support Plan for Innovative Talents of Higher Education, and the 2021 Basic Scientific Research Projects of Higher Education (Key Projects) under Grant LJKZ0422.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Huang, M., Liu, Y., Zhao, L. et al. A lightweight deep neural network model and its applications based on channel pruning and group vector quantization. Neural Comput & Applic 36, 5333–5346 (2024). https://doi.org/10.1007/s00521-023-09332-z