Fragmented Huffman-Based Compression Methodology for CNN Targeting Resource-Constrained Edge Devices

Pal, Chandrajit; Pankaj, Sunil; Akram, Wasim; Biswas, Dwaipayan; Mattela, Govardhan; Acharyya, Amit

doi:10.1007/s00034-022-01968-x

Published: 07 February 2022

Volume 41, pages 3957–3984 (2022)

Circuits, Systems, and Signal Processing

Chandrajit Pal ORCID: orcid.org/0000-0002-0576-8014¹,
Sunil Pankaj¹,
Wasim Akram¹,
Dwaipayan Biswas²,
Govardhan Mattela¹ &
Amit Acharyya¹

Abstract

In this paper, we introduce a fragmented Huffman compression methodology for compressing convolutional neural networks executing on edge devices. Current applications demand the deployment of deep networks on edge devices in order to meet requirements for low latency, enhanced security, and long-term cost effectiveness. However, the primary bottleneck is the large memory footprint of neural network models. Software implementations of deep compression strategies already exist, in which Huffman compression is applied to the quantized weights to reduce the size of the deep neural network model. The memory footprint can nevertheless be compressed further from a hardware design perspective in edge devices, where our proposed methodology can complement the existing strategies. With this motivation, we propose a fragmented Huffman coding methodology that can be applied to the binary equivalent of the numeric weights of a neural network model stored in device memory. We also introduce static and dynamic storage methodologies for the device memory space that is left over after the compressed file has been stored; the dynamic storage methodology reduces area and energy consumption by approximately 38% in comparison with the static one. To the best of our knowledge, this is the first study in which the Huffman compression technique has been revisited by applying it to compress binary files, from a hardware design perspective, based on multiple bit pattern sequences, achieving a maximum compression rate of 64%. A compressed hardware memory architecture and a decompression module have also been designed and synthesized at 500 MHz using the GF 40-nm low-power cell library with a nominal voltage of 1.1 V, achieving a 62% reduction in dynamic power consumption with a decompression time of about 63 microseconds (μs) without trading off accuracy.
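The general idea of applying Huffman coding to bit patterns of a binary weight stream, rather than to the numeric weights themselves, can be illustrated with a minimal sketch. This is not the authors' fragmented Huffman scheme, whose details are not given in the abstract; it is plain Huffman coding over fixed-width fragments of a bit string. The function names, the fragment widths, and the 8-bit sample weights are all assumptions made for illustration, and the cost of storing the code table is ignored.

```python
import heapq
from collections import Counter
from itertools import count


def fragments(bits, width):
    """Split a bit string into fixed-width fragments (the last may be shorter)."""
    return [bits[i:i + width] for i in range(0, len(bits), width)]


def huffman_code(symbols):
    """Build a prefix-free Huffman code table {symbol: codeword}."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(n, next(tie), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, t0 = heapq.heappop(heap)
        n1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t0.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]


def encode(bits, width):
    """Huffman-encode a bit string using fragments of the given width."""
    frags = fragments(bits, width)
    table = huffman_code(frags)
    return "".join(table[f] for f in frags), table


def decode(encoded, table):
    """Invert encode() by matching codewords bit by bit."""
    inverse = {c: s for s, c in table.items()}
    out, cur = [], ""
    for b in encoded:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)


# Hypothetical 8-bit quantized weights, serialized to their binary equivalent.
weights = [3, 3, 3, 3, 7, 7, 1, 0]
bits = "".join(format(w, "08b") for w in weights)

# Try several fragment widths and compare payload sizes (table overhead ignored).
for width in (2, 4, 8):
    payload, table = encode(bits, width)
    assert decode(payload, table) == bits
    print(width, len(bits), len(payload))
```

Comparing payload lengths across fragment widths shows how the choice of bit pattern length affects the achievable compression on a given weight file; a real design would also have to account for the code table and the decoder hardware.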

Data Availability

The weights of the four neural network models shown in Table 5 are publicly available at the following links: https://tinyurl.com/yyr7bfy6, https://tinyurl.com/yyk64hos.

Acknowledgements

The authors would like to acknowledge the support extended by the Defence Research and Development Organization, Ministry of Defence, Government of India, under Grant reference ERIPR/ER/202009001/M/01/1781 dated 8 February 2021 for the research project entitled “Reconfigurable Machine Learning Accelerator Design and Development for Avionics Applications.” The authors would also like to acknowledge the support received from the Ministry of Electronics and Information Technology (MEITY), Government of India, toward the usage of the CAD tools as part of the Special Manpower Development (SMDP) program. The authors would also like to thank Ceremorphic Technologies Private Limited for funding and for extending tool support for carrying out a few experiments.

Author information

Authors and Affiliations

IIT Hyderabad, Sangareddy, India
Chandrajit Pal, Sunil Pankaj, Wasim Akram, Govardhan Mattela & Amit Acharyya
IMEC, Leuven, Belgium
Dwaipayan Biswas

Corresponding author

Correspondence to Chandrajit Pal.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pal, C., Pankaj, S., Akram, W. et al. Fragmented Huffman-Based Compression Methodology for CNN Targeting Resource-Constrained Edge Devices. Circuits Syst Signal Process 41, 3957–3984 (2022). https://doi.org/10.1007/s00034-022-01968-x

Received: 02 January 2020
Revised: 14 January 2022
Accepted: 15 January 2022
Published: 07 February 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s00034-022-01968-x
