A unified efficient deep image compression framework and its application on human-centric Task

Chen, Xueyuan; Hu, Zhihao; Lu, Guo; Liu, Jiaheng

doi:10.1007/s11042-023-17696-6

1232: Human-centric Multimedia Analysis
Published: 14 December 2023

Multimedia Tools and Applications

Abstract

Image compression is a widely used technique for reducing spatial redundancy in images. Recently, learning-based image compression has made significant progress by exploiting the representation power of neural networks. However, current learning-based methods suffer from high computational cost, which limits their use in practical applications. In this paper, we propose a unified framework called Efficient Deep Image Compression (EDIC), built on three components: a channel attention module, a Gaussian mixture model, and a decoder-side enhancement module. Specifically, we design an auto-encoder style network for learning-based image compression. To improve coding efficiency, the channel attention module exploits the relationships between channels of the latent representation. The Gaussian mixture model serves as the entropy model and improves the accuracy of bitrate estimation. The decoder-side enhancement module further improves compression performance. EDIC can also be readily incorporated into the Deep Video Compression (DVC) framework (Lu et al. 2019) to improve video compression performance. EDIC boosts coding performance significantly at the cost of only a slight increase in computation. Experimental results demonstrate that the proposed approach outperforms current image compression methods and, in the best case, decodes more than 150 times faster than Minnen’s method (Minnen et al. 2018). We also evaluate performance on a human-centric task (face recognition) under different coding strategies.
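
To make the role of a Gaussian mixture entropy model concrete, the following is a minimal sketch of how the bit cost of quantized latent symbols is commonly estimated in learned compression. It is not taken from the EDIC paper, whose exact formulation is not shown in this preview. The function names and the mixture parameters (`weights`, `means`, `scales`) are hypothetical; in a real codec the parameters would be predicted by the network.

```python
import math


def _std_normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gmm_symbol_probability(y, weights, means, scales):
    """Probability mass of integer symbol y under a discretized Gaussian mixture.

    Each component contributes the Gaussian mass on [y - 0.5, y + 0.5],
    weighted by its mixture weight (weights are assumed to sum to 1).
    """
    p = 0.0
    for w, mu, s in zip(weights, means, scales):
        upper = _std_normal_cdf((y + 0.5 - mu) / s)
        lower = _std_normal_cdf((y - 0.5 - mu) / s)
        p += w * (upper - lower)
    # Floor the probability so the log below stays finite (an assumed safeguard).
    return max(p, 1e-9)


def estimate_bits(symbols, weights, means, scales):
    """Estimated code length in bits: sum of -log2 p(y) over all symbols."""
    return sum(
        -math.log2(gmm_symbol_probability(y, weights, means, scales))
        for y in symbols
    )


# Hypothetical two-component mixture for a single latent element.
weights, means, scales = [0.6, 0.4], [0.0, 3.0], [1.0, 2.0]
bits = estimate_bits([0, 1, 3, -2], weights, means, scales)
```

The more accurately the mixture matches the true distribution of the latents, the smaller this estimate becomes, which is why a more flexible entropy model can lower the bitrate at the same reconstruction quality.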

Data Availability

The datasets generated during and/or analysed during the current study are available online.

Notes

https://github.com/JooyoungLeeETRI/CA_Entropy_Model

References

Kodak E (2018) Kodak lossless true color image suite (PhotoCD PCD0992). http://r0k.us/graphics/kodak/
Bellard F (2018) BPG image format. http://bellard.org/bpg/, accessed: 30 Oct 2018
WebP (2018). https://developers.google.com/speed/webp/, accessed: 30 Oct 2018
x264 (2018a) The best H.264/AVC encoder. https://www.videolan.org/developers/x264.html, accessed: 30 Oct 2018
x265 (2018b) HEVC encoder / H.265 video codec. http://x265.org, accessed: 30 Oct 2018
Agustsson E, Mentzer F, Tschannen M, et al (2017) Soft-to-hard vector quantization for end-to-end learning compressible representations. In: NIPS, pp 1141–1151
Agustsson E, Tschannen M, Mentzer F, et al (2018) Generative adversarial networks for extreme learned image compression. arXiv:1804.02958
Baig MH, Koltun V, Torresani L (2017) Learning to inpaint for image compression. In: NIPS, pp 1246–1255
Ballé J, Laparra V, Simoncelli EP (2015) Density modeling of images using a generalized normalization transformation. arXiv:1511.06281
Ballé J, Laparra V, Simoncelli EP (2017) End-to-end optimized image compression. In: 5th International conference on learning representations, ICLR
Ballé J, Minnen D, Singh S, et al (2018) Variational image compression with a scale hyperprior. In: 6th International conference on learning representations, ICLR
Cao Q, Shen L, Xie W, et al (2018) Vggface2: A dataset for recognising faces across pose and age. In: IEEE International conference on automatic face & gesture recognition. IEEE, pp 67–74
Chamain LD, Racapé F, Bégaint J, et al (2021) End-to-end optimized image compression for machines, a study. In: 2021 Data compression conference (DCC). IEEE, pp 163–172
Chen T, Liu H, Ma Z, et al (2019) Neural image compression via non-local attention optimization and improved context modeling. arXiv:1910.06244
Cheng Z, Sun H, Takeuchi M, et al (2019) Learning image and video compression through spatial-temporal energy compaction. In: Proceedings of the IEEE conference on computer vision and pattern recognition, CVPR, pp 10,071–10,080
Cheng Z, Sun H, Takeuchi M, et al (2020) Learned image compression with discretized gaussian mixture likelihoods and attention modules. arXiv:2001.01568
Choi Y, El-Khamy M, Lee J (2019) Variable rate deep image compression with a conditional autoencoder. In: Proceedings of the IEEE international conference on computer vision, pp 3146–3154
Deng J, Guo J, Xue N, et al (2019) Arcface: Additive angular margin loss for deep face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4690–4699
Djelouah A, Campos J, Schaub-Meyer S, et al (2019) Neural inter-frame compression for video coding. In: Proceedings of the IEEE International conference on computer vision, pp 6421–6429
Duan L, Liu J, Yang W, et al (2020) Video coding for machines: A paradigm of collaborative compression and intelligent analytics. Trans Img Proc 29:8680–8695
Guo Y, Zhang L, Hu Y, et al (2016) Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In: European conference on computer vision, pp 87–102
Habibian A, Rozendaal Tv, Tomczak JM, et al (2019) Video compression with rate-distortion autoencoders. In: Proceedings of the IEEE international conference on computer vision, pp 7033–7042
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Hu Y, Yang S, Yang W, et al (2020) Towards coding for human and machine vision: A scalable image coding approach. In: 2020 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1–6
Johnston N, Vincent D, Minnen D, et al (2018) Improved lossy image compression with priming and spatially adaptive bit rates for recurrent networks. In: CVPR
Kemelmacher-Shlizerman I, Seitz SM, Miller D, et al (2016) The megaface benchmark: 1 million faces for recognition at scale. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4873–4882
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980
Lee J, Cho S, Beack SK (2018) Context-adaptive entropy model for end-to-end optimized image compression. arXiv:1809.10452
Lee J, Cho S, Kim M (2019) A hybrid architecture of jointly learning image compression and quality enhancement with improved entropy minimization. arXiv:1912.12817
Li M, Zuo W, Gu S, et al (2018) Learning convolutional networks for content-weighted image compression. In: CVPR
Lim B, Son S, Kim H, et al (2017) Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 136–144
Liu H, Chen T, Guo P, et al (2019) Non-local attention optimized deep image compression. arXiv:1904.09757
Lu G, Ouyang W, Xu D, et al (2019) DVC: An end-to-end deep video compression framework. In: Proceedings of the IEEE conference on computer vision and pattern recognition, CVPR, pp 11,006–11,015
Mentzer F, Agustsson E, Tschannen M, et al (2018) Conditional probability models for deep image compression. In: CVPR, 2, p 3
Minnen D, Ballé J, Toderici GD (2018) Joint autoregressive and hierarchical priors for learned image compression. In: Advances in neural information processing systems, pp 10,771–10,780
Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp 807–814
Paszke A, Gross S, Massa F, et al (2019) PyTorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems 32. Curran Associates, Inc., pp 8024–8035
Ranjan A, Black MJ (2017) Optical flow estimation using a spatial pyramid network. In: The IEEE Conference on computer vision and pattern recognition (CVPR)
Rippel O, Bourdev L (2017) Real-time adaptive image compression. In: ICML
Sandler M, Howard A, Zhu M, et al (2018) Mobilenetv2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4510–4520
Skodras A, Christopoulos C, Ebrahimi T (2001) The JPEG 2000 still image compression standard. IEEE Signal Process Mag 18(5):36–58
Sullivan GJ, Ohm JR, Han WJ, et al (2012) Overview of the high efficiency video coding (HEVC) standard. TCSVT 22(12):1649–1668
Theis L, Shi W, Cunningham A, et al (2017) Lossy image compression with compressive autoencoders. In: 5th International conference on learning representations, ICLR
Toderici G, O’Malley SM, Hwang SJ, et al (2016) Variable rate image compression with recurrent neural networks. In: 4th International conference on learning representations, ICLR
Toderici G, Vincent D, Johnston N, et al (2017) Full resolution image compression with recurrent neural networks. In: CVPR, pp 5435–5443
Wallace GK (1992) The jpeg still picture compression standard. IEEE Transactions on Consumer Electronics 38(1):xviii–xxxiv
Wang X, Girshick R, Gupta A, et al (2018) Non-local neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7794–7803
Wang Z, Simoncelli E, Bovik A, et al (2003) Multi-scale structural similarity for image quality assessment. In: Asilomar conference on signals, systems and computers. IEEE, pp 1398–1402
Witten IH, Neal RM, Cleary JG (1987) Arithmetic coding for data compression. Communications of the ACM 30(6):520–540
Wu CY, Singhal N, Krahenbuhl P (2018) Video compression through image interpolation. In: ECCV
Xue T, Chen B, Wu J et al (2019) Video enhancement with task-oriented flow. International Journal of Computer Vision, IJCV 127(8):1106–1125
Yang F, Wang Y, Herranz L et al (2022) A novel framework for image-to-image translation and image compression. Neurocomputing 508:58–70

Author information

Authors and Affiliations

Beihang University, Beijing, China
Xueyuan Chen, Zhihao Hu & Jiaheng Liu
Shanghai Jiao Tong University, Shanghai, China
Guo Lu

Corresponding author

Correspondence to Jiaheng Liu.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, X., Hu, Z., Lu, G. et al. A unified efficient deep image compression framework and its application on human-centric Task. Multimed Tools Appl (2023). https://doi.org/10.1007/s11042-023-17696-6

Received: 13 June 2023
Revised: 04 September 2023
Accepted: 21 November 2023
Published: 14 December 2023
DOI: https://doi.org/10.1007/s11042-023-17696-6
