Abstract
Image compression is a widely used technique to reduce the spatial redundancy in images. Recently, learning-based image compression has made significant progress by exploiting the representation power of neural networks. However, current learning-based methods incur a high computational cost, which limits their use in practical applications. In this paper, we propose a unified framework called Efficient Deep Image Compression (EDIC), built on three components: a channel attention module, a Gaussian mixture model and a decoder-side enhancement module. Specifically, we design an auto-encoder style network for learning-based image compression. To improve coding efficiency, the channel attention module exploits the relationships between channels of the latent representation. The Gaussian mixture model serves as the entropy model and improves the accuracy of bitrate estimation. The decoder-side enhancement module further improves compression performance. EDIC can also be readily incorporated into the Deep Video Compression (DVC) framework (Lu et al. 2019) to improve video compression performance. EDIC improves coding performance significantly at only a slight increase in computational cost. Experimental results demonstrate that the proposed approach outperforms current image compression methods and, in the best case, decodes more than 150 times faster than Minnen’s method (Minnen et al. 2018). We also evaluate performance on a human-centric task (face recognition) under different coding strategies.
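To make the role of a Gaussian mixture entropy model in bitrate estimation concrete, the following is a minimal sketch and not the authors' implementation. It assumes the common discretized-likelihood formulation, in which each quantized latent symbol is assigned the mixture probability mass over a unit-width bin and the estimated rate is the negative base-2 logarithm of that mass. The function names, the `floor` parameter and the example mixture parameters are hypothetical; in a learned codec the weights, means and scales would be predicted by the network for each latent element.

```python
import math


def _std_normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gmm_symbol_probability(y, weights, means, scales, floor=1e-9):
    """Probability mass of integer symbol y under a discretized Gaussian mixture.

    Each component contributes the Gaussian mass on [y - 0.5, y + 0.5].
    `floor` (an assumption of this sketch) keeps the logarithm finite.
    """
    p = 0.0
    for w, mu, s in zip(weights, means, scales):
        upper = _std_normal_cdf((y + 0.5 - mu) / s)
        lower = _std_normal_cdf((y - 0.5 - mu) / s)
        p += w * (upper - lower)
    return max(p, floor)


def estimate_bits(symbols, weights, means, scales):
    """Estimated number of bits needed to code a sequence of quantized symbols."""
    return sum(
        -math.log2(gmm_symbol_probability(y, weights, means, scales))
        for y in symbols
    )


if __name__ == "__main__":
    # Hypothetical two-component mixture shared by all symbols.
    weights, means, scales = [0.6, 0.4], [0.0, 3.0], [1.0, 2.0]
    latents = [0, 1, 3, -2]
    print(f"estimated bits: {estimate_bits(latents, weights, means, scales):.3f}")
```

Symbols close to a mixture mean receive high probability and therefore cost few bits, while unlikely symbols cost more; a mixture can fit multi-modal latent distributions that a single Gaussian cannot, which is the usual motivation for using it as an entropy model.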
Data Availability
The datasets generated during and/or analysed during the current study are available online.
References
Kodak E (2018) Kodak lossless true color image suite (PhotoCD PCD0992). http://r0k.us/graphics/kodak/
Bellard F (2018) BPG image format. http://bellard.org/bpg/, accessed: 30 Oct 2018
WebP (2018) https://developers.google.com/speed/webp/, accessed: 30 Oct 2018
x264 (2018a) The best H.264/AVC encoder. https://www.videolan.org/developers/x264.html, accessed: 30 Oct 2018
x265 (2018b) HEVC encoder / H.265 video codec. http://x265.org, accessed: 30 Oct 2018
Agustsson E, Mentzer F, Tschannen M, et al (2017) Soft-to-hard vector quantization for end-to-end learning compressible representations. In: NIPS, pp 1141–1151
Agustsson E, Tschannen M, Mentzer F, et al (2018) Generative adversarial networks for extreme learned image compression. arXiv:1804.02958
Baig MH, Koltun V, Torresani L (2017) Learning to inpaint for image compression. In: NIPS, pp 1246–1255
Ballé J, Laparra V, Simoncelli EP (2015) Density modeling of images using a generalized normalization transformation. arXiv:1511.06281
Ballé J, Laparra V, Simoncelli EP (2017) End-to-end optimized image compression. In: 5th International conference on learning representations, ICLR
Ballé J, Minnen D, Singh S, et al (2018) Variational image compression with a scale hyperprior. In: 6th International conference on learning representations, ICLR
Cao Q, Shen L, Xie W, et al (2018) Vggface2: A dataset for recognising faces across pose and age. In: IEEE International conference on automatic face & gesture recognition. IEEE, pp 67–74
Chamain LD, Racapé F, Bégaint J, et al (2021) End-to-end optimized image compression for machines, a study. In: 2021 Data compression conference (DCC). IEEE, pp 163–172
Chen T, Liu H, Ma Z, et al (2019) Neural image compression via non-local attention optimization and improved context modeling. arXiv:1910.06244
Cheng Z, Sun H, Takeuchi M, et al (2019) Learning image and video compression through spatial-temporal energy compaction. In: Proceedings of the IEEE conference on computer vision and pattern recognition, CVPR, pp 10,071–10,080
Cheng Z, Sun H, Takeuchi M, et al (2020) Learned image compression with discretized gaussian mixture likelihoods and attention modules. arXiv:2001.01568
Choi Y, El-Khamy M, Lee J (2019) Variable rate deep image compression with a conditional autoencoder. In: Proceedings of the IEEE international conference on computer vision, pp 3146–3154
Deng J, Guo J, Xue N, et al (2019) Arcface: Additive angular margin loss for deep face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4690–4699
Djelouah A, Campos J, Schaub-Meyer S, et al (2019) Neural inter-frame compression for video coding. In: Proceedings of the IEEE International conference on computer vision, pp 6421–6429
Duan L, Liu J, Yang W, et al (2020) Video coding for machines: A paradigm of collaborative compression and intelligent analytics. IEEE Trans Image Process 29:8680–8695
Guo Y, Zhang L, Hu Y, et al (2016) Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In: European conference on computer vision, pp 87–102
Habibian A, Rozendaal Tv, Tomczak JM, et al (2019) Video compression with rate-distortion autoencoders. In: Proceedings of the IEEE international conference on computer vision, pp 7033–7042
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Hu Y, Yang S, Yang W, et al (2020) Towards coding for human and machine vision: A scalable image coding approach. In: 2020 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1–6
Johnston N, Vincent D, Minnen D, et al (2018) Improved lossy image compression with priming and spatially adaptive bit rates for recurrent networks. In: CVPR
Kemelmacher-Shlizerman I, Seitz SM, Miller D, et al (2016) The megaface benchmark: 1 million faces for recognition at scale. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4873–4882
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980
Lee J, Cho S, Beack SK (2018) Context-adaptive entropy model for end-to-end optimized image compression. arXiv:1809.10452
Lee J, Cho S, Kim M (2019) A hybrid architecture of jointly learning image compression and quality enhancement with improved entropy minimization. arXiv:1912.12817
Li M, Zuo W, Gu S, et al (2018) Learning convolutional networks for content-weighted image compression. In: CVPR
Lim B, Son S, Kim H, et al (2017) Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 136–144
Liu H, Chen T, Guo P, et al (2019) Non-local attention optimized deep image compression. arXiv:1904.09757
Lu G, Ouyang W, Xu D, et al (2019) DVC: An end-to-end deep video compression framework. In: Proceedings of the IEEE conference on computer vision and pattern recognition, CVPR, pp 11,006–11,015
Mentzer F, Agustsson E, Tschannen M, et al (2018) Conditional probability models for deep image compression. In: CVPR, 2, p 3
Minnen D, Ballé J, Toderici GD (2018) Joint autoregressive and hierarchical priors for learned image compression. In: Advances in neural information processing systems, pp 10,771–10,780
Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp 807–814
Paszke A, Gross S, Massa F, et al (2019) Pytorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems 32. Curran Associates, Inc., pp 8024–8035
Ranjan A, Black MJ (2017) Optical flow estimation using a spatial pyramid network. In: The IEEE Conference on computer vision and pattern recognition (CVPR)
Rippel O, Bourdev L (2017) Real-time adaptive image compression. In: ICML
Sandler M, Howard A, Zhu M, et al (2018) Mobilenetv2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4510–4520
Skodras A, Christopoulos C, Ebrahimi T (2001) The jpeg 2000 still image compression standard. IEEE Signal Process Mag 18(5):36–58
Sullivan GJ, Ohm JR, Han WJ, et al (2012) Overview of the high efficiency video coding (HEVC) standard. TCSVT 22(12):1649–1668
Theis L, Shi W, Cunningham A, et al (2017) Lossy image compression with compressive autoencoders. In: 5th International conference on learning representations, ICLR
Toderici G, O’Malley SM, Hwang SJ, et al (2016) Variable rate image compression with recurrent neural networks. In: 4th International conference on learning representations, ICLR
Toderici G, Vincent D, Johnston N, et al (2017) Full resolution image compression with recurrent neural networks. In: CVPR, pp 5435–5443
Wallace GK (1992) The jpeg still picture compression standard. IEEE Transactions on Consumer Electronics 38(1):xviii–xxxiv
Wang X, Girshick R, Gupta A, et al (2018) Non-local neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7794–7803
Wang Z, Simoncelli E, Bovik A, et al (2003) Multi-scale structural similarity for image quality assessment. In: Asilomar conference on signals, systems and computers. IEEE, pp 1398–1402
Witten IH, Neal RM, Cleary JG (1987) Arithmetic coding for data compression. Communications of the ACM 30(6):520–540
Wu CY, Singhal N, Krahenbuhl P (2018) Video compression through image interpolation. In: ECCV
Xue T, Chen B, Wu J, et al (2019) Video enhancement with task-oriented flow. International Journal of Computer Vision, IJCV 127(8):1106–1125
Yang F, Wang Y, Herranz L, et al (2022) A novel framework for image-to-image translation and image compression. Neurocomputing 508:58–70
Ethics declarations
Conflicts of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Chen, X., Hu, Z., Lu, G. et al. A unified efficient deep image compression framework and its application on human-centric Task. Multimed Tools Appl (2023). https://doi.org/10.1007/s11042-023-17696-6
DOI: https://doi.org/10.1007/s11042-023-17696-6