A unified efficient deep image compression framework and its application on human-centric Task

  • 1232: Human-centric Multimedia Analysis
Multimedia Tools and Applications

Abstract

Image compression is a widely used technique for reducing the spatial redundancy in images. Recently, learning-based image compression has made significant progress by exploiting the representation ability of neural networks. However, current learning-based methods incur a huge computational cost, which limits their practical use. In this paper, we propose a unified framework called Efficient Deep Image Compression (EDIC), built on three new components: a channel attention module, a Gaussian mixture model, and a decoder-side enhancement module. Specifically, we design an auto-encoder style network for learning-based image compression. To improve coding efficiency, the channel attention module exploits the relationships between channels of the latent representations. The Gaussian mixture model serves as the entropy model and improves the accuracy of bitrate estimation. The decoder-side enhancement module further improves compression performance. EDIC can also be readily incorporated into the Deep Video Compression (DVC) framework (Lu et al. 2019) to further improve video compression performance. EDIC boosts coding performance significantly at only a slight increase in computational cost. Experimental results demonstrate that the proposed approach outperforms current image compression methods and, at best, decodes more than 150 times faster than Minnen’s method (Minnen et al. 2018). We also evaluate performance on a human-centric task (i.e., face recognition) under different coding strategies.
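
As an illustration of how a Gaussian mixture entropy model yields a bitrate estimate, the following minimal Python sketch computes the probability mass that a mixture assigns to each quantized latent value and sums the negative log-likelihoods in bits. It is not the EDIC implementation: the function names, the unit-width quantization bins, the probability floor, and the example parameters are all assumptions made for illustration. In a learned codec the mixture parameters would be predicted by the network rather than fixed by hand.

```python
import math


def gaussian_cdf(x, mean, scale):
    """Cumulative distribution function of a Gaussian N(mean, scale^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def gmm_bin_probability(y, weights, means, scales, floor=1e-9):
    """Mass a Gaussian mixture assigns to the bin [y - 0.5, y + 0.5].

    Assumption: latents are rounded to integers, so each symbol covers a
    unit-width bin. `floor` (hypothetical value) keeps the log finite.
    """
    mass = sum(
        w * (gaussian_cdf(y + 0.5, m, s) - gaussian_cdf(y - 0.5, m, s))
        for w, m, s in zip(weights, means, scales)
    )
    return max(mass, floor)


def estimated_bits(latents, weights, means, scales):
    """Estimated code length in bits: sum of -log2 p(y) over all latents."""
    return sum(
        -math.log2(gmm_bin_probability(y, weights, means, scales))
        for y in latents
    )


if __name__ == "__main__":
    # Toy bimodal latents (made-up values, not data from the paper).
    latents = [-4, 4, -4, 4]
    single = estimated_bits(latents, [1.0], [0.0], [1.0])
    mixture = estimated_bits(latents, [0.5, 0.5], [-4.0, 4.0], [1.0, 1.0])
    print(f"single Gaussian: {single:.2f} bits, mixture: {mixture:.2f} bits")
```

On this toy input the two-component mixture places its modes on the data and therefore assigns a shorter estimated code length than a single Gaussian centred at zero, which is the intuition behind using a mixture for the entropy model.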

Data Availability

The datasets generated during and/or analysed during the current study are available online.

Notes

  1. https://github.com/JooyoungLeeETRI/CA_Entropy_Model

References

  1. Kodak E (2018) Kodak lossless true color image suite (PhotoCD PCD0992). http://r0k.us/graphics/kodak/

  2. Bellard F (2018) BPG image format. http://bellard.org/bpg/, accessed: 30 Oct 2018

  3. WebP (2018) https://developers.google.com/speed/webp/, accessed: 30 Oct 2018

  4. x264 (2018a) The best H.264/AVC encoder. https://www.videolan.org/developers/x264.html, accessed: 30 Oct 2018

  5. x265 (2018b) HEVC encoder / H.265 video codec. http://x265.org, accessed: 30 Oct 2018

  6. Agustsson E, Mentzer F, Tschannen M, et al (2017) Soft-to-hard vector quantization for end-to-end learning compressible representations. In: NIPS, pp 1141–1151

  7. Agustsson E, Tschannen M, Mentzer F, et al (2018) Generative adversarial networks for extreme learned image compression. arXiv:1804.02958

  8. Baig MH, Koltun V, Torresani L (2017) Learning to inpaint for image compression. In: NIPS, pp 1246–1255

  9. Ballé J, Laparra V, Simoncelli EP (2015) Density modeling of images using a generalized normalization transformation. arXiv:1511.06281

  10. Ballé J, Laparra V, Simoncelli EP (2017) End-to-end optimized image compression. In: 5th International conference on learning representations, ICLR

  11. Ballé J, Minnen D, Singh S, et al (2018) Variational image compression with a scale hyperprior. In: 6th International conference on learning representations, ICLR

  12. Cao Q, Shen L, Xie W, et al (2018) VGGFace2: A dataset for recognising faces across pose and age. In: IEEE International conference on automatic face & gesture recognition. IEEE, pp 67–74

  13. Chamain LD, Racapé F, Bégaint J, et al (2021) End-to-end optimized image compression for machines, a study. In: 2021 Data compression conference (DCC). IEEE, pp 163–172

  14. Chen T, Liu H, Ma Z, et al (2019) Neural image compression via non-local attention optimization and improved context modeling. arXiv:1910.06244

  15. Cheng Z, Sun H, Takeuchi M, et al (2019) Learning image and video compression through spatial-temporal energy compaction. In: Proceedings of the IEEE conference on computer vision and pattern recognition, CVPR, pp 10071–10080

  16. Cheng Z, Sun H, Takeuchi M, et al (2020) Learned image compression with discretized Gaussian mixture likelihoods and attention modules. arXiv:2001.01568

  17. Choi Y, El-Khamy M, Lee J (2019) Variable rate deep image compression with a conditional autoencoder. In: Proceedings of the IEEE international conference on computer vision, pp 3146–3154

  18. Deng J, Guo J, Xue N, et al (2019) ArcFace: Additive angular margin loss for deep face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4690–4699

  19. Djelouah A, Campos J, Schaub-Meyer S, et al (2019) Neural inter-frame compression for video coding. In: Proceedings of the IEEE International conference on computer vision, pp 6421–6429

  20. Duan L, Liu J, Yang W, et al (2020) Video coding for machines: A paradigm of collaborative compression and intelligent analytics. IEEE Trans Image Process 29:8680–8695

  21. Guo Y, Zhang L, Hu Y, et al (2016) MS-Celeb-1M: A dataset and benchmark for large-scale face recognition. In: European conference on computer vision, pp 87–102

  22. Habibian A, Rozendaal Tv, Tomczak JM, et al (2019) Video compression with rate-distortion autoencoders. In: Proceedings of the IEEE international conference on computer vision, pp 7033–7042

  23. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141

  24. Hu Y, Yang S, Yang W, et al (2020) Towards coding for human and machine vision: A scalable image coding approach. In: 2020 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1–6

  25. Johnston N, Vincent D, Minnen D, et al (2018) Improved lossy image compression with priming and spatially adaptive bit rates for recurrent networks. In: CVPR

  26. Kemelmacher-Shlizerman I, Seitz SM, Miller D, et al (2016) The MegaFace benchmark: 1 million faces for recognition at scale. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4873–4882

  27. Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980

  28. Lee J, Cho S, Beack SK (2018) Context-adaptive entropy model for end-to-end optimized image compression. arXiv:1809.10452

  29. Lee J, Cho S, Kim M (2019) A hybrid architecture of jointly learning image compression and quality enhancement with improved entropy minimization. arXiv:1912.12817

  30. Li M, Zuo W, Gu S, et al (2018) Learning convolutional networks for content-weighted image compression. In: CVPR

  31. Lim B, Son S, Kim H, et al (2017) Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 136–144

  32. Liu H, Chen T, Guo P, et al (2019) Non-local attention optimized deep image compression. arXiv:1904.09757

  33. Lu G, Ouyang W, Xu D, et al (2019) DVC: An end-to-end deep video compression framework. In: Proceedings of the IEEE conference on computer vision and pattern recognition, CVPR, pp 11006–11015

  34. Mentzer F, Agustsson E, Tschannen M, et al (2018) Conditional probability models for deep image compression. In: CVPR

  35. Minnen D, Ballé J, Toderici GD (2018) Joint autoregressive and hierarchical priors for learned image compression. In: Advances in neural information processing systems, pp 10771–10780

  36. Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp 807–814

  37. Paszke A, Gross S, Massa F, et al (2019) PyTorch: An imperative style, high-performance deep learning library. In: Advances in neural information processing systems 32. Curran Associates, Inc., pp 8024–8035

  38. Ranjan A, Black MJ (2017) Optical flow estimation using a spatial pyramid network. In: The IEEE Conference on computer vision and pattern recognition (CVPR)

  39. Rippel O, Bourdev L (2017) Real-time adaptive image compression. In: ICML

  40. Sandler M, Howard A, Zhu M, et al (2018) MobileNetV2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520

  41. Skodras A, Christopoulos C, Ebrahimi T (2001) The JPEG 2000 still image compression standard. IEEE Signal Process Mag 18(5):36–58

  42. Sullivan GJ, Ohm JR, Han WJ, et al (2012) Overview of the high efficiency video coding (HEVC) standard. IEEE Trans Circuits Syst Video Technol 22(12):1649–1668

  43. Theis L, Shi W, Cunningham A, et al (2017) Lossy image compression with compressive autoencoders. In: 5th International conference on learning representations, ICLR

  44. Toderici G, O’Malley SM, Hwang SJ, et al (2016) Variable rate image compression with recurrent neural networks. In: 4th International conference on learning representations, ICLR

  45. Toderici G, Vincent D, Johnston N, et al (2017) Full resolution image compression with recurrent neural networks. In: CVPR, pp 5435–5443

  46. Wallace GK (1992) The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38(1):xviii–xxxiv

  47. Wang X, Girshick R, Gupta A, et al (2018) Non-local neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7794–7803

  48. Wang Z, Simoncelli E, Bovik A, et al (2003) Multi-scale structural similarity for image quality assessment. In: Asilomar conference on signals, systems and computers. IEEE, pp 1398–1402

  49. Witten IH, Neal RM, Cleary JG (1987) Arithmetic coding for data compression. Communications of the ACM 30(6):520–540

  50. Wu CY, Singhal N, Krahenbuhl P (2018) Video compression through image interpolation. In: ECCV

  51. Xue T, Chen B, Wu J et al (2019) Video enhancement with task-oriented flow. International Journal of Computer Vision, IJCV 127(8):1106–1125

  52. Yang F, Wang Y, Herranz L et al (2022) A novel framework for image-to-image translation and image compression. Neurocomputing 508:58–70

Author information

Corresponding author

Correspondence to Jiaheng Liu.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, X., Hu, Z., Lu, G. et al. A unified efficient deep image compression framework and its application on human-centric Task. Multimed Tools Appl (2023). https://doi.org/10.1007/s11042-023-17696-6

  • DOI: https://doi.org/10.1007/s11042-023-17696-6