
Image compression with learned lifting-based DWT and learned tree-based entropy models

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

This paper explores learned image compression based on traditional and learned discrete wavelet transform (DWT) architectures and learned entropy models for coding DWT subband coefficients. A learned DWT is obtained through the lifting scheme with learned nonlinear predict and update filters. Several learned entropy models of varying computational complexity are explored to exploit inter- and intra-subband coefficient dependencies, akin to the traditional EZW, SPIHT, and EBCOT algorithms. Experimental results show that when the explored learned entropy models are combined with traditional wavelet filters, such as the CDF 9/7 filters, the compression performance far exceeds that of JPEG2000. When the learned entropy models are combined with the learned DWT, compression performance increases further. The computations in the learned DWT and in all entropy models except one can be easily parallelized, so the systems provide practical encoding and decoding times on GPUs, unlike other DWT-based learned compression systems in the literature.
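The transform idea in the abstract can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation (their code is at [49]); it assumes PyTorch, and the module and filter names (LiftingStep, predict, update) are illustrative. It shows the key property of the lifting scheme: perfect reconstruction holds even when the predict and update filters are nonlinear neural networks, because each lifting step is inverted exactly by reversing the order and signs of its operations.

```python
import torch
import torch.nn as nn

class LiftingStep(nn.Module):
    """One 1D lifting step with learned nonlinear predict/update filters."""

    def __init__(self, channels=1, hidden=16, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Predict filter P(.): estimates odd samples from even samples.
        self.predict = nn.Sequential(
            nn.Conv1d(channels, hidden, kernel_size, padding=pad), nn.Tanh(),
            nn.Conv1d(hidden, channels, kernel_size, padding=pad))
        # Update filter U(.): adjusts even samples using the detail signal.
        self.update = nn.Sequential(
            nn.Conv1d(channels, hidden, kernel_size, padding=pad), nn.Tanh(),
            nn.Conv1d(hidden, channels, kernel_size, padding=pad))

    def forward(self, x):
        even, odd = x[..., 0::2], x[..., 1::2]   # lazy wavelet split
        detail = odd - self.predict(even)        # high-pass (detail) band
        approx = even + self.update(detail)      # low-pass (approx) band
        return approx, detail

    def inverse(self, approx, detail):
        # Exact inversion: reverse the two steps and flip the signs.
        even = approx - self.update(detail)
        odd = detail + self.predict(even)
        x = even.new_zeros(*even.shape[:-1], 2 * even.shape[-1])
        x[..., 0::2], x[..., 1::2] = even, odd
        return x

step = LiftingStep()
x = torch.randn(1, 1, 64)                        # even-length signal
approx, detail = step(x)
assert torch.allclose(step.inverse(approx, detail), x, atol=1e-5)
```

A 2D DWT would apply such steps along rows and columns to produce the LL, LH, HL, and HH subbands; the sketch keeps one dimension for brevity.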



Data availability

The datasets analyzed during the current study are available from [48] (training) and [39] (testing).

Notes

  1. Available at https://github.com/uberkk/ImageCompressionLearnedLiftingandLearnedTreeBasedModels

  2. Encoding/decoding times for JPEG2000 and the two systems with IISCEM are obtained on a CPU (due to their sequential encoding/decoding requirement), while all others are obtained on a GPU.
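Footnote 2 reflects a general property of autoregressive (context-based) entropy models: each coefficient's probability parameters depend on previously decoded neighbours, so decoding is inherently serial and cannot be batched on a GPU. A minimal sketch of that serial loop follows; `context_model` and `arith_decoder.decode_symbol` are hypothetical placeholders, not the paper's IISCEM interface.

```python
# Why a spatial-context (autoregressive) entropy model decodes sequentially:
# each coefficient's distribution parameters depend on already-decoded
# causal neighbours, so positions cannot be processed in parallel.
def decode_subband(arith_decoder, context_model, height, width):
    coeffs = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Hypothetical call: parameters from the decoded causal context...
            mean, scale = context_model(coeffs, y, x)
            # ...then one symbol is arithmetic-decoded with those parameters.
            coeffs[y][x] = arith_decoder.decode_symbol(mean, scale)
    return coeffs
```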

References

  1. Jiao, L., Zhao, J.: A survey on the new generation of deep learning in image processing. IEEE Access 7, 172231–172263 (2019)

  2. Steinmetz, R.: Data compression in multimedia computing-standards and systems. Multimed. Syst. 1(5), 187–204 (1994)

  3. Pennebaker, W.B., Mitchell, J.L.: JPEG: Still Image Data Compression Standard. Springer (1992)

  4. Rabbani, M., Joshi, R.: An overview of the JPEG 2000 still image compression standard. Signal Process. Image Commun. 17(1), 3–48 (2002)

  5. Christopoulos, C., Skodras, A., Ebrahimi, T.: The JPEG2000 still image coding system: an overview. IEEE Trans. Consum. Electron. 46(4), 1103–1127 (2000). https://doi.org/10.1109/30.920468

  6. Lainema, J., Hannuksela, M.M., Vadakital, V.K., Aksu, E.B.: HEVC still image coding and high efficiency image file format. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 71–75 (2016). https://doi.org/10.1109/ICIP.2016.7532321

  7. C.C. (Netflix): AV1 Image File Format (AVIF). Last accessed 26 February 2023. http://www.aomediacodec.github.io

  8. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org

  9. Goyal, V.K.: Theoretical foundations of transform coding. IEEE Signal Process. Mag. 18(5), 9–21 (2001)

  10. Ahmed, N., Natarajan, T., Rao, K.R.: Discrete cosine transform. IEEE Trans. Comput. 100(1), 90–93 (1974)

  11. Han, J., Saxena, A., Melkote, V., Rose, K.: Jointly optimized spatial prediction and block transform for video and image coding. IEEE Trans. Image Process. 21(4), 1874–1884 (2011)

  12. Kamisli, F.: Block-based spatial prediction and transforms based on 2D Markov processes for image and video compression. IEEE Trans. Image Process. 24(4), 1247–1260 (2015)

  13. Ballé, J., Laparra, V., Simoncelli, E.P.: End-to-end optimized image compression. arXiv preprint arXiv:1611.01704 (2016)

  14. Ballé, J., Minnen, D., Singh, S., Hwang, S.J., Johnston, N.: Variational image compression with a scale hyperprior. arXiv preprint arXiv:1802.01436 (2018)

  15. Hilton, M.L., Jawerth, B.D., Sengupta, A.: Compressing still and moving images with wavelets. Multimed. Syst. 2, 218–227 (1994)

  16. Geetha, V., Anbumani, V., Murugesan, G., Gomathi, S.: Hybrid optimal algorithm-based 2D discrete wavelet transform for image compression using fractional KCA. Multimed. Syst. 26, 687–702 (2020)

  17. Buccigrossi, R.W., Simoncelli, E.P.: Image compression via joint statistical characterization in the wavelet domain. IEEE Trans. Image Process. 8(12), 1688–1701 (1999)

  18. Liu, Z., Karam, L.J.: Quantifying the intra and inter subband correlations in the zerotree-based wavelet image coders. In: Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 1730–1734 (2002). https://doi.org/10.1109/ACSSC.2002.1197071

  19. Shapiro, J.M.: Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans. Signal Process. 41(12), 3445–3462 (1993)

  20. Said, A., Pearlman, W.A.: A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Trans. Circuits Syst. Video Technol. 6(3), 243–250 (1996)

  21. Taubman, D.: High performance scalable image compression with EBCOT. IEEE Trans. Image Process. 9(7), 1158–1170 (2000)

  22. Ma, H., Liu, D., Yan, N., Li, H., Wu, F.: End-to-end optimized versatile image compression with wavelet-like transform. IEEE Trans. Pattern Anal. Mach. Intell. 44, 1247 (2020)

  23. Minnen, D., Ballé, J., Toderici, G.D.: Joint autoregressive and hierarchical priors for learned image compression. Adv. Neural Inf. Process. Syst. (2018). https://doi.org/10.48550/arXiv.1809.02736

  24. Sweldens, W.: The lifting scheme: A construction of second generation wavelets. SIAM J. Math. Anal. 29(2), 511–546 (1998). https://doi.org/10.1137/S0036141095289051

  25. Daubechies, I., Sweldens, W.: Factoring wavelet transforms into lifting steps. J. Fourier Anal. Appl. 4(3), 247–269 (1998)

  26. Cohen, A., Daubechies, I., Feauveau, J.-C.: Biorthogonal bases of compactly supported wavelets. Commun. Pure Appl. Math. 45(5), 485–560 (1992)

  27. Dragotti, P.L., Vetterli, M.: Wavelet footprints: theory, algorithms, and applications. IEEE Trans. Signal Process. 51(5), 1306–1323 (2003)

  28. Dragotti, P.L., Vetterli, M.: Footprints and edgeprints for image denoising and compression. In: Proceedings 2001 International Conference on Image Processing (Cat. No. 01CH37205), vol. 2, pp. 237–240 (2001). IEEE

  29. Dragotti, P.L., Vetterli, M.: Deconvolution with wavelet footprints for ill-posed inverse problems. IEEE Int. Conf. Acoust. Speech Signal Process. 2, 1257 (2002)

  30. Zhao, X., Huang, P., Shu, X.: Wavelet-attention CNN for image classification. Multimed. Syst. 28(3), 915–924 (2022)

  31. Brahimi, T., Khelifi, F., Laouir, F., Kacha, A.: A new, enhanced EZW image codec with subband classification. Multimed. Syst. 28(1), 1–19 (2022)

  32. Cheng, Z., Sun, H., Takeuchi, M., Katto, J.: Learned image compression with discretized Gaussian mixture likelihoods and attention modules. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7939–7948 (2020)

  33. Yılmaz, M.A., Keleş, O., Güven, H., Tekalp, A.M., Malik, J., Kıranyaz, S.: Self-organized variational autoencoders (self-VAE) for learned image compression. In: 2021 IEEE International Conference on Image Processing (ICIP), pp. 3732–3736 (2021). IEEE

  34. Lu, M., Guo, P., Shi, H., Cao, C., Ma, Z.: Transformer-based image compression. arXiv preprint arXiv:2111.06707 (2021)

  35. Minnen, D., Singh, S.: Channel-wise autoregressive entropy models for learned image compression. In: 2020 IEEE International Conference on Image Processing (ICIP), pp. 3339–3343 (2020). IEEE

  36. He, D., Yang, Z., Peng, W., Ma, R., Qin, H., Wang, Y.: ELIC: Efficient learned image compression with unevenly grouped space-channel contextual adaptive coding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5718–5727 (2022)

  37. Kim, J.-H., Heo, B., Lee, J.-S.: Joint global and local hierarchical priors for learned image compression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5992–6001 (2022)

  38. Ma, H., Liu, D., Xiong, R., Wu, F.: iWave: CNN-based wavelet-like transform for image compression. IEEE Trans. Multimed. 22(7), 1667–1679 (2019)

  39. Eastman Kodak: Kodak Lossless True Color Image Suite (PhotoCD PCD0992). Last accessed 2 February 2023. http://r0k.us/graphics/kodak

  40. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  41. Ballé, J.: Efficient nonlinear transforms for lossy image compression. In: 2018 Picture Coding Symposium (PCS), pp. 248–252 (2018). IEEE

  42. Marcellin, M.W., Lepley, M.A., Bilgin, A., Flohr, T.J., Chinen, T.T., Kasner, J.H.: An overview of quantization in JPEG 2000. Signal Process. Image Commun. 17(1), 73–84 (2002)

  43. Ballé, J., Laparra, V., Simoncelli, E.P.: Density modeling of images using a generalized normalization transformation. arXiv preprint arXiv:1511.06281 (2015)

  44. Bégaint, J., Racapé, F., Feltman, S., Pushparaja, A.: CompressAI: a PyTorch library and evaluation platform for end-to-end compression research. arXiv preprint arXiv:2011.03029 (2020)

  45. Chilinski, P., Silva, R.: Neural likelihoods via cumulative distribution functions. In: Conference on Uncertainty in Artificial Intelligence, pp. 420–429 (2020). PMLR

  46. Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., Graves, A., et al.: Conditional image generation with PixelCNN decoders. Adv. Neural Inf. Process. Syst. 29 (2016)

  47. Salimans, T., Karpathy, A., Chen, X., Kingma, D.P.: PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications. arXiv preprint arXiv:1701.05517 (2017)

  48. Xue, T., Chen, B., Wu, J., Wei, D., Freeman, W.T.: Video enhancement with task-oriented flow. arXiv preprint arXiv:1711.09078 (2017)

  49. Sahin, U.B., Kamisli, F.: Learned-DWT-and-Tree-based-Entropy-Models. Last accessed 26 February 2023. https://github.com/uberkk/ImageCompressionLearnedLiftingandLearnedTreeBasedModels

  50. Ballé, J., Laparra, V., Simoncelli, E.P.: End-to-end optimization of nonlinear transform codes for perceptual quality. In: 2016 Picture Coding Symposium (PCS), pp. 1–5 (2016). IEEE

  51. Pakdaman, F., Gabbouj, M.: Comprehensive complexity assessment of emerging learned image compression on CPU and GPU. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5 (2023). IEEE

  52. Sovrasov, V.: ptflops: a FLOPs counting tool for neural networks in PyTorch framework. https://github.com/sovrasov/flops-counter.pytorch


Funding

No funding was received for conducting this study.

Author information

Corresponding authors

Correspondence to Ugur Berk Sahin or Fatih Kamisli.

Ethics declarations

Code availability

The code to reproduce the results in this paper is available from the authors on GitHub [49].

Additional information

Communicated by Q. Shen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

The following parameters are used in the experimental results. In Fig. 7(a), \(ch=32\) is used for processing the LL subband and \(ch=96\) for processing the LH, HL, and HH subbands together. In Fig. 7(b), \(ch=32\) is used for each subband. In Fig. 8, \(ch=243\) is used for jointly processing the LH, HL, and HH subbands, and the output gives the mean and scale for the corresponding three channels. In Fig. 9, \(ch=243\) is used on the right-hand side for jointly processing the LH, HL, and HH subbands, and \(ch/3\) is used on the left-hand side for processing each of the LH, HL, and HH subbands separately (a total of 243 channels). In Fig. 10, \(ch=162\) is used for processing each of the LH, HL, and HH subbands separately. In Fig. 11, \(ch=81\) is used for processing each of the LL, LH, HL, and HH subbands. In Fig. 13, \(ch=32\) is used. Our code is available on GitHub [49].
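For readability, the channel counts above can be collected into a single structure. This is a hypothetical recap only; the dictionary layout and key names are illustrative and not taken from the authors' repository [49].

```python
# Hypothetical summary of the channel counts (ch) listed in Appendix A;
# layout and key names are illustrative, not from the code in [49].
ENTROPY_MODEL_CHANNELS = {
    "Fig. 7(a)": {"LL": 32, "LH,HL,HH (joint)": 96},
    "Fig. 7(b)": {"each subband": 32},
    "Fig. 8":    {"LH,HL,HH (joint)": 243},  # outputs mean and scale per band
    "Fig. 9":    {"joint branch": 243, "per-subband branch": 243 // 3},
    "Fig. 10":   {"each of LH, HL, HH": 162},
    "Fig. 11":   {"each of LL, LH, HL, HH": 81},
    "Fig. 13":   {"ch": 32},
}
```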

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sahin, U.B., Kamisli, F. Image compression with learned lifting-based DWT and learned tree-based entropy models. Multimedia Systems 29, 3369–3384 (2023). https://doi.org/10.1007/s00530-023-01192-w

