Semantic-Aware Visual Decomposition for Image Coding

Chang, Jianhui; Zhang, Jian; Li, Jiguo; Wang, Shiqi; Mao, Qi; Jia, Chuanmin; Ma, Siwei; Gao, Wen

doi:10.1007/s11263-023-01809-7

Published: 02 June 2023

Volume 131, pages 2333–2355 (2023)

International Journal of Computer Vision

Jianhui Chang¹,
Jian Zhang² (ORCID: orcid.org/0000-0001-5486-3125),
Jiguo Li³,
Shiqi Wang⁴,
Qi Mao⁵,
Chuanmin Jia¹,
Siwei Ma¹ &
Wen Gao¹

Abstract

In this paper, we propose a novel image coding framework with semantic-aware visual decomposition for extremely low bitrate compression. At the encoder side, an input image is decomposed into a semantic map, which serves as the structural representation, and semantic-wise texture representations, and both are compressed into bitstreams. At the decoder side, the received bitstreams of the dual-layer representations are decoded, and the target image is synthesized from them with generative models. An attention mechanism is introduced into the model architecture for texture representation modeling, and a coherency regularization is proposed to further optimize the texture representation space by aligning it with the source pixel space for higher synthesis quality. We also propose a cross-channel entropy module and control the quantization scale to facilitate rate-distortion optimization. Compressing the decomposed components into the bitstream is a simple yet effective representation philosophy that benefits image compression in several respects. First, in terms of compression performance, compact representations and high visual synthesis quality bring remarkable advantages. Second, the proposed framework yields a physically explainable bitstream composed of a structural segment and semantic-wise texture segments. Third and most importantly, subsequent vision tasks (e.g., content manipulation) receive fundamental support from the semantic-aware visual decomposition and synthesis mechanism. Extensive experimental results demonstrate the superiority of the proposed framework for efficient visual representation learning, high-efficiency image compression (\(<0.1\) bpp), and intelligent visual applications (e.g., manipulation and analysis).
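As a rough illustration of the bitstream layout described in the abstract, the following minimal sketch models a bitstream made of one structural segment and several semantic-wise texture segments, and computes its rate in bits per pixel (bpp). The class name `DualLayerBitstream`, the function `bits_per_pixel`, the segment labels, and all byte counts are hypothetical and chosen only for illustration; they are not taken from the authors' implementation or results.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DualLayerBitstream:
    """Hypothetical container: one structural segment (the coded semantic map)
    plus one texture segment per semantic class."""
    structure: bytes
    textures: Dict[str, bytes] = field(default_factory=dict)

    def total_bytes(self) -> int:
        # Sum the structural segment and every semantic-wise texture segment.
        return len(self.structure) + sum(len(seg) for seg in self.textures.values())


def bits_per_pixel(stream: DualLayerBitstream, height: int, width: int) -> float:
    """Rate in bpp: total coded bits divided by the number of image pixels."""
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    return 8 * stream.total_bytes() / (height * width)


# Illustrative sizes only (assumed values, not measurements from the paper).
stream = DualLayerBitstream(
    structure=bytes(1200),
    textures={"sky": bytes(300), "road": bytes(500), "car": bytes(400)},
)
bpp = bits_per_pixel(stream, height=512, width=512)
print(f"{stream.total_bytes()} bytes -> {bpp:.4f} bpp")
```

Because each texture segment is keyed by its semantic class, a single region's segment can in principle be replaced or inspected without touching the others, which is the property the abstract refers to as a physically explainable bitstream.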

Notes

To support reproducible research, the source code of our method will be made public when this paper is accepted.
https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM.

Author information

Authors and Affiliations

National Engineering Research Center of Visual Technology, School of Computer Science, Peking University, Beijing, 100871, China
Jianhui Chang, Chuanmin Jia, Siwei Ma & Wen Gao
School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, 518055, China
Jian Zhang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Jiguo Li
Department of Computer Science, City University of Hong Kong, Hong Kong, China
Shiqi Wang
State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing, 100024, China
Qi Mao

Corresponding authors

Correspondence to Jian Zhang or Siwei Ma.

Additional information

Communicated by Ming-Hsuan Yang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by the National Natural Science Foundation of China under Grants 62025101 and 62088102, the Shenzhen Research Project under Grant JCYJ20220531093215035, and the Young Elite Scientist Sponsorship Program by BAST under Grant No. BYSS2022019.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 3477 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chang, J., Zhang, J., Li, J. et al. Semantic-Aware Visual Decomposition for Image Coding. Int J Comput Vis 131, 2333–2355 (2023). https://doi.org/10.1007/s11263-023-01809-7

Received: 17 March 2022
Accepted: 19 April 2023
Published: 02 June 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s11263-023-01809-7
