
T2CI-GAN: Text to Compressed Image Generation Using Generative Adversarial Network

  • Conference paper
  • First Online:
Computer Vision and Image Processing (CVIP 2022)

Abstract

Generating textual descriptions for visual data has gained research attention in recent years. In contrast, generating visual data from textual descriptions remains very challenging, because it requires combining Natural Language Processing (NLP) and Computer Vision techniques. Existing methods use Generative Adversarial Networks (GANs) to generate uncompressed images from textual descriptions. In practice, however, most visual data are processed and transmitted in compressed form. Hence, the proposed work generates visual data directly in a compressed representation using Deep Convolutional GANs (DCGANs), to achieve storage and computational efficiency. We propose two GAN models for compressed image generation from text. The first model is trained directly on JPEG compressed DCT images (compressed domain) to generate compressed images from text descriptions. The second model is trained on RGB images (pixel domain) to generate the JPEG compressed DCT representation from text descriptions. The proposed models are evaluated on the open-source benchmark dataset Oxford-102 Flower images, using both the RGB and JPEG compressed versions, and achieve state-of-the-art performance in the JPEG compressed domain. The code will be publicly released on GitHub after acceptance of the paper.
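The compressed-domain representation the abstract refers to is JPEG's 8×8 block DCT coefficient layout. The sketch below is an illustration of that representation only, not code from the paper: it applies the standard level shift and orthonormal 2-D DCT-II (as used in JPEG's block transform) to each 8×8 block of a single channel. The function names are hypothetical.

```python
import numpy as np

def dct2_matrix(n=8):
    # Orthonormal DCT-II basis matrix (the 8x8 transform used by JPEG).
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def blockwise_dct(channel):
    """Map one image channel (H, W), with H and W multiples of 8, to its
    8x8 block DCT coefficients of shape (H/8, W/8, 8, 8) -- the kind of
    compressed-domain tensor a DCT-domain model would consume."""
    d = dct2_matrix(8)
    h, w = channel.shape
    blocks = channel.reshape(h // 8, 8, w // 8, 8).swapaxes(1, 2)
    # Level-shift by 128 (JPEG convention), then 2-D DCT per block.
    return d @ (blocks - 128.0) @ d.T

# Round-trip sanity check: the inverse transform recovers the pixel blocks.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16)).astype(np.float64)
coeffs = blockwise_dct(img)
d = dct2_matrix(8)
recon = (d.T @ coeffs @ d) + 128.0
blocks = img.reshape(2, 8, 2, 8).swapaxes(1, 2)
print(np.allclose(recon, blocks))  # True: the DCT matrix is orthogonal
```

Quantization and entropy coding, which complete a real JPEG pipeline, are omitted here; the point is only the block-DCT tensor shape that distinguishes the compressed domain from the RGB pixel domain.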



Author information


Correspondence to Bulla Rajesh.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Rajesh, B., Dusa, N., Javed, M., Dubey, S.R., Nagabhushan, P. (2023). T2CI-GAN: Text to Compressed Image Generation Using Generative Adversarial Network. In: Gupta, D., Bhurchandi, K., Murala, S., Raman, B., Kumar, S. (eds) Computer Vision and Image Processing. CVIP 2022. Communications in Computer and Information Science, vol 1777. Springer, Cham. https://doi.org/10.1007/978-3-031-31417-9_23

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-31417-9_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31416-2

  • Online ISBN: 978-3-031-31417-9

  • eBook Packages: Computer Science, Computer Science (R0)
