
T2CI-GAN: Text to Compressed Image Generation Using Generative Adversarial Network

  • Conference paper
  • First Online:
Computer Vision and Image Processing (CVIP 2022)

Abstract

Generating textual descriptions for visual data has gained research attention in recent years. In contrast, generating visual data from textual descriptions remains very challenging, because it requires combining Natural Language Processing (NLP) and Computer Vision techniques. Existing methods use Generative Adversarial Networks (GANs) to generate uncompressed images from textual descriptions. In practice, however, most visual data are processed and transmitted in compressed form. Hence, the proposed work generates visual data directly in a compressed representation using Deep Convolutional GANs (DCGANs), to achieve storage and computational efficiency. We propose two GAN models for compressed image generation from text. The first model is trained directly on JPEG compressed DCT images (compressed domain) to generate compressed images from text descriptions. The second model is trained on RGB images (pixel domain) to generate the JPEG compressed DCT representation from text descriptions. The proposed models are evaluated on the open-source benchmark dataset Oxford-102 Flower images, using both the RGB and JPEG compressed versions, and achieve state-of-the-art performance in the JPEG compressed domain. The code will be publicly released on GitHub after acceptance of the paper.
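The compressed-domain representation the abstract refers to is JPEG's 8×8 block DCT coefficient layout. The sketch below is an illustration of that representation only, not code from the paper: it applies the standard level shift and orthonormal 2-D DCT-II (as used in JPEG's block transform) to each 8×8 block of a single channel. The function names are hypothetical.

```python
import numpy as np

def dct2_matrix(n=8):
    # Orthonormal DCT-II basis matrix (the 8x8 transform used by JPEG).
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def blockwise_dct(channel):
    """Map one image channel (H, W), with H and W multiples of 8, to its
    8x8 block DCT coefficients of shape (H/8, W/8, 8, 8) -- the kind of
    compressed-domain tensor a DCT-domain model would consume."""
    d = dct2_matrix(8)
    h, w = channel.shape
    blocks = channel.reshape(h // 8, 8, w // 8, 8).swapaxes(1, 2)
    # Level-shift by 128 (JPEG convention), then 2-D DCT per block.
    return d @ (blocks - 128.0) @ d.T

# Round-trip sanity check: the inverse transform recovers the pixel blocks.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16)).astype(np.float64)
coeffs = blockwise_dct(img)
d = dct2_matrix(8)
recon = (d.T @ coeffs @ d) + 128.0
blocks = img.reshape(2, 8, 2, 8).swapaxes(1, 2)
print(np.allclose(recon, blocks))  # True: the DCT matrix is orthogonal
```

Quantization and entropy coding, which complete a real JPEG pipeline, are omitted here; the point is only the block-DCT tensor shape that distinguishes the compressed domain from the RGB pixel domain.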



Author information


Correspondence to Bulla Rajesh.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Rajesh, B., Dusa, N., Javed, M., Dubey, S.R., Nagabhushan, P. (2023). T2CI-GAN: Text to Compressed Image Generation Using Generative Adversarial Network. In: Gupta, D., Bhurchandi, K., Murala, S., Raman, B., Kumar, S. (eds) Computer Vision and Image Processing. CVIP 2022. Communications in Computer and Information Science, vol 1777. Springer, Cham. https://doi.org/10.1007/978-3-031-31417-9_23

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-31417-9_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31416-2

  • Online ISBN: 978-3-031-31417-9

  • eBook Packages: Computer Science, Computer Science (R0)
