Human face generation from textual description via style mapping and manipulation

Multimedia Tools and Applications

Abstract

Text-to-Face generation is an interesting and challenging task with great potential for diverse computer vision applications in the public safety domain. Work on Text-to-Face synthesis has been far more limited than on Text-to-Image synthesis, owing to the diversity of facial visual attributes and their corresponding descriptions. In this paper, we propose a Text-to-Face generative model that produces high-quality, high-resolution images from a given textual description. The model can also produce a range of diverse images for a given description. In the proposed approach, the encoded text input is mapped to the generator to produce a high-quality output, which is further manipulated to better reflect the described attributes. In addition to diversity, the model is able to significantly emphasize the facial attributes provided in the description. Applications of the proposed model include criminal investigation, character generation (video games, movies, etc.), manipulation of facial attributes according to a brief textual description, text-based style transfer, and text-based image retrieval.
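To illustrate the pipeline described above (a text embedding mapped into a style-based generator, followed by latent manipulation), the following minimal Python sketch shows one way such a system could be wired together. It is not the authors' implementation: the module names, dimensions, and the assumption of a pretrained text encoder and a StyleGAN-like generator are all hypothetical.

import torch
import torch.nn as nn

class TextToStyleMapper(nn.Module):
    # Maps a text embedding to a latent code in the generator's style space
    # (shapes are illustrative assumptions, not the paper's configuration).
    def __init__(self, text_dim=512, w_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(text_dim, w_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(w_dim, w_dim),
        )

    def forward(self, text_embedding):
        return self.mlp(text_embedding)

def generate_face(description, text_encoder, mapper, generator,
                  edit_direction=None, strength=1.0):
    # Encode the description, map it into the generator's latent space,
    # synthesize an image, and optionally shift the latent along an
    # attribute direction to emphasize a described attribute.
    with torch.no_grad():
        t = text_encoder(description)        # (1, text_dim) text embedding
        w = mapper(t)                        # (1, w_dim) style latent
        if edit_direction is not None:
            w = w + strength * edit_direction
        return generator(w)                  # (1, 3, H, W) generated face

Sampling several latent codes for the same description, for example by perturbing w before synthesis, is one way a model of this kind could produce the diverse outputs mentioned above.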

Data availability

The datasets generated during and/or analyzed during the current study are available in the repository [15], https://github.com/switchablenorms/CelebAMask-HQ.

References

  1. Abdal R, Qin Y, Wonka P (2019) “Image2StyleGAN: how to embed images into the StyleGAN latent space?”. In: ICCV.

  2. Barratt S, Sharma R (2018) “A note on the inception score”. In: arXiv preprint arXiv:1801.01973

  3. Bau D et al. (2019) “Seeing what a GAN cannot generate”. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4502–4511.

  4. Chen X et al. (2019) “FTGAN: a fully-trained generative adversarial networks for text to face generation”. In: arXiv preprint arXiv:1904.05729.

  5. Garg K et al. (2020) “Perception GAN: real-world image construction from provided text through perceptual understanding”. In: 2020 Joint 9th International Conference on Informatics, Electronics & Vision (ICIEV) and 2020 4th International Conference on Imaging, Vision & Pattern Recognition (icIVPR). IEEE, pp. 1–7.

  6. Goodfellow IJ et al. (2014) “Generative adversarial networks”. In: arXiv preprint arXiv:1406.2661.

  7. Heusel M et al. (2017) “GANs trained by a two time-scale update rule converge to a local Nash equilibrium”. In: arXiv preprint arXiv:1706.08500.

  8. Karnewar A, Wang O (2020) “MSG-GAN: multi-scale gradients for generative adversarial networks”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7799–7808.

  9. Karras T et al. (2017) “Progressive growing of gans for improved quality, stability, and variation”. In: arXiv preprint arXiv:1710.10196.

  10. Karras T, Laine S, Aila T (2019) “A style-based generator architecture for generative adversarial networks”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410.

  11. Karras T et al. (2020) “Analyzing and improving the image quality of StyleGAN”. In: Proc CVPR.

  12. Kettunen M, Härkönen E, Lehtinen J (2019). “E-LPIPS: robust perceptual image similarity via random transformation ensembles”. In: arXiv preprint arXiv:1906.03973

  13. Khan M et al. (2020) “A Realistic Image Generation of Face From Text Description Using the Fully Trained Generative Adversarial Networks”. In: IEEE Access PP, pp. 1–1. https://doi.org/10.1109/ACCESS.2020.3015656.

  14. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Proces Syst 25:1097–1105

  15. Lee C-H et al. (2020) “MaskGAN: towards diverse and interactive facial image manipulation”. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  16. Li B et al. (2019) “Controllable text-to-image generation”. In: arXiv preprint arXiv:1909.07083.

  17. Mao J et al. (2015) “Deep captioning with multimodal recurrent neural networks (m-RNN)”. In: ICLR.

  18. Nasir OR et al (2019) “Text2FaceGAN: face generation from fine grained textual descriptions”. In: 2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM). IEEE, pp. 58–67.

  19. Patashnik O et al. (2021) “StyleCLIP: text-driven manipulation of StyleGAN imagery”. In: arXiv preprint arXiv:2103.17249.

  20. Radford A, Metz L, Chintala S (2015) “Unsupervised representation learning with deep convolutional generative adversarial networks”. In: arXiv preprint arXiv:1511.06434.

  21. Radford A et al. (2021) “Learning Transferable Visual Models From Natural Language Supervision”. In: arXiv preprint arXiv:2103.00020.

  22. Reed S et al. (2016) “Generative adversarial text to image synthesis”. In: International Conference on Machine Learning. PMLR, pp. 1060–1069.

  23. Richardson E et al. (2020) “Encoding in style: a StyleGAN encoder for image-to-image translation”. In: arXiv preprint arXiv:2008.00951.

  24. Salimans T et al. (2016) “Improved techniques for training GANs”. In: arXiv preprint arXiv:1606.03498.

  25. Shen Y et al. (2020) “Interpreting the latent space of GANs for semantic face editing”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9243–9252.

  26. Simonyan K, Zisserman A (2014) “Very deep convolutional networks for large-scale image recognition”. In: arXiv preprint arXiv:1409.1556.

  27. Szegedy C et al. (2016) “Rethinking the inception architecture for computer vision”. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826.

  28. Tao X et al. (2018) “AttnGAN: fine-grained text to image generation with attentional generative adversarial networks”. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1316–1324.

  29. Tao M et al. (2020) “DF-GAN: deep fusion generative adversarial networks for text-to-image synthesis”. In: arXiv preprint arXiv:2008.05865.

  30. Wang T, Zhang T, Lovell B (2021) “Faces à la carte: text-to-face generation via attribute disentanglement”. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3380–3388.

  31. Xia W et al. (2021) “GAN Inversion: A Survey”. In: arXiv preprint arXiv:2101.05278.

  32. Xia W et al. (2021) “TediGAN: text-guided diverse face image generation and manipulation”. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  33. Xian Y et al (2018) Zero-shot learning—a comprehensive evaluation of the good, the bad and the ugly. IEEE Trans Pattern Anal Mach Intell 41(9):2251–2265

  34. Yongyi L, Tai Y-W, Tang C-K (2018) “Attribute-guided face generation using conditional CycleGAN”. In: Proceedings of the European conference on computer vision (ECCV), pp. 282–297.

  35. Zhang H et al (2018) StackGAN++: realistic image synthesis with stacked generative adversarial networks. IEEE Trans Pattern Anal Mach Intell 41(8):1947–1962

  36. Zhang R et al. (2018) “The unreasonable effectiveness of deep features as a perceptual metric”. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 586–595.

  37. Zhu M et al. (2019) “DM-GAN: dynamic memory generative adversarial networks for text-to-image synthesis”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5802–5810.

Author information

Corresponding author

Correspondence to Tanmoy Hazra.

Ethics declarations

Conflict of interest

We declare that there is no conflict of interest for this research submission.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Todmal, S., Mule, A., Bhagwat, D. et al. Human face generation from textual description via style mapping and manipulation. Multimed Tools Appl 82, 13579–13594 (2023). https://doi.org/10.1007/s11042-022-13899-5
