Unsupervised Object Transfiguration with Attention

  • Zihan Ye
  • Fan Lyu
  • Linyan Li
  • Yu Sun
  • Qiming Fu
  • Fuyuan Hu


Object transfiguration is a subtask of image-to-image translation that translates between two independent image sets and has a wide range of applications. Recently, studies based on Generative Adversarial Networks (GANs) have achieved impressive results in image-to-image translation. However, object transfiguration should translate only the regions containing target objects rather than whole images. Most existing methods do not consider this issue, which results in mistranslation of image backgrounds.

To address this problem, we present a novel pipeline called Deep Attention Unit Generative Adversarial Networks (DAU-GAN). During translation, the DAU computes attention masks that indicate where the target objects are. The DAU makes the GAN concentrate on translating target objects while ignoring meaningless backgrounds. Additionally, we construct an attention-consistent loss and a background-consistent loss to compel our model to focus on translating target objects and to preserve backgrounds more effectively.

We conduct comparison experiments on three popular related datasets, demonstrating that DAU-GAN outperforms state-of-the-art methods. We also export attention masks at different stages to confirm their effect during the object transfiguration task.

The proposed DAU-GAN can translate objects effectively while preserving background information. In our model, the DAU learns to focus on the most important information by producing attention masks. These masks compel DAU-GAN to distinguish target objects from backgrounds during translation and to achieve impressive translation results on two subsets of ImageNet and on CelebA. Moreover, the results suggest that such models can be studied not only through the images themselves but also through information from other modalities.
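The abstract does not give formulas for the attention masks or the two losses, so the following is only a minimal sketch of one common way such masks can be used, not the authors' implementation. It assumes a soft mask with values in [0, 1] that blends the translated image into the source, a background term that penalises changes outside the mask, and an attention term that compares two masks. The names `blend_with_mask`, `background_l1`, and `mask_l1` are hypothetical, and images are plain nested lists of grayscale values to keep the example dependency-free.

```python
from typing import List

# Grayscale image as rows of pixel intensities in [0, 1] (illustrative type).
Image = List[List[float]]


def blend_with_mask(source: Image, translated: Image, mask: Image) -> Image:
    """Take translated pixels where the mask is 1 and source pixels where it is 0."""
    return [
        [m * t + (1.0 - m) * s for s, t, m in zip(src_row, tr_row, mask_row)]
        for src_row, tr_row, mask_row in zip(source, translated, mask)
    ]


def background_l1(source: Image, output: Image, mask: Image) -> float:
    """Assumed background term: mean absolute change, weighted by (1 - mask)."""
    total, count = 0.0, 0
    for src_row, out_row, mask_row in zip(source, output, mask):
        for s, o, m in zip(src_row, out_row, mask_row):
            total += (1.0 - m) * abs(o - s)
            count += 1
    return total / count


def mask_l1(mask_a: Image, mask_b: Image) -> float:
    """Assumed attention term: mean absolute difference between two masks."""
    total, count = 0.0, 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


# Toy 2x2 example: only the top-left pixel belongs to the target object.
source = [[0.2, 0.2], [0.8, 0.8]]
translated = [[0.9, 0.9], [0.1, 0.1]]
mask = [[1.0, 0.0], [0.0, 0.0]]

blended = blend_with_mask(source, translated, mask)
unmasked_penalty = background_l1(source, translated, mask)
masked_penalty = background_l1(source, blended, mask)
```

In this toy case the blended output changes only the masked pixel, so its background penalty is zero, while the unmasked translation is penalised for altering the three background pixels. A term like `mask_l1` could, under the same assumption, compare the mask computed on an input with the mask computed on its translation, since the object should stay in the same place.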


Multi-modalities; Object transfiguration; Image-to-image translation; Generative Adversarial Networks (GANs); Attention mechanism; Deep learning



This work was supported by the National Natural Science Foundation of China (Nos. 61876121, 61472267, 61728205, 61502329, 61672371), the Primary Research & Development Plan of Jiangsu Province (No. BE2017663), the Aeronautical Science Foundation (No. 20151996016), the Jiangsu Key Disciplines of the Thirteenth Five-Year Plan (No. 20168765), and the Suzhou Institute of Trade & Commerce Research Project (No. KY-ZRA1805).

Compliance with Ethical Standards

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Suzhou University of Science and Technology, Suzhou, China
  2. Tianjin University, Tianjin, China
  3. Suzhou Institute of Trade & Commerce, Suzhou, China
