Conditional Image Synthesis Using Stacked Auxiliary Classifier Generative Adversarial Networks

  • Zhongwei Yao
  • Hao Dong
  • Fangde Liu
  • Yike Guo
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 887)


Synthesizing photo-realistic images has been a long-standing challenge in image processing, and progress here could provide important tools for dataset augmentation and balancing. Traditional methods struggle with the rich and complicated structural information of objects that results from variations in color, pose, texture and illumination. Recent advances in deep learning offer a new perspective on this task. The aim of our paper is to apply state-of-the-art generative models to synthesize diverse and realistic high-resolution images. Extensive experiments have been conducted on the CelebA dataset, a large-scale face attributes dataset with more than 200 thousand celebrity images, each with 40 attribute labels. Inspired by existing architectures, we present stacked Auxiliary Classifier Generative Adversarial Networks (Stack-ACGAN) for image synthesis given conditioning labels. Stage-I generates low-resolution images (e.g. \(64\times 64\)) that sketch basic shapes and colors, and Stage-II generates high-resolution images (e.g. \(256\times 256\)) with plausible details. Inception scores and Multi-Scale Structural Similarity (MS-SSIM) are computed to evaluate the synthesized images. Both quantitative and qualitative analyses show that the proposed model is capable of generating diverse and realistic images.
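As a rough illustration of the Inception score mentioned above, the sketch below computes the commonly used definition, exp of the mean KL divergence between each image's class distribution p(y|x) and the marginal p(y). It is not the authors' evaluation code: the function name `inception_score` and the toy probability rows are assumptions made for illustration, and in practice p(y|x) would come from a pretrained Inception classifier applied to the generated images.

```python
import math


def inception_score(probs, eps=1e-12):
    """Inception score from per-image class probabilities (hypothetical helper).

    probs: list of rows; each row is p(y|x) for one image and sums to 1.
    Returns exp(mean over images of KL(p(y|x) || p(y))).
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y): average of the rows.
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    # Accumulate the KL divergence of each conditional from the marginal.
    total_kl = 0.0
    for row in probs:
        total_kl += sum(
            p * (math.log(p + eps) - math.log(q + eps))
            for p, q in zip(row, marginal)
            if p > 0
        )
    return math.exp(total_kl / n)


# Toy inputs (assumed, not from the paper): two classes, two images each.
confident_diverse = [[1.0, 0.0], [0.0, 1.0]]  # sharp and varied predictions
uninformative = [[0.5, 0.5], [0.5, 0.5]]      # identical, flat predictions

score_diverse = inception_score(confident_diverse)
score_flat = inception_score(uninformative)
```

Under this definition the score lies between 1 and the number of classes: it rises when individual predictions are confident and the predicted classes are spread evenly across images, which is why it is used as a joint proxy for image quality and diversity.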


High-resolution image synthesis; Deep learning; Generative adversarial networks



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computing, Imperial College London, London, UK
  2. Data Science Institute, Imperial College London, London, UK
