Modular Generative Adversarial Networks

  • Bo Zhao
  • Bo Chang
  • Zequn Jie
  • Leonid Sigal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11218)


Existing methods for multi-domain image-to-image translation (or generation) attempt to directly map an input image (or a random vector) to an image in one of the output domains. However, most existing methods have limited scalability and robustness, since they require building an independent model for each pair of domains in question. This leads to two significant shortcomings: (1) the need to train an exponential number of pairwise models, and (2) the inability to leverage data from other domains when training a particular pairwise mapping. Inspired by recent work on module networks, this paper proposes ModularGAN for multi-domain image generation and image-to-image translation. ModularGAN consists of several reusable and composable modules that carry out different functions (e.g., encoding, decoding, transformations). These modules can be trained simultaneously, leveraging data from all domains, and then combined at test time to construct a specific GAN network for the image translation task at hand. This gives ModularGAN superior flexibility in generating (or translating to) an image in any desired domain. Experimental results demonstrate that our model not only produces compelling perceptual results but also outperforms state-of-the-art methods on multi-domain facial attribute transfer.
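
The composition idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the modules below are toy arithmetic functions standing in for trained networks, and the attribute names (`hair_color`, `smile`), the `TRANSFORMERS` registry, and the `compose` helper are all hypothetical names introduced for illustration. The sketch only shows how one shared encoder, one shared decoder, and a set of per-attribute transformer modules might be chained at test time, so that a new combination of attributes needs no new pairwise model.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Module = Callable[[Vector], Vector]


def encoder(image: Vector) -> Vector:
    """Toy stand-in for a shared encoder module (image -> features)."""
    return [2.0 * x for x in image]


def decoder(features: Vector) -> Vector:
    """Toy stand-in for a shared decoder module (features -> image)."""
    return [x / 2.0 for x in features]


def make_transformer(offset: float) -> Module:
    """Build a toy attribute-specific transformer that shifts the features."""
    def transform(features: Vector) -> Vector:
        return [x + offset for x in features]
    return transform


# Hypothetical registry: one reusable transformer module per attribute.
TRANSFORMERS: Dict[str, Module] = {
    "hair_color": make_transformer(1.0),
    "smile": make_transformer(-0.5),
}


def compose(attributes: Sequence[str]) -> Module:
    """Assemble encoder -> selected transformers -> decoder for one task."""
    chain: List[Module] = [encoder]
    chain += [TRANSFORMERS[name] for name in attributes]
    chain.append(decoder)

    def pipeline(image: Vector) -> Vector:
        out = image
        for module in chain:
            out = module(out)
        return out

    return pipeline


# The same modules are reused across different translation tasks.
change_hair = compose(["hair_color"])
change_hair_and_smile = compose(["hair_color", "smile"])
print(change_hair([1.0, 2.0]))
print(change_hair_and_smile([1.0, 2.0]))
```

In this toy setup, supporting an additional attribute means registering one more transformer rather than training a model for every pair of domains, which is the scalability argument the abstract makes.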


Neural modular network, Generative adversarial network, Image generation, Image translation



This research was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

Supplementary material

474202_1_En_10_MOESM1_ESM.pdf: Supplementary material 1 (PDF, 4867 KB)



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. University of British Columbia, Vancouver, Canada
  2. Tencent AI Lab, Bellevue, USA
