Shape-Conditioned Image Generation by Learning Latent Appearance Representation from Unpaired Data

  • Yutaro Miyauchi
  • Yusuke Sugano
  • Yasuyuki Matsushita
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11366)

Abstract

Conditional image generation is effective for diverse tasks, including training data synthesis for learning-based computer vision. However, despite recent advances in generative adversarial networks (GANs), it remains challenging to generate images with detailed conditioning on object shapes. Existing methods for conditional image generation use category labels and/or keypoints and give only limited control over object categories. In this work, we present SCGAN, an architecture that generates images with a desired shape specified by an input normal map. The shape-conditioned image generation task is achieved by explicitly modeling the image appearance via a latent appearance vector. The network is trained using unpaired samples of real images and rendered normal maps. This approach enables us to generate images of arbitrary object categories with the target shape and diverse image appearances. We demonstrate the effectiveness of our method through both qualitative and quantitative evaluations on training data generation tasks.
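To make the described pipeline concrete, below is a minimal PyTorch sketch of a shape-conditioned generator in the spirit of the abstract: an encoder maps the input normal map to spatial features, a latent appearance vector is broadcast over the spatial grid and concatenated onto those features, and a decoder produces the RGB image. The class name `SCGenerator`, the layer sizes, and the concatenation-based fusion are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative sketch only: a generator conditioned on a surface-normal map
# plus a latent appearance vector, loosely following the abstract's setup.
# Architecture details (layer widths, fusion by concatenation) are assumed.
import torch
import torch.nn as nn


class SCGenerator(nn.Module):
    def __init__(self, z_dim=128):
        super().__init__()
        self.z_dim = z_dim
        # Encode the 3-channel normal map into a spatial feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Decode shape features fused with the appearance vector into RGB.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256 + z_dim, 128, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, normal_map, z):
        # normal_map: (B, 3, H, W) rendered surface normals; z: (B, z_dim).
        feat = self.encoder(normal_map)
        # Broadcast z over the spatial grid and concatenate channel-wise,
        # so appearance is injected at every spatial location.
        z_map = z.view(-1, self.z_dim, 1, 1).expand(
            -1, -1, feat.size(2), feat.size(3)
        )
        return self.decoder(torch.cat([feat, z_map], dim=1))


if __name__ == "__main__":
    g = SCGenerator()
    normals = torch.randn(2, 3, 64, 64)  # stand-in for rendered normal maps
    z = torch.randn(2, 128)              # latent appearance vectors
    print(g(normals, z).shape)           # torch.Size([2, 3, 64, 64])
```

In the unpaired setup the abstract describes, such a generator would consume rendered normal maps (e.g., from 3D model repositories) while an adversarial discriminator sees unpaired real photographs, so no aligned image/normal-map pairs are required; sampling different appearance vectors for the same normal map would then yield images of a fixed shape with diverse appearances.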

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
