
Pose Guided Human Image Synthesis by View Disentanglement and Enhanced Weighting Loss

  • Mohamed Ilyes Lakhal
  • Oswald Lanz
  • Andrea Cavallaro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11130)

Abstract

View synthesis aims to generate a novel, unseen view of an object, a task that is challenging in the presence of occlusions and asymmetries. In this paper, we present the View-Disentangled Generator (VDG), a two-stage deep network for pose-guided human-image generation that performs coarse view prediction followed by a refinement stage. In the first stage, the network predicts the output from a target human pose, the source image and its corresponding human pose, which are processed in separate branches. This enables the network to learn a representation that disentangles the source and target views. In the second stage, the coarse output of the first stage is refined by adversarial training. Specifically, we introduce a masked version of the structural similarity loss that helps the network focus on generating a higher-quality view. Experiments on Market-1501 and DeepFashion demonstrate the effectiveness of the proposed generator.
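
The abstract does not specify the masked structural similarity loss beyond its name, so the following is a minimal NumPy sketch of one plausible formulation: the per-pixel SSIM map (Wang et al., 2004) is re-weighted by a foreground mask before averaging, and the loss is one minus the weighted mean. The function names (ssim_map, masked_ssim_loss), the uniform 7x7 window and the assumption that the mask marks the person region (e.g. derived from the target pose) are illustrative choices, not the authors' implementation.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def ssim_map(x, y, data_range=1.0, win=7, k1=0.01, k2=0.03):
        # Per-pixel SSIM map for two single-channel float images in [0, data_range].
        c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
        mu_x = uniform_filter(x, win)
        mu_y = uniform_filter(y, win)
        var_x = uniform_filter(x * x, win) - mu_x ** 2
        var_y = uniform_filter(y * y, win) - mu_y ** 2
        cov_xy = uniform_filter(x * y, win) - mu_x * mu_y
        num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
        den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
        return num / den

    def masked_ssim_loss(pred, target, mask, eps=1e-8):
        # pred, target: H x W x C float arrays; mask: H x W foreground weights (hypothetical pose-derived mask).
        # Average the per-channel SSIM maps, re-weight by the normalised mask, return 1 - weighted mean SSIM.
        maps = [ssim_map(pred[..., c], target[..., c]) for c in range(pred.shape[-1])]
        s = np.mean(np.stack(maps, axis=-1), axis=-1)
        w = mask / (mask.sum() + eps)
        return 1.0 - float((s * w).sum())

In this reading, background pixels receive little or no weight, so the generator is penalised mainly for structural errors on the person region rather than on the (often uniform) background.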

Keywords

Pose-guided view synthesis · Generative models · Structural similarity

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Mohamed Ilyes Lakhal (1)
  • Oswald Lanz (2)
  • Andrea Cavallaro (1)
  1. CIS, Queen Mary University of London, London, UK
  2. TeV, Fondazione Bruno Kessler, Trento, Italy
