Dense Pose Transfer

  • Natalia Neverova
  • Rıza Alp Güler
  • Iasonas Kokkinos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11207)


In this work we integrate ideas from surface-based modeling with neural synthesis: we propose a combination of surface-based pose estimation and deep generative models that allows us to perform accurate pose transfer, i.e. synthesize a new image of a person based on a single image of that person and the image of a pose donor. We use a dense pose estimation system that maps pixels from both images to a common surface-based coordinate system, allowing the two images to be brought in correspondence with each other. We inpaint and refine the source image intensities in the surface coordinate system, prior to warping them onto the target pose. These predictions are fused with those of a convolutional predictive module through a neural synthesis module allowing for training the whole pipeline jointly end-to-end, optimizing a combination of adversarial and perceptual losses. We show that dense pose estimation is a substantially more powerful conditioning input than landmark-, or mask-based alternatives, and report systematic improvements over state of the art generators on DeepFashion and MVC datasets.

Supplementary material

474178_1_En_8_MOESM1_ESM.pdf (2.5 mb)
Supplementary material 1 (pdf 2575 KB)


  1. 1.
    Guler, R.A., Neverova, N., Kokkinos, I.: Densepose: dense human pose estimation in the wild. In: CVPR (2018)Google Scholar
  2. 2.
    Karras, T., Aila, T., Samuli, L., Lehtinen, J.: Progressive growing of gans for improved quality, stability, and variation. In: ICLR (2018)Google Scholar
  3. 3.
    Lassner, C., Pons-Moll, G., Gehler, P.V.: A generative model of people in clothing. In: ICCV (2017)Google Scholar
  4. 4.
    Ma, L., Jia, X., Sun, Q., Schiele, B., Tuytelaars, T., Van Gool, L.: Pose guided person image generation. In: NIPS (2017)Google Scholar
  5. 5.
    Siarohin, A., Sangineto, E., Lathuiliere, S., Sebe, N.: Deformable gans for pose-based human image generation. In: CVPR (2018)Google Scholar
  6. 6.
    Chen, Q., Koltun, V.: Photographic image synthesis with cascaded refinement networks. In: ICCV (2017)Google Scholar
  7. 7.
    Wang, T.C., Liu, M.Y., Zhu, J.Y., Tao, A., Jan, K., Bryan, C.: High-resolution image synthesis and semantic manipulation with conditional gans. In: CVPR (2018)Google Scholar
  8. 8.
    Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., Niener, M.: Faceforensics: a large-scale video dataset for forgery detection in human faces. arXiv:1803.09179v1 (2018)
  9. 9.
    Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Weng, W., Webb, R.: Learning from simulated and unsupervised images through adversarial training. In: CVPR (2017)Google Scholar
  10. 10.
    Lample, G., Zeghidour, N., Usunier, N., Bordes, A., Denoyer, L., Ranzato, M.: Fader networks: manipulating images by sliding attributes. In: NIPS (2017)Google Scholar
  11. 11.
    Shu, Z., Yumer, E., Hadap, S., Sunkavalli, K., Shechtman, E., Samaras, D.: Neural face editing with intrinsic image disentangling. In: CVPR (2017)Google Scholar
  12. 12.
    Isola, P., Zhu, J., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: CVPR (2017)Google Scholar
  13. 13.
    Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. 34(6), 248:1–248:16 (2015). (Proc. SIGGRAPH Asia)CrossRefGoogle Scholar
  14. 14.
    Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J.: Keep It SMPL: automatic estimation of 3D human pose and shape from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 561–578. Springer, Cham (2016). Scholar
  15. 15.
    Lassner, C., Romero, J., Kiefel, M., Bogo, F., Black, M.J., Gehler, P.V.: Unite the people: closing the loop between 3D and 2D human representations. In: ICCV (2017)Google Scholar
  16. 16.
    Varol, G., et al.: Learning from synthetic humans. In: CVPR (2017)Google Scholar
  17. 17.
    Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: CVPR (2018)Google Scholar
  18. 18.
    Guler, R.A., Trigeorgis, G., Antonakos, E., Snape, P., Zafeiriou, S., Kokkinos, I.: Densereg: fully convolutional dense shape regression in-the-wild. In: CVPR (2017)Google Scholar
  19. 19.
    Liu, Z., Luo, P., Qiu, S., Wang, X., Tang, X.: Deepfashion: powering robust clothes recognition and retrieval with rich annotations. In: CVPR (2016)Google Scholar
  20. 20.
    Liu, K.H., Chen, T.Y., Chen, C.S.: A dataset for view-invariant clothing retrieval and attribute prediction. In: ICMR (2016)Google Scholar
  21. 21.
    Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)MathSciNetCrossRefGoogle Scholar
  22. 22.
    Kingma, D.P., Welling, M.: Auto-encoding variational bayes. In: ICLR (2014)Google Scholar
  23. 23.
    Goodfellow, I., et al.: Generative adversarial nets. In: NIPS (2014)Google Scholar
  24. 24.
    Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR (2016)Google Scholar
  25. 25.
    Gatys, L.A., Ecker, A.S., Bethge, M.: A neural algorithm of artistic style. In: CVPR (2016)Google Scholar
  26. 26.
    Mao, X., Li, Q., Xie, H., Lau, R.Y., Wang, Z., Smolley, S.P.: Least squares generative adversarial networks. In: ICCV (2017)Google Scholar
  27. 27.
    Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: ICML (2017)Google Scholar
  28. 28.
    Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 694–711. Springer, Cham (2016). Scholar
  29. 29.
    Ulyanov, D., Lebedev, V., Vedaldi, A., Lempitsky, V.: Texture networks: feed-forward synthesis of textures and stylized images. In: ICML (2016)Google Scholar
  30. 30.
    Zhu, S., Fidler, S., Urtasun, R., Lin, D., Loy, C.C.: Be your own prada: fashion synthesis with structural coherence. In: ICCV (2017)Google Scholar
  31. 31.
    Zhao, B., Wu, X., Cheng, Z.Q., Liu, H., Feng, J.: Multi-view image generation from a single-view. In: ACM on Multimedia Conference (2018)Google Scholar
  32. 32.
    Pathak, D., Krähenbühl, P., Donahue, J., Darrell, T., Efros, A.A.: Context encoders: feature learning by inpainting. In: CVPR (2016)Google Scholar
  33. 33.
    Yeh, R.A., Chen, C., Lim, T., Hasegawa-Johnson, M., Do, M.N.: Semantic image inpainting with perceptual and contextual losses. In: CVPR (2017)Google Scholar
  34. 34.
    Saito, S., Wei, L., Hu, L., Nagano, K., Li, H.: Photorealistic facial texture inference using deep neural networks. In: CVPR (2017)Google Scholar
  35. 35.
    Deng, J., Cheng, S., Xue, N., Zhou, Y., Zafeiriou, S.: UV-GAN: adversarial facial UV map completion for pose-invariant face recognition. In: CVPR (2018)Google Scholar
  36. 36.
    Ulyanov, D., Vedaldi, A., Lempitsky, V.: Improved texture networks: maximizing quality and diversity in feed-forward stylization and texture synthesis. In: CVPR (2017)Google Scholar
  37. 37.
    Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: NIPS (2015)Google Scholar
  38. 38.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)Google Scholar
  39. 39.
    Cao, Z., Simon, T., Wei, S., Sheikh, Y.: Realtime multiperson 2D pose estimation using part affinity fields. In: CVPR (2017)Google Scholar
  40. 40.
    Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. In: TIP (2004)Google Scholar
  41. 41.
    Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training gans. In: NIPS (2016)Google Scholar
  42. 42.
    Ma, L., Sun, Q., Georgoulis, S., Van Gool, L., Schiele, B., Fritz, M.: Disentangled person image generation. In: CVPR (2018)Google Scholar
  43. 43.
    Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multi-scale structural similarity for image quality assessment. In: ACSSC (2003)Google Scholar
  44. 44.
    Liu, W., et al.: SSD: Single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). Scholar

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Natalia Neverova
    • 1
  • Rıza Alp Güler
    • 2
  • Iasonas Kokkinos
    • 1
  1. 1.Facebook AI ResearchParisFrance
  2. 2.INRIA-CentraleSupélecParisFrance

Personalised recommendations