Human Motion Generation Based on GAN Toward Unsupervised 3D Human Pose Estimation

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1180)


In this paper, we propose a method for generating joint angle sequences as a step toward unsupervised 3D human pose estimation. Many human pose estimation methods have been proposed. Most of them, however, require a large number of images annotated with ground-truth poses to train the estimation model, and building such datasets is time-consuming. We therefore aim to estimate 3D human poses without training data with known poses. Toward this goal, we propose a GAN-based method for human motion generation and an optimization-based human pose estimation method. The proposed method consists of a generator that generates human pose sequences, a renderer that renders human images by deforming 3D meshes according to the generated pose sequences, and a discriminator that discriminates between generated images and training data. In an experiment on simulated walking images, we confirmed that the proposed method can estimate the poses of body parts that are not occluded.
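The data flow among the three components named in the abstract (generator, renderer, discriminator) might be sketched roughly as follows. This is not the authors' implementation. Every name, size, and weight here is hypothetical, the renderer is a toy planar two-link stick figure standing in for a 3D mesh renderer, and no adversarial training is performed. The sketch only shows how a latent vector could become a joint angle sequence, then a sequence of images, then a single real-or-generated score.

```python
import math
import random

# All sizes below are illustrative assumptions, not values from the paper.
T, J, Z = 8, 2, 4          # frames per sequence, joints, latent dimension
H = W = 16                 # toy image resolution

def generator(z, weights):
    """Map a latent vector to a T x J joint angle sequence in [-pi/2, pi/2]."""
    seq = []
    for t in range(T):
        frame = []
        for j in range(J):
            w = weights[t * J + j]
            s = sum(wi * zi for wi, zi in zip(w, z))
            frame.append(0.5 * math.pi * math.tanh(s))
        seq.append(frame)
    return seq

def render(angles, link=5.0):
    """Toy renderer: rasterize a planar chain hanging from a fixed root."""
    img = [[0.0] * W for _ in range(H)]
    x, y, theta = W / 2.0, 1.0, 0.0     # root near the top centre, pointing down
    for a in angles:
        theta += a                      # joint angles accumulate along the chain
        for k in range(11):             # sample points along the current link
            s = link * k / 10.0
            px = int(round(x + s * math.sin(theta)))
            py = int(round(y + s * math.cos(theta)))
            if 0 <= px < W and 0 <= py < H:
                img[py][px] = 1.0
        x += link * math.sin(theta)
        y += link * math.cos(theta)
    return img

def discriminator(frames, weights, bias=0.0):
    """Logistic score over per-frame pixel coverage: P(sequence is real)."""
    feats = [sum(map(sum, f)) / (H * W) for f in frames]
    s = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-s))

rng = random.Random(0)
g_w = [[rng.gauss(0, 1) for _ in range(Z)] for _ in range(T * J)]  # untrained
d_w = [rng.gauss(0, 1) for _ in range(T)]                          # untrained

z = [rng.gauss(0, 1) for _ in range(Z)]
poses = generator(z, g_w)               # T x J joint angles
frames = [render(p) for p in poses]     # T images of size H x W
score = discriminator(frames, d_w)      # scalar in (0, 1)
```

In an actual GAN setup the renderer would need to be differentiable, so that the discriminator's feedback on the images could reach the generator's parameters. The toy rasterizer above is not differentiable and only illustrates the interfaces.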


Keywords: 3D human pose estimation, Unsupervised learning, Generative adversarial networks



This work was supported by JSPS KAKENHI Grant Numbers JP17K00372 and JP18K11383.



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Ritsumeikan University, Kusatsu, Japan
