
SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12359)

Abstract

Human poses that are rare or unseen in a training set are challenging for a network to predict. As with the long-tailed distribution problem in visual recognition, the small number of examples of such poses limits a network's ability to model them. Interestingly, local pose distributions suffer less from the long-tail problem: the local joint configurations within a rare pose may appear within other poses in the training set, making them less rare. We propose to exploit this fact for better generalization to rare and unseen poses. Specifically, our method splits the body into local regions and processes them in separate network branches, utilizing the property that a joint’s position depends mainly on the joints within its local body region. Global coherence is maintained by recombining the global context from the rest of the body into each branch as a low-dimensional vector. Because the less relevant body areas are reduced in dimensionality, the training-set distribution seen by each network branch more closely reflects the statistics of local poses rather than those of global body poses, without sacrificing information important for joint inference. The proposed split-and-recombine approach, called SRNet, can be easily adapted to both single-image and temporal models, and it leads to appreciable improvements in the prediction of rare and unseen poses.
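The split-and-recombine idea described above can be illustrated with a minimal, untrained forward pass. This is a sketch under stated assumptions, not the authors' SRNet implementation: it assumes a 17-joint skeleton with a hypothetical five-region grouping (`GROUPS`), one hidden layer per branch, and a context vector of size `ctx_dim=4` obtained by linearly compressing the 2D coordinates of every joint outside the branch's region. The names `Linear`, `SplitRecombineLifter`, `ctx_dim` and `hidden` are illustrative only, and the weights are random, so the predicted 3D coordinates are meaningless; only the data flow matters here.

```python
import random

NUM_JOINTS = 17

# Hypothetical grouping of 17 joint indices into local body regions.
# The grouping used in the paper may differ.
GROUPS = {
    "torso_head": [0, 7, 8, 9, 10],
    "right_leg": [1, 2, 3],
    "left_leg": [4, 5, 6],
    "left_arm": [11, 12, 13],
    "right_arm": [14, 15, 16],
}


class Linear:
    """Dense layer y = Wx + b with small random weights (untrained)."""

    def __init__(self, n_in, n_out, rng):
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x):
        assert len(x) == len(self.w[0]), "input size mismatch"
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi
                for row, bi in zip(self.w, self.b)]


def relu(v):
    return [max(0.0, x) for x in v]


class SplitRecombineLifter:
    """Lifts 2D joints to 3D with one branch per body region (sketch)."""

    def __init__(self, groups, ctx_dim=4, hidden=32, seed=0):
        rng = random.Random(seed)
        self.groups = groups
        self.branches = {}
        for name, joints in groups.items():
            n_rest = NUM_JOINTS - len(joints)
            self.branches[name] = {
                # Compress the rest of the body into a low-dimensional context.
                "ctx": Linear(2 * n_rest, ctx_dim, rng),
                # Local joints at full resolution, concatenated with the context.
                "hidden": Linear(2 * len(joints) + ctx_dim, hidden, rng),
                # Predict (x, y, z) only for this branch's own joints.
                "out": Linear(hidden, 3 * len(joints), rng),
            }

    def forward(self, pose2d):
        """pose2d: list of NUM_JOINTS (x, y) pairs -> list of (x, y, z) tuples."""
        assert len(pose2d) == NUM_JOINTS
        pose3d = [None] * NUM_JOINTS
        for name, joints in self.groups.items():
            layers = self.branches[name]
            in_group = set(joints)
            local = [c for j in joints for c in pose2d[j]]
            rest = [c for j in range(NUM_JOINTS) if j not in in_group for c in pose2d[j]]
            context = layers["ctx"](rest)  # recombined global context
            h = relu(layers["hidden"](local + context))
            xyz = layers["out"](h)
            # Scatter the branch output back to the full-skeleton joint order.
            for k, j in enumerate(joints):
                pose3d[j] = tuple(xyz[3 * k: 3 * k + 3])
        return pose3d


if __name__ == "__main__":
    rng = random.Random(42)
    pose2d = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(NUM_JOINTS)]
    model = SplitRecombineLifter(GROUPS)
    pose3d = model.forward(pose2d)
    print(len(pose3d), pose3d[0])
```

The asymmetry in each branch's input is what this sketch is meant to show: a branch's own joints enter at full resolution, while the remaining 2 × (17 − k) coordinates are squeezed into only four numbers, so the branch's input is dominated by the local configuration. A real model would be trained on paired 2D and 3D poses, and the recombination could act on intermediate branch features rather than on the raw 2D input as done here.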

Keywords

Human pose estimation; 2D to 3D; Long-tailed distribution

Supplementary material

Supplementary material 1 (pdf 1124 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. The Chinese University of Hong Kong, Hong Kong, China
  2. Microsoft Research Asia, Beijing, China