View-Invariant Probabilistic Embedding for Human Pose

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12350)

Abstract

Depictions of similar human body configurations can vary with changing viewpoints. Using only 2D information, we would like to enable vision algorithms to recognize similarity in human body poses across multiple views. This ability is useful for analyzing body movements and human behaviors in images and videos. In this paper, we propose an approach for learning a compact view-invariant embedding space from 2D joint keypoints alone, without explicitly predicting 3D poses. Since 2D poses are projected from 3D space, they have an inherent ambiguity, which is difficult to represent through a deterministic mapping. Hence, we use probabilistic embeddings to model this input uncertainty. Experimental results show that our embedding model achieves higher accuracy when retrieving similar poses across different camera views, in comparison with 2D-to-3D pose lifting models. We also demonstrate the effectiveness of applying our embeddings to view-invariant action recognition and video alignment. Our code is available at https://github.com/google-research/google-research/tree/master/poem.
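The idea of a probabilistic embedding can be illustrated with a small sketch. It is not the authors' implementation: it assumes each 2D pose has already been mapped (by some network, omitted here) to a diagonal Gaussian in embedding space, and it estimates how likely two poses are to match by sampling from both Gaussians. The function names and the scaling parameters `a` and `b` are hypothetical and chosen only for illustration.

```python
import numpy as np


def sample_gaussian(mean, std, num_samples, rng):
    """Draw samples from a diagonal Gaussian N(mean, diag(std**2))."""
    eps = rng.standard_normal((num_samples, mean.shape[0]))
    return mean + std * eps


def matching_probability(mean_a, std_a, mean_b, std_b,
                         num_samples=2000, a=5.0, b=2.0, seed=0):
    """Monte Carlo estimate of E[sigmoid(-a * ||z_a - z_b|| + b)].

    z_a and z_b are samples from the two Gaussian embeddings.
    a and b are illustrative (hypothetical) scale and offset values.
    """
    rng = np.random.default_rng(seed)
    z_a = sample_gaussian(mean_a, std_a, num_samples, rng)
    z_b = sample_gaussian(mean_b, std_b, num_samples, rng)
    dist = np.linalg.norm(z_a - z_b, axis=1)
    # sigmoid(-a * dist + b) written as 1 / (1 + exp(a * dist - b))
    return float(np.mean(1.0 / (1.0 + np.exp(a * dist - b))))


if __name__ == "__main__":
    mu = np.zeros(4)
    confident = np.full(4, 0.05)   # low variance: unambiguous 2D pose
    uncertain = np.full(4, 1.0)    # high variance: ambiguous 2D pose
    print("same mean, confident:", matching_probability(mu, confident, mu, confident))
    print("same mean, uncertain:", matching_probability(mu, uncertain, mu, uncertain))
    print("distant means:       ", matching_probability(mu, confident, mu + 2.0, confident))
```

In this sketch, a larger variance lowers the matching score even when the means coincide, which is one way an embedding can express that an ambiguous 2D pose is consistent with several 3D poses. A deterministic point embedding has no comparable way to signal that ambiguity.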

J.J. Sun—This work was done during the author’s internship at Google.

Acknowledgment

We thank Yuxiao Wang, Debidatta Dwibedi, and Liangzhe Yuan from Google Research, Long Zhao from Rutgers University, and Xiao Zhang from the University of Chicago for helpful discussions. We appreciate the support of Pietro Perona, Yisong Yue, and the Computational Vision Lab at Caltech for making this collaboration possible. The author Jennifer J. Sun is supported by NSERC (funding number PGSD3-532647-2019) and Caltech.

Corresponding author

Correspondence to Jennifer J. Sun.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2333 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Sun, J.J., Zhao, J., Chen, LC., Schroff, F., Adam, H., Liu, T. (2020). View-Invariant Probabilistic Embedding for Human Pose. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12350. Springer, Cham. https://doi.org/10.1007/978-3-030-58558-7_4

  • DOI: https://doi.org/10.1007/978-3-030-58558-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58557-0

  • Online ISBN: 978-3-030-58558-7

  • eBook Packages: Computer Science, Computer Science (R0)
