
Optical Flow-Based 3D Human Motion Estimation from Monocular Video

  • Conference paper
Pattern Recognition (GCPR 2017)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10496)

Abstract

This paper presents a method to estimate 3D human pose and body shape from monocular video. While recent approaches infer 3D pose from silhouettes and landmarks, we exploit properties of optical flow to temporally constrain the reconstructed motion. We estimate human motion by minimizing the difference between optical flow fields computed from the video and the flow fields produced by our novel flow renderer. Using only a single semi-automatic initialization step, we can reconstruct monocular sequences without joint annotations. Our test scenarios demonstrate that optical flow effectively regularizes the under-constrained problem of estimating human shape and motion from monocular video.
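The core idea of the abstract, comparing flow computed from the video with flow rendered from a 3D body estimate, can be illustrated with a toy energy. This is a minimal sketch under stated assumptions and is not the authors' flow renderer: it assumes a pinhole camera, represents the body as a sparse set of camera-space vertices (no parametric body model, mesh rasterization, or occlusion handling), and samples the observed flow field at the nearest pixel. The function names `project`, `render_vertex_flow`, `sample_flow`, and `flow_energy` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]
FlowField = Sequence[Sequence[Vec2]]  # H x W grid of (du, dv) per pixel


def project(p: Vec3, f: float, cx: float, cy: float) -> Vec2:
    """Pinhole projection of a camera-space point (assumes z > 0)."""
    x, y, z = p
    return (f * x / z + cx, f * y / z + cy)


def render_vertex_flow(verts_t: Sequence[Vec3], verts_t1: Sequence[Vec3],
                       f: float, cx: float, cy: float) -> List[Vec2]:
    """Hypothetical sparse 'flow renderer': the 2D image displacement of each
    vertex between frame t and frame t+1. Ignores occlusion and rasterization."""
    flows = []
    for a, b in zip(verts_t, verts_t1):
        ua, va = project(a, f, cx, cy)
        ub, vb = project(b, f, cx, cy)
        flows.append((ub - ua, vb - va))
    return flows


def sample_flow(field: FlowField, u: float, v: float) -> Optional[Vec2]:
    """Nearest-pixel lookup in the observed flow field; None if outside the image."""
    row, col = int(round(v)), int(round(u))
    if 0 <= row < len(field) and 0 <= col < len(field[0]):
        return field[row][col]
    return None


def flow_energy(verts_t: Sequence[Vec3], verts_t1: Sequence[Vec3],
                observed: FlowField, f: float, cx: float, cy: float) -> float:
    """Sum of squared differences between rendered and observed flow,
    evaluated at each vertex's projected position in frame t."""
    rendered = render_vertex_flow(verts_t, verts_t1, f, cx, cy)
    energy = 0.0
    for p, (ru, rv) in zip(verts_t, rendered):
        obs = sample_flow(observed, *project(p, f, cx, cy))
        if obs is None:  # vertex projects outside the image: no flow measurement
            continue
        energy += (ru - obs[0]) ** 2 + (rv - obs[1]) ** 2
    return energy


if __name__ == "__main__":
    # One vertex moving 1 unit along x at depth 2; with f = 8 it moves 4 px.
    field = [[(3.0, 0.0)] * 5 for _ in range(5)]  # observed flow: 3 px right
    print(flow_energy([(0.0, 0.0, 2.0)], [(1.0, 0.0, 2.0)], field, 8.0, 2.0, 2.0))
    # prints 1.0
```

In a complete pipeline of this kind, the energy would be minimized over the pose and shape parameters that generate the vertices in both frames, typically alongside other terms; here the vertices are supplied directly to keep the example self-contained.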

Notes

  1. https://bitbucket.org/aauvap/multimodal-pixel-annotator.

Acknowledgments

The authors gratefully acknowledge funding by the German Science Foundation through project DFG MA2555/12-1.

Author information

Correspondence to Thiemo Alldieck.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Alldieck, T., Kassubeck, M., Wandt, B., Rosenhahn, B., Magnor, M. (2017). Optical Flow-Based 3D Human Motion Estimation from Monocular Video. In: Roth, V., Vetter, T. (eds) Pattern Recognition. GCPR 2017. Lecture Notes in Computer Science, vol 10496. Springer, Cham. https://doi.org/10.1007/978-3-319-66709-6_28

  • DOI: https://doi.org/10.1007/978-3-319-66709-6_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66708-9

  • Online ISBN: 978-3-319-66709-6

  • eBook Packages: Computer Science, Computer Science (R0)