Optical Flow-Based 3D Human Motion Estimation from Monocular Video

  • Thiemo Alldieck
  • Marc Kassubeck
  • Bastian Wandt
  • Bodo Rosenhahn
  • Marcus Magnor
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10496)


This paper presents a method to estimate 3D human pose and body shape from monocular videos. While recent approaches infer the 3D pose from silhouettes and landmarks, we exploit properties of optical flow to temporally constrain the reconstructed motion. We estimate human motion by minimizing the difference between computed flow fields and the output of our novel flow renderer. By just using a single semi-automatic initialization step, we are able to reconstruct monocular sequences without joint annotation. Our test scenarios demonstrate that optical flow effectively regularizes the under-constrained problem of human shape and motion estimation from monocular video.



The authors gratefully acknowledge funding by the German Science Foundation from project DFG MA2555/12-1.


  1. 1.
    Akhter, I., Black, M.J.: Pose-conditioned joint angle limits for 3D human pose reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1446–1455 (2015)Google Scholar
  2. 2.
    Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., Davis, J.: SCAPE: shape completion and animation of people. In: ACM Transactions on Graphics (TOG), vol. 24, pp. 408–416. ACM (2005)Google Scholar
  3. 3.
    Bălan, A.O., Black, M.J., Haussecker, H., Sigal, L.: Shining a light on human pose: on shadows, shading and the estimation of pose and shape. In: IEEE International Conference on Computer Vision, pp. 1–8. IEEE (2007)Google Scholar
  4. 4.
    Bălan, A.O., Sigal, L., Black, M.J., Davis, J.E., Haussecker, H.W.: Detailed human shape and pose from images. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2007)Google Scholar
  5. 5.
    Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J.: Keep it SMPL: automatic estimation of 3D human pose and shape from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 561–578. Springer, Cham (2016). doi: 10.1007/978-3-319-46454-1_34 CrossRefGoogle Scholar
  6. 6.
    Brox, T., Rosenhahn, B., Cremers, D., Seidel, H.-P.: High accuracy optical flow serves 3-D pose tracking: exploiting contour and flow based constraints. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 98–111. Springer, Heidelberg (2006). doi: 10.1007/11744047_8 CrossRefGoogle Scholar
  7. 7.
    Carranza, J., Theobalt, C., Magnor, M.A., Seidel, H.P.: Free-viewpoint video of human actors. In: ACM transactions on graphics (TOG), vol. 22, pp. 569–577. ACM (2003)Google Scholar
  8. 8.
    Chen, Y., Kim, T.-K., Cipolla, R.: Inferring 3D shapes and deformations from single views. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6313, pp. 300–313. Springer, Heidelberg (2010). doi: 10.1007/978-3-642-15558-1_22 CrossRefGoogle Scholar
  9. 9.
    Efros, A.A., Berg, A.C., Mori, G., Malik, J.: Recognizing action at a distance. In: IEEE International Conference on Computer Vision, pp. 726–733. IEEE (2003)Google Scholar
  10. 10.
    Elhayek, A., de Aguiar, E., Jain, A., Thompson, J., Pishchulin, L., Andriluka, M., Bregler, C., Schiele, B., Theobalt, C.: MARCOnI—ConvNet-based marker-less motion capture in outdoor and indoor scenes. IEEE Trans. Pattern Anal. Mach. Intell. 39(3), 501–514 (2017)CrossRefGoogle Scholar
  11. 11.
    Fablet, R., Black, M.J.: Automatic detection and tracking of human motion with a view-based representation. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 476–491. Springer, Heidelberg (2002). doi: 10.1007/3-540-47969-4_32 CrossRefGoogle Scholar
  12. 12.
    Fragkiadaki, K., Hu, H., Shi, J.: Pose from flow and flow from pose. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2059–2066 (2013)Google Scholar
  13. 13.
    Gibson, J.J.: The Perception of the Visual World. Houghton Mifflin, Boston (1950)Google Scholar
  14. 14.
    Guan, P., Weiss, A., Bălan, A.O., Black, M.J.: Estimating human shape and pose from a single image. In: International Conference on Computer Vision, pp. 1381–1388. IEEE (2009)Google Scholar
  15. 15.
    Hasler, N., Ackermann, H., Rosenhahn, B., Thormahlen, T., Seidel, H.P.: Multilinear pose and body shape estimation of dressed subjects from image sets. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1823–1830. IEEE (2010)Google Scholar
  16. 16.
    Hasler, N., Stoll, C., Sunkel, M., Rosenhahn, B., Seidel, H.P.: A statistical model of human pose and body shape. Comput. Graph. Forum. 28, 337–346 (2009)CrossRefGoogle Scholar
  17. 17.
    Horn, B.K., Schunck, B.G.: Determining optical flow. Artif. Intell. 17(1–3), 185–203 (1981)CrossRefGoogle Scholar
  18. 18.
    Jain, A., Thormählen, T., Seidel, H.P., Theobalt, C.: MovieReshape: tracking and reshaping of humans in videos. ACM Trans. Graph. (TOG) 29(6), 148 (2010)CrossRefGoogle Scholar
  19. 19.
    Loper, M.M., Black, M.J.: OpenDR: an approximate differentiable renderer. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8695, pp. 154–169. Springer, Cham (2014). doi: 10.1007/978-3-319-10584-0_11 Google Scholar
  20. 20.
    Loper, M.M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. 34(6), 248:1–248:16 (2015)CrossRefGoogle Scholar
  21. 21.
    Magnor, M.A., Grau, O., Sorkine-Hornung, O., Theobalt, C. (eds.): Digital Representations of the Real World: How to Capture, Model, and Render Visual Reality. CRC Press, Boca Raton (2015)zbMATHGoogle Scholar
  22. 22.
    Moeslund, T.B., Hilton, A., Krüger, V.: A survey of advances in vision-based human motion capture and analysis. Comput. Vis. Image Underst. 104(2), 90–126 (2006)CrossRefGoogle Scholar
  23. 23.
    Oliveira, G.L., Valada, A., Bollen, C., Burgard, W., Brox, T.: Deep learning for human part discovery in images. In: IEEE International Conference on Robotics and Automation (2016)Google Scholar
  24. 24.
    Pishchulin, L., Insafutdinov, E., Tang, S., Andres, B., Andriluka, M., Gehler, P., Schiele, B.: Deepcut: Joint subset partition and labeling for multi person pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016Google Scholar
  25. 25.
    Ramakrishna, V., Kanade, T., Sheikh, Y.: Reconstructing 3D human pose from 2D image landmarks. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 573–586. Springer, Heidelberg (2012). doi: 10.1007/978-3-642-33765-9_41 CrossRefGoogle Scholar
  26. 26.
    Rehan, A., Zaheer, A., Akhter, I., Saeed, A., Mahmood, B., Usmani, M., Khan, S.: NRSfM using local rigidity. In: Winter Conference on Applications of Computer Vision, pp. 69–74. IEEE, Steamboat Springs, March 2014Google Scholar
  27. 27.
    Rhodin, H., Robertini, N., Casas, D., Richardt, C., Seidel, H.-P., Theobalt, C.: General automatic human shape and motion capture using volumetric contour cues. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 509–526. Springer, Cham (2016). doi: 10.1007/978-3-319-46454-1_31 CrossRefGoogle Scholar
  28. 28.
    Rogge, L., Klose, F., Stengel, M., Eisemann, M., Magnor, M.: Garment replacement in monocular video sequences. ACM Trans. Graph. 34(1), 6:1–6:10 (2014)CrossRefGoogle Scholar
  29. 29.
    Romero, J., Loper, M., Black, M.J.: FlowCap: 2D human pose from optical flow. In: Gall, J., Gehler, P., Leibe, B. (eds.) GCPR 2015. LNCS, vol. 9358, pp. 412–423. Springer, Cham (2015). doi: 10.1007/978-3-319-24947-6_34 CrossRefGoogle Scholar
  30. 30.
    Sapp, B., Weiss, D., Taskar, B.: Parsing human motion with stretchable models. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1281–1288. IEEE (2011)Google Scholar
  31. 31.
    Sigal, L., Balan, A., Black, M.J.: Combined discriminative and generative articulated pose and non-rigid shape estimation. In: Advances in Neural Information Processing Systems, pp. 1337–1344 (2007)Google Scholar
  32. 32.
    Sigal, L., Balan, A.O., Black, M.J.: HumanEva: synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. Int. J. Comput. Vis. 87(1–2), 4–27 (2010)CrossRefGoogle Scholar
  33. 33.
    Simo-Serra, E., Ramisa, A., Aleny, G., Torras, C., Moreno-Noguer, F.: Single image 3D human pose estimation from noisy observations. In: Conference on Computer Vision and Pattern Recognition, pp. 2673–2680. IEEE (2012)Google Scholar
  34. 34.
    Vedula, S., Baker, S., Rander, P., Collins, R., Kanade, T.: Three-dimensional scene flow. In: IEEE International Conference on Computer Vision, vol. 2, pp. 722–729. IEEE (1999)Google Scholar
  35. 35.
    Wandt, B., Ackermann, H., Rosenhahn, B.: 3D human motion capture from monocular image sequences. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, June 2015Google Scholar
  36. 36.
    Wandt, B., Ackermann, H., Rosenhahn, B.: 3D reconstruction of human motion from monocular image sequences. Trans. Pattern Anal. Mach. Intell. 38, 1505–1516 (2016)CrossRefGoogle Scholar
  37. 37.
    Xu, L., Jia, J., Matsushita, Y.: Motion detail preserving optical flow estimation. IEEE Trans. Pattern Anal. Mach. Intell. 34(9), 1744–1757 (2012)CrossRefGoogle Scholar
  38. 38.
    Zhou, S., Fu, H., Liu, L., Cohen-Or, D., Han, X.: Parametric reshaping of human bodies in images. In: ACM Transactions on Graphics (TOG), vol. 29, p. 126. ACM (2010)Google Scholar
  39. 39.
    Zhou, X., Leonardos, S., Hu, X., Daniilidis, K.: 3D shape estimation from 2D landmarks: a convex relaxation approach. In: CVPR, pp. 4447–4455. IEEE Computer Society (2015)Google Scholar
  40. 40.
    Zhou, X., Zhu, M., Leonardos, S., Derpanis, K.G., Daniilidis, K.: Sparseness meets deepness: 3D human pose estimation from monocular video. In: Conference on Computer Vision and Pattern Recognition, June 2016Google Scholar
  41. 41.
    Zuffi, S., Romero, J., Schmid, C., Black, M.J.: Estimating human pose with flowing puppets. IEEE International Conference on Computer Vision, pp. 3312–3319 (2013)Google Scholar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Thiemo Alldieck
    • 1
  • Marc Kassubeck
    • 1
  • Bastian Wandt
    • 2
  • Bodo Rosenhahn
    • 2
  • Marcus Magnor
    • 1
  1. 1.Computer Graphics LabTU BraunschweigBraunschweigGermany
  2. 2.Institut für InformationsverarbeitungLeibniz Universität HannoverHannoverGermany

Personalised recommendations