Abstract
This paper presents a method to estimate 3D human pose and body shape from monocular videos. While recent approaches infer the 3D pose from silhouettes and landmarks, we exploit properties of optical flow to temporally constrain the reconstructed motion. We estimate human motion by minimizing the difference between computed flow fields and the output of our novel flow renderer. By just using a single semi-automatic initialization step, we are able to reconstruct monocular sequences without joint annotation. Our test scenarios demonstrate that optical flow effectively regularizes the under-constrained problem of human shape and motion estimation from monocular video.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Akhter, I., Black, M.J.: Pose-conditioned joint angle limits for 3D human pose reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1446–1455 (2015)
Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., Davis, J.: SCAPE: shape completion and animation of people. In: ACM Transactions on Graphics (TOG), vol. 24, pp. 408–416. ACM (2005)
Bălan, A.O., Black, M.J., Haussecker, H., Sigal, L.: Shining a light on human pose: on shadows, shading and the estimation of pose and shape. In: IEEE International Conference on Computer Vision, pp. 1–8. IEEE (2007)
Bălan, A.O., Sigal, L., Black, M.J., Davis, J.E., Haussecker, H.W.: Detailed human shape and pose from images. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2007)
Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J.: Keep it SMPL: automatic estimation of 3D human pose and shape from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 561–578. Springer, Cham (2016). doi:10.1007/978-3-319-46454-1_34
Brox, T., Rosenhahn, B., Cremers, D., Seidel, H.-P.: High accuracy optical flow serves 3-D pose tracking: exploiting contour and flow based constraints. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 98–111. Springer, Heidelberg (2006). doi:10.1007/11744047_8
Carranza, J., Theobalt, C., Magnor, M.A., Seidel, H.P.: Free-viewpoint video of human actors. In: ACM transactions on graphics (TOG), vol. 22, pp. 569–577. ACM (2003)
Chen, Y., Kim, T.-K., Cipolla, R.: Inferring 3D shapes and deformations from single views. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6313, pp. 300–313. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15558-1_22
Efros, A.A., Berg, A.C., Mori, G., Malik, J.: Recognizing action at a distance. In: IEEE International Conference on Computer Vision, pp. 726–733. IEEE (2003)
Elhayek, A., de Aguiar, E., Jain, A., Thompson, J., Pishchulin, L., Andriluka, M., Bregler, C., Schiele, B., Theobalt, C.: MARCOnI—ConvNet-based marker-less motion capture in outdoor and indoor scenes. IEEE Trans. Pattern Anal. Mach. Intell. 39(3), 501–514 (2017)
Fablet, R., Black, M.J.: Automatic detection and tracking of human motion with a view-based representation. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 476–491. Springer, Heidelberg (2002). doi:10.1007/3-540-47969-4_32
Fragkiadaki, K., Hu, H., Shi, J.: Pose from flow and flow from pose. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2059–2066 (2013)
Gibson, J.J.: The Perception of the Visual World. Houghton Mifflin, Boston (1950)
Guan, P., Weiss, A., Bălan, A.O., Black, M.J.: Estimating human shape and pose from a single image. In: International Conference on Computer Vision, pp. 1381–1388. IEEE (2009)
Hasler, N., Ackermann, H., Rosenhahn, B., Thormahlen, T., Seidel, H.P.: Multilinear pose and body shape estimation of dressed subjects from image sets. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1823–1830. IEEE (2010)
Hasler, N., Stoll, C., Sunkel, M., Rosenhahn, B., Seidel, H.P.: A statistical model of human pose and body shape. Comput. Graph. Forum. 28, 337–346 (2009)
Horn, B.K., Schunck, B.G.: Determining optical flow. Artif. Intell. 17(1–3), 185–203 (1981)
Jain, A., Thormählen, T., Seidel, H.P., Theobalt, C.: MovieReshape: tracking and reshaping of humans in videos. ACM Trans. Graph. (TOG) 29(6), 148 (2010)
Loper, M.M., Black, M.J.: OpenDR: an approximate differentiable renderer. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8695, pp. 154–169. Springer, Cham (2014). doi:10.1007/978-3-319-10584-0_11
Loper, M.M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. 34(6), 248:1–248:16 (2015)
Magnor, M.A., Grau, O., Sorkine-Hornung, O., Theobalt, C. (eds.): Digital Representations of the Real World: How to Capture, Model, and Render Visual Reality. CRC Press, Boca Raton (2015)
Moeslund, T.B., Hilton, A., Krüger, V.: A survey of advances in vision-based human motion capture and analysis. Comput. Vis. Image Underst. 104(2), 90–126 (2006)
Oliveira, G.L., Valada, A., Bollen, C., Burgard, W., Brox, T.: Deep learning for human part discovery in images. In: IEEE International Conference on Robotics and Automation (2016)
Pishchulin, L., Insafutdinov, E., Tang, S., Andres, B., Andriluka, M., Gehler, P., Schiele, B.: Deepcut: Joint subset partition and labeling for multi person pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016
Ramakrishna, V., Kanade, T., Sheikh, Y.: Reconstructing 3D human pose from 2D image landmarks. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 573–586. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33765-9_41
Rehan, A., Zaheer, A., Akhter, I., Saeed, A., Mahmood, B., Usmani, M., Khan, S.: NRSfM using local rigidity. In: Winter Conference on Applications of Computer Vision, pp. 69–74. IEEE, Steamboat Springs, March 2014
Rhodin, H., Robertini, N., Casas, D., Richardt, C., Seidel, H.-P., Theobalt, C.: General automatic human shape and motion capture using volumetric contour cues. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 509–526. Springer, Cham (2016). doi:10.1007/978-3-319-46454-1_31
Rogge, L., Klose, F., Stengel, M., Eisemann, M., Magnor, M.: Garment replacement in monocular video sequences. ACM Trans. Graph. 34(1), 6:1–6:10 (2014)
Romero, J., Loper, M., Black, M.J.: FlowCap: 2D human pose from optical flow. In: Gall, J., Gehler, P., Leibe, B. (eds.) GCPR 2015. LNCS, vol. 9358, pp. 412–423. Springer, Cham (2015). doi:10.1007/978-3-319-24947-6_34
Sapp, B., Weiss, D., Taskar, B.: Parsing human motion with stretchable models. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1281–1288. IEEE (2011)
Sigal, L., Balan, A., Black, M.J.: Combined discriminative and generative articulated pose and non-rigid shape estimation. In: Advances in Neural Information Processing Systems, pp. 1337–1344 (2007)
Sigal, L., Balan, A.O., Black, M.J.: HumanEva: synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. Int. J. Comput. Vis. 87(1–2), 4–27 (2010)
Simo-Serra, E., Ramisa, A., Aleny, G., Torras, C., Moreno-Noguer, F.: Single image 3D human pose estimation from noisy observations. In: Conference on Computer Vision and Pattern Recognition, pp. 2673–2680. IEEE (2012)
Vedula, S., Baker, S., Rander, P., Collins, R., Kanade, T.: Three-dimensional scene flow. In: IEEE International Conference on Computer Vision, vol. 2, pp. 722–729. IEEE (1999)
Wandt, B., Ackermann, H., Rosenhahn, B.: 3D human motion capture from monocular image sequences. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, June 2015
Wandt, B., Ackermann, H., Rosenhahn, B.: 3D reconstruction of human motion from monocular image sequences. Trans. Pattern Anal. Mach. Intell. 38, 1505–1516 (2016)
Xu, L., Jia, J., Matsushita, Y.: Motion detail preserving optical flow estimation. IEEE Trans. Pattern Anal. Mach. Intell. 34(9), 1744–1757 (2012)
Zhou, S., Fu, H., Liu, L., Cohen-Or, D., Han, X.: Parametric reshaping of human bodies in images. In: ACM Transactions on Graphics (TOG), vol. 29, p. 126. ACM (2010)
Zhou, X., Leonardos, S., Hu, X., Daniilidis, K.: 3D shape estimation from 2D landmarks: a convex relaxation approach. In: CVPR, pp. 4447–4455. IEEE Computer Society (2015)
Zhou, X., Zhu, M., Leonardos, S., Derpanis, K.G., Daniilidis, K.: Sparseness meets deepness: 3D human pose estimation from monocular video. In: Conference on Computer Vision and Pattern Recognition, June 2016
Zuffi, S., Romero, J., Schmid, C., Black, M.J.: Estimating human pose with flowing puppets. IEEE International Conference on Computer Vision, pp. 3312–3319 (2013)
Acknowledgments
The authors gratefully acknowledge funding by the German Science Foundation from project DFG MA2555/12-1.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Alldieck, T., Kassubeck, M., Wandt, B., Rosenhahn, B., Magnor, M. (2017). Optical Flow-Based 3D Human Motion Estimation from Monocular Video. In: Roth, V., Vetter, T. (eds) Pattern Recognition. GCPR 2017. Lecture Notes in Computer Science(), vol 10496. Springer, Cham. https://doi.org/10.1007/978-3-319-66709-6_28
Download citation
DOI: https://doi.org/10.1007/978-3-319-66709-6_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66708-9
Online ISBN: 978-3-319-66709-6
eBook Packages: Computer ScienceComputer Science (R0)