Abstract
This paper considers the problem of automatically recovering temporally consistent animated 3D models of arbitrary shapes in multi-camera setups. An approach is presented that takes as input a sequence of frame-wise reconstructed surfaces and iteratively deforms a reference surface such that it fits the input observations. This approach addresses several issues in this field, including large frame-to-frame deformations, noise, missing data, outliers, and shapes composed of multiple components with arbitrary geometries. The problem is cast as a geometric registration with two major features. First, surface deformations are modeled using mesh decomposition into elements called patches. This strategy ensures robustness by enabling flexible regularization priors through inter-patch rigidity constraints. Second, registration is formulated as a Bayesian estimation that alternates between probabilistic data-model association and deformation parameter estimation. This accounts for uncertainties in the acquisition process and allows for noise, outliers and missing geometries in the observed meshes. In the case of marker-less 3D human motion capture, this framework can be specialized further with additional articulated motion constraints. Extensive experiments on various 4D datasets show that complex scenes with multiple objects of arbitrary nature can be processed in a robust way. They also demonstrate that the framework can capture human motion and provides visually convincing as well as quantitatively reliable human poses.
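The alternation between probabilistic data-model association and parameter estimation described above can be illustrated with a deliberately simplified sketch. It is not the paper's method: the deformation is reduced to a single 2D translation instead of patch-based surface deformation, and the function name `em_translation`, the fixed `sigma`, and the constant `outlier_term` are hypothetical choices made only for illustration.

```python
import math

def em_translation(model, observations, sigma=0.2, outlier_term=1e-3, iters=30):
    """Toy EM loop: estimate a 2D translation t so that model + t fits the data.

    Assumptions (illustrative only): isotropic Gaussian noise with fixed sigma,
    equal priors on all model points, and a constant outlier_term standing in
    for a uniform outlier class.
    """
    tx, ty = 0.0, 0.0
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)  # 2D isotropic Gaussian normaliser
    for _ in range(iters):
        sum_w = sum_dx = sum_dy = 0.0
        for ox, oy in observations:
            # E-step: likelihood of this observation under each translated model point.
            likes = []
            for mx, my in model:
                d2 = (ox - mx - tx) ** 2 + (oy - my - ty) ** 2
                likes.append(norm * math.exp(-d2 / (2.0 * sigma ** 2)))
            # The outlier term absorbs observations that no model point explains.
            total = sum(likes) + outlier_term
            for (mx, my), like in zip(model, likes):
                p = like / total  # soft association weight
                sum_w += p
                sum_dx += p * (ox - mx)
                sum_dy += p * (oy - my)
        if sum_w == 0.0:
            break  # every observation was attributed to the outlier class
        # M-step: closed-form weighted mean of the observation-model offsets.
        tx, ty = sum_dx / sum_w, sum_dy / sum_w
    return tx, ty

if __name__ == "__main__":
    model = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
    # Observations: the model shifted by (0.3, -0.2), plus one gross outlier.
    observations = [(x + 0.3, y - 0.2) for x, y in model] + [(50.0, 50.0)]
    print(em_translation(model, observations))
```

Because the far-away observation receives a near-zero association weight with every model point, it has almost no influence on the estimated translation, which is the role the outlier class plays in this kind of formulation.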
Acknowledgments
This work was partially funded by Deutsche Telekom Laboratories and partly conducted in their laboratory.
Additional information
Communicated by M. Hebert.
Chun-Hao Huang and Cedric Cagniart have contributed equally to this paper.
Cite this article
Huang, CH., Cagniart, C., Boyer, E. et al. A Bayesian Approach to Multi-view 4D Modeling. Int J Comput Vis 116, 115–135 (2016). https://doi.org/10.1007/s11263-015-0832-y
DOI: https://doi.org/10.1007/s11263-015-0832-y