Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints

Sidhu, Vikramjit; Tretschk, Edgar; Golyanik, Vladislav; Agudo, Antonio; Theobalt, Christian

doi:10.1007/978-3-030-58517-4_13


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12361)

Included in the following conference series:

European Conference on Computer Vision


Abstract

We introduce the first dense neural non-rigid structure from motion (N-NRSfM) approach, which can be trained end-to-end in an unsupervised manner from 2D point tracks. In contrast to competing methods, our combination of loss functions is fully differentiable and can be readily integrated into deep-learning systems. We formulate the deformation model as an auto-decoder and impose subspace constraints on the recovered latent space function in the frequency domain. Exploiting the state recurrence cue, we classify the reconstructed non-rigid surfaces by their similarity and recover the period of the input sequence. Our N-NRSfM approach achieves competitive accuracy on widely used benchmark sequences and high visual quality on various real videos. Apart from being a standalone technique, our method enables multiple applications, including shape compression, completion and interpolation, among others. Combined with an encoder trained directly on 2D images, it performs scenario-specific monocular 3D shape reconstruction at interactive frame rates. To facilitate reproducibility and to support this new research direction, we open-source our code and provide trained models for research purposes (http://gvv.mpi-inf.mpg.de/projects/Neural_NRSfM/).
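The abstract names two mechanisms only briefly: a frequency-domain subspace constraint on the per-frame latent codes, and period recovery from the similarity of reconstructed shapes. The following plain-Python sketch illustrates one plausible reading of each. It is not the authors' implementation. The band-limit penalty (`high_frequency_energy`, with a hypothetical `cutoff` parameter) and the lag-search period estimator (`estimate_period`) are assumptions chosen for clarity, and every name in the block is hypothetical.

```python
import cmath
import math
from typing import List, Optional, Sequence

# Hypothetical types: a latent code is a list of floats; a shape is a list of 3D points.
Vector = Sequence[float]
Shape = Sequence[Sequence[float]]


def dft(signal: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform of a real 1D signal."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def high_frequency_energy(latents: Sequence[Vector], cutoff: int) -> float:
    """Assumed band-limit penalty on the latent trajectory over time.

    For every latent dimension, take the DFT over the frames and sum the squared
    magnitudes of all bins whose frequency (distance from DC, counting the
    mirrored negative frequencies) exceeds `cutoff`. Zero means the trajectory
    lies in the subspace spanned by the low-frequency Fourier bases.
    """
    n = len(latents)
    energy = 0.0
    for d in range(len(latents[0])):
        spectrum = dft([z[d] for z in latents])
        for k, coeff in enumerate(spectrum):
            if min(k, n - k) > cutoff:
                energy += abs(coeff) ** 2
    # By Parseval's theorem, dividing by n gives the time-domain energy of the removed part.
    return energy / n


def shape_distance(a: Shape, b: Shape) -> float:
    """Mean Euclidean distance between corresponding 3D points of two shapes."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def estimate_period(shapes: Sequence[Shape], max_lag: Optional[int] = None) -> int:
    """Assumed recurrence-based period estimate.

    Returns the lag tau in [1, max_lag] with the lowest average distance between
    shape t and shape t + tau. Near-ties (within 1e-12) go to the shorter lag,
    so multiples of the true period are not preferred.
    """
    n = len(shapes)
    if max_lag is None:
        max_lag = n // 2
    best_lag, best_cost = 1, math.inf
    for lag in range(1, max_lag + 1):
        pairs = n - lag
        cost = sum(shape_distance(shapes[t], shapes[t + lag]) for t in range(pairs)) / pairs
        if cost < best_cost - 1e-12:
            best_lag, best_cost = lag, cost
    return best_lag
```

In a training loop, a penalty of this kind would presumably be added to the 2D reprojection loss over the per-frame latent codes. `estimate_period` needs only the reconstructed shapes, so it can run after training.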

References

[1] Agudo, A., Montiel, J.M.M., Agapito, L., Calvo, B.: Online dense non-rigid 3D shape and camera motion recovery. In: British Machine Vision Conference (BMVC) (2014)
[2] Agudo, A., Montiel, J.M.M., Calvo, B., Moreno-Noguer, F.: Mode-shape interpretation: re-thinking modal space for recovering deformable shapes. In: Winter Conference on Applications of Computer Vision (WACV) (2016)
[3] Agudo, A., Moreno-Noguer, F.: DUST: dual union of spatio-temporal subspaces for monocular multiple object 3D reconstruction. In: Computer Vision and Pattern Recognition (CVPR) (2017)
[4] Agudo, A., Moreno-Noguer, F.: Global model with local interpretation for dynamic shape reconstruction. In: Winter Conference on Applications of Computer Vision (WACV) (2017)
[5] Agudo, A., Moreno-Noguer, F.: Force-based representation for non-rigid shape and elastic model estimation. Trans. Pattern Anal. Mach. Intell. (TPAMI) 40(9), 2137–2150 (2018)
[6] Agudo, A., Moreno-Noguer, F.: A scalable, efficient, and accurate solution to non-rigid structure from motion. Comput. Vis. Image Underst. (CVIU) 167, 121–133 (2018)
[7] Akhter, I., Sheikh, Y., Khan, S., Kanade, T.: Trajectory space: a dual representation for nonrigid structure from motion. Trans. Pattern Anal. Mach. Intell. (TPAMI) 33(7), 1442–1456 (2011)
[8] Ansari, M., Golyanik, V., Stricker, D.: Scalable dense monocular surface reconstruction. In: International Conference on 3D Vision (3DV) (2017)
[9] Baker, S., Scharstein, D., Lewis, J.P., Roth, S., Black, M.J., Szeliski, R.: A database and evaluation methodology for optical flow. Int. J. Comput. Vis. (IJCV) 92(1), 1–31 (2011)
[10] Bartoli, A., Gay-Bellile, V., Castellani, U., Peyras, J., Olsen, S., Sayd, P.: Coarse-to-fine low-rank structure-from-motion. In: Computer Vision and Pattern Recognition (CVPR) (2008)
[11] Bregler, C., Hertzmann, A., Biermann, H.: Recovering non-rigid 3D shape from image streams. In: Computer Vision and Pattern Recognition (CVPR) (2000)
[12] Del Bue, A.: A factorization approach to structure from motion with shape priors. In: Computer Vision and Pattern Recognition (CVPR) (2008)
[13] Choy, C.B., Xu, D., Gwak, J.Y., Chen, K., Savarese, S.: 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 628–644. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_38
[14] Clevert, D., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). In: International Conference on Learning Representations (ICLR) (2016)
[15] Dai, Y., Deng, H., He, M.: Dense non-rigid structure-from-motion made easy: a spatial-temporal smoothness based solution. In: International Conference on Image Processing (ICIP), pp. 4532–4536 (2017)
[16] Dai, Y., Li, H., He, M.: Simple prior-free method for non-rigid structure-from-motion factorization. Int. J. Comput. Vis. (IJCV) 107, 101–122 (2014)
[17] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Computer Vision and Pattern Recognition (CVPR) (2009)
[18] Fan, H., Su, H., Guibas, L.J.: A point set generation network for 3D object reconstruction from a single image. In: Computer Vision and Pattern Recognition (CVPR) (2017)
[19] Garg, R., Roussos, A., Agapito, L.: Dense variational reconstruction of non-rigid surfaces from monocular video. In: Computer Vision and Pattern Recognition (CVPR) (2013)
[20] Garg, R., Roussos, A., Agapito, L.: A variational approach to video registration with subspace constraints. Int. J. Comput. Vis. (IJCV) 104(3), 286–314 (2013)
[21] Golyanik, V., Fetzer, T., Stricker, D.: Accurate 3D reconstruction of dynamic scenes from monocular image sequences with severe occlusions. In: Winter Conference on Applications of Computer Vision (WACV), pp. 282–291 (2017)
[22] Golyanik, V., Stricker, D.: Dense batch non-rigid structure from motion in a second. In: Winter Conference on Applications of Computer Vision (WACV), pp. 254–263 (2017)
[23] Golyanik, V., Fetzer, T., Stricker, D.: Introduction to coherent depth fields for dense monocular surface recovery. In: British Machine Vision Conference (BMVC) (2017)
[24] Golyanik, V., Jonas, A., Stricker, D.: Consolidating segmentwise non-rigid structure from motion. In: Machine Vision Applications (MVA) (2019)
[25] Golyanik, V., Jonas, A., Stricker, D., Theobalt, C.: Intrinsic dynamic shape prior for fast, sequential and dense non-rigid structure from motion with detection of temporally-disjoint rigidity. arXiv e-prints (2019)
[26] Golyanik, V., Mathur, A.S., Stricker, D.: NRSfM-Flow: recovering non-rigid scene flow from monocular image sequences. In: British Machine Vision Conference (BMVC) (2016)
[27] Golyanik, V., Shimada, S., Varanasi, K., Stricker, D.: HDM-Net: monocular non-rigid 3D reconstruction with learned deformation model. In: Bourdot, P., Cobb, S., Interrante, V., Kato, H., Stricker, D. (eds.) EuroVR 2018. LNCS, vol. 11162, pp. 51–72. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01790-3_4
[28] Gotardo, P.F.U., Martinez, A.M.: Kernel non-rigid structure from motion. In: International Conference on Computer Vision (ICCV), pp. 802–809 (2011)
[29] Gotardo, P.F.U., Martinez, A.M.: Non-rigid structure from motion with complementary rank-3 spaces. In: Computer Vision and Pattern Recognition (CVPR), pp. 3065–3072 (2011)
[30] Groueix, T., Fisher, M., Kim, V.G., Russell, B., Aubry, M.: AtlasNet: a Papier-Mâché approach to learning 3D surface generation. In: Computer Vision and Pattern Recognition (CVPR) (2018)
[31] He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: International Conference on Computer Vision (ICCV), pp. 1026–1034 (2015)
[32] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
[33] Kanazawa, A., Tulsiani, S., Efros, A.A., Malik, J.: Learning category-specific mesh reconstruction from image collections. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 386–402. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_23
[34] Kong, C., Lucey, S.: Deep non-rigid structure from motion. In: International Conference on Computer Vision (ICCV) (2019)
[35] Kovalenko, O., Golyanik, V., Malik, J., Elhayek, A., Stricker, D.: Structure from articulated motion: accurate and stable monocular 3D reconstruction without training data. Sensors 19(20), 4603 (2019)
[36] Kumar, S.: Jumping manifolds: geometry aware dense non-rigid structure from motion. In: Computer Vision and Pattern Recognition (CVPR) (2019)
[37] Kumar, S., Cherian, A., Dai, Y., Li, H.: Scalable dense non-rigid structure-from-motion: a Grassmannian perspective. In: Computer Vision and Pattern Recognition (CVPR) (2018)
[38] Lee, M., Cho, J., Choi, C.H., Oh, S.: Procrustean normal distribution for non-rigid structure from motion. In: Computer Vision and Pattern Recognition (CVPR) (2013)
[39] Lee, M., Choi, C.H., Oh, S.: A Procrustean Markov process for non-rigid structure recovery. In: Computer Vision and Pattern Recognition (CVPR) (2014)
[40] Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: learning 3D reconstruction in function space. In: Computer Vision and Pattern Recognition (CVPR) (2019)
[41] Novotny, D., Ravi, N., Graham, B., Neverova, N., Vedaldi, A.: C3DPO: canonical 3D pose networks for non-rigid structure from motion. In: International Conference on Computer Vision (ICCV) (2019)
[42] Östlund, J., Varol, A., Ngo, D.T., Fua, P.: Laplacian meshes for monocular 3D shape recovery. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7574, pp. 412–425. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33712-3_30
[43] Paladini, M., Del Bue, A., Xavier, J., Agapito, L., Stosić, M., Dodig, M.: Optimal metric projections for deformable and articulated structure-from-motion. Int. J. Comput. Vis. (IJCV) 96(2), 252–276 (2012)
[44] Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: DeepSDF: learning continuous signed distance functions for shape representation. In: Computer Vision and Pattern Recognition (CVPR) (2019)
[45] Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems (NeurIPS) (2019)
[46] Pearson, K.: On lines and planes of closest fit to systems of points in space. Philos. Mag. 2, 559–572 (1901)
[47] Pumarola, A., Agudo, A., Porzi, L., Sanfeliu, A., Lepetit, V., Moreno-Noguer, F.: Geometry-aware network for non-rigid shape prediction from a single view. In: Computer Vision and Pattern Recognition (CVPR) (2018)
[48] Riedmiller, M., Braun, H.: A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In: International Conference on Neural Networks (ICNN), pp. 586–591 (1993)
[49] Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)
[50] Russell, C., Fayad, J., Agapito, L.: Energy based multiple model fitting for non-rigid structure from motion. In: Computer Vision and Pattern Recognition (CVPR), pp. 3009–3016 (2011)
[51] Russell, C., Fayad, J., Agapito, L.: Dense non-rigid structure from motion. In: 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission (3DIMPVT) (2012)
[52] Sahasrabudhe, M., Shu, Z., Bartrum, E., Alp Güler, R., Samaras, D., Kokkinos, I.: Lifting autoencoders: unsupervised learning of a fully-disentangled 3D morphable model using deep non-rigid structure from motion. In: International Conference on Computer Vision Workshops (ICCVW) (2019)
[53] Salzmann, M., Fua, P.: Reconstructing sharply folding surfaces: a convex formulation. In: Computer Vision and Pattern Recognition (CVPR), pp. 1054–1061 (2009)
[54] Shimada, S., Golyanik, V., Theobalt, C., Stricker, D.: IsMo-GAN: adversarial learning for monocular non-rigid 3D reconstruction. In: Computer Vision and Pattern Recognition Workshops (CVPRW) (2019)
[55] Sorkine, O.: Laplacian mesh processing. In: Annual Conference of the European Association for Computer Graphics (Eurographics) (2005)
[56] Stoyanov, D.: Stereoscopic scene flow for robotic assisted minimally invasive surgery. In: Ayache, N., Delingette, H., Golland, P., Mori, K. (eds.) MICCAI 2012. LNCS, vol. 7510, pp. 479–486. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33415-3_59
[57] Taetz, B., Bleser, G., Golyanik, V., Stricker, D.: Occlusion-aware video registration for highly non-rigid objects. In: Winter Conference on Applications of Computer Vision (WACV) (2016)
[58] Tewari, A., et al.: FML: face model learning from videos. In: Computer Vision and Pattern Recognition (CVPR) (2019)
[59] Tewari, A., et al.: MoFA: model-based deep convolutional face autoencoder for unsupervised monocular reconstruction. In: International Conference on Computer Vision (ICCV) (2017)
[60] Tomasi, C., Kanade, T.: Shape and motion from image streams under orthography: a factorization method. Int. J. Comput. Vis. (IJCV) 9(2), 137–154 (1992)
[61] Torresani, L., Hertzmann, A., Bregler, C.: Nonrigid structure-from-motion: estimating shape and motion with hierarchical priors. Trans. Pattern Anal. Mach. Intell. (TPAMI) 30(5), 878–892 (2008)
[62] Tsoli, A., Argyros, A.A.: Patch-based reconstruction of a textureless deformable 3D surface from a single RGB image. In: International Conference on Computer Vision Workshops (ICCVW) (2019)
[63] Valgaerts, L., Wu, C., Bruhn, A., Seidel, H.P., Theobalt, C.: Lightweight binocular facial performance capture under uncontrolled lighting. ACM Trans. Graph. (TOG) 31(6), 187:1–187:11 (2012)
[64] Varol, A., Salzmann, M., Fua, P., Urtasun, R.: A constrained latent variable model. In: Computer Vision and Pattern Recognition (CVPR) (2012)
[65] Vicente, S., Agapito, L.: Soft inextensibility constraints for template-free non-rigid reconstruction. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7574, pp. 426–440. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33712-3_31
[66] Wang, N., Zhang, Y., Li, Z., Fu, Y., Liu, W., Jiang, Y.G.: Pixel2Mesh: generating 3D mesh models from single RGB images. In: European Conference on Computer Vision (ECCV) (2018)
[67] Xiao, J., Chai, J., Kanade, T.: A closed-form solution to non-rigid shape and motion recovery. In: European Conference on Computer Vision (ECCV) (2004)
[68] Yu, R., Russell, C., Campbell, N.D.F., Agapito, L.: Direct, dense, and deformable: template-based non-rigid 3D reconstruction from RGB video. In: International Conference on Computer Vision (ICCV) (2015)
[69] Zhu, Y., Huang, D., De la Torre, F., Lucey, S.: Complex non-rigid motion 3D reconstruction by union of subspaces. In: Computer Vision and Pattern Recognition (CVPR), pp. 1542–1549 (2014)

Acknowledgement

This work was supported by the ERC Consolidator Grant 4DReply (770784) and the Spanish Ministry of Science and Innovation under project HuMoUR TIN2017-90086-R. The authors thank Mallikarjun B R for help with running the FML method [58] on our data.

Author information

Authors and Affiliations

Max Planck Institute for Informatics, SIC, Saarbrücken, Germany
Vikramjit Sidhu, Edgar Tretschk, Vladislav Golyanik & Christian Theobalt
Saarland University, SIC, Saarbrücken, Germany
Vikramjit Sidhu
Institut de Robòtica i Informàtica Industrial, CSIC-UPC, Barcelona, Spain
Antonio Agudo

Corresponding author

Correspondence to Vladislav Golyanik.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4015 KB)

About this paper

Cite this paper

Sidhu, V., Tretschk, E., Golyanik, V., Agudo, A., Theobalt, C. (2020). Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12361. Springer, Cham. https://doi.org/10.1007/978-3-030-58517-4_13

DOI: https://doi.org/10.1007/978-3-030-58517-4_13
Published: 10 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58516-7
Online ISBN: 978-3-030-58517-4
eBook Packages: Computer Science, Computer Science (R0)
