
Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12361)

Abstract

We introduce the first dense neural non-rigid structure from motion (N-NRSfM) approach, which can be trained end-to-end in an unsupervised manner from 2D point tracks. In contrast to competing methods, our combination of loss functions is fully differentiable and can be readily integrated into deep-learning systems. We formulate the deformation model as an auto-decoder and impose subspace constraints on the recovered latent space function in the frequency domain. Thanks to the state recurrence cue, we classify the reconstructed non-rigid surfaces based on their similarity and recover the period of the input sequence. Our N-NRSfM approach achieves competitive accuracy on widely-used benchmark sequences and high visual quality on various real videos. Apart from being a standalone technique, our method enables multiple applications, including shape compression, completion and interpolation, among others. Combined with an encoder trained directly on 2D images, we perform scenario-specific monocular 3D shape reconstruction at interactive frame rates. To facilitate the reproducibility of the results and boost the new research direction, we open-source our code and provide trained models for research purposes (http://gvv.mpi-inf.mpg.de/projects/Neural_NRSfM/).
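The two central ingredients described above can be illustrated with a short, heavily simplified sketch. The PyTorch snippet below is not the authors' released code; it shows one plausible reading of a deformation auto-decoder whose per-frame latent codes are optimised as free variables together with the decoder weights, plus a soft subspace constraint that damps high temporal frequencies of the latent trajectory. The network sizes, the orthographic reprojection loss, the loss weights, and the given camera rotations are all illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the authors' code): deformation auto-decoder with a
# frequency-domain penalty on the per-frame latent codes, assuming an
# orthographic camera and precomputed 2D point tracks.
import torch
import torch.nn as nn

T, P, D = 100, 500, 8  # frames, tracked points, latent dimension (assumed values)


class DeformationAutoDecoder(nn.Module):
    """Maps a per-frame latent code z_t to a dense 3D displacement field."""

    def __init__(self, n_points, latent_dim):
        super().__init__()
        self.n_points = n_points
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ELU(),
            nn.Linear(256, 256), nn.ELU(),
            nn.Linear(256, 3 * n_points),
        )

    def forward(self, z):  # z: (T, latent_dim)
        return self.mlp(z).view(-1, self.n_points, 3)  # (T, P, 3) displacements


decoder = DeformationAutoDecoder(P, D)
latents = nn.Parameter(torch.zeros(T, D))     # auto-decoder: codes are free variables
mean_shape = nn.Parameter(torch.zeros(P, 3))  # shared rest shape


def frequency_penalty(z, keep=5):
    """Soft subspace constraint: damp all but the first `keep` temporal frequencies."""
    spec = torch.fft.rfft(z, dim=0)  # (T//2 + 1, D) complex spectrum over time
    return spec[keep:].abs().pow(2).mean()


def reprojection_loss(shapes, tracks, rotations):
    """Orthographic reprojection of the reconstructed shapes onto the 2D tracks."""
    projected = torch.einsum('tij,tpj->tpi', rotations[:, :2, :], shapes)  # (T, P, 2)
    return (projected - tracks).abs().mean()


# One gradient step over all frames; tracks (T, P, 2) and rotations (T, 3, 3) are
# assumed to be given, e.g. from dense tracking and a rigid initialisation.
tracks = torch.randn(T, P, 2)
rotations = torch.eye(3).repeat(T, 1, 1)
optim = torch.optim.Adam(list(decoder.parameters()) + [latents, mean_shape], lr=1e-3)

shapes = mean_shape + decoder(latents)
loss = reprojection_loss(shapes, tracks, rotations) + 1e-3 * frequency_penalty(latents)
optim.zero_grad()
loss.backward()
optim.step()
```

The design choice illustrated here, constraining the latent trajectory in the frequency domain rather than the reconstructed shapes themselves, keeps every loss term differentiable and makes periodic structure in the sequence accessible, e.g. for the period recovery mentioned in the abstract.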

Keywords

Neural Non-Rigid Structure from Motion · Sequence period detection · Latent space constraints · Deformation auto-decoder

Notes

Acknowledgement

This work was supported by the ERC Consolidator Grant 4DReply (770784) and the Spanish Ministry of Science and Innovation under project HuMoUR TIN2017-90086-R. The authors thank Mallikarjun B R for help with running the FML method [58] on our data.

Supplementary material

Supplementary material 1: 504471_1_En_13_MOESM1_ESM.pdf (PDF, 3.9 MB)

References

  1. Agudo, A., Montiel, J.M.M., Agapito, L., Calvo, B.: Online dense non-rigid 3D shape and camera motion recovery. In: British Machine Vision Conference (BMVC) (2014)
  2. Agudo, A., Montiel, J.M.M., Calvo, B., Moreno-Noguer, F.: Mode-shape interpretation: re-thinking modal space for recovering deformable shapes. In: Winter Conference on Applications of Computer Vision (WACV) (2016)
  3. Agudo, A., Moreno-Noguer, F.: DUST: dual union of spatio-temporal subspaces for monocular multiple object 3D reconstruction. In: Computer Vision and Pattern Recognition (CVPR) (2017)
  4. Agudo, A., Moreno-Noguer, F.: Global model with local interpretation for dynamic shape reconstruction. In: Winter Conference on Applications of Computer Vision (WACV) (2017)
  5. Agudo, A., Moreno-Noguer, F.: Force-based representation for non-rigid shape and elastic model estimation. Trans. Pattern Anal. Mach. Intell. (TPAMI) 40(9), 2137–2150 (2018)
  6. Agudo, A., Moreno-Noguer, F.: A scalable, efficient, and accurate solution to non-rigid structure from motion. Comput. Vis. Image Underst. (CVIU) 167, 121–133 (2018)
  7. Akhter, I., Sheikh, Y., Khan, S., Kanade, T.: Trajectory space: a dual representation for nonrigid structure from motion. Trans. Pattern Anal. Mach. Intell. (TPAMI) 33(7), 1442–1456 (2011)
  8. Ansari, M., Golyanik, V., Stricker, D.: Scalable dense monocular surface reconstruction. In: International Conference on 3D Vision (3DV) (2017)
  9. Baker, S., Scharstein, D., Lewis, J.P., Roth, S., Black, M.J., Szeliski, R.: A database and evaluation methodology for optical flow. Int. J. Comput. Vis. (IJCV) 92(1), 1–31 (2011)
  10. Bartoli, A., Gay-Bellile, V., Castellani, U., Peyras, J., Olsen, S., Sayd, P.: Coarse-to-fine low-rank structure-from-motion. In: Computer Vision and Pattern Recognition (CVPR) (2008)
  11. Bregler, C., Hertzmann, A., Biermann, H.: Recovering non-rigid 3D shape from image streams. In: Computer Vision and Pattern Recognition (CVPR) (2000)
  12. Bue, A.D.: A factorization approach to structure from motion with shape priors. In: Computer Vision and Pattern Recognition (CVPR) (2008)
  13. Choy, C.B., Xu, D., Gwak, J.Y., Chen, K., Savarese, S.: 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 628–644. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_38
  14. Clevert, D., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). In: International Conference on Learning Representations (ICLR) (2016)
  15. Dai, Y., Deng, H., He, M.: Dense non-rigid structure-from-motion made easy - a spatial-temporal smoothness based solution. In: International Conference on Image Processing (ICIP), pp. 4532–4536 (2017)
  16. Dai, Y., Li, H., He, M.: Simple prior-free method for non-rigid structure-from-motion factorization. Int. J. Comput. Vis. (IJCV) 107, 101–122 (2014)
  17. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Computer Vision and Pattern Recognition (CVPR) (2009)
  18. Fan, H., Su, H., Guibas, L.J.: A point set generation network for 3D object reconstruction from a single image. In: Computer Vision and Pattern Recognition (CVPR) (2017)
  19. Garg, R., Roussos, A., Agapito, L.: Dense variational reconstruction of non-rigid surfaces from monocular video. In: Computer Vision and Pattern Recognition (CVPR) (2013)
  20. Garg, R., Roussos, A., Agapito, L.: A variational approach to video registration with subspace constraints. Int. J. Comput. Vis. (IJCV) 104(3), 286–314 (2013)
  21. Golyanik, V., Fetzer, T., Stricker, D.: Accurate 3D reconstruction of dynamic scenes from monocular image sequences with severe occlusions. In: Winter Conference on Applications of Computer Vision (WACV), pp. 282–291 (2017)
  22. Golyanik, V., Stricker, D.: Dense batch non-rigid structure from motion in a second. In: Winter Conference on Applications of Computer Vision (WACV), pp. 254–263 (2017)
  23. Golyanik, V., Fetzer, T., Stricker, D.: Introduction to coherent depth fields for dense monocular surface recovery. In: British Machine Vision Conference (BMVC) (2017)
  24. Golyanik, V., Jonas, A., Stricker, D.: Consolidating segmentwise non-rigid structure from motion. In: Machine Vision Applications (MVA) (2019)
  25. Golyanik, V., Jonas, A., Stricker, D., Theobalt, C.: Intrinsic dynamic shape prior for fast, sequential and dense non-rigid structure from motion with detection of temporally-disjoint rigidity. arXiv e-prints (2019)
  26. Golyanik, V., Mathur, A.S., Stricker, D.: NRSfM-Flow: recovering non-rigid scene flow from monocular image sequences. In: British Machine Vision Conference (BMVC) (2016)
  27. Golyanik, V., Shimada, S., Varanasi, K., Stricker, D.: HDM-Net: monocular non-rigid 3D reconstruction with learned deformation model. In: Bourdot, P., Cobb, S., Interrante, V., Kato, H., Stricker, D. (eds.) EuroVR 2018. LNCS, vol. 11162, pp. 51–72. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01790-3_4
  28. Gotardo, P.F.U., Martinez, A.M.: Kernel non-rigid structure from motion. In: International Conference on Computer Vision (ICCV), pp. 802–809 (2011)
  29. Gotardo, P.F.U., Martinez, A.M.: Non-rigid structure from motion with complementary rank-3 spaces. In: Computer Vision and Pattern Recognition (CVPR), pp. 3065–3072 (2011)
  30. Groueix, T., Fisher, M., Kim, V.G., Russell, B., Aubry, M.: AtlasNet: a Papier-Mâché approach to learning 3D surface generation. In: Computer Vision and Pattern Recognition (CVPR) (2018)
  31. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: International Conference on Computer Vision (ICCV), pp. 1026–1034 (2015)
  32. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
  33. Kanazawa, A., Tulsiani, S., Efros, A.A., Malik, J.: Learning category-specific mesh reconstruction from image collections. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 386–402. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_23
  34. Kong, C., Lucey, S.: Deep non-rigid structure from motion. In: International Conference on Computer Vision (ICCV) (2019)
  35. Kovalenko, O., Golyanik, V., Malik, J., Elhayek, A., Stricker, D.: Structure from articulated motion: accurate and stable monocular 3D reconstruction without training data. Sensors 19(20), 4603 (2019)
  36. Kumar, S.: Jumping manifolds: geometry aware dense non-rigid structure from motion. In: Computer Vision and Pattern Recognition (CVPR) (2019)
  37. Kumar, S., Cherian, A., Dai, Y., Li, H.: Scalable dense non-rigid structure-from-motion: a Grassmannian perspective. In: Computer Vision and Pattern Recognition (CVPR) (2018)
  38. Lee, M., Cho, J., Choi, C.H., Oh, S.: Procrustean normal distribution for non-rigid structure from motion. In: Computer Vision and Pattern Recognition (CVPR) (2013)
  39. Lee, M., Choi, C.H., Oh, S.: A Procrustean Markov process for non-rigid structure recovery. In: Computer Vision and Pattern Recognition (CVPR) (2014)
  40. Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: learning 3D reconstruction in function space. In: Computer Vision and Pattern Recognition (CVPR) (2019)
  41. Novotny, D., Ravi, N., Graham, B., Neverova, N., Vedaldi, A.: C3DPO: canonical 3D pose networks for non-rigid structure from motion. In: International Conference on Computer Vision (ICCV) (2019)
  42. Östlund, J., Varol, A., Ngo, D.T., Fua, P.: Laplacian meshes for monocular 3D shape recovery. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7574, pp. 412–425. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33712-3_30
  43. Paladini, M., Del Bue, A., Xavier, J., Agapito, L., Stosić, M., Dodig, M.: Optimal metric projections for deformable and articulated structure-from-motion. Int. J. Comput. Vis. (IJCV) 96(2), 252–276 (2012)
  44. Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: DeepSDF: learning continuous signed distance functions for shape representation. In: Computer Vision and Pattern Recognition (CVPR) (2019)
  45. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems (NeurIPS) (2019)
  46. Pearson, K.: On lines and planes of closest fit to systems of points in space. Philos. Mag. 2, 559–572 (1901)
  47. Pumarola, A., Agudo, A., Porzi, L., Sanfeliu, A., Lepetit, V., Moreno-Noguer, F.: Geometry-aware network for non-rigid shape prediction from a single view. In: Computer Vision and Pattern Recognition (CVPR) (2018)
  48. Riedmiller, M., Braun, H.: A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In: International Conference on Neural Networks (ICNN), pp. 586–591 (1993)
  49. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)
  50. Russell, C., Fayad, J., Agapito, L.: Energy based multiple model fitting for non-rigid structure from motion. In: Computer Vision and Pattern Recognition (CVPR), pp. 3009–3016 (2011)
  51. Russell, C., Fayad, J., Agapito, L.: Dense non-rigid structure from motion. In: International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT) (2012)
  52. Sahasrabudhe, M., Shu, Z., Bartrum, E., Alp Güler, R., Samaras, D., Kokkinos, I.: Lifting autoencoders: unsupervised learning of a fully-disentangled 3D morphable model using deep non-rigid structure from motion. In: International Conference on Computer Vision Workshops (ICCVW) (2019)
  53. Salzmann, M., Fua, P.: Reconstructing sharply folding surfaces: a convex formulation. In: Computer Vision and Pattern Recognition (CVPR), pp. 1054–1061 (2009)
  54. Shimada, S., Golyanik, V., Theobalt, C., Stricker, D.: IsMo-GAN: adversarial learning for monocular non-rigid 3D reconstruction. In: Computer Vision and Pattern Recognition Workshops (CVPRW) (2019)
  55. Sorkine, O.: Laplacian mesh processing. In: Annual Conference of the European Association for Computer Graphics (Eurographics) (2005)
  56. Stoyanov, D.: Stereoscopic scene flow for robotic assisted minimally invasive surgery. In: Ayache, N., Delingette, H., Golland, P., Mori, K. (eds.) MICCAI 2012. LNCS, vol. 7510, pp. 479–486. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33415-3_59
  57. Taetz, B., Bleser, G., Golyanik, V., Stricker, D.: Occlusion-aware video registration for highly non-rigid objects. In: Winter Conference on Applications of Computer Vision (WACV) (2016)
  58. Tewari, A., et al.: FML: face model learning from videos. In: Computer Vision and Pattern Recognition (CVPR) (2019)
  59. Tewari, A., et al.: MoFA: model-based deep convolutional face autoencoder for unsupervised monocular reconstruction. In: International Conference on Computer Vision (ICCV) (2017)
  60. Tomasi, C., Kanade, T.: Shape and motion from image streams under orthography: a factorization method. Int. J. Comput. Vis. (IJCV) 9(2), 137–154 (1992)
  61. Torresani, L., Hertzmann, A., Bregler, C.: Nonrigid structure-from-motion: estimating shape and motion with hierarchical priors. Trans. Pattern Anal. Mach. Intell. (TPAMI) 30(5), 878–892 (2008)
  62. Tsoli, A., Argyros, A.A.: Patch-based reconstruction of a textureless deformable 3D surface from a single RGB image. In: International Conference on Computer Vision Workshops (ICCVW) (2019)
  63. Valgaerts, L., Wu, C., Bruhn, A., Seidel, H.P., Theobalt, C.: Lightweight binocular facial performance capture under uncontrolled lighting. ACM Trans. Graph. (TOG) 31(6), 187:1–187:11 (2012)
  64. Varol, A., Salzmann, M., Fua, P., Urtasun, R.: A constrained latent variable model. In: Computer Vision and Pattern Recognition (CVPR) (2012)
  65. Vicente, S., Agapito, L.: Soft inextensibility constraints for template-free non-rigid reconstruction. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7574, pp. 426–440. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33712-3_31
  66. Wang, N., Zhang, Y., Li, Z., Fu, Y., Liu, W., Jiang, Y.G.: Pixel2Mesh: generating 3D mesh models from single RGB images. In: European Conference on Computer Vision (ECCV) (2018)
  67. Xiao, J., Chai, J., Kanade, T.: A closed-form solution to non-rigid shape and motion recovery. In: European Conference on Computer Vision (ECCV) (2004)
  68. Yu, R., Russell, C., Campbell, N.D.F., Agapito, L.: Direct, dense, and deformable: template-based non-rigid 3D reconstruction from RGB video. In: International Conference on Computer Vision (ICCV) (2015)
  69. Zhu, Y., Huang, D., Torre, F.D.L., Lucey, S.: Complex non-rigid motion 3D reconstruction by union of subspaces. In: Computer Vision and Pattern Recognition (CVPR), pp. 1542–1549 (2014)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Max Planck Institute for Informatics, Saarbrücken, Germany
  2. Saarland University, Saarbrücken, Germany
  3. Institut de Robótica i Informática Industrial, CSIC-UPC, Barcelona, Spain
