HDM-Net: Monocular Non-rigid 3D Reconstruction with Learned Deformation Model

Golyanik, Vladislav; Shimada, Soshi; Varanasi, Kiran; Stricker, Didier

doi:10.1007/978-3-030-01790-3_4

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11162)

Included in the following conference series:

International Conference on Virtual Reality and Augmented Reality

Abstract

Monocular dense 3D reconstruction of deformable objects is a hard ill-posed problem in computer vision. Current techniques either require dense correspondences and rely on motion and deformation cues, or assume a highly accurate reconstruction (referred to as a template) of at least a single frame given in advance and operate in the manner of non-rigid tracking. Accurate computation of dense point tracks often requires multiple frames and might be computationally expensive. Availability of a template is a very strong prior which restricts system operation to a pre-defined environment and scenarios. In this work, we propose a new hybrid approach for monocular non-rigid reconstruction which we call Hybrid Deformation Model Network (HDM-Net). In our approach, a deformation model is learned by a deep neural network, with a combination of domain-specific loss functions. We train the network with multiple states of a non-rigidly deforming structure with a known shape at rest. HDM-Net learns different reconstruction cues including texture-dependent surface deformations, shading and contours. We show generalisability of HDM-Net to states not presented in the training dataset, with unseen textures and under new illumination conditions. Experiments with noisy data and a comparison with other methods demonstrate the robustness and accuracy of the proposed approach and suggest possible application scenarios of the new technique in interventional diagnostics and augmented reality.

Notes

1. The dataset is available upon request.
2. When executed on a batch of 100 frames with \(73^2\) points each, a C++ version of [71] takes 1.47 ms per frame on our hardware; for a batch of 400 frames, it requires 5.27 ms per frame.
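
The per-frame timings in note 2 can be turned into per-batch totals and throughput with a little arithmetic. The sketch below only restates the figures quoted in the note; the function and variable names are illustrative and do not come from the paper.

```python
# Figures quoted in note 2; names are illustrative, not from the paper.
POINTS_PER_FRAME = 73 ** 2  # 73^2 points per frame


def batch_stats(num_frames, ms_per_frame):
    """Return (total batch time in ms, frames per second, points per batch)."""
    total_ms = num_frames * ms_per_frame
    fps = 1000.0 / ms_per_frame
    total_points = num_frames * POINTS_PER_FRAME
    return total_ms, fps, total_points


for n, ms in [(100, 1.47), (400, 5.27)]:
    total_ms, fps, pts = batch_stats(n, ms)
    print(f"{n} frames: {pts} points, {total_ms:.0f} ms per batch, {fps:.0f} frames/s")
```

On these numbers the 400-frame batch is about 3.6 times slower per frame than the 100-frame batch, so the per-frame cost of [71] grows with batch length rather than staying constant.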

References

Agudo, A., Agapito, L., Calvo, B., Montiel, J.M.M.: Good vibrations: a modal analysis approach for sequential non-rigid structure from motion. In: Computer Vision and Pattern Recognition (CVPR), pp. 1558–1565 (2014)
Agudo, A., Moreno-Noguer, F.: Force-based representation for non-rigid shape and elastic model estimation. Trans. Pattern Anal. Mach. Intell. (TPAMI) 40(9), 2137–2150 (2018)
Agudo, A., Moreno-Noguer, F.: A scalable, efficient, and accurate solution to non-rigid structure from motion. Comput. Vis. Image Underst. (CVIU), 167, 121–133 (2018)
Agudo, A., Moreno-Noguer, F., Calvo, B., Montiel, J.M.M.: Sequential non-rigid structure from motion using physical priors. Trans. Pattern Anal. Mach. Intell. (TPAMI) 38, 979–994 (2016)
Akhter, I., Sheikh, Y., Khan, S., Kanade, T.: Trajectory space: a dual representation for nonrigid structure from motion. Trans. Pattern Anal. Mach. Intell. (TPAMI) 33(7), 1442–1456 (2011)
Ansari, M., Golyanik, V., Stricker, D.: Scalable dense monocular surface reconstruction. In: International Conference on 3D Vision (3DV) (2017)
Birkbeck, N., Cobzaş, D., Jägersand, M.: Basis constrained 3D scene flow on a dynamic proxy. In: International Conference on Computer Vision (ICCV), pp. 1967–1974 (2011)
Blanz, V., Vetter, T.: A morphable model for the synthesis of 3D faces. ACM Trans. Graph. (TOG) 187–194 (1999)
Brand, M.: A direct method for 3D factorization of nonrigid motion observed in 2D. In: Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 122–128 (2005)
Bregler, C., Hertzmann, A., Biermann, H.: Recovering non-rigid 3D shape from image streams. In: Computer Vision and Pattern Recognition (CVPR), pp. 690–696 (2000)
Brunet, F., Hartley, R., Bartoli, A., Navab, N., Malgouyres, R.: Monocular template-based reconstruction of smooth and inextensible surfaces. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010. LNCS, vol. 6494, pp. 52–66. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19318-7_5
Del Bue, A.: A factorization approach to structure from motion with shape priors. In: Computer Vision and Pattern Recognition (CVPR) (2008)
Chhatkuli, A., Pizarro, D., Collins, T., Bartoli, A.: Inextensible non-rigid structure-from-motion by second-order cone programming. Trans. Pattern Anal. Mach. Intell. (TPAMI) 40(10), 2428–2441 (2018)
Choy, C.B., Xu, D., Gwak, J.Y., Chen, K., Savarese, S.: 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 628–644. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_38
Cohen, L.D., Cohen, I.: Deformable models for 3-D medical images using finite elements and balloons. In: Computer Vision and Pattern Recognition (CVPR), pp. 592–598 (1992)
Cook, R.L., Torrance, K.E.: A reflectance model for computer graphics. ACM Trans. Graph. (TOG) 1(1), 7–24 (1982)
Curless, B., Levoy, M.: A volumetric method for building complex models from range images. ACM Trans. Graph. (TOG) 303–312 (1996)
Dai, Y., Li, H., He, M.: A simple prior-free method for non-rigid structure-from-motion factorization. Int. J. Comput. Vis. 107(2), 101–122 (2014)
Dou, P., Shah, S.K., Kakadiaris, I.A.: End-to-end 3D face reconstruction with deep neural networks. In: Computer Vision and Pattern Recognition (CVPR) (2017)
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Advances in Neural Information Processing Systems (NIPS), pp. 2366–2374 (2014)
Fan, H., Su, H., Guibas, L.J.: A point set generation network for 3D object reconstruction from a single image. In: Computer Vision and Pattern Recognition (CVPR) (2017)
Fayad, J., Agapito, L., Del Bue, A.: Piecewise quadratic reconstruction of non-rigid surfaces from monocular sequences. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 297–310. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_22
Blender Foundation: Blender, v. 2.79a. Open source 3D creation (2018). https://www.blender.org/
Gallardo, M., Collins, T., Bartoli, A.: Using shading and a 3D template to reconstruct complex surface deformations. In: British Machine Vision Conference (BMVC) (2016)
Gallardo, M., Collins, T., Bartoli, A.: Dense non-rigid structure-from-motion and shading with unknown albedos. In: International Conference on Computer Vision (ICCV) (2017)
Garg, R., Roussos, A., Agapito, L.: Dense variational reconstruction of non-rigid surfaces from monocular video. In: Computer Vision and Pattern Recognition (CVPR), pp. 1272–1279 (2013)
Garg, R., Kumar, V.B.G., Carneiro, G., Reid, I.: Unsupervised CNN for single view depth estimation: geometry to the rescue. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 740–756. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_45
Garrido, P., et al.: Reconstruction of personalized 3D face rigs from monocular video. ACM Trans. Graph. (TOG) 35(3), 28:1–28:15 (2016)
Giannarou, S., Visentini-Scarzanella, M., Yang, G.Z.: Probabilistic tracking of affine-invariant anisotropic regions. Trans. Pattern Anal. Mach. Intell. (TPAMI) 35(1), 130–143 (2013)
Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: Computer Vision and Pattern Recognition (CVPR) (2017)
Golyanik, V., Fetzer, T., Stricker, D.: Accurate 3D reconstruction of dynamic scenes from monocular image sequences with severe occlusions. In: Winter Conference on Applications of Computer Vision (WACV) (2017)
Golyanik, V., Mathur, A.S., Stricker, D.: NRSFM-flow: recovering non-rigid scene flow from monocular image sequences. In: British Machine Vision Conference (BMVC) (2016)
Golyanik, V., Stricker, D.: Dense batch non-rigid structure from motion in a second. In: Winter Conference on Applications of Computer Vision (WACV), pp. 254–263 (2017)
Gotardo, P.F.U., Martinez, A.M.: Non-rigid structure from motion with complementary rank-3 spaces. In: Computer Vision and Pattern Recognition (CVPR), pp. 3065–3072 (2011)
Guan, P., Weiss, A., Bălan, A.O., Black, M.J.: Estimating human shape and pose from a single image. In: International Conference on Computer Vision (ICCV), pp. 1381–1388 (2009)
Gumerov, N., Zandifar, A., Duraiswami, R., Davis, L.S.: Structure of applicable surfaces from single views. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3023, pp. 482–496. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24672-5_38
Hamsici, O.C., Gotardo, P.F.U., Martinez, A.M.: Learning spatially-smooth mappings in non-rigid structure from motion. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 260–273. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33765-9_19
Haouchine, N., Dequidt, J., Berger, M.O., Cotin, S.: Single view augmentation of 3D elastic objects. In: International Symposium on Mixed and Augmented Reality (ISMAR), pp. 229–236 (2014)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition (CVPR) (2016)
Jackson, A.S., Bulat, A., Argyriou, V., Tzimiropoulos, G.: Large pose 3D face reconstruction from a single image via direct volumetric CNN regression. In: International Conference on Computer Vision (ICCV) (2017)
Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 2017–2025 (2015)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 1097–1105 (2012)
Lee, M., Cho, J., Oh, S.: Procrustean normal distribution for non-rigid structure from motion. Trans. Pattern Anal. Mach. Intell. (TPAMI) 39(7), 1388–1400 (2017)
Liu, F., Shen, C., Lin, G.: Deep convolutional neural fields for depth estimation from a single image. In: Computer Vision and Pattern Recognition (CVPR) (2015)
Liu-Yin, Q., Yu, R., Agapito, L., Fitzgibbon, A., Russell, C.: Better together: joint reasoning for non-rigid 3D reconstruction with specularities and shading. In: British Machine Vision Conference (BMVC) (2016)
Malti, A., Hartley, R., Bartoli, A., Kim, J.H.: Monocular template-based 3D reconstruction of extensible surfaces with local linear elasticity. In: Computer Vision and Pattern Recognition (CVPR), pp. 1522–1529 (2013)
McInerney, T., Terzopoulos, D.: A finite element model for 3D shape reconstruction and nonrigid motion tracking. In: International Conference on Computer Vision (ICCV), pp. 518–523 (1993)
Mitiche, A., Mathlouthi, Y., Ben Ayed, I.: Monocular concurrent recovery of structure and motion scene flow. Front. ICT 2, 16 (2015)
Moreno-Noguer, F., Porta, J.M., Fua, P.: Exploring ambiguities for monocular non-rigid shape estimation. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6313, pp. 370–383. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15558-1_27
NVIDIA Corporation: NVIDIA CUDA C programming guide (2018). Version 9.0
Paladini, M., Del Bue, A., Xavier, J., Agapito, L., Stosić, M., Dodig, M.: Optimal metric projections for deformable and articulated structure-from-motion. Int. J. Comput. Vis. (IJCV) 96(2), 252–276 (2012)
Paladini, M., Bartoli, A., Agapito, L.: Sequential non-rigid structure-from-motion with the 3D-implicit low-rank shape model. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6312, pp. 15–28. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15552-9_2
Paszke, A., et al.: Automatic differentiation in pytorch. In: Advances in Neural Information Processing Systems Workshops (NIPS-W) (2017)
Paszke, A., Gross, S., Massa, F., Chintala, S.: pytorch (2018). https://github.com/pytorch
Perriollat, M., Hartley, R., Bartoli, A.: Monocular template-based reconstruction of inextensible surfaces. Int. J. Comput. Vis. (IJCV) 95(2), 124–137 (2011)
Pumarola, A., Agudo, A., Porzi, L., Sanfeliu, A., Lepetit, V., Moreno-Noguer, F.: Geometry-aware network for non-rigid shape prediction from a single view. In: Computer Vision and Pattern Recognition (CVPR), pp. 4681–4690 (2018)
Riegler, G., Ulusoy, A.O., Bischof, H., Geiger, A.: OctNetFusion: learning depth fusion from data. In: International Conference on 3D Vision (3DV) (2017)
Russell, C., Fayad, J., Agapito, L.: Dense non-rigid structure from motion. In: International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), pp. 509–516 (2012)
Salzmann, M., Fua, P.: Reconstructing sharply folding surfaces: a convex formulation. In: Computer Vision and Pattern Recognition (CVPR), pp. 1054–1061 (2009)
Salzmann, M., Fua, P.: Linear local models for monocular reconstruction of deformable surfaces. Trans. Pattern Anal. Mach. Intell. (TPAMI) 33(5), 931–944 (2011)
Salzmann, M., Hartley, R., Fua, P.: Convex optimization for deformable surface 3-D tracking. In: International Conference on Computer Vision (ICCV) (2007)
Salzmann, M., Lepetit, V., Fua, P.: Deformable surface tracking ambiguities. In: Computer Vision and Pattern Recognition (CVPR) (2007)
Sela, M., Richardson, E., Kimmel, R.: Unrestricted facial geometry reconstruction using image-to-image translation. In: International Conference on Computer Vision (ICCV) (2017)
Stay & Play Rotorua Ltd: A hot balloon. http://stayandplaynz.com/rotorua/the-real-new-zealand-experience/. Accessed 29 June 2018
Suwajanakorn, S., Kemelmacher-Shlizerman, I., Seitz, S.M.: Total moving face reconstruction. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 796–812. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10593-2_52
Taetz, B., Bleser, G., Golyanik, V., Stricker, D.: Occlusion-aware video registration for highly non-rigid objects. In: Winter Conference on Applications of Computer Vision (WACV) (2016)
Tao, L., Matuszewski, B.J.: Non-rigid structure from motion with diffusion maps prior. In: Computer Vision and Pattern Recognition (CVPR), pp. 1530–1537 (2013)
Tateno, K., Tombari, F., Laina, I., Navab, N.: CNN-SLAM: real-time dense monocular slam with learned depth prediction. In: Computer Vision and Pattern Recognition (CVPR) (2017)
Tewari, A., et al.: Mofa: model-based deep convolutional face autoencoder for unsupervised monocular reconstruction. In: International Conference on Computer Vision (ICCV) (2017)
Textures.com: WrincklesHanging0037. https://www.textures.com/browse/hanging/112398. Accessed 29 June 2018
Tomasi, C., Kanade, T.: Shape and motion from image streams under orthography: a factorization method. Int. J. Comput. Vis. (IJCV) 9, 137–154 (1992)
Tome, D., Russell, C., Agapito, L.: Lifting from the deep: convolutional 3D pose estimation from a single image. In: Computer Vision and Pattern Recognition (CVPR) (2017)
Torresani, L., Hertzmann, A., Bregler, C.: Nonrigid structure-from-motion: estimating shape and motion with hierarchical priors. Trans. Pattern Anal. Mach. Intell. (TPAMI) 30(5), 878–892 (2008)
Varol, A., Shaji, A., Salzmann, M., Fua, P.: Monocular 3D reconstruction of locally textured surfaces. Trans. Pattern Anal. Mach. Intell. (TPAMI) 34(6), 1118–1130 (2012)
Vicente, S., Agapito, L.: Soft inextensibility constraints for template-free non-rigid reconstruction. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7574, pp. 426–440. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33712-3_31
Wandt, B., Ackermann, H., Rosenhahn, B.: 3D reconstruction of human motion from monocular image sequences. Trans. Pattern Anal. Mach. Intell. (TPAMI) 38(8), 1505–1516 (2016)
White, R., Forsyth, D.A.: Combining cues: shape from shading and texture. In: Computer Vision and Pattern Recognition (CVPR), pp. 1809–1816 (2006)
Xiao, D., Yang, Q., Yang, B., Wei, W.: Monocular scene flow estimation via variational method. Multimedia Tools Appl. 76(8), 10575–10597 (2017)
Xiao, J., Chai, J., Kanade, T.: A closed-form solution to non-rigid shape and motion recovery. Int. J. Comput. Vis. (IJCV) 67(2), 233–246 (2006)
Yu, R., Russell, C., Campbell, N.D.F., Agapito, L.: Direct, dense, and deformable: template-based non-rigid 3D reconstruction from RGB video. In: International Conference on Computer Vision (ICCV) (2015)
Zhou, X., Zhu, M., Pavlakos, G., Leonardos, S., Derpanis, K.G., Daniilidis, K.: Monocap: monocular human motion capture using a CNN coupled with a geometric prior. Trans. Pattern Anal. Mach. Intell. (TPAMI) (2018)
Zhu, S., Zhang, L., Smith, B.M.: Model evolution: an incremental approach to non-rigid structure from motion. In: Computer Vision and Pattern Recognition (CVPR), pp. 1165–1172 (2010)

Acknowledgement

Development of HDM-Net was supported by the project DYNAMICS (01IW15003) of the German Federal Ministry of Education and Research (BMBF). The authors thank NVIDIA Corporation for the hardware donations.

Author information

Authors and Affiliations

Augmented Vision, DFKI, Kaiserslautern, Germany
Vladislav Golyanik, Soshi Shimada, Kiran Varanasi & Didier Stricker
University of Kaiserslautern, Kaiserslautern, Germany
Vladislav Golyanik, Soshi Shimada & Didier Stricker

Corresponding author

Correspondence to Vladislav Golyanik.

Editor information

Editors and Affiliations

University of Paris-Sud, Orsay, France
Patrick Bourdot
University of Nottingham, Nottingham, UK
Sue Cobb
University of Minnesota, Minneapolis, MN, USA
Victoria Interrante
Nara Institute of Science and Technology, Ikoma, Japan
Hirokazu Kato
University of Kaiserslautern and DFKI, Kaiserslautern, Germany
Didier Stricker

About this paper

Cite this paper

Golyanik, V., Shimada, S., Varanasi, K., Stricker, D. (2018). HDM-Net: Monocular Non-rigid 3D Reconstruction with Learned Deformation Model. In: Bourdot, P., Cobb, S., Interrante, V., Kato, H., Stricker, D. (eds) Virtual Reality and Augmented Reality. EuroVR 2018. Lecture Notes in Computer Science, vol 11162. Springer, Cham. https://doi.org/10.1007/978-3-030-01790-3_4

DOI: https://doi.org/10.1007/978-3-030-01790-3_4
Published: 26 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01789-7
Online ISBN: 978-3-030-01790-3
eBook Packages: Computer Science, Computer Science (R0)
