Procrustean Regression Networks: Learning 3D Structure of Non-rigid Objects from 2D Annotations

Park, Sungheon; Lee, Minsik; Kwak, Nojun

doi:10.1007/978-3-030-58526-6_1

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12374)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We propose a novel framework for training neural networks that can learn the 3D structure of non-rigid objects when only 2D annotations are available as ground truth. Several recent approaches have incorporated the problem setting of non-rigid structure-from-motion (NRSfM) into deep learning in order to learn 3D structure reconstruction. The main difficulty of NRSfM is estimating rotation and deformation simultaneously, and previous works handle this by regressing both. In this paper, we resolve this difficulty with a loss function in which the appropriate rotation is determined automatically. The network is trained with a cost function consisting of the reprojection error and a low-rank term on the aligned shapes. During training, it learns the 3D structure of objects such as human skeletons and faces, while testing is done on a single-frame basis. The proposed method can handle inputs with missing entries. Experimental results show that the proposed framework achieves better reconstruction performance than the state-of-the-art method on the Human3.6M, 300-VW, and SURREAL datasets, even though the underlying network structure is very simple.
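The cost described above can be made concrete with a small sketch. It combines a reprojection error with a low-rank penalty on shapes whose rotations are found by alignment rather than regressed. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The following are all hypothetical:

- **Function names:** `procrustes_rotation`, `align_shapes`, `pr_style_loss`.
- **Weight:** `lam`.
- **Projection model:** orthographic, keeping the x and y rows of each camera-frame shape. Inputs are assumed to be already translation-normalized.
- **Alignment:** generalized Procrustes towards the mean shape.
- **Low-rank term:** the nuclear norm, a common convex surrogate for rank. The paper's exact choices may differ.

The sketch computes only the forward value. It does not derive gradients through the alignment step.

```python
import numpy as np


def procrustes_rotation(X, ref):
    """Proper rotation R (det = +1) minimising ||R @ X - ref||_F (Kabsch / orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(ref @ X.T)
    # Flip the last singular direction if needed so R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(U @ Vt))
    return U @ np.diag([1.0, 1.0, d]) @ Vt


def align_shapes(shapes, n_iters=10):
    """Generalised Procrustes (assumed variant): rotate each centred 3 x P shape towards the running mean."""
    aligned = shapes.copy()
    for _ in range(n_iters):
        ref = aligned.mean(axis=0)
        aligned = np.stack([procrustes_rotation(X, ref) @ X for X in shapes])
    return aligned


def pr_style_loss(shapes, obs2d, mask, lam=0.1):
    """
    Hypothetical Procrustean-regression-style cost (forward value only).
    shapes: (F, 3, P) predicted 3D shapes in camera coordinates.
    obs2d:  (F, 2, P) 2D annotations, assumed translation-normalised.
    mask:   (F, P) 1 for observed points, 0 for missing entries.
    """
    # Reprojection error under an assumed orthographic camera; missing points are masked out.
    resid = (shapes[:, :2, :] - obs2d) * mask[:, None, :]
    reproj = np.sum(resid ** 2) / max(mask.sum(), 1)
    # Remove translation, then align rotations automatically instead of regressing them.
    centred = shapes - shapes.mean(axis=2, keepdims=True)
    aligned = align_shapes(centred)
    # Low-rank term: nuclear norm of the F x 3P matrix whose rows are the aligned shapes.
    low_rank = np.linalg.norm(aligned.reshape(len(shapes), -1), ord="nuc")
    return reproj + lam * low_rank
```

Suppose every frame is a rotated copy of the same rigid shape. After alignment the copies coincide, so the stacked matrix has rank one. In that case the penalty reduces to `sqrt(F)` times the Frobenius norm of the shape. Deformations that cannot be explained by rotation increase the penalty. This is why such a term favours compact, low-dimensional deformation while leaving rotation free.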

S. Park and M. Lee contributed equally.

Acknowledgement

This work was supported by grants from IITP (No.2019-0-01367, Babymind) and NRF Korea (2017M3C4A7077582, 2020R1C1C1012479), all of which are funded by the Korea government (MSIT).

Author information

Authors and Affiliations

Samsung Advanced Institute of Technology (SAIT), Yongin-si, Korea
Sungheon Park
Hanyang University, Seoul, Korea
Minsik Lee
Seoul National University, Seoul, Korea
Nojun Kwak

Corresponding author

Correspondence to Nojun Kwak.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Electronic supplementary material

Supplementary material 1 (PDF, 787 KB)

About this paper

Cite this paper

Park, S., Lee, M., Kwak, N. (2020). Procrustean Regression Networks: Learning 3D Structure of Non-rigid Objects from 2D Annotations. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12374. Springer, Cham. https://doi.org/10.1007/978-3-030-58526-6_1

DOI: https://doi.org/10.1007/978-3-030-58526-6_1
Published: 07 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58525-9
Online ISBN: 978-3-030-58526-6
eBook Packages: Computer Science, Computer Science (R0)
