Procrustean Regression Networks: Learning 3D Structure of Non-rigid Objects from 2D Annotations

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Abstract

We propose a novel framework for training neural networks that can learn the 3D structure of non-rigid objects when only 2D annotations are available as ground truth. Several recent approaches incorporate the problem setting of non-rigid structure-from-motion (NRSfM) into deep learning to learn 3D structure reconstruction. The main difficulty in NRSfM is estimating rotation and deformation at the same time, and previous works handle this by regressing both. In this paper, we resolve this difficulty by proposing a loss function in which the suitable rotation is determined automatically. Trained with a cost function consisting of the reprojection error and a low-rank term on the aligned shapes, the network learns the 3D structure of objects such as human skeletons and faces during training, while testing is done on a single-frame basis. The proposed method can handle inputs with missing entries, and experimental results on the Human3.6M, 300-VW, and SURREAL datasets show that the proposed framework achieves better reconstruction performance than the state-of-the-art method, even though the underlying network structure is very simple.
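The cost described in the abstract combines a reprojection error with a low-rank penalty on shapes whose rotations have been removed by alignment. The NumPy sketch below illustrates only that structure, under assumptions the abstract does not state: orthographic projection, a binary visibility mask for missing entries, rotations removed by generalized Procrustes alignment to a mean reference shape, the nuclear norm as the low-rank term, and a hypothetical weight `lam`. All function names are illustrative. It evaluates the loss for a batch of predicted shapes; it is not the authors' implementation, and training a network would also require differentiating through the alignment step (for example with an autodiff framework).

```python
import numpy as np


def center(X):
    """Subtract each shape's centroid. X: (F, 3, P)."""
    return X - X.mean(axis=2, keepdims=True)


def masked_reprojection_error(W, mask, X):
    """Sum of squared orthographic reprojection errors over observed points.

    W: (F, 2, P) 2D annotations; mask: (F, P), 1 = observed, 0 = missing;
    X: (F, 3, P) predicted 3D shapes in camera coordinates (assumption).
    """
    diff = (W - X[:, :2, :]) * mask[:, None, :]  # orthographic: keep x and y
    return float(np.sum(diff ** 2))


def procrustes_rotation(A, B):
    """Rotation R (det = +1) minimizing ||R @ A - B||_F for 3 x P shapes A, B."""
    U, _, Vt = np.linalg.svd(B @ A.T)
    d = np.sign(np.linalg.det(U @ Vt))  # flip the last axis to avoid a reflection
    return U @ np.diag([1.0, 1.0, d]) @ Vt


def align_shapes(X, n_iter=10):
    """Generalized Procrustes: rotate every centered shape onto a shared reference."""
    Xc = center(X)
    ref = Xc[0]  # initial reference: the first shape
    for _ in range(n_iter):
        aligned = np.stack([procrustes_rotation(x, ref) @ x for x in Xc])
        ref = aligned.mean(axis=0)  # update the reference to the mean aligned shape
    return aligned


def low_rank_term(X):
    """Nuclear norm of the F x 3P matrix whose rows are the aligned shapes."""
    aligned = align_shapes(X)
    return float(np.linalg.norm(aligned.reshape(len(X), -1), ord="nuc"))


def prn_style_loss(W, mask, X, lam=0.05):
    """Hypothetical total cost: reprojection error + lam * low-rank term."""
    return masked_reprojection_error(W, mask, X) + lam * low_rank_term(X)
```

In this sketch the rotation of each shape is solved in closed form by the SVD inside `procrustes_rotation`, so the predicted 3D shapes carry no separate rotation output. Rigid rotations are removed before the nuclear norm is taken, so that term responds only to non-rigid variation across the batch.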

S. Park and M. Lee contributed equally.

Acknowledgement

This work was supported by grants from IITP (No.2019-0-01367, Babymind) and NRF Korea (2017M3C4A7077582, 2020R1C1C1012479), all of which are funded by the Korea government (MSIT).

Author information

Corresponding author

Correspondence to Nojun Kwak.

Electronic supplementary material

Supplementary material 1 (PDF, 787 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Park, S., Lee, M., Kwak, N. (2020). Procrustean Regression Networks: Learning 3D Structure of Non-rigid Objects from 2D Annotations. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12374. Springer, Cham. https://doi.org/10.1007/978-3-030-58526-6_1

  • DOI: https://doi.org/10.1007/978-3-030-58526-6_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58525-9

  • Online ISBN: 978-3-030-58526-6

  • eBook Packages: Computer Science, Computer Science (R0)
