GSIR: Generalizable 3D Shape Interpretation and Reconstruction

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12358)


3D shape interpretation and reconstruction are closely related to each other but have long been studied separately and often end up with priors that are highly biased towards the training classes. In this paper, we present an algorithm, Generalizable 3D Shape Interpretation and Reconstruction (GSIR), designed to jointly learn these two tasks to capture generic, class-agnostic shape priors for a better understanding of 3D geometry. We propose to recover 3D shape structures as cuboids from partial reconstruction and use the predicted structures to further guide full 3D reconstruction. The unified framework is trained simultaneously offline to learn a generic notion and can be fine-tuned online for specific objects without any annotations. Extensive experiments on both synthetic and real data demonstrate that introducing 3D shape interpretation improves the performance of single image 3D reconstruction and vice versa, achieving the state-of-the-art performance on both tasks for objects in both seen and unseen categories.


Shape interpretation 3D reconstruction 


  1. 1.
    Akhter, I., Black, M.J.: Pose-conditioned joint angle limits for 3D human pose reconstruction. In: CVPR (2015)Google Scholar
  2. 2.
    Arsalan Soltani, A., Huang, H., Wu, J., Kulkarni, T.D., Tenenbaum, J.B.: Synthesizing 3D shapes via modeling multi-view depth maps and silhouettes with deep generative networks. In: CVPR (2017)Google Scholar
  3. 3.
    Balashova, E., Singh, V., Wang, J., Teixeira, B., Chen, T., Funkhouser, T.: Structure-aware shape synthesis. In: 3DV (2018)Google Scholar
  4. 4.
    Barrow, H.G., Tenenbaum, J.M., Bolles, R.C., Wolf, H.C.: Parametric correspondence and chamfer matching: two new techniques for image matching. In: IJCAI (1977)Google Scholar
  5. 5.
    Biederman, I.: Recognition-by-components: a theory of human image understanding. Psychol. Rev. 94, 115 (1987)CrossRefGoogle Scholar
  6. 6.
    Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J.: Keep it SMPL: automatic estimation of 3D human pose and shape from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 561–578. Springer, Cham (2016). Scholar
  7. 7.
    Chan, C., Ginosar, S., Zhou, T., Efros, A.A.: Everybody dance now. arXiv preprint arXiv:1808.07371 (2018)
  8. 8.
    Chan, M.W., Stevenson, A.K., Li, Y., Pizlo, Z.: Binocular shape constancy from novel views: the role of a priori constraints. Percept. Psychophysics 68(7), 1124–1139 (2006)CrossRefGoogle Scholar
  9. 9.
    Chang, A.X., et al.: Shapenet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)
  10. 10.
    Chaudhuri, S., Kalogerakis, E., Guibas, L., Koltun, V.: Probabilistic reasoning for assembly-based 3D modeling. In: ACM TOG, vol. 30, p. 35 (2011)Google Scholar
  11. 11.
    Chen, W., Fu, Z., Yang, D., Deng, J.: Single-image depth perception in the wild. In: NeurIPS (2016)Google Scholar
  12. 12.
    Chen, Z., Zhang, H.: Learning implicit fields for generative shape modeling. In: CVPR (2019)Google Scholar
  13. 13.
    Choy, C.B., Xu, D., Gwak, J.Y., Chen, K., Savarese, S.: 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 628–644. Springer, Cham (2016). Scholar
  14. 14.
    Deng, B., Genova, K., Yazdani, S., Bouaziz, S., Hinton, G.E., Tagliasacchi, A.: Cvxnets: Learnable convex decomposition (2020)Google Scholar
  15. 15.
    Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: NeurIPS (2014)Google Scholar
  16. 16.
    Fan, H., Su, H., Guibas, L.J.: A point set generation network for 3D object reconstruction from a single image. In: CVPR (2017)Google Scholar
  17. 17.
    Ganapathi-Subramanian, V., Diamanti, O., Pirk, S., Tang, C., Niessner, M., Guibas, L.: Parsing geometry using structure-aware shape templates. In: 3DV (2018)Google Scholar
  18. 18.
    Girdhar, R., Fouhey, D.F., Rodriguez, M., Gupta, A.: Learning a predictable and generative vector representation for objects. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 484–499. Springer, Cham (2016). Scholar
  19. 19.
    Golovinskiy, A., Funkhouser, T.: Consistent segmentation of 3D models. Comput. Graph. 33(3), 262–269 (2009)CrossRefGoogle Scholar
  20. 20.
    Groueix, T., Fisher, M., Kim, V.G., Russell, B., Aubry, M.: AtlasNet: a papier-mâché approach to learning 3D surface generation. In: CVPR (2018)Google Scholar
  21. 21.
    Groueix, T., Fisher, M., Kim, V.G., Russell, B.C., Aubry, M.: A papier-mâché approach to learning 3D surface generation. In: CVPR (2018)Google Scholar
  22. 22.
    He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)Google Scholar
  23. 23.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)Google Scholar
  24. 24.
    He, Y., Zhu, C., Wang, J., Savvides, M., Zhang, X.: Bounding box regression with uncertainty for accurate object detection. In: CVPR (2019)Google Scholar
  25. 25.
    Henderson, P., Ferrari, V.: Learning single-image 3D reconstruction by generative modelling of shape, pose and shading. In: IJCV, October 2019Google Scholar
  26. 26.
    Hoiem, D., Efros, A.A., Hebert, M.: Recovering surface layout from an image. IJCV 75(1), 151–172 (2007)CrossRefGoogle Scholar
  27. 27.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NeurIPS (2012)Google Scholar
  28. 28.
    Li, J., Xu, K., Chaudhuri, S., Yumer, E., Zhang, H., Guibas, L.: Grass: Generative recursive autoencoders for shape structures. In: ACM TOG (Proceedings of SIGGRAPH 2017), vol. 36(4) (2017)Google Scholar
  29. 29.
    Li, K., Pham, T., Zhan, H., Reid, I.: Efficient dense point cloud object reconstruction using deformation vector fields. In: ECCV (2018)Google Scholar
  30. 30.
    Li, Y., Pizlo, Z.: Reconstruction of 3D symmetrical shapes by using planarity and compactness constraints. J. Vis. 7(9), 834–834 (2007)CrossRefGoogle Scholar
  31. 31.
    Li, Z., Snavely, N.: Megadepth: learning single-view depth prediction from internet photos. In: CVPR (2018)Google Scholar
  32. 32.
    Lim, J.J., Pirsiavash, H., Torralba, A.: Parsing ikea objects: fine pose estimation. In: ICCV (2013)Google Scholar
  33. 33.
    Litany, O., Bronstein, A., Bronstein, M., Makadia, A.: Deformable shape completion with graph convolutional autoencoders. In: CVPR (2018)Google Scholar
  34. 34.
    Ma, L., Jia, X., Sun, Q., Schiele, B., Tuytelaars, T., Van Gool, L.: Pose guided person image generation. In: NeurIPS (2017)Google Scholar
  35. 35.
    Mandikal, P., Navaneet, K.L., Agarwal, M., Babu, R.V.: 3D-LMNet: latent embedding matching for accurate and diverse 3D point cloud reconstruction from a single image. In: Proceedings of the British Machine Vision Conference (BMVC) (2018)Google Scholar
  36. 36.
    Marr, D.: Vision: A Computational Investigation Into the Human Representation and Processing of Visual Information. Ph.D. thesis (1982)Google Scholar
  37. 37.
    McCormac, J., Handa, A., Leutenegger, S., Davison, A.J.: Scenenet RGB-D: Can 5m synthetic images beat generic imagenet pre-training on indoor segmentation? In: ICCV (2017)Google Scholar
  38. 38.
    Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: learning 3D reconstruction in function space. In: CVPR (2019)Google Scholar
  39. 39.
    Niu, C., Li, J., Xu, K.: Im2struct: Recovering 3D shape structure from a single RGB image. In: CVPR (2018)Google Scholar
  40. 40.
    Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: Deepsdf: learning continuous signed distance functions for shape representation. In: CVPR (2019)Google Scholar
  41. 41.
    Pauly, M., Mitra, N.J., Wallner, J., Pottmann, H., Guibas, L.J.: Discovering structural regularity in 3D geometry. In: ACM TOG, vol. 27 (2008)Google Scholar
  42. 42.
    Pavlakos, G., Zhu, L., Zhou, X., Daniilidis, K.: Learning to estimate 3D human pose and shape from a single color image. In: CVPR (2018)Google Scholar
  43. 43.
    Pizlo, Z.: 3D Shape: Its Unique Place in Visual Perception. MIT Press, Cambridge (2010)Google Scholar
  44. 44.
    Richter, S.R., Roth, S.: Matryoshka networks: predicting 3D geometry via nested shape layers. In: CVPR (2018)Google Scholar
  45. 45.
    Riegler, G., Osman Ulusoy, A., Geiger, A.: Octnet: learning deep 3D representations at high resolutions. In: CVPR (2017)Google Scholar
  46. 46.
    Riegler, G., Ulusoy, A.O., Bischof, H., Geiger, A.: Octnetfusion: learning depth fusion from data. In: 3DV (2017)Google Scholar
  47. 47.
    Roberts, L.G.: Machine perception of three-dimensional solids. Ph.D. thesis, Massachusetts Institute of Technology (1963)Google Scholar
  48. 48.
    Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: MICCAI (2015)Google Scholar
  49. 49.
    Saxena, A., Sun, M., Ng, A.Y.: Make3d: learning 3D scene structure from a single still image. IEEE TPAMI 31(5), 824–840 (2009)CrossRefGoogle Scholar
  50. 50.
    Shin, D., Fowlkes, C.C., Hoiem, D.: Pixels, voxels, and views: a study of shape representations for single view 3D object shape prediction. In: CVPR (2018)Google Scholar
  51. 51.
    Sitzmann, V., Zollhöfer, M., Wetzstein, G.: Scene representation networks: continuous 3D-structure-aware neural scene representations. In: NeurIPS (2019)Google Scholar
  52. 52.
    Št’ava, O., Beneš, B., Měch, R., Aliaga, D.G., Krištof, P.: Inverse procedural modeling by automatic generation of l-systems. Comput. Graph. Forum 29, 665–674 (2010)CrossRefGoogle Scholar
  53. 53.
    Sun, X., et al.: Pix3d: dataset and methods for single-image 3D shape modeling. In: CVPR (2018)Google Scholar
  54. 54.
    Tatarchenko, M., Dosovitskiy, A., Brox, T.: Octree generating networks: efficient convolutional architectures for high-resolution 3D outputs. In: ICCV (2017)Google Scholar
  55. 55.
    Tulsiani, S., Su, H., Guibas, L.J., Efros, A.A., Malik, J.: Learning shape abstractions by assembling volumetric primitives. In: CVPR (2017)Google Scholar
  56. 56.
    Tulsiani, S., Zhou, T., Efros, A.A., Malik, J.: Multi-view supervision for single-view reconstruction via differentiable ray consistency. In: CVPR (2017)Google Scholar
  57. 57.
    Wang, N., Zhang, Y., Li, Z., Fu, Y., Liu, W., Jiang, Y.G.: Pixel2mesh: generating 3D mesh models from single RGB images. In: ECCV (2018)Google Scholar
  58. 58.
    Wu, J., Wang, Y., Xue, T., Sun, X., Freeman, B., Tenenbaum, J.: Marrnet: 3D shape reconstruction via 2.5 d sketches. In: NeurIPS (2017)Google Scholar
  59. 59.
    Wu, J., et al.: 3D interpreter networks for viewer-centered wireframe modeling. In: IJCV (2018)Google Scholar
  60. 60.
    Wu, J., Zhang, C., Xue, T., Freeman, B., Tenenbaum, J.: Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In: NeurIPS (2016)Google Scholar
  61. 61.
    Wu, J., Zhang, C., Zhang, X., Zhang, Z., Freeman, W.T., Tenenbaum, J.B.: Learning shape priors for single-view 3D completion and reconstruction. In: ECCV (2018)Google Scholar
  62. 62.
    Wu, Z., Wang, X., Lin, D., Lischinski, D., Cohen-Or, D., Huang, H.: Structure-aware generative network for 3D-shape modeling. arXiv preprint arXiv:1808.03981 (2018)
  63. 63.
    Xu, Q., Wang, W., Ceylan, D., Mech, R., Neumann, U.: Disn: deep implicit surface network for high-quality single-view 3D reconstruction (2019)Google Scholar
  64. 64.
    Zhang, X., Zhang, Z., Zhang, C., Tenenbaum, J., Freeman, B., Wu, J.: Learning to reconstruct shapes from unseen classes. In: NeurIPS (2018)Google Scholar
  65. 65.
    Zhou, X., Zhu, M., Leonardos, S., Derpanis, K.G., Daniilidis, K.: Sparseness meets deepness: 3D human pose estimation from monocular video. In: CVPR (2016)Google Scholar
  66. 66.
    Zou, C., Yumer, E., Yang, J., Ceylan, D., Hoiem, D.: 3D-PRNN: generating shape primitives with recurrent neural networks. In: ICCV (2017)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.Carnegie Mellon UniversityPittsburghUSA

Personalised recommendations