
Learning Shape Priors for Single-View 3D Completion and Reconstruction

  • Jiajun Wu
  • Chengkai Zhang
  • Xiuming Zhang
  • Zhoutong Zhang
  • William T. Freeman
  • Joshua B. Tenenbaum
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11215)

Abstract

The problem of single-view 3D shape completion or reconstruction is challenging because, among the many possible shapes that explain an observation, most are implausible and do not correspond to natural objects. Recent research in the field has tackled this problem by exploiting the expressiveness of deep convolutional networks. There is, however, another level of ambiguity that is often overlooked: among plausible shapes, there are still multiple shapes that fit the 2D image equally well; i.e., the ground truth shape is non-deterministic given a single-view input. Existing fully supervised approaches fail to address this issue, and often produce blurry mean shapes with smooth surfaces but no fine details. In this paper, we propose ShapeHD, pushing the limit of single-view shape completion and reconstruction by integrating deep generative models with adversarially learned shape priors. The learned priors serve as a regularizer, penalizing the model only if its output is unrealistic, not if it deviates from the ground truth. Our design thus overcomes both of the aforementioned levels of ambiguity. Experiments demonstrate that ShapeHD outperforms the state of the art by a large margin in both shape completion and shape reconstruction on multiple real datasets.
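
The second level of ambiguity can be illustrated with a toy calculation. The sketch below is not the ShapeHD implementation: the four-voxel "shapes", the hand-written `toy_critic` (a fixed stand-in for an adversarially learned prior), and the weight `lam` are all assumptions made for illustration. It shows that a purely supervised loss prefers the blurry average of two equally plausible shapes, while adding a penalty that depends only on how realistic the output looks can favor a sharp shape instead.

```python
import numpy as np

def supervised_loss(pred, target):
    """Mean squared error between predicted and ground-truth occupancies."""
    return float(np.mean((pred - target) ** 2))

def toy_critic(voxels):
    """Hypothetical realism score (NOT a learned prior).

    Scores a grid higher when its occupancies are close to 0 or 1,
    so blurry, in-between values are treated as unrealistic.
    """
    return float(-np.mean(voxels * (1.0 - voxels)))

def naturalness_loss(pred, critic=toy_critic):
    """Penalty that depends only on the output, never on a ground truth."""
    return -critic(pred)

# Two equally plausible shapes for the same observation (toy 4-voxel grids).
shape_a = np.array([1.0, 1.0, 0.0, 0.0])
shape_b = np.array([1.0, 0.0, 1.0, 0.0])
mean_shape = 0.5 * (shape_a + shape_b)  # the blurry average

def expected_supervised(pred):
    """Supervised loss averaged over the two possible ground truths."""
    return 0.5 * (supervised_loss(pred, shape_a) + supervised_loss(pred, shape_b))

def total_loss(pred, lam=2.0):
    """Supervised term plus a weighted realism penalty (lam is an assumed weight)."""
    return expected_supervised(pred) + lam * naturalness_loss(pred)

print("supervised only:", expected_supervised(mean_shape), expected_supervised(shape_a))
print("with prior term:", total_loss(mean_shape), total_loss(shape_a))
```

Under the supervised term alone the averaged grid scores better than either sharp shape; with the realism penalty added (and a sufficiently large assumed weight), the sharp shape scores better, even though it may differ from the particular ground truth drawn.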

Keywords

Shape priors, Shape completion, 3D reconstruction

Notes

Acknowledgements

This work is supported by NSF #1231216, ONR MURI N00014-16-1-2007, Toyota Research Institute, Shell Research, and Facebook.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Jiajun Wu (1)
  • Chengkai Zhang (1)
  • Xiuming Zhang (1)
  • Zhoutong Zhang (1)
  • William T. Freeman (1, 2)
  • Joshua B. Tenenbaum (1)

  1. MIT CSAIL, Cambridge, USA
  2. Google Research, Cambridge, USA
