Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12352)

Abstract

A human looking at a 2D image instantly infers the class, depth, and shape of the objects it contains. While modern deep models solve each of these challenging tasks separately, they struggle to perform simultaneous 3D scene reconstruction and segmentation. We propose a single-shot image-to-semantic-voxel-model translation framework. We train a generator adversarially against a discriminator that verifies object poses. Furthermore, trapezium-shaped voxels, volumetric residual blocks, and 2D-to-3D skip connections help our model learn explicit reasoning about 3D scene structure. We collected a SemanticVoxels dataset with 116k images, ground-truth semantic voxel models, depth maps, and 6D object poses. Experiments on ShapeNet and our SemanticVoxels dataset demonstrate that our framework matches and surpasses the state of the art in the reconstruction of scenes with multiple non-rigid objects of different classes. Our model and dataset are publicly available at http://www.zefirus.org/SSZ.
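For readers unfamiliar with 2D-to-3D skip connections, the minimal PyTorch sketch below shows one plausible way such a connection can lift a 2D encoder feature map into a 3D volume and fuse it with the decoder's volumetric features. This is an illustration only, not the authors' released implementation: the module name, its parameters, and the lift-by-reshape strategy are assumptions.

```python
# Illustrative sketch of a 2D-to-3D skip connection (assumed design,
# not the paper's code). Names and shapes are hypothetical.
import torch
import torch.nn as nn

class Skip2Dto3D(nn.Module):
    """Lift a 2D encoder feature map to a 3D volume and concatenate it
    with the corresponding 3D decoder feature volume."""

    def __init__(self, in_ch2d: int, out_ch3d: int, depth: int):
        super().__init__()
        self.depth = depth
        self.out_ch3d = out_ch3d
        # A 1x1 conv expands channels so they can be folded into a depth axis.
        self.lift = nn.Conv2d(in_ch2d, out_ch3d * depth, kernel_size=1)

    def forward(self, feat2d: torch.Tensor, feat3d: torch.Tensor) -> torch.Tensor:
        b, _, h, w = feat2d.shape
        # (B, C3d * D, H, W) -> (B, C3d, D, H, W): a volumetric copy of the 2D features.
        vol = self.lift(feat2d).view(b, self.out_ch3d, self.depth, h, w)
        # Fuse along channels with the decoder's 3D features (same D, H, W assumed).
        return torch.cat([vol, feat3d], dim=1)

# Example usage with hypothetical shapes:
skip = Skip2Dto3D(in_ch2d=64, out_ch3d=8, depth=32)
f2d = torch.randn(1, 64, 32, 32)      # 2D encoder feature map
f3d = torch.randn(1, 16, 32, 32, 32)  # 3D decoder feature volume
fused = skip(f2d, f3d)                # -> (1, 24, 32, 32, 32)
```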

Keywords

Single photo 3D reconstruction · 3D semantic segmentation

Notes

Acknowledgments

The reported study was funded by the Russian Foundation for Basic Research (RFBR) according to research project no. 17-29-04509.

Supplementary material

Supplementary material 1 (mov 25284 KB)

Supplementary material 2 (pdf 13700 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. State Research Institute of Aviation Systems (GosNIIAS), Moscow, Russia
  2. Moscow Institute of Physics and Technology (MIPT), Dolgoprudny, Russia
  3. Bruno Kessler Foundation (FBK), Trento, Italy
