The Visual Computer

, Volume 35, Issue 2, pp 271–279 | Cite as

Deep 3D semantic scene extrapolation

  • Ali AbbasiEmail author
  • Sinan Kalkan
  • Yusuf Sahillioğlu
Original Article


Scene extrapolation is a challenging variant of the scene completion problem, which pertains to predicting the missing part(s) of a scene. While the 3D scene completion algorithms in the literature try to fill the occluded part of a scene such as a chair behind a table, we focus on extrapolating the available half-scene information to a full one, a problem that, to our knowledge, has not been studied yet. Our approaches are based on convolutional neural networks (CNN). As input, we take the half of 3D voxelized scenes, then our models complete the other half of scenes as output. Our baseline CNN model consisting of convolutional and ReLU layers with multiple residual connections and Softmax classifier with voxel-wise cross-entropy loss function at the end. We train and evaluate our models on the synthetic 3D SUNCG dataset. We show that our trained networks can predict the other half of the scenes and complete the objects correctly with suitable lengths. With a discussion on the challenges, we propose scene extrapolation as a challenging test bed for future research in deep learning. We made our models available on


3D scenes Extrapolation Convolutional neural networks 



This work has been supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK) under the Project EEEAG-215E255. We also thank NVIDIA for their donation of a Titan X graphics card.


  1. 1.
    Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: large-scale machine learning on heterogeneous systems. Software available from (2015)Google Scholar
  2. 2.
    Averbuch-Elor, H., Kopf, J., Hazan, T., Cohen-Or, D.: Co-segmentation for space-time co-located collections. Vis. Comput. 52, 1–12 (2017)Google Scholar
  3. 3.
    Besl, P.J., McKay, N.D.: A method for registration of 3-D shapes. IEEE Trans. Pattern Anal, Mach. Intell. 14(2), 239–256 (1992)CrossRefGoogle Scholar
  4. 4.
    Bouaziz, S., Tagliasacchi, A., Pauly, M.: Sparse iterative closest point. In: Rushmeier, H., Deussen, O. (eds.) Computer Graphics Forum, vol. 32, pp. 113–123. Wiley Online Library, New York (2013)Google Scholar
  5. 5.
    Dai, A., Qi, C.R., Nießner, M.: Shape completion using 3D-encoder-predictor CNNS and shape synthesis. arXiv preprint arXiv:1612.00101 (2016)
  6. 6.
    Firman, M., Mac Aodha, O., Julier, S., Brostow, G.J.: Structured prediction of unobserved voxels from a single depth image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5431–5440 (2016)Google Scholar
  7. 7.
    Fisher, M., Ritchie, D., Savva, M., Funkhouser, T., Hanrahan, P.: Example-based synthesis of 3D object arrangements. ACM Trans. Graph. (TOG) 31(6), 135 (2012)CrossRefGoogle Scholar
  8. 8.
    Fisher, M., Savva, M., Li, Y., Hanrahan, P., Nießner, M.: Activity-centric scene synthesis for functional 3D scene modeling. ACM Trans. Graph. (TOG) 34(6), 179 (2015)CrossRefGoogle Scholar
  9. 9.
    Gal, R., Shamir, A., Hassner, T., Pauly, M., Cohen-Or, D.: Surface reconstruction using local shape priors. In: Symposium on Geometry Processing, EPFL-CONF-149318, pp. 253–262 (2007)Google Scholar
  10. 10.
    Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems, Vol. 2, pp. 2672–2680. MIT Press (2014)Google Scholar
  11. 11.
    Harary, G., Tal, A., Grinspun, E.: Context-based coherent surface completion. ACM Trans. Graph. (TOG) 33(1), 5 (2014)CrossRefzbMATHGoogle Scholar
  12. 12.
    Harary, G., Tal, A., Grinspun, E.: Feature-preserving surface completion using four points. In: Computer Graphics Forum, vol. 33, pp. 45–54. Wiley Online Library, New York (2014)Google Scholar
  13. 13.
    Hays, J., Efros, A.A.: Scene completion using millions of photographs. ACM Trans. Graph. (TOG) 26, 4 (2007)CrossRefGoogle Scholar
  14. 14.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)Google Scholar
  15. 15.
    Hu, S.M., Chen, T., Xu, K., Cheng, M.M., Martin, R.R.: Internet visual media processing: a survey with graphics and vision applications. Vis. Comput. 29(5), 393–405 (2013)CrossRefGoogle Scholar
  16. 16.
    Huang, J.B., Kang, S.B., Ahuja, N., Kopf, J.: Temporally coherent completion of dynamic video. ACM Trans. Graph. (TOG) 35(6), 196 (2016)Google Scholar
  17. 17.
    Iizuka, S., Simo-Serra, E., Ishikawa, H.: Globally and locally consistent image completion. ACM Trans. Graph. (TOG) 36(4), 107 (2017)CrossRefGoogle Scholar
  18. 18.
    Kazhdan, M., Hoppe, H.: Screened poisson surface reconstruction. ACM Trans. Graph. (TOG) 32(3), 29 (2013)CrossRefzbMATHGoogle Scholar
  19. 19.
    Kelly, B., Matthews, T.P., Anastasio, M.A.: Deep learning-guided image reconstruction from incomplete data. arXiv preprint arXiv:1709.00584 (2017)
  20. 20.
    Kermani, Z.S., Liao, Z., Tan, P., Zhang, H.: Learning 3D scene synthesis from annotated RGB-D images. In: Computer Graphics Forum, vol. 35, pp. 197–206. Wiley Online Library, New York (2016)Google Scholar
  21. 21.
    Kraevoy, V., Sheffer, A.: Template-based mesh completion. In: Symposium on Geometry Processing, vol. 385, pp. 13–22 (2005)Google Scholar
  22. 22.
    Kwatra, V., Essa, I., Bobick, A., Kwatra, N.: Texture optimization for example-based synthesis. ACM Trans. Graph. (ToG) 24(3), 795–802 (2005)CrossRefGoogle Scholar
  23. 23.
    LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)CrossRefGoogle Scholar
  24. 24.
    Li, D., Shao, T., Wu, H., Zhou, K.: Shape completion from a single RGBD image. IEEE Trans. Vis. Comput. Graph. 23(7), 1809–1822 (2017)CrossRefGoogle Scholar
  25. 25.
    Li, Y., Liu, S., Yang, J., Yang, M.H.: Generative face completion. arXiv preprint arXiv:1704.05838 (2017)
  26. 26.
    Liepa, P.: Filling holes in meshes. In: Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pp. 200–205. Eurographics Association (2003)Google Scholar
  27. 27.
    Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. arXiv preprint arXiv:1708.02002 (2017)
  28. 28.
    Liu, H., Zhang, L., Huang, H.: Web-image driven best views of 3D shapes. Vis. Comput. 28(3), 279–287 (2012)CrossRefGoogle Scholar
  29. 29.
    Mavridis, P., Sipiran, I., Andreadis, A., Papaioannou, G.: Object completion using k-sparse optimization. In: Computer Graphics Forum, vol. 34, pp. 13–21. Wiley Online Library, New York (2015)Google Scholar
  30. 30.
    Mellado, N., Aiger, D., Mitra, N.J.: Super 4pcs fast global pointcloud registration via smart indexing. In: Computer Graphics Forum, vol. 33, pp. 205–215. Wiley Online Library, New York (2014)Google Scholar
  31. 31.
    Min, P.: Binvox. Accessed 21 June 2018
  32. 32.
    Mnih, A., Gregor, K.: Neural variational inference and learning in belief networks. arXiv preprint arXiv:1402.0030 (2014)
  33. 33.
    Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814 (2010)Google Scholar
  34. 34.
    Oord, A.V.D., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759 (2016)
  35. 35.
    Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer, Berlin (2015)Google Scholar
  36. 36.
    Sahillioğlu, Y., Yemez, Y.: Coarse-to-fine surface reconstruction from silhouettes and range data using mesh deformation. Comput. Vis. Image Underst. 114(3), 334–348 (2010)CrossRefGoogle Scholar
  37. 37.
    Sharf, A., Alexa, M., Cohen-Or, D.: Context-based surface completion. ACM Trans. Graph. (TOG) 23(3), 878–887 (2004)CrossRefGoogle Scholar
  38. 38.
    Song, S., Yu, F., Zeng, A., Chang, A.X., Savva, M., Funkhouser, T.: Semantic scene completion from a single depth image. In: Conference on Computer Vision and Pattern Recognition (2017)Google Scholar
  39. 39.
    Suwajanakorn, S., Seitz, S.M., Kemelmacher-Shlizerman, I.: Synthesizing obama: learning lip sync from audio. ACM Trans. Graph. (TOG) 36(4), 95 (2017)CrossRefGoogle Scholar
  40. 40.
    Wexler, Y., Shechtman, E., Irani, M.: Space-time completion of video. IEEE Trans. Pattern Anal. Mach. Intell. 29(3), 915 (2007)CrossRefGoogle Scholar
  41. 41.
    Wu, J., Zhang, C., Xue, T., Freeman, B., Tenenbaum, J.: Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In: Advances in Neural Information Processing Systems, pp. 82–90 (2016)Google Scholar
  42. 42.
    Xia, C., Zhang, H.: A fast and automatic hole-filling method based on feature line recovery. Comput. Aided Des. Appl. 4, 1–9 (2017)Google Scholar
  43. 43.
    Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5987–5995. IEEE (2017)Google Scholar
  44. 44.
    Yang, B., Wen, H., Wang, S., Clark, R., Markham, A., Trigoni, N.: 3D object reconstruction from a single depth view with adversarial learning. arXiv preprint arXiv:1708.07969 (2017)
  45. 45.
    Yang, J., Li, H., Campbell, D., Jia, Y.: GO-ICP: a globally optimal solution to 3D ICP point-set registration. IEEE Trans. Pattern Anal. Mach. Intell. 38(11), 2241–2254 (2016)CrossRefGoogle Scholar
  46. 46.
    Yang, L., Yan, Q., Xiao, C.: Shape-controllable geometry completion for point cloud models. Vis. Comput. 33(3), 385–398 (2017)MathSciNetCrossRefGoogle Scholar
  47. 47.
    Yeh, R.A., Chen, C., Lim, T.Y., Schwing, A.G., Hasegawa-Johnson, M., Do, M.N.: Semantic image inpainting with deep generative models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5485–5493 (2017)Google Scholar
  48. 48.
    Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. Int. Confer. Learn. Represent. 896, 2016 (2016)Google Scholar
  49. 49.
    Zhang, H., Xu, M., Zhuo, L., Havyarimana, V.: A novel optimization framework for salient object detection. Vis. Comput. 32(1), 31–41 (2016)CrossRefGoogle Scholar
  50. 50.
    Zhang, H., Xu, T., Li, H., Zhang, S., Huang, X., Wang, X., Metaxas, D.: Stackgan: text to photo-realistic image synthesis with stacked generative adversarial networks. arXiv preprint arXiv:1612.03242 (2016)
  51. 51.
    Zhao, W., Gao, S., Lin, H.: A robust hole-filling algorithm for triangular mesh. Vis. Comput. 23(12), 987–997 (2007)CrossRefGoogle Scholar
  52. 52.
    Zheng, B., Zhao, Y., Yu, J.C., Ikeuchi, K., Zhu, S.C.: Beyond point clouds: scene understanding by reasoning geometry and physics. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3127–3134 (2013)Google Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. 1.Middle East Technical UniversityAnkaraTurkey

Personalised recommendations