AtlantaNet: Inferring the 3D Indoor Layout from a Single \(360^\circ\) Image Beyond the Manhattan World Assumption

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12353)

Abstract

We introduce a novel end-to-end approach to predict a 3D room layout from a single panoramic image. Unlike recent state-of-the-art works, our method is not limited to Manhattan World environments: it can reconstruct rooms bounded by vertical walls that do not form right angles or that are curved, i.e., Atlanta World models. In our approach, we project the original gravity-aligned panoramic image onto two horizontal planes, one above and one below the camera. This representation encodes all the information needed to recover the Atlanta World 3D bounding surfaces of the room in the form of a 2D room footprint on the floor plan and a room height. To predict the 3D layout, we propose an encoder-decoder neural network architecture that leverages Recurrent Neural Networks (RNNs) to capture long-range geometric patterns and exploits a customized training strategy based on domain-specific knowledge. The experimental results demonstrate that our method outperforms state-of-the-art solutions in prediction accuracy, in particular for complex wall layouts and curved wall footprints.
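The projection step described above can be illustrated with a minimal sketch. It assumes an equirectangular, gravity-aligned panorama stored as a NumPy array and uses a generic pinhole-style projection onto planes at unit distance above and below the camera; this is not necessarily the exact transform used in the paper. The function names and the parameters `out_size` and `fov_deg` are hypothetical, chosen only for illustration.

```python
import numpy as np

def plane_to_equirect_coords(out_size, fov_deg, pano_w, pano_h, above=True):
    """For each pixel of a square horizontal-plane view, return the
    (u, v) pixel coordinates of the matching equirectangular sample.

    The plane sits at unit distance above (z = +1) or below (z = -1)
    the camera; fov_deg controls how much of the plane is covered.
    """
    half = np.tan(np.radians(fov_deg) / 2.0)  # half-extent of the plane
    # Pixel centers across the plane, normalized to (-1, 1)
    t = (np.arange(out_size) + 0.5) / out_size * 2.0 - 1.0
    x, y = np.meshgrid(t * half, t * half)
    z = np.full_like(x, 1.0 if above else -1.0)

    lon = np.arctan2(y, x)                 # azimuth in (-pi, pi]
    lat = np.arctan2(z, np.hypot(x, y))    # elevation, > 0 means above the horizon

    u = (lon / (2.0 * np.pi) + 0.5) * pano_w   # column in the panorama
    v = (0.5 - lat / np.pi) * pano_h           # row in the panorama (0 = zenith)
    return u, v

def project_to_plane(pano, out_size=256, fov_deg=160.0, above=True):
    """Resample the panorama onto a horizontal plane (nearest neighbor)."""
    pano_h, pano_w = pano.shape[:2]
    u, v = plane_to_equirect_coords(out_size, fov_deg, pano_w, pano_h, above)
    ui = np.floor(u).astype(int) % pano_w              # wrap around horizontally
    vi = np.clip(np.floor(v).astype(int), 0, pano_h - 1)
    return pano[vi, ui]

# Toy panorama: upper half (above the horizon) is 1, lower half is 0.
pano = np.zeros((64, 128))
pano[:32] = 1.0
ceiling_view = project_to_plane(pano, out_size=32, above=True)
floor_view = project_to_plane(pano, out_size=32, above=False)
```

In this sketch the ceiling view samples only the upper half of the panorama and the floor view only the lower half, so the boundary between walls and ceiling (or floor) appears in each view as a closed 2D curve that approximates the room footprint.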

Keywords

3D floor plan recovery; Panoramic images; 360 images; Data-driven reconstruction; Structured indoor reconstruction; Indoor panorama; Room layout estimation; Holistic scene structure

Notes

Acknowledgments

This work has received funding from Sardinian Regional Authorities under projects VIGECLAB, AMAC, and TDM (POR FESR 2014-2020). We also acknowledge the contribution of the European Union’s H2020 research and innovation programme under grant agreement 813170 (EVOCATION).

Supplementary material

Supplementary material 1: 504445_1_En_26_MOESM1_ESM.pdf (PDF, 15578 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Visual Computing, CRS4, Cagliari, Italy
  2. College of Science and Engineering, HBKU, Doha, Qatar