3D Room Layout Estimation from a Cubemap of Panorama Image via Deep Manhattan Hough Transform

Zhao, Yining; Wen, Chao; Xue, Zhou; Gao, Yue

doi:10.1007/978-3-031-19769-7_37

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13661)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Significant geometric structures can be compactly described by global wireframes in the estimation of 3D room layout from a single panoramic image. Based on this observation, we present an alternative approach to estimate the walls in 3D space by modeling long-range geometric patterns in a learnable Hough Transform block. We transform the image feature from a cubemap tile to the Hough space of a Manhattan world and directly map the feature to the geometric output. The convolutional layers not only learn local gradient-like line features, but also use global information to predict occluded walls with a simple network structure. Unlike most previous work, predictions are made individually on each cubemap tile and then assembled to obtain the layout estimate. Experimental results show that our method achieves results comparable to recent state-of-the-art methods in prediction accuracy and performance. Code is available at https://github.com/Starrah/DMH-Net.
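To make the idea of a Hough space for a Manhattan world more concrete, the following is a minimal, non-learned sketch in Python with NumPy. It is not the authors' implementation (see the linked repository for that). It assumes the cubemap tile is aligned with the Manhattan axes, in which case axis-aligned 3D edges project to lines that are horizontal, vertical, or pass through the tile center. The function name `manhattan_hough`, the parameter `n_angles`, and the angular binning scheme are hypothetical choices made for illustration only.

```python
import numpy as np


def manhattan_hough(feat, n_angles=180):
    """Accumulate a 2D feature map along three line families.

    Illustrative assumption: the tile is Manhattan-aligned, so candidate
    lines are horizontal, vertical, or pass through the tile center.
    """
    h, w = feat.shape

    # Horizontal lines: one accumulator bin per row.
    horizontal = feat.sum(axis=1)

    # Vertical lines: one accumulator bin per column.
    vertical = feat.sum(axis=0)

    # Lines through the center: bin every pixel by the orientation of its
    # offset from the center, folded into [0, pi) so that both halves of a
    # line vote for the same bin.
    ys, xs = np.mgrid[0:h, 0:w]
    dy = ys - (h - 1) / 2.0
    dx = xs - (w - 1) / 2.0
    theta = np.arctan2(dy, dx) % np.pi
    bins = np.minimum((theta / np.pi * n_angles).astype(int), n_angles - 1)
    through_center = np.bincount(
        bins.ravel(), weights=feat.ravel(), minlength=n_angles
    )

    return horizontal, vertical, through_center


if __name__ == "__main__":
    # A toy 5x5 "edge map" with a single horizontal line through the center.
    tile = np.zeros((5, 5))
    tile[2, :] = 1.0
    hor, ver, ctr = manhattan_hough(tile)
    print("strongest row:", int(hor.argmax()))
    print("strongest center-line bin:", int(ctr.argmax()))
```

In this sketch each accumulator is just a sum of features along a candidate line, so a strong response in one bin marks a long line even if part of it is hidden. A learned variant would apply such accumulation to convolutional features rather than to a raw edge map.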

Acknowledgements

This work was supported by the National Natural Science Funds of China (Nos. 62088102, 62021002) and the ByteDance Research Collaboration Project.

Author information

Authors and Affiliations

BNRist, THUIBCS, BLBCI, KLISS, School of Software, Tsinghua University, Beijing, China
Yining Zhao & Yue Gao
Pico IDL, ByteDance, Beijing, China
Chao Wen & Zhou Xue

Corresponding author

Correspondence to Yue Gao.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Cite this paper

Zhao, Y., Wen, C., Xue, Z., Gao, Y. (2022). 3D Room Layout Estimation from a Cubemap of Panorama Image via Deep Manhattan Hough Transform. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_37

DOI: https://doi.org/10.1007/978-3-031-19769-7_37
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19768-0
Online ISBN: 978-3-031-19769-7
eBook Packages: Computer Science, Computer Science (R0)
