Abstract
In estimating 3D room layout from a single panoramic image, the significant geometric structures can be compactly described by global wireframes. Based on this observation, we present an alternative approach that estimates the walls in 3D space by modeling long-range geometric patterns in a learnable Hough Transform block. We transform the image features of each cubemap tile into the Hough space of a Manhattan world and map them directly to the geometric output. The convolutional layers not only learn local gradient-like line features but also use global information to predict occluded walls with a simple network structure. Unlike most previous work, predictions are made individually on each cubemap tile and then assembled into the layout estimate. Experimental results show that our method is comparable to recent state-of-the-art methods in prediction accuracy and performance. Code is available at https://github.com/Starrah/DMH-Net.
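To make the idea of mapping tile features into a Hough space concrete, the following is a minimal sketch and not the authors' implementation. It assumes that only horizontal and vertical lines are of interest in a tile, and it uses plain summation in place of the learnable block described above. The function name `axis_aligned_hough` and the toy tile are hypothetical.

```python
import numpy as np


def axis_aligned_hough(feature):
    """Accumulate a 2D feature map into bins for horizontal and vertical lines.

    feature: array of shape (H, W), e.g. a line-response map of one tile.
    Returns (horizontal_votes, vertical_votes) with shapes (H,) and (W,).
    This is an illustrative simplification, not the paper's Hough block.
    """
    feature = np.asarray(feature, dtype=float)
    horizontal_votes = feature.sum(axis=1)  # one bin per row y
    vertical_votes = feature.sum(axis=0)    # one bin per column x
    return horizontal_votes, vertical_votes


# Toy tile: a full vertical line at x = 5 and a partly hidden
# horizontal line at y = 2 (only columns 0..3 are visible).
tile = np.zeros((8, 8))
tile[:, 5] = 1.0
tile[2, :4] = 1.0

h_votes, v_votes = axis_aligned_hough(tile)
best_y = int(np.argmax(h_votes))  # row with the strongest horizontal evidence
best_x = int(np.argmax(v_votes))  # column with the strongest vertical evidence
```

Because every pixel along a line votes into the same bin, the partly hidden horizontal line still produces the strongest row response, which illustrates how global accumulation can help with occluded structure.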
Acknowledgements
This work was supported by the National Natural Science Funds of China (No. 62088102, 62021002) and the ByteDance Research Collaboration Project.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, Y., Wen, C., Xue, Z., Gao, Y. (2022). 3D Room Layout Estimation from a Cubemap of Panorama Image via Deep Manhattan Hough Transform. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_37
DOI: https://doi.org/10.1007/978-3-031-19769-7_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19768-0
Online ISBN: 978-3-031-19769-7
eBook Packages: Computer Science, Computer Science (R0)