Abstract
Understanding an indoor scene from a single image is a challenging task. It involves assessing multiple scene components, such as identifying the spatial layout of the room, detecting objects in 3D space, and classifying the scene, in order to understand the nature of the indoor scene. Estimating the spatial structure of indoor scenes provides important geometric details and constraints for tasks such as indoor 3D reconstruction, navigation, scene awareness, and virtual reality. Most layout estimation methods disregard the clutter and decorations present in the background and concentrate mainly on the horizontal wall contours. A room layout specifies the orientations, heights, and positions of the walls with respect to the camera center. The layout is then represented as a set of estimated boundaries or corner positions, or as a 3D mesh. Nevertheless, recovering a 3D model from a single 2D image is an ill-posed and challenging problem. A large number of techniques apply the “Manhattan assumption” for layout estimation. Layout estimation is combined with a few computer vision techniques, such as semantic segmentation, object orientation estimation, scene classification, and 3D object detection, to achieve indoor scene understanding. Object detection and semantic segmentation are also reviewed alongside layout estimation. This survey provides valuable information for researchers and for anyone looking for better layout estimation techniques.
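As a rough illustration of what a layout expressed as corner positions relative to the camera center can look like, the following minimal sketch models a room as an axis-aligned cuboid (a simple instance of the Manhattan assumption) and projects its corners through a pinhole camera. It is not taken from the survey: the function names, the cuboid simplification, the yaw-only camera rotation, and all numeric values are assumptions chosen for illustration.

```python
import math
from itertools import product


def cuboid_corners(width, height, depth):
    """Eight corners of an axis-aligned room; origin at a floor corner, Y points up."""
    return [(x, y, z) for x, y, z in product((0.0, width), (0.0, height), (0.0, depth))]


def world_to_camera(point, cam_pos, yaw):
    """Express a room-frame point in a camera frame rotated by `yaw` (radians) about Y."""
    dx, dy, dz = (p - c for p, c in zip(point, cam_pos))
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    # Apply the inverse (transpose) of the rotation about the vertical axis.
    return (cos_y * dx - sin_y * dz, dy, sin_y * dx + cos_y * dz)


def project(point_cam, focal, cx, cy):
    """Pinhole projection; returns None for points at or behind the camera plane."""
    x, y, z = point_cam
    if z <= 0:
        return None
    # Image rows grow downward, so the vertical coordinate is flipped.
    return (cx + focal * x / z, cy - focal * y / z)


def layout_corners_in_image(room, cam_pos, yaw, focal, cx, cy):
    """Project every room corner; `room` is (width, height, depth) in metres."""
    return [
        project(world_to_camera(corner, cam_pos, yaw), focal, cx, cy)
        for corner in cuboid_corners(*room)
    ]


if __name__ == "__main__":
    # Hypothetical 4 m x 3 m x 5 m room, camera 1 m in from the near wall, facing the far wall.
    corners = layout_corners_in_image(
        (4.0, 3.0, 5.0), cam_pos=(2.0, 1.5, 1.0), yaw=0.0, focal=500.0, cx=320.0, cy=240.0
    )
    for corner in corners:
        print(corner)
```

In this setup only the four far-wall corners lie in front of the camera, which hints at why corner-based layout representations must handle corners that fall outside the field of view.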
About this article
Cite this article
Mohan, N., Kumar, M. Room layout estimation in indoor environment: a review. Multimed Tools Appl 81, 1921–1951 (2022). https://doi.org/10.1007/s11042-021-11358-1
DOI: https://doi.org/10.1007/s11042-021-11358-1