Room layout estimation in indoor environment: a review

Multimedia Tools and Applications

Abstract

Understanding a scene from a single image of an indoor environment is a challenging task. It involves assessing multiple scene components, such as identifying the spatial layout of the room, detecting objects in 3D space, and classifying the scene, in order to understand the nature of the indoor scene. Assessing the spatial structure of indoor scenes provides important geometric details and constraints for various tasks, such as indoor 3D reconstruction, navigation, scene awareness, and virtual reality. Most layout estimation approaches disregard the clutter and decorations present in the background and concentrate mainly on the horizontal wall contours. A room layout specifies the orientations, heights, and positions of the walls with respect to the camera center. The layout is then characterized as a set of estimated boundaries or corner positions, or as a 3D mesh. Nevertheless, recovering a 3D model from a single 2D picture is an ill-posed and challenging problem. A large number of techniques apply the “Manhattan assumption” for layout estimation. Layout estimation is combined with a few computer vision techniques, such as semantic segmentation, object orientation estimation, scene classification, and 3D object detection, to achieve indoor scene understanding. Additionally, object detection and semantic segmentation are reviewed alongside layout estimation. This survey provides valuable information for researchers and others who are looking for better layout estimation techniques.
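
As a rough illustration of the layout representation described in the abstract, the sketch below parameterizes a box-shaped room (the simplest case consistent with the Manhattan assumption) by its width, depth, and height, together with the camera position and yaw inside the room, and returns the eight room corners relative to the camera center. The class name `CuboidLayout`, the method `corners_in_camera_frame`, and the axis conventions are assumptions made for illustration only; they are not taken from the survey or from any specific method it reviews.

```python
import math
from dataclasses import dataclass


@dataclass
class CuboidLayout:
    """Hypothetical box-room layout; all fields are illustrative assumptions."""
    width: float   # room extent along the room x-axis (metres)
    depth: float   # room extent along the room y-axis (metres)
    height: float  # floor-to-ceiling distance (metres)
    cam_x: float   # camera position in room coordinates
    cam_y: float
    cam_z: float   # camera height above the floor
    yaw: float     # camera rotation about the vertical axis (radians)

    def corners_in_camera_frame(self):
        """Return the 8 room corners (floor first, then ceiling) relative to the camera."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        footprint = (
            (0.0, 0.0),
            (self.width, 0.0),
            (self.width, self.depth),
            (0.0, self.depth),
        )
        corners = []
        for z in (0.0, self.height):
            for x, y in footprint:
                # Translate so that the camera center becomes the origin.
                dx, dy, dz = x - self.cam_x, y - self.cam_y, z - self.cam_z
                # Rotate by -yaw about the vertical axis to express the
                # point in the camera's horizontal axes.
                corners.append((c * dx + s * dy, -s * dx + c * dy, dz))
        return corners


layout = CuboidLayout(width=4.0, depth=3.0, height=2.5,
                      cam_x=2.0, cam_y=1.5, cam_z=1.2, yaw=0.0)
corners = layout.corners_in_camera_frame()
```

A corner-based output like this is only one of the forms mentioned above; boundaries or a 3D mesh could be derived from the same eight points by connecting them into wall, floor, and ceiling faces.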

Author information

Corresponding author

Correspondence to Narendra Mohan.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mohan, N., Kumar, M. Room layout estimation in indoor environment: a review. Multimed Tools Appl 81, 1921–1951 (2022). https://doi.org/10.1007/s11042-021-11358-1

  • DOI: https://doi.org/10.1007/s11042-021-11358-1
