GeoLayout: Geometry Driven Room Layout Estimation Based on Depth Maps of Planes

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12361)

Abstract

The task of room layout estimation is to locate the wall-floor, wall-ceiling, and wall-wall boundaries. Most recent methods solve this problem via edge/keypoint detection or semantic segmentation. However, these approaches pay limited attention to the geometry of the dominant planes and the intersections between them, which have a significant impact on the room layout. In this work, we propose to incorporate geometric reasoning into deep learning for layout estimation. Our approach learns to infer the depth maps of the dominant planes in the scene by predicting pixel-level surface parameters, and the layout is then generated by intersecting these depth maps. Moreover, we present a new dataset with pixel-level depth annotations of the dominant planes. It is larger than existing datasets and contains both cuboid and non-cuboid rooms. Experimental results show that our approach achieves considerable performance gains on both 2D and 3D benchmarks.
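The pipeline described above reduces layout inference to simple per-pixel geometry once the plane parameters are known. Below is a minimal sketch of that final step, not the authors' implementation: it assumes a common parameterization in which inverse depth is linear in the normalized image coordinates, 1/Z = p*u + q*v + r, and assumes a convex (cuboid-like) room, so that the visible layout surface along each ray is the nearest plane, i.e. the one with maximum inverse depth (the paper also handles non-cuboid rooms, which need more than this single rule). The helper layout_from_planes and the toy plane values are hypothetical.

    import numpy as np

    def layout_from_planes(params, height, width):
        """Render a layout depth map and a plane segmentation from
        per-plane surface parameters (p, q, r), with 1/Z = p*u + q*v + r."""
        # Normalized pixel coordinates: v indexes rows, u indexes columns.
        v, u = np.mgrid[0:height, 0:width].astype(np.float64)
        u /= width
        v /= height

        # Inverse depth of every plane at every pixel: shape (N, H, W).
        inv_depth = (params[:, 0, None, None] * u
                     + params[:, 1, None, None] * v
                     + params[:, 2, None, None])

        # Planes behind the camera can never be the visible surface.
        inv_depth = np.where(inv_depth > 1e-8, inv_depth, -np.inf)

        # Nearest plane along the ray = maximum inverse depth (minimum depth).
        labels = np.argmax(inv_depth, axis=0)
        depth = 1.0 / np.take_along_axis(inv_depth, labels[None], axis=0)[0]
        return depth, labels

    # Toy example: floor, back wall, and ceiling.
    planes = np.array([[0.0, 0.8, -0.2],   # floor: nearer towards the bottom
                       [0.0, 0.0, 0.25],   # back wall: constant depth 4
                       [0.0, -0.8, 0.6]])  # ceiling: nearer towards the top
    depth, labels = layout_from_planes(planes, 240, 320)

The wall-floor, wall-ceiling, and wall-wall boundaries then fall out as label transitions in labels; no explicit edge or keypoint detector is needed at this stage.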

Keywords

Room layout estimation · Plane segmentation · Dataset

Notes

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 61991411 and U1913204, in part by the National Key Research and Development Plan of China under Grant 2017YFB1300205, and in part by the Shandong Major Scientific and Technological Innovation Project (MSTIP) under Grant 2018CXGC1503. We thank the LSUN organizers for the benchmarking service.

Supplementary material

Supplementary material 1: 504471_1_En_37_MOESM1_ESM.pdf (21.6 MB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Control Science and Engineering, Shandong University, Jinan, China
  2. School of Communications and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an, China
  3. Google Research, Mountain View, USA