RC-Net: Row and Column Network with Text Feature for Parsing Floor Plan Images

  • Regular Paper
Journal of Computer Science and Technology

Abstract

The popularity of online home design and floor plan customization has been steadily increasing. However, manually converting floor plan images from books or paper materials into electronic resources is challenging because of the vast amount of historical data involved. Using neural networks to identify and parse floor plans can significantly streamline this conversion. In this paper, we present a novel learning framework for automatically parsing floor plan images. Our key insight is that room-type text is both common and crucial in floor plan images, since it carries the semantic information of the corresponding room. However, this clue is rarely considered in previous learning-based methods. In contrast, we propose the Row and Column network (RC-Net), which recognizes floor plan elements by integrating the text feature. Specifically, we add a text feature branch to the network that extracts text features corresponding to the room type and uses them to guide room-type predictions. More importantly, we formulate the Row and Column constraint module (RC constraint module), which shares and constrains features across entire rows and columns of the feature maps so that, as far as possible, only one type is predicted within each room, making the segmentation boundaries between different rooms more regular and cleaner. Extensive experiments on three benchmark datasets validate that our framework substantially outperforms other state-of-the-art approaches in terms of FWIoU, mACC, and mIoU.
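
The abstract names three segmentation metrics without defining them on this page. The following is a minimal sketch of how they are commonly computed from a pixel-level confusion matrix, assuming the standard definitions used in semantic segmentation (per-class accuracy averaged over classes, per-class intersection-over-union averaged over classes, and IoU weighted by class frequency). The function names and the toy labels are hypothetical and are not taken from the paper; the paper's own evaluation code may differ, for example in how absent classes are handled.

```python
from typing import Dict, List, Sequence


def confusion_matrix(truth: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """cm[i][j] counts pixels whose true class is i and predicted class is j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute mAcc, mIoU and FWIoU from a square confusion matrix.

    Assumption: classes that never occur in the ground truth are skipped
    in mAcc, and classes with an empty union are skipped in mIoU.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accs: List[float] = []
    ious: List[float] = []
    fwiou = 0.0
    for i in range(n):
        tp = cm[i][i]                            # correctly labelled pixels
        row = sum(cm[i])                         # ground-truth pixels of class i
        col = sum(cm[r][i] for r in range(n))    # pixels predicted as class i
        union = row + col - tp
        if row > 0:
            accs.append(tp / row)
        if union > 0:
            iou = tp / union
            ious.append(iou)
            fwiou += (row / total) * iou         # weight IoU by class frequency
    return {
        "mAcc": sum(accs) / len(accs),
        "mIoU": sum(ious) / len(ious),
        "FWIoU": fwiou,
    }


if __name__ == "__main__":
    # Toy flattened label maps with three hypothetical room classes.
    truth = [0, 0, 0, 0, 1, 1, 2, 2]
    pred = [0, 0, 0, 1, 1, 1, 2, 0]
    print(segmentation_metrics(confusion_matrix(truth, pred, 3)))
```

Because FWIoU weights each class by its pixel share, large rooms dominate it, whereas mIoU and mAcc treat every class equally; reporting all three gives a more balanced picture than any single number.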

Author information

Corresponding authors

Correspondence to Jian-Wei Guo or Jun Xiao.

Additional information

Associate Professor Guo supervises this project, helps implement the experiments, and plays a key role in promoting efficient and accurate communication. Professor Xiao contributes substantially to improving the experiments and is crucial in conveying information accurately in an English-speaking context.

Supplementary Information

ESM 1 (PDF 838 kb)

About this article

Cite this article

Wang, T., Meng, WL., Lu, ZD. et al. RC-Net: Row and Column Network with Text Feature for Parsing Floor Plan Images. J. Comput. Sci. Technol. 38, 526–539 (2023). https://doi.org/10.1007/s11390-023-3117-x

  • DOI: https://doi.org/10.1007/s11390-023-3117-x
