Floor Plan Recognition and Vectorization Using Combination UNet, Faster-RCNN, Statistical Component Analysis and Ramer-Douglas-Peucker
Floor plan recognition and vectorization from images has high market demand, as it can be applied in domains such as design, automatic furniture fitting, and property cost estimation. Several approaches already exist on the market. Many of them use only statistical or only deep machine learning methods, recognize a limited set of floor plan types, or provide merely a semi-automatic recognition tool. This paper introduces an approach based on a combination of statistical image processing methods and machine learning techniques that allows training a robust model for different floor plan topologies. Faster R-CNN for floor object detection reaches a mean average precision of 86%, and UNet for wall segmentation shows an IoU of about 99%. Both methods, combined with functional and component filtration, made it possible to implement a new approach for vectorizing floor plans.
Keywords: Floor plan analysis · Image processing · Deep machine learning · Transfer learning · Object detection · Augmentation
- Manual: any general vector tool, such as CorelDraw or AutoCAD.
- Semi-automatic: tools that perform some preprocessing before the user fixes the rest manually. They have a number of drawbacks, since in complicated cases they may save little time compared to manual methods.
- Automatic: methods that consume only the image and return a vector output file; nothing needs to be done manually.
Almost all automatic solutions use only deep learning or only statistical methods applied to specific cases or to datasets with known image conditions and structure. The primary purpose of this paper is to describe an approach that combines computer vision, computational geometry, statistical analysis, and deep learning to improve the overall result metrics and to make the approach independent of the plan type and of the image conditions.
We propose an approach that recognizes and vectorizes floor plans of different topologies and under different image conditions with a better IoU score than reported in other papers.
We have also shown an approach for enlarging a small dataset using a back-perspective transform with physical photography. This approach yielded a 1.5% increase in the IoU score and allowed us to build a solution robust to shadows.
1.1 Previous Work
The introduction has noted that there are three general types of vectorization methods: manual, semi-automatic, and automatic.
The manual methods are usually general vector graphics software used for developing the floor plan itself, for example, AutoCAD or CorelDraw.
The investigated semi-automatic methods implement the idea of plan vectoring in many different ways. Some methods are used only for preprocessing based on thresholding. Slightly more advanced methods start from thresholding and then switch to an edit mode, where the detected objects can be placed over the source image.
The UNet architecture has been used for door and wall recognition in a modified version, U-Net+DCL, where the baseline UNet's deconvolution layers are replaced with a simplified version of pixel deconvolution layers for segmentation. The best wall-recognition result reported in that paper is 0.799 by the mean IoU metric.
The object detection approach for filtering floor plans using Faster R-CNN has been applied in earlier work. In this work, Faster R-CNN was chosen for object detection as well, since it shows high scores with fast convergence. We achieved a mean average precision of 0.86 and a mean average recall of 0.92 on a dataset including 12 object classes.
2 The Floor Plan Datasets
Two neural networks were used to solve the problem; the first is UNet for semantic segmentation. The U-Net architecture is built upon the Fully Convolutional Network (FCN); the two main differences compared to FCN are that UNet is symmetric and that the skip connections between the downsampling path and the upsampling path apply a concatenation operator instead of a sum. These skip connections provide local information to the global information during upsampling. The UNet architecture is used in this paper because it has been successfully applied to many image segmentation tasks and does not require a dataset of tens of thousands of images to achieve high results. A ResNet backbone pre-trained on the ImageNet dataset is used. In addition to UNet, the DeepLabv3+ model was tested as one of the state-of-the-art models for image segmentation, but its results turned out to be worse than UNet's. The results were compared using the Intersection over Union (1).
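The Intersection over Union metric from (1), IoU = |A ∩ B| / |A ∪ B|, can be computed directly on binary masks. The following is a minimal sketch for illustration, not the authors' implementation:

```python
def iou(mask_a, mask_b):
    """Intersection over Union of two binary masks given as 2D lists of 0/1."""
    inter = sum(a & b for row_a, row_b in zip(mask_a, mask_b)
                for a, b in zip(row_a, row_b))
    union = sum(a | b for row_a, row_b in zip(mask_a, mask_b)
                for a, b in zip(row_a, row_b))
    return inter / union if union else 1.0  # two empty masks count as identical

pred = [[1, 1, 0],
        [1, 0, 0]]
true = [[1, 0, 0],
        [1, 1, 0]]
print(iou(pred, true))  # 2 overlapping pixels / 4 pixels in the union = 0.5
```

In practice the same formula is applied per class over the whole segmentation output and averaged.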
As the model for object detection, Faster R-CNN pre-trained on the ImageNet dataset was chosen as one of the most widely used state-of-the-art architectures, showing high accuracy even when trained on a small dataset. It uses a Region Proposal Network (RPN) to reduce computational time while keeping accuracy at least as good as its predecessor methods. Faster R-CNN is widely used in object detection research [12, 13, 14].
4 General Approach
If the original image is smaller than required, the same algorithm is applied, but using upsampling. The scaled result is then processed by the two neural networks.
Based on the previous step, the connected components of the image are built, semantically representing the rooms. The same morphological operations as above are applied to remove small defects at the borders, such as part of a door segmented as a wall, but using scalable empirical constants depending on the size of the door, since door widths are standardized and vary within a small interval of 600–1000 mm. Then, component filtration is used to remove connected components that are not rooms, Fig. 7(d).
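Component filtration of this kind can be sketched as a connected-component labeling pass followed by an area threshold. This is a simple stand-in for the room filtration step; the `min_area` threshold is illustrative, not the paper's constant:

```python
from collections import deque

def filter_components(mask, min_area):
    """Remove 4-connected components of 1-pixels whose area is below min_area."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # collect one component with breadth-first search
                comp, q = [], deque([(y, x)])
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if len(comp) >= min_area:  # keep only room-sized components
                    for cy, cx in comp:
                        out[cy][cx] = 1
    return out

mask = [[1, 1, 0, 1],
        [1, 1, 0, 0],
        [0, 0, 0, 0]]
print(filter_components(mask, min_area=2))  # the lone pixel at (0, 3) is removed
```

In a production pipeline the same effect is available from `cv2.connectedComponentsWithStats`, thresholding on the returned area statistic.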
Using the obtained contours of the rooms (the internal wall borders), as well as the external wall borders obtained at the segmentation stage, the 1 px-wide middle line is found using a thinning algorithm. From it, the wall thickness can be derived.
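The paper does not specify which thinning algorithm is used; one classic choice that produces such a 1 px-wide middle line is Zhang–Suen thinning, sketched here for illustration:

```python
def zhang_suen_thin(img):
    """Zhang–Suen thinning of a binary image (list of lists of 0/1).
    Border pixels are left untouched for simplicity."""
    h, w = len(img), len(img[0])
    img = [row[:] for row in img]
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if not img[y][x]:
                        continue
                    # neighbours P2..P9, clockwise from the pixel above
                    p = [img[y-1][x], img[y-1][x+1], img[y][x+1], img[y+1][x+1],
                         img[y+1][x], img[y+1][x-1], img[y][x-1], img[y-1][x-1]]
                    b = sum(p)  # number of nonzero neighbours
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    if step == 0:
                        cond = p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0
                    else:
                        cond = p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((y, x))
            for y, x in to_clear:
                img[y][x] = 0
            changed = changed or bool(to_clear)
    return img
```

Given the skeleton, a wall-thickness estimate follows as the wall area divided by the skeleton length. OpenCV ships the same algorithm as `cv2.ximgproc.thinning`.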
The simplified contours, together with the object detection results, are used to arrange doors, windows, and other objects. Doors and windows are placed by the method of intersecting segments, so that they become part of the wall. A K-D tree is constructed to reduce the enumeration of segments when searching for intersections.
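The contour simplification that produces those polylines is the Ramer–Douglas–Peucker algorithm named in the title. A minimal sketch for open polylines (not the authors' implementation):

```python
def rdp(points, eps):
    """Ramer–Douglas–Peucker: simplify an open polyline, keeping points whose
    perpendicular distance from the chord (first, last) exceeds eps."""
    if len(points) < 3:
        return points[:]
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0
    # perpendicular distance of every point from the chord
    dists = [abs(dy * (x - x1) - dx * (y - y1)) / norm for x, y in points]
    i = max(range(1, len(points) - 1), key=dists.__getitem__)
    if dists[i] > eps:
        # keep the farthest point and recurse on both halves
        return rdp(points[:i + 1], eps)[:-1] + rdp(points[i:], eps)
    return [points[0], points[-1]]

line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(rdp(line, eps=1.0))  # → [(0, 0), (2, -0.1), (3, 5), (7, 9)]
```

The tolerance `eps` plays the same role as the door-scale constants above: it is chosen relative to the image resolution so that wall jitter is removed while corners survive.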
The developed approach has shown the ability to process user input that does not correlate with the training data. The whole pipeline, from input to result, runs in 2–3 s on the following configuration: Intel Core i5, GeForce GTX 1080Ti, and 16 GB RAM, which allows recognizing incoming plans in production environments without long delays.
[Table] Faster R-CNN mAP scores at different IoU thresholds, for the collected dataset and for the collected dataset expanded with photography.
[Table] Comparison of different backbones for UNet, by IoU on test images.
[Table] IoU scores for different neural network architectures, on the collected dataset, the collected dataset expanded with photography, and a public dataset.
The developed approach allows recognizing and building a vector representation of a floor plan using deep learning methods (semantic segmentation with UNet, object detection with Faster R-CNN) combined with statistical methods (morphology, component filtration, and the Ramer–Douglas–Peucker algorithm). The segmentation has shown an accuracy of about 99% by the IoU metric, while object detection has shown 86% by the mAP metric. Also, a dataset enlargement approach using a back-perspective transform was tested. This way of augmenting the dataset introduces natural spatial noise to the images, which reduces the risk of overfitting and makes the processing algorithm more robust to shadows. The developed method performs the whole processing of one input in about 2 s, which allows using this approach in cloud-based recognition systems or any other production deployment.
- 1. PlanTracer. http://www.plantracer.ru. Accessed 26 Oct 2019
- 2. PlanCAD. System of automated design of floor plans. https://sapr.ru/article/18006. Accessed 26 Oct 2019
- 4. Dodge, S., Xu, J., Stenger, B.: Parsing floor plan images. In: Fifteenth IAPR Conference on Machine Vision Applications (MVA) (2017)
- 5. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. Computing Research Repository (2015)
- 6. Yang, J., Jang, H., Kim, J.: Semantic segmentation in architectural floor plans for detecting walls and doors. In: 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Beijing, China, pp. 1–9 (2018). https://doi.org/10.1109/cisp-bmei.2018.8633243
- 7. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems (NIPS) (2015)
- 10. OpenCV. https://opencv.org. Accessed 26 Oct 2019
- 11. Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: The European Conference on Computer Vision (ECCV), pp. 801–818 (2018)
- 12. Zhu, B., Wu, X., Yang, L., Shen, Y., Wu, L.: Automatic detection of books based on Faster R-CNN. In: Third International Conference on Digital Information Processing, Data Mining, and Wireless Communication (DIPDMWC), pp. 8–12 (2016)
- 13. Zhang, H., Du, Y., Ning, S., Zhang, Y., Yang, S., Du, C.: Pedestrian detection method based on Faster R-CNN. In: 13th International Conference on Computational Intelligence and Security (CIS), pp. 427–430
- 14. Xu, Z., Wu, Z., Feng, J.: CFUN: combining Faster R-CNN and U-Net network for efficient whole heart segmentation (2018)
- 16. Gonzalez, R., Woods, R.: Digital Image Processing, pp. 541–545. Addison-Wesley Publishing Company (1992)
- 19. Berman, M., Rannen Triki, A., Blaschko, M.: The Lovász-Softmax loss: a tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
- 20. Kalervo, A., Ylioinas, J., Häikiö, M., Karhu, A., Kannala, J.: CubiCasa5K: a dataset and an improved multi-task model for floorplan image analysis. In: Felsberg, M., Forssén, P.-E., Sintorn, I.-M., Unger, J. (eds.) SCIA 2019. LNCS, vol. 11482, pp. 28–40. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20205-7_3