FloorNet: A Unified Framework for Floorplan Reconstruction from 3D Scans

  • Chen Liu
  • Jiaye Wu
  • Yasutaka Furukawa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11210)

Abstract

This paper proposes a novel deep neural architecture that automatically reconstructs a floorplan from an RGBD video stream captured by walking through a house with a smartphone, an ultimate goal of indoor mapping research. The challenge lies in processing RGBD streams that span a large 3D space. The proposed neural architecture, dubbed FloorNet, effectively processes the data through three neural network branches: (1) a PointNet on 3D points, exploiting 3D information; (2) a CNN on a 2D point-density image in a top-down view, enhancing local spatial reasoning; and (3) a CNN on RGB images, utilizing full image information. FloorNet exchanges intermediate features across the branches to exploit the strengths of all three architectures. We have created a benchmark for floorplan reconstruction by acquiring RGBD video streams for 155 residential houses or apartments with Google Tango phones and annotating complete floorplan information. Our qualitative and quantitative evaluations demonstrate that the fusion of the three branches effectively improves reconstruction quality. We hope that this paper, together with the benchmark, will be an important step towards solving the challenging vector-graphics floorplan reconstruction problem.
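
To make the hybrid representation concrete, the sketch below illustrates in PyTorch how an unordered point cloud can be projected into a top-down point-density image, and how per-point features can be pooled into the same grid for a 2D CNN, i.e., the point-to-image direction of the feature exchange described above. This is our own minimal illustration under stated assumptions (the grid resolution, layer sizes, a single exchange step, and the omission of the RGB branch are all simplifications), not the authors' implementation.

    # Minimal sketch (an illustration, not the authors' released code) of two
    # ideas from the abstract: projecting an unordered point cloud onto a
    # top-down point-density image, and pooling per-point features into that
    # image grid so a 2D CNN can reason over them. Grid resolution, layer
    # sizes, and the missing RGB branch are simplifying assumptions.
    import torch
    import torch.nn as nn

    def project_to_grid(points, resolution):
        """Map each point's (x, y) to a flat cell index on a top-down grid,
        collapsing the height axis as a floorplan view does."""
        xy = points[:, :2]
        mins, maxs = xy.min(dim=0).values, xy.max(dim=0).values
        cells = ((xy - mins) / (maxs - mins + 1e-6) * resolution).long()
        cells = cells.clamp(0, resolution - 1)
        return cells[:, 1] * resolution + cells[:, 0]       # (N,) indices

    class TwoBranchFusion(nn.Module):
        """Toy point/image fusion; all hyperparameters are illustrative."""
        def __init__(self, resolution=64, feat_dim=32):
            super().__init__()
            self.resolution, self.feat_dim = resolution, feat_dim
            self.point_mlp = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU())
            self.image_cnn = nn.Sequential(
                nn.Conv2d(1 + feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
                nn.Conv2d(feat_dim, 1, 3, padding=1))

        def forward(self, points):                          # points: (N, 3)
            r, d = self.resolution, self.feat_dim
            flat = project_to_grid(points, r)
            # Point-density image: each pixel counts the points in its cell.
            density = torch.zeros(r * r).index_add(
                0, flat, torch.ones(len(points)))
            # Per-point features, then "point-to-image" pooling: scatter-add
            # the features of all points that fall into the same pixel.
            feats = self.point_mlp(points)                  # (N, d)
            grid = torch.zeros(d, r * r).index_add(
                1, flat, feats.t().contiguous())
            fused = torch.cat(
                [density.view(1, r, r), grid.view(d, r, r)], dim=0)
            return self.image_cnn(fused.unsqueeze(0))       # (1, 1, r, r) map

    # A random 10k-point "scan" yields a dense top-down prediction map.
    print(TwoBranchFusion()(torch.rand(10000, 3)).shape)    # [1, 1, 64, 64]

The reverse, image-to-point direction of the exchange would simply index the grid features back out at each point's cell before the next point-wise layer.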

Keywords

Floorplan reconstruction · 3D Computer Vision · 3D CNN

Acknowledgement

This research is partially supported by the National Science Foundation under grants IIS 1540012 and IIS 1618685, a Google Faculty Research Award, an Adobe gift fund, and a Zillow gift fund. We thank Nvidia for a generous GPU donation.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Washington University in St. Louis, St. Louis, USA
  2. Simon Fraser University, Burnaby, Canada
