Multimedia Tools and Applications, Volume 78, Issue 1, pp 713–726

Real-time indoor scene reconstruction with Manhattan assumption

  • Zunjie Zhu
  • Feng Xu
  • Chenggang Yan
  • Ning Li
  • Bingjian Gong
  • Yongdong Zhang
  • Qionghai Dai


This paper presents a novel end-to-end system for real-time indoor scene reconstruction that outperforms traditional image feature point-based and dense geometry correspondence-based methods on indoor scenes with little texture and few geometric features. Our method fully exploits the Manhattan assumption, i.e. that scenes consist mainly of planar surfaces with mutually orthogonal normal directions. Given an input depth frame, we first extract the coordinate system of the dominant axes via principal component analysis, which incorporates the orthogonality prior and reduces the influence of noise. We then compute the coordinates of the dominant planes (such as walls, floor and ceiling) in this coordinate system using mean shift. Finally, we compute the camera orientation and reconstruct the scene with a fast scheme that matches the dominant axes and planes to those of the previous frame. We have tested our approach on several datasets and show that it outperforms several well-known existing methods in these experiments. Our method also meets the real-time requirement with an unoptimized CPU implementation.
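The abstract gives no implementation details, so the following is only a minimal sketch of two of the steps it names, under stated assumptions. It assumes the dominant axes are already known as unit vectors (the PCA step is omitted), locates plane offsets along one axis with a 1D Gaussian-kernel mean shift, and recovers a relative rotation from three matched orthonormal axes. The function names, the bandwidth value and the merging rule are hypothetical and are not taken from the paper.

```python
import math


def mean_shift_1d(values, bandwidth, tol=1e-6, max_iter=100):
    """Find modes of a 1D sample with Gaussian-kernel mean shift.

    Returns a sorted list of (mode, support) pairs, where support is the
    number of samples that converged to that mode.
    """
    modes = []
    for start in values:
        x = start
        for _ in range(max_iter):
            weights = [math.exp(-0.5 * ((v - x) / bandwidth) ** 2) for v in values]
            new_x = sum(w * v for w, v in zip(weights, values)) / sum(weights)
            converged = abs(new_x - x) < tol
            x = new_x
            if converged:
                break
        # Merge with an existing mode if it lies within one bandwidth.
        for i, (m, count) in enumerate(modes):
            if abs(m - x) < bandwidth:
                modes[i] = (m, count + 1)
                break
        else:
            modes.append((x, 1))
    return sorted(modes)


def plane_offsets(points, axis, bandwidth=0.05):
    """Project 3D points onto a unit axis and return candidate plane offsets."""
    projections = [sum(p[k] * axis[k] for k in range(3)) for p in points]
    return mean_shift_1d(projections, bandwidth)


def relative_rotation(prev_axes, curr_axes):
    """Rotation R with prev_axes[i] = R @ curr_axes[i] for matched orthonormal axes.

    With the axes stacked as columns of B (previous) and A (current),
    R = B @ A^T because A is orthogonal.
    """
    return [
        [sum(prev_axes[i][r] * curr_axes[i][c] for i in range(3)) for c in range(3)]
        for r in range(3)
    ]
```

For points sampled from a floor and a ceiling, `plane_offsets(points, (0.0, 0.0, 1.0))` would return one mode near each plane height along with the number of supporting points. A real system would also need to match axes across frames and resolve sign ambiguities, which this sketch does not attempt.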


SLAM Tracking Depth sensor Real-Time AR 



This work is supported by the National Natural Science Foundation of China (61671196, 61327902, 61671268, 61727808) and the Zhejiang Provincial Natural Science Foundation of China (LR17F030006).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Institute of Information and Control, Hangzhou Dianzi University, Hangzhou, China
  2. School of Software, Tsinghua University, Beijing, China
  3. Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing, China
  4. Department of Automation, Tsinghua University, Beijing, China
