
International Journal of Computer Vision, Volume 126, Issue 6, pp 615–636

SDF-2-SDF Registration for Real-Time 3D Reconstruction from RGB-D Data

  • Miroslava Slavcheva
  • Wadim Kehl
  • Nassir Navab
  • Slobodan Ilic

Abstract

We tackle the task of dense 3D reconstruction from RGB-D data. Contrary to the majority of existing methods, we focus not only on trajectory estimation accuracy, but also on reconstruction precision. The key technique is SDF-2-SDF registration, which is a correspondence-free, symmetric, dense energy minimization method, performed via the direct voxel-wise difference between a pair of signed distance fields. It has a wider convergence basin than traditional point cloud registration and cloud-to-volume alignment techniques. Furthermore, its formulation allows for straightforward incorporation of photometric and additional geometric constraints. We employ SDF-2-SDF registration in two applications. First, we perform small-to-medium scale object reconstruction entirely on the CPU. To this end, the camera is tracked frame-to-frame in real time. Then, the initial pose estimates are refined globally in a lightweight optimization framework, which does not involve a pose graph. We combine these procedures into our second, fully real-time application for larger-scale object reconstruction and SLAM. It is implemented as a hybrid system, whereby tracking is done on the GPU, while refinement runs concurrently over batches on the CPU. To bound memory and runtime footprints, registration is done over a fixed number of limited-extent volumes, anchored at geometry-rich locations. Extensive qualitative and quantitative evaluation of both trajectory accuracy and model fidelity on several public RGB-D datasets, acquired with sensors of varying quality, demonstrates higher precision than related techniques.
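The voxel-wise energy at the heart of SDF-2-SDF registration can be illustrated with a minimal sketch: two truncated signed distance fields are compared directly, voxel by voxel, and the sum of squared differences serves as the alignment cost. This is not the authors' implementation; the `sphere_sdf` helper, the grid resolution, and the truncation width `delta` are illustrative assumptions used only to show the energy's behavior.

```python
import numpy as np

def sphere_sdf(grid, center, radius):
    # Signed distance to a sphere: negative inside, positive outside.
    return np.linalg.norm(grid - center, axis=-1) - radius

def sdf2sdf_energy(phi_ref, phi_cur, delta=0.1):
    # Voxel-wise squared difference of truncated, normalized SDFs.
    # `delta` is an assumed truncation width for illustration.
    a = np.clip(phi_ref, -delta, delta) / delta
    b = np.clip(phi_cur, -delta, delta) / delta
    return 0.5 * np.sum((a - b) ** 2)

# Coordinate grid over the unit cube (illustrative 32^3 resolution).
n = 32
axis = np.linspace(0.0, 1.0, n)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

phi_ref = sphere_sdf(grid, np.array([0.5, 0.5, 0.5]), 0.25)

# Two candidate poses: perfectly aligned vs. a small translation.
phi_aligned = sphere_sdf(grid, np.array([0.5, 0.5, 0.5]), 0.25)
phi_shifted = sphere_sdf(grid, np.array([0.55, 0.5, 0.5]), 0.25)

e_aligned = sdf2sdf_energy(phi_ref, phi_aligned)
e_shifted = sdf2sdf_energy(phi_ref, phi_shifted)
```

The energy vanishes when the fields coincide and grows with misalignment, which is what a gradient-based pose optimizer descends over; the truncation confines the comparison to a narrow band around the surfaces, which underlies the wide convergence basin described above.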

Keywords

Signed distance field · Registration · 3D reconstruction · Camera tracking · Global optimization · RGB-D sensors

Supplementary material

Supplementary material 1 (avi 23636 KB)


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Technische Universität München, München, Germany
  2. Siemens CT, München, Germany
  3. Toyota Research Institute, Los Altos, USA
