
Improving RGB-D-based 3D reconstruction by combining voxels and points

  • Original article
  • Published in: The Visual Computer


We propose a flexible 3D reconstruction method based on an RGB-D data stream. In contrast to previous methods that use pure voxels or pure points as the representation, our work proposes a new representation that combines voxels and points to improve reconstruction accuracy. A key insight is that points can store additional depth data that are not sampled by the regular voxel grid. By integrating points with voxels, the 3D reconstruction process is accelerated owing to higher data utilization. Furthermore, the depth information stored in points is used to refine the noisy depth image through a depth image refinement method, which in turn improves the quality of the reconstructed shape. Extensive comparative experiments covering different representations (pure voxels/points) and various methods (fusion-based/learning-based and online/offline) illustrate the effectiveness of our work. Experimental results demonstrate that our method achieves real-time performance, effectively avoids artifacts, and reaches state-of-the-art accuracy. More importantly, we provide a novel way to balance the trade-off between memory overhead and reconstruction accuracy.
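The core idea of the hybrid representation can be illustrated with a toy sketch. This is not the authors' implementation; the voxel size, voxel budget, and averaging update rule below are illustrative assumptions. The sketch shows the principle stated in the abstract: depth samples that land in an allocated voxel update a running average there, while samples the voxel grid cannot absorb are retained as raw points instead of being discarded.

```python
import numpy as np

VOXEL_SIZE = 0.05  # hypothetical voxel edge length (metres)
MAX_VOXELS = 4     # tiny, artificial voxel budget just for the demo

class HybridMap:
    """Toy hybrid map: a sparse voxel grid plus a point buffer for
    samples the grid cannot absorb (illustrative sketch only)."""

    def __init__(self):
        self.voxels = {}   # (i, j, k) -> [mean position, weight]
        self.points = []   # overflow samples kept as raw 3D points

    def integrate(self, sample):
        # Quantize the sample position to a voxel index.
        key = tuple(np.floor(sample / VOXEL_SIZE).astype(int))
        if key in self.voxels:
            # Weighted running average inside an existing voxel.
            mean, w = self.voxels[key]
            self.voxels[key] = [(mean * w + sample) / (w + 1), w + 1]
        elif len(self.voxels) < MAX_VOXELS:
            # Allocate a new voxel while the budget allows.
            self.voxels[key] = [sample.astype(float), 1]
        else:
            # Voxel budget exhausted: keep the sample as a point
            # rather than throwing the depth data away.
            self.points.append(sample)

m = HybridMap()
rng = np.random.default_rng(0)
for s in rng.uniform(0.0, 0.3, size=(50, 3)):
    m.integrate(s)
# Every sample ends up either in a voxel or in the point buffer.
print(len(m.voxels), len(m.points))
```

Under this sketch no depth sample is lost: every input either raises a voxel's weight or lands in the point buffer, which is the data-utilization argument the abstract makes for combining the two representations.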






This work was supported in part by the National Key Research and Development Program of China (2018YFB1700700), the National Natural Science Foundation of China (61732015, 61972340), and the Research Funding of Zhejiang University Robotics Institute.

Author information

Corresponding author

Correspondence to Jituo Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, X., Li, J. & Lu, G. Improving RGB-D-based 3D reconstruction by combining voxels and points. Vis Comput 39, 5309–5325 (2023).
