Reconstruction-Based Pairwise Depth Dataset for Depth Image Enhancement Using CNN

  • Junho Jeon
  • Seungyong Lee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11220)

Abstract

Raw depth images captured by consumer depth cameras suffer from noisy and missing values. Despite the success of CNN-based processing for color image restoration, similar approaches for depth enhancement have remained largely unexplored because of the lack of raw-clean pairwise datasets. In this paper, we propose a pairwise depth image dataset generation method that uses dense 3D surface reconstruction together with a filtering method to remove low-quality pairs. We also present a multi-scale Laplacian-pyramid-based neural network and structure-preserving loss functions that progressively reduce noise and holes from coarse to fine scales. Experimental results show that our network, trained with our pairwise dataset, enhances input depth images to become comparable with 3D reconstructions obtained from depth streams, and accelerates the convergence of dense 3D reconstruction.
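The coarse-to-fine refinement described above builds on a Laplacian pyramid decomposition. As a rough illustration only (plain NumPy, not the authors' network; the function names and the 2x2 pooling choices are ours), a depth map can be split into band-pass residuals plus a coarse base, then collapsed back from coarse to fine; a learned network would denoise or fill each level before the collapse:

```python
import numpy as np

def downsample(x):
    # 2x2 average pooling (assumes even dimensions)
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # nearest-neighbor 2x upsampling
    return x.repeat(2, axis=0).repeat(2, axis=1)

def build_laplacian_pyramid(depth, levels=3):
    """Decompose a depth map into high-frequency residuals plus a coarse base."""
    pyramid, current = [], depth
    for _ in range(levels):
        coarse = downsample(current)
        pyramid.append(current - upsample(coarse))  # band-pass residual
        current = coarse
    pyramid.append(current)  # coarsest level
    return pyramid

def collapse(pyramid):
    """Reconstruct from coarse to fine by adding back each residual."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        current = upsample(current) + residual
    return current
```

With exact residuals the collapse reproduces the input; in an enhancement network, each residual and the coarse base would instead be predicted (denoised) feature maps.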

Keywords

Depth image dataset · Depth image enhancement · 3D reconstruction · Deep learning · Laplacian pyramid network

Notes

Acknowledgements

We appreciate the constructive comments from the reviewers. This work was supported by the Ministry of Science and ICT, Korea, through IITP grant (IITP-2015-0-00174), Giga Korea grant (GK18P0300), and NRF grant (NRF-2017M3C4A7066317).

Supplementary material

474218_1_En_26_MOESM1_ESM.pdf (32.3 MB)
Supplementary material 1 (PDF, 33,024 KB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. POSTECH, Pohang, South Korea