ATGV-Net: Accurate Depth Super-Resolution

  • Gernot Riegler
  • Matthias Rüther
  • Horst Bischof
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9907)

Abstract

In this work we present a novel approach for single depth map super-resolution. Modern consumer depth sensors, especially Time-of-Flight sensors, produce dense depth measurements, but are affected by noise and have a low lateral resolution. We propose a method that combines the benefits of recent advances in machine learning based single image super-resolution, i.e. deep convolutional networks, with a variational method to recover accurate high-resolution depth maps. In particular, we integrate a variational method that models the piecewise affine structures apparent in depth data via an anisotropic total generalized variation regularization term on top of a deep network. We call our method ATGV-Net and train it end-to-end by unrolling the optimization procedure of the variational method. To train deep networks, a large corpus of training data with accurate ground-truth is required. We demonstrate that it is feasible to train our method solely on synthetic data that we generate in large quantities for this task. Our evaluations show that we achieve state-of-the-art results on three different benchmarks, as well as on a challenging Time-of-Flight dataset, all without utilizing an additional intensity image as guidance.
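The core idea of unrolling, as described above, is to run a fixed number of iterations of the variational solver as layers of the network, so the whole pipeline remains differentiable end-to-end. As a minimal illustrative sketch (not the authors' implementation), the following unrolls a fixed number of Chambolle-Pock primal-dual iterations for a simpler 1-D total-variation denoising model; ATGV-Net applies the same unrolling idea to an anisotropic TGV model in 2-D on top of a CNN, and all function names and parameters here are hypothetical.

```python
def unrolled_tv_denoise(f, lam=0.1, tau=0.25, sigma=0.25, n_iters=50):
    """Fixed-iteration (unrolled) primal-dual scheme for
    min_u 0.5*||u - f||^2 + lam*TV(u) on a 1-D signal f.

    With a fixed n_iters, every step is a differentiable operation,
    so the scheme can sit on top of a network and be trained end-to-end.
    """
    n = len(f)
    u = list(f)            # primal variable (the denoised signal)
    u_bar = list(f)        # over-relaxed primal variable
    p = [0.0] * (n - 1)    # dual variable for the gradient operator

    for _ in range(n_iters):
        # Dual ascent: p <- proj_{|p| <= lam}(p + sigma * grad(u_bar))
        for i in range(n - 1):
            p[i] += sigma * (u_bar[i + 1] - u_bar[i])
            p[i] = max(-lam, min(lam, p[i]))

        # Primal descent with the closed-form prox of the data term:
        # u <- (u + tau*div(p) + tau*f) / (1 + tau)
        u_old = list(u)
        for i in range(n):
            div = (p[i] if i < n - 1 else 0.0) - (p[i - 1] if i > 0 else 0.0)
            u[i] = (u[i] + tau * div + tau * f[i]) / (1.0 + tau)

        # Over-relaxation step of the primal-dual algorithm
        for i in range(n):
            u_bar[i] = 2.0 * u[i] - u_old[i]
    return u
```

In the paper's setting, the regularizer is the anisotropic TGV term (with edge-adaptive weighting) rather than plain TV, the data term is driven by the CNN's depth prediction, and gradients flow through the unrolled iterations back into the network weights.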

Keywords

Deep networks · Variational methods · Depth super-resolution

Acknowledgment

This work was supported by Infineon Technologies Austria AG and the Austrian Research Promotion Agency under the FIT-IT Bridge program, project #838513 (TOFUSION).

Supplementary material

Supplementary material 1 (PDF, 7.2 MB): 419975_1_En_17_MOESM1_ESM.pdf


Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Gernot Riegler (1)
  • Matthias Rüther (1)
  • Horst Bischof (1)
  1. Institute for Computer Graphics and Vision, Graz University of Technology, Graz, Austria