Depth map prediction from a single image with generative adversarial nets

  • Shaoyong Zhang
  • Na Li
  • Chenchen Qiu
  • Zhibin Yu
  • Haiyong Zheng
  • Bing Zheng

Abstract

A depth map is a fundamental component of 3D reconstruction, and predicting one from a single image is a challenging task in computer vision. In this paper, we treat depth prediction as an image-to-image translation task and propose an adversarial convolutional architecture, the Depth Generative Adversarial Network (DepthGAN), for depth prediction. To strengthen the image translation ability, we build on a Fully Convolutional Residual Network (FCRN) and combine it with a generative adversarial network, a framework that has achieved remarkable results on image-to-image tasks. We also present a new loss function, combining the scale-invariant (SI) error with a structural similarity (SSIM) term, to improve our model and produce high-quality depth maps. Experiments show that DepthGAN outperforms the current best monocular depth prediction method on the NYU Depth v2 dataset.
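The loss described above combines a scale-invariant (SI) error with an SSIM term. A minimal NumPy sketch of such a combination, assuming the SI log error of Eigen et al. (2014) and a single-window (global-statistics) SSIM; the names `depth_loss` and the weights `lam` and `alpha` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def scale_invariant_error(pred, target, lam=0.5):
    """Scale-invariant log error (Eigen et al., 2014).

    pred, target: positive depth arrays of the same shape.
    lam=0.5 is the commonly used weight; lam=1 makes the error
    fully invariant to a global scaling of the prediction.
    """
    d = np.log(pred) - np.log(target)
    n = d.size
    return (d ** 2).sum() / n - lam * d.sum() ** 2 / n ** 2

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed once over the whole image (no sliding window).

    A simplification of the standard windowed SSIM, kept here for brevity;
    c1 and c2 are the usual stabilizing constants for data in [0, 1].
    """
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)
    return num / den

def depth_loss(pred, target, alpha=0.5):
    """Hypothetical combined loss: SI error plus an SSIM dissimilarity term."""
    return scale_invariant_error(pred, target) + alpha * (1.0 - ssim_global(pred, target))
```

A perfect prediction gives `depth_loss` of zero, and with `lam=1.0` the SI term ignores any constant scale factor between prediction and ground truth, which is the property that motivates its use in monocular depth estimation.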

Keywords

Depth prediction · Generative adversarial network · Image translation

Notes

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Number 61701463, the National Postdoctoral Foundation of China under Grant Number 2017M622277, the Fundamental Research Funds for the Central Universities under Grant Number 201713019, the Natural Science Foundation of Shandong Province of China under Grant Number ZR2017BF011, and the Qingdao Postdoctoral Science Foundation of China. We gratefully acknowledge the support of the NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. College of Information Science and Engineering, Ocean University of China, Qingdao, China
