Unified Depth Prediction and Intrinsic Image Decomposition from a Single Image via Joint Convolutional Neural Fields

  • Seungryong Kim
  • Kihong Park
  • Kwanghoon Sohn
  • Stephen Lin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9912)


We present a method for jointly predicting a depth map and intrinsic images from single-image input. The two tasks are formulated in a synergistic manner through a joint conditional random field (CRF) that is solved using a novel convolutional neural network (CNN) architecture, called the joint convolutional neural field (JCNF) model. Tailored to our joint estimation problem, JCNF differs from previous CNNs in its sharing of convolutional activations and layers between networks for each task, its inference in the gradient domain where there exists greater correlation between depth and intrinsic images, and the incorporation of a gradient scale network that learns the confidence of estimated gradients in order to effectively balance them in the solution. This approach is shown to surpass state-of-the-art methods both on single-image depth estimation and on intrinsic image decomposition.
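The role of per-gradient confidence in a gradient-domain solution can be illustrated with a small sketch. This is not the JCNF model: it is a plain confidence-weighted least-squares integration of gradients in NumPy, and the function name `integrate_gradients`, its parameters, and the hand-set confidences `wx`, `wy` are assumptions made for illustration (in the paper the confidences are learned by the gradient scale network).

```python
import numpy as np

def integrate_gradients(gx, gy, wx, wy, anchor=0.0):
    """Recover an H x W map from gradient estimates by weighted least squares.

    gx: (H, W-1) forward differences along x; gy: (H-1, W) along y.
    wx, wy: nonnegative confidences with the same shapes (hypothetical inputs).
    """
    H, W = gy.shape[0] + 1, gx.shape[1] + 1
    idx = np.arange(H * W).reshape(H, W)

    def rows(left, right, g, w):
        # One equation sqrt(w) * (u[right] - u[left]) = sqrt(w) * g per gradient.
        m = g.size
        A = np.zeros((m, H * W))
        s = np.sqrt(w.ravel())
        A[np.arange(m), right.ravel()] = s
        A[np.arange(m), left.ravel()] = -s
        return A, s * g.ravel()

    Ax, bx = rows(idx[:, :-1], idx[:, 1:], gx, wx)
    Ay, by = rows(idx[:-1, :], idx[1:, :], gy, wy)

    # Gradients determine the map only up to a constant, so pin the first pixel.
    Aa = np.zeros((1, H * W))
    Aa[0, 0] = 1.0

    A = np.vstack([Ax, Ay, Aa])
    b = np.concatenate([bx, by, [anchor]])
    u, *_ = np.linalg.lstsq(A, b, rcond=None)
    return u.reshape(H, W)
```

In this sketch, a gradient with confidence near zero contributes almost nothing to the solution, so a badly estimated gradient can be down-weighted instead of distorting the reconstructed map.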


Keywords

  • Single-image depth estimation
  • Intrinsic image decomposition
  • Conditional random field
  • Convolutional neural networks



This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, and Microsoft Research, under the ICT/SW Creative Research program supervised by the IITP (Institute for Information & Communications Technology Promotion) (IITP-2015-R2212-15-0008).

Supplementary material

Supplementary material 1 (pdf 89826 KB)



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Seungryong Kim (1)
  • Kihong Park (1)
  • Kwanghoon Sohn (1)
  • Stephen Lin (2)
  1. Yonsei University, Seoul, South Korea
  2. Microsoft Research, Redmond, USA
