Depth Map Super-Resolution by Deep Multi-Scale Guidance

  • Tak-Wai Hui
  • Chen Change Loy
  • Xiaoou Tang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9907)


Depth boundaries often lose sharpness when upsampling from low-resolution (LR) depth maps, especially at large upscaling factors. We present a new method for depth map super-resolution in which a high-resolution (HR) depth map is inferred from an LR depth map and an additional HR intensity image of the same scene. We propose a Multi-Scale Guided convolutional network (MSG-Net) for depth map super-resolution. MSG-Net complements LR depth features with HR intensity features using a multi-scale fusion strategy. This multi-scale guidance allows the network to better adapt to the upsampling of both fine- and large-scale structures: the rich hierarchy of HR intensity features progressively resolves ambiguity in depth map upsampling at each level. Moreover, we employ a high-frequency domain training method that not only reduces training time but also facilitates the fusion of depth and intensity features. With this multi-scale guidance, MSG-Net achieves state-of-the-art performance for depth map upsampling.
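The high-frequency domain training scheme summarized above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the nearest-neighbour upsampler, the toy 8×8 depth map, and the random intensity image are all assumptions standing in for the network's learned layers and real data.

```python
import numpy as np

def upsample_nn(x, s):
    """Nearest-neighbour upsampling by an integer factor s (a stand-in
    for the interpolation / learned upsampling used in practice)."""
    return np.repeat(np.repeat(x, s, axis=0), s, axis=1)

def downsample(x, s):
    """Decimate by factor s, as used to form the LR depth observation
    and a multi-scale intensity pyramid."""
    return x[::s, ::s]

# Toy HR depth map and its LR observation (upscaling factor 4).
hr_depth = np.arange(64, dtype=float).reshape(8, 8)
lr_depth = downsample(hr_depth, 4)

# High-frequency training target: the residual between the HR depth map
# and a coarse upsampling of the LR input. A network trained in this
# domain only has to learn the residual detail.
coarse = upsample_nn(lr_depth, 4)
hf_target = hr_depth - coarse

# Multi-scale guidance: the HR intensity image is available at every
# intermediate scale (here a x2 decimation plus the full resolution).
hr_intensity = np.random.rand(8, 8)
intensity_pyramid = [downsample(hr_intensity, 2), hr_intensity]

# At inference, the predicted high-frequency component (guided by the
# intensity pyramid) is added back to the coarse estimate.
reconstruction = coarse + hf_target
assert np.allclose(reconstruction, hr_depth)
```

The decomposition makes clear why training in the high-frequency domain helps: the coarse component is recovered for free by interpolation, so the network's capacity is spent only on the boundary detail that interpolation cannot restore.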


Keywords: Sparse coding · Convolutional neural network · Super-resolution · Joint bilateral filter · Image super-resolution



This work is partially supported by SenseTime Group Limited.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Department of Information Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong
  2. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
