Multi-loss Rebalancing Algorithm for Monocular Depth Estimation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12362)


We propose an algorithm that adaptively combines multiple loss terms for training a monocular depth estimator. We construct a loss function space containing tens of losses. Using more losses can improve inference capability without adding any complexity in the test phase. However, when many losses are used, some of them may be neglected during training. Moreover, since the losses decrease at different speeds, adaptive weighting is required to balance their contributions. To address these issues, we propose a loss rebalancing algorithm that initializes the weight of each loss function and then rebalances it adaptively in the course of training. Experimental results show that the proposed algorithm provides state-of-the-art depth estimation results on various datasets. Codes are available at
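
The abstract does not spell out the update rule, so the following is only a generic sketch of what "initialize, then rebalance" could look like, not the authors' algorithm. It assumes two illustrative heuristics: initial weights inversely proportional to the initial loss values, and periodic rebalancing that shifts weight toward losses that decrease more slowly than average. The function names `init_weights` and `rebalance` and the `strength` parameter are hypothetical.

```python
from typing import List, Sequence

EPS = 1e-12  # guards against division by zero


def init_weights(initial_losses: Sequence[float]) -> List[float]:
    """Assumed heuristic: weight each loss by the inverse of its initial
    value so that all weighted losses start with the same magnitude.
    Weights are scaled to sum to the number of losses."""
    inverse = [1.0 / max(loss, EPS) for loss in initial_losses]
    scale = len(inverse) / sum(inverse)
    return [scale * v for v in inverse]


def rebalance(
    weights: Sequence[float],
    prev_losses: Sequence[float],
    curr_losses: Sequence[float],
    strength: float = 1.0,
) -> List[float]:
    """Assumed heuristic: raise the weight of losses that decreased more
    slowly than average and lower the others, keeping the weight sum fixed."""
    # ratio < 1 means the loss went down; a larger ratio means slower progress
    ratios = [c / max(p, EPS) for p, c in zip(prev_losses, curr_losses)]
    mean_ratio = sum(ratios) / len(ratios)
    raw = [
        max(w * (1.0 + strength * (r - mean_ratio)), 0.0)
        for w, r in zip(weights, ratios)
    ]
    scale = sum(weights) / max(sum(raw), EPS)
    return [scale * v for v in raw]


# Two hypothetical losses: the first halves, the second barely moves.
w0 = init_weights([2.0, 0.5])
w1 = rebalance(w0, prev_losses=[2.0, 0.5], curr_losses=[1.0, 0.45])
total_loss = sum(w * l for w, l in zip(w1, [1.0, 0.45]))
```

In this toy run the slowly decreasing second loss gains weight while the sum of the weights stays constant, which illustrates the balancing idea the abstract describes without claiming to reproduce it.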


Keywords: Monocular depth estimation, Multi-loss rebalancing



This work was supported by a Center for Applied Research in Artificial Intelligence (CARAI) grant funded by the Defense Acquisition Program Administration (DAPA) and the Agency for Defense Development (ADD) (UD190031RD).




Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Electrical Engineering, Korea University, Seoul, Korea
