
Pixel-Pair Occlusion Relationship Map (P2ORM): Formulation, Inference and Application

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12349)

Abstract

We formalize concepts around geometric occlusion in 2D images (i.e., ignoring semantics), and propose a novel unified formulation of both occlusion boundaries and occlusion orientations via a pixel-pair occlusion relation. The former provides a way to generate large-scale, accurate occlusion datasets while, based on the latter, we propose a method for task-independent, pixel-level occlusion relationship estimation from single images. Experiments on a variety of datasets demonstrate that our method outperforms existing ones on this task. To further illustrate the value of our formulation, we also propose a new depth map refinement method that consistently improves the performance of state-of-the-art monocular depth estimation methods.
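
To make the pixel-pair formulation concrete, the sketch below shows one plausible way to derive such pairwise occlusion labels from a depth map: each pair of neighboring pixels receives a label in {-1, 0, +1}, stating whether the first pixel occludes the second (i.e., is closer across a depth discontinuity), is occluded by it, or has no occlusion relation with it. This is a minimal illustration only; the function name pairwise_occlusion_labels, the 4-connected neighborhood, and the relative-gap threshold ratio_thresh are assumptions of this sketch, not the authors' exact procedure.

```python
import numpy as np

def pairwise_occlusion_labels(depth, ratio_thresh=0.05):
    """Assign an occlusion label to every 4-connected pixel pair of a depth map.

    Returns two int8 arrays:
      right[i, j]: relation between pixels (i, j) and (i, j+1)
      down[i, j] : relation between pixels (i, j) and (i+1, j)
    where +1 means the first pixel occludes the second (it is closer),
    -1 means it is occluded by it, and 0 means no occlusion in the pair.
    """
    def relate(a, b):
        # Declare an occlusion discontinuity when the relative depth gap
        # between the two pixels is large; the closer pixel of a
        # discontinuous pair occludes the farther one.
        gap = b - a
        discontinuous = np.abs(gap) > ratio_thresh * np.minimum(a, b)
        return (np.sign(gap) * discontinuous).astype(np.int8)

    right = relate(depth[:, :-1], depth[:, 1:])  # east neighbors
    down = relate(depth[:-1, :], depth[1:, :])   # south neighbors
    return right, down

# Example: a foreground strip (depth 1.0) next to background (depth 5.0).
depth = np.array([[1.0, 1.0, 5.0],
                  [1.0, 1.0, 5.0]])
right, down = pairwise_occlusion_labels(depth)
# right[:, 1] == +1: pixels in column 1 occlude their background neighbors.
```

Collecting these labels over all neighbor directions yields a dense occlusion-relationship map: occlusion boundaries lie where labels are nonzero, and the sign pattern around a pixel encodes the local occlusion orientation, suggesting how a single pairwise representation can cover both outputs at once.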

Keywords

Occlusion relation · Occlusion boundary · Depth refinement

Notes

Acknowledgements

We thank Yuming Du and Michael Ramamonjisoa for helpful discussions and for providing their ground-truth annotations of occlusion boundaries for a large part of NYUv2, which we completed to form NYUv2-OC++ [39]. This work was partly funded by the I-Site FUTURE initiative, through the DiXite project.

Supplementary material

504439_1_En_40_MOESM1_ESM.pdf (17.7 MB)
Supplementary material 1 (PDF 18170 KB)

References

1. Acuna, D., Kar, A., Fidler, S.: Devil is in the edges: learning semantic boundaries from noisy annotations. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11075–11083 (2019)
2. Apostoloff, N., Fitzgibbon, A.: Learning spatiotemporal t-junctions for occlusion detection. In: Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 553–559. IEEE (2005)
3. Barron, J.T., Poole, B.: The fast bilateral solver. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 617–632. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_38
4. Boulch, A., Marlet, R.: Fast and robust normal estimation for point clouds with sharp features. Comput. Graph. Forum (CGF) 31(5), 1765–1774 (2012)
5. Cooper, M.C.: Interpreting line drawings of curved objects with tangential edges and surfaces. Image Vis. Comput. 15(4), 263–276 (1997)
6. Dollár, P., Zitnick, C.L.: Fast edge detection using structured forests. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 37(8), 1558–1570 (2014)
7. Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2650–2658 (2015)
8. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems (NeurIPS), pp. 2366–2374. Curran Associates, Inc. (2014)
9. Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network for monocular depth estimation. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2002–2011 (2018)
10. Fu, H., Wang, C., Tao, D., Black, M.J.: Occlusion boundary detection via deep exploration of context. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 241–250 (2016)
11. Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 270–279 (2017)
12. He, K., Sun, J., Tang, X.: Guided image filtering. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6311, pp. 1–14. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15549-9_1
13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
14. He, X., Yuille, A.: Occlusion boundary detection using pseudo-depth. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 539–552. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_39
15. Heise, P., Klose, S., Jensen, B., Knoll, A.: PM-Huber: patchmatch with Huber regularization for stereo matching. In: International Conference on Computer Vision (ICCV), pp. 2360–2367 (2013)
16. Heo, M., Lee, J., Kim, K.-R., Kim, H.-U., Kim, C.-S.: Monocular depth estimation using whole strip masking and reliability-based refinement. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 39–55. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01225-0_3
17. Hoiem, D., Efros, A.A., Hebert, M.: Recovering occlusion boundaries from an image. Int. J. Comput. Vis. (IJCV) 91, 328–346 (2010)
18. Hong, Z., Chen, Z., Wang, C., Mei, X., Prokhorov, D., Tao, D.: Multi-store tracker (MUSTer): a cognitive psychology inspired approach to object tracking. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 749–758 (2015)
19. Ilg, E., Saikia, T., Keuper, M., Brox, T.: Occlusions, motion and depth boundaries with a generic network for disparity, optical flow or scene flow estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11216, pp. 626–643. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01258-8_38
20. Jiao, J., Cao, Y., Song, Y., Lau, R.: Look deeper into depth: monocular depth estimation with semantic booster and attention-driven loss. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 55–71. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_4
21. Koch, T., Liebel, L., Fraundorfer, F., Körner, M.: Evaluation of CNN-based single-image depth estimation methods. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11131, pp. 331–348. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11015-4_25
22. Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., Navab, N.: Deeper depth prediction with fully convolutional residual networks. In: International Conference on 3D Vision (3DV), pp. 239–248. IEEE (2016)
23. Lee, J.H., Kim, C.S.: Monocular depth estimation using relative depth maps. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9729–9738 (2019)
24. Leichter, I., Lindenbaum, M.: Boundary ownership by lifting to 2.1-D. In: International Conference on Computer Vision (ICCV), pp. 9–16. IEEE (2008)
25. Li, J.Y., Klein, R., Yao, A.: A two-streamed network for estimating fine-scaled depth maps from single RGB images. In: International Conference on Computer Vision (ICCV), pp. 3392–3400 (2016)
26. Li, W., et al.: InteriorNet: mega-scale multi-sensor photo-realistic indoor scenes dataset. In: British Machine Vision Conference (BMVC) (2018)
27. Liu, C., Yang, J., Ceylan, D., Yumer, E., Furukawa, Y.: PlaneNet: piece-wise planar reconstruction from a single RGB image. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2579–2588 (2018)
28. Liu, F., Shen, C., Lin, G.: Deep convolutional neural fields for depth estimation from a single image. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
29. Liu, Y., Cheng, M.M., Fan, D.P., Zhang, L., Bian, J., Tao, D.: Semantic edge detection with diverse deep supervision. arXiv preprint arXiv:1804.02864 (2018)
30. Lu, R., Xue, F., Zhou, M., Ming, A., Zhou, Y.: Occlusion-shared and feature-separated network for occlusion relationship reasoning. In: International Conference on Computer Vision (ICCV) (2019)
31. Martin, D.R., Fowlkes, C.C., Malik, J.: Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 26(5), 530–549 (2004)
32. Massa, F., Marlet, R., Aubry, M.: Crafting a multi-task CNN for viewpoint estimation. In: British Machine Vision Conference (BMVC) (2016)
33. Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_54
34. Nitzberg, M., Mumford, D.B.: The 2.1-D Sketch. IEEE Computer Society Press (1990)
35. Oberweger, M., Rad, M., Lepetit, V.: Making deep heatmaps robust to partial occlusions for 3D object pose estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 125–141. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_8
36. Peng, S., Liu, Y., Huang, Q., Zhou, X., Bao, H.: PVNet: pixel-wise voting network for 6DoF pose estimation. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4561–4570 (2019)
37. Rad, M., Lepetit, V.: BB8: a scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth. In: International Conference on Computer Vision (ICCV), pp. 3828–3836 (2017)
38. Rafi, U., Gall, J., Leibe, B.: A semantic occlusion model for human pose estimation from a single depth image. In: Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops), pp. 67–74 (2015)
39. Ramamonjisoa, M., Du, Y., Lepetit, V.: Predicting sharp and accurate occlusion boundaries in monocular depth estimation using displacement fields. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14648–14657 (2020)
40. Ramamonjisoa, M., Lepetit, V.: SharpNet: fast and accurate recovery of occluding contours in monocular depth estimation. In: International Conference on Computer Vision Workshops (ICCV Workshops) (2019)
41. Raskar, R., Tan, K.H., Feris, R., Yu, J., Turk, M.: Non-photorealistic camera: depth edge detection and stylized rendering using multi-flash imaging. ACM Trans. Graph. (TOG) 23(3), 679–688 (2004)
42. Ren, X., Fowlkes, C.C., Malik, J.: Figure/ground assignment in natural images. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 614–627. Springer, Heidelberg (2006). https://doi.org/10.1007/11744047_47
43. Ricci, E., Ouyang, W., Wang, X., Sebe, N., et al.: Monocular depth estimation using multi-scale continuous CRFs as sequential deep networks. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 41(6), 1426–1440 (2018)
44. Roberts, L.G.: Machine perception of three-dimensional solids. Ph.D. thesis, Massachusetts Institute of Technology (1963)
45. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing & Computer Assisted Intervention (MICCAI) (2015)
46. Stein, A.N., Hebert, M.: Occlusion boundaries from motion: low-level detection and mid-level reasoning. Int. J. Comput. Vis. (IJCV) 82, 325–357 (2008)
47. Su, H., Jampani, V., Sun, D., Gallo, O., Learned-Miller, E., Kautz, J.: Pixel-adaptive convolutional neural networks. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11166–11175 (2019)
48. Sugihara, K.: Machine Interpretation of Line Drawings, vol. 1. MIT Press, Cambridge (1986)
49. Teo, C., Fermuller, C., Aloimonos, Y.: Fast 2D border ownership assignment. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5117–5125 (2015)
50. Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: International Conference on Computer Vision (ICCV), pp. 839–846 (1998)
51. Wang, G., Liang, X., Li, F.W.B.: DOOBNet: deep object occlusion boundary detection from an image. In: Asian Conference on Computer Vision (ACCV) (2018)
52. Wang, P., Shen, X., Russell, B., Cohen, S., Price, B., Yuille, A.L.: SURGE: surface regularized geometry estimation from a single image. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 172–180 (2016)
53. Wang, P., Yuille, A.: DOC: deep OCclusion estimation from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 545–561. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_33
54. Wang, Y., Yang, Y., Yang, Z., Zhao, L., Wang, P., Xu, W.: Occlusion aware unsupervised learning of optical flow. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4884–4893 (2018)
55. Wu, H., Zheng, S., Zhang, J., Huang, K.: Fast end-to-end trainable guided filter. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1838–1847 (2018)
56. Xie, S., Tu, Z.: Holistically-nested edge detection. In: International Conference on Computer Vision (ICCV), pp. 1395–1403 (2015)
57. Xu, D., Ricci, E., Ouyang, W., Wang, X., Sebe, N.: Multi-scale continuous CRFs as sequential deep networks for monocular depth estimation. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5354–5362 (2017)
58. Yin, W., Liu, Y., Shen, C., Yan, Y.: Enforcing geometric constraints of virtual normal for depth prediction. In: International Conference on Computer Vision (ICCV) (2019)
59. Yu, Z., Feng, C., Liu, M.Y., Ramalingam, S.: CASENet: deep category-aware semantic edge detection. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5964–5973 (2017)
60. Yu, Z., et al.: Simultaneous edge alignment and learning. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 400–417. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_24
61. Zheng, C., Cham, T.-J., Cai, J.: T²Net: synthetic-to-realistic translation for solving single-image depth estimation tasks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 798–814. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_47
62. Zitnick, C.L., Kanade, T.: A cooperative algorithm for stereo matching and occlusion detection. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 22(7), 675–684 (2000)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

1. LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS, ESIEE Paris, Champs-sur-Marne, France
2. valeo.ai, Paris, France
