
DOOBNet: Deep Object Occlusion Boundary Detection from an Image

  • Guoxia Wang
  • Xiaochuan Wang
  • Frederick W. B. Li
  • Xiaohui Liang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11366)

Abstract

Object occlusion boundary detection is a fundamental and crucial research problem in computer vision. Solving it is challenging because training an object occlusion boundary detector involves extreme boundary/non-boundary class imbalance. In this paper, we propose to address this class imbalance by up-weighting the loss contribution of false negative and false positive examples with our novel Attention Loss function. We also propose a unified end-to-end multi-task deep object occlusion boundary detection network (DOOBNet) that shares convolutional features to simultaneously predict object boundaries and occlusion orientations. DOOBNet adopts an encoder-decoder structure with skip connections in order to automatically learn multi-scale and multi-level features. We significantly surpass the state of the art on the PIOD dataset (ODS F-score of .702) and the BSDS ownership dataset (ODS F-score of .555), and also improve the detection speed to 0.037 s per image on the PIOD dataset.
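The full definition of the Attention Loss appears in the body of the paper. As a rough illustration of the idea only, the sketch below implements a focal-loss-style per-pixel weighting (in PyTorch, an assumption; the authors' implementation uses Caffe) in which misclassified boundary and non-boundary pixels receive larger loss weights. The function name and the alpha/gamma values are hypothetical, and the weighting form follows Lin et al.'s focal loss rather than the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def attention_style_boundary_loss(logits, targets, alpha=0.25, gamma=2.0):
        # logits, targets: float tensors of shape (N, 1, H, W); targets in {0, 1}.
        # Unreduced per-pixel binary cross-entropy, so it can be reweighted below.
        ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p = torch.sigmoid(logits)
        # p_t: probability the model assigns to the true class at each pixel.
        p_t = p * targets + (1.0 - p) * (1.0 - targets)
        # alpha_t balances the rare boundary class against abundant non-boundary pixels;
        # (1 - p_t)**gamma up-weights hard examples, i.e. false positives and false negatives.
        alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
        return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()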

Keywords

Boundary detection · Occlusion reasoning · Convolutional neural network

Acknowledgement

This work is supported by the National Key R&D Program of China (2017YFB1002702) and the National Natural Science Foundation of China (61572058). We would like to thank Peng Wang for helping to generate the DOC experimental results and for valuable discussions.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Guoxia Wang (1)
  • Xiaochuan Wang (1)
  • Frederick W. B. Li (2)
  • Xiaohui Liang (1)
  1. State Key Lab of Virtual Reality Technology and Systems, Beihang University, Beijing, China
  2. Department of Computer Science, University of Durham, Durham, UK
