BAN: Focusing on Boundary Context for Object Detection

  • Yonghyun Kim
  • Taewook Kim
  • Bong-Nam Kang
  • Jieun Kim
  • Daijin Kim
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11366)

Abstract

Visual context is an important clue for object detection, and the context around an object's boundaries is especially valuable. We propose a boundary aware network (BAN) designed to exploit visual contexts that include boundary information and surroundings, which we name boundary contexts, and we define three types of boundary context: side, vertex, and in/out-boundary. BAN consists of 10 sub-networks, one for each area covered by a boundary context. The detection head of BAN is defined as an ensemble of these sub-networks, each contributing differently depending on the detection sub-problem. To verify our method, we visualize the activations of the sub-networks according to their boundary contexts and empirically show that each sub-network contributes more to its related detection sub-problem. We evaluate our method on the PASCAL VOC detection benchmark and the MS COCO dataset, achieving a mean Average Precision (mAP) of 83.4% on PASCAL VOC and 36.9% on MS COCO. BAN provides the convolution network with an additional source of contexts for detection, lets it selectively focus on the more important contexts, and can be applied to many other detection methods to enhance their detection accuracy.
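As a minimal illustrative sketch (in PyTorch, not the authors' code), the snippet below shows how the ten boundary-context regions described above could be derived from a candidate box, pooled from a backbone feature map, and combined by per-context sub-networks into an ensemble detection head. The strip widths, pooling size, channel sizes, and class count are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align


def boundary_context_boxes(box, expand=0.5, shrink=0.5):
    """Return 10 context boxes for one (x1, y1, x2, y2) box:
    4 side strips, 4 vertex (corner) patches, and in-/out-boundary boxes."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx, dy = expand * w / 2, expand * h / 2
    boxes = [
        # side contexts: strips straddling the left, right, top, bottom edges
        (x1 - dx, y1, x1 + dx, y2),
        (x2 - dx, y1, x2 + dx, y2),
        (x1, y1 - dy, x2, y1 + dy),
        (x1, y2 - dy, x2, y2 + dy),
        # vertex contexts: patches around the four corners
        (x1 - dx, y1 - dy, x1 + dx, y1 + dy),
        (x2 - dx, y1 - dy, x2 + dx, y1 + dy),
        (x1 - dx, y2 - dy, x1 + dx, y2 + dy),
        (x2 - dx, y2 - dy, x2 + dx, y2 + dy),
        # in-/out-boundary contexts: shrunk and enlarged versions of the box
        (x1 + shrink * dx, y1 + shrink * dy, x2 - shrink * dx, y2 - shrink * dy),
        (x1 - dx, y1 - dy, x2 + dx, y2 + dy),
    ]
    return torch.tensor(boxes, dtype=torch.float32)


class BoundaryAwareHead(nn.Module):
    """Ensemble of small sub-networks, one per boundary context (illustrative)."""
    def __init__(self, in_channels=256, num_contexts=10, num_classes=21, pool=7):
        super().__init__()
        self.pool = pool
        self.subnets = nn.ModuleList([
            nn.Sequential(nn.Flatten(),
                          nn.Linear(in_channels * pool * pool, 256),
                          nn.ReLU(inplace=True))
            for _ in range(num_contexts)
        ])
        # learned per-context contribution weights for the ensemble
        self.weights = nn.Parameter(torch.ones(num_contexts))
        self.cls_score = nn.Linear(256, num_classes)
        self.bbox_pred = nn.Linear(256, 4)

    def forward(self, feature_map, box, spatial_scale=1.0 / 16):
        # pool features from each of the 10 context regions of this box
        ctx = boundary_context_boxes(box)
        rois = torch.cat([torch.zeros(len(ctx), 1), ctx], dim=1)  # batch index 0
        pooled = roi_align(feature_map, rois, (self.pool, self.pool),
                           spatial_scale=spatial_scale, aligned=True)
        # each sub-network sees its own context; outputs are weighted and summed
        feats = torch.stack([net(pooled[i:i + 1])
                             for i, net in enumerate(self.subnets)])
        w = torch.softmax(self.weights, dim=0).view(-1, 1, 1)
        fused = (w * feats).sum(dim=0)
        return self.cls_score(fused), self.bbox_pred(fused)


if __name__ == "__main__":
    fmap = torch.randn(1, 256, 64, 64)            # dummy backbone feature map
    head = BoundaryAwareHead()
    scores, deltas = head(fmap, (100.0, 120.0, 300.0, 360.0))
    print(scores.shape, deltas.shape)             # (1, 21) and (1, 4)
```

In this sketch the softmax over the learned weights plays the role of letting the ensemble focus more on the contexts that matter for a given prediction; the paper's actual sub-network design and combination scheme may differ.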

Keywords

Visual context · Boundary context · Object detection · Convolutional neural network

Acknowledgments

This work was supported by IITP grants funded by the Korea government (MSIT) (IITP-2014-3-00059, Development of Predictive Visual Intelligence Technology; IITP-2017-0-00897, SW Starlab support program; and IITP-2018-0-01290, Development of Open Informal Dataset and Dynamic Object Recognition Technology Affecting Autonomous Driving).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, POSTECH, Pohang, Korea
  2. Department of Creative IT Engineering, POSTECH, Pohang, Korea