Side-Aware Boundary Localization for More Precise Object Detection

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12349)

Abstract

Current object detection frameworks mainly rely on bounding box regression to localize objects. Despite the remarkable progress in recent years, the precision of bounding box regression remains unsatisfactory, hence limiting performance in object detection. We observe that precise localization requires careful placement of each side of the bounding box. However, the mainstream approach, which focuses on predicting centers and sizes, is not the most effective way to accomplish this task, especially when there exist displacements with large variance between the anchors and the targets. In this paper, we propose an alternative approach, named Side-Aware Boundary Localization (SABL), where each side of the bounding box is localized by a dedicated network branch. To tackle the difficulty of precise localization in the presence of displacements with large variance, we further propose a two-step localization scheme, which first predicts a range of movement through bucket prediction and then pinpoints the precise position within the predicted bucket. We test the proposed method on both two-stage and single-stage detection frameworks. Replacing the standard bounding box regression branch with the proposed design leads to significant improvements on Faster R-CNN, RetinaNet, and Cascade R-CNN, by 3.0%, 1.7%, and 0.9%, respectively. Code is available at https://github.com/open-mmlab/mmdetection.
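
To make the two-step scheme concrete, below is a minimal sketch of how the "bucket prediction, then fine regression" decoding could work for a single box side. It is an illustration of the idea described in the abstract, not the authors' implementation (which is available in the linked MMDetection repository); the function name decode_side, the number of buckets, and the scale factor are all illustrative assumptions.

    # Minimal sketch (assumed interface, not the official SABL code) of
    # two-step side localization: a classifier picks the most likely bucket
    # of movement for one box side, then a regressor refines the position
    # within that bucket.
    import numpy as np

    def decode_side(side, width, bucket_scores, fine_offsets,
                    num_buckets=7, scale=1.7):
        """Refine one box side (e.g. the left edge).

        side          : current coordinate of the side.
        width         : current box width (or height, for top/bottom sides).
        bucket_scores : (num_buckets,) classification scores over candidate buckets.
        fine_offsets  : (num_buckets,) per-bucket fine offsets, in bucket widths.
        """
        # Step 1: coarse localization -- choose the highest-scoring bucket.
        bucket_w = scale * width / num_buckets           # width of one bucket
        k = int(np.argmax(bucket_scores))                # chosen bucket index
        # Candidate bucket centers are laid out symmetrically around the side.
        center = side + (k - (num_buckets - 1) / 2.0) * bucket_w
        # Step 2: fine localization -- regress the offset within that bucket.
        return center + fine_offsets[k] * bucket_w

    # Toy usage: refine the left side of a 100-pixel-wide proposal.
    scores = np.array([0.1, 0.2, 0.9, 0.3, 0.1, 0.0, 0.0])
    offsets = np.zeros(7)
    offsets[2] = 0.25
    print(decode_side(side=50.0, width=100.0,
                      bucket_scores=scores, fine_offsets=offsets))

Restricting the regression target to a small offset within the chosen bucket is what lets the scheme cope with displacements of large variance: the classifier handles the coarse range, so the regressor only ever predicts small, well-conditioned values.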

Notes

Acknowledgement

This work is partially supported by the SenseTime Collaborative Grant on Large-scale Multi-modality Analysis (CUHK Agreement No. TS1610626 & No. TS1712093), the General Research Fund (GRF) of Hong Kong (No. 14203518 & No. 14205719), SenseTime-NTU Collaboration Project and NTU NAP.

Supplementary material

504439_1_En_24_MOESM1_ESM.pdf (2.3 mb)
Supplementary material 1 (pdf 2370 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. The Chinese University of Hong Kong, Shatin, Hong Kong
  2. Nanyang Technological University, Singapore, Singapore
  3. SenseTime Research, Beijing, China
  4. Zhejiang University, Hangzhou, China
  5. University of Science and Technology of China, Hefei, China