
IoU-aware feature fusion R-CNN for dense object detection

  • ORIGINAL PAPER
Machine Vision and Applications

Abstract

Densely packed scenes contain many similar or even identical objects positioned close together, which makes object detection difficult. In such scenes, a feature pyramid network (FPN) module is usually embedded into Faster R-CNN for multi-scale detection; however, we find this is not necessary. We observe that ground-truth boxes are wasted during training, causing an imbalance of training samples and degrading the performance of Faster R-CNN. We therefore propose an online multiple-step sampling method to increase the utilization of ground truth. In addition, we propose a novel IoU-aware feature fusion R-CNN to replace the feature pyramid network; it effectively improves the detection accuracy of Faster R-CNN while simplifying the FPN structure. Our approach improves the base detector, and the detection results on the SKU-110K benchmark indicate that it offers a good trade-off between accuracy and speed.
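The abstract describes the problem the sampling method addresses: under standard IoU-threshold anchor matching, some ground-truth boxes attract no positive anchor and are effectively wasted during training. The paper's actual multiple-step sampling algorithm is not reproduced on this page; the sketch below only illustrates the general idea of such schemes, using a hypothetical "rescue" step (the function names, thresholds, and the rescue rule are assumptions for illustration, not the authors' method):

```python
import numpy as np

def iou_matrix(anchors, gts):
    """Pairwise IoU between anchors (N, 4) and ground truths (M, 4),
    both in (x1, y1, x2, y2) format. Returns an (N, M) matrix."""
    x1 = np.maximum(anchors[:, None, 0], gts[None, :, 0])
    y1 = np.maximum(anchors[:, None, 1], gts[None, :, 1])
    x2 = np.minimum(anchors[:, None, 2], gts[None, :, 2])
    y2 = np.minimum(anchors[:, None, 3], gts[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1])
    union = area_a[:, None] + area_g[None, :] - inter
    return inter / np.clip(union, 1e-9, None)

def match_with_rescue(anchors, gts, pos_thresh=0.5):
    """Step 1: standard IoU-threshold matching (anchor -> best GT if
    IoU >= pos_thresh). Step 2: any ground truth still unmatched is
    'rescued' by assigning it its highest-IoU anchor, so that no
    ground-truth box is left without a positive training sample."""
    iou = iou_matrix(anchors, gts)
    assign = np.full(len(anchors), -1)          # -1 means background
    best_gt = iou.argmax(axis=1)                # best GT per anchor
    best_iou = iou.max(axis=1)
    pos = best_iou >= pos_thresh
    assign[pos] = best_gt[pos]
    for j in range(len(gts)):                   # rescue unmatched GTs
        if not np.any(assign == j):
            assign[iou[:, j].argmax()] = j
    return assign
```

With two anchors and two ground truths where the second ground truth overlaps its nearest anchor only weakly (IoU below 0.5), plain thresholding would leave it unmatched, while the rescue step assigns it to its best anchor anyway.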




Author information

Correspondence to Chenhui Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hong, J., He, X., Deng, Z. et al. IoU-aware feature fusion R-CNN for dense object detection. Machine Vision and Applications 35, 3 (2024). https://doi.org/10.1007/s00138-023-01483-2

