Abstract
Densely packed scenes contain many similar or even identical objects positioned close together, which makes object detection difficult. In such scenes, the feature pyramid network (FPN) module is commonly embedded into Faster R-CNN for multi-scale detection. However, we find this is unnecessary. We observe that ground-truth boxes are wasted during training, causing an imbalance of training samples and degrading the performance of Faster R-CNN. We therefore propose an online multiple-step sampling method to increase the utilization of ground truth. In addition, we propose a novel IoU-aware feature fusion R-CNN to take the place of the feature pyramid network; it effectively improves the detection accuracy of Faster R-CNN while simplifying the FPN structure. Our approach improves the base detector, and the detection results on the SKU-110K benchmark indicate that it offers a good trade-off between accuracy and speed.
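The "waste of ground-truth boxes" the abstract refers to arises when anchor-based matching leaves some ground truths with no anchor above the IoU threshold, so they contribute no positive samples. The following is a minimal illustrative sketch of how such unmatched boxes can be detected; the function names, box format (x1, y1, x2, y2), and the 0.5 threshold are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch: counting ground-truth boxes "wasted" by IoU-threshold
# matching. Boxes are (x1, y1, x2, y2) tuples; threshold is an assumption.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    if inter <= 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def unmatched_ground_truth(anchors, gts, thresh=0.5):
    """Ground-truth boxes whose best-matching anchor IoU falls below thresh.

    These boxes yield no positive training samples under plain
    threshold-based assignment, i.e. they are "wasted".
    """
    return [g for g in gts
            if max((iou(a, g) for a in anchors), default=0.0) < thresh]

anchors = [(0, 0, 10, 10), (20, 20, 30, 30)]
gts = [(1, 1, 9, 9), (50, 50, 60, 60)]  # second box has no nearby anchor
print(unmatched_ground_truth(anchors, gts))  # → [(50, 50, 60, 60)]
```

An online multi-step sampling scheme, as the abstract describes, would revisit such unmatched ground truths during training (e.g. with relaxed matching) instead of discarding them.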
Cite this article
Hong, J., He, X., Deng, Z. et al. IoU-aware feature fusion R-CNN for dense object detection. Machine Vision and Applications 35, 3 (2024). https://doi.org/10.1007/s00138-023-01483-2