Abstract
Densely packed scenes contain many similar or even identical objects positioned close together, which makes object detection difficult. In such scenes, the feature pyramid network (FPN) module is commonly embedded into Faster R-CNN for multi-scale detection. However, we find this is unnecessary. We observe that ground-truth boxes are wasted during training, causing an imbalance of training samples and degrading the performance of Faster R-CNN. We therefore propose an online multiple-step sampling method to increase the utilization of ground truth. In addition, we propose a novel IoU-aware feature fusion R-CNN to take the place of the feature pyramid network; it effectively improves the detection accuracy of Faster R-CNN while simplifying the FPN structure. Our approach improves the base detector, and the detection results on the SKU-110K benchmark indicate that it offers a good trade-off between accuracy and speed.
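The "waste of ground-truth boxes" the abstract refers to arises when anchor-based matching leaves some ground truths with no anchor above the IoU threshold, so they contribute no positive samples. The following is a minimal illustrative sketch of how such unmatched boxes can be detected; the function names, box format (x1, y1, x2, y2), and the 0.5 threshold are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch: counting ground-truth boxes "wasted" by IoU-threshold
# matching. Boxes are (x1, y1, x2, y2) tuples; threshold is an assumption.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    if inter <= 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def unmatched_ground_truth(anchors, gts, thresh=0.5):
    """Ground-truth boxes whose best-matching anchor IoU falls below thresh.

    These boxes yield no positive training samples under plain
    threshold-based assignment, i.e. they are "wasted".
    """
    return [g for g in gts
            if max((iou(a, g) for a in anchors), default=0.0) < thresh]

anchors = [(0, 0, 10, 10), (20, 20, 30, 30)]
gts = [(1, 1, 9, 9), (50, 50, 60, 60)]  # second box has no nearby anchor
print(unmatched_ground_truth(anchors, gts))  # → [(50, 50, 60, 60)]
```

An online multi-step sampling scheme, as the abstract describes, would revisit such unmatched ground truths during training (e.g. with relaxed matching) instead of discarding them.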
Cite this article
Hong, J., He, X., Deng, Z. et al. IoU-aware feature fusion R-CNN for dense object detection. Machine Vision and Applications 35, 3 (2024). https://doi.org/10.1007/s00138-023-01483-2