Abstract
Localization and classification are the two core components of visual object detection. In recent years, object detectors have increasingly explored dedicated localization branches, and bounding box regression remains vital for two-stage detectors. We therefore propose a multi-branch bounding box regression method, Multi-Branch R-CNN, for robust object localization. Multi-Branch R-CNN consists of a fully connected head and a fully convolutional head. The fully convolutional head exploits spatial semantics and complements the fully connected head, which prefers local features. Features extracted by the two localization branches are fused and then passed to the next stage for classification and regression. The two branches cooperate to predict more precise localizations, which significantly improves detector performance. Extensive experiments were conducted on the public PASCAL VOC and MS COCO benchmarks. On COCO, Multi-Branch R-CNN with a ResNet-101 backbone achieved state-of-the-art single-model results, obtaining an mAP of 43.2. Extensive comparative experiments demonstrate the effectiveness of the proposed method.
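The dual-head design described above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the RoI feature size (256×7×7), the 1024-dimensional branch outputs, and element-wise summation as the fusion operator are all assumptions made here for clarity; the paper may use different dimensions and a different fusion scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed RoI feature size: 256 channels over a 7x7 pooled grid,
# typical for two-stage detectors with RoI pooling/align.
C, S, D = 256, 7, 1024
roi_feat = rng.standard_normal((C, S, S))

# Fully connected head: flatten the RoI features, then a dense projection.
# This branch mixes all spatial positions, so it captures local detail
# at the cost of explicit spatial layout.
W_fc = rng.standard_normal((D, C * S * S)) * 0.01
fc_out = W_fc @ roi_feat.ravel()                            # shape (1024,)

# Fully convolutional head: a 1x1 convolution over channels (modeled here
# as a channel-mixing matrix applied at every spatial position), which
# preserves the spatial layout, followed by global average pooling.
W_conv = rng.standard_normal((C, C)) * 0.01
conv_map = np.tensordot(W_conv, roi_feat, axes=([1], [0]))  # shape (256, 7, 7)
pooled = conv_map.mean(axis=(1, 2))                         # shape (256,)
W_proj = rng.standard_normal((D, C)) * 0.01
conv_out = W_proj @ pooled                                  # shape (1024,)

# Fusion: element-wise sum of the two branch features (one plausible
# choice of fusion operator, assumed here).
fused = fc_out + conv_out

# Box regression on the fused feature: 4 deltas (dx, dy, dw, dh).
W_reg = rng.standard_normal((4, D)) * 0.01
box_deltas = W_reg @ fused
print(box_deltas.shape)  # (4,)
```

The key point the sketch conveys is that the two branches see the same RoI features but process them differently, and only the fused representation feeds the final regression.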
Acknowledgements
The authors would like to thank the editor and anonymous reviewers for their valuable comments and suggestions, which are very helpful in improving this paper.
Funding
This work was supported in part by the NSFC Key Project of International (Regional) Cooperation and Exchanges (No. 61860206004), the National Natural Science Foundation of China (No. 61976004), and the Collegiate Natural Science Fund of Anhui Province (No. KJ2017A014).
Ethics declarations
Conflict of Interest
The authors declare no competing interests.
Ethics Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Yuan, HS., Chen, SB., Luo, B. et al. Multi-branch Bounding Box Regression for Object Detection. Cogn Comput 15, 1300–1307 (2023). https://doi.org/10.1007/s12559-021-09983-x