Feature Combination Based on Receptive Fields and Cross-Fusion Feature Pyramid for Object Detection

Zhao, Yongqiang; Rao, Yuan; Dong, Shipeng; Qi, Jiangnan

doi:10.1007/978-3-030-36711-4_4

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11954)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Currently, state-of-the-art object detectors for images rely mainly on deep backbones such as ResNet-50, DarkNet-53, ResNet-101 and DenseNet-169, which benefit from powerful feature representations but suffer from high computational cost. Building on a fast, lightweight backbone network (i.e., VGG-16), this paper improves feature representation by combining features of different receptive fields and cross-fusing feature pyramids, yielding a fast and accurate detector. The proposed model, FC-CF Net, integrates two sub-modules: the FC module and the CF module. Inspired by the structure of receptive fields in the human visual system, we propose a novel Feature Combination Based on Receptive Fields module (FC module), which takes the relationship between the size and eccentricity of receptive fields into account and combines the resulting features with the original features to enlarge the receptive field and enrich the information of the feature map. Furthermore, based on the structure of FPN (Feature Pyramid Network), we design a novel Cross-Fusion Feature Pyramid module (CF module), which combines top-down and bottom-up connections to fuse features across scales and obtains high-level semantic feature maps at all scales. Extensive experiments on PASCAL VOC 2007 and 2012 demonstrate that FC-CF Net achieves state-of-the-art detection accuracy (i.e., 82.4% mAP, 80.5% mAP) with high efficiency (i.e., 69 FPS, 35 FPS).
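The two ideas named in the abstract can be illustrated with a small sketch. It is not the authors' FC or CF module, whose details the abstract does not give. It assumes, purely for illustration, that larger receptive fields come from dilated convolutions (using the standard effective-kernel-size formula) and that cross-scale fusion is an element-wise sum over a three-level pyramid with equal channel counts. The helpers `effective_kernel_size`, `upsample2x`, `downsample2x` and `cross_fuse` are hypothetical names introduced here.

```python
import numpy as np


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by one dilated (atrous) convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def downsample2x(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling of a (C, H, W) feature map (H and W must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def cross_fuse(pyramid):
    """Fuse a fine-to-coarse list of (C, H, W) maps in both directions.

    Assumed fusion rule: element-wise sum. This is an illustrative choice,
    not the fusion rule of the paper's CF module.
    """
    n = len(pyramid)
    # Top-down pass: coarse, semantically strong maps flow to finer levels.
    top_down = [None] * n
    top_down[-1] = pyramid[-1]
    for i in range(n - 2, -1, -1):
        top_down[i] = pyramid[i] + upsample2x(top_down[i + 1])
    # Bottom-up pass: fine spatial detail flows to coarser levels.
    bottom_up = [None] * n
    bottom_up[0] = pyramid[0]
    for i in range(1, n):
        bottom_up[i] = pyramid[i] + downsample2x(bottom_up[i - 1])
    # After summing both passes, each level depends on every input level.
    return [td + bu for td, bu in zip(top_down, bottom_up)]


rng = np.random.default_rng(0)
levels = [rng.standard_normal((8, s, s)) for s in (32, 16, 8)]
fused = cross_fuse(levels)
print([f.shape for f in fused])
print([effective_kernel_size(3, d) for d in (1, 3, 5)])
```

In this sketch each fused map keeps the resolution of its own level while receiving contributions from both finer and coarser levels, which is the general property the abstract attributes to combining top-down and bottom-up connections.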

Author information

Authors and Affiliations

Xi’an Jiaotong University, Xi’an, 710049, China
Yongqiang Zhao, Yuan Rao, Shipeng Dong & Jiangnan Qi

Corresponding author

Correspondence to Yuan Rao.

Editor information

Editors and Affiliations

Australian National University, Canberra, ACT, Australia
Tom Gedeon
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee

About this paper

Cite this paper

Zhao, Y., Rao, Y., Dong, S., Qi, J. (2019). Feature Combination Based on Receptive Fields and Cross-Fusion Feature Pyramid for Object Detection. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol 11954. Springer, Cham. https://doi.org/10.1007/978-3-030-36711-4_4

DOI: https://doi.org/10.1007/978-3-030-36711-4_4
Published: 09 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36710-7
Online ISBN: 978-3-030-36711-4
eBook Packages: Computer Science, Computer Science (R0)