Abstract
State-of-the-art object detectors for images currently depend mainly on deep backbones such as ResNet-50, DarkNet-53, ResNet-101 and DenseNet-169, which benefit from powerful feature representations but suffer from high computational cost. Starting from a fast, lightweight backbone network (i.e., VGG-16), this paper improves feature representation by combining features with different receptive fields and by cross-fusing feature pyramids, yielding a fast and accurate detector. Our model, FC-CF Net, integrates two sub-modules: the FC module and the CF module. Inspired by the structure of receptive fields in the human visual system, we propose a novel Feature Combination Based on Receptive Fields module (FC module), which takes the relationship between the size and eccentricity of receptive fields into account and combines the resulting features with the original features to enlarge the receptive field and enrich the information in the feature map. Furthermore, building on the structure of FPN (Feature Pyramid Network), we design a novel Cross-Fusion Feature Pyramid module (CF module), which combines top-down and bottom-up connections to fuse features across scales and produces high-level semantic feature maps at all scales. Extensive experiments on PASCAL VOC 2007 and 2012 demonstrate that FC-CF Net achieves state-of-the-art detection accuracy (i.e., 82.4% mAP, 80.5% mAP) with high efficiency (i.e., 69 FPS, 35 FPS).
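The idea of fusing a feature pyramid through both top-down and bottom-up connections can be illustrated with a minimal sketch. It is not the authors' CF module: the abstract does not specify the layers involved, so the sketch assumes single-channel feature maps stored as plain 2D lists, nearest-neighbour upsampling for the top-down path, 2x2 average pooling for the bottom-up path, and element-wise addition as the fusion operator. The names `upsample2x`, `downsample2x`, `add` and `cross_fuse` are hypothetical.

```python
# Illustrative sketch only: NOT the authors' CF module.
# Assumptions: single-channel maps as 2D lists, fixed resampling instead of
# learned convolutions, and element-wise addition as the fusion operator.

def upsample2x(fm):
    """Nearest-neighbour upsampling by a factor of 2 in both dimensions."""
    out = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(fm):
    """2x2 average pooling with stride 2 (height and width must be even)."""
    return [
        [(fm[i][j] + fm[i][j + 1] + fm[i + 1][j] + fm[i + 1][j + 1]) / 4.0
         for j in range(0, len(fm[0]), 2)]
        for i in range(0, len(fm), 2)
    ]


def add(a, b):
    """Element-wise sum of two maps with identical shapes."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def cross_fuse(pyramid):
    """Fuse every level with its resampled neighbours.

    pyramid[0] is the finest level; each following level has half the
    resolution of the previous one.
    """
    fused = []
    for i, level in enumerate(pyramid):
        out = [list(row) for row in level]
        if i + 1 < len(pyramid):
            # Top-down connection: bring the coarser level up to this scale.
            out = add(out, upsample2x(pyramid[i + 1]))
        if i > 0:
            # Bottom-up connection: bring the finer level down to this scale.
            out = add(out, downsample2x(pyramid[i - 1]))
        fused.append(out)
    return fused


def constant_map(size, value):
    """Square map filled with one value, used only as toy input."""
    return [[float(value)] * size for _ in range(size)]


# Toy three-level pyramid with resolutions 8x8, 4x4 and 2x2.
pyramid = [constant_map(8, 1), constant_map(4, 2), constant_map(2, 3)]
fused = cross_fuse(pyramid)
```

Each fused level keeps its own resolution while receiving information from the adjacent coarser and finer levels, which is the property the abstract attributes to cross-scale fusion. In a real detector the fixed resampling and addition would be replaced by learned layers operating on multi-channel tensors.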
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, Y., Rao, Y., Dong, S., Qi, J. (2019). Feature Combination Based on Receptive Fields and Cross-Fusion Feature Pyramid for Object Detection. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol 11954. Springer, Cham. https://doi.org/10.1007/978-3-030-36711-4_4
DOI: https://doi.org/10.1007/978-3-030-36711-4_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36710-7
Online ISBN: 978-3-030-36711-4
eBook Packages: Computer Science, Computer Science (R0)