Abstract
Object detection commonly relies on two-stage end-to-end networks, which use a backbone network (such as VGG or ResNet) for feature extraction and combine it with a region proposal network (RPN) for object localization and classification. In this paper, we explore a novel depthwise grouped convolution (DGC) in the backbone network that integrates channel grouping with depthwise separable convolution. DGC shares convolution parameters across channels, which reduces the number of parameters and speeds up training. In particular, channel split and shuffle strategies are introduced to enhance information exchange between different groups of channels in the DGC block, which can prevent the performance drop caused by insufficient object samples. Furthermore, a non-local block is adopted in the RPN to focus on small objects that are hard to identify. Finally, we introduce a margin-based loss that guides model training together with the classification and regression losses. Experiments conducted on the VOC2007, VOC2012 and COCO2017 datasets demonstrate the efficiency and effectiveness of our method for object detection.
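To make the parameter-saving argument concrete, the following minimal sketch compares weight counts for a standard convolution, a grouped convolution, and a depthwise separable convolution, and shows the generic channel shuffle permutation used to mix information across groups. It is an illustration under stated assumptions, not the DGC block itself: the function names are hypothetical, biases are ignored, and the channel and kernel sizes are arbitrary.

```python
# Illustrative only: generic building blocks, not the paper's DGC block.
# All function names are hypothetical; bias terms are ignored.

def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

def grouped_conv_params(c_in, c_out, k, groups):
    """Weights in a grouped convolution: each group maps
    c_in/groups input channels to c_out/groups output channels."""
    assert c_in % groups == 0 and c_out % groups == 0
    return k * k * (c_in // groups) * (c_out // groups) * groups

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution (one filter per
    input channel) followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

def channel_shuffle(channels, groups):
    """Interleave channels across groups: view the channel list as a
    (groups, per_group) grid, transpose it, and flatten."""
    n = len(channels)
    assert n % groups == 0
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]

if __name__ == "__main__":
    c_in, c_out, k = 64, 128, 3  # assumed sizes for illustration
    print("standard:           ", standard_conv_params(c_in, c_out, k))
    print("grouped (4 groups): ", grouped_conv_params(c_in, c_out, k, 4))
    print("depthwise separable:", depthwise_separable_params(c_in, c_out, k))
    print("shuffled channels:  ", channel_shuffle(list(range(6)), 2))
```

With these assumed sizes, grouping divides the weight count by the number of groups, and the depthwise separable variant is smaller still. The shuffle step places channels from different groups next to each other, so that the next grouped layer sees inputs from every group.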
Acknowledgements
This work is supported by the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515010616), the Science and Technology Program of Guangzhou (No. 202102020524), the Guangdong Innovative Research Team Program (No. 2014ZT05G157), the Key-Area Research and Development Program of Guangdong Province (2019B010136001), and the Science and Technology Planning Project of Guangdong Province (LZC0023).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Liao, Y., Lu, S., Yang, Z. et al. Depthwise grouped convolution for object detection. Machine Vision and Applications 32, 115 (2021). https://doi.org/10.1007/s00138-021-01243-0
DOI: https://doi.org/10.1007/s00138-021-01243-0