Skip to main content
Log in

Depthwise grouped convolution for object detection

  • Original Paper
  • Published:
Machine Vision and Applications Aims and scope Submit manuscript

Abstract

Object detection usually adopts two-stage end-to-end networks, which use backbone network (such as VGG and ResNet) for feature extraction and are combined with the region proposal network (RPN) for object localization and classification. In this paper, we explore a novel depthwise grouped convolution (DGC) in the backbone network by integrating channels grouping and depthwise separable convolution, which is able to share the convolution parameters in different channels to reduce the amounts of parameters for speeding up training. In particular, split and shuffle strategies of channels are introduced to enhance information exchange between different groups of channels in DGC block, which can prevent the decrease of performance caused by insufficient object samples. Furthermore, non-local block is adopted in RPN to focus on small objects that are hard to identify. Consequently, we introduce margin-based loss to guide the model training together with the loss of classification and regression. Experiments conducted on the VOC2007, VOC2012 and COCO2017 datasets demonstrate the efficiency and effectiveness of our method for object detection.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

References

  1. Simonyan, Karen, Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition. Computer ence (2014)

  2. Redmon, Joseph, et al.: You Only Look Once: Unified, Real-Time Object Detection. CVPR (2015)

  3. Liu, Wei, et al.: SSD: Single Shot MultiBox Detector. ECCV (2016)

  4. Zhang, Shifeng, et al.: Single-Shot Refinement Neural Network for Object Detection. Presented at the (2017)

  5. Tan, Mingxing, Pang, R., Le, Q.V.: EfficientDet: Scalable and Efficient Object Detection. , CVPR (2019)

  6. Ren, Shaoqing, et al.: “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.” Adv. Neural. Inf. Process. Syst. (2017)

  7. Dai, Jifeng, et al.: “R-FCN: Object detection via region-based fully convolutional networks.” Adv. Neural. Inf. Process. Syst. (2016)

  8. Cai, Zhaowei, and N. Vasconcelos.: “Cascade R-CNN: Delving into High Quality Object Detection.” (2017)

  9. Fan, Qi. et al.: “Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector.” CVPR (2020)

  10. He, Kaiming, et al.: Deep Residual Learning for Image Recognition. CVPR (2016)

  11. Chollet, François: Xception: Deep learning with depthwise separable convolutions. CVPR (2017)

  12. Wang, Xiaolong, et al.: Non-local Neural Networks. CVPR (2018)

  13. Kong, Tao, et al.: HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection. CVPR (2016)

  14. Kim, Kye-Hyeon, et al.: “Pvanet: Deep but lightweight neural networks for real-time object detection.” arXiv:1608.08021 (2016)

  15. Shrivastava, Abhinav, Gupta, A., Girshick, R.: Training Region-based Object Detectors with Online Hard Example Mining. CVPR (2016)

  16. Li, Minne, et al.: S-OHEM: Stratified Online Hard Example Mining for Object Detection. Computer Visio (2017)

  17. Li, Buyu, Liu, Yu, Wang, Xiaogang.: “Gradient harmonized single-stage detector.”. AAAI (2019)

  18. Huang, Gao. et al.: “Densely Connected Convolutional Networks.” CVPR (2017)

  19. Xie, Saining. et al.: “Aggregated Residual Transformations for Deep Neural Networks.” CVPR (2017)

  20. Szegedy, Christian, et al.: Rethinking the inception architecture for computer vision. CVPR (2016)

  21. Ma, Ningning, et al.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. ECCV (2018)

  22. Hu, Jie, et al.: “Squeeze-and-Excitation Networks.” CVPR (2018)

  23. Li, Xiang. et al.: “Selective Kernel Networks.” CVPR (2019)

  24. Girshick, Ross, et al.: Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR (2014)

  25. Kaiming, He., et al.: Mask R-CNN. IEEE Transactions on Pattern Analysis and Machine Intelligence (2018)

  26. Wu, Chao-Yuan. et al.: “Sampling Matters in Deep Embedding Learning.” ICCV (2017)

  27. Bottou, Léon.: Large-scale machine learning with stochastic gradient descent. Physica-Verlag HD (2010)

  28. Duchi, John, Hazan, E., Singer, Y.: Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Mach. Learn. Res. 12, 7 (2011)

    MathSciNet  MATH  Google Scholar 

  29. Kingma, Diederik, Ba, J.: Adam: A Method for Stochastic Optimization. Computer ence (2014)

  30. Ghadimi, Euhanna, Feyzmahdavian, H.R., Johansson, M.: Global convergence of the Heavy-ball method for convex optimization. ECCV (2015)

  31. Sutskever, Ilya, et al.: “On the importance of initialization and momentum in deep learning.” International conference on machine learning (2013)

  32. Zhang, Michael, et al.: “Lookahead optimizer: k steps forward, 1 step back.” Adv. Neural Inf. Process. Syst. (2019)

  33. Yousong Zhu, et al.: “CoupleNet: Coupling Global Structure with Local Parts for Object Detection”. ICCV (2017)

  34. Cartucho, Joao, Ventura, Rodrigo, Veloso, Manuela: Robust object recognition through symbiotic deep learning in mobile robots. IROS (2018)

  35. Krizhevsky, Alex, Sutskever, I., Hinton, G.: ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural. Inf. Process. Syst. 25, 2 (2012)

    Google Scholar 

  36. Sermanet, Pierr, et al.: OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. Eprint Arxiv (2013)

  37. Najibi, Mahyar, Rastegari, M., Davis, L.S.: G-CNN: An Iterative Grid Based Object Detector. CVPR (2016)

  38. Kong, Tao, et al.: Ron: Reverse connection with objectness prior networks for object detection. CVPR (2017)

  39. He, Kaiming, et al.: “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.” IEEE Trans. Pattern Analy. Machine Intell. 37.9(2014)

  40. Lin, Tsung-Yi., et al.: Feature pyramid networks for object detection. CVPR (2017)

  41. Howard, Andrew G., et al.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. Presented at the arXiv preprint (2017)

  42. Wang, Guangrun, Wang, Keze, Lin, Liang: Adaptively connected neural networks. CVPR (2019)

  43. Vaswani, Ashish, et al.: “Attention is all you need.” Advances in neural information processing systems (2017)

  44. Neubeck, Alexander, Gool, L.J.V..: “Efficient Non-Maximum Suppression.” International Conference on Pattern Recognition IEEE Computer Society (2006)

  45. Zhang, Xiangyu, et al.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. CVPR (2018)

  46. Lin, Tsung-Yi., et al.: Microsoft coco: Common objects in context. EECV (2014)

  47. Li, Wei, et al.: Object detection based on semi-supervised domain adaptation for imbalanced domain resources. Mach. Vis. Appl. 31, 3 (2020)

    Article  Google Scholar 

  48. Srivastava, Gargi, Srivastava, Rajeev: User-interactive salient object detection using YOLOv2, lazy snapping, and gabor filters. Mach. Vis. Appl. 31, 3 (2020)

    Article  Google Scholar 

  49. Park, Jinhee, et al.: Small object segmentation with fully convolutional network based on overlapping domain decomposition. Mach. Vis. Appl. 30, 4 (2019)

    Article  Google Scholar 

  50. Li, Cuiping, et al.: Saliency object detection: integrating reconstruction and prior. Mach. Vis. Appl. 30, 3 (2019)

    Google Scholar 

  51. Shahdoosti, Hamid Reza, Rahemi, Zahra: A maximum likelihood filter using non-local information for despeckling of ultrasound images. Mach. Vis. Appl. 29, 4 (2018)

    Article  Google Scholar 

  52. Najibi, Mahyar, Singh, Bharat, Davis, Larry S.: FA-RPN: Floating Region Proposals for Face Detection. CVPR (2019)

Download references

Acknowledgements

This work is supported by the Guangdong Basic and Applied Basic Research Foundation (No.2020A1515010616), Science and Technology Program of Guangzhou (No.202102020524), the Guangdong Innovative Research Team Program (No.2014ZT05G157), the Key-Area Research and Development Program of Guangdong Province (2019B010136001), and the Science and Technology Planning Project of Guangdong Province (LZC0023).

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Zhenguo Yang or Wenyin Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Liao, Y., Lu, S., Yang, Z. et al. Depthwise grouped convolution for object detection. Machine Vision and Applications 32, 115 (2021). https://doi.org/10.1007/s00138-021-01243-0

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s00138-021-01243-0

Keywords

Navigation