Efficient convolutional neural networks and network compression methods for object detection: a survey

Multimedia Tools and Applications

Abstract

Object detection is one of the most fundamental and important research tasks in computer vision. The general trend in object detection has been to design large, over-parameterized models that achieve excellent performance. However, this comes at the cost of low speed, heavy computation, and large memory overhead. It also makes object detection models difficult to deploy on mobile and embedded devices, which have limited hardware resources and must provide real-time feedback. Consequently, the recent literature shows rising interest in building portable and efficient networks for object detection. The main contributions of this review are as follows. To the best of our knowledge, few reviews have focused on efficient CNNs for object detection. We systematically summarize the methods, models, and evaluation metrics of efficient CNNs for object detection proposed in recent years. We also summarize and introduce commonly used object detection datasets. Finally, we point out possible research directions and offer suggestions for future work on efficient convolutional neural networks.

Data availability statement

My manuscript has no associated data.

References

  1. Agarwal S, Du Terrail JO, Jurie F (2018) Recent advances in object detection in the age of deep convolutional neural networks. arXiv\(:\) Computer Vision and Pattern Recognition

  2. Andreopoulos A, Tsotsos JK (2013) 50 years of object recognition: Directions forward. Comput Vis Image Underst 117(8):827–891

    Google Scholar 

  3. Ba LJ, Caruana R (2014) Do deep nets really need to be deep? In: Proceedings of the 27th international conference on neural information processing systems, vol 2, MIT Press, Cambridge, NIPS’14, pp 2654–2662

  4. Behrendt K, Novak L, Botros R (2017) A deep learning approach to traffic lights: detection, tracking, and classification. In: 2017 IEEE International conference on robotics and automation (ICRA), pp 1370–1377

  5. Benenson R, Mathias M, Timofte R, Gool LV (2012) Pedestrian detection at 100 frames per second. In: Computer vision and pattern recognition (CVPR), 2012 IEEE Conference on

  6. Bochkovskiy A, Wang CY, Liao HYM (2020) Yolov4: optimal speed and accuracy of object detection. ArXiv abs/2004.10934

  7. Braun M, Krebs S, Flohr F, Gavrila DM (2019) Eurocity persons: a novel benchmark for person detection in traffic scenes. IEEE Trans Pattern Anal Mach Intell 41(8):1844–1861

    Google Scholar 

  8. Buciluundefined C, Caruana R, Niculescu-Mizil A (2006) Model compression. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, association for computing machinery, New York, NY, USA, KDD ’06, pp 535–541, https://doi.org/10.1145/1150402.1150464, https://doi.org/10.1145/1150402.1150464

  9. Cai H, Zhu L, Han S (2019) ProxylessNAS: direct neural architecture search on target task and hardware. In: International conference on learning representations, https://arxiv.org/pdf/1812.00332.pdf

  10. Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: European conference on computer vision, Springer, pp 213–229

  11. Chen G, Choi W, Yu X, Han T, Chandraker M (2017) Learning efficient object detection models with knowledge distillation. In: Proceedings of the 31st International conference on neural information processing systems, Curran Associates Inc., Red Hook, NIPS’17, pp 742–751

  12. Chen L, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848

    Google Scholar 

  13. Chen L, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv\(:\) Computer Vision and Pattern Recognition

  14. Chen X, Ma H, Wan J, Li B, Xia T (2016) Multi-view 3d object detection network for autonomous driving. arXiv\(:\) Computer Vision and Pattern Recognition

  15. Chen Y, Yang T, Zhang X, Meng G, Xiao X, Sun J (2019) Detnas: Backbone search for object detection. In: NeurIPS

  16. Cheng G, Han J (2016) A survey on object detection in optical remote sensing images. ISPRS J Photogramm Remote Sens 117:11–28

    Google Scholar 

  17. Cheng G, Han J (2016) A survey on object detection in optical remote sensing images. ISPRS J Photogramm Remote Sens 117:11–28. https://doi.org/10.1016/j.isprsjprs.2016.03.014

    Article  Google Scholar 

  18. Cheng G, Han J, Zhou P, Guo L (2014) Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J Photogramm Remote Sens 98(98):119–132

    Google Scholar 

  19. Cheng G, Zhou P, Han J (2016) Learning rotation-invariant convolutional neural networks for object detection in vhr optical remote sensing images. IEEE Trans Geosci Remote Sens 54(12):7405–7415

    Google Scholar 

  20. Cheng G, Han J, Zhou P, Xu D (2019) Learning rotation-invariant and fisher discriminative convolutional neural networks for object detection. IEEE Trans Image Process 28(1):265–278

    MathSciNet  Google Scholar 

  21. Cheng Y, Wang D, Zhou P, Zhang T (2017) A survey of model compression and acceleration for deep neural networks. CoRR abs/1710.09282, http://dblp.uni-trier.de/db/journals/corr/corr1710.html#abs-1710-09282

  22. Cheng Y, Wang D, Zhou P, Zhang T (2018) Model compression and acceleration for deep neural networks: the principles, progress, and challenges. IEEE Signal Process Mag 35(1):126–136

    Google Scholar 

  23. Chollet F (2016) Xception: Deep learning with depthwise separable convolutions. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 1800–1807

  24. Chu X, Zhang B, Xu R, Ma H (2019) Multi-objective reinforced evolution in mobile neural architecture search. ArXiv abs/1901.01074

  25. Courbariaux M, Bengio Y, David JP (2015) Binaryconnect: Training deep neural networks with binary weights during propagations. NIPS 28

  26. Courbariaux M, Hubara I, Soudry D, Elyaniv R, Bengio Y (2016) Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv\(:\) Learning

  27. Dai J, He K, Sun J (2015) Instance-aware semantic segmentation via multi-task network cascades. arXiv\(:\) Computer Vision and Pattern Recognition

  28. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1, pp 886–893

  29. de Charette R, Nashashibi F (2009) Real time visual traffic lights recognition based on spot light detection and adaptive traffic lights templates. In: 2009 IEEE Intelligent Vehicles Symposium, pp 358–363

  30. Dollar P, Tu Z, Perona P, Belongie S (2009) Integral channel features. In: Proceedings of the British Machine Vision Conference, BMVA Press, pp 91.1–91.11, https://doi.org/10.5244/C.23.91

  31. Dollar P, Wojek C, Schiele B, Perona P (2009) Pedestrian detection: a benchmark. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp 304–311

  32. Dollar P, Wojek C, Schiele B, Perona P (2012) Pedestrian detection: an evaluation of the state of the art. IEEE Trans Pattern Anal Mach Intell 34(4):743–761

  33. Dollar P, Appel R, Belongie S, Perona P (2014) Fast feature pyramids for object detection. IEEE Trans Pattern Anal Mach Intell 36(8):1532–1545

    Google Scholar 

  34. Dong X, Yang Y (2019) One-shot neural architecture search via self-evaluated template network. 2019 IEEE/CVF International Conference on Computer Vision (ICCV) pp 3680–3689

  35. Du Y, Xu C, Tao D (2017) Privileged matrix factorization for collaborative filtering. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp 1610–1616, https://doi.org/10.24963/ijcai.2017/223, https://doi.org/10.24963/ijcai.2017/223

  36. Dubout C, Fleuret F (2012) Exact acceleration of linear object detectors. In: In ECCV, 2012. 7

  37. Elfwing S, Uchibe E, Doya K (2018) Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks: The Official Journal of the International Neural Network Society 107:3–11

    Google Scholar 

  38. Eon PS, Simard PY, Haffner P, Lecun Y (1999) Boxlets: a fast convolution algorithm for signal processing and neural networks. In: Advances in Neural Information Processing Systems, MIT Press, pp 571–577

  39. Everingham M, van Gool L, Williams C, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338. https://doi.org/10.1007/s11263-009-0275-4

    Article  Google Scholar 

  40. Everingham M, Eslami S, Van Gool L, Williams C, Winn J, Zisserman A (2014) The Pascal visual object classes challenge: a retrospective. Int J Comput Vis 111. https://doi.org/10.1007/s11263-014-0733-5

  41. Fan R, Chang K, Hsieh C, Wang X, Lin C (2008) Liblinear: A library for large linear classification. J Mach Learn Res 9:1871–1874

    Google Scholar 

  42. Fan Y, Choi W, Lin Y (2016) Exploit all the layers: fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  43. Felzenszwalb P, Mcallester D, Ramanan D (2008) A discriminatively trained, multiscale, deformable part model. vol 8, https://doi.org/10.1109/CVPR.2008.4587597

  44. Felzenszwalb PF, Girshick RB (2012) From rigid templates to grammars: object detection with structured models

  45. Felzenszwalb PF, Girshick RB, McAllester D (2010a) Cascade object detection with deformable part models. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 2241–2248

  46. Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2010) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645

    Google Scholar 

  47. Fleuret F, Geman D (2001) Coarse-to-fine face detection. Int J Comput Vis 41(1–2):85–107

    Google Scholar 

  48. Frankle J, Carbin M (2019) The lottery ticket hypothesis: finding sparse, trainable neural networks. arXiv\(:\) Learning

  49. Gale T, Elsen E, Hooker S (2019) The state of sparsity in deep neural networks. arXiv\(:\) Learning

  50. Gao M, Yu R, Li A, Morariu VI, Davis LS (2017) Dynamic zoom-in network for fast object detection in large images. arXiv\(:\) Computer Vision and Pattern Recognition

  51. Ge Z, Liu S, Wang F, Li Z, Sun J (2021) Yolox: exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430

  52. Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp 3354–3361

  53. Gerards MET, Kuper J (2013) Optimal dpm and dvfs for frame-based real-time systems. ACM Trans Archit Code Optim 9(4), https://doi.org/10.1145/2400682.2400700, https://doi.org/10.1145/2400682.2400700

  54. Ghiasi G, Lin T, Pang R, Le QV (2019) Nas-fpn: learning scalable feature pyramid architecture for object detection. arXiv\(:\) Computer Vision and Pattern Recognition

  55. Ghodrati A, Diba A, Pedersoli M, Tuytelaars T, Van Gool L (2015) Deepproposal: hunting objects by cascading deep convolutional layers. arXiv\(:\) Computer Vision and Pattern Recognition

  56. Girshick R (2015) Fast r-cnn. Computer Science

  57. Girshick RB, Felzenszwalb PF, McAllester D (2011) Object detection with grammar models. In: Proceedings of the 24th International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NIPS’11, pp 442–450

  58. Girshick RB, Donahue J, Darrell T, Malik J (2013) Rich feature hierarchies for accurate object detection and semantic segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp 580–587

  59. Guo Y, Yao A, Chen Y (2016) Dynamic network surgery for efficient dnns. arXiv\(:\) Neural and Evolutionary Computing

  60. Gupta S, Agrawal A, Gopalakrishnan K, Narayanan P (2015) Deep learning with limited numerical precision. In: Proceedings of the 32nd International Conference on International Conference on Machine Learning, vol 37, JMLR.org, ICML’15, pp 1737–1746

  61. Han J, Zhang D, Cheng G, Guo L, Ren J (2015) Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning. IEEE Trans Geosci Remote Sens 53(6):3325–3337

  62. Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C (2019) Ghostnet: more features from cheap operations. ArXiv abs/1911.11907

  63. Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C (2020) Ghostnet: more features from cheap operations. In: CVPR

  64. Han S, Mao H, Dally WJ (2015a) Deep compression: compressing deep neural network with pruning, trained quantization and huffman coding. CoRR abs/1510.00149

  65. Han S, Mao H, Dally WJ (2015b) Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv\(:\) Computer Vision and Pattern Recognition

  66. Han S, Liu X, Mao H, Pu J, Pedram A, Horowitz MA, Dally WJ (2016) Eie: efficient inference engine on compressed deep neural network. In: 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp 243–254

  67. Hanson SJ, Pratt L (1989) Comparing biases for minimal network construction with back-propagation. Morgan Kaufmann Publishers Inc., San Francisco, pp 177–185

    Google Scholar 

  68. Hariharan B, Arbelaez P, Girshick R, Malik J (2014) Simultaneous detection and segmentation. arXiv\(:\) Computer Vision and Pattern Recognition

  69. Hassibi B, GStork D (1992) Second order derivatives for network pruning: optimal brain surgeon. Adv Neural Inform Proc Syst 5

  70. Haussler D (2001) Convolution kernels on discrete structures ucsc-crl-99-10

  71. He K, Sun J (2014) Convolutional neural networks at constrained time cost. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5353–5360

  72. He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37:1904–1916

    Google Scholar 

  73. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778

  74. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. arXiv\(:\) Computer Vision and Pattern Recognition

  75. He K, Gkioxari G, Dollar P, Girshick R (2020) Mask R-CNN. IEEE Trans Pattern Anal Mach Intell 42(2):386–397

    Google Scholar 

  76. He Y, Lin J, Liu Z, Wang H, Li L, Han S (2018) Amc: automl for model compression and acceleration on mobile devices. In: ECCV

  77. Hendrycks D, Gimpel K (2016) Bridging nonlinearities and stochastic regularizers with gaussian error linear units. CoRR abs/1606.08415

  78. Hinton GE, Salakhutdinov R (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507

    MathSciNet  Google Scholar 

  79. Hinton GE, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv\(:\) Machine Learning

  80. Hong S, Roh B, Kim K, Cheon Y, Park M (2016) Pvanet: lightweight deep neural networks for real-time object detection. arXiv\(:\) Computer Vision and Pattern Recognition

  81. Houben S, Stallkamp J, Salmen J, Schlipsing M, Igel C (2013) Detection of traffic signs in real-world images: the German traffic sign detection benchmark. In: International Joint Conference on Neural Networks, 1288

  82. Howard A, Sandler M, Chu G, Chen LC, Chen B, Tan M, Wang W, Zhu Y, Pang R, Vasudevan V, Le QV, Adam H (2019) Searching for mobilenetv3. ArXiv abs/1905.02244

  83. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. ArXiv abs/1704.04861

  84. Hsu CH, Chang SH, Juan DC, Pan JY, Chen YT, Wei W, Chang SC (2018) Monas: multi-objective neural architecture search using reinforcement learning. ArXiv abs/1806.10332

  85. Huang G, Liu Z, Weinberger KQ (2016a) Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2261–2269

  86. Huang G, Liu S, van der Maaten L, Weinberger KQ (2017) Condensenet: an efficient densenet using learned group convolutions. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 2752–2761

  87. Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z, Song Y, Guadarrama S, et al. (2016b) Speed/accuracy trade-offs for modern convolutional object detectors. arXiv\(:\) Computer Vision and Pattern Recognition

  88. Iandola FN, Moskewicz MW, Ashraf K, Han S, Dally WJ, Keutzer K (2017) Squeezenet: Alexnet-level accuracy with 50x fewer parameters and \(<\)1mb model size. ArXiv abs/1602.07360

  89. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv\(:\) Learning

  90. Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2014) Synthetic data and artificial neural networks for natural scene text recognition. In: Workshop on Deep Learning, NIPS

  91. Jain V, Learned-Miller E (2010) Fddb: a benchmark for face detection in unconstrained settings. Tech. Rep. UM-CS-2010-009, University of Massachusetts, Amherst

  92. Jiang B, Luo R, Mao J, Xiao T, Jiang Y (2018) Acquisition of localization confidence for accurate object detection. arXiv\(:\) Computer Vision and Pattern Recognition

  93. Jin X, Yuan XT, Feng J, Yan S (2016) Training skinny deep neural networks with iterative hard thresholding methods. ArXiv abs/1607.05423

  94. Keerthi SS, Lin C (2003) Asymptotic behaviors of support vector machines with gaussian kernel. Neural Comput 15(7):1667–1689

    Google Scholar 

  95. Kokkinos I (2012) Bounding part scores for rapid detection with deformable part models. In: Proceedings of European Conference on Computer Vision

  96. Kong T, Yao A, Chen Y, Sun F (2016) Hypernet: towards accurate region proposal generation and joint object detection. arXiv\(:\) Computer Vision and Pattern Recognition

  97. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks pp 1097–1105

  98. Köstinger M, Wohlhart P, Roth PM, Bischof H (2011) Annotated facial landmarks in the wild: a large-scale, real-world database for facial landmark localization. In: 2011 IEEE International Conference on Computer Vision orkshops (ICCV Workshops), pp 2144–2151

  99. Lam D, Kuzma R, Mcgee K, Dooley S, Laielli M, Klaric MN, Bulatov Y, Mccord B (2018) xview: objects in context in overhead imagery. arXiv\(:\) Computer Vision and Pattern Recognition

  100. Lampert CH, Blaschko MB, Hofmann T (2009) Efficient subwindow search: a branch and bound framework for object localization. IEEE Trans Pattern Anal Mach Intell 31(12):2129–2142

    Google Scholar 

  101. Law H, Deng J (2020) Cornernet: detecting objects as paired keypoints. Int J Comput Vis 128(3):642–656

    Google Scholar 

  102. Leather H, Bonilla E, O’Boyle M (2009) Automatic feature generation for machine learning based optimizing compilation. In: 2009 International Symposium on Code Generation and Optimization, pp 81–91

  103. LeCun Y, Denker JS, Solla SA (1989) Optimal brain damage. In: NIPS

  104. Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278–2324

    Google Scholar 

  105. Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86:2278–2324. https://doi.org/10.1109/5.726791

    Article  Google Scholar 

  106. Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278–2324

    Google Scholar 

  107. Li B, Wu B, Su J, Wang G, Lin L (2020a) Eagleeye: fast sub-net evaluation for efficient neural network pruning. arXiv:2007.02491

  108. Li H, Lin Z, Shen X, Brandt J, Hua G (2015) A convolutional neural network cascade for face detection. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5325–5334

  109. Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2016) Pruning filters for efficient convnets. ArXiv abs/1608.08710

  110. Li K, Wan G, Cheng G, Meng L, Han J (2020) Object detection in optical remote sensing images: a survey and a new benchmark. ISPRS J Photogramm Remote Sens 159:296–307

    Google Scholar 

  111. Li K, Wan G, Cheng G, Meng L, Han J (2020) Object detection in optical remote sensing images: a survey and a new benchmark. ISPRS J Photogramm Remote Sens 159:296–307. https://doi.org/10.1016/j.isprsjprs.2019.11.023

    Article  Google Scholar 

  112. Li Q, Jin S, Yan J (2017) Mimicking very efficient network for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 7341–7349

  113. Li Y, Lin S, Zhang B, Liu J, Doermann D, Wu Y, Huang F, Ji R (2018) Exploiting kernel sparsity and entropy for interpretable cnn compression. arXiv\(:\) Computer Vision and Pattern Recognition

  114. Li Z, Peng C, Yu G, Zhang X, Deng Y, Sun J (2017) Light-head r-cnn: in defense of two-stage object detector. arXiv\(:\) Computer Vision and Pattern Recognition

  115. Lin S, Ji R, Yan C, Zhang B, Cao L, Ye Q, Huang F, Doermann D (2019) Towards optimal structured cnn pruning via generative adversarial learning. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 2785–2794

  116. Lin T, Maire M, Belongie S, Bourdev L, Girshick R, Hays J, Perona P, Ramanan D, Zitnick CL, Dollar P (2014a) Microsoft coco: common objects in context. arXiv\(:\) Computer Vision and Pattern Recognition

  117. Lin T, Dollar P, Girshick R, He K, Hariharan B, Belongie S (2016) Feature pyramid networks for object detection. arXiv\(:\) Computer Vision and Pattern Recognition

  118. Lin T, Dollar P, Girshick R, He K, Hariharan B, Belongie S (2017a) Feature pyramid networks for object detection, pp 936–944

  119. Lin T, Goyal P, Girshick R, He K, Dollar P (2017b) Focal loss for dense object detection. arXiv\(:\) Computer Vision and Pattern Recognition

  120. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer Vision - ECCV 2014. Springer International Publishing, Cham, pp 740–755

    Google Scholar 

  121. Lin X, Zhao C, Pan W (2017c) Towards accurate binary convolutional neural network. In: NIPS

  122. Liu K, Mattyus G (2015) Fast multiclass vehicle detection on aerial images. IEEE Geosci Remote Sens Lett 12(9):1938–1942

    Google Scholar 

  123. Liu L, Ouyang W, Wang X, Fieguth P, Chen J, Liu X, Pietikainen M (2020) Deep learning for generic object detection: a survey. Int. J. Comput. Vis 128(2):261–318

    Google Scholar 

  124. Liu S, Huang D, Wang Y (2017) Receptive field block net for accurate and fast object detection. arXiv\(:\) Computer Vision and Pattern Recognition

  125. Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. arXiv\(:\) Computer Vision and Pattern Recognition

  126. Liu S, Huang D, Wang Y (2019a) Learning spatial fusion for single-shot object detection. arXiv\(:\) Computer Vision and Pattern Recognition

  127. Liu S, Ren B, Shen X, Wang Y (2020b) Cocopie: Making mobile ai sweet as pie -compression-compilation co-design goes a long way. ArXiv abs/2003.06700

  128. Liu S, Wang S, Liu X, Lin CT, Lv Z (2020) Fuzzy detection aided real-time and robust visual tracking under complex environments. IEEE Trans Fuzzy Syst 29(1):90–102

    Google Scholar 

  129. Liu S, Wang S, Liu X, Gandomi AH, Daneshmand M, Muhammad K, De Albuquerque VHC (2021) Human memory update strategy: a multi-layer template update mechanism for remote visual monitoring. IEEE Trans Multimedia 23:2188–2198

    Google Scholar 

  130. Liu S, Wang S, Liu X, Dai J, Muhammad K, Gandomi AH, Ding W, Hijji M, de Albuquerque VHC (2022) Human inertial thinking strategy: A novel fuzzy reasoning mechanism for iot-assisted visual monitoring. IEEE Internet Things J

  131. Liu W, Anguelov D, Erhan D, Szegedy C, Reed SE, Fu CY, Berg AC (2016) Ssd: single shot multibox detector. In: ECCV

  132. Liu Z, Li J, Shen Z, Huang G, Yan S, Zhang C (2017) Learning efficient convolutional networks through network slimming. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 2755–2763

  133. Liu Z, Sun M, Zhou T, Huang G, Darrell T (2019b) Rethinking the value of network pruning. In: ICLR

  134. Long J, Shelhamer E, Darrell T (2014) Fully convolutional networks for semantic segmentation. arXiv\(:\) Computer Vision and Pattern Recognition

  135. Lowe DG (1999) Object recognition from local scale-invariant features. Proceedings of the Seventh IEEE International Conference on Computer Vision 2:1150–1157

    Google Scholar 

  136. Lucas SM, Panaretos A, Sosa L, Tang A, Wong S, Young R (2003) Icdar 2003 robust reading competitions. In: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings., pp 682–687

  137. Luo JH, Wu J (2020) Autopruner: an end-to-end trainable filter pruning method for efficient deep model inference. ArXiv abs/1805.08941

  138. Ma N, Zhang X, Zheng HT, Sun J (2018) Shufflenet v2: practical guidelines for efficient cnn architecture design. ArXiv abs/1807.11164

  139. Maji S, Berg AC, Malik J (2008) Classification using intersection kernel support vector machines is efficient. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp 1–8

  140. Malik J (2015) Hypercolumns for object segmentation and fine-grained localization. In: IEEE Conference on Computer Vision & Pattern Recognition

  141. Mathieu M, Henaff M, Lecun Y (2013) Fast training of convolutional networks through ffts. arXiv\(:\) Computer Vision and Pattern Recognition

  142. Mehrara H, Zahedinejad M, Pourmohammad A (2009) Novel edge detection using bp neural network based on threshold binarization. In: 2009 Second International Conference on Computer and Electrical Engineering, vol 2, pp 408–412

  143. Mogelmose A, Trivedi MM, Moeslund TB (2012) Vision-based traffic sign detection and analysis for intelligent driver assistance systems: perspectives and survey. IEEE Trans Intell Transp Syst 13(4):1484–1497

    Google Scholar 

  144. Mukkamala MC, Hein M (2017) Variants of rmsprop and adagrad with logarithmic regret bounds. arXiv\(:\) Learning

  145. Nada H, Sindagi VA, Zhang H, Patel V (2018) Pushing the limits of unconstrained face detection: a challenge dataset and baseline results. 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS), pp 1–10

  146. Nascimento JC, Marques JS (2006) Performance evaluation of object detection algorithms for video surveillance. IEEE Trans Multimedia 8(4):761–774

    Google Scholar 

  147. Neubeck A, Gool LJV (2006) Efficient non-maximum suppression. In: International Conference on Pattern Recognition

  148. Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. arXiv\(:\) Computer Vision and Pattern Recognition

  149. Noh H, Hong S, Han B (2015) Learning deconvolution network for semantic segmentation. arXiv\(:\) Computer Vision and Pattern Recognition

  150. Oksuz K, Cam BC, Akbas E, Kalkan S (2018) Localization recall precision (lrp): a new performance metric for object detection. arXiv\(:\) Computer Vision and Pattern Recognition

  151. Ouyang W, Wang K, Zhu X, Wang X (2017) Chained cascade network for object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 1956–1964

  152. Ouyang W, Wang K, Zhu X, Wang X (2017) Learning chained deep features and classifiers for cascade in object detection. ArXiv abs/1702.07054

  153. Papageorgiou C, Poggio T (2000) A trainable system for object detection. Int J Comput Vis 38(1):15–33

    Google Scholar 

  154. Papageorgiou C, Poggio T (2000) A trainable system for object detection. Int. J. Comput. Vis 38:15–33. https://doi.org/10.1023/A:1008162616689

    Article  Google Scholar 

  155. Parmar N, Vaswani A, Uszkoreit J, Kaiser L, Shazeer N, Ku A, Tran D (2018) Image transformer. In: International Conference on Machine Learning, PMLR, pp 4055–4064

  156. Peng C, Zhang X, Yu G, Luo G, Sun J (2017) Large kernel matters - improve semantic segmentation by global convolutional network. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1743–1751

  157. Porikli F (2005) Integral histogram: a fast way to extract histograms in cartesian spaces. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol 1, pp 829–836 vol. 1

  158. Pratt H, Williams B, Coenen F, Zheng Y (2017) FCNN: Fourier convolutional neural networks. In: Machine Learning and Knowledge Discovery in Databases, Springer International Publishing, pp 786–798, https://doi.org/10.1007/978-3-319-71249-9_47, https://doi.org/10.1007%2F978-3-319-71249-9_47

  159. Qiang Zhu, Mei-Chen Yeh, Kwang-Ting Cheng, Avidan S (2006) Fast human detection using a cascade of histograms of oriented gradients. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), vol 2, pp 1491–1498

  160. Ramachandran P, Zoph B, Le QV (2017) Swish: a self-gated activation function. arXiv\(:\) Neural and Evolutionary Computing

  161. Rastegari M, Ordonez V, Redmon J, Farhadi A (2016a) Xnor-net: Imagenet classification using binary convolutional neural networks. In: ECCV

  162. Razakarivony S, Jurie F (2016) Vehicle detection in aerial imagery: a small target detection benchmark. J Vis Commun Image Represent 34(Jan.):187–203

  163. Redmon J, Farhadi A (2016) Yolo9000: better, faster, stronger. arXiv\(:\) Computer Vision and Pattern Recognition

  164. Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv\(:\) Computer Vision and Pattern Recognition

  165. Redmon J, Divvala SK, Girshick RB, Farhadi A (2016) You only look once: unified, real-time object detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 779–788

  166. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149

    Google Scholar 

  167. Ren S, He K, Girshick R, Zhang X, Sun J (2017) Object detection networks on convolutional feature maps. IEEE Trans Pattern Anal Mach Intell 39(7):1476–1481

    Google Scholar 

  168. Rezatofighi H, Tsoi N, Gwak J, Sadeghian A, Reid I, Savarese S (2019) Generalized intersection over union: a metric and a loss for bounding box regression. arXiv: Computer Vision and Pattern Recognition

  169. Rippel O, Snoek J, Adams RP (2015) Spectral representations for convolutional neural networks. arXiv preprint arXiv:1506.03767

  170. Romero A, Ballas N, Kahou SE, Chassang A, Gatta C, Bengio Y (2014) FitNets: hints for thin deep nets. arXiv: Learning

  171. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. MIT Press, Cambridge, pp 318–362

  172. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis (IJCV) 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y

  173. Sadeghi MA, Forsyth D (2013) Fast template evaluation with vector quantization. In: Advances in Neural Information Processing Systems (NIPS)

  174. Sandler M, Howard AG, Zhu M, Zhmoginov A, Chen LC (2018) MobileNetV2: inverted residuals and linear bottlenecks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition pp 4510–4520

  175. Shang W, Sohn K, Almeida D, Lee H (2016) Understanding and improving convolutional neural networks via concatenated rectified linear units. In: ICML

  176. Shelhamer E, Long J, Darrell T (2017) Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 39(4):640–651

  177. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  178. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015a) Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 1–9

  179. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2015b) Rethinking the inception architecture for computer vision. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 2818–2826

  180. Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2016) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: AAAI

  181. Tan M, Chen B, Pang R, Vasudevan V, Le QV (2018) MnasNet: platform-aware neural architecture search for mobile. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp 2815–2823

  182. Tan M, Pang R, Le QV (2019) EfficientDet: scalable and efficient object detection. arXiv: Computer Vision and Pattern Recognition

  183. Tian Z, Shen C, Chen H, He T (2019) FCOS: fully convolutional one-stage object detection. 2019 IEEE/CVF International Conference on Computer Vision (ICCV) pp 9626–9635

  184. Tung F, Mori G (2020) Deep neural network compression by in-parallel pruning-quantization. IEEE Trans Pattern Anal Mach Intell 42(3):568–579

  185. Tzelepis G, Asif A, Baci S, Cavdar S, Aksoy EE (2019) Deep neural network compression for image classification and object detection. arXiv: Computer Vision and Pattern Recognition

  186. Vaillant R, Monrocq C, Le Cun Y (1994) Original approach for the localisation of objects in images. IEE Proceedings - Vision, Image and Signal Processing 141(4):245

  187. Vanhoucke V, Senior A, Mao MZ (2011) Improving the speed of neural networks on CPUs. In: Deep Learning and Unsupervised Feature Learning Workshop, NIPS

  188. Vasilache N, Johnson J, Mathieu M, Chintala S, Piantino S, LeCun Y (2014) Fast convolutional nets with fbfft: a GPU performance evaluation. arXiv: Learning

  189. Vedaldi A, Zisserman A (2012a) Sparse kernel approximations for efficient classification and detection. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 2320–2327

  190. Vedaldi A, Zisserman A (2012b) Sparse kernel approximations for efficient classification and detection. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, IEEE. https://doi.org/10.1109/cvpr.2012.6247943

  191. Vedaldi A, Gulshan V, Varma M, Zisserman A (2009) Multiple kernels for object detection. 2009 IEEE 12th International Conference on Computer Vision pp 606–613

  192. Veit A, Matera T, Neumann L, Matas J, Belongie SJ (2016) COCO-Text: dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140

  193. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, vol 1, pp I–I

  194. Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vis

  195. Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition. In: 2011 International Conference on Computer Vision, IEEE, pp 1457–1464

  196. Wang K, Liu Z, Lin Y, Lin J, Han S (2018) HAQ: hardware-aware automated quantization. arXiv

  197. Wang X, Han TX, Yan S (2009) An HOG-LBP human detector with partial occlusion handling. In: 2009 IEEE 12th International Conference on Computer Vision, pp 32–39

  198. Watanabe T, Ito S, Yokoi K (2010) Co-occurrence histograms of oriented gradients for human detection. J Inf Process Syst 2(2):39–47

  199. Wen W, Wu C, Wang Y, Chen Y, Li H (2016) Learning structured sparsity in deep neural networks. arXiv: Neural and Evolutionary Computing

  200. Wilson P, Fernandez JD (2006) Facial feature detection using Haar classifiers. J Comput Sci Coll 21(4):127–133

  201. Woo S, Park J, Lee J, Kweon IS (2018) CBAM: convolutional block attention module. arXiv: Computer Vision and Pattern Recognition

  202. Wu J, Leng C, Wang Y, Hu Q, Cheng J (2016) Quantized convolutional neural networks for mobile devices. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4820–4828

  203. Xia G, Bai X, Ding J, Zhu Z, Belongie S, Luo J, Datcu M, Pelillo M, Zhang L (2017) DOTA: a large-scale dataset for object detection in aerial images. arXiv: Computer Vision and Pattern Recognition

  204. Yang J (2015) Notes on low-rank matrix factorization. arXiv: Numerical Analysis

  205. Yang S, Luo P, Loy CC, Tang X (2016) WIDER FACE: a face detection benchmark. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5525–5533

  206. Yang TJ, Howard AG, Chen B, Zhang X, Go A, Sze V, Adam H (2018) NetAdapt: platform-aware neural network adaptation for mobile applications. In: ECCV

  207. Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp 1083–1090

  208. Yu G, Chang Q, Lv W, Xu C, Cui C, Ji W, Dang Q, Deng K, Wang G, Du Y, et al. (2021) PP-PicoDet: a better real-time object detector on mobile devices. arXiv preprint arXiv:2111.00902

  209. Zeiler MD, Fergus R (2013) Visualizing and understanding convolutional networks. arXiv: Computer Vision and Pattern Recognition

  210. Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503

  211. Zhang L, Lin L, Liang X, He K (2016b) Is Faster R-CNN doing well for pedestrian detection? In: European Conference on Computer Vision

  212. Zhang S, Wen L, Bian X, Lei Z, Li SZ (2017a) Single-shot refinement neural network for object detection. arXiv: Computer Vision and Pattern Recognition

  213. Zhang T, Qi G, Xiao B, Wang J (2017b) Interleaved group convolutions for deep neural networks. arXiv: Computer Vision and Pattern Recognition

  214. Zhang X, Zou J, Ming X, He K, Sun J (2014) Efficient and accurate approximations of nonlinear convolutional networks. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 1984–1992

  215. Zhang X, Zou J, He K, Sun J (2015) Accelerating very deep convolutional networks for classification and detection. IEEE Trans Pattern Anal Mach Intell 38:1943–1955

  216. Zhang X, Zhou X, Lin M, Sun J (2017c) ShuffleNet: an extremely efficient convolutional neural network for mobile devices. arXiv: Computer Vision and Pattern Recognition

  217. Zhang X, Zhu K, Chen G, Tan X, Zhang L, Dai F, Liao P, Gong Y (2019) Geospatial object detection on high resolution remote sensing imagery based on double multi-scale feature pyramid network. Remote Sens 11:755. https://doi.org/10.3390/rs11070755

  218. Zhao Q, Sheng T, Wang Y, Tang Z, Chen Y, Cai L, Ling H (2018) M2Det: a single-shot object detector based on multi-level feature pyramid network. arXiv: Computer Vision and Pattern Recognition

  219. Zheng M, Gao P, Zhang R, Li K, Wang X, Li H, Dong H (2020) End-to-end object detection with adaptive clustering transformer. arXiv preprint arXiv:2011.09315

  220. Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D (2019) Distance-IoU loss: faster and better learning for bounding box regression. arXiv: Computer Vision and Pattern Recognition

  221. Zhou X, Wang D, Krahenbuhl P (2019) Objects as points. arXiv: Computer Vision and Pattern Recognition

  222. Zhou Z, Hu Y, Deng X, Huang D, Lin Y (2021) Fault detection of train height valve based on NanoDet-ResNet101. In: 2021 36th Youth Academic Annual Conference of Chinese Association of Automation (YAC), IEEE, pp 709–714

  223. Zhu P, Wen L, Du D, Bian X, Fan H, Hu Q, Ling H (2021) Detection and tracking meet drones challenge. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2021.3119563

  224. Zhu X, Su W, Lu L, Li B, Wang X, Dai J (2020) Deformable DETR: deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159

  225. Zhu Z, Woodcock CE (2012) Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sens Environ 118:83–94

  226. Zhu Z, Liang D, Zhang S, Huang X, Li B, Hu S (2016) Traffic-sign detection and classification in the wild. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  227. Zhuang L, Xu Y, Ni B, Xu H (2017) Flexible network binarization with layer-wise priority. arXiv: Computer Vision and Pattern Recognition

  228. Zoph B, Le QV (2016) Neural architecture search with reinforcement learning. arXiv: Learning

  229. Zoph B, Vasudevan V, Shlens J, Le QV (2018) Learning transferable architectures for scalable image recognition. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

  230. Zou Z, Shi Z (2018) Random access memories: a new paradigm for target detection in high resolution aerial remote sensing images. IEEE Trans Image Process 27(3):1100–1111

  231. Zou Z, Shi Z, Guo Y, Ye J (2019) Object detection in 20 years: a survey. arXiv: Computer Vision and Pattern Recognition

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 62272461, 62172417 and 62276266), the Natural Science Foundation of Jiangsu Province (No. BK20201346), the "Double First-Class" Project of China University of Mining and Technology for Independent Innovation and Social Service (Grant No. 2022ZZCX06), and the Six Talent Peaks Project in Jiangsu Province (Nos. 2015-DZXX-010 and 2018-XYDXX-044).

Author information

Corresponding author

Correspondence to Jiaqi Zhao.

Ethics declarations

Conflict of interest

Yong Zhou, Lei Xia, Jiaqi Zhao, Rui Yao and Bing Liu declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhou, Y., Xia, L., Zhao, J. et al. Efficient convolutional neural networks and network compression methods for object detection: a survey. Multimed Tools Appl 83, 10167–10209 (2024). https://doi.org/10.1007/s11042-023-15608-2

  • DOI: https://doi.org/10.1007/s11042-023-15608-2