Abstract
Object detection is one of the most fundamental and important research tasks in computer vision. The general trend in object detection has been to design large, over-parameterized models that achieve excellent performance. However, this performance comes at the expense of low speed, heavy computation, and large memory overhead, and it makes object detection models harder to deploy on mobile and embedded devices, which have limited hardware resources and require real-time feedback. As a result, recent literature shows rising interest in building portable and efficient networks for object detection. The main contributions of this review are as follows. To the best of our knowledge, there are few reviews of efficient CNNs for object detection. We systematically summarize the methods, models, and evaluation metrics of efficient CNNs for object detection proposed in recent years. We also summarize and introduce commonly used object detection datasets. Finally, we point out possible research directions and offer suggestions for future work on efficient convolutional neural networks.
Data availability statement
My manuscript has no associated data.
References
Agarwal S, Du Terrail JO, Jurie F (2018) Recent advances in object detection in the age of deep convolutional neural networks. arXiv\(:\) Computer Vision and Pattern Recognition
Andreopoulos A, Tsotsos JK (2013) 50 years of object recognition: Directions forward. Comput Vis Image Underst 117(8):827–891
Ba LJ, Caruana R (2014) Do deep nets really need to be deep? In: Proceedings of the 27th international conference on neural information processing systems, vol 2, MIT Press, Cambridge, NIPS’14, pp 2654–2662
Behrendt K, Novak L, Botros R (2017) A deep learning approach to traffic lights: detection, tracking, and classification. In: 2017 IEEE International conference on robotics and automation (ICRA), pp 1370–1377
Benenson R, Mathias M, Timofte R, Gool LV (2012) Pedestrian detection at 100 frames per second. In: Computer vision and pattern recognition (CVPR), 2012 IEEE Conference on
Bochkovskiy A, Wang CY, Liao HYM (2020) Yolov4: optimal speed and accuracy of object detection. ArXiv abs/2004.10934
Braun M, Krebs S, Flohr F, Gavrila DM (2019) Eurocity persons: a novel benchmark for person detection in traffic scenes. IEEE Trans Pattern Anal Mach Intell 41(8):1844–1861
Buciluundefined C, Caruana R, Niculescu-Mizil A (2006) Model compression. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, association for computing machinery, New York, NY, USA, KDD ’06, pp 535–541, https://doi.org/10.1145/1150402.1150464, https://doi.org/10.1145/1150402.1150464
Cai H, Zhu L, Han S (2019) ProxylessNAS: direct neural architecture search on target task and hardware. In: International conference on learning representations, https://arxiv.org/pdf/1812.00332.pdf
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: European conference on computer vision, Springer, pp 213–229
Chen G, Choi W, Yu X, Han T, Chandraker M (2017) Learning efficient object detection models with knowledge distillation. In: Proceedings of the 31st International conference on neural information processing systems, Curran Associates Inc., Red Hook, NIPS’17, pp 742–751
Chen L, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Chen L, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv\(:\) Computer Vision and Pattern Recognition
Chen X, Ma H, Wan J, Li B, Xia T (2016) Multi-view 3d object detection network for autonomous driving. arXiv\(:\) Computer Vision and Pattern Recognition
Chen Y, Yang T, Zhang X, Meng G, Xiao X, Sun J (2019) Detnas: Backbone search for object detection. In: NeurIPS
Cheng G, Han J (2016) A survey on object detection in optical remote sensing images. ISPRS J Photogramm Remote Sens 117:11–28
Cheng G, Han J (2016) A survey on object detection in optical remote sensing images. ISPRS J Photogramm Remote Sens 117:11–28. https://doi.org/10.1016/j.isprsjprs.2016.03.014
Cheng G, Han J, Zhou P, Guo L (2014) Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J Photogramm Remote Sens 98(98):119–132
Cheng G, Zhou P, Han J (2016) Learning rotation-invariant convolutional neural networks for object detection in vhr optical remote sensing images. IEEE Trans Geosci Remote Sens 54(12):7405–7415
Cheng G, Han J, Zhou P, Xu D (2019) Learning rotation-invariant and fisher discriminative convolutional neural networks for object detection. IEEE Trans Image Process 28(1):265–278
Cheng Y, Wang D, Zhou P, Zhang T (2017) A survey of model compression and acceleration for deep neural networks. CoRR abs/1710.09282, http://dblp.uni-trier.de/db/journals/corr/corr1710.html#abs-1710-09282
Cheng Y, Wang D, Zhou P, Zhang T (2018) Model compression and acceleration for deep neural networks: the principles, progress, and challenges. IEEE Signal Process Mag 35(1):126–136
Chollet F (2016) Xception: Deep learning with depthwise separable convolutions. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 1800–1807
Chu X, Zhang B, Xu R, Ma H (2019) Multi-objective reinforced evolution in mobile neural architecture search. ArXiv abs/1901.01074
Courbariaux M, Bengio Y, David JP (2015) Binaryconnect: Training deep neural networks with binary weights during propagations. NIPS 28
Courbariaux M, Hubara I, Soudry D, Elyaniv R, Bengio Y (2016) Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv\(:\) Learning
Dai J, He K, Sun J (2015) Instance-aware semantic segmentation via multi-task network cascades. arXiv\(:\) Computer Vision and Pattern Recognition
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1, pp 886–893
de Charette R, Nashashibi F (2009) Real time visual traffic lights recognition based on spot light detection and adaptive traffic lights templates. In: 2009 IEEE Intelligent Vehicles Symposium, pp 358–363
Dollar P, Tu Z, Perona P, Belongie S (2009) Integral channel features. In: Proceedings of the British Machine Vision Conference, BMVA Press, pp 91.1–91.11, https://doi.org/10.5244/C.23.91
Dollar P, Wojek C, Schiele B, Perona P (2009) Pedestrian detection: a benchmark. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp 304–311
Dollar P, Wojek C, Schiele B, Perona P (2012) Pedestrian detection: an evaluation of the state of the art. IEEE Trans Pattern Anal Mach Intell 34(4):743–761
Dollar P, Appel R, Belongie S, Perona P (2014) Fast feature pyramids for object detection. IEEE Trans Pattern Anal Mach Intell 36(8):1532–1545
Dong X, Yang Y (2019) One-shot neural architecture search via self-evaluated template network. 2019 IEEE/CVF International Conference on Computer Vision (ICCV) pp 3680–3689
Du Y, Xu C, Tao D (2017) Privileged matrix factorization for collaborative filtering. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp 1610–1616, https://doi.org/10.24963/ijcai.2017/223, https://doi.org/10.24963/ijcai.2017/223
Dubout C, Fleuret F (2012) Exact acceleration of linear object detectors. In: In ECCV, 2012. 7
Elfwing S, Uchibe E, Doya K (2018) Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks: The Official Journal of the International Neural Network Society 107:3–11
Eon PS, Simard PY, Haffner P, Lecun Y (1999) Boxlets: a fast convolution algorithm for signal processing and neural networks. In: Advances in Neural Information Processing Systems, MIT Press, pp 571–577
Everingham M, van Gool L, Williams C, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338. https://doi.org/10.1007/s11263-009-0275-4
Everingham M, Eslami S, Van Gool L, Williams C, Winn J, Zisserman A (2014) The Pascal visual object classes challenge: a retrospective. Int J Comput Vis 111. https://doi.org/10.1007/s11263-014-0733-5
Fan R, Chang K, Hsieh C, Wang X, Lin C (2008) Liblinear: A library for large linear classification. J Mach Learn Res 9:1871–1874
Fan Y, Choi W, Lin Y (2016) Exploit all the layers: fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Felzenszwalb P, Mcallester D, Ramanan D (2008) A discriminatively trained, multiscale, deformable part model. vol 8, https://doi.org/10.1109/CVPR.2008.4587597
Felzenszwalb PF, Girshick RB (2012) From rigid templates to grammars: object detection with structured models
Felzenszwalb PF, Girshick RB, McAllester D (2010a) Cascade object detection with deformable part models. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 2241–2248
Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2010) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645
Fleuret F, Geman D (2001) Coarse-to-fine face detection. Int J Comput Vis 41(1–2):85–107
Frankle J, Carbin M (2019) The lottery ticket hypothesis: finding sparse, trainable neural networks. arXiv\(:\) Learning
Gale T, Elsen E, Hooker S (2019) The state of sparsity in deep neural networks. arXiv\(:\) Learning
Gao M, Yu R, Li A, Morariu VI, Davis LS (2017) Dynamic zoom-in network for fast object detection in large images. arXiv\(:\) Computer Vision and Pattern Recognition
Ge Z, Liu S, Wang F, Li Z, Sun J (2021) Yolox: exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430
Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp 3354–3361
Gerards MET, Kuper J (2013) Optimal dpm and dvfs for frame-based real-time systems. ACM Trans Archit Code Optim 9(4), https://doi.org/10.1145/2400682.2400700, https://doi.org/10.1145/2400682.2400700
Ghiasi G, Lin T, Pang R, Le QV (2019) Nas-fpn: learning scalable feature pyramid architecture for object detection. arXiv\(:\) Computer Vision and Pattern Recognition
Ghodrati A, Diba A, Pedersoli M, Tuytelaars T, Van Gool L (2015) Deepproposal: hunting objects by cascading deep convolutional layers. arXiv\(:\) Computer Vision and Pattern Recognition
Girshick R (2015) Fast r-cnn. Computer Science
Girshick RB, Felzenszwalb PF, McAllester D (2011) Object detection with grammar models. In: Proceedings of the 24th International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NIPS’11, pp 442–450
Girshick RB, Donahue J, Darrell T, Malik J (2013) Rich feature hierarchies for accurate object detection and semantic segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp 580–587
Guo Y, Yao A, Chen Y (2016) Dynamic network surgery for efficient dnns. arXiv\(:\) Neural and Evolutionary Computing
Gupta S, Agrawal A, Gopalakrishnan K, Narayanan P (2015) Deep learning with limited numerical precision. In: Proceedings of the 32nd International Conference on International Conference on Machine Learning, vol 37, JMLR.org, ICML’15, pp 1737–1746
Han J, Zhang D, Cheng G, Guo L, Ren J (2015) Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning. IEEE Trans Geosci Remote Sens 53(6):3325–3337
Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C (2019) Ghostnet: more features from cheap operations. ArXiv abs/1911.11907
Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C (2020) Ghostnet: more features from cheap operations. In: CVPR
Han S, Mao H, Dally WJ (2015a) Deep compression: compressing deep neural network with pruning, trained quantization and huffman coding. CoRR abs/1510.00149
Han S, Mao H, Dally WJ (2015b) Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv\(:\) Computer Vision and Pattern Recognition
Han S, Liu X, Mao H, Pu J, Pedram A, Horowitz MA, Dally WJ (2016) Eie: efficient inference engine on compressed deep neural network. In: 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp 243–254
Hanson SJ, Pratt L (1989) Comparing biases for minimal network construction with back-propagation. Morgan Kaufmann Publishers Inc., San Francisco, pp 177–185
Hariharan B, Arbelaez P, Girshick R, Malik J (2014) Simultaneous detection and segmentation. arXiv\(:\) Computer Vision and Pattern Recognition
Hassibi B, GStork D (1992) Second order derivatives for network pruning: optimal brain surgeon. Adv Neural Inform Proc Syst 5
Haussler D (2001) Convolution kernels on discrete structures ucsc-crl-99-10
He K, Sun J (2014) Convolutional neural networks at constrained time cost. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5353–5360
He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37:1904–1916
He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778
He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. arXiv\(:\) Computer Vision and Pattern Recognition
He K, Gkioxari G, Dollar P, Girshick R (2020) Mask R-CNN. IEEE Trans Pattern Anal Mach Intell 42(2):386–397
He Y, Lin J, Liu Z, Wang H, Li L, Han S (2018) Amc: automl for model compression and acceleration on mobile devices. In: ECCV
Hendrycks D, Gimpel K (2016) Bridging nonlinearities and stochastic regularizers with gaussian error linear units. CoRR abs/1606.08415
Hinton GE, Salakhutdinov R (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
Hinton GE, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv\(:\) Machine Learning
Hong S, Roh B, Kim K, Cheon Y, Park M (2016) Pvanet: lightweight deep neural networks for real-time object detection. arXiv\(:\) Computer Vision and Pattern Recognition
Houben S, Stallkamp J, Salmen J, Schlipsing M, Igel C (2013) Detection of traffic signs in real-world images: the German traffic sign detection benchmark. In: International Joint Conference on Neural Networks, 1288
Howard A, Sandler M, Chu G, Chen LC, Chen B, Tan M, Wang W, Zhu Y, Pang R, Vasudevan V, Le QV, Adam H (2019) Searching for mobilenetv3. ArXiv abs/1905.02244
Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. ArXiv abs/1704.04861
Hsu CH, Chang SH, Juan DC, Pan JY, Chen YT, Wei W, Chang SC (2018) Monas: multi-objective neural architecture search using reinforcement learning. ArXiv abs/1806.10332
Huang G, Liu Z, Weinberger KQ (2016a) Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2261–2269
Huang G, Liu S, van der Maaten L, Weinberger KQ (2017) Condensenet: an efficient densenet using learned group convolutions. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 2752–2761
Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z, Song Y, Guadarrama S, et al. (2016b) Speed/accuracy trade-offs for modern convolutional object detectors. arXiv\(:\) Computer Vision and Pattern Recognition
Iandola FN, Moskewicz MW, Ashraf K, Han S, Dally WJ, Keutzer K (2017) Squeezenet: Alexnet-level accuracy with 50x fewer parameters and \(<\)1mb model size. ArXiv abs/1602.07360
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv\(:\) Learning
Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2014) Synthetic data and artificial neural networks for natural scene text recognition. In: Workshop on Deep Learning, NIPS
Jain V, Learned-Miller E (2010) Fddb: a benchmark for face detection in unconstrained settings. Tech. Rep. UM-CS-2010-009, University of Massachusetts, Amherst
Jiang B, Luo R, Mao J, Xiao T, Jiang Y (2018) Acquisition of localization confidence for accurate object detection. arXiv\(:\) Computer Vision and Pattern Recognition
Jin X, Yuan XT, Feng J, Yan S (2016) Training skinny deep neural networks with iterative hard thresholding methods. ArXiv abs/1607.05423
Keerthi SS, Lin C (2003) Asymptotic behaviors of support vector machines with gaussian kernel. Neural Comput 15(7):1667–1689
Kokkinos I (2012) Bounding part scores for rapid detection with deformable part models. In: Proceedings of European Conference on Computer Vision
Kong T, Yao A, Chen Y, Sun F (2016) Hypernet: towards accurate region proposal generation and joint object detection. arXiv\(:\) Computer Vision and Pattern Recognition
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks pp 1097–1105
Köstinger M, Wohlhart P, Roth PM, Bischof H (2011) Annotated facial landmarks in the wild: a large-scale, real-world database for facial landmark localization. In: 2011 IEEE International Conference on Computer Vision orkshops (ICCV Workshops), pp 2144–2151
Lam D, Kuzma R, Mcgee K, Dooley S, Laielli M, Klaric MN, Bulatov Y, Mccord B (2018) xview: objects in context in overhead imagery. arXiv\(:\) Computer Vision and Pattern Recognition
Lampert CH, Blaschko MB, Hofmann T (2009) Efficient subwindow search: a branch and bound framework for object localization. IEEE Trans Pattern Anal Mach Intell 31(12):2129–2142
Law H, Deng J (2020) Cornernet: detecting objects as paired keypoints. Int J Comput Vis 128(3):642–656
Leather H, Bonilla E, O’Boyle M (2009) Automatic feature generation for machine learning based optimizing compilation. In: 2009 International Symposium on Code Generation and Optimization, pp 81–91
LeCun Y, Denker JS, Solla SA (1989) Optimal brain damage. In: NIPS
Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278–2324
Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86:2278–2324. https://doi.org/10.1109/5.726791
Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278–2324
Li B, Wu B, Su J, Wang G, Lin L (2020a) Eagleeye: fast sub-net evaluation for efficient neural network pruning. arXiv:2007.02491
Li H, Lin Z, Shen X, Brandt J, Hua G (2015) A convolutional neural network cascade for face detection. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5325–5334
Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2016) Pruning filters for efficient convnets. ArXiv abs/1608.08710
Li K, Wan G, Cheng G, Meng L, Han J (2020) Object detection in optical remote sensing images: a survey and a new benchmark. ISPRS J Photogramm Remote Sens 159:296–307
Li K, Wan G, Cheng G, Meng L, Han J (2020) Object detection in optical remote sensing images: a survey and a new benchmark. ISPRS J Photogramm Remote Sens 159:296–307. https://doi.org/10.1016/j.isprsjprs.2019.11.023
Li Q, Jin S, Yan J (2017) Mimicking very efficient network for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 7341–7349
Li Y, Lin S, Zhang B, Liu J, Doermann D, Wu Y, Huang F, Ji R (2018) Exploiting kernel sparsity and entropy for interpretable cnn compression. arXiv\(:\) Computer Vision and Pattern Recognition
Li Z, Peng C, Yu G, Zhang X, Deng Y, Sun J (2017) Light-head r-cnn: in defense of two-stage object detector. arXiv\(:\) Computer Vision and Pattern Recognition
Lin S, Ji R, Yan C, Zhang B, Cao L, Ye Q, Huang F, Doermann D (2019) Towards optimal structured cnn pruning via generative adversarial learning. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 2785–2794
Lin T, Maire M, Belongie S, Bourdev L, Girshick R, Hays J, Perona P, Ramanan D, Zitnick CL, Dollar P (2014a) Microsoft coco: common objects in context. arXiv\(:\) Computer Vision and Pattern Recognition
Lin T, Dollar P, Girshick R, He K, Hariharan B, Belongie S (2016) Feature pyramid networks for object detection. arXiv\(:\) Computer Vision and Pattern Recognition
Lin T, Dollar P, Girshick R, He K, Hariharan B, Belongie S (2017a) Feature pyramid networks for object detection, pp 936–944
Lin T, Goyal P, Girshick R, He K, Dollar P (2017b) Focal loss for dense object detection. arXiv\(:\) Computer Vision and Pattern Recognition
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer Vision - ECCV 2014. Springer International Publishing, Cham, pp 740–755
Lin X, Zhao C, Pan W (2017c) Towards accurate binary convolutional neural network. In: NIPS
Liu K, Mattyus G (2015) Fast multiclass vehicle detection on aerial images. IEEE Geosci Remote Sens Lett 12(9):1938–1942
Liu L, Ouyang W, Wang X, Fieguth P, Chen J, Liu X, Pietikainen M (2020) Deep learning for generic object detection: a survey. Int. J. Comput. Vis 128(2):261–318
Liu S, Huang D, Wang Y (2017) Receptive field block net for accurate and fast object detection. arXiv\(:\) Computer Vision and Pattern Recognition
Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. arXiv\(:\) Computer Vision and Pattern Recognition
Liu S, Huang D, Wang Y (2019a) Learning spatial fusion for single-shot object detection. arXiv\(:\) Computer Vision and Pattern Recognition
Liu S, Ren B, Shen X, Wang Y (2020b) Cocopie: Making mobile ai sweet as pie -compression-compilation co-design goes a long way. ArXiv abs/2003.06700
Liu S, Wang S, Liu X, Lin CT, Lv Z (2020) Fuzzy detection aided real-time and robust visual tracking under complex environments. IEEE Trans Fuzzy Syst 29(1):90–102
Liu S, Wang S, Liu X, Gandomi AH, Daneshmand M, Muhammad K, De Albuquerque VHC (2021) Human memory update strategy: a multi-layer template update mechanism for remote visual monitoring. IEEE Trans Multimedia 23:2188–2198
Liu S, Wang S, Liu X, Dai J, Muhammad K, Gandomi AH, Ding W, Hijji M, de Albuquerque VHC (2022) Human inertial thinking strategy: A novel fuzzy reasoning mechanism for iot-assisted visual monitoring. IEEE Internet Things J
Liu W, Anguelov D, Erhan D, Szegedy C, Reed SE, Fu CY, Berg AC (2016) Ssd: single shot multibox detector. In: ECCV
Liu Z, Li J, Shen Z, Huang G, Yan S, Zhang C (2017) Learning efficient convolutional networks through network slimming. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 2755–2763
Liu Z, Sun M, Zhou T, Huang G, Darrell T (2019b) Rethinking the value of network pruning. In: ICLR
Long J, Shelhamer E, Darrell T (2014) Fully convolutional networks for semantic segmentation. arXiv\(:\) Computer Vision and Pattern Recognition
Lowe DG (1999) Object recognition from local scale-invariant features. Proceedings of the Seventh IEEE International Conference on Computer Vision 2:1150–1157
Lucas SM, Panaretos A, Sosa L, Tang A, Wong S, Young R (2003) Icdar 2003 robust reading competitions. In: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings., pp 682–687
Luo JH, Wu J (2020) Autopruner: an end-to-end trainable filter pruning method for efficient deep model inference. ArXiv abs/1805.08941
Ma N, Zhang X, Zheng HT, Sun J (2018) Shufflenet v2: practical guidelines for efficient cnn architecture design. ArXiv abs/1807.11164
Maji S, Berg AC, Malik J (2008) Classification using intersection kernel support vector machines is efficient. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp 1–8
Malik J (2015) Hypercolumns for object segmentation and fine-grained localization. In: IEEE Conference on Computer Vision & Pattern Recognition
Mathieu M, Henaff M, Lecun Y (2013) Fast training of convolutional networks through ffts. arXiv\(:\) Computer Vision and Pattern Recognition
Mehrara H, Zahedinejad M, Pourmohammad A (2009) Novel edge detection using bp neural network based on threshold binarization. In: 2009 Second International Conference on Computer and Electrical Engineering, vol 2, pp 408–412
Mogelmose A, Trivedi MM, Moeslund TB (2012) Vision-based traffic sign detection and analysis for intelligent driver assistance systems: perspectives and survey. IEEE Trans Intell Transp Syst 13(4):1484–1497
Mukkamala MC, Hein M (2017) Variants of rmsprop and adagrad with logarithmic regret bounds. arXiv\(:\) Learning
Nada H, Sindagi VA, Zhang H, Patel V (2018) Pushing the limits of unconstrained face detection: a challenge dataset and baseline results. 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS), pp 1–10
Nascimento JC, Marques JS (2006) Performance evaluation of object detection algorithms for video surveillance. IEEE Trans Multimedia 8(4):761–774
Neubeck A, Gool LJV (2006) Efficient non-maximum suppression. In: International Conference on Pattern Recognition
Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. arXiv\(:\) Computer Vision and Pattern Recognition
Noh H, Hong S, Han B (2015) Learning deconvolution network for semantic segmentation. arXiv\(:\) Computer Vision and Pattern Recognition
Oksuz K, Cam BC, Akbas E, Kalkan S (2018) Localization recall precision (lrp): a new performance metric for object detection. arXiv\(:\) Computer Vision and Pattern Recognition
Ouyang W, Wang K, Zhu X, Wang X (2017) Chained cascade network for object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 1956–1964
Ouyang W, Wang K, Zhu X, Wang X (2017) Learning chained deep features and classifiers for cascade in object detection. ArXiv abs/1702.07054
Papageorgiou C, Poggio T (2000) A trainable system for object detection. Int J Comput Vis 38(1):15–33
Papageorgiou C, Poggio T (2000) A trainable system for object detection. Int. J. Comput. Vis 38:15–33. https://doi.org/10.1023/A:1008162616689
Parmar N, Vaswani A, Uszkoreit J, Kaiser L, Shazeer N, Ku A, Tran D (2018) Image transformer. In: International Conference on Machine Learning, PMLR, pp 4055–4064
Peng C, Zhang X, Yu G, Luo G, Sun J (2017) Large kernel matters - improve semantic segmentation by global convolutional network. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1743–1751
Porikli F (2005) Integral histogram: a fast way to extract histograms in cartesian spaces. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol 1, pp 829–836 vol. 1
Pratt H, Williams B, Coenen F, Zheng Y (2017) FCNN: Fourier convolutional neural networks. In: Machine Learning and Knowledge Discovery in Databases, Springer International Publishing, pp 786–798, https://doi.org/10.1007/978-3-319-71249-9_47, https://doi.org/10.1007%2F978-3-319-71249-9_47
Qiang Zhu, Mei-Chen Yeh, Kwang-Ting Cheng, Avidan S (2006) Fast human detection using a cascade of histograms of oriented gradients. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), vol 2, pp 1491–1498
Ramachandran P, Zoph B, Le QV (2017) Swish: a self-gated activation function. arXiv\(:\) Neural and Evolutionary Computing
Rastegari M, Ordonez V, Redmon J, Farhadi A (2016a) Xnor-net: Imagenet classification using binary convolutional neural networks. In: ECCV
Razakarivony S, Jurie F (2016) Vehicle detection in aerial imagery: a small target detection benchmark. J Vis Commun Image Represent 34(Jan.):187–203
Redmon J, Farhadi A (2016) Yolo9000: better, faster, stronger. arXiv\(:\) Computer Vision and Pattern Recognition
Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv\(:\) Computer Vision and Pattern Recognition
Redmon J, Divvala SK, Girshick RB, Farhadi A (2016) You only look once: unified, real-time object detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 779–788
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149
Ren S, He K, Girshick R, Zhang X, Sun J (2017) Object detection networks on convolutional feature maps. IEEE Trans Pattern Anal Mach Intell 39(7):1476–1481
Rezatofighi H, Tsoi N, Gwak J, Sadeghian A, Reid I, Savarese S (2019) Generalized intersection over union: a metric and a loss for bounding box regression. arXiv\(:\) Computer Vision and Pattern Recognition
Rippel O, Snoek J, Adams RP (2015) Spectral representations for convolutional neural networks. ArXiv abs/1506.03767
Romero A, Ballas N, Kahou SE, Chassang A, Gatta C, Bengio Y (2014) Fitnets: hints for thin deep nets. arXiv\(:\) Learning
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by Error Propagation. MIT Press, Cambridge, pp 318–362
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis (IJCV) 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y
Sadeghi MA, Forsyth D (2013) Fast template evaluation with vector quantization. In: Advances in Neural Information Processing Systems (NIPS)
Sandler M, Howard AG, Zhu M, Zhmoginov A, Chen LC (2018) Mobilenetv2: inverted residuals and linear bottlenecks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition pp 4510–4520
Shang W, Sohn K, Almeida D, Lee H (2016) Understanding and improving convolutional neural networks via concatenated rectified linear units. In: ICML
Shelhamer E, Long J, Darrell T (2017) Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 39(4):640–651
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015a) Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 1–9
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2015b) Rethinking the inception architecture for computer vision. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 2818–2826
Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2016) Inception-v4, inception-resnet and the impact of residual connections on learning. In: AAAI
Tan M, Chen B, Pang R, Vasudevan V, Le QV (2018) Mnasnet: platform-aware neural architecture search for mobile. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp 2815–2823
Tan M, Pang R, Le QV (2019) Efficientdet: scalable and efficient object detection. arXiv\(:\) Computer Vision and Pattern Recognition
Tian Z, Shen C, Chen H, He T (2019) Fcos: fully convolutional one-stage object detection. 2019 IEEE/CVF International Conference on Computer Vision (ICCV) pp 9626–9635
Tung F, Mori G (2020) Deep neural network compression by in-parallel pruning-quantization. IEEE Trans Pattern Anal Mach Intell 42(3):568–579
Tzelepis G, Asif A, Baci S, Cavdar S, Aksoy EE (2019) Deep neural network compression for image classification and object detection. arXiv\(:\) Computer Vision and Pattern Recognition
Vaillant R, Monrocq C, Le Cun Y (1994) Original approach for the localisation of objects in images. IEEE Proceedings-Vision, Image and Signal Processing 141(4):245
Vanhoucke V, Senior A, Mao MZ (2011) Improving the speed of neural networks on cpus. In: in Deep Learning and Unsupervised Feature Learning Workshop, NIPS
Vasilache N, Johnson J, Mathieu M, Chintala S, Piantino S, Lecun Y (2014) Fast convolutional nets with fbfft: a gpu performance evaluation. arXiv\(:\) Learning
Vedaldi A, Zisserman A (2012a) Sparse kernel approximations for efficient classification and detection. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 2320–2327
Vedaldi A, Zisserman A (2012b) Sparse kernel approximations for efficient classification and detection. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, https://doi.org/10.1109/cvpr.2012.6247943, https://doi.org/10.1109%2Fcvpr.2012.6247943
Vedaldi A, Gulshan V, Varma M, Zisserman A (2009) Multiple kernels for object detection. 2009 IEEE 12th International Conference on Computer Vision pp 606–613
Veit A, Matera T, Neumann L, Matas J, Belongie SJ (2016) Coco-text: dataset and benchmark for text detection and recognition in natural images. ArXiv abs/1601.07140
Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, vol 1, pp I–I
Violapaul, Jonesmichael J (2004) Robust real-time face detection. Int J Comput Vis
Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition. In: 2011 International conference on computer vision, IEEE, pp 1457–1464
Wang K, Liu Z, Lin Y, Lin J, Han S (2018) Haq: hardware-aware automated quantization. arXiv
Wang X, Han TX, Yan S (2009) An hog-lbp human detector with partial occlusion handling. In: 2009 IEEE 12th International Conference on Computer Vision, pp 32–39
Watanabe T, Ito S, Yokoi K (2010) Co-occurrence histograms of oriented gradients for human detection. J Inf Process Syst 2(2):39–47
Wen W, Wu C, Wang Y, Chen Y, Li H (2016) Learning structured sparsity in deep neural networks. arXiv\(:\) Neural and Evolutionary Computing
Wilson P, Fernandez JD (2006) Facial feature detection using haar classifiers. J Comput Sci Coll 21(4):127–133
Woo S, Park J, Lee J, Kweon IS (2018) Cbam: convolutional block attention module. arXiv\(:\) Computer Vision and Pattern Recognition
Wu J, Leng C, Wang Y, Hu Q, Cheng J (2016) Quantized convolutional neural networks for mobile devices. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4820–4828
Xia G, Bai X, Ding J, Zhu Z, Belongie S, Luo J, Datcu M, Pelillo M, Zhang L (2017) Dota: a large-scale dataset for object detection in aerial images. arXiv\(:\) Computer Vision and Pattern Recognition
Yang J (2015) Notes on low-rank matrix factorization. arXiv\(:\) Numerical Analysis
Yang S, Luo P, Loy CC, Tang X (2016) Wider face: a face detection benchmark. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5525–5533
Yang TJ, Howard AG, Chen B, Zhang X, Go A, Sze V, Adam H (2018) Netadapt: platform-aware neural network adaptation for mobile applications. In: ECCV
Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp 1083–1090
Yu G, Chang Q, Lv W, Xu C, Cui C, Ji W, Dang Q, Deng K, Wang G, Du Y, et al. (2021) Pp-picodet: a better real-time object detector on mobile devices. arXiv preprint arXiv:2111.00902
Zeiler MD, Fergus R (2013) Visualizing and understanding convolutional networks. arXiv\(:\) Computer Vision and Pattern Recognition
Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503
Zhang L, Liang L, Liang X, He K (2016b) Is faster r-cnn doing well for pedestrian detection? In: European Conference on Computer Vision
Zhang S, Wen L, Bian X, Lei Z, Li SZ (2017a) Single-shot refinement neural network for object detection. arXiv\(:\) Computer Vision and Pattern Recognition
Zhang T, Qi G, Xiao B, Wang J (2017b) Interleaved group convolutions for deep neural networks. arXiv\(:\) Computer Vision and Pattern Recognition
Zhang X, Zou J, Ming X, He K, Sun J (2014) Efficient and accurate approximations of nonlinear convolutional networks. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 1984–1992
Zhang X, Zou J, He K, Sun J (2015) Accelerating very deep convolutional networks for classification and detection. IEEE Trans Pattern Anal Mach Intell 38:1943–1955
Zhang X, Zhou X, Lin M, Sun J (2017c) Shufflenet: an extremely efficient convolutional neural network for mobile devices. arXiv\(:\) Computer Vision and Pattern Recognition
Zhang X, Zhu K, Guanzhou C, Xiaoliang T, Zhang L, Dai F, Liao P, Gong Y (2019) Geospatial object detection on high resolution remote sensing imagery based on double multi-scale feature pyramid network. Remote Sens 11:755. https://doi.org/10.3390/rs11070755
Zhao Q, Sheng T, Wang Y, Tang Z, Chen Y, Cai L, Ling H (2018) M2det: a single-shot object detector based on multi-level feature pyramid network. arXiv\(:\) Computer Vision and Pattern Recognition
Zheng M, Gao P, Zhang R, Li K, Wang X, Li H, Dong H (2020) End-to-end object detection with adaptive clustering transformer. arXiv preprint arXiv:2011.09315
Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D (2019) Distance-iou loss: faster and better learning for bounding box regression. arXiv\(:\) Computer Vision and Pattern Recognition
Zhou X, Wang D, Krahenbuhl P (2019) Objects as points. arXiv\(:\) Computer Vision and Pattern Recognition
Zhou Z, Hu Y, Deng X, Huang D, Lin Y (2021) Fault detection of train height valve based on nanodet-resnet101. In: 2021 36th Youth Academic Annual Conference of Chinese Association of Automation (YAC), IEEE, pp 709–714
Zhu P, Wen L, Du D, Bian X, Fan H, Hu Q, Ling H (2021) Detection and tracking meet drones challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence pp 1–1, https://doi.org/10.1109/TPAMI.2021.3119563
Zhu X, Su W, Lu L, Li B, Wang X, Dai J (2020) Deformable detr: deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159
Zhu Z, Woodcock CE (2012) Object-based cloud and cloud shadow detection in landsat imagery. Remote Sens Environ 118:83–94
Zhu Z, Liang D, Zhang S, Huang X, Li B, Hu S (2016) Traffic-sign detection and classification in the wild. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Zhuang L, Xu Y, Ni B, Xu H (2017) Flexible network binarization with layer-wise priority. arXiv\(:\) Computer Vision and Pattern Recognition
Zoph B, Le QV (2016) Neural architecture search with reinforcement learning. arXiv\(:\) Learning
Zoph B, Vasudevan V, Shlens J, Le QV (2018) Learning transferable architectures for scalable image recognition. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Zou Z, Shi Z (2018) Random access memories: a new paradigm for target detection in high resolution aerial remote sensing images. IEEE Trans Image Process 27(3):1100–1111
Zou Z, Shi Z, Guo Y, Ye J (2019) Object detection in 20 years: a survey. arXiv\(:\) Computer Vision and Pattern Recognition
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 62272461, 62172417, 62276266), the Natural Science Foundation of Jiangsu Province (No. BK20201346), the "Double First-Class" Project of China University of Mining and Technology for Independent Innovation and Social Service (Grant 2022ZZCX06), and the Six Talent Peaks Project in Jiangsu Province (Nos. 2015-DZXX-010, 2018-XYDXX-044).
Ethics declarations
Conflict of interest
Yong Zhou, Lei Xia, Jiaqi Zhao, Rui Yao, and Bing Liu declare that they have no conflict of interest.
About this article
Cite this article
Zhou, Y., Xia, L., Zhao, J. et al. Efficient convolutional neural networks and network compression methods for object detection: a survey. Multimed Tools Appl 83, 10167–10209 (2024). https://doi.org/10.1007/s11042-023-15608-2
DOI: https://doi.org/10.1007/s11042-023-15608-2