A review of object detection based on deep learning

Xiao, Youzi; Tian, Zhiqiang; Yu, Jiachen; Zhang, Yinshu; Liu, Shuai; Du, Shaoyi; Lan, Xuguang

doi:10.1007/s11042-020-08976-6

Published: 12 June 2020

Multimedia Tools and Applications, volume 79, pages 23729–23791 (2020)

Zhiqiang Tian ORCID: orcid.org/0000-0002-3669-3748

Abstract

With the rapid development of deep learning techniques, deep convolutional neural networks (DCNNs) have become increasingly important for object detection. Compared with traditional methods based on handcrafted features, deep learning-based object detection methods can learn both low-level and high-level image features, and the learned features are more representative than handcrafted ones. This review therefore focuses on object detection algorithms based on deep convolutional neural networks, while traditional object detection algorithms are also briefly introduced. Through a review and analysis of deep learning-based object detection techniques from recent years, this work covers the following topics: backbone networks, loss functions and training strategies, classical object detection architectures, complex problems, datasets and evaluation metrics, applications, and future development directions. We hope this review will be helpful for researchers in the field of object detection.

References

Alexe B, Deselaers T, Ferrari V (2012) Measuring the objectness of image windows. IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (11):2189–2202
Google Scholar
Andreas G, Philip L, Raquel U (2012) Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3354–3361
Anwar S, Sung W (2016) Coarse pruning of convolutional neural networks with random masks
Arbelaez P, Pont-Tuset J, Barron JT, Marques F, Malik J (2014) Multiscale combinatorial grouping. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 328–335
Bae SH (2019) Object detection based on region decomposition and assembly. In: Proceedings of the AAAI conference on artificial intelligence (AAAI)
Bartlett PL, Wegkamp MH (2008) Classification with a reject option using a hinge loss. J Mach Learn Res 9(8):1823–1840
MathSciNet MATH Google Scholar
Bell S, Lawrence Zitnick C, Bala K, Girshick R (2016) Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2874–2883
Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(8):1798–1828
Google Scholar
Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PH (2016) Fully-convolutional siamese networks for object tracking. In: European conference on computer vision (ECCV), pp 850–865
Bodla N, Singh B, Chellappa R, Davis LS (2017) Soft-NMS–Improving object detection with one line of code. In: IEEE international conference on computer vision (ICCV), pp 5562–5570
Borji A, Cheng MM, Hou Q, Jiang H, Li J (2014) Salient object detection: a survey. Computational Visual Media, pp 1–34
Bromley J, Guyon I, LeCun Y, S?ckinger E, Shah R (1994) Signature verification using a siamese time delay neural network. In: Advances in neural information processing systems (NIPS), pp 737–744
Caffe2 (2020) A new lightweight, modular, and scalable deep learning framework. https://caffe2.ai/. Software available from caffe2.ai
Cai Z, Fan Q, Feris RS, Vasconcelos N (2016) A unified multi-scale deep convolutional neural network for fast object detection. In: European conference on computer vision (ECCV), pp 354–370
Cai L, Zhao B, Wang Z, Lin J, Foo CS, Aly MS, Chandrasekhar V (2019) MaxpoolNMS: getting rid of NMS bottlenecks in two-stage object detectors. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 9356–9364
Cai Z, Vasconcelos N (2018) Cascade r-cnn: delving into high quality object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6154–6162
Cao G, Xie X, Yang W, Liao Q, Shi G, Wu J (2018) Feature-fused SSD: fast detection for small objects. In: Ninth international conference on graphic and image processing (ICGIP), p 106151
Caron M, Bojanowski P, Joulin A, Douze M (2018) Deep clustering for unsupervised learning of visual features. In: European conference on computer vision (ECCV), pp 139–156
Carreira J, Sminchisescu C (2011) CPMC: automatic object segmentation using constrained parametric min-cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence (7): 1312–1328
Caruana R (1997) Multitask learning. Mach Learn 28(1):41–75
MathSciNet Google Scholar
Castro FM, Marin-Jimenez MJ, Guil N, Schmid C, Alahari K (2018) End-to-end incremental learning. In: Proceedings of the European conference on computer vision (ECCV), pp 233–248
Chai T, Draxler RR (2014) Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature. Geosci Model Dev 7(3):1247–1250
Google Scholar
Chen X, Xiang S, Liu C-L, Pan C-H (2013) Vehicle detection in satellite images by parallel deep convolutional neural networks. In: Asian conference on pattern recognition (ACPR), pp 181–185
Chen C, Seff A, Kornhauser A, Xiao J (2015) Deepdriving: learning affordance for direct perception in autonomous driving. In: Proceedings of the IEEE international conference on computer vision (CVPR), pp 2722–2730
Chen G, Choi W, Yu X, Han T, Chandraker M (2017) Learning efficient object detection models with knowledge distillation. In: Advances in neural information processing systems (NIPS), pp 742–751
Chen H, Wang Y, Wang G, Qiao Y (2018) LSTD: a low-shot transfer detector for object detection
Chen K, et al. (2020) Open MMLab Detection Toolbox (mmdetection). https://github.com/open-mmlab/mmdetection
Chen Q, Song Z, Dong J, Huang Z, Hua Y, Yan S (2015) Contextualizing object detection and classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(1):13–27
Google Scholar
Chen X, Gupta A (2017) Spatial memory for context reasoning in object detection. In: IEEE international conference on computer vision (ICCV), pp 4106–4116
Chen X, Kundu K, Zhu Y, Berneshawi AG, Ma H, Fidler S, Urtasun R (2015) 3d object proposals for accurate object class detection. In: Advances in neural information processing systems (NIPS), pp 424–432
Chen X, Ma H, Wan J, Li B, Xia T (2017) Multi-view 3d object detection network for autonomous driving. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6526–6534
Cheng B, Wei Y, Shi H, Feris R, Xiong J, Huang T (2018) Revisiting rcnn: on awakening the classification power of faster rcnn. In: Proceedings of the European conference on computer vision (ECCV), pp 453–468
Cheng G, Zhou P, Han J (2016) Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Trans Geosci Remote Sens 54(12):7405–7415
Google Scholar
Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1800–1807
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3213–3223
Cortes C, Vapnik V (1995) Support vector machine. Mach Learn 20(3):273–297
MATH Google Scholar
Dai J, He K, Sun J (2016) Instance-aware semantic segmentation via multi-task network cascades. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3150–3158
Dai J, Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. In: Advances in neural information processing systems (NIPS), pp 379–387
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 886–893
De Boer P-T, Kroese DP, Mannor S, Rubinstein RY (2005) A tutorial on the cross-entropy method. Ann Oper Res 134(1):19–67
MathSciNet MATH Google Scholar
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 248–255
Denton E, Zaremba W, Bruna J, LeCun Y, Fergus R (2014) Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in neural information processing systems (NIPS), pp 1269–1277
Divvala SK, Hoiem D, Hays JH, Efros AA, Hebert M (2009) An empirical study of context in object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1271–1278
Dollar P, Wojek C, Schiele B, Perona P (2009) Pedestrian detection: a benchmark. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 304–311
Dollar P, Wojek C, Schiele B, Perona P (2012) Pedestrian detection: an evaluation of the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(4):743–761
Google Scholar
Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) Centernet: keypoint triplets for object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 6569–6578
Dundar A, Jin J, Culurciello E (2016) Convolutional clustering for unsupervised learning. In: International conference on learning representations (ICLR
Endres I, Hoiem D (2010) Category independent object proposals. In: European conference on computer vision (ECCV), pp 575–588
Erhan D, Szegedy C, Toshev A, Anguelov D (2014) Scalable object detection using deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2147–2154
Everingham M, Eslami SA, Van Gool L, Williams CK, Winn J, Zisserman A (2015) The pascal visual object classes challenge: a retrospective. Int J Comput Vis 111(1):98–136
Google Scholar
Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338
Google Scholar
Facebook Al Research (2020) FAIR’s research platform for object detection research (Detectron). https://github.com/facebookresearch/Detectron
Fan Q, Brown L, Smith J (2016) A closer look at faster R-CNN for vehicle detection. In: IEEE intelligent vehicles symposium (IV), pp 124–129
Fei-Fei L, Fergus R, Perona P (2006) One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(4):594–611
Google Scholar
Fei-Fei L, Fergus R, Perona P (2007) Learning generative visual models from few training examples: an incremental bayesian approach tested on 101 object categories. Computer Vision Image Understanding 106(1):59–70
Google Scholar
Felzenszwalb P, McAllester D, Ramanan D (2008) A discriminatively trained, multiscale, deformable part model. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1–8
Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2010) Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(9):1627–1645
Google Scholar
Fu C-Y, Liu W, Ranga A, Tyagi A, Berg AC (2017) DSSD: deconvolutional single shot detector. arXiv:170106659
Gao M, Yu R, Li A, Morariu VI, Davis LS (2018) Dynamic zoom-in network for fast object detection in large images. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6926–6935
Gentile C, Warmuth MK (1999) Linear hinge loss and average margin. In: Advances in neural information processing systems (NIPS), pp 225–231
Ghiasi G, Lin TY, Le QV (2019) Nas-fpn: learning scalable feature pyramid architecture for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 7036–7045
Ghodrati A, Diba A, Pedersoli M, Tuytelaars T, Van Gool L (2015) Deepproposal: hunting objects by cascading deep convolutional layers. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2578–2586
Gholami A, Kwon K, Wu B, Tai Z, Yue X, Jin P, Zhao S, Keutzer K (2018) SqueezeNext: hardware-aware neural network design. In: IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp 1638–1647
Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1440–1448
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 580–587
Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems (NIPS), pp 2672–2680
Gordon RS, Perez M (2018) Safety for wearable virtual reality devices via object detection and tracking. US Patent Application
Graves A, Mohamed A-R, Hinton G (2013) Speech recognition with deep recurrent neural networks. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 6645–6649
Griffin G, Holub A, Perona P (2007) Caltech-256 object category dataset. In: Technical Report of California Institute
Gu C, Sun C, Ross D, Vondrick C, Pantofaru C, Li Y, Vijayanarasimhan S, Toderici G, Ricco S, Sukthankar R (2018) AVA: a video dataset of spatio-temporally localized atomic visual actions. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6047–6056
Gupta S, Girshick R, Arbelaez P, Malik J (2014) Learning rich features from RGB-D images for object detection and segmentation. In: European conference on computer vision (ECCV), pp 345–360
Han S, Mao H, Dally WJ (2016) Deep compression: compressing deep neural networks with pruning trained quantization and huffman coding. In: International conference on learning representations (ICLR)
Hao Y, Fu Y, Jiang YG, Tian Q (2019, July) An end-to-end architecture for class-incremental object detection with knowledge distillation. In: IEEE international conference on multimedia and expo (ICME), pp 1–6
Hao Z, Liu Y, Qin H, Yan J, Li X, Hu X (2017) Scale-aware face detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6186–6195
Hariharan B, Arbelaez P, Girshick R, Malik J (2017) Object instance segmentation and fine-grained localization using hypercolumns. IEEE Transactions on Pattern Analysis and Machine Intelligence (4): 627–639
Harzallah H, Jurie F, Schmid C (2009) Combining efficient object localization and image classification. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 237–244
Hastie T, Tibshirani R, Friedman J (2009) Unsupervised learning. In: The elements of statistical learning. Springer, Berlin, pp 485–585
He K, Girshick R, Dollar P (2018) Rethinking ImageNet Pre-training. arXiv:181108883
He K, Gkioxari G, Dollar P, Girshick R (2017) Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2980–2988
He KM, Zhang XY, Ren SQ, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. In: European conference on computer vision (ECCV), pp 346–361
He KM, Zhang XY, Ren SQ, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778
He S, Lau RW, Liu W, Huang Z, Yang Q (2015) Supercnn: a superpixelwise convolutional neural network for salient object detection. Int J Comput Vis 115(3):330–344
MathSciNet Google Scholar
He Y, Zhang X, Savvides M, Kitani K (2018) Softer-NMS: rethinking bounding box regression for accurate object detection. arXiv:180908545
Hetang C, Qin H, Liu S, Yan J (2017) Impression network for video object detection. arXiv:171205896
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Google Scholar
Hoi SC, Wu X, Liu H, Wu Y, Wang H, Xue H, Wu Q (2015) Logo-net: large-scale deep logo detection and brand recognition with deep region-based convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(5):2403–2412
Google Scholar
Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 7132–7141
Hu P, Ramanan D (2017) Finding tiny faces. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 951–959
Hu P, Shuai B, Liu J, Wang G (2017) Deep level sets for salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2300–2309
Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2261–2269
Huang J, Rathod V, Sun C, Zhu ML, Korattikara A, Fathi A, Fischer I, Wojna Z, Song Y, Guadarrama S, Murphy K (2017) Speed/accuracy trade-offs for modern convolutional object detectors. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3296–3297
Huang W, Qiao Y, Tang X (2014) Robust scene text detection with convolution neural network induced mser trees. In: European conference on computer vision (ECCV), pp 497–511
Hwang S, Kim HE (2016) Self-transfer learning for weakly supervised lesion localization. In: International conference on medical image computing and computer-assisted intervention, pp 239–246
Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K (2017) Squeezenet: alexnet-level accuracy with 50x fewer parameters and < 0.5 mb model size. In: International conference on learning representations (ICLR)
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning (ICML), pp 448–456
Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence (11): 1254–1259
Janocha K, Czarnecki WM (2017) On loss functions for deep neural networks in classification. arXiv:170205659
Jiang B, Luo R, Mao J, Xiao T, Jiang Y (2018) Acquisition of localization confidence for accurate object detection. In: European conference on computer vision (ECCV), pp 8–14
Jiang H, Learned-Miller E (2017) Face detection with the faster R-CNN. In: IEEE international conference on automatic face and gesture recognition, pp 650–657
Kang K, Li H, Yan J, Zeng X, Yang B, Xiao T, Zhang C, Wang Z, Wang R, Wang X (2018) T-cnn: tubelets with convolutional neural networks for object detection from videos. IEEE Transactions on Circuits Systems for Video Technology 28(10):2896–2907
Google Scholar
Kang K, Ouyang W, Li H, Wang X (2016) Object detection from video tubelets with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 817–825
Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1725–1732
Kavukcuoglu K, Sermanet P, Boureau Y-L, Gregor K, Mathieu M, Cun YL (2010) Learning convolutional feature hierarchies for visual recognition. In: Advances in neural information processing systems (NIPS), pp 1090–1098
Kawahara J, Hamarneh G (2016) Multi-resolution-tract CNN with hybrid pretrained and skin-lesion trained layers. In: International workshop on machine learning in medical imaging, pp 164–171
Kim S-W, Kook H-K, Sun J-Y, Kang M-C, Ko S-J (2018) Parallel feature pyramid network for object detection. In: European conference on computer vision (ECCV), pp 234–250
Kleban J, Xie X, Ma W-Y (2008) Spatial pyramid mining for logo detection in natural scenes. In: IEEE international conference on multimedia and expo (ICME), pp 1077–1080
Koch G, Zemel R, Salakhutdinov R (2015) Siamese neural networks for one-shot image recognition. In: International conference on machine learning (ICML
Kong B, Zhan Y, Shin M, Denny T, Zhang S (2016) Recognizing end-diastole and end-systole frames via deep temporal regression network. In: International conference on medical image computing and computer-assisted intervention, pp 264–272
Kong T, Sun F, Yao A, Liu H, Lu M, Chen Y (2017) Ron: reverse connection with objectness prior networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5244–5252
Kong T, Yao A, Chen Y, Sun F (2016) Hypernet: towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 845–853
Krasin I, Duerig T, Alldrin N, Veit A, Abu-El-Haija S, Belongie S, Cai D, Feng Z, Ferrari V, Gomes V (2016) Openimages: a public dataset for large-scale multi-label and multi-class image classification. Dataset available from https://githubcom/openimages 2(6):7
Krhenbuhl P, Koltun V (2014) Geodesic object proposals. In: European conference on computer vision (ECCV), pp 725–739
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. In: Technical Report of University of Toronto
Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
Google Scholar
Kuo W, Hariharan B, Malik J (2015) Deepbox: learning objectness with convolutional networks. In: Proceedings of the IEEE international conference on computer vision (CVPR), pp 2479–2487
Lai K, Bo L, Ren X, Fox D (2011) A large-scale hierarchical multi-view rgb-d object dataset. In: IEEE international conference on robotics and automation (ICRA), pp 1817–1824
LaLonde R, Bagci U (2018) Capsules for object segmentation. arXiv:1804.04241
Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision (ECCV), pp 734–750
Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2169–2178
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521 (7553):436–444
Google Scholar
Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. In: Proceedings of the IEEE, pp 2278–2324
Li G, Yu Y (2015) Visual saliency based on multiscale deep features. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5455–5463
Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2016) Pruning filters for efficient convnets. arXiv:1608.08710
Li H, Lin Z, Shen X, Brandt J, Hua G (2015) A convolutional neural network cascade for face detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5325–5334
Li J, Liang X, Shen S, Xu T, Feng J, Yan S (2017) Scale-aware fast R-CNN for pedestrian detection. IEEE Transactions on Multimedia 20(4):985–996
Google Scholar
Li J, Liang X, Wei Y, Xu T, Feng J, Yan S (2017) Perceptual generative adversarial networks for small object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1222–1230
Li J, Wei Y, Liang X, Dong J, Xu T, Feng J, Yan S (2017) Attentive contexts for object detection. IEEE Transactions on Multimedia 19(5):944–954
Google Scholar
Li K, Cheng G, Bu S, You X (2017) Rotation-insensitive and context-augmented object detection in remote sensing images. IEEE Trans Geosci Remote Sens 56(4):2337–2348
Google Scholar
Li L, Xu M, Wang X, Jiang L, Liu H (2019) Attention based glaucoma detection: a large-scale database and CNN model. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 10571–10580
Li S, Yang L, Huang J, Hua XS, Zhang L (2019) Dynamic anchor feature selection for single-shot object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 6609–6618
Li Y, Chen Y, Wang N, Zhang Z (2019) Scale-aware trident networks for object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV)
Li Y, Wang D, Hu H, Lin Y, Zhuang Y (2017) Zero-shot recognition using dual visual-semantic mapping paths. In: Proceedings of the IEEE international conference on computer vision (CVPR), pp 5207–5215
Li Z, Peng C, Yu G, Zhang X, Deng Y, Sun J (2018) DetNet: design backbone for object detection. In: European conference on computer vision (ECCV), pp 334–350
Li Z, Peng C, Yu G, Zhang X, Deng Y, Sun J (2018) Light-head R-CNN: in defense of two-stage object detector. In: Proceedings of the IEEE international conference on computer vision (CVPR)
Li Z, Zhou F (2017) FSSD: feature fusion single shot multibox detector. arXiv:171200960
Liang M, Hu X (2015) Recurrent convolutional neural network for object recognition. In: Proceedings of the IEEE international conference on computer vision (CVPR), pp 3367–3375
Liangkui L, Shaoyou W, Zhongxing T (2018) Using deep learning to detect small targets in infrared oversampling images. J Syst Eng Electron 29(5):947–952
Google Scholar
Lienhart R, Maydt J (2002) An extended set of haar-like features for rapid object detection. In: Proceedings of the international conference on image processing (ICIP), pp 1–1
Lin M, Chen Q, Yan S (2014) Network in network. In: International conference on learning representations (ICLR)
Lin T, Goyal P, Girshick R, He K, Dollr P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2999–3007
Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollar P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision (ECCV), pp 740–755
Lin TY, Dollar P, Girshick R, He KM, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 936–944
Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, Van Der Laak JA, Van Ginneken B, Sanchez CI (2017) A survey on deep learning in medical image analysis. Med Image Anal 42:60–88
Google Scholar
Liu C, Zoph B, Neumann M, Shlens J, Hua W, Li L-J, Fei-Fei L, Yuille A, Huang J, Murphy K (2018) Progressive neural architecture search. In: European conference on computer vision (ECCV), pp 19–34
Liu L, Ouyang W, Wang X, Fieguth P, Chen J, Liu X, Pietik?inen M (2018) Deep learning for generic object detection: a survey. International Journal of Computer Vision
Liu S, Huang D, Wang Y (2018) Receptive field block net for accurate and fast object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Liu T, Yuan Z, Sun J, Wang J, Zheng N, Tang X, Shum HY (2010) Learning to detect a salient object. IEEE Transactions on Pattern analysis and Machine Intelligence 33(2):353–367
Google Scholar
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: single shot multibox detector. In: European conference on computer vision (ECCV), pp 21–37
Liu Y, Wang R, Shan S, Chen X (2018) Structure inference net: object detection using scene-level context and instance-level relationships. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6985–6994
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3431–3440
Long Y, Gong Y, Xiao Z, Liu Q (2017) Accurate object localization in remote sensing images based on convolutional neural networks. IEEE Trans Geosci Remote Sens 55(5):2486–2498
Google Scholar
Lotter W, Kreiman G, Cox D (2017) Deep predictive coding networks for video prediction and unsupervised learning. International conference on learning representations (ICLR)
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
Google Scholar
Luo JH, Wu J (2017) An entropy-based pruning method for cnn compression. arXiv:1706.05791
Ma N, Zhang X, Zheng H-T, Sun J (2018) ShuffleNet v2: practical guidelines for efficient cnn architecture design. In: European conference on computer vision (ECCV), pp 116–131
Mao H, Yao S, Tang T, Li B, Yao J, Wang Y (2018) Towards real-time object detection on embedded systems. IEEE Transactions on Emerging Topics in Computing 6(3):417–431
Google Scholar
Mao J, Xiao T, Jiang Y, Cao Z (2017) What can help pedestrian detection?. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3127–3136
Mate K, Zbigniew W, Jakub M, Jacek N, Kyunghyun C (2019) Augmentation for small object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR
Miller GA, Beckwith R, Fellbaum C, Gross D, Miller KJ (1990) Introduction to WordNet: an on-line lexical database. Int J Lexicogr 3(4):235–244
Google Scholar
Moore R, DeNero J (2013) L1 and L2 regularization for multiclass hinge loss models. Symposium on Machine Learning in Speech and Language Processing
Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. In: Proceedings of the international conference on machine learning (ICML), pp 807–814
Najafi H, Genc Y (2010) Fast object detection for augmented reality systems. US Patent Application
Najibi M, Samangouei P, Chellappa R, Davis LS (2017) Ssh: single stage headless face detector. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 4875–4884
Neubeck A, Van Gool L (2006) Efficient non-maximum suppression. In: International conference on pattern recognition (ICPR), pp 850–855
Ni K, Pearce R, Boakye K, Van Essen B, Borth D, Chen B, Wang E (2015) Large-scale deep learning on the YFCC100M dataset. arXiv:150203409
Oquab M, Bottou L, Laptev I, Sivic J (2014) Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1717–1724
Ouyang W, Wang K, Zhu X, Wang X (2017) Learning chained deep features and classifiers for cascade in object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV)
Ouyang W, Wang X (2013) Joint deep learning for pedestrian detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2056–2063
Ouyang W, Wang X, Zeng X, Qiu S, Luo P, Tian Y, Li H, Yang S, Wang Z, Loy C-C (2015) Deepid-net: deformable deep convolutional neural networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2403–2412
Pang J, Chen K, Shi J, Feng H, Ouyang W, Lin D (2019) Libra r-cnn: towards balanced learning for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 821–830
Pang J, Li C, Shi J, Xu Z, Feng H (2019) R²-CNN: fast tiny object detection in large-scale remote sensing images. IEEE Transactions on Geoscience and Remote Sensing
Peng C, Xiao T, Li Z, Jiang Y, Zhang X, Jia K, Yu G, Sun J (2018) MegDet: a large mini-batch object detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6181–6189
Peng J, Sun M, Zhang Z, Tan T, Yan J (2019) POD: practical object detection with scale-sensitive network. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 9607–9616
Peng J, Sun M, ZHANG Z X, Tan T, Yan J (2019) Efficient neural architecture transformation search in channel-level for object detection. In: Advances in neural information processing systems (NeurIPS), pp 14290–14299
Pentina A, Sharmanska V, Lampert CH (2015) Curriculum learning of multiple tasks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5492–5500
Pinheiro PO, Collobert R, Dollar P (2015) Learning to segment object candidates. In: Advances in neural information processing systems (NIPS), pp 1990–1998
PyTorch (2020) Tensors and dynamic neural networks in python with strong GPU acceleration https://pytorch.org/. Software available from pytorch.org
Qiang Z, Mei-Chen Y, Kwang-Ting C, Shai A (2006) Fast human detection using a cascade of histograms of oriented gradients. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1491–1498
Qiu J, Wang J, Yao S, Guo K, Li B, Zhou E, Yu J, Tang T, Xu N, Song S (2016) Going deeper with embedded fpga platform for convolutional neural network. In: Proceedings of the ACM/SIGDA international symposium on field-programmable gate arrays, pp 26–35
Rabinovich A, Vedaldi A, Galleguillos C, Wiewiora E, Belongie S (2007) Objects in context. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1–8
Radford A, Metz L, Chintala S (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In: International conference on learning representations (ICLR)
Rahman S, Khan S, Porikli F (2018) Zero-shot object detection: learning to simultaneously recognize and localize novel concepts. In: European conference on computer vision (ECCV
Rastegari M, Ordonez V, Redmon J, Farhadi A (2016) Xnor-net: imagenet classification using binary convolutional neural networks. In: European conference on computer vision (ECCV), pp 525–542
Redmon J (2013) Darknet: open source neural networks in c. http://pjreddie.com/darknet
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788
Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6517–6525
Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv:180402767
Ren S, He K, Girshick R, Zhang X, Sun J (2016) Object detection networks on convolutional feature maps. IEEE transactions on pattern analysis and machine intelligence 39(7):1476–1481
Google Scholar
Ren SQ, He KM, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems (NIPS), pp 91–99
Ren X, Ramanan D (2013) Histograms of sparse codes for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3246–3253
Ren Y, Zhu C, Xiao S (2018) Small object detection in optical remote sensing images via modified faster R-CNN. Appl Sci 8(5):813
Google Scholar
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536
MATH Google Scholar
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
MathSciNet Google Scholar
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4510–4520
Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2014) Overfeat: integrated recognition localization and detection using convolutional networks. In: The international conference on learning representations (ICLR)
Shelhamer E, Rakelly K, Hoffman J, Darrell T (2016) Clockwork convnets for video semantic segmentation. In: European conference on computer vision (ECCV), pp 852–868
Shen Z, Liu Z, Li J, Jiang Y-G, Chen Y, Xue X (2017) DSOD: learning deeply supervised object detectors from scratch. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1937–1945
Shigeto Y, Suzuki I, Hara K, Shimbo M, Matsumoto Y (2015) Ridge regression, hubness, and zero-shot learning. In: Joint European conference on machine learning and knowledge discovery in databases (ECML PKDD), pp 135–151
Shmelkov K, Schmid C, Alahari K (2017) Incremental learning of object detectors without catastrophic forgetting. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 3420–3429
Shrivastava A, Gupta A, Girshick R (2016) Training region-based object detectors with online hard example mining. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 761–769
Shrivastava A, Sukthankar R, Malik J, Gupta A (2017) Beyond skip connections: top-down modulation for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Singh B, Davis LS (2018) An analysis of scale invariance in object detection SNIP. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3578–3587
Simon M, Milz S, Amende K, Gross H-M (2018) Complex-YOLO: an euler-region-proposal for real-time 3d object detection on point clouds. In: European Conference on Computer Vision Workshops
Simon M, Rodner E, Denzler J (2016) Imagenet pre-trained models with batch normalization. arXiv:161201452
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations (ICLR)
Sivic J, Zisserman A (2003) Video Google: a text retrieval approach to object matching in videos. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1470–1477
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15(1):1929–1958
MathSciNet MATH Google Scholar
Sturm J, Engelhard N, Endres F, Burgard W, Cremers D (2012) A benchmark for the evaluation of RGB-D SLAM systems. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 573–580
Sun B, Saenko K (2014) From virtual to reality: fast adaptation of virtual object detectors to real domains. In: British machine vision conference (BMVC
Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2017) Inception-v4 inception-resnet and the impact of residual connections on learning. In: AAAI conference on artificial intelligence
Szegedy C, Liu W, Jia YQ, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1–9
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2818–2826
Taigman Y, Yang M, Ranzato MA, Wolf L (2014) Deepface: closing the gap to human-level performance in face verification. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1701–1708
Tan M, Chen B, Pang R, Vasudevan V, Le QV (2018) MnasNet: platform-aware neural architecture search for mobile. arXiv:180711626
Tanner G (2020) Object detection API System. https://github.com/tensorflow/models
Teichmann M, Weber M, Zoellner M, Cipolla R, Urtasun R (2018) Multinet: real-time joint semantic reasoning for autonomous driving. In: IEEE intelligent vehicles symposium (IV), pp 1013–1020
TensorFlow (2020) Large-Scale machine learning on heterogeneous distributed systems. https://www.tensorflow.org/. Software available from tensorflow.org
Tian Y, Luo P, Wang X, Tang X (2015) Deep learning strong parts for pedestrian detection. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1904–1912
Tian Z, Shen C, Chen H, He T (2019) FCOS: fully convolutional one-stage object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV
Torralba A, Fergus R, Freeman WT (2008) 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(11):1958–1970
Google Scholar
Tychsen-Smith L, Petersson L (2017) Denet: scalable real-time object detection with directed sparse sampling. Objects in context. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 428–436
Uijlings JRR, van de Sande KEA, Gevers T, Smeulders AWM (2013) Selective search for object recognition. International Journal of Computer Vision (IJCV) 104 (2):154–171
Google Scholar
van de Sande KEA, Uijlings JRR, Gevers T, Smeulders AWM (2011) Segmentation as selective search for object recognition. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1879–1886
Van Etten A (2018) You only look twice: rapid multi-scale object detection in satellite imagery. arXiv:1805.09512
Van Horn G, Mac Aodha O, Song Y, Cui Y, Sun C, Shepard A, Adam H, Perona P, Belongie S (2018) The inaturalist species classification and detection dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 8769–8778
Vedaldi A, Gulshan V, Varma M, Zisserman A (2009) Multiple kernels for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 606–613
Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1–1
Viola P, Jones MJ (2004) Robust real-time face detection. International journal of computer vision 57(2):137–154
Google Scholar
Wagstaff K, Cardie C, Rogers S, Schr?dl S (2001) Constrained k-means clustering with background knowledge. In: International conference on machine learning (ICML), pp 577–584
Wang C, Bai X, Wang S, Zhou J, Ren P (2018) Multiscale visual attention networks for object detection in VHR remote sensing images. IEEE Geosci Remote Sens Lett 16(2):310–314
Google Scholar
Wang H, Wang Q, Gao M, Li P, Zuo W (2018) Multi-scale location-aware kernel representation for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1248–1257
Wang J, Zheng T, Lei P, Bai X (2019) A hierarchical convolution neural network (CNN)-based ship target detection method in spaceborne SAR imagery. Remote Sens 11(6):620
Google Scholar
Wang L, Lu H, Ruan X, Yang MH (2015) Deep networks for saliency detection via local estimation and global search. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3183–3192
Wang L, Wang L, Lu H, Zhang P, Ruan X (2016) Saliency detection with recurrent fully convolutional networks. In: European conference on computer vision (ECCV), pp 825–841
Wang RJ, Li X, Ao S, Ling CX (2018) Pelee: a real-time object detection system on mobile devices. In: Advances in neural information processing systems (NIPS)
Wang S, Zhou Y, Yan J, Deng Z (2018) Fully motion-aware network for video object detection. In: European conference on computer vision (ECCV), pp 542–557
Wang W, Lai Q, Fu H, Shen J, Ling H (2019) Salient object detection in the deep learning era: an in-depth survey. arXiv:1904.09146
Wang WH, Yang J, Xiao JW, Li S, Zhou DX (2015) Face recognition based on deep learning. In: International conference on human-centered computing (HCC), pp 812–820
Wang X, Han T, Yan S (2009) An HOG-LBP human detector with partial occlusion handling. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 32–39
Wang X, Xiao T, Jiang Y, Shao S, Sun J, Shen C (2018) Repulsion loss: detecting pedestrians in a crowd. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 7774–7783
Wang XL, Shrivastava A, Gupta A (2017) A-Fast-RCNN: hard positive generation via adversary for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3039–3048
Wei Y, You X, Li H (2016) Multiscale patch-based contrast measure for small infrared target detection. Pattern Recogn 58:216–226
Google Scholar
Weimer D, Scholz-Reiter B, Shpitalni M (2016) Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection. CIRP Annals-Manufacturing Technology 65(1):417–420
Google Scholar
Weston J, Watkins C (1999) Support vector machines for multi-class pattern recognition. In: European symposium on artificial neural networks (ESANN), pp 219–224
Willmott CJ, Matsuura K (2005) Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research 30(1):79–82
Google Scholar
Wu J, Leng C, Wang Y, Hu Q, Cheng J (2016) Quantized convolutional neural networks for mobile devices. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4820–4828
Wu X, Wu Y, Zhao Y (2016) Binarized neural networks on the imagenet classification task. arXiv:1604.03058
Xiangyu Z, Xinyu Z, Mengxiao L, Jian S (2017) ShuffleNet: an extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6848–6856
Xiao J, Ehinger KA, Hays J, Torralba A, Oliva A (2016) Sun database: exploring a large collection of scene categories. Int J Comput Vis 119(1):3–22
MathSciNet Google Scholar
Xie S, Girshick R, Dollar P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5987–5995
Xu M, Cui L, Lv P, Jiang X, Niu J, Zhou B, Wang M (2018) Mdssd: Multi-scale deconvolutional single shot detector for small objects. arXiv:1805.07009
Xue J, Li JY, Gong YF (2013) Restructuring of deep neural network acoustic models with singular value decomposition. In: Annual conference of the international speech communication association, pp 2364–2368
Yanai K, Kawano Y (2015) Food image recognition using deep convolutional network with pre-training and fine-tuning. In: IEEE international conference on multimedia and expo workshops (ICMEW), pp 1–6
Yang B, Yan J, Lei Z, Li SZ (2014) Aggregate channel features for multi-view face detection. In: IEEE international joint conference on biometrics (IJCB), pp 1–8
Yang S, Luo P, Loy CC, Tang X (2017) Faceness-net: face detection through deep facial part responses. IEEE Transactions on Pattern Analysis and Machine Intelligence 40(8):1845–1859
Google Scholar
Yang TJ, Chen YH, Sze V (2017) Designing energy-efficient convolutional neural networks using energy-aware pruning. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5687–5695
Yang Z, Liu S, Hu H, Wang L, Lin S (2019) Reppoints: point set representation for object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV
Yildirim G, Susstrunk S (2014) FASA: fast, accurate, and size-aware salient object detection. In: Asian conference on computer vision (ACCV), pp 514–528
Yu F, Koltun V (2016) Multi-scale context aggregation by dilated convolutions. In: International conference on learning representations (ICLR)
Yu F, Koltun V, Funkhouser TA (2017) Dilated residual networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 636–644
Yu Y, Zhang J, Huang Y, Zheng S, Ren W, Wang C, Huang K, Tan T (2010) Object detection by context and boosted HOG-LBP. In: European conference on computer vision workshop on PASCAL VOC
Yu Z, Wong HS (2007) A rule based technique for extraction of visual attention regions based on real-time clustering. IEEE Transactions on Multimedia 9(4):766–784
Google Scholar
Zagoruyko S, Lerer A, Lin TY, Pinheiro PO, Gross S, Chintala S, Dollar P (2016) A multipath network for object detection. arXiv:1604.02135
Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision (ECCV), pp 818–833
Zeiler MD, Krishnan D, Taylor GW, Fergus R (2010) Deconvolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2528–2535
Zeiler MD, Taylor GW, Fergus R (2011) Adaptive deconvolutional networks for mid and high level feature learning. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2018–2025
Zeng X, Ouyang W, Yang B, Yan J, Wang X (2016) Gated bi-directional cnn for object detection. In: European conference on computer vision (ECCV), pp 354–369
Zhang H, Chang H, Ma B, Shan S, Chen X (2019) Cascade RetinaNet: maintaining consistency for single-stage object detection. In: The British machine vision conference (BMVC)
Zhang J, Sclaroff S, Lin Z, Shen X, Price B, Mech R (2016) Unconstrained salient object detection via proposal subset optimization. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5733–5742
Zhang J, Zhang T, Dai Y, Harandi M, Hartley R (2018) Deep unsupervised saliency detection: a multiple noisy labeling perspective. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 9029–9038
Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters 23 (10):1499–1503
Google Scholar
Zhang L, Lin L, Liang X, He K (2016) Is faster r-cnn doing well for pedestrian detection?. In: European conference on computer vision (ECCV), pp 443–457
Zhang S, Benenson R, Schiele B (2017) Citypersons: a diverse dataset for pedestrian detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4457–4465
Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Occlusion-aware R-CNN: detecting pedestrians in a crowd. In: European conference on computer vision (ECCV), pp 637–653
Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Single-shot refinement neural network for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4203–4212
Zhang S, Zhu X, Lei Z, Shi H, Wang X, Li SZ (2017) Faceboxes: a CPU real-time face detector with high accuracy. In: IEEE international joint conference on biometrics (IJCB), pp 1–9
Zhang S, Zhu X, Lei Z, Shi H, Wang X, Li SZ (2017) S3fd: single shot scale-invariant face detector. In: Proceedings of the IEEE international conference on computer vision (CVPR), pp 192–201
Zhang X, Wan F, Liu C, Ji R, Ye Q (2019) Freeanchor: learning to match anchors for visual object detection. In: Advances in neural information processing systems (NeurIPS), pp 147–155
Zhang Z, Qiao S, Xie C, Shen W, Wang B, Yuille AL (2018) Single-shot object detection with enriched semantics. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5813–5821
Zhang Z, Zhang C, Shen W, Yao C, Liu W, Bai X (2016) Multi-oriented text detection with fully convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4159–4167
Zhao Q, Sheng T, Wang Y, Tang Z, Chen Y, Cai L, Ling H (2019) M2det: a single-shot object detector based on multi-level feature pyramid network. In: Proceedings of the AAAI conference on artificial intelligence (AAAI), pp 9259–9266
Zhao R, Ouyang W, Li H, Wang X (2015) Saliency detection by multi-context deep learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1265–1274
Zhao Z-Q, Zheng P, Xu S-T, Wu X (2018) Object detection with deep learning: a review. IEEE Transactions on Neural Networks and Learning Systems, pp 1–21
Zhou B, Lapedriza A, Khosla A, Oliva A, Torralba A (2018) Places: a 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 40(6):1452–1464
Google Scholar
Zhou P, Ni B, Geng C, Hu J, Xu Y (2018) Scale-transferrable object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 528–537
Zhou P, Ni BB, Geng C, Hu JG, Xu Y (2018) Scale-transferrable object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 528–537
Zhou X, Zhuo J, Krahenbuhl P (2019) Bottom-up object detection by grouping extreme and center points. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 850–859
Zhu R, Zhang S, Wang X, Wen L, Shi H, Bo L, Mei T (2019) ScratchDet: training single-shot object detectors from scratch. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2268–2277
Zhu X, Dai J, Yuan L, Wei Y (2018) Towards high performance video object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 7210–7218
Zhu X, Wang Y, Dai J, Yuan L, Wei Y (2017) Flow-guided feature aggregation for video object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 408–417
Zhu X, Xiong Y, Dai J, Yuan L, Wei Y (2017) Deep feature flow for video recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4141–4150
Zhu Y, Urtasun R, Salakhutdinov R, Fidler S (2015) SegDeepM: exploiting segmentation and context in deep neural networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4703–4711
Zhu Y, Zhao C, Wang J, Zhao X, Wu Y, Lu H (2017) Couplenet: coupling global structure with local parts for object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 4126–4134
Zitnick CL, Dollar P (2014) Edge boxes: locating object proposals from edges. In: European conference on computer vision (ECCV), pp 391–405
Zoph B, Vasudevan V, Shlens J, Le QV (2018) Learning transferable architectures for scalable image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 8697–8710

Acknowledgments

This work was supported in part by NSFC under grants No. 61876148, No. 61866022, and No. 61703328. This work was also supported in part by the key project of the Trico-Robot plan of NSFC under grant No. 91748208, the key project of Shaanxi province No. 2018ZDCXL-GY-06-07, the Fundamental Research Funds for the Central Universities No. XJJ2018254, and the China Postdoctoral Science Foundation No. 2018M631164.

Author information

Authors and Affiliations

School of Software Engineering, Xi’an Jiaotong University, Xi’an, China
Youzi Xiao, Zhiqiang Tian, Jiachen Yu, Yinshu Zhang & Shuai Liu
Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China
Shaoyi Du & Xuguang Lan

Corresponding author

Correspondence to Zhiqiang Tian.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xiao, Y., Tian, Z., Yu, J. et al. A review of object detection based on deep learning. Multimed Tools Appl 79, 23729–23791 (2020). https://doi.org/10.1007/s11042-020-08976-6

Received: 25 April 2019
Revised: 14 February 2020
Accepted: 22 April 2020
Published: 12 June 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s11042-020-08976-6
