Recent progresses on object detection: a brief review

Multimedia Tools and Applications

Abstract

Object detection, which aims to locate objects from a large number of specific categories in natural images, is a fundamental but challenging task in computer vision. Recent years have seen significant progress in object detection with deep convolutional neural networks (CNNs), mainly owing to their robust feature representation ability. This paper provides a simple but comprehensive survey of recent improvements in object detection in the deep learning era. More than 100 key contributions are examined along five directions: architecture diagram, contextual reasoning, multi-layer exploitation, training strategy, and other advances, such as real-time object detectors and work that borrows ideas from RNNs and GANs. We present comprehensive yet straightforward experimental comparisons on widely used benchmarks and metrics. The review concludes by outlining promising trends for future research.

Author information

Corresponding author

Correspondence to Xianggong Hong.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, H., Hong, X. Recent progresses on object detection: a brief review. Multimed Tools Appl 78, 27809–27847 (2019). https://doi.org/10.1007/s11042-019-07898-2

  • DOI: https://doi.org/10.1007/s11042-019-07898-2
