InceptionDepth-wiseYOLOv2: improved implementation of YOLO framework for pedestrian detection

  • Regular Paper
International Journal of Multimedia Information Retrieval

Abstract

Pedestrian detection is one of the most challenging research areas in computer vision because it requires both classifying the image and localizing each pedestrian. Its applications, especially in automated surveillance and robotics, are in high demand. Compared with traditional hand-crafted methods, convolutional neural networks (CNNs) deliver superior detection results. Single-stage detection networks, particularly the You Only Look Once (YOLO) network, achieve satisfactory object detection performance without compromising computation speed and are among the state-of-the-art CNN-based methods. The YOLO framework can also be applied to pedestrian detection. In this work, we propose an improved YOLOv2, called InceptionDepth-wiseYOLOv2. The proposed model uses a modified DarkNet53 engineered for robust feature formation. Three inception depth-wise convolution modules are integrated at varying levels of DarkNet53, yielding a more comprehensive feature representation of the objects in an image. The proposed method is compared with state-of-the-art detection methods, namely FasterRCNN, YOLOv2 with various base networks, YOLOv3, and the Single Shot Multibox Detector. The methods are compared using four performance metrics: the Detection Error Trade-off curve, the Precision–Recall curve, Log Average Miss Rate, and Average Precision. The number of pedestrians detected is also analyzed as a function of pedestrian height. The experimental study uses three benchmark pedestrian datasets: INRIA Pedestrian, PASCAL VOC 2012, and Caltech Pedestrian.
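
Of the metrics named above, Log Average Miss Rate is the least self-explanatory, so a minimal sketch may help. It follows the commonly used definition from the Caltech benchmark evaluation: sample the miss rate at nine false-positives-per-image (FPPI) values spaced evenly in log space between 10^-2 and 10^0, then average the samples in the log domain. The function name, the argument names, and the choice to count a reference point as a miss rate of 1.0 when the curve has not yet reached it are assumptions made for illustration; this is not the authors' evaluation code.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Summarize a miss-rate vs. FPPI curve as a single number.

    fppi and miss_rate are equal-length sequences describing the
    operating points of one detector (hypothetical inputs).
    """
    if len(fppi) != len(miss_rate) or len(fppi) == 0:
        raise ValueError("fppi and miss_rate must be non-empty and equal length")

    # Reference FPPI values, evenly spaced in log space over [1e-2, 1e0].
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]

    sampled = []
    for ref in refs:
        # Lowest miss rate reachable without exceeding the reference FPPI.
        reachable = [m for f, m in zip(fppi, miss_rate) if f <= ref]
        # Assumption: if no operating point is this strict, count it as 1.0.
        sampled.append(min(reachable) if reachable else 1.0)

    # Average in the log domain (geometric mean); clamp to avoid log(0).
    log_mean = sum(math.log(max(m, 1e-10)) for m in sampled) / len(sampled)
    return math.exp(log_mean)


if __name__ == "__main__":
    # Toy curve with three operating points, for illustration only.
    print(log_average_miss_rate([0.005, 0.05, 0.5], [0.8, 0.4, 0.2]))
```

A lower value is better: it means the detector misses fewer pedestrians across the whole range of tolerated false positives, rather than at one hand-picked threshold.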

Availability of data and material

Available.

Funding

Not applicable.

Author information

Corresponding author

Correspondence to U. S. N. Raju.

Ethics declarations

Conflict of interest

Not applicable.

Code availability

Available.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Panigrahi, S., Raju, U.S.N. InceptionDepth-wiseYOLOv2: improved implementation of YOLO framework for pedestrian detection. Int J Multimed Info Retr 11, 409–430 (2022). https://doi.org/10.1007/s13735-022-00239-4

  • DOI: https://doi.org/10.1007/s13735-022-00239-4
