Abstract
Pedestrian detection is one of the most challenging research areas in computer vision, as it requires both classifying image content and localizing each pedestrian. It is in high demand in applications such as automated surveillance and robotics. Convolutional neural networks (CNNs) achieve better detection results than traditional methods based on hand-crafted features. Single-stage detection networks, particularly You Only Look Once (YOLO), achieve satisfactory object detection accuracy without sacrificing computational speed and are among the state-of-the-art CNN-based methods. The YOLO framework can also be applied to pedestrian detection. In this work, we propose an improved YOLOv2, called InceptionDepth-wiseYOLOv2. The proposed model uses a modified DarkNet53 backbone designed to produce robust features. Three inception depth-wise convolution modules are integrated at different depths of DarkNet53, yielding a more comprehensive feature representation of each object in the image. The proposed method is compared with state-of-the-art detection methods, namely Faster R-CNN, YOLOv2 with various base networks, YOLOv3, and the Single Shot MultiBox Detector (SSD). The methods are compared using the detection error trade-off (DET) curve, the precision–recall curve, the log-average miss rate, and average precision. The number of pedestrians detected is also analyzed as a function of pedestrian height. The experiments use three benchmark datasets: INRIA Pedestrian, PASCAL VOC 2012, and Caltech Pedestrian.
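The abstract lists the DET curve and the log-average miss rate (LAMR) among its metrics without defining them. The sketch below follows the convention commonly used with the Caltech Pedestrian benchmark: build the miss rate versus false positives per image (FPPI) curve, sample the miss rate at nine FPPI reference points evenly spaced in log space between 10^-2 and 10^0, and take the geometric mean of those samples. The function names `det_curve` and `log_average_miss_rate` are hypothetical and are not the authors' code. The sketch also assumes each detection has already been matched to ground truth (e.g. at IoU ≥ 0.5), so only a per-detection true/false-positive flag is needed.

```python
import numpy as np


def det_curve(scores, is_tp, num_gt, num_images):
    """Hypothetical helper: miss rate vs. FPPI, sweeping the score threshold.

    Assumes detections were already matched to ground truth, so `is_tp`
    holds one True/False flag per detection.
    """
    # Sort detections by descending confidence so FPPI is non-decreasing.
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_tp, dtype=bool)[order]
    cum_tp = np.cumsum(tp)    # true positives kept at each threshold
    cum_fp = np.cumsum(~tp)   # false positives kept at each threshold
    fppi = cum_fp / num_images
    miss_rate = 1.0 - cum_tp / num_gt
    return fppi, miss_rate


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Hypothetical helper: LAMR over FPPI in [1e-2, 1e0] (Caltech-style)."""
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)
    # Reference FPPI values evenly spaced in log space: 10^-2, 10^-1.75, ..., 10^0.
    refs = np.logspace(-2.0, 0.0, num_points)
    # If no curve point lies at or below a reference FPPI, count a miss rate of 1.
    sampled = np.ones(num_points)
    for i, ref in enumerate(refs):
        idx = np.nonzero(fppi <= ref)[0]
        if idx.size > 0:
            # Use the last (lowest-threshold) point that still satisfies FPPI <= ref.
            sampled[i] = miss_rate[idx[-1]]
    # Geometric mean; clipping avoids log(0) when the miss rate is exactly zero.
    return float(np.exp(np.mean(np.log(np.maximum(sampled, 1e-10)))))


# Usage: three detections on two images containing four pedestrians in total.
fppi, mr = det_curve([0.9, 0.8, 0.7], [True, False, True], num_gt=4, num_images=2)
lamr = log_average_miss_rate(fppi, mr)
```

Using the geometric mean over log-spaced FPPI points means the low-FPPI region, where a pedestrian detector is usually operated, weighs as much as the high-FPPI region. Lower LAMR is better.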
Availability of data and material
Available.
Funding
Not applicable.
Ethics declarations
Conflict of interest
Not applicable.
Code availability
Available.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Panigrahi, S., Raju, U.S.N. InceptionDepth-wiseYOLOv2: improved implementation of YOLO framework for pedestrian detection. Int J Multimed Info Retr 11, 409–430 (2022). https://doi.org/10.1007/s13735-022-00239-4
DOI: https://doi.org/10.1007/s13735-022-00239-4