InceptionDepth-wiseYOLOv2: improved implementation of YOLO framework for pedestrian detection

Panigrahi, Sweta; Raju, U. S. N.

doi:10.1007/s13735-022-00239-4

Regular Paper
Published: 11 May 2022

International Journal of Multimedia Information Retrieval, volume 11, pages 409–430 (2022)


Abstract

Pedestrian detection is one of the most challenging research areas in computer vision, as it involves both classifying the image and localizing the pedestrian. Its applications, especially in automated surveillance and robotics, are in high demand. Compared to traditional hand-crafted methods, convolutional neural networks (CNNs) achieve superior detection results. Single-stage detection networks, particularly the You Only Look Once (YOLO) network, have attained satisfactory performance in object detection without compromising computation speed and are among the state-of-the-art CNN-based methods. The YOLO framework can be applied to pedestrian detection as well. In this work, we propose an improved YOLOv2, called InceptionDepth-wiseYOLOv2. The proposed model uses a modified DarkNet53 engineered for robust feature formation. Three inception depth-wise convolution modules are integrated at varying levels in DarkNet53, yielding a comprehensive feature representation of an object in the image. The proposed method is compared with state-of-the-art detection methods, namely FasterRCNN, YOLOv2 with various base networks, YOLOv3, and Single Shot Multibox Detector. The Detection Error Trade-off curve, Precision–Recall curve, Log Average Miss Rate, and Average Precision are used as performance metrics to compare the methods. The number of pedestrians detected is also analyzed with respect to their height. The experimental study used three benchmark pedestrian datasets: INRIA Pedestrian, PASCAL VOC 2012, and Caltech Pedestrian.
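
To make the Log Average Miss Rate metric named in the abstract concrete, the following is a minimal sketch of one common way to compute it, not the authors' evaluation code. It assumes the convention popularized by the Caltech Pedestrian benchmark: the miss rate is sampled at nine false-positives-per-image (FPPI) reference points spaced evenly in log space between 10^-2 and 10^0, and the geometric mean of those samples is reported. The function name `log_average_miss_rate`, its parameters, and the sample curve values are hypothetical and chosen only for illustration.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9, lo=1e-2, hi=1.0):
    """Geometric mean of miss rates sampled at log-spaced FPPI reference points.

    `fppi` and `miss_rate` are parallel sequences describing a DET curve,
    sorted by increasing FPPI. Assumed convention: for each reference point,
    use the miss rate at the largest FPPI that does not exceed it; if the
    curve has no such point, count the miss rate as 1.0.
    """
    if len(fppi) != len(miss_rate) or len(fppi) == 0:
        raise ValueError("fppi and miss_rate must be non-empty and of equal length")

    # Reference points evenly spaced in log10 between lo and hi.
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / (num_points - 1)
    refs = [10 ** (log_lo + i * step) for i in range(num_points)]

    sampled = []
    for ref in refs:
        mr = 1.0  # default when the curve starts above this reference point
        for f, m in zip(fppi, miss_rate):
            if f > ref:
                break
            mr = m
        sampled.append(max(mr, 1e-10))  # clamp to avoid log(0)

    # Geometric mean, computed as exp(mean(log(x))).
    return math.exp(sum(math.log(m) for m in sampled) / num_points)


# Illustrative DET curve (made-up values, not results from the paper).
example_fppi = [0.005, 0.05, 0.5]
example_miss = [0.8, 0.4, 0.2]
print(log_average_miss_rate(example_fppi, example_miss))
```

Averaging in log space keeps the low-FPPI end of the curve from being swamped by the high-FPPI end, so a lower value indicates a better detector across the whole operating range.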

Availability of data and material

Available.

Funding

Not applicable.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Warangal, Warangal, Telangana, 506004, India
Sweta Panigrahi & U. S. N. Raju

Corresponding author

Correspondence to U. S. N. Raju.

Ethics declarations

Conflict of interest

Not applicable.

Code availability

Available.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Panigrahi, S., Raju, U.S.N. InceptionDepth-wiseYOLOv2: improved implementation of YOLO framework for pedestrian detection. Int J Multimed Info Retr 11, 409–430 (2022). https://doi.org/10.1007/s13735-022-00239-4

Received: 18 August 2021
Revised: 14 April 2022
Accepted: 22 April 2022
Published: 11 May 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s13735-022-00239-4
