EfficientLiteDet: a real-time pedestrian and vehicle detection algorithm

  • Original Paper
Machine Vision and Applications

Abstract

Safety is the top priority in both unmanned and driver-assistance driving systems, so object detection algorithms must detect captured objects efficiently and accurately in real time. Directly applying existing models to real-time pedestrian and vehicle detection in scenes captured from fast-moving vehicles raises two problems. First, the target scale varies drastically because the vehicle speed changes greatly. Second, the captured images contain both tiny targets and high-density targets, which leads to occlusion between targets. To address these two issues, we propose an efficient, lightweight real-time detection algorithm referred to as EfficientLiteDet. The proposed model builds on Tiny-YOLOv4 and adds one more prediction head to detect multi-scale targets effectively. To detect tiny and occluded targets in dense scenes, we use Transformer Prediction Heads (TPH) in place of the original anchor detection heads. To explore the potential of the self-attention mechanism in TPH, the model integrates the convolutional block attention module (CBAM) to locate the crucial attention regions in scenes with dense targets. To further improve detection performance, we apply several data augmentation strategies during training, including mosaic, mix-up, multi-scale training, and random horizontal flip. Extensive experiments on five challenging pedestrian and vehicle datasets show that EfficientLiteDet performs better in real-time scenarios. On the Pascal VOC 2007, Highway, and Udacity datasets, the proposed model achieves a mean average precision (mAP) of 87.3%, 80.1%, and 77.8%, respectively, improving on the state-of-the-art Tiny-YOLOv4 algorithm by +2.4%, +1.8%, and +2.4%, respectively.
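Two of the augmentation strategies named in the abstract, random horizontal flip and mix-up, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a grayscale image stored as a list of pixel rows and boxes given as (x_min, y_min, x_max, y_max) in pixel-edge coordinates, and the function names `hflip`, `random_hflip`, and `mixup` are hypothetical. The abstract does not say how the mix-up weight is chosen, so it is left as the parameter `lam`.

```python
import random


def hflip(image, boxes):
    """Mirror an image (list of pixel rows) and its boxes left to right.

    Assumption: boxes are (x_min, y_min, x_max, y_max) in pixel-edge
    coordinates, so x ranges over [0, width].
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # A left edge at x becomes a right edge at width - x, and vice versa.
    flipped_boxes = [
        (width - x_max, y_min, width - x_min, y_max)
        for (x_min, y_min, x_max, y_max) in boxes
    ]
    return flipped_image, flipped_boxes


def random_hflip(image, boxes, p=0.5, rng=random):
    """Apply hflip with probability p, otherwise return the inputs as-is."""
    if rng.random() < p:
        return hflip(image, boxes)
    return image, boxes


def mixup(image_a, boxes_a, image_b, boxes_b, lam):
    """Blend two same-sized images pixel-wise and keep the boxes of both.

    lam is the weight of image_a and (1 - lam) the weight of image_b.
    """
    mixed = [
        [lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
    return mixed, list(boxes_a) + list(boxes_b)


if __name__ == "__main__":
    image = [[0, 1, 2, 3], [4, 5, 6, 7]]
    boxes = [(0, 0, 1, 2)]
    print(hflip(image, boxes))
    print(mixup([[0, 0]], [(0, 0, 1, 1)], [[2, 4]], [(1, 0, 2, 1)], 0.5))
```

For detection, mix-up keeps the boxes of both source images because objects from each remain partly visible in the blend. A training pipeline would typically draw `lam` at random for each pair of images, but that choice is an assumption here rather than something the abstract specifies.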

Author information

Corresponding author

Correspondence to Mohammad Farukh Hashmi.

About this article

Cite this article

Murthy, C.B., Hashmi, M.F. & Keskar, A.G. EfficientLiteDet: a real-time pedestrian and vehicle detection algorithm. Machine Vision and Applications 33, 47 (2022). https://doi.org/10.1007/s00138-022-01293-y

  • DOI: https://doi.org/10.1007/s00138-022-01293-y
