A novel backbone architecture for pedestrian detection based on the human visual system

Saeidi, Mahmoud; Arabsorkhi, Abouzar

doi:10.1007/s00371-021-02280-6

Original article
Published: 17 August 2021

Volume 38, pages 2223–2237 (2022)
The Visual Computer

Abstract

Pedestrian detection using deep convolutional neural networks (DCNNs) has made a breakthrough in the last few years, and researchers have proposed various DCNN architectures to detect pedestrians more accurately. Most of these architectures use a backbone taken from earlier state-of-the-art classification architectures and simply adapt it to the detection task. Their performance is then improved through heuristics, trial and error, and sometimes a grid search over a space of candidate architectures. However, no previous work has first studied the human visual detection system and then proposed a backbone architecture based on it. In this paper, we first review the state-of-the-art methods, then give a preliminary overview of the visual detection system in the human brain, and finally propose an architecture based on it. The intuition behind our idea can explain the evolution of detection architectures from the first fully convolutional neural networks (FCNNs), such as Faster R-CNN, to today's state-of-the-art methods, and gives a better understanding of why some architectures outperform others. An advantage of our idea is that it can be applied to most existing architectures with some modifications, although this is much easier for some methods than for others. We implemented our idea on top of an anchor-free method called CSP and achieved better performance on Caltech-USA and INRIA, two of the most popular pedestrian detection datasets.
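For readers unfamiliar with the anchor-free CSP baseline (Liu et al., "High-level semantic feature detection"), the sketch below shows one way its style of output can be decoded into boxes. Instead of regressing from anchors, the detector predicts a per-location center score and a scale (height) value, and each box is read off the locations whose score passes a threshold. This is an illustration under stated assumptions, not the authors' backbone or the official CSP code. The stride of 4 and the fixed width/height ratio of 0.41 are values commonly used with CSP. The scale map is assumed to store log(height / stride), and offset prediction is omitted (cell centers are used instead). The functions `decode_center_scale`, `iou`, and `nms` are hypothetical helpers.

```python
import math
from typing import List, Tuple

# (x1, y1, x2, y2, score) in input-image pixels
Box = Tuple[float, float, float, float, float]


def decode_center_scale(
    center: List[List[float]],
    log_height: List[List[float]],
    stride: int = 4,             # assumed downsampling factor of the head
    score_thresh: float = 0.5,
    aspect_ratio: float = 0.41,  # assumed fixed width / height for pedestrians
) -> List[Box]:
    """Turn a center-score map and a log-height map into candidate boxes."""
    boxes: List[Box] = []
    for i, row in enumerate(center):
        for j, score in enumerate(row):
            if score < score_thresh:
                continue
            # Assumption: the scale map stores log(height / stride).
            h = math.exp(log_height[i][j]) * stride
            w = aspect_ratio * h
            # Map the feature-map cell back to image coordinates
            # (cell center; CSP's offset branch is omitted here).
            cx = (j + 0.5) * stride
            cy = (i + 0.5) * stride
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thresh: float = 0.5) -> List[Box]:
    """Greedy non-maximum suppression: keep the highest-scoring boxes first."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept
```

Neighbouring cells around a true center often all pass the threshold, so a suppression step such as the greedy NMS above is needed to keep one box per pedestrian. With a toy 3×3 map where two adjacent cells score 0.9 and 0.6 and every log-height equals log(10), both cells decode to 40 × 16.4 pixel boxes that overlap with IoU of about 0.61. NMS then keeps only the 0.9 box.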

Availability of data and materials

The image datasets used to support the findings of this study can be downloaded from the public websites cited in the article.

References

Paisitkriangkrai, S., Shen, C., van den Hengel, A.: Pedestrian detection with spatially pooled features and structured ensemble learning. IEEE Trans. Pattern Anal. Mach. Intell. 38(6), 1243–1257 (2015)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: OverFeat: integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229 (2013)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
Liu, Z., Chen, Z., Li, Z., Hu, W.: An efficient pedestrian detection method based on YOLOv2. Math. Probl. Eng. 2018, 1–10 (2018)
Du, X., El-Khamy, M., Lee, J., Davis, L.: Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 953–961. IEEE (2017)
Perreault, H., Bilodeau, G.-A., Saunier, N., Héritier, M.: SpotNet: self-attention multi-task network for object detection. In: 2020 17th Conference on Computer and Robot Vision (CRV), pp. 230–237. IEEE (2020)
Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)
Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 764–773 (2017)
Jaderberg, M., Simonyan, K., Zisserman, A.: Spatial transformer networks. In: Advances in Neural Information Processing Systems, pp. 2017–2025 (2015)
Wang, X., Xiao, T., Jiang, Y., Shao, S., Sun, J., Shen, C.: Repulsion loss: detecting pedestrians in a crowd. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7774–7783 (2018)
Pang, Y., Xie, J., Khan, M.H., Anwer, R.M., Khan, F.S., Shao, L.: Mask-guided attention network for occluded pedestrian detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4967–4975 (2019)
Li, J., Liang, X., Shen, S., Xu, T., Feng, J., Yan, S.: Scale-aware Fast R-CNN for pedestrian detection. IEEE Trans. Multimedia 20(4), 985–996 (2017)
Singh, B., Davis, L.S.: An analysis of scale invariance in object detection: SNIP. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3578–3587 (2018)
Singh, B., Najibi, M., Davis, L.S.: SNIPER: efficient multi-scale training. In: Advances in Neural Information Processing Systems, pp. 9310–9320 (2018)
Liu, Y., Wang, Y., Wang, S., Liang, T., Zhao, Q., Tang, Z., Ling, H.: CBNet: a novel composite backbone network architecture for object detection. In: Association for the Advancement of Artificial Intelligence (AAAI), pp. 11653–11660 (2020)
Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6154–6162 (2018)
He, K., Girshick, R., Dollár, P.: Rethinking ImageNet pre-training. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4918–4927 (2019)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: single shot multibox detector. In: European Conference on Computer Vision, pp. 21–37. Springer (2016)
Law, H., Deng, J.: CornerNet: detecting objects as paired keypoints. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 734–750 (2018)
Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: CenterNet: keypoint triplets for object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 6569–6578 (2019)
Liu, W., Liao, S., Ren, W., Hu, W., Yu, Y.: High-level semantic feature detection: a new perspective for pedestrian detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5187–5196 (2019)
Song, T., Sun, L., Xie, D., Sun, H., Pu, S.: Small-scale pedestrian detection based on topological line localization and temporal feature aggregation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 536–551 (2018)
Kong, T., Sun, F., Liu, H., Jiang, Y., Li, L., Shi, J.: FoveaBox: beyond anchor-based object detection. IEEE Trans. Image Process. 29, 7389–7398 (2020)
Zhang, L., Lin, L., Liang, X., He, K.: Is Faster R-CNN doing well for pedestrian detection? In: European Conference on Computer Vision, pp. 443–457. Springer (2016)
Wang, S., Cheng, J., Liu, H., Tang, M.: PCN: part and context information for pedestrian detection with CNNs. arXiv preprint arXiv:1804.04483 (2018)
Lin, C., Lu, J., Wang, G., Zhou, J.: Graininess-aware deep feature learning for pedestrian detection. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 732–747 (2018)
Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500 (2017)
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Dollár, P., Wojek, C., Schiele, B., Perona, P.: Pedestrian detection: an evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell. 34(4), 743–761 (2011)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp. 886–893. IEEE (2005)
Szeliski, R.: Computer Vision: Algorithms and Applications. Springer (2010)
Kanade, T.: Three-Dimensional Machine Vision, vol. 21. Springer (2012)
Turk, M.A., Pentland, A.P.: Face recognition using eigenfaces. In: Proceedings. 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 586–591. IEEE Computer Society (1991)
Sebe, N., Cohen, I., Garg, A., Huang, T.S.: Machine Learning in Computer Vision, vol. 29. Springer (2005)
Yang, J., Liu, L., Jiang, T., Fan, Y.: A modified Gabor filter design method for fingerprint image enhancement. Pattern Recogn. Lett. 24(12), 1805–1817 (2003)
Viola, P., Jones, M.: Robust real-time face detection. In: Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), vol. 2, p. 747. IEEE (2001)
Wojek, C., Schiele, B.: A performance evaluation of single and multi-feature people detection. In: Joint Pattern Recognition Symposium, pp. 82–91. Springer (2008)
Marin, J., Vázquez, D., López, A.M., Amores, J., Leibe, B.: Random forests of local experts for pedestrian detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2592–2599 (2013)
Dollár, P., Appel, R., Belongie, S., Perona, P.: Fast feature pyramids for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 36(8), 1532–1545 (2014)
Zhang, S., Bauckhage, C., Cremers, A.B.: Informed Haar-like features improve pedestrian detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 947–954 (2014)
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)
Li, F.-F., Krishna, R., Xu, D.: CNN architectures. Stanford CS231n, Lecture 9. http://cs231n.stanford.edu/slides/2020/lecture_9.pdf. Accessed 1 October 2020
Bui, H.M., Lech, M., Cheng, E., Neville, K., Burnett, I.S.: Object recognition using deep convolutional features transformed by a recursive network structure. IEEE Access 4, 10059–10066 (2016)
Vaillant, R., Monrocq, C., Le Cun, Y.: Original approach for the localisation of objects in images. IEE Proc.-Vis., Image Signal Process. 141(4), 245–250 (1994)
Uijlings, J.R., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)
Baars, B., Gage, N.M.: Fundamentals of Cognitive Neuroscience: A Beginner's Guide. Academic Press (2013)
Gage, N.M., Baars, B.: Fundamentals of Cognitive Neuroscience: A Beginner's Guide. Academic Press (2018)
Schieber, M., Squire, L., Baker, J.: Descending control of movement. In: Fundamental Neuroscience, 3rd edn. Academic Press (2008)
Squire, L.R., Bloom, F.E., McConnell, S.K., Roberts, J.L., Spitzer, N.C., Zigmond, M.J. (eds.): Fundamental Neuroscience, 2nd edn. Elsevier Science, San Diego (2003)
Zhang, S., Benenson, R., Schiele, B.: CityPersons: a diverse dataset for pedestrian detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3221 (2017)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Zhang, S., Benenson, R., Omran, M., Hosang, J., Schiele, B.: Towards reaching human performance in pedestrian detection. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 973–986 (2017)
Zhang, S., Benenson, R., Omran, M., Hosang, J., Schiele, B.: How far are we from solving pedestrian detection? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1259–1267 (2016)

Acknowledgements

The authors would like to thank the Iran Telecommunication Research Center for its support throughout this research.

Author information

Authors and Affiliations

Iran Telecommunication Research Center, Tehran, Iran
Mahmoud Saeidi & Abouzar Arabsorkhi

Contributions

All authors took part in the discussion of the work described in this paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mahmoud Saeidi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Saeidi, M., Arabsorkhi, A. A novel backbone architecture for pedestrian detection based on the human visual system. Vis Comput 38, 2223–2237 (2022). https://doi.org/10.1007/s00371-021-02280-6

Accepted: 05 August 2021
Published: 17 August 2021
Issue Date: June 2022
DOI: https://doi.org/10.1007/s00371-021-02280-6
