Crowd counting method based on the self-attention residual network

Liu, Yan-Bo; Jia, Rui-Sheng; Liu, Qing-Ming; Zhang, Xing-Li; Sun, Hong-Mei

doi:10.1007/s10489-020-01842-w

Published: 17 August 2020

Applied Intelligence, volume 51, pages 427–440 (2021)

Yan-Bo Liu¹, Rui-Sheng Jia¹,² (ORCID: orcid.org/0000-0003-1612-4764), Qing-Ming Liu¹, Xing-Li Zhang¹,² & Hong-Mei Sun¹,²

Abstract

Estimating crowd density in surveillance videos is an active research topic in computer vision and underpins data processing and analysis in public transport services, commercial passenger flow analysis, public security and other industries. In practical applications, however, pedestrian occlusion and scale changes prevent existing methods from reliably capturing human heads, which reduces counting accuracy. To solve this problem, a crowd counting method based on a self-attention residual network is proposed. First, a multiscale convolution module composed of dilated convolution and deformable convolution is used. The module avoids losing image resolution and shifts some of its sampling points toward the occluded crowd, which addresses the problem of crowd occlusion. Then, a self-attention residual module is designed to score and classify every pixel in the feature map. Corresponding weights are generated, and the crowd scale is determined from these weights, which addresses the problem of crowd scale changes. The algorithm is evaluated on the ShanghaiTech, UCF_CC_50 and WorldExpo’10 datasets. The experimental results show that its mean absolute error (MAE) and mean square error (MSE) are significantly lower than those of the comparison algorithms.
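
The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the definitions below are the standard ones and are an assumption about the paper's usage. Crowd counting papers often report the square root of the mean squared error under the name "MSE", so the sketch returns that value as well. The function name `counting_errors` and the sample counts are hypothetical and are not taken from the paper.

```python
import math


def counting_errors(predicted, actual):
    """Return (MAE, MSE, RMSE) over per-image crowd counts.

    predicted, actual: equal-length sequences of counts, one per test image.
    """
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    # Signed counting error for each image.
    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)   # mean absolute error
    mse = sum(d * d for d in diffs) / len(diffs)    # mean squared error
    rmse = math.sqrt(mse)                           # often reported as "MSE"
    return mae, mse, rmse


# Hypothetical counts for four test images (illustrative only).
predicted = [102.0, 48.0, 310.0, 75.0]
actual = [100.0, 50.0, 300.0, 80.0]
mae, mse, rmse = counting_errors(predicted, actual)
print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}")
```

MAE reflects the average size of the counting error, while the squared variants penalise large errors on individual images more heavily, which is why the two are usually reported together.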

Acknowledgements

The authors are grateful for collaborative funding support from the Natural Science Foundation of Shandong Province, China (ZR2018MEE008), the National Natural Science Foundation of China (51904173) and, in part, the Project of Shandong Province High Educational Science and Technology Program (J18KA307).

Author information

Authors and Affiliations

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, 266590, China
Yan-Bo Liu, Rui-Sheng Jia, Qing-Ming Liu, Xing-Li Zhang & Hong-Mei Sun
Shandong Province Key Laboratory of Wisdom Mine Information Technology, Shandong University of Science and Technology, Qingdao, 266590, China
Rui-Sheng Jia, Xing-Li Zhang & Hong-Mei Sun

Corresponding authors

Correspondence to Rui-Sheng Jia or Hong-Mei Sun.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, YB., Jia, RS., Liu, QM. et al. Crowd counting method based on the self-attention residual network. Appl Intell 51, 427–440 (2021). https://doi.org/10.1007/s10489-020-01842-w

Issue Date: January 2021
