Crowd counting method based on the self-attention residual network

Applied Intelligence

Abstract

Estimating crowd density in surveillance videos is an active research topic in computer vision and underpins data processing and analysis in public transport services, commercial passenger flow analysis, public security and other industries. In practical applications, however, pedestrian occlusion and scale changes mean that existing methods often fail to capture human heads reliably, which reduces counting accuracy. To solve this problem, a crowd counting method based on a self-attention residual network is proposed. First, a multiscale convolution module composed of dilated convolution and deformable convolution is used. Without losing image resolution, the module shifts some of its sampling points toward the occluded parts of the crowd, which addresses the problem of crowd occlusion. Then, a self-attention residual module is designed to score and classify every pixel in the feature map and to generate a corresponding weight for it. The crowd scale is determined from these weights, which addresses the problem of crowd scale changes. The algorithm is evaluated on the ShanghaiTech, UCF_CC_50 and WorldExpo’10 datasets. The experimental results show that the mean absolute error (MAE) and mean square error (MSE) of this algorithm are significantly lower than those of the algorithms it is compared with.
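The two metrics named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name and the sample counts are hypothetical, and it assumes the convention common in crowd counting work, where the reported "MSE" is the square root of the mean squared difference between predicted and true per-image counts. The abstract itself does not state which definition is used.

```python
import math


def count_errors(true_counts, predicted_counts):
    """Return (MAE, MSE) over a set of test images.

    Assumption: MSE is the root of the mean squared count error,
    as is common in crowd counting papers; the source abstract
    does not spell out its definition.
    """
    if len(true_counts) != len(predicted_counts) or len(true_counts) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    # Per-image difference between estimated and annotated head counts.
    diffs = [p - t for t, p in zip(true_counts, predicted_counts)]
    n = len(diffs)
    mae = sum(abs(d) for d in diffs) / n
    mse = math.sqrt(sum(d * d for d in diffs) / n)
    return mae, mse


# Hypothetical per-image head counts, for illustration only.
truth = [100, 250, 40]
estimate = [110, 240, 45]
mae, mse = count_errors(truth, estimate)
print(f"MAE = {mae:.2f}, MSE = {mse:.2f}")
```

Under this reading, MAE reflects the average size of the counting error, while MSE penalizes large errors on individual images more heavily, which is why the two are usually reported together.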

Acknowledgements

The authors are grateful for collaborative funding support from the Natural Science Foundation of Shandong Province, China (ZR2018MEE008), the National Natural Science Foundation of China (51904173) and, in part, the Project of Shandong Province High Educational Science and Technology Program (J18KA307).

Author information

Corresponding authors

Correspondence to Rui-Sheng Jia or Hong-Mei Sun.

Cite this article

Liu, YB., Jia, RS., Liu, QM. et al. Crowd counting method based on the self-attention residual network. Appl Intell 51, 427–440 (2021). https://doi.org/10.1007/s10489-020-01842-w
