AMS-CNN: Attentive multi-stream CNN for video-based crowd counting

  • Regular Paper

International Journal of Multimedia Information Retrieval

Abstract

In recent years, video-based crowd counting and density estimation (CCDE) has become essential for crowd analysis. Current approaches rarely exploit spatial–temporal features for CCDE, and they usually take no measures to reduce the influence of the frame background when producing crowd density maps, which results in poorer performance in terms of Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Moreover, weighting each individual feature set by its response to crowd counting, through attention, is neglected. To address these issues, we design an end-to-end trainable attentive multi-stream convolutional neural network (AMS-CNN) for crowd counting. First, a multi-stream CNN (MS-CNN) is designed to obtain crowd density maps. The MS-CNN comprises three streams that fuse deep spatial, temporal, and spatial foreground features taken from different cues of the crowd video dataset: the frames, volumes of frames, and the foregrounds of frames. To improve accuracy, we design three stream-wise attention modules that generate attentive crowd density maps, whose relative average is computed by a relative averaged attentive density-map (RAAD) layer. The relative averaged density map is concatenated with the MS-CNN output and passed through two-stage CNN blocks to produce the final density map. Experiments are conducted on three publicly available crowd video datasets: Mall, UCSD, and Venice. The results are promising, with lower MAE and RMSE than state-of-the-art approaches.
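
The abstract reports results as MAE and RMSE. In density-map-based counting, the predicted count for a frame is commonly taken as the sum of all pixel values of its density map, and the two errors are then computed over the per-frame counts. The preview does not state the paper's exact evaluation protocol, so the following is a generic sketch under that assumption; the function names and the toy maps are illustrative only.

```python
import math
from typing import Sequence

def predicted_count(density_map: Sequence[Sequence[float]]) -> float:
    """Estimated head count = sum (discrete integral) of a 2-D density map."""
    return sum(sum(row) for row in density_map)

def mae_rmse(pred_counts: Sequence[float],
             true_counts: Sequence[float]) -> tuple[float, float]:
    """Mean Absolute Error and Root Mean Squared Error over N frames."""
    if len(pred_counts) != len(true_counts) or not pred_counts:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(pred_counts)
    errors = [p - t for p, t in zip(pred_counts, true_counts)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, rmse

# Toy example: three frames with made-up 2x2 density maps.
maps = [[[0.5, 1.0], [1.5, 2.0]],   # sums to 5.0
        [[1.0, 1.0], [1.0, 1.0]],   # sums to 4.0
        [[0.0, 0.5], [0.5, 2.0]]]   # sums to 3.0
gt = [6.0, 4.0, 1.0]                # made-up ground-truth counts
preds = [predicted_count(m) for m in maps]
mae, rmse = mae_rmse(preds, gt)
print(f"MAE={mae:.3f} RMSE={rmse:.3f}")  # MAE=1.000 RMSE=1.291
```

Because RMSE squares each error before averaging, the single frame that is off by 2 raises RMSE above MAE; this is why the two metrics are usually reported together.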
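
The abstract describes three stream-wise attention modules whose attentive density maps are averaged by the RAAD layer and then concatenated with the MS-CNN output. The preview does not define these layers, so the following is a hypothetical toy sketch in plain Python, not the authors' implementation. It assumes each attention module yields a per-pixel logit map that is passed through a sigmoid and multiplied into that stream's density map, and it models the "relative average" as a plain element-wise mean. All function names and numbers are illustrative, and the two-stage CNN blocks that follow the concatenation are omitted.

```python
import math
from typing import List

Map = List[List[float]]  # one single-channel H x W map

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def attentive_map(density: Map, attention_logits: Map) -> Map:
    """HYPOTHETICAL attention: gate each density value by sigmoid(logit) in (0, 1)."""
    return [[d * sigmoid(a) for d, a in zip(d_row, a_row)]
            for d_row, a_row in zip(density, attention_logits)]

def averaged_attentive_map(maps: List[Map]) -> Map:
    """ASSUMPTION: the RAAD layer is modelled here as an element-wise mean."""
    k = len(maps)
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) / k for j in range(w)] for i in range(h)]

# Toy 2x2 outputs of the three streams; all values are made up.
streams = {
    "spatial":    [[2.0, 0.0], [1.0, 1.0]],
    "temporal":   [[4.0, 2.0], [0.0, 2.0]],
    "foreground": [[0.0, 4.0], [2.0, 0.0]],
}
zero_logits = [[0.0, 0.0], [0.0, 0.0]]  # sigmoid(0) = 0.5 at every pixel
attentive = [attentive_map(d, zero_logits) for d in streams.values()]
raad = averaged_attentive_map(attentive)

ms_cnn_out = [[1.2, 0.8], [0.6, 0.4]]  # hypothetical MS-CNN density map
fused = [ms_cnn_out, raad]             # channel-wise concatenation -> 2 x H x W
print(raad)  # [[1.0, 1.0], [0.5, 0.5]]
```

With all-zero logits every pixel is weighted equally, so the example only shows the data flow. In a trained network the logits would come from learned layers, letting each stream be emphasised or suppressed per pixel before averaging.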

Figs. 1–13

Acknowledgments

The support and the resources provided by the ‘PARAM Shivay Facility’ under the National Supercomputing Mission, Government of India, at the Indian Institute of Technology, Varanasi, are gratefully acknowledged.

Author information

Corresponding author

Correspondence to Santosh Kumar Tripathy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tripathy, S.K., Srivastava, R. AMS-CNN: Attentive multi-stream CNN for video-based crowd counting. Int J Multimed Info Retr 10, 239–254 (2021). https://doi.org/10.1007/s13735-021-00220-7

  • DOI: https://doi.org/10.1007/s13735-021-00220-7