
\(\hbox {DA}^2\)Net: a dual attention-aware network for robust crowd counting

  • Special Issue Paper
Multimedia Systems

Abstract

Crowd counting in congested scenes is a crucial yet challenging task in video surveillance and urban security systems. The performance of crowd counting has been greatly boosted by the rapid development of deep learning. However, robust crowd counting in high-density environments with scale variations remains under-explored. To address this problem, we propose a dual attention-aware network (\(\hbox {DA}^2\)Net) for robust crowd counting in dense crowd scenes with scale variations. Specifically, the \(\hbox {DA}^2\)Net consists of two modules, namely a Spatial Attention (SA) module and a Channel Attention (CA) module. The SA module models the spatial dependencies across the whole feature map to locate heads accurately. The CA module models the relations between channel maps and highlights the discriminative information in specific channels, thereby reducing erroneous estimates in background regions. The interaction between the SA and CA modules provides a synergy that facilitates the learning of discriminative features focused on the essential head regions. Experimental results on five benchmark datasets, i.e., ShanghaiTech, UCF_CC_50, UCF-QNRF, WorldExpo’10, and NWPU, demonstrate that the \(\hbox {DA}^2\)Net achieves state-of-the-art performance in both accuracy and robustness.
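The abstract does not give the equations of the SA and CA modules, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a generic self-attention formulation: the feature map is flattened to a C x N matrix (C channels, N = H*W positions), spatial attention re-weights each position by its softmax-normalized similarity to all other positions, and channel attention does the same across channels. The function names, the residual connections, and the element-wise sum used to fuse the two branches are all assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def spatial_attention(feat):
    """feat is C x N. Each position becomes a weighted sum over all
    positions, with weights from position-to-position similarity."""
    pos = transpose(feat)                        # N x C
    affinity = matmul(pos, feat)                 # N x N similarities
    weights = [softmax(r) for r in affinity]     # each row sums to 1
    attended = transpose(matmul(weights, pos))   # back to C x N
    return add(feat, attended)                   # residual (assumed)


def channel_attention(feat):
    """feat is C x N. Each channel becomes a weighted sum over all
    channels, with weights from channel-to-channel similarity."""
    affinity = matmul(feat, transpose(feat))     # C x C similarities
    weights = [softmax(r) for r in affinity]     # each row sums to 1
    attended = matmul(weights, feat)             # C x N
    return add(feat, attended)                   # residual (assumed)


def dual_attention(feat):
    """Fuse both branches by element-wise sum (an assumed fusion rule)."""
    return add(spatial_attention(feat), channel_attention(feat))


if __name__ == "__main__":
    # Toy feature map: 2 channels, 4 flattened positions.
    toy = [[1.0, 0.0, 2.0, 0.5],
           [0.0, 1.0, 0.5, 2.0]]
    for channel in dual_attention(toy):
        print([round(v, 3) for v in channel])
```

In a real network these operations would run on learned projections of convolutional features; the toy version only shows how the two branches attend along different axes of the same feature map.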



Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61801272), the National Natural Science Foundation of Shandong Province (Nos. ZR2021QD041 and ZR2020MF127), and the Shandong Provincial Key Research and Development Program (Major Scientific and Technological Innovation Project) (No. 2019JZZY010119).

Author information

Corresponding author

Correspondence to Mingliang Gao.

Cite this article

Zhai, W., Li, Q., Zhou, Y. et al. \(\hbox {DA}^2\)Net: a dual attention-aware network for robust crowd counting. Multimedia Systems 29, 3027–3040 (2023). https://doi.org/10.1007/s00530-021-00877-4
