MSFFA: a multi-scale feature fusion and attention mechanism network for crowd counting

  • Original article
The Visual Computer

Abstract

Crowd counting has attracted growing attention in the computer vision community in recent years because of its wide applications in public safety and commercial planning. It nevertheless remains a challenging task in realistic scenes owing to large scale variations and complex background interference. In this paper, we propose an efficient end-to-end CNN with multi-scale feature fusion and an attention mechanism, named MSFFA. The network consists of three parts: a front-end that extracts low-level features, a mid-end that fuses multi-scale features, and a back-end that generates the density map. Most significantly, the mid-end stacks three MSFF blocks with residual connections, which lets the network capture large, continuous scale variations and improves information transmission. In addition, a global attention mechanism module is employed to extract effective features in scenes with complex backgrounds. The method is evaluated on three public datasets: ShanghaiTech, UCF-QNRF and UCF_CC_50. Experimental results show that it outperforms some existing advanced approaches, indicating its accuracy and stability.
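
The abstract gives no layer-level details, so the following is only a toy sketch of two ideas it mentions: stacking three blocks with residual connections, and reading a count off a density map. All names (`residual_stack`, `count_from_density`, `halve`) are hypothetical, the `halve` function is a stand-in and not the authors' MSFF operator, and summing the density map to obtain the count is the usual convention in density-based counting rather than something stated in the abstract.

```python
from typing import Callable, List

Grid = List[List[float]]  # a 2-D feature or density map as nested lists


def residual_stack(x: Grid, blocks: List[Callable[[Grid], Grid]]) -> Grid:
    """Apply each block with a skip connection: x <- x + block(x)."""
    for block in blocks:
        fx = block(x)
        x = [[a + b for a, b in zip(row_x, row_f)]
             for row_x, row_f in zip(x, fx)]
    return x


def count_from_density(density: Grid) -> float:
    """Estimated count = sum over all cells of the density map (assumed convention)."""
    return sum(sum(row) for row in density)


def halve(x: Grid) -> Grid:
    """Hypothetical placeholder for an MSFF block; it simply scales the input by 0.5."""
    return [[0.5 * v for v in row] for row in x]


features = [[1.0, 2.0], [3.0, 4.0]]
fused = residual_stack(features, [halve, halve, halve])  # three stacked blocks
density = [[0.25, 0.25], [0.5, 0.0]]                     # made-up density map
estimated_count = count_from_density(density)
```

Because each placeholder block adds half of its input back onto the input, the skip connection keeps the original signal flowing through all three stages, which is the information-transmission benefit the abstract attributes to the residual design.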

Acknowledgements

This work is partially supported by the Public Security Subject Basic Theory Research Project (2021XKZX08), the Fundamental Research Funds for the Central Universities (2021JKF102) and the Open Research Fund of the Public Security Behavioral Science Laboratory (2020SYS16), People’s Public Security University of China.

Author information

Corresponding author

Correspondence to Shuhua Lu.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare relevant to the contents of this article.

About this article

Cite this article

Li, Z., Lu, S., Dong, Y. et al. MSFFA: a multi-scale feature fusion and attention mechanism network for crowd counting. Vis Comput 39, 1045–1056 (2023). https://doi.org/10.1007/s00371-021-02383-0

  • DOI: https://doi.org/10.1007/s00371-021-02383-0
