MSCANet: Adaptive Multi-scale Context Aggregation Network for Congested Crowd Counting

  • Conference paper
MultiMedia Modeling (MMM 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12573)

Abstract

Crowd counting has achieved significant progress with deep convolutional neural networks. However, most existing methods do not fully exploit spatial context information and therefore struggle to count congested crowds accurately. To this end, we propose a novel Adaptive Multi-scale Context Aggregation Network (MSCANet), in which a Multi-scale Context Aggregation module (MSCA) is designed to adaptively extract and aggregate contextual information from different scales of the crowd. More specifically, for each input, we first extract multi-scale context features via atrous convolution layers. Then, the multi-scale context features are progressively aggregated via channel attention to enrich the crowd representations at different scales. Finally, a \(1\times 1\) convolution layer is applied to regress the crowd density. We perform extensive experiments on three public datasets (ShanghaiTech Part_A, UCF_CC_50 and UCF-QNRF), and the experimental results demonstrate the superiority of our method over current state-of-the-art methods.
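
The three steps named in the abstract (atrous convolution at several rates, attention-weighted aggregation, and a \(1\times 1\) regression layer) can be illustrated with a small pure-Python sketch. This is not the authors' implementation. The function names, the dilation rates (1, 2, 3), the shared 3x3 kernel, the single-channel input, and the use of a softmax over globally pooled responses in place of a learned channel-attention block are all assumptions made for illustration, and the fusion here happens in one step rather than progressively.

```python
import math

def dilated_conv2d(image, kernel, rate):
    """3x3 atrous convolution with zero padding; output keeps the input size."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    # Kernel taps are spaced `rate` pixels apart, which widens
                    # the receptive field without adding parameters.
                    sy, sx = y + (ky - 1) * rate, x + (kx - 1) * rate
                    if 0 <= sy < h and 0 <= sx < w:
                        acc += kernel[ky][kx] * image[sy][sx]
            out[y][x] = acc
    return out

def channel_attention(features):
    """Assumed stand-in for channel attention: global average pooling per
    feature map, followed by a softmax that turns pooled values into weights."""
    pooled = [sum(map(sum, f)) / (len(f) * len(f[0])) for f in features]
    peak = max(pooled)  # subtract the max for numerical stability
    exps = [math.exp(p - peak) for p in pooled]
    total = sum(exps)
    return [e / total for e in exps]

def msca_sketch(image, kernel, rates=(1, 2, 3), head_weight=1.0):
    """Hypothetical single-step version of the extract/aggregate/regress flow."""
    # 1) Multi-scale context features, one map per dilation rate.
    features = [dilated_conv2d(image, kernel, r) for r in rates]
    # 2) Attention-weighted aggregation of the per-scale maps.
    weights = channel_attention(features)
    h, w = len(image), len(image[0])
    fused = [[sum(wt * f[y][x] for wt, f in zip(weights, features))
              for x in range(w)] for y in range(h)]
    # 3) On a single channel, a 1x1 convolution is a per-pixel scalar multiply.
    density = [[head_weight * v for v in row] for row in fused]
    return density, weights

# Usage: a 5x5 image with one bright pixel and a box-filter kernel.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 1.0
kernel = [[1.0 / 9] * 3 for _ in range(3)]
density, weights = msca_sketch(image, kernel)
estimated_count = sum(map(sum, density))  # a count is the integral of the density map
print(weights, estimated_count)
```

In a trained network the kernels, the attention parameters, and the \(1\times 1\) layer would be learned from data and would operate on many channels. The sketch only shows how dilation changes which pixels each output sees and how per-scale weights blend the resulting maps.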

Acknowledgements

This work was supported by the Natural Science Foundation of Shanghai under Grant No. 19ZR1455300 and by the National Natural Science Foundation of China under Grant No. 61806126.

Author information

Corresponding author

Correspondence to Huailin Zhao.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Y., Zhao, H., Zhou, F., Zhang, Q., Shi, Y., Liang, L. (2021). MSCANet: Adaptive Multi-scale Context Aggregation Network for Congested Crowd Counting. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_1

  • DOI: https://doi.org/10.1007/978-3-030-67835-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67834-0

  • Online ISBN: 978-3-030-67835-7

  • eBook Packages: Computer Science, Computer Science (R0)
