MSCANet: Adaptive Multi-scale Context Aggregation Network for Congested Crowd Counting

Zhang, Yani; Zhao, Huailin; Zhou, Fangbo; Zhang, Qing; Shi, Yanjiao; Liang, Lanjun

doi:10.1007/978-3-030-67835-7_1

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12573)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Crowd counting has achieved significant progress with deep convolutional neural networks. However, most existing methods do not fully utilize spatial context information, which makes it difficult for them to count congested crowds accurately. To this end, we propose a novel Adaptive Multi-scale Context Aggregation Network (MSCANet), in which a Multi-scale Context Aggregation module (MSCA) is designed to adaptively extract and aggregate contextual information from different scales of the crowd. More specifically, for each input, we first extract multi-scale context features via atrous convolution layers. Then, the multi-scale context features are progressively aggregated via channel attention to enrich the crowd representations at different scales. Finally, a \(1\times 1\) convolution layer is applied to regress the crowd density. We perform extensive experiments on three public datasets: ShanghaiTech Part_A, UCF_CC_50 and UCF-QNRF, and the experimental results demonstrate the superiority of our method over current state-of-the-art methods.
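
The three stages named in the abstract (atrous convolution at several rates, channel attention, and a \(1\times 1\) convolution that regresses density) can be illustrated with a toy sketch in plain Python. It is not the authors' implementation: the function names, the dilation rates `(1, 2, 3)`, the fixed box kernel, the mixing weights `mix`, and the parameter-free sigmoid gate are all assumptions made for illustration, and the progressive aggregation described in the abstract is replaced here by a single gated sum.

```python
import math

def dilated_conv3x3(img, kernel, rate):
    """3x3 atrous convolution with zero padding; output keeps the input size."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for ky in range(3):
                for kx in range(3):
                    # Kernel taps are spaced `rate` pixels apart.
                    yy, xx = y + (ky - 1) * rate, x + (kx - 1) * rate
                    if 0 <= yy < h and 0 <= xx < w:
                        out[y][x] += kernel[ky][kx] * img[yy][xx]
    return out

def channel_gates(features):
    """One gate per feature map: global average pool followed by a sigmoid.
    Assumption: a learned attention block would add trainable layers here."""
    gates = []
    for fmap in features:
        mean = sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        gates.append(1.0 / (1.0 + math.exp(-mean)))
    return gates

def density_sketch(img, rates=(1, 2, 3), mix=(0.5, 0.3, 0.2)):
    """Hypothetical pipeline: multi-scale features, gating, 1x1 convolution."""
    box = [[1.0 / 9.0] * 3 for _ in range(3)]  # fixed kernel, not learned
    features = [dilated_conv3x3(img, box, r) for r in rates]
    gates = channel_gates(features)
    h, w = len(img), len(img[0])
    # A 1x1 convolution is a per-pixel weighted sum across channels.
    return [[sum(m * g * f[y][x] for m, g, f in zip(mix, gates, features))
             for x in range(w)] for y in range(h)]

image = [[1.0] * 5 for _ in range(5)]
density = density_sketch(image)
count = sum(sum(row) for row in density)  # count = integral of the density map
print(f"estimated count: {count:.3f}")
```

Summing the density map to obtain a count follows the usual density-regression formulation of crowd counting; in a trained network the kernels, gates, and mixing weights would be learned rather than fixed.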

Acknowledgements

This work is supported by the Natural Science Foundation of Shanghai under Grant No. 19ZR1455300 and the National Natural Science Foundation of China under Grant No. 61806126.

Author information

Authors and Affiliations

School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
Yani Zhang, Huailin Zhao, Fangbo Zhou & Lanjun Liang
School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai, China
Yani Zhang, Qing Zhang & Yanjiao Shi

Corresponding author

Correspondence to Huailin Zhao.

Editor information

Editors and Affiliations

Charles University, Prague, Czech Republic
Jakub Lokoč
Charles University, Prague, Czech Republic
Tomáš Skopal
Klagenfurt University, Klagenfurt, Austria
Klaus Schoeffmann
CERTH-ITI, Thessaloniki, Greece
Vasileios Mezaris
Renmin University of China, Beijing, China
Xirong Li
CERTH-ITI, Thessaloniki, Greece
Stefanos Vrochidis
Queen Mary University of London, London, UK
Ioannis Patras

About this paper

Cite this paper

Zhang, Y., Zhao, H., Zhou, F., Zhang, Q., Shi, Y., Liang, L. (2021). MSCANet: Adaptive Multi-scale Context Aggregation Network for Congested Crowd Counting. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_1

DOI: https://doi.org/10.1007/978-3-030-67835-7_1
Published: 21 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67834-0
Online ISBN: 978-3-030-67835-7
eBook Packages: Computer Science, Computer Science (R0)