
Abstract

This work considers supervised learning to count from images and their corresponding point annotations. Whereas density-based counting methods typically use the point annotations only to create Gaussian density maps, which act as the supervision signal, the starting point of this work is that point annotations have counting potential beyond density map generation. We introduce two methods that repurpose the available point annotations to enhance counting performance. The first is a counting-specific augmentation that leverages point annotations to simulate occluded objects in both the input image and the density map, improving the network’s robustness to occlusion. The second, foreground distillation, generates foreground masks from the point annotations and trains an auxiliary network on images with blacked-out backgrounds, so that it learns to extract foreground counting knowledge without interference from the background. Both methods can be seamlessly integrated with existing counting advances and are adaptable to different loss functions. We demonstrate the complementary effects of the two approaches, achieving robust counting results even in challenging scenarios such as background clutter, occlusion, and varying crowd densities. Our approach achieves strong counting results on multiple datasets, including ShanghaiTech Part_A and Part_B, UCF_QNRF, JHU-Crowd++, and NWPU-Crowd. Code is available at https://github.com/shizenglin/Counting-with-Focus-for-Free.
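To make the two point-annotation-driven components concrete, the sketch below illustrates, under our own assumptions rather than the authors' released code, how such components could be realized with NumPy/SciPy: a Gaussian density map built from point annotations, an occlusion-style augmentation that blacks out patches around annotated points in both the image and the density map, and a disc-shaped foreground mask whose complement can be blacked out to train the auxiliary distillation network. The parameters `sigma`, `occlude_fraction`, `size`, and `radius` are hypothetical; see the linked repository for the actual implementation.

```python
# Minimal sketch (assumed, not the authors' code) of repurposing point annotations.
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(points, shape, sigma=4.0):
    """Place a unit impulse at each annotated point and blur with a Gaussian,
    so the map integrates to (approximately) the object count."""
    dmap = np.zeros(shape, dtype=np.float32)
    for x, y in points:
        dmap[min(int(y), shape[0] - 1), min(int(x), shape[1] - 1)] += 1.0
    return gaussian_filter(dmap, sigma)

def occlusion_augment(image, dmap, points, occlude_fraction=0.1, size=16, rng=None):
    """Black out patches around a random subset of annotated points in BOTH the
    image and the density map, simulating occluded objects while keeping the
    supervision consistent with the visible content."""
    rng = rng or np.random.default_rng()
    points = np.asarray(points)
    image, dmap = image.copy(), dmap.copy()
    n = max(1, int(occlude_fraction * len(points)))
    for x, y in rng.choice(points, size=n, replace=False):
        y0, y1 = max(0, int(y) - size // 2), int(y) + size // 2
        x0, x1 = max(0, int(x) - size // 2), int(x) + size // 2
        image[y0:y1, x0:x1] = 0      # occlude the object's appearance
        dmap[y0:y1, x0:x1] = 0.0     # remove its density contribution
    return image, dmap

def foreground_mask(points, shape, radius=12):
    """Binary foreground mask: a disc around every annotated point. Pixels
    outside the mask can be blacked out for the auxiliary (distillation) network."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    mask = np.zeros(shape, dtype=bool)
    for x, y in points:
        mask |= (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
    return mask
```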



Author information


Corresponding author

Correspondence to Zenglin Shi.

Additional information

Communicated by Ming-Hsuan Yang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shi, Z., Mettes, P. & Snoek, C.G.M. Focus for Free in Density-Based Counting. Int J Comput Vis (2024). https://doi.org/10.1007/s11263-024-01990-3

