
Abstract

This work considers supervised learning to count from images and their corresponding point annotations. Whereas density-based counting methods typically use the point annotations only to create Gaussian density maps, which act as the supervision signal, the starting point of this work is that point annotations have counting potential beyond density map generation. We introduce two methods that repurpose the available point annotations to enhance counting performance. The first is a counting-specific augmentation that leverages point annotations to simulate occluded objects in both the input image and the density map, improving the network’s robustness to occlusion. The second, foreground distillation, generates foreground masks from the point annotations and trains an auxiliary network on images with blacked-out backgrounds, so that it learns to extract foreground counting knowledge without interference from the background. Both methods can be seamlessly integrated with existing counting advances and are adaptable to different loss functions. We demonstrate the complementary effects of the two approaches, achieving robust counting results even in challenging scenarios such as background clutter, occlusion, and varying crowd densities. Our approach achieves strong counting results on multiple datasets, including ShanghaiTech Part_A and Part_B, UCF_QNRF, JHU-Crowd++, and NWPU-Crowd. Code is available at https://github.com/shizenglin/Counting-with-Focus-for-Free.
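To make the two point-annotation-driven components concrete, the sketch below illustrates, under our own assumptions rather than the authors' released code, how such components could be realized with NumPy/SciPy: a Gaussian density map built from point annotations, an occlusion-style augmentation that blacks out patches around annotated points in both the image and the density map, and a disc-shaped foreground mask whose complement can be blacked out to train the auxiliary distillation network. The parameters `sigma`, `occlude_fraction`, `size`, and `radius` are hypothetical; see the linked repository for the actual implementation.

```python
# Minimal sketch (assumed, not the authors' code) of repurposing point annotations.
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(points, shape, sigma=4.0):
    """Place a unit impulse at each annotated point and blur with a Gaussian,
    so the map integrates to (approximately) the object count."""
    dmap = np.zeros(shape, dtype=np.float32)
    for x, y in points:
        dmap[min(int(y), shape[0] - 1), min(int(x), shape[1] - 1)] += 1.0
    return gaussian_filter(dmap, sigma)

def occlusion_augment(image, dmap, points, occlude_fraction=0.1, size=16, rng=None):
    """Black out patches around a random subset of annotated points in BOTH the
    image and the density map, simulating occluded objects while keeping the
    supervision consistent with the visible content."""
    rng = rng or np.random.default_rng()
    points = np.asarray(points)
    image, dmap = image.copy(), dmap.copy()
    n = max(1, int(occlude_fraction * len(points)))
    for x, y in rng.choice(points, size=n, replace=False):
        y0, y1 = max(0, int(y) - size // 2), int(y) + size // 2
        x0, x1 = max(0, int(x) - size // 2), int(x) + size // 2
        image[y0:y1, x0:x1] = 0      # occlude the object's appearance
        dmap[y0:y1, x0:x1] = 0.0     # remove its density contribution
    return image, dmap

def foreground_mask(points, shape, radius=12):
    """Binary foreground mask: a disc around every annotated point. Pixels
    outside the mask can be blacked out for the auxiliary (distillation) network."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    mask = np.zeros(shape, dtype=bool)
    for x, y in points:
        mask |= (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
    return mask
```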



Author information


Corresponding author

Correspondence to Zenglin Shi.

Additional information

Communicated by Ming-Hsuan Yang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shi, Z., Mettes, P. & Snoek, C.G.M. Focus for Free in Density-Based Counting. Int J Comput Vis (2024). https://doi.org/10.1007/s11263-024-01990-3

