Abstract
This work considers supervised learning to count from images and their corresponding point annotations. Density-based counting methods typically use the point annotations only to create Gaussian density maps, which serve as the supervision signal. The starting point of this work is that point annotations have counting potential beyond density map generation. We introduce two methods that repurpose the available point annotations to enhance counting performance. The first is a counting-specific augmentation that leverages point annotations to simulate occluded objects in both the input images and the density maps, making the network more robust to occlusions. The second, foreground distillation, generates foreground masks from the point annotations and uses them to train an auxiliary network on images with blacked-out backgrounds. In doing so, the auxiliary network learns to extract foreground counting knowledge without interference from the background. Both methods can be seamlessly integrated with existing counting advances and are adaptable to different loss functions. We show that the two approaches are complementary, which allows us to achieve robust counting results even in challenging scenarios with background clutter, occlusion, and varying crowd densities. Our approach achieves strong counting results on multiple datasets, including ShanghaiTech Part_A and Part_B, UCF_QNRF, JHU-Crowd++, and NWPU-Crowd. Code is available at https://github.com/shizenglin/Counting-with-Focus-for-Free.
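The abstract describes three uses of point annotations: building the Gaussian density map, simulating occluded objects, and deriving foreground masks so that an auxiliary network sees images with blacked-out backgrounds. The NumPy sketch below illustrates these ideas under stated assumptions. It is not the authors' implementation (that lives in the linked repository), and the function names `gaussian_density_map`, `foreground_mask`, `black_out_background`, `occlude_objects` and the parameters `sigma`, `radius`, `size`, `num_occluded` are hypothetical. In particular, the fixed-width Gaussian kernel, the disk-shaped foreground regions, and the rule of filling an occluded square with the mean colour and dropping the hidden points from the density map are illustrative assumptions, not details given in the abstract.

```python
import numpy as np


def gaussian_density_map(points, shape, sigma=4.0):
    """Place one normalized 2-D Gaussian at every (x, y) point annotation.

    Assumption: a fixed kernel width `sigma` (many methods use adaptive widths).
    Each kernel is normalized over the image grid, so the map sums to the
    number of points.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros((h, w), dtype=np.float64)
    for x, y in points:
        kernel = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma**2))
        density += kernel / kernel.sum()
    return density


def foreground_mask(points, shape, radius=8):
    """Boolean mask that is True within `radius` pixels of any point (hypothetical disk rule)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    mask = np.zeros((h, w), dtype=bool)
    for x, y in points:
        mask |= (xs - x) ** 2 + (ys - y) ** 2 <= radius**2
    return mask


def black_out_background(image, mask):
    """Zero every pixel of an (H, W, C) image that lies outside the foreground mask."""
    return image * mask[..., None]


def occlude_objects(image, points, num_occluded=1, size=16, sigma=4.0, rng=None):
    """Hypothetical occlusion augmentation driven by point annotations.

    Squares of side `size` centred on randomly chosen annotated objects are
    filled with the mean image colour; every point inside a square is dropped,
    and the density map is rebuilt from the remaining points.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    image = image.copy()  # never modify the caller's image in place
    points = np.asarray(points, dtype=np.float64).reshape(-1, 2)
    keep = np.ones(len(points), dtype=bool)
    if len(points) > 0:
        fill = image.mean(axis=(0, 1))  # per-channel mean colour
        half = size // 2
        chosen = rng.choice(len(points), size=min(num_occluded, len(points)), replace=False)
        for idx in chosen:
            cx, cy = points[idx]
            x0, x1 = max(int(cx) - half, 0), min(int(cx) + half, w)
            y0, y1 = max(int(cy) - half, 0), min(int(cy) + half, h)
            image[y0:y1, x0:x1] = fill
            inside = (
                (points[:, 0] >= x0) & (points[:, 0] < x1)
                & (points[:, 1] >= y0) & (points[:, 1] < y1)
            )
            keep &= ~inside
    remaining = points[keep]
    return image, remaining, gaussian_density_map(remaining, (h, w), sigma)
```

Under this sketch, an occluded training pair stays consistent: the density map sums to the number of annotated points that were not covered. The outputs of `black_out_background` would be the inputs of the auxiliary network; how its knowledge is distilled into the main counter is not described in the abstract and is not modelled here.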
Additional information
Communicated by Ming-Hsuan Yang.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shi, Z., Mettes, P. & Snoek, C.G.M. Focus for Free in Density-Based Counting. Int J Comput Vis (2024). https://doi.org/10.1007/s11263-024-01990-3
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11263-024-01990-3