
From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting

Published in: International Journal of Computer Vision

Abstract

Visual counting, the task of estimating the number of objects in an image or video, is open-set by nature: the count can in theory take any value in \([0,+\infty )\). In reality, however, collected data are limited, so only a closed set of counts is ever observed. Existing methods typically model counting as regression, but they are prone to fail on unseen scenes whose counts lie outside the observed closed set. Counting, in fact, has an interesting and exclusive property—it is spatially decomposable. A dense region can always be divided until every sub-region count falls within the previously observed closed set. We therefore introduce the idea of spatial divide-and-conquer (S-DC), which transforms open-set counting into a closed-set problem. This idea is implemented by a novel Supervised Spatial Divide-and-Conquer Network (SS-DCNet), which learns from a closed set yet generalizes to open-set scenarios via S-DC. We provide mathematical analyses and a controlled experiment on synthetic data that demonstrate why closed-set modeling works well. Experiments show that SS-DCNet achieves state-of-the-art performance in crowd counting, vehicle counting, and plant counting. SS-DCNet also demonstrates superior transferability under the cross-dataset setting. Code and models are available at: https://git.io/SS-DCNet.
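The S-DC idea described above can be illustrated with a minimal sketch: a counter trained on a closed set of counts \([0, C_{max}]\) handles denser regions by recursively splitting them until each sub-region's count falls inside the observed range. Here `predict_count`, `C_MAX`, and the toy quadrant split are hypothetical stand-ins for demonstration, not the actual SS-DCNet architecture.

```python
C_MAX = 20  # assumed upper bound of the closed count set seen during training

def predict_count(region):
    """Hypothetical closed-set counter: reliable only when the true count <= C_MAX.

    For demonstration we fake the prediction from a stored ground-truth count;
    a real model would predict from pixels and saturate near C_MAX.
    """
    return min(region["count"], C_MAX)

def split_into_quadrants(region):
    """Toy split: distribute the true count evenly over four half-size quadrants."""
    q_count = region["count"] / 4
    return [{"count": q_count, "size": region["size"] // 2} for _ in range(4)]

def sdc_count(region, min_size=1):
    """Spatial divide-and-conquer: divide until the prediction is inside the closed set."""
    c = predict_count(region)
    if c < C_MAX or region["size"] <= min_size:
        return c
    # Prediction saturated at C_MAX: divide into quadrants and recurse.
    return sum(sdc_count(q, min_size) for q in split_into_quadrants(region))

# A region with 100 objects exceeds the closed set (C_MAX = 20); one division
# yields four sub-regions of 25 (still saturated), so S-DC divides again until
# each sub-count is representable, then sums the pieces.
dense = {"count": 100, "size": 8}
print(sdc_count(dense))  # recovers 100.0 despite the counter saturating at 20
```

The point of the sketch is that the counter never needs to output a value it has not seen during training; the open-set behavior comes entirely from the recursive division and summation.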



Author information

Corresponding author

Correspondence to Zhiguo Cao.

Additional information

Communicated by Esa Rahtu.


This work is supported by the Natural Science Foundation of China under Grant Nos. 62106080 and 61876211. Part of the work was done when H. Xiong and L. Liu were visiting The University of Adelaide and when H. Lu and C. Shen were with The University of Adelaide.


About this article


Cite this article

Xiong, H., Lu, H., Liu, C. et al. From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting. Int J Comput Vis 131, 1722–1740 (2023). https://doi.org/10.1007/s11263-023-01782-1
