Weakly-Supervised Crowd Counting Learns from Sorting Rather Than Locations

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12353)

Abstract

In crowd counting datasets, location labels are costly to annotate, yet they play no part in the evaluation metrics. Moreover, existing multi-task approaches rely on high-level auxiliary tasks to improve counting accuracy, a trend that increases the demand for annotations. In this paper, we propose a weakly-supervised counting network that directly regresses crowd numbers without location supervision. We further train the network to count by exploiting the relationships among images: alongside the counting network, we propose a soft-label sorting network that sorts the given images by their crowd numbers. The sorting network explicitly drives the shared backbone CNN to become sensitive to density. The proposed method therefore improves counting accuracy by using the information hidden in the crowd numbers rather than learning from extra labels such as locations and perspectives. We evaluate the method on three crowd counting datasets, where its performance compares favorably with fully supervised state-of-the-art approaches.
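
The idea of learning from sorting can be illustrated with a minimal sketch. It is not the authors' implementation, since the abstract does not specify the network or the loss. It assumes that the soft sorting labels are a row-wise softmax over negative absolute differences between the sorted counts and the raw counts, and that the counting and sorting terms are simply added. The names `soft_sort_matrix`, `joint_loss`, `tau`, and `weight` are hypothetical.

```python
import math


def soft_sort_matrix(scores, tau=1.0):
    """Return a soft permutation matrix for sorting `scores` in descending order.

    Row i is a softmax distribution over the inputs, peaked at the input that
    holds the i-th largest score. `tau` controls how soft the labels are.
    (Assumed relaxation, not taken from the paper.)
    """
    ordered = sorted(scores, reverse=True)
    matrix = []
    for target in ordered:
        logits = [-abs(target - s) / tau for s in scores]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        matrix.append([e / total for e in exps])
    return matrix


def joint_loss(pred_counts, true_counts, tau=1.0, weight=1.0):
    """Mean absolute counting error plus a soft-label sorting cross-entropy.

    Only image-level crowd numbers are used; no location labels are needed.
    """
    n = len(true_counts)
    count_loss = sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / n

    target = soft_sort_matrix(true_counts, tau)      # soft labels from crowd numbers
    predicted = soft_sort_matrix(pred_counts, tau)   # soft ordering of the predictions
    eps = 1e-12  # avoids log(0) when a probability underflows
    sort_loss = -sum(
        t * math.log(p + eps)
        for t_row, p_row in zip(target, predicted)
        for t, p in zip(t_row, p_row)
    ) / n
    return count_loss + weight * sort_loss


if __name__ == "__main__":
    true_counts = [12.0, 250.0, 47.0]   # crowd numbers of three images
    pred_counts = [20.0, 230.0, 60.0]   # hypothetical network outputs
    print(joint_loss(pred_counts, true_counts, tau=10.0))
```

In this sketch the sorting term penalizes predictions whose relative order disagrees with the order of the ground-truth crowd numbers, which is one way a shared backbone could be pushed toward density-sensitive features without any location labels.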

Keywords

Weakly-supervised, Sorting, Multi-frames, Crowd counting

Notes

Acknowledgements

This work was supported in part by the Italy-China collaboration project TALENT: 2018YFE0118400, in part by the National Natural Science Foundation of China: 61620106009, 61772494, 61931008, U1636214, 61836002 and 61976069, in part by the Key Research Program of Frontier Sciences, CAS: QYZDJ-SSW-SYS013, and in part by the Youth Innovation Promotion Association CAS.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Computer Science and Technology, UCAS, Beijing, China
  2. Key Lab of Big Data Mining and Knowledge Management, UCAS, Beijing, China
  3. Key Lab of Intelligent Information Processing, ICT, CAS, Beijing, China
  4. University of Trento, Trento, Italy