Advertisement

Learning to Count in the Crowd from Limited Labeled Data

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12356)

Abstract

Recent crowd counting approaches have achieved excellent performance. However, they are essentially based on fully supervised paradigm and require large number of annotated samples. Obtaining annotations is an expensive and labour-intensive process. In this work, we focus on reducing the annotation efforts by learning to count in the crowd from limited number of labeled samples while leveraging a large pool of unlabeled data. Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data, which is then used as supervision for training the network. The proposed method is shown to be effective under the reduced data (semi-supervised) settings for several datasets like ShanghaiTech, UCF-QNRF, WorldExpo, UCSD, etc. Furthermore, we demonstrate that the proposed method can be leveraged to enable the network in learning to count from synthetic dataset while being able to generalize better to real-world datasets (synthetic-to-real transfer).

Keywords

Crowd counting Semi-supervised learning Pseudo-labeling Domain adaptation Synthetic to real transfer 

Notes

Acknowledgement

This work was supported by the NSF grant 1910141.

Supplementary material

504452_1_En_13_MOESM1_ESM.pdf (220 kb)
Supplementary material 1 (pdf 219 KB)

References

  1. 1.
    Arteta, C., Lempitsky, V., Zisserman, A.: Counting in the wild. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 483–498. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46478-7_30CrossRefGoogle Scholar
  2. 2.
    Babu Sam, D., Sajjan, N.N., Venkatesh Babu, R., Srinivasan, M.: Divide and grow: capturing huge diversity in crowd images with incrementally growing CNN. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3618–3626 (2018)Google Scholar
  3. 3.
    Cao, X., Wang, Z., Zhao, Y., Su, F.: Scale aggregation network for accurate and efficient crowd counting. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11209, pp. 757–773. Springer, Cham (2018).  https://doi.org/10.1007/978-3-030-01228-1_45CrossRefGoogle Scholar
  4. 4.
    Chan, A.B., Liang, Z.S.J., Vasconcelos, N.: Privacy preserving crowd monitoring: counting people without people models or tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–7. IEEE (2008)Google Scholar
  5. 5.
    Chan, A.B., Vasconcelos, N.: Bayesian Poisson regression for crowd counting. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 545–551. IEEE (2009)Google Scholar
  6. 6.
    Change Loy, C., Gong, S., Xiang, T.: From semi-supervised to transfer counting of crowds. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2256–2263 (2013)Google Scholar
  7. 7.
    Chen, K., Loy, C.C., Gong, S., Xiang, T.: Feature mining for localised crowd counting. In: European Conference on Computer Vision (2012)Google Scholar
  8. 8.
    Gao, G., Gao, J., Liu, Q., Wang, Q., Wang, Y.: CNN-based density estimation and crowd counting: a survey. arXiv preprint arXiv:2003.12783 (2020)
  9. 9.
    Idrees, H., Saleemi, I., Seibert, C., Shah, M.: Multi-source multi-scale counting in extremely dense crowd images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2547–2554 (2013)Google Scholar
  10. 10.
    Idrees, H., et al.: Composition loss for counting, density map estimation and localization in dense crowds. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 544–559. Springer, Cham (2018).  https://doi.org/10.1007/978-3-030-01216-8_33CrossRefGoogle Scholar
  11. 11.
    Jiang, X., et al.: Crowd counting and density estimation by trellis encoder-decoder network. arXiv preprint arXiv:1903.00853 (2019)
  12. 12.
    Kang, D., Ma, Z., Chan, A.B.: Beyond counting: comparisons of density maps for crowd analysis tasks-counting, detection, and tracking. arXiv preprint arXiv:1705.10118 (2017)
  13. 13.
    Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242 (2016)
  14. 14.
    Lee, D.H.: Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks (2013)Google Scholar
  15. 15.
    Lempitsky, V., Zisserman, A.: Learning to count objects in images. In: Advances in Neural Information Processing Systems, pp. 1324–1332 (2010)Google Scholar
  16. 16.
    Li, M., Zhang, Z., Huang, K., Tan, T.: Estimating the number of people in crowded scenes by mid based foreground segmentation and head-shoulder detection. In: 19th International Conference on Pattern Recognition, ICPR 2008, pp. 1–4. IEEE (2008)Google Scholar
  17. 17.
    Li, T., Chang, H., Wang, M., Ni, B., Hong, R., Yan, S.: Crowded scene analysis: a survey. IEEE Trans. Circ. Syst. Video Technol. 25(3), 367–386 (2015)CrossRefGoogle Scholar
  18. 18.
    Li, W., Mahadevan, V., Vasconcelos, N.: Anomaly detection and localization in crowded scenes. IEEE Trans. Pattern Anal. Mach. Intell. 36(1), 18–32 (2014)CrossRefGoogle Scholar
  19. 19.
    Li, Y., Zhang, X., Chen, D.: CSRNet: dilated convolutional neural networks for understanding the highly congested scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1091–1100 (2018)Google Scholar
  20. 20.
    Liu, N., Long, Y., Zou, C., Niu, Q., Pan, L., Wu, H.: ADCrowdNet: an attention-injective deformable convolutional network for crowd understanding. arXiv preprint arXiv:1811.11968 (2018)
  21. 21.
    Liu, W., Salzmann, M., Fua, P.: Context-aware crowd counting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5099–5108 (2019)Google Scholar
  22. 22.
    Liu, X., van de Weijer, J., Bagdanov, A.D.: Leveraging unlabeled data for crowd counting by learning to rank. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018Google Scholar
  23. 23.
    Lu, H., Cao, Z., Xiao, Y., Zhuang, B., Shen, C.: TasselNet: counting maize tassels in the wild via local counts regression network. Plant Methods 13(1), 79 (2017).  https://doi.org/10.1186/s13007-017-0224-0CrossRefGoogle Scholar
  24. 24.
    Miyato, T., Maeda, S.I., Koyama, M., Ishii, S.: Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 41(8), 1979–1993 (2018)CrossRefGoogle Scholar
  25. 25.
    Oliver, A., Odena, A., Raffel, C.A., Cubuk, E.D., Goodfellow, I.: Realistic evaluation of deep semi-supervised learning algorithms. In: Advances in Neural Information Processing Systems, pp. 3235–3246 (2018)Google Scholar
  26. 26.
    Oñoro-Rubio, D., López-Sastre, R.J.: Towards perspective-free object counting with deep learning. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 615–629. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46478-7_38CrossRefGoogle Scholar
  27. 27.
    Pham, V.Q., Kozakaya, T., Yamaguchi, O., Okada, R.: Count forest: co-voting uncertain number of targets using random forest for crowd density estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3253–3261 (2015)Google Scholar
  28. 28.
    Ranjan, V., Le, H., Hoai, M.: Iterative crowd counting. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 278–293. Springer, Cham (2018).  https://doi.org/10.1007/978-3-030-01234-2_17CrossRefGoogle Scholar
  29. 29.
    Rasmussen, C.E.: Gaussian processes in machine learning. In: Bousquet, O., von Luxburg, U., Rätsch, G. (eds.) ML -2003. LNCS (LNAI), vol. 3176, pp. 63–71. Springer, Heidelberg (2004).  https://doi.org/10.1007/978-3-540-28650-9_4CrossRefGoogle Scholar
  30. 30.
    Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)Google Scholar
  31. 31.
    Ryan, D., Denman, S., Fookes, C., Sridharan, S.: Crowd counting using multiple local features. In: Digital Image Computing: Techniques and Applications, DICTA 2009, pp. 81–88. IEEE (2009)Google Scholar
  32. 32.
    Sam, D.B., Babu, R.V.: Top-down feedback for crowd counting convolutional neural network. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)Google Scholar
  33. 33.
    Sam, D.B., et al.: Locate, size and count: accurately resolving people in dense crowds via detection. arXiv preprint arXiv:1906.07538 (2019)
  34. 34.
    Sam, D.B., Peri, S.V., Mukuntha, N., Babu, R.V.: Going beyond the regression paradigm with accurate dot prediction for dense crowds. In: 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 2853–2861. IEEE (2020)Google Scholar
  35. 35.
    Sam, D.B., Sajjan, N.N., Maurya, H., Babu, R.V.: Almost unsupervised learning for dense crowd counting. In: Thirty-Third AAAI Conference on Artificial Intelligence (2019)Google Scholar
  36. 36.
    Sam, D.B., Surya, S., Babu, R.V.: Switching convolutional neural network for crowd counting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)Google Scholar
  37. 37.
    Shen, Z., Xu, Y., Ni, B., Wang, M., Hu, J., Yang, X.: Crowd counting via adversarial cross-scale consistency pursuit. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018Google Scholar
  38. 38.
    Shi, M., Yang, Z., Xu, C., Chen, Q.: Revisiting perspective information for efficient crowd counting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7279–7288 (2019)Google Scholar
  39. 39.
    Shi, Z., et al.: Crowd counting with deep negative correlation learning. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018Google Scholar
  40. 40.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)Google Scholar
  41. 41.
    Sindagi, V.A., Patel, V.M.: CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting. In: 2017 IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE (2017)Google Scholar
  42. 42.
    Sindagi, V.A., Patel, V.M.: Generating high-quality crowd density maps using contextual pyramid CNNs. In: The IEEE International Conference on Computer Vision (ICCV), October 2017Google Scholar
  43. 43.
    Sindagi, V.A., Patel, V.M.: A survey of recent advances in CNN-based single image crowd counting and density estimation. Pattern Recogn. Lett. 107, 3–16 (2017)CrossRefGoogle Scholar
  44. 44.
    Sindagi, V.A., Patel, V.M.: DAFE-FD: Density aware feature enrichment for face detection. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 2185–2195. IEEE (2019)Google Scholar
  45. 45.
    Sindagi, V.A., Patel, V.M.: HA-CCN: Hierarchical attention-based crowd counting network. arXiv preprint arXiv:1907.10255 (2019)
  46. 46.
    Sindagi, V.A., Patel, V.M.: Inverse attention guided deep crowd counting network. arXiv preprint (2019)Google Scholar
  47. 47.
    Sindagi, V.A., Patel, V.M.: Multi-level bottom-top and top-bottom feature fusion for crowd counting. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1002–1012 (2019)Google Scholar
  48. 48.
    Sindagi, V.A., Yasarla, R., Patel, V.M.: Pushing the frontiers of unconstrained crowd counting: new dataset and benchmark method. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1221–1231 (2019)Google Scholar
  49. 49.
    Sindagi, V.A., Yasarla, R., Patel, V.M.: JHU-CROWD++: large-scale crowd counting dataset and a benchmark method. arXiv preprint arXiv:2004.03597 (2020)
  50. 50.
    Toropov, E., Gui, L., Zhang, S., Kottur, S., Moura, J.M.: Traffic flow from a low frame rate city camera. In: 2015 IEEE International Conference on Image Processing (ICIP), pp. 3802–3806. IEEE (2015)Google Scholar
  51. 51.
    Walach, E., Wolf, L.: Learning to count with CNN boosting. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 660–676. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46475-6_41CrossRefGoogle Scholar
  52. 52.
    Wan, J., Chan, A.: Adaptive density map generation for crowd counting. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1130–1139 (2019)Google Scholar
  53. 53.
    Wan, J., Luo, W., Wu, B., Chan, A.B., Liu, W.: Residual regression with semantic prior for crowd counting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4036–4045 (2019)Google Scholar
  54. 54.
    Wang, C., Zhang, H., Yang, L., Liu, S., Cao, X.: Deep people counting in extremely dense crowds. In: Proceedings of the 23rd ACM international conference on Multimedia, pp. 1299–1302. ACM (2015)Google Scholar
  55. 55.
    Wang, Q., Gao, J., Lin, W., Yuan, Y.: Learning from synthetic data for crowd counting in the wild. arXiv preprint arXiv:1903.03303 (2019)
  56. 56.
    Xu, B., Qiu, G.: Crowd density estimation based on rich features and random projection forest. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–8. IEEE (2016)Google Scholar
  57. 57.
    Yasarla, R., Sindagi, V.A., Patel, V.M.: Syn2Real transfer learning for image deraining using Gaussian processes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2726–2736 (2020)Google Scholar
  58. 58.
    Zhan, B., Monekosso, D.N., Remagnino, P., Velastin, S.A., Xu, L.Q.: Crowd analysis: a survey. Mach. Vis. Appl. 19(5–6), 345–357 (2008).  https://doi.org/10.1007/s00138-008-0132-4CrossRefGoogle Scholar
  59. 59.
    Zhang, C., Li, H., Wang, X., Yang, X.: Cross-scene crowd counting via deep convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 833–841 (2015)Google Scholar
  60. 60.
    Zhang, Q., Chan, A.B.: Wide-area crowd counting via ground-plane density maps and multi-view fusion CNNs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8297–8306 (2019)Google Scholar
  61. 61.
    Zhang, Y., Zhou, D., Chen, S., Gao, S., Ma, Y.: Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 589–597 (2016)Google Scholar
  62. 62.
    Zhao, M., Zhang, J., Zhang, C., Zhang, W.: Leveraging heterogeneous auxiliary tasks to assist crowd counting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12736–12745 (2019)Google Scholar
  63. 63.
    Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.Johns Hopkins UniversityBaltimoreUSA
  2. 2.Indian Institute of ScienceBangaloreIndia

Personalised recommendations