Semi-supervised Crowd Counting via Self-training on Surrogate Tasks

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12360)

Abstract

Most existing crowd counting systems rely on object location annotations, which can be expensive to obtain. To reduce the annotation cost, one attractive solution is to leverage a large number of unlabeled images to build a crowd counting model in a semi-supervised fashion. This paper tackles the semi-supervised crowd counting problem from the perspective of feature learning. Our key idea is to use the unlabeled images to train a generic feature extractor rather than the entire crowd counting network. The rationale is that learning the feature extractor can be more reliable and more robust to the inevitably noisy supervision generated from unlabeled data. Moreover, on top of a good feature extractor, a density map regressor can be built with far fewer density map annotations. Specifically, we propose a novel semi-supervised crowd counting method built upon two components: (1) a set of inter-related binary segmentation tasks, derived from the original density map regression task, serve as surrogate prediction targets; (2) the surrogate target predictors are learned from both labeled and unlabeled data through a proposed self-training scheme that fully exploits the underlying constraints among these binary segmentation tasks. Experiments show that the proposed method outperforms the existing semi-supervised crowd counting method and other representative baselines.
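
The abstract does not state how the binary segmentation tasks are derived from the density map. As a minimal sketch of one plausible reading, the example below assumes that each surrogate task asks whether the density at a pixel exceeds a fixed threshold, so that the tasks are inter-related by a nesting constraint: a pixel above a higher threshold must also be above every lower one. The function names `surrogate_targets` and `is_consistent` and the threshold values are hypothetical, chosen for illustration only, and are not taken from the paper.

```python
import numpy as np


def surrogate_targets(density, thresholds):
    """Return one binary mask per threshold: masks[k] = density > thresholds[k].

    Assumption: `density` is a 2D density map and `thresholds` is ascending.
    """
    density = np.asarray(density, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    # Broadcast (K, 1, 1) thresholds against a (1, H, W) density map.
    return density[None, ...] > thresholds[:, None, None]


def is_consistent(masks):
    """Check the nesting constraint between consecutive threshold tasks.

    masks[k + 1] being True must imply masks[k] is True at the same pixel.
    """
    masks = np.asarray(masks, dtype=bool)
    return bool(np.all(masks[:-1] | ~masks[1:]))


# Toy 2x2 density map and hypothetical ascending thresholds.
density = np.array([[0.0, 0.02],
                    [0.3, 1.2]])
thresholds = [0.01, 0.1, 1.0]

masks = surrogate_targets(density, thresholds)
print(masks.astype(int))
print("consistent:", is_consistent(masks))
```

Under this assumption, a check such as `is_consistent` could be used to reject pseudo-labels on unlabeled images whose per-task predictions contradict one another, which is one way a self-training scheme might exploit the constraints among the tasks.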

Keywords

Crowd counting; Surrogate tasks; Self-training; Semi-supervised learning

Notes

Acknowledgement

This work was supported by the Key Research and Development Program of Sichuan Province (2019YFG0409). Lingqiao Liu was in part supported by ARC DECRA Fellowship DE170101259.

Supplementary material

Supplementary material 1: 504470_1_En_15_MOESM1_ESM.pdf (PDF, 148 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. College of Electronics and Information Engineering, Sichuan University, Chengdu, China
  2. School of Computer Science, The University of Adelaide, Adelaide, Australia
  3. School of Computing and Information Technology, University of Wollongong, Wollongong, Australia
  4. School of Artificial Intelligence, Dalian University of Technology, Dalian, China