Counting in the Wild

  • Carlos Arteta
  • Victor Lempitsky
  • Andrew Zisserman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9911)


In this paper we explore the scenario of learning to count multiple instances of objects from images that have been dot-annotated through crowdsourcing. Specifically, we work with a large and challenging image dataset of penguins in the wild, for which tens of thousands of volunteer annotators have placed dots on instances of penguins in tens of thousands of images. The dataset, introduced and released with this paper, shows such a high degree of object occlusion and scale variation that neither individual object detection nor simple counting-density estimation can estimate the bird counts reliably.

To address the challenging counting task, we augment and interleave density estimation with foreground-background segmentation and explicit local uncertainty estimation. The three tasks are solved jointly by a new deep multi-task architecture. Using this multi-task learning, we show that the spread between the annotators can provide hints about local object scale and aid the foreground-background segmentation, which can then be used to set a better target density for learning density prediction. Considerable improvements in counting accuracy over a single-task density estimation approach are observed in our experiments.
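As a rough illustration of the density-estimation component only, the sketch below turns dot annotations into a target density map whose sum equals the object count. It is a minimal sketch under stated assumptions, not the architecture described in this paper: the function name `density_from_dots`, the fixed Gaussian width `sigma`, and the border renormalisation are all hypothetical choices made for the example, and the segmentation and uncertainty tasks are not modelled.

```python
import numpy as np


def density_from_dots(dots, shape, sigma=2.0):
    """Build a density map from integer (row, col) dot annotations.

    Hypothetical helper for illustration: each dot contributes a Gaussian
    bump that is renormalised to sum to 1 inside the image, so the map
    sums to the number of dots.
    """
    h, w = shape
    radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1)
    # Unnormalised 2D Gaussian kernel of size (2*radius + 1) squared.
    kernel = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))

    density = np.zeros(shape, dtype=np.float64)
    for r, c in dots:
        # Clip the kernel window to the image bounds.
        r0, r1 = max(r - radius, 0), min(r + radius + 1, h)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, w)
        patch = kernel[r0 - (r - radius):r1 - (r - radius),
                       c0 - (c - radius):c1 - (c - radius)]
        # Renormalise the clipped patch so each dot still has unit mass.
        density[r0:r1, c0:c1] += patch / patch.sum()
    return density


# Three dots, including one in a corner and one on the right edge.
dots = [(5, 5), (0, 0), (10, 19)]
density = density_from_dots(dots, (20, 20))
estimated_count = density.sum()  # integrates to the number of dots
```

A fixed `sigma` is the simplest choice. The abstract suggests that local object scale and a foreground mask can be used to set a better target density, which this sketch does not attempt.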


Keywords (added by machine, not by the authors): Depth Information, Convolution Neural Network, Counting Task, Object Count, Segmentation Mask



We thank Dr. Tom Hart and the Zooniverse team for their leading role in the Penguin Watch project. Financial support was provided by the RCUK Centre for Doctoral Training in Healthcare Innovation (EP/G036861/1) and the EPSRC Programme Grant Seebibyte (EP/M013774/1).



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Carlos Arteta (1)
  • Victor Lempitsky (2)
  • Andrew Zisserman (1)
  1. Department of Engineering Science, University of Oxford, Oxford, UK
  2. Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russia
