International Journal of Computer Vision, Volume 124, Issue 2, pp 169–186

Salient Object Subitizing

  • Jianming Zhang
  • Shugao Ma
  • Mehrnoosh Sameki
  • Stan Sclaroff
  • Margrit Betke
  • Zhe Lin
  • Xiaohui Shen
  • Brian Price
  • Radomír Měch

Abstract

We study the problem of salient object subitizing, i.e., predicting the existence and the number of salient objects in an image using holistic cues. This task is inspired by the ability of people to quickly and accurately identify the number of items within the subitizing range (1–4). To this end, we present a salient object subitizing image dataset of about 14K everyday images, annotated using an online crowdsourcing marketplace. We show that an end-to-end trained convolutional neural network (CNN) model achieves prediction accuracy comparable to human performance in identifying images with zero or one salient object. For images with multiple salient objects, our model also performs significantly better than chance without requiring any localization process. Moreover, we propose a method to improve the training of the CNN subitizing model by leveraging synthetic images. In experiments, we demonstrate the accuracy and generalizability of our CNN subitizing model and its applications in salient object detection and image retrieval.
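
As a rough illustration of how subitizing can be framed as a classification task and compared against chance, the following Python sketch maps raw object counts to a small set of class labels and computes accuracy alongside a majority-class baseline. The label set (0, 1, 2, 3 and a pooled "4+" class), the function names, and the baseline are assumptions made for this example; the abstract does not specify them, and this is not the authors' CNN model.

```python
from collections import Counter

# Assumption for illustration: counts of 4 or more are pooled into one "4+" class.
MAX_CLASS = 4


def count_to_label(num_salient_objects: int) -> int:
    """Map a raw salient-object count to a subitizing class index (0..MAX_CLASS)."""
    if num_salient_objects < 0:
        raise ValueError("object count cannot be negative")
    return min(num_salient_objects, MAX_CLASS)


def label_name(label: int) -> str:
    """Human-readable class name, e.g. 4 -> '4+'."""
    return f"{MAX_CLASS}+" if label == MAX_CLASS else str(label)


def accuracy(predicted, actual) -> float:
    """Fraction of predictions that match the annotated labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def majority_baseline(labels) -> float:
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


if __name__ == "__main__":
    # Hypothetical annotated counts and model outputs, not data from the paper.
    annotated = [count_to_label(c) for c in [0, 1, 1, 2, 7]]
    predicted = [0, 1, 1, 3, 4]
    print("classes:", [label_name(c) for c in annotated])
    print("accuracy:", accuracy(predicted, annotated))
    print("majority baseline:", majority_baseline(annotated))
```

A predictor is only informative on this task if its accuracy exceeds such a baseline; which baseline counts as "chance" depends on the class distribution of the dataset in question.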

Keywords

Salient object, Subitizing, Deep learning, Convolutional neural network

Notes

Acknowledgements

This research was supported in part by US NSF Grants 0910908 and 1029430, and gifts from Adobe and NVIDIA.

Supplementary material

Supplementary material 1: 11263_2017_1011_MOESM1_ESM.pdf (PDF, 136 KB)

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Computer Science Department, Boston University, Boston, USA
  2. Adobe Research, San Jose, USA
