Salient Object Subitizing

Zhang, Jianming; Ma, Shugao; Sameki, Mehrnoosh; Sclaroff, Stan; Betke, Margrit; Lin, Zhe; Shen, Xiaohui; Price, Brian; Měch, Radomír

doi:10.1007/s11263-017-1011-0

Jianming Zhang ORCID: orcid.org/0000-0002-9954-6294²,
Shugao Ma¹,
Mehrnoosh Sameki¹,
Stan Sclaroff ORCID: orcid.org/0000-0002-0711-4313¹,
Margrit Betke¹,
Zhe Lin²,
Xiaohui Shen²,
Brian Price² &
Radomír Měch²

Abstract

We study the problem of salient object subitizing, i.e., predicting the existence and the number of salient objects in an image using holistic cues. This task is inspired by the ability of people to quickly and accurately identify the number of items within the subitizing range (1–4). To this end, we present a salient object subitizing image dataset of about 14K everyday images, annotated using an online crowdsourcing marketplace. We show that an end-to-end trained convolutional neural network (CNN) model achieves prediction accuracy comparable to human performance in identifying images with zero or one salient object. For images with multiple salient objects, our model also performs significantly better than chance without requiring any localization process. Moreover, we propose a method to improve the training of the CNN subitizing model by leveraging synthetic images. In experiments, we demonstrate the accuracy and generalizability of our CNN subitizing model and its applications in salient object detection and image retrieval.

Notes

http://www.cs.bu.edu/groups/ivc/Subitizing/.
We use the subset of ImageNet images with bounding box annotations.
The F-score is computed as \(\frac{2RP}{(R+P)}\), where R and P denote recall and precision respectively.
When evaluated on the test set used by Zhang et al. (2015a), our best method GoogleNet_Syn_FT achieves a mAP score of 85.0%.
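
The F-score note above can be illustrated with a minimal Python sketch. The function names `precision_recall` and `f_score`, the example counts, and the convention of returning 0.0 when a denominator is zero are assumptions made for this illustration; the article does not specify them.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive
    and false-negative counts."""
    # Assumed convention: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


def f_score(recall: float, precision: float) -> float:
    """F-score as 2RP / (R + P), with R = recall and P = precision."""
    denom = recall + precision
    if denom == 0:
        return 0.0  # assumed convention; the article does not cover this case
    return 2 * recall * precision / denom


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formula.
    p, r = precision_recall(tp=8, fp=2, fn=8)
    print(f"precision={p:.2f} recall={r:.2f} F={f_score(r, p):.3f}")
```

Because the formula is a harmonic mean, the score is pulled toward the smaller of R and P, so a method cannot score well by maximizing only one of them.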

References

Achanta, R., Hemami, S., Estrada, F., & Susstrunk, S. (2009). Frequency-tuned salient region detection. In IEEE conference on computer vision and pattern recognition (CVPR).
Anoraganingrum, D. (1999). Cell segmentation with median filter and mathematical morphology operation. In International conference on image analysis and processing.
Arteta, C., Lempitsky, V., Noble, J. A., & Zisserman, A. (2014). Interactive object counting. In European conference on computer vision (ECCV).
Atkinson, J., Campbell, F. W., & Francis, M. R. (1976). The magic number \(4\pm 0\): A new look at visual numerosity judgements. Perception, 5(3), 327–334.
Berg, T. L., & Berg, A. C. (2009). Finding iconic images. In IEEE conference on computer vision and pattern recognition (CVPR) workshops.
Borji, A., Sihite, D. N., & Itti, L. (2012). Salient object detection: A benchmark. In European conference on computer vision (ECCV).
Boysen, S. T., & Capaldi, E. J. (2014). The development of numerical competence: Animal and human models. Hove: Psychology Press.
Chan, A. B., & Vasconcelos, N. (2009). Bayesian poisson regression for crowd counting. In IEEE international conference on computer vision (ICCV).
Chan, A. B., Liang, Z.-S. J., & Vasconcelos, N. (2008). Privacy preserving crowd monitoring: Counting people without people models or tracking. In IEEE conference on computer vision and pattern recognition (CVPR).
Chatfield, K., Lempitsky, V., Vedaldi, A., & Zisserman, A. (2011). The devil is in the details: An evaluation of recent feature encoding methods. In British Machine Vision Conference (BMVC).
Cheng, M.-M., Zhang, G.-X., Mitra, N. J., Huang, X., & Hu, S.-M. (2011). Global contrast based salient region detection. In IEEE conference on computer vision and pattern recognition (CVPR).
Cheng, M.-M., Mitra, N. J., Huang, X., Torr, P. H. S., & Hu, S.-M. (2015). Global contrast based salient region detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3), 569–582.
Choi, J., Jung, C., Lee, J., & Kim, C. (2014). Determining the existence of objects in an image and its application to image thumbnailing. Signal Processing Letters, 21(8), 957–961.
Chua, T.-S., Tang, J., Hong, R., Li, H., Luo, Z., & Zheng, Y. (2009). NUS-WIDE: A real-world web image database from National University of Singapore. In Proceedings of the ACM international conference on image and video retrieval.
Clements, D. H. (1999). Subitizing: What is it? Why teach it? Teaching Children Mathematics, 5, 400–405.
Davis, H., & Pérusse, R. (1988). Numerical competence in animals: Definitional issues, current evidence, and a new research agenda. Behavioral and Brain Sciences, 11(04), 561–579.
Dehaene, S. (2011). The number sense: How the mind creates mathematics. Oxford: Oxford University Press.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2007). The PASCAL visual object classes challenge 2007 (VOC2007) results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Feng, J., Wei, Y., Tao, L., Zhang, C., & Sun, J. (2011). Salient object detection by composition. In IEEE international conference on computer vision (ICCV).
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE conference on computer vision and pattern recognition (CVPR).
Gopalakrishnan, V., Hu, Y., & Rajan, D. (2009). Random walks on graphs to model saliency in images. In IEEE conference on computer vision and pattern recognition (CVPR).
Gross, H. J. (2012). The magical number four: A biological, historical and mythological enigma. Communicative & Integrative Biology, 5(1), 1–2.
Gross, H. J., Pahl, M., Si, A., Zhu, H., Tautz, J., & Zhang, S. (2009). Number-based visual generalisation in the honeybee. PLoS ONE, 4(1), e4263.
Gurari, D., & Grauman, K. (2016). Visual question: Predicting if a crowd will agree on the answer. ArXiv preprint arXiv:1608.08188.
Heo, J.-P., Lin, Z., & Yoon, S.-E. (2014). Distance encoded product quantization. In IEEE conference on computer vision and pattern recognition (CVPR).
Jaderberg, M., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Synthetic data and artificial neural networks for natural scene text recognition. In Workshop on deep learning, NIPS.
Jansen, B. R. J., Hofman, A. D., Straatemeier, M., Bers, B. M. C. W., Raijmakers, M. E. J., & Maas, H. L. J. (2014). The role of pattern recognition in children’s exact enumeration of small numbers. British Journal of Developmental Psychology, 32(2), 178–194.
Jevons, W. S. (1871). The power of numerical discrimination. Nature, 3, 281–282.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., et al. (2014). Caffe: Convolutional architecture for fast feature embedding. In ACM international conference on multimedia.
Kaufman, E. L., Lord, M. W., Reese, T. W., & Volkmann, J. (1949). The discrimination of visual number. The American Journal of Psychology, 62, 498–525.
Kazemzadeh, S., Ordonez, V., Matten, M., & Berg, T. L. (2014). Referitgame: Referring to objects in photographs of natural scenes. In Conference on empirical methods in natural language processing (EMNLP).
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (NIPS).
Lee, Y. J., Ghosh, J., & Grauman, K. (2012). Discovering important people and objects for egocentric video summarization. In IEEE conference on computer vision and pattern recognition (CVPR).
Lempitsky, V., & Zisserman, A. (2010). Learning to count objects in images. In Advances in neural information processing systems (NIPS).
Li, X., Uricchio, T., Ballan, L., Bertini, M., Snoek, C. G. M., & Bimbo, A. D. (2016). Socializing the semantic gap: A comparative survey on image tag assignment, refinement, and retrieval. ACM Computing Surveys, 49(1), 14:1–14:39.
Li, Y., Hou, X., Koch, C., Rehg, J., & Yuille, A. (2014). The secrets of salient object segmentation. In IEEE conference on computer vision and pattern recognition (CVPR).
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., et al. (2014). Microsoft COCO: Common objects in context. In European conference on computer vision (ECCV).
Liu, T., Yuan, Z., Sun, J., Wang, J., Zheng, N., Tang, X., et al. (2011). Learning to detect a salient object. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2), 353–367.
Mandler, G., & Shebo, B. J. (1982). Subitizing: An analysis of its component processes. Journal of Experimental Psychology: General, 111(1), 1.
Nath, S. K., Palaniappan, K., & Bunyak, F. (2006). Cell segmentation using coupled level sets and graph-vertex coloring. In Medical image computing and computer-assisted intervention (MICCAI).
Pahl, M., Si, A., & Zhang, S. (2013). Numerical cognition in bees and other insects. Frontiers in Psychology, 4, 162.
Peng, X., Sun, B., Ali, K., & Saenko, K. (2015). Learning deep object detectors from 3D models. In IEEE international conference on computer vision (ICCV).
Piazza, M., & Dehaene, S. (2004). From number neurons to mental arithmetic: The cognitive neuroscience of number sense. The Cognitive Neurosciences (3rd ed.), pp. 865–877.
Pinheiro, P. O., Lin, T.-Y., Collobert, R., & Dollár, P. (2016). Learning to refine object segments. In European conference on computer vision (ECCV).
Pont-Tuset, J., Arbelaez, P., Barron, J. T., Marques, F., & Malik, J. (2017). Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(1), 128–140.
Razavian, A. S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition. In IEEE conference on computer vision and pattern recognition (CVPR), DeepVision Workshop.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3), 211–252.
Scharfenberger, C., Waslander, S. L., Zelek, J. S., & Clausi, D. A. (2013). Existence detection of objects in images for robot vision using saliency histogram features. In IEEE international conference on computer and robot vision (CRV).
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2014). Overfeat: Integrated recognition, localization and detection using convolutional networks. In International conference on learning representations (ICLR).
Shen, X., & Wu, Y. (2012). A unified approach to salient object detection via low rank matrix recovery. In IEEE conference on computer vision and pattern recognition (CVPR).
Shin, D., He, S., Lee, G. M., Whinston, A. B., Cetintas, S., & Lee, K.-C. (2016). Content complexity, similarity, and consistency in social media: A deep learning approach. https://ssrn.com/abstract=2830377.
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In ICLR.
Siva, P., Russell, C., Xiang, T., & Agapito, L. (2013). Looking beyond the image: Unsupervised learning for object saliency and detection. In IEEE conference on computer vision and pattern recognition (CVPR).
Stark, M., Goesele, M., & Schiele, B. (2010). Back to the future: Learning shape models from 3D CAD data. In British Machine Vision Conference (BMVC).
Stoianov, I., & Zorzi, M. (2012). Emergence of a visual number sense in hierarchical generative models. Nature Neuroscience, 15(2), 194–196.
Subburaman, V. B., Descamps, A., & Carincotte, C. (2012). Counting people in the crowd using a generic head detector. In IEEE international conference on advanced video and signal-based surveillance (AVSS).
Sun, B., & Saenko, K. (2014). From virtual to reality: Fast adaptation of virtual object detectors to real domains. In British Machine Vision Conference (BMVC).
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. In IEEE conference on computer vision and pattern recognition (CVPR).
Torralba, A., Murphy, K. P., Freeman, W. T., & Rubin, M. A. (2003). Context-based vision system for place and object recognition. In IEEE international conference on computer vision (ICCV).
Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited-capacity preattentive stage in vision. Psychological Review, 101(1), 80.
Vedaldi, A., & Fulkerson, B. (2008). VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/.
Vuilleumier, P. O., & Rafal, R. D. (2000). A systematic study of visual extinction: Between- and within-field deficits of attention in hemispatial neglect. Brain, 123(6), 1263–1279.
Wang, P., Wang, J., Zeng, G., Feng, J., Zha, H., & Li, S. (2012). Salient object detection for searched web images via global saliency. In IEEE conference on computer vision and pattern recognition (CVPR).
Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010). Sun database: Large-scale scene recognition from abbey to zoo. In IEEE conference on computer vision and pattern recognition (CVPR).
Xiong, B., & Grauman, K. (2014). Detecting snap points in egocentric video with a web photo prior. In European conference on computer vision (ECCV). Springer.
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A. C., Salakhutdinov, R., et al. (2015). Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning (ICML).
Zhang, J., Ma, S., Sameki, M., Sclaroff, S., Betke, M., Lin, Z., et al. (2015a). Salient object subitizing. In IEEE conference on computer vision and pattern recognition (CVPR).
Zhang, J., Sclaroff, S., Lin, Z., Shen, X., Price, B., & Měch, R. (2015b). Minimum barrier salient object detection at 80 fps. In IEEE international conference on computer vision (ICCV).
Zhang, J., Sclaroff, S., Lin, Z., Shen, X., Price, B., & Měch, R. (2016). Unconstrained salient object detection via proposal subset optimization. In IEEE conference on computer vision and pattern recognition (CVPR).
Zhao, R., Ouyang, W., Li, H., & Wang, X. (2015). Saliency detection by multi-context deep learning. In IEEE conference on computer vision and pattern recognition (CVPR).
Zou, W. Y., & McClelland, J. L. (2013). Progressive development of the number sense in a deep neural network. In Annual conference of the Cognitive Science Society (CogSci).

Acknowledgements

This research was supported in part by US NSF Grants 0910908 and 1029430, and gifts from Adobe and NVIDIA.

Author information

Authors and Affiliations

Computer Science Department, Boston University, Boston, MA, USA
Shugao Ma, Mehrnoosh Sameki, Stan Sclaroff & Margrit Betke
Adobe Research, San Jose, CA, USA
Jianming Zhang, Zhe Lin, Xiaohui Shen, Brian Price & Radomír Měch

Corresponding author

Correspondence to Jianming Zhang.

Additional information

Communicated by Antonio Torralba.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 136 KB)

About this article

Cite this article

Zhang, J., Ma, S., Sameki, M. et al. Salient Object Subitizing. Int J Comput Vis 124, 169–186 (2017). https://doi.org/10.1007/s11263-017-1011-0

Received: 10 July 2016
Accepted: 03 April 2017
Published: 12 April 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s11263-017-1011-0
