How do targets, nontargets, and scene context influence real-world object detection?
Humans excel at finding objects in complex natural scenes, but the features that guide this behavior have proved elusive. We used computational modeling to measure the contributions of target, nontarget, and coarse scene features to object detection in humans. In separate experiments, participants detected cars or people in a large set of natural scenes. For each scene, we extracted target-associated features, annotated the presence of nontarget objects (e.g., parking meter, traffic light), and extracted coarse scene structure from a blurred version of the image. These scene-specific values were then used to model human reaction times for each novel scene. As expected, target features were the strongest predictor of detection times in both tasks. Interestingly, target detection was additionally facilitated by coarse scene features but not by nontarget objects. In contrast, nontarget objects predicted target-absent responses in both the person and car tasks, with an additional contribution from target features in the person task. In most cases, features that sped up detection tended to slow down rejection. Taken together, these findings demonstrate that humans show systematic, scene-by-scene variations in object detection that can be understood using computational modeling.
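The modeling approach described above can be illustrated with a minimal sketch: per-scene reaction times are regressed on three scene-specific predictors (target feature strength, nontarget object count, and a coarse-scene score), with weights fitted on training scenes and evaluated on held-out novel scenes. This is not the authors' code; all variable names, the simulated data, and the simple least-squares formulation are illustrative assumptions (real predictors would come from a target-feature detector, object annotations, and a blurred-image descriptor).

```python
import numpy as np

rng = np.random.default_rng(0)
n_scenes = 200

# Hypothetical per-scene predictors (stand-ins for the three feature types).
target_score = rng.normal(size=n_scenes)       # target-associated features
nontarget_count = rng.poisson(2, n_scenes)     # annotated nontarget objects
coarse_score = rng.normal(size=n_scenes)       # coarse scene structure

# Simulated reaction times: faster (smaller) when target and coarse-scene
# evidence is strong, mimicking the reported target-present pattern.
rt = 0.6 - 0.08 * target_score - 0.03 * coarse_score \
     + rng.normal(0, 0.02, n_scenes)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n_scenes), target_score,
                     nontarget_count, coarse_score])

# Fit weights on the first half of scenes; predict the held-out "novel" half.
train, test = slice(0, 100), slice(100, None)
w, *_ = np.linalg.lstsq(X[train], rt[train], rcond=None)
pred = X[test] @ w

# Agreement between predicted and observed RTs on novel scenes.
r = np.corrcoef(pred, rt[test])[0, 1]
```

In this sketch, the fitted weight on each predictor indicates how much that feature type speeds or slows responses, and the held-out correlation `r` quantifies how well the scene-specific features generalize to unseen scenes.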
Keywords: Categorization · Scene perception · Object recognition