Attention, Perception, & Psychophysics, Volume 79, Issue 7, pp 2021–2036

How do targets, nontargets, and scene context influence real-world object detection?

Article

Abstract

Humans excel at finding objects in complex natural scenes, but the features that guide this behaviour have proved elusive. We used computational modeling to measure the contributions of target, nontarget, and coarse scene features towards object detection in humans. In separate experiments, participants detected cars or people in a large set of natural scenes. For each scene, we extracted target-associated features, annotated the presence of nontarget objects (e.g., parking meter, traffic light), and extracted coarse scene structure from the blurred image. These scene-specific values were then used to model human reaction times for each novel scene. As expected, target features were the strongest predictor of detection times in both tasks. Interestingly, target detection time was additionally facilitated by coarse scene features but not by nontarget objects. In contrast, nontarget objects predicted target-absent responses in both person and car tasks, with contributions from target features in the person task. In most cases, features that speeded up detection tended to slow down rejection. Taken together, these findings demonstrate that humans show systematic variations in object detection that can be understood using computational modeling.
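The modeling idea summarized in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, and the feature names, the weights used to simulate reaction times, the five-fold split, and the one-predictor least-squares fit are hypothetical and are not the authors' features or model. The sketch only shows how one might ask how well each feature group predicts reaction times for scenes that were held out of the fit.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def held_out_correlation(feature, rt, n_folds=5):
    """Predict each scene's RT from a line fit on the other folds,
    then correlate the held-out predictions with the observed RTs."""
    n = len(rt)
    predicted = [0.0] * n
    for fold in range(n_folds):
        test = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        a, b = fit_line([feature[i] for i in train], [rt[i] for i in train])
        for i in test:
            predicted[i] = a + b * feature[i]
    return pearson(predicted, rt)


# Synthetic scenes: one hypothetical scalar score per feature group.
rng = random.Random(0)
n_scenes = 200
features = {
    "target": [rng.gauss(0, 1) for _ in range(n_scenes)],
    "coarse_scene": [rng.gauss(0, 1) for _ in range(n_scenes)],
    "nontarget": [rng.gauss(0, 1) for _ in range(n_scenes)],
}

# Simulated detection times in seconds. The weights are invented: target
# evidence speeds responses most, coarse scene structure helps a little,
# and nontarget objects have no effect by construction.
rt = [
    0.60
    - 0.08 * features["target"][i]
    - 0.03 * features["coarse_scene"][i]
    + rng.gauss(0, 0.05)
    for i in range(n_scenes)
]

scores = {name: held_out_correlation(values, rt) for name, values in features.items()}
for name, r in scores.items():
    print(f"{name}: held-out r = {r:.2f}")
```

Because each prediction comes from a fit that never saw the scene in question, a feature group that only fits noise should show little or no held-out correlation, which makes the comparison across groups easier to interpret.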

Keywords

Categorization, Scene Perception, Object Recognition

Supplementary material

ESM 1: 13414_2017_1359_MOESM1_ESM.docx (DOCX, 2200 kb)

Copyright information

© The Psychonomic Society, Inc. 2017

Authors and Affiliations

  1. Centre for Neuroscience, Indian Institute of Science, Bangalore, India
  2. Center for Mind/Brain Sciences, University of Trento, Rovereto, Italy
