
Target-Absent Human Attention

  • Conference paper
  • Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13664)

Abstract

The prediction of human gaze behavior is important for building human-computer interaction systems that can anticipate the user’s attention. Computer vision models have been developed to predict the fixations made by people as they search for target objects. But what about when the target is not in the image? Equally important is to know how people search when they cannot find a target, and when they would stop searching. In this paper, we propose a data-driven computational model that addresses the search-termination problem and predicts the scanpath of search fixations made by people searching for targets that do not appear in images. We model visual search as an imitation learning problem and represent the internal knowledge that the viewer acquires through fixations using a novel state representation that we call Foveated Feature Maps (FFMs). FFMs integrate a simulated foveated retina into a pretrained ConvNet that produces an in-network feature pyramid, all with minimal computational overhead. Our method integrates FFMs as the state representation in inverse reinforcement learning. Experimentally, we improve the state of the art in predicting human target-absent search behavior on the COCO-Search18 dataset. Code is available at: https://github.com/cvlab-stonybrook/Target-absent-Human-Attention.
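As a rough illustration of the FFM idea described in the abstract, the sketch below blends the levels of a ConvNet feature pyramid as a function of eccentricity from the current fixation, so that fine features dominate near the fovea and coarse features dominate in the periphery. This is a minimal NumPy sketch, not the paper's implementation; the `pooling_radius` parameter and the log-scale eccentricity-to-level mapping are illustrative assumptions.

```python
import numpy as np

def foveated_feature_maps(pyramid, fixation, pooling_radius=1.0):
    """Blend feature-pyramid levels by eccentricity from the fixation.

    pyramid:  list of L arrays, each (C, H, W), ordered fine -> coarse and
              assumed already resampled to a common (H, W).
    fixation: (row, col) in feature-map coordinates.
    Returns a (C, H, W) map: fine features near the fovea, coarser
    features with increasing eccentricity.
    """
    L = len(pyramid)
    C, H, W = pyramid[0].shape
    rows, cols = np.mgrid[0:H, 0:W]
    # Eccentricity: distance of each location from the fixation point.
    ecc = np.hypot(rows - fixation[0], cols - fixation[1])
    # Map eccentricity to a fractional pyramid level (0 = finest level).
    level = np.clip(np.log2(1.0 + ecc / pooling_radius), 0, L - 1)
    out = np.zeros((C, H, W))
    for l in range(L):
        # Linear-interpolation weight of level l; weights sum to 1 per pixel.
        w = np.clip(1.0 - np.abs(level - l), 0.0, 1.0)
        out += pyramid[l] * w[None]
    return out
```

With constant-valued pyramid levels, the output at each pixel simply reads off the (clipped) fractional level, which makes the fovea-to-periphery transition easy to inspect.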


Notes

  1. Note that it is not our aim to perfectly approximate the information extracted by a human foveated retina.

  2. Both cIG and cNSS can only be computed for auto-regressive probabilistic models (our method, IRL, detector, and fixation heuristic).
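The conditional metrics in this footnote score each fixation under the model's distribution given the preceding fixation history, which is why they require an auto-regressive probabilistic model. A minimal sketch of conditional information gain (cIG) follows; `p_model` and `p_baseline` are hypothetical callables standing in for an auto-regressive model and a baseline, and this is not the benchmark's exact implementation.

```python
import numpy as np

def conditional_info_gain(p_model, p_baseline, fixations):
    """Average per-fixation log2 probability gain of a model over a
    baseline, conditioned on the human fixation history (hedged sketch).

    p_model, p_baseline: callables mapping a fixation history (list of
    (row, col) tuples) to a probability map over locations summing to 1.
    fixations: the human scanpath as a list of (row, col) tuples.
    """
    gains = []
    for t, (r, c) in enumerate(fixations):
        history = fixations[:t]  # fixations observed before step t
        gains.append(np.log2(p_model(history)[r, c])
                     - np.log2(p_baseline(history)[r, c]))
    return float(np.mean(gains))
```

For example, a model that assigns probability 0.5 to the true fixation on a 2x2 grid gains one bit over a uniform baseline (0.25 per cell).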


Acknowledgements

The authors would like to thank Jianyuan Deng for her help in result visualization and statistical analysis. This project was partially supported by US National Science Foundation Awards IIS-1763981 and IIS-2123920, the Partner University Fund, the SUNY2020 Infrastructure Transportation Security Center, and a gift from Adobe.

Author information


Correspondence to Zhibo Yang.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 607 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Yang, Z., Mondal, S., Ahn, S., Zelinsky, G., Hoai, M., Samaras, D. (2022). Target-Absent Human Attention. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13664. Springer, Cham. https://doi.org/10.1007/978-3-031-19772-7_4


  • DOI: https://doi.org/10.1007/978-3-031-19772-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19771-0

  • Online ISBN: 978-3-031-19772-7

