Abstract
Gaze, an important non-verbal cue in human interaction, can be used to estimate a person's point of regard and to infer their intentions. Gaze following is the task of estimating the visual attention of people in a single image. To tackle this challenging problem, earlier state-of-the-art work combines information from image saliency with people's gaze directions in a deep-learning-based two-pathway model. However, previous work pays little attention to why such a two-pathway model works well. In this paper, we therefore divide the two-pathway model into three stages and compare different mechanisms within each stage to better understand how each stage influences model performance. Finally, we identify the best combination of mechanisms across the three stages and evaluate the resulting model on the GazeFollow benchmark.
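To make the two-pathway structure described above concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' implementation; all layer sizes and names are illustrative assumptions): one pathway produces a saliency map from the full scene, the other produces a gaze mask from the head crop and head position, and the two are fused element-wise into a gaze heatmap.

```python
import torch
import torch.nn as nn


class TwoPathwayGazeModel(nn.Module):
    """Illustrative two-pathway gaze-following sketch (hypothetical layers)."""

    def __init__(self, grid: int = 13):
        super().__init__()
        self.grid = grid
        # Saliency pathway: full scene image -> coarse spatial saliency map.
        self.saliency = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(grid),
        )
        # Gaze pathway: head crop -> compact feature vector.
        self.head_enc = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Head features + normalized head position -> gaze mask logits.
        self.gaze_fc = nn.Linear(8 + 2, grid * grid)

    def forward(self, image, head_crop, head_pos):
        sal = self.saliency(image)                       # (B, 1, G, G)
        h = torch.cat([self.head_enc(head_crop), head_pos], dim=1)
        gaze = self.gaze_fc(h).view(-1, 1, self.grid, self.grid)
        # Fusion stage: modulate scene saliency by the predicted gaze mask,
        # then normalize to a probability heatmap over grid cells.
        fused = sal * torch.sigmoid(gaze)
        return fused.flatten(1).softmax(dim=1).view_as(fused)
```

In this sketch the three stages discussed in the paper roughly correspond to feature extraction in each pathway, the fusion of the two maps, and the final heatmap prediction; the real model may use different backbones and fusion mechanisms.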
The first author is a student.
Acknowledgements
This work was supported by the National Natural Science Foundation of P.R. China under Grant Nos. 61772574 and 61375080.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Cao, Z., Wang, G., Guo, X. (2019). Stage-by-Stage Based Design Paradigm of Two-Pathway Model for Gaze Following. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2019. Lecture Notes in Computer Science, vol. 11858. Springer, Cham. https://doi.org/10.1007/978-3-030-31723-2_55
DOI: https://doi.org/10.1007/978-3-030-31723-2_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31722-5
Online ISBN: 978-3-030-31723-2
eBook Packages: Computer Science, Computer Science (R0)