Abstract
Gaze, an important non-verbal cue in human interaction, can be used to estimate a person's point of regard and to infer their intentions. Gaze following is the task of estimating the visual attention of people in a single image. To tackle this challenging problem, earlier state-of-the-art work combines information from image saliency with people's gaze directions in a deep-learning-based two-pathway model. However, previous work pays little attention to why such a two-pathway model works well. In this paper, we therefore divide the two-pathway model into three stages and compare different mechanisms within each stage to better understand how each stage influences model performance. Finally, we identify the best combination of mechanisms across the three stages and evaluate the resulting model on the GazeFollow benchmark.
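To make the two-pathway structure described above concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' implementation; all layer sizes and names are illustrative assumptions): one pathway produces a saliency map from the full scene, the other produces a gaze mask from the head crop and head position, and the two are fused element-wise into a gaze heatmap.

```python
import torch
import torch.nn as nn


class TwoPathwayGazeModel(nn.Module):
    """Illustrative two-pathway gaze-following sketch (hypothetical layers)."""

    def __init__(self, grid: int = 13):
        super().__init__()
        self.grid = grid
        # Saliency pathway: full scene image -> coarse spatial saliency map.
        self.saliency = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(grid),
        )
        # Gaze pathway: head crop -> compact feature vector.
        self.head_enc = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Head features + normalized head position -> gaze mask logits.
        self.gaze_fc = nn.Linear(8 + 2, grid * grid)

    def forward(self, image, head_crop, head_pos):
        sal = self.saliency(image)                       # (B, 1, G, G)
        h = torch.cat([self.head_enc(head_crop), head_pos], dim=1)
        gaze = self.gaze_fc(h).view(-1, 1, self.grid, self.grid)
        # Fusion stage: modulate scene saliency by the predicted gaze mask,
        # then normalize to a probability heatmap over grid cells.
        fused = sal * torch.sigmoid(gaze)
        return fused.flatten(1).softmax(dim=1).view_as(fused)
```

In this sketch the three stages discussed in the paper roughly correspond to feature extraction in each pathway, the fusion of the two maps, and the final heatmap prediction; the real model may use different backbones and fusion mechanisms.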
The first author is a student.
Acknowledgements
This work was supported by the National Natural Science Foundation of P.R. China under Grant Nos. 61772574 and 61375080.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Cao, Z., Wang, G., Guo, X. (2019). Stage-by-Stage Based Design Paradigm of Two-Pathway Model for Gaze Following. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2019. Lecture Notes in Computer Science, vol. 11858. Springer, Cham. https://doi.org/10.1007/978-3-030-31723-2_55
DOI: https://doi.org/10.1007/978-3-030-31723-2_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31722-5
Online ISBN: 978-3-030-31723-2
eBook Packages: Computer Science, Computer Science (R0)