Towards End-to-End Video-Based Eye-Tracking

  • Conference paper

Published in: Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12357)

Abstract

Estimating eye gaze from images alone is a challenging task, in large part due to unobservable person-specific factors. Achieving high accuracy typically requires labeled data from test users, which may not be attainable in real applications. We observe that there exists a strong relationship between what users are looking at and the appearance of the user’s eyes. In response to this observation, we propose a novel dataset and accompanying method which aim to explicitly learn these semantic and temporal relationships. Our video dataset consists of time-synchronized screen recordings, user-facing camera views, and eye gaze data, which allows for new benchmarks in temporal gaze tracking as well as label-free refinement of gaze. Importantly, we demonstrate that fusing information from the visual stimulus and eye images can yield performance similar to literature-reported figures acquired through supervised personalization. Our final method yields significant performance improvements on our proposed EVE dataset, with up to \(28\%\) improvement in Point-of-Gaze estimates (resulting in \(2.49^\circ \) of angular error), paving the way towards high-accuracy screen-based eye tracking purely from webcam sensors. The dataset and reference source code are available at https://ait.ethz.ch/projects/2020/EVE.
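
The \(2.49^\circ \) figure above is an angular error: the angle between the estimated and the ground-truth gaze direction. As a rough illustration only (not the authors’ released evaluation code), the sketch below shows how such an error is commonly computed from batches of 3D gaze direction vectors; the function name and example vectors are assumptions made for this snippet.

    import numpy as np

    def angular_error_deg(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
        """Angle in degrees between predicted and ground-truth 3D gaze directions."""
        pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)   # normalize predictions
        gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)         # normalize ground truth
        cos_sim = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)     # per-sample cosine similarity
        return np.degrees(np.arccos(cos_sim))

    # Hypothetical example: a prediction rotated 2.5 degrees away from the true direction.
    gt = np.array([[0.0, 0.0, -1.0]])
    pred = np.array([[np.sin(np.radians(2.5)), 0.0, -np.cos(np.radians(2.5))]])
    print(angular_error_deg(pred, gt))  # -> approximately [2.5]

Note that converting a Point-of-Gaze error measured on the screen (in millimeters or pixels) into such an angle additionally depends on the screen geometry and the user’s distance from it.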


Notes

  1. See https://www.tobiipro.com/pop-ups/accuracy-and-precision-test-report-spectrum/?v=1.1.

  2. https://ait.ethz.ch/projects/2020/EVE.


Acknowledgements

We thank the participants of our dataset for their contributions, our reviewers for helping us improve the paper, and Jan Wezel for helping with the hardware setup. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. StG-2016-717054).

Author information

Corresponding author

Correspondence to Seonwook Park.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF, 6152 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Park, S., Aksan, E., Zhang, X., Hilliges, O. (2020). Towards End-to-End Video-Based Eye-Tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12357. Springer, Cham. https://doi.org/10.1007/978-3-030-58610-2_44

  • DOI: https://doi.org/10.1007/978-3-030-58610-2_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58609-6

  • Online ISBN: 978-3-030-58610-2

  • eBook Packages: Computer Science, Computer Science (R0)
