Towards End-to-End Video-Based Eye-Tracking

  • Conference paper

Published in: Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12357)

Abstract

Estimating eye gaze from images alone is a challenging task, in large part due to unobservable person-specific factors. Achieving high accuracy typically requires labeled data from test users, which may not be attainable in real applications. We observe that there exists a strong relationship between what users are looking at and the appearance of the user’s eyes. In response to this observation, we propose a novel dataset and accompanying method which aim to explicitly learn these semantic and temporal relationships. Our video dataset consists of time-synchronized screen recordings, user-facing camera views, and eye gaze data, which allows for new benchmarks in temporal gaze tracking as well as label-free refinement of gaze. Importantly, we demonstrate that fusing information from the visual stimulus and eye images can yield performance similar to literature-reported figures acquired through supervised personalization. Our final method yields significant performance improvements on our proposed EVE dataset, with up to \(28\%\) improvement in Point-of-Gaze estimates (resulting in \(2.49^\circ \) of angular error), paving the way towards high-accuracy screen-based eye tracking purely from webcam sensors. The dataset and reference source code are available at https://ait.ethz.ch/projects/2020/EVE.
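
The \(2.49^\circ \) figure above is an angular error: the angle between the estimated and the ground-truth gaze direction. As a rough illustration only (not the authors’ released evaluation code), the sketch below shows how such an error is commonly computed from batches of 3D gaze direction vectors; the function name and example vectors are assumptions made for this snippet.

    import numpy as np

    def angular_error_deg(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
        """Angle in degrees between predicted and ground-truth 3D gaze directions."""
        pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)   # normalize predictions
        gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)         # normalize ground truth
        cos_sim = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)     # per-sample cosine similarity
        return np.degrees(np.arccos(cos_sim))

    # Hypothetical example: a prediction rotated 2.5 degrees away from the true direction.
    gt = np.array([[0.0, 0.0, -1.0]])
    pred = np.array([[np.sin(np.radians(2.5)), 0.0, -np.cos(np.radians(2.5))]])
    print(angular_error_deg(pred, gt))  # -> approximately [2.5]

Note that converting a Point-of-Gaze error measured on the screen (in millimeters or pixels) into such an angle additionally depends on the screen geometry and the user’s distance from it.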


Notes

  1. See https://www.tobiipro.com/pop-ups/accuracy-and-precision-test-report-spectrum/?v=1.1.

  2. https://ait.ethz.ch/projects/2020/EVE.


Acknowledgements

We thank the participants of our dataset for their contributions, our reviewers for helping us improve the paper, and Jan Wezel for helping with the hardware setup. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. StG-2016-717054).

Author information

Corresponding author

Correspondence to Seonwook Park.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF, 6152 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Park, S., Aksan, E., Zhang, X., Hilliges, O. (2020). Towards End-to-End Video-Based Eye-Tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12357. Springer, Cham. https://doi.org/10.1007/978-3-030-58610-2_44

  • DOI: https://doi.org/10.1007/978-3-030-58610-2_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58609-6

  • Online ISBN: 978-3-030-58610-2

  • eBook Packages: Computer Science, Computer Science (R0)
