Towards End-to-End Video-Based Eye-Tracking

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12357)


Estimating eye-gaze from images alone is a challenging task, in large parts due to un-observable person-specific factors. Achieving high accuracy typically requires labeled data from test users which may not be attainable in real applications. We observe that there exists a strong relationship between what users are looking at and the appearance of the user’s eyes. In response to this understanding, we propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships. Our video dataset consists of time-synchronized screen recordings, user-facing camera views, and eye gaze data, which allows for new benchmarks in temporal gaze tracking as well as label-free refinement of gaze. Importantly, we demonstrate that the fusion of information from visual stimuli as well as eye images can lead towards achieving performance similar to literature-reported figures acquired through supervised personalization. Our final method yields significant performance improvements on our proposed EVE dataset, with up to \(28\%\) improvement in Point-of-Gaze estimates (resulting in \(2.49^\circ \) in angular error), paving the path towards high-accuracy screen-based eye tracking purely from webcam sensors. The dataset and reference source code are available at


Eye tracking Gaze estimation Computer vision dataset 



We thank the participants of our dataset for their contributions, our reviewers for helping us improve the paper, and Jan Wezel for helping with the hardware setup. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme grant agreement No. StG-2016-717054.

Supplementary material

504453_1_En_44_MOESM1_ESM.pdf (6 mb)
Supplementary material 1 (pdf 6152 KB)


  1. 1.
    Alnajar, F., Gevers, T., Valenti, R., Ghebreab, S.: Calibration-free gaze estimation using human gaze patterns. In: ICCV, December 2013Google Scholar
  2. 2.
    Balajee Vasudevan, A., Dai, D., Van Gool, L.: Object referring in videos with language and human gaze. In: CVPR, pp. 4129–4138 (2018)Google Scholar
  3. 3.
    Baluja, S., Pomerleau, D.: Non-intrusive gaze tracking using artificial neural networks. In: NeurIPS, pp. 753–760 (1993)Google Scholar
  4. 4.
    Biedert, R., Buscher, G., Schwarz, S., Hees, J., Dengel, A.: Text 2.0. In: ACM CHI EA (2010)Google Scholar
  5. 5.
    Chapelle, O., Wu, M.: Gradient descent optimization of smoothed information retrieval metrics. Inf. Retrieval 13(3), 216–235 (2010)CrossRefGoogle Scholar
  6. 6.
    Chen, J., Ji, Q.: Probabilistic gaze estimation without active personal calibration. In: CVPR, pp. 609–616 (2011)Google Scholar
  7. 7.
    Chen, Z., Shi, B.: Offset calibration for appearance-based gaze estimation via gaze decomposition. In: WACV, March 2020Google Scholar
  8. 8.
    Cheng, Y., Lu, F., Zhang, X.: Appearance-based gaze estimation via evaluation-guided asymmetric regression. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11218, pp. 105–121. Springer, Cham (2018). Scholar
  9. 9.
    Chong, E., Ruiz, N., Wang, Y., Zhang, Y., Rozga, A., Rehg, J.M.: Connecting gaze, scene, and attention: generalized attention estimation via joint modeling of gaze and scene saliency. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11209, pp. 397–412. Springer, Cham (2018). Scholar
  10. 10.
    Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. In: NeurIPS Workshop on Deep Learning (2014)Google Scholar
  11. 11.
    Deng, H., Zhu, W.: Monocular free-head 3D gaze tracking with deep learning and geometry constraints. In: ICCV, pp. 3143–3152 (2017)Google Scholar
  12. 12.
    Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: ECCV (2020)Google Scholar
  13. 13.
    Feit, A.M., et al.: Toward everyday gaze input: accuracy and precision of eye tracking and implications for design. In: ACM CHI, pp. 1118–1130 (2017)Google Scholar
  14. 14.
    Fischer, T., Chang, H.J., Demiris, Y.: RT-GENE: real-time eye gaze estimation in natural environments. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 339–357. Springer, Cham (2018). Scholar
  15. 15.
    Fridman, L., Reimer, B., Mehler, B., Freeman, W.T.: Cognitive load estimation in the wild. In: ACM CHI (2018)Google Scholar
  16. 16.
    Funes Mora, K.A., Monay, F., Odobez, J.M.: EYEDIAP: a database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras. In: ACM ETRA. ACM, March 2014Google Scholar
  17. 17.
    He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: ICCV (2015)Google Scholar
  18. 18.
    Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRefGoogle Scholar
  19. 19.
    Honari, S., Molchanov, P., Tyree, S., Vincent, P., Pal, C., Kautz, J.: Improving landmark localization with semi-supervised learning. In: CVPR (2018)Google Scholar
  20. 20.
    Huang, M.X., Kwok, T.C., Ngai, G., Chan, S.C., Leong, H.V.: Building a personalized, auto-calibrating eye tracker from user interactions. In: ACM CHI, pp. 5169–5179. ACM, New York (2016)Google Scholar
  21. 21.
    Huang, Q., Veeraraghavan, A., Sabharwal, A.: TabletGaze: dataset and analysis for unconstrained appearance-based gaze estimation in mobile tablets. Mach. Vis. Appl. 28(5–6), 445–461 (2017)CrossRefGoogle Scholar
  22. 22.
    Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: ICCV, pp. 2106–2113. IEEE (2009)Google Scholar
  23. 23.
    Karessli, N., Akata, Z., Schiele, B., Bulling, A.: Gaze embeddings for zero-shot image classification. In: CVPR (2017)Google Scholar
  24. 24.
    Kellnhofer, P., Recasens, A., Stent, S., Matusik, W., Torralba, A.: Gaze360: physically unconstrained gaze estimation in the wild. In: ICCV, October 2019Google Scholar
  25. 25.
    Krafka, K., et al.: Eye tracking for everyone. In: CVPR (2016)Google Scholar
  26. 26.
    Kurzhals, K., Bopp, C.F., Bässler, J., Ebinger, F., Weiskopf, D.: Benchmark data for evaluating visualization and analysis techniques for eye tracking for video stimuli. In: Proceedings of the Fifth Workshop on Beyond Time and Errors: Novel Evaluation Methods for Visualization, pp. 54–60 (2014)Google Scholar
  27. 27.
    Li, Z., Qin, S., Itti, L.: Visual attention guided bit allocation in video compression. Image Vis. Comput. 29(1), 1–14 (2011)CrossRefGoogle Scholar
  28. 28.
    Linardos, P., Mohedano, E., Nieto, J.J., O’Connor, N.E., Giro-i Nieto, X., McGuinness, K.: Simple vs complex temporal recurrences for video saliency prediction. In: BMVC (2019)Google Scholar
  29. 29.
    Lindén, E., Sjostrand, J., Proutiere, A.: Learning to personalize in appearance-based gaze tracking. In: ICCVW (2019)Google Scholar
  30. 30.
    Liu, G., Yu, Y., Mora, K.A.F., Odobez, J.: A differential approach for gaze estimation with calibration. In: BMVC (2018)Google Scholar
  31. 31.
    Lu, F., Okabe, T., Sugano, Y., Sato, Y.: A head pose-free approach for appearance-based gaze estimation. In: BMVC (2011)Google Scholar
  32. 32.
    Lu, F., Sugano, Y., Okabe, T., Sato, Y.: Inferring human gaze from appearance via adaptive linear regression. In: ICCV (2011)Google Scholar
  33. 33.
    Martinikorena, I., Cabeza, R., Villanueva, A., Porta, S.: Introducing I2Head database. In: PETMEI, pp. 1–7 (2018)Google Scholar
  34. 34.
    Mital, P.K., Smith, T.J., Hill, R.L., Henderson, J.M.: Clustering of gaze during dynamic scene viewing is predicted by motion. Cogn. Comput. 3(1), 5–24 (2011)CrossRefGoogle Scholar
  35. 35.
    Palmero, C., Selva, J., Bagheri, M.A., Escalera, S.: Recurrent CNN for 3D gaze estimation using appearance and shape cues. In: BMVC (2018)Google Scholar
  36. 36.
    Papoutsaki, A., Sangkloy, P., Laskey, J., Daskalova, N., Huang, J., Hays, J.: WebGazer: scalable webcam eye tracking using user interactions. In: IJCAI, pp. 3839–3845 (2016)Google Scholar
  37. 37.
    Park, S., Mello, S.D., Molchanov, P., Iqbal, U., Hilliges, O., Kautz, J.: Few-shot adaptive gaze estimation. In: ICCV (2019)Google Scholar
  38. 38.
    Park, S., Spurr, A., Hilliges, O.: Deep pictorial gaze estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 741–757. Springer, Cham (2018). Scholar
  39. 39.
    Park, S., Zhang, X., Bulling, A., Hilliges, O.: Learning to find eye region landmarks for remote gaze estimation in unconstrained settings. In: ACM ETRA (2018)Google Scholar
  40. 40.
    Ranjan, R., Mello, S.D., Kautz, J.: Light-weight head pose invariant gaze tracking. In: CVPRW (2018)Google Scholar
  41. 41.
    Smith, B., Yin, Q., Feiner, S., Nayar, S.: Gaze locking: passive eye contact detection for human-object interaction. In: ACM UIST, pp. 271–280, October 2013Google Scholar
  42. 42.
    Sugano, Y., Bulling, A.: Self-calibrating head-mounted eye trackers using egocentric visual saliency. In: ACM UIST, pp. 363–372. ACM, New York (2015)Google Scholar
  43. 43.
    Sugano, Y., Matsushita, Y., Sato, Y.: Calibration-free gaze sensing using saliency maps. In: CVPR, pp. 2667–2674 (2010)Google Scholar
  44. 44.
    Sugano, Y., Matsushita, Y., Sato, Y.: Learning-by-synthesis for appearance-based 3D gaze estimation. In: CVPR (2014)Google Scholar
  45. 45.
    Sugano, Y., Matsushita, Y., Sato, Y., Koike, H.: An incremental learning method for unconstrained gaze estimation. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5304, pp. 656–667. Springer, Heidelberg (2008). Scholar
  46. 46.
    Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: NeurIPS, pp. 3104–3112 (2014)Google Scholar
  47. 47.
    Wang, K., Su, H., Ji, Q.: Neuro-inspired eye tracking with eye movement dynamics. In: CVPR, pp. 9831–9840 (2019)Google Scholar
  48. 48.
    Wang, K., Wang, S., Ji, Q.: Deep eye fixation map learning for calibration-free eye gaze tracking. In: ACM ETRA, pp. 47–55. ACM, New York (2016)Google Scholar
  49. 49.
    Wang, K., Zhao, R., Ji, Q.: A hierarchical generative model for eye image synthesis and eye gaze estimation. In: CVPR (2018)Google Scholar
  50. 50.
    Wang, K., Zhao, R., Su, H., Ji, Q.: Generalizing eye tracking with Bayesian adversarial learning. In: CVPR, pp. 11907–11916 (2019)Google Scholar
  51. 51.
    Yu, Y., Liu, G., Odobez, J.M.: Improving few-shot user-specific gaze adaptation via gaze redirection synthesis. In: CVPR, pp. 11937–11946 (2019)Google Scholar
  52. 52.
    Yu, Y., Odobez, J.M.: Unsupervised representation learning for gaze estimation. In: CVPR, June 2020Google Scholar
  53. 53.
    Zhang, X., Sugano, Y., Fritz, M., Bulling, A.: Appearance-based gaze estimation in the wild. In: CVPR (2015)Google Scholar
  54. 54.
    Zhang, X., Sugano, Y., Fritz, M., Bulling, A.: It’s written all over your face: full-face appearance-based gaze estimation. In: CVPRW (2017)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.Department of Computer ScienceETH ZurichZürichSwitzerland

Personalised recommendations