ETH-XGaze: A Large Scale Dataset for Gaze Estimation Under Extreme Head Pose and Gaze Variation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12350)

Abstract

Gaze estimation is a fundamental task in many applications of computer vision, human-computer interaction, and robotics. Many state-of-the-art methods are trained and tested on custom datasets, making comparison across methods challenging. Furthermore, existing gaze estimation datasets have limited head pose and gaze variations, and evaluations are conducted using different protocols and metrics. In this paper, we propose a new gaze estimation dataset called ETH-XGaze, consisting of over one million high-resolution images of varying gaze under extreme head poses. We collect this dataset from 110 participants with a custom hardware setup including 18 digital SLR cameras and adjustable illumination conditions, and a calibrated system to record ground-truth gaze targets. We show that our dataset can significantly improve the robustness of gaze estimation methods across different head poses and gaze angles. Additionally, we define a standardized experimental protocol and evaluation metric on ETH-XGaze to better unify gaze estimation research going forward. The dataset and benchmark website are available at https://ait.ethz.ch/projects/2020/ETH-XGaze.
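For reference, gaze estimation accuracy is conventionally reported as the mean angular error (in degrees) between predicted and ground-truth 3D gaze directions. The sketch below illustrates how this metric is typically computed; the (pitch, yaw)-to-vector convention and the function names are assumptions following common practice in appearance-based gaze estimation, not the benchmark's exact release code.

```python
# Illustrative sketch (not the authors' release code): mean angular gaze
# error in degrees, the metric commonly used by gaze estimation benchmarks.
# The (pitch, yaw) -> unit-vector convention below is an assumed one that
# follows common practice in appearance-based gaze estimation.
import numpy as np

def pitchyaw_to_vector(pitchyaw: np.ndarray) -> np.ndarray:
    """Convert N x 2 (pitch, yaw) angles in radians to N x 3 unit gaze vectors."""
    pitch, yaw = pitchyaw[:, 0], pitchyaw[:, 1]
    return np.stack([
        -np.cos(pitch) * np.sin(yaw),   # x: horizontal component
        -np.sin(pitch),                 # y: vertical component
        -np.cos(pitch) * np.cos(yaw),   # z: toward the camera
    ], axis=1)

def mean_angular_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean angle in degrees between predicted and ground-truth gaze vectors."""
    a, b = pitchyaw_to_vector(pred), pitchyaw_to_vector(gt)
    cos_sim = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    cos_sim = np.clip(cos_sim, -1.0, 1.0)  # guard against numerical drift
    return float(np.degrees(np.arccos(cos_sim)).mean())

# Example: two predictions, each slightly off from the ground truth.
pred = np.array([[0.10, 0.20], [-0.05, 0.30]])
gt = np.array([[0.12, 0.18], [-0.05, 0.32]])
print(f"mean angular error: {mean_angular_error(pred, gt):.2f} deg")
```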

Acknowledgements

We thank the participants of our dataset for their contributions, our reviewers for helping us improve the paper, and Jan Wezel for helping with the hardware setup. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme grant agreement No. StG-2016–717054.

Supplementary material

Supplementary material 1 (MP4, 72,950 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Computer Science, ETH Zurich, Zürich, Switzerland
  2. Google Inc., Zürich, Switzerland
  3. Zürich, Switzerland