
Deep Pictorial Gaze Estimation

  • Seonwook Park
  • Adrian Spurr
  • Otmar Hilliges
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11217)

Abstract

Estimating human gaze from natural eye images alone is a challenging task. Gaze direction can be defined by the pupil center and the eyeball center, where the latter is unobservable in 2D images; hence, achieving highly accurate gaze estimates is an ill-posed problem. In this paper, we introduce a novel deep neural network architecture specifically designed for the task of gaze estimation from single eye input. Instead of directly regressing two angles for the pitch and yaw of the eyeball, we regress to an intermediate pictorial representation, which in turn simplifies the task of 3D gaze direction estimation. Our quantitative and qualitative results show that our approach achieves higher accuracy than the state of the art and is robust to variation in gaze, head pose, and image quality.
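
To make the angular parameterization in the abstract concrete, the sketch below shows the standard conversion from (pitch, yaw) gaze angles to a 3D unit gaze vector, together with the angular-error metric commonly used to evaluate appearance-based gaze estimators. This is a minimal illustration, not the authors' code; the axis convention in angles_to_vector is one common choice (an assumption here, as conventions differ between datasets).

```python
import numpy as np

def angles_to_vector(pitch: float, yaw: float) -> np.ndarray:
    """Convert (pitch, yaw) gaze angles in radians to a 3D unit gaze vector.

    Axis convention (an assumption; conventions differ between datasets):
    x points right, y points down, z points away from the camera, so a
    neutral gaze (0, 0) maps to (0, 0, -1), i.e. looking toward the camera.
    """
    return np.array([
        -np.cos(pitch) * np.sin(yaw),  # x
        -np.sin(pitch),                # y
        -np.cos(pitch) * np.cos(yaw),  # z
    ])

def angular_error_deg(g_true: np.ndarray, g_pred: np.ndarray) -> float:
    """Angle in degrees between two gaze vectors (the standard evaluation metric)."""
    cos_sim = np.dot(g_true, g_pred) / (np.linalg.norm(g_true) * np.linalg.norm(g_pred))
    return float(np.degrees(np.arccos(np.clip(cos_sim, -1.0, 1.0))))

# Example: a 5-degree yaw error yields ~5 degrees of angular error.
g1 = angles_to_vector(0.0, 0.0)
g2 = angles_to_vector(0.0, np.radians(5.0))
print(angular_error_deg(g1, g2))  # ~5.0
```

Note that the paper's contribution is precisely to avoid regressing these two angles directly: the network first predicts a pictorial intermediate representation and only then maps it to an angular output, although the final estimate is still scored with an angular metric like the one above.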

Keywords

Appearance-based gaze estimation · Eye tracking

Acknowledgements

This work was supported in part by ERC Grant OPTINT (StG-2016-717054). We thank the NVIDIA Corporation for the donation of GPUs used in this work.

Supplementary material

Supplementary material 1: 474201_1_En_44_MOESM1_ESM.pdf (PDF, 153 KB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. AIT Lab, Department of Computer Science, ETH Zurich, Zürich, Switzerland
