Deep Multitask Gaze Estimation with a Constrained Landmark-Gaze Model

  • Yu Yu
  • Gang Liu
  • Jean-Marc Odobez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11130)

Abstract

As an indicator of attention, gaze is an important cue for human behavior and social interaction analysis. Recent deep learning methods for gaze estimation rely on plain regression of the gaze from images without accounting for potential mismatches in eye image cropping and normalization. This may impact the estimation of the implicit relation between visual cues and the gaze direction when dealing with low-resolution images or when training with a limited amount of data. In this paper, we propose a deep multitask framework for gaze estimation, with the following contributions. (i) We propose a multitask framework which relies on both synthetic and real data for end-to-end training. During training, each dataset provides the label of only one task, but the two tasks are combined in a constrained way. (ii) We introduce a Constrained Landmark-Gaze Model (CLGM) modeling the joint variation of eye landmark locations (including the iris center) and gaze directions. By explicitly relating visual information (landmarks) to the more abstract gaze values, we demonstrate that the estimator is more accurate and easier to learn. (iii) We decompose our deep network into, on one hand, a network jointly inferring the parameters of the CLGM model and the scale and translation of the eye region and, on the other hand, a CLGM-based decoder deterministically inferring landmark positions and gaze from these parameters and the head pose. This decomposition decouples gaze estimation from irrelevant geometric variations in the eye image (scale, translation), resulting in a more robust model. Thorough experiments on public datasets demonstrate that our method achieves competitive results, improving over the state of the art on challenging free-head-pose gaze estimation and on eye landmark (iris) localization.
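To make the decomposition in contribution (iii) more concrete, the sketch below shows one way a deterministic CLGM-style decoder could be written. It is a minimal, hypothetical illustration only: the linear landmark model, the basis matrices, the landmark count, the orthographic projection, and all function names are assumptions made for illustration, not the paper's actual implementation.

import numpy as np

# Hypothetical sketch: a linear landmark-gaze decoder.
# A network head would output CLGM coefficients plus eye-crop scale and
# translation; this deterministic decoder maps them, together with the head
# pose, to 2D landmark positions. The basis matrices are assumed to come from
# a statistical model fitted offline (here they are random placeholders).

N_LANDMARKS = 17          # e.g. eyelid contour + iris center (assumed)
N_SHAPE_COEFFS = 8        # number of CLGM shape components (assumed)

rng = np.random.default_rng(0)
mean_shape = rng.standard_normal((N_LANDMARKS, 3))            # mean 3D eye landmarks
shape_basis = rng.standard_normal((N_SHAPE_COEFFS, N_LANDMARKS, 3))
gaze_basis = rng.standard_normal((2, N_LANDMARKS, 3))          # yaw/pitch deformation modes


def head_rotation(head_pose):
    """Rotation matrix from head yaw and pitch (roll ignored in this toy sketch)."""
    yaw, pitch = head_pose
    cy, sy, cp, sp = np.cos(yaw), np.sin(yaw), np.cos(pitch), np.sin(pitch)
    r_yaw = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    r_pitch = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    return r_yaw @ r_pitch


def clgm_decode(shape_coeffs, gaze, scale, translation, head_pose):
    """Deterministically decode 2D eye landmarks from CLGM parameters.

    shape_coeffs: (N_SHAPE_COEFFS,) eye-shape coefficients
    gaze:         (2,) eyeball yaw and pitch, in radians
    scale, translation: alignment of the eye crop (scalar, (2,))
    head_pose:    (2,) head yaw and pitch, in radians
    """
    # Landmarks = mean + shape variation + gaze-driven deformation (linear model).
    landmarks_3d = (mean_shape
                    + np.tensordot(shape_coeffs, shape_basis, axes=1)
                    + np.tensordot(gaze, gaze_basis, axes=1))
    # Rotate by head pose, project orthographically, then apply crop scale/translation.
    rotated = landmarks_3d @ head_rotation(head_pose).T
    return scale * rotated[:, :2] + translation


# Example: decode landmarks for a neutral eye shape looking 10 degrees to the right.
pts = clgm_decode(np.zeros(N_SHAPE_COEFFS), np.array([np.deg2rad(10), 0.0]),
                  scale=1.0, translation=np.array([32.0, 24.0]),
                  head_pose=np.array([0.0, 0.0]))
print(pts.shape)  # (17, 2)

Because the decoder is deterministic, gradients can flow from landmark and gaze losses back to the network that predicts the coefficients, while the scale and translation parameters absorb cropping and normalization mismatches.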

Notes

Acknowledgement

This work was partly funded by the UBIMPRESSED project of the Sinergia interdisciplinary program of the Swiss National Science Foundation (SNSF), and by the European Union's Horizon 2020 research and innovation programme under grant agreement no. 688147 (MuMMER, mummer-project.eu).


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Idiap Research Institute, Martigny, Switzerland
  2. EPFL, Lausanne, Switzerland
