Unconstrained Gaze Estimation Using Random Forest Regression Voting

  • Amine Kacete
  • Renaud Séguier
  • Michel Collobert
  • Jérôme Royan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10113)


In this paper we address the problem of automatic gaze estimation using a depth sensor under unconstrained head pose motion and large user-sensor distances. To achieve robustness, we formulate this problem as a regression problem. To solve it, we propose to use a regression forest, chosen for its strong generalization ability when trained on large datasets. We train our trees on a large synthetic dataset generated with a statistical model of the human face with integrated parametric 3D eyeballs. Unlike previous works, which learn the mapping function using only RGB cues represented by eye image appearance, we propose to integrate the depth information around the face into the input vector. In our experiments, we show that our approach can handle real-data scenarios with strong head pose changes even though it is trained only on synthetic data. We also illustrate the importance of depth information for estimation accuracy, especially in unconstrained scenarios.
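The idea summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy generator `make_sample`, the feature layout (two RGB-like values followed by two depth-like values), and all parameter values are hypothetical stand-ins chosen only to show how a regression forest with random binary tests can vote on a gaze direction from a combined RGB and depth input vector.

```python
import random

def make_sample(rng):
    """Hypothetical toy sample: (input vector, gaze) with gaze = (yaw, pitch) in degrees.
    This stands in for the paper's synthetic face model, which is not reproduced here."""
    yaw, pitch = rng.uniform(-30, 30), rng.uniform(-20, 20)
    rgb = [0.5 * yaw + rng.gauss(0, 1), 0.5 * pitch + rng.gauss(0, 1)]      # appearance-like cues
    depth = [yaw + pitch + rng.gauss(0, 1), yaw - pitch + rng.gauss(0, 1)]  # depth-like cues
    return rgb + depth, (yaw, pitch)

def mean_gaze(samples):
    """Average gaze of a set of samples; used as the value stored in a leaf."""
    n = len(samples)
    return tuple(sum(g[i] for _, g in samples) / n for i in range(2))

def spread(samples):
    """Sum of squared distances of the gaze labels to their mean."""
    m = mean_gaze(samples)
    return sum((g[0] - m[0]) ** 2 + (g[1] - m[1]) ** 2 for _, g in samples)

def build_tree(samples, rng, level=0, max_depth=8, min_leaf=5, n_tests=20):
    """Grow one tree by trying random binary tests (feature < threshold)
    and keeping the test that minimizes the label spread of the two children."""
    if level >= max_depth or len(samples) < 2 * min_leaf:
        return mean_gaze(samples)
    best = None
    for _ in range(n_tests):
        f = rng.randrange(len(samples[0][0]))   # random feature index
        t = rng.choice(samples)[0][f]           # random threshold taken from the data
        left = [s for s in samples if s[0][f] < t]
        right = [s for s in samples if s[0][f] >= t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        score = spread(left) + spread(right)
        if best is None or score < best[0]:
            best = (score, f, t, left, right)
    if best is None:
        return mean_gaze(samples)
    _, f, t, left, right = best
    return {"f": f, "t": t,
            "lo": build_tree(left, rng, level + 1, max_depth, min_leaf, n_tests),
            "hi": build_tree(right, rng, level + 1, max_depth, min_leaf, n_tests)}

def train_forest(samples, rng, n_trees=5):
    """Train each tree on a bootstrap resample of the training set."""
    return [build_tree([rng.choice(samples) for _ in samples], rng)
            for _ in range(n_trees)]

def predict_tree(node, x):
    """Route an input vector through the binary tests down to a leaf."""
    while isinstance(node, dict):
        node = node["lo"] if x[node["f"]] < node["t"] else node["hi"]
    return node

def predict_forest(forest, x):
    """Combine the per-tree votes by averaging them."""
    votes = [predict_tree(tree, x) for tree in forest]
    return tuple(sum(v[i] for v in votes) / len(votes) for i in range(2))
```

A forest could then be trained with `train_forest([make_sample(rng) for _ in range(400)], rng)` for some `rng = random.Random(0)` and queried with `predict_forest(forest, x)`. Averaging the leaf votes is only one possible aggregation rule and is assumed here for simplicity.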


Visual Axis, Kinect Sensor, Binary Test, Regression Forest, Training Data Generation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Amine Kacete (1)
  • Renaud Séguier (1)
  • Michel Collobert (1)
  • Jérôme Royan (1)
  1. Institute of Research and Technology B-com, Cesson-Sévigné, France
