
Estimation of behavioral user state based on eye gaze and head pose—application in an e-learning environment


Most e-learning environments that use user feedback or profiles collect this information through questionnaires, which very often yield incomplete answers and sometimes deliberately misleading input. In this work, we present a mechanism that compiles feedback on the behavioral state of the user (e.g., level of interest) while she reads an electronic document. The mechanism is non-intrusive: it uses a simple web camera to detect and track head, eye, and hand movements, and it estimates the level of interest and engagement with a neuro-fuzzy network that is initialized from evidence drawn from Theory of Mind and trained on expert-annotated data. The user does not need to interact with the proposed system and can act as if she were not monitored at all. The proposed scheme is tested in an e-learning environment, in order to adapt the presentation of the content to the user profile and current behavioral state. Experiments show that the proposed system detects reading- and attention-related user states very effectively in a testbed where children’s reading performance is tracked.
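To make the estimation step more concrete, the following is a minimal sketch of how a Sugeno-style fuzzy rule base could map tracked cues to an attention score. It is an illustration only and not the authors' network: the abstract does not specify the architecture, inputs, or parameters, so the two normalized inputs, the fuzzy sets in `SETS`, the rule table in `RULES`, and the function `estimate_attention` are all hypothetical. Hand movements are omitted for brevity. In a neuro-fuzzy model, the set centers, widths, and rule outputs would be tuned from annotated data rather than fixed by hand as they are here.

```python
import math


def gaussian(x, center, width):
    """Gaussian membership degree of x in a fuzzy set."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


# Hypothetical fuzzy sets over normalized deviations in [0, 1]:
# 0 = facing / looking at the screen, 1 = fully turned away.
# Each entry is (center, width); the values are assumptions.
SETS = {"low": (0.0, 0.3), "high": (1.0, 0.3)}

# Hypothetical rules: (head-pose set, eye-gaze set) -> attention level in [0, 1].
RULES = [
    ("low", "low", 1.0),
    ("low", "high", 0.5),
    ("high", "low", 0.4),
    ("high", "high", 0.0),
]


def estimate_attention(head_pose, eye_gaze):
    """Zero-order Sugeno inference: firing-strength-weighted mean of rule outputs."""
    # Clamp inputs to the range the fuzzy sets were defined on.
    head_pose = min(max(head_pose, 0.0), 1.0)
    eye_gaze = min(max(eye_gaze, 0.0), 1.0)
    weighted_sum = 0.0
    total_strength = 0.0
    for pose_set, gaze_set, level in RULES:
        # Product t-norm combines the two membership degrees.
        strength = gaussian(head_pose, *SETS[pose_set]) * gaussian(eye_gaze, *SETS[gaze_set])
        weighted_sum += strength * level
        total_strength += strength
    return weighted_sum / total_strength
```

Calling `estimate_attention(0.1, 0.2)` returns a score near the top of the range, while `estimate_attention(0.9, 0.8)` returns a score near the bottom. The weighted-mean output keeps the score continuous as the cues drift, which suits a quantity such as level of interest better than a hard threshold would.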






This work has been funded by the FP6 IP Callas (Conveying Affectiveness in Leading-edge Living Adaptive Systems), Contract Number IST-34800 and the FP6 STREP Agent-Dysl (Accommodative Intelligent Educational Environments for Dyslexic learners) Contract Number IST-034549.

Author information

Corresponding author

Correspondence to Stylianos Asteriadis.

About this article

Cite this article

Asteriadis, S., Tzouveli, P., Karpouzis, K. et al. Estimation of behavioral user state based on eye gaze and head pose—application in an e-learning environment. Multimed Tools Appl 41, 469–493 (2009).

  • User attention estimation
  • Head pose
  • Eye gaze
  • Facial feature detection and tracking