KI - Künstliche Intelligenz

Volume 31, Issue 4, pp 331–337

Automated interpretation of eye–hand coordination in mobile eye tracking recordings

Identifying demanding phases in human–machine interactions
  • Moritz Mussgnug
  • Daniel Singer
  • Quentin Lohmeyer
  • Mirko Meboldt
Technical Contribution


Mobile eye tracking is beneficial for the analysis of human–machine interactions with tangible products, as it tracks eye movements reliably in natural environments and allows for insights into human behaviour and the associated cognitive processes. However, current methods require a manual screening of the video footage, which is time-consuming and subjective. This work aims to automatically detect cognitively demanding phases in mobile eye tracking recordings. The approach presented combines the user’s perception (gaze) and action (hand) to isolate demanding interactions based on a multi-modal feature-level fusion. It was validated in a usability study of a 3D printer with 40 participants by comparing the usability problems found to those of a thorough manual analysis. The new approach detected 17 out of 19 problems, while the time for manual analysis was reduced by 63%. Adding hand information enriches the insights into human behaviour beyond what eye tracking alone provides. The field of AI could significantly advance our approach by improving the hand tracking through region proposal CNNs, by detecting the parts of a product and mapping the demanding interactions to these parts, or even by a fully automated end-to-end detection of demanding interactions via deep learning. This could lay the basis for machines that provide real-time assistance to their users in cases where they are struggling.
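The paper does not spell out the fusion rule here, but the idea of combining a gaze feature and a hand feature per time sample can be illustrated with a minimal sketch. The proximity threshold, fixation-duration threshold, and the specific features below are invented for illustration, not the authors' actual method:

```python
import numpy as np

def detect_demanding_phases(gaze, hand, fix_dur,
                            dist_thresh=0.05, dur_thresh=0.4):
    """Flag samples where a long fixation coincides with the gaze resting
    near the hand -- a hypothetical proxy for the gaze/hand feature-level
    fusion described in the abstract (thresholds are illustrative only).

    gaze, hand: (N, 2) arrays of normalized scene-camera coordinates.
    fix_dur:    (N,) fixation durations in seconds.
    Returns a boolean mask marking candidate demanding samples.
    """
    dist = np.linalg.norm(gaze - hand, axis=1)   # gaze-hand proximity
    return (dist < dist_thresh) & (fix_dur > dur_thresh)

# Toy usage with synthetic samples
gaze = np.array([[0.50, 0.50], [0.52, 0.51], [0.10, 0.90]])
hand = np.array([[0.51, 0.50], [0.53, 0.52], [0.80, 0.20]])
fix_dur = np.array([0.6, 0.3, 0.7])
mask = detect_demanding_phases(gaze, hand, fix_dur)
# -> only the first sample is flagged: gaze is near the hand AND dwells long
```

In a real pipeline, consecutive flagged samples would be merged into phases and handed to an analyst, rather than treated sample by sample.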


Mobile eye tracking · Eye–hand coordination · Cognitive processes · Human–machine interaction · Event interpretation · Usability testing



Copyright information

© Springer-Verlag GmbH Deutschland 2017

Authors and Affiliations

  1. ETH Zurich, Zurich, Switzerland
