Semi-automatic Training of an Object Recognition System in Scene Camera Data Using Gaze Tracking and Accelerometers

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10528)


Object detection and recognition algorithms usually require large annotated training sets, whose creation demands expensive manual annotation. Eye tracking can assist in this annotation procedure: humans use vision constantly to explore the environment and to plan motor actions, such as grasping an object.

In this paper we investigate the possibility of semi-automatically training an object recognition system using eye tracking, accelerometer, and scene camera data, learning from the natural hand-eye coordination of humans. Our approach involves three steps. First, sensor data are recorded using eye tracking glasses in combination with accelerometers and surface electromyography sensors, which are commonly applied to control prosthetic hands. Second, a set of patches is automatically extracted from the scene camera data while the subject grasps an object. Third, a convolutional neural network is trained and tested on the extracted patches.
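The second step, cropping image patches around the gaze point during a grasp, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `extract_gaze_patch` and the 128-pixel patch size are assumptions chosen for the example, and gaze coordinates are assumed to be given in scene camera pixel coordinates.

```python
import numpy as np

def extract_gaze_patch(frame, gaze_xy, patch_size=128):
    """Crop a square patch centred on the gaze point, clamped to the frame.

    frame:      H x W x C scene camera image as a NumPy array.
    gaze_xy:    (x, y) gaze coordinates in pixels.
    patch_size: side length of the square patch (assumed value).
    """
    h, w = frame.shape[:2]
    half = patch_size // 2
    # Clamp the centre so the patch never extends past the frame border.
    cx = int(np.clip(gaze_xy[0], half, w - half))
    cy = int(np.clip(gaze_xy[1], half, h - half))
    return frame[cy - half:cy + half, cx - half:cx + half]

# Example: a 640x480 frame with the gaze near the image centre.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
patch = extract_gaze_patch(frame, (320, 240))
```

Patches collected this way during grasping movements could then be labelled with the identity of the grasped object and fed to the network for fine-tuning.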

The results show that the parameters of eye-hand coordination can be used to train an object recognition system semi-automatically. With appropriate sensors, these parameters can be exploited to fine-tune a convolutional neural network for object detection and recognition. This approach opens interesting options for training computer vision and multi-modal data integration systems and lays the foundation for future applications in robotics. In particular, this work targets the improvement of prosthetic hands by recognizing the objects that a person may wish to use; however, the approach can easily be generalized.


Keywords: Semi-automatic training · Object recognition · Eye tracking



The authors would like to thank A. Gigli, A. Gijsberts and V. Gregori from the University of Rome “La Sapienza” for their help in pre-processing the data, as well as the Swiss National Science Foundation and the Hasler Foundation, which partially supported this work via the Sinergia project #160837 Megane Pro and the Elgar Pro project, respectively.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. University of Applied Sciences Western Switzerland (HES-SO), Sierre, Switzerland
  2. Rehabilitation Engineering Laboratory, ETH Zurich, Zurich, Switzerland
  3. University of Rome “La Sapienza”, Rome, Italy
  4. Department of Neurology, University Hospital of Zurich, Zurich, Switzerland
  5. Clinic of Plastic Surgery, Padova University Hospital, Padova, Italy
