Semi-automatic Hand Annotation of Egocentric Recordings

  • Stijn De Beugher
  • Geert Brône
  • Toon Goedemé
Part of the Communications in Computer and Information Science book series (CCIS, volume 598)


We present a fast and accurate algorithm for the detection of human hands in real-life 2D image sequences. We focus on a specific application of hand detection, viz. the annotation of egocentric recordings. A well-known type of egocentric camera is the mobile eye-tracker, which is often used in research on human-human interaction. At present, this type of data is typically annotated manually for relevant features (e.g. visual fixations on gestures), which is a time-consuming and error-prone task. Our semi-automatic approach drastically reduces the amount of manual analysis while guaranteeing high accuracy. The algorithm combines several well-known detection techniques with an advanced elimination scheme that reduces false detections. We validate our approach on a challenging dataset containing over 4300 hand instances, which allows us to explore both the capabilities and the limits of the approach.
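
The abstract does not spell out the individual detectors or the elimination scheme, so the following is only a minimal sketch of the general idea of pooling candidates from several detectors and then discarding likely false or duplicate detections. Everything in it is an assumption made for illustration: the function names `iou` and `merge_and_filter`, the thresholds `min_score` and `max_iou`, the assumption that scores from different detectors are comparable, and the use of score thresholding plus greedy non-maximum suppression as a stand-in for the authors' elimination scheme.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
Detection = Tuple[Box, float]             # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_and_filter(
    detector_outputs: List[List[Detection]],
    min_score: float = 0.5,   # hypothetical threshold, not from the paper
    max_iou: float = 0.5,     # hypothetical threshold, not from the paper
) -> List[Detection]:
    """Pool candidates from several detectors, then eliminate weak and
    overlapping ones (illustrative stand-in for an elimination scheme)."""
    # Step 1: pool all candidates and drop low-confidence ones.
    candidates = [
        det for output in detector_outputs for det in output
        if det[1] >= min_score
    ]
    # Step 2: greedy non-maximum suppression, strongest candidate first.
    candidates.sort(key=lambda det: det[1], reverse=True)
    kept: List[Detection] = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= max_iou for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Usage with two hypothetical detectors on one frame.
detector_a = [((10, 10, 50, 50), 0.9), ((200, 200, 240, 240), 0.3)]
detector_b = [((12, 12, 52, 52), 0.8), ((100, 100, 140, 140), 0.7)]
hands = merge_and_filter([detector_a, detector_b])
```

In this example the low-scoring candidate is dropped by the score threshold and the two heavily overlapping boxes collapse into the stronger one, leaving two detections. In a semi-automatic workflow, the remaining uncertain cases would be passed to a human annotator rather than resolved automatically.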


Keywords: Eye-tracking · Ego-centric · Annotation · Hand detection · Human-human interaction · (Semi-)automatic analysis



This work is partially funded by KU Leuven via the projects Cametron and InSight Out. We also thank Raphael Den Dooven for his contributions.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Stijn De Beugher (1)
  • Geert Brône (2)
  • Toon Goedemé (1)

  1. EAVISE, ESAT, KU Leuven, Sint-Katelijne-Waver, Belgium
  2. MIDI Research Group, KU Leuven, Leuven, Belgium
