
Language Resources and Evaluation, Volume 52, Issue 2, pp. 433–460

A semi-automatic annotation tool for unobtrusive gesture analysis

  • Stijn De Beugher
  • Geert Brône
  • Toon Goedemé
Original Paper

Abstract

In a variety of research fields, including linguistics, human–computer interaction research, psychology, sociology and behavioral studies, there is a growing interest in the role of gestural behavior related to speech and other modalities. The analysis of multimodal communication requires high-quality video data and detailed annotation of the different semiotic resources under scrutiny. In the majority of cases, the annotation of hand position, hand motion, gesture type, etc. is done manually, which is a time-consuming enterprise requiring multiple annotators and substantial resources. In this paper we present a semi-automatic alternative, in which the focus lies on minimizing the manual workload while guaranteeing highly accurate annotations. First, we discuss our approach, which consists of several processing steps such as identifying the hands in images, calculating the motion of the hands, and segmenting the recording into gesture and non-gesture events. Second, we validate our approach against existing corpora in terms of accuracy and usefulness. The proposed approach is designed to provide annotations according to the McNeill (1992) gesture space and the output is compatible with annotation tools such as ELAN or ANVIL.
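To make the pipeline described above more concrete, the following is a minimal sketch of how per-frame hand positions could be turned into gesture/non-gesture segments and exported in a form that ELAN can import as a tab-delimited tier. It is an illustration only, not the authors' implementation: the frame rate, motion threshold, minimum segment length and all function names (motion_per_frame, segment_gestures, to_elan_rows) are assumptions introduced here, and the hand-detection step is represented simply by a list of already-detected positions.

```python
# Hypothetical sketch of a motion-based gesture/rest segmentation step.
# Assumptions (not from the paper): frame rate, threshold, minimum length.

from dataclasses import dataclass
from typing import List, Optional, Tuple

FPS = 25                 # assumed video frame rate
MOTION_THRESHOLD = 4.0   # assumed gesture/rest cut-off, in pixels per frame
MIN_GESTURE_FRAMES = 5   # assumed minimum length of a gesture segment


@dataclass
class Segment:
    start_frame: int
    end_frame: int
    label: str           # "gesture" or "rest"


def motion_per_frame(positions: List[Optional[Tuple[float, float]]]) -> List[float]:
    """Euclidean displacement of the hand between consecutive frames.
    Frames where the hand was not detected contribute zero motion."""
    motion = [0.0]
    for prev, cur in zip(positions, positions[1:]):
        if prev is None or cur is None:
            motion.append(0.0)
        else:
            motion.append(((cur[0] - prev[0]) ** 2 + (cur[1] - prev[1]) ** 2) ** 0.5)
    return motion


def segment_gestures(motion: List[float]) -> List[Segment]:
    """Threshold the motion signal into alternating gesture/rest segments
    and discard spuriously short gesture segments."""
    segments: List[Segment] = []
    current_label = "gesture" if motion[0] > MOTION_THRESHOLD else "rest"
    start = 0
    for i, m in enumerate(motion[1:], start=1):
        label = "gesture" if m > MOTION_THRESHOLD else "rest"
        if label != current_label:
            segments.append(Segment(start, i - 1, current_label))
            current_label, start = label, i
    segments.append(Segment(start, len(motion) - 1, current_label))
    return [s for s in segments
            if s.label == "rest"
            or s.end_frame - s.start_frame + 1 >= MIN_GESTURE_FRAMES]


def to_elan_rows(segments: List[Segment]) -> List[str]:
    """Format gesture segments as tab-separated rows (tier, start ms, end ms,
    value), a layout ELAN can import as a delimited-text tier."""
    rows = []
    for s in segments:
        if s.label != "gesture":
            continue
        start_ms = int(1000 * s.start_frame / FPS)
        end_ms = int(1000 * (s.end_frame + 1) / FPS)
        rows.append(f"Gesture\t{start_ms}\t{end_ms}\tgesture")
    return rows


if __name__ == "__main__":
    # Toy hand trajectory: resting, then moving, then resting again.
    trajectory = [(100.0, 200.0)] * 10 \
        + [(100.0 + 6 * i, 200.0 - 3 * i) for i in range(15)] \
        + [(190.0, 155.0)] * 10
    segments = segment_gestures(motion_per_frame(trajectory))
    print("\n".join(to_elan_rows(segments)))
```

In the actual system the position list would come from the hand-detection and tracking steps described in the paper, and the segmentation would additionally be mapped onto the McNeill gesture space rather than a single binary tier.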

Keywords

(Semi-)automatic annotation · Gesture analysis · Video analysis · Hand annotation · Gesture space · Motion analysis

Acknowledgements

We would like to thank Prof. Irene Mittelberg and her research group for providing the annotations on the NeuroPeirce dataset (Brenger and Mittelberg 2015) as well as the authors of the SaGA dataset (Lücking et al. 2010). The availability of the annotations was crucial for our validation process.

References

  1. Abuczki, Á., & Esfandiari, B. G. (2013). An overview of multimodal corpora, annotation tools and schemes. Argumentum, 9, 86–98.
  2. Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C., & Paggio, P. (2007). The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena. Language Resources and Evaluation, 41(3–4), 273–287.
  3. Alon, J., Athitsos, V., Yuan, Q., & Sclaroff, S. (2009). A unified framework for gesture recognition and spatiotemporal gesture segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(9), 1685–1699.
  4. Badi, H. (2016). Recent methods in vision-based hand gesture recognition. International Journal of Data Science and Analytics, 1, 77–87.
  5. Bennewitz, M., Axenbeck, T., Behnke, S., & Burgard, W. (2008). Robust recognition of complex gestures for natural human–robot interaction. In Proceedings of the workshop on interactive robot learning at the Robotics: Science and Systems conference (RSS).
  6. Blache, P., Bertrand, R., & Ferré, G. (2009). Creating and exploiting multimodal annotated corpora: The ToMA project (pp. 38–53).
  7. Brenger, B., & Mittelberg, I. (2015). Shakes, nods and tilts: Motion-capture data profiles of speakers' and listeners' head gestures. In Proceedings of the 3rd Gesture and Speech in Interaction (GESPIN) conference (pp. 43–48).
  8. Bressem, J. (2013). Transcription systems for gestures, speech, prosody, postures, and gaze. In Body-language-communication: An international handbook on multimodality in human interaction (Vol. 1, pp. 1037–1059).
  9. Chang, J. Y. (2015). Nonparametric gesture labeling from multi-modal data. In Computer vision—ECCV 2014 workshops, Zurich, Switzerland, September 6–7 and 12, 2014, Proceedings, Part I.
  10. De Beugher, S., Brône, G., & Goedemé, T. (2016). Semi-automatic hand annotation making human–human interaction analysis fast and accurate. In Proceedings of the 11th joint conference on computer vision, imaging and computer graphics theory and applications (VISAPP) (pp. 552–559).
  11. Demircioğlu, B., Bülbül, G., & Köse, H. (2016). Recognition of sign language hand shape primitives with Leap Motion. In LREC workshop on the representation and processing of sign languages: Corpus mining (pp. 47–52). Portorož, Slovenia.
  12. Dilsizian, M., Tang, Z., Metaxas, D., Huenerfauth, M., & Neidle, C. (2016). The importance of 3D motion trajectories for computer-based sign recognition. In LREC workshop on the representation and processing of sign languages: Corpus mining (pp. 53–58). Portorož, Slovenia.
  13. Escalera, S., Baró, X., Gonzàlez, J., Bautista, M. A., Madadi, M., Reyes, M., et al. (2015). ChaLearn Looking at People challenge 2014: Dataset and results. In Computer vision—ECCV 2014 workshops, Zurich, Switzerland, September 6–7 and 12, 2014, Proceedings, Part I (pp. 459–473). Cham: Springer.
  14. Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
  15. Ferrari, V., Marin-Jimenez, M., & Zisserman, A. (2009). Pose search: Retrieving people using their pose. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–8).
  16. Feyaerts, K., Brône, G., & Oben, B. (2016). Multimodality in interaction.
  17. Gebre, B. G., Wittenburg, P., & Lenkiewicz, P. (2012). Towards automatic gesture stroke detection. In Proceedings of the eighth international conference on language resources and evaluation (LREC). Istanbul, Turkey: European Language Resources Association (ELRA).
  18. Jewitt, C. (2009). The Routledge handbook of multimodal analysis. London: Routledge.
  19. Jones, M. J., & Rehg, J. M. (2002). Statistical color models with application to skin detection. International Journal of Computer Vision, 46(1), 81–96.
  20. Kalman, R. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82, 35–45.
  21. Kapuscinski, T., Oszust, M., Wysocki, M., & Warchol, D. (2015). Recognition of hand gestures observed by depth cameras. International Journal of Advanced Robotic Systems, 12, 36.
  22. Karlinsky, L., Dinerstein, M., Harari, D., & Ullman, S. (2010). The chains model for detecting parts by their context. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 25–32).
  23. Kuznetsova, A., Leal-Taixe, L., & Rosenhahn, B. (2013). Real-time sign language recognition using a consumer depth camera. In Proceedings of the IEEE international conference on computer vision (ICCV) workshops.
  24. Lücking, A., Bergmann, K., Hahn, F., Kopp, S., & Rieser, H. (2010). The Bielefeld speech and gesture alignment corpus (SaGA). In M. Kipp, J. P. Martin, P. Paggio, & D. Heylen (Eds.), Proceedings of the LREC 2010 workshop: Multimodal corpora: Advances in capturing, coding and analyzing multimodality (pp. 92–98).
  25. Marcos-Ramiro, A., Pizarro-Perez, D., Marron-Romera, M., Nguyen, L. S., & Gatica-Perez, D. (2013). Body communicative cue extraction for conversational analysis. In Proceedings of the IEEE international conference on automatic face and gesture recognition.
  26. McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago, IL: University of Chicago Press.
  27. Mittal, A., Zisserman, A., & Torr, P. (2011). Hand detection using multiple proposals. In Proceedings of BMVC (pp. 75.1–75.11). BMVA Press.
  28. Monnier, C., German, S., & Ost, A. (2015). A multi-scale boosted detector for efficient and robust gesture recognition. In Computer vision—ECCV 2014 workshops, Zurich, Switzerland, September 6–7 and 12, 2014, Proceedings, Part I.
  29. Müller, C., Cienki, A., Fricke, E., Ladewig, S., & McNeill, D. (2013). Body-language-communication: An international handbook on multimodality in human interaction (Vol. 1). Berlin: De Gruyter Mouton.
  30. Müller, C., Cienki, A., Fricke, E., Ladewig, S., & McNeill, D. (2014). Body-language-communication: An international handbook on multimodality in human interaction (Vol. 2). Berlin: De Gruyter Mouton.
  31. Neverova, N., Wolf, C., Taylor, G. W., & Nebout, F. (2014). Multi-scale deep learning for gesture detection and localization. In Proceedings of the ECCV ChaLearn workshop on looking at people.
  32. Peng, X., Wang, L., Cai, Z., & Qiao, Y. (2015). Action and gesture temporal spotting with super vector representation. In Computer vision—ECCV 2014 workshops, Zurich, Switzerland, September 6–7 and 12, 2014, Proceedings, Part I (pp. 518–527). Cham: Springer.
  33. Rahim, N. A. A., Kit, C. W., & See, J. (2006). RGB-H-CbCr skin colour model for human face detection. In Proceedings of M2USIC, Petaling Jaya, Malaysia.
  34. Rautaray, S. S., & Agrawal, A. (2012). Vision based hand gesture recognition for human computer interaction: A survey. Artificial Intelligence Review, 43(1), 1–54.
  35. Schreer, O., & Masneri, S. (2014). Automatic video analysis for annotation of human body motion in humanities research. In International workshop on multimodal corpora, in conjunction with the 9th edition of the language resources and evaluation conference (LREC) (pp. 29–32).
  36. Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 511–518).
  37. Yin, Y., & Davis, R. (2013). Gesture spotting and recognition using salience detection and concatenated hidden Markov models. In Proceedings of the 15th ACM international conference on multimodal interaction (ICMI) (pp. 489–494). New York, NY: ACM.
  38. Zhang, Z., Conly, C., & Athitsos, V. (2014). Hand detection on sign language videos. In Proceedings of the 7th international conference on pervasive technologies related to assistive environments (PETRA) (pp. 26:1–26:5). ACM.

Copyright information

© Springer Science+Business Media B.V. 2017

Authors and Affiliations

  1. EAVISE, KU Leuven, Louvain, Belgium
  2. MIDI, KU Leuven, Louvain, Belgium
