A semi-automatic annotation tool for unobtrusive gesture analysis

  • Original Paper
  • Language Resources and Evaluation

Abstract

In a variety of research fields, including linguistics, human–computer interaction research, psychology, sociology and behavioral studies, there is a growing interest in the role of gestural behavior related to speech and other modalities. The analysis of multimodal communication requires high-quality video data and detailed annotation of the different semiotic resources under scrutiny. In the majority of cases, the annotation of hand position, hand motion, gesture type, etc. is done manually, a time-consuming enterprise that requires multiple annotators and substantial resources. In this paper we present a semi-automatic alternative that minimizes the manual workload while guaranteeing highly accurate annotations. First, we discuss our approach, which consists of several processing steps such as identifying the hands in images, calculating the motion of the hands, and segmenting the recording into gesture and non-gesture events. Second, we validate our approach against existing corpora in terms of accuracy and usefulness. The proposed approach is designed to provide annotations according to the McNeill (1992) gesture space, and the output is compatible with annotation tools such as ELAN or ANVIL.
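
The paper itself details the individual processing steps; as a rough illustration of the kind of pipeline the abstract describes, the sketch below combines a simple skin-colour mask with frame-to-frame motion to split a recording into gesture and non-gesture intervals. It is not the authors' implementation: the YCrCb skin range, the motion threshold and the fixed frame rate are placeholder assumptions, and OpenCV is used purely for convenience.

    # Illustrative sketch only (not the tool described in the paper): a minimal
    # skin-colour + motion pipeline that splits a video into gesture and
    # non-gesture intervals. Skin range, threshold and fps are assumed values.
    import cv2
    import numpy as np

    SKIN_LOW = np.array([0, 133, 77], dtype=np.uint8)      # YCrCb lower bound (assumption)
    SKIN_HIGH = np.array([255, 173, 127], dtype=np.uint8)  # YCrCb upper bound (assumption)
    MOTION_THRESH = 1.5                                    # mean frame difference on skin pixels (assumption)

    def hand_motion_per_frame(video_path):
        """Return one motion score per frame, restricted to skin-coloured regions."""
        cap = cv2.VideoCapture(video_path)
        scores, prev_gray = [], None
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
            skin = cv2.inRange(ycrcb, SKIN_LOW, SKIN_HIGH)     # candidate hand/face pixels
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev_gray is not None:
                diff = cv2.absdiff(gray, prev_gray)
                diff = cv2.bitwise_and(diff, diff, mask=skin)  # keep motion inside skin regions only
                scores.append(float(diff.mean()))
            prev_gray = gray
        cap.release()
        return scores

    def segment_gestures(scores, fps=25.0):
        """Threshold the motion curve into (start_s, end_s) gesture intervals."""
        intervals, start = [], None
        for i, score in enumerate(scores):
            if score >= MOTION_THRESH and start is None:
                start = i
            elif score < MOTION_THRESH and start is not None:
                intervals.append((start / fps, i / fps))
                start = None
        if start is not None:
            intervals.append((start / fps, len(scores) / fps))
        return intervals

A full system would rely on a proper hand detector and tracking rather than a raw skin mask and frame differencing; this sketch only mirrors the overall detect, measure and segment structure that the abstract outlines.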

Notes

  1. https://tla.mpi.nl/tools/tla-tools/elan/.

  2. http://www.anvil-software.org.

  3. http://youtu.be/DsxdBc4gGjg.

  4. https://tla.mpi.nl/projects_info/auvis/.
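
The footnotes above point to ELAN and ANVIL; as a small, purely illustrative companion to the sketch after the abstract, the snippet below writes detected intervals as a tab-delimited text file of the kind ELAN can import. The column layout, tier name and file names are assumptions for illustration, not the tool's actual export format.

    # Illustrative sketch (assumed column layout, not the tool's export routine):
    # write gesture intervals as a tab-delimited file for import into ELAN.
    def write_elan_tsv(intervals, path, tier="Gesture"):
        with open(path, "w", encoding="utf-8") as f:
            f.write("Tier\tBegin Time\tEnd Time\tAnnotation\n")
            for start_s, end_s in intervals:
                f.write(f"{tier}\t{start_s:.3f}\t{end_s:.3f}\tgesture\n")

    # Hypothetical usage, chaining the sketch from the abstract:
    # write_elan_tsv(segment_gestures(hand_motion_per_frame("speaker.mp4")), "gestures.txt")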

References

  • Abuczki, Á., & Esfandiari, B. G. (2013). An overview of multimodal corpora, annotation tools and schemes. Argumentum, 9, 86–98.

  • Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C., & Paggio, P. (2007). The mumin coding scheme for the annotation of feedback, turn management and sequencing phenomena. Language Resources and Evaluation, 41(3–4), 273–287.

  • Alon, J., Athitsos, V., Yuan, Q., & Sclaroff, S. (2009). A unified framework for gesture recognition and spatiotemporal gesture segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(9), 1685–1699.

  • Badi, H. (2016). Recent methods in vision-based hand gesture recognition. International Journal of Data Science and Analytics, 1, 77–87.

  • Bennewitz, M., Axenbeck, T., Behnke, S., & Burgard, W. (2008). Robust recognition of complex gestures for natural human–robot interaction. In Proceedings of the workshop on interactive robot learning at robotics: science and systems conference (RSS).

  • Blache, P., Bertrand, R., & Ferré, G. (2009). Creating and exploiting multimodal annotated corpora: The ToMA Project (pp. 38–53).

  • Brenger, B., & Mittelberg, I. (2015). Shakes, nods and tilts. Motion-capture data profiles of speakers and listeners head gestures. In Proceedings of the 3rd gesture and speech in interaction (GESPIN) conference (pp. 43–48).

  • Bressem, J. (2013). Transcription systems for gestures, speech, prosody, postures, and gaze. In Body-language-communication: An international handbook on multimodality in human interaction (Vol. 1, pp. 1037–1059). Berlin: De Gruyter Mouton.

  • Chang, J. Y. (2015). Nonparametric gesture labeling from multi-modal data. In Proceedings, Part I: Computer vision—ECCV 2014 workshops, Zurich, Switzerland, September 6–7 and 12, 2014.

  • De Beugher, S., Brône, G., & Goedemé, T. (2016). Semi-automatic hand annotation making human–human interaction analysis fast and accurate. In Proceedings of the 11th joint conference on computer vision, imaging and computer graphics theory and applications (VISAPP) (pp. 552–559).

  • Demircioğlu, B., Bülbül, G., & Köse, H. (2016). Recognition of sign language hand shape primitives with leap motion. In LREC workshop on the representation and processing of sign languages: Corpus mining (pp. 47–52). Portorož, Slovenia.

  • Dilsizian, M., Tang, Z., Metaxas, D., Huenerfauth, M., & Neidle, C. (2016). The importance of 3D motion trajectories for computer-based sign recognition. In LREC workshop on the representation and processing of sign languages: Corpus mining (pp. 53–58). Portorož, Slovenia.

  • Escalera, S., Baró, X., Gonzàlez, J., Bautista, M. A., Madadi, M., Reyes, M., et al. (2015). ChaLearn looking at people challenge 2014: Dataset and results. In Proceedings, Part I: Computer vision—ECCV 2014 workshops, Zurich, Switzerland, September 6–7 and 12, 2014 (pp. 459–473). Cham: Springer.

  • Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.

  • Ferrari, V., Marin-Jimenez, M., & Zisserman, A. (2009). Pose search: Retrieving people using their pose. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–8).

  • Feyaerts, K., Brône, G., & Oben, B. (2016). Multimodality in interaction.

  • Gebre, B. G., Wittenburg, P., & Lenkiewicz, P. (2012). Towards automatic gesture stroke detection. In Proceedings of the eighth international conference on language resources and evaluation (LREC), European Language Resources Association (ELRA), Istanbul, Turkey.

  • Jewitt, C. (2009). The Routledge handbook of multimodal analysis. London: Routledge.

  • Jones, M. J., & Rehg, J. M. (2002). Statistical color models with application to skin detection. International Journal of Computer Vision, 46(1), 81–96.

  • Kalman, R. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82, 35–45.

  • Kapuscinski, T., Oszust, M., Wysocki, M., & Warchol, D. (2015). Recognition of hand gestures observed by depth cameras. International Journal of Advanced Robotic Systems, 12, 36.

  • Karlinsky, L., Dinerstein, M., Harari, D., & Ullman, S. (2010). The chains model for detecting parts by their context. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR) (pp. 25–32).

  • Kuznetsova, A., Leal-Taixe, L., & Rosenhahn, B. (2013). Real-time sign language recognition using a consumer depth camera. In Proceedings of the IEEE international conference on computer vision (ICCV) workshops.

  • Lücking, A., Bergmann, K., Hahn, F., Kopp, S., & Rieser, H. (2010). The Bielefeld speech and gesture alignment corpus (SaGA). In M. Kipp, J. P. Martin, P. Paggio, & D. Heylen (Eds.), Proceedings of LREC 2010 workshop: Multimodal corpora: Advances in capturing, coding and analyzing multimodality (pp. 92–98).

  • Marcos-Ramiro, A., Pizarro-Perez, D., Marron-Romera, M., Nguyen, L. S., & Gatica-Perez, D. (2013). Body communicative cue extraction for conversational analysis. In Proceedings of IEEE international conference on automatic face and gesture recognition.

  • McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago, IL: University of Chicago Press.

  • Mittal, A., Zisserman, A., & Torr, P. (2011). Hand detection using multiple proposals. In Proceedings of BMVC (pp. 75.1–75.11). BMVA Press.

  • Monnier, C., German, S., & Ost, A. (2015). A multi-scale boosted detector for efficient and robust gesture recognition. In Proceedings, Part I: Computer vision—ECCV 2014 workshops, Zurich, Switzerland, September 6–7 and 12, 2014.

  • Müller, C., Cienki, A., Fricke, E., Ladewig, S., & McNeill, D. (2013). Body-language-communication: An international handbook on multimodality in human interaction (Vol. 1). Berlin: De Gruyter Mouton.

  • Müller, C., Cienki, A., Fricke, E., Ladewig, S., & McNeill, D. (2014). Body-language-communication: An international handbook on multimodality in human interaction (Vol. 2). Berlin: De Gruyter Mouton.

  • Neverova, N., Wolf, C., Taylor, G. W., & Nebout, F. (2014). Multi-scale deep learning for gesture detection and localization. In Proceedings of ECCV ChaLearn workshop on looking at people.

  • Peng, X., Wang, L., Cai, Z., & Qiao, Y. (2015). Action and gesture temporal spotting with super vector representation. In Proceedings, Part I: Computer Vision—ECCV 2014 workshops, Zurich, Switzerland, September 6–7 and 12, 2014 (pp. 518–527). Cham: Springer.

  • Rahim, N. A. A., Kit, C. W., & See, J. (2006). RGB-H-CbCr skin colour model for human face detection. In Proceedings of M2USIC, Petaling Jaya, Malaysia.

  • Rautaray, S. S., & Agrawal, A. (2012). Vision based hand gesture recognition for human computer interaction: A survey. Artificial Intelligence Review, 43(1), 1–54.

  • Schreer, O., & Masneri, S. (2014). Automatic video analysis for annotation of human body motion in humanities research. In International workshop on multimodal corpora in conjunction with 9th edition of the language resources and evaluation conference (LREC) (pp. 29–32).

  • Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR) (pp. 511–518).

  • Yin, Y., & Davis, R. (2013). Gesture spotting and recognition using salience detection and concatenated hidden Markov models. In Proceedings of the 15th ACM on international conference on multimodal interaction (ICMI), ACM, New York, NY, USA (pp. 489–494).

  • Zhang, Z., Conly, C., & Athitsos, V. (2014). Hand detection on sign language videos. In Proceedings of the 7th international conference on PErvasive Technologies Related to Assistive Environments (PETRA), ACM (pp. 26:1–26:5).

Acknowledgements

We would like to thank Prof. Irene Mittelberg and her research group for providing the annotations on the NeuroPeirce dataset (Brenger and Mittelberg 2015) as well as the authors of the SaGA dataset (Lücking et al. 2010). The availability of the annotations was crucial for our validation process.

Author information

Corresponding author

Correspondence to Stijn De Beugher.

Cite this article

De Beugher, S., Brône, G. & Goedemé, T. A semi-automatic annotation tool for unobtrusive gesture analysis. Lang Resources & Evaluation 52, 433–460 (2018). https://doi.org/10.1007/s10579-017-9404-9
