A semi-automatic annotation tool for unobtrusive gesture analysis

  • Original Paper
  • Language Resources and Evaluation

Abstract

In a variety of research fields, including linguistics, human–computer interaction research, psychology, sociology and behavioral studies, there is a growing interest in the role of gestural behavior related to speech and other modalities. The analysis of multimodal communication requires high-quality video data and detailed annotation of the different semiotic resources under scrutiny. In the majority of cases, the annotation of hand position, hand motion, gesture type, etc. is done manually, a time-consuming enterprise that requires multiple annotators and substantial resources. In this paper we present a semi-automatic alternative that minimizes the manual workload while guaranteeing highly accurate annotations. First, we discuss our approach, which consists of several processing steps such as identifying the hands in images, calculating the motion of the hands, and segmenting the recording into gesture and non-gesture events. Second, we validate our approach against existing corpora in terms of accuracy and usefulness. The proposed approach is designed to provide annotations according to the McNeill (1992) gesture space, and the output is compatible with annotation tools such as ELAN or ANVIL.
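
The paper itself details the individual processing steps; as a rough illustration of the kind of pipeline the abstract describes, the sketch below combines a simple skin-colour mask with frame-to-frame motion to split a recording into gesture and non-gesture intervals. It is not the authors' implementation: the YCrCb skin range, the motion threshold and the fixed frame rate are placeholder assumptions, and OpenCV is used purely for convenience.

    # Illustrative sketch only (not the tool described in the paper): a minimal
    # skin-colour + motion pipeline that splits a video into gesture and
    # non-gesture intervals. Skin range, threshold and fps are assumed values.
    import cv2
    import numpy as np

    SKIN_LOW = np.array([0, 133, 77], dtype=np.uint8)      # YCrCb lower bound (assumption)
    SKIN_HIGH = np.array([255, 173, 127], dtype=np.uint8)  # YCrCb upper bound (assumption)
    MOTION_THRESH = 1.5                                    # mean frame difference on skin pixels (assumption)

    def hand_motion_per_frame(video_path):
        """Return one motion score per frame, restricted to skin-coloured regions."""
        cap = cv2.VideoCapture(video_path)
        scores, prev_gray = [], None
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
            skin = cv2.inRange(ycrcb, SKIN_LOW, SKIN_HIGH)     # candidate hand/face pixels
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev_gray is not None:
                diff = cv2.absdiff(gray, prev_gray)
                diff = cv2.bitwise_and(diff, diff, mask=skin)  # keep motion inside skin regions only
                scores.append(float(diff.mean()))
            prev_gray = gray
        cap.release()
        return scores

    def segment_gestures(scores, fps=25.0):
        """Threshold the motion curve into (start_s, end_s) gesture intervals."""
        intervals, start = [], None
        for i, score in enumerate(scores):
            if score >= MOTION_THRESH and start is None:
                start = i
            elif score < MOTION_THRESH and start is not None:
                intervals.append((start / fps, i / fps))
                start = None
        if start is not None:
            intervals.append((start / fps, len(scores) / fps))
        return intervals

A full system would rely on a proper hand detector and tracking rather than a raw skin mask and frame differencing; this sketch only mirrors the overall detect, measure and segment structure that the abstract outlines.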

Notes

  1. https://tla.mpi.nl/tools/tla-tools/elan/.

  2. http://www.anvil-software.org.

  3. http://youtu.be/DsxdBc4gGjg.

  4. https://tla.mpi.nl/projects_info/auvis/.
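
The footnotes above point to ELAN and ANVIL; as a small, purely illustrative companion to the sketch after the abstract, the snippet below writes detected intervals as a tab-delimited text file of the kind ELAN can import. The column layout, tier name and file names are assumptions for illustration, not the tool's actual export format.

    # Illustrative sketch (assumed column layout, not the tool's export routine):
    # write gesture intervals as a tab-delimited file for import into ELAN.
    def write_elan_tsv(intervals, path, tier="Gesture"):
        with open(path, "w", encoding="utf-8") as f:
            f.write("Tier\tBegin Time\tEnd Time\tAnnotation\n")
            for start_s, end_s in intervals:
                f.write(f"{tier}\t{start_s:.3f}\t{end_s:.3f}\tgesture\n")

    # Hypothetical usage, chaining the sketch from the abstract:
    # write_elan_tsv(segment_gestures(hand_motion_per_frame("speaker.mp4")), "gestures.txt")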

References

  • Abuczki, Á., & Esfandiari, B. G. (2013). An overview of multimodal corpora, annotation tools and schemes. Argumentum, 9, 86–98.

  • Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C., & Paggio, P. (2007). The mumin coding scheme for the annotation of feedback, turn management and sequencing phenomena. Language Resources and Evaluation, 41(3–4), 273–287.

  • Alon, J., Athitsos, V., Yuan, Q., & Sclaroff, S. (2009). A unified framework for gesture recognition and spatiotemporal gesture segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(9), 1685–1699.

  • Badi, H. (2016). Recent methods in vision-based hand gesture recognition. International Journal of Data Science and Analytics, 1, 77–87.

  • Bennewitz, M., Axenbeck, T., Behnke, S., & Burgard, W. (2008). Robust recognition of complex gestures for natural human–robot interaction. In Proceedings of the workshop on interactive robot learning at robotics: science and systems conference (RSS).

  • Blache, P., Bertrand, R., & Ferré, G. (2009). Creating and exploiting multimodal annotated corpora: The ToMA Project (pp. 38–53).

  • Brenger, B., & Mittelberg, I. (2015). Shakes, nods and tilts. Motion-capture data profiles of speakers and listeners head gestures. In Proceedings of the 3rd gesture and speech in interaction (GESPIN) conference (pp. 43–48).

  • Bressem, J. (2013). Transcription systems for gestures, speech, prosody, postures, and gaze. In Body-language-communication: An international handbook on multimodality in human interaction (Vol. 1, pp. 1037–1059). Berlin: De Gruyter Mouton.

  • Chang, J. Y. (2015). Nonparametric gesture labeling from multi-modal data. In Proceedings, Part I: Computer vision—ECCV 2014 workshops, Zurich, Switzerland, September 6–7 and 12, 2014.

  • De Beugher, S., Brône, G., & Goedemé, T. (2016). Semi-automatic hand annotation making human–human interaction analysis fast and accurate. In Proceedings of the 11th joint conference on computer vision, imaging and computer graphics theory and applications (VISAPP) (pp. 552–559).

  • Demircioğlu, B., Bülbül, G., & Köse, H. (2016). Recognition of sign language hand shape primitives with leap motion. In LREC workshop on the representation and processing of sign languages: Corpus mining (pp. 47–52). Portorož, Slovenia.

  • Dilsizian, M., Tang, Z., Metaxas, D., Huenerfauth, M., & Neidle, C. (2016). The importance of 3D motion trajectories for computer-based sign recognition. In LREC workshop on the representation and processing of sign languages: Corpus mining (pp. 53–58). Portorož, Slovenia.

  • Escalera, S., Baró, X., Gonzàlez, J., Bautista, M. A., Madadi, M., Reyes, M., et al. (2015). ChaLearn looking at people challenge 2014: Dataset and results. In Proceedings, Part I: Computer vision—ECCV 2014 workshops, Zurich, Switzerland, September 6–7 and 12, 2014 (pp. 459–473). Cham: Springer.

  • Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.

  • Ferrari, V., Marin-Jimenez, M., & Zisserman, A. (2009). Pose search: Retrieving people using their pose. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–8).

  • Feyaerts, K., Brône, G., & Oben, B. (2016). Multimodality in interaction.

  • Gebre, B. G., Wittenburg, P., & Lenkiewicz, P. (2012). Towards automatic gesture stroke detection. In Proceedings of the eighth international conference on language resources and evaluation (LREC), European Language Resources Association (ELRA), Istanbul, Turkey.

  • Jewitt, C. (2009). The Routledge handbook of multimodal analysis. London: Routledge.

  • Jones, M. J., & Rehg, J. M. (2002). Statistical color models with application to skin detection. International Journal of Computer Vision, 46(1), 81–96.

  • Kalman, R. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82, 35–45.

  • Kapuscinski, T., Oszust, M., Wysocki, M., & Warchol, D. (2015). Recognition of hand gestures observed by depth cameras. International Journal of Advanced Robotic Systems, 12, 36.

  • Karlinsky, L., Dinerstein, M., Harari, D., & Ullman, S. (2010). The chains model for detecting parts by their context. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR) (pp. 25–32).

  • Kuznetsova, A., Leal-Taixe, L., & Rosenhahn, B. (2013). Real-time sign language recognition using a consumer depth camera. In Proceedings of the IEEE international conference on computer vision (ICCV) workshops.

  • Lücking, A., Bergmann, K., Hahn, F., Kopp, S., & Rieser, H. (2010). The Bielefeld speech and gesture alignment corpus (SaGA). In M. Kipp, J. P. Martin, P. Paggio, & D. Heylen (Eds.), Proceedings of LREC 2010 workshop: Multimodal corpora: Advances in capturing, coding and analyzing multimodality (pp. 92–98).

  • Marcos-Ramiro, A., Pizarro-Perez, D., Marron-Romera, M., Nguyen, L. S., & Gatica-Perez, D. (2013). Body communicative cue extraction for conversational analysis. In Proceedings of IEEE international conference on automatic face and gesture recognition.

  • McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago, IL: University of Chicago Press.

  • Mittal, A., Zisserman, A., & Torr, P. (2011). Hand detection using multiple proposals. In Proceedings of BMVC (pp. 75.1–75.11). BMVA Press.

  • Monnier, C., German, S., & Ost, A. (2015). A multi-scale boosted detector for efficient and robust gesture recognition. In Proceedings, Part I: Computer vision—ECCV 2014 workshops, Zurich, Switzerland, September 6–7 and 12, 2014.

  • Müller, C., Cienki, A., Fricke, E., Ladewig, S., & McNeill, D. (2013). Body-language-communication: An international handbook on multimodality in human interaction (Vol. 1). Berlin: De Gruyter Mouton.

  • Müller, C., Cienki, A., Fricke, E., Ladewig, S., & McNeill, D. (2014). Body-language-communication: An international handbook on multimodality in human interaction (Vol. 2). Berlin: De Gruyter Mouton.

  • Neverova, N., Wolf, C., Taylor, G. W., & Nebout, F. (2014). Multi-scale deep learning for gesture detection and localization. In Proceedings of ECCV ChaLearn workshop on looking at people.

  • Peng, X., Wang, L., Cai, Z., & Qiao, Y. (2015). Action and gesture temporal spotting with super vector representation. In Proceedings, Part I: Computer Vision—ECCV 2014 workshops, Zurich, Switzerland, September 6–7 and 12, 2014 (pp. 518–527). Cham: Springer.

  • Rahim, N. A. A., Kit, C. W., & See, J. (2006). RGB-H-CbCr skin colour model for human face detection. In Proceedings of M2USIC, Petaling Jaya, Malaysia.

  • Rautaray, S. S., & Agrawal, A. (2012). Vision based hand gesture recognition for human computer interaction: A survey. Artificial Intelligence Review, 43(1), 1–54.

  • Schreer, O., & Masneri, S. (2014). Automatic video analysis for annotation of human body motion in humanities research. In International workshop on multimodal corpora in conjunction with 9th edition of the language resources and evaluation conference (LREC) (pp. 29–32).

  • Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR) (pp. 511–518).

  • Yin, Y., & Davis, R. (2013). Gesture spotting and recognition using salience detection and concatenated hidden Markov models. In Proceedings of the 15th ACM on international conference on multimodal interaction (ICMI), ACM, New York, NY, USA (pp. 489–494).

  • Zhang, Z., Conly, C., & Athitsos, V. (2014). Hand detection on sign language videos. In Proceedings of the 7th international conference on PErvasive Technologies Related to Assistive Environments (PETRA), ACM (pp. 26:1–26:5).

Acknowledgements

We would like to thank Prof. Irene Mittelberg and her research group for providing the annotations on the NeuroPeirce dataset (Brenger and Mittelberg 2015) as well as the authors of the SaGA dataset (Lücking et al. 2010). The availability of the annotations was crucial for our validation process.

Author information

Corresponding author

Correspondence to Stijn De Beugher.

Cite this article

De Beugher, S., Brône, G. & Goedemé, T. A semi-automatic annotation tool for unobtrusive gesture analysis. Lang Resources & Evaluation 52, 433–460 (2018). https://doi.org/10.1007/s10579-017-9404-9
