Abstract
The automated annotation of conversational video with semantic miscommunication labels is a challenging problem. Although miscommunications are often obvious to speakers and observers alike, they are difficult for machines to detect from low-level features. Among various non-verbal features, this paper investigates the utility of gestural cues. Compared with gesture recognition tasks in human-computer interaction, this task is difficult because it is poorly understood which cues contribute to miscommunication and because the gestures involved are implicit. Nine simple gestural features are extracted from gesture data, and both simple and complex classifiers are constructed using machine learning. The experimental results suggest that no single gestural feature can predict or explain the occurrence of semantic miscommunication in our setting.
Acknowledgements
We would like to thank the m-project members, especially Kunio Tanabe and Tomoko Matsui, for commenting on an earlier version of this paper. This research was partially supported by Grants-in-Aid for Scientific Research 19530620 and 21500266, the National Science Foundation under Grant CCF-0958490, the National Institutes of Health under Grant 1-RC2-HG005668-01, and the Function and Induction Research Project, Transdisciplinary Research Integration Center of the Research Organization of Information and Systems.
About this article
Cite this article
Inoue, M., Ogihara, M., Hanada, R. et al. Gestural cue analysis in automated semantic miscommunication annotation. Multimed Tools Appl 61, 7–20 (2012). https://doi.org/10.1007/s11042-010-0701-1
DOI: https://doi.org/10.1007/s11042-010-0701-1