Abstract
This paper examines the HuComTech Corpus of multimodal interactions from the viewpoint of an automated dialogue system, focusing on the difference between formal and informal interactions as seen through image processing. We show that an autonomous dialogue system can infer the engagement state of its interlocutor, and can therefore function efficiently without complete dialogue understanding. A very simple image processing algorithm distinguishes engagement states with a high degree of reliability, and we infer from these results that future autonomous agents might use a similar method to estimate how engaged a human interlocutor is in a conversation.
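The page does not reproduce the algorithm itself, but a minimal sketch of the kind of "very simple image processing" the abstract describes might look like the following: frame differencing over the corpus video with OpenCV, producing a per-frame movement score as a crude proxy for how animated the interlocutor is. The function name, the grey-level threshold, and the choice of frame differencing are illustrative assumptions, not the method reported in the chapter.

    import cv2
    import numpy as np

    def movement_scores(video_path, threshold=25):
        """Per-frame movement scores for a video: the fraction of pixels
        whose grey level changes by more than `threshold` between
        consecutive frames. (Illustrative sketch, not the chapter's
        reported method.)"""
        cap = cv2.VideoCapture(video_path)
        scores = []
        ok, prev = cap.read()
        if ok:
            prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                # Count pixels whose grey level moved by more than the
                # threshold; their proportion is this frame's score.
                diff = cv2.absdiff(grey, prev)
                scores.append(float(np.mean(diff > threshold)))
                prev = grey
        cap.release()
        return scores

Averaging such scores within each dialogue segment would give a single activity figure per segment, which could then be compared across formal and informal interactions.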
Acknowledgements
The author wishes to acknowledge the support of AdaptCentre.ie and the School of Computer Science and Statistics in Dublin, and is particularly grateful to László Hunyadi and to István Szekrényes, whose technical help made the data available. The ADAPT Centre for Digital Content Technology is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Campbell, N. (2020). Watching People Talk; How Machines Can Know We Understand Them—A Study of Engagement in a Conversational Corpus. In: Hunyadi, L., Szekrényes, I. (eds) The Temporal Structure of Multimodal Communication. Intelligent Systems Reference Library, vol 164. Springer, Cham. https://doi.org/10.1007/978-3-030-22895-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22894-1
Online ISBN: 978-3-030-22895-8
eBook Packages: Intelligent Technologies and Robotics (R0)