Language Resources and Evaluation

Volume 51, Issue 2, pp. 463–494

The Danish NOMCO corpus: multimodal interaction in first acquaintance conversations

Original Paper

Abstract

This article presents the Danish NOMCO Corpus, an annotated multimodal collection of video-recorded first acquaintance conversations between Danish speakers. The annotation comprises a speech transcription with word boundaries, together with formal and functional coding of gestural behaviours, specifically head movements, facial expressions, and body posture. The corpus has served as the empirical basis for a number of studies of communication phenomena related to turn management, feedback exchange, information packaging, and the expression of emotional attitudes. We describe the annotation scheme, the annotation procedure, and the resulting annotations. We then summarise a number of studies conducted on the corpus. The corpus is available for research and teaching purposes through the authors of this article.

Keywords

Multimodal corpora · First acquaintance conversations · Gestural annotation

Acknowledgments

The NOMCO project was funded by NOS-HS NORDCORP. We would like to acknowledge our partners from the Universities of Gothenburg and Helsinki, as well as the annotators of the Danish data: Sara Andersen, Josephine B. Arrild, Anette Studsgård and Bjørn N. Wessel-Tolvig. We would also like to thank the two anonymous reviewers for their helpful comments.

Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  1. University of Copenhagen, Copenhagen, Denmark
  2. University of Malta, Msida, Malta
