The Danish NOMCO corpus: multimodal interaction in first acquaintance conversations

Abstract

This article presents the Danish NOMCO Corpus, an annotated multimodal collection of video-recorded first acquaintance conversations between Danish speakers. The annotation comprises speech transcription with word boundaries, as well as formal and functional coding of gestural behaviours, specifically head movements, facial expressions, and body posture. The corpus has served as the empirical basis for a number of studies of communication phenomena related to turn management, feedback exchange, information packaging and the expression of emotional attitudes. We describe the annotation scheme, procedure, and annotation results. We then summarise a number of studies conducted on the corpus. The corpus is available for research and teaching purposes through the authors of this article.

Notes

  1. We relied on the definition of utterance proposed in Levinson (1983), where an utterance is defined as “the issuance of a sentence, a sentence-analogue, or sentence-fragment, in an actual context” (p. 18).

  2. A step in this direction was taken by developing a face and head tracker plug-in for ANVIL (Jongejan 2010), which can be used to further annotate the corpus.

  3. In most cases one coder chose one category as the primary and indicated another possible category in the comment field, while the second coder chose the second category as the primary and mentioned the first one in the comment field.

  4. Unimodal here means a gesture not accompanied by a word. We do not investigate whether the nod occurs together with other gestural behaviours.

References

  1. Alahverdzhieva, K., & Lascarides, A. (2010). Analysing speech and co-speech gesture in constraint-based grammars. In S. Müller (Ed.), Proceedings of the HPSG10 conference (pp. 6–26). Stanford: CSLI Publications.

  2. Allwood, J. (2002). Bodily communication dimensions of expression and content. In B. Granström, D. House, & I. Karlsson (Eds.), Multimodality in language and speech systems (pp. 7–26). Dordrecht: Springer. doi: 10.1007/978-94-017-2367-1_2.

  3. Allwood, J. (2008). Dimensions of embodied communication—Towards a typology of embodied communication. In I. Wachsmuth, M. Lenzen, & G. Knoblich (Eds.), Embodied communication in humans and machines. Oxford: Oxford University Press.

  4. Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C., & Paggio, P. (2007). The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena. In J. C. Martin, P. Paggio, P. Kuehnlein, R. Stiefelhagen, & F. Pianesi (Eds.), Multimodal corpora for modelling human multimodal behaviour, special issue of the international journal of language resources and evaluation (Vol. 41, pp. 273–287). Berlin: Springer.

  5. Allwood, J., Lanzini, S., & Ahlsén, E. (2014). Contributions of different modalities to the attribution of affective-epistemic states. In P. Paggio & B. N. Wessel-Tolvig (Eds.), Proceedings from the 1st European symposium on multimodal communication University of Malta (pp. 1–6). Valletta: Linköping University Electronic Press.

  6. Allwood, J., Nivre, J., & Ahlsén, E. (1993). On the semantics and pragmatics of linguistic feedback. Journal of Semantics, 9(1), 1–26.

  7. Argyle, M., & Cook, M. (1976). Gaze and mutual gaze. Cambridge: Cambridge University Press.

  8. Aung, M. S. H., Bianchi-Berthouze, N., Watson, P., & Williams, A. C. D. C. (2014). Automatic recognition of fear-avoidance behaviour in chronic pain physical rehabilitation. In Proceedings of 8th international conference on pervasive computing technologies for healthcare.

  9. Boersma, P., & Weenink, D. (2009). Praat: Doing phonetics by computer (version 5.1.05) [computer program]. Retrieved May 1, 2009, from http://www.praat.org/.

  10. Bolinger, D. (1986). Intonation and its parts: Melody in spoken English. Stanford, CA: Stanford University Press.

  11. Bourbakis, N., Esposito, A., & Kavraki, D. (2011). Extracting and associating meta-features for understanding people’s emotional behaviour: Face and speech. Journal of Cognitive Computation, 3, 436–448.

  12. Bunt, H., Alexandersson, J., Choe, J. W., Fang, A. C., Hasida, K., Petukhova, V., et al. (2012). ISO 24617-2: A semantically-based standard for dialogue annotation. In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), LREC, Citeseer (pp. 430–437). European Language Resources Association (ELRA).

  13. Campbell, N., & Scherer, S. (2010). Comparing measures of synchrony and alignment in dialogue speech timing with respect to turn-taking activity. In Proceedings of Interspeech (pp. 2546–2549).

  14. Cavicchio, F., & Poesio, M. (2009). Multimodal corpora annotation: Validation methods to assess coding scheme reliability. In M. Kipp, J. C. Martin, P. Paggio, & D. Heylen (Eds.), Multimodal corpora. Lecture notes in computer science (Vol. 5509). Berlin: Springer.

  15. Cerrato, L. (2007). Investigating communicative feedback phenomena across languages and modalities. Ph.D. thesis, School of Speech and Music Communication, KTH, Stockholm.

  16. Cienki, A., & Müller, C. (2008). Metaphor and gesture. Amsterdam: Benjamins.

  17. Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.

  18. Dancey, C. P., & Reidy, J. (2004). Statistics without maths for psychology: Using SPSS for Windows. Upper Saddle River, NJ: Prentice-Hall Inc.

  19. De Ruiter, J. P. (2000). The production of gesture and speech. In D. McNeill (Ed.), Language and gesture. Cambridge: Cambridge University Press.

  20. Duncan Jr., S., & Fiske, D. (1977). Face-to-face interaction. Hillsdale, NJ: Erlbaum.

  21. Duncan, S. (1972). Some signals and rules for taking speaking turns in conversations. Journal of Personality and Social Psychology, 23(2), 283–292.

  22. Duncan, S., Cassell, J., & Levy, E. (2007). Gesture and the dynamic dimension of language. Amsterdam: Benjamins.

  23. Ebert, C., Evert, S., & Wilmes, K. (2011). Focus marking via gestures. In I. Reich et al. (Eds.), Proceedings of Sinn & Bedeutung 15 (pp. 193–208). Saarbrücken, Germany: Universaar-Saarland University Press.

  24. Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3/4), 169–200.

  25. Ekman, P., & Friesen, W. (1975). Unmasking the face: A guide to recognizing emotions from facial clues. Upper Saddle River: Prentice-Hall.

  26. Ekman, P., & Friesen, W. V. (1969). The repertoire of nonverbal behavior: Categories, origins, usage, and coding. Semiotica, 1(1), 49–98.

  27. Enfield, N. J. (2012). The anatomy of meaning: Speech, gesture, and composite utterances. Cambridge: Cambridge University Press.

  28. Gibbon, D. (2011). Modelling gesture as speech: A linguistic approach. Poznań Studies in Contemporary Linguistics, 47, 470–508.

  29. Giorgolo, G., & Verstraten, F. A. (2008). Perception of ‘speech-and-gesture’ integration. In Proceedings of the international conference on auditory-visual speech processing 2008 (pp. 31–36).

  30. Goodwin, C. (1981). Conversational organization: Interaction between speakers and hearers. New York: Academic Press.

  31. Gullberg, M., & de Bot, K. (Eds.). (2010). Gestures in language development. Amsterdam: Benjamins.

  32. Hadar, U., Steiner, T., & Rose, F. C. (1984). The timing of shifts of head postures during conversation. Human Movement Science, 3(3), 237–245.

  33. Hadar, U., Steiner, T. J., & Rose, F. C. (1985). Head movement during listening turns in conversation. Journal of Nonverbal Behavior, 9(4), 214–228.

  34. Jongejan, B. (2010). Automatic face tracking in ANVIL. In M. Kipp, J. C. Martin, P. Paggio, & D. Heylen (Eds.), Multimodal corpora: Advances in capturing, coding and analyzing multimodality (pp. 201–208). European Language Resources Association (ELRA), May 18, 2010.

  35. Kendon, A. (1967). Some functions of gaze-direction in social interaction. Acta Psychologica, 26, 22–63.

  36. Kendon, A. (1978). Differential perception and attentional frame: Two problems for investigation. Semiotica, 24, 305–315.

  37. Kendon, A. (1980). Gesture and speech: Two aspects of the process of utterance. In M. R. Key (Ed.), Nonverbal communication and language (pp. 207–227). Mouton.

  38. Kendon, A. (2004). Gesture. Cambridge: Cambridge University Press.

  39. Kipp, M. (2004). Gesture generation by Imitation—From human behavior to computer character animation. Boca Raton, FL: Dissertation.com.

  40. Kipp, M., & Martin, J. C. (2009). Gesture and emotion: Can basic gestural form features discriminate emotions? In Proceedings of the international conference on affective computing and intelligent interaction (ACII-09). IEEE Press.

  41. Kita, S., & Özyürek, A. (2003). What does cross-linguistic variation in semantic coordination of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and speaking. Journal of Memory and Language, 48(1), 16–32.

  42. Kousidis, S., Malisz, Z., Wagner, P., & Schlangen, D. (2013). Exploring annotation of head gesture forms in spontaneous human interaction. In Proceedings of the Tilburg gesture meeting (TiGeR).

  43. Leonard, T., & Cummins, F. (2010). The temporal relation between beat gestures and speech. Language and Cognitive Processes, 26(10), 1457–1471.

  44. Levinson, S. (1983). Pragmatics. Cambridge: Cambridge University Press.

  45. Loehr, D. P. (2004). Gesture and intonation. Ph.D. thesis, Georgetown University.

  46. Loehr, D. P. (2007). Aspects of rhythm in gesture and speech. Gesture, 7(2), 179–214.

  47. Lucey, P., Cohn, J. F., Prkachin, K. M., Solomon, P. E., Chew, S., & Matthews, I. (2012). Painful monitoring: Automatic pain monitoring using the UNBC-McMaster shoulder pain expression archive database. Image and Vision Computing, 30(3), 197–205.

  48. Maynard, S. K. (1987). Interactional functions of a nonverbal sign: Head movement in Japanese dyadic casual conversation. Journal of Pragmatics, 11, 589–606.

  49. McClave, E. Z. (2000). Linguistic functions of head movements in the context of speech. Journal of Pragmatics, 32(7), 855–878.

  50. McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.

  51. McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.

  52. Navarretta, C. (2011). Annotating non-verbal behaviours in informal interactions. In A. Esposito, A. Vinciarelli, K. Vicsi, C. Pelachaud, & A. Nijholt (Eds.), Analysis of verbal and nonverbal communication and enactment: The processing issues, LNCS (Vol. 6800, pp. 317–324). Berlin: Springer.

  53. Navarretta, C. (2012). Annotating and analyzing emotions in a corpus of first encounters. In IEEE (Ed.) Proceedings of the 3rd IEEE international conference on cognitive infocommunications (pp. 433–438), Kosice.

  54. Navarretta, C. (2013a). Predicting speech overlaps from speech tokens and co-occurring body behaviours in dyadic conversations. In Proceedings of ACM international conference on multimodal interaction (ICMI 2013) (pp. 157–163). Sydney: ACM.

  55. Navarretta, C. (2013b). Transfer learning in multimodal corpora. In IEEE (Ed.) Proceedings of the 4th IEEE international conference on cognitive infocommunications (CogInfoCom2013) (pp. 195–200). Hungary: Budapest.

  56. Navarretta, C. (2014). Predicting emotions in facial expressions from the annotations in naturally occurring first encounters. Knowledge Based Systems, 71, 34–40.

  57. Navarretta, C., Ahlsén, E., Allwood, J., Jokinen, K., & Paggio, P. (2012). Feedback in Nordic first-encounters: A comparative study (pp. 2494–2499). Istanbul: European language resources distribution agency.

  58. Navarretta, C., & Paggio, P. (2012). Verbal and non-verbal feedback in different types of interactions. In Proceedings of LREC 2012 (pp. 2338–2342). Istanbul.

  59. Navarretta, C., & Paggio, P. (2013a). Classifying multimodal turn management in Danish dyadic first encounters. In NEALT proceedings of the 19th nordic conference of computational linguistics (Nodalida 2013), Oslo, Linköping electronic conference proceedings (pp. 133–146).

  60. Navarretta, C., & Paggio, P. (2013b). Multimodal turn management in Danish dyadic first encounters. In NEALT proceedings. Northern European association for language and technology, Proceedings of the fourth nordic symposium of multimodal communication, Gothenburg, Linköping electronic conference proceedings (pp. 5–12).

  61. Paggio, P. (2006a). Annotating information structure in a corpus of spoken Danish. In Proceedings of the 5th international conference on Language Resources and Evaluation LREC2006 (pp. 1606–1609). Italy: Genova.

  62. Paggio, P. (2006b). Information structure and pauses in a corpus of spoken Danish. In Conference companion of the 11th conference of the European chapter of the association for computational linguistics (pp. 191–194). Italy: Trento.

  63. Paggio, P. (2016). Coordination of head movements and speech in first encounter dialogues. In E. Gilmartin, L. Cerrato, & N. Campbell (Eds.), Proceedings from the 3rd European Symposium on Multimodal Communication, Dublin, September (pp. 69–74). Linköpings universitet: Linköping University Electronic Press.

  64. Paggio, P., Allwood, J., Ahlsén, E., Jokinen, K., & Navarretta, C. (2010). The NOMCO multimodal nordic resource—Goals and characteristics. In Proceedings of the seventh conference on international language resources and evaluation (LREC’10). European Language Resources Association (ELRA), Valletta.

  65. Paggio, P., & Diderichsen, P. (2010). Information structure and communicative functions in spoken and multimodal data. In P.J. Henriksen (Ed.), Linguistic theory and raw sound, Copenhagen studies in language (Vol. 49, pp. 149–168). Frederiksberg: Samfundslitteratur.

  66. Paggio, P., & Navarretta, C. (2011). Head Movements, facial expressions and feedback in Danish first encounters interactions: A culture-specific analysis. In Lecture notes in computer science (Vol. 6766, pp. 583–590). Springer.

  67. Paggio, P., & Navarretta, C. (2012). Classifying the feedback function of head movements and face expressions. In LREC 2012 workshop multimodal corpora—How should multimodal corpora deal with the situation? (pp. 34–37). Istanbul: European language resources distribution agency.

  68. Paggio, P., & Vella, A. (2014). Overlaps in Maltese conversational and task oriented dialogues. In P. Paggio & B. N. Wessel-Tolvig (Eds.), Proceedings from the 1st European symposium on multimodal communication University of Malta (pp. 55–64). Valletta: Linköping University Electronic Press.

  69. Peirce, C. S. (1931). Elements of logic. Collected papers of Charles Sanders Peirce (Vol. 2). Cambridge: Harvard University Press.

  70. Poggi, I. (2007). Hands, mind, face and body: A goal and belief view of multimodal communication. Berlin: Weidler.

  71. Russell, J. A., & Mehrabian, A. (1977). Evidence for a three-factor theory of emotions. Journal of Research in Personality, 11, 273–294.

  72. Savva, N., Scarinzi, A., & Bianchi-Berthouze, N. (2012). Continuous recognition of player’s affective body expression as dynamic quality of aesthetic experience. IEEE Transactions on Computational Intelligence and AI in Games, 4(3), 199–212.

  73. Schegloff, E. A. (1984). On some gestures’ relation to talk. In J. M. Atkinson & J. Heritage (Eds.), Structures of social action (pp. 266–298). Cambridge: Cambridge University Press.

  74. Studsgård, A. L., & Navarretta, C. (2013). Annotating attitudes in the Danish NOMCO corpus of first encounters. In NEALT proceedings. Northern European association for language and technology, 4th Nordic symposium on multimodal communication (pp. 85–89). Linköping University Electronic Press.

  75. Vallduví, E., & Engdahl, E. (1996). The linguistic realisation of information packaging. Linguistics, 34(3), 459–520.

  76. Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd edn.). San Francisco: Morgan Kaufmann.

Acknowledgments

The NOMCO project was funded by NOS-HS NORDCORP. We would like to acknowledge our partners from the Universities of Gothenburg and Helsinki, and the annotators of the Danish data: Sara Andersen, Josephine B. Arrild, Anette Studsgård and Bjørn N. Wessel-Tolvig. We would also like to thank the two anonymous reviewers for their helpful comments.

Author information

Corresponding author

Correspondence to Patrizia Paggio.

Appendix

See Table 20.

Table 20 Gesture counts

Table 20 displays the total counts of the various gesture types in the corpus. Note that the total number of facial expressions is in fact 1448: to the 981 expressions annotated with one of the general facial features must be added 467 expressions annotated only with an eyebrow feature. Conversely, there are 856 facial expressions with no eyebrow annotation. Similarly, there are 982 body posture behaviours in total: to the 888 movements annotated with a body posture feature must be added 94 shoulder movements with no body posture annotation, while 826 body posture annotations are not associated with a shoulder movement.
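
The totals quoted above can be checked by treating each gesture type as the union of two overlapping annotation layers (general facial feature and eyebrow feature; body posture feature and shoulder movement). The following is a minimal sketch in Python that uses only the counts given in this paragraph. The function name `overlap_counts` and its field names are illustrative assumptions and are not part of the corpus or its tooling. The derived overlap figures are arithmetic consequences of the reported counts, not numbers reported in the article.

```python
def overlap_counts(with_a: int, only_b: int, a_without_b: int) -> dict:
    """Combine two overlapping annotation layers A and B.

    with_a      -- behaviours carrying a layer-A feature (with or without B)
    only_b      -- behaviours carrying only a layer-B feature
    a_without_b -- behaviours carrying a layer-A feature and no layer-B feature
    """
    total = with_a + only_b        # every behaviour has A, or has B alone
    both = with_a - a_without_b    # behaviours annotated on both layers
    with_b = only_b + both         # all behaviours carrying a layer-B feature
    return {"total": total, "both": both, "with_b": with_b}


# Facial expressions: A = general facial feature, B = eyebrow feature.
face = overlap_counts(with_a=981, only_b=467, a_without_b=856)

# Body posture: A = body posture feature, B = shoulder movement.
body = overlap_counts(with_a=888, only_b=94, a_without_b=826)

print("facial expressions:", face)
print("body behaviours:", body)
```

Under this reading, `total` reproduces the 1448 facial expressions and 982 body behaviours stated in the text, while `both` and `with_b` indicate how many behaviours would carry annotations on both layers.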

About this article

Cite this article

Paggio, P., Navarretta, C. The Danish NOMCO corpus: multimodal interaction in first acquaintance conversations. Lang Resources & Evaluation 51, 463–494 (2017). https://doi.org/10.1007/s10579-016-9371-6

Keywords

  • Multimodal corpora
  • First acquaintance conversations
  • Gestural annotation