This article presents the Danish NOMCO Corpus, an annotated multimodal collection of video-recorded first acquaintance conversations between Danish speakers. The annotation includes speech transcription with word boundaries, as well as formal and functional coding of gestural behaviours, specifically head movements, facial expressions and body posture. The corpus has served as the empirical basis for a number of studies of communication phenomena related to turn management, feedback exchange, information packaging and the expression of emotional attitudes. We describe the annotation scheme, the annotation procedure and the annotation results. We then summarise a number of studies conducted on the corpus. The corpus is available for research and teaching purposes from the authors of this article.
We relied on the definition of utterance proposed in Levinson (1983), where an utterance is defined as “the issuance of a sentence, a sentence-analogue, or sentence-fragment, in an actual context” (p. 18).
A step in this direction was taken by developing an ANVIL plug-in for face and head tracking (Jongejan 2010), which can be used to further annotate the corpus.
In most cases one coder chose one category as the primary and indicated another possible category in the comment field, while the second coder chose the second category as the primary and mentioned the first one in the comment field.
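When inter-coder agreement is computed over primary labels only, for example with Cohen's (1960) kappa, the swapped choices described in the footnote count as outright disagreements, even though both coders considered the same two categories. The sketch below is a plain implementation of Cohen's kappa; the head movement labels and the four-item example are hypothetical and only illustrate the effect of one such swap (item 4).

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's (1960) kappa for two coders labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: share of items given the same primary label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement estimated from each coder's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        # Both coders used one identical label for every item: treat as perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical primary labels; on item 4 coder A chose Tilt (Nod in the comment
# field) while coder B chose Nod (Tilt in the comment field).
coder_a = ["Nod", "Nod", "Shake", "Tilt"]
coder_b = ["Nod", "Nod", "Shake", "Nod"]
print(round(cohen_kappa(coder_a, coder_b), 3))  # 0.556
```

Crediting the secondary category from the comment field would require a more lenient agreement definition than the primary-only comparison shown here.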
Here, unimodal refers to a gesture not accompanied by a word. We do not investigate whether the nod occurs together with other gestural behaviours.
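One way to operationalise this definition on time-aligned annotations is to treat a gesture as unimodal when no word token overlaps it in time. The sketch below assumes exactly that reading, and the `Interval` structure and function names are hypothetical, not part of the corpus format. In line with the footnote, it ignores other gesture tiers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Interval:
    """Hypothetical time-stamped annotation (label, start and end in seconds)."""
    label: str
    start: float
    end: float


def overlaps(a: Interval, b: Interval) -> bool:
    # Two intervals overlap if each starts before the other ends;
    # intervals that merely touch at a boundary do not overlap.
    return a.start < b.end and b.start < a.end


def is_unimodal(gesture: Interval, words: list[Interval]) -> bool:
    # Unimodal (assumed reading): no word token overlaps the gesture in time.
    # Co-occurring gestures on other tiers are deliberately not checked.
    return not any(overlaps(gesture, w) for w in words)


words = [Interval("ja", 0.0, 0.4), Interval("okay", 1.2, 1.6)]
print(is_unimodal(Interval("Nod", 0.5, 1.0), words))  # True
print(is_unimodal(Interval("Nod", 0.3, 0.8), words))  # False
```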
Alahverdzhieva, K., Lascarides, A. (2010). Analysing speech and co-speech gesture in constraint-based grammars. In S. Müller (Ed.), Proceedings of the HPSG10 conference (pp. 6–26). Stanford: CSLI Publications.
Allwood, J. (2002). Bodily communication dimensions of expression and content. In B. Granström, D. House, & I. Karlsson (Eds.), Multimodality in language and speech systems (pp. 7–26). Dordrecht: Springer. doi: 10.1007/978-94-017-2367-1_2.
Allwood, J. (2008). Dimensions of embodied communication—Towards a typology of embodied communication. In I. Wachsmuth, M. Lenzen & G. Knoblich (Eds.), Embodied communication in humans and machines. Oxford: Oxford University Press.
Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C., & Paggio, P. (2007). The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena. In J. C. Martin, P. Paggio, P. Kuehnlein, R. Stiefelhagen, & F. Pianesi (Eds.), Multimodal corpora for modelling human multimodal behaviour, special issue of the international journal of language resources and evaluation (Vol. 41, pp. 273–287). Berlin: Springer.
Allwood, J., Lanzini, S., & Ahlsén, E. (2014). Contributions of different modalities to the attribution of affective-epistemic states. In P. Paggio & B. N. Wessel-Tolvig (Eds.), Proceedings from the 1st European symposium on multimodal communication University of Malta (pp. 1–6). Valletta: Linköping University Electronic Press.
Allwood, J., Nivre, J., & Ahlsén, E. (1993). On the semantics and pragmatics of linguistic feedback. Journal of Semantics, 9(1), 1–26.
Argyle, M., & Cook, M. (1976). Gaze and mutual gaze. Cambridge: Cambridge University Press.
Aung, M. S. H., Bianchi-Berthouze, N., Watson, P., & Williams, A. C. D. C. (2014). Automatic recognition of fear-avoidance behaviour in chronic pain physical rehabilitation. In Proceedings of the 8th international conference on pervasive computing technologies for healthcare.
Boersma, P., & Weenink, D. (2009). Praat: Doing phonetics by computer (version 5.1.05) [computer program]. Retrieved May 1, 2009, from http://www.praat.org/.
Bolinger, D. (1986). Intonation and its parts: Melody in spoken English. Stanford, CA: Stanford University Press.
Bourbakis, N., Esposito, A., & Kavraki, D. (2011). Extracting and associating meta-features for understanding people’s emotional behaviour: Face and speech. Journal of Cognitive Computation, 3, 436–448.
Bunt, H., Alexandersson, J., Choe, J. W., Fang, A. C., Hasida, K., Petukhova, V., et al. (2012). ISO 24617-2: A semantically-based standard for dialogue annotation. In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), LREC (pp. 430–437). European Language Resources Association (ELRA).
Campbell, N., & Scherer, S. (2010). Comparing measures of synchrony and alignment in dialogue speech timing with respect to turn-taking activity. In Proceedings of Interspeech (pp. 2546–2549).
Cavicchio, F., & Poesio, M. (2009). Multimodal corpora annotation: Validation methods to assess coding scheme reliability. In M. Kipp, J. C. Martin, P. Paggio, & D. Heylen (Eds.), Multimodal corpora. Lecture notes in computer science (Vol. 5509). Berlin: Springer.
Cerrato, L. (2007). Investigating communicative feedback phenomena across languages and modalities. Ph.D. thesis, School of Speech and Music Communication, Stockholm: KTH.
Cienki, A., & Müller, C. (2008). Metaphor and gesture. Amsterdam: Benjamins.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Dancey, C. P., & Reidy, J. (2004). Statistics without maths for psychology: Using SPSS for Windows. Upper Saddle River, NJ: Prentice-Hall Inc.
De Ruiter, J. P. (2000). The production of gesture and speech. In D. McNeill (Ed.), Language and gesture. Cambridge: Cambridge University Press.
Duncan Jr., S., & Fiske, D. (1977). Face-to-face interaction. Hillsdale, NJ: Erlbaum.
Duncan, S. (1972). Some signals and rules for taking speaking turns in conversations. Journal of Personality and Social Psychology, 23(2), 283–292.
Duncan, S., Cassell, J., & Levy, E. (2007). Gesture and the dynamic dimension of language. Amsterdam: Benjamins.
Ebert, C., Evert, S., & Wilmes, K. (2011). Focus marking via gestures. In I. Reich et al. (Eds.), Proceedings of Sinn & Bedeutung 15 (pp. 193–208). Saarbrücken, Germany: Universaar-Saarland University Press.
Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3/4), 169–200.
Ekman, P., & Friesen, W. (1975). Unmasking the face: A guide to recognizing emotions from facial clues. Upper Saddle River: Prentice-Hall.
Ekman, P., & Friesen, W. V. (1969). The repertoire of nonverbal behavior: Categories, origins, usage, and coding. Semiotica, 1(1), 49–98.
Enfield, N. J. (2012). The anatomy of meaning: Speech, gesture, and composite utterances. Cambridge: Cambridge University Press.
Gibbon, D. (2011). Modelling gesture as speech: A linguistic approach. Poznań Studies in Contemporary Linguistics, 47, 470–508.
Giorgolo, G., & Verstraten, F. A. (2008). Perception of ‘speech-and-gesture’ integration. In Proceedings of the international conference on auditory-visual speech processing 2008 (pp. 31–36).
Goodwin, C. (1981). Conversational organization: Interaction between speakers and hearers. New York: Academic Press.
Gullberg, M., & de Bot, K. (Eds.). (2010). Gestures in language development. Amsterdam: Benjamins.
Hadar, U., Steiner, T., & Rose, F. C. (1984). The timing of shifts of head postures during conversation. Human Movement Science, 3(3), 237–245.
Hadar, U., Steiner, T. J., & Rose, F. C. (1985). Head movement during listening turns in conversation. Journal of Nonverbal Behavior, 9(4), 214–228.
Jongejan, B. (2010). Automatic face tracking in ANVIL. In M. Kipp, J. C. Martin, P. Paggio, & D. Heylen (Eds.), Multimodal corpora: Advances in capturing, coding and analyzing multimodality (pp. 201–208). European Language Resources Association (ELRA), May 18, 2010.
Kendon, A. (1967). Some functions of gaze-direction in social interaction. Acta Psychologica, 26, 22–63.
Kendon, A. (1978). Differential perception and attentional frame: Two problems for investigation. Semiotica, 24, 305–315.
Kendon, A. (1980). Gesture and speech: Two aspects of the process of utterance. In M. R. Key (Ed.), Nonverbal communication and language (pp. 207–227). Mouton.
Kendon, A. (2004). Gesture. Cambridge: Cambridge University Press.
Kipp, M. (2004). Gesture generation by Imitation—From human behavior to computer character animation. Boca Raton, FL: Dissertation.com.
Kipp, M., & Martin, J. C. (2009). Gesture and emotion: Can basic gestural form features discriminate emotions? In Proceedings of the international conference on affective computing and intelligent interaction (ACII-09). IEEE Press.
Kita, S., & Özyürek, A. (2003). What does cross-linguistic variation in semantic coordination of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and speaking. Journal of Memory and Language, 48(1), 16–32.
Kousidis, S., Malisz, Z., Wagner, P., & Schlangen, D. (2013). Exploring annotation of head gesture forms in spontaneous human interaction. In Proceedings of the Tilburg gesture meeting (TiGeR).
Leonard, T., & Cummins, F. (2010). The temporal relation between beat gestures and speech. Language and Cognitive Processes, 26(10), 1457–1471.
Levinson, S. (1983). Pragmatics. Cambridge: Cambridge University Press.
Loehr, D. P. (2004). Gesture and intonation. Ph.D. thesis, Georgetown University.
Loehr, D. P. (2007). Aspects of rhythm in gesture and speech. Gesture, 7(2), 179–214.
Lucey, P., Cohn, J. F., Prkachin, K. M., Solomon, P. E., Chew, S., & Matthews, I. (2012). Painful monitoring: Automatic pain monitoring using the UNBC-McMaster shoulder pain expression archive database. Image and Vision Computing, 30(3), 197–205.
Maynard, S. K. (1987). Interactional functions of a nonverbal sign: Head movement in Japanese dyadic casual conversation. Journal of Pragmatics, 11, 589–606.
McClave, E. Z. (2000). Linguistic functions of head movements in the context of speech. Journal of Pragmatics, 32(7), 855–878.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.
Navarretta, C. (2011). Annotating non-verbal behaviours in informal interactions. In A. Esposito, A. Vinciarelli, K. Vicsi, C. Pelachaud, & A. Nijholt (Eds.), Analysis of verbal and nonverbal communication and enactment: The processing issues, LNCS (Vol. 6800, pp. 317–324). Berlin: Springer.
Navarretta, C. (2012). Annotating and analyzing emotions in a corpus of first encounters. In IEEE (Ed.) Proceedings of the 3rd IEEE international conference on cognitive infocommunications (pp. 433–438), Kosice.
Navarretta, C. (2013a). Predicting speech overlaps from speech tokens and co-occurring body behaviours in dyadic conversations. In Proceedings of ACM international conference on multimodal interaction (ICMI 2013) (pp. 157–163). Sydney: ACM.
Navarretta, C. (2013b). Transfer learning in multimodal corpora. In IEEE (Ed.) Proceedings of the 4th IEEE international conference on cognitive infocommunications (CogInfoCom2013) (pp. 195–200). Budapest, Hungary.
Navarretta, C. (2014). Predicting emotions in facial expressions from the annotations in naturally occurring first encounters. Knowledge Based Systems, 71, 34–40.
Navarretta, C., Ahlsén, E., Allwood, J., Jokinen, K., & Paggio, P. (2012). Feedback in Nordic first-encounters: A comparative study (pp. 2494–2499). Istanbul: European language resources distribution agency.
Navarretta, C., & Paggio, P. (2012). Verbal and non-verbal feedback in different types of interactions. In Proceedings of LREC 2012 (pp. 2338–2342). Istanbul.
Navarretta, C., & Paggio, P. (2013a). Classifying multimodal turn management in Danish dyadic first encounters. In NEALT proceedings of the 19th Nordic conference of computational linguistics (Nodalida 2013), Oslo, Linköping electronic conference proceedings (pp. 133–146).
Navarretta, C., & Paggio, P. (2013b). Multimodal turn management in Danish dyadic first encounters. In NEALT proceedings. Northern European association for language and technology, Proceedings of the fourth Nordic symposium of multimodal communication, Gothenburg, Linköping electronic conference proceedings (pp. 5–12).
Paggio, P. (2006a). Annotating information structure in a corpus of spoken Danish. In Proceedings of the 5th international conference on Language Resources and Evaluation LREC2006 (pp. 1606–1609). Genoa, Italy.
Paggio, P. (2006b). Information structure and pauses in a corpus of spoken Danish. In Conference companion of the 11th conference of the European chapter of the association for computational linguistics (pp. 191–194). Trento, Italy.
Paggio, P. (2016). Coordination of head movements and speech in first encounter dialogues. In E. Gilmartin, L. Cerrato, & N. Campbell (Eds.), Proceedings from the 3rd European Symposium on Multimodal Communication, Dublin, September (pp. 69–74). Linköpings universitet: Linköping University Electronic Press.
Paggio, P., Allwood, J., Ahlsén, E., Jokinen, K., & Navarretta, C. (2010). The NOMCO multimodal nordic resource—Goals and characteristics. In Proceedings of the seventh conference on international language resources and evaluation (LREC’10). European Language Resources Association (ELRA), Valletta.
Paggio, P., & Diderichsen, P. (2010). Information structure and communicative functions in spoken and multimodal data. In P.J. Henriksen (Ed.), Linguistic theory and raw sound, Copenhagen studies in language (Vol. 49, pp. 149–168). Frederiksberg: Samfundslitteratur.
Paggio, P., & Navarretta, C. (2011). Head Movements, facial expressions and feedback in Danish first encounters interactions: A culture-specific analysis. In Lecture notes in computer science (Vol. 6766, pp. 583–590). Springer.
Paggio, P., & Navarretta, C. (2012). Classifying the feedback function of head movements and face expressions. In LREC 2012 workshop multimodal corpora—How should multimodal corpora deal with the situation? (pp. 34–37). Istanbul: European language resources distribution agency.
Paggio, P., & Vella, A. (2014). Overlaps in Maltese conversational and task oriented dialogues. In P. Paggio & B. N. Wessel-Tolvig (Eds.), Proceedings from the 1st European symposium on multimodal communication University of Malta (pp. 55–64). Valletta: Linköping University Electronic Press.
Peirce, C. S. (1931). Elements of logic. Collected papers of Charles Sanders Peirce (Vol. 2). Cambridge: Harvard University Press.
Poggi, I. (2007). Hands, mind, face and body: A goal and belief view of multimodal communication. Berlin: Weidler.
Russell, J. A., & Mehrabian, A. (1977). Evidence for a three-factor theory of emotions. Journal of Research in Personality, 11, 273–294.
Savva, N., Scarinzi, A., & Bianchi-Berthouze, N. (2012). Continuous recognition of player’s affective body expression as dynamic quality of aesthetic experience. IEEE Transactions on Computational Intelligence and AI in Games, 4(3), 199–212.
Schegloff, E. A. (1984). On some gestures’ relation to talk. In J. M. Atkinson & J. Heritage (Eds.), Structures of social action (pp. 266–298). Cambridge: Cambridge University Press.
Studsgård, A. L., & Navarretta, C. (2013). Annotating attitudes in the Danish NOMCO corpus of first encounters. In NEALT proceedings. Northern European association for language and technology, 4th Nordic symposium on multimodal communication (pp. 85–89). Linköping University Electronic Press.
Vallduví, E., & Engdahl, E. (1996). The linguistic realisation of information packaging. Linguistics, 34(3), 459–520.
Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd edn.). San Francisco: Morgan Kaufmann.
The NOMCO project was funded by NOS-HS NORDCORP. We would like to acknowledge our partners from the Universities of Gothenburg and Helsinki, as well as the annotators of the Danish data: Sara Andersen, Josephine B. Arrild, Anette Studsgård and Bjørn N. Wesseltolvig. We would also like to thank the two anonymous reviewers for their helpful comments.
See Table 20.
Table 20 displays the totals for the various gesture types in the corpus. Note that the total number of facial expressions is in fact 1448: the 981 expressions annotated with one of the general facial features must be added to the 467 expressions annotated only with a feature related to the eyebrows. Conversely, there are 856 facial expressions with no eyebrow annotation. Similarly, for body posture there are 982 behaviours in total: the 888 movements annotated with a body posture feature must be added to the 94 shoulder movements with no body posture annotation, while 826 body posture annotations are not associated with a shoulder movement.
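The totals above follow from splitting each behaviour type into three disjoint groups: behaviours with only the main feature, behaviours with both features, and behaviours with only the additional feature (eyebrows for the face, shoulders for body posture). The sketch below reproduces the two totals and derives the size of the overlap, assuming that every behaviour carries at least one of the two feature types. The function name is hypothetical, and the overlap and eyebrow/shoulder totals are derived from the figures in the paragraph, not read from Table 20.

```python
from __future__ import annotations


def reconcile_counts(with_main: int, extra_only: int, main_without_extra: int) -> dict[str, int]:
    """Split counts for two overlapping annotation features into disjoint groups.

    with_main: behaviours with a main feature (general face / body posture)
    extra_only: behaviours with only the additional feature (eyebrows / shoulders)
    main_without_extra: behaviours with a main feature but no additional feature
    Assumes every behaviour carries at least one of the two feature types.
    """
    if main_without_extra > with_main:
        raise ValueError("main_without_extra cannot exceed with_main")
    both = with_main - main_without_extra  # main and additional feature together
    extra_total = both + extra_only        # all behaviours with the additional feature
    total = with_main + extra_only         # union of the two groups
    return {"total": total, "both": both, "extra_total": extra_total}


print(reconcile_counts(981, 467, 856))  # {'total': 1448, 'both': 125, 'extra_total': 592}
print(reconcile_counts(888, 94, 826))   # {'total': 982, 'both': 62, 'extra_total': 156}
```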
Cite this article
Paggio, P., Navarretta, C. The Danish NOMCO corpus: multimodal interaction in first acquaintance conversations. Lang Resources & Evaluation 51, 463–494 (2017). https://doi.org/10.1007/s10579-016-9371-6
- Multimodal corpora
- First acquaintance conversations
- Gestural annotation