Language Resources and Evaluation, Volume 41, Issue 3–4, pp 273–287

The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena

  • Jens Allwood
  • Loredana Cerrato
  • Kristiina Jokinen
  • Costanza Navarretta
  • Patrizia Paggio


This paper deals with a multimodal annotation scheme dedicated to the study of gestures in interpersonal communication, with particular regard to the role played by multimodal expressions for feedback, turn management and sequencing. The scheme has been developed under the framework of the MUMIN network and tested on the analysis of multimodal behaviour in short video clips in Swedish, Finnish and Danish. The preliminary results obtained in these studies show that the reliability of the categories defined in the scheme is acceptable, and that the scheme as a whole constitutes a versatile analysis tool for the study of multimodal communication behaviour.


Keywords: Multimodal annotation; Feedback; Hand and facial gestures



Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  • Jens Allwood (1)
  • Loredana Cerrato (2)
  • Kristiina Jokinen (3)
  • Costanza Navarretta (4)
  • Patrizia Paggio (4)

  1. University of Göteborg, Göteborg, Sweden
  2. TMH/CTT, KTH, Stockholm, Sweden
  3. University of Helsinki, Helsinki, Finland
  4. University of Copenhagen, Copenhagen, Denmark
