Language Resources and Evaluation, Volume 50, Issue 2, pp 411–442

The ALICO corpus: analysing the active listener

  • Zofia Malisz
  • Marcin Włodarczak
  • Hendrik Buschmeier
  • Joanna Skubisz
  • Stefan Kopp
  • Petra Wagner
Project Notes


The Active Listening Corpus (ALICO) is a multimodal data set of spontaneous dyadic conversations in German with diverse speech and gestural annotations of both dialogue partners. The annotations consist of short feedback expression transcriptions with corresponding communicative function interpretations as well as segmentations of interpausal units, words, rhythmic prominence intervals and vowel-to-vowel intervals. Additionally, ALICO contains head gesture annotations of both interlocutors. The corpus contributes to research on spontaneous human–human interaction, on functional relations between modalities, and on timing variability in dialogue. It also provides data that differentiate between distracted and attentive listeners. We describe the main characteristics of the corpus and briefly present the most important results obtained from analyses in recent years.


Active listening, Multimodal feedback, Backchannels, Head gestures, Attention, Multimodal corpus



This research was supported by the Deutsche Forschungsgemeinschaft (DFG) in the Collaborative Research Center 673 “Alignment in Communication” and the Center of Excellence EXC 277 “Cognitive Interaction Technology” (CITEC), as well as the Swedish Research Council (VR) projects “Samtalets rytm” (2009–1766) and “Andning i samtal” (2014–1072).



Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  • Zofia Malisz (1, 2), corresponding author
  • Marcin Włodarczak (3)
  • Hendrik Buschmeier (4)
  • Joanna Skubisz (5)
  • Stefan Kopp (4)
  • Petra Wagner (6)

  1. Department of Computational Linguistics and Phonetics, Saarland University, Saarbrücken, Germany
  2. Department of Speech, Music and Hearing, KTH, Stockholm, Sweden
  3. Department of Linguistics, Stockholm University, Stockholm, Sweden
  4. Faculty of Technology and CITEC, Bielefeld University, Bielefeld, Germany
  5. Faculdade de Ciências Sociais e Humanas, Universidade Nova de Lisboa, Lisbon, Portugal
  6. Faculty of Linguistics and Literary Studies, Bielefeld University, Bielefeld, Germany
