
The ALICO corpus: analysing the active listener


The Active Listening Corpus (ALICO) is a multimodal data set of spontaneous dyadic conversations in German with diverse speech and gestural annotations of both dialogue partners. The annotations consist of short feedback expression transcriptions with corresponding communicative function interpretations, as well as segmentations of interpausal units, words, rhythmic prominence intervals and vowel-to-vowel intervals. Additionally, ALICO contains head gesture annotations of both interlocutors. The corpus contributes to research on spontaneous human–human interaction, on functional relations between modalities, and on timing variability in dialogue. It also provides data that differentiate between distracted and attentive listeners. We describe the main characteristics of the corpus and briefly present the most important results obtained from analyses in recent years.
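To illustrate one of the segmentation layers mentioned above, interpausal units can be derived from a word-level segmentation by grouping words separated by silences shorter than a pause threshold. The tuple format and the 200 ms threshold below are illustrative assumptions, not the corpus's actual annotation scheme:

```python
def interpausal_units(words, pause_threshold=0.2):
    """Group (start, end, label) word tuples into interpausal units,
    i.e. stretches of speech separated by pauses >= pause_threshold seconds."""
    units, current = [], []
    for word in words:
        # Start a new unit when the silence after the previous word is long enough
        if current and word[0] - current[-1][1] >= pause_threshold:
            units.append(current)
            current = []
        current.append(word)
    if current:
        units.append(current)
    return units

words = [(0.00, 0.30, "ja"), (0.35, 0.60, "genau"), (1.00, 1.40, "mhm")]
print(interpausal_units(words))
# two units: only the 0.4 s silence after "genau" exceeds the threshold
```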




  1. Four annotators in total worked on the feedback function interpretation in ALICO, namely the first four authors of this paper, of whom JS, MW and ZM are competent but non-native speakers of German. Annotation tasks were assigned in rotation to three annotators per recorded session.

  2. While it would be preferable to use a multi-annotator agreement measure, such as Fleiss’s \(\kappa \), this is somewhat problematic on the present dataset given that each dialogue was annotated by a different subset of annotators. For this reason, we resort to pairwise comparisons between individual annotators.
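Given this pairwise design, agreement can be computed as Cohen's κ per annotator pair from the observed and chance-expected agreement. A minimal sketch (the label sequences below are invented for illustration):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' label sequences."""
    assert labels_a and len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: proportion of items given identical labels
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from the marginal label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

a = ["P1", "P1", "P2", "P2"]
b = ["P1", "P2", "P2", "P2"]
print(round(cohens_kappa(a, b), 3))  # 0.5
```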





This research was supported by the Deutsche Forschungsgemeinschaft (DFG) in the Collaborative Research Center 673 “Alignment in Communication” and the Center of Excellence EXC 277 “Cognitive Interaction Technology” (CITEC), as well as the Swedish Research Council (VR) projects “Samtalets rytm” (2009–1766) and “Andning i samtal” (2014–1072).

Author information



Corresponding author

Correspondence to Zofia Malisz.

Additional information

Zofia Malisz, Marcin Włodarczak, Hendrik Buschmeier and Joanna Skubisz have contributed equally to this article.



See Fig. 11 and Tables 13, 14.

Table 13 ALICO data overview
Fig. 11

Confusion matrices for each annotator pair annotating the core feedback function categories P1, P2, P3, and A. Labels were stripped of all modifiers (e.g. C, E, or A in modifier role). The shades of the cells indicate the relative frequency of each label combination and can be compared across confusion matrices. The numbers in each cell show absolute frequencies and are not comparable across confusion matrices.
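A confusion matrix of this kind can be tabulated directly from two annotators' label sequences, in both absolute and relative terms. The sketch below assumes the core categories named in the caption and uses invented label data:

```python
from collections import Counter

CORE = ["P1", "P2", "P3", "A"]

def confusion(labels_a, labels_b, categories=CORE):
    """Absolute and relative label-pair frequencies for one annotator pair."""
    counts = Counter(zip(labels_a, labels_b))
    n = len(labels_a)
    absolute = {(x, y): counts.get((x, y), 0)
                for x in categories for y in categories}
    # Relative frequencies are comparable across annotator pairs
    relative = {pair: c / n for pair, c in absolute.items()}
    return absolute, relative

a = ["P1", "P2", "P2", "A"]
b = ["P1", "P2", "P3", "A"]
absolute, relative = confusion(a, b)
print(absolute[("P2", "P3")], relative[("P1", "P1")])  # 1 0.25
```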

Table 14 Frequency of specific short feedback expressions (SFEs) found in ALICO as classified into three semantic categories (see Table 7)


About this article


Cite this article

Malisz, Z., Włodarczak, M., Buschmeier, H. et al. The ALICO corpus: analysing the active listener. Lang Resources & Evaluation 50, 411–442 (2016).



  • Active listening
  • Multimodal feedback
  • Backchannels
  • Head gestures
  • Attention
  • Multimodal corpus