Should Agents Speak Like, um, Humans? The Use of Conversational Fillers by Virtual Agents

  • Laura M. Pfeifer
  • Timothy Bickmore
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5773)


We describe the design and evaluation of an agent that uses the fillers um and uh in its speech. We report an empirical study of human-human dialogue that analyzes gaze behavior during the production of fillers, and we use these data to develop a model of agent-based gaze behavior. We find that speakers are significantly more likely to gaze away from their dialogue partner while uttering fillers, especially if the filler occurs at the beginning of a speaking turn. The gaze model is evaluated in a preliminary experiment. Results indicate mixed attitudes towards an agent that uses conversational fillers in its speech.


embodied conversational agent, fillers, filled pause, gaze





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Laura M. Pfeifer (1)
  • Timothy Bickmore (1)
  1. Northeastern University, College of Computer and Information Science, Boston
