Teaching Computers to Conduct Spoken Interviews: Breaking the Realtime Barrier with Learning

  • Gudny Ragna Jonsdottir
  • Kristinn R. Thórisson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5773)


Several challenges remain in the effort to build software capable of conducting realtime dialogue with people. Part of the problem has been a lack of realtime flexibility, especially with regard to turntaking. We have built a system that can adapt its turntaking behavior in natural dialogue, learning to minimize unwanted interruptions and “awkward silences”. The system learns this dynamically during the interaction in fewer than 30 turns, without special training sessions. Here we describe the system and its performance when interacting with people in the role of an interviewer. A prior evaluation of the system included 10 interactions with a single artificial agent (a non-learning version of itself); the new data consists of 10 interaction sessions with 10 different humans. Results show performance to be close to a human’s in natural, polite dialogue, with 20% of the turn transitions taking place in under 300 msecs and 60% in under 500 msecs. The system works in real-world settings, achieving robust learning in spite of noisy data. The modularity of the architecture gives it significant potential for extensions beyond the interview scenario described here.


Dialogue, Realtime, Turntaking, Human-Computer Interaction, Natural Communication, Machine Learning, Prosody





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Gudny Ragna Jonsdottir (1)
  • Kristinn R. Thórisson (1)
  1. Center for Analysis & Design of Intelligent Agents and School of Computer Science, Reykjavik University, Reykjavik, Iceland
