Natural Turn-Taking Needs No Manual: Computational Theory and Model, from Perception to Action

Chapter in: Multimodality in Language and Speech Systems

Part of the book series: Text, Speech and Language Technology (TLTB, volume 19)

Abstract

Beth and Alan are sitting at an outdoor restaurant on Fifth Avenue in Manhattan. Alan is telling Beth an exciting story about his vacation in Nice, presenting it through gesture and speech. Then Beth’s arm starts moving and her neck stiffens.

We, the viewers, know that she is surprised to see an elephant in the middle of Manhattan, and that in 460 milliseconds her arm and hand motion will turn into a well-defined deictic gesture, her eyebrows will rise, and her mouth will open in surprise, at which point Alan will most certainly recognize the signs and look over at the elephant. But right now, at t-minus-460 milliseconds, Beth’s gesture is barely recognizable as a communicative action, so Alan doesn’t know for sure. Thus, within the next 460 milliseconds, before any of that happens, Alan has to decide what to do about Beth’s behavior. Should he stop telling his story? Or should he go on, in case Beth is simply adjusting her jacket?

Decisions like these are made by dialogue participants as often as 2–3 times per second. For a 30-minute conversation, that can add up to over 5000 decisions (3 per second over 1800 seconds is 5400). And that is just a fraction of what goes on. How do we do it? Face-to-face dialogue consists of interaction between several complex, dynamic systems: visual and auditory display of information, internal processing, knee-jerk reactions, thought-out rhetoric, learned patterns, social convention, and so on. One could postulate that the power of dialogue is a direct result of this fact. However, combining a multitude of systems in one place does not guarantee a coherent outcome such as goal-directed dialogue. For this to happen, the systems need to be architected in a way that guides their interaction and ensures that, complex as it may be, the interaction tends towards homeostasis in the face of errors and uncertainties, towards the set of goals shared by participants.

Copyright information

© 2002 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Thórisson, K.R. (2002). Natural Turn-Taking Needs No Manual: Computational Theory and Model, from Perception to Action. In: Granström, B., House, D., Karlsson, I. (eds) Multimodality in Language and Speech Systems. Text, Speech and Language Technology, vol 19. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2367-1_8

  • DOI: https://doi.org/10.1007/978-94-017-2367-1_8

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-6024-2

  • Online ISBN: 978-94-017-2367-1

  • eBook Packages: Springer Book Archive
