Predicting Turn-Taking by Compact Gazing Transition Patterns in Multiparty Conversation

  • Li Tian
  • Qi Jia
  • Zhen Zhu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10749)


Gaze behavior plays an important role in the analysis of turn-taking in multiparty conversation. In this study, we propose a general and powerful model that predicts turn-taking by analyzing gaze transition patterns in four-participant conversation. We define gaze labels for the different gaze movements of speakers and listeners, and then code every gaze transition pattern as a two-label pattern. We then analyze these gaze transition patterns quantitatively to confirm their effectiveness. Finally, we build a model that predicts turn-taking from these gaze transition patterns. Experiments demonstrate that the predictions of our model are superior to those of state-of-the-art methods.


Multiparty conversation, Gaze behavior analysis, Turn-taking, Nonverbal behaviors, Gaze transition pattern



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Foshan University, Foshan, China
  2. South China University of Technology, Guangzhou, China
