Probabilistic Inference of Gaze Patterns and Structure of Multiparty Conversations from Head Directions and Utterances

  • Kazuhiro Otsuka
  • Yoshinao Takemae
  • Junji Yamato
  • Hiroshi Murase
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4012)


A novel probabilistic framework is proposed for inferring gaze patterns and the structure of conversation in face-to-face multiparty communication, based on the head directions of participants and the presence or absence of their utterances. First, we define three classes of conversational regimes, which are characterized by the topology of the gaze pattern; we assume that they indicate the structure of the conversation, i.e., who is talking to whom. Next, the problem is formulated as the joint estimation of the regime state from the gaze pattern and utterances, and of the gaze pattern from head directions. We then devise a dynamic Bayesian network, called the Markov-switching model, in which the regime changes over time according to Markov transitions and controls the dynamics of the gaze patterns and utterances. Furthermore, Bayesian estimation of the regime, gaze pattern, and model parameters is implemented using a Markov chain Monte Carlo method. Experiments on four-person conversations confirm accurate gaze estimation and the effectiveness of the framework for identifying conversation structures.
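
The regime-switching idea in the abstract can be illustrated with a minimal generative sketch. It is not the authors' model: it covers only the forward (sampling) direction, omits gaze patterns, head directions, and the Markov chain Monte Carlo inference, and every name and number in it (`TRANSITION`, `INITIAL`, `SPEAK_PROB`, the regime indices 0 to 2) is a hypothetical placeholder chosen for illustration.

```python
import random


def sample_regimes(transition, initial, steps, rng):
    """Sample a regime sequence from a first-order Markov chain."""
    states = list(range(len(initial)))
    regime = rng.choices(states, weights=initial)[0]
    sequence = [regime]
    for _ in range(steps - 1):
        # The next regime depends only on the current one.
        regime = rng.choices(states, weights=transition[regime])[0]
        sequence.append(regime)
    return sequence


def sample_utterances(regimes, speak_prob, n_people, rng):
    """Sample presence (1) or absence (0) of an utterance per participant,
    with a probability that depends on the current regime."""
    return [
        [int(rng.random() < speak_prob[r]) for _ in range(n_people)]
        for r in regimes
    ]


# Hypothetical parameters: three regimes with "sticky" transitions.
TRANSITION = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]
INITIAL = [1 / 3, 1 / 3, 1 / 3]
# Assumed per-regime probability that any one participant is speaking.
SPEAK_PROB = [0.3, 0.5, 0.1]

rng = random.Random(0)
regimes = sample_regimes(TRANSITION, INITIAL, steps=20, rng=rng)
utterances = sample_utterances(regimes, SPEAK_PROB, n_people=4, rng=rng)

for t, (r, u) in enumerate(zip(regimes, utterances)):
    print(t, r, u)
```

In this sketch the regime acts as a hidden switch that selects the utterance statistics at each time step; the framework described above works in the opposite direction, estimating the hidden regime and gaze pattern from the observations.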


Nonverbal Behavior; Markov Chain Monte Carlo Method; Dynamic Bayesian Network; Probabilistic Inference; Head Direction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Kazuhiro Otsuka (1, 3)
  • Yoshinao Takemae (2)
  • Junji Yamato (1)
  • Hiroshi Murase (3)

  1. NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Atsugi, Japan
  2. NTT Cyber Solutions Laboratories, Nippon Telegraph and Telephone Corporation, Yokosuka, Japan
  3. Graduate School of Information Science, Nagoya University, Nagoya, Japan
