Decision Theoretic Modeling of Human Facial Displays

  • Jesse Hoey
  • James J. Little
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3023)


We present a vision-based, adaptive, decision-theoretic model of human facial displays in interactions. The model is a partially observable Markov decision process, or POMDP. A POMDP is a stochastic planner used by an agent to relate its actions and utility function to its observations and to other context. Video observations are integrated into the POMDP using a dynamic Bayesian network that creates spatial and temporal abstractions of the input sequences. The parameters of the model are learned from training data using an a-posteriori constrained optimization technique based on the expectation-maximization algorithm. Training does not require facial display labels on the training data: the learning process automatically discovers clusters of facial display sequences and their relationship to the context. This avoids the need for human intervention in training data collection, and allows the models to be used without modification for facial display learning in any context, without prior knowledge of the type of behaviors to be used. We present an experimental paradigm in which we record two humans playing a game and learn the POMDP model of their behaviors. The learned model correctly predicts the players' actions during a simple cooperative card game, based in part on their facial displays.
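As a rough illustration of how a POMDP relates actions, observations and utility, the sketch below tracks a belief over hidden states with a Bayesian update and scores actions by expected immediate reward. It is a generic discrete POMDP and not the model from the paper: the function names, the arrays `T`, `O`, `R` and all the numbers are hypothetical, and the video-based observation model is replaced by a small discrete table.

```python
import numpy as np


def belief_update(belief, action, observation, T, O):
    """One step of Bayesian belief tracking in a discrete POMDP.

    belief: shape (S,), current distribution over hidden states
    T:      shape (A, S, S), T[a, s, s2] = P(s2 | s, a)
    O:      shape (A, S, Z), O[a, s2, z] = P(z | s2, a)
    """
    # Predict: push the belief through the transition model for this action.
    predicted = belief @ T[action]
    # Correct: weight each state by the likelihood of the observation.
    unnormalized = predicted * O[action][:, observation]
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total


def expected_reward(belief, R):
    """Expected immediate reward of each action; R has shape (A, S)."""
    return R @ belief


# Toy problem (all values are made up): 2 states, 2 actions, 2 observations.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.8, 0.2], [0.3, 0.7]]])
R = np.array([[1.0, -1.0],
              [0.0, 0.0]])

b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, action=0, observation=0, T=T, O=O)
best_action = int(np.argmax(expected_reward(b1, R)))
```

Here `best_action` is only a greedy one-step choice; a full POMDP solution would plan over future beliefs as well.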


Keywords: Video Sequence; Dynamic Bayesian Network; Facial Motion; Conditional Probability Distribution; Card Game



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Jesse Hoey (1)
  • James J. Little (1)
  1. Department of Computer Science, University of British Columbia, Vancouver, Canada
