Abstract
We present a vision-based, adaptive, decision-theoretic model of human facial displays in interactions. The model is a partially observable Markov decision process, or POMDP. A POMDP is a stochastic planner used by an agent to relate its actions and utility function to its observations and to other context. Video observations are integrated into the POMDP using a dynamic Bayesian network that creates spatial and temporal abstractions of the input sequences. The parameters of the model are learned from training data using an a posteriori constrained optimization technique based on the expectation-maximization algorithm. The training does not require facial display labels on the training data. The learning process automatically discovers clusters of facial display sequences and their relationship to the context. This avoids the need for human intervention in training data collection, and allows the model to be used without modification for facial display learning in any context, with no prior knowledge of the types of behaviours involved. We present an experimental paradigm in which we record two humans playing a game and learn the POMDP model of their behaviours. The learned model correctly predicts human actions during a simple cooperative card game based, in part, on their facial displays.
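To make the POMDP idea in the abstract concrete, the following is a minimal sketch of a discrete belief update, the step by which an agent revises its estimate of a hidden state after acting and observing. It is a generic textbook Bayes filter, not the model or learning procedure from the paper: the state names, the `propose` action, the `nod` and `shake` observations, and every probability are hypothetical values chosen for illustration, and the paper's video-driven dynamic Bayesian network is replaced here by a hand-written observation table.

```python
# Toy discrete POMDP belief update (Bayes filter).
# All states, actions, observations, and probabilities are hypothetical
# illustrations; none of them come from the paper.

STATES = ["agree", "disagree"]  # assumed hidden facial-display classes

# TRANSITION[action][s][s2] = P(s2 | s, action)
TRANSITION = {
    "propose": {
        "agree": {"agree": 0.8, "disagree": 0.2},
        "disagree": {"agree": 0.3, "disagree": 0.7},
    },
}

# OBSERVATION[s2][o] = P(o | s2)
OBSERVATION = {
    "agree": {"nod": 0.9, "shake": 0.1},
    "disagree": {"nod": 0.2, "shake": 0.8},
}


def belief_update(belief, action, observation):
    """Return b'(s2) proportional to P(o | s2) * sum_s P(s2 | s, a) * b(s)."""
    unnormalized = {}
    for s2 in STATES:
        # Prediction step: push the current belief through the transition model.
        predicted = sum(TRANSITION[action][s][s2] * belief[s] for s in STATES)
        # Correction step: weight by the likelihood of the observation.
        unnormalized[s2] = OBSERVATION[s2][observation] * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


def expected_reward(belief, reward):
    """Expected utility of a belief, given a per-state reward table."""
    return sum(belief[s] * reward[s] for s in STATES)


prior = {"agree": 0.5, "disagree": 0.5}
posterior = belief_update(prior, "propose", "nod")
```

In this toy setting, observing a `nod` after `propose` shifts probability mass toward `agree`. In the paper, the transition and observation parameters are learned from unlabeled video rather than specified by hand.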
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hoey, J., Little, J.J. (2004). Decision Theoretic Modeling of Human Facial Displays. In: Pajdla, T., Matas, J. (eds) Computer Vision - ECCV 2004. ECCV 2004. Lecture Notes in Computer Science, vol 3023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24672-5_3
DOI: https://doi.org/10.1007/978-3-540-24672-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21982-8
Online ISBN: 978-3-540-24672-5
eBook Packages: Springer Book Archive