Decision-Theoretic Assistants Based on Contextual Gesture Recognition

  • José Antonio Montero
  • Luis E. Sucar
  • Miriam Martínez
Part of the Annals of Information Systems book series (AOIS, volume 19)


This paper presents a novel approach that combines computer vision and decision theory for building intelligent assistants. It considers situations in which a person interacts with surrounding objects: the system determines the most probable activity and, based on it, selects an action. This framework is applicable wherever decisions depend on human activities and their interactions with objects in the environment, for example a caregiver assisting a handicapped person or an automatic videoconference system that selects the best camera view according to the speaker’s actions. The system assumes that the human activity can be recognized from hand gestures and their interaction with relevant objects in the scene. The proposed approach combines context-based gesture recognition with a decision-theoretic model for selecting the best action under uncertainty. Gesture recognition is based on hidden Markov models that combine motion and contextual information, where context refers to the position of the hand relative to a nearby object. The posterior probability of each gesture is fed into a partially observable Markov decision process (POMDP), implemented as a dynamic decision network (DDN), which selects the action that maximizes a utility function. Experiments in two settings, videoconferencing and human caregiving, show promising results in both gesture recognition and action selection. They also show that the proposed framework is robust to changes in its parameters (lookahead, probabilities, and rewards) and that its performance is similar to that of a human assistant.
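The decision-theoretic core described above can be illustrated with a minimal sketch: gesture posteriors (here taken as given, e.g., produced by the HMM recognizers) feed a one-step expected-utility action selection. The gesture names, actions, and all numeric values below are illustrative assumptions, not taken from the paper, and the one-step selection is a simplification of the DDN lookahead.

```python
def gesture_posterior(likelihoods, prior):
    """Posterior P(gesture | observations) via Bayes' rule."""
    unnorm = {g: likelihoods[g] * prior[g] for g in likelihoods}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

def best_action(posterior, utility):
    """Return the action maximizing expected utility under the posterior."""
    eu = {a: sum(posterior[g] * utility[a][g] for g in posterior)
          for a in utility}
    return max(eu, key=eu.get), eu

# Illustrative videoconference setting: choose a camera view.
prior = {"point_at_board": 0.5, "write": 0.5}          # assumed prior
likelihoods = {"point_at_board": 0.8, "write": 0.2}    # assumed HMM outputs
utility = {                                            # assumed rewards
    "zoom_board":   {"point_at_board": 10, "write": 2},
    "zoom_speaker": {"point_at_board": 3,  "write": 8},
}

post = gesture_posterior(likelihoods, prior)
action, eu = best_action(post, utility)   # action == "zoom_board"
```

Extending this sketch toward the paper's actual model would mean unrolling the decision over several time slices of a dynamic decision network rather than a single step.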


Keywords: POMDPs · Dynamic decision networks · Intelligent assistant · Gesture recognition



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • José Antonio Montero (1)
  • Luis E. Sucar (2)
  • Miriam Martínez (1)
  1. Acapulco Institute of Technology, Acapulco, Mexico
  2. Department of Computer Science, National Institute of Astrophysics, Optics and Electronics, Tonanzintla, Mexico
