The Recognition and Comprehension of Hand Gestures - A Review and Research Agenda

Conference paper
Modeling Communication with Robots and Virtual Humans

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4930)

Abstract

In this paper I review past and current approaches to hand gesture recognition and comprehension in human-computer interaction. I point out properties of natural coverbal gestures in human communication and identify challenges for gesture comprehension systems in three areas. The first challenge is to derive the meaning of a gesture, given that its semantics are defined along three semiotic dimensions that must be addressed differently. A second challenge is the spatial composition of gestures in imagistic spaces. A third, technical challenge is the development of an integrated processing model for speech and gesture.



Editor information

Ipke Wachsmuth, Günther Knoblich

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sowa, T. (2008). The Recognition and Comprehension of Hand Gestures - A Review and Research Agenda. In: Wachsmuth, I., Knoblich, G. (eds) Modeling Communication with Robots and Virtual Humans. Lecture Notes in Computer Science, vol 4930. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79037-2_3

  • DOI: https://doi.org/10.1007/978-3-540-79037-2_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-79036-5

  • Online ISBN: 978-3-540-79037-2
