Abstract
In this paper I review current and past approaches to hand gesture recognition and comprehension in human-computer interaction. I point out properties of natural coverbal gestures in human communication and identify challenges for gesture comprehension systems in three areas. The first challenge is to derive the meaning of a gesture, given that its semantics is defined along three semiotic dimensions that must be addressed differently. The second challenge is the spatial composition of gestures in imagistic spaces. The third, technical, challenge is the development of an integrated processing model for speech and gesture.
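To make the third challenge concrete, the following is a minimal sketch of one small piece of an integrated speech-gesture processing model: resolving deictic words ("that", "there") against temporally co-occurring pointing gestures, in the spirit of "put-that-there"-style interfaces. All class names, timestamps, and the temporal slack window are illustrative assumptions, not details taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class SpeechToken:
    word: str
    t_start: float  # seconds
    t_end: float

@dataclass
class GestureEvent:
    kind: str      # e.g. "pointing"
    referent: str  # object resolved from the pointing direction
    t_start: float
    t_end: float

def overlaps(g_start, g_end, w_start, w_end, slack=0.3):
    # Gestures often slightly precede their affiliated words,
    # so allow a small slack window around the word's time span.
    return g_start - slack <= w_end and w_start <= g_end + slack

def resolve_deictics(tokens, gestures,
                     deictic_words=("this", "that", "here", "there")):
    """Attach a gesture referent to each temporally overlapping deictic word."""
    resolved = []
    for tok in tokens:
        if tok.word.lower() not in deictic_words:
            resolved.append((tok.word, None))
            continue
        match = next((g for g in gestures
                      if g.kind == "pointing"
                      and overlaps(g.t_start, g.t_end, tok.t_start, tok.t_end)),
                     None)
        resolved.append((tok.word, match.referent if match else None))
    return resolved

# Example with hypothetical timestamps: "put that there" plus two pointing gestures.
tokens = [SpeechToken("put", 0.0, 0.3), SpeechToken("that", 0.4, 0.6),
          SpeechToken("there", 1.0, 1.3)]
gestures = [GestureEvent("pointing", "blue_block", 0.35, 0.65),
            GestureEvent("pointing", "table_corner", 0.95, 1.4)]
result = resolve_deictics(tokens, gestures)
```

A full model would of course have to handle iconic and metaphoric gestures, uncertainty in both recognizers, and mutual disambiguation between the modalities; the point of the sketch is only that temporal alignment is the backbone on which such integration rests.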
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Sowa, T. (2008). The Recognition and Comprehension of Hand Gestures - A Review and Research Agenda. In: Wachsmuth, I., Knoblich, G. (eds) Modeling Communication with Robots and Virtual Humans. Lecture Notes in Computer Science(), vol 4930. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79037-2_3
Print ISBN: 978-3-540-79036-5
Online ISBN: 978-3-540-79037-2