A database-based framework for gesture recognition

  • Original Article
  • Personal and Ubiquitous Computing

Abstract

Gestures are an important modality for human–machine communication, and computer vision modules that recognize gestures can be important components of intelligent homes, assistive environments, and human–computer interfaces. A key problem in recognizing gestures is that the appearance of a gesture can vary widely depending on variables such as the person performing the gesture or the position and orientation of the camera. This paper presents a database-based approach for addressing this problem. The large variability in appearance among different examples of the same gesture is addressed by creating large gesture databases that store enough exemplars of each gesture to capture the variability within that gesture. This database-based approach is applied to two gesture recognition problems: handshape categorization and motion-based recognition of American Sign Language signs. A key aspect of our approach is the use of database indexing methods to search large databases without violating the time constraints of an online interactive system, where response times of more than a few seconds are often considered unacceptable. Our experiments demonstrate the benefits of the proposed database-based framework and the feasibility of integrating large gesture databases into online interactive systems.
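
To make the idea of searching a large exemplar database under a time budget more concrete, the following is a minimal sketch of filter-and-refine nearest neighbor classification. It is an illustration under stated assumptions, not the indexing method used in the paper: the class name ExemplarDatabase, the choice of the first few exemplars as reference objects, the Euclidean stand-in for an expensive gesture distance, and the toy three-dimensional feature vectors are all hypothetical.

```python
import math
from collections import Counter


def exact_distance(a, b):
    """Stand-in for an expensive gesture distance (here: plain Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def embed(x, references):
    """Map an object to its vector of exact distances to a few reference exemplars."""
    return [exact_distance(x, r) for r in references]


class ExemplarDatabase:
    """Hypothetical exemplar store with a filter-and-refine query."""

    def __init__(self, exemplars, labels, num_references=2):
        self.exemplars = list(exemplars)
        self.labels = list(labels)
        # Assumption: the first few exemplars serve as reference objects.
        self.references = self.exemplars[:num_references]
        # Embeddings are computed offline, once per database exemplar.
        self.embedded = [embed(e, self.references) for e in self.exemplars]

    def classify(self, query, num_candidates=4, k=1):
        # Filter step: only num_references exact distances are computed for
        # the query; all exemplars are then ranked by cheap L1 distance
        # between embeddings.
        q = embed(query, self.references)
        cheap = [sum(abs(a - b) for a, b in zip(q, e)) for e in self.embedded]
        candidates = sorted(range(len(cheap)), key=cheap.__getitem__)[:num_candidates]
        # Refine step: the exact distance is evaluated only on the shortlist.
        ranked = sorted(candidates, key=lambda i: exact_distance(query, self.exemplars[i]))
        # Majority vote among the k nearest shortlisted exemplars.
        votes = Counter(self.labels[i] for i in ranked[:k])
        return votes.most_common(1)[0][0]


# Toy data: two made-up gesture classes with several exemplars each.
exemplars = [
    (0.0, 0.0, 0.0), (5.0, 5.0, 5.0),
    (0.5, 0.2, 0.1), (5.2, 4.8, 5.1),
    (0.1, 0.4, 0.3), (4.7, 5.3, 4.9),
]
labels = ["fist", "open", "fist", "open", "fist", "open"]
db = ExemplarDatabase(exemplars, labels)
print(db.classify((4.9, 5.1, 5.0)))
```

The design choice illustrated here is the trade-off the abstract describes: the query is compared exactly against only the reference objects and a short candidate list rather than the whole database, so retrieval time grows with the shortlist size while accuracy depends on how well the cheap ranking preserves the true neighbors.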

Acknowledgments

This work has been supported by National Science Foundation grants IIS-0705749 and IIS-0812601, as well as by a University of Texas at Arlington startup grant to Professor Athitsos, and University of Texas at Arlington STARS awards to Professors Chris Ding and Fillia Makedon. We also acknowledge and thank our collaborators at Boston University, including Carol Neidle, Stan Sclaroff, Joan Nash, Ashwin Thangali, and Quan Yuan, for their contributions to collecting and annotating the American Sign Language Lexicon Video Dataset.

Author information

Corresponding author

Correspondence to Vassilis Athitsos.

About this article

Cite this article

Athitsos, V., Wang, H. & Stefan, A. A database-based framework for gesture recognition. Pers Ubiquit Comput 14, 511–526 (2010). https://doi.org/10.1007/s00779-009-0276-x

  • DOI: https://doi.org/10.1007/s00779-009-0276-x
