Abstract
Gestures are an important modality for human–machine communication. Computer vision modules that perform gesture recognition can be important components of intelligent homes, assistive environments, and human–computer interfaces. A key problem in recognizing gestures is that the appearance of a gesture can vary widely depending on factors such as the person performing the gesture or the position and orientation of the camera. This paper presents a database-based approach to this problem: the large variability in appearance among different examples of the same gesture is addressed by creating large gesture databases that store enough exemplars of each gesture to capture the variability within that gesture. This approach is applied to two gesture recognition problems: handshape categorization and motion-based recognition of American Sign Language signs. A key aspect of our approach is the use of database indexing methods to search large databases without violating the time constraints of an online interactive system, where response times of more than a few seconds are often considered unacceptable. Our experiments demonstrate the benefits of the proposed database-based framework and the feasibility of integrating large gesture databases into online interactive systems.
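The core idea, classifying a query by its nearest exemplars in a large database while using an index to avoid comparing the query against every exemplar, can be illustrated with a small sketch. The Python code below is hypothetical and is not necessarily the indexing method used in the paper. It assumes that each gesture is a trajectory of 2D hand positions, uses dynamic time warping (DTW) as the exact distance, and uses a generic filter-and-refine scheme as the index: exemplars are embedded as their DTW distances to a few reference exemplars, a cheap L1 comparison of embeddings shortlists candidates, and exact DTW is computed only on that shortlist. The names `FilterRefineIndex`, `references`, `p`, and `line` are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: a gesture is a trajectory of 2D hand positions, one per video frame.
Trajectory = Sequence[Tuple[float, float]]


def dtw_distance(a: Trajectory, b: Trajectory) -> float:
    """Dynamic time warping distance with Euclidean point-to-point cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


class FilterRefineIndex:
    """Hypothetical exemplar database searched with a filter-and-refine step."""

    def __init__(self, exemplars: List[Trajectory], labels: List[str],
                 references: List[Trajectory]) -> None:
        self.exemplars = exemplars
        self.labels = labels
        self.references = references
        # Offline: embed each exemplar as its DTW distances to the reference exemplars.
        self.vectors = [self._embed(x) for x in exemplars]

    def _embed(self, x: Trajectory) -> List[float]:
        return [dtw_distance(x, r) for r in self.references]

    def classify(self, query: Trajectory, p: int = 5) -> str:
        # Online: only len(references) exact DTW computations to embed the query.
        q = self._embed(query)
        # Filter: rank all exemplars by the cheap L1 distance between embeddings.
        ranked = sorted(
            range(len(self.exemplars)),
            key=lambda i: sum(abs(u - v) for u, v in zip(q, self.vectors[i])),
        )
        # Refine: exact DTW only for the p best-ranked candidates (1-nearest-neighbor rule).
        best = min(ranked[:p], key=lambda i: dtw_distance(query, self.exemplars[i]))
        return self.labels[best]


def line(dx: float, dy: float, n: int) -> List[Tuple[float, float]]:
    """Toy trajectory: n frames moving in a straight line with step (dx, dy)."""
    return [(dx * t, dy * t) for t in range(n)]


# Toy database: two exemplars per gesture, with different trajectory lengths.
db = [line(1, 0, 10), line(1, 0, 14), line(0, 1, 10), line(0, 1, 12)]
labels = ["right", "right", "up", "up"]
index = FilterRefineIndex(db, labels, references=[db[0], db[2]])
print(index.classify(line(1, 0, 12), p=2))  # prints: right
```

With N exemplars, this sketch computes `len(references) + p` exact DTW distances per query instead of N. The trade-off is that the filter step can discard the true nearest neighbor, because DTW is not a metric and the embedding only approximates it.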
Acknowledgments
This work has been supported by National Science Foundation grants IIS-0705749 and IIS-0812601, as well as by a University of Texas at Arlington startup grant to Professor Athitsos, and University of Texas at Arlington STARS awards to Professors Chris Ding and Fillia Makedon. We also acknowledge and thank our collaborators at Boston University, including Carol Neidle, Stan Sclaroff, Joan Nash, Ashwin Thangali, and Quan Yuan, for their contributions in collecting and annotating the American Sign Language Lexicon Video Dataset.
About this article
Cite this article
Athitsos, V., Wang, H. & Stefan, A. A database-based framework for gesture recognition. Pers Ubiquit Comput 14, 511–526 (2010). https://doi.org/10.1007/s00779-009-0276-x
DOI: https://doi.org/10.1007/s00779-009-0276-x