Recent developments in visual sign language recognition

  • Long Paper
  • Published in: Universal Access in the Information Society

Abstract

Research in the field of sign language recognition has made significant advances in recent years. These achievements provide the basis for future applications that aim to support the integration of deaf people into the hearing society. Translation systems, for example, could facilitate communication between deaf and hearing people in public situations. Further applications, such as user interfaces and automatic indexing of signed videos, become feasible. The current state of sign language recognition is roughly 30 years behind that of speech recognition, corresponding to the gradual transition from isolated to continuous recognition for small-vocabulary tasks. Research efforts have mainly focused on robust feature extraction and statistical modeling of signs. However, current recognition systems are still designed for signer-dependent operation under laboratory conditions.

This paper describes a comprehensive concept for robust visual sign language recognition that reflects recent developments in the field. The proposed recognition system aims for signer-independent operation and uses a single video camera for data acquisition to ensure user-friendliness. Since sign languages make use of both manual and facial means of expression, both channels are employed for recognition. For mobile operation in uncontrolled environments, sophisticated algorithms were developed that robustly extract manual and facial features. Manual feature extraction relies on a multiple hypotheses tracking approach to resolve ambiguities in hand positions. For facial feature extraction, an active appearance model is applied to identify areas of interest such as the eye and mouth regions. In the next processing step, a numerical description of the facial expression, head pose, line of sight, and lip outline is computed. The system also employs a resolution strategy to handle mutual occlusions of the signer's hands and face.

Classification is based on hidden Markov models, which compensate for variance in the timing and amplitude of sign articulation. The classification stage is designed for the recognition of isolated signs as well as of continuous sign language. In the latter case, a stochastic language model can be utilized that considers unigram and bigram probabilities of single and successive signs. For statistical modeling, each sign is represented either as a whole or as a composition of smaller subunits, similar to phonemes in spoken languages. While recognition based on word models is limited to rather small vocabularies, subunit models open the door to large vocabularies.

Achieving signer independence constitutes a challenging problem, as the articulation of a sign is subject to high interpersonal variance. This problem cannot be solved by simple feature normalization and must be addressed at the classification level. Therefore, dedicated adaptation methods known from speech recognition were implemented and modified to account for the specifics of sign languages. For rapid adaptation to unknown signers, the proposed recognition system employs a combined approach of maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) estimation.
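The abstract summarizes several processing stages without giving the authors' implementations. To make the tracking stage concrete, the following much-simplified Python sketch illustrates the idea behind multiple hypotheses tracking: instead of committing to one skin-colored candidate region per frame, it keeps a beam of candidate hand trajectories and resolves ambiguous frames retrospectively. All names and the motion-smoothness cost are illustrative assumptions, not the paper's method.

    import numpy as np

    def track_hand_mht(candidates_per_frame, beam=10):
        """Simplified multiple hypotheses tracking over hand candidates.

        candidates_per_frame: list over frames; each entry is a list of
        (x, y) centroids of skin-colored regions (hand candidates).
        Returns the most plausible position sequence.
        """
        # Each hypothesis is (accumulated cost, chosen positions so far).
        hypotheses = [(0.0, [pos]) for pos in candidates_per_frame[0]]
        for candidates in candidates_per_frame[1:]:
            extended = []
            for cost, path in hypotheses:
                for pos in candidates:
                    # Motion-smoothness cost: penalize large jumps between frames.
                    jump = np.hypot(pos[0] - path[-1][0], pos[1] - path[-1][1])
                    extended.append((cost + jump, path + [pos]))
            # Prune: keep only the `beam` cheapest hypotheses.
            extended.sort(key=lambda h: h[0])
            hypotheses = extended[:beam]
        return min(hypotheses, key=lambda h: h[0])[1]

A real tracker would also score hypotheses by appearance and handle occlusions; the beam search above only illustrates why deferring the assignment decision resolves ambiguities that a frame-by-frame tracker cannot.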

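For the classification stage, the standard tool is Viterbi decoding over hidden Markov models. As a minimal sketch (assuming per-sign HMMs with precomputed log-probabilities; the helper names are hypothetical, not the authors' code), isolated sign recognition reduces to scoring the observed feature sequence under each sign's model and adding a language-model prior:

    import numpy as np

    def log_viterbi(log_pi, log_A, log_B):
        """Log-probability of the best state path through one HMM.

        log_pi: (S,)   log initial state distribution
        log_A : (S, S) log state transition matrix
        log_B : (T, S) per-frame, per-state log emission likelihoods
        """
        delta = log_pi + log_B[0]
        for t in range(1, len(log_B)):
            # Best predecessor state for each state, plus the emission score.
            delta = np.max(delta[:, None] + log_A, axis=0) + log_B[t]
        return np.max(delta)

    def recognize_isolated(features, models, log_prior):
        """Pick the sign whose HMM best explains a feature sequence.

        features : (T, D) manual and facial features, one row per frame
        models   : dict sign -> (log_pi, log_A, emission_fn), where
                   emission_fn(features) yields the (T, S) matrix log_B
        log_prior: dict sign -> log unigram probability (language model)
        """
        scores = {sign: log_viterbi(log_pi, log_A, emission_fn(features))
                        + log_prior[sign]
                  for sign, (log_pi, log_A, emission_fn) in models.items()}
        return max(scores, key=scores.get)

Continuous recognition extends this scheme by chaining sign models and weighting the transitions between successive signs with the bigram probabilities mentioned in the abstract.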
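The signer adaptation combines two standard techniques from speech recognition. As a reminder of their textbook form (the paper's exact variant may differ), MLLR re-estimates all Gaussian means of a regression class through one shared affine transform estimated from the adaptation data, and MAP then interpolates between this transformed prior and the new signer's data in proportion to how much data is available:

    \hat{\mu}_{\mathrm{MLLR}} = A\,\mu + b

    \hat{\mu}_{\mathrm{MAP}} = \frac{\tau\,\hat{\mu}_{\mathrm{MLLR}} + \sum_t \gamma_t\, o_t}{\tau + \sum_t \gamma_t}

Here \gamma_t is the state occupation probability of observation o_t and \tau weights the prior: with little adaptation data the estimate stays close to the MLLR-transformed signer-independent model, and with more data it converges to the new signer's statistics. This behavior is what makes the combination suitable for the rapid adaptation described in the abstract.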



Notes

  1. In speech recognition, the corresponding term is "acoustic subunits"; the name is adapted here for sign language recognition.


Acknowledgments

The presented work was partially supported by the Deutsche Forschungsgemeinschaft (German Research Foundation) and the European Commission, Directorate General Information Society, IST–2000–27512 WISDOM. The videos containing British Sign Language were kindly provided by the British Deaf Association.

Author information

Correspondence to Ulrich von Agris.


About this article

Cite this article

von Agris, U., Zieren, J., Canzler, U. et al. Recent developments in visual sign language recognition. Univ Access Inf Soc 6, 323–362 (2008). https://doi.org/10.1007/s10209-007-0104-x

