Sign Language Recognition with Support Vector Machines and Hidden Conditional Random Fields: Going from Fingerspelling to Natural Articulated Words

  • César Roberto de Souza
  • Ednaldo Brigante Pizzolato
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7988)

Abstract

This paper describes the authors’ experiments with Support Vector Machines and Hidden Conditional Random Fields on the classification of freely articulated sign words drawn from the Brazilian Sign Language (Libras). While our previous work focused specifically on fingerspelling recognition under tightly controlled environmental conditions, in this work we classify natural signed words against an unconstrained background, without the aid of gloves or wearable tracking devices. We show that our choice of feature vector, extracted from depth information and based on linguistic investigations, is effective for this task. As before, we provide comparison results against Artificial Neural Networks and Hidden Markov Models, reporting statistically significant results favoring our choice of classifiers, and we validate our findings using the chance-corrected Cohen’s Kappa statistic for contingency tables.
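
For readers unfamiliar with the chance-corrected Cohen's Kappa statistic mentioned in the abstract, the following is a minimal sketch of how it can be computed from a contingency (confusion) table. It is not the authors' implementation. The function name `cohens_kappa` and the example table are hypothetical and chosen only for illustration; the counts are not results from the paper.

```python
from typing import Sequence


def cohens_kappa(table: Sequence[Sequence[float]]) -> float:
    """Cohen's kappa for a square contingency table.

    table[i][j] is the number of samples whose true class is i and
    whose predicted class is j (hypothetical layout for this sketch).
    """
    n = len(table)
    if n == 0 or any(len(row) != n for row in table):
        raise ValueError("contingency table must be square and non-empty")

    total = sum(sum(row) for row in table)
    if total <= 0:
        raise ValueError("contingency table must contain at least one sample")

    # Observed agreement: fraction of samples on the main diagonal.
    p_observed = sum(table[i][i] for i in range(n)) / total

    # Agreement expected by chance: sum over classes of the product
    # of the row and column marginal proportions.
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(n)) for j in range(n)]
    p_expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return (p_observed - p_expected) / (1.0 - p_expected)


# Made-up 3-class table, for illustration only (not data from the paper).
example_table = [
    [20, 5, 0],
    [10, 30, 5],
    [0, 5, 25],
]
kappa = cohens_kappa(example_table)
print(f"kappa = {kappa:.4f}")
```

Kappa equals 1 for perfect agreement and is close to 0 when the classifier agrees with the true labels no more often than chance would predict, which is why it is more informative than raw accuracy when the classes are unbalanced.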

Keywords

Gesture Recognition; Sign Languages; Libras; Support Vector Machines; Hidden Conditional Random Fields; Neural Networks; Hidden Markov Models; Discriminative Models

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • César Roberto de Souza (1)
  • Ednaldo Brigante Pizzolato (1)
  1. Universidade Federal de São Carlos, São Carlos, Brasil
