Video-Based Vietnamese Sign Language Recognition Using Local Descriptors

  • Anh H. Vo
  • Nhu T. Q. Nguyen
  • Ngan T. B. Nguyen
  • Van-Huy Pham
  • Ta Van Giap
  • Bao T. Nguyen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11432)


Sign language is one method of non-verbal communication. It is most commonly used by deaf or mute people, who have hearing or speech impairments, to communicate among themselves or with hearing people. Vietnamese Sign Language (VSL) is the sign language system used in the community of Vietnamese hearing-impaired individuals. VSL recognition aims to develop algorithms and methods that correctly identify a sequence of produced signs and understand their meaning in Vietnamese. However, automatic VSL recognition in video faces many challenges due to camera orientation, hand position and movement, the relation between the two hands, and so on. In this paper, we present several feature extraction approaches for VSL recognition, including spatial features, scene-based features, and especially motion-based features. Instead of relying on a single static image, we specifically capture motion information between frames in a video sequence. We evaluated the proposed framework on our own VSL dataset, recorded with a 2D camera, which covers 23 letters of the alphabet, 3 diacritic marks and 5 tones of the Vietnamese language. Additionally, to gain more information about hand movement and hand position, we also applied data augmentation. All of this information contributes to an effective VSL recognition system. The experiments achieved satisfactory results, reaching 86.61%. These results indicate that data augmentation provides more information about hand orientation. Moreover, combining spatial, scene, and especially motion information helps the system capture information both from single frames and across multiple frames, which can improve the performance of the VSL recognition system.
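The abstract does not name the specific local descriptors, so the following is only an illustrative sketch of the general idea of fusing single-frame (spatial) and multi-frame (motion) information, not the authors' pipeline. It assumes: basic 8-neighbour local binary pattern (LBP) histograms as a stand-in spatial descriptor, a histogram of absolute differences between consecutive frames as a crude stand-in motion descriptor, and horizontal mirroring as a stand-in augmentation. The scene-based feature is omitted. Videos are assumed to be grayscale NumPy arrays of shape (T, H, W), and all function names (`lbp_image`, `lbp_histogram`, `motion_histogram`, `video_descriptor`, `augment`) are hypothetical.

```python
import numpy as np

# Neighbour offsets (dy, dx), clockwise from top-left; neighbour i sets bit i.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(frame: np.ndarray) -> np.ndarray:
    """Basic 3x3 LBP code (0..255) for every interior pixel of a grayscale frame."""
    f = frame.astype(np.int32)
    h, w = f.shape
    center = f[1:-1, 1:-1]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = f[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes


def lbp_histogram(frame: np.ndarray) -> np.ndarray:
    """Spatial descriptor of one frame: L1-normalised 256-bin LBP histogram."""
    hist = np.bincount(lbp_image(frame).ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()


def motion_histogram(video: np.ndarray, bins: int = 16) -> np.ndarray:
    """Crude motion descriptor: histogram of |frame[t+1] - frame[t]| over the clip."""
    # Cast before differencing so uint8 subtraction cannot wrap around.
    diffs = np.abs(np.diff(video.astype(np.int32), axis=0))
    hist, _ = np.histogram(diffs, bins=bins, range=(0, 256))
    hist = hist.astype(np.float64)
    return hist / hist.sum()


def video_descriptor(video: np.ndarray) -> np.ndarray:
    """Early fusion: mean per-frame LBP histogram concatenated with the motion histogram."""
    spatial = np.mean([lbp_histogram(frame) for frame in video], axis=0)
    return np.concatenate([spatial, motion_histogram(video)])  # 256 + 16 = 272 values


def augment(video: np.ndarray) -> list:
    """Assumed augmentation: the original clip plus its horizontal mirror."""
    return [video, video[:, :, ::-1]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.integers(0, 256, size=(8, 32, 32)).astype(np.uint8)  # toy (T, H, W) clip
    for v in augment(clip):
        print(video_descriptor(v).shape)  # (272,)
```

Concatenating the two normalised histograms is the simplest form of early fusion; a classifier (which the abstract does not name) would then be trained on these vectors. In this sketch the motion histogram is unchanged by mirroring, while the LBP part generally changes, which is one way an augmented clip can add appearance variation about hand orientation.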


Keywords: Vietnamese Sign Language (VSL) · VSL recognition · Local descriptors · Spatial feature · Scene-based feature · Motion-based feature



The authors would like to thank the teachers of deaf people in Binh Duong province, Vietnam. We also acknowledge the support of the students at Ton Duc Thang University.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Anh H. Vo (1)
  • Nhu T. Q. Nguyen (1)
  • Ngan T. B. Nguyen (1)
  • Van-Huy Pham (1), corresponding author
  • Ta Van Giap (2)
  • Bao T. Nguyen (3), corresponding author
  1. Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
  2. Can Tho Medical College, Can Tho City, Vietnam
  3. University of Education and Technology, Ho Chi Minh, Vietnam
