Abstract
Continuous sign language recognition (SLR) systems suffer from a problem called movement epenthesis (ME): the intermediate connecting movement between two consecutive signs. This paper proposes a novel framework for spotting signs in continuous fingerspelling sequences that extracts the motion information of signs directly from compressed video. The framework is based on motion vectors extracted from H.264/AVC compressed videos. A spatio-temporal Markov random field (ST-MRF) model is employed to classify the non-rigid motions of fingers as sign or ME. The framework was tested on a number of sign language videos encoded with the H.264/AVC JM encoder, and the spotting accuracy was found to be around 75%.
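To make the idea of spatio-temporal MRF labelling of a motion vector field concrete, the following is a minimal sketch and not the model used in the paper. It assumes each block of a frame is described only by its motion-vector magnitude, that there are two labels (0 and 1, standing in for the two motion classes), and that labels are refined with iterated conditional modes (ICM). The function name `icm_label`, the class means, and the weights `beta_s` and `beta_t` are all hypothetical choices made for illustration.

```python
def icm_label(mags, prev, means=(0.0, 4.0), beta_s=1.0, beta_t=0.5, max_sweeps=10):
    """Label each block 0 or 1 by minimising a simple ST-MRF energy with ICM.

    mags : 2D list of motion-vector magnitudes, one per block (assumed input).
    prev : 2D list of labels from the previous frame (temporal neighbours).
    means: illustrative mean magnitude for label 0 and label 1.
    """
    rows, cols = len(mags), len(mags[0])

    # Initialise every block with the label whose mean is closest to its magnitude.
    labels = [[min((0, 1), key=lambda l: (mags[r][c] - means[l]) ** 2)
               for c in range(cols)] for r in range(rows)]

    def local_energy(r, c, l):
        # Data term: squared distance between the magnitude and the class mean.
        data = (mags[r][c] - means[l]) ** 2
        # Spatial term: number of 4-connected neighbours carrying a different label.
        spatial = 0
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] != l:
                spatial += 1
        # Temporal term: penalty when the co-located block in the previous frame differs.
        temporal = 1 if prev[r][c] != l else 0
        return data + beta_s * spatial + beta_t * temporal

    # ICM: repeatedly give each block its locally cheapest label until nothing changes.
    for _ in range(max_sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                best = min((0, 1), key=lambda l: local_energy(r, c, l))
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels


# Toy 3x3 field: one low-magnitude block surrounded by high-magnitude blocks.
mags = [[4.0, 4.0, 4.0],
        [4.0, 1.5, 4.0],
        [4.0, 4.0, 4.0]]
prev = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
smoothed = icm_label(mags, prev)
unsmoothed = icm_label(mags, prev, beta_s=0.0, beta_t=0.0)
```

In this toy field the isolated centre block is relabelled to agree with its spatial and temporal neighbours when the smoothness weights are active, and keeps its own label when they are set to zero. A real compressed-domain system would also need to read motion vectors from the H.264/AVC bitstream, which this sketch leaves out.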
Cite this article
Talukdar, A.K., Bhuyan, M. A framework for continuous fingerspelling spotting for H.264/AVC compressed videos using spatio-temporal Markov random field. Multimed Tools Appl 80, 28329–28347 (2021). https://doi.org/10.1007/s11042-021-10910-3
DOI: https://doi.org/10.1007/s11042-021-10910-3