Sharable and unshareable within class multi view deep metric latent feature learning for video-based sign language recognition

Suneetha, M.; Prasad, M. V. D.; Kishore, P. V. V.

doi:10.1007/s11042-022-12646-0

Published: 25 March 2022

Multimedia Tools and Applications, Volume 81, pages 27247–27273 (2022)

Abstract

Mobile-based sign language recognition (SLR) is challenging in real time because camera shake and signer movement affect the capture of continuous video data for recognition. Although many state-of-the-art SLR methods exist, they ignore view sensitivity and its effect on the accuracy of the system. This work proposes a novel multi-view deep metric feature learning (MVslDML) model that builds view sensitivity, which has been investigated extensively in human action recognition, into SLR. MVslDMLNet is an end-to-end trainable convolutional neural network in which features extracted from multiple views are learned through metric learning, based on the sharable and unshareable latent features of within-class multi-view data. Experiments on our multi-view sign language dataset and four benchmark action video datasets indicate higher accuracy for the proposed framework.
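The abstract does not specify how sharable and unshareable features are separated or which metric loss is used. The Python sketch below is only a rough illustration under stated assumptions and is not the authors' MVslDMLNet formulation. It assumes that the sharable part of a multi-view sample is the mean of its per-view embeddings and that the unshareable part of each view is that view's residual from the mean. A standard hinge triplet loss is then applied to the sharable parts. All function names, the mean-based split, and the margin value are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def split_shared_private(view_feats: Sequence[Vector]) -> Tuple[Vector, List[Vector]]:
    """Hypothetical split of one sample's per-view embeddings.

    Assumption: the sharable part is the mean over views, and the unshareable
    part of each view is its residual from that mean (the residuals therefore
    sum to zero in every dimension).
    """
    shared = mean_vector(view_feats)
    private = [[x - s for x, s in zip(view, shared)] for view in view_feats]
    return shared, private


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Standard hinge triplet loss on squared distances."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


def shared_triplet_loss(anchor_views: Sequence[Vector],
                        positive_views: Sequence[Vector],
                        negative_views: Sequence[Vector],
                        margin: float = 1.0) -> float:
    """Apply the triplet loss to the sharable parts of three multi-view samples.

    anchor_views and positive_views belong to the same sign class;
    negative_views belongs to a different class.
    """
    a, _ = split_shared_private(anchor_views)
    p, _ = split_shared_private(positive_views)
    n, _ = split_shared_private(negative_views)
    return triplet_loss(a, p, n, margin)


if __name__ == "__main__":
    # Toy data: two camera views per sample, 2-D embeddings.
    sign_a1 = [[1.0, 0.0], [3.0, 0.0]]   # class A, sharable part [2.0, 0.0]
    sign_a2 = [[2.0, 1.0], [2.0, -1.0]]  # class A, sharable part [2.0, 0.0]
    sign_b = [[2.0, 0.5], [2.0, 0.5]]    # class B, sharable part [2.0, 0.5]
    print(shared_triplet_loss(sign_a1, sign_a2, sign_b))  # prints 0.75
```

In a trained system, the per-view embeddings would come from a convolutional network, and the loss would be backpropagated through it. The unshareable residuals are returned here but not used, because the abstract does not say how the model constrains them.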

Author information

Authors and Affiliations

Biomechanics and Vision Computing Research Center, Department of Electronics and Communications Engineering, Koneru Lakshmiah Education Foundation, Green Fields, Vaddeswaram, Guntur (DT), Andhra Pradesh, India
M. Suneetha, M. V. D. Prasad & P. V. V. Kishore

Corresponding author

Correspondence to P. V. V. Kishore.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Suneetha, M., Prasad, M.V.D. & Kishore, P.V.V. Sharable and unshareable within class multi view deep metric latent feature learning for video-based sign language recognition. Multimed Tools Appl 81, 27247–27273 (2022). https://doi.org/10.1007/s11042-022-12646-0

Received: 19 December 2020
Revised: 26 March 2021
Accepted: 09 February 2022
Published: 25 March 2022
Issue Date: August 2022
DOI: https://doi.org/10.1007/s11042-022-12646-0
