Multiview meta-metric learning for sign language recognition using triplet loss embeddings

Mopidevi, Suneetha; Prasad, M. V. D.; Kishore, Polurie Venkata Vijay

doi:10.1007/s10044-023-01134-2

Theoretical Advances
Published: 18 February 2023

Volume 26, pages 1125–1141, (2023)
Pattern Analysis and Applications

Suneetha Mopidevi¹,
M. V. D. Prasad¹ &
Polurie Venkata Vijay Kishore ORCID: orcid.org/0000-0002-3247-3043²

Abstract

Multiview video processing for recognition is a hard problem when the subject is in continuous motion. It becomes harder still when the subject is a human being and the actions to be recognized from the video data form the complex set of gestures that make up a sign language. Although many deep learning models have been applied successfully to sign language recognition (SLR), very few have considered multiple views in their training set. In this work, we propose to apply meta-metric learning to video-based SLR. In contrast to traditional metric learning, where the triplet loss is constructed on sample-based distances, the meta-metric learns on set-based distances. Consequently, we construct meta-cells on the entire multiview dataset and perform a task-based learning approach with respect to support cells and query sets. Additionally, we propose a maximum view pooled distance on sub-tasks for binding intra-class views. Experiments conducted on the multiview sign language dataset and four human action recognition datasets show that the proposed multiview meta-metric learning model (MVDMML) achieves higher accuracies than the baselines.
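The contrast the abstract draws between sample-based and set-based triplet losses can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration only: the set distance is taken as the Euclidean distance from a query embedding to the centroid of a support set, and the "maximum view pooled distance" is taken as the largest such distance over the views of one query. The function names are hypothetical and are not the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sample_triplet_loss(anchor, positive, negative, margin=1.0):
    """Classic triplet loss on sample-to-sample distances."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def set_distance(query, support_set):
    """ASSUMPTION: distance from a query to a set is the distance to the set centroid."""
    return euclidean(query, centroid(support_set))


def set_triplet_loss(query, pos_set, neg_set, margin=1.0):
    """Triplet loss where the positive and negative are support sets, not single samples."""
    return max(0.0, set_distance(query, pos_set) - set_distance(query, neg_set) + margin)


def max_view_pooled_distance(query_views, support_set):
    """ASSUMPTION: pool over the views of one query by keeping the largest set distance."""
    return max(set_distance(view, support_set) for view in query_views)


if __name__ == "__main__":
    pos_set = [[0.0, 0.0], [2.0, 0.0]]  # same-class support embeddings
    neg_set = [[4.0, 0.0], [6.0, 0.0]]  # other-class support embeddings
    views = [[1.0, 0.0], [1.0, 3.0]]    # two camera views of one query sign
    print(sample_triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
    print(set_triplet_loss(views[0], pos_set, neg_set))
    print(max_view_pooled_distance(views, pos_set))
```

Under these assumptions, pooling with the maximum means the worst-aligned view determines the distance, so reducing it pulls every view of a class toward the same support set. The pooling and set distance actually used in the paper may differ.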

Author information

Authors and Affiliations

Department of Electronics and Communications Engineering, K.L. University, Green Fields, Vaddeswaram, Guntur DT, Andhra Pradesh, 522 502, India
Suneetha Mopidevi & M. V. D. Prasad
Image Speech and Signal Processing Research Group, Department of Electronics and Communications Engineering, Biomechanics and Vision Computing Research Center, K.L. University, Green Fields, Vaddeswaram, Guntur DT, Andhra Pradesh, 522 502, India
Polurie Venkata Vijay Kishore

Corresponding author

Correspondence to Polurie Venkata Vijay Kishore.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest of any form regarding this research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mopidevi, S., Prasad, M.V.D. & Kishore, P.V.V. Multiview meta-metric learning for sign language recognition using triplet loss embeddings. Pattern Anal Applic 26, 1125–1141 (2023). https://doi.org/10.1007/s10044-023-01134-2

Received: 09 April 2021
Accepted: 24 January 2023
Published: 18 February 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s10044-023-01134-2
