Abstract
Multiview video processing for recognition is a hard problem when the subject is in continuous motion. The problem becomes harder still when the subject is a human being and the actions to be recognized from the video data form a complex set known as sign language. Although many deep learning models have been applied successfully to sign language recognition (SLR), very few have included multiple views in their training sets. In this work, we propose applying meta-metric learning to video-based SLR. In contrast to traditional metric learning, where the triplet loss is constructed from sample-based distances, the meta-metric learns from set-based distances. Accordingly, we construct meta-cells over the entire multiview dataset and perform task-based learning with respect to support cells and query sets. Additionally, we propose a maximum view-pooled distance on sub-tasks to bind intra-class views. Experiments conducted on the multiview sign language dataset and four human action recognition datasets show that the proposed multiview meta-metric learning model (MVDMML) achieves higher accuracies than the baselines.
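As a rough illustration of the contrast between sample-based and set-based distances, the following Python sketch computes a triplet loss on set distances. It is not the authors' formulation: the abstract gives no equations, so the centroid-based set distance, the reading of "maximum view pooled distance" as a maximum over the views of one sample, the closest-view choice for the negative term, the margin value, and all function names are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def set_distance(query, support_set):
    """Distance from one query embedding to a set of embeddings.

    Assumption: the set is summarized by its centroid.
    """
    return euclidean(query, centroid(support_set))


def max_view_pooled_distance(query_views, support_set):
    """Largest set distance over all views of the same sample.

    Assumption: pooling takes the maximum across views, so the
    worst-aligned view drives the positive term of the loss.
    """
    return max(set_distance(q, support_set) for q in query_views)


def set_triplet_loss(query_views, positive_set, negative_set, margin=1.0):
    """Hinge-style triplet loss built on set distances, not sample pairs."""
    d_pos = max_view_pooled_distance(query_views, positive_set)
    # Assumption: the negative term uses the view closest to the negative set.
    d_neg = min(set_distance(q, negative_set) for q in query_views)
    return max(0.0, d_pos - d_neg + margin)


# Two views of one query sample, with toy 2-D embeddings.
views = [[0.0, 0.0], [1.0, 0.0]]
same_class = [[0.0, 0.0], [0.0, 2.0]]
other_class = [[5.0, 0.0], [5.0, 2.0]]

loss_well_separated = set_triplet_loss(views, same_class, other_class)
loss_swapped = set_triplet_loss(views, other_class, same_class)
```

In this toy setup the loss vanishes when the query views sit close to their own class set and grows when the positive and negative sets are swapped. Taking the maximum over views means the loss is only satisfied once every view of a sample is close to its class set, which is one plausible way to bind intra-class views.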
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-023-01134-2/MediaObjects/10044_2023_1134_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-023-01134-2/MediaObjects/10044_2023_1134_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-023-01134-2/MediaObjects/10044_2023_1134_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-023-01134-2/MediaObjects/10044_2023_1134_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-023-01134-2/MediaObjects/10044_2023_1134_Fig5_HTML.png)
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest of any form regarding this research.
About this article
Cite this article
Mopidevi, S., Prasad, M.V.D. & Kishore, P.V.V. Multiview meta-metric learning for sign language recognition using triplet loss embeddings. Pattern Anal Applic 26, 1125–1141 (2023). https://doi.org/10.1007/s10044-023-01134-2
DOI: https://doi.org/10.1007/s10044-023-01134-2