
Multiview meta-metric learning for sign language recognition using triplet loss embeddings

  • Theoretical Advances
  • Published in: Pattern Analysis and Applications

Abstract

Multiview video recognition is a hard problem when the subject is in continuous motion, and it becomes even harder when the subject is a human being and the actions to be recognized are the complex set of gestures that constitute sign language. Although many deep learning models have been applied successfully to sign language recognition (SLR), very few consider multiple views in their training sets. In this work, we propose to apply meta-metric learning to video-based SLR. In contrast to traditional metric learning, where the triplet loss is constructed on sample-based distances, meta-metric learning operates on set-based distances. Accordingly, we construct meta-cells over the entire multiview dataset and perform task-based learning with respect to support cells and query sets. Additionally, we propose a maximum view-pooled distance on sub-tasks for binding intra-class views. Experiments conducted on a multiview sign language dataset and four human action recognition datasets show that the proposed multiview meta-metric learning model (MVDMML) achieves higher accuracies than the baselines.
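The set-based triplet construction described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the squared-Euclidean base metric, and the nearest-match-then-max pooling order are our assumptions about how a maximum view-pooled distance over meta-cells could be formed.

```python
import numpy as np

def pairwise_dist(a, b):
    # Squared Euclidean distance between every row of a and every row of b.
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)

def view_pooled_distance(query_set, support_cell):
    # Set-based distance between a query set and a support meta-cell:
    # match each query-view embedding to its nearest support embedding,
    # then pool with a max over views so the worst-matched view dominates,
    # binding intra-class views together.
    d = pairwise_dist(query_set, support_cell)  # (n_query_views, n_support_views)
    nearest = d.min(axis=1)                     # best match per query view
    return nearest.max()                        # maximum view pooling

def set_triplet_loss(query, pos_cell, neg_cell, margin=1.0):
    # Triplet loss built on set-based distances rather than
    # sample-based distances between individual embeddings.
    d_pos = view_pooled_distance(query, pos_cell)
    d_neg = view_pooled_distance(query, neg_cell)
    return max(d_pos - d_neg + margin, 0.0)
```

With embeddings of the same sign from different camera views stacked as rows, the loss is zero once every view of the query lies closer (by more than the margin) to its own class's meta-cell than to any other class's cell.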



Author information

Corresponding author

Correspondence to Polurie Venkata Vijay Kishore.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest regarding this research in any form.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mopidevi, S., Prasad, M.V.D. & Kishore, P.V.V. Multiview meta-metric learning for sign language recognition using triplet loss embeddings. Pattern Anal Applic 26, 1125–1141 (2023). https://doi.org/10.1007/s10044-023-01134-2

