Abstract
Sign language is a non-verbal communication tool used by deaf people. A robust sign language recognition framework is needed to develop Human–Robot Interaction (HRI) platforms that can interact with humans via sign language. Iranian Sign Language (ISL) comprises both static postures and dynamic gestures of the hand and fingers. In this paper, we present a robust framework that uses a Deep Neural Network (DNN) to recognize, in real time, dynamic ISL gestures captured by motion-capture gloves. To this end, a dataset of fifteen ISL classes was first collected as time series; this dataset was then virtually augmented and pre-processed using the “state-image” method to produce a unique collection of images, each corresponding to a specific sequence of data representing one class. Next, a continuous genetic algorithm was used to find an optimal deep neural network with the minimum number of weights (trainable parameters) and the maximum overall accuracy. Finally, the dataset was fed to the DNN to train the model. The results showed that the optimization process succeeded in finding a DNN structure highly suitable for this application, reaching 99.7% accuracy on the verification (test) data. After integrating the module into a robotic architecture, an HRI experiment was conducted to assess the system’s performance in real-time applications. A preliminary statistical analysis based on the standard UTAUT model with eight participants showed that the system can recognize ISL signs quickly and accurately during human–robot interaction. The proposed methodology can be applied to other sign languages, as no ISL-specific characteristics were used in the preprocessing or training stages.
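The core of the “state-image” idea is to fold a window of sequential glove readings into a 2D array that a convolutional DNN can treat as an image. The sketch below is a minimal illustration of that transformation, not the authors’ exact implementation; the window length, channel count, and min–max normalization are assumptions.

```python
import numpy as np

def state_image(window, lo=None, hi=None):
    """Convert a window of multichannel glove readings (T timesteps x C
    channels) into a grayscale 'state image': each row is one channel's
    trajectory over the window, scaled to 0..255. The per-channel min-max
    normalization bounds are assumptions, not the paper's stated method."""
    window = np.asarray(window, dtype=float)          # shape (T, C)
    lo = window.min(axis=0) if lo is None else lo     # per-channel minimum
    hi = window.max(axis=0) if hi is None else hi     # per-channel maximum
    span = np.where(hi > lo, hi - lo, 1.0)            # avoid divide-by-zero
    img = (window - lo) / span                        # each channel in 0..1
    return (img.T * 255).astype(np.uint8)             # shape (C, T) image

# Example: a 50-timestep window of 14 hypothetical flex-sensor channels
rng = np.random.default_rng(0)
img = state_image(rng.normal(size=(50, 14)))
print(img.shape)  # (14, 50)
```

Fixing `lo` and `hi` to the sensors’ physical ranges (rather than per-window extrema) would make images from different recordings directly comparable, which matters when the dataset is virtually augmented.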
Data availability
All data from this project (videos of the sessions, results of the questionnaires, scores of performances, etc.) are available in the archive of the Social & Cognitive Robotics Laboratory.
Code availability
All of the code is available in the archive of the Social & Cognitive Robotics Laboratory. Readers who need the code may contact the corresponding author.
Notes
Time-of-Flight.
Adaptive Moment Estimation.
Stochastic Gradient Descent.
References
World Health Organization (WHO) "Deafness and hearing loss." https://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss (accessed 2019-04-27)
Marschark M, Hauser PC (2011) How deaf children learn: what parents and teachers need to know. Oxford University Press, USA
Courtin C (2000) The impact of sign language on the cognitive development of deaf children: the case of theories of mind. J Deaf Stud Deaf Educ 5(3):266–276. https://doi.org/10.1093/deafed/5.3.266
Zakipour M, Meghdari A, Alemi M (2016) RASA: a low-cost upper-torso social robot acting as a sign language teaching assistant. Presented at the International Conference on Social Robotics
Hosseini SR, Taheri A, Meghdari A, Alemi M (2019) Teaching Persian sign language to a social robot via the learning from demonstrations approach. Presented at the International Conference on Social Robotics
Meghdari A, Alemi M, Zakipour M, Kashanian SA (2018) Design and realization of a sign language educational humanoid robot. J Intell Rob Syst 95(1):3–17. https://doi.org/10.1007/s10846-018-0860-2
Karami A, Zanj B, Sarkaleh AK (2011) Persian sign language (PSL) recognition using wavelet transform and neural networks. Expert Syst Appl 38(3):2661–2667. https://doi.org/10.1016/j.eswa.2010.08.056
Kiani Sarkaleh A, Poorahangaryan F, Zanj B, Karami A (2009) A neural network based system for Persian sign language recognition. Presented at the 2009 IEEE International Conference on Signal and Image Processing Applications, Kuala Lumpur, Malaysia
Starner T, Weaver J, Pentland A (1998) Real-time American sign language recognition using desk and wearable computer based video. IEEE Trans Pattern Anal Mach Intell 20(12):1371–1375. https://doi.org/10.1109/34.735811
Kishore PVV, Prasad MVD, Prasad CR, Rahul R (2015) 4-Camera model for sign language recognition using elliptical Fourier descriptors and ANN. Presented at the 2015 International Conference on Signal Processing and Communication Engineering Systems, Guntur, India
Oz C, Leu MC (2011) American Sign Language word recognition with a sensory glove using artificial neural networks. Eng Appl Artif Intell 24(7):1204–1213. https://doi.org/10.1016/j.engappai.2011.06.015
Mehdi SA, Khan YN (2002) Sign language recognition using sensor glove. In: Proceedings of the 9th International Conference on Neural Information Processing (ICONIP '02), Singapore. IEEE. https://doi.org/10.1109/ICONIP.2002.1201884
Liang R-H (1998) A real-time continuous gesture recognition system for sign language. In: Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan. IEEE. https://doi.org/10.1109/AFGR.1998.671007
Agarwal A, Thakur MK (2013) Sign language recognition using Microsoft Kinect. Presented at the 2013 Sixth International Conference on Contemporary Computing (IC3), Noida, India, 8–10 Aug 2013
Zafrulla Z, Brashear H, Starner T, Hamilton H, Presti P (2011) American sign language recognition with the Kinect. In: Proceedings of the 13th International Conference on Multimodal Interfaces (ICMI '11), Alicante, Spain. https://doi.org/10.1145/2070481.2070532
Lang S, Block M, Rojas R (2012) Sign language recognition using Kinect. Presented at the International Conference on Artificial Intelligence and Soft Computing (ICAISC 2012)
Zahedi M, Manashty AR (2011) Robust sign language recognition system using ToF depth cameras. World Comput Sci Inf Technol J (WCSIT) 1(3):50–55 (arXiv:1105.0699)
Oprisescu S, Rasche C, Su B (2012) Automatic static hand gesture recognition using ToF cameras. In: Proceedings of the 20th European Signal Processing Conference (EUSIPCO), Bucharest, Romania, 27–31 Aug 2012
Potter LE, Araullo J, Carter C (2013) The Leap Motion controller: a view on sign language. In: Proceedings of the 25th Australian Computer-Human Interaction Conference: Augmentation, Application, Innovation, Collaboration, Adelaide, Australia. https://doi.org/10.1145/2541016.2541072
Chuan C-H, Regina E, Guardino C (2014) American sign language recognition using Leap Motion sensor. In: 2014 13th International Conference on Machine Learning and Applications, Detroit, MI, USA. IEEE. https://doi.org/10.1109/ICMLA.2014.110
Mohandes M, Aliyu S, Deriche M (2014) Arabic sign language recognition using the Leap Motion controller. Presented at the 2014 IEEE 23rd International Symposium on Industrial Electronics (ISIE), Istanbul, Turkey
Brashear H, Starner T, Lukowicz P, Junker H (2003) Using multiple sensors for mobile sign language recognition. In: Proceedings of the Seventh IEEE International Symposium on Wearable Computers, White Plains, NY, USA. https://doi.org/10.1109/ISWC.2003.1241392
Yang HD (2014) Sign language recognition with the Kinect sensor based on conditional random fields. Sensors (Basel) 15(1):135–147. https://doi.org/10.3390/s150100135
Gao WEN, Ma J, Wu J, Wang C (2000) Sign language recognition based on Hmm/Ann/Dp. Int J Pattern Recognit Artif Intell 14(05):587–602. https://doi.org/10.1142/s0218001400000386
Yewale SK, Bharne PK (2011) Hand gesture recognition using different algorithms based on artificial neural network. Presented at the 2011 International Conference on Emerging Trends in Networks and Computer Communications (ETNCC), Udaipur, India
Izzah A, Suciati N (2014) Translation of sign language using generic fourier descriptor and nearest neighbour. Int J Cybern Inf 3(1):31–41. https://doi.org/10.5121/ijci.2014.3104
Ansari ZA, Harit G (2016) Nearest neighbour classification of Indian sign language gestures using kinect camera. Sadhana 41(2):161–182. https://doi.org/10.1007/s12046-015-0405-3
Ye J, Yao H, Jiang F (2004) Based on HMM and SVM multilayer architecture classifier for Chinese sign language recognition with large vocabulary. Presented at the Third International Conference on Image and Graphics (ICIG'04), Hong Kong, China
Subashini TS, Nagarajan S (2013) Static hand gesture recognition for sign language alphabets using edge oriented histogram and multi class SVM. Int J Comput Appl 82(4):28–35. https://doi.org/10.5120/14106-2145
Kumar BPP, Manjunatha MB (2017) A hybrid gesture recognition method for American sign language. Indian J Sci Technol 10(1). https://doi.org/10.17485/ijst/2017/v10i1/109389
Koller O, Zargaran S, Schlüter R, Bowden R (2016) Deep Sign: hybrid CNN-HMM for continuous sign language recognition. In: The British Machine Vision Conference (BMVC), York. https://doi.org/10.5244/C.30.136
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, vol 1, Lake Tahoe, Nevada
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.1038/nature14539
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge
Cui R, Liu H, Zhang C (2019) A deep neural framework for continuous sign language recognition by iterative training. IEEE Trans Multimedia 21(7):1880–1891. https://doi.org/10.1109/tmm.2018.2889563
Taskiran M, Killioglu M, Kahraman N (2018) A real-time system for recognition of American sign language by using deep learning. In: 2018 41st International Conference on Telecommunications and Signal Processing (TSP), Athens, Greece. IEEE. https://doi.org/10.1109/TSP.2018.8441304
Tang A, Lu K, Wang Y, Huang J, Li H (2015) A real-time hand posture recognition system using deep neural networks. ACM Trans Intell Syst Technol 6(2):1–23. https://doi.org/10.1145/2735952
Oyedotun OK, Khashman A (2016) Deep learning in vision-based static hand gesture recognition. Neural Comput Appl 28(12):3941–3951. https://doi.org/10.1007/s00521-016-2294-8
Azar SG, Seyedarabi H (2020) Trajectory-based recognition of dynamic Persian sign language using hidden Markov model. Comput Speech Lang 61:101053. https://doi.org/10.1016/j.csl.2019.101053
Xing K et al (2018) Hand gesture recognition based on deep learning method. Presented at the 2018 IEEE Third International Conference on Data Science in Cyberspace (DSC), Guangzhou, China
Tubaiz N, Shanableh T, Assaleh K (2015) Glove-based continuous arabic sign language recognition in user-dependent Mode. IEEE Trans Human-Mach Syst 45(4):526–533. https://doi.org/10.1109/thms.2015.2406692
Dong Y (2018) An application of Deep Neural Networks to the in-flight parameter identification for detection and characterization of aircraft icing. Aerosp Sci Technol 77:34–49. https://doi.org/10.1016/j.ast.2018.02.026
Dong Y (2019) Implementing Deep Learning for comprehensive aircraft icing and actuator/sensor fault detection/identification. Eng Appl Artif Intell 83:28–44. https://doi.org/10.1016/j.engappai.2019.04.010
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117. https://doi.org/10.1016/j.neunet.2014.09.003
van Dyk DA, Meng X-L (2001) The art of data augmentation. J Comput Graph Stat 10(1):1–50. https://doi.org/10.1198/10618600152418584
Barbu T (2013) Variational image denoising approach with diffusion porous media flow. Abstr Appl Anal 2013:1–8. https://doi.org/10.1155/2013/856876
Chollet F "Keras." https://keras.io/ (accessed 2019-07-23)
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. Presented at the 3rd International Conference on Learning Representations (ICLR), San Diego
Brownlee J "Gentle introduction to the Adam optimization algorithm for deep learning." https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/ (accessed 2019-07-30)
"ML Cheatsheet." https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html (accessed 2019-02-08)
Mitchell M (1996) An introduction to genetic algorithms. MIT Press, Cambridge
Basiri S, Taheri A, Meghdari A, Alemi M (2021) Design and implementation of a robotic architecture for adaptive teaching: a case study on iranian sign language. J Intell Robot Syst 102(2):48. https://doi.org/10.1007/s10846-021-01413-2
Williams MD, Rana NP, Dwivedi YK (2015) The unified theory of acceptance and use of technology (UTAUT): a literature review. J Enterp Inf Manag 28(3):443–488
Acknowledgements
This research was supported by the “Iranian National Science Foundation (INSF)” (http://en.insf.org/). We also appreciate the “Dr. AliAkbar Siassi Memorial Grant Award” for their complementary support of the Social & Cognitive Robotics Laboratory.
Funding
Author Alireza Taheri has received research grants from the “Iranian National Science Foundation (INSF)” (Grant No. 98025100).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Salar Basiri and Alireza Taheri. The first draft of the manuscript was written by Salar Basiri, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors Salar Basiri, Ali Meghdari, Mehrdad Boroushaki, and Minoo Alemi declare that they have no conflict of interest.
Ethics approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. Ethical approval for the protocol of this study was provided by the Iran University of Medical Sciences (#IR.IUMS.REC.1395.95301469).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Basiri, S., Taheri, A., Meghdari, A.F. et al. Dynamic Iranian Sign Language Recognition Using an Optimized Deep Neural Network: An Implementation via a Robotic-Based Architecture. Int J of Soc Robotics 15, 599–619 (2023). https://doi.org/10.1007/s12369-021-00819-0