Abstract
Mexican Sign Language (MSL) is the primary form of communication for the deaf community in Mexico. MSL has a grammatical structure different from that of Spanish; furthermore, facial expression plays a determining role in complementing context-based meaning. This makes it difficult for a hearing person without prior knowledge of the language to understand what is being communicated, representing an important communication barrier for deaf people. To address this, we present the first architecture to consider facial features as indicators of grammatical tense in developing a real-time interpreter from MSL to written Spanish. Our model uses the open-source MediaPipe library to extract landmarks from the face, body pose, and hands. Three 2D convolutional neural networks encode each modality individually and extract patterns; the networks converge into a multilayer perceptron for classification. Finally, a Hidden Markov Model morphosyntactically predicts the most probable sequence of words based on a preloaded knowledge base. From the experiments carried out, a precision of 94.9% with \(\sigma = 0.07\) was obtained for the recognition of 75 isolated words, and 94.1% with \(\sigma = 0.09\) for the interpretation of 20 sentences in MSL in a medical context. Since our approach is based on camera input, and since adequate generalization is achieved even with few samples, it would be feasible to scale the architecture to other sign languages and offer efficient communication to millions of people with hearing disabilities.
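The abstract's final stage — a Hidden Markov Model that predicts the most probable word sequence — is typically decoded with the Viterbi algorithm. The following is a minimal, self-contained sketch of that decoding step; the states, transition/emission probabilities, and the two-word medical-context example are illustrative assumptions, not the authors' actual knowledge base.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state (word) sequence for the observations."""
    # V[t][s] = (best probability of any path ending in state s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

# Toy example: map two recognized signs to a Spanish word sequence.
states = ("DOCTOR", "DOLOR")
start_p = {"DOCTOR": 0.6, "DOLOR": 0.4}
trans_p = {"DOCTOR": {"DOCTOR": 0.3, "DOLOR": 0.7},
           "DOLOR":  {"DOCTOR": 0.6, "DOLOR": 0.4}}
emit_p = {"DOCTOR": {"sign_a": 0.8, "sign_b": 0.2},
          "DOLOR":  {"sign_a": 0.1, "sign_b": 0.9}}

print(viterbi(["sign_a", "sign_b"], states, start_p, trans_p, emit_p))
# → ['DOCTOR', 'DOLOR']
```

In the paper's pipeline, the emission side would come from the MLP classifier's per-sign outputs and the transition side from the preloaded morphosyntactic knowledge base.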
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Ramírez Sánchez, J.E., Rodríguez, A.A., Mendoza, M.G. (2021). Real-Time Mexican Sign Language Interpretation Using CNN and HMM. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Computational Intelligence. MICAI 2021. Lecture Notes in Computer Science(), vol 13067. Springer, Cham. https://doi.org/10.1007/978-3-030-89817-5_4
Print ISBN: 978-3-030-89816-8
Online ISBN: 978-3-030-89817-5