Abstract
Speech recognition technology is a promising hands-free interfacing modality for virtual reality (VR) applications. However, it has several drawbacks, such as limited usability in noisy environments or public places and limited accessibility for those who cannot produce loud and clear speech. These limitations may be overcome by employing silent speech recognition (SSR) technology that utilizes facial electromyograms (fEMGs) in a VR environment. In conventional SSR systems, however, fEMG electrodes are attached around the user's lips and neck, which creates new practical issues: an additional wearable system is required besides the VR headset, attaching the fEMG electrodes is a complex and time-consuming procedure, and the electrodes cause discomfort and restrict the user's facial muscle movements. To solve these problems, we propose an SSR system that uses fEMGs measured by a few electrodes attached around the eyes of the user, which can easily be incorporated into available VR headsets. To enhance the accuracy of classifying fEMG signals recorded from a limited set of locations relatively far from the phonatory organs, a deep neural network-based classification method was developed using similar fEMG data previously collected from other individuals and then transformed by dynamic positional warping. In the experiments, the proposed SSR system classified six different fEMG patterns generated by six silently spoken words with an accuracy of 92.53%. To further demonstrate that our SSR system can be used as a hands-free control interface in practical VR applications, an online SSR system was implemented.
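The dynamic positional warping (DPW) mentioned in the abstract is a variant of dynamic time warping (DTW), which nonlinearly aligns two sequences of different lengths before comparing them. The following is a minimal, illustrative sketch of the underlying alignment idea only, not the authors' implementation: the function names are hypothetical, classic DTW is shown in place of DPW, and real fEMG processing would operate on multichannel feature sequences rather than toy lists.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    Builds the standard (n+1) x (m+1) cumulative-cost matrix, where each
    cell adds the local cost |a[i]-b[j]| to the cheapest of the three
    admissible predecessor cells (match, insertion, deletion).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]


def classify(signal, templates):
    """Nearest-template classification: return the label whose template
    has the smallest warped distance to the input signal."""
    return min(templates, key=lambda label: dtw_distance(signal, templates[label]))


# Toy usage: a stretched version of the "up" template still matches it,
# because warping absorbs the difference in timing.
templates = {"up": [0, 1, 2, 1, 0], "down": [2, 1, 0, 1, 2]}
print(classify([0, 0, 1, 2, 2, 1, 0], templates))  # -> "up"
```

A nearest-template classifier like this is only a baseline; in the proposed system, the warping is used to transform fEMG data collected from other individuals, and the classification itself is performed by a deep neural network.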
Availability of data and material
Not applicable.
Code availability
Not applicable.
Acknowledgements
This work was supported in part by the Samsung Science and Technology Foundation (SRFC-TB1703-05, facial electromyogram-based facial expression recognition for interactive VR applications), in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2019R1A2C2086593), and in part by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01373, Artificial Intelligence Graduate School Program (Hanyang University)).
Contributions
HC conducted the overall data analyses and wrote the major part of the paper. WC provided the dynamic positional warping algorithm and insight for the data analysis. CI provided important insight for the design of the paper and revised the manuscript. All listed authors made considerable contributions to this paper and approved the submitted version.
Ethics declarations
Conflict of interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Cha, HS., Chang, WD. & Im, CH. Deep-learning-based real-time silent speech recognition using facial electromyogram recorded around eyes for hands-free interfacing in a virtual reality environment. Virtual Reality 26, 1047–1057 (2022). https://doi.org/10.1007/s10055-021-00616-0