Abstract
Speech recognition technology is a promising hands-free interfacing modality for virtual reality (VR) applications. However, it has several drawbacks, such as limited usability in noisy environments or public places and limited accessibility for those who cannot produce loud and clear speech. These limitations may be overcome by employing silent speech recognition (SSR) technology that utilizes facial electromyograms (fEMGs) in a VR environment. In conventional SSR systems, however, fEMG electrodes are attached around the user's lips and neck, which creates new practical issues: an additional wearable system is required besides the VR headset, attaching the fEMG electrodes is a complex and time-consuming procedure, and the electrodes cause discomfort and restrict the user's facial muscle movements. To solve these problems, we propose an SSR system that uses fEMGs measured by a few electrodes attached around the eyes of the user, which can easily be incorporated into available VR headsets. To enhance the accuracy of classifying fEMG signals recorded from a limited set of locations relatively far from the phonatory organs, a deep neural network-based classification method was developed using similar fEMG data previously collected from other individuals and then transformed by dynamic positional warping. In the experiments, the proposed SSR system classified six different fEMG patterns generated by six silently spoken words with an accuracy of 92.53%. To further demonstrate that our SSR system can be used as a hands-free control interface in practical VR applications, an online SSR system was implemented.
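The dynamic positional warping (DPW) mentioned in the abstract is a variant of dynamic time warping (DTW), which nonlinearly aligns two sequences of different lengths before comparing them. The following is a minimal, illustrative sketch of the underlying alignment idea only, not the authors' implementation: the function names are hypothetical, classic DTW is shown in place of DPW, and real fEMG processing would operate on multichannel feature sequences rather than toy lists.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    Builds the standard (n+1) x (m+1) cumulative-cost matrix, where each
    cell adds the local cost |a[i]-b[j]| to the cheapest of the three
    admissible predecessor cells (match, insertion, deletion).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]


def classify(signal, templates):
    """Nearest-template classification: return the label whose template
    has the smallest warped distance to the input signal."""
    return min(templates, key=lambda label: dtw_distance(signal, templates[label]))


# Toy usage: a stretched version of the "up" template still matches it,
# because warping absorbs the difference in timing.
templates = {"up": [0, 1, 2, 1, 0], "down": [2, 1, 0, 1, 2]}
print(classify([0, 0, 1, 2, 2, 1, 0], templates))  # -> "up"
```

A nearest-template classifier like this is only a baseline; in the proposed system, the warping is used to transform fEMG data collected from other individuals, and the classification itself is performed by a deep neural network.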
Availability of data and material
Not applicable.
Code availability
Not applicable.
Acknowledgements
This work was supported in part by the Samsung Science and Technology Foundation (SRFC-TB1703-05, facial electromyogram-based facial expression recognition for interactive VR applications), in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2019R1A2C2086593), and in part by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01373, Artificial Intelligence Graduate School Program (Hanyang University)).
Contributions
HC conducted the overall data analyses and wrote the major part of the paper. WC provided the dynamic positional warping algorithm and insight for the data analysis. CI provided important insight for the design of the paper and revised the manuscript. All listed authors made considerable contributions to this paper and approved the submitted version.
Ethics declarations
Conflict of interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Cha, HS., Chang, WD. & Im, CH. Deep-learning-based real-time silent speech recognition using facial electromyogram recorded around eyes for hands-free interfacing in a virtual reality environment. Virtual Reality 26, 1047–1057 (2022). https://doi.org/10.1007/s10055-021-00616-0