Part of the book series: SpringerBriefs in Electrical and Computer Engineering

Abstract

The concept of silent speech, when applied to human–computer interaction (HCI), describes a system that allows for speech communication between humans and machines in the absence of an audible acoustic signal. Such a system can be used as an HCI input modality in high-background-noise environments, such as living rooms, or as an aid for speech-impaired individuals. The audible acoustic signal is, in fact, just the end result of the complex process of speech production, which starts in the brain, triggers the relevant muscular activity, and results in movements of the articulators. It is this information that silent speech interfaces (SSIs) strive to harness, and, in this context, understanding the different stages of speech production is of major importance.

In this chapter, the reader will find a brief introduction to the historical context behind the rising interest in silent speech, followed by an overview of the different stages involved in speech production. Along the way, we establish a correspondence between the natural speech production process and the technologies behind existing SSI systems, which are discussed further in the following chapters. Additionally, we identify the overall challenges in the development of SSIs.

Notes

  1. For a more detailed historical context, the reader is referred to Denby et al. (2010).

References

  • Betts BJ, Binsted K, Jorgensen C (2006) Small vocabulary recognition using surface electromyography. Interact Comput 18:1242–1259. doi:10.1016/j.intcom.2006.08.012

  • Brumberg JS, Nieto-Castanon A, Kennedy PR, Guenther FH (2010) Brain-computer interfaces for speech communication. Speech Commun 52:367–379. doi:10.1016/j.specom.2010.01.001

  • Clegg DG (1953) The listening eye: a simple introduction to the art of lip-reading. Methuen, London

  • De Luca CJ (1979) Physiology and mathematics of myoelectric signals. IEEE Trans Biomed Eng 26:313–325

  • De Wachter M, Matton M, Demuynck K, Wambacq P, Cools R, Van Compernolle D (2007) Template-based continuous speech recognition. IEEE Trans Audio Speech Lang Process 15:1377–1390. doi:10.1109/TASL.2007.894524

  • Denby B, Schultz T, Honda K, Hueber T, Gilbert JM, Brumberg JS (2010) Silent speech interfaces. Speech Commun 52:270–287. doi:10.1016/j.specom.2009.08.002

  • Denby B, Stone M (2004) Speech synthesis from real time ultrasound images of the tongue. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP 2004), vol 1. doi:10.1109/ICASSP.2004.1326078

  • Fagan MJ, Ell SR, Gilbert JM, Sarrazin E, Chapman PM (2008) Development of a (silent) speech recognition system for patients following laryngectomy. Med Eng Phys 30:419–425. doi:10.1016/j.medengphy.2007.05.003

  • Fitzpatrick M (2002) Lip-reading cellphone silences loudmouths. New Sci 2002:3

  • Florescu VM, Crevier-Buchman L, Denby B, Hueber T, Colazo-Simon A, Pillot-Loiseau C, Roussel-Ragot P, Gendrot C, Quattrocchi S (2010) Silent vs vocalized articulation for a portable ultrasound-based silent speech interface. In: Proceedings of Interspeech 2010, pp 450–453

  • Fraiwan L, Lweesy K, Al-Nemrawi A, Addabass S, Saifan R (2011) Voiceless Arabic vowels recognition using facial EMG. Med Biol Eng Comput 49:811–818. doi:10.1007/s11517-011-0751-1

  • Freitas J, Ferreira A, Figueiredo M, Teixeira A, Dias MS (2014a) Enhancing multimodal silent speech interfaces with feature selection. In: 15th Annual conference of the International Speech Communication Association (Interspeech 2014), Singapore, pp 1169–1173

  • Freitas J, Teixeira A, Dias MS (2014b) Multimodal corpora for silent speech interaction. In: 9th Language resources and evaluation conference (LREC 2014), pp 1–5

  • Freitas J, Teixeira A, Dias MS (2012a) Towards a silent speech interface for Portuguese: surface electromyography and the nasality challenge. In: International conference on bio-inspired systems and signal processing (BIOSIGNALS 2012), pp 91–100

  • Freitas J, Teixeira A, Vaz F, Dias MS (2012b) Automatic speech recognition based on ultrasonic Doppler sensing for European Portuguese. In: Toledano DT, Ortega A, Teixeira A, Gonzalez-Rodriguez J, Hernandez-Gomez L, San-Segundo R, Ramos D (eds) Advances in speech and language technologies for Iberian languages. Communications in computer and information science. Springer, Berlin, pp 227–236. doi:10.1007/978-3-642-35292-8_24

  • Galatas G, Potamianos G, Makedon F (2012) Audio-visual speech recognition using depth information from the Kinect in noisy video conditions. In: Proceedings of the 5th international conference on pervasive technologies related to assistive environments (PETRA'12), pp 1–4. doi:10.1145/2413097.2413100

  • Gonzalez JA, Cheah LA, Gilbert JM, Bai J, Ell SR, Green PD, Moore RK (2016) A silent speech system based on permanent magnet articulography and direct synthesis. Comput Speech Lang. doi:10.1016/j.csl.2016.02.002

  • Guenther FH, Brumberg JS, Wright EJ, Nieto-Castanon A, Tourville JA, Panko M, Law R, Siebert SA, Bartels JL, Andreasen DS, Ehirim P, Mao H, Kennedy PR (2009) A wireless brain-machine interface for real-time speech synthesis. PLoS One 4(12):e8218. doi:10.1371/journal.pone.0008218

  • Hardcastle WJ (1976) Physiology of speech production: an introduction for speech scientists. Academic, New York

  • Hasegawa T, Ohtani K (1992) Oral image to voice converter: image input microphone. In: Singapore ICCS/ISITA'92 'Communications on the move'. IEEE, pp 617–620

  • Heistermann T, Janke M, Wand M, Schultz T (2014) Spatial artifact detection for multi-channel EMG-based speech recognition. In: International conference on bio-inspired systems and signal processing (BIOSIGNALS 2014), pp 189–196

  • Hofe R, Bai J, Cheah LA, Ell SR, Gilbert JM, Moore RK, Green PD (2013a) Performance of the MVOCA silent speech interface across multiple speakers. In: Proceedings of Interspeech 2013, pp 1140–1143

  • Hofe R, Ell SR, Fagan MJ, Gilbert JM, Green PD, Moore RK, Rybchenko SI (2013b) Small-vocabulary speech recognition using a silent speech interface based on magnetic sensing. Speech Commun 55:22–32. doi:10.1016/j.specom.2012.02.001

  • Holzrichter JF (2009) Characterizing silent and pseudo-silent speech using radar-like sensors. In: Proceedings of Interspeech 2009, pp 656–659

  • Hueber T, Bailly G, Denby B (2012) Continuous articulatory-to-acoustic mapping using phone-based trajectory HMM for a silent speech interface. In: Proceedings of Interspeech 2012, pp 723–726

  • Hueber T, Chollet G, Denby B, Dreyfus G, Stone M (2008) An ultrasound-based silent speech interface. J Acoust Soc Am. doi:10.1121/1.2936013

  • Jorgensen C, Dusan S (2010) Speech interfaces based upon surface electromyography. Speech Commun 52:354–366. doi:10.1016/j.specom.2009.11.003

  • Levelt WJM (1995) The ability to speak: from intentions to spoken words. Eur Rev. doi:10.1017/S1062798700001290

  • Maier-Hein L, Metze F, Schultz T, Waibel A (2005) Session independent non-audible speech recognition using surface electromyography. In: IEEE workshop on automatic speech recognition and understanding (ASRU 2005), pp 331–336

  • Manabe H (2003) Unvoiced speech recognition using EMG: mime speech recognition. In: CHI'03 extended abstracts on human factors in computing systems. ACM, pp 794–795. doi:10.1145/765891.765996

  • McGurk H, MacDonald J (1976) Hearing lips and seeing voices. Nature 264:746–748

  • Morse MS, Gopalan YN, Wright M (1991) Speech recognition using myoelectric signals with neural networks. In: Proceedings of the annual international conference of the IEEE Engineering in Medicine and Biology Society. IEEE, pp 1877–1878

  • Morse MS, O'Brien EM (1986) Research summary of a scheme to ascertain the availability of speech information in the myoelectric signals of neck and head muscles using surface electrodes. Comput Biol Med 16:399–410

  • Nakajima Y, Kashioka H, Shikano K, Campbell N (2003a) Non-audible murmur recognition input interface using stethoscopic microphone attached to the skin. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP 2003), vol 5. doi:10.1109/ICASSP.2003.1200069

  • Nakajima Y, Kashioka H, Shikano K, Campbell N (2003b) Non-audible murmur recognition. In: Proceedings of Eurospeech 2003, pp 2601–2604

  • Nakamura H (1988) Method of recognizing speech using a lip image. US Patent 4,769,845

  • Novet J (2015) Google says its speech recognition technology now has only an 8% word error rate. VentureBeat. http://venturebeat.com/2015/05/28/google-says-its-speech-recognition-technology-now-has-only-an-8-word-error-rate/. Accessed 1 Jan 2016

  • Patil SA, Hansen JHL (2010) The physiological microphone (PMIC): a competitive alternative for speaker assessment in stress detection and speaker verification. Speech Commun 52:327–340. doi:10.1016/j.specom.2009.11.006

  • Petajan E (1984) Automatic lipreading to enhance speech recognition. PhD thesis, University of Illinois, Champaign

  • Porbadnigk A, Wester M, Calliess J, Schultz T (2009) EEG-based speech recognition: impact of temporal effects. In: International conference on bio-inspired systems and signal processing (BIOSIGNALS 2009)

  • Quatieri TF, Brady K, Messing D, Campbell JP, Campbell WM, Brandstein MS, Weinstein CJ, Tardelli JD, Gatewood PD (2006) Exploiting nonacoustic sensors for speech encoding. IEEE Trans Audio Speech Lang Process 14. doi:10.1109/TSA.2005.855838

  • Seikel JA, King DW, Drumright DG (2009) Anatomy and physiology for speech, language, and hearing, 4th edn. Delmar Learning, Clifton Park

  • Srinivasan S, Raj B, Ezzat T (2010) Ultrasonic sensing for robust speech recognition. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP 2010). doi:10.1109/ICASSP.2010.5495039

  • Sugie N, Tsunoda K (1985) A speech prosthesis employing a speech synthesizer: vowel discrimination from perioral muscle activities and vowel production. IEEE Trans Biomed Eng 32:485–490

  • The UCLA Phonetics Laboratory (2002) Dissection of the speech production mechanism

  • Toda T (2010) Voice conversion for enhancing various types of body-conducted speech detected with non-audible murmur microphone. J Acoust Soc Am 127:1815. doi:10.1121/1.3384185

  • Toda T, Nakamura K, Nagai T, Kaino T, Nakajima Y, Shikano K (2009) Technologies for processing body-conducted speech detected with non-audible murmur microphone. In: Proceedings of Interspeech 2009

  • Toth AR, Kalgaonkar K, Raj B, Ezzat T (2010) Synthesizing speech from Doppler signals. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP 2010), pp 4638–4641

  • Tran V-A, Bailly G, Lœvenbruck H, Toda T (2009) Multimodal HMM-based NAM-to-speech conversion. In: Proceedings of Interspeech 2009, pp 656–659

  • Wand M, Himmelsbach A, Heistermann T, Janke M, Schultz T (2013a) Artifact removal algorithm for an EMG-based silent speech interface. In: Annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2013), pp 5750–5753. doi:10.1109/EMBC.2013.6610857

  • Wand M, Janke M, Schultz T (2011) Investigations on speaking mode discrepancies in EMG-based speech recognition. In: Proceedings of Interspeech 2011, pp 601–604

  • Wand M, Koutník J, Schmidhuber J (2016) Lipreading with long short-term memory. arXiv preprint arXiv:1601.08188

  • Wand M, Schulte C, Janke M, Schultz T (2013b) Array-based electromyographic silent speech interface. In: International conference on bio-inspired systems and signal processing (BIOSIGNALS 2013)

  • Wand M, Schultz T (2011a) Analysis of phone confusion in EMG-based speech recognition. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP 2011), pp 757–760. doi:10.1109/ICASSP.2011.5946514

  • Wand M, Schultz T (2011b) Session-independent EMG-based speech recognition. In: International conference on bio-inspired systems and signal processing (BIOSIGNALS 2011), pp 295–300

  • Zhu B, Hazen TJ, Glass JR (2007) Multimodal speech recognition with ultrasonic sensors. In: Proceedings of Interspeech 2007, pp 662–665

Copyright information

© 2017 The Author(s)

About this chapter

Cite this chapter

Freitas, J., Teixeira, A., Dias, M.S., Silva, S. (2017). Introduction. In: An Introduction to Silent Speech Interfaces. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-40174-4_1

  • DOI: https://doi.org/10.1007/978-3-319-40174-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40173-7

  • Online ISBN: 978-3-319-40174-4
