SSI Modalities II: Articulation and Its Consequences

Chapter in An Introduction to Silent Speech Interfaces, part of the book series SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH).

Abstract

Brain and muscular activity drives changes in the shape and position of articulators such as the tongue and lips and, as a consequence, the vocal tract assumes different configurations. Most of these changes, both of the articulators and of the vocal tract, occur internally and are not easy to measure, but some, such as those of the lips or the tongue tip, are visible or have visible effects. Even when no speech sound is produced, these articulator configurations carry valuable information that can be exploited by silent speech interfaces (SSIs). This chapter gives an overview of the technologies used to assess the articulatory and visual aspects of speech production and of how researchers have exploited their capabilities to develop silent speech interfaces.
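
As a simplified illustration of how articulator configurations can drive recognition without any acoustic signal, consider the sketch below. It is not code from the chapter: it assumes each vocabulary word is represented by the trajectory of a single visible measurement, here lip aperture, and recognizes a silently articulated word by dynamic time warping against stored templates. The trajectories are synthetic stand-ins for what a camera, ultrasound probe, or articulograph would actually deliver.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D articulatory trajectories."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(observed, templates):
    """Label of the stored template whose trajectory is closest to the observation."""
    return min(templates, key=lambda word: dtw_distance(observed, templates[word]))

# Synthetic lip-aperture templates, one per vocabulary word (hypothetical data).
t = np.linspace(0.0, 1.0, 50)
templates = {
    "pa": np.sin(np.pi * t),                # one opening-closing gesture
    "papa": np.abs(np.sin(2 * np.pi * t)),  # two gestures
    "rest": np.zeros_like(t),               # lips static
}

# A noisy, time-stretched silent production of "papa", as a sensor might see it.
rng = np.random.default_rng(0)
t2 = np.linspace(0.0, 1.0, 70)
observed = np.abs(np.sin(2 * np.pi * t2)) + rng.normal(0.0, 0.05, t2.size)

print(recognize(observed, templates))  # expected output: papa
```

Template matching of this kind only scales to small vocabularies; the work surveyed in the chapter moves to HMM- and neural-network-based recognizers once richer articulatory feature streams are available.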

Copyright information

© 2017 The Author(s)

About this chapter

Cite this chapter

Freitas, J., Teixeira, A., Dias, M.S., Silva, S. (2017). SSI Modalities II: Articulation and Its Consequences. In: An Introduction to Silent Speech Interfaces. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-40174-4_3

  • DOI: https://doi.org/10.1007/978-3-319-40174-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40173-7

  • Online ISBN: 978-3-319-40174-4
