Abstract
Brain and muscular activity drive changes in the shape and position of articulators such as the tongue and lips and, as a consequence, the vocal tract assumes different configurations. Most of these changes, of both the articulators and the tract, are internal and not easy to measure, but in some cases, such as the lips or the tongue tip, the changes are visible or have visible effects. Even without the production of audible speech, these different articulator configurations provide valuable information that can be exploited by silent speech interfaces (SSIs). In this chapter, the reader finds an overview of the technologies used to assess articulatory and visual aspects of speech production, and of how researchers have exploited their capabilities for the development of silent speech interfaces.
Copyright information
© 2017 The Author(s)
Cite this chapter
Freitas, J., Teixeira, A., Dias, M.S., Silva, S. (2017). SSI Modalities II: Articulation and Its Consequences. In: An Introduction to Silent Speech Interfaces. SpringerBriefs in Electrical and Computer Engineering(). Springer, Cham. https://doi.org/10.1007/978-3-319-40174-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40173-7
Online ISBN: 978-3-319-40174-4
eBook Packages: Engineering