Abstract
In this work, we present a silent speech system that generates audible speech from the captured movement of the speech articulators. Our goal is to help laryngectomy patients, i.e. patients who have lost the ability to speak after surgical removal of the larynx, most frequently due to cancer, to recover their voice. Our system uses a magnetic sensing technique known as Permanent Magnet Articulography (PMA) to capture the movement of the lips and tongue: small magnets are attached to the articulators, and the resulting changes in the magnetic field are monitored by sensors placed close to the mouth. The captured sensor data are then transformed into a sequence of speech parameter vectors, from which a time-domain speech signal is finally synthesised. The key component of our system is a parametric transformation representing the PMA-to-speech mapping. Here, this transformation takes the form of a statistical model (more specifically, a mixture of factor analysers) whose parameters are learned from simultaneous recordings of PMA and speech signals acquired before laryngectomy. To evaluate the performance of our system on voice reconstruction, we recorded two PMA-and-speech databases of different phonetic complexity for several non-impaired subjects. Results show that our system synthesises speech that is intelligible and sounds like the subject's original voice. However, more work still needs to be done to achieve consistent synthesis for phonetically rich vocabularies.
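The core idea of the mapping described above can be illustrated with a small sketch. This is not the authors' implementation: the paper uses a mixture of factor analysers, while the sketch below uses a closely related full-covariance Gaussian mixture fitted on stacked PMA-and-speech vectors, and then forms the minimum mean-square-error estimate of the speech parameters given a PMA frame as a responsibility-weighted sum of per-component conditional Gaussian means. All data, dimensions, and the function name `pma_to_speech` are made up for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

# Synthetic stand-ins for simultaneous recordings: x = PMA sensor
# frames, y = speech parameter vectors (e.g. spectral features).
rng = np.random.default_rng(0)
n, dx, dy = 500, 9, 25
x = rng.normal(size=(n, dx))
y = x @ rng.normal(size=(dx, dy)) + 0.1 * rng.normal(size=(n, dy))

# Learn a joint density model over stacked [x; y] vectors, standing in
# for the mixture-of-factor-analysers training stage.
gmm = GaussianMixture(n_components=4, covariance_type="full",
                      random_state=0).fit(np.hstack([x, y]))

def pma_to_speech(xt):
    """MMSE estimate of the speech parameters for one PMA frame xt."""
    means, covs, w = gmm.means_, gmm.covariances_, gmm.weights_
    # Responsibility of each component for xt, using the x-marginal.
    px = np.array([w[k] * multivariate_normal.pdf(
        xt, means[k, :dx], covs[k][:dx, :dx]) for k in range(len(w))])
    px /= px.sum()
    # Weighted sum of conditional means E[y | x, component k].
    yhat = np.zeros(dy)
    for k in range(len(w)):
        A = covs[k][dx:, :dx] @ np.linalg.inv(covs[k][:dx, :dx])
        yhat += px[k] * (means[k, dx:] + A @ (xt - means[k, :dx]))
    return yhat

print(pma_to_speech(x[0]).shape)
```

In the full system, the resulting sequence of speech parameter vectors would then be passed to a vocoder to synthesise the time-domain waveform.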
Notes
1. Several speech samples are available in the Demos section of http://www.hull.ac.uk/speech/disarm.
Acknowledgements
This is a summary of independent research funded by the National Institute for Health Research (NIHR) Invention for Innovation Programme. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Gonzalez, J.A. et al. (2017). Voice Restoration After Laryngectomy Based on Magnetic Sensing of Articulator Movement and Statistical Articulation-to-Speech Conversion. In: Fred, A., Gamboa, H. (eds) Biomedical Engineering Systems and Technologies. BIOSTEC 2016. Communications in Computer and Information Science, vol 690. Springer, Cham. https://doi.org/10.1007/978-3-319-54717-6_17
Print ISBN: 978-3-319-54716-9
Online ISBN: 978-3-319-54717-6