Abstract
In this work, we present a silent speech system that generates audible speech from the captured movement of the speech articulators. Our goal is to help laryngectomy patients, i.e. patients who have lost the ability to speak after surgical removal of the larynx, most frequently due to cancer, to recover their voice. Our system uses a magnetic sensing technique known as Permanent Magnet Articulography (PMA) to capture the movement of the lips and tongue: small magnets are attached to the articulators, and the resulting changes in the magnetic field are monitored by sensors placed close to the mouth. The captured sensor data are then transformed into a sequence of speech parameter vectors, from which a time-domain speech signal is finally synthesised. The key component of our system is a parametric transformation representing the PMA-to-speech mapping. Here, this transformation takes the form of a statistical model (more specifically, a mixture of factor analysers) whose parameters are learned from simultaneous recordings of PMA and speech signals acquired before laryngectomy. To evaluate the performance of our system on voice reconstruction, we recorded two PMA-and-speech databases of different phonetic complexity for several non-impaired subjects. Results show that our system synthesises speech that is intelligible and sounds like the subject's original voice. However, more work still needs to be done to achieve consistent synthesis for phonetically rich vocabularies.
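The core idea of the mapping described above can be illustrated with a small sketch. This is not the authors' implementation: the paper uses a mixture of factor analysers, while the sketch below uses a closely related full-covariance Gaussian mixture fitted on stacked PMA-and-speech vectors, and then forms the minimum mean-square-error estimate of the speech parameters given a PMA frame as a responsibility-weighted sum of per-component conditional Gaussian means. All data, dimensions, and the function name `pma_to_speech` are made up for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

# Synthetic stand-ins for simultaneous recordings: x = PMA sensor
# frames, y = speech parameter vectors (e.g. spectral features).
rng = np.random.default_rng(0)
n, dx, dy = 500, 9, 25
x = rng.normal(size=(n, dx))
y = x @ rng.normal(size=(dx, dy)) + 0.1 * rng.normal(size=(n, dy))

# Learn a joint density model over stacked [x; y] vectors, standing in
# for the mixture-of-factor-analysers training stage.
gmm = GaussianMixture(n_components=4, covariance_type="full",
                      random_state=0).fit(np.hstack([x, y]))

def pma_to_speech(xt):
    """MMSE estimate of the speech parameters for one PMA frame xt."""
    means, covs, w = gmm.means_, gmm.covariances_, gmm.weights_
    # Responsibility of each component for xt, using the x-marginal.
    px = np.array([w[k] * multivariate_normal.pdf(
        xt, means[k, :dx], covs[k][:dx, :dx]) for k in range(len(w))])
    px /= px.sum()
    # Weighted sum of conditional means E[y | x, component k].
    yhat = np.zeros(dy)
    for k in range(len(w)):
        A = covs[k][dx:, :dx] @ np.linalg.inv(covs[k][:dx, :dx])
        yhat += px[k] * (means[k, dx:] + A @ (xt - means[k, :dx]))
    return yhat

print(pma_to_speech(x[0]).shape)
```

In the full system, the resulting sequence of speech parameter vectors would then be passed to a vocoder to synthesise the time-domain waveform.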
Notes
1. Several speech samples are available in the Demos section of http://www.hull.ac.uk/speech/disarm.
Acknowledgements
This is a summary of independent research funded by the National Institute for Health Research (NIHR) Invention for Innovation Programme. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Gonzalez, J.A. et al. (2017). Voice Restoration After Laryngectomy Based on Magnetic Sensing of Articulator Movement and Statistical Articulation-to-Speech Conversion. In: Fred, A., Gamboa, H. (eds) Biomedical Engineering Systems and Technologies. BIOSTEC 2016. Communications in Computer and Information Science, vol 690. Springer, Cham. https://doi.org/10.1007/978-3-319-54717-6_17
Print ISBN: 978-3-319-54716-9
Online ISBN: 978-3-319-54717-6