Objective Comparison of Four GMM-Based Methods for PMA-to-Speech Conversion

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10077)


In silent speech interfaces, a mapping is established between biosignals captured by sensors and the acoustic characteristics of speech. Recent work has shown the feasibility of a silent speech interface based on permanent magnet articulography (PMA). This paper studies the performance of four mapping methods based on Gaussian mixture models (GMMs), typical of the voice conversion field, when applied to PMA-to-spectrum conversion. The results show the superiority of methods based on maximum likelihood parameter generation (MLPG), especially when the parameters of the mapping function are trained by minimizing the generation error. Informal listening tests reveal that the resulting speech is moderately intelligible for the database under study.
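The abstract separates GMM mappings that use maximum likelihood parameter generation (MLPG) from those that do not. As an illustration only, the sketch below shows two standard building blocks from the GMM voice conversion literature: the frame-wise conditional-mean (MMSE) mapping of a joint-density GMM, and MLPG over static and delta features. It is not the paper's implementation and does not necessarily correspond to any of its four methods. The function names, the delta window 0.5·(c[t+1] − c[t−1]), zero padding at the sequence edges and diagonal variances are all assumptions, and minimum generation error training is not shown.

```python
import numpy as np


def gmm_conditional_mean(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Frame-wise MMSE mapping E[y | x] under a joint-density GMM (hypothetical helper).

    Shapes: x (dx,), weights (K,), mu_x (K, dx), mu_y (K, dy),
    cov_xx (K, dx, dx), cov_yx (K, dy, dx). Returns an array of shape (dy,).
    """
    K = len(weights)
    log_post = np.empty(K)
    cond_means = np.empty((K, mu_y.shape[1]))
    for k in range(K):
        diff = x - mu_x[k]
        inv_xx = np.linalg.inv(cov_xx[k])
        _, logdet = np.linalg.slogdet(cov_xx[k])
        # Unnormalised log p(k | x); the shared (2*pi)^(dx/2) factor cancels.
        log_post[k] = np.log(weights[k]) - 0.5 * (logdet + diff @ inv_xx @ diff)
        # Per-component linear regression from x to y.
        cond_means[k] = mu_y[k] + cov_yx[k] @ inv_xx @ diff
    post = np.exp(log_post - log_post.max())  # numerically stable normalisation
    post /= post.sum()
    return post @ cond_means


def mlpg(means, variances):
    """ML parameter generation for one feature dimension (hypothetical helper).

    means, variances: arrays of shape (T, 2) holding per-frame [static, delta]
    means and diagonal variances. Assumes delta[t] = 0.5 * (c[t+1] - c[t-1]),
    with out-of-range frames treated as zero. Returns the static trajectory (T,).
    """
    T = means.shape[0]
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0               # static row
        if t > 0:
            W[2 * t + 1, t - 1] = -0.5  # delta row, left neighbour
        if t < T - 1:
            W[2 * t + 1, t + 1] = 0.5   # delta row, right neighbour
    precision = 1.0 / variances.reshape(-1)
    mu = means.reshape(-1)  # interleaved [s0, d0, s1, d1, ...], matching W's rows
    # Solve the MLPG normal equations (W^T D^-1 W) c = W^T D^-1 mu.
    A = W.T @ (precision[:, None] * W)
    b = W.T @ (precision * mu)
    return np.linalg.solve(A, b)
```

In a PMA-to-spectrum setting, x would be a frame of sensor features and y the spectral parameters; MLPG would then be run per spectral dimension on static and delta statistics predicted for every frame, which lets the delta constraints smooth the trajectory instead of mapping each frame independently. The dense solve here costs O(T^3); practical implementations exploit the banded structure of W^T D^-1 W.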





This work has been partially funded by the Spanish Ministry of Economy and Competitiveness (RESTORE project, TEC2015-67163-C2-1-R MINECO/FEDER, UE) and the Basque Government (ELKAROLA, KK-2015/00098). We would like to thank the University of Hull and the University of Sheffield, especially Dr. Jose A. Gonzalez, for permission to use the PMA data in this work.

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Aholab, University of the Basque Country (UPV/EHU), Bilbao, Spain
  2. IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
