Surgery of Speech Synthesis Models to Overcome the Scarcity of Training Data

Part of the Lecture Notes in Computer Science book series (LNAI, volume 10077)

Abstract

In previous work, we developed an HMM-based TTS system for a Basque dialect spoken in southern France. We observed that French words, which are frequent in daily conversation, were not pronounced properly by the TTS system because the training corpus contained very few instances of some French phones. This paper reports our attempt to improve the pronunciation of these phones without redesigning the corpus or recording the speaker again. Inspired by techniques used to adapt synthetic voices using dysarthric speech, we transplant phones from a different French voice into our Basque voice, and we show the slight improvements found after surgery.

Keywords

  • Model Surgery
  • Synthetic Speech
  • French Word
  • Synthetic Voice
  • Nasal Vowel

Acknowledgements

This work has been partially funded by the Spanish Ministry of Economy and Competitiveness (RESTORE project, TEC2015-67163-C2-1-R MINECO/FEDER, UE) and the Basque Government (ELKAROLA project, KK-2015/00098). The research stay of A. Pierard at UPV/EHU was funded by the Erasmus program. The French database used in this study was generously provided by Acapela Group. We thank B. Picart for his help.

Corresponding author

Correspondence to I. Hernaez.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Pierard, A., Erro, D., Hernaez, I., Navas, E., Dutoit, T. (2016). Surgery of Speech Synthesis Models to Overcome the Scarcity of Training Data. In: , et al. Advances in Speech and Language Technologies for Iberian Languages. IberSPEECH 2016. Lecture Notes in Computer Science, vol 10077. Springer, Cham. https://doi.org/10.1007/978-3-319-49169-1_8

  • DOI: https://doi.org/10.1007/978-3-319-49169-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49168-4

  • Online ISBN: 978-3-319-49169-1

  • eBook Packages: Computer Science, Computer Science (R0)