A Comparison of Two Approaches to Bilingual HMM-Based Speech Synthesis

  • Conference paper
Text, Speech, and Dialogue (TSD 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Abstract

We compare two approaches to building bilingual speech synthesis systems from cross-lingual data recorded by different speakers, such that the system produces speech in both languages with the same speaker identity. The first approach treats the data from both languages as monolingual by labeling all of it with a manually joined phoneme set. A speaker-independent voice is trained on the joined data and adapted to the target speaker using CMLLR adaptation.
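For orientation, the constrained transform behind CMLLR can be written out as below. This is the textbook formulation, not equations taken from the paper; the symbols A, b, μ, Σ, and o are assumed notation for the transform matrix, bias, Gaussian mean, covariance, and observation vector.

```latex
% Constrained MLLR: a single affine transform (A, b) is shared by the mean
% and the covariance of every Gaussian tied to the same regression class.
\begin{align}
  \hat{\boldsymbol{\mu}} &= \mathbf{A}\boldsymbol{\mu} - \mathbf{b}, &
  \hat{\boldsymbol{\Sigma}} &= \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^{\top} \\
  % Equivalent feature-space view: transform the observation instead of the model.
  \mathcal{N}\!\left(\mathbf{o};\, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\Sigma}}\right)
    &= \left|\det \mathbf{A}^{-1}\right|\,
       \mathcal{N}\!\left(\mathbf{A}^{-1}\mathbf{o} + \mathbf{A}^{-1}\mathbf{b};\,
       \boldsymbol{\mu}, \boldsymbol{\Sigma}\right)
\end{align}
```

Because the same matrix acts on both mean and covariance, the adaptation can be applied either to the model parameters or to the features, which is what makes it convenient to carry a transform from one voice to another.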

In the second approach, a speaker-independent voice is trained for each language separately. A state mapping between these voices is derived automatically by minimizing the Kullback–Leibler divergence between state distributions. The mapping is then used to apply the adaptation transforms estimated in one language to the speaker-independent voice of the other language.
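A minimal sketch of such a minimum-divergence state mapping follows. It assumes each state is a single diagonal-covariance Gaussian and uses the symmetric form of the Kullback–Leibler divergence; both choices, and the names `kl_diag`, `symmetric_kl`, and `map_states`, are illustrative assumptions rather than details confirmed by the abstract.

```python
import math
from typing import List, Sequence, Tuple

# A state distribution as (means, variances); diagonal covariance is an assumption.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def kl_diag(p: Gaussian, q: Gaussian) -> float:
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """Symmetric divergence: KL(p || q) + KL(q || p)."""
    return kl_diag(p, q) + kl_diag(q, p)


def map_states(source: Sequence[Gaussian], target: Sequence[Gaussian]) -> List[int]:
    """For each source state, return the index of the closest target state."""
    return [
        min(range(len(target)), key=lambda j: symmetric_kl(state, target[j]))
        for state in source
    ]


if __name__ == "__main__":
    # Toy two-dimensional states, invented purely for illustration.
    voice_a = [([0.0, 1.0], [1.0, 1.0]), ([5.0, 5.0], [2.0, 2.0])]
    voice_b = [
        ([4.8, 5.1], [2.0, 1.5]),
        ([0.1, 0.9], [1.0, 1.2]),
        ([-3.0, -3.0], [1.0, 1.0]),
    ]
    print(map_states(voice_a, voice_b))
```

With a mapping like this in hand, a transform estimated for a state in one voice can be looked up and applied to its mapped counterpart in the other voice. The exhaustive search is quadratic in the number of states, which is acceptable for a sketch but may need pruning for full-size models.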

We evaluate speech quality on the MOS scale and the similarity of the synthesized speech to the target speaker using DMOS, taking the Croatian-Slovene language pair as an example.
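As a rough illustration of how such scores are summarized, the sketch below averages listener ratings on a five-point scale and attaches a normal-approximation 95% confidence interval. The function name `mean_opinion_score` and the ratings are hypothetical; the abstract does not report individual ratings or say how intervals were computed. A DMOS figure can be summarized the same way from ratings given on a degradation or similarity scale.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_opinion_score(ratings: Sequence[int]) -> Tuple[float, float]:
    """Return (mean, 95% CI half-width) for ratings on a 1-5 scale.

    The interval uses a normal approximation (z = 1.96), which is an
    assumption and only a rough guide for small listener panels.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mean = float(statistics.mean(ratings))
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


if __name__ == "__main__":
    # Hypothetical ratings from five listeners for one system.
    mos, ci = mean_opinion_score([4, 5, 3, 4, 4])
    print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```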

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pobar, M., Justin, T., Žibert, J., Mihelič, F., Ipšić, I. (2013). A Comparison of Two Approaches to Bilingual HMM-Based Speech Synthesis. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_7

  • DOI: https://doi.org/10.1007/978-3-642-40585-3_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40584-6

  • Online ISBN: 978-3-642-40585-3

  • eBook Packages: Computer Science, Computer Science (R0)
