Using Mandarin Training Corpus to Realize a Mandarin-Tibetan Cross-Lingual Emotional Speech Synthesis

Wu, Peiwen; Yang, Hongwu; Gan, Zhenye

doi:10.1007/978-981-10-8111-8_11


Part of the book series: Communications in Computer and Information Science (CCIS, volume 807)

Included in the following conference series:

National Conference on Man-Machine Speech Communication


Abstract

This paper presents a hidden Markov model (HMM)-based method for Mandarin-Tibetan cross-lingual emotional speech synthesis that uses an emotional Mandarin speech corpus with speaker adaptation. We first train a set of average acoustic models by speaker adaptive training on a one-speaker neutral Tibetan corpus and a multi-speaker neutral Mandarin corpus. We then obtain a set of speaker-dependent acoustic models of the target emotion, which are used to synthesize emotional Tibetan or Mandarin speech, by speaker adaptation with the target emotional Mandarin corpus. Subjective evaluations and objective tests show that the method can synthesize both emotional Mandarin speech and emotional Tibetan speech with high naturalness and emotional similarity. The method can therefore be used to realize emotional speech synthesis from an existing emotional training corpus for languages that lack emotional speech resources.
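The adaptation step can be illustrated with a deliberately small sketch. The abstract does not say which adaptation algorithm is used, so the example below swaps in a textbook maximum a posteriori (MAP) update of a single one-dimensional Gaussian mean as a stand-in. The function name `map_adapt_mean`, the prior weight `tau`, and all numbers are assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence


def map_adapt_mean(prior_mean: float,
                   adaptation_frames: Sequence[float],
                   tau: float = 10.0) -> float:
    """MAP update of one Gaussian mean.

    Assumes every adaptation frame is assigned to this Gaussian
    (occupancy of 1 per frame). `tau` controls how strongly the
    prior (average-voice) mean is trusted.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    n = len(adaptation_frames)
    # Weighted combination of the prior mean and the adaptation data.
    return (tau * prior_mean + sum(adaptation_frames)) / (tau + n)


# Hypothetical values: a neutral average-voice mean for one feature
# and a few frames of the same feature from emotional speech.
average_voice_mean = 5.0
emotional_frames = [5.6, 5.8, 5.7, 5.9]

adapted_mean = map_adapt_mean(average_voice_mean, emotional_frames, tau=4.0)
print(round(adapted_mean, 3))
```

With little adaptation data the result stays close to the average-voice mean, and with more data it moves toward the mean of the emotional frames. This is the general reason adaptation from an average model can work with a small target corpus.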



Acknowledgments

The research leading to these results was partly funded by the National Natural Science Foundation of China (Grant Nos. 11664036 and 61263036) and the Natural Science Foundation of Gansu (Grant No. 1506RJYA126).

Author information

Authors and Affiliations

College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou, 730070, China
Peiwen Wu, Hongwu Yang & Zhenye Gan


Corresponding author

Correspondence to Hongwu Yang.

Editor information

Editors and Affiliations

Institute of Automation, Chinese Academy of Sciences, Beijing, China
Jianhua Tao
Computer Science and Technology, Tsinghua University, Beijing, China
Thomas Fang Zheng
Beijing University of Technology, Beijing, China
Changchun Bao
Tsinghua University, Beijing, China
Dong Wang
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Ya Li

Cite this paper

Wu, P., Yang, H., Gan, Z. (2018). Using Mandarin Training Corpus to Realize a Mandarin-Tibetan Cross-Lingual Emotional Speech Synthesis. In: Tao, J., Zheng, T., Bao, C., Wang, D., Li, Y. (eds) Man-Machine Speech Communication. NCMMSC 2017. Communications in Computer and Information Science, vol 807. Springer, Singapore. https://doi.org/10.1007/978-981-10-8111-8_11


DOI: https://doi.org/10.1007/978-981-10-8111-8_11
Published: 03 February 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8110-1
Online ISBN: 978-981-10-8111-8
eBook Packages: Computer Science (R0)
