Abstract
Hidden Markov model (HMM)-based synthesized speech is intelligible but not natural, especially when training data are limited. The goal of this study is to improve naturalness while preserving acceptable intelligibility by decomposing the naturalness and intelligibility of synthesized speech using a novel asymmetric bilinear model involving non-negative matrix factorization (NMF). Subjective evaluations carried out on Vietnamese data confirmed that the proposed method achieves higher synthesis quality than other methods under limited data conditions. The F0 contour is important for naturalness and intelligibility, especially in Vietnamese, and the proposed method can modify an over-smoothed F0 contour without destroying tonal information.
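For orientation, the general form of an asymmetric bilinear model with a non-negativity constraint can be written as below. This is a generic sketch and not necessarily the exact formulation used in the paper: the symbols (y, A, b, Y, B, and the dimensions K, J, S, C) are illustrative, and treating one factor as the "naturalness" side and the other as the "intelligibility" side is an assumption, since the abstract does not say how the two are assigned. The multiplicative updates shown are the standard ones for a squared-error NMF objective and assume the stacked observation matrix Y is non-negative.

```latex
% Asymmetric bilinear model: an observation of content c rendered in style s
% is a style-specific linear map applied to a content vector.
\[
  \mathbf{y}^{sc} = \mathbf{A}^{s}\,\mathbf{b}^{c},
  \qquad \mathbf{y}^{sc} \in \mathbb{R}^{K},\;
  \mathbf{A}^{s} \in \mathbb{R}^{K \times J},\;
  \mathbf{b}^{c} \in \mathbb{R}^{J}.
\]

% Stacking all S styles vertically and all C contents horizontally
% gives a single matrix factorization.
\[
  \mathbf{Y} \approx \mathbf{A}\,\mathbf{B},
  \qquad \mathbf{Y} \in \mathbb{R}^{SK \times C},\;
  \mathbf{A} \in \mathbb{R}^{SK \times J},\;
  \mathbf{B} \in \mathbb{R}^{J \times C}.
\]

% NMF variant (assumed objective): both factors constrained to be non-negative.
\[
  \min_{\mathbf{A} \ge 0,\; \mathbf{B} \ge 0}
  \;\bigl\lVert \mathbf{Y} - \mathbf{A}\mathbf{B} \bigr\rVert_{F}^{2}
\]

% Standard multiplicative updates for this objective
% (\odot and \oslash denote element-wise product and division).
\[
  \mathbf{A} \leftarrow \mathbf{A} \odot
    \bigl(\mathbf{Y}\mathbf{B}^{\top}\bigr) \oslash
    \bigl(\mathbf{A}\mathbf{B}\mathbf{B}^{\top}\bigr),
  \qquad
  \mathbf{B} \leftarrow \mathbf{B} \odot
    \bigl(\mathbf{A}^{\top}\mathbf{Y}\bigr) \oslash
    \bigl(\mathbf{A}^{\top}\mathbf{A}\mathbf{B}\bigr).
\]
```

Under this reading, improving one attribute while keeping the other would amount to replacing one factor and holding the other fixed, although the abstract itself does not spell out that step.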
Acknowledgement
This study was supported by the Grant-in-Aid for Scientific Research (A) (No. 25240026), the SECOM Science and Technology Foundation, and the JSPS A3 Foresight Program.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Dinh, AT., Phan, TS., Akagi, M. (2017). Quality Improvement of Vietnamese HMM-Based Speech Synthesis System Based on Decomposition of Naturalness and Intelligibility Using Non-negative Matrix Factorization. In: Akagi, M., Nguyen, TT., Vu, DT., Phung, TN., Huynh, VN. (eds) Advances in Information and Communication Technology. ICTA 2016. Advances in Intelligent Systems and Computing, vol 538. Springer, Cham. https://doi.org/10.1007/978-3-319-49073-1_53
DOI: https://doi.org/10.1007/978-3-319-49073-1_53
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49072-4
Online ISBN: 978-3-319-49073-1
eBook Packages: Engineering, Engineering (R0)