Abstract
In our previous work, we introduced a zero-concatenation-cost (ZCC) chain based framework for unit-selection speech synthesis. This framework proved to be very fast, reducing the computational load of a unit-selection system by up to hundreds of times. Since the ZCC chain based algorithm principally prefers to select longer segments of speech, an increased number of audible artifacts was expected to occur at the concatenation points of longer ZCC chains. Indeed, listening tests revealed a number of artifacts in the synthetic speech; however, the artifacts occurred to a similar extent in synthetic speech produced by both the ZCC chain based and the standard Viterbi search algorithms. In this paper, we focus on the sources of the artifacts and propose improvements to the synthetic speech quality within the ZCC algorithm. The quality and computational demands of the improved ZCC algorithm are compared to those of the unit-selection algorithm based on the standard Viterbi search.
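To make the notion of a zero-concatenation-cost chain concrete, the following is a minimal Python sketch of a generic Viterbi unit selection in which two units that were adjacent in the same source recording join at zero cost. It is an illustration under stated assumptions, not the algorithm evaluated in the paper: the `Unit` representation, the flat `join_penalty`, and the function names `concat_cost`, `viterbi_select`, and `zcc_chains` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: a unit is identified by (source utterance id, position in it).
Unit = Tuple[int, int]


def concat_cost(a: Unit, b: Unit, join_penalty: float = 1.0) -> float:
    """Zero if b directly follows a in the same recording, else a flat penalty.

    A real system would use an acoustic join cost; the flat penalty is a
    placeholder chosen only for illustration.
    """
    return 0.0 if a[0] == b[0] and b[1] == a[1] + 1 else join_penalty


def viterbi_select(candidates: List[List[Unit]],
                   target_cost: Dict[Unit, float]) -> List[Unit]:
    """Return the cheapest unit sequence, one unit per target position."""
    # Each layer maps unit -> (best cumulative cost, best predecessor).
    layers: List[Dict[Unit, Tuple[float, Optional[Unit]]]] = [
        {u: (target_cost.get(u, 0.0), None) for u in candidates[0]}
    ]
    for t in range(1, len(candidates)):
        layer: Dict[Unit, Tuple[float, Optional[Unit]]] = {}
        for u in candidates[t]:
            prev, cost = min(
                ((p, c + concat_cost(p, u)) for p, (c, _) in layers[-1].items()),
                key=lambda pc: pc[1],
            )
            layer[u] = (cost + target_cost.get(u, 0.0), prev)
        layers.append(layer)

    # Backtrack from the cheapest unit in the final layer.
    u = min(layers[-1], key=lambda k: layers[-1][k][0])
    path = [u]
    for t in range(len(candidates) - 1, 0, -1):
        u = layers[t][u][1]
        path.append(u)
    return path[::-1]


def zcc_chains(path: List[Unit]) -> List[List[Unit]]:
    """Split a selected path into maximal runs joined at zero cost."""
    chains = [[path[0]]]
    for a, b in zip(path, path[1:]):
        if concat_cost(a, b) == 0.0:
            chains[-1].append(b)
        else:
            chains.append([b])
    return chains


if __name__ == "__main__":
    cands = [[(0, 0), (1, 5)], [(0, 1), (2, 3)], [(0, 2), (1, 7)]]
    best = viterbi_select(cands, target_cost={})
    print(best, zcc_chains(best))
```

In this toy setting the only audible joins would be at the boundaries between the chains returned by `zcc_chains`, which is where the abstract expects artifacts to concentrate when long chains are preferred.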
References
Beutnagel, M., Mohri, M., Riley, M.: Rapid unit selection from a large speech corpus for concatenative speech synthesis. In: Proc. EUROSPEECH, Budapest, Hungary, pp. 607–610 (1999)
Blouin, C., Bagshaw, P.C., Rosec, O.: A method of unit pre-selection for speech synthesis based on acoustic clustering and decision trees. In: Proc. ICASSP, Hong Kong, vol. 1, pp. 692–695 (2003)
Čepko, J., Talafová, R., Vrabec, J.: Indexing join costs for faster unit selection synthesis. In: Proc. Internat. Conf. Systems, Signals Image Processing (IWSSIP), Bratislava, Slovak Republic, pp. 503–506 (2008)
Conkie, A., Beutnagel, M., Syrdal, A.K., Brown, P.: Preselection of candidate units in a unit selection-based text-to-speech synthesis system. In: Proc. ICSLP, Beijing, China, vol. 3, pp. 314–317 (2000)
Conkie, A., Syrdal, A.K.: Using F0 to constrain the unit selection Viterbi network. In: Proc. ICASSP, Prague, Czech Republic, pp. 5376–5379 (2011)
Hamza, W., Donovan, R.: Data-driven segment preselection in the IBM trainable speech synthesis system. In: Proc. INTERSPEECH, Denver, USA, pp. 2609–2612 (2002)
Hunt, A.J., Black, A.W.: Unit selection in a concatenative speech synthesis system using a large speech database. In: Proc. ICASSP, Atlanta, USA, pp. 373–376 (1996)
Kala, J., Matoušek, J.: Very fast unit selection using Viterbi search with zero-concatenation-cost chains. In: Proc. ICASSP, Florence, Italy (2014)
Legát, M., Matoušek, J., Tihelka, D.: On the detection of pitch marks using a robust multi-phase algorithm. Speech Commun. 53(4), 552–566 (2011)
Ling, Z.H., Hu, Y., Shuang, Z.W., Wang, R.H.: Decision tree based unit pre-selection in Mandarin Chinese synthesis. In: Proc. ISCSLP, Taipei, Taiwan (2002)
Matoušek, J., Romportl, J.: On building phonetically and prosodically rich speech corpus for text-to-speech synthesis. In: Proc. 2nd IASTED Internat. Conf. on Computational Intelligence, San Francisco, USA, pp. 442–447 (2006)
Nishizawa, N., Kawai, H.: Unit database pruning based on the cost degradation criterion for concatenative speech synthesis. In: Proc. ICASSP, Las Vegas, USA, pp. 3969–3972 (2008)
Riley, M.: Tree-based modeling for speech synthesis. In: Bailly, G., Benoit, C., Sawallis, T. (eds.) Talking Machines: Theories, Models and Designs, pp. 265–273. Elsevier, Amsterdam (1992)
Romportl, J., Kala, J.: Prosody modelling in Czech text-to-speech synthesis. In: Proceedings of the 6th ISCA Workshop on Speech Synthesis, pp. 200–205. Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn (2007)
Romportl, J., Matoušek, J., Tihelka, D.: Advanced prosody modelling. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2004. LNCS (LNAI), vol. 3206, pp. 441–447. Springer, Heidelberg (2004)
Sakai, S., Kawahara, T., Nakamura, S.: Admissible stopping in Viterbi beam search for unit selection in concatenative speech synthesis. In: Proc. ICASSP, Las Vegas, USA, pp. 4613–4616 (2008)
Taylor, P., Caley, R., Black, A., King, S.: Edinburgh speech tools library: System documentation (1999), http://www.cstr.ed.ac.uk/projects/speech_tools/manual-1.2.0/
Tihelka, D., Kala, J., Matoušek, J.: Enhancements of Viterbi search for fast unit selection synthesis. In: Proc. INTERSPEECH, Makuhari, Japan, pp. 174–177 (2010)
Tihelka, D., Matoušek, J.: Unit selection and its relation to symbolic prosody: a new approach. In: Proc. INTERSPEECH, Pittsburgh, USA, pp. 2042–2045 (2006)
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kala, J., Matoušek, J. (2014). Quality Improvements of Zero-Concatenation-Cost Chain Based Unit Selection. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_47
DOI: https://doi.org/10.1007/978-3-319-11581-8_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science, Computer Science (R0)