Abstract
The main aim of this study is to describe the process of creating a multimodal speech synthesis system for the Polish language. The system consists of two modules: a unit-selection speech synthesizer and a 3D avatar. Unit-selection speech synthesis achieves naturalness by carefully joining suitable acoustic units that together cover the whole utterance. The main prerequisite for a unit-selection system is a speech database. The speech corpus was constructed so that its phonetic representation serves as the basis for the design of the cost function. This cost function is often decomposed into two costs: a target cost (how closely candidate units in the inventory match the specification of the target phone sequence) and a join cost (how well neighboring units can be joined). The new Polish voice was implemented in the Festival metasystem. To obtain higher-quality synthetic speech, the cost function was optimized with a genetic algorithm. Additionally, a prototype 3D talking head was built, containing 9 visemes corresponding to groups of Polish phonemes.
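The decomposition into target and join costs can be illustrated with a minimal sketch of a unit-selection search. This is not the chapter's implementation: the function name `select_units`, the weights `w_target` and `w_join`, and the cost callables are all hypothetical, and the dynamic-programming (Viterbi) search is one common way to minimize such a cost, assumed here for illustration. The abstract does not say which parameters the genetic algorithm tuned; weights like these are merely one plausible candidate.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")  # target specification (e.g. a phone label); hypothetical type
U = TypeVar("U")  # candidate unit from the speech database; hypothetical type


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    join_cost: Callable[[U, U], float],
    w_target: float = 1.0,  # assumed weight, not taken from the chapter
    w_join: float = 1.0,    # assumed weight, not taken from the chapter
) -> tuple[list[U], float]:
    """Return the cheapest unit sequence and its total cost (Viterbi search)."""
    if not targets or len(targets) != len(candidates):
        raise ValueError("need one candidate list per target")
    if any(len(c) == 0 for c in candidates):
        raise ValueError("every target needs at least one candidate unit")

    # best[j]: cost of the cheapest path ending in candidate j at this position
    best = [w_target * target_cost(targets[0], u) for u in candidates[0]]
    back: list[list[int]] = []  # back[i - 1][j]: best predecessor of candidate j

    for i in range(1, len(targets)):
        new_best: list[float] = []
        pointers: list[int] = []
        for u in candidates[i]:
            # Cost of reaching u from each candidate at the previous position.
            costs = [
                best[k] + w_join * join_cost(prev, u)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + w_target * target_cost(targets[i], u))
            pointers.append(k_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path back from the last position.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)], total
```

In a toy setting, units could be `(phone, pitch)` pairs, with a target cost of 0 when the phone label matches and a join cost proportional to the pitch difference at the boundary. A real system would use richer linguistic and acoustic features for both costs.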
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Szklanny, K. (2014). Multimodal Speech Synthesis for Polish Language. In: Gruca, D., Czachórski, T., Kozielski, S. (eds) Man-Machine Interactions 3. Advances in Intelligent Systems and Computing, vol 242. Springer, Cham. https://doi.org/10.1007/978-3-319-02309-0_35
DOI: https://doi.org/10.1007/978-3-319-02309-0_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02308-3
Online ISBN: 978-3-319-02309-0
eBook Packages: Engineering, Engineering (R0)