Multimodal Speech Synthesis for Polish Language

Szklanny, Krzysztof

doi:10.1007/978-3-319-02309-0_35

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 242)

Abstract

The main aim of this study is to describe the process of creating a multimodal speech synthesis system for the Polish language. The system consists of two modules: a unit-selection speech synthesizer and a 3D avatar. The naturalness of unit-selection speech synthesis is achieved by carefully joining suitable acoustic units that together cover the whole utterance. The main prerequisite for a unit-selection system is a speech database. The speech corpus was constructed so that its phonetic representation serves as the basis for the design of the cost function. This cost function is commonly decomposed into two costs: a target cost (how closely candidate units in the inventory match the specification of the target phone sequence) and a join cost (how well neighboring units can be joined). The new Polish voice was implemented in the Festival metasystem. To obtain higher-quality synthetic speech, the cost function was optimized with a genetic algorithm. Additionally, a prototype 3D talking head was built, containing 9 visemes corresponding to groups of Polish phonemes.
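
The split into a target cost and a join cost can be illustrated with a minimal sketch of unit selection as a dynamic-programming (Viterbi) search. This is not the chapter's implementation and not Festival's API: the `Unit` features (`pitch`, `duration`), the distance formulas, and the weights `w_target` and `w_join` are all assumptions chosen for illustration. Weights of this kind are the sort of parameters an optimizer such as a genetic algorithm could tune; the abstract does not specify the actual features or weights used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phone: str       # phone label
    pitch: float     # mean F0 in Hz (hypothetical feature)
    duration: float  # length in seconds (hypothetical feature)


def target_cost(spec: Unit, unit: Unit) -> float:
    # How far a candidate unit is from the requested specification.
    return abs(spec.pitch - unit.pitch) / 100.0 + abs(spec.duration - unit.duration)


def join_cost(left: Unit, right: Unit) -> float:
    # Penalty for concatenating two units; here only the pitch jump.
    return abs(left.pitch - right.pitch) / 100.0


def select_units(targets, inventory, w_target=1.0, w_join=1.0):
    """Return (unit sequence, total cost) minimising the weighted sum of costs."""
    candidates = [[u for u in inventory if u.phone == t.phone] for t in targets]
    if any(not c for c in candidates):
        raise ValueError("inventory lacks a candidate for some target phone")

    # best[i][j] = (cost of cheapest path ending in candidates[i][j], backpointer)
    best = [[(w_target * target_cost(targets[0], u), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + w_join * join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + w_target * target_cost(targets[i], u), back))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


# Toy inventory: two candidates per phone.
a1, a2 = Unit("a", 100.0, 0.1), Unit("a", 120.0, 0.1)
b1, b2 = Unit("b", 200.0, 0.1), Unit("b", 110.0, 0.1)
targets = [Unit("a", 100.0, 0.1), Unit("b", 100.0, 0.1)]
path, total = select_units(targets, [a1, a2, b1, b2])
```

In this toy case the search picks `a1` followed by `b2`, because `b1` matches the target poorly and would also introduce a large pitch jump at the join.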

Author information

Authors and Affiliations

Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008, Warsaw, Poland
Krzysztof Szklanny

Corresponding author

Correspondence to Krzysztof Szklanny.

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Dr. Aleksandra Gruca
Polish Academy of Sciences and Silesian University of Technology, Gliwice, Poland
Tadeusz Czachórski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Stanisław Kozielski

Cite this paper

Szklanny, K. (2014). Multimodal Speech Synthesis for Polish Language. In: Gruca, A., Czachórski, T., Kozielski, S. (eds) Man-Machine Interactions 3. Advances in Intelligent Systems and Computing, vol 242. Springer, Cham. https://doi.org/10.1007/978-3-319-02309-0_35

DOI: https://doi.org/10.1007/978-3-319-02309-0_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02308-3
Online ISBN: 978-3-319-02309-0
eBook Packages: Engineering, Engineering (R0)
