Multimodal Speech Synthesis for Polish Language

  • Conference paper
Man-Machine Interactions 3

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 242)

Abstract

This study describes the creation of a multimodal speech synthesis system for the Polish language. The system consists of two modules: a unit-selection speech synthesizer and a 3D avatar. Unit-selection synthesis achieves naturalness by carefully joining suitable acoustic units that together cover the whole utterance. Its main prerequisite is a speech database. The speech corpus was constructed so that its phonetic representation serves as the basis for the design of the cost function. This cost function is commonly decomposed into two costs: a target cost (how closely candidate units in the inventory match the specification of the target phone sequence) and a join cost (how well neighboring units can be joined). The new Polish voice was implemented in the Festival metasystem. To obtain higher-quality synthetic speech, the cost function was optimized with a genetic algorithm. Additionally, a prototype 3D talking head was built, containing 9 visemes corresponding to groups of Polish phonemes.
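
The decomposition into target cost and join cost can be illustrated with a minimal sketch. It is not the cost function used in the paper and does not call Festival's API: the `Unit` features (mean pitch and duration), the distance formulas, and the weights `w_target` and `w_join` are all assumptions chosen for illustration. The search is a standard Viterbi pass over a lattice of candidate units, which is one common way to minimize the summed cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phone: str       # phone label
    pitch: float     # toy feature (assumption): mean F0 in Hz
    duration: float  # toy feature (assumption): duration in ms


def target_cost(target: Unit, cand: Unit) -> float:
    """How far a candidate is from the target specification (toy distance)."""
    return (abs(target.pitch - cand.pitch)
            + abs(target.duration - cand.duration)) / 100.0


def join_cost(left: Unit, right: Unit) -> float:
    """How badly two neighboring units join (pitch mismatch only, as a toy)."""
    return abs(left.pitch - right.pitch) / 100.0


def select_units(targets, inventory, w_target=1.0, w_join=1.0):
    """Viterbi search for the unit sequence with the lowest weighted total cost."""
    # Candidates per target position: inventory units carrying the same phone.
    lattice = [[u for u in inventory if u.phone == t.phone] for t in targets]
    if any(not cands for cands in lattice):
        raise ValueError("inventory lacks a unit for some target phone")

    # best[i][j] = (cost of the best path ending in lattice[i][j], backpointer)
    best = [[(w_target * target_cost(targets[0], c), -1) for c in lattice[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in lattice[i]:
            prev_cost, prev_idx = min(
                (best[i - 1][k][0] + w_join * join_cost(p, c), k)
                for k, p in enumerate(lattice[i - 1])
            )
            row.append((prev_cost + w_target * target_cost(targets[i], c),
                        prev_idx))
        best.append(row)

    # Trace the cheapest path back from the last position.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(lattice[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

In this sketch, raising `w_join` relative to `w_target` makes the search prefer units that join smoothly even if they match the target specification less closely. Weights of this kind are the sort of parameters that an optimization procedure such as a genetic algorithm could tune, although the paper's actual parameterization is not given in the abstract.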

Author information

Corresponding author

Correspondence to Krzysztof Szklanny.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Szklanny, K. (2014). Multimodal Speech Synthesis for Polish Language. In: Gruca, D., Czachórski, T., Kozielski, S. (eds) Man-Machine Interactions 3. Advances in Intelligent Systems and Computing, vol 242. Springer, Cham. https://doi.org/10.1007/978-3-319-02309-0_35

  • DOI: https://doi.org/10.1007/978-3-319-02309-0_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02308-3

  • Online ISBN: 978-3-319-02309-0

  • eBook Packages: Engineering, Engineering (R0)
