Current State of Czech Text-to-Speech System ARTIC

  • Jindřich Matoušek
  • Daniel Tihelka
  • Jan Romportl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4188)


This paper surveys the current state of ARTIC, a modern Czech concatenative corpus-based text-to-speech system. All stages of the system design are described, including the building of the acoustic unit inventory, text processing, and speech production. Two versions of the system are presented: a single unit instance system with moderate output speech quality, suitable for low-resource devices, and a multiple unit instance system with a dynamic unit instance selection scheme, yielding high-quality output speech. Both versions use automatically designed acoustic unit inventories. To ensure the desired prosodic characteristics of the output speech, the prosody generation issues specific to each system version are also discussed. Although the system was primarily designed for the synthesis of Czech speech, ARTIC can now speak three languages: Czech (with both female and male voices available), Slovak, and German.


Speech Synthesis, Prosodic Characteristic, Synthetic Speech, Speech Corpus, Unit Selection





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jindřich Matoušek (1)
  • Daniel Tihelka (1)
  • Jan Romportl (1)
  1. Department of Cybernetics, University of West Bohemia, Plzeň, Czech Republic
