Experiments on Reducing Footprint of Unit Selection TTS System

  • Zdeněk Hanzlíček
  • Jindřich Matoušek
  • Daniel Tihelka
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8082)


The quality of speech produced by modern TTS systems that use the unit selection approach is very high. However, their resource demands are enormous: the storage requirements are directly proportional to the size of the speech unit inventory from which units are selected during synthesis. This paper presents analysis and reduction experiments performed on two large speech corpora employed by a unit selection TTS system for the Czech language. A procedure for excluding utterances from the default speech corpus, based on usage statistics of individual speech units, was proposed. Excluding whole utterances was preferred over excluding individual speech units in order to preserve the fundamental feature of the unit selection method: selection of the longest possible sequences of contiguous speech units. Experiments were performed for several reduction levels. The resulting synthetic speech was evaluated using a proposed statistic based on the density of concatenation points, and its quality was also assessed in listening tests. All reduced versions of the TTS system were rated as similar to or slightly worse than the baseline system.
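The abstract does not spell out either the exclusion procedure or the concatenation-point statistic, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes that (a) "usage statistics" means how many selected units come from each corpus utterance when a set of test sentences is synthesized with the full corpus, (b) utterances are ranked by that count and the least used are dropped in a single pass until a target fraction remains, and (c) concatenation-point density is the fraction of unit boundaries in a synthesized sentence where consecutive units are not contiguous in the same source utterance. The names `SynthesizedUnit`, `reduce_corpus`, `concatenation_density`, and `keep_fraction` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesizedUnit:
    """Hypothetical record of one unit chosen by the unit selection search."""
    utterance_id: str  # corpus utterance the unit was taken from
    position: int      # index of the unit within that utterance


def usage_per_utterance(selections):
    """Count how many selected units came from each corpus utterance.

    `selections` is a list of synthesized sentences, each a list of units.
    """
    counts = Counter()
    for sentence in selections:
        for unit in sentence:
            counts[unit.utterance_id] += 1
    return counts


def reduce_corpus(utterance_ids, selections, keep_fraction):
    """Keep the most frequently used whole utterances (assumed single pass).

    Whole utterances are kept or dropped, never individual units, so long
    contiguous unit sequences inside kept utterances stay available.
    """
    counts = usage_per_utterance(selections)  # unseen utterances count as 0
    # Sort by descending usage; ties broken by utterance id for determinism.
    ranked = sorted(utterance_ids, key=lambda u: (-counts[u], u))
    n_keep = round(len(utterance_ids) * keep_fraction)
    return set(ranked[:n_keep])


def concatenation_density(sentence):
    """Fraction of unit boundaries that are concatenation points (assumed).

    A boundary is a concatenation point unless the next unit directly
    follows the previous one in the same source utterance.
    """
    if len(sentence) < 2:
        return 0.0
    joins = sum(
        1
        for a, b in zip(sentence, sentence[1:])
        if not (a.utterance_id == b.utterance_id and b.position == a.position + 1)
    )
    return joins / (len(sentence) - 1)


if __name__ == "__main__":
    sel = [
        [SynthesizedUnit("u1", 0), SynthesizedUnit("u1", 1), SynthesizedUnit("u2", 5)],
        [SynthesizedUnit("u2", 6), SynthesizedUnit("u3", 0)],
    ]
    # Usage counts: u1=2, u2=2, u3=1, u4=0, so half the corpus keeps u1 and u2.
    print(sorted(reduce_corpus(["u1", "u2", "u3", "u4"], sel, keep_fraction=0.5)))
    # One of the two boundaries in the first sentence is a join.
    print(concatenation_density(sel[0]))
```

A lower concatenation density means the synthesizer could reuse longer contiguous stretches of recorded speech; comparing it before and after reduction gives a cheap objective check alongside listening tests. A practical procedure might instead remove utterances iteratively and resynthesize after each step, since removing one utterance changes which units are selected elsewhere.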


Keywords: speech synthesis, TTS, unit selection, reducing footprint





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Zdeněk Hanzlíček, Jindřich Matoušek, Daniel Tihelka: Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Plzeň, Czech Republic
