Configuring TTS Evaluation Method Based on Unit Cost Outlier Detection

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8082)


This paper presents a new method for analyzing the perceptual relevance of unit selection costs and/or their sub-components, and for automated tuning of the unit selection weights. The configuration options of the method are discussed in detail, and brief guidance is given on how to apply the method when evaluating a newly designed unit selection cost. The advantage of the method is that different unit selection system configurations and tunings can be evaluated automatically, without the need to conduct a listening test for each of them.
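The abstract does not spell out how outliers among unit costs are detected, so the following is only an illustrative sketch and not the authors' procedure. It assumes that each selected unit has a total cost formed as a weighted sum of its sub-costs, and that a unit counts as an outlier when its total cost exceeds the upper Tukey fence (Q3 + 1.5 × IQR). The function names, the fence rule, the weights, and the data are all hypothetical.

```python
from statistics import quantiles


def total_costs(sub_costs, weights):
    """Weighted sum of sub-costs for each selected unit (one row per unit)."""
    return [sum(w * c for w, c in zip(weights, row)) for row in sub_costs]


def outlier_indices(costs, k=1.5):
    """Indices of costs above the upper Tukey fence Q3 + k * IQR.

    The fence rule is an assumption made for this sketch; the paper may
    define outliers differently.
    """
    q1, _, q3 = quantiles(costs, n=4, method="inclusive")
    fence = q3 + k * (q3 - q1)
    return [i for i, c in enumerate(costs) if c > fence]


# Hypothetical data: six selected units, two sub-costs each.
sub_costs = [
    [0.2, 0.1],
    [0.3, 0.2],
    [0.1, 0.2],
    [0.2, 0.3],
    [2.5, 1.9],  # a unit with unusually high sub-costs
    [0.3, 0.1],
]
weights = [1.0, 0.5]  # hypothetical unit selection weights

costs = total_costs(sub_costs, weights)
flagged = outlier_indices(costs)
print(flagged)  # → [4]
```

Under these assumptions, re-running the same check with different weight vectors gives a listening-test-free way to compare how many units each configuration flags, which is the kind of automatic comparison the abstract describes.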


Keywords: TTS evaluation, unit selection costs, unit selection tuning



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Faculty of Applied Sciences, New Technologies for the Information Society, University of West Bohemia, Plzeň, Czech Republic