Evaluation of a Segmental Durations Model for TTS

  • João Paulo Teixeira
  • Diamantino Freitas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2721)

Abstract

In this paper we present a condensed description of a European Portuguese segmental duration model for TTS purposes and concentrate on its evaluation. The model is based on artificial neural networks. Its quality was evaluated by comparison with read speech. The standard deviation reached on the test set is 19.5 ms and the linear correlation coefficient is 0.84. In the perceptual evaluation the model scored 4.12, against 4.30 for natural human read speech, on a scale of 5.
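As an illustration of the two objective figures quoted above, the following minimal sketch computes a standard deviation and a linear (Pearson) correlation coefficient between predicted and measured segment durations. It assumes that the standard deviation refers to the prediction error (predicted minus measured duration), which the abstract does not state explicitly; the function name `duration_metrics` and the sample durations are hypothetical and are not taken from the paper.

```python
import math


def duration_metrics(predicted_ms, measured_ms):
    """Return (error standard deviation in ms, Pearson r) for paired durations."""
    n = len(predicted_ms)
    if n != len(measured_ms) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")

    # Assumption: the reported standard deviation is that of the prediction error.
    errors = [p - m for p, m in zip(predicted_ms, measured_ms)]
    mean_err = sum(errors) / n
    sd = math.sqrt(sum((e - mean_err) ** 2 for e in errors) / n)

    # Pearson linear correlation between predicted and measured durations.
    mean_p = sum(predicted_ms) / n
    mean_m = sum(measured_ms) / n
    cov = sum((p - mean_p) * (m - mean_m)
              for p, m in zip(predicted_ms, measured_ms))
    var_p = sum((p - mean_p) ** 2 for p in predicted_ms)
    var_m = sum((m - mean_m) ** 2 for m in measured_ms)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant sequences")
    r = cov / math.sqrt(var_p * var_m)
    return sd, r


# Hypothetical durations in milliseconds, for illustration only.
measured = [60.0, 85.0, 110.0, 45.0, 130.0]
predicted = [70.0, 80.0, 100.0, 55.0, 120.0]
sd, r = duration_metrics(predicted, measured)
print(f"error standard deviation: {sd:.1f} ms, correlation: {r:.2f}")
```

The population form of the standard deviation (dividing by n) is used here; a sample estimate (dividing by n - 1) would differ only slightly on a large test set.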

Keywords

Artificial Neural Network, Speech Rate, Speech Synthesis, Natural Speech, Target Duration
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • João Paulo Teixeira (1)
  • Diamantino Freitas (1)
  1. Polytechnic Institute of Bragança, Faculty of Engineering of University of Porto, Portugal
