Segmental Duration Modelling in Turkish

  • Özlem Öztürk
  • Tolga Çiloğlu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4188)

Abstract

The naturalness of synthetic speech depends heavily on appropriate modelling of prosody. Three prosodic components are usually modelled: segmental duration, pitch contour and intensity. In this study, we present our work on modelling segmental duration in Turkish using machine-learning algorithms, especially Classification and Regression Trees (CART). The models predict phone durations from attributes extracted from a speech corpus, such as the identities of the current, preceding and following phones, stress, part-of-speech, word length in syllables, and the position of the word in the utterance. The resulting models predict segment durations better than mean-duration approximations (correlation coefficient of about 0.77, root-mean-squared error of 20.4 ms). To improve prediction performance further, the attributes used to develop the segmental duration models are optimized by means of Sequential Forward Selection. The optimum attribute set found by Sequential Forward Selection for phoneme duration modelling consists of phone identity, neighboring phone identities, lexical stress, syllable type, part-of-speech, phrase break information, and location of the word in the phrase.
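The attribute-selection step can be illustrated with a short sketch. Sequential Forward Selection starts from an empty attribute set and greedily adds whichever remaining attribute most improves held-out prediction, stopping when no addition helps; the empty set corresponds to a mean-duration baseline. As a simplifying assumption, the predictor below is a grouped-mean lookup table that backs off to the global mean duration, not the Classification and Regression Trees used in the paper. The function names, the attribute names `phone`, `stress` and `pos`, and the toy data are hypothetical. RMSE is used as the selection criterion here; `pearson` computes the correlation coefficient also reported in the abstract.

```python
import math
from statistics import mean


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(y_true, y_pred):
    """Pearson correlation coefficient; returns 0.0 if either side is constant."""
    mt, mp = mean(y_true), mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    vt = sum((t - mt) ** 2 for t in y_true)
    vp = sum((p - mp) ** 2 for p in y_pred)
    if vt == 0 or vp == 0:
        return 0.0
    return cov / math.sqrt(vt * vp)


def fit_group_means(rows, durations, attrs):
    """Hypothetical stand-in predictor (NOT CART): mean duration per
    combination of the selected attribute values."""
    global_mean = mean(durations)
    groups = {}
    for row, d in zip(rows, durations):
        groups.setdefault(tuple(row[a] for a in attrs), []).append(d)
    table = {key: mean(vals) for key, vals in groups.items()}

    def predict(row):
        # Unseen attribute combinations back off to the global mean duration.
        return table.get(tuple(row[a] for a in attrs), global_mean)

    return predict


def evaluate(train, valid, attrs):
    """Fit on `train` and return RMSE on `valid` for the given attribute subset."""
    rows, y = zip(*train)
    predict = fit_group_means(rows, y, attrs)
    valid_rows, valid_y = zip(*valid)
    return rmse(valid_y, [predict(r) for r in valid_rows])


def sequential_forward_selection(train, valid, candidates):
    """Greedily add the attribute that lowers validation RMSE the most;
    stop as soon as no remaining attribute improves it."""
    selected = []
    best_err = evaluate(train, valid, selected)  # empty set = mean-duration baseline
    remaining = list(candidates)
    while remaining:
        errs = {a: evaluate(train, valid, selected + [a]) for a in remaining}
        best_attr = min(errs, key=errs.get)
        if errs[best_attr] >= best_err:
            break
        selected.append(best_attr)
        remaining.remove(best_attr)
        best_err = errs[best_attr]
    return selected, best_err


if __name__ == "__main__":
    # Hypothetical, noise-free toy data: duration (ms) depends on phone and stress only.
    base = {"a": 80, "t": 50}
    data = [({"phone": p, "stress": s, "pos": pos}, base[p] + 20 * s)
            for p in base for s in (0, 1) for pos in ("N", "V")]
    print(sequential_forward_selection(data, data, ["phone", "stress", "pos"]))
```

On the toy data, the loop picks `phone` first, then `stress`, and rejects `pos` because it no longer reduces the error. The selection loop does not depend on the predictor, so `fit_group_means` could be replaced by a regression tree to move the sketch closer to the CART setup described above.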

Keywords

Mean Absolute Error, Pitch Contour, Synthetic Speech, Speech Corpus, Speech Database
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Özlem Öztürk (1)
  • Tolga Çiloğlu (2)
  1. Electrical and Electronics Engineering Department, Dokuz Eylul University, Izmir, Turkey
  2. Electrical and Electronics Engineering Department, Middle East Technical University, Ankara, Turkey