A Computational Model of Intonation for Yorùbá Text-to-Speech Synthesis: Design and Analysis

  • Conference paper
Text, Speech and Dialogue (TSD 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3206)

Abstract

In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. An RT is a sophisticated data structure that symbolically represents the peaks and valleys, as well as the spatial structure, of a waveform in the form of trees. An initial approximation to the RT, called the Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST are then computed using FL. Quantitative analysis of the results gives RMSE values of 0.56 and 0.71 for peaks and valleys, respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1–10, were obtained for intelligibility and naturalness, respectively.
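
To make the quantities in the abstract concrete, the following minimal Python sketch shows one way to locate the peaks and valleys of a sampled pitch contour and to compute an RMSE between predicted and reference values. It is only an illustration under stated assumptions: the function names `find_extrema` and `rmse`, the sample contour, and the reference values are all hypothetical, and the code does not reproduce the paper's RT construction or its FL computation.

```python
import math


def find_extrema(contour):
    """Return (peaks, valleys) as lists of (index, value) pairs.

    Only strict interior local maxima and minima are reported; this is a
    simplification and not the paper's Relational Tree construction.
    """
    peaks, valleys = [], []
    for i in range(1, len(contour) - 1):
        prev, cur, nxt = contour[i - 1], contour[i], contour[i + 1]
        if cur > prev and cur > nxt:
            peaks.append((i, cur))
        elif cur < prev and cur < nxt:
            valleys.append((i, cur))
    return peaks, valleys


def rmse(predicted, observed):
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical pitch samples (illustrative values, not data from the paper).
contour = [120.0, 135.0, 128.0, 110.0, 118.0, 131.0, 115.0]
peaks, valleys = find_extrema(contour)

# Hypothetical reference peak values to compare against the detected ones.
reference_peaks = [134.0, 133.0]
peak_error = rmse([value for _, value in peaks], reference_peaks)
```

Here `peaks` and `valleys` hold the turning points of the contour, and `peak_error` plays the role of a peak RMSE like the one the abstract reports, computed on made-up numbers.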

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Odéjobí, O.A., Beaumont, A.J., Wong, S.H.S. (2004). A Computational Model of Intonation for Yorùbá Text-to-Speech Synthesis: Design and Analysis. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2004. Lecture Notes in Computer Science, vol 3206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30120-2_52

  • DOI: https://doi.org/10.1007/978-3-540-30120-2_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23049-6

  • Online ISBN: 978-3-540-30120-2

  • eBook Packages: Springer Book Archive
