
Prosody Modelling for TTS Systems Using Statistical Methods

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7403)

Abstract

The main drawback of older prosody modelling methods is the monotony of their output, which listeners find uncomfortable, especially over longer passages. This paper proposes a prosodic generator designed to increase the variability of synthesized speech in reading devices for the blind. The method segments text into several prosodic patterns by means of vector quantisation and then trains a corresponding HMM (Hidden Markov Model) for each pattern on F0 parameters. The path through the model's states is then used to generate sentence prosody. We also tried using morphological information to make the prosody more natural. The quality of the proposed prosodic generators was evaluated in listening tests.
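The vector quantisation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each prosodic unit is represented by a fixed-length F0 contour in Hz, that the codebook is trained with plain k-means (Lloyd's algorithm), and that the function names `train_codebook` and `nearest` as well as the sample contours are hypothetical. The subsequent HMM training on F0 parameters is not shown.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length contours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: List[List[float]], contour: Vector) -> int:
    """Index of the codebook entry closest to the given contour."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i], contour))


def train_codebook(contours: List[Vector], k: int,
                   iterations: int = 20) -> List[List[float]]:
    """Train a k-entry codebook with Lloyd's algorithm (k-means).

    Initialisation is deterministic: k contours are taken at evenly
    spaced positions in the training data (an assumption made here
    for reproducibility, not something stated in the paper).
    """
    step = len(contours) / k
    codebook = [list(contours[int(i * step)]) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: group contours by their nearest codeword.
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for contour in contours:
            clusters[nearest(codebook, contour)].append(contour)
        # Update step: move each codeword to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                codebook[i] = [sum(col) / len(members)
                               for col in zip(*members)]
    return codebook


# Hypothetical F0 contours (Hz): two rising and two falling patterns.
contours = [
    [100.0, 110.0, 120.0, 130.0],
    [102.0, 112.0, 121.0, 133.0],
    [130.0, 120.0, 110.0, 100.0],
    [128.0, 119.0, 111.0, 99.0],
]

codebook = train_codebook(contours, k=2)
labels = [nearest(codebook, c) for c in contours]
print(labels)  # → [0, 0, 1, 1]
```

In this sketch each label plays the role of a prosodic pattern class; under the approach described in the abstract, a separate HMM would then be trained on the F0 data assigned to each class.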




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chaloupka, Z., Horák, P. (2012). Prosody Modelling for TTS Systems Using Statistical Methods. In: Esposito, A., Esposito, A.M., Vinciarelli, A., Hoffmann, R., Müller, V.C. (eds) Cognitive Behavioural Systems. Lecture Notes in Computer Science, vol 7403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34584-5_13


  • DOI: https://doi.org/10.1007/978-3-642-34584-5_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34583-8

  • Online ISBN: 978-3-642-34584-5

  • eBook Packages: Computer Science, Computer Science (R0)
