Prosody Modelling for TTS Systems Using Statistical Methods

Chaloupka, Zdeněk; Horák, Petr

doi:10.1007/978-3-642-34584-5_13

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7403)

Abstract

The main drawback of older methods of prosody modelling is the monotony of the output, which is perceived as uncomfortable by the users, especially when listening to longer passages. The present paper proposes a prosodic generator designed to increase the variability of synthesized speech in reading devices for the blind. The method used is based on text segmentation into several prosodic patterns by means of vector quantisation and the subsequent training of corresponding HMMs (Hidden Markov Models) on F0 parameters. The path through the model’s states is then used to generate sentence prosody. We also tried to utilize morphological information in order to increase prosody naturalness. The evaluation of the quality of the proposed prosodic generators was carried out by means of listening tests.
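
The vector quantisation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the feature representation, codebook size, or distance measure, so the fixed-length F0 contours, the Euclidean distance, the plain Lloyd (k-means) iterations, and the names `nearest` and `train_codebook` are all assumptions made for illustration. The HMM training stage is omitted.

```python
from math import dist  # Euclidean distance, available from Python 3.8

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec."""
    return min(range(len(codebook)), key=lambda i: dist(codebook[i], vec))

def train_codebook(vectors, k, iterations=10):
    """Plain Lloyd (k-means) iterations; the first k vectors seed the codebook."""
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

# Toy F0 contours in Hz, three samples each (invented values, not paper data).
contours = [
    [120.0, 110.0, 100.0],  # falling
    [100.0, 110.0, 120.0],  # rising
    [122.0, 111.0, 99.0],   # falling
    [101.0, 112.0, 121.0],  # rising
]
codebook = train_codebook(contours, k=2)

# Each contour is replaced by the index of its prosodic pattern class.
labels = [nearest(codebook, c) for c in contours]
```

In a pipeline of the kind the abstract describes, the resulting pattern labels would select which HMM each segment's F0 parameters are used to train; how the paper maps text segments to patterns is not specified here.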

Author information

Authors and Affiliations

Institute of Photonics and Electronics, Academy of Sciences of the Czech Republic, Czech Republic
Zdeněk Chaloupka & Petr Horák

Editor information

Editors and Affiliations

Department of Psychology, and IIASS, Seconda Università degli Studi di Napoli, Italy
Anna Esposito
Istituto Nazionale di Geofisica e Vulcanologia, sezione di Napoli Osservatorio Vesuviano, Napoli, Italy
Antonietta M. Esposito
School of Computing Science, University of Glasgow, Glasgow, UK
Alessandro Vinciarelli
Laboratory of Acoustics and Speech Communication, Technische Universität Dresden, 01062, Dresden, Germany
Rüdiger Hoffmann
Dept. of Humanities and Social Sciences, Anatolia College/ACT, P.O. Box 21021, 55510, Pylaia, Greece
Vincent C. Müller

About this paper

Cite this paper

Chaloupka, Z., Horák, P. (2012). Prosody Modelling for TTS Systems Using Statistical Methods. In: Esposito, A., Esposito, A.M., Vinciarelli, A., Hoffmann, R., Müller, V.C. (eds) Cognitive Behavioural Systems. Lecture Notes in Computer Science, vol 7403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34584-5_13

DOI: https://doi.org/10.1007/978-3-642-34584-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34583-8
Online ISBN: 978-3-642-34584-5
eBook Packages: Computer Science, Computer Science (R0)
