Improving Prosodic Break Detection in a Russian TTS System

Chistikov, Pavel; Khomitsevich, Olga

doi:10.1007/978-3-319-01931-4_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8113)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

We propose using statistical methods to predict the positions and durations of prosodic breaks in a Russian TTS system, in order to improve on a baseline rule-based system. The paper reports experiments with CART and Random Forests (RF) classifiers. We used CART to predict break durations inside and between sentences, and compared the results of CART and RF for predicting break positions inside sentences. We find that both classifiers improve on the baseline system in predicting break positions, with RF showing the best results. We also observe good results in experiments on predicting break durations. To increase the naturalness of synthesized speech, we incorporated probability-based break durations into a working Russian TTS system. We also built an experimental system with probability-based break placement in sentence parts without punctuation marks, which was rated higher than the baseline system in a pilot listening experiment.
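
As a rough illustration of the kind of classifier the abstract refers to, the following is a minimal sketch of a CART-style decision tree (greedy binary splits chosen by Gini impurity) that labels word junctures as break or no break. It is not the authors' implementation: the two features (words since the last break, and whether the next word is a conjunction), the toy data, and all function names are hypothetical and chosen only to make the example self-contained. A Random Forest would combine many such trees, each trained on a bootstrap sample with a random subset of features; that step is not shown here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Return the (feature, threshold) pair that most reduces weighted
    Gini impurity, or None if no split improves on the parent node."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree; a leaf is the majority class label (an int)."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (
        f,
        t,
        build_tree([rows[i] for i in left], [labels[i] for i in left],
                   depth + 1, max_depth),
        build_tree([rows[i] for i in right], [labels[i] for i in right],
                   depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical toy data, one row per word juncture:
# [words_since_last_break, next_word_is_conjunction]; label 1 = break.
rows = [[1, 0], [2, 0], [3, 1], [5, 0], [6, 1],
        [2, 1], [7, 0], [1, 1], [4, 1], [6, 0]]
labels = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1]

tree = build_tree(rows, labels)
predictions = [predict(tree, r) for r in rows]
```

On this toy set the tree reproduces the training labels, which says nothing about real performance; a practical system would use richer linguistic features and evaluate on held-out text.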

Author information

Authors and Affiliations

Speech Technology Center Ltd., 4 Krasutskogo street, St. Petersburg, Russia, 196084
Pavel Chistikov & Olga Khomitsevich

Editor information

Editors and Affiliations

Faculty of Applied Sciences, Department of Cybernetics, University of West Bohemia, Univerzitní 8, 306 14, Plzeň, Czech Republic
Miloš Železný
University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal
Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute of Informatics and Automation for the Russian Academy of Sciences, 14-th line, 39, 199178, St. Petersburg, Russia
Andrey Ronzhin

Cite this paper

Chistikov, P., Khomitsevich, O. (2013). Improving Prosodic Break Detection in a Russian TTS System. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_24

DOI: https://doi.org/10.1007/978-3-319-01931-4_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01930-7
Online ISBN: 978-3-319-01931-4
eBook Packages: Computer Science, Computer Science (R0)
