Improving Prosodic Break Detection in a Russian TTS System

  • Conference paper
Speech and Computer (SPECOM 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8113)

Abstract

We propose using statistical methods for predicting positions and durations of prosodic breaks in a Russian TTS system, in order to improve on a baseline rule-based system. The paper reports experiments with CART and Random Forests (RF) classifiers. We used CART to predict break durations inside and between sentences, and compared the results of CART and RF for predicting break positions inside sentences. We find that both classifiers show an improvement over the baseline system in predicting break positions, with RF showing the best results. We also observe good results in experiments with predicting break durations. To increase the naturalness of synthesized speech, we incorporated probability-based break durations into a working Russian TTS system. We also built an experimental system with probability-based break placement in sentence parts without punctuation marks, which was rated higher than the baseline system in a pilot listening experiment.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Chistikov, P., Khomitsevich, O. (2013). Improving Prosodic Break Detection in a Russian TTS System. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_24

  • DOI: https://doi.org/10.1007/978-3-319-01931-4_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01930-7

  • Online ISBN: 978-3-319-01931-4

  • eBook Packages: Computer Science, Computer Science (R0)
