Improving Prosodic Break Detection in a Russian TTS System

  • Pavel Chistikov
  • Olga Khomitsevich
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8113)

Abstract

We propose statistical methods for predicting the positions and durations of prosodic breaks in a Russian TTS system, with the aim of improving on a baseline rule-based system. The paper reports experiments with CART (classification and regression tree) and Random Forest (RF) classifiers. We used CART to predict break durations within and between sentences, and compared CART and RF for predicting break positions within sentences. Both classifiers improve on the baseline system in predicting break positions, with RF giving the best results. Experiments on predicting break durations also show good results. To increase the naturalness of synthesized speech, we incorporated probability-based break durations into a working Russian TTS system. We also built an experimental system with probability-based break placement in sentence parts without punctuation marks, which was rated higher than the baseline system in a pilot listening experiment.

Keywords

phrasal breaks, prosodic breaks, prosodic boundaries, pauses, speech synthesis, TTS, text-to-speech, statistical models

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Pavel Chistikov (1)
  • Olga Khomitsevich (1)
  1. Speech Technology Center Ltd., St. Petersburg, Russia
