Using Random Forests for Prosodic Break Prediction Based on Automatic Speech Labeling

  • Olga Khomitsevich
  • Pavel Chistikov
  • Dmitriy Zakharov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8773)


In this paper we present a system for automatically predicting prosodic breaks in synthesized speech using the Random Forests classifier. In our experiments, the classifier is trained on a large dataset of audiobooks that is automatically labeled with phone, word, and pause labels. Part-of-speech (POS) tags are assigned to the text by a rule-based POS tagger. We use cross-validation so that we can examine not only the results for a specific subset of the data but also the system's reliability across the whole dataset. The experimental results show that the system performs well and consistently on the audiobook database; on a smaller database of read speech, the results are poorer and less robust, even though part of that database was labeled manually.
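The abstract does not specify the feature set, how pause labels become break labels, or the classifier settings, so the following is only a minimal, stdlib-only sketch of the general setup under stated assumptions, not the authors' system. Assumed: a word juncture counts as a break if the automatically aligned pause after the word lasts at least a hypothetical 50 ms; the only features are the POS tags of the two words before and the two words after the juncture (a hypothetical window); the forest uses bootstrap samples, a random subset of about √(number of features) features at every split, and majority voting; cross-validation folds are built from whole sentences, and break F1 (an assumed metric) is reported as a mean and standard deviation over folds, which is one way to look at reliability across a dataset. All function names are illustrative.

```python
import random
import statistics
from collections import Counter

PAD = "<pad>"  # filler for positions outside the sentence

def breaks_from_pauses(pause_ms, threshold_ms=50):
    """Assumed labeling rule: a juncture is a break if its aligned pause >= threshold."""
    return [int(p >= threshold_ms) for p in pause_ms]

def juncture_features(tags, i):
    """Assumed features for the juncture after word i: POS tags of words i-1 .. i+2."""
    return tuple(tags[j] if 0 <= j < len(tags) else PAD for j in range(i - 1, i + 3))

def gini(labels):
    n = len(labels)  # only ever called with non-empty lists
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, rng, depth, n_sub):
    """Grow one tree whose splits test 'feature f == value v', chosen by weighted Gini."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority  # leaf: a class label
    best = None  # (weighted Gini, feature, value)
    for f in rng.sample(range(len(X[0])), n_sub):  # fresh random feature subset per split
        for v in sorted({x[f] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[f] == v]
            right = [lab for x, lab in zip(X, y) if x[f] != v]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, f, v)
    if best is None:
        return majority
    _, f, v = best
    li = [k for k, x in enumerate(X) if x[f] == v]
    ri = [k for k, x in enumerate(X) if x[f] != v]
    return (f, v,
            build_tree([X[k] for k in li], [y[k] for k in li], rng, depth - 1, n_sub),
            build_tree([X[k] for k in ri], [y[k] for k in ri], rng, depth - 1, n_sub))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, value, left, right)
        f, v, left, right = node
        node = left if x[f] == v else right
    return node

def train_forest(X, y, n_trees=25, depth=6, seed=0):
    rng = random.Random(seed)
    n_sub = max(1, round(len(X[0]) ** 0.5))  # ~sqrt(#features), a common default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample, with replacement
        forest.append(build_tree([X[k] for k in idx], [y[k] for k in idx], rng, depth, n_sub))
    return forest

def predict_forest(forest, x):
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]  # majority vote

def f1(gold, pred):
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    wrong = sum(g != p for g, p in zip(gold, pred))  # false positives + false negatives
    return 2 * tp / (2 * tp + wrong) if tp else 0.0

def junctures(sentences):
    """Flatten (tags, breaks) sentences into one (features, label) pair per word juncture."""
    X, y = [], []
    for tags, breaks in sentences:
        for i in range(len(tags) - 1):  # the sentence-final position is not a juncture
            X.append(juncture_features(tags, i))
            y.append(breaks[i])
    return X, y

def cross_validate(sentences, k=5, seed=0):
    """k-fold CV over whole sentences; returns mean and std of break F1 across folds."""
    order = list(range(len(sentences)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        X_tr, y_tr = junctures([sentences[s] for s in train])
        X_te, y_te = junctures([sentences[s] for s in test])
        forest = train_forest(X_tr, y_tr, seed=seed + i)
        scores.append(f1(y_te, [predict_forest(forest, x) for x in X_te]))
    return statistics.mean(scores), statistics.stdev(scores)

if __name__ == "__main__":
    rng = random.Random(1)
    sentences = []
    for _ in range(120):  # toy data: breaks follow punctuation tokens only
        tags = [rng.choice(["NOUN", "VERB", "ADJ", "ADP", "PUNCT"]) for _ in range(8)]
        pauses = [rng.choice([120, 300]) if t == "PUNCT" else 0 for t in tags]
        sentences.append((tags, breaks_from_pauses(pauses)))
    mean_f1, sd_f1 = cross_validate(sentences)
    print(f"break F1 over folds: {mean_f1:.3f} +/- {sd_f1:.3f}")
```

Folding over whole sentences (or, for an audiobook corpus, whole chapters or books) keeps neighbouring junctures of one sentence from landing in both the training and the test set; the spread across folds, not just the mean, is what indicates robustness. In practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally replace the hand-written forest; it is written out here only to keep the sketch self-contained.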


Keywords: phrasal breaks, prosodic breaks, prosodic boundaries, pauses, speech synthesis, TTS, text-to-speech, statistical models, Random Forests





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Olga Khomitsevich (1, 2)
  • Pavel Chistikov (1, 2)
  • Dmitriy Zakharov (2)

  1. National Research University of Information Technologies, Mechanics and Optics, Saint Petersburg, Russia
  2. Speech Technology Center Ltd., Saint Petersburg, Russia
