CRF-Based Phrase Boundary Detection Trained on Large-Scale TTS Speech Corpora

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

The paper compares different approaches to phrase boundary detection, based on data obtained from speech corpora recorded for text-to-speech (TTS) systems. It is shown that a conditional random fields model can outperform basic deterministic and classification-based algorithms in both speaker-dependent and speaker-independent phrasing. Results on sentences manually annotated with phrase breaks are presented as well.
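
As a rough illustration of what separates a sequence model from per-token classification, the sketch below decodes phrase-break labels with Viterbi search over a linear chain. It is not the paper's model: the label set, the single punctuation feature, and every weight here are hypothetical and hand-set, whereas a conditional random field would learn such weights from annotated data.

```python
# Toy linear-chain decoding for phrase-break labelling.
# Every feature and weight below is invented for illustration only.

LABELS = ("NB", "B")  # NB = no break after the token, B = phrase break after it


def emission_score(token, label):
    """Hypothetical per-token score: a trailing comma favours a break."""
    if token.endswith(","):
        return 2.0 if label == "B" else -2.0
    return 1.0 if label == "NB" else -1.0


# Hypothetical transition scores: two consecutive breaks are discouraged.
TRANSITION = {
    ("NB", "NB"): 0.0,
    ("NB", "B"): 0.0,
    ("B", "NB"): 0.0,
    ("B", "B"): -3.0,
}


def viterbi(tokens):
    """Return the highest-scoring label sequence for the given tokens."""
    if not tokens:
        return []
    # best[i][label] = score of the best path ending in `label` at position i
    best = [{lab: emission_score(tokens[0], lab) for lab in LABELS}]
    back = []  # back[i][label] = best previous label for position i + 1
    for tok in tokens[1:]:
        scores, pointers = {}, {}
        for lab in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + TRANSITION[(p, lab)])
            scores[lab] = best[-1][prev] + TRANSITION[(prev, lab)] + emission_score(tok, lab)
            pointers[lab] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


print(viterbi(["when", "it", "rains,", "we", "stay", "home"]))
# → ['NB', 'NB', 'B', 'NB', 'NB', 'NB']
```

The transition table is what lets neighbouring decisions influence each other; with all transition scores set to zero, the decoder reduces to choosing the best label for each token independently.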

Notes

  1. The term ‘juncture’ was adopted from [23] and refers to required phrase breaks.

References

  1. Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1), 37 (1960)

  2. Grůber, M., Matoušek, J.: Listening-test-based annotation of communicative functions for expressive speech synthesis. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2010. LNCS, vol. 6231, pp. 283–290. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15760-8_36

  3. Hirschberg, J., Prieto, P.: Training intonational phrasing rules automatically for English and Spanish text-to-speech. Speech Communication 18(3), 281–290 (1996)

  4. Jůzová, M., Romportl, J., Tihelka, D.: Speech corpus preparation for voice banking of laryngectomised patients. In: Král, P., Matoušek, V. (eds.) TSD 2015. LNCS, vol. 9302, pp. 282–290. Springer, Cham (2015). doi:10.1007/978-3-319-24033-6_32

  5. Jůzová, M., Tihelka, D., Matoušek, J.: Designing high-coverage multi-level text corpus for non-professional-voice conservation. In: Ronzhin, A., Potapova, R., Németh, G. (eds.) SPECOM 2016. LNCS (LNAI), vol. 9811, pp. 207–215. Springer, Cham (2016). doi:10.1007/978-3-319-43958-7_24

  6. Jůzová, M.: Prosodic phrase boundary classification based on Czech speech corpora. In: Text, Speech and Dialogue. LNCS. Springer, Berlin, Heidelberg (2017)

  7. Jůzová, M., Tihelka, D., Matoušek, J., Hanzlíček, Z.: Voice conservation and TTS system for people facing total laryngectomy. In: Proceedings of Interspeech 2017 (2017)

  8. Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of 18th ICML 2001, pp. 282–289. Morgan Kaufmann Publishers Inc. (2001)

  9. Legát, M., Matoušek, J., Tihelka, D.: A robust multi-phase pitch-mark detection algorithm. In: Proceedings of Interspeech 2007, pp. 1641–1644 (2007)

  10. Louw, A., Moodley, A.: Speaker specific phrase break modeling with conditional random fields for text-to-speech. In: Proceedings of PRASA-RobMech 2016, pp. 1–6 (2016)

  11. Matoušek, J., Romportl, J.: Automatic pitch-synchronous phonetic segmentation. In: Proceedings of Interspeech 2008, pp. 1626–1629. ISCA (2008)

  12. Matoušek, J., Tihelka, D., Romportl, J.: Current state of Czech text-to-speech system ARTIC. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2006. LNCS, vol. 4188, pp. 439–446. Springer, Heidelberg (2006). doi:10.1007/11846406_55

  13. Matoušek, J., Romportl, J.: Recording and annotation of speech corpus for Czech unit selection speech synthesis. In: Matoušek, V., Mautner, P. (eds.) TSD 2007. LNCS, vol. 4629, pp. 326–333. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74628-7_43

  14. Mishra, T., Jun Kim, Y., Bangalore, S.: Intonational phrase break prediction for text-to-speech synthesis using dependency relations. In: Proceedings of ICASSP 2015, pp. 4919–4923 (2015)

  15. Palková, Z.: Rytmická výstavba prozaického textu. Studia ČSAV; čis. 13/1974, Academia (1974)

  16. Parlikar, A., Black, A.W.: Data-driven phrasing for speech synthesis in low-resource languages. In: Proceedings of ICASSP 2012, pp. 4013–4016 (2012)

  17. Prahallad, K., Raghavendra, E.V., Black, A.W.: Learning speaker-specific phrase breaks for text-to-speech systems. In: Proceedings of SSW 2010, Kyoto, Japan, September 22–24, 2010, pp. 162–166 (2010)

  18. Romportl, J.: Statistical evaluation of prosodic phrases in the Czech language. In: Proceedings of the Speech Prosody 2008, pp. 755–758. Editora RG/CNPq, Campinas (2008)

  19. Romportl, J.: Automatic prosodic phrase annotation in a corpus for speech synthesis. In: Proceedings of Speech Prosody 2010. University of Illinois, Chicago (2010)

  20. Romportl, J., Matoušek, J.: Several aspects of machine-driven phrasing in text-to-speech systems. Prague Bull. Math. Linguist. 95, 51–61 (2011)

  21. Salzberg, S.L.: On comparing classifiers: pitfalls to avoid and a recommended approach. Data Min. Knowl. Disc. 1(3), 317–328 (1997)

  22. Sun, X., Applebaum, T.H.: Intonational phrase break prediction using decision tree and n-gram model. In: Proceedings of Eurospeech 2001, pp. 3–7 (2001)

  23. Taylor, P.: Text-to-Speech Synthesis, 1st edn. Cambridge University Press, New York, NY, USA (2009)

  24. Taylor, P., Black, A.W.: Assigning phrase breaks from part-of-speech sequences. Comput. Speech Lang. 12(2), 99–117 (1998)

  25. Tihelka, D., Grůber, M., Hanzlíček, Z.: Robust methodology for TTS enhancement evaluation. In: Habernal, I., Matoušek, V. (eds.) TSD 2013. LNCS, vol. 8082, pp. 442–449. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40585-3_56

Acknowledgement

This research was supported by the Ministry of Education, Youth and Sports of the Czech Republic, project No. LO1506, and by a grant of the University of West Bohemia, project No. SGS-2016-039.

Author information

Corresponding author

Correspondence to Markéta Jůzová.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Jůzová, M. (2017). CRF-Based Phrase Boundary Detection Trained on Large-Scale TTS Speech Corpora. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_26

  • DOI: https://doi.org/10.1007/978-3-319-66429-3_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66428-6

  • Online ISBN: 978-3-319-66429-3

  • eBook Packages: Computer Science, Computer Science (R0)
