CRF-Based Phrase Boundary Detection Trained on Large-Scale TTS Speech Corpora

Jůzová, Markéta

doi:10.1007/978-3-319-66429-3_26

Conference paper
First Online: 13 August 2017

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

The paper compares different approaches to phrase boundary detection, based on data obtained from speech corpora recorded for a text-to-speech (TTS) system. It is shown that a conditional random fields model can outperform basic deterministic and classification-based algorithms in both speaker-dependent and speaker-independent phrasing. Results on sentences manually annotated with phrase breaks are presented as well.

Notes

1. The term ‘juncture’ was adopted from [23] and refers to required phrase breaks.

References

Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1), 37 (1960)
Grůber, M., Matoušek, J.: Listening-test-based annotation of communicative functions for expressive speech synthesis. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2010. LNCS, vol. 6231, pp. 283–290. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15760-8_36
Hirschberg, J., Prieto, P.: Training intonational phrasing rules automatically for English and Spanish text-to-speech. Speech Communication 18(3), 281–290 (1996)
Jůzová, M., Romportl, J., Tihelka, D.: Speech corpus preparation for voice banking of laryngectomised patients. In: Král, P., Matoušek, V. (eds.) TSD 2015. LNCS, vol. 9302, pp. 282–290. Springer, Cham (2015). doi:10.1007/978-3-319-24033-6_32
Jůzová, M., Tihelka, D., Matoušek, J.: Designing high-coverage multi-level text corpus for non-professional-voice conservation. In: Ronzhin, A., Potapova, R., Németh, G. (eds.) SPECOM 2016. LNCS (LNAI), vol. 9811, pp. 207–215. Springer, Cham (2016). doi:10.1007/978-3-319-43958-7_24
Jůzová, M.: Prosodic phrase boundary classification based on Czech speech corpora. In: Text, Speech and Dialogue. LNCS. Springer, Berlin, Heidelberg (2017)
Jůzová, M., Tihelka, D., Matoušek, J., Hanzlíček, Z.: Voice conservation and TTS system for people facing total laryngectomy. In: Proceedings of Interspeech 2017 (2017)
Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of 18th ICML 2001, pp. 282–289. Morgan Kaufmann Publishers Inc. (2001)
Legát, M., Matoušek, J., Tihelka, D.: A robust multi-phase pitch-mark detection algorithm. In: Proceedings of Interspeech 2007, pp. 1641–1644 (2007)
Louw, A., Moodley, A.: Speaker specific phrase break modeling with conditional random fields for text-to-speech. In: Proceedings of PRASA-RobMech 2016, pp. 1–6 (2016)
Matoušek, J., Romportl, J.: Automatic pitch-synchronous phonetic segmentation. In: Proceedings of Interspeech 2008, pp. 1626–1629. ISCA (2008)
Matoušek, J., Tihelka, D., Romportl, J.: Current state of Czech text-to-speech system ARTIC. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2006. LNCS, vol. 4188, pp. 439–446. Springer, Heidelberg (2006). doi:10.1007/11846406_55
Matoušek, J., Romportl, J.: Recording and annotation of speech corpus for Czech unit selection speech synthesis. In: Matoušek, V., Mautner, P. (eds.) TSD 2007. LNCS, vol. 4629, pp. 326–333. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74628-7_43
Mishra, T., Jun Kim, Y., Bangalore, S.: Intonational phrase break prediction for text-to-speech synthesis using dependency relations. In: Proceedings of ICASSP 2015, pp. 4919–4923 (2015)
Palková, Z.: Rytmická výstavba prozaického textu. Studia ČSAV; čis. 13/1974, Academia (1974)
Parlikar, A., Black, A.W.: Data-driven phrasing for speech synthesis in low-resource languages. In: Proceedings of ICASSP 2012, pp. 4013–4016 (2012)
Prahallad, K., Raghavendra, E.V., Black, A.W.: Learning speaker-specific phrase breaks for text-to-speech systems. In: Proceedings of SSW 2010, Kyoto, Japan, September 22–24, 2010, pp. 162–166 (2010)
Romportl, J.: Statistical evaluation of prosodic phrases in the Czech language. In: Proceedings of the Speech Prosody 2008, pp. 755–758. Editora RG/CNPq, Campinas (2008)
Romportl, J.: Automatic prosodic phrase annotation in a corpus for speech synthesis. In: Proceedings of Speech Prosody 2010. University of Illinois, Chicago (2010)
Romportl, J., Matoušek, J.: Several aspects of machine-driven phrasing in text-to-speech systems. Prague Bull. Math. Linguist. 95, 51–61 (2011)
Salzberg, S.L.: On comparing classifiers: pitfalls to avoid and a recommended approach. Data Min. Knowl. Disc. 1(3), 317–328 (1997)
Sun, X., Applebaum, T.H.: Intonational phrase break prediction using decision tree and n-gram model. In: Proceedings of Eurospeech 2001, pp. 3–7 (2001)
Taylor, P.: Text-to-Speech Synthesis, 1st edn. Cambridge University Press, New York, NY, USA (2009)
Taylor, P., Black, A.W.: Assigning phrase breaks from part-of-speech sequences. Comput. Speech Lang. 12(2), 99–117 (1998)
Tihelka, D., Grůber, M., Hanzlíček, Z.: Robust methodology for TTS enhancement evaluation. In: Habernal, I., Matoušek, V. (eds.) TSD 2013. LNCS, vol. 8082, pp. 442–449. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40585-3_56

Acknowledgement

This research was supported by the Ministry of Education, Youth and Sports of the Czech Republic, project No. LO1506, and by a grant of the University of West Bohemia, project No. SGS-2016-039.

Author information

Authors and Affiliations

Department of Cybernetics and New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic
Markéta Jůzová

Corresponding author

Correspondence to Markéta Jůzová.

Editor information

Editors and Affiliations

SPIIRAS, Saint Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Hertfordshire, Hatfield, United Kingdom
Iosif Mporas

About this paper

Cite this paper

Jůzová, M. (2017). CRF-Based Phrase Boundary Detection Trained on Large-Scale TTS Speech Corpora. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_26

DOI: https://doi.org/10.1007/978-3-319-66429-3_26
Published: 13 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)
