Combining Syntactic and Acoustic Features for Prosodic Boundary Detection in Russian

Kocharov, Daniil; Kachkovskaia, Tatiana; Mirzagitova, Aliya; Skrelin, Pavel

doi:10.1007/978-3-319-45925-7_6


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9918)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing


Abstract

This paper presents a two-step method for automatic prosodic boundary detection that uses both textual and acoustic features. First, we predict possible boundary positions using textual features; second, we detect the actual boundaries at the predicted positions using acoustic features. To evaluate the algorithms, we use a 26-hour subcorpus of CORPRES, a prosodically annotated corpus of Russian read speech. We have also conducted two independent experiments using acoustic features and textual features separately. Acoustic features alone achieve an F\(_1\) measure of 0.85, with a precision of 0.94 and a recall of 0.78. Textual features alone achieve an F\(_1\) measure of 0.84, with a precision of 0.84 and a recall of 0.83. The proposed two-step approach combining the two groups of features yields an efficiency of 0.90, with a recall of 0.85 and a precision of 0.99. It preserves the high recall provided by textual information and the high precision achieved using acoustic information. This is the best published result for Russian.
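The two-step scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Juncture` type, the punctuation and pause features, the 100 ms threshold, and both toy "models" are hypothetical stand-ins chosen only to show how a text-based proposal step and an acoustic confirmation step might be chained and then scored with precision, recall and F1.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Juncture:
    """A gap between two adjacent words, with toy features (all hypothetical)."""
    index: int
    has_punctuation: bool  # stand-in for a textual feature
    pause_ms: float        # stand-in for an acoustic feature


def two_step_detect(
    junctures: List[Juncture],
    text_model: Callable[[Juncture], bool],
    acoustic_model: Callable[[Juncture], bool],
) -> Set[int]:
    """Return the indices of junctures accepted by both steps."""
    # Step 1: textual features propose possible boundary positions.
    candidates = [j for j in junctures if text_model(j)]
    # Step 2: acoustic features confirm or reject each proposed position.
    return {j.index for j in candidates if acoustic_model(j)}


def precision_recall_f1(predicted: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for predicted vs. reference boundaries."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: five junctures and a made-up reference annotation.
junctures = [
    Juncture(0, False, 10.0),
    Juncture(1, True, 250.0),
    Juncture(2, True, 20.0),
    Juncture(3, False, 300.0),
    Juncture(4, True, 180.0),
]
gold = {1, 3, 4}

# Hypothetical stand-ins for trained classifiers.
text_model = lambda j: j.has_punctuation
acoustic_model = lambda j: j.pause_ms >= 100.0

detected = two_step_detect(junctures, text_model, acoustic_model)
precision, recall, f1 = precision_recall_f1(detected, gold)
print(sorted(detected), precision, recall, f1)
```

In this sketch the second step can only reject candidates proposed by the first, so the first step bounds recall and the second step mainly affects precision.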


Notes

1. Texts A, B and C comprise 75% of all the recordings.
2. We use the term “prosodic word” in its traditional sense for a content word and its clitics, which lose their lexical stress and form one rhythmic unit with the adjacent stressed word.


Acknowledgments

The research is supported by the Russian Science Foundation (research grant No. 14-18-01352).

Author information

Authors and Affiliations

Department of Phonetics, St. Petersburg State University, 7/9 Universitetskaya nab., St. Petersburg, 199034, Russia
Daniil Kocharov, Tatiana Kachkovskaia & Pavel Skrelin
Department of Mathematical Linguistics, St. Petersburg State University, 7/9 Universitetskaya nab., St. Petersburg, 199034, Russia
Aliya Mirzagitova


Corresponding author

Correspondence to Tatiana Kachkovskaia.

Editor information

Editors and Affiliations

University of West Bohemia, Plzen, Czech Republic
Pavel Král
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide


About this paper

Cite this paper

Kocharov, D., Kachkovskaia, T., Mirzagitova, A., Skrelin, P. (2016). Combining Syntactic and Acoustic Features for Prosodic Boundary Detection in Russian. In: Král, P., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Springer, Cham. https://doi.org/10.1007/978-3-319-45925-7_6


DOI: https://doi.org/10.1007/978-3-319-45925-7_6
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45924-0
Online ISBN: 978-3-319-45925-7
eBook Packages: Computer Science, Computer Science (R0)
