Speaker-Dependent BiLSTM-Based Phrasing

  • Conference paper
Text, Speech, and Dialogue (TSD 2020)

Abstract

Phrase boundary detection is an important part of text-to-speech systems because well-placed breaks make synthesized speech sound more natural. Phrasing is ambiguous, however: where breaks belong varies from speaker to speaker and from one speaking style to another. This paper therefore focuses on speaker-dependent phrasing for speech synthesis, using a neural network model with a speaker code. We also describe the results of a listening test focused on incorrectly detected breaks, because some of the breaks counted as mistakes turned out to be acceptable rather than wrong.

Acknowledgements

This research was supported by the Czech Science Foundation (GACR), project No. GA19-19324S, and by the grant of the University of West Bohemia, project No. SGS-2019-027.

Author information

Corresponding author

Correspondence to Markéta Jůzová.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Jůzová, M., Tihelka, D. (2020). Speaker-Dependent BiLSTM-Based Phrasing. In: Sojka, P., Kopeček, I., Pala, K., Horák, A. (eds) Text, Speech, and Dialogue. TSD 2020. Lecture Notes in Computer Science, vol 12284. Springer, Cham. https://doi.org/10.1007/978-3-030-58323-1_37

  • DOI: https://doi.org/10.1007/978-3-030-58323-1_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58322-4

  • Online ISBN: 978-3-030-58323-1

  • eBook Packages: Computer Science, Computer Science (R0)
