
Using Anomaly Detection for Fine Tuning of Formal Prosodic Structures in Speech Synthesis

  • Martin Matura
  • Markéta Jůzová
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)

Abstract

A consistent prosodic description of speech corpora is a fundamental requirement for high-quality speech synthesis in current TTS systems. In this preliminary study, we use a One-class SVM anomaly detection approach to predict formal prosodic structure outliers (prosodic mismatches) in recorded utterances. Such outliers can negatively influence the overall quality of synthesized speech, especially in unit selection. To evaluate the outcome of our detection system, we performed a listening test, which yielded encouraging results.
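The detection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. We assume that each prosodic phrase is represented by its F0 contour, parametrized by low-order Legendre polynomial coefficients (Legendre polynomials appear among the keywords). We also assume that all contours passed in share the same expected prosodeme, so a contour that the One-class SVM rejects is a candidate prosodic mismatch. The helper names, the polynomial degree, and the `nu`/`gamma` values are hypothetical choices. `OneClassSVM` (scikit-learn) and `legfit` (NumPy) are real library functions.

```python
import numpy as np
from numpy.polynomial import legendre
from sklearn.svm import OneClassSVM


def legendre_features(f0, degree=3):
    """Hypothetical featurization: Legendre coefficients of an F0 contour.

    Sample positions are mapped onto [-1, 1], where Legendre polynomials
    are orthogonal, and the contour is fitted by least squares.
    """
    f0 = np.asarray(f0, dtype=float)
    x = np.linspace(-1.0, 1.0, f0.size)
    return legendre.legfit(x, f0, degree)  # degree + 1 coefficients


def detect_prosodic_outliers(contours, nu=0.05, gamma="scale", degree=3):
    """Flag contours that a One-class SVM treats as outliers.

    Assumes every contour belongs to the same expected prosodeme; returns a
    boolean array that is True for candidate prosodic mismatches.
    """
    X = np.vstack([legendre_features(c, degree) for c in contours])
    # Standardize each coefficient so none dominates the RBF kernel distance.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    X = (X - X.mean(axis=0)) / std
    model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
    labels = model.fit_predict(X)  # +1 = inlier, -1 = outlier
    return labels == -1
```

In this formulation, `nu` is an upper bound on the fraction of training contours treated as outliers. It therefore effectively sets how many utterances are flagged for closer inspection, for example in a listening test.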

Keywords

Anomaly detection, One-class SVM, Formal prosodic grammar, Prosodemes, Unit selection speech synthesis, Legendre polynomials

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Cybernetics and New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic

Personalised recommendations