
Experiment with GMM-Based Artefact Localization in Czech Synthetic Speech

  • Conference paper
Text, Speech, and Dialogue (TSD 2015)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9302))

Abstract

The paper describes an experiment that uses a statistical approach based on Gaussian mixture models (GMM) to localize artefacts in synthetic speech produced by a Czech text-to-speech system employing the unit selection principle. In addition, the paper analyzes how the number of GMM mixtures and the frame shift used during spectral feature analysis influence the accuracy of the resulting artefact positions. The results of the experiments confirm that the chosen concept functions properly, and the presented artefact position localizer can be used as an alternative to the commonly applied manual localization method.

The work has been supported by the Technology Agency of the Czech Republic, project No. TA 01030476, the Grant Agency of the Slovak Academy of Sciences (VEGA 2/0013/14), and the Ministry of Education of the Slovak Republic (KEGA 022STU-4/2014).
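
The abstract does not give the feature set, the model structure, or the decision rule, so the following is only a minimal sketch of how frame-wise GMM scoring could yield artefact positions. It assumes two diagonal-covariance GMMs (one for clean speech, one for artefacts) with hand-set parameters, one-dimensional toy features instead of real spectral features, and a simple likelihood comparison per frame. The names `gmm_log_likelihood` and `localize_artefacts` are hypothetical and do not come from the paper.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # log-sum-exp over the mixture components for numerical stability
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))

def localize_artefacts(frames, clean_gmm, artefact_gmm, frame_shift_ms):
    """Return (start_ms, end_ms) spans where the artefact model scores higher.

    Assumed decision rule: a frame is flagged when its log-likelihood under
    the artefact GMM exceeds its log-likelihood under the clean GMM.
    """
    flags = [
        gmm_log_likelihood(f, *artefact_gmm) > gmm_log_likelihood(f, *clean_gmm)
        for f in frames
    ]
    spans, start = [], None
    for i, flagged in enumerate(flags + [False]):  # sentinel closes a trailing span
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            # frame index times frame shift gives the position in milliseconds
            spans.append((start * frame_shift_ms, i * frame_shift_ms))
            start = None
    return spans

# Hypothetical 2-mixture models over 1-D features: (weights, means, variances)
clean_gmm = ([0.5, 0.5], [[0.0], [1.0]], [[0.25], [0.25]])
artefact_gmm = ([0.5, 0.5], [[4.0], [5.0]], [[0.25], [0.25]])

# Toy feature sequence, one vector per analysis frame
frames = [[0.1], [0.9], [4.2], [4.8], [0.5], [0.2]]

spans = localize_artefacts(frames, clean_gmm, artefact_gmm, frame_shift_ms=10)
print(spans)  # [(20, 40)]
```

In this sketch the frame shift sets the time resolution of the reported positions: the same flagged frame indices map to different spans in milliseconds when `frame_shift_ms` changes, which is one way the frame shift could affect position accuracy.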



Corresponding author

Correspondence to Jiří Přibil.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Přibil, J., Přibilová, A., Matoušek, J. (2015). Experiment with GMM-Based Artefact Localization in Czech Synthetic Speech. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_3

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science (R0)
